
Consolidating and migrating to an in-memory analytics cloud

Big Data Evangelist, IBM

In-memory computing is an analytics accelerator of major proportions. It also has clear advantages for online transaction processing.

No matter what your application, once you've tasted the speed boost from in-memory computing platforms, you won't turn back. Any business that can use these platforms to achieve even an incremental boost in the speed, scale, efficiency and versatility of its analytic and transactional workloads can turn that gain into a disruptive competitive differentiator.

As I stated recently, the consolidated memory cloud will be the dominant architecture of the big data future. Evolutionary trends are pushing us all toward a global architecture where all data is in cloud memory and only in cloud memory. In the aggregate, the end-to-end global memory cloud will scale well into the exabytes and beyond. As improvements in technological scale and cost-efficiency make this vision a reality, other storage technologies will fade away. All data functions will be performed asymptotically close to light speed.

However inevitable that outcome may be, it could take quite a while for the consolidated "all-and-only-in-memory" cloud to become the predominant big data platform. For enterprises and service providers trying to get closer to that vision, the migration path will not always be straightforward. It will include many transitional deployment patterns. For example, operational data may be only partially in memory, due to technical or economic constraints on RAM. Or it may be entirely in memory, but duplicated onto other media, owing to the need for backup on technologies that are less volatile than RAM.

With that in mind, managed-memory capabilities, as outlined in this Wikipedia entry, will be essential to all these transitional environments. This feature allows the entire working-data set (volumes, tables, records and more) to be swapped or paged to and from RAM, so that data sets larger than system memory can still be supported within memory-constrained in-memory architectures. It also enables data to be backed up from RAM to other storage platforms as needed.
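To make the swapping-and-paging idea concrete, here is a minimal, purely illustrative Python sketch of a managed-memory store: it keeps a bounded number of pages in RAM and spills the least recently used ones to disk, so a working set larger than memory can still be served. The class and its parameters are my own invention for illustration; no real product implements managed memory this simply.

```python
# Toy managed-memory store: a bounded LRU page cache in RAM that spills
# evicted pages to disk and pages them back in on demand. Illustration only.
import os
import pickle
import tempfile
from collections import OrderedDict

class PagedStore:
    def __init__(self, max_pages_in_ram=4, spill_dir=None):
        self.max_pages = max_pages_in_ram
        self.ram = OrderedDict()                      # page_id -> data, in LRU order
        self.spill_dir = spill_dir or tempfile.mkdtemp(prefix="spill_")

    def _spill_path(self, page_id):
        return os.path.join(self.spill_dir, f"page_{page_id}.pkl")

    def put(self, page_id, data):
        self.ram[page_id] = data
        self.ram.move_to_end(page_id)                 # mark as most recently used
        self._evict_if_needed()

    def get(self, page_id):
        if page_id in self.ram:                       # RAM hit
            self.ram.move_to_end(page_id)
            return self.ram[page_id]
        path = self._spill_path(page_id)              # RAM miss: page in from disk
        with open(path, "rb") as f:
            data = pickle.load(f)
        os.remove(path)
        self.put(page_id, data)                       # promote back into RAM
        return data

    def _evict_if_needed(self):
        while len(self.ram) > self.max_pages:
            victim_id, victim = self.ram.popitem(last=False)   # least recently used
            with open(self._spill_path(victim_id), "wb") as f:
                pickle.dump(victim, f)                # page out to slower storage
```

In a real system, the paging policy, page size and spill target (SSD, disk or another node's memory) would all be tunable and workload-aware rather than hard-coded.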

Ideally, in-memory data swapping or paging should be automatic and continuously optimized. IBM DB2 10.5 with BLU Acceleration, for example, supports automated memory management, which ensures that the hottest, most frequently queried data is kept in RAM at all times. Here's a recent blog I wrote that goes in-depth on the role of the in-memory cloud in data warehouse modernization.
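As a rough illustration of the "hottest data stays in RAM" principle, the toy cache below tracks access counts and demotes the least frequently queried entries to a colder tier when it runs out of room. This is a simple least-frequently-used policy of my own, not the actual algorithm BLU Acceleration uses.

```python
# Toy "hot data in RAM" cache: count accesses and, under memory pressure,
# demote the least frequently queried entry to a colder store.
from collections import Counter

class HotDataCache:
    def __init__(self, capacity, cold_store):
        self.capacity = capacity
        self.cold_store = cold_store        # any dict-like colder tier (disk, SSD, ...)
        self.ram = {}
        self.hits = Counter()               # access frequency per key

    def get(self, key):
        self.hits[key] += 1
        if key in self.ram:                 # hot path: served straight from RAM
            return self.ram[key]
        value = self.cold_store[key]        # miss: fetch from the colder tier
        self.ram[key] = value
        if len(self.ram) > self.capacity:
            coldest = min(self.ram, key=lambda k: self.hits[k])
            self.cold_store[coldest] = self.ram.pop(coldest)   # demote coldest entry
        return value
```

A production engine would weigh frequency, recency and the cost of re-reading the data, and would do all of this without any manual tuning.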

These memory-management decisions are far too complex and dynamic to justify requiring database administrators (DBAs) to make them manually. Few DBAs have the time or the technical depth to handle all of these functions by hand: sizing objects in memory, allocating memory to hold them, tuning in-memory objects, and continuously monitoring and adjusting the memory space available to them. Requiring the DBA to decide all of this almost guarantees that the memory-allocation settings will be perpetually suboptimal, and that the in-memory system will almost never run at its maximum potential speed.

Also, the need to make these memory-configuration decisions afresh every time a system is restarted will introduce unnecessary delays into the restart process. In-memory computing platforms should be as fast to boot as they are to query.

As we push deeper toward the "all-and-only-in-memory" vision, dynamic memory management will continue to play an important role. It will help ensure an equitable split of available RAM between analytic, transactional and other processing workloads across the memory cloud, enabling them all to meet their respective performance requirements. It will also enable the underlying RAM resources to be dynamically divided between public memory clouds, private clouds and various hybrid public/private cloud deployments.
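To show what an "equitable split" of RAM might look like in the simplest possible terms, here is a toy weighted-allocation sketch. The workload names, demands and weights are invented for illustration; real memory clouds rebalance continuously and weigh far more factors than this.

```python
# Toy sketch of splitting a RAM budget across workload classes by weight,
# capping each class at its stated demand and redistributing any leftover.
def allocate_ram(total_gb, demands_gb, weights):
    """Grant each workload RAM in proportion to its weight, capped at its demand,
    then recycle any unused share to workloads that still want more."""
    alloc = {w: 0.0 for w in demands_gb}
    remaining = float(total_gb)
    active = set(demands_gb)
    while remaining > 1e-9 and active:
        weight_sum = sum(weights[w] for w in active)
        leftover = 0.0
        for w in list(active):
            share = remaining * weights[w] / weight_sum
            grant = min(share, demands_gb[w] - alloc[w])
            alloc[w] += grant
            leftover += share - grant
            if alloc[w] >= demands_gb[w] - 1e-9:
                active.discard(w)            # this workload is fully satisfied
        remaining = leftover
    return alloc

# Hypothetical example: 512 GB of cloud RAM, analytics and OLTP weighted
# twice as heavily as batch processing.
print(allocate_ram(
    total_gb=512,
    demands_gb={"analytics": 400, "oltp": 200, "batch": 100},
    weights={"analytics": 2, "oltp": 2, "batch": 1},
))
```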

In all of these ways, distributed RAM resources will be continuously self-optimizing as more RAM and diverse workloads are thrown into the cloud.