Driving in-memory data warehousing into the big data cloud
Of course, this trend is not brand new. As last year's launch of dashDB demonstrated, the fully managed in-memory cloud DW is already a mature technology that is taking its place in hybrid, multizone big data architectures alongside Spark, Hadoop, NoSQL and other leading-edge cloud services. These and other fit-for-purpose cloud data services are increasingly used in mixed-deployment big data architectures that also include appliances, licensed software on commodity hardware and virtualization technologies.
dashDB, a self-service offering on IBM Bluemix, amply exemplifies the advantages of a RAM-speed cloud DW:
- Built-in low-latency performance with IBM in-memory, columnar technology, actionable compression and hardware acceleration
- Agile scaling of data volumes and processing speeds
- Data load-and-go with no manual tuning
- Storage and processing of multi-structured sources with easy synchronization of JSON to structured data
- Support for OLTP transactions and data warehousing analytics in the same cloud database
- Rich integrated in-database analytics, with accelerated parallel processing of algorithm libraries, including Netezza Analytics
- Interoperability with advanced analytic tooling such as RStudio; with self-service cloud applications such as IBM Watson Analytics, or any standard business intelligence tool; and with self-service cloud data-refinery solutions such as IBM DataWorks
- Versatility to support diverse deployment models and use cases, including standalone cloud DW; hybrid (extending your on-premises DW to the cloud); data scientist development platform (using R and other tools and algorithms at scale); development and quality assurance; analysis of NoSQL data (Cloudant integration); and managed service for DBAs, data scientists, developers and solutions architects
- Tight enterprise-grade security for sensitive data on the SoftLayer Secure Cloud Infrastructure
- Support for on-demand, pay-as-you-go deployment of very large enterprise data warehouses (EDWs) in hours, through rapid provisioning in the SoftLayer cloud
- Highly available, scalable, agile, robust, fully managed cloud DW service with automated tuning and disaster recovery
What’s next for the in-memory cloud DW market, and for dashDB in particular?
Greater speed, scale, agility, throughput and concurrency are widening the range of real-world use cases for which such a platform is well suited. The in-memory cloud DW will become a premier "single version of truth" node within a hybrid big data architecture, complementing Spark (the primary data scientist modeling workbench), Hadoop (the primary data refinery) and NoSQL (the core platform for Internet of Things and mobile analytics).
And that promise depends on continued scale-out of the in-memory cloud DW. As this week’s dashDB announcement shows, massively parallel processing (MPP) cluster technology is a critical feature for fast, efficient scale-out. This architecture enables the cloud DW’s processing, memory, storage and communications resources to be elastically grown into ever larger multi-server configurations to handle expanding workloads.
Key among these workloads are query processing and data loading, the heart and soul of any DW. Just as important in this new era of big data analytics is the ability of data scientists and developers to use MPP to speed up in-database execution of sophisticated analytic algorithms, such as R-based predictive models.
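To make the MPP idea concrete, here is a minimal, generic sketch of how a shared-nothing MPP engine can parallelize an aggregate query: rows are hash-distributed across nodes on a distribution key, each node computes partial results over only its own rows, and a coordinator merges the small partial summaries. This is an illustration of the general technique under stated assumptions, not dashDB's actual implementation; the function names (`distribute`, `local_aggregate`, `mpp_avg`) and the use of threads to stand in for separate servers are hypothetical choices made for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


def distribute(rows, key, num_nodes):
    """Hypothetical hash partitioning: assign each row to a node by its distribution key."""
    partitions = [[] for _ in range(num_nodes)]
    for row in rows:
        # crc32 is stable across runs, unlike Python's salted built-in str hash
        node = crc32(str(row[key]).encode()) % num_nodes
        partitions[node].append(row)
    return partitions


def local_aggregate(partition, group_col, value_col):
    """Each node computes partial (sum, count) per group over only its own rows."""
    partial = defaultdict(lambda: [0, 0])
    for row in partition:
        acc = partial[row[group_col]]
        acc[0] += row[value_col]
        acc[1] += 1
    return dict(partial)


def mpp_avg(rows, group_col, value_col, dist_key, num_nodes=4):
    """Sketch of SELECT group_col, AVG(value_col) ... GROUP BY group_col on an MPP cluster."""
    partitions = distribute(rows, dist_key, num_nodes)
    # Threads stand in for separate servers; a real MPP cluster runs each step on its own node
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(
            pool.map(lambda p: local_aggregate(p, group_col, value_col), partitions)
        )
    # Coordinator merge: only compact per-group summaries move between "nodes"
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for group, (total, count) in partial.items():
            merged[group][0] += total
            merged[group][1] += count
    return {group: total / count for group, (total, count) in merged.items()}
```

Because averages are computed from merged sums and counts rather than averaged per node, the result is the same whether the data sits on one node or many; adding nodes simply spreads the scan-and-aggregate work, which is the property that lets MPP capacity grow with the workload.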
Considering that the DW is where much of an enterprise’s core data resides, it wouldn’t be surprising to see data scientists begin to gravitate to platforms such as dashDB for much of their iterative modeling.
Developers can try dashDB now. For further information, check out Sam Lightstone's latest blog post on dashDB, as well as this new post from Michael Kwok: "Make Big Data small with IBM dashDB Enterprise MPP."