Big data has its discontents. The backlash is a necessary reality check in an otherwise vibrant arena. Often in this industry, when a technology is in vogue, the hype can interfere with rational decision making, both among users and among solution providers.
Big data tends to focus on extreme scale. Those who call for a refocus on “small data” have an important point: the scale of your data platform should not be your primary focus. If you are a typical business user, the vast majority of your operational business intelligence is on data marts in the low terabytes, or is provided as an integral feature of your line-of-business transactional applications. Petabyte scales are definitely important for many advanced analytic applications, but you shouldn’t feel compelled to roll out platforms of that magnitude until you truly need them.
Many of us reduce big data to a convenient scalability shorthand: the “3 Vs” of volume, velocity and variety. As we big-data professionals hammer these Vs into people’s heads, the misconception grows that we’re trying to oversell them on the need for petabyte volumes, streaming velocities and unstructured varieties.
Increasingly, I prefer to think of big data in the broader context of business agility. What’s most important is that your data platform has the agility to operate cost-effectively at any of the following:
- Scale of business: Business operates at every scale from breathtakingly global to intensely personal. You should be able to acquire a low-volume data platform and modularly scale it out to any storage, processing, memory and I/O capacity you may need in the future. Your platform should elastically scale up and down as requirements oscillate. Your end-to-end infrastructure should also be able to incorporate platforms of diverse scales—petabyte, terabyte, gigabyte, etc.—with those platforms specialized to particular functions and all of them interoperating in a common fabric.
- Speed of business: Business moves at crazy rhythms that oscillate between lightning fast and painfully slow. You should be able to acquire a low-velocity data platform and modularly accelerate it through incorporation of faster software, faster processors, faster disks, faster cache and more DRAM as your need for speed grows. You should be able to integrate your data platform with a stream computing platform for true real-time ingest, processing and delivery. And your platform should also support concurrent processing of diverse latencies, from batch to streaming, within a common fabric.
- Scope of business: Business manages almost every type of human need, interaction and institution. You should be able to acquire a low-variety data platform—perhaps an RDBMS dedicated to marketing—and be able to evolve it as needs emerge into a multifunctional system of record supporting all business functions. Your data platform should have the agility to enable speedy inclusion of a growing variety of data types from diverse sources. It should have the flexibility to handle structured and unstructured data, as well as events, images, video, audio and streaming media with equal agility. It should be able to process the full range of data management, analytics and content management workloads. It should serve the full scope of users, devices and downstream applications.
IBM’s announcements this week point toward this new agile-data order (call it the “3 Ss,” if you will). The incorporation of BLU Acceleration technology—dynamic, in-memory, columnar—into IBM DB2 10.5 and IBM Informix 12.1 shows that transaction-processing systems are moving to speed-of-thought velocities without sacrificing their batch-oriented cores. The announcement of IBM PureData System for Hadoop, an expert integrated system able to modularly scale out into the petabytes, shows that scale, speed and scope of business can be achieved within a common rapid-value platform. And the announcement of performance gains in IBM’s Hadoop (InfoSphere BigInsights 2.1) and stream-computing (InfoSphere Streams 3.1) platforms shows the depth of our ongoing investments in accelerating these core platforms to handle data both at rest and in motion.
Agile data platforms can serve as the common foundation for all of your data requirements. After all, you shouldn’t have to go big, fast or all-embracing in your data platforms until you’re good and ready.
For more information
- Please join us April 5 for a Live Video Chat featuring Jeff Kelly of Wikibon, IBM product specialists and Almaden participants as they discuss what this announcement means for organizations facing a big data future.
- On April 30, IBM will hold a free broadcast event, diving into the announcement in great detail, featuring actual customer stories.
- To find out more about the products announced, please visit our new IBM.com webpages detailing the products.
- Follow this list of big data social influencers and Almaden participants on Twitter to see how the story continues to develop in the marketplace.