Big Data Analytics Will Permeate the Internet of Things

Big Data Evangelist, IBM

Big data is a key infrastructure in the Internet of Things (IoT), but it’s far from the only piece of the fabric. As I have stated previously, the IoT is central to the notion of a Smarter Planet. In the coming global order, every human artifact, every element of the natural world, and even every physical person can conceivably be networked. Everything will be capable of being instrumented, given data-driven intelligence, and interconnected continuously to drive every desired human outcome.

When you consider the Smarter Planet vision, you realize it doesn’t dictate that you absolutely must interconnect every last IoT node out at the edge of your cloud, nor does it require that you instrument each and every node with sensor/actuator logic or provision each with advanced analytics. That scenario will probably be expensive overkill for many IoT applications where many edge nodes are disconnected, have distinct role specializations, and/or perform passive, inflexible or “dumb” functions.

What the Smarter Planet vision points to is the “network effect” of benefits that you might realize by increasing the interconnection, instrumentation and intelligence of a cloud of IoT nodes. Where exactly you deploy the “3 I’s” in an IoT big-data cloud—at the “center” or the “periphery”—is an open issue.

In that regard, I recently came across a good article that discusses the optimal distribution of IoT intelligence. In the article, author Massimiliano Claps cites a study that made the following blanket statement: “Each connected ‘thing’ should be considered a point of data capture, analysis and actionability.” As architectural principles go, that’s a bit of an overreach, and Claps pushes back against it by posing the following critical question:

“Will it be more efficient to have dumb sensors that simply transmit all the data to a central server where a data warehouse can be built and then apply the analytical capabilities there, or will it make sense to put some intelligence at the periphery, in sensors, or nodes?”

He answers his own question with a good discussion of the pros and cons of the “smart vs. dumb endpoint” debate. Intelligence-poor endpoints, he notes, “use less energy and are easier to maintain” and enable you to “consolidate the data centrally to run the analysis.” On the other hand, he observes that putting some intelligence in IoT endpoints can support several key infrastructure optimizations:

  • Enable faster responses at the endpoints,
  • Save bandwidth that might otherwise be chewed up through network-intensive endpoint roundtripping to cloud servers, and
  • Ensure that any data and workloads that can be handled more efficiently at endpoints are kept off the big-data cloud servers.
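
To make the bandwidth and responsiveness points concrete, here is a minimal sketch of what “some intelligence” at an endpoint might look like: the node reduces a batch of raw readings to a compact summary for the central servers and flags the readings it could act on locally. The function name, the threshold value and the sample readings are all hypothetical illustrations, not anything drawn from Claps’s article.

```python
from statistics import mean

# Hypothetical alert threshold; the article does not specify one.
ALERT_THRESHOLD = 30.0


def summarize_at_edge(readings, threshold=ALERT_THRESHOLD):
    """Reduce a batch of local readings to a compact summary plus alerts.

    Returns (summary, alerts). The summary is what the endpoint would send
    upstream instead of the raw batch; the alerts are readings the endpoint
    could respond to immediately, without a round trip to the cloud.
    """
    if not readings:
        return {"count": 0}, []
    alerts = [r for r in readings if r > threshold]
    summary = {
        "count": len(readings),
        "mean": mean(readings),
        "max": max(readings),
    }
    return summary, alerts


# Illustrative batch of sensor readings (made-up values).
batch = [20.0, 21.0, 19.0, 35.0]
summary, alerts = summarize_at_edge(batch)
print(summary, alerts)
```

In this sketch the endpoint ships three numbers upstream rather than the whole batch, which is the kind of trade-off the list above describes; a real deployment would have to decide how much detail the central analytics can afford to lose.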

As a general architectural rule, I’d like to propose a set of core principles for equipping your IoT cloud with big-data smarts. We should distribute data-driven intelligence across IoT cloud environments in keeping with the following three considerations:

  • Operational economies of scale: You should deploy storage-, compute- and memory-intensive IoT analytics functions (e.g., data collection, integration, aggregation, modeling and distribution) to the IoT cloud “center,” where they are most cost-effectively handled. This centralized deployment model describes the optimal role for most of your data warehouse, Hadoop and other big-data platforms in the IoT cloud. From a planning perspective, you should start with the assumption that all IoT analytic functions benefit from the scale economies of centralization, unless one or both of the following criteria—functional scope and real-time speed—make distributed deployment more appropriate.
  • Functional breadth of scope: If IoT analytic functions are triggered by a broad scope of global variables in concert, you should deploy those functions in a centralized fashion. If, for example, you’re overseeing global optimization of a vast energy grid, it’s best to have diverse metrics feeding a centralized machine-learning model deployed on a massively parallel big-data cluster. By contrast, those functions that mostly leverage a narrow scope of local variables are best executed at “edge” nodes. Per this same example, the energy grid’s thousands of distributed sensors and actuators should each have the necessary analytic intelligence to respond rapidly to whatever local environmental conditions don’t need global coordination. Indeed, where IoT involves edge nodes that manage complex endpoints (e.g., home-automation systems for residences, vehicular-automation systems for cars, HVAC systems for offices), it’s best for the “center” to deal only with locally sourced variables relevant to those analytics that have community-wide ramifications (e.g., energy grid optimization, traffic optimization).
  • Real-time interaction speed: Another benefit of distributing locally contextual analytics to cloud “edge” devices is the speed gained by eliminating server roundtripping and the bandwidth it would otherwise consume. In addition, decentralized analytic deployments are well-suited to the real-time, dynamic, streaming interaction patterns often associated with IoT. Where end-to-end low latency is necessary, distributed IoT deployments can benefit from incorporation of caching, stream-computing and in-memory platforms in edge nodes. In a fully built-out IoT cloud of the future, it won’t be unusual to see slimmed-down MapReduce models driving streaming predictive optimization at edge nodes, while more centralized nodes execute other MapReduce models on analytics with varying latency requirements.
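
The three considerations above amount to a simple planning rule: assume centralization, then move a function to the edge if its scope is local or its latency needs are real-time. The sketch below encodes that rule as I read it. The class, field and function names are hypothetical, the example workloads are illustrative, and treating either criterion alone as sufficient is my simplifying assumption; a real design might well split a single function between center and edge.

```python
from dataclasses import dataclass


@dataclass
class AnalyticFunction:
    """Hypothetical description of one IoT analytic function."""
    name: str
    local_scope: bool      # driven mostly by a narrow set of local variables
    needs_real_time: bool  # requires low-latency, streaming interaction


def place(fn: AnalyticFunction) -> str:
    """Return 'center' or 'edge' for a given analytic function.

    Default to the center for operational economies of scale, unless
    functional scope or real-time speed (one or both) favors the edge.
    Assumption: either criterion alone is enough to move the function.
    """
    if fn.local_scope or fn.needs_real_time:
        return "edge"
    return "center"


# Illustrative workloads loosely based on the energy-grid example.
grid_model = AnalyticFunction("grid-wide optimization", False, False)
sensor_rule = AnalyticFunction("local sensor response", True, True)

for fn in (grid_model, sensor_rule):
    print(f"{fn.name}: {place(fn)}")
```

A rule this blunt is only a starting point for planning, but it captures the default-to-center stance and the two exceptions that justify distributing intelligence.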

As the IoT takes shape, the notion of a totally “dumb” endpoint will become antiquated. Before long, it will be difficult to find any consumer, business, industrial or other device that totally lacks embedded, data-driven analytic intelligence. Three forces are driving this trend: the plummeting cost of solid-state storage, the inexorable miniaturization of electronic components, and the embedding of deeper analytic libraries in every device.

No, you don’t absolutely need deep smarts at every IoT endpoint, but the benefits of adding intelligence incrementally at the periphery will become clear for many applications, as outlined above. Before long, the Smarter Planet will be rich with intelligence everywhere you look.

By the end of this decade, every little thing—no matter how specialized, mundane or disposable—will come equipped with its own dedicated stash of big-data smarts.