
Fogs, logs and cogs: The newer, bigger shape of big data in the Internet of Things

Big Data Evangelist, IBM

Big data is becoming the next best thing to true magic. It is everywhere and, increasingly, nowhere specific. Every node in the known computing universe is becoming a component in a vast, distributed, pervasive big data cloud.

As we transition to a world where clouds penetrate every facet of our lives, we need to wrap our heads around the thought that every edge node, no matter how resource-constrained, can be interconnected, intelligent and integral to the performance of the whole.

What I’m sketching out is the vision of a world in which the Internet of Things (IoT) increasingly drives the evolution of cloud computing architectures. In an IoT-centric world, nobody needs to know that your cloud’s processing, storage and other functions have been virtualized to endpoints of every size, configuration and capability. As I noted in this post on big data's optimal deployment model, the case for radically distributed clouds rests on the performance boosts and bandwidth savings that accrue from eliminating round-trips to central processing facilities.

As the IoT cloud evolves in this direction, so will big data. As this trend intensifies, any device that produces, consumes, analyzes and otherwise processes data will become a full-fledged big data node. Sure, many of those nodes, such as the sensors in your firm’s facilities monitoring system, may not be of “3 Vs” magnitude. But it would be as arbitrary to exclude them from your scoping of the overall big data utility as it would be to ignore the individual racks within your server farm. Virtualized into a unified big data fabric, they form a distributed utility for closed-loop IoT process monitoring and optimization.

This is the vision of "fog computing." As Ahmed Banafa explains, fogs are clouds in which the primary processing nodes are network-edge endpoints, many of which, increasingly, are the sensor-laden IoT devices known as "things." Fogs distribute the storage, bandwidth and other cloud resources out to the IoT endpoints, most of which are embedded deeply in the hardware infrastructure of the end applications. And fogs leverage distributed-caching infrastructure to facilitate management of an arbitrarily huge number of IoT endpoints. Within the fog, application and middleware functions are performed by a wide range of endpoints in conjunction with distributed gateway, proxy and intermediary processing nodes.
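To make that round-trip savings concrete, here is a minimal Python sketch of the edge-caching pattern just described. The node, gateway and cache names are my own illustrative inventions, not any particular fog platform's API: a gateway answers reads from a small local cache when it can and goes upstream to the central store only when it must.

```python
import time

class EdgeCache:
    """A small time-bounded cache held on a fog endpoint or gateway."""
    def __init__(self, ttl_seconds=30):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, timestamp)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stamp = entry
        if time.time() - stamp > self.ttl:
            del self._store[key]  # expired; force a refresh from upstream
            return None
        return value

    def put(self, key, value):
        self._store[key] = (value, time.time())

class FogGateway:
    """Intermediary node: answers locally when it can, goes upstream when it must."""
    def __init__(self, central_store):
        self.cache = EdgeCache(ttl_seconds=30)
        self.central = central_store  # stand-in for a remote cloud service

    def read(self, sensor_id):
        cached = self.cache.get(sensor_id)
        if cached is not None:
            return cached                # served at the edge: no round-trip
        value = self.central[sensor_id]  # cache miss: one trip to the core
        self.cache.put(sensor_id, value)
        return value

# Toy usage: the second read is satisfied entirely at the edge.
central = {"thermostat-17": 21.5}
gateway = FogGateway(central)
print(gateway.read("thermostat-17"))  # fetched from the central store
print(gateway.read("thermostat-17"))  # served from the gateway's cache
```

The design choice is the one the fog model turns on: every read answered from the gateway's cache is bandwidth and latency reclaimed from the round-trip to the central facility.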

How will big data evolve in the era of IoT-centric fog computing? At a high level, big data analytics clouds will increasingly emphasize high-volume log analysis and rely heavily on cognitive computing algorithms to make sense of it all. Here now is my overview of the emerging, bigger big data fabric of fogs, logs and cogs:

Fogs

IT professionals will achieve operational economies of scale by deploying storage-, compute- and memory-intensive IoT analytics functions (that is, data collection, integration, aggregation, modeling and distribution) to the IoT cloud nodes where they are most cost-effectively handled. Some functions will continue to benefit from the governance, standardization and scale economies associated with centralization. Others will be decentralized to the edges to achieve bandwidth savings, low-latency streaming performance, local optimization and other key imperatives. If, for example, you’re overseeing global optimization of a vast energy grid, it’s best to have diverse metrics feeding a centralized machine learning model deployed on a massively parallel big data cluster. By contrast, functions that mostly leverage a narrow scope of local variables are best executed at edge nodes. In this same hypothetical scenario, the energy grid’s thousands of distributed sensors and actuators should each have the necessary analytic intelligence and in-memory capacity to respond rapidly to whatever local environmental conditions don’t need global coordination. Indeed, where IoT involves edge nodes that manage complex endpoints (for example, home automation systems for residences, vehicular automation systems for cars and HVAC systems for offices), it’s best for the “center” to deal with locally sourced variables only when those variables drive analytics with community-wide ramifications (like energy grid optimization and traffic optimization).
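As a rough illustration of that edge-versus-center split, the Python sketch below (every name in it is hypothetical) keeps the latency-critical, locally scoped decision on the endpoint and batches up only the metrics that matter for grid-wide optimization.

```python
from collections import deque

LOCAL_LIMIT = 80.0  # e.g., a line-load threshold the endpoint can act on alone

class GridSensorNode:
    """An edge node that acts locally and batches metrics for the center."""
    def __init__(self, node_id):
        self.node_id = node_id
        self.outbox = deque()  # metrics awaiting upload to the central model

    def on_reading(self, load_pct):
        # Locally scoped and latency-critical: actuate immediately at the edge.
        if load_pct > LOCAL_LIMIT:
            self.shed_load()
        # Community-wide relevance: queue for the central machine learning
        # model, which sees metrics from thousands of nodes.
        self.outbox.append({"node": self.node_id, "load_pct": load_pct})

    def shed_load(self):
        print(f"[{self.node_id}] local actuation: shedding load now")

    def flush_to_center(self, central_model):
        # Periodic, bandwidth-friendly upload instead of per-reading round-trips.
        while self.outbox:
            central_model.ingest(self.outbox.popleft())

class CentralModel:
    """Stand-in for a machine learning model on a massively parallel cluster."""
    def ingest(self, metric):
        print(f"[center] ingesting {metric} for global optimization")

node = GridSensorNode("substation-42")
node.on_reading(91.0)   # breaches the local threshold: the edge acts instantly
node.on_reading(55.0)
node.flush_to_center(CentralModel())
```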

Logs

The log will be the common denominator data storage and integration abstraction for IoT. Log data of all sorts (web logs, application logs, database logs, system logs and more) is fundamental to the promise of IoT. As I discussed in this IoT post on logs as a fundamental storage abstraction, IoT can't fulfill its core role as the real-time event notification bus of the online world without continuous logging of relevant events. Machine-readable event logging is fundamental to all the core applications of IoT, including real-time sensor grids, remote telemetry, self-healing network computing, medical monitoring, traffic management, emergency response and security incident and event monitoring. Ubiquitous IoT will depend on the ability to support continuous real-time ingest, analysis, correlation, handling and any-to-any routing of machine-generated information. IoT's development depends on the implementation of a ubiquitous, general-purpose event-logging infrastructure. This global logging infrastructure must be able to manage data objects of both relational and non-relational types, process advanced analytics against all of this data, support mixed latencies of batch and streaming data and ensure the linear scalability needed to support massive volumes of in-flight log data. Individual event logs need not be petascale; in fact, most IoT endpoints will support local logs within the increasingly tight storage constraints associated with their disparate form factors. To the extent that we intend IoT to evolve into a utility fabric for big data applications, we must grapple with the central role of distributed logs as well as with the protocols that support distributed log-data consistency, replication and concurrency.
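A bare-bones version of that log abstraction might look like the following Python sketch. The class and field names are illustrative inventions, but the ideas they demonstrate (append-only writes, monotonically increasing offsets, replayable reads by independent consumers) are the essentials of any event log.

```python
import json
import threading
import time

class EventLog:
    """A minimal append-only event log with offset-based replay."""
    def __init__(self):
        self._entries = []
        self._lock = threading.Lock()  # appends must serialize for consistency

    def append(self, source, event_type, payload):
        """Append one machine-readable event; returns its offset."""
        record = {
            "ts": time.time(),   # temporal context for later correlation
            "source": source,    # which endpoint emitted the event
            "type": event_type,
            "payload": payload,
        }
        with self._lock:
            self._entries.append(json.dumps(record))
            return len(self._entries) - 1

    def read_from(self, offset):
        """Replay events from an offset; each consumer tracks its own position."""
        return [json.loads(e) for e in self._entries[offset:]]

log = EventLog()
log.append("door-sensor-3", "opened", {"building": "HQ"})
log.append("hvac-7", "temp", {"celsius": 22.8})
for event in log.read_from(0):  # a consumer replaying the full stream
    print(event["source"], event["type"], event["payload"])
```

In a distributed fog, of course, the hard part is exactly what this single-process sketch elides: keeping replicas of such a log consistent across many endpoints, which is why the paragraph above ends on consistency, replication and concurrency protocols.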

Cogs

Cognitive computing is the new fabric of advanced analytics in the big data world, and it will be the indispensable brain of IoT fogs. The heart of cognitive computing consists of what I like to call “cogs”: machine learning, deep learning, graph analysis, stream computing and other statistical models that bring artificial intelligence to life. Cogs are fundamental to the notion of an autonomic planet that is a self-healing intelligent ecosystem. Within that global infrastructure, the IoT underpins what I’ve called "sensory computing," but, as the enabler for “volitional computing,” it’s also the muscle and sinew of this new fabric. This refers to the fact that many IoT endpoints combine sensors with actuators (features that take actions based on sensor readings and other inputs). These features enable those endpoints to drive the automated processes that translate cognition, affect and sensory impressions into willed, purposive and effective next best actions. In order for IoT to intelligently drive next best actions in mobile and other applications, cognitive computing fabrics need to consider the full geospatial and temporal context for all events. In this new cognitive era, mobile devices that collect sensor readings are also feeding real-time information, context and guidance to end users. As mobility becomes the default mode of every aspect of our lives, IoT will become an organic extension of our biological organs of sensation and locomotion.
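To show how a "cog" closes the loop from sensation to action, here is a deliberately simple Python sketch of the sense-decide-act cycle on a combined sensor/actuator endpoint. Every name in it is hypothetical, and a real deployment would swap the hand-written threshold rule for a trained machine learning model.

```python
import time

def sense(sensor_id):
    """Stand-in for reading a sensor; a real endpoint would poll hardware."""
    return {"sensor": sensor_id, "ts": time.time(),
            "lat": 40.71, "lon": -74.01,   # geospatial context for the cog
            "occupancy": 12, "temp_c": 27.4}

def decide(reading):
    """The 'cog': score the next best action from the reading plus context.
    A trained model would replace this hand-written rule."""
    if reading["occupancy"] > 0 and reading["temp_c"] > 26.0:
        return "cool"
    if reading["occupancy"] == 0:
        return "standby"   # nobody present: save energy
    return "hold"

def actuate(action):
    """Stand-in for the actuator side of a sensor/actuator endpoint."""
    print(f"actuator -> {action}")

# One pass through the sense-decide-act loop; a real node runs it continuously.
actuate(decide(sense("office-zone-9")))
```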

This is more than a vision. The IoT big data fog is rapidly becoming a reality. Joshua Whitney Allen does a good job of discussing the status of fog computing efforts in industry by providing an overview of IBM’s efforts in this area. He specifically cites the partnership with Nokia to develop the world’s first mobile edge computing platform that can run applications directly within a mobile base station. The IBM and Nokia fog platform accelerates delivery of media-rich, low-latency services to smartphones by ensuring that content is transmitted from base stations rather than a remote media center. Allen also alludes to potential applications of big data fog computing to mobile gaming, augmented reality, smarter traffic and public safety.

With the coming of IoT fog computing, big data will become an order of magnitude bigger than ever. Aggregate data volumes across the IoT fog will push into the exabytes, real-time data streaming velocities will connect every point to every other point and the variety of data types and formats will expand to unimagined levels of heterogeneity. More than that, advanced analytics will evolve into distributed fogs of deep-learning cogs operating on event logs that span seemingly infinite things.

If that latter sentence sounded like a page out of Dr. Seuss, you’re not imagining it. As the IoT embeds itself into the very fabric of our world, big data analytics will infuse it with applications that seem as magical as anything the cat might have pulled from his hat.