Visibility, Adaptability and Efficiency: Making Big Data Work
IBM Fellow and Distinguished Engineer Nagui Halim recently spoke on big data at Brocade’s Technology Day. He shared some fascinating insights available through big data analysis in a variety of contexts: medical, financial trading, traffic systems and social media, to name a few.
One of the things that interested me about his talk was his refusal to think of “big data” in terms of “data.” As he later described in an interview with John Furrier of Silicon Angle, it’s all about building the right conceptual model of the system to be investigated. This model is what defines “normal,” what allows signals to be distinguished from the noise. It’s a view that personalizes and gives context to the typically horizontal activity of data mining.
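To make the idea of a model that defines “normal” concrete, here is a minimal sketch in Python. It is not Halim’s method or anything IBM has described. It assumes the simplest possible model, a rolling baseline of recent readings, and the function name, window size, threshold and sample heart-rate values are all invented for illustration.

```python
from statistics import mean, stdev

def find_signals(readings, window=5, threshold=3.0):
    """Return the indices of readings that deviate sharply from the recent baseline.

    The "model" here is only the mean and standard deviation of the previous
    `window` readings; both parameters are illustrative assumptions.
    """
    signals = []
    for i in range(window, len(readings)):
        baseline = readings[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # A perfectly flat baseline: treat any change at all as a signal.
            if readings[i] != mu:
                signals.append(i)
            continue
        # Distance from "normal", measured in standard deviations.
        if abs(readings[i] - mu) / sigma > threshold:
            signals.append(i)
    return signals

# Made-up heart-rate samples with one obvious spike.
heart_rate = [72, 74, 71, 73, 72, 74, 73, 118, 72, 73]
print(find_signals(heart_rate))  # prints [7]
```

The same stream with a different model of “normal” (say, a patient who is exercising) would flag different readings, which is the point: the model, not the raw data, decides what counts as a signal.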
Halim also spoke about the two sides of big data analysis: real-time analysis, which gives immediate feedback on changes in a dynamic system, and secondary analysis of a large set of disparate, static data. These two activities, of course, make very different demands on the infrastructure that supports them.
In the first case, visibility and elasticity are key characteristics. Visibility is needed at the application layer in order to identify emerging trends within dynamic datastreams, but the underlying infrastructure can act as a sensor as well. As Halim observed while talking with Silicon Angle, sensors need to be everywhere in a system, and the data flow needs to be bidirectional. For example, a network device or management console with a system-wide view can be given specific guidance to focus on the parts of the network or datastream you’re most interested in. Elasticity goes hand in hand with this, in that it allows the infrastructure to grow and shift naturally in response to changing application demand, rather than being hamstrung by a rigid, hardwired architecture.
In the second case, once data has been collected and stored, the key to returning query results rapidly lies in the infrastructure, but processing speed is not the only factor. Data no longer resides in monolithic applications housed in a single location; most often it sits in small segments scattered across multiple VMs or servers, so efficient communication between those locations and the processing node is critical.
Think of it this way: regardless of whether you have a sports car or a Yugo, there are more and less efficient ways to traverse a hilly city. You can take a highway that loops around it, a few major avenues or expressways, or neighborhood streets. In general, the highway loop should be the fastest route: fewer stops, a flatter, straighter path, more lanes, and so on. But at peak traffic times, surface streets may be faster. You can make choices about the best route at a given time by using tools such as Google Maps or Waze, which deliver near-real-time information about the citywide traffic situation.
The dramatic ebb and flow of big data traffic within a system makes managing these factors critically important, and extremely challenging with traditional static infrastructure. This is why IBM was interested in using Brocade’s Ethernet fabric to help address these challenges within its big data implementations. An Ethernet fabric, such as Brocade’s VCS fabric, creates a “flat” network with many more open routes, allowing traffic to move across the shortest, least congested path.
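The idea of choosing the shortest, least congested path can be sketched as a small Python example. This is an illustration of the general principle only and does not describe how VCS Fabric technology actually selects paths. The topology, the utilization figures and the congestion penalty formula are all hypothetical; the search itself is a standard Dijkstra shortest-path algorithm.

```python
import heapq

def congested_cost(base_cost, utilization):
    """Hypothetical penalty: a link gets more expensive as it fills up."""
    return base_cost / (1.0 - min(utilization, 0.99))

def build_links(topology, utilization):
    """Turn {(a, b): base_cost} into a two-way adjacency map of current costs."""
    links = {}
    for (a, b), base in topology.items():
        cost = congested_cost(base, utilization.get((a, b), 0.0))
        links.setdefault(a, {})[b] = cost
        links.setdefault(b, {})[a] = cost
    return links

def best_path(links, source, target):
    """Dijkstra's algorithm: return (total_cost, path) for the cheapest route."""
    queue = [(0.0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, link_cost in links.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return float("inf"), []  # no route exists

# Invented three-switch topology: a direct A-D link plus a detour through C.
topology = {("A", "D"): 1.0, ("A", "C"): 1.0, ("C", "D"): 1.0}

idle = build_links(topology, {})
busy = build_links(topology, {("A", "D"): 0.9})  # direct link 90% utilized

print(best_path(idle, "A", "D")[1])  # prints ['A', 'D']
print(best_path(busy, "A", "D")[1])  # prints ['A', 'C', 'D']
```

When the direct link is idle it wins, like the highway loop at off-peak hours. Once it is heavily loaded, the two-hop detour becomes cheaper, like surface streets at rush hour.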
Brocade’s VCS Fabric technology, in particular, is highly automated, which means nodes can easily be brought into or taken out of service as needed.
Paralleling Halim’s vision of analysis being driven by the model rather than by the individual data, VCS fabrics can be deployed in any configuration: whatever fits best around the types of traffic the system experiences, rather than data movement being constrained by the needs of the underlying infrastructure. Each node within a VCS fabric also has visibility of the entire fabric, as well as the intelligence to reroute traffic to new, more efficient paths as needed. The VCS fabric nodes all share their knowledge of the system, allowing each node to act as an internal sensor, much like the users of the Waze system, and to provide an additional layer of feedback and meta-intelligence about the datastream to the user. Taken together, these characteristics give the system tremendous flexibility to effectively support big data analysis, both real-time and historical.
Watch the Silicon Angle interview with Halim for additional information on how big data and infrastructure come together.