Spark: The bridge across the chasm

Vice President of Product, Platfora

Certainly Hadoop has achieved significant momentum through adoption by mainstream IT buyers. But has it “crossed the chasm”? To answer that question, we must understand both what is meant by the “chasm” and what happens in the businesses that are driving adoption of Hadoop.

What is the chasm?

Geoffrey Moore introduced the idea of the chasm in 1991 in his book Crossing the Chasm. This modern business classic has seen frequent revisions and updates since then—Moore’s way of reminding the business community that the idea of the chasm is as relevant as ever.

As Moore explains it, when a new technology arrives, a small group of sophisticated and enthusiastic users adopts it immediately. These Innovators and Early Adopters, as Moore calls them, independently explore the new technology with an eye to figuring out “how it works.”

If all goes well for the technology, then these first adopters are followed by a much larger group—the Early Majority. Members of the Early Majority lack the enthusiasm—and, by and large, the technical know-how—of the Innovators and the Early Adopters. They are less interested in experimenting than they are in solving particular problems. Rather than seeking a sandbox, they want something that works straight from the box.

The Early Majority is a bellwether for a new technology. If such a class of users emerges, then the new technology is heading for success. But if no Early Majority materializes, then the technology has begun failing before it really got started. And Moore calls this gap, the space between the Early Adopters and the Early Majority, the “chasm.”

Standing at the edge

Moore’s description of the early stages of technology adoption finds strong parallels in the big data space. For example, Hadoop has benefited from the experimentation of Innovators and Early Adopters hailing from a growing number of organizations. What’s more, some of these organizations have gone far beyond experimentation to implement fully functioning production big data environments. However, if such environments are to become the norm, big data solutions must evolve to keep pace with the demands of mainstream businesses.

In far too many businesses, however, access to big data—let alone analysis of it—eludes the get-the-job-done users who should be forming the Early Majority. After all, analysis is still primarily the domain of data scientists and other power users, so even when a broad group of users within an organization gains access to big data analysis, its members do not encounter big data analytics in the iterative, dynamic form to which they are accustomed. Worse still, bringing new users on board creates administrative and data preparation burdens for the power users, hampering their efforts to continue leveraging the big data environment.

Bridging the chasm

To cross the chasm, businesses need big data discovery technology, which empowers users to leverage big data, expanding what they can do with it and boosting the business value they can derive from it. Big data discovery requires an underlying distributed processing framework that supports a variety of data and analytical processes: data preparation, descriptive analysis, search, predictive analysis, and advanced capabilities such as machine learning and graph processing. Moreover, businesses need a toolset that lets them take advantage of their employees’ existing skill sets.

Until the advent of Apache Spark, no single processing framework met all of those criteria. Spark is rapidly changing how businesses rely on Hadoop to “do” big data. Using Spark, organizations can enhance, accelerate, and automate data preparation, freeing power users from its drudgery so they can work on sophisticated problems, while also placing compelling out-of-the-box advanced analytics capabilities into those same users’ hands. What’s more, Spark lowers the bar for technical proficiency: expertise in MapReduce and Java is no longer required, only a basic understanding of databases and scripting. Accordingly, businesses can draw from a broader talent pool than ever before when implementing and managing their analytics environments.

And that’s just the beginning: Spark also opens up options for SQL access to Hadoop data while eliminating concerns about which Hadoop distribution a business uses. In short, Spark not only provides the kinds of finished product solutions that Early Majority adopters are looking for, but also empowers them to build finished product solutions themselves.

Safely across

Crossing the chasm means putting big data to work across the organization, tapping its potential to add value to, or even completely transform, the business. Companies that use Spark to power their big data discovery can make Hadoop an integral part of how they do business. To begin your own journey, learn more about Spark-enabled big data discovery and how Platfora is making it a reality.