Are you getting the most value out of Hadoop?
Don’t underestimate the power of 1
As a big data evangelist, I often work with clients to help them set their Apache Hadoop strategy. When they do begin adopting Hadoop, I recommend the following steps:
- Create an environment, weighing cloud and on-premises options as well as cluster sizing.
- Define the data to be loaded—usually unstructured data without a place in the traditional enterprise data warehouse.
- Organize your data, creating directories and partitions and so forth.
- Load the data using ingestion and ETL (extract, transform and load) tools such as Apache Sqoop and Apache Flume.
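For example, a single relational table can be pulled into HDFS with a Sqoop import. The sketch below assembles such a command in Python; the JDBC URL, table name and target directory are placeholder values for illustration, not part of any real deployment:

```python
import subprocess

# Placeholder connection details -- substitute your own JDBC URL,
# credentials and table name before running on a cluster edge node.
JDBC_URL = "jdbc:mysql://dbhost:3306/sales"
TABLE = "orders"
TARGET_DIR = "/data/raw/orders"

def build_sqoop_import(jdbc_url, table, target_dir, num_mappers=4):
    """Assemble a `sqoop import` command that copies one RDBMS table into HDFS."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(num_mappers),
    ]

cmd = build_sqoop_import(JDBC_URL, TABLE, TARGET_DIR)
# On an edge node with Sqoop installed, you would launch the import with:
# subprocess.run(cmd, check=True)
print(" ".join(cmd))
```

Flume plays the complementary role for streaming sources, continuously delivering log and event data into the same HDFS directories.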
Certainly the foregoing are all important early steps, but even so, I hear the same question again and again: How do I get more value out of my investment? When a client asks me this question, we begin talking about who will be accessing Hadoop. Data scientists? Application developers? Analysts? But that’s only the beginning.
Empower your entire organization
Many companies approach Hadoop adoption by creating new teams. Although doing so can be helpful, because traditional users can struggle to learn a new technology, it is also an expensive undertaking. Certainly Apache Hadoop is a disruptive technology: it allows you to store and analyze data before committing it to your enterprise data dictionary, and it lets you apply a schema on read rather than on write. But how can you use Hadoop to bring value to your company as a whole?
I’ve experienced great success using what I call the power of 1. It’s simple, just like the 1s and 0s that are the building blocks of computing. Indeed, I like to tell organizations that they can remain agile and compete through the business equivalent of changing a 0 to a 1: using Apache Hadoop to turn on the power of open-source technology.
And the need has never been greater: startups are disrupting large organizations and industries at an alarming rate using innovative free and open-source technologies, Hadoop among them. Accordingly, larger companies must leverage Hadoop as well, and that means much more than just handing it to your data scientists without a second look. Rather, companies must bring the power of Hadoop to bear across the entire organization.
Even so, Hadoop is merely the means through which an organization can bring to bear the power of 1. What, then, is the power of 1? It is as simple as picking one use case of suitable size for one developer, or one small Scrum team, to take on.
To start, choose a simple use case: one significant enough to have a noticeable effect but manageable enough for a small development team to address. In many organizations, operational intelligence fills this role admirably. Many operations teams deal with data scattered across multiple files, and the ability to coalesce that data into a consolidated view can add immediate value. What’s more, this can be accomplished simply by bringing raw data files into the Hadoop Distributed File System (HDFS), then coalescing and transforming the data into document structures for storage in HBase. As a finishing touch, use a data visualization library such as D3.js to build a simple HTML5 web tool as a capstone for the project.
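The coalescing step can be sketched in plain Python. The example below is hypothetical: it merges per-host CSV fragments into one document per host, shaped like the row-key and column-family layout an HBase table might use (the hostnames, the `metrics:` column family and the file contents are invented for illustration):

```python
import csv
import io
from collections import defaultdict

def coalesce(files):
    """Merge (hostname, csv_text) pairs into per-host documents.

    Each document maps a row key (the hostname) to a dict of
    'metrics:<name>' columns, mirroring an HBase column family.
    """
    docs = defaultdict(dict)
    for host, text in files:
        for row in csv.DictReader(io.StringIO(text)):
            docs[host]["metrics:" + row["metric"]] = row["value"]
    return dict(docs)

# Scattered operational data: two fragments for web01, one for web02.
raw = [
    ("web01", "metric,value\ncpu,0.72\nmem,0.41\n"),
    ("web01", "metric,value\ndisk,0.15\n"),
    ("web02", "metric,value\ncpu,0.38\n"),
]
documents = coalesce(raw)
# documents["web01"] now holds all three of that host's metrics in one place.
# Each document can then be written to HBase, for example with the
# happybase client: table.put(row_key, doc)
```

A D3.js front end can then read these consolidated documents (for example, through a thin REST layer) to render the operations team’s dashboard.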
Such a project is simple yet effective, and it can demonstrate the power and value of Apache Hadoop to an entire organization. Gone are the days of categorizing data and loading it into your enterprise data warehouse, assuming it would fit at all. No longer must you create business intelligence dashboards before sharing information with users. Such solutions are unjustifiably expensive for operational intelligence projects, and even if they weren’t, the delivery time of traditional systems might render the problem obsolete before you reach a solution.
Using Apache Hadoop, however, you can simplify the process by which you analyze raw data and deliver insights to your company. Learn more about what Hadoop can do for your organization.