Taking a more refined approach to big data

Director, Product Marketing, InfoSphere, IBM

At the end of September, I gave the keynote at the Boston Big Data Innovations Summit. I decided to speak about an emerging topic: data refinement. I think it is one of the most important developments in the big data market. The idea is simple: you want to take advantage of all sources of big data, but each individual user needs only the information relevant to them. They can’t spend time looking through all available data to get the answers they need. In fact, analysts today spend 80 percent of their time searching for data. That’s criminal. And it might get worse as our appetite for big data grows and grows.

What’s needed is a data refinery. It ingests raw big data, refines it for a specific usage and provides the refined data to various users in the enterprise. Along the way, it automatically cleans, matches, secures and profiles the data; that’s what is meant by refinement.

After the presentation, I received a lot of questions and had a number of discussions about data lakes. It seemed that the people who had started to build a data lake were the most interested in the idea of a data refinery. Each of them explained how data lakes were great places to land data and understand its value and purpose. But each of them also explained that only experts, such as data scientists, could use the data lake. There is simply too much information in there for business analysts and users to wade through. And frankly, a lot of the data is bad, or wrong. That’s the point of a lake: it stores everything, good and bad. They’d had too many experiences with people grabbing the bad data, and that’s why a data refinery intrigued them.
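
To make the four refinement steps (clean, match, secure, profile) a little more concrete, here is a minimal Python sketch. It is only an illustration, not how any particular product works: the sample records, the field names and the functions `clean`, `match`, `secure`, `profile` and `refine` are all hypothetical, and a real refinery would apply far richer matching and security rules.

```python
from collections import Counter

# Hypothetical raw records, as they might land in a data lake (good and bad).
RAW = [
    {"name": " Ann Lee ", "email": "ANN@EXAMPLE.COM", "ssn": "123-45-6789"},
    {"name": "Ann Lee", "email": "ann@example.com", "ssn": "123-45-6789"},
    {"name": "Bob Roy", "email": "", "ssn": "987-65-4321"},
]

def clean(records):
    """Trim whitespace, lowercase emails, and drop records with no email.

    Dropping records without an email is an assumption made for this sketch.
    """
    cleaned = []
    for record in records:
        record = {key: value.strip() for key, value in record.items()}
        record["email"] = record["email"].lower()
        if record["email"]:
            cleaned.append(record)
    return cleaned

def match(records):
    """Collapse records that share an email (a deliberately naive match key)."""
    by_email = {}
    for record in records:
        # Keep the first record seen for each email address.
        by_email.setdefault(record["email"], record)
    return list(by_email.values())

def secure(records):
    """Mask all but the last four characters of the sensitive field."""
    return [{**record, "ssn": "***-**-" + record["ssn"][-4:]} for record in records]

def profile(records):
    """Count how many records carry a non-empty value for each field."""
    counts = Counter()
    for record in records:
        for key, value in record.items():
            if value:
                counts[key] += 1
    return dict(counts)

def refine(records):
    """Run the steps in order and return the refined data plus its profile."""
    refined = secure(match(clean(records)))
    return refined, profile(refined)

refined, stats = refine(RAW)
print(refined)
print(stats)
```

In this toy version, a business analyst would only ever see the output of `refine`, never the raw records, which is the point the data lake builders kept coming back to.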

Could you use a data refinery in your company? What data issues provide the biggest challenge to your business analysts? Do your application developers want to utilize data services to rapidly build data-centric applications?

Watch my keynote presentation from the Boston Big Data Innovations Summit and share your thoughts with me on data refineries. Let’s connect on Twitter and use the hashtag #makedatawork to continue the conversation started in Boston.

It’s about time we all Make Data Work in our organizations!