Recently, I was in Nice for a three-day gathering of 150 European IBM Big Data specialists. Looking around the room at the opening plenary made me think how fast the world of big data is moving and how quickly our community is growing.
One of the topics that got discussed a lot–in the breaks and in some of the sessions–was the best places for an organization to look for their first big data project[i]. There is a school of thought that you treat the first one like a science project, just to learn about the technologies, but my instinct is to look for business ROI right from the start. Otherwise, you risk creating an aura of immaturity around your big data efforts, a sense that it’s not ready for mainstream, which may make it harder to build momentum later.
But where to look for the quick hit?
According to a recent survey by Ventana Research, the most common data source for a big data technology project is existing corporate data: adding low-cost Hadoop analytics to extend the data warehouse capability. The second major use case category for big data technology right now is ingesting data from a new source into the warehouse to enrich existing warehouse applications.
Interestingly, the most talked about set of use cases–social media analytics–is not the most commonly implemented. There are some good reasons for this–data privacy and identity being just a couple–but also the fact that for many organizations it’s not that obvious how to monetize social data.
It’s easy to see why use cases that exploit existing technologies, existing data or existing skills are front and center in the search for early big-data projects. And so they should be, precisely because they are more familiar ground, and they are more incremental–and as a long-time agilista, that’s music to my ears.
But if you want to make a case for pure social data analytics, there is one simple and potentially high-value way to mine social data–what I call crowdsourcing[ii]. Lots of strategic marketing decisions get taken on the basis of relatively small sample surveys and focus groups. In a world where you couldn’t get the data about wider attitudes to your product, competitors, product category, public tastes, etc., those represented market research best practice. But social media mean you can get a much wider picture very cheaply. I’ve written before about using social media to hone marketing campaigns in-flight. That’s a monitoring and corrective app. You just couldn’t do that kind of thing before. And even though your data points are less reliable (self-selecting samples, ambiguous textual data), the sheer volume of data points outweighs that[iii]. And in some senses self-selecting becomes a strength–you’re getting the opinions of the people who are prepared to put their money where their tweets are (“I am so going to see this movie,” “this movie sucks. you couldn’t pay me to watch it”).
So anywhere you are making decisions on gut feel or small samples of consumer taste, ask yourselves, “Can I find a ready-made source of answers to my sample questions (Twitter, forums, blog posts, visitors to my own website)?” [iv]
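To give a feel for how small a first experiment along these lines can be, here is a rough sketch in Python that tallies posts by whether they signal intent to buy or to stay away. Everything in it is an assumption made for illustration: the cue phrases, the `tally_intent` function and the sample posts are hypothetical, and a real project would pull posts from an actual feed and use a proper text analytics tool rather than substring matching.

```python
from collections import Counter

# Hypothetical cue phrases, chosen only for illustration.
POSITIVE_CUES = ("going to see", "can't wait", "must see")
NEGATIVE_CUES = ("sucks", "couldn't pay me", "waste of")


def tally_intent(posts):
    """Count posts that contain a positive or a negative cue phrase.

    Posts with no cue, or with cues of both kinds, are counted as unclear.
    """
    counts = Counter(positive=0, negative=0, unclear=0)
    for post in posts:
        # Lower-case and normalize curly apostrophes so the cues match.
        text = post.lower().replace("\u2019", "'")
        has_pos = any(cue in text for cue in POSITIVE_CUES)
        has_neg = any(cue in text for cue in NEGATIVE_CUES)
        if has_pos and not has_neg:
            counts["positive"] += 1
        elif has_neg and not has_pos:
            counts["negative"] += 1
        else:
            counts["unclear"] += 1
    return counts


# Made-up posts standing in for a real social media feed.
sample_posts = [
    "I am so going to see this movie",
    "this movie sucks. you couldn't pay me to watch it",
    "Saw a trailer today",
]
print(tally_intent(sample_posts))
```

Substring matching will misread sarcasm and ambiguous wording, which is exactly the "less reliable data points" problem; the bet is that volume makes up for it.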
The funniest moment of my trip to Nice was when I was sitting in a bar in Nice old town around midnight, amidst a group of old Netezza colleagues who had been around since before the IBM acquisition. We were talking about big data, analytics, NoSQL databases–all the usual stuff that gets data geeks excited. And someone at the next table complained about us. Of course, I immediately apologized, as we were bang out of order, but it was the first time I’ve been thrown out of a bar for talking shop. We found another bar and managed to talk about less arcane topics, including the relative merits of Oregon-style IPA and traditional English Bitter, and The Wordy Shipmates–a less than conventional history of the Mayflower settlers.
[iii] Right now I think it is a case of pragmatic evidence, rather than mathematical evidence, that the larger, but individually less reliable, social media data samples are more valid than small, more rigorously selected samples. I’m prepared to be proved wrong if evidence emerges that self-selecting social-media surveying is necessarily biased, but right now I’m going with pragmatism because there is plenty of evidence that it has value. Clearly you still have to design your survey with some thought.
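For anyone who wants to put rough numbers on the trade-off in this footnote, here is a back-of-envelope sketch in Python. It is not evidence for the claim above. It assumes a simple binomial model in which each respondent independently answers "yes" with a fixed probability, and the function name `rmse_of_share`, the sample sizes and the bias figures are all made up for illustration.

```python
import math


def rmse_of_share(true_share, sample_size, bias=0.0):
    """Root-mean-square error of a sample proportion used to estimate true_share.

    Assumes a simple binomial model: each respondent independently says "yes"
    with probability true_share + bias. The model and the bias figures are
    illustrative assumptions, not measurements.
    """
    expected = true_share + bias
    variance = expected * (1.0 - expected) / sample_size
    # Squared error splits into squared bias plus sampling variance.
    return math.sqrt(bias ** 2 + variance)


# Small, carefully selected survey: no bias, but only 400 respondents.
small_survey = rmse_of_share(0.5, sample_size=400)

# Large self-selecting samples: 100,000 posts, skewed by 1 or 5 points.
social_mild = rmse_of_share(0.5, sample_size=100_000, bias=0.01)
social_heavy = rmse_of_share(0.5, sample_size=100_000, bias=0.05)

print(f"{small_survey:.4f} {social_mild:.4f} {social_heavy:.4f}")
```

Under these made-up numbers the large sample beats the small survey when its skew is one percentage point and loses when the skew is five. In other words, volume shrinks the sampling error but does nothing about bias, which is why the survey still needs designing with some thought.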
[iv] There’s a UK smoothie vendor that founded its business on crowdsourcing. They tried out their fresh fruit smoothies at a music festival in 1999. Here’s their story. That was crowdsourcing with empty cups, not social media commentary, but the principle abides, and anyway, it's a nice story. The smoothies aren’t bad either.