Not Big Data, but Broad Data
“Big data” is an area of intense interest in the field of IT change right now. CIOs are being told that it is something they need to address, and many big data solutions are being bought and sold. Cynics may feel that there is a lot of hype around big data, but many people clearly believe there is significant value to be gained from it.
But what is “big data” and where is the value? Are we looking in the right place to find it and extracting value in the right way? Perhaps we’ve limited ourselves by misunderstanding big data.
The idea of “big data” has entered the mainstream, and we are fortunate enough to have a definition in the Oxford English Dictionary:
big data noun [mass noun] Computing:
data sets that are too large and complex to manipulate or interrogate with standard methods or tools
So big data is large and complex.
Gartner’s Doug Laney is more specific, introducing the now-widespread idea of the 3Vs: data characterised by high Volume, high Velocity and high Variety.
This re-emphasises the idea of genuinely “big” data. These definitions point towards the sorts of huge, fast-moving data sets that large retail or telecommunications organisations hold, with perhaps billions of transactions or calls. We can readily appreciate how this data can yield value through data mining, giving new insights into customer behaviour, perhaps telling us when people make calls, or to whom. This idea of big data isn’t applicable to the customer list of your average corner shop, and arguably excludes most businesses.
Genuinely big data can certainly be of value, but perhaps we should look more broadly at the promise of big data and see whether there are alternative or extended definitions that are more helpful.
This series of posts asserts that “big data”, by these definitions, is not what we should be getting excited about. In fact, most of the real value-adding activity is not in this area at all, but in “broad data”.
What “big data” usually means: “broad”
By “broad data” we mean the enrichment of existing data by connecting it to additional, new data sets. Let's consider an example. Say a small enterprise has a customer list of 10,000 individuals, and it's thinking of introducing a new product. Let's imagine it commissions market research that tells it customers' propensity to buy the new product depending on their age. The result is a table of no more than 100 rows, containing age and propensity to buy. By deriving each customer's age from the date of birth held in the customer data and joining on age with the propensity table, the business can clearly see who's likely to buy, and can build a targeted marketing campaign and a credible sales forecast.
Broad data is the addition of new, often external, data to our existing data to give new information and value. This is a key value-adding technique in the loosely defined area of “big data”. Let's see how it stacks up against the 3Vs:
• Volume - neither of these data sets (customers or propensities) is particularly large. Either of them could have been smaller or larger and the technique would still give new value.
• Velocity - these data sets might well be quite static, changing very slowly over time.
• Variety - we are bringing in data from a different source, of a different type. This is the one V that matters.
So the 3Vs are of limited use for defining and directing this sort of work. We need different guidelines. Here, we propose the 4Os of Broad Data, to be taken as prompts:
• Original - the new data we're bringing in has not been used by us in this way before
• Obscure - we have not previously been aware of its value in this context
• Overlapping - there is a way of joining or combining the new data with our existing data (like age in our example above)
• Augmenting - the combined data adds new value for our business
Admittedly, we have to employ phonetic equivalence to include “Augmenting” in the 4Os, but it’s an important inclusion. Taken together, these give us a set of terms that help us find new value.
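To make the customer example concrete, here is a minimal Python sketch of the join, assuming the customer list holds a name and a date of birth and the market research arrives as a simple age-to-propensity lookup. The names `Customer`, `age_on`, `enrich`, `targeted_list` and `expected_buyers`, and all the data values, are hypothetical. Age, derived from date of birth, plays the part of the Overlapping key; the propensity attached to each customer is the Augmenting value. Treating propensity as a probability of purchase, so that the forecast is just the sum of propensities, is also an assumption.

```python
# Hypothetical names and data, for illustration only.
from dataclasses import dataclass
from datetime import date


@dataclass
class Customer:
    name: str
    date_of_birth: date  # the field our existing customer list already holds


def age_on(date_of_birth: date, today: date) -> int:
    """Age in whole years on `today`: this derives the overlapping join key."""
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    return today.year - date_of_birth.year - (0 if had_birthday else 1)


def enrich(customers, propensity_by_age, today):
    """Join each customer to the propensity table on age.

    Returns (name, age, propensity) rows. Customers whose age has no
    row in the table are dropped, as in an inner join.
    """
    rows = []
    for customer in customers:
        age = age_on(customer.date_of_birth, today)
        propensity = propensity_by_age.get(age)
        if propensity is not None:
            rows.append((customer.name, age, propensity))
    return rows


def targeted_list(rows):
    """Customers ordered from most to least likely to buy."""
    return sorted(rows, key=lambda row: row[2], reverse=True)


def expected_buyers(rows):
    """Assumes propensity is a purchase probability: expected sales = sum."""
    return sum(propensity for _, _, propensity in rows)


if __name__ == "__main__":
    # Hypothetical sample data, not real market research.
    today = date(2024, 6, 1)
    customers = [
        Customer("A. Jones", date(1990, 3, 15)),
        Customer("B. Smith", date(1960, 12, 1)),
        Customer("C. Patel", date(2001, 7, 20)),
    ]
    propensity_by_age = {22: 0.30, 34: 0.15, 63: 0.05}

    rows = enrich(customers, propensity_by_age, today)
    for name, age, propensity in targeted_list(rows):
        print(f"{name:10} age {age:3}  propensity {propensity:.2f}")
    print(f"Expected buyers: {expected_buyers(rows):.2f}")
```

With this sample data the targeted list puts C. Patel (age 22) first. The sketch uses an inner join, so customers whose age has no row in the research table simply drop out; a real campaign might prefer to keep them with a default propensity instead.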
Is Broad Data a new thing?
Let's admit straight away that this is not really new. We've been finding and joining data to realise value for a long time. It's hardly rocket science (though it may be data science). However, the era of big data seems to be upon us, and the renewed interest in exploring new ways of increasing the value of data must be a good thing. Our definition of broad data makes this relevant to a wider audience and opens up many more use cases: it is not only the largest data users who can benefit. What is subtly different from the past is the greater emphasis on original and obscure data, often external to the enterprise, to augment existing data. Perhaps it's a call to think creatively about new places to look for data, and about the new business partnerships that might support this.
The next post will look at how to use broad data, and where to find it.