Recently, I came across an intriguing post on LinkedIn from one of my colleagues that led me to this fascinating article on data in sports. Naturally, I began to ponder the visualization of data and how that affects the world around us.
Over the past few years I have covered a lot of big data ground: from big relational data at Netezza, through to big streaming data at Acunu, with a sojourn in IBM's wide big data portfolio in between. It has been interesting to see the initial huge interest and investment in Hadoop, which still continues, followed now by a comparable interest and sharply rising investment in NoSQL databases. In fact, there are over 100 so-called NoSQL databases, and they come in all sorts of architectures, targeting big data use cases and more.
Verticalization of data storage technologies
From my perspective, what has driven the roll-back of the previously omnipresent and omnipotent relational database has been the verticalization of responses to use cases. As data has grown in size and hardware has grown progressively cheaper, technologies have emerged that are more effective, or at least more cost-effective, than relational for many use cases. The result is an emerging horses-for-courses architecture, where different technologies (for example Hadoop, RDBMS, NoSQL databases, and stream processing) are evaluated and selected for each use case. Conventional relational architecture reasoning from the last decade called for one big operational database and one big analytic warehouse. In practice this never worked: data marts and schemas specific to packaged and cloud apps meant there was a proliferation of databases in the average large organization anyway. So when the big data guys came along and said you need this technology for this app and that technology for that app, it wasn't as big a deal as a simple relational or post-relational view might suggest. True, there is a need for new skills, but the other side of that coin is new toys for the boys and girls to play with.
Then there is another issue that doesn't seem to be getting the attention it deserves: tooling. Big data apps are predominantly business intelligence and analytic apps, addressing the same domain as data warehouses. We're collecting more data, from more sources, in more formats and using this data to better understand our world and to make better decisions. The same issues of data science, statistics, algorithms and so on still apply, in spades. The volume and variety of the data have exploded, as have the opportunities to derive insight from all these new sources. But the essential point is not just to discover, identify, categorize and summarize the data, but also to communicate it, and that means visualization.
Humans can absorb visual information more efficiently than numerical information. A picture has always been worth a thousand words, even before interactive data visualization tools. How much easier is it to see the trend of a line, or the comparative trends of two lines, than it is to derive the same insight from two columns of numbers? Why do you think Excel has had charting for decades?
When I look around now, I see lots of people doing very cool things to visualize all kinds of data; there are many examples of visualization on the Guardian data blog, on the Institute for Health Metrics website and here in this library of user-created IBM Cognos charts. So how come the average executive dashboard still looks like something from the 90s? Part of this comes down to the imagination of the designers. How many times do you see what is obviously a default Excel bar chart in which nobody has even changed the dull default colors? When someone has given a little thought to how their data is presented, we often find that more compelling conclusions are drawn from that data. It doesn’t have to be big data; all scales of data can benefit from better visualization.
But big data might be a catalyst in sparking a revolution in the visual representation of data, and while we’re waiting for that to arrive, here’s a nice example of what a bit of technology and a good pinch of creativity can do with data.
And for anyone in London, there is a great exhibition at the British Library charting the history of data visualization from John Snow (proving cholera is water-borne) through Florence Nightingale (pioneering pie charts, or, more accurately, rose-petal charts) and onwards. Well worth a visit.