Exploring the NoSQL Family Tree
Gain an understanding of today’s NoSQL database evolution
A marketing teammate thought my explanation of the NoSQL technology landscape to some new employees would make a good infographic. I now use this diagram a lot to help client organizations and business partners understand some important basics about NoSQL (see figure).
Source: Cloudant, an IBM Company
The fundamentals of today’s NoSQL technologies
NoSQL arose from big data before it was called “big data.” During the late 1990s and 2000s Google, Amazon, and Facebook were growing through the roof. There were no commercial or open source databases capable of supporting their growth, either in scale—data volume and number of connections—or in the variety of data structures they processed—web logs, product catalogs, full text, and so on. So these organizations invented their own solutions, and thankfully they wrote about their successes, enabling others to build on their work.
As the infographic shows, people used these ideas in different ways to create many of today’s popular NoSQL databases. For example, Apache CouchDB borrows from Google’s MapReduce white paper,1 and Cloudant borrows from Apache CouchDB and Amazon’s Dynamo white paper,2 among other sources. Others, such as MongoDB, sprang up independently of the big web thought leaders.
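For readers who have not seen the MapReduce idea in action, here is a minimal, single-process sketch of the programming model in Python. It is only an illustration of the map, group, and reduce steps; it is not CouchDB's or Cloudant's view API, and the function names (map_phase, shuffle, reduce_phase, map_reduce) are made up for this example.

```python
from collections import defaultdict


def map_phase(doc):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in doc.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group step: collect all values emitted under the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single result."""
    return key, sum(values)


def map_reduce(docs):
    """Run map over every document, group by key, then reduce each group."""
    pairs = (pair for doc in docs for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(map_reduce(["NoSQL scales out", "SQL scales up", "NoSQL is not SQL"]))
```

In a real cluster the map and reduce calls run on many machines at once, and the grouping step moves data between them. That distribution is what the original systems were built for, and this sketch leaves it out.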
Analytics and operational databases
The color coding in the diagram highlights the fact that NoSQL technologies evolved to meet specialized workloads. They divide roughly into analytic solutions, such as the Apache Hadoop framework and Apache Cassandra, and more operational databases, such as CouchDB, MongoDB, and Riak. Analytic solutions are very good at running ad hoc queries in business intelligence (BI) and data warehousing applications. Operational databases excel at handling high numbers of concurrent end-user transactions.
Of course, these solutions can be applied to multiple purposes. One of our client organizations, Novartis, described using the Cloudant database as a service in a data warehousing application.3 Cassandra is another example. It has typically blurred the line between operational and data warehouse use cases, often leading to uncomfortable fits.
Vendor- and community-driven databases
Projects such as Hadoop, Cassandra, and CouchDB are developed by a community of both individual contributors and vendors. They are sustained, supported, and enhanced collaboratively. I prefer these projects because they tend to be far less exposed to the product roadmap and licensing whims that can affect single-vendor-backed projects.
Hopefully, this diagram can help those new to NoSQL understand the playing field. There are many other NoSQL and NewSQL technologies that are not shown—only those we hear about most often are discussed here. If you’re looking for additional information about the NoSQL landscape, some recommended resources are offered in the following list. Please share any thoughts or questions in the comments.
“NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence,” by Pramod J. Sadalage and Martin Fowler, Addison-Wesley Professional, August 2012.
“Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement,” by Eric Redmond and Jim R. Wilson, Pragmatic Bookshelf, May 2012.
“The Log: What Every Software Engineer Should Know About Real-Time Data’s Unifying Abstraction,” by Jay Kreps, LinkedIn Engineering blog, December 2013.
1 “MapReduce: Simplified Data Processing on Large Clusters,” by Jeffrey Dean and Sanjay Ghemawat, Google, Inc., white paper, 2004.
2 “Dynamo: Amazon’s Highly Available Key-Value Store,” by Giuseppe DeCandia et al., Amazon.com, white paper, 2007.
3 “Lessons Learned – NoSQL and DBaaS in Life Sciences,” Cloudant on-demand webinar.