Twisting the Kaleidoscope: Part 1

Bring NoSQL into sharp focus for integrating hybrid environments into a unified architecture

Big Data Evangelist, IBM

NoSQL is one of the most amorphous categories in the big data arena, second only to big data itself. It seems as if no two people can agree on what’s included in the NoSQL bucket or can decide how to define that bucket in the first place.

 

Fuzzy big data arena segmentations

Back in my school days, we were taught that there are two ways to define something: classically or demonstratively. A classical definition states the essential nature of whatever’s being defined; where that is lacking, a demonstrative definition points to the set of instances that exemplify it. Classical is the textbook definition; demonstrative is the street definition.

Where NoSQL is concerned, people try to define it both ways, but neither is satisfactory. Wikipedia attempts a classical definition: “A NoSQL database provides a mechanism for storage and retrieval of data that employs less constrained consistency models than traditional relational databases.”

The core problem with this definition is that the “less constrained consistency models than traditional…” part doesn’t feel definitive. It also doesn’t coincide with the primary reasons why enterprises adopt any of these new technologies. Instead, it feels more like a defensive retrenchment after the market’s originally definitive—and sacred only to database geeks—“eschew SQL” and “eschew relational” distinctions began to vanish. Of course, this retrenched definition alludes to the advantages in database availability and scalability that come from relaxing data consistency requirements, but that reference still doesn’t feel definitive enough to define a market segment.

Later in the same article, Wikipedia, for the purposes of classifying NoSQL databases, alludes to a more demonstrative definition that doesn’t stem from the supposedly definitive feature of less constrained consistency models than traditional relational databases. Instead, Wikipedia states that “the basic [NoSQL] classification that most would agree on is based on data model” and lists column, document, key-value, and graph as the categories, along with examples of real-world databases that fall into each.
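To make the data-model classification a little more concrete, here is a minimal Python sketch that holds the same customer record in the four shapes Wikipedia names. The record, the field names, and the plain dictionaries standing in for each kind of store are illustrative assumptions, not any product's API.

```python
# Illustrative only: plain Python structures standing in for four data models.
# The customer record and every field name here are hypothetical.

# Key-value: an opaque value looked up by a single key.
key_value = {"customer:42": '{"name": "Ada", "city": "Zurich"}'}

# Document: a nested, self-describing record addressed by an ID.
document = {
    "customer:42": {
        "name": "Ada",
        "city": "Zurich",
        "orders": [{"id": 7, "total": 19.5}],
    }
}

# Column (wide-column): row key -> column family -> column -> value.
column = {
    "customer:42": {
        "profile": {"name": "Ada", "city": "Zurich"},
        "orders": {"7:total": 19.5},
    }
}

# Graph: nodes plus labeled edges between them.
graph = {
    "nodes": {"customer:42": {"name": "Ada"}, "order:7": {"total": 19.5}},
    "edges": [("customer:42", "PLACED", "order:7")],
}


def orders_placed_by(customer_id):
    """Follow PLACED edges from a customer node to the order nodes it points at."""
    return [dst for src, label, dst in graph["edges"]
            if src == customer_id and label == "PLACED"]
```

The same facts are reachable in every shape; what differs is how they are addressed, which is one reason these categories blur into each other.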

Even later in that article, Wikipedia confuses both its classical and demonstrative definitions by throwing an unruly collection of innovative databases into the NoSQL bucket: object, tabular, tuple store, triple and quad store, hosted, multivalue, and cell. Most of these categories are defined by data model, but the hosted category is defined by deployment model.

And let’s not even talk about something called NewSQL, which for no good reason merits its own stand-alone niche based on support for Atomicity, Consistency, Isolation, Durability (ACID) transactionality.

And to continue this rant, the NoSQL bucket, at least as described by Wikipedia, excludes the Hadoop Distributed File System (HDFS) and other file stores. Those approaches certainly conform to the supposedly definitive, less constrained consistency models than traditional relational databases. However, the bucket includes Apache HBase under the columnar category, even though HBase is strongly consistent, which goes against the grain of the classical NoSQL definition. And NoSQL’s columnar category, just like its tabular category, overlaps so extensively with the relational bucket that it feels ridiculous to exclude traditional databases from this taxonomy (on the ostensible grounds that ACID transactionality is, what, bourgeois?).

Wikipedia is of course just one point of view on NoSQL. No two industry observers define or scope this supposed space in the same way. I get a bit impatient with articles such as “What comes after NoSQL?” in an April 2013 social platform post [1]. Discussions such as the one in that article proceed on the assumption that this so-called segment has a clear enough definition that we can ponder what comes after it without lapsing into nonsense. How can we conceptualize a sequel to NoSQL if nobody is clear on the boundaries of the original?

Confused? I follow this stuff closely, and even I can’t resolve NoSQL’s conceptual sprawl into tidy market categories. I once semi-jokingly tweeted the following message: “Need handy mnemonic for NoSQL database sprawl: CD FIG KNOX M (column doc file in-mem graph keyval newsql object xml multival) doesn’t cut it.”

 

Feature-oriented framework for big data alternatives

The NoSQL category is not terribly useful. The larger phenomenon that it attempts to characterize is the kaleidoscopic proliferation of innovative new database computing approaches. When you look at the range of approaches subsumed under this heading, they blur extensively into each other and into more traditional approaches.

Discussing the features that distinguish one type of data platform from another is more useful. Such a discussion helps data professionals compare and contrast the different approaches concretely without being distracted by counterproductive market labels. In building a big data cloud, data professionals need to determine which data storage, management, and runtime platforms are optimal for each deployment role, functional service layer, component, and workload, given the target topology and service-level requirements.

As I noted in a recent column, “The next big ‘H’ in big data: Hybrid architectures” [2], the inexorable trend is toward hybrid big data environments that maximize end-to-end scalability, speed, agility, elasticity, affordability, manageability, transactionality, and consistency. You may require a hybrid big data cloud in which different data platforms are fitted to different purposes but integrated into a unified architecture: for example, relational database management system (RDBMS) and row-based, Apache Hadoop and HDFS, in-memory and columnar, and so on.
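The idea of fitting different platforms to different purposes can be pictured as a small lookup that matches workload requirements against platform features rather than market labels. The Python sketch below is only an illustration: the feature flags, the workload names, and the `platforms_for` helper are hypothetical placeholders, not measurements or vendor claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    """A data platform described by its features, not by a market label."""
    name: str
    features: frozenset


# Hypothetical profiles; the feature assignments are placeholders for illustration.
PLATFORMS = [
    Platform("row-based RDBMS", frozenset({"acid_transactions", "strong_consistency"})),
    Platform("Hadoop/HDFS", frozenset({"scale_out_storage", "batch_processing"})),
    Platform("in-memory columnar", frozenset({"low_latency_queries", "analytic_scans"})),
]


def platforms_for(required):
    """Return the names of platforms whose features cover every requirement."""
    required = frozenset(required)
    return [p.name for p in PLATFORMS if required <= p.features]


# Each workload in a hybrid environment states what it needs (hypothetical examples).
workloads = {
    "order entry": {"acid_transactions"},
    "clickstream archive": {"scale_out_storage", "batch_processing"},
    "dashboard queries": {"low_latency_queries"},
}

for workload, needs in workloads.items():
    print(workload, "->", platforms_for(needs))
```

A requirement set that no single platform covers comes back as an empty list, which is the situation in which a hybrid of several platforms becomes the answer.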

To sum up part 1, NoSQL is an inchoate product category that includes many innovative new approaches to database computing. Nevertheless, the diverse platforms subsumed under the NoSQL banner play various important roles in the new world of hybridized big data architectures. In part 2, I’ll detail the key factors that are well suited for framing data platform options. In the meantime, please share any thoughts or questions in the comments.

 

References

[1] “What comes after NoSQL?” by Ewa Kucharczyk, Inno+Swiss social media platform, August 2013.
[2] “The next big ‘H’ in big data: Hybrid architectures,” by James Kobielus, IBM Data magazine, May 2013.
 
