What Is a Data Warehouse After All?
Enterprises can capitalize on adaptive architectures for data warehousing as living operational environments
Recently, while I stood before a sea of inquiring minds, one of them asked, “Will Hadoop replace the data warehouse?” In various social media venues, similar questions are knocked around for various database or storage technologies. On a related note, a colleague once asked, “Do you think the population of Brazil is around 27 million?” I admitted I didn’t know. Rather than discuss it, the colleague said, “But now you will likely always discuss Brazil’s population in reference to 27 million.” Point taken; he’d laid the anchor, and from then on everything else would revolve around it. His advice: “Avoid letting the other guy lay the anchor. Rip it out and plant a new one.”
Living, breathing environments
Consider an example of a brick-and-mortar warehouse; is the building the warehouse? How about the storage bins inside the building, or the various forklifts and equipment? The people? The policies and procedures? Clearly all these objects play a role and are part of the physical warehouse.
In an information warehouse, consider extract, transform, and load (ETL) and other applications. The data actually enters the data warehouse upon exiting its source environment. Crunched, reorganized, and restructured through business rules and complex math, the flow finally lands in long-term storage. Its structure and content are the end result of all that work. The data warehouse is more suitably defined as a collection of disparate technologies working in harmony. Conclusion: the data warehouse is not a storage technology, but a living operational environment.
Conversely, regarding the data warehouse as a data storage mechanism and all other technologies as its satellites misses a huge opportunity. Integrating disparate technologies invariably means interface-driven integration. Because vendors own the interface definitions, rigid flows and a proliferation of points of failure prevail. There was a time when interface-driven integration was a best practice. Now we work at much larger scales. Once this approach comes under stress, the failure points will become apparent and then only emergency-room protocols can keep it alive.
Getting back to the original question, more generally applied: “Will <my technology> replace the data warehouse?” Clearly not, because each component is just a part of the data warehouse environment. An adaptive architecture, however, can easily assimilate ever more disparate technologies and enforce their interoperation through metadata-driven adaptation. That is, wrap the component with an adapter and standardize its interface to appear essentially polymorphic.
For example, if extracting data from Apache Hadoop, IBM® DB2®, Oracle, or SQL Server databases, each extraction adapter should behave interchangeably. This behavior allows metadata switches to dynamically choose the extraction point, and likewise automatically select the extraction adapter for that point’s technology, whereby the adapter applies the vendor-specific instructions to the component and brokers its interaction. What sounds complicated actually isn’t, and deliberate adaptation radically simplifies the assimilation of new technologies. We don’t have to build this technology if we pick a tool that transparently does it for us.
But that kind of morphing is in the system interface. What about the business-facing aspects? A data model is in fact an interface in which the tables are each master objects and each table’s columns are the interface description. While table names and column names are metadata in the database catalog, once removed and utilized for a stored procedure or ETL tool, they actually become the nemesis: a hard-wired interface. The application has suddenly re-entangled itself into an interface-driven model. Changes to the tables or columns then create rippling impact across often deeply nested application logic.
Consider the morphing of a business intelligence (BI) tool. It imports the metadata of the database and recasts it as end-user-facing components. The end user never really interacts with tables and columns anymore. This capability is what is desired for the back-end architecture too, but no commodity tools directly support harnessing it this way.
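The interchangeable extraction adapters described above can be sketched in a few lines. This is only an illustration under stated assumptions: the class and function names (`ExtractionAdapter`, `SqliteAdapter`, `ListAdapter`, `extract_from`) and the metadata layout are hypothetical, and an in-memory SQLite database plus a plain Python list stand in for the Hadoop, DB2, Oracle, or SQL Server sources, whose real drivers are not used here.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List


class ExtractionAdapter(ABC):
    """Common interface that wraps every source technology (hypothetical)."""

    @abstractmethod
    def extract(self, entity: str) -> Iterator[dict]:
        """Yield the rows of a logical entity as plain dictionaries."""


class SqliteAdapter(ExtractionAdapter):
    """Stand-in for a relational source; vendor-specific SQL lives here."""

    def __init__(self, connection: sqlite3.Connection):
        self._conn = connection

    def extract(self, entity: str) -> Iterator[dict]:
        # Entity names come from trusted metadata in this sketch, not user input.
        cursor = self._conn.execute(f'SELECT * FROM "{entity}"')
        columns = [description[0] for description in cursor.description]
        for row in cursor:
            yield dict(zip(columns, row))


class ListAdapter(ExtractionAdapter):
    """Stand-in for a non-relational source such as a file or Hadoop extract."""

    def __init__(self, tables: Dict[str, List[dict]]):
        self._tables = tables

    def extract(self, entity: str) -> Iterator[dict]:
        for row in self._tables[entity]:
            yield dict(row)


def build_demo_adapters() -> Dict[str, ExtractionAdapter]:
    """Create one adapter per technology, each holding the same demo data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
    return {
        "sqlite": SqliteAdapter(conn),
        "list": ListAdapter({"customer": [{"id": 1, "name": "Ada"}]}),
    }


def extract_from(metadata: Dict[str, dict],
                 adapters: Dict[str, ExtractionAdapter],
                 point: str) -> List[dict]:
    """Look up the extraction point in metadata and delegate to its adapter."""
    entry = metadata[point]
    adapter = adapters[entry["technology"]]
    return list(adapter.extract(entry["entity"]))
```

With this shape, moving an extraction point to a different technology is a metadata change: editing the `technology` value for a point selects a different adapter, and the calling code stays untouched.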
This need to functionally morph in the data model–facing architectural core requires the greater resilience of adaptive architecture. A failure at this point is why many data warehouses go stale or even incomprehensible over time. The warehouse’s data model requirements were in vogue when first captured, but six months later when deployed as the data warehouse, those requirements were already going stale with new ones arriving. If the data warehouse is strongly lashed to the original requirements, it could wax obsolete as the incoming requirements shift out from under it.
Many years ago, American automakers recognized a primary flaw in their approach to production. Using a rigid architecture, an error discovered in testing would bounce the car back to the starting point, where it would reenter the testing cycle. Foreign automakers, however, could produce designs much faster than their American counterparts by simply adding a little tolerance here and there, such as additional material in a wheel well, and so on. These known problem areas that often failed in performance and stress tests were easily configured with adaptability. The prototype could then move through and finish testing much faster, and also experience high reuse. For example, the original Dodge Durango was built on the frame of the Dodge Dakota, leveraging millions of dollars in proven research. It moved from concept to production in 132 weeks, the shortest such period in Dodge history.*
Adaptability is architecturally and thematically imposed upon the components, but the components must be equipped to yield. The question, “Is <my technology> adaptable?” is the more important inquiry. Highly successful solutions have an adaptive architecture imposed upon the technologies. If the technology resists this adaptation, it’s not a good choice for data warehousing. Programmable aspects such as business rules can often change with little destabilization.
Changes to the data model, however, can initiate an end-to-end impact review. Just one missing table or column can cause a dependent operation to fail. Newly added columns that show up empty can infuriate end users, which is why architects sometimes swallow hard when they boast that their data model is stable. What they really mean is that they have deliberately frozen it to avoid destabilization everywhere else, and there’s a difference. Adaptive architectures proactively anticipate these things and can provide a means to smoke out all problem areas before they ever become an issue.
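One way to smoke out such problem areas ahead of time is to compare the data model against recorded dependency metadata before a change is deployed. The following is a minimal sketch, assuming the catalog and the dependency lists are available as plain Python structures; the function names and the metadata shapes are hypothetical and not taken from any particular tool.

```python
from typing import Dict, List, Set, Tuple

Catalog = Dict[str, Set[str]]                    # table name -> column names
Dependencies = Dict[str, Set[Tuple[str, str]]]   # operation -> (table, column) it reads


def missing_references(catalog: Catalog,
                       dependencies: Dependencies) -> Dict[str, List[Tuple[str, str]]]:
    """Report, per operation, every table/column it needs that the model lacks."""
    problems: Dict[str, List[Tuple[str, str]]] = {}
    for operation, references in dependencies.items():
        missing = sorted(
            (table, column)
            for table, column in references
            if column not in catalog.get(table, set())
        )
        if missing:
            problems[operation] = missing
    return problems


def unpopulated_columns(catalog: Catalog,
                        populated: Set[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """List columns that exist in the model but that no load process fills."""
    return sorted(
        (table, column)
        for table, columns in catalog.items()
        for column in columns
        if (table, column) not in populated
    )
```

Run against a proposed model, the first check flags operations that would fail on a missing table or column, and the second flags newly added columns that would show up empty to end users.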
An established course for development
What good is a resilient architecture if it’s too complex to use? What good is a stable architecture if it’s too resistant to change? An adaptive architecture solution balances the stability of the parts as well as the priorities of the end users (see figure). System-facing architecture and end-user-facing consumption are held in decoupled synergy rather than mashed together in an anxious tension. They move on separate software development lifecycles (SDLCs) of 1) maturing architectural capability versus 2) morphing an end-user-centric feature. The capabilities become stable, flexible workhorses and are assembled to form end-user-facing features. This decoupled approach allows capabilities to stabilize while features move and morph at the frenetic pace of end-user action. Deep architectural change becomes ever more infrequent as capabilities mature. More importantly, the core architecture is deliberately buffered from, and even anticipatory of, data model change.
Figure: Adaptive system–facing architecture and end-user-facing consumption in decoupled synergy
Architecture with the expectation of change is quite different from an architecture that fears change. In common application development, the builders circle the wagons around requirements and their technical implementation. If an end user wants new features, expect a high wall of quality control. This same high wall, plus a miry moat with alligators, chokes a BI solution. The data warehouse ultimately becomes functionally frozen and unable to easily support new features. Unfortunately, some features will eventually arrive as workarounds outside of the data warehouse. In an oft-repeated irony, the policies and protocols meant to protect the solution in application development actually sacrifice something valuable, namely agility and adaptability, when applied to data warehousing.
Adaptive architecture frankly means choosing tools and technologies that are metadata-driven in structure, logic, behavior, operation, administration, and as deep as the rabbit hole goes. Likewise, the architecture and the tools use this metadata to steer code generation in as many places as possible. Generating a simple SQL statement is one thing; generating the multiple disparate statements required to effect a highly complex pattern such as a slowly changing dimension, referential check, structured deduplication, unique value enforcement, or a database-wide rollback is quite another. Architects don’t necessarily start out building all these things, but set a course for their eventual construction, assimilating the capability as the needs arise.
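To make the idea of metadata-steered code generation concrete, here is a small sketch that generates the two statements of a simplified Type 2 slowly changing dimension load from a metadata dictionary. Everything in it is an assumption made for illustration: the `spec` layout, the function names, and the housekeeping columns `valid_from`, `valid_to`, and `is_current` are hypothetical. It targets SQLite so that it can run as written, and it compares tracked columns with `<>`, so NULL-safe change detection is deliberately left out.

```python
import sqlite3
from typing import Dict, List


def generate_scd2_statements(spec: Dict) -> List[str]:
    """Build the expire and insert statements for a simplified Type 2 load."""
    dim, stg = spec["dimension"], spec["staging"]
    keys, tracked = spec["business_key"], spec["tracked"]

    # Statement 1: close the current row when any tracked column has changed.
    key_match = " AND ".join(f"s.{k} = {dim}.{k}" for k in keys)
    changed = " OR ".join(f"s.{c} <> {dim}.{c}" for c in tracked)
    expire = (
        f"UPDATE {dim} SET is_current = 0, valid_to = :load_date "
        f"WHERE is_current = 1 AND EXISTS ("
        f"SELECT 1 FROM {stg} s WHERE {key_match} AND ({changed}))"
    )

    # Statement 2: insert a current row for every key that now lacks one,
    # which covers both brand-new keys and the keys expired by statement 1.
    columns = keys + tracked
    column_list = ", ".join(columns)
    select_list = ", ".join(f"s.{c}" for c in columns)
    current_match = " AND ".join(f"d.{k} = s.{k}" for k in keys)
    insert = (
        f"INSERT INTO {dim} ({column_list}, valid_from, valid_to, is_current) "
        f"SELECT {select_list}, :load_date, NULL, 1 FROM {stg} s "
        f"WHERE NOT EXISTS (SELECT 1 FROM {dim} d "
        f"WHERE {current_match} AND d.is_current = 1)"
    )
    return [expire, insert]


def apply_scd2(conn: sqlite3.Connection, spec: Dict, load_date: str) -> None:
    """Run the generated statements in order against an open connection."""
    for statement in generate_scd2_statements(spec):
        conn.execute(statement, {"load_date": load_date})
    conn.commit()
```

Adding a tracked column or pointing the load at a different dimension then becomes a change to `spec` rather than to hand-written SQL, which is the kind of leverage the paragraph above describes.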
If adaptive architecture is the foundation, then metadata is a common currency. When metadata is the primary interface adapting the underpinning physical implementation, this interface preserves the architectural freedom to fortify and expand capabilities without disrupting its consumers. No single technology is the dominant component, but all technologies should operate under a common umbrella, such as a simple component harness. Setting up and gradually maturing without needing an über-architecture at the outset is straightforward.
* “1998–2003 Dodge Durango Simultaneous Design and Engineering,” First-generation Dodge Durango release material from Chrysler, Allpar.com.