Rethinking the modern data warehouse: Passé or progressive?
What do we want out of a data warehouse? Do we need sub-second query latency? Sometimes. Do we need massive scalability with zero performance degradation? Perhaps. Mostly, we just need a place to house and organize the information that supports today’s analytical activity.
It’s easy to be blinded (and impressed) by the rapid innovation and evolution in the arena of big data. Today's most technically sophisticated companies have the opportunity to exploit big data tools to address mind-numbingly cool use cases and produce very enticing results. However, many companies today still depend on the ideals and workflow (not to mention the sunk investment) of the traditional data warehouse.
Recent Aberdeen research demonstrates that Best-in-Class companies have found the right formula to leverage their data warehouse infrastructure, in combination with newer processes and technologies, to support an elevated level of analytical activity. Some key characteristics that these top companies share include:
- Data lake technology in place. Scalability and flexibility in the data infrastructure aren’t just objectives of the largest and most sophisticated companies. While requirements vary, any company investing significantly in its data environment needs to manage the growth of data volume and complexity efficiently. Many companies explore a data lake architecture to address these needs. However, according to Aberdeen’s research, only about one-third of organizations with a data lake currently implemented have built it on open-source Hadoop-based technology. The majority leverage commercially available technologies (including data warehouse software) to achieve the flexibility and scalability a data lake offers.
- Diverse set of data in use. Many would argue that the classic raison d’être of a data warehouse is to support analytical activity. By that measure, a modern data architecture needs to house and organize a wide diversity of data. Traditional application-based structured data still constitutes the bulk of information used for analysis at most companies. Increasingly, however, organizations are looking to exploit information from external third-party sources, unstructured data from social media channels, and machine-generated Internet of Things (IoT) data. Leading companies have more effective data management environments, largely because of their need to handle this diversity of data. The research shows that all of these non-traditional types of information are more likely to be rated as “critical” by Best-in-Class companies.
- Strong data governance / oversight. In addition to the growing diversity of data that so many companies face, many are also seeing an expansion in the types of users becoming active. This doesn’t just apply to analytical activity, but also to accessing and manipulating data. An environment of more data and more users is arguably very healthy for the analytical prospects of the typical company, but it also necessitates an elevated level of responsible oversight of data usage. Best-in-Class companies are more likely to have the right policies and procedures in place to govern access to data and its proper usage. These top companies are also more likely to support those policies with dedicated technology, allowing for more automation in their oversight of data usage.
These Best-in-Class characteristics help facilitate a smoother flow of information within the organization, heightened analytical activity, and enhanced business performance. The research also demonstrates that these aspects are near and dear to the hearts of companies with a vested interest in data warehouse technology. In other words, companies that view the data warehouse as mission critical also recognize the activities that are vital for supporting its success (Figure 1).
Figure 1: Characterizing the Modern Data Warehouse
As mentioned, the concept of a data lake is not inextricably linked to open-source Hadoop technology. Many companies are looking to leverage prior investment in commercially available data warehouse technology (and the associated skill sets) to build their data lake. Data governance and oversight are important to those continually investing in a data warehouse, as is data diversity. In fact, beyond the importance of IoT data depicted in Figure 1, these organizations have also rated several other data types (e.g., location-based geospatial information and unstructured data) as critical or very important to their analyses.
Many have wondered aloud whether the data warehouse is dead. It’s not an unreasonable question given the negative perception that data warehouse projects often carry. These projects are sometimes viewed (or experienced) as wasteful, time- and cost-consuming endeavors. However, the multitude of successful data warehouse implementations demonstrates that the technology can be adapted to survive and thrive amid today’s challenging data volume and complexity. Best-in-Class companies unite data warehouse technology with complementary tools and platforms, as well as the right processes and policies, to produce real results from their data.
For more information, explore the full report: The Data Warehouse Evolved: A Foundation for Analytical Excellence.
- Smarter data warehousing on the Watson & Cloud Platform
- How Bluemix and dashDB make flexible cloud-based data analytics possible