Three considerations for your unified data platform journey
Apache Hadoop infrastructure offers tremendous potential for modernizing the enterprise analytics architecture. In fact, one of the biggest use cases for Hadoop is optimizing the enterprise data warehouse (EDW), a core component of that architecture. EDW optimization can involve a number of activities, including offloading both ETL workloads and unused data from the EDW to Hadoop. It can also enrich EDW analytics and reporting by adding new sources of structured and unstructured data that are stored, processed and analyzed in a low-cost Hadoop infrastructure.
One unified data platform — A must for every organization
For Hadoop projects to be successful, organizations must address requirements for data integration, quality and governance in the Hadoop data lake. Although many new and emerging classes of data integration, quality and governance software tools are available in the market (cloud-only tools, self-service tools designed for non-technical users, Hadoop-only tools, governance-only tools, open source tools and more), many large organizations are concluding that they want a single unified data integration, quality and governance platform that supports the entire enterprise.
A truly unified data platform should meet the following three requirements:
1. Support workloads from any source
The platform should be able to support workloads running on-premises, in private or public clouds, and both inside and outside of Hadoop; the source should not be an issue. You should be able to build a job once and run it anywhere, whether in the EDW, on the ETL grid or in the Hadoop cluster, without having to modify the job.
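The "build once, run anywhere" principle usually means keeping the job definition independent of any one execution engine, so the same logical steps can later be translated into SQL pushdown, grid tasks or Hadoop jobs. A minimal sketch of that separation, with entirely hypothetical names that do not correspond to any IBM product API:

```python
from abc import ABC, abstractmethod

class ExecutionEngine(ABC):
    """Abstract target environment: an EDW, an ETL grid, or a Hadoop cluster."""
    @abstractmethod
    def run(self, job):
        ...

class Job:
    """A job is defined once as a list of engine-neutral (operation, params) steps."""
    def __init__(self, name, steps):
        self.name = name
        self.steps = steps

class LocalEngine(ExecutionEngine):
    """Stand-in for one concrete target. A real platform would translate the
    same steps into SQL pushdown, grid tasks or YARN jobs instead."""
    def run(self, job):
        data = []
        for op, params in job.steps:
            data = op(data, **params)
        return data

# The same Job object can be handed to any engine without modification.
job = Job("load_customers", [
    (lambda d, rows: d + rows, {"rows": [{"id": 2}, {"id": 1}]}),
    (lambda d, key: sorted(d, key=lambda r: r[key]), {"key": "id"}),
])
result = LocalEngine().run(job)
```

Swapping `LocalEngine` for another `ExecutionEngine` subclass changes where the job runs, not how it is defined, which is the portability property the requirement describes.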
2. Offer platform architectures with extreme scalability
Data integration and data quality workloads should be backed by a massively parallel architecture capable of partitioning large data sets across computing nodes and executing the workloads in parallel. This is a must for processing up to trillions of rows of data per day, which some IBM customers have been doing with the Information Server platform since years before Hadoop was even a word. Data governance architectures should similarly support tens of millions of data assets, as IBM customers do with the IBM Information Governance Catalog.
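The partition-and-run-in-parallel pattern described above can be sketched in a few lines. This is a toy illustration only, using Python's standard library rather than any vendor engine; the function and partitioning scheme are assumptions for the example:

```python
from multiprocessing import Pool

def clean_partition(rows):
    """Per-partition work: a toy data-quality step (trim, lowercase, dedupe)."""
    seen, out = set(), []
    for r in rows:
        v = r.strip().lower()
        if v and v not in seen:
            seen.add(v)
            out.append(v)
    return out

def partition(data, n):
    """Round-robin split of the data set into n partitions."""
    return [data[i::n] for i in range(n)]

if __name__ == "__main__":
    data = [" Alice", "bob ", "ALICE", "carol", "Bob"]
    parts = partition(data, 2)
    with Pool(2) as pool:            # each partition is processed in parallel
        results = pool.map(clean_partition, parts)
    merged = sorted(set().union(*results))
    print(merged)                    # ['alice', 'bob', 'carol']
```

A production engine applies the same idea at a very different scale: the data set is partitioned across nodes, each node runs the transformation on its slice, and results are merged, which is what makes trillion-row daily volumes tractable.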
3. Serve both technical and non-technical users
Oftentimes, the needs of non-technical users are ignored or forgotten. A unified platform must support both technical and non-technical audiences, since workflows developed by non-technical users will need to run in the production computing environment.
A successful data journey
Very few software vendors in the world can support all of these requirements for one unified data integration, quality and governance platform that serves the entire enterprise. The emerging providers of Hadoop-only, governance-only, cloud-only and self-service-only tools will never support what large enterprise customers require. Large organizations should evaluate vendors carefully and critically against these requirements so that they don't start a journey that won't take them where they want to go.
Learn how you can start your data journey with IBM.