Data Warehouse Architectures for Multinational Organizations: Part 2

Discover three approaches to data warehousing that address reporting challenges for worldwide enterprises

Big Data Industry Architect, IBM

The primary value of a coordinated, centralized data warehouse comes from shared models, business terms, and reporting standards. Part 1 of this series discussed some of the advantages and disadvantages of creating a multinational data warehouse for enterprises that operate globally or across multiple regions. This second installment introduces three proposed models and discusses the challenges of setting up multinational data warehouses.

Aligning disparate businesses with each other may seem like a Herculean task at first, but once it is done, the benefits can be worth the effort. Imagine a line manager, for example, being able to compare his or her team with any other team in the organization. Or suppose a marketing manager is able to spot a successful campaign in one country early on and quickly propagate it to other regions. Several advanced data governance and modeling technologies can support well-designed models for this kind of cross-border analysis.

Moving to a coordinated approach with some shared concepts across regions can be difficult. Organizations have usually invested heavily in their existing infrastructure. They also tend to have some form of data model in place, often one that mirrors the source system model, and it can be difficult to map to a more generic model that is suitable for sharing across regions. Fortunately, advanced tools exist for mapping between systems and translating information from one system to another. The IBM® Cognos® TM1® enterprise planning platform is an example of one such tool.

Hardware and database technology are no longer hindrances to a coordinated data warehouse effort. Data warehouse appliances now allow most corporate data to be stored and analyzed centrally. IBM PureData™ System for Analytics is an example of a data warehouse appliance that can be suitable for this task.

Data governance and policy considerations

Before trying to implement a multinational data warehouse, give some thought to data governance across regions. The data governance policies should require some standardization of business terms, accounts, and change processes to allow for cross-border analytics. In practice, the hard work of mapping terms and policies across regions is often not performed because of differences in front-office software and existing business intelligence (BI) data models.
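As a rough illustration of what this mapping work involves, the sketch below translates region-specific terms into a shared vocabulary and reports the terms that still lack a mapping. The region codes, term names, and both helper functions are hypothetical; a real effort would more likely keep these mappings in a governed business glossary than in code.

```python
# Hypothetical mapping from (region, local term) to a shared business term.
# Every region code and term here is an illustrative assumption.
SHARED_TERMS = {
    ("DE", "Umsatz"): "net_revenue",
    ("FR", "chiffre d'affaires"): "net_revenue",
    ("US", "net sales"): "net_revenue",
    ("DE", "Kundenabwanderung"): "churn",
    ("US", "attrition"): "churn",
}


def to_shared_term(region: str, local_term: str) -> str:
    """Return the shared term for a regional term, or raise if unmapped."""
    key = (region.upper(), local_term.strip())
    if key not in SHARED_TERMS:
        # Failing loudly surfaces terms that governance has not yet mapped.
        raise KeyError(f"No shared term for {local_term!r} in region {region!r}")
    return SHARED_TERMS[key]


def unmapped_terms(region: str, local_terms: list[str]) -> list[str]:
    """List the regional terms that still need a governance decision."""
    return [
        term
        for term in local_terms
        if (region.upper(), term.strip()) not in SHARED_TERMS
    ]
```

Calling `unmapped_terms` on the terms used in each region's reports gives a simple worklist for the governance team before any cross-border report is built.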

Transaction data mart

The transaction data mart (TDM) holds the details of all local transactions for a specific time period. These details may be retail transactions, cases of beverages sold, or phone calls for a telecommunications organization. The TDM offers a simple star schema with a single fact table for each type of transaction, plus product, period, and customer dimensions. Access to this data mart should be limited for data privacy and system performance reasons. Limiting access helps minimize both the cost of the system and the potential for privacy breaches.
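The star schema described above can be sketched with an in-memory SQLite database. This is only a minimal illustration: the table names, column names, and sample rows are assumptions, and a production TDM would run on the organization's own database platform.

```python
import sqlite3

# All table and column names below are illustrative assumptions.
STAR_SCHEMA = """
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE dim_period   (period_id   INTEGER PRIMARY KEY, sale_date TEXT NOT NULL);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, segment TEXT NOT NULL);
CREATE TABLE fact_sales (
    product_id  INTEGER NOT NULL REFERENCES dim_product(product_id),
    period_id   INTEGER NOT NULL REFERENCES dim_period(period_id),
    customer_id INTEGER NOT NULL REFERENCES dim_customer(customer_id),
    quantity    INTEGER NOT NULL,
    amount      REAL    NOT NULL
);
"""


def build_tdm() -> sqlite3.Connection:
    """Create an in-memory TDM with one fact table and three dimensions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(STAR_SCHEMA)
    # Made-up sample rows, only so the query below has something to return.
    conn.execute("INSERT INTO dim_product VALUES (1, 'Cola 24-pack')")
    conn.execute("INSERT INTO dim_period VALUES (1, '2013-06-01')")
    conn.execute("INSERT INTO dim_customer VALUES (1, 'retail')")
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
        [(1, 1, 1, 2, 20.0), (1, 1, 1, 3, 30.0)],
    )
    return conn


def sales_by_product(conn: sqlite3.Connection) -> list[tuple]:
    """Join the fact table to the product dimension and total the sales."""
    return conn.execute(
        """
        SELECT p.name, SUM(f.quantity), SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.name
        """
    ).fetchall()
```

Each additional transaction type would get its own fact table that reuses the same three dimensions.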

The retention period for transactions in the TDM can be set as required to comply with local government regulations, such as law enforcement requirements for a telecommunications company. Each country has different laws about storing, accessing, and moving detail data across international borders, so keeping these data marts local enables companies to meet each country's requirements.
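A per-country retention rule might look like the sketch below. The country codes and retention windows are invented for illustration and do not reflect any actual regulation; the `purge` function and the transaction layout are likewise assumptions.

```python
from datetime import date, timedelta

# Hypothetical retention windows in days. Real values come from legal
# counsel in each country, not from this sketch.
RETENTION_DAYS = {"DE": 180, "US": 365}


def purge(transactions: list[dict], country: str, today: date) -> list[dict]:
    """Return only the transactions still inside the retention window."""
    cutoff = today - timedelta(days=RETENTION_DAYS[country])
    return [t for t in transactions if t["date"] >= cutoff]
```

Keeping the window in one table per country makes it straightforward to adjust a single region when its regulations change.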

Enterprise data warehouses

The enterprise data warehouse (EDW) is composed of a system of record, implemented as an entity-relationship or star-schema model, and a series of data marts or cubes. The EDW is highly robust and designed to support hundreds or thousands of end users. The data warehouse and cubes are expected to provide information on demand to all potential end users. The EDW holds daily or hourly aggregations of transaction data for years by summarizing data from the TDM. Because the data is summarized, and can be made anonymous, the EDW can hold data for periods that are far longer than those stipulated in legal requirements for data retention, and the data can be moved to locations outside of the country. The daily or hourly aggregations should be sufficient for financial, marketing, and operational reporting requirements, as well as for more advanced analytics such as market segmentation, churn analysis, and targeted marketing.
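The summarization step from TDM to EDW can be sketched as a daily roll-up that leaves customer identifiers behind. The field names and the `daily_summary` function are assumptions. Dropping identifiers is only one step toward anonymity, since very small groups can still identify individuals.

```python
from collections import defaultdict


def daily_summary(transactions: list[dict]) -> list[dict]:
    """Aggregate detail rows per (date, product), dropping customer IDs."""
    totals = defaultdict(lambda: {"quantity": 0, "amount": 0.0})
    for t in transactions:
        key = (t["date"], t["product"])  # customer_id is deliberately ignored
        totals[key]["quantity"] += t["quantity"]
        totals[key]["amount"] += t["amount"]
    # Sort by (date, product) so the output order is stable.
    return [
        {"date": day, "product": product, **sums}
        for (day, product), sums in sorted(totals.items())
    ]
```

An hourly variant would only need a finer-grained time value in the grouping key.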

Data warehouse models

Each of the three data warehouse models (decentralized, distributed, and consolidated) has a place and purpose for an organization, and each has its own advantages and disadvantages. No one approach works for every organization, and in some cases organizations may move from one approach to another as they evolve over time. The decentralized model performs all analytics in each region. The distributed, or federated, model rolls some aggregated data together into data marts at headquarters. The consolidated model moves all data warehouse functions, including point-of-sale (POS) transactions, enterprise resource planning (ERP), and other functions, to the headquarters of an organization.

Decentralized model

In the decentralized data warehouse model, all analytics are performed in each region (see Figure 1). Only very high-level summarized data is brought forward to the headquarters, possibly in the form of spreadsheets.

 
Figure 1. Decentralized model: Analytics in each region for summarized data at headquarters
 

Distributed model

The distributed data warehouse model applies most analytics at the regional level (see Figure 2). Regional reporting is based on the individual data warehouses in each region, both EDWs and TDMs. The data marts at headquarters hold aggregated data from the EDW in each regional office, rolled up to the same granularity. These data marts are generally in the form of reporting cubes.
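Rolling regional aggregates into the same granularity could look like the following sketch, which converts daily regional totals into monthly totals in a single headquarters currency. The conversion rates, field names, and the `roll_up` function are all hypothetical, and a real roll-up would also have to align product and account codes across regions.

```python
from collections import defaultdict

# Hypothetical conversion rates to the headquarters currency.
RATES_TO_HQ = {"EUR": 1.25, "USD": 1.0}


def roll_up(regional_rows: list[dict]) -> dict:
    """Combine regional daily rows into monthly totals in the HQ currency."""
    totals = defaultdict(float)
    for row in regional_rows:
        month = row["date"][:7]  # 'YYYY-MM-DD' -> 'YYYY-MM'
        converted = row["amount"] * RATES_TO_HQ[row["currency"]]
        totals[(month, row["product"])] += converted
    return dict(totals)
```

The result is keyed by month and product, a shape that could feed a reporting cube at headquarters.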

 
Figure 2. Distributed model: Aggregated regional data in summary data marts at headquarters

 

Consolidated model

The consolidated data warehouse model involves creating a single, centralized, and fully integrated data warehouse as the primary method of supporting information needs in the regional units and at headquarters (see Figure 3). The centralized approach, in conjunction with the local transactional data marts, can serve all reporting needs.

 
Figure 3. Consolidated model: A single, centralized, and integrated data warehouse

 
Each of these approaches has specific benefits for organizations. Determining which model to use is more about the culture of the organization and how it manages itself and less about technology, models, and designs.

The concluding installment of this series discusses each model and tells how an organization may benefit from each solution.

Please share any thoughts or questions in the comments.