5 Best Practices for Test Data Management

Extract subsets of production data for continuous testing at warp speed

Manager of Portfolio Strategy, IBM

The days of taking a waterfall approach to application development and delivery are gone. Many organizations are embracing agile methodologies, and as a result they are acknowledging the need for continuous testing. The shift to an increasingly flexible and dynamic development process requires rapid access to the appropriate test data. Managing test data enhances the quality of testing efforts in the following three key ways:

  • Functional testing: Extracting a subset of production data to serve as input values for data-driven tests, or to populate right-sized test databases, means testers spend less time on operational activities and more time on actual testing.
  • Performance testing: Stability, load, benchmark, and other types of sustained tests require hundreds or even thousands of records to keep test runs going over several hours. Automated test data management makes that test data readily available.
  • Service virtualization: Virtualized components require realistic test data to simulate the behavior of the live service or software they are emulating. Leveraging a test data management strategy to subset production data while masking sensitive information meets these requirements.


Essential steps: Streamlined test data management

Implementing a test data management approach involves a few steps. Applying the following five best practices before an application goes to production can help simplify the testing process.

Discover and understand the test data

Data is scattered across systems and resides in different formats. In addition, different rules may be applied to data depending on its type and location. Organizations should identify their test data requirements based on the test cases, which means they must capture the end-to-end business process and the associated data for testing. Capturing the proper test data could involve a single application or multiple applications. For example, a business may have a customer relationship management (CRM) system, an inventory management application, and a financial application that are all related and require test data.
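One lightweight way to capture these requirements is to record, for each test case, which systems and tables it touches, and then roll that up into a per-system inventory. The sketch below illustrates the idea only; the test case names, system names, and table names are hypothetical and are not taken from any particular product.

```python
from collections import defaultdict

# Hypothetical mapping of test cases to the (system, table) pairs they need.
TEST_CASE_DATA = {
    "place_order": [
        ("crm", "customers"),
        ("inventory", "stock_levels"),
        ("finance", "invoices"),
    ],
    "update_address": [("crm", "customers")],
    "month_end_close": [("finance", "invoices"), ("finance", "ledger_entries")],
}


def data_requirements(test_cases):
    """Roll test-case needs up into a sorted list of tables per system."""
    needed = defaultdict(set)
    for sources in test_cases.values():
        for system, table in sources:
            needed[system].add(table)
    return {system: sorted(tables) for system, tables in sorted(needed.items())}


if __name__ == "__main__":
    for system, tables in data_requirements(TEST_CASE_DATA).items():
        print(system, tables)
```

An inventory like this makes it visible when one business process spans several applications, which is the case that needs coordinated test data.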

Extract a subset of production data from multiple data sources

Extracting a subset should yield realistic, referentially intact test data from across a distributed data landscape without added cost or administrative overhead. The best subsetting approaches also capture metadata with the subset so that data model changes can be accommodated quickly and accurately. The result is a set of test databases small enough to support rapid test runs but large enough to reflect the variety of production data. An automated subsetting process should also support creating test data that forces error and boundary conditions, for example by inserting rows and editing database tables, with multilevel undo capabilities.
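To make "referentially intact" concrete, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a hypothetical two-table schema (customers and orders) and copies a chosen set of customers together with only the orders that reference them, so the subset never contains an order without its parent row. A real subsetting tool would walk many more relationships across several data sources.

```python
import sqlite3

SCHEMA = """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total REAL NOT NULL
    );
"""


def make_demo_source() -> sqlite3.Connection:
    """Build a tiny stand-in for a production database (hypothetical data)."""
    source = sqlite3.connect(":memory:")
    source.executescript(SCHEMA)
    source.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Ben"), (3, "Cy")]
    )
    source.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0), (13, 3, 99.5)],
    )
    source.commit()
    return source


def extract_subset(source: sqlite3.Connection, customer_ids: list[int]) -> sqlite3.Connection:
    """Copy the chosen customers and only the orders that reference them."""
    target = sqlite3.connect(":memory:")
    target.execute("PRAGMA foreign_keys = ON")  # reject orphaned orders
    target.executescript(SCHEMA)
    placeholders = ",".join("?" for _ in customer_ids)
    customers = source.execute(
        f"SELECT id, name FROM customers WHERE id IN ({placeholders})",
        customer_ids,
    ).fetchall()
    orders = source.execute(
        f"SELECT id, customer_id, total FROM orders "
        f"WHERE customer_id IN ({placeholders})",
        customer_ids,
    ).fetchall()
    # Parents first, then children, so the foreign key check passes.
    target.executemany("INSERT INTO customers VALUES (?, ?)", customers)
    target.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    target.commit()
    return target


if __name__ == "__main__":
    subset = extract_subset(make_demo_source(), [1, 3])
    print(subset.execute("SELECT * FROM orders ORDER BY id").fetchall())
```

Selecting parent rows first and then following the foreign key to the child rows is what keeps the subset consistent; picking random rows from each table independently would not.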

Mask or de-identify sensitive test data

Masking helps secure sensitive corporate, client, and employee information and also helps ensure compliance with government and industry regulations. Capabilities for de-identifying confidential data must provide a realistic look and feel, and should consistently mask complete business objects such as customer orders across test systems.
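One common way to get consistent masking is to derive the replacement value deterministically from the original, so the same customer receives the same fictional name in every test system. The sketch below assumes a keyed hash (HMAC-SHA256) and a small hypothetical pool of replacement names. It illustrates the consistency property only and does not describe how any specific masking product works.

```python
import hashlib
import hmac

# Hypothetical pools of realistic-looking replacement values.
FAKE_FIRST = ["Alex", "Blake", "Casey", "Drew", "Emery", "Finley"]
FAKE_LAST = ["Reed", "Stone", "Vale", "West", "Young", "Zane"]


def mask_name(real_name: str, secret: bytes) -> str:
    """Map a real name to a fictional one; the same input gives the same output."""
    digest = hmac.new(secret, real_name.encode("utf-8"), hashlib.sha256).digest()
    first = FAKE_FIRST[digest[0] % len(FAKE_FIRST)]
    last = FAKE_LAST[digest[1] % len(FAKE_LAST)]
    return f"{first} {last}"


def mask_record(record: dict, secret: bytes) -> dict:
    """Return a copy of the record with only the customer name masked."""
    masked = dict(record)
    masked["customer_name"] = mask_name(record["customer_name"], secret)
    return masked


if __name__ == "__main__":
    key = b"test-environment-secret"  # assumption: shared by all masking jobs
    crm_row = {"customer_name": "Jane Doe", "segment": "retail"}
    order_row = {"customer_name": "Jane Doe", "order_id": 501}
    print(mask_record(crm_row, key))
    print(mask_record(order_row, key))
```

Because both rows are masked with the same key, the customer order still lines up with its CRM record after masking. With pools this small, different real names will sometimes collide on the same fictional name, so a production-grade approach would need a much larger value space.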

Automate expected and actual result comparisons

The ability to identify data anomalies and inconsistencies during testing is essential in measuring the overall quality of the application. The most efficient way to achieve this goal is by employing an automated capability for comparing the baseline test data against results from successive test runs—speed and accuracy are essential. Automating these comparisons helps save time and identify problems that might otherwise go undetected.
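As a rough illustration of what such a comparison does, the sketch below diffs a baseline result set against the results of a later run, both keyed by a record identifier. The function name and the record layout are assumptions made for this example.

```python
def compare_results(baseline: dict, actual: dict) -> dict:
    """Report records that are missing, unexpected, or changed versus the baseline."""
    missing = sorted(baseline.keys() - actual.keys())
    unexpected = sorted(actual.keys() - baseline.keys())
    changed = {
        key: (baseline[key], actual[key])
        for key in baseline.keys() & actual.keys()
        if baseline[key] != actual[key]
    }
    return {"missing": missing, "unexpected": unexpected, "changed": changed}


if __name__ == "__main__":
    baseline = {
        "claim-1": {"status": "paid", "amount": 120.0},
        "claim-2": {"status": "paid", "amount": 80.0},
        "claim-3": {"status": "denied", "amount": 0.0},
    }
    actual = {
        "claim-1": {"status": "paid", "amount": 120.0},
        "claim-2": {"status": "paid", "amount": 85.0},
        "claim-4": {"status": "open", "amount": 10.0},
    }
    print(compare_results(baseline, actual))
```

Running a check like this after every test run surfaces silent data changes that a pass/fail assertion on a single field would miss.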

Refresh test data

During the testing process, test data often diverges from the baseline, which can result in a less-than-optimal test environment. Refreshing test data helps improve testing efficiencies and streamline the testing process while maintaining a consistent, manageable test environment.
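A refresh can be as simple as overwriting the test database with a saved baseline copy. The sketch below uses the backup method of Python's sqlite3 connections to do that. The claims table and its rows are hypothetical, and an enterprise environment would refresh many systems in a coordinated way rather than one in-memory database.

```python
import sqlite3


def make_baseline() -> sqlite3.Connection:
    """Create a small baseline database (hypothetical schema and rows)."""
    baseline = sqlite3.connect(":memory:")
    baseline.execute(
        "CREATE TABLE claims (id INTEGER PRIMARY KEY, state TEXT, status TEXT)"
    )
    baseline.executemany(
        "INSERT INTO claims VALUES (?, ?, ?)",
        [(1, "NV", "open"), (2, "TX", "open")],
    )
    baseline.commit()
    return baseline


def refresh(test_db: sqlite3.Connection, baseline: sqlite3.Connection) -> None:
    """Reset the test database so it matches the baseline again."""
    test_db.rollback()        # discard any uncommitted changes left by a test
    baseline.backup(test_db)  # overwrite the test database with the baseline


if __name__ == "__main__":
    baseline = make_baseline()
    test_db = sqlite3.connect(":memory:")
    refresh(test_db, baseline)

    # A test run drifts away from the baseline...
    test_db.execute("UPDATE claims SET status = 'closed' WHERE id = 1")
    test_db.execute("DELETE FROM claims WHERE id = 2")
    test_db.commit()

    # ...and a refresh brings the data back to the known starting point.
    refresh(test_db, baseline)
    print(test_db.execute("SELECT * FROM claims ORDER BY id").fetchall())
```

Refreshing from a single stored baseline, rather than trying to undo individual changes, keeps every test run starting from the same known state.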


Case study: The importance of test data management

Proper test data management can be an essential process for cost-effective continuous testing. Consider the following scenario in a US insurance company [1]. The director of software quality was fed up because lead project managers and quality assurance (QA) staff were complaining almost daily about the amount of time they spent acquiring, validating, organizing, and protecting test data.

Complicated front-end and back-end systems in this scenario consistently caused budget overruns. Contingency plans were being built into project schedules because the team expected test data failures and reworking. Project teams added 15 percent to all test estimates to account for the effort to collect data from back-end systems, and 10 percent of all test scenarios were not executed because of missing or incomplete test data. Costly production defects were the result.

With 42 back-end systems needed to generate a full end-to-end system test, the organization in this example could not confidently launch new features. Testing in production was becoming the norm. In fact, claims could not be processed in certain states because of application defects that the teams skipped over during the testing process. Moreover, IT was consuming an increasing number of resources, yet application quality was declining rapidly.

The insurance company in this scenario clearly lacked a test data management strategy aligned to business results. Something had to change. The director of software quality assembled a cross-functional team and asked the following tough questions:

  • What is required to create test data?
  • How much does test data creation cost?
  • How far does the problem extend?
  • How is the high application defect rate affecting the business?

Finding the answers to these questions was an involved process. No one had a complete understanding of the full story.

Through the analysis process, the team in this scenario discovered that requests for test data came too late and with too many redundancies, and that no efficient process existed to fulfill all of these requests. Teams would reuse old test data because of the effort involved in getting new test data, but using old test data often resulted in a high number of defects. In addition, the security risks of exposing sensitive data during testing were rampant.

After fully analyzing the problems, the team in this example concluded that with every new USD14 million delivery, a hidden USD3 million was spent on test data management. Hidden costs were attributed to the following sources:

  • Labor required to move data to and from back-end systems and to identify the right data required for tests
  • Time spent manipulating data so it would work for various testing scenarios
  • Storage space for the test data
  • Production defects in functionality that went untested because test data was not available
  • Masking sensitive data to protect privacy
  • Skipped test scenarios

After implementing a process to govern test data management, the insurance company in this scenario was able to reduce the costs of testing by USD400,000 annually. The organization also implemented IBM solutions to help deliver comprehensive test data management capabilities for creating fictionalized test databases that accurately reflect end-to-end business processes.

Today, the insurance company in this example can easily refresh test systems from across the organization in record time while finding defects in advance. The organization now has the enhanced ability to process claims across all 50 states cost-effectively. Testing in production is no longer the norm. In this scenario, implementing test data management not only helped the organization achieve significant cost savings but also helped reduce untested scenarios by 44 percent during a 90-day period and minimize required labor by 42 percent annually.

The insurance company in this case study scenario now has an enterprise test data process that helps reduce costs, improve predictability, and enhance testing—including enabling automation, cloud testing, mobile testing, and more. People, processes, and technologies came together to make a real change.

If you have any thoughts or questions about implementing a test data management strategy, please share them in the comments.

[1] The information and data provided in the case study were drawn from "A Business View: The Cost of Managing Test Data," a speaker presentation at the Information On Demand 2012 conference, Las Vegas, Nevada, October 21–25, 2012.