Implement Data Masking to Protect Sensitive Data: Part 1

Successful obfuscation of sensitive data on an enterprise scale requires careful consideration of requirements

Enterprises must comply with the customer data privacy and security requirements stipulated by the Payment Card Industry Data Security Standard (PCI DSS), the Health Insurance Portability and Accountability Act of 1996 (HIPAA), and other standards and regulations. As a result, they are looking for the right approach to data masking, particularly when transferring data out of secured production environments to less-secured test environments or other systems. Data masking, or data obfuscation, is the process of de-identifying or scrambling specific data elements to protect them from unauthorized access by specific groups of end users.

The first installment of this two-part series looks at the requirements and considerations for data masking strategies. The concluding part of this series discusses implementation and the challenges of data masking approaches that a typical enterprise may experience.

Data masking requirements

Protecting data is critical for any organization because a breach in data privacy or security can lead to direct and indirect financial loss, including reputational damage that can severely jeopardize customer loyalty. Masking data is therefore rapidly becoming an enterprise-level data management requirement. Sensitive data elements need to be carefully identified, and the required masking measures need to be in place, whenever customer information is extracted. Typically, this sensitive information includes personally identifiable information (PII), health records, general customer records, and any other sensitive business data that requires protection.

Daily business and technology operations—such as loading data to test environments, migrating data to advanced systems, and reporting to vendors and third-party organizations for sampling—call for data masking methodologies that are standard and reusable across the enterprise.

Data masking approach

An approach to masking sensitive data elements should take the following considerations into account:

  • The output should represent the source data. Masked values need to resemble the source data in format and characteristics so that the data can be used effectively for development and testing.
  • The masked data needs to be irreversible. It must not be possible for recipients of the masked data to re-create the source data from it.
  • Understand which data to mask. Not every data element has to be masked; non-sensitive data needs masking only if it could help re-create sensitive data.
  • The output must be repeatable. The same source data, masked repeatedly by the same masking methodology, must yield the same output.
  • Maintain referential integrity. A value that links records, such as a customer identifier used as a key, must be masked to the same value everywhere it appears so that the masked data remains usable across related tables and systems.
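
The repeatability and referential integrity considerations can be illustrated with a minimal Python sketch. It is not a prescribed implementation: it assumes a keyed hash (HMAC-SHA-256) as the masking function, and the key value, the `mask_customer_id` helper, and the sample records are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret; a real deployment would load it from a protected key store.
MASKING_KEY = b"example-masking-key"


def mask_customer_id(customer_id: str, key: bytes = MASKING_KEY) -> str:
    """Return a repeatable, all-digit replacement with the same length as the input."""
    digest = hmac.new(key, customer_id.encode("utf-8"), hashlib.sha256).digest()
    width = len(customer_id)
    # Reduce the digest to `width` decimal digits so the format is preserved.
    number = int.from_bytes(digest, "big") % (10 ** width)
    return str(number).zfill(width)


# Hypothetical source records that share a key column.
customers = [{"customer_id": "100234", "segment": "retail"},
             {"customer_id": "100777", "segment": "corporate"}]
orders = [{"order_id": 1, "customer_id": "100234"},
          {"order_id": 2, "customer_id": "100777"},
          {"order_id": 3, "customer_id": "100234"}]

# Mask the key column in both tables with the same function and key.
masked_customers = [{**c, "customer_id": mask_customer_id(c["customer_id"])}
                    for c in customers]
masked_orders = [{**o, "customer_id": mask_customer_id(o["customer_id"])}
                 for o in orders]

# Every masked order still points at a masked customer, so joins keep working.
known_ids = {c["customer_id"] for c in masked_customers}
print(all(o["customer_id"] in known_ids for o in masked_orders))
```

Because the same input and key always produce the same output, the two tables can be masked independently and still join. Reducing the digest to a few digits means two different identifiers can occasionally collide, so a real implementation would need to detect or rule out collisions on key columns.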

Various standard masking methodologies are available. Techniques that preserve the format and context of the data elements are preferable because meaningful, consistent, and repeatable data is important for effective use downstream. The following methodologies can be used:

  • Substitution: This methodology replaces the value of the sensitive field with another meaningful value. For example, a postal code can be replaced with one chosen at random from a set of valid postal codes.
  • Masking, nullifying, and spacing: In this case, the data is replaced with a nonmeaningful value. For example, a Social Security number can be replaced by XXX-XX-XXXX. Encryption keys provide another example in which the data can easily be replaced with spaces.
  • Number and date variance: This methodology involves modifying each number or date value by some random percentage of its real value. It provides reasonable obfuscation while still keeping the range and distribution of values. For example, an employee salary value may vary by up to 15 percent in either a positive or negative direction. Similarly, a date of birth value can be represented by a random date within a range of 45 days before or after the actual birth date.
  • Format-preserving encryption: This method produces repeatable values that preserve the original format. The original value can be retrieved only by using the appropriate decryption method and key. The choice of algorithm, however, depends on the enterprise’s information security policies. Algorithms that can be leveraged include the Advanced Encryption Standard (AES) with a clear 128-bit key and the Triple Data Encryption Standard (3DES) with a clear or secure 24-byte key. The Secure Hash Algorithm (SHA) family can also be used, but it is a one-way hash rather than encryption: its output is repeatable but cannot be decrypted and does not by itself preserve the original format.
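
The first three methodologies can be sketched in a few lines of Python. This is an illustration under stated assumptions only: the function names, the sample postal codes, and the seeded random generator are hypothetical, and the percentages and day ranges are the examples given in the list above.

```python
import random
from datetime import date, timedelta

# Hypothetical set of valid replacement values for substitution.
VALID_POSTAL_CODES = ["10001", "60601", "94105", "73301"]


def substitute_postal_code(rng: random.Random) -> str:
    """Substitution: pick another meaningful value from a valid set."""
    return rng.choice(VALID_POSTAL_CODES)


def redact_digits(value: str) -> str:
    """Masking: replace every digit with X and keep separators such as hyphens."""
    return "".join("X" if ch.isdigit() else ch for ch in value)


def vary_number(value: float, max_pct: float, rng: random.Random) -> float:
    """Number variance: shift the value by a random percentage within +/- max_pct."""
    factor = 1 + rng.uniform(-max_pct, max_pct) / 100
    return round(value * factor, 2)


def vary_date(value: date, max_days: int, rng: random.Random) -> date:
    """Date variance: shift the date by a random number of days within +/- max_days."""
    return value + timedelta(days=rng.randint(-max_days, max_days))


rng = random.Random(2024)  # fixed seed only so the demonstration can be rerun
print(substitute_postal_code(rng))
print(redact_digits("123-45-6789"))  # XXX-XX-XXXX
print(vary_number(50000.00, 15, rng))
print(vary_date(date(1980, 6, 15), 45, rng))
```

Plain random variance does not by itself satisfy the repeatability consideration. One option, offered here as an assumption rather than a recommendation from the article, is to seed the generator from a keyed hash of the source value so that the same input always receives the same offset.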

Enterprise-scale data masking

Although data masking requirements need to follow enterprise information security policies, the implementation in any one enterprise depends on the particular scenarios it has to support. The concluding installment of this series offers details about implementing data masking in enterprise-scale environments.

Please share any thoughts or questions in the comments.