Maintaining Privacy in the Era of Big Data

Product Marketing, Information Integration and Governance, IBM

This week at IBM’s Information On Demand conference, IBM announced a solution for one of the largest concerns in big data today: data security and privacy. InfoSphere Data Privacy for Hadoop is the first solution in the market to offer a full set of capabilities designed to protect sensitive information in big data analysis.

Having Confidence that Big Data is Secured and Protected

When it comes to security and privacy, ‘big data’ and more traditional forms of data have much in common. Both are likely to have some element of sensitivity: a customer record, a financial transaction or even supplier information may constitute sensitive data worth protecting. Both must adhere to compliance mandates. And both must adequately address security concerns raised by internal and external stakeholders, including auditors, risk officers, security and other business professionals.

There are, however, additional security and privacy considerations with big data projects, given the nature of such initiatives:

Are you confident that all sensitive data has been identified? While many organizations feel confident that they know where their sensitive data resides (for better or for worse), bringing together big data volumes from outside the firewall, department or even a single system can heighten risk dramatically. Risk could arise from sensitive data unknowingly being included as we load a big data environment, or it could actually be created when combining once-disparate data sources. (Imagine bringing together relatively safe datasets only to expose relationships in the data that in fact reveal PII, or Personally Identifiable Information.)

Are you confident that sensitive data can remain private? Maintaining privacy while ensuring big data remains useful in analysis will continue to be a formidable challenge for any organization. On the one hand, using sensitive data such as patient information in big data projects could yield exciting new findings for diagnosis, treatment or even disease prevention. On the other, lumping together sensitive healthcare information in a Hadoop platform with many potential downstream users raises a number of concerns, not least of which are stringent HIPAA requirements.

Are you confident that sensitive data is being protected? As organizations break new ground in big data analysis, it’s clear that the analysis of some forms and volumes of sensitive data will be critical to forming new, valuable big data insights. For this reason, big data platforms are among the primary targets for those looking to steal sensitive data assets. In order to address that threat and promote sharing of sensitive data across the enterprise, heightened levels of monitoring and control will be needed for these big data projects.

InfoSphere Data Privacy for Hadoop

InfoSphere Data Privacy for Hadoop builds organizational confidence in big data security and privacy. Masking technology de-identifies sensitive data while maintaining its utility for analysis. Redaction technology addresses privacy requirements in unstructured data (forms and documents). Hadoop monitoring and auditing offer organizations a real-time understanding of the user and data activity occurring in the environment. The combined solution delivers the capabilities to secure and protect Hadoop environments from the threats and risks of using sensitive data in analysis.
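
To make the idea of masking that preserves utility more concrete, here is a minimal Python sketch. It is an illustration of the general technique only, not IBM's implementation or any InfoSphere API. The function name mask_digits and the key value are hypothetical. The sketch replaces each digit with a keyed, pseudorandom digit, so the masked value keeps its length and punctuation, and the same input always yields the same output (which lets masked datasets still be joined on that field).

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real deployment would
# store and rotate keys in a proper key management system.
SECRET_KEY = b"example-key-not-for-production"


def mask_digits(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace every ASCII digit in `value` with a keyed pseudorandom digit.

    Non-digit characters (dashes, spaces, letters) are left untouched,
    so the masked value keeps the original length and layout.
    """
    # HMAC-SHA256 of the whole value gives 32 bytes of keyed, repeatable output.
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    masked = []
    used = 0  # how many digest bytes have been consumed so far
    for ch in value:
        if ch in "0123456789":
            # Map the next digest byte onto a single digit (slightly biased,
            # which is acceptable for a sketch but not for production use).
            masked.append(str(digest[used % len(digest)] % 10))
            used += 1
        else:
            masked.append(ch)
    return "".join(masked)


if __name__ == "__main__":
    # Example input in a US Social Security number layout (made-up value).
    print(mask_digits("123-45-6789"))
```

This sketch is one-way and is not true format-preserving encryption; a production masking tool would also need to handle referential integrity across systems, data types other than digits, and key management.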

For more information on InfoSphere Data Privacy for Hadoop, visit IBM’s Data Security and Privacy webpage, or contact us today!