Sample-based analysis: A new approach for unstructured data management

Offering Manager, IBM

Data is not just a valuable resource today; it is fueling digital transformation at organizations worldwide. Yet only 15 percent of businesses get the value they need from their data. For the rest, 80 percent of their data remains locked in silos or is not business-ready. As data volumes keep growing, the opportunity to drive digital transformation is slipping away from organizations that cannot manage their data.

But you can’t manage data if you don’t know what it is or where it’s stored.

According to Forrester, there has been a significant jump in the number of global data and analytics decision-makers who reported storing more than 100 terabytes within their company data centers—from 30 percent in 2016 to 61 percent in 2017. And 12 percent report storing 5 petabytes or more. For most companies, data is growing at a very fast rate—and so are the challenges of managing it. 

According to an IBM study, aiding data discovery and ensuring data accuracy are the top priorities for businesses. The growing number of privacy and data protection regulations around the world is making data management and data protection a strategic requirement for every organization. It’s not just the GDPR that impacts global businesses. The landscape continues to evolve in many countries, with regulations such as CCPA in the U.S., LGPD in Brazil and many more.

These new and upcoming data privacy regulations are forcing companies, regardless of size, to assess and manage sensitive and personal data across the entire organization. However, with hundreds of terabytes or even petabytes of unstructured data, companies often do not know where or how to get started. What should they do to drive value from their data? How can they become confident about their compliance and audit readiness in a timely manner?

Traditional ways to assess unstructured data have tended to require an analyst to examine every file and every character in it, so an assessment can take months or even years, depending on the amount of data. IBM has developed a solution to this problem: IBM Watson Knowledge Catalog InstaScan. It’s a new product that combines statistical sampling with unstructured data management for cloud data sources to reduce time to value and help you accelerate your journey towards regulatory compliance readiness.

Watson Knowledge Catalog InstaScan leverages native indexes to give you quick visibility into cloud data. It uses sample-based analysis to run risk assessments and compliance checks on business data. Through sample-based analysis, you can quickly discover where potential regulatory hotspots – like personal information – exist within your data footprint and prioritize cleanup efforts.  
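To make the idea of sample-based analysis concrete, here is a minimal sketch in Python. It is not InstaScan's actual algorithm, which this article does not describe. It assumes documents are available as plain strings, uses a hypothetical email pattern as a stand-in for a personal-information detector, and reports a Wilson score interval (a standard statistical technique chosen here for illustration) around the estimated share of documents that contain sensitive data.

```python
import math
import random
import re

# Hypothetical stand-in for a personal-information detector: flags email addresses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def contains_pii(text):
    """Return True if the text matches the illustrative PII pattern."""
    return EMAIL_RE.search(text) is not None


def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95% confidence)."""
    if n <= 0:
        raise ValueError("sample must contain at least one document")
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def estimate_pii_rate(documents, sample_size, seed=0):
    """Scan a simple random sample and estimate the share of documents with PII."""
    rng = random.Random(seed)
    sample = rng.sample(documents, min(sample_size, len(documents)))
    hits = sum(contains_pii(doc) for doc in sample)
    low, high = wilson_interval(hits, len(sample))
    return hits / len(sample), low, high
```

Calling `estimate_pii_rate(documents, 500)` on a large collection scans only 500 documents, yet gives an estimate with an explicit uncertainty range. Running the same estimate per folder or per data source is one way to rank potential hotspots for cleanup.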

After performing a risk assessment with Watson Knowledge Catalog InstaScan, you can look deeper into the areas of your business where sensitive data was found to be most prevalent. You can also use Watson Knowledge Catalog InstaScan to run a compliance check on a presumed-clean data source and create a report that shows whether you are in compliance with governance obligations.

To summarize, Watson Knowledge Catalog InstaScan can help organizations to:

  • Define policies for assessing their cloud data sources.
  • Identify risk spots within their data and prioritize remediation action.
  • Gain more confidence in their data by periodically performing risk assessment.
  • Decide if a chosen data set contains potential policy violations by using industry-leading algorithms with the desired level of confidence and probability of error.
  • Download reports after performing risk assessments and compliance checks.
  • Reduce time in gathering data to help prepare for audit and regulatory checks.
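
The "desired level of confidence and probability of error" mentioned in the list above can be illustrated with a short calculation. The sketch below is an assumption for illustration, not the product's documented method: it uses zero-failure acceptance sampling, where a data source is treated as clean only if every randomly sampled file is clean. If the true violation rate were at least p, the chance that n independent random samples all come back clean is at most (1 - p)^n, so n is chosen to push that chance below 1 minus the desired confidence. The function name and parameters are hypothetical.

```python
import math


def clean_sample_size(max_violation_rate, confidence):
    """Number of clean random samples needed to conclude, at the given confidence,
    that the violation rate is below max_violation_rate.

    Assumes independent random draws from a large population.
    """
    if not 0 < max_violation_rate < 1:
        raise ValueError("max_violation_rate must be between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    alpha = 1 - confidence  # accepted probability of a wrong "clean" verdict
    return math.ceil(math.log(alpha) / math.log(1 - max_violation_rate))
```

For example, `clean_sample_size(0.01, 0.95)` returns 299: if 299 randomly chosen files are all clean, there is at most a 5 percent chance that 1 percent or more of the files in the source contain violations. The required sample size depends on the chosen confidence and threshold rather than on the total number of files, which is why sampling can be so much faster than a full scan.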

Learn more about Watson Knowledge Catalog InstaScan and how it can help your business. Now is the time to build on the analytics foundations you have already invested in. Ensure you can protect, govern and truly know your data.

Notice: Clients are responsible for ensuring their own compliance with various laws and regulations, including GDPR. IBM does not provide legal advice and does not represent or warrant that its services or products will ensure that clients are in compliance with any law or regulation.