There is No Single Version of the Truth

IBM Fellow, Chief Scientist for Entity Analytics, IBM

Truth is in the eye of the beholder

Data is at the core of the insurance industry. It always has been. For insurers embarking on their own big data journeys, we are sharing the post below, excerpted from an article published by IBM Fellow Jeff Jonas in Insurance Day magazine (Oct. 17, 2012). In it, Jeff discusses how to embrace data and find value in all of the stories your data is telling you. There doesn’t have to be only one version of the truth. Enjoy.

– Kim Minor, IBM Worldwide Industry Marketing Manager for Insurance

I once built a data warehouse that was being fed daily by more than 4,000 disparate operational systems belonging to a handful of widely recognized consumer brands.

The goal was to understand the customer better by recognizing when the same person was transacting across different brands held by the same holding company. The underlying motivation: the more fully the customer is understood, the better you can serve the customer. Although the brand marketing executives all worked for the same parent company, there was one question no one could agree on: when a consumer has transacted with every brand using a slightly different name or address, which name and address should be considered the enterprise-wide gold standard?

As it turns out, there is no such thing as a single version of the truth.

The “best” data depends on its source and purpose. A company may hold employee data in several systems, such as IT, HR, and finance, but the employee name and address maintained by the payroll system is probably the best one to use for tax filing.
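To make the idea concrete, here is a minimal Python sketch, assuming a hypothetical schema in which every attribute value is kept alongside the system that reported it. The Observation class, the SOURCE_PREFERENCE table, and the purpose names are illustrative inventions, not part of any IBM product. The “best” value is chosen at read time according to the purpose, rather than fixed once for the whole enterprise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One attribute value as reported by one source system (hypothetical schema)."""
    source: str     # e.g. "payroll", "hr", "it"
    attribute: str  # e.g. "address"
    value: str


# Hypothetical per-purpose trust order: which source to prefer for a given use.
SOURCE_PREFERENCE = {
    "tax_filing": ["payroll", "hr", "it"],
    "it_helpdesk": ["it", "hr", "payroll"],
}


def best_value(observations, attribute, purpose):
    """Pick the value of `attribute` from the most trusted source for `purpose`.

    Every observation stays in `observations`; nothing is discarded, and the
    choice is made only when the value is read.
    """
    ranking = SOURCE_PREFERENCE[purpose]
    candidates = [o for o in observations
                  if o.attribute == attribute and o.source in ranking]
    if not candidates:
        return None
    return min(candidates, key=lambda o: ranking.index(o.source)).value


employee = [
    Observation("payroll", "address", "12 Oak St"),
    Observation("it", "address", "12 Oak Street, Apt 3"),
]
print(best_value(employee, "address", "tax_filing"))   # 12 Oak St
print(best_value(employee, "address", "it_helpdesk"))  # 12 Oak Street, Apt 3
```

Because nothing is overwritten, the tax-filing view and the help-desk view can disagree about the address without either one being wrong.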

How do organizations go about reconciling multiple versions of the truth? Many data aggregation efforts rely on a so-called “merge-purge” approach. What we really need is an approach called entity resolution.

Merge-purge systems are traditionally batch-oriented: input files are compared, and the result is a de-duplicated output file. To account for changes, the input files are periodically reprocessed in their entirety, so the “single version” these systems offer drifts in accuracy between scheduled reloads.
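A merge-purge run can be sketched in a few lines of Python. This is a simplified illustration, not any particular vendor’s algorithm: the match key (a lower-cased email address) and the rule for picking which record survives (the most recently updated one) are hypothetical.

```python
from collections import defaultdict


def merge_purge(records):
    """Batch de-duplication with a survivorship rule (illustrative only).

    Records are grouped by a hypothetical match key (lower-cased email); within
    each group only the most recently updated record survives and the rest
    are dropped.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec["email"].strip().lower()].append(rec)
    return [max(group, key=lambda r: r["updated"]) for group in groups.values()]


all_records = [
    {"email": "Ann@x.com", "name": "Ann Lee", "updated": "2012-01-10"},
    {"email": "ann@x.com", "name": "Ann M. Lee", "updated": "2012-06-02"},
    {"email": "bob@y.com", "name": "Bob Ray", "updated": "2012-03-15"},
]
snapshot = merge_purge(all_records)
print(sorted(r["name"] for r in snapshot))  # ['Ann M. Lee', 'Bob Ray']
```

The output is a snapshot: a record that arrives after the run stays invisible until the next full reload, such as `merge_purge(all_records + new_records)`.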

Entity resolution systems, by contrast, are generally designed to handle real-time updates. They deliver a dynamic data store of disambiguated entities that is current to the second.
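A real-time resolver instead processes each record as it arrives. The sketch below is a minimal in-memory Python illustration under a hypothetical matching rule (same normalized email); EntityStore and its methods are invented for this example, and production systems typically compare many features and persist their state.

```python
class EntityStore:
    """In-memory sketch of incremental (real-time) entity resolution.

    Each record is resolved the moment it arrives. The matching rule,
    same normalized email, is a hypothetical simplification.
    """

    def __init__(self):
        self.entities = {}  # entity id -> every record resolved to it
        self.index = {}     # normalized email -> entity id
        self._next_id = 1

    def add(self, record):
        """Resolve one record immediately and return its entity id."""
        key = record["email"].strip().lower()
        entity_id = self.index.get(key)
        if entity_id is None:
            # No match yet: this record starts a new entity.
            entity_id = self._next_id
            self._next_id += 1
            self.index[key] = entity_id
            self.entities[entity_id] = []
        self.entities[entity_id].append(record)
        return entity_id


store = EntityStore()
print(store.add({"email": "Ann@x.com", "brand": "Brand A"}))  # 1
print(store.add({"email": "ann@x.com", "brand": "Brand B"}))  # 1
print(store.add({"email": "bob@y.com", "brand": "Brand A"}))  # 2
```

There is no reload step: after each call to `add`, the store already reflects the new record.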

Merge-purge systems often use data survivorship rules to determine which values are kept and which are discarded or archived.

Entity resolution systems generally retain every record and attribute, each with its associated attribution, that is, the source it came from. Because they perform no data survivorship processing, there is no risk that data which later proves relevant will be prematurely discarded.
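The difference between survivorship and retention can be sketched in Python as follows. The record layout and the “newest value wins” rule are hypothetical choices made for illustration only.

```python
from collections import defaultdict


def survive(records):
    """Survivorship (illustrative rule): keep only the newest value per attribute."""
    kept = {}
    for rec in sorted(records, key=lambda r: r["seen"]):
        kept[rec["attribute"]] = rec["value"]  # later values overwrite earlier ones
    return kept


def retain(records):
    """Retention: keep every value together with its source and date."""
    history = defaultdict(set)
    for rec in records:
        history[rec["attribute"]].add((rec["value"], rec["source"], rec["seen"]))
    return history


def known_values(history, attribute):
    """All values ever observed for `attribute`, regardless of source."""
    return {value for value, _source, _seen in history.get(attribute, set())}


records = [
    {"attribute": "name", "value": "Jon Smith", "source": "Brand A", "seen": "2011-05-01"},
    {"attribute": "name", "value": "Jonathan Smith", "source": "Brand B", "seen": "2012-03-15"},
]
print(survive(records))                               # {'name': 'Jonathan Smith'}
print(sorted(known_values(retain(records), "name")))  # ['Jon Smith', 'Jonathan Smith']
```

After survivorship only “Jonathan Smith” is left, so a later record that uses “Jon Smith” has nothing to match against; the retained history still holds both names, each tied to its source.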

As data volumes grow, periodically reloading all of one’s data holdings becomes increasingly unsustainable. For this reason, the larger the historical volume of data, the less practical merge-purge systems become.

Entity resolution systems that support real-time and sequence-neutral processing (where the final result is the same regardless of the order in which records arrive) do not depend on periodic reloading for accuracy and currency.
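Sequence neutrality is the harder property: if a later record links two entities that were previously resolved as separate, the system must revise its earlier decision. One way to sketch this in Python, assuming a hypothetical rule that records sharing any key (email or phone) belong to the same entity, is a union-find structure. The class and function names are illustrative, and this is not a description of how any IBM product is built.

```python
from itertools import permutations


class SequenceNeutralResolver:
    """Incremental resolver whose final result does not depend on arrival order.

    Hypothetical rule: records sharing any key value belong to one entity.
    When a new record links two existing entities they are merged, so earlier
    decisions are revised rather than frozen.
    """

    def __init__(self):
        self.parent = {}  # record id -> parent record id (union-find forest)
        self.owner = {}   # key value -> first record id seen carrying it

    def _find(self, rid):
        while self.parent[rid] != rid:
            self.parent[rid] = self.parent[self.parent[rid]]  # path halving
            rid = self.parent[rid]
        return rid

    def _union(self, a, b):
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self.parent[rb] = ra

    def add(self, rid, keys):
        """Resolve one record on arrival, merging any entities it links."""
        self.parent[rid] = rid
        for key in keys:
            if key in self.owner:
                self._union(self.owner[key], rid)
            else:
                self.owner[key] = rid

    def entities(self):
        """Current entities as a set of frozensets of record ids."""
        groups = {}
        for rid in self.parent:
            groups.setdefault(self._find(rid), set()).add(rid)
        return {frozenset(g) for g in groups.values()}


def resolve(records):
    resolver = SequenceNeutralResolver()
    for rid, keys in records:
        resolver.add(rid, keys)
    return resolver.entities()


records = [
    ("r1", {"email:ann@x.com"}),
    ("r2", {"phone:555-0100"}),
    ("r3", {"email:ann@x.com", "phone:555-0100"}),  # links r1 and r2
    ("r4", {"email:bob@y.com"}),
]
outcomes = {frozenset(resolve(list(order))) for order in permutations(records)}
print(len(outcomes))  # 1: every arrival order produces the same entities
```

Because the entities are the connected components of the graph linking records to shared keys, every arrival order yields the same result, and no periodic reload is needed to correct earlier choices.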

The right method for the right mission

Merge-purge systems are well suited to activities that can live with snapshots, such as direct mail marketing and monthly reporting. But entity resolution systems are best suited for real-time missions where processes require access to the most accurate and most current information.

While “a single version of the truth” might sound reassuring, relying on such a strategy can seriously impede real-time businesses. The more real-time our business world grows, the clearer this becomes. For many missions, it’s time to embrace a plural version of the truth.



Jeff Jonas is an IBM Fellow and chief scientist at the IBM Entity Analytics Group and blogs at