Key Considerations for Third-Party Information Sources

IBM’s experience with data sources informs a timely primer as the number of external data sources grows

Solution CTO, IBM

A common question from many client organizations is, “How should we be thinking about using third-party data?” Third-party data sourcing builds on the realization that valuable data is not always internal and proprietary. It has become a hot topic as organizations increasingly compete on analytics-driven business strategies. The third-party sourcing of information is not a new topic, but the open data movement, combined with a wave of data monetization initiatives from large commercial organizations that are making new data sets available for use, is rapidly changing the art of the possible. This change has led to a rise of data aggregators that assemble, often anonymize, and catalog a wider range of data sources than was previously available. Both the open data and data monetization–driven availability of data are accelerating, and the type and amount of information that the Internet of Things movement is expected to provide are staggering.

The core idea of using open data or creating a new information source by pooling data from multiple sources requires a combination of business acumen, big data technology competency, and analytics prowess. In IBM’s experience, success with third-party information sources is largely defined by what an organization chooses to ignore. That is, rather than getting caught up in possibilities, organizations need to focus on very specific, measurable business outcomes and then experiment within that construct. Combining internal data sources with a handful of external sources is usually highly productive. It is most beneficial where the external sources provide dimensions that help the modeling reach the confidence levels needed to make actionable and testable decisions, or add new attributes that help build models that better explain behavior and outcomes.

Examples of third-party data that IBM has sourced, or helped source, for its clients include huge volumes of social media data and interactions from Boardreader, Disqus, Facebook, Google, and Twitter. They also include weather data, open government data, intellectual property and device data, merchandise returns data, and many others. In addition, IBM has existing relationships, many of which are with its customers, with all the major information brokers and several specialized niche vendors as well. Information on products, logistics, pricing, and purchasing behavior is being sourced from these providers. IBM also has data sourcing arrangements with major global telecommunications providers for mobile and related communications-oriented information sources that can be of great analytics value, although the quality and reliability of the data from these providers may vary greatly. In IBM’s experience, not all efforts to source external data prove to be fruitful. So what is an enterprise to do?

Taking a pragmatic approach

While there is great potential in these sources of data, a pragmatic approach is necessary to work with these largely unexplored and unexploited sources. That approach should be driven by return on investment (ROI) and should complement, rather than conflict with, the larger information management and analytics agenda discussed here. At this point, organizations should be taking an approach that is rooted in corporate experimentation [1] and that uses minimum viable insight (MVI) methodologies [2].


Organizations should balance the effort of getting started against the effort of operationalizing their use of third-party sourcing. Based on IBM’s experience over the past decade, achieving this equilibrium has depended on the following key considerations:

  • Determining the universe of sources and the organization’s collective cultural acceptance of third-party sources: Organizations need not only to manage the data but also to accommodate internal decision makers’ perceptions of where the data comes from. Oftentimes, the technology and sourcing are not the hard part; managing the internal culture and its bias regarding external data is what can be challenging.
  • Assessing cost versus benefit: Weighing cost is an obvious consideration, but the right ratio is generally not obvious at the outset because the organization has not worked with the source before. IBM strongly recommends taking an evidence-based approach grounded in its often-tested minimum viable insight (MVI) approach to working with new or external data. Experimentation before committing significant time and capital is critical.
  • Applying appropriate staffing, ownership, and sponsorship of third-party sourcing initiatives: As noted previously, the technology needed to support acquisition and analytics of external data is often easier to deal with than the internal culture and business-as-usual approaches.
  • Addressing elasticity, data integration, and data governance: IBM follows the practice of holding third-party data to the same standard as discussed elsewhere in this document and of using the same infrastructure and services capabilities and flexibility, thereby simplifying access and management. For example, using a common data exploration zone for ingesting and analyzing third-party data is recommended to minimize the cost of learning about and validating external sources and third-party sourcing strategies [3].
  • Understanding the difference between real-time and slow-moving decision making: Supplier agreements, reliability, speed of data integration, and the role of inline decision making all need to be factored into how, when, and where third-party information is used. IBM recommends experimentation, including back testing and A/B testing, prior to low-latency or inline vectoring of business decisions based on externally sourced data. The closer a use case comes to acting on data in real time, the more likely it is that organizations should be using their own data sources.
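
The A/B testing recommended in the last consideration can be illustrated with a minimal sketch. It assumes a simple setup that the article does not specify: a control group whose decisions use internal data only, a treatment group whose decisions also use an external source, and a binary outcome such as conversion. The function names, the 0.05 significance threshold, and the counts in the usage note are hypothetical, and a two-proportion z-test is just one common way to compare such groups.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two_sided_p) under H0: both groups share one success rate.

    Group A is the control (internal data only); group B is the
    treatment (internal plus external data). Hypothetical setup.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("group sizes must be positive")
    rate_a = success_a / n_a
    rate_b = success_b / n_b
    # Pooled rate under the null hypothesis of no difference.
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 0.0, 1.0  # no variation at all, so no evidence either way
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def external_source_helps(success_a, n_a, success_b, n_b, alpha=0.05):
    """True only if the treatment group is better and the gap is significant."""
    z, p_value = two_proportion_z_test(success_a, n_a, success_b, n_b)
    return z > 0 and p_value < alpha
```

For example, with hypothetical counts of 200 conversions out of 4,000 in the control group and 260 out of 4,000 in the treatment group, `external_source_helps(200, 4000, 260, 4000)` evaluates to `True`, which would support moving the external source further toward operational use. A back test applies the same comparison to historical decisions rather than to a live split.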

Combining internal, proprietary, industry, and open data raises some interesting privacy considerations in many cases. IBM believes in privacy-by-design principles, and given the flux in privacy best practices, it strongly recommends requiring safeguards on privacy and intellectual property as an up-front design point. Data monetization efforts by major industry players are often new initiatives and can be underdeveloped, so organizations should expect to invest some time in crafting appropriate agreements.

Security and contracts

Requiring security as an up-front design point when designing these sourcing strategies is as important as requiring privacy, especially in the post-Target-breach world. Equally vital is evaluating how potential third-party outsourcing partners manage security as part of the services they provide, as well as the cybersecurity risk and exposure that come from entering into these third-party relationships. Appropriate governance, covering both data management and contractual safeguards, is also a critical consideration. Security policies, tools, and the role that data plays as part of the outsourced service or sourcing need to be understood. IBM strongly recommends that organizations view potential security problems as if they own the entire process because their customers and the media likely will take this view should a problem occur.

Supporting third-party data sources

Another important consideration is understanding how third-party information providers have handled any past incidents. Third-party entities often rely on fourth- and fifth-party information providers to satisfy master services agreements, thereby creating complicated supply chain problems that have the potential to create operational and transactional risk if they are not managed properly. To be clear, IBM is quite supportive of third-party data sourcing. It just recommends making sure everyone is thinking clearly about the considerations before diving in. Based on experience, IBM has learned to pursue sourcing with a realistic eye toward avoiding potential risks and addressing operational considerations.

[1] “Experimentation as a Corporate Strategy for Big Data,” by Tom Deutsch, IBM Data magazine, October 2012.
[2] “A Different Methodology for Big Data,” by Tom Deutsch, IBM Data magazine, October 2013.
[3] “Data Lakes, Analyst Observations, and Reality,” by Tom Deutsch, IBM Data magazine, September 2014.