Big Identity's Double-Edged Sword: Wielding It Responsibly

Identity resolution and big data analytics enhance customer engagement but should be sensitive to privacy

Big Data Evangelist, IBM

Among the principal executive-level sponsors of big data are chief marketing officers (CMOs). Consequently, the fact that many big data repositories are populated primarily with identity information on customers is no surprise. Customer identity information can encompass a vast range of sources, data types, and downstream applications. This data generally serves two broad types of applications: uniquely identifying the customer for security purposes and/or driving personalized engagement with the customer—marketing, sales, order fulfillment, billing, service, and so on.

Where security applications are concerned, core identity information supports varying degrees of customer authentication and permission management. A typical customer identity store includes security-relevant data associated with what each individual uniquely is, what each has in his or her possession, and/or how each behaves. Examples of data associated with what a person is include name, address, profile, social security number, user ID, and fingerprints. Examples of data related to what is uniquely in each person’s possession include passwords, smart tokens, digital certificates, and password-recovery questions and answers. Examples of data representing how each individual uniquely behaves include voice and handwriting dynamics. The range of security-relevant identity factors can expand indefinitely as more aspects of people, possessions, and predispositions are found that can be uniquely or stochastically associated with each person.


Identity resolution for shaping engagement

This customer identity store is of keen interest to any security professional. But what warms the hearts of marketing professionals are the applications of personally identifiable information (PII) in customer engagement. Massive stores of customer PII—big identity repositories—are the lifeblood of modern commerce. To the extent that CMOs’ IT organizations have linked diverse customer records to positive identifiers—a process often known as identity resolution—they can drive finely targeted marketing efforts.

An identity resolution engine connects diverse data sources to probabilistically identify nonobvious relationships across identities that may be associated with particular individuals. Rolling up any particular customer’s various identities across many internal and external data stores demands the massively parallel horsepower, specialized analytics tools, and high-capacity storage of a robust, big data infrastructure.
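
As a rough illustration of what probabilistic matching across sources can look like, the Python sketch below scores pairs of records and groups those that clear a threshold. It is a toy under stated assumptions, not a description of how any commercial engine works: the records, field names, weights, and threshold are all invented for the example, and a production system would add blocking, tuned weights, and far larger scale.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical records from three sources; every value here is invented.
RECORDS = [
    {"id": "crm-1", "name": "Jane A. Doe", "email": "jane.doe@example.com", "handle": ""},
    {"id": "tw-7", "name": "Jane Doe", "email": "", "handle": "janedoe"},
    {"id": "web-3", "name": "J. Doe", "email": "jane.doe@example.com", "handle": "janedoe"},
    {"id": "crm-2", "name": "Robert Smith", "email": "rsmith@example.com", "handle": ""},
]

# Illustrative weights and threshold (assumptions, not tuned values).
WEIGHTS = {"name": 0.3, "email": 0.4, "handle": 0.3}
THRESHOLD = 0.45


def fuzzy(a: str, b: str) -> float:
    """String similarity in [0, 1]; a missing value contributes nothing."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def exact(a: str, b: str) -> float:
    """1.0 when both values are present and equal ignoring case, else 0.0."""
    return 1.0 if a and b and a.lower() == b.lower() else 0.0


def match_score(r1: dict, r2: dict) -> float:
    """Weighted evidence that two records describe the same person."""
    return (
        WEIGHTS["name"] * fuzzy(r1["name"], r2["name"])
        + WEIGHTS["email"] * exact(r1["email"], r2["email"])
        + WEIGHTS["handle"] * exact(r1["handle"], r2["handle"])
    )


def resolve(records, threshold=THRESHOLD):
    """Group record ids whose pairwise score clears the threshold (union-find)."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for r1, r2 in combinations(records, 2):
        if match_score(r1, r2) >= threshold:
            parent[find(r1["id"])] = find(r2["id"])

    clusters = {}
    for r in records:
        clusters.setdefault(find(r["id"]), []).append(r["id"])
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    print(resolve(RECORDS))
    # [['crm-1', 'tw-7', 'web-3'], ['crm-2']]
```

In this sketch the CRM record and the social handle never match each other directly, because they share only a similar name. They end up in the same group because each matches the web record, which is the kind of nonobvious, transitive link described above.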

High-performance identity resolution is already a substantial application in customer data integration, data quality, master data management (MDM), and antifraud applications. Identity resolution, enabled by technologies such as IBM® Entity Analytics Solutions (EAS) incremental context accumulators, leverages advanced algorithms to uniquely match the disparate identities that an individual or group might be using in two or more online communities.

Identity resolution also represents a potential missing link between two important use cases of big data: social media analytics on the one hand and multichannel customer relationship management (CRM) on the other. Social media, other Internet sites, and enterprise repositories manage a sprawling, heterogeneous, disconnected variety of customer PII. The methods available to identify customers can shape engagement with them through social and other channels.


Strategies with privacy sensitivity

However, many real-world, social media–monitoring applications have yet to tap the full power of identity resolution. How else can we match the disparate identities that people have on Twitter, Facebook, and other social channels—both against each other and against the system-of-record identifiers kept on customers in CRM, data warehousing, and other operational platforms? Without the ability to resolve some prospective customer's social-sourced identities, how can we determine whether or not they're an existing customer or a hot prospect? Consequently, bringing identity resolution into social marketing strategies should be a high priority in 2014 and beyond.

The flip side of identity resolution is the potential for high-powered violations of personal privacy. Some say big data is Big Brother’s chief tool for mass surveillance. Others say it opens a Pandora’s box for any grassroots Peeping Tom to pry into other people’s affairs with the most powerful telescope ever invented. A cynic might say that social business—one of the hottest new focus areas in multichannel marketing and engagement—is all about everybody minding—and mining—everybody else’s business.

Privacy concerns are rooted deep in the heart of the online experience, which thrives on freewheeling give-and-take but can easily slip into oversharing, surveillance, and cyberstalking. To the extent that people's social PII can be resolved to a fine degree, customers can be segmented through behavioral fingerprinting, as I discussed in the LinkedIn post, “Big identity? Social graphs enable behavioral fingerprinting.” Because customers are interacting on public social media, they're revealing far more layers of their PII than they realize, even if they're not explicitly sharing anything out of the ordinary. All that other people, such as big data specialists, need to do is correlate the behavioral graphs of individuals and compare them to those of others who are, on some level, similar. Clearly, behavioral fingerprints may be a powerful tool for one-to-one personalization, but they can be easily misused for intrusive targeting.

Businesses should put privacy considerations at the core of their big data strategies, with a specific sensitivity to the uses of identity resolution. Even when customers opt out of allowing collection of their particular PII or using it for marketing purposes, an identity resolution engine could quite possibly infer the missing PII from the PII that's still legitimately in the organization’s possession. By the same token, predictive models such as churn mitigation, upsell, and target marketing might be good enough to recommend next-best actions in spite of the fact that some PII is missing from the customer’s record.
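
One modest way to act on this is to check consent before any record reaches the matching step. The Python sketch below assumes a hypothetical `Consent` record per customer and treats missing consent as a refusal. The class, field names, and sample data are all assumptions made for illustration. Filtering inputs this way does not by itself stop a model from inferring withheld attributes from the data that remains, so it is a starting point rather than a full safeguard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consent:
    """Hypothetical per-customer consent flags (names are illustrative)."""
    allow_collection: bool = True
    allow_marketing: bool = True
    withheld_fields: frozenset = frozenset()


# Default applied when no consent record exists: deny everything.
NO_CONSENT = Consent(allow_collection=False, allow_marketing=False)


def marketing_view(record: dict, consent: Consent):
    """Return only the fields usable for marketing, or None if opted out."""
    if not (consent.allow_collection and consent.allow_marketing):
        return None
    return {k: v for k, v in record.items() if k not in consent.withheld_fields}


def eligible_records(records, consents):
    """Filter and mask records before they are passed to identity resolution."""
    views = []
    for record in records:
        view = marketing_view(record, consents.get(record["id"], NO_CONSENT))
        if view is not None:
            views.append(view)
    return views


if __name__ == "__main__":
    # Invented sample data.
    records = [
        {"id": "c1", "name": "Jane Doe", "email": "jane.doe@example.com"},
        {"id": "c2", "name": "Robert Smith", "email": "rsmith@example.com"},
        {"id": "c3", "name": "Ann Lee", "email": "ann.lee@example.com"},
    ]
    consents = {
        "c1": Consent(withheld_fields=frozenset({"email"})),
        "c2": Consent(allow_marketing=False),
        # "c3" has no consent record, so it is excluded by default.
    }
    print(eligible_records(records, consents))
    # [{'id': 'c1', 'name': 'Jane Doe'}]
```

Defaulting to exclusion when consent is unknown is a deliberate design choice in this sketch: a gap in the consent data then reduces marketing reach instead of exposing someone who never agreed.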

Identity resolution and big data analytics are powerful tools for delivering personalized cross-channel experiences. If an organization has decided to incorporate these tools into its channel strategy, it should not let that power go to its head.
