Here are the quick-hit ponderings that I posted on various LinkedIn big data discussion groups this past week. I opened up three new themes – meaty metadata, decision scientists, and speed of thought – while further developing established themes, including proofs of concept as core appliance acquisition approach and recommendation engines:
July 30 Meaty metadata?
The massive volumes and lightning velocity of big data are issues for scaling, performance, and resource provisioning. The dizzying varieties of big data – structured relational tables, unstructured text, and all shades between – are another matter altogether. Correlating and extracting useful intelligence from this mess demands a keen focus on rich metadata. Big data integration will run afoul of the "Tower of Babel" syndrome if we don't focus tightly on semantic integration around common industry standards.
Where exactly is the W3C-spawned Semantic Web community in the big data discussion? I see a lot of sophisticated metadata technologies embedded in commercial big data platforms, but support for the core Semantic Web standards – RDF and SPARQL – is still spotty. Taken together, these technologies, which were standardized years ago and are found in many commercial products (including the RDF triple store in IBM DB2 10), provide an open framework for expressing and querying the rich semantics of graph, document, and other stores.
The need is real. Why has there been no effort to incorporate the Semantic Web's core into an Apache subproject that unifies Hadoop, NoSQL, and other database technologies under a common, rich metadata layer?
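To make the idea of a common semantic layer a bit more concrete, here is a minimal sketch of the subject-predicate-object model that RDF is built on, along with the kind of pattern matching that a SPARQL query expresses. It is plain Python, not an RDF library or a SPARQL engine, and every identifier and data value in it (the `customer:` and `product:` names, the `match` helper) is a hypothetical I made up for illustration.

```python
# Minimal sketch of the subject-predicate-object model behind RDF.
# NOT an RDF library or a SPARQL engine; all names and data are hypothetical.

# Each fact is one (subject, predicate, object) triple, regardless of which
# kind of store (relational, document, graph) it originally came from.
TRIPLES = [
    ("customer:42", "type", "Customer"),
    ("customer:42", "name", "Ada"),
    ("customer:42", "purchased", "product:7"),
    ("product:7", "type", "Product"),
    ("product:7", "label", "Espresso machine"),
    ("customer:99", "type", "Customer"),
    ("customer:99", "purchased", "product:7"),
]


def match(pattern, triples=TRIPLES):
    """Return one binding dict per triple that fits the pattern.

    Terms starting with '?' are variables; any other term must match exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable repeated in the pattern must bind to one value.
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            results.append(binding)
    return results


# Roughly the idea behind a single SPARQL triple pattern:
# "find every ?who that purchased product:7".
buyers = [b["?who"] for b in match(("?who", "purchased", "product:7"))]
print(buyers)
```

The point of the sketch is only that once facts from different stores are expressed as triples, one query shape works across all of them. Real triple stores add namespaces, typed literals, joins across multiple patterns, and indexing that this toy version leaves out.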
July 31 Proofs of concept as core appliance acquisition approach? Don't base your decision on splashy ads
Keep this recent article in mind when weighing data warehousing appliance performance claims from any vendor, whether it be IBM or any of our fine competitors. At IBM, we pride ourselves on providing customers and prospects with fresh, detailed, valid performance benchmark data on our appliances that reflects the actual workloads you might run on them. The nuances of specific DW appliance loads, configurations, and environmental factors are extremely important, and we will publish those transparently to help you vet our performance claims with a sharp eye. Just as important, we will engage you in a proof of concept, with your data, queries, and loads, to show that we do indeed provide the performance levels that we claim.
That's what a proof of concept is all about: giving you a demonstration unit of the product to vet on your own turf & terms and, if you wish, allowing you to see how we stack up against any other vendor's comparable product. At IBM, we don't make bold performance claims in highly visible media and then bury the details, nuances, and assumptions far away in cyberspace. We demonstrate it all at your place, at your pace, and to your face.
August 1 Recommendation engines? Their primary fuel is behavioral analytics
The key mandate for the new chief marketing officer (CMO) is understanding each customer as an individual. To do this, the CMO must drive investments in technologies that help their companies to analyze the deep wellsprings of customer behavior. More than that, the CMO drives a continuous campaign that attempts to influence customer behavior, through what we increasingly refer to as a "system of engagement." Ideally, you influence that behavior in your direction by delivering differentiated value and experience.
How do you deliver that differentiation? It's by shaping the multichannel experience through a never-ending stream of "next best action" recommendations, offers, and interactions that are custom-tailored to each customer's behavioral profile, as expressed in what they've bought, what they're doing now, and what they're likely to do under various future circumstances.
The recommendation engine, which drives next best action, relies on a steady flow of behavioral data and processes it all with a wide range of behavioral analytics, ranging from propensity models to clickstream processing of the customer's portal visits, natural language processing of their social-media communications, geospatial processing of their location coordinates, and graph processing of their dynamic relationships with key influencers.
Behavioral analytic processing, if it executes within a low-latency stream-computing infrastructure, such as IBM InfoSphere Streams, can provide 24x7 contextualization of every customer interaction across the CMO's multichannel system of engagement. It can make all the difference in whether you retain customers, grow your base, and deepen the value from those relationships.
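As a rough illustration of how a recommendation engine might turn behavioral signals into a next best action, here is a toy sketch. It simply scores each candidate offer as a weighted sum of signals and picks the highest. The signal names, weights, and offers are all hypothetical, and a production engine would use trained propensity models rather than hand-set weights.

```python
# Illustrative sketch only: hypothetical signals, weights, and offers.
from dataclasses import dataclass


@dataclass
class Offer:
    name: str
    # Hypothetical per-signal weights: how strongly each behavioral signal
    # suggests that this offer is relevant to the customer.
    weights: dict


def next_best_action(signals, offers):
    """Pick the offer with the highest weighted sum of behavioral signals."""
    def score(offer):
        return sum(w * signals.get(sig, 0.0) for sig, w in offer.weights.items())
    return max(offers, key=score)


offers = [
    Offer("retention_discount", {"churn_propensity": 1.0}),
    Offer("accessory_upsell", {"recent_purchase": 0.6, "product_page_views": 0.4}),
]

# One customer's current behavioral profile (values are made up).
signals = {"churn_propensity": 0.2, "recent_purchase": 1.0, "product_page_views": 0.5}
best = next_best_action(signals, offers)
print(best.name)
```

In a streaming setting, the `signals` dictionary would be refreshed continuously as new clickstream, social, and location events arrive, so the recommended action can change from one interaction to the next.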
August 2 Decision scientists?
It's easy to confuse data scientists with decision scientists, and it's not just because both of them begin with "D." Much of data science is concerned with fathoming and influencing human decision-making processes. That, after all, is the heart of propensity modeling, churn analysis, and next best action in multichannel customer engagement.
The fundamental difference between the two disciplines is that data science is the core of advanced analytics in any subject domain, not just those that concern human behavior, whereas decision science is all about one broad domain: human behavior. Essentially, all of the social sciences are behavioral sciences. Some social sciences – such as econometrics and epidemiology – are far more quantitative and statistical in their core methods. Hence, these behavioral sciences demand advanced data science tools such as SPSS in order to sift through complex, nonobvious patterns in deep data sets. And of course management and operations research schools everywhere tap into the wealth of decision science research, tooling, and theory to explore the nuances of human behavior under various scenarios.
Ideally, your data scientists should have a grounding in decision science if they work in behavioral analytics. Yes, it's important to know whether your customer propensity model fits the observational data coming in from your multichannel customer relationship management platform. But does the model also fit into a valid conceptual framework (psychographic, or what have you) that helps you understand what actually is going on inside customers' heads and hearts?
You need that valid model of the customer head/heart in order to know how to engage them as people. If you can't do that, the customer's propensity will almost certainly be to dump you at their first opportunity.
August 3 Speed of thought?
Today's most powerful computers can think much faster and more precisely than you, and without need for sleep, coffee breaks, or a "life." But no, computers don't always think better than you. This is especially true when we're talking about any cognitive process that relies heavily on the fuzzy domain of qualitative human judgment. Hence the artificial intelligence dream will never die.
Speed of thought is the core design criterion for all-in-memory exploratory business intelligence (BI). This ideal has the greatest value when we're focusing on the needs of the fastest thinkers among us: analysts, scientists, and other knowledge workers whose jobs revolve around speed and accuracy in fathoming complex subject domains. These people need power tools that can keep up with their intensive explorations.
But is speed-of-thought performance quite so important when we consider the traditional world of BI for batch reporting, ad-hoc query, and business performance management? Under most real-world scenarios, none of these use cases demands advanced visualizations with subsecond refresh rates. For traditional BI, if access speeds, query responses, and report load times are within acceptable, expected parameters – even if that means several seconds of waiting – then users barely notice and their productivity doesn't suffer.
The ideal scenario is when the back-end big data analytics infrastructure has thought faster than you, sparing you from having to think too much. It would be cool if the infrastructure could, silently and behind the scenes, prefetch all the data, visualizations, and apps it anticipates you might need and automatically push them down to your local in-memory cache. That would give you the wherewithal to do local speed-of-thought exploration if you wish. But you wouldn't have to lift a finger in advance to put those wheels in motion.
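Here is one way the prefetching idea could be sketched, purely as an illustration. The cache below loads whatever you ask for and then also loads whatever a predictor guesses you will ask for next. The `PrefetchingCache` class, the `FOLLOW_UPS` table, and the fake back end are all hypothetical, and a real system would prefetch asynchronously in the background rather than inline as shown here.

```python
# Illustrative sketch only: a hypothetical predictor and a fake back end.
class PrefetchingCache:
    def __init__(self, fetch, predict_next):
        self._fetch = fetch                # slow call to the back end
        self._predict_next = predict_next  # guesses what the user wants next
        self._cache = {}                   # local in-memory cache
        self.backend_calls = 0

    def _load(self, key):
        # Go to the back end only if the item is not already cached locally.
        if key not in self._cache:
            self.backend_calls += 1
            self._cache[key] = self._fetch(key)

    def get(self, key):
        """Return (value, was_cache_hit) and prefetch likely follow-ups."""
        hit = key in self._cache
        self._load(key)
        # Done inline to keep the sketch simple; a real system would do
        # this in the background so the user never waits on it.
        for nxt in self._predict_next(key):
            self._load(nxt)
        return self._cache[key], hit


# Hypothetical drill-down paths: which report tends to follow which.
FOLLOW_UPS = {
    "sales_by_region": ["sales_by_store"],
    "sales_by_store": ["store_detail"],
}

cache = PrefetchingCache(
    fetch=lambda k: f"data for {k}",
    predict_next=lambda k: FOLLOW_UPS.get(k, []),
)

_, first_hit = cache.get("sales_by_region")   # nothing cached yet
_, second_hit = cache.get("sales_by_store")   # already prefetched
print(first_hit, second_hit)
```

The second request is served from the local cache because the first request anticipated it, which is the "you wouldn't have to lift a finger" behavior described above. How good the experience feels depends almost entirely on how well the predictor guesses.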
At the end of the week, I'm looking forward to the launch of the new IBM big data hub, wherein this blog and two others I've already written will be featured. That will be the primary place where you can access these and all future posts from me and the rest of IBM's big data thought leadership team.