Graph Analysis Powers the Social and Semantic Spheres of Big Data

Why predictive modeling of human behavior demands an end-to-end, low-latency database architecture

Big Data Evangelist, IBM

Graph analysis is developing rapidly into one of the most promising new segments in the big data market. The vogue for graph analysis was boosted by Facebook's recent beta of the graph search feature for its online community. Graph search builds on the social graph Facebook announced three years ago, which maps the explicit and implicit relationships among members based on their profiles, timelines, and behaviors within that community.

Graph analysis is, at heart, a mathematical approach for mapping complex relationships among the nodes of a network. In the business world, graph analysis has various applications, the most noteworthy being mapping social relationships (as exemplified by Facebook's offering) and mapping semantic relationships (the core concern of the World Wide Web Consortium's long-running Semantic Web initiative). A social graph maps relationships that are partly or entirely behavioral in nature (e.g., among individuals within social groups), while a semantic graph maps relationships among words, concepts, and other linguistic constructs within human languages.

Graph analysis is hot these days in the big data arena, but it is not a new technology within the disciplines of data science and advanced analytics. Graph modeling is an established branch of statistical modeling that focuses on mining, mapping, visualizing, and exploring connections, interactions, and affinities. What distinguishes graph analysis is a focus on "graphs," which are abstract networks of relationships (known as links) among nodes (which may be individuals, groups, companies, products, systems, objects, concepts, words, and other entities). Beyond its social and semantic applications, graph analysis has well-established uses in scientific, engineering, and other domains.
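To make the node-and-link vocabulary concrete, here is a minimal sketch in Python. It assumes an undirected graph held in memory as an adjacency map; the member names and the helper functions `build_graph` and `friends_of_friends` are hypothetical, chosen only for illustration. It shows how an implicit relationship (a friend of a friend) can be derived from explicit links.

```python
from collections import defaultdict

# Hypothetical links between made-up community members (illustrative only).
links = [
    ("ana", "ben"),
    ("ben", "cho"),
    ("cho", "dev"),
    ("ana", "eli"),
]

def build_graph(links):
    """Build an undirected adjacency map: node -> set of neighboring nodes."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def friends_of_friends(graph, node):
    """Return nodes reachable in two links from `node`, excluding direct neighbors."""
    direct = graph.get(node, set())
    two_hop = set()
    for neighbor in direct:
        two_hop |= graph.get(neighbor, set())
    return two_hop - direct - {node}

graph = build_graph(links)
print(sorted(friends_of_friends(graph, "ana")))  # ['cho']
```

An adjacency map like this is fine for small examples. At the data volumes discussed in this column, the same idea would typically be handled by a dedicated graph store or a distributed platform.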

Of the technology's many uses, social graph analysis is the most popular, thriving on the gusher of customer intelligence flowing from online communities of all shapes and sizes. In addition to customer profiles and other contextual data, modelers may incorporate a huge range of behavioral information into social graph models. The behavioral data sources might include Facebook status updates, tweets, portal clickstreams, geospatial coordinates, transaction records, interest profiles, call detail records, and usage logs. Social graphs may also incorporate diverse streams of big data (structured and unstructured, user- and machine-generated, and so on) that issue from social media as well as from B2C communities, B2B supply chains, and enterprise applications.

In the enterprise, social graph analysis powers anti-fraud, influence analysis, sentiment monitoring, market segmentation, engagement optimization, experience optimization, and other applications where complex behavioral patterns must be rapidly identified. Graph models are powerful enablers for fine-grained predictive modeling of human behaviors because they help identify the likely behaviors of individuals in their fuller context of groups, relationships, and influence. These models offer microscopically detailed views of the customer experience by focusing on human actions and interactions.

Semantic graph analysis is also a well-established discipline and a substantial focus of many big data initiatives. It is fundamental to search optimization, content analytics, and other cutting-edge applications of advanced analytics against unstructured data. Data scientists explicitly build semantic graph models as ontologies, taxonomies, thesauri, and topic maps using tools that implement standards such as the W3C-developed Resource Description Framework (RDF).
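RDF models a semantic graph as a set of subject-predicate-object statements, or triples. The sketch below imitates that idea with plain Python tuples rather than an actual RDF library. The terms, the `is_a` predicate, and the `match` and `ancestors` helpers are all hypothetical, so treat it as an illustration of the data model, not of any particular tool.

```python
# Hypothetical triples; the terms are illustrative, not from a real vocabulary.
triples = [
    ("graph_database", "is_a", "database"),
    ("triple_store", "is_a", "graph_database"),
    ("triple_store", "stores", "rdf_triple"),
    ("database", "is_a", "software"),
]

def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

def ancestors(triples, term):
    """Follow `is_a` links transitively to collect broader terms."""
    found = []
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for _, _, parent in match(triples, subject=current, predicate="is_a"):
            if parent not in found:
                found.append(parent)
                frontier.append(parent)
    return found

print(ancestors(triples, "triple_store"))
# ['graph_database', 'database', 'software']
```

Walking the `is_a` links in this way is a toy version of the taxonomy traversal that a real triple store would express through a query language such as SPARQL.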

Whether you're doing social, semantic, or some other form of graph analysis, this approach is outside the core scope of traditional analytic databases and even beyond the ability of many Hadoop and NoSQL databases. Graph databases are an embryonic (but potentially huge) segment of the big data arena. However, that doesn't mean you have to acquire a new database in order to do graph analysis. You can, to varying degrees, execute graph models on a wide range of existing enterprise databases. Nevertheless, where social graph analysis is concerned, there is a growing market for graph databases or graph stores, which are specifically optimized for it. And where semantic graph analysis is concerned, you can do it on specialized RDF triple-store databases or on enterprise databases, such as DB2 v10, which provide triple-store extensions.

But if you're serious about graph analysis, you're going to need to ramp up all three big data Vs—volume, variety, and velocity—to do it effectively. Depending on the amount of data, the complexity of models, and the range of applications, graph analysis can be a huge consumer of processing, storage, I/O bandwidth, and other big data platform resources. And if you're driving the results of graph processing into real-time applications, such as anti-fraud, you need an end-to-end, low-latency database architecture.

What do you think? Let me know in the comments.
