Indescribable! Indestructible! Humungous! Nothing Can Stop the Big Media BLOB!

Big Data Evangelist, IBM

One of my primal memories as a very young child was seeing the original of “The Blob” (the movie starring a young Steve McQueen) in the theater with my parents. IMDB describes its plot succinctly: “an alien lifeform consumes everything in its path as it grows and grows.” It totally freaked me out. It gave me recurring nightmares for years (I’m better now, thanks).

That experience sums up how the big-data profession regards the coming of Big Media: with fear and trembling at the approach of the BLOBs (aka binary large objects). In this context, the fearsome protean juggernauts are the streams at the core of online media & entertainment, real-time surveillance, video over IP and other Big Media applications.

From a big-data analytics standpoint, the main apprehensions around Big Media are twofold: volume and visibility.

By volume, I’m obviously referring to the mind-boggling bigness of the streaming media, full-motion video and other BLOBs at the heart of this new age. If you want a quick calculation of how truly BIG these BLOBs are, and how much processor, storage, bandwidth and memory capacity they threaten to consume, check out this Wikipedia overview. If, as appears inevitable in the coming decade, all entertainment, news, publishing, art, culture and communications migrate to digital delivery, and all people everywhere tune into it continuously, today’s quaint petabyte streams are sure to be dwarfed by tomorrow’s unimaginably more massive BLOBs.
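For a rough sense of scale, the arithmetic can be sketched in a few lines of Python. The frame size, color depth and frame rate below are my own illustrative assumptions (uncompressed 1080p video at 24 bits per pixel and 30 frames per second), and the function names are hypothetical. Real streams are heavily compressed, so treat this as an upper bound on a single stream, not a measurement.

```python
def uncompressed_video_bytes_per_second(width, height, bits_per_pixel, frames_per_second):
    """Raw (uncompressed) data rate of one video stream, in bytes per second."""
    bits_per_frame = width * height * bits_per_pixel
    return bits_per_frame * frames_per_second // 8


def storage_bytes(bytes_per_second, seconds):
    """Storage needed to hold a stream of the given rate for the given duration."""
    return bytes_per_second * seconds


if __name__ == "__main__":
    # Assumed parameters: 1920x1080 frames, 24 bits per pixel, 30 frames per second.
    rate = uncompressed_video_bytes_per_second(1920, 1080, 24, 30)
    one_hour = storage_bytes(rate, 3600)
    print(f"{rate / 1e6:.1f} MB/s")        # prints 186.6 MB/s
    print(f"{one_hour / 1e9:.1f} GB/hour")  # prints 671.8 GB/hour
```

Multiply that per-stream figure by the number of concurrent viewers and the appetite of the BLOB becomes clear, even after compression knocks the numbers down considerably.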

By visibility, I’m referring to the fact that BLOBs are usually opaque, unstructured and non-machine-readable to an extreme degree—in other words, their “semantics” (aka deeper meanings) are usually only visible to humans who watch, listen and otherwise consume them. Usually, the automated metadata on BLOBs is almost exclusively of the “envelope” variety: who created the BLOB, what tools they used, when it was posted, how often it was accessed, who accessed it, when and where they accessed it, what media player they used, etc. To the extent that big-data tools can do analytics on these dark behemoths, the analysis is almost always purely on a BLOB’s metadata envelope, rarely on its content-laden payload.
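The envelope-versus-payload distinction can be pictured as a simple data structure. This is only a sketch: the class and field names are hypothetical, chosen to mirror the envelope attributes listed above, and do not come from any particular product or standard.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Envelope:
    """Machine-readable metadata about a BLOB (hypothetical field names)."""
    creator: str
    authoring_tool: str
    posted_at: datetime
    access_count: int = 0
    media_players: List[str] = field(default_factory=list)


@dataclass
class MediaBlob:
    envelope: Envelope
    payload: bytes  # opaque to analytics unless content analysis is run on it


def queryable_fields(blob):
    """Return only what a metadata-only analysis can see: the envelope."""
    return dict(vars(blob.envelope))


if __name__ == "__main__":
    blob = MediaBlob(
        envelope=Envelope("alice", "SomeEditor", datetime(2012, 1, 1), access_count=42),
        payload=b"\x00\x01\x02",  # stand-in for gigabytes of video
    )
    print(sorted(queryable_fields(blob)))
```

Everything an analyst can filter, join or aggregate lives in the envelope; the payload is just bytes until some other system interprets it.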

In this way, Big Media presents a challenge not found in most big-data applications. Big-data tools can query, manipulate, transform, map, reduce and otherwise process the content of most applications with ease because most of it (from structured to unstructured) is in some textual format, as is all the metadata associated with it. Where BLOBs are concerned, their “data” (the binary large object payload) can only be crunched down into metadata if it’s run through infrastructure that performs video content analytics, speech analytics, image analytics and so forth. Usually, those are resource-intensive algorithms that demand specialized co-processors and massively parallel processing platforms in their own right, distinct from whatever big-data analytics platform is consuming their output and integrating it with other metadata analytics.

These thoughts came to me as I looked at a recent LinkedIn posting called, enticingly enough, “How is big data used in the porn industry?” After you giggle, you recognize that this unsavory but incredibly lucrative industry is pushing the envelope of Big Media everywhere. What jumped out at me in the post were three things: this understatement (“They use tons of servers and Internet bandwidth, far more than Google”), this mostly-right statement (“I guess most of it is for video downloads”), and these open questions relevant to Big Media’s metadata requirements:

  • “Do they perform video analytics?
  • What kind of metrics do they track, besides video length, title (keywords), resolution (based on user), fraud indicators, video category, sales, downloads, sales velocity and maybe some metrics computed directly on the video itself?
  • Do they create taxonomies, and how?
  • What do [they] track (web analytics) and what models do they use for user segmentation?
  • What kind of segments do they have?”

“Metrics computed directly on the [porn] video itself”? Here’s me walking on the wild side, for the sake of discussion. Well, one could conceivably run video content analytics to identify different types of performers engaged in different types of acts in different scenarios, etc. One might use that metadata to determine which specific combinations of performers, acts and scenarios cause different types of users to respond in particular circumstances. One might create distinct taxonomies of video content types and video consumers to identify distinct preferences. One might then use the video analytics plus these taxonomies and other metadata to target the offerings presented to different users in different contexts. And so forth.

I suspect (not having any firsthand knowledge, mind you) that the porn industry is already thinking along these lines, and possibly even experimenting with the technologies. It’s not a stretch to think that “legitimate” video-entertainment content providers are doing likewise.

And one can easily imagine that these same practices will become standard for personalized content delivery throughout the Big Media world as it emerges. After all, online video entertainment content of all types—TV, movies, training, marketing, etc.—is increasingly being shrunk, truncated, foreshortened, modularized, reassembled and delivered in every conceivable way. Metadata derived from the streaming content itself—plus customer experience analytics—will help video programmers personalize everybody’s experience of whatever BLOBs come their way.

Unfortunately, video programmers can’t go back in time to erase the Blob from little Jimmy Kobielus’ experience of 1959.