Recap of Tweetchat on Big Data in the Cloud

Big Data Evangelist, IBM

Big data in the cloud is a key focus this year and beyond at IBM and with our customers. In mid-December, I participated in an excellent tweetchat on the role of big data in the cloud. Other participants included industry analysts Judith Hurwitz (Hurwitz Group), Hyoun Park (Nucleus Research), and Stephen O’Grady (RedMonk). Here’s a curated transcript of that chat.

For legibility, I have edited and organized my own tweets under the questions they respond to. Also, I have added further commentary under each response to call out the larger story that is difficult to convey coherently in a tweetchat:

Q1: During last year’s chat, we said 2012 would be the year of cloud ecosystems, hybrid clouds and PaaS. How did they fare?

  • Koby tweets responding to Q1: “Cloud ecosystems in #bigdata emerging. Platform providers & partners with petascale advanced analytics solutions....Hybrid clouds in #bigdata already out there: public+private, eg. IBM #SmartCloud Enterprise+....Re PaaS, cloud #bigdata services fall generally into that category, & that market remains niche/immature....What we’re seeing is emergence of multi-tier #bigdata architectures: staging tier in cloud, hub & marts on prem”
  • Additional Koby commentary: Hybrid multi-tier big-data architectures–cloud, appliance and software in a common distributed deployment–are already a reality in many users’ environments. This year and through the end of the decade, organizations will deploy a growing variety of hybrid big-data architectures that combine Hadoop, in-memory, NoSQL, stream-computing, massively parallel RDBMS, and other platforms to support a wider range of sources, workloads and applications.

Q2: What cloud news headline in 2012 do you think had the most lasting effect on the industry?

  • Koby tweets responding to Q2: “no one headline was significant, but volume of cloud #bigdata industry announcements continues to grow qtr by qtr....I counted 20 vendors–#IBM included–in Q4 alone with significant cloud #bigdata analytics announcements....I’m biased but I think this #IBM study in Oct was significant re #bigdata as killer app for cloud”
  • Additional Koby commentary: The cloud uptake in big data, unlike the appliance uptake in enterprise data warehousing that it builds on, does not necessarily lend itself to splashy headlines. One key difference is that appliances are physical cabinets that look impressive in advertisements and on the stages of industry conferences. They also come in new versions that can be launched with Madison Avenue fanfare as if they were new models of luxury automobile. By contrast, cloud services are immaterial and often versionless, so they feel, to media and marketing people, more like philosophical abstractions than tangible products. There will be significant cloud industry announcements to come, for sure, but they are more likely to be in the merger & acquisition category than in new product releases.

Q3: What surprised you the most this year when it comes to cloud?

  • Koby tweets responding to Q3: “what surprised me the most in 2012 re cloud was #bigdata industry slowness to align with cloud standards....I’m not surprised that cloud continues to drive deeper into #bigdata market. Cloud/SaaS is bigdata onramp for SMB....2012 was inflection year in #bigdata adoption, from on-prem deployment models of past & to on-demand of future”
  • Additional Koby commentary: Standards are an indicator of a maturing market. The lack of standards shows that on-demand cloud-based big-data offerings, though increasingly in demand, have not yet crossed over to being the core approach for many enterprise users. Cloud big-data is still primarily deployed in tactical line-of-business applications, which tolerate technology- and/or vendor-specific silos. Once early adopters of cloud-based big-data start to grapple with cross-silo interoperability issues, the industry push for open standards will intensify.

Q4: 2012 seemed to be a huge year for OpenStack and other standards bodies. Do you think that will continue next year?

  • Koby tweets responding to Q4: “Do I think #bigdata cloud standards will emerge in ‘13? Using my bully pulpit to push for it....But I remain disappointed that open-source community within #bigdata industry resists coordinated standardization....By the way, #IBM is organizing #bigdata cloud forum in March to discuss standards. I co-chair. NIST has one in Jan....I’d like the #Hadoop & #NoSQL open-source vendors to align their efforts with OpenStack storage & compute....I see the fundamental cloud approaches–MPP+virtualization–as already aligned with #bigdata architectures.... #bigdata rides on hybrid stor architectrs–rel, col, file, keyval –& executes every type o analytic & data mgt job....And #bigdata requires elastically scalable cloud compute, memory, & storage grids: scale-out/up/in”
  • Additional Koby commentary: Register for the March 18 seminar, “#BigData in the Cloud: Preparing for the Future.”

Q5: Looking ahead to 2013, what predictions do you have for the industry?

  • Koby tweets responding to Q5: “My prediction for 2013 is that cloud/SaaS will become SMB primary platform 4 #bigdata & strong option for large....prediction for 2013 is that few new peta-scale DW will B prem-based–most will B cloud/pay-as-go....for 2013, I predict most enterprises will archive EDW offsite in public cloud/SaaS & do more ETL there as well....I predict workload-optimized systems, not commodity HW, will become principal cloud #bigdata building block in 2013”
  • Additional Koby commentary: Clearly, I see the appliance (i.e., workload-optimized system) and cloud approaches as being very complementary. You need workload-optimized systems as the foundation of cloud data centers, enabling the elastic provisioning and linear scale-out that give clouds their power. And I see server hardware as increasingly a component embedded into workload-optimized systems, rather than something that customers implementing their own big-data cloud acquire outright or integrate themselves. The bigger big-data applications will start and stay in the cloud, and smaller customers will do the same, since they lack the budgets, expertise, or interest to do it themselves.

Q6: Despite what you think will happen in 2013, what’s on your wish list for the cloud industry next year & beyond?

  • Koby tweets responding to Q6: “for 2013 and beyond, my wishlist focuses on continued order-o-magnitude declines in storage costs, especially flash....also on my wishlist 4 cloud #bigdata in 2013 is unified data/model gov tools that span hybrid public/private....wishlist for coming years is hope that a peta-scale all-in-memory #bigdata #cloud grid is cost-effective by 2020....Also, I’d like a virtualization layer that spans data at-rest (Hadoop, DW, NoSQL) & in-motion/streams/....last long-range wish: that quantum computing makes the most complex #bigdata cloud jobs instantaneous.”
  • Additional Koby commentary: IBM is in a position to push all of those envelopes in 2013 through continuing investments. It’s getting harder to find any item on my personal wish list that isn’t already under deep development or ongoing investigation within Big Blue. Exciting times!

There you go. I left the core tweets unedited to preserve the compressed style that I love about this medium. The additional commentary supplements the big data predictions that I blogged a few weeks ago.