Recap of IBM Twitterchat: How In-Memory Technology is Transforming Big Data
Nothing screams “speed of business” quite like in-memory technology. On Wednesday, March 27, I participated in an IBM Twitter chat with analysts, influencers, thought leaders, fellow IBM-ers and others on this very topic. The event took place from 12-1pm EDT and used hashtag #bigdatamgmt. (You can catch us bi-weekly - starting again April 17 - on Twitter at this time using that hashtag.)
Here are highlights from the discussion. Note that I’m using edited versions of the questions that the moderator (the all-knowing @IBMBigData) asked. And I’ve collated and edited the participants’ tweeted responses for legibility. I’ve removed the most egregious Twitterese and, where it makes sense, I’ve concatenated particular individuals’ tweets through ellipsis marks in order to call out the larger point they made. But I’ve endeavored to retain the jazzy shorthand tweeting style characteristic of this shadowy underworld.
What is in-memory technology and how does it enable real-time speed-of-thought analytics?
We all hovered around the same core definition of this technology. Jeff Kelly (@jeffreyfkelly), a technology market analyst covering big data and business analytics for The Wikibon Project, said “in-memory refers to storing data in main memory (DRAM) rather than spinning disk.” Alex Philp (@BigDataAlex) characterized it as using “RAM-DRAM for extremely fast I/O, moving us away from slow, underutilized spinning disk.” Natasha Bishop (@Natasha_D_G), a product marketing professional, stated that “in-memory tech enables biz 2 utilize data stored in main memory vs fragmented/siloed trad databases.”
And there was general consensus on what real-time speed-of-thought means and how in-memory supports it. Michael Martin (@BTRG_MikeMartin), Information Governance Practice Director at BTRG, quoted one of my recent blogs (http://ow.ly/jt6Ia), in which I describe in-memory as “velocity w/ a vengeance,” adding that “in-memory enables you to increase the exploration aspect of Big Data.” Cristian Molaro (@cristianmolaro), an independent DB2 consultant, said “memory access is way faster than disk I/O... even against SSD.”
Regarding the role of extreme velocity in enhancing information worker productivity, yours truly (@jameskobielus) stated that “in-memory puts data into RAM 2 enable interactive visualization exploration o patterns & real-time transactions.....Speed of thought is any tech that doesn’t have any architectural bottlenecks that arbitrarily slow people’s explorations.”
What are the killer apps of in-memory tech?
Cristian Molaro said “in-memory allows applications to fully exploit today’s more and more powerful CPUs... good news for big data!”
Keshav Murthy (@rkeshavmurthy), a database professional, said “customers r using in-memory approach for simply accelerating traditional BI, recently with analysis of sensor data.”
Yours truly provided a short list of in-memory killer apps: “fast discovery, fast exploration, & fast transactions on data of any magnitude....in-mem $ visual exploration, modeling & scenario exploration. Data science, “sprdsht on steroids”....ability to rapidly evaluate, iterate, & refine statistical models....any transactional app that demands split-second response.”
In addition to “anything requiring speed-of-thought response time,” Jeff Kelly added some compelling industry-specific apps: “smart meter analytics....investigating network traffic issues, finding bottlenecks.... analyzing high-velocity financial data in trading scenarios.”
Natasha Bishop highlighted a role for in-memory in boosting customer experience and engagement optimization: “in-memory tech = gold in #CX tactics and can drive proactive #custserv: up-sell, cross sell #cxo.”
Cuneyt Goksu (@CuneytG), an outsourcing/offshoring database specialist, noted that “fraud detection and investigation is a good candidate.”
Alex Philp, a geographer and geospatial analytics researcher, sees potential value in applying in-memory technology to “working with streaming data to analyze audio – processing in real-time 32 petabytes a day burn rate.”
Michael Martin said “National security is another area in-memory is key, however we can’t say more about it than that.” He also pointed to an external document (http://ow.ly/jt7mO) that describes general in-memory applications: “ensures cost savings, enhanced efficiency, and greater immediate visibility.”
John Crupi (@johncrupi), chief technology officer with real-time analytics solution provider JackBe, said “#m2m #IndustrialInternet analytics is the killer use case for in-memory.”
And IBM big-data program director Leon Katsnelson (@katsnelson) described the role of in-memory and stream-computing technologies in the telecommunications industry: “Call Detail Records processing in memory. 9 billion CDRs per day. Can’t think of a better case for memory.... many apps where data is not valuable enough to even store on disk. In Streams we process stuff in memory and discard.”
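The “process in memory and discard” pattern Leon describes can be illustrated with a minimal Python sketch. This is not the InfoSphere Streams API; the record layout (caller ID, call duration in seconds) and the function names are hypothetical, chosen only to show that running totals can be kept in RAM while each record is dropped as soon as it has been counted.

```python
from collections import Counter


def summarize_calls(records):
    """Consume call records one at a time, keeping only running totals in memory."""
    calls_per_caller = Counter()
    total_seconds = 0
    for caller, seconds in records:
        calls_per_caller[caller] += 1
        total_seconds += seconds
        # The record itself is never stored: once counted, it is discarded.
    return calls_per_caller, total_seconds


def sample_records():
    """Hypothetical stand-in for a live feed of (caller, duration) records."""
    yield ("555-0100", 30)
    yield ("555-0101", 120)
    yield ("555-0100", 45)


counts, total = summarize_calls(sample_records())
print(counts["555-0100"], total)  # prints "2 195"
```

Because the input is a generator, memory use depends on the number of distinct callers being tracked, not on the number of records that flow through.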
What are in-memory’s applications in transactional computing?
Staying on the topic of killer apps, but shifting the focus away briefly from analytics to online transaction processing, we all contributed insights on the value of in-memory technology.
Cuneyt Goksu said “all oltp apps need to be fast. In memory is fast too. So any oltp app is in the scope of inmemory.”
Michael Martin called our attention to a “customer who used in-memory to transform their OLTP systems in order to get paid faster http://btrgroup.com/bigdata”
Jeff Kelly cited transactional in-memory applications in “ad tech - analyzing user data, real-time bidding, delivering persnalized content - in milliseconds.” He also stated that “any transaction workload that requires real-time response in order to win/save/upsell the customer is in-memory candidate.”
Richard R. Lee (@InfoMgmtExec), an information governance and risk management professional, said “orgs want entire Customer Base, Product Sku’s & Pricing in Memory for rapid transaction processing. Customers will not wait!...in-Memory db will allow Predictive Models to be deployed into Transactional Work Flows for real-time scoring & prediction.”
Alex Philp said in-memory supports transactional applications in “connecting the Internet of Things - IP addressable sensors to real-time calibrate our models for better predictive analytics....Working in the oil and gas industry-energy exploration requires millions of transactions a day for discovery of new resource.”
I noted that “transactions are C (create), U (update), & delete (D) intensive...all can go faster if in-mem & no disk access.... next best action, bridging analytics & transactions, could benefit from in-mem 4 low-latency data & execution....faster transactions that result from caching more frequently used data in RAM at the server and/or client.”
And Leon Katsnelson stated that “many [IBM InfoSphere] Streams apps are transactional and Streams is always in memory.”
How does in-memory support greater data scientist productivity?
In-memory has undeniable value in the lives of data scientists everywhere.
I outlined the key benefits of in-memory to statistical analysts, predictive modelers, data miners, and others: “data scientist can ingest, regress, visualize, explore, model, score, iterate & deploy stat models more rapidly....can refine models far more rapidly if they hold all or most o relevant working data in fast RAM....ask more questions more rapidly against more of data, in-mem, helps data scientists drill deeply to patterns.”
Natasha Bishop said “Data scientist gain major advantage when they can access & digest massive amts data in secs....When data scientists can find answers 2 questions they didn’t THINK to ask it’s a win.”
Richard R. Lee, using a common synonym for “data scientist,” noted that “Decision Scientists spend way too much time today conditioning & gathering data. In Memory can have it all in one place....In-Memory allows DS to create a “Memory Palace” for Models, A/B Tests, Algorithms in development, etc. All in real-time.”
David Floyer (@dfloyer) of The Wikibon Project said “Using flash in conjunction with DRAM increases the scope of problems tackled and improves recoverability dramatically.”
Alex Philp said “Fire Scientists in Montana are using in-memory computing to better understand wild land fire given a changing climate.”
TerraEchos Inc. (@TerraEchos), a streaming-data solution provider, said “In memory allows the analyst and #datascientist to reduce the workflow with great analytical depth - win/win.”
Ercan Yilmaz (@Ercan__Yilmaz), a big-data industry professional, said “to the effect that it improves data munging and visualization, it helps.”
And Jeff Kelly netted out the bottom-line value for data scientists and business analysts of all shapes and sizes: “less trips to the watercooler waiting for query response.”
What are the economics of in-memory technology?
Leon Katsnelson said “We had in-memory databases when DRAM was 10x [the cost] of what it is today. Cheap memory means more use cases for in-memory.”
I stated that “in-mem more expensive acq than HDD, but coming down rapidly. Cost per IOPS, though, in-mem cost-effective.”
Zachary Jeans noted that “We wouldn’t even be talking In Memory solutions today if the price for RAM wasn’t becoming so reasonable.....A memory fabric based on flash can be more than 53 times faster than one based around disks.” He pointed us to an article in Forbes (http://markerly.com/p/_I0CZUS) that underlines the improving economics.
Alex Philp noted that flash memory is “cheap, and getting cheaper.....It takes 4 racks of disk storage to create a system capable of 1 million IOPS, or input/output operations per second.....It would take only one shelf of a flash-based storage system....Energy consumption would drop by 80 percent since memory-based systems consume less energy and require fewer air conditioners.”
Richard R. Lee said “Economics self-evident. Living in real-time world using tools that are not real-time. Reducing Latency to Zero is end game.” He noted the current premium for in-memory over traditional rotating-media storage, but said “reductions in latency well worth the cost factors.”
Michael Martin called for in-memory to be deployed selectively, in line with its improving cost-effectiveness profile. “In-Memory can be expensive and limited in volume....Governance and solid data management help costs across the board including in memory.....Like EVERYTHING it is about finding the cost/benefit. Those rules don’t change with big data or in-memory at all.”
Leon Katsnelson pointed to the need to deploy an optimized storage infrastructure in which different technologies are cost-effective for different uses: “right cost model for the right type of data. Nothing is cheap or expensive on its own. Too expensive for something.”
Jeff Kelly stated that “hybrid approach - in-memory/disk - often needed to make economics work.” He said “don’t forget flash,” and pointed us to a “good piece [on flash economics] by [his Wikibon colleague] @dfloyer here http://wikibon.org/wiki/v/Data_in_DRAM_is_a_Flash_in_the_Pan …”
Dave Vellante (@dvellante), CEO and co-founder of The Wikibon Project, elaborated on the economics of the hybrid storage approach: “Isn’t it really a balance? - hierarchy of media from in-memory->flash->spinning rust....Best economic solution is intelligence in file sys where active data svcd fm fast memory and slow data is in the bit bucket....imho less a matter of $ + more case of biz impact. If biz case=excellent $ of in-mem is irrelevant.”
How does in-memory support or supplement data warehousing?
The consensus among the tweeters was that in-memory will inexorably expand its footprint in enterprise data warehousing (EDW) architectures, due to the never-ending craving for analytic acceleration.
Richard R. Lee said “In Memory EDW is Holy Grail. Makes EDW more of ‘real-time repository’ that can better serve Operational & Analytical needs.”
Dave Vellante said “DW/BI for years has been like a “snake swallowing a basketball” -in memory is critical to solve this problem....Ask any DW practitioner and they’ll tell you a story of “chasing the chips” #the_need_for_speed.”
Alex Philp said “IMC [in-memory computing] can help folks leverage their data warehouse - rewire the house for speed.”
I discussed the role for in-memory technology in a broader enterprise data warehousing (DW) architecture: “ In-memory is for fast front-end data access, query, mart, exploration. DW hub can leverage HDD for storage....In-mem is optimal for front-end in 3-tier DW architecture. RDBMS is hub tier. Hadoop/NoSQL is staging tier.... In-mem in front-end means less roundtripping 2 DW in back-end. Save CPU & bandwidth....Potentially, In-mem can be virtualized across server cluster. Memory pooling....in a year we’ll be discussing in-mem as key data persistence & execution option in a real-time analytic infra.”
A balanced topology is essential, tweeted Jeff Kelly: “back to economics - don’t need your entire DW in-memory - use in-memory to supplement trad DW workloads.... must balance biz value of better performance via in-memory versus cost as applied to DW workloads - all workloads really.”
Can in-memory techniques be applied to non-relational databases and/or Hadoop?
We had a healthy discussion of the intersection, current and potential, of in-memory technology with the myriad range of big-data platforms, tools, and applications.
I stated that “In-mem can be applied 2 any data structure 2 accelerate. To date, industry mostly applies to columnar....IBM InfoSphere Streams brings in-mem persistence into end-to-end event processing w/out landing to disk.”
Jeff Kelly said “I believe there are in-memory instances of #Cassandra” and asked if anyone on the tweetchat had further information.
Ercan Yilmaz said “#spark uses in memory querying of data.”
David Floyer said “Databases such as Couchbase (Memcache) & Aerospike (Flash) use KV pairs in memory extensively for transactions.”
Alex Philp stated that “InfoSphere #Streams brings “database” functions into IMC [in-memory computing] in real time for continuous query and calculations.”
Leon Katsnelson said “Hadoop is about data on disk. Streams does the opposite i.e processes in-memory. IBM bundles Hadoop and Streams.”
Richard R. Lee said “Time Series db’s(Informix) will benefit substantially from In Memory.”
What is the main role for in-memory in big data infrastructures?
We discussed the role of in-memory in big-data architectures, but also drilled into the current constraints that keep the technology from expanding that role further.
Jeff Kelly stated: “in-memory should be used strategically in big data infrastructure where speed, performance gains outweigh costs.”
Cristian Molaro said “main role should be to accelerate access in relevant chunks... big data is too big to be contained in memory....I like the concept of multi-temperature storage: the hottest data stored on the faster (and more expensive) storage device....not all the big data has the same requirements for access performance: keep the hot data close to you and in memory.”
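The multi-temperature idea Cristian describes can be sketched as a toy two-tier store in Python. Everything here is an assumption made for illustration: the `TieredStore` class name, the use of a plain dict to stand in for disk or flash, and the least-recently-used rule for deciding what stays hot. It is not any vendor's implementation.

```python
from collections import OrderedDict


class TieredStore:
    """Toy two-tier store: a small 'hot' in-memory tier in front of a 'cold' tier."""

    def __init__(self, hot_capacity):
        self.hot_capacity = hot_capacity
        self.hot = OrderedDict()  # stands in for DRAM; ordered oldest use -> newest use
        self.cold = {}            # stands in for disk or flash; holds every key
        self.hot_hits = 0
        self.cold_reads = 0

    def put(self, key, value):
        # New data lands in the cold tier and earns a hot slot only when it is read.
        self.cold[key] = value
        if key in self.hot:
            self.hot[key] = value  # keep an existing hot copy consistent

    def get(self, key):
        if key in self.hot:
            self.hot_hits += 1
            self.hot.move_to_end(key)  # mark as most recently used
            return self.hot[key]
        value = self.cold[key]  # raises KeyError for unknown keys
        self.cold_reads += 1
        self.hot[key] = value   # promote to the hot tier
        if len(self.hot) > self.hot_capacity:
            self.hot.popitem(last=False)  # evict the least recently used key
        return value


store = TieredStore(hot_capacity=2)
for key in "abc":
    store.put(key, key.upper())
for key in ["a", "a", "b", "c"]:
    store.get(key)
print(store.hot_hits, store.cold_reads, list(store.hot))  # prints "1 3 ['b', 'c']"
```

Only the second read of "a" is served from the hot tier; the other three reads go to the cold tier, and "a" is evicted once "c" is promoted because the hot tier holds just two keys.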
I reaffirmed my key thesis: “Key role 4 in-mem in big data is ‘velocity with a vengeance.’” And Timo Elliott (@timoelliott) of SAP underlined the fact that we as an industry have a long way to go before we push in-memory architectures into the petabyte-volume stratosphere: “Largest in-memory DB today 250 TB; 64 bit address space ~ 18,000,000,000 TB. Room to grow.”
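As a rough check on the address-space arithmetic in that tweet, here is a short Python calculation. It assumes byte-addressable memory (one address per byte) and decimal units; under those assumptions a 64-bit address space works out to roughly 18 billion GB, or about 18 million TB, so the tweeted figure appears to line up with gigabytes rather than terabytes. Either way, the “room to grow” point stands.

```python
# Assumption: byte-addressable memory, so 64 address bits name 2**64 distinct bytes.
ADDRESS_BITS = 64
address_space_bytes = 2 ** ADDRESS_BITS

GB = 10 ** 9    # decimal gigabyte
TB = 10 ** 12   # decimal terabyte
TIB = 2 ** 40   # binary tebibyte

address_space_gb = address_space_bytes / GB
address_space_tb = address_space_bytes / TB
address_space_tib = address_space_bytes // TIB

largest_db_tb = 250  # largest in-memory DB size quoted in the tweet
headroom = address_space_tb / largest_db_tb

print(f"{address_space_gb:,.0f} GB")   # prints "18,446,744,074 GB"
print(f"{address_space_tb:,.0f} TB")   # prints "18,446,744 TB"
print(f"{address_space_tib:,} TiB")    # prints "16,777,216 TiB"
print(f"about {headroom:,.0f}x the quoted 250 TB")
```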
Scale-out of in-memory architectures is critical if the technology is to encompass more big-data use cases, said John Crupi: “Shouldn’t we start talking about “scale-out in-memory”. Single box, “scale-up in-memory” isn’t going to help the big data cause....Scale-out is the only way to go - otherwise problem envelope that can be tackled is too small....I really want scale-out with scale-up in-memory. I don’t want 1000 nodes each managing 32G of memory. I want 10-20 managing 2T....I think the future for in-memory is #GPGPU using GPU memory and processing....All I want is virtualized memory where I can run all my real-time analytics.”
Cristian Molaro added that “In-memory allows applications to fully exploit today’s more and more powerful CPUs....When you remove I/O constraints by going on-memory you hit next performance wall: CPU... then scale with more CPU in parallel....combine massive parallel computing with on-memory processing and you will get a super faster big data machine...then the next performance wall will be inter-CPU communication.”
Continue the discussion & check out these resources
- For more on in-memory architectures, join the IBM-hosted free virtual event on April 30
- Here is a Forbes article on the disruptive impact of in-memory technology in modern business
- Here is a Wikibon report on the economics of flash technology
- Here is a Bloor article on in-memory technology
- Here is an IBM Redbooks Solution Guide on Big Data Analytics with IBM Cognos Dynamic Cubes, a solution that incorporates IBM Cognos TM1, in-memory cube technology with write-back support
- Here is my recent IBM Big Data Hub blog: “In-Memory: The Lightning in the Big Data Bottle”
- Additional resources mentioned during the chat - or shortly thereafter
- And last but not least, here is IBM’s Google+ site on big data
Please engage us and let’s continue this exciting discussion.