The science behind big data storage

Corporate Strategy and Storage Research, IBM

Scientific research has become a data powerhouse, propelled forward by sequencing and imaging technologies that accumulate data at astonishing speeds. However, managing petabytes of fast data is not a challenge exclusive to the scientific community. With the deluge of social, mobile and business data, organizations across various industries face similar challenges. This is why big data technologies seeded and tested in the scientific community are being rapidly adopted and replicated in businesses where success depends on bigger, faster and better insights.

On Thursday, August 21, IBM announced a venture with Deutsches Elektronen-Synchrotron (DESY), Germany’s largest scientific research organization. DESY is one of the world's leading accelerator centers and a member of the Helmholtz Association. It develops, builds and operates large particle accelerators used to investigate the structure of matter. IBM and DESY are planning to develop a big data architecture for science that can speed up analysis of massive volumes of x-ray images of atom-sized particles and can aid in a variety of research projects ranging from semiconductor designs to cancer therapies.

Based on IBM Elastic Storage technology, which powered Watson in its win on Jeopardy, this architecture can help DESY manage the 8.5 gigabytes of data per second generated by its synchrotron, PETRA III. PETRA III detectors take high-speed snapshots of atom-sized particles in tens of thousands of samples in quick succession. This data is then used by 2,000 scientists around the world to explore the building blocks of matter for a variety of research projects.

Elastic Storage can give DESY scientists high-speed access to increasing volumes of research data by placing critical data close to everyone and everything that needs it, no matter where they are in the world. This architecture will allow DESY to develop an open ecosystem for research and offer analytics services to its users worldwide. For Jeopardy, IBM’s Watson had access to 200 million pages of structured and unstructured data, including the full text of Wikipedia. Using Elastic Storage capabilities, around five terabytes of Watson’s “knowledge” (those 200 million pages of data) were loaded into the computer’s memory in only minutes.

With the information tsunami in businesses, data is becoming difficult and complex to manage, leading to clogged systems and ultimately an inability to meet business performance goals. Businesses will closely follow the evolution of technologies in science and research in their quest to lead in the big data economy. Technologies developed during collaborations such as the one between IBM and DESY will transfer to other data-intensive industries and increasingly become mainstream.

There are several examples of this already:

  • IBM Watson, born in IBM Research, recently went to work for USAA and is helping military service members understand the nuances of financial services as they transition back to civilian life.
  • Last month, IBM announced an experimental microserver system, developed in conjunction with the Netherlands Institute for Radio Astronomy (ASTRON), that has the potential to end up in commercial systems running massively parallel workloads.

IBM Elastic Storage, the technology used in Watson, is now proven in the most demanding production environments across clients in the healthcare, financial services, and oil and gas industries.

To learn more