CERN Finding - A Triumph of Big Data

Client Technical Manager, Analytics Solution Center, IBM

On July 4th, CERN scientists announced that they had observed a particle that strongly resembles the Higgs boson, a critical element of the Standard Model of particle physics. This particle is thought to be responsible for the property of mass, which gives objects weight when combined with gravity.

Detection of the Higgs boson would not have been possible without the last decade's advances in processing big data. Joe Incandela, CMS spokesman at CERN, explained that if every collision they scanned were a grain of sand, the grains would have filled an Olympic-sized pool over the last two years. From that pool, they had to find the few dozen grains that exhibited characteristics consistent with the Higgs boson.

In addition to building the Large Hadron Collider, the CERN teams developed a data strategy to handle the output of the hundreds of millions of particle collisions occurring each second. Sensors record raw data on billions of events occurring in the proton collider. These readings are then reconstructed to show the energies and directions of the many particle tracks. The data passes through two stages of filtering, which reduce 40 million collision events per second to about 100,000 interesting ones per second, and then to the 100 or 200 per second that are studied in depth.
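
The two-stage filtering described above can be sketched in a few lines of Python. This is only a toy illustration, not CERN's trigger software: the `Event` fields, the thresholds, and the randomly generated events are all hypothetical, chosen simply to show a cheap first cut followed by a stricter second cut.

```python
import random
from typing import Iterator, List, NamedTuple, Tuple


class Event(NamedTuple):
    event_id: int
    total_energy: float  # hypothetical summed energy, arbitrary units
    n_tracks: int        # hypothetical number of reconstructed tracks


# Illustrative thresholds only; these are not real trigger settings.
MIN_ENERGY = 60.0
MIN_TRACKS = 40


def generate_events(n_events: int, seed: int = 0) -> Iterator[Event]:
    """Yield made-up events one at a time, as a stand-in for detector output."""
    rng = random.Random(seed)
    for event_id in range(n_events):
        yield Event(event_id, rng.expovariate(1 / 20.0), rng.randint(0, 50))


def passes_stage_one(event: Event) -> bool:
    """Fast, coarse cut applied to every event."""
    return event.total_energy >= MIN_ENERGY


def passes_stage_two(event: Event) -> bool:
    """Stricter cut applied only to events that survived stage one."""
    return event.n_tracks >= MIN_TRACKS


def run_pipeline(n_events: int, seed: int = 0) -> Tuple[int, int, List[Event]]:
    """Stream events through both stages; return counts and the kept events."""
    seen = 0
    passed_one = 0
    kept: List[Event] = []
    for event in generate_events(n_events, seed):
        seen += 1
        if not passes_stage_one(event):
            continue
        passed_one += 1
        if passes_stage_two(event):
            kept.append(event)
    return seen, passed_one, kept


if __name__ == "__main__":
    seen, passed_one, kept = run_pipeline(100_000)
    print(f"seen={seen} after_stage_one={passed_one} kept={len(kept)}")
```

Because events are processed one at a time and most are discarded at the first stage, only the small surviving fraction ever needs to be held in memory or stored.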

According to Rolf-Dieter Heuer, director general at CERN, “The computing power and network is a very important part of the research.” Over 15 petabytes (15 million gigabytes) of data are stored each year. The data is distributed through the Worldwide Large Hadron Collider Computing Grid (WLCG) to each of 11 major Tier 1 centers around the world, and from there to research centers and individual scientists. In the U.S., the Open Science Grid, supported by the NSF and DOE, provides much of the compute and storage power for this work. The scientists use Monte Carlo simulations to generate and propagate the physics interactions of the elementary particles passing through the collider, and so determine which ones correspond to the hypothesized behavior of the Higgs boson.
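
To give a feel for what a Monte Carlo approach looks like, here is a deliberately simplified Python sketch. It is not the simulation software the experiments use. It assumes a made-up, smoothly falling background, a narrow hypothetical signal peak at a mass value of 125.0 (arbitrary units), and an arbitrary counting window, then draws random samples of each and compares how many land in the window.

```python
import math
import random
from typing import Tuple


def toy_excess(
    n_background: int = 100_000,
    n_signal: int = 200,
    window: Tuple[float, float] = (121.0, 129.0),
    seed: int = 1,
) -> Tuple[int, int, float]:
    """Count simulated background and signal events inside a mass window.

    All shapes and numbers here are illustrative assumptions, not measurements.
    """
    rng = random.Random(seed)
    low, high = window

    # Toy background: values start at 100.0 and fall off exponentially.
    background_in_window = sum(
        low <= 100.0 + rng.expovariate(1 / 30.0) <= high
        for _ in range(n_background)
    )

    # Toy signal: a narrow Gaussian bump at the hypothetical peak position.
    signal_in_window = sum(
        low <= rng.gauss(125.0, 2.0) <= high
        for _ in range(n_signal)
    )

    # Rough figure of merit: signal count over the background's statistical spread.
    if background_in_window > 0:
        significance = signal_in_window / math.sqrt(background_in_window)
    else:
        significance = float("inf")
    return background_in_window, signal_in_window, significance


if __name__ == "__main__":
    background, signal, z = toy_excess()
    print(f"background={background} signal={signal} rough significance={z:.2f}")
```

The point of the sketch is the workflow rather than the numbers: simulate what the background alone should look like, simulate what the hypothesized particle would add, and ask whether the difference would stand out.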

What they found was a never-before-seen elementary particle that seems to fit the behavior of the Higgs boson and is very heavy, at approximately 133 proton masses. Further data analysis is now needed to ascertain its spin, decay modes, and other characteristics.

Think the amount of data generated by the Large Hadron Collider is huge? The forthcoming Square Kilometre Array radio telescope is expected to generate hundreds of petabytes of data per day. More on that in a future blog post.