Why Compression is a Must in the Big Data Era

Director, Data Management Product Management,

Just last week we had a very interesting #ibmblu twitterchat on "Controlling Your Data Footprint." Many experts participated in the discussion and made great points. You can view a streamlined transcript below.

Storage requirements have been growing quickly, increasing the pressure on warehouses to deliver the required performance. To compensate, more objects are created, which in turn demand even more storage and further grow the warehouse.

It's a vicious circle. Then big data comes along and accelerates the growth, straining these pressure points even further. Purging and archiving cannot be the only remedies: many companies purge and archive so aggressively that they discard important, valuable data. That is not the desired result.

Compression is now a must. 

Compression drastically reduces storage requirements, and it is also the cornerstone of several other capabilities: "in-memory" technologies, predicate evaluation performed directly on compressed data, and faster utilities such as backups. All of this is achievable with DB2 with BLU Acceleration.
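To make the idea of predicate evaluation on compressed data concrete, here is a minimal sketch in Python of the general technique of dictionary encoding, where a filter is evaluated against small integer codes rather than the decompressed values. This is an illustration of the concept only, not DB2's actual implementation; the function and variable names are hypothetical.

```python
# Sketch: evaluate an equality predicate directly on
# dictionary-compressed data, without decompressing each row.
# Illustrative only -- not DB2 internals.

def dictionary_encode(values):
    """Map each distinct value to a small integer code."""
    dictionary = sorted(set(values))
    code_of = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]

def filter_equals(dictionary, encoded, target):
    """Evaluate `column = target` by comparing codes, not strings."""
    code_of = {v: i for i, v in enumerate(dictionary)}
    if target not in code_of:
        return []  # value never occurs: answered from the dictionary alone
    target_code = code_of[target]
    return [i for i, c in enumerate(encoded) if c == target_code]

cities = ["Toronto", "Austin", "Toronto", "Dublin", "Austin", "Toronto"]
dictionary, encoded = dictionary_encode(cities)
print(filter_equals(dictionary, encoded, "Toronto"))  # row positions [0, 2, 5]
```

Note that each comparison touches only a compact integer code, which is why this style of evaluation pairs so well with in-memory processing.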

With the rise of big data solutions, the need for compression is even more pronounced, including compression on both structured and unstructured data. The value of leveraging data tiers also becomes much more apparent with this trend. Rather than purge data that is still important, leverage mechanisms for defining multiple data tiers to better align the importance of data to the cost of the storage tier. Also, remember data becomes a liability if "hardened" for too long. Use technologies that can analyze the data in motion and reduce the amount of unnecessary data being stored. IBM Data Explorer and IBM InfoSphere Streams help accomplish exactly this.

Performance is achieved through an intricate balancing act of hardware and software, on top of a sound architecture. DBAs are losing control over the physical aspects of the architecture, forcing them to look for new ways to improve performance at the software layer. To deal with the performance requirements of growing data, DBAs start leveraging indexes, aggregate and summary tables, and so on. This also contributes to growing the size of the warehouse and its storage costs, sometimes by a factor of 2X. These additional objects can increase load times by 2X-3X, boosting the cost of maintaining them and requiring additional overhead whenever the underlying data is inserted, updated or deleted.

DB2 with BLU Acceleration drastically reduces the need for these additional objects, while still providing superior performance through technologies such as data skipping (which works even on data already in memory) and vector processing. DB2 with BLU Acceleration lets the DBA stop worrying about query access plans, defining plan hints and finding creative ways to trick the database optimizer. They can stop worrying about secondary objects, start helping people and the business, and just enjoy great performance!
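The data-skipping idea above can be sketched in a few lines of Python: keep small min/max metadata per block of values, and skip entire blocks whose range cannot possibly satisfy the predicate. This is a simplified illustration of the general technique under assumed block sizes and names, not BLU Acceleration's actual synopsis mechanism.

```python
# Sketch: data skipping via per-block min/max metadata.
# Blocks whose range cannot match the predicate are never scanned.
# Illustrative only -- not BLU Acceleration internals.

BLOCK_SIZE = 4  # hypothetical block size for the example

def build_synopsis(values):
    """Split values into blocks and record (min, max) per block."""
    blocks = [values[i:i + BLOCK_SIZE] for i in range(0, len(values), BLOCK_SIZE)]
    return [(min(b), max(b)) for b in blocks], blocks

def scan_greater_than(values, threshold):
    """Return values > threshold, touching only blocks that can qualify."""
    synopsis, blocks = build_synopsis(values)
    result, blocks_read = [], 0
    for (lo, hi), block in zip(synopsis, blocks):
        if hi <= threshold:
            continue  # whole block skipped using metadata alone
        blocks_read += 1
        result.extend(v for v in block if v > threshold)
    return result, blocks_read

sales = [10, 12, 11, 9, 50, 48, 52, 49, 8, 7, 9, 10]
matches, blocks_read = scan_greater_than(sales, 40)
print(matches, blocks_read)  # [50, 48, 52, 49] -- only 1 of 3 blocks scanned
```

Because the metadata is tiny compared to the data itself, selective queries can answer most of the "does this block matter?" question without reading the blocks at all.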

Join us for our next #ibmblu twitterchat on September 4, when we'll discuss Speeding Up Transactions & Analytics with In-Memory Processing.