Debunking Database Archiving Myths

Think quick-fix approaches optimize costs and justify not implementing an archiving strategy? Think again

Product Marketing Team Lead, InfoSphere Optim Solutions, IBM

Many of today’s business requests seem to start with more: more speed, more power, more data, more answers, and more throughput. But as project teams struggle to tame mounting volumes of data stored in repositories across the organization, particularly in enterprise applications, databases, and data warehouses, they are skipping a solution that also involves more: more archiving. For many organizations, continued exponential data growth is a reality, not speculation.

The archiving process moves inactive or infrequently accessed data from the production database to a secondary platform. An intelligent database archiving system orchestrates data movement in a way that allows functional users to search, retrieve, and consume historical data as required for compliance, data retention, and analytical purposes.

Why does archiving matter? The answer may lie in the following five common myths about database archiving that can dissuade many project teams from implementing an archiving strategy. Instead, they may use these myths as rationale to pursue quick fixes in an attempt to stave off poor application performance and optimize costs associated with uncontrolled data growth.

Storage is cheap

You’ve likely heard it many times. Storage is an inexpensive, quick, and easy fix to meet demands for additional space and increased speed. So why archive? The problem is that the cost-effective disk storage bargain isn’t always what it appears to be. As organizations wrestle with large volumes of fast-moving data, they risk being lured by the low up-front costs of procuring storage while ignoring the associated long-term costs.

The direct expense of traditional disk storage has dropped significantly thanks to ongoing advances in capacity and technology, but the associated costs of storage can be quite high. Organizations considering increasing production storage should not forget the corresponding increase in staff that will be needed to manage that storage, along with the space to house it and the energy to run it. What about the additional software licenses and maintenance costs? Potential downtime is another consideration.

As tempting as it may be, procuring additional disk storage is not a long-term solution. A database archiving strategy is an essential part of an organization’s plan to manage continuous data growth and optimize the associated costs.

Backups can be used for archives

To protect the availability of business-critical enterprise systems, organizations can put a data-backup process in place. In case of production system failure, corruption, or other data loss event, the company uses the backup to restore original data, helping to minimize data loss.

Some organizations also use older backup files as archive files. While this approach may be a convenient way to archive data, it raises the following questions:

  • How do business users access the backup files when they need old data, for example for analytics or compliance purposes? How long does it take to convert the backup files into usable data?
  • How much is an organization spending on storage to support multiple backup copies of production systems?
  • With data growing year after year, how long are backup process windows? If they get longer than they already are, how will that expanded processing impact the organization?

An archive strategy supports data retention and compliance efforts while helping to shorten backup process windows by keeping only business-critical data in production. In turn, this strategy advances disaster recovery efforts by shrinking the time it takes to restore production systems.

Current technologies make archiving unnecessary

Recent technology advances offer innovative ways to help manage enterprise data. However, these solutions should complement, not replace, an archiving strategy. For example, Apache Hadoop offers an open source software framework that helps organizations manage and analyze petabytes of both structured and unstructured data. Based on business needs, structured data from enterprise applications and data warehouses may be leveraged within this framework.

However, an archiving strategy is still necessary for these applications and data warehouses to identify infrequently used or accessed data—cold data—and store it as a historical snapshot for long-term data retention needs. That archived data can be leveraged and queried in Hadoop to make informed business decisions.

Out of sight, out of mind

For many business users who rely on the data stored within production systems, the suggestion of archiving evokes anxiety that infrequently accessed data will go missing when it is tucked away. These users know exactly where the historical and current data lives in the production database and how to access it, and they like that sense of security.

This anxiety can be remedied by first understanding how business users currently access the data in production systems and for what purposes. Once the how and why of accessing historical data is understood, the next step is preparing for the what and where.

  • What should be archived? Working with their business users, organizations should understand what data in production systems is accessed infrequently. For example, perhaps customer orders should be maintained in production for at least a year. After a year has elapsed, as long as the orders have been marked complete, they can be moved to an archive and kept for another six or seven years. In addition to the order information, a copy of the business context associated with each order should be made, which helps preserve a historical snapshot within the archive. For example, a copy of the customer, inventory, and sales information associated with each order should be captured as part of the archive file (see figure).
  • Where should the archive file live? Once the historical data is archived, it should be stored in accordance with data retention and accessibility needs. This requirement will depend on how business users need to access the data, and how fast and frequently it needs to be accessed.

 
Capturing the original business context for each order in a historical snapshot within the archive
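
The order example above can be sketched in code. The following is a minimal illustration only, assuming a hypothetical schema (tables named orders, customers, and items, and a status value of 'complete') that does not come from any particular product. It selects orders that are marked complete and more than a year old, and bundles each one with a copy of its customer and inventory details so that the snapshot can be understood on its own.

```python
import sqlite3
from datetime import date, timedelta


def build_demo_db():
    """Create a small in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE items (item_id INTEGER PRIMARY KEY, description TEXT);
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            customer_id INTEGER,
            item_id INTEGER,
            quantity INTEGER,
            order_date TEXT,   -- ISO 8601 date, so text comparison matches date order
            status TEXT
        );
        INSERT INTO customers VALUES (1, 'Acme');
        INSERT INTO items VALUES (10, 'Widget');
        INSERT INTO orders VALUES (1, 1, 10, 5, '2022-01-15', 'complete');
        INSERT INTO orders VALUES (2, 1, 10, 2, '2022-03-01', 'open');
        INSERT INTO orders VALUES (3, 1, 10, 1, '2024-05-01', 'complete');
        """
    )
    return conn


def eligible_order_snapshots(conn, today, min_age_days=365):
    """Return one self-contained snapshot per order that is complete
    and older than min_age_days (an assumed retention rule)."""
    cutoff = (today - timedelta(days=min_age_days)).isoformat()
    rows = conn.execute(
        """
        SELECT o.order_id, o.order_date, o.quantity,
               c.customer_id, c.name,
               i.item_id, i.description
        FROM orders o
        JOIN customers c ON c.customer_id = o.customer_id
        JOIN items i ON i.item_id = o.item_id
        WHERE o.status = 'complete' AND o.order_date < ?
        ORDER BY o.order_id
        """,
        (cutoff,),
    ).fetchall()
    # Copy the related customer and item data into each snapshot so the
    # archived order keeps its original business context.
    return [
        {
            "order": {"order_id": r[0], "order_date": r[1], "quantity": r[2]},
            "customer": {"customer_id": r[3], "name": r[4]},
            "item": {"item_id": r[5], "description": r[6]},
        }
        for r in rows
    ]
```

With the demo data and a reference date of June 1, 2024, only order 1 qualifies: order 2 is old but still open, and order 3 is complete but less than a year old.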
 

Archiving is too complicated

With the appropriate solution and preparation, implementing an archiving strategy can be straightforward. The following process can set the right direction for an archiving strategy:

  1. Identify the complete business object. Identifying where relevant data resides and how it interrelates across applications and functions is important for understanding which data to archive.
  2. Define and classify data. Determine which business objects are absolutely integral for operations and therefore are really worth archiving. After the data is appropriately classified, define archive rules and requirements.
  3. Archive. Depending on defined retention policies, inactive but still-valuable data is removed from the production environment and stored as compressed archive files. Compressed files consume less space in the archive environment and place fewer burdens on the production server than uncompressed files.
  4. Manage storage. Storage management depends on organizational specifications and how often the archived data needs to be accessed.
  5. Access at will. Because data has been archived in a business-relevant context, it can be accessed independently, outside the original application. Business users can access and analyze the archived data through a variety of methods, including Open Database Connectivity (ODBC), Java Database Connectivity (JDBC), XML, and web-based search tools.
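
Steps 3 and 5 can be illustrated with a short sketch. This is not how any specific archiving product works; it assumes a hypothetical orders table and uses a gzip-compressed JSON Lines file as a stand-in for a compressed archive file. The function names archive_orders and read_archive are made up for the example. The archive file is written before anything is deleted from production, and it can later be read back without the original application.

```python
import gzip
import json
import sqlite3


def archive_orders(conn, order_ids, archive_path):
    """Copy the given orders to a compressed archive file, then remove
    them from the production table. Returns the number archived."""
    order_ids = list(order_ids)
    if not order_ids:
        return 0
    placeholders = ",".join("?" for _ in order_ids)
    cur = conn.execute(
        "SELECT order_id, order_date, status FROM orders "
        f"WHERE order_id IN ({placeholders}) ORDER BY order_id",
        order_ids,
    )
    columns = [d[0] for d in cur.description]
    records = [dict(zip(columns, row)) for row in cur.fetchall()]

    # Write the compressed archive first, one JSON record per line.
    with gzip.open(archive_path, "wt", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec) + "\n")

    # Delete from production only after the archive file has been written,
    # and only the rows that were actually found and archived.
    found_ids = [rec["order_id"] for rec in records]
    if found_ids:
        marks = ",".join("?" for _ in found_ids)
        with conn:  # commits on success, rolls back on error
            conn.execute(
                f"DELETE FROM orders WHERE order_id IN ({marks})", found_ids
            )
    return len(records)


def read_archive(archive_path):
    """Read archived records back without touching the production database."""
    with gzip.open(archive_path, "rt", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh]
```

A real deployment would also verify the archive before deleting anything and would apply the retention policy defined in step 2. The sketch only shows the order of operations.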

For many organizations grappling with the challenges of today’s ever-rising data volumes, a thoughtful long-term strategy for archiving data can be far more cost-effective than giving in to the temptation of quick-fix solutions.
