Solid-State Drives: Changing the Data World
Say hello to a new friend
It cannot have escaped your notice, if you have been reading industry articles and press releases, that solid-state drives (SSDs) are now available for the leading enterprise storage arrays. Disk storage companies are pitching these drives as a quantum leap in enterprise storage performance, and they are. The sales literature primarily demonstrates SSD performance advantages over hard disk drives (HDDs) but also includes information about additional benefits, such as power and cooling savings, reliability, cost per I/O, and so on.
Even if you strip away the marketing hype, the numerous performance advantages make a strong argument that SSDs will completely replace Fibre Channel (FC) drives as the primary storage technology in high-end storage systems. Price is likely to be an issue for a while, but ultimately, the end of FC drives is in sight. (Who remembers the arrival of the CD? Although it was the first nail in the coffin of vinyl, vinyl indeed took many years to leave the mainstream.) The end of FC drives might seem like a very ordinary technology transition, but something very out of the ordinary is occurring. It’s possible that SSDs will cause you to reevaluate some of the ways that you deal with databases, as the move from spinning disks to solid-state technology will change the rules.
I/O cost and the slowly spinning disk
Clustering. Defragmenting. Reorganizing. Buffering. A vast number of common database tasks and strategies are designed to do one thing only: minimize disk I/O. That’s because HDD I/O is usually the most time-consuming part of any database transaction. If we put it in terms of total time costs, HDD I/O is fantastically expensive.
When you move to SSDs, I/O operations become much faster, but most people don’t realize how large the difference is. For example, take a single, random 4K-page read. From a 15,000 rpm FC HDD, the average response time is approximately 6.5 ms. With an SSD in an enterprise-class storage array, you can expect the same read to take about 1 ms. In other words, you might expect an SSD to complete about six I/Os in the time it takes an FC HDD to finish one.
But there’s more to the story. SSDs can perform operations concurrently; HDDs cannot. The best you can reasonably expect from an FC 15,000 rpm HDD is about 200 random 4K-page reads per second. Using SSDs, read requests can overlap, which gets you more like 5,000 random 4K-page reads per second. Measured by throughput, the SSD delivers 25 times as many random reads per second, so each I/O costs roughly one twenty-fifth as much.
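The arithmetic behind these figures is easy to check. The short sketch below uses only the numbers quoted above; the variable names are illustrative and not taken from any tool.

```python
# Figures quoted in the text for a single random 4K-page read.
HDD_RESPONSE_MS = 6.5   # 15,000 rpm FC HDD, average response time
SSD_RESPONSE_MS = 1.0   # SSD in an enterprise-class array (estimate)

# Figures quoted in the text for sustained random 4K-page reads.
HDD_READS_PER_SEC = 200
SSD_READS_PER_SEC = 5_000

# How many SSD reads fit into the time of one HDD read.
latency_ratio = HDD_RESPONSE_MS / SSD_RESPONSE_MS

# How many more random reads per second the SSD sustains.
throughput_ratio = SSD_READS_PER_SEC / HDD_READS_PER_SEC

print(f"Response-time advantage: {latency_ratio:.1f}x")
print(f"Throughput advantage:    {throughput_ratio:.0f}x")
```

The two ratios differ because the first compares one read at a time, while the second reflects the SSD's ability to overlap requests.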
With SSDs, disk I/O doesn’t cost a little less—it costs a lot less. But is the reduction enough to change how you build and manage your databases? In some cases, yes. (Note: This article chiefly addresses online transaction processing [OLTP] access. Sequential access, as exhibited by data warehouse activities, does not produce as great a performance improvement with SSDs as does random access.)
Cluster’s last stand?
Database clustering places frequently accessed DB2 rows in the same page and frequently accessed pages close together on disk. A successful clustering strategy can reduce the number of DB2 GETPAGEs (and also I/Os), especially if many sequential scans use the clustering index. Because the DB2 subsystem reads data in the order in which it needs to process it, clustering helps reduce expensive SORTs. Sequential scans are typical in decision support system (DSS) applications but rare with OLTP access patterns.
The notion of adjacency of pages to reduce disk head or arm movement and thus latency during I/O does not apply to SSDs, of course. When data appears to be physically adjacent in DB2 (that is, consecutive pages in the table space), it is very unlikely to be in consecutive cells on the SSD. The data is distributed evenly across the SSD capacity using wear-leveling algorithms. In fact, the exact location on the SSD is largely irrelevant, because the latency to retrieve the data is an order of magnitude (and in some cases, two orders of magnitude) less than the latency incurred during a random read on a spinning disk.
Does clustering (or, for that matter, poor clustering) really matter when using SSDs? Consecutive data pages are unlikely to be in adjacent cells because of the wear-leveling algorithms, so do you need to group data on the media? It’s an interesting question. When two rows are clustered together on the same page, requests for both of them might be satisfied by a single I/O if the requests are close together in time. That saving might not occur if the data is unclustered: the two rows may be on separate pages, resulting in two separate GETPAGEs and two separate sync I/Os. That raises the question: is the cost of the extra GETPAGE (or GETPAGEs) and related sync I/Os punitive when using SSDs?
Purists might argue that it costs CPU and channel resources to perform the extra GETPAGEs, and they would be right. However, given the speed of SSDs and the reduction in the number of REORGs that need to be performed, this extra cost is easily justified.
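To put rough numbers on that trade-off, here is a minimal sketch that reuses the response times quoted earlier (6.5 ms for an HDD read, about 1 ms for an SSD read). It assumes every GETPAGE misses the buffer pool and becomes one synchronous read, and it ignores the CPU and channel cost the purists mention, so treat it as an illustration rather than a sizing model. The function names are hypothetical.

```python
HDD_READ_MS = 6.5  # random 4K-page read on a 15,000 rpm FC HDD (from the text)
SSD_READ_MS = 1.0  # same read on an SSD (estimate from the text)

def sync_io_time_ms(pages_read, read_ms):
    """Elapsed time for synchronous reads issued one after another."""
    return pages_read * read_ms

def break_even_pages(hdd_ms, ssd_ms):
    """Number of SSD reads that cost as much time as one HDD read."""
    return hdd_ms / ssd_ms

# Two rows on one page (well clustered) read from an HDD...
clustered_on_hdd = sync_io_time_ms(1, HDD_READ_MS)
# ...versus the same two rows on two pages (unclustered) read from an SSD.
unclustered_on_ssd = sync_io_time_ms(2, SSD_READ_MS)

print(f"Clustered on HDD:   {clustered_on_hdd:.1f} ms")
print(f"Unclustered on SSD: {unclustered_on_ssd:.1f} ms")
print(f"Break-even: {break_even_pages(HDD_READ_MS, SSD_READ_MS):.1f} SSD reads per HDD read")
```

Under these assumptions, the unclustered rows on SSD are still retrieved faster than the clustered rows on HDD, which is the point of the argument above.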
Embedding free space
When you create a table space, you frequently embed free space so that an application can insert rows in a clustering sequence, and rows can increase in size as they are updated. The extra space can reduce overflows and index page splits. Most folks reserve about 10 percent of the total table space as free space, but you will often see more with highly volatile applications.
However, reserving free space only trades disk space for time, potentially letting you go longer between REORGs. So consider embedding less free space when using SSDs, since the clustering sequence may not be so important.
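The disk-space side of that trade can be estimated with a simple calculation. The sketch below is a deliberately simplified model: it treats free space as a flat percentage of each page and ignores other space parameters, and the row count and rows-per-page figures are hypothetical values chosen only for illustration.

```python
def pages_needed(row_count, rows_per_full_page, free_space_pct):
    """Pages required when free_space_pct percent of each page is left empty."""
    # Integer arithmetic avoids floating-point rounding surprises.
    usable_rows = rows_per_full_page * (100 - free_space_pct) // 100
    if usable_rows < 1:
        raise ValueError("free space leaves no room for rows")
    return -(-row_count // usable_rows)  # ceiling division

ROWS = 1_000_000     # hypothetical table size
ROWS_PER_PAGE = 40   # hypothetical rows that fit in a completely full page

for pct in (10, 5, 0):
    print(f"{pct:>2}% free space: {pages_needed(ROWS, ROWS_PER_PAGE, pct):,} pages")
```

With these made-up inputs, dropping from 10 percent free space to none saves a bit under 3,000 pages out of roughly 28,000, which gives a feel for how much capacity the reserve consumes.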
If you decide to forgo free space in the table space, you might want to consider using APPEND YES for the tables in the table space. This option reduces the code path that DB2 must traverse to find a location for the inserted row and also avoids page overflows on INSERTs. On the downside, you need to consider concurrency. Multiple threads executing INSERTs and competing for locks and latches on the same page can be costly, especially in a data sharing environment (although MC00 may solve this problem).
DBAs use buffer pools to keep the most recently used DB2 pages in memory, hoping that the pages will be reused and thus avoiding I/O. And as they say, the best I/O is the one that does not happen. With SSDs bringing down the performance cost of an I/O, the use of buffer pools is not so critical.
A potential course of action here is to reduce the size of the buffer pools supporting the table spaces resident on SSDs. It is likely that you will have a mixture of HDDs and SSDs, so you can allocate the buffer pool space thus saved to the table spaces on HDDs, which need it more.
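One way to reason about how much buffer pool an SSD-resident table space can give up is to compare average page access times. The sketch below assumes a buffer pool hit costs effectively nothing and a miss costs one device read at the response times quoted earlier. The 90 percent hit ratio is a hypothetical starting point, and the function names are illustrative.

```python
HDD_READ_MS = 6.5  # from the text
SSD_READ_MS = 1.0  # from the text

def avg_page_access_ms(hit_ratio, device_read_ms):
    """Average access time: hits are treated as free, misses cost one read."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return (1.0 - hit_ratio) * device_read_ms

def hit_ratio_for_target(target_ms, device_read_ms):
    """Hit ratio needed to reach target_ms on a device with the given read time."""
    return max(0.0, 1.0 - target_ms / device_read_ms)

# Hypothetical: an HDD-backed table space enjoying a 90% hit ratio.
hdd_avg = avg_page_access_ms(0.90, HDD_READ_MS)

# Hit ratio an SSD-backed table space needs for the same average access time.
ssd_hit_needed = hit_ratio_for_target(hdd_avg, SSD_READ_MS)

print(f"HDD average at 90% hits: {hdd_avg:.2f} ms")
print(f"SSD hit ratio for the same average: {ssd_hit_needed:.0%}")
```

Under these assumptions the SSD-backed table space matches the HDD-backed one with a far lower hit ratio, which is what frees buffer pool space for the HDD side.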
What does a DBA do today to keep data and indexes organized? Several things, actually, but REORGs may be the most significant activity. REORGs accomplish a number of goals, many of which are related to disk I/O:
- Place the data in a clustering sequence
- Recapture lost space due to row deletions
- Reinsert free space into the table space
- Reduce leaf levels or pages in indexes
- Reduce fragmentation in indexes
- Reduce the number of extents of the table or index space
DB2 does its level best to keep rows close to their optimal location in the clustering sequence. But this is not always possible. The page might lack space for the optimal placement of the row on an INSERT, or a row might grow beyond the space allotted for it on an UPDATE and so must be moved. And there are many other reasons. Ultimately, over time, the CLUSTERRATIO of the data decreases from 100 percent to a lower value. How quickly it decreases depends on the volatility of the data.
You need to monitor the system (ideally in an automated fashion) to determine when a REORG needs to be run. All kinds of DB2 catalog statistics describe the table space condition, and IBM suggests thresholds as to when the REORG should occur. There are many downsides to REORGs:
- They must be scheduled and monitored.
- They consume a large amount of I/O and CPU resources.
- They can reduce concurrency and availability when executed against a live table or index space.
- They can increase logging during the REORG process.
- They flood wide area networks (WANs) with changed data traffic when disaster recovery replication is used.
The last point may be the most significant if you are replicating your database over a long distance for disaster recovery. The writes generated by a REORG, while not transactional writes, are still a necessary evil that must be added to the “real” write workload being transmitted across the link. Because the cost of telecommunications lines is so high, you would hate to fill the pipe with such busywork that could be avoided or at least reduced.
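A back-of-the-envelope estimate shows how much link time that busywork can consume. The sketch below assumes the REORG rewrites the whole table space, that every written byte crosses the link uncompressed, and that the link is otherwise idle. The sizes and link speed are hypothetical.

```python
def transfer_hours(data_gb, link_mbps):
    """Hours needed to send data_gb gigabytes over a link of link_mbps megabits/s."""
    megabits = data_gb * 8 * 1000  # decimal units: 1 GB = 8,000 megabits
    seconds = megabits / link_mbps
    return seconds / 3600

# Hypothetical: a 100 GB table space replicated over a 100 Mbps link.
reorg_hours = transfer_hours(100, 100)
print(f"Link time consumed by one REORG: {reorg_hours:.2f} hours")
```

Any REORG you can skip returns that link time to the transactional write workload.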
Deploying table spaces on SSDs can reduce the need to REORG. This approach can generate huge savings in management, CPU, I/O, and link bandwidth for remote replication. You just need to understand that the extra work the DB2 subsystem must do to retrieve index or data pages on SSDs is less than the cost of performing I/O on a table space that has been REORGed on HDDs.
Conversations with folks at IBM reveal a viewpoint within the company that DBAs may reduce REORG frequency. This notion is independent of SSD implementations and is more a reexamination of why REORGs are performed in the first place. For instance: does a large number of extents for a table space really have a measurable negative performance impact? Additionally, DSNACCOX with DB2 10 has specific code to reduce the REORG requirements for table spaces residing on SSDs.
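The idea of treating SSD-resident table spaces more leniently can be sketched as a simple threshold check. This is not DSNACCOX's logic and the thresholds are not IBM's recommendations; every name and number below is a placeholder chosen only to show the shape of the decision.

```python
from dataclasses import dataclass

@dataclass
class TableSpaceStats:
    name: str                 # hypothetical table space name
    cluster_ratio_pct: float  # current CLUSTERRATIO as a percentage
    on_ssd: bool              # True if the table space resides on SSDs

# Placeholder thresholds, for illustration only.
HDD_MIN_CLUSTER_RATIO_PCT = 90.0
SSD_MIN_CLUSTER_RATIO_PCT = 50.0

def needs_reorg(stats):
    """Flag a REORG when CLUSTERRATIO falls below the device's threshold."""
    if stats.on_ssd:
        threshold = SSD_MIN_CLUSTER_RATIO_PCT
    else:
        threshold = HDD_MIN_CLUSTER_RATIO_PCT
    return stats.cluster_ratio_pct < threshold

candidates = [
    TableSpaceStats("TS_ON_HDD", 80.0, on_ssd=False),
    TableSpaceStats("TS_ON_SSD", 80.0, on_ssd=True),
]
for ts in candidates:
    print(f"{ts.name}: REORG recommended = {needs_reorg(ts)}")
```

With identical clustering, only the HDD-resident table space is flagged, which mirrors the argument that poor clustering costs less on SSDs.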
Think differently about SSDs and DB2
Using SSDs to support a DB2 for z/OS OLTP subsystem can help you achieve the following:
- Reduce the number of REORGs
  - Save expensive MIPS, disk, and channel resources
  - Reduce the costs of remote replication
  - Save personnel time managing the REORGs
- Increase space utilization by embedding less free space
- Improve buffer pool efficiency by dedicating minimal space to table spaces on SSDs and reallocating the space thus saved to HDD buffer pools
When you consider the purchase of SSDs for your enterprise-class storage array, think beyond speeds and feeds. Think about which tasks SSDs enable you to accomplish differently.