Tuning SQL at the Senate: E Pluribus Unum

When the SQL flow in the United States Senate went from static to dynamic, the database team had to see many queries, but tune them as one

You wouldn’t think that U.S. senators and their staff would impose an overwhelming volume of queries on a financial system. Complex SQL and “senator” seem as oddly matched as, say, politics and rhetoric-free campaigns. Yet on an average weekday, the queries come by the thousands. DBAs everywhere know the score: the more responsive the database, the more ambitious user queries become. Budgets and projections for all the Senatorial offices, comparing them to actuals, plus other analysis, add up fast.

Within the U.S. Senate, the largest operations and support unit is the Sergeant at Arms, which provides a range of services to the Senate and includes the IT group. Lloyd Matthews, a principal DBA for the Senate Sergeant at Arms, has been a DBA for more than two decades. His group must meet performance goals and service level agreements set forth by his user community; the financial management system must respond to users’ online queries in 3 seconds or less. “For us, any online dynamic query that consumes more than 4 seconds is a problem and we go after it,” he says. The offenders that take more CPU and elapsed time get flagged for tuning.

In the past, Matthews’ team could efficiently monitor and tune static SQL queries in both online and batch processes. The U.S. Senate operates its internal financial management system on IBM DB2 for z/OS. This 200 GB database has more than 700 tables and is queried constantly by Senate staff working for 100 senators, the Sergeant at Arms, and the Senate Disbursing Office.

As the financial management system transitioned from a batch environment to a dynamic Web-based platform, the existing tools no longer sufficed. Tuning dynamic SQL queries was taking more time and effort for the Senate DBA group. “The switch to dynamic queries was a scary transition,” says Matthews.

What the Senate’s IT group didn’t realize is that while evolving their practices and tools to handle dynamic queries, they would adopt a new approach to addressing database performance.

Incoming queries, take cover

The Senate DBA team sees thousands of queries hit the financial management system in the peak period between 8 a.m. and 4 p.m. on any given weekday. Over recent years, the queries have shifted from static COBOL/CICS to 95 percent dynamic queries. Many are repeated with different literals and they can ask for “a lot of data, such as five years of history at a shot,” Matthews says.

The DBAs and software developers searched for tools to help them handle query tuning, evaluating tools from several vendors including IBM, Computer Associates, and BMC Software. The financial management system runs on both mainframe and distributed platforms, so they looked at tools that support multiple platforms.

But when they looked at IBM InfoSphere Optim Query Workload Tuner software, they saw a tool that could change their entire process. “We did find other tools that capture and analyze both dynamic and static queries, but they were limited to single queries,” Matthews says. “They don’t handle multiple queries at once. The ability to optimize multiple queries (that is, a workload) emerged as the function that matters the most to us. It enables us to look at SQL statements by groups during an interval of time, then tune them as one. That’s a powerful feature.”

>Analyze the workload, fix the workload

Governments through history have learned they can line-item a massive budget, or consider the budget as a whole, which is much faster. Echoing these two options, in query tuning the traditional approach has been “line item.” Catch an offending query, get a recommended solution, carry it out, and note the improvement. It’s straightforward, but not very scalable in a dynamic query environment.

The Senate’s DBAs now have a highly scalable approach: all queries from Monday to Friday are flagged for attention between an interval of time and that constitutes a “workload.” Then, on any given morning or afternoon at the U.S. Senate, a DBA specifies the workload to be analyzed and enters the criteria.

Typically, says Matthews, “I want to flag any SQL or query that executes in more than 5 seconds elapsed time or CPU time. The query tool gives me the ability to sort by different views. I normally look at views of elapsed times and CPU times for dynamic SQL. For batch SQL or large reports that usually run at night, I look at the view that shows any that read more than 100,000 pages of data within a transaction. This tells me that I may have an inefficient access path, where DB2 reads more data than it should. Normally I see this in batch or static packages.”

For dynamic workloads, Matthews has the query tuner “snap” SQL from the dynamic statement cache every 5 minutes and then, at the end of the workload, consolidate all the SQL into a final output of SQL statements. This is where reconciliation occurs. “I may have the same dynamic SQL statement in each interval, if it ran multiple times. In the final output version, however, or the reconciliation version, I see the statement only once, not duplicates. I then run my query tuning ‘advisors’ to get recommendations for that final version of SQL statements, which is more efficient since it includes no duplicates.”

The new work pattern

The DBAs don’t want to lock up resources or implement a recommendation in the production environment without first testing in a production-like environment, because implementing recommendations directly into production could have a negative impact upon users. “After implementing the recommendations in our production-like environment, I run the workload again. If everything looks good—lower CPU costs, lower elapsed times, and less I/O—we implement those recommendations in our production environment over the weekend after a reorg of the data. At that point, I am confident that applying the recommendations to our production database will significantly improve performance for our dynamic queries that were flagged by the query tuner,” Matthews says.

The workload approach eliminates—actually, it reconciles—redundant statistics to produce efficient solutions. Typically, a problem query that takes 30 seconds of CPU time is immediately reduced to 2 or 3 seconds. “We’ve seen immediate improvement in CPU requirements, input/output and elapsed time, and better access paths,” Matthews says.

Matthews’ group can now proactively identify and resolve emerging issues before application performance is affected. “Throughout the day, we tune workloads that we filter out [from the entire mass of query traffic] for attention,” he says.

The DBAs take advantage of report features in Optim Query Workload Tuner to gain insight for improving database query performance. DBAs can visually compare access paths side by side, view relevant statistics, and conduct what-if analysis to determine whether a change in a new index will improve the performance of a SQL statement. A query might be too CPU-intensive, consume too much I/O, or ask for too much data to read. “If it runs infrequently or is not causing production issues and is part of the workload, we’ll tune it for next week during our weekly maintenance. However, if it’s running one hundred times per day and causing production issues, that’s a repeat offender and we’re going to take action immediately to tune it,” Matthews says.

Matthews recommends DBAs look for tools that include a function that converts long SQL statements into a format that is easy to parse. “If the statement is two pages long, a huge monstrosity, a feature of our query tool called Annotation puts all the elements into a format so you can read it. When we get a serious repeat offender and have to take it back to the developer who wrote it, the Annotation makes it easier to read and shows the DBAs where missing stats are, and skewed data. The Annotation also shows us cardinality of objects and predicates, which can be critical to the tuning process: ‘This is not a good SQL statement, and here is why.’”

Access path graph (APG), commonly known as the visual explanation feature, is helpful as well, as it gives the DBA a visual view of the access path taken by the optimizer, along with cardinality of objects and costs of sorts.

>When data is all skewed up

“Data, in our experience, tends to be anything but uniform. Skewed data hurt us in the past; now we can attack it,” says Matthews. Optim Query Workload Tuner also analyzes and corrects for data skew. Without correct column distribution statistics, DB2 may not have enough information to make the best choice when choosing an access path to the data—especially if data is heavily skewed. That’s good.

Getting the correct statistics to DB2 is critical if the data is skewed, Matthews says, especially in an online environment where performance is essential. “Each day I run a workload against our dynamic cache. By the end of the week, by Friday, I have five days of statistics.” Applying just the Monday statistics recommendations—which might apply to 100 SQL statements captured—to the production-like environment often fixes most of the problems of all the workloads analyzed throughout the entire week and dramatically cuts execution time. With the new stats available to DB2’s optimizer, access to the data can be dramatically improved. “Within the workload, individual SQL statements that run for greater than 10 seconds or some queries that run more than 2 to 3 minutes can, with the appropriate statistics, drop to 2 or 3 seconds,” Matthews says. The gain in performance is achieved just by giving DB2 better statistics to compensate for data skew.

Users appreciative

System users have noticed the team’s ability to identify and resolve poorly performing queries quickly. “Before, it would take us hours or days to identify problem queries. Now, in most cases, we can have a solution within minutes,” Matthews says.

The new approach has helped Matthews and his colleagues reduce CPU time usage and has let the DBAs prevent problem queries from becoming a drag on performance. Matthews’ experience has convinced him that generating new statistics on an entire workload produces better results because it reconciles all conflicting statistics.

Best practices by the government, for the government

The Senate’s IT group moved from single-query fixes to ongoing “preventive” optimization and tuning. The new approach has cut down on the distress phone calls from users. For Matthews, it was a revelation to switch from diagnosing individual SQL statements to sift frequently through populations of queries and address them as a group.

For reasons outside the DBA’s view, the Senate tends to run different kinds of queries at different times of the day. The DBAs generally test the morning workload for two hours, then the afternoon workload for two hours, and finally the batched queries at night.

Matthews recommends that DBAs “not be afraid to use the query tuner’s variety of detailed reports. Reports generated by the workload tuner offer a wealth of detailed information about your objects.”

When only a skilled DBA can fix it

When necessary, the Senate’s DBAs still revert to the mode of catching a single major offending SQL statement. “We have all encountered that ad hoc user who uses the system once and somehow magically creates the worst possible query—so bad that it really affects performance—and we will apply a solution right there,” Matthews says.

In these instances, Matthews may reorg that object, index, restat, and rebind (in the case of static queries), possibly creating a pseudo-outage. He is glad these scenarios are the exception now. “If we analyzed each query statement individually, it would require hundreds of analysis and tuning procedures, versus one,” he says.

Matthews finds that query response times “are usually at their ugliest” after a new release or a migration. The DBAs anticipate that the database will react differently to the same workload. “We have to get ahead of the inevitable transition issues and wring them out. We capture stats, test them, and apply them to the production database over the weekend,” he says.

Does it always optimize?

Every method has its exceptions, and Matthews’ team has learned to handle them. “Once we applied new stats, and to our surprise some non-problematic SQL statements got worse,” says Matthews. “You have to be careful. These statistics will fix your offenders, but sometimes DBAs don’t reorder data as frequently as they’d like to. If statistics haven’t changed in a while and you have a desirable access path with current stats, DB2 will continue to use that path based on the old stats. However, when new statistics are applied, DB2 will now realize data may not be clustered as previously thought, therefore causing a change to an access path that is less desirable. Before you apply new stats to the database, I recommend you reorg the data first.”

When the Senate DBAs apply the recommended statistics to the production database over a weekend, after testing them in the production-like environment, it’s key to reorganize the database first, “to get back to baseline,” Matthews says. He reminds his team to use the reorg utility, which clusters the data for efficient access. “We all know the mantra: reorg, runstats, and rebind.”

There are other situations where simulation in a production-like environment is not perfect. “We can simulate the performance of the queries with new stats, but we cannot truly simulate the number of concurrent users hitting at the same time,” Matthews says. “Some queries may run faster due to more CPU availability when you don’t have a lot of users.”

The new life of Senate DBAs: Less ad hoc

Matthews’ team still gets user calls about the system response time, but much less often. “Usually they waste no time notifying us. We check our monitor, identify the resource-intensive query, and run it through the query tuner. We might find that we’re missing an index or stats, and we make the fix. But our focus is on workload tuning, not ad hoc fixes.”

Matthews points out that “we are not a huge shop.” If they tuned one query at a time, “we’d be understaffed.” Instead, they are able to spend more time doing other tasks rather than tuning queries all day. “Query workload tuning makes us proactive, not reactive,” he says.

Final vote

With the right preventive strategy and analytics tools in their repertoire, the U.S. Senate’s DBA, developer, and quality assurance groups continually optimize database query design and seek to fix problems before users are affected.

“It’s not just that now we can usually arrive at a solution for tuning queries within minutes versus hours or even days, which enables us to achieve our SLAs [service level agreements] in a timely fashion,” Matthews says. “We have become skilled at tuning entire workloads rather than single queries. To the benefit of our users, we are now proactive versus reactive in tuning bad SQL.”