Know Before You Go
Advanced testing strategies outline the path ahead—without pitfalls
Change is inevitable. Bringing new applications and new systems online is part of the job. But with change comes risk. Every new piece of code raises the question, Will it work the way it’s supposed to? When it comes to business-critical systems, the answer must be yes.
The best way to know for sure is through testing. Done properly, testing reduces risks and lets you be a change champion without gambling. But what should a test measure, and how do you build effective tests?
To help answer these questions, we turned to experts: the Performance and Benchmark team at Infor, a major ERP provider based in Alpharetta, Georgia, that is known for its rigorous benchmark practices and skills. Hans Kamman, director of Development and Release Management, runs the Infor Performance and Benchmark team based in Barneveld, the Netherlands. Kamman and three of his performance engineers, Cees Padmos, Dick Westeneng, and Adrian Voortman, offered up advice on benchmarking and the basics of efficient, effective testing.
Creating an effective test is an exercise in balance. Your test must produce results that repeat when the application is installed and running in a live environment, but you don’t have the time or budget to re-create your organization’s entire production system. If you’ve decided that you need more specific information than a standard vendor benchmark can provide, you need to create just enough realism in three areas: data, users (or processes), and infrastructure. For more information on vendor benchmarks, see the sidebar “What goes into a vendor benchmark?”
What data should you use?
For testing purposes, the goal is to create a data set that represents your organization’s data but is tailored to the specific needs of testing. Copies of actual production data are a good place to start, but you’ll almost certainly need to do some tweaking. Evaluate the quality of your test data by thinking in two dimensions: volume and distribution (skew).
The volume question may be the easiest: do you have enough data to create a realistic test load? From there, the decisions become more complicated. For example, when figuring out the correct distribution of your test data set, you obviously can’t just make a million copies of the same record. But a perfectly even distribution isn’t natural either—most data sets have areas of greater and lesser concentration.
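To make the skew point concrete, a test-data generator can draw keys from a skewed distribution rather than cloning one record or spreading records perfectly evenly. The sketch below is illustrative only; the record layout, field names, and skew factor are all hypothetical:

```python
import random
from collections import Counter

def generate_orders(n_orders, n_customers, skew=1.5, seed=42):
    """Generate synthetic order records whose customer IDs follow a
    skewed (roughly Zipf-like) distribution: a few customers account
    for most of the orders, as in many real data sets."""
    rng = random.Random(seed)
    # Weight customer i proportionally to 1 / (i + 1) ** skew.
    weights = [1.0 / (i + 1) ** skew for i in range(n_customers)]
    customers = list(range(n_customers))
    return [{"order_id": i, "customer_id": rng.choices(customers, weights)[0]}
            for i in range(n_orders)]

orders = generate_orders(10_000, 100)
counts = Counter(o["customer_id"] for o in orders)
# The busiest customers hold far more orders than an even split (100 each)
# would give them, while the tail customers still appear.
print(counts.most_common(3))
```

Tuning the `skew` parameter up or down lets you test both the hot-spot and the evenly-spread extremes against the same schema.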
You should also consider the impact of erroneous data on the system that you’re testing. Decide whether you need to include some percentage of mistakes.
Finally, stay aware of privacy and security issues if you’re sampling production data. You may need to set up auditable protocols that randomize, mask, or otherwise obscure your test data.
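One common masking technique is replacing identifying fields with deterministic pseudonyms, so real identities disappear but the same input always maps to the same pseudonym and foreign-key relationships in the test data stay intact. A minimal sketch, assuming a hypothetical customer record layout:

```python
import hashlib

def mask_customer(record, salt="test-env-salt"):
    """Return a copy of a customer record with identifying fields
    pseudonymized via a salted hash. Deterministic: the same input
    always yields the same pseudonym, preserving referential links."""
    digest = hashlib.sha256((salt + record["email"]).encode()).hexdigest()[:12]
    masked = dict(record)
    masked["name"] = f"Customer-{digest}"
    masked["email"] = f"{digest}@example.invalid"
    return masked

original = {"id": 7, "name": "Jane Doe", "email": "jane@corp.example"}
print(mask_customer(original))
```

In an auditable protocol the salt would be managed as a secret, since anyone holding it could re-derive pseudonyms from known inputs.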
Let ROI be your guide
Want a clear map to how much testing you should do? Calculate what an application failure or slowdown would cost the business, and then use that number to make scope and budget decisions.
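The back-of-the-envelope arithmetic might look like this (all figures hypothetical, for illustration only):

```python
def expected_failure_cost(revenue_per_hour, outage_hours, failure_probability):
    """Expected cost of a go-live failure: revenue at risk multiplied
    by the estimated probability of the failure occurring."""
    return revenue_per_hour * outage_hours * failure_probability

# Hypothetical numbers: $50,000/hour of revenue, an 8-hour outage,
# a 10% failure chance without testing versus 1% with testing.
risk_untested = expected_failure_cost(50_000, 8, 0.10)  # $40,000
risk_tested = expected_failure_cost(50_000, 8, 0.01)    # $4,000
testing_budget_ceiling = risk_untested - risk_tested    # $36,000
print(testing_budget_ceiling)
```

The difference between the two risk figures gives a rough ceiling on what the testing effort is worth to the business.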
Who are your users?
Of course, test data is inert—you need scripts to simulate the actions that your system will support, such as transactions for an OLTP database or queries on a data warehouse. Writing a good script requires you to understand what users do and how often and how fast they do it.
The foundation of a good script starts with a clear understanding of the business process that the application will support. For example, how many total users will there be, and how many will be logged in at the same time? How many active sessions can you expect, and what is the maximum number of simultaneous transactions that the system needs to support? Critical resources for good script design are articulate, involved folks from the business side who can work with you to develop effective specifications. “Your starting point should be to establish the user roles that will figure into the testing,” says Padmos. “And don’t forget that some business processes involve multiple user roles. For example, a purchase order may have to sit in one or more managers’ queues for approval after the buyer sets it up.”
Building effective test scripts requires imagination and awareness. For example, an automated script may need less than a second to fill out a query or data entry form that would take a human several minutes to complete. Therefore the test scenario needs to incorporate waiting time between steps in the test. You can expect notable human variations in this waiting time. “What happens, for example, if the keystroke speed of operators goes up by 10 percent? Does that actually ever happen? Does it happen only on the day after Thanksgiving or every day after the 10 a.m. caffeine break? And would it adversely affect system response time?” asks Kamman rhetorically. “While it’s easier to write a script that tests every function sequentially, humans are almost never that methodical and predictable. To be realistic, the test scenarios should be able to simulate user behavior that can show almost random variations.”
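Kamman's point about variable, human waiting time can be built directly into a load script: instead of replaying steps instantly or with a fixed delay, draw each pause from a distribution. A sketch with hypothetical timing parameters:

```python
import random

def think_time(mean_seconds=90.0, stddev_seconds=30.0, floor=1.0, rng=random):
    """Return a randomized 'human' pause between script steps, drawn
    from a normal distribution and clamped to a minimum, instead of
    the instant replay an automated script would otherwise produce."""
    return max(floor, rng.gauss(mean_seconds, stddev_seconds))

# In a load-test script, the driver would sleep this long between form steps.
rng = random.Random(7)
waits = [think_time(rng=rng) for _ in range(5)]
print([round(w, 1) for w in waits])
```

Shifting the mean down by 10 percent models Kamman's faster-operator scenario without rewriting the script.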
What goes into a vendor benchmark?
In-house testing takes time and money. Standard benchmarks offered by many software vendors are cost-effective alternatives. For its flagship ERP application, ERP LN, the Infor team tests up to 10,000 highly active concurrent users running on a single database server with 16 application servers. “We benchmark the same applications across different hardware and database platforms,” says Kamman. “Infor has five scripted business scenarios in our ERP LN benchmark kit: sales, purchasing, finance, service, and browsing.”
Four of the Infor ERP LN benchmark scenarios consist of a complete flow, using a series of sessions to search for, enter, and process data. The browsing scenario, by contrast, entails a series of data lookup sessions. In combination, these five scenarios satisfy the sizing requirements of many companies.
If you’re having trouble making a decision or a business case for in-house testing, ask yourself these questions:
- How closely do the test data, workloads, and database configurations of the vendor benchmark match up to those of your company?
- Are you planning a big-bang implementation? All the more reason to smoke out performance issues before you go live.
- Is your IT environment unusual or nonstandard?
- Is your business volume unusually high? Will the application have an extraordinary number of users, or will your data tables grow faster than industry norms?
Once you have scripts that capture the necessary business processes and user behavior, set aside some time to review how the scripts access your data sets. Is there too much contention? Not enough? “For example, are purchase orders concentrated on a few vendors, or are they spread out evenly across several vendors?” asks Voortman. “Simply rotating through a hundred sample vendors in the same sequence isn’t realistic, but neither is issuing all purchase orders to a single vendor.” Of course, some contention does happen in real life: the warehouse that carries a popular product will have the highest transaction rate—a pressure situation that you’ll want to test.
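Voortman's vendor example can be scripted directly: rather than round-robin rotation or a single hot vendor, assign each purchase order a vendor drawn from a weighted distribution, so a few vendors carry most of the volume while the tail still sees traffic. The vendor IDs and weights below are made up for illustration:

```python
import random
from collections import Counter

VENDORS = [f"V{i:03d}" for i in range(100)]
# A handful of 'popular' vendors carry most of the volume;
# the remaining 95 share the tail.
WEIGHTS = [10.0] * 5 + [1.0] * 95

def pick_vendor(rng):
    return rng.choices(VENDORS, WEIGHTS)[0]

rng = random.Random(1)
orders = [pick_vendor(rng) for _ in range(5_000)]
counts = Counter(orders)
# Popular vendors see far more orders than tail vendors, but no single
# vendor absorbs everything -- realistic contention, not an artificial hot spot.
print(counts.most_common(3))
```

Reviewing the resulting counts is also a quick sanity check that the script exercises the contention level you intended.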
The goal is to eliminate unrealistic conditions that will affect performance and skew results—and to identify unintended consequences. “One company that we worked with created a test script that attributed all orders to a single vendor,” says Kamman. “The idea was partially to be sure that the system could handle a specific volume, but they were also trying to reduce the amount of work that they needed to do on the test environment.” However, the strategy backfired slightly. “The high order volume caused the sample vendor to exceed its credit limit, and the system started to refuse orders,” says Westeneng.
How suitable is your test infrastructure?
Creating a test infrastructure that reflects your production systems can be a real challenge. Some companies can run a production-like environment during off-hours or on weekends, but most do not have that luxury. If you're struggling with this, you might want to ask a systems integration partner or software vendor for help. "Our consulting services, for example, may be able to create a close-enough version of our customer's production environment," says Kamman. "For specific requirements, the application provider may also have tools and scripts a customer can draw on for testing."
Infor and IBM: Testing the latest iron
Infor usually focuses on testing its ERP LN software, but in 2009 and 2010 IBM asked the company to help benchmark IBM POWER6 and POWER7 systems. “When Infor tested IBM POWER6 with DB2 9.5, we encountered a huge performance bottleneck,” says Voortman. The CPU could not be saturated, and response times ballooned to as much as 500 seconds. In the end, Infor found that the cause was the database setup: a single table space and one large data file for all tables and indexes using JFS2 file system fast pre-allocation. Dividing the data and indexes over multiple table spaces and data files restored performance.
When testing a beta version of the POWER7 system, Infor discovered that the benchmarks indicated a maximum of only 2,100 users on 16 cores. In comparison, a POWER7 predecessor, the POWER6 system at 5 GHz in 2-tier mode, had been tested with 2,800 users. It took substantial tuning and testing—and help from an IBM kernel engineer—to discover that the test machine was running in power save mode at 2.5 GHz instead of the normal clock speed of 3.5 GHz reported by prtconf, a command that displays the system hardware configuration. In its optional TurboCore mode at 3.864 GHz, the system performed up to expectations.
Open minds find answers
Recognize that running tests will likely be as much of a learning experience as building them. The initial run of a test may show unexpected results—usually substandard performance. That starts the detective work. Poor test results may have causes outside the application itself. Keep an open mind and be ready to pull in database, network, and hardware experts to help explain mysterious results. And while you aim for realism, understand that no test is foolproof. “You cannot model reality 100 percent,” cautions Kamman.
Testing an application carefully builds a deeper understanding of how it works and how your own IT environment affects applications and business processes. To bring new applications and versions online, it is essential to reduce risk and protect IT performance. The better the benchmarks, the more freedom you have to be a champion of change.