Development Notebook: The Evolution of the IBM Smart Analytics System
Optimization, tuning, and the art of fitting 20 years of warehousing and analytics experience into a single package
The IBM Smart Analytics System is another step forward in integrated appliances—highly optimized and powerful, with options for built-in analytics and some attractive features aimed at database administrators (DBAs). We had a chance to sit down with two key development team members—Haider Rizvi, senior technical staff member, IBM Smart Analytics System architecture, and Nancy Kopp, program director, competitive and analyst strategy, IBM Information Management Software—and get some of the background on the system and the long history behind its development.
What prompted IBM to start building analytic appliances?
RIZVI: Market demand. Data warehousing and analytics were continuing to spread everywhere, and they often became multivendor environments. We had customers run into problems with this. In one situation, the system simply stopped running, and it wasn’t clear where the problem originated. After 15 months of frustration, it was discovered that the servers were undersized for the application and workload, and the I/O on the storage unit was bottlenecking operations.
So this kind of situation was the catalyst?
RIZVI: Not many cases are that extreme, of course, but refereeing between several vendors to determine the root cause of a problem was a serious customer pain point.
KOPP: We were also seeing customers getting stalled with the integration, trying to bring analytics online—and such delays become less tolerable all the time.
And that’s less tolerable given the demand for real-time data availability and query responses, and high availability, right?
KOPP: Of course. And you don’t get high availability on mixed, high-stress workloads without a lot of design experience.
But why build a complete, integrated system? Isn’t every situation still different?
RIZVI: Good point. Yes, certain software features will vary. But our hardware building blocks are standard, fully integrated pieces that we can clip together. They are very heavily tested and optimized—we know how they’ll behave in heavy usage, and they have been used extensively in the real world.
How long have they been in the marketplace?
KOPP: These units go back almost to when IBM started building data warehouses in the early 1990s. We developed a methodology for putting together and optimizing building blocks of hardware, software, and OS [operating system] to handle warehousing and, later, analytics.
And this led to the IBM Smart Analytics System?
RIZVI: The IBM Smart Analytics System is effectively in its fifth generation, but yes, it’s been a long-term, incremental development path to get here. Originally we named it the BCU methodology, for Balanced Configuration Unit. Over the years, BCUs became the building blocks of the Smart Analytics System. Today everything in the building blocks reflects what is market-standard and proven with current workloads. They are DBA-friendly.
Because the integration and load testing have been done?
RIZVI: Yes, but more than that. We did several things with the DBA in mind, including the control features. But for starters, over nearly 20 years, we have built up skills and experience at optimizing integrated, ready-to-run systems with balanced operations, so every component from processor to memory to I/O and operating system and the database can handle a specified workload without getting bottlenecked, with an attractive total cost of ownership.
With that approach, do you risk creating a result that’s too generic for some customers?
RIZVI: Usually not. Over time, we have learned which integration tasks we are better equipped to handle than most customers are, using a combined hardware-software solution, and what kind of support customers require.
Is building an analytics system really that complex? Lots of companies have created their own.
KOPP: It is a major task. Many of those companies you mention struggled with the implementation and needed up to a year to get the system working smoothly.
RIZVI: It’s useful to understand two things. First, creating a high-performance analytics system is a delicate balancing act. All of the components, from the memory and processor up through the BI [business intelligence] tools, need to be tuned to work together and balanced against each other. IBM patented the methodology we use to do this. We’re not talking about applying a system tuning checklist, or a couple of “a-ha!” moments.
So even the patented methodology is not a story of a single nova-like breakthrough.
RIZVI: That’s right. It’s a story of persisting over two decades with nonstop, incremental improvements. That led to the strong, experience-based methodology for integrating and optimizing appliances, which was granted a patent. That brings us to the second point: we’re talking about creating a system that wrings every drop of performance out of the hardware, and one that can be unboxed, loaded with data, and working in a few days or weeks. This is not something that just anyone can do.
It can take months.
RIZVI: It’s fair to say that customers who take on the task of buying and integrating their own mix of servers, memory, I/O, operating system, database, warehouse, analytic applications, and ETL [extract, transform, load], plus backup, can expect to spend months on the task.
What was the biggest challenge that the team faced?
RIZVI: [laughs] I’m not sure that I can pick one out! But we definitely stay up late thinking about how to keep I/O at maximum efficiency. Many pieces contribute to efficient I/O. At one point, when we saw the stresses that analytic workloads applied, we tried offloading temp space to a solid-state device. The performance gain was exceptional.
That must have felt like a breakthrough moment.
RIZVI: It did—and this feature became a permanent part of the Smart Analytics System 5600 and 7700. The systems also leverage compression; it plays an effective role in driving down the cost of handling a specific workload.
You mentioned that your integrated appliances are DBA-friendly. What else relates to that?
KOPP: Development of the IBM Smart Analytics System brought together the hardware and software groups within IBM, which improved collaboration. Unifying the two teams led to features, such as the system management console, that span software and hardware, helping DBAs to take on more responsibility on both sides and to play a more strategic role.
What is the system management console?
RIZVI: Every IBM Smart Analytics System 5600 and 7700 will be equipped—within the next several months—with a management console that gives the DBA or other administrator command-line control over the entire system: hardware, software, operating system, driver, firmware, and other components.
How is it different from typical software administration screens?
RIZVI: This console gives DBAs or other administrators the ability to maintain the whole cluster for not just software, but firmware and hardware as well. The administrator can see OS and firmware upgrades at a glance, and can orchestrate an update across an entire cluster that has dozens of data nodes.
How does the administrator operate the management console?
RIZVI: It’s command-line oriented. The DBA can address the tooling for all DB2 software, IBM InfoSphere, the OS, and firmware. When conducting updates, the console will even instruct the administrator to reboot at the right point so the changes take effect.
KOPP: The console eliminates certain surprises. You won’t have a situation where the system crashes and that’s how you learn a colleague updated to a new OS version without telling you.
So, how does that lead to a more strategic role for DBAs?
RIZVI: Database administrators have been operating beyond the database for years; for example, by learning the OS and determining how to lay data down. In so doing, they’ve become co-architects in the overall data warehouse environment.
How does IBM ensure these units work right out of the box with the customer’s data?
RIZVI: We’re taking our direction from real-world deployments. We do extensive testing with the Smart Analytics System—it’s as if a car manufacturer put your family and luggage in the vehicle and towed your boat repeatedly to wherever you vacation, before delivering it to you. When possible, we develop and test the Smart Analytics System using real customers’ data.
Can you be more specific about the testing process?
RIZVI: IBM runs three types of query scenarios: very complex and demanding queries, queries of medium complexity, and large numbers of small queries. Our workloads fall into two categories: one for performance benchmarking, and the other for customer stress tests. Both mimic large numbers of users and put the I/O system to the test. We benchmark each unit (this is important to mention) and then we rerun the same benchmark tests after the unit is installed with the customer’s data, to make sure no problems have entered the picture.
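As a rough illustration of the before-and-after comparison Rizvi describes, the Python sketch below checks timings from a post-installation run against a baseline run for each of the three query classes. The class labels, the timing figures, and the 10 percent tolerance are all assumptions made for illustration; none of them come from IBM's actual test suite.

```python
from dataclasses import dataclass

# Hypothetical labels for the three query classes described in the interview.
QUERY_CLASSES = ("complex", "medium", "small")


@dataclass
class Regression:
    query_class: str
    baseline_s: float
    installed_s: float

    @property
    def slowdown(self) -> float:
        """Fractional slowdown relative to the baseline run."""
        return (self.installed_s - self.baseline_s) / self.baseline_s


def find_regressions(baseline, installed, tolerance=0.10):
    """Return the query classes that got slower by more than `tolerance`.

    `baseline` and `installed` map a query class to elapsed seconds.
    `tolerance` is an assumed threshold, not an IBM figure.
    """
    regressions = []
    for query_class in QUERY_CLASSES:
        before = baseline[query_class]
        after = installed[query_class]
        if after > before * (1 + tolerance):
            regressions.append(Regression(query_class, before, after))
    return regressions


# Made-up timings in seconds, purely for illustration.
factory_run = {"complex": 120.0, "medium": 30.0, "small": 2.0}
on_site_run = {"complex": 126.0, "medium": 45.0, "small": 2.1}

for r in find_regressions(factory_run, on_site_run):
    print(f"{r.query_class}: {r.slowdown:.0%} slower than the baseline")
```

With these invented numbers, only the medium-complexity class is flagged, because the other two stay within the assumed tolerance. A real comparison would likely also track throughput and I/O metrics rather than elapsed time alone.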
What is the upcoming development path for the IBM Smart Analytics System?
RIZVI: We’ll be studying and making sure we can handle growing data warehousing workloads. Pretty much what we’ve done for 20 years: ongoing, incremental improvement and optimization.
How do you see the DBA’s role evolving, and does IBM play a role in that evolution?
RIZVI: Database administrators’ roles are changing in two major ways: first, spanning both software and hardware, and second, helping business users tap more of the potential of warehouses, analytics, and other new applications. That second shift, as it happens, can occur when DBAs are less often forced into a troubleshoot-and-repair mode. We support their more strategic role with integrated appliances that give them broader control, give them fewer issues to troubleshoot because of our integration and testing, and contain important new applications that will take star roles in the years ahead.