DB2 pureScale Clustered Database Solution: Part 1

An architecture and technology overview

This article was coauthored by Yvonne Chan and John Hehir.
This is a two-part article. Part 1 covers DB2 pureScale architecture and technology, and Part 2 covers DB2 pureScale application configuration.

Continuous availability, application cluster transparency, extreme capacity: we’ve seen these keywords associated with IBM DB2 pureScale in all the marketing materials. But what do they really mean?

IBM® DB2® pureScale® technology is more than just a feature: it is a whole new way to look at a DB2 database. You’re no longer limited to a single host to access your data, and you are not required to partition your data so that each host owns a partition. DB2 pureScale provides a clustered solution that makes use of multiple hosts to access the same data partition, which allows for increased capacity and continuous availability.

DB2 pureScale technology

DB2 pureScale is a tightly integrated database clustering solution that uses IBM DB2 for Linux, UNIX and Windows as its core engine and runs on IBM POWER® and IBM System x servers. DB2 pureScale can be installed on AIX, SUSE Linux Enterprise Server, or Red Hat Enterprise Linux. For the supporting network infrastructure, Ethernet is required for client/server connections. InfiniBand (IB) or 10 Gigabit Ethernet (10 GigE) is required for high-speed, low-latency communication between members and cluster caching facilities (see Figure 1).


Figure 1. The DB2 pureScale architecture

The DB2 pureScale feature is based on industry-leading IBM System z® data sharing architecture.

Clusters can consist of up to 128 members, providing 24/7 availability to database applications. Clusters can be scaled out horizontally by adding servers or vertically by adding processors or memory. DB2 pureScale is an active/active failover solution that requires no intervention if a member or cluster caching facility (CF) encounters a software failure. An entire cluster can be configured and installed from any host that will be part of the DB2 pureScale cluster.

DB2 pureScale architecture

To understand how DB2 pureScale can provide the resiliency that applications need from their databases, first you need to have a high-level understanding of what a clustered environment is and how it helps give any solution resiliency in the face of unexpected failures (see Figure 2).


Figure 2. DB2 pureScale cluster with four members and two cluster caching facilities


DB2 pureScale cluster members

DB2 members are the “engines” (db2sysc processes) that accept client connections and process information for applications. Each DB2 member has access to the database and any data therein; however, each member has its own local memory containing things like bufferpools, package cache, utility heap, sortheap, and locklists. Regardless of the data being accessed, transactions can run on any member in the cluster. Each member has its own transaction log stream that resides on the shared filesystem.

Cluster caching facility (CF)

Cluster caching facilities—a key component of DB2 pureScale—provide several critical services, their primary purpose being the management of central resources that are shared between the members. These shared resources include a global bufferpool and a global lock manager.

Global Bufferpool (GBP)

This part of the CF holds all the dirty pages (pages that have rows that have been updated, inserted, or deleted in the DB2 pureScale instance). Memory management in a DB2 pureScale instance is no longer single-tiered, but exists over two tiers. Local bufferpools exist per member and have a copy of all the pages the member needs. In addition to a copy of every dirty page, the GBP also holds information about which members have a copy of the current page.
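The two-tier lookup described above can be illustrated with a toy model. This is a minimal sketch, not DB2 code: the class names, the dictionary-based "disk," and the read and update methods are all hypothetical, and invalidation of stale local copies is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalBufferPool:
    """Toy stand-in for the GBP: dirty pages plus a registry of page holders."""
    dirty_pages: dict = field(default_factory=dict)  # page_id -> latest contents
    holders: dict = field(default_factory=dict)      # page_id -> set of member ids

    def register(self, member_id, page_id):
        # Record that this member now holds a local copy of the page.
        self.holders.setdefault(page_id, set()).add(member_id)

    def write_dirty(self, page_id, contents):
        # Keep the most recent version of an updated page.
        self.dirty_pages[page_id] = contents


@dataclass
class Member:
    """Toy member with its own local buffer pool (hypothetical structure)."""
    member_id: int
    gbp: GlobalBufferPool
    disk: dict                                       # shared storage: page_id -> contents
    local_pool: dict = field(default_factory=dict)

    def read(self, page_id):
        # Tier 1: the member's local buffer pool.
        if page_id in self.local_pool:
            return self.local_pool[page_id]
        # Tier 2: the GBP if it holds a dirty copy, otherwise shared disk.
        if page_id in self.gbp.dirty_pages:
            contents = self.gbp.dirty_pages[page_id]
        else:
            contents = self.disk[page_id]
        self.local_pool[page_id] = contents
        self.gbp.register(self.member_id, page_id)
        return contents

    def update(self, page_id, contents):
        # Change the local copy and push the dirty page to the GBP.
        self.local_pool[page_id] = contents
        self.gbp.register(self.member_id, page_id)
        self.gbp.write_dirty(page_id, contents)
```

In this sketch, a second member that reads a page after the first member has updated it gets the new version from the GBP, even though the copy on shared disk is still the old one.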

Global Lock Manager (GLM)

Global lock manager services are used to manage locks held locally on members. Before any member can update a row on any page, the member needs to negotiate with the global lock manager to get the appropriate locks on the row and page. As locks are acquired and released, the pages on all the members can be invalidated as required. For example, as soon as a member makes a change to a page, all the other copies of the page that belong to other members are invalidated automatically by the lock release via remote direct memory access (RDMA).
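The negotiate, update, and invalidate sequence can be sketched as follows. This is a simplified illustration under stated assumptions: it models a single exclusive lock per page (the real GLM also handles row locks and other lock modes), the class and method names are hypothetical, and a plain flag update stands in for the RDMA write.

```python
class GlobalLockManager:
    """Toy stand-in for the GLM: one exclusive lock per page (an assumption)."""

    def __init__(self):
        self.owners = {}   # page_id -> member currently holding the lock
        self.copies = {}   # page_id -> set of members with a cached copy

    def acquire(self, member, page_id):
        owner = self.owners.get(page_id)
        if owner is not None and owner is not member:
            return False   # another member holds the lock; the caller must wait
        self.owners[page_id] = member
        return True

    def release(self, member, page_id, changed):
        if self.owners.get(page_id) is not member:
            raise RuntimeError("lock not held by this member")
        del self.owners[page_id]
        if changed:
            # Mark every other cached copy as stale; the real system does
            # this with an RDMA write rather than a Python assignment.
            for other in self.copies.get(page_id, set()):
                if other is not member:
                    other.valid[page_id] = False


class CachingMember:
    """Toy member that tracks whether each locally cached page is current."""

    def __init__(self, name, glm):
        self.name = name
        self.glm = glm
        self.valid = {}    # page_id -> is the local copy still current?

    def cache(self, page_id):
        self.valid[page_id] = True
        self.glm.copies.setdefault(page_id, set()).add(self)

    def update(self, page_id):
        # Negotiate with the GLM before touching the page.
        if not self.glm.acquire(self, page_id):
            return False
        self.cache(page_id)
        self.glm.release(self, page_id, changed=True)
        return True
```

After one member's update succeeds, every other member's copy of that page is flagged as invalid, so its next read has to fetch the current version instead of trusting its local copy.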

Each cluster caching facility has a role designation. Typically, one CF holds the PRIMARY role, while the other holds the PEER role and is available for failover. The PRIMARY CF is the main holder of all lock information, whereas the PEER CF only requires a subset of this information, because duplexing all lock information would be performance-intensive and would introduce unnecessary transaction processing overhead. DB2 therefore duplexes only the lock information that a PEER cluster caching facility needs to take over the PRIMARY role. Any locking information that can be rebuilt in a short amount of time is not duplexed.
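A rough sketch of selective duplexing follows. The article does not say which locks are duplexed or how the remainder is rebuilt, so the `rebuildable` flag and the rebuild from members' local lock lists are assumptions made purely for illustration, as are all the names.

```python
class CF:
    """Toy cluster caching facility holding a lock table (hypothetical)."""

    def __init__(self, role):
        self.role = role
        self.locks = {}    # lock_name -> owning member id


def record_lock(primary, peer, lock_name, member_id, rebuildable):
    # The PRIMARY always records the lock.
    primary.locks[lock_name] = member_id
    # Only locks that cannot be rebuilt quickly are duplexed to the PEER.
    # Which locks fall in that category is an assumption of this sketch.
    if not rebuildable:
        peer.locks[lock_name] = member_id


def fail_over(peer, member_lock_lists):
    # The PEER takes over and rebuilds the non-duplexed locks, here from
    # the members' local lock lists (an assumed mechanism).
    peer.role = "PRIMARY"
    for member_id, lock_names in member_lock_lists.items():
        for name in lock_names:
            peer.locks.setdefault(name, member_id)
    return peer
```

The trade-off shown here is that the PEER carries less state during normal operation, at the cost of a short rebuild step when it takes over.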

DB2 Cluster Services

This component of DB2 pureScale coordinates and orchestrates the recovery processes in the event of planned or unplanned downtime. Tivoli System Automation for Multiplatforms (TSAMP), which is included in DB2 pureScale, is used to help detect issues in the system and recover from failures automatically. This subsystem includes Reliable Scalable Cluster Technology (RSCT), which monitors hardware components in the system, including network adapters (both Ethernet and InfiniBand). RSCT also provides cluster management and the ability to fence disks off if there is a hardware issue with a particular host and communication with that host has been lost. TSAMP monitors the members, the CFs, and the General Parallel File System (GPFS) file systems used by the database to help ensure that they are available at all times.
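The detect, fence, and recover flow can be pictured with a small sketch. The heartbeat timeout used for detection, the action names, and both function names are hypothetical; the article does not describe how TSAMP and RSCT detect a lost host internally.

```python
def find_unreachable(last_heartbeat, now, timeout):
    """Return hosts whose last heartbeat is older than the timeout (assumed rule)."""
    return sorted(host for host, seen in last_heartbeat.items() if now - seen > timeout)


def plan_recovery(last_heartbeat, now, timeout=5.0):
    """Build an ordered list of (action, host) pairs for unreachable hosts."""
    actions = []
    for host in find_unreachable(last_heartbeat, now, timeout):
        # Fence first so the lost host can no longer write to shared disks,
        # then start automatic recovery of whatever ran on that host.
        actions.append(("fence_disks", host))
        actions.append(("recover", host))
    return actions
```

For example, with heartbeats last seen at 100.0 for one host and 92.0 for another, a check at time 101.0 with a 5-second timeout plans fencing and recovery only for the second host.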

Cluster InterConnect

Cluster members are connected using either InfiniBand (IB) or RDMA over Converged Ethernet (RoCE). Regardless of which connection standard is used, hosts communicate directly through the network adapter using the RDMA protocol. RDMA is accessed through the User Direct Access Programming Library (uDAPL). RDMA provides a low-latency method for a host to remotely change the state of memory pages on another host. Because this method does not require the kernel of the other host to be interrupted, it gives DB2 a fast way to move dirty pages between members and CFs.

Cluster File System

The cluster file system is based on a cache-coherent IBM file system called GPFS. GPFS allows us to write to the file system from any host and see the changes on any other host immediately. GPFS provides a distributed file system that is easily managed and allows for all the DB2 members to access the same database at the same time.

The components described above provide the core of the DB2 pureScale architecture. This comprehensive database solution offers high speed and low-latency transaction throughput while also providing recovery, scalability, and availability. All of the pureScale components we have discussed are included as part of the DB2 pureScale feature. They are installed as part of the regular installation process and configured as part of the DB2 pureScale instance creation step.

At a minimum, DB2 pureScale requires two DB2 members and two CFs running with independent hardware components (running with fewer than two of any component does not provide redundancy in the event of a hardware or software failure). You can run a CF and a member on a single host system, but a failure at the host level would affect both the member and the CF on that host.
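These sizing rules are easy to express as a quick planning check. The sketch below is only an illustration: the function name and the host-list input format are hypothetical, and it is not part of any DB2 tooling. The 128-member limit is the figure quoted earlier in this article.

```python
MAX_MEMBERS = 128  # upper bound on members quoted earlier in the article


def check_topology(members, cfs):
    """Return a list of warnings for a planned cluster layout.

    members and cfs are lists of host names, one entry per member or CF.
    """
    warnings = []
    if len(members) < 2:
        warnings.append("fewer than two members: no member redundancy")
    if len(members) > MAX_MEMBERS:
        warnings.append("more than 128 members")
    if len(cfs) < 2:
        warnings.append("fewer than two CFs: no CF redundancy")
    # Co-location is allowed, but one host failure then takes out both.
    for host in sorted(set(members) & set(cfs)):
        warnings.append(f"{host} runs both a member and a CF: one host failure affects both")
    return warnings
```

A layout with members on two hosts and CFs on two other hosts returns no warnings, while a single host that runs one member and one CF triggers all three redundancy warnings.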

Part 2 of this article covers application configuration.

For more information

To learn more about DB2 pureScale features, visit this site.

DB2 pureScale is available as part of several DB2 product editions: