Blogs

Creating and Using Partitioned Tables

Improve query performance and achieve greater scalability

DB2 for LUW Offering Manager, IBM

Table partitioning is a data organization scheme in which table data is divided across multiple storage objects called data partitions, based on values in one or more columns. A partitioned table can contain significantly more data than an ordinary table; yet, by taking advantage of a process known as partition elimination, queries run against partitioned tables typically execute faster and require less disk I/O than when they are run against nonpartitioned tables. (The DB2 Optimizer is data-partition aware, and only relevant data partitions are scanned during query execution.)

In this column, I’ll show you how to create partitioned tables, and I’ll show you how a partitioned table’s definition determines where individual records are stored.

Range-partitioned tables

A data partition, also referred to as a range (currently DB2 supports only a range partitioning scheme), contains a subset of rows that are stored separately from other sets of rows in the table. Different data partitions can reside in different table spaces or in the same table space. The specifications provided in the PARTITION BY clause of the CREATE TABLE statement define how data in a table is partitioned. The syntax for this optional clause varies depending upon whether you want DB2 to generate each partition range evenly within a specified table data range or you want to manually control the boundary for each range. The syntax for using automatic partition generation for a given table data range is:

 

PARTITION BY <RANGE>
([ColumnName] <NULLS LAST | NULLS FIRST> ,...)
(
STARTING <FROM> [Start | MINVALUE | MAXVALUE] |
STARTING <FROM> ([Start | MINVALUE | MAXVALUE] ,...)
<INCLUSIVE | EXCLUSIVE>
ENDING <AT> [End | MINVALUE | MAXVALUE] |
ENDING <AT> ([End | MINVALUE | MAXVALUE] ,...)
<INCLUSIVE | EXCLUSIVE>
EVERY <(>[Constant] <DurationLabel> <)>
,...)

The syntax for manually specifying partitions is:

PARTITION BY <RANGE>
([ColumnName] <NULLS LAST | NULLS FIRST> ,...)
(
<PARTITION [PartitionName]>
STARTING <FROM> [Start | MINVALUE | MAXVALUE] |
STARTING <FROM> ([Start | MINVALUE | MAXVALUE] ,...)
<INCLUSIVE | EXCLUSIVE>
ENDING <AT> [End | MINVALUE | MAXVALUE] |
ENDING <AT> ([End | MINVALUE | MAXVALUE] ,...)
<INCLUSIVE | EXCLUSIVE>
<IN [TSName]>
<INDEX IN [IndexTSName]>
<LONG IN [LongTSName]>
,... )

where:

  • ColumnName: Identifies one or more columns (up to 16) by name, whose values are used to determine which data partition a particular row is to be stored in. (The group of columns specified make up the partitioning key for the table—see sidebar, “Choosing a table partitioning key.”) No column with a data type that is a LONG VARCHAR, LONG VARGRAPHIC, BLOB, CLOB, DBCLOB, XML, a distinct type based on any of these data types, or a structured data type can be used as part of a data partitioning key.
  • PartitionName: Identifies the unique name to be assigned to the data partition to be created.
  • Start: Specifies the low end of the range for each data partition.
  • End: Specifies the high end of the range for each data partition.
  • Constant: Specifies the width of each data-partition range when the automatically generated form of the syntax is used. Data partitions will be created starting at the STARTING FROM value and will contain this number of values in the range. This form of the syntax is supported only if the partitioning key is made up of a single column that has been assigned a numeric, date, time, or time stamp data type.
  • DurationLabel: Identifies the duration that is associated with the Constant value specified if the partitioning key column has been assigned a date, time, or time stamp data type. The following values are valid for this parameter: YEAR, YEARS, MONTH, MONTHS, DAY, DAYS, HOUR, HOURS, MINUTE, MINUTES, SECOND, SECONDS, MICROSECOND, and MICROSECONDS.
  • TSName: Identifies the table space in which each data partition is to be stored.
  • IndexTSName: Identifies the table space in which the partitioned index for each data partition is to be stored.
  • LongTSName: Identifies the table space in which the values of any long columns are to be stored.

Note: Parameters shown in angle brackets (< >) are optional; parameters or options shown in normal brackets ([ ]) are required and must be provided; a comma followed by ellipses (...) indicates that the preceding parameter can be repeated multiple times.

Thus, if you wanted to create a table named SALES and partition it such that each quarter’s data is stored in a different partition and each partition resides in a different table space, you could do so by executing a CREATE TABLE statement like this:

CREATE TABLE sales
(sales_date DATE,
sales_amt NUMERIC(5,2))
IN tbsp0, tbsp1, tbsp2, tbsp3
PARTITION BY RANGE (sales_date)
(STARTING ‘1/1/2010’ ENDING ‘12/31/2010’
EVERY 3 MONTHS)

Choosing an effective table partitioning key column is essential to taking advantage of the benefits of table partitioning. The most effective table partitioning key column(s) are columns that provide advantages in partition elimination. For example, if you normally query a table for records on the basis of date, partition the table on a date or time column.

Similarly, if you want to remove or archive portions of a table’s data over time, partition the table according to the way you plan to archive records. For example, if you want to archive any data that is more than three years old, partition the table by week, month, or quarter so that you can roll out an old partition at the end of each week, month, or quarter, respectively. Figure 1 illustrates how the resulting table would look.

Figure 1: Data stored in a simple partitioned table.

On the other hand, if you wanted to create a table named INVENTORY and partition it such that rows with item numbers that fall in the range of 1 to 100 are stored in one partition that resides in one table space, rows with numerical values that fall in the range of 101 to 200 are stored in another partition that resides in another table space, and so on, you could do so by executing an SQL statement that looks like this:

CREATE TABLE inventory
(item_no INT,
desc VARCHAR(20))
PARTITION BY (item_no NULLS FIRST)
(PARTITION PRODUCE STARTING MINVALUE ENDING 100 IN tbsp0,
PARTITION DAIRY STARTING 101 ENDING 200 IN tbsp1,
PARTITION BAKERY STARTING 201 ENDING 300 IN tbsp2,
PARTITION MEAT STARTING 301 ENDING 400 IN tbsp3)

In this case, rows with an ITEM_NO value from 1 to 100 will be stored in a partition named PRODUCE (whose data is written to table space TBSP0), rows with an ITEM_NO value from 101 to 200 will be stored in a partition named DAIRY (whose data is written to table space TBSP1), and so on; rows with a NULL value for ITEM_NO will be stored in the PRODUCE partition.

It is important to note that when the NULLS FIRST option is specified, the first partition must start at MINVALUE. (Likewise, if the NULLS LAST option is used, the last partition must end with MAXVALUE.) Otherwise, an attempt to insert a record with a NULL value for the partitioning key will result in a “data out of bounds” error. Additionally, each partition can have its data, index, and long data in separate table spaces. If you don’t specify an index or long table space, partitioned indexes and long column data will go in the same table space as the data.

By default, range boundaries are inclusive. To prevent specific records from being stored in a particular partition, you can create ranges using the EXCLUSIVE option. For example:

CREATE TABLE sales
(sales_date DATE,
sales_amt NUMERIC(5,2))
IN tbsp0, tbsp1, tbsp2, tbsp3
PARTITION BY RANGE (sales_date)
(STARTING ‘1/1/2010’ ENDING ‘3/31/2010’ EXCLUSIVE,
STARTING ‘3/31/2010’ ENDING ‘6/30/2010’ EXCLUSIVE,
STARTING ‘6/30/2010’ ENDING ‘9/30/2010’ EXCLUSIVE,
STARTING ‘9/30/2010’ ENDING ‘12/31/2010’)

In this example, a record with a sales date of 3/31/2010 won’t be stored in table space TBSP0. Instead, it will be stored in table space TBSP1. When you insert a row into a partitioned table, it is automatically placed into the appropriate data partition based on its key value and the range it falls within. If the key value falls outside of all defined ranges for the table, the insert will fail and an error will be generated.

Easy roll-in and roll-out of data

Another advantage of using partitioned tables is that new data can be easily added to a table (as another data partition) while old or obsolete data can be easily removed and archived. In my next column, I’ll walk you through the process of adding (rolling in) and removing (rolling out) data partitions. I’ll also show you how enhancements made in DB2 9.7 make the process of rolling data partitions in and out much faster and less disruptive.

Resources