Explicit Partition Selection in MySQL 5.6

In case you missed it, MySQL 5.6 added support for explicit partition selection. Since its release in MySQL 5.1, I have found partitioning an incredibly useful feature for aging out time-series data, and I plan to put this new syntax to work.

Today I wanted to show two hypothetical examples of how it can be used.

Consider it an optimizer hint

MySQL is usually able to optimize a query to search only the partitions which will be required. For example here we can see there are 4 partitions but 2 are searched:

CREATE TABLE t1 (
 id INT NOT NULL PRIMARY KEY auto_increment,
 cola char(255)
)
PARTITION BY RANGE(id) (
 PARTITION p0 VALUES LESS THAN (64),
 PARTITION p1 VALUES LESS THAN (128),
 PARTITION p2 VALUES LESS THAN (192),
 PARTITION p3 VALUES LESS THAN MAXVALUE
);

EXPLAIN PARTITIONS SELECT * FROM t1 WHERE id BETWEEN 120 AND 140\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: p1,p2
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 2
        Extra: Using where
1 row in set (0.00 sec)

This feature is called partition pruning.

However, similar to optimizer hints there may be cases where you are in a better position to decide which partitions should be accessed. A (poor) example to demonstrate this, is to alter the previous example and pass the value 120 through a non-deterministic function:

CREATE FUNCTION myint (i INT)
RETURNS INT NOT DETERMINISTIC
RETURN i;

mysql> EXPLAIN PARTITIONS SELECT * FROM t1 
WHERE id BETWEEN myint(120) AND 140\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: p0,p1,p2,p3 <-- requires all partitions
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 4
        Extra: Using where
1 row in set (0.00 sec)

mysql> EXPLAIN PARTITIONS SELECT * FROM t1 PARTITION (p1, p2) 
WHERE id BETWEEN myint(120) AND 140\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: p1,p2 <-- searches two partitions mentioned
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2
        Extra: Using where
1 row in set (0.00 sec)

Workaround partition-columns required in primary key

I am not sure I recommend this suggestion, but for some applications retrofitting in partitioning is difficult because it may require modification of the primary key. Here is an example:

mysql> CREATE TABLE invoices (
->      id INT NOT NULL PRIMARY KEY auto_increment,
->      created_at datetime NOT NULL,
->      customer_id INT NOT NULL,
->      total_amount DECIMAL(10,2)
->     )
->      PARTITION BY RANGE (YEAR(created_at)) (
->       PARTITION p2009 VALUES LESS THAN (2010),
->       PARTITION p2010 VALUES LESS THAN (2011),
->       PARTITION p2011 VALUES LESS THAN (2012),
->       PARTITION p2012 VALUES LESS THAN (2013),
->       PARTITION pCurrent VALUES LESS THAN MAXVALUE
->     );
ERROR 1503 (HY000): A PRIMARY KEY must include all 
columns in the table's partitioning function

It is possible to use the new explicit selection as a way of bypassing this requirement by partitioning by range on the primary key, and then naming the partitions as something useful. For example:

CREATE TABLE invoices (
 id INT NOT NULL PRIMARY KEY auto_increment,
 created_at datetime NOT NULL,
 customer_id INT NOT NULL,
 total_amount DECIMAL(10,2)
)
 PARTITION BY RANGE (id) ( 
  PARTITION p2009 VALUES LESS THAN (324334),
  PARTITION p2010 VALUES LESS THAN (673898),
  PARTITION p2011 VALUES LESS THAN (1203248),
  PARTITION p2012 VALUES LESS THAN (1703940),
  PARTITION pCurrent VALUES LESS THAN MAXVALUE
);

Application queries can then be written as:

SELECT * FROM invoices PARTITION(p2009) WHERE customer_id=2134;

Why I said I don’t recommend it, is that if a created_at date were to be modified, the row could suddenly find itself in the wrong partition. With the typical usage (partition by range on created_at) the row will be moved to another partition automatically upon modification.

How do you use partitioning?
Do you find explicit partition selection useful?
Leave a comment!