Optimizer Trace and EXPLAIN FORMAT=JSON in 5.7

I accidentally stumbled upon this Stack Overflow question this morning:

I am wondering if there is any difference in regards to performance between the following:

SELECT ... FROM ... WHERE someFIELD IN(1,2,3,4);
SELECT ... FROM ... WHERE someFIELD between  0 AND 5;
SELECT ... FROM ... WHERE someFIELD = 1 OR someFIELD = 2 OR someFIELD = 3 ...;

It is an interesting question because there was no good way to answer it when it was asked in 2009. All of the queries resolve to the same output in EXPLAIN. Here is an example using the sakila schema:

mysql> EXPLAIN SELECT * FROM film WHERE film_id BETWEEN 1 AND 5\G
mysql> EXPLAIN SELECT * FROM film WHERE film_id IN (1,2,3,4,5)\G
mysql> EXPLAIN SELECT * FROM film WHERE film_id =1 or film_id=2 or film_id=3 or film_id=4 or film_id=5\G
********* 1. row *********
           id: 1
  select_type: SIMPLE
        table: film
   partitions: NULL
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 2
          ref: NULL
         rows: 5
     filtered: 100.00
        Extra: Using where

Times have changed though. There are now a couple of useful features to show the difference :)

Optimizer Trace

Optimizer trace is a new diagnostic tool introduced in MySQL 5.6 to show how the optimizer is working internally. It is similar to EXPLAIN, with a few notable differences:

  • It doesn't just show the intended execution plan, it shows the alternative choices.
  • You enable the optimizer trace, then you run the actual query.
  • It is far more verbose in its output.

Here are the outputs for the three versions of the query:

  1. SELECT * FROM film WHERE film_id BETWEEN 1 AND 5
  2. SELECT * FROM film WHERE film_id IN (1,2,3,4,5)
  3. SELECT * FROM film WHERE film_id =1 or film_id=2 or film_id=3 or film_id=4 or film_id=5

What is the difference?

The optimizer trace output shows that the first query executes as one range, while the second and third execute as 5 separate single-value ranges:

                  "chosen_range_access_summary": {
                    "range_access_plan": {
                      "type": "range_scan",
                      "index": "PRIMARY",
                      "rows": 5,
                      "ranges": [
                        "1 <= film_id <= 1",
                        "2 <= film_id <= 2",
                        "3 <= film_id <= 3",
                        "4 <= film_id <= 4",
                        "5 <= film_id <= 5"
                      ]
                    },
                    "rows_for_plan": 5,
                    "cost_for_plan": 6.0168,
                    "chosen": true
                  }

This can also be confirmed with the handler counts from SHOW STATUS:

BETWEEN 1 AND 5: 
 Handler_read_key: 1
 Handler_read_next: 5
IN (1,2,3,4,5):
 Handler_read_key: 5
film_id =1 or film_id=2 or film_id=3 or film_id=4 or film_id=5:
 Handler_read_key: 5

So I would say that BETWEEN 1 AND 5 is the cheapest query, because it finds one key and then says next, next, next until finished. The optimizer seems to agree with me. A single range access plus next five times costs 2.0168 instead of 6.0168:

                  "chosen_range_access_summary": {
                    "range_access_plan": {
                      "type": "range_scan",
                      "index": "PRIMARY",
                      "rows": 5,
                      "ranges": [
                        "1 <= film_id <= 5"
                      ]
                    },
                    "rows_for_plan": 5,
                    "cost_for_plan": 2.0168,
                    "chosen": true
                  }
                }

For context, a cost unit is a logical representation of approximately one random IO. It is stable to compare costs between different execution plans.

Ranges are not all equal

Perhaps a better example to demonstrate this, is the difference between these two ranges:

  • SELECT * FROM film WHERE film_id BETWEEN 1 and 20
  • SELECT * FROM film WHERE (film_id BETWEEN 1 and 10) or (film_id BETWEEN 911 and 920)

It's pretty obvious that the second one needs to execute in two separate ranges. EXPLAIN will not show this difference, and both queries appear the same:

********* 1. row *********
           id: 1
  select_type: SIMPLE
        table: film
   partitions: NULL
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 2
          ref: NULL
         rows: 20
     filtered: 100.00
        Extra: Using where

Two distinct separate ranges may be two separate pages, and thus have different cache efficiency on the buffer pool. It should be possible to distinguish between the two.

EXPLAIN FORMAT=JSON

EXPLAIN FORMAT=JSON was introduced in MySQL 5.6 along with OPTIMIZER TRACE, but where it really becomes useful is MySQL 5.7. The JSON output will now include cost information (as well as showing separate ranges as attached_condition):

********* 1. row *********
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "10.04"
    },
    "table": {
      "table_name": "film",
      "access_type": "range",
      "possible_keys": [
        "PRIMARY"
      ],
      "key": "PRIMARY",
      "used_key_parts": [
        "film_id"
      ],
      "key_length": "2",
      "rows_examined_per_scan": 20,
      "rows_produced_per_join": 20,
      "filtered": "100.00",
      "cost_info": {
        "read_cost": "6.04",
        "eval_cost": "4.00",
        "prefix_cost": "10.04",
        "data_read_per_join": "15K"
      },
      "used_columns": [
        "film_id",
        "title",
        "description",
        "release_year",
        "language_id",
        "original_language_id",
        "rental_duration",
        "rental_rate",
        "length",
        "replacement_cost",
        "rating",
        "special_features",
        "last_update"
      ],
      "attached_condition": "((`film`.`film_id` between 1 and 10) or (`film`.`film_id` between 911 and 920))"
    }
  }
}

With the FORMAT=JSON output also showing cost, we can see that two ranges costs 10.04, versus one big range costing 9.04 (not shown). These queries are not identical in cost even though they are in EXPLAIN output.

Conclusion

I have heard many users say "joins are slow", but a broad statement like this misses magnitude. By including the cost information in EXPLAIN we get all users to speak the same language. We can now say "this join is expensive", which is a much better distinction :)

It is time to start using OPTIMIZER TRACE, and particularly in 5.7 ditch EXPLAIN for EXPLAIN FORMAT=JSON.

An easy way to describe MySQL's Binary Log Group Commit

It struck me today; there is an easy way to describe MySQL's Binary Log group commit improvements from MySQL 5.0-5.7 by using the example of a single ferry trying to ship passengers from point A to point B:

Ferry from Point A to point B

MySQL 5.0 Behaviour

In MySQL 5.0, the ferry will pick up the next passenger in line from point A, and transfer them to point B. The trip between A and B takes about 10 minutes return trip, so it's possible that several new passengers will arrive while the ferry is in transit. That doesn't matter; when the ferry arrives back at point A, it will only pick up the very next passenger in line.

MySQL 5.6 Behaviour

In MySQL 5.6, the ferry will pick up all passengers from the line at point A, and then transfer them to point B. Each time it returns to point A to pick up new passengers, it will collect everyone who is waiting and transfer them across to point B.

This is measurably better performance in real-life situations where many passengers tend to arrive while waiting for the ferry to arrive back at point A, and the trip between A and B tends to take some time. It is not so measurable in naive benchmarks that run in a single-thread.

There is no configuration necessary to enable group commit in 5.6. It works by default.

MySQL 5.7 Behaviour

MySQL 5.7 behaves similarly to 5.6 in that it will pick up all waiting passengers from point A and transfer them to point B, but with one notable enhancement!

When the ferry arrives back at point A to pick up waiting passengers, it can be configured to wait just a little bit longer with the knowledge that new passengers will likely arrive. For example: if you know the trip between point A and point B is 10 minutes in duration, why not wait an extra 30 seconds at point A before departing? This may save you on roundtrips and improve the overall number of passengers that can be transported.

The configuration variables for artificial delay are binlog-group-commit-sync-delay (delay in microseconds) and binlog-group-commit-sync-no-delay-count (number of transactions to wait for before deciding to abort waiting).

Conclusion

In this example passengers are obviously transactions, and the ferry is an expensive fsync operation. It's important to note that there is just one ferry in operation (a single set of ordered binary logs), so being able to tune this in 5.7 provides a nice level of advanced configuration.

Faking a slave: Subscribing to mysql row-based-replication changes

In complex systems it's often useful to be able to receive notification when rows have been modified in MySQL so that you can invalidate various external caches or indexes. For example: memcached, Sphinx, Lucene.

In the case of MySQL's default statement-based replication this can be quite tricky to do, as it would likely require an SQL parser to determine what was intended to be modified. However, this is made much simpler with Row-based Replication (something I recommend switching to).

A C++ library exists to be able to listen to MySQL replication, but what I want to demonstrate is that it is also very simple to be able to do this with mysqlbinlog:


shell> mysqlbinlog --read-from-remote-server --stop-never --host localhost.localdomain --port 5616 -u msandbox -pmsandbox --verbose mysql_sandbox5616-bin.000004 | grep '^### '

To explain how this command works:

  • --read-from-remote-server tells mysqlbinlog to act like a slave, and fetch remote files rather than local (new option to 5.6).
  • --stop-never makes mysqlbinlog block waiting for a continual stream of updates, rather than exiting. Perfect!
  • --verbose rewrites row-based replication events to be pseudo SQL statements. The pseudo statements are very easy to parse. For example:
    ### DELETE FROM `test2`.`a`
    ### WHERE
    ###   @1=1
    ### INSERT INTO `test2`.`a`
    ### SET
    ###   @1=1
    ### UPDATE `test2`.`a`
    ### WHERE
    ###   @1=10
    ### SET
    ###   @1=20
    

    Noting that a multi-row statement will appear as individual statements via --verbose. Row-based events also default to sending all columns in the row, not just those that are changed.

  • The grep '^### ' statement is just a lazy way of stripping out everything except pseudo SQL statements.

Advanced Usage

You can fairly easily extend the above to track your progress reading through the master's binary logs. This will make it easier to resume from where you left off if there is a crash. Just remove the grep, and keep track of:

  • Positional markers just before DML events. These are marked in bold here:
    # at 191
    #140526 15:28:27 server id 10  end_log_pos 239 CRC32 0x559a84a8     GTID [commit=yes]
    SET @@SESSION.GTID_NEXT= '9f0ce61c-bb92-11e3-89fd-f056da47d247:17'/*!*/;
    # at 239
    #140526 15:28:27 server id 10  end_log_pos 312 CRC32 0xff074c19     Query   thread_id=1 exec_time=0 error_code=0
    SET TIMESTAMP=1401143307/*!*/;
    BEGIN
    /*!*/;
    # at 312
    #140526 15:28:27 server id 10  end_log_pos 357 CRC32 0x74b1ad7f     Table_map: `test2`.`a` mapped to number 71
    # at 357
    #140526 15:28:27 server id 10  end_log_pos 397 CRC32 0x2c6f8b8d     Write_rows: table id 71 flags: STMT_END_F
    
    BINLOG '
    C8CDUxMKAAAALQAAAGUBAAAAAEcAAAAAAAEABXRlc3QyAAFhAAEDAAF/rbF0
    C8CDUx4KAAAAKAAAAI0BAAAAAEcAAAAAAAEAAgAB//4KAAAAjYtvLA==
    '/*!*/;
    ### INSERT INTO `test2`.`a`
    ### SET
    ###   @1=10
    
  • Events that show that the binary log file itself is being rotated. For example, the result of FLUSH LOGS is:
    # at 4
    #691231 16:00:00 server id 10  end_log_pos 0 CRC32 0x7800af55   Rotate to mysql_sandbox5616-bin.000006  pos: 4
    

Conclusion

This might not be as robust in all cases as using the C++ API, but it sure beats the alternative. Gone are my days of writing complex sets of triggers to write to an "events" table, which I poll continually from an external script.

MySQL 5.6.16 Community Release Notes

Thank you to the MySQL Community, on behalf of the MySQL team @ Oracle. Your bug reports, testcases and patches have helped create a better MySQL 5.6.16.

In particular:

  • Thanks to Honza Horak for suggesting we make tmpdir configurable at build-time, and for providing a patch. Bug #68338.
  • Thanks to Hartmut Holzgraefe for reporting a memory leak when using the InnoDB memcached API and replication. Bug #70757.
  • Thanks to Justin Swanhart for reporting that InnoDB reported an incorrect operating system error code when it failed to initialize. Bug #70867.
  • Thanks to Yoshinori Matsunobu who reported that MySQL 5.6's persistent InnoDB statistics caused stalls due to latch contention. Bug #70768.
  • Thanks to Laurynas Biveinis for discovering that InnoDB could needlessly call os_thread_get_curr_id(). Bug #70417.
  • Thanks to Nizameddin Ordulu for identifying a case where InnoDB's doublewrite buffer would not restore corrupted pages but could have. Bug #70087.
  • Thanks to Hartmut Holzgraefe for reporting that the README file for InnoDB memcached incorrectly specified the version of libevent statically linked. Bug #70034.
  • Thanks to Shahriyar Rzayev for reporting that the error message when reseting a slave with innodb_force_recovery set was cryptic. We have since improved the error message. Bug #69907.
  • Thanks to Keith Dechant for reporting that AUTO_INCREMENT values could not be reset when the INPLACE algorithm was used for ALTER TABLE. Bug #69882.
  • Thanks to Laurynas Biveinis for reporting an incorrect comment in the InnoDB source code. Bug #69847.
  • Thanks to Christian Rabe for reporting a problem when using an InnoDB tablespace stored on a raw device. Bug #69424.
  • Thanks to Justin Swanhart for reporting that an online ALTER TABLE operation on a partitioned table can consume a significant amount of memory. Bug #69325.
  • Thanks to Valeriy Kravchuk for reporting a specific workload where InnoDB did not scale well on multi-core machines. In response to Valeriy's bug report, we added a new feature to use atomic reference counting to track page use instead of mutexes. Bug #68079.
  • Thanks to Stewart Smith for reporting that table renaming operations showed up under foreign key errors in SHOW ENGINE INNODB STATUS. Stewart also generously provided a patch. Bug #61746.
  • Thanks to Laurynas Biveinis for reporting that UNIV_SYNC_DEBUG was disabled erroneously as part of an earlier bug fix. We've since reenabled it. Bug #69617.
  • Thanks to Joffrey Michaie for reporting a situation where queries on tables partitioned by hash could return wrong results. Bug #70588.
  • Thanks to zhai weixiang for reporting an issue where binlog dump information could incorrectly be written to the error log. Bug #70685.
  • Thanks to Justin Swanhart for reporting that mysqlbinlog did not properly decode DECIMAL values. Bug #65812.
  • Thanks to Santosh Praneeth Banda for reporting that the semisynchronous replication plugin was called twice for a DDL statement. Bug #70410.
  • Thanks to Yang Dingning from NCNIPC, Graduate University of Chinese Academy of Sciences for reporting a security issue. Bug #61065.
  • Thanks to Bryan Turner for reporting a situation where tables with utf8_bin collation could return results in the wrong order. Bug #69005.
  • Thanks to Honza Horak for reporting code improvement suggestions as the result of a coverity analysis. Bug #70830.
  • Thanks to Ed Reeder for reporting that the option --local-service did not work on Windows. Bug #69637.
  • Thanks to David Coyle for reporting a limitation when using views with an ORDER BY in their definition. We have since worked around this limitation. Bug #69678.
  • Thanks to Davi Arnaut for reporting an issue with the performance_schema instrumentation interface. Bug #70628.
  • Thanks to Laurynas Biveinis for pointing out a situation where make_atomic_cas_body64 might be miscompiled. Thanks also to Davi Arnaut for his comments on the bug. Bug #63451.
  • Thanks to Davi Arnaut for reporting that the mysql_plugin client inadvertently attempts to remove files. Bug #69752.
  • Thanks to Igor Babaev and xiaobin lin who independently discovered a case when COUNT(DISTINCT) could return wrong results. xiaobin lin also provided a patch, which while we did not end up using it, we valued the contribution. Bug #68749, Bug #71028.
  • Thanks to Patrick Middleton for pointing out an issue when trying to compile MySQL without partitioning. While we didn't end up using it, we thank Patrick for also providing a patch. Bug #71010.
  • Thanks to Anthony Tso for reporting a bug where wrong results could be returned. Bug #70608.
  • Thanks to Elena Stepanova for reporting a security issue. Bug #68751.
  • Thanks to Daniel van Eeden for letting us know that some scripts provided out of date information regarding where to file bug reports. Bug #68742.
  • Thanks to Don Coffin for reporting that some file removal operations were not instrumented by performance_schema. Bug #69782.
  • Thanks to Eric Bergen for reporting that mysqldump would not release metadata locks until after the dump operation had finished. Bug #71017.
  • Thanks to Michael Ezzell for reporting a security issue. Bug #68354.
  • Thanks to Giuseppe Maxia for raising awareness of the obsolete mysqlbug utility. We have since removed mention of it from mysql_install_db. Bug #29716.

Thank you again to all the community contributors listed above. In particular, the MySQL team would like to call out the names that appear more than once in this release:

Honza Horak (2), Hartmut Holzgraefe (2), Justin Swanhart (3), Davi Arnaut (2) and Laurynas Biveinis (4).

If I missed a name here, please let me know!

- Morgan

MySQL 5.6.15 Community Release Notes

Thank you to the MySQL Community, on behalf of the MySQL team @ Oracle. Your bug reports, testcases and patches have helped create a better MySQL 5.6.15.

In particular:

  • Thanks to Yoshinori Matsunobu, who reported that the performance_schema did not instrument locks related to semisync replication. Bug #70577.
  • Thanks to Sebastian Strzelczyk, who reported excessive memory usage when querying INFORMATION_SCHEMA.INNODB_FT_INDEX. Sebastian uploaded a number of reports of memory usage, which was helped us generate a good reproducible testcase. Bug #70329.
  • Thanks to Inaam Rana, who suggested improvements to InnoDB counters in information_schema, and contributed a patch! Bug #70241.
  • Thanks to Yoshinori Matsunobu, who reported inefficiency in InnoDB memcached API. Yoshinori's feedback was also part of the 1 million QPS record set for 5.7. Bug #70172.
  • Thanks to Nizameddin Ordulu, who identified a case where crash recovery might not be possible with InnoDB. Nizameddin pointed out where in the code the bug existed, and we were quickly able to identify that this was a regression from an earlier bug fix. Bug #69122.
  • Thanks to Elena Stepanova, who identified a case where SELECT COUNT queries would run very slowly when run concurrently with a "LOAD DATA" operation. Elena also provided a testcase! Bug #69141.
  • Thanks to Daniël van Eeden, who reported a confusing/incorrect error message when using the InnoDB memcached API. Bug #68684.
  • Thanks to Monty Widenius and Michal Sulik, who both reported an issue where a unique composite index would return wrong results. Bug #70038, Bug #60220.
  • Thanks to Edward Dore, who reported an issue when converting tables from MyISAM to InnoDB. It was from a table that Edward provided that we were able to reproduce the issue! Bug #50366.
  • Thanks to Elena Stepanova who reported an error when deleting from a partitioned table, and provided a testcase! Bug #67982.
  • Thanks to Santosh Praneeth Banda for reporting a security issue. Bug #70349.
  • Thanks to Vlad Lesin, who reported that certain LOAD DATA statements would appear in the binary log twice, and provided a testcase! Bug #70277.
  • Thanks to Yoshinori Matsunobu, who reported that transactions could be lost when starting/stopping slaves with GTIDs. Yoshinori also provided a small program which we were able to use to reproduce the issue. Bug #69943.
  • Thanks to raolh rao for reporting a security issue. #70045.
  • Thanks to Miguel Angel Nieto, who reported that LAST_INSERT_ID is replicated incorrectly when replication filters are used. #69861.
  • Thanks to Santosh Praneeth Banda, who reported an issue with GTIDs and a patch with a suggested fix! Bug #69618.
  • Thanks to Maciej Dobrzanski, who reported an issue where replication would break following invalid grant statements. Bug #68892.
  • Thanks to Rich Prohaska and Przemyslaw Malkowski, who both reported a problem with BINARY columns and online DDL. Bugs #67141 and #69580.
  • Thanks to Jan Staněk, who submitted a patch with code improvements following the result of a Coverity scan. Bug #68918.
  • Thanks to Alexey Kopytov, who reported an issue in debug builds of MySQL, Bug #69653.
  • Thanks to Arthur O'Dwyer, who pointed out that COUNT DISTINCT would incorrectly count NULL rows. Bug #69841.
  • Thanks to Lance Ivy for discovering an issue where UPDATE does not update rows in the correct order. Bug #68656.
  • Thanks to Elena Stepanova for pointing out that mysql_upgrade will fail with the InnoDB storage engine disabled. Bug #70152.
  • Thanks to Arthur O'Dwyer, who reported that prefix indexes on text columns could return wrong results. Bug #70341.
  • Thanks to Yoshinori Matsunobu who pointed out that PERFORMANCE_SCHEMA overhead was very high in frequent connect/disconect scenarios. Bug #70018.
  • Thanks to Moritz Schuepp, who reported that host_cache_size set in a configuration file was ignored. Bug #70552.
  • Thanks to Saverio Miroddi, who reported an issue with InnoDB FULLTEXT and using savepoints. Bug #70333.
  • Thanks to Hartmut Holzgraefe, who reported that users with a legacy hostname may not be possible to drop. Hartmut also provided a patch! Bug #62255, #62254.
  • Thanks to Honza Horak, who reported that MySQL incorrectly assumed all ethernet interfaces start with ethX. Honza also provided a patch! Bug #63055.
  • Thanks to Honza Horak, who reported a number of code improvements (and patches) resulting from Coverity analysis. Bug #70591.
  • Thanks to Jan Staněk, who suggested code improvements (and a patch) resulting from Coverity analysis. Bug #68896.
  • Thanks to Roderik van der Veer, who reported that MySQL would not compile on Mac OS X 10.9. Thanks also to David Peixotto who provided an example patch of how to fix the problem. While we didn't end up using David's patch, we certainly value his contribution. Bug #70542.
  • Thanks to Olaf van der Spek and Simon Mudd, who both reported an issue that affected server shutdown. Bugs #18256, #69975.

Thank you again all the names listed above. In particular, I would like to call out the names that appear more than once:

Yoshinori Matsunobu (4), Elena Stepanova (3), Santosh Praneeth Banda (2), Jan Staněk (2), Arthur O’Dwyer (2), Honza Horak (2).

And the names of those who contributed patches:

Inaam Rana, Santosh Praneeth Banda, Hartmut Holzgraefe, Honza Horak, Jan Staněk

If I missed a name here, please let me know!

- Morgan

Commentary on MySQL slow query collection sources

The other day it struck me that MySQL applications have no fewer than four sources to be able to collect potentially slow queries for analysis, and that I actually find myself using 3/4 methods available.

Here are the methods listed, and my commentary on where I will use each of them:

Application Logging/Monitoring

I enjoy using Application-level tools such as New Relic or xhprof to find slow experiences rather than just slow queries.

What I mean by this, is that there is not a 1:1 between pages and queries, so sometimes just heading straight to the database can be the wrong decision. As I wrote on Monday, latency is a factor with N+1 queries, and I typically want to focus on queries in aggregate that exceed page generation goals. On the same note, I also pay less attention to queries that are part of background tasks and do not impact user experience.

I also like monitoring application tools like a hawk whenever I am running any operational changes (i.e. removing indexes that I am 99.99% confident are unused, disabling the query cache). New Relic has about a 3 minute delay over real-time, but this is usually good enough for me, since it increases my confidence on top of whatever DB dashboards I am already looking at.

Performance Schema

In MySQL 5.6, the performance_schema can now instrument statements. In combination with ps_helper I find this to be easiest way to identify what has been running on a particular database server.

What I love about this, is that it's enabled by default and can stay enabled full time with a zero second slow query threshold.

Maybe it's just anecdotal, but I also find with performance_schema that I can react very fast, and won't miss anything. There used to be this situation where I would need to stalk the PROCESSLIST or make sure I lower the long_query_time in time to see a particular problem.

(I should also note, that the MySQL Enterprise Monitor 3.0 uses the PERFORMANCE_SCHEMA to show statement performance. The Enterprise Tools team is one of the driving forces behind PERFORMANCE_SCHEMA - it's great to see that the interfaces they use are available to all.)

Slow query log file

I use this for more of a long play analysis, by setting the long_query_time=0, then letting it run for between 20 minutes and a few hours. When I have enough data in the log file, I will restore the long_query_time to a high value and then copy my slow log file across the network to another server for processing.

The best tool to process this file is pt-query-digest - I routinely give it a 20GB+ file, and I love the output format it provides, since it has a histogram with useful statistics.

Even with the introduction of performance_schema, there will probably always be a use for the slow query log in some form, but in 5.6 I find I am using this method less for 'routine' investigations. Here is why:

  • I counted it out - what I do is an eight step process.
    1. Rotate the slow query log file (to avoid historical data skew)
    2. lower the long_query_time
    3. bounce applications that use connection pooling
    4. record for 20 minutes (watching dashboards for impact)
    5. increase the long_query_time
    6. rotate the log file again
    7. copy file to a different host
    8. run pt-query-digest
  • I am worried that in one of those eight steps, I will screw up and leave the long_query_time at zero seconds. I'll find out when I have a midnight page saying I'm critically low on disk space.

What I think would make me use this method more, is if I were able to specify a maximum size per slow log file (i.e. 2GB) and then a maximum number of log files to retain (i.e. 10) and have MySQL manage rotation and deletion of old files for me.

Slow query log table

I don't use this method. Since the queries in the slow log table do not include the digest, it's not possible to aggregate them in any meaningful way. What you usually have to do, is write a script to extract the slow query log table into the slow query log file format(!), then run pt-query-digest.

This feature may be helpful for cloud providers who do not want to provide local filesystem access to customers. However, at least in the case of Amazon RDS there is a way that you can access the slow query log files.


Update: A commenter on Google+ let me know that I forgot to mention network traffic inspection as an collection source. An oversight on my part. The best use case I have for this method is to workaround step (3) I mentioned in slow query log analysis - you don't need to bounce connections to start collecting queries.

Upgrading from the earlier MySQL RPM Format to Yum Repos

The packages that are available in the yum repos contain a number of enhancements over the RPM packages that are available from dev.mysql.com.

Norvald blogged on some of these enhancements earlier. Today I wanted to walk through a safe upgrade path, as they are not quite compatible with each-other.

My Existing Installation

To start with, the packages I have installed came from "Red Hat Enterprise Linux 6 / Oracle Linux 6 (x86, 64-bit), RPM Bundle" on dev.mysql.com. You can check which packages you have installed with:

[root@localhost ~]# rpm -qa | grep -i mysql
MySQL-client-5.6.14-1.el6.x86_64
MySQL-embedded-5.6.14-1.el6.x86_64
MySQL-server-5.6.14-1.el6.x86_64
MySQL-shared-5.6.14-1.el6.x86_64
MySQL-devel-5.6.14-1.el6.x86_64
MySQL-test-5.6.14-1.el6.x86_64
MySQL-shared-compat-5.6.14-1.el6.x86_64

Uninstalling and Installing Yum repos

I recommend first running yum update before removing, and then installing the new repo:

yum update
yum localinstall http://repo.mysql.com/mysql-community-release-el6-3.noarch.rpm

After this step, stop MySQL (note the missing d in the dev.mysql.com packages):

service mysql stop

Now with yum shell it's possible to uninstall the existing packages (listed in 'my existing installation' above) and install the replacement packages from the yum repo in one step:

yum shell
 > remove MySQL-shared MySQL-devel MySQL-test MySQL-server MySQL-client MySQL-shared-compat MySQL-embedded 
 > install mysql-server mysql-libs mysql-libs-compat
 > run
 > quit

Here was the summary output from my yum session:

=================================================================================================================================================================
 Package                                     Arch                   Version                      Repository                                                 Size
=================================================================================================================================================================
Installing:
 mysql-community-libs                        x86_64                 5.6.14-3.el6                 mysql-community                                           1.8 M
 mysql-community-libs-compat                 x86_64                 5.6.14-3.el6                 mysql-community                                           1.6 M
 mysql-community-server                      x86_64                 5.6.14-3.el6                 mysql-community                                            51 M
Removing:
 MySQL-client                                x86_64                 5.6.14-1.el6                 @/MySQL-client-5.6.14-1.el6.x86_64                         81 M
 MySQL-devel                                 x86_64                 5.6.14-1.el6                 @/MySQL-devel-5.6.14-1.el6.x86_64                          19 M
 MySQL-embedded                              x86_64                 5.6.14-1.el6                 @/MySQL-embedded-5.6.14-1.el6.x86_64                      432 M
 MySQL-server                                x86_64                 5.6.14-1.el6                 @/MySQL-server-5.6.14-1.el6.x86_64                        235 M
 MySQL-shared                                x86_64                 5.6.14-1.el6                 @/MySQL-shared-5.6.14-1.el6.x86_64                        8.4 M
 MySQL-shared-compat                         x86_64                 5.6.14-1.el6                 @/MySQL-shared-compat-5.6.14-1.el6.x86_64                  11 M
 MySQL-test                                  x86_64                 5.6.14-1.el6                 @/MySQL-test-5.6.14-1.el6.x86_64                          318 M
Installing for dependencies:
 mysql-community-client                      x86_64                 5.6.14-3.el6                 mysql-community                                            18 M
 mysql-community-common                      x86_64                 5.6.14-3.el6                 mysql-community                                           296 k

Transaction Summary
=================================================================================================================================================================
Install       5 Package(s)
Remove        7 Package(s)

MySQL should now be installed from the yum packages. You just have two more steps to complete - start it, and configure it on boot:

service mysqld start # note the added 'd'
chkconfig mysqld on

Your original /etc/my.cnf is even saved. You can compare it to /etc/my.cnf.rpmnew if you would like to consider switching to the new one.

Still having problems? I recommend heading to the MySQL Forums. There is a section dedicated to Install & Repo help.

Explicit Partition Selection in MySQL 5.6

In case you missed it, MySQL 5.6 added support for explicit partition selection. Since its release in MySQL 5.1, I have found partitioning an incredibly useful feature for aging out time-series data, and I plan to put this new syntax to work.

Today I wanted to show two hypothetical examples of how it can be used.

Consider it an optimizer hint

MySQL is usually able to optimize a query to search only the partitions which will be required. For example here we can see there are 4 partitions but 2 are searched:

CREATE TABLE t1 (
 id INT NOT NULL PRIMARY KEY auto_increment,
 cola char(255)
)
PARTITION BY RANGE(id) (
 PARTITION p0 VALUES LESS THAN (64),
 PARTITION p1 VALUES LESS THAN (128),
 PARTITION p2 VALUES LESS THAN (192),
 PARTITION p3 VALUES LESS THAN MAXVALUE
);

EXPLAIN PARTITIONS SELECT * FROM t1 WHERE id BETWEEN 120 AND 140\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: p1,p2
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 2
        Extra: Using where
1 row in set (0.00 sec)

This feature is called partition pruning.

However, similar to optimizer hints there may be cases where you are in a better position to decide which partitions should be accessed. A (poor) example to demonstrate this, is to alter the previous example and pass the value 120 through a non-deterministic function:

CREATE FUNCTION myint (i INT)
RETURNS INT NOT DETERMINISTIC
RETURN i;

mysql> EXPLAIN PARTITIONS SELECT * FROM t1 
WHERE id BETWEEN myint(120) AND 140\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: p0,p1,p2,p3 <-- requires all partitions
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 4
        Extra: Using where
1 row in set (0.00 sec)

mysql> EXPLAIN PARTITIONS SELECT * FROM t1 PARTITION (p1, p2) 
WHERE id BETWEEN myint(120) AND 140\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: p1,p2 <-- searches two partitions mentioned
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2
        Extra: Using where
1 row in set (0.00 sec)

Workaround partition-columns required in primary key

I am not sure I recommend this suggestion, but for some applications retrofitting in partitioning is difficult because it may require modification of the primary key. Here is an example:

mysql> CREATE TABLE invoices (
->      id INT NOT NULL PRIMARY KEY auto_increment,
->      created_at datetime NOT NULL,
->      customer_id INT NOT NULL,
->      total_amount DECIMAL(10,2)
->     )
->      PARTITION BY RANGE (YEAR(created_at)) (
->       PARTITION p2009 VALUES LESS THAN (2010),
->       PARTITION p2010 VALUES LESS THAN (2011),
->       PARTITION p2011 VALUES LESS THAN (2012),
->       PARTITION p2012 VALUES LESS THAN (2013),
->       PARTITION pCurrent VALUES LESS THAN MAXVALUE
->     );
ERROR 1503 (HY000): A PRIMARY KEY must include all 
columns in the table's partitioning function

It is possible to use the new explicit selection as a way of bypassing this requirement by partitioning by range on the primary key, and then naming the partitions as something useful. For example:

CREATE TABLE invoices (
 id INT NOT NULL PRIMARY KEY auto_increment,
 created_at datetime NOT NULL,
 customer_id INT NOT NULL,
 total_amount DECIMAL(10,2)
)
 PARTITION BY RANGE (id) ( 
  PARTITION p2009 VALUES LESS THAN (324334),
  PARTITION p2010 VALUES LESS THAN (673898),
  PARTITION p2011 VALUES LESS THAN (1203248),
  PARTITION p2012 VALUES LESS THAN (1703940),
  PARTITION pCurrent VALUES LESS THAN MAXVALUE
);

Application queries can then be written as:

SELECT * FROM invoices PARTITION(p2009) WHERE customer_id=2134;

Why I said I don't recommend it, is that if a created_at date were to be modified, the row could suddenly find itself in the wrong partition. With the typical usage (partition by range on created_at) the row will be moved to another partition automatically upon modification.

How do you use partitioning?
Do you find explicit partition selection useful?
Leave a comment!

A closer look at Online DDL in MySQL 5.6

MySQL 5.6 introduced online DDL for InnoDB tables, which means that a number of ALTER TABLE statements no longer block access to data modification while running.

I've written about this feature before: it was enough to make one of my freelancing customers fall in love with MySQL again. Today I wanted to go through what's covered and what's not.

The Manual

The MySQL manual actually does all the work for me, since there is a table under 14.2.12.1. Overview of Online DDL.

The particular column you want to pay attention to is "Allows Concurrent DML?", which means that means that neither reads or writes are blocked.

Operation In-Place? Copies Table? Allows Concurrent DML? Allows Concurrent Query? Notes
CREATE INDEX,
ADD
INDEX
Yes* No* Yes Yes Some restrictions for FULLTEXT index; see next row.
Currently, the operation is not in-place (that is, it
copies the table) if the same index being created was also
dropped by an earlier clause in the same
ALTER TABLE statement.
ADD FULLTEXT
INDEX
Yes No* No Yes Creating the first FULLTEXT index for a table
involves a table copy, unless there is a user-supplied
FTS_DOC_ID column. Subsequent
FULLTEXT indexes on the same table can
be created in-place.
DROP INDEX Yes No Yes Yes Modifies .frm file only, not the data file.
OPTIMIZE TABLE Yes Yes Yes Yes Uses ALGORITHM=INPLACE as of MySQL 5.6.17.
ALGORITHM=COPY is used if
old_alter_table=1 or
mysqld --skip-new
option is enabled. OPTIMIZE TABLE using
online DDL (ALGORITHM=INPLACE) is not
supported for tables with FULLTEXT indexes.
Set default value for a column Yes No Yes Yes Modifies .frm file only, not the data file.
Change auto-increment value
for a column
Yes No Yes Yes Modifies a value stored in memory, not the data file.
Add a foreign key
constraint
Yes* No* Yes Yes To avoid copying the table, disable
foreign_key_checks during
constraint creation.
Drop a foreign key
constraint
Yes No Yes Yes The foreign_key_checks option can be
enabled or disabled.
Rename a column Yes* No* Yes* Yes To allow concurrent DML, keep the same data type and only change the
column name.
Add a column Yes Yes Yes* Yes Concurrent DML is not allowed when adding an
auto-increment
column. Although ALGORITHM=INPLACE is
allowed, the data is reorganized substantially, so it is
still an expensive operation.
Drop a column Yes Yes Yes Yes Although ALGORITHM=INPLACE is allowed, the data is
reorganized substantially, so it is still an expensive
operation.
Reorder columns Yes Yes Yes Yes Although ALGORITHM=INPLACE is allowed, the data is
reorganized substantially, so it is still an expensive
operation.
Change ROW_FORMAT property Yes Yes Yes Yes Although ALGORITHM=INPLACE is allowed, the data is
reorganized substantially, so it is still an expensive
operation.
Change KEY_BLOCK_SIZE property Yes Yes Yes Yes Although ALGORITHM=INPLACE is allowed, the data is
reorganized substantially, so it is still an expensive
operation.
Make column NULL Yes Yes Yes Yes Although ALGORITHM=INPLACE is allowed, the data is
reorganized substantially, so it is still an expensive
operation.
Make column NOT NULL Yes* Yes Yes Yes When SQL_MODE includes
strict_all_tables or
strict_all_tables, the operation fails
if the column contains any nulls. Although
ALGORITHM=INPLACE is allowed, the data
is reorganized substantially, so it is still an expensive
operation.
Change data type of column No Yes No Yes
Add primary key Yes* Yes Yes Yes Although ALGORITHM=INPLACE is allowed, the data is
reorganized substantially, so it is still an expensive
operation. ALGORITHM=INPLACE is not
allowed under certain conditions if columns have to be
converted to NOT NULL. See
Example 14.9, "Creating and Dropping the Primary Key".
Drop primary key and add another Yes Yes Yes Yes ALGORITHM=INPLACE is only allowed when you add a new
primary key in the same ALTER
TABLE
; the data is reorganized substantially, so
it is still an expensive operation.
Drop primary key No Yes No Yes Restrictions apply when you drop a primary key primary key without
adding a new one in the same ALTER
TABLE
statement.
Convert character set No Yes No Yes Rebuilds the table if the new character encoding is different.
Specify character set No Yes No Yes Rebuilds the table if the new character encoding is different.
Rebuild with FORCE option Yes Yes Yes Yes Uses ALGORITHM=INPLACE as of MySQL 5.6.17.
ALGORITHM=COPY is used if
old_alter_table=1 or
mysqld --skip-new
option is enabled. Table rebuild using online DDL
(ALGORITHM=INPLACE) is not supported
for tables with FULLTEXT indexes.
Rebuild with "null" ALTER TABLE ...
ENGINE=INNODB
Yes Yes Yes Yes Uses ALGORITHM=INPLACE as of MySQL 5.6.17.
ALGORITHM=COPY is used if
old_alter_table=1 or
mysqld --skip-new
option is enabled. Table rebuild using online DDL
(ALGORITHM=INPLACE) is not supported
for tables with FULLTEXT indexes.
Set table-level persistent
statistics options
(STATS_PERSISTENT,
STATS_AUTO_RECALC
STATS_SAMPLE_PAGES)
Yes No Yes Yes Modifies .frm file only, not the data file.

Update: I have updated the table to reflect changes in MySQL 5.6.17.

Find that hard to remember?

What I also like about MySQL 5.6, is that it also introduces syntax to force ALTER TABLE statements to not run if they can not be performed online. Here is an example of using LOCK=NONE (must allow concurrent DML):

mysql> CREATE TABLE a (id INT NOT NULL PRIMARY KEY, a char(100),
 b char(100));
Query OK, 0 rows affected (0.02 sec)

mysql> ALTER TABLE a ADD INDEX idx_a (a), LOCK=NONE;
Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE a DROP PRIMARY KEY,  LOCK=NONE;
ERROR 1846 (0A000): LOCK=NONE is not supported. 
 Reason: Dropping a primary key is not allowed without 
 also adding a new primary key. Try LOCK=SHARED.

I recommend using this syntax whenever writing migration scripts so you are never surprised by how much impact they will have.

What's upcoming?

In general I would comment that the number of operations which switched to being online in 5.6 is quite substancial. In MySQL 5.7 two additional improvements to online DDL are available, and are already noted in the MySQL 5.7 manual page:

  1. Extending a VARCHAR column is online.
  2. Renaming an index is online.

Have you used Online DDL in 5.6? Please leave a comment and share your experiences.

MySQL 5.6 - Now with fewer gripes

I often find myself in conversations with long time MySQL users who tell me about little annoyances in MySQL that make the product harder to use. Sometimes these annoyances can stay in a product for seemingly forever, because they require a change in behaviour (breaking backwards compatibility) or they are the result of architectural limitations not easily fixed.

In either case, I think it's important to commend the MySQL engineering team at Oracle, because as well introducing a number of big ticket features for MySQL 5.6, they actually invested a lot of time into closing out a lot of old bugs.

Here is my list of historical gripes, now resolved:

  1. innodb-file-per-table ON by default means it is easier to reclaim space caused by run-away tables.
  2. When you change innodb_log_file_size in configuration, InnoDB will now resize logs to match automatically.
  3. Temporal data types can now support microseconds.
  4. Silent storage engine substitution is disabled by default (sql_mode=NO_ENGINE_SUBSTITUTION).
  5. Subqueries are now optimized.
  6. Stored procedures offer a way to get more information on errors.
  7. Timestamps can be maintained on more than one column.
  8. Replication slaves are crash safe. This means when a slave crashes you no longer need to re-image its data from the master or another slave. It can resume from where it was.
  9. Replication failover is much easier with GTIDs. Previously it was very difficult when you had a master fail with many slaves. You could not simply pick a new master from one of the slaves and reposition the other slaves under it.
  10. Indexes can now be added online, which means less downtime for a lot of people.
  11. There's a standard way to find unused indexes (performance_schema).

I should point out that don't think MySQL's alone in having these nuances - I think the political correct name is a "legacy feature" :)

In MySQL 5.7, my gripe about Control-C not working in the client has already been fixed. The Bug System also has the ability to vote for a bug as impacting you. So if you see something annoying, please click 'Affects me' to leave feedback.

Did I miss something? Please leave a comment.