MySQL 5.6.14 Community Release Notes

Thank you to the MySQL Community, on behalf of the MySQL team @ Oracle. Your bug reports, test cases, and patches have helped create a better MySQL 5.6.14.

In particular:

  • Thanks to Justin Swanhart for pointing out an issue with statistics on partitioned tables – bug #69179. While we marked Justin’s bug as a duplicate, he continued to provide commentary on the earlier bug it duplicated, and we appreciate his feedback.
  • Justin Swanhart also noticed a change in behaviour in MySQL 5.6 with an assertion raised in InnoDB – bug #70007. We’ve since reverted to the previous behaviour.
  • Thanks to Zhai Weixiang, who noticed that InnoDB code intended to run only in debug versions of MySQL was unintentionally running in all versions. Bug #69724.
  • Thanks to both Robert Lineweaver and Ryan Kenney for reporting bugs #69693 and #69707, as well as providing excellent test cases and research into the versions affected. Both problems related to InnoDB mishandling certain foreign key names.
  • Thanks to Roel Van de Paar, who reported an assertion when innodb_log_group_home_dir does not exist. Bug #69000.
  • Thanks to Elena Stepanova for reporting bug #65146. As a result of Elena’s contribution, we updated the documentation and now produce a warning when START TRANSACTION WITH CONSISTENT SNAPSHOT is used in isolation levels other than REPEATABLE READ.
  • Thanks to Mark Callaghan for reporting an issue with InnoDB’s background thread and shutting down the MySQL server. Bug #63276.
  • Thanks to Alexey Kopytov for detecting a situation where an infinite loop could occur in handling InnoDB’s compressed pages. Alexey’s bug report contained detailed code analysis and was very helpful. Bug #61132.
  • Thanks to Shahriyar Rzayev for reporting the original bug, and uploading detailed data to assist in reproduction. It was from these contributions that we were later able to devise a test case and reproduce the bug internally! Bug #69898.
  • Thanks to Justin Swanhart and Tsubasa Tanaka, who both identified a bug where MASTER_DELAY did not type-check its input. Bugs #69249 and #69469.
  • Thanks to Hartmut Holzgraefe for reporting an issue with row-based replication of the DECIMAL data type between MySQL versions. Bug #57166.
  • Thanks to Mark Callaghan for reporting a performance regression in 5.6. Bug #68825.
  • Thanks to Jean Weisbuch for identifying an issue with information_schema. Bug #68076.
  • Thanks to Simon Mudd for discovering an issue where the event scheduler did not correctly report when it could not create a new thread. Bug #67191.
  • Thanks to Sergey Petrunya who reported bug #67507, as well as the community of users who provided input on the bug – mysqldump is often run from cron, so changes in behaviour risk backups not being performed.
  • Thanks to a private bug reporter who demonstrated that MySQL inadequately protects against stack overflow, and provided an example patch. Bug #35019.
  • Thanks to “Dude Letme BE” who reported symbols were missing from libmysql.dll. Bugs #69204, #62394.
  • Thanks to Chito Angeles for reporting a bug in InnoDB full-text search, and reducing it to a simplified test case. Bug #69932.

Thank you again to everyone listed above. In particular, I would like to call out the two names that appear more than once: Justin Swanhart (3), Mark Callaghan (2).

- Morgan

What to tune in MySQL 5.6 after installation

As a result of a number of improvements to default values, MySQL 5.6 requires far less configuration than previous versions of MySQL. Having said that, I wanted to write about the settings that you may need to change.

InnoDB Settings

  • innodb_buffer_pool_size – Defaults to 128M. This is the main setting you want to change, since it determines how much memory InnoDB will use to cache data and indexes. For a dedicated MySQL server, the recommended size is 50-80% of installed memory. For example, a server with 64G of RAM should have around a 50G buffer pool.

    The danger of setting this value too high is that there will be no memory left for the operating system and for MySQL subsystems that rely on the filesystem cache, such as the binary logs and InnoDB’s transaction logs. A rough way to sanity-check the value is shown below.
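
    One heuristic (an illustration, not an official formula) is to compare logical read requests against reads that had to come from disk on a warmed-up server:

    mysql> SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';

    If Innodb_buffer_pool_reads (pages read from disk) grows quickly relative to Innodb_buffer_pool_read_requests (logical reads), the buffer pool is likely too small for your working set.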

  • innodb_log_file_size – Defaults to 48M. Systems with high write throughput may want to increase this to allow background checkpoint activity to be smoothed out over longer periods of time, leading to improved performance. Values up to 4G are quite safe. Historically, the downside of operating with a large log file was increased crash recovery time, but this has vastly improved in 5.5 and 5.6. One way to estimate a suitable size is shown below.
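
    A common sizing heuristic (a community rule of thumb, not an official formula) is to measure how many bytes InnoDB writes to its log per minute at peak, using the "Log sequence number" from SHOW ENGINE INNODB STATUS, which is a byte count:

    mysql> pager grep sequence
    mysql> SHOW ENGINE INNODB STATUS\G SELECT SLEEP(60)\G SHOW ENGINE INNODB STATUS\G
    Log sequence number 3836410803
    Log sequence number 3838334638
    mysql> nopager

    The numbers above are illustrative. The difference (about 1.9M per minute here) multiplied by 60 approximates your hourly write rate, and a combined log size (innodb_log_file_size * innodb_log_files_in_group) that can absorb roughly an hour of peak writes is a common target.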

  • innodb_flush_method – Defaults to fdatasync. If you are using a hardware RAID controller, you may want to set this to O_DIRECT. This prevents the “double buffering” effect where a page read into the InnoDB buffer pool is also held in the operating system’s filesystem cache.

    If you do not have a hardware RAID controller, or with some SANs, O_DIRECT may lead to worse performance. The manual and Bug #54306 clarify this in more detail.

  • innodb_flush_neighbors – Defaults to 1. This setting should be set to 0 (disabled) on SSDs, which see no performance gain from sequential IO. It may also be beneficial to disable it on some RAID configurations, because logically sequential blocks are not guaranteed to be physically sequential.

  • innodb_io_capacity and innodb_io_capacity_max – These settings influence how much background work InnoDB will do per second. In a previous post I described how most write IO (other than writes to InnoDB’s log) happens in the background. If you have knowledge of your hardware and how many operations per second it can perform, it makes sense to consume this capacity rather than let it remain idle.

    The analogy I like to use here is a flight taking off with empty seats – sometimes it is better to fill them with passengers destined for later flights, just in case there is bad weather ahead. That is, it is better to clear background work at the earliest convenience, to reduce liabilities.

    Some simple math: assume 200 IOPS per disk that can write, so a RAID10 controller with 10 disks gives (10/2) * 200 = 1000. I say “simple math” because a RAID controller will often be able to merge writes and effectively increase the number of IOPS you are capable of. In the case of SSDs, the IOPS count will likely be in the thousands.

    The risk of setting this too high is that background work becomes so overwhelming that it competes with foreground work for disk capacity. Historically there was also a risk that setting this too high would cause InnoDB to hold an internal lock, leading to decreased performance (my understanding is that this is greatly improved in 5.6).
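
    Continuing the simple math above, a starting point for the 10-disk RAID10 example might look like this (illustrative values; the sample configuration at the end of this post uses higher numbers for a 2000-IOPS controller):

    # 10 disks in RAID10, ~200 write IOPS each: (10/2) * 200 = 1000
    innodb_io_capacity     = 1000
    innodb_io_capacity_max = 2000   # kept comfortably above innodb_io_capacity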

  • innodb_lru_scan_depth – Defaults to 1024. This is a new setting introduced in MySQL 5.6. Mark Callaghan has written some advice on configuring it. The simple version: if you raise innodb_io_capacity, you should raise innodb_lru_scan_depth too, as in the sketch below.
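
    For example, pairing the two settings (a sketch matching the sample configuration at the end of this post):

    innodb_io_capacity    = 2000
    innodb_lru_scan_depth = 2000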

Replication

Let’s assume that this server will be used for replication, or point-in-time recovery, in which case we want the binary log enabled (log-bin), durable writes to it (sync_binlog=1), and crash-safe replication metadata (master-info-repository=TABLE and relay-log-info-repository=TABLE) – see the sample configuration at the end of this post.

Misc

  • timezone=GMT – Changes the timezone to GMT. A growing number of sysadmins suggest keeping all servers in GMT. I am personally a big fan of this, since almost all businesses are global these days, and the locale you first started operating in seems kind of arbitrary.

  • character-set-server=utf8mb4 and collation-server=utf8mb4_general_ci – As I wrote in my previous post, UTF-8 is a much better default for new applications. You may optionally also want to set skip-character-set-client-handshake to ignore applications trying to set the character-set otherwise; a quick way to verify what a connection negotiated is shown below.
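
    To verify what a connection actually negotiated, inspect the session variables:

    mysql> SHOW SESSION VARIABLES LIKE 'character_set_%';
    mysql> SHOW SESSION VARIABLES LIKE 'collation%';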

  • sql-mode – MySQL defaults to being very forgiving, and will silently truncate data. In my previous post, I said that for new applications it is better to set this to STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_AUTO_VALUE_ON_ZERO,NO_ENGINE_SUBSTITUTION,NO_ZERO_DATE,NO_ZERO_IN_DATE,ONLY_FULL_GROUP_BY.

  • skip-name-resolve – Disables reverse name lookups on incoming connections. DNS resolution can be a little slow/flaky on some systems, so if you don’t require hostnames as part of privileges, I recommend avoiding it.

  • max_connect_errors – As Todd Farmer writes: “[this feature] provides no meaningful protection against brute force access attacks”. And indeed max_connect_errors doesn’t even apply when skip-name-resolve is set (as recommended just above).

    A firewall is better suited here, and usually what I will do is block port 3306 both publicly and internally so that only applications intended to have access can connect to MySQL. I will usually then set max_connect_errors to 100000 just so I can avoid any “double configuration” and guarantee it won’t get in the way.

  • max-connections – The default is 151. I see a lot of users with this set to a higher value in the 300-1000 range.

    Usually setting this higher is unavoidable, but it makes me a little nervous, since a 16-core machine only has the opportunity to be oversubscribed by maybe 2x-10x while blocking on IO. What you are hoping for here is that many of the open connections are sleeping idle; if they all become active at once, you can thread-thrash.

    The way I would ideally like to see this problem solved is with application connection pools configured to service the database, rather than keeping large numbers of connections open, or with non-pooled applications connecting, executing work, and disconnecting as quickly as possible. Another way to solve it from 5.5 onwards (and one of the few differences between MySQL Community Edition and Enterprise Edition) is the thread pool plugin.
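
    To see how close a server actually gets to the limit, compare the high-water mark against the configured maximum:

    mysql> SHOW GLOBAL STATUS LIKE 'Max_used_connections';
    mysql> SHOW GLOBAL VARIABLES LIKE 'max_connections';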

Conclusion

So let’s assume that we’re installing MySQL on a server with:

  • 64G RAM
  • Hardware RAID Controller (let’s assume it is capable of at least 2000 IOPS)
  • Requires Replication
  • It’s a new application
  • We have a firewall
  • We don’t require hostnames for privileges
  • We have a global application and don’t want to be tied to one timezone.
  • We want the application to be durable.

Here might be our sample configuration:

# InnoDB settings
innodb_buffer_pool_size=50G
innodb_log_file_size=2G
innodb_flush_method=O_DIRECT
innodb_io_capacity=2000
innodb_io_capacity_max=6000
innodb_lru_scan_depth=2000

# Binary log/replication
log-bin
sync_binlog=1
sync_relay_log=1
relay-log-info-repository=TABLE
master-info-repository=TABLE
expire_logs_days=10
binlog_format=ROW
transaction-isolation=READ-COMMITTED
innodb_autoinc_lock_mode = 2

# Other
timezone=GMT
character-set-server=utf8mb4
collation-server=utf8mb4_general_ci
sql-mode="STRICT_TRANS_TABLES,
 ERROR_FOR_DIVISION_BY_ZERO,
 NO_AUTO_CREATE_USER,
 NO_AUTO_VALUE_ON_ZERO,
 NO_ENGINE_SUBSTITUTION,
 NO_ZERO_DATE,
 NO_ZERO_IN_DATE,
 ONLY_FULL_GROUP_BY"
skip-name-resolve
max-connect-errors=100000
max-connections=500

# Unique to this machine
server-id=123

Hopefully that addresses the major issues. Let me know if you have any other safe advice, and I’ll add to the list!

Improving MySQL’s default configuration

One of MySQL’s continued values is to be easy to use. A key part of this is making sure the default configuration is kept up to date and safe for everyone.

Internally, these configuration changes are made in close cooperation between our engineering and support teams, who are in the best position to provide insight into customer impact.

For MySQL 5.6, hats off to James Day, who led the project internally. Taken directly from James’ blog, here is the list of changes made:

Setting | Old | New | Notes
back_log | 50 | 50 + (max_connections / 5), capped at 900 |
binlog_checksum | NONE | CRC32 | New variable in 5.6. 5.5 doesn’t accept the checksums; use NONE if you want to read the binlog with 5.5, or on a master with replication slaves running 5.5.
binlog-row-event-max-size | 1k | 8k | There is no binlog_row_event_max_size variable, just the option form.
flush_time | 1800 | 0 (Windows) | Was already 0 on other platforms.
host_cache_size | 128 | 128 + 1 for each of the first 500 max_connections + 1 for every 20 max_connections over 500, capped at 2000 | New variable in 5.6.
innodb_autoextend_increment | 8 | 64 | 64 is 64 megabytes.
innodb_buffer_pool_instances | 0 | 8 | On 32-bit Windows only, if innodb_buffer_pool_size is greater than 1300M, the default is innodb_buffer_pool_size / 128M.
innodb_concurrency_tickets | 500 | 5000 |
innodb_file_per_table | 0 | 1 |
innodb_log_file_size | 5M | 48M | InnoDB will change the size to match the my.cnf value. Also see innodb_log_compressed_pages and binlog_row_image.
innodb_old_blocks_time | 0 | 1000 | 1 second.
innodb_open_files | 300 | 300; if innodb_file_per_table is ON, the higher of table_open_cache or 300 |
innodb_purge_batch_size | 20 | 300 |
innodb_purge_threads | 0 | 1 |
innodb_stats_on_metadata | ON | OFF |
join_buffer_size | 128k | 256k |
max_allowed_packet | 1M | 4M |
max_connect_errors | 10 | 100 |
open_files_limit | 0 | 5000 | See Note 1.
query_cache_size | 0 | 1M |
query_cache_type | ON/1 | OFF/0 |
sort_buffer_size | 2M | 256k |
sql_mode | none | NO_ENGINE_SUBSTITUTION | See the later post about a default my.cnf with STRICT_TRANS_TABLES.
sync_master_info | 0 | 10000 | Recommend: master_info_repository=TABLE.
sync_relay_log | 0 | 10000 |
sync_relay_log_info | 0 | 10000 | Recommend: relay_log_info_repository=TABLE. Also see Replication Relay and Status Logs.
table_definition_cache | 400 | 400 + table_open_cache / 2, capped at 2000 |
table_open_cache | 400 | 2000 | Also see table_open_cache_instances.
thread_cache_size | 0 | 8 + max_connections/100, capped at 100 |

Note 1: In 5.5 there were already rules to ask the OS for the highest of (10 + max_connections + table_cache_size * 2) or (max_connections * 5) or the specified/default open_files_limit. The default is now calculated but the other rules are still used. If the OS refuses to allow as many as requested, max_connections and table_cache_size are reduced and you will see a “Changed limits” warning in the error log, as before.
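
If you want to compare a running server against the new defaults, SHOW VARIABLES accepts a WHERE clause, so you can pull out just the settings from the table above. For example:

mysql> SHOW GLOBAL VARIABLES
    ->  WHERE Variable_name IN ('table_open_cache', 'table_definition_cache', 'thread_cache_size');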

Additional recommendations for new applications

Part of continuing to be easy to use means that, as well as having a good set of defaults, users also have an expectation of backwards compatibility. So a tradeoff sometimes needs to be made to ensure that novice users can upgrade their legacy applications with minimal issues.

This tradeoff doesn’t apply to new applications, in which case I recommend the following additional configuration changes:

[mysqld]

# Force UTF-8 character-set:
character-set-server=utf8
collation-server=utf8_general_ci

# Force MySQL to be more strict and not allow legacy
# defaults which truncate data, accept invalid dates, etc.
# Shown vertically to be a little easier to read.

sql-mode="STRICT_TRANS_TABLES,
  ERROR_FOR_DIVISION_BY_ZERO,
  NO_AUTO_CREATE_USER,
  NO_AUTO_VALUE_ON_ZERO,
  NO_ENGINE_SUBSTITUTION,
  NO_ZERO_DATE,
  NO_ZERO_IN_DATE,
  ONLY_FULL_GROUP_BY"

UTF-8 is a much smarter default than latin1 today. A simple Google search
shows that converting is an issue for a number of users. I would even go so far as to encourage setting
skip-character-set-client-handshake to ignore applications trying to set the character-set otherwise.

MySQL (arguably) became popular for not being strict about SQL and allowing you to insert almost any value. This may have made sense when a poor application did not perform error checking, and the alternative was not storing any data at all. I think times have changed – many users now rely on frameworks and ORMs, and are happy to catch errors up front rather than later find their data is corrupt. A quick demonstration follows.
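
To make the difference concrete, here is a small illustration using a throwaway table (the behaviour shown is the standard handling of an out-of-range value):

mysql> CREATE TABLE t (a TINYINT);
mysql> SET SESSION sql_mode = '';
mysql> INSERT INTO t VALUES (200);   -- accepted: silently stored as 127, with only a warning
mysql> SET SESSION sql_mode = 'STRICT_TRANS_TABLES';
mysql> INSERT INTO t VALUES (200);   -- now rejected outright
ERROR 1264 (22003): Out of range value for column 'a' at row 1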

Moving forward

We are open to feedback on how we introduce configuration changes. You can leave a comment here,
or feel free to contact me.

To get the discussion rolling, what do you think about bundling a small program (mysql-upgrade-defaults) with packages that can optionally be run after installation?

Optimizing IN Subqueries in MySQL 5.6

I thought I would take the new subquery optimizations in MySQL 5.6 for a spin today, using the world sample database provided by MySQL for certification and training.

Typical IN subquery

This is a very typical query developers run, which historically has performed very poorly on MySQL:

mysql5.5.31 > EXPLAIN SELECT * FROM City WHERE CountryCode IN
 (SELECT code FROM Country WHERE name = 'United States')\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: City
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3984
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: Country
         type: unique_subquery
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 3
          ref: func
         rows: 1
        Extra: Using where
2 rows in set (0.00 sec)

mysql5.6.11 > EXPLAIN SELECT * FROM City WHERE CountryCode IN
(SELECT code FROM Country WHERE name = 'United States')\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: Country
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 239
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: City
         type: ref
possible_keys: CountryCode
          key: CountryCode
      key_len: 3
          ref: world.Country.Code
         rows: 1
        Extra: NULL
2 rows in set (0.00 sec)

Notice that in MySQL 5.6 the very first table accessed is Country instead of City. MySQL 5.5 was not able to recognize that the subquery produces a constant, and instead executed it as a DEPENDENT SUBQUERY (aka correlated subquery) for each row it found in the City table (an estimated 3984 rows)!
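
In effect, MySQL 5.6 executes something much closer to the join you might have hand-written in the past. A rough paraphrase of the transformed query (my reading of the plan, not the optimizer’s literal output):

SELECT City.*
  FROM Country
  JOIN City ON City.CountryCode = Country.Code
 WHERE Country.name = 'United States';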

MySQL 5.6 still has a table scan on Country, but I can address that with an index on Country.name:

mysql5.5.31 > ALTER TABLE Country ADD INDEX (name);
Query OK, 0 rows affected (0.04 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql5.5.31 > EXPLAIN SELECT * FROM City WHERE CountryCode IN
 (SELECT code FROM Country WHERE name = 'United States')\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: City
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3984
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: Country
         type: unique_subquery
possible_keys: PRIMARY,Name
          key: PRIMARY
      key_len: 3
          ref: func
         rows: 1
        Extra: Using where
2 rows in set (0.00 sec)

mysql5.6.11 > ALTER TABLE Country ADD INDEX (name);
Query OK, 0 rows affected (0.04 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql5.6.11 > EXPLAIN SELECT * FROM City WHERE CountryCode IN
 (SELECT code FROM Country WHERE name = 'United States')\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: Country
         type: ref
possible_keys: PRIMARY,Name
          key: Name
      key_len: 52
          ref: const
         rows: 1
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: City
         type: ref
possible_keys: CountryCode
          key: CountryCode
      key_len: 3
          ref: world.Country.Code
         rows: 1
        Extra: NULL
2 rows in set (0.00 sec)

The index doesn’t affect MySQL 5.5, which still executes the subquery as a DEPENDENT SUBQUERY. But take a look at MySQL 5.6 – one row from the Country table (from an index!) and then one row from the City table. This optimizes great!

More complex IN example

In this example I thought I would try to find all cities in the country with the largest population. My first attempt was to see if I could now use a LIMIT in a subquery. It looks like I’ll have to wait a bit longer:

mysql5.5.31 > select * from City WHERE CountryCode IN (SELECT Code FROM country order by population desc limit 1);
ERROR 1235 (42000): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'

mysql5.6.11 > select * from City WHERE CountryCode IN (SELECT Code FROM country order by population desc limit 1);
ERROR 1235 (42000): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
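
Until that restriction is lifted, one common workaround is to move the LIMIT into a derived table, which both versions accept (a sketch that returns the same cities as the IN form):

SELECT City.*
  FROM City
  JOIN (SELECT Code FROM Country ORDER BY population DESC LIMIT 1) AS top_country
    ON City.CountryCode = top_country.Code;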

So here is my second attempt:

mysql5.5.31 > EXPLAIN SELECT * FROM City WHERE CountryCode IN 
(SELECT Code FROM country WHERE population = (SELECT max(population) FROM Country))\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: City
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3984
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: country
         type: unique_subquery
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 3
          ref: func
         rows: 1
        Extra: Using where
*************************** 3. row ***************************
           id: 3
  select_type: SUBQUERY
        table: Country
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 219
        Extra:
3 rows in set (0.00 sec)

mysql5.6.11 > EXPLAIN SELECT * FROM City WHERE CountryCode IN 
(SELECT Code FROM country WHERE population = (SELECT max(population) FROM Country))\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: country
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 239
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: City
         type: ref
possible_keys: CountryCode
          key: CountryCode
      key_len: 3
          ref: world.country.Code
         rows: 1
        Extra: NULL
*************************** 3. row ***************************
           id: 3
  select_type: SUBQUERY
        table: Country
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 239
        Extra: NULL
3 rows in set (0.00 sec)

MySQL 5.5 could always optimize the population = (scalar subquery) comparison, but not the IN subquery. Similar to the example above, I would expect the subqueries here to be unraveled as constants as well. If I add an index on population you can really see this happen:

mysql5.5.31 > ALTER TABLE Country ADD INDEX (population);
Query OK, 0 rows affected (0.05 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql5.5.31 > EXPLAIN SELECT * FROM City WHERE CountryCode IN 
(SELECT Code FROM country WHERE population = (SELECT max(population) FROM Country))\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: City
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3984
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: country
         type: unique_subquery
possible_keys: PRIMARY,Population
          key: PRIMARY
      key_len: 3
          ref: func
         rows: 1
        Extra: Using where
*************************** 3. row ***************************
           id: 3
  select_type: SUBQUERY
        table: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Select tables optimized away
3 rows in set (0.00 sec)

mysql5.6.11 > ALTER TABLE country add index (population);
Query OK, 0 rows affected (0.05 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql5.6.11 > EXPLAIN select * from City WHERE CountryCode IN 
(SELECT Code FROM country where population = (SELECT max(population) FROM Country))\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: country
         type: ref
possible_keys: PRIMARY,Population
          key: Population
      key_len: 4
          ref: const
         rows: 1
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: City
         type: ref
possible_keys: CountryCode
          key: CountryCode
      key_len: 3
          ref: world.country.Code
         rows: 1
        Extra: NULL
*************************** 3. row ***************************
           id: 3
  select_type: SUBQUERY
        table: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Select tables optimized away
3 rows in set (0.00 sec)

This is looking really good in MySQL 5.6. I had a bit of a huh? moment when trying to work out what the “Select tables optimized away” in step #3 meant. This led me to try EXPLAIN EXTENDED – running SHOW WARNINGS immediately afterwards reveals a little gem:

mysql5.5.31 > SHOW WARNINGS\G
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: select `world`.`city`.`ID` AS `ID`,`world`.`city`.`Name` AS `Name`,
`world`.`city`.`CountryCode` AS `CountryCode`,`world`.`city`.`District` AS `District`,
`world`.`city`.`Population` AS `Population` from `world`.`city` where
 <in_optimizer>(`world`.`city`.`CountryCode`,
 <exists>(<primary_index_lookup>(<cache>(`world`.`city`.`CountryCode`) in 
 country on PRIMARY where ((`world`.`country`.`Population` = 
 (select max(`world`.`country`.`Population`) from `world`.`country`)) and 
 (<cache>(`world`.`city`.`CountryCode`) = `world`.`country`.`Code`)))))
1 row in set (0.00 sec)

mysql5.6.11 > show warnings\G
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: /* select#1 */ select `world`.`city`.`ID` AS `ID`,`world`.`city`.`Name` AS `Name`,
`world`.`city`.`CountryCode` AS `CountryCode`,`world`.`city`.`District` AS `District`,
`world`.`city`.`Population` AS `Population` from `world`.`country` join `world`.`city` 
where ((`world`.`city`.`CountryCode` = `world`.`country`.`Code`) 
and (`world`.`country`.`Population` = (/* select#3 */ 
select max(`world`.`country`.`Population`) from `world`.`country`)))
1 row in set (0.00 sec)

EXPLAIN EXTENDED shows the approximate query that MySQL is going to execute after the optimizer has applied its optimizations and transformations. This has been enhanced in MySQL 5.6 to add a comment for each step in query execution (the IDs match up to those in EXPLAIN). So if it was ever unclear, it is clearly this portion that is being optimized away:

mysql5.5.31 > EXPLAIN select max(`world`.`country`.`Population`) from `world`.`country`\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Select tables optimized away
1 row in set (0.00 sec)

mysql5.6.11 > EXPLAIN select max(`world`.`country`.`Population`) from `world`.`country`\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Select tables optimized away
1 row in set (0.00 sec)

I believe what’s happening here is that, during optimization, MySQL opens the population index and looks at the last value (very cheap on a B-tree). So it kind of cheats and does some of the work before it has to. I’ve seen it do this before; here is a more common example of this cheating happening:

mysql5.5.31 > EXPLAIN EXTENDED SELECT * FROM City WHERE id = 1890\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: City
         type: const
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: const
         rows: 1
     filtered: 100.00
        Extra:
1 row in set, 1 warning (0.00 sec)

mysql5.5.31 > show warnings\G
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: select '1890' AS `ID`,'Shanghai' AS `Name`,'CHN' AS `CountryCode`,
'Shanghai' AS `District`,'9696300' AS `Population` from `world`.`city` where 1
1 row in set (0.00 sec)

mysql5.6.11 > EXPLAIN EXTENDED SELECT * FROM City WHERE id = 1890\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: City
         type: const
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: const
         rows: 1
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)

mysql5.6.11 > show warnings\G
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: /* select#1 */ select '1890' AS `ID`,'Shanghai' AS `Name`,'CHN' AS
 `CountryCode`,'Shanghai' AS `District`,'9696300' AS `Population` from `world`.`city` 
WHERE 1
1 row in set (0.00 sec)

Anyway, back to my original query. With the nesting of IN queries, I sometimes find it difficult to read the output of EXPLAIN and understand the order of execution. MySQL 5.6 also has EXPLAIN FORMAT=JSON, which looks much nicer to me and includes more information:

mysql5.6.11 > EXPLAIN format=json select * from City WHERE CountryCode IN 
(SELECT Code FROM country where population = (SELECT max(population) FROM Country))\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "nested_loop": [
      {
        "table": {
          "table_name": "country",
          "access_type": "ref",
          "possible_keys": [
            "PRIMARY",
            "Population"
          ],
          "key": "Population",
          "used_key_parts": [
            "Population"
          ],
          "key_length": "4",
          "ref": [
            "const"
          ],
          "rows": 1,
          "filtered": 100,
          "using_index": true,
          "attached_condition": "(`world`.`country`.`Population` = (/* select#3 */ 
          select max(`world`.`country`.`Population`) from `world`.`country`))",
          "attached_subqueries": [
            {
              "dependent": false,
              "cacheable": true,
              "query_block": {
                "select_id": 3,
                "table": {
                  "message": "Select tables optimized away"
                }
              }
            }
          ]
        }
      },
      {
        "table": {
          "table_name": "City",
          "access_type": "ref",
          "possible_keys": [
            "CountryCode"
          ],
          "key": "CountryCode",
          "used_key_parts": [
            "CountryCode"
          ],
          "key_length": "3",
          "ref": [
            "world.country.Code"
          ],
          "rows": 1,
          "filtered": 100
        }
      }
    ]
  }
}
1 row in set, 1 warning (0.00 sec)

While it’s possible that these queries could have been rewritten as efficient joins, I really like seeing query optimizations introduced to eliminate common paper cuts. Improving the diagnostic features doesn’t hurt either 😉 I’m really looking forward to the tools that can be built to take advantage of the JSON EXPLAIN output.