Master MySQL – Page 2 – Blog of Morgan Tocker

Plan to deprecate PROCEDURE ANALYSE

In the MySQL team, we have been refactoring the SQL parser to be more maintainable. Gleb Shchepa lists the goals of this project in more detail on the MySQL Server Team blog.

As part of this, we have identified the feature PROCEDURE ANALYSE as something that we would like to deprecate. For added context, here is a demonstration:

mysql> SELECT * FROM City procedure analyse()\G
*************************** 1. row ***************************
             Field_name: world.city.ID
              Min_value: 1
              Max_value: 4079
             Min_length: 1
             Max_length: 4
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 2040.0000
                    Std: 1177.5058
      Optimal_fieldtype: SMALLINT(4) UNSIGNED NOT NULL
*************************** 2. row ***************************
             Field_name: world.city.Name
              Min_value: A Coruña (La Coruña)
              Max_value: ´s-Hertogenbosch
             Min_length: 3
             Max_length: 33
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 8.5295
                    Std: NULL
      Optimal_fieldtype: VARCHAR(33) NOT NULL
*************************** 3. row ***************************
             Field_name: world.city.CountryCode
              Min_value: ABW
              Max_value: ZWE
             Min_length: 3
             Max_length: 3
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 3.0000
                    Std: NULL
      Optimal_fieldtype: ENUM('ABW','AFG','AGO','AIA','ALB','AND','ANT','ARE','ARG','ARM','ASM','ATG','AUS','AUT','AZE','BDI','BEL','BEN','BFA','BGD','BGR','BHR','BHS','BIH','BLR','BLZ','BMU','BOL','BRA','BRB','BRN','BTN','BWA','CAF','CAN','CCK','CHE','CHL','CHN','CIV','CMR','COD','COG','COK','COL','COM','CPV','CRI','CUB','CXR','CYM','CYP','CZE','DEU','DJI','DMA','DNK','DOM','DZA','ECU','EGY','ERI','ESH','ESP','EST','ETH','FIN','FJI','FLK','FRA','FRO','FSM','GAB','GBR','GEO','GHA','GIB','GIN','GLP','GMB','GNB','GNQ','GRC','GRD','GRL','GTM','GUF','GUM','GUY','HKG','HND','HRV','HTI','HUN','IDN','IND','IRL','IRN','IRQ','ISL','ISR','ITA','JAM','JOR','JPN','KAZ','KEN','KGZ','KHM','KIR','KNA','KOR','KWT','LAO','LBN','LBR','LBY','LCA','LIE','LKA','LSO','LTU','LUX','LVA','MAC','MAR','MCO','MDA','MDG','MDV','MEX','MHL','MKD','MLI','MLT','MMR','MNG','MNP','MOZ','MRT','MSR','MTQ','MUS','MWI','MYS','MYT','NAM','NCL','NER','NFK','NGA','NIC','NIU','NLD','NOR','NPL','NRU','NZL','OMN','PAK','PAN','PCN','PER','PHL','PLW','PNG','POL','PRI','PRK','PRT','PRY','PSE','PYF','QAT','REU','ROM','RUS','RWA','SAU','SDN','SEN','SGP','SHN','SJM','SLB','SLE','SLV','SMR','SOM','SPM','STP','SUR','SVK','SVN','SWE','SWZ','SYC','SYR','TCA','TCD','TGO','THA','TJK','TKL','TKM','TMP','TON','TTO','TUN','TUR','TUV','TWN','TZA','UGA','UKR','URY','USA','UZB','VAT','VCT','VEN','VGB','VIR','VNM','VUT','WLF','WSM','YEM','YUG','ZAF','ZMB','ZWE') NOT NULL
*************************** 4. row ***************************
             Field_name: world.city.District
              Min_value: Abhasia [Aphazeti]
              Max_value: –
             Min_length: 1
             Max_length: 20
       Empties_or_zeros: 4
                  Nulls: 0
Avg_value_or_avg_length: 9.0194
                    Std: NULL
      Optimal_fieldtype: VARCHAR(20) NOT NULL
*************************** 5. row ***************************
             Field_name: world.city.Population
              Min_value: 42
              Max_value: 10500000
             Min_length: 2
             Max_length: 8
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 350468.2236
                    Std: 723686.9870
      Optimal_fieldtype: MEDIUMINT(8) UNSIGNED NOT NULL
5 rows in set (0.01 sec)

ANALYSE() examines the result of a query and returns an analysis that suggests an optimal data type for each column, which may help reduce table sizes.

Our justification for wanting to deprecate PROCEDURE ANALYSE is as follows:

There are no other uses of SELECT * FROM table PROCEDURE. This syntax is used exclusively by ANALYSE, and uses the UK English spelling.
The name PROCEDURE predates the addition of stored procedures as a MySQL feature. Ideally this feature would use a different name (CHANNEL?) to avoid confusion in usage. It also exists as an extension to the SQL standard.
There are numerous advantages to a feature similar to this being external to the MySQL server. The server must follow a stable release cycle, with core functionality being unchanged once it is declared GA. As an external tool, it is much easier to develop in an agile way, and provide new functionality without having to provide the same level of backward compatibility.
By “external” I mean that this could be either a script, or a view or stored procedure in MySQL. Shlomi has a good example of how to show auto_increment column capacity in common_schema!
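As a rough illustration of what an external script could look like, here is a minimal Python sketch that suggests the smallest integer type for an observed value range. The function name suggest_int_type is hypothetical, and the sketch only covers integer columns; it is not how ANALYSE() itself is implemented. The limits are MySQL's documented integer ranges.

```python
# Hypothetical external helper: suggest the smallest MySQL integer type
# that can hold every value between an observed minimum and maximum.
UNSIGNED_LIMITS = [
    ("TINYINT", 255),
    ("SMALLINT", 65535),
    ("MEDIUMINT", 16777215),
    ("INT", 4294967295),
    ("BIGINT", 18446744073709551615),
]
SIGNED_LIMITS = [
    ("TINYINT", -128, 127),
    ("SMALLINT", -32768, 32767),
    ("MEDIUMINT", -8388608, 8388607),
    ("INT", -2147483648, 2147483647),
    ("BIGINT", -9223372036854775808, 9223372036854775807),
]


def suggest_int_type(min_value, max_value):
    """Return the narrowest integer type covering [min_value, max_value]."""
    if min_value >= 0:
        # No negative values observed, so an UNSIGNED type is enough.
        for name, upper in UNSIGNED_LIMITS:
            if max_value <= upper:
                return name + " UNSIGNED"
    else:
        for name, lower, upper in SIGNED_LIMITS:
            if lower <= min_value and max_value <= upper:
                return name
    raise ValueError("value range does not fit in any integer type")


# Min/max values taken from the City.ID and City.Population rows above.
print(suggest_int_type(1, 4079))       # SMALLINT UNSIGNED
print(suggest_int_type(42, 10500000))  # MEDIUMINT UNSIGNED
```

For the two numeric columns in the demonstration, the sketch arrives at the same base types that ANALYSE() reported.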

Our current plan is to deprecate PROCEDURE ANALYSE in MySQL 5.7, for removal as soon as MySQL 5.8. We are inviting feedback from the MySQL Community and would like to hear from you if you use PROCEDURE ANALYSE. Please leave a comment, or get in touch!

An update on GROUP BY implicit sort

In the MySQL team, we have been planning for some time to remove the implicit sorting that is provided by GROUP BY. In doing so, we will make a number of existing queries faster (by no longer requiring a sort) as well as unlock opportunities to implement further optimizations.

This is one of the more complicated behaviours to remove, because it is not possible to tell if an application relies upon implicit ordering. Since a GROUP BY query without an ORDER BY clause is a valid query, it is also not reasonable to issue deprecation warnings.

However, one piece of the puzzle that was missing when I last wrote about this problem is that MySQL 5.7 will support server-side query rewrite. This means that database administrators will have the ability to inject an ORDER BY into queries that require this legacy behaviour. This is useful in the case where modifying the application directly is not possible.

The second part of this update is that we also plan to deprecate the closely related syntax GROUP BY .. [ASC|DESC]. I am sure many users are unaware that it exists, but you can change the implicit ordering to be in descending order with:

SELECT MAX(Population), Name FROM Country GROUP BY Name DESC;

(Note the missing “ORDER BY”).

This is an extension to the SQL standard that, from what I can tell, is not present in other databases.

Our current plan is to make GROUP BY .. [ASC|DESC] deprecated as of MySQL 5.7, with removal in 5.8. As part of this, we also plan to remove the implicit GROUP BY sort as early as MySQL 5.8.
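If you want a first impression of which of your own queries might be affected, a small script can flag candidates for review. The following Python sketch is only an assumption of how such an audit could start: the function audit_group_by is hypothetical, it works with regular expressions rather than a real SQL parser, and it will be fooled by subqueries, comments and string literals.

```python
import re

# Text of the GROUP BY clause, up to the next clause or end of statement.
GROUP_BY_CLAUSE = re.compile(
    r"\bGROUP\s+BY\b(.*?)(?=\bHAVING\b|\bORDER\s+BY\b|\bLIMIT\b|;|$)",
    re.IGNORECASE | re.DOTALL,
)
ORDER_BY = re.compile(r"\bORDER\s+BY\b", re.IGNORECASE)
DIRECTION = re.compile(r"\b(ASC|DESC)\b", re.IGNORECASE)


def audit_group_by(sql):
    """Return a list of findings for one SQL statement (possibly empty)."""
    findings = []
    clause = GROUP_BY_CLAUSE.search(sql)
    if clause is None:
        return findings
    if DIRECTION.search(clause.group(1)):
        # The syntax planned for deprecation: GROUP BY col ASC|DESC
        findings.append("GROUP BY with ASC/DESC")
    if not ORDER_BY.search(sql):
        # May or may not rely on the implicit sort; needs a human to decide.
        findings.append("GROUP BY without ORDER BY")
    return findings


print(audit_group_by(
    "SELECT MAX(Population), Name FROM Country GROUP BY Name DESC;"
))
```

A finding here does not prove that the application depends on the ordering; it only narrows down which queries are worth a closer look.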

We are inviting feedback from the MySQL Community regarding this plan. Please leave a comment, or get in touch! I would love to hear from you.

SHOW ENGINE INNODB MUTEX is back!

We received feedback from a number of users in the MySQL community that the command SHOW ENGINE INNODB MUTEX remains useful in a number of scenarios. We listened, and the command is scheduled to make a return in MySQL 5.7.8.

To lessen overhead, the command will now feature a mechanism to enable and disable metrics collection. This is documented in the manual here:

SET GLOBAL innodb_monitor_enable='latch';
SET GLOBAL innodb_monitor_disable='latch';

Thank you for helping make a better MySQL!

Optimizer Trace and EXPLAIN FORMAT=JSON in 5.7

I accidentally stumbled upon this Stack Overflow question this morning:

I am wondering if there is any difference in regards to performance between the following:
SELECT ... FROM ... WHERE someFIELD IN(1,2,3,4);
SELECT ... FROM ... WHERE someFIELD between  0 AND 5;
SELECT ... FROM ... WHERE someFIELD = 1 OR someFIELD = 2 OR someFIELD = 3 ...;

It is an interesting question because there was no good way to answer it when it was asked in 2009. All of the queries resolve to the same output in EXPLAIN. Here is an example using the sakila schema:

mysql> EXPLAIN SELECT * FROM film WHERE film_id BETWEEN 1 AND 5\G
mysql> EXPLAIN SELECT * FROM film WHERE film_id IN (1,2,3,4,5)\G
mysql> EXPLAIN SELECT * FROM film WHERE film_id =1 or film_id=2 or film_id=3 or film_id=4 or film_id=5\G
********* 1. row *********
           id: 1
  select_type: SIMPLE
        table: film
   partitions: NULL
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 2
          ref: NULL
         rows: 5
     filtered: 100.00
        Extra: Using where

Times have changed though. There are now a couple of useful features to show the difference 🙂

Optimizer Trace

Optimizer trace is a new diagnostic tool introduced in MySQL 5.6 to show how the optimizer is working internally. It is similar to EXPLAIN, with a few notable differences:

It doesn’t just show the intended execution plan, it shows the alternative choices.
You enable the optimizer trace, then you run the actual query.
It is far more verbose in its output.

Here are the outputs for the three versions of the query:

What is the difference?

The optimizer trace output shows that the first query executes as one range, while the second and third execute as 5 separate single-value ranges:

                  "chosen_range_access_summary": {
                    "range_access_plan": {
                      "type": "range_scan",
                      "index": "PRIMARY",
                      "rows": 5,
                      "ranges": [
                        "1 <= film_id <= 1",
                        "2 <= film_id <= 2",
                        "3 <= film_id <= 3",
                        "4 <= film_id <= 4",
                        "5 <= film_id <= 5"
                      ]
                    },
                    "rows_for_plan": 5,
                    "cost_for_plan": 6.0168,
                    "chosen": true
                  }

This can also be confirmed with the handler counts from SHOW STATUS:

BETWEEN 1 AND 5: 
 Handler_read_key: 1
 Handler_read_next: 5
IN (1,2,3,4,5):
 Handler_read_key: 5
film_id =1 or film_id=2 or film_id=3 or film_id=4 or film_id=5:
 Handler_read_key: 5

So I would say that BETWEEN 1 AND 5 is the cheapest query, because it finds one key and then says next, next, next until finished. The optimizer seems to agree with me. A single range access plus next five times costs 2.0168 instead of 6.0168:

                  "chosen_range_access_summary": {
                    "range_access_plan": {
                      "type": "range_scan",
                      "index": "PRIMARY",
                      "rows": 5,
                      "ranges": [
                        "1 <= film_id <= 5"
                      ]
                    },
                    "rows_for_plan": 5,
                    "cost_for_plan": 2.0168,
                    "chosen": true
                  }
                }

For context, a cost unit is a logical representation of approximately one random IO. The unit is stable, so costs can be compared between different execution plans.
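The handler counts reported above can be reproduced with a toy model of a range scan. This Python sketch is not how InnoDB works internally. It assumes that an equality range on a unique key needs one key lookup and no "next" call, while a wider range needs one lookup plus one "next" per further row visited, including the read that steps past the end of the range. The function name scan_ranges and the 1..1000 key list are assumptions for illustration.

```python
import bisect


def scan_ranges(index_keys, ranges):
    """Toy range scan over a sorted unique index.

    Returns (read_key, read_next), loosely mirroring the
    Handler_read_key and Handler_read_next status counters.
    """
    read_key = 0
    read_next = 0
    for low, high in ranges:
        read_key += 1  # one index lookup to position at the range start
        if low == high:
            continue  # unique key: at most one match, nothing to scan
        position = bisect.bisect_left(index_keys, low)
        if position == len(index_keys) or index_keys[position] > high:
            continue  # no row falls inside this range
        while True:
            position += 1
            read_next += 1  # includes the read that leaves the range
            if position == len(index_keys) or index_keys[position] > high:
                break
    return read_key, read_next


film_ids = list(range(1, 1001))  # assumed: film_id values 1..1000

print(scan_ranges(film_ids, [(1, 5)]))                       # (1, 5)
print(scan_ranges(film_ids, [(n, n) for n in range(1, 6)]))  # (5, 0)
```

One lookup followed by sequential reads versus five separate lookups is the same difference that the optimizer trace expresses as one range versus five single-value ranges.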

Ranges are not all equal

Perhaps a better example to demonstrate this is the difference between these two ranges:

SELECT * FROM film WHERE film_id BETWEEN 1 and 20
SELECT * FROM film WHERE (film_id BETWEEN 1 and 10) or (film_id BETWEEN 911 and 920)

It's pretty obvious that the second one needs to execute in two separate ranges. EXPLAIN will not show this difference, and both queries appear the same:

********* 1. row *********
           id: 1
  select_type: SIMPLE
        table: film
   partitions: NULL
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 2
          ref: NULL
         rows: 20
     filtered: 100.00
        Extra: Using where

Two distinct ranges may sit on two separate pages, and thus have different cache efficiency in the buffer pool. It should be possible to distinguish between the two.

EXPLAIN FORMAT=JSON

EXPLAIN FORMAT=JSON was introduced in MySQL 5.6 along with OPTIMIZER TRACE, but where it really becomes useful is MySQL 5.7. The JSON output will now include cost information (as well as showing separate ranges as attached_condition):

********* 1. row *********
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "10.04"
    },
    "table": {
      "table_name": "film",
      "access_type": "range",
      "possible_keys": [
        "PRIMARY"
      ],
      "key": "PRIMARY",
      "used_key_parts": [
        "film_id"
      ],
      "key_length": "2",
      "rows_examined_per_scan": 20,
      "rows_produced_per_join": 20,
      "filtered": "100.00",
      "cost_info": {
        "read_cost": "6.04",
        "eval_cost": "4.00",
        "prefix_cost": "10.04",
        "data_read_per_join": "15K"
      },
      "used_columns": [
        "film_id",
        "title",
        "description",
        "release_year",
        "language_id",
        "original_language_id",
        "rental_duration",
        "rental_rate",
        "length",
        "replacement_cost",
        "rating",
        "special_features",
        "last_update"
      ],
      "attached_condition": "((`film`.`film_id` between 1 and 10) or (`film`.`film_id` between 911 and 920))"
    }
  }
}

With the FORMAT=JSON output also showing cost, we can see that two ranges cost 10.04, versus one big range costing 9.04 (not shown). These queries are not identical in cost even though they appear identical in EXPLAIN output.
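Because the output is JSON, the cost figures are easy to pull out with a script when comparing plans. Here is a minimal Python sketch, assuming the EXPLAIN FORMAT=JSON text has already been fetched as a string; the sample is a trimmed copy of the output above, and summarize_cost is a hypothetical helper that only handles a single-table query block like this one.

```python
import json

# Trimmed copy of the EXPLAIN FORMAT=JSON output shown above.
explain_output = """
{
  "query_block": {
    "select_id": 1,
    "cost_info": {"query_cost": "10.04"},
    "table": {
      "table_name": "film",
      "access_type": "range",
      "key": "PRIMARY",
      "cost_info": {
        "read_cost": "6.04",
        "eval_cost": "4.00",
        "prefix_cost": "10.04"
      }
    }
  }
}
"""


def summarize_cost(explain_json):
    """Extract the cost figures from a single-table query block."""
    block = json.loads(explain_json)["query_block"]
    table = block["table"]
    return {
        "table": table["table_name"],
        "access_type": table["access_type"],
        # Costs are emitted as strings, so convert them for arithmetic.
        "query_cost": float(block["cost_info"]["query_cost"]),
        "read_cost": float(table["cost_info"]["read_cost"]),
        "eval_cost": float(table["cost_info"]["eval_cost"]),
    }


summary = summarize_cost(explain_output)
print(summary["table"], summary["access_type"], summary["query_cost"])
```

In this particular output, read_cost and eval_cost add up to query_cost, so running the same helper on both versions of the query shows where the extra cost unit comes from.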

Conclusion

I have heard many users say "joins are slow", but a broad statement like this misses magnitude. By including the cost information in EXPLAIN we get all users to speak the same language. We can now say "this join is expensive", which is a much better distinction 🙂

It is time to start using OPTIMIZER TRACE and, particularly in 5.7, to ditch EXPLAIN in favour of EXPLAIN FORMAT=JSON.

MySQL is 20 years old tomorrow!

According to Wikipedia, the initial release of MySQL was 23 May 1995. This places it at 20 years ~~old~~ young tomorrow.

2015 is actually a year of anniversaries. As well as MySQL reaching the big two-oh, we are also celebrating 10 years of InnoDB under the stewardship of Oracle, and five years of MySQL @ Oracle.

Please make sure to raise a toast to MySQL this weekend!

(Photo from Percona Live community dinner 2015.)

A quick update on our native Data Dictionary

In July 2014, I wrote that we were working on a new native InnoDB data dictionary to replace MySQL’s legacy frm files.

This is quite possibly the largest internals change to MySQL in modern history, and will unlock a number of previous limitations, as well as simplify a number of failure states for both replication and crash recovery.

With MySQL 5.7 approaching release candidate (and large changes always coming with risk attached) we decided that the timing to try to merge in a new data dictionary was just too tight. The data dictionary development is still alive and well, but it will not ship as part of MySQL 5.7.

So please stay tuned for updates… and thank you for using MySQL!

MySQL 5.6.24 Community Release Notes

Thank you to the MySQL Community, on behalf of the MySQL team @ Oracle. Your bug reports, test cases and patches have helped create a better MySQL 5.6.24.

In particular:

Thank you to Simon Mudd for reporting that ALTER TABLE did not take advantage of fast alterations if the table contained temporal columns found to be in pre-5.6.4 format. Bug #72997.
Thank you to Doug Warner for reporting that a TRUNCATE TABLE operation on a temporary table raised an assertion. Bug #72080.
Thank you to Elena Stepanova for reporting that an InnoDB full-text phrase search returned incorrect results. Bug #75755.
Thank you to Miljenko Brkic for reporting that the exptime set with the memcached API set command was ignored. Bug #70055.
Thank you to Nilnandan Joshi for reporting a regression where the mysqld server could not shut down while the memcached plugin was active. Bug #74956.
Thank you to Zhai Weixiang for reporting that several mutexes on InnoDB dummy tables caused contention, and could be relaxed safely. Bug #73361.
Thank you to Zhai Weixiang for reporting that append operations using the InnoDB memcached API could cause a segfault. Bug #75200.
Thank you to Ramesh Sivaraman and Roel Van de Paar for reporting that a number of ALTER TABLE statements that attempt to add partitions, columns, or indexes to a partitioned table while a write lock was in effect for this table were not handled correctly. Bug #74451, Bug #74478, Bug #74491, Bug #74560, Bug #74746, Bug #74841, Bug #74860, Bug #74869.
Thank you to Santosh Praneeth Banda for reporting that when gtid_mode=ON and slave_net_timeout was set to a low value, the slave I/O thread could appear to hang. Bug #74607.
Thank you to Tsubasa Tanaka and Daniël van Eeden for independently reporting that a server warning error referred to an obsolete table_cache system variable. Thank you also to Daniël for providing a patch. Bug #73373, Bug #75081.
Thank you to Simon Mudd for reporting that Performance Schema statement events tables incorrectly referenced system variables (@ @ instead of @@). Bug #71634.
Thank you to Yao Deng for reporting that mysql_real_connect() could close a file descriptor twice if the server was not running. Bug #69423.
Thank you to Olle Nilsson for reporting that notification of events for the general log were received by the audit log plugin only if the general query log was enabled. Bug #60782.
Thank you to Daniël van Eeden for reporting a security issue.

Thank you again to all the community contributors listed above. If I missed a name here, please let me know!

– Morgan

MySQL 5.6.23 Community Release Notes

Thank you to the MySQL Community, on behalf of the MySQL team @ Oracle. Your bug reports, test cases and patches have helped create a better MySQL 5.6.23.

In particular:

Thank you to Inaam Rana for reporting an issue with the InnoDB purge thread state and tablespace export operations. Bug #75298.
Thank you to Inaam Rana for reporting that the INNODB_METRICS adaptive_hash_searches_btree counter failed to report counter data. Bug #74511.
Thank you to Shahriyar Rzayev and David Bennett for suggesting improvements to error messages from InnoDB on non-Windows platforms. David also provided a suggested patch. Bug #73365.
Thank you to Sadao Hiratsuka and Piotr Jurkiewicz for independently reporting that integer columns failed to increment and decrement correctly when using the memcached plugin. Piotr also provided a suggested patch to fix the issue. Bug #69415, Bug #74874.
Thank you to Roel Van de Paar for reporting a security issue. Bug #74292.
Thank you to Roel Van de Paar for reporting a security issue. Bug #74597.
Thank you to Santosh Praneeth Banda for reporting a situation where a slave could have more GTIDs than a master, and data inconsistency could occur. Bug #72635.
Thank you to Domas Mituzas for reporting that when using SHOW SLAVE STATUS to monitor replication performance, Seconds_Behind_Master sometimes displayed unexpected lag behind the master. Bug #72376.
Thank you to Matt Swanson for reporting that InnoDB full-text boolean searches incorrectly handled + combined with parentheses. Bug #74845.
Thank you to Roel Van de Paar for reporting a security issue. Bug #74447.
Thank you to Sean Pringle for reporting that an internal temporary table could cause problems if the file was orphaned and the name was used later for other queries. Bug #32917.

Thank you again to all the community contributors listed above. If I missed a name here, please let me know!

– Morgan

Making the case to support +2 version upgrades

In the MySQL team, we have always had a requirement to support upgrades from the previous major version. For example:

Upgrading from MySQL 5.5 to 5.6 is supported.
Upgrading from MySQL 5.1 to 5.6 is not supported.

Downgrades are also supported for one major version. For example, if a user upgrades to 5.6 but discovers that it is not working as expected, they have the safety of knowing that there is a way to step back to MySQL 5.5. This may come with some limits; for example when new features (such as new row formats or page checksums) are enabled, this may no longer be possible.

Today I wanted to discuss a current non-requirement. We do not support skipping major versions, such as upgrading from MySQL 5.1 to 5.6. Justin however makes the case that despite not being supported, it has often worked (at least when used with mysqldump).

We believe that Justin’s bug report has a lot of merit, and are considering extending our requirements to support a +2 version upgrade (i.e. 5.5 to 5.7). This will have a noticeable impact on our QA team, and the effort required to expand testing to handle additional upgrade scenarios will have to be carefully evaluated.
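To make the difference between the current rule and the proposed one concrete, here is a small Python sketch. The function is_upgrade_supported and its max_jump parameter are hypothetical names for illustration; the list simply orders the release series from 5.0 to 5.7.

```python
# Release series in order of release.
RELEASE_SERIES = ["5.0", "5.1", "5.5", "5.6", "5.7"]


def is_upgrade_supported(source, target, max_jump=1):
    """True if target is at most max_jump series newer than source."""
    distance = RELEASE_SERIES.index(target) - RELEASE_SERIES.index(source)
    return 1 <= distance <= max_jump


# Current requirement: one major version at a time.
print(is_upgrade_supported("5.5", "5.6"))              # True
print(is_upgrade_supported("5.1", "5.6"))              # False

# Proposed requirement: allow a +2 version upgrade.
print(is_upgrade_supported("5.5", "5.7", max_jump=2))  # True
```

Under the proposal, each extra step in max_jump adds another set of source and target combinations that would need to be tested.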

We are seeking feedback from our users in the community in response to BUG #76264! The specific questions we would like to ask you are:

Are you currently running MySQL 5.5 and planning to upgrade directly to MySQL 5.7?
If so, are there any constraints that make it too difficult to step through 5.6 as part of the upgrade process?
Would it be acceptable if a 5.5 to 5.7 upgrade was only supported via mysqldump?
Do you have a requirement for a >+2 version upgrade?

Please leave a comment, or get in touch!

Heads up! MySQL 5.7 DMR6 contains a (small) known bug

MySQL 5.7 DMR6 was released today! By my crude measurement, it is a big release with a number of new features and bug fixes:

morgo@Rbook:~$ for V in 1 2 3 4 5 6; do curl --silent http://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-$V.html | wc -l; done;
2543
4914 <-- DMR2
2282
2940
4118
4121 <-- DMR6

The release notes have one important known bug to note:

This bug has been fixed in 5.7.7.