MySQL 5.6.15 Community Release Notes

Thank you to the MySQL Community, on behalf of the MySQL team @ Oracle. Your bug reports, testcases and patches have helped create a better MySQL 5.6.15.

In particular:

  • Thanks to Yoshinori Matsunobu, who reported that the performance_schema did not instrument locks related to semisync replication. Bug #70577.
  • Thanks to Sebastian Strzelczyk, who reported excessive memory usage when querying INFORMATION_SCHEMA.INNODB_FT_INDEX. Sebastian uploaded a number of reports of memory usage, which was helped us generate a good reproducible testcase. Bug #70329.
  • Thanks to Inaam Rana, who suggested improvements to InnoDB counters in information_schema, and contributed a patch! Bug #70241.
  • Thanks to Yoshinori Matsunobu, who reported inefficiency in InnoDB memcached API. Yoshinori’s feedback was also part of the 1 million QPS record set for 5.7. Bug #70172.
  • Thanks to Nizameddin Ordulu, who identified a case where crash recovery might not be possible with InnoDB. Nizameddin pointed out where in the code the bug existed, and we were quickly able to identify that this was a regression from an earlier bug fix. Bug #69122.
  • Thanks to Elena Stepanova, who identified a case where SELECT COUNT queries would run very slowly when run concurrently with a “LOAD DATA” operation. Elena also provided a testcase! Bug #69141.
  • Thanks to Daniël van Eeden, who reported a confusing/incorrect error message when using the InnoDB memcached API. Bug #68684.
  • Thanks to Monty Widenius and Michal Sulik, who both reported an issue where a unique composite index would return wrong results. Bug #70038, Bug #60220.
  • Thanks to Edward Dore, who reported an issue when converting tables from MyISAM to InnoDB. It was from a table that Edward provided that we were able to reproduce the issue! Bug #50366.
  • Thanks to Elena Stepanova who reported an error when deleting from a partitioned table, and provided a testcase! Bug #67982.
  • Thanks to Santosh Praneeth Banda for reporting a security issue. Bug #70349.
  • Thanks to Vlad Lesin, who reported that certain LOAD DATA statements would appear in the binary log twice, and provided a testcase! Bug #70277.
  • Thanks to Yoshinori Matsunobu, who reported that transactions could be lost when starting/stopping slaves with GTIDs. Yoshinori also provided a small program which we were able to use to reproduce the issue. Bug #69943.
  • Thanks to raolh rao for reporting a security issue. #70045.
  • Thanks to Miguel Angel Nieto, who reported that LAST_INSERT_ID is replicated incorrectly when replication filters are used. #69861.
  • Thanks to Santosh Praneeth Banda, who reported an issue with GTIDs and a patch with a suggested fix! Bug #69618.
  • Thanks to Maciej Dobrzanski, who reported an issue where replication would break following invalid grant statements. Bug #68892.
  • Thanks to Rich Prohaska and Przemyslaw Malkowski, who both reported a problem with BINARY columns and online DDL. Bugs #67141 and #69580.
  • Thanks to Jan Staněk, who submitted a patch with code improvements following the result of a Coverity scan. Bug #68918.
  • Thanks to Alexey Kopytov, who reported an issue in debug builds of MySQL, Bug #69653.
  • Thanks to Arthur O’Dwyer, who pointed out that COUNT DISTINCT would incorrectly count NULL rows. Bug #69841.
  • Thanks to Lance Ivy for discovering an issue where UPDATE does not update rows in the correct order. Bug #68656.
  • Thanks to Elena Stepanova for pointing out that mysql_upgrade will fail with the InnoDB storage engine disabled. Bug #70152.
  • Thanks to Arthur O’Dwyer, who reported that prefix indexes on text columns could return wrong results. Bug #70341.
  • Thanks to Yoshinori Matsunobu who pointed out that PERFORMANCE_SCHEMA overhead was very high in frequent connect/disconect scenarios. Bug #70018.
  • Thanks to Moritz Schuepp, who reported that host_cache_size set in a configuration file was ignored. Bug #70552.
  • Thanks to Saverio Miroddi, who reported an issue with InnoDB FULLTEXT and using savepoints. Bug #70333.
  • Thanks to Hartmut Holzgraefe, who reported that users with a legacy hostname may not be possible to drop. Hartmut also provided a patch! Bug #62255, #62254.
  • Thanks to Honza Horak, who reported that MySQL incorrectly assumed all ethernet interfaces start with ethX. Honza also provided a patch! Bug #63055.
  • Thanks to Honza Horak, who reported a number of code improvements (and patches) resulting from Coverity analysis. Bug #70591.
  • Thanks to Jan Staněk, who suggested code improvements (and a patch) resulting from Coverity analysis. Bug #68896.
  • Thanks to Roderik van der Veer, who reported that MySQL would not compile on Mac OS X 10.9. Thanks also to David Peixotto who provided an example patch of how to fix the problem. While we didn’t end up using David’s patch, we certainly value his contribution. Bug #70542.
  • Thanks to Olaf van der Spek and Simon Mudd, who both reported an issue that affected server shutdown. Bugs #18256, #69975.

Thank you again all the names listed above. In particular, I would like to call out the names that appear more than once:

Yoshinori Matsunobu (4), Elena Stepanova (3), Santosh Praneeth Banda (2), Jan Staněk (2), Arthur O’Dwyer (2), Honza Horak (2).

And the names of those who contributed patches:

Inaam Rana, Santosh Praneeth Banda, Hartmut Holzgraefe, Honza Horak, Jan Staněk

If I missed a name here, please let me know!

- Morgan

Help shape the future of SHOW ENGINE INNODB MUTEX

Before there was PERFORMANCE_SCHEMA, there was SHOW ENGINE INNODB MUTEX. As I wrote yesterday, these commands are useful for exposing internal locking operations that happen inside of MySQL. They are mainly useful for MySQL developers, but have some appeal for DBAs as well.

As part of an ongoing effort to cleanup the MySQL code-base and increase usability, the MySQL development team is interested in hearing feedback from users who currently use the command SHOW ENGINE INNODB MUTEX.

Specifically:

  • Is there a use case for the command that is not well satisfied by Performance Schema?
  • Would you be adversely affected if SHOW ENGINE INNODB MUTEX were to be deprecated?
  • What tools do you use to analyze mutex contention, and what features could we add to benefit you?

Please leave a comment, or get in touch. We would love to hear from you.

How do you use mysqldump?

The MySQL development team is seeking feedback on how you use mysqldump!

Here is some of the feedback I have to pass along:

  • With InnoDB now the default (since MySQL 5.5) I would love to see mysqldump change to take advantage of this. For example:
    • mysqldump can do a hot backup with all InnoDB tables, all you need to do is include --single-transaction. We need to find a way to make this the default behaviour so that applications are not blocked waiting.
    • Adding indexes should be delayed until after all data has been inserted into the table to take advantage of the InnoDB fast-index creation feature (InnoDB Plugin/5.5+).
    • Option to dump as MyISAM, restore as InnoDB.
  • An option to have progress reports while running. Even in the form 34/60 tables backed up, this is incredibly useful for beginners.
  • Take inspiration from mydumper. The user should be able to dump and restore in parallel 🙂

As with some of my other posts, please leave a comment or get in touch!

Proposal to remove InnoDB Monitor tables

Before there was information_schema and performance_schema, there were InnoDB Monitor tables.

What the monitor tables are, is a way of communicating directly to the InnoDB storage engine by creating a table of a special name. Additional debugging information is then written to MySQL’s error log. For example:

CREATE TABLE innodb_monitor (a INT) ENGINE=INNODB;

/* view to the error log */

DROP TABLE innodb_monitor;

There’s a number of problems with this approach:

  1. By using a magic table name, there is no immediate feedback for typos or spelling errors in monitor table names. The user has to wait to see if the error log is written to, and then debug if not.
  2. The monitors will continually write to the error log as long as the magic table still exists. This can reduce the error log readability.
  3. Parsing and using the error log is non-trivial for automated tooling. It also requires local filesystem access, and the log-file path can be changed by configuration.
  4. It is difficult to apply privileges to the creation of magic tables, other than those required to CREATE or DROP tables. This is not specifically a security risk (local file system access is still required to view the log file), but semantically it makes more sense for these to require either the PROCESS or SUPER privilege.
  5. information_schema is the SQL Standard for meta-data access and should be used wherever possible.

Deprecation Plan

The deprecation plan proposed for MySQL 5.7 is:

  • innodb_monitor. To be replaced with SET GLOBAL innodb_monitor=ON|OFF.
  • innodb_lock_monitor. To be replaced with SET GLOBAL innodb_lock_monitor=ON|OFF.
  • innodb_tablespace_monitor. To be removed. information_schema will become the recommended alternative.
  • innodb_table_monitor. To be removed. information_schema will become the recommended alternative.
  • innodb_mem_validate. To be removed. This depends on UNIV_MEM_DEBUG, which is not normally enabled even in debug builds.

So in all cases the magic table name (point #1 above) is removed, but in some cases the monitors remain to offer compatibility where the corresponding information_schema functionality may differ dramatically in meta-data available.

Conclusion

As with our other upcoming changes planned, we are seeing community feedback as to how this change will affect users.

Do you use the InnoDB Monitors?
Will the deprecation plan affect you negatively?

Please leave a comment, or get in touch with me. Thanks!

The future of ALTER IGNORE TABLE syntax

As part of the ongoing effort to refactor MySQL and make it better, we are defining the behaviour of certain features and in some cases reimplimenting them in a more maintainable fashion.

I have previously blogged on:
* A proposal to simplify SQL_MODE options.
* The Federated Storage Engine.
* The Query Cache.
* The deprecation of EXPLAIN PARTITIONS and EXPLAIN EXTENDED.
* The deprecation of NULL synonym \N.

Today what I want to write about, is the IGNORE option in ALTER TABLE. From the MySQL manual:

“IGNORE is a MySQL extension to standard SQL. It controls how ALTER
TABLE works if there are duplicates on unique keys in the new table
or if warnings occur when strict mode is enabled. If IGNORE is not
specified, the copy is aborted and rolled back if duplicate-key
errors occur. If IGNORE is specified, only the first row is used of
rows with duplicates on a unique key. The other conflicting rows are
deleted. Incorrect values are truncated to the closest matching
acceptable value.”

This creates several issues for the MySQL server team:

  1. IGNORE could remove rows from a parent table when using a foreign key relationship.
  2. IGNORE makes it impossible to use InnoDB Online DDL for several operations, for example adding a PRIMARY KEY or UNIQUE INDEX.
  3. IGNORE has some strange side-effects for replication. For example: DDL is always replicated via statement-based replication, and since SQL does not imply ordering, it’s not clear which rows will be deleted as part of the ignore step. I also see cross-version replication problematic if future MySQL versions were to introduce more strictness, since a slave may de-duplicate more rows.

The most common case

We believe that the most common use case for IGNORE is to be able to add a UNIQUE INDEX on a table which currently has duplicate values present. i.e.

ALTER IGNORE TABLE users ADD UNIQUE INDEX (emailaddress);

In this scenario, a novice user manages to avoid auditing each entry in the users table, and simply lets MySQL pick a row to be kept, with all duplicates automatically removed.

There are two other ways to be able to do that:

Hand removal

Using the same an example as above, return a list of email addresses and PRIMARY KEY values for records that conflict:

SELECT GROUP_CONCAT(id), emailaddress, count(*) as count FROM users 
GROUP BY emailaddress HAVING count >= 2;

/* delete or merge duplicate from above query */

ALTER TABLE users ADD UNIQUE INDEX (emailaddress);

Note: This method will be the fastest way, since when not using IGNORE, MySQL is able to use InnoDB’s Online DDL.

New table + INSERT IGNORE

While this method looks very similar, internally it’s semantics are quite different:

CREATE TABLE users_new LIKE users;
ALTER TABLE users ADD UNIQUE INDEX (emailaddress);
INSERT IGNORE INTO users_new SELECT * FROM users;
DROP TABLE users;
RENAME TABLE users_new TO users;

By creating a table first, the MySQL server will not have to manage rows in a foreign key relationship. The rows will also be re-sent to the slave using row-based replication, so issue (3) I mentioned above does not come into play.

Conclusion

We are looking for feedback on how you use this feature.

Is there another common use-case we are not accounting for?

Would it be acceptable to add a feature to MySQL Workbench to assist in preparing a table for adding a unique index (similar to hand removal)?

Help shape the future of MySQL. Please leave a comment or contact me.

Proposal to simplify SQL_MODE options

In the MySQL team, we’ve observed that having a large number of possible SQL modes creates confusion amongst users. For example, if you follow my advice on ‘recommended defaults for new applications’, it is:

sql-mode="STRICT_TRANS_TABLES,
  ERROR_FOR_DIVISION_BY_ZERO,
  NO_AUTO_CREATE_USER,
  NO_AUTO_VALUE_ON_ZERO,
  NO_ENGINE_SUBSTITUTION,
  NO_ZERO_DATE,
  NO_ZERO_IN_DATE,
  ONLY_FULL_GROUP_BY"

This is quite specific, and may be too much to expect of a novice user to set in configuration.

One proposal we are putting forward to address this is to simply reduce the number of options that are available. Namely:

  • Remove the options ERROR_FOR_DIVISION_BY_ZERO, NO_ZERO_DATE and NO_ZERO_IN_DATE.
  • Have these behaviours be enabled when turning on the strict mode (STRICT_TRANS_TABLES for transactional tables, STRICT_ALL_TABLES for all tables).

Which would then bring the list down to:

sql-mode="STRICT_TRANS_TABLES,
  NO_AUTO_CREATE_USER,
  NO_AUTO_VALUE_ON_ZERO,
  NO_ENGINE_SUBSTITUTION,
  ONLY_FULL_GROUP_BY"

This has some compatibility consequences of course. To understand the change a little more, let me explain what these options do:


ERROR_FOR_DIVISION_BY_ZERO

Produce an error in strict mode (otherwise a warning) when a division by zero (or MOD(X,0)) occurs during an INSERT or UPDATE. If this mode is not enabled, MySQL instead returns NULL for divisions by zero. For INSERT IGNORE or UPDATE IGNORE, MySQL generates a warning for divisions by zero, but the result of the operation is NULL.

For SELECT, division by zero returns NULL. Enabling this mode causes a warning to be generated as well.

NO_ZERO_DATE

In strict mode, do not permit '0000-00-00' as a valid date. You can still insert zero dates with the IGNORE option. When not in strict mode, the date is accepted but a warning is generated.

NO_ZERO_IN_DATE

In strict mode, do not accept dates where the year part is nonzero but the month or day part is 0 (for example, '0000-00-00' is legal but '2010-00-01' and '2010-01-00' are not). If used with the IGNORE option, MySQL inserts a '0000-00-00' date for any such date. When not in strict mode, the date is accepted but a warning is generated.


So in a sense, we are proposing that strict mode will become stricter and that each of these individual behaviours will no longer have their own on/off switches.

What do you think –

  • Do you agree that SQL_MODE usability is an issue?
  • Would this change cause you any compatibility issues? i.e. We would be interested to hear a use-case where you need to enable these options individually.
  • Do you have any other suggestions on how to improve SQL_MODE usability?

We value your input. Please leave a comment, or get in touch.

EXPLAIN PARTITIONS and EXPLAIN EXTENDED deprecation

In MySQL 5.7 we are planning to deprecate the syntax:

EXPLAIN PARTITIONS <insert query here>
EXPLAIN EXTENDED <insert query here>

.. and enable these two options by default.

The rationale is that:

  • Simple and consistent is always better. EXPLAIN FORMAT=JSON already behaves like these two flags are enabled, and if you have a partitioned table for example, it is unlikely that you would not want the PARTITIONS option. Having to remember two more flags makes the product harder to use.

  • The optimizer team has been busy refactoring and improving code quality. These two flags are supported by many if-statements, increasing complexity by more than we would like.

The intended deprecation plan is to automatically turn both flags on in MySQL 5.7 and issue deprecation warnings when using the older syntax. In MySQL 5.8, the syntax will be removed.

What do I expect will break?

That’s a good question, since EXPLAIN is really a DBA tool, and is unlikely to affect running applications. I expect it to be:

  • Automated tools that depend on column order of EXPLAIN, or EXPLAIN not producing a warning (as of MySQL 5.7).

  • Automated tools that explicitly use EXPLAIN PARTITIONS or EXPLAIN EXTENDED (as of MySQL 5.8).

Both I hope should be fairly simple fixes.

How can I make my tools more resilient?

While we’re on the subject of EXPLAIN and automated tools, it is a good segway to lead into EXPLAIN FORMAT=JSON, which is very well suited here. I would expect that with JSON output, applications are more likely to be able to tolerate changes to the information returned and in MySQL 5.7 the JSON is already much more detailed.

Heads up – Implicit sorting by GROUP BY is deprecated in MySQL 5.6

For those who were unaware, in MySQL the following statements are currently identical:

SELECT MAX(Population), CountryName FROM City GROUP BY CountryName;
SELECT MAX(Population), CountryName FROM City GROUP BY CountryName ORDER BY CountryName;

That is to say that regardless of whether or not you asked for it, whenever you chose to GROUP BY, you will also have data sorted and returned in that order too.

The problem with this, is that it can result in worse performing queries. Sorting either reduces the number of execution plans possible, or requires an additional step to sort the data. Which is why many DBAs advocate writing group by queries with ORDER BY NULL. i.e.

SELECT MAX(Population), CountryName FROM City GROUP BY CountryName;

Should be written as:

SELECT MAX(Population), CountryName FROM City GROUP BY CountryName ORDER BY NULL;

But, as Roland Bouman notes in BUG #30477 there is no standard which requires MySQL to order data in this way, and this behaviour is not present in other databases. So in MySQL 5.6, it was decided that relying on this implicit sorting should be deprecated. From
the manual:

“Relying on implicit GROUP BY sorting in MySQL 5.6 is deprecated. To achieve a specific sort order of grouped results, it is preferable to use an explicit ORDER BY clause. GROUP BY sorting is a MySQL extension that may change in a future release; for example, to make it possible for the optimizer to order groupings in whatever manner it deems most efficient and to avoid the sorting overhead.”

So nothing has changed yet, but please make sure you are using GROUP BY CountryName ORDER BY CountryName if your application requires it.

How do you use the Federated Storage Engine?

We are looking for community feedback on the use-cases for the Federated Storage Engine in MySQL.

Specifically, I would be interesting to hear which users find that the introduction of multi-source replication meets (or does not meet) their requirements.

For a bit of background:

The Federated Storage Engine was introduced in 5.0, but has long been disabled by default. It is useful for ad-hoc queries across MySQL servers, but it misses some of MySQL’s newer optimizations, and does not perform as well as we would like. Among it’s limitations it also does not support transactions.

By using multi-source replication, many of the downsides mentioned above are negated, since the tables can be made available to query locally. The disadvantage of course, is that there is a need to provision more storage (however, given the performance limitations of federated it is not certain it could fill this use-case either).

So please leave a comment, or get in touch. I would love to hear how you are using the Federated Engine, and what you plan to use multi-source replication for.

MySQL 5.6.14 Community Release Notes

Thank you to the MySQL Community, on behalf of the MySQL team @ Oracle. Your bug reports, testcases and patches have helped create a better MySQL 5.6.14.

In particular:

  • Thanks to Justin Swanhart for pointing out an issue with statistics on partitioned tables – bug #69179. While we marked Justin’s bug as a duplicate, Justin continued to provide commentary on an earlier bug opened, and we appreciate his feedback.
  • Justin Swanhart also noticed a change in behaviour in MySQL 5.6 with an assertion raised in InnoDB – bug #70007. We’ve since reverted to the previous behaviour.
  • Thanks to Zhai Weixiang, who noticed that InnoDB code intended to only run in debug versions of MySQL was unintentionally running in all versions of mysql. Bug #69724.
  • Thanks to both Robert Lineweaver and Ryan Kenney for reporting bugs #69693 and #69707, as well as providing excellent test cases and research into versions effected. The problems both related to InnoDB mishandling some foreign key names.
  • Thanks to Roel Van de Paar who reported an assertion when innodb_log_group_home_dir does not exist. Bug #69000
  • Thanks to Elena Stepanova for reporting bug #65146. From Elena’s contribution, we both updated the documentation and now produce a warning when START TRANSACTION WITH CONSISTENT SNAPSHOT is used in isolation levels other than REPEATABLE READ.
  • Thanks to Mark Callaghan for reporting an issue with InnoDB’s background thread and shutting down the mysql server. Bug #63276.
  • Thanks to Alexey Kopytov for detecting a situation where an infinite loop could occur in handling InnoDB’s compressed pages. Alexey’s bug report contained detailed code analysis and was very helpful. Bug #61132.
  • Thanks to Shahriyar Rzayev for reporting the original bug, and uploading detailed data to assist in reproduction. It was from these contributions we were later able to devise a testcase and reproduce the bug internally! Bug #69898
  • Thanks to Justin Swanhart and Tsubasa Tanaka who both identified a bug in MASTER_DELAY not type checking input. Bugs #69249 and #69469.
  • Thanks to Hartmut Holzgraefe for reporting an issue with Row-based-replication and decimal data type between MySQL versions. Bug #57166.
  • Thanks to Mark Callaghan for reporting a performance regression in 5.6. Bug #68825
  • Thanks to Jean Weisbuch for identifying an issue with information_schema. Bug #68076.
  • Thanks to Simon Mudd for discovering an issue where the event scheduler did not correctly report when it could not create a new thread. Bug #67191
  • Thanks to Sergey Petrunya who reported bug #67507, as well as the community of users who provided input on the bug – mysqldump is often run from cron, so changes in behaviour risk backups not being performed.
  • Thanks to a private bug reporter who demonstrated that MySQL inadequently protects against stack overflow, and provided an example patch. Bug #35019
  • Thanks to “Dude Letme BE” who reported symbols were missing from libmysql.dll. Bugs #69204, #62394.
  • Thanks to Chito Angeles for reporting a bug in InnoDB fulltext search, and reducing it to a simplified testcase. Bug #69932.

Thank you again all the names listed above. In particular, I would like to call out the two names that appear more than once: Justin Swanhart (3), Mark Callaghan (2).

- Morgan