A followup on show_compatibility_56

Giuseppe and Shlomi both blogged on one of the recent changes introduced to MySQL 5.7.8-rc, where the setting show_compatibility_56 is now set OFF by default.

Both raise very good points. Here is how we plan to address them:

  1. The permissions issue reported by Giuseppe will be fixed.
  2. When selecting from information_schema tables in show_compatibility_56=OFF mode, an error will now be produced:

    mysql> select * from information_schema.global_variables;
    ERROR 3167 (HY000): The 'INFORMATION_SCHEMA.GLOBAL_VARIABLES' feature is disabled; see the documentation for 'show_compatibility_56'

    (Previously this was a warning + empty set returned)
  3. When show_compatibility_56=ON, users will now be able to select from either information_schema or performance_schema tables. This presents a more viable upgrade path for users that need some period to transition.

The show_compatibility_56 setting itself will remain deprecated, as it was from its introduction. This signifies that we intend to remove INFORMATION_SCHEMA.GLOBAL_VARIABLES in a future release.

Outside of the scope of today’s update, my colleague Mark Leith has also volunteered to blog with examples of how the new performance_schema tables now expose variables down to the thread-level. This will add more context as to why we are moving this meta data from information_schema to performance_schema. Thanks Mark!

So thank you again to Giuseppe and Shlomi for helping make a better MySQL. We’re delighted to incorporate your feedback!

MySQL 5.7.8 – mysqlpump caveat

MySQL 5.7.8-rc2 was released today, and features a new server utility called mysqlpump. This utility contains a number of major improvements over mysqldump including:

  • Parallel processing of databases, and of objects within databases, to speed up the dump process
  • For dump file reloading, faster secondary index creation for InnoDB tables by adding indexes after rows are inserted
  • Better control over which databases and database objects (tables, views, stored programs, user accounts) to dump
  • Dumping of user accounts as account-management statements (CREATE USER, GRANT) rather than as inserts into the mysql system database
  • Capability of creating compressed output
  • Progress indicator

What I wanted to caution however, is that mysqlpump is not currently consistent. That is to say that currently each of the dump threads lack a synchronization point before they start backing up the data. This makes it currently unsafe as a general purpose backup replacement.

The mysqlpump developers are aware of this limitation, and are busy adding this feature into the utility.

Edit: This functionality was added to MySQL 5.7.11.

MySQL 5.7.8 – Now featuring super_read_only and disabled_storage_engines

I wanted to highlight two new features that are making their way into MySQL 5.7 via the not-yet-released 5.7.8-rc2:

  • A new system variable super_read_only allows a more strict definition of ‘read-only’ which also applies to super users.
  • A new disabled_storage_engines setting offers a way to prevent an enumerated list of storage engines from being used. For example, a DBA may wish to enforce an InnoDB-only policy to simplify common operations such as backups, but it’s possible MyISAM may sneak back in via new code-deployments. This setting allows more active enforcement.

These features are the fruits of our engineering team meeting with our users at Percona Live this year. Thank you to Percona for once again hosting a great conference, and in particular thank you to @isamlambert (and the GitHub Engineering team), @John_Cesario, @denshikarasu & Rob Wultsch for specifically requesting these two features 🙂

Proposal to deprecate INSERT and REPLACE alternative syntax

In the MySQL team we are currently considering a proposal to deprecate a number of alternative syntax uses with the INSERT and REPLACE commands. To provide examples:

CREATE TABLE `city` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `Name` char(35) NOT NULL DEFAULT '',
  `CountryCode` char(3) NOT NULL DEFAULT '',
  `District` char(20) NOT NULL DEFAULT '',
  `Population` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`ID`),
  KEY `CountryCode` (`CountryCode`),
  CONSTRAINT `city_ibfk_1` FOREIGN KEY (`CountryCode`) REFERENCES `Country` (`Code`)
) ENGINE=InnoDB AUTO_INCREMENT=4080 DEFAULT CHARSET=latin1;
INSERT INTO city SET
 Name='NewCity', CountryCode='CAN',District='MyDistrict',Population=1234;
INSERT INTO city (Name,CountryCode,District,Population) VALUE
 ('NewCity2', 'CAN', 'MyDistrict', 1234);
INSERT city (Name,CountryCode,District,Population) VALUES
 ('NewCity3', 'CAN', 'MyDistrict', 1234);
REPLACE INTO city (Name,CountryCode,District,Population) VALUE
 ('NewCity4', 'CAN', 'MyDistrict', 1234);
REPLACE city (Name,CountryCode,District,Population) VALUES
 ('NewCity5', 'CAN', 'MyDistrict', 1234);

To summarize these queries:

  • INSERT using the SET syntax.
  • INSERT and REPLACE using the keyword VALUE instead of VALUES
  • INSERT and REPLACE without the keyword INTO

Our rationale for this proposal is as follows:

  • Having a number of very similar ways of completing the same task makes it very difficult for training, documentation and support. To explain this in more detail: in our manual we have always tried to document every option that the server will accept, but with no functional difference between the options, this makes the content more verbose and clumsy to read.

    MySQL usage becomes cleaner by stating which usage is explicitly preferred, even if the old syntax remains supported for the legacy use-case.

  • The syntax is non-standard and complicates our parser. While it may take some time before we are able to remove these options, by starting the deprecation cycle now we can provide application authors with as much notice as possible.

Our proposed plan is to deprecate the syntax starting with MySQL 5.7. We will assess feedback from our users before targeting a version for syntax removal.

Will you be affected by this change? Please leave a comment, or Get in touch! We’d love to hear from you.

Plan to deprecate PROCEDURE ANALYSE

In the MySQL team, we have been refactoring the SQL parser to be more maintainable. Gleb Shchepa lists the goals of this project in more details on the MySQL Server Team blog.

As part of this, we have identified the feature PROCEDURE ANALYSE as something that we would like to deprecate. For added context, here is a demonstration:

mysql> SELECT * FROM City procedure analyse()\G
*************************** 1. row ***************************
             Field_name: world.city.ID
              Min_value: 1
              Max_value: 4079
             Min_length: 1
             Max_length: 4
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 2040.0000
                    Std: 1177.5058
      Optimal_fieldtype: SMALLINT(4) UNSIGNED NOT NULL
*************************** 2. row ***************************
             Field_name: world.city.Name
              Min_value: A Coruña (La Coruña)
              Max_value: ´s-Hertogenbosch
             Min_length: 3
             Max_length: 33
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 8.5295
                    Std: NULL
      Optimal_fieldtype: VARCHAR(33) NOT NULL
*************************** 3. row ***************************
             Field_name: world.city.CountryCode
              Min_value: ABW
              Max_value: ZWE
             Min_length: 3
             Max_length: 3
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 3.0000
                    Std: NULL
      Optimal_fieldtype: ENUM('ABW','AFG','AGO','AIA','ALB','AND','ANT','ARE','ARG','ARM','ASM','ATG','AUS','AUT','AZE','BDI','BEL','BEN','BFA','BGD','BGR','BHR','BHS','BIH','BLR','BLZ','BMU','BOL','BRA','BRB','BRN','BTN','BWA','CAF','CAN','CCK','CHE','CHL','CHN','CIV','CMR','COD','COG','COK','COL','COM','CPV','CRI','CUB','CXR','CYM','CYP','CZE','DEU','DJI','DMA','DNK','DOM','DZA','ECU','EGY','ERI','ESH','ESP','EST','ETH','FIN','FJI','FLK','FRA','FRO','FSM','GAB','GBR','GEO','GHA','GIB','GIN','GLP','GMB','GNB','GNQ','GRC','GRD','GRL','GTM','GUF','GUM','GUY','HKG','HND','HRV','HTI','HUN','IDN','IND','IRL','IRN','IRQ','ISL','ISR','ITA','JAM','JOR','JPN','KAZ','KEN','KGZ','KHM','KIR','KNA','KOR','KWT','LAO','LBN','LBR','LBY','LCA','LIE','LKA','LSO','LTU','LUX','LVA','MAC','MAR','MCO','MDA','MDG','MDV','MEX','MHL','MKD','MLI','MLT','MMR','MNG','MNP','MOZ','MRT','MSR','MTQ','MUS','MWI','MYS','MYT','NAM','NCL','NER','NFK','NGA','NIC','NIU','NLD','NOR','NPL','NRU','NZL','OMN','PAK','PAN','PCN','PER','PHL','PLW','PNG','POL','PRI','PRK','PRT','PRY','PSE','PYF','QAT','REU','ROM','RUS','RWA','SAU','SDN','SEN','SGP','SHN','SJM','SLB','SLE','SLV','SMR','SOM','SPM','STP','SUR','SVK','SVN','SWE','SWZ','SYC','SYR','TCA','TCD','TGO','THA','TJK','TKL','TKM','TMP','TON','TTO','TUN','TUR','TUV','TWN','TZA','UGA','UKR','URY','USA','UZB','VAT','VCT','VEN','VGB','VIR','VNM','VUT','WLF','WSM','YEM','YUG','ZAF','ZMB','ZWE') NOT NULL
*************************** 4. row ***************************
             Field_name: world.city.District
              Min_value: Abhasia [Aphazeti]
              Max_value: –
             Min_length: 1
             Max_length: 20
       Empties_or_zeros: 4
                  Nulls: 0
Avg_value_or_avg_length: 9.0194
                    Std: NULL
      Optimal_fieldtype: VARCHAR(20) NOT NULL
*************************** 5. row ***************************
             Field_name: world.city.Population
              Min_value: 42
              Max_value: 10500000
             Min_length: 2
             Max_length: 8
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 350468.2236
                    Std: 723686.9870
      Optimal_fieldtype: MEDIUMINT(8) UNSIGNED NOT NULL
5 rows in set (0.01 sec)

ANALYSE() examines the result from a query and returns an analysis of the results that suggests optimal data types for each column that may help reduce table sizes.

Our justification for wanting to deprecate PROCEDURE ANALYSE is as follows:

  • There are no other uses of SELECT * FROM table PROCEDURE. This syntax is used exclusively by ANALYSE, and uses the UK English spelling.
  • The name PROCEDURE predates the addition of stored procedures as a MySQL feature. Ideally this feature would use a different name (CHANNEL?) to avoid confusion in usage. It also exists as an extension to the SQL standard.
  • There are numerous advantages to a feature similar to this being external to the MySQL server. The server must follow a stable release cycle, with core functionality being unchanged once it is declared GA. As an external tool, it is much easier to develop in an agile way, and provide new functionality without having to provide the same level of backward compatibility.

    By “external” I am implying that this could either be a script or as a view or stored procedure in MySQL. Shlomi has a good example of how to show auto_increment column capacity in common_schema!

Our current plan is to deprecate PROCEDURE ANALYSE in MySQL 5.7, for removal as soon as MySQL 5.8. We are inviting feedback from the MySQL Community and would like to hear from you if you use PROCEDURE ANALYSE. Please leave a comment, or get in touch!

Optimizer Trace and EXPLAIN FORMAT=JSON in 5.7

I accidentally stumbled upon this Stack Overflow question this morning:

I am wondering if there is any difference in regards to performance between the following:

SELECT ... FROM ... WHERE someFIELD IN(1,2,3,4);
SELECT ... FROM ... WHERE someFIELD between  0 AND 5;
SELECT ... FROM ... WHERE someFIELD = 1 OR someFIELD = 2 OR someFIELD = 3 ...;

It is an interesting question because there was no good way to answer it when it was asked in 2009. All of the queries resolve to the same output in EXPLAIN. Here is an example using the sakila schema:

mysql> EXPLAIN SELECT * FROM film WHERE film_id BETWEEN 1 AND 5\G
mysql> EXPLAIN SELECT * FROM film WHERE film_id IN (1,2,3,4,5)\G
mysql> EXPLAIN SELECT * FROM film WHERE film_id =1 or film_id=2 or film_id=3 or film_id=4 or film_id=5\G
********* 1. row *********
           id: 1
  select_type: SIMPLE
        table: film
   partitions: NULL
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 2
          ref: NULL
         rows: 5
     filtered: 100.00
        Extra: Using where

Times have changed though. There are now a couple of useful features to show the difference 🙂

Optimizer Trace

Optimizer trace is a new diagnostic tool introduced in MySQL 5.6 to show how the optimizer is working internally. It is similar to EXPLAIN, with a few notable differences:

  • It doesn’t just show the intended execution plan, it shows the alternative choices.
  • You enable the optimizer trace, then you run the actual query.
  • It is far more verbose in its output.

Here are the outputs for the three versions of the query:

  1. SELECT * FROM film WHERE film_id BETWEEN 1 AND 5
  2. SELECT * FROM film WHERE film_id IN (1,2,3,4,5)
  3. SELECT * FROM film WHERE film_id =1 or film_id=2 or film_id=3 or film_id=4 or film_id=5

What is the difference?

The optimizer trace output shows that the first query executes as one range, while the second and third execute as 5 separate single-value ranges:

                  "chosen_range_access_summary": {
                    "range_access_plan": {
                      "type": "range_scan",
                      "index": "PRIMARY",
                      "rows": 5,
                      "ranges": [
                        "1 <= film_id <= 1",
                        "2 <= film_id <= 2",
                        "3 <= film_id <= 3",
                        "4 <= film_id <= 4",
                        "5 <= film_id <= 5"
                      ]
                    },
                    "rows_for_plan": 5,
                    "cost_for_plan": 6.0168,
                    "chosen": true
                  }

This can also be confirmed with the handler counts from SHOW STATUS:

BETWEEN 1 AND 5:
 Handler_read_key: 1
 Handler_read_next: 5
IN (1,2,3,4,5):
 Handler_read_key: 5
film_id =1 or film_id=2 or film_id=3 or film_id=4 or film_id=5:
 Handler_read_key: 5

So I would say that BETWEEN 1 AND 5 is the cheapest query, because it finds one key and then says next, next, next until finished. The optimizer seems to agree with me. A single range access plus next five times costs 2.0168 instead of 6.0168:

                  "chosen_range_access_summary": {
                    "range_access_plan": {
                      "type": "range_scan",
                      "index": "PRIMARY",
                      "rows": 5,
                      "ranges": [
                        "1 <= film_id <= 5"
                      ]
                    },
                    "rows_for_plan": 5,
                    "cost_for_plan": 2.0168,
                    "chosen": true
                  }
                }

For context, a cost unit is a logical representation of approximately one random IO. It is stable to compare costs between different execution plans.

Ranges are not all equal

Perhaps a better example to demonstrate this, is the difference between these two ranges:

  • SELECT * FROM film WHERE film_id BETWEEN 1 and 20
  • SELECT * FROM film WHERE (film_id BETWEEN 1 and 10) or (film_id BETWEEN 911 and 920)

It's pretty obvious that the second one needs to execute in two separate ranges. EXPLAIN will not show this difference, and both queries appear the same:

********* 1. row *********
           id: 1
  select_type: SIMPLE
        table: film
   partitions: NULL
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 2
          ref: NULL
         rows: 20
     filtered: 100.00
        Extra: Using where

Two distinct separate ranges may be two separate pages, and thus have different cache efficiency on the buffer pool. It should be possible to distinguish between the two.

EXPLAIN FORMAT=JSON

EXPLAIN FORMAT=JSON was introduced in MySQL 5.6 along with OPTIMIZER TRACE, but where it really becomes useful is MySQL 5.7. The JSON output will now include cost information (as well as showing separate ranges as attached_condition):

********* 1. row *********
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "10.04"
    },
    "table": {
      "table_name": "film",
      "access_type": "range",
      "possible_keys": [
        "PRIMARY"
      ],
      "key": "PRIMARY",
      "used_key_parts": [
        "film_id"
      ],
      "key_length": "2",
      "rows_examined_per_scan": 20,
      "rows_produced_per_join": 20,
      "filtered": "100.00",
      "cost_info": {
        "read_cost": "6.04",
        "eval_cost": "4.00",
        "prefix_cost": "10.04",
        "data_read_per_join": "15K"
      },
      "used_columns": [
        "film_id",
        "title",
        "description",
        "release_year",
        "language_id",
        "original_language_id",
        "rental_duration",
        "rental_rate",
        "length",
        "replacement_cost",
        "rating",
        "special_features",
        "last_update"
      ],
      "attached_condition": "((`film`.`film_id` between 1 and 10) or (`film`.`film_id` between 911 and 920))"
    }
  }
}

With the FORMAT=JSON output also showing cost, we can see that two ranges costs 10.04, versus one big range costing 9.04 (not shown). These queries are not identical in cost even though they are in EXPLAIN output.

Conclusion

I have heard many users say "joins are slow", but a broad statement like this misses magnitude. By including the cost information in EXPLAIN we get all users to speak the same language. We can now say "this join is expensive", which is a much better distinction 🙂

It is time to start using OPTIMIZER TRACE, and particularly in 5.7 ditch EXPLAIN for EXPLAIN FORMAT=JSON.

A quick update on our native Data Dictionary

In July 2014, I wrote that we were working on a new native InnoDB data dictionary to replace MySQL’s legacy frm files.

This is quite possibly the largest internals change to MySQL in modern history, and will unlock a number of previous limitations, as well as simplify a number of failure states for both replication and crash recovery.

With MySQL 5.7 approaching release candidate (and large changes always coming with risk attached) we decided that the timing to try to merge in a new data dictionary was just too tight. The data dictionary development is still alive and well, but it will not ship as part of MySQL 5.7.

So please stay tuned for updates… and thank you for using MySQL!

Proposal to change additional defaults in MySQL 5.7

Following on from my earlier proposal to change Replication + InnoDB settings, in the MySQL team, we are proposing to make the following additional changes to defaults in MySQL 5.7:

Setting Old Default New Default
binlog_error_action IGNORE_ERROR ABORT_SERVER
innodb_checksum_algorithm INNODB CRC32
innodb_page_cleaners 1 4
innodb_purge_threads 1 4
innodb_strict_mode OFF ON
innodb_log_file_size 48M 128M
innodb_buffer_pool_dump_at_shutdown OFF ON
innodb_buffer_pool_load_at_startup OFF ON
innodb_buffer_pool_dump_pct 100 25
innodb_file_format Antelope Barracuda
innodb_large_prefix OFF ON
binlog_gtid_recovery_simplified OFF ON
sync_binlog 0 1
slave_net_timeout 3600 60
P_S config Enable events_statements_history and events_transactions_history consumers by default
MySQL CLI Prompt mysql> prompt=”\u@\h [\d] > “
sql-mode +NO_AUTO_CREATE_USER

To explain the motivation behind these changes:

  • The binlog_error_action variable was introduced in MySQL 5.6 after the GA release, and for compatibility defaulted to previous behavior (ignore errors). Since some applications depend on reading from slaves after writing to masters, we feel that the better behavior is to abort.
  • The InnoDB CRC32 checksum was introduced in MySQL 5.6, and offers CPU accelerated page checksum computation. Since previous versions of MySQL were unable to read the new checksum, enabling this feature would have restricted downgrades to dump-and-restore. With MySQL 5.6 supporting the CRC32 checksum, the timing now makes sense to make this the default in 5.7.
  • In some workloads the InnoDB purge and page cleaning operations will not be able to keep up with modifications made inside InnoDB. In combination with newer storage devices that offer very high throughput and accommodate parallel writes, we feel that having multiple threads is a better default.
  • Currently the InnoDB storage engine ignores CREATE TABLE options that it does not understand (innodb_strict_mode=0). We feel this negatively impacts user experience, and are proposing to enable InnoDB strict mode in advance. The downside of this change is that some DDL statements that worked in 5.6 failing by default in 5.7. Thank you to Daniël van Eeden for providing this suggestion.
  • Since the InnoDB buffer pool now supports online resizing (and the innodb log files do not), we believe it makes sense to increase the size of the log files slightly in anticipation of an increased buffer pool size. This comes at a small cost of additional disk space usage. We felt that 128M was a good balance in size, since while larger log files are better for performance, there is also a trade off point where the log files benefit from filesystem caches. Thank you to Justin Swanhart for providing this suggestion.
  • With the increased availability of large memory, waiting for caches to warm back up after restarts has been an increased operational concern. MySQL 5.6 introduced a feature to be able to dump buffer pool page addresses (at a small fraction of total size) to assist in warming caches after restarts. MySQL 5.7 introduces an additional feature to dump only a percentage of the buffer pool, and we felt that by limiting this to 25% was a good tradeoff to enable this by default. Thank you to Daniel Black and Daniël van Eeden for this suggestion.
  • The InnoDB Barracuda file format was first introduced in the InnoDB plugin for MySQL 5.1, and enables InnoDB to use the new DYNAMIC row format (with improved BLOB handling) and COMPRESSED tables. Setting the InnoDB file format to Barracuda depended on innodb-file-per-table=1 (default changed in MySQL 5.6), and introduced an incompatibility that would prevent downgrades to earlier releases. With both MySQL 5.5 and 5.6 supporting Barracuda, the timing now makes sense to enable this by default. Thank you to Daniël van Eeden for providing this suggestion.
  • The innodb_large_prefix prefix option was introduced for Barracuda tables as an option to support much longer index key prefixes (up to 3072 bytes). It was disabled by default so that users did not accidentally create Barracuda tables that would be incompatible on Antelope systems (which would not allow downgrading). With Barracuda becoming the default, and our friends at WordPress and Drupal wanting to create much larger indexes to support utf8mb4 encoded columns, we decided to change this to default to on.
  • The binlog_gtid_recovery_simplified option was released after MySQL 5.6 was released as GA, and defaulted to OFF for backwards compatibility. We feel that ON is the better default, and plan to make this change for MySQL 5.7.
  • With the addition of binary log group commit in MySQL 5.6, the impact from enabling sync_binlog=1 on many non-synthetic workloads should be reduced considerably. This change makes MySQL binary logs durable/crash safe by default.
  • The slave_net_timeout defaults to one hour, which is too long for most modern systems. In addition: MySQL 5.7 now uses this value divided by two to configure the default master heartbeat period. We believe that a one minute default is more appropriate. Thank you to Miguel Angel Nieto for providing this suggestion.
  • In MySQL 5.7, performance_schema can now instrument transactions, which can be incredibly useful when enabled by default. In addition, enabling statements history (introduced in 5.6) provides a lot of useful meta-data for diagnosing performance problems. Thank you to Daniël van Eeden for providing this suggestion.
  • The MySQL command line client has long supported the ability to change the default prompt from “mysql >” to include additional details such as the username and database selected. This is mostly a cosmetic change, but it is useful when you are managing multiple servers. Thank you to Daniël van Eeden for providing this suggestion.
  • In MySQL 5.7 we have already made changes to the default SQL mode: enabling ONLY_FULL_GROUP_BY and STRICT_TRANS_TABLES by default. There is a long standing usability issue with user grants, where a typo in the username could result in a new user being created. We have decided that it is best to disable this behavior by enabling the NO_AUTO_CREATE_USER sql mode by default. Thank you to Daniël van Eeden for providing this suggestion.

In addition to these changes, we have decided to withdraw our proposal to change the default isolation level to READ-COMMITTED. We will re-evaluate this decision for future releases.

For those wishing to test out these new configuration settings, I have configuration file available to try out:

Please let us know what you think of these changes!
You can leave a comment here, or get in touch via email.

An easy way to describe MySQL's Binary Log Group Commit

It struck me today; there is an easy way to describe MySQL’s Binary Log group commit improvements from MySQL 5.0-5.7 by using the example of a single ferry trying to ship passengers from point A to point B:

Ferry from Point A to point B

MySQL 5.0 Behaviour

In MySQL 5.0, the ferry will pick up the next passenger in line from point A, and transfer them to point B. The trip between A and B takes about 10 minutes return trip, so it’s possible that several new passengers will arrive while the ferry is in transit. That doesn’t matter; when the ferry arrives back at point A, it will only pick up the very next passenger in line.

MySQL 5.6 Behaviour

In MySQL 5.6, the ferry will pick up all passengers from the line at point A, and then transfer them to point B. Each time it returns to point A to pick up new passengers, it will collect everyone who is waiting and transfer them across to point B.

This is measurably better performance in real-life situations where many passengers tend to arrive while waiting for the ferry to arrive back at point A, and the trip between A and B tends to take some time. It is not so measurable in naive benchmarks that run in a single-thread.

There is no configuration necessary to enable group commit in 5.6. It works by default.

MySQL 5.7 Behaviour

MySQL 5.7 behaves similarly to 5.6 in that it will pick up all waiting passengers from point A and transfer them to point B, but with one notable enhancement!

When the ferry arrives back at point A to pick up waiting passengers, it can be configured to wait just a little bit longer with the knowledge that new passengers will likely arrive. For example: if you know the trip between point A and point B is 10 minutes in duration, why not wait an extra 30 seconds at point A before departing? This may save you on roundtrips and improve the overall number of passengers that can be transported.

The configuration variables for artificial delay are binlog-group-commit-sync-delay (delay in microseconds) and binlog-group-commit-sync-no-delay-count (number of transactions to wait for before deciding to abort waiting).

Conclusion

In this example passengers are obviously transactions, and the ferry is an expensive fsync operation. It’s important to note that there is just one ferry in operation (a single set of ordered binary logs), so being able to tune this in 5.7 provides a nice level of advanced configuration.

Experimenting with the new Data Dictionary Labs Release

Labs releases are intended to provide early access to new features. As Gopal notes in his recent blog post About the Data Dictionary Labs Release, there is a notable restriction where upgrading from any previous MySQL database version is not supported.

Today, I thought I would demonstrate how to get the data dictionary lab up and running on a fresh Ubuntu 14.04 installation:

# Download from labs.mysql.com
wget http://downloads.mysql.com/snapshots/pb/mysql-5.7.5-labs-dd/mysql-5.7.5-labs-dd-linux-el6-x86_64.tar.gz
# extract it to /usr/local/mysql
# more or less following instructions in INSTALL-BINARY
groupadd mysql
useradd -r -g mysql mysql
tar -xzf mysql-5.7.5-labs-dd-linux-el6-x86_64.tar.gz
mv mysql-5.7.5-labs-dd-linux-el6-x86_64 /usr/local/
ln -s /usr/local/mysql-5.7.5-labs-dd-linux-el6-x86_64 /usr/local/mysql
cd /usr/local/mysql
chown -R mysql .
chgrp -R mysql .

At this point, the next step in the instructions actually failed for me:

bin/mysql_install_db --user=mysql
2014-10-15 09:22:34 [ERROR]   The data directory needs to be specified.

This behavior change appears to be the result of mysql_install_db being rewritten in C++. Specifying the datadir addresses this, but also results in another error:

bin/mysql_install_db --user=mysql --datadir=/usr/local/mysql/data
2014-10-15 09:24:15 [ERROR]   The child process terminated prematurely. Errno= 32
2014-10-15 09:24:15 [ERROR]   Failed to execute /usr/local/mysql-5.7.5-labs-dd-linux-el6-x86_64/bin/mysqld --no-defaults --install-server --datadir=/usr/local/mysql/data --lc-messages-dir=/usr/local/mysql/share --lc-messages=en_US
-- server log begin --

Error 32 is a broken pipe. From the description, we can see that mysql_install_db tried to call a subprocess (mysqld –install-server), but it exited prematurely. The description here doesn’t show why it exited prematurely, but we can execute this command directly to find out:

/usr/local/mysql-5.7.5-labs-dd-linux-el6-x86_64/bin/mysqld --no-defaults --install-server --datadir=/usr/local/mysql/data --lc-messages-dir=/usr/local/mysql/share --lc-messages=en_US
/usr/local/mysql-5.7.5-labs-dd-linux-el6-x86_64/bin/mysqld: error while loading shared libraries: libaio.so.1: cannot open shared object file: No such file or directory

OK! We don’t have the async IO dependency installed on this system, which is used by InnoDB:

apt-get install libaio1
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
  linux-headers-3.13.0-36 linux-headers-3.13.0-36-generic
  linux-image-3.13.0-36-generic linux-image-extra-3.13.0-36-generic
Use 'apt-get autoremove' to remove them.
The following NEW packages will be installed:
  libaio1
0 upgraded, 1 newly installed, 0 to remove and 3 not upgraded.
Need to get 6,364 B of archives.
After this operation, 53.2 kB of additional disk space will be used.
Get:1 http://mirrors.digitalocean.com/ubuntu/ trusty/main libaio1 amd64 0.3.109-4 [6,364 B]
Fetched 6,364 B in 0s (319 kB/s)
Selecting previously unselected package libaio1:amd64.
(Reading database ... 116531 files and directories currently installed.)
Preparing to unpack .../libaio1_0.3.109-4_amd64.deb ...
Unpacking libaio1:amd64 (0.3.109-4) ...
Setting up libaio1:amd64 (0.3.109-4) ...
Processing triggers for libc-bin (2.19-0ubuntu6.3) ...

Now to try the original install command again:

bin/mysql_install_db --user=mysql --datadir=/usr/local/mysql/data

It should return no errors now. To follow the remaining steps in the INSTALL-BINARY instructions:


chown -R root .
chown -R mysql data
bin/mysqld_safe --user=mysql &

In MySQL 5.7, installation is secure by default. This means that a root password is automatically generated, and stored in ~/.mysql_secret:

cat ~/.mysql_secret
# Password set for user 'root@localhost' at 2014-10-15 09:32:09
Wpckgmde+U,o

It is possible to change this password with the MySQL command line client:

./bin/mysql -uroot -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.7.5-labs-dd
Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> SET PASSWORD=PASSWORD('acdc');
Query OK, 0 rows affected (0.00 sec)

As Gopal mentioned in his blog post, the labs release runs completely without frm files:

ls /usr/local/mysql/data/mysql
catalogs.ibd                  help_relation.ibd       slave_relay_log_info.ibd
character_sets.ibd            help_topic.ibd          slave_worker_info.ibd
collations.ibd                index_column_usage.ibd  slow_log.CSM
columns.ibd                   indexes.ibd             slow_log.CSV
columns_priv.MYD              index_partitions.ibd    table_partitions.ibd
columns_priv.MYI              index_stats.ibd         table_partition_values.ibd
column_type_elements.ibd      innodb_index_stats.ibd  tables.ibd
db.MYD                        innodb_table_stats.ibd  tablespace_files.ibd
db.MYI                        ndb_binlog_index.MYD    tablespaces.ibd
db.opt                        ndb_binlog_index.MYI    tables_priv.MYD
engine_cost.ibd               parameters.ibd          tables_priv.MYI
event.MYD                     plugin.ibd              table_stats.ibd
event.MYI                     proc.MYD                time_zone.ibd
events.ibd                    proc.MYI                time_zone_leap_second.ibd
foreign_key_column_usage.ibd  procs_priv.MYD          time_zone_name.ibd
foreign_keys.ibd              procs_priv.MYI          time_zone_transition.ibd
func.MYD                      proxies_priv.MYD        time_zone_transition_type.ibd
func.MYI                      proxies_priv.MYI        triggers.ibd
general_log.CSM               routines.ibd            user.MYD
general_log.CSV               schemata.ibd            user.MYI
gtid_executed.ibd             server_cost.ibd         view_table_usage.ibd
help_category.ibd             servers.ibd
help_keyword.ibd              slave_master_info.ibd

information_schema is a now also a set of views on top of real InnoDB tables (stored in the mysql schema). As a VIEW we can explain it, and since extended explain is always enabled in 5.7, SHOW WARNINGS will show the rewritten form:

mysql> SELECT * FROM information_schema.tables WHERE table_name = 'user'\G
*************************** 1. row ***************************
  TABLE_CATALOG: def
   TABLE_SCHEMA: mysql
     TABLE_NAME: user
     TABLE_TYPE: BASE TABLE
         ENGINE: MyISAM
        VERSION: 10
     ROW_FORMAT: Dynamic
     TABLE_ROWS: 1
 AVG_ROW_LENGTH: NULL
    DATA_LENGTH: NULL
MAX_DATA_LENGTH: NULL
   INDEX_LENGTH: NULL
      DATA_FREE: NULL
 AUTO_INCREMENT: 1
    CREATE_TIME: 2014-10-15 09:32:10
    UPDATE_TIME: 2014-10-15 09:32:10
     CHECK_TIME:
TABLE_COLLATION: utf8_bin
       CHECKSUM: NULL
 CREATE_OPTIONS: stats_persistent=1
  TABLE_COMMENT: Users and global privileges
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT * FROM information_schema.tables WHERE table_name = 'user'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: cat
   partitions: NULL
         type: index
possible_keys: PRIMARY
          key: name
      key_len: 194
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using index; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: sch
   partitions: NULL
         type: ref
possible_keys: PRIMARY,catalog_id
          key: catalog_id
      key_len: 8
          ref: mysql.cat.id
         rows: 3
     filtered: 100.00
        Extra: Using index
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: tbl
   partitions: NULL
         type: eq_ref
possible_keys: schema_id
          key: schema_id
      key_len: 202
          ref: mysql.sch.id,const
         rows: 1
     filtered: 100.00
        Extra: Using index condition
*************************** 4. row ***************************
           id: 1
  select_type: SIMPLE
        table: col
   partitions: NULL
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 8
          ref: mysql.tbl.collation_id
         rows: 1
     filtered: 100.00
        Extra: NULL
*************************** 5. row ***************************
           id: 1
  select_type: SIMPLE
        table: stat
   partitions: NULL
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 388
          ref: mysql.sch.name,const
         rows: 1
     filtered: 100.00
        Extra: NULL
5 rows in set, 1 warning (0.00 sec)
mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: /* select#1 */ select `cat`.`name` AS `TABLE_CATALOG`,`sch`.`name` AS `TABLE_SCHEMA`,`tbl`.`name` AS `TABLE_NAME`,`tbl`.`type` AS `TABLE_TYPE`,`tbl`.`engine` AS `ENGINE`,`tbl`.`version` AS `VERSION`,`tbl`.`row_format` AS `ROW_FORMAT`,ifnull(`stat`.`table_rows`,1) AS `TABLE_ROWS`,`stat`.`avg_row_length` AS `AVG_ROW_LENGTH`,`stat`.`data_length` AS `DATA_LENGTH`,`stat`.`max_data_length` AS `MAX_DATA_LENGTH`,`stat`.`index_length` AS `INDEX_LENGTH`,`stat`.`data_free` AS `DATA_FREE`,ifnull(`stat`.`auto_increment`,1) AS `AUTO_INCREMENT`,`tbl`.`created` AS `CREATE_TIME`,`tbl`.`last_altered` AS `UPDATE_TIME`,'' AS `CHECK_TIME`,`col`.`name` AS `TABLE_COLLATION`,`stat`.`checksum` AS `CHECKSUM`,get_dd_create_options(`tbl`.`options`) AS `CREATE_OPTIONS`,`tbl`.`comment` AS `TABLE_COMMENT` from `mysql`.`tables` `tbl` join `mysql`.`schemata` `sch` join `mysql`.`catalogs` `cat` left join `mysql`.`collations` `col` on((`col`.`id` = `tbl`.`collation_id`)) left join `mysql`.`table_stats` `stat` on(((`stat`.`schema_name` = `sch`.`name`) and (`stat`.`table_name` = 'user'))) where ((`tbl`.`schema_id` = `sch`.`id`) and (`sch`.`catalog_id` = `cat`.`id`) and (`tbl`.`name` = 'user') and can_access_table(`sch`.`name`,`tbl`.`name`)) order by `sch`.`name`,`tbl`.`name`
1 row in set (0.00 sec)

That’s it for today. Enjoy testing the data dictionary, and please send in your feedback!