A followup on show_compatibility_56

Giuseppe and Shlomi both blogged on one of the recent changes introduced to MySQL 5.7.8-rc, where the setting show_compatibility_56 is now set OFF by default.

Both raise very good points. Here is how we plan to address them:

  1. The permissions issue reported by Giuseppe will be fixed.
  2. When selecting from information_schema tables in show_compatibility_56=OFF mode, an error will now be produced:

    mysql> select * from information_schema.global_variables;
    ERROR 3167 (HY000): The 'INFORMATION_SCHEMA.GLOBAL_VARIABLES' feature is disabled; see the documentation for 'show_compatibility_56'

    (Previously this was a warning + empty set returned)
  3. When show_compatibility_56=ON, users will now be able to select from either information_schema or performance_schema tables. This presents a more viable upgrade path for users that need some period to transition.

The show_compatibility_56 setting itself will remain deprecated, as it was from its introduction. This signifies that we intend to remove INFORMATION_SCHEMA.GLOBAL_VARIABLES in a future release.

Outside of the scope of today's update, my colleague Mark Leith has also volunteered to blog with examples of how the new performance_schema tables now expose variables down to the thread-level. This will add more context as to why we are moving this meta data from information_schema to performance_schema. Thanks Mark!

So thank you again to Giuseppe and Shlomi for helping make a better MySQL. We're delighted to incorporate your feedback!

Proposal to extend binary operators in MySQL

In order to make it easier to work with data stored as binary strings (BINARY/VARBINARY) we are considering extending the &,|,^,~,<<,>> operators, to accept any-size binary strings and return binary string data as response. This can be useful for several complex data types not covered by the basic SQL data types (e.g. working with IPV6 addresses, manipulating UUID, etc).

Motivation

Let's say we're interested in getting all the networks that contain the given IP address. With ipv4 the common practice is to store the IP addresses as INT and execute:

SELECT inet_ntoa(network) AS network, inet_ntoa(netmask) AS netmask FROM network_table WHERE (inet_aton('192.168.0.30') & netmask) = network;

At the moment you are not able to do the same with ipv6 because inet6_aton('2001:0db8:85a3:0000:0000:8a2e:0370:7334') & netmask converts both operands from VARBINARY to BIGINT resulting in data truncation and when the & operation gets executed, the result is incorrect. But, if the & operator could have worked directly on BINARY/VARBINARY data, this would have been possible:

SELECT inet6_ntoa(network) AS network, inet6_ntoa(netmask) AS netmask FROM network_table WHERE (inet6_aton('2001:0db8:85a3:0000:0000:8a2e:0370:7334') & netmask) = network;

The SQL standard does not define bitwise operations over any-size binary string data but it does define binary string comparison:

All binary string values are comparable. When binary large object string values are compared, they shall have exactly the same length (in octets) to be considered equal. Binary large object string values can be compared only for equality. For binary string values other than binary large object string values, it is implementation-defined whether trailing X'00's are considered significant when comparing two binary string values that are otherwise equivalent.

Thus, the standard allows binary strings to be zero-padded in the least significant part for comparisons (right-side). If you're interpreting binary data as a hexadecimal unsigned integer, you would expect the operand with smaller size to be zero-padded to the left side. So an easy approach to avoid confusion would be to allow the operators(^,&,|) to only work with same size operands, thus avoiding any confusion over whether padding occurs in most or least significant part.

Another aspect to be mentioned is that the old behavior with INT arguments would be preserved. Example SELECT 23 & 4 would still return a numeric BIGINT response: 4.

MySQL has two ways of representing hexadecimal string literals: x'val' and 0xval (where val contains hexadecimal digits 0-9, a-f, A-F). The difference between the two is that the first is SQL standard and has a constraint: the number of hexadecimal digits must be even, the second version is not standard and does not require an even number of characters (it will be zero-padded on the left side in case of even number of characters). But there is one issue, in numeric contexts, hexadecimal values act like integers (64-bit precision) and in string contexts, they act like binary strings. So currently when executing SELECT x'10000001' & x'00000001' the operands get converted from VARBINARY to BIGINT(int64), with loss of any parts beyond 64 bits, and this returns BIGINT; with our change, this would return BINARY(4), breaking existing applications. That's something that can be solved with a new sql_mode named BINARY_BIT_OPS_YIELD_BINARY, off by default for backward-compatibility; if turned on, bit operation on binary strings will yield BINARY result (or an error if operators are not of the same size).

The alternative would be to introduce new functions. Example bin_and(x,y), bin_or(x,y), bin_xor(x,y) that can take two binary string arguments of same length and return binary string. We are considering also other names like binary_and(x,y), binary_or(x,y), binary_xor(x,y).

Pros:

  • No compatibility issues
  • Bitwise operators (&, |, ^) would remain operators that yield integer results (preserving existing functionality), whereas the new functions would yield binary string results

Cons:

  • It's longer to type bin_and()/binary_and() than &
  • This creates new syntax for users to learn
  • bit_and(x) already exists, it's an aggregate function; we fear this could confuse users, though, fortunately both functions do a similar thing (they AND bits).

Please let us know in a comment below what are your opinions on this:

  • Is it a good idea to implement bitwise operations for binary strings?
  • Is the BINARY_BIT_OPS_YIELD_BINARY sql_mode necessary?
  • Can you think of other use cases where this can be useful (UUID handling is one such case)?

Thanks to Catalin Besleaga on the optimizer team for ghost writing this post.

MySQL 5.7.8 - mysqlpump caveat

MySQL 5.7.8-rc2 was released today, and features a new server utility called mysqlpump. This utility contains a number of major improvements over mysqldump including:

  • Parallel processing of databases, and of objects within databases, to speed up the dump process
  • For dump file reloading, faster secondary index creation for InnoDB tables by adding indexes after rows are inserted
  • Better control over which databases and database objects (tables, views, stored programs, user accounts) to dump
  • Dumping of user accounts as account-management statements (CREATE USER, GRANT) rather than as inserts into the mysql system database
  • Capability of creating compressed output
  • Progress indicator

What I wanted to caution however, is that mysqlpump is not currently consistent. That is to say that currently each of the dump threads lack a synchronization point before they start backing up the data. This makes it currently unsafe as a general purpose backup replacement.

The mysqlpump developers are aware of this limitation, and are busy adding this feature into the utility.

The MySQL 5.7 Optimizer Challenge

In the MySQL team, we have been working really hard on refactoring the optimizer and improving the cost model. The hacks of storage engines lying to the optimizer are being rolled back, and your chances of getting an optimal query plan should now be much higher than in prior releases of MySQL.

The optimizer team has also allowed cost constants to be configurable on both a server and a storage engine basis, and we are confident that the default InnoDB engine will always work "as good as MyISAM" (which has a natural advantage, in that the optimizer was originally largely built around it.)

Today, I want to issue a challenge:

Find an example where the optimizer picks the wrong execution plan for InnoDB tables but is correct for MyISAM. If you can demonstrate a reproducible testcase, I have a polo with MySQL 5.7 Community Contributor on it waiting for you.

The supplies of this special edition t-shirt are limited, but I will ship it to you anywhere in the world :)

The MySQL 5.7 Community Contributor Polo, as modeled by Daniël van Eeden.
The MySQL 5.7 Community Contributor Polo, as modeled by Daniël van Eeden. I'm the guy on the left.

MySQL 5.7.8 - Now featuring super_read_only and disabled_storage_engines

I wanted to highlight two new features that are making their way into MySQL 5.7 via the not-yet-released 5.7.8-rc2:

  • A new system variable super_read_only allows a more strict definition of 'read-only' which also applies to super users.
  • A new disabled_storage_engines setting offers a way to prevent an enumerated list of storage engines from being used. For example, a DBA may wish to enforce an InnoDB-only policy to simplify common operations such as backups, but it's possible MyISAM may sneak back in via new code-deployments. This setting allows more active enforcement.

These features are the fruits of our engineering team meeting with our users at Percona Live this year. Thank you to Percona for once again hosting a great conference, and in particular thank you to @isamlambert (and the GitHub Engineering team), @John_Cesario, @denshikarasu & Rob Wultsch for specifically requesting these two features :)

Proposal to deprecate INSERT and REPLACE alternative syntax

In the MySQL team we are currently considering a proposal to deprecate a number of alternative syntax uses with the INSERT and REPLACE commands. To provide examples:

CREATE TABLE `city` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `Name` char(35) NOT NULL DEFAULT '',
  `CountryCode` char(3) NOT NULL DEFAULT '',
  `District` char(20) NOT NULL DEFAULT '',
  `Population` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`ID`),
  KEY `CountryCode` (`CountryCode`),
  CONSTRAINT `city_ibfk_1` FOREIGN KEY (`CountryCode`) REFERENCES `Country` (`Code`)
) ENGINE=InnoDB AUTO_INCREMENT=4080 DEFAULT CHARSET=latin1;

INSERT INTO city SET
 Name='NewCity', CountryCode='CAN',District='MyDistrict',Population=1234;

INSERT INTO city (Name,CountryCode,District,Population) VALUE
 ('NewCity2', 'CAN', 'MyDistrict', 1234);

INSERT city (Name,CountryCode,District,Population) VALUES
 ('NewCity3', 'CAN', 'MyDistrict', 1234);
 
REPLACE INTO city (Name,CountryCode,District,Population) VALUE
 ('NewCity4', 'CAN', 'MyDistrict', 1234);

REPLACE city (Name,CountryCode,District,Population) VALUES
 ('NewCity5', 'CAN', 'MyDistrict', 1234);

To summarize these queries:

  • INSERT using the SET syntax.
  • INSERT and REPLACE using the keyword VALUE instead of VALUES
  • INSERT and REPLACE without the keyword INTO

Our rationale for this proposal is as follows:

  • Having a number of very similar ways of completing the same task makes it very difficult for training, documentation and support. To explain this in more detail: in our manual we have always tried to document every option that the server will accept, but with no functional difference between the options, this makes the content more verbose and clumsy to read.

    MySQL usage becomes cleaner by stating which usage is explicitly preferred, even if the old syntax remains supported for the legacy use-case.

  • The syntax is non-standard and complicates our parser. While it may take some time before we are able to remove these options, by starting the deprecation cycle now we can provide application authors with as much notice as possible.

Our proposed plan is to deprecate the syntax starting with MySQL 5.7. We will assess feedback from our users before targeting a version for syntax removal.

Will you be affected by this change? Please leave a comment, or Get in touch! We'd love to hear from you.

Proposal to deprecate MySQL INTEGER display width and ZEROFILL

In the MySQL team we are currently discussing if we should deprecate the integer display width in numeric types. For example:

CREATE TABLE my_table (
 id INT(11) NOT NULL PRIMARY KEY auto_increment
);

The (11) does not affect the storage size of the data type, which for an INT will always be 4 bytes. It affects the display width.
Our rationale for proposing this should be deprecated is that it is a common source of confusion amongst users.

We are also discussing deprecating the non-standard ZEROFILL type attribute, which is the only modern consumer of this display width meta data. For example:

CREATE TABLE my_table (
 id INT(11) ZEROFILL NOT NULL PRIMARY KEY auto_increment
);

INSERT INTO my_table VALUES (1);

mysql> SELECT * FROM my_table;
+-------------+
| id          |
+-------------+
| 00000000001 |
+-------------+
1 row in set (0.00 sec)

StackOverflow has a good example of how ZEROFILL is useful:

[..] In Germany, we have 5 digit zipcodes. However, those Codes may start with a Zero, so 80337 is a valid zipcode for munic, 01067 is a zipcode of Berlin.

As you see, any German citizen expects the zipcodes to be displayed as a 5 digit code, so 1067 looks strange.

[..]

This usage is true for any numeric values that require leading zeros, such as some phone numbers.

Upgrade Paths

There are two possible upgrade paths to migrate away from ZEROFILL.

Option #1 - Move to CHAR/VARCHAR

This option is the most transparent for applications, and changes the data type to be a string instead of numeric. For example:

CREATE TABLE my_zip_codes (
 id INT NOT NULL PRIMARY KEY auto_increment,
 zip_code INT(5) ZEROFILL
);
INSERT INTO my_zip_codes (zip_code) VALUES ('01234'), ('54321'), ('00123'), ('98765');

mysql> select * from my_zip_codes;
+----+----------+
| id | zip_code |
+----+----------+
|  1 |    01234 |
|  2 |    54321 |
|  3 |    00123 |
|  4 |    98765 |
+----+----------+
4 rows in set (0.00 sec)

ALTER TABLE my_zip_codes CHANGE zip_code zip_code CHAR(5);

In the case of a CHAR(5) the storage requirements will only be one byte higher than that of an integer. In the case of other data types (phone numbers requiring leading zeros) it might be slightly more efficient to store as an integer.

Option #2 - Format integers at a different layer

This option retains the storage efficiency of an integer, but moves the presentation into the application. For example:

CREATE TABLE my_zip_codes (
 id INT NOT NULL PRIMARY KEY auto_increment,
 zip_code INT(5) ZEROFILL
);
INSERT INTO my_zip_codes (zip_code) VALUES ('01234'), ('54321'), ('00123'), ('98765');
ALTER TABLE my_zip_codes CHANGE zip_code zip_code INT;

mysql> select * from my_zip_codes;
+----+----------+
| id | zip_code |
+----+----------+
|  1 |     1234 |
|  2 |    54321 |
|  3 |      123 |
|  4 |    98765 |
+----+----------+
4 rows in set (0.00 sec)

It will also technically be possible to retrofit this into legacy applications that require ZEROFILL presentation returning from MySQL. This can be done with a query rewrite plugin to modify SELECT statements to add padding:

mysql> SELECT id, LPAD(zip_code, 5, '0') as zip_code FROM my_zip_codes;
+----+----------+
| id | zip_code |
+----+----------+
|  1 | 01234    |
|  2 | 54321    |
|  3 | 00123    |
|  4 | 98765    |
+----+----------+
4 rows in set (0.01 sec)

Conclusion

We are seeking feedback from the community in response to this proposal. If you have found the existing behavior confusing, or will be affected by the removal of zero fill, please leave a comment or get in touch! We would love to hear from you.

Plan to deprecate PROCEDURE ANALYSE

In the MySQL team, we have been refactoring the SQL parser to be more maintainable. Gleb Shchepa lists the goals of this project in more details on the MySQL Server Team blog.

As part of this, we have identified the feature PROCEDURE ANALYSE as something that we would like to deprecate. For added context, here is a demonstration:

mysql> SELECT * FROM City procedure analyse()\G
*************************** 1. row ***************************
             Field_name: world.city.ID
              Min_value: 1
              Max_value: 4079
             Min_length: 1
             Max_length: 4
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 2040.0000
                    Std: 1177.5058
      Optimal_fieldtype: SMALLINT(4) UNSIGNED NOT NULL
*************************** 2. row ***************************
             Field_name: world.city.Name
              Min_value: A Coruña (La Coruña)
              Max_value: ´s-Hertogenbosch
             Min_length: 3
             Max_length: 33
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 8.5295
                    Std: NULL
      Optimal_fieldtype: VARCHAR(33) NOT NULL
*************************** 3. row ***************************
             Field_name: world.city.CountryCode
              Min_value: ABW
              Max_value: ZWE
             Min_length: 3
             Max_length: 3
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 3.0000
                    Std: NULL
      Optimal_fieldtype: ENUM('ABW','AFG','AGO','AIA','ALB','AND','ANT','ARE','ARG','ARM','ASM','ATG','AUS','AUT','AZE','BDI','BEL','BEN','BFA','BGD','BGR','BHR','BHS','BIH','BLR','BLZ','BMU','BOL','BRA','BRB','BRN','BTN','BWA','CAF','CAN','CCK','CHE','CHL','CHN','CIV','CMR','COD','COG','COK','COL','COM','CPV','CRI','CUB','CXR','CYM','CYP','CZE','DEU','DJI','DMA','DNK','DOM','DZA','ECU','EGY','ERI','ESH','ESP','EST','ETH','FIN','FJI','FLK','FRA','FRO','FSM','GAB','GBR','GEO','GHA','GIB','GIN','GLP','GMB','GNB','GNQ','GRC','GRD','GRL','GTM','GUF','GUM','GUY','HKG','HND','HRV','HTI','HUN','IDN','IND','IRL','IRN','IRQ','ISL','ISR','ITA','JAM','JOR','JPN','KAZ','KEN','KGZ','KHM','KIR','KNA','KOR','KWT','LAO','LBN','LBR','LBY','LCA','LIE','LKA','LSO','LTU','LUX','LVA','MAC','MAR','MCO','MDA','MDG','MDV','MEX','MHL','MKD','MLI','MLT','MMR','MNG','MNP','MOZ','MRT','MSR','MTQ','MUS','MWI','MYS','MYT','NAM','NCL','NER','NFK','NGA','NIC','NIU','NLD','NOR','NPL','NRU','NZL','OMN','PAK','PAN','PCN','PER','PHL','PLW','PNG','POL','PRI','PRK','PRT','PRY','PSE','PYF','QAT','REU','ROM','RUS','RWA','SAU','SDN','SEN','SGP','SHN','SJM','SLB','SLE','SLV','SMR','SOM','SPM','STP','SUR','SVK','SVN','SWE','SWZ','SYC','SYR','TCA','TCD','TGO','THA','TJK','TKL','TKM','TMP','TON','TTO','TUN','TUR','TUV','TWN','TZA','UGA','UKR','URY','USA','UZB','VAT','VCT','VEN','VGB','VIR','VNM','VUT','WLF','WSM','YEM','YUG','ZAF','ZMB','ZWE') NOT NULL
*************************** 4. row ***************************
             Field_name: world.city.District
              Min_value: Abhasia [Aphazeti]
              Max_value: –
             Min_length: 1
             Max_length: 20
       Empties_or_zeros: 4
                  Nulls: 0
Avg_value_or_avg_length: 9.0194
                    Std: NULL
      Optimal_fieldtype: VARCHAR(20) NOT NULL
*************************** 5. row ***************************
             Field_name: world.city.Population
              Min_value: 42
              Max_value: 10500000
             Min_length: 2
             Max_length: 8
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 350468.2236
                    Std: 723686.9870
      Optimal_fieldtype: MEDIUMINT(8) UNSIGNED NOT NULL
5 rows in set (0.01 sec)

ANALYSE() examines the result from a query and returns an analysis of the results that suggests optimal data types for each column that may help reduce table sizes.

Our justification for wanting to deprecate PROCEDURE ANALYSE is as follows:

  • There are no other uses of SELECT * FROM table PROCEDURE. This syntax is used exclusively by ANALYSE, and uses the UK English spelling.
  • The name PROCEDURE predates the addition of stored procedures as a MySQL feature. Ideally this feature would use a different name (CHANNEL?) to avoid confusion in usage. It also exists as an extension to the SQL standard.
  • There are numerous advantages to a feature similar to this being external to the MySQL server. The server must follow a stable release cycle, with core functionality being unchanged once it is declared GA. As an external tool, it is much easier to develop in an agile way, and provide new functionality without having to provide the same level of backward compatibility.

    By "external" I am implying that this could either be a script or as a view or stored procedure in MySQL. Shlomi has a good example of how to show auto_increment column capacity in common_schema!

Our current plan is to deprecate PROCEDURE ANALYSE in MySQL 5.7, for removal as soon as MySQL 5.8. We are inviting feedback from the MySQL Community and would like to hear from you if you use PROCEDURE ANALYSE. Please leave a comment, or get in touch!

An update on GROUP BY implicit sort

In the MySQL team, we have been planning for some time to remove the implicit sorting that is provided by GROUP BY. In doing so, we will make a number of existing queries faster (by no longer requiring a sort) as well as unlock opportunities to implement further optimizations.

This is one of the more complicated behaviours to remove, because it is not possible to tell if an application relies upon implicit ordering. Since a GROUP BY query without an ORDER BY clause is a valid query, it is also not reasonable to issue deprecation warnings.

However, one piece of the puzzle that was missing when I last wrote about this problem, is that MySQL 5.7 will support server-side query rewrite. What this means, is that Database Administrators will have the ability to inject an ORDER BY into queries that require this legacy behaviour. This is useful in the case where modifying the application directly is not possible.

The second part to this update, is that we also plan to deprecate the closely related syntax GROUP BY .. [ASC|DESC]. I am sure many users are probably unaware that exists, but you can change the implicit ordering to be in descending order with:

SELECT MAX(Population), Name FROM Country GROUP BY Name DESC;

(Note the missing "ORDER BY").

This represents an extension from the SQL standard, that what I can tell is not present in other databases.

Our current plan is to make GROUP BY .. [ASC|DESC] deprecated as of MySQL 5.7, with removal in 5.8. As part of this, we also plan to remove the implicit GROUP BY sort as early as MySQL 5.8.

We are inviting feedback from the MySQL Community regarding this plan. Please leave a comment, or get in touch! I would love to hear from you.

SHOW ENGINE INNODB MUTEX is back!

We received feedback from a number of users in the MySQL community that the command SHOW ENGINE INNODB MUTEX remains useful in a number of scenarios. We listened, and the command is scheduled to make a return in MySQL 5.7.8.

To lessen overhead, the command will now feature a mechanism to enable and disable metrics collection. This is documented in the manual here:

SET GLOBAL innodb_monitor_enable='latch';
SET GLOBAL innodb_monitor_disable='latch';

Thank you for helping make a better MySQL!