How important is it to use 2-byte and 3-byte integers?

One interesting feature of MySQL is that it supports a very large number of integer data types. From the MySQL manual:


Type       Storage  Minimum Value          Maximum Value
           (Bytes)  (Signed/Unsigned)      (Signed/Unsigned)
TINYINT    1        -128                   127
                    0                      255
SMALLINT   2        -32768                 32767
                    0                      65535
MEDIUMINT  3        -8388608               8388607
                    0                      16777215
INT        4        -2147483648            2147483647
                    0                      4294967295
BIGINT     8        -9223372036854775808   9223372036854775807
                    0                      18446744073709551615
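
The ranges in the table follow directly from the storage size. As a minimal sketch (the function name int_range is my own, not something from the manual), they can be reproduced like this:

```python
def int_range(storage_bytes, signed=True):
    """Return (minimum, maximum) for an integer stored in storage_bytes bytes."""
    bits = 8 * storage_bytes
    if signed:
        # One bit is used for the sign (two's complement).
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


# Storage sizes as listed in the table above.
MYSQL_INT_TYPES = {"TINYINT": 1, "SMALLINT": 2, "MEDIUMINT": 3, "INT": 4, "BIGINT": 8}

for name, size in MYSQL_INT_TYPES.items():
    print(name, size, int_range(size), int_range(size, signed=False))
```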

I have previously written on how important it is to use the correct datatype in MySQL, but I find that between integer types I don't always follow my own advice. I predominantly use TINYINT, INT, and BIGINT, and usually as the default of SIGNED rather than UNSIGNED.

Am I lazy? I think it is best to run through examples...

Primary Key Integer

In this example let's say that we know that we will never have more than 65535 customers, so we have the choice of using a SMALLINT UNSIGNED at 2 bytes, or lazily an INT, which will accommodate 2147483647 customers at 4 bytes:

Option #1:
CREATE TABLE customers (
id SMALLINT UNSIGNED PRIMARY KEY auto_increment,
company_name VARCHAR(20),
..
..
last_order_date DATETIME
);
Option #2:
CREATE TABLE customers (
id INT PRIMARY KEY auto_increment,
company_name VARCHAR(20),
..
..
last_order_date DATETIME
);

I will project that there are 50000 rows in the table, which is about as close as you would want to get to the maximum range possible of SMALLINT. Using SMALLINT provides a 2 byte saving per row:

2 * 50000 = 100000 = A saving of 97KB

In InnoDB the primary key is included in every other index on the table, so technically we will save an additional ~97KB per-index, but I would say that this saving is too small to matter for most cases.
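
To make the per-index effect concrete, here is a rough sketch of the arithmetic. The function name and the assumption that every secondary index entry carries exactly one copy of the primary key are mine, and page overhead is ignored:

```python
def pk_saving_bytes(rows, bytes_saved_per_key, secondary_indexes=0):
    """Estimate bytes saved by a smaller primary key.

    Assumes the key is stored once in the clustered index and once
    more in each secondary index entry (no page overhead counted).
    """
    return rows * bytes_saved_per_key * (1 + secondary_indexes)


saving = pk_saving_bytes(50_000, 2)
print(saving, round(saving / 1024, 1))  # 100000 97.7

# With three secondary indexes the saving is still well under 1MB.
print(pk_saving_bytes(50_000, 2, secondary_indexes=3))  # 400000
```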

Attribute on a large table

Option #1:
CREATE TABLE page_views (
id BIGINT PRIMARY KEY auto_increment,
site_id SMALLINT UNSIGNED NOT NULL,
..
..
);
Option #2:
CREATE TABLE page_views (
id BIGINT PRIMARY KEY auto_increment,
site_id INT NOT NULL,
..
..
);

In this example, I will say that there are 1 billion rows in the page_views table:

1 billion * 2 bytes saving per row = 1.8GB

So the choice of using a SMALLINT over an INT has more measurable savings in this case. For each unindexed column I could be saving around 2GB. So with five SMALLINT columns on the same row, I could save 10GB.

Is 10GB worth it? It is maybe easier to answer this question when expressed as a percentage saving for a table. To use some optimistic overhead numbers (that assume very little fragmentation) for InnoDB:

  • ~13 bytes per row (MVCC features)
  • ~1KB per 16K page (page fill factor)

I'll look at two table definitions. One with the best case of all SMALLINT columns in place of INT, and one with a VARCHAR column that is assumed to be 50 bytes on average:

Option #1:
CREATE TABLE page_views (
id BIGINT PRIMARY KEY auto_increment,
site_id SMALLINT UNSIGNED NOT NULL, /* or INT */
page_id SMALLINT UNSIGNED NOT NULL, /* or INT */
category_id SMALLINT UNSIGNED NOT NULL, /* or INT */
region_id SMALLINT UNSIGNED NOT NULL, /* or INT */
time_id SMALLINT UNSIGNED NOT NULL /* or INT */
);
Option #2:
CREATE TABLE page_views (
id BIGINT PRIMARY KEY auto_increment,
site_id SMALLINT UNSIGNED NOT NULL, /* or INT */
page_uri VARCHAR(2048) NOT NULL, /* assume average length of 50 bytes */
category_id SMALLINT UNSIGNED NOT NULL, /* or INT */
region_id SMALLINT UNSIGNED NOT NULL, /* or INT */
time_id SMALLINT UNSIGNED NOT NULL /* or INT */
);
# INT/SMALLINT only:
(13B (overhead) + 8B (pkey) + 5*2) * 1.06 * 1 billion = 30.60 GB
(13B (overhead) + 8B (pkey) + 5*4) * 1.06 * 1 billion = 40.47 GB

# One VARCHAR column:
(13B (overhead) + 8B (pkey) + 4*2 + 50 (varchar)) * 1.06 * 1 billion = 77.9GB
(13B (overhead) + 8B (pkey) + 4*4 + 50 (varchar)) * 1.06 * 1 billion = 85.9GB

So in the INT/SMALLINT table there is about a 25% saving by using smaller data types, and about a 10% saving in the case of the single VARCHAR column.
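
The estimates above can be reproduced with a short script. This is only a sketch of the same back-of-the-envelope model: the function names are mine, the 13 byte row overhead and 1.06 fill factor are the optimistic numbers quoted earlier, and GB is taken as 1024^3 bytes:

```python
GIB = 1024 ** 3


def estimated_size_gib(payload_bytes, rows=1_000_000_000,
                       row_overhead=13, fill_factor=1.06):
    """Rough table size: (overhead + payload) per row, scaled by page fill."""
    return (row_overhead + payload_bytes) * fill_factor * rows / GIB


def saving_percent(small, large):
    """Percentage saved by the smaller layout relative to the larger one."""
    return 100 * (1 - small / large)


# 8 byte primary key plus five SMALLINT (2 byte) or INT (4 byte) columns.
int_only_small = estimated_size_gib(8 + 5 * 2)
int_only_large = estimated_size_gib(8 + 5 * 4)

# 8 byte primary key, four integer columns, one VARCHAR averaging 50 bytes.
varchar_small = estimated_size_gib(8 + 4 * 2 + 50)
varchar_large = estimated_size_gib(8 + 4 * 4 + 50)

print(round(saving_percent(int_only_small, int_only_large), 1))
print(round(saving_percent(varchar_small, varchar_large), 1))
```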

I do think that 25% is worth caring about, so maybe in this case I could follow my own advice a little closer :) However, I would also consider this a close to best-case situation that I have used to illustrate a point. I would expect typical space savings to be less than 10%.

One other point to consider is that changing the datatype of a column is not supported by Online DDL. It is important not to choose a data type that turns out to be too small and have to change it later.

Did I miss anything? Please leave a comment.

Using MySQL Sandbox to set up a MySQL Fabric cluster in development

With Fabric's official GA release this week, I thought I would post a spin on how to set up a development environment loosely based on the Fabric Quick Start guide in the manual.

The notable change is the use of MySQL Sandbox for bootstrapping each of the MySQL instances.

Step 1: Install the MySQL Utilities + Python Connector

In my case, I downloaded:

The GUI install in both cases works as expected. Next, Next, Done!

Step 2: Install MySQL Sandbox

MySQL Sandbox is available via Launchpad or CPAN. There isn't a major difference between the two, but I happened to choose CPAN.

The advantage of installing Sandbox over a utility such as mysqld_multi is that it's much easier to set up and tear down replication.

Step 3: Download a 5.6 tarball

In my case, I downloaded mysql-5.6.17-osx10.7-x86.tar.gz Mac OS X 10.7 (x86, 32-bit), Compressed TAR Archive. On Mac OS X it is useful to use Google Chrome rather than Safari, as you do not want the file automatically decompressed.

Important: Download only! MySQL Sandbox does the install in the next step.

Step 4: Set up the Fabric Backing Store

The first copy of MySQL to set up is the MySQL Fabric backing store. This will house the fabric schema which stores configuration data. Here is the setup line I used:

$ make_sandbox mysql-5.6.17-osx10.7-x86.tar.gz
unpacking /Users/morgo/Downloads/mysql-5.6.17-osx10.7-x86.tar.gz
Executing low_level_make_sandbox --basedir=/Users/morgo/Downloads/5.6.17 \
    --sandbox_directory=msb_5_6_17 \
    --install_version=5.6 \
    --sandbox_port=5617 \
    --no_ver_after_name \
    --my_clause=log-error=msandbox.err
    The MySQL Sandbox,  version 3.0.43
    (C) 2006-2013 Giuseppe Maxia
installing with the following parameters:
upper_directory                = /Users/morgo/sandboxes
sandbox_directory              = msb_5_6_17
sandbox_port                   = 5617
check_port                     =
no_check_port                  =
datadir_from                   = script
install_version                = 5.6
basedir                        = /Users/morgo/Downloads/5.6.17
tmpdir                         =
my_file                        =
operating_system_user          = morgo
db_user                        = msandbox
remote_access                  = 127.%
bind_address                   = 127.0.0.1
ro_user                        = msandbox_ro
rw_user                        = msandbox_rw
repl_user                      = rsandbox
db_password                    = msandbox
repl_password                  = rsandbox
my_clause                      = log-error=msandbox.err
master                         =
slaveof                        =
high_performance               =
prompt_prefix                  = mysql
prompt_body                    =  [\h] {\u} (\d) >
force                          =
no_ver_after_name              = 1
verbose                        =
load_grants                    = 1
no_load_grants                 =
no_run                         =
no_show                        =
do you agree? ([Y],n) Y
loading grants
.. sandbox server started
Your sandbox server was installed in $HOME/sandboxes/msb_5_6_17

Using a configuration similar to the MySQL manual, with the password simply being secret, I saved the following to /etc/mysql/fabric.cfg:

[DEFAULT]
prefix =
sysconfdir = /etc
logdir = /var/log

[storage]
address = localhost:5617
user = fabric
password = secret
database = fabric
auth_plugin = mysql_native_password
connection_timeout = 6
connection_attempts = 6
connection_delay = 1

[servers]
user = fabric
password = secret

[protocol.xmlrpc]
address = localhost:32274
threads = 5
user = admin
password = secret
disable_authentication = no
realm = MySQL Fabric
ssl_ca =
ssl_cert =
ssl_key =

[executor]
executors = 5

[logging]
level = INFO
url = file:///var/log/fabric.log

[sharding]
mysqldump_program = /usr/bin/mysqldump
mysqlclient_program = /usr/bin/mysql

[statistics]
prune_time = 3600

[failure_tracking]
notifications = 300
notification_clients = 50
notification_interval = 60
failover_interval = 0
detections = 3
detection_interval = 6
detection_timeout = 1
prune_time = 3600

[connector]
ttl = 1

[client]
password =
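
Since the file uses the familiar INI layout, a quick sanity check that the [storage] address points at the sandbox port (5617 here) can be sketched with Python's configparser. The helper name section_endpoint and the trimmed SAMPLE text are mine, and I am assuming the file parses cleanly with the standard library:

```python
import configparser

# A trimmed copy of the configuration above, for illustration only.
SAMPLE = """
[storage]
address = localhost:5617
user = fabric
password = secret
database = fabric

[protocol.xmlrpc]
address = localhost:32274
"""


def section_endpoint(cfg_text, section):
    """Return (host, port) from the 'address' option of a section."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    host, _, port = parser[section]["address"].rpartition(":")
    return host, int(port)


print(section_endpoint(SAMPLE, "storage"))
print(section_endpoint(SAMPLE, "protocol.xmlrpc"))
```

To check the real file, read its contents and pass them in place of SAMPLE.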

I also need to create the user in MySQL for Fabric to connect as:

$ $HOME/sandboxes/msb_5_6_17/use -uroot
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 5
Server version: 5.6.17 MySQL Community Server (GPL)

Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql [localhost] {root} ((none)) > CREATE USER 'fabric'@'localhost' IDENTIFIED BY 'secret';
Query OK, 0 rows affected (0.01 sec)

mysql [localhost] {root} ((none)) > GRANT ALL ON fabric.* TO 'fabric'@'localhost';
Query OK, 0 rows affected (0.00 sec)

Fabric can then install the system tables, and be started with:

$ mysqlfabric manage setup
[INFO] 1401222720.664261 - MainThread - Initializing persister: user (fabric), server (localhost:5617), database (fabric).
[INFO] 1401222721.245034 - MainThread - Initial password for admin/xmlrpc set
Password set for admin/xmlrpc from configuration file.
[INFO] 1401222721.247700 - MainThread - Password set for admin/xmlrpc from configuration file.
$ mysqlfabric manage start
[INFO] 1401222753.862997 - MainThread - Initializing persister: user (fabric), server (localhost:5617), database (fabric).
[INFO] 1401222753.867357 - MainThread - Loading Services.
[INFO] 1401222753.902501 - MainThread - Fabric node starting.
[INFO] 1401222753.910188 - MainThread - Starting Executor.
[INFO] 1401222753.910300 - MainThread - Setting 5 executor(s).
[INFO] 1401222753.910602 - Executor-0 - Started.
[INFO] 1401222753.911354 - Executor-1 - Started.
[INFO] 1401222753.912450 - Executor-2 - Started.
[INFO] 1401222753.913185 - Executor-3 - Started.
[INFO] 1401222753.914456 - Executor-4 - Started.
[INFO] 1401222753.914701 - MainThread - Executor started.
[INFO] 1401222753.927123 - MainThread - Starting failure detector.
[INFO] 1401222753.928365 - XML-RPC-Server - XML-RPC protocol server ('127.0.0.1', 32274) started.
[INFO] 1401222753.928731 - XML-RPC-Server - Setting 5 XML-RPC session(s).
[INFO] 1401222753.928991 - XML-RPC-Session-0 - Started XML-RPC-Session.
[INFO] 1401222753.929455 - XML-RPC-Session-1 - Started XML-RPC-Session.
[INFO] 1401222753.930168 - XML-RPC-Session-2 - Started XML-RPC-Session.
[INFO] 1401222753.930747 - XML-RPC-Session-3 - Started XML-RPC-Session.
[INFO] 1401222753.931320 - XML-RPC-Session-4 - Started XML-RPC-Session.

Step 5: Build out a HA group

The next step is to bootstrap a 3-node replication sandbox, and then add it as a HA Group in Fabric.

First with MySQL Sandbox:

$ make_replication_sandbox mysql-5.6.17-osx10.7-x86.tar.gz
installing and starting master
installing slave 1
installing slave 2
starting slave 1
.. sandbox server started
starting slave 2
.. sandbox server started
initializing slave 1
initializing slave 2
replication directory installed in $HOME/sandboxes/rsandbox_mysql-5_6_17
$ $HOME/sandboxes/rsandbox_mysql-5_6_17/m -uroot
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 5
Server version: 5.6.17-log MySQL Community Server (GPL)

Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

master [localhost] {root} ((none)) > GRANT ALL ON *.* TO 'fabric'@'localhost' IDENTIFIED BY 'secret';
Query OK, 0 rows affected (0.01 sec)

master [localhost] {root} ((none)) > SHOW GLOBAL VARIABLES LIKE 'port';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| port          | 20082 |
+---------------+-------+
1 row in set (0.00 sec)

master [localhost] {root} ((none)) > exit
Bye

Then with MySQL Fabric:

$ mysqlfabric group create GLOBAL1
Procedure :
{ uuid        = 23593cf1-c516-46ef-beaa-516b333353ca,
  finished    = True,
  success     = True,
  return      = True,
  activities  =
}
$ mysqlfabric group add GLOBAL1 localhost:20082
Procedure :
{ uuid        = e72decdf-6524-45b4-9eb6-6d757c0daf3a,
  finished    = True,
  success     = False,
  return      = ServerError: Server (e208ca4a-e5df-11e3-9dd5-4cd7500a4330) does not have the binary log or gtid enabled.,
  activities  =
}

An error! What this error is saying is that Fabric is specifically designed to take advantage of replication with GTIDs, which make failover smoother when the topology changes. Thankfully, MySQL Sandbox has an easy way to switch to GTIDs:

$ $HOME/sandboxes/rsandbox_mysql-5_6_17/enable_gtid
# option 'master-info-repository=table' added to master configuration file
# option 'relay-log-info-repository=table' added to master configuration file
# option 'gtid_mode=ON' added to master configuration file
# option 'log-slave-updates' added to master configuration file
# option 'enforce-gtid-consistency' added to master configuration file
# option 'master-info-repository=table' added to node1 configuration file
# option 'relay-log-info-repository=table' added to node1 configuration file
# option 'gtid_mode=ON' added to node1 configuration file
# option 'log-slave-updates' added to node1 configuration file
# option 'enforce-gtid-consistency' added to node1 configuration file
# option 'master-info-repository=table' added to node2 configuration file
# option 'relay-log-info-repository=table' added to node2 configuration file
# option 'gtid_mode=ON' added to node2 configuration file
# option 'log-slave-updates' added to node2 configuration file
# option 'enforce-gtid-consistency' added to node2 configuration file
# executing "stop" on /Users/morgo/sandboxes/rsandbox_mysql-5_6_17
executing "stop" on slave 1
executing "stop" on slave 2
executing "stop" on master
# executing "start" on /Users/morgo/sandboxes/rsandbox_mysql-5_6_17
executing "start" on master
. sandbox server started
executing "start" on slave 1
. sandbox server started
executing "start" on slave 2
. sandbox server started

With this change, adding servers to Fabric will now work:

$ mysqlfabric group add GLOBAL1 localhost:20082
Procedure :
{ uuid        = c9da0536-0f3f-4d75-9f29-1ac6c12ff686,
  finished    = True,
  success     = True,
  return      = True,
  activities  =
}
$ mysqlfabric group add GLOBAL1 localhost:20083
Procedure :
{ uuid        = 92b42645-311f-423b-86b4-3a56870db244,
  finished    = True,
  success     = True,
  return      = True,
  activities  =
}
$ mysqlfabric group add GLOBAL1 localhost:20084
Procedure :
{ uuid        = 1f16a586-4a60-4b7d-a339-be3830614984,
  finished    = True,
  success     = True,
  return      = True,
  activities  =
}

We can find group commands with:

$ mysqlfabric group

Commands available in group 'group' are:
    group activate group_id  [--synchronous]
    group description group_id  [--description=NONE] [--synchronous]
    group deactivate group_id  [--synchronous]
    group create group_id  [--description=NONE] [--synchronous]
    group remove group_id server_id  [--synchronous]
    group add group_id address  [--timeout=5] [--update_only] [--synchronous]
    group health group_id
    group lookup_servers group_id  [--server_id=NONE] [--status=NONE] [--mode=NONE]
    group destroy group_id  [--force] [--synchronous]
    group demote group_id  [--update_only] [--synchronous]
    group promote group_id  [--slave_id=NONE] [--update_only] [--synchronous]
    group lookup_groups  [--group_id=NONE]

Of these, a useful one is lookup_servers, which lists the servers in our group GLOBAL1:

$ mysqlfabric group lookup_servers GLOBAL1
Command :
{ success     = True
  return      = [{'status': 'SECONDARY', 'server_uuid': 'e208ca4a-e5df-11e3-9dd5-4cd7500a4330', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': 'localhost:20082'}, {'status': 'SECONDARY', 'server_uuid': 'e931694e-e5df-11e3-9dd5-4cd64dfe115b', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': 'localhost:20083'}, {'status': 'SECONDARY', 'server_uuid': 'ea61dfba-e5df-11e3-9dd5-4bd23bb2cbe3', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': 'localhost:20084'}]
  activities  =
}

The UUID in use here is the same one that is used for Replication with GTIDs. To promote the first server as a master:

$ mysqlfabric group promote GLOBAL1 --slave_id=e208ca4a-e5df-11e3-9dd5-4cd7500a4330
Procedure :
{ uuid        = be791f70-be3e-4ee9-9872-e1cedce36791,
  finished    = True,
  success     = True,
  return      = True,
  activities  =
}
$ mysqlfabric group lookup_servers GLOBAL1
Command :
{ success     = True
  return      = [{'status': 'PRIMARY', 'server_uuid': 'e208ca4a-e5df-11e3-9dd5-4cd7500a4330', 'mode': 'READ_WRITE', 'weight': 1.0, 'address': 'localhost:20082'}, {'status': 'SECONDARY', 'server_uuid': 'e931694e-e5df-11e3-9dd5-4cd64dfe115b', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': 'localhost:20083'}, {'status': 'SECONDARY', 'server_uuid': 'ea61dfba-e5df-11e3-9dd5-4bd23bb2cbe3', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': 'localhost:20084'}]
  activities  =
}
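
The return value printed by lookup_servers happens to look like a Python literal, so a script can pick out the current primary. This is a minimal sketch that assumes the output keeps the exact shape shown above (I have not checked whether Fabric guarantees it); the function names and the shortened SAMPLE_OUTPUT are mine:

```python
import ast

# Shortened copy of the output above (two of the three servers).
SAMPLE_OUTPUT = """Command :
{ success     = True
  return      = [{'status': 'PRIMARY', 'server_uuid': 'e208ca4a-e5df-11e3-9dd5-4cd7500a4330', 'mode': 'READ_WRITE', 'weight': 1.0, 'address': 'localhost:20082'}, {'status': 'SECONDARY', 'server_uuid': 'e931694e-e5df-11e3-9dd5-4cd64dfe115b', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': 'localhost:20083'}]
  activities  =
}"""


def parse_servers(output):
    """Extract the list of server dicts from the 'return = [...]' line."""
    for line in output.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key.strip() == "return":
            # literal_eval only accepts literals, so no code is executed.
            return ast.literal_eval(value.strip())
    return []


def primary_address(servers):
    """Return the address of the PRIMARY server, or None if there is none."""
    return next((s["address"] for s in servers if s["status"] == "PRIMARY"), None)


print(primary_address(parse_servers(SAMPLE_OUTPUT)))
```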

Next steps

This how-to can easily be extended to set up additional HA groups for shards. MySQL Sandbox really simplifies the setup by bootstrapping replication.

One caveat to be aware of: Each copy of mysqld running can take up quite a bit of RAM. It is possible to tune this down using some steps that I mentioned in an earlier post.

Faking a slave: Subscribing to mysql row-based-replication changes

In complex systems it's often useful to be able to receive notification when rows have been modified in MySQL so that you can invalidate various external caches or indexes. For example: memcached, Sphinx, Lucene.

In the case of MySQL's default statement-based replication this can be quite tricky to do, as it would likely require an SQL parser to determine what was intended to be modified. However, this is made much simpler with Row-based Replication (something I recommend switching to).

A C++ library exists to be able to listen to MySQL replication, but what I want to demonstrate is that it is also very simple to be able to do this with mysqlbinlog:


shell> mysqlbinlog --read-from-remote-server --stop-never --host localhost.localdomain --port 5616 -u msandbox -pmsandbox --verbose mysql_sandbox5616-bin.000004 | grep '^### '

To explain how this command works:

  • --read-from-remote-server tells mysqlbinlog to act like a slave, and fetch remote files rather than local.
  • --stop-never makes mysqlbinlog block waiting for a continual stream of updates, rather than exiting (new option in 5.6). Perfect!
  • --verbose rewrites row-based replication events to be pseudo SQL statements. The pseudo statements are very easy to parse. For example:
    ### DELETE FROM `test2`.`a`
    ### WHERE
    ###   @1=1
    ### INSERT INTO `test2`.`a`
    ### SET
    ###   @1=1
    ### UPDATE `test2`.`a`
    ### WHERE
    ###   @1=10
    ### SET
    ###   @1=20
    

    Note that a multi-row statement will appear as individual statements via --verbose. Row-based events also default to sending all columns in the row, not just those that are changed.

  • The grep '^### ' statement is just a lazy way of stripping out everything except pseudo SQL statements.
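
To show how easy the pseudo statements are to parse, here is a minimal sketch that turns the grep output into a list of events. The event layout (a dict with action, schema, table, WHERE and SET) is my own choice, and column values are kept as raw strings:

```python
import re

HEADER = re.compile(r"^### (INSERT INTO|UPDATE|DELETE FROM) `([^`]+)`\.`([^`]+)`$")
COLUMN = re.compile(r"^###\s+@(\d+)=(.*)$")


def parse_events(lines):
    """Group '### ' pseudo-SQL lines into one dict per row event."""
    events = []
    section = None
    for raw in lines:
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            action, schema, table = header.groups()
            events.append({"action": action.split()[0], "schema": schema,
                           "table": table, "WHERE": {}, "SET": {}})
            section = None
        elif line in ("### WHERE", "### SET"):
            section = line[4:]
        else:
            column = COLUMN.match(line)
            if column and events and section:
                # Columns are identified by position (@1, @2, ...), not by name.
                events[-1][section][int(column.group(1))] = column.group(2)
    return events


SAMPLE = """### DELETE FROM `test2`.`a`
### WHERE
###   @1=1
### INSERT INTO `test2`.`a`
### SET
###   @1=1
### UPDATE `test2`.`a`
### WHERE
###   @1=10
### SET
###   @1=20"""

for event in parse_events(SAMPLE.splitlines()):
    print(event)
```

In practice the lines would come from the mysqlbinlog pipe, for example by iterating over sys.stdin instead of SAMPLE.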

Advanced Usage

You can fairly easily extend the above to track your progress reading through the master's binary logs. This will make it easier to resume from where you left off if there is a crash. Just remove the grep, and keep track of:

  • Positional markers just before DML events. These are the lines beginning with "# at":
    # at 191
    #140526 15:28:27 server id 10  end_log_pos 239 CRC32 0x559a84a8     GTID [commit=yes]
    SET @@SESSION.GTID_NEXT= '9f0ce61c-bb92-11e3-89fd-f056da47d247:17'/*!*/;
    # at 239
    #140526 15:28:27 server id 10  end_log_pos 312 CRC32 0xff074c19     Query   thread_id=1 exec_time=0 error_code=0
    SET TIMESTAMP=1401143307/*!*/;
    BEGIN
    /*!*/;
    # at 312
    #140526 15:28:27 server id 10  end_log_pos 357 CRC32 0x74b1ad7f     Table_map: `test2`.`a` mapped to number 71
    # at 357
    #140526 15:28:27 server id 10  end_log_pos 397 CRC32 0x2c6f8b8d     Write_rows: table id 71 flags: STMT_END_F
    
    BINLOG '
    C8CDUxMKAAAALQAAAGUBAAAAAEcAAAAAAAEABXRlc3QyAAFhAAEDAAF/rbF0
    C8CDUx4KAAAAKAAAAI0BAAAAAEcAAAAAAAEAAgAB//4KAAAAjYtvLA==
    '/*!*/;
    ### INSERT INTO `test2`.`a`
    ### SET
    ###   @1=10
    
  • Events that show that the binary log file itself is being rotated. For example, the result of FLUSH LOGS is:
    # at 4
    #691231 16:00:00 server id 10  end_log_pos 0 CRC32 0x7800af55   Rotate to mysql_sandbox5616-bin.000006  pos: 4
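
    Keeping track of both kinds of marker can be sketched as follows. The function name track_position and the idea of returning a (file, position) pair to checkpoint are mine, and the patterns assume the output format shown above:

```python
import re

AT = re.compile(r"^# at (\d+)$")
ROTATE = re.compile(r"Rotate to (\S+)\s+pos: (\d+)")


def track_position(lines, start_file):
    """Return the last (binlog file, position) seen in mysqlbinlog output."""
    current_file, position = start_file, None
    for raw in lines:
        line = raw.strip()
        rotate = ROTATE.search(line)
        at = AT.match(line)
        if rotate:
            # A rotate event names the next file and where to start in it.
            current_file, position = rotate.group(1), int(rotate.group(2))
        elif at:
            position = int(at.group(1))
    return current_file, position


SAMPLE = [
    "# at 312",
    "# at 357",
    "### INSERT INTO `test2`.`a`",
]
print(track_position(SAMPLE, "mysql_sandbox5616-bin.000004"))
```

    A script that saves this pair after each processed event could pass the file name (and a start position) back to mysqlbinlog when resuming.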
    

Conclusion

This might not be as robust in all cases as using the C++ API, but it sure beats the alternative. Gone are my days of writing complex sets of triggers to write to an "events" table, which I poll continually from an external script.

MySQL soon to store system tables in InnoDB

In the MySQL team, we are changing the system tables currently located in the mysql schema from MyISAM to InnoDB.

Looking at this historically:

  • MyISAM was the default storage engine up until MySQL 5.5.
  • In 5.5 almost 4 years ago, the default storage engine changed to InnoDB, however system tables used for features such as storing privileges and timezones remained as MyISAM.

Unlike MyISAM, InnoDB is an ACID compliant storage engine, with the behaviour that once a transaction commits, modifications are able to survive power-loss or other failures. This is a solid foundation to build applications on, since developers will need to handle fewer failures. To use an example:

  1. Customer places an order
  2. A confirmation email is sent
  3. Power is lost

Without durability, (2) could occur with no record of (1) occurring! Durability is a great feature. However, we do not currently offer this for the system tables which use MyISAM. To use an example:

  1. A DBA revokes a user's privilege to MySQL (the command returns success)
  2. Power loss occurs
  3. Upon restore, the revoke was never applied.

By switching to InnoDB we are improving the experience of system-related tasks by ensuring the durability that ACID provides.

This change will have the effect that InnoDB will be required for all MySQL installations, and the configuration setting --skip-innodb will no longer make sense. Users will still be able to use the MyISAM storage engine, and MyISAM-heavy installations can continue to configure the InnoDB buffer pool to as low as 5MB - taking up very little memory.

This is a great step forward for MySQL, and I am personally very excited to see this change. Many in the MySQL community have been requesting this change for years, and we're happy to now be working on it. If you have any thoughts, please leave a comment, or email me!

Proposal to deprecate COM_REFRESH packet

In the MySQL team we are proposing to deprecate the COM_REFRESH packet in favor of specific queries to execute FLUSH commands. To provide a bit of context:

  • The MySQL server protocol allows for clients to speak API commands via both a query and binary protocol interface. The set of the API commands can be seen in the MySQL Client/Server Protocol internals documentation, or very simply as they appear in a single switch statement:
    # ./sql/sql_parse.cc:1009 (simplified view)
    
      switch (command) {
    
      case COM_REGISTER_SLAVE:
      {
        /* do stuff */
        break;
      }
      case COM_QUERY:
      {
        /* parse query, do stuff */
        break;
      }
      case COM_REFRESH:
      {
       /* equivalent to running a FLUSH command */
        break;
      }
      case COM_SHUTDOWN:
      {
        kill_mysql();
        break;
      }
    }
    
  • The historical advantage of having a binary protocol was that fewer bytes needed to be transferred across the network, and less processing was needed on the server, since parsing of statements is not required.
  • The disadvantage of the binary protocol is that it is less extensible. In the case of COM_REFRESH it was designed to accept only a 1 byte payload to indicate which flush command to run. A quick look at the MySQL manual shows that there are now more than eight (1 byte payload = 8 possible flags) flush commands available. This leads to API inconsistency as some commands are available via the binary interface and others are not.
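
To illustrate why a 1 byte payload runs out of room, here is a small sketch using bit flags. The flag names and values are modelled on the REFRESH_* constants as I remember them from the server headers, so treat them as illustrative rather than authoritative; encode and decode are my own helper names:

```python
from enum import IntFlag


class Refresh(IntFlag):
    """One bit per flush command; eight bits fill the single byte."""
    GRANT = 1
    LOG = 2
    TABLES = 4
    HOSTS = 8
    STATUS = 16
    THREADS = 32
    SLAVE = 64
    MASTER = 128


def encode(flags):
    """Pack the selected flags into the 1 byte payload."""
    return bytes([int(flags)])


def decode(payload):
    """Recover the set of flags from the 1 byte payload."""
    return Refresh(payload[0])


payload = encode(Refresh.LOG | Refresh.TABLES)
print(payload, decode(payload))

# A ninth flag would need the value 256, which does not fit in one byte:
# bytes([256]) raises ValueError.
```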

Our proposed solution to this problem is to deprecate the binary interface for refresh commands and promote usage of the query interface (FLUSH LOGS, FLUSH TABLES, FLUSH HOSTS etc) instead. While we have no immediate plans to remove the COM_REFRESH binary interface (and thus break older clients) we are seeking feedback on the direction of this change. Please leave a comment, or get in touch!

Proposal to deprecate the old password format

Improving security has been a high priority for MySQL 5.7. To give you an idea of some of the changes to date, I recommend reading Todd Farmer's blog: Redefining --ssl option, Password expiration policy, AES256 encryption.

In MySQL 5.7, we are also proposing to remove the 'old' format for saving passwords. To provide some historical context:

  • The old password format is a 16-byte hash and considered to be cryptographically weak. Code appears online (I won't link to it) which allows the hash to be brute-forced in a small-enough amount of time.
  • Ten years ago (2004), MySQL 4.1 introduced a stronger password hash format based on SHA1. The old password hash continued to be supported, since client libraries connecting to MySQL needed time to upgrade to support the new authentication scheme. A configuration setting of secure-auth=1 allowed a DBA to enforce that users could only authenticate using new password hashes, but it was not enabled by default. Another configuration setting (old-passwords=1) allowed passwords to still be generated in the weak format.
  • MySQL 5.6 (2013) enabled secure-auth=1 by default, preventing users from connecting to the server using old password hashes. However, it still permits a DBA to configure secure-auth=0 + old-passwords=1 allowing weak hashes to be generated and used. As an aside, 5.6 also introduced a new SHA256 password hash option.
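
For anyone auditing stored hashes, the two formats are easy to tell apart by length. The sketch below assumes the 4.1+ format is an asterisk followed by the uppercase hex of SHA1 applied twice to the password, and that old-format hashes are 16 hex characters; the function names are mine:

```python
import hashlib


def mysql41_hash(password):
    """Compute a 4.1+ style hash: '*' + HEX(SHA1(SHA1(password)))."""
    inner = hashlib.sha1(password.encode("utf-8")).digest()
    return "*" + hashlib.sha1(inner).hexdigest().upper()


def hash_format(stored_hash):
    """Classify a stored password hash by its length and prefix."""
    if len(stored_hash) == 41 and stored_hash.startswith("*"):
        return "4.1+ (SHA1-based)"
    if len(stored_hash) == 16:
        return "old (pre-4.1)"
    return "unknown"


new_style = mysql41_hash("secret")
print(len(new_style), hash_format(new_style))
print(hash_format("0123456789abcdef"))  # a made-up 16 character value
```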

This brings us to today's discussion. We would like to propose that:

  • Setting secure-auth=0 + old-passwords=1 is officially deprecated as of MySQL 5.6. The MySQL manual pages will be updated to discourage these settings, and warnings will be written to the server error log on startup when they are in use.
  • In a future version, support for setting secure-auth=0 + old-passwords=1 will be removed. Since our policy is to support each GA release for 8 years, this means that old-password hashing will still be supported via MySQL 5.6 until 2021. Extending support beyond this leaves us in a less secure position.

    We are proposing removal in MySQL 5.7, but are open to suggestions as to what is an acceptable timeframe.

We value your feedback regarding this proposal:

  • Are you currently using secure-auth=0 or old-passwords=1?
  • Do you use a legacy connector that will not support the new (4.1+) authentication?
  • If you do not agree with removal in MySQL 5.7, which version would you consider more suitable?

Please leave a comment, or get in touch!

Calling for all mysql_install_db feature requests

mysql_install_db is the program that is responsible for bootstrapping a fresh copy of MySQL and making sure all of the system tables are set up correctly. This program is usually invoked by other programs as part of installation, but in the case of .tar.gz downloads it will need to be executed manually.

As part of our ongoing efforts to refactor and improve MySQL, we are looking at converting mysql_install_db from its current Perl script to a program written in C++. The aim of this change is to reduce external dependencies (Perl + a set of modules) and improve our cross-platform experience.

We plan for this transition to be mostly function neutral, but are open to feature requests for additional improvements as well.

Have you worked in packaging MySQL?
Do you have any suggestions on how mysql_install_db can be improved?
Please leave a comment, or get in touch!

We're looking to improve PlanetMySQL

We are looking for community feedback on how to improve the signal-to-noise ratio of PlanetMySQL, and reduce the number of irrelevant and off-topic posts.

To give you a sense of where we are at today, we currently approve most [all?] blogs for aggregation, provided they have some existing technical content. We then apply a naive regular expression to make sure that all posts are on topic (MySQL).

The sorts of ideas we are looking for are:

  • What content are you interested in seeing more or less of?
  • What posts do you deem as acceptable or unacceptable?
  • Do we need to change the balance between technical, marketing, event, or business-related posts?
  • Are there any tools or features we could introduce to better serve readers and authors?

If you have any suggestions - please send them my way!