Eating your own dogfood

I've just finishing converting my blog from Jekyll to WordPress.

There were a couple of features I was looking for (such as being able to schedule upcoming posts), but the real reason is that I want to be able to use MySQL 5.7 DMR3 against a tool that I interact with almost every day.

Or as this is more informally known, I wanted to eat my own dogfood:

I now have:

  • WordPress 3.8.1
  • MySQL 5.7 DMR3
  • sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
  • All InnoDB Storage Engine
  • Disqus comment engine (comment if you think this is cheating)
  • "Markdown on Save Improved" plugin.

I apologize for broken images and URLs changing slightly (which will break comments). I plan to fix them as I spot them.

Do you have any other suggestions I should try?

Proposal to enable sql mode ONLY_FULL_GROUP_BY by default

We are considering enabling the SQL mode ONLY_FULL_GROUP_BY by default in MySQL 5.7. Here is a quick example to demonstrate how this change will impact you:

Sample Data:

CREATE TABLE invoice_line_items (id INT NOT NULL PRIMARY KEY auto_increment, 
invoice_id INT NOT NULL,
description varchar(100)
);

INSERT INTO invoice_line_items VALUES 
 (NULL, 1, 'New socks'),
 (NULL, 1, 'A hat'),
 (NULL, 2, 'Shoes'),
 (NULL, 2, 'T shirt'),
 (NULL, 3, 'Tie');

Default behaviour in MySQL 5.6 and below:

mysql> SELECT id, invoice_id, description 
FROM invoice_line_items GROUP BY invoice_id;
+----+------------+-------------+
| id | invoice_id | description |
+----+------------+-------------+
|  1 |          1 | New socks   |
|  3 |          2 | Shoes       |
|  5 |          3 | Tie         |
+----+------------+-------------+
3 rows in set (0.00 sec)

Proposed default behaviour in MySQL 5.7:

mysql> SELECT id, invoice_id, description 
FROM invoice_line_items GROUP BY invoice_id;
ERROR 1055 (42000): 'test.invoice_line_items.id' isn't in GROUP BY

That is to say that in 5.7 this statement will produce an error.

Notice that columns id and description are not actually unique? Roland Bouman has a blog post from 2007 that describes what MySQL is doing here, and how it is different in behaviour to other databases.

To summarize: MySQL is picking one of the possible values for id and description and the query executes in a non deterministic way. Here's some more information to prove this:

mysql> SELECT id, invoice_id, description, max(description) 
FROM invoice_line_items GROUP BY invoice_id;
+----+------------+-------------+------------------+
| id | invoice_id | description | max(description) |
+----+------------+-------------+------------------+
|  1 |          1 | New socks   | New socks        |
|  3 |          2 | Shoes       | T shirt          |
|  5 |          3 | Tie         | Tie              |
+----+------------+-------------+------------------+
3 rows in set (0.00 sec)

mysql> SELECT GROUP_CONCAT(id), invoice_id, GROUP_CONCAT(description) 
FROM invoice_line_items GROUP BY invoice_id;
+------------------+------------+---------------------------+
| GROUP_CONCAT(id) | invoice_id | GROUP_CONCAT(description) |
+------------------+------------+---------------------------+
| 1,2              |          1 | New socks,A hat           |
| 3,4              |          2 | Shoes,T shirt             |
| 5                |          3 | Tie                       |
+------------------+------------+---------------------------+
3 rows in set (0.00 sec)

What is problematic about this behaviour is that it can hide bugs in application code. To highlight two blog posts:

The Proposal

We are proposing to change ONLY_FULL_GROUP_BY to be enabled by default as of MySQL 5.7. The downside in doing this, is that some users upgrading will have statements return errors. These users will be left with two options:

  1. Set sql_mode in their my.cnf (or my.ini) file explicitly, or as part of their application code since sql_mode is configurable per session.
  2. Modify the offending GROUP BY statements to be deterministic.

On the second point, we are also proposing introducing an ANY_VALUE() SQL function to allow statements that are 'acceptable as non-deterministic' to run easy retrofit in this less permissive configuration. For example:

mysql> SELECT ANY_VALUE(id) AS id, invoice_id, 
ANY_VALUE(description) AS description 
FROM invoice_line_items GROUP BY invoice_id;
+------+------------+-------------+
| id   | invoice_id | description |
+------+------------+-------------+
| 1    |          1 | New socks   |
| 3    |          2 | Shoes       |
| 5    |          3 | Tie         |
+------+------------+-------------+
3 rows in set (0.01 sec)

Why I personally like this proposal, is that I compare it to buying an undercooked burger in a North American restaurant. We are proposing to switch our policy from "we may serve it to you without you having realized you ordered it" to "sign a waiver, then we will serve it".

i.e. We are not removing options, but want you to have to opt-in to what can be an unsafe choice, rather than opt-out.

Fans of standard SQL will rejoice at compatibility with SQL-2003, a standard defined after this behavior was first implemented.

Conclusion

As with other changes that have been proposed for MySQL 5.7, we are soliciting your feedback!

  • Have you accidentally returned incorrect data, and could have benefited from having ONLY_FULL_GROUP_BY enabled?
  • Do you maintain an application that relies on the non deterministic behaviour?
  • Do you agree or disagree that this change is the better default for users?

Please leave a comment, or get in touch!

Update: This proposal has gone ahead. MySQL 5.7 DMR5 has ONLY_FULL_GROUP_BY enabled by default!

Spring cleaning MySQL bugs

The MySQL team has been busy fixing bugs and sorting through older bugs. To give you a couple of recent stats:

  • Norvald blogged about 14 bugs fixed at the request of Linux Distributions.
  • Tomas Ulin's keynote at Percona Live mentioned 1991 Bugs Fixed in 5.6, with 3763 Bugs Fixed in Total Since MySQL 5.5 GA :)

As part of our spring cleaning efforts, we have also decided to retire the bug status to be fixed later. That is to say that:

  • Some bugs that were marked as to be fixed later have actually been fixed. Keeping a status of items that won't be fixed for now has proven difficult to diligently maintain as accurate.
  • We want to prevent a half-way state where historically a fix may not have been technically feasible but we did not always set user expectations appropriately. We have decided that in some of these cases it is better to 'break the bad news early' and set the status to won't fix.
  • All remaining bugs that were to be fixed later will now be changed to be verified, and will be re-evaluated in priority on a regular interval moving forward.

If you are affected by a particular bug, I recommend clicking the "Affects Me" button.

Oracle's MySQL Federal Symposium Recap

I'm back home in Toronto, after having spent part of last week in Washington DC meeting Oracle's Federal MySQL users.

I think generally speaking when you look at buzzwords in the startup scene, there is a filter as to what bubbles up into the more conservative enterprise and public sector.

My impression was that there is some interest in big-data, but there is absolutely phenomenal interest in cloud computing. This was a real take away for me, and next time I plan to talk more about the MySQL utilities and MySQL Fabric.

There were two side discussions that affected a number of users that I wanted to follow up with here:

My next event will be FOSDEM in Brussels. If you'll be in attendance, please sign up for the MySQL and friends community dinner (Oracle is sponsoring)!

- Morgan

Making strict sql_mode the default

MySQL has sometimes faced criticism for being too relaxed at allowing invalid values, or inserting but truncating values that are out of range. For example:

mysql> CREATE TABLE unsigned_int (a int unsigned);
Query OK, 0 rows affected (0.03 sec)

mysql> INSERT INTO unsigned_int (a) VALUES (-1);
Query OK, 1 row affected, 1 warning (0.01 sec)

mysql> SELECT * FROM unsigned_int;
+------+
| a    |
+------+
|    0 |
+------+
1 row in set (0.00 sec)

In MySQL 5.0 two strict sql_mode options were introduced to be able to change this behaviour:

  • STRICT_ALL_TABLES - Behave more like the SQL standard and produce errors when data is out of range.
  • STRICT_TRANS_TABLES - Behave more like the SQL standard and produce errors when data is out of range, but only on transactional storage engines like InnoDB.

However, these options were disabled by default.

Changes in MySQL 5.6

MySQL 5.6 made the first important step by setting STRICT_TRANS_TABLES "by default" for new installations of MySQL. That is to say that when using one of the MySQL installation packages, the included configuration file includes the line:

# Recommended in standard MySQL setup
sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES

Being in a configuration file only, it has given users an easy way to be able to revert to the old behaviour if their application relies on MySQL's relaxed handling.

(Small Note: The included "default configuration file" could be /etc/my.cnf or BASEDIR/my.cnf which may end up being /usr/my.cnf).

Upcoming Changes in MySQL 5.7

In MySQL 5.7 we are planning to simplify the SQL modes, and potentially even introduce additional strictness to bring MySQL even closer to the SQL standard.

I have previously written about this topic here.

Proposed Changes in MySQL 5.x

We want to make STRICT_TRANS_TABLES the default for all installations. That is to say, whether or not you have specified an sql-mode in configuration, MySQL will reject invalid or out of range values unless you chose to set it otherwise.

However, we also realize that this is one of the changes that impact users the most, so we are soliciting feedback on how we can minimize the impact.

Do you think this change can be included in MySQL 5.7?
Should it wait until MySQL 5.8?
If you discovered that you needed to remove the SQL mode STRICT_TRANS_TABLES from your configuration file after installing MySQL 5.6, what did you think of the experience?

Please leave a comment or get in touch!