Proposal to enable sql mode ONLY_FULL_GROUP_BY by default

We are considering enabling the SQL mode ONLY_FULL_GROUP_BY by default in MySQL 5.7. Here is a quick example to demonstrate how this change will impact you:
Sample Data:

CREATE TABLE invoice_line_items (id INT NOT NULL PRIMARY KEY auto_increment,
invoice_id INT NOT NULL,
description varchar(100)
);
INSERT INTO invoice_line_items VALUES
 (NULL, 1, 'New socks'),
 (NULL, 1, 'A hat'),
 (NULL, 2, 'Shoes'),
 (NULL, 2, 'T shirt'),
 (NULL, 3, 'Tie');

Default behaviour in MySQL 5.6 and below:

mysql> SELECT id, invoice_id, description
FROM invoice_line_items GROUP BY invoice_id;
+----+------------+-------------+
| id | invoice_id | description |
+----+------------+-------------+
|  1 |          1 | New socks   |
|  3 |          2 | Shoes       |
|  5 |          3 | Tie         |
+----+------------+-------------+
3 rows in set (0.00 sec)

Proposed default behaviour in MySQL 5.7:

mysql> SELECT id, invoice_id, description
FROM invoice_line_items GROUP BY invoice_id;
ERROR 1055 (42000): 'test.invoice_line_items.id' isn't in GROUP BY

That is to say that in 5.7 this statement will produce an error.
Notice that columns id and description are not actually unique? Roland Bouman has a blog post from 2007 that describes what MySQL is doing here, and how it is different in behaviour to other databases.
To summarize: MySQL is picking one of the possible values for id and description and the query executes in a non deterministic way. Here’s some more information to prove this:

mysql> SELECT id, invoice_id, description, max(description)
FROM invoice_line_items GROUP BY invoice_id;
+----+------------+-------------+------------------+
| id | invoice_id | description | max(description) |
+----+------------+-------------+------------------+
|  1 |          1 | New socks   | New socks        |
|  3 |          2 | Shoes       | T shirt          |
|  5 |          3 | Tie         | Tie              |
+----+------------+-------------+------------------+
3 rows in set (0.00 sec)
mysql> SELECT GROUP_CONCAT(id), invoice_id, GROUP_CONCAT(description)
FROM invoice_line_items GROUP BY invoice_id;
+------------------+------------+---------------------------+
| GROUP_CONCAT(id) | invoice_id | GROUP_CONCAT(description) |
+------------------+------------+---------------------------+
| 1,2              |          1 | New socks,A hat           |
| 3,4              |          2 | Shoes,T shirt             |
| 5                |          3 | Tie                       |
+------------------+------------+---------------------------+
3 rows in set (0.00 sec)

What is problematic about this behaviour is that it can hide bugs in application code. To highlight two blog posts:

The Proposal

We are proposing to change ONLY_FULL_GROUP_BY to be enabled by default as of MySQL 5.7. The downside in doing this, is that some users upgrading will have statements return errors. These users will be left with two options:

  1. Set sql_mode in their my.cnf (or my.ini) file explicitly, or as part of their application code since sql_mode is configurable per session.
  2. Modify the offending GROUP BY statements to be deterministic.

On the second point, we are also proposing introducing an ANY_VALUE() SQL function to allow statements that are ‘acceptable as non-deterministic’ to run easy retrofit in this less permissive configuration. For example:

mysql> SELECT ANY_VALUE(id) AS id, invoice_id,
ANY_VALUE(description) AS description
FROM invoice_line_items GROUP BY invoice_id;
+------+------------+-------------+
| id   | invoice_id | description |
+------+------------+-------------+
| 1    |          1 | New socks   |
| 3    |          2 | Shoes       |
| 5    |          3 | Tie         |
+------+------------+-------------+
3 rows in set (0.01 sec)

Why I personally like this proposal, is that I compare it to buying an undercooked burger in a North American restaurant. We are proposing to switch our policy from “we may serve it to you without you having realized you ordered it” to “sign a waiver, then we will serve it”.
i.e. We are not removing options, but want you to have to opt-in to what can be an unsafe choice, rather than opt-out.
Fans of standard SQL will rejoice at compatibility with SQL-2003, a standard defined after this behavior was first implemented.

Conclusion

As with other changes that have been proposed for MySQL 5.7, we are soliciting your feedback!

  • Have you accidentally returned incorrect data, and could have benefited from having ONLY_FULL_GROUP_BY enabled?
  • Do you maintain an application that relies on the non deterministic behaviour?
  • Do you agree or disagree that this change is the better default for users?

Please leave a comment, or get in touch!

Update: This proposal has gone ahead. MySQL 5.7 DMR5 has ONLY_FULL_GROUP_BY enabled by default!