Proposal to deprecate collation_database and character_set_database settings

In the MySQL team, we are currently discussing deprecating the ability to change the collation_database and character_set_database settings. As part of our plan, mysql clients will still be able to access these variables, but they will be read-only.

Introduction

The MySQL manual summarizes these two variables as saying:

This option is dynamic, but only the server should set this information. You should not set the value of this variable manually.

This is to say that usage is already discouraged, but these two variables originally had a purpose of allowing a DBA to change the default character-set temporarily for the current session only. For example:

SET SESSION character_set_database=latin5;
LOAD DATA infile 'foo.txt' ... ;
-- foo.txt is interpreted as if it was in latin5 character set

The most obvious use case for this is in loading data from a text file which contains an arbitrary character set. However, it is worth noting that the LOAD DATA INFILE syntax has since added the ability to specify a character-set.

Known Issues

The functionality of collation_database and character_set_database differs somewhat from the original design goal, and presents usability issues for our users. Specifically:

  1. Both variables can be set GLOBAL; yet it mostly does not apply

    The original feature was designed for these two variables to be set on a per-session basis. However, an undocumented feature is that they can both be set on a GLOBAL basis without error. For example:

    mysql> SET GLOBAL character_set_database=latin5;
    Query OK, 0 rows affected (0.01 sec)
    

    However, when set on a GLOBAL basis these variables will have largely no effect. New sessions will always copy character set information from the current database unless a database is not selected:

     SET GLOBAL @@character_set_database=latin5;
    
     [disconnect]
     [connect, without choosing the current database]
    
     SELECT @@character_set_database; -- session counterpart takes
     its initial value from the global counterpart
    

    We consider any behaviour which will only apply in specific circumstances like this as confusing, and potentially harmful to user experience.

  2. Behaviour broken by fix for Bug #2326

    As part of the fix for Bug #2326, the collation_database variable no longer affects the CREATE TABLE statement. That is to say that setting character_set_database and collation_database does not apply for all statements!

  3. Variables are not replicated prior to fix for Bug#15126

    Before Bug#15126 was fixed, setting of a local variable would not be replicated; thus relying on this feature would break replication. While this is no longer a current issue, it helps support our current understanding that the usage of this feature is minimal.

  4. Does not apply to stored programs

    Setting these variables has no effect on stored programs (triggers, functions, procedures), base tables or views — all these objects use the character set and collation of the database they belong to.

Known bugs:

  • Bug #35357 – “character_set_database does not effects CREATE DATABASE without character set”
  • Bug #27208 If no current database, character_set_database !=character_set_server
  • Bug #27687 ALTER TABLE … CONVERT TO uses @@collation_database

Our Proposal

Our plan is to deprecate both SESSION and GLOBAL variables collation_database and character_set_database starting with MySQL 5.7. In MySQL 5.8, we plan to remove the GLOBAL variables and make accessing the SESSION variables read-only.

We are currently seeking feedback relating to this plan:

  • Do you currently use the collation_database and character_set_database variables?
  • Do you have a use-case that we have not currently considered?

Please get in touch! You can either leave a comment, or contact me via email!

Published by

morgo

I joined MySQL AB in 2006, left, and am now back at Oracle working on the MySQL team. I’ve also worked at Percona and InPowered.

  • Jeremy Cole

    I didn’t even realize that these variables *were* changeable. Definitely okay with them being read-only. 🙂

  • Peter Laursen

    Those variables will constitue defaults for new tables created in the database/schema. Some people will have more databases on a server with different charset/settings for each (some legacy application – forums software etc. – require latin1, for instance)

    So I think it is a very bad idea to remove them. But there is no reason why they should be dynamic (this is nonsense to me, actually).

    • Hi Peter, I think there may be a misunderstanding here. Per (2) above, changing these variables will not affect how a table is created.

      It will still be possible to have a per-schema default character set, and it will still be possible to read collation_database and character_set_database. It just won’t be possible on a per-session basis to overwrite this temporarily.

      • Peter Laursen

        OK .. then I did not understand properly.

  • Peter Zaitsev

    MySQL character set support is so complicated due to legacy reasons. Any way to make it more simple is great.