Proposal to deprecate MySQL INTEGER display width and ZEROFILL

In the MySQL team we are currently discussing if we should deprecate the integer display width in numeric types. For example:

CREATE TABLE my_table (
 id INT(11) NOT NULL PRIMARY KEY auto_increment
);

The (11) does not affect the storage size of the data type, which for an INT will always be 4 bytes. It affects the display width.
Our rationale for proposing this should be deprecated is that it is a common source of confusion amongst users.

We are also discussing deprecating the non-standard ZEROFILL type attribute, which is the only modern consumer of this display width meta data. For example:

CREATE TABLE my_table (
 id INT(11) ZEROFILL NOT NULL PRIMARY KEY auto_increment
);

INSERT INTO my_table VALUES (1);

mysql> SELECT * FROM my_table;
+-------------+
| id          |
+-------------+
| 00000000001 |
+-------------+
1 row in set (0.00 sec)

StackOverflow has a good example of how ZEROFILL is useful:

[..] In Germany, we have 5 digit zipcodes. However, those Codes may start with a Zero, so 80337 is a valid zipcode for munic, 01067 is a zipcode of Berlin.

As you see, any German citizen expects the zipcodes to be displayed as a 5 digit code, so 1067 looks strange.

[..]

This usage is true for any numeric values that require leading zeros, such as some phone numbers.

Upgrade Paths

There are two possible upgrade paths to migrate away from ZEROFILL.

Option #1 – Move to CHAR/VARCHAR

This option is the most transparent for applications, and changes the data type to be a string instead of numeric. For example:

CREATE TABLE my_zip_codes (
 id INT NOT NULL PRIMARY KEY auto_increment,
 zip_code INT(5) ZEROFILL
);
INSERT INTO my_zip_codes (zip_code) VALUES ('01234'), ('54321'), ('00123'), ('98765');

mysql> select * from my_zip_codes;
+----+----------+
| id | zip_code |
+----+----------+
|  1 |    01234 |
|  2 |    54321 |
|  3 |    00123 |
|  4 |    98765 |
+----+----------+
4 rows in set (0.00 sec)

ALTER TABLE my_zip_codes CHANGE zip_code zip_code CHAR(5);

In the case of a CHAR(5) the storage requirements will only be one byte higher than that of an integer. In the case of other data types (phone numbers requiring leading zeros) it might be slightly more efficient to store as an integer.

Option #2 – Format integers at a different layer

This option retains the storage efficiency of an integer, but moves the presentation into the application. For example:

CREATE TABLE my_zip_codes (
 id INT NOT NULL PRIMARY KEY auto_increment,
 zip_code INT(5) ZEROFILL
);
INSERT INTO my_zip_codes (zip_code) VALUES ('01234'), ('54321'), ('00123'), ('98765');
ALTER TABLE my_zip_codes CHANGE zip_code zip_code INT;

mysql> select * from my_zip_codes;
+----+----------+
| id | zip_code |
+----+----------+
|  1 |     1234 |
|  2 |    54321 |
|  3 |      123 |
|  4 |    98765 |
+----+----------+
4 rows in set (0.00 sec)

It will also technically be possible to retrofit this into legacy applications that require ZEROFILL presentation returning from MySQL. This can be done with a query rewrite plugin to modify SELECT statements to add padding:

mysql> SELECT id, LPAD(zip_code, 5, '0') as zip_code FROM my_zip_codes;
+----+----------+
| id | zip_code |
+----+----------+
|  1 | 01234    |
|  2 | 54321    |
|  3 | 00123    |
|  4 | 98765    |
+----+----------+
4 rows in set (0.01 sec)

Conclusion

We are seeking feedback from the community in response to this proposal. If you have found the existing behavior confusing, or will be affected by the removal of zero fill, please leave a comment or get in touch! We would love to hear from you.

Published by

morgo

I joined MySQL AB in 2006, left, and am now back at Oracle working on the MySQL team. I’ve also worked at Percona and InPowered.

  • SUSANTO MAHATO

    I think LPAD mysql native function is better soluton instead of
    “CONCAT(REPEAT(‘0’, 5-LENGTH(zip_code)), zip_code) as zip_code”.

    • This is a great idea. I’ve changed the example 🙂

  • I don’t have a strong opinion on zerofill, but I’ll be very happy the day integer display width is gone!

  • Roland Bouman

    ZEROFILL as well as display length should never have been implemented at the server side in the first place. They are clearly options that have bearing on the client.

    One can imagine an application that would like to store characteristics like this at the server side. But they should do so with an application specific solution, not with a general set of attributes that affect all of server side metadata.

    I don’t want to diss the the examples that are designed to show how these features could be useful. But the limitation should be apparent since standards like ODBC and JDBC have nothing to represent these features. So, they are not even available to a wide range of clients.

    The sooner these features are removed, the better.

    • Vladislav Vaintroub

      Apparently there is getColumnDisplaySize() in JDBC,in ResultSetMetaData class, so might be something in JDBC (and I would suspect ODBC also) to represent these values.

      • Roland Bouman

        Vladislav, hey, that’s a good point. Trying to think why I didn’t remember this feature. Only reason I can think of right now is that this particular JDBC feature was not so well supported last time I checked. (not talking about MySQL, but JDBC in general)

        That said, it’s very possible I checked just once a long time ago – I’d be very happy to be proven wrong on this one.

        • Vladislav Vaintroub

          I do not know. Compliant driver needs to support it , because it is part of the interface . How well different non-MySQL drivers support it, to that I do not have much information. I’m able to find SQLServer driver documentation, and it does look supported https://msdn.microsoft.com/en-us/library/ms378162(v=sql.110).aspx . I’m not able to find any meaningful docs for other drivers.

  • Claudio Nanni

    My two cents are that each column should be of the base datatype closest to the abstract data type it has to contain. INTEGER to me is for numbers, numbers meant as mathematical entities for which leading zeroes don’t matter, if it matters it’s not a number, so it’s not an INTEGER, for all the rest you have varchar. So I agree with Roland that this is more a client thing.

  • Roland Bouman

    Another clue might be this: zerofill and padding don’t change how the value is operated upon as far as the server is concerned. So what do these definitions really mean? THe answer is nothing, untill the data reaches a client that has some way of knowing how to use this information to change the display of these values. Which, to me, confirms that this is a client feature, not server side.

    • Roland Bouman

      mm, it seems I’m wrong on this one, as a string function like CONCAT seems to respect zerofill.

      Well, How terrible! I’m really, really glad I never used ZEROFILL in my schemas.

  • It would be backward compatible and cause no confusion about the meaning if the syntax were:

    zipcode INTEGER ZEROFILL(5)

    That is, INTEGER would never have a “width” argument, but ZEROFILL would. If the column doesn’t have the ZEROFILL option, then no width argument would appear at all when you SHOW CREATE TABLE or mysqldump.

  • How will deprecating INTEGER display width occur? It will rise error or warning?
    If you want to prevent INT(11) syntax usage at all, it will likely break many apps now.
    Maybe some warning indicating that, “specifying integer display width has no effect” will be sufficient for now.
    About zerofill, well, i have never used it before.

  • Further thoughts:

    I agree wholeheartedly that ZEROFILL was a poorly-executed feature, and that it creates a lot of confusion. But deprecating it will break backward compatibility severely for a minority of applications. Though none of mine, so I guess I don’t have a personal objection.

    I understand you are proposing deprecation rather than removal, so does this mean that CREATE TABLE or ALTER TABLE will throw a warning if one uses ZEROFILL? And an error if running in strict mode, which is now the default? What will that do if I restore a mysqldump created on a pre-5.7 machine?

    If the feature is deprecated, what will mysqldump output or SHOW CREATE TABLE output be on 5.7? Will it suppress the ZEROFILL and integer-width options even on columns that had them? Then I couldn’t restore the dump on a pre-5.7 server and get my ZEROFILL columns intact.

    If ZEROFILL disappears after a dump & restore, then columns that used to be ZEROFILL will be stripped silently of that option. This will require people to make code changes before they can upgrade to 5.7, and you’ve proposed no workaround to allow them to upgrade without code changes.

    Also, there are cases that can’t zero-pad in the client, because there’s no code to amend, like reporting tools, or spreadsheets, or even SELECT…INTO OUTFILE when people export CSV data for another system.

    In short, this will probably affect a relatively small number of MySQL users, but it will affect them badly.

    • Hi Bill,

      I think just by saying “deprecated” you create real meaning for some use-cases. For example: I would no longer consider it necessary knowledge for training or certification. Independent of this, the feature can continue to work with perhaps a small warning. Deprecation is a line in the sand to say the use-of is now encouraged. It will not affect 5.7 upgrades.

      Removal (as an independent step) is something that could be phased in. Per my comment in reply to Shahriyar, it is possible to follow the path taken with INSERT DELAYED.

  • I agree with deprecation and eventual removal of display width and zero fill.

    If compatibility with numbers is required virtual columns can be used.


    mysql-5.7.7-rc-debug-log> CREATE TABLE t1 (
    -> id int unsigned auto_increment primary key,
    -> num int unsigned,
    -> num_zerofill char(5) generated always as (lpad(num,5,0)) stored);
    Query OK, 0 rows affected (0.07 sec)

    mysql-5.7.7-rc-debug-log> INSERT INTO t1(num) VALUES (1),(10),(100);
    Query OK, 3 rows affected (0.01 sec)
    Records: 3 Duplicates: 0 Warnings: 0

    mysql-5.7.7-rc-debug-log> SELECT * FROM t1;
    +----+------+--------------+
    | id | num | num_zerofill |
    +----+------+--------------+
    | 1 | 1 | 00001 |
    | 2 | 10 | 00010 |
    | 3 | 100 | 00100 |
    +----+------+--------------+
    3 rows in set (0.00 sec)

    • This is very true! Virtual columns (and views) can maintain a representation of the old format.

  • Simon J Mudd

    I agree with others and think that deprecation is probably a good idea. However you need to provide 2 things:

    * a deadline when this will go, so I’d expect the feature(s) not to exist in 5.8 [whatever 5.8 will be in the future], and
    * provide recommended guidelines on how people using this feature can work around it as they upgrade

    I’m pleased I was not aware of the AUTOFILL option so do not think that any servers I use make use of it.

    If you think of the replication of VersionX to VersionX+1 then you realise that migration requires a bit more thought to prevent breakage.

    That possibly means that 5.7 will need to recognise the options, but have an option to show/use them, whereas 5.8 will need an option _still_ to recognise the syntax (from an upstream 5.7 master) but will just ignore it. Maybe you can go straight to 5.7 recognising but ignoring the syntax, but it will at least need to recognise it to avoid upgrade problems, and may not give people time to adjust code as 5.7 GA is not going to be far away and this might prevent them being able to upgrade quickly if they want to use 5.7 GA once it is available.

    For me this does not look like a problem though which is good.

    • Simon J Mudd

      The upgrade path of converting from int to char(X) is I think unlikely to work for anyone with large tables. The examples shown of using a char(5) instead of a 5-digit int is hiding the reality that longer conversions from say an int or bigint are going to make storage much wider. If you have a few hundred million rows of data and maybe also indexes this difference starts to add up. I’d guess that most people will simply tackle this with a view, virtual column or alternatively managing the visualisation in the application which is probably where it belongs. This of course has no performance or size penalty.

  • Jörg Brühe

    I wholeheartedly agree that the “display width” is a source of confusion.

    Being German, I’m of course aware of our 5-digit postcodes. But I would never agree to a database schema that uses a numeric type to store them: There is a big difference between a digit string and a number. Yes, with a fixed width of 5 places you could use an int and zerofill, but I would discourage everybody proposing this.

    The phone number use case is no valid one at all, IMHO, because I don’t expect phone numbers to have a fixed length. And without a fixed length, I don’t see how zerofill could provide the proper number of leading zeroes.

    IMHO, we have far too many use cases of MySQL where people did not do a proper schema design. I see a strong need to educate some users, programmers, … about data types, especially about the differences between character (even digit) strings and numbers.
    And touching this subject, I see some part of the guilt with MySQL: Why does MySQL surround numbers by quotes, making them a string? Just look at this, in version 5.5.44:

    SQL> show create table t2 G
    *************************** 1. row ***************************
    Table: t2
    Create Table: CREATE TABLE `t2` (
    `ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
    `val` int(11) DEFAULT ’42’,
    PRIMARY KEY (`ts`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1
    1 row in set (0.01 sec)

    So: Yes, I see deprecating the display width as a good step, assuming a good migration strategy, and we should put more stress on the difference between a number and a string.

    Regards,
    Jörg

  • Isn’t this what VIEWs are for? Just have the view do the formatting.

    • and do the right thing when doing updates. Not saying this is an easy thing to achieve in MySQL codebase though.

    • Yes. Daniel and Simon both suggested this in comments 🙂 It is a pretty elegant solution, I am not sure why I did not think of it when writing my original post.

  • Michael Widenius

    Just a couple of comments why I originally implemented width and zerofill for integers:
    – We where using a lot of big tables that had postcode and we needed an efficient storage for them. Being able to store them in a 3 bit integer was back then a big win (especially for indexes).

    – Width was required for getting some ODBC applications to work properly. The width is also used by the command line client “mysql” to decide the width of the output column to use. If you remove width, then the output for ‘mysql’ will be notable larger in some cases.

  • miquel coll

    I agree that this is confusing, because you have the DECIMAL(n,m) where it’s really a precission for that field (i.e., number of digits).

    You can use VIEWS as suggested in other posts, or just do some formatting before displaying the results.

  • What about BOOL, which is an alias for TINYINT(1) (default is 4 instead of 1)? Maybe time for a real BOOL type?

    somewhat related: https://www.percona.com/blog/2008/04/23/efficient-boolean-value-storage-for-innodb-tables/