Beyond the FRM: ideas for a native MySQL Data Dictionary

The frm file has provided long service since the earliest days of MySQL. Now, it is time to replace it with a native InnoDB-based Data Dictionary.

This is a change that has been on our wish list for a long time, as well as others in the MySQL development community:

For historical context:

  • Every table in MySQL has at least a corresponding .frm file. For example in the case of MyISAM these three files constitute a complete backup:

    mytable.MYI
    mytable.MYD
    mytable.frm
    

  • The .frm file stores information such as column names and data-types. It is a binary format, which as Stewart explains predates MySQL and spans back to 1979 with UNIREG. In addition to the .frm file, there are .trn, .trg and .par files which have been added over time to support triggers and partitioning.

Our motivation to develop a native dictionary spans from a number of issues with the current filesystem-based formats:

  1. The current .frm file predates MySQL's support for transactions, and has a lot of complexity to handle various failure states in replication and crash recovery. For example: Bug#69444. Using a native data dictionary simplifies code and makes handling failure states very simple.

  2. Our information_schema implementation currently suffers, and has been subject to years of complaints. By using a native dictionary, we will be able implement information_schema as views over real tables, significantly improving the speed of queries.

  3. On a closely related point, we currently build information_schema on top of a series of differing filesystem properties, while attempting to provide the same cross-platform experience. The code to account for filesystem case insensitivity increases code complexity, and ties up developer resources that could be better spent elsewhere.

  4. Aside from the MySQL server's data dictionary, storage engines may also store their own data dictionary. In the case of the InnoDB storage engine, this redundant storage has led to complexity in troubleshooting an out-of-sync data dictionary.

  5. The current non-comformity of the data dictionary (using .frm, .par, .trn and .trg files) spans from a lack of extensibility from the original .frm format. Not having a centralized extensible repository makes it difficult to incorporate feature requests that require additional meta data stored, or to offer new relational objects in the future.

  6. The current format does not support versioning meta-data in such a way that we can use it to assist in the upgrade experience between MySQL versions.

This change is of course in addition to my recent post about storing system tables in InnoDB.

While this change will be transparent for many of our users, we are inviting feedback from the community. Please let us know:

  • If you have a use-case where you interact with the file-based formats directly.
  • What features you want to see in a native data dictionary!

You can either leave a comment, or email me.

  • Justin Swanhart

    Can you also look at transactional DDL once the data dictionary is transactional?

    • http://www.tocker.ca/ Morgan Tocker

      Did you want to have the honor of filing the feature request on bugs.mysql.com? :)

  • Laurynas Biveinis

    Please make sure that the data dictionary schema fixes http://bugs.mysql.com/bug.php?id=43039 by design, i.e. if someone installs a storage engine plugin, creates some tables, uninstalls the plugin, installs the plugin again, possibly with different plugin id (but the same name), the tables can still be dropped.

    • http://www.tocker.ca/ Morgan Tocker

      Thank you for the feedback.

  • Aleksandr Kuzminsky

    Brilliant idea! I've been always advocating for storing the structure in one place.
    SYS_* however don't store some information .frm does. For example, DECIMAL type is described as VARBINARY type, but without precision. Probably just because there was no DECIMAL in the original version of InnoDB. Comments, table options are missing, etc.

    I guess the current implementation of the dictionary should be extended.
    Anyway, way to go, that would solve many problems.

  • http://karwin.blogspot.com Bill Karwin

    I think this is great, it should simplify and modernize the code, but it will also simplify DBA understanding of the system, I hope. Right now it's not widely understood what the FRM file is, and what one can do with it.

    Fortunately, it has never been supported to poke around in the FRM file, and there are few tools to do anything like that. So MySQL has the high ground to make this change without breaking any implied backward compatibility.

    This change would make InnoDB a mandatory storage engine (along with the recent proposal to make mysql.* use InnoDB by default). The change also moves further away from old-school MySQL, where people habitually use cp and mv to restructure or back up databases. Which is all fine -- if that's the goal.

    Third-party storage engines will be impacted. I'm not sure by how much, but it seems like if you change the way metadata is managed, it would affect all storage engines. Perhaps MySQL internals need a compatibility API for a transitionary period, so that storage engine vendors don't have to do a big-bang rewrite.

    Would this replace the in-memory data dictionary that InnoDB maintains? Or would InnoDB keep that mechanism side by side? Seems like that would be duplication and a waste of memory. But rewriting all of that inside InnoDB at the same time may be biting off more than one can chew.

    How would visibility to metadata be implemented if the metadata is stored in conventional InnoDB tables? Currently, if I don't have privilege to query a given user table, I also can't see a row for that table in I_S. I don't know how that could be implemented if the metadata is stored in InnoDB. I.e. we can do column-level privileges, but not row-level privileges.

    Desired features:

    Provide more tools to satisfy the "cp and mv" users. We now have transportable InnoDB tablespaces, but frankly they're awkward to use for more than one table at a time. If MySQL claims to maintain storage engine independence, you need command-line tools to do physical exports and imports of sets of tables, regardless of storage engine. Importing should do the necessary bookkeeping to merge tables into the metadata.

    Base system tables should be for common metadata that applies to any storage engine, not for features that are specific to a storage engine (e.g. ROW_FORMAT, CHECKSUM, UPDATE_TIME, etc.). For each row in TABLES, there should be a row in another table, e.g. TABLES_INNODB, which has all the InnoDB-specific metadata.

    Please stop storing qualified, delimited identifiers like "`database`.`table`" in some system tables, while splitting that into table_schema and table_name in other places. Always split into two columns.

    Move the new system tables in a new schema. The "sys" schema name seems to be catching on.

    Clarify the separation of dynamic runtime information, such as INNODB_BUFFER_PAGE_LRU, versus static metadata (that changes only when DDL is executed). I'm not sure where the dividing line is, since things like table_rows and data_length are a gray area. But decide on a policy, document it, and stick to it.

    Implement SQL-standard-compliant INFORMATION_SCHEMA as system views layered on top of the "real" system tables that MySQL uses. Some other RDBMS implementations have done this. These SQL-standard views should be the only views present in INFORMATION_SCHEMA by default. Deprecate all INFORMATION_SCHEMA tables that aren't SQL-standard and move that metadata to sys (or wherever).

    Provide an optional SQL script that creates views against new system tables, to mimic old MySQL-proprietary I_S tables.

    As this feature is being developed, it should be a requirement to test performance and memory consumption with large numbers of metadata objects. We see quite a few sites with hundreds of thousands of tables. This is not even exotic anymore, it's becoming mainstream. It would certainly be embarrassing to roll out this feature and have it fall over or consume gigs of memory if someone has 100k tables.

    How to make users accept this change happily? Easy -- make metadata queries lightning fast. That will be a compelling and long-desired improvement. Perhaps implement MEMORY tables that cache metadata, auto-populating on server startup from the persistent InnoDB metadata tables. DDL would update the persistent tables of course, but they could trigger updates to the MEMORY tables. The MEMORY tables should have appropriate indexes for common use cases, and allow creation of additional indexes. Especially sites with 100k's of tables would love this.

    Improve diagnostics and instrumentation. In the old days, we had things like the virtually undocumented tablespace monitor. I assume that there would need to be P_S tables to expose metadata about the metadata. :-)

    • http://www.tocker.ca/ Morgan Tocker

      Thank you for the detailed questions and feedback. I don't have answers to them all yet, but I will see what we can do in the coming weeks.

      • http://karwin.blogspot.com Bill Karwin

        One more question: how would a site upgrade to the new MySQL version that supports this change, if they want to do an in-place upgrade of a 2TB datadir with minimal downtime?

        Would one have to dump and restore the whole datadir? Would mysql_upgrade handle it? Would the new software automatically detect existing .FRM files and slurp them into the new InnoDB system tables?

        Not expecting these answers to be available right now of course -- I'm just posing the questions for the developers to think about (not that the same questions wouldn't occur to them).

        • jynus

          I want to do a +1 on the binary compatibility forward and backwards. That doesn't mean that it has to be automatic, or easy, but please make a way in and out that doesn't require a logical export/import from 5.6 to 5.7.

  • Jean-François Gagné

    When implementing this, make sure that MyISAM tables can still be move around. Some people rely on the fact that the frm, myi and myd files can be copied from one server to another: a similar feature should exist when frm files disappear. And it should still work on systems that host more than 100.000 tables and maybe up to millions of tables (yes, those exist).