Faking a slave: Subscribing to mysql row-based-replication changes

In complex systems it’s often useful to be able to receive notification when rows have been modified in MySQL so that you can invalidate various external caches or indexes. For example: memcached, Sphinx, Lucene.

In the case of MySQL’s default statement-based replication this can be quite tricky to do, as it would likely require an SQL parser to determine what was intended to be modified. However, this is made much simpler with Row-based Replication (something I recommend switching to).

A C++ library exists to be able to listen to MySQL replication, but what I want to demonstrate is that it is also very simple to be able to do this with mysqlbinlog:


shell> mysqlbinlog --read-from-remote-server --stop-never --host localhost.localdomain --port 5616 -u msandbox -pmsandbox --verbose mysql_sandbox5616-bin.000004 | grep '^### '

To explain how this command works:

  • --read-from-remote-server tells mysqlbinlog to act like a slave, and fetch remote files rather than local (new option to 5.6).
  • --stop-never makes mysqlbinlog block waiting for a continual stream of updates, rather than exiting. Perfect!
  • --verbose rewrites row-based replication events to be pseudo SQL statements. The pseudo statements are very easy to parse. For example:
    ### DELETE FROM `test2`.`a`
    ### WHERE
    ###   @1=1
    ### INSERT INTO `test2`.`a`
    ### SET
    ###   @1=1
    ### UPDATE `test2`.`a`
    ### WHERE
    ###   @1=10
    ### SET
    ###   @1=20
    

    Noting that a multi-row statement will appear as individual statements via --verbose. Row-based events also default to sending all columns in the row, not just those that are changed.

  • The grep '^### ' statement is just a lazy way of stripping out everything except pseudo SQL statements.

Advanced Usage

You can fairly easily extend the above to track your progress reading through the master’s binary logs. This will make it easier to resume from where you left off if there is a crash. Just remove the grep, and keep track of:

  • Positional markers just before DML events. These are marked in bold here:
    # at 191
    #140526 15:28:27 server id 10  end_log_pos 239 CRC32 0x559a84a8     GTID [commit=yes]
    SET @@SESSION.GTID_NEXT= '9f0ce61c-bb92-11e3-89fd-f056da47d247:17'/*!*/;
    # at 239
    #140526 15:28:27 server id 10  end_log_pos 312 CRC32 0xff074c19     Query   thread_id=1 exec_time=0 error_code=0
    SET TIMESTAMP=1401143307/*!*/;
    BEGIN
    /*!*/;
    # at 312
    #140526 15:28:27 server id 10  end_log_pos 357 CRC32 0x74b1ad7f     Table_map: `test2`.`a` mapped to number 71
    # at 357
    #140526 15:28:27 server id 10  end_log_pos 397 CRC32 0x2c6f8b8d     Write_rows: table id 71 flags: STMT_END_F
    
    BINLOG '
    C8CDUxMKAAAALQAAAGUBAAAAAEcAAAAAAAEABXRlc3QyAAFhAAEDAAF/rbF0
    C8CDUx4KAAAAKAAAAI0BAAAAAEcAAAAAAAEAAgAB//4KAAAAjYtvLA==
    '/*!*/;
    ### INSERT INTO `test2`.`a`
    ### SET
    ###   @1=10
    
  • Events that show that the binary log file itself is being rotated. For example, the result of FLUSH LOGS is:
    # at 4
    #691231 16:00:00 server id 10  end_log_pos 0 CRC32 0x7800af55   Rotate to mysql_sandbox5616-bin.000006  pos: 4
    

Conclusion

This might not be as robust in all cases as using the C++ API, but it sure beats the alternative. Gone are my days of writing complex sets of triggers to write to an “events” table, which I poll continually from an external script.

Published by

morgo

I joined MySQL AB in 2006, left, and am now back at Oracle working on the MySQL team. I’ve also worked at Percona and InPowered.

  • If only there was a usable python wrapper for that c++ API…

    Binlogs could be used in many creative ways: Cache invalidation (memcached, varnish), app notifications (websockets etc).

    • There is actually a wrapper for Ruby:
      https://github.com/y310/kodama

      But yes, in python would be a lot more practical. App notifications is an interesting use-case I hadn’t thought of. I can imagine on a site with user-to-user messaging being able to use this to push out new messages.

  • Justin Swanhart

    This is how FlexCDC captures changes. The log tables produced by FlexCDC can have triggers that send changes to other systems with something like the Gearman UDF. One large online retailer uses FlexCDC to replicate changes from MySQL into Amazon Redshift where they do analytics. Their ETL process reads from the change log tables and writes into Redshift.

    You can find it here:
    http://github.com/greenlion/swanhart-tools

    • Awesome! This use-case escaped me for some reason. Do you capture with mysqlbinlog as well, or use a wrapper for the C++ API?

      • Justin Swanhart

        It is a wrapper. And there are a few gotchas you need to know when trying to use mysqlbinlog in the way described. The first is that unsigned numbers are problematic. There is no way from the log to determine if the signed or unsigned value should be used (both are present in the output). FlexCDC works around this by examining the log table, which will have the same format as the original table when the log record is being processed.

        Timestamp values are represented by a unix timestamp instead of a datetime.

        I also have an experimental consumer that reads RBR by base64 decoding BINLOG statements. Some interesting code to handle 24 bit numbers and other fun stuff in the binlog too.

        • Justin Swanhart

          Wrapper around mysqlbinlog I mean.

  • Valerie

    Oh, this is good. I love a good mysql trick.

  • 林鴻儒

    How can I use the mysqlbinlog to get binary logs from remote server and apply to the local server?
    ex: ??
    mysqlbinlog –read-from-remote-server –stop-never –raw –host 172.16.62.116 -uroot -p mysqlbin.000004 –start-position=999 | mysql -uroot -p