Spockproxy: Dump the sharded data

Note – this post is a footnote for my MySQL users conference presentation Sharding Using Spockproxy: A Sharding-only Version of MySQL Proxy.

During the development and testing of our Spockproxy I found myself dumping and loading the data from our old non-sharded databases into shards repeatedly while we tried to get the configuration right. I developed this process and I’ve found that it works easily; hopefully you’ll find it helpful.

1. Set up a directory that your database server can dump to. In it, create one directory for each shard, named 'db1', 'db2', 'db3' and so on, plus a directory for the universal db named 'db0'.
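The layout from step 1 can be sketched in shell. The parent directory name `dump` and the shard count are assumptions here; substitute your own path and number of shards:

```shell
# One subdirectory per shard, plus db0 for the universal tables:
mkdir -p dump/db0 dump/db1 dump/db2 dump/db3
# SELECT ... INTO OUTFILE writes as the mysqld user, so the
# directories must be writable by it; a blunt but common approach:
chmod a+w dump/db0 dump/db1 dump/db2 dump/db3
ls dump
```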

2. Modify this SQL: change the path at the start of the INTO OUTFILE string (just before '/db') to the path of the directory you created in step 1 (the parent directory), and change the 30 where it says "AND range_id < 30" to the lowest range_id that contains no data (no harm in going too high; it will just dump empty files).

SELECT concat('SELECT * INTO OUTFILE \'
/db', database_id, '/', st.table_name, '\' FROM ', st.table_name, ' WHERE ',
       group_concat(concat(column_name, ' BETWEEN ', low_id, ' AND ', high_id) SEPARATOR ' or '), ';') AS ''
FROM shard_table_directory st,
     shard_range_directory sr
WHERE status = 'federated' AND range_id < 30
GROUP BY st.table_id, sr.database_id;

and

SELECT concat('SELECT * INTO OUTFILE \'
/db0', '/', st.table_name, '\' FROM ', st.table_name, ';') AS ''
FROM shard_table_directory st
WHERE status = 'universal';

3. Save the output of the above statements into a file and look it over; it should be a bunch of dump statements. If it looks good, run it (this may take some time). To speed things up, you can split it into several files and run them against several read slaves.
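The split-and-parallelize idea in step 3 looks roughly like this. Here `dump_all.sql` is a stand-in file; its real contents come from running the step 2 SQL through the mysql client (for example `mysql -N shard_directory < generate_dumps.sql > dump_all.sql`, where the database and file names are placeholders):

```shell
# Stand-in for the generator's output: four dump statements.
printf 'SELECT 1;\nSELECT 2;\nSELECT 3;\nSELECT 4;\n' > dump_all.sql
# Split into 2-statement chunks, one chunk per read slave:
split -l 2 dump_all.sql chunk_
ls chunk_*   # each chunk can then be run via `mysql somedb < chunk_aa`
```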

4. Now you have a bunch of load files, organized into directories by the shard they need to be loaded into.
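The load side isn't spelled out in the post, but a minimal sketch follows the same directory layout: walk each shard directory and emit one LOAD DATA INFILE statement per dump file, producing one load script per shard. The `dump` path, table name, and `mysql` invocation are assumptions; the stand-in files below only simulate what step 3 would have produced:

```shell
# Stand-in dump files, as the generated OUTFILE statements would create:
mkdir -p dump/db1 dump/db2
touch dump/db1/users dump/db2/users
# Emit one load script per shard directory:
for dir in dump/db*; do
  db=$(basename "$dir")
  {
    for f in "$dir"/*; do
      [ -e "$f" ] || continue          # skip empty shard directories
      tbl=$(basename "$f")             # OUTFILE names match table names
      echo "LOAD DATA INFILE '$PWD/$f' INTO TABLE $tbl;"
    done
  } > "load_$db.sql"
done
# Then run each script against its shard's server, e.g.:
# mysql -h shard1-host shard_db < load_db1.sql
```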
