Archive for April, 2009

Spockproxy: Load data into each specific shard

Wednesday, April 22nd, 2009

Note – this post is a footnote for my MySQL users conference presentation Sharding Using Spockproxy: A Sharding-only Version of MySQL Proxy.

When you finished my last blog post you have a directory for each shard full of load files. The tables had been created so now we’re ready to load these files.

These two queries will generate load scripts, modify them as needed or write your own.

SELECT concat(‘ LOAD DATA INFILE \’
/db0′, ‘/’, st.table_name, ‘\’ INTO TABLE ‘, st.table_name, ‘;’) AS ”
FROM shard_table_directory st
WHERE status = ‘universal’;

and run this once for each shard (change the sr.database_id = 1 to each database_id).

SELECT concat(‘ LOAD DATA INFILE \’
/db’, database_id, ‘/’, st.table_name, sr.range_id, ‘\’ INTO TABLE ‘, st.table_name, ‘;’) AS ”
FROM shard_table_directory st,
shard_range_directory sr
WHERE status = ‘federated’ AND range_id < 30
and sr.database_id = 1;

Now you have one universal load script and one load script for each shard. Check these carefully and run each one on the appropriate database.

After this is done you have all the data loaded – you’re almost there. The last step is to make the universal data visible within each shard. Right now it should be replicated to each database server but it is in it’s own database. We want to create a view in each shard for each table in the local universal database on that server. Again I have a query to do most of the work for you:

First drop the old views (not needed the first time). Modify the <> and the <>

SELECT CONCAT(‘DROP VIEW ‘, TABLE_NAME, ‘;’) AS ‘–‘ FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = ‘<>’ AND TABLE_TYPE = ‘VIEW’ ;

and then create the news views.

SELECT CONCAT(‘CREATE VIEW ‘, t.TABLE_NAME, ‘ AS SELECT ‘, group_concat( COLUMN_NAME order by ORDINAL_POSITION), ‘ FROM <>.’, t.TABLE_NAME, ‘;’) AS ‘–‘ FROM INFORMATION_SCHEMA.TABLES t, INFORMATION_SCHEMA.COLUMNS c WHERE t.TABLE_SCHEMA = ‘<>’ AND c.TABLE_SCHEMA = ‘<>’ and t.TABLE_NAME = c.TABLE_NAME GROUP BY t.TABLE_NAME;

Save and run the output from these on each shard.

Spockproxy: Dump the sharded data

Wednesday, April 22nd, 2009

Note – this post is a footnote for my MySQL users conference presentation Sharding Using Spockproxy: A Sharding-only Version of MySQL Proxy.

During the development and testing of our Spockproxy I found I was dumping and loading the data from our old non-sharded databases into shards repeatedly while we tried to get the configuration correct. I developed this process and I’ve found that id works easily; hopefully you’ll find it helpful.

1. set up a directory that your database server can dump to – in it create one directory for each shard and name them ‘db1’, ‘db2’, ‘db3’ and so on and a directory for the universal db as ‘db0’.

2. modify this SQL, change the
to the path to the directory you created in step 1 (the parent directory) and change the 30 where it says “AND range_id < 30” to be the lowest range_id that contains no data (no harm in going too high it will just dump empty files).

SELECT concat(‘ SELECT * INTO OUTFILE \’
/db’, database_id, ‘/’, st.table_name, ‘\’ FROM ‘, st.table_name, ‘ WHERE ‘, group_concat( concat(column_name, ‘ BETWEEN ‘, low_id, ‘ AND ‘, high_id) SEPARATOR ‘ or ‘), ‘;’) AS ”
FROM shard_table_directory st,
shard_range_directory sr
WHERE status = ‘federated’ AND range_id < 30
GROUP BY st.table_id, sr.database_id;

and

SELECT concat(‘ SELECT * INTO OUTFILE \’
/db0′, ‘/’, st.table_name, ‘\’ FROM ‘, st.table_name, ‘;’) AS ”
FROM shard_table_directory st
WHERE status = ‘universal’;

3. save the output of the above statement into a file – look it over. It should be a bunch of dump statements. Does it look good? Run it – this may take some time. You can perhaps split it into several files and run it on several read slaves to speed things up.

4. now you have a bunch of load files, organized into directories of the shards they need to be loaded into.

Spockproxy: Dump and load the schemas

Tuesday, April 21st, 2009

Dump and load the schemas (one for universal and one for the shards).

Note – this post is a footnote for my MySQL users conference presentation Sharding Using Spockproxy: A Sharding-only Version of MySQL Proxy.  

Dumping the schema is fairly straight forward using mysqldump.  There are two issues: we need to separate the “universal” tables from the sharded tables and you need to decide what to do about auto_increment.

Separating the tables is simple if you have your shard_table_directory table set up the way you like it then run:

SELECT group_concat(table_name, SEPARATOR ‘ ‘) FROM shard_table_directory WHERE status = ‘federated’;

and

SELECT group_concat(table_name, SEPARATOR ‘ ‘) FROM shard_table_directory WHERE status = ‘universal’;

and keep the outputs separate. We’ll pass the table lists to mysqldump:

mysqldump [OPTIONS] database [tables]

Where the OPTIONS are -d (no data) and probably -p -u and -h; replace ‘database’ with your old database name and [tables] with the table list.  Run this and save the two outputs.

OK now you have two files containing the create table SQL for the universal and sharded tables respectively.  Before you load them you need to decide what to do about auto_increment. Your basic choices are: 

1 – still use auto_increment but make sure the shards are set up so they do not use the same id’s

2 – let the proxy do it for you

3 – do it yourself using the get_next_id(‘table_name’, number_needed) function 

4 – do it yourself in your application

(note: 2 and 3 are compatible, possibly 4 too if you update the shard_table_directory)

When you decide how you want to deal with the incrementing you may want to change the column definitions to not be ‘auto_increment’.  Make this and any other changes necessary then load the universal file to your master universal DB; it will get replicated to the shards. Then load the shard file to each of your shard databases.