Posts Tagged ‘sharding’

I’m looking for sharding problems

Tuesday, March 3rd, 2009
Do you want a SPOCK tee shirt?  Read on:

I’m going to give a talk on Spockproxy (a sharding / connection pooling only version of MySQL proxy) at the MySQL conference and as I prepare I’m looking to give my talk broad appeal and try to address all kinds of problems folks might have sharding their databases.

So I’m throwing this question out to the MySQL community – Have you looked into sharding your database(s)?  Did you come up against problems that were difficult to solve? Please take a moment and let me know about them.  I’d like to address how to fix them with Spockproxy.  Even if you’ve solved these issues already or have no intension of using Spockproxy your problems could be interesting to others; add your sharding problem(s) in the comment below and look for me Wednesday, 04/22/2009 Ballroom E.  I’ll send a SPOCK tee shirt to you for a good problem but supplies are limited.

Hard Loading – something to avoid.

Friday, December 19th, 2008

Last week I got a question about sharding using our Spockproxy.  The question was how can I create a query for the proxy so it effectively runs:

/*in shard 1*/
SELECT * FROM table_a WHERE f_key IN (a, b, c);


/*in shard 2*/
SELECT * FROM table_a WHERE f_key IN (d, e, f);

By design our proxy will not do this.  The whole point is to hide the sharding from the application.  Given a query it will either send the same query to all the shards and combine the results or only send that query to one shard when it can figure out that the results(s) can only come from one shard (because you specified the shard key in the where clause).

I did figure out a way it could be done using views but would this ever be desirable?  

Like “Hard Coding” where values are built into the code of your application I’ll call this technique “Hard Loading” where an attribute’s value (other than the shard key) is reflected by the specific shard the data is found in.  At first glance this might seem efficient – but like hard coding I recommend against it.  Simply growing your system to include a new shard will break things.  Besides rearranging your data you’ll have to examine all of your queries – which ones assume data lives in a particular shard?  How can you fix them?  You find yourself in a position where because adding a shard is so difficult you are actually praying that you do not grow.  You never want to be in that position.