My Electric Vehicle

May 9th, 2010

I’ve always wanted an electric vehicle; I’ve looked to purchase one from time to time but I’ve found they’re either very expensive (think Tesla) or just not very good (think a home conversion of a 1973 Nova with a 20-mile range).

So I built my own.  It’s not even up to a Nova conversion; in fact it’s a converted adult tricycle. But I built it myself, it works, it’s fun to drive and it cost me just under $1,000.  I like to think of it as 1% of a Tesla for about 1% of the cost.

How does it stack up to the Tesla, Nissan Leaf, Think, or any other electric car?  Not at all, really – but compared to the Zap scooter, an electric Cushman, or those “mobility scooters” it can kick some serious rear end.

I’ve got a top speed around 25 – you really wouldn’t want to go faster on it. It has a range of maybe 20 miles and it’s something that’s fun to ride.  Not to mention it is a head turner (although I’m not sure they aren’t laughing) and it’s cheap to run.  I have my Kill-a-Watt monitoring the charger and so far – a dozen trips to the market or hardware store – it’s used 2.3 kWh, which is about 33 cents from my utility.  More on that after I have more data.
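As a back-of-the-envelope check on that 33 cents (assuming a utility rate of roughly $0.143/kWh, which is inferred from the numbers above rather than read off my bill):

```python
# Back-of-the-envelope energy cost check for the trike.
# The rate below is an assumption inferred from "2.3 kWh ~ 33 cents",
# not an exact figure from my utility bill.
kwh_used = 2.3          # Kill-a-Watt total after about a dozen trips
rate_per_kwh = 0.143    # assumed utility rate, dollars per kWh

total_cost = kwh_used * rate_per_kwh
print(f"total so far: ${total_cost:.2f}")       # about $0.33
print(f"per trip:     ${total_cost / 12:.3f}")  # under 3 cents a trip
```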

I’m in hot water

April 7th, 2010

I have a solar hot water system in my home; I think it works well, but how do I know?  Well, now I know.  I bought a simple data logger off eBay and wrote a perl script to load an rrd (round-robin database) and presto:

What does it mean?  Red is the hot water output.  Blue is the cold water in.  The purple line is the bottom of the tank; when it is high, all 160 gallons are warm.  The green line is the water coming out of the collectors, although at night it’s just the water sitting there.

In this chart, which covers about 24 hours, you can see the water cooling off; then around 10 am the green line (the water coming from the solar collector) starts to heat up.  At first it raises the purple line (the coldest water in the bottom of the tank) and then it pushes up the red line (the hot water in the top).

Exciting, no?  I was impressed to get it working, but the database part was way too hard for the casual user – I had trouble with it and I do this all the time.  I’m eager to find a practical solution for the casual user; as I investigate the options I’ll post them here.  Suggestions are welcome.
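For the curious, the “round robin” part is the key idea, and it’s simple enough to sketch. This is an illustration of the concept, not the perl/rrdtool setup I actually ran: a fixed-size store where the newest reading silently overwrites the oldest.

```python
from collections import deque

# A toy round-robin log: keeps only the most recent N readings and
# drops the oldest, which is the core idea behind an RRD.
class RoundRobinLog:
    def __init__(self, slots):
        self.readings = deque(maxlen=slots)

    def update(self, timestamp, temp_f):
        self.readings.append((timestamp, temp_f))

log = RoundRobinLog(slots=3)
for t, temp in [(1, 120.0), (2, 121.5), (3, 123.0), (4, 125.5)]:
    log.update(t, temp)

print(list(log.readings))  # the reading at t=1 has been overwritten
```

A real RRD also consolidates old readings into coarser averages, but the fixed-size, overwrite-the-oldest storage is what keeps the database from growing forever.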

Spockproxy: Load data into each specific shard

April 22nd, 2009

Note – this post is a footnote for my MySQL users conference presentation Sharding Using Spockproxy: A Sharding-only Version of MySQL Proxy.

When you finished my last blog post you had a directory for each shard full of load files. The tables have been created, so now we’re ready to load these files.

These two queries will generate load scripts, modify them as needed or write your own.

SELECT concat('LOAD DATA INFILE \'<dump_dir>/db0', '/', st.table_name, '\' INTO TABLE ', st.table_name, ';') AS ''
FROM shard_table_directory st
WHERE status = 'universal';

and run this once for each shard (change the sr.database_id = 1 to each database_id).

SELECT concat('LOAD DATA INFILE \'<dump_dir>/db', database_id, '/', st.table_name, sr.range_id, '\' INTO TABLE ', st.table_name, ';') AS ''
FROM shard_table_directory st,
     shard_range_directory sr
WHERE status = 'federated' AND range_id < 30
  AND sr.database_id = 1;
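Rather than editing the sr.database_id value by hand for every shard, a small script can stamp out one copy of the query per shard. A convenience sketch – it assumes your shard database_ids simply run 1..N, which may not match your directory:

```python
# Generate one copy of the per-shard load-script query for each shard.
# Assumes shard database_ids run 1..num_shards (an assumption; check
# your shard_range_directory for the real ids).
FEDERATED_LOAD_QUERY = (
    "SELECT concat('LOAD DATA INFILE ...') AS ''\n"
    "FROM shard_table_directory st,\n"
    "     shard_range_directory sr\n"
    "WHERE status = 'federated' AND range_id < 30\n"
    "  AND sr.database_id = {shard_id};"
)

num_shards = 3
scripts = {shard_id: FEDERATED_LOAD_QUERY.format(shard_id=shard_id)
           for shard_id in range(1, num_shards + 1)}

for shard_id, sql in scripts.items():
    print(f"-- script generator for shard {shard_id}\n{sql}\n")
```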

Now you have one universal load script and one load script for each shard. Check these carefully and run each one on the appropriate database.

After this is done you have all the data loaded – you’re almost there. The last step is to make the universal data visible within each shard. Right now it should be replicated to each database server, but it is in its own database. We want to create a view in each shard for each table in the local universal database on that server. Again I have a query to do most of the work for you:

First drop the old views (not needed the first time). Modify the schema-name placeholders – the shard schema in the first query and the universal database name in the second – to match your own:

SELECT CONCAT('DROP VIEW ', TABLE_NAME, ';') AS '--' FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = '<shard_db>' AND TABLE_TYPE = 'VIEW';

and then create the new views.

SELECT CONCAT('CREATE VIEW ', t.TABLE_NAME, ' AS SELECT ', group_concat(COLUMN_NAME ORDER BY ORDINAL_POSITION), ' FROM <universal_db>.', t.TABLE_NAME, ';') AS '--'
FROM INFORMATION_SCHEMA.TABLES t, INFORMATION_SCHEMA.COLUMNS c
WHERE t.TABLE_SCHEMA = '<universal_db>' AND c.TABLE_SCHEMA = '<universal_db>' AND t.TABLE_NAME = c.TABLE_NAME
GROUP BY t.TABLE_NAME;

Save and run the output from these on each shard.

Spockproxy: Dump the sharded data

April 22nd, 2009

Note – this post is a footnote for my MySQL users conference presentation Sharding Using Spockproxy: A Sharding-only Version of MySQL Proxy.

During the development and testing of our Spockproxy I found I was dumping and loading the data from our old non-sharded databases into shards repeatedly while we tried to get the configuration correct. I developed this process and I’ve found that it works easily; hopefully you’ll find it helpful.

1. Set up a directory that your database server can dump to – in it, create one directory for each shard, named 'db1', 'db2', 'db3' and so on, and a directory for the universal db named 'db0'.

2. Modify this SQL: change the directory prefix in the OUTFILE path to the path of the directory you created in step 1 (the parent directory), and change the 30 where it says "AND range_id < 30" to the lowest range_id that contains no data (no harm in going too high; it will just dump empty files).

SELECT concat('SELECT * INTO OUTFILE \'<dump_dir>/db', database_id, '/', st.table_name, '\' FROM ', st.table_name, ' WHERE ', group_concat(concat(column_name, ' BETWEEN ', low_id, ' AND ', high_id) SEPARATOR ' OR '), ';') AS ''
FROM shard_table_directory st,
     shard_range_directory sr
WHERE status = 'federated' AND range_id < 30
GROUP BY st.table_id, sr.database_id;

and

SELECT concat('SELECT * INTO OUTFILE \'<dump_dir>/db0', '/', st.table_name, '\' FROM ', st.table_name, ';') AS ''
FROM shard_table_directory st
WHERE status = 'universal';

3. Save the output of the above statements into a file and look it over. It should be a bunch of dump statements. Does it look good? Run it – this may take some time. You can split it into several files and run them on several read slaves to speed things up.

4. Now you have a bunch of load files, organized into directories for the shards they need to be loaded into.
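The splitting suggested in step 3 is mechanical enough to script. A sketch, assuming one dump statement per line and a round-robin assignment to slaves:

```python
# Split a list of dump statements round-robin into N chunks,
# so each chunk can be run against a different read slave.
def split_statements(statements, num_slaves):
    chunks = [[] for _ in range(num_slaves)]
    for i, stmt in enumerate(statements):
        chunks[i % num_slaves].append(stmt)
    return chunks

# Made-up statements standing in for the generated dump script.
stmts = [f"SELECT * INTO OUTFILE ... ; -- statement {n}" for n in range(7)]
for slave, chunk in enumerate(split_statements(stmts, 3), start=1):
    print(f"slave {slave}: {len(chunk)} statements")
```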

Spockproxy: Dump and load the schemas

April 21st, 2009

Dump and load the schemas (one for universal and one for the shards).

Note – this post is a footnote for my MySQL users conference presentation Sharding Using Spockproxy: A Sharding-only Version of MySQL Proxy.  

Dumping the schema is fairly straightforward using mysqldump.  There are two issues: we need to separate the “universal” tables from the sharded tables, and you need to decide what to do about auto_increment.

Separating the tables is simple. If you have your shard_table_directory table set up the way you like it, run:

SELECT group_concat(table_name SEPARATOR ' ') FROM shard_table_directory WHERE status = 'federated';

and

SELECT group_concat(table_name SEPARATOR ' ') FROM shard_table_directory WHERE status = 'universal';

and keep the outputs separate. We’ll pass the table lists to mysqldump:

mysqldump [OPTIONS] database [tables]

Where the OPTIONS are -d (no data) and probably -p, -u and -h; replace 'database' with your old database name and [tables] with the table list.  Run this and save the two outputs.
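Putting the pieces together, the two invocations end up looking something like this. A sketch only – the host, user, database, and table names below are made up:

```python
# Build the two mysqldump command lines from the table lists produced
# by the queries above. Host/user/database names are placeholders.
def dump_command(database, tables):
    # -d dumps schema only (no data); add your own -h / -u / -p values
    return f"mysqldump -d -h db-master -u dba -p {database} {' '.join(tables)}"

federated = ["users", "messages"]       # output of the 'federated' query
universal = ["countries", "settings"]   # output of the 'universal' query

print(dump_command("olddb", federated)) # save output as the shard schema file
print(dump_command("olddb", universal)) # save output as the universal schema file
```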

OK, now you have two files containing the CREATE TABLE SQL for the universal and sharded tables respectively.  Before you load them you need to decide what to do about auto_increment. Your basic choices are:

1 – still use auto_increment but make sure the shards are set up so they do not use the same ids

2 – let the proxy do it for you

3 – do it yourself using the get_next_id('table_name', number_needed) function

4 – do it yourself in your application

(note: 2 and 3 are compatible, and possibly 4 too if you update the shard_table_directory)
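For option 1, the usual trick is interleaving: give each shard a distinct offset and step by the shard count, so no two shards can ever mint the same id (this is the same effect MySQL's auto_increment_offset and auto_increment_increment variables give you). A sketch of the arithmetic, not Spockproxy code:

```python
# Interleaved id generation: shard k hands out k, k+N, k+2N, ...
# where N is the number of shards, so the id streams never collide.
def next_ids(shard_id, num_shards, count, start=0):
    first = start + shard_id
    return [first + n * num_shards for n in range(count)]

print(next_ids(shard_id=1, num_shards=4, count=3))  # [1, 5, 9]
print(next_ids(shard_id=2, num_shards=4, count=3))  # [2, 6, 10]
```

The cost is that ids no longer tell you insertion order across shards, and adding shards later means picking a new, larger increment.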

When you decide how you want to deal with the incrementing you may want to change the column definitions to remove 'auto_increment'.  Make this and any other changes necessary, then load the universal file into your master universal DB; it will get replicated to the shards. Then load the shard file into each of your shard databases.

I’m looking for sharding problems

March 3rd, 2009
Do you want a SPOCK tee shirt?  Read on:

I’m going to give a talk on Spockproxy (a sharding / connection pooling only version of MySQL proxy) at the MySQL conference and as I prepare I’m looking to give my talk broad appeal and try to address all kinds of problems folks might have sharding their databases.

So I’m throwing this question out to the MySQL community – have you looked into sharding your database(s)?  Did you come up against problems that were difficult to solve? Please take a moment and let me know about them.  I’d like to address how to fix them with Spockproxy.  Even if you’ve solved these issues already or have no intention of using Spockproxy, your problems could be interesting to others; add your sharding problem(s) in the comments below and look for me Wednesday, 04/22/2009, Ballroom E.  I’ll send a SPOCK tee shirt to you for a good problem, but supplies are limited.

BFT: Our case for 100% de-normalization.

January 20th, 2009

As a long-time DBA and database architect I find this idea repugnant: make a database 100% de-normalized; one table, and except for the one query – retrieval by primary key – nothing else works.  And yet we have had great success using this kind of database.

This does not replace the original normalized database; rather, it is more like a permanent cache fed from the main database.  It is a MySQL database, which has certain advantages over Memcached or other true caches, such as being permanent until our processes replace it.

Consider what it might take to build a simple web page: get a request, process it (which might take many queries and some significant processing), then send back your HTML.

In building a typical web page you can have dozens of queries.

If you display a page more often than it changes (read more than you write) a BFT (Big Fat Table) can be quite effective.  The idea is simple enough: for each record in our database, do all the joins, processing, XML tagging, and anything else, then save the result back into a table that is wide (it has text fields for all that processed data) and deep (one row for each record), so that when we need it we can retrieve and use it in our application with minimal processing.

We can make this work because of the proxy (we use Spockproxy), which shards the database into smaller pieces.  Without sharding, the BFT would be too big to fit on a single server; in our case a single record in the BFT is about 3k, twice the size of what it takes to store the record in the normalized database (figured as the total data size divided by the number of rows in the primary table).

Writes are slow.  Not only do you have to write the record to the main database, you then have to recreate the BFT record and ‘REPLACE’ it (remember, the BFT is only for reads, so we don’t need to worry about whether a record is new or replacing an old one).  If you need to see saved results instantly you will need to do something tricky in your application, since they will not be in the BFT immediately after you write them to the main database.
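The write path described above can be sketched like this. It is a toy in-memory stand-in for the real tables; the helper names are made up for illustration:

```python
# Toy sketch of the BFT write path: write to the "main" store,
# rebuild the fully denormalized record, then REPLACE it in the BFT.
main_db = {}   # stands in for the normalized database
bft = {}       # stands in for the Big Fat Table (read-only to the app)

def render(record):
    # stand-in for the joins / processing / XML tagging that build a BFT row
    return f"<user><name>{record['name']}</name></user>"

def save(pk, record):
    main_db[pk] = record        # 1. the normal write to the main database
    bft[pk] = render(record)    # 2. REPLACE the precomputed row in the BFT

save(42, {"name": "spock"})
print(bft[42])  # one wide row, ready to serve with no joins
```

The read side is then a single primary-key lookup against bft, which is the whole point.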

But the reads shine, particularly if your application is mostly reads – say 90% or more – and you always ask for detailed records by primary key.  If you have some complicated work to do on your data before you display it – computing its popularity and converting it to XML, for example – you can get very good throughput with little contention.

Hard Loading – something to avoid.

December 19th, 2008

Last week I got a question about sharding using our Spockproxy.  The question was: how can I create a query for the proxy so that it effectively runs:

/*in shard 1*/
SELECT * FROM table_a WHERE f_key IN (a, b, c);

UNION

/*in shard 2*/
SELECT * FROM table_a WHERE f_key IN (d, e, f);

By design our proxy will not do this.  The whole point is to hide the sharding from the application.  Given a query, it will either send the same query to all the shards and combine the results, or send that query to only one shard when it can figure out that the result(s) can only come from one shard (because you specified the shard key in the WHERE clause).
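That routing decision can be sketched as follows. This is a simplification of range-based sharding for illustration, not Spockproxy's actual code, and the ranges are made up:

```python
# Range-based shard routing, simplified: if the query pins the shard key
# to a single range, send it to that one shard; otherwise fan out to all.
RANGES = [(1, 1000, "shard1"), (1001, 2000, "shard2")]  # made-up ranges

def shards_for(shard_key=None):
    if shard_key is None:
        return [shard for _, _, shard in RANGES]   # fan out, combine results
    for low, high, shard in RANGES:
        if low <= shard_key <= high:
            return [shard]                         # route to a single shard
    return []

print(shards_for(1500))   # ['shard2']
print(shards_for())       # ['shard1', 'shard2'] -- query goes everywhere
```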

I did figure out a way it could be done using views, but would this ever be desirable?

Like “Hard Coding”, where values are built into the code of your application, I’ll call this technique “Hard Loading”: an attribute’s value (other than the shard key) is reflected by the specific shard the data is found in.  At first glance this might seem efficient – but like hard coding, I recommend against it.  Simply growing your system to include a new shard will break things.  Besides rearranging your data you’ll have to examine all of your queries – which ones assume data lives in a particular shard?  How can you fix them?  You find yourself in a position where, because adding a shard is so difficult, you are actually praying that you do not grow.  You never want to be in that position.

Slides from Proxy talk

December 8th, 2008

I’ve reposted the slides from my Spockproxy talk (Spockproxy is a sharding-only version of the MySQL proxy).  Since I’ve had to move this web site, it’s been some effort to move all of the files into their new homes.

These slides are in a variety of formats and have loads of great information if you’re considering a sharding solution – even if that is not Spockproxy.  Of course once you see how easy it is you’ll put it on your short list.

The slides are available at http://www.frankf.us/projects/spockproxy/  If you want to hear the talk you’ll have to invite me.

Spockproxy

December 7th, 2008

Spockproxy is a custom version of the MySQL Proxy that is optimized for (dedicated to) range-based sharding.  It is not done with Lua, so it is perhaps less flexible, but if you are trying to do sharding it is one of the easiest solutions to set up – and it will require minimal, if any, changes to your application code.

It is available as an open source download at SourceForge.