BFT: Our case for 100% de-normalization.
As a long-time DBA and database architect I find this idea repugnant: make a database 100% de-normalized, one table where, apart from the one supported query (retrieval by primary key), nothing else works. And yet we have had great success using this kind of database.
This does not replace the original normalized database; rather, it is more like a permanent cache fed from the main database. It is a MySQL database, which gives it certain advantages over Memcached and other true caches: its contents persist until our processes replace them.
Consider what it might take to build a simple web page: get a request, process it (which might take many queries and some significant computation), then send back your HTML.
If you display a page more often than it changes (you read more than you write), a BFT (Big Fat Table) can be quite effective. The idea is simple enough: for each record in our database, do all the joins, the processing, the XML tagging, and anything else, then save the result into a table that is wide (it has text fields for all that processed data) and deep (one row for each record). When we need a record we can retrieve it and use it in our application with minimal processing.
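A minimal sketch of that build step, using sqlite3 as a stand-in for MySQL (the `users`, `posts`, and `bft` table names and the XML layout are invented for illustration; the real schema and processing are whatever your application needs):

```python
import sqlite3

# In-memory stand-in; in practice the normalized schema and the BFT
# live in MySQL, often on separate servers.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    -- The BFT: one wide row per record, processed data in text fields.
    CREATE TABLE bft (id INTEGER PRIMARY KEY, page_xml TEXT);
""")
db.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ana"), (2, "bob")])
db.executemany("INSERT INTO posts VALUES (?, ?, ?)",
               [(10, 1, "hello"), (11, 1, "again"), (12, 2, "hi")])

def build_bft_row(user_id):
    """Do the joins and processing once, and XML-tag the result."""
    name, = db.execute("SELECT name FROM users WHERE id = ?",
                       (user_id,)).fetchone()
    titles = [t for (t,) in db.execute(
        "SELECT title FROM posts WHERE user_id = ? ORDER BY id", (user_id,))]
    posts = "".join(f"<post>{t}</post>" for t in titles)
    return f"<user><name>{name}</name>{posts}</user>"

# REPLACE handles both new and updated rows, so the builder never has
# to know whether a BFT record already exists.
for uid in (1, 2):
    db.execute("REPLACE INTO bft VALUES (?, ?)", (uid, build_bft_row(uid)))

# The one query the BFT supports: retrieval by primary key.
xml, = db.execute("SELECT page_xml FROM bft WHERE id = ?", (1,)).fetchone()
print(xml)  # <user><name>ana</name><post>hello</post><post>again</post></user>
```

All the expensive work happens at build time; the read path is a single primary-key lookup returning ready-to-serve text.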
We can make this work because of the proxy (we use Spockproxy), which shards the database into smaller pieces. Without sharding the BFT would be too big to fit on a single server; in our case a single BFT record is about 3 KB, twice the size of the same record in the normalized database (computed as total data size divided by the number of rows in the primary table).
Writes are slow. Not only do you have to write the record to the main database, you then have to recreate the BFT record and 'REPLACE' it (remember, the BFT is only for reads, so we don't need to worry whether a record is new or replacing an old one). If you need to see saved results instantly you will need to do something clever in your application, since they will not be in the BFT immediately after you write them to the main database.
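A sketch of that write path and its staleness window, again with sqlite3 standing in for MySQL and invented table names:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE bft (id INTEGER PRIMARY KEY, page_xml TEXT);
""")
db.execute("INSERT INTO users VALUES (1, 'ana')")
db.execute("INSERT INTO bft VALUES (1, '<user>ana</user>')")

def rebuild_bft(user_id):
    """Recreate the wide BFT row from the normalized data."""
    name, = db.execute("SELECT name FROM users WHERE id = ?",
                       (user_id,)).fetchone()
    return f"<user>{name}</user>"

# 1. Write to the main (normalized) database.
db.execute("UPDATE users SET name = 'anna' WHERE id = 1")

# 2. The BFT is now stale: reads still return the old row.  An
#    application that must show the save instantly has to handle
#    this window itself, e.g. by reading its own write from the
#    main database.
stale, = db.execute("SELECT page_xml FROM bft WHERE id = 1").fetchone()
print(stale)  # <user>ana</user>

# 3. Rebuild and REPLACE; new rows and updates take the same path.
db.execute("REPLACE INTO bft VALUES (?, ?)", (1, rebuild_bft(1)))
fresh, = db.execute("SELECT page_xml FROM bft WHERE id = 1").fetchone()
print(fresh)  # <user>anna</user>
```

The cost of every write is thus one normal write plus one full rebuild of the affected BFT row, which is the price paid for the cheap reads.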
But the reads shine, particularly if your application is mostly reads, say 90% or more, and you always fetch detailed records by primary key. If you have complicated work to do on your data before you display it (computing its popularity and converting it to XML, for example), you can get very good throughput with little contention.