Background
One of the most requested features in Django is that it support connecting to multiple databases at once. This can come in several flavors, but the two most common cases are sharding, and (vertical) partitioning. If you've been waching closely, some of the core developers have been saying in various places for a few months now that this is technically possible, right now, in Django 1.0.
Of course, being technically possible is a long way from being easy. Right now there is no public API for dealing with multiple databases. So why do the developers say that it's possible to do? The answer is simple: shortly before Django 1.0 was released, much of the internals of QuerySet objects (Django's interface to the database) were refactored to use object state-level connection objects instead of a global connection object.
This seemingly-small change opens the doors for multiple databases, even if there is no API in front of it. So let's create an API. We're going to be focusing on vertical partitioning, since it's slightly easier, but the technique demonstrated here will be illustrative when implementing sharding as well. Oh, and since we're poking deep into the core of Django's internals, I'm obliged to give the standard disclaimer: this is not supported and may break in future versions of Django, so use these techniques at your own risk.
First things first
The first thing that needs to be done when implementing multiple database support is to supply Django with the information about all of the databases that you would like to connect to. Here's how that should look in settings.py:
DATABASE_ENGINE = 'sqlite3'
DATABASE_NAME = 'primary.db'
DATABASE_USER = ''
DATABASE_PASSWORD = ''
DATABASE_HOST = ''
DATABASE_PORT = ''
DATABASES = dict(
primary = dict(
DATABASE_ENGINE=DATABASE_ENGINE,
DATABASE_NAME=DATABASE_NAME,
DATABASE_USER=DATABASE_USER,
DATABASE_PASSWORD=DATABASE_PASSWORD,
DATABASE_HOST=DATABASE_HOST,
DATABASE_PORT=DATABASE_PORT,
),
secondary = dict(
DATABASE_ENGINE=DATABASE_ENGINE,
DATABASE_NAME='secondary.db',
DATABASE_USER=DATABASE_USER,
DATABASE_PASSWORD=DATABASE_PASSWORD,
DATABASE_HOST=DATABASE_HOST,
DATABASE_PORT=DATABASE_PORT,
),
)
We have not only created the typical database information that Django requires, but we've also created a dictionary containing information about all of the databases that we intend to connect to. In this case, we are connecting to two sqlite databases in the same directory, named primary.db and secondary.db.
Let's now create an app, named blog (I know, I know, very unoriginal). The models.py will look like this:
import datetime
from django.db import models
class Post(models.Model):
title = models.TextField()
body = models.TextField()
date_submitted = models.DateTimeField(default=datetime.datetime.now)
class Link(models.Model):
url = models.URLField()
description = models.TextField(null=True, blank=True)
date_submitted = models.DateTimeField(default=datetime.datetime.now)
And we hook it up to the admin and settings.py in the normal manner. For more information on how to do this, follow the official tutorial. We're going to be storing the Post objects in the primary database, and the Link objects in the secondary database. Since they don't have any foreign keys, we don't have to worry about joins. (They are possible, but not easy to describe in one post.)
Multiple databases
We should probably write some code that will inspect all of our models and create only the tables that we want in each database. For the sake of simplicity and practicality of a blog post, we're not going to do that. Instead, we will simply create all of the schema on both databases. The management command to do so might look something like this (I called it multi_syncdb):
from django.core.management.base import NoArgsCommand
from django.core.management import call_command
from django.conf import settings
class Command(NoArgsCommand):
help = "Sync multiple databases."
def handle_noargs(self, **options):
for name, database in settings.DATABASES.iteritems():
print "Running syncdb for %s" % (name,)
for key, value in database.iteritems():
setattr(settings, key, value)
call_command('syncdb')
All of this has been fine, but the real workhorse of multiple database support lies in the model's Manager. Let's write a multi-db aware manager right now:
from django.db import models
from django.conf import settings
from django.db.models import sql
from django.db.transaction import savepoint_state
try:
import thread
except ImportError:
import dummy_thread as thread
class MultiDBManager(models.Manager):
def __init__(self, database, *args, **kwargs):
self.database = database
super(MultiDBManager, self).__init__(*args, **kwargs)
def get_query_set(self):
qs = super(MultiDBManager, self).get_query_set()
qs.query.connection = self.get_db_wrapper()
return qs
def get_db_wrapper(self):
database = settings.DATABASES[self.database]
backend = __import__('django.db.backends.' + database['DATABASE_ENGINE']
+ ".base", {}, {}, ['base'])
backup = {}
for key, value in database.iteritems():
backup[key] = getattr(settings, key)
setattr(settings, key, value)
wrapper = backend.DatabaseWrapper()
wrapper._cursor(settings)
for key, value in backup.iteritems():
setattr(settings, key, value)
return wrapper
def _insert(self, values, return_id=False, raw_values=False):
query = sql.InsertQuery(self.model, self.get_db_wrapper())
query.insert_values(values, raw_values)
ret = query.execute_sql(return_id)
query.connection._commit()
thread_ident = thread.get_ident()
if thread_ident in savepoint_state:
del savepoint_state[thread_ident]
return ret
I know that's a lot of code! Let's go through each piece one-by-one. In the __init__ function, we're just taking in the name of the database that we want to use, and passing the rest into the inherited __init__ function.
get_query_set gets the QuerySet instance that it would have gotten, but replaces the connection on the query object with one provided by the manager, before returning the QuerySet. In essence, this get_db_wrapper function is doing the bulk of the work.
get_db_wrapper first gets the dictionary of the database connection information for the given database name (captured from __init__), then dynamically imports the correct database backend from Django. It then sets the global settings to the values that they should be for that database (while backing up the original settings for restoration later). Then, it initializes that database connection, and restores the settings to their original values.
Most of the database operations are done through the QuerySet, there is still one operation which takes place elsewhere--saving. To account for that, we needed to override the _insert method on the manager. In fact, all we're doing here is providing the InsertQuery with the correct connection and executing that query. Then, we need to ensure that the query is committed and do any transaction management that's necessary.
That's it!
How do we specify that one ore more models will use another database then? Because so far all that we have done is write this MultiDBManager. We will just add one line assigning the manager to our Link model. The model now looks like this:
class Link(models.Model):
url = models.URLField()
description = models.TextField(null=True, blank=True)
date_submitted = models.DateTimeField(default=datetime.datetime.now)
_default_manager = MultiDBManager('secondary')
Conclusion
The MultiDBManager can be re-used for any number of models to be partitioned on to any number of databases. The hard part is making sure that none of the models in one database reference any models in the other database. It's possible to do it, by storing the foreign key as a regular integer and querying for all of the referenced model instances through Python instead of using the database (for obvious reasons), but then it becomes much harder.
It will be great when Django provides a public API for doing this in a more transparent way, but for now this works. Please let me know if you use any of these techniques for large scale Django deployments, and if so, what were the problems that were encountered along the way?
All Content


By Atamert Ölçgen at 5:22 a.m. on Nov. 15, 2008
Great post! I don't need sharding for any of my projects. But now I can see how to implement it if needed.
I think it would look much cooler if the database name was a class variable. But I guess it would require patching the models.Manager. Such as;
class MyClass(models.Model):
database = 'secondary'
By Jannis Leidel at 6:42 a.m. on Nov. 15, 2008
Very cool, Eric. One little nitpick with the way the databases are specified in the settings. I know that it's just Python, but it seems cleaner to me to prevent dict() calls. It would be nicer to just use a simple tuple approach and convert them to dictionaries in the backend.
See http://dpaste.de/605/ for an example.
By Jijnes Patel at 8:48 a.m. on Nov. 15, 2008
Nice post. If you're using MySQL, you could use MySQL Proxy, which will accept connections from your Django application server and inspect the queries. Based on rules written in Lua, you could direct queries using the primary / secondary model you have in this post, or split reads to a Slave instance and writes to a Master.
By Leah Culver at 11 a.m. on Nov. 16, 2008
That's pretty much how we did things :D Thanks to Malcolm for adding the hooks in Django 1.0 that allow for multiple databases!
By Lis at 4:28 a.m. on Nov. 20, 2008
I'm getting deadlocks when trying to update on mysql/innodb
http://dpaste.com/92408/
By enoch at 11:53 p.m. on Nov. 20, 2008
looks great. any troubleshooting help. i setup the same stuff and keep getting a problem in the admin views for the Link objects that "Something's wrong with your database installation. Make sure the appropriate database tables have been created, and make sure the database is readable by the appropriate user."
what's odd is, that's on listing, but on "add new" link, it brings up a from fine to add a new link....
By enoch at 1:38 p.m. on Nov. 21, 2008
fixed the other problem. type on database name on my part.
looking at a new problem. when i try to access the Link objects i get "type object 'Link" has no attribute 'objects'
l = Link.objects.all()
By guille_ at 7:48 a.m. on Nov. 22, 2008
Not thread-savvy. It would be cool to pass the settings as a dictionary and not changing the global settings object.
By jfy3d at 7:45 a.m. on Nov. 27, 2008
Forenkey has error
how fix it ?
By re at 2:04 a.m. on Dec. 6, 2008
good reading!!
By Adrian Holovaty at 11:56 p.m. on March 10, 2009
As of http://code.djangoproject.com/changeset/10026, this code snippet will no longer work. Fortunately, it's a lot easier/cleaner now. The get_db_wrapper() method can be rewritten something like this (untested):
def get_db_wrapper(self):
database = settings.DATABASES[self.database]
backend = __import__('django.db.backends.' + database['DATABASE_ENGINE']
+ ".base", {}, {}, ['base'])
return backend.DatabaseWrapper(database)
By Jordan Keyes at 10:02 a.m. on March 16, 2009
I have tried this both with Django 1.0.2 and with the latest SVN version (from 3/16/09) using the settings Adrian mentioned. Both ways, the multi_syncdb shown seems to be doing a two-way sync (putting all of my models in both databases).
If I run "python manage.py shell" I can manually create items in the tables, but it doesn't appear to be using the _default_manager to decide where to put the items, just going for the secondary database.
Also, it seems the manager is working, but only partially. Any time I try to access the objects in the tables, I get errors like "type object 'TwebprPress' has no attribute 'objects'" (for TwebprPress.objects.all()) Have I done something wrong?
For reference, I am using Python 2.5, Django from SVN, django-mssql from SVN for SQL 2005 connection.
By Jordan Keyes at 1:15 p.m. on March 16, 2009
I may have solved my own problem. In your example I changed the line
"_default_manager = MultiDBManagr('secondary')" to
"objects = MultiDBManager('secondary')" on each model (changing secondary to the appropriate DB name from the Dict.) and things seem to be working appropriately now. The multi_syncdb is still a little borked, but I can do without that.
Thanks for the great tutorial, Eric!
By J at 4:19 a.m. on March 31, 2009
FYI:
Using _default_manager in a model causes many idle MySQL connections. (Django 1.0)
<a href="http://groups.google.com/group/django-users/browse_thread/thread/6ed797f6b00dca90">http://groups.google.com/group/django-users/browse_thread/thread/6ed797f6b00dca90</a>
By Praveen at 12:23 a.m. on April 3, 2009
Hi Nice articles, i have not tried but may be in future will really help.
By Richard McMillan at 12:25 p.m. on May 11, 2009
For those on SVN django: as of 10059 you will need to add :
use_for_related_fields = True
to the subclassed manager for this to work.
By ben 10 oyunları at 3:06 a.m. on May 25, 2009
Based on rules written in Lua, you could direct queries using the primary / secondary model you have in this post, or split reads to a Slave instance and writes to a Master.
By wholesale jewelry at 5:40 a.m. on May 29, 2009
Good website,nice post.
it is interesting.
By Nigel Cohen at 8 a.m. on June 2, 2009
I spent many years trying to create multiple databases with MySQL. I finally got to the simple solution (for the several transactional based database applications we wrote) of preceding the table name with the database name (eg. db1.person inner join db2.organisation).
Could this essential (for large apps) multi-database functionality not be achieved quite simply by setting up a variable in the Model object allowing a database name to be updated - which the MySQL generating code looks at when generating the MySQL code?
By Cheap WoW Gold at 4:38 a.m. on June 13, 2009
I’d like to know more details, thx.
By Buy Coursework at 12:30 a.m. on June 17, 2009
Thanks for the great tutorial.
By Thesis Help at 12:31 a.m. on June 17, 2009
Then, we need to ensure that the query is committed and do any transaction management that's necessary.
By sexy lingerie at 8:09 p.m. on June 22, 2009
This 'false-object' design has had a severe impact on the django code base, which has largely gone unnoticed and, I believe, is contributing to the problems they now face getting releases out the door. When was 1.1 supposed to be out?
By jordan shoes at 1:11 a.m. on June 25, 2009
Thanks for the great tutorial
By jordan shoes at 1:11 a.m. on June 25, 2009
’d like to know more details, thx.
By ugg boots at 1:12 a.m. on June 25, 2009
Good website,nice post.
it is interesting.
By nike shoes at 1:13 a.m. on June 25, 2009
Hi Nice articles, i have not tried but may be in future will really help.
By tiffany jewellery at 1:13 a.m. on June 25, 2009
Thanks for the great tutorial.