Database triggers aren't evil, and they actually kind of rock

Who says they suck?

Nobody that I've seen has come out and actually said that they don't like database triggers, but at the same time, Python (Django) programmers like to program in Python. And PL/pgSQL certainly is not Python. There's a tendency to do everything in Python--especially with the use of Django's dispatcher.

But there's some serious overhead with that approach, and roundtrips, and race conditions, etc. If you're using a good database, there's an alternative: you guessed it, database triggers.

Let's set up some models

Here will be our models for the remainder of this post:

class Bookmark(models.Model):
    title = models.CharField(max_length=100)
    url = models.URLField(max_length=255)

    num_votes = models.PositiveIntegerField(null=True, blank=True, default=0)
    score = models.IntegerField(null=True, blank=True, default=0)

class Vote(models.Model):
    bookmark = models.ForeignKey(Bookmark, related_name='votes')
    value = models.IntegerField()

As you can tell, we have a straightforward Bookmark and Vote models here. But there are also two denormalized fields: num_votes, and score.

Doing it in Python

The advantage of doing this in Python is that it's simple and Django supports it out of the box. Here's how the code for that would look:

from django.db.models import signals

def update_bookmark_aggregate(sender, instance, **kwargs):
    bmark = instance.bookmark
    bmark.num_votes = bmark.votes.count()
    bmark.score = sum(bmark.votes.values_list('value', flat=True))
    bmark.save(force_update=True)
signals.post_save.connect(update_bookmark_aggregate, sender=Vote)

Very simply, every time a vote is saved, the update_bookmark_aggregate function is called which updates the bookmark with its new score and num_votes.

Doing it in Pl/pgSQL

Create a new file, named management.py under your bookmarks app directory. Its contents will be as follows:

from django.db.models import signals
from bookmarks import models

sql = """
CREATE LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION update_bookmark_aggregate_trigger()
    RETURNS "trigger" AS '
    DECLARE
        new_score INTEGER;
        new_num_votes INTEGER;
    BEGIN
        SELECT COUNT(*) INTO STRICT new_num_votes FROM bookmarks_vote
            WHERE bookmark_id = NEW.bookmark_id;
        SELECT COALESCE(SUM(value), 0) INTO STRICT new_score FROM bookmarks_vote
            WHERE bookmark_id = NEW.bookmark_id;
    UPDATE bookmarks_bookmark
        SET
            score = new_score,
            num_votes = new_num_votes
        WHERE id = NEW.bookmark_id;
    RETURN NEW;
    END;' LANGUAGE 'plpgsql' VOLATILE;

CREATE TRIGGER update_bookmark_aggregate_trigger AFTER INSERT OR UPDATE
    ON bookmarks_vote FOR EACH ROW
    EXECUTE PROCEDURE update_bookmark_aggregate_trigger();
"""

def create_trigger(app, created_models, verbosity, **kwargs):
    from django.db import connection
    cursor = connection.cursor()
    cursor.execute(sql)
signals.post_syncdb.connect(create_trigger, sender=models)

In this file we have declared two variables, new_score, and new_num_votes. We do two queries to get the aggregate data. And then we update the bookmark to reflect the new aggregated values. This script is executed once when the bookmarks models are first loaded into the database, and we're all set!

Let's see how it works

>>> from bookmarks.models import Bookmark, Vote
>>> b = Bookmark.objects.create(title="Blog", url='http://eflorenzano.com/')
>>> b.num_votes
0
>>> b.score
0
# There is no aggregate data yet
>>> Vote.objects.create(bookmark=b, value=1)
<Vote: Vote object>
>>> Vote.objects.create(bookmark=b, value=2)
<Vote: Vote object>
# We need to re-query for the bookmark, due to no identity map in Django.
>>> b = Bookmark.objects.all()[0]
>>> b.num_votes
2
>>> b.score
3

Voila! This was all done in the database behind the scenes. Very cool, very fast, and it kind of rocks.

21 Comments So Far...

By Andreas at 12:43 a.m. on Nov. 4, 2008

Cool stuff indeed, i have nothing about doing stuff in other languages than python if theres is a need for it but if you compare the amount of lines of code for django and sql you'll understand why id picked the django way. Would be awesome though if there was someway to generate triggers from python code with nice django model abstraction sauce on it!

 

By Erik Karulf at 1:10 a.m. on Nov. 4, 2008

One of the coolest things I've seen ever seen done with a database actually used postgres and memcached together.

The application used pgMemcache (http://pgfoundry.org/projects/pgmemcache/) to have database triggers populate / expire caches from the database instead of the application.

 

By Eric Florenzano at 12:22 a.m. on Nov. 4, 2008

Funny you should mention that--our cache invalidation solution at at Mochi Media works very similarly to that. Probably shouldn't go into details about the secret sauce, but yeah, it's quite cool.

 

By Jiri Barton at 5:11 a.m. on Nov. 4, 2008

Cool, and I have gone a bit further with the idea. I'm keeping an object hierarchy in my application and I need to view it flattened too. I'm using triggers to keep both views in sync, just like you.

However, I just set a flag in the trigger like this:

...
FOR EACH ROW BEGIN SET @call_resolve_hierarchy = 1; END;

And then call a stored procedure at the end of each request (and thus transaction) by middleware. This procedure checks the flag and updates the hierarchy if needed. The advantage is the sync will be done by a single query at the end.

I'm so happy about your post to see the triggers have been a good idea.

 

By James Tauber at 6:37 a.m. on Nov. 4, 2008

Before moving to Django, Quisition made extensive use of hand-coded SQL including triggers but I got rid of all that when I ported. It was my first time using PostgreSQL for a site so maybe I went overboard, but it would be nice to explore a hybrid approach.

Of course that would make it even more difficult to run on Google App Engine. In many respects, triggers and the like are the opposite approach to that taken by BigTable, etc.

 

By Bartek at 6:52 a.m. on Nov. 4, 2008

Great post

One question: Why can't this exact method be placed into the save() of a vote? Whenever a vote is placed, modify the Bookmark's vote data

I guess it's just a bit more sloppy then this, which places everything outside on it's own but I'm curious if you have other reasons.

 

By Vidar at 7:12 a.m. on Nov. 4, 2008

You can also write your postgres triggers in python :

http://www.postgresql.org/docs/8.3/interactive/plpython-trigger.html

 

By Norbert at 9:56 a.m. on Nov. 4, 2008

Triggers have some downsides too. They do their invisible work and tend to be forgotten over time.

Don't forget that your model is not normalized. score and num_votes could (and should?) be computed at runtime.

If you don't want to do that triggers or updateable views are fine solutions.

 

By UloPe at 11:06 a.m. on Nov. 4, 2008

Well yeah triggers sometimes are cool, but they also have BIG disadvantages:

- You loose database independence
- You have to have internal knowledge of the ORM's name mangling scheme
- You need to reflect every change in the model to the trigger

So I think triggers are (in this case) a classic case of performance optimization (as is the denormalization of data in the first place).

 

By Jason F. McBrayer at 9:20 a.m. on Nov. 5, 2008

I'm with Norbert and UloPe. Database triggers are a little evil, but there are cases where they make sense.

Basically, the problem is that they put business logic into your storage layer. Sometimes, that's a good idea -- notably where you have lots of different applications (written by different people, at different times, in different languages) accessing the same database backend, and you need to make sure that your business logic runs no matter what the application. This is <em>not</em> the normal use case for Django, where your models are normally part of your application.

Also, normally it's a bad idea to scatter business logic all over the place -- keep it in your application where it belongs. Like Norbert says, because their work is invisible to the application developer, they get forgotten over time.

So yeah, they're an optimization, and you know what they say about optimization.

 

By wow powerleveling at 9:02 p.m. on Nov. 6, 2008

Triggers have some downsides too. They do their invisible work and tend to be forgotten over time.

Don't forget that your model is not normalized. score and num_votes could (and should?) be computed at runtime.

If you don't want to do that triggers or updateable views are fine solutions.

 

By wow power leveling at 7:22 p.m. on Nov. 7, 2008

Yes, but what happens when a data center gets blown up or catches fire. Is every single bit of data replicated multiple times? The more replication going on, the more expensive it is, so I doubt it's nearly as replicated as we'd all like to think. Thus another misconception of the reliability of the cloud I'm afraid.

 

By David at 1:07 p.m. on Nov. 8, 2008

This is what is called a materialized view and your trigger IS evil. Doing a count on tables like that within a view has the potential for some serious performance problems. As with many systems in a small scale environment this probably won't be an issue.

There is no reason for you to do those select counts every time. This is very similar to a banking example.

I would get rid of the select counts and just to the following:

UPDATE bookmarks_bookmark
SET score = score + 1
, num_votes = num_votes + NEW.value
WHERE id = NEW.bookmark_id;

That's all you need. There's no need to recalc the num_votes or the number of bookmarks everytime. That's the purpose of triggers to maintain integrity across tables.

 

By ben 10 oyunları at 3:10 a.m. on May 25, 2009

i In contrast to CHAR, VARCHAR values are stored as a one-byte or two-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.

 

By buy wow gold at 4:45 a.m. on June 13, 2009

Thx for sharing the exciting info with us.

 

By Wholesale Lingerie at 8:15 p.m. on June 22, 2009

Have lots of fun in Japan! I lived there for four years and had a blast. If you don't have your travel plans made already, I heartily recommend you check out the Kansai region, especially Nara. There's a lot of great historical stuff to see and it's all pretty close together.

 

By jordan shoes at 1:51 a.m. on June 25, 2009

Triggers have some downsides too. They do their invisible work and tend to be forgotten over time

 

By jordan shoes at 1:51 a.m. on June 25, 2009

There is no reason for you to do those select counts every time. This is very similar to a banking example.

 

By ugg boots at 1:52 a.m. on June 25, 2009

i In contrast to CHAR, VARCHAR values are stored as a one-byte or two-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.

 

By nike shoes at 1:52 a.m. on June 25, 2009

Don't forget that your model is not normalized. score and num_votes could (and should?) be computed at runtime.

 

By tiffany jewellery at 1:53 a.m. on June 25, 2009

Thx for sharing the exciting info with us.

 

Voice your opinion...