You won't see this in the Django documentation, you won't see it mentioned on other blogs, and you certainly won't hear the core developers of Django boasting about it, but Django's ORM has a secret weapon: it supports SQL group_by and having clauses, and it has for quite some time.
It's not part of the public QuerySet API, but rather a part of the private Query API. But just because it's not part of the public API doesn't mean that it's not easy to use--it just means that it might change in the future. So it's really a "use at your own risk" type of deal now. If you're up for keeping on top of things so that you know what to change when the next version of Django comes out, then read on. First, let's start with some model definition:
class TumbleItem(models.Model):
title = models.CharField(max_length=255)
item_type = models.CharField(max_length=50)
def __unicode__(self):
return '%s: "%s"' % (self.item_type, self.title)
A simple tumblog item. Very simple, as in, not really useful at all. But that's OK since this is just a demonstration. To demonstrate, let's create some data:
>>> ti1 = TumbleItem.objects.create(title='Blog Post 1', item_type='blog')
>>> ti2 = TumbleItem.objects.create(title='Blog Post 2', item_type='blog')
>>> ti3 = TumbleItem.objects.create(title='Blog Post 3', item_type='blog')
>>> ti4 = TumbleItem.objects.create(title='Article Dugg 1', item_type='digg')
>>> ti5 = TumbleItem.objects.create(title='Article Dugg 2', item_type='digg')
>>> ti6 = TumbleItem.objects.create(title='Link Saved 1', item_type='link')
OK now that we've loaded some data, let's use the group_by functionality!
>>> qs = TumbleItem.objects.all()
>>> qs.query.group_by = ['item_type']
>>> item_types = [i.item_type for i in qs]
>>> item_types
[u'blog', u'digg', u'link']
There we go, it's quick, easy, and it seems to Just Work. But let's try to grab only the item_types which have more than one item:
>>> qs = TumbleItem.objects.all()
>>> qs.query.group_by = ['item_type']
>>> qs.query.having = ['COUNT(item_type) > 1']
>>> item_types = [i.item_type for i in qs]
>>> item_types
[u'blog', u'digg']
And now we've successfully used the group_by and having functionality in the Django ORM. I'm excited for some aggregation functionality to start being exposed as a public API, as I'm sure it will be more elegant than this solution, but at the same time this is a neat hidden secret in the Django ORM. Well now you have the knowledge, so you have the power, and it's up to you to use it wisely!
All Content


Awesome tip! This will help me get rid of a lot of nasty "extra" statements and custom SQL.
A.w.e.s.o.m.e
Great! I get rid of hundred lines of raw-sql just for this aggregation stuff.
Is it possible to do a count(*) query for grouped lines in this way?
Great tip, I had always thought these features will be added with the aggregation support in 1.1, and didn't know they were already there—Thanks!
I had *no* clue this was possible. It would have helped a lot in building RSSmeme! Thanks Eric.
Oh nice - thanks for that. :)
Hi,
You might want to check out the latest patch on ticket #3566 which might make it to trunk soon (I hope =) ). It is important especially because the these tricks will probably break with public API aggregation support.
Very good hidden tip for the time been though =)
Thanks for this post. It was inspirational. ;-)
http://groups.google.com/group/web2py/t/22cb295d35c2a7f8
You might want to check out the latest patch on ticket #3566 which might make it to trunk soon (I hope =) ). It is important especially because the these tricks will probably break with public API aggregation support.
Great! I get rid of hundred lines of raw-sql just for this aggregation stuff.
I'm glad to hear it! Shoot me an e-mail if you have any problems.
yeah
!
yes!!!
qs.query.group_by = ['item_type']
seems to have problems with postgresql.
additional:
ProgrammingError: ERROR: column id must appear in the GROUP BY clause or be used in an aggregate function
You've pointed out that this is internal API and it's going to be one of those cases where it really will change. The "having" implementation is very fragile at the moment and has to be changed (it will end up looking more like Query.where). So any code treating it as a list won't work on trunk very shortly.
I hope people are really, really paying attention when they use internal stuff like this and have lots of tests so that they will notice when the internal code changes.
This is something i definitely will use in my code. Thanks!
Hmm, after reading a lot of enthusiastic comments I don't think that sharing of tips like that is very useful.
But anyway thanks for sharing!
I will not use it in my projects because I have a lot of code already written and tested.I will wait when it will become "regular" magic instead of "black".
Neato!
Bookmarked.to.delicious.
This is great.
yes, this is a great find and we've been working on finding a solution for aggregate functions.
Thanks
2mmy8bvajdecb7wl
Any ideas how to make resulting queryset respect order_by ?
B.w.e.s.o.m.e
good writen