Recently I launched Peevalizer, a website for talking about your pet peeves, which of course was written in Python using the Django web framework. In fact, it was the culmination of my efforts to teach myself design, and while I made some progress, it's clear that I'll never be a designer. Anyway, part of Peevalizer is that users can vote on different pet peeves, and view the peeves with the highest score. I used django-voting as the application to enable this functionality, and it provides a manager on the Vote object with methods for getting the top N results, where N is a positive integer.
One of the reasons for the custom manager on Vote is that aggregate support in Django has not yet been finished. However, with Django's built-in pagination support, it's necessary to retrieve not only a list of the top N voted pet peeves, but a list of all of the pet peeves, ordered by score. How is this possible? Specifically, how is this possible without forking django-voting? Here is the solution that I came up with:
from django.contrib.contenttypes.models import ContentType
from django.db import models
from voting.models import Vote

class VoteAwareManager(models.Manager):
    def _get_score_annotation(self):
        model_type = ContentType.objects.get_for_model(self.model)
        table_name = self.model._meta.db_table
        # Correlated subquery: sum the votes cast on each row of this model.
        # COALESCE turns "no votes" (NULL) into a score of 0.
        return self.extra(select={
            'score': 'SELECT COALESCE(SUM(vote),0) FROM %s WHERE content_type_id=%d AND object_id=%s.id' %
            (Vote._meta.db_table, int(model_type.id), table_name)}
        )

    def most_hated(self):
        return self._get_score_annotation().order_by('-score')

    def most_loved(self):
        return self._get_score_annotation().order_by('score')
Then I assigned that manager to all of the models that could be voted on. The extra() call literally issues a subquery for every row, which aggregates all of the votes for that row, and assigns the result to an attribute named score.
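To see the shape of the query outside of Django, here is a minimal sketch using the standard library's sqlite3 module. The peeves and votes tables, their columns, and the scored_rows helper are all made up for this illustration; they only mimic the per-row subquery described above and are not django-voting's actual schema.

```python
import sqlite3

def scored_rows(conn, content_type_id):
    """Return (id, title, score) tuples, highest score first."""
    sql = (
        "SELECT p.id, p.title, "
        "(SELECT COALESCE(SUM(v.vote), 0) FROM votes v "
        " WHERE v.content_type_id = ? AND v.object_id = p.id) AS score "
        "FROM peeves p ORDER BY score DESC, p.id"
    )
    return conn.execute(sql, (content_type_id,)).fetchall()

# Hypothetical tables standing in for a voted-on model and the vote table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE peeves (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE votes (object_id INTEGER, content_type_id INTEGER, vote INTEGER);
    INSERT INTO peeves VALUES (1, 'Loud chewing'), (2, 'Slow walkers'), (3, 'Reply all');
    INSERT INTO votes VALUES (1, 7, 1), (1, 7, 1), (2, 7, -1), (1, 8, 1);
""")

rows = scored_rows(conn, 7)
for row in rows:
    print(row)
```

The row with no votes still appears, with a score of 0, which is exactly what COALESCE is there for. The vote with content_type_id 8 is ignored because it belongs to a different kind of object.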
However, we also wanted to allow for voting on User objects, and the User model is built into Django and cannot be easily changed. How do we add this manager to User? I spent a while thinking about that before realizing that it's not the right question to ask. The right question to ask is, how can we associate the User model with this manager? A quick look through some Django source code revealed this to be an absolutely trivial task. Here's how it goes in our code:
from django.contrib.auth.models import User

manager = VoteAwareManager()
manager.model = User
for user in manager.most_hated():
    # Do something with user's score
    print("%s: %s" % (user.username, user.score))
There are a few things to note about this implementation. Firstly, this method can be much more computationally expensive than django-voting's own method (which executes some custom SQL), so either be aware of that or use aggressive caching to compensate. Secondly, if you're not using a manager like this on multiple models, it might be simpler to skip the manager entirely: managers mostly just proxy to QuerySet anyway, so you can acquire a QuerySet on the model you're interested in and call its extra() method in the calling function.
Django now supports Model Inheritance, and one of the coolest new opportunities that model inheritance brings is the possibility of creating mixins, so in this post I'll walk through the steps I went through to create some simple examples. This is just an exercise (although it could be modified to be more robust), and right now there are better ways to achieve all of the effects of the following mixins (see django-mptt, for example).
Model and Field Setup
First let's just set up two basic models. The first will be our mixin, NaiveHierarchy, which has a single field, parent, which is a reference to itself. Using this, we can traverse the tree and find all sorts of fun hierarchical information. Also, we'll create the canonical example model: the blog post. Our models start out looking something like this:
from django.db import models

class NaiveHierarchy(models.Model):
    parent = models.ForeignKey('self', null=True)

    class Meta:
        abstract = True

class BlogPost(NaiveHierarchy):
    title = models.CharField(max_length=128)
    body = models.TextField()

    def __unicode__(self):
        return self.title
Now let's test to make sure that worked. We'll create some data and test that parent exists on the instances.
>>> from mixins.models import BlogPost
>>> bp = BlogPost.objects.create(title="post1", body="First post!")
>>> bp2 = BlogPost.objects.create(title="post2", body="Second post!", parent=bp)
>>> bp3 = BlogPost.objects.create(title="post3", body="Third post!", parent=bp2)
>>> bp.parent
>>> bp2.parent
<BlogPost: post1>
Inherited Class-Level Methods
So as you can see, everything is working correctly! But that really doesn't save us much time yet, as it's fairly easy to copy and paste fields onto new models, and we still have to write methods which take advantage of those new fields. In this case, I already know that I'm going to want to get the related children and descendants of my blogposts. So why not write those methods on the abstract model? Thanks to inheritance, those methods will apply to the new model as well.
class NaiveHierarchy(models.Model):
    parent = models.ForeignKey('self', null=True)

    def get_children(self):
        return self._default_manager.filter(parent=self)

    def get_descendants(self):
        descs = set(self.get_children())
        for node in list(descs):
            descs.update(node.get_descendants())
        return descs

    class Meta:
        abstract = True
Now, getting all the children or descendants of a particular node is easy:
>>> bp.get_children()
[<BlogPost: post2>]
>>> bp.get_descendants()
set([<BlogPost: post2>, <BlogPost: post3>])
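The recursion in get_descendants can be tried without a database. The following is only a sketch: Node is a hypothetical plain-Python stand-in for a model instance, and its registry list plays the role of the table that filter(parent=self) would query.

```python
class Node(object):
    """Toy stand-in for a model row; `registry` plays the role of the table."""
    registry = []

    def __init__(self, title, parent=None):
        self.title = title
        self.parent = parent
        Node.registry.append(self)

    def get_children(self):
        # Similar in spirit to filter(parent=self): one lookup per call.
        return [n for n in Node.registry if n.parent is self]

    def get_descendants(self):
        # Start with the direct children, then fold in each child's descendants.
        descs = set(self.get_children())
        for node in list(descs):
            descs.update(node.get_descendants())
        return descs

post1 = Node("post1")
post2 = Node("post2", parent=post1)
post3 = Node("post3", parent=post2)

titles = sorted(n.title for n in post1.get_descendants())
print(titles)  # ['post2', 'post3']
```

Note that get_children is called once for every node visited, which in the Django version means one query per node. That is part of what makes this hierarchy "naive".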
Now this NaiveHierarchy mixin is starting to become quite useful! But what happens if I want to get all of the BlogPosts that have no parents? It's really manager-level functionality. So let's write a manager which defines a get_roots function. Unfortunately, using abstract managers doesn't quite work yet (it works for non-abstract inheritance), but it probably will in future versions of Django. In fact, by applying the latest patch on either Django ticket 7252 or 7154, it will work today. Let's see how this would look:
class NaiveHierarchyManager(models.Manager):
    def get_roots(self):
        return self.get_query_set().filter(parent__isnull=True)

class NaiveHierarchy(models.Model):
    parent = models.ForeignKey('self', null=True)

    tree = NaiveHierarchyManager()

    def get_children(self):
        return self._default_manager.filter(parent=self)

    def get_descendants(self):
        descs = set(self.get_children())
        for node in list(descs):
            descs.update(node.get_descendants())
        return descs

    class Meta:
        abstract = True

class BlogPost(NaiveHierarchy):
    title = models.CharField(max_length=128)
    body = models.TextField()

    objects = models.Manager()

    def __unicode__(self):
        return self.title
Note that we needed to explicitly define objects as the basic manager, because once a parent class specifies a manager, it gets set as the default manager on all inherited subclasses. This would play out exactly how you would expect:
>>> BlogPost.tree.get_roots()
[<BlogPost: post1>]
>>> BlogPost.tree.all()
[<BlogPost: post1>, <BlogPost: post2>, <BlogPost: post3>]
Advanced Stuff
So now I really wanted to push the limit, and write a mixin which would enhance one of the basic methods of all Model classes: save(). This would be a DateMixin which would contain date_added and date_modified, where date_modified was updated on each save. To my surprise, this Just Worked. Let's see the final result:
import datetime
from django.db import models

class DateMixin(models.Model):
    date_added = models.DateTimeField(default=datetime.datetime.now)
    date_modified = models.DateTimeField()

    def save(self, *args, **kwargs):
        # Refresh the modification timestamp on every save, then defer to
        # the next save() in the inheritance chain.
        self.date_modified = datetime.datetime.now()
        super(DateMixin, self).save(*args, **kwargs)

    class Meta:
        # Abstract, like NaiveHierarchy, so the mixin gets no table of its own.
        abstract = True

class NaiveHierarchyManager(models.Manager):
    def get_roots(self):
        return self.get_query_set().filter(parent__isnull=True)

class NaiveHierarchy(models.Model):
    parent = models.ForeignKey('self', null=True)

    tree = NaiveHierarchyManager()

    def get_children(self):
        return self._default_manager.filter(parent=self)

    def get_descendants(self):
        descs = set(self.get_children())
        for node in list(descs):
            descs.update(node.get_descendants())
        return descs

    class Meta:
        abstract = True

class BlogPost(NaiveHierarchy, DateMixin):
    title = models.CharField(max_length=128)
    body = models.TextField()

    objects = models.Manager()

    def __unicode__(self):
        return self.title
Conclusions
Mixins can be powerful tools, but there are some hazards in using mixins, which all boil down to the same basic problem: unexpected consequences. In the case of the DateMixin, if any other class has defined a save() method, our custom save() method simply won't be called unless called explicitly. Perhaps this is a documentation problem, but perhaps it's a fault in the idea of a date mixin altogether.
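The save() hazard is easy to reproduce in plain Python. This is a sketch with made-up classes (Base stands in for models.Model, and Audited and Cooperative are hypothetical competing classes). It shows that the mixin's save() only runs when every class ahead of it in the method resolution order calls super().

```python
class Base(object):
    """Stand-in for models.Model; records which save() methods ran."""
    def __init__(self):
        self.log = []

    def save(self):
        self.log.append("Base.save")

class DateMixin(Base):
    def save(self):
        self.log.append("DateMixin.save")
        super(DateMixin, self).save()

class Audited(Base):
    # Defines save() but never calls super(), so the chain stops here.
    def save(self):
        self.log.append("Audited.save")

class Cooperative(Base):
    # Defines save() and passes control along the chain.
    def save(self):
        self.log.append("Cooperative.save")
        super(Cooperative, self).save()

class BrokenPost(Audited, DateMixin):
    pass

class WorkingPost(Cooperative, DateMixin):
    pass

broken = BrokenPost()
broken.save()
print(broken.log)   # ['Audited.save']

working = WorkingPost()
working.save()
print(working.log)  # ['Cooperative.save', 'DateMixin.save', 'Base.save']
```

In BrokenPost the date-updating save() is silently skipped, which is exactly the kind of unexpected consequence described above.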
So, all that being said, I'm not suggesting that you go off and start using any of the mixins that I have provided here; they are meant to illustrate how a mixin can be constructed with Django's new Model Inheritance. I do hope that a reusable app emerges with some great mixins that are useful for a large variety of tasks. Mixins are powerful, they are one of the new shiny things that Django can do, and new shiny things are worth exploring!
Everyone has had the experience of hearing about something new and thinking: "That makes so much sense! Why didn't I think of that?" For programmers who keep up with open source software, new projects that fit that description not only attract our admiration but also make us want to be a part of the new idea. We become involved and contribute and try to push that new software in any new direction that we can, learning from it and evolving it along the way.
One such idea that fits my description perfectly is Processing.js. Not to belittle John Resig's hard work in actually developing the initial codebase, but the idea is what is so much more important. Thousands of developers knew of both the Processing language and about the canvas tag which is coming to prevalence, but it was a revolutionary idea to notice that the pairing of the two was "both possible and desirable to do in the first place", as Reddit commenter MarshallBanana pointed out.
As a community we need both the revolutionary ideas and the evolutionary changes, so that we get great software that solves problems in new and innovative ways, but that also doesn't have bugs and provides a polished experience. But I think that we've become too bogged down in the evolutionary. We get so wrapped up in others' ideas--so interested in polish and shine--that few of us think outside the boundary of the incremental. I won't claim to be the exception here, and rightly can't claim to be, but it's something that's worrisome nonetheless.
I think that a big part of it is that the open source community has gotten so wary of experimentation with well-established applications. Why can't a development version of Firefox include a Python or Ruby interpreter alongside a JavaScript interpreter? Why can't CSS directives for reflections be explored, or animations be built into the rendering engine? I think that another big part of it is that we've spent so long talking about validation and standards that we forgot about that sense of wonder; that feeling of anything being possible with a bit of code and enthusiasm.
Processing.js, and projects like it, give me hope that revolutionary ideas are still out there. They rekindle that sense of wonder in me. They make me think about other things that are possible. They make me excited about open source again. Let's foster more and greater and better ideas, and just once in a while, eschew the incremental.
There are many times when the programming task at hand is to iterate over some semi-structured text, transform parts of that text in some way, and reintegrate those transformed parts back into the original text.
Typically this is done using a regular expression with re.sub and a callback function, but sometimes you want a bit more control of the process (especially over those parts that don't match the regex). Usually my solution is to write a one-off function that does it, but today I had to write that function yet again and decided to generalize it and post it here.
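For reference, the typical re.sub approach looks something like the sketch below. The shout callback and the pattern are made up for illustration. The callback receives each match object and returns the replacement text, while everything that doesn't match passes through untouched and out of your control.

```python
import re

def shout(match):
    """Callback: receives a match object, returns the replacement text."""
    return match.group(0).upper()

word_re = re.compile(r"\basdf\b")
result = word_re.sub(shout, "This is an asdf test.")
print(result)  # This is an ASDF test.
```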
To be completely honest, this post is more for my own archival purposes than for the internet as a whole, but if anyone else finds it useful, then I'm ecstatic.
import re

def re_parts(regex_list, text):
    """
    An iterator that returns the entire text, but split by which regex it
    matched, or none at all. If it did, the first value of the returned tuple
    is the index into the regex list, otherwise -1. A match that overlaps a
    part that has already been returned is skipped.

    >>> first_re = re.compile('asdf')
    >>> second_re = re.compile('an')
    >>> list(re_parts([first_re, second_re], 'This is an asdf test.'))
    [(-1, 'This is '), (1, 'an'), (-1, ' '), (0, 'asdf'), (-1, ' test.')]
    >>> list(re_parts([first_re, second_re], 'asdfasdfasdf'))
    [(0, 'asdf'), (0, 'asdf'), (0, 'asdf')]
    >>> list(re_parts([], 'This is an asdf test.'))
    [(-1, 'This is an asdf test.')]
    >>> third_re = re.compile('sdf')
    >>> list(re_parts([first_re, second_re, third_re], 'This is an asdf test.'))
    [(-1, 'This is '), (1, 'an'), (-1, ' '), (0, 'asdf'), (-1, ' test.')]
    """
    # Gather every match from every regex, ordered by start position; ties
    # go to the regex that appears earlier in regex_list.
    matches = []
    for index, regex in enumerate(regex_list):
        for match in regex.finditer(text):
            matches.append((match.start(), index, match))
    matches.sort(key=lambda item: item[:2])

    prev_end = 0
    for start, index, match in matches:
        if start < prev_end:
            # Overlaps text that was already yielded, so skip it.
            continue
        if start > prev_end:
            # Unmatched text between the previous part and this match.
            yield (-1, text[prev_end:start])
        yield (index, text[start:match.end()])
        prev_end = match.end()

    last_bit = text[prev_end:]
    if len(last_bit) > 0:
        yield (-1, last_bit)
Lately there's been a lot of discussion in certain programming communities about which method of object extension makes more sense: inheritance, or composition. Most of the time these discussions turn into debates, and when that happens developers tend to "take sides"--often moving towards extremist positions on the issue. I've been sort of quietly thinking about it all lately, trying to determine which use case warrants which approach. Here I show examples of both, explore some properties and consequences of both composition and inheritance, and finally talk about my own preferences.
Examples of Composition and Inheritance
Before talking about the consequences of inheritance vs. composition, some simple examples of both are needed. Here's a simplistic example of object composition (using Python, of course, as our demonstration language):
class UserDetails(object):
    email = "floguy@gmail.com"
    homepage = "http://www.eflorenzano.com"

class User(object):
    first_name = "Eric"
    last_name = "Florenzano"
    details = UserDetails()
Obviously these are not very useful classes, but the essential point is that we have created a namespace for each User object, "details", which contains the extra information about that particular user.
An example of the same objects, modified to use object inheritance, might look as follows:
class User(object):
    first_name = "Eric"
    last_name = "Florenzano"

class UserDetails(User):
    email = "floguy@gmail.com"
    homepage = "http://www.eflorenzano.com"
Now we have a flat namespace, which contains all of the attributes from both of the objects. In the case of any collisions, Python will take the attribute from UserDetails.
Consequences
From a pure programming language complexity standpoint, object composition is the simpler of the two methods. In fact, the word "object" may not even apply here, as it's possible to achieve this type of composition using structs in C, which are clearly not objects in the sense that we think of them today.
Another immediate thing to notice is that with composition, there's no possibility of namespace clashes. There's no need to determine which attribute should "win", between the object and the composed object, as each attribute remains readily available.
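A small sketch makes the difference concrete. The class names and attribute values below are made up for illustration, and a homepage attribute has been added to both sides purely to force a collision.

```python
class User(object):
    first_name = "Eric"
    homepage = "http://example.com/user"         # hypothetical value

class InheritedDetails(User):
    email = "someone@example.com"                # hypothetical value
    homepage = "http://example.com/details"      # collides with User.homepage

class ComposedDetails(object):
    email = "someone@example.com"
    homepage = "http://example.com/details"

class ComposedUser(object):
    first_name = "Eric"
    homepage = "http://example.com/user"
    details = ComposedDetails()

inherited = InheritedDetails()
composed = ComposedUser()

# Inheritance: one flat namespace, and the subclass wins the collision.
print(inherited.homepage)         # http://example.com/details

# Composition: both values survive, each in its own namespace.
print(composed.homepage)          # http://example.com/user
print(composed.details.homepage)  # http://example.com/details
```

With inheritance, the User value of homepage can only be reached by going through the User class explicitly. With composition, nothing is shadowed in the first place.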
The composed object, more often than not, has no knowledge about its containing class, so it can completely encapsulate its particular functionality. This also means that it cannot make any assumptions about its containing class, and the entire scheme can be considered less brittle. Change an attribute or method on User? That's fine, since UserDetails doesn't know or care about User at all.
That being said, object inheritance is arguably more straightforward. After all, an e-mail address isn't a logical property of some real-world object called a "UserDetails". No--it's a property of a user--so it makes more sense to make it an attribute on our virtual equivalent, the User class.
Object inheritance is also a more commonly-understood idea. Asking a typical developer about object composition will most likely result in some mumbling and deflection, whereas the same question about object inheritance will probably reveal a whole host of opinions and experience. That's not to say that composition is some sort of dark art, but simply that it's less commonly talked about in so many words.
As more of a sidenote than anything else, inheritance can be speedier in some compiled languages due to some compile-time optimizations vs. the dynamic lookup that composition requires. Of course, in Java you can't escape the dynamic method lookup, and in Python it's all a moot point.
My Preferences
In general, I find object composition to be desirable. I've seen too many projects get incredibly (and unnecessarily) confusing due to complicated inheritance hierarchies. However, there are some cases where inheritance simply makes more sense logically and programmatically. These are typically the cases where an object has been broken into so many subcomponents that it doesn't make sense any more as an object itself.
The Django web framework has an interesting way of dealing with model inheritance, and I think that more projects should follow its example. It uses composition behind the scenes, and then flattens the namespace according to typical inheritance rules. However, that composition still exists under the covers, so the composed object can still be accessed directly when that is preferable.
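The general idea of composing first and flattening afterwards can be sketched in plain Python with attribute delegation. To be clear, this is not how Django implements it. The classes below are hypothetical and only illustrate the hybrid: a composed object whose attributes are also reachable through a flat namespace.

```python
class User(object):
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

class UserDetails(object):
    def __init__(self, user, email, homepage):
        self.user = user          # the composed object
        self.email = email
        self.homepage = homepage

    def __getattr__(self, name):
        # Only called when normal lookup fails, so attributes defined on
        # UserDetails itself always win, as they would with inheritance.
        if name == "user":
            # Avoid infinite recursion if `user` has not been set yet.
            raise AttributeError(name)
        return getattr(self.user, name)

details = UserDetails(User("Eric", "Florenzano"),
                      "someone@example.com", "http://example.com")

print(details.first_name)       # flattened access: Eric
print(details.user.first_name)  # explicit composition: Eric
```

Both styles of access work on the same object, so callers can pick whichever reads better for the task at hand.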
The answer is not going to be "composition always" or "inheritance always" or even any combination of the two, "always". Each has its own drawbacks and advantages and those should be considered before choosing an approach. More research needs to be done on the hybrid approaches, as well, because things like what Django is doing will provide more answers to more people than traditional approaches. Cheers to continued thought about these problems and to challenging conventional thought!

