Posted on Feb. 15, 2008 at 3:28 P.M.

In creating any website with textual content, you have the choice of either writing plaintext or writing in a markup language of some kind. The immediately obvious choice for markup language is HTML (or XHTML), but HTML is not as human-readable as something like Textile, Markdown, or reStructuredText. The advantage of choosing one of those human-readable alternatives is that content written in one of them can be translated very easily into HTML.

When one of my friends started designing his blog using Django, it got me thinking about how best to deal with that translated HTML. It seems like a waste to re-translate it every time a visitor views the page, but it also seems redundant to store the translated HTML in the database alongside the source text.

Here's my solution to the problem: cache it. For a month. Here's an example, using reStructuredText:

from django.db import models
from django.contrib.markup.templatetags.markup import restructuredtext
from django.core.cache import cache
from django.utils.safestring import mark_safe

ONE_MONTH = 60 * 60 * 24 * 30  # cache timeout, in seconds

class MyContent(models.Model):
    content = models.TextField()

    def _cache_key(self):
        return 'mycontent_html_%s' % self.pk

    def _get_content_html(self):
        key = self._cache_key()
        html = cache.get(key)
        # Compare against None so that an empty rendering still counts as a hit.
        if html is None:
            html = restructuredtext(self.content)
            cache.set(key, html, ONE_MONTH)
        return mark_safe(html)
    content_html = property(_get_content_html)

    def save(self, *args, **kwargs):
        super(MyContent, self).save(*args, **kwargs)
        # Drop the stale rendering once the new content has been written.
        cache.delete(self._cache_key())

What I'm doing here is writing a method that either gets the translated HTML from the cache or, on a miss, translates the content and stores the result in the cache for a month. It then returns the result marked as safe HTML for display in a template. Finally, we override the save method on the model so that whenever the model is re-saved, the cached HTML is deleted.
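To see the same get-or-render-then-invalidate flow without a Django project, here is a minimal framework-free sketch. Everything in it is an assumption made for illustration: `SimpleCache` is a hypothetical in-memory stand-in for Django's cache object, and `render` is a trivial placeholder for `restructuredtext` that counts how often it runs.

```python
import html
import time


class SimpleCache:
    """Hypothetical in-memory stand-in for django.core.cache.cache."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() >= expires_at:
            # Expired entries behave like misses.
            del self._store[key]
            return None
        return value

    def set(self, key, value, timeout):
        self._store[key] = (value, time.time() + timeout)

    def delete(self, key):
        self._store.pop(key, None)


ONE_MONTH = 60 * 60 * 24 * 30  # seconds
render_calls = 0


def render(text):
    """Placeholder for restructuredtext(): counts calls, emits trivial HTML."""
    global render_calls
    render_calls += 1
    return "<p>%s</p>" % html.escape(text)


def get_content_html(cache, pk, content):
    key = "mycontent_html_%s" % pk
    rendered = cache.get(key)
    if rendered is None:  # miss: render once, then keep it for a month
        rendered = render(content)
        cache.set(key, rendered, ONE_MONTH)
    return rendered


cache = SimpleCache()
first = get_content_html(cache, 1, "Hello & welcome")   # miss, renders
second = get_content_html(cache, 1, "Hello & welcome")  # hit, no render
cache.delete("mycontent_html_1")                        # what save() does
third = get_content_html(cache, 1, "Hello, edited")     # miss again, renders
```

Three reads cost only two renders here: the second read is served from the cache, and the delete forces exactly one fresh render after the edit.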

There we go! We now have the HTML-rendered data that we want, and no duplicated data in the database. Keep in mind that this approach becomes more useful the more RAM your webserver has available for the cache.

Ben
at 4:21 p.m.
on Feb. 15, 2008

Have you given this any performance tests? I'd be curious to see if network latency offsets the gains from not having to reprocess each time.

at 4:32 p.m.
on Feb. 15, 2008

No, I haven't. I'm running memcached locally on the same machine as my webserver, so it's not really an issue for me. I'd suspect that if you're avoiding fetching items from the cache due to network latency, then there may be a problem with your network or architecture.

Ben
at 5:23 p.m.
on Feb. 17, 2008

The basis of my question is: why implement caching if it doesn't decrease time-to-output? For large objects that are expensive to create, of course you'll want to cache them. For small bits that are cheap to generate, why bother caching them -- you'll likely make your app slower.

I ran a highly unscientific experiment using 'blog-sized' input text run through markdown... I was getting 4ms as a mean generation time... so there's no gain in going over a network to hit a cache.

Sure, you'll be waiting on IO, so you can have higher concurrency... but it's a questionable gain.

I agree that there's merit to your approach, and appreciate the write-up. I was just trying to explore the trade-offs.
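For anyone who wants to repeat that kind of rough measurement, here is one possible sketch using the standard-library `timeit` module. It is not the experiment described above: `render` is a hypothetical stand-in for a real Markdown or reStructuredText call, the sample text is made up, and the plain dict lookup does not model network latency, so a real memcached client would need to be swapped in to answer the question properly.

```python
import html
import timeit


def render(text):
    """Hypothetical stand-in for a markup renderer: one <p> per paragraph."""
    paragraphs = text.split("\n\n")
    return "\n".join("<p>%s</p>" % html.escape(p) for p in paragraphs)


# Made-up "blog-sized" input: 50 short paragraphs.
SAMPLE = "\n\n".join(["Some blog-sized paragraph of text."] * 50)

# In-process dict used as the cache; replace with a real cache client
# to include network round-trips in the measurement.
cache = {"post_1": render(SAMPLE)}

runs = 1000
render_ms = timeit.timeit(lambda: render(SAMPLE), number=runs) / runs * 1000
lookup_ms = timeit.timeit(lambda: cache.get("post_1"), number=runs) / runs * 1000

print("mean render: %.4f ms, mean cache lookup: %.4f ms" % (render_ms, lookup_ms))
```

The numbers only mean something when the renderer, the input size, and the cache transport match the real deployment.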

at 5:33 p.m.
on Feb. 17, 2008

I definitely agree that it's not for every application.

My view about it is that every programming choice is about tradeoffs. In this case, the tradeoff is processing cycles vs. RAM space used. I've just chosen the latter of the two--which makes sense for me because my processing power is limited and slow on my server arrangement, but RAM is plentiful.

That said, I do see your point. If push came to shove, and memory started becoming a problem, this would definitely be the first thing to go.

Thanks for the comment!

at 4:38 p.m.
on Feb. 15, 2008

An alternative to writing your caching in the model is to make use of the new-ish cache template tag:

http://www.djangoproject.com/documentation/cache/#template-fragment-caching

So then your template might look like::

{% load cache %}
{% load markup %}
{% cache 2592000 blog_post object.id %}
{{ object.content|restructuredtext }}
{% endcache %}

at 4:41 p.m.
on Feb. 15, 2008

Definitely! That's a great solution for a lot of different things--and I definitely thought about that. However, it's a little unclear how one goes about invalidating that data. That's why I ended up recommending explicit caching.

at 9:43 p.m.
on Feb. 15, 2008

NOTE: there is an even better way.

Use 'rstpages' and combine the power of restructuredtext and flatpages!!! (and it doubles as a wiki)

This is the backbone of the new PyCon website.
http://us.pycon.org/

Want to see something really cool?
Check out the recent changes RSS feed:
http://us.pycon.org/2008/recent/?feed=rss2

And I am barely scratching the surface.
https://pycon.coderanger.net/

at 6:27 p.m.
on Feb. 16, 2008

Where can I find out more info about rstpages? Is it this module?
https://pycon.coderanger.net/browser/django/trunk/pycon/restructuredtext

Also, I'm not really sure why an RSS feed would have the straight ReST and not processed ReST, but that's an interesting idea.

Anyways, thanks for all of your hard work on PyCon-Tech stuff. I'm sure we'll run into each other at PyCon!

Rajesh Dhawan
at 3:26 p.m.
on Feb. 18, 2008

You could also add a non-editable content_html field straight to your model and compute it by overriding MyContent.save().

This idea is used by the Textpattern CMS and there's a Django write up on it here:

http://code.djangoproject.com/wiki/UsingMarkup
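As a rough, framework-free illustration of that idea (not the code from the wiki page), the sketch below recomputes a derived `content_html` value every time `save()` runs. The `TABLE` dict is a hypothetical stand-in for the database table, and `render` is a placeholder for a real markup renderer; in Django the same logic would live in an overridden `save()` on the model, with `content_html` declared as a non-editable field.

```python
import html

TABLE = {}  # hypothetical stand-in for a database table, keyed by primary key


def render(text):
    """Placeholder for a real markup renderer such as restructuredtext()."""
    return "<p>%s</p>" % html.escape(text)


class MyContent:
    def __init__(self, pk, content):
        self.pk = pk
        self.content = content
        self.content_html = ""  # derived value, never edited by hand

    def save(self):
        # Recompute the derived HTML on every write, so reads never render.
        self.content_html = render(self.content)
        TABLE[self.pk] = {
            "content": self.content,
            "content_html": self.content_html,
        }


post = MyContent(1, "First draft")
post.save()
post.content = "Second <draft>"
post.save()
```

The cost is the opposite of the cached approach: rendering happens once per write and the HTML is stored twice, but reads need neither a renderer nor a cache.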

at 3:51 p.m.
on Feb. 18, 2008

Yep, that would be classic denormalization. It's a good solution in certain situations.
