Lately I've really fallen in love with writing utilities whose interface is simply HTTP. When a utility speaks HTTP, it's really easy to write clients that talk to it, and if the need arises, there are lots of existing tools for things like load balancing and caching.
While it would be easy to use a framework to build these utilities, lately I've been choosing not to do so. Web frameworks like Django and Pylons are great when you need to build a fully-featured web application that will be accessible by people. When it will only be computers talking to the service, however, a lot of the machinery provided by frameworks is unneeded and will only slow your utility down. Instead of using a framework, we're going to write a pure WSGI application.
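A pure WSGI application is nothing more than a callable that takes an `environ` dictionary and a `start_response` callable and returns an iterable of body chunks (PEP 3333). Because of that, it can be exercised without any server at all. The sketch below is written for Python 3, where the body must be bytes (the code in this post is Python 2, where plain strings were fine). `echo_app` and `call_app` are hypothetical names used only for illustration, and `setup_testing_defaults` from the standard library's `wsgiref.util` fills in the remaining required `environ` keys.

```python
from wsgiref.util import setup_testing_defaults


def echo_app(environ, start_response):
    # environ holds CGI-style request variables such as REQUEST_METHOD and
    # PATH_INFO; start_response receives the status line and header list.
    text = '%s %s' % (environ['REQUEST_METHOD'], environ['PATH_INFO'])
    start_response('200 OK', [('Content-Type', 'text/plain')])
    # Under Python 3 (PEP 3333) the body must be an iterable of bytes.
    return [text.encode('utf-8')]


def call_app(app, method='GET', path='/'):
    # Build a minimal environ; setup_testing_defaults adds the wsgi.* keys.
    environ = {'REQUEST_METHOD': method, 'PATH_INFO': path}
    setup_testing_defaults(environ)
    captured = {}
    written = []

    def start_response(status, headers, exc_info=None):
        captured['status'] = status
        captured['headers'] = headers
        return written.append  # PEP 3333: start_response returns a write() callable

    result = app(environ, start_response)
    try:
        chunks = list(result)
    finally:
        # A server must call close() on the result if it has one.
        if hasattr(result, 'close'):
            result.close()
    return captured['status'], b''.join(written + chunks)
```

For example, `call_app(echo_app, 'POST', '/song/1')` returns `('200 OK', b'POST /song/1')`.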
An Example: Music Discovery Website
This has all been very abstract, so let's take an example: Suppose you run a music discovery website that lets you play songs online. Next to each song, you simply want to display how many times the song has been played.
One solution to that problem could be to have a play_count column on the table where the song metadata is stored. Every time someone plays the song, you could issue an UPDATE on the row and increase the play_count by one. This solution will work while your site is small, but as more and more people begin using the application, the number of writes to your database is going to kill its performance.
A much more robust and scalable solution is to append a new line to a text log file every time a song is played, and have a process run regularly to scoop up all of the log files and update those play_count fields in the database.
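A minimal Python 3 sketch of the log-then-batch approach, assuming a plain text log with one song ID per line and an SQLite `songs` table with `id` and `play_count` columns. The file format, table layout, and the names `record_play` and `apply_log` are hypothetical, not taken from a real deployment, and rotating or truncating the log after it has been applied is left out.

```python
import sqlite3
from collections import Counter


def record_play(log_path, song_id):
    # Appending one line per play is far cheaper than one UPDATE per play.
    with open(log_path, 'a') as log:
        log.write('%d\n' % song_id)


def apply_log(log_path, db):
    # Tally every play in the log, then issue one UPDATE per distinct song.
    with open(log_path) as log:
        tally = Counter(int(line) for line in log if line.strip())
    with db:  # commits the whole batch as a single transaction
        db.executemany(
            'UPDATE songs SET play_count = play_count + ? WHERE id = ?',
            [(plays, song_id) for song_id, plays in tally.items()],
        )
    return tally
```

However many times a song was played, the database sees at most one write for it per batch.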
However, even if you have that regular process run once every hour, there's still too great a lag time between when a user takes an action and when they see the results of that action. This is where our WSGI utility comes into play. It can serve as a realtime play counter to count the plays in between the time when the logs are analyzed and the play_count columns updated.
Song Play Counter
We can design the interface for our WSGI song play counter utility any way that we like, but I'm going to try to keep it as RESTful as I can. The interface will look like this:
- GET /song/SONGID will return the current play count of the given song
- POST /song/SONGID will increment the play count of the given song by one, and return its new value
- GET / will return a mapping of all songs registered to their respective play counts
- DELETE / will clear the whole mapping
So let's get started. First, I always like to start with a very basic skeleton:
```python
def application(environ, start_response):
    start_response('200 OK', [('content-type', 'text/plain')])
    return ('Hello world!',)
```
This does what you would imagine, returns Hello world! to each and every request that it receives. Not very useful, so let's make it more interesting:
```python
from collections import defaultdict

# Maps song ID (a string taken from the URL) to its play count.
counts = defaultdict(int)

def application(environ, start_response):
    global counts
    path = environ['PATH_INFO']
    method = environ['REQUEST_METHOD']
    if path.startswith('/song/'):
        song_id = path[6:]  # everything after '/song/'
        if method == 'GET':
            # .get() avoids adding a zero entry for songs that are only looked up
            start_response('200 OK', [('content-type', 'text/plain')])
            return (str(counts.get(song_id, 0)),)
        elif method == 'POST':
            counts[song_id] += 1
            start_response('200 OK', [('content-type', 'text/plain')])
            return (str(counts[song_id]),)
        else:
            start_response('405 METHOD NOT ALLOWED', [('content-type', 'text/plain')])
            return ('Method Not Allowed',)
    start_response('404 NOT FOUND', [('content-type', 'text/plain')])
    return ('Not Found',)
```
We've now added the data structure that keeps track of the counts, which in this case is a defaultdict(int). We're also now looking at the request path and method. If it's a GET starting with /song/, we look up the count and return it; if it's a POST starting with /song/, we increment the count by one before returning it. We're also doing the proper thing when we detect a method that's not allowed: we return HTTP status code 405.
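For comparison, here is a hedged Python 3 sketch of the same `/song/` branch; under PEP 3333 the body must be bytes, hence the `.encode()`. It also adds an `Allow` header to the 405 response, which HTTP/1.1 requires an origin server to send, and uses `counts.get()` so that a plain GET does not add a song to the mapping. `TEXT` is just a hypothetical shorthand for the header list.

```python
from collections import defaultdict

counts = defaultdict(int)  # song ID (string from the URL) -> play count
TEXT = [('Content-Type', 'text/plain')]


def application(environ, start_response):
    path = environ['PATH_INFO']
    method = environ['REQUEST_METHOD']
    if path.startswith('/song/'):
        song_id = path[len('/song/'):]
        if method == 'GET':
            # .get() avoids registering a song merely because it was looked up
            body = str(counts.get(song_id, 0))
        elif method == 'POST':
            counts[song_id] += 1
            body = str(counts[song_id])
        else:
            # A 405 response must say which methods are allowed.
            start_response('405 Method Not Allowed', TEXT + [('Allow', 'GET, POST')])
            return [b'Method Not Allowed']
        start_response('200 OK', TEXT)
        return [body.encode('ascii')]
    start_response('404 Not Found', TEXT)
    return [b'Not Found']
```

On a fresh process, `application({'REQUEST_METHOD': 'POST', 'PATH_INFO': '/song/1'}, lambda status, headers: None)` returns `[b'1']`.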
Now let's add the final bit of functionality:
```python
from collections import defaultdict

counts = defaultdict(int)

def application(environ, start_response):
    # ... start of app
    if path.startswith('/song/'):
        # ... song-specific logic
    elif path == '/':
        if method == 'GET':
            # Serialize every count as "id=count,id=count,..."
            res = ','.join(['%s=%s' % (k, v) for k, v in counts.iteritems()])
            start_response('200 OK', [('content-type', 'text/plain')])
            return (res,)
        elif method == 'DELETE':
            # Rebinding the module-level name relies on the
            # `global counts` declaration at the start of the function.
            counts = defaultdict(int)
            start_response('200 OK', [('content-type', 'text/plain')])
            return ('OK',)
        else:
            start_response('405 METHOD NOT ALLOWED', [('content-type', 'text/plain')])
            return ('Method Not Allowed',)
    # ... rest of app
```
We've done basically the same thing here as we did with the previous example: we are looking at the request path and method and doing the appropriate action. There really is nothing very tricky going on here. We're inventing our own format for the case where we return the counts for all songs, but it's nothing that will be hard to parse.
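The ad-hoc `id=count` format is easy to produce and to parse. Here is a small Python 3 sketch of both directions, assuming integer song IDs as in the examples in this post; `format_counts` and `parse_counts` are hypothetical helper names, and the sort is only there to make the output deterministic. An empty body has to be special-cased because `''.split(',')` returns `['']`.

```python
def format_counts(counts):
    # Serialize {song_id: plays} as "id=plays,id=plays" (sorted for stable output).
    return ','.join('%s=%s' % (k, v) for k, v in sorted(counts.items()))


def parse_counts(text):
    # Inverse of format_counts; an empty body means no songs have been counted.
    if not text:
        return {}
    result = {}
    for pair in text.split(','):
        song_id, plays = pair.split('=')
        result[int(song_id)] = int(plays)
    return result
```

For example, `parse_counts('1=1,5=2')` returns `{1: 1, 5: 2}`.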
NOTE: Generally you would want to use some sort of threading lock primitive before accessing a global dictionary like this. I will be using Spawning to run this WSGI application, with a threadpool size of 0 to use cooperative coroutines instead of standard threads, so I am able to get away without locks for this application. To install Spawning for yourself, just type:
sudo easy_install Spawning
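If you deploy under a server that uses real OS threads instead, `counts[song_id] += 1` is a read-modify-write that is not guaranteed to be atomic even with CPython's GIL, so a lock is the safe choice. A minimal Python 3 sketch, assuming a hypothetical `PlayCounter` class that the WSGI handlers would call instead of touching the dictionary directly:

```python
import threading
from collections import defaultdict


class PlayCounter(object):
    """Play counts guarded by a lock, for use under a threaded WSGI server."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = defaultdict(int)

    def increment(self, song_id):
        with self._lock:
            self._counts[song_id] += 1
            return self._counts[song_id]

    def get(self, song_id):
        with self._lock:
            return self._counts.get(song_id, 0)

    def snapshot(self):
        # Copy under the lock so callers can iterate without racing writers.
        with self._lock:
            return dict(self._counts)

    def clear(self):
        with self._lock:
            self._counts.clear()
```

`snapshot()` returns a copy so that a `GET /` handler can format it without holding the lock.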
Running the Utility
Let's just take a quick look at how this utility works, from the command line:
$ spawn -t 0 -p 8000 counter.application
...and in another window:
$ curl http://127.0.0.1:8000/song/1
0
$ curl -X POST http://127.0.0.1:8000/song/1
1
$ curl http://127.0.0.1:8000/song/1
1
$ curl -X POST http://127.0.0.1:8000/song/5
1
$ curl -X POST http://127.0.0.1:8000/song/5
2
$ curl http://127.0.0.1:8000/
1=1,5=2
$ curl -X DELETE http://127.0.0.1:8000/
OK
As you can see, the play counter is behaving as expected.
Writing a Client to Talk to our Utility
Now that we have our WSGI utility written to keep track of the counts on our songs, we should write a client library to communicate with this server.
```python
import httplib

class CountClient(object):
    def __init__(self, servers=None):
        # Avoid a shared mutable default argument; default to one local server.
        if servers is None:
            servers = ['127.0.0.1:8000']
        self.servers = servers

    def _get_server(self, song_id):
        # Pick a server by song_id modulo the number of servers.
        return self.servers[song_id % len(self.servers)]

    def _song_request(self, song_id, method):
        conn = httplib.HTTPConnection(self._get_server(song_id))
        conn.request(method, '/song/%s' % (song_id,))
        resp = conn.getresponse()
        play_count = int(resp.read())
        conn.close()
        return play_count

    def get_play_count(self, song_id):
        return self._song_request(song_id, 'GET')

    def increment_play_count(self, song_id):
        return self._song_request(song_id, 'POST')

    def get_all_play_counts(self):
        # Ask every server for its share of the counts and merge the results.
        dct = {}
        for server in self.servers:
            conn = httplib.HTTPConnection(server)
            conn.request('GET', '/')
            counts = conn.getresponse().read()
            conn.close()
            if not counts:
                continue  # this server has no songs yet
            dct.update(dict([map(int, pair.split('=')) for pair in counts.split(',')]))
        return dct

    def reset_all_play_counts(self):
        # Returns True only if every server acknowledged the DELETE.
        status = True
        for server in self.servers:
            conn = httplib.HTTPConnection(server)
            conn.request('DELETE', '/')
            resp = conn.getresponse().read()
            if resp != 'OK':
                status = False
            conn.close()
        return status
```
What we have here is a simple class that converts Python method calls into the RESTful HTTP equivalents that we wrote for our WSGI utility. The best part about this setup, though, is that it uses the song_id, taken modulo the number of servers, to determine which server to connect to. If you only ever do per-song operations, this setup is quite literally infinitely scalable. You could have thousands of servers keeping track of song counts, none of them knowing about each other. Since the decision about which server to talk to happens on the client side, the servers never need to communicate with each other at all.
However, get_all_play_counts and reset_all_play_counts have to contact every server, one after another, so they will get slower and slower as you add more servers.
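The routing rule is easy to explore in isolation. This Python 3 sketch copies the modulo rule from `_get_server` into hypothetical helpers `pick_server`, `partition`, and `moved`, using made-up server names. It also makes one caveat of plain modulo routing visible: changing the number of servers reassigns most song IDs to a different server, which has no count for them.

```python
def pick_server(servers, song_id):
    # Same rule as CountClient._get_server: song_id modulo the server count.
    return servers[song_id % len(servers)]


def partition(servers, song_ids):
    # Group song IDs by the server that would own them.
    owners = {server: [] for server in servers}
    for song_id in song_ids:
        owners[pick_server(servers, song_id)].append(song_id)
    return owners


def moved(old_servers, new_servers, song_ids):
    # Song IDs whose owning server changes when the server list changes.
    return [s for s in song_ids
            if pick_server(old_servers, s) != pick_server(new_servers, s)]
```

With three servers, `partition(servers, range(7))` assigns songs 0, 3 and 6 to the first server; appending a fourth server moves 9 of the song IDs 0 to 11.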
Let's explore this client:
>>> from countclient import CountClient
>>> c = CountClient()
>>> c.get_play_count(1)
0
>>> c.increment_play_count(1)
1
>>> c.increment_play_count(1)
2
>>> c.get_play_count(1)
2
>>> c.increment_play_count(5)
1
>>> c.get_all_play_counts()
{1: 2, 5: 1}
>>> c.reset_all_play_counts()
True
>>> c.get_all_play_counts()
{}
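`get_all_play_counts` and `reset_all_play_counts` talk to the servers one after another, so their latency is the sum over all servers. One possible mitigation, which the client above does not do, is to fan the requests out concurrently. In this Python 3 sketch, `fetch` is a hypothetical callable that would wrap one `GET /` request plus the parsing from `get_all_play_counts`; it is passed in so the merging logic can be tried without a network.

```python
from concurrent.futures import ThreadPoolExecutor


def gather_counts(servers, fetch, max_workers=8):
    # Query every server concurrently; fetch(server) -> {song_id: plays}.
    merged = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for partial in pool.map(fetch, servers):
            # Each song lives on exactly one server, so a plain update is safe.
            merged.update(partial)
    return merged
```

Wall-clock time then tracks roughly the slowest server rather than the sum, up to `max_workers` servers at a time; an exception raised by any `fetch` call propagates when its result is read.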
Benchmarks!
I'm not a benchmarking nut in any way, shape, or form these days. However, in Python it's quite tough to beat pure-WSGI applications for raw speed. Using my MacBook Pro with a 2.5GHz Intel Core 2 Duo and 2 GB 667 MHz DDR2 SDRAM I got these results from ApacheBench:
e:Desktop ericflo$ ab -n 10000 http://127.0.0.1:8000/song/1
...
Concurrency Level: 1
Time taken for tests: 7.792 seconds
Complete requests: 10000
Failed requests: 0
Write errors: 0
Total transferred: 1020000 bytes
HTML transferred: 10000 bytes
Requests per second: 1283.31 [#/sec] (mean)
Time per request: 0.779 [ms] (mean)
Time per request: 0.779 [ms] (mean, across all concurrent requests)
Transfer rate: 127.83 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 2
Processing: 0 1 0.8 1 43
Waiting: 0 1 0.5 0 43
Total: 1 1 0.8 1 43
Take these results with a huge grain of salt (a single client at concurrency level 1, on one laptop), but suffice it to say, it's fast. It would probably be even faster using mod_wsgi instead of Spawning.
Drawing Conclusions From This Exercise
I don't want my standpoint on this to be misconstrued: frameworks definitely have their place. You wouldn't want to write an entire user-facing application in pure WSGI; by the time you had added enough middleware to make that pleasant, you'd just be recreating Pylons. But when you're writing an HTTP utility like the one we built here, I think pure WSGI is the way to go.
I'd like to touch on one more nice side effect of using pure WSGI: you can run it in any application server that supports WSGI. That means Google App Engine, Apache, Spawning, CherryPy, and many other containers. Since it can be served by a pure-Python server, you can run your utility even on very restrictive shared hosting.
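As one example, the standard library's `wsgiref.simple_server` can host a WSGI application without installing anything, which is handy for development and testing (it is not meant for production load). This Python 3 sketch serves a bytes-returning version of the Hello world skeleton from a background thread; `serve_in_background` is a hypothetical helper, and passing port 0 lets the OS pick a free port.

```python
import threading
from wsgiref.simple_server import make_server


def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello world!']


def serve_in_background(app, host='127.0.0.1', port=0):
    # port=0 lets the OS choose a free port; the real one is server.server_port.
    server = make_server(host, port, app)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server  # call server.shutdown() and server.server_close() when done
```

The same `application` object could be handed unchanged to any other WSGI container.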
What do you think of pure-WSGI utilities? Are you using them in your app? I'd love to hear about it--leave me a comment and tell me your thoughts on this subject.
By Kevin Dangoor at 9:40 a.m. on Jan. 8, 2009
Personally, I go one step up from pure WSGI. I think the cost in complexity and speed is pretty minimal for the increase in code clarity.
First step: use WebOb to give you sane Request and Response objects, rather than just environ and start_response.
Next: use something like urlrelay for dispatching URLs. Once you have more than a couple of URLs, tearing up the PATH_INFO manually gets old, and there are *many* good dispatchers. Just pick the style that suits you.
By Christian Wyglendowski at 10:58 a.m. on Jan. 8, 2009
I guess I go more than one step up from pure WSGI. I typically use CherryPy for HTTP utilities. For a more in depth reply, see this post:
http://blog.dowski.com/2009/01/08/http-utilities-with-cherrypy/
By John Matthew at 12:07 p.m. on Jan. 8, 2009
I love numbers for benchmarks, and your sample is really cool.
However, how would the numbers compare to using Django for this simple example too?
J
By Nino Walker at 1:49 p.m. on Jan. 8, 2009
Take a look at Servable, a mix-in that makes WSGI-capable data services with very little effort.
http://code.google.com/p/servable/
It is designed for handling GET requests, but could easily be extended for DELETE and POST with very little work.
By Lateef Jackson at 1:58 p.m. on Jan. 8, 2009
Kevin is dead on about the dispatcher. I have been playing a lot with async WSGI servers for performance, and I use http://lukearno.com/projects/selector/, which is very fast. It does cost a small percentage in performance, but I was still able to get over 4K req/sec on my MacBook Pro.
By Poppy at 2:27 p.m. on Jan. 8, 2009
I wouldn't call this infinitely scalable for a few reasons.
1) you really don't want your clients to keep a list of servers, because that changes dynamically in any kind of production environment
2) you'll probably want to use a LB to bucket requests to servers, and when load for one key range gets high...
3) you need code to split and merge keysets across hosts.
Scaling is never easy ;) (you can do the above with a DHT, but then you have to worry about latency / consistency...)
However, this sample kicks ass. I think the commenters who say "yeah, but *just this one lib*" are missing the point of the absolute freedom you get when you assume nothing but the runs-everywhere standard. I'm not familiar with the libs in question though (and still don't have wsgi production experience), so this could all just be gluteus-babble.
By Lawrence Oluyede at 3:19 p.m. on Jan. 8, 2009
We had a lot of different applications and scripts in pure WSGI here at work, but at the end they've been rewritten in Pylons.
I agree with Kevin Dangoor on this. Pure WSGI is cool, but in the end it always comes down to some abstraction, and if you start to add request and response objects, URL selection and a webserver... then, Eric, you end up rewriting Pylons or similar.
I suggest you read this: http://bitworking.org/news/Why_so_many_Python_web_frameworks
bye!
By Tim Parkin at 4:26 p.m. on Jan. 8, 2009
We did the 'rewriting pylons' thing..
http://dev.timparkin.co.uk/2009/01/happy-medium-from-wsgi-to-cherrypy.html
Great blog post! It's great that wsgi has allowed people to play with more framework options.. That combined with better understanding of HTTP makes for an interesting period in web framework development..
By Lawrence Oluyede at 2:41 a.m. on Jan. 9, 2009
I agree, the best way to know how a web framework works is writing one :-)
ps. in your post you don't really explain the reasons to ditch most of Pylons. I'd be glad to know why
By Tim Parkin at 4:31 a.m. on Jan. 13, 2009
We didn't intend to ditch pylons, we started off using pylons and then found things that weren't really needed in a basic rest like framework. So the following were removed..
routes (you shouldn't be forced to use a particular routing system), webhelpers (wasn't used), customised traceback templates (nice but not necessary - the basic works well anyway),
error middleware (should be external really),
all the extra variables passed to the template 'just in case' (if you want them you can add them), default session (you might choose something other than Beaker, or you might not use sessions at all; we don't)
default templating language (this really shouldn't be in the core of an agnostic system).
Finally we realised we'd actually removed all of pylons!
By plaes at 4:36 p.m. on Jan. 8, 2009
There is also a library called Werkzeug - bunch of WSGI utilities that speed up development of these small apps/sites tenfold...
By michele at 5:20 a.m. on Jan. 9, 2009
Maybe we will get wsgi 2.0 one day to make pure wsgi somewhat nicer... ;-)
I really hope so, just look at what the ruby guys are doing with rack, rack is based on wsgi but removes start_response (like wsgi 2.0) and other things IIRC, this makes the entry level barrier much lower and writing things "the right way" is way easier (for instance, I still don't completely understand how a well behaving wsgi middleware should act).
Its use is skyrocketing (even Rails adopted it); just some links:
http://github.com/rack/rack
http://github.com/rack/rack-contrib
http://github.com/rtomayko/rack-cache
http://weblog.rubyonrails.org/2008/12/17/introducing-rails-metal
http://delicious.com/popular/rack
My wishes for 2009:
1) wsgi 2.0 out of the door
2) everyone in the python community benefits ;-)
3) even django adopts it
By Dave K at 9:28 a.m. on Jan. 9, 2009
I know it's a bit off topic, but since you're a bit more experienced I'd like to see you run a shootout of Django, Pylons and TurboGears, heck, web.py too. But not "What's better", just "What is each good at". For example, Django stinks at integration with our existing Interbase database.
By Dave K at 9:44 a.m. on Jan. 9, 2009
I'd also like to know more about the thread locking stuff, and is that necessary with the GIL?
By Jim at 2:51 p.m. on Jan. 9, 2009
This is not RESTful. You are just defining a set of resources and giving them URIs.
For example, you get a list of IDs from / and then get the songs via a hard-coded URI scheme (/song/[id]). A RESTful API, in contrast, would give a list of *URIs* as the response to GET /, and you would not have /song/[id] hardcoded at all, you'd just follow the URIs.
By Mike at 8:24 p.m. on Jan. 10, 2009
@Jim: Who cares if it's "RESTful". This discussion is more about the plumbing than the fixtures.
That aside, one can look at many well known services online and apply your standard and find them somehow a failure.
By Jim at 8:45 a.m. on Jan. 11, 2009
> Who cares if it's "RESTful".
From the article:
> > I'm going to try to keep it as RESTful as I can.
The *article author* cares if it's RESTful.
> That aside, one can look at many well known services online and apply your standard and find them somehow a failure.
I never called anyone a failure. The article author expressed a desire for a particular design principle, and I pointed out he's misunderstanding it and pointed him in the right direction. Stop being so over-sensitive.
And it's not *my* standard, it's what REST *is*. Read the paper. Or even this article, where he specifically addresses people calling non-RESTful things RESTful:
http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven
By Empty at 9:35 p.m. on Jan. 10, 2009
Wow, what a great post. I really like how you walked through all of this. I too think it's important to understand what straight WSGI provides. It would be interesting to see the same thing in Django and what the performance comparison is like.
By Mike at 1:44 p.m. on Jan. 11, 2009
Jim,
I interpret "RESTful as I can" less rigorously than you do because of the context: to me Eric's post is clear enough; it's an example of what could be done in terms of real work implementing an API with raw WSGI. The example is what it is: a starting point, not a complete API.
I doubt anyone reading Eric's example was looking for a canonical example of a RESTful implementation.
There continues to be significant debate over what constitutes a "good" approach. I personally dislike content negotiation. Some adore it. Many applications that expose a RESTful interface have larger needs to support browser clients of specific representations. If the need tips to one side or the other one is going to make tool choices based on that more often than not.
By _Mark_ at 5:03 p.m. on Jan. 12, 2009
While it's interesting to see where the bottlenecks are with this approach - why use that much of a stack at all, instead of xmlrpclib (also an included battery) or even strings over sockets?
By dbr at 4:08 p.m. on Jan. 13, 2009
I suppose because WSGI is the standard for Python web-application... things, and you can easily run it under Apache (for example), and you don't have to reimplement the entire HTTP protocol.