Posted on May 6, 2008 at 3:24 A.M.

There are many times when the programming task at hand is to iterate over some semi-structured text, transform parts of that text in some way, and reintegrate those transformed parts back into the original text.

Typically this is done with a regular expression, re.sub, and a callback function, but sometimes you want a bit more control over the process (especially over the parts that don't match the regex). Usually my solution is to write a one-off function that does it, but today I had to write that function yet again, so I decided to generalize it and post it here.
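For reference, the usual re.sub-plus-callback approach looks roughly like the sketch below. The pattern, the `double` callback, and the sample sentence are all made up for illustration.

```python
import re

# Hypothetical pattern: every run of digits in the text.
NUMBER_RE = re.compile(r'\d+')

def double(match):
    # re.sub calls this once per match and splices the returned string
    # back into the text in place of the matched span.
    return str(int(match.group(0)) * 2)

result = NUMBER_RE.sub(double, 'I have 3 apples and 12 pears.')
print(result)
```

The callback only ever sees the matched spans; the text between matches is copied through untouched, which is exactly the part you cannot control with this approach.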

To be completely honest, this post is more for my own archival purposes than for the internet as a whole, but if anyone else finds it useful, then I'm ecstatic.

import heapq
import re

def re_parts(regex_list, text):
    """
    An iterator that yields the entire text as (index, chunk) tuples.  For a
    chunk matched by one of the regexes, index is that regex's position in
    regex_list; for text that matched none of them, index is -1.  A match
    that overlaps a chunk already yielded is skipped.

    >>> first_re = re.compile('asdf')
    >>> second_re = re.compile('an')
    >>> list(re_parts([first_re, second_re], 'This is an asdf test.'))
    [(-1, 'This is '), (1, 'an'), (-1, ' '), (0, 'asdf'), (-1, ' test.')]

    >>> list(re_parts([first_re, second_re], 'asdfasdfasdf'))
    [(0, 'asdf'), (0, 'asdf'), (0, 'asdf')]

    >>> list(re_parts([], 'This is an asdf test.'))
    [(-1, 'This is an asdf test.')]

    >>> third_re = re.compile('sdf')
    >>> list(re_parts([first_re, second_re, third_re], 'This is an asdf test.'))
    [(-1, 'This is '), (1, 'an'), (-1, ' '), (0, 'asdf'), (-1, ' test.')]
    """
    def tagged(index, regex):
        # Pair every match with the position of its regex in regex_list.
        for match in regex.finditer(text):
            yield (index, match)

    streams = [tagged(i, r) for i, r in enumerate(regex_list)]
    prev_end = 0
    # Each stream is already ordered by start position, so a lazy merge
    # visits every match from left to right (Python 3.5+ for key=).
    # On a tie, the regex that comes first in regex_list wins.
    for index, match in heapq.merge(*streams, key=lambda pair: pair[1].start()):
        (start, end) = match.span()
        if start < prev_end:
            # Overlaps a chunk that was already yielded, so skip it
            # without moving prev_end.
            continue
        if start > prev_end:
            yield (-1, text[prev_end:start])
        yield (index, text[start:end])
        prev_end = end
    last_bit = text[prev_end:]
    if len(last_bit) > 0:
        yield (-1, last_bit)
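Once the text is split into (index, chunk) tuples, transforming and reintegrating it is a join away. Here is a small sketch of that last step. The `parts` list is copied from the first docstring example so the snippet runs on its own, and the `transforms` mapping and `reintegrate` helper are hypothetical names, not part of the function above.

```python
# Output of the first docstring example, pasted in as plain data.
parts = [(-1, 'This is '), (1, 'an'), (-1, ' '), (0, 'asdf'), (-1, ' test.')]

# Hypothetical transforms, keyed by index into the regex list.
transforms = {
    0: str.upper,
    1: lambda chunk: '*%s*' % chunk,
}

def reintegrate(parts, transforms):
    out = []
    for index, chunk in parts:
        transform = transforms.get(index)
        # Chunks with no transform (including the -1 chunks here)
        # pass through unchanged.
        out.append(transform(chunk) if transform else chunk)
    return ''.join(out)

result = reintegrate(parts, transforms)
print(result)
```

Adding a `-1` key to `transforms` would let you rewrite the unmatched text as well, which is the bit of extra control that re.sub doesn't give you.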

Posted on May 4, 2008 at 6:12 P.M.

Lately there's been a lot of discussion in certain programming communities about which method of object extension makes more sense: inheritance, or composition. Most of the time these discussions turn into debates, and when that happens developers tend to "take sides"--often moving towards extremist positions on the issue. I've been sort of quietly thinking about it all lately, trying to determine which use case warrants which approach. Here I show examples of both, explore some properties and consequences of both composition and inheritance, and finally talk about my own preferences.

Examples of Composition and Inheritance

Before talking about the consequences of inheritance vs. composition, some simple examples of both are needed. Here's a simplistic example of object composition (using Python, of course, as our demonstration language):

class UserDetails(object):
    email = "floguy@gmail.com"
    homepage = "http://www.eflorenzano.com"

class User(object):
    first_name = "Eric"
    last_name = "Florenzano"
    details = UserDetails()

Obviously these are not very useful classes, but the essential point is that we have created a namespace for each User object, "details", which contains the extra information about that particular user.
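One caveat with the class-level version above: `details` is a class attribute, so every User shares the same UserDetails instance. A slightly more realistic sketch, assuming you want each user to own its details, moves everything into `__init__`. The second user and its values are invented for the example.

```python
class UserDetails(object):
    def __init__(self, email, homepage):
        self.email = email
        self.homepage = homepage

class User(object):
    def __init__(self, first_name, last_name, details):
        self.first_name = first_name
        self.last_name = last_name
        # Composition: the User holds a UserDetails, it is not one.
        self.details = details

eric = User("Eric", "Florenzano",
            UserDetails("floguy@gmail.com", "http://www.eflorenzano.com"))
# Hypothetical second user, to show that the namespaces are separate.
jane = User("Jane", "Doe",
            UserDetails("jane@example.com", "http://example.com"))

print(eric.details.email)
print(jane.details.email)
```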

An example of the same objects, modified to use object inheritance, might look as follows:

class User(object):
    first_name = "Eric"
    last_name = "Florenzano"

class UserDetails(User):
    email = "floguy@gmail.com"
    homepage = "http://www.eflorenzano.com"

Now we have a flat namespace, which contains all of the attributes from both of the objects. In the case of any collisions, Python will take the attribute from UserDetails.
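To see the collision rule in action, here is a small sketch. The `display_name` attribute is hypothetical and exists only to force a clash between the two classes.

```python
class User(object):
    first_name = "Eric"
    last_name = "Florenzano"
    display_name = "Eric Florenzano"   # hypothetical, defined on both classes

class UserDetails(User):
    email = "floguy@gmail.com"
    display_name = "floguy"            # hypothetical, shadows User's value

details = UserDetails()
print(details.first_name)      # inherited from User
print(details.display_name)    # the subclass wins the collision

# The lookup order Python follows when resolving an attribute.
lookup_order = [cls.__name__ for cls in UserDetails.__mro__]
print(lookup_order)
```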

Consequences

From a pure programming language complexity standpoint, object composition is the simpler of the two methods. In fact, the word "object" may not even apply here, as it's possible to achieve this type of composition using structs in C, which are clearly not objects in the sense that we think of them today.

Another immediate thing to notice is that with composition, there's no possibility of namespace clashes. There's no need to determine which attribute should "win", between the object and the composed object, as each attribute remains readily available.

The composed object, more often than not, has no knowledge about its containing class, so it can completely encapsulate its particular functionality. This also means that it cannot make any assumptions about its containing class, and the entire scheme can be considered less brittle. Change an attribute or method on User? That's fine, since UserDetails doesn't know or care about User at all.

That being said, object inheritance is arguably more straightforward. After all, an e-mail address isn't a logical property of some real-world object called a "UserDetails". No--it's a property of a user--so it makes more sense to make it an attribute on our virtual equivalent, the User class.

Object inheritance is also a more commonly-understood idea. Asking a typical developer about object composition will most likely result in some mumbling and deflection, whereas the same question about object inheritance will probably reveal a whole host of opinions and experience. That's not to say that composition is some sort of dark art, but simply that it's less commonly talked about in so many words.

As more of a sidenote than anything else, inheritance can be speedier in some compiled languages due to some compile-time optimizations vs. the dynamic lookup that composition requires. Of course, in Java you can't escape the dynamic method lookup, and in Python it's all a moot point.

My Preferences

In general, I find object composition to be desirable. I've seen too many projects get incredibly (and unnecessarily) confusing due to complicated inheritance hierarchies. However, there are some cases where inheritance simply makes more sense logically and programmatically. These are typically the cases where an object has been broken into so many subcomponents that it doesn't make sense any more as an object itself.

The Django web framework has an interesting way of dealing with model inheritance, and I think that more projects should follow its example. It uses composition behind the scenes, and then flattens the namespace according to typical inheritance rules. However, the composition still exists under the covers, so the composed object can be used directly instead.
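The general idea can be mimicked in plain Python with attribute delegation. To be clear, this is only a sketch of the concept and not how Django implements it; the class layout and the `user` attribute name are assumptions made for the example.

```python
class User(object):
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

class UserDetails(object):
    """Holds a User internally but exposes its attributes as its own."""

    def __init__(self, user, email, homepage):
        self.user = user            # the composed object stays reachable
        self.email = email
        self.homepage = homepage

    def __getattr__(self, name):
        # Only called when normal lookup fails, so anything defined on
        # UserDetails wins a collision, just as with inheritance.
        if name == 'user':
            # Guard against infinite recursion if 'user' is not set yet.
            raise AttributeError(name)
        return getattr(self.user, name)

details = UserDetails(User("Eric", "Florenzano"),
                      "floguy@gmail.com", "http://www.eflorenzano.com")

print(details.first_name)        # flattened access, inheritance style
print(details.user.first_name)   # explicit access, composition style
```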

The answer is not going to be "composition always" or "inheritance always" or even any combination of the two, "always". Each has its own drawbacks and advantages and those should be considered before choosing an approach. More research needs to be done on the hybrid approaches, as well, because things like what Django is doing will provide more answers to more people than traditional approaches. Cheers to continued thought about these problems and to challenging conventional thought!



Posted on April 17, 2008 at 12:49 P.M.

What could have more of an impact on visitors of a website than its design? It's the first thing that people notice when they visit the site, and it dictates what they see and how they interact with the site. A bad design can drive visitors away, whereas a good design can bring people back again and again.

It seems that a common misconception is that design is how to make a site "look good". While this is true, to an extent, the design also determines the flow of information from the screen to the user. In the words of Andy Rutledge, "It's not about the design, it's about communicating." This only underlines the importance of good design.

37signals designs its interfaces first, citing two basic reasons. First, design is lightweight relative to programming. That is, it's much easier to change the position of a navigation bar than to change the data persistence layer of the backend. Second, the interface is your product--if the visitor sees, interacts with, and remembers only the interface and its design, then the design really is the site itself. I'm not sure how much I agree with the former, as I believe that design is becoming more and more heavyweight, but the latter definitely has some merit.

It's a reasonably well-accepted fact that design is one of the most important aspects of a website, so why don't more people focus on it? I think that the problem is specialization: programmers--preferring to write code and think through the program logic--attempt to muddle their way through creating a user interface, while designers--preferring to perfect the margins, whitespace, and typography--attempt to muddle their way through defining the logic of the backend.

Of course, I'm talking about smaller projects of only one or two members. Once they get larger than that, they need to bring a few of "the other" type of people into the mix. That being said, where are the people who are excellent dual designer-developers? Of course people like this exist, but these people are few and far between.

Part of the problem is that both disciplines are ones of constant improvement. As a developer, I know that I will never stop getting better and more experienced in my craft. I will always look at code that I wrote a year ago and cringe. This is part of what makes developing interesting to developers. As I understand it, the same is true with design. This property of both disciplines renders learning the other discipline futile, or at least makes it seem futile (which is a bit of a self-fulfilling prophecy).

Is it possible to become at least conversational in the language of design when your experience and main interest are in development? That's what I'll be trying to discover in the next few weeks and months. I don't have more than a few hours a week to devote to it, but I've embarked on a bit of "independent study" about design, trying to learn from the best out there about grid-based layouts, color theory, etc.

The encouraging thing is that both designers and developers tend to be bloggers as well, and that means that there's a wealth of great articles and information out there to learn from. Keep an eye out here for updates on my progress, my successes/frustrations, and other theoretical ramblings about design.


Posted on April 10, 2008 at 12:08 P.M.

Yesterday I came across a quote from George Patton (via), which struck me as really insightful, but later something burned the quote directly into my brain.

“A good plan, violently executed now, is better than a perfect plan next week.”

It's a quote which sounds a lot like the well-known open source phrase, coined by Eric S. Raymond in his article The Cathedral and the Bazaar.

"Release early. Release often."

The thing which burned these ideas into my brain is the discovery of djangoplugables.com. Simply put: this is an excellent site, which follows the ideas that I mentioned above completely. The premise behind it is essentially to list all of the available reusable Django applications on Google Code, and to display a bit of information about each app.

A group of about 5 people, including myself, have been silently working on a very similar site for the past month or so (the bulk of our work took place during PyCon), but we utterly failed to follow the above sentiment. We debated for hours over how users would be able to submit applications, claim them, and how we could ensure that those claims were accurate. We had tagging, voting, comments, voting ON comments, graphs detailing how "hot" each application was (based on a frequency analysis of the votes over time), and OpenID integration.

But all of this functionality took time, and we implemented it behind closed doors, in a vacuum--without ever seriously focusing on the user interface. I don't know yet what will become of all of our work, but I have a feeling that it will be discontinued in favor of the much better-looking and simpler djangoplugables.com. Maybe we'll see parts of it resurface again, but that's not really the message behind this post. The real message to take away from this experience is that we should practice what we so often preach. In the case of web development, execute the good plan now and iterate, versus trying to perfect everything before release.

The upside of all of this is that our goal has been achieved. What we really wanted to accomplish is what now exists: an excellent resource for finding reusable Django applications, and no matter who implements it, that's a win for everyone!


Posted on April 1, 2008 at 1:14 A.M.

April Fool's Day rocks! Maybe I enjoy it because it happens to share the same day as my birthday, but I think it's more to do with the fact that everyone's having fun, being lighthearted, and simply not taking things too seriously. Last year the blog wasn't in any shape to do anything fun for the occasion, but this year it took me about 10 minutes to whip up some middleware fun. That's right, if you can read and understand this right now, the secret is out: a bit of Django middleware is all that's needed to turn your blog into l33t-sp34k central.

I'll even go one step further than telling you how I did it and give you the code:

import re
import lxml.html

# The substitutions run top to bottom (dicts keep insertion order on
# Python 3.7+), so the longer patterns come before the single letters.
trans = {
    'cks': r'xxors',
    'lol': r'r0flc0pt3r',
    'the': r'teh',
    'a': r'4',
    'e': r'3',
    'f': r'ph',
    'g': r'6',
    'h': r'|-|',
    'i': r'1',
    'o': r'0',
    's': r'5',
}

# Compile each pattern once instead of on every text node.
patterns = [(re.compile(pattern, re.I), repl) for pattern, repl in trans.items()]

def is_code_block(node):
    return node.attrib.get('class', None) == 'highlight'

def recursive_leetifier(node):
    for child in node.iterchildren():
        if is_code_block(child):
            # Leave highlighted code (and everything inside it) untouched.
            continue
        if child.text:
            for regex, repl in patterns:
                child.text = regex.sub(repl, child.text)
        recursive_leetifier(child)

class LeetSpeakMiddleware(object):
    def process_response(self, request, response):
        try:
            html = lxml.html.fromstring(response.content)
            recursive_leetifier(html)
            response.content = lxml.html.tostring(html)
        except Exception:
            # Anything that can't be parsed as HTML goes out unchanged.
            pass
        return response

The idea behind it is simple: Let the request go completely through Django's request/response cycle, and just before returning the correct response, parse the HTML and convert all of the actual content to l33t by doing some simple regular expression substitution. I'm using lxml.html simply because I attended Ian Bicking's talk at PyCon 2008 and was intrigued. I must say that the familiar ElementTree interface helped a lot in getting this code up and running in a short amount of time.
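If you want to play with the substitution step on its own, without Django or lxml, a rough sketch on a plain string looks like this. The `leetify` helper is a hypothetical name, and the sketch assumes Python 3.7+ so that the table is applied in the order it is written.

```python
import re

# Same table as the middleware; applied top to bottom.
trans = {
    'cks': 'xxors',
    'lol': 'r0flc0pt3r',
    'the': 'teh',
    'a': '4',
    'e': '3',
    'f': 'ph',
    'g': '6',
    'h': '|-|',
    'i': '1',
    'o': '0',
    's': '5',
}

def leetify(text):
    # Each pass rewrites the output of the previous one, so later
    # patterns also hit characters introduced by earlier replacements.
    for pattern, replacement in trans.items():
        text = re.sub(pattern, replacement, text, flags=re.I)
    return text

print(leetify('rocks'))
print(leetify('Hello'))
```

Because the passes cascade, the 'h' in a freshly inserted 'ph' or 'teh' gets rewritten too, which only adds to the effect.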

Hopefully you all find this holiday to be as fun as I do, and maybe I'll see some more l33t next year!

Recent Links

  • Simon Willison: The Implications of OpenID
  • I somehow missed this presentation when it came out, but it's an absolutely fantastic overview and defense of OpenID by Simon Willison. If you are in any way interested in what OpenID is and what it can offer, you owe it to yourself to check out this presentation.

  • StupidXML
  • Probably the simplest XML library that I've seen for Python. Sometimes you just want to generate some stupid XML, and this is the perfect tool for the job.

  • Pownce Adds a New API Response Format (LOLCAT)
  • This is awesome. That is all.

  • Django's queryset-refactor branch merged into trunk
  • This has been a long time coming, and thanks to the incredible efforts of Malcolm Tredinnick and others, Django has just gotten a heck of a lot better!

  • django-nyc
  • A New York City-based Django user group. It's great to see these local Django user groups, and Kevin Fricovsky and Loren Davie seem to be putting a lot of effort into this one. Hopefully this becomes a huge success! If you're a Django enthusiast in NYC, check it out and join in on the discussion.

  • Paver: Build, Distribute and Deploy Python Projects
  • Library for building and deploying Python projects. What's really cool about this project is that it brings together setuptools, nose, and sphinx. It's bootstrapped, as well, using itself for its own purposes.
