Have an error-handling strategy...

Seems like more and more I’m finding applications that have little or no error-handling strategy, which is a real shame. The job the application is performing is important to me: I want to use it to save myself the time and headache of doing something repetitive or mind-numbing. Unfortunately, while the application does its job well on clean input, it fails on less-than-perfect input. Now, I’ve been using computers since I could barely say “computer,” so I’m well-versed in telling my computer what it wants to know, in the format that it wants to know it. And I’ve become accustomed to looking at tracebacks and using other tools (strace, ltrace, gdb, etc.) to find what is breaking, and correct my input. However, that doesn’t work for your average user, even if your average user is a developer. The end result: the application ends up with a bad rap pretty quickly. This is especially true if you have a command line application, and you have a bunch of users who aren’t command line junkies.

An error-handling strategy doesn’t have to be particularly complicated to make for a better user experience. In fact, it’s really pretty straightforward. But you do need to think about having a strategy before you write your application. Why? Because you need to partition your errors into “this is a problem for the user to resolve” and “this is a problem for the developer to solve”. If you don’t have a strategy up front, then you won’t partition the two classes of problems, and you’ll have to go back and look at every single exception being generated to determine whether it’s a user-input problem, or a failure of the application. Best to figure this out up front.

First, create your own exception class. In my Python apps, I generally use:

class ScriptError(Exception):
    pass

Second, on anything that’s meant to tell the user about a problem that is theirs to resolve (malformed paths, a full disk, invalid command line option, etc.), simply raise a ScriptError in response:

import os
# ...
if not os.path.exists(path):
    raise ScriptError("Can't find path '%s'.  "
                      "Please re-run specifying the location of the config file." % path)
# ...

or something like,

# ...
try:
    foo(something)
except ArithmeticError as e:  # "as" works on Python 2.6+ and Python 3
    raise ScriptError(e)
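If you're on Python 3, you can go one step further and keep the original exception attached while still showing the user a friendlier message. Here's a minimal sketch of that; `average` is a hypothetical stand-in for `foo`, not something from a real library:

```python
class ScriptError(Exception):
    """An error the user can fix."""


def average(values):
    # Hypothetical helper: an empty list raises ZeroDivisionError,
    # which is a subclass of ArithmeticError.
    try:
        return sum(values) / len(values)
    except ArithmeticError as e:
        # "from e" stores the original exception on __cause__ (Python 3 only),
        # so the low-level detail is still there if a developer needs it.
        raise ScriptError("Can't average an empty list of values.") from e
```

The user sees the ScriptError message, and the underlying ZeroDivisionError is still reachable through `__cause__` for debugging.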

Finally, catch that somewhere near the top-level, and output the error message:

import sys

if __name__ == '__main__':
    try:
        main()
    except ScriptError as e:
        # Short message on stderr, non-zero exit status, no traceback.
        sys.stderr.write("ERROR: %s\n" % e)
        sys.exit(1)
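To see the three pieces working together, here's a rough end-to-end sketch. The `load_config` and `run` helpers and the usage message are my own inventions for illustration; your application will have its own shape:

```python
import os
import sys


class ScriptError(Exception):
    """An error the user can fix (bad path, bad option, and so on)."""


def load_config(path):
    # Hypothetical helper: a missing file is the user's problem to resolve.
    if not os.path.exists(path):
        raise ScriptError("Can't find path '%s'.  "
                          "Please re-run specifying the location of the config file." % path)
    with open(path) as handle:
        return handle.read()


def run(argv):
    """Return an exit status rather than exiting, which keeps this easy to test."""
    try:
        if len(argv) != 2:
            raise ScriptError("Usage: %s CONFIG_FILE" % argv[0])
        load_config(argv[1])
    except ScriptError as e:
        sys.stderr.write("ERROR: %s\n" % e)
        return 1
    return 0
```

In a real script you would finish with `sys.exit(run(sys.argv))` under the usual `if __name__ == '__main__':` guard. Any exception that isn't a ScriptError still escapes `run` with a full traceback, which is exactly what you want for bugs.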

That’s it. It’s that simple. And look what we have done:

  1. The user is now informed of the problem in a meaningful way.
  2. You didn’t print a 1000-line, impossible-to-comprehend traceback for an error that is the user’s to fix.
  3. The combination of which makes for a much better user experience.

You will still get tracebacks out of this strategy, and that’s not terrible. Tracebacks can aid in working with users to determine the root cause of why something broke. And the presence of one now clearly indicates a bug, rather than sending the mixed message that it could be either a bug or a user-input problem. Of course, if your application is generating many tracebacks, even with this strategy in place, you’ve got bigger problems. :-)
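If you want to make that split explicit to the user, one optional variation is to keep the traceback for unexpected exceptions but add a note saying it's a bug. This is only a sketch: the `run` wrapper, the exit status of 2, and the wording of the note are all assumptions of mine, not part of the strategy above:

```python
import sys
import traceback


class ScriptError(Exception):
    """An error the user can fix."""


def run(main):
    """Call main() and map the outcome to an exit status."""
    try:
        main()
    except ScriptError as e:
        # User-resolvable problem: short message, no traceback.
        sys.stderr.write("ERROR: %s\n" % e)
        return 1
    except Exception:
        # Anything else is treated as a bug: keep the full traceback on stderr.
        traceback.print_exc()
        sys.stderr.write("This looks like a bug. Please send the traceback "
                         "above to the developers.\n")
        return 2
    return 0
```

The distinct exit status lets a wrapper script tell "fix your input" apart from "file a bug report" without parsing the output.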

Is this strategy perfect? No, and it’s not meant to be. The right answer depends on the project, its user-base, and development time. However, it’s minimal, relatively painless, better than nothing, and gets you the 80% solution. So why not use it?

I’m off to work to fix some broken-ass error handling… wish the original developers had taken my approach to begin with.