Taking a look at Mercurial

So, now that the dust has settled–to some degree–on distributed version control systems being the answer to every development problem, I decided to try a few out. To be honest, I tried a number of tools out months (a year?) ago, but many of them were still in the early stages of development and had a number of shortcomings. One of the tools I revisited recently was Mercurial.

Mercurial is an interesting tool. It borrows concepts from several other DVCSes, but is by no means a clone of those projects. Mercurial has a fast backend, which makes a number of operations very, very quick. The command line interface is straightforward and intuitive (as long as you aren’t trying to do named branches). A number of the commands mirror the svn commands, which makes the transition easier–at least for longtime users of svn, such as myself. Plus, it’s written in Python, which makes it easy for me to hack.

However, I did find some shortcomings. Some of these are “mental model” issues, meaning that the model I envision for how things should work doesn’t match the one envisioned by the Mercurial developers. To be fair, I’ve not raised a single one of these issues with them. Looking at the mailing lists, some of them have already been flogged to death, so I didn’t see a real need to raise them again. This commentary is in no way meant to be a slight against the Mercurial developers either. I respect their work and their decisions, and will go on record as saying that they developed an impressive tool, especially considering the code base comes in at only 18,000 lines.

Issue #1: Branches are repositories

Before I go any further, let me clarify the above statement. When I say branch, I’m talking specifically about the ability to track the long-term history of, and perform maintenance on, a separate line of development for a lengthy period of time. That said, this is largely a mental-model issue. The default model for working with Mercurial is to have a separate repository for every development branch. This conflicts with my idea of a repository, which should house everything about the development history of a product. Besides that, I believe there are several other reasons that you’d want to keep the entire development history in a single repository:

  1. Safety. If your project is composed of 150 different repositories, you can’t really make the claim that you have safety in numbers because a number of those development branches may be abandoned, or may not be cloned by others. You can mitigate this issue by creating a server to house the master repositories, but that leads to the next problem.
  2. Administration. At my workplace–and many others–access to our primary servers is severely restricted. That’s a strict requirement. I don’t have a choice about that, unless I want to give up several certifications that enable us to do work for our customers. Can this problem be solved in other ways? Sure. I could set up a web page that allows you to create branches for your project on the server. That’s more work for me, and worse, it impedes our development flow since I need to leave my environment and navigate to a web page to create my development branch.
  3. Bug tracking. We currently use Trac as our primary bug tracker and we’ve been very happy with it. However, I don’t know how Mercurial is going to work with Trac given that branches are generally separate repositories. Moreover, it becomes increasingly difficult to track statistical data (the number of opened and closed issues for a project, their average life, etc.).
  4. Support tools and bug hunting. In effect, Mercurial has made the same mistake Subversion did–and one that I believe is a large, gaping hole that others are trying their best to patch over: branches are not first-class citizens. Currently, Subversion tools that interact with the repository (such as FishEye) depend on regular expressions to determine whether a path represents a branch, a tag, or trunk. Furthermore, these tools do a tremendous amount of work to determine things like the complete branch history of a file. Mercurial now has a similar issue, since the complete branch history isn’t maintained in a single repository. When you’ve tracked down a serious bug, it’s extremely valuable to know which branches are affected, and that knowledge often helps define what our follow-on action needs to be. Can it be solved? Yes. Does it require a great deal of work on my part? Yes, and that’s ultimately the problem for me and for developers of support tools for Mercurial.
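
To make the “which branches are affected” point concrete, here is a minimal sketch of the query I’d want a support tool to answer when the whole history lives in one place. This is not Mercurial’s API; the changeset IDs, the branch names, and the `ancestors` and `affected_branches` helpers are all made up for illustration, and the history is just a dictionary mapping each changeset to its parents.

```python
from collections import deque

# Hypothetical history: each changeset ID maps to its parent IDs.
PARENTS = {
    "c0": [],
    "c1": ["c0"],
    "c2": ["c1"],        # assume the bug was introduced here
    "c3": ["c2"],        # head of 'default'
    "c4": ["c1"],        # head of 'release-1.0', forked before the bug
    "c5": ["c2"],
    "c6": ["c5", "c4"],  # head of 'feature', a merge of c5 and c4
}

# Hypothetical branch names mapped to their head changesets.
HEADS = {"default": "c3", "release-1.0": "c4", "feature": "c6"}


def ancestors(parents, start):
    """Return start plus every changeset reachable through parent links."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for parent in parents[node]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def affected_branches(parents, heads, buggy):
    """Return sorted names of branches whose head descends from buggy."""
    return sorted(
        name for name, head in heads.items()
        if buggy in ancestors(parents, head)
    )


if __name__ == "__main__":
    print(affected_branches(PARENTS, HEADS, "c2"))
```

With every branch in one graph, the answer is a single traversal per head. When each branch is a separate repository, a tool first has to find all of those repositories and stitch their histories together before it can even ask the question.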

Issue #2: Named branches are hard to use

I tried solving #1 by using Mercurial’s named branch feature. I basically tried to mimic the development flow that I’d use in Subversion, which may be part of the problem. I successfully created a branch called ‘foo’ and committed a couple of changes. However, what I did not expect is that when I switched back to the default branch and tried to update, all my changes in foo would still be present. It could be a PEBKAC error, but I found using named branches confusing at best, and ultimately found no way to get the behavior that I desired. After running into this issue, I didn’t even bother to try cherry-picking revisions and other things that are necessary on maintenance branches.

Update: The problem I was having was two-fold. First, changing to another branch didn’t switch my working copy. I can deal with that to some degree. The bigger issue is that when I switched back to the default branch and tried ‘hg update’, I was still at the tip of foo, not default. That is highly unintuitive, and contrary to the example in the user’s guide. So I went back and tried ‘hg update -C default’, and it errored out, saying it didn’t know anything about the ‘default’ revision. I’m certain I’ll come back and revisit this issue, but I did want to at least show that I tried.

Another Update: There seems to be a bug in Mercurial 0.9.3 which prevented ‘hg branches’ from showing the default branch, and ‘hg update default’ from working. I just tried a development snapshot and it appears to handle this situation better.

Issue #3: Working copies are repositories

I like the idea, but it would fall over dead for some of my projects. For example, one of the repositories I have at my workplace is 6GB in size (and that’s small for some folks)! There are a ton of binary images in it, and they really need to be there. I can see absorbing the cost once, and having multiple working copies based off of one repository. But cloning that repeatedly? I don’t see it happening. I have a nice dev machine, equipped with RAID 10 and a ton of space, and I’m already pushing my storage limit. Hosting repositories repeatedly on my machine would blow it away. Storage may be cheap, but it isn’t always cheap to add it. It costs time and money for an administrator to come and perform that task. In defense of Mercurial, it makes it easy to take the work with you. Unfortunately, that’s a feature that’s largely lost in my environment, as I can’t take my work home without going to prison.

Summary

Despite the issues that I’ve outlined, I think Mercurial is a highly usable tool. It definitely does a superior job of merge tracking (compared to Subversion), since it has a changeset-oriented DAG. The benefits of that really showed when I ran through the tutorials and created several example repositories of my own. I’d highly recommend Mercurial to DVCS first-timers, as it’s one of the simplest DVCS tools out there.