While looking over the proceedings of OOPSLA 2016, I ran across a paper called “Purposes, Concepts, Misfits, and a Redesign of Git.” In the paper, the authors, Santiago Perez De Rosso and Daniel Jackson, apply a theory of conceptual design to identify places where Git behaves badly, examine the real intent and purpose behind version control, and then propose solutions to those issues guided by concepts and purpose.

From a bird’s eye view, the theory end of the paper looks interesting. I think they did a good job breaking down the process of analyzing software and fixing core issues. This might be confirmation bias at work, because I share a similar methodology, though I haven’t taken the time to build a lexicon around it like the authors have.

From a Git perspective, I think things fell short. In particular, I think that by looking at version control from a generic point of view they failed to understand some of the core problems that Git is trying to solve, several of which fall outside your typical version control scenario. I think it’s important to keep in mind that Linus and the kernel community have some very real concerns that are challenging for many tools. Those concerns include scaling to thousands of contributors, accepting patches via mailing lists, and contributing when you don’t have access to the upstream repository.

File Renaming

As an example, let’s pick on file renaming. The paper has this to say about it:

File Rename Suppose you rename a file and make some changes to it. If you changed a significant portion of the file, then, as far as Git is concerned, you didn’t rename the file, but it is instead as if you deleted the old file and created a new one (which means that the file history is now lost). To work around this, you have to be diligent about creating a commit with the rename only, and only then creating a new commit with the modifications. This, however, likely creates a bogus commit that doesn’t correspond to a logical group of changes.

I can’t disagree with this statement. Much of what is said here is absolutely true, and it’s definitely an area of Git I’d really like to see improved. However, it fails to acknowledge some legitimate concerns that Linus faces when maintaining the kernel. Again, Git was made to help maintain the kernel. Linus did not set out to make a tool for everyone; he had to create a tool that could deal with the myriad workflows that the kernel community and its downstream consumers and contributors follow. I think that by focusing on the generic version control problem, the authors missed an important purpose that Git was trying to serve. For those interested, Linus laid out his thoughts in an email discussion on the Git mailing list quite a while ago.

What I found disappointing about oversights like this is the lost potential to come up with a solution that would better address the problem community-wide (for both the kernel community and those using Git as their version control tool of choice). It seems like a missed opportunity, and I really would have liked to see how they approached that problem.
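To make the rename misfit above a little more concrete, here is a minimal sketch of similarity-based rename detection. It rests on two assumptions: the 0.5 threshold mirrors Git's documented default of 50% for rename detection, and Python's difflib stands in for Git's own similarity estimate, which works differently internally. The function names and sample contents are made up for illustration.

```python
from difflib import SequenceMatcher

# Assumption: 0.5 mirrors Git's documented default rename threshold (50%).
RENAME_THRESHOLD = 0.5


def similarity(old_text, new_text):
    """Return a 0..1 score for how alike two file contents are, line by line.

    difflib is a stand-in here; it is not the algorithm Git actually uses.
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    return SequenceMatcher(None, old_lines, new_lines).ratio()


def classify(old_text, new_text, threshold=RENAME_THRESHOLD):
    """Decide whether a removed path and an added path look like a rename."""
    if similarity(old_text, new_text) >= threshold:
        return "rename"
    return "delete + add"


# Hypothetical file contents: ten short lines.
original = "\n".join(f"line {i}" for i in range(10))

# Renamed file with one line edited: still recognizably the same file.
light_edit = original.replace("line 3", "line three")

# Renamed file that was mostly rewritten: only two original lines survive.
heavy_edit = "\n".join(f"rewritten {i}" for i in range(8)) + "\nline 0\nline 1"

print(classify(original, light_edit))
print(classify(original, heavy_edit))
```

In this toy model the lightly edited file is still treated as a rename, while the heavily rewritten one falls below the threshold and is reported as a deletion plus an addition, which is the behavior the paper objects to.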

Untracking File

This was another spot in the paper where I think the analysis fell short. In this case, the misfit is that the user needs to edit a file for some local testing, but doesn’t want to commit the resultant changes. Here’s the text from the paper:

Untracking File Suppose there’s a database configuration file committed in the repository and you now want to edit this file to do some local testing. This new version of the file should not be committed. You could always leave out the file from the commit every time, but this is laborious and error-prone. You might think that you could make it ignored by modifying the .gitignore file but this doesn’t work for committed files. The way to ignore this file is to mark it as “assume unchanged,” but this marking will be cleared when you switch to another branch.

And here’s what the paper said about the issue in the analysis section:

Divided Ignored and Assumed Unchanged The problem with assume unchanged is that it violates the non-division criteria: the same purpose “prevent committing file” motivates both the concept of an ignored file and assume unchanged file. So the user needs to learn another concept and separate set of commands to do the same thing that the ignore mechanism provides, causing misfit “untracking file.”

I think the problem here is one of omission. This time I believe they’ve missed that failing to let users know about changes to files under version control violates Purpose 1 (“Make a set of changes persistent”) in the paper. If users aren’t told that they have changes, some of which may need to be in the current commit, then we’re failing to record the entire change.
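To make that concern concrete, here is a minimal sketch that drives Git from Python in a throwaway repository. It assumes git is installed and on the PATH; the file name db.conf, its contents, and the helper names are hypothetical. It compares what git status reports for a modified tracked file before and after the file is marked “assume unchanged.”

```python
import subprocess
import tempfile
from pathlib import Path


def git(repo, *args):
    """Run a git command inside `repo` and return its standard output."""
    result = subprocess.run(
        [
            "git",
            "-c", "user.name=Example",               # placeholder identity
            "-c", "user.email=example@example.com",  # placeholder identity
            "-c", "commit.gpgsign=false",
            *args,
        ],
        cwd=repo,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def demo_assume_unchanged():
    """Return `git status --porcelain` output before and after marking the file."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        config = repo / "db.conf"  # hypothetical committed config file

        git(repo, "init", "--quiet")
        config.write_text("host=production\n")
        git(repo, "add", "db.conf")
        git(repo, "commit", "--quiet", "-m", "Add database config")

        # Edit the committed file for local testing.
        config.write_text("host=localhost-for-testing\n")
        before = git(repo, "status", "--porcelain")

        # Mark the file; Git stops checking it for modifications.
        git(repo, "update-index", "--assume-unchanged", "db.conf")
        after = git(repo, "status", "--porcelain")
        return before, after


if __name__ == "__main__":
    before, after = demo_assume_unchanged()
    print("before marking:", repr(before))
    print("after marking: ", repr(after))
```

Before the file is marked, the status output lists db.conf as modified. Afterwards the modification should no longer be reported at all, and that silence is exactly what worries me with respect to Purpose 1.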

Their solution, Gitless (with its gl command), introduces a new concept as well: tracking and untracking. Unfortunately, it’s not clear whether untrack means “stop versioning this file” or “ignore changes to this file until I say otherwise.” Does gl track replace git add? It seems like it should, but the documentation isn’t clear on this. So then does gl untrack mean git rm? No, it means “ignore changes until I say otherwise.” If you want to remove the file, you simply remove it from your working tree, and Gitless detects the removal if the file is being tracked. So there’s an asymmetry here that seems confusing. That said, I believe the track concept is a good one, and it would certainly cause less friction than what folks have now, but I’d like to hear what the authors have to say about how it addresses Purpose 1.

Final Thoughts

I may be criticizing the analysis the authors put forward, but I appreciate that they took the time to look deeply at the issues and attempted to do something about them. I wish they hadn’t gone as far as they have (for instance, removing the staging area concept, which is actually very nice when staging changes for commit), but I do hope that some of their work ends up in Git. I like the idea that changes to tracked files follow the branch, even if they’re not committed (though they don’t say what happens when you push the branch to a remote repository). I actually like the track/untrack concept too, though I still worry about forgetting to put the file back into a tracked state and missing something. And no, I don’t use “assume unchanged,” primarily for that reason.

I hope the project continues and that some of the snags get polished out. I think there’s real potential for a nicer tool for newbies. But I also hope that they don’t forget about the expert users, and that they find a way to get some of this back into core Git for everyone to share.