Looking at ANTLR...

I was sitting down last week looking at (yet another) problem where I needed to parse some data and bust it up into a tree, so that I could walk it and generate some specialized output. Generally, my need for doing this has been smallish and very specialized, and I know exactly what I want, so cranking out the code by hand is pretty painless. In this case, however, I need to parse some C, which isn’t all that friendly to parse without a compiler front-end. Which then led me to think about ANTLR.

I’ve run across ANTLR in the past, and started thinking… I was about to parse this stuff into a tree… if I had to name that tree… it would probably be an Abstract Syntax Tree! So I finally decided to invest some mindshare into it and get The Definitive ANTLR Reference: Building Domain-Specific Languages by Terence Parr. Mind you, this isn’t my first foray into compiler front-ends, grammars, and such, despite the fact that the only two comp-sci courses I’ve ever taken are Cryptography and Digital Signal Processing Algorithms. However, I’m finding the book to be an amazingly easy read, and it bears a lot of relevance to the several times I’ve needed to recode data from one format to another. Up front, Terence gets you writing a small grammar and seeing some output. He then shows you how to turn it into an Abstract Syntax Tree, and how to write another grammar to perform actions on that tree. I’ve flown through the book so far… 142 pages in roughly 4 hours. I expect to have it done by the week’s end, and hopefully be able to do something productive with it. I do have to applaud the author for this little nugget:

…I implore everyone to please stop using XML as a human interface!..

I couldn’t agree more. It seems like everywhere I turn, yet another project or tool wants to shove an XML file in my damn face. I get XML, but I despise it as a human interface. It’s too constraining, too verbose, and requires far too much knowledge to actually get it right. And while it may be “easy” to parse (not easy because it’s simple, easy because tools already exist), it comes at a heavy cost for the user. Something quick and easy, with only a few rules to follow, is a much better way to go.
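
Going back to the grammar-to-tree-to-walker workflow for a second, here is roughly what that pipeline looks like when you crank it out by hand. To be clear, this is not ANTLR and not anything from the book: it is a small hand-rolled recursive-descent sketch in Python, and the toy arithmetic grammar plus every name in it (tokenize, Parser, evaluate, to_lisp) are my own made-up illustrations. The point is only to show the two passes: one that turns text into a tree, and separate walkers that do something with that tree.

```python
import re

# One token is an integer, an operator, or a parenthesis (toy language, assumed).
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split the input into a flat list of token strings."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """Hand-written parser for a hypothetical grammar:

        expr : term (('+' | '-') term)* ;
        term : atom (('*' | '/') atom)* ;
        atom : INT | '(' expr ')' ;

    Tree nodes are either an int or a tuple (operator, left, right).
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            node = (self.take(), node, self.term())
        return node

    def term(self):
        node = self.atom()
        while self.peek() in ("*", "/"):
            node = (self.take(), node, self.atom())
        return node

    def atom(self):
        token = self.take()
        if token == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token is not None and token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")


def parse(text):
    """First pass: text in, tree out."""
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input starting at {parser.peek()!r}")
    return tree


def evaluate(node):
    """Second pass, walker #1: compute the value of the tree."""
    if isinstance(node, int):
        return node
    op, left, right = node
    lhs, rhs = evaluate(left), evaluate(right)
    if op == "+":
        return lhs + rhs
    if op == "-":
        return lhs - rhs
    if op == "*":
        return lhs * rhs
    return lhs / rhs


def to_lisp(node):
    """Second pass, walker #2: re-emit the same tree in prefix form."""
    if isinstance(node, int):
        return str(node)
    op, left, right = node
    return f"({op} {to_lisp(left)} {to_lisp(right)})"
```

Calling parse("1 + 2 * 3") gives the tree ("+", 1, ("*", 2, 3)), which evaluate turns into 7 and to_lisp turns into "(+ 1 (* 2 3))". Two different outputs from one tree is exactly the "recode data from one format to another" situation, and it is fine at this size. It is also exactly the kind of code I don’t want to write by hand for C.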