Brackets Dockerfile Syntax Highlighter Using the Jacob Lexical Tokenizer

Brackets is an excellent open-source code editor. Originally from Adobe, it is now a community-developed project on GitHub.

It comes with plugin extensions for pretty much everything you might need, such as Git integration, linters (code quality analysis tools) and language syntax highlighters.

Recently I have started contributing my own syntax highlighters for M4 macros and Dockerfiles, and it is this latter project that this blog is about.

Under the hood, Brackets uses CodeMirror to provide language syntax highlighting. CodeMirror comes with a range of language “modes”, which are really just JavaScript modules that statefully tokenize code into CSS styles for the syntax colouring/highlighting. They can also handle indenting and commenting.
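To make the idea of a mode more concrete, here is a minimal sketch of the shape CodeMirror expects: an object with a startState() method and a token(stream, state) method that consumes some text and returns a style name. The LineStream class, the toyDockerMode object and the highlightLine() helper are hypothetical stand-ins invented so the sketch runs on its own; they are not CodeMirror's implementation and not my extension's code.

```javascript
// Hypothetical stand-in for CodeMirror's stream: one line of text plus a cursor.
class LineStream {
  constructor(line) { this.string = line; this.pos = 0; this.start = 0; }
  sol() { return this.pos === 0; }                    // at start of line?
  eol() { return this.pos >= this.string.length; }    // at end of line?
  // Consume the regex only if it matches exactly at the cursor.
  match(regex) {
    const m = this.string.slice(this.pos).match(regex);
    if (m && m.index === 0) { this.pos += m[0].length; return m; }
    return null;
  }
  current() { return this.string.slice(this.start, this.pos); }
}

// A toy mode: the state object survives from token to token and line to line.
const toyDockerMode = {
  startState() { return { seenKeyword: false }; },
  token(stream, state) {
    if (stream.sol()) state.seenKeyword = false;      // new line, new directive
    if (stream.match(/^\s+/)) return null;            // whitespace: no style
    if (!state.seenKeyword && stream.match(/^#.*/)) return "comment";
    if (!state.seenKeyword && stream.match(/^(FROM|RUN|CMD|COPY)\b/)) {
      state.seenKeyword = true;
      return "keyword";
    }
    stream.match(/^\S+/);                             // any other word
    return null;
  }
};

// Drive the mode over one line, collecting [text, style] pairs.
function highlightLine(mode, line, state) {
  const stream = new LineStream(line);
  const out = [];
  while (!stream.eol()) {
    stream.start = stream.pos;
    const style = mode.token(stream, state);
    out.push([stream.current(), style]);
  }
  return out;
}
```

Calling highlightLine(toyDockerMode, "RUN echo hi", toyDockerMode.startState()) yields one pair per token, with the directive styled as "keyword" and everything else left unstyled.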

I wrote my original extensions in a similar manner, hand-coding the state-machine and tokenizing from the code using regular-expressions.  However, I quickly realised, with my Dockerfile extension, that this code had become too complicated, too convoluted and difficult to maintain.  Just look at this code in my project’s history…

Now, my background is in C coding, with experience of tools like Lex/Flex and Yacc/Bison. Flex is an open-source lexical analyzer generator and Bison a parser generator. What I wanted was something similar, but for JavaScript. On searching, I found Jacob (also available via npm), which provides both of these capabilities in one tool. It seemed the lexer component of Jacob would be an ideal way of coding, and hopefully simplifying, my Dockerfile extension.

Installing Jacob was easy:

I created a Dockerfile.jacoblex file.  This provides a lexical definition of the language I wanted to parse and tokenize.  This file is divided up into 3 sections, separated by %%.

The first section declares the lexer’s module name:

The next section is to define named regular expressions:

In this case, just a regex matching all of the Dockerfile’s possible keywords.

The final section defines the parsing rules and state-machine.  Here is a simple example. This parses a comment and returns the ‘COMMENT’ token:

A more complicated example, using the above named regex:

The first part of this rule matches on the {directive} (Dockerfile keywords) and then uses this.pushState() to advance the state-machine, e.g. to DOCKDIR, so that the rules associated with that state, denoted by <DOCKDIR>, can then be applied. The method this.popState(), as its name implies, reverts to the previous state on the stack.
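The state stack is easier to picture with a small hand-written sketch. The ToyLexer class below is hypothetical and is not the code Jacob generates; it only mimics the pushState()/popState() behaviour described above, classifying one word at a time, with a short keyword list chosen purely for illustration.

```javascript
// Illustrative keyword list (an assumption, not the full Dockerfile set).
const DIRECTIVE = /^(FROM|RUN|CMD|ENV)$/;

// Toy lexer with a stack of state names; NOT Jacob's generated code.
class ToyLexer {
  constructor() { this.states = ["INITIAL"]; }
  get state() { return this.states[this.states.length - 1]; }
  pushState(name) { this.states.push(name); }
  popState() { if (this.states.length > 1) this.states.pop(); }

  // Classify one word according to whichever state is on top of the stack.
  classify(word) {
    if (this.state === "DOCKDIR") {
      // Rules that would be tagged <DOCKDIR>: arguments until end of line.
      if (word === "\n") { this.popState(); return null; }
      return "string";
    }
    // Rules for the initial state.
    if (word === "\n") return null;
    if (DIRECTIVE.test(word)) {
      this.pushState("DOCKDIR");   // the <DOCKDIR> rules apply from here on
      return "keyword";
    }
    return "error";
  }
}

const lexer = new ToyLexer();
const styles = ["FROM", "ubuntu", "\n", "bogus"].map((w) => lexer.classify(w));
```

Here "ubuntu" is only treated as an argument because the preceding FROM pushed DOCKDIR; once the newline pops that state, "bogus" is judged by the initial rules again and flagged as an error.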

This is just a taster; you can view the complete file here.

The lexer module is generated from this, using jacob:

This creates the JavaScript file dockerlex.js, which can be imported into my extension’s main.js script:

Integrating the generated lexer into a custom CodeMirror Mode proved a little challenging, until I realised that I could simply 1) use the lexer itself as the mode’s State object, and 2) extend the Stream object to provide the extra methods expected by Jacob.

Here I create the mode’s state object:

and extend the stream object with these methods:

These were taken and tweaked from Jacob’s own StringReader object.

As CodeMirror feeds my tokenizer the stream line-by-line, I needed to think carefully about how the lexer could work (e.g. the regex ‘$’ directive does not work, requiring an alternative approach using this.input.more()), and also to reapply the stream on each iteration.

The start state is created using:

Then for each iteration, I ensured the lexer’s input was reset to the current stream object:

The call to state.nextToken() in fact calls the lexer generated by Jacob. The returned token’s name attribute is then passed back as the syntax highlighting style name (e.g. ‘def’, ‘string’, ‘error’, etc).
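The overall pattern (lexer as state, input re-attached on every call, token name returned as the style) can be sketched as follows. StubLexer is a hypothetical placeholder for the generated lexer, and the stream here is a plain object with line and pos fields rather than CodeMirror's stream; only nextToken() and the input property are names taken from the description above.

```javascript
// Stub standing in for the generated lexer; only the shape matters here.
class StubLexer {
  constructor() { this.input = null; }
  nextToken() {
    const rest = this.input.line.slice(this.input.pos);
    const isComment = rest.startsWith("#");
    this.input.pos = this.input.line.length;   // consume the rest of the line
    return { name: isComment ? "comment" : "string" };
  }
}

const mode = {
  // The lexer object itself serves as the mode's state.
  startState() { return new StubLexer(); },
  token(stream, state) {
    state.input = stream;            // re-attach the stream for the current line
    return state.nextToken().name;   // the token name doubles as the style name
  }
};

const state = mode.startState();
const first = mode.token({ line: "# build image", pos: 0 }, state);
const second = mode.token({ line: "ubuntu:latest", pos: 0 }, state);
```

Because the state outlives each line's stream, re-attaching the input at the top of token() is what keeps the lexer reading from the line CodeMirror is currently asking about.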

I realised CodeMirror’s internal copyState() method couldn’t fully copy the lexer state object, so I coded a custom method:

and also added a blankLine() method to pass a dummy newline to the lexer, as CodeMirror normally drops empty lines.
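Why a custom copy was needed is easier to see with a toy state object. In the sketch below, makeState(), copyLexerState() and blankLine() are hypothetical names and the state layout is invented for illustration; the real lexer object has different fields. The point is that nested objects need their own copies, otherwise a saved snapshot changes when the live state does.

```javascript
// Hypothetical lexer-like state: a stack of state names plus a nested object.
function makeState() {
  return { states: ["INITIAL"], localMode: null, input: { pos: 0 } };
}

// Copy every field that must not be shared between snapshots.
function copyLexerState(state) {
  return {
    states: state.states.slice(),       // new array holding the same names
    localMode: state.localMode,         // shared on purpose: treated as read-only
    input: { pos: state.input.pos }     // nested object gets its own copy
  };
}

// Stand-in for the blank-line hook: simulate the effect a newline rule
// would have, here by leaving any directive state.
function blankLine(state) {
  if (state.states.length > 1) state.states.pop();
}

const live = makeState();
live.states.push("DOCKDIR");
const snapshot = copyLexerState(live);
blankLine(live);          // mutates the live state only
live.input.pos = 7;       // the snapshot's nested input is unaffected
```

After this runs, snapshot still records the DOCKDIR state and an input position of 0, while live has moved on.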

You can view this complete main.js script in GitHub here.

Finally, I was able to switch CodeMirror syntax highlighting to use its builtin mode for “Shell” scripts when my lexer encountered either a RUN or CMD Dockerfile directive:

In main.js the bashMode was retrieved from CodeMirror using:

and when state.localMode is set by the lexer, above, the nested shell code is tokenized using:

The check for a line ending in ‘\’ allows line continuation, so that shell scriptlets on these directives can span multiple lines.
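The continuation logic can be sketched on its own, away from CodeMirror. Both functions below are hypothetical helpers written for this illustration; ignoring trailing whitespace after the backslash is a choice made in the sketch, not necessarily what my extension does.

```javascript
// True when the line ends with a backslash (trailing whitespace ignored).
function continuesOnNextLine(line) {
  return /\\\s*$/.test(line);
}

// Label each line as "shell" or "dockerfile": a RUN or CMD line starts a
// shell region, which carries on while each line ends with a backslash.
function labelLines(lines) {
  let inShell = false;
  return lines.map((line) => {
    const startsShell = /^\s*(RUN|CMD)\b/.test(line);
    const shellLine = inShell || startsShell;
    inShell = shellLine && continuesOnNextLine(line);
    return shellLine ? "shell" : "dockerfile";
  });
}

const labels = labelLines([
  "FROM ubuntu",
  "RUN apt-get update \\",
  "  && apt-get install -y curl",
  "COPY . /app"
]);
```

The third line carries no directive of its own, but it is still labelled as shell because the line before it ended with a backslash.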

The resulting code and jacoblex rules are, in my opinion, much easier to understand and should save me much pain in supporting the extension going forward.

The full project can be viewed here.

Here are a few screenshots from the GitHub project page:
