Building shells with a grain of salt [shell, pedagogical, release]

The shell is at the heart of Unix. It's the glue that makes all the little Unix tools work together so well. Understanding it sheds light on many of Unix's important ideas, and writing our own is the best path to that understanding.

Earlier this year, at a place I worked, I decided to run a series of workshops on writing a Unix shell. A lot of questions had come up that I think writing a shell leads you to answer, along with issues that suggested shaky mental models of the shell and its scripting language.

A small sampling of those kinds of questions:

  • why does this Python replacement for this shell script deadlock?
  • why are there so many Unicorn processes on this server?
  • when are variables quoted in a shell script? why aren't they always visible to other programs I run?
  • how does set -e work?
  • how does control-C (^C) work? why do I need to use ^\ sometimes instead? why doesn't ^C work the way I'd expect on this bash for loop?
  • what's special about init?
  • why do we do these steps in daemonization?

At this company, we had a regular Friday afternoon workshop/lecture series. I had previously tried to do an overview of Unix process relationships, but it felt too abstract. So, I tried to make it more concrete by getting everyone to actually implement a shell.

Initially, this was just a rough layout of what I thought I could cover in each session, and pointers to manpages. I never turned this into the full, DIY, self-paced tutorial I had hoped, but (in the spirit of release early, release often) I am opening up my work in progress at https://github.com/tokenrove/build-your-own-shell.

This isn't "finished", but if you're ambitious, you should be able to make something that passes all the tests. I decided it would be better to put it out there, even in rough form, than keep it sealed up. After all, a number of people enjoyed the workshop out of which this came.

(Caveat for macOS and *BSD users: there's still something wonky about the timing in the section that tests signals and job control; hopefully by the time you're reading this, I'll have it worked out, but if not, I apologize.)

In this post I'll reflect on some choices I made, and follow a few tangents that come up in the text but would be disruptive there.

Testing shells

I decided that, for this to be useful for self-study, it should contain an automated test suite. I love Tcl and expect, and had figured expect would be a natural tool for testing the interactive components of shells. I took a quick look at how other shells were testing themselves. Most were strictly non-interactive tests, using shell scripts and comparing with expected output. A nice exception here is fish, which indeed uses expect for its interactive tests.

This makes sense, but I wanted to focus on interactive shells: in part because so many tutorials ignored the considerations of interactive shells, but also because I felt people would enjoy themselves more if they could use the shell they were writing directly.

I started with some tests edited from the output of autoexpect, but this turned out to be too fragile. Something I noticed in the first workshop was that people really enjoyed customizing their prompt; this should be no surprise (prompt customization is a perennial time-wasting activity in any shell), but it meant I'd have to be careful about how I matched outputs in tests. In particular, I couldn't really depend on detecting and matching the prompt.

The other tricky thing is that I couldn't use any feature in the tests that hadn't been developed yet, so using conditionals or echoing $? wasn't possible in the early tests.

I considered writing a wrapper using ptrace(2) that would watch for all fork / execve / wait syscalls from the shell and its children, and print those in a form easily consumed by a test harness (this seemed easier to do than cleaning up the output of strace), but things like prompts that exec git every time, as well as ptrace's noted stubbiness on macOS, prevented me from going further with this.

So that's where the workshop sat for a long time, until I finally decided to use a little test description language in place of expect scripts directly. So now a typical test might look like:

→ true || false || echo-rot13 foo⏎
≠ sbb
→ false || true && echo-rot13 foo⏎
← sbb
→ exit 42
☠ 42

For whatever reason, expressing things this way allowed me to finally write out all the tests I had intended to have, without focusing too much on the implementation of the test harness. Then I wrote some Tcl to interpret these files.

I decided to go with string matching in the output, which is not particularly robust, but is simple. Because of discrepancies between how different shells and TTY drivers draw things, it can be prone to matching the echoed input as the output if one isn't careful. There are also some timing issues; the script written by autoexpect suggests inserting a 100ms delay between each keystroke sent, but this makes the tests cripplingly slow; I'm still trying to find a tuning that is reliable across systems but speedy enough to be usable.

I decided that bash and mksh should pass all the tests, and cat should fail every test. There's nothing worse than a test that fails to actually test something. This reminds me of the admonishment "don't try to do what a corpse can do better": goals phrased in the negative (like "stop reading Hacker News") are hard to achieve — the dead (or cat) will always do them better than you. Positive goals (and tests) are more actionable.

There are still some timing issues on different platforms, but I don't regret making the simple choice for now.

Minimal shell builtins

Doing the workshop led me to think about minimizing shell builtins; one of the questions that comes up a lot is why cd needs to be a builtin, but what doesn't come up until one is much deeper into pipelines and job control is what a pain builtins are, in how they interact with the rest of the shell's features. It would be nice to get rid of them.

There are some commands which are builtins only to make them fast, like echo, true, and false. These usually have equivalents in /bin already.

Some builtins are required because they modify the shell's own environment: cd, exit, fg, bg, jobs, exec, wait, ulimit. (This is excluding really tricky, impractical things, like using shared memory, process_vm_writev, or ptrace to modify the shell from an outside process.)1
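A minimal sketch of why cd has to live inside the shell (the function name cd_in_child is made up for illustration): a child process can change its own working directory, but the change never reaches the shell that launched it.

```sh
# Hypothetical demo: a child process can only change its *own* working
# directory, which is why cd cannot be an ordinary external command.
cd_in_child() {
    sh -c 'cd / && pwd'   # the child moves to / and reports it
    pwd                   # the calling shell has not moved
}
```

Running cd_in_child from any directory other than / prints / first and then the directory you were already in.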

To prove a point, you could take functional programming to an extreme and have an immutable shell where cd executes a new shell in the chosen directory, but some of the others are probably not possible in the presence of typical job control.2
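As a rough sketch of that immutable cd (chdir_exec is a hypothetical name, and falling back to $SHELL or /bin/sh is my assumption, not something any particular shell does): change directory, then replace the current process with a new program that starts life in the new directory.

```sh
# Hypothetical "immutable" cd: rather than mutating the current shell,
# change directory and then replace this process with a new program
# (by default a fresh shell) that starts out in the new directory.
chdir_exec() {
    cd "$1" || return
    shift
    # With no command given, chain-load a new shell (assumed default).
    [ "$#" -gt 0 ] || set -- "${SHELL:-/bin/sh}"
    exec "$@"
}
```

Calling chdir_exec /tmp replaces the current shell with a new one sitting in /tmp; calling it inside a command substitution, as in $(chdir_exec / pwd), shows the same chain loading with an arbitrary command. Anything that wasn't exported, along with the old shell's job table, is lost in the exec, which is the problem with job control mentioned above.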

If we take this line of thought further, we can try externalizing some of the shell's operators. Conditional execution is interesting. How about && and ||? Syntactically, we probably can't pull these off as external commands, but we could provide commands and and or which take commands to execute.
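A minimal sketch of what such commands could look like, written as shell functions so they are easy to try (cmd_and and cmd_or are hypothetical names, and passing each command as a single string to sh -c is just one possible calling convention):

```sh
# Hypothetical stand-ins for && and ||. Each argument is a whole command
# line, run by a fresh `sh -c`, much as a standalone program would do it.
cmd_and() {
    sh -c "$1" && sh -c "$2"
}

cmd_or() {
    sh -c "$1" || sh -c "$2"
}
```

With these, cmd_or false 'echo foo' prints foo, while cmd_and false 'echo foo' prints nothing and returns the failing status of false. Having to wrap each command in quotes is already a taste of the escaping burden that this approach brings.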

Implementing if is an obvious next step from and and or. Now we can implement while, although we'd have to be careful about how we handle the environment if we wanted to handle many typical uses of while.

The for loop almost already exists in this form, as xargs. We would probably want to provide both a sequential version, where the environment for each iteration depends on the previous, and a parallel version where everything can run at the same time.
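To make the xargs comparison concrete, here is a small sketch (the function names are invented for this example, and I am assuming an xargs that supports -P, as the GNU and BSD versions do):

```sh
# Sequential "for": one invocation of the given command per input word,
# in input order.
for_each_arg() {
    xargs -n 1 "$@"
}

# Parallel "for": up to four invocations at a time. -P is an assumption
# about the local xargs; output order is no longer guaranteed.
for_each_arg_parallel() {
    xargs -n 1 -P 4 "$@"
}
```

For example, printf '%s\n' a b c | for_each_arg echo item runs echo three times, once per word. Each iteration is a fresh process, so nothing one iteration does to its environment carries over to the next; a sequential version in the sense described above would need some extra mechanism for that.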

Note that most of these approaches require that you have mechanisms for escaping that aren't too cumbersome, for them to be practical. There seems to be a close parallel with macro facilities in languages like Lisp.

At the extreme end of cumbersome quoting would be case, which would probably want to take its input from a heredoc.

I was originally going to write a proof of concept of this (called "builtouts"), but researching this led me to the intriguing execline "shell", which has already done this, and explored this space rather nicely.

One thing that execline doesn't seem to do is implement something resembling real job control. If bg executes a command without waiting and then re-executes the shell with a suitable variable set (to the PGID of this job), the shell on each execution can check this variable to see what jobs are still alive; the jobs command can print the contents of this variable; the fg command just becomes tcsetpgrp and wait with the PGID of the current job. For an interactive shell, the tricky thing is probably making sure that bg's children don't end up in an orphaned process group.

A lot of these programs end up having to deal with quoting. Is there a way to take this further and handle quoting in its own program? For fixed-arity programs (like if), we can imagine an unquote helper that calls a subsidiary program with, first, the fixed remaining arguments, and then all of the original quoted argument, expanded, as the remaining arguments.

As glob(7) notes:

Long ago, in UNIX V6, there was a program /etc/glob that would expand wildcard patterns. Soon afterward this became a shell built-in.

Luckily, the source is available in Diomidis Spinellis's unix-history-repo, and we can see that it does this same kind of chain loading, executing its first argument with the rest of its arguments expanded according to the globbing rules.
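The same chain-loading shape can be sketched in a few lines of modern shell. This is not the V6 code: glob_run is a hypothetical name, and it borrows the calling shell's own pathname expansion rather than implementing matching itself, so a pattern that matches nothing is simply passed through unchanged.

```sh
# Hypothetical chain-loading glob: expand every argument after the first
# as a pattern, then exec the first argument with the results.
# The body runs in a subshell so IFS and exec do not affect the caller.
glob_run() (
    cmd=$1
    shift
    IFS=''           # empty IFS: unquoted expansions are not word-split
    count=$#         # number of original (unexpanded) patterns
    for pattern in "$@"; do
        # Unquoted on purpose: $pattern undergoes pathname expansion.
        set -- "$@" $pattern
    done
    shift "$count"   # drop the originals, keep only the expansions
    exec "$cmd" "$@"
)
```

In a directory containing a.txt and z.log, glob_run echo '*.txt' ends up executing echo a.txt, even though the caller quoted the pattern.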

I especially enjoy the extremely primitive path search and shell script support.

Objet trouvé engineering

(Found object engineering, often called cargo cult programming.)

Now we get to the inflammatory bits, for those who kept reading.

Stack Overflow modernized, but did not create, the practice of assembling Frankenstein programs from poorly understood and imitated examples. I think no language has been more greatly affected by this than shell, as evidenced by the bizarre ready-made shell scripts one can encounter almost everywhere. Sometimes, the evolution of these patterns reminds me of semantic drift in languages.

A lot of constructs are poorly understood and misused. I'm not blaming people, though; part of the problem is that I can't easily point to a single, modern reference work that someone should read before writing shell scripts. And, since shell scripts often feel like "configuration" rather than "programming", I imagine people don't even think about learning shell as a programming language.

Writing a shell helps disabuse people of some common confusions, for example that:

  • bash is shell scripting, definitively, and if you write a shell script, it is a "bash script" ("all the world's a VAX" and all that);
  • quotes make something into a string;
  • double and single quotes are interchangeable;
  • the argument to if, while, et cetera is something magical;
  • writing export FOO=x repeatedly does something;
  • variable case has some magic properties;
  • et cetera.
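A few of these can be checked directly at a prompt. The following is only a sketch; the function names and DEMO_VAR are made up for the example.

```sh
count_args() { echo "$#"; }

quoting_demo() (
    name='two  words'
    count_args $name       # unquoted: split into two arguments
    count_args "$name"     # double quotes: one argument, spaces intact
    echo '$name'           # single quotes: no expansion at all
    # "if" just runs a command and looks at its exit status.
    if count_args >/dev/null; then echo 'count_args succeeded'; fi
)

export_demo() (
    unset DEMO_VAR
    DEMO_VAR=x
    sh -c 'echo "${DEMO_VAR-unset}"'   # not exported: the child can't see it
    export DEMO_VAR
    sh -c 'echo "${DEMO_VAR-unset}"'   # exported: the child sees x
    DEMO_VAR=y                         # still exported, no second export needed
    sh -c 'echo "${DEMO_VAR-unset}"'   # the child sees y
)
```

quoting_demo prints 2, 1, the literal text $name, and then "count_args succeeded"; export_demo prints unset, x, and y.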

(Don't forget to use shellcheck and checkbashisms everywhere!)

Why write a shell

In the workshop, I cite the following motivations for writing a shell:

  • to give you a better understanding of how Unix processes work;
    • this will make you better at designing and understanding software that runs on Unix;
  • to clarify some common misunderstandings of POSIX shells;
    • this will make you more effective at using and scripting ubiquitous shells like bash;
  • to help you build a working implementation of a shell you can be excited about working on.
    • there are endless personal customizations you can make to your own shell, and making them can help you think about how you interact with your computer and how it might be different.

I've already touched on the first two, but the third is maybe less obvious. The shell remains a ubiquitous interface, decades after we imagined other modes of interaction would replace it. The field is ripe for improvement.

There are a lot of people exploring this space in interesting ways, but I think there's room for so much more.

A lot of existing tutorials focus on the non-interactive case, and I think people will have more fun if they build a shell they can use interactively.

Aside from the interactive case, a lot of infrastructure is held together with shell scripts.

There's a commonly held belief that scripting languages like Perl, Ruby, and Python are complete replacements for shell scripting. My own experience is that these languages lack the expressive tools of the shell for working with pipelines, exit statuses, redirections, and so on, and the replacement code is often:

  • sequential rather than parallel, and often much slower for this reason;
  • full of deadlocks, race conditions, and signal handling issues;
  • much more verbose and less clear than an equally carefully written shell script.

So, I feel there's still room for new tools in this space, too.

Conclusion

If you've been thinking about writing a shell for a while and haven't gotten around to it, why not try my workshop or any of the tutorials it links to?

Footnotes:

1

POSIX avoids dictating exactly what must be a builtin, but does specify that the following commands must be executed regardless of whether they are in the path:

alias bg cd command false fc fg getopts jobs kill newgrp pwd
read true umask unalias wait

Most of these have something to do with the shell's internal state, but not all.

2

This is a kind of chain loading, sometimes called Bernstein chaining. There's a lovely discussion of this in Andy Chu's Shell has a Forth-like quality. (The entire oil shell blog is full of great stuff.)

JS