<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Julian Squires</title>
  <link href="http://cipht.net/atom.xml" rel="self"/>
  <link href="http://cipht.net/"/>
  <updated>2024-01-01T18:37:34-0330</updated>
  <id>http://cipht.net/</id>
  <author>
    <name>Julian Squires</name>
    <email>julian@cipht.net</email>
  </author>
  <entry><title>Bug story: getaddrinfo(3) and PBR</title><link href='http://cipht.net/2024/01/01/getaddrinfo.html'/><updated>2024-01-01T03:30:00+0000</updated><id>http://cipht.net/2024/01/01/getaddrinfo</id><content type='html'>&lt;p&gt;
A while ago I was working on &lt;a href="https://en.wikipedia.org/wiki/Wireless_access_point"&gt;wireless access points&lt;/a&gt; (APs) based on
&lt;a href="https://openwrt.org/"&gt;OpenWrt&lt;/a&gt;.  One day I discovered that remote logging wasn't working, and
the debugging that followed had some surprises.
&lt;/p&gt;

&lt;p&gt;
On OpenWrt, there's a process called &lt;code&gt;logread&lt;/code&gt; responsible for
shipping the logs to another device via the &lt;a href="https://en.wikipedia.org/wiki/Syslog"&gt;syslog protocol&lt;/a&gt;.  These
APs don't persist their logs between boots, so sending logs to a
system that can store them was essential for diagnosing problems.  I
noticed &lt;code&gt;logread&lt;/code&gt; wasn't running, though it starts on boot, so I added
something to the init script to restart &lt;code&gt;logread&lt;/code&gt; if it crashed, and
was going to call it a day.  But I went to test it, and the logs
weren't showing up; sometimes, the logs would show up right after the
AP booted, but then at some point, it would stop working.
&lt;/p&gt;

&lt;p&gt;
I had already spent a lot of time on the other side of this, the
syslog server that receives the logs, and was pretty sure the setup was
correct there.  So I ran &lt;code&gt;logread&lt;/code&gt; by hand, and it failed with
&lt;/p&gt;

&lt;pre class="example" id="org60ccdb6"&gt;
failed to connect: Permission denied
&lt;/pre&gt;

&lt;p&gt;
What?  Permission denied?  I read &lt;a href="https://git.openwrt.org/?p=project/ubox.git;a=blob;f=log/logread.c;h=f48dd4bb6ae0ad436702b09dfee4371a004e2217;hb=HEAD"&gt;the code&lt;/a&gt; to find out where this was
happening, and it was in &lt;code&gt;usock()&lt;/code&gt;, which is some socket code that's
used all over OpenWrt, and there were no obvious calls that could fail
with &lt;code&gt;EACCES&lt;/code&gt; in it.
&lt;/p&gt;

&lt;p&gt;
After checking some ACLs, making sure this couldn't possibly be a
permission problem (it's running as root), I decided to &lt;code&gt;strace&lt;/code&gt;
&lt;code&gt;logread&lt;/code&gt; (this required rebuilding the flash image for the AP, which
is why I didn't do it earlier), and saw:
&lt;/p&gt;

&lt;pre class="example" id="org8a81a44"&gt;
socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC, IPPROTO_UDP) = 8
connect(8, {sa_family=AF_INET, sin_port=htons(65535), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
close(8)                                = 0
socket(AF_INET6, SOCK_DGRAM|SOCK_CLOEXEC, IPPROTO_UDP) = 8
connect(8, {sa_family=AF_INET6, sin6_port=htons(65535), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::1", &amp;amp;sin6_addr), sin6_scope_id=0}, 28) = -1 EACCES (Permission denied)
close(8)                                = 0
&lt;/pre&gt;

&lt;p&gt;
What the heck?  First off, the connection &lt;code&gt;logread&lt;/code&gt; is trying to make
in this case is a TCP connection, and we're giving it an IP address;
why is it making UDP connections to &lt;code&gt;localhost&lt;/code&gt;?  And why are those
connections failing?
&lt;/p&gt;

&lt;p&gt;
I had a guess on why this started happening &amp;#x2013; a little while before,
IPv6 had been disabled on these devices.  Maybe it hadn't been done
thoroughly enough?  I checked &lt;code&gt;ip addr&lt;/code&gt;, and &lt;code&gt;lo&lt;/code&gt; definitely did not
have &lt;code&gt;::1&lt;/code&gt; as an address, and IPv6 was disabled through the
&lt;a href="https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt"&gt;&lt;code&gt;disable_ipv6&lt;/code&gt; &lt;code&gt;sysctl&lt;/code&gt;&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;
I decided that it was probably &lt;a href="https://git.openwrt.org/?p=project/libubox.git;a=blob;f=usock.c;h=15b6988e3c6d8269eb6fde2c085d3eccc6349c22;hb=HEAD#l157"&gt;a call to&lt;/a&gt; &lt;a href="https://man7.org/linux/man-pages/man3/getaddrinfo.3.html"&gt;&lt;code&gt;getaddrinfo()&lt;/code&gt;&lt;/a&gt; making UDP
connections &amp;#x2013; maybe it's trying to resolve DNS &amp;#x2013; but why port 65535?
Is that just an ephemeral port it's choosing every single time?
&lt;/p&gt;

&lt;p&gt;
I tested &lt;code&gt;getaddrinfo&lt;/code&gt; from Lua (the only interpreted language on the
device), but it worked fine, so there had to be something about how
&lt;code&gt;usock&lt;/code&gt; was calling it; did it want IPv6 addresses specifically or
something?
&lt;/p&gt;

&lt;p&gt;
&lt;a href="https://musl.libc.org/"&gt;musl&lt;/a&gt; is the libc of choice on these devices.  Checking its
&lt;a href="https://git.musl-libc.org/cgit/musl/tree/src/network/getaddrinfo.c?id=dc9285ad1dc19349c407072cc48ba70dab86de45#n44"&gt;implementation&lt;/a&gt; of &lt;code&gt;getaddrinfo&lt;/code&gt;, we see this block of code near the
top:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;&lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;flags &amp;amp; AI_ADDRCONFIG&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;
    &lt;span style="color: #7c878a;"&gt;/* &lt;/span&gt;&lt;span style="color: #7c878a;"&gt;Define the "an address is configured" condition for address&lt;/span&gt;
&lt;span style="color: #7c878a;"&gt;     * families via ability to create a socket for the family plus&lt;/span&gt;
&lt;span style="color: #7c878a;"&gt;     * routability of the loopback address for the family.&lt;/span&gt;&lt;span style="color: #7c878a;"&gt; */&lt;/span&gt;
    &lt;span style="color: #7c878a;"&gt;// &lt;/span&gt;&lt;span style="color: #7c878a;"&gt;&amp;#8230;&lt;/span&gt;
    &lt;span style="color: #13665F;"&gt;static&lt;/span&gt; &lt;span style="color: #13665F;"&gt;const&lt;/span&gt; &lt;span style="color: #13665F;"&gt;struct&lt;/span&gt; &lt;span style="color: #E36B3F;"&gt;sockaddr_in6&lt;/span&gt; &lt;span style="color: #845A84;"&gt;lo6&lt;/span&gt; = &lt;span style="color: #a9779c;"&gt;{&lt;/span&gt;
        .sin6_family = AF_INET6, .sin6_port = 65535,
        .sin6_addr = IN6ADDR_LOOPBACK_INIT
    &lt;span style="color: #a9779c;"&gt;}&lt;/span&gt;;
    &lt;span style="color: #13665F;"&gt;const&lt;/span&gt; &lt;span style="color: #E36B3F;"&gt;void&lt;/span&gt; *&lt;span style="color: #845A84;"&gt;ta&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;[&lt;/span&gt;2&lt;span style="color: #a9779c;"&gt;]&lt;/span&gt; = &lt;span style="color: #a9779c;"&gt;{&lt;/span&gt; &amp;amp;lo4, &amp;amp;lo6 &lt;span style="color: #a9779c;"&gt;}&lt;/span&gt;;
    &lt;span style="color: #7c878a;"&gt;// &lt;/span&gt;&lt;span style="color: #7c878a;"&gt;&amp;#8230;&lt;/span&gt;
    &lt;span style="color: #13665F;"&gt;for&lt;/span&gt; &lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;i=0; i&amp;lt;2; i++&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt; &lt;span style="color: #a9779c;"&gt;{&lt;/span&gt;
      &lt;span style="color: #7c878a;"&gt;// &lt;/span&gt;&lt;span style="color: #7c878a;"&gt;&amp;#8230;&lt;/span&gt;
      &lt;span style="color: #E36B3F;"&gt;int&lt;/span&gt; &lt;span style="color: #845A84;"&gt;s&lt;/span&gt; = socket&lt;span style="color: #4C7A90;"&gt;(&lt;/span&gt;tf&lt;span style="color: #787096;"&gt;[&lt;/span&gt;i&lt;span style="color: #787096;"&gt;]&lt;/span&gt;, SOCK_CLOEXEC|SOCK_DGRAM,
                     IPPROTO_UDP&lt;span style="color: #4C7A90;"&gt;)&lt;/span&gt;;
      &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #4C7A90;"&gt;(&lt;/span&gt;s&amp;gt;=0&lt;span style="color: #4C7A90;"&gt;)&lt;/span&gt; &lt;span style="color: #4C7A90;"&gt;{&lt;/span&gt;
        &lt;span style="color: #E36B3F;"&gt;int&lt;/span&gt; &lt;span style="color: #845A84;"&gt;cs&lt;/span&gt;;
        pthread_setcancelstate&lt;span style="color: #787096;"&gt;(&lt;/span&gt;
                               PTHREAD_CANCEL_DISABLE, &amp;amp;cs&lt;span style="color: #787096;"&gt;)&lt;/span&gt;;
        &lt;span style="color: #E36B3F;"&gt;int&lt;/span&gt; &lt;span style="color: #845A84;"&gt;r&lt;/span&gt; = connect&lt;span style="color: #787096;"&gt;(&lt;/span&gt;s, ta&lt;span style="color: #4d9391;"&gt;[&lt;/span&gt;i&lt;span style="color: #4d9391;"&gt;]&lt;/span&gt;, tl&lt;span style="color: #4d9391;"&gt;[&lt;/span&gt;i&lt;span style="color: #4d9391;"&gt;]&lt;/span&gt;&lt;span style="color: #787096;"&gt;)&lt;/span&gt;;
        pthread_setcancelstate&lt;span style="color: #787096;"&gt;(&lt;/span&gt;cs, 0&lt;span style="color: #787096;"&gt;)&lt;/span&gt;;
        close&lt;span style="color: #787096;"&gt;(&lt;/span&gt;s&lt;span style="color: #787096;"&gt;)&lt;/span&gt;;
        &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #787096;"&gt;(&lt;/span&gt;&lt;span style="color: #ef6787;"&gt;!&lt;/span&gt;r&lt;span style="color: #787096;"&gt;)&lt;/span&gt; &lt;span style="color: #13665F;"&gt;continue&lt;/span&gt;;
      &lt;span style="color: #4C7A90;"&gt;}&lt;/span&gt;
      &lt;span style="color: #13665F;"&gt;switch&lt;/span&gt; &lt;span style="color: #4C7A90;"&gt;(&lt;/span&gt;errno&lt;span style="color: #4C7A90;"&gt;)&lt;/span&gt; &lt;span style="color: #4C7A90;"&gt;{&lt;/span&gt;
      &lt;span style="color: #13665F;"&gt;case&lt;/span&gt; EADDRNOTAVAIL:
      &lt;span style="color: #13665F;"&gt;case&lt;/span&gt; EAFNOSUPPORT:
      &lt;span style="color: #13665F;"&gt;case&lt;/span&gt; EHOSTUNREACH:
      &lt;span style="color: #13665F;"&gt;case&lt;/span&gt; ENETDOWN:
      &lt;span style="color: #13665F;"&gt;case&lt;/span&gt; ENETUNREACH:
        &lt;span style="color: #13665F;"&gt;break&lt;/span&gt;;
      &lt;span style="color: #13665F;"&gt;default&lt;/span&gt;:
        &lt;span style="color: #13665F;"&gt;return&lt;/span&gt; EAI_SYSTEM;
      &lt;span style="color: #4C7A90;"&gt;}&lt;/span&gt;
      &lt;span style="color: #7c878a;"&gt;// &lt;/span&gt;&lt;span style="color: #7c878a;"&gt;&amp;#8230;&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
And sure enough, &lt;code&gt;usock&lt;/code&gt; always sets &lt;code&gt;AI_ADDRCONFIG&lt;/code&gt; on flags.  So
this is a kind of probing connect musl is using to check the validity
of IPv4 or IPv6 on the system.  The &lt;code&gt;connect&lt;/code&gt; is returning &lt;code&gt;EACCES&lt;/code&gt;,
but musl isn't handling it as part of the errors it considers
"normal".  It bails out early, and leaves &lt;code&gt;errno&lt;/code&gt; set to &lt;code&gt;EACCES&lt;/code&gt;
where &lt;code&gt;logread&lt;/code&gt; prints it out to mystify us.
&lt;/p&gt;
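
&lt;p&gt;
The probe is easy to try in isolation.  The following is a minimal
sketch of the same kind of check, not musl's code: the helper name
&lt;code&gt;probe_family&lt;/code&gt; is made up for illustration.  It opens a UDP socket
for a family and connects it to that family's loopback address on port
65535.  Connecting a UDP socket sends no packets, so this only asks
the kernel whether it is willing to route to that address.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;#include &amp;lt;arpa/inet.h&amp;gt;
#include &amp;lt;errno.h&amp;gt;
#include &amp;lt;netinet/in.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;string.h&amp;gt;
#include &amp;lt;sys/socket.h&amp;gt;
#include &amp;lt;unistd.h&amp;gt;

/* Hypothetical helper: returns 0 if a UDP socket for `family` can be
 * connected to that family's loopback address, otherwise the errno
 * reported by socket() or connect(). */
static int probe_family(int family)
{
    struct sockaddr_in lo4 = {
        .sin_family = AF_INET, .sin_port = htons(65535),
        .sin_addr.s_addr = htonl(INADDR_LOOPBACK)
    };
    struct sockaddr_in6 lo6 = {
        .sin6_family = AF_INET6, .sin6_port = htons(65535),
        .sin6_addr = IN6ADDR_LOOPBACK_INIT
    };
    const struct sockaddr *addr = (const struct sockaddr *)&amp;amp;lo4;
    socklen_t len = sizeof lo4;

    if (family == AF_INET6) {
        addr = (const struct sockaddr *)&amp;amp;lo6;
        len = sizeof lo6;
    }

    int s = socket(family, SOCK_DGRAM, IPPROTO_UDP);
    if (s &amp;lt; 0)
        return errno;
    /* Save errno before close() has a chance to overwrite it. */
    int err = connect(s, addr, len) ? errno : 0;
    close(s);
    return err;
}

int main(void)
{
    int e4 = probe_family(AF_INET);
    int e6 = probe_family(AF_INET6);
    printf("IPv4: %s\n", e4 ? strerror(e4) : "ok");
    printf("IPv6: %s\n", e6 ? strerror(e6) : "ok");
    return 0;
}
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
On an affected device, the IPv6 line would be expected to report
"Permission denied", matching the &lt;code&gt;strace&lt;/code&gt; output above.
&lt;/p&gt;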

&lt;p&gt;
But why would &lt;code&gt;connect&lt;/code&gt; fail with &lt;code&gt;EACCES&lt;/code&gt;?  The man page doesn't list
anything that makes sense for this.&lt;sup&gt;&lt;a id="fnr.1" class="footref" href="#fn.1" role="doc-backlink"&gt;1&lt;/a&gt;&lt;/sup&gt;  Weirder still, I
decide to check if there are any IPv6 addresses at all &amp;#x2013; and there is
one, but for &lt;code&gt;eth0&lt;/code&gt;, not &lt;code&gt;lo&lt;/code&gt;.  I delete it, and suddenly &lt;code&gt;logread&lt;/code&gt;
works.
&lt;/p&gt;

&lt;p&gt;
At this point I start looking for information about musl's
&lt;code&gt;getaddrinfo&lt;/code&gt; and this issue, and &lt;a href="https://www.openwall.com/lists/musl/2019/12/05/1"&gt;find a patch&lt;/a&gt; posted to the mailing
list with no replies, never applied.
&lt;/p&gt;

&lt;p&gt;
Sweet!  I head over to &lt;a href="https://musl.libc.org/support.html"&gt;&lt;code&gt;#musl&lt;/code&gt;&lt;/a&gt; on IRC and ask them if it wasn't
applied for a reason, and they say it must have been overlooked.  But
then someone tries to reproduce with the instructions in the patch,
and can't.
&lt;/p&gt;

&lt;p&gt;
I dive into the kernel source trying to figure out what actually
returns &lt;code&gt;EACCES&lt;/code&gt; here.  There is a &lt;span class="underline"&gt;lot&lt;/span&gt; of code under
&lt;code&gt;ip6_datagram_connect&lt;/code&gt; so I tried to grep and pray, but there were
still too many possibilities to know for sure.  This is an opportunity
to use &lt;a href="https://www.kernel.org/doc/html/v5.0/trace/ftrace.html"&gt;&lt;code&gt;ftrace&lt;/code&gt;&lt;/a&gt;!  I had to rebuild the kernel, since these are
stripped down images for embedded devices, and I was worried
&lt;code&gt;trace-cmd&lt;/code&gt; might actually crash the device, but I got a capture fine.
I could see clearly that the last useful function called under
&lt;code&gt;ip6_datagram_connect&lt;/code&gt; was &lt;code&gt;fib6_rule_action&lt;/code&gt;, which can return
&lt;code&gt;EACCES&lt;/code&gt;, but why?  What even are these rules?
&lt;/p&gt;

&lt;p&gt;
I spent a while just trying to figure out what these rules are and how
to manipulate them.  It turns out they're for "&lt;a href="https://en.wikipedia.org/wiki/Policy-based_routing"&gt;policy-based routing&lt;/a&gt;"
(PBR), which I hadn't really explored before.  I didn't even realize
some of these firewall-like policies could be handled at this level.
&lt;/p&gt;

&lt;p&gt;
I was running the &lt;code&gt;ip rule&lt;/code&gt; command and not seeing anything
interesting, until I finally read the source for &lt;code&gt;ip&lt;/code&gt; and &lt;a href="https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/tree/ip/iprule.c#n552"&gt;noticed it
defaults to IPv4&lt;/a&gt; &amp;#x2013; I needed to run &lt;code&gt;ip -6 rule&lt;/code&gt;, but that flag is
in the &lt;code&gt;ip(8)&lt;/code&gt; manpage, not the &lt;code&gt;ip-rule(8)&lt;/code&gt; manpage for the
subcommand.  But running it on the AP, I saw:
&lt;/p&gt;

&lt;pre class="example" id="orgd181ca7"&gt;
# ip -6 rule
0:      from all lookup local
32766:  from all lookup main
4200000001:     from all iif lo lookup unspec 12
4200000002:     from all iif eth0 lookup unspec 12
4200000003:     from all iif eth1 lookup unspec 12
&lt;/pre&gt;

&lt;p&gt;
I'm not sure I fully understand these rules now, and it took a bit of
looking (and &lt;code&gt;strace&lt;/code&gt; to confirm the netlink message being sent) to
see that this is action "12", which isn't one of the actions in the
(mainline) kernel.  But it was enough info to demonstrate that the
issue could be reproduced on any Linux system with
&lt;/p&gt;

&lt;pre class="example" id="org27bebd9"&gt;
ip -6 rule add from all iif lo lookup unspec prohibit
&lt;/pre&gt;
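
&lt;p&gt;
With a rule like that in place, a small test program along the
following lines should show the failure on a musl-based system.  This
is a sketch under assumptions, not the program used at the time: the
helper name &lt;code&gt;resolve_with_addrconfig&lt;/code&gt;, the documentation address
192.0.2.1, and port 514 are all placeholders.  Other libcs may
implement &lt;code&gt;AI_ADDRCONFIG&lt;/code&gt; differently, in which case the call may
simply succeed.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;#define _POSIX_C_SOURCE 200809L
#include &amp;lt;errno.h&amp;gt;
#include &amp;lt;netdb.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;string.h&amp;gt;
#include &amp;lt;sys/socket.h&amp;gt;

/* Hypothetical helper: resolve a numeric host with AI_ADDRCONFIG set
 * and return getaddrinfo()'s result code (0 on success). */
static int resolve_with_addrconfig(const char *host, const char *port)
{
    struct addrinfo hints = {
        .ai_family = AF_UNSPEC,
        .ai_socktype = SOCK_STREAM,
        .ai_flags = AI_ADDRCONFIG,
    };
    struct addrinfo *res = NULL;

    int rc = getaddrinfo(host, port, &amp;amp;hints, &amp;amp;res);
    if (rc == 0)
        freeaddrinfo(res);
    return rc;
}

int main(void)
{
    errno = 0;
    int rc = resolve_with_addrconfig("192.0.2.1", "514");
    if (rc == EAI_SYSTEM)
        /* EAI_SYSTEM means the real error is in errno. */
        printf("EAI_SYSTEM: %s\n", strerror(errno));
    else if (rc != 0)
        printf("getaddrinfo: %s\n", gai_strerror(rc));
    else
        printf("resolved\n");
    return 0;
}
&lt;/pre&gt;
&lt;/div&gt;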

&lt;p&gt;
Some discussion on the musl mailing list revealed that action 12 is a
special rule &lt;a href="https://git.openwrt.org/?p=openwrt/svn-archive/archive.git;a=blob;f=target/linux/generic/patches-3.19/670-ipv6-allow-rejecting-with-source-address-failed-policy.patch;h=f32458df30ad466d4e3ac8224cbec1bd074b43ec;hb=35d90ba52069c96afd1a74600b91499e5feed0e0#l42"&gt;OpenWrt adds in their kernel&lt;/a&gt;.  I discovered that
&lt;code&gt;netifd&lt;/code&gt;, which manages interfaces and rules on OpenWrt, was setting
IPv6 policies like this, even when IPv6 was disabled, so &lt;a href="https://patchwork.ozlabs.org/project/openwrt/patch/20210430143037.6763-1-julian@cipht.net/"&gt;I patched
that out&lt;/a&gt;.  And finally remote logging worked again.
&lt;/p&gt;

&lt;p&gt;
This was a surprising set of interactions.  Figuring it out was
tractable thanks to having all the source for everything, and
reasonable tools for introspection.  Is there a moral to this story?
Perhaps a few tidbits: strace and ftrace are good; getaddrinfo is bad;
maybe don't disable IPv6; and &lt;a href="https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/man2/connect.2?id=375c65a9c2f5fef9796672078769104074530ec1"&gt;blessed are those who update manpages&lt;/a&gt;.
&lt;/p&gt;
&lt;div id="footnotes"&gt;
&lt;h2 class="footnotes"&gt;Footnotes: &lt;/h2&gt;
&lt;div id="text-footnotes"&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.1" class="footnum" href="#fnr.1" role="doc-backlink"&gt;1&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
At the time this happened, the man pages on my system
didn't list this, but looking now, I see &lt;a href="https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/man2/connect.2?id=375c65a9c2f5fef9796672078769104074530ec1"&gt;Stefan Puiu added a note
about this&lt;/a&gt;, debugging much the same situation as mine.  What a time
saver this would have been.
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;


&lt;/div&gt;
&lt;/div&gt;</content></entry><entry><title>the perils of pause(2)</title><link href='http://cipht.net/2023/11/30/perils-of-pause.html'/><updated>2023-11-30T03:30:00+0000</updated><id>http://cipht.net/2023/11/30/perils-of-pause</id><content type='html'>&lt;p&gt;
I recently had a bug in a simple program that has a form I've seen a
lot in the last few years: loops and signal handling without masking.
The worst thing about these kinds of bugs is that they don't rear
their heads immediately &amp;#x2013; they fall into the class of "huh, it's
blocked in a syscall and I'm sure it should have woken up" bugs.
Let's look at the problem and then how to lint it.
&lt;/p&gt;

&lt;div id="outline-container-org8a9512d" class="outline-2"&gt;
&lt;h2 id="org8a9512d"&gt;&lt;span class="section-number-2"&gt;1.&lt;/span&gt; A common mistake&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-1"&gt;
&lt;p&gt;
I had some tooling for a test suite that would wait for a specified
signal, and then print the name of that signal on stdout.  I did this
by setting up a signal handler, and then calling &lt;code&gt;pause()&lt;/code&gt;, which
suspends the program until a signal is delivered (i.e., it only ever
returns -1, with &lt;code&gt;errno&lt;/code&gt; set to &lt;code&gt;EINTR&lt;/code&gt;).  The program indicates to its cooperating programs
that it's ready for the signal by printing "ok".
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;&lt;span style="color: #13665F;"&gt;static&lt;/span&gt; &lt;span style="color: #E36B3F;"&gt;sig_atomic_t&lt;/span&gt; &lt;span style="color: #845A84;"&gt;got&lt;/span&gt;;
&lt;span style="color: #13665F;"&gt;static&lt;/span&gt; &lt;span style="color: #E36B3F;"&gt;void&lt;/span&gt; &lt;span style="color: #4C7A90;"&gt;h&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;&lt;span style="color: #E36B3F;"&gt;int&lt;/span&gt; &lt;span style="color: #845A84;"&gt;n&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt; got = n; &lt;span style="color: #4d9391;"&gt;}&lt;/span&gt;
&lt;span style="color: #7c878a;"&gt;//&lt;/span&gt;&lt;span style="color: #7c878a;"&gt;...&lt;/span&gt;
    &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;sigaction&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;sig, &amp;amp;&lt;span style="color: #4C7A90;"&gt;(&lt;/span&gt;&lt;span style="color: #13665F;"&gt;struct&lt;/span&gt; &lt;span style="color: #E36B3F;"&gt;sigaction&lt;/span&gt;&lt;span style="color: #4C7A90;"&gt;){&lt;/span&gt;.sa_handler=h&lt;span style="color: #4C7A90;"&gt;}&lt;/span&gt;, &lt;span style="color: #845A84;"&gt;NULL&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;
        &lt;span style="color: #4C7A90;"&gt;abort&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;()&lt;/span&gt;;
    write&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;1, &lt;span style="color: #39854C;"&gt;"ok\n"&lt;/span&gt;, 3&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;;
    &lt;span style="color: #13665F;"&gt;do&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt; pause&lt;span style="color: #a9779c;"&gt;()&lt;/span&gt;; &lt;span style="color: #4d9391;"&gt;}&lt;/span&gt; &lt;span style="color: #13665F;"&gt;while&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;sig != got&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Every so often, this program would just hang and the test would time
out.  Worse yet, it's rare enough that I didn't really notice it when
I wrote the code.&lt;sup&gt;&lt;a id="fnr.1" class="footref" href="#fn.1" role="doc-backlink"&gt;1&lt;/a&gt;&lt;/sup&gt; (Why loop when we only expect one signal?
There are other signals that will interrupt &lt;code&gt;pause&lt;/code&gt; unless you've gone
out of your way to ignore them all; for example, &lt;code&gt;SIGTSTP&lt;/code&gt;.)
&lt;/p&gt;

&lt;p&gt;
One possible problematic execution is this:
&lt;/p&gt;
&lt;ul class="org-ul"&gt;
&lt;li&gt;we print ok;&lt;/li&gt;
&lt;li&gt;before we get to &lt;code&gt;pause&lt;/code&gt;, the other program sends the signal;&lt;/li&gt;
&lt;li&gt;now &lt;code&gt;sig == got&lt;/code&gt;, but we &lt;code&gt;pause&lt;/code&gt; anyway, and wait for another
signal that will never come.&lt;/li&gt;
&lt;/ul&gt;
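
&lt;p&gt;
This interleaving can be forced deterministically.  In the sketch
below, the signal is raised before &lt;code&gt;pause&lt;/code&gt; is reached, and an
&lt;code&gt;alarm&lt;/code&gt; acts as a safety net so the demonstration terminates.  The
choice of &lt;code&gt;SIGUSR1&lt;/code&gt;, the one-second alarm, and the helper name are
arbitrary and only for illustration.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;#define _POSIX_C_SOURCE 200809L
#include &amp;lt;signal.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;time.h&amp;gt;
#include &amp;lt;unistd.h&amp;gt;

static volatile sig_atomic_t got;
static void h(int n) { got = n; }

/* Hypothetical demo: handle SIGUSR1 *before* calling pause(), then
 * return how many milliseconds pause() stayed blocked anyway.
 * Returns -1 on setup failure. */
static long ms_blocked_after_early_signal(void)
{
    struct sigaction sa = { .sa_handler = h };
    struct timespec t0, t1;

    if (sigaction(SIGUSR1, &amp;amp;sa, NULL) || sigaction(SIGALRM, &amp;amp;sa, NULL))
        return -1;

    raise(SIGUSR1);     /* the handler runs before raise() returns */
    if (got != SIGUSR1)
        return -1;

    alarm(1);           /* safety net: SIGALRM will wake pause() */
    clock_gettime(CLOCK_MONOTONIC, &amp;amp;t0);
    pause();            /* the signal we wanted has already come and gone */
    clock_gettime(CLOCK_MONOTONIC, &amp;amp;t1);

    return (t1.tv_sec - t0.tv_sec) * 1000
         + (t1.tv_nsec - t0.tv_nsec) / 1000000;
}

int main(void)
{
    long ms = ms_blocked_after_early_signal();
    printf("pause() stayed blocked for %ld ms\n", ms);
    return ms &amp;lt; 0;
}
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Without the alarm, the program would stay in &lt;code&gt;pause&lt;/code&gt; until some
unrelated signal happened to arrive.
&lt;/p&gt;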

&lt;p&gt;
Another common execution with this pattern is this:
&lt;/p&gt;
&lt;ul class="org-ul"&gt;
&lt;li&gt;we &lt;code&gt;pause&lt;/code&gt;, and get interrupted by some other signal;&lt;/li&gt;
&lt;li&gt;we test &lt;code&gt;got&lt;/code&gt; against our desired &lt;code&gt;sig&lt;/code&gt; and see it hasn't
triggered;&lt;/li&gt;
&lt;li&gt;now our desired signal is delivered, and &lt;code&gt;sig == got&lt;/code&gt;, but we're
already past the test;&lt;/li&gt;
&lt;li&gt;we &lt;code&gt;pause&lt;/code&gt; again, and have to wait arbitrarily long (till some
other signal wakes us up).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;
This also happens often in loops with &lt;code&gt;poll&lt;/code&gt; or &lt;code&gt;select&lt;/code&gt;:
&lt;/p&gt;
&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;&lt;span style="color: #13665F;"&gt;for&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;;;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;
    &lt;span style="color: #7c878a;"&gt;// &lt;/span&gt;&lt;span style="color: #7c878a;"&gt;A) either there's fd activity, or we get EINTR&lt;/span&gt;
    &lt;span style="color: #E36B3F;"&gt;int&lt;/span&gt; &lt;span style="color: #845A84;"&gt;n_active&lt;/span&gt; = poll&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;fds, n_fds, INFTIM&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;;
    &lt;span style="color: #7c878a;"&gt;// &lt;/span&gt;&lt;span style="color: #7c878a;"&gt;[...] handle fds&lt;/span&gt;
    &lt;span style="color: #7c878a;"&gt;// &lt;/span&gt;&lt;span style="color: #7c878a;"&gt;B) check variables set by signal handler&lt;/span&gt;
    ...
&lt;span style="color: #4d9391;"&gt;}&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
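We expect &lt;code&gt;poll&lt;/code&gt; to get interrupted if there's a signal; however, the
signal may arrive after the test at B but before we get back to A, and
then &lt;code&gt;poll&lt;/code&gt; blocks even though the variable has already been set.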
&lt;/p&gt;

&lt;p&gt;
The solution in all these cases is signal masks, and calls that
manipulate them atomically.  When a signal arrives while masked by the
process, it remains pending until the process unmasks it.
&lt;/p&gt;
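
&lt;p&gt;
A small sketch of that last point, assuming &lt;code&gt;SIGUSR1&lt;/code&gt; as the signal
of interest and a made-up helper name: the signal is raised while
blocked, shows up in &lt;code&gt;sigpending&lt;/code&gt;, and the handler only runs once the
signal is unblocked.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;#define _POSIX_C_SOURCE 200809L
#include &amp;lt;signal.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;

static volatile sig_atomic_t got;
static void h(int n) { got = n; }

/* Hypothetical demo: returns 1 if SIGUSR1 stayed pending while it was
 * blocked and the handler ran only after unblocking, 0 if not, and -1
 * on setup failure. */
static int pending_until_unmasked(void)
{
    sigset_t set, pend;
    struct sigaction sa = { .sa_handler = h };

    got = 0;
    if (sigaction(SIGUSR1, &amp;amp;sa, NULL))
        return -1;
    sigemptyset(&amp;amp;set);
    sigaddset(&amp;amp;set, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &amp;amp;set, NULL))
        return -1;

    raise(SIGUSR1);                 /* blocked, so the handler must wait */
    if (sigpending(&amp;amp;pend))
        return -1;
    int held = 0;
    if (sigismember(&amp;amp;pend, SIGUSR1) == 1)
        held = (got == 0);          /* pending, and not yet handled */

    /* Unblocking delivers the pending signal before this call returns. */
    sigprocmask(SIG_UNBLOCK, &amp;amp;set, NULL);
    if (!held)
        return 0;
    return got == SIGUSR1;
}

int main(void)
{
    int ok = pending_until_unmasked();
    printf("stayed pending until unmasked: %s\n", ok == 1 ? "yes" : "no");
    return ok != 1;
}
&lt;/pre&gt;
&lt;/div&gt;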
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-orgfbb2677" class="outline-2"&gt;
&lt;h2 id="orgfbb2677"&gt;&lt;span class="section-number-2"&gt;2.&lt;/span&gt; Masking versus disposition&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-2"&gt;
&lt;p&gt;
Something that's always confusing about this is that masking a signal
does not affect its disposition.  "Signal disposition" is the action
associated with a signal, that is, what should happen when the signal
is delivered to the process: either a handler is called, the signal is
ignored, or a default action takes place.
&lt;/p&gt;

&lt;p&gt;
Knowing that, you might set a mask for some signal whose disposition
is &lt;code&gt;SIG_DFL&lt;/code&gt;, and see that it works fine, and then be confused when
this doesn't work for signals whose disposition is &lt;code&gt;SIG_IGN&lt;/code&gt;.  POSIX
says:
&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;
If the action associated with a blocked signal is anything other than
to ignore the signal, and if that signal is generated for the thread,
the signal shall remain pending until it is unblocked, it is accepted
when it is selected and returned by a call to the sigwait() function,
or the action associated with it is set to ignore the signal.
 — &lt;a href="https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_01"&gt;POSIX.1-2017 System Interfaces 2.4.1 Signal generation and delivery&lt;/a&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;
I noticed that OpenBSD has the slightly strange behavior of treating
signals whose default disposition is to stop or continue the program
as if they have the ignored disposition, so these signals need an
explicit handler.  This is probably a bug?  Although it means we could
have avoided looping in our initial example, so one could argue it's
the better behavior.
&lt;/p&gt;

&lt;p&gt;
Also, signal disposition is per-process, while signal masks are
per-thread &amp;#x2013; but let's not get into that mess here.  (An even bigger
mess is then what happens on &lt;code&gt;exec()&lt;/code&gt; &amp;#x2013; handlers are reset to the
default disposition while ignored signals stay ignored, but masks are
inherited, as are pending signals!)
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org65006cb" class="outline-2"&gt;
&lt;h2 id="org65006cb"&gt;&lt;span class="section-number-2"&gt;3.&lt;/span&gt; Alternatives that atomically unmask&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-3"&gt;
&lt;p&gt;
In the case of &lt;code&gt;pause(2)&lt;/code&gt;, we can use &lt;code&gt;sigsuspend(2)&lt;/code&gt;, or a few other
similar functions.  If we need the signal handler, our code might look
like this:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;&lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;sigaction&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;sig, &amp;amp;&lt;span style="color: #4C7A90;"&gt;(&lt;/span&gt;&lt;span style="color: #13665F;"&gt;struct&lt;/span&gt; &lt;span style="color: #E36B3F;"&gt;sigaction&lt;/span&gt;&lt;span style="color: #4C7A90;"&gt;){&lt;/span&gt;.sa_handler=h&lt;span style="color: #4C7A90;"&gt;}&lt;/span&gt;, &lt;span style="color: #845A84;"&gt;NULL&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;
    &lt;span style="color: #4C7A90;"&gt;abort&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;()&lt;/span&gt;;
&lt;span style="color: #E36B3F;"&gt;sigset_t&lt;/span&gt; &lt;span style="color: #845A84;"&gt;set&lt;/span&gt;, &lt;span style="color: #845A84;"&gt;prev&lt;/span&gt;;
sigemptyset&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;&amp;amp;set&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;;
sigaddset&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;&amp;amp;set, sig&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;;
&lt;span style="color: #4C7A90;"&gt;sigprocmask&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;SIG_BLOCK, &amp;amp;set, &amp;amp;prev&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;;
write&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;1, &lt;span style="color: #39854C;"&gt;"ok\n"&lt;/span&gt;, 3&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;;
&lt;span style="color: #13665F;"&gt;do&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt; sigsuspend&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;&amp;amp;prev&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;; &lt;span style="color: #4d9391;"&gt;}&lt;/span&gt; &lt;span style="color: #13665F;"&gt;while&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;sig != got&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
but in this case, we are just waiting for this signal, and don't need
to take other action when it arrives, so &lt;code&gt;sigwait(2)&lt;/code&gt;
suffices:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;&lt;span style="color: #E36B3F;"&gt;sigset_t&lt;/span&gt; &lt;span style="color: #845A84;"&gt;set&lt;/span&gt;, &lt;span style="color: #845A84;"&gt;prev&lt;/span&gt;;
sigemptyset&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;&amp;amp;set&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;;
sigaddset&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;&amp;amp;set, sig&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;;
&lt;span style="color: #4C7A90;"&gt;sigprocmask&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;SIG_BLOCK, &amp;amp;set, &amp;amp;prev&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;;
write&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;1, &lt;span style="color: #39854C;"&gt;"ok\n"&lt;/span&gt;, 3&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;;
&lt;span style="color: #E36B3F;"&gt;int&lt;/span&gt; &lt;span style="color: #845A84;"&gt;got&lt;/span&gt;;
&lt;span style="color: #13665F;"&gt;do&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt; sigwait&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;&amp;amp;set, &amp;amp;got&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;; &lt;span style="color: #4d9391;"&gt;}&lt;/span&gt; &lt;span style="color: #13665F;"&gt;while&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;sig != got&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
We might use &lt;code&gt;sigwaitinfo&lt;/code&gt; or &lt;code&gt;sigtimedwait&lt;/code&gt; if we need more details
than just the signal number.  If we were certain no other signal could
arrive, we could avoid the loop entirely, but it's nice to protect
against cases you might not otherwise consider, like testing the
program interactively and hitting ^Z (sending SIGTSTP).
&lt;/p&gt;
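
&lt;p&gt;
As a sketch of what those extra details look like, the following
blocks a signal, sends it to itself with &lt;code&gt;kill&lt;/code&gt;, and collects it with
&lt;code&gt;sigwaitinfo&lt;/code&gt;, which fills in a &lt;code&gt;siginfo_t&lt;/code&gt; that includes the
sender's PID.  The helper name and the use of &lt;code&gt;SIGUSR1&lt;/code&gt; are
assumptions for illustration, and this only builds on systems that
provide &lt;code&gt;sigwaitinfo&lt;/code&gt;.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;#define _POSIX_C_SOURCE 200809L
#include &amp;lt;signal.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;sys/types.h&amp;gt;
#include &amp;lt;unistd.h&amp;gt;

/* Hypothetical helper: block `sig`, send it to ourselves, then collect
 * it synchronously.  Returns the sender's PID as reported in the
 * siginfo_t, or -1 on failure. */
static pid_t wait_and_identify_sender(int sig)
{
    sigset_t set;
    siginfo_t info;

    sigemptyset(&amp;amp;set);
    sigaddset(&amp;amp;set, sig);
    if (sigprocmask(SIG_BLOCK, &amp;amp;set, NULL))
        return -1;
    if (kill(getpid(), sig))    /* stays pending because it is blocked */
        return -1;
    if (sigwaitinfo(&amp;amp;set, &amp;amp;info) != sig)
        return -1;
    return info.si_pid;         /* filled in for signals sent by kill() */
}

int main(void)
{
    pid_t sender = wait_and_identify_sender(SIGUSR1);
    printf("sender pid %ld, our pid %ld\n", (long)sender, (long)getpid());
    return sender != getpid();
}
&lt;/pre&gt;
&lt;/div&gt;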

&lt;p&gt;
(Note that OpenBSD also lacks &lt;code&gt;sigwaitinfo&lt;/code&gt; and &lt;code&gt;sigtimedwait&lt;/code&gt;
presently.)
&lt;/p&gt;

&lt;p&gt;
For &lt;code&gt;poll(2)&lt;/code&gt; and &lt;code&gt;select(2)&lt;/code&gt;, unmasking variants &lt;code&gt;ppoll(2)&lt;/code&gt; and
&lt;code&gt;pselect(2)&lt;/code&gt; exist for this reason.  (Linux also has &lt;code&gt;signalfd(2)&lt;/code&gt;,
which more naturally integrates with polling loops, but note it only
reads pending signals, so you still need to mask with &lt;code&gt;sigprocmask&lt;/code&gt;,
and now you have to deal with reading &lt;code&gt;siginfo&lt;/code&gt; out of a buffer.  Oh,
and what you actually get out of the fd depends on which process is
reading&amp;#x2026;)  There's also the classic &lt;a href="https://cr.yp.to/docs/selfpipe.html"&gt;self-pipe trick&lt;/a&gt;.
&lt;/p&gt;
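
&lt;p&gt;
For the polling loop shown earlier, a &lt;code&gt;pselect&lt;/code&gt;-based version might
look like the sketch below.  The signal stays blocked everywhere
except inside &lt;code&gt;pselect&lt;/code&gt;, so a delivery can no longer slip in between
checking the handler's variable and going back to sleep.  The function
name, the use of &lt;code&gt;raise&lt;/code&gt; as a stand-in for the other program, and the
empty fd sets are placeholders; a real loop would pass its own
descriptors.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;#define _POSIX_C_SOURCE 200809L
#include &amp;lt;errno.h&amp;gt;
#include &amp;lt;signal.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;sys/select.h&amp;gt;

static volatile sig_atomic_t got;
static void h(int n) { got = n; }

/* Hypothetical loop: wait until `sig` has been handled.  Returns the
 * signal number seen by the handler, or -1 on failure. */
static int wait_in_pselect(int sig)
{
    sigset_t set, prev, waitmask;
    struct sigaction sa = { .sa_handler = h };

    got = 0;
    if (sigaction(sig, &amp;amp;sa, NULL))
        return -1;
    sigemptyset(&amp;amp;set);
    sigaddset(&amp;amp;set, sig);
    if (sigprocmask(SIG_BLOCK, &amp;amp;set, &amp;amp;prev))
        return -1;
    waitmask = prev;
    sigdelset(&amp;amp;waitmask, sig);  /* make sure sig is deliverable in pselect */

    raise(sig);     /* stand-in for the other program; stays pending */

    while (got != sig) {
        /* sig can only be delivered while pselect() is waiting. */
        int n = pselect(0, NULL, NULL, NULL, NULL, &amp;amp;waitmask);
        if (n == -1)
            if (errno != EINTR)
                return -1;
        /* [...] handle ready fds here when n is positive */
    }
    sigprocmask(SIG_SETMASK, &amp;amp;prev, NULL);
    return got;
}

int main(void)
{
    int seen = wait_in_pselect(SIGUSR1);
    printf("handled signal %d\n", seen);
    return seen != SIGUSR1;
}
&lt;/pre&gt;
&lt;/div&gt;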
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org0ad119d" class="outline-2"&gt;
&lt;h2 id="org0ad119d"&gt;&lt;span class="section-number-2"&gt;4.&lt;/span&gt; Linting&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-4"&gt;
&lt;p&gt;
This makes me wonder about some kind of review-level lints that only
apply to new code being added.  Ideally we'd flag any code which
accesses variables assigned from signal handlers and calls one of
these functions, in a loop.
&lt;/p&gt;

&lt;p&gt;
Here's my attempt at partially doing this with coccinelle:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-cocci"&gt;&lt;span style="color: #0000ff;"&gt;@@&lt;/span&gt;
sig_atomic_t signal_handler_variable&lt;span style="color: #000000;"&gt;;&lt;/span&gt;
&lt;span style="color: #0000ff;"&gt;@@&lt;/span&gt;
&lt;span style="color: #d02090;"&gt;*   signal_handler_variable&lt;/span&gt;
    &lt;span style="color: #13665F;"&gt;...&lt;/span&gt;
&lt;span style="color: #d02090;"&gt;*   \(pause\|poll\|select\)(...)&lt;/span&gt;

&lt;span style="color: #0000ff;"&gt;@@&lt;/span&gt;
sig_atomic_t signal_handler_variable&lt;span style="color: #000000;"&gt;;&lt;/span&gt;
&lt;span style="color: #0000ff;"&gt;@@&lt;/span&gt;
&lt;span style="color: #d02090;"&gt;*   \(pause\|poll\|select\)(...)&lt;/span&gt;
    &lt;span style="color: #13665F;"&gt;...&lt;/span&gt;
&lt;span style="color: #d02090;"&gt;*   signal_handler_variable&lt;/span&gt;

&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
This will match any function that contains both an access to a &lt;code&gt;sig_atomic_t&lt;/code&gt; variable
and a call to &lt;code&gt;pause&lt;/code&gt;, &lt;code&gt;poll&lt;/code&gt;, or &lt;code&gt;select&lt;/code&gt;.  If you save that as
&lt;code&gt;lint-pause.cocci&lt;/code&gt;, you can check code with
&lt;/p&gt;

&lt;pre class="example" id="org840ae7d"&gt;
spatch --very-quiet --sp-file lint-pause.cocci path/to/c/files
&lt;/pre&gt;

&lt;p&gt;
Note that I am just using &lt;code&gt;*&lt;/code&gt; to print out match cases for brevity,
but you can add Python scripts to coccinelle rules for much
prettier/more elaborate reporting.
&lt;/p&gt;

&lt;p&gt;
It's possible to do much more careful matching, like ensuring the poll
calls happen in a loop, or only matching polls with no timeout, but
this simple form is sufficient to catch interesting cases to examine
later.  I also discovered while writing a more elaborate form that &lt;code&gt;do
{} while&lt;/code&gt; matching was &lt;a href="https://gitlab.inria.fr/coccinelle/coccinelle/-/merge_requests/190"&gt;only merged last year&lt;/a&gt; and distributions tend to
carry older versions of &lt;code&gt;spatch&lt;/code&gt;.
&lt;/p&gt;

&lt;p&gt;
Note that it only catches uses of &lt;code&gt;sig_atomic_t&lt;/code&gt;.  While testing this, I
found some old code that doesn't even use &lt;code&gt;volatile&lt;/code&gt;.  At some
point I may write a more elaborate script that flags all globals set
from any function passed to &lt;code&gt;sigaction&lt;/code&gt;, but as a review reminder,
this simpler form suffices for my needs.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org30dbc45" class="outline-2"&gt;
&lt;h2 id="org30dbc45"&gt;&lt;span class="section-number-2"&gt;5.&lt;/span&gt; Conclusion&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-5"&gt;
&lt;ul class="org-ul"&gt;
&lt;li&gt;dash had a bug of this flavor, and &lt;a href="https://git.kernel.org/pub/scm/utils/dash/dash.git/commit/?id=3800d4934391b144fd261a7957aea72ced7d47ea"&gt;fixed it with sigsuspend&lt;/a&gt;;&lt;/li&gt;
&lt;li&gt;busybox ash and hush &lt;a href="https://git.busybox.net/busybox/commit/shell/shell_common.c?id=a277506a64404e6c4472ff89c944c4f353db1c33"&gt;acknowledge that they have this bug&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;
I notice that Kerrisk's LPI talks about pause in section 20.14, but
doesn't note its perils, except to indicate that other ways will be
investigated in section 22.9.  There, Kerrisk introduces sigsuspend
and talks about exactly our problem with pause.
&lt;/p&gt;

&lt;p&gt;
&lt;a href="https://sourceware.org/glibc/manual/html_node/Pause-Problems.html"&gt;glibc's info page&lt;/a&gt; talks about this extensively, so it's unfortunate
that, for example, &lt;a href="https://man7.org/linux/man-pages/man2/pause.2.html"&gt;Linux's man page for pause(2)&lt;/a&gt; contains no such
details.
&lt;/p&gt;

&lt;p&gt;
Running the aforementioned coccinelle script across an arbitrary
corpus of packages I have on hand turns up a number of likely
instances of this bug, so this is still an issue worth keeping in
mind.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id="footnotes"&gt;
&lt;h2 class="footnotes"&gt;Footnotes: &lt;/h2&gt;
&lt;div id="text-footnotes"&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.1" class="footnum" href="#fnr.1" role="doc-backlink"&gt;1&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
Note that this program was designed to only handle a single
signal.  If that handler was registered for multiple signals, you
could have races around what value &lt;code&gt;got&lt;/code&gt; gets.  It used to be you
could have &lt;code&gt;got&lt;/code&gt; be a mask &amp;#x2013; e.g. &lt;code&gt;got |= 1&amp;lt;&amp;lt;n;&lt;/code&gt; &amp;#x2013; except these days
&lt;code&gt;SIGRTMAX&lt;/code&gt; and &lt;code&gt;_SIG_MAXSIG&lt;/code&gt; can be way higher than whatever the width
of &lt;code&gt;sig_atomic_t&lt;/code&gt; is, so I guess people end up doing an array of
&lt;code&gt;sig_atomic_t&lt;/code&gt; which is maybe 64 times less dense than you'd like. :-/
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;


&lt;/div&gt;
&lt;/div&gt;</content></entry><entry><title>worried by wordexp(3)</title><link href='http://cipht.net/2023/11/21/worried-by-wordexp.html'/><updated>2023-11-21T03:30:00+0000</updated><id>http://cipht.net/2023/11/21/worried-by-wordexp</id><content type='html'>&lt;p&gt;
The function &lt;code&gt;wordexp(3)&lt;/code&gt; is a POSIX C standard library function which
performs "word expansion like a POSIX shell".  &lt;code&gt;wordexp(3)&lt;/code&gt; combines
the safety of elaborate string parsing in C with the efficiency and
robustness of invoking the shell on arbitrary user input.  Why does it
even exist?  And why shouldn't you use it?
&lt;/p&gt;

&lt;div id="outline-container-org021a6be" class="outline-2"&gt;
&lt;h2 id="org021a6be"&gt;&lt;span class="section-number-2"&gt;1.&lt;/span&gt; usage&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-1"&gt;
&lt;p&gt;
Probably the most legit use is by &lt;code&gt;init&lt;/code&gt;-style programs executing a
command line (e.g. &lt;a href="https://sources.debian.org/src/finit/4.2-1/src/service.c/?hl=520#L520"&gt;finit&lt;/a&gt;); though, since many wordexp implementations
invoke the shell anyway, these might as well exec &lt;code&gt;sh -c 'exec ...'&lt;/code&gt;
instead.
&lt;/p&gt;

&lt;p&gt;
Applications typically use &lt;code&gt;wordexp&lt;/code&gt; to expand tildes and globs
(&lt;code&gt;~/*.txt&lt;/code&gt;), and are oblivious to its excessive powers.  Mostly, these
uses are in places like configuration files the user directly
controls, so any disasters as a consequence of &lt;code&gt;wordexp&lt;/code&gt; can be
considered the user's fault.
&lt;/p&gt;

&lt;p&gt;
More severe are the cases where wordexp's input comes from an
untrusted source&lt;sup&gt;&lt;a id="fnr.1" class="footref" href="#fn.1" role="doc-backlink"&gt;1&lt;/a&gt;&lt;/sup&gt; or the program in question is
setuid&lt;sup&gt;&lt;a id="fnr.2" class="footref" href="#fn.2" role="doc-backlink"&gt;2&lt;/a&gt;&lt;/sup&gt;.  Sometimes people use it when parsing files, even
(e.g. this &lt;a href="https://github.com/syoyo/tinygltf/issues/368"&gt;tinygltf issue&lt;/a&gt; ended up affecting &lt;a href="https://projects.blender.org/blender/blender/commit/466eb426ed96f2112494cc9ee12997255a6aaae2"&gt;blender&lt;/a&gt;).&lt;sup&gt;&lt;a id="fnr.3" class="footref" href="#fn.3" role="doc-backlink"&gt;3&lt;/a&gt;&lt;/sup&gt;
&lt;/p&gt;

&lt;p&gt;
I continue to find code copied from Stack Overflow where it remains
recommended, despite it being unsafe, and probably slow, too.
&lt;/p&gt;

&lt;p&gt;
All this would make wordexp seem like just another call like
&lt;code&gt;popen&lt;/code&gt; or &lt;code&gt;system&lt;/code&gt; where it's obvious that you're opening Pandora's
box, except wordexp has a flag, &lt;code&gt;WRDE_NOCMD&lt;/code&gt;, intended to prevent the
worst abuses of it.  The existence of this flag is a mistake, because
almost no libc actually tries to make it consistently safe.  The flag
may suggest to people that wordexp can be safe to use on untrusted
input.  However, &lt;code&gt;WRDE_NOCMD&lt;/code&gt; may be effectively broken, depending on the
combination of shell and libc in use.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org3d94759" class="outline-2"&gt;
&lt;h2 id="org3d94759"&gt;&lt;span class="section-number-2"&gt;2.&lt;/span&gt; a central problem&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-2"&gt;
&lt;p&gt;
Command substitution has two forms in shell: backtick-delimited
(&lt;code&gt;`command`&lt;/code&gt;) and dollar-parenthesized (&lt;code&gt;$(command)&lt;/code&gt;).  The former
presents more problems for the user, but fewer for the author of the
parser: simply scan ahead for a matching backtick, obeying other
escaping and quoting.
&lt;/p&gt;

&lt;p&gt;
The latter form is often where first-time shell writers fall into
despair.  It may be nested, and contain content-dependent unbalanced
parentheses &lt;sup&gt;&lt;a id="fnr.4" class="footref" href="#fn.4" role="doc-backlink"&gt;4&lt;/a&gt;&lt;/sup&gt;.  The lexer must mutually-recursively
invoke the parser just to find the end of the token.
&lt;/p&gt;

&lt;p&gt;
The POSIX standard is ambiguous about a concerning detail of shell
syntax: the difference between command and arithmetic substitution.
Don't expend too much effort trying to understand this:
&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;
If the current character is an unquoted '&lt;code&gt;$&lt;/code&gt;' or '&lt;code&gt;`&lt;/code&gt;', the shell shall
identify the start of any candidates for parameter expansion, command
substitution, or arithmetic expansion from their introductory unquoted
character sequences: '&lt;code&gt;$&lt;/code&gt;' or "&lt;code&gt;${&lt;/code&gt;", "&lt;code&gt;$(&lt;/code&gt;" or '&lt;code&gt;`&lt;/code&gt;', and "&lt;code&gt;$((&lt;/code&gt;",
respectively. The shell shall read sufficient input to determine the
end of the unit to be expanded (as explained in the cited
sections). While processing the characters, if instances of expansions
or quoting are found nested within the substitution, the shell shall
recursively process them in the manner specified for the construct
that is found. The characters found from the beginning of the
substitution to its end, allowing for any recursion necessary to
recognize embedded constructs, shall be included unmodified in the
result token, including any embedded or enclosing substitution
operators or quotes. The token shall not be delimited by the end of
the substitution.
&lt;/p&gt;

&lt;p&gt;
— &lt;a href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_03"&gt;POSIX Shell Command Language, 2.3 Token Recognition&lt;/a&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;
Each shell and wordexp implementation has its own take on how to
deal with ambiguous expressions like these:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-sh"&gt;$&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;&lt;span style="color: #4C7A90;"&gt;echo&lt;/span&gt; a&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;;&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;&lt;span style="color: #4C7A90;"&gt;echo&lt;/span&gt; b&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;
$&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;&lt;span style="color: #4C7A90;"&gt;echo&lt;/span&gt; &lt;span style="color: #39854C;"&gt;"("&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;;&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;&lt;span style="color: #4C7A90;"&gt;echo&lt;/span&gt; &lt;span style="color: #39854C;"&gt;")"&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;
$&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;&lt;span style="color: #13665F;"&gt;case&lt;/span&gt; a&lt;span style="color: #13665F;"&gt; in&lt;/span&gt; *) &lt;span style="color: #4C7A90;"&gt;echo&lt;/span&gt; b;; &lt;span style="color: #13665F;"&gt;esac&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
An actual shell needs to recursively parse these expressions, whereas
most wordexp implementations try to simply match parentheses.  The
last example in particular offers an opportunity to introduce
arbitrary parentheses.
&lt;/p&gt;

&lt;p&gt;
The POSIX rationale does say:
&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;
Arithmetic expansions have precedence over command substitutions. That
is, if the shell can parse an expansion beginning with "&lt;code&gt;$((&lt;/code&gt;" as an
arithmetic expansion then it will do so. It will only parse the
expansion as a command substitution (that starts with a subshell) if
it determines that it cannot parse the expansion as an arithmetic
expansion. If the syntax is valid for neither type of expansion, then
it is unspecified what kind of syntax error the shell reports.
&lt;/p&gt;

&lt;p&gt;
— &lt;a href="https://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xcu_chap02.html#tag_23_02_06_03"&gt;POSIX Rationale for Shell and Utilities, 2.6&lt;/a&gt;
&lt;/p&gt;
&lt;/blockquote&gt;


&lt;p&gt;
But who's reading that?  Clearly not the authors of most popular
shells:
&lt;/p&gt;

&lt;table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides"&gt;


&lt;colgroup&gt;
&lt;col  class="org-left" /&gt;

&lt;col  class="org-left" /&gt;

&lt;col  class="org-left" /&gt;

&lt;col  class="org-left" /&gt;
&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th scope="col" class="org-left"&gt;shell&lt;/th&gt;
&lt;th scope="col" class="org-left"&gt;&lt;code&gt;$((echo a);(echo b))&lt;/code&gt;&lt;/th&gt;
&lt;th scope="col" class="org-left"&gt;&lt;code&gt;$((echo "(");(echo ")"))&lt;/code&gt;&lt;/th&gt;
&lt;th scope="col" class="org-left"&gt;&lt;code&gt;$((case a in *) echo b;; esac))&lt;/code&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class="org-left"&gt;dash&lt;/td&gt;
&lt;td class="org-left"&gt;expects )&lt;/td&gt;
&lt;td class="org-left"&gt;parse error&lt;/td&gt;
&lt;td class="org-left"&gt;parse error&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;bash&lt;/td&gt;
&lt;td class="org-left"&gt;&lt;code&gt;a b&lt;/code&gt;&lt;/td&gt;
&lt;td class="org-left"&gt;&lt;code&gt;( )&lt;/code&gt;&lt;/td&gt;
&lt;td class="org-left"&gt;syntax error&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;zsh&lt;/td&gt;
&lt;td class="org-left"&gt;&lt;code&gt;a b&lt;/code&gt;&lt;/td&gt;
&lt;td class="org-left"&gt;bad math expression&lt;/td&gt;
&lt;td class="org-left"&gt;&lt;code&gt;b&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;mksh&lt;/td&gt;
&lt;td class="org-left"&gt;&lt;code&gt;a b&lt;/code&gt;&lt;/td&gt;
&lt;td class="org-left"&gt;&lt;code&gt;( )&lt;/code&gt;&lt;/td&gt;
&lt;td class="org-left"&gt;&lt;code&gt;b&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;hush&lt;/td&gt;
&lt;td class="org-left"&gt;expects )&lt;/td&gt;
&lt;td class="org-left"&gt;syntax error&lt;/td&gt;
&lt;td class="org-left"&gt;syntax error&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;
And while perhaps &lt;code&gt;zsh&lt;/code&gt; is unlikely to ever be &lt;code&gt;/bin/sh&lt;/code&gt;, all of the
others are reasonable candidates for it that I've seen on other
systems.
&lt;/p&gt;
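
&lt;p&gt;
To check a row of this table on your own system, something like the
following sketch should do.  The &lt;code&gt;try_expr&lt;/code&gt; helper is a name made up for
this example, and it assumes each shell is installed on &lt;code&gt;PATH&lt;/code&gt; under its
usual name; &lt;code&gt;hush&lt;/code&gt; is left out because it is normally reached through
busybox.  Results may well differ between shell versions.
&lt;/p&gt;

```shell
#!/bin/sh
# try_expr SHELL WORD: hypothetical helper for this example.  It asks SHELL
# to echo WORD and reports either the output or the exit status.
try_expr() {
    shell=$1
    word=$2
    # Skip shells that are not installed on this machine.
    if ! path=$(command -v "$shell"); then
        printf '%s: not installed\n' "$shell"
        return 0
    fi
    # $word is pasted in verbatim; only the inner shell gets to parse it.
    status=0
    out=$("$path" -c "echo $word") || status=$?
    if [ "$status" -eq 0 ]; then
        printf '%s: %s gives [%s]\n' "$shell" "$word" "$out"
    else
        printf '%s: %s fails (exit status %s)\n' "$shell" "$word" "$status"
    fi
}

# The three ambiguous expressions from the table, tried in each shell.
for s in dash bash zsh mksh; do
    for w in '$((echo a);(echo b))' \
             '$((echo "(");(echo ")"))' \
             '$((case a in *) echo b;; esac))'; do
        try_expr "$s" "$w"
    done
done
```

&lt;p&gt;
The shells' own error messages go to stderr untouched, so they appear
interleaved with the summary lines.
&lt;/p&gt;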
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org376769c" class="outline-2"&gt;
&lt;h2 id="org376769c"&gt;&lt;span class="section-number-2"&gt;3.&lt;/span&gt; implementations&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-3"&gt;
&lt;p&gt;
Now that we know shells handle these expressions inconsistently, how
do different libcs implement &lt;code&gt;wordexp&lt;/code&gt;?
&lt;/p&gt;
&lt;/div&gt;

&lt;div id="outline-container-org3989db3" class="outline-3"&gt;
&lt;h3 id="org3989db3"&gt;&lt;span class="section-number-3"&gt;3.1.&lt;/span&gt; OpenBSD&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-3-1"&gt;
&lt;p&gt;
OpenBSD has the best possible implementation of &lt;code&gt;wordexp(3)&lt;/code&gt;: none.
The demerits of the function are discussed in &lt;a href="https://www.mail-archive.com/tech@openbsd.org/msg02325.html"&gt;this thread from 2010&lt;/a&gt;.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org20d3e9a" class="outline-3"&gt;
&lt;h3 id="org20d3e9a"&gt;&lt;span class="section-number-3"&gt;3.2.&lt;/span&gt; leveraging the shell&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-3-2"&gt;
&lt;p&gt;
There are implementations which try to keep &lt;code&gt;wordexp&lt;/code&gt; simple by
shelling out, which was probably the intended behavior when the
function was first created.  Unfortunately, this means &lt;code&gt;WRDE_NOCMD&lt;/code&gt;
can't be trusted in these libcs, without the direct assistance of the
shell.
&lt;/p&gt;
&lt;/div&gt;

&lt;div id="outline-container-orgf23a49d" class="outline-4"&gt;
&lt;h4 id="orgf23a49d"&gt;&lt;span class="section-number-4"&gt;3.2.1.&lt;/span&gt; musl&lt;/h4&gt;
&lt;div class="outline-text-4" id="text-3-2-1"&gt;
&lt;p&gt;
&lt;a href="https://git.musl-libc.org/cgit/musl/tree/src/misc/wordexp.c?id=7c8454790080395bf5b27857a766b3468aa5ed98"&gt;musl's implementation&lt;/a&gt; is nice and simple, because it shells out.
Unfortunately, in this simplicity, there's no way to really enforce
&lt;code&gt;WRDE_NOCMD&lt;/code&gt;.  musl tries, by matching parentheses, but &lt;code&gt;$((echo
a);(echo b))&lt;/code&gt; or similar will get around it, as long as &lt;code&gt;/bin/sh&lt;/code&gt;
supports such contortions (this confuses &lt;code&gt;dash&lt;/code&gt;, but &lt;code&gt;bash&lt;/code&gt; happily
runs these commands).
&lt;/p&gt;
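
&lt;p&gt;
To see why a filter that only looks at the surface syntax falls short,
here is a deliberately crude stand-in.  It is not musl's code:
&lt;code&gt;naive_nocmd_check&lt;/code&gt; is a hypothetical function that only mirrors the
general idea of waving &lt;code&gt;$((&lt;/code&gt; through as arithmetic while rejecting &lt;code&gt;$(&lt;/code&gt;
and backticks.
&lt;/p&gt;

```shell
#!/bin/sh
# naive_nocmd_check WORD: hypothetical, intentionally simplistic filter.
# Exits 0 when WORD looks free of command substitution, 1 otherwise.
naive_nocmd_check() {
    case $1 in
        *'$(('*) return 0 ;;        # assumed to be arithmetic, waved through
        *'$('*|*'`'*) return 1 ;;   # looks like command substitution
        *) return 0 ;;
    esac
}

for w in '$(touch foo)' '$((1+2))' '$((echo a);(echo b))'; do
    if naive_nocmd_check "$w"; then
        printf 'accepted: %s\n' "$w"
    else
        printf 'rejected: %s\n' "$w"
    fi
done
```

&lt;p&gt;
The last word is accepted as if it were arithmetic, yet according to
the table above &lt;code&gt;bash&lt;/code&gt; treats it as two commands and runs them.
&lt;/p&gt;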
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-orgc46b050" class="outline-4"&gt;
&lt;h4 id="orgc46b050"&gt;&lt;span class="section-number-4"&gt;3.2.2.&lt;/span&gt; Apple / FreeBSD&lt;/h4&gt;
&lt;div class="outline-text-4" id="text-3-2-2"&gt;
&lt;p&gt;
FreeBSD &lt;a href="https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=13420"&gt;added wordexp in 2002&lt;/a&gt;, using the &lt;a href="https://cgit.freebsd.org/src/tree/lib/libc/gen/wordexp.c?id=faea1495bf19ed23bfec20c5c3257759d4e0e9eb"&gt;problematic shell-invoking
approach&lt;/a&gt;.  To FreeBSD's credit, &lt;a href="https://cgit.freebsd.org/src/commit/?id=d358fa780b338913419f028acdf62896e2481d97"&gt;they fixed the major issues with
this approach&lt;/a&gt; circa 2015 by specializing the shell, indeed noting:
&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;
Shell syntax is too complicated to detect command substitution and
unquoted operators reliably without implementing much of sh's
parser. Therefore, have sh do this detection.
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;
macOS has inherited versions of this implementation, with some
modifications (&lt;a href="https://opensource.apple.com/source/Libc/Libc-1044.1.2/gen/FreeBSD/wordexp.c"&gt;1044.1.2&lt;/a&gt;, &lt;a href="https://github.com/apple-oss-distributions/Libc/blob/Libc-1534.81.1/gen/FreeBSD/wordexp.c"&gt;1534.81.1&lt;/a&gt;).  An important practical
difference, though, is that on FreeBSD, &lt;code&gt;/bin/sh&lt;/code&gt; is always their
&lt;code&gt;ash&lt;/code&gt;, which at the least doesn't suffer from the aforementioned
parsing problem, while on macOS &lt;code&gt;/bin/sh&lt;/code&gt; has been &lt;code&gt;bash&lt;/code&gt;.&lt;sup&gt;&lt;a id="fnr.5" class="footref" href="#fn.5" role="doc-backlink"&gt;5&lt;/a&gt;&lt;/sup&gt;
&lt;/p&gt;

&lt;p&gt;
One interesting twist of Apple's implementation is that in the past,
&lt;a href="https://github.com/Apple-FOSS-Mirror/Libc/blob/2ca2ae74647714acfc18674c3114b1a5d3325d7d/gen/wordexp.c#L192"&gt;they shelled out to &lt;code&gt;perl&lt;/code&gt;, via &lt;code&gt;popen&lt;/code&gt;&lt;/a&gt;.  Last time I checked, they
use a helper called &lt;code&gt;/usr/lib/system/wordexp&lt;/code&gt;, but this did nothing to
prevent command substitution &amp;#x2013; Apple's libc suffered the same problem
as musl.  (The shell situation on macOS is &lt;a href="https://www.jwz.org/blog/2023/11/how-did-apple-manage-to-break-redirects-on-all-versions-of-bash/"&gt;always evolving in
interesting ways&lt;/a&gt; so who knows what the state is now.)
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org81a59fb" class="outline-4"&gt;
&lt;h4 id="org81a59fb"&gt;&lt;span class="section-number-4"&gt;3.2.3.&lt;/span&gt; Solaris / Illumos&lt;/h4&gt;
&lt;div class="outline-text-4" id="text-3-2-3"&gt;
&lt;p&gt;
The &lt;a href="https://github.com/TritonDataCenter/illumos-joyent/blob/2f6344af1104ce12d08740df56cd755e87695926/usr/src/lib/libc/port/regex/wordexp.c#L232-L243"&gt;Solaris implementation&lt;/a&gt; (originally from MKS with a copyright of
1985!) is notable for implementing &lt;code&gt;WRDE_NOCMD&lt;/code&gt; by leveraging ksh's
restricted mode.
&lt;/p&gt;

&lt;p&gt;
Not many people may still be using this implementation, but this is
pretty clever and I guess demonstrates that the commercial Unix
implementations may not have been bad.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-orga46da7b" class="outline-3"&gt;
&lt;h3 id="orga46da7b"&gt;&lt;span class="section-number-3"&gt;3.3.&lt;/span&gt; parsing shell syntax&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-3-3"&gt;
&lt;p&gt;
So, instead of being simple, we can try the herculean task of
implementing most of a shell in libc instead.  The only libc I know of
that does this is glibc, though (often old, broken) copies of its
implementation are found widely, both in programs trying to get around
systems like OpenBSD as well as other libcs.  For example, uClibc
has &lt;a href="https://github.com/gittup/uClibc/blob/9dbf00be840a15a52656a039f83d1997344ce507/libc/misc/wordexp/wordexp.c#L2020"&gt;an old version of glibc's wordexp&lt;/a&gt; which has the fatal flaw that
backticks are still parsed inside arithmetic expressions, so
e.g. &lt;code&gt;$[`touch foo`]&lt;/code&gt; will execute a command.
&lt;/p&gt;

&lt;p&gt;
glibc avoids calling out to the shell except when it must, for command
substitution.  This results in &lt;a href="https://sourceware.org/git/?p=glibc.git;a=blob;f=posix/wordexp.c;h=0da98f5b083139629dda47ba554817d0551b6b69;hb=HEAD#l1"&gt;an elaborate, 2500-line
reimplementation of some of a shell parser&lt;/a&gt;, including a single
800-line function to parse parameter expansions.
&lt;/p&gt;

&lt;p&gt;
It omits many details; for example, in arithmetic expansion.  Many
valid (and useful) expressions like &lt;code&gt;$((1&amp;lt;&amp;lt;16))&lt;/code&gt; will not be
recognized.
&lt;/p&gt;

&lt;p&gt;
However, it has one great merit.  In 2014, &lt;a href="https://sourceware.org/git/?p=glibc.git;a=commit;h=a39208bd7fb76c1b01c127b4c61f9bfd915bfe7c"&gt;Carlos O'Donell fixed an
important vulnerability&lt;/a&gt;.  Previously, the code tried to enforce
&lt;code&gt;WRDE_NOCMD&lt;/code&gt; only when it recognized command substitution's two forms,
like many of the implementations which lean on the shell for
everything.  Though it seems obvious in retrospect, glibc's current
implementation guards the actual execution of commands with
&lt;code&gt;WRDE_NOCMD&lt;/code&gt; tests, instead of trying to do this during parsing.
&lt;/p&gt;

&lt;p&gt;
Despite its complexity, it does seem like the only implementation that
is safe to use, though it seems a better policy is to declare a ban on
wordexp.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org4ceecfc" class="outline-2"&gt;
&lt;h2 id="org4ceecfc"&gt;&lt;span class="section-number-2"&gt;4.&lt;/span&gt; conclusion&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-4"&gt;
&lt;p&gt;
I had never heard of &lt;code&gt;wordexp(3)&lt;/code&gt; until I saw it mentioned in the
POSIX standard, while I was implementing a shell.
&lt;/p&gt;

&lt;p&gt;
Many programs that just use &lt;code&gt;wordexp("~/foo")&lt;/code&gt; could be replaced with
&lt;code&gt;glob(..., GLOB_TILDE)&lt;/code&gt;.  (Though keep in mind, &lt;a href="https://sourceware.org/bugzilla/show_bug.cgi?id=24607"&gt;anyone can crash your
program with a bad glob&lt;/a&gt;.)
&lt;/p&gt;

&lt;p&gt;
I'm not sure where it was first implemented; scanning the Unix history
repo, it doesn't really appear until the 90s but there's a reference
in a manpage from FreeBSD 1.0 (which doesn't implement it).  Let's
hope it fades into history similarly.
&lt;/p&gt;

&lt;p&gt;
(Post-publishing addendum: a friend alerted me to the POSIX rationale
for &lt;code&gt;wordexp&lt;/code&gt; (which I should have read to begin with).  It seems the
problem was that people kept demanding more and more features for
&lt;code&gt;glob&lt;/code&gt;, so the committee threw up their hands and added
the far-too-broad &lt;code&gt;wordexp&lt;/code&gt; in an attempt to cover all possible bases.)
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id="footnotes"&gt;
&lt;h2 class="footnotes"&gt;Footnotes: &lt;/h2&gt;
&lt;div id="text-footnotes"&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.1" class="footnum" href="#fnr.1" role="doc-backlink"&gt;1&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
&lt;a href="http://cvs.savannah.nongnu.org/viewvc/jailkit/jailkit/src/jk_lsh.c?view=markup#l151"&gt;jailkit&lt;/a&gt; uses wordexp in its restricted shell; to be fair,
this is only enabled if you use the &lt;code&gt;allow_word_expansion&lt;/code&gt; option
which is disabled by default, and there's always the chance it will
link with a safer wordexp like in modern glibc.  However, it ships
with a vulnerable wordexp bundled, for systems without it, and for
example &lt;code&gt;jk_lsh -c '$[`touch foo`]'&lt;/code&gt; will do bad things in such a
case.
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.2" class="footnum" href="#fnr.2" role="doc-backlink"&gt;2&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
sudo has some clever logic &lt;a href="https://www.sudo.ws/repos/sudo/file/tip/src/sudo_noexec.c#l170"&gt;to stuff the &lt;code&gt;WRDE_NOCMD&lt;/code&gt; flag
into calls to wordexp&lt;/a&gt; in its noexec mode; unfortunately, as
demonstrated in this post, that's not sufficient to prevent execution
from happening in all cases.
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.3" class="footnum" href="#fnr.3" role="doc-backlink"&gt;3&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
Debian's excellent codesearch service provides a quick
way to find calls:
&lt;a href="https://codesearch.debian.net/search?q=%5Cbwordexp%5C%28%5Cb&amp;amp;literal=0&amp;amp;page=1"&gt;https://codesearch.debian.net/search?q=%5Cbwordexp%5C%28%5Cb&amp;amp;literal=0&amp;amp;page=1&lt;/a&gt;
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.4" class="footnum" href="#fnr.4" role="doc-backlink"&gt;4&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
Consider this example:
&lt;/p&gt;
&lt;div class="org-src-container"&gt;
&lt;pre class="src src-shell"&gt;$&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;&lt;span style="color: #13665F;"&gt;case&lt;/span&gt; $&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;&lt;span style="color: #13665F;"&gt;case&lt;/span&gt; $&lt;span style="color: #845A84;"&gt;x&lt;/span&gt;&lt;span style="color: #13665F;"&gt; in&lt;/span&gt; *) &lt;span style="color: #4C7A90;"&gt;(&lt;/span&gt;&lt;span style="color: #4C7A90;"&gt;echo&lt;/span&gt; $&lt;span style="color: #845A84;"&gt;x&lt;/span&gt;&lt;span style="color: #4C7A90;"&gt;)&lt;/span&gt;;; &lt;span style="color: #13665F;"&gt;esac&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt; &lt;span style="color: #13665F;"&gt;in&lt;/span&gt; (x) $&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;&lt;span style="color: #4C7A90;"&gt;echo&lt;/span&gt; :&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;;; *) $&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;&lt;span style="color: #4C7A90;"&gt;echo&lt;/span&gt; :&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;;; &lt;span style="color: #13665F;"&gt;esac&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.5" class="footnum" href="#fnr.5" role="doc-backlink"&gt;5&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
bash has a (disabled by default at compile time) &lt;code&gt;--wordexp&lt;/code&gt;
option, which is an attempt to provide this kind of functionality more
safely.  It tries to disable command substitution everywhere and only
invokes the parser and expander immediately.  Last time I checked,
macOS didn't use this.
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;


&lt;/div&gt;
&lt;/div&gt;</content></entry><entry><title>Some talks</title><link href='http://cipht.net/2017/11/02/some-talks.html'/><updated>2017-11-02T02:30:00+0000</updated><id>http://cipht.net/2017/11/02/some-talks</id><content type='html'>&lt;p&gt;
Over the course of 2016 and 2017, I gave a few talks in public.  I
wanted to link to all of them, in part because it forces me to release
the slides and code associated with them, but also to say a little
about the value of public speaking and putting oneself out there.
&lt;/p&gt;

&lt;p&gt;
Like many people, I have a hard time putting myself in front of people
and exposing myself to criticism.  (I've also spent a lot of my life
being intensely private and avoiding visibility of any kind.)  I often
feel an obligation to perfection, and I came to realize that I don't
hold other people to this same standard; I can forgive
well-intentioned mistakes in others, so why not myself?&lt;sup&gt;&lt;a id="fnr.1" class="footref" href="#fn.1" role="doc-backlink"&gt;1&lt;/a&gt;&lt;/sup&gt;
&lt;/p&gt;

&lt;p&gt;
I came to realize that it was impossible for me to get everything
right every time, and that as long as I graciously accept corrections,
and try my best, I'm still providing value.  In particular, I have
often discovered that a lot of things I consider well-known are
exciting and new to a lot of other people.  Also, I just plain enjoy
telling people about things I find exciting.
&lt;/p&gt;

&lt;p&gt;
This is a motivating factor behind blogging for me, even as I'm
acutely aware of the imperfections that necessarily come with these
posts.  (It also helps that the audience of this blog is vanishingly
tiny, so I don't feel bad about expressing myself in my habitual
tangly, verbose, parenthetical style.)
&lt;/p&gt;

&lt;p&gt;
I hope, then, that after reading this you'll consider speaking on something
you find interesting.  You don't have to be an expert: your own
passion plus diligent preparation will do fine.
&lt;/p&gt;

&lt;p&gt;
A good way to get started is to speak at a local meetup (if you're in
or visiting Montreal, please consider speaking at &lt;a href="https://www.meetup.com/Papers-We-Love-Montreal/"&gt;Papers We Love
Montreal (PWLMTL)&lt;/a&gt;!).  After that, why not submit proposals to
conferences you'd like to go to anyway?  You have nothing to lose.
&lt;/p&gt;

&lt;p&gt;
I should mention that &lt;a href="http://bangbangcon.com/"&gt;!!Con&lt;/a&gt; might be my favorite conference I've ever
gone to, so far, and so I encourage you to submit a talk next year.
&lt;/p&gt;

&lt;div id="outline-container-orga4ea9d4" class="outline-2"&gt;
&lt;h2 id="orga4ea9d4"&gt;&lt;span class="section-number-2"&gt;1.&lt;/span&gt; &lt;a href="http://www.erlang-factory.com/sfbay2016/julian-squires"&gt;What if your NIF goes adrift?&lt;/a&gt; (Erlang Factory 2016)&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-1"&gt;
&lt;iframe width="560" height="315" src="https://www.youtube.com/embed/5Qkqs2oNboA" frameborder="0" allowfullscreen&gt;&lt;/iframe&gt;

&lt;p&gt;
(&lt;a href="http://www.erlang-factory.com/static/upload/media/1465548996845641juliansquireswhatifyournifgoesadrift_.pdf"&gt;Slides&lt;/a&gt;; slides source coming when I can find it)
&lt;/p&gt;

&lt;p&gt;
I was working a lot with &lt;a href="http://erlang.org/doc/man/erl_nif.html"&gt;Erlang NIFs&lt;/a&gt;, doing a lot of debugging and
benchmarking, and ended up building a tool to make some parts of this
easier (&lt;a href="https://github.com/tokenrove/niffy"&gt;niffy&lt;/a&gt;).  This talk was both an introduction to niffy and an
opportunity to share some tips and warnings from that experience.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org4ad84c7" class="outline-2"&gt;
&lt;h2 id="org4ad84c7"&gt;&lt;span class="section-number-2"&gt;2.&lt;/span&gt; &lt;a href="http://www.erlang-factory.com/euc2016/julian-squires"&gt;Think Outside the VM: Unobtrusive Performance Measurement&lt;/a&gt; (Erlang User Conference 2016)&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-2"&gt;
&lt;iframe width="560" height="315" src="https://www.youtube.com/embed/xP2yzaYdjpo" frameborder="0" allowfullscreen&gt;&lt;/iframe&gt;
&lt;p&gt;
(&lt;a href="http://www.erlang-factory.com/static/upload/media/1474729992651497juliansquiresthinkoutsidethevmeuc2016.pdf"&gt;Slides&lt;/a&gt;; &lt;a href="https://github.com/tokenrove/talk-euc2016-think-outside-the-vm"&gt;slides source&lt;/a&gt;)
&lt;/p&gt;

&lt;p&gt;
This talk was about something I'm still passionate about: safely
inspecting applications in production, in particular to diagnose
performance issues (which can be hard to reproduce outside of
production environments).
&lt;/p&gt;

&lt;p&gt;
Like the previous talk, this was partially driven by &lt;a href="https://github.com/tokenrove/extrospect-beam"&gt;extrospect-beam&lt;/a&gt;,
a tool I had been working on, and having the talk coming up was good
motivation to make a release.  (And this post is a good reminder for
me to pick up that work again and make the tool much easier to use,
and perhaps modernize it with some eBPF approaches that have become
available.)
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org0128073" class="outline-2"&gt;
&lt;h2 id="org0128073"&gt;&lt;span class="section-number-2"&gt;3.&lt;/span&gt; &lt;a href="https://systemswe.love/archive/minneapolis-2017/julian-squires"&gt;Implementations of Timing Wheels&lt;/a&gt; (Systems We Love 2017, PWLMTL 2016/11)&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-3"&gt;
&lt;iframe src="https://player.vimeo.com/video/209021792?title=0&amp;byline=0&amp;portrait=0" width="640" height="360" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen&gt;&lt;/iframe&gt;

&lt;p&gt;
(&lt;a href="https://github.com/tokenrove/talk-swl2017-timing-wheels"&gt;Slides source&lt;/a&gt;)
&lt;/p&gt;

&lt;p&gt;
I was thinking a lot about expiry and rotation in many different
contexts, and got very interested in timing wheels.  I gave a talk
that was something like an hour and a half at Papers We Love, and then
somehow condensed it into twenty minutes for SWL.  Needless to say,
this compression resulted in some loss, and you can see I'm a bit
agitated in the video, mostly because of the time limit; my earlier
attempts to give the cut-down version had still been forty minutes
long.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org6c7b716" class="outline-2"&gt;
&lt;h2 id="org6c7b716"&gt;&lt;span class="section-number-2"&gt;4.&lt;/span&gt; the Emoji that Killed Chrome (&lt;a href="http://bangbangcon.com"&gt;!!Con&lt;/a&gt; 2017)&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-4"&gt;
&lt;iframe width="560" height="315" src="https://www.youtube.com/embed/UE-fJjMasec" frameborder="0" allowfullscreen&gt;&lt;/iframe&gt;

&lt;p&gt;
(&lt;a href="https://github.com/tokenrove/talk-bangbangcon-2017-chromoji"&gt;Slides&lt;/a&gt;, &lt;a href="https://bugs.chromium.org/p/chromium/issues/detail?id=586628"&gt;bug report&lt;/a&gt;)
&lt;/p&gt;

&lt;p&gt;
If twenty minutes is hard, ten minutes seemed impossible.  But this
was a fun little debugging story and I ended up very glad I made this
presentation.  I think I fail to make the essential connection to
surrogate pairs in this talk, which is unfortunate, as this is a key
part of why this isn't valid UTF-8.  (See &lt;a href="http://unicode.org/faq/utf_bom.html#utf8-4"&gt;these&lt;/a&gt; &lt;a href="http://unicode.org/faq/utf_bom.html#utf16-2"&gt;entries&lt;/a&gt; in the &lt;a href="http://www.unicode.org/faq/"&gt;Unicode
FAQ&lt;/a&gt; for a start.)  The audience remained very supportive, however,
which helped a lot.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org624c8e9" class="outline-2"&gt;
&lt;h2 id="org624c8e9"&gt;&lt;span class="section-number-2"&gt;5.&lt;/span&gt; &lt;a href="https://www.meetup.com/preview/Papers-We-Love-Montreal/events/243307433"&gt;Simple Fast Algorithms for the Editing Distance between Trees &amp;amp; Related Problems&lt;/a&gt; (PWLMTL 2017/09)&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-5"&gt;
&lt;p&gt;
No slides, since I like doing these talks on a whiteboard, but the
papers cited in the comments at the bottom of the &lt;a href="https://www.meetup.com/preview/Papers-We-Love-Montreal/events/243307433"&gt;event page&lt;/a&gt; (I can't seem to
link to the comments directly) are a pretty good
read.  Here's the code that was presented (it looks like it could use
some refactoring):
&lt;/p&gt;

&lt;script src="https://gist.github.com/tokenrove/fbc0541be1e4a6da9f33842db197b993.js"&gt;&lt;/script&gt;

&lt;p&gt;
There's no video for this talk, because we don't record talks at
PWLMTL.  I get questions about this a lot, and wanted to say a few
words about it.
&lt;/p&gt;
&lt;/div&gt;

&lt;div id="outline-container-org6b11eb0" class="outline-3"&gt;
&lt;h3 id="org6b11eb0"&gt;&lt;span class="section-number-3"&gt;5.1.&lt;/span&gt; Why PWLMTL doesn't record talks, and allows interruptions&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-5-1"&gt;
&lt;p&gt;
Normally, questions in conference talks are kind of awful.  You're on
a strict schedule, so interruptions and digressions are just going to
derail the speaker, and most of the questions at the end are usually
more &lt;a href="https://www.recurse.com/manual#no-well-actuallys"&gt;"well, actually"&lt;/a&gt; than elucidating.
&lt;/p&gt;

&lt;p&gt;
I wanted to create an environment unlike that, where people, both
attendees and presenter, would feel comfortable &lt;a href="http://www.catb.org/jargon/html/W/Whats-a-spline.html"&gt;asking "dumb"
questions&lt;/a&gt;, making mistakes, and exploring digressions.  I know that if
a talk is being recorded, I am much less comfortable asking a
question.
&lt;/p&gt;

&lt;p&gt;
So this helps PWLMTL function more like a discussion group than a
conference talk, and I think this is valuable.  I wish there were more
things like this, but unfortunately they inherently don't scale.
(Both in number of attendees, and in the number of these talks you can
have in a day: PWLMTL benefits from a very flexible schedule, where
talks can go from forty-five minutes to three hours long, depending on
the stamina of the audience.)
&lt;/p&gt;

&lt;p&gt;
(This still doesn't solve the problem that many people aren't
comfortable speaking up, but I think keeping the environment as
encouraging of speaking up as possible, as well as having a moderator
facilitating the conversation (and curtailing those who are talking a
little too much&amp;#x2026;), is better than the alternatives.  I can't stand
talks where I feel like the message is "don't ask questions"; why
should I even attend?)
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-orgc17aff0" class="outline-2"&gt;
&lt;h2 id="orgc17aff0"&gt;&lt;span class="section-number-2"&gt;6.&lt;/span&gt; Conclusion&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-6"&gt;
&lt;p&gt;
Hopefully you enjoyed at least one of these talks.  If not, that's ok;
I think my future talks will improve.
&lt;/p&gt;

&lt;p&gt;
If you think even speaking at a meetup group is intractable, how about
speaking to a small group of your peers?  Many companies have lunchtime
or Friday afternoon activities where you can give a talk or lead
a workshop; and if yours doesn't, you could start one.  (The same
applies if you're a student, of course.)  My experience is that these
things are greatly appreciated and become an important part of a
company's culture.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id="footnotes"&gt;
&lt;h2 class="footnotes"&gt;Footnotes: &lt;/h2&gt;
&lt;div id="text-footnotes"&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.1" class="footnum" href="#fnr.1" role="doc-backlink"&gt;1&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
Incidentally, if this really resonates with you, I can
highly recommend Jeff Szymanski's &lt;a href="http://drjeffszymanski.com/the-perfectionists-handbook"&gt;the Perfectionist's Handbook&lt;/a&gt;; unlike
many similar books, it doesn't simply urge you to lower your
standards, and has much practical advice.
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;


&lt;/div&gt;
&lt;/div&gt;</content></entry><entry><title>Fixie tries</title><link href='http://cipht.net/2017/10/29/fixie-tries.html'/><updated>2017-10-29T02:30:00+0000</updated><id>http://cipht.net/2017/10/29/fixie-tries</id><content type='html'>&lt;blockquote&gt;
&lt;p&gt;
&lt;b&gt;tl;dr:&lt;/b&gt; Here's a trie you probably don't want to use, but you might
find interesting: an x86-64-specific popcount-array radix trie for
fixed-length keys.  The code (in Rust) is &lt;a href="https://github.com/tokenrove/fixie-trie"&gt;on GitHub&lt;/a&gt;.
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;
In which I discuss a slightly-hackish minor trie variant, &lt;a href="https://github.com/tokenrove/fixie-trie"&gt;the fixie
trie&lt;/a&gt;.  There are already so many kinds of tries&lt;sup&gt;&lt;a id="fnr.1" class="footref" href="#fn.1" role="doc-backlink"&gt;1&lt;/a&gt;&lt;/sup&gt;:
&lt;a href="https://xlinux.nist.gov/dads/HTML/patriciatree.html"&gt;PATRICIA&lt;/a&gt;, &lt;a href="http://judy.sourceforge.net/"&gt;Judy array&lt;/a&gt;, &lt;a href="http://lampwww.epfl.ch/papers/idealhashtrees.pdf"&gt;hash array mapped trie (HAMT)&lt;/a&gt;, &lt;a href="https://cr.yp.to/critbit.html"&gt;crit-bit&lt;/a&gt;,
&lt;a href="https://dotat.at/prog/qp/README.html"&gt;qp-trie&lt;/a&gt;, &lt;a href="http://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p57.pdf"&gt;poptrie&lt;/a&gt;, &lt;a href="https://www.nada.kth.se/~snilsson/publications/TRASH/trash.pdf"&gt;TRASH&lt;/a&gt;, &lt;a href="https://db.in.tum.de/~leis/papers/ART.pdf"&gt;adaptive radix tree (ART)&lt;/a&gt;, …
&lt;/p&gt;

&lt;p&gt;
This is without even getting into suffix tries and so on.  There are
almost as many trie variants as there are heap variants.  (In a way,
&lt;a href="http://www.cs.columbia.edu/~nahum/w6998/papers/sosp87-timing-wheels.pdf"&gt;hierarchical timing wheels&lt;/a&gt; are also a kind of trie structure.&lt;sup&gt;&lt;a id="fnr.2" class="footref" href="#fn.2" role="doc-backlink"&gt;2&lt;/a&gt;&lt;/sup&gt;)
&lt;/p&gt;

&lt;p&gt;
So although I'm giving this one a new name, it probably already exists
in some form.  My apologies to anyone whose named variant I may have
stepped on.  Where did yet another trie variant come from?
&lt;/p&gt;

&lt;div id="outline-container-orgb135bc6" class="outline-2"&gt;
&lt;h2 id="orgb135bc6"&gt;&lt;span class="section-number-2"&gt;1.&lt;/span&gt; Crit-Bit Tr[ei]es&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-1"&gt;

&lt;div id="org1af0bb0" class="figure"&gt;
&lt;p&gt;&lt;img src="http://www.cipht.net/images/2017-10-29-critbit.png" alt="2017-10-29-critbit.png" /&gt;
&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;
When I first stumbled on Dan Bernstein's description of &lt;a href="https://cr.yp.to/critbit.html"&gt;crit-bit
trees&lt;/a&gt;, I got really excited.&lt;sup&gt;&lt;a id="fnr.3" class="footref" href="#fn.3" role="doc-backlink"&gt;3&lt;/a&gt;&lt;/sup&gt;  He paints a pretty compelling picture:
a data structure simpler, faster, smaller, and more featureful than
hash tables.  Who needs these hash tables anyway?
&lt;/p&gt;

&lt;p&gt;
Of course, it turned out that in practice, at least for the crit-bit
trees described by djb, all that pointer-chasing wasn't great, and
the structure isn't straightforward to adapt to key types that lack a
natural sentinel like NUL.
&lt;/p&gt;

&lt;p&gt;
Then I found Tony Finch's &lt;a href="https://dotat.at/prog/qp/README.html"&gt;qp-tries&lt;/a&gt;, which are even better, but first
let's go through HAMT.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-orgae92b02" class="outline-2"&gt;
&lt;h2 id="orgae92b02"&gt;&lt;span class="section-number-2"&gt;2.&lt;/span&gt; array mapped tries&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-2"&gt;

&lt;div id="orge742ccc" class="figure"&gt;
&lt;p&gt;&lt;img src="http://www.cipht.net/images/2017-10-29-amt.png" alt="2017-10-29-amt.png" /&gt;
&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;
Marek's &lt;a href="https://idea.popcount.org/2012-07-25-introduction-to-hamt/"&gt;introduction to HAMT&lt;/a&gt; is better at explaining this than I am,
but briefly: there's this great trick you can do if you have a fast
way to count the 1s in a bitmap (this operation is called population
count, or "popcount" for short).&lt;sup&gt;&lt;a id="fnr.4" class="footref" href="#fn.4" role="doc-backlink"&gt;4&lt;/a&gt;&lt;/sup&gt;
&lt;/p&gt;

&lt;p&gt;
Let's say you want a map from the integers [0,8) to elements in an
array.  This is such a small number that you could just allocate eight
elements and your "map" is just array indexing, but let's pretend
eight is a larger number, since I didn't want to draw a bunch of
sixty-four-element-wide diagrams in this article.
&lt;/p&gt;

&lt;p&gt;
If you wanted to only allocate space for the elements that are present
in this map, you could use an 8-bit bitmap to indicate presence, and
store the elements in order.  Now, popcount of the whole bitmap gives
you the number of elements present; each individual bit tells you
whether a given element is present; and the popcount of the bitmap
masked to the bits below that element's bit gives you its index in the
array.
&lt;/p&gt;
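
&lt;p&gt;
To make the indexing concrete, here is a minimal sketch of the lookup
just described, for the eight-key case.  The name &lt;code&gt;sparse_get&lt;/code&gt;
and the choice of &lt;code&gt;u32&lt;/code&gt; elements are mine, purely for
illustration; they don't come from any of the implementations linked
in this post.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-rust"&gt;/// Look up `key` (in 0..8) in a sparse array described by an 8-bit
/// presence bitmap.  `elements` holds only the present values, in key order.
/// (Hypothetical helper, for illustration only.)
fn sparse_get(bitmap: u8, elements: &amp;amp;[u32], key: u8) -&amp;gt; Option&amp;lt;u32&amp;gt; {
    assert!(key &amp;lt; 8);
    let bit = 1u8 &amp;lt;&amp;lt; key;
    if bitmap &amp;amp; bit == 0 {
        return None; // this key is absent
    }
    // The number of present keys below `key` is this key's slot.
    let index = (bitmap &amp;amp; (bit - 1)).count_ones() as usize;
    Some(elements[index])
}
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
With keys 1, 4, and 6 present, the bitmap is &lt;code&gt;0b0101_0010&lt;/code&gt;; if
the elements are &lt;code&gt;[10, 40, 60]&lt;/code&gt;, then looking up key 4 masks
the bitmap down to &lt;code&gt;0b0000_0010&lt;/code&gt;, whose popcount of 1 selects
the element 40.
&lt;/p&gt;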

&lt;p&gt;
Phil Bagwell presents this idea in &lt;a href="http://lampwww.epfl.ch/papers/triesearches.pdf.gz"&gt;Fast and Efficient Trie Searches&lt;/a&gt; to
yield array mapped tries (AMT)&lt;sup&gt;&lt;a id="fnr.5" class="footref" href="#fn.5" role="doc-backlink"&gt;5&lt;/a&gt;&lt;/sup&gt;.  Later, in &lt;a href="https://infoscience.epfl.ch/record/64398/files/idealhashtrees.pdf"&gt;Ideal Hash Trees&lt;/a&gt;,
he builds on these to yield hash array mapped tries (HAMT), which have
become widely implemented.&lt;sup&gt;&lt;a id="fnr.6" class="footref" href="#fn.6" role="doc-backlink"&gt;6&lt;/a&gt;&lt;/sup&gt; &lt;sup&gt;, &lt;/sup&gt;&lt;sup&gt;&lt;a id="fnr.7" class="footref" href="#fn.7" role="doc-backlink"&gt;7&lt;/a&gt;&lt;/sup&gt;
&lt;/p&gt;

&lt;p&gt;
This popcount trick appears in all sorts of other places; for example,
in unrelated travels I just ran into it in the &lt;a href="http://pages.pathcom.com/~vadco/cwg.html"&gt;Caroline Word Graph&lt;/a&gt; as
a consequence of reading Appel and Jacobson's &lt;a href="https://www.cs.cmu.edu/afs/cs/academic/class/15451-s06/www/lectures/scrabble.pdf"&gt;The World's Fastest
Scrabble Program&lt;/a&gt;.  (Which I now notice is also cited by Bagwell in the
aforementioned paper.)
&lt;/p&gt;

&lt;p&gt;
(BTW, Bagwell's other papers, such as &lt;a href="https://infoscience.epfl.ch/record/52465/files/IC_TECH_REPORT_200244.pdf"&gt;Fast Functional Lists&lt;/a&gt;, are also
well worth reading!)
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org0180edc" class="outline-2"&gt;
&lt;h2 id="org0180edc"&gt;&lt;span class="section-number-2"&gt;3.&lt;/span&gt; &lt;a href="https://dotat.at/prog/qp/README.html"&gt;qp-tries&lt;/a&gt;&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-3"&gt;

&lt;div id="org52eb315" class="figure"&gt;
&lt;p&gt;&lt;img src="http://www.cipht.net/images/2017-10-29-qp.png" alt="2017-10-29-qp.png" /&gt;
&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;
Now, qp-tries sort of combine HAMT and crit-bit trees: they work with
larger chunks instead of a bit at a time.  The original description
works a nibble at a time, so a 64-bit word uses 16 bits for a bitmap
representing the possible nibbles, and the rest of the word stores the
index of the nibble being tested (the "critical nibble").
&lt;/p&gt;

&lt;p&gt;
I found qp-tries inspiring in part because of the lucid implementation
which yielded great performance with simple code.  Check out &lt;a href="https://github.com/fanf2/qp/blob/HEAD/qp.h"&gt;qp.h&lt;/a&gt; for
a nice overview in the comments, and &lt;a href="https://github.com/fanf2/qp/blob/HEAD/qp.c"&gt;qp.c&lt;/a&gt; for the implementation.
&lt;/p&gt;

&lt;p&gt;
qp-tries are great for lots of cases, especially strings; I wanted to
go on about them but in general I suggest just checking out everything
on the &lt;a href="https://dotat.at/prog/qp/README.html"&gt;qp-trie homepage&lt;/a&gt;.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org3837b6b" class="outline-2"&gt;
&lt;h2 id="org3837b6b"&gt;&lt;span class="section-number-2"&gt;4.&lt;/span&gt; canonical pointers on x86-64&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-4"&gt;
&lt;p&gt;
For whatever reason, this idea never occurred to me when I encountered
HAMT, but my ears perked up when I heard 16, 48, and 64 in the description of
qp-tries.  In practice, x86-64 machines only use 48 of the bits in a
pointer (64 bits); what about the other 16?  Intel has the following
to say about addresses: (&lt;a href="https://software.intel.com/en-us/articles/intel-sdm"&gt;SDM volume 1&lt;/a&gt;, 3.3.7.1 Canonical Addressing)
&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;
In 64-bit mode, an address is considered to be in canonical form if
address bits 63 through to the most-significant implemented bit by the
microarchitecture are set to either all ones or all zeros.
&lt;/p&gt;

&lt;p&gt;
Intel 64 architecture defines a 64-bit linear address. Implementations
can support less. The first implementation of IA-32 processors with
Intel 64 architecture supports a 48-bit linear address. This means a
canonical address must have bits 63 through 48 set to zeros or ones
(depending on whether bit 47 is a zero or one).
&lt;/p&gt;

&lt;p&gt;
Although implementations may not use all 64 bits of the linear
address, they should check bits 63 through the most-significant
implemented bit to see if the address is in canonical form. If a
linear-memory reference is not in canonical form, the implementation
should generate an exception.
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;
(Note this means that pointers are in a sense "signed" on this
platform!)
&lt;/p&gt;

&lt;p&gt;
So, we get no guarantees that we can always use those upper 16 bits,
but this is the case presently&lt;sup&gt;&lt;a id="fnr.8" class="footref" href="#fn.8" role="doc-backlink"&gt;8&lt;/a&gt;&lt;/sup&gt;: they must always be all
zeros, or all ones.  And, at least on Linux, the kernel uses the
"negative" addresses (whose 16-bit prefix is all ones), so our
userspace pointers are always zero there.  What a tempting place to
stuff some extra data!
&lt;/p&gt;
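
&lt;p&gt;
As a rough sketch of what "canonical" means here, assuming exactly 48
implemented address bits (which, as noted, is not guaranteed): an
address is canonical if sign-extending it from bit 47 gives back the
same value.  The function name is hypothetical, and addresses are
plain &lt;code&gt;u64&lt;/code&gt; values rather than real pointers.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-rust"&gt;/// True if `addr` is in canonical form, ASSUMING 48 implemented
/// address bits: bits 63..48 must all be copies of bit 47.
fn is_canonical_48(addr: u64) -&amp;gt; bool {
    // Shift the top 16 bits out, then shift back arithmetically so
    // that bit 47 is replicated into bits 63..48.
    let sign_extended = (((addr &amp;lt;&amp;lt; 16) as i64) &amp;gt;&amp;gt; 16) as u64;
    sign_extended == addr
}
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Under that assumption, &lt;code&gt;0x0000_7fff_ffff_ffff&lt;/code&gt; (the top of
the "positive" half) and &lt;code&gt;0xffff_8000_0000_0000&lt;/code&gt; (the bottom
of the "negative" half) are canonical, while
&lt;code&gt;0x0000_8000_0000_0000&lt;/code&gt; is not.
&lt;/p&gt;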
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org1bc4fd1" class="outline-2"&gt;
&lt;h2 id="org1bc4fd1"&gt;&lt;span class="section-number-2"&gt;5.&lt;/span&gt; fixie tries&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-5"&gt;
&lt;p&gt;
Then, earlier this year, I was laying the groundwork for potentially
developing a service in Rust at a company where previously such
services had been written in C.  I wanted to write some code that
would prove to me that it wouldn't be too painful to drop to the level
of pointers, bits, and syscalls that we needed.  I started with the
&lt;a href="https://github.com/tokenrove/magic-ringbuffer-rs"&gt;magic ringbuffer&lt;/a&gt;, and then a popcount-based &lt;a href="https://github.com/tokenrove/tiny-compact-map"&gt;tiny compact map&lt;/a&gt;.&lt;sup&gt;&lt;a id="fnr.9" class="footref" href="#fn.9" role="doc-backlink"&gt;9&lt;/a&gt;&lt;/sup&gt;
&lt;/p&gt;

&lt;p&gt;
Finally, I needed a structure where I could space-efficiently map
fixed-length integers to values.  What if we gave up critbit's nice
property of compressing paths where the bits are the same, and
reconstructed the key by traversing the trie?  Could we stuff that
bitmap into the unused bits, and end up with a single word per branch?
&lt;/p&gt;


&lt;div id="orgf2ca6b9" class="figure"&gt;
&lt;p&gt;&lt;img src="http://www.cipht.net/images/2017-10-29-fixie.png" alt="2017-10-29-fixie.png" /&gt;
&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;
That's basically fixie tries: we use the lowest bit on a pointer to
determine if it's a branch or a leaf (since pointers must be
word-aligned, we actually have a few free bits at the bottom).  If it's a
branch, we cut off its sign extension and put a bitmap there; the
pointer that results from masking the branch with
&lt;code&gt;0x0000_ffff_ffff_fffe&lt;/code&gt; points to an array containing the children
indicated by the bitmap.
&lt;/p&gt;
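
&lt;p&gt;
Here is a minimal sketch of that branch encoding; it is not the code
from the crate.  It assumes that a set low bit means "branch" (the
opposite convention would work just as well), that branching happens a
nibble at a time so the bitmap is 16 bits wide, and it treats the
address of the children array as a plain &lt;code&gt;u64&lt;/code&gt; to keep the
example free of unsafe code.  All of the names are hypothetical.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-rust"&gt;const TAG_BRANCH: u64 = 1; // ASSUMPTION: low bit set means "branch"
const PTR_MASK: u64 = 0x0000_ffff_ffff_fffe; // the mask given in the text

/// Pack a 16-bit child bitmap and the address of the children array
/// into a single word.
fn pack_branch(children_addr: u64, bitmap: u16) -&amp;gt; u64 {
    // The address must fit in 48 bits and leave the tag bit free.
    assert_eq!(children_addr &amp;amp; !PTR_MASK, 0);
    ((bitmap as u64) &amp;lt;&amp;lt; 48) | children_addr | TAG_BRANCH
}

fn is_branch(word: u64) -&amp;gt; bool {
    word &amp;amp; TAG_BRANCH != 0
}

fn branch_bitmap(word: u64) -&amp;gt; u16 {
    (word &amp;gt;&amp;gt; 48) as u16
}

/// Address of the children array: drop the bitmap and the tag bit.
fn branch_children(word: u64) -&amp;gt; u64 {
    word &amp;amp; PTR_MASK
}

/// Slot of the child for `nibble` in the children array, if present.
fn child_index(word: u64, nibble: u8) -&amp;gt; Option&amp;lt;usize&amp;gt; {
    assert!(nibble &amp;lt; 16);
    let bit = 1u16 &amp;lt;&amp;lt; nibble;
    let bitmap = branch_bitmap(word);
    if bitmap &amp;amp; bit == 0 {
        return None;
    }
    Some((bitmap &amp;amp; (bit - 1)).count_ones() as usize)
}
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
A real implementation would turn the result of
&lt;code&gt;branch_children&lt;/code&gt; back into a pointer before dereferencing it,
which is where the x86-64-specific assumption about the upper bits
matters.
&lt;/p&gt;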

&lt;p&gt;
If a leaf is at the deepest level in the trie, it just points directly
to a value, since we can reconstruct the key as we walk to it.  If the
leaf is somewhere shallower, we store a tuple of the full key and the
value; there are probably improvements to be made there, but the
hassles of aligning a partially-nibbled key have kept me away from
them.
&lt;/p&gt;
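
&lt;p&gt;
Reconstructing the key on the way down can be sketched as follows,
assuming 32-bit keys consumed a nibble at a time, most significant
nibble first.  That ordering and both function names are assumptions
for the sake of the example, not necessarily what the crate does.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-rust"&gt;/// The nibble of `key` examined at `depth` (0 is the root),
/// ASSUMING most-significant-nibble-first traversal of a u32 key.
fn nibble_at(key: u32, depth: u32) -&amp;gt; u8 {
    assert!(depth &amp;lt; 8);
    ((key &amp;gt;&amp;gt; (28 - 4 * depth)) &amp;amp; 0xf) as u8
}

/// Rebuild a key from the nibbles taken on the walk from the root
/// to a leaf at the deepest level (eight nibbles for a u32).
fn rebuild_key(path: &amp;amp;[u8]) -&amp;gt; u32 {
    assert!(path.len() == 8);
    path.iter().fold(0u32, |key, &amp;amp;nibble| (key &amp;lt;&amp;lt; 4) | u32::from(nibble))
}
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Walking &lt;code&gt;0xdead_beef&lt;/code&gt; this way visits the nibbles
&lt;code&gt;d, e, a, d, b, e, e, f&lt;/code&gt;, and folding them back together
yields the original key, which is why a leaf at the deepest level
doesn't need to store it.
&lt;/p&gt;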

&lt;p&gt;
To be clear, though qp-tries were what sparked the flame, these are
more like Bagwell's array mapped tries.  They're
x86-64-specific&lt;sup&gt;&lt;a id="fnr.10" class="footref" href="#fn.10" role="doc-backlink"&gt;10&lt;/a&gt;&lt;/sup&gt;, in a way that you probably don't want
to depend on, and they're not magically better than everything else,
but they have a few nice properties.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org48c71b5" class="outline-2"&gt;
&lt;h2 id="org48c71b5"&gt;&lt;span class="section-number-2"&gt;6.&lt;/span&gt; Benchmarking and performance&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-6"&gt;
&lt;p&gt;
Sadly I mostly had a puny AMD E-350 to benchmark this on, but
thankfully a friend ran it on a machine identifying itself as packing
16 × &lt;code&gt;Intel(R) Core(TM) i7-7820X CPU @ 3.60GHz&lt;/code&gt;.
&lt;/p&gt;

&lt;p&gt;
Random insertions of 32-bit keys as a set: (it would be nice to
add Roaring or similar to this benchmark)
&lt;/p&gt;

&lt;pre class="example" id="orgc734e47"&gt;
test random_u32_insertions_on_hash_set          ... bench:         134 ns/iter (+/- 139)
test random_u32_insertions_on_fixie_trie_as_set ... bench:         293 ns/iter (+/- 26)
test random_u32_insertions_on_btree_set         ... bench:         343 ns/iter (+/- 44)
&lt;/pre&gt;

&lt;p&gt;
Random insertions of 64-bit keys and 32-bit values:
&lt;/p&gt;

&lt;pre class="example" id="org3f6c556"&gt;
test random_u64_insertions_on_hash_map          ... bench:         154 ns/iter (+/- 158)
test random_u64_insertions_on_fixie_trie_as_map ... bench:         266 ns/iter (+/- 32)
test random_u64_insertions_on_btree_map         ... bench:         435 ns/iter (+/- 44)
&lt;/pre&gt;

&lt;p&gt;
One thing that's part of djb's argument for critbit trees that remains
true for fixie tries is relatively uniform performance.  We can see
that hash inserts are fast, but have a lot of variance.
&lt;/p&gt;

&lt;p&gt;
Random queries on 32-bit keys, acting as a set:
&lt;/p&gt;

&lt;pre class="example" id="orgf8eb2e4"&gt;
test u32_random_queries_on_btree_set               ... bench:           5 ns/iter (+/- 0)
test u32_random_queries_on_fixie_trie_set          ... bench:          22 ns/iter (+/- 0)
test u32_random_queries_on_hash_set                ... bench:          29 ns/iter (+/- 0)
test u32_random_queries_on_a_qp_trie_set           ... bench:          59 ns/iter (+/- 0)
&lt;/pre&gt;

&lt;p&gt;
I was surprised by how good &lt;code&gt;BTreeSet&lt;/code&gt; is here; some of my early
benchmarking (at the beginning of the year) indicated otherwise, but
it (or my methodology) must have improved.
&lt;/p&gt;

&lt;p&gt;
Random queries on 64-bit keys for 32-bit values:
&lt;/p&gt;

&lt;pre class="example" id="org9a636fe"&gt;
test u64_random_queries_on_fixie_trie              ... bench:          23 ns/iter (+/- 0)
test u64_random_queries_on_hash_map                ... bench:          31 ns/iter (+/- 0)
test u64_random_queries_on_btree_map               ... bench:          93 ns/iter (+/- 1)
&lt;/pre&gt;

&lt;p&gt;
&lt;code&gt;BTreeMap&lt;/code&gt; starts to fall behind as the key size grows here.
&lt;/p&gt;

&lt;p&gt;
Memory usage (maximum RSS of process) after random insert of &lt;b&gt;n&lt;/b&gt;
64-bit keys associated with 32-bit values:
&lt;/p&gt;

&lt;table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides"&gt;


&lt;colgroup&gt;
&lt;col  class="org-left" /&gt;

&lt;col  class="org-left" /&gt;

&lt;col  class="org-left" /&gt;

&lt;col  class="org-left" /&gt;

&lt;col  class="org-left" /&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class="org-left"&gt;&lt;b&gt;n&lt;/b&gt;&lt;/td&gt;
&lt;td class="org-left"&gt;10000&lt;/td&gt;
&lt;td class="org-left"&gt;100000&lt;/td&gt;
&lt;td class="org-left"&gt;1000000&lt;/td&gt;
&lt;td class="org-left"&gt;10000000&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;&lt;code&gt;btree_map&lt;/code&gt;&lt;/td&gt;
&lt;td class="org-left"&gt;2640 kB&lt;/td&gt;
&lt;td class="org-left"&gt;4776 kB&lt;/td&gt;
&lt;td class="org-left"&gt;25900 kB&lt;/td&gt;
&lt;td class="org-left"&gt;237000 kB&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;&lt;code&gt;fixie_trie&lt;/code&gt;&lt;/td&gt;
&lt;td class="org-left"&gt;2852 kB&lt;/td&gt;
&lt;td class="org-left"&gt;5508 kB&lt;/td&gt;
&lt;td class="org-left"&gt;31848 kB&lt;/td&gt;
&lt;td class="org-left"&gt;286212 kB&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;&lt;code&gt;hash_map&lt;/code&gt;&lt;/td&gt;
&lt;td class="org-left"&gt;3272 kB&lt;/td&gt;
&lt;td class="org-left"&gt;8596 kB&lt;/td&gt;
&lt;td class="org-left"&gt;76208 kB&lt;/td&gt;
&lt;td class="org-left"&gt;592096 kB&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;&lt;code&gt;qp_trie&lt;/code&gt;&lt;/td&gt;
&lt;td class="org-left"&gt;3180 kB&lt;/td&gt;
&lt;td class="org-left"&gt;10944 kB&lt;/td&gt;
&lt;td class="org-left"&gt;93796 kB&lt;/td&gt;
&lt;td class="org-left"&gt;809116 kB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;
As we can see, &lt;code&gt;BTreeMap&lt;/code&gt; is actually quite compact, which I didn't
expect; since we're measuring the whole process's usage, this takes
into account allocator fragmentation and so on.  It's likely that with
a custom allocator, fixie tries would be much more compact.
&lt;/p&gt;

&lt;p&gt;
Overall, despite still having a lot of relatively low-hanging
optimization opportunities, fixie tries do ok here: they use much less
memory than hash maps and qp-tries, and have consistent performance
for inserts and queries, never being the slowest of the alternatives
tested.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-orgff18ba3" class="outline-2"&gt;
&lt;h2 id="orgff18ba3"&gt;&lt;span class="section-number-2"&gt;7.&lt;/span&gt; Tangent: benchmarking caveats&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-7"&gt;
&lt;p&gt;
Aside from the regular disclaimers, here's something specific to how
this benchmark is written.  What's wrong with this approach?
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-rust"&gt;&lt;span style="color: #13665F;"&gt;let&lt;/span&gt; &lt;span style="color: #13665F;"&gt;mut&lt;/span&gt; &lt;span style="color: #845A84;"&gt;rng&lt;/span&gt; = &lt;span style="color: #845A84;"&gt;rand&lt;/span&gt;::thread_rng&lt;span style="color: #4d9391;"&gt;()&lt;/span&gt;;
&lt;span style="color: #13665F;"&gt;let&lt;/span&gt; &lt;span style="color: #13665F;"&gt;mut&lt;/span&gt; &lt;span style="color: #845A84;"&gt;t&lt;/span&gt; = &lt;span style="color: #E36B3F;"&gt;FixieTrie&lt;/span&gt;::new&lt;span style="color: #4d9391;"&gt;()&lt;/span&gt;;
b.iter&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;|| &lt;span style="color: #a9779c;"&gt;{&lt;/span&gt;
    &lt;span style="color: #13665F;"&gt;for&lt;/span&gt; &lt;span style="color: #845A84;"&gt;_&lt;/span&gt; &lt;span style="color: #13665F;"&gt;in&lt;/span&gt; 0..&lt;span style="color: #E36B3F;"&gt;N_INSERTS&lt;/span&gt; &lt;span style="color: #4C7A90;"&gt;{&lt;/span&gt; t.insert&lt;span style="color: #787096;"&gt;(&lt;/span&gt;rng.gen::&lt;span style="color: #4d9391;"&gt;&amp;lt;&lt;/span&gt;&lt;span style="color: #E36B3F;"&gt;u32&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;&amp;gt;()&lt;/span&gt;, &lt;span style="color: #4d9391;"&gt;()&lt;/span&gt;&lt;span style="color: #787096;"&gt;)&lt;/span&gt;; &lt;span style="color: #4C7A90;"&gt;}&lt;/span&gt;
&lt;span style="color: #a9779c;"&gt;}&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
It makes fixie tries look great compared to BTreeSets.  You should
always scrutinize a benchmark, especially when it surprises you, but
most of all when it confirms your own beliefs (or desires).  This is
when it is most tempting to stop looking and publish your results.
(That would make it a rhetorical benchmark.)
&lt;/p&gt;

&lt;p&gt;
We don't recreate the trie every time in this test, so the trie gets
progressively fuller as the iterations continue.  Maybe fixie tries
are good in this situation, but it's not what we meant to test, and
the measurement isn't likely to be very accurate, since the workload
changes with every iteration.
&lt;/p&gt;

&lt;p&gt;
Unfortunately, &lt;code&gt;cargo bench&lt;/code&gt; doesn't give us the ability to do any
kind of setup/teardown between iterations, so we're forced to use
patterns like this:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-rust"&gt;&lt;span style="color: #13665F;"&gt;fn&lt;/span&gt; &lt;span style="color: #4C7A90;"&gt;random_insertions_on_a_set&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;&amp;lt;&lt;/span&gt;&lt;span style="color: #845A84;"&gt;T&lt;/span&gt;: &lt;span style="color: #E36B3F;"&gt;Rand&lt;/span&gt;, &lt;span style="color: #845A84;"&gt;S&lt;/span&gt;: &lt;span style="color: #E36B3F;"&gt;Set&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;&amp;lt;&lt;/span&gt;&lt;span style="color: #E36B3F;"&gt;T&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;&amp;gt;&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;&amp;gt;(&lt;/span&gt;&lt;span style="color: #845A84;"&gt;b&lt;/span&gt;: &lt;span style="color: #383e3f; background-color: #EDEEEB;"&gt;&amp;amp;&lt;/span&gt;&lt;span style="color: #13665F;"&gt;mut&lt;/span&gt; &lt;span style="color: #E36B3F;"&gt;Bencher&lt;/span&gt;, &lt;span style="color: #845A84;"&gt;new&lt;/span&gt;: &lt;span style="color: #13665F;"&gt;fn&lt;/span&gt; &lt;span style="color: #a9779c;"&gt;()&lt;/span&gt; -&amp;gt; &lt;span style="color: #E36B3F;"&gt;S&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;
&lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;
    &lt;span style="color: #13665F;"&gt;let&lt;/span&gt; &lt;span style="color: #13665F;"&gt;mut&lt;/span&gt; &lt;span style="color: #845A84;"&gt;rng&lt;/span&gt; = &lt;span style="color: #845A84;"&gt;rand&lt;/span&gt;::thread_rng&lt;span style="color: #a9779c;"&gt;()&lt;/span&gt;;
    &lt;span style="color: #13665F;"&gt;let&lt;/span&gt; &lt;span style="color: #13665F;"&gt;mut&lt;/span&gt; &lt;span style="color: #845A84;"&gt;s&lt;/span&gt; = new&lt;span style="color: #a9779c;"&gt;()&lt;/span&gt;;
    &lt;span style="color: #13665F;"&gt;for&lt;/span&gt; &lt;span style="color: #845A84;"&gt;_&lt;/span&gt; &lt;span style="color: #13665F;"&gt;in&lt;/span&gt; 0..&lt;span style="color: #E36B3F;"&gt;N_INSERTS&lt;/span&gt; &lt;span style="color: #a9779c;"&gt;{&lt;/span&gt; s.insert&lt;span style="color: #4C7A90;"&gt;(&lt;/span&gt;rng.gen&lt;span style="color: #787096;"&gt;()&lt;/span&gt;&lt;span style="color: #4C7A90;"&gt;)&lt;/span&gt;; &lt;span style="color: #a9779c;"&gt;}&lt;/span&gt;
    b.iter&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;|| &lt;span style="color: #4C7A90;"&gt;{&lt;/span&gt; s.insert&lt;span style="color: #787096;"&gt;(&lt;/span&gt;rng.gen&lt;span style="color: #4d9391;"&gt;()&lt;/span&gt;&lt;span style="color: #787096;"&gt;)&lt;/span&gt;; &lt;span style="color: #4C7A90;"&gt;}&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;;
&lt;span style="color: #4d9391;"&gt;}&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
I can't fault &lt;code&gt;cargo bench&lt;/code&gt; too much here, because it does so much
right, out of the box: it warms up the code under test before trying
to measure, and uses somewhat robust statistical measures to determine
when it's safe to stop (rather than using the mean and standard
deviation as you'll see in a lot of benchmarks).  Also, it takes a lot
of fiddly, system-specific code to be able to measure a single
iteration with accuracy, so it would be difficult for it to provide
this interface.
&lt;/p&gt;

&lt;p&gt;
But since the underlying structure is changing on every iteration, we
may not be measuring what we think we're measuring.  At least filling
it partially before starting the timing puts the
inserts-under-measurement in more consistent territory.
&lt;/p&gt;
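&lt;p&gt;
One way to sidestep the moving target is to time batches by hand,
restarting from the same pre-filled structure for each sample.  What
follows is only a sketch under assumptions: &lt;code&gt;BTreeSet&lt;/code&gt; stands in for
the set under test, and &lt;code&gt;XorShift&lt;/code&gt; and &lt;code&gt;median_insert_time&lt;/code&gt; are
names made up for illustration.  It has none of the warm-up or
stopping logic that &lt;code&gt;cargo bench&lt;/code&gt; provides.
&lt;/p&gt;

```rust
use std::collections::BTreeSet;
use std::time::{Duration, Instant};

/// Tiny xorshift generator so the sketch needs no external crates.
/// (Hypothetical helper; the seed must be non-zero.)
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Time `batch` inserts into a set that always starts at `prefill` elements,
/// repeat `samples` times (at least once), and return the median batch time
/// together with the size of the set after the last batch.
fn median_insert_time(prefill: usize, batch: usize, samples: usize) -> (Duration, usize) {
    let mut rng = XorShift(0x9E37_79B9_7F4A_7C15);
    let mut base = BTreeSet::new();
    while base.len() < prefill {
        base.insert(rng.next_u64());
    }
    let mut times = Vec::with_capacity(samples);
    let mut final_len = base.len();
    for _ in 0..samples {
        // Fresh copy per sample, so every batch sees the same starting size.
        let mut s = base.clone();
        // Generate keys up front so the generator is not part of the timing.
        let keys: Vec<u64> = (0..batch).map(|_| rng.next_u64()).collect();
        let start = Instant::now();
        for k in keys {
            s.insert(k);
        }
        times.push(start.elapsed());
        final_len = s.len();
    }
    times.sort();
    (times[times.len() / 2], final_len)
}

fn main() {
    let (median, len) = median_insert_time(10_000, 100, 11);
    println!("median time for 100 inserts: {:?} (final size {})", median, len);
}
```

&lt;p&gt;
Reporting the median of several batches keeps one slow sample from
dominating, and cloning outside the timed region means the cost of
rebuilding the structure is not counted.
&lt;/p&gt;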
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org0148929" class="outline-2"&gt;
&lt;h2 id="org0148929"&gt;&lt;span class="section-number-2"&gt;8.&lt;/span&gt; Tangent: debugging a jemalloc deadlock&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-8"&gt;
&lt;p&gt;
After updating this crate to use the latest Rust allocator API, some
tests would hang.  Digging in with gdb revealed they were deadlocked,
trying to lock a mutex they already held.  Although I was 99% sure
this was a bug in my code, jemalloc has had its fair share of deadlock
bugs in the past so I held a glimmer of hope I could find something
really interesting.
&lt;/p&gt;

&lt;p&gt;
The tests didn't hang with the system allocator and valgrind saw
nothing wrong.  So I dug into the code and learned a lot about
jemalloc's internals; in particular, I learned that there's a lot of
tricky locking in the thread cache code, which surprised me since the
point of the thread cache is to reduce synchronization overhead, but I
guess I was looking at all the slow paths.  I observed that modifying
&lt;code&gt;lg_tcache_max&lt;/code&gt; (the largest size-class kept in the thread cache)
changed at what point it deadlocked, but not in a predictable way
(tabulating values against progress showed a general trend that the
larger the classes allowed, the more progress it made, but not
consistently).
&lt;/p&gt;

&lt;p&gt;
One of the interesting things was how predictable the failure was.
Thinking I might be doing something to corrupt jemalloc's structures
badly enough to corrupt the lock, I scripted gdb to print the state of
the lock at the points where it was locked and unlocked in the
function where it deadlocks, which revealed that the locks looked fine
in that function. Unfortunately, because everything was optimized out,
it was hard to introspect the structures I thought might have been
getting corrupted; I started building Rust with a debug,
assertion-laden jemalloc, but I knew it would take all night on my
laptop.
&lt;/p&gt;

&lt;p&gt;
Before I called it a night, I scripted gdb to print all the calls to
&lt;code&gt;mallocx&lt;/code&gt;, &lt;code&gt;rallocx&lt;/code&gt;, and &lt;code&gt;sdallocx&lt;/code&gt;, and their return values, and
started putting together a sed script to transform this into a C
program that made the same sequence of allocations and
deallocations.&lt;sup&gt;&lt;a id="fnr.11" class="footref" href="#fn.11" role="doc-backlink"&gt;11&lt;/a&gt;&lt;/sup&gt;
&lt;/p&gt;

&lt;p&gt;
When I woke up the next morning, I realized there was something
suspicious about the deallocations logged. I tested with the system
allocator, but with various allocators &lt;code&gt;LD_PRELOAD&lt;/code&gt;'d, including the
same version of jemalloc that Rust was using; none of these hung. So I
asked myself, what's the difference?
&lt;/p&gt;

&lt;p&gt;
Of course, Rust is using the sized deallocation API (&lt;code&gt;sdallocx&lt;/code&gt;), and
the system allocator will be going through malloc and free and not
passing any sizes.  Looking again at how my library was calling
dealloc, I spotted the bug instantly; I was claiming some things I was
freeing were smaller than they actually were.  Changing this fixed the
bug.  I took a look through the paths that &lt;code&gt;sdallocx&lt;/code&gt; would take if
given smaller sizes, and it looks like if my build of Rust with
jemalloc assertions enabled had finished compiling, it would have
detected my mistake, but otherwise, supplying the wrong size would
cause havoc in the thread cache later.
&lt;/p&gt;
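&lt;p&gt;
The same contract is visible in Rust's stable &lt;code&gt;std::alloc&lt;/code&gt; functions:
&lt;code&gt;dealloc&lt;/code&gt; must be handed the layout the block was allocated with.
Here is a minimal sketch of the correct pairing, assuming the stable
&lt;code&gt;std::alloc&lt;/code&gt; API rather than whatever allocator interface the crate
used at the time; &lt;code&gt;alloc_roundtrip&lt;/code&gt; is a hypothetical name.
&lt;/p&gt;

```rust
use std::alloc::{alloc, dealloc, Layout};

/// Allocate `size` bytes (must be non-zero), write and read back one byte,
/// then free the block with the same layout it was allocated with.
fn alloc_roundtrip(size: usize) -> u8 {
    let layout = Layout::from_size_align(size, 1).expect("valid layout");
    unsafe {
        let p = alloc(layout);
        assert!(!p.is_null(), "allocation failed");
        p.write(42);
        let seen = p.read();
        // Correct: the layout passed here matches the one used for `alloc`.
        // Claiming a smaller size, e.g. a 2-byte layout for this 9-byte
        // block, would be undefined behaviour, which is the class of
        // mistake described above.
        dealloc(p, layout);
        seen
    }
}

fn main() {
    println!("read back {}", alloc_roundtrip(9));
}
```

&lt;p&gt;
Keeping the &lt;code&gt;Layout&lt;/code&gt; value around and reusing it for the free, instead
of recomputing a size at the deallocation site, makes this kind of
mismatch harder to write.
&lt;/p&gt;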

&lt;p&gt;
You might say that I should have known to look at these things first,
but such is the nature of bugs.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-orgcab3e33" class="outline-2"&gt;
&lt;h2 id="orgcab3e33"&gt;&lt;span class="section-number-2"&gt;9.&lt;/span&gt; Tangent: property testing saves the day&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-9"&gt;
&lt;p&gt;
I would have zero confidence in this if it weren't for property
testing.  Although not as full-featured as the libraries for &lt;a href="http://hypothesis.works/"&gt;some&lt;/a&gt;
&lt;a href="http://www.quviq.com/products/erlang-quickcheck/"&gt;other&lt;/a&gt; &lt;a href="https://github.com/manopapad/proper"&gt;languages&lt;/a&gt;, Rust's &lt;a href="https://github.com/BurntSushi/quickcheck"&gt;quickcheck&lt;/a&gt; is extremely easy to integrate into
the development process, and found tons of bugs that my own,
manually-crafted unit tests never would have turned up.
&lt;/p&gt;

&lt;p&gt;
A nice property we have when making a data structure that obeys some
common interface is that we can just test against a known-good
structure that obeys the same interface:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-rust"&gt;&lt;span style="color: #6D46E3;"&gt;#&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;[&lt;/span&gt;&lt;span style="color: #6D46E3;"&gt;derive&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;&lt;span style="color: #6D46E3;"&gt;Copy, Clone, Debug&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;]&lt;/span&gt;
&lt;span style="color: #13665F;"&gt;enum&lt;/span&gt; &lt;span style="color: #E36B3F;"&gt;MapOperation&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;&amp;lt;&lt;/span&gt;&lt;span style="color: #845A84;"&gt;K&lt;/span&gt;: &lt;span style="color: #E36B3F;"&gt;FixedLengthKey&lt;/span&gt;, &lt;span style="color: #E36B3F;"&gt;V&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;
    &lt;span style="color: #E36B3F;"&gt;Insert&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;&lt;span style="color: #E36B3F;"&gt;K&lt;/span&gt;,&lt;span style="color: #E36B3F;"&gt;V&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;, &lt;span style="color: #E36B3F;"&gt;Remove&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;&lt;span style="color: #E36B3F;"&gt;K&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;, &lt;span style="color: #E36B3F;"&gt;Query&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;&lt;span style="color: #E36B3F;"&gt;K&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;,
&lt;span style="color: #4d9391;"&gt;}&lt;/span&gt;

&lt;span style="color: #13665F;"&gt;impl&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;&amp;lt;&lt;/span&gt;&lt;span style="color: #E36B3F;"&gt;K&lt;/span&gt;,&lt;span style="color: #E36B3F;"&gt;V&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color: #E36B3F;"&gt;Arbitrary&lt;/span&gt; &lt;span style="color: #13665F;"&gt;for&lt;/span&gt; &lt;span style="color: #E36B3F;"&gt;MapOperation&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;&amp;lt;&lt;/span&gt;&lt;span style="color: #E36B3F;"&gt;K&lt;/span&gt;,&lt;span style="color: #E36B3F;"&gt;V&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;&amp;gt;&lt;/span&gt;
    &lt;span style="color: #13665F;"&gt;where&lt;/span&gt; &lt;span style="color: #845A84;"&gt;K&lt;/span&gt;: &lt;span style="color: #E36B3F;"&gt;Arbitrary&lt;/span&gt; + &lt;span style="color: #E36B3F;"&gt;FixedLengthKey&lt;/span&gt; + &lt;span style="color: #E36B3F;"&gt;Rand&lt;/span&gt;,
          &lt;span style="color: #845A84;"&gt;V&lt;/span&gt;: &lt;span style="color: #E36B3F;"&gt;Arbitrary&lt;/span&gt; + &lt;span style="color: #E36B3F;"&gt;Rand&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;
    &lt;span style="color: #13665F;"&gt;fn&lt;/span&gt; &lt;span style="color: #4C7A90;"&gt;arbitrary&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;&amp;lt;&lt;/span&gt;&lt;span style="color: #845A84;"&gt;G&lt;/span&gt;: &lt;span style="color: #E36B3F;"&gt;Gen&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;&amp;gt;(&lt;/span&gt;&lt;span style="color: #845A84;"&gt;g&lt;/span&gt;: &lt;span style="color: #383e3f; background-color: #EDEEEB;"&gt;&amp;amp;&lt;/span&gt;&lt;span style="color: #13665F;"&gt;mut&lt;/span&gt; &lt;span style="color: #E36B3F;"&gt;G&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt; -&amp;gt; &lt;span style="color: #E36B3F;"&gt;MapOperation&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;&amp;lt;&lt;/span&gt;&lt;span style="color: #E36B3F;"&gt;K&lt;/span&gt;,&lt;span style="color: #E36B3F;"&gt;V&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color: #a9779c;"&gt;{&lt;/span&gt;
        &lt;span style="color: #13665F;"&gt;use&lt;/span&gt; &lt;span style="color: #13665F;"&gt;self&lt;/span&gt;::&lt;span style="color: #E36B3F;"&gt;MapOperation&lt;/span&gt;::*;
        &lt;span style="color: #13665F;"&gt;match&lt;/span&gt; g.gen_range&lt;span style="color: #4C7A90;"&gt;(&lt;/span&gt;0,3&lt;span style="color: #4C7A90;"&gt;)&lt;/span&gt; &lt;span style="color: #4C7A90;"&gt;{&lt;/span&gt;
            0 =&amp;gt; &lt;span style="color: #E36B3F;"&gt;Insert&lt;/span&gt;&lt;span style="color: #787096;"&gt;(&lt;/span&gt;g.gen&lt;span style="color: #4d9391;"&gt;()&lt;/span&gt;, g.gen&lt;span style="color: #4d9391;"&gt;()&lt;/span&gt;&lt;span style="color: #787096;"&gt;)&lt;/span&gt;,
            1 =&amp;gt; &lt;span style="color: #E36B3F;"&gt;Remove&lt;/span&gt;&lt;span style="color: #787096;"&gt;(&lt;/span&gt;g.gen&lt;span style="color: #4d9391;"&gt;()&lt;/span&gt;&lt;span style="color: #787096;"&gt;)&lt;/span&gt;,
            2 =&amp;gt; &lt;span style="color: #E36B3F;"&gt;Query&lt;/span&gt;&lt;span style="color: #787096;"&gt;(&lt;/span&gt;g.gen&lt;span style="color: #4d9391;"&gt;()&lt;/span&gt;&lt;span style="color: #787096;"&gt;)&lt;/span&gt;,
            _ =&amp;gt; &lt;span style="color: #6D46E3;"&gt;panic!&lt;/span&gt;&lt;span style="color: #787096;"&gt;()&lt;/span&gt;
        &lt;span style="color: #4C7A90;"&gt;}&lt;/span&gt;
    &lt;span style="color: #a9779c;"&gt;}&lt;/span&gt;
&lt;span style="color: #4d9391;"&gt;}&lt;/span&gt;

&lt;span style="color: #6D46E3;"&gt;quickcheck!&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;
    &lt;span style="color: #7c878a;"&gt;// &lt;/span&gt;&lt;span style="color: #7c878a;"&gt;A small keyspace makes it more likely that we'll randomly get&lt;/span&gt;
    &lt;span style="color: #7c878a;"&gt;// &lt;/span&gt;&lt;span style="color: #7c878a;"&gt;keys we've already used; it's easy to never test the insert&lt;/span&gt;
    &lt;span style="color: #7c878a;"&gt;// &lt;/span&gt;&lt;span style="color: #7c878a;"&gt;x/remove x path otherwise.&lt;/span&gt;
    &lt;span style="color: #13665F;"&gt;fn&lt;/span&gt; &lt;span style="color: #4C7A90;"&gt;equivalence_with_map&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;&lt;span style="color: #845A84;"&gt;ops&lt;/span&gt;: &lt;span style="color: #E36B3F;"&gt;Vec&lt;/span&gt;&amp;lt;&lt;span style="color: #E36B3F;"&gt;MapOperation&lt;/span&gt;&amp;lt;&lt;span style="color: #E36B3F;"&gt;u8&lt;/span&gt;,&lt;span style="color: #E36B3F;"&gt;u64&lt;/span&gt;&amp;gt;&amp;gt;&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt; -&amp;gt; &lt;span style="color: #E36B3F;"&gt;bool&lt;/span&gt; &lt;span style="color: #a9779c;"&gt;{&lt;/span&gt;
        &lt;span style="color: #13665F;"&gt;use&lt;/span&gt; &lt;span style="color: #13665F;"&gt;self&lt;/span&gt;::&lt;span style="color: #E36B3F;"&gt;MapOperation&lt;/span&gt;::*;
        &lt;span style="color: #13665F;"&gt;let&lt;/span&gt; &lt;span style="color: #13665F;"&gt;mut&lt;/span&gt; &lt;span style="color: #845A84;"&gt;us&lt;/span&gt; = &lt;span style="color: #E36B3F;"&gt;FixieTrie&lt;/span&gt;::new&lt;span style="color: #4C7A90;"&gt;()&lt;/span&gt;;
        &lt;span style="color: #13665F;"&gt;let&lt;/span&gt; &lt;span style="color: #13665F;"&gt;mut&lt;/span&gt; &lt;span style="color: #845A84;"&gt;them&lt;/span&gt; = ::&lt;span style="color: #845A84;"&gt;std&lt;/span&gt;::&lt;span style="color: #845A84;"&gt;collections&lt;/span&gt;::&lt;span style="color: #845A84;"&gt;btree_map&lt;/span&gt;::&lt;span style="color: #E36B3F;"&gt;BTreeMap&lt;/span&gt;::new&lt;span style="color: #4C7A90;"&gt;()&lt;/span&gt;;
        &lt;span style="color: #13665F;"&gt;for&lt;/span&gt; &lt;span style="color: #845A84;"&gt;op&lt;/span&gt; &lt;span style="color: #13665F;"&gt;in&lt;/span&gt; ops &lt;span style="color: #4C7A90;"&gt;{&lt;/span&gt;
            &lt;span style="color: #13665F;"&gt;match&lt;/span&gt; op &lt;span style="color: #787096;"&gt;{&lt;/span&gt;
                &lt;span style="color: #E36B3F;"&gt;Insert&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;k, v&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; =&amp;gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt; &lt;span style="color: #6D46E3;"&gt;assert_eq!&lt;/span&gt;&lt;span style="color: #a0586c;"&gt;(&lt;/span&gt;us.insert&lt;span style="color: #cd9575;"&gt;(&lt;/span&gt;k, v&lt;span style="color: #cd9575;"&gt;)&lt;/span&gt;, them.insert&lt;span style="color: #cd9575;"&gt;(&lt;/span&gt;k, v&lt;span style="color: #cd9575;"&gt;)&lt;/span&gt;&lt;span style="color: #a0586c;"&gt;)&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;}&lt;/span&gt;,
                &lt;span style="color: #E36B3F;"&gt;Remove&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;k&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; =&amp;gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt; &lt;span style="color: #6D46E3;"&gt;assert_eq!&lt;/span&gt;&lt;span style="color: #a0586c;"&gt;(&lt;/span&gt;us.remove&lt;span style="color: #cd9575;"&gt;(&lt;/span&gt;&lt;span style="color: #383e3f; background-color: #EDEEEB;"&gt;&amp;amp;&lt;/span&gt;k&lt;span style="color: #cd9575;"&gt;)&lt;/span&gt;, them.remove&lt;span style="color: #cd9575;"&gt;(&lt;/span&gt;&lt;span style="color: #383e3f; background-color: #EDEEEB;"&gt;&amp;amp;&lt;/span&gt;k&lt;span style="color: #cd9575;"&gt;)&lt;/span&gt;&lt;span style="color: #a0586c;"&gt;)&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;}&lt;/span&gt;,
                &lt;span style="color: #E36B3F;"&gt;Query&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;k&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; =&amp;gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt; &lt;span style="color: #6D46E3;"&gt;assert_eq!&lt;/span&gt;&lt;span style="color: #a0586c;"&gt;(&lt;/span&gt;us.get&lt;span style="color: #cd9575;"&gt;(&lt;/span&gt;&lt;span style="color: #383e3f; background-color: #EDEEEB;"&gt;&amp;amp;&lt;/span&gt;k&lt;span style="color: #cd9575;"&gt;)&lt;/span&gt;, them.get&lt;span style="color: #cd9575;"&gt;(&lt;/span&gt;&lt;span style="color: #383e3f; background-color: #EDEEEB;"&gt;&amp;amp;&lt;/span&gt;k&lt;span style="color: #cd9575;"&gt;)&lt;/span&gt;&lt;span style="color: #a0586c;"&gt;)&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;}&lt;/span&gt;,
            &lt;span style="color: #787096;"&gt;}&lt;/span&gt;
        &lt;span style="color: #4C7A90;"&gt;}&lt;/span&gt;
        us.keys&lt;span style="color: #4C7A90;"&gt;()&lt;/span&gt;.zip&lt;span style="color: #4C7A90;"&gt;(&lt;/span&gt;them.keys&lt;span style="color: #787096;"&gt;()&lt;/span&gt;&lt;span style="color: #4C7A90;"&gt;)&lt;/span&gt;.all&lt;span style="color: #4C7A90;"&gt;(&lt;/span&gt;|&lt;span style="color: #787096;"&gt;(&lt;/span&gt;a,&lt;span style="color: #383e3f; background-color: #EDEEEB;"&gt;&amp;amp;&lt;/span&gt;b&lt;span style="color: #787096;"&gt;)&lt;/span&gt;| us.get&lt;span style="color: #787096;"&gt;(&lt;/span&gt;&lt;span style="color: #383e3f; background-color: #EDEEEB;"&gt;&amp;amp;&lt;/span&gt;a&lt;span style="color: #787096;"&gt;)&lt;/span&gt; == them.get&lt;span style="color: #787096;"&gt;(&lt;/span&gt;&lt;span style="color: #383e3f; background-color: #EDEEEB;"&gt;&amp;amp;&lt;/span&gt;b&lt;span style="color: #787096;"&gt;)&lt;/span&gt;&lt;span style="color: #4C7A90;"&gt;)&lt;/span&gt;
    &lt;span style="color: #a9779c;"&gt;}&lt;/span&gt;
&lt;span style="color: #4d9391;"&gt;}&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org2f3b88e" class="outline-2"&gt;
&lt;h2 id="org2f3b88e"&gt;&lt;span class="section-number-2"&gt;10.&lt;/span&gt; Further directions&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-10"&gt;
&lt;p&gt;
This allocates a lot, and &lt;a href="https://dotat.at/prog/qp/notes-write-buffer.html"&gt;buffering writes&lt;/a&gt; could help, as well as a
custom allocator.  Bagwell's papers talk about the importance of
knowing the common allocation patterns and tailoring the allocator to
them.
&lt;/p&gt;

&lt;p&gt;
Dropping the trie is expensive right now.  This is something that
would be a lot faster with a hierarchical or region-based allocator,
as long as the keys and values don't implement &lt;code&gt;Drop&lt;/code&gt;; you would much
rather throw the whole area away than have to walk it just to
dispose of it, which is what I do in the current implementation.
&lt;/p&gt;
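&lt;p&gt;
To make the region idea concrete, here is a rough sketch, not the
fixie trie's actual layout: nodes live in one &lt;code&gt;Vec&lt;/code&gt; and refer to each
other by index, so disposing of the whole structure is a single buffer
free.  &lt;code&gt;Arena&lt;/code&gt;, &lt;code&gt;Node&lt;/code&gt;, &lt;code&gt;NodeId&lt;/code&gt;, and the two-way branching are all
assumptions made for illustration.
&lt;/p&gt;

```rust
/// Index into the arena; `NONE` marks an absent child.
type NodeId = u32;
const NONE: NodeId = u32::MAX;

/// A plain-old-data node: no destructor, children referred to by index.
#[derive(Clone, Copy)]
struct Node {
    key: u64,
    children: [NodeId; 2],
}

/// All nodes live in one contiguous buffer.
struct Arena {
    nodes: Vec<Node>,
}

impl Arena {
    fn new() -> Self {
        Arena { nodes: Vec::new() }
    }

    /// Append a childless node and return its index.
    fn alloc(&mut self, key: u64) -> NodeId {
        self.nodes.push(Node { key, children: [NONE; 2] });
        (self.nodes.len() - 1) as NodeId
    }

    fn link(&mut self, parent: NodeId, side: usize, child: NodeId) {
        self.nodes[parent as usize].children[side] = child;
    }

    /// Sum the keys reachable from `root` by following child indices.
    fn key_sum(&self, root: NodeId) -> u64 {
        let mut stack = vec![root];
        let mut sum = 0;
        while let Some(id) = stack.pop() {
            if id == NONE {
                continue;
            }
            let node = self.nodes[id as usize];
            sum += node.key;
            stack.extend(node.children.iter().copied());
        }
        sum
    }
}

fn main() {
    let mut arena = Arena::new();
    let root = arena.alloc(1);
    let left = arena.alloc(2);
    let right = arena.alloc(3);
    arena.link(root, 0, left);
    arena.link(root, 1, right);
    assert_eq!(arena.key_sum(root), 6);
    // Dropping the arena frees the one backing buffer; since `Node` has no
    // destructor, nothing needs to walk the structure first.
    drop(arena);
}
```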

&lt;p&gt;
We still have some unused bits in our branch structure.  If
prefix-compression ended up being useful, it could be stuffed into one
of those bits.  As it is, though, I like that the pointer structure is
pretty simple.
&lt;/p&gt;

&lt;p&gt;
Tony Finch has explored lots of qp-trie variants; some of the same
ideas are also applicable to fixie tries.  &lt;a href="https://dotat.at/prog/qp/blog-2015-10-11.html"&gt;Prefetching&lt;/a&gt; made a
noticeable difference for qp-tries.
&lt;/p&gt;

&lt;p&gt;
There are some interesting concurrent tries in the same vein,
including &lt;a href="http://infoscience.epfl.ch/record/166908/files/ctries-techreport.pdf"&gt;CTries&lt;/a&gt;, and to a lesser extent &lt;a href="http://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p57.pdf"&gt;poptries&lt;/a&gt;.  I thought about a
reasonable scheme for lockless insertions in fixie tries, but haven't
had time to implement it.
&lt;/p&gt;

&lt;p&gt;
It might be nice to do something like direct pointing, also from
poptries, where the first &lt;span class="underline"&gt;n&lt;/span&gt; bits are covered by a 2ⁿ array of tries.
This is a little reminiscent of the two-level approach of &lt;a href="http://roaringbitmap.org/"&gt;Roaring
bitmaps&lt;/a&gt;.
&lt;/p&gt;
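&lt;p&gt;
A sketch of what direct pointing might look like, under loud
assumptions: keys are &lt;code&gt;u32&lt;/code&gt;, the first 4 bits pick a slot, and
&lt;code&gt;BTreeMap&lt;/code&gt; stands in for the per-slot trie.  &lt;code&gt;DirectPointed&lt;/code&gt; and
&lt;code&gt;DIRECT_BITS&lt;/code&gt; are hypothetical names, not part of the crate.
&lt;/p&gt;

```rust
use std::collections::BTreeMap;

/// Number of leading key bits resolved by the top-level array.
const DIRECT_BITS: u32 = 4;

/// A 2^DIRECT_BITS array of sub-maps, each holding the keys that share the
/// same leading bits. `BTreeMap` is a stand-in for a real trie here.
struct DirectPointed {
    slots: Vec<BTreeMap<u32, u64>>,
}

impl DirectPointed {
    fn new() -> Self {
        let slots = (0..(1usize << DIRECT_BITS)).map(|_| BTreeMap::new()).collect();
        DirectPointed { slots }
    }

    /// Split a key into (slot index, remaining low bits).
    fn split(key: u32) -> (usize, u32) {
        let shift = 32 - DIRECT_BITS;
        ((key >> shift) as usize, key & ((1u32 << shift) - 1))
    }

    fn insert(&mut self, key: u32, value: u64) -> Option<u64> {
        let (slot, rest) = Self::split(key);
        self.slots[slot].insert(rest, value)
    }

    fn get(&self, key: u32) -> Option<&u64> {
        let (slot, rest) = Self::split(key);
        self.slots[slot].get(&rest)
    }
}

fn main() {
    let mut m = DirectPointed::new();
    m.insert(0xF000_0001, 7);
    m.insert(0x0000_0001, 9);
    assert_eq!(m.get(0xF000_0001), Some(&7));
    assert_eq!(m.get(0x0000_0001), Some(&9));
    assert_eq!(m.get(0x1000_0001), None);
}
```

&lt;p&gt;
The first lookup step becomes a plain array index, at the cost of
allocating all 2ⁿ slots up front even when most of them stay empty.
&lt;/p&gt;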

&lt;p&gt;
Finally, there are still a lot of interface niceties missing from
this.  For example, it really should implement &lt;code&gt;Entry&lt;/code&gt; but I haven't
gotten around to it yet.
&lt;/p&gt;

&lt;p&gt;
&lt;a href="mailto:julian@cipht.net"&gt;Let me know&lt;/a&gt; if you end up using fixie tries or just end up taking
inspiration from this world of trie techniques to build your own little
trie variant.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id="footnotes"&gt;
&lt;h2 class="footnotes"&gt;Footnotes: &lt;/h2&gt;
&lt;div id="text-footnotes"&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.1" class="footnum" href="#fnr.1" role="doc-backlink"&gt;1&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
First digression, pronunciation: ever since I found
out that &lt;b&gt;trie&lt;/b&gt; comes from the middle of the word re-&lt;b&gt;trie&lt;/b&gt;-val, I
have pronounced this word as "tree", confusing everyone.  Paul
E. Black &lt;a href="https://xlinux.nist.gov/dads/HTML/trie.html"&gt;agrees&lt;/a&gt;, but Knuth says "[a] trie — pronounced 'try'".
&lt;/p&gt;

&lt;p class="footpara"&gt;
Usually all these naming prefixes (suffix trie, PATRICIA trie,
qp-trie, fixie trie) are enough to distinguish them.
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.2" class="footnum" href="#fnr.2" role="doc-backlink"&gt;2&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
&lt;a href="https://systemswe.love/archive/minneapolis-2017/julian-squires"&gt;I gave a brief talk on timing wheels&lt;/a&gt; (and other ways to
implement timers) at Systems We Love in Minneapolis this year.
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.3" class="footnum" href="#fnr.3" role="doc-backlink"&gt;3&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
I think Adam Langley's &lt;a href="https://www.imperialviolet.org/binary/critbit.pdf"&gt;literate programming treatment of
critbit trees&lt;/a&gt; is a nice way to explore them.
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.4" class="footnum" href="#fnr.4" role="doc-backlink"&gt;4&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
See also &lt;a href="http://www.felixcloutier.com/x86/POPCNT.html"&gt;&lt;code&gt;popcnt&lt;/code&gt; in SSE4&lt;/a&gt;, and &lt;a href="http://wm.ite.pl/articles/sse-popcount.html"&gt;Wojciech Muła's work&lt;/a&gt;
doing popcount on larger bitmaps more efficiently.
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.5" class="footnum" href="#fnr.5" role="doc-backlink"&gt;5&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
And, it ends with a paragraph that has more significance to me since &lt;a href="http://cipht.net/2017/10/02/are-jump-tables-always-fastest.html"&gt;Are
jump tables always fastest?&lt;/a&gt;:
&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;
Finally, its worth noting that &lt;code&gt;case&lt;/code&gt; statements could be implemented
using an adaptation of the AMT to give space efficient, optimized
machine code for fast performance in sparse multi-way program
switches.
&lt;/p&gt;
&lt;/blockquote&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.6" class="footnum" href="#fnr.6" role="doc-backlink"&gt;6&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
For example, Erlang's not-quite-new-anymore map type is
implemented as a HAMT, &lt;a href="https://github.com/erlang/otp/blob/40c19a6674b9034a35f1d0e5540fa755cfd54b7c/erts/emulator/beam/erl_map.h#L76"&gt;at least for more than 32 elements&lt;/a&gt;.  This is
why it's important to build your Erlang VM with &lt;code&gt;-march=native&lt;/code&gt;.
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.7" class="footnum" href="#fnr.7" role="doc-backlink"&gt;7&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
I'd be remiss not to mention recent work
&lt;a href="https://michael.steindorfer.name/publications/oopsla15.pdf"&gt;(Optimizing Hash-Array Mapped Tries for Fast and Lean Immutable JVM
Collections)&lt;/a&gt; that's been done on improving HAMT performance, as
covered in &lt;a href="https://blog.acolyer.org/2015/11/27/hamt/"&gt;The Morning Paper&lt;/a&gt;.
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.8" class="footnum" href="#fnr.8" role="doc-backlink"&gt;8&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
But as /u/1amzave on lobste.rs points out, &lt;a href="https://software.intel.com/sites/default/files/managed/2b/80/5-level_paging_white_paper.pdf"&gt;57-bit
addressing is coming&lt;/a&gt;.
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.9" class="footnum" href="#fnr.9" role="doc-backlink"&gt;9&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
I coincidentally started reading &lt;a href="https://www.goodreads.com/book/show/667203.Purity_and_Danger"&gt;Purity and Danger&lt;/a&gt; while
writing all this unsafe code and thinking about the cultures of C and
Rust.
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.10" class="footnum" href="#fnr.10" role="doc-backlink"&gt;10&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
Could we do this on another platform?  If you use
indexes into an array you control, or if you control the memory
allocator and use a &lt;a href="http://www.memorymanagement.org/glossary/b.html#term-bibop"&gt;BIBOP&lt;/a&gt;-style approach where you know what all trie
pointers will look like, you could probably do the same thing.
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.11" class="footnum" href="#fnr.11" role="doc-backlink"&gt;11&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
Minimal repro, in case you too would like to
explore the guts of jemalloc's locking code:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;&lt;span style="color: #6D46E3;"&gt;#include&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;&amp;lt;&lt;/span&gt;&lt;span style="color: #39854C;"&gt;jemalloc/jemalloc.h&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;&amp;gt;&lt;/span&gt;

&lt;span style="color: #E36B3F;"&gt;int&lt;/span&gt; &lt;span style="color: #4C7A90;"&gt;main&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;&lt;span style="color: #E36B3F;"&gt;void&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;
&lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;
    &lt;span style="color: #E36B3F;"&gt;void&lt;/span&gt; *&lt;span style="color: #845A84;"&gt;p&lt;/span&gt; = mallocx&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;9, 0&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;;
    sdallocx&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;p, 2, 0&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;;
    &lt;span style="color: #13665F;"&gt;enum&lt;/span&gt; &lt;span style="color: #a9779c;"&gt;{&lt;/span&gt; &lt;span style="color: #845A84;"&gt;N&lt;/span&gt; = 38000 &lt;span style="color: #a9779c;"&gt;}&lt;/span&gt;;
    &lt;span style="color: #E36B3F;"&gt;void&lt;/span&gt; *&lt;span style="color: #845A84;"&gt;q&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;[&lt;/span&gt;N&lt;span style="color: #a9779c;"&gt;]&lt;/span&gt;;
    &lt;span style="color: #13665F;"&gt;for&lt;/span&gt; &lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;&lt;span style="color: #E36B3F;"&gt;int&lt;/span&gt; &lt;span style="color: #845A84;"&gt;i&lt;/span&gt; = 0; i &amp;lt; N; ++i&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;
        q&lt;span style="color: #a9779c;"&gt;[&lt;/span&gt;i&lt;span style="color: #a9779c;"&gt;]&lt;/span&gt; = mallocx&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;8, 0&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;;
    &lt;span style="color: #13665F;"&gt;for&lt;/span&gt; &lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;&lt;span style="color: #E36B3F;"&gt;int&lt;/span&gt; &lt;span style="color: #845A84;"&gt;i&lt;/span&gt; = 0; i &amp;lt; N; ++i&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;
        sdallocx&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;q&lt;span style="color: #4C7A90;"&gt;[&lt;/span&gt;i&lt;span style="color: #4C7A90;"&gt;]&lt;/span&gt;, 8, 0&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;;
    malloc_stats_print&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;&lt;span style="color: #845A84;"&gt;NULL&lt;/span&gt;, &lt;span style="color: #845A84;"&gt;NULL&lt;/span&gt;, &lt;span style="color: #845A84;"&gt;NULL&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;;
&lt;span style="color: #4d9391;"&gt;}&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;


&lt;/div&gt;
&lt;/div&gt;</content></entry><entry><title>Building shells with a grain of salt</title><link href='http://cipht.net/2017/10/17/build-your-own-shell.html'/><updated>2017-10-17T02:30:00+0000</updated><id>http://cipht.net/2017/10/17/build-your-own-shell</id><content type='html'>&lt;blockquote&gt;
&lt;p&gt;
The shell is at the heart of Unix.  It's the glue that makes all the
little Unix tools work together so well.  Understanding it sheds light
on many of Unix's important ideas, and writing our own is the best
path to that understanding.
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;
Earlier this year, at a place I worked, I decided to run a series of
workshops on writing a Unix shell.  A lot of questions had come up
that I think writing a shell leads you through, as well as issues that
suggested tenuous mental models of the shell and its scripting
language.
&lt;/p&gt;

&lt;p&gt;
A small sampling of those kinds of questions:
&lt;/p&gt;
&lt;ul class="org-ul"&gt;
&lt;li&gt;why does this Python replacement for this shell script deadlock?&lt;/li&gt;
&lt;li&gt;why are there so many Unicorn processes on this server?&lt;/li&gt;
&lt;li&gt;when are variables quoted in a shell script?  why aren't they
always visible to other programs I run?&lt;/li&gt;
&lt;li&gt;how does &lt;code&gt;set -e&lt;/code&gt; work?&lt;/li&gt;
&lt;li&gt;how does control-C (&lt;code&gt;^C&lt;/code&gt;) work?  why do I need to use &lt;code&gt;^\&lt;/code&gt;
sometimes instead?  why doesn't &lt;code&gt;^C&lt;/code&gt; work the way I'd expect on
this bash &lt;code&gt;for&lt;/code&gt; loop?&lt;/li&gt;
&lt;li&gt;what's special about init?&lt;/li&gt;
&lt;li&gt;why do we do these steps in daemonization?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;
At this company, we had a regular Friday afternoon workshop/lecture
series.  I had previously tried to do an overview of Unix process
relationships, but it felt too abstract.  So, I tried to make it more
concrete by getting everyone to actually implement a shell.
&lt;/p&gt;

&lt;p&gt;
Initially, this was just a rough layout of what I thought I could
cover in each session, and pointers to manpages.  I never turned this
into the full, DIY, self-paced tutorial I had hoped, but (in the
spirit of release early, release often) I am opening up my work in
progress at &lt;a href="https://github.com/tokenrove/build-your-own-shell"&gt;https://github.com/tokenrove/build-your-own-shell&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;
This isn't "finished", but if you're ambitious, you should be able to
make something that passes all the tests.  I decided it would be
better to put it out there, even in rough form, than keep it sealed
up.  After all, a number of people enjoyed the workshop out of which
this came.
&lt;/p&gt;

&lt;p&gt;
(Caveat for macOS and *BSD users: there's still something wonky about
the timing in the section that tests signals and job control;
hopefully by the time you're reading this, I'll have it worked out,
but if not, I apologize.)
&lt;/p&gt;

&lt;p&gt;
In this post I'll reflect on some choices I made, and follow a few
tangents that come up in the text but would be disruptive there.
&lt;/p&gt;

&lt;div id="outline-container-org3f73a86" class="outline-2"&gt;
&lt;h2 id="org3f73a86"&gt;&lt;span class="section-number-2"&gt;1.&lt;/span&gt; Testing shells&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-1"&gt;
&lt;p&gt;
I decided that, for this to be useful for self-study, it should
contain an automated test suite.  I love Tcl and &lt;a href="http://expect.sourceforge.net/"&gt;expect&lt;/a&gt;, and had
figured it would be a natural tool for testing the interactive
components of shells.  I took a quick look at how other shells were
testing themselves.  Most were strictly non-interactive tests, using
shell scripts and comparing with expected output.  A nice exception
here is &lt;a href="https://fishshell.com/"&gt;fish&lt;/a&gt;, which indeed uses expect for its interactive tests.
&lt;/p&gt;

&lt;p&gt;
This makes sense, but I wanted to focus on interactive shells: in part
because so many tutorials ignored the considerations of interactive
shells, but also because I felt people would enjoy themselves more if
they could use the shell they were writing directly.
&lt;/p&gt;

&lt;p&gt;
I started with some tests edited from the output of &lt;a href="https://github.com/tokenrove/til/blob/master/tcl/autoexpect.md"&gt;&lt;code&gt;autoexpect&lt;/code&gt;&lt;/a&gt;, but
this turned out to be too fragile.  Something I noticed in the first
workshop was that people really enjoyed customizing their prompt; this
should be no surprise (prompt customization is a perennial
time-wasting activity in any shell), but it meant I'd have to be
careful about how I matched outputs in tests.  In particular, I
couldn't really depend on detecting and matching the prompt.
&lt;/p&gt;

&lt;p&gt;
The other tricky thing is that I couldn't use any feature in the tests
that hadn't been developed yet, so using conditionals or echoing &lt;code&gt;$?&lt;/code&gt;
wasn't possible in the early tests.
&lt;/p&gt;

&lt;p&gt;
I considered writing a wrapper using &lt;a href="https://en.wikipedia.org/wiki/Ptrace"&gt;ptrace(2)&lt;/a&gt; that would watch for
all &lt;code&gt;fork&lt;/code&gt; / &lt;code&gt;execve&lt;/code&gt; / &lt;code&gt;wait&lt;/code&gt; syscalls from the shell and its children,
and print those in a form easily consumed by a test harness (this
seemed easier to do than cleaning up the output of &lt;code&gt;strace&lt;/code&gt;), but
things like prompts that exec &lt;code&gt;git&lt;/code&gt; every time, as well as &lt;code&gt;ptrace&lt;/code&gt;'s
noted stubbiness on macOS, prevented me from going further with this.
&lt;/p&gt;

&lt;p&gt;
So that's where the workshop sat for a long time, until I finally
decided to use a little test description language in place of expect
scripts directly.  So now a typical test might look like:
&lt;/p&gt;

&lt;pre class="example" id="orgc57b5d5"&gt;
→ true || false || echo-rot13 foo⏎
≠ sbb
→ false || true &amp;amp;&amp;amp; echo-rot13 foo⏎
← sbb
→ exit 42
☠ 42
&lt;/pre&gt;

&lt;p&gt;
For whatever reason, expressing things this way allowed me to finally
write out all the tests I had intended to have, without focusing too
much on the implementation of the test harness.  Then I wrote some Tcl
to interpret these files.
&lt;/p&gt;
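
&lt;p&gt;
As a rough illustration of how little machinery such a file needs,
here is a minimal Python sketch of a parser for it.  This is not the
Tcl harness from the workshop: the names &lt;code&gt;OPS&lt;/code&gt; and &lt;code&gt;parse_test&lt;/code&gt; are
hypothetical, and it assumes that → is input to send, ← is output that
must appear, ≠ is output that must not appear, ☠ is the expected exit
status, and ⏎ stands for pressing Enter.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-python"&gt;# Hypothetical names throughout; the real harness is written in Tcl.
# Assumed meaning of each leading symbol in a test file.
OPS = {
    "→": "send",         # input typed at the shell
    "←": "expect",       # text that should appear in the output
    "≠": "reject",       # text that should not appear in the output
    "☠": "exit_status",  # status the shell should exit with
}


def parse_test(text):
    """Turn a test description into a list of (operation, argument) pairs."""
    steps = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        symbol, _, rest = line.partition(" ")
        op = OPS[symbol]
        if op == "send":
            rest = rest.replace("⏎", "\n")  # the return symbol stands for Enter
        elif op == "exit_status":
            rest = int(rest)
        steps.append((op, rest))
    return steps
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
A runner would then walk the resulting list, sending input to the
shell under test and string-matching its output.
&lt;/p&gt;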

&lt;p&gt;
I decided to go with string matching in the output, which is not
particularly robust, but is simple.  Because of discrepancies between
how different shells and TTY drivers draw things, it can be prone to
matching the echoed input as the output if one isn't careful.  There
are also some timing issues; the script written by &lt;code&gt;autoexpect&lt;/code&gt;
suggests inserting a 100ms delay between each keystroke sent, but this
makes the tests cripplingly slow; I'm still trying to find a tuning
that is reliable across systems but speedy enough to be usable.
&lt;/p&gt;

&lt;p&gt;
I decided that &lt;code&gt;bash&lt;/code&gt; and &lt;code&gt;mksh&lt;/code&gt; should pass all the tests, and &lt;code&gt;cat&lt;/code&gt;
should fail every test.  There's nothing worse than a test that fails
to actually test something.  This reminds me of the admonishment
"don't try to do what a corpse can do better": goals phrased in the
negative (like "stop reading Hacker News") are hard to achieve — the
dead (or &lt;code&gt;cat&lt;/code&gt;) will always do them better than you.  Positive goals
(and tests) are more actionable.
&lt;/p&gt;

&lt;p&gt;
There are still some timing issues on different platforms, but I don't
regret making the simple choice for now.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-orgc460dc5" class="outline-2"&gt;
&lt;h2 id="orgc460dc5"&gt;&lt;span class="section-number-2"&gt;2.&lt;/span&gt; Minimal shell builtins&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-2"&gt;
&lt;p&gt;
Doing the workshop led me to think about minimizing shell builtins;
one of the questions that comes up a lot is why &lt;code&gt;cd&lt;/code&gt; needs to be a
builtin, but what doesn't come up until one is much deeper into
pipelines and job control is what a pain builtins are, in how they
interact with the rest of the shell's features.  It would be nice to
get rid of them.
&lt;/p&gt;

&lt;p&gt;
There are some commands which are builtins only to make them fast,
like &lt;code&gt;echo&lt;/code&gt;, &lt;code&gt;true&lt;/code&gt;, and &lt;code&gt;false&lt;/code&gt;.  These usually have equivalents in
&lt;code&gt;/bin&lt;/code&gt; already.
&lt;/p&gt;

&lt;p&gt;
Some builtins are required because they modify the shell's own
environment: &lt;code&gt;cd&lt;/code&gt;, &lt;code&gt;exit&lt;/code&gt;, &lt;code&gt;fg&lt;/code&gt;, &lt;code&gt;bg&lt;/code&gt;, &lt;code&gt;jobs&lt;/code&gt;, &lt;code&gt;exec&lt;/code&gt;, &lt;code&gt;wait&lt;/code&gt;,
&lt;code&gt;ulimit&lt;/code&gt;.  (This is excluding really tricky, impractical things, like
using shared memory, &lt;code&gt;process_vm_writev&lt;/code&gt;, or &lt;code&gt;ptrace&lt;/code&gt; to modify the
shell from an outside process.)&lt;sup&gt;&lt;a id="fnr.1" class="footref" href="#fn.1" role="doc-backlink"&gt;1&lt;/a&gt;&lt;/sup&gt;
&lt;/p&gt;

&lt;p&gt;
To prove a point, you could take functional programming to an extreme
and have an immutable shell where &lt;code&gt;cd&lt;/code&gt; executes a new shell in the
chosen directory, but some of the others are probably not possible in
the presence of typical job control.&lt;sup&gt;&lt;a id="fnr.2" class="footref" href="#fn.2" role="doc-backlink"&gt;2&lt;/a&gt;&lt;/sup&gt;
&lt;/p&gt;

&lt;p&gt;
If we take this line of thought further, we can try externalizing some
of the shell's operators.  Conditional execution is interesting.  How
about &lt;code&gt;&amp;amp;&amp;amp;&lt;/code&gt; and &lt;code&gt;||&lt;/code&gt;?  Syntactically, we probably can't pull these off
as external commands, but we could provide commands &lt;code&gt;and&lt;/code&gt; and &lt;code&gt;or&lt;/code&gt;
which take commands to execute.
&lt;/p&gt;
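
&lt;p&gt;
A minimal Python sketch of what such commands might look like.
Everything here is an assumption for illustration: the names &lt;code&gt;run_and&lt;/code&gt;
and &lt;code&gt;run_or&lt;/code&gt;, and especially the &lt;code&gt;--&lt;/code&gt; argument used to separate the
two commands, which is just the simplest delimiter to parse.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-python"&gt;import subprocess
import sys

SEPARATOR = "--"  # hypothetical delimiter between the two commands


def split_commands(argv):
    """Split argv at the first SEPARATOR into (first, second)."""
    i = argv.index(SEPARATOR)
    return argv[:i], argv[i + 1:]


def run_and(argv):
    """Run the first command; run the second only if the first exits 0."""
    first, second = split_commands(argv)
    status = subprocess.call(first)
    if status != 0:
        return status
    return subprocess.call(second)


def run_or(argv):
    """Run the first command; run the second only if the first fails."""
    first, second = split_commands(argv)
    status = subprocess.call(first)
    if status == 0:
        return 0
    return subprocess.call(second)


# Stand-ins for true(1) and false(1), built from the running interpreter
# so the example does not depend on what is installed in /bin.
TRUE = [sys.executable, "-c", "raise SystemExit(0)"]
FALSE = [sys.executable, "-c", "raise SystemExit(1)"]
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
With these, &lt;code&gt;run_and(FALSE + ["--"] + TRUE)&lt;/code&gt; never runs the second
command.  A real chain-loading version would exec the second command
instead of spawning it as a child, and would need a quoting scheme
richer than a single separator to allow nesting.
&lt;/p&gt;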

&lt;p&gt;
Implementing &lt;code&gt;if&lt;/code&gt; is an obvious next step from &lt;code&gt;and&lt;/code&gt; and &lt;code&gt;or&lt;/code&gt;.  Now we
can implement &lt;code&gt;while&lt;/code&gt;, although we'd have to be careful about how we
handle the environment if we wanted to handle many typical uses of
&lt;code&gt;while&lt;/code&gt;.
&lt;/p&gt;

&lt;p&gt;
The &lt;code&gt;for&lt;/code&gt; loop almost already exists in this form, as &lt;code&gt;xargs&lt;/code&gt;.  We
would probably want to provide both a sequential version, where the
environment for each iteration depends on the previous, and a parallel
version where everything can run at the same time.
&lt;/p&gt;

&lt;p&gt;
Note that, to be practical, most of these approaches require escaping
mechanisms that aren't too cumbersome.
There seems to be a close parallel with macro facilities in languages
like Lisp.
&lt;/p&gt;

&lt;p&gt;
At the extreme side of cumbersome quoting would be &lt;code&gt;case&lt;/code&gt;, which
would probably want to take its input from a heredoc.
&lt;/p&gt;

&lt;p&gt;
I was originally going to write a proof of concept of this (called
"builtouts"), but researching this led me to the intriguing &lt;a href="https://skarnet.org/software/execline/index.html"&gt;execline&lt;/a&gt;
"shell", which has already done this, and explored this space rather
nicely.
&lt;/p&gt;

&lt;p&gt;
One thing that &lt;code&gt;execline&lt;/code&gt; doesn't seem to do is implement something
resembling real job control.  If &lt;code&gt;bg&lt;/code&gt; executes a command without
waiting and then re-executes the shell with a suitable variable set
(to the PGID of this job), the shell on each execution can check this
variable to see what jobs are still alive; the &lt;code&gt;jobs&lt;/code&gt; command can
print the contents of this variable; the &lt;code&gt;fg&lt;/code&gt; command just becomes
&lt;code&gt;tcsetpgrp&lt;/code&gt; and &lt;code&gt;wait&lt;/code&gt; with the PGID of the current job.  For an
interactive shell, the tricky thing is probably making sure that
&lt;code&gt;bg&lt;/code&gt;'s children don't end up in an orphaned process group.
&lt;/p&gt;

&lt;p&gt;
A lot of these programs end up having to deal with quoting.  Is there
a way to take this further and handle quoting in its own program?  For
fixed-arity programs (like &lt;code&gt;if&lt;/code&gt;), we can imagine an &lt;code&gt;unquote&lt;/code&gt; helper
that calls a subsidiary program with, first, the fixed remaining
arguments, and then all of the original quoted argument, expanded, as
the remaining arguments.
&lt;/p&gt;

&lt;p&gt;
As &lt;a href="http://man7.org/linux/man-pages/man7/glob.7.html"&gt;&lt;code&gt;glob(7)&lt;/code&gt;&lt;/a&gt; notes:
&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;
Long ago, in UNIX V6, there was a program /etc/glob that would expand
wildcard patterns.  Soon afterward this became a shell built-in.
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;
Luckily, &lt;a href="https://github.com/dspinellis/unix-history-repo/blob/Research-V6-Snapshot-Development/usr/source/s1/glob.c"&gt;the source is available&lt;/a&gt; in Diomidis Spinellis's
&lt;a href="https://github.com/dspinellis/unix-history-repo"&gt;unix-history-repo&lt;/a&gt;, and we can see that it does this same kind of chain
loading, executing its first argument with the rest of its arguments
expanded according to the globbing rules.
&lt;/p&gt;

&lt;p&gt;
I especially enjoy the extremely primitive &lt;a href="https://github.com/dspinellis/unix-history-repo/blob/38371171d1ed457a43a9c8e7f2df5d596916209d/usr/source/s1/glob.c#L50-L54"&gt;path search&lt;/a&gt; and &lt;a href="https://github.com/dspinellis/unix-history-repo/blob/38371171d1ed457a43a9c8e7f2df5d596916209d/usr/source/s1/glob.c#L132-L137"&gt;shell
script support&lt;/a&gt;.
&lt;/p&gt;
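
&lt;p&gt;
The same chain-loading shape can be sketched in a few lines of Python.
This is not a transliteration of the V6 source: &lt;code&gt;expand&lt;/code&gt; and &lt;code&gt;chain&lt;/code&gt;
are hypothetical names, and passing an unmatched pattern through
unchanged is an assumption of this sketch.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-python"&gt;import glob
import os


def expand(args):
    """Expand each argument as a glob pattern, in sorted order.

    Assumption: a pattern with no matches is passed through unchanged.
    """
    expanded = []
    for arg in args:
        matches = sorted(glob.glob(arg))
        expanded.extend(matches or [arg])
    return expanded


def chain(argv):
    """Replace this process with argv[0], run on the expanded arguments."""
    program, rest = argv[0], argv[1:]
    # execvp searches PATH and does not return on success.
    os.execvp(program, [program] + expand(rest))
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Because &lt;code&gt;chain&lt;/code&gt; ends in an exec, the helper leaves nothing of itself
behind: the program it loads inherits the same process, with its
arguments already expanded.
&lt;/p&gt;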
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org1b8b31e" class="outline-2"&gt;
&lt;h2 id="org1b8b31e"&gt;&lt;span class="section-number-2"&gt;3.&lt;/span&gt; Objet trouvé engineering&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-3"&gt;
&lt;p&gt;
(&lt;i&gt;Found object&lt;/i&gt; engineering, often called &lt;a href="https://en.wikipedia.org/wiki/Cargo_cult_programming"&gt;cargo cult programming&lt;/a&gt;.)
&lt;/p&gt;

&lt;p&gt;
Now we get to the inflammatory bits, for those who kept reading.
&lt;/p&gt;

&lt;p&gt;
Stack Overflow modernized, but did not create, the practice of
assembling Frankenstein programs from poorly understood and imitated
examples.  I think no language has been more greatly affected by
this than shell, as evidenced by the bizarre ready-made shell scripts
one can encounter almost everywhere.  Sometimes, the evolution of
these patterns reminds me of &lt;a href="https://en.wikipedia.org/wiki/Semantic_change"&gt;semantic drift&lt;/a&gt; in languages.
&lt;/p&gt;

&lt;p&gt;
A lot of constructs are poorly understood and misused.  I'm not
blaming people, though; part of the problem is that I can't easily
point to a single, modern reference work that someone should read
before writing shell scripts.  And, since shell scripts often feel
like "configuration" rather than "programming", I imagine people don't
even think about learning shell as a programming language.
&lt;/p&gt;

&lt;p&gt;
Writing a shell helps disabuse people of some common confusions, for
example that:
&lt;/p&gt;
&lt;ul class="org-ul"&gt;
&lt;li&gt;bash &lt;i&gt;is&lt;/i&gt; shell scripting, definitively, and if you write a shell
script, it is a "bash script" (&lt;a href="https://www.lysator.liu.se/c/ten-commandments.html"&gt;"all the world's a VAX"&lt;/a&gt; and all that);&lt;/li&gt;
&lt;li&gt;quotes make something into a string;&lt;/li&gt;
&lt;li&gt;double and single quotes are interchangeable;&lt;/li&gt;
&lt;li&gt;the argument to &lt;code&gt;if&lt;/code&gt;, &lt;code&gt;while&lt;/code&gt;, et cetera is something magical;&lt;/li&gt;
&lt;li&gt;writing &lt;code&gt;export FOO=x&lt;/code&gt; repeatedly does something;&lt;/li&gt;
&lt;li&gt;variable case has some magic properties;&lt;/li&gt;
&lt;li&gt;et cetera.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;
(Don't forget to use &lt;a href="https://www.shellcheck.net/"&gt;shellcheck&lt;/a&gt; and &lt;a href="https://anonscm.debian.org/git/collab-maint/devscripts.git/tree/scripts/checkbashisms.pl"&gt;checkbashisms&lt;/a&gt; everywhere!)
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org09f44f9" class="outline-2"&gt;
&lt;h2 id="org09f44f9"&gt;&lt;span class="section-number-2"&gt;4.&lt;/span&gt; Why write a shell&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-4"&gt;
&lt;p&gt;
In the workshop, I cite the following motivations for writing a shell:
&lt;/p&gt;

&lt;ul class="org-ul"&gt;
&lt;li&gt;to give you a better understanding of how Unix processes work;
&lt;ul class="org-ul"&gt;
&lt;li&gt;this will make you better at designing and understanding software
that runs on Unix;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;to clarify some common misunderstandings of POSIX shells;
&lt;ul class="org-ul"&gt;
&lt;li&gt;this will make you more effective at using and scripting
ubiquitous shells like bash;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;to help you build a working implementation of a shell you can be
excited about working on.
&lt;ul class="org-ul"&gt;
&lt;li&gt;there are endless personal customizations you can make to your
own shell, and making them can help you think about how you interact
with your computer and how it might be different.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;
I've already touched on the first two, but the third is maybe less
obvious.  The shell remains a ubiquitous interface, decades after we
imagined other modes of interaction would replace it.  The field is
ripe for improvement.
&lt;/p&gt;

&lt;p&gt;
There are a lot of people exploring this space in interesting ways,
but I think there's room for so much more.
&lt;/p&gt;

&lt;p&gt;
A lot of existing tutorials focus on the non-interactive case, and I
think people will have more fun if they build a shell they can use
interactively.
&lt;/p&gt;

&lt;p&gt;
Aside from the interactive case, a lot of infrastructure is held
together with shell scripts.
&lt;/p&gt;

&lt;p&gt;
There's a commonly held belief that scripting languages like Perl,
Ruby, and Python are complete replacements for shell scripting.  My
own experience is that these languages lack the expressive tools of
the shell for working with pipelines, exit statuses, redirections, and
so on, and the replacement code is often:
&lt;/p&gt;
&lt;ul class="org-ul"&gt;
&lt;li&gt;sequential rather than parallel, and often much slower for this
reason;&lt;/li&gt;
&lt;li&gt;full of deadlocks, race conditions, and signal handling issues;&lt;/li&gt;
&lt;li&gt;much more verbose and less clear than an equivalent,
carefully written shell script.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
So, I feel there's still room for new tools in this space, too.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org09972b8" class="outline-2"&gt;
&lt;h2 id="org09972b8"&gt;&lt;span class="section-number-2"&gt;5.&lt;/span&gt; Conclusion&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-5"&gt;
&lt;p&gt;
If you've been thinking about writing a shell for a while and haven't
gotten around to it, why not try &lt;a href="https://github.com/tokenrove/build-your-own-shell"&gt;my workshop&lt;/a&gt; or any of the tutorials
it links to?
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id="footnotes"&gt;
&lt;h2 class="footnotes"&gt;Footnotes: &lt;/h2&gt;
&lt;div id="text-footnotes"&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.1" class="footnum" href="#fnr.1" role="doc-backlink"&gt;1&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
POSIX avoids dictating exactly what must be a
builtin, but &lt;a href="http://pubs.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag_02_09_01_01"&gt;does specify&lt;/a&gt; that the following commands must be executed
regardless of whether they are in the path:
&lt;/p&gt;

&lt;pre class="example" id="orge2d913c"&gt;
alias bg cd command false fc fg getopts jobs kill newgrp pwd
read true umask unalias wait
&lt;/pre&gt;

&lt;p class="footpara"&gt;
Most of these have something to do with the shell's internal state,
but not all.
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.2" class="footnum" href="#fnr.2" role="doc-backlink"&gt;2&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
This is a kind of chain loading, sometimes
called &lt;a href="http://www.catb.org/~esr/writings/taoup/html/ch06s06.html"&gt;Bernstein chaining&lt;/a&gt;.  There's a lovely discussion of this in
Andy Chu's &lt;a href="https://www.oilshell.org/blog/2017/01/13.html"&gt;Shell has a Forth-like quality&lt;/a&gt;.  (The entire oil shell blog
is full of great stuff.)
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;


&lt;/div&gt;
&lt;/div&gt;</content></entry><entry><title>Return to the Source</title><link href='http://cipht.net/2017/10/05/why-read-code.html'/><updated>2017-10-05T02:30:00+0000</updated><id>http://cipht.net/2017/10/05/why-read-code</id><content type='html'>&lt;blockquote&gt;
&lt;p&gt;
If a system is to serve the creative spirit, it must be entirely
comprehensible to a single individual.
  — &lt;a href="https://www.cs.virginia.edu/~evans/cs655/readings/smalltalk.html"&gt;Dan Ingalls&lt;/a&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;
I saw Ellen Ullman speak last night, about &lt;a href="https://www.goodreads.com/book/show/31450584-life-in-code"&gt;her new book&lt;/a&gt;,&lt;sup&gt;&lt;a id="fnr.1" class="footref" href="#fn.1" role="doc-backlink"&gt;1&lt;/a&gt;&lt;/sup&gt; and
the topic turned to culpability for Y2K, systems that people never
expected would run for decades, and systems that no one understands
any more.
&lt;/p&gt;

&lt;p&gt;
When a Peterborough nuclear facility &lt;a href="http://www.vcfed.org/forum/archive/index.php/t-37827.html"&gt;reached out to retrocomputing
enthusiasts&lt;/a&gt; looking for someone who knew PDP-11 assembler, I started
thinking about &lt;a href="https://www.goodreads.com/book/show/29579.Foundation"&gt;the Foundation series&lt;/a&gt; (warning: possible spoilers
follow).  The idea that was most striking to me, in those books, was
that eventually, societies who became comfortable with advanced
technology could end up losing the knowledge of how that technology
worked (cast in a very '50s nuclear vibe).&lt;sup&gt;&lt;a id="fnr.2" class="footref" href="#fn.2" role="doc-backlink"&gt;2&lt;/a&gt;&lt;/sup&gt; I
encounter a lot of people dismissive of the importance of &lt;a href="https://scholar.harvard.edu/files/mickens/files/thenightwatch.pdf"&gt;systems
programming&lt;/a&gt; (more so online than IRL, thankfully), and it makes me
wonder if we are rapidly heading in that direction.
&lt;/p&gt;

&lt;p&gt;
Ullman talked about "returning to the source" — extracting the lost
knowledge from code whose authors aren't around anymore.  There
couldn't have been a more serendipitous time for this, as I had just
been discussing the merits (and pitfalls) of reading source with &lt;a href="https://www.recurse.com/"&gt;my
fellow Recursers&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;
It's my sincere belief that code is the source of truth in computing
(and by this, I also mean machine code, which is also worth reading;
the success of &lt;a href="https://godbolt.org/"&gt;Matt Godbolt's Compiler Explorer&lt;/a&gt; tells me I'm not alone
in this).  So I am writing this article to exhort you to read code
written by someone you don't know, today, to save the future.
&lt;/p&gt;

&lt;div id="outline-container-org47d4990" class="outline-2"&gt;
&lt;h2 id="org47d4990"&gt;&lt;span class="section-number-2"&gt;1.&lt;/span&gt; Why Read?  Craftsmanship&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-1"&gt;
&lt;p&gt;
Jon Bentley opens his first Programming Pearl on literate programming
with the following:
&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;
When was the last time you spent a pleasant evening in a comfortable
chair, reading a good program? I don't mean the slick subroutine you
wrote last summer, nor even the big system you have to modify next
week. I'm talking about cuddling up with a classic, and starting to
read on page one. Sure, you may spend more time studying this elegant
routine or worrying about that questionable decision, and everybody
skims over a few parts they find boring. But let's get back to the
question: When was the last time you read an excellent program?
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;
(I like to ask this question in interviews, on both sides of the
table; not to be a snob, but to open up a discussion about reading
code.)
&lt;/p&gt;

&lt;p&gt;
I always remember this better the way Steve McConnell paraphrases
Jon Bentley in Code Complete:
&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;
One especially good way to learn about programming is to study the
work of great programmers. Jon Bentley thinks that you should be able
to sit down with a glass of brandy and a good cigar and read a program
the way you would a good novel.
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;
The intent is clear: you can improve as a &lt;a href="http://manifesto.softwarecraftsmanship.org/"&gt;craftsperson&lt;/a&gt; by reading
masterpieces of software, the same as writers need to read other
works&lt;sup&gt;&lt;a id="fnr.3" class="footref" href="#fn.3" role="doc-backlink"&gt;3&lt;/a&gt;&lt;/sup&gt; and musicians need to listen to other performances.
Empirically verifying whether this is true is unfortunately outside my
abilities, but I believe I've benefitted greatly from "reading the
greats".&lt;sup&gt;&lt;a id="fnr.4" class="footref" href="#fn.4" role="doc-backlink"&gt;4&lt;/a&gt;&lt;/sup&gt;
&lt;/p&gt;

&lt;p&gt;
One thing to clarify from those quotes, though: these always made me
picture reading the source from top to bottom, and it turns out this
isn't particularly effective.
&lt;/p&gt;

&lt;p&gt;
I've been passing Peter Seibel's &lt;a href="http://www.gigamonkeys.com/code-reading/"&gt;Code is not Literature&lt;/a&gt; around a lot
lately; this is an article I didn't understand when I first read it.
I thought it was an attack on code reading, but in fact it's a
suggestion of much better ways to approach reading code, especially in
a group.
&lt;/p&gt;

&lt;p&gt;
There's an extension to that: I think Pierre Bayard's &lt;a href="https://www.goodreads.com/book/show/1143788.How_to_Talk_About_Books_You_Haven_t_Read"&gt;How to Talk
About Books You Haven't Read&lt;/a&gt; expresses this far better than I can, but
I think it's self-defeating to believe that "having read the code" is
a binary state: either you read (and understood) it all, or you
haven't "read it".  Diving into a codebase is just the start of a long
relationship with that code; you can keep coming back, to familiar
haunts and undiscovered territories every time.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org330c388" class="outline-2"&gt;
&lt;h2 id="org330c388"&gt;&lt;span class="section-number-2"&gt;2.&lt;/span&gt; Why Read?  Personal mastery&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-2"&gt;
&lt;p&gt;
I started this article with my favorite Dan Ingalls quote, from the
design principles of Smalltalk.  I think there's a deep truth about
software in that.  We don't seem to be able to build &lt;a href="https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-abstractions/"&gt;abstractions that
aren't leaky&lt;/a&gt;, so you're always going to need to be able to go up and
down in the layers of abstraction in a system just to fix the problems
at the level you care about.
&lt;/p&gt;

&lt;p&gt;
What will a system that lasts 10000 years look like?  I don't think it
will be one that no one can understand.  Is it possible to build
complete systems that can be understood by an individual?  The work by
&lt;a href="http://vpri.org/"&gt;Viewpoints Research Institute&lt;/a&gt; seems to suggest it's possible.  Until
the day when we all have our personal 10kLOC operating systems&lt;sup&gt;&lt;a id="fnr.5" class="footref" href="#fn.5" role="doc-backlink"&gt;5&lt;/a&gt;&lt;/sup&gt;
committed to memory, Fahrenheit 451-style, reading systems large and
small helps one grapple with the nature of complexity, and find a
personal relationship with it.
&lt;/p&gt;

&lt;p&gt;
And, pragmatically, reading your dependencies helps you answer the
question: will anyone be able to understand this when it breaks?  (And
it will break: because of bitrot; because the assumptions changed.)
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org1d35247" class="outline-2"&gt;
&lt;h2 id="org1d35247"&gt;&lt;span class="section-number-2"&gt;3.&lt;/span&gt; Why Read?  Procedural rhetoric&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-3"&gt;
&lt;p&gt;
Ellen Ullman also talked about how algorithms have biases; software
isn't neutral&lt;sup&gt;&lt;a id="fnr.6" class="footref" href="#fn.6" role="doc-backlink"&gt;6&lt;/a&gt;&lt;/sup&gt;.  This reminded me of Ian Bogost's concept
of &lt;a href="https://mediawiki.middlebury.edu/wiki/MIDDMedia/Procedural_rhetoric"&gt;procedural rhetoric&lt;/a&gt;, where interaction with systems can be
persuasive, and can communicate ideas and opinions, in a way that is
subtle.
&lt;/p&gt;

&lt;p&gt;
This is a deep topic in its own right, but I think the first step in
being the future masters of technology&lt;sup&gt;&lt;a id="fnr.7" class="footref" href="#fn.7" role="doc-backlink"&gt;7&lt;/a&gt;&lt;/sup&gt; is to understand
the workings of systems we interact with, and the purest form of that
is reading their code.  Even when we can't read the code of many
systems around us, reading the code of similar systems is a part of
understanding how they might work and the biases implicit within them.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org68294eb" class="outline-2"&gt;
&lt;h2 id="org68294eb"&gt;&lt;span class="section-number-2"&gt;4.&lt;/span&gt; What's worth reading?&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-4"&gt;
&lt;p&gt;
You might be wondering, "where do I even start?".  I think there are
two classes of code especially worth reading: code that you use, and
code that is great.  The latter is incredibly subjective; I have been
compiling a list of what I think are "masterpieces of software" and
will post it at some point, to much criticism, I'm sure.&lt;sup&gt;&lt;a id="fnr.8" class="footref" href="#fn.8" role="doc-backlink"&gt;8&lt;/a&gt;&lt;/sup&gt;
&lt;/p&gt;

&lt;p&gt;
However, the former is straightforward for everyone: read your
dependencies.  Now that we live in a Free Software utopia
(hah)&lt;sup&gt;&lt;a id="fnr.9" class="footref" href="#fn.9" role="doc-backlink"&gt;9&lt;/a&gt;&lt;/sup&gt;, you probably have access to the source of the vast
majority of libraries, servers, tools, and systems you use and depend
on.  This is a wealth that is often squandered.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-orgdf06146" class="outline-2"&gt;
&lt;h2 id="orgdf06146"&gt;&lt;span class="section-number-2"&gt;5.&lt;/span&gt; Literate programming&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-5"&gt;
&lt;p&gt;
Fans of literate programming are probably champing at the bit, waiting
for me to unveil Knuth's perfect plan for programmer literacy.  (If
you're not familiar with literate programming at all, I think &lt;a href="http://www-cs-faculty.stanford.edu/~knuth/lp.html"&gt;Knuth's
book&lt;/a&gt; is still the best treatment, even if it is pretty dated at this
point.)
&lt;/p&gt;

&lt;p&gt;
I have done a bit of literate programming (and I still think literate
assembly language is the most useful application of these techniques);
I've read a lot of literate programs; and as it relates to the topic
of this article, I feel it's mostly irrelevant.  There will always be
the need to read unadorned, unpresented programs.  Literate programs
are lovely, but they aren't a complete replacement for the kind of
code reading I'm advocating here (even if it was a common enough
practice that one could find a reasonable supply of them).
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org0779908" class="outline-2"&gt;
&lt;h2 id="org0779908"&gt;&lt;span class="section-number-2"&gt;6.&lt;/span&gt; Reading about reading&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-6"&gt;
&lt;p&gt;
Sadly, there haven't been a lot of books about reading code.  There
are many books that intersperse commentary with code, but these aren't
really about reading code; they're more like literate programs.
&lt;/p&gt;

&lt;p&gt;
The only book I know to exclusively treat this subject is &lt;a href="https://www.spinellis.gr/codereading/"&gt;Diomidis
Spinellis's Code Reading&lt;/a&gt;.  It's been a while since I read it, but I
remember feeling that it was a good start, but not the complete
picture.  The author has also compiled a lot of arguments for why code
reading is important on that site, if you find this article
unconvincing.
&lt;/p&gt;

&lt;p&gt;
Michael Feathers's &lt;a href="https://www.goodreads.com/book/show/44919.Working_Effectively_with_Legacy_Code"&gt;Working Effectively with Legacy Code&lt;/a&gt; is a truly
great book, but I don't remember it having much concrete advice about
actually reading legacy code.  (I might be misremembering, of course.)
However, it is about testing, and one of the great ways to read a
codebase is to try to write tests for it.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org2268588" class="outline-2"&gt;
&lt;h2 id="org2268588"&gt;&lt;span class="section-number-2"&gt;7.&lt;/span&gt; Bonus: the tension of comments&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-7"&gt;
&lt;p&gt;
I mentioned that I feel that code is the only truth of the system.
(Which is a terrible oversimplification as almost all systems are also
data-driven to some extent.)  So it's unsurprising that I agree with
Kernighan and Pike's advice on commenting in The Practice of
Programming, which is often misinterpreted as "don't write comments".
&lt;/p&gt;

&lt;p&gt;
When I read code (especially when I review code), I actually skip the
comments (sometimes I strip them out) on my first pass through the
code.  I find that, because I'm much faster at reading English text
than source code, it is easy for the eye to get comfortable reading
the comments and only skimming the code.  This deceives me into
thinking I've actually read the code, when I haven't.
&lt;/p&gt;

&lt;p&gt;
However, I do have an egregious example of code that cries out for a comment, from Darwin (macOS)
&lt;a href="https://github.com/apple/darwin-xnu/blob/82dda5e44dd0f5a3ded7a02514db350589dac60f/osfmk/kern/thread_call.c#L717-L722"&gt;&lt;code&gt;osfmk/kern/thread_call.c&lt;/code&gt;&lt;/a&gt;:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;&lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;cancel_all&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;
        result = _remove_from_pending_queue&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;func, param, cancel_all&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; |
                _remove_from_delayed_queue&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;func, param, cancel_all&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;;
&lt;span style="color: #13665F;"&gt;else&lt;/span&gt;
        result = _remove_from_pending_queue&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;func, param, cancel_all&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; ||
                _remove_from_delayed_queue&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;func, param, cancel_all&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
I often trot this snippet out when I ask people where their threshold
for "too clever" code is.  My first reaction on seeing this was
"this is a typo"; only after looking it over again carefully did I
realize what it was &lt;i&gt;trying&lt;/i&gt; to do, and then I spent a while longer
thinking about whether it actually did that correctly.
&lt;/p&gt;

&lt;p&gt;
But the most recent time I went to show someone this snippet, it
turned out it &lt;a href="https://github.com/apple/darwin-xnu/blob/0a798f6738bc1db01281fc08ae024145e84df927/osfmk/kern/thread_call.c#L728-L738"&gt;had been updated&lt;/a&gt;, including adding some crucial
comments!
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;&lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;cancel_all&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;
        &lt;span style="color: #7c878a;"&gt;/* &lt;/span&gt;&lt;span style="color: #7c878a;"&gt;exhaustively search every queue, and return true if any search found something&lt;/span&gt;&lt;span style="color: #7c878a;"&gt; */&lt;/span&gt;
        result = _cancel_func_from_queue&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;func, param, group, cancel_all, &amp;amp;group-&amp;gt;pending_queue&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt; |
                 _cancel_func_from_queue&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;func, param, group, cancel_all, &amp;amp;group-&amp;gt;delayed_queues&lt;span style="color: #4C7A90;"&gt;[&lt;/span&gt;TCF_ABSOLUTE&lt;span style="color: #4C7A90;"&gt;]&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;  |
                 _cancel_func_from_queue&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;func, param, group, cancel_all, &amp;amp;group-&amp;gt;delayed_queues&lt;span style="color: #4C7A90;"&gt;[&lt;/span&gt;TCF_CONTINUOUS&lt;span style="color: #4C7A90;"&gt;]&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;;
&lt;span style="color: #4d9391;"&gt;}&lt;/span&gt; &lt;span style="color: #13665F;"&gt;else&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;
        &lt;span style="color: #7c878a;"&gt;/* &lt;/span&gt;&lt;span style="color: #7c878a;"&gt;early-exit as soon as we find something, don't search other queues&lt;/span&gt;&lt;span style="color: #7c878a;"&gt; */&lt;/span&gt;
        result = _cancel_func_from_queue&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;func, param, group, cancel_all, &amp;amp;group-&amp;gt;pending_queue&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt; ||
                 _cancel_func_from_queue&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;func, param, group, cancel_all, &amp;amp;group-&amp;gt;delayed_queues&lt;span style="color: #4C7A90;"&gt;[&lt;/span&gt;TCF_ABSOLUTE&lt;span style="color: #4C7A90;"&gt;]&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt; ||
                 _cancel_func_from_queue&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;func, param, group, cancel_all, &amp;amp;group-&amp;gt;delayed_queues&lt;span style="color: #4C7A90;"&gt;[&lt;/span&gt;TCF_CONTINUOUS&lt;span style="color: #4C7A90;"&gt;]&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;;
&lt;span style="color: #4d9391;"&gt;}&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Reading this code gave me an appreciation for a kind of comment I
would otherwise have tended to omit.
&lt;/p&gt;
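
&lt;p&gt;
To make the distinction in those snippets concrete, here is a minimal
sketch in C.  The helper names are hypothetical stand-ins, not the
Darwin functions, and the counters exist only so the difference can be
observed.  Bitwise &lt;code&gt;|&lt;/code&gt; always evaluates both operands (in an order
C leaves unspecified), while logical &lt;code&gt;||&lt;/code&gt; skips the right-hand
operand once the left-hand one is nonzero.
&lt;/p&gt;

```c
/* Hypothetical stand-ins for the queue-removal helpers: each records
 * that it ran and reports that it found something to cancel. */
static int pending_searched, delayed_searched;

static int remove_from_pending(void) { pending_searched++; return 1; }
static int remove_from_delayed(void) { delayed_searched++; return 1; }

/* Bitwise OR: both queues are always searched. */
int cancel_everywhere(void) {
    return remove_from_pending() | remove_from_delayed();
}

/* Logical OR: the delayed queue is skipped once the pending queue
 * search has already found something. */
int cancel_first(void) {
    return remove_from_pending() || remove_from_delayed();
}
```

&lt;p&gt;
Both functions return a nonzero result here, but only
&lt;code&gt;cancel_everywhere&lt;/code&gt; bumps both counters, which is exactly the
behavior the later comments spell out.
&lt;/p&gt;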
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org4d335ef" class="outline-2"&gt;
&lt;h2 id="org4d335ef"&gt;&lt;span class="section-number-2"&gt;8.&lt;/span&gt; Bonus: How to read a C program&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-8"&gt;
&lt;p&gt;
I would be remiss not to end this with some concrete advice.  Reading
tips will vary by language, but a lot of code I read is written in C;
how do I approach reading a C program?
&lt;/p&gt;

&lt;p&gt;
When in doubt, start from the bottom.  Occasionally someone tries to
fight the natural C order of definitions by forward-declaring static
functions; this is unnatural and most code isn't written this way.
Instead, you'll generally see that if you want a "top-down" view, you
should go to the end and work backwards.  (Incidentally, this is even
more true for OCaml / Standard ML programs, where the order of
declaration is very strict.)
&lt;/p&gt;

&lt;p&gt;
Use &lt;a href="https://dotat.at/prog/unifdef/"&gt;unifdef&lt;/a&gt; to get rid of as much that is irrelevant to you as
possible, at least for an initial reading.  Get rid of those paths
that are only taken on Acorns and Ataris.
&lt;/p&gt;

&lt;p&gt;
Use &lt;a href="http://ctags.sourceforge.net/"&gt;ctags&lt;/a&gt;, &lt;a href="http://cscope.sourceforge.net/"&gt;cscope&lt;/a&gt;, &lt;a href="https://www.gnu.org/software/global/"&gt;GNU global&lt;/a&gt;, and whatever other support you can find
to be able to quickly jump to and from the definitions of identifiers,
ideally also seeing all the places that refer to those identifiers.
Cross-reference tools like LXR (&lt;a href="http://elixir.free-electrons.com"&gt;on the Linux kernel&lt;/a&gt;, &lt;a href="https://github.com/whitequark/ruby-cross-reference"&gt;on MRI&lt;/a&gt;) are
sometimes nice for this, although I often find them more cumbersome
than using my editor on my local machine.
&lt;/p&gt;

&lt;p&gt;
Look at the header files included; what are the data structures that
get used all the time?  Sometimes there's tangly stuff with macros,
like &lt;a href="https://www.freebsd.org/cgi/man.cgi?query=queue&amp;amp;sektion=3"&gt;the queue.h macros&lt;/a&gt; for intrusive data structures; it can help to
run the preprocessor over the file (&lt;code&gt;cc -E&lt;/code&gt;) or write the structure
out by hand on paper and annotate it.
&lt;/p&gt;

&lt;p&gt;
Don't be afraid to mutilate the program to understand it.  Cut things
out and try to compile it.  Make hypotheses and validate them.  Is
this struct field used by anything?  Let's cut it out and see what
breaks in the compile.  (Newer statically-typed languages tend to be
even more receptive to those kinds of experiments.)  Attach a debugger
and set breakpoints at functions you're reading.
&lt;/p&gt;

&lt;p&gt;
If you're reading a library, consider starting with an example
program, tracing through the API calls made, into the guts of the
library.
&lt;/p&gt;

&lt;p&gt;
Maybe you also have the version control history (it's wonderful that
we can start almost taking this for granted).  When you find something
interesting, dig back with &lt;code&gt;blame&lt;/code&gt;; what changed, and why?  Also, if
the code seems too complicated, it can be helpful to start from an
early revision and then work forward in history.  Seeing the code
adapt as imagination encounters the real world paints a picture of
evolution as vivid as any archaeological exhibit.
&lt;/p&gt;

&lt;p&gt;
Serendipity can also be good.  Seibel, cited above, describes
"play[ing] the role of a 19th century naturalist returning from a trip
to some exotic island to present to the local scientific society a
discussion of the crazy beetles they found".  Sometimes I like to just
peek at different parts of the code at random, and see if there's
something that catches my eye or delights me.
&lt;/p&gt;

&lt;p&gt;
After all, reading code is not just good for you; it is fun.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id="footnotes"&gt;
&lt;h2 class="footnotes"&gt;Footnotes: &lt;/h2&gt;
&lt;div id="text-footnotes"&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.1" class="footnum" href="#fnr.1" role="doc-backlink"&gt;1&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
I should probably wait until I've read her new book to write
this, since I'm sure it has some great insights about this, but I
can't wait.
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.2" class="footnum" href="#fnr.2" role="doc-backlink"&gt;2&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
Tangent: perhaps this will never happen in software
because nothing runs reliably long enough for anyone to think we can
get rid of the programmers.
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.3" class="footnum" href="#fnr.3" role="doc-backlink"&gt;3&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
"If you don't have time to read, you don't have the time (or
the tools) to write. Simple as that." — Stephen King.
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.4" class="footnum" href="#fnr.4" role="doc-backlink"&gt;4&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
I feel this is advice commonly given to young
mathematicians but I can't find any source for it, at the moment.  I
think it must derive from &lt;a href="http://www.federicopereiro.com/masters/"&gt;this Niels Abel quote&lt;/a&gt;.
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.5" class="footnum" href="#fnr.5" role="doc-backlink"&gt;5&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
&lt;a href="https://web.archive.org/web/20040131054056/http://www.colorforth.com/"&gt;Chuck Moore is ahead of the game on this one.&lt;/a&gt;
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.6" class="footnum" href="#fnr.6" role="doc-backlink"&gt;6&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
I feel this is inherent, because &lt;a href="http://siderea.livejournal.com/1241996.html"&gt;software is made of
decisions&lt;/a&gt;.
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.7" class="footnum" href="#fnr.7" role="doc-backlink"&gt;7&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
"The future masters of technology will have to be
light-hearted and intelligent. The machine easily masters the grim and
the dumb." — Marshall McLuhan
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.8" class="footnum" href="#fnr.8" role="doc-backlink"&gt;8&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
Ok, a friend convinced me to include a few places to
start if you're really at a loss; since I talk about C at the end, how
about some C code I've enjoyed reading recently: &lt;a href="https://git.postgresql.org/gitweb/?p=postgresql.git"&gt;postgres&lt;/a&gt;, &lt;a href="https://github.com/Tarsnap/spiped"&gt;anything by
cperciva&lt;/a&gt;, &lt;a href="https://github.com/illumos/illumos-gate/tree/master/usr/src/uts"&gt;Illumos&lt;/a&gt;, &lt;a href="https://www.sqlite.org/cgi/src/tree?ci=trunk"&gt;sqlite&lt;/a&gt;.  Some of these are pretty complicated, but
typically stylistically good, and uncommonly well-commented.
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.9" class="footnum" href="#fnr.9" role="doc-backlink"&gt;9&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
Desperately absent in this article is an acknowledgement of
how much free and open source software has changed the world, but I
don't know how to write about it.  The beginning of &lt;a href="http://www.drmaciver.com/2017/01/programmer-at-large-what-is-this/"&gt;David MacIver's
Programmer at Large&lt;/a&gt; makes me think, though.
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;


&lt;/div&gt;
&lt;/div&gt;</content></entry><entry><title>Are Jump Tables Always Fastest?</title><link href='http://cipht.net/2017/10/03/are-jump-tables-always-fastest.html'/><updated>2017-10-03T02:30:00+0000</updated><id>http://cipht.net/2017/10/03/are-jump-tables-always-fastest</id><content type='html'>&lt;blockquote&gt;
&lt;p&gt;
&lt;b&gt;tl;dr:&lt;/b&gt; I make a petty point about premature optimization; don't go
out and rewrite your &lt;code&gt;switch&lt;/code&gt; statements as binary searches by hand;
maybe &lt;i&gt;do&lt;/i&gt; rewrite your jump tables as &lt;code&gt;switch&lt;/code&gt; statements, though.
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;
A couple of years ago I got into an argument in a job interview.  In
this case, the question was how I would implement dispatch for a
protocol handler efficiently, and my answer was that I would write the
most obvious code possible, probably with a &lt;code&gt;switch&lt;/code&gt; statement, and
see what my compiler produced, before making any tricky implementation
choices.  (Of course, my answer should have been, "write the obvious
thing and then measure it", but I have always had a thing for
inspecting compiler output.)
&lt;/p&gt;

&lt;p&gt;
The interviewer indicated that this wasn't the answer they wanted to
hear, and kept prodding me until I realized the question they thought
they were asking was "how do you implement a jump table in C".  At the
time, I grumbled a bit that a jump table wasn't necessarily the best
approach, especially if the distribution of dispatch cases is uneven,
but I didn't fight too much.  This stuck with me, though, for a few
reasons:
&lt;/p&gt;

&lt;ol class="org-ol"&gt;
&lt;li&gt;any reasonable compiler will emit a jump table for a dense switch
statement if it judges prudent;&lt;/li&gt;
&lt;li&gt;given the complexities introduced by the cache and branch
prediction, it's not a safe assumption that comparison-based
dispatch is slower than a jump table;&lt;/li&gt;
&lt;li&gt;any time you feel wronged in an interview you'll never let it go.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;
The more I thought about it, the more I wondered how big these effects
might be.  I started to work on a simple experiment to evaluate the
performance of table-based dispatch versus comparison-based dispatch,
and reviewed the literature.  (Spoiler: if you're looking for a
rigorous experimental evaluation of these effects, it's not in this
article.  Read Roger Sayle's &lt;a href="http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=968AE756567863243AC7B1728915861A?doi=10.1.1.602.1875&amp;amp;rep=rep1&amp;amp;type=pdf"&gt;A Superoptimizer Analysis of Multiway
Branch Code Generation&lt;/a&gt; for that.)
&lt;/p&gt;

&lt;p&gt;
I got a push to finish this &lt;a href="http://www.cipht.net/2014/08/19/ilc2014.html"&gt;when I attended ILC2014&lt;/a&gt;, where Robert
Strandh presented &lt;a href="http://metamodular.com/generic-dispatch.pdf"&gt;a paper on improving generic dispatch in CLOS&lt;/a&gt; which
relied on the idea that table-based dispatch methods pay a penalty on
modern hardware due to the additional, non-sequential memory accesses.
&lt;/p&gt;

&lt;p&gt;
However, I still didn't finish it, and the code sat for a long time.
&lt;/p&gt;

&lt;p&gt;
But it came up again, I had some more literature references, and
everyone loves a juicy interview story, so let's uncharitably phrase
the interviewer's hypothesis as "jump tables are always faster" and
show that this isn't so.
&lt;/p&gt;

&lt;div id="outline-container-org05413f7" class="outline-2"&gt;
&lt;h2 id="org05413f7"&gt;&lt;span class="section-number-2"&gt;1.&lt;/span&gt; a little background&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-1"&gt;
&lt;p&gt;
What do I mean by jump tables and so on?  Imagine we have code like this:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;&lt;span style="color: #13665F;"&gt;switch&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;packet&lt;span style="color: #a9779c;"&gt;[&lt;/span&gt;9&lt;span style="color: #a9779c;"&gt;]&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;
  &lt;span style="color: #13665F;"&gt;case&lt;/span&gt; PROTO_TCP: &lt;span style="color: #13665F;"&gt;return&lt;/span&gt; handle_tcp&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;packet&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;;
  &lt;span style="color: #13665F;"&gt;case&lt;/span&gt; PROTO_UDP: &lt;span style="color: #13665F;"&gt;return&lt;/span&gt; handle_udp&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;packet&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;;
  &lt;span style="color: #13665F;"&gt;case&lt;/span&gt; PROTO_ICMP: &lt;span style="color: #13665F;"&gt;return&lt;/span&gt; handle_icmp&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;packet&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;;
  ...
&lt;span style="color: #4d9391;"&gt;}&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Let's abstract that a little.  Our experiment will actually generate C
code like this:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;&lt;span style="color: #E36B3F;"&gt;void&lt;/span&gt; &lt;span style="color: #4C7A90;"&gt;dispatch&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;&lt;span style="color: #E36B3F;"&gt;unsigned&lt;/span&gt; &lt;span style="color: #845A84;"&gt;state&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;
  &lt;span style="color: #13665F;"&gt;switch&lt;/span&gt; &lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;state&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt; &lt;span style="color: #a9779c;"&gt;{&lt;/span&gt;
    &lt;span style="color: #13665F;"&gt;case&lt;/span&gt; 0: fn_0&lt;span style="color: #4C7A90;"&gt;()&lt;/span&gt;; &lt;span style="color: #13665F;"&gt;return&lt;/span&gt;;
    &lt;span style="color: #13665F;"&gt;case&lt;/span&gt; 1: fn_1&lt;span style="color: #4C7A90;"&gt;()&lt;/span&gt;; &lt;span style="color: #13665F;"&gt;return&lt;/span&gt;;
    &lt;span style="color: #13665F;"&gt;case&lt;/span&gt; 2: fn_2&lt;span style="color: #4C7A90;"&gt;()&lt;/span&gt;; &lt;span style="color: #13665F;"&gt;return&lt;/span&gt;;
    &lt;span style="color: #13665F;"&gt;case&lt;/span&gt; 3: fn_3&lt;span style="color: #4C7A90;"&gt;()&lt;/span&gt;; &lt;span style="color: #13665F;"&gt;return&lt;/span&gt;;
    &lt;span style="color: #13665F;"&gt;default&lt;/span&gt;: abort&lt;span style="color: #4C7A90;"&gt;()&lt;/span&gt;;
  &lt;span style="color: #a9779c;"&gt;}&lt;/span&gt;
&lt;span style="color: #4d9391;"&gt;}&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
The code the interviewer wanted me to manually transform that into is
approximately this:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;&lt;span style="color: #13665F;"&gt;static&lt;/span&gt; &lt;span style="color: #E36B3F;"&gt;void&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;*&lt;span style="color: #4C7A90;"&gt;vtable&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;[&lt;/span&gt;4&lt;span style="color: #a9779c;"&gt;]&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)(&lt;/span&gt;&lt;span style="color: #E36B3F;"&gt;void&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; = &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt; fn_0, fn_1, fn_2, fn_3 &lt;span style="color: #4d9391;"&gt;}&lt;/span&gt;;

&lt;span style="color: #E36B3F;"&gt;void&lt;/span&gt; &lt;span style="color: #4C7A90;"&gt;dispatch&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;&lt;span style="color: #E36B3F;"&gt;unsigned&lt;/span&gt; &lt;span style="color: #845A84;"&gt;state&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;
  &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;state &amp;gt;= 4&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt; abort&lt;span style="color: #a9779c;"&gt;()&lt;/span&gt;;
  &lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;*vtable&lt;span style="color: #4C7A90;"&gt;[&lt;/span&gt;state&lt;span style="color: #4C7A90;"&gt;]&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;)()&lt;/span&gt;;
&lt;span style="color: #4d9391;"&gt;}&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
That's a jump table.  We'll write it in x86-64 assembly like this:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-asm"&gt;        &lt;span style="color: #13665F;"&gt;.text&lt;/span&gt;
&lt;span style="color: #4C7A90;"&gt;dispatch&lt;/span&gt;:
        &lt;span style="color: #13665F;"&gt;cmp&lt;/span&gt; $4, &lt;span style="color: #845A84;"&gt;%edi&lt;/span&gt;
        &lt;span style="color: #13665F;"&gt;jae&lt;/span&gt; 1f
        &lt;span style="color: #13665F;"&gt;jmp&lt;/span&gt; *vtable&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;, &lt;span style="color: #845A84;"&gt;%edi&lt;/span&gt;, 8&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;
&lt;span style="color: #4C7A90;"&gt;1&lt;/span&gt;:      &lt;span style="color: #13665F;"&gt;call&lt;/span&gt; abort
        &lt;span style="color: #13665F;"&gt;.data&lt;/span&gt;
        &lt;span style="color: #13665F;"&gt;.align&lt;/span&gt; 16
&lt;span style="color: #4C7A90;"&gt;vtable&lt;/span&gt;:
        &lt;span style="color: #13665F;"&gt;.quad&lt;/span&gt; fn_0, fn_1, fn_2, fn_3
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
How else could the compiler compile that &lt;code&gt;switch&lt;/code&gt; statement?  A really
simple (but not always ineffective) way is linear search:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-asm"&gt;&lt;span style="color: #4C7A90;"&gt;dispatch&lt;/span&gt;:
        &lt;span style="color: #13665F;"&gt;cmp&lt;/span&gt; $0, &lt;span style="color: #845A84;"&gt;%edi&lt;/span&gt;
        &lt;span style="color: #13665F;"&gt;je&lt;/span&gt; fn_0
        &lt;span style="color: #13665F;"&gt;cmp&lt;/span&gt; $1, &lt;span style="color: #845A84;"&gt;%edi&lt;/span&gt;
        &lt;span style="color: #13665F;"&gt;je&lt;/span&gt; fn_1
        &lt;span style="color: #13665F;"&gt;cmp&lt;/span&gt; $2, &lt;span style="color: #845A84;"&gt;%edi&lt;/span&gt;
        &lt;span style="color: #13665F;"&gt;je&lt;/span&gt; fn_2
        &lt;span style="color: #13665F;"&gt;cmp&lt;/span&gt; $3, &lt;span style="color: #845A84;"&gt;%edi&lt;/span&gt;
        &lt;span style="color: #13665F;"&gt;je&lt;/span&gt; fn_3
        &lt;span style="color: #13665F;"&gt;call&lt;/span&gt; abort
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
But if we have a lot of cases, or they're widely spread out, we'd
probably want to at least use binary search:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-asm"&gt;&lt;span style="color: #4C7A90;"&gt;dispatch&lt;/span&gt;:
        &lt;span style="color: #13665F;"&gt;cmp&lt;/span&gt; $2, &lt;span style="color: #845A84;"&gt;%edi&lt;/span&gt;
        &lt;span style="color: #13665F;"&gt;jae&lt;/span&gt; .L1
&lt;span style="color: #4C7A90;"&gt;.L0&lt;/span&gt;:    cmp $0, &lt;span style="color: #845A84;"&gt;%edi&lt;/span&gt;
        &lt;span style="color: #13665F;"&gt;je&lt;/span&gt; fn_0
        &lt;span style="color: #13665F;"&gt;jmp&lt;/span&gt; fn_1
&lt;span style="color: #4C7A90;"&gt;.L1&lt;/span&gt;:    je fn_2
        &lt;span style="color: #13665F;"&gt;jmp&lt;/span&gt; fn_3
        &lt;span style="color: #13665F;"&gt;call&lt;/span&gt; abort
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Those are our basic options, although when we look at the literature,
we'll see there is a range of other choices.
&lt;/p&gt;

&lt;p&gt;
What is this good for in general?  We find the pattern of "dispatch to
a handler", often implemented with &lt;code&gt;switch&lt;/code&gt; or pattern matching, all
over the place: in finite state machines (&lt;a href="http://staff.polito.it/riccardo.sisto/cisco/report.pdf"&gt;check out this FSA-based
packet filter&lt;/a&gt;), generic dispatch, protocol handlers, simple
interpreters, and virtual machines.  (Both &lt;a href="http://elixir.free-electrons.com/linux/v4.13.4/source/net/ipv4/ip_input.c#L205"&gt;Linux&lt;/a&gt; and &lt;a href="https://github.com/freebsd/freebsd/blob/4b19e2f4fbc9c52cfe3eea381967ea0b9934042d/sys/netinet/ip_input.c#L437"&gt;FreeBSD&lt;/a&gt; use
table-based dispatch instead of switch for handling IP packets,
although I'd argue this is more about flexibility than speed.)
&lt;/p&gt;

&lt;p&gt;
For example, the canonical implementations of Tcl
(&lt;a href="https://github.com/tcltk/tcl/blob/a7715d59bea55164b5d88d7b74a93b978e51b46e/generic/tclExecute.c#L2417"&gt;generic/tclExecute.c:2417&lt;/a&gt;) and Lua (&lt;a href="https://github.com/lua/lua/blob/master/lvm.c#L793"&gt;lvm.c:793&lt;/a&gt;), as well as the most
portable forms of Python's and &lt;a href="https://github.com/ruby/ruby/blob/c08f7b80889b531865e74bc5f573df8fa27f2088/vm_exec.h#L145-L147"&gt;Ruby&lt;/a&gt;'s bytecode interpreters, use
switch-based dispatch.
&lt;/p&gt;

&lt;p&gt;
&lt;a href="https://en.wikipedia.org/wiki/Threaded_code"&gt;Threaded code&lt;/a&gt; is more popular for interpreters these days; see Ertl
and Gregg, &lt;a href="https://www.jilp.org/vol5/v5paper12.pdf"&gt;The Structure and Performance of Efficient Interpreters&lt;/a&gt;, as
well as this interesting comment about the efficiency of threaded code
versus what the compiler generates for switch in CPython's
&lt;a href="https://github.com/python/cpython/blob/ff8ad0a576c6cf375e68276e614e551b3b759254/Python/ceval.c#L825"&gt;Python/ceval.c:825&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;
Threaded code is basically where the end of each handler dispatches to
the next one; this makes sense in a VM where you have the whole
program, but not in a protocol handler where you probably don't have
the next packet.
&lt;/p&gt;
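
&lt;p&gt;
As a rough illustration of "each handler dispatches to the next one",
here is a small sketch of a toy accumulator machine.  Everything in it
(the instruction layout, the opcode names, &lt;code&gt;run&lt;/code&gt;) is hypothetical,
and it uses plain function pointers rather than GCC's labels-as-values
extension that real interpreters typically rely on, so treat it as a
portable approximation of the idea rather than how any of the
interpreters above do it.  It also assumes short programs, since
without tail-call optimization each dispatch consumes a stack frame.
&lt;/p&gt;

```c
/* A toy accumulator machine; every name here is hypothetical. */
struct insn;
typedef void handler(const struct insn *pc, int *acc);
struct insn { handler *fn; int arg; };

/* Each handler ends by dispatching to the next instruction itself,
 * instead of returning to a central switch or table lookup. */
static void op_add(const struct insn *pc, int *acc) {
    *acc += pc[0].arg;
    pc[1].fn(pc + 1, acc);
}

static void op_mul(const struct insn *pc, int *acc) {
    *acc *= pc[0].arg;
    pc[1].fn(pc + 1, acc);
}

static void op_halt(const struct insn *pc, int *acc) {
    (void)pc;   /* end of program: nothing left to dispatch */
    (void)acc;
}

/* Start the chain at the first instruction and return the accumulator. */
int run(const struct insn *program) {
    int acc[1] = { 0 };
    program[0].fn(program, acc);
    return acc[0];
}
```

&lt;p&gt;
Because every handler has its own indirect branch, the branch predictor
sees one dispatch site per opcode instead of a single shared one.
&lt;/p&gt;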

&lt;p&gt;
Threaded code should have better branch prediction behavior than a
jump table with a single dispatch point (for a nice analysis of this,
see &lt;a href="http://eli.thegreenplace.net/2012/07/12/computed-goto-for-efficient-dispatch-tables/"&gt;Eli Bendersky's Computed goto for efficient dispatch tables&lt;/a&gt;),
although &lt;a href="http://meseec.ce.rit.edu/eecc722-fall2001/papers/branch-prediction/4/indir_isca24.pdf"&gt;indirect branch prediction&lt;/a&gt; should help even the field.
(&lt;b&gt;Update:&lt;/b&gt; See &lt;a href="https://hal.inria.fr/hal-01100647/document"&gt;Branch Prediction and the Performance of Interpreters -
Don’t Trust Folklore&lt;/a&gt;; hardware has already caught up.)
&lt;/p&gt;

&lt;p&gt;
But we're getting ahead of ourselves.  Can we find any support in the
literature for our argument?  (Ok, we could look at the GCC source,
but I promise, the literature is a better place to start in this
case.)
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org80c797c" class="outline-2"&gt;
&lt;h2 id="org80c797c"&gt;&lt;span class="section-number-2"&gt;2.&lt;/span&gt; the early literature&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-2"&gt;
&lt;p&gt;
Looking around, we quickly trace back to Arthur Sale's &lt;a href="http://eprints.utas.edu.au/126/1/CaseStmts.pdf"&gt;The
Implementation of Case Statements in Pascal&lt;/a&gt; (1981), which is
interesting just for the details on how simplistic many compilers of
the time were, often because they had to emit code immediately.
&lt;/p&gt;

&lt;p&gt;
Sale points out that most Pascal compilers at the time would produce
simple jump tables from case statements, and proposes binary search
instead.
&lt;/p&gt;

&lt;p&gt;
Robert Bernstein (&lt;a href="http://onlinelibrary.wiley.com/doi/10.1002/spe.4380151009/pdf"&gt;Producing good code for the case statement&lt;/a&gt; (1985);
paywalled, sorry) goes into some gritty details; he points out that
binary search usually takes one more comparison than linear search per
case, and asserts that binary search is faster than linear search when
there are at least 4 case items.  We'll revisit that further on.
&lt;/p&gt;

&lt;p&gt;
Bernstein talks about some of the latitude we have in optimizing these
search trees; for example, that paths which lead to traps can have the
most instructions, since they're not likely to be repeatedly executed,
and we can decide whether to put the jump-above / jump-equals first.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-orgec017df" class="outline-2"&gt;
&lt;h2 id="orgec017df"&gt;&lt;span class="section-number-2"&gt;3.&lt;/span&gt; a dialogue with the compiler&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-3"&gt;
&lt;p&gt;
How do we decide some of these things?  Bernstein says:
&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;
Faster executing code can be produced if the probabilities that the
case selector takes on the case item values are known. These may be
known as a result of trace information that is automatically supplied
to the compiler, or perhaps as an extra-lingual mechanism pertaining
to the case statement. In step 5, the linear search can be arranged
in decreasing probabilities, and a Huffman search rather than a binary
search can be used.
&lt;/p&gt;
&lt;/blockquote&gt;
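
&lt;p&gt;
As a hand-written illustration of a linear search "arranged in
decreasing probabilities", here is a minimal sketch.  The assumption
that TCP is the most common case, followed by UDP and then ICMP, is
mine and purely hypothetical, as are the function name and the
&lt;code&gt;comparisons&lt;/code&gt; counter, which is instrumentation for the sketch only.
&lt;/p&gt;

```c
/* IP protocol numbers. */
enum { PROTO_ICMP = 1, PROTO_TCP = 6, PROTO_UDP = 17 };

static unsigned comparisons;   /* instrumentation for the sketch only */

/* Test the cases in (assumed) decreasing order of probability.
 * Returns the index of the handler that would run, or -1 if unknown. */
int dispatch_by_frequency(unsigned proto) {
    comparisons++;
    if (proto == PROTO_TCP) return 0;    /* assumed most common */
    comparisons++;
    if (proto == PROTO_UDP) return 1;
    comparisons++;
    if (proto == PROTO_ICMP) return 2;   /* assumed least common */
    return -1;
}
```

&lt;p&gt;
With a hypothetical traffic mix of 90% TCP, 8% UDP and 2% ICMP, this
ordering costs 0.9 * 1 + 0.08 * 2 + 0.02 * 3 = 1.12 comparisons on
average, against 0.02 * 1 + 0.9 * 2 + 0.08 * 3 = 2.06 when the cases
are tested in numeric order.
&lt;/p&gt;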

&lt;p&gt;
How can you supply this information to the compiler?  I found Dan
(djb) Bernstein's &lt;a href="https://cr.yp.to/talks/2015.04.16/slides-djb-20150416-a4.pdf"&gt;The Death of Optimizing Compilers&lt;/a&gt; made concrete
everything I had been thinking for a while about the need for a
&lt;i&gt;dialogue&lt;/i&gt; between the programmer and the compiler.
&lt;/p&gt;

&lt;p&gt;
Right now, we can supply a bit of information ahead of time; for
example, in GCC, we can use &lt;a href="https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fexpect"&gt;&lt;code&gt;__builtin_expect&lt;/code&gt;&lt;/a&gt; (which you might be
familiar with from the Linux kernel's &lt;a href="http://elixir.free-electrons.com/linux/latest/source/include/linux/compiler.h#L137"&gt;&lt;code&gt;likely&lt;/code&gt;/&lt;code&gt;unlikely&lt;/code&gt; macros&lt;/a&gt;).
LLVM has &lt;a href="https://llvm.org/docs/BranchWeightMetadata.html"&gt;branch weight metadata&lt;/a&gt;, although it looks like clang's
interface to this is only the same limited &lt;code&gt;__builtin_expect&lt;/code&gt;
interface as GCC.
&lt;/p&gt;
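
&lt;p&gt;
For readers who haven't seen it, a minimal sketch of this kind of hint
follows.  The macros are modelled on the kernel's; the counters and
&lt;code&gt;count_packet&lt;/code&gt; are hypothetical, as is the assumption that TCP
dominates the traffic.  The builtin only influences code layout and
never changes the value of the condition, and it is specific to GCC
and clang.
&lt;/p&gt;

```c
/* GCC and clang only.  __builtin_expect(exp, c) evaluates to exp and
 * hints that exp will usually equal c; it never changes the result. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

enum { PROTO_TCP = 6 };   /* IP protocol number for TCP */

/* Hypothetical counters standing in for real packet handlers. */
static unsigned tcp_packets, other_packets;

void count_packet(unsigned proto) {
    if (likely(proto == PROTO_TCP)) {
        tcp_packets++;      /* hinted as the hot path */
        return;
    }
    other_packets++;        /* hinted as the cold path */
}
```

&lt;p&gt;
Note how little this says: one case is "expected", and that is all.
There is no way to express a full distribution over the cases of a
&lt;code&gt;switch&lt;/code&gt; this way.
&lt;/p&gt;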

&lt;p&gt;
We can do much better with &lt;a href="https://en.wikipedia.org/wiki/Profile-guided_optimization"&gt;profile-guided optimization (PGO)&lt;/a&gt; /
feedback-driven optimization (FDO); see &lt;a href="https://gcc.gnu.org/wiki/AutoFDO"&gt;AutoFDO in GCC&lt;/a&gt;,
&lt;a href="https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html#Instrumentation-Options"&gt;&lt;code&gt;-fprofile-arcs&lt;/code&gt;&lt;/a&gt;, &lt;a href="https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html"&gt;&lt;code&gt;-fbranch-probabilities&lt;/code&gt;&lt;/a&gt; in GCC, et cetera, but
these (like the benefits of JIT compilers) require us to run the
program with representative workloads.  This might be better for some
cases, but it is a bit unsatisfying if we want to communicate, in the
source code (to both reader and compiler), something we know ahead of
time about the distribution of cases.
&lt;/p&gt;

&lt;p&gt;
(Outside the scope of this article, but interesting: branch prediction
tables are power hungry; we could want to optimize a specific switch
for power instead of performance.  This is another application of this
kind of dialogue with the compiler.)
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-orgfdbc629" class="outline-2"&gt;
&lt;h2 id="orgfdbc629"&gt;&lt;span class="section-number-2"&gt;4.&lt;/span&gt; later literature&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-4"&gt;
&lt;p&gt;
Through the 90s, there's a trickle of papers: Kannan and Proebsting
&lt;a href="https://pdfs.semanticscholar.org/3110/91fb9fb9d38f7b76e768a603c02acc799fe0.pdf"&gt;issue an important practical correction&lt;/a&gt; to Bernstein; H.G. Dietz's
&lt;a href="https://pdfs.semanticscholar.org/fad6/7476c2f2a6e9995ef051ff17763e1488472f.pdf"&gt;Coding Multiway Branches Using Customized Hash functions&lt;/a&gt; (1992)
proposes a hashing approach to the problem; Erlingsson et al.'s
&lt;a href="https://www.cs.cornell.edu/home/ulfar/mrst.html"&gt;Efficient Multiway Radix Search Trees&lt;/a&gt; (1996) presents a better
approach for sparse sets.
&lt;/p&gt;

&lt;p&gt;
Already it's becoming clear that &lt;code&gt;switch&lt;/code&gt; generation isn't cut and
dry, and then through the 2000s Roger Sayle publishes several papers,
culminating in &lt;a href="http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=968AE756567863243AC7B1728915861A?doi=10.1.1.602.1875&amp;amp;rep=rep1&amp;amp;type=pdf"&gt;A Superoptimizer Analysis of Multiway Branch Code
Generation&lt;/a&gt;, which almost saves us from having to run any experiment at
all.  It contains citations to lots of related work (much of it by
Sayle himself), but more importantly, a great summary of techniques,
and benchmarks that pretty much prove our point for us already.
&lt;/p&gt;

&lt;p&gt;
Sayle demonstrates that GCC can produce a wide range of code for
&lt;code&gt;switch&lt;/code&gt;, suitable to many different situations, and even suggests
that the compiler should detect attempts to manually implement
&lt;code&gt;switch&lt;/code&gt; (usually for performance considerations that were only
relevant on legacy systems) and undo them.&lt;sup&gt;&lt;a id="fnr.1" class="footref" href="#fn.1" role="doc-backlink"&gt;1&lt;/a&gt;&lt;/sup&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-orgb4e5002" class="outline-2"&gt;
&lt;h2 id="orgb4e5002"&gt;&lt;span class="section-number-2"&gt;5.&lt;/span&gt; an experiment&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-5"&gt;
&lt;p&gt;
However, as I am a big believer in &lt;b&gt;rhetorical benchmarks&lt;/b&gt;, we might
as well construct a small experiment that definitively proves our
(admittedly ungracious and cherry-picked) point.
&lt;/p&gt;

&lt;p&gt;
I wrote some code for this a few years ago, and then my ambitions for
what should be tested grew beyond measure, leading to nothing getting
done.  So, in restoring this, I decided to explicitly allow myself to
release some terrible code full of measurement errors: after all, this
is a &lt;i&gt;rhetorical&lt;/i&gt; benchmark (and what benchmark isn't?) — it doesn't
need to be correct, it only needs to prove our point.&lt;sup&gt;&lt;a id="fnr.2" class="footref" href="#fn.2" role="doc-backlink"&gt;2&lt;/a&gt;&lt;/sup&gt;
&lt;/p&gt;

&lt;p&gt;
The important thing for making our point is that we can find branch
probabilities that support almost any implementation choice.  For
example, an IP stack protocol handler might encounter TCP packets 70%
of the time, UDP packets 25% of the time, and other stuff 5% of the
time.  (Note: not an actual measurement.)  In this case, if we can
bias our dispatch code towards TCP and UDP packets, we're likely to
get a win.
&lt;/p&gt;
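
&lt;p&gt;
As a rough illustration of what such a bias could look like when
written by hand, here is a minimal sketch.  The &lt;code&gt;counters&lt;/code&gt;
structure and the &lt;code&gt;dispatch&lt;/code&gt; function are hypothetical, the
ordering assumes the made-up 70/25/5 split above, and
&lt;code&gt;__builtin_expect&lt;/code&gt; is a GCC/Clang extension rather than
standard C.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;#include &amp;lt;stdint.h&amp;gt;

/* IANA-assigned IP protocol numbers. */
enum { PROTO_ICMP = 1, PROTO_TCP = 6, PROTO_UDP = 17 };

/* Hypothetical per-protocol counters standing in for real handlers. */
struct counters { unsigned tcp, udp, icmp, other; };

/* Test the likely protocols first, in order of assumed frequency, and
 * leave the rare ones to whatever the compiler generates for the switch. */
void dispatch(struct counters *c, uint8_t proto)
{
    if (__builtin_expect(proto == PROTO_TCP, 1)) { c-&amp;gt;tcp++; return; }
    if (proto == PROTO_UDP) { c-&amp;gt;udp++; return; }
    switch (proto) {
    case PROTO_ICMP: c-&amp;gt;icmp++; break;
    default:         c-&amp;gt;other++; break;
    }
}
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Whether this beats a plain &lt;code&gt;switch&lt;/code&gt; depends on the compiler, the
machine, and how closely real traffic matches the assumed distribution.
&lt;/p&gt;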

&lt;p&gt;
For our experiment, we'll choose a distribution like this (but even more
unfair), where 80% of the time we take the first case, and 20% of the
time we take a random case.  So we run (on a machine quite unsuitable
for benchmarking):
&lt;/p&gt;

&lt;pre class="example" id="orgb610f89"&gt;
make run-experiment CALL_DISTRIBUTION=pareto N_DISPATCHES=5000000 N_RUNS=5 N_ENTRIES=256
&lt;/pre&gt;

&lt;p&gt;
and cherry-pick some spurious but convincing results:
&lt;/p&gt;

&lt;pre class="example" id="orgd9ce998"&gt;
Performance counter stats for './x86_64-binary 5000000' (5 runs):

    6,883,819,114      cycles                    #    2.090 GHz                      ( +-  0.43% )
      232,004,486      instructions              #    0.03  insns per cycle          ( +-  0.06% )
       56,828,213      branches                  #   17.257 M/sec                    ( +-  0.04% )
        1,262,892      branch-misses             #    2.22% of all branches          ( +-  0.05% )

      3.299025345 seconds time elapsed                                          ( +-  0.43% )

Performance counter stats for './x86_64-vtable 5000000' (5 runs):

    7,709,225,443      cycles                    #    2.087 GHz                      ( +-  0.95% )
      217,283,422      instructions              #    0.03  insns per cycle          ( +-  0.03% )
       51,631,368      branches                  #   13.976 M/sec                    ( +-  0.03% )
          957,553      branch-misses             #    1.85% of all branches          ( +-  0.10% )

      3.706410106 seconds time elapsed                                          ( +-  1.04% )
&lt;/pre&gt;

&lt;p&gt;
Which allows us to claim what came to us in &lt;a href="https://en.wikipedia.org/wiki/L%2527esprit_de_l%2527escalier"&gt;l'esprit de l'escalier&lt;/a&gt;
after that ill-fated interview: explicitly-constructed jump tables
aren't always faster than what the compiler generates.
&lt;/p&gt;

&lt;p&gt;
(The code is at &lt;a href="https://github.com/tokenrove/dispatch-comparison"&gt;https://github.com/tokenrove/dispatch-comparison&lt;/a&gt;, but
I wouldn't recommend using it for anything.)
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-orgcd1106c" class="outline-2"&gt;
&lt;h2 id="orgcd1106c"&gt;&lt;span class="section-number-2"&gt;6.&lt;/span&gt; further work&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-6"&gt;
&lt;p&gt;
Ideally, we'd want to run a proper experiment on this, one that tries all
sorts of combinations, with proper cache-flushing&lt;sup&gt;&lt;a id="fnr.3" class="footref" href="#fn.3" role="doc-backlink"&gt;3&lt;/a&gt;&lt;/sup&gt;, variable
amounts of work in the "handlers", et cetera.  Maybe later.  I'm
explicitly calling this experiment cheating and moving on.
&lt;/p&gt;

&lt;p&gt;
I didn't talk about the effects of these techniques on instruction
cache usage, which is actually one of the most interesting factors.
Unfortunately there are a lot of different patterns to talk about:
when the handlers are large enough to boot the dispatch code out of
I-cache, versus tight loops that are just dispatching all the time.
&lt;/p&gt;

&lt;p&gt;
There are also a lot of ways we can tweak search:
&lt;/p&gt;

&lt;ul class="org-ul"&gt;
&lt;li&gt;Knuth works through the math for the cost of linear search when
the data is carefully ordered by probability; see TAOCP volume 3,
section 6.1.&lt;/li&gt;
&lt;li&gt;It's good to know at what point binary search becomes faster than
linear search on your machine, &lt;a href="https://schani.wordpress.com/2010/04/30/linear-vs-binary-search/"&gt;as this experiment demonstrates&lt;/a&gt; (and
then Paul Khuong &lt;a href="https://www.pvk.ca/Blog/2012/07/03/binary-search-star-eliminates-star-branch-mispredictions/"&gt;argues that binary search is basically always
better&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;
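
&lt;p&gt;
To make the first point concrete, here is a small sketch of the usual
expected-cost sum for a successful linear search: the key tested in
position \(i\) (counting from 1) costs \(i\) comparisons, weighted by its
probability.  The function name is hypothetical, and the probabilities
in the note below are the made-up 70/25/5 split from earlier, not
measurements.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;#include &amp;lt;stddef.h&amp;gt;

/* Expected number of comparisons for a successful linear search when
 * the key tested at index i (0-based) is the target with probability
 * p[i]: finding it there costs i + 1 comparisons. */
double expected_comparisons(const double *p, size_t n)
{
    double cost = 0.0;
    for (size_t i = 0; i &amp;lt; n; i++)
        cost += (double)(i + 1) * p[i];
    return cost;
}
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
With probabilities 0.70, 0.25, 0.05 tested in that order, this comes to
about 1.35 comparisons per lookup; testing them in the reverse order
costs about 2.65.
&lt;/p&gt;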

&lt;p&gt;
Leveraging some of what we've seen so far, could we better exploit
branch prediction by combining optimal decision tree dispatch with
threaded code?  Suppose the end of each VM instruction were rewritten
to perform the first \(n\) levels of binary search, so the highest level
branches would be better predicted (if instruction dispatch is close
to a Markov process, anyway).&lt;sup&gt;&lt;a id="fnr.4" class="footref" href="#fn.4" role="doc-backlink"&gt;4&lt;/a&gt;&lt;/sup&gt;  See also Baer, &lt;a href="https://arxiv.org/pdf/cs/0604016.pdf"&gt;On
Conditional Branches in Optimal Search Trees&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;
But I think I'll avoid exploring this further until the next querulous
job interview.&lt;sup&gt;&lt;a id="fnr.5" class="footref" href="#fn.5" role="doc-backlink"&gt;5&lt;/a&gt;&lt;/sup&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id="footnotes"&gt;
&lt;h2 class="footnotes"&gt;Footnotes: &lt;/h2&gt;
&lt;div id="text-footnotes"&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.1" class="footnum" href="#fnr.1" role="doc-backlink"&gt;1&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
Tangent: if we see &lt;code&gt;switch&lt;/code&gt; as just an anemic form
of pattern matching, there's a lot more of interest in the literature,
but that's another rabbithole.
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.2" class="footnum" href="#fnr.2" role="doc-backlink"&gt;2&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
For example, I'm totally ignoring the advice from
Kalibera, Jones: &lt;a href="https://kar.kent.ac.uk/33611/7/paper.pdf"&gt;Rigorous Benchmarking in Reasonable Time&lt;/a&gt;.  If luck
prevails, I'll write a bit about common easily-made benchmarking
errors and the role of rhetorical benchmarks soon.
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.3" class="footnum" href="#fnr.3" role="doc-backlink"&gt;3&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
See Whaley and Castaldo, &lt;a href="http://www.csc.lsu.edu/~whaley/papers/timing_SPE08.pdf"&gt;Achieving accurate and
context-sensitive timing for code optimization&lt;/a&gt;.
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.4" class="footnum" href="#fnr.4" role="doc-backlink"&gt;4&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
NB: a binary search sorted by weights is precisely
a Huffman search.
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.5" class="footnum" href="#fnr.5" role="doc-backlink"&gt;5&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
Many people helped me make this article better.  Thanks!
I'm not sure how to thank you all.  Of course the mistakes remain my
own.
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;


&lt;/div&gt;
&lt;/div&gt;</content></entry><entry><title>When is an Erlang iolist an iovec?</title><link href='http://cipht.net/2017/01/15/when-is-an-iolist-an-iovec.html'/><updated>2017-01-15T03:30:00+0000</updated><id>http://cipht.net/2017/01/15/when-is-an-iolist-an-iovec</id><content type='html'>&lt;p&gt;
While trying to improve the performance of JSON encoding in an Erlang
application last year, I came to wonder about the different
representations one can use when writing data to disk or sending it
over a socket, and how they map to the OS's underlying facilities.
&lt;/p&gt;

&lt;p&gt;
In Erlang, there is a convention of deferring the creation of large
binaries, whereby functions accept a list called an iolist, composed
of binaries, characters (integers), and other, nested, iolists.  This
means that you don't need to waste time and space copying a bunch of
pieces of a message into a single buffer before writing it out to a
socket, for example.
&lt;/p&gt;

&lt;p&gt;
If you've done much network programming, that concept probably sounds
familiar; it's like a less-restrictive version of the &lt;code&gt;struct iovec&lt;/code&gt;
scatter-gather structures used in many Unix IO calls (see &lt;a href="https://www.gnu.org/software/libc/manual/html_node/Scatter_002dGather.html"&gt;the glibc
documentation&lt;/a&gt; for example).
&lt;/p&gt;
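
&lt;p&gt;
For readers who haven't used it, a minimal sketch of the scatter-gather
style on a POSIX system follows.  The function name and the buffers are
made up for illustration; &lt;code&gt;writev(2)&lt;/code&gt; and &lt;code&gt;struct iovec&lt;/code&gt;
are the real interface.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;#include &amp;lt;string.h&amp;gt;
#include &amp;lt;sys/types.h&amp;gt;
#include &amp;lt;sys/uio.h&amp;gt;

/* Write three separate buffers to fd with a single writev(2) call,
 * without first copying them into one contiguous buffer.
 * Returns what writev returns: bytes written, or -1 on error. */
ssize_t write_greeting(int fd)
{
    static char a[] = "hello";
    static char b[] = ", ";
    static char c[] = "world\n";
    struct iovec iov[3] = {
        { .iov_base = a, .iov_len = strlen(a) },
        { .iov_base = b, .iov_len = strlen(b) },
        { .iov_base = c, .iov_len = strlen(c) },
    };
    return writev(fd, iov, 3);
}
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Each &lt;code&gt;iovec&lt;/code&gt; entry is just a base pointer and a length, which is
why a flat list of binaries is the shape that maps onto it most directly.
&lt;/p&gt;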

&lt;p&gt;
Since there's an obvious connection, I asked myself the titular
question for this post: when does an Erlang iolist map most closely to
an iovec?  Is there a layout that minimizes copying and allocation?
&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;
&lt;b&gt;tl;dr:&lt;/b&gt; Erlang does a pretty good job of this, so don't worry about
it and just use iolists.  A list of large refcounted binaries maps
closely to an iovec with minimal extra copying.  Also, some drivers
don't even support iolists!
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div id="outline-container-orgcddb2fa" class="outline-2"&gt;
&lt;h2 id="orgcddb2fa"&gt;&lt;span class="section-number-2"&gt;1.&lt;/span&gt; Establishing the relationship&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-1"&gt;
&lt;p&gt;
I had previously noticed the &lt;code&gt;SysIOVec&lt;/code&gt; structure in the ERTS source,
which &lt;code&gt;erts/emulator/sys/unix/driver_int.h&lt;/code&gt; defines as:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;&lt;span style="color: #13665F;"&gt;typedef&lt;/span&gt; &lt;span style="color: #13665F;"&gt;struct&lt;/span&gt; &lt;span style="color: #E36B3F;"&gt;iovec&lt;/span&gt; &lt;span style="color: #E36B3F;"&gt;SysIOVec&lt;/span&gt;;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
(As I was fact-checking this article, I noticed that &lt;a href="http://www.erlang.org/doc/apps/erts/alt_dist.html"&gt;a chapter of the
ERTS reference manual&lt;/a&gt; is explicit in making this connection.  People
often complain about the Erlang documentation, but part of the problem
is just that you might never think to read a chapter called "How to
implement an alternative carrier for the Erlang distribution".)
&lt;/p&gt;

&lt;p&gt;
&lt;code&gt;ErlIOVec&lt;/code&gt;, which contains a &lt;code&gt;SysIOVec&lt;/code&gt;, is defined like this:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;&lt;span style="color: #13665F;"&gt;typedef&lt;/span&gt; &lt;span style="color: #13665F;"&gt;struct&lt;/span&gt; &lt;span style="color: #E36B3F;"&gt;erl_io_vec&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;
    &lt;span style="color: #E36B3F;"&gt;int&lt;/span&gt; &lt;span style="color: #845A84;"&gt;vsize&lt;/span&gt;;                  &lt;span style="color: #7c878a;"&gt;/* &lt;/span&gt;&lt;span style="color: #7c878a;"&gt;length of vectors&lt;/span&gt;&lt;span style="color: #7c878a;"&gt; */&lt;/span&gt;
    &lt;span style="color: #E36B3F;"&gt;ErlDrvSizeT&lt;/span&gt; &lt;span style="color: #845A84;"&gt;size&lt;/span&gt;;           &lt;span style="color: #7c878a;"&gt;/* &lt;/span&gt;&lt;span style="color: #7c878a;"&gt;total size in bytes&lt;/span&gt;&lt;span style="color: #7c878a;"&gt; */&lt;/span&gt;
    &lt;span style="color: #E36B3F;"&gt;SysIOVec&lt;/span&gt;* &lt;span style="color: #845A84;"&gt;iov&lt;/span&gt;;
    &lt;span style="color: #E36B3F;"&gt;ErlDrvBinary&lt;/span&gt;** &lt;span style="color: #845A84;"&gt;binv&lt;/span&gt;;
&lt;span style="color: #4d9391;"&gt;}&lt;/span&gt; &lt;span style="color: #E36B3F;"&gt;ErlIOVec&lt;/span&gt;;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
There's &lt;a href="http://www1.erlang.org/pipermail/erlang-questions/2002-October/005859.html"&gt;an &lt;code&gt;erlang-questions&lt;/code&gt; mailing list post&lt;/a&gt; by Scott Fritchie that
points to the functions &lt;code&gt;io_list_to_vec()&lt;/code&gt; and &lt;code&gt;io_list_vec_len()&lt;/code&gt;.
Let's take a look at them.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org09392d6" class="outline-2"&gt;
&lt;h2 id="org09392d6"&gt;&lt;span class="section-number-2"&gt;2.&lt;/span&gt; Reading the source&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-2"&gt;
&lt;p&gt;
Here's the beginning of &lt;code&gt;io_list_vec_len&lt;/code&gt;: (in
&lt;a href="https://github.com/erlang/otp/blob/e4f93595aba76c2eda2d2efef175ea9d72ee5d29/erts/emulator/beam/io.c"&gt;&lt;code&gt;erts/emulator/beam/io.c&lt;/code&gt;&lt;/a&gt;)
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;&lt;span style="color: #7c878a;"&gt;/* &lt;/span&gt;
&lt;span style="color: #7c878a;"&gt; * Returns 0 if successful and a non-zero value otherwise.&lt;/span&gt;
&lt;span style="color: #7c878a;"&gt; *&lt;/span&gt;
&lt;span style="color: #7c878a;"&gt; * Return values through pointers:&lt;/span&gt;
&lt;span style="color: #7c878a;"&gt; *    *vsize      - SysIOVec size needed for a writev&lt;/span&gt;
&lt;span style="color: #7c878a;"&gt; *    *csize      - Number of bytes not in binary (in the common binary)&lt;/span&gt;
&lt;span style="color: #7c878a;"&gt; *    *pvsize     - SysIOVec size needed if packing small binaries&lt;/span&gt;
&lt;span style="color: #7c878a;"&gt; *    *pcsize     - Number of bytes in the common binary if packing&lt;/span&gt;
&lt;span style="color: #7c878a;"&gt; *    *total_size - Total size of iolist in bytes&lt;/span&gt;
&lt;span style="color: #7c878a;"&gt; */&lt;/span&gt;

&lt;span style="color: #13665F;"&gt;static&lt;/span&gt; &lt;span style="color: #E36B3F;"&gt;int&lt;/span&gt; 
&lt;span style="color: #4C7A90;"&gt;io_list_vec_len&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;&lt;span style="color: #E36B3F;"&gt;Eterm&lt;/span&gt; &lt;span style="color: #845A84;"&gt;obj&lt;/span&gt;, &lt;span style="color: #E36B3F;"&gt;int&lt;/span&gt;* &lt;span style="color: #845A84;"&gt;vsize&lt;/span&gt;, &lt;span style="color: #E36B3F;"&gt;Uint&lt;/span&gt;* &lt;span style="color: #845A84;"&gt;csize&lt;/span&gt;,
                &lt;span style="color: #E36B3F;"&gt;Uint&lt;/span&gt;* &lt;span style="color: #845A84;"&gt;pvsize&lt;/span&gt;, &lt;span style="color: #E36B3F;"&gt;Uint&lt;/span&gt;* &lt;span style="color: #845A84;"&gt;pcsize&lt;/span&gt;,
                &lt;span style="color: #E36B3F;"&gt;ErlDrvSizeT&lt;/span&gt;* &lt;span style="color: #845A84;"&gt;total_size&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;
&lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
This is called by &lt;code&gt;erts_port_output&lt;/code&gt;, and a common binary gets
allocated based on the &lt;code&gt;csize&lt;/code&gt; returned.  In our thought experiment
here, we want to avoid allocating this common binary, and we would
like to make sure &lt;code&gt;vsize&lt;/code&gt; is minimized and that we can efficiently
pack the iovec without allocation and copying.
&lt;/p&gt;

&lt;p&gt;
Note that we figure out both packed and unpacked sizes.  Basically, if
the unpacked &lt;code&gt;vsize&lt;/code&gt; is small enough, we won't pack, since that means
less work (as we'll see below).
&lt;/p&gt;

&lt;p&gt;
It's worth noting here that copying things into a common binary,
although it will cause an allocation, can be much faster than forcing
the OS to work with a long iovec.
&lt;/p&gt;

&lt;p&gt;
&lt;code&gt;io_list_vec_len()&lt;/code&gt; continues:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;    &lt;span style="color: #4C7A90;"&gt;DECLARE_ESTACK&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;s&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;;
    &lt;span style="color: #E36B3F;"&gt;Eterm&lt;/span&gt;* &lt;span style="color: #845A84;"&gt;objp&lt;/span&gt;;
    &lt;span style="color: #E36B3F;"&gt;Uint&lt;/span&gt; &lt;span style="color: #845A84;"&gt;v_size&lt;/span&gt; = 0;
    &lt;span style="color: #E36B3F;"&gt;Uint&lt;/span&gt; &lt;span style="color: #845A84;"&gt;c_size&lt;/span&gt; = 0;
    &lt;span style="color: #E36B3F;"&gt;Uint&lt;/span&gt; &lt;span style="color: #845A84;"&gt;b_size&lt;/span&gt; = 0;
    &lt;span style="color: #E36B3F;"&gt;Uint&lt;/span&gt; &lt;span style="color: #845A84;"&gt;in_clist&lt;/span&gt; = 0;
    &lt;span style="color: #E36B3F;"&gt;Uint&lt;/span&gt; &lt;span style="color: #845A84;"&gt;p_v_size&lt;/span&gt; = 0;
    &lt;span style="color: #E36B3F;"&gt;Uint&lt;/span&gt; &lt;span style="color: #845A84;"&gt;p_c_size&lt;/span&gt; = 0;
    &lt;span style="color: #E36B3F;"&gt;Uint&lt;/span&gt; &lt;span style="color: #845A84;"&gt;p_in_clist&lt;/span&gt; = 0;
    &lt;span style="color: #E36B3F;"&gt;Uint&lt;/span&gt; &lt;span style="color: #845A84;"&gt;total&lt;/span&gt;; &lt;span style="color: #7c878a;"&gt;/* &lt;/span&gt;&lt;span style="color: #7c878a;"&gt;Uint due to halfword emulator&lt;/span&gt;&lt;span style="color: #7c878a;"&gt; */&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
&lt;code&gt;erts/emulator/beam/global.h&lt;/code&gt; says of &lt;code&gt;DECLARE_ESTACK&lt;/code&gt;: "Here is an
implementation of a lightweiht stack." (sic) It basically lays out a
little array of terms, initially on the stack (up to 16 items), which
can be migrated to the heap if the stack grows too much.
&lt;/p&gt;
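
&lt;p&gt;
The general idea can be sketched as follows.  This is not the ERTS
macro set: every name here is hypothetical, the elements are plain
&lt;code&gt;int&lt;/code&gt; rather than terms, and the doubling growth policy is an
assumption.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;#include &amp;lt;stdlib.h&amp;gt;
#include &amp;lt;string.h&amp;gt;

#define SMALL_STACK_INLINE 16   /* inline capacity, as in the description above */

/* Hypothetical names throughout; a sketch of the idea only. */
struct small_stack {
    int *items;                 /* points at inline_buf or at a heap block */
    size_t len, cap;
    int inline_buf[SMALL_STACK_INLINE];
};

void small_stack_init(struct small_stack *s)
{
    s-&amp;gt;items = s-&amp;gt;inline_buf;
    s-&amp;gt;len = 0;
    s-&amp;gt;cap = SMALL_STACK_INLINE;
}

/* Returns 0 on success, -1 if the heap allocation fails. */
int small_stack_push(struct small_stack *s, int v)
{
    if (s-&amp;gt;len == s-&amp;gt;cap) {
        size_t ncap = s-&amp;gt;cap * 2;
        int *p;
        if (s-&amp;gt;items == s-&amp;gt;inline_buf) {
            /* First overflow: migrate the inline contents to the heap. */
            p = malloc(ncap * sizeof *p);
            if (p) memcpy(p, s-&amp;gt;inline_buf, s-&amp;gt;len * sizeof *p);
        } else {
            p = realloc(s-&amp;gt;items, ncap * sizeof *p);
        }
        if (!p) return -1;
        s-&amp;gt;items = p;
        s-&amp;gt;cap = ncap;
    }
    s-&amp;gt;items[s-&amp;gt;len++] = v;
    return 0;
}

/* Caller must check that s-&amp;gt;len is non-zero first. */
int small_stack_pop(struct small_stack *s) { return s-&amp;gt;items[--s-&amp;gt;len]; }

void small_stack_destroy(struct small_stack *s)
{
    if (s-&amp;gt;items != s-&amp;gt;inline_buf) free(s-&amp;gt;items);
}
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
The payoff is that shallow traversals never touch the allocator at all;
only unusually deep ones pay for a heap block.
&lt;/p&gt;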

&lt;p&gt;
(Reading the ERTS allocator code is an easy way to spend a lot of
time; it's pretty convoluted, and figuring out exactly which allocator
ends up doing what is non-obvious on one's first few encounters with
that code.)
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;    &lt;span style="color: #13665F;"&gt;goto&lt;/span&gt; &lt;span style="color: #845A84;"&gt;L_jump_start&lt;/span&gt;;  &lt;span style="color: #7c878a;"&gt;/* &lt;/span&gt;&lt;span style="color: #7c878a;"&gt;avoid a push&lt;/span&gt;&lt;span style="color: #7c878a;"&gt; */&lt;/span&gt;

    &lt;span style="color: #13665F;"&gt;while&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;&lt;span style="color: #ef6787;"&gt;!&lt;/span&gt;ESTACK_ISEMPTY&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;s&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;
        obj = ESTACK_POP&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;s&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;;
    &lt;span style="color: #845A84;"&gt;L_jump_start&lt;/span&gt;:
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
We begin by jumping into the loop.  Note throughout this code the
careful use of &lt;code&gt;goto&lt;/code&gt; to avoid unnecessary churn on the stack.
&lt;/p&gt;
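
&lt;p&gt;
The same control-flow shape, reduced to a toy, might look like the
sketch below.  The &lt;code&gt;term&lt;/code&gt; structure, the fixed-size stack and
&lt;code&gt;count_bytes&lt;/code&gt; are all hypothetical stand-ins, not ERTS types; the
point is only the jump into the loop and the way the tail is followed
without going through the stack.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;#include &amp;lt;stddef.h&amp;gt;

/* Hypothetical toy term: a leaf byte, or a cons cell; NULL is the empty list. */
struct term {
    int is_cons;
    unsigned char byte;          /* valid when !is_cons */
    const struct term *head;     /* valid when is_cons */
    const struct term *tail;     /* valid when is_cons */
};

#define MAX_DEPTH 64             /* fixed-size stack keeps the sketch short */

/* Counts leaf bytes in a nested list; returns -1 if the stack would overflow. */
long count_bytes(const struct term *obj)
{
    const struct term *stack[MAX_DEPTH];
    size_t sp = 0;
    long n = 0;

    goto start;                  /* process the root without pushing it first */
    while (sp &amp;gt; 0) {
        obj = stack[--sp];
    start:
        while (obj != NULL) {
            if (!obj-&amp;gt;is_cons) { n++; break; }      /* leaf in tail position */
            const struct term *head = obj-&amp;gt;head;
            const struct term *tail = obj-&amp;gt;tail;
            if (head != NULL &amp;amp;&amp;amp; head-&amp;gt;is_cons) {
                /* Nested list: remember the tail, descend into the head. */
                if (tail != NULL) {
                    if (sp == MAX_DEPTH) return -1;
                    stack[sp++] = tail;
                }
                obj = head;
            } else {
                if (head != NULL) n++;              /* leaf in head position */
                obj = tail;                         /* no push needed */
            }
        }
    }
    return n;
}
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
For a flat list the stack is never touched; it only comes into play
when a nested list has something after it.
&lt;/p&gt;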

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;        &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;is_list&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;obj&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;
        &lt;span style="color: #845A84;"&gt;L_iter_list&lt;/span&gt;:
            objp = list_val&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;obj&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;;
            obj = CAR&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;objp&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
If &lt;code&gt;obj&lt;/code&gt; is a cons cell, we inspect the head.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;            &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;is_byte&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;obj&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;
                c_size++;
                &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;c_size == 0&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt; &lt;span style="color: #a9779c;"&gt;{&lt;/span&gt;
                    &lt;span style="color: #13665F;"&gt;goto&lt;/span&gt; &lt;span style="color: #845A84;"&gt;L_overflow_error&lt;/span&gt;;
                &lt;span style="color: #a9779c;"&gt;}&lt;/span&gt;
                &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;&lt;span style="color: #ef6787;"&gt;!&lt;/span&gt;in_clist&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt; &lt;span style="color: #a9779c;"&gt;{&lt;/span&gt;
                    in_clist = 1;
                    v_size++;
                &lt;span style="color: #a9779c;"&gt;}&lt;/span&gt;
                p_c_size++;
                &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;&lt;span style="color: #ef6787;"&gt;!&lt;/span&gt;p_in_clist&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt; &lt;span style="color: #a9779c;"&gt;{&lt;/span&gt;
                    p_in_clist = 1;
                    p_v_size++;
                &lt;span style="color: #a9779c;"&gt;}&lt;/span&gt;
            &lt;span style="color: #4d9391;"&gt;}&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
If it's a byte (specifically, a fixnum under 256), we add to the size
of the common binary required.  If we weren't already in a section
that will point into the common binary, we have to add to the size of
the underlying iovec.  Note the behavior is the same for packed and
unpacked.
&lt;/p&gt;

&lt;p&gt;
So our first rule is probably going to be "avoid interspersing
binaries and strings".  This will keep &lt;code&gt;vsize&lt;/code&gt; lower.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;            &lt;span style="color: #13665F;"&gt;else&lt;/span&gt; &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;is_binary&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;obj&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;
                IO_LIST_VEC_COUNT&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;obj&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;;
            &lt;span style="color: #4d9391;"&gt;}&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Back in our loop, still handling the elements of a list: if we got a
binary instead, we invoke &lt;code&gt;IO_LIST_VEC_COUNT&lt;/code&gt;.
&lt;/p&gt;

&lt;p&gt;
This is a monster macro in the same file.  Let's look at it in
pieces:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;&lt;span style="color: #6D46E3;"&gt;#define&lt;/span&gt; &lt;span style="color: #4C7A90;"&gt;IO_LIST_VEC_COUNT&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;&lt;span style="color: #845A84;"&gt;obj&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;                                          \
&lt;span style="color: #13665F;"&gt;do&lt;/span&gt; {                                                                    \
    &lt;span style="color: #E36B3F;"&gt;Uint&lt;/span&gt; &lt;span style="color: #845A84;"&gt;_size&lt;/span&gt; = binary_size&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;obj&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;;                                      \
    &lt;span style="color: #E36B3F;"&gt;Eterm&lt;/span&gt; &lt;span style="color: #845A84;"&gt;_real&lt;/span&gt;;                                                        \
    ERTS_DECLARE_DUMMY&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;Uint _offset&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;;                                   \
    &lt;span style="color: #E36B3F;"&gt;int&lt;/span&gt; &lt;span style="color: #845A84;"&gt;_bitoffs&lt;/span&gt;;                                                       \
    &lt;span style="color: #E36B3F;"&gt;int&lt;/span&gt; &lt;span style="color: #845A84;"&gt;_bitsize&lt;/span&gt;;                                                       \
    ERTS_GET_REAL_BIN&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;obj, _real, _offset, _bitoffs, _bitsize&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;;         \
    &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;_bitsize != 0&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; &lt;span style="color: #13665F;"&gt;goto&lt;/span&gt; &lt;span style="color: #845A84;"&gt;L_type_error&lt;/span&gt;;                               \
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
We start by getting a bunch of properties about the binary, and
erroring out if this is a bitstring.  In Erlang, a bitstring is a
binary with a number of bits which isn't a multiple of 8.  Note that
we don't error out if this binary has a non-octet bit offset.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;    &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;thing_subtag&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;*binary_val&lt;span style="color: #4C7A90;"&gt;(&lt;/span&gt;_real&lt;span style="color: #4C7A90;"&gt;)&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt; == REFC_BINARY_SUBTAG &amp;amp;&amp;amp;       \
        _bitoffs == 0&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;                                                \
        b_size += _size;                                                \
        &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;b_size &amp;lt; _size&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt; &lt;span style="color: #13665F;"&gt;goto&lt;/span&gt; &lt;span style="color: #845A84;"&gt;L_overflow_error&lt;/span&gt;;                      \
        in_clist = 0;                                                   \
        v_size++;                                                       \
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
If this is a byte-aligned refcounted binary, we can put a reference to
it right in the iovec.  This is the main answer to this blog
post's question.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;        &lt;span style="color: #7c878a;"&gt;/* &lt;/span&gt;&lt;span style="color: #7c878a;"&gt;If iov_len is smaller then Uint we split the binary into&lt;/span&gt;&lt;span style="color: #7c878a;"&gt;*/&lt;/span&gt;   \
        &lt;span style="color: #7c878a;"&gt;/* &lt;/span&gt;&lt;span style="color: #7c878a;"&gt;multiple smaller (2GB) elements in the iolist.&lt;/span&gt;&lt;span style="color: #7c878a;"&gt;*/&lt;/span&gt;             \
        v_size += _size / MAX_SYSIOVEC_IOVLEN;                          \
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
The commit for this code, added in 2016, notes:
&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;
On windows the max size of an iov element is long, i.e. 4GB so in
order to write larger binaries to file we split the binary into
smaller 2GB chunks so that the write is possible.
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;
I bet that was inspired by a fun bug.
&lt;/p&gt;
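
&lt;p&gt;
The splitting itself is simple enough to sketch.  This is not the ERTS
implementation: &lt;code&gt;split_iov&lt;/code&gt; and its parameters are hypothetical,
and the chunk limit is passed in rather than taken from
&lt;code&gt;MAX_SYSIOVEC_IOVLEN&lt;/code&gt;, so that it can be tried with small numbers.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;#include &amp;lt;stddef.h&amp;gt;
#include &amp;lt;sys/uio.h&amp;gt;

/* Describe buf[0..len) as iovec entries of at most max_chunk bytes each.
 * Returns the number of entries filled in, or 0 if cap is too small
 * (or len is 0).  max_chunk must be non-zero. */
size_t split_iov(char *buf, size_t len, size_t max_chunk,
                 struct iovec *iov, size_t cap)
{
    size_t needed = len / max_chunk + (len % max_chunk != 0);
    if (needed &amp;gt; cap) return 0;
    for (size_t i = 0; i &amp;lt; needed; i++) {
        size_t chunk = len &amp;lt; max_chunk ? len : max_chunk;
        iov[i].iov_base = buf;
        iov[i].iov_len = chunk;
        buf += chunk;
        len -= chunk;
    }
    return needed;
}
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
A 10-byte buffer with a 4-byte limit comes out as three entries of 4, 4
and 2 bytes, all pointing into the original buffer.
&lt;/p&gt;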

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;        &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;_size &amp;gt;= ERL_SMALL_IO_BIN_LIMIT&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;                          \
            p_in_clist = 0;                                             \
            p_v_size++;                                                 \
        &lt;span style="color: #4d9391;"&gt;}&lt;/span&gt; &lt;span style="color: #13665F;"&gt;else&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;                                                        \
            p_c_size += _size;                                          \
            &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;&lt;span style="color: #ef6787;"&gt;!&lt;/span&gt;p_in_clist&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt; &lt;span style="color: #a9779c;"&gt;{&lt;/span&gt;                                          \
                p_in_clist = 1;                                         \
                p_v_size++;                                             \
            &lt;span style="color: #a9779c;"&gt;}&lt;/span&gt;                                                           \
        &lt;span style="color: #4d9391;"&gt;}&lt;/span&gt;                                                               \
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
This code applies only to our packed counts.  &lt;code&gt;ERL_SMALL_IO_BIN_LIMIT&lt;/code&gt;
is &lt;code&gt;4*ERL_ONHEAP_BIN_LIMIT&lt;/code&gt;, which is 4×64 = 256.  So if the binary is
less than 256 bytes, and we need to pack, we'll copy it into the
common binary instead of referencing it directly.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;    &lt;span style="color: #ef6787; background-color: #832729;"&gt;}&lt;/span&gt; &lt;span style="color: #13665F;"&gt;else&lt;/span&gt; &lt;span style="color: #ef6787; background-color: #832729;"&gt;{&lt;/span&gt;                                                            \
        c_size += _size;                                                \
        &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;c_size &amp;lt; _size&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; &lt;span style="color: #13665F;"&gt;goto&lt;/span&gt; &lt;span style="color: #845A84;"&gt;L_overflow_error&lt;/span&gt;;                      \
        &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;&lt;span style="color: #ef6787;"&gt;!&lt;/span&gt;in_clist&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;                                                \
            in_clist = 1;                                               \
            v_size++;                                                   \
        &lt;span style="color: #4d9391;"&gt;}&lt;/span&gt;                                                               \
        p_c_size += _size;                                              \
        &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;&lt;span style="color: #ef6787;"&gt;!&lt;/span&gt;p_in_clist&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;                                              \
            p_in_clist = 1;                                             \
            p_v_size++;                                                 \
        &lt;span style="color: #4d9391;"&gt;}&lt;/span&gt;                                                               \
    &lt;span style="color: #ef6787; background-color: #832729;"&gt;}&lt;/span&gt;                                                                   \
&lt;span style="color: #ef6787; background-color: #832729;"&gt;}&lt;/span&gt; &lt;span style="color: #13665F;"&gt;while&lt;/span&gt; &lt;span style="color: #ef6787; background-color: #832729;"&gt;(&lt;/span&gt;0&lt;span style="color: #ef6787; background-color: #832729;"&gt;)&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Otherwise, if this is a heap binary, we always copy it into the
common binary.
&lt;/p&gt;
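
&lt;p&gt;
To make the counting rule concrete, here is a minimal standalone sketch
of how vector entries add up.  It is not ERTS code: &lt;code&gt;struct chunk&lt;/code&gt;,
&lt;code&gt;is_copied()&lt;/code&gt; and &lt;code&gt;count_entries()&lt;/code&gt; are hypothetical names, and I
am assuming that a referenced binary ends the current run of copied
data and takes an entry of its own.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;#include &amp;lt;stddef.h&amp;gt;

#define SMALL_BIN_LIMIT 256  /* ERL_SMALL_IO_BIN_LIMIT, per the text above */

/* Hypothetical stand-in for one binary found while walking an iolist. */
struct chunk {
    int is_heap_bin;   /* nonzero: heap binary, zero: refcounted binary */
    size_t size;       /* size in bytes */
};

/* Nonzero if the chunk's bytes get copied into the common binary. */
static int is_copied(const struct chunk *c, int packing)
{
    if (c-&amp;gt;is_heap_bin)
        return 1;                      /* heap binaries are always copied */
    return packing &amp;amp;&amp;amp; c-&amp;gt;size &amp;lt; SMALL_BIN_LIMIT;
}

/* Count vector entries: each run of consecutive copied chunks shares
 * one entry, and each referenced binary gets an entry of its own. */
static size_t count_entries(const struct chunk *chunks, size_t n, int packing)
{
    size_t v_size = 0;
    int in_clist = 0;                  /* inside a run of copied chunks? */

    for (size_t i = 0; i &amp;lt; n; i++) {
        if (is_copied(&amp;amp;chunks[i], packing)) {
            if (!in_clist) {
                in_clist = 1;
                v_size++;
            }
        } else {
            in_clist = 0;              /* assumption: a reference ends the run */
            v_size++;
        }
    }
    return v_size;
}
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Given a heap binary, a 100-byte refcounted binary, a 1000-byte
refcounted binary and another heap binary, this counts four entries
when not packing and three when packing, since the first two chunks
then share one entry.
&lt;/p&gt;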


&lt;p&gt;
Back in &lt;code&gt;io_list_vec_len()&lt;/code&gt;:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;            &lt;span style="color: #13665F;"&gt;else&lt;/span&gt; &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;is_list&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;obj&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;
                ESTACK_PUSH&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;s, CDR&lt;span style="color: #4C7A90;"&gt;(&lt;/span&gt;objp&lt;span style="color: #4C7A90;"&gt;)&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;;
                &lt;span style="color: #13665F;"&gt;goto&lt;/span&gt; &lt;span style="color: #845A84;"&gt;L_iter_list&lt;/span&gt;;   &lt;span style="color: #7c878a;"&gt;/* &lt;/span&gt;&lt;span style="color: #7c878a;"&gt;on head&lt;/span&gt;&lt;span style="color: #7c878a;"&gt; */&lt;/span&gt;
            &lt;span style="color: #4d9391;"&gt;}&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
If the head is itself a list, we push the tail of the current list on
the stack to deal with later, and start walking the nested list.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;            &lt;span style="color: #13665F;"&gt;else&lt;/span&gt; &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;&lt;span style="color: #ef6787;"&gt;!&lt;/span&gt;is_nil&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;obj&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;
                &lt;span style="color: #13665F;"&gt;goto&lt;/span&gt; &lt;span style="color: #845A84;"&gt;L_type_error&lt;/span&gt;;
            &lt;span style="color: #4d9391;"&gt;}&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Finally, if there's anything else that isn't the empty list, that's an
error.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;            obj = CDR&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;objp&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;;
            &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;is_list&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;obj&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;
                &lt;span style="color: #13665F;"&gt;goto&lt;/span&gt; &lt;span style="color: #845A84;"&gt;L_iter_list&lt;/span&gt;;   &lt;span style="color: #7c878a;"&gt;/* &lt;/span&gt;&lt;span style="color: #7c878a;"&gt;on tail&lt;/span&gt;&lt;span style="color: #7c878a;"&gt; */&lt;/span&gt;
            &lt;span style="color: #13665F;"&gt;else&lt;/span&gt; &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;is_binary&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;obj&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;  &lt;span style="color: #7c878a;"&gt;/* &lt;/span&gt;&lt;span style="color: #7c878a;"&gt;binary tail is OK&lt;/span&gt;&lt;span style="color: #7c878a;"&gt; */&lt;/span&gt;
                IO_LIST_VEC_COUNT&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;obj&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;;
            &lt;span style="color: #4d9391;"&gt;}&lt;/span&gt;
            &lt;span style="color: #13665F;"&gt;else&lt;/span&gt; &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;&lt;span style="color: #ef6787;"&gt;!&lt;/span&gt;is_nil&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;obj&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;
                &lt;span style="color: #13665F;"&gt;goto&lt;/span&gt; &lt;span style="color: #845A84;"&gt;L_type_error&lt;/span&gt;;
            &lt;span style="color: #4d9391;"&gt;}&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Here we handle the tail of the list.  It's interesting that an iolist
doesn't need to be a proper list (you can have &lt;code&gt;[List | &amp;lt;&amp;lt;"bin"&amp;gt;&amp;gt;]&lt;/code&gt;).
&lt;/p&gt;


&lt;p&gt;
So, our conclusion so far seems to be that an Erlang iolist maps most
closely to an iovec when it is an arbitrarily nested list of
refcounted binaries &amp;#x2013; and if there are more than 16 of them, each
must be at least 256 bytes long.
&lt;/p&gt;

&lt;p&gt;
Since the stack for processing these lists changes its allocation
strategy around 16 items deep, in practice you probably don't want
your iolists to be too deeply nested.
&lt;/p&gt;


&lt;p&gt;
Let's look at &lt;code&gt;io_list_to_vec&lt;/code&gt;:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;&lt;span style="color: #13665F;"&gt;static&lt;/span&gt; &lt;span style="color: #E36B3F;"&gt;int&lt;/span&gt;
&lt;span style="color: #4C7A90;"&gt;io_list_to_vec&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;&lt;span style="color: #E36B3F;"&gt;Eterm&lt;/span&gt; &lt;span style="color: #845A84;"&gt;obj&lt;/span&gt;,       &lt;span style="color: #7c878a;"&gt;/* &lt;/span&gt;&lt;span style="color: #7c878a;"&gt;io-list&lt;/span&gt;&lt;span style="color: #7c878a;"&gt; */&lt;/span&gt;
               &lt;span style="color: #E36B3F;"&gt;SysIOVec&lt;/span&gt;* &lt;span style="color: #845A84;"&gt;iov&lt;/span&gt;,   &lt;span style="color: #7c878a;"&gt;/* &lt;/span&gt;&lt;span style="color: #7c878a;"&gt;io vector&lt;/span&gt;&lt;span style="color: #7c878a;"&gt; */&lt;/span&gt;
               &lt;span style="color: #E36B3F;"&gt;ErlDrvBinary&lt;/span&gt;** &lt;span style="color: #845A84;"&gt;binv&lt;/span&gt;, &lt;span style="color: #7c878a;"&gt;/* &lt;/span&gt;&lt;span style="color: #7c878a;"&gt;binary reference vector&lt;/span&gt;&lt;span style="color: #7c878a;"&gt; */&lt;/span&gt;
               &lt;span style="color: #E36B3F;"&gt;ErlDrvBinary&lt;/span&gt;* &lt;span style="color: #845A84;"&gt;cbin&lt;/span&gt;, &lt;span style="color: #7c878a;"&gt;/* &lt;/span&gt;&lt;span style="color: #7c878a;"&gt;binary to store characters&lt;/span&gt;&lt;span style="color: #7c878a;"&gt; */&lt;/span&gt;
               &lt;span style="color: #E36B3F;"&gt;ErlDrvSizeT&lt;/span&gt; &lt;span style="color: #845A84;"&gt;bin_limit&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;   &lt;span style="color: #7c878a;"&gt;/* &lt;/span&gt;&lt;span style="color: #7c878a;"&gt;small binaries limit&lt;/span&gt;&lt;span style="color: #7c878a;"&gt; */&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
The comments here are pretty clear; &lt;code&gt;obj&lt;/code&gt; is the input and &lt;code&gt;iov&lt;/code&gt; is
the output.  Whether to pack or not is expressed by &lt;code&gt;bin_limit&lt;/code&gt;.  As
we look at the rest of the code, we'll see how the other arguments are
used.  The function begins with these declarations:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;&lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;
    DECLARE_ESTACK&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;s&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;;
    &lt;span style="color: #E36B3F;"&gt;Eterm&lt;/span&gt;* &lt;span style="color: #845A84;"&gt;objp&lt;/span&gt;;
    &lt;span style="color: #E36B3F;"&gt;char&lt;/span&gt; *&lt;span style="color: #845A84;"&gt;buf&lt;/span&gt;  = cbin-&amp;gt;orig_bytes;
    &lt;span style="color: #E36B3F;"&gt;Uint&lt;/span&gt; &lt;span style="color: #845A84;"&gt;len&lt;/span&gt; = cbin-&amp;gt;orig_size;
    &lt;span style="color: #E36B3F;"&gt;Uint&lt;/span&gt; &lt;span style="color: #845A84;"&gt;csize&lt;/span&gt;  = 0;
    &lt;span style="color: #E36B3F;"&gt;int&lt;/span&gt; &lt;span style="color: #845A84;"&gt;vlen&lt;/span&gt;   = 0;
    &lt;span style="color: #E36B3F;"&gt;char&lt;/span&gt;* &lt;span style="color: #845A84;"&gt;cptr&lt;/span&gt; = buf;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Note that we set up &lt;code&gt;buf&lt;/code&gt;, &lt;code&gt;len&lt;/code&gt;, and &lt;code&gt;cptr&lt;/code&gt; according to &lt;code&gt;cbin&lt;/code&gt;.
Their use will become clear as we go on.
&lt;/p&gt;

&lt;p&gt;
We continue with a jump into the body of our loop:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;    &lt;span style="color: #13665F;"&gt;goto&lt;/span&gt; &lt;span style="color: #845A84;"&gt;L_jump_start&lt;/span&gt;;  &lt;span style="color: #7c878a;"&gt;/* &lt;/span&gt;&lt;span style="color: #7c878a;"&gt;avoid push&lt;/span&gt;&lt;span style="color: #7c878a;"&gt; */&lt;/span&gt;

    &lt;span style="color: #13665F;"&gt;while&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;&lt;span style="color: #ef6787;"&gt;!&lt;/span&gt;ESTACK_ISEMPTY&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;s&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;
        obj = ESTACK_POP&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;s&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;;
    &lt;span style="color: #845A84;"&gt;L_jump_start&lt;/span&gt;:
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
We can see that the structure of this function mirrors that of
&lt;code&gt;io_list_vec_len()&lt;/code&gt;.
&lt;/p&gt;
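
&lt;p&gt;
The &lt;code&gt;goto&lt;/code&gt; into the loop body means the first term is handled without
being pushed onto the stack only to be popped straight off again.  A
rough standalone sketch of the same control flow follows.  The &lt;code&gt;node&lt;/code&gt;
type and the fixed-size array are hypothetical stand-ins for Erlang
terms and &lt;code&gt;ESTACK&lt;/code&gt;, and unlike the real code it pushes every non-empty
tail instead of sometimes following the tail directly.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;#include &amp;lt;stddef.h&amp;gt;

/* Hypothetical stand-in for an Erlang term: either a leaf carrying a
 * value, or a cons cell with a head and a tail.  NULL plays the role
 * of the empty list. */
struct node {
    int is_cons;
    int value;                 /* used when !is_cons; assumed non-negative */
    const struct node *head;   /* used when is_cons */
    const struct node *tail;   /* used when is_cons */
};

#define MAX_DEPTH 64  /* fixed stack for this sketch; ESTACK can grow */

/* Sum all leaves, descending into heads first and deferring tails.
 * Returns -1 if the nesting is too deep for the fixed stack. */
static int sum_leaves(const struct node *obj)
{
    const struct node *stack[MAX_DEPTH];
    size_t sp = 0;
    int sum = 0;

    goto L_jump_start;          /* handle the first term without a push */

    while (sp != 0) {
        obj = stack[--sp];
    L_jump_start:
        while (obj != NULL &amp;amp;&amp;amp; obj-&amp;gt;is_cons) {
            if (obj-&amp;gt;tail != NULL) {
                if (sp == MAX_DEPTH)
                    return -1;
                stack[sp++] = obj-&amp;gt;tail;   /* come back to the tail later */
            }
            obj = obj-&amp;gt;head;
        }
        if (obj != NULL)
            sum += obj-&amp;gt;value;  /* a leaf, possibly an improper tail */
    }
    return sum;
}
&lt;/pre&gt;
&lt;/div&gt;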

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;        &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;is_list&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;obj&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;
        &lt;span style="color: #845A84;"&gt;L_iter_list&lt;/span&gt;:
            objp = list_val&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;obj&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;;
            obj = CAR&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;objp&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
If our term is a cons, we look at the car.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;            &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;is_byte&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;obj&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;
                &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;len == 0&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;
                    &lt;span style="color: #13665F;"&gt;goto&lt;/span&gt; &lt;span style="color: #845A84;"&gt;L_overflow&lt;/span&gt;;
                *buf++ = unsigned_val&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;obj&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;;
                csize++;
                len--;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
If it's a byte, we store that byte in the buffer passed in as &lt;code&gt;cbin&lt;/code&gt;.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;            &lt;span style="color: #ef6787; background-color: #832729;"&gt;}&lt;/span&gt; &lt;span style="color: #13665F;"&gt;else&lt;/span&gt; &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #ef6787; background-color: #832729;"&gt;(&lt;/span&gt;is_binary&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;obj&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;&lt;span style="color: #ef6787; background-color: #832729;"&gt;)&lt;/span&gt; &lt;span style="color: #ef6787; background-color: #832729;"&gt;{&lt;/span&gt;
                ESTACK_PUSH&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;s, CDR&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;objp&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;;
                &lt;span style="color: #13665F;"&gt;goto&lt;/span&gt; &lt;span style="color: #845A84;"&gt;handle_binary&lt;/span&gt;;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
If it's a binary, we push the tail of the list we're working on onto
the stack so we pick up there later, and jump to &lt;code&gt;handle_binary&lt;/code&gt;.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;            &lt;span style="color: #ef6787; background-color: #832729;"&gt;}&lt;/span&gt; &lt;span style="color: #13665F;"&gt;else&lt;/span&gt; &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #ef6787; background-color: #832729;"&gt;(&lt;/span&gt;is_list&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;obj&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;&lt;span style="color: #ef6787; background-color: #832729;"&gt;)&lt;/span&gt; &lt;span style="color: #ef6787; background-color: #832729;"&gt;{&lt;/span&gt;
                ESTACK_PUSH&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;s, CDR&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;objp&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;;
                &lt;span style="color: #13665F;"&gt;goto&lt;/span&gt; &lt;span style="color: #845A84;"&gt;L_iter_list&lt;/span&gt;;    &lt;span style="color: #7c878a;"&gt;/* &lt;/span&gt;&lt;span style="color: #7c878a;"&gt;on head&lt;/span&gt;&lt;span style="color: #7c878a;"&gt; */&lt;/span&gt;
            &lt;span style="color: #ef6787; background-color: #832729;"&gt;}&lt;/span&gt; &lt;span style="color: #13665F;"&gt;else&lt;/span&gt; &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #ef6787; background-color: #832729;"&gt;(&lt;/span&gt;&lt;span style="color: #ef6787;"&gt;!&lt;/span&gt;is_nil&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;obj&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;&lt;span style="color: #ef6787; background-color: #832729;"&gt;)&lt;/span&gt; &lt;span style="color: #ef6787; background-color: #832729;"&gt;{&lt;/span&gt;
                &lt;span style="color: #13665F;"&gt;goto&lt;/span&gt; &lt;span style="color: #845A84;"&gt;L_type_error&lt;/span&gt;;
            &lt;span style="color: #ef6787; background-color: #832729;"&gt;}&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
If it's a list, save our current position in the outer list to the
stack, and start walking the inner list.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;            obj = CDR&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;objp&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;;
            &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;is_list&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;obj&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;
                &lt;span style="color: #13665F;"&gt;goto&lt;/span&gt; &lt;span style="color: #845A84;"&gt;L_iter_list&lt;/span&gt;; &lt;span style="color: #7c878a;"&gt;/* &lt;/span&gt;&lt;span style="color: #7c878a;"&gt;on tail&lt;/span&gt;&lt;span style="color: #7c878a;"&gt; */&lt;/span&gt;
            &lt;span style="color: #13665F;"&gt;else&lt;/span&gt; &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;is_binary&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;obj&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;
                &lt;span style="color: #13665F;"&gt;goto&lt;/span&gt; &lt;span style="color: #845A84;"&gt;handle_binary&lt;/span&gt;;
            &lt;span style="color: #4d9391;"&gt;}&lt;/span&gt; &lt;span style="color: #13665F;"&gt;else&lt;/span&gt; &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;&lt;span style="color: #ef6787;"&gt;!&lt;/span&gt;is_nil&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;obj&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;
                &lt;span style="color: #13665F;"&gt;goto&lt;/span&gt; &lt;span style="color: #845A84;"&gt;L_type_error&lt;/span&gt;;
            &lt;span style="color: #4d9391;"&gt;}&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
How the cdr of the cons is handled should be unsurprising by now.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;        &lt;span style="color: #ef6787; background-color: #832729;"&gt;}&lt;/span&gt; &lt;span style="color: #13665F;"&gt;else&lt;/span&gt; &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #ef6787; background-color: #832729;"&gt;(&lt;/span&gt;is_binary&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;obj&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;&lt;span style="color: #ef6787; background-color: #832729;"&gt;)&lt;/span&gt; &lt;span style="color: #ef6787; background-color: #832729;"&gt;{&lt;/span&gt;
            &lt;span style="color: #E36B3F;"&gt;Eterm&lt;/span&gt; &lt;span style="color: #845A84;"&gt;real_bin&lt;/span&gt;;
            &lt;span style="color: #E36B3F;"&gt;Uint&lt;/span&gt; &lt;span style="color: #845A84;"&gt;offset&lt;/span&gt;;
            &lt;span style="color: #E36B3F;"&gt;Eterm&lt;/span&gt;* &lt;span style="color: #845A84;"&gt;bptr&lt;/span&gt;;
            &lt;span style="color: #E36B3F;"&gt;ErlDrvSizeT&lt;/span&gt; &lt;span style="color: #845A84;"&gt;size&lt;/span&gt;;
            &lt;span style="color: #E36B3F;"&gt;int&lt;/span&gt; &lt;span style="color: #845A84;"&gt;bitoffs&lt;/span&gt;;
            &lt;span style="color: #E36B3F;"&gt;int&lt;/span&gt; &lt;span style="color: #845A84;"&gt;bitsize&lt;/span&gt;;

        &lt;span style="color: #845A84;"&gt;handle_binary&lt;/span&gt;:
            size = binary_size&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;obj&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;;
            ERTS_GET_REAL_BIN&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;obj, real_bin, offset, bitoffs, bitsize&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;;
            ASSERT&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;bitsize == 0&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
The binary handling case isn't too different from &lt;code&gt;IO_LIST_VEC_COUNT&lt;/code&gt;
above.  The &lt;code&gt;ASSERT&lt;/code&gt; means we die immediately if this is a
bitstring, at least in a debug build.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;            bptr = binary_val&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;real_bin&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;;
            &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;*bptr == HEADER_PROC_BIN&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;
                &lt;span style="color: #E36B3F;"&gt;ProcBin&lt;/span&gt;* &lt;span style="color: #845A84;"&gt;pb&lt;/span&gt; = &lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;&lt;span style="color: #E36B3F;"&gt;ProcBin&lt;/span&gt; *&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt; bptr;
                &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;bitoffs != 0&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt; &lt;span style="color: #a9779c;"&gt;{&lt;/span&gt;
                    &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #4C7A90;"&gt;(&lt;/span&gt;len &amp;lt; size&lt;span style="color: #4C7A90;"&gt;)&lt;/span&gt; &lt;span style="color: #4C7A90;"&gt;{&lt;/span&gt;
                        &lt;span style="color: #13665F;"&gt;goto&lt;/span&gt; &lt;span style="color: #845A84;"&gt;L_overflow&lt;/span&gt;;
                    &lt;span style="color: #4C7A90;"&gt;}&lt;/span&gt;
                    erts_copy_bits&lt;span style="color: #4C7A90;"&gt;(&lt;/span&gt;pb-&amp;gt;bytes+offset, bitoffs, 1,
                                   &lt;span style="color: #787096;"&gt;(&lt;/span&gt;&lt;span style="color: #E36B3F;"&gt;byte&lt;/span&gt; *&lt;span style="color: #787096;"&gt;)&lt;/span&gt; buf, 0, 1, size*8&lt;span style="color: #4C7A90;"&gt;)&lt;/span&gt;;
                    csize += size;
                    buf += size;
                    len -= size;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
It's interesting that the bit offset is handled here, even though
extra bits aren't permitted.  The git history associated with this
block is not helpful.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;                &lt;span style="color: #ef6787; background-color: #832729;"&gt;}&lt;/span&gt; &lt;span style="color: #13665F;"&gt;else&lt;/span&gt; &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #ef6787; background-color: #832729;"&gt;(&lt;/span&gt;bin_limit &amp;amp;&amp;amp; size &amp;lt; bin_limit&lt;span style="color: #ef6787; background-color: #832729;"&gt;)&lt;/span&gt; &lt;span style="color: #ef6787; background-color: #832729;"&gt;{&lt;/span&gt;
                    &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;len &amp;lt; size&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;
                        &lt;span style="color: #13665F;"&gt;goto&lt;/span&gt; &lt;span style="color: #845A84;"&gt;L_overflow&lt;/span&gt;;
                    &lt;span style="color: #4d9391;"&gt;}&lt;/span&gt;
                    sys_memcpy&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;buf, pb-&amp;gt;bytes+offset, size&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;;
                    csize += size;
                    buf += size;
                    len -= size;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
If we're packing small binaries and this one is below the limit, copy
it in.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;                &lt;span style="color: #ef6787; background-color: #832729;"&gt;}&lt;/span&gt; &lt;span style="color: #13665F;"&gt;else&lt;/span&gt; &lt;span style="color: #ef6787; background-color: #832729;"&gt;{&lt;/span&gt;
                    &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;csize != 0&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;
                        io_list_to_vec_set_vec&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;&amp;amp;iov, &amp;amp;binv, cbin,
                                               cptr, csize, &amp;amp;vlen&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;;
                        cptr = buf;
                        csize = 0;
                    &lt;span style="color: #4d9391;"&gt;}&lt;/span&gt;
                    &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;pb-&amp;gt;flags&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;
                        erts_emasculate_writable_binary&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;pb&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;;
                    &lt;span style="color: #4d9391;"&gt;}&lt;/span&gt;
                    io_list_to_vec_set_vec&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;
                        &amp;amp;iov, &amp;amp;binv, Binary2ErlDrvBinary&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;pb-&amp;gt;val&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;,
                        pb-&amp;gt;bytes+offset, size, &amp;amp;vlen&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;;
                &lt;span style="color: #ef6787; background-color: #832729;"&gt;}&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Otherwise, we make the direct translation to an iovec entry.  If we
need to emit a reference to part of the common binary, do that first.
&lt;/p&gt;

&lt;p&gt;
Note the curiously-named &lt;code&gt;erts_emasculate_writable_binary()&lt;/code&gt; which
seems to shrinkwrap the binary (reallocate it to trim unused space).
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;            &lt;span style="color: #ef6787; background-color: #832729;"&gt;}&lt;/span&gt; &lt;span style="color: #13665F;"&gt;else&lt;/span&gt; &lt;span style="color: #ef6787; background-color: #832729;"&gt;{&lt;/span&gt;
                &lt;span style="color: #E36B3F;"&gt;ErlHeapBin&lt;/span&gt;* &lt;span style="color: #845A84;"&gt;hb&lt;/span&gt; = &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;&lt;span style="color: #E36B3F;"&gt;ErlHeapBin&lt;/span&gt; *&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; bptr;
                &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;len &amp;lt; size&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;
                    &lt;span style="color: #13665F;"&gt;goto&lt;/span&gt; &lt;span style="color: #845A84;"&gt;L_overflow&lt;/span&gt;;
                &lt;span style="color: #4d9391;"&gt;}&lt;/span&gt;
                copy_binary_to_buffer&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;buf, 0,
                                      &lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;&lt;span style="color: #4C7A90;"&gt;(&lt;/span&gt;&lt;span style="color: #E36B3F;"&gt;byte&lt;/span&gt; *&lt;span style="color: #4C7A90;"&gt;)&lt;/span&gt; hb-&amp;gt;data&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;+offset, bitoffs,
                                      8*size&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;;
                csize += size;
                buf += size;
                len -= size;
            &lt;span style="color: #ef6787; background-color: #832729;"&gt;}&lt;/span&gt;
        &lt;span style="color: #ef6787; background-color: #832729;"&gt;}&lt;/span&gt; &lt;span style="color: #13665F;"&gt;else&lt;/span&gt; &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #ef6787; background-color: #832729;"&gt;(&lt;/span&gt;&lt;span style="color: #ef6787;"&gt;!&lt;/span&gt;is_nil&lt;span style="color: #ef6787; background-color: #832729;"&gt;(&lt;/span&gt;obj&lt;span style="color: #ef6787; background-color: #832729;"&gt;))&lt;/span&gt; &lt;span style="color: #ef6787; background-color: #832729;"&gt;{&lt;/span&gt;
            &lt;span style="color: #13665F;"&gt;goto&lt;/span&gt; &lt;span style="color: #845A84;"&gt;L_type_error&lt;/span&gt;;
        &lt;span style="color: #ef6787; background-color: #832729;"&gt;}&lt;/span&gt;
    &lt;span style="color: #ef6787; background-color: #832729;"&gt;}&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
If this is a heap binary, it's much less interesting.  We just append
to the common binary.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;    &lt;span style="color: #13665F;"&gt;if&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;csize != 0&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;
        io_list_to_vec_set_vec&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;&amp;amp;iov, &amp;amp;binv, cbin, cptr, csize, &amp;amp;vlen&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;;
    &lt;span style="color: #4d9391;"&gt;}&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
After all that, we reference the tail of the common binary, if we have
anything left in it.
&lt;/p&gt;
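
&lt;p&gt;
Putting the pieces together, the copy-or-reference logic can be
sketched outside the VM with a plain POSIX &lt;code&gt;struct iovec&lt;/code&gt;.  This is a
simplification under stated assumptions: the input is an already
flattened array of hypothetical &lt;code&gt;struct piece&lt;/code&gt; values rather than a
nested term, size alone decides whether a piece is copied, and
&lt;code&gt;build_iov()&lt;/code&gt; is my own name, not an ERTS function.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;#include &amp;lt;string.h&amp;gt;
#include &amp;lt;sys/uio.h&amp;gt;

/* Hypothetical input: one already-flattened piece of the iolist. */
struct piece {
    const char *data;
    size_t size;
};

/* Fill iov[] from pieces[].  Pieces smaller than limit are copied into
 * cbuf (the "common binary"); the rest are referenced in place.
 * Returns the number of entries used, or -1 if cbuf or iov is too small. */
static int build_iov(const struct piece *pieces, size_t n, size_t limit,
                     char *cbuf, size_t cbuf_len,
                     struct iovec *iov, int iov_max)
{
    char *buf = cbuf;    /* next free byte in the common buffer */
    char *cptr = cbuf;   /* start of the copied run not yet emitted */
    size_t csize = 0;    /* length of that run */
    int vlen = 0;

    for (size_t i = 0; i &amp;lt; n; i++) {
        const struct piece *p = &amp;amp;pieces[i];
        if (p-&amp;gt;size &amp;lt; limit) {
            if (cbuf_len &amp;lt; p-&amp;gt;size)
                return -1;
            memcpy(buf, p-&amp;gt;data, p-&amp;gt;size);
            buf += p-&amp;gt;size;
            csize += p-&amp;gt;size;
            cbuf_len -= p-&amp;gt;size;
            continue;
        }
        if (csize != 0) {            /* emit the pending copied run first */
            if (vlen == iov_max)
                return -1;
            iov[vlen].iov_base = cptr;
            iov[vlen].iov_len = csize;
            vlen++;
            cptr = buf;
            csize = 0;
        }
        if (vlen == iov_max)
            return -1;
        iov[vlen].iov_base = (void *) p-&amp;gt;data;   /* reference, no copy */
        iov[vlen].iov_len = p-&amp;gt;size;
        vlen++;
    }
    if (csize != 0) {                /* tail of the common buffer */
        if (vlen == iov_max)
            return -1;
        iov[vlen].iov_base = cptr;
        iov[vlen].iov_len = csize;
        vlen++;
    }
    return vlen;
}
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
On success the result can be handed to &lt;code&gt;writev(fd, iov, vlen)&lt;/code&gt;.
Passing a &lt;code&gt;limit&lt;/code&gt; of 0 means nothing is ever copied, which mirrors the
effect of a zero &lt;code&gt;bin_limit&lt;/code&gt; in the code above.
&lt;/p&gt;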

&lt;p&gt;
Finally we come to the end of the function:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;    &lt;span style="color: #4C7A90;"&gt;DESTROY_ESTACK&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;s&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;;
    &lt;span style="color: #13665F;"&gt;return&lt;/span&gt; vlen;

 &lt;span style="color: #845A84;"&gt;L_type_error&lt;/span&gt;:
    &lt;span style="color: #4C7A90;"&gt;DESTROY_ESTACK&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;s&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;;
    &lt;span style="color: #13665F;"&gt;return&lt;/span&gt; -2;

 &lt;span style="color: #845A84;"&gt;L_overflow&lt;/span&gt;:
    &lt;span style="color: #4C7A90;"&gt;DESTROY_ESTACK&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;s&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;;
    &lt;span style="color: #13665F;"&gt;return&lt;/span&gt; -1;
&lt;span style="color: #ef6787; background-color: #832729;"&gt;}&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Not much to see here, but we might as well include it as we've looked
at every other part.
&lt;/p&gt;


&lt;p&gt;
This isn't complete without noting how this pair of functions is
called by functions like &lt;code&gt;erts_port_output()&lt;/code&gt;.  I won't get into the
guts, but if you look in the same file, you'll see that an iovec of
&lt;code&gt;SMALL_WRITE_VEC&lt;/code&gt; (16) entries is set up on the stack, and if
&lt;code&gt;io_list_vec_len()&lt;/code&gt; reports a &lt;code&gt;vsize&lt;/code&gt; larger than that, it has to
allocate the space using the &lt;code&gt;ERTS_ALC_T_TMP&lt;/code&gt; allocator.
&lt;/p&gt;
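
&lt;p&gt;
The stack-first, heap-if-needed pattern looks roughly like this in
isolation.  &lt;code&gt;acquire_iov()&lt;/code&gt; and &lt;code&gt;release_iov()&lt;/code&gt; are hypothetical
helpers, and plain &lt;code&gt;malloc()&lt;/code&gt; stands in for the ERTS temporary
allocator; only the value 16 comes from the source.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-c"&gt;#include &amp;lt;stdlib.h&amp;gt;
#include &amp;lt;sys/uio.h&amp;gt;

#define SMALL_WRITE_VEC 16   /* value quoted in the text above */

/* Return stack_iov if it has room for vsize entries, otherwise a heap
 * allocation (or NULL if that fails).  The caller passes an array of
 * SMALL_WRITE_VEC entries and must call release_iov() afterwards.
 * Overflow of the size computation is ignored in this sketch. */
static struct iovec *acquire_iov(struct iovec *stack_iov, size_t vsize)
{
    if (vsize &amp;lt;= SMALL_WRITE_VEC)
        return stack_iov;
    return malloc(vsize * sizeof(struct iovec));
}

/* Free the vector only if it did not come from the caller's stack. */
static void release_iov(struct iovec *iov, struct iovec *stack_iov)
{
    if (iov != stack_iov)
        free(iov);
}
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
The common case of a short vector then costs no allocation at all,
which is presumably the point of the small on-stack array.
&lt;/p&gt;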

&lt;p&gt;
Even if &lt;code&gt;csize&lt;/code&gt; is zero, a binary gets allocated anyway
(&lt;code&gt;driver_alloc_binary(0)&lt;/code&gt; returns a valid allocation).  That seems
like a waste but I guess it makes subsequent logic simpler.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-orgdd475f1" class="outline-2"&gt;
&lt;h2 id="orgdd475f1"&gt;&lt;span class="section-number-2"&gt;3.&lt;/span&gt; Confirming what we've found&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-3"&gt;
&lt;p&gt;
When I decided to write a test to verify this, I had some problems.
First, I thought writing to &lt;code&gt;/dev/null&lt;/code&gt; with &lt;code&gt;file:write/2&lt;/code&gt; would be
the easiest way to isolate the differences.  After some puzzling
initial results and investigation with strace, I discovered that
&lt;code&gt;file:write/2&lt;/code&gt; always converts the iolist to a binary before sending
it to the file driver!  (I imagine this is to avoid copying long
strings.)
&lt;/p&gt;

&lt;p&gt;
Then I tried sending data to &lt;code&gt;cat &amp;gt;/dev/null&lt;/code&gt; opened as a port.  At
least now strace confirmed that &lt;code&gt;writev&lt;/code&gt; was being called, but
everything was being packed&amp;#x2026; a quick trip into gdb revealed
&lt;code&gt;drv-&amp;gt;outputv&lt;/code&gt; wasn't set — this driver doesn't think it supports
iovecs!
&lt;/p&gt;

&lt;p&gt;
Ok, so a quick grep in &lt;code&gt;erts/drivers/&lt;/code&gt; reveals a lot of drivers don't
define &lt;code&gt;outputv&lt;/code&gt;.  The only thing I could confirm would definitely
use iovecs all the way was the TCP driver.  An informal test, with &lt;code&gt;ncat
-k -l &amp;gt;/dev/null&lt;/code&gt; on the other side, confirmed the differences, and
showed that the case where the iovec structure is preserved, rather than
packed, was three times slower than the packing cases.
&lt;/p&gt;

&lt;p&gt;
So, although this is interesting, being careful about the iolist
structure generated isn't likely to get me any big wins for JSON
encoding.
&lt;/p&gt;

&lt;p&gt;
The iolist-iovec correspondence is probably most useful when you need
to send really large iolists, ones that might be too large to allocate
in one place.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
</content></entry><entry><title>#1GAM February 2015: ZooKicker</title><link href='http://cipht.net/2015/03/01/zookicker.html'/><updated>2015-03-01T03:30:00+0000</updated><id>http://cipht.net/2015/03/01/zookicker</id><content type='html'>&lt;blockquote&gt;
&lt;p&gt;
One who makes no mistakes never makes anything.
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;
It's &lt;a href="http://www.montrealenlumiere.com/nuit-blanche-en/"&gt;Nuit blanche à Montréal&lt;/a&gt;, three in the morning, but I'm not out in
the city, surrounded by revellers; I'm at home, hunched over an aging
Thinkpad, asking myself, "Is this a game?  Can I release this?".  I
tweak another detail, and blaze through the game's three stolen levels
again, prolonging the inevitable.
&lt;/p&gt;

&lt;p&gt;
It's &lt;a href="http://www.onegameamonth.com/"&gt;#1GAM&lt;/a&gt; time again.  How did I end up with another last-minute
crunch after supposedly &lt;a href="http://www.cipht.net/2015/02/01/hbs.html"&gt;learning my lesson last month&lt;/a&gt;?
&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;
(I will be updating this post with links to binaries shortly; for now,
you can build ZooKicker from &lt;a href="https://github.com/tokenrove/zookicker"&gt;source&lt;/a&gt; using my &lt;a href="https://github.com/tokenrove/tsdl"&gt;modified tsdl&lt;/a&gt; and &lt;a href="https://github.com/tokenrove/tsdl-mixer"&gt;extra&lt;/a&gt;
&lt;a href="https://github.com/tokenrove/tsdl-image"&gt;libraries&lt;/a&gt;.)
&lt;/p&gt;

&lt;ul class="org-ul"&gt;
&lt;li&gt;&lt;a href="http://www.cipht.net/tmp/zookicker-150228-linux-x86_64.tar.gz"&gt;ZooKicker Linux &lt;code&gt;x86_64&lt;/code&gt; Debian sid&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;
How to play ZooKicker: move with the cursor keys; press space to kick
a square in the direction you are facing.  You can kick a square
through another square, as long as it's unobstructed.  The goal is to
kick pairs of squares of the same color together.  There are three
levels.
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div id="outline-container-org68dee44" class="outline-2"&gt;
&lt;h2 id="org68dee44"&gt;&lt;span class="section-number-2"&gt;1.&lt;/span&gt; Demon of the Fall&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-1"&gt;

&lt;div id="orgd17c6c2" class="figure"&gt;
&lt;p&gt;&lt;img src="http://cipht.net/images/2015-03-01-dotf.png" alt="[Demon of the Fall]" class="pixelated" width="640" /&gt;
&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;
I spent most of the month working on reviving a game I started writing
in 2004, called Demon of the Fall.
&lt;/p&gt;

&lt;p&gt;
Demon of the Fall was a way for me to pay tribute to my favorite game,
&lt;a href="http://www.giantbomb.com/solstice-the-quest-for-the-staff-of-demnos/3030-13220/"&gt;Solstice&lt;/a&gt; (and its sequel, &lt;a href="http://www.giantbomb.com/equinox/3030-13849/"&gt;Equinox&lt;/a&gt;, and related isometric
puzzle-platformers like &lt;a href="http://www.worldofspectrum.org/infoseekid.cgi?id%3D0002259"&gt;Head over Heels&lt;/a&gt;).  It started as a
straightforward clone, but as Retsyn and I worked together on it, we
came up with some uniquely appropriate gameplay elements.  I won't say
too much about that here, though, because I will probably be giving
Demon of the Fall another shot later this year.
&lt;/p&gt;

&lt;p&gt;
All I'll say is that music is central to Demon of the Fall.  Because
February is also the month of the &lt;a href="http://rpmchallenge.com/"&gt;RPM Challenge&lt;/a&gt;, I decided I could get
rid of two albatrosses at once by recording the soundtrack as my album
for RPM&lt;sup&gt;&lt;a id="fnr.1" class="footref" href="#fn.1" role="doc-backlink"&gt;1&lt;/a&gt;&lt;/sup&gt;, and completing Demon of the Fall.
&lt;/p&gt;

&lt;p&gt;
I started with the best of intentions, as we always do, and resolved
to do a little work on it every day.  However, the code was in an
unusable state in the darcs repo I found, a casualty of a
"refactoring" gone wrong.
&lt;/p&gt;

&lt;p&gt;
Over the course of the year, we will examine more of these corpses,
and in each case, the cause of death will be the same: refactoring
without tests.&lt;sup&gt;&lt;a id="fnr.2" class="footref" href="#fn.2" role="doc-backlink"&gt;2&lt;/a&gt;&lt;/sup&gt;  I'm sure 2004-me could have given you countless
justifications for making these changes without unit tests, but they
were all wrong, as 2015-me gets to discover, again and again.
&lt;/p&gt;

&lt;p&gt;
I'll talk about this more throughout the year in these #1GAM posts,
but let me just relate this to what I got out of February:
&lt;/p&gt;

&lt;p&gt;
Christer Kaitila &lt;a href="http://gamedevelopment.tutsplus.com/articles/1gam-how-to-succeed-at-making-one-game-a-month--gamedev-3695"&gt;talks about &lt;i&gt;the wall&lt;/i&gt;&lt;/a&gt; as a reason games don't get
finished.  That's the point where it stops being the drug-like rush of
implementing interesting stuff and becomes all about patience,
discipline, and other dirty words you spend your early adult years
trying to avoid.
&lt;/p&gt;

&lt;p&gt;
I think that a lot of my projects had a cycle like this:
&lt;/p&gt;
&lt;ul class="org-ul"&gt;
&lt;li&gt;inspiration strikes: I hack out something in a frenzied night or
weekend;&lt;/li&gt;
&lt;li&gt;there's enough kindling that the fire burns while there are
interesting problems to solve and clever algorithms to implement;&lt;/li&gt;
&lt;li&gt;but the logs don't catch, and what remains to do is boring (which
we dismiss as "too simple" to preserve our ego, though the truth is
it's actually "hard but not fun");&lt;/li&gt;
&lt;li&gt;time passes, and I wonder whatever happened to project X;&lt;/li&gt;
&lt;li&gt;I jump in, but I realize the code is a complete hack, or I've
learned a much better way to do some major structural thing, or my
knowledge of whatever novel programming language I used has
completely changed;&lt;/li&gt;
&lt;li&gt;instead of proceeding cautiously (or better yet, just doing the
hard-but-not-fun bits), I start cutting huge swathes through the
code, breaking everything &amp;#x2013; "&lt;i&gt;We had to destroy the code in order to save it&lt;/i&gt;".&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;
Thankfully, at some point I turned around and saw the trail of dead
projects stretching back for miles.  Awareness was the first step; I
also tried to improve not only my testing and refactoring habits, but
also my version control habits (an area where DVCSes have helped a
lot).  Now I recognize when it's happening, and avoid the &lt;a href="http://youarenotsosmart.com/2011/03/25/the-sunk-cost-fallacy/"&gt;sunk cost
fallacy&lt;/a&gt; that can accompany breaking changes ("&lt;i&gt;I can't revert these
commits, they were so much work!&lt;/i&gt;").
&lt;/p&gt;

&lt;p&gt;
Anyway, it took a long time to not only undo some of that damage, but
also to modernize the code and port it to Windows.  Although I worked
diligently, an hour or two a day was not enough.  My kanban board was
like a frozen river.
&lt;/p&gt;

&lt;p&gt;
On February 22nd, I realized that, even if I could ignore all my other
work (which I couldn't, since money pays for electricity and guitar
strings), there was no way to get Demon of the Fall done by the end of
the month.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org18e4cd8" class="outline-2"&gt;
&lt;h2 id="org18e4cd8"&gt;&lt;span class="section-number-2"&gt;2.&lt;/span&gt; Tricky Kick&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-2"&gt;
&lt;p&gt;
After January, though, I had come up with multiple backup plans, in
case my primary game for any month didn't pan out.  These were mostly
plans for clones of simple but fun games that I like.  After a bit of
deliberation, I decided that I would write a clone of &lt;a href="http://www.giantbomb.com/tricky-kick/3030-16919/"&gt;Tricky Kick&lt;/a&gt;, a
PC-Engine game in the fine tradition of puzzle games about kicking or
shoving things, such as &lt;a href="http://www.giantbomb.com/kickle-cubicle/3030-3758/"&gt;Kickle Cubicle&lt;/a&gt; and &lt;a href="http://www.hardcoregaming101.net/mendelpalace/mendelpalace.htm"&gt;Mendel Palace&lt;/a&gt;.
&lt;/p&gt;


&lt;div id="orgd810d56" class="figure"&gt;
&lt;p&gt;&lt;img src="http://cipht.net/images/2015-03-01-tricky-kick-level3.png" alt="[Tricky Kick: Oberon in the Bestiary]" class="pixelated" width="512" /&gt;
&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;
Tricky Kick's puzzles have an excellent property of inducing
&lt;a href="http://en.wikipedia.org/wiki/Einstellung_effect"&gt;Einstellung&lt;/a&gt; through suggestive placement of the pieces.
&lt;/p&gt;

&lt;p&gt;
One of the reasons I had Tricky Kick in mind was that I had worked on
a solver for its levels before, as well as solvers for Rush Hour and other
Sokoban-like games.&lt;sup&gt;&lt;a id="fnr.3" class="footref" href="#fn.3" role="doc-backlink"&gt;3&lt;/a&gt;&lt;/sup&gt;  I knew it had simple mechanics I could
implement quickly, and minimal art requirements.
&lt;/p&gt;

&lt;p&gt;
Because of the simple, grid-oriented game state, I considered writing
the game with a roguelike interface:
&lt;/p&gt;

&lt;pre class="example" id="orgd78e3ce"&gt;
****************
****************
*.*****..*****.*
*.1***....***2.*
*...*....@.*...*
*...3..12..3...*
*...3..12..3...*
*...*......*...*
*.1***....***2.*
*.*****..*****.*
****************
****************
&lt;/pre&gt;

&lt;p&gt;
Indeed, the tests still use this ASCII representation:
&lt;/p&gt;
&lt;div class="org-src-container"&gt;
&lt;pre class="src src-ocaml"&gt;&lt;span style="color: #13665F;"&gt;let&lt;/span&gt; &lt;span style="color: #4C7A90;"&gt;kicked_beast_with_no_obstruction_wraps_til_player&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;()&lt;/span&gt; =
  compare_boards &lt;span style="color: #39854C;"&gt;"&lt;/span&gt;
&lt;span style="color: #39854C;"&gt;.....1&lt;/span&gt;
&lt;span style="color: #39854C;"&gt;.1@...&lt;/span&gt;
&lt;span style="color: #39854C;"&gt;"&lt;/span&gt; &lt;span style="color: #39854C;"&gt;"&lt;/span&gt;
&lt;span style="color: #39854C;"&gt;.....1&lt;/span&gt;
&lt;span style="color: #39854C;"&gt;..@1.."&lt;/span&gt;
    &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;&lt;span style="color: #13665F;"&gt;fun&lt;/span&gt; &lt;span style="color: #845A84;"&gt;it&lt;/span&gt; -&amp;gt; move &lt;span style="color: #383e3f; background-color: #EDEEEB;"&gt;Left&lt;/span&gt; it; kick it&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
It seemed like something that would be fun in a vector-oriented
language like J, but I realized that resolving collisions would be
tricky to do idiomatically in J, since they are much easier to deal
with sequentially than in parallel.
&lt;/p&gt;

&lt;p&gt;
I had been doing a lot of work in OCaml lately, and since I had
prototyped some of my &lt;a href="http://www.cipht.net/2014/11/29/pwl-followup.html"&gt;shape grammar&lt;/a&gt; stuff with OCaml and the &lt;a href="http://erratique.ch/software/tsdl"&gt;Tsdl
bindings&lt;/a&gt; for SDL2, I figured I could use a decent language and still
get things done.  I had delivered software under Windows and OS X with
OCaml before, so I figured the porting friction wouldn't be too bad.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-orgd77ee9a" class="outline-2"&gt;
&lt;h2 id="orgd77ee9a"&gt;&lt;span class="section-number-2"&gt;3.&lt;/span&gt; ZooKicker&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-3"&gt;
&lt;p&gt;
In the first ten levels of Tricky Kick, you are kicking adorable
animals into each other so that they explode.  This is a little
bizarre.  ZooKicker seemed a suitable name to play on that idea, and I
had figured I would draw a bunch of hyper-cute animals to kick around
in keeping with that surreal theme.
&lt;/p&gt;


&lt;div id="org2b5b7f2" class="figure"&gt;
&lt;p&gt;&lt;img src="http://www.cipht.net/images/2015-03-01-zookicker-level2.png" alt="[Level 2]" class="pixelated" width="512" /&gt;
&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;
Of course, none of that polish got done, so the game should really be
called &lt;del&gt;SquarePusher&lt;/del&gt; RectangleSlipper.  What happened?
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org1fe83d7" class="outline-2"&gt;
&lt;h2 id="org1fe83d7"&gt;&lt;span class="section-number-2"&gt;4.&lt;/span&gt; What Went Right&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-4"&gt;
&lt;/div&gt;
&lt;div id="outline-container-orge5f49eb" class="outline-3"&gt;
&lt;h3 id="orge5f49eb"&gt;&lt;span class="section-number-3"&gt;4.1.&lt;/span&gt; Testing&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-4-1"&gt;
&lt;p&gt;
I forced myself to write a bunch of tests for all the things that
could happen on the board, and this was valuable.  It would have been
cool to do some fancier property-based testing, but I easily get
tangled up in making fancy tests where simple, example-based tests
would do.  I'm glad I avoided that trap.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org05ea77b" class="outline-3"&gt;
&lt;h3 id="org05ea77b"&gt;&lt;span class="section-number-3"&gt;4.2.&lt;/span&gt; Music archives&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-4-2"&gt;
&lt;p&gt;
Late on the final night, I dug through my archives of unfinished
recordings, hoping to find something I could use as a backing loop,
over which I would record some guitar and keyboard to serve as music.
Instead I found far more snippets than I actually needed, and I didn't
even bother recording extra parts on top.
&lt;/p&gt;

&lt;p&gt;
Although some of the loops I included are short, I'm happy that every
level in the game has its own music, and the music is a big step up
from the disaster that happened in January.  Maybe by the end of the
year I won't be desperately scrambling for music to add at the last
minute.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org3c13e1e" class="outline-2"&gt;
&lt;h2 id="org3c13e1e"&gt;&lt;span class="section-number-2"&gt;5.&lt;/span&gt; What Went Wrong&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-5"&gt;
&lt;/div&gt;
&lt;div id="outline-container-org52ceaed" class="outline-3"&gt;
&lt;h3 id="org52ceaed"&gt;&lt;span class="section-number-3"&gt;5.1.&lt;/span&gt; Not sending builds to friends&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-5-1"&gt;
&lt;p&gt;
Something that went right in January was uploading new ROM images
daily and soliciting feedback from friends, even when the game was
trivial and bugs made it unplayable.
&lt;/p&gt;

&lt;p&gt;
Meanwhile, I didn't have Demon of the Fall actually running until the
17th, and I never produced standalone builds of it.  With ZooKicker, I
made a few attempts at producing Windows binaries and statically
linked Linux binaries, but it seemed like too much of a hassle at the
time.
&lt;/p&gt;

&lt;p&gt;
Being able to get feedback from people early on is a big motivator.
Although I am averse to spending the little time I have each day on
infrastructure tasks, it seems worth making that one of the first
steps in future #1GAM developments.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org1b8a73f" class="outline-3"&gt;
&lt;h3 id="org1b8a73f"&gt;&lt;span class="section-number-3"&gt;5.2.&lt;/span&gt; Stealing Levels&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-5-2"&gt;
&lt;p&gt;
Maybe this is something that went right, in a sense; it allowed me to
release something.  But I'm not happy with it.
&lt;/p&gt;

&lt;p&gt;
I knew, when considering ZooKicker as a backup option, that one of the
hardest parts would be coming up with good level designs.  I persuaded
myself to copy levels from Tricky Kick in order to get things working,
thinking that I would have time to spend on level design.  I even
thought I might have enough time to adapt my solver code into some
kind of procedural level generator.
&lt;/p&gt;

&lt;p&gt;
No such luck.  I'm sorry about that.  But it's only my second greatest
disappointment with this month's game; the first is the art.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org6673367" class="outline-3"&gt;
&lt;h3 id="org6673367"&gt;&lt;span class="section-number-3"&gt;5.3.&lt;/span&gt; Underestimating Art&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-5-3"&gt;
&lt;p&gt;
Trying to learn from January's experience and Demon of the Fall, I
applied &lt;a href="http://gamedevelopment.tutsplus.com/articles/1gam-how-to-succeed-at-making-one-game-a-month--gamedev-3695"&gt;the McFunkyPants method&lt;/a&gt;, but perhaps a little too
dogmatically; or maybe I just didn't allocate enough time to the
project til the end (it was mostly an hour here and there for the last
week of February, until the big push on the last day).  Either way, I
kept my focus on the no-art playable for longer than was healthy,
given that I am a slow and inexperienced artist.
&lt;/p&gt;

&lt;p&gt;
Art takes time proportional to your desired quality level divided by
your skill level.  I had hoped to make some cute vector creatures to
populate the game, but nothing reached a consistent quality level I
could justify replacing the programmer art with.
&lt;/p&gt;

&lt;p&gt;
Looking back, I think I could have thought further outside the box and
gotten something better together: for example, I could have taken
photos of small plastic farm animals (which we have around the
apartment, somewhere) and used those as the animal sprites.
&lt;/p&gt;

&lt;p&gt;
In the end, I added a facing indicator to the player's rectangle,
which was the final admission that art was not happening this time
around.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org9f03106" class="outline-3"&gt;
&lt;h3 id="org9f03106"&gt;&lt;span class="section-number-3"&gt;5.4.&lt;/span&gt; Never Trust a New Tool&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-5-4"&gt;
&lt;p&gt;
I have used OCaml for many things, on and off, in the last decade, but
I haven't done much game development with it.
&lt;/p&gt;

&lt;p&gt;
I am a huge admirer of Daniel Bünzli's &lt;a href="http://erratique.ch/tags/OCaml"&gt;OCaml libraries&lt;/a&gt;, but it turned
out I had made some rash assumptions about the state of Tsdl.  There
were no bindings for any of the usual SDL helper libraries.  These
libraries, such as &lt;code&gt;SDL2_ttf&lt;/code&gt;, &lt;code&gt;SDL2_image&lt;/code&gt;, and &lt;code&gt;SDL2_mixer&lt;/code&gt;, are not
necessarily the most full-featured or optimized implementations, but
they are incredibly handy for quickly throwing together a game, and I
had just assumed I would have them on hand.
&lt;/p&gt;

&lt;p&gt;
So, I had to &lt;a href="https://github.com/tokenrove/tsdl"&gt;modify tsdl&lt;/a&gt; and create bindings for &lt;a href="https://github.com/tokenrove/tsdl-image"&gt;&lt;code&gt;SDL2_image&lt;/code&gt;&lt;/a&gt; and
&lt;a href="https://github.com/tokenrove/tsdl-mixer"&gt;&lt;code&gt;SDL2_mixer&lt;/code&gt;&lt;/a&gt;.&lt;sup&gt;&lt;a id="fnr.4" class="footref" href="#fn.4" role="doc-backlink"&gt;4&lt;/a&gt;&lt;/sup&gt; Of course, I end up doing that, and learning what
the development workflow is for opam packages&lt;sup&gt;&lt;a id="fnr.5" class="footref" href="#fn.5" role="doc-backlink"&gt;5&lt;/a&gt;&lt;/sup&gt;, and learning
&lt;a href="https://github.com/ocamllabs/ocaml-ctypes"&gt;ctypes&lt;/a&gt;, on the final day when I really just needed to be creating
content and doing polish.
&lt;/p&gt;

&lt;p&gt;
The other thing that bit me is that there seems to be no way to
declare that foreign objects have &lt;a href="http://c2.com/cgi/wiki?DynamicExtent"&gt;dynamic extent&lt;/a&gt;, which I guess is a
Lispism that means (in this case) that the object should live on the
stack.&lt;sup&gt;&lt;a id="fnr.6" class="footref" href="#fn.6" role="doc-backlink"&gt;6&lt;/a&gt;&lt;/sup&gt;
&lt;/p&gt;

&lt;p&gt;
Not having dynamic extent is a huge pain in the ass, especially when
interfacing with C code which often has an API designed around the
idea that small, temporary structures can be cheaply set up without
adding any memory pressure.
&lt;/p&gt;

&lt;p&gt;
At first, my trivial no-art playable was GC'ing every few seconds,
which is totally unacceptable.  It is entirely possible (and
desirable), when writing games in garbage-collected languages, to
never trigger a GC in the inner game loop.  Since I've done this
before in other GC'd languages, I had assumed (given OCaml's
pragmatism, in general) that this would be no problem.
&lt;/p&gt;

&lt;p&gt;
Memory pressure isn't a problem for ZooKicker right now, but it did
give me a scare.  Anyway, efficiency isn't something I should be
talking about with a game thrown together quickly like this, where
&lt;code&gt;List.find&lt;/code&gt; accounts for around 7% of the total execution time.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org9d7a578" class="outline-2"&gt;
&lt;h2 id="org9d7a578"&gt;&lt;span class="section-number-2"&gt;6.&lt;/span&gt; A digression: code dumps and maintained software&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-6"&gt;
&lt;p&gt;
So, as a result of this project, I have now added two more code dumps
to the FOSS landscape, despite resolving to avoid this.  I'm going to
write more about this soon, but I've realized that I have been a poor
free software citizen over the past two decades: I would just leave a
tarball somewhere to gather dust (or open a public repo, in the GitHub
era), rather than tending my open source code like a garden, and I am
changing that.
&lt;/p&gt;

&lt;p&gt;
I still believe code dumps are better than not releasing code at all.
Sometimes it's much better to be able to build on someone's
unmaintained implementation of something than to start from scratch.
That said, there's something unconscientious about it.
&lt;/p&gt;

&lt;p&gt;
There is, of course, an irony about pointing this out in a post about
a game like this, which is often the purest form of code dump, since
games are rarely maintained.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org98477eb" class="outline-2"&gt;
&lt;h2 id="org98477eb"&gt;&lt;span class="section-number-2"&gt;7.&lt;/span&gt; Segue to March&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-7"&gt;
&lt;p&gt;
What have we learned?
&lt;/p&gt;
&lt;ul class="org-ul"&gt;
&lt;li&gt;learning lessons is hard;&lt;/li&gt;
&lt;li&gt;content has to come into the picture early;&lt;/li&gt;
&lt;li&gt;never do more than one new thing at once.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;
Is this a game?  Yes.  Can I release this?  Yes.  Am I disappointed?
Sure, but March is another month, and all I can do is try again.
&lt;/p&gt;

&lt;hr /&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id="footnotes"&gt;
&lt;h2 class="footnotes"&gt;Footnotes: &lt;/h2&gt;
&lt;div id="text-footnotes"&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.1" class="footnum" href="#fnr.1" role="doc-backlink"&gt;1&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
I have participated in RPM every year of its existence, and I
have never finished what I intended to finish.  (I did &lt;a href="http://rpmchallenge.com/index.php?option%3Dcom_comprofiler&amp;amp;task%3Duserprofile&amp;amp;user%3D7024&amp;amp;Itemid%3D296"&gt;finish
something one year&lt;/a&gt;, but the less said about that, the better.)
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.2" class="footnum" href="#fnr.2" role="doc-backlink"&gt;2&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
Beyond that, any time you're tempted to call something a &lt;a href="http://c2.com/cgi/wiki?NoseJobRefactoring"&gt;"big
bang" refactoring&lt;/a&gt;, &lt;a href="http://martinfowler.com/bliki/RefactoringMalapropism.html"&gt;it's not refactoring&lt;/a&gt;.
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.3" class="footnum" href="#fnr.3" role="doc-backlink"&gt;3&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
I beat all 60 levels of Tricky Kick without using the solver,
though.  &lt;a href="http://www.cipht.net/images/2015-03-01-tricky-kick-password.png"&gt;Click here to see the password to unlock all the levels.&lt;/a&gt;
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.4" class="footnum" href="#fnr.4" role="doc-backlink"&gt;4&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
Only enough to get ZooKicker running, for now.
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.5" class="footnum" href="#fnr.5" role="doc-backlink"&gt;5&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
You probably want path pins, not git pins, no matter what the
opam tool tells you.
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.6" class="footnum" href="#fnr.6" role="doc-backlink"&gt;6&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
Add dynamic extent to the long list of features in CL that
many modern languages omit, only to be rediscovered as if novel in the
next wave of "system" programming languages.
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;


&lt;/div&gt;
&lt;/div&gt;</content></entry><entry><title>#1GAM January 2015: Balloon Spite</title><link href='http://cipht.net/2015/02/01/hbs.html'/><updated>2015-02-01T03:30:00+0000</updated><id>http://cipht.net/2015/02/01/hbs</id><content type='html'>&lt;blockquote&gt;
&lt;p&gt;
Want the game?  &lt;a href="http://www.cipht.net/releases/balloon-spite-150201.gba"&gt;Here's the ROM&lt;/a&gt;.  Press B to flap.  Hit L or R in the
select screen to choose an alternate palette.  START skips most
screens.
&lt;/p&gt;
&lt;/blockquote&gt;


&lt;div id="orga363f67" class="figure"&gt;
&lt;p&gt;&lt;img src="http://cipht.net/images/2015-02-01-hbs-in-game.png" alt="[Balloon Spite]" class="pixelated" width="720" /&gt;
&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;
In December, I found out about &lt;a href="http://www.onegameamonth.com/"&gt;One Game a Month&lt;/a&gt; (abbreviated #1GAM),
which is a kind of personal challenge to finish and release one game
every month for a year.  (I am no stranger to &lt;a href="http://nanowrimo.org/"&gt;ridiculous&lt;/a&gt; &lt;a href="http://rpmchallenge.com/"&gt;personal&lt;/a&gt;
&lt;a href="http://www.cppgm.org/"&gt;challenges&lt;/a&gt;.)
&lt;/p&gt;

&lt;p&gt;
I read &lt;a href="http://mcfunkypants.com/2012/12-games-in-12-months/"&gt;Christer Kaitila's blog post about #1GAM&lt;/a&gt;; it deeply resonated
with me when I read, "I’ve started so many more games than I’ve
finished in the last 20 years."
&lt;/p&gt;

&lt;p&gt;
Even before I knew I wanted to be a programmer, I wanted to make
games.  I'm sure this is a stated goal as common and clichéd as
"astronaut" for the children of the Atari age, but I did pursue it
right through my childhood.  After typing games in from magazines, I
started writing my own, and from somewhere around age six or so
onwards until my early twenties, I wrote games.  A lot of games.
&lt;/p&gt;

&lt;p&gt;
So where are they?  Even the most promising projects, products of
fertile collaborations with talented friends, were never finished.
&lt;/p&gt;

&lt;p&gt;
I had thought about doing a personal "games retrospective" before, but
I wasn't sure how to do it properly.  #1GAM presented an opportunity
to dredge up some past games in a time-boxed manner, and to learn more
about finishing things.
&lt;/p&gt;

&lt;div id="outline-container-org0fe0252" class="outline-2"&gt;
&lt;h2 id="org0fe0252"&gt;&lt;span class="section-number-2"&gt;1.&lt;/span&gt; January&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-1"&gt;
&lt;p&gt;
January got off to a rough start, though.  I had a huge list of games
I wanted to work on, and I became paralyzed by choice.  After a week
of regret and despair, I came back to it more systematically.  I
couldn't work on a game full-time, so it had to be something I could
conceivably release on about a commit or two a day.
&lt;/p&gt;

&lt;p&gt;
I dug through my archives.  Unfortunately, the backups I had with me
only had a smattering of working copies of CVS checkouts of some old
projects, rather than the repositories themselves.  One project looked
like it met the criteria: Hyper Ballon Struggle.
&lt;/p&gt;


&lt;div id="orgc3fb13f" class="figure"&gt;
&lt;p&gt;&lt;img src="http://cipht.net/images/2015-02-01-hbs-original.png" alt="[Hyper Ballon Struggle]" class="pixelated" width="480" /&gt;
&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;
In 2002, I ordered a Game Boy Advance flash linker from &lt;a href="http://en.wikipedia.org/wiki/Lik_Sang"&gt;Lik Sang&lt;/a&gt;, and
RhombusSoft started working on GBA games.  (I'll explain RhombusSoft
in later #1GAM blog posts.)  Hyper Ballon Struggle was a project to
see how much GBA game we could fit into a weekend's worth of development.  (We
would often have weekend game making parties, which were basically
game jams now that I think about it, although the concept was not
known to us at the time.)
&lt;/p&gt;

&lt;p&gt;
It was intended as a &lt;a href="http://en.wikipedia.org/wiki/Balloon_Fight"&gt;Balloon Fight&lt;/a&gt; / &lt;a href="http://en.wikipedia.org/wiki/Joust_%2528video_game%2529"&gt;Joust&lt;/a&gt; clone featuring a roster of
characters from other games we had developed, in the spirit of
crossover/all-star games like &lt;a href="http://www.hardcoregaming101.net/waiwaiworld/waiwaiworld.htm"&gt;Wai Wai World&lt;/a&gt;, &lt;a href="http://segaretro.org/Saturn_Bomberman"&gt;Saturn Bomberman&lt;/a&gt;, or
&lt;a href="http://www.smashbros.com/"&gt;Super Smash Bros&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;
Once I figured out how to build the copy I had, however, I realized it
was incomplete and out-of-date.  I knew that the last version we built
that weekend had a playfield (but no playfield collision), music, and
a "great" rotscale checkerboard effect in the select screen.  This
version only had the core Joust-style mechanic and little else.
&lt;/p&gt;

&lt;p&gt;
I made some calls to people still living in Newfoundland (where all
this part of history happened), and a backup tape was discovered,
dated right after we turned out the lights on our little game studio.
I was thrilled, until I discovered there was no way to read the tape;
a machine with an internal DDS-2 drive was dug out of storage, but the
drive no longer functioned.  Not wanting to ship the tape around, I
decided to work with what I had.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org459dfd1" class="outline-2"&gt;
&lt;h2 id="org459dfd1"&gt;&lt;span class="section-number-2"&gt;2.&lt;/span&gt; What Balloon Spite is&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-2"&gt;
&lt;p&gt;
I renamed Hyper Ballon Struggle to Balloon Spite, because I never pass
up a cheap pun (Balloonacy was already taken).
&lt;/p&gt;
&lt;/div&gt;

&lt;div id="outline-container-org9577e3b" class="outline-3"&gt;
&lt;h3 id="org9577e3b"&gt;&lt;span class="section-number-3"&gt;2.1.&lt;/span&gt; As a game&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-2-1"&gt;

&lt;div id="orge40807c" class="figure"&gt;
&lt;p&gt;&lt;img src="http://cipht.net/images/2015-02-01-hbs-sweat.png" alt="[Sweating like a pig]" class="pixelated" width="480" /&gt;
&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;
Because of the possibility that I wouldn't have time to draw new
sprites, I decided to turn it into a kind of Street Fighter II-style
one-on-one game, but with a balloon popping mechanic.
&lt;/p&gt;

&lt;p&gt;
I added an exertion mechanic (indicated by the sweat drop) hoping it
would provide some depth to the play, but the stamina values need some
tweaking for it to be really useful.  You can sometimes tire out the
computer opponent if they try camping at the top of the screen, then
get in a quick attack.
&lt;/p&gt;

&lt;p&gt;
There are eight levels.  The original intent had been to have a level
for each character, plus two or more levels with special boss
characters, and of course, different music for each level, with nods
in the themes to the music from that character's respective game.
Alas, as it is, there are only two short, irritating in-game songs,
because the music was done in an afternoon.
&lt;/p&gt;

&lt;p&gt;
The characters and what (unreleased) game they're from:
&lt;/p&gt;

&lt;table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides"&gt;


&lt;colgroup&gt;
&lt;col  class="org-left" /&gt;

&lt;col  class="org-left" /&gt;

&lt;col  class="org-left" /&gt;
&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th scope="col" class="org-left"&gt;Character&lt;/th&gt;
&lt;th scope="col" class="org-left"&gt;&amp;#xa0;&lt;/th&gt;
&lt;th scope="col" class="org-left"&gt;Game&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class="org-left"&gt;Harvey&lt;/td&gt;
&lt;td class="org-left"&gt;&lt;img src="http://cipht.net/images/2015-02-01-hbs-harvey.png" alt="2015-02-01-hbs-harvey.png" /&gt;&lt;/td&gt;
&lt;td class="org-left"&gt;AnimoCity&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;Rudolph&lt;/td&gt;
&lt;td class="org-left"&gt;&lt;img src="http://cipht.net/images/2015-02-01-hbs-rudolph.png" alt="2015-02-01-hbs-rudolph.png" /&gt;&lt;/td&gt;
&lt;td class="org-left"&gt;Quest of Zo&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;Ralph&lt;/td&gt;
&lt;td class="org-left"&gt;&lt;img src="http://cipht.net/images/2015-02-01-hbs-alien.png" alt="2015-02-01-hbs-alien.png" /&gt;&lt;/td&gt;
&lt;td class="org-left"&gt;&lt;a href="http://www.indiedb.com/games/buckler-strife"&gt;Buckler Strife&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;Lopez&lt;/td&gt;
&lt;td class="org-left"&gt;&lt;img src="http://cipht.net/images/2015-02-01-hbs-lopez.png" alt="2015-02-01-hbs-lopez.png" /&gt;&lt;/td&gt;
&lt;td class="org-left"&gt;Greed'n'Magic&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;Pierce&lt;/td&gt;
&lt;td class="org-left"&gt;&lt;img src="http://cipht.net/images/2015-02-01-hbs-pierce.png" alt="2015-02-01-hbs-pierce.png" /&gt;&lt;/td&gt;
&lt;td class="org-left"&gt;Maelstrom&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;Greedy&lt;/td&gt;
&lt;td class="org-left"&gt;&lt;img src="http://cipht.net/images/2015-02-01-hbs-greedy.png" alt="2015-02-01-hbs-greedy.png" /&gt;&lt;/td&gt;
&lt;td class="org-left"&gt;Greed'n'Magic&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;Sam&lt;/td&gt;
&lt;td class="org-left"&gt;&lt;img src="http://cipht.net/images/2015-02-01-hbs-sam.png" alt="2015-02-01-hbs-sam.png" /&gt;&lt;/td&gt;
&lt;td class="org-left"&gt;Convergence&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;Iceclown&lt;/td&gt;
&lt;td class="org-left"&gt;&lt;img src="http://cipht.net/images/2015-02-01-hbs-iceclown.png" alt="2015-02-01-hbs-iceclown.png" /&gt;&lt;/td&gt;
&lt;td class="org-left"&gt;Fobwart&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;
The iceclown isn't a playable character, but was intended to be a
boss.  (Lopez was supposed to be a boss, too, but for lack of
KidThulhu (from the eponymous game) and Peter (from Demon of the
Fall), he remained selectable.)
&lt;/p&gt;

&lt;p&gt;
Myr was one of the artists on the team.  There also was a sprite that
I thought was supposed to be Retsyn, but he disputes this and removed
it from the game.  (It should be noted that half these games were
written by Retsyn; &lt;a href="http://www.indiedb.com/games/buckler-strife"&gt;Buckler Strife&lt;/a&gt; is the only one I had no involvement
in, though.)
&lt;/p&gt;

&lt;p&gt;
Melville is a reference to the Taito game &lt;a href="http://en.wikipedia.org/wiki/Cameltry"&gt;Cameltry&lt;/a&gt;.  ("Moby died in
the Spinning Room.")
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org742ac6c" class="outline-3"&gt;
&lt;h3 id="org742ac6c"&gt;&lt;span class="section-number-3"&gt;2.2.&lt;/span&gt; Technically&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-2-2"&gt;
&lt;p&gt;
According to &lt;a href="http://www.dwheeler.com/sloccount/"&gt;sloccount&lt;/a&gt;, it's about 2620 lines of ARM assembly code
(ignoring another 1kLOC of tables and such).  It's pretty buggy, but
it does have a unique character to it.  There were some bugs I
consciously decided not to expend time on, like interpenetration
resolution having the possibility of pushing a character into the
playfield.  In the final hour or two, copy-and-paste became the
dominant coding paradigm.  The code is &lt;a href="http://github.com/tokenrove/hyper-ballon-struggle"&gt;on github&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;
&lt;a href="http://pulkomandy.tk/projects/GrafX2"&gt;GrafX2&lt;/a&gt; was used for all pixels, both in 2002 and in 2015.  Levels were
created as PCXes (yes, PCX, even in 2015) and converted to tiles with
a tool borrowed from Convergence.  Music, such as it is, was created
in emacs and compiled with &lt;a href="https://github.com/tokenrove/mumble"&gt;mumble&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;
The original game did run on the real hardware, since we didn't have a
usable emulator at the time!  All my GBA development hardware is in
storage, so no new builds have been tested on the real thing.  That's
okay; I will be testing it in the future, and maybe I'll patch it
later.  Most people will be playing on emulators, anyway.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org772073f" class="outline-2"&gt;
&lt;h2 id="org772073f"&gt;&lt;span class="section-number-2"&gt;3.&lt;/span&gt; What Went Right&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-3"&gt;
&lt;/div&gt;
&lt;div id="outline-container-orgf521fcb" class="outline-3"&gt;
&lt;h3 id="orgf521fcb"&gt;&lt;span class="section-number-3"&gt;3.1.&lt;/span&gt; What went right: Targeting the GBA&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-3-1"&gt;
&lt;p&gt;
By producing a ROM image that could be run in any emulator, I made it
much easier to send builds to my friends and get quick feedback.  This
was an unexpected benefit.  Getting feedback, even on early, broken
builds, was helpful in maintaining motivation.
&lt;/p&gt;

&lt;p&gt;
I remember that iterating with the flash linker was a really slow
process, so I have no shame in targeting an emulator where I can get
nearly immediate feedback from a build.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-orgdf5af99" class="outline-3"&gt;
&lt;h3 id="orgdf5af99"&gt;&lt;span class="section-number-3"&gt;3.2.&lt;/span&gt; What went right: Retsyn's great pixels&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-3-2"&gt;
&lt;p&gt;
I was already blessed with some pretty great sprites:
&lt;/p&gt;


&lt;div id="org70dade6" class="figure"&gt;
&lt;p&gt;&lt;img src="http://cipht.net/images/2015-02-01-hbs-old-roster.png" alt="[Old roster]" class="pixelated" width="480" /&gt;
&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;
Or so I thought, until Retsyn came to the rescue at the last minute
and revamped the sprites and palettes:
&lt;/p&gt;


&lt;div id="orgf337d03" class="figure"&gt;
&lt;p&gt;&lt;img src="http://cipht.net/images/2015-02-01-hbs-new-roster.png" alt="[New roster]" class="pixelated" width="480" /&gt;
&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;
He also did the background for Myr's moon stage.  (And, of course, the
vast majority of the old Rhombus content was done by him, back in 2002
and before.)
&lt;/p&gt;

&lt;p&gt;
It was inspirational, and kept me going in the final moments of the
project.  Of course, it also made me regret the time wasted that could
have gone into polishing other aspects, like the music and sound.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org342d537" class="outline-3"&gt;
&lt;h3 id="org342d537"&gt;&lt;span class="section-number-3"&gt;3.3.&lt;/span&gt; What went right: Resolving to Ship&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-3-3"&gt;
&lt;p&gt;
&lt;a href="http://drjeffszymanski.com/the-perfectionists-handbook"&gt;The Perfectionist's Handbook&lt;/a&gt; is one of the books that has been a
critical part of my journey towards becoming a finisher.  I realized I
had to expose myself to criticism if I was going to get anything done.
I had to stop seeing every piece of code as a reflection of my
self-worth.  And I had to stop trying to optimize (or elegantize) the
hell out of everything.
&lt;/p&gt;

&lt;p&gt;
A number of books draw the distinction between healthy and unhealthy
perfectionism, but Szymanski's Handbook was the most useful in
convincing me that I could let go of some things without losing the
good things about perfectionism.  Most importantly, it helped me
understand that perfectionism, instead of causing me to release only
flawless code, had caused me to withhold tons of good code from
release, releasing only mediocre code under duress.
&lt;/p&gt;

&lt;p&gt;
I stole a few tools from my other projects, most importantly tools for
converting PCX files to GBA tiles.  The code was often terrible &amp;#x2013;
several of the tools were among the first programs I had ever written
in OCaml, and it certainly shows.  My instinct was to dramatically
refactor them immediately, and I am a little proud that I resisted.
Maybe I will go back and clean them up, but I recognized that it
wouldn't further my goal.  The Julian of 2002 would not have been able
to bear that.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-asm"&gt;&lt;span style="color: #13665F;"&gt;.section&lt;/span&gt; .ewram
&lt;span style="color: #13665F;"&gt;.align&lt;/span&gt; 2
@@ XXX should use an overlay for this
&lt;span style="color: #13665F;"&gt;.lcomm&lt;/span&gt; balloons, BALLOON_LEN*MAX_BALLOONS
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
There were so many optimizations that I could have made and chose
not to.  For example, separate screens ("activities" in Android lingo)
could share the same region of memory for their local variables, using
overlays in the linker script.  Didn't do it.  All the tile data, map
data, music data, and even PCM samples (!) are totally uncompressed.
Scandalous!  Various structures in memory wasted bits or even bytes
out of convenience.  In 2002, I could never have endured that.
&lt;/p&gt;

&lt;p&gt;
One thing I didn't know at the time was that the code in ROM would
have been faster, in general, if it had been written in Thumb mode
rather than ARM mode, because the cartridge bus is only 16 bits wide
and each 32-bit ARM instruction takes two fetches.  I strongly
considered rewriting my code to use
mixed modes, but I reminded myself: I need to ship this in a matter of
days.  It's just a silly little game.  &lt;a href="http://c2.com/cgi/wiki?YouArentGonnaNeedIt"&gt;YAGNI&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;
When it came time to implement normalization of vectors when computing
contact normals for collision resolution, I did waste a bit of time
thinking about the efficient implementation of &lt;a href="http://en.wikipedia.org/wiki/Fast_inverse_square_root"&gt;reciprocal square root&lt;/a&gt;
versus &lt;a href="http://en.wikipedia.org/wiki/Atan2"&gt;atan2&lt;/a&gt;.  I had also been thinking about implementing &lt;a href="https://books.google.ca/books?id%3DrL25bfhHGk8C&amp;amp;pg%3DPA267&amp;amp;lpg%3DPA267&amp;amp;dq%3Dsunderland%2Blookup%2Btable&amp;amp;source%3Dbl&amp;amp;ots%3Dyw-DCIqK4X&amp;amp;sig%3DB4EvnXBJ6K0bhI84GrG6_KA5LTs&amp;amp;hl%3Den&amp;amp;sa%3DX&amp;amp;ei%3DcXDOVIGgEfX8sATr7oDwBQ&amp;amp;ved%3D0CGEQ6AEwCA#v%3Donepage&amp;amp;q%3Dsunderland%2520lookup%2520table&amp;amp;f%3Dfalse"&gt;the
Sunderland algorithm&lt;/a&gt; for improving my trigonometric function lookup
tables.  &lt;a href="http://c2.com/cgi/wiki?YouArentGonnaNeedIt"&gt;YAGNI&lt;/a&gt;.  Maybe later.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-asm"&gt;&lt;span style="color: #4C7A90;"&gt;compute_contact_normal&lt;/span&gt;:
        &lt;span style="color: #13665F;"&gt;stmfd&lt;/span&gt; sp!, &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;lr&lt;span style="color: #4d9391;"&gt;}&lt;/span&gt;
        @@ XXX Ideally, we'd use the reciprocal square root here.
        @@ There are great, simple algorithms for it.  But let's get the
        @@ slow way working first.
        &lt;span style="color: #13665F;"&gt;mov&lt;/span&gt; r0, r7, lsl #PHYS_FIXED_POINT*2
        &lt;span style="color: #13665F;"&gt;swi&lt;/span&gt; #8&amp;lt;&amp;lt;16              @ sqrt
        &lt;span style="color: #13665F;"&gt;mov&lt;/span&gt; r7, r0
        &lt;span style="color: #13665F;"&gt;mov&lt;/span&gt; r0, r8, lsl #PHYS_FIXED_POINT*2
        &lt;span style="color: #13665F;"&gt;swi&lt;/span&gt; #8&amp;lt;&amp;lt;16              @ sqrt
        &lt;span style="color: #13665F;"&gt;mov&lt;/span&gt; r8, r0
        @@ r7 = distance &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;12.4&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;, r8 = penetration &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;12.4&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
I used the slow BIOS division and square-root routines instead; the
first time, my hands trembled.  I think it's the first time I've
written a division on a system like the GBA that wasn't a shift
or a reciprocal table lookup.  The second time, I stopped and wondered
how many divisions I could survive in one frame.  By the eighth time,
I didn't even think about it.  Ship it.  Optimize only if there's a
reason to do so.
&lt;/p&gt;
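
&lt;p&gt;
For reference, this is roughly what one of those divisions looks like
from ARM mode.  It is only a sketch: the label, the register
assignments, and the value given to &lt;code&gt;PHYS_FIXED_POINT&lt;/code&gt; are assumptions
made for illustration (the 12.4 comment above suggests four fraction
bits), and it relies on the commonly documented BIOS convention that
SWI 6 takes the numerator in r0 and the denominator in r1, and returns
the quotient in r0, the remainder in r1, and the absolute quotient in r3.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-asm"&gt;        .equ PHYS_FIXED_POINT, 4        @ assumed: 4 fraction bits (12.4)

@@ Hypothetical helper: divide one vector component by the vector's length.
@@ In:  r5 = component (12.4, signed), r7 = length (12.4, non-zero)
@@ Out: r0 = r5 / r7 (12.4, signed); all other registers preserved
normalize_component:
        stmfd sp!, {r1, r3, lr}
        mov r0, r5, lsl #PHYS_FIXED_POINT   @ pre-scale so the quotient keeps its fraction bits
        mov r1, r7                      @ denominator; the caller must ensure it is non-zero
        swi #6&amp;lt;&amp;lt;16              @ BIOS Div: r0 = quotient, r1 = remainder, r3 = |quotient|
        ldmfd sp!, {r1, r3, pc}
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Pre-scaling the numerator means both operands can stay in 12.4 and the
result comes back in 12.4 as well, at the cost of four bits of headroom.
&lt;/p&gt;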
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org7c2cf6a" class="outline-3"&gt;
&lt;h3 id="org7c2cf6a"&gt;&lt;span class="section-number-3"&gt;3.4.&lt;/span&gt; What went right: I started to enjoy working on it&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-3-4"&gt;
&lt;p&gt;
The most important thing that went right is that, to my surprise, I
started to enjoy working on the game; I even started to enjoy playing
games again.
&lt;/p&gt;

&lt;p&gt;
It's been years since I've enjoyed playing videogames at all.  I think
the peak for me was when I was in my early 20s.  (If I don't enjoy
games, why am I writing them?  There's a question to answer later this
year.  I'm still thinking about it.)
&lt;/p&gt;

&lt;p&gt;
The possibilities of the idea, which seemed stunted as I began, opened
up as I spent time with it.  I didn't get to exploit any of those
possibilities, really, but it was a good lesson.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-orga5f4552" class="outline-2"&gt;
&lt;h2 id="orga5f4552"&gt;&lt;span class="section-number-2"&gt;4.&lt;/span&gt; What Went Wrong&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-4"&gt;
&lt;p&gt;
Getting back up to speed with the GBA took a little while, and given
the late start, it ate up days that would have really counted in the
end.
&lt;/p&gt;

&lt;p&gt;
I have to admit that when I started, I was so rusty that I wrote &lt;code&gt;mvn
r0, r0&lt;/code&gt; rather than &lt;code&gt;rsb r0, r0, #0&lt;/code&gt; when trying to negate an integer, and
made similar kinds of mistakes, but it came back to me eventually.
&lt;/p&gt;
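
&lt;p&gt;
For anyone as rusty as I was: &lt;code&gt;mvn&lt;/code&gt; gives the bitwise complement, which in
two's complement is one less than the negation, so the mistake is easy
to miss when eyeballing values.  A minimal illustration, with arbitrary
example values and registers (not taken from the game's source):
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-asm"&gt;        mov r0, #5
        mvn r1, r0              @ r1 = ~5 = 0xfffffffa = -6, i.e. -x - 1
        rsb r2, r0, #0          @ r2 = 0 - 5 = 0xfffffffb = -5, the actual negation
&lt;/pre&gt;
&lt;/div&gt;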

&lt;p&gt;
I didn't have the same cross-compiler the original project was built
with, so I dropped in the linker script I wrote for Convergence and
hoped it would work.  Instead, it cost me a couple of days of
debugging before I realized that the &lt;code&gt;.data&lt;/code&gt; and &lt;code&gt;.bss&lt;/code&gt; sections were
silently being put in ROM.  It turned out that, in Convergence, I had
always indicated whether space was to be reserved in EWRAM or IWRAM,
so I never added a generic BSS or data segment to the linker script.
Once I figured it out, I was pretty surprised that &lt;code&gt;ld&lt;/code&gt; hadn't
complained, but these are the perils of reuse from other projects.
&lt;/p&gt;
&lt;/div&gt;

&lt;div id="outline-container-org6f028c0" class="outline-3"&gt;
&lt;h3 id="org6f028c0"&gt;&lt;span class="section-number-3"&gt;4.1.&lt;/span&gt; What went wrong: Making assumptions about tools available&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-4-1"&gt;
&lt;p&gt;
I left the music til pretty late, because I can compose pretty quickly
(as evidenced by &lt;a href="https://archive.org/details/polygon018"&gt;Chip&lt;/a&gt; &lt;a href="http://www.cipht.net/chiptune-weekend/martins-inferno/"&gt;Weekend&lt;/a&gt;); the original music routine written for
Hyper Ballon Struggle was missing, but I assumed that the music
playroutine in Convergence would be fine, and that my archive of that
would include tools to convert some format (probably XM or MIDI) to
its internal format.
&lt;/p&gt;

&lt;p&gt;
Imagine my horror when, on January 29th, I attempted to replace the
awful test song only to discover that the music conversion tools for
Convergence, if they were ever finished, weren't in the copy I had.
&lt;/p&gt;

&lt;p&gt;
I decided I would add support for PH-1 (the Convergence playroutine)
to &lt;a href="http://github.com/tokenrove/mumble"&gt;mumble&lt;/a&gt;, a compiler from an &lt;a href="http://en.wikipedia.org/wiki/Music_Macro_Language"&gt;MML&lt;/a&gt;-like textual music description to
various playroutines, which I wrote when I was working on Atari ST demos
and then totally abandoned, circa 2004.  It was pretty naïvely
implemented, and very incomplete.  There was another temptation to
rewrite it, especially given how much my understanding of Common Lisp
has improved since I wrote it, but at this point I knew there was no
time.
&lt;/p&gt;

&lt;pre class="example" id="org4ca579f"&gt;
;; Victory fanfare
A o3     a12aaa4     &amp;gt;c12ccc4 | c+12c+c+c+4 e12eee4  | e1
B o2 %i2 c+12c+c+c+4 e12eee4  | e12eee4     g12ggg4  | b1
C o2     e12eee4     g12ggg4  | a12aaa4     &amp;gt;c12ccc4 | g+1
&lt;/pre&gt;

&lt;p&gt;
I was planning to present Balloon Spite to some friends that evening,
so I tried to focus on the straightest path to the goal, even if it
meant a lot of compromise.  I hacked in crude PH-1 support, although
mumble's lack of support for drum kit-style bindings for the noise and
PCM channels meant I couldn't quickly write percussion parts.  This is
one of the worst deficits of the music that ended up in the game, and
it's one of the reasons it all feels terribly primitive.
&lt;/p&gt;

&lt;p&gt;
Added to that, the playroutine design for Convergence sucks; there's
no getting away from that.  Music was left til late in Convergence,
too, so the playroutine design was never battle tested.  It was my
second (or maybe third) chip playroutine design and suffered from me
trying to do things differently, thinking that a more musical
representation would be compact and efficient.  (These days, I think
&lt;a href="http://web.archive.org/web/20160321171110/http://blog.kebby.org/?p=34"&gt;the approach KB used for fr-08&lt;/a&gt; is way better than something like
this.)  Also, many features were simply unimplemented: no arpeggios,
no vibrato, no volume envelopes &amp;#x2013; those are the essential tools for
making chip music that doesn't sound dead, that doesn't sound like
the output of BASIC's &lt;code&gt;PLAY&lt;/code&gt; statement.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-orgfa89868" class="outline-3"&gt;
&lt;h3 id="orgfa89868"&gt;&lt;span class="section-number-3"&gt;4.2.&lt;/span&gt; What went wrong: Not following the McFunkypants Method&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-4-2"&gt;
&lt;p&gt;
I was aware of &lt;a href="http://gamedevelopment.tutsplus.com/articles/1gam-how-to-succeed-at-making-one-game-a-month--gamedev-3695"&gt;the McFunkypants Method&lt;/a&gt; for finishing a game, but I
didn't follow it as closely as I could have.
&lt;/p&gt;


&lt;div id="org7cdfe89" class="figure"&gt;
&lt;p&gt;&lt;img src="http://cipht.net/images/2015-02-01-hbs-versus.png" alt="[The incomplete VERSUS screen]" class="pixelated" width="480" /&gt;
&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;
I think the biggest mistake was not focusing relentlessly on the core
gameplay until it was done.  I was seduced by my overall vision,
especially aspects that were more "tech demo" than game.  The "versus"
screen was supposed to have a rotscale spinning-out effect on the
"VERSUS" text, for example.  Levels were supposed to have different
combinations of parallax scrolling foregrounds and backgrounds, et
cetera.  The select screen has its nauseating rotscale checkerboard
effect (a recreation of an effect I remember from the original).
Players can have alternate palettes (and palettes are pretty
customizable in general).  A fair amount of time was wasted doing these
things or experimenting with them.
&lt;/p&gt;

&lt;p&gt;
That time would have been better focused on the core gameplay.  As
late as last night, my final night working on the project, I was
making major gameplay changes, and the build fluctuated between
impossibly hard and incredibly easy.  The final build is not as much
fun as one of the earlier builds, but I accept that as a lesson about
priorities and time management.
&lt;/p&gt;

&lt;p&gt;
For February, I'm planning to follow the aforementioned method much
more closely.  Today I'll be making my storyboard, since the game idea
is already established, and planning what is required to completely
implement the core gameplay in the first week.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org13cd54f" class="outline-2"&gt;
&lt;h2 id="org13cd54f"&gt;&lt;span class="section-number-2"&gt;5.&lt;/span&gt; Lessons Learned&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-5"&gt;
&lt;p&gt;
I think the summary of all that is:
&lt;/p&gt;
&lt;ul class="org-ul"&gt;
&lt;li&gt;verify your tools before you start&lt;/li&gt;
&lt;li&gt;make choices early and stick with them&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.stevenpressfield.com/2013/10/you-as-the-muse-sees-you/"&gt;the Muse waits for you at your desk&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;
Cumulatively, this was a full weekend of effort in 2002, plus about
thirteen days of spare-time hacking and maybe two days of full-on
work.  Given that timeline, I am pretty happy with the result.  Maybe
next year I will revisit it and put out a "remix" version with the
music and physics it deserves, but even if I don't, I'm content that I
gave it a chance to finally see the light of day.
&lt;/p&gt;

&lt;p&gt;
Onward, to February!
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
</content></entry><entry><title>Papers We Love Montreal Followup: Procedural Modeling of Buildings</title><link href='http://cipht.net/2014/11/29/pwl-followup.html'/><updated>2014-11-29T03:30:00+0000</updated><id>http://cipht.net/2014/11/29/pwl-followup</id><content type='html'>&lt;blockquote&gt;
&lt;p&gt;
(I organize a &lt;a href="http://meetup.com/Papers-We-Love-Montreal"&gt;meetup group&lt;/a&gt; which is the Montreal chapter of &lt;a href="http://paperswelove.org/"&gt;Papers We
Love&lt;/a&gt;; we meet monthly to discuss a paper or papers someone loves, to
help bridge the gap between industry and academia for working
programmers.  This is a followup to the last meetup, where I talked
about the paper &lt;a href="http://peterwonka.net/Publications/pdfs/2006.SG.Mueller.ProceduralModelingOfBuildings.final.pdf"&gt;Procedural Modeling of Buildings&lt;/a&gt;.)
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div id="outline-container-org70b4bbf" class="outline-2"&gt;
&lt;h2 id="org70b4bbf"&gt;&lt;span class="section-number-2"&gt;1.&lt;/span&gt; On questions&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-1"&gt;
&lt;p&gt;
There were some very good questions during my talk, but I think one thing
could be made clearer: at a PWL meetup, there are no dumb
questions, and &lt;a href="http://www.catb.org/jargon/html/W/Whats-a-spline.html"&gt;"What's a Spline?" questions&lt;/a&gt; can often benefit
everyone.  For example, I was a little surprised no one asked about
&lt;a href="http://en.wikipedia.org/wiki/Euler_operator"&gt;Euler operators&lt;/a&gt;, since I'd never heard of them until I started
implementing the boundary representation code for this talk.  There
were a few aspects like that I realized I should have explained in
more detail.  Next time, you can help, by speaking up and asking a
question.
&lt;/p&gt;

&lt;p&gt;
Also, I'll add this to the meetup page, but it should be noted that
papers, presentations, and questions are welcome in French.  If the
speaker doesn't understand French, someone will be happy to translate.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org550d30b" class="outline-2"&gt;
&lt;h2 id="org550d30b"&gt;&lt;span class="section-number-2"&gt;2.&lt;/span&gt; Grammar-driven test case generation&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-2"&gt;
&lt;p&gt;
One of the big things we talked about was how to choose productions
when generating from a grammar, and I mentioned &lt;a href="http://mcts.ai/"&gt;Monte Carlo tree
search&lt;/a&gt; in the context of a game where your generator is "playing
against" the system under test, with crashes, resource leaks, or code
coverage as the "score".  Lots of learning algorithms could be used in
this case, but MCTS is worth checking out because it doesn't require
coming up with a good admissible heuristic yourself.  Also, it's a
very cool, very general algorithm that has had tremendous success
playing Go against humans.
&lt;/p&gt;

&lt;p&gt;
The &lt;a href="http://en.wikipedia.org/wiki/Augmented_Backus%E2%80%93Naur_Form"&gt;ABNF&lt;/a&gt;-driven generator I mentioned will be released soon.  Until
then, check out &lt;a href="http://www.quut.com/abnfgen/"&gt;this one&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;
The paper I mentioned regarding this approach for protocol fuzzing was
&lt;a href="http://link.springer.com/chapter/10.1007%252F978-3-642-39218-4_9"&gt;Extraction of ABNF Rules from RFCs to Enable Automated Test Data
Generation&lt;/a&gt;.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org64142a7" class="outline-2"&gt;
&lt;h2 id="org64142a7"&gt;&lt;span class="section-number-2"&gt;3.&lt;/span&gt; L-systems&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-3"&gt;
&lt;p&gt;
The &lt;a href="http://algorithmicbotany.org/"&gt;Algorithmic Botany&lt;/a&gt; page is full of great stuff, often
L-system-related.
&lt;/p&gt;

&lt;p&gt;
I mentioned &lt;a href="http://nothings.org/gamedev/l_systems.html"&gt;Sean Barrett's criticism of L-systems&lt;/a&gt;, and I think it's
worth a read.  I don't agree with his conclusion, but his reduction of
a specific L-system down to a simple algorithm is a great demonstration
of the need to think about generality and constraint in notation.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-orgf9ae69d" class="outline-2"&gt;
&lt;h2 id="orgf9ae69d"&gt;&lt;span class="section-number-2"&gt;4.&lt;/span&gt; Procedural Modeling of Buildings&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-4"&gt;
&lt;p&gt;
The most important link is &lt;a href="http://peterwonka.net/Publications/publications.html"&gt;Peter Wonka's publications page&lt;/a&gt;; most of
the papers mentioned are there, or are directly cited by one of those
papers.
&lt;/p&gt;

&lt;p&gt;
In particular, in addition to the central papers discussed (&lt;a href="http://peterwonka.net/Publications/pdfs/2003.SG.Wonka.InstantArchitecture.high.pdf"&gt;Instant
Architecture&lt;/a&gt;, &lt;a href="http://peterwonka.net/Publications/pdfs/2006.SG.Mueller.ProceduralModelingOfBuildings.final.pdf"&gt;Procedural Modeling of Buildings&lt;/a&gt;), check out the papers
on Inverse Procedural Modeling, and parallel derivation/evaluation of
shape grammars.
&lt;/p&gt;

&lt;p&gt;
Also, I mentioned &lt;a href="http://www.esri.com/software/cityengine"&gt;CityEngine&lt;/a&gt;; be sure to &lt;a href="https://www.youtube.com/watch?v%3DaFRqSJFp-I0"&gt;check out their
impressive demo videos&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;
Two things I would have liked to discuss more were the parallels
between a grammar-driven generator and a programming language
interpreter, and graph rewriting / graph grammars.  Maybe those are
topics that can come up again at a future PWL meetup.
&lt;/p&gt;
&lt;/div&gt;

&lt;div id="outline-container-org99f5bf5" class="outline-3"&gt;
&lt;h3 id="org99f5bf5"&gt;&lt;span class="section-number-3"&gt;4.1.&lt;/span&gt; Bag of links&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-4-1"&gt;
&lt;ul class="org-ul"&gt;
&lt;li&gt;&lt;a href="http://en.wikipedia.org/wiki/Matrix_grammar"&gt;Matrix grammar&lt;/a&gt; — a potentially simpler way to formalize the LOD idea
in the paper;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.shapegrammar.org/"&gt;Shape Grammars&lt;/a&gt; — in which you can find reference to many uses of
shape grammars in architecture and design, including the coffee
maker and Buick applications I mentioned;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://vterrain.org/"&gt;Virtual Terrain Project&lt;/a&gt;;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.generative-modeling.org/GenerativeModeling/Gml.html"&gt;Generative Modeling Language&lt;/a&gt;;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://grape.swap-zt.com/"&gt;GRAPE&lt;/a&gt; — a parametric shape grammar interpreter;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.academia.edu/4722788/From_topologies_to_shapes_parametric_shape_grammars_implemented_by_graphs"&gt;From topologies to shapes: parametric shape grammars implemented by graphs&lt;/a&gt;;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://uwspace.uwaterloo.ca/handle/10012/4935"&gt;The aesthetics of science fiction spaceship design&lt;/a&gt; — be sure to
check out the Death Star trench modeling section;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://andrew.li/publications.html"&gt;Andrew Li's publications&lt;/a&gt; — including some interesting analysis of
historical Chinese building standards with shape grammars;&lt;/li&gt;
&lt;li&gt;Garment-modeling papers that look interesting but which I haven't read:
&lt;ul class="org-ul"&gt;
&lt;li&gt;&lt;a href="http://dl.acm.org/citation.cfm?id%3D1994424"&gt;Context-aware garment modeling from sketches&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://dl.acm.org/citation.cfm?id%3D2461975"&gt;Parsing sewing patterns into 3D garments&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.sciencedirect.com/science/article/pii/S0010448513002418"&gt;Modeling 3D garments by examples&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.worldscientific.com/doi/abs/10.1142/S0219467813500216"&gt;A Survey on 3D Human Body Modeling for Interactive Fashion Design&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org3adb867" class="outline-2"&gt;
&lt;h2 id="org3adb867"&gt;&lt;span class="section-number-2"&gt;5.&lt;/span&gt; Demos&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-5"&gt;
&lt;p&gt;
The 64k demo we watched at the beginning was &lt;a href="https://www.youtube.com/watch?v%3DmZdlSWLqumw"&gt;The Timeless&lt;/a&gt; (1st place,
pc 64k, Revision 2014), by Mercury.  As I mentioned, given what looks
like shape grammar techniques, I suspect the title is a reference to
&lt;a href="http://en.wikipedia.org/wiki/The_Timeless_Way_of_Building"&gt;The Timeless Way of Building&lt;/a&gt; by Christopher Alexander.
&lt;/p&gt;

&lt;p&gt;
I think one of the big inspirations for procedural content generation
at this scale was &lt;a href="https://www.youtube.com/watch?v%3DY3n3c_8Nn2Y"&gt;.the .product&lt;/a&gt; (1st place, pc 64k, The Party 2000),
another 64k demo by one of my favorite demo groups, &lt;a href="http://www.farbrausch.com/"&gt;farbrausch&lt;/a&gt;.  The
architecture generated is less sophisticated than in the Mercury demo,
but given that it was released fourteen years ago, I think it could be
considered more impressive.
&lt;/p&gt;

&lt;p&gt;
I meant to talk more about the tools they released, but you can find
the source code &lt;a href="https://github.com/farbrausch/fr_public"&gt;in their GitHub repo&lt;/a&gt;.  Among other things, this
includes their 96 kilobyte FPS game.  There are a couple of talks on
YouTube by members of farbrausch talking about some of their
techniques that are worth checking out, especially if you're
interested in texture generation.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org13a66ae" class="outline-2"&gt;
&lt;h2 id="org13a66ae"&gt;&lt;span class="section-number-2"&gt;6.&lt;/span&gt; The next paper&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-6"&gt;
&lt;p&gt;
We decided at the meeting to skip December; I'll be out of town, and
late December is a busy time for everyone.  This gives us a bit of
time to plan the January meetup, so I urge anyone in the group with an
interest in presenting a paper to contact me about presenting in
January.
&lt;/p&gt;

&lt;p&gt;
Alternatively, if there's a topic you'd like to see discussed but don't
feel comfortable presenting yourself, why not ask?
&lt;/p&gt;

&lt;p&gt;
Here are some papers and topics I'd love to see presented:
&lt;/p&gt;
&lt;ul class="org-ul"&gt;
&lt;li&gt;&lt;a href="http://research.microsoft.com/pubs/74339/dwork_tamc.pdf"&gt;Differential Privacy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://mcts.ai/"&gt;Monte Carlo Tree Search&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://c2.com/cgi/wiki?LinearTypes"&gt;linear types&lt;/a&gt; and all kinds of resource-bound guarantees
(e.g. &lt;a href="http://dl.acm.org/citation.cfm?id%3D321846"&gt;Robson's proof&lt;/a&gt; that &lt;a href="https://www.sqlite.org/malloc.html"&gt;SQLite uses&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;a href="http://users.ece.cmu.edu/~omutlu/pub/dram-row-hammer_isca14.pdf"&gt;Flipping Bits in Memory Without Accessing Them&lt;/a&gt; (chilling!)&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.linozemtseva.com/research/2014/icse/coverage/coverage_paper.pdf"&gt;Coverage is not strongly correlated with test suite effectiveness&lt;/a&gt;
(provocative!)&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.cis.upenn.edu/~stevez/papers/LZ06b.pdf"&gt;A Language-based Approach to Unifying Events and Threads&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;
There's also a poll, &lt;a href="http://www.meetup.com/Papers-We-Love-Montreal/polls/1158482/"&gt;here on the meetup website&lt;/a&gt;, but I can appreciate
that no one has voted on it yet, since meetup's website forces me to
make it illegibly dense.  Also, if you need help tracking down a copy
of a paper, just email me.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
</content></entry><entry><title>ILC2014 summary</title><link href='http://cipht.net/2014/08/19/ilc2014.html'/><updated>2014-08-19T02:30:00+0000</updated><id>http://cipht.net/2014/08/19/ilc2014</id><content type='html'>&lt;p&gt;
When I heard that the International Lisp Conference (ILC) was
happening in Montreal this year, I got very excited.  Actually, I
started making ambitious plans for a paper to submit and a talk to
give, against which the rest of my life conspired, but I certainly
registered as soon as registration opened.  I've always wanted to go
to ILC (and ECLM and so on), but having it happen in the city in which
I live meant I had to go.
&lt;/p&gt;

&lt;p&gt;
I've noticed some people on &lt;code&gt;#lisp&lt;/code&gt; asking about the talks, so I'll try
to give a brief summary of each talk, to the extent that I remember.
If you have any questions, drop me a line via email, or on &lt;code&gt;#lisp&lt;/code&gt; on
&lt;a href="http://freenode.net/"&gt;Freenode IRC&lt;/a&gt; (my nick is &lt;code&gt;tokenrove&lt;/code&gt;; attending ILC encouraged me to idle more).
&lt;/p&gt;

&lt;p&gt;
Generally, the talks were much less technical than I had anticipated,
and aimed at a very broad audience.  Almost every talk seemed to run
late, so I refrained from asking questions, interrogating the speakers
in person during the breaks instead.
&lt;/p&gt;

&lt;p&gt;
There were about 70 people, overwhelmingly male and white.  There were
perhaps two or three women.  This was a bit of a shock for me, since
other technical events I've been around lately have had much less
homogeneous demographics.
&lt;/p&gt;

&lt;p&gt;
In spite of my grumblings throughout this post, I have to say that the
conference was overall very well organized, and my thanks go out to
all the organizers.
&lt;/p&gt;


&lt;div id="orgba9d51d" class="figure"&gt;
&lt;p&gt;&lt;img src="http://cipht.net/images/2014-08-19-ilc2014.jpg" alt="[ILC2014 proceedings, badge, and picobot]" /&gt;
&lt;/p&gt;
&lt;/div&gt;

&lt;div id="table-of-contents" role="doc-toc"&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;div id="text-table-of-contents" role="doc-toc"&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#org96471f1"&gt;1. Friday&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#org70b477f"&gt;1.1. Tutorial 1: Multiplatform and Mobile App Development in Scheme with Gambit/SchemeSpheres&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#org6406809"&gt;1.2. Tutorial 2: A Gentle Introduction to Gendl, a Common Lisp-based Knowledge Based Engineering Environment&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#org45aff40"&gt;1.3. Cocktail&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#orga6d9fe2"&gt;2. Saturday&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#org4f30039"&gt;2.1. What a SOOC!&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#org509a88c"&gt;2.2. Kilns: A Lisp Without Lambda&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#org0d97536"&gt;2.3. Using Common Lisp as a Scripting Language&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#orga337c0e"&gt;2.4. Common Lisp's Predilection for Mathematical Programming&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#org4cca22c"&gt;2.5. Typed Clojure&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#org7d3744b"&gt;2.6. Hygienic Macro System for JavaScript and Its Light-weight Implementation Framework&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#org7fcc205"&gt;2.7. An Array and List Processing System&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#org610a220"&gt;2.8. Reaching Python from Racket&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#org712091f"&gt;2.9. Lightning talks 1&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#org34cdac8"&gt;3. Sunday&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#orgd87a591"&gt;3.1. Emacs Lisp on the Move&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#orgef6ee3c"&gt;3.2. A Scheme-based Closed-Loop Anaesthesia System&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#orgee7e343"&gt;3.3. Leadership Trait Analysis and Threat Assessment with Profiler Plus&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#org0348228"&gt;3.4. Efficient Finite Permutation Groups and Homomesy Computation in Common Lisp&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#org62f3caa"&gt;3.5. CL-FFF: A Common Lisp Full Stack Framework for Web Apps&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#org7fe714a"&gt;3.6. SICL spinoffs: Generic Dispatch, Garbage Collection, and CLOS Bootstrapping&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#orgc7f45d5"&gt;3.7. A Transformation Based Approach to Semantics-Directed Code Generation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#orgf2945ee"&gt;3.8. Lightning talks 2&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#org5136fb0"&gt;3.9. Panel: "The Next Move for Lisp"&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#org847988a"&gt;4. Summary: Rekindling the flame&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org96471f1" class="outline-2"&gt;
&lt;h2 id="org96471f1"&gt;&lt;span class="section-number-2"&gt;1.&lt;/span&gt; Friday&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-1"&gt;
&lt;p&gt;
Luckily I was already familiar with the Université de Montréal campus,
otherwise I surely would not have found the venue in time for the
first talk.  Later, there were posters up in a few places, which
helped a bit, but initially there was no indication where this was
happening.  I heard from at least one other person who did miss the
first talk on account of this.
&lt;/p&gt;

&lt;p&gt;
Oh, and of course UdeM is built on quite a steep hill, so I imagine
most people arrived a bit sweaty.  It was fortunate we weren't having
the kind of hot, sticky weather that often happens here this time of
year.
&lt;/p&gt;
&lt;/div&gt;

&lt;div id="outline-container-org70b477f" class="outline-3"&gt;
&lt;h3 id="org70b477f"&gt;&lt;span class="section-number-3"&gt;1.1.&lt;/span&gt; Tutorial 1: Multiplatform and Mobile App Development in Scheme with Gambit/SchemeSpheres&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-1-1"&gt;
&lt;p&gt;
&lt;i&gt;Speaker: Álvaro Castro-Castilla.&lt;/i&gt;
&lt;/p&gt;

&lt;p&gt;
The first talk was about an application framework called
&lt;a href="http://www.schemespheres.org/"&gt;SchemeSpheres&lt;/a&gt;, which provides, among other things, modules,
conditional compilation, and a unified build system.
&lt;/p&gt;

&lt;p&gt;
It was nice seeing a focus on application delivery (an area where
Scheme traditionally trounces CL).  The demonstration was not terribly
convincing, but the software itself seemed promising.
&lt;/p&gt;

&lt;p&gt;
The speaker spent considerable time discussing the provision of
features which CL already provides.  This was the first instance of a
recurring theme at the conference, one that I think could have been
discussed in the final panel: how do we increase awareness of the
powerful facilities provided by CL and its ecosystem?
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org6406809" class="outline-3"&gt;
&lt;h3 id="org6406809"&gt;&lt;span class="section-number-3"&gt;1.2.&lt;/span&gt; Tutorial 2: A Gentle Introduction to Gendl, a Common Lisp-based Knowledge Based Engineering Environment&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-1-2"&gt;
&lt;p&gt;
&lt;i&gt;Speaker: Dave Cooper.&lt;/i&gt;
&lt;/p&gt;

&lt;p&gt;
This was basically a walkthrough of some examples and exercises with
&lt;a href="http://gendl.com/"&gt;http://gendl.com/&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;
It was during this tutorial that the audience was polled for CL users,
and the vast majority of the attendees raised their hands.
&lt;/p&gt;

&lt;p&gt;
I wanted to ask some questions about units of measure (a pet bugaboo
of mine at work lately) but I think we were already running late.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org45aff40" class="outline-3"&gt;
&lt;h3 id="org45aff40"&gt;&lt;span class="section-number-3"&gt;1.3.&lt;/span&gt; Cocktail&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-1-3"&gt;
&lt;p&gt;
The social aspects of the conference were actually the most valuable
for me.  Hearing about what people were working on, getting to put
faces to names, and meeting new people all made the conference worth
it.
&lt;/p&gt;

&lt;p&gt;
Oh, and the catering was great.  That always helps.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-orga6d9fe2" class="outline-2"&gt;
&lt;h2 id="orga6d9fe2"&gt;&lt;span class="section-number-2"&gt;2.&lt;/span&gt; Saturday&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-2"&gt;
&lt;/div&gt;
&lt;div id="outline-container-org4f30039" class="outline-3"&gt;
&lt;h3 id="org4f30039"&gt;&lt;span class="section-number-3"&gt;2.1.&lt;/span&gt; What a SOOC!&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-2-1"&gt;
&lt;p&gt;
&lt;i&gt;Speaker: Christian Queinnec.&lt;/i&gt;
&lt;/p&gt;

&lt;p&gt;
The legendary Christian Queinnec discussed a non-massive MOOC that he
ran (&lt;a href="https://programmation-recursive-1.appspot.com/course"&gt;https://programmation-recursive-1.appspot.com/course&lt;/a&gt;), including
copious statistics about participation, and the mechanics of running
the course.
&lt;/p&gt;

&lt;p&gt;
For me, the most interesting aspect was his discussion of automated
grading.  All students were required to test their code, and submit
their test suite along with their implementation.  The grader then
checked not only whether the student's implementation satisfied its
own test suite, but also other combinations, as follows (v being a
test suite and f an implementation, with subscripts s and t for
student and teacher respectively):
&lt;/p&gt;

&lt;table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides"&gt;


&lt;colgroup&gt;
&lt;col  class="org-left" /&gt;

&lt;col  class="org-left" /&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class="org-left"&gt;coherence&lt;/td&gt;
&lt;td class="org-left"&gt;check v&lt;sub&gt;s&lt;/sub&gt;(f&lt;sub&gt;s&lt;/sub&gt;) and v&lt;sub&gt;s&lt;/sub&gt;(f&lt;sub&gt;t&lt;/sub&gt;)&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;correctness&lt;/td&gt;
&lt;td class="org-left"&gt;check v&lt;sub&gt;t&lt;/sub&gt;(f&lt;sub&gt;s&lt;/sub&gt;)&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="org-left"&gt;coverage&lt;/td&gt;
&lt;td class="org-left"&gt;compare v&lt;sub&gt;s&lt;/sub&gt;(f&lt;sub&gt;s&lt;/sub&gt;) and v&lt;sub&gt;t&lt;/sub&gt;(f&lt;sub&gt;s&lt;/sub&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
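
&lt;p&gt;
To make the three checks concrete, here is a minimal sketch in Python.
It is my own reconstruction, not the grader from the course: the
function names, the representation of a test as a callable that takes
an implementation and returns true or false, and in particular the
reading of "compare" as "the two suites agree on the student's code"
are all assumptions.
&lt;/p&gt;

&lt;pre class="example"&gt;
```python
def run_suite(tests, impl):
    """Return one pass/fail result per test for the given implementation."""
    return [bool(test(impl)) for test in tests]


def grade(student_tests, student_impl, teacher_tests, teacher_impl):
    """Hypothetical grader computing the three checks from the table."""
    vs_fs = run_suite(student_tests, student_impl)
    vs_ft = run_suite(student_tests, teacher_impl)
    vt_fs = run_suite(teacher_tests, student_impl)
    return {
        # coherence: the student's tests accept both implementations
        "coherence": all(vs_fs) and all(vs_ft),
        # correctness: the student's code passes the teacher's tests
        "correctness": all(vt_fs),
        # coverage (assumed reading): both suites reach the same verdict
        # on the student's code
        "coverage": all(vs_fs) == all(vt_fs),
    }


def teacher_fact(n):
    return 1 if n == 0 else n * teacher_fact(n - 1)


def student_fact(n):
    # Hypothetical buggy submission: returns 0 for an input of 0.
    return n if n in (0, 1) else n * student_fact(n - 1)


# The student's suite never exercises the failing input.
student_tests = [lambda f: f(1) == 1, lambda f: f(3) == 6]
teacher_tests = [lambda f: f(0) == 1, lambda f: f(3) == 6]

report = grade(student_tests, student_fact, teacher_tests, teacher_fact)
```
&lt;/pre&gt;

&lt;p&gt;
In this made-up example the student's tests are coherent, but the
implementation is incorrect and the test suite misses the case that
exposes the bug, so the coverage check fails as well.
&lt;/p&gt;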

&lt;p&gt;
Another interesting concept he mentioned was "epsilon-peeping";
students with incorrect solutions could be shown slightly less
incorrect solutions from other students, to guide them towards
correctness, and give them experience reading code.
&lt;/p&gt;

&lt;p&gt;
I'm not sure if the talk is available in English anywhere, but the
slides for a version in French appear &lt;a href="http://www.societe-informatique-de-france.fr/wp-content/uploads/2014/05/2014-06-j-info-mooc-c-quiennec.pdf"&gt;to be available here&lt;/a&gt;.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org509a88c" class="outline-3"&gt;
&lt;h3 id="org509a88c"&gt;&lt;span class="section-number-3"&gt;2.2.&lt;/span&gt; Kilns: A Lisp Without Lambda&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-2-2"&gt;
&lt;p&gt;
&lt;i&gt;Speaker: Greg Pfeil.&lt;/i&gt;
&lt;/p&gt;

&lt;p&gt;
The first talk where I had to think.  Greg Pfeil presented &lt;a href="https://github.com/sellout/Kilns"&gt;a language
he's been working on called Kilns&lt;/a&gt;, based on &lt;a href="http://sardes.inrialpes.fr/kells/"&gt;the kell calculus&lt;/a&gt;.  There
were two interesting aspects of this talk; the first was the idea of
modelling locality in the language, and where that could go.  I
wondered whether locality could be extended "inward" instead of out &amp;#x2013;
like a memory-hierarchy conscious language like &lt;a href="http://sequoia.stanford.edu/"&gt;Sequoia&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;
The second aspect was a bigger goal he described, involving a
combination of educational reform and layers of languages to empower
everyday programming.  This reminded me a bit of some of the ideas
coming out of &lt;a href="http://vpri.org/"&gt;VPRI&lt;/a&gt;.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org0d97536" class="outline-3"&gt;
&lt;h3 id="org0d97536"&gt;&lt;span class="section-number-3"&gt;2.3.&lt;/span&gt; Using Common Lisp as a Scripting Language&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-2-3"&gt;
&lt;p&gt;
&lt;i&gt;Speaker: François-René Rideau.&lt;/i&gt;
&lt;/p&gt;

&lt;p&gt;
Fare talked about the very practical aspects he has driven forward
with his work on &lt;a href="http://cliki.net/cl-launch"&gt;cl-launch&lt;/a&gt; and newer versions of &lt;a href="http://common-lisp.net/project/asdf/"&gt;asdf&lt;/a&gt;.  The situation
for CL scripting sounds much nicer than it was even a few years ago.
&lt;/p&gt;

&lt;p&gt;
There was a question about startup time from Marc Feeley, and Fare
indicated that scanning for ASDF systems was the bulk of the startup
time, and that it was easily eliminated by compiling the script,
although he felt it was mostly negligible anyway.
&lt;/p&gt;

&lt;p&gt;
You can find more of the details here:
  &lt;a href="https://github.com/fare/asdf3-2013"&gt;https://github.com/fare/asdf3-2013&lt;/a&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-orga337c0e" class="outline-3"&gt;
&lt;h3 id="orga337c0e"&gt;&lt;span class="section-number-3"&gt;2.4.&lt;/span&gt; Common Lisp's Predilection for Mathematical Programming&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-2-4"&gt;
&lt;p&gt;
&lt;i&gt;Speaker: Robert Smith.&lt;/i&gt;
&lt;/p&gt;

&lt;p&gt;
An interesting talk espousing the virtues of CL as a language for
numerical computation and experimental mathematics.  Robert Smith
showed off some code, which I understand can be obtained from &lt;a href="https://bitbucket.org/tarballs_are_good"&gt;his
bitbucket&lt;/a&gt;.  He demonstrated how CL provided very effective tools for
succinctly expressing the constructs he needed to perform this work.
It was nice to see someone fire up Emacs and show off a bunch of
macros.
&lt;/p&gt;

&lt;p&gt;
I asked him where he felt CL implementations should go in the future
to better serve mathematicians, and he pointed me to his blog entry,
&lt;a href="http://symbo1ics.com/blog/?p%3D2316"&gt;Things I Want in Common Lisp&lt;/a&gt;.  It brings up something which was
another theme: everyone seems to want CL's standard library to be more
generic.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;


&lt;div id="outline-container-org4cca22c" class="outline-3"&gt;
&lt;h3 id="org4cca22c"&gt;&lt;span class="section-number-3"&gt;2.5.&lt;/span&gt; Typed Clojure&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-2-5"&gt;
&lt;p&gt;
&lt;i&gt;Speaker: Ambrose Bonnaire-Sergeant.&lt;/i&gt;
&lt;/p&gt;

&lt;p&gt;
I understand that the organizers put a lot of effort into trying to
attract members of the Clojure community to ILC this year, but they
don't seem to have been successful.  Bonnaire-Sergeant was effectively
the only (vocal) Clojure user I met at the conference.
&lt;/p&gt;

&lt;p&gt;
He gave a talk on his work on Typed Clojure, which is directly
inspired by the work of Typed Racket.  It was a nice overview, but
again I found myself saying to myself, "don't people know that SBCL
already does a pretty good job at this?  Don't people know that CL has
had optional typing forever, and that declarations are great?" (Not to
mention that the potential utility of declarations extends beyond just
type annotations.)
&lt;/p&gt;

&lt;p&gt;
It was originally supposed to be a discussion specifically of the
interlanguage interoperability aspects of Typed Clojure, but
Bonnaire-Sergeant decided to just describe the Typed Clojure system,
given the number of attendees unfamiliar with the work.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org7d3744b" class="outline-3"&gt;
&lt;h3 id="org7d3744b"&gt;&lt;span class="section-number-3"&gt;2.6.&lt;/span&gt; Hygienic Macro System for JavaScript and Its Light-weight Implementation Framework&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-2-6"&gt;
&lt;p&gt;
&lt;i&gt;Speaker: Ken Wakita.&lt;/i&gt;
&lt;/p&gt;

&lt;p&gt;
I wasn't too interested in this talk from the abstract, but it ended
up being a great presentation.  Ken Wakita presented ExJS, which I
gathered was a simple, elegant implementation of macros for JavaScript
that actually had a very palatable syntax.  Wakita was very clear
about the existing work, how their work differed, and the kinds of
problems they solved.  One interesting thing about it is that they
convert the original JavaScript to s-expressions and use an existing
Scheme implementation for macro-expansion.
&lt;/p&gt;

&lt;p&gt;
I didn't get a URL for this, and a few seconds of trivial googling
didn't help much, but &lt;a href="https://github.com/homizu/js-macro"&gt;https://github.com/homizu/js-macro&lt;/a&gt; might be the
GitHub project for ExJS.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org7fcc205" class="outline-3"&gt;
&lt;h3 id="org7fcc205"&gt;&lt;span class="section-number-3"&gt;2.7.&lt;/span&gt; An Array and List Processing System&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-2-7"&gt;
&lt;p&gt;
&lt;i&gt;Speaker: Dave Penkler.&lt;/i&gt;
&lt;/p&gt;

&lt;p&gt;
This was the talk I most anticipated, because it's very closely
related to work I've been doing that I had hoped to present.
&lt;/p&gt;

&lt;p&gt;
Penkler presented ALPS, a fascinating rapid prototyping environment
derived from Lisp and APL, including support for interactive graphics,
audio, a task scheduler, and probably countless other things.
&lt;/p&gt;

&lt;p&gt;
This seemed to be the epitome of the personal programming environment,
as unsuited to the development of stifling "enterprise" software as it
is suited to maximally amplifying the output of a single programmer
who knows the system intimately, like a well-worn favorite musical
instrument.
&lt;/p&gt;

&lt;p&gt;
He went into detail as to why he felt LISP 1.5 and APL/360 were
excellent models for this kind of system, as opposed to their more
modern descendants.
&lt;/p&gt;

&lt;p&gt;
He demoed the obligatory Conway's Game of Life.  One curiosity this
revealed is that ALPS supports both APL's concept of booleans (1 and
0) and Lisp's (t and nil), which seems a little confusing.
&lt;/p&gt;

&lt;p&gt;
ALPS does not have a type system, per se, and the set of types was
intentionally kept quite simple.  The sole numeric datatype is IEEE
754 double floats, not unlike JavaScript.  The read syntax for arrays
simply uses square brackets to indicate a vector, which can then be
shaped multidimensionally with p (an ersatz ρ) if required.
&lt;/p&gt;
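
&lt;p&gt;
For readers who haven't met APL's reshape (ρ), here is a rough Python
sketch of the operation.  This is not ALPS syntax, and the cyclic
reuse of data when the vector is too short is APL's behavior, which I
am only assuming carries over to ALPS.  The &lt;code&gt;reshape&lt;/code&gt; helper is
hypothetical and expects a non-empty vector and at least one positive
dimension.
&lt;/p&gt;

&lt;pre class="example"&gt;
```python
from itertools import cycle, islice


def reshape(shape, data):
    """Arrange a flat vector into nested lists of the given shape, APL-style."""
    total = 1
    for extent in shape:
        total *= extent
    # APL's reshape reuses the data cyclically when it is too short,
    # and drops the excess when it is too long.
    flat = list(islice(cycle(data), total))

    def build(dims, items):
        if len(dims) == 1:
            return items
        step = len(items) // dims[0]
        return [build(dims[1:], items[i * step:(i + 1) * step])
                for i in range(dims[0])]

    return build(list(shape), flat)


matrix = reshape([2, 3], [1, 2, 3, 4, 5, 6])
recycled = reshape([2, 2], [1, 2, 3])
```
&lt;/pre&gt;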

&lt;p&gt;
GC is a simple mark-and-sweep approach, but with separate spaces for
conses and vectors.  I asked if he was doing any reference counting
tricks (since you can often overwrite arrays in-place if you know
there are no other references) but he indicated that always copying
was fast enough.
&lt;/p&gt;

&lt;p&gt;
ALPS has been ported across many machines and architectures; he even
has it running on his phone!  Penkler indicated that, at this point,
having been so widely ported, ALPS' own support library is so
all-encompassing that it could be run on the bare metal without any
real OS support.  It was originally implemented in Pascal, and later
ported to C.
&lt;/p&gt;

&lt;p&gt;
He indicated that he wasn't familiar with &lt;a href="http://www.cs.trinity.edu/~jhowland/aprol.paper.pdf"&gt;APROL&lt;/a&gt;, which is the main
attempt of which I'm aware to blend Lisp and APL.  Of course, there
are other shades of that: &lt;a href="http://series.sourceforge.net/"&gt;SERIES&lt;/a&gt; is inspired by APL, and &lt;a href="http://kx.com/"&gt;K&lt;/a&gt; is heavily
inspired by Scheme (but stays, culturally, on the APL side of the
fence).  There are also a number of more modern attempts, particularly
inspired by J and K, such as &lt;a href="http://static.livingcosmos.org/domains/org/metaperl/redick/doc/website/INSTALL.html"&gt;Redick&lt;/a&gt;, but that's a topic for a later
blog post.
&lt;/p&gt;

&lt;p&gt;
As far as I know, there is no public release of this system, but I
exchanged contact information with Dave and I will update this page if
there are any more details.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org610a220" class="outline-3"&gt;
&lt;h3 id="org610a220"&gt;&lt;span class="section-number-3"&gt;2.8.&lt;/span&gt; Reaching Python from Racket&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-2-8"&gt;
&lt;p&gt;
&lt;i&gt;Speaker: Pedro Ramos.&lt;/i&gt;
&lt;/p&gt;

&lt;p&gt;
A technical discussion of the difficulties of reaching Python
libraries which use C modules, and an approach the authors used to
solving the problem for Racket.  It seemed that the talk covered the
same ground as &lt;a href="http://dl.acm.org/citation.cfm?id%3D2635660"&gt;the paper&lt;/a&gt;, although it was nice seeing the demo
step-by-step.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org712091f" class="outline-3"&gt;
&lt;h3 id="org712091f"&gt;&lt;span class="section-number-3"&gt;2.9.&lt;/span&gt; Lightning talks 1&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-2-9"&gt;
&lt;p&gt;
My memory of the lightning talks is very fuzzy, so I apologize if I
have omitted your talk.
&lt;/p&gt;

&lt;p&gt;
There were a couple of interesting implementation lightning talks.
There was a talk on reducing the overhead of structures in Gambit
Scheme, and one on lazy compilation and code versioning.
&lt;/p&gt;

&lt;p&gt;
There were recruitment spiels from &lt;a href="http://www.esstech.com/"&gt;ESS Technology&lt;/a&gt; and &lt;a href="http://www.ravenpack.com/"&gt;RavenPack&lt;/a&gt;, which
I'm sure will also be posted on &lt;a href="http://lispjobs.wordpress.com/"&gt;Lispjobs&lt;/a&gt; if they haven't already been.
&lt;/p&gt;

&lt;p&gt;
Paul Tarvydas presented some interesting work on writing reader macros
that use PEG parsers to support fancier syntax, and some experiments
in flow-based programming.  This work is available at
&lt;a href="https://github.com/guitarvydas"&gt;https://github.com/guitarvydas&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;
Finally, Didier Verna gave a very entertaining introduction to his &lt;a href="https://www.lrde.epita.fr/~didier/software/lisp/misc.php#smilisp"&gt;:o(
Smilisp :-)&lt;/a&gt; dialect to finish off the day.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org34cdac8" class="outline-2"&gt;
&lt;h2 id="org34cdac8"&gt;&lt;span class="section-number-2"&gt;3.&lt;/span&gt; Sunday&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-3"&gt;
&lt;/div&gt;
&lt;div id="outline-container-orgd87a591" class="outline-3"&gt;
&lt;h3 id="orgd87a591"&gt;&lt;span class="section-number-3"&gt;3.1.&lt;/span&gt; Emacs Lisp on the Move&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-3-1"&gt;
&lt;p&gt;
&lt;i&gt;Speaker: Stefan Monnier.&lt;/i&gt;
&lt;/p&gt;

&lt;p&gt;
This was the only truly universal talk at the conference.  When asked
who &lt;span class="underline"&gt;didn't&lt;/span&gt; use Emacs in the crowd, there was only one hand raised
(and they were promptly lynched).
&lt;/p&gt;

&lt;p&gt;
Monnier talked a bit about where elisp came from, some rationale for
design decisions which seem painful now, and went into some depth on
the efforts to improve the language.  Particular advances he described
in detail included lexical binding and the new advice system.  He
indicated that there were many things which constrained the rate of
change, although that rate has been increasing in the last few years.
&lt;/p&gt;

&lt;p&gt;
There was some especially interesting discussion of language features
that had traditionally been slow, and how that affected the idioms and
usage.  Finally, there was speculation on the future, including the
current progress of running elisp in Guile's VM.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-orgef6ee3c" class="outline-3"&gt;
&lt;h3 id="orgef6ee3c"&gt;&lt;span class="section-number-3"&gt;3.2.&lt;/span&gt; A Scheme-based Closed-Loop Anaesthesia System&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-3-2"&gt;
&lt;p&gt;
&lt;i&gt;Speaker: Christian Petersen.&lt;/i&gt;
&lt;/p&gt;

&lt;p&gt;
Here was a real Lisp-in-the-trenches success story.  Christian
Petersen described a sophisticated medical application built with
Scheme, including application delivery across embedded, desktop and
mobile platforms.
&lt;/p&gt;

&lt;p&gt;
It was especially interesting to hear about their approach to safety
certification, and his emphasis that formal verification could never
entirely replace testing, since the application has to be delivered on
top of millions of lines of unverified code in the OS, anyway.
&lt;/p&gt;

&lt;p&gt;
People don't typically reach for a language like Scheme when building
safety-critical software like this, so this is a story it would be
nice to see spread to a wider audience.
&lt;/p&gt;

&lt;p&gt;
The framework or development system they built in the process of
building this and other systems is called &lt;a href="http://www.lambdanative.org/"&gt;LambdaNative&lt;/a&gt;, and is open
source.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-orgee7e343" class="outline-3"&gt;
&lt;h3 id="orgee7e343"&gt;&lt;span class="section-number-3"&gt;3.3.&lt;/span&gt; Leadership Trait Analysis and Threat Assessment with Profiler Plus&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-3-3"&gt;
&lt;p&gt;
&lt;i&gt;Speakers: Michael Young and Nick Levine.&lt;/i&gt;
&lt;/p&gt;

&lt;p&gt;
The original application mentioned in the title here, and in the
proceedings, was not presented, as the speakers felt a related
application, &lt;a href="http://www.thoughthelper.com/"&gt;thoughthelper.com&lt;/a&gt;, would demo better.
&lt;/p&gt;

&lt;p&gt;
About half the talk was actually an introduction to the concepts of
&lt;a href="https://en.wikipedia.org/wiki/Cognitive_behavioral_therapy"&gt;cognitive behavioral therapy&lt;/a&gt;, along with a demo of some of those
concepts in action in the aforementioned web application.  Then we got
to see a few of the details of the CLOS-based NLP framework underlying
it.  I always hate Lisp workflows that are as "clicky" as so much
LispWorks usage seems to be, but it presented nicely.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org0348228" class="outline-3"&gt;
&lt;h3 id="org0348228"&gt;&lt;span class="section-number-3"&gt;3.4.&lt;/span&gt; Efficient Finite Permutation Groups and Homomesy Computation in Common Lisp&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-3-4"&gt;
&lt;p&gt;
&lt;i&gt;Speaker: Robert Smith.&lt;/i&gt;
&lt;/p&gt;

&lt;p&gt;
Here was another nice, REPL-driven demo of some interesting code,
attached to an interesting mathematical result.  I was afraid this was
going to turn into an hour-long attempt to explain basic group theory,
or alternately an impenetrable presentation of hyperspecialized
results, but it trod a nice middle ground, and aside from some
arguments about the phrase "bit inversions", the audience seemed
appeased.
&lt;/p&gt;

&lt;p&gt;
The permutation group code appears to be available at
&lt;a href="https://bitbucket.org/tarballs_are_good/cl-permutation"&gt;https://bitbucket.org/tarballs_are_good/cl-permutation&lt;/a&gt;.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org62f3caa" class="outline-3"&gt;
&lt;h3 id="org62f3caa"&gt;&lt;span class="section-number-3"&gt;3.5.&lt;/span&gt; CL-FFF: A Common Lisp Full Stack Framework for Web Apps&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-3-5"&gt;
&lt;p&gt;
&lt;i&gt;Speaker: Marc Battyani.&lt;/i&gt;
&lt;/p&gt;

&lt;p&gt;
There were a lot of framework presentations at this ILC.  I've used a
lot of Marc Battyani's software in the past so I was eager to see this
presentation.  Unfortunately, I didn't find it very engaging (though
I'm sure cl-fff is a fine framework for web applications); most
interesting was Battyani's description of some of the commercial
applications he's built in CL.
&lt;/p&gt;

&lt;p&gt;
Here's a link to CL-FFF: &lt;a href="https://github.com/mbattyani/cl-fff"&gt;https://github.com/mbattyani/cl-fff&lt;/a&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org7fe714a" class="outline-3"&gt;
&lt;h3 id="org7fe714a"&gt;&lt;span class="section-number-3"&gt;3.6.&lt;/span&gt; SICL spinoffs: Generic Dispatch, Garbage Collection, and CLOS Bootstrapping&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-3-6"&gt;
&lt;p&gt;
&lt;i&gt;Speaker: Robert Strandh.&lt;/i&gt;
&lt;/p&gt;

&lt;p&gt;
This was, hands-down, my favorite talk at ILC.  I guess I was
expecting more talks of this nature, but this is what I came for.
Strandh presented all three of the papers from the proceedings in a
condensed form. (Two of those papers seem to be here:
&lt;a href="http://metamodular.com/generic-dispatch.pdf"&gt;http://metamodular.com/generic-dispatch.pdf&lt;/a&gt;,
&lt;a href="http://metamodular.com/sliding-gc.pdf"&gt;http://metamodular.com/sliding-gc.pdf&lt;/a&gt;)
&lt;/p&gt;

&lt;p&gt;
The generic dispatch optimization directly relates to something I'm
going to be blogging about soon; I actually got into a disagreement in
a job interview over this basic idea – that table-based dispatch, in
general, is now often outperformed by switch-like code (carefully
ordered integer comparison and branches), on modern hardware.
&lt;/p&gt;

&lt;p&gt;
One question that was raised during this portion was how this approach
compared with &lt;a href="https://en.wikipedia.org/wiki/Inline_caching"&gt;inline caches&lt;/a&gt;.  I wanted to write something about that
here but I'd better do some experiments first.
&lt;/p&gt;

&lt;p&gt;
The GC approach was very cool, although I heard some grumbling from a
few people about having done it before.  Hendrik Boom mentioned that
he had implemented a similar sliding GC in the '80s.  R. Matthew
Emerson mentioned that CCL has a nice mark&amp;amp;compact GC that no one has
gotten around to writing a paper about.
&lt;/p&gt;

&lt;p&gt;
The CLOS bootstrapping portion discussed a technique called satiation
to overcome metastability issues.
&lt;/p&gt;

&lt;p&gt;
Since I've been living under a rock, I wasn't really aware of SICL,
its goals, or its progress, but I find it terribly exciting,
especially SICL's IR, &lt;a href="http://metamodular.com/cleavir.pdf"&gt;Cleavir&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;
SICL, including the slides of Strandh's talk, can be found at
&lt;a href="https://github.com/robert-strandh/SICL"&gt;https://github.com/robert-strandh/SICL&lt;/a&gt;.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-orgc7f45d5" class="outline-3"&gt;
&lt;h3 id="orgc7f45d5"&gt;&lt;span class="section-number-3"&gt;3.7.&lt;/span&gt; A Transformation Based Approach to Semantics-Directed Code Generation&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-3-7"&gt;
&lt;p&gt;
&lt;i&gt;Speaker: Arthur Nunes-Harwitt.&lt;/i&gt;
&lt;/p&gt;

&lt;p&gt;
Starting from the principle of a closure as a kind of primitive code
generator, Nunes-Harwitt showed a series of relatively
straight-forward transformations to create a compiler out of an
interpreter.  He then compared the performance of the result with
Norvig's well-known Prolog from PAIP.  He stressed that this was a
manual technique, noting that in addition to the base transformations,
he had also swapped out unify with a fast union-find implementation.
&lt;/p&gt;

&lt;p&gt;
This was another condensation of multiple papers, and it was
relatively difficult subject matter after a long day, which may have
contributed to the paucity of questions asked.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-orgf2945ee" class="outline-3"&gt;
&lt;h3 id="orgf2945ee"&gt;&lt;span class="section-number-3"&gt;3.8.&lt;/span&gt; Lightning talks 2&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-3-8"&gt;
&lt;p&gt;
I guess one of the more unexpected presentations was that of (if I
recall the name on the slide correctly) Esposito Louis.  This was a
presentation by a 14-year-old of a Lisp dialect he built, including an
interactive graphical frontend.  It was difficult to understand
exactly what was being claimed, though, since the presentation
basically was a very rapid scroll through an interactive notebook that
seemed to be based on Maxima, punctuated with terse exclamations of
the form, "here I demonstrate mappings in the complex plane &amp;#x2026;".
&lt;/p&gt;

&lt;p&gt;
Greg Pfeil gave a brief talk on &lt;a href="https://github.com/sellout/quid-pro-quo"&gt;Quid Pro Quo&lt;/a&gt;, which I knew in its
former life as &lt;code&gt;dbc.lisp&lt;/code&gt; floating around on the net.  This was a
great and inspiring example of the kind of curation that needs to
happen in the CL community if we want to really solve the library
problem.
&lt;/p&gt;

&lt;p&gt;
There was a brief announcement from Dave Cooper about &lt;a href="http://cl-foundation.org/"&gt;the Common Lisp
Foundation&lt;/a&gt;, which promised more CL-specific meetings and other good
stuff.  Probably the biggest relief was hearing that the domain names,
like cliki.net, which have been bouncing around and renewed at the
personal expense of various individuals, would finally be taken care
of in a more responsible way.  He also gave us a preview of a
relaunched &lt;a href="http://common-lisp.net/"&gt;common-lisp.net&lt;/a&gt; which will probably be live by the time
this blog post is disseminated.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org5136fb0" class="outline-3"&gt;
&lt;h3 id="org5136fb0"&gt;&lt;span class="section-number-3"&gt;3.9.&lt;/span&gt; Panel: "The Next Move for Lisp"&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-3-9"&gt;
&lt;p&gt;
To conclude the conference, there was a panel about the future of
Lisp, comprising Fare, Christian Queinnec, Nick Levine, Dave Cooper,
and Greg Pfeil.  I found this discussion frustrating, and I was pretty
tired, so I apologize for the misrepresentations I am bound to make in
this section.
&lt;/p&gt;

&lt;p&gt;
The first topic for the panel was the Lisp community.  Queinnec
indicated that he didn't think there was such a thing as "the Lisp
community" divorced from a specific language or implementation, which
seems about right.  Since he was the only Schemer, though, the
ensuing discussion too often conflated Lisp-in-general and Common
Lisp.
&lt;/p&gt;

&lt;p&gt;
There was only one question in the following discussion that was
actually important: "where are the Clojure people?".  I don't think we
got a satisfactory answer to that question.
&lt;/p&gt;

&lt;p&gt;
I asked why the demographics of this conference were so skewed.  I was
tired by this point of the day, so my question was probably pretty
incoherent, unfortunately.  I was, however, disappointed that it was
completely ignored.
&lt;/p&gt;

&lt;p&gt;
Trying to stir up trouble, I also mentioned the Smug Lisp Weenie image
(which, real or imagined, is the biggest obstacle in the Lisp
community, in my opinion), but no one bit.  To me, one of the reasons
Clojure won big was by not being called Lisp, which allowed it to
escape a lot of the baggage associated with Lisp, especially the Smug
Lisp Weenie aspect.
&lt;/p&gt;

&lt;p&gt;
There was some discussion of the discoverability of Lisp; evidently
the lack of a canonical forum has been a difficulty for some.  There
was a shoutout to &lt;code&gt;#lisp&lt;/code&gt;, and an anti-shoutout to &lt;code&gt;comp.lang.lisp&lt;/code&gt;.
&lt;/p&gt;

&lt;p&gt;
The next two sections of discussion were on Lisp in innovation, and
practical directions for Lisp, if I recall correctly.  In any case,
the topics became blurrier as the crowd started to interject with
greater frequency.  The core of the innovation part was basically,
"why are all the PL researchers using Haskell/ML-family languages
instead of Lisp?".  The last section didn't have much coherency at
all, except for treading over the usual watchwords: "libraries",
"documentation", "curation", "community".
&lt;/p&gt;

&lt;p&gt;
Robert Strandh pointed out something very important, which was that CL
and Scheme are based on standards, in an age of languages defined by
de-facto canonical implementations, and that this could be a source of
strength if we paid attention to that advantage.
&lt;/p&gt;

&lt;p&gt;
Near the end, R. Matthew Emerson said what hopefully everyone was
thinking, which was that panels like this, filled with hand-wringing,
tend to be pretty depressing; that important solutions like &lt;a href="http://quicklisp.org/"&gt;Quicklisp&lt;/a&gt;
come out of people deciding to solve their own problems in Common
Lisp; and that the most important thing any of us can do is to just go
hack more Lisp, for which he received applause from all present.
&lt;/p&gt;

&lt;p&gt;
Finally, as a reward to those who stayed until the end, there was a
raffle for a Scheme-driven robot, which I won! (PICO-020, pictured
above)  Due to the overwhelming impressiveness of Esposito Louis's
talk, he was awarded a second robot.
&lt;/p&gt;

&lt;p&gt;
Marc Feeley promised to send me the whole toolchain.  Although he
hasn't yet, I found &lt;a href="http://www.ccs.neu.edu/home/stamourv/papers/picobit.pdf"&gt;an associated paper&lt;/a&gt; and &lt;a href="https://github.com/stamourv/picobit"&gt;repo&lt;/a&gt; online.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org847988a" class="outline-2"&gt;
&lt;h2 id="org847988a"&gt;&lt;span class="section-number-2"&gt;4.&lt;/span&gt; Summary: Rekindling the flame&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-4"&gt;
&lt;p&gt;
I noticed that several people had a similar story to my own: they'd
drifted away from the Lisp world over the last few years, and going to
ILC was a way to rekindle that flame.  That could have been a more
potent topic for the panel – CL went through a spike of interest in
the mid-2000s; where did those people go, and what lessons can the
community learn from that?
&lt;/p&gt;

&lt;p&gt;
In any case, ILC worked for me.  I came away from the conference eager
to return to the CL community, to better curate the libraries and
tools, and most importantly, to hack more Lisp.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
</content></entry><entry><title>Headless Testing of OpenGL Software</title><link href='http://cipht.net/2014/08/11/headless-testing-of-opengl-software.html'/><updated>2014-08-11T02:30:00+0000</updated><id>http://cipht.net/2014/08/11/headless-testing-of-opengl-software</id><content type='html'>&lt;p&gt;
I've been resuscitating an old game I wrote in C; at the same time,
I've been involved in a major refactoring effort for a client, which
resulted in me rereading Michael Feathers' excellent book, &lt;a href="http://www.amazon.ca/gp/product/0131177052/ref%3Das_li_qf_sp_asin_tl?ie%3DUTF8&amp;amp;camp%3D15121&amp;amp;creative%3D330641&amp;amp;creativeASIN%3D0131177052&amp;amp;linkCode%3Das2&amp;amp;tag%3Djulisqui-20"&gt;Working Effectively with Legacy Code&lt;/a&gt;.
Inspired by Feathers, I decided I would like to try to get 95%+ code
coverage before I made any major changes to the game.
&lt;/p&gt;

&lt;p&gt;
I'm planning to write more about testing and games, but today I wanted
to just announce one little aid to this process.
&lt;/p&gt;

&lt;p&gt;
There's a fair bit of OpenGL code involving shaders that needs to be
tested.  A great solution to this problem is using Mesa's software
renderer (OSMesa).  Although it has its own bugs, it does also help to
rule out driver-specific problems in the code, which is one of the big
nightmares of writing OpenGL code.  (Oh, and always set &lt;code&gt;MESA_DEBUG&lt;/code&gt; in
your environment when running your tests!)
&lt;/p&gt;

&lt;p&gt;
The problem is that I use GLEW to deal with setting up extensions
appropriately, and it doesn't play well with OSMesa.  Searching the
web, I see that a number of people have had this problem; indeed,
Chromium even has its own patched version of GLEW with OSMesa
support.  None of the solutions I found online worked well for me,
though, so I contributed my own 80% solution (&lt;a href="http://www.ccs.neu.edu/home/shivers/papers/sre.txt"&gt;sorry Olin&lt;/a&gt;) which can be
found on GitHub:
&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;
&lt;a href="https://github.com/tokenrove/glew/tree/headless-for-testing"&gt;https://github.com/tokenrove/glew/tree/headless-for-testing&lt;/a&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;
This adds a &lt;code&gt;linux-osmesa&lt;/code&gt; system definition that can be used to test
GLEW-using code with OSMesa.  The extensions used by my own code are
relatively conservative so I wouldn't be surprised if more modern code
does not work in this case, but hopefully this will help someone else
out there.
&lt;/p&gt;
</content></entry><entry><title>The boustrophedonic madness of space-filling curves: ICFPC 2012 postmortem</title><link href='http://cipht.net/2012/07/16/icfpc2012-postmortem.html'/><updated>2012-07-16T02:30:00+0000</updated><id>http://cipht.net/2012/07/16/icfpc2012-postmortem</id><content type='html'>&lt;p&gt;
&lt;a href="http://en.wikipedia.org/wiki/ICFP_Programming_Contest"&gt;The programming contest associated with&lt;/a&gt; &lt;a href="http://www.icfpconference.org/"&gt;the ICFP conference&lt;/a&gt; is, in my
mind, the most prestigious programming competition currently running.
The lack of restrictions compared to many competitions is an
indication of its difficulty: anyone can enter, on teams or alone;
almost any language is permissible; and the task changes several times
during the competition.
&lt;/p&gt;

&lt;p&gt;
Many years I have promised myself that I would compete, and many years
I did at most one day.  The morning of the second, the siren call of
one of my own back-burner projects would wax louder, and I would
wonder why I was solving someone else's problem.
&lt;/p&gt;

&lt;p&gt;
This year, I resolved to endure the weekend, no matter what happened.
My goal was to submit a solution, but that didn't happen, and here's
the story of why.
&lt;/p&gt;

&lt;div id="outline-container-orgef1ad96" class="outline-2"&gt;
&lt;h2 id="orgef1ad96"&gt;&lt;span class="section-number-2"&gt;1.&lt;/span&gt; The problem, and my problems&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-1"&gt;
&lt;p&gt;
&lt;a href="http://icfpcontest2012.wordpress.com/task/"&gt;The problem this year&lt;/a&gt; was basically &lt;a href="http://en.wikipedia.org/wiki/Boulderdash"&gt;Boulderdash&lt;/a&gt; without monsters, with
a cellular-automaton model.  Instead of diamonds, one collects lambdas.
After the first announcement, I fully expected either some kind of
cellular automata computation problem or the addition of monsters and
other players.  Unfortunately, neither expectation was correct.
&lt;/p&gt;

&lt;p&gt;
In turn, the organizers introduced flooding (lower parts of the board
gradually become hazardous), trampolines (effectively teleporters),
beards and razors (a kind of amoeba that fills its Moore neighborhood,
and a means of cutting beards), and higher-order rocks (lambdas hidden
inside boulders).
&lt;/p&gt;

&lt;p&gt;
I think my main problems this year were familiar ones for me in
general: too much research, and overengineering.  I often wonder
whether having teammates might have prevented some of these problems.
Maybe next year.
&lt;/p&gt;
&lt;/div&gt;

&lt;div id="outline-container-org83f063e" class="outline-3"&gt;
&lt;h3 id="org83f063e"&gt;&lt;span class="section-number-3"&gt;1.1.&lt;/span&gt; Too much R, not enough D&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-1-1"&gt;
&lt;p&gt;
According to my org-mode files, I put in 40 hours of work this
weekend, and at least 10 hours are attributed to pure research,
although I know that many of the hours clocked on developing the state
model were also research.  I read (or at least skimmed) over 40
papers.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org4a042c2" class="outline-3"&gt;
&lt;h3 id="org4a042c2"&gt;&lt;span class="section-number-3"&gt;1.2.&lt;/span&gt; Overengineering&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-1-2"&gt;
&lt;p&gt;
My solution involved a parent process that handled signals and
executed children, restarting them if they crashed or exited before
the time limit was reached (&lt;code&gt;SIGINT&lt;/code&gt; sent from a harness),
periodically reading the best solutions logged by the children.  It
involved Bloom filters, and representing state as a path on a
space-filling curve&lt;sup&gt;&lt;a id="fnr.1" class="footref" href="#fn.1" role="doc-backlink"&gt;1&lt;/a&gt;&lt;/sup&gt; to increase cache coherency.  It involved
&lt;code&gt;tcmalloc&lt;/code&gt;, and bit interleaving tricks.  I was constantly engineering
for the most extreme cases, and as a result, I never finished a
working lifter (solver).
&lt;/p&gt;

&lt;p&gt;
Once again, the agile null hypothesis stands: YAGNI, KISS, et cetera.
Ignore this at your peril.
&lt;/p&gt;


&lt;div id="org018a095" class="figure"&gt;
&lt;p&gt;&lt;img src="http://cipht.net/images/2012-07-16-icfpc-scrawls.jpg" alt="[ICFPC whiteboard scrawls]" /&gt;
&lt;/p&gt;
&lt;p&gt;&lt;span class="figure-number"&gt;Figure 1: &lt;/span&gt;Madness therein lies.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org135a97d" class="outline-2"&gt;
&lt;h2 id="org135a97d"&gt;&lt;span class="section-number-2"&gt;2.&lt;/span&gt; Chronology&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-2"&gt;
&lt;/div&gt;
&lt;div id="outline-container-orge9c72f3" class="outline-3"&gt;
&lt;h3 id="orge9c72f3"&gt;&lt;span class="section-number-3"&gt;2.1.&lt;/span&gt; Friday&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-2-1"&gt;
&lt;p&gt;
The announcement of the problem took me off guard, since I had
expected it to begin on Friday evening, and it started at 12:00 UTC.
&lt;/p&gt;

&lt;p&gt;
I wasn't sure how I was going to do search, but I decided from the
beginning that any good solution was going to need to be able to
efficiently compute the next state, and probably represent states
compactly.
&lt;/p&gt;

&lt;p&gt;
My initial implementation was in Common Lisp, using bignums as
bitplanes, with the goal being to do board update as a sequence of
whole-board boolean operations.  Looking back on the code, nothing is
particularly interesting, although the following snippet demonstrates
shifting the board in a given direction:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-lisp"&gt;&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;logand &lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;ash &lt;span style="color: #4C7A90;"&gt;(&lt;/span&gt;bits p&lt;span style="color: #4C7A90;"&gt;)&lt;/span&gt; &lt;span style="color: #4C7A90;"&gt;(&lt;/span&gt;&lt;span style="color: #13665F;"&gt;ecase&lt;/span&gt; dir &lt;span style="color: #787096;"&gt;(&lt;/span&gt;left -1&lt;span style="color: #787096;"&gt;)&lt;/span&gt; &lt;span style="color: #787096;"&gt;(&lt;/span&gt;right 1&lt;span style="color: #787096;"&gt;)&lt;/span&gt; &lt;span style="color: #787096;"&gt;(&lt;/span&gt;up &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;- w&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;&lt;span style="color: #787096;"&gt;)&lt;/span&gt; &lt;span style="color: #787096;"&gt;(&lt;/span&gt;down w&lt;span style="color: #787096;"&gt;)&lt;/span&gt;&lt;span style="color: #4C7A90;"&gt;)&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;
        &lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;ldb &lt;span style="color: #4C7A90;"&gt;(&lt;/span&gt;byte &lt;span style="color: #787096;"&gt;(&lt;/span&gt;* w h&lt;span style="color: #787096;"&gt;)&lt;/span&gt; 0&lt;span style="color: #4C7A90;"&gt;)&lt;/span&gt; -1&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
I spent a lot of time thinking about admissible heuristics.  A* and
its variants need a function \(f(n) = g(n) + h(n)\), where \(g(n)\)
represents the path cost (here, path benefit) to node \(n\), and \(h(n)\)
is an &lt;i&gt;admissible heuristic&lt;/i&gt; for the potential benefit from node \(n\)
onward to a goal state.&lt;sup&gt;&lt;a id="fnr.2" class="footref" href="#fn.2" role="doc-backlink"&gt;2&lt;/a&gt;&lt;/sup&gt;
&lt;/p&gt;

&lt;p&gt;
One of the problems with the specified task is that every position is
a goal state.  Your score at any state is \(c \cdot 25 \cdot \lambda -
m\), where \(\lambda\) is the number of lambdas collected, \(m\) is the
number of moves, and \[c = \left\{ \begin{array}{rl}
 1 &amp;\mbox{ if one is crushed or drowned} \\
 2 &amp;\mbox{ if one aborts} \\
 3 &amp;\mbox{ if one escapes on a lift}
 \end{array} \right.
\]
&lt;/p&gt;

&lt;p&gt;
You can abort at any time and leave with points, so no smart program
should ever be crushed or drowned.  To guarantee that, though, you
need a simulator that will actually alert you to the fatal move, so
that you can back up and abort immediately after collecting the
previous lambda (which also means that aborting right out of the gate
is superior to making moves that don't find a lambda).  The lift only
works if you've collected all lambdas, and some maps may be
unsolvable, so one cannot depend on reaching the lift as a goal state.
&lt;/p&gt;
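
&lt;p&gt;
As a rough illustration of the scoring rule above, here is a minimal
Python sketch.  The names &lt;code&gt;MULTIPLIER&lt;/code&gt; and &lt;code&gt;score&lt;/code&gt; are invented for this
example and do not come from the contest materials.  It shows why
moves that find no lambda only cost points, and why aborting just
before a fatal move beats dying with the same haul.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-python"&gt;# Outcome multiplier c from the formula above (names are hypothetical).
MULTIPLIER = {"dead": 1, "abort": 2, "lift": 3}

def score(lambdas, moves, outcome):
    """Score = c * 25 * lambdas - moves."""
    return MULTIPLIER[outcome] * 25 * lambdas - moves

print(score(0, 0, "abort"))   # 0: abort before moving at all
print(score(0, 5, "abort"))   # -5: five moves that found nothing
print(score(2, 12, "dead"))   # 38: crushed on the twelfth move
print(score(2, 11, "abort"))  # 89: same lambdas, aborting one move earlier
&lt;/pre&gt;
&lt;/div&gt;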

&lt;p&gt;
As for the admissibility of a heuristic, in our case, since we're
looking at points scored rather than cost (although, since we know how
many lambdas there are, we could express cost as difference from
\(75\cdot\lambda_\mbox{total}\)), we need a function that does not
under-estimate the potential value of a position.  The nice thing
about this idea is that you can merge several admissible heuristics by
finding their minimum.
&lt;/p&gt;
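
&lt;p&gt;
To make the merging step concrete, here is a small Python sketch.  The
two bounds and the state layout (a dict with &lt;code&gt;remaining&lt;/code&gt; lambdas and
&lt;code&gt;moves_left&lt;/code&gt;) are assumptions made up for illustration, and the sketch
assumes \(g(n)\) already credits each collected lambda at the full 75
points.  Only the idea of taking the minimum of optimistic bounds
comes from the discussion above.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-python"&gt;def combine(*bounds):
    """Merge optimistic (never under-estimating) bounds via their minimum."""
    def h(state):
        return min(bound(state) for bound in bounds)
    return h

def all_lambdas_bound(state):
    # Each remaining lambda is worth at most 75 points (collect, then lift).
    return 75 * state["remaining"]

def move_budget_bound(state):
    # Hypothetical: a move enters one cell, so at most one lambda per move.
    return 75 * state["moves_left"]

h = combine(all_lambdas_bound, move_budget_bound)
print(h({"remaining": 4, "moves_left": 2}))  # 150
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
The minimum of several upper bounds is still an upper bound, and it is
at least as tight as any single one of them.
&lt;/p&gt;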

&lt;p&gt;
This, however, presumes that you can find such a heuristic for the
problem at hand.  I thought about the classic &lt;a href="http://en.wikipedia.org/wiki/Sokoban"&gt;Sokoban&lt;/a&gt; heuristic
(&lt;a href="http://en.wikipedia.org/wiki/Manhattan_distance"&gt;Manhattan distance&lt;/a&gt; from blocks to goals)&lt;sup&gt;&lt;a id="fnr.3" class="footref" href="#fn.3" role="doc-backlink"&gt;3&lt;/a&gt;&lt;/sup&gt; and other tricks, but
nothing seemed very satisfying.  Playing the game manually (&lt;a href="http://icfp.stbuehler.de/icfp2012/"&gt;on Stefan
Bühler's awesome Javascript simulator&lt;/a&gt;) demonstrated I lacked "domain"
insight&amp;#x2026; how could I write a good heuristic?  If you look at the
&lt;a href="http://www.cs.princeton.edu/~appel/papers/rogomatic.html"&gt;original Rog-o-matic paper&lt;/a&gt;, they cited the use of domain knowledge
from human experts as key to rog-o-matic's success.  (Aside: I had
intended on creating a graphical interactive version of the simulator,
with alpha blended flooding indicators and danger indicators, but once
I saw Stefan's simulator I didn't even bother.)
&lt;/p&gt;

&lt;p&gt;
Had I been more familiar with recent game AI research, my difficulty
in finding an admissible heuristic would have tipped me off to an
alternate approach which I didn't discover until mid-afternoon Sunday.
&lt;/p&gt;

&lt;p&gt;
My notes show I spent most of my time thinking about state
representations, and only a few hours coding.  I implemented a harness
in perl that behaved as the competition harness would; the interesting
portion being as follows:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-perl"&gt;&lt;span style="color: #13665F;"&gt;my&lt;/span&gt; &lt;span style="color: #845A84;"&gt;$pid&lt;/span&gt; = open2&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;\*LIFTER_OUT, \*LIFTER_IN, $LIFTER&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt; &lt;span style="color: #E36B3F;"&gt;or&lt;/span&gt; &lt;span style="color: #13665F;"&gt;die&lt;/span&gt; $!;
&lt;span style="color: #13665F;"&gt;eval&lt;/span&gt; &lt;span style="color: #4d9391;"&gt;{&lt;/span&gt;
    &lt;span style="color: #13665F;"&gt;my&lt;/span&gt; &lt;span style="color: #845A84;"&gt;$gracious&lt;/span&gt; = 1;
    &lt;span style="color: #13665F;"&gt;local&lt;/span&gt; &lt;span style="color: #ff0000; background-color: #eeeed1; font-weight: bold; font-style: italic;"&gt;$SIG&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;{&lt;/span&gt;&lt;span style="color: #39854C;"&gt;ALRM&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;}&lt;/span&gt; = &lt;span style="color: #13665F;"&gt;sub&lt;/span&gt; &lt;span style="color: #a9779c;"&gt;{&lt;/span&gt;
        &lt;span style="color: #13665F;"&gt;if&lt;/span&gt;&lt;span style="color: #4C7A90;"&gt;(&lt;/span&gt;$gracious&lt;span style="color: #4C7A90;"&gt;)&lt;/span&gt; &lt;span style="color: #4C7A90;"&gt;{&lt;/span&gt;
            &lt;span style="color: #E36B3F;"&gt;kill&lt;/span&gt; &lt;span style="color: #39854C;"&gt;'INT'&lt;/span&gt;, $pid; $gracious = 0; &lt;span style="color: #E36B3F;"&gt;alarm&lt;/span&gt; 10;
        &lt;span style="color: #4C7A90;"&gt;}&lt;/span&gt; &lt;span style="color: #13665F;"&gt;else&lt;/span&gt; &lt;span style="color: #4C7A90;"&gt;{&lt;/span&gt;
            &lt;span style="color: #E36B3F;"&gt;kill&lt;/span&gt; &lt;span style="color: #39854C;"&gt;'KILL'&lt;/span&gt;, $pid; &lt;span style="color: #E36B3F;"&gt;alarm&lt;/span&gt; 0; &lt;span style="color: #13665F;"&gt;die&lt;/span&gt; &lt;span style="color: #39854C;"&gt;"Exceeded life expectancy.\n"&lt;/span&gt;;
        &lt;span style="color: #4C7A90;"&gt;}&lt;/span&gt;
    &lt;span style="color: #a9779c;"&gt;}&lt;/span&gt;;
    &lt;span style="color: #E36B3F;"&gt;alarm&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;$TIME_TO_LIVE&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;;       &lt;span style="color: #7c878a;"&gt;# &lt;/span&gt;&lt;span style="color: #7c878a;"&gt;150 in competition&lt;/span&gt;

    &lt;span style="color: #66cd00;"&gt;print&lt;/span&gt; LIFTER_IN $map; &lt;span style="color: #E36B3F;"&gt;close&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;LIFTER_IN&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;; &lt;span style="color: #13665F;"&gt;while&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;&amp;lt;LIFTER_OUT&amp;gt;&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt; &lt;span style="color: #a9779c;"&gt;{&lt;/span&gt; $route .= $_; &lt;span style="color: #a9779c;"&gt;}&lt;/span&gt;
    &lt;span style="color: #E36B3F;"&gt;waitpid&lt;/span&gt; $pid, 0;
    &lt;span style="color: #E36B3F;"&gt;alarm&lt;/span&gt; 0;
&lt;span style="color: #4d9391;"&gt;}&lt;/span&gt;;
&lt;span style="color: #13665F;"&gt;die&lt;/span&gt; $@ &lt;span style="color: #13665F;"&gt;if&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;$@&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
So it spawns the lifter, sets an alarm of 150 seconds, feeds it the
map, and then waits to read the route from it.  After the first
timeout, it sends &lt;code&gt;SIGINT&lt;/code&gt;.  The process gets 10 more seconds grace
before &lt;code&gt;SIGKILL&lt;/code&gt; is sent.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-orgea52cec" class="outline-3"&gt;
&lt;h3 id="orgea52cec"&gt;&lt;span class="section-number-3"&gt;2.2.&lt;/span&gt; Saturday&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-2-2"&gt;
&lt;p&gt;
I spent the morning writing a test suite (using TAP so I could call it
with &lt;code&gt;prove&lt;/code&gt;) that took maps annotated with routes tested by hand, and
compared the simulator's output to the results of the web validator.
This revealed numerous discrepancies between my model of rocks and the
web validator.
&lt;/p&gt;

&lt;p&gt;
The organizers added trampolines, and this prompted much thought about
graph structures, particularly the idea of walking the space from the
robot's initial position, keeping track of the connected components of
the graph and discarding anything else.  I wondered if I could use
some kind of complete heap storage approach so that the common case of
four adjacencies would be implicit in the packed array storage, and
still handle the exceptional case of trampolines (with four completely
different adjacencies).  At this point, I considered applying
homotopic compaction&lt;sup&gt;&lt;a id="fnr.4" class="footref" href="#fn.4" role="doc-backlink"&gt;4&lt;/a&gt;&lt;/sup&gt; to the empty space in levels to reduce
graph size.
&lt;/p&gt;
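
&lt;p&gt;
The reachability walk can be sketched in a few lines of Python.  This
is not the contest code; the representation (a set of passable (x, y)
cells plus a dict mapping each trampoline cell to its target cell) is
an assumption made for the example, and it ignores the fact that rocks
move.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-python"&gt;from collections import deque

def reachable(passable, trampolines, start):
    """Return the set of cells reachable from start.

    passable: set of (x, y) cells the robot may enter.
    trampolines: dict mapping a trampoline cell to its target cell.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        if (x, y) in trampolines:
            # The exceptional case: a trampoline's only edge is its target.
            neighbours = [trampolines[(x, y)]]
        else:
            # The common case: four implicit grid adjacencies.
            neighbours = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
        for cell in neighbours:
            if cell in passable and cell not in seen:
                seen.add(cell)
                queue.append(cell)
    return seen

# Two corridors joined only by a trampoline at (1, 0) leading to (5, 0).
cells = {(0, 0), (1, 0), (5, 0), (6, 0), (9, 9)}
print(sorted(reachable(cells, {(1, 0): (5, 0)}, (0, 0))))
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Anything outside the returned set, such as the isolated cell at (9, 9)
here, could be discarded before searching.
&lt;/p&gt;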

&lt;p&gt;
I spent a bit of time looking into &lt;a href="http://mathworld.wolfram.com/StaircaseWalk.html"&gt;monotonic paths&lt;/a&gt;, bumping into
&lt;a href="http://mathworld.wolfram.com/CatalanNumber.html"&gt;Catalan number&lt;/a&gt; a few times on the way, but to no useful end.  (Note
that a space-filling curve on a fixed grid as in our case is a kind of
&lt;a href="http://mathworld.wolfram.com/Self-AvoidingWalk.html"&gt;self-avoiding walk&lt;/a&gt;.)  Lots of interesting mathematics, but nothing
that was getting the code written any faster.
&lt;/p&gt;

&lt;p&gt;
I also did a lot of fruitless research into &lt;a href="http://en.wikipedia.org/wiki/Binary_decision_diagram"&gt;Binary Decision Diagrams&lt;/a&gt;
(described in &lt;sup&gt;&lt;a id="fnr.5" class="footref" href="#fn.5" role="doc-backlink"&gt;5&lt;/a&gt;&lt;/sup&gt; for example).  There's an attractive A* variant
called SetA*&lt;sup&gt;&lt;a id="fnr.6" class="footref" href="#fn.6" role="doc-backlink"&gt;6&lt;/a&gt;&lt;/sup&gt; based on using BDDs that seemed like a way to
prevent the massive state explosion I expected for this problem, but I
just don't understand BDDs well enough yet to implement something like
that in a weekend (definitely a project for the future, though).
(Another A*-alike I discarded was D*-lite, which is also pretty
cool.)
&lt;/p&gt;

&lt;p&gt;
That afternoon, I wrote a new simulator in &lt;a href="http://jsoftware.com"&gt;J&lt;/a&gt;, and though it wasn't
the most productive use of my time, it was the most fun I had all
weekend.  J really is a delightful language.  I just wish
&lt;a href="http://www.snakeisland.com/apexup.htm"&gt;there was a good compiler for it&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;
Here's all the rock update code (whole map at once), for example:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-j"&gt;&lt;span style="color: #845A84;"&gt;updaterocks&lt;/span&gt; &lt;span style="color: #000000;"&gt;=:&lt;/span&gt; 3 &lt;span style="color: #ff0000;"&gt;:&lt;/span&gt; 0
&lt;span style="color: #7c878a;"&gt;NB. &lt;/span&gt;&lt;span style="color: #7c878a;"&gt;XXX should probably calculate rocks once but i like these trains so much...&lt;/span&gt;
  &lt;span style="color: #845A84;"&gt;a&lt;/span&gt; &lt;span style="color: #000000;"&gt;=.&lt;/span&gt; (rocks &lt;span style="color: #ff0000;"&gt;*.&lt;/span&gt; (above &lt;span style="color: #0000ff;"&gt;@:&lt;/span&gt; empty)) y
  &lt;span style="color: #845A84;"&gt;b&lt;/span&gt; &lt;span style="color: #000000;"&gt;=.&lt;/span&gt; (rocks &lt;span style="color: #ff0000;"&gt;*.&lt;/span&gt; (above &lt;span style="color: #0000ff;"&gt;@:&lt;/span&gt; rocks) &lt;span style="color: #ff0000;"&gt;*.&lt;/span&gt; (leftAndUpLeft &lt;span style="color: #0000ff;"&gt;@:&lt;/span&gt; empty)) y
  &lt;span style="color: #845A84;"&gt;c&lt;/span&gt; &lt;span style="color: #000000;"&gt;=.&lt;/span&gt; (rocks &lt;span style="color: #ff0000;"&gt;*.&lt;/span&gt; (above &lt;span style="color: #0000ff;"&gt;@:&lt;/span&gt; rocks) &lt;span style="color: #ff0000;"&gt;*.&lt;/span&gt; (rightAndUpRight &lt;span style="color: #0000ff;"&gt;@:&lt;/span&gt; empty)) y
  &lt;span style="color: #845A84;"&gt;d&lt;/span&gt; &lt;span style="color: #000000;"&gt;=.&lt;/span&gt; (rocks &lt;span style="color: #ff0000;"&gt;*.&lt;/span&gt; (above &lt;span style="color: #0000ff;"&gt;@:&lt;/span&gt; lambdas) &lt;span style="color: #ff0000;"&gt;*.&lt;/span&gt; (leftAndUpLeft &lt;span style="color: #0000ff;"&gt;@:&lt;/span&gt; empty)) y
  &lt;span style="color: #845A84;"&gt;r&lt;/span&gt; &lt;span style="color: #000000;"&gt;=.&lt;/span&gt; rocks y
  (r &lt;span style="color: #ff0000;"&gt;*.&lt;/span&gt; (&lt;span style="color: #ff0000;"&gt;-.&lt;/span&gt; (a &lt;span style="color: #ff0000;"&gt;+.&lt;/span&gt; b &lt;span style="color: #ff0000;"&gt;+.&lt;/span&gt; c &lt;span style="color: #ff0000;"&gt;+.&lt;/span&gt; d))) &lt;span style="color: #ff0000;"&gt;+.&lt;/span&gt; (below r &lt;span style="color: #ff0000;"&gt;*.&lt;/span&gt; a) &lt;span style="color: #ff0000;"&gt;+.&lt;/span&gt; (right &lt;span style="color: #0000ff;"&gt;@:&lt;/span&gt; below r &lt;span style="color: #ff0000;"&gt;*.&lt;/span&gt; (b&lt;span style="color: #ff0000;"&gt;+.&lt;/span&gt;d)) &lt;span style="color: #ff0000;"&gt;+.&lt;/span&gt; (left &lt;span style="color: #0000ff;"&gt;@:&lt;/span&gt; below r &lt;span style="color: #ff0000;"&gt;*.&lt;/span&gt; c)
)
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
I figured I wasn't going to write a lifter in J, though, since the
data structures and recursion in an A* search like SMA* are hard (for
me) to reason about in J's "everything is an array" model.  There is
&lt;a href="http://www.jsoftware.com/jwiki/ProblemSolving"&gt;an example of A* search in J&lt;/a&gt; on the J software wiki that looks like a fairly
direct translation from AIMA&lt;sup&gt;&lt;a id="fnr.2.100" class="footref" href="#fn.2" role="doc-backlink"&gt;2&lt;/a&gt;&lt;/sup&gt;.  Any time I see the "explicit
verb only" control structures like &lt;code&gt;while.&lt;/code&gt;, it's a red flag that
we're straying outside J's domain.
&lt;/p&gt;

&lt;p&gt;
(In fact, the simulator is the first code in which I've ever used a
multi-statement &lt;code&gt;if.&lt;/code&gt; / &lt;code&gt;elseif.&lt;/code&gt; in J, and so I was bitten by the
bizarre misfeature that &lt;code&gt;else.&lt;/code&gt; cannot be combined with &lt;code&gt;elseif.&lt;/code&gt; in
J.)
&lt;/p&gt;

&lt;p&gt;
But it sure was fun to hack on that stuff in J, especially with the
array display from the REPL just doing the Right Thing as I played
with it interactively.  The tessellation operator in J is amazingly
expressive, too.
&lt;/p&gt;

&lt;p&gt;
My fun ended with the introduction of beards into the task.  Beards
grow into their &lt;a href="http://en.wikipedia.org/wiki/Moore_neighborhood"&gt;Moore neighborhood&lt;/a&gt; every so many turns.  Since there
was no position-independent rule to determine whether a beard or a
rock had priority, I was forced to write a simulator that updated the
board left-to-right, bottom-to-top, instead of all-at-once (which is,
to me, much more elegant, and easily parallelizable).  With that, I
gave up on ideas like lazily streaming states in a local area around
the robot to partially evaluate their merit.
&lt;/p&gt;

&lt;p&gt;
This was the nadir for me, where I realized I was basically back at
the beginning, having read dozens of papers but being no further along
for it.  I still didn't have a solver implemented, and I wasn't
confident that a basic A* approach would even work.  I had no good
heuristics, especially with the introduction of beards, which must be
shaved with razors that the robot can collect.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id="outline-container-org9b04fbb" class="outline-3"&gt;
&lt;h3 id="org9b04fbb"&gt;&lt;span class="section-number-3"&gt;2.3.&lt;/span&gt; Sunday&lt;/h3&gt;
&lt;div class="outline-text-3" id="text-2-3"&gt;
&lt;p&gt;
I did some toying around in J, Lisp, and &lt;a href="http://www.ats-lang.org"&gt;ATS&lt;/a&gt;, but nothing useful came
out of it.  The final task update came in: "higher-order rocks", which
are rocks that contain lambdas which only become available if the rock
is broken open by dropping it.  This inspired even more research and
less coding.
&lt;/p&gt;

&lt;p&gt;
I forget how I stumbled across it, but amidst all these papers, I came
across &lt;a href="http://www.cameronius.com/cv/mcts-survey-master.pdf"&gt;A Survey of Monte-Carlo Tree Search Methods&lt;/a&gt; by Browne et al.,
and it blew me away.  Here was the perfect strategy for this problem.
Indeed, a bit more searching led me to &lt;a href="http://www.personeel.unimaas.nl/Maarten-Schadd/Papers/2008SameGameCG.pdf"&gt;Schadd et al.'s paper on
applying MCTS to single-player puzzles&lt;/a&gt;&lt;sup&gt;&lt;a id="fnr.7" class="footref" href="#fn.7" role="doc-backlink"&gt;7&lt;/a&gt;&lt;/sup&gt; which described exactly
my predicament.  In the absence of a good heuristic function, here was
a way to search, balancing &lt;i&gt;exploitation&lt;/i&gt; and &lt;i&gt;exploration&lt;/i&gt; as
necessary for the problem.  I could even use my parent-child model of
processes to implement metasearch by blowing away the child every so
often if it wasn't reporting new routes back to the parent often
enough.
&lt;/p&gt;
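
&lt;p&gt;
To make the exploitation/exploration balance concrete, here is a
minimal Python sketch of the UCB1 rule that the survey presents as the
usual selection policy for MCTS (UCT).  It is only an illustration, not
code from my entry: the names &lt;code&gt;ucb1&lt;/code&gt; and &lt;code&gt;select_move&lt;/code&gt;, the
reward bookkeeping, and the exploration constant of 1.4 are all
assumptions.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-python"&gt;
```python
import math


def ucb1(total_reward, visits, parent_visits, exploration=1.4):
    """Score a child: average reward plus a bonus for rarely tried moves."""
    if visits == 0:
        return math.inf  # an untried move is always worth one visit
    exploit = total_reward / visits
    explore = exploration * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore


def select_move(children):
    """children maps each move to (total_reward, visits); pick the best score."""
    parent_visits = max(1, sum(visits for _, visits in children.values()))
    return max(
        children,
        key=lambda move: ucb1(children[move][0], children[move][1], parent_visits),
    )
```
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
With equal visit counts the move with the higher average reward wins,
while a move that has barely been tried gets a large bonus.  Raising
the (assumed) constant pushes the search towards exploration.
&lt;/p&gt;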

&lt;p&gt;
I took to the whiteboard (erasing flocks of Z- and U-shaped squiggles),
and scribbled out a grand plan: a giant hash table would store states,
resolving conflicts by choosing the state with the highest score (thus
eliminating duplicate states achieved by different paths); a simulator
would accept states linearized by &lt;a href="http://en.wikipedia.org/wiki/Z-order_curve"&gt;Morton's Z-order curve&lt;/a&gt; and emit the
next state (writing to a pair of buffers swapped each iteration),
points, robot condition, and subsequent valid moves; Monte Carlo Tree
Search would build a tree of routes by choosing random moves from a
list of valid moves weighted by any heuristics we subsequently
developed.
&lt;/p&gt;
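
&lt;p&gt;
The Z-order linearization in that plan can be sketched in a few lines
of Python.  This is an illustrative sketch rather than the C code I
was writing: &lt;code&gt;morton_index&lt;/code&gt; and &lt;code&gt;linearize&lt;/code&gt; are hypothetical names,
the map is assumed to be a list of equal-length strings, and the bit
interleaving is done with plain arithmetic instead of the usual
shift-and-mask tricks.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-python"&gt;
```python
def morton_index(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into one Z-order index."""
    index = 0
    for k in range(bits):
        x_bit = (x // 2 ** k) % 2
        y_bit = (y // 2 ** k) % 2
        index += x_bit * 4 ** k      # bits of x land on even positions
        index += y_bit * 2 * 4 ** k  # bits of y land on odd positions
    return index


def linearize(grid):
    """Return the cells of a row-major grid (list of strings) in Z-order."""
    cells = [
        (morton_index(x, y), cell)
        for y, row in enumerate(grid)
        for x, cell in enumerate(row)
    ]
    return [cell for _, cell in sorted(cells)]
```
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
For a 4x4 map whose rows are &lt;code&gt;0123&lt;/code&gt;, &lt;code&gt;4567&lt;/code&gt;, &lt;code&gt;89ab&lt;/code&gt;, and &lt;code&gt;cdef&lt;/code&gt;, the
cells come out in the order &lt;code&gt;0145236789cdabef&lt;/code&gt;, so every 2x2 block
stays contiguous in the linearized state.
&lt;/p&gt;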

&lt;p&gt;
An example of the heuristic move weighting was to give a small
probability bump to the "down" move in early turns on levels with
flooding, to try to explore the bottom before it became completely
flooded.
&lt;/p&gt;
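
&lt;p&gt;
A weighting like that could look roughly like the following Python
sketch.  The move letters, the function names, the 20-turn cutoff, and
the size of the bonus are all made up for illustration;
&lt;code&gt;random.choices&lt;/code&gt; from the standard library does the weighted draw.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-python"&gt;
```python
import random


def move_weights(valid_moves, turn, flooding, early_turns=20, down_bonus=0.5):
    """Give every valid move weight 1, plus a bump for "D" early on flooded maps."""
    weights = []
    for move in valid_moves:
        weight = 1.0
        if move == "D" and flooding and turn in range(early_turns):
            weight += down_bonus
        weights.append(weight)
    return weights


def choose_move(valid_moves, turn, flooding, rng=random):
    """Pick one valid move at random, respecting the heuristic weights."""
    weights = move_weights(valid_moves, turn, flooding)
    return rng.choices(valid_moves, weights=weights, k=1)[0]
```
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Because the heuristic only nudges the weights, every valid move keeps
a nonzero chance of being tried, and further heuristics can be added
as extra terms without touching the search itself.
&lt;/p&gt;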

&lt;p&gt;
Another example of the flexibility of this method, which &lt;a href="http://www.abandonstream.net/"&gt;Craig&lt;/a&gt; came up
with as I was giving him a post-mortem on the competition, was the idea
of analyzing the distribution of lambdas when the map is first read,
and using it to tune the balance between exploitation and exploration:
explore more the further apart the lambdas are, on average.
&lt;/p&gt;

&lt;p&gt;
Of course, the tragic ending is that my approach was criminally
overengineered.  I got a quarter of the way into my hyperoptimized C
implementation, shaving every byte, and realized I had no way to
finish it in the time remaining.  So I went to sleep.
&lt;/p&gt;

&lt;p&gt;
The MCTS idea is cool enough on its own that I am going to try and
complete it in one form or another, but it's a shame about the
competition.  Maybe next year.  I certainly learned a lot this time
around, although some of those lessons should have been
absorbed before now.  I blame my obsession with space-filling curves.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id="footnotes"&gt;
&lt;h2 class="footnotes"&gt;Footnotes: &lt;/h2&gt;
&lt;div id="text-footnotes"&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.1" class="footnum" href="#fnr.1" role="doc-backlink"&gt;1&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
Worst case locality of a curve:
  \[\frac{d(p,q)^2}{A(p,q)}\]
with \(d(p,q)\) the distance between points \(p\) and \(q\), and \(A(p,q)\)
the area filled by the curve between \(p\) and \(q\).
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.2" class="footnum" href="#fnr.2" role="doc-backlink"&gt;2&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
See Russell and Norvig's &lt;i&gt;Artificial Intelligence: a Modern
Approach&lt;/i&gt;.
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.3" class="footnum" href="#fnr.3" role="doc-backlink"&gt;3&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
Junghanns and Schaeffer. &lt;a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.128.8912"&gt;"Sokoban: Enhancing general single-agent search methods using domain knowledge."&lt;/a&gt; In Artificial Intelligence 129, 2001.
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.4" class="footnum" href="#fnr.4" role="doc-backlink"&gt;4&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
F.S. Al-Anzi, &lt;a href="http://www.ijcis.info/Vol1N1/1-17.pdf"&gt;"Efficient Cellular Automata Algorithms for Planar Graph and VLSI Layout Homotopic Compaction."&lt;/a&gt;
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.5" class="footnum" href="#fnr.5" role="doc-backlink"&gt;5&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
D. E. Knuth, The Art of Computer Programming, Volume 4, Fascicle 1: Bitwise Tricks &amp;amp; Techniques; Binary Decision Diagrams, 1st ed.  Addison-Wesley Professional, Mar. 2009. Available: &lt;a href="http://www.worldcat.org/isbn/0321580508"&gt;http://www.worldcat.org/isbn/0321580508&lt;/a&gt;
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.6" class="footnum" href="#fnr.6" role="doc-backlink"&gt;6&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
R.M. Jensen, R.E. Bryant and M.M. Veloso, &lt;a href="http://www.itu.dk/people/rmj/data/papers/JBV02A.pdf"&gt;"SetA*: An efficient BDD-Based Heuristic Search Algorithm"&lt;/a&gt;. In Proceedings of 18th National Conference on Artificial Intelligence (AAAI'02), pages 668-673, 2002. 
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class="footdef"&gt;&lt;sup&gt;&lt;a id="fn.7" class="footnum" href="#fnr.7" role="doc-backlink"&gt;7&lt;/a&gt;&lt;/sup&gt; &lt;div class="footpara" role="doc-footnote"&gt;&lt;p class="footpara"&gt;
M.P.D. Schadd, M.H.M. Winands, H.J. van den Herik, G.M.J-B. Chaslot and J.W.H.M. Uiterwijk. &lt;a href="http://www.personeel.unimaas.nl/Maarten-Schadd/Papers/2008SameGameCG.pdf"&gt;Single-Player Monte-Carlo Tree Search.&lt;/a&gt; In Computers and Games, H.J. van den Herik, X. Xu, Z. Ma and M.H.M. Winands, eds., Springer, pages 1-12, Beijing, China, 2008.
&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;


&lt;/div&gt;
&lt;/div&gt;</content></entry><entry><title>A static blog compiler in emacs</title><link href='http://cipht.net/2012/06/29/griffin-initial-release.html'/><updated>2012-06-29T02:30:00+0000</updated><id>http://cipht.net/2012/06/29/griffin-initial-release</id><content type='html'>&lt;p&gt;
Back in the mid-to-late '90s, I had a hideous homepage on GeoCities or
something similar, in dark blue text on a black background, serving no
purpose, as was the style of the time (but at least it was Lynx
friendly!).  Anyway, at the time, it seemed logical to me that one
should statically compile such sites, using templates to insert
uniform headers and footers.  So, I implemented my own with &lt;a href="http://en.wikipedia.org/wiki/M4_%28computer_language%29"&gt;m4&lt;/a&gt; and
&lt;code&gt;make&lt;/code&gt;; it might have even had a link-checking pass, I can't recall.
I wrote way too much &lt;code&gt;m4&lt;/code&gt; in those days.
&lt;/p&gt;

&lt;p&gt;
Anyway, time passed, things on the web came and went.  The era of
"blogs" and "CMSes" came, and with it came crippled browser-based
administration of said sites.  I wanted no part, and continued to lead
the life of the hermit.
&lt;/p&gt;

&lt;p&gt;
Around the same time that I decided to become less private in my life,
a static blog generator called &lt;a href="https://github.com/mojombo/jekyll/"&gt;jekyll&lt;/a&gt; came into vogue.  It seemed to
me that things were coming back in the right direction, and I gave it
a shot.  It worked okay for trivial things, so I used it for a &lt;a href="http://molt.ca"&gt;few&lt;/a&gt;
&lt;a href="http://issfenn.com"&gt;different&lt;/a&gt; &lt;a href="http://kylatilley.com/blog"&gt;sites&lt;/a&gt;.  I won't get into my misgivings about Ruby; that's
material for a later post on language hipsterism.
&lt;/p&gt;

&lt;p&gt;
So I used jekyll for a while without too many hassles, until I started
a post (forthcoming) on suffix arrays.  I needed to embed some math,
so I tried to use &lt;a href="http://www.mathjax.org"&gt;mathjax&lt;/a&gt;.  Well, getting &lt;a href="http://daringfireball.net/projects/markdown/"&gt;markdown&lt;/a&gt; to play well with
mathjax wasn't working, so I converted the post to &lt;a href="http://en.wikipedia.org/wiki/Textile_(markup_language)"&gt;textile&lt;/a&gt;.  RedCloth
barfed on the first non-ASCII character in the post, so I took a look
at the source and thought long and hard about whether I was going to
fix this serious bug all so I could shoehorn my use case into some
rinky-dink markup language.
&lt;/p&gt;

&lt;p&gt;
I wanted to write my posts with &lt;a href="http://orgmode.org"&gt;org-mode&lt;/a&gt;, which has sensible
LaTeX-style math input, tables, and syntax highlighting that plays
well with emacs.  Org has its own publishing features, but I wasn't
going to let that stop me from reinventing the wheel for the
&lt;i&gt;n&lt;/i&gt;&amp;#x00ad;th time, alas.  So I wrote a quick Jekyll-replacement that
runs inside emacs and uses &lt;code&gt;org-mode&lt;/code&gt; as the post format.
&lt;/p&gt;

&lt;p&gt;
I replaced the Liquid templating with a simpler approach: the contents
of each template tag are read and evaluated as a Lisp expression.  So, there's
no interleaving of template tags and content, which hasn't been a
problem for me yet.
&lt;/p&gt;

&lt;p&gt;
Though it's a quick hack, I'm happy with it &amp;#x2013; it scratches my itch,
and I'm sure to improve it as time goes on.  Maybe it can turn into
something useful for other people eventually, too.
&lt;/p&gt;

&lt;p&gt;
Given those caveats, feel free to &lt;a href="http://cipht.net/releases/griffin-120629.el.gz"&gt;download the source code.&lt;/a&gt;
&lt;/p&gt;
</content></entry><entry><title>Shred for Satan initial release</title><link href='http://cipht.net/2012/04/02/shred-for-satan-release.html'/><updated>2012-04-02T02:30:00+0000</updated><id>http://cipht.net/2012/04/02/shred-for-satan-release</id><content type='html'>
&lt;div id="org332fe9d" class="figure"&gt;
&lt;p&gt;&lt;a href="http://cipht.net/images/2012-04-02-shred-01.png" alt="[screenshot]"&gt;&lt;img src="http://cipht.net/images/2012-04-02-shred-01-thumb.png" alt="[screenshot]" /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;
&lt;a href="http://github.com/tokenrove/shred-for-satan"&gt;Here's a little GTK-based metronome&lt;/a&gt; I wrote for &lt;a href="http://kylatilley.com/blog/"&gt;Kyla&lt;/a&gt; to
practice the &lt;a href="http://molt.ca/"&gt;Molt&lt;/a&gt; material.  It was created because the material
contains a lot of meter and tempo changes, which are hard to practice
with a conventional metronome.  So this one reads key, meter, and
tempo changes from a MIDI file.  Since we prepare all our scores with
&lt;a href="http://lilypond.org/"&gt;Lilypond&lt;/a&gt;, we have MIDI files that include all the requisite
information.
&lt;/p&gt;

&lt;p&gt;
It needs ocaml, lablgtk, ocaml-bitstring, and a recent version of
ocaml-portaudio (one that uses portaudio v19) to build.
&lt;/p&gt;

&lt;p&gt;
Despite the fact that it uses portaudio, I've only tested it with JACK
running.  There's something broken in portaudio's interaction with
JACK, so you might find that it won't obtain the correct ALSA device
if JACK isn't running.
&lt;/p&gt;
</content></entry><entry><title>A kernel driver for legacy Wacom serial tablets</title><link href='http://cipht.net/2011/07/02/wacom_serial-initial-release.html'/><updated>2011-07-02T02:30:00+0000</updated><id>http://cipht.net/2011/07/02/wacom_serial-initial-release</id><content type='html'>&lt;blockquote&gt;
&lt;p&gt;
&lt;b&gt;Update&lt;/b&gt; (14/08/20): It looks like this driver will be included in
Linux kernel 3.17, thanks to the labors of Hans de Goede.  It should
no longer be necessary to use the version linked here.
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
&lt;p&gt;
&lt;b&gt;Update&lt;/b&gt; (12/03/27): There is a new version of this driver
available, which includes a patched version of &lt;code&gt;inputattach&lt;/code&gt;,
&lt;a href="http://cipht.net/releases/wacom_serial-120327-1.tar.bz2"&gt;here&lt;/a&gt;.  This includes support for PenPartner tablets.  For Intuos
tablets, &lt;a href="https://github.com/RoaldFre/wacom_serial5"&gt;look here&lt;/a&gt;.
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;
Having gotten back to doing some art on the computer, I decided to dust
off my old Wacom Digitizer II again.  It's always a bit of an
adventure trying to get it to work on a new system, as some
configuration system has &lt;a href="http://www.jwz.org/doc/cadt.html"&gt;always completely changed&lt;/a&gt; since the last time I
hooked it up.  However, this time, I discovered that while the general
approach to detecting and configuring input devices had improved a
lot, support for these old serial Wacom tablets had been completely
removed from the xorg Wacom input driver!
&lt;/p&gt;

&lt;p&gt;
Initially I was pretty irritated, as you can imagine, but after
looking at the code that had been excised, it was clear that this was
for the best.  Given the new(ish) approach to handling input devices
in the Linux kernel, having all the support for the device on the X
side is now clearly the Wrong Thing.  So, I set about reading as much
code as possible related to serial Wacom tablets, and writing a
&lt;code&gt;serio&lt;/code&gt;-based driver.
&lt;/p&gt;

&lt;p&gt;
Along the way, it seemed to me that this would be cleaner if protocol
four (like my Digitizer II) and protocol five (newer tablets like the
Intuos series) devices were supported separately.  So, Intuos owners,
I regret to say that the driver presented here does not support your
devices, though I wouldn't mind trying to write a driver to support
them.
&lt;/p&gt;

&lt;p&gt;
Aside from the inevitable actual bugs to be discovered, this driver
currently does not support (at least):
&lt;/p&gt;
&lt;ul class="org-ul"&gt;
&lt;li&gt;pad buttons;&lt;/li&gt;
&lt;li&gt;tilt;&lt;/li&gt;
&lt;li&gt;suppression;&lt;/li&gt;
&lt;li&gt;cursor devices (some things are missing to fully support these devices).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;
To use it presently, you'll need to do a few things: (instructions
apply to Debian systems but should be easily adapted elsewhere)
&lt;/p&gt;

&lt;ol class="org-ol"&gt;
&lt;li&gt;&lt;p&gt;
Unpack and build the module:
&lt;/p&gt;
&lt;pre class="example" id="org75dbebc"&gt;
$ tar xzf wacom_serial.tar.gz
$ cd wacom_serial
$ make all
&lt;/pre&gt;

&lt;p&gt;
That should produce &lt;code&gt;wacom_serial.ko&lt;/code&gt; if you've got things
otherwise configured correctly for building modules against your
current kernel version.  Then:
&lt;/p&gt;

&lt;pre class="example" id="org7f192cf"&gt;
$ sudo insmod ./wacom_serial.ko
&lt;/pre&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;
Patch and build &lt;code&gt;inputattach&lt;/code&gt; (in the &lt;code&gt;joystick&lt;/code&gt; package) with the
included patch:
&lt;/p&gt;

&lt;pre class="example" id="orgfbd8a51"&gt;
$ apt-get source joystick
$ cd joystick-1.4.1
$ patch -p1 &amp;lt; ~/wacom_serial/inputattach.patch
$ dpkg-buildpackage
$ sudo dpkg -i ../inputattach_1.4.1-1_powerpc.deb
&lt;/pre&gt;

&lt;p&gt;
(Adjust paths to things per your case, of course.)
&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;Add the included &lt;code&gt;70-serial-wacom.rules&lt;/code&gt; file to your local udev rules (put it
in &lt;code&gt;/etc/udev/rules.d&lt;/code&gt;).&lt;/li&gt;

&lt;li&gt;&lt;p&gt;
Connect your tablet, turn it on, and run:
&lt;/p&gt;

&lt;pre class="example" id="orgc0e9af5"&gt;
$ sudo inputattach --wacom_iv /dev/ttyS0
&lt;/pre&gt;

&lt;p&gt;
where &lt;code&gt;ttyS0&lt;/code&gt; is the device for the serial port to which the tablet is
attached.  USB serial adapters usually show up as &lt;code&gt;/dev/ttyUSBn&lt;/code&gt;.
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;
At this point, if everything else on your system is fairly current
(including the &lt;code&gt;xf86-input-wacom&lt;/code&gt; module and its configuration), your
tablet should hopefully work in X.  &lt;a href="mailto:julian@cipht.net"&gt;Let me know&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;
So far, I've only tested it on Linux kernel 2.6.39, i386 and powerpc.
&lt;/p&gt;

&lt;p&gt;
You can get the driver here: &lt;a href="http://cipht.net/releases/wacom_serial-110702-0.tar.gz"&gt;wacom_serial-110702-0.tar.gz&lt;/a&gt;.  If you
have a Wacom serial tablet, please try it out and let me know what
happens, whether it succeeds or fails.  Please also send any messages
logged (usually to &lt;code&gt;/var/log/kern.log&lt;/code&gt;) from the point where you
attached the device with &lt;code&gt;inputattach&lt;/code&gt;.
&lt;/p&gt;

&lt;p&gt;
This driver was developed with reference to much code written by others,
particularly:
&lt;/p&gt;
&lt;ul class="org-ul"&gt;
&lt;li&gt;&lt;code&gt;elo&lt;/code&gt;, &lt;code&gt;gunze&lt;/code&gt; drivers by Vojtech Pavlik;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;wacom_w8001&lt;/code&gt; driver by Jaya Kumar;&lt;/li&gt;
&lt;li&gt;the USB wacom input driver, credited to many people
(see &lt;code&gt;drivers/input/tablet/wacom.h&lt;/code&gt;);&lt;/li&gt;
&lt;li&gt;new and old versions of linuxwacom / xf86-input-wacom credited to
Frederic Lepied, Ping Cheng, and John E. Joganic;&lt;/li&gt;
&lt;li&gt;and &lt;code&gt;xf86wacom.c&lt;/code&gt; (a presumably ancient version of the linuxwacom code), by
Frederic Lepied and Raph Levien.&lt;/li&gt;
&lt;/ul&gt;
</content></entry><entry><title>Molt live, July 21st</title><link href='http://cipht.net/2011/06/16/molt-show-announcement.html'/><updated>2011-06-16T02:30:00+0000</updated><id>http://cipht.net/2011/06/16/molt-show-announcement</id><content type='html'>&lt;p&gt;
My eccentric death metal band, &lt;a href="http://molt.ca"&gt;Molt&lt;/a&gt;, will be playing
&lt;a href="http://www.myspace.com/barflymtl"&gt;Barfly&lt;/a&gt; on July 21st.  Further details to come soon — keep an eye on
the feed at &lt;a href="http://molt.ca"&gt;molt.ca&lt;/a&gt; or &lt;a href="http://www.last.fm/event/1971929+Molt+at+Barfly+on+21+July+2011"&gt;the event page on last.fm&lt;/a&gt;.
&lt;/p&gt;
</content></entry><entry><title>Anaphora 0.9.4 released</title><link href='http://cipht.net/2011/06/15/anaphora-0.9.4-released.html'/><updated>2011-06-15T02:30:00+0000</updated><id>http://cipht.net/2011/06/15/anaphora-0.9.4-released</id><content type='html'>&lt;p&gt;
Just shy of the fifth anniversary of the last release, &lt;a href="http://common-lisp.net/project/anaphora"&gt;anaphora&lt;/a&gt; 0.9.4
has been released.  This release is mostly some accumulated minor bug
fixes, though it also adds &lt;code&gt;ALET&lt;/code&gt; and &lt;code&gt;SLET&lt;/code&gt;.
&lt;/p&gt;

&lt;p&gt;
Anaphora is an anaphoric macro package for Common Lisp, allowing code
like this:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;
&lt;pre class="src src-lisp"&gt;&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;&lt;span style="color: #13665F;"&gt;define-binary-type&lt;/span&gt; array &lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;type count&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;
  &lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;&lt;span style="color: #4C7A90;"&gt;:reader&lt;/span&gt; &lt;span style="color: #4C7A90;"&gt;(&lt;/span&gt;in&lt;span style="color: #4C7A90;"&gt;)&lt;/span&gt;
    &lt;span style="color: #4C7A90;"&gt;(&lt;/span&gt;aprog1 &lt;span style="color: #787096;"&gt;(&lt;/span&gt;make-array count &lt;span style="color: #4C7A90;"&gt;:element-type&lt;/span&gt; type&lt;span style="color: #787096;"&gt;)&lt;/span&gt;
      &lt;span style="color: #787096;"&gt;(&lt;/span&gt;&lt;span style="color: #13665F;"&gt;loop&lt;/span&gt; for i below count do &lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;setf &lt;span style="color: #a0586c;"&gt;(&lt;/span&gt;svref it i&lt;span style="color: #a0586c;"&gt;)&lt;/span&gt; &lt;span style="color: #a0586c;"&gt;(&lt;/span&gt;read-value type in&lt;span style="color: #a0586c;"&gt;)&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;&lt;span style="color: #787096;"&gt;)&lt;/span&gt;&lt;span style="color: #4C7A90;"&gt;)&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;
  &lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;&lt;span style="color: #4C7A90;"&gt;:writer&lt;/span&gt; &lt;span style="color: #4C7A90;"&gt;(&lt;/span&gt;out array&lt;span style="color: #4C7A90;"&gt;)&lt;/span&gt;
    &lt;span style="color: #4C7A90;"&gt;(&lt;/span&gt;&lt;span style="color: #13665F;"&gt;loop&lt;/span&gt; for v across array do &lt;span style="color: #787096;"&gt;(&lt;/span&gt;write-value type out v&lt;span style="color: #787096;"&gt;)&lt;/span&gt;&lt;span style="color: #4C7A90;"&gt;)&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;

&lt;span style="color: #4d9391;"&gt;(&lt;/span&gt;&lt;span style="color: #13665F;"&gt;defun&lt;/span&gt; &lt;span style="color: #4C7A90;"&gt;get-faces&lt;/span&gt; &lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;chunk&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;
  &lt;span style="color: #a9779c;"&gt;(&lt;/span&gt;awhen &lt;span style="color: #4C7A90;"&gt;(&lt;/span&gt;recursively-seek-chunk 'face-list chunk&lt;span style="color: #4C7A90;"&gt;)&lt;/span&gt;
    &lt;span style="color: #4C7A90;"&gt;(&lt;/span&gt;faces-of it&lt;span style="color: #4C7A90;"&gt;)&lt;/span&gt;&lt;span style="color: #a9779c;"&gt;)&lt;/span&gt;&lt;span style="color: #4d9391;"&gt;)&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
(&lt;code&gt;APROG1&lt;/code&gt;, &lt;code&gt;IT&lt;/code&gt;, and &lt;code&gt;AWHEN&lt;/code&gt; are symbols from &lt;code&gt;ANAPHORA&lt;/code&gt;.)
&lt;/p&gt;
</content></entry>
</feed>
