The boustrophedonic madness of space-filling curves: ICFPC 2012 postmortem [icfpc, postmortem, ai, j, lisp, perl]

The programming contest associated with the ICFP conference is, in my mind, the most prestigious programming competition currently running. The lack of restrictions compared to many competitions is an indication of its difficulty: anyone can enter, on teams or alone; almost any language is permissible; and the task changes several times during the competition.

Many years I have promised myself that I would compete, and many years I did at most one day. The morning of the second, the siren call of one of my own back-burner projects would wax louder, and I would wonder why I was solving someone else's problem.

This year, I resolved to endure the weekend, no matter what happened. My goal was to submit a solution, but that didn't happen, and here's my story why.

The problem, and my problems

The problem this year was basically Boulderdash without monsters, with a cellular automata model. Instead of diamonds, one collects lambdas. After the first announcement, I fully expected either some kind of cellular automata computation problem or the addition of monsters and other players. Unfortunately, neither expectation was correct.

In turn, the organizers introduced flooding (lower parts of the board gradually become hazardous), trampolines (effectively teleporters), beards and razors (a kind of amoeba that fills its Moore neighborhood, and a means of cutting beards), and higher-order rocks (lambdas hidden inside boulders).

I think my main problems this year were familiar ones for me in general: too much research, and overengineering. I often wonder whether having teammates might have prevented some of these problems. Maybe next year.

Too much R, not enough D

According to my org-mode files, I put in 40 hours of work this weekend, and at least 10 hours are attributed to pure research, although I know that many of the hours clocked on developing the state model were also research. I read (or at least skimmed) over 40 papers.

Overengineering

My solution involved a parent process that handled signals and executed children, restarting them if they crashed or exited before the time limit was reached (SIGINT sent from a harness), periodically reading the best solutions logged by the children. It involved Bloom filters, and representing state as a path on a space-filling curve1 to increase cache coherency. It involved tcmalloc, and bit interleaving tricks. I was constantly engineering for the most extreme cases, and as a result, I never finished a working lifter (solver).

Once again, the agile null hypothesis stands: YAGNI, KISS, et cetera. Ignore this at your peril.

[ICFPC whiteboard scrawls]

Figure 1: Madness therein lies.

Chronology

Friday

The announcement of the problem took me off guard, since I had expected it to begin on Friday evening, and it started at 12:00UTC.

I wasn't sure how I was going to do search, but I decided from the beginning that any good solution was going to need to be able to efficiently compute the next state, and probably represent states compactly.

My initial implementation was in Common Lisp, using bignums as bitplanes, with the goal being to do board update as a sequence of whole-board boolean operations. Looking back on the code, nothing is particularly interesting, although the following snipped demonstrates shifting the board in a given direction:

(logand (ash (bits p) (ecase dir (left -1) (right 1) (up (- w)) (down w)))
        (ldb (byte (* w h) 0) -1))

I spent a lot of time thinking about admissible heuristics. A* and its variants need a function \(f(n) = g(n) + h(n)\), where \(g(n)\) represents the path cost (here, path benefit) to node \(n\), and \(h(n)\) is an admissible heuristic for the potential benefit from node \(n\) on til a goal state.2

One of the problems with the specified task is that every position is a goal state. Your score at any state is \(c \cdot 25 \cdot \lambda - m\), where \(\lambda\) is the number of lambdas collected, \(m\) is the number of moves, and \[c = \left\{ \begin{array}{rl} 1 &\mbox{ if one is crushed or drowned} \\ 2 &\mbox{ if one aborts} \\ 3 &\mbox{ if one escapes on a lift} \end{array} \right. \]

You can abort at any time, and leave with points, therefore no smart program should ever be crushed or drowned, but you need a simulator that will actually alert you to the fatal move and back up to abort immediately upon finding the previous lambda (meaning that aborting right out of the gate is superior to making moves that don't find a lambda). The lift only works if you've collected all lambdas, and some maps may be unsolvable, therefore one cannot depend on reaching the lift as a goal state.

As for the admissibility of a heuristic, in our case, since we're looking at points scored rather than cost (although, since we know how many lambdas there are, we could express cost as difference from \(75\cdot\lambda_\mbox{total}\)), we need a function that does not under-estimate the potential value of a position. The nice thing about this idea is that you can merge several admissible heuristics by finding their minimum.

This, however, presumes that you can find such a heuristic for the problem at hand. I thought about the classic Sokoban heuristic (Manhattan distance from blocks to goals)3 and other tricks, but nothing seemed very satisfying. Playing the game manually (on Stefan Bühler's awesome Javascript simulator) demonstrated I lacked "domain" insight… how could I write a good heuristic? If you look at the original Rog-o-matic paper, they cited the use of domain knowledge from human experts as key to rog-o-matic's success. (Aside: I had intended on creating a graphical interactive version of the simulator, with alpha blended flooding indicators and danger indicators, but once I saw Stefan's simulator I didn't even bother.)

Had I been more familiar with recent game AI research, my difficulty in finding an admissible heuristic would have tipped me off to an alternate approach which I didn't discover until mid-afternoon Sunday.

My notes show I spent most of my time thinking about state representations, and only a few hours coding. I implemented a harness in perl that behaved as the competition harness would; the interesting portion being as follows:

my $pid = open2(\*LIFTER_OUT, \*LIFTER_IN, $LIFTER) or die $!;
eval {
    my $gracious = 1;
    local $SIG{ALRM} = sub {
        if($gracious) {
            kill 'INT', $pid; $gracious = 0; alarm 10;
        } else {
            kill 'KILL', $pid; alarm 0; die "Exceeded life expectancy.\n";
        }
    };
    alarm($TIME_TO_LIVE);       # 150 in competition

    print LIFTER_IN $map; close(LIFTER_IN); while(<LIFTER_OUT>) { $route .= $_; }
    waitpid $pid, 0;
    alarm 0;
};
die [email protected] if([email protected]);

So it spawns the lifter, sets an alarm of 150 seconds, feeds it the map, and then waits to read the route from it. After the first timeout, it sends SIGINT. The process gets 10 more seconds grace before SIGKILL is sent.

Saturday

I spent the morning writing a test suite (using TAP so I could call it with prove) that took maps annotated with routes tested by hand, and compared the simulator's output to the results of the web validator. This revealed numerous discrepancies between my model of rocks and the web validator.

The organizers added trampolines, and this prompted much thought about graph structures, particularly the idea of walking the space from the robot's initial position, keeping track of the connected components of the graph and discarding anything else. I wondered if I could use some kind of complete heap storage approach so that the common case of four adjacencies would be implicit in the packed array storage, and still handle the exceptional case of trampolines (with four completely different adjacencies). At this point, I considered applying homotopic compaction4 to the empty space in levels to reduce graph size.

I spent a bit of time looking into monotonic paths, bumping into Catalan number a few times on the way, but to no useful end. (Note that a space-filling curve on a fixed grid as in our case is a kind of self-avoiding walk.) Lots of interesting mathematics, but nothing that was getting the code written any faster.

I also did a lot of fruitless research into Binary Decision Diagrams (described in 5 for example). There's an attractive A* variant called SetA*6 based on using BDDs that seemed like a way to prevent the massive state explosion I expected for this problem, but I just don't understand BDDs well enough yet to implement something like that in a weekend (definitely a project for the future, though). (Another A*-alike I discarded was D*-lite, which is also pretty cool.)

That afternoon, I wrote a new simulator in J, and though it wasn't the most productive use of my time, it was the most fun I had all weekend. J really is a delightful language. I just wish there was a good compiler for it.

Here's all the rock update code (whole map at once), for example:

updaterocks =: 3 : 0
NB. XXX should probably calculate rocks once but i like these trains so much...
  a =. (rocks *. (above @: empty)) y
  b =. (rocks *. (above @: rocks) *. (leftAndUpLeft @: empty)) y
  c =. (rocks *. (above @: rocks) *. (rightAndUpRight @: empty)) y
  d =. (rocks *. (above @: lambdas) *. (leftAndUpLeft @: empty)) y
  r =. rocks y
  (r *. (-. (a +. b +. c +. d))) +. (below r *. a) +. (right @: below r *. (b+.d)) +. (left @: below r *. c)
)

I figured I wasn't going to write a lifter in J, though, since the data structures and recursion in an A* search like SMA* are hard (for me) to reason about in J's "everything is an array" model. There is an example of A* search in J on the J software wiki looking fairly directly translated from AIMA2. Any time I see the "explicit verb only" control structures like while., it's a red flag that we're straying outside J's domain.

(In fact, the simulator is the first code in which I've ever used a multi-statement if. / elseif. in J, and so I was bitten by the bizarre misfeature that else. cannot be combined with elseif. in J.)

But it sure was fun to hack on that stuff in J, especially with the array display from the REPL just doing the Right Thing as I played with it interactively. The tessellation operator in J is amazingly expressive, too.

My fun ended with the introduction of beards into the task. Beards grow into their Moore neighborhood every so many turns. Since there was no position independent rule to determine whether a beard or a rock had priority, I was forced to write a simulator that updated the board left-to-right, bottom-to-top, instead of all-at-once (which is, to me, much more elegant, and easily parallelizable). With that, I gave up on ideas like lazily streaming states in a local area around the robot to partially evaluate their merit.

This was the nadir for me, where I realized I was basically back at the beginning, having read dozens of papers but being no further along for it. I still didn't have a solver implemented, and I wasn't confident that a basic A* approach would even work. I had no good heuristics, especially with the introduction of beards, which must be shaved with razors which the robot can collect.

Sunday

I did some toying around in J, Lisp, and ATS, but nothing useful came out of it. The final task update came in: "higher-order rocks", which are rocks that contain lambdas which only become available if the rock is broken open by dropping it. This inspired even more research and less coding.

I forget how I stumbled across it, but amidst all these papers, I came across A Survey of Monte-Carlo Tree Search Methods by Browne et al, and it blew me away. Here was the perfect strategy for this problem. Indeed, a bit more searching led me to Schaad et al.'s paper on applying MCTS to single-player puzzles7 which described exactly my predicament. In the absence of a good heuristic function, here was a way to search, balancing exploitation and exploration as necessary for the problem. I could even use my parent-child model of processes to implement metasearch by blowing away the child every so often if it wasn't reporting new routes back to the parent often enough.

I took to the whiteboard (erasing flocks of Z- and U-shaped squiggles), and scribbled out a grand plan: a giant hash table would store states, resolving conflicts by choosing the state with the highest score (thus eliminating duplicate states achieved by different paths); a simulator would accept states linearized by Morton's Z-order curve and emit the next state (writing to a pair of buffers swapped each iteration, points, robot condition, and subsequent valid moves; Monte Carlo Tree Search would build a tree of routes by choosing random moves from a list of valid moves weighted by any heuristics we subsequently developed.

An example of the heuristic move weighting was to give a small probability bump to the "down" move in early turns on levels with flooding, to try to explore the bottom before it became completely flooded.

Another example of the felixibility of this method that Craig came up with as I was giving him a post-mortem on the competition was the idea of analyzing the distribution of lambdas when the map is first read, and using it to tune the balance between exploitation and exploration: explore more the further lambdas are apart, on average.

Of course, the tragic ending is that my approach was criminally overengineered, I got a quarter-way into my hyperoptimized C implementation, shaving every byte, and realized I had no way to finish it in the time remaining. So I went to sleep.

The MCTS idea is cool enough on its own that I am going to try and complete it in one form or another, but it's a shame about the competition. Maybe next year. I certainly learned a lot this time around, although some of them are lessons that should have been absorbed before now. I blame my obsession with space-filling curves.

Footnotes:

1

Worst case locality of a curve: \[\frac{d(p,q)^2}{A(p,q)}\] with \(d(p,q)\) the distance between points \(p\) and \(q\), and \(A(p,q)\) the area filled by the curve between \(p\) and \(q\).

2

See Russell and Norvig's Artificial Intelligence: a Modern Approach.

3

Junghanns and Schaeffer. "Sokoban: Enhancing general single-agent search methods using domain knowledge." In Artificial Intelligence 129, 2001.

5

D. E. Knuth, The Art of Computer Programming, Volume 4, Fascicle 1: Bitwise Tricks & Techniques; Binary Decision Diagrams, 12nd ed. Addison-Wesley Professional, Mar. 2009. Available: http://www.worldcat.org/isbn/0321580508

6

R.M. Jensen, R.E. Bryant and M.M. Veloso, "SetA*: An efficient BDD-Based Heuristic Search Algorithm". In Proceedings of 18th National Conference on Artificial Intelligence (AAAI'02), pages 668-673, 2002.

7

M.P.D. Schadd, M.H.M. Winands, H.J. van den Herik, G.M.J-B. Chaslot and J.W.H.M. Uiterwijk. Single-Player Monte-Carlo Tree Search. In Computers and Games, H.J. van den Herik and X. Xu and Z. Ma and M.H.M. Winands,eds., Springer, pages 1-12, Beijing, China, 2008.

JS