malloc() is the new gensym()

Teaching an introductory course to “compilation” this semester (actually it was called Virtual Machines, but it was really about compiling expressions to stack machines), I realized something I hadn’t heard before, and wish I had been told when I first learned OCaml many years ago. Here it is: as soon as you are programming in a functional language with physical equality (i.e. pointer equality, the (==) operator in OCaml), then you are actually working in a “weakly impure” language, and you can for example implement a limited form of gensym. What? gensym is this classic “innocuously effectful” function returning a different symbol—usually a string—each time it is called. It is used pervasively to generate fresh variable names, in compilers notably. How? well, you actually don’t have much to do, except let the runtime call malloc: it will return a “fresh” pointer where to store your data. malloc and the garbage collector together ensures this freshness condition, and you can then compare two pointers with (==). As a bonus, you can even store data along your fresh symbol.

In this post, I’ll exploit that simple idea to develop an assembler for a little stack machine close to that of OCaml.

Read the rest of this entry »