malloc() is the new gensym()

by Matthias Puech

Teaching an introductory course to “compilation” this semester (actually it was called Virtual Machines, but it was really about compiling expressions to stack machines), I realized something I hadn’t heard before, and wish I had been told when I first learned OCaml many years ago. Here it is: as soon as you are programming in a functional language with physical equality (i.e. pointer equality, the (==) operator in OCaml), then you are actually working in a “weakly impure” language, and you can for example implement a limited form of gensym. What? gensym is this classic “innocuously effectful” function returning a different symbol—usually a string—each time it is called. It is used pervasively to generate fresh variable names, in compilers notably. How? well, you actually don’t have much to do, except let the runtime call malloc: it will return a “fresh” pointer where to store your data. malloc and the garbage collector together ensures this freshness condition, and you can then compare two pointers with (==). As a bonus, you can even store data along your fresh symbol.

In this post, I’ll exploit that simple idea to develop an assembler for a little stack machine close to that of OCaml.

The idea

In OCaml, something as simple as this is a gensym:

type 'a sym = C of 'a
let gensym x = C x

Each call to say gensym () will allocate one new data block in memory; you can then compare two symbols with the physical equality (==).What we care about here is not the content of that memory span, but its address, which is unique.

A few warnings first: in OCaml, the constructor must have arguments, otherwise the compiler optimizes the representation to a simple integer and nothing is allocated. Also, don’t replace the argument x to C by a constant, say (), in the function code: if you do so, the compiler will place value C () in the data segment of the program, and calling gensym will not trigger an allocation either. There is an excellent and already classic series of blog post about OCaml’s value representation here.

Another way of saying the same thing is that (non-cyclic) values in OCaml are not trees, as they can be thought of considering the purely functional fragment, but DAGs, that is trees with sharing.

I think that not many beginner/intermediate OCaml programmers realize the power of this, so I’d like to show a cool application of this remark. We will code a small compiler from a arithmetic language to a stack machine. Bear with me, it’s going to be fun!

An application: compiling expressions to a stack machine

The input language of expressions is:

type expr =
  | Int of int
  | Plus of expr * expr
  | If of expr * expr * expr

Its semantics should be clear, except for the fact that If are like in C: if their condition is different than 0, then their first branch is taken; if it is 0, then the second is taken. Because we have these conditionals, the stack machine will need instructions to jump around in the code. The instructions of this stack machine are:

  • Push i pushes i on the stack;
  • Add pops two values off the stack and pushes their sum;
  • Halt stops the machine and returning the (supposedly unique) stack value;
  • Branch o skips the next o instructions in the code;
  • Branchif o skips the next o instructions if the top of the stack is not 0, and has no effect otherwise

For instance, the expression 1 + (if 0 then 2 else (3+3)) is compiled into:

[Push 1; Push 0; Branchif 3; 
   Push 3; Push 3; Add; Branch 1;
   Push 2;
 Add; Halt]

and evaluates of course to 7. Notice how the two branches of the If are turned around in the code? First, we’ve got the code of expression 2, then the code of 3+3. In general, expression if e1 then e2 else e3 will be compiled to [c1; Branchif (|c3|+1); c3; Branch |c2|; c2; …] where ci is the compiled code of ei, and |l| is the size of code l. But I’m getting ahead of myself.


Now, compiling an expr to a list of instructions in one pass would be a little bit messy, because we have to compute these integer offset for jumps. Let’s follow instead the common practice and first compile expressions to an assembly language where some suffixes of the code have labels, which are the names referred to by instructions Branch and Branchif. This assembly language asm will then be well… assembled into actual code, where jumps are translated to integer offsets. But instead of generating label names by side-effect as customary, let’s use our trick: we will refer to them by a unique pointer to the code attached to it. In other words, the arguments to Branch and Branchif will actually be pointers to asm programs, comparable by (==).

To represent the code and asm data structures, we generalize over the notion of label:

type 'label instr =
  | Push of int
  | Add
  | Branchif of 'label
  | Branch of 'label
  | Halt

An assembly program is a list of instruction where labels are themselves assembly programs (the -rectypes option of OCaml is required here):

type asm = asm instr list

For instance, taking our previous example,

Plus (Int 1, If (Int 0, Int 2, Plus (Int 3, Int 3)))

is compiled to the (shared) value:

Push 1 :: Push 0 :: 
  let k = [Add; Halt] in 
  Branchif (Push 2 :: k) :: 
  Push 3 :: Push 3 :: Add :: k

See how the suffix k (the continuation of the If) is shared among the Branchif and the main branch? In call-by-value, this is a value: if you reduce it any further by inlining k, you will get a different value, that can be told apart from the first by using (==). So don’t let OCaml’s pretty-printing of values fool you: this is not a tree, the sharing of k is important! What you get is the DAG of all possible execution traces of your program; they eventually all merge in one point, the code suffix k = [Add; Halt].

The compilation function is relatively straightforward; it’s an accumulator-based function:

let rec compile e k = match e with
  | Int i -> Push i :: k
  | Plus (e1, e2) -> compile e1 (compile e2 (Add :: k))
  | If (e1, e2, e3) ->
    compile e1 (Branchif (compile e2 k) :: compile e3 k)

let compile e = compile e [Halt]

The sharing discussed above is realized here in the If case, by compiling its two branches using the accumulator (continuation) k twice. Again, many people think of this erroneously as duplicating a piece of value. Actually, this is only mentioning twice a pointer to an already-allocated unique piece of value; and since we can compare pointers, we have a way to know that they are the same. Note also that this compilation function is purely compositional: to each subexpression corresponds a contiguous span of assembly code.


Now, real code for our machine is simply a list of instructions where labels are represented by (positive) integers:

type code = int instr list

Why positive? Well, since we have no way to make a loop, code can be arranged such that all jumps are made forward in the code.

The assembly function took me a while to figure out. It “linearizes” the assembly, a DAG, into a list by traversing it depth-first. The tricky part is that we don’t want to repeat the common suffixes of all branches; that’s where we use the fact that they are at the same memory address, which we can check with (==). If a piece of input code has already been compiled n instructions ahead in the output code, instead of repeating it we just emit a Branch n.

So practically, we must keep as an argument an association list k mapping already-compiled suffixes of the input to the corresponding output instruction; think of it as a kind of “cache” of the function. It also doubles as the result of the process: it is what’s eventually returned by assemble. For each input is, we first traverse that list k looking for the pointer is; if we find it, then we have our Branch instruction; otherwise, we assemble the next instruction. This first part of the job corresponds to the assemble function:

let rec assemble is k =
  try (is, Branch (List.index (fun (is', _) -> is == is') k)) :: k
  with Not_found -> assem is k

(List.index p xs returns the index of the first element x of xs such that p x is true).

Now the auxiliary function assem actually assembles instructions into a list of pairs of source programs and target instruction:

and assem asm k = match asm with
  | (Push _ | Add | Halt as i) :: is ->
    (asm, i) :: assemble is k
  | Branchif is :: js ->
    let k = assemble is k in
    let k' = assemble js k in
    (asm, Branchif (List.length k' - List.length k)) :: k'
  | Branch _ :: _ -> assert false
  | [] -> k

Think of the arguments asm and k as one unique list asm @ k that is “open” for insertion in two places: at top-level, as usual, and in the middle, between asm and k. The k part is the already-processed suffix, and asm is what remains to be processed. The first case inserts the non-branching instructions Push, Add, Halt at top-level in the output (together with their corresponding assembly suffix of course). The second one, Branchif, begins by inserting the branch is at top-level, and then inserts the remainder js in front of it. Note that when assembling this remainder, we can discover sharing that was recorded in k when compiling the branch. Note also that there can’t be any Branch in the assembly since it would not make much sense (everything after a Branch instruction would be dead code), hence the assert false.

Finally, we can strip off the “cached” information in the returned list, keeping only the target instructions:

let assemble is = snd (List.split (assemble is []))


That’s it, we have a complete compilation chain for our expression language! We can execute the target code on this machine:

let rec exec = function
  | s, Push i :: c -> exec (i :: s, c)
  | i :: j :: s, Add :: c -> exec (i + j :: s, c)
  | s, Branch n :: c -> exec (s, List.drop n c)
  | i :: s, Branchif n :: c -> exec (s, List.drop (if i<>0 then n else 0) c)
  | [i], Halt :: _ -> i
  | _ -> failwith "error"

let exec c = exec ([], c)

The idea of using labels that are actual pointers to the code seems quite natural and seems to scale well (I implemented a compiler from a mini-ML to a virtual machine close to OCaml’s bytecode). In terms of performance however, assemble is quadratic: before assembling each instruction, we look up if we didn’t assemble it already. When we have real (string) labels, we can represent the “cache” as a data structure with faster lookup; unfortunately, if labels are pointers, we can’t really do this because we don’t have a total order on pointers, only equality (==).

This is only one example of how we can exploit pointer equality in OCaml to mimick a name generator. I’m sure there are lots of other applications to be discovered, or that I don’t know of (off the top of my head: to represent variables in the lambda-calculus). The big unknown for me is the nature of the language we’ve been working in, functional OCaml + pointer equality. Can we still consider it a functional language? How to reason on its programs? The comment section is right below!