### Quick, dirty and shallow definitions

#### by Matthias Puech

Here is a quick hack. A few months ago, I advocated for pointer equality in OCaml (==) as a way to deal with fresh symbols in a toy compiler. Today, I’ll show another application of pointer equality: how to trivially implement a mechanism of definitions, to e.g. pretty-print programs in a more readable way. Once again, this is really easy, but I never heard of such a trick, so here it is.

Have you ever implemented an quick prototype for a language, and be annoyed by the lack of definition mechanism? For instance, you define a small calculus and encode a few constructs to test it, but end up with outputs like:

((\n. \p. \f. \x. n f (p f x)) (\f. \x. f (f x)) (\f. \x. f (f x))) (\b. (\b. \x. \y. b x y) b (\x. \y. x) (\x. \y. x))

when you only wanted the system to print:

2 + 2

(these two constants being Church-encoded in the λ-calculus, FWIW).

One possibility, the Right One™, is to add a definition construct to your language, together with a map from name to definition:

type exp = | ... | Def of string type env = (string * exp) list type program = env * exp

Some would call this a *deep encoding* of definitions. But it is unfortunately very boring: for each function traversing your programs, you will now have to add a case that traverses constructs by looking up their definitions.

Here is another, *shallow* solution: keep the expressions as they are, and just have a global, reverse map from expression pointers to names. Each time you want to pretty-print a term, first look if it is not associated with a name in the table. Let me implement that.

First, we will need a simplistic map module with pointer equality comparison:

module QMap (T : sig type t type u end) : sig open T val register : u -> t -> t val lookup_or : (t -> u) -> t -> u end = struct let tbl = ref [] let register v x = tbl := (x, v) :: !tbl; x let lookup_or f x = try List.assq x !tbl with Not_found -> f x end

It is implemented as a list (we can’t really do better than this), and the `lookup`

function first tries to find a match with the same memory address (function `List.assq`

), or applies a certain function in case of failure.

Then we define our language (here the λ-calculus), and instantiate the functor with a target type of strings:

type exp = | Lam of string * exp | App of exp * exp | Var of string include QMap (struct type t = exp type u = string end)

Let’s now encode some (classic) example expressions for testing:

(* church numerals *) let num n = let rec aux = function | 0 -> Var "x" | n -> App (Var "f", aux (n-1)) in register (string_of_int n) (Lam ("f", Lam ("x", aux n))) (* addition *) let add = register "add" (Lam ("n", Lam ("p", Lam ("f", Lam ("x", App (App (Var "n", Var "f"), App (App (Var "p", Var "f"), Var "x")))))))

Notice how, as we define these encodings, we give them a name by registering them in the map. Now defining the pretty-printer:

let rec to_string () = let rec lam = function | Lam (x, m) -> "\\" ^ x ^ ". " ^ lam m | e -> app e and app = function | App (m, n) -> app m ^ " " ^ to_string () n | f -> to_string () f in lookup_or (function | Var s -> s | m -> "(" ^ lam m ^ ")")

Notice the use of function `lookup_or`

? At each iteration, we look in the table for a name, and either return it or continue pretty-printing. (Before you ask, the unit argument to `to_string`

is there to convince OCaml that we are indeed defining a proper recursive function, which I wish it could find out by itself).

That’s it! Now we can ask, say, to print a large term composed by our previous definitions:

print_string (to_string () (App (App (add, num 2), num 2)));;

and get this output:

(add 2 2)

while retaining the structure of the underlying term. I could then go ahead and define various transformations, interpreters etc. As long as they preserve the memory location of unchanged terms, (i.e. do not reallocate too much), my pretty-printing will be well… pretty.

Does this scale up? Probably not. What else can we do with this trick? Memoization anyone?

I don’t see why you need to deal with memory locations when you’ve got a white-box datastructure like ‘exp’. You can implement (intensional) function equality quite easily. I’ve never written OCaml, but here’s some Coq (assuming straighforward definitions of “==” for strings and “&&” for booleans):

Fixpoint exp_eq lhs rhs := match (lhs, rhs) with

| (Lam x1 f1, Lam x2 f2) => x1 == x2 && exp_eq f1 f2

| (App l1 r1, App l2 r2) => exp_eq l1 l2 && exp_eq r1 r2

| (Var s1, Var s2) => s1 == s2

| _ => false

end.

Now you can do the same trick of storing definitions against (string) names in a map, but rather than comparing memory address, you can compare the datastructures directly using exp_eq. This intensional equality will actually spot more expressions than using the memory address (which is a form of definitional equality).

This won’t spot alpha-equivalent terms, but that’s an inherent problem with using strings for variable names. I tend to use de Bruijn indices to avoid having to worry about alpha equivalence :)

I tend to use hashconsing to obtain constant comparison of terms (as well as a unique integer identifier). this way, the unique identifier can be used for hashing or total ordering, so that looking up terms in a table is much faster (but term creation is slower). Also, +1 for De Bruijn indices!

Hello Chris, Hello Simon.

Thanks for your comments. Maybe I should have made myself clearer: what I want is “just” folding of named expressions, not automatic recognition of any registered subexpressions. In other words, I still want to be able to define \f. \x. f x (which can be meant as the applicator lambda-term) without the system recognizing it as Church numeral 1. This is a sound requirement for a definition system: it shouldn’t fold definitions you didn’t provide explicitly.

Using pointer equality achieves just that: there is a match between how I define the term in OCaml and how it is pretty-printed. If I mention a registered definition name, then this name will be printed; otherwise, if I redefine it by hand, its definition will be printed in full.

Using structural equality (as Chris suggests) equate more terms, and thus performs unwanted folding that were not stated explicitly by the user. It is also a lot slower than pointer equality FWIW, since it has to traverse the whole term, unless you use hasconsing (as Simon suggests). A fortiori, I wouldn’t want to equate two alpha-equivalent terms; that would make even more definition folding. So De Bruijn indices are *not* useful in this context. (although of course they are of great use when defining interpreters etc.)

Is it clearer?