Quick, dirty and shallow definitions

by Matthias Puech

Here is a quick hack. A few months ago, I advocated for pointer equality in OCaml (==) as a way to deal with fresh symbols in a toy compiler. Today, I’ll show another application of pointer equality: how to trivially implement a mechanism of definitions, to e.g. pretty-print programs in a more readable way. Once again, this is really easy, but I never heard of such a trick, so here it is.

Have you ever implemented an quick prototype for a language, and be annoyed by the lack of definition mechanism? For instance, you define a small calculus and encode a few constructs to test it, but end up with outputs like:

((\n. \p. \f. \x. n f (p f x)) (\f. \x. f (f x)) (\f. \x. f (f x)))
(\b. (\b. \x. \y. b x y) b (\x. \y. x) (\x. \y. x))

when you only wanted the system to print:

2 + 2

(these two constants being Church-encoded in the λ-calculus, FWIW).

One possibility, the Right One™, is to add a definition construct to your language, together with a map from name to definition:

type exp =
  | ...
  | Def of string

type env = (string * exp) list
type program = env * exp

Some would call this a deep encoding of definitions. But it is unfortunately very boring: for each function traversing your programs, you will now have to add a case that traverses constructs by looking up their definitions.

Here is another, shallow solution: keep the expressions as they are, and just have a global, reverse map from expression pointers to names. Each time you want to pretty-print a term, first look if it is not associated with a name in the table. Let me implement that.

First, we will need a simplistic map module with pointer equality comparison:

module QMap (T : sig type t type u end) : sig open T
 val register : u -> t -> t
 val lookup_or : (t -> u) -> t -> u
end = struct
 let tbl = ref []
 let register v x = tbl := (x, v) :: !tbl; x
 let lookup_or f x = try List.assq x !tbl with Not_found -> f x
end

It is implemented as a list (we can’t really do better than this), and the lookup function first tries to find a match with the same memory address (function List.assq), or applies a certain function in case of failure.

Then we define our language (here the λ-calculus), and instantiate the functor with a target type of strings:


type exp =
  | Lam of string * exp
  | App of exp * exp
  | Var of string

include QMap (struct type t = exp type u = string end)

Let’s now encode some (classic) example expressions for testing:

(* church numerals *)
let num n =
  let rec aux = function
    | 0 -> Var "x"
    | n -> App (Var "f", aux (n-1)) in
  register (string_of_int n) (Lam ("f", Lam ("x", aux n)))

(* addition *)
let add = register "add"
    (Lam ("n", Lam ("p", Lam ("f", Lam ("x",
                                        App (App (Var "n", Var "f"),
                                             App (App (Var "p", Var "f"),
                                                  Var "x")))))))

Notice how, as we define these encodings, we give them a name by registering them in the map. Now defining the pretty-printer:

let rec to_string () =
  let rec lam = function
      | Lam (x, m) -> "\\" ^ x ^ ". " ^ lam m
      | e -> app e
  and app = function
    | App (m, n) -> app m ^ " " ^ to_string () n
    | f -> to_string () f
  in lookup_or (function
    | Var s ->  s
    | m -> "(" ^ lam m ^ ")")

Notice the use of function lookup_or? At each iteration, we look in the table for a name, and either return it or continue pretty-printing. (Before you ask, the unit argument to to_string is there to convince OCaml that we are indeed defining a proper recursive function, which I wish it could find out by itself).

That’s it! Now we can ask, say, to print a large term composed by our previous definitions:

 print_string (to_string () (App (App (add, num 2), num 2)));;

and get this output:

(add 2 2)

while retaining the structure of the underlying term. I could then go ahead and define various transformations, interpreters etc. As long as they preserve the memory location of unchanged terms, (i.e. do not reallocate too much), my pretty-printing will be well… pretty.

Does this scale up? Probably not. What else can we do with this trick? Memoization anyone?

About these ads