### An OCaml hack: recover the abstraction of abstract types

#### by Matthias Puech

Here is a scoop: in OCaml, your abstract types aren’t really abstract (unfortunately). This is because some magical functions in the standard library don’t respect the abstraction of data types: `Pervasive.compare`

, `Hashtbl.hash`

, `Pervasive.(=)`

— yes, the very equality you use everyday — etc. The problem is that if you contribute to a large project written in OCaml and you want to take advantage of abstract types, these functions become your worst enemies because they are breaking the abstraction you were carefully designing. You’ll probably end up as I did finding and removing by hand every occurrence of these magical functions in your code. The following is for you then.

In this post, I’ll give you a feel of what’s wrong with these functions, if you don’t have one already, and I’ll then show you a quick and dirty trick to spot them at run-time. I’ll finish with some hints of what could be done in a compiler to fix their defect.

### Grandeur of the abstract types

Imagine for instance that you just (re-)invented lists in OCaml; you don’t want to write the module:

module List0 : sig type 'a t = | Nil | Cons of 'a * 'a t val app : 'a t * 'a t -> 'a t end = struct type 'a t = | Nil | Cons of 'a * 'a t let rec app = function | Nil, m -> m | Cons (x, l), m -> Cons (x, app (l, m)) end

because you’re then tied to the underlying, canonical representation of lists for ever. Instead, prefer something like this, using an abstract type and explicit constructors & destructors:

module List1 : sig type 'a t val nil : 'a t val cons : 'a * 'a t -> 'a t val destruct : 'a t -> ('a * 'a t) option val app : 'a t * 'a t -> 'a t end = struct type 'a t = | Nil | Cons of 'a * 'a t let nil = Nil let cons (x, l) = Cons (x, l) let destruct = function | Nil -> None | Cons (x, l) -> Some (x, l) let rec app = function | Nil, m -> m | Cons (x, l), m -> Cons (x, app (l, m)) end

It is a bit heavier since you have to turn every constructor into a function and you can’t use pattern-matching as before (you have to call `destruct`

). Too bad OCaml doesn’t have a view mechanism! But if you end up using `app`

a lot and few `cons`

or `destruct`

, there is probably a nicer underlying representation of lists for you with the same signature: you can switch from canonical lists to binary trees for instance:

module List2 : sig type 'a t val nil : 'a t val cons : 'a * 'a t -> 'a t val app : 'a t * 'a t -> 'a t val destruct : 'a t -> ('a * 'a t) option end = struct type 'a tree = | Leaf of 'a | Node of 'a tree * 'a tree type 'a t = Empty | Tree of 'a tree let nil = Empty let cons = function | x, Empty -> Tree (Leaf x) | x, Tree t -> Tree (Node (Leaf x, t)) let app = function | Tree t, Tree u -> Tree (Node(t, u)) | Tree t, Empty | Empty, Tree t -> Tree t | Empty, Empty -> Empty let destruct = function | Empty -> None | Tree t -> let rec aux = function | Leaf x -> Some (x, Empty) | Node (Leaf x, u) -> Some (x, Tree u) | Node (Node (t, u), v) -> aux (Node (t, Node (u, v))) in aux t end

Changing the `List`

module from `List1`

to `List2`

should be transparent for all possible clients, since the type `'a t`

is abstract…

### The problem with OCaml’s generic equality

… well, unfortunately, it is not, because OCaml’s generic comparison functions don’t care about abstract type. Try to run this for example:

app (cons (1, cons (2, nil)), cons (3, nil)) = app (cons (1, nil), cons (2, cons (3, nil)))

The answer is `false`

, because the terms to constructed on both sides are `Tree (Node (Leaf 1, Node (Leaf 2, Leaf 3)))`

and `Tree (Node (Node (Leaf 1, Leaf 2), Leaf 3))`

, which are syntactically not the same, although abstractly, they are *quotiented* by the `destruct`

function. If you compare their abstract structure by using `destruct`

:

destruct (app (cons (1, cons (2, nil)), cons (3, nil))) = destruct (app (cons (1, nil), cons (2, cons (3, nil))))

they are equal. Not very consistent isn’t it? Why not? Because you probably expect the equality to be *referentially transparent* wrt. `destruct`

, or in other words, `destruct`

should be the only morphism for equality on lists:

`l = l' destruct l = destruct l'`

Instead, the generic equality has no awareness of the function destruct, and just cross the boundary of the abstract types comparing values, constructors by constructors.

### How to spot them

Now, imagine that your very big project uses module `List1`

, and you want to switch to `List2`

, but you are crippled by the fact that in many places, people (or you, when you were sillier) used magical functions on `List1.t`

. The change as-is break everything! How to spot them? Well, let’s rely on a dirty trick, curing the bad by the evil: We don’t like Pervasive.(=), but Pervasive.(=) doesn’t like functions! Recall that functions are values, and these magical functions compare values. If one of them meets a function on its way, it immediately fails with an `Invalid_argument`

exception. So we’re going introduce functions in our abstract types! People using proper destructors won’t see the difference since we’re going to hide it, but people jumping over them will be caught the first time their code will be called!

Let’s transform `List1`

into:

module List1' : sig type 'a t val nil : 'a t val cons : 'a * 'a t -> 'a t val destruct : 'a t -> ('a * 'a t) option val app : 'a t * 'a t -> 'a t end = struct type 'a t' = | Nil | Cons of 'a * 'a t' type 'a t = (int -> int) * 'a t' let inj x = (fun x -> x), x let prj (_, x) = x let nil = inj Nil let cons (x, l) = inj (Cons (x, prj l)) let destruct x = match prj x with | Nil -> None | Cons (x, l) -> Some (x, (inj l)) let rec app (x, y : 'a t * 'a t) = match prj x, prj y with | Nil, m -> inj m | Cons (x, l), m -> inj (Cons (x, prj (app (inj l, inj m)))) end

We’ve wrapped the original list into a pair of a (dummy) function and a list. Decorating our previous code with `inj`

and `prj`

which create and ignore this pair, we get an equivalent code… except when comparing two values with the magical functions. Now try the previous equality test, you’ll get an exception!

Finally, to track where this exception was raised, wrap you whole program into an exception catcher which prints its backtrace:

try main () with Invalid_argument "equal: functional value" as e -> Printexc.print_backtrace stderr; raise e

and launch it in shell environment `OCAMLRUNPARAM=b`

(be careful that this exception is not catched earlier somewhere in `main`

). Just wait for the error to happen and the backtrace will tell you exactly where to go to spot the incorrect usage of the evil magical functions! This is what we could ironically call a run-time type-checker. Of course a compile-time solution would be far better.

### Some proposals to perform equality the right way

We need some way to quickly compare value, but not in a type-unsafe way as it is arguably today. Haskell solves that issue with type-classes; we might not want to go that far, but here are a couple of hacks we could implement on the OCaml compiler to fix the behavior of the magical functions:

- The type system could check that every occurrence of a magical function does not cross any abstract types and issue a warning if it is the case. Beware of polymorphism: a magical function is either Pervasive.compare etc., or a polymorphic function that calls a magical function on its polymorphic argument (
*e.g.*`List.mem`

,`List.assoc`

etc.) - There could be a mechanism to register custom type destructors to be used by the magical functions when comparing values, the same way we register printers in the toplevel and debugger. This would be kind of the opposite of what Haskell has: we don’t explicitly mention when to build the equality (“deriving”), but when
**not**to use it directly:let destructor destruct x = match prj x with | Nil -> None | Cons (x, l) -> Some (x, (inj l))

Everytime

`Pervasive.compare`

has to compare two values of the type of`x`

(`'a List2.t`

), then it calls`destruct`

and continue recursively on its result.

Note that both ideas — a compile-time and a run-time support — are compatible together, and would make up for a pretty nice equality subsystem!

Thanks for reading!

Nice trick Matthias!

About existing solutions, notice that SML has equality types, which are, roughly speaking, a builtin adhoc type class mechanism specialized in polymorphic equality. See for instance :

http://mlton.org/PolymorphicEquality

Ok, what SML has seems to be more-or-less my first proposal, but I think being able to say “here is

howto compare two things” (as in proposal 2) instead of just “here is when you’re allowed to compare them” seems reasonable too…FWIW, OCaml does have a kind of view mechanism, via private row types, which allow read-only access to values constructible only in their declaring module. http://caml.inria.fr/pub/docs/manual-ocaml/manual021.html#toc76