A research blog about programming languages, formal logics, software development and their interactions, by Matthias Puech.

### Typeful disjunctive normal form

This is the answer to last post’s puzzle. I gave an algorithm to put a formula in disjunctive normal form, and suggested to prove it correct in OCaml, thanks to GADTs. My solution happens to include a wealth of little exercises that could be reused I think, so here it is.

I put the code snippets in the order that I think is more pedagogical, and leave to the reader to reorganize them in the right one.

### Disjunctive normal forms in big steps

This is probably a second-semester functional programming exercise, but I found it surprisingly hard, and could not find a solution online. So at the risk of depriving a TA from a problem for its mid-term exam, here is my take on it, that I painfully put together yesterday.

Given a formula built out of conjunction, disjunction and atoms, return its disjunctive normal form, in big step or natural semantics, that is, not applying repetitively the distributivity and associativity rules, but in a single function run. Before you go any further, please give it a try and send me your solution!

Here is a little programming pearl. I’ve been wanting to work on pattern-matching for a while now, and it seems like I will finally have this opportunity here at my new (academic) home, McGill.

Encoding some simply-typed languages with GADTs is now routine for a lot of OCaml programmers. You can even take (kind of) advantage of (some form of) convenient binding representation, like (weak) HOAS; you then use OCaml variables to denote your language’s variables. But what about pattern-matching? Patterns are possibly “deep”, i.e. they bind several variables at a time, and they don’t respect the usual discipline that a variable is bound for exactly its subterm in the AST.

It turns out that there is an adequate encoding, that relies on two simple ideas. The first is to treat variables in patterns as nameless placeholders bound by λ-abstractions on the right side of the arrow (this is how e.g. Coq encodes matches: `match E₁ with (y, z) -> E₂` actually is sugar for `match E₁ with (_, _) -> fun x y -> E₂`); the second is to thread and accumulate type arguments in a GADT, much like we demonstrated in our `printf` example recently.

The ideas probably extends seamlessly to De Bruijn indices, by threading an explicit environment throughout the term. It stemmed from a discussion on LF encodings of pattern-matching with Francisco over lunch yesterday: what I will show enables also to represent adequately pattern-matching in LF, which I do not think was ever done this way before.

### Update on Typeful Normalization by Evaluation

In October, I publicized here a new draft on normalization by evaluation, which provoked some very helpful comments and constructive criticisms. Together with Chantal and Olivier, we thus revised the draft profoundly and a revamped version is available on my web site.

Besides the enhanced form and better explanations, we included a completely new section on what it means for NbE to be written in Continuation-Passing Style, that I am particularly excited about. This allowed us to extend our typeful NbE formalization beyond the minimal λ-calculus to sums and call/cc (which is known to be difficult). Taking our code, you can write a program with call/cc and if statements, and partially evaluate it: all redexes will be removed and your code will be specialized. All this, as before, is done typefully, thanks to OCaml’s GADTs: this means that the transformation is guaranteed to map well-typed terms to well-typed normal forms.

### New draft on Normalization by Evaluation using GADTs

There is a new draft on my web page! It is called Tagless and Typeful Normalization by Evaluation using Generalized Algebraic Data Types, which is a mouthful, but only describes accurately the cool new OCaml development we elaborated together with Olivier and Chantal. The world of Normalization by Evaluation (NbE) is still relatively new to me, but its wonders keep amazing me. It’s a program specialization algorithm… no, it’s a completeness proof for first-order logic… wait, it’s a technique to turn an interpreter into a compiler! Definitely, Aarhus University, my new home, is not the worst place to learn about it.

Since the introduction of GADTs in OCaml, a whole new realm of applications emerged, the most well-known being to faithfully represent typed languages: it allows to define compact and correct interpreters, and type-preserving program transformations. Ok, if you never saw this, here is a small snippet that should be self-explanatory:

```type _ exp =
| Val : 'a -> 'a exp
| Eq : 'a exp * 'a exp -> bool exp
| Add : int exp * int exp -> int exp

let rec eval : type a. a exp -> a = function
| Val x -> x
| Eq (e1, e2) -> eval e1 = eval e2
| Add (e1, e2) -> eval e1 + eval e2
;;
assert (eval (Eq (Add (Val 2, Val 2), Val 4)))
```

Nice, isn’t it? Our question was: if we can so beautifully go from typed syntax (expressions) to typed semantics (OCaml values), can we do the converse, i.e., go from an OCaml value back to a (normal) expression of the same type? In other words, can we implement NbE in a typeful manner, using GADTs?

The answer is a proud “yes”, but more interestingly, we all learned some interesting GADT techniques and puzzling logical interpretations on the way. Read on to know more!

Tagless and Typeful Normalization by Evaluation using Generalized Algebraic Data Types

We present the first tagless and typeful implementation of normalization by evaluation for the simply typed λ-calculus in OCaml, using Generalized Algebraic Data Types (GADTs). In contrast to normalization by reduction, which is operationally carried out by repeated instances of one-step reduction, normalization by evaluation uses a non-standard model to internalize intermediate results: it is defined by composing a non-standard evaluation function with a reification function that maps the denotation of a term into the normal form of this term. So far, normalization by evaluation has been implemented either in dependently-typed languages such as Coq or Agda, or in general-purpose languages such as Scheme, ML or Haskell. In the latter case, denotations are either tagged or Church-encoded. Tagging means injecting denotations, either implicitly or explicitly, into a universal domain of values; Church encoding is based on the impredicativity of the metalanguage. Here, we show how to obtain not only tagless values, making evaluation efficient, but also typeful reification, guaranteeing type preservation and η-long, β-normal forms. To this end, we put OCaml’s GADTs to work. Furthermore, our implementation does not depend on any particular representation of binders (HOAS, de Bruijn levels or indices) nor on the style (direct style, continuation-passing style, etc.) of the non-standard evaluation function.

PS: My previous draft, Proofs, upside down was accepted for presentation at APLAS 2013. See you in Melbourne!