### Building terms, incrementally

Here is the first post of a series about incremental verification in programming languages. This problem has become more-or-less the main subject of my Ph.D, and this is an exercise in explaining it from scratch, exploring all the possible alternatives.

Eventually, the goal is to have a clear foundation for supporting better typing incrementality in programming and proof development tools. By incrementality, I mean the ability to have feedback of type errors upon any change during the development. This requires the ability to check changes in programs or proofs and not have to retype a whole development or file as it is the case too often.

Enough with the motivations, which will hopefully become clearer in later posts. Let us start with a simple goal: defining a system for constructing purely first-order well-typed terms, but incrementally. What does that mean?

1. First-order should be pretty clear: no variable binding, no α- or β-equivalence.
2. But you cannot build any term though: every symbol has a type and can be constructed only from terms of other types. All ill-typed term are refused by the system.
3. The incremental nature of the process is specified as follows: you will provide to the system either full terms, or transformations (or deltas) from previously provided terms to build new terms. From a delta, you can surely reconstruct the full information, the same way you can apply a patch to a file. But what we want here is verify a delta alone, without its whole “context”. That way, if you do a small modification in a huge term, you won’t need to re-check the huge term but just the modification. So the system should have a state, containing previous verifications; we’ll call it the repository. It should provide at least two basic operations:
• commit takes a delta and a repository, verifies the validity of the delta wrt. the reop, and records the new information in the repository;
• checkout takes a repository, reads back the current version of the term and prints it.

### The simple model

Let’s start by defining points 1 and 2. We’ll deal with 3 afterwards. I am given this very simple grammar of terms and “types”:

$t, u\ ::=\ c\ |\ t\ u$
$A, B\ ::=\ \alpha\ |\ A \to B$

Here $c$ ranges over a finite set of constants, and $\alpha$ over a finite set of ground types. A signature $\Sigma$ assigns to each constants $c$ a type $A$. Some will call this a “combinator algebra” (e.g. take constants S, K, I) but it is really too pompous for what it really is. The usual two rules rule apply:

• $c : A$ if $\Sigma(c) = A$
• $t\ u : B$ if $t : A \to B$ and $u : A$

There we are with our terms and well-sorting algorithm. Nothing too exciting until now. Now what is a repository, what is a delta, and how do we check one against the other?

Let a delta be just a regular term, augmented with references to terms bound in the current repository, by means of metavariables taken in an infinite set:

$d, e\ ::=\ c\ |\ d\ e\ |\ X$

Let the repository be a set of definitions, in no particular order, assigning to metavariables a content (a delta) and a type, together with a head, a distinguished metavariable:

$R\ ::=\ \Delta, X$
$\Delta\ :\ X \mapsto (d : A)$

Recording statically the type of terms in definitions is crucial here: when one of these terms will be referenced in deltas afterwards, we’ll just have to verify that it has the expected type.

How do we commit a term? Very simple: just thread a $\Delta$ throughout the previous two rules and add the one for metavariables:

• $\Delta \vdash X : A$ if $\Delta(X) = (d : A)$

Now $commit((\Delta, X), d) = (\Delta[Y \mapsto d : A], Y)$ if $\Delta\vdash d : A$ and $checkout(\Delta, X) = |X|^\Delta$ where $|X|^\Delta = \Delta(X)$, $|d e|^\Delta = |d|^\Delta |e|^\Delta$ and $|c|^\Delta = c$: committing a delta is checking it, choosing a fresh metavariable name for it and adding it to the set of defined metavariables; checking out a term is reading it back following the metavariable. That’s it: we just extended our term syntax with toplevel, multiple lets!

### Example

This simple model allows already to construct and verify terms in a bottom-up fashion. This is in opposition to the top-down construction usually adopted in other contexts (proof search, interactive theorem proving) where user refines a term with holes by instantiating the holes with terms, itself possibly containing other holes etc. Let us take an example. We will be building the term $t_1 = f\ (g\ a)\ (g\ b)$ then changing it to $t_2 = f\ (g\ a)\ (g\ (g\ b))$ in the signature $f : \alpha\to\alpha\to\beta,\ g : \alpha\to\alpha,\ a\ b : \alpha$, starting from the empty repository.

• Nothing prevents us from committing our first term in multiple steps. First, let’s commit $f\ (g\ a)$. We get the repository $X : \alpha\to\beta = f\ (g\ a), X$.
• Now the second half, $g\ b$. We get $X : \alpha\to\beta = f\ (g\ a); Y : \alpha = g\ b, Y$.
• Finally we construct the whole first term by assembling the two parts we just constructed: commit $X\ Y$. We get $X : \alpha\to\beta = f\ (g\ a); Y : \alpha = g\ b; Z : \beta = X\ Y, Z$.

At this stage, if we checkout the term, we’ll get $t_1$. How do we change it to $t_2$? We took care of cutting our repository into small slices, so we can share common substructures between the two terms. The commit to obtain $t_2$ is $X\ (g\ Y)$. After that we get the repository $X : \alpha\to\beta = f\ (g\ a); Y : \alpha = g\ b; Z : \beta = X\ Y; T : \beta = X\ (g\ Y), T$.

Note that in this final repository, the variable $Z$ is not accessible from the head $T$:  with the given primitive, the repository grows monotonically and “dangling” metavariables can remain. Note also that the last commit to create $t_2$ does not need to retype the whole term but just assemble the already-given parts. It is incremental.

The morality of this example is that a lot of common operation on syntax trees can be already modeled by this system if we slice the terms enough: all insertions and deletion of one or many nodes anywhere in the tree. Let us be a little bit more formal about the expected properties.

### Properties

The two main properties of this simplistic system is that all checked-out terms are well-typed, and that committing a term is a cheap operation (even if the repository is very large).

Definition: A slicing of a term $t$ is a (non-parallel) substitution $\sigma = [X_1 / d_1, \ldots, X_n / d_n]$ s.t. $X_n \sigma = t$. We extend the operation of commit to slicings: $commit((\Delta_0, Z), \sigma) = R$ if $commit((\Delta_0, Z), d_1) = \Delta_1, Y_1$ and $commit((\Delta_1, Y_1), d_2[X_1/Y_1]) = (\Delta_2, Y_2)$ … and $commit((\Delta_{n-1}, Y_{n-1}), d_n[X_1/Y_1, \ldots, X_{n-1}/Y_{n-1}]) = R$.

Lemma: For all term $t$ and all slicing $\sigma$ of $t$, $checkout(commit((\cdot, Z), \sigma)) = t$ (we don’t care about the previous repository since $t$ is ground).

Theorem: For all ground term $t$ and all slicing $\sigma$ of $t$, $\vdash t : A$ iff $commit((\Delta, Z), \sigma)$.

Theorem: The $commit(R, d)$ operation takes time linear in $|d|$ (number of nodes in $d$).

### Conclusion

In this post, I have shown how to construct terms incrementally by building them bottom-up and exploiting sharing in a very simple way. I’ve shown the only two strictly necessary operations, but more are definable equally simply: compute a diff between two names in the repo, checking out a particular name in a repository (not the head), reorganize a repository by slicing a term in a different way (but for the same checkout)…

Yet it’s still not very useful in the realm of programming languages or proofs for different reasons. One is that binders, for example lambdas, are not taken into account here. They require a special treatment, for example by seeing them as pure first-order structure (de Bruijn indices), or by employing meta-techniques like HOAS. Another reason is that of top-down input of terms: very often when elaborating incrementally a program, parts are left to be addressed in the future, and it is unreasonable to accept only

Finally, there is no support in the system for recording sequences of versions of a term: it requires the user to remember particular names generated by the system. Both these questions, and lots more, will be discussed and — hopefully — addressed in a future post.