<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Syntax! &#187; ocaml</title>
	<atom:link href="http://syntaxexclamation.wordpress.com/tag/ocaml/feed/" rel="self" type="application/rss+xml" />
	<link>http://syntaxexclamation.wordpress.com</link>
	<description>A research blog about programming languages, formal logics, software development and their interactions, by Matthias Puech.</description>
	<lastBuildDate>Sat, 04 May 2013 15:15:45 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='syntaxexclamation.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Syntax! &#187; ocaml</title>
		<link>http://syntaxexclamation.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://syntaxexclamation.wordpress.com/osd.xml" title="Syntax!" />
	<atom:link rel='hub' href='http://syntaxexclamation.wordpress.com/?pushpress=hub'/>
		<item>
		<title>malloc() is the new gensym()</title>
		<link>http://syntaxexclamation.wordpress.com/2013/05/04/malloc-is-the-new-gensym/</link>
		<comments>http://syntaxexclamation.wordpress.com/2013/05/04/malloc-is-the-new-gensym/#comments</comments>
		<pubDate>Sat, 04 May 2013 15:15:44 +0000</pubDate>
		<dc:creator>Matthias Puech</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[compilation]]></category>
		<category><![CDATA[functional programming]]></category>
		<category><![CDATA[gensym]]></category>
		<category><![CDATA[ocaml]]></category>
		<category><![CDATA[pointer equality]]></category>
		<category><![CDATA[stack]]></category>
		<category><![CDATA[virtual machine]]></category>

		<guid isPermaLink="false">http://syntaxexclamation.wordpress.com/?p=365</guid>
		<description><![CDATA[Teaching an introductory course to “compilation” this semester (actually it was called Virtual Machines, but it was really about compiling expressions to stack machines), I realized something I hadn&#8217;t heard before, and wish I had been told when I first learned OCaml many years ago. Here it is: as soon as you are programming in [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=syntaxexclamation.wordpress.com&#038;blog=14690639&#038;post=365&#038;subd=syntaxexclamation&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>Teaching an introductory course to “compilation” this semester (actually it was called <a href="http://www.pps.univ-paris-diderot.fr/~puech/ens/mv6.html">Virtual Machines</a>, but it was really about compiling expressions to stack machines), I realized something I hadn&#8217;t heard before, and wish I had been told when I first learned OCaml many years ago. Here it is: as soon as you are programming in a functional language with physical equality (i.e. pointer equality, the <code>(==)</code> operator in OCaml), then you are actually working in a “weakly impure” language, and you can for example implement a limited form of <code>gensym</code>. What? <code>gensym</code> is this classic “innocuously effectful” function returning a different <i>symbol</i>, usually a string, each time it is called. It is used pervasively to generate fresh variable names, in compilers notably. How? Well, you actually don&#8217;t have much to do, except let the runtime call <code>malloc</code>: it will return a “fresh” pointer to a block where you can store your data. <code>malloc</code> and the garbage collector together ensure this freshness condition, and you can then compare two pointers with <code>(==)</code>. As a bonus, you can even store data along with your fresh symbol.</p>
<p>In this post, I&#8217;ll exploit that simple idea to develop an assembler for a little stack machine close to that of OCaml.</p>
<p><span id="more-365"></span></p>
<h3>The idea</h3>
<p>In OCaml, something as simple as this is a <code>gensym</code>:</p>
<pre class="brush: fsharp; title: ; notranslate">
type 'a sym = C of 'a
let gensym x = C x
</pre>
<p>Each call to, say, <code>gensym ()</code> will allocate one new data block in memory; you can then compare two symbols with the physical equality <code>(==)</code>. What we care about here is not the content of that memory span, but its <i>address</i>, which is unique.</p>
<p>A few warnings first: in OCaml, the constructor must have arguments, otherwise the compiler optimizes the representation to a simple integer and nothing is allocated. Also, don&#8217;t replace the argument <code>x</code> to <code>C</code> by a constant, say <code>()</code>, in the function code: if you do so, the compiler will place the value <code>C ()</code> in the data segment of the program, and calling <code>gensym</code> will not trigger an allocation either. There is an excellent and already classic series of blog posts about OCaml&#8217;s value representation <a href="http://rwmj.wordpress.com/2009/08/04/ocaml-internals/">here</a>.</p>
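<p>To make this concrete, here is a minimal sketch of how such symbols behave. The <code>[@inline never]</code> attribute and the names <code>a</code>, <code>b</code> and <code>name</code> are my own additions for the illustration: the attribute is only a precaution, assuming an optimizing compiler might otherwise inline the call and turn the result into a shared constant.</p>
<pre class="brush: fsharp; title: ; notranslate">
type 'a sym = C of 'a

(* Precaution (an assumption of this sketch): keep the call out of line,
   so that every call really allocates a new block. *)
let[@inline never] gensym x = C x

let a = gensym &quot;x&quot;
let b = gensym &quot;x&quot;

(* A symbol is physically equal to itself... *)
let same_symbol = (a == a)
(* ...but not to another symbol, even one carrying the same payload... *)
let distinct = not (a == b)
(* ...although structural equality cannot tell the two apart. *)
let same_payload = (a = b)

(* The payload travels with the symbol. *)
let (C name) = a
</pre>
<p>Under these assumptions the three booleans are <code>true</code>: identity is decided by the address of the block, while <code>(=)</code> only looks at what is stored in it.</p>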
<p>Another way of saying the same thing is that (non-cyclic) values in OCaml are not trees, as they can be thought of considering the purely functional fragment, but DAGs, that is trees with sharing. </p>
<p>I think that not many beginner/intermediate OCaml programmers realize the power of this, so I&#8217;d like to show a cool application of this remark. We will code a small compiler from an arithmetic language to a stack machine. Bear with me, it&#8217;s going to be fun!</p>
<h3>An application: compiling expressions to a stack machine</h3>
<p>The input language of expressions is:</p>
<pre class="brush: fsharp; title: ; notranslate">
type expr =
  | Int of int
  | Plus of expr * expr
  | If of expr * expr * expr
</pre>
<p>Its semantics should be clear, except for the fact that <code>If</code> behaves as in C: if its condition is different from 0, then the first branch is taken; if it is 0, then the second is taken. Because we have these conditionals, the stack machine will need instructions to jump around in the code. The instructions of this stack machine are:</p>
<ul>
<li><code>Push i</code> pushes <code>i</code> on the stack;</li>
<li><code>Add</code> pops two values off the stack and pushes their sum;</li>
<li><code>Halt</code> stops the machine and returns the (supposedly unique) stack value;</li>
<li><code>Branch o</code> skips the next <code>o</code> instructions in the code;</li>
<li><code>Branchif o</code> skips the next <code>o</code> instructions <i>if</i> the top of the stack is not <code>0</code>, and has no effect otherwise.</li>
</ul>
<p>For instance, the expression <i>1 + (if 0 then 2 else (3+3))</i> is compiled into:</p>
<pre class="brush: fsharp; title: ; notranslate">
[Push 1; Push 0; Branchif 4;
   Push 3; Push 3; Add; Branch 1;
   Push 2;
 Add; Halt]
</pre>
<p>and evaluates of course to <code>7</code>. Notice how the two branches of the <code>If</code> are turned around in the code? First comes the code of expression <i>3+3</i> (the else branch), then the code of <i>2</i> (the then branch). In general, expression <i>if e1 then e2 else e3</i> will be compiled to [<i>c1</i>; <code>Branchif</code> (|<i>c3</i>|+1); <i>c3</i>; <code>Branch</code> |<i>c2</i>|; <i>c2</i>; ...] where <i>ci</i> is the compiled code of <i>ei</i>, and |<i>l</i>| is the size of code <i>l</i>. But I&#8217;m getting ahead of myself.</p>
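<p>For comparison, the general scheme above can be transcribed literally into a one-pass compiler that computes the offsets itself. This is only a sketch: the name <code>compile_direct</code> and the flat <code>instr</code> type with integer offsets are hypothetical, and the repeated <code>List.length</code> and <code>(@)</code> make it a poor choice for anything but illustration.</p>
<pre class="brush: fsharp; title: ; notranslate">
type expr =
  | Int of int
  | Plus of expr * expr
  | If of expr * expr * expr

(* Instructions with integer offsets, as in the list above. *)
type instr = Push of int | Add | Branch of int | Branchif of int | Halt

let rec compile_direct e = match e with
  | Int i -&gt; [Push i]
  | Plus (e1, e2) -&gt; compile_direct e1 @ compile_direct e2 @ [Add]
  | If (e1, e2, e3) -&gt;
    let c1 = compile_direct e1 in
    let c2 = compile_direct e2 in
    let c3 = compile_direct e3 in
    (* skip c3 and its trailing Branch when the condition is non-zero *)
    c1 @ [Branchif (List.length c3 + 1)] @ c3
    (* after the else branch, jump over the then branch *)
    @ [Branch (List.length c2)] @ c2

let example =
  compile_direct (Plus (Int 1, If (Int 0, Int 2, Plus (Int 3, Int 3)))) @ [Halt]
</pre>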
<h3>Compilation</h3>
<p>Now, compiling an <code>expr</code> to a list of instructions in one pass would be a little bit messy, because we have to compute these integer offsets for jumps. Let&#8217;s follow instead the common practice and first compile expressions to an assembly language where some suffixes of the code have <i>labels</i>, which are the names referred to by instructions <code>Branch</code> and <code>Branchif</code>. This assembly language <code>asm</code> will then be, well&#8230; assembled into actual <code>code</code>, where jumps are translated to integer offsets. But instead of generating label names by side-effect as customary, let&#8217;s use our trick: we will refer to them by a unique <i>pointer</i> to the code attached to them. In other words, the arguments to <code>Branch</code> and <code>Branchif</code> will actually be pointers to <code>asm</code> programs, comparable by <code>(==)</code>.</p>
<p>To represent the <code>code</code> and <code>asm</code> data structures, we generalize over the notion of label:</p>
<pre class="brush: fsharp; title: ; notranslate">
type 'label instr =
  | Push of int
  | Add
  | Branchif of 'label
  | Branch of 'label
  | Halt
</pre>
<p>An assembly program is a list of instructions where labels are themselves assembly programs (the <code>-rectypes</code> option of OCaml is required here):</p>
<pre class="brush: fsharp; title: ; notranslate">
type asm = asm instr list
</pre>
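<p>If you would rather not turn on <code>-rectypes</code>, one possible workaround (not the one used in the rest of this post) is to hide the recursion behind a constructor. In the sketch below, the constructor <code>Asm</code> and the helper <code>target</code> are hypothetical names:</p>
<pre class="brush: fsharp; title: ; notranslate">
type 'label instr =
  | Push of int
  | Add
  | Branchif of 'label
  | Branch of 'label
  | Halt

(* The constructor breaks the type-level cycle, so no -rectypes is needed. *)
type asm = Asm of asm instr list

(* Unwrapping a label gives back the list whose address identifies it. *)
let target (Asm l) = l

let program : asm instr list =
  let k = [Halt] in
  Push 0 :: Branchif (Asm (Push 1 :: k)) :: Push 2 :: k

(* The suffix reached through the label is the very suffix of the main branch. *)
let shared = match program with
  | _ :: Branchif label :: _ :: k -&gt; List.tl (target label) == k
  | _ -&gt; false
</pre>
<p>The price is one extra block per label, and comparisons have to be made on the unwrapped lists: two <code>Asm</code> boxes around the same list are different pointers.</p>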
<p>For instance, taking our previous example,</p>
<pre class="brush: fsharp; title: ; notranslate">
Plus (Int 1, If (Int 0, Int 2, Plus (Int 3, Int 3)))
</pre>
<p>is compiled to the (shared) value:</p>
<pre class="brush: fsharp; title: ; notranslate">
Push 1 :: Push 0 :: 
  let k = [Add; Halt] in 
  Branchif (Push 2 :: k) :: 
  Push 3 :: Push 3 :: Add :: k
</pre>
<p>See how the suffix <code>k</code> (the continuation of the <code>If</code>) is shared among the <code>Branchif</code> and the main branch? In call-by-value, this is a value: if you reduce it any further by inlining <code>k</code>, you will get a different value, that can be told apart from the first by using <code>(==)</code>. So don&#8217;t let OCaml&#8217;s pretty-printing of values fool you: this is not a tree, the sharing of <code>k</code> <i>is</i> important! What you get is the DAG of all possible execution traces of your program; they eventually all merge in one point, the code suffix <code>k = [Add; Halt]</code>.</p>
<p>The compilation function is relatively straightforward; it&#8217;s an accumulator-based function:</p>
<pre class="brush: fsharp; title: ; notranslate">
let rec compile e k = match e with
  | Int i -&gt; Push i :: k
  | Plus (e1, e2) -&gt; compile e1 (compile e2 (Add :: k))
  | If (e1, e2, e3) -&gt;
    compile e1 (Branchif (compile e2 k) :: compile e3 k)

let compile e = compile e [Halt]
</pre>
<p>The sharing discussed above is realized here in the <code>If</code> case, by compiling its two branches using the accumulator (continuation) <code>k</code> twice. Again, many people think of this erroneously as <i>duplicating</i> a piece of value. Actually, this is only mentioning twice a pointer to an already-allocated unique piece of value; and since we can compare pointers, we have a way to know that they are the same. Note also that this compilation function is purely compositional: to each subexpression corresponds a contiguous span of assembly code.</p>
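<p>The claim that <code>k</code> is mentioned twice rather than copied can be checked directly. Here is a small sketch of such a check; to keep it compilable without <code>-rectypes</code> it assumes a wrapped label type <code>Asm</code> (a hypothetical name), so <code>compile</code> differs from the one above by that constructor only.</p>
<pre class="brush: fsharp; title: ; notranslate">
type expr =
  | Int of int
  | Plus of expr * expr
  | If of expr * expr * expr

type 'label instr =
  | Push of int
  | Add
  | Branchif of 'label
  | Branch of 'label
  | Halt

(* Assumption of this sketch: labels are wrapped to avoid -rectypes. *)
type asm = Asm of asm instr list

let rec compile e k = match e with
  | Int i -&gt; Push i :: k
  | Plus (e1, e2) -&gt; compile e1 (compile e2 (Add :: k))
  | If (e1, e2, e3) -&gt;
    compile e1 (Branchif (Asm (compile e2 k)) :: compile e3 k)

(* Both branches of the conditional end in the same physical suffix. *)
let shares_continuation =
  match compile (If (Int 0, Int 2, Int 3)) [Halt] with
  | Push 0 :: Branchif (Asm (Push 2 :: k1)) :: Push 3 :: k2 -&gt; k1 == k2
  | _ -&gt; false
</pre>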
<h3>Assembly</h3>
<p>Now, real code for our machine is simply a list of instructions where labels are represented by (positive) integers:</p>
<pre class="brush: fsharp; title: ; notranslate">
type code = int instr list
</pre>
<p>Why positive? Well, since we have no way to make a loop, code can be arranged such that all jumps are made <i>forward</i> in the code.</p>
<p>The assembly function took me a while to figure out. It “linearizes” the assembly, a DAG, into a list by traversing it depth-first. The tricky part is that we don&#8217;t want to repeat the common suffixes of all branches; that&#8217;s where we use the fact that they are at the same memory address, which we can check with <code>(==)</code>. If a piece of input code has already been compiled <i>n</i> instructions ahead in the output code, instead of repeating it we just emit a <code>Branch</code> <i>n</i>.</p>
<p>So practically, we must keep as an argument an association list <code>k</code> mapping already-compiled suffixes of the input to the corresponding output instruction; think of it as a kind of “cache” of the function. It also doubles as the <i>result</i> of the process: it is what&#8217;s eventually returned by <code>assemble</code>. For each input <code>is</code>, we first traverse that list <code>k</code> looking for the pointer <code>is</code>; if we find it, then we have our <code>Branch</code> instruction; otherwise, we assemble the next instruction. This first part of the job corresponds to the <code>assemble</code> function:</p>
<pre class="brush: fsharp; title: ; notranslate">
let rec assemble is k =
  try (is, Branch (List.index (fun (is', _) -&gt; is == is') k)) :: k
  with Not_found -&gt; assem is k
</pre>
<p>(<code>List.index p xs</code> returns the index of the first element <code>x</code> of <code>xs</code> such that <code>p x</code> is <code>true</code>, and raises <code>Not_found</code> if there is none).</p>
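<p>This function is not part of OCaml&#8217;s standard <code>List</code> module. If your library does not provide it, a version with the behaviour described above could look like the following sketch (the standalone name <code>index</code> is an assumption; as far as I know, recent versions of the standard library offer the similar <code>List.find_index</code>, which returns an option instead of raising):</p>
<pre class="brush: fsharp; title: ; notranslate">
(* index p xs: position of the first element of xs satisfying p.
   Raises Not_found when no element does. *)
let index p xs =
  let rec go i = function
    | [] -&gt; raise Not_found
    | x :: rest -&gt; if p x then i else go (i + 1) rest
  in
  go 0 xs
</pre>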
<p>Now the auxiliary function <code>assem</code> actually assembles instructions into a list of pairs of source programs and target instruction:</p>
<pre class="brush: fsharp; title: ; notranslate">
and assem asm k = match asm with
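  | (Push _ | Add | Halt as i) :: is -&gt;
    (* rebuild i: the alias has type asm instr, the output needs int instr *)
    let i = match i with Push n -&gt; Push n | Add -&gt; Add | _ -&gt; Halt in
    (asm, i) :: assemble is k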
  | Branchif is :: js -&gt;
    let k = assemble is k in
    let k' = assemble js k in
    (asm, Branchif (List.length k' - List.length k)) :: k'
  | Branch _ :: _ -&gt; assert false
  | [] -&gt; k
</pre>
<p>Think of the arguments <code>asm</code> and <code>k</code> as one unique list <code>asm @ k</code> that is “open” for insertion in two places: at top-level, as usual, and in the middle, between <code>asm</code> and <code>k</code>. The <code>k</code> part is the already-processed suffix, and <code>asm</code> is what remains to be processed. The first case inserts the non-branching instructions <code>Push, Add, Halt</code> at top-level in the output (together with their corresponding assembly suffix of course). The second one, <code>Branchif</code>, begins by inserting the branch <code>is</code> at top-level, and then inserts the remainder <code>js</code> in front of it. Note that when assembling this remainder, we can discover sharing that was recorded in <code>k</code> when compiling the branch. Note also that there can&#8217;t be any <code>Branch</code> in the assembly since it would not make much sense (everything after a <code>Branch</code> instruction would be dead code), hence the <code>assert false</code>.</p>
<p>Finally, we can strip off the “cached” information in the returned list, keeping only the target instructions:</p>
<pre class="brush: fsharp; title: ; notranslate">
let assemble is = snd (List.split (assemble is []))
</pre>
<h3>Conclusion</h3>
<p>That&#8217;s it, we have a complete compilation chain for our expression language! We can execute the target code on this machine:</p>
<pre class="brush: fsharp; title: ; notranslate">
let rec exec = function
  | s, Push i :: c -&gt; exec (i :: s, c)
  | i :: j :: s, Add :: c -&gt; exec (i + j :: s, c)
  | s, Branch n :: c -&gt; exec (s, List.drop n c)
  | i :: s, Branchif n :: c -&gt; exec (s, List.drop (if i&lt;&gt;0 then n else 0) c)
  | [i], Halt :: _ -&gt; i
  | _ -&gt; failwith &quot;error&quot;

let exec c = exec ([], c)
</pre>
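<p>To try the whole chain with nothing but the standard library, here is a self-contained sketch. It departs from the code above in a few labeled ways: labels are wrapped in a constructor <code>Asm</code> so that <code>-rectypes</code> is not needed, <code>index</code> and <code>drop</code> are local stand-ins for <code>List.index</code> and <code>List.drop</code>, and <code>run</code> is a hypothetical driver.</p>
<pre class="brush: fsharp; title: ; notranslate">
type expr = Int of int | Plus of expr * expr | If of expr * expr * expr
type 'label instr =
  | Push of int | Add | Branchif of 'label | Branch of 'label | Halt
type asm = Asm of asm instr list   (* assumption: wrapped labels *)
type code = int instr list

let rec compile e k = match e with
  | Int i -&gt; Push i :: k
  | Plus (e1, e2) -&gt; compile e1 (compile e2 (Add :: k))
  | If (e1, e2, e3) -&gt;
    compile e1 (Branchif (Asm (compile e2 k)) :: compile e3 k)

(* Stand-ins for List.index and List.drop. *)
let rec index p i = function
  | [] -&gt; raise Not_found
  | x :: xs -&gt; if p x then i else index p (i + 1) xs
let rec drop n l = match l with
  | _ :: t when n &gt; 0 -&gt; drop (n - 1) t
  | _ -&gt; l

let rec assemble (is : asm instr list) k =
  try (is, Branch (index (fun (is', _) -&gt; is == is') 0 k)) :: k
  with Not_found -&gt; assem is k

and assem asm k = match asm with
  | (Push _ | Add | Halt as i) :: is -&gt;
    (* rebuild i so that it is typed as an int instr *)
    let i = match i with Push n -&gt; Push n | Add -&gt; Add | _ -&gt; Halt in
    (asm, i) :: assemble is k
  | Branchif (Asm is) :: js -&gt;
    let k = assemble is k in
    let k' = assemble js k in
    (asm, Branchif (List.length k' - List.length k)) :: k'
  | Branch _ :: _ -&gt; assert false
  | [] -&gt; k

let assemble is : code = List.map snd (assemble is [])

let rec exec = function
  | s, Push i :: c -&gt; exec (i :: s, c)
  | i :: j :: s, Add :: c -&gt; exec (i + j :: s, c)
  | s, Branch n :: c -&gt; exec (s, drop n c)
  | i :: s, Branchif n :: c -&gt; exec (s, drop (if i &lt;&gt; 0 then n else 0) c)
  | [i], Halt :: _ -&gt; i
  | _ -&gt; failwith &quot;error&quot;

(* Hypothetical driver: compile, assemble, then execute. *)
let run e = exec ([], assemble (compile e [Halt]))

let seven = run (Plus (Int 1, If (Int 0, Int 2, Plus (Int 3, Int 3))))
</pre>
<p>On the running example, <code>seven</code> is <code>7</code>: the condition is <code>0</code>, so the machine falls through into the code of <i>3+3</i> and then jumps over <code>Push 2</code> to the shared suffix.</p>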
<p>The idea of using labels that are actual pointers to the code seems quite natural and seems to scale well (I implemented a compiler from a mini-ML to a virtual machine close to OCaml&#8217;s bytecode). In terms of performance however, <code>assemble</code> is quadratic: before assembling each instruction, we check whether we have already assembled it. When we have real (string) labels, we can represent the “cache” as a data structure with faster lookup; unfortunately, if labels are pointers, we can&#8217;t really do this because we don&#8217;t have a total order on pointers, only equality <code>(==)</code>.</p>
<p>This is only one example of how we can exploit pointer equality in OCaml to mimic a name generator. I&#8217;m sure there are lots of other applications to be discovered, or that I don&#8217;t know of (off the top of my head: to represent variables in the lambda-calculus). The big unknown for me is the nature of the language we&#8217;ve been working in, functional OCaml + pointer equality. Can we still consider it a functional language? How do we reason about its programs? The comment section is right below!</p>
]]></content:encoded>
			<wfw:commentRss>http://syntaxexclamation.wordpress.com/2013/05/04/malloc-is-the-new-gensym/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/34a3291be1d0c68725533654d7848863?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">mqtthiqs</media:title>
		</media:content>
	</item>
		<item>
		<title>My thesis is out!</title>
		<link>http://syntaxexclamation.wordpress.com/2013/04/19/my-thesis-is-out/</link>
		<comments>http://syntaxexclamation.wordpress.com/2013/04/19/my-thesis-is-out/#comments</comments>
		<pubDate>Fri, 19 Apr 2013 14:03:58 +0000</pubDate>
		<dc:creator>Matthias Puech</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[ocaml]]></category>
		<category><![CDATA[incremental]]></category>
		<category><![CDATA[type-checking]]></category>
		<category><![CDATA[functional programming]]></category>
		<category><![CDATA[sequent calculus]]></category>
		<category><![CDATA[LF]]></category>

		<guid isPermaLink="false">http://syntaxexclamation.wordpress.com/?p=354</guid>
		<description><![CDATA[At last! The definitive, final and comprehensive version of my thesis manuscript is out. I defended it on April 8 in Bologna, Italy, and received both titles of &#8220;Dottore di ricerca&#8221; and &#8220;Docteur&#8221; in Computer Science, with great pride and relief. What an adventure! You can find my manuscript on my web page, precisely here; [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=syntaxexclamation.wordpress.com&#038;blog=14690639&#038;post=354&#038;subd=syntaxexclamation&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>At last! The definitive, final and comprehensive version of my thesis manuscript is out. I defended it on April 8 in Bologna, Italy, and received both titles of &#8220;Dottore di ricerca&#8221; and &#8220;Docteur&#8221; in Computer Science, with great pride and relief. What an adventure! You can find my manuscript on my <a href="http://www.pps.univ-paris-diderot.fr/~puech/">web page</a>, precisely <a title="My Ph.D thesis" href="http://www.pps.univ-paris-diderot.fr/~puech/thesis.pdf">here</a>; it&#8217;s called <em>Certificates for incremental type-checking</em>, and after much hesitation, I chose a blue cover for its printed version (it was a tough choice). It is already a little bit obsolete since I compulsively worked on that material even after its submission to avoid the baby blues, but I will nonetheless advertise it here, and eventually write about my advances in future posts. In short, if you are interested in proof certificates, manipulation of proof objects in a functional language, spine-form LF, incremental type-checking, contextual type theory, or the relationship between natural deduction and the sequent calculus, you might be interested in some parts of my manuscript.</p>
<p><span id="more-354"></span></p>
<p>In a little more detail, the abstract printed on the (blue) back cover reads:</p>
<blockquote><p>The central topic of this thesis is the study of algorithms for type checking, both from the programming language and from the proof-theoretic point of view. A type checking algorithm takes a program or a proof, represented as a syntactical object, and checks its validity with respect to a specification or a statement; it is a central piece of compilers and proof assistants. First, we present a tool which supports the development of functional programs manipulating proof certificates (certifying programs). It uses LF as a representation metalanguage for higher-order proofs and OCaml as a programming language, and facilitates the automated and efficient verification of these certificates at run time. Technically, we introduce in particular the notion of function inverse allowing to abstract from a local environment when manipulating open terms. Then, we remark that the idea of a certifying type checker, generating a typing derivation, can be extended to realize an incremental type checker, working by reuse of typing subderivation. Such a type checker would make possible the structured and type-directed edition of proofs and programs. Finally, we showcase an original correspondence between natural deduction and the sequent calculus, through the transformation of the corresponding type checking functional programs: we show, using off-the-shelf program transformations, that the latter is the accumulator-passing version of the former.</p></blockquote>
<p>Now that this is over with, I can go back to all the activities I&#8217;ve been missing so much these past months (years?); one of them is blogging. So stay tuned for some OCaml fun, serious proof theory and terrible hacks. You will hopefully read shortly about:</p>
<ul>
<li>the compilation of ML pattern-matching, explained in as logically principled a way as I can,</li>
<li>how you can write a <em>gensym</em> in OCaml without using the <strong>mutable</strong> keyword,</li>
<li>handling syntax with binders in ML,</li>
<li>&#8230; and more!</li>
</ul>
<p>À bientôt!</p>
]]></content:encoded>
			<wfw:commentRss>http://syntaxexclamation.wordpress.com/2013/04/19/my-thesis-is-out/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/34a3291be1d0c68725533654d7848863?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">mqtthiqs</media:title>
		</media:content>
	</item>
		<item>
		<title>Reverse natural deduction and get sequent calculus</title>
		<link>http://syntaxexclamation.wordpress.com/2011/09/01/reverse-natural-deduction-and-get-sequent-calculus/</link>
		<comments>http://syntaxexclamation.wordpress.com/2011/09/01/reverse-natural-deduction-and-get-sequent-calculus/#comments</comments>
		<pubDate>Thu, 01 Sep 2011 14:32:05 +0000</pubDate>
		<dc:creator>Matthias Puech</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[herd]]></category>
		<category><![CDATA[natural deduction]]></category>
		<category><![CDATA[ocaml]]></category>
		<category><![CDATA[reverse]]></category>
		<category><![CDATA[sequent calculus]]></category>
		<category><![CDATA[type-checking]]></category>

		<guid isPermaLink="false">http://syntaxexclamation.wordpress.com/?p=129</guid>
		<description><![CDATA[This is a follow-up on my previous post. It should be readable by itself if you just take a quick peek at herds. Today, we are going to write type-checkers. And rewrite them. Again and again. Eventually, I&#8217;ll put in evidence that what seemed to be a programming hack in the last post turns out [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=syntaxexclamation.wordpress.com&#038;blog=14690639&#038;post=129&#038;subd=syntaxexclamation&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>This is a follow-up on my <a title="Reversing data structures" href="http://syntaxexclamation.wordpress.com/2011/08/31/reversing-data-structures/">previous post</a>. It should be readable by itself if you just take a quick peek at <em>herds</em>.</p>
<p>Today, we are going to write type-checkers. And rewrite them. Again and again. Eventually, I&#8217;ll show that what seemed to be a programming hack in the last post turns out to be the difference between two well-known equivalent formulations of first-order logic, your good ol&#8217; natural deduction and sequent calculus.<span id="more-129"></span></p>
<h3>Bi-directional type-checking</h3>
<p>We shall here start by writing a type-checker for the usual simply typed lambda-calculus, natural deduction-style. Types are:</p>
<p><img src='http://s0.wp.com/latex.php?latex=A+%3A%3A%3D+nat%5C+%7C%5C+A%5Cto+A&amp;bg=fff&amp;fg=444444&amp;s=0' alt='A ::= nat&#92; |&#92; A&#92;to A' title='A ::= nat&#92; |&#92; A&#92;to A' class='latex' /></p>
<pre class="brush: fsharp; title: ; notranslate">
type tp =
  | Nat
  | Arr of tp * tp
</pre>
<p>Let us make a <em>huge</em> simplification right away, that will probably make more than one reader jump in their seats: our language will be <em>normal</em> (canonical). Yes, no redexes. What&#8217;s the use of checking well-typing (that is, termination) if you can&#8217;t write redexes? The quick answer is to look at any reference to a modern metatheory of LF (<a title="Mechanizing Metatheory in a Logical Framework" href="http://www.google.com/url?sa=t&amp;source=web&amp;cd=1&amp;ved=0CBwQFjAA&amp;url=http%3A%2F%2Fwww.cs.cmu.edu%2F~rwh%2Fpapers%2Fmech%2Fjfp06.pdf&amp;rct=j&amp;q=metatheory%20logical%20framework&amp;ei=BmxeTsiqNcyXOtC6rd0C&amp;usg=AFQjCNEBZoDF6KwwRrJTIfjzSLn0BgMeLA&amp;sig2=rB7tG5vLKC-fHYmGBEtAyA&amp;cad=rja">here</a> for instance). The still-very-quick answer is that it is a simplification that will need to be lifted, but is already useful: you can&#8217;t write redexes, but you can implement the substitution function as an <em>admissible</em> process. Having redexes in the syntax and eliminating them on one side, and having no redexes but instead a function to compute their result is the difference between <em>cut-elimination</em> and <em>cut-admissibility</em>. On the one hand, you will want to prove, as usual, that the iteration of the cut-elimination procedure terminates on well-typed terms; on the other, that one admissible cut procedure (sometimes called hereditary substitution) terminates given a well-typed term <img src='http://s0.wp.com/latex.php?latex=t+%3A+A&amp;bg=fff&amp;fg=444444&amp;s=0' alt='t : A' title='t : A' class='latex' /> with a free variable <img src='http://s0.wp.com/latex.php?latex=x+%3A+B&amp;bg=fff&amp;fg=444444&amp;s=0' alt='x : B' title='x : B' class='latex' />, and a substituend <img src='http://s0.wp.com/latex.php?latex=u+%3A+B&amp;bg=fff&amp;fg=444444&amp;s=0' alt='u : B' title='u : B' class='latex' />.</p>
<p>The normal syntax looks like this:</p>
<p><img src='http://s0.wp.com/latex.php?latex=M+%3A%3A%3D+%5Clambda+x.+M%5C+%7C%5C+R&amp;bg=fff&amp;fg=444444&amp;s=0' alt='M ::= &#92;lambda x. M&#92; |&#92; R' title='M ::= &#92;lambda x. M&#92; |&#92; R' class='latex' /> (canonical terms)<br />
<img src='http://s0.wp.com/latex.php?latex=R+%3A%3A%3D+x%5C+%7C%5C+R%5C+M&amp;bg=fff&amp;fg=444444&amp;s=0' alt='R ::= x&#92; |&#92; R&#92; M' title='R ::= x&#92; |&#92; R&#92; M' class='latex' /> (atomic terms)</p>
<p>or in OCaml, adopting de Bruijn notation:</p>
<pre class="brush: fsharp; title: ; notranslate">
  type m =
    | Lam of m
    | At of r

  and r =
    | Var of int
    | App of r * m
</pre>
<p>One of the advantages of this restriction is that there is a correspondence between these two syntactic categories and the two judgments for type-checking (hence the name, bi-directional):</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5CGamma%5Cvdash+M+%5CLeftarrow+A&amp;bg=fff&amp;fg=444444&amp;s=0' alt='&#92;Gamma&#92;vdash M &#92;Leftarrow A' title='&#92;Gamma&#92;vdash M &#92;Leftarrow A' class='latex' /> (check that term <img src='http://s0.wp.com/latex.php?latex=M&amp;bg=fff&amp;fg=444444&amp;s=0' alt='M' title='M' class='latex' /> has type <img src='http://s0.wp.com/latex.php?latex=A&amp;bg=fff&amp;fg=444444&amp;s=0' alt='A' title='A' class='latex' />)<br />
<img src='http://s0.wp.com/latex.php?latex=%5CGamma%5Cvdash+R+%5CRightarrow+A&amp;bg=fff&amp;fg=444444&amp;s=0' alt='&#92;Gamma&#92;vdash R &#92;Rightarrow A' title='&#92;Gamma&#92;vdash R &#92;Rightarrow A' class='latex' /> (infer type <img src='http://s0.wp.com/latex.php?latex=A&amp;bg=fff&amp;fg=444444&amp;s=0' alt='A' title='A' class='latex' /> for <img src='http://s0.wp.com/latex.php?latex=R&amp;bg=fff&amp;fg=444444&amp;s=0' alt='R' title='R' class='latex' />)</p>
<p>or in OCaml:</p>
<pre class="brush: fsharp; title: ; notranslate">
val check : tp list -&gt; m * tp -&gt; unit
val infer : tp list -&gt; r -&gt; tp
</pre>
<p>Instead of showing the rules, I&#8217;ll give the code directly:</p>
<pre class="brush: fsharp; title: ; notranslate">
  let rec check env : m * tp -&gt; unit = function
    | Lam n, Arr(t, u) -&gt; check (t :: env) (n, u)
    | At a, t -&gt; if infer env a = t then () else failwith &quot;not the awaited type&quot;
    | _ -&gt; failwith &quot;type mismatch&quot;

  and infer env : r -&gt; tp = function
    | Var i -&gt; List.nth env i
    | App (a, n) -&gt; match infer env a with
        | Arr (t, u) -&gt; check env (n, t); u
        | Nat -&gt; failwith &quot;too many arguments&quot;
</pre>
<p>This should be pretty self-explanatory.</p>
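<p>Still, a quick run doesn&#8217;t hurt. The sketch below repeats the definitions so that it stands alone, and tries the checker on two de Bruijn terms; the names <code>well_typed</code>, <code>apply</code> and <code>self_app</code> are mine.</p>
<pre class="brush: fsharp; title: ; notranslate">
type tp = Nat | Arr of tp * tp

type m = Lam of m | At of r
and r = Var of int | App of r * m

let rec check env : m * tp -&gt; unit = function
  | Lam n, Arr (t, u) -&gt; check (t :: env) (n, u)
  | At a, t -&gt; if infer env a = t then () else failwith &quot;not the awaited type&quot;
  | _ -&gt; failwith &quot;type mismatch&quot;

and infer env : r -&gt; tp = function
  | Var i -&gt; List.nth env i
  | App (a, n) -&gt; match infer env a with
      | Arr (t, u) -&gt; check env (n, t); u
      | Nat -&gt; failwith &quot;too many arguments&quot;

(* Hypothetical helper: turn the exception-based checker into a boolean. *)
let well_typed m t =
  try check [] (m, t); true with Failure _ -&gt; false

(* \f. \x. f x, where f is Var 1 and x is Var 0 *)
let apply = Lam (Lam (At (App (Var 1, At (Var 0)))))

(* \x. x x, which cannot be typed at nat -&gt; nat *)
let self_app = Lam (At (App (Var 0, At (Var 0))))

let ok = well_typed apply (Arr (Arr (Nat, Nat), Arr (Nat, Nat)))
let ko = well_typed self_app (Arr (Nat, Nat))
</pre>
<p>Here <code>ok</code> is <code>true</code> and <code>ko</code> is <code>false</code>: in the second term, <code>x</code> has type <code>Nat</code> and is nonetheless applied to an argument.</p>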
<h3>Reversing atomic terms</h3>
<p>Let&#8217;s now have a closer look at the type of atomic terms, and recognize the exact structure of a <code>Herd.t</code> from <a title="Reversing data structures" href="https://syntaxexclamation.wordpress.com/2011/08/31/reversing-data-structures/">last post</a>. It is precisely a <code>(m, int) Herd.t</code>. Looking at function <code>infer</code>, it is a simple bottom-up traversal of that structure. So let us refactor our previous code as:</p>
<pre class="brush: fsharp; title: ; notranslate">
  type m =
    | Lam of m
    | At of r

  and r = (m, int) Herd.t

  let rec check env : m * tp -&gt; unit = function
    | Lam n, Arr(t, u) -&gt; check (t :: env) (n, u)
    | At a, t -&gt; if infer env a = t then () else failwith &quot;not the awaited type&quot;
    | _ -&gt; failwith &quot;type mismatch&quot;

  and infer env : r -&gt; tp =
    Herd.fold_right
      (fun n t -&gt; match t with
        | Arr (t, u) -&gt; check env (n, t); u
        | Nat -&gt; failwith &quot;too many arguments&quot;
      ) List.nth env
</pre>
<p>Function <code>check</code> doesn&#8217;t change, but <code>infer</code> is expressed as a <code>fold_right</code>. As we remarked last time, this function is not tail-recursive, but it is equivalent to a <code>fold_left'</code> on the reversed <code>Herd</code>. We&#8217;d better use the latter then if we care about the size of our stack. We get the new, tail-rec version:</p>
<pre class="brush: fsharp; title: ; notranslate">
  type m =
    | Lam of m
    | At of r

  and r = (m, int) Herd.t'

  let rec check env : m * tp -&gt; unit = function
    | Lam n, Arr(t, u) -&gt; check (t :: env) (n, u)
    | At a, t -&gt; if infer env a = t then () else failwith &quot;not the awaited type&quot;
    | _ -&gt; failwith &quot;type mismatch&quot;

  and infer env : r -&gt; tp =
    Herd.fold_left'
      (fun t n -&gt; match t with
        | Arr (t, u) -&gt; check env (n, t); u
        | Nat -&gt; failwith &quot;too many arguments&quot;
      ) List.nth env
</pre>
<p>Here is our tail-rec type-checker. What just happened here? Let&#8217;s unfold the <code>Herd</code> abstraction:</p>
<pre class="brush: fsharp; title: ; notranslate">
  type m =
    | Lam of m
    | At of r

  and r = int * s

  and s =
    | Cons of m * s
    | Nil

  let rec check env = function
    | Lam n, Arr(t, u) -&gt; check (t :: env) (n, u)
    | At a, t -&gt; if infer env a = t then () else failwith &quot;not the awaited type&quot;
    | _ -&gt; failwith &quot;type mismatch&quot;

  and infer env : r -&gt; tp = function
    | i, l -&gt; thread env (List.nth env i) l

  and thread env t : s -&gt; tp = function
    | Nil -&gt; t
    | Cons (n, s) -&gt; match t with
        | Arr (t, u) -&gt; check env (n, t); thread env u s
        | Nat -&gt; failwith &quot;too many arguments&quot;
</pre>
<p>What is this language of terms?</p>
<p><img src='http://s0.wp.com/latex.php?latex=M+%3A%3A%3D+%5Clambda+x.+M%5C+%7C%5C+x%5C+S&amp;bg=fff&amp;fg=444444&amp;s=0' alt='M ::= &#92;lambda x. M&#92; |&#92; x&#92; S' title='M ::= &#92;lambda x. M&#92; |&#92; x&#92; S' class='latex' /><br />
<img src='http://s0.wp.com/latex.php?latex=S+%3A%3A%3D+%5Ccdot%5C+%7C%5C+M%3B+S&amp;bg=fff&amp;fg=444444&amp;s=0' alt='S ::= &#92;cdot&#92; |&#92; M; S' title='S ::= &#92;cdot&#92; |&#92; M; S' class='latex' /></p>
<p>It is a lambda calculus where application is n-ary: this way you get direct access to the functional part of the application, and arguments appear in &#8220;natural&#8221; order (the nearest to the function on top). This trick is called <em>spine calculus</em> by the Twelf people, and was chosen for their term representation (read <a title="The LF Seminar: Term Representation" href="http://www.cs.cmu.edu/~rjsimmon/papers/lf-meeting/meeting01.pdf">this</a>) because it is more efficient when proof-searching or unifying. But we&#8217;ll give it another name. Let&#8217;s reconstruct the typing rules from the code of the checker. A new form of judgment appears with the function <code>thread</code>, used to traverse these n-ary applications, which we will write <img src='http://s0.wp.com/latex.php?latex=%5CGamma%3B+A%5Cvdash+S+%3A+B&amp;bg=fff&amp;fg=444444&amp;s=0' alt='&#92;Gamma; A&#92;vdash S : B' title='&#92;Gamma; A&#92;vdash S : B' class='latex' />. It has a distinguished formula <img src='http://s0.wp.com/latex.php?latex=A&amp;bg=fff&amp;fg=444444&amp;s=0' alt='A' title='A' class='latex' /> on the left of the sequent, corresponding to the type of the current function. We get:</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cbegin%7Barray%7D%7B+c+%7D++%5CGamma%2C+A%5Cvdash+M+%3A+B++%5C%5C%5Chline++%5CGamma%5Cvdash+%5Clambda+x.+M+%3A+A%5Cto+B++%5Cend%7Barray%7D%5C+%5Cto_r%5Cqquad++++%5Cbegin%7Barray%7D%7B+c+%7D++%5CGamma%28x%29+%3A+A%5Cquad%5CGamma%3B+A%5Cvdash+S+%3A+B++%5C%5C%5Chline++%5CGamma%5Cvdash+x%5C+S+%3A+B++%5Cend%7Barray%7D%5C+Cont++%5C%5C%5C%5C++%5Cbegin%7Barray%7D%7B+c+%7D++%5CGamma%5Cvdash+M+%3A+A+%5Cquad+%5CGamma%3B+B%5Cvdash+S+%3A+C++%5C%5C%5Chline++%5CGamma%3B+A%5Cto+B%5Cvdash+M%3B+S+%3A+C++%5Cend%7Barray%7D%5C+%5Cto_l%5Cqquad++%5Cbegin%7Barray%7D%7B+c+%7D++%5C%5C%5Chline++%5CGamma%3B+A%5Cvdash%5Ccdot+%3A+A++%5Cend%7Barray%7D%5C+Ax++&amp;bg=fff&amp;fg=444444&amp;s=0' alt='&#92;begin{array}{ c }  &#92;Gamma, A&#92;vdash M : B  &#92;&#92;&#92;hline  &#92;Gamma&#92;vdash &#92;lambda x. M : A&#92;to B  &#92;end{array}&#92; &#92;to_r&#92;qquad    &#92;begin{array}{ c }  &#92;Gamma(x) : A&#92;quad&#92;Gamma; A&#92;vdash S : B  &#92;&#92;&#92;hline  &#92;Gamma&#92;vdash x&#92; S : B  &#92;end{array}&#92; Cont  &#92;&#92;&#92;&#92;  &#92;begin{array}{ c }  &#92;Gamma&#92;vdash M : A &#92;quad &#92;Gamma; B&#92;vdash S : C  &#92;&#92;&#92;hline  &#92;Gamma; A&#92;to B&#92;vdash M; S : C  &#92;end{array}&#92; &#92;to_l&#92;qquad  &#92;begin{array}{ c }  &#92;&#92;&#92;hline  &#92;Gamma; A&#92;vdash&#92;cdot : A  &#92;end{array}&#92; Ax  ' title='&#92;begin{array}{ c }  &#92;Gamma, A&#92;vdash M : B  &#92;&#92;&#92;hline  &#92;Gamma&#92;vdash &#92;lambda x. 
M : A&#92;to B  &#92;end{array}&#92; &#92;to_r&#92;qquad    &#92;begin{array}{ c }  &#92;Gamma(x) : A&#92;quad&#92;Gamma; A&#92;vdash S : B  &#92;&#92;&#92;hline  &#92;Gamma&#92;vdash x&#92; S : B  &#92;end{array}&#92; Cont  &#92;&#92;&#92;&#92;  &#92;begin{array}{ c }  &#92;Gamma&#92;vdash M : A &#92;quad &#92;Gamma; B&#92;vdash S : C  &#92;&#92;&#92;hline  &#92;Gamma; A&#92;to B&#92;vdash M; S : C  &#92;end{array}&#92; &#92;to_l&#92;qquad  &#92;begin{array}{ c }  &#92;&#92;&#92;hline  &#92;Gamma; A&#92;vdash&#92;cdot : A  &#92;end{array}&#92; Ax  ' class='latex' /></p>
<p>This system, called the normal <img src='http://s0.wp.com/latex.php?latex=%5Cbar%5Clambda&amp;bg=fff&amp;fg=444444&amp;s=0' alt='&#92;bar&#92;lambda' title='&#92;bar&#92;lambda' class='latex' /> (in <a title="A lambda-calculus structure isomorphic to sequent calculus structure" href="http://pauillac.inria.fr/~herbelin/publis/csl-Her94-lambda-bar.ps.gz">this paper</a>), can be viewed as a restriction to normal forms of&#8230; a restriction of the usual sequent calculus: try to erase the terms in these rules and you&#8217;ll recognize both rules for implication! The second restriction is due to the new judgment: for instance, it is not allowed to use a <img src='http://s0.wp.com/latex.php?latex=%5Cto_r&amp;bg=fff&amp;fg=444444&amp;s=0' alt='&#92;to_r' title='&#92;to_r' class='latex' /> rule as the right premise of a <img src='http://s0.wp.com/latex.php?latex=%5Cto_l&amp;bg=fff&amp;fg=444444&amp;s=0' alt='&#92;to_l' title='&#92;to_l' class='latex' /> rule: once you focus on a premise, you have to treat it until the end (of the spine). This restricted sequent calculus is called LJT; it nevertheless has the same expressive power as the traditional sequent calculus.</p>
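<p>To see the checker and these rules at work, here is a small self-contained sketch. The definition of <code>tp</code> is an assumption (only its constructors <code>Nat</code> and <code>Arr</code> appear in the code above), the term <code>apply</code> is a made-up example, variables are read as de Bruijn indices, and the <code>exception</code> pattern needs OCaml 4.02 or later:</p>
<pre class="brush: fsharp; title: ; notranslate">
  (* assumed definition of simple types; only Nat and Arr appear in the checker *)
  type tp =
    | Nat
    | Arr of tp * tp

  type m =
    | Lam of m
    | At of r

  and r = int * s

  and s =
    | Cons of m * s
    | Nil

  let rec check env = function
    | Lam n, Arr (t, u) -&gt; check (t :: env) (n, u)
    | At a, t -&gt; if infer env a = t then () else failwith &quot;not the awaited type&quot;
    | _ -&gt; failwith &quot;type mismatch&quot;

  and infer env : r -&gt; tp = function
    | i, l -&gt; thread env (List.nth env i) l

  and thread env t : s -&gt; tp = function
    | Nil -&gt; t
    | Cons (n, s) -&gt; (match t with
        | Arr (t, u) -&gt; check env (n, t); thread env u s
        | Nat -&gt; failwith &quot;too many arguments&quot;)

  (* \f. \x. f x in spine form: the head is variable 1 (f), the spine holds x *)
  let apply = Lam (Lam (At (1, Cons (At (0, Nil), Nil))))

  let () =
    (* accepted at type (Nat -&gt; Nat) -&gt; Nat -&gt; Nat *)
    check [] (apply, Arr (Arr (Nat, Nat), Arr (Nat, Nat)));
    (* rejected at type Nat -&gt; Nat -&gt; Nat: there f : Nat cannot take an argument *)
    match check [] (apply, Arr (Nat, Arr (Nat, Nat))) with
    | () -&gt; assert false
    | exception Failure msg -&gt; assert (msg = &quot;too many arguments&quot;)
</pre>
<p>Each <code>Cons</code> traversed by <code>thread</code> is one use of the left rule for implication, and the final <code>Nil</code> is the axiom.</p>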
<h3>Conclusion</h3>
<p>So, rewriting the history of logic, I hope I have convinced you that (intuitionistic) sequent calculus could have been invented by a hacker, as a tail-recursive version of natural deduction.</p>
<p>The same optimization extends to conjunctions, but unfortunately it fails to recover the disjunction rule of LJT. This should be addressed&#8230;</p>
<p>It also remains to show how this influences non-normal terms. Does it even make sense when we don&#8217;t have the nice correspondence between syntax and judgments? If we get past this difficulty, the next step would be to write an interpreter. What will the most convenient term representation be then? Reversed or not reversed?</p>
<p>A last remark: the same reversal trick can be applied to <img src='http://s0.wp.com/latex.php?latex=M&amp;bg=fff&amp;fg=444444&amp;s=0' alt='M' title='M' class='latex' />&#8217;s too (which are a <code>(a, m) Herd.t</code>), so as to see lambdas in an n-ary way. What kind of logic do we get? What do we optimize?</p>
<p>Thanks for reading!</p>
]]></content:encoded>
			<wfw:commentRss>http://syntaxexclamation.wordpress.com/2011/09/01/reverse-natural-deduction-and-get-sequent-calculus/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/34a3291be1d0c68725533654d7848863?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">mqtthiqs</media:title>
		</media:content>
	</item>
		<item>
		<title>Reversing data structures</title>
		<link>http://syntaxexclamation.wordpress.com/2011/08/31/reversing-data-structures/</link>
		<comments>http://syntaxexclamation.wordpress.com/2011/08/31/reversing-data-structures/#comments</comments>
		<pubDate>Wed, 31 Aug 2011 12:59:47 +0000</pubDate>
		<dc:creator>Matthias Puech</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[context]]></category>
		<category><![CDATA[data structure]]></category>
		<category><![CDATA[functional programming]]></category>
		<category><![CDATA[herd]]></category>
		<category><![CDATA[ocaml]]></category>
		<category><![CDATA[reverse]]></category>
		<category><![CDATA[zipper]]></category>

		<guid isPermaLink="false">http://syntaxexclamation.wordpress.com/?p=94</guid>
		<description><![CDATA[A reversed list is not really a list anymore. It is isomorphic to a list, but it is not a list. Let me explain why. Prelude: zippers and contexts How do we define in general reversing a data structure? Intuitively, constructors at the bottom of an original value must appear at the top of the [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>A reversed list is not really a list anymore. It is isomorphic to a list, but it is not a list. Let me explain why.<span id="more-94"></span></p>
<h3>Prelude: zippers and contexts</h3>
<p>How do we define, in general, the reversal of a data structure? Intuitively, constructors at the bottom of an original value must appear at the top of the resulting value. A good formal definition arises from the <a title="Zipper (Wikipedia)" href="http://en.wikipedia.org/wiki/Zipper_%28data_structure%29"><em>zipper</em></a> of a data structure. The idea of the zipper is to represent a particular point(er) inside a deep value <code>v : t</code> by a pair of a value <code>v' : t</code>, which is what&#8217;s underneath the current point, and a value <code>c : t'</code>, which is the <em>path</em> from the root of <code>v</code> to the current point, <em>i.e.</em> what&#8217;s above the current point. The type <code>t'</code> is sometimes referred to as the <em>one-hole context</em> or <em>derivative</em> of the data type (see <a title="Conor McBride - The Derivative of a Regular Type is its Type of One-Hole Contexts" href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.22.8611">this</a> of course). The type of the context depends on the type of the value. A classic example, binary trees:</p>
<pre class="brush: fsharp; title: ; notranslate">
module type Btree = sig

  type 'a t =
    | Node of 'a t * 'a * 'a t
    | Leaf

  type 'a t' =
    | Top
    | Left of 'a t * 'a * 'a t'
    | Right of 'a t' * 'a * 'a t

  type 'a zipper = 'a t * 'a t'
</pre>
<p>Type <code>'a t</code> is the data type of binary trees, <code>'a t'</code> is the context, and <code>'a zipper</code> represents positions in an <code>'a t</code>. The context has three possible shapes: either we are at the <code>Top</code> of the structure (there is no context), or we just went <code>Left</code> of a <code>Node</code>, or <code>Right</code>. These types traditionally come with functions</p>
<pre class="brush: fsharp; title: ; notranslate">
  val top : 'a t -&gt; 'a zipper
  val up : 'a zipper -&gt; 'a zipper
  val left : 'a zipper -&gt; 'a zipper
  val right : 'a zipper -&gt; 'a zipper
end
</pre>
<p>to initialize a zipper and navigate through the data structure. But let&#8217;s forget about traversal for now and look closer at the type of contexts <code>t'</code>. To every <code>Leaf</code> of a tree <code>v</code>, there is a corresponding context which allows us to reconstruct the whole tree, and it is stored in <em>reverse order</em>, from bottom to top. That&#8217;s exactly what we want!</p>
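<p>For the record, the navigation functions could be sketched as follows. This is only one possible reading of the context type: it assumes that <code>Left (r, x, c)</code> means that we went left, leaving behind the label <code>x</code>, the right sibling <code>r</code> and the context <code>c</code> of the parent, and symmetrically for <code>Right</code>. The function <code>plug</code> is a hypothetical helper that is not part of the signature; it rebuilds the whole tree from a position.</p>
<pre class="brush: fsharp; title: ; notranslate">
  type 'a t =
    | Node of 'a t * 'a * 'a t
    | Leaf

  type 'a t' =
    | Top
    | Left of 'a t * 'a * 'a t'    (* assumed reading: right sibling, label, parent context *)
    | Right of 'a t' * 'a * 'a t   (* assumed reading: parent context, label, left sibling *)

  type 'a zipper = 'a t * 'a t'

  let top (v : 'a t) : 'a zipper = (v, Top)

  let left : 'a zipper -&gt; 'a zipper = function
    | Node (l, x, r), c -&gt; (l, Left (r, x, c))
    | Leaf, _ -&gt; invalid_arg &quot;left: at a leaf&quot;

  let right : 'a zipper -&gt; 'a zipper = function
    | Node (l, x, r), c -&gt; (r, Right (c, x, l))
    | Leaf, _ -&gt; invalid_arg &quot;right: at a leaf&quot;

  let up : 'a zipper -&gt; 'a zipper = function
    | l, Left (r, x, c) -&gt; (Node (l, x, r), c)
    | r, Right (c, x, l) -&gt; (Node (l, x, r), c)
    | _, Top -&gt; invalid_arg &quot;up: at the top&quot;

  (* climb back to the root, rebuilding the tree from the context *)
  let rec plug : 'a zipper -&gt; 'a t = function
    | v, Top -&gt; v
    | z -&gt; plug (up z)

  let () =
    let v = Node (Node (Leaf, 1, Leaf), 2, Node (Leaf, 3, Leaf)) in
    let z = right (left (top v)) in
    (* the context of this leaf mentions 1 first, then 2: bottom to top *)
    assert (z = (Leaf, Right (Left (Node (Leaf, 3, Leaf), 2, Top), 1, Leaf)));
    assert (plug z = v)
</pre>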
<h3>Andante: reversing lists</h3>
<p>What is the context corresponding to lists? Likewise, we construct it by enumerating all possible ways of making a hole in a list constructor; there is just one:</p>
<pre class="brush: fsharp; title: ; notranslate">
module List = struct

  type 'a t =
    | Cons of 'a * 'a t
    | Nil

  type 'a t' =
    | Top
    | Down of 'a * 'a t'
</pre>
<p>As announced, <code>t'</code> is isomorphic to <code>t</code>! And since there is just one leaf in a list, reversing a list is a deterministic operation, from <code>t</code> to <code>t'</code>:</p>
<pre class="brush: fsharp; title: ; notranslate">
  let rec rev_append acc : 'a t -&gt; 'a t' = function
    | Nil -&gt; acc
    | Cons (x, xs) -&gt; rev_append (Down (x, acc)) xs

  let rev l = rev_append Top l
</pre>
<p>which is usually written from <code>t</code> to <code>t</code>. Let us write both variants of fold, on lists and reversed lists:</p>
<pre class="brush: fsharp; title: ; notranslate">
  let rec fold_left f acc = function
    | Nil -&gt; acc
    | Cons (x, xs) -&gt; fold_left f (f acc x) xs

  let rec fold_right' f acc = function
    | Top -&gt; acc
    | Down (x, xs) -&gt; f (fold_right' f acc xs) x
</pre>
<p>These functions are interchangeable: one acts on lists, the other on the reversed list and we have</p>
<p><code>fold_left f acc l = fold_right' f acc (rev l)</code></p>
<p>Note that <code>fold_left</code> is tail-recursive, while <code>fold_right'</code> is not. Conversely,</p>
<pre class="brush: fsharp; title: ; notranslate">
  let rec fold_right f acc = function
    | Nil -&gt; acc
    | Cons (x, xs) -&gt; f (fold_right f acc xs) x

  let rec fold_left' f acc = function
    | Top -&gt; acc
    | Down (x, xs) -&gt; fold_left' f (f acc x) xs
</pre>
<p>We have</p>
<p><code>fold_right f acc l = fold_left' f acc (rev l)</code></p>
<p>and <code>fold_right</code> is not tail-recursive, while <code>fold_left'</code> is. Of course, both usual equations without the primes don&#8217;t hold since the types of lists don&#8217;t match anymore.</p>
<p>The moral is that if you use <code>fold_left</code> a lot, you will likely prefer lists rather than reversed lists, but if you use <code>fold_right</code>, you&#8217;d better store (or convert on-the-fly) your lists in reversed form. Probably no big news.</p>
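<p>Both equations can be checked on a small example. The sketch below restates the definitions so that it runs on its own; note that <code>fold_left'</code> calls itself recursively here, which is what makes it tail-recursive. String concatenation is chosen for this illustration because it is not commutative, so the result records the order in which elements are visited.</p>
<pre class="brush: fsharp; title: ; notranslate">
  type 'a t =
    | Cons of 'a * 'a t
    | Nil

  type 'a t' =
    | Top
    | Down of 'a * 'a t'

  let rec rev_append acc : 'a t -&gt; 'a t' = function
    | Nil -&gt; acc
    | Cons (x, xs) -&gt; rev_append (Down (x, acc)) xs

  let rev l = rev_append Top l

  let rec fold_left f acc = function       (* tail-rec *)
    | Nil -&gt; acc
    | Cons (x, xs) -&gt; fold_left f (f acc x) xs

  let rec fold_right' f acc = function     (* non-tail-rec *)
    | Top -&gt; acc
    | Down (x, xs) -&gt; f (fold_right' f acc xs) x

  let rec fold_right f acc = function      (* non-tail-rec *)
    | Nil -&gt; acc
    | Cons (x, xs) -&gt; f (fold_right f acc xs) x

  let rec fold_left' f acc = function      (* tail-rec *)
    | Top -&gt; acc
    | Down (x, xs) -&gt; fold_left' f (f acc x) xs

  let () =
    let l = Cons (&quot;a&quot;, Cons (&quot;b&quot;, Cons (&quot;c&quot;, Nil))) in
    (* first equation: elements are visited from top to bottom of l *)
    assert (fold_left ( ^ ) &quot;&quot; l = &quot;abc&quot;);
    assert (fold_right' ( ^ ) &quot;&quot; (rev l) = &quot;abc&quot;);
    (* second equation: elements are visited from bottom to top of l *)
    assert (fold_right ( ^ ) &quot;&quot; l = &quot;cba&quot;);
    assert (fold_left' ( ^ ) &quot;&quot; (rev l) = &quot;cba&quot;)
</pre>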
<h3>Variations on the same theme: <em>herds</em></h3>
<p>Let us redo the same thing on a variation of list: a <em>herd</em> (I just made up the name) is a list of elements, with a distinguished element of another type at the bottom:</p>
<pre class="brush: fsharp; title: ; notranslate">
module Herd = struct

  type ('sheep, 'dog) t =
    | Cons of 'sheep * ('sheep, 'dog) t
    | Nil of 'dog
</pre>
<p>Its context should make the <code>'dog</code> lead the way, followed by all the <code>'sheep</code>s in reverse order:</p>
<pre class="brush: fsharp; title: ; notranslate">
  type 'sheep t'' =
    | Down of 'sheep * 'sheep t''
    | Top

  type ('sheep, 'dog) t' = Bot of 'dog * 'sheep t''
</pre>
<p>This time, the type <code>t'</code> of contexts is <em>not</em> isomorphic to <code>t</code>. The <code>rev</code> function is written</p>
<pre class="brush: fsharp; title: ; notranslate">
  let rec rev_append acc : ('a, 'b) t -&gt; ('a, 'b) t' = function
    | Cons (x, xs) -&gt; rev_append (Down (x, acc)) xs
    | Nil z -&gt; Bot (z, acc)

  let rev l = rev_append Top l
</pre>
<p>Let&#8217;s now write the iterators on herds. As with lists, there are four of them, alternately tail-recursive and not, and they satisfy the same equations, up to the argument order of <code>f</code>: as written below, <code>fold_right</code> passes the element first and the accumulator second, while the three others pass the accumulator first. Because of the <code>'dog</code> at the tail of the <code>Herd.t</code> (and at the head of its context), the types of these functions differ from the usual lists: they take a supplementary function <code>g</code> that&#8217;s used to wrap around the <code>'dog</code>.</p>
<p><code>fold_left f g acc l = fold_right' f g acc (rev l)<br />
fold_right f g acc l = fold_left' (fun acc x -&gt; f x acc) g acc (rev l)</code></p>
<pre class="brush: fsharp; title: ; notranslate">
  let rec fold_left f g acc = function  (* tail-rec *)
    | Nil z -&gt; g acc z
    | Cons (x, xs) -&gt; fold_left f g (f acc x) xs

  let rec fold_right f g acc = function (* non-tail-rec *)
    | Nil z -&gt; g acc z
    | Cons (x, xs) -&gt; f x (fold_right f g acc xs)

  let fold_left' f g acc = function     (* tail-rec *)
    | Bot (z, l) -&gt;
      let rec aux acc = function
        | Top -&gt; acc
        | Down (x, xs) -&gt; aux (f acc x) xs
      in aux (g acc z) l

  let fold_right' f g acc = function    (* non-tail-rec *)
    | Bot (z, l) -&gt;
      let rec aux acc = function
        | Top -&gt; acc
        | Down (x, xs) -&gt; f (aux acc xs) x
      in g (aux acc l) z
end
</pre>
<p><strong>Examples</strong></p>
<ul>
<li><code>fold_left f g x (Cons (1, Cons (2, Cons (3, Nil true)))) = g (f (f (f x 1) 2) 3) true</code></li>
<li><code>fold_right f g x (Cons (1, Cons (2, Cons (3, Nil true)))) = f 1 (f 2 (f 3 (g x true)))</code></li>
</ul>
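<p>As a sanity check, the equations between the four iterators can be tested on a concrete herd. The sketch below restates the definitions so that it runs on its own, with <code>fold_right'</code> applying <code>g</code> to both the accumulator and the <code>'dog</code>. The functions <code>f</code>, <code>g</code> and <code>h</code>, and the use of strings to record the traversal order, are choices made for this illustration only.</p>
<pre class="brush: fsharp; title: ; notranslate">
  type ('sheep, 'dog) t =
    | Cons of 'sheep * ('sheep, 'dog) t
    | Nil of 'dog

  type 'sheep t'' =
    | Down of 'sheep * 'sheep t''
    | Top

  type ('sheep, 'dog) t' = Bot of 'dog * 'sheep t''

  let rec rev_append acc : ('a, 'b) t -&gt; ('a, 'b) t' = function
    | Cons (x, xs) -&gt; rev_append (Down (x, acc)) xs
    | Nil z -&gt; Bot (z, acc)

  let rev l = rev_append Top l

  let rec fold_left f g acc = function
    | Nil z -&gt; g acc z
    | Cons (x, xs) -&gt; fold_left f g (f acc x) xs

  let rec fold_right f g acc = function
    | Nil z -&gt; g acc z
    | Cons (x, xs) -&gt; f x (fold_right f g acc xs)

  let fold_left' f g acc = function
    | Bot (z, l) -&gt;
      let rec aux acc = function
        | Top -&gt; acc
        | Down (x, xs) -&gt; aux (f acc x) xs
      in aux (g acc z) l

  let fold_right' f g acc = function
    | Bot (z, l) -&gt;
      let rec aux acc = function
        | Top -&gt; acc
        | Down (x, xs) -&gt; f (aux acc xs) x
      in g (aux acc l) z

  let () =
    let l = Cons (1, Cons (2, Cons (3, Nil true))) in
    assert (rev l = Bot (true, Down (3, Down (2, Down (1, Top)))));
    let f acc x = acc ^ string_of_int x in   (* accumulator first *)
    let h x acc = acc ^ string_of_int x in   (* element first, as fold_right expects *)
    let g acc z = acc ^ string_of_bool z in
    (* the dog is met last when going down the herd... *)
    assert (fold_left f g &quot;&quot; l = &quot;123true&quot;);
    assert (fold_right' f g &quot;&quot; (rev l) = &quot;123true&quot;);
    (* ...and first when coming back up *)
    assert (fold_right h g &quot;&quot; l = &quot;true321&quot;);
    assert (fold_left' (fun acc x -&gt; h x acc) g &quot;&quot; (rev l) = &quot;true321&quot;)
</pre>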
<h3>Coda: the (lack of) punchline</h3>
<p>There is no real punchline here, other than a strict construction of what reversing a list-like structure is by means of context, and what effect it has on the tail-recursivity of iterators. In the next post, I will show how this technique may shed some light on a logical object, namely the term assignment of an intuitionistic sequent calculus.</p>
<p>I wonder if (no, I am sure that) this construction can be made mechanical. <a title="Ian Zerny" href="http://www.zerny.dk/">Ian</a> recently showed me the magic of defunctionalizing the CPS-transform of programs (see <a title="A walk in the semantic park" href="http://dl.acm.org/citation.cfm?doid=1929501.1929503">this</a>), and how it gives rise to one-hole contexts. But I am not yet comfortable enough with these ideas to have managed to derive the <code>t'</code> from the <code>t</code> mechanically, and observe the inversion of tail-recursivity of the iterators. Maybe someone will?</p>
]]></content:encoded>
			<wfw:commentRss>http://syntaxexclamation.wordpress.com/2011/08/31/reversing-data-structures/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/34a3291be1d0c68725533654d7848863?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">mqtthiqs</media:title>
		</media:content>
	</item>
		<item>
		<title>An OCaml hack: recover the abstraction of abstract types</title>
		<link>http://syntaxexclamation.wordpress.com/2011/08/15/an-ocaml-hack-recover-abstraction-of-abstract-types/</link>
		<comments>http://syntaxexclamation.wordpress.com/2011/08/15/an-ocaml-hack-recover-abstraction-of-abstract-types/#comments</comments>
		<pubDate>Mon, 15 Aug 2011 20:20:12 +0000</pubDate>
		<dc:creator>Matthias Puech</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[dev]]></category>
		<category><![CDATA[ocaml]]></category>

		<guid isPermaLink="false">http://syntaxexclamation.wordpress.com/?p=31</guid>
		<description><![CDATA[Here is a scoop: in OCaml, your abstract types aren&#8217;t really abstract (unfortunately). This is because some magical functions in the standard library don&#8217;t respect the abstraction of data types: Pervasives.compare, Hashtbl.hash, Pervasives.(=) (yes, the very equality you use every day), etc. The problem is that if you contribute to a large project written [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>Here is a scoop: in OCaml, your abstract types aren&#8217;t really abstract (unfortunately). This is because some magical functions in the standard library don&#8217;t respect the abstraction of data types: <code>Pervasives.compare</code>, <code>Hashtbl.hash</code>, <code>Pervasives.(=)</code> (yes, the very equality you use every day), etc. The problem is that if you contribute to a large project written in OCaml and you want to take advantage of abstract types, these functions become your worst enemies because they break the abstraction you were carefully designing. You&#8217;ll probably end up, as I did, finding and removing by hand every occurrence of these magical functions in your code. The following is for you then.<span id="more-31"></span></p>
<p>In this post, I&#8217;ll give you a feel for what&#8217;s wrong with these functions, if you don&#8217;t have one already, and I&#8217;ll then show you a quick and dirty trick to spot them at run-time. I&#8217;ll finish with some hints about what could be done in a compiler to fix their defect.</p>
<h3>Grandeur of the abstract types</h3>
<p>Imagine for instance that you just (re-)invented lists in OCaml; you don&#8217;t want to write the module:</p>
<pre class="brush: fsharp; title: ; notranslate">
module List0 : sig
  type 'a t =
    | Nil
    | Cons of 'a * 'a t
  val app : 'a t * 'a t -&gt; 'a t
end = struct
  type 'a t =
    | Nil
    | Cons of 'a * 'a t

  let rec app = function
    | Nil, m -&gt; m
    | Cons (x, l), m -&gt; Cons (x, app (l, m))
end
</pre>
<p>because you&#8217;re then tied to the underlying, canonical representation of lists forever. Instead, prefer something like this, using an abstract type and explicit constructors &amp; destructors:</p>
<pre class="brush: fsharp; title: ; notranslate">
module List1 : sig
  type 'a t
  val nil : 'a t
  val cons : 'a * 'a t -&gt; 'a t
  val destruct : 'a t -&gt; ('a * 'a t) option
  val app : 'a t * 'a t -&gt; 'a t
end = struct
  type 'a t =
    | Nil
    | Cons of 'a * 'a t

  let nil = Nil
  let cons (x, l) = Cons (x, l)

  let destruct = function
    | Nil -&gt; None
    | Cons (x, l) -&gt; Some (x, l)

  let rec app = function
    | Nil, m -&gt; m
    | Cons (x, l), m -&gt; Cons (x, app (l, m))
end
</pre>
<p>It is a bit heavier since you have to turn every constructor into a function and you can&#8217;t use pattern-matching as before (you have to call <code>destruct</code>). Too bad OCaml doesn&#8217;t have a <a title="Views" href="http://homepages.inf.ed.ac.uk/wadler/papers/view/view.ps">view mechanism</a>! But if you end up using <code>app</code> a lot and few <code>cons</code> or <code>destruct</code>, there is probably a nicer underlying representation of lists for you with the same signature: you can switch from canonical lists to binary trees for instance:</p>
<pre class="brush: fsharp; title: ; notranslate">
module List2 : sig
  type 'a t
  val nil : 'a t
  val cons : 'a * 'a t -&gt; 'a t
  val app : 'a t * 'a t -&gt; 'a t
  val destruct : 'a t -&gt; ('a * 'a t) option
end = struct
  type 'a tree =
    | Leaf of 'a
    | Node of 'a tree * 'a tree
  type 'a t = Empty | Tree of 'a tree

  let nil = Empty

  let cons = function
    | x, Empty -&gt; Tree (Leaf x)
    | x, Tree t -&gt; Tree (Node (Leaf x, t))

  let app = function
    | Tree t, Tree u -&gt; Tree (Node(t, u))
    | Tree t, Empty | Empty, Tree t -&gt; Tree t
    | Empty, Empty -&gt; Empty

  let destruct = function
    | Empty -&gt; None
    | Tree t -&gt;
      let rec aux = function
        | Leaf x -&gt; Some (x, Empty)
        | Node (Leaf x, u) -&gt; Some (x, Tree u)
        | Node (Node (t, u), v) -&gt; aux (Node (t, Node (u, v)))
      in aux t
end
</pre>
<p>Changing the <code>List</code> module from <code>List1</code> to <code>List2</code> should be transparent for all possible clients, since the type <code>'a t</code> is abstract&#8230;</p>
<h3>The problem with OCaml&#8217;s generic equality</h3>
<p>&#8230; well, unfortunately, it is not, because OCaml&#8217;s generic comparison functions don&#8217;t care about abstract types. Try to run this for example:</p>
<pre class="brush: fsharp; title: ; notranslate">
app (cons (1, cons (2, nil)), cons (3, nil)) = app (cons (1, nil), cons (2, cons (3, nil)))
</pre>
<p>The answer is <code>false</code>, because the terms constructed on the left and on the right are respectively <code>Tree (Node (Node (Leaf 1, Leaf 2), Leaf 3))</code> and <code>Tree (Node (Leaf 1, Node (Leaf 2, Leaf 3)))</code>, which are syntactically not the same, although abstractly, they are <em>quotiented</em> by the <code>destruct</code> function. If you compare their abstract structure by using <code>destruct</code>:</p>
<pre class="brush: fsharp; title: ; notranslate">
destruct (app (cons (1, cons (2, nil)), cons (3, nil))) = destruct (app (cons (1, nil), cons (2, cons (3, nil))))
</pre>
<p>they are equal. Not very consistent, is it? Why not? Because you probably expect equality to be <em>referentially transparent</em> wrt. <code>destruct</code>, or in other words, <code>destruct</code> should be the only morphism for equality on lists:</p>
<p><code>l = l' <img src='http://s0.wp.com/latex.php?latex=%5CLeftrightarrow&amp;bg=fff&amp;fg=444444&amp;s=0' alt='&#92;Leftrightarrow' title='&#92;Leftrightarrow' class='latex' /> destruct l = destruct l'</code></p>
<p>Instead, the generic equality has no awareness of the function <code>destruct</code>, and just crosses the boundary of the abstract type, comparing values constructor by constructor.</p>
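<p>To make the expected behavior concrete, here is a sketch of an equality that only observes lists through <code>destruct</code>. The function <code>equal</code> and its parameter <code>eq</code> (the equality to use on elements) are hypothetical names introduced for this illustration; <code>List2</code> is restated so that the example runs on its own.</p>
<pre class="brush: fsharp; title: ; notranslate">
module List2 : sig
  type 'a t
  val nil : 'a t
  val cons : 'a * 'a t -&gt; 'a t
  val app : 'a t * 'a t -&gt; 'a t
  val destruct : 'a t -&gt; ('a * 'a t) option
end = struct
  type 'a tree =
    | Leaf of 'a
    | Node of 'a tree * 'a tree
  type 'a t = Empty | Tree of 'a tree

  let nil = Empty

  let cons = function
    | x, Empty -&gt; Tree (Leaf x)
    | x, Tree t -&gt; Tree (Node (Leaf x, t))

  let app = function
    | Tree t, Tree u -&gt; Tree (Node(t, u))
    | Tree t, Empty | Empty, Tree t -&gt; Tree t
    | Empty, Empty -&gt; Empty

  let destruct = function
    | Empty -&gt; None
    | Tree t -&gt;
      let rec aux = function
        | Leaf x -&gt; Some (x, Empty)
        | Node (Leaf x, u) -&gt; Some (x, Tree u)
        | Node (Node (t, u), v) -&gt; aux (Node (t, Node (u, v)))
      in aux t
end

(* hypothetical helper: compare two lists element by element, through destruct only *)
let rec equal eq l l' =
  match List2.destruct l, List2.destruct l' with
  | None, None -&gt; true
  | Some (x, m), Some (x', m') -&gt; eq x x' &amp;&amp; equal eq m m'
  | _ -&gt; false

let () =
  let open List2 in
  let a = app (cons (1, cons (2, nil)), cons (3, nil)) in
  let b = app (cons (1, nil), cons (2, cons (3, nil))) in
  assert (a &lt;&gt; b);              (* generic equality sees two different trees *)
  assert (equal ( = ) a b)      (* same elements in the same order *)
</pre>
<p>Such a function keeps giving the same answers if the representation behind the signature changes, which is precisely what the generic equality does not guarantee.</p>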
<h3>How to spot them</h3>
<p>Now, imagine that your very big project uses module <code>List1</code>, and you want to switch to <code>List2</code>, but you are crippled by the fact that in many places, people (or you, when you were sillier) used magical functions on <code>List1.t</code>. The change as-is breaks everything! How to spot them? Well, let&#8217;s rely on a dirty trick, curing the bad by the evil: we don&#8217;t like <code>Pervasives.(=)</code>, but <code>Pervasives.(=)</code> doesn&#8217;t like functions! Recall that functions are values, and these magical functions compare values. If one of them meets a function on its way, it immediately fails with an <code>Invalid_argument</code> exception. So we&#8217;re going to introduce functions in our abstract types! People using proper destructors won&#8217;t see the difference since we&#8217;re going to hide it, but people jumping over them will be caught the first time their code is called!</p>
<p>Let&#8217;s transform <code>List1</code> into:</p>
<pre class="brush: fsharp; title: ; notranslate">
module List1' : sig
  type 'a t
  val nil : 'a t
  val cons : 'a * 'a t -&gt; 'a t
  val destruct : 'a t -&gt; ('a * 'a t) option
  val app : 'a t * 'a t -&gt; 'a t
end = struct
  type 'a t' =
    | Nil
    | Cons of 'a * 'a t'
  type 'a t = (int -&gt; int) * 'a t'

  let inj x = (fun x -&gt; x), x
  let prj (_, x) = x
  let nil = inj Nil
  let cons (x, l) = inj (Cons (x, prj l))

  let destruct x = match prj x with
    | Nil -&gt; None
    | Cons (x, l) -&gt; Some (x, (inj l))

  let rec app (x, y : 'a t * 'a t) = match prj x, prj y with
    | Nil, m -&gt; inj m
    | Cons (x, l), m -&gt; inj (Cons (x, prj (app (inj l, inj m))))
end
</pre>
<p>We&#8217;ve wrapped the original list into a pair of a (dummy) function and a list. Decorating our previous code with <code>inj</code> and <code>prj</code> which create and ignore this pair, we get an equivalent code&#8230; except when comparing two values with the magical functions. Now try the previous equality test, you&#8217;ll get an exception!</p>
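<p>Here is a small sketch of that experiment, restating <code>List1'</code> so that it runs on its own. The values <code>a</code> and <code>b</code> are made up for the illustration, the exception message is deliberately not inspected, and the <code>exception</code> pattern needs OCaml 4.02 or later:</p>
<pre class="brush: fsharp; title: ; notranslate">
module List1' : sig
  type 'a t
  val nil : 'a t
  val cons : 'a * 'a t -&gt; 'a t
  val destruct : 'a t -&gt; ('a * 'a t) option
  val app : 'a t * 'a t -&gt; 'a t
end = struct
  type 'a t' =
    | Nil
    | Cons of 'a * 'a t'
  type 'a t = (int -&gt; int) * 'a t'

  let inj x = (fun x -&gt; x), x
  let prj (_, x) = x
  let nil = inj Nil
  let cons (x, l) = inj (Cons (x, prj l))

  let destruct x = match prj x with
    | Nil -&gt; None
    | Cons (x, l) -&gt; Some (x, (inj l))

  let rec app (x, y : 'a t * 'a t) = match prj x, prj y with
    | Nil, m -&gt; inj m
    | Cons (x, l), m -&gt; inj (Cons (x, prj (app (inj l, inj m))))
end

let () =
  let open List1' in
  let a = cons (1, nil) and b = cons (1, nil) in
  (* clients that pattern-match on the result of destruct are unaffected *)
  (match destruct a with
   | Some (1, _) -&gt; ()
   | _ -&gt; assert false);
  (* clients that use the generic equality are caught at run-time *)
  match a = b with
  | _ -&gt; assert false
  | exception Invalid_argument _ -&gt; ()
</pre>
<p>Note that comparing the results of <code>destruct</code> with the generic equality raises as well, since the tail they contain is itself a wrapped list.</p>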
<p>Finally, to track where this exception was raised, wrap your whole program into an exception catcher which prints its backtrace:</p>
<pre class="brush: fsharp; title: ; notranslate">
try main ()
with Invalid_argument _ as e -&gt;
  (* the message is &quot;equal: functional value&quot; or &quot;compare: functional value&quot;, depending on the OCaml version *)
  Printexc.print_backtrace stderr; raise e
</pre>
<p>and launch it with the environment variable <code>OCAMLRUNPARAM=b</code> set (be careful that this exception is not caught earlier somewhere in <code>main</code>). Just wait for the error to happen and the backtrace will tell you exactly where to go to spot the incorrect usage of the evil magical functions! This is what we could ironically call a run-time type-checker. Of course a compile-time solution would be far better.</p>
<h3>Some proposals to perform equality the right way</h3>
<p>We need some way to quickly compare values, but not in a type-unsafe way, as is arguably the case today. Haskell solves that issue with type-classes; we might not want to go that far, but here are a couple of hacks we could implement in the OCaml compiler to fix the behavior of the magical functions:</p>
<ul>
<li>The type system could check that every occurrence of a magical function does not cross any abstract type, and issue a warning if it does. Beware of polymorphism: a magical function is either <code>Pervasives.compare</code> etc., or a polymorphic function that calls a magical function on its polymorphic argument (<em>e.g.</em> <code>List.mem</code>, <code>List.assoc</code> etc.)</li>
<li>There could be a mechanism to register custom type destructors to be used by the magical functions when comparing values, the same way we register printers in the toplevel and debugger. This would be kind of the opposite of what Haskell has: we don&#8217;t explicitly mention when to build the equality (&#8220;deriving&#8221;), but when <strong>not</strong> to use it directly:
<pre class="brush: fsharp; title: ; notranslate">
let destructor destruct x = match prj x with
    | Nil -&gt; None
    | Cons (x, l) -&gt; Some (x, (inj l))
</pre>
<p>Every time <code>Pervasives.compare</code> has to compare two values of the type of <code>x</code> (<code>'a List2.t</code>), it calls <code>destruct</code> and continues recursively on its result.</li>
</ul>
<p>Note that both ideas, compile-time and run-time support, are compatible, and would make for a pretty nice equality subsystem!</p>
<p>Thanks for reading!</p>
]]></content:encoded>
			<wfw:commentRss>http://syntaxexclamation.wordpress.com/2011/08/15/an-ocaml-hack-recover-abstraction-of-abstract-types/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/34a3291be1d0c68725533654d7848863?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">mqtthiqs</media:title>
		</media:content>
	</item>
	</channel>
</rss>
