Automating Transport with Univalent E-Graphs

How cool would it be if we could take entire verified libraries in an interactive theorem prover defined over existing datatypes and, with the push of a button (and maybe a tiny bit of user guidance), port those libraries’ functions and proofs to use our own datatypes? This is the problem that DEVOID, a part of the PUMPKIN PATCH plugin suite, chips away at—starting with a small and confined use case.

I’ve recently extended DEVOID (in a branch) to chip away more at this problem, and the more and more I chip away at it, the clearer the picture becomes. I think this kind of automation is pretty closesay, 5-10 years away—but we have some interesting problems to solve first.

In this post, I’ll talk about one of those problems: efficiently automating transport across type equivalences. A type equivalence between two types A and B is a pair of functions:

f : A -> B
g : B -> A

that are mutual inverses:

section: forall (a : A), g (f a) = a.
retraction: forall (b : B), f (g b) = b.

Transport, at a very high level, means taking something defined over A and spitting out something analogous defined over B. Automating transport is key to good reuse and even to related problems like proof repair. I have some ideas for how to do this that I’d like to share.

If you’re interested in any of these problems, please talk to me, because a lot of this is work in progress.

Transport in DEVOID

In homotopy type theory, transport is a function that lets you derive analogous proofs about B from proofs about A, given an equivalence between A and B.

Since we can’t define transport as a function in Coq without any axioms (in general, at least, but there’s some great work on how close we can get), in DEVOID, transport is a program transformation implemented in OCaml that takes in Coq proofs about A and spits out Coq proofs about B. This has the extra benefit of producing fast functions and small proof terms, and allowing us to get rid of A entirely after porting our functions and proofs if we’d like.

The caveat right now is that while in homotopy type theory, this transport function works for any equivalence, in DEVOID, it’s so far implemented for just a few kinds of equivalences. I’m working on changing that, but that’s a topic for a different day.

Push-Button Transport

Even if you have transport defined as a function in your language, like in homotopy type theory, it’s still not fully automatic. You have to explicitly apply this transport function in order to derive your new proofs (think of this like rewriting along equalities in Coq, but fancier). This can be really annoying, especially when you want to port an entire library or something that effect.

With DEVOID, we automate the most basic case, where the user just wants to change every A to a B (or similarly in the opposite direction). For example, one of my users recently came to me with a problem. He had a compiler from another language to Coq’s specification language Gallina, and his compiler generated types and functions that looked like this:

Module Generated.

  Definition input := (prod bool (prod nat bool)).

  Definition output := (prod nat bool).

  Definition op (r : (prod bool (prod nat bool))) : (prod nat bool) :=
    pair (fst (snd r)) (andb (fst r) (snd (snd r))).

Emd Generated.

He wanted to write some proofs about the generated functions. Unfortunately, the generated functions were really gross to look at. He wanted to write proofs about types that looked more like this:

Record input :=
  MkInput { firstBool : bool; numberI : nat; secondBool : bool }.

Record output :=
  MkOutput { numberO : nat; andBools : bool }.

Using an extended branch of DEVOID, my user can now do this easily. All he has to do is call the Lift command:

Lift Generated.input input in Generated.op as op_1.
Lift Generated.output output in op_1 as op.

This proves the equivalences between Generated.input and input and between Generated.output and output, then writes functions for him over his records:

Definition op (r : input) : output :=
      numberO := numberI r;
      andBools := andb (firstBool r) (secondBool r)

He can then write his proofs over these records:

Theorem and_spec_true_true :
  forall (r : input),
    firstBool r = true ->
    secondBool r = true ->
    andBools (op r) = true.
  intros. destruct r as [f n s]. simpl in *. intuition.

then lift those proofs back to get analogous proofs over his original specification:

Module GeneratedProofs.

  Theorem and_spec_true_true :
    forall (r : Generated.input),
      fst r = true ->
      snd (snd r) = true ->
      andb (fst (Generated.op r)) (snd (snd (Generated.op r))) = true.

End GeneratedProofs.

Not Quite Push-Button Transport

What I’ve shown you is pretty automatic, but you can already see a problem if you look closely. Take a look at Generated.op:

Definition op (r : (prod bool (prod nat bool))) : (prod nat bool) :=
    pair (fst (snd r)) (andb (fst r) (snd (snd r))).

We can think of the input r as having type Generated.input, or we can think of it as having type (prod bool Generated.output). Our procedure is so naive that it turns every A into a B. So it works fine if the user lifts in this order:

Lift Generated.input input in Generated.op as op_1.
Lift Generated.output output in op_1 as op.

but if the user lifts in the opposite order:

Lift Generated.output output in Generated.op as op_1.
Lift Generated.input input in op_1 as op.

we’re in trouble:

Definition op (r : bool * output) : output := 
    numberO := numberO (snd r);
    andBools := andb (fst r) (andBools (snd r))

The function that we get back is also correct, but it’s not the one we want.

So the user has to think hard about the order, and still is pretty limited in what is possible. If the user wants to lift only some occurrences of A, this is only possible in DEVOID right now with some hacky options.

Automatic Transport as Type-Directed Search

The better solution, in the long run, will be implementing automatic transport as type-directed search. So the user can just say:

Lift Generated.op as op : input -> output.

and DEVOID will look through a bunch of stored equivalences, and apply the program transformation along the correct ones to get the desired type.

The solutions I’ve seen for this so far in other tools like the univalent parametricity framework use type classes. I think this is a good start, but I suspect that it won’t scale well when we have large libraries to port, and when we want to port those libraries along many different equivalences. Indeed, Nate Yazdani, who worked with me on DEVOID, noted that in his experience with the univalent parametricity framework, it was common for this type-directed search to spin out of control.

Univalent E-Graphs

This is the cool part.

Automatic transport isn’t special. Equality up to transport is just an equivalence relation, and automatic transport just searches for proofs of that relation and rewrites along those proofs, either by directly applying transport (in homotopy type theory and univalent parametricity) or by transforming the term in a metalanguage (in DEVOID) . In other words, automatic transport is a rewrite system.

There are these beautiful data structures called e-graphs that are already well equipped for handling this efficiently and elegantly in non-dependently-typed languages. They are used, for example, inside of SMT solvers. They are basically designed to help deal with the problem like the one we saw with op earlier, when there were multiple equivalences to choose from.

Recent work extended these data structures to handle dependent types. The result was wonderful automation of equality proofs in the Lean theorem prover. The cost was introducing an axiom called UIP to Lean that is not in Coq and is incompatible with homotopy type theory (without any axioms, the notion of equality between dependent types is just not very useful for automation like this).

What if we were to implement this data structure for equality up to transport rather than heterogenous equality as in Lean? In univalent type theories like homotopy type theory, I bet we could implement automatic transport this way, using what would essentially be an extremely powerful congruence tactic coupled with the transport function. In the univalent parametricity framework, I imagine we could get much more predictable and efficient type-directed transport when managing a lot of equivalences. And in DEVOID, I bet we could also get efficient type-directed search.

Thinking Big

The IAS memorial service spoke about Voevodsky’s vision of proof reuse, a world where:

“mathematicians around the world could collaborate by depositing proofs and constructions in the computer, and … it  would be up to the computer to locate the equivalence between formulations and [to] transport the constructions from one context to another.”

I think we as a community are only 5-10 years away from realizing that vision if we really sit down and collaborate on these problems. DEVOID already does both of these for some specific kinds of equivalences, and the second of these is automatic transport, the subject of this post.

I think this also hides the key to better proof repair. Proof repair is really nothing but proof reuse over time. And I bet that, in a dependently-typed language, you can think of any change as an equivalence between sigma types. The problem just becomes exposing a good interface around searching for and transporting along this equivalence (maybe using example-based search like the original PUMPKIN PATCH, maybe doing something different), getting rid of all references to the old type (which DEVOID can do), and then hopefully at some point converting everything back to a reasonable, maintainable set of tactics (my collaborator RanDair Porter has ongoing work on this).

So I think we’re also not too far from a world in which we can freely change our specifications and easily repair our proofs. And I think that type-directed automatic transport really holds the key to this as well.

Let’s Hack!

In summary, I think maybe the answer to easy proof reuse, awesome congruence tactics in univalent systems, automatic transport, and proof repair is sitting there right in front of us. What I want right now is a hackathon for univalent e-graphs. I want the hackathon to include, at a minimum:

  • Me and my coauthors (DEVOID)
  • Daniel Selsam and Leo de Moura (congruence closure in Lean)
  • Nicolas Tabareau, Eric Tanter, and Matthieu Sozeau (univalent parametricity in Coq)
  • Bas Spitters (interested in better congruence in Coq)
  • Your name here?

Let’s all just sit down and hack away at univalent e-graphs for a couple of weeks. I think the benefits for the interactive theorem proving community would be massive.

Thanks to Chandra and Max for initial thoughts about when e-graphs would be useful.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.