Social Processes and the Next 100 Years of Verification

The infamous Social Processes paper ("Social Processes and Proofs of Theorems and Programs," by De Millo, Lipton, and Perlis) concludes that my research area is worthless. So maybe it’s unconventional to say this, but I love this paper.

Here’s the abstract, for the unacquainted:

It is argued that formal verifications of programs, no matter how obtained, will not play the same key role in the development of computer science and software engineering as proofs do in mathematics. Furthermore the absence of continuity, the inevitability of change, and the complexity of specification of significantly many real programs make the formal verification process difficult to justify and manage. It is felt that ease of formal verification should not dominate program language design.

The first time I read this paper, as a first-year Ph.D. student, it incensed me: the authors bring up so many legitimate concerns, but then, instead of concluding that those concerns are surmountable with more work, they conclude that program verification is not worth pursuing as a research area. To me, all of the concerns they raise seemed surmountable (granted, with 40 years of hindsight since the paper was written). That reaction is why I ended up working on making proof assistants like Coq easier to use; it inspired my thesis.

So I like to think of this paper as “The Next 100 Years of Verification,” even though the authors’ own view is that there should be no such work. In other words, I think this paper is fantastic, with one caveat: All of its conclusions are wrong.

The concerns the authors have about verification are real, but they’re also surmountable. Some of them have been addressed already, some are currently being addressed, and some are viable future research directions.

I’m going to discuss three concerns the authors have that I find particularly interesting, in the light of progress that has been made in the last 40 years. This is only a small sample! If you’re looking for a fun project to make proof assistants more accessible and useful, this great paper from 1979 is full of ideas, some of which are still untouched.

Believing Theorems and Proofs

As the authors note, believing a theorem or proof is a social process. The authors then conclude that the same social processes do not apply to verification.

The standard counterclaim to this point from programming languages researchers is that the computer does the believing for you: Proof assistants like Coq have a small trusted kernel that checks every proof. Obviously, there’s always something to trust, no matter how deep down you get, but in practice people do seem to trust proofs that are checked by computers. In programming languages especially, a formal development of some result in a proof assistant is often considered more trustworthy than a hand-written proof of the same result.
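To make that concrete, here is a minimal sketch in Coq (the lemma name and proof script are my own, purely illustrative): when the proof script ends in Qed, the kernel type-checks the underlying proof term against the stated theorem, independently of whatever tactics produced it.

```coq
(* An illustrative lemma; the name and script are my own. *)
Lemma plus_one_comm : forall n : nat, n + 1 = 1 + n.
Proof.
  intro n. induction n as [| n' IH].
  - reflexivity.
  - simpl. rewrite IH. reflexivity.
Qed.

(* Qed succeeds only if the kernel accepts the proof term;
   this command confirms the proof relies on no extra axioms. *)
Print Assumptions plus_one_comm.
```

The trusted base is the kernel (plus Coq’s metatheory and your machine), not the tactics, which is why tactic bugs cannot silently produce a false theorem.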

I think this standard response is a half-truth that dodges the authors’ concerns. In fact, the same social processes do apply to verification. And that is valuable. We just have a lot of work left to do to make that easier.

Formal developments rarely come without explanation: research papers, comments in the code, research talks, accompanying websites, and so on. It seems that formal developments add trust, but alone are not enough—humans still crave understanding of the developments themselves. So we read about them, we discuss them, we play with them in our own IDEs, we use them to derive new results. We just trust them that much more because the kernel has checked the result for us.

So how do we make this easier? One cool area to focus on is human readability of proofs. In this spirit, some proof assistants like Mizar stress human readability; the Formalized Mathematics journal consists entirely of Mizar proofs that are automatically translated into English. There are Mizar modes for many proof assistants, including Coq. Isabelle/HOL comes with a style guide which recommends using Isar, a proof language influenced by Mizar that also stresses readability.

Mizar-inspired languages and modes are just one way of improving readability. Style guides are common within certain frameworks or developments, such as the Coq Homotopy Type Theory library. There is also some preliminary work on making proofs readable post hoc without enforcing a particular style, for example by translating Coq proofs into English. There is a lot of work left to do here, but it is work that can be done.

Proofs as Messages

There’s a beautiful quote about proofs in here:

The proof of a theorem is a message. A proof is not a beautiful abstract object with independent existence. No mathematician grasps a proof, sits back, and sighs happily at the knowledge that he can now be certain of the truth of his theorem. He runs out into the hall and looks for someone to listen to it.

I find this quote really interesting because some mathematicians claim to think of proofs as irrelevant: Once we have a proof, the theorem is true, so we can throw away the proof and believe the theorem.

In practice, I think mathematicians actually do use information from the proof itself. And I think the authors of this paper, perhaps inadvertently, are getting at this. A proof of a theorem in itself carries more information than just the theorem it proves; an elegant proof tells you more than just the truth value of the theorem.

In proof assistants like Coq, every proof corresponds to a term. While it can be useful to erase proofs as Coq does during extraction, and while proof irrelevance itself—the notion that any two proofs of a given proposition are equal—also has practical benefits, I think there is a lot of benefit to considering the actual proof terms themselves, especially for automated tooling. This is an area where computers can shine.
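For instance, in Coq you can ask to see the term behind a proof (a toy example of my own; the exact printed form may vary across Coq versions):

```coq
(* A trivial lemma, proved by computation. *)
Lemma obvious : 1 + 1 = 2.
Proof. reflexivity. Qed.

(* Inspect the underlying proof term. *)
Print obvious.
(* Prints something like: obvious = eq_refl : 1 + 1 = 2 *)
```

The term is a first-class object, so tools can traverse, transform, and compare proofs, not just check them.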

Enter proof reuse, an area that has been around since the 1990s. The key insight behind proof reuse is that we can write tools (languages, plugins, and so on) that derive new proofs from existing proofs!

There are real-life analogues to a lot of work in proof reuse. Let’s consider inverting proofs. Pretty much anyone who has ever written a paper proof has written one that proceeds like this:

We show one direction; the opposite direction follows similarly.

This is a tactic. It’s a tactic that you expect the reader of your proof to have installed in her brain. And it’s a tactic that actually considers the proof object itself. To run this tactic, the reader reads the first half of your proof once again, this time with some variables swapped; this proves the opposite direction. And this is exactly what the inversion component of my tool PUMPKIN PATCH does. You can implement a toy version of it in my Magic tutorial plugin, along with toy versions of tactics for generalizing proofs, factoring proofs into multiple proofs of intermediate lemmas, and reducing complex proofs to simpler ones.
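Here is a hand-written toy version of that pattern in Coq (the lemma names are my own; PUMPKIN PATCH automates the analogous transformation on proof terms rather than asking you to write it by hand):

```coq
(* The forward direction, proved directly. *)
Lemma and_comm_fwd : forall A B : Prop, A /\ B -> B /\ A.
Proof.
  intros A B [HA HB]. split; assumption.
Qed.

(* "The opposite direction follows similarly": reread the proof
   above with A and B swapped -- i.e., instantiate it at (B, A). *)
Lemma and_comm_bwd : forall A B : Prop, B /\ A -> A /\ B.
Proof.
  intros A B H. exact (and_comm_fwd B A H).
Qed.
```

The second proof is not a new argument at all; it is the first proof, reread with the roles of A and B exchanged, which is exactly what a paper proof asks its reader to do.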

So I think that in the real world, proofs are relevant, and that is fantastic. My takeaway from this is that we should consider what other information proofs carry, and use it to automatically extract more information. I think that computers can be a lot better at this than humans! But ultimately, the final information the computer spits out is for a human, and tooling should keep that in mind.

Changing Specifications

This is a minor concern in the scope of the Social Processes paper, but at one point the authors do note that real programs (and their specifications) change over time; verification is useless if it can’t handle these changes. They do indeed change, which is why there is a lot of work on addressing such changes, either by making your formalization resilient to change ahead of time, or by using automatic tooling to adapt it after the specification changes.

There’s some great work here at UW on design principles that can make proofs resilient to change; some of the design principles cross over directly from software engineering. Adam Chlipala (who, for those of you in the class I’m TAing, wrote the book you’re using) advocates for writing powerful domain-specific tactics; updating a proof development to a change in specification is then reduced to updating the automation (or, sometimes, to nothing at all, if your automation is powerful enough), which only has to be done once in a single location instead of over and over again in every proof.
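A sketch of that idea in Coq, with a hypothetical tactic name of my own: the domain knowledge lives in one tactic, and individual proofs just invoke it.

```coq
(* Hypothetical domain-specific tactic: all the proof-search
   knowledge for our toy development lives in one place. *)
Ltac solve_it := intros; simpl; try congruence; auto.

Lemma zero_plus : forall n : nat, 0 + n = n.
Proof. solve_it. Qed.

Lemma nat_eq_sym : forall n m : nat, n = m -> m = n.
Proof. solve_it. Qed.

(* If the specification changes, we update solve_it once,
   rather than patching every proof script individually. *)
```

When a change breaks the development, you strengthen solve_it in one location and re-check, instead of editing every proof by hand.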

But if you’re not perfect and can’t foresee every change, then it’s still OK, because proofs are programs. And that means we can automatically repair them like we do with programs. Actually, we’re really lucky here: traditional program repair is hard precisely because we rarely have full specifications for programs, so it is difficult to tell when a patched program is correct. A famous paper recently showed that test cases alone are usually not enough, and that patches generated by tooling that considers only test cases are often incorrect. But in the world of proofs, we know exactly when a patched proof is correct: It proves the theorem we want it to prove. We have the formal specification most program repair researchers would kill for. So proof repair is, in a sense, easier than other kinds of program repair.
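A tiny illustration of why the specification comes for free (a hypothetical example of my own, not PUMPKIN PATCH output):

```coq
(* Original definition. *)
Definition double (n : nat) : nat := n + n.

Lemma double_S : forall n, double (S n) = S (S (double n)).
Proof.
  intros n. unfold double. simpl.
  rewrite <- plus_n_Sm. reflexivity.
Qed.

(* If double is later redefined (say, as 2 * n), this proof
   script may break -- but the statement of double_S is a complete
   specification of any repair: a candidate proof is correct if
   and only if the kernel accepts it at exactly this statement. *)
```

Contrast this with test-based program repair, where passing the test suite is only weak evidence of correctness; here the oracle is the theorem statement itself, checked by the kernel.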

Work on proof repair is in its infancy, but it is my thesis area, so I hope to help bring it into the mainstream. This is what PUMPKIN PATCH does; I also have some unreleased tooling that I plan to integrate with PUMPKIN PATCH. I would love to see more work on this outside of my own tools. If you’re interested, definitely talk to me.