PL for SE, Part 1: “Programming Languages”

This is the first post in a Q&A series I’m doing for software engineers who are interested in getting into programming languages (PL) research. And it’s a great question to start with because it’s impossible to get into PL research if you don’t know what it is. The name really isn’t self-explanatory.

Q

@drin_montana writes:

Type systems, formal analysis and security related things are all I know of in PL research. What other types of research are under a PL PhD?


A

Any one-line summary of PL research is going to miss the mark, but here’s an attempt at one anyway: PL research is about reasoning about programs. Sometimes we do the reasoning ourselves, sometimes we build algorithms or tools that do the reasoning automatically, and sometimes we make it easier for programmers to do the reasoning.

I once gave a quick summary along those lines to a machine learning researcher, who then responded “so basically, you guys solve the halting problem.”¹ I mean, basically. Pretty much anything interesting you could ever want to know about a program is undecidable. And yet PL is a thriving field. How?

Typically, we explore some trade-offs:

  1. We solve decidable sub-problems of an undecidable problem.
  2. We approximate a solution to an undecidable problem that mostly works in practice.
  3. We limit what we let the programmer do so that we can avoid solving the problem altogether. (This is often actually a really good solution; the sketch below gives a taste.)
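To make trade-off 3 concrete, here’s a minimal sketch in Python (the toy object language is invented for illustration): an expression language with numbers, +, and *, but no loops or recursion, so every program in it halts by construction and the termination question becomes trivial.

```python
# Trade-off 3 in miniature: restrict the language so the hard question
# (termination) disappears. This toy language has literals, +, and *,
# but no loops or recursion, so evaluation always halts by construction.
def eval_expr(expr):
    """expr is an int, or a tuple (op, left, right) with op in {'+', '*'}."""
    if isinstance(expr, int):
        return expr
    op, left, right = expr
    l, r = eval_expr(left), eval_expr(right)
    return l + r if op == "+" else l * r

print(eval_expr(("+", 2, ("*", 3, 4))))  # 14, and guaranteed to terminate
```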

This explanation for sure falls short, though, so let’s get into some examples. I’ll walk through a small selection of topics I know about. I’m not going to pretend it’s an exhaustive list, and it’s definitely biased by the kinds of research I’m exposed to at UW. Also note that this list is constantly evolving, and that there’s some overlap with software engineering research—I’m not going to get into the distinction.

If you want to learn more about any of these topics or see what other kinds of PL research exist, I recommend checking out the programs for some big PL conferences (PLDI 2017 [not yet posted, but hopefully soon], POPL 2017, OOPSLA 2016).

Language Design & Implementation

What it is: This is probably what most people think we do: design and implement programming languages.² This often means developing new abstractions that help programmers express things that are otherwise difficult to express, or that help them avoid common mistakes. Sometimes it’s just a matter of combining and implementing existing abstractions the right way. It’s actually a pretty small part of PL research, but there are people who focus on it, and it intersects a lot with other kinds of PL research.

Recent example: Checked C is a project at Microsoft Research with the goal of making your life less miserable if you have to write C programs. In their words:

Checked C … adds checking to C to detect or prevent common programming errors such as buffer overruns, out-of-bounds memory accesses, and incorrect type casts.

Why you should care: You use these languages every day. No matter how awesome of a programmer you are, there are sometimes things you can’t express easily or mistakes that you’re likely to make. Researchers are trying to make your life easier.

Domain-Specific Languages

What it is: General-purpose programming languages (like C or Java) are great. But sometimes, you can get a lot more power by using a language that is specific to the problem you are solving (say, by using Mathematica for math, or SQL to query a database). Domain-specific languages (DSLs) are designed for these specific tasks.
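A DSL doesn’t even need its own parser; you can embed one in a host language. Here’s a toy sketch in Python of an embedded, SQL-flavored query DSL (all the names are invented for illustration, not a real library):

```python
# A tiny embedded DSL: each method returns a new Query, so calls chain
# together like the clauses of a SQL statement.
class Query:
    def __init__(self, rows):
        self.rows = list(rows)

    def where(self, pred):
        return Query(r for r in self.rows if pred(r))

    def select(self, *fields):
        return [{f: r[f] for f in fields} for r in self.rows]

people = [{"name": "Ada", "age": 36}, {"name": "Grace", "age": 85}]
print(Query(people).where(lambda r: r["age"] > 40).select("name"))
# [{'name': 'Grace'}]
```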

Recent example: Luck is a language that makes it easier to write generators (which generate test inputs) for QuickCheck-style (automatic) testing.
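Luck is its own language, but you can get a feel for hand-written generators in Python with the hypothesis library (a QuickCheck descendant). A small sketch, assuming hypothesis is installed:

```python
import bisect
from hypothesis import given, strategies as st

# A hand-built generator ("strategy"): random integer lists, post-processed
# so that every generated input is sorted, the invariant the test needs.
sorted_lists = st.lists(st.integers()).map(sorted)

@given(sorted_lists, st.integers())
def test_insort_keeps_sorted(xs, x):
    bisect.insort(xs, x)       # insert x while preserving order
    assert xs == sorted(xs)    # property: the list is still sorted

if __name__ == "__main__":
    test_insort_keeps_sorted()  # hypothesis runs many random cases
```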

Why you should care: You already do! DSLs are all around you. They’re often disguised as frameworks or libraries. You can view Spring MVC as a DSL. The same goes for mocking frameworks like Mockito.

Solver-Aided Languages

What it is: Solver-aided languages are (usually domain-specific) languages backed by constraint solvers. These languages let you leave some of the reasoning to a logic solver, so you can write high-level programs and have the logic solver prove your program correct (verification), generate correct code on the fly (angelic execution), or produce correct programs for you (synthesis).
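For a feel of the division of labor, here’s a tiny sketch using the Z3 solver’s Python bindings (not Rosette, but the same spirit): we state what “correct” means, and the solver does the reasoning.

```python
from z3 import Int, Solver, unsat

x = Int("x")
s = Solver()
# To verify that x + x == 2 * x for *every* integer, ask the solver for a
# counterexample; "unsat" (no counterexample exists) means the property holds.
s.add(x + x != 2 * x)
assert s.check() == unsat
print("verified: x + x == 2 * x for all integers x")
```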

Recent example: Rosette is a solver-aided language that makes it easy to write new solver-aided languages (pretty meta). One recent application of Rosette is a solver-aided language that helps programmers generate lots of different inputs for testing (kind of like Luck, but very different under the hood).

Why you should care: For all of the same reasons you care about DSLs and general-purpose programming languages, with the added bonus that constraint solvers are really good at the tedious things you don’t like wasting your time on. Imagine if all of the time you spent thinking about low-level details and writing boilerplate were freed up to spend just on the fun, creative stuff. Your job would be more fun; your code would be better.

Program Synthesis

What it is: An elaboration of the above, but not limited to new languages. Basically, programming is hard, so let’s let our computers do some of the programming for us. Sometimes we give the computer a specification of what our program should do; sometimes we give it some examples and let it figure out the rest.
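As a toy illustration of the by-example flavor (nothing like a real synthesizer, which searches vastly larger program spaces far more cleverly; every primitive here is invented for the sketch), here’s a brute-force search in Python:

```python
from itertools import product

# A tiny space of string-to-string programs: pipelines of these primitives.
PRIMS = {
    "upper": str.upper,
    "lower": str.lower,
    "strip": str.strip,
    "first_word": lambda s: s.split()[0],
}

def synthesize(examples, max_len=3):
    """Return a pipeline of primitive names consistent with all examples."""
    for n in range(1, max_len + 1):
        for names in product(PRIMS, repeat=n):
            def run(s, names=names):
                for name in names:
                    s = PRIMS[name](s)
                return s
            if all(run(inp) == out for inp, out in examples):
                return list(names)
    return None

# The computer "figures out the rest" from two examples:
print(synthesize([("  Ada Lovelace ", "ADA"), ("Alan Turing", "ALAN")]))
# e.g. ['upper', 'first_word']
```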

Recent example: Cassius synthesizes CSS. This probably doesn’t need any motivation for anyone who has ever written a website, but if you’d like some, the talk intro has a nice example.

Why you should care: You may have unknowingly used synthesis tools already (for example, Flash Fill, an Excel feature that fills in spreadsheets from examples). If you haven’t, they will be coming your way soon. I think this is really going to be the future of programming. It doesn’t mean you’ll be out of a job—it just means you’ll be able to spend more time on the things that really matter. So think about any programming task that you find mundane or unnecessarily difficult, something you wish you could just throw at your computer and have it do for you. Synthesis is here to help you with that.

Verification

What it is: Proving programs correct. According to common wisdom there are “two kinds” of verification, but nobody agrees on what those “two kinds” are. Some say the split is automatic/manual: Is the tool reasoning, or is the programmer using the tool to reason? Others say it is classical/constructive: What assumptions does the tool’s underlying logic make? In reality, these are more like axes, so you’ll see automatic-classical, automatic-constructive, and manual-constructive (probably not manual-classical, but who knows, and there are way more than just two flavors of logic).
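For a taste of the manual-constructive corner, here’s a micro proof script in the Lean proof assistant (Lean 4 syntax; a sketch, not tied to any project mentioned here). We state a fact about a program (list append) and prove it by induction, with the machine checking every step:

```lean
-- Manual, constructive verification in miniature: appending the empty
-- list changes nothing. Lean checks each step of the induction.
theorem append_empty (xs : List Nat) : xs ++ [] = xs := by
  induction xs with
  | nil => rfl
  | cons x xs ih => rw [List.cons_append, ih]
```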

Recent example: Neutrons verified a radiotherapy system (and found bugs in the process). In the past, programming bugs in (unverified) radiotherapy machines have actually harmed patients with radiation overdoses.

Why you should care: If you write code that’s important in any way, you probably want it to be correct. Even if your bugs don’t physically harm customers, they might impact customer security and privacy. Testing gives you some weak guarantees, but imagine the peace of mind of knowing your program is correct. (It also means getting paged less often.)

Testing

What it is: Sometimes you really don’t need full guarantees that your programs are correct, so you write some tests. Writing good tests is hard, though. Testing research aims to make that easier, usually through automation (whether partial or full), but sometimes through new abstractions (as in Luck, which we talked about earlier). The meat of this is software engineering research, but testing papers still show up in PL conferences.

Recent example: PBNJ makes it easier to write mock objects. It uses constraint solvers under the hood.
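If you haven’t met mock objects before, here’s the flavor with Python’s built-in unittest.mock; note how much behavior you script by hand, which is the kind of setup PBNJ aims to ease (with a solver under the hood):

```python
from unittest.mock import Mock

# By hand, you script every behavior the test will exercise.
db = Mock()
db.lookup.return_value = {"name": "Ada"}

def greet(db, user_id):
    return "Hello, " + db.lookup(user_id)["name"]

assert greet(db, 7) == "Hello, Ada"
db.lookup.assert_called_once_with(7)  # and check how the mock was used
```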

Why you should care: Oh, I’m sure you already care. I remember scoping out projects with my TPM and, in some cases, setting aside as much time for testing as for writing code. Should you really spend that much time writing tests? Sometimes you have to, but you probably shouldn’t have to.

Static Program Analysis

What it is: Sometimes you don’t quite want to prove things correct, but you also don’t want to just run a bunch of tests and call it a day. There are a lot of topics that fall in between the two approaches, and I’m not going to write about all of them, but static program analysis is a really big one worth mentioning. Basically, static program analysis approximates your program in some way, then proves some properties about the approximation. You can use this to find bugs, but even for very simple bugs (like divide by zero) you’ll typically have at least one of two problems: sometimes your analysis will report bugs that aren’t really bugs, and sometimes it will fail to report real bugs. That’s usually OK, though.
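Here’s a toy sketch of the idea in Python: “run” expressions on abstract values instead of concrete integers, and warn whenever a divisor might be zero. Like real analyses, it over-approximates, so some warnings can be false alarms.

```python
# Abstract values: all we track about an integer is whether it's zero.
ZERO, NONZERO, TOP = "zero", "nonzero", "top"  # TOP means "could be anything"

def analyze(expr, env):
    """expr is an int, ('var', name), or (op, left, right) with op in '+-*/'."""
    if isinstance(expr, int):
        return ZERO if expr == 0 else NONZERO
    if expr[0] == "var":
        return env[expr[1]]
    op, left, right = expr
    analyze(left, env)               # recurse for warnings on the left side
    divisor = analyze(right, env)
    if op == "/":
        if divisor == ZERO:
            print("error: definite divide-by-zero")
        elif divisor == TOP:
            print("warning: possible divide-by-zero")
    return TOP  # we don't try to be precise about arithmetic results

# x is user input, so all the analysis knows is TOP; it warns conservatively.
analyze(("/", 1, ("var", "x")), {"x": TOP})
analyze(("/", 1, 0), {})  # a definite bug, caught without running anything
```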

Recent example: FindBugs is an analysis tool that finds bugs in Java programs. I’ve seen it actually used in industry.

Why you should care: I think I’ve motivated the “bugs suck” angle a lot, so here’s another reason: Every time you compile a general-purpose language, the compiler is already doing this for you and using it to optimize your code. This is why you don’t have to think about all of the low-level optimizations you learned about in undergrad that make your code fast but messy. The compiler makes your code fast, but you never have to see the mess.
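You can even watch a miniature version of this in ordinary CPython, which constant-folds expressions when it compiles them to bytecode (the exact instructions vary across Python versions):

```python
import dis

# The multiplication never happens at runtime: the compiler already
# folded 2 * 3 into the constant 6, as the bytecode shows.
dis.dis(lambda: 2 * 3)
```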

Type Systems / Type Theory

What it is: This is way down in “core PL,” and focuses on reasoning about the type systems that programming languages use. Type systems are really a way of limiting what the programmer can do so we can better reason about our programs. Ideally you want a type system that is expressive enough to do everything you want to do (imagine writing Java code without being able to make a List of Integers). But as you make your type system more expressive, it may become harder to reason about, or it may demand extra effort from the programmer.
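For a quick taste in Python: the annotations below are optional, but an external checker such as mypy can use them to reject the commented-out line before the program ever runs (the function is invented for illustration):

```python
from typing import TypeVar

T = TypeVar("T")

def first(xs: list[T]) -> T:
    """Generic over the element type: one function for every kind of list."""
    return xs[0]

n = first([1, 2, 3])     # a checker infers: int
s = first(["a", "b"])    # a checker infers: str
# n + s                  # rejected ahead of time: can't add int and str
```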

Not-so-recent example: I’m going a little older here, since it takes a while for type theory results to reach the real world (but trust me, they do). It’d be great to automatically infer types without the programmer ever having to write them. It turns out this is undecidable for a pretty simple language. But! (OK, if you can’t tell, I really like this stuff.) By placing just a few restrictions on the type system, it’s actually decidable. ML and older versions of Haskell both use this result.

Why you should care: You probably don’t want your type system holding you back when you write programs. But you also probably don’t want your programs mysteriously failing at runtime because of a simple type error that you could have prevented ahead of time. Type theory research gets very abstract very quickly, but ultimately it still does reach you (sometimes decades later) in the form of more expressive languages and more reliable programs.


  1. At the time, I was working with Byron Cook’s group at Amazon. Byron Cook is well-known for literally solving the halting problem (in practice).
  2. Often the first question we get when we say we study programming languages is, “what language?”

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.