Symbolic Compilation for GCL

A Beginner's Guide to GCL and the Weakest Precondition

Posted on May 10, 2023 by Eric Hayden Campbell

Dijkstra’s Guarded Command Language (GCL) has become a staple of automated verification. I often use it as an intermediate language in my verification tasks, primarily because it has a well understood compilation algorithm to formal logic. It’s very similar to the IMP language that we study in introductory PL courses, but it has a key difference: nondeterminism.

GCL models the core elements of imperative programming languages: variables, branching control flow and iteration. The language also incorporates some logical constructs used for verification tasks: assertions and assumptions.

The first and simplest thing you can do in GCL is assign an expression (e) to a variable (x) like so:

x := e

Here, the expression language can really be anything (arithmetic, bitvectors, booleans), but here we’ll assume that they range over fixed-width bitvector expressions. The semantics of such an assignment is the following: first, e is evaluated to some value v, then the state is updated so that x has value v.

We can also make assertions about the program state. For instance assert b crashes the program when b is false and has no effect when the program fails. In answering verification questions, our goal will be to statically show that no assertion ever fails.

The first mechanism we have for constructing larger programs from smaller ones is sequential composition, which we write using a semicolon (;) as in Java or C. If we have two smaller programs c1 and c2, we construct their composition by sequentially composing them—i.e. by writing c1;c2. This code first executes c1 and then c2 on the resulting state.

The second mechanism for constructing larger programs is the eponymous guarded command. A guarded command is a sequence of boolean guards b1, ..., bn and corresponding programs c1,..,cn written as follows:

if  b1 -> c1
    b2 -> c2
    ...
    bn -> cn
fi

Very similar to an if statement, the above guarded command will only run ci if its guard bi evaluates to true. However, different from traditional if-statements, all of the guards are evaluated simultaneously and of those that are true, one is nondeterministically chosen. For instance, consider the following code where both guards are true.

i = 1;
if  i < 2 -> j := 0
    i < 3 -> j := 99
fi 

The semantics of this GCL program is to nondeterministically chose either j :=0 or j:=99 to execute, that is it’s equivalent to

if  true -> j := 0
    true -> j := 99
fi

If we want to implement the standard disjoint semantics for if-statements we need to negate the conditions of the previous rows. That is, to express

if (b0) { 
  c1
} elseif (b1) { 
  c2 
} else {}

we would need to write the following guarded command

if  b0         -> c1
    !b0 && b1  -> c2
    !b0 && !b2 -> skip
fi

Here, skip is simply the command that does nothing.

And that’s all there is to GCL! It’s a very simple yet powerful language. Dijkstra’s original GCL does have a mechanism for iteration. I’ll discuss how loops work in a future post.

I do find that this version of GCL has some unnecessary complexity. I prefer the a more algebraic presentation of the same language.

GCL with an Algebraic Flavor

A more algebraic presentation of GCL breaks down the guarded command operation into two operations, the assume b operator, and the binary nondeterministic choice operator c1 [] c2 and the assume b operator. Here’s how they work:

The choice operator c1 [] c2 uses demonic nondeterminism to choose between terminating executions in c1 and c2. The demonic modifier here describes the behavior of choice with respect to assert statements. It’s a kind of worst-case behavior where the whole program crashes if either c1 or c2 crashes. (Conversely, in angelic nondeterminism, c1 [] c2 only crashes if both c1 and c2 crash).

The assume b command behaves similarly to the assert operator, except instead of crashing the program when b fails, it runs an infinite loop. This semantic design is important for the fact that nondeterministic choice only considers terminating executions. For instance, a GCL programmer can disable a non-deterministic branch by assuming false. That is the following two programs are equivalent on their terminating executions.

x := 5 [] (assume false; x := 7)
===
x := 5

Let’s see why. The right-hand branch in the first program (assume false; x:=7) is a non-terminating loop because false will never evaluate to true, causing assume false to spin in an infinite loop. The left-hand branch simply assigns x to 5, which means the only terminating execution that [] can choose from is x:=5. So the programs are equal. With these two operators we define GCL using a slightly more algebraic flavor (which is my preference):

   assume : Bool -> GCL 
   assert : Bool -> GCL
   ( := ) : Var -> Expr -> GCL
   ( ; ) : GCL -> GCL -> GCL
   ( [] ) : GCL -> GCL -> GCL

Note that we can recover the guarded commands using these operators by implementing guarded commands as follows:

(assume b1; c1) []
(assume b2: c2) []
…
(assume bn; cn)

(Note that [] is associative)

Aside. I’ve found both the guarded command presentation and the GCL presentation to be ergonomic in different settings. For instance, guards were convenient in modeling concrete tables in Avenir, but I’ve found choice to be more ergonomic in doing path splitting. Predicate Transformer Semantics

As I mentioned earlier, the reason we want to use GCL is to perform logical analyses of programs. We can compile a GCL program into first order logic using something called predicate transformers, which are two ways of giving formal semantics to GCL programs. They are called the weakest precondition and strongest postcondition semantics. Here we’ll only talk about the weakest precondition, and an efficient variant of the strongest postcondition formalized by Cormac Flanagan & Jonathan Saxe in 2001.

The Weakest Precondition

The weakest precondition P is defined with respect to GCL command c and a postcondition Q and must satisfy two conditions: first, that if a state satisfies P, running c in that state produces an output state satisfying Q, and second, that if some other formula P’ satisfies the first condition, that P’ => P. The first property characterizes the correctness requirement for P to be a valid precondition of c and Q at all, while the second property characterizes what it means to be weakest: that every other valid precondition is stronger than or equal to P’.

This is best explored with an example. Consider the postcondition x > 3 and the command x := x + 1. Consider a few candidate preconditions below:

  1. x = 0
  2. x = 3
  3. x = 3 || x = 4
  4. x > 2

The first formula x = 0 is not a valid precondition, because incrementing x when it is zero produces an output where x = 1, which is less than 3, violating the desired postcondition. The second formula is a valid postcondition, since 3 + 1 > 3, however it is not the weakest, since x = 3 || x = 4 also is a valid postcondition and x = 3 => x = 4. The fourth condition is the weakest precondition. Notice that x = 3 => x > 2 and x =3 || x=5 => x > 2.

The Weakest Precondition Function

The cool thing about the weakest precondition is that Dijkstra came up with an algorithm in the 70s for computing them for terminating executions! This is often called “weakest liberal preconditions” As in the semantics, the “terminating” caveat is important to reason about assume statements.

The weakest liberal precondition function, written P = wlp(c, Q) returns P, the weakest liberal precondition of c w.r.t Q. We’ll define the weakest precondition recursively over each variant of GCL commands. The first case is when c is assignment:

wlp(x:=e, Q) = Q[e/x]

Here Q[e/x] represents the formula Q with every occurrence of x replaced by e. The justification here is that Q is reasoning about the state after the assignment occurred, when x = e, but before the assignment occurred, x could have been anything, so we need to eliminate all facts about x by replacing it with an equivalent expression e.

The next case is for assertions, and is straightforward.

wlp(assert b, Q) = b && Q

Asserting a condition b means that we must prove its truth at that point to avoid errors—it must be conjoined to postcondition.

The case for assumption is a bit tricky semantically, but easy from a terminology perspective. Assumptions and implications have a tight connection in proof theory, so why not also in program verification?

wlp (assume b, Q) = b => Q

The reason for this is a bit subtle and again relies on the liberality of our preconditions. In only considering terminating executions, we are only concerned with executions that pass the assumption, i.e. for which b holds, which means that rather than requiring b to hold, we assume that it does, and disregard all other (non-terminating) inputs falsifying b.

The rule for sequential composition simply propagates the precondition backwards through thr program:

wlp (c1;c2, Q) = wlp(c1, wlp(c2, Q))

Notice that it’s very common to say “the” weakest precondition because weakest preconditions are “unique”. The proof of this is very simple. If both P and P' are weakest preconditions, then P => P' and P' => P which means they are logically equivalent and therefore unique. The nuance here is that P and P' may be syntactically very different even if they are semantically the same. For instance, conditions 4 and 5 above are semantically identical but syntactically different. They are still both “the” weakest precondition.

A Simple Symbolic Compiler

Now that we have the weakest precondition its quite easy to build a symbolic compiler for GCL programs. A symbolic compiler simply produces a formula φ that describes the relational sematnics of the input program.

To do this we’ll take the weakest precondition of a program c with respect to a special formula X = X', where X is the set of variables that occur in the program c, and X' is the same set of variables given fresh names. Morally, X represents the input state and X' represents the output state. For instance in the program x := 1, X = {x} and X' = {x'}. Then, the symbolic relational semantics of the program is x' = 1. This characterizes the notion that for all input states, the output state has x ↦ 1.

So, the implementation of a simple symbolic compiler is

wp(c, X = X')

And we’re done!

In future posts I’ll discuss other kinds of symbolic compilers and how we can make them fast