Blog

Before moving ahead to functions, let’s spend a bit of time looking at
type checking when adding Booleans to BAE. Let’s define a new
language BAE with Booleans (`BBAE`

) that simply adds the same boolean
operators added to `AE`

to get `ABE`

. Let’s quickly walk through our
standard methodology for extending a language and add at the same time
add type checking. We’ll treat this almost like an exercise with some
quick commentary, but not a great deal of detail.

The first step is extending concrete syntax and adding new values if
necessary. The concrete syntax for `BBAE`

is literally the concrete
syntax of `ABE`

and `BAE`

composed:

Similarly, there are no new values that do not appear in `ABE`

and
`BAE`

. We simply compose the values from those to languages:

We now have a syntax for `BBAE`

to work from. On to the parser and
pretty printer.

The abstract syntax of `BBAE`

follows immediately from its concrete
syntax. One can simply copy-and-paste the elements of `ABE`

and `BAE`

into a single AST resulting in:

A quick check (no pun intended) reveals that we have an abstract syntax element for each language element. One can literally walk through the concrete syntax and map constructs to their constructors.

The parser, pretty printer, and generator for `BBAE`

are simple
extensions of `ABE`

and `BAE`

. For brevity, we’ll move on without
including them in the discussion. However, you’ll find code for all
three in the `BBAE`

source files. Note that the `ABE`

generator must
be modified to account for keeping track of generated identifiers.
This is a simple addition, but easy to miss.

Like the syntax definition, the definition for eval composes rules
from `ABE`

and `BAE`

. Literally nothing changes:

How does this work? We’ve combined two languages with their own
constructs by simply combining rules. How do they know about each
other during interpretation? The trick is in the definition of `t`

and its associated abstract syntax. As long as we have a rule for
every abstract syntax element, recursive calls to the interpreter take
care of integrating the languages. Said differently, in virtually
every rule we can interpret subterms of terms by simply calling the
interpreter *without regard to the specifics of the subterm*.
Recursion is your friend here.

The property that allows us to compose languages in this way is
*orthogonality*. The valuation rules for one expression do not
interact with other specific expressions. For example, while `isZero`

uses the value of its argument expression, that expression can be any
properly formed language expression. Orthogonality is an important
design principle that dramatically simplifies language design and
implementation.

We now have a mathematical definition for the evaluation relation.

Now the implementation details for the evaluator. Here things are not quite as simple as composing previous interpreters. We’ll need to spend some time thinking through how we want our interpreters to behave before diving into the code.

What should the interpreter return? Previously we’ve implemented interpreters that return:

- Haskell values
- AST values
`Maybe`

AST values

The plan is to eventually write a type checker for this language. AST
values are sufficient for this `eval`

implementation because we hope
to predict success before execution. Haskell values would be fine,
but we want to use our earlier technique for implementing error
messages. The `Maybe`

implementation was specifically for catching
errors at runtime and that is not required for a type checking
interpreter.

What about the environment? `Env`

remains unchanged from previous
implementations because the environment remains a simple list of
identifier/value pairs. Nothing changes other than the type stored in
the environment changing to `BBAE`

:

This gives us the expected type signature for the `eval`

function:

`eval`

will accept an environment and abstract syntax term and produce
an abstract syntax term.

Evaluation cases for each AST element are defined by evaluation rules
and taken verbatim from the `ABE`

and `BAE`

interpreters. Again,
orthogonality helps us out. The resulting function looks like this:

Interestingly, `eval`

will only return `Nothing`

when variable lookup
fails. `Nothing`

doesn’t even appear directly in the code. Even more
interesting is type checking will assure that all variables are bound
and `lookup`

will never retrn `Nothing`

. We use the monad simply for
sequencing `eval`

steps.

The interpretation final function that composes the interpreter and
parser is identical to that for `BAE`

. Specifically, call the parser
and evaluate the result with an initial, empty environment:

We now have an interpreter for `BBAE`

that requires testing. By
defining `interp`

in the same manner as `BAE`

, the test function for
`eval`

literally does not change:

With everything in place, we evaluate over just a few cases and
quickly learn that `eval`

without any dynamic error checking or static
type checking fails. Hopefully this is not surprising to you.
The next step is defining and implementing a type checker to
statically determine if a `BBAE`

expression will execute.

Type rules for `BAE`

and `ABE`

expressions remain largely unchanged in `BBAE`

until we get to identifiers. Numbers remain of type $\tnum$ and Booleans $\tbool$:

Unary and binary operations similarly remain the same:

\[\frac{t_1 : \tnum,\; t_2 : \tnum}{t_1 + t_2 : \tnum}\; [PlusT]\] \[\frac{t_1 : \tnum,\; t_2 : \tnum}{t_1 - t_2 : \tnum}\; [MinusT]\] \[\frac{t_1 : \tbool,\; t_2 : \tbool}{t_1 \aand t_2 : \tbool}\; [AndT]\] \[\frac{t_1 : \tnum,\; t_2 : \tnum}{t_1 \lleq t_2 : \tbool}\; [LeqT]\] \[\frac{t : \tnum}{\iisZero t : \tbool}\; [isZeroT]\] \[\frac{t_0 : \tbool,\; t_1 : T,\; t_2 : T}{\iif t_0 \tthen t_1 \eelse t_2 : T}\;[IfT]\]The new concept in this language is identifiers introduced by `bind`

and used anywhere in expressions. How do we determine the type of
identifiers? To get an idea, recall how `eval`

handles `bind`

using
an environment. when evaluating a `bind`

:

`x`

takes the value `v`

in `b`

. We use an environment to hold
bindings of identifiers to values. Can we do the same thing with
types of identifiers? Specifically, create a environment-like
structure that maintains bindings of identifiers to types and simply
look up type in that structure?

The environment-like structure that contains types will be called a
*context* and is frequently represented by $\Gamma$. Like the
environment, it is a list of pairs and behaves in exactly the same
fashion. The difference is each pair consists of an identifier and
type rather than identifier and value.

We will use the notation $(x,T)$ to represent the binding of a variable to some type and the Haskell list append $(x,T):\Gamma$ to represent addition of new binding to a context. $(x,T)\in\Gamma$ represents finding the first instance of $(x,T)$ in $\Gamma$. Finally, $\Gamma$ is included in type inference rules using the notation $\Gamma\vdash t : T$. This notation is read “$\Gamma$ derives $t$ is of type $T$.” It is interpreted as using $\Gamma$ to define the types of identifiers in term $t$.

Using this new machinery the rule for `bind`

adds the newly bound identifier with its type to the context and finds the type of the `bind`

body:

The type of the body becomes the type of the `bind`

. This makes sense
as the body is what gets evaluated when the `bind`

is evaluated. It is
also interesting to think of `bind`

as a special expression that
simply adds to the environment or context.

One more type rule for identifiers is needed to finish the definition. It simply looks up a type binding $(v,T)$ in $\Gamma$ and asserts that $v:T$ is true when the type binding is found:

\[\frac{(i,T)\in \Gamma}{\Gamma\vdash i : T}\;[IdT]\]This lookup is quite similar to the lookup used when the environment is used. In fact, the functions for manipulating the environment and context are identical.

To integrate our new type rules for `bind`

and identifiers we need to include the context in all type rules. Thus, the previous rules for terms become:

Now we implement the `typeof`

function. First the standard data type
for the two types expressions defined for `BBAE`

:

Now we build out the type checker from the type checking rules. Like
the `eval`

function earlier, virtually all type checking code comes
from the `BAE`

and `ABE`

type checkers.

`typeof`

accepts a context and AST and returns either an error message or a type:

Note that we’re again using the `Maybe`

construct to return the type
or `Nothing`

to indicate en error. The `typeof`

signature is eerily
similar to the `eval`

signature. We’ll come back to that later, but
it’s worth thinking about why that’s the case.

Here’s the complete code for the `typeof`

function:

There’s really nothing to see here. Our technique for defining languages and specifying their evaluation and type characteristics makes it simple to compose orthogonal language elements. Things will get harder as we add other constructs, but for now things are pretty easy.

Could it be that testing the `BBAE`

evaluator is as simple as
composing the `ABE`

and `BAE`

QuickCheck capabilities? Fortunately,
it is. We simply need to generate arbitrary terms that include terms
from both the `ABE`

and `BAE`

language subsets.

As usual, we first make the `BBAE`

data type an instance of
`Arbitrary`

and define the `arbitrary`

function. The use of `genBBAE`

is where the generator for `BBAE`

is integrated:

The hard work for the generator itself is already done. First, we’ll
define generators for the numeric elements of `BBAE`

:

Now the elements that involve both numbers and booleans:

Finally, the elements that involve identifiers and their values:

Now we simply put everything together into a single generator:

What changes here from the previous generators is the list of
generators passed to `oneof`

. Instead of choosing only expressions
from the numeric expression or the boolean expressions, we now choose
from both. For values, things work similarly choosing from both
Boolean and numeric values.

Three things worth thinking about are happening in this chapter. First, we are going from beginning to end defining a new language. We started with syntax, defined evaluation and implemented an evaluation function, defined a type system and implemented a type checker, and concluded with QuickCheck functions for testing. This is the methodology we use for defining languages at work.

Second, we see how composition of orthogonal language constructs makes this reasonably simple. How many times in the chapter did we talk about composing elements of previous interpreters? Usually that was a simple cut-and-past operation. The only real exception being the arbitrary test case generator where we had to compose the language subsets.

Finally, we introduced the concept of a context that behaves like an environment for types. An interesting difference is the environment is a runtime construct while the context is used during static analysis. The reason is values cannot be known until runtime, but types are known statically. This is not true for all languages, but is true for our languages.

Remember these things moving forward. The step-by-step approach for defining a language and the composition of orthogonal language constructs are important design concepts. Context is something we will continue to use throughout our study.

- Context - list containing bindings of identifiers to types defining identifiers currently in scope.

- Write the function
`evalErr :: Env -> BBAE -> Either String BBAE`

for for`BBAE`

that returns either a string error message or a value following interpretation. - Compare the
`evalErr`

function with the`evalType`

function using the same techniques used for`ABE`

- Prove that the
`error`

function will never be called in`eval`

given that all expressions given to`eval`

successfully type check. - Rewrite
`typeof`

using`Either`

as a monad.

Download source for all interpreter code from this chapter.