The first significant addition we will make to our language is identifiers using a
bind expression. The
bind expression is like a let. It defines a value and binds it to an identifier that may then be used in a subsequent expression. Consider this trivial example defining
x to be
bind expression creates the identifier,
x, assigns it the value
5+2 using the concrete syntax
x = 5+2, and defines a region where
x can be used using the
A general definition for
bind would be $\bbind i=v\; \iin b$ where $i$ has the value $v$ in $b$ and returns the result of evaluating $b$.
Identifier instance, binding and scope are concepts used throughout language specification to describe identifier definition and usage. An identifier instance is any time that identifier is used. In the following simple example there are two instances of
x and one instance of
A binding instance is the instance where an identifier is declared and given a value. There is one binding instance for
x in the previous example following the
x = 5+2 Generally,
bind specifies a binding instance immediately following the of the form $i = v$ where $i$ is a new identifier and $v$ is an expression providing the new identifier’s value.
An identifier’s scope is the program region where an identifier can be used. We say the scope is where the identifier is bound. The
bind expression defines the scope of the identifier it creates as the expression following the
in keyword. In the previous example the scope of
Putting everything together, the
defines a new identifier $i$ with the value of $a$ defined in scope $s$.
A bound instance of an identifier is use of the identifier in the scope if its binding. A free instance of an identifier is use of an identifier outside the scope of its binding. In the example above,
x is bound in the body
y is free. There is a binding instance for
x following the
bind keyword, but none for
Nesting allows definition of multiple identifiers that will be simultaneously available. Each
bind creates a binding instance for a single identifier, thus the only way to have multiple identifiers is besting. As expressions,
bind can appear anywhere that a
BAE expression is allowed.
bind expressions can be be nested in simple ways with expected results:
Here the scope of
y is the expression following
x+y-4. The scope of
x is the expression following
in, the nested
bind expression. Thus,
y are in scope in
Bound variables may also be used in definitions of bound identifier values:
x is defined and given the value
4 that can be used in the nested
bind expression that defines
y. The expression
5+x defines the value of
y using the previously defined
x. The expression
x+y-4 refers to both the definition of
x and the nested definition of
There are yet more interesting ways to nest the
bind expression whose differences can be quite subtle. The expression:
y and nests a definition of
x in another definition of
x. Reflecting on what traditional languages do, the inner
x will shadow the outer
x in its scope. The outer value is used in the definition of the inner
x value, but the inner
x is used in its defined scope.
Now a slight change placing parens around
x+y-4. What does this do? How does it change the result of evaluating the
The parentheses close the scope of the inner
x before the last line. The body of the innermost
bind changes from
x+y-4. The final x is associated with the outer
bind that defines
This is not a good way to define the semantics of
bind. It is confusing and ambiguous. We need an alternative definition that captures how
bind evaluates in a precise manner.
The concrete syntax for this new language is a simple extension of
AE that includes identifiers and
bind. Note that we are back to
AE without Booleans:
From the concrete syntax we can quickly define a Haskell data type for the abstract syntax:
The additions to
AE that give us
BAE are the
Bind constructor and the
Id is defined over a string in the same way that
Num is defined over numbers.
Bind accepts a string representing the new identifier, an expression representing the identifier value, and a scope for the identifier.
BAE expressions is a simple extension of parsing
AE. We need only add parsers for the and
Both cases are routine.
identExpr simply calls the built-in
identifier parser and returns the result lifted into the
BAE AST using
bindExpr calls the identifier and
expr parsers in sequence then
Bind to create the term AST. Both cases are integrated into the main
term parser in the same way as other expressions:
In the previous description, I used the term lift for the first time. When we lift a term we are transforming a term from the host language into the external language. The opposite of lift is lower.
As one would expect, pretty printing
BAE requires adding only two cases to the
BAE pretty printer:
Testing the parser and pretty printer is done exactly as it was with
BAE. We’ll skip that here, but it is included in this chapter’s source code.
The answer to precise definition is the mathematics of inference rules. To define
bind evaluation we will add one new inference rule:
$BindE$ is not significantly different from earlier evaluation rules with the exception of a new notation for substitution. The antecedent requires the argument, $a$, evaluate to $v$. The consequent uses the standard notation $[i\mapsto v]s$ for specify that $i$ is replaced by $v$ in $s$. Specifically, $[i\mapsto v]s$ is defined as $s$ with all free instances of $i$ replaced by $v$. When evaluating
bind itself falls away and instances of its identifier are replaced with its value.
Let’s spend a bit of time to understand the definition of substitution and why it is the operation we want. The notation $[i\mapsto v]s$ says that $i$ is replaced by $v$ in $s$. A simple derivation of the value of
bind x=5 in x+7 works like this:
The $BindE$ rule transforms
bind into a substitution by dropping the
bind and performing substitution. Dropping the
bind and binding instance makes any bound instance free. In the previous example,
x is a bound instance in the
bind expression. When
bind is dropped,
x becomes free. Substituting
[x->5]x+7 now replaces free
x instances with 5 resulting in
Let’s try another example with a nested
Once again the outermost
bind is dropped and substitution performed over the term
x + bind x=7 in x. Only the first instance of
x is free - the second instance is in the scope of the nested
bind. If we didn’t include the condition that identifiers must be free when replaced, the second
x would be replaced with
the first even though it is a different identifier instance.
A third example uses the bound identifier in the definition of a nested identifier:
x used in defining a value for
y becomes free when the binding instance for
x is dropped. Thus, the free
x in the defining expression is replaced by 5.
BAE we need to first define substitution. The function
subst x v s will implement the substitition $[x\mapsto v]s$. Once again we treat our program as a data structure and define
subst over the
BAE data type.
The cases for numbers and binary operations are trivial. Substitution has no effect on numbers as they represent constant values. For addition and subtraction, we simply call substitution recursively on the two expression terms.
Substituting over an identifier compares the identifier name with the name being substituted for. If the names are the same, the identifier is replaced with the substituted value. If the names are not the same, the identifier is left alone.
How do we determine if an identifier is bound or free? This is determined by where the identifier occurs and not the identifier itself. Then
Bind case takes care of this by turning off substitution for the binding instance it creates. If the identifier being replaced is the same as the bound instance, then no substitution is performed in the
bind body. If the identifier being replace is not the same as the bound instance, substitution is performed on the bound body. Substitution is always performed on the binding instance’s value expression.
Note that the scope of an identifier bound by
bind is the expression following the
in keyword. The value expression is explicitly not a part of the scope. Aside from impacting substitution, this means that a bound instance cannot be used in its own definition. For example:
will not evaluate because the
x in the value for
x is free. It has no binding instance and will never be replaced during evaluation.
subst defined we can easily define
evals, an evaluator that performs substitution is specified by inference rules. Cases for
Minus are unchanged from
AE. The case for
Bind is implemented by substitution. Specifically, the defined identifier is replaced by the value expression in the
The interesting case is for
Id where an error is raised for an undeclared variable. Why is this? In this interpreter, identifiers are replaced immediately following their definition.
An interpreter composes the
BAE parser with our evaluator:
To test our interpreters, we need to extend the generator use for
AE to include
bind and identifiers.
Generating random names can be done in several ways. Rather than generating arbitrary length strings, let’s generate single character identifier names using
choose to select a character and return the character converted to a string:
genName will select a name from the range a to z and return it as a string.
Generating an ID from a name is done in the same way we generated numbers from integers. Simply take a string,
n and return
We can integrate both the
Bind generators into the arbitrary expression generator by adding them to the
oneof argument lists for the base case and inductive case respectively.
Now for a QuickCheck test:
What happens is an almost immediate testing failure resulting from an undefined identifier. An example
BAE instance that causes such a failure is:
It is almost impossible to get QuickCheck to find an arbitrary term without a free identifier. Our arbitrary
Id generator produces
Id instances without regard to whether the
Id is a bound instance. No term will evaluate unless its identifiers are all bound. What we want is a generator for arbitrary bound instances.
Thankfully this is not a difficult feature to add. What we will do is input a list of bound identifiers to the
BAE term generator:
genBind will still generate an arbitrary symbol as its binding instance name. When
genBAE to generate its body, the new identifier is added to a list of previously bound identifiers.
genId is called it is given a list of bound identifiers. Rather than using
choose over a range of characters, the
elements generator is used to select an arbitrary element of the bound identifier list:
Our new arbitrary term generator produces only terms that include bound identifiers. We can QuickCheck
evals to determine if it crashes as we did earlier versions of
Let’s assume the interpreter defined by
interps is a faithful implementation of the
BAE language and examine how it executes. Each time a
bind is encountered,
subst on the body of the
bind expression. Although this is technically correct following precisely what the semantics require, it is not efficient.
As an example, consider this intentionally obnoxious code fragment from
To execute this code,
evals would start by replacing
w with 5 throughout the body of the outer
let. This results in:
bind body and value expression are visited and
w replaced with
5. Now we evaluate the outer
bind body the same way, replacing
x by 12.
Do you see the problem? Every inner
bind body and value expression are visited a second time to perform the second replacement. The process now repeats for
z causing the innermost body to be processed 4 times and the innermost
bind value expression 3 times. For this tiny code fragment, there is no problem. For complex code, imagine visiting the entire executable program once for each identifier evaluated! We must find a more efficient way.
As inefficient as they are,
evals as are still useful.
evals define a normative interpreter for
BAE. Normative in the sense that the interpreter is correct, but is not optimized or made efficient in any way. It serves as an ideal model for evaluating
BAE. In some communities normative implementations are called golden and used to evaluate other implementations. We’ll see that at work when we test our interpreters.
evalsErrthat uses the
Eitherconstruct to return either a
BAEconstruct or an error message. How many different error messages need we return?
Download source for all interpreter code from this chapter.