The substitution mechanism implemented in evals
is a faithful
implementation of our eval definition, but it is not optimized in any
way. Let’s try a far more efficient implementation of the BAE
interpreter that waits to perform substitution until an identifier is
evaluated rather that immediately substituting when an identifier is
bound. To understand the approach, consider a simple example from the
previous chapter:
Using substitution, we saw that when evaluating bind x=4
all
occurrences of x
in scope were replaced immediately by 4
. Instead
of immediate substitution, let’s simply remember that in the outermost
scope, the value of x
is 4:
Evaluating the bind calculates the value of x
, but does not
immediately substitute. It simply remembers that x
is 4
. The
notation [(x,4)]
depicts an environment containing the binding
of x
to 4
. The decorations after each in
show the environment
in each bind
’s scope. Each bind
adds a binding to the current
environment. As shown above, following the first in
, the
environment contains the binding of 4
to x
.
Evaluation does the same thing for the inner bind, but this time it
remembers that y
is bound to 5
by adding it to the front of the
list representing the environment:
After the second in
, the environment contains bindings for both y
and x
. As we will see later, order is important here. New bindings
are always added to the beginning of the environment.
Now we are ready to evaluate the body of the innermost bind
. The
interpreter evaluates each identifier by simply looking it up in the
current environment:
If constructed correctly, the environment should contain a binding for
every bound instance. Unlike evals
, this new evaluator visits each
element of the expression one time making evaluation far more
efficient than constantly walking the code to perform substitution.
Summarizing, an environment is a data structure containing
bindings generated from binding instances. As of now, binding
instances are only create by bind
expressions, thus environments are
altered only by executing bind
. A binding is simply an
identifier/value pair. A binding is added to the environment each
time a bind
expression is evaluated.
Before charging into a new eval
implementation, let’s revisit
examples from the previous chapter to get a feel for using
environments. First, let’s look a the remaining two examples from the
previous chapter.
The first example has 3 bind
instances that add to the environment.
The first binds y
to 4, thus the y
immediately following will
evaluate to 4. The second binds x
to 4. y
will evaluate to 4
based on the binding from the outermost bind
to give x
its value.
The third bind
once again adds a binding for x
, but this time to
x+2
. The x
in the value expression evaluates to 4
based on the
environment at that point. Evaluation of the third bind
has not
completed, thus the new value for x
has not been added. When we do
reach the body of the innermost bind, the environment includes two
bindings for x
- one for the current scope and another for the
previous scope. In such cases the innermost instance is correct.
Searching for x
from the beginning of the list gives exactly the
result we want. Including the first y
, the resulting expression to
be evaluated is 4+6+4-4+6 == 16
. While the last x
is on its own
line, it is still in the scope created by the innermost bind.
Let’s make a small change and close the innermost bind
before
evaluating the last x
. Everything proceeds as before except the
final evaluation. When the closing parenthesis ends the expression
defined for the innermost bind
, it places the last x
in the scope
created by the second bind
.
In the example, the environment from the second bind
is repeated for
the last x
for emphasis. It is not created by somehow removing the
outermost binding for x
, but is simply returned to when the
expression x+y-4
finishes evaluating. Thus the result is 14 rather
than 16.
When a free variable is encountered, it will have no binding in the current environment.
In this example the environment created by bind
creates a binding
for x
, but not for y
. When y
is evaluated, an undefined
identifier error results.
Another experiment tries to use x
recursively in its own definition:
This definition seems odd from the outset, but try to use an
environment to create a binding. The environment resulting from the
bind
should contain a value for x
, but what does the environment
contain before bind
? That is the environment used to find the
value of x
in x+1
. As one might guess, the environment is empty
and contains no bindings. Thus, when x
is evaluated in x+1
, an
undefined identifier error results.
As noted earlier, an environment is simply a list of identifier/value
pairs. The simplest way to represent this in Haskell uses a list of
String
/BAE
pairs. The Env
type provides a shorthand for this
type:
For example, environment”
contains three bindings for identifiers named x
, frap
and z
in
order of recency.
To include an environment, eval
becomes a function from an
environment and expression to an integer:
Every call to eval
now includes the environment used by the
evaluation. As we’ll see, evaluating bind
involves adding a binding
to the environment while evaluating other terms do not affect the
binding.
Implementing eval
with an environment follows the same pattern as
other interpreters with the addition of an environment parameter to
recursive eval
calls. The cases for numbers, addition and
subtraction are the same as for evals
and earlier evaluators, except
recursive calls to eval
must include the environment, env
:
The environment passed with eval
does not change. For example, when
evaluating Plus
the same environment is used for subterms as is used
for the term itself.
Cases for Bind
and Id
manage and use the environment respectively.
Bind
adds to the environment and Id
searches the environment:
The Bind
case evaluates the value expression associated with the new
identifier using the current environment. The expression
((i,v'):env)
creates a new pair and adds it to the front of the
original environment using cons. Then the body, b
, is evaluted with
the identifier added to the environment. In effect, evaluating the
bind
adds its identifier and value to the environment and evaluates
its body with the new environment.
The Id
case uses the builtin Haskell lookup
function to search the
environment for the first pair whose first element is equal to id
,
the name of the identifier. lookup
returns a Maybe
instance where
Just a
is a value if if found and Nothing
indicates and error. If
you’re wondering why there is no do
and sequential set of binds,
lookup
uses Maybe
and returns a Maybe BAE
value. Thus, we can
simplly call lookup
directly rather than bind its result to a
variable to be returned.
Finally, the interpretation function is defined by composing eval []
and the parser. Note that the parser, pretty printer, and term
generator from the immediate substitution example are completely
unchanged. This is the first sign we have built the new interpreter
correctly.
Remember that eval []
results in a function with a single parameter
of type BAE
. It is still a function and can be treated like any
other function.
Remember that we called the immediate substituting interpreter a
normative implementation that implements a correct interpreter that
is not optimized. If the new interpreter properly implements the
BAE
language, then the two interpreters should always produce the
same value for valid input. QuickCheck can easily check this property
by calling both interpreters on arbitrary BAE
expressions and
comparing results:
Trying this on large sets of test cases results in no errors, giving
us strong evidence that the new interpreter implements BAE
in the
same way as the original interpreter.
To make testing more meaningful, evals
and eval
should be
implemented with an Either
return type like ABE
earlier. This is
left as an exercise and will be explored in a later chapter.
Unlike most of our earlier interpreters, the objective of adding an
environment is not to add a new language feature or change what eval
does. The objective is to make evaluation more efficient by caching
substitutions in the environment data structure. In fact, we hope
that adding and environment does not change the evaluation result at
all.
bind
is a different kind of expression than +
or -
. Arithmetic
expressions calculate values and return them. 2+1
evaluates to 3
and the new value is returned to whatever mechanism involved eval
.
bind
on the other hand doesn’t calculate anything. Its body does,
but the bind
itself is wrapped around the body. The real work of
the bind
is performed on the environment. Arithmetic expressions
calculate values and in a sense, bind
calculates a new environment
by creating identifiers. It’s not necessary to make this distinction,
but grouping expressions by their purpose can prove useful in both
understanding and implementing interpreters.
eval
called evalErr
that uses the Either
construct to return either a BAE
construct or an error message.interp
that uses a prelude. The prelude is
an initial environment with a collection of identifiers that are
always available without explicit bind
definitions.Download source for all interpreter code from this chapter.