Programming Languages in Haskell

Support for EECS 662 at KU

Index
Blog
\[\newcommand\calc{\mathsf{calc}\;} \newcommand\parse{\mathsf{parse}\;} \newcommand\typeof{\mathsf{typeof}\;} \newcommand\interp{\mathsf{interp}\;} \newcommand\eval{\mathsf{eval}\;} \newcommand\NUM{\mathsf{NUM}\;} \newcommand\ID{\mathsf{ID}\;} \newcommand\iif{\mathsf{if}\;} \newcommand\tthen{\;\mathsf{then}\;} \newcommand\eelse{\;\mathsf{else}\;} \newcommand\iisZero{\mathsf{isZero}\;} \newcommand\bbind{\mathsf{bind}\;} \newcommand\iin{\mathsf{in}\;} \newcommand\aand{\;\mathsf{\&\&}\;} \newcommand\lleq{\;\mathtt{<=}\;} \newcommand\ttrue{\;\mathsf{true}} \newcommand\ffalse{\;\mathsf{false}} \newcommand\tnum{\;\mathsf{TNum}} \newcommand\tbool{\;\mathsf{TBool}}\]

Adding Environments

The substitution mechanism implemented in evals is a faithful implementation of our eval definition, but it is not optimized in any way. Let’s try a far more efficient implementation of the BAE interpreter that waits to perform substitution until an identifier is evaluated rather that immediately substituting when an identifier is bound. To understand the approach, consider a simple example from the previous chapter:

bind x = 4 in
  bind y = 5 in
    x+y-4

Using substitution, we saw that when evaluating bind x=4 all occurrences of x in scope were replaced immediately by 4. Instead of immediate substitution, let’s simply remember that in the outermost scope, the value of x is 4:

bind x = 4 in    [(x,4)]
  bind y = 5 in
    x+y-4

Evaluating the bind calculates the value of x, but does not immediately substitute. It simply remembers that x is 4. The notation [(x,4)] depicts an environment containing the binding of x to 4. The decorations after each in show the environment in each bind’s scope. Each bind adds a binding to the current environment. As shown above, following the first in, the environment contains the binding of 4 to x.

Evaluation does the same thing for the inner bind, but this time it remembers that y is bound to 5 by adding it to the front of the list representing the environment:

bind x = 4 in    [(x,4)]
  bind y = 5 in  [(y,5),(x,4)]
    x+y-4

After the second in, the environment contains bindings for both y and x. As we will see later, order is important here. New bindings are always added to the beginning of the environment.

Now we are ready to evaluate the body of the innermost bind. The interpreter evaluates each identifier by simply looking it up in the current environment:

x+y-4     [(y,5),(x,4)]
== 4+5-4
== 5

If constructed correctly, the environment should contain a binding for every bound instance. Unlike evals, this new evaluator visits each element of the expression one time making evaluation far more efficient than constantly walking the code to perform substitution.

Summarizing, an environment is a data structure containing bindings generated from binding instances. As of now, binding instances are only create by bind expressions, thus environments are altered only by executing bind. A binding is simply an identifier/value pair. A binding is added to the environment each time a bind expression is evaluated.

Nesting Bind - Redux

Before charging into a new eval implementation, let’s revisit examples from the previous chapter to get a feel for using environments. First, let’s look a the remaining two examples from the previous chapter.

The first example has 3 bind instances that add to the environment. The first binds y to 4, thus the y immediately following will evaluate to 4. The second binds x to 4. y will evaluate to 4 based on the binding from the outermost bind to give x its value.

bind y = 4 in        [(y,4)]
  y + bind x = y in  [(x,4),(y,4)]
    bind x = x+2 in  [(x,6),(x,4),(y,4)]
      x+y-4
    + x

The third bind once again adds a binding for x, but this time to x+2. The x in the value expression evaluates to 4 based on the environment at that point. Evaluation of the third bind has not completed, thus the new value for x has not been added. When we do reach the body of the innermost bind, the environment includes two bindings for x - one for the current scope and another for the previous scope. In such cases the innermost instance is correct. Searching for x from the beginning of the list gives exactly the result we want. Including the first y, the resulting expression to be evaluated is 4+6+4-4+6 == 16. While the last x is on its own line, it is still in the scope created by the innermost bind.

Let’s make a small change and close the innermost bind before evaluating the last x. Everything proceeds as before except the final evaluation. When the closing parenthesis ends the expression defined for the innermost bind, it places the last x in the scope created by the second bind.

bind y = 4 in         [(y,4)]
  y + bind x = y in   [(x,4),(y,4)]
    (bind x = x+2 in  [(x,6),(x,4),(y,4)]
       x+y-4)         
    + x               [(x,4),(y,4)]

In the example, the environment from the second bind is repeated for the last x for emphasis. It is not created by somehow removing the outermost binding for x, but is simply returned to when the expression x+y-4 finishes evaluating. Thus the result is 14 rather than 16.

When a free variable is encountered, it will have no binding in the current environment.

bind x=5 in  [(x,5)]
  y+x

In this example the environment created by bind creates a binding for x, but not for y. When y is evaluated, an undefined identifier error results.

Another experiment tries to use x recursively in its own definition:

bind x = x+1 in x

This definition seems odd from the outset, but try to use an environment to create a binding. The environment resulting from the bind should contain a value for x, but what does the environment contain before bind? That is the environment used to find the value of x in x+1. As one might guess, the environment is empty and contains no bindings. Thus, when x is evaluated in x+1, an undefined identifier error results.

Defining an Environment

As noted earlier, an environment is simply a list of identifier/value pairs. The simplest way to represent this in Haskell uses a list of String/BAE pairs. The Env type provides a shorthand for this type:

type Env = [(String,BAE)]

For example, environment”

[("x",(Num 5)),("frap",(Num 7)),("z",(Num 10))]

contains three bindings for identifiers named x, frap and z in order of recency.

To include an environment, eval becomes a function from an environment and expression to an integer:

eval :: Env -> BAE -> (Maybe BAE)

Every call to eval now includes the environment used by the evaluation. As we’ll see, evaluating bind involves adding a binding to the environment while evaluating other terms do not affect the binding.

Implementing eval with an environment follows the same pattern as other interpreters with the addition of an environment parameter to recursive eval calls. The cases for numbers, addition and subtraction are the same as for evals and earlier evaluators, except recursive calls to eval must include the environment, env:

eval env (Num x) = (Just (Num x))
eval env (Plus l r) = do { l' <- (eval env l) ;
                           r' <- (eval env r) ;
                           return (liftNum (+) l' r')
eval env (Minus l r) = do { l' <- (eval env l) ;
                            r' <- (eval env r) ;
                            return (liftNum (-) l' r')

The environment passed with eval does not change. For example, when evaluating Plus the same environment is used for subterms as is used for the term itself.

Cases for Bind and Id manage and use the environment respectively. Bind adds to the environment and Id searches the environment:

eval env (Bind i v b) = do { v' <- (eval env v) ;
                             eval ((i,v'):env) b }

The Bind case evaluates the value expression associated with the new identifier using the current environment. The expression ((i,v'):env) creates a new pair and adds it to the front of the original environment using cons. Then the body, b, is evaluted with the identifier added to the environment. In effect, evaluating the bind adds its identifier and value to the environment and evaluates its body with the new environment.

eval env (Id id) = lookup id env

The Id case uses the builtin Haskell lookup function to search the environment for the first pair whose first element is equal to id, the name of the identifier. lookup returns a Maybe instance where Just a is a value if if found and Nothing indicates and error. If you’re wondering why there is no do and sequential set of binds, lookup uses Maybe and returns a Maybe BAE value. Thus, we can simplly call lookup directly rather than bind its result to a variable to be returned.

Finally, the interpretation function is defined by composing eval [] and the parser. Note that the parser, pretty printer, and term generator from the immediate substitution example are completely unchanged. This is the first sign we have built the new interpreter correctly.

interp = (eval []) . parseBAE

Remember that eval [] results in a function with a single parameter of type BAE. It is still a function and can be treated like any other function.

Testing

testEval :: Int -> IO ()
testEval n = quickCheckWith stdArgs {maxSuccess=n}
  (\t -> (interp $ pprint t) == (eval [] t))

Remember that we called the immediate substituting interpreter a normative implementation that implements a correct interpreter that is not optimized. If the new interpreter properly implements the BAE language, then the two interpreters should always produce the same value for valid input. QuickCheck can easily check this property by calling both interpreters on arbitrary BAE expressions and comparing results:

testCompare :: Int -> IO ()
testCompare n = quickCheckWith stdArgs {maxSuccess=n}
  (\t -> (eval [] t) == (evals t))

Trying this on large sets of test cases results in no errors, giving us strong evidence that the new interpreter implements BAE in the same way as the original interpreter.

To make testing more meaningful, evals and eval should be implemented with an Either return type like ABE earlier. This is left as an exercise and will be explored in a later chapter.

Discussion

Unlike most of our earlier interpreters, the objective of adding an environment is not to add a new language feature or change what eval does. The objective is to make evaluation more efficient by caching substitutions in the environment data structure. In fact, we hope that adding and environment does not change the evaluation result at all.

bind is a different kind of expression than + or -. Arithmetic expressions calculate values and return them. 2+1 evaluates to 3 and the new value is returned to whatever mechanism involved eval. bind on the other hand doesn’t calculate anything. Its body does, but the bind itself is wrapped around the body. The real work of the bind is performed on the environment. Arithmetic expressions calculate values and in a sense, bind calculates a new environment by creating identifiers. It’s not necessary to make this distinction, but grouping expressions by their purpose can prove useful in both understanding and implementing interpreters.

Definitions

Exercises

  1. Write a version of eval called evalErr that uses the Either construct to return either a BAE construct or an error message.
  2. Write a version of interp that uses a prelude. The prelude is an initial environment with a collection of identifiers that are always available without explicit bind definitions.

Source

Download source for all interpreter code from this chapter.

Notes