Blog

In prior chapters we have defined an interpreter over a language with only numbers, added Booleans, and showed how to handle and avoid type errors. With functions, we don’t need to add Booleans to see the same problem emerge again. Consider the following expression:

```
bind inc = lambda x in x + 1 in
(inc inc)
```

This expression simply applies `inc`

to itself, something that made
sense when we discussed untyped recursion. However, in this context
trying to increment an increment function makes little sense. The
value of `inc`

is a lambda specifying an increment function over
numbers. In the body of the `bind`

, `inc`

is applied to itself.
There is of course nothing wrong with applying a function to itself,
but in this case the body of `inc`

attempts to add 1 to its argument.
Adding 1 to a function is not defined and causes any of our
interpreters to crash.

We can fix this problem as we have in the past - add run-time or
predictive type checking. Said using terminology from our discussion
of scoping, we can add *dynamic* or *static* type checking. As usual,
dynamic performed at run-time and static performed before run-time.

Before diving into types of functions, let’s take a quick aside into
finding the type of a `bind`

remembering that like `lambda`

and application,
`bind`

defines new identifiers and associates them with values. To
start with, let’s examine a trivial bind that creates an identifier,
binds it to a value, and returns the resulting identifier:

```
bind x = 1 in
1+1
```

As always, `bind`

evaluates its body and returns the result. The type
of that result in turn becomes the type of the `bind`

. If we can find
the type of `1+1`

we can find the type of what evaluating `bind`

returns. In this case, that’s quite simple using typing rules we
already have. Specifically, any term of the form `t1+t2`

is of type
`TNum`

if both `t1`

and `t2`

are also of type `TNum`

. The type of the
`bind`

is clearly `TNum`

.

Things are much more interesting if we use the bound identifier in the
body of the `bind`

. Defining new identifiers and giving them values
is what `bind`

is all about. Let’s change the body of the `bind`

to
use the identifier:

```
bind x = 1 in
x+1
```

The same type rule applies to the body. If `x:TNum`

and `1:TNum`

then
`x+1:TNum`

. The second term is simple as `1:TNum`

by definition. The
first term requires some thought, but is still rather obvious. By
definition `x`

is bound to `1`

in the body of the `bind`

and `1`

is of
type `TNum`

. It follows that in the body of the `bind`

the identifier
`x`

has type `TNum`

. We will implement this intuition using a
*context* that behaves much like an environment does in an evaluator.

Thinking back to how environments work reveals the following simple
annotation for the `bind`

:

```
bind x = 1 in [(x,1)]
x+1
```

Following the `x = 1`

binding instance the environment contains a
binding of `x`

to `1`

. When `x`

appears in the `bind`

body, it is
looked up in the environment when evaluated. A *context* will serve
the same role, but for types rather than values. In the same way an
environment stores identifiers and values during evaluation, a context
will store identifiers and types during type checking.

The context is maintained exactly like an environment. When an
identifier is bound, the type of the bound value is associated with
the identifier in the context. Using a similar notation, we can
decorate the type inference process for `bind`

:

```
bind x = 1 in [(x,TNum)]
x+1
```

Now when `x`

is used in the term `x+1`

, its type is looked up in the
context maintained by the `typeofM`

function. Thus, `x+1`

is known to
have type `TNum`

because both `x`

and `1`

are type `TNum`

.

Let’s define the necessary type operations used by the `typeofM`

implementation that will perform the type inference operation. First,
the case for an identifier:

```
typeofM cont (Id id) = (lookup id cont)
```

The code is virtually identical to looking up the value of an
identifier in an environment. The function uses the identifier’s name
to look up the identifier’s type in the context, `cont`

. Conveniently,
`lookup`

returns a `Maybe`

, so we don’t need to do anything with its
return value.

The type of a `bind`

instance is again similar to evaluation:

```
typeofM cont (Bind i v b) = do { v' <- typeof cont v ;
typeof ((i,v'):cont) b }
```

The type of the bound value, `v`

, is first determined using the
current context, `cont`

. Then type of the `bind`

body is determined
using a new context resulting from the original context with `i`

bound
to the bound value’s type. This looks just like evaluation, except
we’re generating types rather than values. Hold that thought.

Now the formal rules. First for identifiers:

\[\frac{(i,T)\in \Gamma}{\Gamma\vdash i:T}\]If $(i,T)$ is in the current context, then type of $i$ is $T$. This
is a trivial definition, but necessary for completeness. Now the rule
for `bind`

:

Finding the type of `bind`

gives us all the tools needed to talk about
the type of `lambda`

and application. We’ve defined what a context is and
how to add an identifier to the context in the scope of a `bind`

. As
we move on to functions, keep these definitions in mind as finding
their types will use quite similar techniques.

After a diversion into `bind`

, let’s think about how to determine the
type of a `lambda`

. A function is at its essence a mapping from an
input value to an output value. For example:

```
lambda x in x+1
```

takes a value and returns that value plus 1. It is also true that
this function will only work correctly on numbers. Applying ```
(lambda
x in x+1)
```

to a Boolean or another function always results in an
error. We can express this by saying that `x`

must be of type `TNum`

using the notation `x:TNum`

. When applied to an argument of the
correct type this `lambda`

should result in a good value. For
example:

```
((lambda x in x+1) 5)
```

Furthermore, the type of the resulting value can be predicted. In
this case, the application results in a value of type `TNum`

. Putting
everything together, we can say that this `lambda`

expression applied
to any `TNum`

will result in `TNum`

should it terminate. To represent
this we use a *function type*. Specifically, we extend our current
type definition with a new function type in much the same way we
defined binary operators over values like `+`

or `-`

by using a
recursive construction:

This new definition is quite different from our other types. It allows constructing new types form existing types such as $\tnum \rightarrow \tnum$, $\tnum
\rightarrow (\tnum \rightarrow \tnum)$, and $(\tnum \rightarrow \tnum)
\rightarrow \tnum$. An informal name for the $\rightarrow$ operation
is a type former or a type function that takes two types and generates
a new type. Just like `+`

takes two numbers and generates a new
number, $\rightarrow$ will take two types and generate a new type.
The left type is called the *domain* and the right type the *range* of
the function type. If no parentheses are included, we assume that
$\rightarrow$ is right associative. We will often use `D`

for the
domain type and the `R`

for the range type.

Now that we have a means for defining function types, we can find
types for `lambda`

expressions like we did `bind`

. Let’s try to
define type inference iteratively by thinking about how we derive the
type of a trivial `lambda`

. Let’s start with the simple increment
function and see if we can find its type. Specifically, we would like
to find `T`

in the following expression:

```
lambda x in x+1 : T
```

All lambda expressions must have a function type, thus `T`

must have
the form `D->R`

where `D`

is the domain type and `R`

is the range
type. We don’t know what either `D`

or `R`

is yet, but we do know
that this particular function should have a function type:

```
lambda x in x+1 : D->R
```

Now we need to find `D`

and `R`

and will appeal to how we calculated
the type of `bind`

. Like the `bind`

, `x`

is the newly boudn
identifier and `x+1`

is the body where `x`

is bound. Just like `bind`

if we can find the type of `x+1`

we will know `R`

, but to find `R`

we
must know the type of `x`

. `bind`

uses a context to track defined
identifiers and their types during type checking like evaluation uses
an environment to track identifiers and their values during execution.
If we can add a binding of `x`

to its type to the context, its type
will be known when we calculate tye type of `x+1`

. Specifically using
the same notation to track context:

```
lambda x in [(x,D)]
x+1
```

Unfortunately, we have no way of inferring `D`

in the general case.
Look carefully at a similar `bind`

:

```
bind x = 1 in [(x,TNum)]
x+1
```

Here we do know `D`

because `x`

is given a value that has an
associated type. We do know know what value might be given to `x`

until the `lambda`

is applied to an argument. In the specific case of
`x+1`

we can know `D`

because `+`

takes two numbers. But consider
this trivial `lambda`

instance:

```
lambda x in x
```

There is no way to infer `D`

without knowing the value for `x`

and
that value will not be known until application evaluates the `lambda`

. It
simply isn’t possible to infer the input type in every case.

The way forward is changing the `lambda`

definition and *ascribing* a
type to the function’s input parameter declaring what type `D`

should
be. In this case, `TNum`

:

```
lambda (x:TNum) in x+1 : TNum->R
```

The notation `(x:TNum)`

declares that `x`

must be of type `TNum`

and
thus `D`

becomes `TNum`

. Because we now know the type of `x`

we can
find `R`

by adding `(x,TNum)`

to the context and finding the type of
`x+1`

. When calculating the type of `x+1`

using the definition for
the type of `+`

we simply look up `x`

in the current context.
Finally, type for the `lambda`

becomes:

```
lambda (x:TNum) in x+x : TNum->TNum
```

Generalizing this example, the case for funding the type of a `lambda`

in `typeof`

looks like this:

```
typeofM cont (Lambda x t b) = do { tyB <- typeofM ((x,t):cont) b ;
return (t :->: tyB) }
```

`D`

is given by the `lambda`

declaration. Find `R`

by adding `(x,D)`

to the context and finding the type of `b`

. Given `D`

and `R`

, the
type of the `lambda`

is `D->R`

.

The formal type rule can be written:

\[\frac{(x,D):\Gamma\vdash b:R}{\Gamma\vdash (\llambda (x:D)\;\iin b):D\rightarrow R}\]Knowing how to type `lambda`

, we now turn to application. The two arguments
to application should be a function and an argument to the function. Let’s
consider applying the increment function from the `lambda`

example and
finding the resulting type:

```
((lambda (x:TNum) in x+1) 5) : T
```

The first argument to application must be a fucntion or it cannot be applied
to anything. Thus we know that it must have the form `D->R`

. In this
example, `(lambda (x:TNum) in x+1)`

has type `TNum -> TNum`

and both
`D`

and `R`

are `TNum`

. For the function application to be
successful, we also know the the argument type must be `D`

. If we are
going to evaluate `(f a)`

, then `a`

must be of a type that `f`

accepts. In this case, `5`

must have type `TNum`

as it does. The
type of the application itself is then `R`

because the function type `D->R`

says the output will be of type `R`

if the input is of type `D`

. In
this case, the type of the application is `TNum`

because the type of `x+1`

in the context of `x:TNum`

is `TNum`

.

Let’s make this way simpler. Given some `f:D->R`

and `a:D`

, then
`(f a):R`

. Simple as that. Here’s the type rule that captures
this:

What does this say about the problem that motivated the chapter? Specifically, what is the type of:

```
bind inc = lambda x in
x + 1 in
(inc inc)
```

Let’s walk through the type derivation. First, we add a type to the
function’s formal parameter. It has to be `TNum`

. (Think carefully
about why.) The expression now becomes:

```
bind inc = lambda (x:TNum) in
x + 1 in
(inc inc)
```

To find the type of `inc`

, we must find the type of the `lambda`

bound
to it. `D`

is given to be `TNum`

, thus we add `(D,TNum)`

to the
context and determine the type of `x + 1`

to get `R`

:

```
bind inc = lambda (x:TNum) in [(x,TNum)]
x + 1 in
(inc inc)
```

`x`

is `TNum`

thus `x+1`

is `TNum`

by the definition of the type of
`t1+t2`

. The type of the `lambda`

now becomes `TNum -> TNum`

giving
the type for `inc`

. Now add the type of `inc`

to the context and look
at the type of the application, noting that the type of `x`

drops from the
context when the `lamdba`

’s scope closes:

```
bind inc = lambda (x:TNum) in [(x,TNum)]
x + 1 in [(inc,TNum->TNum)]
(inc inc)
```

Finding the type of an application requires finding the type of `inc`

to be
`TNum -> TNum`

by looking it up in the context. Now we apply a
function of type `TNum -> TNum`

to an argument of type `TNum -> TNum`

.
Clearly this is a problem because the type of the argument and the
type of function’s domain do not match. The type inference function
will throw and error and we will know *before* runtime that this
program will crash.

Now let’s build `evalM`

and `typeofM`

for `FBAE`

. First thing we need to do is update the concrete syntax to include a parameter type for `lambda`

and a new type constructor for function types:

No news here, we simply need to make new constructs available in the concrete syntax.

The abstract syntax follows from the concrete syntax, again adding a type parameter to `Lambda`

:

```
data FBAE where
Num :: Int -> FBAE
Plus :: FBAE -> FBAE -> FBAE
Minus :: FBAE -> FBAE -> FBAE
Bind :: String -> FBAE -> FBAE -> FBAE
Lambda :: String -> TFBAE -> FBAE -> FBAE
App :: FBAE -> FBAE -> FBAE
Id :: String -> FBAE
If :: FBAE -> FBAE -> FBAE -> FBAE
deriving (Show,Eq)
```

and a function type constructor to `FBAETy’:

```
data TFBAE where
TNum :: TFBAE
(:->:) :: TFBAE -> TFBAE -> TFBAE
deriving (Show,Eq)
```

Note the constructor for function types uses the Haskell infix
constructor notation allowing them to have the form `T :->: T`

similar
to how they are written in rules. Just a bit of Haskell trickery,
nothing significant.

Let’s write the `typeofM`

function first using `Maybe`

and `do`

to
construct cases for each term type. The signature of `typeofM`

is a
context and term to a type:

```
typeofM :: Cont -> FBAE -> (Maybe TFBAE)
```

There is nothing new in how types for constants and operations:

```
typeofM cont (Num x) = return TNum
typeofM cont (Plus l r) = do { TNum <- (typeofM cont l) ;
TNum <- (typeofM cont r) ;
return TNum}
typeofM cont (Minus l r) = do { TNum <- (typeofM cont l) ;
TNum <- (typeofM cont r) ;
return TNum }
typeofM cont (Bind i v b) = do { v' <- typeofM cont v ;
typeofM ((i,v'):cont) b }
typeofM cont (Id id) = (lookup id cont)
typeofM cont (If c t e) = do { TNum <- (typeofM cont c) ;
t' <- (typeofM cont t) ;
e' <- (typeofM cont e) ;
if t'==e'
then return t'
else Nothing }
```

After all our work defining type rules, the new cases for `typeofM`

follow quickly. For `Lambda`

, `typeofM`

is called on the `Lambda`

body
with the argument name bound to its type added to the original
context. `D`

is known from the `lambda`

and `R`

is learned by calling
`typeofM`

:

```
typeofM cont (Lambda x t b) = do { tyB <- typeofM ((x,t):cont) b ;
return (t :->: tyB) }
```

The final type becomes `td :->: tr`

as defined by our previous type rule.

`App`

is a bit more involved, but nothing too dramatic. First, the
`App`

’s argument type is found and bound to and `tyY`

respectively. A
`case`

statement finds the type of the `App`

’s function element. If
it is a function type, the domain type is compared to the argument
type and the range type returned if they match. If they don’t, a
mismatch occurs and an error results:

```
typeofM cont (App x y) = do { tyXd :->: tyXr <- typeofM cont x ;
tyY <- typeofM cont y ;
if tyXd==tyY
then return tyXr
else Nothing }
```

If the type of the `App`

’s function argument is anything but a
function, an error is thrown immediately.

That’s it for `typeof`

over `FBAE`

. Do we now need a new `eval`

?
Other than accounting for the new abstract syntax for `lambda`

, no
changes are needed for the `eval`

function for `FBAE`

. For brevity,
we’ll skip that small update.

The simplest way to do dynamic type checking on this new language with
function types is to do nothing. Literally. We have an interpreter
that performs dynamic checking on terms other than `lambda`

and application.
While we could use the argument type from `lambda`

, the simplest thing
is to do what Lisp variants to and perform substitution and throw
errors when expressions are evaluated.

In our discussion of static type checking for `FBAE`

we did not
discuss scoping. Specifically, now that we’ve identified dynamic and
static scoping as different approaches, do we need different type
inference capabilities for each?

The difference between static and dynamic scoping is whether the static declaration environment or the runtime environment is used to find symbol values. Obviously, we don’t know the runtime environment until, well, runtime. Do you see the problem? If we don’t know what symbol is being referenced until runtime, we can’t statically check it’s type. Look at this definition that slightly modifies an example from our discussion of static and dynamic scoping:

```
bind n = 1 in
bind f = (lambda x in x + n) in
bind n = true in
(f 1)
```

Using static scoping, we know the `n`

referenced in the `lambda`

is
the first `n`

that is a number. Using dynamic scoping, the second `n`

gets used and it is a Boolean. Delete that binding, and dynamic
scoping matching static scoping. The same function called in
different places has different types, one legit and one not.

It may be too strong to say we can’t statically check this expression. However, we would need to do something akin to evaluation statically. That makes no sense. Regardless, this is yet another argument against using dynamic scoping.

An important observation is that `typeofM`

is simply an evaluation
function that returns types rather than values. Or, types *are*
values in this particular interpretation. Programs are simply data
structures and writing interpreters for them - regardless of the
interpretation type - has the same form. A great exercise to do now
is rewrite the interpreter to use the values `odd`

and `even`

rather
than numbers. You will discover that you can reuse almost all your
code from `evalM`

or `typeofM`

targeting these new values.

- Context - A list of identifiers and their types currently in scope.
- Static Type Checking - Type checking performed before interpretation.
- Dynamic Type Checking - Type checking performed at run-time.