For those of us in the functional language community, functions are the essence of computation. From Church’s Lambda Calculus forward, the use of functions to encapsulate and apply computation has been an essential part of computer science.
As an example of where we want to go, consider a Haskell function inc
that simply increments the value of its argument and returns the result:
inc x = x + 1
Applying inc
to an actual parameter value substitutes that parameter with the value in the function body:
inc 3
== 3 + 1
== 4
Like mathematics, the argument to inc
can be a complex expression:
inc ((5+1) - 3)
== inc 6 - 3
== inc 3
== 3 + 1
== 4
or alternatively:
inc ((5+1) - 3)
== ((5+1) - 3) + 1
== 3 + 1
== 4
The common concept here is substitution of an identifier by an expression in the body of the function. This will form the basis of function application.
Before we go further, let’s establish a few definitions. In the definition of inc x
we refer to x
as a formal parameter. When we apply inc
to an expression as in inc ((5+1) - 3)
we refer to ((5+1) - 3)
as an actual parameter or an argument. We refer to the function applied to an expression such as inc ((5+1) - 3)
as an application of inc
. Finally, we refer to the expression that defines inc
, x+1
as the body of inc and the scope of x
as the body of inc
.
Given all these nice definitions we can describe the application of inc
to any actual parameter as substituting its formal parameter, x
, with the actual parameter from the body of inc
. We can be a just little more formal and general. _The application of any function to an actual parameter is the substitution of its formal parameter with its actual parameter in its body. Looking back to inc 3
in light of this definition, applying inc
to 3
is substituting x
with 3
in x+1
.
Having defined informally what functions do, lets talk briefly about how they are defined before going into a formal definition of their use.
First-order functions are functions that have a special representation that is not accessible to the programer. They cannot take other functions as arguments or return functions, thus the name first-order. Some languages with first-order functions do not even provide a syntax for defining functions, but most allow function definition of some type.
If the definition for inc
above were done in a first order language, the resulting definition would add a function named inc
to the environment that can then be called. Specifically:
inc x = x + 1
(inc 10)
defines inc
and calls he resulting function on 10
. Because inc
is first order, it is not possible to define functions like map
that would apply inc
to every element of a list. No function can be used as an argument to another. Furthermore, no function can return a function as its return value. While this may seem an odd thing to do right not, it is a powerful concept that allows such things as currying and partial evaluation that will come later.
Higher-order functions take other functions as arguments and may return functions as values. Our old friend the map
function is a great example of a higher-order function:
map :: (a -> b) -> [a] -> [b]
The first argument to map
is a function mapping elements of type a
to type b
. Similarly, function application may result in a function. Currying the map
function is a good example of this:
map inc
In this case, the application of map
to the function inc
results in a new function of type that adds 1
to every element of an input list.
First-class functions are values treated like any other value in a language. In addition to being applied to actual parameters, they can be created statically, created at run-time, assigned to variables, passed as parameters, and returned as values. In many languages they can even be reflected upon and examined.
Not stopping at higher-order functions, Haskell has first-class functions as does Scheme and ML. A function value is created using a lambda or other special form. In Haskell the following produces a function value that adds 1
to an input value:
(\x -> x+1)
The backslash is used to resemble the lambda from lambda calculus and is easy to remember for that reason. We can bind a function value to a name just like any other value:
inc = (\x -> x+1)
inc
is an identifier bound to the function (\x -> x+1)
and behaves just like any other identifier. Function values such as these are frequently referred to as anonymous functions because they do not have explicit names. In the Haskell expression:
map inc l
the value for inc
is found just like the value for l
.
The use of function values in programming languages dates back to Lisp and is finding its way into many mainstream languages. It is an elegant solution to defining functions that requires little syntax and few semantic extensions.
Our next language will implement first-class functions as an extension of BAE
. The concrete syntax for FBAE
is a simple extension of BAE
to include function definitions and applications. Once again we are dropping Booleans for the time being:
The third and fourth lines introduce the new syntax. Functions are defined with the $\llambda$ keyword following by an argument and function body. The value of our inc
function in FBAE
becomes:
lambda x in x+1
and we can give it the name inc
using a bind
:
bind inc = (lambda x in x+1) in ...
where the ellipsis represents the remainder of the program using inc
. Applying inc is done using function application as in:
bind inc = lambda x in x+1 in
inc 1
Function application uses the common syntax $(f\; t)$ represent the function $f$ applied to the term $t$. Parenthesis are frequently ommitted when the application is unambiguous without them.
We can now use concrete syntaxc to define a function, give it a name, and apply it to an actual parameter.
Additions to abstract syntax include new fields for Lambda
and App
. The definitions follow immediately from what we defined in the concrete syntax. The new abstract syntax for FBAE
becomes:
data FBAE where
Num :: Int -> FBAE
Plus :: FBAE -> FBAE -> FBAE
Minus :: FBAE -> FBAE -> FBAE
Bind :: String -> FBAE -> FBAE -> FBAE
Lambda :: String -> FBAE -> FBAE
App :: FBAE -> FBAE -> FBAE
Id :: String -> FBAE
The most basic definition for functions and their application to arguments comes from Church and the Lambda Calculus. Hopefully you will have the opportunity to study Lambda Calculus later, but for now we will only borrow some basic concepts for our work.
A hint for our direction is the similarity between the concrete syntax for bind
and lambda
:
bind x = 5 in x
lambda x in x
bind
defines a new identifier, binds it to a value, and uses the new identifier in the body of the bind
. lambda
is quite similar in that it defines a new identifier and uses the new identifier in the body of the lambda
. However, it does not bind the identifier to a particular value. That binding is deferred until the lambda
is applied. bind
and lambda
are cousins in that both define new identifiers. bind
and apply are similarly cousins in that both bind values to identifiers. Regardless of the relationship, we should be able to use the tools for defining bind
to similarly defined apply.
The simplest definition for function application is called $\beta$-reduction and uses substitution to replace an identifier with its value. In the $\beta$-reduction rule below, $i$ is the identifier defined by the $\mathsf{lambda}$, $b$ is the body of the $\llambda$ and $a$ is the argument the $\mathsf{lambda}$ is applied to. If $a$ evaluates to some value $v$ then then a function applied to $a$ results in substituting $v$ for the function’s formal parameter. The rule for this is:
\[\frac{a\eval v}{((\llambda i\; b) \; a) \rightarrow [i\mapsto v]b}[\beta-reduction]\]The result of the application is $[i\mapsto v]b$, or replacing $i$ with $a$’s value in $b$. This is exactly how we expect functions to behave. When applied, their formal parameter is replaced with an actual parameter and the result becomes the function application’s value.
Let’s think about how to implement evaluating App
and Lambda
using the substitution operator defined for bind
. We established that bind
both defines and replaces an identifier while function definition defines an identifier and application replaces it. Let’s review the substitution performed for bind
in the evalS
function:
evalS (Bind i v b) = do { v' <- evalS v ;
(evalS (subst i v' b)) }
The subst
function gets is identifier from the bind
syntax as well as the value it is substituting, v
. The body, b
, is also found in the bind
expression. Thus, the subst
application is simple. We can understand the evaluation of App
by looking for the arguments to subst
in the lambda
and its actual parameter.
Before evaluating App
, lets evaluate both its arguments. Specifically, f
to get a lambda value and a
to get a value for the actual parameter:
evalS (App f a) = do { (Lambda i b) <- (evalS f) ;
v <- (evalS a) ;
evalS (subst i v b) }
Now we have the pieces needed to apply subst
. i
is the identifier defined by the Lambda
. The argument to the substitution is provided by the App
as v
. We evaluate it first before passing it into the substitution. Finally, the body of the function, b
, is the expression substituted into. The result is the following application of subst
:
evalS (subst i v b)
and embedding the substitution into the case for App
gives us the following:
evalS (App f a) = do { (Lambda i b) <- (evalS f) ;
v <- (evalS a) ;
evalS (subst i v b) }
And there we have it. Substitute the argument in for the identifier in the body of the function.
What about the Lambda
case for evalS
? As it turns out, lambda
s are values. Just like True
or (Num 0)
, (Lambda x (Plus (Id x) (Num 1)))
is a value and is not evaluated further. Thus, the evaluation of Lambda
is trivial:
evalS (Lambda i b) = return (Lambda i b)
Putting the whole thing together with evaluation of remaining terms gives us a substituting interpreter for FBAE
:
evalS :: FBAE -> (Maybe FBAE)
evalS (Num x) = (Just (Num x))
evalS (Plus l r) = do { (Num l') <- (evalS l) ;
(Num r') <- (evalS r) ;
return (Num (l' + r')) }
evalS (Minus l r) = do { (Num l') <- (evalS l) ;
(Num r') <- (evalS r) ;
return (Num (l' - r')) }
evalS (Bind i v b) = do { v' <- evalS v ;
(evalS (subst i v' b)) }
evalS (Lambda i b) = return (Lambda i b)
evalS (App f a) = do { (Lambda i b) <- (evalS f) ;
a' <- (evalS a) ;
evalS (subst i a' b) }
evalS (If c t e) = do { (Num c') <- (evalS c) ;
if c'==0 then (evalS t) else (evalS e) }
evalS (Id id) = Nothing
The new interpreter for functions has the same problem as our original interpreter for bind
. Substitutions are performed immediately throughout an expression. Thankfully, we can use an environment again to solve the same problem. Recall how we used the environment variable to pass a list of defined variables and their values to implement bind
:
evalM env (Bind i v b) = do { v' <- evalM env v ;
evalM ((i,v'):env) b }
We can do a similar thing with function application. In the same way we replaced the call to subst
with an environment update in bind
, we can literally do the same thing with function application:
evalM env (App f a) = do { (Lambda i b) <- (evalM env f) ;
a' <- (evalM env a) ;
evalM ((i,a'):env) b }
The identifier and its value are added to the environment before evaluating the function body. Whenever the identifier appears, the value is found in the environment and used. Virtually the same implementation as bind.
The complete code for the interpreter implementing an environment is:
type Env = [(String,FBAE)]
evalM :: Env -> FBAE -> (Maybe FBAE)
evalM env (Num x) = return (Num x)
evalM env (Plus l r) = do { (Num l') <- (evalM env l) ;
(Num r') <- (evalM env r) ;
return (Num (l'+r')) }
evalM env (Minus l r) = do { (Num l') <- (evalM env l) ;
(Num r') <- (evalM env r) ;
return (Num (l'-r')) }
evalM env (Bind i v b) = do { v' <- evalM env v ;
evalM ((i,v'):env) b }
evalM env (Lambda i b) = return (Lambda i b)
evalM env (App f a) = do { (Lambda i b) <- (evalM env f) ;
a' <- (evalM env a) ;
evalM ((i,a'):env) b }
evalM env (Id id) = do { v <- (lookup id env) ;
return v }
evalM env (If c t e) = do { (Num c') <- (evalM env c) ;
if c'==0 then (evalM env t) else (evalM env e) }
We now have two interpreters for what we hope is the same untyped system with functions. We can use QuickCheck to help our testing, but we have immediate problems that can easily be seen.
We have introduced a new kind of value to our interpreter. Clearly when we added Booleans to our first interpreter there were new values to deal with. It’s less obvious how we have new values in this language, but remember that lambda
creates a function value. lambda
expressions are values just like 1
or true
. What this means is expression like this one:
bind f = (lambda x in x+1) in
f f
will crash. f
can take any argument, but its expression x+1
will only operate on numbers.
Let’s consider the following code snippet:
bind f = (lambda x in x + n) in
f 1
The bind
defines f
whose value is the function that returns its input plus n
. This looks quite a bit like inc
, but instead adds n
to the argument rather than 1
. Where does n
come from? In this case, nowhere. Both evalS
and eval
throw an error when they cannot find the value of n
. Let’s now add an n
with f
in its scope:
bind n = 1 in
bind f = (lambda x in x + n) in
f 1
The outer bind
gives n
the value 1
while the inner bind
gives f
the lambda value that returns its input argument plus n
. We just defined f
in the context of n
. When we evaluate the new bind
bith evalS
and eval
give us the value 2
as expected. So far so good.
Lets take this one step further by defining multiple, nested values of n
by making the body of f
a bind
expression like this:
bind n = 1 in
bind f = (lambda x in x + n) in
bind n = 2 in
f 1
What do you expect to get when evaluating this expression? evalS
gives us the value 2
. However, eval
gives us the value 3
. What’s happening here? Whatever it is, its bad because our two interpreters for the same language are now giving different results with evalS
and eval
. Same language using deferred substitution rather than immediate substitution giving different values.
We have successfully implemented an untyped function system by adding to our original BAE
interpreter. We demonstrated that we can implement direct substitution and deferred substitution just as we did with bind
expressions. However, we’ve run into two problems associated with new function values and an odd difference in the way evalS
and eval
process expressions. As you might guess, we will deal with the new function values by predicting failure using types. We will deal with the difference between immediate and deferred substitution by looking at static and dynamic scoping.
eval
function for a language with only first-class functions using direct substitutions and the subst
operation defined for bind
eval
function for a language with only first-class functions to defer substitution using an environment that contains both functions and values.