Enrichment is the term we for the process of adding new features to an existing language. What we’ve done in this text is learn a new concept by enriching an existing language with new features. We started with an arithmetic expression language, added Booleans, added identifiers, and so forth until now we have a statically scoped language with recursion.
The technique we’ve used for enrichment is called extension. When extending a language we:
This in turn supports defining new interpreters, type checkers, and other analysis tools.
The second technique explored here is using a derived form. When we add a new feature as a derived form we define the new feature in terms of existing language features. We will still add new syntax and types, but we will add the language features themselves by translating those features into existing constructs.
Programs are just data structures. Elaboration is the process of transforming one expression into another that transforms one data structure into another. A derived form defines a transformation while elaboration performs it. Elaborators are quite similar in nature to evaluators and type checkers and optimizers. This should not be at all surprising to you as everything we’ve written has a similar shape.
We haven’t played with
AE in quite awhile, so lets define a simple elaborator that adds a new operation. Specifically, let’s add the unary operation
AE has no functions, so this is the only way we can add a new expression. We’ll skip the parsing and concrete syntax and go straight to the original AST:
data AE where Num :: Int -> AE Plus :: AE -> AE -> AE Minus :: AE -> AE -> AE Mult :: AE -> AE -> AE Div :: AE -> AE -> AE deriving (Show,Eq)
inc is not defined in the original AST. If we’re going to add
inc as a syntactic element, we need a different AST to include it. Let’s define
AEX and add
data AEX where NumX :: Int -> AEX PlusX :: AEX -> AEX -> AEX MinusX :: AEX -> AEX -> AEX MultX :: AEX -> AEX -> AEX DivX :: AEX -> AEX -> AEX IncX :: AEX -> AEX deriving (Show,Eq)
Note that we’ve renamed both the type and its constructors so we can tell them apart. The elaborator translates AEX to AE:
elab :: AEX -> AE
and it’s rather simple to do. Every syntactic form in
AEX has a similar form in
inc. That means no translation necessary. Just copy the terms like this:
elab (PlusX l r) = (Plus l r)
Not quite. What if
r are also complex expressions that contain
inc? They won’t get elaborated. Plus, the arguments to
AEX. Let’s elaborate them as well:
elab (PlusX l r) = (Plus (elab l) (elab r))
This will work and will work for every term other than
elab (NumX n) = (Num n) elab (PlusX l r) = (Plus (elab l) (elab r)) elab (MinusX l r) = (Minus (elab l) (elab r)) elab (MultX l r) = (Mult (elab l) (elab r)) elab (DivX l r) = (Div (elab l) (elab r))
Now we need to take care of
IncX, but we have a definition for it:
inc x == x + 1
The case for
elab (IncX t) = (Plus (elab t) (Num 1))
Now we’re done.
inc x translates to
To use the elaborator, we call it right before
interp t = eval . elab
Note that the name
AEX is chosen because
AEX is called the eXternal language. In contrast,
AE is the internal language.
Our first derived form is for
bind. As it turns out,
bind can be rewritten in terms of
lambda. The derived form of
bind is expressed formally as:
bind x = t1 in t2 == ((lambda x in t2) t1)
== is the meta-language notion of equivalence or definition.
When we looked at
bind originally we said it declares an identifier and gives the identifier a value. In effect, it gives a value a name whose scope is the
bind body. In the
bind x = 5 in x + 1
x is bound to
5 in the bind body,
lambda is half of bind. It defines a new identifier and its scope, but does not give it a value. That will happen later when the
lambda is applied. In the
lambda x in x + 1
x is defined over
x+1, but has no value - the definition half of
The value half of
bind is achieved with function application When we apply a
lambda to a value, the value is bound to the
lambda’s formal parameter. Thus:
((lambda x in x + 1) 5)
x to have the value
x+1. Furthermore, this works for any
lambda and any value. Just like
let. From these observations we define the derived form for
bind x = t1 in t2 == ((lambda x in t2) t1)
Another interesting example uses functions to implement product types, or what you more likely think of as pairs. Let’s first think about how we want pairs to behave. The best way to do this is with inference rules that describe their behavior:
In honor of John McCarthy We’ll use the classical syntax for pairs and lists from Lisp.
cons t1 t2 will create a pair containing the results of evaluating
car (cons t1 t2) will return
(cdr (cons t1 t2)) will return
t2. First, lets define the abstract syntax of the external language that contains pairs:
data FBAEX where NumX :: Int -> FBAEX ... ConsX :: FBAEX -> FBAEX -> FBAEX CarX :: FBAEX -> FBAEX CdrX :: FBAEX -> FBAEX
All we’ve done is added constructors for
cdr representing the newly added operations. We will also need a new type to represent pairs. We will extend the external language type representation with a product that contains two types:
data FBAEXTy where ... TProd :: FBAEXTy -> FBAEXTy -> FBAEXTy
We’ll skip the concrete syntax as it tends to be uninteresting. Particularly here. Now the derived forms defining how we translate pair operations into the internal language
cons t1 t2 == lambda x in if x then t1 else t1 car t1 == t1 true cdr t1 == t1 false
cons is implemented as a
lambda over 1 argument created using the two pair elments,
t2. The encapsulated
if will return
t1 if the
lambda input is
t2 if it is false.
cdr apply a pair to
car will return the first argument to
cdr the second.
Let’s see if we can write the inference rules:
The only rules here are for
cdr. Where’s the rule for
cons? It turns out we don’t need one. The only thing we are about when we implement pairs is exactly what we wrote. The behavior of
cons is completely defined by the observers
Using derived forms and elaboration to define new language features has numerous advantages over extending a definition. First, the implementation is almost always simpler. We are writing an elaborator that is simpler than an interpreter that usually needs to work only about the new feature.
Second, the original language need not be verified again. Tested certainly, but because we are not truly adding anything to the language except syntax all we need to worry about is the new definitions and whether they implement the new feature correctly.
For this reason, a variant of elaboration is the basis for building almost every sophisticated programming language. A kernel language is defined and features added by incrementally defining them in terms of the previous language. Think about building an onion from the inside out by adding layers. The difference between true elaboration and this approach is there is no formal definition of the elaborator. One layer is simply defined in terms of the prior layer.
caseas a derived form using
if. You may use other language constructs if necessary