Welcome to part 3 of this series in compiling functional languages to LLVM. In
part 2 we added basic control
flow to our langauge with an if / then / else
construct.
Today we’re going to get one important step towards a functional programming language by adding functions and variables.
By the end of today we’ll be able to make small modules such as:
function add(a: integer, b: integer) {
a + b
}
function increment(a: integer) {
a + 1
}
add(increment(1), add(2, 3))
However, there are quite a few things our functions won’t do.
- They can’t call themselves recursively (ie,
function factorial(a: Integer) { if a == 0 then 1 else a * factorial(a - 1) }
. This is to simplify the typechecker implementation for the time being rather than any limitation in LLVM - we’ll come back round to making this possible. - We must define all function arguments up front (ie,
a: Integer, b: Boolean
). It is possible to infer these, but let’s keep things simple for now. - Functions can only call functions defined before them. This can be solved by doing some dependency analysis before typechecking (ie, move things around to typecheck them in a sensible order). We avoid this for now, but will come back to it.
- Functions can’t return other functions, they can only be defined at the top level. This is a limitation of LLVM, however in future chapters we’ll work around this by implementing closures.
OK, lets get concrete
We’re going to need some new datatypes to express all the new things we want to do.
Function
Firstly, we have a Function
type.
data Function ann = Function
fnAnn :: ann,
{ fnArgs :: [(ArgumentName, Type ann)],
fnFunctionName :: FunctionName,
fnBody :: Expr ann
}deriving stock (Eq, Ord, Show, Functor)
This lets us describe something like:
function add(a: Integer, b: Integer) {
a + b
}
Note that the function body is a single Expr
, and that it can use variables
a
and b
introduced as function arguments.
The ann
type will contain file location information
after parsing, and then will contain the type of the function after
typechecking.
Our function implementation is very limited - we can only use variables passed into the function as arguments, and must explicitly annotate each function argument with it’s type.
Module
A Module
lets us combine multiple functions and a main
expression.
data Module ann = Module
mdFunctions :: [Function ann],
{ mdExpr :: Expr ann
}deriving stock (Eq, Ord, Show, Functor)
This lets us write code like:
function increment(a: Integer) {
a + 1
}
function decrement(a: Integer) {
a - 1
}
increment(decrement(1)) == 1
As mentioned earlier, functions can only be used in the order
they are defined. Therefore decrement
could call increment
, but not the
other way round. We can improve this in future with some basic dependency
analysis.
Expr
Our Expr
needs a couple of new constructors.
data Expr ann
= EPrim ann Prim
| EInfix ann Op (Expr ann) (Expr ann)
| EIf ann (Expr ann) (Expr ann) (Expr ann)
| EVar ann Identifier -- new!
| EApply ann FunctionName [Expr ann] -- new!
deriving stock (Eq, Ord, Show, Functor, Foldable, Traversable)
EVar
represents a variable such as a
, and EApply
represents function application (add(1,2)
).
Typechecker changes
After a rather sizable swerve into bidirectional typechecking in the previous part, we are going to focus less on the changes here.
Adding state
The most important part is that it’s become stateful, as we will be learning
about both functions and variables as we typecheck. We have created a
TypecheckM
newtype that we use, that contains both a ReaderT
and a
StateT
.
newtype TypecheckEnv ann = TypecheckEnv
tceVars :: HashMap Identifier (Type ann)
{
}deriving stock (Eq, Ord, Show)
newtype TypecheckState ann = TypecheckState
tcsFunctions :: HashMap FunctionName (Type ann)}
{deriving stock (Eq, Ord, Show)
newtype TypecheckM ann a = TypecheckM
getTypecheckM ::
{ReaderT (TypecheckEnv ann)
StateT (TypecheckState ann)
(Either (TypeError ann))) a
(
}deriving newtype
Functor,
( Applicative,
Monad,
MonadReader (TypecheckEnv ann),
MonadError (TypeError ann),
MonadState (TypecheckState ann)
)
The reasons for separate Reader
and State
are the nature of the state in
them. Variables only live for the life of a function that are defined in, so
they live in the TypecheckEnv
used by Reader
, and disappear after the
function definition is typechecked (more information on this technique
here).
The trick is using the local
function from Control.Monad.Reader
:
withFunctionArgs :: [(Identifier, Type ann)] -> TypecheckM ann a -> TypecheckM ann a
=
withFunctionArgs args computation
local->
( \tce
tce= tceVars tce <> HM.fromList args
{ tceVars
}
) computation
We pass in some args
, which are the function arguments and their types, and
computation
, which is whatever typechecking we’d like do. Then throughout
running computation
, we’ll have extra variables in scope, and then they’ll
disappear again. This is helpful for typechecking functions, where the vars
only exist inside.
Inference changes
We have two new infer
cases, EVar
and EApply
. EVar
is pretty
straightforward:
EVar ann var) = do
infer (<- lookupVar ann var
ty pure (EVar ty var)
We lookup the type for var
, and decorate the type with it. If lookupVar
fails, it “throws” a TypeError ann
.
-- | look up a saved identifier "in the environment"
lookupVar :: ann -> Identifier -> TypecheckM ann (Type ann)
= do
lookupVar ann identifier <- asks (HM.lookup identifier . tceVars)
maybeType case maybeType of
Just found -> pure found
Nothing -> do
<- asks (HM.keysSet . tceVars)
allIdentifiers VarNotFound ann identifier allIdentifiers) throwError (
The other new infer
case is EApply
. This is used to get the type of an
applied function.
EApply ann fnName args) = do
infer (-- lookup function by name in State
<- lookupFunction ann fnName
fn <- case fn of
(ty, elabArgs) TFunction _ tArgs tReturn -> do
-- check the arguments length match the function
whenlength args /= length tArgs)
($
(throwError FunctionArgumentLengthMismatch ann
length tArgs)
(length args)
(
)-- check each arg against type
<- zipWithM check tArgs args
elabArgs -- return type and elaborated arguments
pure (tReturn, elabArgs)
-> throwError $ NonFunctionTypeFound ann fn
_ pure (EApply (ty $> ann) fnName elabArgs)
Note how we use check
here to check each argument against the expected type in the
function. This is where the bidirectional type checking approach really starts
to shine, as any problems become immediately apparent.
The lookupFunction
helper is very similar to lookupVar
, except it looks in
the State
instead of Reader
:
-- | look up a saved identifier "in the environment"
lookupFunction :: ann -> FunctionName -> TypecheckM ann (Type ann)
= do
lookupFunction ann fnName <- gets (HM.lookup fnName . tcsFunctions)
maybeType case maybeType of
Just found -> pure found
Nothing -> do
<- gets (HM.keysSet . tcsFunctions)
allFunctions FunctionNotFound ann fnName allFunctions) throwError (
These types, and the functions used to store / fetch variables and functions are defined here.
Elaborating a function
Our functions have types for the arguments, so we push them into the Reader
environment, and then elaborate the expression inside.
elaborateFunction ::
Function ann ->
TypecheckM ann (Function (Type ann))
Function ann args name expr) = do
elaborateFunction (-- with the `args` added to the Reader, infer the type of `expr`
<- withFunctionArgs args (infer expr)
exprA -- adjust the types of the arguments
let argsA :: [(ArgumentName, Type (Type ann))]
= fmap (second (\ty -> fmap (const ty) ty)) args
argsA -- create type of function
let tyFn = TFunction ann (snd <$> args) (getOuterAnnotation exprA)
-- wrap it all back up again
pure (Function tyFn argsA name exprA)
We’ve extended the Type
datatype to add a TFunction
constructor, which
contains the types of all the arguments, and the return type. All functions
will have a TFunction
type.
data Type ann
= TPrim ann TypePrim
| TFunction ann [Type ann] (Type ann) -- new!
deriving stock (Eq, Ord, Show, Functor)
Elaborating a module
Elaborating a module involves:
- Elaborate each function
- Push it’s type into the
State
- Elaborate the
main
expression
elaborateModule ::
forall ann.
Module ann ->
Either (TypeError ann) (Module (Type ann))
Module {mdFunctions, mdExpr}) = runTypecheckM (TypecheckEnv mempty) $ do
elaborateModule (-- typecheck all functions...
<-
fns traverse
-> do
( \fn -- typecheck function
<- elaborateFunction fn
elabFn -- add it to State
storeFunction (fnFunctionName elabFn) (fnAnn elabFn)-- return it
pure elabFn
)
mdFunctions
-- typecheck `expr`, and wrap everything back together
Module fns <$> infer mdExpr
Updating the interpreter
We won’t go into the interpreter changes today, they work in the same way as
the typechecker,
storing variables in the Reader
env and functions in the State
. The code
lives
here.
To the IR!
Here is an expression:
function sum(a: Integer, b: Integer) {
a + b
}
sum(20, 22)
Here is the LLVN output for it. Hopefully it’s not too brutal.
; ModuleID = 'example'
declare external ccc void @printint(i32)
define external ccc i32 @sum(i32 %a_0, i32 %b_0) {
%1 = add i32 %a_0, %b_0
ret i32 %1
}
define external ccc i32 @main() {
%1 = call ccc i32 @sum(i32 20, i32 22)
call ccc void @printint(i32 %1)
ret i32 0
}
; ModuleID = 'example'
A comment, lol.
declare external ccc void @printint(i32)
We define our output function from our standard library.
define external ccc i32 @sum(i32 %a_0, i32 %b_0) {
Define the function. The first i32
is the return type. %a_0
is the first
argument with type i32
, and %b_0
is the second argument, with type i32
.
%1 = add i32 %a_0, %b_0
Body of the sum
function, add %a_0
and %b_0
and assign the result to
%1
.
ret i32 %1
Return %1
from the function.
}
End of sum
function.
define external ccc i32 @main() {
This defines the main
function, the entry point to our application.
%1 = call ccc i32 @sum(i32 20, i32 22)
Body of the main
function - this calls the sum
function, passing it 20
amd 22
as args. The result is stored in %1
.
call ccc void @printint(i32 %1)
Pass the result (%1
) to the printint
function in our standard library.
ret i32 0
Return the value 0
to show the program successfully completed.
}
Nice closing bracket. Time for a rest.
Generating IR from Haskell
Adding functions and variables means that our IR generation also becomes stateful. We’re going to define a couple more types:
data OutputState = OutputState
osFunctions :: Map FunctionName LLVM.Operand,
{ osVars :: Map Identifier LLVM.Operand
}
It also means something can go wrong:
data OutputError
= CantFindVar Identifier
| CantFindFunction FunctionName
| NonFunctionType (Type ())
deriving stock (Eq, Ord, Show)
Both these errors shouldn’t happen if the typechecking is working, however
it’s nice to capture them properly rather than just throwing with error
.
It means we also need a similar set of functions for adding and looking up functions and variables, which can be found here and are hopefully unsurprising.
A lot of our implementation are various “turn things into LLVM” functions:
typeToLLVM :: Type ann -> LLVM.Type
TPrim _ TBool) = LLVM.i1
typeToLLVM (TPrim _ TInt) = LLVM.i32
typeToLLVM (TFunction _ tyArgs tyRet) =
typeToLLVM (LLVM.FunctionType (typeToLLVM tyRet) (typeToLLVM <$> tyArgs) False
functionNameToLLVM :: FunctionName -> LLVM.Name
FunctionName fnName) =
functionNameToLLVM (LLVM.Name (fromString (T.unpack fnName))
functionArgToLLVM ::
ArgumentName, Type (Type ann)) ->
(LLVM.Type, LLVM.ParameterName)
(ArgumentName argName, ty) =
functionArgToLLVM (let llvmType = typeToLLVM (getOuterTypeAnnotation ty)
= LLVM.ParameterName (fromString (T.unpack argName))
paramName in (llvmType, paramName)
Now we can create IR for each function in our module. We put each function
variable in State
so that we can look them up when generating expressions:
functionToLLVM ::
LLVM.MonadModuleBuilder m,
( MonadFix m,
MonadState OutputState m,
MonadError OutputError m
=>
) Function (Type ann) ->
m ()Function {fnAnn, fnFunctionName, fnBody, fnArgs}) = do
functionToLLVM (-- get llvm type of function
<- case fnAnn of
retType TFunction _ _ tyRet -> pure $ typeToLLVM tyRet
-> throwError (NonFunctionType (fnAnn $> ()))
_
let argTypes = functionArgToLLVM <$> fnArgs
= functionNameToLLVM fnFunctionName
functionName
-- create the LLVM function
<- LLVM.function functionName argTypes retType $ \args -> do
llvmFunction -- save the args in the environment
saveArgs$
( M.fromList zipWith
ArgumentName argName, _) arg ->
( \(Identifier argName, arg)
(
)
fnArgs
args
)
-- build the LLVM AST for our expression
<- exprToLLVM fnBody
ourExpression
-- return result
LLVM.ret ourExpression
-- save reference to this function in our State to lookup in other
-- expressions
saveFunction fnFunctionName llvmFunction
We have two new cases in exprToLLVM
, for EVar
and EApply
. These somewhat
echo the implementation in the typechecker above. For EVar
, we lookup the
LLVM IR in our State
.
EVar _ var) =
exprToLLVM ( lookupArg var
And for EApply
, we lookup the function in State
, then use LLVM’s call
to
pass all the arguments to it.
EApply _ fnName args) = do
exprToLLVM (<- lookupFunction fnName
irFunc <- traverse exprToLLVM args
irArgs <$> irArgs) LLVM.call irFunc ((,[])
Finally, we bring it all together in the top level moduleToLLVM
function,
which takes our typechecked Module
and creates an LLVM module.
-- | given our `Module` type, turn it into an LLVM module
moduleToLLVM :: Module (Type ann) -> Either OutputError LLVM.Module
Module {mdExpr = expr, mdFunctions}) =
moduleToLLVM (flip evalStateT (OutputState mempty mempty) $ LLVM.buildModuleT "example" $ do
-- get the printing function for our `expr`'s return type
<- printFunction (getOuterAnnotation expr)
printFn
-- create all our functions
traverse_ functionToLLVM mdFunctions
-- create a function called `main` that will be the entry point to our
-- program
"main" [] LLVM.i32 $ \_ -> do
LLVM.function -- build the LLVM AST for our expression
<- exprToLLVM expr
ourExpression
-- print our result to stdout
<- LLVM.call printFn [(ourExpression, [])]
_
-- return success exit code of `0`
0) LLVM.ret (LLVM.int32
You can see all of the LLVM generation code here.
Well, well, well, if it’s not the end of the article
Congratulations, you are all functional programming language implementation experts now. Hopefully you found that somewhat interesting and/or useful. Next time we’ll be adding basic product types and pattern matching. Great!
Make sense? If not, get in touch!
Further reading: