Bound by a Free-Monad-like Structure

2023-05-11

Recently my friend Jonas Carpay has been teaching me about the Haskell package bound, and the fundamental problem it is trying to solve. bound gives a good solution for defining a lambda calculus in Haskell, specifically with how to differentiate between bound and free variables.

While playing around with the ideas from bound, I found a neat way of defining a datatype for a lambda calculus expression using something like a free monad.

This post explains a little background for bound, and then shows my own definition using the free-monad-like structure.

The Haskell code in this post is based on GHC-9.2, Stackage LTS-20.14.

Lambda Calculus

Most Haskellers will be familiar with lambda calculus. Even if you've never explicitly studied it, it will seem familiar to you just from having used Haskell.

An example lambda calculus expression might look like the following:

\x -> \y -> a (x y)

Lambda calculus has variables, function application, and lambda functions: all things you are familiar with from Haskell.

If you wanted to write a data type in Haskell to represent a lambda calculus expression, what might it look like? You might start with something like this:

data ExpTooSimp
  = ExpTooSimpVar String
    -- ^ This refers to a variable, like @x@.
  | ExpTooSimpApp ExpTooSimp ExpTooSimp
    -- ^ This refers to a function application, like @f x@.
  | ExpTooSimpLam String ExpTooSimp
    -- ^ This refers to a lambda function, like @\x -> _@.

Our above example lambda calculus expression might look like the following in Haskell:

ExpTooSimpLam "x"
  ( ExpTooSimpLam "y"
      ( ExpTooSimpApp
          (ExpTooSimpVar "a")
          (ExpTooSimpApp (ExpTooSimpVar "x") (ExpTooSimpVar "y"))
      )
  )

One generalization we can make is turning the String into a parameter:

data ExpSimp a
  = ExpSimpVar a
    -- ^ This refers to a variable, like @x@.
  | ExpSimpApp (ExpSimp a) (ExpSimp a)
    -- ^ This refers to a function application, like @f x@.
  | ExpSimpLam a (ExpSimp a)
    -- ^ This refers to a lambda function, like @\x -> _@.

Then, instead of having ExpTooSimp, we have ExpSimp String.

This datatype is certainly easy to understand, but it leaves a little to be desired.

Bound vs Free Variables

One thing that the above Haskell datatype doesn't address is our desire to differentiate between bound and free variables.

For instance, in the expression:

\x -> \y -> a (x y)

the x and y are bound variables, while the a is a free variable.

Could we change our datatype to take this bound-vs-free information into account?

The article Bound by Edward Kmett shows a bunch of different ways to express this in Haskell. The article works towards explaining a specific technique, and why it is better than most other techniques.

The general technique the article works towards is referred to as Bird and Paterson. This technique was originally introduced in the first half of the De Bruijn notation as a nested datatype paper. The bound Haskell package then improves on this technique in a few ways, and wraps it up in a library.

The general idea from the Bird and Paterson technique is to take our simple expression datatype:

data ExpSimp a
  = ExpSimpVar a
    -- ^ This refers to a variable, like @x@.
  | ExpSimpApp (ExpSimp a) (ExpSimp a)
    -- ^ This refers to a function application, like @f x@.
  | ExpSimpLam a (ExpSimp a)
    -- ^ This refers to a lambda function, like @\x -> _@.

and rewrite it like this:

data Exp a
  = ExpVar a
    -- ^ This refers to a variable, like @x@.
  | ExpApp (Exp a) (Exp a)
    -- ^ This refers to a function application, like @f x@.
  | ExpLam (Exp (Maybe a))
    -- ^ This refers to a lambda function, like @\x -> _@.

The ExpVar and ExpApp cases haven't changed, but the ExpLam case needs some explaining.

First off, in the new ExpLam, where has the variable name gone? Why isn't this something like ExpLam a (Exp a)?

The insight is that in ExpLam (Exp (Maybe a)), we're recursing on Exp, but on every recursion, we add a new layer of Maybe to a. Each layer of Maybe refers to a variable being bound by a given lambda. This is a little tricky to understand, and it will help to look at a few examples.

Using Exp String, here's a top-level free variable without any lambdas involved.

lambda calculus Haskell
top-level free variable a ExpVar "a"

This should be straightforward. There are no Maybes involved here (since there are no uses ExpLam).

How about the two cases of a simple, single lambda expression: one with a free variable, and one with a bound variable?

lambda calculus Haskell
bound variable \x -> x ExpLam (ExpVar Nothing)
free variable \x -> a ExpLam (ExpVar (Just "a"))

You can see that within one layer of ExpLam, ExpVar now takes a type of Maybe String. Just refers to a free variable, while Nothing refers to the closest bound variable.

This is pretty neat. We now have a way to differentiate free and bound variables. Bound variables don't even need to carry around a variable name (once a variable is bound, it doesn't really matter what its name is)!

What does this look like with more layers of lambdas? Here are a few more examples:

lambda calculus Haskell
var bound to inner lambda \x -> \y -> y ExpLam (ExpLam (ExpVar Nothing))
var bound to outer lambda \x -> \y -> x ExpLam (ExpLam (ExpVar (Just Nothing)))
free var \x -> \y -> a ExpLam (ExpLam (ExpVar (Just (Just "a"))))

You can see here that within two layers of lambdas, ExpVar now takes a type of Maybe (Maybe String). When adding each layer of lambda, we have to add a new layer of Maybe.

With two layers of lambda (and two corresponding layers of Maybe), you can see that Nothing refers to the closest binding lambda, Just Nothing refers to the next closest binding lambda, and Just (Just v) refers to a free variable.

While this is certainly clever, why do we prefer Exp over ExpSimp?

Monad and Foldable for Exp

Exp is nice because it has a straightforward Monad, Foldable, and Traversable instance that allows us to manipulate free (non-bound) variables.

Here's what these instances look like:

data Exp a
  = ExpVar a
  | ExpApp (Exp a) (Exp a)
  | ExpLam (Exp (Maybe a))
  deriving stock (Functor, Foldable, Traversable)

-- A trivial Applicative instance
instance Applicative Exp where pure = ExpVar ; (<*>) = ap

instance Monad Exp where
  (>>=) :: Exp a -> (a -> Exp b) -> Exp b
  e >>= k =
    case e of
      ExpVar a -> k a
      ExpApp f a -> ExpApp (f >>= k) (a >>= k)
      ExpLam l -> ExpLam (l >>= traverse k)

Let's look at how we can use each of these instances in turn. First, here's an example expression we can play around with:

exampleExp1 :: Exp String
exampleExp1 =
  ExpLam $
    ExpApp
      (ExpVar (Just "a"))
      (ExpLam $
        ExpApp
          (ExpVar (Just Nothing))
          (ExpVar (Just (Just "b"))
      )

This corresponds to the following Lambda calculus expression:

\x -> a (\y -> x b)

First, let's look at the Functor instance for Exp. The Functor instance gives us the ability to modify free variables. Here's an example of replacing free variables with their ASCII codes:

> fmap (ord . head :: String -> Int) exampleExp1 :: Exp Int
ExpLam $
  ExpApp
    (ExpVar (Just 97))
    (ExpLam $
      ExpApp
        (ExpVar (Just Nothing))
        (ExpVar (Just (Just 98))
    )

This result corresponds to a lambda expression that would look something like the following1:

\x -> 97 (\y -> x 98)

Note that the bound x variable is not affected in any way, only the free a and b variables.

The Monad instance gives us a way to substitute free variables. For instance, in our running example, let's say we want to substitute in the identity function in place of the free variable a.

In our example \x -> a (\y -> x b), we want to substitute \z -> z in place of a. The result should look like:

\x -> (\z -> z) (\y -> x b)

Let's start by writing a function to do the substitution:

substituteIdForA :: String -> Exp String
substituteIdForA v
  | v == "a"  = ExpLam (ExpVar Nothing)
  | otherwise = ExpVar v

Let's use this with bind:

> exampleExp1 >>= substituteIdForA :: Exp String
ExpLam $
  ExpApp
    (ExpLam (ExpVar Nothing))
    (ExpLam $
      ExpApp
        (ExpVar (Just Nothing))
        (ExpVar (Just (Just "b"))
    )

You can see how the ExpVar (Just "a") case has been replaced by the new lambda expression.

Finally, let's look at Foldable2. This gives us a way to pull out all the free variables3:

> foldMap (:[]) exampleExp1 :: [String]
["a", "b"]

This is a nice way to operate over all the free variables in an expression at once.

Takeaway for the Bird and Paterson Technique

The takeaway here is that once you come up with your expression data type:

data Exp a
  = ExpVar a
  | ExpApp (Exp a) (Exp a)
  | ExpLam (Exp (Maybe a))

you can automatically derive Functor, Foldable, and Traversable. The Applicative instance is trivial. All you really need to write is the Monad instance, but it is straightforward.

Despite being easy to write, these instances give you quite a lot of power for manipulating free variables in your expressions. This is pretty nice!4

Push it Further

While this is certainly nice, I started wondering if there was any way to push it further. Is there some formulation where you don't even have to write the Monad instance?

The first thing that came to my mind was a free-monad-like construction. With the normal approach to free monads for eDSLs, all you have to do is create a datatype and derive Functor for it. You then wrap it up in Free and you get the Monad instance... well, for free.

Could it work to just pass Exp to Free? Maybe something like the following?

data Free f a
  = Pure a
  | Free (f (Free f a))

data ExpFBad r
  = ExpFBadApp r r
  | ExpFBadLam (Maybe r)

type ExpBad = Free ExpFBad

This almost looks like it would work, but you'll notice that the ExpFBadLam data constructor is incorrect. Compare to Exp:

data Exp a
  = ExpVar a
  | ExpApp (Exp a) (Exp a)
  | ExpLam (Exp (Maybe a))

In ExpLam, we're not recursing on Maybe (Exp a), we're recursing on Exp (Maybe a). As far as I can tell, it is not possible to express this type of recursive datatype with Free5.

Let's rewrite Exp to make this a little more obvious:

data ExpTran a
  = ExpTranVar a
  | ExpTranApp (IdentityT ExpTran a) (IdentityT ExpTran a)
  | ExpTranLam (MaybeT ExpTran a)

Here, instead of recursing directly in Exp, we've defined all recursion to happen through the monad transformers IdentityT and MaybeT. This ExpTran datatype is isomorphic to Exp, but it helps to show why it is not possible to define Exp using Free plus some functor.

Is there some sort of Free-like structure that we can use to define Exp?

FFree and ExpF for Exp

After playing around with this a bunch, I came up with the following construction:

data ExpF f a
  = ExpFApp (IdentityT f a) (IdentityT f a)
  | ExpFLam (MaybeT f a)

data FFree f a
  = FPure a
  | FFree (f (FFree f) a)

Here's ExpTran and Free for an easy comparison:

data ExpTran a
  = ExpTranVar a
  | ExpTranApp (IdentityT ExpTran a) (IdentityT ExpTran a)
  | ExpTranLam (MaybeT ExpTran a)

data Free f a
  = Pure a
  | Free (f (Free f a))

Let's see if we can write Functor, Applicative, and Monad for FFree. We may need the assistance of an instance like Functor for ExpF, but let's try without and see how far we get.

First, Functor for FFree f:

instance Functor (FFree f) where
  fmap :: (a -> b) -> FFree f a -> FFree f b
  fmap g = \case
    FPure a -> FPure (g a)
    FFree ex -> FFree _

Hmm, we're stuck on the FFree case. What is the type of _?

_ :: f (FFree f) b

What do we have in the environment that is relevant?

ex :: f (FFree f) a

g :: a -> b

It seems like we need f to be a Functor:

theFunctionWeNeedForFmap :: (a -> b) -> f (FFree f) a -> f (FFree f) b

Let's try writing the Monad instance for FFree f:

instance Monad (FFree f) where
  (>>=) :: FFree f a -> (a -> FFree f b) -> FFree f b
  m >>= k =
    case m of
      FPure a -> k a
      FFree ex -> FFree _

We're stuck on the FFree case again. What is the type of _ this time?

_ :: f (FFree f) b

What do we have in the environment that is relevant?

ex :: f (FFree f) a

k :: a -> FFree f b

This is a little different than the Functor case, but it looks like we need a function like the following:

theFunctionWeNeedForBind :: f (FFree f) a -> (a -> FFree f b) -> f (FFree f) b

We might even be able to generalize this a to:

theFunctionWeNeed :: Monad h => f h a -> (a -> h b) -> f h b

In our case, f will be ExpF, and h will be FFree ExpF. Fully specialized, this would look like:

theFunctionWeNeedSpecial
  :: ExpF (FFree ExpF) a -> (a -> FFree ExpF b) -> ExpF (FFree ExpF) b

The Bound Typeclass

Let's wrap this function up in a typeclass6:

class Bound f where
  (>>>=) :: Monad h => f h a -> (a -> h b) -> f h b

We'll need an instance for ExpF (since f will eventually become ExpF and h will be FFree ExpF).

But first, lets confirm we can use this typeclass to write Functor and Monad for FFree:

instance Bound f => Functor (FFree f) where
  fmap :: (a -> b) -> FFree f a -> FFree f b
  fmap f = \case
    FPure a -> FPure (f a)
    FFree ex -> FFree $ ex >>>= (pure . f)

instance Bound f => Applicative (FFree f) where pure = FPure ; (<*>) = ap

instance Bound f => Monad (FFree f) where
  (>>=) :: FFree f a -> (a -> FFree f b) -> FFree f b
  m >>= k =
    case m of
      FPure a -> k a
      FFree ex -> FFree $ ex >>>= k

Looks like it works!

Now for our ExpF instance of Bound:

instance Bound ExpF where
  (>>>=) :: Monad h => ExpF h a -> (a -> h b) -> ExpF h b
  ex >>>= f = case ex of
    ExpFApp ida idb -> ExpFApp (ida >>= lift . f) (idb >>= lift . f)
    ExpFLam may -> ExpFLam $ may >>= lift . f

What have we gained?

So what have we gained here? Let's review the relevant datatypes and instances. First, Exp:

data Exp a
  = ExpVar a
  | ExpApp (Exp a) (Exp a)
  | ExpLam (Exp (Maybe a))

instance Functor Exp
instance Applicative Exp
instance Monad Exp

We have Functor, Applicative, and Monad for Exp.

Now Bound, ExpF, and FFree:

class Bound f where
  (>>>=) :: Monad h => f h a -> (a -> h b) -> f h b

data ExpF f a
  = ExpFApp (IdentityT f a) (IdentityT f a)
  | ExpFLam (MaybeT f a)

instance Bound ExpF

data FFree f a
  = FPure a
  | FFree (f (FFree f) a)

instance Bound f => Functor (FFree f) where
instance Bound f => Applicative (FFree f) where
instance Bound f => Monad (FFree f) where

Here are the big ideas:

  1. FFree ExpF is isomorphic to Exp.
  2. FFree and Bound could be wrapped up in a library, similar to free.
  3. All an end-user would have to provide is the datatype ExpF and the Bound ExpF instance. The end-user would not have to write the Functor, Applicative, or Monad instances.
  4. FFree and Bound are sort of like Free and Functor, except that GHC can automatically derive Functor in most cases. The end-user would have to manually write the Bound ExpF instance.

More Generalization

It is possible to generalize this a little more. You can pull out the MaybeT and IdentityT transformers used in ExpF, and define Bound for any MonadTrans. Here's what we originally had:

data ExpF f a
  = ExpFApp (IdentityT f a) (IdentityT f a)
  | ExpFLam (MaybeT f a)

This can be generalized to:

data ExpTransF i m f a
  = ExpTransFApp (i f a) (i f a)
  | ExpTransFLam (m f a)

The Bound instance for ExpTransF then becomes7:

instance
    ( forall x. Monad x => Monad (i x)
    , MonadTrans i
    , forall x. Monad x => Monad (m x)
    , MonadTrans m
    ) => Bound (ExpTransF i m) where
  (>>>=) :: Monad h => ExpTransF i m h a -> (a -> h b) -> ExpTransF i m h b
  ex >>>= f = case ex of
    ExpTransFApp ida idb -> ExpTransFApp (ida >>= lift . f) (idb >>= lift . f)
    ExpTransFLam l -> ExpTransFLam $ l >>= lift . f

Now, we have the following isomorphisms:

  • ExpTransF IdentityT MaybeT is isomorphic to ExpF.
  • FreeF (ExpTransF IdentityT MaybeT) is isomorphic to both FreeF ExpF and Exp.

Alternatively, you can actually define Bound instances for MaybeT and IdentityT, and call out to them in the implementation of the Bound instance for ExpTransF:

instance Bound IdentityT where
  (>>>=) :: Monad h => IdentityT h a -> (a -> h b) -> IdentityT h b
  (IdentityT m) >>>= f = IdentityT $ m >>= f

instance Bound MaybeT where
  (>>>=) :: Monad h => MaybeT h a -> (a -> h b) -> MaybeT h b
  (MaybeT m) >>>= f = MaybeT $ m >>= traverse f

instance (Bound i, Bound m) => Bound (ExpF i m) where
  (>>>=) :: Monad h => ExpF i m h a -> (a -> h b) -> ExpF i m h b
  ex >>>= f = case ex of
    ExpFApp ida idb -> ExpFApp (ida >>>= f) (idb >>>= f)
    ExpFLam l -> ExpFLam $ l >>>= f

Conclusion

In practice, is this FFree, Bound, and ExpF approach good?

In my opinion, using FFree, Bound, and ExpF likely doesn't save you that much work over just writing a Monad instance for Exp. And showing a value of FFree ExpF is harder to understand than Exp, since there are many "extra" constructors.

If there was a FFree library, it might contain a few nice helper functions, but my hunch is that it wouldn't save you too much work over just using Exp and writing them yourself.

However, if you want to use ExpTransF and frequently change the monad transformer you are working with, maybe FFree and Bound would save you a little work?

I'd be interested in hearing from anyone who has used a similar approach, or possibly wants to collaborate on a library in this space8.

Footnotes


  1. It's another question entirely whether an operation like this would mean anything semantically for the underlying lambda calculus. This is mostly just an example of using fmap!↩︎

  2. I'm not going to discuss Traversable, but it should be straightforward to understand if you understand the Foldable instance.↩︎

  3. Note that (:[]) is the a -> [a] singleton list construction function. Otherwise known as pure for the list monad.↩︎

  4. There are a few other advantages for the Bird and Paterson technique, You can find them in the Bound article.↩︎

  5. I haven't formally proven this, but I played around with it a bunch and wasn't able to get it to work. I plan on brushing off my dusty Coq skills and trying coming up with an actual proof of this.↩︎

  6. This is actually the Bound typeclass from bound. I stumbled on this without realizing it was already defined in bound, but it was very nice to know I wasn't just pulling some random function out of the air.↩︎

  7. The quantified constraints make this look a little complicated, but I expect they will go away starting with transformers-0.6.0.0, available with GHC-9.6, since MonadTrans has gained these exact quantified constraints.↩︎

  8. The issue tracker for the bound library does have an open issue about people trying things in this same space.↩︎

tags: haskell