How lambdas & closures are implemented

Can closures+lambdas be implemented irrespective of whether the language is garbage collected?

Jan 23, 2022

This post is a copy of the gist I posted in response to the WeekendDevPuzzle of 2022-01-22, shown below.

As the thread above has submissions from people who chose to share their perspectives on the puzzle, I'll be linking this post to this Twitter thread, to avoid polluting the original one.

Why this topic?

Given that I started these puzzles as a way to peel the onion, so to say, lambdas + closures make for a particularly great subject, because of the large difference between our mental models of them (useful while programming), vs how they're actually implemented.

You see, as programmers, we're quite comfortable with moving data around, be it when we're passing them as parameters to a function, or when returning values. Whether the data is in stack, heap, register, it doesn't matter, nor whether we're passing them by value or by reference. I mean, it does matter, but our mental models aren't too separated from the underlying reality. We know that when we pass by reference, the object is somewhere in memory (often the heap), and it's just the address to that object being passed around.

But lambdas aren't data, they're code, while closures are data. So what does it mean that we return a lambda function? Are we moving code around? Did that code get created when we hit that particular point in our execution, or was it always there? If it was always there, how is it able to access the captured variables in the closure? After all, we didn't pass any of them as parameters.

So many questions! At least I recall thinking about these the first time I encountered them many years ago (it was Python that exposed me to it, if you're interested).

A quick refresher of code vs data

You might already be aware that CPUs go about sequentially executing machine code, while they maintain dedicated registers to things like stack pointer etc. So at least at a fundamental level, you'd know that your machine organises instructions/code separately from data (stack or heap).

How the instruction part is laid out is determined by the compiler (which produced machine code) and the loader (which loaded that executable in memory amongst other things). In case of JIT runtimes like JVM etc, the code also mutates itself over time.

The stack is the part of memory which expands/shrinks depending upon how functions get invoked. When a function is called, a stack frame gets pushed (so the stack grows), while when a function returns the stack gets popped (stack shrinks). Any variable that gets allocated on the stack dies (shouldn't be accessed) after that function has returned, as that memory address isn't inside the stack anymore.

The heap is another place where data is managed. Because allocations here are managed outside of stack frame boundaries, objects created here can outlast functions that created them. So a function can create such an object, and return a reference to them. So long as these references are well tracked, by language runtime or a library or by programmer, everything's good. All GC languages keep most of their objects here, naturally.

Now, the question is, what really happens when we create a lambda?

Lambdas

Let's first talk just about how a lambda gets created (we'll get to closures soon). Clearly, whatever entity creates a lambda, needs access to the source code to understand & convert the lambda into machine code. The source code would also be needed to semantically understand which variables to include in the closure.

Which entity has access to the source code? The compiler. Which means every single lambda, no matter how many if-elses it's hidden behind, gets parsed and compiled by the compiler (in case of languages like Python, remember that the compiler is running at runtime).

So, even though it feels like you're creating a lambda at a particular point in your program, the reality is the lambda gets created at compile time itself, most often as some randomly named anonymous function. There's no stack/heap memory allocation that's happening when creating a lambda, unlike when creating other kinds of objects (remember that we're talking only lambdas right now, not closures). Given how we create lambdas in code just like we create objects, the cognitive dissonance between our mental model of them vs how they're implemented can be tricky.

Let's say, our lambda got created inside a function that has a bunch of variables. Such lambdas can access two kinds of data:

Parameters defined in the lambda
Any variable that's available in the lambda's lexical scope (this constitutes the closure for this lambda)

Recall from the previous section that lambdas had their code pre-compiled, so when we call/invoke the lambda, that piece of code needs access to both of these kinds of data.

Access to parameter is fine, as it would get implemented with the same ABI semantics as normal function calls. Parameters are usually passed either by specific registers, or by pushing the parameters in the new stack frame that gets created for that function call.

But how to access the data that was captured as a closure?

Closures

At this point in time, we can wonder, well maybe the compiler pre-recorded the address of the captured variables (the compiler knows which ones) and baked it into the machine code of the lambda. We can immediately dismiss this idea, because the compiler does not have address information for any variable that isn't statically available (think, why?).

Another possibility is that when a lambda is moved around (say, being returned from a function), what's really being moved around is a combination of the address of the compiled anonymous lambda function, and a metadata object that's just a struct of the captured variables. Hmm, that could work. This way, the compiler could include this struct as a hidden function parameter, and appropriately resolve those variable names inside the lambda functions code. I'm skipping a bunch of possible optimisations here, but they aren't very relevant to our core topic.

Now the question is, whether the captured variables are included in this struct as a copy? Or as a reference? If it's a copy, things are fine even if inefficient. But if it's by reference, we have a problem! What happens if the captured variable is no more valid by the time the lambda is invoked? This can easily happen in non-GC languages if a reference to a local variable was captured during lambda creation, and that lambda was returned.

With GC languages, this issue doesn't arise, because every object is on heap, and preserved until any references remain, so a captured variable reference makes sure that that object is not deallocated while the lambda remains alive.

But with non-GC, as you can guess, the only way to ensure correctness, is by ensuring that the lifecycle of the captured vairables is >= the lifecycle of the lambda. One way to do that is by putting the necessary object on the heap, and capturing that reference. This requires programmer intervention in non-GC languages, e.g. in case of Rust this would mean using Box.

Conclusion

As you can see, while for GC languages, it isn't too hard to introduce lambdas & closures, for non-GC languages it does require programmer intervention to deal with the lifecycle mismatch between captured closure and the lambda itself (remember that this mismatch doesn't always exist).

I've focussed only on the principles, that diving deep into any specific implementation, but if you find anything incorrect in here, or some implementation that proves these principals wrong, please do point out. Thanks.

I hope this was informative. If you've got a comment to share, head over to the gist of this post