Single Pass Recursion in Rust


This is the third post in a three-post series. In the first post we developed a stack-safe, ergonomic, and concise method for working with recursive data structures (using a simple expression language as an example). In the second post we made it fully generic, providing a set of generic tools for expanding and collapsing any recursive data structure in Rust.

In this post we will see how to combine these two things - expanding a structure and collapsing it at the same time, performing both operations in a single pass. In the process, we will gain the ability to write arbitrary recursive functions over traditional boxed-pointer recursive structures (instead of the novel RecursiveTree type introduced in my previous post) while retaining stack safety.

execution graph for simultaneously expanding and collapsing a simple expression

This post covers functionality already implemented in my recursion crate. For that reason, it will focus on diagrams over code, but the relevant code is provided if you’re interested or curious.

Our Expression Language

Earlier in this series we defined a simple expression language for math, using this enum to represent a single layer of an expression tree:

pub enum ExprLayer<A> {
    Add { a: A, b: A },
    Sub { a: A, b: A },
    Mul { a: A, b: A },
    LiteralInt { literal: i64 },
}

struct ExprBoxed(Box<ExprLayer<ExprBoxed>>);

So that we can easily write expressions out by hand (for test cases and examples), let’s define some helper functions (specifics omitted).

// some simple utility functions for creating boxed expressions
impl ExprBoxed {
    fn add(a: Self, b: Self) -> Self { ... }
    fn sub(a: Self, b: Self) -> Self { ... }
    fn mul(a: Self, b: Self) -> Self { ... }
    fn literal_int(x: i64) -> Self { ... }
}

In this post we’ll be working with the expression (5 - 3) * (3 + 12). Here’s what that looks like as code:

use ExprBoxed::*;
let expr = mul(
	sub(literal_int(5), literal_int(3)),
	add(literal_int(3), literal_int(12)),
);

Expand and Collapse

Previously, we defined two core traits representing the ability to expand a recursive structure out from some seed, and to collapse a recursive structure down into a single value:

/// Support for expanding a data structure from a seed value, one layer at a time
pub trait Expand<A, Wrapped> {
    fn expand_layers<F: Fn(A) -> Wrapped>(a: A, expand_layer: F) -> Self;
}

/// Support for collapsing a data structure into a single value, one layer at a time
pub trait Collapse<A, Wrapped> {
    fn collapse_layers<F: FnMut(Wrapped) -> A>(self, collapse_layer: F) -> A;
}

We’ll be introducing a new recursion backend that implements these traits. This is already implemented in the recursion crate, so this post is concerned mostly with communicating the underlying concepts - not the implementation details.

Recursive Data Structure

Instead of using a vector of elements linked by vector indices, we’ll construct a vector where linkages are defined only by the relative position of elements. This lets us replace the usize vector indices used in the previous posts with zero-size markers (StackMarker, equivalent to () but given its own name for convenience). This results in an extremely compact representation of recursive structures.

Here’s what the data structure looks like. It’s provided by the recursion crate, so you won’t need to define it yourself.

pub struct StackMarker;

/// Recursive tree made up of layers of some type 'Layer<_>', eg `ExprLayer<_>`
pub struct RecursiveTree<Wrapped> {
    /// nonempty, in topological-sorted order
    elems: Vec<Wrapped>, // Layer<Index> (eg `ExprLayer<()>`)
}

Here’s a visualization of what this data structure looks like for the expression (5 - 3) * (3 + 12).

simple expr data at rest

Expand

Let’s see what expanding a boxed expression tree for (5 - 3) * (3 + 12) into a RecursiveTree looks like. We’ll look at the implementation soon, but this visualization shows what the computation looks like.

execution graph for expansion of a simple expression

You can see the implementation of expand_layers for this structure below, if you’re curious, but the important thing is that we have an implementation of Expand provided by the recursion crate, not how its implemented. The whole point of having a crate is not having to examine every implementation detail!

impl<A, Underlying, O: MapLayer<StackMarker, Unwrapped = A, To = U>> Expand<A, O>
    for RecursiveTree<U>
{
    fn expand_layers<F: Fn(A) -> O>(a: A, generate_layer: F) -> Self {
        let mut frontier = Vec::from([a]);
        let mut elems = vec![];

        // expand to build a vec of elems while preserving topo order
        while let Some(seed) = frontier.pop() {
            let layer = generate_layer(seed);

            let mut topush = Vec::new();
            let layer = layer.map_layer(|aa| {
                topush.push(aa);
                StackMarker
            });
            frontier.extend(topush.into_iter().rev());

            elems.push(layer);
        }

        elems.reverse();

        Self {
            elems,
            _underlying: std::marker::PhantomData,
        }
    }
}

Let’s see what expanding a boxed-pointer expression into a RecursiveTree looks like. Since it’s already made up of ExprLayer layers, we can just dereference the boxed value using the * operator.

let expr_tree = RecursiveTree::expand_layers(expr, |ExprBoxed(boxed)| *boxed)

Collapse

Here’s what collapsing the RecursiveTree that we just constructed for (5 - 3) * (3 + 12) looks like. As in the last section, this visualization shows what the computation looks like.

execution graph for collapsing of a simple expression
impl<A, O, U: MapLayer<A, To = O, Unwrapped = StackMarker>> Collapse<A, O>
    for RecursiveTree<U>
{
    fn collapse_layers<F: FnMut(O) -> A>(self, mut collapse_layer: F) -> A {
        let mut result_stack = Vec::new();

        for layer in self.elems.into_iter() {
            let layer = layer.map_layer(|_| result_stack.pop().unwrap());

            result_stack.push(collapse_layer(layer));
        }

        result_stack.pop().unwrap()
    }
}

Here’s how we can use this to collapse expressions represented in the RecursiveTree format. Nothing complex, just some simple arithmatic:

let result = recursive_tree.collapse_layers(|expr| {
    use ExprLayer::*;
	match expr {
        Add { a, b } => a + b,
        Sub { a, b } => a - b,
        Mul { a, b } => a * b,
        LiteralInt { literal } => literal,
	}
})

Expand and Collapse in a Single Pass

As a reminder, the RecursiveTree representing (5 - 3) * (3 + 12) looks like this:

simple expr data at rest

Here’s a visualization showing what the full evaluation looks like, if we run expand_layers immediately followed by collapse_layers:

execution graph for expanding and then collapsing a simple expression

But what if could run both operations in a single pass? That would remove the need to hold the full tree in memory. It would also let us avoid introducing a new type of data structure - RecursiveTree - into our projects. It should be possible to evaluate the expression like this, one branch at a time, instead of constructing the full tree:

execution graph for simultaneously expanding and collapsing a simple expression

The function that does this is called expand_and_collapse, and it’s provided by the recursion crate. Docs are here.

/// For some Layer type, Expandable is Layer<Seed> and Collapsable is Layer<Out>
pub fn expand_and_collapse<Seed, Out, Expandable, Collapsable>(
    seed: Seed,
    mut expand_layer: impl FnMut(Seed) -> Expandable,
    mut collapse_layer: impl FnMut(Collapsable) -> Out,
) -> Out
    Expandable: MapLayer<(), Unwrapped = Seed>,
    <Expandable as MapLayer<()>>::To: MapLayer<Out, Unwrapped = (), To = Collapsable>,

The full implementation is below, if you’re curious.

pub fn expand_and_collapse<Seed, Out, Expandable, Collapsable>(
    seed: Seed,
    mut expand_layer: impl FnMut(Seed) -> Expandable,
    mut collapse_layer: impl FnMut(Collapsable) -> Out,
) -> Out
where
    Expandable: MapLayer<(), Unwrapped = Seed>,
    <Expandable as MapLayer<()>>::To: MapLayer<Out, Unwrapped = (), To = Collapsable>,
{
    enum State<Seed, CollapsableInternal> {
        Expand(Seed),
        Collapse(CollapsableInternal),
    }

    let mut vals: Vec<Out> = vec![];
    let mut stack = vec![State::Expand(seed)];

    while let Some(item) = stack.pop() {
        match item {
            State::Expand(seed) => {
                let node = expand_layer(seed);
                let mut seeds = Vec::new();
                let node = node.map_layer(|seed| {
                    seeds.push(seed)
                });

                stack.push(State::Collapse(node));
                stack.extend(seeds.into_iter().map(State::Expand));
            }
            State::Collapse(node) => {
                let node = node.map_layer(|_: ()| vals.pop().unwrap());
                vals.push(collapse_layer(node))
            }
        };
    }
    vals.pop().unwrap()
}

Here’s how we can use this function to evaluate a traditional boxed-pointer recursive expression tree, without constructing an intermediate RecursiveTree - thus saving on both conceptual overhead and memory-usage overhead:

fn eval(expr: ExprBoxed) -> i63 {
    expand_and_collapse(
        expr,                      // seed value
        |ExprBoxed(boxed)| *boxed, // expand layer function
        |expr| {
            // collapse layer function
            use ExprLayer::*;
            match expr {
                Add { a, b } => a + b,
                Sub { a, b } => a - b,
                Mul { a, b } => a * b,
                LiteralInt { literal } => literal,
            }
        },
    )
}

This is fully generic, and can be used with any recursive structure, not just the simple expression language used in this post. In the next post, I’ll show how to use expand_and_collapse to build an expression language for matching filesystem entities, with features like short-circuiting to minimize syscalls, arena allocation (as a flex), and, of course, many more execution graphs.

Computer Science Implications

If you have a background in data structures and algorithms, and if you looked closely at the implementation details of the above functions, you might recognize them as implementing stack machines.

If you’re familiar with Haskell and Recursion schemes, you might recognize expand_and_collapse as a hylomorphism. If you don’t know what the hell a hylomorphism is and are interested in learning, there’s an excellent blog post here that goes into some detail on the concept.

There does appear to be a fundamental correspondence between stack machines and hylomorphisms. This is interesting, because these ideas arise in different corners of the computer science universe. That said, this isn’t that surprising, as both are descriptions of the same thing: recursion.

Thank you

Thank you to Fiona, Rain, Eliza, among others, for reviewing drafts of this post.


See also