Folds on lists

The previous post turned out to be rather complicated. In this one, I return to much simpler matters: {\mathit{foldr}} and {\mathit{foldl}} on lists, and list homomorphisms with an associative binary operator. For simplicity, I’m only going to be discussing finite lists.

Fold right

Recall that {\mathit{foldr}} is defined as follows:

\displaystyle  \begin{array}{@{}ll} \multicolumn{2}{@{}l}{\mathit{foldr} :: (\alpha \rightarrow \beta \rightarrow \beta) \rightarrow \beta \rightarrow [\alpha] \rightarrow \beta} \\ \mathit{foldr}\;(\oplus)\;e\;[\,] & = e \\ \mathit{foldr}\;(\oplus)\;e\;(a:x) & = a \oplus \mathit{foldr}\;(\oplus)\;e\;x \end{array}

It satisfies the following universal property, which gives necessary and sufficient conditions for a function {h} to be expressible as a {\mathit{foldr}}:

\displaystyle  \begin{array}{@{}l} h = \mathit{foldr}\;(\oplus)\;e \quad\Longleftrightarrow\quad h\;[\,] = e \;\land\; h\;(a:x) = a \oplus (h\;x) \end{array}

As one consequence of the universal property, we get a fusion theorem, which states sufficient conditions for fusing a following function {g} into a {\mathit{foldr}}:

\displaystyle  \begin{array}{@{}l} g \cdot \mathit{foldr}\;(\oplus)\;e = \mathit{foldr}\; (\otimes)\;e' \quad \Longleftarrow\quad g\;e = e' \;\land\; g\;(a \oplus b) = a \otimes g\;b \end{array}

(on finite lists—infinite lists require an additional strictness condition). Fusion is an equivalence if {\mathit{foldr}\;(\oplus)\;e} is surjective. If {\mathit{foldr}\;(\oplus)\;e} is not surjective, it’s an equivalence on the range of that fold: the left-hand side implies

\displaystyle  g\;(a \oplus b) = a \otimes g\;b

only for {b} of the form {\mathit{foldr}\;(\oplus)\;e\;x} for some {x}, not for all {b}.
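
For instance, instantiating the fusion theorem with {g = (+1)}, {(\oplus) = (+)} and {e = 0}: we have {g\;0 = 1} and {g\;(a + b) = a + g\;b}, so the successor of a sum fuses into a single fold. A minimal Haskell sketch (the function name is mine):

succSum :: [Integer] -> Integer
succSum = foldr (+) 1   -- equals (+1) . foldr (+) 0 on finite lists, by fusion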

As a second consequence of the universal property (or alternatively, as a consequence of the free theorem of the type of {\mathit{foldr}}), we have map–fold fusion:

\displaystyle  \mathit{foldr}\;(\oplus)\;e \cdot \mathit{map}\;g = \mathit{foldr}\;((\oplus) \cdot g)\;e

The code golf “{((\oplus) \cdot g)}” is a concise if opaque way of writing the binary operator pointfree: {((\oplus) \cdot g)\;a\;b = g\;a \oplus b}.
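
For example, summing the doubles of a list fuses into one traversal; a small sketch (the name sumDoubles is mine):

sumDoubles :: [Integer] -> Integer
sumDoubles = foldr ((+) . (*2)) 0   -- equals sum . map (*2), since ((+) . (*2)) a b = 2*a + b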

Fold left

Similarly, {\mathit{foldl}} is defined as follows:

\displaystyle  \begin{array}{@{}ll} \multicolumn{2}{@{}l}{\mathit{foldl} :: (\beta \rightarrow \alpha \rightarrow \beta) \rightarrow \beta \rightarrow [\alpha] \rightarrow \beta} \\ \mathit{foldl}\;(\oplus)\;e\;[\,] & = e \\ \mathit{foldl}\;(\oplus)\;e\;(a:x) & = \mathit{foldl}\;(\oplus)\;(e \oplus a)\;x \end{array}

({\mathit{foldl}} is tail recursive, so it is unproductive on an infinite list.)

{\mathit{foldl}} also enjoys a universal property, although it’s not so well known as that for {\mathit{foldr}}. Because of the varying accumulating parameter {e}, the universal property entails abstracting that argument on the left in favour of a universal quantification on the right:

\displaystyle  \begin{array}{@{}l} h = \mathit{foldl}\;(\oplus) \quad\Longleftrightarrow\quad \forall b \;.\; h\;b\;[\,] = b \;\land\; h\;b\;(a:x) = h\;(b \oplus a)\;x \end{array}

(For the proof, see Exercise 16 in my paper with Richard Bird on arithmetic coding.)
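
As a worked instance of this universal property, consider {h\;b\;x = \mathit{reverse}\;x \mathbin{{+}\!\!\!{+}} b}: clearly {h\;b\;[\,] = b}, and {h\;b\;(a:x) = \mathit{reverse}\;x \mathbin{{+}\!\!\!{+}} (a:b) = h\;(\mathit{flip}\;(:)\;b\;a)\;x}, so {h = \mathit{foldl}\;(\mathit{flip}\;(:))}, and we recover the accumulator-based reversal (a sketch; the name rev is mine):

rev :: [a] -> [a]
rev = foldl (flip (:)) []   -- rev x = h [] x = reverse x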

From the universal property, it is straightforward to prove a map–{\mathit{foldl}} fusion theorem:

\displaystyle  \mathit{foldl}\;(\oplus)\;e \cdot \mathit{map}\;f = \mathit{foldl}\;((\cdot f) \cdot (\oplus))\;e

Note that {((\cdot f) \cdot (\oplus))\;b\;a = b \oplus f\;a}. There is also a fusion theorem:
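
A quick executable sketch of map–foldl fusion (an invented example):

subDoubles :: Integer -> [Integer] -> Integer
subDoubles e = foldl ((. (*2)) . (-)) e
-- equals foldl (-) e . map (*2); note ((. (*2)) . (-)) b a = b - 2*a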

\displaystyle  (\forall e \;.\; f \cdot \mathit{foldl}\;(\oplus)\;e = \mathit{foldl}\;(\otimes)\;(f\;e)) \quad\Longleftarrow\quad f\;(b \oplus a) = f\;b \otimes a

This is easy to prove by induction. (I would expect it also to be a consequence of the universal property, but I don’t see how to make that go through.)
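
One executable instance of the fusion theorem (my choice of operators): with {f = \mathit{negate}}, {(\oplus) = (-)} and {(\otimes) = (+)}, the premise holds because {\mathit{negate}\;(b - a) = \mathit{negate}\;b + a}, so for every {e}:

prop_foldlFusion :: Integer -> [Integer] -> Bool
prop_foldlFusion e x = negate (foldl (-) e x) == foldl (+) (negate e) x
-- always True on finite lists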

Duality theorems

Of course, {\mathit{foldr}} and {\mathit{foldl}} are closely related. §3.5.1 of Bird and Wadler’s classic text presents a First Duality Theorem:

\displaystyle  \mathit{foldr}\;(\oplus)\;e\;x = \mathit{foldl}\;(\oplus)\;e\;x

when {\oplus} is associative with neutral element {e}; a more general Second Duality Theorem:

\displaystyle  \mathit{foldr}\;(\oplus)\;e\;x = \mathit{foldl}\;(\otimes)\;e\;x

when {\oplus} associates with {\otimes} (that is, {a \oplus (b \otimes c) = (a \oplus b) \otimes c}) and {a \oplus e = e \otimes a} (for all {a,b,c}, but fixed {e}); and a Third Duality Theorem:

\displaystyle  \mathit{foldr}\;(\oplus)\;e\;x = \mathit{foldl}\;(\mathit{flip}\;(\oplus))\;e\;(\mathit{reverse}\;x)

(again, all three only for finite {x}).

The First Duality Theorem is a specialization of the Second, when {{\oplus} = {\otimes}} (but curiously, also a slight strengthening: apparently all we require is that {a \oplus e = e \oplus a}, not that both equal {a}; for example, we still have {\mathit{foldr}\;(+)\;1 = \mathit{foldl}\;(+)\;1}, even though {1} is not the neutral element for {+}).
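
All three duality theorems can be rendered as executable checks, suitable for QuickCheck (the property names are mine; the Second is omitted for brevity):

prop_first, prop_firstStrengthened, prop_third :: [Integer] -> Bool
prop_first x = foldr (+) 0 x == foldl (+) 0 x                    -- (+) associative, 0 neutral
prop_firstStrengthened x = foldr (+) 1 x == foldl (+) 1 x        -- 1 + a = a + 1 suffices
prop_third x = foldr (-) 0 x == foldl (flip (-)) 0 (reverse x)   -- no conditions on (-)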

Characterization

“When is a function a fold?” Evidently, if a function {h} on lists is injective, then it can be written as a {\mathit{foldr}}: in that case (at least, classically speaking), {h} has a post-inverse—a function {g} such that {g \cdot h = \mathit{id}}—so:

\displaystyle  \begin{array}{@{}ll} & h\;(a:x) \\ = & \qquad \{ g \cdot h = \mathit{id} \} \\ & h\;(a : g\;(h\;x)) \\ = & \qquad \{ \mbox{let } a \oplus b = h\;(a : g\;b) \} \\ & a \oplus h\;x \end{array}

and so, letting {e = h\;[\,]}, we have

\displaystyle  h = \mathit{foldr}\;(\oplus)\;e

But also evidently, injectivity is not necessary for a function to be a fold; after all, {\mathit{sum}} is a fold, but not the least bit injective. Is there a simple condition that is both sufficient and necessary for a function to be a fold? There is! The condition is that lists equivalent under {h} remain equivalent when extended by an element:

\displaystyle  h\;x = h\;x' \quad\Longrightarrow\quad h\;(a:x) = h\;(a:x')

We say that {h} is leftwards. (The kernel {\mathrm{ker}\;h} of a function {h} is a relation on its domain, namely the set of pairs {(x,x')} such that {h\;x = h\;x'}; this condition is equivalent to {\mathrm{ker}\;h \subseteq \mathrm{ker}\;(h \cdot (a:))} for every {a}.) Clearly, leftwardsness is necessary: if {h = \mathit{foldr}\;(\oplus)\;e} and {h\;x = h\;x'}, then

\displaystyle  \begin{array}{@{}ll} & h\;(a:x) \\ = & \qquad \{ h \mbox{ as } \mathit{foldr} \} \\ & a \oplus h\;x \\ = & \qquad \{ \mbox{assumption} \} \\ & a \oplus h\;x' \\ = & \qquad \{ h \mbox{ as } \mathit{foldr} \mbox{ again} \} \\ & h\;(a:x') \end{array}

Moreover, leftwardsness is sufficient. For, suppose that {h} is leftwards; then pick a function {g} such that, when {b} is in the range of {h}, we get {g\;b = x} for some {x} such that {h\;x = b} (and it doesn’t matter what value {g} returns for {b} outside the range of {h}), and then define

\displaystyle  a \oplus b = h\;(a : g\;b)

This is a proper definition of {\oplus}, on account of leftwardsness, in the sense that it doesn’t matter which value {x} we pick for {g\;b}, as long as indeed {h\;x = b}: any other value {x'} that also satisfies {h\;x' = b} entails the same outcome {h\;(a:x') = h\;(a:x)} for {a \oplus b}. Intuitively, it is not necessary to completely invert {h} (as we did in the injective case), provided that {h} preserves enough distinctions. For example, for {h = \mathit{sum}} (for which indeed {\mathit{sum}\;x = \mathit{sum}\;x'} implies {\mathit{sum}\;(a:x) = \mathit{sum}\;(a:x')}), we could pick {g\;b = [b]}. In particular, {h\;x} is obviously in the range of {h}; then {g\;(h\;x)} is chosen to be some {x'} such that {h\;x' = h\;x}, and so by construction {h\;(g\;(h\;x)) = h\;x}—in other words, {g} acts as a kind of partial inverse of {h}. So we get:

\displaystyle  \begin{array}{@{}ll} & a \oplus h\;x \\ = & \qquad \{ \mbox{definition of } \oplus \mbox{; let } x' = g\;(h\;x) \} \\ & h\;(a : x') \\ = & \qquad \{ h\;x' = h\;x \mbox{; assumption} \} \\ & h\;(a:x) \\ \end{array}

and therefore {h = \mathit{foldr}\;(\oplus)\;e} where {e = h\;[\,]}.
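
Instantiating the construction for {h = \mathit{sum}} (a sketch; the operator name oplus is mine):

g :: Integer -> [Integer]
g b = [b]                       -- a valid partial inverse: sum [b] = b

oplus :: Integer -> Integer -> Integer
a `oplus` b = sum (a : g b)     -- = a + b, recovering sum = foldr oplus (sum [])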

Monoids and list homomorphisms

A list homomorphism is a fold after a map in a monoid. That is, define

\displaystyle  \begin{array}{@{}ll} \multicolumn{2}{@{}l}{\mathit{hom} :: (\beta\rightarrow\beta\rightarrow\beta) \rightarrow (\alpha\rightarrow\beta) \rightarrow \beta \rightarrow [\alpha] \rightarrow \beta} \\ \mathit{hom}\;(\odot)\;f\;e\;[\,] & = e \\ \mathit{hom}\;(\odot)\;f\;e\;[a] & = f\;a \\ \mathit{hom}\;(\odot)\;f\;e\;(x \mathbin{{+}\!\!\!{+}} y) & = \mathit{hom}\;(\odot)\;f\;e\;x \;\odot\; \mathit{hom}\;(\odot)\;f\;e\;y \end{array}

provided that {\odot} and {e} form a monoid. (One may verify that this condition is sufficient for the equations to completely define the function; moreover, they are almost necessary—{\odot} and {e} should form a monoid on the range of {\mathit{hom}\;(\odot)\;f\;e}.) The Haskell library {\mathit{Data.Foldable}} defines an analogous method, whose list instance is

\displaystyle  \mathit{foldMap} :: \mathit{Monoid}\;\beta \Rightarrow (\alpha \rightarrow \beta) \rightarrow [\alpha] \rightarrow \beta

—the binary operator {\odot} and initial value {e} are determined implicitly by the {\mathit{Monoid}} instance rather than being passed explicitly.
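
The defining equations above pattern-match on {\mathbin{{+}\!\!\!{+}}}, so they are not directly executable Haskell; a minimal executable rendering evaluates right-to-left, as the Specialization lemma below justifies:

hom :: (b -> b -> b) -> (a -> b) -> b -> [a] -> b
hom odot f e = foldr (\a b -> f a `odot` b) e
-- the monoid laws for odot and e (at least on the range) make the three
-- defining equations hold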

Richard Bird’s Introduction to the Theory of Lists states implicitly what might be called the First Homomorphism Theorem, that any homomorphism consists of a reduction after a map (in fact, a consequence of the free theorem of the type of {\mathit{hom}}):

\displaystyle  \mathit{hom}\;(\odot)\;f\;e = \mathit{hom}\;(\odot)\;\mathit{id}\;e \cdot \mathit{map}\;f

The same paper states explicitly, as Lemma 4 (“Specialization”), a Second Homomorphism Theorem: any homomorphism can be evaluated right-to-left or left-to-right, among other orders:

\displaystyle  \mathit{hom}\;(\odot)\;f\;e = \mathit{foldr}\;(\lambda\;a\;b \;.\; f\;a \odot b)\;e = \mathit{foldl}\;(\lambda\;b\;a \;.\; b \odot f\;a)\;e

I wrote up the Third Homomorphism Theorem, which is the converse of the Second Homomorphism Theorem: any function that can be written both as a {\mathit{foldr}} and as a {\mathit{foldl}} is a homomorphism. I learned this theorem from David Skillicorn, but it was conjectured earlier by Richard Bird and proved by Lambert Meertens during a train journey in the Netherlands—as the story goes, Lambert returned from a bathroom break with the idea for the proof. Formally, for given {h}, if there exist {\oplus}, {\otimes}, and {e} such that

\displaystyle  h = \mathit{foldr}\;(\oplus)\;e = \mathit{foldl}\;(\otimes)\;e

then there also exist {f} and an associative {\odot} such that

\displaystyle  h = \mathit{hom}\;(\odot)\;f\;e

Recall that {h} is “leftwards” if

\displaystyle  h\;x = h\;x' \Longrightarrow h\;(a:x) = h\;(a:x')

and that the leftwards functions are precisely the {\mathit{foldr}}s. Dually, {h} is rightwards if

\displaystyle  h\;x = h\;x' \Longrightarrow h\;(x \mathbin{{+}\!\!\!{+}} [a]) = h\;(x' \mathbin{{+}\!\!\!{+}} [a])

and (as one can show) the rightwards functions are precisely the {\mathit{foldl}}s. So the Specialization Theorem states that every homomorphism is both leftwards and rightwards, and the Third Homomorphism Theorem states that every function that is both leftwards and rightwards is necessarily a homomorphism. To prove the latter, suppose that {h} is both leftwards and rightwards; then pick a function {g} such that, for {b} in the range of {h}, we get {g\;b = x} for some {x} such that {h\;x = b}, and then define

\displaystyle  b \odot c = h\;(g\;b \mathbin{{+}\!\!\!{+}} g\;c)

As before, this is a proper definition of {\odot}: assuming leftwardsness and rightwardsness, the result does not depend on the representatives chosen for {g}. By construction, {g} again satisfies {h\;(g\;(h\;x)) = h\;x} for any {x}, and so we have:

\displaystyle  \begin{array}{@{}ll} & h\;x \odot h\;y \\ = & \qquad \{ \mbox{definition of } \odot \} \\ & h\;(g\;(h\;x) \mathbin{{+}\!\!\!{+}} g\;(h\;y)) \\ = & \qquad \{ h \mbox{ is rightwards; } h\;(g\;(h\;x)) = h\;x \} \\ & h\;(x \mathbin{{+}\!\!\!{+}} g\;(h\;y)) \\ = & \qquad \{ h \mbox{ is leftwards; } h\;(g\;(h\;y)) = h\;y \} \\ & h\;(x \mathbin{{+}\!\!\!{+}} y) \end{array}

Moreover, one can show that {\odot} is associative, and {e} its neutral element (at least on the range of {h}).

For example, sorting a list can obviously be done as a {\mathit{foldr}}, using insertion sort; by the Third Duality Theorem, it can therefore also be done as a {\mathit{foldl}} on the reverse of the list; and because the order of the input is irrelevant, the reverse can be omitted. The Third Homomorphism Theorem then implies that there exists an associative binary operator {\odot} such that {\mathit{sort}\;(x \mathbin{{+}\!\!\!{+}} y) = \mathit{sort}\;x \odot \mathit{sort}\;y}. It also gives a specification of {\odot}, given a suitable partial inverse {g} of {\mathit{sort}}—in this case, {g = \mathit{id}} suffices, because {\mathit{sort}} is idempotent. The only characterization of {\odot} arising directly from the proof involves repeatedly inserting the elements of one argument into the other, which does not exploit sortedness of the first argument. But from this inefficient characterization, one may derive the more efficient implementation that merges two sorted sequences, and hence obtain merge sort overall.
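
The familiar endpoint of that derivation, rendered in Haskell (standard code, not the derivation itself):

merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
  | x <= y    = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys

msort :: Ord a => [a] -> [a]
msort []  = []
msort [a] = [a]
msort x   = merge (msort y) (msort z)   -- sort (y ++ z) = sort y `merge` sort z
  where (y, z) = splitAt (length x `div` 2) x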

The Trick

Here is another relationship between the directed folds ({\mathit{foldr}} and {\mathit{foldl}}) and list homomorphisms—any directed fold can be implemented as a homomorphism, followed by a final function application to a starting value:

\displaystyle  \mathit{foldr}\;(\oplus)\;e\;x = \mathit{hom}\;(\cdot)\;(\oplus)\;\mathit{id}\;x\;e

Roughly speaking, this is because application and composition are related:

\displaystyle  a \oplus (b \oplus (c \oplus e)) = (a\oplus) \mathbin{\$} (b\oplus) \mathbin{\$} (c\oplus) \mathbin{\$} e = ((a\oplus) \cdot (b\oplus) \cdot (c\oplus))\;e

(here, {\$ } is right-associating, loose-binding function application). To be more precise:

\displaystyle  \begin{array}{@{}ll} & \mathit{foldr}\;(\oplus)\;e \\ = & \qquad \{ \mbox{map-fold fusion: } (\$)\;(a\oplus)\;b = a \oplus b \} \\ & \mathit{foldr}\;(\$)\;e \cdot \mathit{map}\;(\oplus) \\ = & \qquad \{ \mbox{fusion: } (\$ e)\;\mathit{id} = e \mbox{ and } (\$ e)\;((a\oplus) \cdot f) = (a\oplus)\;(f\;e) = (\$)\;(a\oplus)\;((\$ e)\;f) \} \\ & (\$ e) \cdot \mathit{foldr}\;(\cdot)\;\mathit{id} \cdot \mathit{map}\;(\oplus) \\ = & \qquad \{ (\cdot) \mbox{ and } \mathit{id} \mbox{ form a monoid} \} \\ & (\$ e) \cdot \mathit{hom}\;(\cdot)\;(\oplus)\;\mathit{id} \end{array}
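
In Haskell, The Trick is available off the shelf as the {\mathit{Endo}} monoid from {\mathit{Data.Monoid}}; a minimal sketch (the function name is mine):

import Data.Monoid (Endo(..))

foldrViaHom :: (a -> b -> b) -> b -> [a] -> b
foldrViaHom oplus e x = appEndo (foldMap (Endo . oplus) x) e
-- foldMap builds the composition (a1 `oplus`) . (a2 `oplus`) . ... . id,
-- and appEndo applies it to the starting value e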

This result has many applications. For example, it’s the essence of parallel recognizers for regular languages. Language recognition looks at first like an inherently sequential process; a recognizer over an alphabet {\Sigma} can be represented as a finite state machine over states {S}, with a state transition function of type {S \times \Sigma \rightarrow S}, and such a function is clearly not associative. But by mapping this function over the sequence of symbols, we get a sequence of {S \rightarrow S} functions, which should be subject to composition. Composition is of course associative, so can be computed in parallel, in {\mathrm{log}\;n} steps on {n} processors. Better yet, each such {S \rightarrow S} function can be represented as an array (since {S} is finite), and the composition of any number of such functions takes a fixed amount of space to represent, and a fixed amount of time to apply, so the {\mathrm{log}\;n} steps take {\mathrm{log}\;n} time.
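
Here is a toy sketch of that representation (all names invented): a two-state machine tracking the parity of the letter 'a', with each {S \rightarrow S} function stored as a table, so that the composition of any number of tables is again a table of the same size. A real parallel implementation would combine the tables in a balanced tree to realize the {\mathrm{log}\;n} schedule.

type State = Int
type Table = [State]                  -- table !! s is the next state from s

step :: Char -> Table
step 'a' = [1,0]                      -- 'a' flips the state
step _   = [0,1]                      -- other symbols leave it alone

andThen :: Table -> Table -> Table    -- diagrammatic composition: first t, then u
andThen t u = map (u !!) t

accepts :: String -> Bool
accepts w = (foldr andThen [0,1] (map step w) !! 0) == 0   -- start and accept in state 0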

Similarly for carry-lookahead addition circuits. Binary addition of two {n}-bit numbers with an initial carry-in proceeds from right to left, adding bit by bit and taking care of carries, producing an {n}-bit result and a carry-out; this too appears inherently sequential. But there aren’t many options for the pair of input bits at a particular position: two {1}s will always generate an outgoing carry, two {0}s will always kill an incoming carry, and a {1} and {0} in either order will propagate an incoming carry to an outgoing one. Whether there is a carry-in at a particular bit position is computed by a {\mathit{foldr}} of the bits to the right of that position, zipped together, starting from the initial carry-in, using the binary operator {\oplus} defined by

\displaystyle  \begin{array}{@{}ll} (1,1) \oplus b & = 1 \\ (0,0) \oplus b & = 0 \\ (x,y) \oplus b & = b \end{array}

Again, applying {\oplus} to a bit-pair makes a bit-to-bit function; these functions are to be composed, and function composition is associative. Better, such functions have small finite representations as arrays (indeed, we need a domain of only three elements, {G, K, P}; {G} and {K} are both left zeroes of composition, and {P} is neutral). Better still, we can compute the carry-in at all positions using a {\mathit{scanr}}, which for an associative binary operator can also be performed in parallel in {\mathrm{log}\;n} steps on {n} processors.
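
The three-element carrier makes a tidy Haskell monoid; a sketch with invented names:

data CKP = G | K | P   -- generate, kill, propagate

instance Semigroup CKP where
  G <> _ = G           -- left zero
  K <> _ = K           -- left zero
  P <> c = c           -- neutral

instance Monoid CKP where
  mempty = P

classify :: (Int, Int) -> CKP
classify (1,1) = G
classify (0,0) = K
classify _     = P

apply :: CKP -> Int -> Int
apply G _ = 1
apply K _ = 0
apply P c = c

carries :: [(Int, Int)] -> Int -> [Int]
carries ps c0 = map (`apply` c0) (scanr (<>) mempty (map classify ps))
-- bit pairs given most significant first; element i of the result is the
-- carry out of position i, and the last element is the carry-in c0 itself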

The Trick seems to be related to Cayley’s Theorem, on monoids rather than groups: any monoid is equivalent to a monoid of endomorphisms. That is, corresponding to every monoid {(M,{\oplus},e)}, with {M} a set, and {(\oplus) :: M \rightarrow M \rightarrow M} an associative binary operator with neutral element {e :: M}, there is another monoid {(N,(\cdot),\mathit{id})} on the carrier {N = \{ (x\oplus) \mid x \in M \}} of endomorphisms of the form {(x\oplus)}; the mappings {(\oplus) :: M \rightarrow N} and {(\$ e) :: N \rightarrow M} are both monoid homomorphisms, and are each other’s inverse. (Cayley’s Theorem is what happens to the Yoneda Embedding when specialized to the one-object category representing a monoid.) So any list homomorphism corresponds to a list homomorphism with endomorphisms as the carrier, followed by a single final function application:

\displaystyle  \begin{array}{@{}ll} & \mathit{hom}\;(\oplus)\;f\;e \\ = & \qquad \{ \mbox{Second Homomorphism Theorem} \} \\ & \mathit{foldr}\;((\oplus) \cdot f)\;e \\ = & \qquad \{ \mbox{The Trick} \} \\ & (\$ e) \cdot \mathit{hom}\;(\cdot)\;((\oplus) \cdot f)\;\mathit{id} \end{array}

Note that {(\oplus) \cdot f} takes a list element {a} to the endomorphism {(f\;a\;\oplus)}. The change of representation from {M} to {N} is the same as in The Trick, and is also what underlies Hughes’s novel representation of lists; but I can’t quite put my finger on what these both have to do with Cayley and Yoneda.


Responses to Folds on lists

  1. Tom Schrijvers says:

    Hi Jeremy,

    The last equivalence, hom (+) f e = ($ e) . hom (.) ((+) . f) id, can be explained by naturality in the monoid parameter. It’s more obvious if you write it as:

    hom :: Monoid m => (a -> m) -> ([a] -> m)

    Because the monoid parameter occurs both co- and contra-variantly, you need a monoid isomorphism, which you have between M and N.

    Cheers,

    Tom

    • Tom Schrijvers says:

      Only the first half of the above explanation is right. Let me explain it properly in more detail.

      As you have already observed, your hom is equivalent to
      foldMap :: Monoid m => (a -> m) -> ([a] -> m).

      Let’s specialize this for a given element type A and call it fM.
      We can then see this as a natural transformation fM : F -> G
      where F and G are functors from the category Mon to the category Set.

      F : Mon -> Set
      F(M,(+),e) = A -> M
      F(f) = \g -> f . g

      G : Mon -> Set
      G(M,(+),e) = [A] -> M
      G(f) = \g -> f . g

      As you have already pointed out, for any given monoid (M,(+),e) the Yoneda embedding yields a monoid (N,(.),id). The functions ($e) :: N -> M and (+) :: M -> N are monoid homomorphisms between these two monoids such that ($e) . (+) = id_M (*).

      Then we can calculate:
      fM_M
      = (category)
      id_GM . fM_M
      = (functor)
      G(id_M) . fM_M
      = (*)
      G(($e) . (+)) . fM_M
      = (functor)
      G(($e)) . G((+)) . fM_M
      = (naturality)
      G(($e)) . fM_N . F((+))

      If we expand the above we get the equation

      foldMap f = ($e) . foldMap ((+) . f)

      or even more explicitly

      hom (+) f e = ($ e) . hom (.) ((+) . f) id

      Key here was the appeal to naturality. I believe that the Trick can be proven also by two appeals to naturality, one for each type parameter of foldr, one of which is the element type and the other the algebra.

      Cheers,

      Tom

  2. Tom Schrijvers says:

    Hi Jeremy,

    Following our personal conversation, here is another note about your interesting post.

    I believe that your universal property of left folds is not as universal as it could be. For one, as you point out, it does not allow you to show the fusion property.

    You have pointed out to me that there is a non-standard definition of foldl by Richard Bird that is dual to the standard foldr definition:

    foldl (+) e [] = e
    foldl (+) e (xs ++ [x]) = foldl (+) e xs + x

    This suggests directly a universal property formulation that is dual to that of foldr:

    h = foldl (+) e  <=>  h [] = e /\ (forall x xs. h (xs ++ [x]) = h xs + x)

    This formulation does enable the fusion property:

    f . foldl (+) e = foldl (*) (f e) <= forall a b. f (a + b) = f a * b

    Indeed, we can show that the two conditions of the universal property are met:

    f (foldl (+) e [])
    = def foldl
    f e

    f (foldl (+) e (xs ++ [x]))
    = def foldl
    f (foldl (+) e xs + x)
    = precondition of fusion
    f (foldl (+) e xs) * x

    Moreover, I believe that my formulation is more general than yours because it does not require the function h to be written in a form that is parametric in an accumulator value. If the function is already written in that style, then I believe that the two properties are equivalent. Yet, for instance, for the fusion property we do not have a formulation in that style, which is why your formulation does not fit.

    Also, I believe that neither universal property covers infinite lists — foldl seems to make sense only for finite lists.

    Cheers,

    Tom

