## Distributivity in Horner’s Rule

This is a continuation of my previous post on Horner’s Rule, and in particular, of the discussion there about distributivity in the datatype-generic version of the Maximum Segment Sum problem:

the essential property behind Horner’s Rule is one of distributivity. In the datatype-generic case, we will model this as follows. We are given an ${(\mathsf{F}\,\alpha)}$-algebra ${(\beta,f)}$ [for a binary shape functor ${\mathsf{F}}$], and a ${\mathsf{M}}$-algebra ${(\beta,k)}$ [for a collection monad ${\mathsf{M}}$]; you might think of these as “datatype-generic product” and “collection sum”, respectively. Then there are two different methods of computing a ${\beta}$ result from an ${\mathsf{F}\,\alpha\,(\mathsf{M}\,\beta)}$ structure: we can either distribute the ${\mathsf{F}\,\alpha}$ structure over the collection(s) of ${\beta}$s, compute the “product” ${f}$ of each structure, and then compute the “sum” ${k}$ of the resulting products; or we can “sum” each collection, then compute the “product” of the resulting structure. Distributivity of “product” over “sum” is the property that these two different methods agree, as illustrated in the following diagram.

For example, with ${f :: \mathsf{F}\,{\mathbb Z}\,{\mathbb Z} \rightarrow {\mathbb Z}}$ adding all the integers in an ${\mathsf{F}}$-structure, and ${k :: \mathsf{M}\,{\mathbb Z} \rightarrow {\mathbb Z}}$ finding the maximum of a (non-empty) collection, the diagram commutes.

There’s a bit of hand-waving above to justify the claim that this is really a kind of distributivity. What does it have to do with the common-or-garden equation

$\displaystyle a \otimes (b \oplus c) = (a \otimes b) \oplus (a \otimes c)$

stating distributivity of one binary operator over another? That question is the subject of this post.

## Distributing over effects

Recall that ${\delta_2 :: (\mathsf{F}\,\alpha)\mathsf{M} \mathbin{\stackrel{.}{\to}} \mathsf{M}(\mathsf{F}\,\alpha)}$ distributes the shape functor ${\mathsf{F}}$ over the monad ${\mathsf{M}}$ in its second argument; this is the form of distribution over effects that crops up in the datatype-generic Maximum Segment Sum problem. More generally, this works for any idiom ${\mathsf{M}}$; this will be important below.

Generalizing in another direction, one might think of distributing over an idiom in both arguments of the bifunctor, via an operator ${\delta : \mathsf{F} \cdot (\mathsf{M} \times \mathsf{M}) \mathbin{\stackrel{.}{\to}} \mathsf{M} \cdot \mathsf{F}}$, which is to say, ${\delta_\beta :: \mathsf{F}\,(\mathsf{M}\beta)\,(\mathsf{M}\beta) \rightarrow \mathsf{M}(\mathsf{F}\beta)}$, natural in the ${\beta}$. This is the ${\mathit{bidist}}$ method of the ${\mathit{Bitraversable}}$ subclass of ${\mathit{Bifunctor}}$ that Bruno Oliveira and I used in our Essence of the Iterator Pattern paper; informally, it requires just that ${\mathsf{F}}$ has a finite ordered sequence of “element positions”. Given ${\delta}$, one can define ${\delta_2 = \delta \cdot \mathsf{F}\,\mathit{pure}\,\mathit{id}}$.

That traversability (or equivalently, distributivity over effects) for a bifunctor ${\mathsf{F}}$ is definable for any idiom, not just any monad, means that one can also conveniently define an operator ${\mathit{contents}_{\mathsf{H}} : \mathsf{H} \mathbin{\stackrel{.}{\to}} \mathsf{List}}$ for any traversable unary functor ${\mathsf{H}}$. This is because the constant functor ${\mathsf{K}_{[\beta]}}$ (which takes any ${\alpha}$ to ${[\beta]}$) is an idiom: the ${\mathit{pure}}$ method returns the empty list, and idiomatic application appends two lists. Then one can define

$\displaystyle \mathit{contents}_{\mathsf{H}} = \delta \cdot \mathsf{H}\,\mathit{wrap}$

where ${\mathit{wrap}}$ makes a singleton list. For a traversable bifunctor ${\mathsf{F}}$, we define ${\mathit{contents}_{\mathsf{F}} = \mathit{contents}_{\mathsf{F}\cdot\triangle}}$ where ${\triangle}$ is the diagonal functor; that is, ${\mathit{contents}_{\mathsf{F}} :: \mathsf{F}\,\beta\,\beta \rightarrow [\beta]}$, natural in the ${\beta}$. (No constant functor is a monad, except in trivial categories, so this convenient definition of contents doesn’t work monadically. Of course, one can use a writer monad, but this isn’t quite so convenient, because an additional step is needed to extract the output.)

One important axiom of ${\delta}$ that I made recent use of in a paper with Richard Bird on Effective Reasoning about Effectful Traversals is that it should be “natural in the contents”: it should leave shape unchanged, and depend on contents only up to the extent of their ordering. Say that a natural transformation ${\phi : \mathsf{F} \mathbin{\stackrel{.}{\to}} \mathsf{G}}$ between traversable functors ${\mathsf{F}}$ and ${\mathsf{G}}$ “preserves contents” if ${\mathit{contents}_{\mathsf{G}} \cdot \phi = \mathit{contents}_{\mathsf{F}}}$. Then, in the case of unary functors, the formalization of “naturality in the contents” requires ${\delta}$ to respect content-preserving ${\phi}$:

$\displaystyle \delta_{\mathsf{G}} \cdot \phi = \mathsf{M}\phi \cdot \delta_{\mathsf{F}} : \mathsf{T}\mathsf{M} \mathbin{\stackrel{.}{\to}} \mathsf{M}\mathsf{G}$

In particular, ${\mathit{contents}_{\mathsf{F}} : \mathsf{F} \mathbin{\stackrel{.}{\to}} \mathsf{List}}$ itself preserves contents, and so we expect

$\displaystyle \delta_{\mathsf{List}} \cdot \mathit{contents}_{\mathsf{F}} = \mathsf{M}(\mathit{contents}_{\mathsf{F}}) \cdot \delta_{\mathsf{F}}$

to hold.

## Folding a structure

Happily, the same generic operation ${\mathit{contents}_{\mathsf{F}}}$ provides a datatype-generic means to “fold” over the elements of an ${\mathsf{F}}$-structure. Given a binary operator ${\otimes :: \beta\times\beta \rightarrow \beta}$ and an initial value ${b :: \beta}$, we can define an ${(\mathsf{F}\,\beta)}$-algebra ${(\beta,f)}$—that is, a function ${f :: \mathsf{F}\,\beta\,\beta\rightarrow\beta}$—by

$\displaystyle f = \mathit{foldr}\,(\otimes)\,b \cdot \mathit{contents}_{\mathsf{F}}$

(This is a slight specialization of the presentation of the datatype-generic MSS problem from last time; there we had ${f :: \mathsf{F}\,\alpha\,\beta \rightarrow \beta}$. The specialization arises because we are hoping to define such an ${f}$ given a homogeneous binary operator ${\otimes}$. On the other hand, the introduction of the initial value ${b}$ is no specialization, as we needed such a value for the “product” of an empty “segment” anyway.)

Incidentally, I believe that this “generic folding” construction is exactly what is intended in Ross Paterson’s Data.Foldable library.

## Summing a collection

The other ingredient we need is an ${\mathsf{M}}$-algebra ${(\beta,k)}$. We already decided last time to

stick to reductions${k}$s of the form ${\oplus/}$ for associative binary operator ${{\oplus} :: \beta \times \beta \rightarrow \beta}$; then we also have distribution over choice: ${\oplus / (x \mathbin{\underline{\smash{\mathit{mplus}}}} y) = (\oplus/x) \oplus (\oplus/y)}$. Note also that we prohibited empty collections in ${\mathsf{M}}$, so we do not need a unit for ${\oplus}$.

On account of ${\oplus/}$ being an algebra for the collection monad ${\mathsf{M}}$, we also get a singleton rule ${\oplus/ \cdot \mathit{return} = \mathit{id}}$.

## Reduction to distributivity for lists

One of the take-home messages in the Effective Reasoning about Effectful Traversals paper is that it helps to reduce a traversal problem for datatypes in general to a more specific one about lists, exploiting the “naturality in contents” property of traversability. We’ll use that tactic for the distributivity property in the datatype-generic version Horner’s Rule.

In this diagram, the perimeter is the commuting diagram given at the start of this post—the diagram we have to justify. Face (1) is the definition of ${\delta_2}$ in terms of ${\delta}$. Faces (2) and (3) are the expansion of ${f}$ as generic folding of an ${\mathsf{F}}$-structure. Face (4) follows from ${\oplus/}$ being an ${\mathsf{M}}$-algebra, and hence being a left-inverse of ${\mathit{return}}$. Face (5) is an instance of the naturality property of ${\mathit{contents}_{\mathsf{F}} : \mathsf{F}\triangle \mathbin{\stackrel{.}{\to}} \mathsf{List}}$. Face (6) is the property that ${\delta}$ respects the contents-preserving transformation ${\mathit{contents}_{\mathsf{F}}}$. Therefore, the whole diagram commutes if Face (7) does—so let’s look at Face (7)!

## Distributivity for lists

Here’s Face (7) again:

Demonstrating that this diagram commutes is not too difficult, because both sides turn out to be list folds.

Around the left and bottom edges, we have a fold ${\mathit{foldr}\,(\otimes)\,b}$ after a map ${\mathsf{List}\,(\oplus)}$, which automatically fuses to ${\mathit{foldr}\,(\odot)\,b}$, where ${\odot}$ is defined by

$\displaystyle x \odot a = (\oplus/x) \otimes a$

or, pointlessly,

$\displaystyle (\odot) = (\otimes) \cdot (\oplus/) \times \mathit{id}$

Around the top and right edges we have the composition ${\oplus/ \cdot \mathsf{M}(\mathit{foldr}\,(\otimes)\,b) \cdot \delta_{\mathsf{List}}}$. If we can write ${\delta_{\mathsf{List}}}$ as an instance of ${\mathit{foldr}}$, we can then use the fusion law for ${\mathit{foldr}}$

$\displaystyle h \cdot \mathit{foldr}\,f\,e = \mathit{foldr}\,f'\,e' \;\Leftarrow\; h\,e=e' \land h \cdot f = f' \cdot \mathit{id}\times h$

to prove that this composition equals ${\mathit{foldr}\,(\odot)\,b}$.

In fact, there are various equivalent ways of writing ${\delta_{\mathsf{List}}}$ as an instance of ${\mathit{foldr}}$. The definition given by Conor McBride and Ross Paterson in their original paper on idioms looked like the identity function, but with added idiomness:

$\displaystyle \begin{array}{lcl} \delta_{\mathsf{List}}\,[\,] &=& \mathit{pure}\,[\,] \\ \delta_{\mathsf{List}}\,(\mathit{mb} : \mathit{mbs}) &=& \mathit{pure}\,(:) \circledast \mathit{mb} \circledast \delta_{\mathsf{List}}\,\mathit{mbs} \end{array}$

In the special case that the idiom is a monad, it can be written in terms of ${\mathit{liftM}_0}$ (aka ${\mathit{return}}$) and ${\mathit{liftM}_2}$:

$\displaystyle \begin{array}{lcl} \delta_{\mathsf{List}}\,[\,] &=& \mathit{liftM}_0\,[\,] \\ \delta_{\mathsf{List}}\,(\mathit{mb} : \mathit{mbs}) &=& \mathit{liftM}_2\,(:)\,\mathit{mb}\,(\delta_{\mathsf{List}}\,\mathit{mbs}) \end{array}$

But we’ll use a third definition:

$\displaystyle \begin{array}{lcl} \delta_{\mathsf{List}}\,[\,] &=& \mathit{return}\,[\,] \\ \delta_{\mathsf{List}}\,(\mathit{mb} : \mathit{mbs}) &=& \mathsf{M}(:)\,(\mathit{cp}\,(\mathit{mb}, \delta_{\mathsf{List}}\,\mathit{mbs})) \end{array}$

where

$\displaystyle \begin{array}{lcl} \mathit{cp} &::& \mathsf{M}\,\alpha \times \mathsf{M}\,\beta \rightarrow \mathsf{M}(\alpha\times\beta) \\ \mathit{cp}\,(x,y) &=& \mathbf{do}\,\{\,a \leftarrow x \mathbin{;} b \leftarrow y \mathbin{;} \mathit{return}\,(a,b) \} \end{array}$

That is,

$\displaystyle \delta_{\mathsf{List}} = \mathit{foldr}\,(\mathsf{M}(:)\cdot\mathit{cp})\,(\mathit{return}\,[\,])$

Now, for the base case we have

$\displaystyle \oplus/\,(\mathsf{M}(\mathit{foldr}\,(\otimes)\,b)\,(\mathit{return}\,[\,])) = \oplus/\,(\mathit{return}\,(\mathit{foldr}\,(\otimes)\,b\,[\,])) = \oplus/\,(\mathit{return}\,b) = b$

as required. For the inductive step, we have:

$\displaystyle \begin{array}{ll} & \oplus/ \cdot \mathsf{M}(\mathit{foldr}\,(\otimes)\,b) \cdot \mathsf{M}(:) \cdot \mathit{cp} \\ = & \qquad \{ \mbox{functors} \} \\ & \oplus/ \cdot \mathsf{M}(\mathit{foldr}\,(\otimes)\,b \cdot (:)) \cdot \mathit{cp} \\ = & \qquad \{ \mbox{evaluation for $$\mathit{foldr}$$} \} \\ & \oplus/ \cdot \mathsf{M}((\otimes) \cdot \mathit{id}\times\mathit{foldr}\,(\otimes)\,b) \cdot \mathit{cp} \\ = & \qquad \{ \mbox{functors; naturality of $$\mathit{cp}$$} \} \\ & \oplus/ \cdot \mathsf{M}(\otimes) \cdot \mathit{cp} \cdot \mathsf{M}\mathit{id}\times\mathsf{M}(\mathit{foldr}\,(\otimes)\,b) \\ = & \qquad \{ \mbox{distributivity for $$\mathit{cp}$$: see below} \} \\ & (\otimes) \cdot (\oplus/)\times(\oplus/) \cdot \mathsf{M}\mathit{id}\times\mathsf{M}(\mathit{foldr}\,(\otimes)\,b) \\ = & \qquad \{ \mbox{functors} \} \\ & (\otimes) \cdot (\oplus/)\times\mathit{id} \cdot \mathit{id}\times\mathsf{M}(\oplus/\cdot\mathit{foldr}\,(\otimes)\,b) \end{array}$

which completes the fusion proof, modulo the wish about distributivity for ${\mathit{cp}}$:

$\displaystyle \oplus/ \cdot \mathsf{M}(\otimes) \cdot \mathit{cp} = (\otimes) \cdot (\oplus/)\times(\oplus/)$

## Distributivity for cartesian product

As for that wish about distributivity for ${\mathit{cp}}$:

$\displaystyle \begin{array}{ll} & \oplus/ \mathbin{\hbox{\footnotesize\}} \mathsf{M}(\otimes) \mathbin{\hbox{\footnotesize\}} \mathit{cp}\,(x,y) \\ = & \qquad \{ \mbox{definition of $$\mathit{cp}$$} \} \\ & \oplus/ \mathbin{\hbox{\footnotesize\}} \mathsf{M}(\otimes) \mathbin{\hbox{\footnotesize\}} \mathbf{do}\,\{\,a \leftarrow x \mathbin{;} b \leftarrow y \mathbin{;} \mathit{return}\,(a,b) \,\} \\ = & \qquad \{ \mbox{map over $$\mathbf{do}$$} \} \\ & \oplus/ \mathbin{\hbox{\footnotesize\}} \mathbf{do}\,\{\,a \leftarrow x \mathbin{;} b \leftarrow y \mathbin{;} \mathit{return}\,(a \otimes b) \,\} \\ = & \qquad \{ \mbox{expanding $$\mathbf{do}$$} \} \\ & \oplus/ \mathbin{\hbox{\footnotesize\}} \mathit{join} \mathbin{\hbox{\footnotesize\}} \mathsf{M}\,(\lambda a \mathbin{.} \mathsf{M}\,(a\otimes)\,y)\,x \\ = & \qquad \{ \mbox{$$\oplus/$$ is an $$\mathsf{M}$$-algebra} \} \\ & \oplus/ \mathbin{\hbox{\footnotesize\}} \mathsf{M}(\oplus/) \mathbin{\hbox{\footnotesize\}} \mathsf{M}\,(\lambda a \mathbin{.} \mathsf{M}\,(a\otimes)\,y)\,x \\ = & \qquad \{ \mbox{functors} \} \\ & \oplus/ \mathbin{\hbox{\footnotesize\}} \mathsf{M}\,(\lambda a \mathbin{.} \oplus/(\mathsf{M}\,(a\otimes)\,y))\,x \\ = & \qquad \{ \mbox{distributivity for collections: see below} \} \\ & \oplus/ \mathbin{\hbox{\footnotesize\}} \mathsf{M}\,(\lambda a \mathbin{.} a \otimes (\oplus/\,y))\,x \\ = & \qquad \{ \mbox{sectioning} \} \\ & \oplus/ \mathbin{\hbox{\footnotesize\}} \mathsf{M}\,(\otimes (\oplus/\,y))\,x \\ = & \qquad \{ \mbox{distributivity for collections again} \} \\ & (\otimes (\oplus/\,y))\,(\oplus/\,x) \\ = & \qquad \{ \mbox{sectioning} \} \\ & (\oplus/\,x) \otimes (\oplus/\,y) \\ = & \qquad \{ \mbox{eta-expansion} \} \\ & (\otimes) \mathbin{\hbox{\footnotesize\}} (\oplus/ \times \oplus/) \mathbin{\hbox{\footnotesize\}} (x,y) \\ \end{array}$

which discharges the proof obligation about distributivity for cartesian product, but again modulo two symmetric wishes about distributivity for collections:

$\displaystyle \begin{array}{lcl} \oplus/ \cdot \mathsf{M}(a\otimes) &=& (a\otimes) \cdot \oplus/ \\ \oplus/ \cdot \mathsf{M}(\otimes b) &=& (\otimes b) \cdot \oplus/ \\ \end{array}$

## Distributivity for collections

Finally, the proof obligations about distributivity for collections are easily discharged, by induction over the size of the (finite!) collection, provided that the binary operator ${\otimes}$ distributes over ${\oplus}$ in the familiar sense. The base case is for a singleton collection, ie in the image of ${\mathit{return}}$ (because we disallowed empty collections); this case follows from the fact that ${\oplus/}$ is an ${\mathsf{M}}$-algebra. The inductive step is for a collection of the form ${u \mathbin{\underline{\smash{\mathit{mplus}}}} v}$ with ${u,v}$ both strictly smaller than the whole (so, if the monad is idempotent, disjoint, or at least not nested); this requires the distribution of the algebra over choice ${\oplus / (u \mathbin{\underline{\smash{\mathit{mplus}}}} v) = (\oplus/u) \oplus (\oplus/v)}$, together with the familiar distribution of ${\otimes}$ over ${\oplus}$.

## Summary

So, the datatype-generic distributivity for ${\mathsf{F}}$-structures of collections that we used for the Maximum Segment Sum problem reduced to distributivity for lists of collections, which reduced to the cartesian product of collections, which reduced to that for pairs. That’s a much deeper hierarchy than I was expecting; can it be streamlined?