This is a continuation of my previous post on Horner’s Rule, and in particular, of the discussion there about distributivity in the datatype-generic version of the Maximum Segment Sum problem:
the essential property behind Horner’s Rule is one of distributivity. In the datatype-generic case, we will model this as follows. We are given an -algebra [for a binary shape functor ], and a -algebra [for a collection monad ]; you might think of these as “datatype-generic product” and “collection sum”, respectively. Then there are two different methods of computing a result from an structure: we can either distribute the structure over the collection(s) of s, compute the “product” of each structure, and then compute the “sum” of the resulting products; or we can “sum” each collection, then compute the “product” of the resulting structure. Distributivity of “product” over “sum” is the property that these two different methods agree, as illustrated in the following diagram.
For example, with adding all the integers in an -structure, and finding the maximum of a (non-empty) collection, the diagram commutes.
There’s a bit of hand-waving above to justify the claim that this is really a kind of distributivity. What does it have to do with the common-or-garden equation
stating distributivity of one binary operator over another? That question is the subject of this post.
Distributing over effects
Recall that distributes the shape functor over the monad in its second argument; this is the form of distribution over effects that crops up in the datatype-generic Maximum Segment Sum problem. More generally, this works for any idiom ; this will be important below.
Generalizing in another direction, one might think of distributing over an idiom in both arguments of the bifunctor, via an operator , which is to say, , natural in the . This is the method of the subclass of that Bruno Oliveira and I used in our Essence of the Iterator Pattern paper; informally, it requires just that has a finite ordered sequence of “element positions”. Given , one can define .
That traversability (or equivalently, distributivity over effects) for a bifunctor is definable for any idiom, not just any monad, means that one can also conveniently define an operator for any traversable unary functor . This is because the constant functor (which takes any to ) is an idiom: the method returns the empty list, and idiomatic application appends two lists. Then one can define
where makes a singleton list. For a traversable bifunctor , we define where is the diagonal functor; that is, , natural in the . (No constant functor is a monad, except in trivial categories, so this convenient definition of contents doesn’t work monadically. Of course, one can use a writer monad, but this isn’t quite so convenient, because an additional step is needed to extract the output.)
One important axiom of that I made recent use of in a paper with Richard Bird on Effective Reasoning about Effectful Traversals is that it should be “natural in the contents”: it should leave shape unchanged, and depend on contents only up to the extent of their ordering. Say that a natural transformation between traversable functors and “preserves contents” if . Then, in the case of unary functors, the formalization of “naturality in the contents” requires to respect content-preserving :
In particular, itself preserves contents, and so we expect
Folding a structure
Happily, the same generic operation provides a datatype-generic means to “fold” over the elements of an -structure. Given a binary operator and an initial value , we can define an -algebra —that is, a function —by
(This is a slight specialization of the presentation of the datatype-generic MSS problem from last time; there we had . The specialization arises because we are hoping to define such an given a homogeneous binary operator . On the other hand, the introduction of the initial value is no specialization, as we needed such a value for the “product” of an empty “segment” anyway.)
Incidentally, I believe that this “generic folding” construction is exactly what is intended in Ross Paterson’s Data.Foldable library.
Summing a collection
The other ingredient we need is an -algebra . We already decided last time to
stick to reductions—s of the form for associative binary operator ; then we also have distribution over choice: . Note also that we prohibited empty collections in , so we do not need a unit for .
On account of being an algebra for the collection monad , we also get a singleton rule .
Reduction to distributivity for lists
One of the take-home messages in the Effective Reasoning about Effectful Traversals paper is that it helps to reduce a traversal problem for datatypes in general to a more specific one about lists, exploiting the “naturality in contents” property of traversability. We’ll use that tactic for the distributivity property in the datatype-generic version Horner’s Rule.
In this diagram, the perimeter is the commuting diagram given at the start of this post—the diagram we have to justify. Face (1) is the definition of in terms of . Faces (2) and (3) are the expansion of as generic folding of an -structure. Face (4) follows from being an -algebra, and hence being a left-inverse of . Face (5) is an instance of the naturality property of . Face (6) is the property that respects the contents-preserving transformation . Therefore, the whole diagram commutes if Face (7) does—so let’s look at Face (7)!
Distributivity for lists
Here’s Face (7) again:
Demonstrating that this diagram commutes is not too difficult, because both sides turn out to be list folds.
Around the left and bottom edges, we have a fold after a map , which automatically fuses to , where is defined by
Around the top and right edges we have the composition . If we can write as an instance of , we can then use the fusion law for
to prove that this composition equals .
In fact, there are various equivalent ways of writing as an instance of . The definition given by Conor McBride and Ross Paterson in their original paper on idioms looked like the identity function, but with added idiomness:
In the special case that the idiom is a monad, it can be written in terms of (aka ) and :
But we’ll use a third definition:
Now, for the base case we have
as required. For the inductive step, we have:
which completes the fusion proof, modulo the wish about distributivity for :
Distributivity for cartesian product
As for that wish about distributivity for :
which discharges the proof obligation about distributivity for cartesian product, but again modulo two symmetric wishes about distributivity for collections:
Distributivity for collections
Finally, the proof obligations about distributivity for collections are easily discharged, by induction over the size of the (finite!) collection, provided that the binary operator distributes over in the familiar sense. The base case is for a singleton collection, ie in the image of (because we disallowed empty collections); this case follows from the fact that is an -algebra. The inductive step is for a collection of the form with both strictly smaller than the whole (so, if the monad is idempotent, disjoint, or at least not nested); this requires the distribution of the algebra over choice , together with the familiar distribution of over .
So, the datatype-generic distributivity for -structures of collections that we used for the Maximum Segment Sum problem reduced to distributivity for lists of collections, which reduced to the cartesian product of collections, which reduced to that for pairs. That’s a much deeper hierarchy than I was expecting; can it be streamlined?