Comprehensions

Prompted by some recent work I’ve been doing on reasoning about monadic computations, I’ve been looking back at the work from the 1990s by Phil Trinder, Limsoon Wong, Leonidas Fegaras, Torsten Grust, and others, on monad comprehensions as a framework for database queries.

The idea goes back to the adjunction between extension and intension in set theory—you can define a set by its extension, that is by listing its elements:

\displaystyle  \{ 1, 9, 25, 49, 81 \}

or by its intension, that is by characterizing those elements:

\displaystyle  \{ n^2 \mid 0 < n < 10 \land n \equiv 1 \pmod{2} \}

Expressions in the latter form are called set comprehensions. They inspired a programming notation in the SETL language from NYU, and have become widely known through list comprehensions in languages like Haskell. The structure needed of sets or of lists to make this work is roughly that of a monad, and Phil Wadler showed how to generalize comprehensions to arbitrary monads, which led to the “do” notation in Haskell. Around the same time, Phil Trinder showed that comprehensions make a convenient database query language. The comprehension notation has been extended to cover other important aspects of database queries, particularly aggregation and grouping. Monads and aggregations have very nice algebraic structure, which leads to a useful body of laws to support database query optimization.

List comprehensions

Just as a warm-up, here is a reminder about Haskell’s list comprehensions.

\displaystyle  [ 2 \times a + b \mid a \leftarrow [1,2,3] , b \leftarrow [4,5,6] , b \mathbin{\underline{\smash{\mathit{mod}}}} a == 0 ]

This (rather concocted) example yields the list of all values of the expression {2 \times a + b} as {a} is drawn from {[1,2,3]} and {b} from {[4,5,6]} and such that {b} is divisible by {a}, namely {[6,7,8,8,10,12]}.

To the left of the vertical bar is the term (an expression). To the right is a comma-separated sequence of qualifiers, each of which is either a generator (of the form {a \leftarrow x}, with a variable {a} and a list expression {x}) or a filter (a boolean expression). The scope of a variable introduced by a generator extends to all subsequent generators and to the term. Note that, in contrast to the mathematical inspiration, bound variables need to be generated from some existing list.

The semantics of list comprehensions is defined by translation; see for example Phil Wadler’s Chapter 7 of The Implementation of Functional Programming Languages. It can be expressed equationally as follows:

\displaystyle  \begin{array}{lcl} [ e \mid \epsilon ] &=& [e] \\ {} [ e \mid b ] &=& \mathbf{if}\;b\;\mathbf{then}\;[ e ]\;\mathbf{else}\;[\,] \\ {} [ e \mid a \leftarrow x ] &=& \mathit{map}\,(\lambda a \mathbin{.} e)\,x \\ {} [ e \mid q, q' ] &=& \mathit{concat}\,[ [ e \mid q' ] \mid q ] \end{array}

(Here, {\epsilon} denotes the empty sequence of qualifiers. It’s not allowed in Haskell, but it is helpful in simplifying the translation.)

Applying this translation to the example at the start of the section gives

\displaystyle  \begin{array}{ll} & [ 2 \times a + b \mid a \leftarrow [1,2,3] , b \leftarrow [4,5,6] , b \mathbin{\underline{\smash{\mathit{mod}}}} a == 0 ] \\ = & \mathit{concat}\,(\mathit{map}\,(\lambda a \mathbin{.} \mathit{concat}\,(\mathit{map}\,(\lambda b \mathbin{.} \mathbf{if}\;b \mathbin{\underline{\smash{\mathit{mod}}}} a == 0\;\mathbf{then}\;[2 \times a + b]\;\mathbf{else}\;[\,])\,[4,5,6]))\,[1,2,3]) \\ = & [6,7,8,8,10,12] \end{array}
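The same desugaring can be checked directly in GHC (a quick sketch; the binding names are mine):

    example, example' :: [Int]
    example  = [ 2 * a + b | a <- [1,2,3], b <- [4,5,6], b `mod` a == 0 ]
    example' = concat (map (\ a ->
                 concat (map (\ b ->
                          if b `mod` a == 0 then [2 * a + b] else [])
                        [4,5,6]))
               [1,2,3])
    -- both yield [6,7,8,8,10,12]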

More generally, a generator may match against a pattern rather than just a variable. In that case, it may bind multiple (or indeed no) variables at once; moreover, the match may fail, in which case it is discarded. This is handled by modifying the translation for generators to use a function defined by pattern-matching, rather than a straight lambda-abstraction:

\displaystyle  [ e \mid p \leftarrow x ] = \mathit{concat}\,(\mathit{map}\,(\lambda a \mathbin{.} \mathbf{case}\;a\;\mathbf{of}\;p \rightarrow [ e ] \;;\; \_ \rightarrow [\,])\,x)

or, more perspicuously,

\displaystyle  [ e \mid p \leftarrow x ] = \mathbf{let}\;h\,p = [ e ] ; h\,\_ = [\,]\;\mathbf{in}\; \mathit{concat}\,(\mathit{map}\,h\,x)
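For instance (a small sketch of my own devising), a generator whose pattern matches only pairs with second component zero:

    -- Keep the first components of those pairs whose second component
    -- is zero; failing matches are discarded.
    zeros :: [(Int, Int)] -> [Int]
    zeros ps = [ a | (a, 0) <- ps ]

    -- Its translation:
    zeros' :: [(Int, Int)] -> [Int]
    zeros' ps = let h (a, 0) = [a]
                    h _      = []
                in concat (map h ps)

    -- zeros [(1,0),(2,3),(4,0)] == [1,4] == zeros' [(1,0),(2,3),(4,0)]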

Monad comprehensions

It is clear from the above translation that the necessary ingredients for list comprehensions are {\mathit{map}}, singletons, {\mathit{concat}}, and the empty list. The first three are the operations arising from lists as a functor and a monad, which suggests that the same translation might be applicable to other monads too. But the fourth ingredient, the empty list, does not come from the functor and monad structures; that requires an extra assumption:

\displaystyle  \begin{array}{ll} \mathbf{class}\;\mathit{Monad}\,m \Rightarrow \mathit{MonadZero}\,m\;\mathbf{where} \\ \quad \mathit{mzero} :: m\,a \end{array}

Then the translation for list comprehensions can be generalized to other monads:

\displaystyle  \begin{array}{lcl} [ e \mid \epsilon ] &=& \mathit{return}\,e \\ {} [ e \mid b ] &=& \mathbf{if}\;b\;\mathbf{then}\;\mathit{return}\,e\;\mathbf{else}\;\mathit{mzero} \\ {} [ e \mid p \leftarrow m ] &=& \mathbf{let}\;h\,p = \mathit{return}\,e ; h\,\_ = \mathit{mzero}\;\mathbf{in}\; \mathit{join}\,(\mathit{map}\,h\,m) \\ {} [ e \mid q, q' ] &=& \mathit{join}\,[ [ e \mid q' ] \mid q ] \end{array}

(so {[ e \mid \epsilon ] = [ e \mid \mathit{True} ]}). The actual monad to be used is implicit; if we want to be explicit, we could use a subscript, as in “{[ e \mid q ]_\mathsf{List}}“.

This translation is different from the one used in the Haskell language specification, which to my mind is a little awkward: the empty list crops up in two different ways in the translation of list comprehensions—for filters, and for generators with patterns—and these are generalized in two different ways to other monads (to the {\mathit{mzero}} method of the {\mathit{MonadPlus}} class in the first case, and the {\mathit{fail}} method of the {\mathit{Monad}} class in the second). I think it is neater to have a monad subclass {\mathit{MonadZero}} with a single method subsuming both these operators. Of course, this does mean that the translation forces a monad comprehension with filters or possibly failing generators to be interpreted in a monad in the {\mathit{MonadZero}} subclass rather than just {\mathit{Monad}}—the type class constraints that are generated depend on the features used in the comprehension. (Perhaps this translation was tried in earlier versions of the language specification, and found wanting?)

Taking this approach gives basically the monad comprehension notation from Wadler’s Comprehending Monads paper; it loosely corresponds to Haskell’s do notation, except that the term is to the left of a vertical bar rather than at the end, and that filters are just boolean expressions rather than introduced using {\mathit{guard}}.
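For example, here is such a comprehension-style computation in the {\mathsf{Maybe}} monad, rendered in do notation (a quick sketch—the function name is mine; GHC's {\mathit{MonadComprehensions}} extension implements a notation along these lines):

    import Control.Monad (guard)

    -- The monad comprehension [ a + b | a <- m, b <- n, odd b ],
    -- rendered in do notation; the filter becomes a guard.
    pairSum :: Maybe Int -> Maybe Int -> Maybe Int
    pairSum m n = do { a <- m; b <- n; guard (odd b); return (a + b) }

    -- pairSum (Just 1) (Just 2) == Nothing
    -- pairSum (Just 1) (Just 3) == Just 4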

We might impose the law that {\mathit{mzero}} is a “left” zero of composition, in the sense

\displaystyle  \mathit{join}\,\mathit{mzero} = \mathit{mzero}

or, in terms of comprehensions,

\displaystyle  [ e \mid a \leftarrow \mathit{mzero} ] = \mathit{mzero}

Informally, this means that any failing steps of the computation cleanly cut off subsequent branches. Conversely, we do not require that {\mathit{mzero}} is a “right” zero too:

\displaystyle  \mathit{join}\,(\mathit{map}\,(\lambda a \mathbin{.} \mathit{mzero})\,m) \ne \mathit{mzero} \quad\mbox{(in general)}

This would have the consequence that a failing step also cleanly erases any effects from earlier parts of the computation, which is too strong a requirement for many monads—particularly those of the “launch missiles now” variety. (The names “left-” and “right zero” make more sense when the equations are expressed in terms of the usual Haskell bind operator {(\gg\!=)}, which is a kind of sequential composition.)

Ringads and collection classes

One more ingredient is needed in order to characterize monads that correspond to “collection classes” such as sets and lists, and that is an analogue of set union or list append. It’s not difficult to see that this is inexpressible in terms of the operations introduced so far: given only collections {m} of at most one element, any comprehension using generators of the form {a \leftarrow m} will only yield another such collection, whereas the union of two one-element collections will in general have two elements.

To allow any finite collection to be expressed, it suffices to introduce a binary union operator {\uplus}:

\displaystyle  \begin{array}{ll} \mathbf{class}\;\mathit{Monad}\,m \Rightarrow \mathit{MonadPlus}\,m\;\mathbf{where} \\ \quad (\uplus) :: m\,a \times m\,a \rightarrow m\,a \end{array}

We require composition to distribute over union, in the following sense:

\displaystyle  \mathit{join}\,(m \uplus n) = \mathit{join}\,m \uplus \mathit{join}\,n

or, in terms of comprehensions,

\displaystyle  [ e \mid a \leftarrow m \uplus n, q ] = [ e \mid a \leftarrow m, q ] \uplus [ e \mid a \leftarrow n, q ]

For the remainder of this post, we will assume a monad in both {\mathit{MonadZero}} and {\mathit{MonadPlus}}. Moreover, we will assume that {\mathit{mzero}} is the unit of {\uplus}, and is both a left- and a right zero of composition. To stress the additional constraints, we will write “{\emptyset}” for “{\mathit{mzero}}” from now on. The intention is that such monads exactly capture collection classes; Phil Wadler has called these structures ringads. (He seems to have done so in an unpublished note Notes on Monads and Ringads from 1990, which is cited by some papers from the early 1990s. But Phil no longer has a copy of this note, and it’s not online anywhere… I’d love to see a copy, if anyone has one!)

\displaystyle  \begin{array}{ll} \mathbf{class}\;(\mathit{MonadZero}\,m, \mathit{MonadPlus}\,m) \Rightarrow \mathit{Ringad}\,m\;\mathbf{where} \end{array}

(There are no additional methods; the class {\mathit{Ringad}} is the intersection of the two parent classes {\mathit{MonadZero}} and {\mathit{MonadPlus}}, with the union of the two interfaces, together with the laws above.) I used roughly the same construction already in the post on Horner’s Rule.
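In Haskell the hierarchy might be rendered as follows (a sketch; I use primed names to avoid clashing with the standard library's {\mathit{MonadPlus}}, which bundles both methods into one class):

    -- A sketch of the class hierarchy, with the list instance.
    class Monad m => MonadZero' m where
      mzero' :: m a

    class Monad m => MonadPlus' m where
      (<+>) :: m a -> m a -> m a

    class (MonadZero' m, MonadPlus' m) => Ringad m

    instance MonadZero' [] where mzero' = []
    instance MonadPlus' [] where (<+>)  = (++)
    instance Ringad []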

As well as (finite) sets and lists, ringad instances include (finite) bags and a funny kind of binary tree (externally labelled, possibly empty, in which the empty tree is a unit of the binary tree constructor). These are all members of the so-called Boom Hierarchy of types—a name coined by Richard Bird, for an idea due to Hendrik Boom, who by happy coincidence is named for one of these structures in his native language. All members of the Boom Hierarchy are generated from the empty, singleton, and union operators, the difference being whether union is associative, commutative, and idempotent. Another ringad instance, but not a member of the Boom Hierarchy, is the type of probability distributions—either normalized, with a weight-indexed family of union operators, or unnormalized, with an additional scaling operator.

Aggregation

The well-behaved operations over monadic values are called the algebras for that monad—functions {k} such that {k \cdot \mathit{return} = \mathit{id}} and {k \cdot \mathit{join} = k \cdot \mathit{map}\,k}. In particular, {\mathit{join}} is itself a monad algebra. When the monad is also a ringad, {k} necessarily distributes also over {\uplus}—there is a binary operator {\oplus} such that {k\,(m \uplus n) = k\,m \oplus k\,n} (exercise!). Without loss of generality, we write {\oplus/} for {k}; these are the “reductions” of the Bird–Meertens Formalism. In that case, {\mathit{join} = \uplus/} is a ringad algebra.

The algebras for a ringad amount to aggregation functions for a collection: the sum of a bag of integers, the maximum of a set of naturals, and so on. We could extend the comprehension notation to encompass aggregations too, for example by adding an optional annotation, writing say “{[ e \mid q ]^\oplus}“; although this doesn’t add much, because we could just have written “{\oplus/\,[e \mid q]}” instead. We could generalize from reductions {\oplus/} to collection homomorphisms {\oplus/ \cdot \mathit{map}\,f}; but this doesn’t add much either, because the map is easily combined with the comprehension—it’s easy to show the “map over comprehension” property

\displaystyle  \mathit{map}\,f\,[e \mid q] = [f\,e \mid q]

Leonidas Fegaras and David Maier develop a monoid comprehension calculus around such aggregations; but I think their name is inappropriate, because nothing forces the binary aggregating operator to be associative.

Note that, for {\oplus/} to be well-defined, {\oplus} must satisfy all the laws that {\uplus} does—{\oplus} must be associative if {\uplus} is associative, and so on. It is not hard to show, for instance, that there is no {\oplus} on sets of numbers for which {\mathit{sum}\,(x \cup y) = \mathit{sum}\,x \oplus \mathit{sum}\,y}; such an {\oplus} would have to be idempotent, which is inconsistent with its relationship with {\mathit{sum}}. (So, although {[a^2 \mid a \leftarrow x, \mathit{odd}\,a]_\mathsf{Bag}^{+}} denotes the sum of the squares of the odd elements of bag {x}, the expression {[a^2 \mid a \leftarrow x, \mathit{odd}\,a]_\mathsf{Set}^{+}} (with {x} now a set) is not defined, because {+} is not idempotent.) In particular, {\oplus/\emptyset} must be the unit of {\oplus}, which we write {1_\oplus}.
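With lists standing in for bags, the well-defined aggregation of that example looks like this (a sketch; the name is mine):

    -- Sum of squares of the odd elements, a list standing in for a bag:
    sumSqOdd :: [Int] -> Int
    sumSqOdd x = sum [ a * a | a <- x, odd a ]

    -- There is no corresponding set aggregation: it would need
    -- sum (xs `union` ys) == sum xs + sum ys even for overlapping xs
    -- and ys, forcing (+) to be idempotent, which it is not.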

We can derive translation rules for aggregations from the definition

\displaystyle  [ e \mid q ]^\oplus = \oplus/\,[e \mid q]

For empty aggregations, we have:

\displaystyle  \begin{array}{ll} & [ e \mid \epsilon ]^\oplus \\ = & \qquad \{ \mbox{aggregation} \} \\ & \oplus/\,[ e \mid \epsilon ] \\ = & \qquad \{ \mbox{comprehension} \} \\ & \oplus/\,(\mathit{return}\,e) \\ = & \qquad \{ \mbox{monad algebra} \} \\ & e \end{array}

For filters, we have:

\displaystyle  \begin{array}{ll} & [ e \mid b ]^\oplus \\ = & \qquad \{ \mbox{aggregation} \} \\ & \oplus/\,[ e \mid b ] \\ = & \qquad \{ \mbox{comprehension} \} \\ & \oplus/\,(\mathbf{if}\;b\;\mathbf{then}\;\mathit{return}\,e\;\mathbf{else}\;\emptyset) \\ = & \qquad \{ \mbox{lift out the conditional} \} \\ & \mathbf{if}\;b\;\mathbf{then}\;{\oplus/}\,(\mathit{return}\,e)\;\mathbf{else}\;{\oplus/}\,\emptyset \\ = & \qquad \{ \mbox{ringad algebra} \} \\ & \mathbf{if}\;b\;\mathbf{then}\;e\;\mathbf{else}\;1_\oplus \end{array}

For generators, we have:

\displaystyle  \begin{array}{ll} & [ e \mid p \leftarrow m ]^\oplus \\ = & \qquad \{ \mbox{aggregation} \} \\ & \oplus/\,[ e \mid p \leftarrow m ] \\ = & \qquad \{ \mbox{comprehension} \} \\ & \oplus/\,(\mathbf{let}\;h\,p = \mathit{return}\,e ; h\,\_ = \emptyset\;\mathbf{in}\;\mathit{join}\,(\mathit{map}\,h\,m)) \\ = & \qquad \{ \mbox{lift out the \textbf{let}} \} \\ & \mathbf{let}\;h\,p = \mathit{return}\,e ; h\,\_ = \emptyset\;\mathbf{in}\;{\oplus/}\,(\mathit{join}\,(\mathit{map}\,h\,m)) \\ = & \qquad \{ \mbox{monad algebra} \} \\ & \mathbf{let}\;h\,p = \mathit{return}\,e ; h\,\_ = \emptyset\;\mathbf{in}\;{\oplus/}\,(\mathit{map}\,(\oplus/)\,(\mathit{map}\,h\,m)) \\ = & \qquad \{ \mbox{functors} \} \\ & \mathbf{let}\;h\,p = \mathit{return}\,e ; h\,\_ = \emptyset\;\mathbf{in}\;{\oplus/}\,(\mathit{map}\,(\oplus/ \cdot h)\,m) \\ = & \qquad \{ \mbox{let~} h' = \oplus/ \cdot h \} \\ & \mathbf{let}\;h'\,p = \oplus/\,(\mathit{return}\,e) ; h'\,\_ = \oplus/\,\emptyset\;\mathbf{in}\;{\oplus/}\,(\mathit{map}\,h'\,m) \\ = & \qquad \{ \mbox{ringad algebra} \} \\ & \mathbf{let}\;h'\,p = e ; h'\,\_ = 1_\oplus\;\mathbf{in}\;{\oplus/}\,(\mathit{map}\,h'\,m) \end{array}

And for sequences of qualifiers, we have:

\displaystyle  \begin{array}{ll} & [ e \mid q, q' ]^\oplus \\ = & \qquad \{ \mbox{aggregation} \} \\ & \oplus/\,[ e \mid q, q' ] \\ = & \qquad \{ \mbox{comprehension} \} \\ & \oplus/\,(\mathit{join}\,[ [ e \mid q'] \mid q ]) \\ = & \qquad \{ \mbox{monad algebra} \} \\ & \oplus/\,(\mathit{map}\,(\oplus/)\,[ [ e \mid q'] \mid q ]) \\ = & \qquad \{ \mbox{map over comprehension} \} \\ & \oplus/\,[ \oplus/\,[ e \mid q'] \mid q ] \\ = & \qquad \{ \mbox{aggregation} \} \\ & [ [ e \mid q']^\oplus \mid q ]^\oplus \end{array}

Putting all this together, we have:

\displaystyle  \begin{array}{lcl} [ e \mid \epsilon ]^\oplus &=& e \\ {} [ e \mid b ]^\oplus &=&\mathbf{if}\;b\;\mathbf{then}\;e\;\mathbf{else}\;1_\oplus \\ {} [ e \mid p \leftarrow m ]^\oplus &=& \mathbf{let}\;h\,p = e ; h\,\_ = 1_\oplus\;\mathbf{in}\;{\oplus/}\,(\mathit{map}\,h\,m) \\ {} [ e \mid q, q' ]^\oplus &=& [ [ e \mid q']^\oplus \mid q ]^\oplus \end{array}
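Instantiated for lists, the translated form of the running example is (a sketch; the name is mine):

    -- [ a*a | a <- x, odd a ]^(+) for lists, following the rules above:
    sumSqOdd' :: [Int] -> Int
    sumSqOdd' x = let h a = if odd a then a * a else 0   -- 0 is the unit of (+)
                  in foldr (+) 0 (map h x)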

Heterogeneous comprehensions

We have seen that comprehensions can be interpreted in an arbitrary ringad; for example, {[a^2 \mid a \leftarrow x, \mathit{odd}\,a]_\mathsf{Set}} denotes (the set of) the squares of the odd elements of (the set) {x}, whereas {[a^2 \mid a \leftarrow x, \mathit{odd}\,a]_\mathsf{Bag}} denotes the bag of such elements, with {x} a bag. Can we make sense of “heterogeneous comprehensions”, involving several different ringads?

Let’s introduce the notion of a ringad morphism, extending the familiar analogue on monads. For monads {\mathsf{M}} and {\mathsf{N}}, a monad morphism {\phi : \mathsf{M} \mathbin{\stackrel{.}{\to}} \mathsf{N}} is a natural transformation {\mathsf{M} \mathbin{\stackrel{.}{\to}} \mathsf{N}}—that is, a family {\phi_\alpha :: \mathsf{M}\,\alpha \rightarrow \mathsf{N}\,\alpha} of arrows, coherent in the sense that {\phi_\beta \cdot \mathsf{M}\,f = \mathsf{N}\,f \cdot \phi_\alpha} for {f :: \alpha \rightarrow \beta}—that also preserves the monad structure:

\displaystyle  \begin{array}{lclcl} \phi \cdot \mathit{return}_\mathsf{M} &=& \mathit{return}_\mathsf{N} \\ \phi \cdot \mathit{join}_\mathsf{M} &=& \mathit{join}_\mathsf{N} \cdot \phi \cdot \mathsf{M}\,\phi &=& \mathit{join}_\mathsf{N} \cdot \mathsf{N}\,\phi \cdot \phi \end{array}

A ringad morphism {\phi : \mathsf{M} \mathbin{\stackrel{.}{\to}} \mathsf{N}} for ringads {\mathsf{M},\mathsf{N}} is a monad morphism {\phi : \mathsf{M} \mathbin{\stackrel{.}{\to}} \mathsf{N}} that also respects the ringad structure:

\displaystyle  \begin{array}{lcl} \phi\,\emptyset_\mathsf{M} &=& \emptyset_\mathsf{N} \\ \phi\,(x \uplus_\mathsf{M} y) &=& \phi\,x \uplus_\mathsf{N} \phi\,y \end{array}

Then a ringad morphism behaves nicely with respect to ringad comprehensions—a comprehension interpreted in ringad {\mathsf{M}}, using existing collections of type {\mathsf{M}}, with the result transformed via a ringad morphism {\phi : \mathsf{M} \mathbin{\stackrel{.}{\to}} \mathsf{N}} to ringad {\mathsf{N}}, is equivalent to the comprehension interpreted in ringad {\mathsf{N}} in the first place, but with the initial collections transformed to type {\mathsf{N}}. Informally, it does not matter at what point the coercion from {\mathsf{M}} to {\mathsf{N}} takes place, because the result is the same either way. This property is straightforward to show by induction over the structure of the comprehension. For the empty comprehension, we have:

\displaystyle  \begin{array}{ll} & \phi\,[ e \mid \epsilon ]_\mathsf{M} \\ = & \qquad \{ \mbox{comprehension} \} \\ & \phi\,(\mathit{return}_\mathsf{M}\,e) \\ = & \qquad \{ \mbox{ringad morphism} \} \\ & \mathit{return}_\mathsf{N}\,e \\ = & \qquad \{ \mbox{comprehension} \} \\ & [e \mid \epsilon ]_\mathsf{N} \end{array}

For filters, we have:

\displaystyle  \begin{array}{ll} & \phi\,[ e \mid b ]_\mathsf{M} \\ = & \qquad \{ \mbox{comprehension} \} \\ & \phi\,(\mathbf{if}\;b\;\mathbf{then}\;\mathit{return}_\mathsf{M}\,e\;\mathbf{else}\;\emptyset_\mathsf{M}) \\ = & \qquad \{ \mbox{lift out the conditional} \} \\ & \mathbf{if}\;b\;\mathbf{then}\;\phi\,(\mathit{return}_\mathsf{M}\,e)\;\mathbf{else}\;\phi\,\emptyset_\mathsf{M} \\ = & \qquad \{ \mbox{ringad morphism} \} \\ & \mathbf{if}\;b\;\mathbf{then}\;\mathit{return}_\mathsf{N}\,e\;\mathbf{else}\;\emptyset_\mathsf{N} \\ = & \qquad \{ \mbox{comprehension} \} \\ & [ e \mid b ]_\mathsf{N} \end{array}

For generators:

\displaystyle  \begin{array}{ll} & \phi\,[ e \mid p \leftarrow m ]_\mathsf{M} \\ = & \qquad \{ \mbox{comprehension} \} \\ & \phi\,(\mathbf{let}\;h\,p = \mathit{return}_\mathsf{M}\,e ; h\,\_ = \emptyset_\mathsf{M}\;\mathbf{in}\;\mathit{join}_\mathsf{M}\,(\mathit{map}_\mathsf{M}\,h\,m)) \\ = & \qquad \{ \mbox{lift out the \textbf{let}} \} \\ & \mathbf{let}\;h\,p = \mathit{return}_\mathsf{M}\,e ; h\,\_ = \emptyset_\mathsf{M}\;\mathbf{in}\;\phi\,(\mathit{join}_\mathsf{M}\,(\mathit{map}_\mathsf{M}\,h\,m)) \\ = & \qquad \{ \mbox{ringad morphism, functors} \} \\ & \mathbf{let}\;h\,p = \mathit{return}_\mathsf{M}\,e ; h\,\_ = \emptyset_\mathsf{M}\;\mathbf{in}\;\mathit{join}_\mathsf{N}\,(\phi\,(\mathit{map}_\mathsf{M}\,(\phi \cdot h)\,m)) \\ = & \qquad \{ \mbox{let~} h' = \phi \cdot h \} \\ & \mathbf{let}\;h'\,p = \phi\,(\mathit{return}_\mathsf{M}\,e) ; h'\,\_ = \phi\,\emptyset_\mathsf{M}\;\mathbf{in}\;\mathit{join}_\mathsf{N}\,(\phi\,(\mathit{map}_\mathsf{M}\,h'\,m)) \\ = & \qquad \{ \mbox{ringad morphism, induction} \} \\ & \mathbf{let}\;h'\,p = \mathit{return}_\mathsf{N}\,e ; h'\,\_ = \emptyset_\mathsf{N}\;\mathbf{in}\;\mathit{join}_\mathsf{N}\,(\phi\,(\mathit{map}_\mathsf{M}\,h'\,m)) \\ = & \qquad \{ \mbox{naturality of~} \phi \} \\ & \mathbf{let}\;h'\,p = \mathit{return}_\mathsf{N}\,e ; h'\,\_ = \emptyset_\mathsf{N}\;\mathbf{in}\;\mathit{join}_\mathsf{N}\,(\mathit{map}_\mathsf{N}\,h'\,(\phi\,m)) \\ = & \qquad \{ \mbox{comprehension} \} \\ & [ e \mid p \leftarrow \phi\,m ]_\mathsf{N} \end{array}

And for sequences of qualifiers:

\displaystyle  \begin{array}{ll} & \phi\,[ e \mid q, q' ]_\mathsf{M} \\ = & \qquad \{ \mbox{comprehension} \} \\ & \phi\,(\mathit{join}_\mathsf{M}\,[ [ e \mid q' ]_\mathsf{M} \mid q ]_\mathsf{M}) \\ = & \qquad \{ \mbox{ringad morphism} \} \\ & \mathit{join}_\mathsf{N}\,(\phi\,(\mathit{map}\,\phi\,[ [ e \mid q' ]_\mathsf{M} \mid q ]_\mathsf{M})) \\ = & \qquad \{ \mbox{map over comprehension} \} \\ & \mathit{join}_\mathsf{N}\,(\phi\,[ \phi\,[ e \mid q' ]_\mathsf{M} \mid q ]_\mathsf{M}) \\ = & \qquad \{ \mbox{induction} \} \\ & \mathit{join}_\mathsf{N}\,[ [ e \mid q' ]_\mathsf{N} \mid q ]_\mathsf{N} \\ = & \qquad \{ \mbox{comprehension} \} \\ & [ e \mid q, q' ]_\mathsf{N} \end{array}

For example, if {\mathit{bag2set} : \mathsf{Bag} \mathbin{\stackrel{.}{\to}} \mathsf{Set}} is the obvious ringad morphism from bags to sets, discarding information about the multiplicity of repeated elements, and {x} a bag of numbers, then

\displaystyle  \mathit{bag2set}\,[a^2 \mid a \leftarrow x, \mathit{odd}\,a]_\mathsf{Bag} = [a^2 \mid a \leftarrow \mathit{bag2set}\,x, \mathit{odd}\,a]_\mathsf{Set}

and both yield the set of squares of the odd members of {x}. As a notational convenience, we might elide use of the ringad morphism when it is “obvious from context”—we might write just {[a^2 \mid a \leftarrow x, \mathit{odd}\,a]_\mathsf{Set}} even when {x} is a bag, relying on the “obvious” morphism {\mathit{bag2set}}. This would allow us to write, for example,

\displaystyle  [ a+b \mid a \leftarrow [1,2,3], b \leftarrow \langle4,4,5\rangle ]_\mathsf{Set} = \{ 5,6,7,8 \}

(writing {\langle\ldots\rangle} for the extension of a bag), instead of the more pedantic

\displaystyle  [ a+b \mid a \leftarrow \mathit{list2set}\,[1,2,3], b \leftarrow \mathit{bag2set}\,\langle4,4,5\rangle ]_\mathsf{Set} = \{ 5,6,7,8 \}

There is a forgetful function from any poorer member of the Boom Hierarchy to a richer one, flattening some distinctions by imposing additional laws—for example, from bags to sets, flattening distinctions concerning multiplicity—and I would class these forgetful functions as “obvious” morphisms. On the other hand, any morphisms in the opposite direction—such as sorting, from bags to lists, and one-of-each, from sets to bags—are not “obvious”, and so should not be elided; and similarly, I’m not sure that I could justify as “obvious” any morphisms involving non-members of the Boom Hierarchy, such as probability distributions.
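To make the {\mathit{bag2set}} example concrete (a sketch, with lists approximating bags—they additionally carry an ordering that bags lack—and Data.Set standing in for sets):

    import qualified Data.Set as Set

    -- Lists approximate bags here.
    bag2set :: Ord a => [a] -> Set.Set a
    bag2set = Set.fromList

    -- Both sides of the morphism property for the running example:
    lhs, rhs :: Set.Set Int
    lhs = bag2set [ a * a | a <- [1,2,2,3], odd a ]
    rhs = Set.map (\ a -> a * a) (Set.filter odd (bag2set [1,2,2,3]))
    -- both are Set.fromList [1,9]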


Upwards and downwards accumulations

Continuing my work in regress, this post revisits—with the benefit of much hindsight—what I was working on for my DPhil thesis (which was summarized in a paper at MPC 1992) and in subsequent papers at MPC 1998 and in SCP in 2000. This is the topic of accumulations on data structures, which distribute information across the data structure. List instances are familiar from the Haskell standard libraries (and, to those with a long memory, from APL); my thesis presented instances for a variety of tree datatypes; and the later work was about making it datatype-generic. I now have a much better way of doing it, using Conor McBride’s derivatives.

Accumulations

Accumulations or scans distribute information contained in a data structure across that data structure in a given direction. The paradigmatic example is computing the running totals of a list of numbers, which can be thought of as distributing the numbers rightwards across the list, summing them as you go. In Haskell, this is an instance of the {\mathit{scanl}} operator:

\displaystyle  \begin{array}{lcl} \mathit{scanl} &::& (\beta \rightarrow \alpha \rightarrow \beta) \rightarrow \beta \rightarrow [\alpha] \rightarrow [\beta] \\ \mathit{scanl}\,f\,e\,[\,] &=& [e] \\ \mathit{scanl}\,f\,e\,(a:x) &=& e : \mathit{scanl}\,f\,(f\,e\,a)\,x \medskip \\ \mathit{totals} &::& [{\mathbb Z}] \rightarrow [{\mathbb Z}] \\ \mathit{totals} &=& \mathit{scanl}\,(+)\,0 \end{array}

A special case of this pattern is to distribute the elements of a list rightwards across the list, simply collecting them as you go, rather than summing them. That’s the {\mathit{inits}} function, and it too is an instance of {\mathit{scanl}}:

\displaystyle  \mathit{inits} = \mathit{scanl}\,\mathit{snoc}\,[\,] \quad\mathbf{where}\; \mathit{snoc}\,x\,a = x \mathbin{{+}\!\!\!{+}} [a]

It’s particularly special, in the sense that it is the most basic {\mathit{scanl}}; any other instance can be expressed in terms of it:

\displaystyle  \mathit{scanl}\,f\,e = \mathit{map}\,(\mathit{foldl}\,f\,e) \cdot \mathit{inits}

This is called the Scan Lemma for {\mathit{scanl}}. Roughly speaking, it states that a {\mathit{scanl}} replaces every node of a list with a {\mathit{foldl}} applied to that node’s predecessors. Read from right to left, the scan lemma is an efficiency-improving transformation, eliminating duplicate computations; but note that this only works on expressions {\mathit{map}\,f \cdot \mathit{inits}} where {f} is a {\mathit{foldl}}, because only then are there duplicate computations to eliminate. It’s an important result, because it relates a clear and simple specification on the right to a more efficient implementation on the left.
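As a testable rendering of the Scan Lemma (a sketch; it holds for finite lists):

    import Data.List (inits)

    -- The Scan Lemma, stated as a checkable property:
    prop_scanLemma :: [Int] -> Bool
    prop_scanLemma xs = scanl (+) 0 xs == map (foldl (+) 0) (inits xs)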

However, the left-to-right operators {\mathit{inits}}, {\mathit{foldl}}, and {\mathit{scanl}} are a little awkward in Haskell, because they go against the grain of the cons-based (ie, right-to-left) structure of lists. I leave as a simple exercise for the reader the task of writing the more natural {\mathit{tails}}, {\mathit{foldr}}, and {\mathit{scanr}}, and identifying the relationships between them. Conversely, one can view {\mathit{inits}} etc as the natural operators for snoc-based lists, which are constructed from nil and snoc rather than from nil and cons.

Upwards and downwards accumulations on binary trees

What would {\mathit{inits}}, {\mathit{tails}}, {\mathit{scanl}}, etc look like on different—and in particular, non-linear—datatypes? Let’s consider a simple instance, for homogeneous binary trees; that is, trees with a label at both internal and external nodes.

\displaystyle  \mathbf{data}\;\mathsf{Tree}\,\alpha = \mathit{Leaf}\,\alpha \mid \mathit{Fork}\,\alpha\,(\mathsf{Tree}\,\alpha)\,(\mathsf{Tree}\,\alpha)

for which the obvious fold operator is

\displaystyle  \begin{array}{lcl} \mathit{fold} &::& (\alpha\rightarrow\beta) \rightarrow (\alpha\rightarrow\beta\rightarrow\beta\rightarrow\beta) \rightarrow \mathsf{Tree}\,\alpha \rightarrow \beta \\ \mathit{fold}\,f\,g\,(\mathit{Leaf}\,a) &=& f\,a \\ \mathit{fold}\,f\,g\,(\mathit{Fork}\,a\,t\,u) &=& g\,a\,(\mathit{fold}\,f\,g\,t)\,(\mathit{fold}\,f\,g\,u) \end{array}

I’m taking the view that the appropriate generalization is to distribute data “upwards” and “downwards” through such a tree—from the leaves towards the root, and vice versa. This does indeed specialize to the definitions we had on lists when you view them vertically in terms of their “cons” structure: they’re long thin trees, in which every parent has exactly one child. (An alternative view would be to look at distributing data horizontally through a tree, from left to right and vice versa. Perhaps I’ll come back to that another time.)

The upwards direction is the easier one to deal with. An upwards accumulation labels every node of the tree with some function of its descendants; moreover, the descendants of a node themselves form a tree, so can be easily represented, and folded. So we can quite straightforwardly define:

\displaystyle  \begin{array}{lcl} \mathit{scanu} &::& (\alpha\rightarrow\beta) \rightarrow (\alpha\rightarrow\beta\rightarrow\beta\rightarrow\beta) \rightarrow \mathsf{Tree}\,\alpha \rightarrow \mathsf{Tree}\,\beta \\ \mathit{scanu}\,f\,g\,(\mathit{Leaf}\,a) &=& \mathit{Leaf}\,(f\,a) \\ \mathit{scanu}\,f\,g\,(\mathit{Fork}\,a\,t\,u) &=& \mathit{Fork}\,(g\,a\,(\mathit{root}\,t')\,(\mathit{root}\,u'))\,t'\,u' \\ & & \quad\mathbf{where}\; t' = \mathit{scanu}\,f\,g\,t ; u' = \mathit{scanu}\,f\,g\,u \end{array}

where {\mathit{root}} yields the root of a tree:

\displaystyle  \begin{array}{lcl} \mathit{root} &::& \mathsf{Tree}\,\alpha \rightarrow \alpha \\ \mathit{root}\,(\mathit{Leaf}\,a) &=& a \\ \mathit{root}\,(\mathit{Fork}\,a\,t\,u) &=& a \end{array}

As with lists, the most basic upwards scan uses the constructors themselves as arguments:

\displaystyle  \begin{array}{lcl} \mathit{subtrees} &::& \mathsf{Tree}\,\alpha \rightarrow \mathsf{Tree}\,(\mathsf{Tree}\,\alpha) \\ \mathit{subtrees} &=& \mathit{scanu}\,\mathit{Leaf}\,\mathit{Fork} \end{array}

and any other scan can be expressed, albeit less efficiently, in terms of this:

\displaystyle  \mathit{scanu}\,f\,g = \mathit{fmap}\,(\mathit{fold}\,f\,g) \cdot \mathit{subtrees}
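For instance (a small example of my own), an upwards accumulation labelling every node with the size of its subtree:

    -- Label every node with the size of the subtree rooted there.
    sizes :: Tree a -> Tree Int
    sizes = scanu (const 1) (\ _ m n -> 1 + m + n)

    -- sizes (Fork 'a' (Leaf 'b') (Leaf 'c')) == Fork 3 (Leaf 1) (Leaf 1)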

The downwards direction is more difficult, though. A downwards accumulation should label every node with some function of its ancestors; but these do not form another tree. For example, in the homogeneous binary tree

\displaystyle  \mathit{Fork}\,2\,(\mathit{Leaf}\,1)\,(\mathit{Fork}\,4\,(\mathit{Leaf}\,3)\,(\mathit{Leaf}\,5))

(choosing labels arbitrarily for the nodes off the path), the ancestors of the node labelled {3} are the nodes labelled {2,4,3}. One could represent those ancestors simply as a list, {[2,4,3]}; but that rules out the possibility of a downwards accumulation treating left children differently from right children, which is essential in a number of algorithms (such as the parallel prefix and tree drawing algorithms in my thesis). A more faithful rendering is to define a new datatype of paths that captures the left and right turns—a kind of non-empty cons list, but with both a “left cons” and a “right cons” constructor:

\displaystyle  \mathbf{data}\;\mathsf{Path}\,\alpha = \mathit{Single}\,\alpha \mid \mathit{LCons}\,\alpha\,(\mathsf{Path}\,\alpha) \mid \mathit{RCons}\,\alpha\,(\mathsf{Path}\,\alpha)

(I called them “threads” in my thesis.) Then we can capture the data structure representing the ancestors of the node labelled {3} by the expression {\mathit{RCons}\,2\,(\mathit{LCons}\,4\,(\mathit{Single}\,3))}. I leave it as an exercise for the more energetic reader to work out a definition for

\displaystyle  \mathit{paths} :: \mathsf{Tree}\,\alpha \rightarrow \mathsf{Tree}\,(\mathsf{Path}\,\alpha)

to compute the tree giving the ancestors of every node, and for a corresponding {\mathit{scand}}.

Generic upwards accumulations

Having seen ad-hoc constructions for a particular kind of binary tree, we should consider what the datatype-generic construction looks like. I discussed datatype-generic upwards accumulations already, in the post on Horner’s Rule; the construction was given in the paper Generic functional programming with types and relations by Richard Bird, Oege de Moor and Paul Hoogendijk. As with homogeneous binary trees, it’s still the case that the generic version of {\mathit{subtrees}} labels every node of a data structure of type {\mathsf{T}\alpha = \mu\mathsf{F}\alpha} with the descendants of that node, and still the case that the descendants form a data structure also of type {\mathsf{T}\alpha}. However, in general, the datatype {\mathsf{T}} does not allow for a label at every node, so we need the labelled variant {\mathsf{L}\alpha = \mu\mathsf{G}\alpha} where {\mathsf{G}(\alpha,\beta) = \alpha \times \mathsf{F}(1,\beta)}. Then we can define

\displaystyle  \mathit{subtrees}_{\mathsf{F}} = \mathit{fold}_{\mathsf{F}}(\mathit{in}_{\mathsf{G}} \cdot \mathit{fork}(\mathit{in}_{\mathsf{F}} \cdot \mathsf{F}(\mathit{id},\mathit{root}), \mathsf{F}(!,\mathit{id}))) :: \mathsf{T}\alpha \rightarrow \mathsf{L}(\mathsf{T}\alpha)

where {\mathit{root} = \mathit{fst} \cdot \mathit{in}_{\mathsf{G}}^{-1} = \mathit{fold}_{\mathsf{G}}\,\mathit{fst} :: \mathsf{L}\alpha \rightarrow \alpha} returns the root label of a labelled data structure—by construction, every labelled data structure has a root label—and {!_{\alpha} :: \alpha \rightarrow 1} is the unique arrow to the unit type. Moreover, we get a datatype-generic {\mathit{scanu}} operator, and a Scan Lemma:

\displaystyle  \begin{array}{lcl} \mathit{scanu}_{\mathsf{F}} &::& (\mathsf{F}(\alpha,\beta) \rightarrow \beta) \rightarrow \mathsf{T}\alpha \rightarrow \mathsf{L}\beta \\ \mathit{scanu}_{\mathsf{F}}\,\phi &=& \mathsf{L}\,(\mathit{fold}_{\mathsf{F}}\,\phi) \cdot \mathit{subtrees}_{\mathsf{F}} \\ &=& \mathit{fold}_{\mathsf{F}}(\mathit{in}_{\mathsf{G}} \cdot \mathit{fork}(\phi \cdot \mathsf{F}(\mathit{id},\mathit{root}), \mathsf{F}(!,\mathit{id}))) \end{array}

Generic downwards accumulations, via linearization

The best part of a decade after my thesis work, inspired by the paper by Richard Bird & co, I set out to try to define datatype-generic versions of downward accumulations too. I wrote a paper about it for MPC 1998, and then came up with a new construction for the journal version of that paper in SCP in 2000. I now think these constructions are rather clunky, and I have a better one; if you don’t care to explore the culs-de-sac, skip this section and the next and go straight to the section on derivatives.

The MPC construction was based around a datatype-generic version of the {\mathsf{Path}} datatype above, to represent the “ancestors” of a node in an inductive datatype. The tricky bit is that data structures in general are non-linear—a node may have many children—whereas paths are linear—every node has exactly one child, except the last which has none; how can we define a “linear version” {\mathsf{F}'} of {\mathsf{F}}? Technically, we might say that a functor is linear (actually, “affine” would be a better word) if it distributes over sum.

The construction in the paper assumed that {\mathsf{F}} was a sum of products of literals

\displaystyle  \begin{array}{lcl} \mathsf{F}(\alpha,\beta) &=& \sum_{i=1}^{n} \mathsf{F}_i(\alpha,\beta) \\ \mathsf{F}_i(\alpha,\beta) &=& \prod_{j=1}^{m_i} \mathsf{F}_{i,j}(\alpha,\beta) \end{array}

where each {\mathsf{F}_{i,j}(\alpha,\beta)} is either {\alpha}, {\beta}, or some constant type such as {\mathit{Int}} or {\mathit{Bool}}. For example, for leaf-labelled binary trees

\displaystyle  \mathbf{data}\;\mathsf{Tree}\,\alpha = \mathit{Tip}\,\alpha \mid \mathit{Bin}\,(\mathsf{Tree}\,\alpha)\,(\mathsf{Tree}\,\alpha)

the shape functor is {\mathsf{F}(\alpha,\beta) = \alpha + \beta \times \beta}, so {n=2} (there are two variants), {m_1=1} (the first variant has a single literal, {\alpha}) and {m_2=2} (the second variant has two literals, {\beta} and {\beta}), and:

\displaystyle  \begin{array}{lcl} \mathsf{F}(\alpha,\beta) &=& \mathsf{F}_1(\alpha,\beta) + \mathsf{F}_2(\alpha,\beta) \\ \mathsf{F}_1(\alpha,\beta) &=& \mathsf{F}_{1,1}(\alpha,\beta) \\ \mathsf{F}_{1,1}(\alpha,\beta) &=& \alpha \\ \mathsf{F}_2(\alpha,\beta) &=& \mathsf{F}_{2,1}(\alpha,\beta) \times \mathsf{F}_{2,2}(\alpha,\beta) \\ \mathsf{F}_{2,1}(\alpha,\beta) &=& \beta \\ \mathsf{F}_{2,2}(\alpha,\beta) &=& \beta \end{array}

Then for each {i} we define a {(k_i+1)}-ary functor {\mathsf{F}'_i}, where {k_i} is the “degree of branching” of variant {i} (ie, the number of {\beta}s occurring in {\mathsf{F}_i(\alpha,\beta)}, which is the number of {j} for which {\mathsf{F}_{i,j}(\alpha,\beta)=\beta}), in such a way that

\displaystyle  \mathsf{F}'_i(\alpha,\beta,\ldots,\beta) = \mathsf{F}_i(\alpha,\beta)

and {\mathsf{F}'_i} is linear in each argument except perhaps the first. It’s a bit messy to give an explicit construction for {\mathsf{F}'_i}, but roughly speaking,

\displaystyle  \mathsf{F}'_i(\alpha,\beta_1,\ldots,\beta_{k_i}) = \prod_{j=1}^{m_i} \mathsf{F}'_{i,j}(\alpha,\beta_1,\ldots,\beta_{k_i})

where {\mathsf{F}'_{i,j}(\alpha,\beta_1,\ldots,\beta_{k_i})} is “the next unused {\beta}” when {\mathsf{F}_{i,j}(\alpha,\beta)=\beta}, and just {\mathsf{F}_{i,j}(\alpha,\beta)} otherwise. For example, for leaf-labelled binary trees, we have:

\displaystyle  \begin{array}{lcl} \mathsf{F}'_1(\alpha) &=& \alpha \\ \mathsf{F}'_2(\alpha,\beta_1,\beta_2) &=& \beta_1 \times \beta_2 \end{array}

Having defined the linear variant {\mathsf{F}'} of {\mathsf{F}}, we can construct the datatype {\mathsf{P}\alpha = \mu\mathsf{H}\alpha} of paths, as the inductive datatype of shape {\mathsf{H}} where

\displaystyle  \mathsf{H}(\alpha,\beta) = \mathsf{F}(\alpha,1) + \sum_{i=1}^{n} \sum_{j=1}^{k_i} (\mathsf{F}_i(\alpha,1) \times \beta)

That is, paths are a kind of non-empty cons list. The path ends at some node of the original data structure; so the last element of the path is of type {\mathsf{F}(\alpha,1)}, which records the “local content” of a node (its shape and labels, but without any of its children). Every other element of the path consists of the local content of a node together with an indication of which direction to go next; this amounts to the choice of a variant {i}, followed by the choice of one of {k_i} identical copies of the local contents {\mathsf{F}_i(\alpha,1)} of variant {i}, where {k_i} is the degree of branching of variant {i}. We model this as a base constructor {\mathit{End}} and a family of “cons” constructors {\mathit{Cons}_{i,j}} for {1 \le i \le n} and {1 \le j \le k_i}.

For example, for leaf-labelled binary trees, the “local content” for the last element of the path is either a single label (for tips) or void (for bins), and for the other path elements, there are zero copies of the local content for a tip (because a tip has zero children), and two copies of the void local information for bins (because a bin has two children). Therefore, the path datatype for such trees is

\displaystyle  \mathbf{data}\;\mathsf{Path}\,\alpha = \mathit{End}\,(\mathsf{Maybe}\,\alpha) \mid \mathit{Cons}_{2,1}\,(\mathsf{Path}\,\alpha) \mid \mathit{Cons}_{2,2}\,(\mathsf{Path}\,\alpha)

which is isomorphic to the definition that you might have written yourself:

\displaystyle  \mathbf{data}\;\mathsf{Path}\,\alpha = \mathit{External}\,\alpha \mid \mathit{Internal} \mid \mathit{Left}\,(\mathsf{Path}\,\alpha) \mid \mathit{Right}\,(\mathsf{Path}\,\alpha)

For homogeneous binary trees, the construction gives

\displaystyle  \mathbf{data}\;\mathsf{Path}\,\alpha = \mathit{External}\,\alpha \mid \mathit{Internal}\,\alpha \mid \mathit{Left}\,\alpha\,(\mathsf{Path}\,\alpha) \mid \mathit{Right}\,\alpha\,(\mathsf{Path}\,\alpha)

which is almost the ad-hoc definition we had two sections ago, except that it distinguishes singleton paths that terminate at an external node from those that terminate at an internal one.

Now, analogous to the function {\mathit{subtrees}_\mathsf{F}} which labels every node with its descendants, we can define a function {\mathit{paths}_\mathsf{F} : \mathsf{T}\alpha \rightarrow \mathsf{L}(\mathsf{P}\alpha)} to label every node with its ancestors, in the form of the path to that node. One definition is as a fold; informally, at each stage we construct a singleton path to the root, and map the appropriate “cons” over the paths to each node in each of the children (see the paper for a concrete definition). This is inefficient, because of the repeated maps; it’s analogous to defining {\mathit{inits}} by

\displaystyle  \begin{array}{lcl} \mathit{inits}\,[\,] &=& [[\,]] \\ \mathit{inits}\,(a:x) &=& [\,] : \mathit{map}\,(a:)\,(\mathit{inits}\,x) \end{array}

A second definition is as an unfold, maintaining as an accumulating parameter of type {\mathsf{P}(\alpha)\rightarrow\mathsf{P}(\alpha)} the “path so far”; this avoids the maps, but it is still quadratic because there are no common subexpressions among the various paths. (This is analogous to an accumulating-parameter definition of {\mathit{inits}}:

\displaystyle  \begin{array}{lcl} \mathit{inits} &=& \mathit{inits}'\,\mathit{id} \medskip \\ \mathit{inits}'\,f\,[\,] &=& [f\,[\,]] \\ \mathit{inits}'\,f\,(a:x) &=& f\,[\,] : \mathit{inits}'\,(f \cdot (a:))\,x \end{array}

Even with an accumulating “Hughes list” parameter, it still takes quadratic time.)

The downwards accumulation itself is defined as a path fold mapped over the paths, giving a Scan Lemma for downwards accumulations. With either the fold or the unfold definition of paths, this is still quadratic, again because of the lack of common subexpressions in a result of quadratic size. However, in some circumstances the path fold can be reassociated (analogous to turning a {\mathit{foldr}} into a {\mathit{foldl}}), leading finally to a linear-time computation; see the paper for the details of how.

Generic downwards accumulations, via zip

I was dissatisfied with the “…”s in the MPC construction of datatype-generic paths, but couldn’t see a good way of avoiding them. So in the subsequent SCP version of the paper, I presented an alternative construction of downwards accumulations, which does not go via a definition of paths; instead, it goes directly to the accumulation itself.

As with the efficient version of the MPC construction, it is coinductive, and uses an accumulating parameter to carry in to each node the seed from higher up in the tree; so the downwards accumulation is of type {\gamma \times \mathsf{T}\alpha \rightarrow \mathsf{L}\beta}. It is defined as an unfold, with a body {g} of type

\displaystyle  \gamma \times \mathsf{T}\alpha \rightarrow \mathsf{G}(\beta, \gamma \times \mathsf{T}\alpha)

The result {\mathsf{G}(\beta, \gamma \times \mathsf{T}\alpha)} of applying the body will be constructed from two components, of types {\mathsf{G}(\beta, \gamma)} and {\mathsf{G}(1, \mathsf{T}\alpha)}: the first gives the root label of the accumulation and the seeds for processing the children, and the second gives the children themselves.

These two components get combined to make the whole result via a function

\displaystyle  \mathit{zip} :: \mathsf{G}(\alpha,\beta) \times \mathsf{G}(\gamma,\delta) \rightarrow \mathsf{G}(\alpha \times \gamma, \beta \times \delta)

This will be partial in general, defined only for pairs of {\mathsf{G}}-structures of the same shape.

The second component of {g} is the easier to define; given input {\gamma \times \mathsf{T}\alpha}, it unpacks the {\mathsf{T}\alpha} to {\mathsf{F}(\alpha,\mathsf{T}\alpha)}, and discards the {\gamma} and the {\alpha} (recall that {\mathsf{L}\alpha=\mu\mathsf{G}\alpha} is the labelled variant of {\mathsf{T}\alpha=\mu\mathsf{F}\alpha}, where {\mathsf{G}(\alpha,\beta) = \alpha \times \mathsf{F}(1,\beta)}).

For the first component, we enforce the constraint that all output labels are dependent only on their ancestors by unpacking the {\mathsf{T}\alpha} and pruning off the children, giving input {\gamma \times \mathsf{F}(\alpha,1)}. We then suppose as a parameter to the accumulation a function {f} of type {\gamma \times \mathsf{F}(\alpha,1) \rightarrow \beta \times \mathsf{F}(1,\gamma) = \mathsf{G}(\beta,\gamma)} to complete the construction of the first component. In order that the two components can be zipped together, we require that {f} is shape-preserving in its second argument:

\displaystyle  \mathsf{F}(!,!) \cdot \mathit{snd} \cdot f = \mathsf{F}(!,!) \cdot \mathit{snd}

where {! : \alpha \rightarrow 1} is the unique function to the unit type. Then, although the {g} built from these two components depends on the partial function {\mathit{zip}}, it will still itself be total.

The SCP construction gets rid of the “…”s in the MPC construction. It is also inherently efficient, in the sense that if the core operation {f} takes constant time then the whole accumulation takes linear time. However, use of the partial {\mathit{zip}} function to define a total accumulation is a bit unsatisfactory, taking us outside the domain of sets and total functions. Moreover, there’s now only half an explanation in terms of paths: accumulations in which the label attached to each node depends only on the list of its ancestors, and not on the left-to-right ordering of siblings, can be factored into a list function (in fact, a {\mathit{foldl}}) mapped over the “paths”, which is now a tree of lists; but accumulations in which left children are treated differently from right children, such as the parallel prefix and tree drawing algorithms mentioned earlier, can not.

Generic downwards accumulations, via derivatives

After another interlude of about a decade, and with the benefit of new results to exploit, I had a “eureka” moment: the linearization of a shape functor is closely related to the beautiful notion of the derivative of a datatype, as promoted by Conor McBride. The crucial observation Conor made is that the “one-hole contexts” of a datatype—that is, for a container datatype, the datatype of data structures with precisely one element missing—can be neatly formalized using an analogue of the rules of differential calculus. The one-hole contexts are precisely what you need to identify which particular child you’re talking about out of a collection of children. (If you’re going to follow along with some coding, I recommend that you also read Conor’s paper Clowns to the left of me, jokers to the right. This gives the more general construction of dissecting a datatype, identifying a unique hole, but also allowing the “clowns” to the left of the hole to have a different type from the “jokers” to the right. I think the explanation of the relationship with the differential calculus is much better explained here; the original notion of derivative can be retrieved by specializing the clowns and jokers to the same type.)

The essence of the construction is the notion of a derivative {\Delta\mathsf{F}} of a functor {\mathsf{F}}. For our purposes, we want the derivative in the second argument only of a bifunctor; informally, {\Delta\mathsf{F}(\alpha,\beta)} is like {\mathsf{F}(\alpha,\beta)}, but with precisely one {\beta} missing. Given such a one-hole context, and an element with which to plug the hole, one can reconstruct the whole structure:

\displaystyle  \mathit{plug}_\mathsf{F} :: \beta \times \Delta\mathsf{F}(\alpha,\beta) \rightarrow \mathsf{F}(\alpha,\beta)

That’s how to consume one-hole contexts; how can we produce them? We could envisage some kind of inverse {\mathit{unplug}} of {\mathit{plug}}, which breaks an {\mathsf{F}}-structure into an element and a context; but this requires us to invent a language for specifying which particular element we mean—{\mathit{plug}} is not injective, so {\mathit{unplug}} needs an extra argument. A simpler approach is to provide an operator that annotates every position at once with the one-hole context for that position:

\displaystyle  \mathit{positions}_\mathsf{F} :: \mathsf{F}(\alpha,\beta) \rightarrow \mathsf{F}(\alpha, \beta \times \Delta\mathsf{F}(\alpha,\beta))

One property of {\mathit{positions}} is that it really is an annotation—if you throw away the annotations, you get back what you started with:

\displaystyle  \mathsf{F}(\mathit{id},\mathit{fst})\,(\mathit{positions}\,x) = x

A second property relates it to {\mathit{plug}}—each element in a hole position plugs into its associated one-hole context to yield the same whole structure back again:

\displaystyle  \mathsf{F}(\mathit{id},\mathit{plug})\,(\mathit{positions}\,x) = \mathsf{F}(\mathit{id},\mathit{const}\,x)\,x

(I believe that those two properties completely determine {\mathit{plug}} and {\mathit{positions}}.)

Incidentally, the derivative {\Delta\mathsf{F}} of a bifunctor can be elegantly represented as an associated type synonym in Haskell, in a type class {\mathit{Diff}} of bifunctors differentiable in their second argument, along with {\mathit{plug}} and {\mathit{positions}}:

\displaystyle  \begin{array}{lcl} \mathbf{class}\; \mathit{Bifunctor}\,f \Rightarrow \mathit{Diff}\,f \;\mathbf{where} \\ \qquad \mathbf{type}\; \mathit{Delta}\,f :: \ast \rightarrow \ast \rightarrow \ast \\ \qquad \mathit{plug} :: (\beta, \mathit{Delta}\,f\,\alpha\,\beta) \rightarrow f\,\alpha\,\beta \\ \qquad \mathit{positions} :: f\,\alpha\,\beta \rightarrow f\,\alpha\,(\beta, \mathit{Delta}\,f\,\alpha\,\beta) \end{array}

Conor’s papers show how to define instances of {\mathit{Diff}} for all polynomial functors {\mathsf{F}}—anything made out of constants, projections, sums, and products.
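For instance, here is a sketch of an instance for the shape functor of the homogeneous binary trees from earlier (transcribing the class above directly, and dropping the {\mathit{Bifunctor}} superclass to keep the sketch self-contained; the constructor names are mine):

    {-# LANGUAGE TypeFamilies #-}

    class Diff f where
      type Delta f :: * -> * -> *
      plug      :: (b, Delta f a b) -> f a b
      positions :: f a b -> f a (b, Delta f a b)

    -- Shape functor of homogeneous binary trees: F(a,b) = a + a*b*b.
    data TreeF a b = LeafF a | ForkF a b b

    -- One-hole contexts: the hole is either the left or the right child.
    data TreeD a b = InLeft a b    -- hole on the left; right child kept
                   | InRight a b   -- hole on the right; left child kept

    instance Diff TreeF where
      type Delta TreeF = TreeD
      plug (b, InLeft  a u) = ForkF a b u
      plug (b, InRight a t) = ForkF a t b
      positions (LeafF a)     = LeafF a
      positions (ForkF a t u) = ForkF a (t, InLeft a u) (u, InRight a t)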

The path to a node in a data structure is simply a list of one-hole contexts—let’s say, innermost context first, although it doesn’t make much difference—but with all the data off the path (that is, the other children) stripped away:

\displaystyle  \mathsf{P}\alpha = \mathsf{List}(\Delta\mathsf{F}(\alpha,1))

This is a projection of Huet’s zipper, which preserves the off-path children, and records also the subtree in focus at the end of the path:

\displaystyle  \mathsf{Zipper}_\mathsf{F}\,\alpha = \mathsf{List}(\Delta\mathsf{F}(\alpha,\mu\mathsf{F}\alpha)) \times \mu\mathsf{F}\alpha

Since the contexts are listed innermost-first in the path, closing up a zipper to reconstruct a tree is a {\mathit{foldl}} over the path:

\displaystyle  \begin{array}{lcl} \mathit{close}_\mathsf{F} &::& \mathsf{Zipper}_\mathsf{F}\,\alpha \rightarrow \mu\mathsf{F}\alpha \\ \mathit{close}_\mathsf{F}\,(p,t) &=& \mathit{foldl}\,(\mathit{in}\cdot\mathit{plug})\,t\,p \end{array}

Now, let’s develop the function {\mathit{paths}}, which turns a tree into a labelled tree of paths. We will write it with an accumulating parameter, representing the “path so far”:

\displaystyle  \begin{array}{lcl} \mathit{paths}_\mathsf{F} &::& \mathsf{T}\alpha \rightarrow \mathsf{L}(\mathsf{P}\alpha) \\ \mathit{paths}_\mathsf{F}\,t &=& \mathit{paths}'_\mathsf{F}\,(t,[\,]) \end{array}

Given the components {\mathit{in}_\mathsf{F}\,x} of a tree and a path {p} to its root, {\mathit{paths}'_\mathsf{F}} must construct the corresponding labelled tree of paths. Since {\mathsf{L} = \mu\mathsf{G}} and {\mathsf{G}(\alpha,\beta) = \alpha \times \mathsf{F}(1,\beta)}, this amounts to constructing a value of type {\mathsf{P}\alpha \times \mathsf{F}(1, \mathsf{L}(\mathsf{P}\alpha))}. For the first component of this pair we will use {p}, the path so far. The second component can be constructed from {x} by identifying all children via {\mathit{positions}}, discarding some information with judicious {!}s, consing each one-hole context onto {p} to make a longer path, then making recursive calls on each child.

That is,

\displaystyle  \begin{array}{lcl} \mathit{paths}'_\mathsf{F} &::& \mathsf{T}\alpha\times\mathsf{P}\alpha \rightarrow \mathsf{L}(\mathsf{P}\alpha) \\ \mathit{paths}'_\mathsf{F}\,(\mathit{in}_\mathsf{F}\,x,p) &=& \mathit{in}_\mathsf{G}(p, \mathsf{F}(!, \mathit{paths}'_\mathsf{F} \cdot \mathit{id}\times((:p)\cdot\Delta\mathsf{F}(\mathit{id},!)) )\,(\mathit{positions}\,x)) \end{array}

Downwards accumulations are then path functions mapped over the result of {\mathit{paths}}. However, we restrict ourselves to path functions that are instances of {\mathit{foldr}}, because only then are there common subexpressions to be shared between a parent and its children (remember that paths are innermost-first, so related nodes share a tail of their ancestors).

\displaystyle  \begin{array}{lcl} \mathit{scand}_\mathsf{F} &::& (\Delta\mathsf{F}(\alpha,1)\times\beta\rightarrow\beta) \rightarrow \beta \rightarrow \mathsf{T}\alpha \rightarrow \mathsf{L}\beta \\ \mathit{scand}_\mathsf{F}\,f\,e &=& \mathit{map}\,(\mathit{foldr}\,f\,e) \cdot \mathit{paths} \end{array}

Moreover, it is straightforward to fuse the {\mathit{map}} with {\mathit{paths}}, to obtain

\displaystyle  \begin{array}{lcl} \mathit{scand}_\mathsf{F}\,f\,e\,t &=& \mathit{scand}'_\mathsf{F}\,f\,(t,e) \medskip \\ \mathit{scand}'_\mathsf{F}\,f\,(\mathit{in}_\mathsf{F}\,x,e) &=& \mathit{in}_\mathsf{G}(e, \mathsf{F}(!, \mathit{scand}'_\mathsf{F}\,f \cdot \mathit{id}\times g )\,(\mathit{positions}\,x)) \\ & & \quad\mathbf{where}\; g\,d = f\,(\Delta\mathsf{F}(\mathit{id},!)\,d, e) \end{array}

which takes time linear in the size of the tree, assuming that {f} and {e} take constant time.

Finally, in the case that the function being mapped over the paths is a {\mathit{foldl}} as well as a {\mathit{foldr}}, then we can apply the Third Homomorphism Theorem to conclude that it is also an associative fold over lists. From this (I believe) we get a very efficient parallel algorithm for computing the accumulation, taking time logarithmic in the size of the tree—even if the tree has greater than logarithmic depth.


Distributivity in Horner’s Rule

This is a continuation of my previous post on Horner’s Rule, and in particular, of the discussion there about distributivity in the datatype-generic version of the Maximum Segment Sum problem:

the essential property behind Horner’s Rule is one of distributivity. In the datatype-generic case, we will model this as follows. We are given an {(\mathsf{F}\,\alpha)}-algebra {(\beta,f)} [for a binary shape functor {\mathsf{F}}], and a {\mathsf{M}}-algebra {(\beta,k)} [for a collection monad {\mathsf{M}}]; you might think of these as “datatype-generic product” and “collection sum”, respectively. Then there are two different methods of computing a {\beta} result from an {\mathsf{F}\,\alpha\,(\mathsf{M}\,\beta)} structure: we can either distribute the {\mathsf{F}\,\alpha} structure over the collection(s) of {\beta}s, compute the “product” {f} of each structure, and then compute the “sum” {k} of the resulting products; or we can “sum” each collection, then compute the “product” of the resulting structure. Distributivity of “product” over “sum” is the property that these two different methods agree, as illustrated in the following diagram.

For example, with {f :: \mathsf{F}\,{\mathbb Z}\,{\mathbb Z} \rightarrow {\mathbb Z}} adding all the integers in an {\mathsf{F}}-structure, and {k :: \mathsf{M}\,{\mathbb Z} \rightarrow {\mathbb Z}} finding the maximum of a (non-empty) collection, the diagram commutes.

There’s a bit of hand-waving above to justify the claim that this is really a kind of distributivity. What does it have to do with the common-or-garden equation

\displaystyle  a \otimes (b \oplus c) = (a \otimes b) \oplus (a \otimes c)

stating distributivity of one binary operator over another? That question is the subject of this post.

Distributing over effects

Recall that {\delta_2 :: (\mathsf{F}\,\alpha)\mathsf{M} \mathbin{\stackrel{.}{\to}} \mathsf{M}(\mathsf{F}\,\alpha)} distributes the shape functor {\mathsf{F}} over the monad {\mathsf{M}} in its second argument; this is the form of distribution over effects that crops up in the datatype-generic Maximum Segment Sum problem. More generally, this works for any idiom {\mathsf{M}}; this will be important below.

Generalizing in another direction, one might think of distributing over an idiom in both arguments of the bifunctor, via an operator {\delta : \mathsf{F} \cdot (\mathsf{M} \times \mathsf{M}) \mathbin{\stackrel{.}{\to}} \mathsf{M} \cdot \mathsf{F}}, which is to say, {\delta_\beta :: \mathsf{F}\,(\mathsf{M}\beta)\,(\mathsf{M}\beta) \rightarrow \mathsf{M}(\mathsf{F}\beta)}, natural in the {\beta}. This is the {\mathit{bidist}} method of the {\mathit{Bitraversable}} subclass of {\mathit{Bifunctor}} that Bruno Oliveira and I used in our Essence of the Iterator Pattern paper; informally, it requires just that {\mathsf{F}} has a finite ordered sequence of “element positions”. Given {\delta}, one can define {\delta_2 = \delta \cdot \mathsf{F}\,\mathit{pure}\,\mathit{id}}.

That traversability (or equivalently, distributivity over effects) for a bifunctor {\mathsf{F}} is definable for any idiom, not just any monad, means that one can also conveniently define an operator {\mathit{contents}_{\mathsf{H}} : \mathsf{H} \mathbin{\stackrel{.}{\to}} \mathsf{List}} for any traversable unary functor {\mathsf{H}}. This is because the constant functor {\mathsf{K}_{[\beta]}} (which takes any {\alpha} to {[\beta]}) is an idiom: the {\mathit{pure}} method returns the empty list, and idiomatic application appends two lists. Then one can define

\displaystyle  \mathit{contents}_{\mathsf{H}} = \delta \cdot \mathsf{H}\,\mathit{wrap}

where {\mathit{wrap}} makes a singleton list. For a traversable bifunctor {\mathsf{F}}, we define {\mathit{contents}_{\mathsf{F}} = \mathit{contents}_{\mathsf{F}\cdot\triangle}} where {\triangle} is the diagonal functor; that is, {\mathit{contents}_{\mathsf{F}} :: \mathsf{F}\,\beta\,\beta \rightarrow [\beta]}, natural in the {\beta}. (No constant functor is a monad, except in trivial categories, so this convenient definition of contents doesn’t work monadically. Of course, one can use a writer monad, but this isn’t quite so convenient, because an additional step is needed to extract the output.)
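In Haskell, this constant-functor idiom is available as {\mathit{Const}} from Data.Functor.Const, whose {\mathit{Applicative}} instance combines values with the monoid of the constant type, here the list monoid. A minimal sketch of the contents operator for unary functors:

    import Data.Functor.Const (Const (..))

    -- Traversing with a singleton-making function in the Const [b] idiom
    -- concatenates the singletons in the traversal order.
    contents :: Traversable h => h b -> [b]
    contents = getConst . traverse (\b -> Const [b])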

One important axiom of {\delta} that I made recent use of in a paper with Richard Bird on Effective Reasoning about Effectful Traversals is that it should be “natural in the contents”: it should leave shape unchanged, and depend on contents only up to the extent of their ordering. Say that a natural transformation {\phi : \mathsf{F} \mathbin{\stackrel{.}{\to}} \mathsf{G}} between traversable functors {\mathsf{F}} and {\mathsf{G}} “preserves contents” if {\mathit{contents}_{\mathsf{G}} \cdot \phi = \mathit{contents}_{\mathsf{F}}}. Then, in the case of unary functors, the formalization of “naturality in the contents” requires {\delta} to respect content-preserving {\phi}:

\displaystyle  \delta_{\mathsf{G}} \cdot \phi = \mathsf{M}\phi \cdot \delta_{\mathsf{F}} : \mathsf{F}\mathsf{M} \mathbin{\stackrel{.}{\to}} \mathsf{M}\mathsf{G}

In particular, {\mathit{contents}_{\mathsf{F}} : \mathsf{F} \mathbin{\stackrel{.}{\to}} \mathsf{List}} itself preserves contents, and so we expect

\displaystyle  \delta_{\mathsf{List}} \cdot \mathit{contents}_{\mathsf{F}} = \mathsf{M}(\mathit{contents}_{\mathsf{F}}) \cdot \delta_{\mathsf{F}}

to hold.

Folding a structure

Happily, the same generic operation {\mathit{contents}_{\mathsf{F}}} provides a datatype-generic means to “fold” over the elements of an {\mathsf{F}}-structure. Given a binary operator {\otimes :: \beta\times\beta \rightarrow \beta} and an initial value {b :: \beta}, we can define an {(\mathsf{F}\,\beta)}-algebra {(\beta,f)}—that is, a function {f :: \mathsf{F}\,\beta\,\beta\rightarrow\beta}—by

\displaystyle  f = \mathit{foldr}\,(\otimes)\,b \cdot \mathit{contents}_{\mathsf{F}}

(This is a slight specialization of the presentation of the datatype-generic MSS problem from last time; there we had {f :: \mathsf{F}\,\alpha\,\beta \rightarrow \beta}. The specialization arises because we are hoping to define such an {f} given a homogeneous binary operator {\otimes}. On the other hand, the introduction of the initial value {b} is no specialization, as we needed such a value for the “product” of an empty “segment” anyway.)

Incidentally, I believe that this “generic folding” construction is exactly what is intended in Ross Paterson’s Data.Foldable library.
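With the modern {\mathit{Foldable}} class, the construction might be sketched like so ({\mathit{toList}} playing the role of {\mathit{contents}}; the name {\mathit{genericFold}} is mine, purely for illustration):

    import Data.Foldable (toList)

    -- "Product" of a structure: flatten to the list of contents, then
    -- fold that list with the binary operator and initial value.
    genericFold :: Foldable h => (b -> b -> b) -> b -> h b -> b
    genericFold f b = foldr f b . toList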

Summing a collection

The other ingredient we need is an {\mathsf{M}}-algebra {(\beta,k)}. We already decided last time to

stick to reductions, that is, {k}s of the form {\oplus/} for an associative binary operator {{\oplus} :: \beta \times \beta \rightarrow \beta}; then we also have distribution over choice: {\oplus / (x \mathbin{\underline{\smash{\mathit{mplus}}}} y) = (\oplus/x) \oplus (\oplus/y)}. Note also that we prohibited empty collections in {\mathsf{M}}, so we do not need a unit for {\oplus}.

On account of {\oplus/} being an algebra for the collection monad {\mathsf{M}}, we also get a singleton rule {\oplus/ \cdot \mathit{return} = \mathit{id}}.

Reduction to distributivity for lists

One of the take-home messages in the Effective Reasoning about Effectful Traversals paper is that it helps to reduce a traversal problem for datatypes in general to a more specific one about lists, exploiting the “naturality in contents” property of traversability. We’ll use that tactic for the distributivity property in the datatype-generic version of Horner’s Rule.

In this diagram, the perimeter is the commuting diagram given at the start of this post—the diagram we have to justify. Face (1) is the definition of {\delta_2} in terms of {\delta}. Faces (2) and (3) are the expansion of {f} as generic folding of an {\mathsf{F}}-structure. Face (4) follows from {\oplus/} being an {\mathsf{M}}-algebra, and hence being a left-inverse of {\mathit{return}}. Face (5) is an instance of the naturality property of {\mathit{contents}_{\mathsf{F}} : \mathsf{F}\triangle \mathbin{\stackrel{.}{\to}} \mathsf{List}}. Face (6) is the property that {\delta} respects the contents-preserving transformation {\mathit{contents}_{\mathsf{F}}}. Therefore, the whole diagram commutes if Face (7) does—so let’s look at Face (7)!

Distributivity for lists

Here’s Face (7) again:

Demonstrating that this diagram commutes is not too difficult, because both sides turn out to be list folds.

Around the left and bottom edges, we have a fold {\mathit{foldr}\,(\otimes)\,b} after a map {\mathsf{List}\,(\oplus/)}, which automatically fuses to {\mathit{foldr}\,(\odot)\,b}, where {\odot} is defined by

\displaystyle  x \odot a = (\oplus/x) \otimes a

or, pointlessly,

\displaystyle  (\odot) = (\otimes) \cdot (\oplus/) \times \mathit{id}

Around the top and right edges we have the composition {\oplus/ \cdot \mathsf{M}(\mathit{foldr}\,(\otimes)\,b) \cdot \delta_{\mathsf{List}}}. If we can write {\delta_{\mathsf{List}}} as an instance of {\mathit{foldr}}, we can then use the fusion law for {\mathit{foldr}}

\displaystyle  h \cdot \mathit{foldr}\,f\,e = \mathit{foldr}\,f'\,e' \;\Leftarrow\; h\,e=e' \land h \cdot f = f' \cdot \mathit{id}\times h

to prove that this composition equals {\mathit{foldr}\,(\odot)\,b}.

In fact, there are various equivalent ways of writing {\delta_{\mathsf{List}}} as an instance of {\mathit{foldr}}. The definition given by Conor McBride and Ross Paterson in their original paper on idioms looked like the identity function, but with added idiomness:

\displaystyle  \begin{array}{lcl} \delta_{\mathsf{List}}\,[\,] &=& \mathit{pure}\,[\,] \\ \delta_{\mathsf{List}}\,(\mathit{mb} : \mathit{mbs}) &=& \mathit{pure}\,(:) \circledast \mathit{mb} \circledast \delta_{\mathsf{List}}\,\mathit{mbs} \end{array}

In the special case that the idiom is a monad, it can be written in terms of {\mathit{liftM}_0} (aka {\mathit{return}}) and {\mathit{liftM}_2}:

\displaystyle  \begin{array}{lcl} \delta_{\mathsf{List}}\,[\,] &=& \mathit{liftM}_0\,[\,] \\ \delta_{\mathsf{List}}\,(\mathit{mb} : \mathit{mbs}) &=& \mathit{liftM}_2\,(:)\,\mathit{mb}\,(\delta_{\mathsf{List}}\,\mathit{mbs}) \end{array}

But we’ll use a third definition:

\displaystyle  \begin{array}{lcl} \delta_{\mathsf{List}}\,[\,] &=& \mathit{return}\,[\,] \\ \delta_{\mathsf{List}}\,(\mathit{mb} : \mathit{mbs}) &=& \mathsf{M}(:)\,(\mathit{cp}\,(\mathit{mb}, \delta_{\mathsf{List}}\,\mathit{mbs})) \end{array}

where

\displaystyle  \begin{array}{lcl} \mathit{cp} &::& \mathsf{M}\,\alpha \times \mathsf{M}\,\beta \rightarrow \mathsf{M}(\alpha\times\beta) \\ \mathit{cp}\,(x,y) &=& \mathbf{do}\,\{\,a \leftarrow x \mathbin{;} b \leftarrow y \mathbin{;} \mathit{return}\,(a,b) \} \end{array}

That is,

\displaystyle  \delta_{\mathsf{List}} = \mathit{foldr}\,(\mathsf{M}(:)\cdot\mathit{cp})\,(\mathit{return}\,[\,])
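Transcribed into Haskell, this third definition reads as follows (a sketch; the names {\mathit{cp}} and {\mathit{deltaList}} are local, and {\mathit{deltaList}} is of course just the standard library’s {\mathit{sequence}}):

    -- Pairwise "cartesian product" of two monadic computations.
    cp :: Monad m => (m a, m b) -> m (a, b)
    cp (x, y) = do { a <- x; b <- y; return (a, b) }

    -- deltaList = foldr (M(:) . cp) (return []), written out.
    deltaList :: Monad m => [m b] -> m [b]
    deltaList = foldr (\mb mbs -> fmap (uncurry (:)) (cp (mb, mbs))) (return [])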

Now, for the base case we have

\displaystyle  \oplus/\,(\mathsf{M}(\mathit{foldr}\,(\otimes)\,b)\,(\mathit{return}\,[\,])) = \oplus/\,(\mathit{return}\,(\mathit{foldr}\,(\otimes)\,b\,[\,])) = \oplus/\,(\mathit{return}\,b) = b

as required. For the inductive step, we have:

\displaystyle  \begin{array}{ll} & \oplus/ \cdot \mathsf{M}(\mathit{foldr}\,(\otimes)\,b) \cdot \mathsf{M}(:) \cdot \mathit{cp} \\ = & \qquad \{ \mbox{functors} \} \\ & \oplus/ \cdot \mathsf{M}(\mathit{foldr}\,(\otimes)\,b \cdot (:)) \cdot \mathit{cp} \\ = & \qquad \{ \mbox{evaluation for~} \mathit{foldr} \} \\ & \oplus/ \cdot \mathsf{M}((\otimes) \cdot \mathit{id}\times\mathit{foldr}\,(\otimes)\,b) \cdot \mathit{cp} \\ = & \qquad \{ \mbox{functors; naturality of~} \mathit{cp} \} \\ & \oplus/ \cdot \mathsf{M}(\otimes) \cdot \mathit{cp} \cdot \mathsf{M}\mathit{id}\times\mathsf{M}(\mathit{foldr}\,(\otimes)\,b) \\ = & \qquad \{ \mbox{distributivity for~} \mathit{cp} \mbox{: see below} \} \\ & (\otimes) \cdot (\oplus/)\times(\oplus/) \cdot \mathsf{M}\mathit{id}\times\mathsf{M}(\mathit{foldr}\,(\otimes)\,b) \\ = & \qquad \{ \mbox{functors} \} \\ & (\otimes) \cdot (\oplus/)\times\mathit{id} \cdot \mathit{id}\times(\oplus/\cdot\mathsf{M}(\mathit{foldr}\,(\otimes)\,b)) \end{array}

which completes the fusion proof, modulo the wish about distributivity for {\mathit{cp}}:

\displaystyle  \oplus/ \cdot \mathsf{M}(\otimes) \cdot \mathit{cp} = (\otimes) \cdot (\oplus/)\times(\oplus/)

Distributivity for cartesian product

As for that wish about distributivity for {\mathit{cp}}:

\displaystyle  \begin{array}{ll} & \oplus/ \mathbin{\hbox{\footnotesize\$}} \mathsf{M}(\otimes) \mathbin{\hbox{\footnotesize\$}} \mathit{cp}\,(x,y) \\ = & \qquad \{ \mbox{definition of~} \mathit{cp} \} \\ & \oplus/ \mathbin{\hbox{\footnotesize\$}} \mathsf{M}(\otimes) \mathbin{\hbox{\footnotesize\$}} \mathbf{do}\,\{\,a \leftarrow x \mathbin{;} b \leftarrow y \mathbin{;} \mathit{return}\,(a,b) \,\} \\ = & \qquad \{ \mbox{map over~} \mathbf{do} \} \\ & \oplus/ \mathbin{\hbox{\footnotesize\$}} \mathbf{do}\,\{\,a \leftarrow x \mathbin{;} b \leftarrow y \mathbin{;} \mathit{return}\,(a \otimes b) \,\} \\ = & \qquad \{ \mbox{expanding~} \mathbf{do} \} \\ & \oplus/ \mathbin{\hbox{\footnotesize\$}} \mathit{join} \mathbin{\hbox{\footnotesize\$}} \mathsf{M}\,(\lambda a \mathbin{.} \mathsf{M}\,(a\otimes)\,y)\,x \\ = & \qquad \{ \oplus/ \mbox{~is an~} \mathsf{M} \mbox{-algebra} \} \\ & \oplus/ \mathbin{\hbox{\footnotesize\$}} \mathsf{M}(\oplus/) \mathbin{\hbox{\footnotesize\$}} \mathsf{M}\,(\lambda a \mathbin{.} \mathsf{M}\,(a\otimes)\,y)\,x \\ = & \qquad \{ \mbox{functors} \} \\ & \oplus/ \mathbin{\hbox{\footnotesize\$}} \mathsf{M}\,(\lambda a \mathbin{.} \oplus/(\mathsf{M}\,(a\otimes)\,y))\,x \\ = & \qquad \{ \mbox{distributivity for collections: see below} \} \\ & \oplus/ \mathbin{\hbox{\footnotesize\$}} \mathsf{M}\,(\lambda a \mathbin{.} a \otimes (\oplus/\,y))\,x \\ = & \qquad \{ \mbox{sectioning} \} \\ & \oplus/ \mathbin{\hbox{\footnotesize\$}} \mathsf{M}\,(\otimes (\oplus/\,y))\,x \\ = & \qquad \{ \mbox{distributivity for collections again} \} \\ & (\otimes (\oplus/\,y))\,(\oplus/\,x) \\ = & \qquad \{ \mbox{sectioning} \} \\ & (\oplus/\,x) \otimes (\oplus/\,y) \\ = & \qquad \{ \mbox{eta-expansion} \} \\ & (\otimes) \mathbin{\hbox{\footnotesize\$}} (\oplus/ \times \oplus/) \mathbin{\hbox{\footnotesize\$}} (x,y) \\ \end{array}

which discharges the proof obligation about distributivity for cartesian product, but again modulo two symmetric wishes about distributivity for collections:

\displaystyle  \begin{array}{lcl} \oplus/ \cdot \mathsf{M}(a\otimes) &=& (a\otimes) \cdot \oplus/ \\ \oplus/ \cdot \mathsf{M}(\otimes b) &=& (\otimes b) \cdot \oplus/ \\ \end{array}

Distributivity for collections

Finally, the proof obligations about distributivity for collections are easily discharged, by induction over the size of the (finite!) collection, provided that the binary operator {\otimes} distributes over {\oplus} in the familiar sense. The base case is for a singleton collection, ie in the image of {\mathit{return}} (because we disallowed empty collections); this case follows from the fact that {\oplus/} is an {\mathsf{M}}-algebra. The inductive step is for a collection of the form {u \mathbin{\underline{\smash{\mathit{mplus}}}} v} with {u,v} both strictly smaller than the whole (so, if the monad is idempotent, {u} and {v} should be disjoint, or at least not nested); this requires the distribution of the algebra over choice {\oplus / (u \mathbin{\underline{\smash{\mathit{mplus}}}} v) = (\oplus/u) \oplus (\oplus/v)}, together with the familiar distribution of {\otimes} over {\oplus}.
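As a concrete sanity check, here are the two wishes rendered as QuickCheck-style properties, with non-empty lists standing in for the collection monad {\mathsf{M}}, {\max} for {\oplus}, and {(+)} for {\otimes} (the property names are mine, purely for illustration):

    import Test.QuickCheck (NonEmptyList (..), Property, (===))

    -- reduce is the reduction (max/) for the list monad.
    reduce :: [Integer] -> Integer
    reduce = foldr1 max

    prop_distLeft :: Integer -> NonEmptyList Integer -> Property
    prop_distLeft a (NonEmpty x) = reduce (map (a +) x) === a + reduce x

    prop_distRight :: Integer -> NonEmptyList Integer -> Property
    prop_distRight b (NonEmpty x) = reduce (map (+ b) x) === reduce x + b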

Summary

So, the datatype-generic distributivity for {\mathsf{F}}-structures of collections that we used for the Maximum Segment Sum problem reduced to distributivity for lists of collections, which reduced to distributivity for the cartesian product of collections, which in turn reduced to distributivity for pairs. That’s a much deeper hierarchy than I was expecting; can it be streamlined?


Horner’s Rule

This post is about my all-time favourite calculation, of a linear-time algorithm for the maximum segment sum problem, based on Horner’s Rule. The problem was popularized in Jon Bentley’s Programming Pearls series in CACM (and in the subsequent book), but I learnt about it from Richard Bird’s lecture notes on The Theory of Lists and Constructive Functional Programming and his paper Algebraic Identities for Program Calculation, which he was working on around the time I started my DPhil. It seems like I’m not the only one for whom the problem is a favourite, because it has since become a bit of a cliché among program calculators; but that won’t stop me writing about it again.

Maximum segment sum

The original problem is as follows. Given a list of numbers (say, a possibly empty list of integers), find the largest of the sums of the contiguous segments of that list. In Haskell, this specification could be written like so:

\displaystyle  \begin{array}{lcl} \mathit{mss} &=& \mathit{maximum} \cdot \mathit{map}\,\mathit{sum} \cdot \mathit{segs} \end{array}

where {\mathit{segs}} computes the contiguous segments of a list:

\displaystyle  \begin{array}{lcl} \mathit{segs} &=& \mathit{concat} \cdot \mathit{map}\,\mathit{inits} \cdot \mathit{tails} \\ \mathit{tails} &=& \mathit{foldr}\,f\,[[\,]] \quad\mathbf{where}\; f\,\mathit{x}\,\mathit{xss} = (\mathit{x}:\mathit{head}\,\mathit{xss}):\mathit{xss} \\ \mathit{inits} &=& \mathit{foldr}\,g\,[[\,]] \quad\mathbf{where}\; g\,\mathit{x}\,\mathit{xss} = [\,] : \mathit{map}\,(\mathit{x}:)\,\mathit{xss} \end{array}

and {\mathit{sum}} computes the sum of a list, and {\mathit{maximum}} the maximum of a nonempty list:

\displaystyle  \begin{array}{lcl} \mathit{sum} &=& \mathit{foldr}\,(+)\,0 \\ \mathit{maximum} &=& \mathit{foldr}_1\,\max \end{array}

This specification is executable, but takes cubic time; the problem is to do better.
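For reference, the specification transcribes directly into a runnable (if cubic-time) Haskell program; here the Prelude versions of {\mathit{sum}} and {\mathit{maximum}} are hidden so that the names match the text:

    import Prelude hiding (maximum, sum)

    mss :: [Integer] -> Integer
    mss = maximum . map sum . segs

    segs :: [a] -> [[a]]
    segs = concat . map inits . tails

    tails :: [a] -> [[a]]
    tails = foldr f [[]] where f x xss = (x : head xss) : xss

    inits :: [a] -> [[a]]
    inits = foldr g [[]] where g x xss = [] : map (x :) xss

    sum :: [Integer] -> Integer
    sum = foldr (+) 0

    maximum :: [Integer] -> Integer
    maximum = foldr1 max

    -- e.g. mss [31,-41,59,26,-53,58,97,-93,-23,84] == 187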

We can get quite a long way just with standard properties of {\mathit{map}}, {\mathit{inits}}, etc:

\displaystyle  \begin{array}{ll} & \mathit{mss} \\ = & \qquad \{ \mbox{definition of~} \mathit{mss} \} \\ & \mathit{maximum} \cdot \mathit{map}\,\mathit{sum} \cdot \mathit{segs} \\ = & \qquad \{ \mbox{definition of~} \mathit{segs} \} \\ & \mathit{maximum} \cdot \mathit{map}\,\mathit{sum} \cdot \mathit{concat} \cdot \mathit{map}\,\mathit{inits} \cdot \mathit{tails} \\ = & \qquad \{ \mbox{naturality of~} \mathit{concat} \} \\ & \mathit{maximum} \cdot \mathit{concat} \cdot \mathit{map}\,(\mathit{map}\,\mathit{sum}) \cdot \mathit{map}\,\mathit{inits} \cdot \mathit{tails} \\ = & \qquad \{ \mathit{maximum} \cdot \mathit{concat} = \mathit{maximum} \cdot \mathit{map}\,\mathit{maximum} \} \\ & \mathit{maximum} \cdot \mathit{map}\,\mathit{maximum} \cdot \mathit{map}\,(\mathit{map}\,\mathit{sum}) \cdot \mathit{map}\,\mathit{inits} \cdot \mathit{tails} \\ = & \qquad \{ \mbox{functors} \} \\ & \mathit{maximum} \cdot \mathit{map}\,(\mathit{maximum} \cdot \mathit{map}\,\mathit{sum} \cdot \mathit{inits}) \cdot \mathit{tails} \end{array}

For the final step, if we can write {\mathit{maximum} \cdot \mathit{map}\,\mathit{sum} \cdot \mathit{inits}} in the form {\mathit{foldr}\,h\,e}, then the {\mathit{map}} of this can be fused with the {\mathit{tails}} to yield {\mathit{scanr}\,h\,e}; this observation is known as the Scan Lemma. Moreover, if {h} takes constant time, then this gives a linear-time algorithm for {\mathit{mss}}.
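For lists, the Scan Lemma is easily stated as a QuickCheck-style property (a sketch, instantiating {h} to {(+)} and {e} to {0}; the property name is mine):

    import Data.List (tails)
    import Test.QuickCheck (Property, (===))

    -- Mapping a fold over the tails is the same as a single scanr.
    prop_scanLemma :: [Integer] -> Property
    prop_scanLemma xs = map (foldr (+) 0) (tails xs) === scanr (+) 0 xs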

The crucial observation is based on Horner’s Rule for evaluation of polynomials, which is the first important thing you learn in numerical computing—I was literally taught it in secondary school, in my sixth-year classes in mathematics. Here is its familiar form:

\displaystyle  \sum_{i=0}^{n-1} a_i x^i = a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1} x^{n-1} = a_0 + x(a_1 + x(a_2 + \cdots + x\,a_{n-1}))

but the essence of the rule is about sums of products:

\displaystyle  \sum_{i=0}^{n} \prod_{j=0}^{i-1} u_j = 1 + u_0 + u_0u_1 + \cdots + u_0u_1\ldots u_{n-1} = 1 + u_0(1 + u_1(1 + \cdots + u_{n-1}))

Expressed in Haskell, this is captured by the equation

\displaystyle  \mathit{sum} \cdot \mathit{map}\,\mathit{product} \cdot \mathit{inits} = \mathit{foldr}\,(\oplus)\,e \quad \mathbf{where}\; e = 1 \mathbin{;} u \oplus z = e + u \times z

(where {\mathit{product} = \mathit{foldr}\,(\times)\,1} computes the product of a list of integers).
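This equation too can be checked mechanically on lists of integers (a QuickCheck-style sketch; the property name is mine):

    import Data.List (inits)
    import Test.QuickCheck (Property, (===))

    -- Horner's Rule: sums of products of initial segments, as a foldr.
    prop_horner :: [Integer] -> Property
    prop_horner us =
      sum (map product (inits us)) === foldr (\u z -> 1 + u * z) 1 us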

But Horner’s Rule is not restricted to sums and products; the essential properties are that addition and multiplication are associative, that multiplication has a unit, and that multiplication distributes over addition. This is the algebraic structure of a semiring (but without needing commutativity and a unit of addition, or that that unit is a zero of multiplication). In particular, the so-called tropical semiring on the integers, in which “addition” is binary {\max} and “multiplication” is integer addition, satisfies the requirements. So for the maximum segment sum problem, we get

\displaystyle  \mathit{maximum} \cdot \mathit{map}\,\mathit{sum} \cdot \mathit{inits} = \mathit{foldr}\,(\oplus)\,e \quad \mathbf{where}\; e = 0 \mathbin{;} u \oplus z = e \max (u + z)

Moreover, {\oplus} takes constant time, so this gives a linear-time algorithm for {\mathit{mss}}.
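Putting the pieces together, a sketch of the resulting linear-time program (using the Scan Lemma to fuse the {\mathit{foldr}} with {\mathit{tails}}; the name {\mathit{mssFast}} is mine):

    -- Horner's Rule in the tropical semiring, fused with tails via the
    -- Scan Lemma: a single scanr followed by a maximum.
    mssFast :: [Integer] -> Integer
    mssFast = maximum . scanr oplus 0
      where u `oplus` z = 0 `max` (u + z)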

Tail segments, datatype-generically

About a decade after the initial “theory of lists” work on the maximum segment sum problem, Richard Bird (with Oege de Moor and Paul Hoogendijk) came up with a datatype-generic version of the problem in the paper Generic functional programming with types and relations. It’s clear what “maximum” and “sum” mean generically, but not so clear what “segment” means for nonlinear datatypes; the point of their paper is basically to resolve that issue.

Recalling the definition of {\mathit{segs}} in terms of {\mathit{inits}} and {\mathit{tails}}, we see that it would suffice to develop datatype-generic notions of “initial segment” and “tail segment”. One fruitful perspective is given in Bird & co’s paper: a “tail segment” of a cons list is just a subterm of that list, and an “initial segment” is the list but with some tail (that is, some subterm) replaced with the empty structure.

So, representing a generic “tail” of a data structure is easy: it’s a data structure of the same type, and a subterm of the term denoting the original structure. A datatype-generic definition of {\mathit{tails}} is a little trickier, though. For lists, you can see it as follows: every node of the original list is labelled with the subterm of the original list rooted at that node. I find this a helpful observation, because it explains why the {\mathit{tails}} of a list is one element longer than the list itself: a list with {n} elements has {n+1} nodes ({n} conses and a nil), and each of those nodes gets labelled with one of the {n+1} subterms of the list. Indeed, {\mathit{tails}} ought morally to take a possibly empty list and return a non-empty list of possibly empty lists—there are two different datatypes involved. Similarly, if one wants the “tails” of a data structure of a type in which some nodes have no labels (such as leaf-labelled trees, or indeed such as the “nil” constructor of lists), one needs a variant of the datatype providing labels at those positions. Also, for a data structure in which some nodes have multiple labels, or in which there are different types of labels, one needs a variant for which every node has precisely one label.

Bird & co call this the labelled variant of the original datatype; if the original is a polymorphic datatype {\mathsf{T}\,\alpha = \mu(\mathsf{F}\,\alpha)} for some binary shape functor {\mathsf{F}}, then the labelled variant is {\mathsf{L}\,\alpha = \mu(\mathsf{G}\,\alpha)} where {\mathsf{G}\,\alpha\,\beta = \alpha \times \mathsf{F}\,1\,\beta}—whatever labels {\mathsf{F}} may or may not have specified are ignored, and precisely one label per node is provided. Given this insight, it is straightforward to define a datatype-generic variant {\mathit{subterms}} of the {\mathit{tails}} function:

\displaystyle  \mathit{subterms}_{\mathsf{F}} = \mathit{fold}_{\mathsf{F}}(\mathit{in}_{\mathsf{G}} \cdot \mathit{fork}(\mathit{in}_{\mathsf{F}} \cdot \mathsf{F}\,\mathit{id}\,\mathit{root}, \mathsf{F}\,!\,\mathit{id})) :: \mathsf{T}\,\alpha \rightarrow \mathsf{L}(\mathsf{T}\,\alpha)

where {\mathit{root} = \mathit{fst} \cdot \mathit{in}_{\mathsf{G}}^{-1} = \mathit{fold}_{\mathsf{G}}\,\mathit{fst} :: \mathsf{L}\,\alpha \rightarrow \alpha} returns the root label of a labelled data structure, and {!_{\alpha} :: \alpha \rightarrow 1} is the unique arrow to the unit type. (Informally, having computed the tree of subterms for each child of a node, we make the tree of subterms for this node by assembling all the child trees with the label for this node; the label should be the whole structure rooted at this node, which can be reconstructed from the roots of the child trees.) What’s more, there’s a datatype-generic scan lemma too:

\displaystyle  \begin{array}{lcl} \mathit{scan}_{\mathsf{F}} &::& (\mathsf{F}\,\alpha\,\beta \rightarrow \beta) \rightarrow \mathsf{T}\,\alpha \rightarrow \mathsf{L}\,\beta \\ \mathit{scan}_{\mathsf{F}}\,\phi &=& \mathsf{L}\,(\mathit{fold}_{\mathsf{F}}\,\phi) \cdot \mathit{subterms}_{\mathsf{F}} \\ &=& \mathit{fold}_{\mathsf{F}}(\mathit{in}_{\mathsf{G}} \cdot \mathit{fork}(\phi \cdot \mathsf{F}\,\mathit{id}\,\mathit{root}, \mathsf{F}\,!\,\mathit{id})) \end{array}

(Again, the label for each node can be constructed from the root labels of each of the child trees.) In fact, {\mathit{subterms}} and {\mathit{scan}} are paramorphisms, and can also be nicely written coinductively as well as inductively. I’ll return to this in a future post.

Initial segments, datatype-generically

What about a datatype-generic “initial segment”? As suggested above, that’s obtained from the original data structure by replacing some subterms with the empty structure. Here I think Bird & co sell themselves a little short, because they insist that the datatype {\mathsf{T}} supports empty structures, which is to say, that {\mathsf{F}} is of the form {\mathsf{F}\,\alpha\,\beta = 1 + \mathsf{F}'\,\alpha\,\beta} for some {\mathsf{F}'}. This isn’t necessary: for an arbitrary {\mathsf{F}}, we can easily manufacture the appropriate datatype {\mathsf{U}} of “data structures in which some subterms may be replaced by empty”, by defining {\mathsf{H}\,\alpha\,\beta = 1 + \mathsf{F}\,\alpha\,\beta} and {\mathsf{U}\,\alpha = \mu(\mathsf{H}\,\alpha)}.

As with {\mathit{subterms}}, the datatype-generic version of {\mathit{inits}} is a bit trickier—and this time, the special case of lists is misleading. You might think that, because a list has just as many initial segments as it does tail segments, the labelled variant ought to suffice just as well here too. But this doesn’t work for non-linear data structures such as trees—in general, there are many more “initial” segments than “tail” segments (because one can make independent choices about replacing subterms with the empty structure in each child), and they don’t align themselves conveniently with the nodes of the original structure.

The approach I prefer here is just to use an unstructured collection type to hold the “initial segments”; that is, a monad. This could be the monad of finite lists, or of finite sets, or of finite bags—we will defer until later the discussion about precisely which, and write simply {\mathsf{M}}. We require only that it provide a {\mathit{MonadPlus}}-like interface, in the sense of an operator {\mathit{mplus} :: \mathsf{M}\,\alpha \times \mathsf{M}\,\alpha \rightarrow \mathsf{M}\,\alpha}; however, for reasons that will become clear, we will expect that it does not provide an {\mathit{mzero}} operator yielding empty collections.

Now we can think of the datatype-generic version of {\mathit{inits}} as nondeterministically pruning a data structure by arbitrarily replacing some subterms with the empty structure; or equivalently, as generating the collection of all such prunings.

\displaystyle  \mathit{prune} = \mathit{fold}_{\mathsf{F}}(\mathsf{M}\,\mathit{in}_{\mathsf{H}} \cdot \mathit{opt}\,\mathit{Nothing} \cdot \mathsf{M}\,\mathit{Just} \cdot \delta_2) :: \mu(\mathsf{F}\,\alpha) \rightarrow \mathsf{M}(\mu(\mathsf{H}\,\alpha))

Here, {\mathit{opt}} supplies a new alternative for a nondeterministic computation:

\displaystyle  \mathit{opt}\,a\,\mathit{mx} = \mathit{return}\,a \mathbin{\underline{\smash{\mathit{mplus}}}} \mathit{mx}

and {\delta_2 :: (\mathsf{F}\,\alpha)\mathsf{M} \mathbin{\stackrel{.}{\to}} \mathsf{M}(\mathsf{F}\alpha)} distributes the shape functor {\mathsf{F}} over the monad {\mathsf{M}} (which can be defined for all {\mathit{Traversable}} functors {\mathsf{F}\,\alpha}). Informally, once you have computed all possible ways of pruning each of the children of a node, a pruning of the node itself is formed either as {\mathit{Just}} some node assembled from arbitrarily pruned children, or {\mathit{Nothing}} for the empty structure.
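In Haskell terms, {\mathit{opt}} might be sketched using the standard {\mathit{MonadPlus}} class (even though, as just noted, our {\mathsf{M}} deliberately lacks {\mathit{mzero}}, which {\mathit{MonadPlus}} does provide):

    import Control.Monad (MonadPlus, mplus)

    -- opt supplies a new alternative to a nondeterministic computation.
    opt :: MonadPlus m => a -> m a -> m a
    opt a mx = return a `mplus` mx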

Horner’s Rule, datatype-generically

As we’ve seen, the essential property behind Horner’s Rule is one of distributivity. In the datatype-generic case, we will model this as follows. We are given an {(\mathsf{F}\,\alpha)}-algebra {(\beta,f)}, and an {\mathsf{M}}-algebra {(\beta,k)}; you might think of these as “datatype-generic product” and “collection sum”, respectively. Then there are two different methods of computing a {\beta} result from an {\mathsf{F}\,\alpha\,(\mathsf{M}\,\beta)} structure: we can either distribute the {\mathsf{F}\,\alpha} structure over the collection(s) of {\beta}s, compute the “product” {f} of each structure, and then compute the “sum” {k} of the resulting products; or we can “sum” each collection, then compute the “product” of the resulting structure. Distributivity of “product” over “sum” is the property that these two different methods agree, as illustrated in the following diagram.

For example, with {f :: \mathsf{F}\,{\mathbb Z}\,{\mathbb Z} \rightarrow {\mathbb Z}} adding all the integers in an {\mathsf{F}}-structure, and {k :: \mathsf{M}\,{\mathbb Z} \rightarrow {\mathbb Z}} finding the maximum of a (non-empty) collection, the diagram commutes. (To match up with the rest of the story, we have presented distributivity in terms of a bifunctor {\mathsf{F}}, although the first parameter {\alpha} plays no role. We could just as well have used a unary functor, dropping the {\alpha}, and changing the distributor to {\delta :: \mathsf{F}\mathsf{M} \mathbin{\stackrel{.}{\to}} \mathsf{M}\mathsf{F}}.)

Note that {(\beta,k)} is required to be an algebra for the monad {\mathsf{M}}. This means that it is not only an algebra for {\mathsf{M}} as a functor (namely, of type {\mathsf{M}\,\beta \rightarrow \beta}), but also it should respect the extra structure of the monad: {k \cdot \mathit{return} = \mathit{id}} and {k \cdot \mathit{join} = k \cdot \mathsf{M}\,k}. For the special case of monads for associative collections (such as lists, bags, and sets), and in homage to the old Squiggol papers, we will stick to reductions, that is, {k}s of the form {\oplus/} for an associative binary operator {{\oplus} :: \beta \times \beta \rightarrow \beta}; then we also have distribution over choice: {\oplus / (x \mathbin{\underline{\smash{\mathit{mplus}}}} y) = (\oplus/x) \oplus (\oplus/y)}. Note also that we prohibited empty collections in {\mathsf{M}}, so we do not need a unit for {\oplus}.

Recall that we modelled an “initial segment” of a structure of type {\mu(\mathsf{F}\,\alpha)} as being of type {\mu(\mathsf{H}\,\alpha)}, where {\mathsf{H}\,\alpha\,\beta = 1 + \mathsf{F}\,\alpha\,\beta}. We need to generalize “product” to work on this extended structure, which is to say, we need to specify the value {b} of the “product” of the empty structure too. Then we let {g = \mathit{maybe}\,b\,f :: \mathsf{H}\,\alpha\,\beta \rightarrow \beta}, so that {\mathit{fold}_{\mathsf{H}}(g) :: \mu(\mathsf{H}\,\alpha) \rightarrow \beta}.

The datatype-generic version of Horner’s Rule is then about computing the “sum” of the “products” of each of the “initial segments” of a data structure:

\displaystyle  {\oplus/} \cdot \mathsf{M}(\mathit{fold}_{\mathsf{H}}(g)) \cdot \mathit{prune}

We will use fold fusion to show that this can be computed as a single fold, given the necessary distributivity property.

\displaystyle  \begin{array}{ll} & \mathord{\oplus/} \cdot \mathsf{M}(\mathit{fold}_{\mathsf{H}}(g)) \cdot \mathit{prune} \cdot \mathit{in}_{\mathsf{F}} \\ = & \qquad \{ \mbox{evaluation for~} \mathit{prune} \} \\ & \mathord{\oplus/} \cdot \mathsf{M}(\mathit{fold}_{\mathsf{H}}(g)) \cdot \mathsf{M}\,\mathit{in}_{\mathsf{H}} \cdot \mathit{opt}\,\mathit{Nothing} \cdot \mathsf{M}\,\mathit{Just} \cdot \delta_2 \cdot \mathsf{F}\,\mathit{id}\,\mathit{prune} \\ = & \qquad \{ \mbox{functors; evaluation for~} \mathit{fold}_{\mathsf{H}} \} \\ & \mathord{\oplus/} \cdot \mathsf{M}(g \cdot \mathsf{H}\,\mathit{id}\,(\mathit{fold}_{\mathsf{H}}(g))) \cdot \mathit{opt}\,\mathit{Nothing} \cdot \mathsf{M}\,\mathit{Just} \cdot \delta_2 \cdot \mathsf{F}\,\mathit{id}\,\mathit{prune} \\ = & \qquad \{ \mathsf{M}\,h \cdot \mathit{opt}\,a = \mathit{opt}\,(h\,a) \cdot \mathsf{M}\,h \} \\ & \mathord{\oplus/} \cdot \mathit{opt}\,(g\,\mathit{Nothing}) \cdot \mathsf{M}(g \cdot \mathsf{H}\,\mathit{id}\,(\mathit{fold}_{\mathsf{H}}(g))) \cdot \mathsf{M}\,\mathit{Just} \cdot \delta_2 \cdot \mathsf{F}\,\mathit{id}\,\mathit{prune} \\ = & \qquad \{ \mbox{functors;~} \mathit{Just} :: \mathsf{F}\,\alpha \mathbin{\stackrel{.}{\to}} \mathsf{H}\,\alpha \} \\ & \mathord{\oplus/} \cdot \mathit{opt}\,(g\,\mathit{Nothing}) \cdot \mathsf{M}\,g \cdot \mathsf{M}(\mathit{Just} \cdot \mathsf{F}\,\mathit{id}\,(\mathit{fold}_{\mathsf{H}}(g))) \cdot \delta_2 \cdot \mathsf{F}\,\mathit{id}\,\mathit{prune} \\ = & \qquad \{ \mbox{functors;~} \delta_2 :: (\mathsf{F}\alpha)\mathsf{M} \mathbin{\stackrel{.}{\to}} \mathsf{M}(\mathsf{F}\alpha) \} \\ & \mathord{\oplus/} \cdot \mathit{opt}\,(g\,\mathit{Nothing}) \cdot \mathsf{M}(g \cdot \mathit{Just}) \cdot \delta_2 \cdot \mathsf{F}\,\mathit{id}\,(\mathsf{M}(\mathit{fold}_{\mathsf{H}}(g))) \cdot \mathsf{F}\,\mathit{id}\,\mathit{prune} \end{array}

(Sadly, I have to break this calculation in two to get it through WordPress’s somewhat fragile LaTeX processor… where were we? Ah, yes:)

\displaystyle  \begin{array}{ll} & \mathord{\oplus/} \cdot \mathit{opt}\,(g\,\mathit{Nothing}) \cdot \mathsf{M}(g \cdot \mathit{Just}) \cdot \delta_2 \cdot \mathsf{F}\,\mathit{id}\,(\mathsf{M}(\mathit{fold}_{\mathsf{H}}(g))) \cdot \mathsf{F}\,\mathit{id}\,\mathit{prune} \\ = & \qquad \{ g = \mathit{maybe}\,b\,f \} \\ & \mathord{\oplus/} \cdot \mathit{opt}\,b \cdot \mathsf{M}\,f \cdot \delta_2 \cdot \mathsf{F}\,\mathit{id}\,(\mathsf{M}(\mathit{fold}_{\mathsf{H}}(g))) \cdot \mathsf{F}\,\mathit{id}\,\mathit{prune} \\ = & \qquad \{ {\oplus/} \cdot \mathit{opt}\,b = (b\oplus) \cdot {\oplus/} \} \\ & (b\oplus) \cdot {\oplus/} \cdot \mathsf{M}\,f \cdot \delta_2 \cdot \mathsf{F}\,\mathit{id}\,(\mathsf{M}(\mathit{fold}_{\mathsf{H}}(g))) \cdot \mathsf{F}\,\mathit{id}\,\mathit{prune} \\ = & \qquad \{ \mbox{distributivity:~} {\oplus/} \cdot \mathsf{M}\,f \cdot \delta_2 = f \cdot \mathsf{F}\,\mathit{id}\,(\oplus/) \} \\ & (b\oplus) \cdot f \cdot \mathsf{F}\,\mathit{id}\,(\oplus/) \cdot \mathsf{F}\,\mathit{id}\,(\mathsf{M}(\mathit{fold}_{\mathsf{H}}(g))) \cdot \mathsf{F}\,\mathit{id}\,\mathit{prune} \\ = & \qquad \{ \mbox{functors} \} \\ & (b\oplus) \cdot f \cdot \mathsf{F}\,\mathit{id}\,({\oplus/} \cdot \mathsf{M}(\mathit{fold}_{\mathsf{H}}(g)) \cdot \mathit{prune}) \end{array}

Therefore,

\displaystyle  {\oplus/} \cdot \mathsf{M}(\mathit{fold}_{\mathsf{H}}(\mathit{maybe}\,b\,f)) \cdot \mathit{prune} = \mathit{fold}_{\mathsf{F}}((b\oplus) \cdot f)

(Curiously, it doesn’t seem to matter what value is chosen for {b}.)

Maximum segment sum, datatype-generically

We’re nearly there. We start with the traversable shape bifunctor {\mathsf{F}}, a collection monad {\mathsf{M}}, and a distributive law {\delta_2 :: (\mathsf{F}\,\alpha)\mathsf{M} \mathbin{\stackrel{.}{\to}} \mathsf{M}(\mathsf{F}\alpha)}. We are given an {(\mathsf{F}\,\alpha)}-algebra {(\beta,f)}, an additional element {b :: \beta}, and an {\mathsf{M}}-algebra {(\beta,{\oplus/})}, such that {f} and {\oplus} take constant time and {f} distributes over {\oplus/} in the sense above. Then

\displaystyle  {\oplus/} \cdot \mathsf{M}(\mathit{fold}_{\mathsf{H}}(\mathit{maybe}\,b\,f)) \cdot \mathit{segs}

can be computed in linear time, where

\displaystyle  \mathit{segs} = \mathit{join} \cdot \mathsf{M}\,\mathit{prune} \cdot \mathit{contents}_{\mathsf{L}} \cdot \mathit{subterms} :: \mu(\mathsf{F}\,\alpha) \rightarrow \mathsf{M}(\mu(\mathsf{H}\,\alpha))

and where

\displaystyle  \mathit{contents}_{\mathsf{L}} :: \mathsf{L} \mathbin{\stackrel{.}{\to}} \mathsf{M}

computes the contents of an {\mathsf{L}}-structure (which, like {\delta_2}, can be defined using the traversability of {\mathsf{F}}). Here’s the calculation:

\displaystyle  \begin{array}{ll} & \mathord{\oplus/} \cdot \mathsf{M}(\mathit{fold}_{\mathsf{H}}(\mathit{maybe}\,b\,f)) \cdot \mathit{segs} \\ = & \qquad \{ \mbox{definition of~} \mathit{segs} \} \\ & \mathord{\oplus/} \cdot \mathsf{M}(\mathit{fold}_{\mathsf{H}}(\mathit{maybe}\,b\,f)) \cdot \mathit{join} \cdot \mathsf{M}\,\mathit{prune} \cdot \mathit{contents}_{\mathsf{L}} \cdot \mathit{subterms} \\ = & \qquad \{ \mbox{naturality of~} \mathit{join} :: \mathsf{M}\mathsf{M} \mathbin{\stackrel{.}{\to}} \mathsf{M}\mbox{; functors} \} \\ & \mathord{\oplus/} \cdot \mathit{join} \cdot \mathsf{M}(\mathsf{M}(\mathit{fold}_{\mathsf{H}}(\mathit{maybe}\,b\,f)) \cdot\mathit{prune}) \cdot \mathit{contents}_{\mathsf{L}} \cdot \mathit{subterms} \\ = & \qquad \{ \oplus/ \mbox{~is an~} \mathsf{M}\mbox{-algebra; functors} \} \\ & \mathord{\oplus/} \cdot \mathsf{M}({\oplus/} \cdot \mathsf{M}(\mathit{fold}_{\mathsf{H}}(\mathit{maybe}\,b\,f)) \cdot\mathit{prune}) \cdot \mathit{contents}_{\mathsf{L}} \cdot \mathit{subterms} \\ = & \qquad \{ \mbox{naturality of~} \mathit{contents} :: \mathsf{L} \mathbin{\stackrel{.}{\to}} \mathsf{M} \} \\ & \mathord{\oplus/} \cdot \mathit{contents}_{\mathsf{L}} \cdot \mathsf{L}({\oplus/} \cdot \mathsf{M}(\mathit{fold}_{\mathsf{H}}(\mathit{maybe}\,b\,f)) \cdot\mathit{prune}) \cdot \mathit{subterms} \\ = & \qquad \{ \mbox{Horner's Rule} \} \\ & \mathord{\oplus/} \cdot \mathit{contents}_{\mathsf{L}} \cdot \mathsf{L}(\mathit{fold}_{\mathsf{F}}((b\oplus)\cdot f)) \cdot \mathit{subterms} \\ = & \qquad \{ \mbox{Scan Lemma} \} \\ & \mathord{\oplus/} \cdot \mathit{contents}_{\mathsf{L}} \cdot \mathit{scan}_{\mathsf{F}}((b\oplus)\cdot f) \end{array}

The scan can be computed in linear time, because its body takes constant time; moreover, the “sum” {\oplus/} and {\mathit{contents}} can also be computed in linear time (and what’s more, they can be fused into a single pass).

For example, with {f :: \mathsf{F}\,{\mathbb Z}\,{\mathbb Z} \rightarrow {\mathbb Z}} adding all the integers in an {\mathsf{F}}-structure, {b = 0 :: {\mathbb Z}}, and {{\oplus} :: {\mathbb Z}\times{\mathbb Z} \rightarrow {\mathbb Z}} returning the greater of two integers, we get a datatype-generic version of the linear-time maximum segment sum algorithm.

Monads versus relations

As the title of their paper suggests, Bird & co carried out their development using the relational approach set out in the Algebra of Programming book; for example, their version of {\mathit{prune}} is a relation between data structures and their prunings, rather than being a function that takes a structure and returns the collection of all its prunings. There’s a well-known isomorphism between relations and set-valued functions, so their relational approach looks roughly equivalent to the monadic one I’ve taken.

I’ve known their paper well for over a decade (I made extensive use of the “labelled variant” construction in my own papers on generic downwards accumulations), but I’ve only just noticed that although they discuss the maximum segment sum problem, they don’t discuss problems based on other semirings, such as the obvious one of integers with addition and multiplication—which is, after all, the origin of Horner’s Rule. Why not? It turns out that the relational approach doesn’t work in that case!

There’s a hidden condition in the calculation, which relates back to our earlier comment about which collection monad—finite sets, finite bags, lists, etc—to use. When {\mathsf{M}} is the set monad, distribution over choice ({\oplus / (x \mathbin{\underline{\smash{\mathit{mplus}}}} y) = (\oplus/x) \oplus (\oplus/y)})—and consequently the condition {{\oplus/} \cdot \mathit{opt}\,b = (b\oplus) \cdot {\oplus/}} that we used in proving Horner’s Rule—require {\oplus} to be idempotent, because {\mathit{mplus}} itself is idempotent; but addition is not idempotent. For the same reason, the distributivity property does not hold for addition with the set monad. But everything does work out for the bag monad, for which {\mathit{mplus}} is not idempotent. The bag monad models a flavour of nondeterminism in which multiplicity of results matters—as it does for the sum-of-products instance of the problem, when two copies of the same segment should be treated differently from just one copy. Similarly, if the order of results matters—if, for example, we were looking for the “first” solution—then we would have to use the list monad rather than bags or sets. Seen from a monadic perspective, the relational approach is programming with just one monad, namely the set monad; if that monad doesn’t capture your effects faithfully, you’re stuck.

(On the other hand, there are aspects of the problem that work much better relationally. We have carefully used {\max} only for a linear order, namely the usual ordering of the integers. A partial order is more awkward monadically, because there need not be a unique maximal value. For example, it is not so easy to compute a segment with maximal sum, unless we refine the sum ordering on segments to make it once more a linear order; relationally, this works out perfectly straightforwardly. We can try the same trick of turning the relation “maximal under a partial order” into the collection-valued function “all maxima under a partial order”, but I fear that the equivalent trick on the ordering itself—turning the relation “{<}” into the collection-valued function “all values less than this one”—runs into problems from taking us outside the world of finite nondeterminism.)


Morality and temptation

Inspired by Bob Harper’s recent postings, I too have a confession to make. I know what is morally right; but sometimes the temptation is too great, and my resolve is weak, and I lapse. Fast and loose reasoning may excuse me, but my conscience would be clearer if I could remain pure in the first place.

Initial algebras, final coalgebras

We know and love initial algebras, because of the ease of reasoning with their universal properties. We can tell a simple story about recursive programs, solely in terms of sets and total functions. As we discussed in the previous post, given a functor {\mathsf{F} : \mathbb{S}\mathrm{et} \rightarrow \mathbb{S}\mathrm{et}}, an {\mathsf{F}}-algebra is a pair {(X,f)} consisting of an object {X} and an arrow {f : \mathsf{F}(X) \rightarrow X}. A homomorphism between {\mathsf{F}}-algebras {(X,f)} and {(Y,g)} is an arrow {h : X \rightarrow Y} such that {h \cdot f = g \cdot \mathsf{F}(h)}:

The {\mathsf{F}}-algebra {(X,f)} is initial iff there is a unique such {h} for each {(Y,g)}; for well-behaved functors {\mathsf{F}}, such as the polynomial functors on {\mathbb{S}\mathrm{et}}, an initial algebra always exists. We conventionally write “{(\mu\mathsf{F},\mathit{in})}” for the initial algebra, and “{\mathit{fold}_{\mathsf{F}}(g)}” for the unique homomorphism {h} to another {\mathsf{F}}-algebra {(Y,g)}. (In {\mathbb{S}\mathrm{et}}, initial algebras correspond to datatypes of finite recursive data structures.)

The uniqueness of the solution is captured in the universal property:

\displaystyle  h = \mathit{fold}(g) \Leftrightarrow h \cdot \mathit{in} = g \cdot \mathsf{F}(h)

In words, {h} is this fold iff {h} satisfies the defining equation for the fold.

The universal property is crucial. For one thing, the homomorphism equation is a very convenient style in which to define a function; it’s the datatype-generic abstraction of the familiar pattern for defining functions on lists:

\displaystyle  \begin{array}{lcl} h\,[] &=& e \\ h\,(x:\mathit{xs}) &=& f\,x\,(h\,\mathit{xs}) \end{array}

These two equations implicitly characterizing {h} are much more comprehensible and manipulable than a single equation

\displaystyle  h = \lambda \mathit{xs}\;.\; \textbf{if}\;\mathit{null}\,\mathit{xs}\;\textbf{then}\;e\;\textbf{else}\;f\,(\mathit{head}\,\mathit{xs})\,(h\,(\mathit{tail}\,\mathit{xs}))

explicitly giving a value for {h}. But how do we know that this assortment of two facts about {h} is enough to form a definition? Of course! A system of equations in this form has a unique solution.

Moreover, the very expression of the uniqueness of the solution as an equivalence {h = \ldots \Leftrightarrow \ldots} provides many footholds for reasoning:

  • Read as an implication from left to right, instantiating {h} to {\mathit{fold}(g)} to make the left-hand side trivially true, we get an evaluation rule for folds:

    \displaystyle  \mathit{fold}(g) \cdot \mathit{in} = g \cdot \mathsf{F}(\mathit{fold}(g))

  • Read as an implication from right to left, we get a proof rule for demonstrating that some complicated expression {h} is a fold:

    \displaystyle  h = \mathit{fold}(g) \Leftarrow \ldots

  • In particular, we can quickly see that the identity function is a fold:

    \displaystyle  \begin{array}{ll} & \mathit{id} = \mathit{fold}(g) \\ \Leftarrow & \qquad \{ \mbox{universal property} \} \\ & \mathit{id} \cdot \mathit{in} = g \cdot \mathsf{F}(\mathit{id}) \\ \Leftrightarrow & \qquad \{ \mbox{identities} \} \\ & \mathit{in} = g \end{array}

    so {\mathit{id} = \mathit{fold}(\mathit{in})}. (In fact, this one’s an equivalence.)

  • We get a very simple proof of a fusion rule, for combining a following function with a fold to make another fold:

    \displaystyle  \begin{array}{ll} & h \cdot \mathit{fold}(f) = \mathit{fold}(g) \\ \Leftrightarrow & \qquad \{ \mbox{universal property} \} \\ & h \cdot \mathit{fold}(f) \cdot \mathit{in} = g \cdot \mathsf{F}(h \cdot \mathit{fold}(f)) \\ \Leftrightarrow & \qquad \{ \mbox{evaluation rule, functors} \} \\ & h \cdot f \cdot \mathsf{F}(\mathit{fold}(f)) = g \cdot \mathsf{F}(h) \cdot \mathsf{F}(\mathit{fold}(f)) \\ \Leftarrow & \qquad \{ \mbox{Leibniz} \} \\ & h \cdot f = g \cdot \mathsf{F}(h) \end{array}

  • Using this, we can deduce Lambek’s Lemma, that the constructors {\mathit{in}} form an isomorphism. Supposing that there is a right inverse, and it is a fold, what must it look like?

    \displaystyle  \begin{array}{ll} & \mathit{in} \cdot \mathit{fold}(f) = \mathit{id} \\ \Leftrightarrow & \qquad \{ \mathit{id} \mbox{~as a fold} \} \\ & \mathit{in} \cdot \mathit{fold}(f) = \mathit{fold}(\mathit{in}) \\ \Leftarrow & \qquad \{ \mbox{fusion} \} \\ & \mathit{in} \cdot f = \mathit{in} \cdot \mathsf{F}(\mathit{in}) \\ \Leftarrow & \qquad \{ \mbox{Leibniz} \} \\ & f = \mathsf{F}(\mathit{in}) \end{array}

    So if we define {\mathit{in}^{-1} = \mathit{fold}(\mathsf{F}(\mathit{in}))}, we get {\mathit{in} \cdot \mathit{in}^{-1} = \mathit{id}}. We should also check the left inverse property:

    \displaystyle  \begin{array}{ll} & \mathit{in}^{-1} \cdot \mathit{in} \\ = & \qquad \{ \mathit{in}^{-1} \mbox{~as a fold} \} \\ & \mathit{fold}(\mathsf{F}(\mathit{in})) \cdot \mathit{in} \\ = & \qquad \{ \mbox{evaluation rule} \} \\ & \mathsf{F}(\mathit{in}) \cdot \mathsf{F}(\mathit{fold}(\mathsf{F}(\mathit{in}))) \\ = & \qquad \{ \mathit{in}^{-1} \mbox{~as a fold again} \} \\ & \mathsf{F}(\mathit{in}) \cdot \mathsf{F}(\mathit{in}^{-1}) \\ = & \qquad \{ \mbox{functors} \} \\ & \mathsf{F}(\mathit{in} \cdot \mathit{in}^{-1}) \\ = & \qquad \{ \mbox{right inverse} \} \\ & \mathsf{F}(\mathit{id}) \\ = & \qquad \{ \mbox{functors} \} \\ & \mathit{id} \end{array}

And so on, and so on. Many useful functions can be written as instances of {\mathit{fold}}, and the universal property gives us a very powerful reasoning tool—the universal property of {\mathit{fold}} is a marvel to behold.

And of course, it all dualizes beautifully. An {\mathsf{F}}-coalgebra is a pair {(X,f)} with {f : X \rightarrow \mathsf{F}(X)}. A homomorphism between {\mathsf{F}}-coalgebras {(X,f)} and {(Y,g)} is a function {h : X \rightarrow Y} such that {g \cdot h = \mathsf{F}(h) \cdot f}:

The {\mathsf{F}}-coalgebra {(Y,g)} is final iff there is a unique homomorphism to it from each {(X,f)}; again, for well-behaved {\mathsf{F}}, final coalgebras always exist. We write “{(\nu\mathsf{F},\mathit{out})}” for the final coalgebra, and {\mathit{unfold}_{\mathsf{F}}(f)} for the unique homomorphism to it. (In {\mathbb{S}\mathrm{et}}, final coalgebras correspond to datatypes of finite-or-infinite recursive data structures.)

Uniqueness is captured by the universal property

\displaystyle  h = \mathit{unfold}(f) \Leftrightarrow \mathit{out} \cdot h = \mathsf{F}(h) \cdot f

which has just as many marvellous consequences. Many other useful functions are definable as instances of {\mathit{unfold}}, and again the universal property gives a very powerful tool for reasoning with them.

Hylomorphisms

There are also many interesting functions that are best described as a combination of a fold and an unfold. The hylomorphism pattern, with an unfold followed by a fold, is the best known: the unfold produces a recursive structure, which the fold consumes.

The factorial function is a simple example. The datatype of lists of natural numbers is determined by the shape functor

\displaystyle  \mathsf{L}(X) = 1 + \mathbb{N} \times X

Then we might hope to write

\displaystyle  \mathit{fact} = \mathit{product} \cdot \mathit{downFrom}

where {\mathit{downFrom} = \mathit{unfold}_{\mathsf{L}}(d)} and {\mathit{product} = \mathit{fold}_{\mathsf{L}}(m)} with

\displaystyle  \begin{array}{lcl} d &::& \mathbb{N} \rightarrow \mathsf{L}(\mathbb{N}) \\ d\,0 &=& \mathit{inl}\,() \\ d\,(n+1) &=& \mathit{inr}\,(n+1,n) \bigskip\\ m &::& \mathsf{L}(\mathbb{N}) \rightarrow \mathbb{N} \\ m\,(\mathit{inl}\,()) &=& 1 \\ m\,(\mathit{inr}\,(n,n')) &=& n \times n' \end{array}

More elaborately, we might hope to write {\mathit{quicksort} : \mathsf{List}({\mathbb Z}) \rightarrow \mathsf{List}({\mathbb Z})} as the composition of {\mathit{unfold}_\mathsf{B}(p)} (to generate a binary search tree) and {\mathit{fold}_\mathsf{B}(g)} (to flatten that tree to a list), where {\mathsf{B}} is the shape functor for internally-labelled binary trees,

\displaystyle  p : \mathsf{List}({\mathbb Z}) \rightarrow \mathsf{B}(\mathsf{List}({\mathbb Z}))

partitions a list of integers into the unit or a pivot and two sublists, and

\displaystyle  g : \mathsf{B}(\mathsf{List}({\mathbb Z})) \rightarrow \mathsf{List}({\mathbb Z})

glues together the unit or a pivot and two sorted lists into one list. In fact, any divide-and-conquer algorithm can be expressed in terms of an unfold computing a tree of subproblems top-down, followed by a fold that solves the subproblems bottom-up.

But sadly, this doesn’t work in {\mathbb{S}\mathrm{et}}, because the types don’t meet in the middle. The source type of the fold is (the carrier of) an initial algebra, but the target type of the unfold is a final coalgebra, and these are different constructions.

This is entirely reasonable, when you think about it. Our definitions in {\mathbb{S}\mathrm{et}}—the category of sets and total functions—necessarily gave us folds and unfolds as total functions; the composition of two total functions is a total function, and so a fold after an unfold ought to be a total function too. But it is easy to define total instances of {\mathit{unfold}} that generate infinite data structures (such as a function {\mathit{upFrom}}, which generates an infinite ascending list of naturals), on which a following fold is undefined (such as “the product” of an infinite ascending list of naturals). The composition then should not be a total function.

One might try interposing a conversion function of type {\nu\mathsf{F} \rightarrow \mu\mathsf{F}}, coercing the final data structure produced by the unfold into an initial data structure for consumption by the fold. But there is no canonical way of doing this, because final data structures may be “bigger” (perhaps infinitely so) than initial ones. (In contrast, there is a canonical function of type {\mu\mathsf{F} \rightarrow \nu\mathsf{F}}. In fact, there are two obvious definitions of it, and they agree—a nice exercise!)

One might try parametrizing that conversion function with a natural number, bounding the depth to which the final data structure is traversed. Then the coercion is nicely structural (in fact, it’s a fold over the depth), and everything works out type-wise. But having to thread such “resource bounds” through the code does terrible violence to the elegant structure; it’s not very satisfactory.

Continuous algebras

The usual solution to this conundrum is to give up on {\mathbb{S}\mathrm{et}}, and to admit that richer domain structures than sets and total functions are required. Specifically, in order to support recursive definitions in general, and the hylomorphism in particular, one should move to the category {\mathbb{C}\mathrm{po}} of continuous functions between complete partial orders (CPOs). Now is not the place to give all the definitions; see any textbook on denotational semantics. The bottom line, so to speak, is that one has to accept a definedness ordering {\sqsubseteq} on values—both on “data” and on functions—and allow some values to be less than fully defined.

Actually, in order to give meaning to all recursive definitions, one has to further restrict the setting to pointed CPOs—in which there is a least-defined “bottom” element {\bot_X} for each type {X}, which can be given as the “meaning” (solution) of the degenerate recursive definition {x=x} at type {X}. Then there is no “empty” CPO; the smallest CPO {0} has just a single element, namely {\bot}. As with colimits in general, this smallest object is used as the start of a chain of approximations to a limiting solution. But in order for {0} really to be an initial object, one also has to constrain the arrows to be strict, that is, to preserve {\bot}; only then is there a unique arrow {0 \rightarrow A} for each {A}. The category of strict continuous functions between pointed CPOs is called {\mathbb{C}\mathrm{po}_\bot}.

It so happens that in {\mathbb{C}\mathrm{po}_\bot}, initial algebras and final coalgebras coincide: the objects (pointed CPOs) {\mu\mathsf{F}} and {\nu\mathsf{F}} are identical. This is very convenient, because it means that the hylomorphism pattern works fine: the structure generated by the unfold is exactly what is expected by the fold.

Of course, it may still happen that the composition yields a “partial” (less than fully defined) function; but at least it now type-checks. Categories with this initial algebra/final coalgebra coincidence are called algebraically compact; they were studied by Freyd, but there’s a very good survey by Adámek, Milius and Moss.

However, the story gets murkier than that. For one thing, {\mathbb{C}\mathrm{po}_\bot} does not have proper products. (Indeed, an algebraically compact category with products collapses.) But beyond that, {\mathbb{C}\mathrm{po}_\bot}—with its restriction to strict arrows—is not a good model of lazy functional programming; {\mathbb{C}\mathrm{po}}, with non-strict arrows too, is better. So one needs a careful balance of the two categories. The consequences for initial algebras and final coalgebras are spelled out in one of my favourite papers, Program Calculation Properties of Continuous Algebras by Fokkinga and Meijer. In a nutshell, one can only say that the defining equation {h \cdot \mathit{in} = g \cdot \mathsf{F}(h)} for folds has a unique strict solution in {h}; without the strictness side-condition, {h\,\bot} is unconstrained (because {\mathit{in}\,x \ne \bot} for any {x}). But the situation for coalgebras remains unchanged—the defining equation {\mathit{out} \cdot h = \mathsf{F}(h) \cdot f} for unfolds has a unique solution (and moreover, it is strict when {f} is strict).

This works, but it means various strictness side-conditions have to be borne in mind when reasoning about folds. Done rigorously, it’s rather painful.

Recursive coalgebras

So, back to my confession. I want to write divide-and-conquer programs, which produce intermediate data structures and then consume them. Folds and unfolds in {\mathbb{S}\mathrm{et}} do not satisfy me; I want more—hylos. Morally, I realise that I should pay careful attention to those strictness side-conditions. But they’re so fiddly and boring, and my resolve is weak, so I usually just brush them aside. Is there a way that I can satisfy my appetite for divide-and-conquer programs while still remaining in the pure {\mathbb{S}\mathrm{et}} world?

Tarmo Uustalu and colleagues have a suggestion. Final coalgebras and algebraic compactness are sufficient but not necessary for the hylo diagram above to have a unique solution; they propose to focus on recursive coalgebras instead. The {\mathsf{F}}-coalgebra {(X,f)} is “recursive” iff, for each {g : \mathsf{F}(Y) \rightarrow Y}, there is a unique {h} such that {h = g \cdot \mathsf{F}(h) \cdot f}:

This is a generalization of initial algebras: if {\mathsf{F}} has an initial algebra {(\mu\mathsf{F},\mathit{in})}, then by Lambek’s Lemma {\mathit{in}} has an inverse {\mathit{in}^{-1}}, and {(\mu\mathsf{F},\mathit{in}^{-1})} is a recursive coalgebra. And it is a strict generalization: it also covers patterns such as paramorphisms (primitive recursion)—since {(\mu\mathsf{F}, \mathsf{F}(\mathit{fork}(\mathit{id},\mathit{id}))\cdot\mathit{in}_\mathsf{F}^{-1})} is a recursive {\mathsf{G}}-coalgebra where {\mathsf{G}} is the functor taking {X} to {\mathsf{F}(\mu\mathsf{F} \times X)}—and the “back one or two steps” pattern used in the Fibonacci function.

Crucially for us, almost by definition it covers all of the “reasonable” hylomorphisms too. For example, {(\mathbb{N},d)} is a recursive {\mathsf{L}}-coalgebra, where {\mathsf{L}} is the shape functor for lists of naturals and {d} the {\mathsf{L}}-coalgebra introduced above that analyzes a natural into nothing (for zero) or itself and its predecessor (for non-zero inputs). Which is to say, for each {m : \mathsf{L}(X) \rightarrow X}, there is a unique {h} such that {h = m \cdot \mathsf{L}(h) \cdot d}; in particular, for the {m} given above that returns 1 or multiplies, the unique {h} is the factorial function. (In fact, this example is also an instance of a paramorphism.) And {(\mathsf{List}({\mathbb Z}),p)} is a recursive {\mathsf{B}}-coalgebra, where {p} is the partition function of quicksort—for any {\mathsf{B}}-algebra {(Y,g)}, there is a unique {h} such that {h = g \cdot \mathsf{B}(h) \cdot p}, and in particular when {g} is the glue function for quicksort, that unique solution is quicksort itself.

This works perfectly nicely in {\mathbb{S}\mathrm{et}}; there is no need to move to more complicated settings such as {\mathbb{C}\mathrm{po}} or {\mathbb{C}\mathrm{po}_\bot}, or to consider partiality, or strictness, or definedness orderings. The only snag is the need to prove that a particular coalgebra of interest is indeed recursive. Capretta et al. study a handful of “basic” recursive coalgebras and of constructions on coalgebras that preserve recursivity.

More conveniently, Taylor and Adámek et al. relate recursivity of coalgebras to the more familiar notion of variant function, ie well-founded ordering on arguments of recursive calls. They restrict attention to finitary shape functors; technically, preserving directed colimits, but informally, I think that’s equivalent to requiring that each element of {\mathsf{F}(X)} has a finite number of {X} elements—so polynomial functors are ok, as is the finite powerset functor, but not powerset in general. If I understand those sources right, for a finitary functor {\mathsf{F}} and an {\mathsf{F}}-coalgebra {(X,f)}, the following conditions are equivalent: (i) {(X,f)} is recursive; (ii) {f} is well-founded, in the sense that there is a well-founded ordering {\prec} such that {y \prec x} for each “element” {y} of {f(x)}; (iii) every element of {\mathit{unfold}_\mathsf{F}(f)} has finite depth; and (iv) there is a coalgebra homomorphism from {(X,f)} to {(\mu\mathsf{F},\mathit{in})}.

This means that I can resort to simple and familiar arguments in terms of variant functions to justify hylo-style programs. The factorial function is fine, because ({\mathsf{L}} is a finitary functor, being polynomial, and) the chain of recursive calls to which {d} leads is well-founded; quicksort is fine, because the partitioning step is well-founded; and so on. Which takes a great weight of guilt off my shoulders: I can give in to the temptation to write interesting programs, and still remain morally as pure as the driven snow.


Extreme solutions

Not all problem statements are amenable to translation via Galois connections or adjunctions. A reasonable characterization of those that are suitable is as optimization problems—finding the least or greatest solution satisfying a given collection of constraints, according to some ordering.

Least and greatest solutions

For example, the Galois connection determining integer division that we considered a couple of posts ago

\displaystyle  n \div k \ge m \Leftrightarrow n \ge m \times k

defines {n \div k} to be the greatest solution {m} of the inequality {n \ge m \times k}. It does so in a very pithy way: reading the equivalence as an implication from left to right, instantiating {m} to {n \div k} (and exploiting the reflexivity of the ordering {\ge}), we get that {n \ge (n \div k) \times k}, so {n \div k} is indeed a solution to the inequality on the right; reading the equivalence from right to left, we get that {n \div k \ge m} for any solution {m}, so {n \div k} is in fact the greatest solution.

Similarly, the characterization of the floor function {\lfloor\cdot\rfloor} from reals to integers

\displaystyle  \mathit{inj}(n) \le_R x \Leftrightarrow n \le_I \lfloor x \rfloor

defines {\lfloor x \rfloor} as the greatest integer {n} for which {\mathit{inj}(n) \le_R x}, and the Galois connection involving {\cap} and {\cup}

\displaystyle  A \cap X \subseteq B \Leftrightarrow A \subseteq B \cup \overline{X}

characterizes {B \cup \overline{X}} as the greatest set {A} (under the usual subset ordering) for which {A \cap X \subseteq B}.

Limits and colimits

The characterization of greatest solutions might be equivalently expressed in terms of greatest lower bounds. Given a preordered set {(X,\le)}, and a subset {Y} of {X}, an element {x \in X} is a lower bound of {Y} in {X} if {x \le y} for every {y \in Y}; in addition, {x} is a greatest lower bound {\mathrm{inf}(Y)} of {Y} if {x' \le x} for any other lower bound {x'}. (Note “a” rather than “the”, as there may be multiple such. But they are all related by {\le}; if the ordering is a partial order, the {\mathrm{inf}} is unique when it exists. Note also that {\mathrm{inf}(Y)} need not be in {Y} itself, even when it does exist.)

This construction can be phrased in terms of Galois connections as follows. The two ordered sets are {(X,\le)} and {(\mathsf{P}^+(X),\le^\ast)}, where {\mathsf{P}^+(X)} is the set of nonempty subsets of {X}, with ordering {\le^\ast} defined pointwise: {Y \le^\ast Z} iff {x \le x'} for all {x \in Y, x' \in Z}. The mappings in either direction are the singleton set former {\{\cdot\} : X \rightarrow \mathsf{P}^+(X)} and greatest lower bound {\mathrm{inf} : \mathsf{P}^+(X) \rightarrow X}, related by the Galois connection {x \le \mathrm{inf}(Y) \Leftrightarrow \{x\} \le^\ast Y}. Here’s how it looks with {x' \le x = \mathrm{inf}(Y)} and {Y = \{ x_0,x_1,x_2 \}}:

The categorical perspective on greatest lower bounds is the notion of limit; it’s just the generalization of the diagram above to an arbitrary category. Here is a very brief outline. The fragment of the diagram consisting of {x, x_0, x_1, x_2} is called a cone, from vertex {x} to base {x_0,x_1,x_2} (and so is {x',x_0,x_1,x_2}). The cone {x,x_0,x_1,x_2} is called a limit when, for any other cone from vertex {x'} to the same base, there is a unique arrow {x' \rightarrow x} making the diagram commute.

Commutativity of the diagram above isn’t very interesting—because the category is a partial order, but also because the base is degenerate: just three discrete objects. In general, the base will also contain arrows; then a cone consists of a vertex ({X} in the diagram below) with arrows to each of the objects in the base ({f_i : X \rightarrow X_i}) making the diagram commute ({g_0 \cdot f_0 = f_1}, etc). As before, the cone from vertex {X} is a limit if any other cone factors uniquely through it.

Of course, it all dualizes beautifully. The categorical perspective on least upper bounds is expressed in terms of cones from a base ({X_0,X_1,X_2} below) to a vertex ({X}); such a cone is a colimit if any other cone from the same base to another vertex ({X'}) factors uniquely through it.

Initial algebras and final coalgebras

Recall that, for a functor {\mathsf{F}}, an {\mathsf{F}}-algebra is a pair {(X,f)} consisting of an object {X} and an arrow {f : \mathsf{F}(X) \rightarrow X}. A homomorphism between {\mathsf{F}}-algebras {(X,f)} and {(Y,g)} is an arrow {h : X \rightarrow Y} such that:

The {\mathsf{F}}-algebra {(X,f)} is initial if there is a unique such {h} for each {(Y,g)}. We usually write {\mu\mathsf{F}} for the “carrier” of this initial algebra (because it is the “least fixed point” of {\mathsf{F}}, as we shall see below), and {\mathit{in} : \mathsf{F}(\mu\mathsf{F}) \rightarrow \mu\mathsf{F}} for the “constructor” (and indeed, it is an isomorphism, so a constructed piece of data can be deconstructed again); we write {h=\mathit{fold}(g)} for the unique {h} such that {h \cdot \mathit{in} = g \cdot \mathsf{F}(h)}.
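
Informally, in Haskell (where initial algebras and final coalgebras coincide, so this is only morally the initial algebra) the construction looks as follows; Mu, In, and fold are my renderings of {\mu\mathsf{F}}, {\mathit{in}}, and {\mathit{fold}}:

    {-# LANGUAGE DeriveFunctor #-}

    -- the carrier of the initial algebra, with In playing the role of "in"
    newtype Mu f = In { unIn :: f (Mu f) }

    -- fold g is the unique h with h . In = g . fmap h
    fold :: Functor f => (f a -> a) -> Mu f -> a
    fold g = g . fmap (fold g) . unIn

    -- example: naturals, with shape functor N(X) = 1 + X
    data N x = Z | S x deriving Functor

    toInteger' :: Mu N -> Integer
    toInteger' = fold alg
      where alg Z     = 0
            alg (S n) = n + 1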

As you might expect, “initial” things are extreme solutions too, albeit not in a very interesting way. An initial object in a category is an object from which there is a unique arrow (often written “{!}”) to any other object. An initial object is a colimit of the diagram generated from the empty category—which has no objects, and hence no arrows either. (Any object forms the vertex of a (trivial) cone, so the colimiting vertex is simply one from which there is a unique arrow to any other vertex, with no additional constraints.) In particular, an initial {\mathsf{F}}-algebra is an initial object in the category of {\mathsf{F}}-algebras, whose objects are {\mathsf{F}}-algebras and whose arrows are homomorphisms between them.

And of course, it all dualizes nicely, to final coalgebras, which are in some sense “greatest fixed points” of functors; final objects are the vertices of limiting cones to the empty base.

Extreme (co-)algebras as (co-)limits

Here is a more illuminating presentation of initial algebras as extreme solutions, explaining rather better in what way they correspond to “least fixed points” of functors. (The construction is well known; I’ve based this presentation on a manuscript by François Métayer.) Initial algebras can be constructed as an instance of the colimit construction above, in which the base consists of a countable chain of objects and arrows:

In the category {\mathbb{S}\mathrm{et}}, every such chain has a colimit (categories with this property are called {\omega}-categories).

If the category has an initial object {0}, then any endofunctor {\mathsf{F}} induces such a countable chain:

Under mild assumptions, the colimit of this chain is (the carrier of) an initial {\mathsf{F}}-algebra. (Besides assuming an {\omega}-category with an initial object, we have to assume that {\mathsf{F}} is {\omega}-cocontinuous—that is, that it transforms the colimit {X} of the countable chain {X_0 \rightarrow X_1 \rightarrow \cdots} into a colimit {\mathsf{F}(X)} of the countable chain {\mathsf{F}(X_0) \rightarrow \mathsf{F}(X_1) \rightarrow \cdots}. One can show that any polynomial functor—one built from constants using sum and product—is {\omega}-cocontinuous.)

The construction goes as follows. By assumption, the countable chain {0 \rightarrow \mathsf{F}(0) \rightarrow \cdots} has a colimit; let’s suggestively call the vertex {\mu \mathsf{F}}, so that the edges {u_i : \mathsf{F}^i(0) \rightarrow \mu\mathsf{F}} satisfy {u_{i+1} \cdot \mathsf{F}^i(!) = u_i} for each {i}.

Since {\mathsf{F}} is {\omega}-cocontinuous, it transforms this diagram into another colimiting cone, with base shifted one place to the right and vertex {\mathsf{F}(\mu \mathsf{F})}. But {\mu \mathsf{F}} is the vertex of another cone over the same shifted base; and since {\mathsf{F}(\mu \mathsf{F})} is the colimit, there is a unique arrow—let’s suggestively call it {\mathit{in}}—making the diagram below commute ({\mathit{in} \cdot \mathsf{F}(u_i) = u_{i+1}} etc).

All we have to do now is to show that {(\mu\mathsf{F}, \mathit{in})} is indeed the initial {\mathsf{F}}-algebra, as claimed. Suppose we are given another {\mathsf{F}}-algebra {(Y,g)}; we will (i) construct an arrow {h : \mu\mathsf{F} \rightarrow Y}, (ii) show that it is a homomorphism between the algebras, {h \cdot \mathit{in} = g \cdot \mathsf{F}(h)}, and (iii) show that it is the only such.

For (i), given the target {\mathsf{F}}-algebra {(Y,g)}, we can construct a square as follows:

which commutes by virtue of the initiality of {0}. Applying {\mathsf{F}} to this square yields another, which can be pasted alongside; and this can be repeated indefinitely, yielding the following ladder:

Then we can pick out arrows {\xi_i = \mathsf{F}^0(g) \cdot \mathsf{F}^1(g) \cdots \mathsf{F}^{i-1}(g) \cdot \mathsf{F}^i(!_Y) : \mathsf{F}^i(0) \rightarrow Y}. Moreover, these {\xi_i}s commute with the base of the colimit diagram ({\xi_{i+1} \cdot \mathsf{F}^i(!) = \xi_i}, etc) to yield another cone to vertex {Y}; we therefore conclude that there is a unique {h : \mu\mathsf{F} \rightarrow Y} such that {h \cdot u_i = \xi_i} for each {i}.

Now for (ii). Note that {Y} and the {\xi_{i+1}} also form a cone over the shifted base starting from {\mathsf{F}(0)}; and because {\mathsf{F}(\mu\mathsf{F})} is the colimit of this shifted base, we also get a unique mediating arrow {k : \mathsf{F}(\mu\mathsf{F}) \rightarrow Y} such that {k \cdot \mathsf{F}(u_i) = \xi_{i+1}} for each {i}.

Moreover, both {h \cdot \mathit{in}} and {g \cdot \mathsf{F}(h)} are also such mediating arrows:

\displaystyle  h \cdot \mathit{in} \cdot \mathsf{F}(u_i) = h \cdot u_{i+1} = \xi_{i+1} = g \cdot \mathsf{F}(\xi_i) = g \cdot \mathsf{F}(h \cdot u_i) = g \cdot \mathsf{F}(h) \cdot \mathsf{F}(u_i)

so both must equal {k} and hence also each other: {h \cdot \mathit{in} = g \cdot \mathsf{F}(h)}.

Finally, for (iii), suppose we have another {h' : \mu\mathsf{F} \rightarrow Y} for which {h' \cdot \mathit{in} = g \cdot \mathsf{F}(h')}; we have to show that {h' = h}. By the uniqueness of the mediating arrow, it suffices to show that {h \cdot u_i = \xi_i} for each {i}, which is easily done by induction.

That is, given {\mathsf{F}}-algebra {(Y,g)}, there exists a unique {h : \mu\mathsf{F} \rightarrow Y} (for which we write “{\mathit{fold}(g)}“) such that {h \cdot \mathit{in} = g \cdot \mathsf{F}(h)}. If you squint at this in the right way, you can see the inductive definition of the recursive datatype, and of the folds over it. Each {\mathsf{F}^i(0)} is an approximation to {\mu\mathsf{F}}, cut off at depth {i}; they all embed into {\mu\mathsf{F}}, and indeed, {\mu\mathsf{F}} is the least extension—the colimit—of them all. Each {\xi_i} is an approximation to {\mathit{fold}(g)}, again restricted to data structures cut off at depth {i}, and {\mathit{fold}(g)} is the completion of all the {\xi_i}.

Naturally, it all dualizes for final coalgebras: then we need “cochains” {1 \leftarrow \mathsf{F}(1) \leftarrow \mathsf{F}^2(1) \leftarrow \cdots} to a terminal object {1}; an {\omega^{\mathrm{op}}}-category is one in which all such countable cochains have a limit; {\omega}-continuous functors preserve limits of countable cochains. (It is a bit unfortunate that the interesting extreme algebra, namely the initial algebra, is a colimit, whereas the final coalgebra is a limit, but sometimes life is like that.)

(Co-)limits as adjunctions

The definition of limits can be made more concise and precise by noting that the base corresponds to the image of some functor {\mathsf{F} : \mathbb{J} \rightarrow \mathbb{C}}, where {\mathbb{C}} is the category of interest, and index category {\mathbb{J}} determines the shape of the base—for each object {j} of {\mathbb{J}}, there is an object {\mathsf{F}(j)} of {\mathbb{C}} in the base ({X_j} in the diagram below), and for each arrow {a : i \rightarrow j} of {\mathbb{J}}, an arrow {\mathsf{F}(a) : \mathsf{F}(i) \rightarrow \mathsf{F}(j)} of {\mathbb{C}} in the base ({g_a : X_i \rightarrow X_j} below).

(In the {\mathrm{inf}} diagram, the index category is the discrete category on three objects—with no arrows other than identity arrows. In the diagram above, {\mathbb{J}} is {\bullet \mathrel{{-}\vcenter{\hbox{\scriptsize0}}{\rightarrow}} \bullet \mathrel{{\leftarrow}\vcenter{\hbox{\scriptsize2}}{-}} \bullet}, with three objects and two generating arrows. In the construction of initial algebras, the index category is {\omega = \bullet \rightarrow \bullet \rightarrow \cdots}, equivalent to the usual {\le} ordering on the natural numbers, whereas for final coalgebras it is {\bullet \leftarrow \bullet \leftarrow \cdots}, equivalent to {\ge} on natural numbers.)

The vertex {X} too can be seen as the image of {\mathbb{J}} under a particular, degenerate functor—the diagonal functor {\Delta X : \mathbb{J} \rightarrow \mathbb{C}}, defined by {\Delta X(j) = X} for each object {j} of {\mathbb{J}}, and {\Delta X(a) = \mathit{id}_X} for each arrow {a}. Then “the cone {f} from vertex {X} to base {\mathsf{F}}” corresponds to a natural transformation {f : \Delta X \mathbin{\stackrel{.}{\to}} \mathsf{F}}: naturality is exactly the condition that the cone commutes. We write “{\mathrm{Lim}\,\mathsf{F}}” for the limiting object, {X}; its universal property is that, for any cone {f'} from {X'} to {\mathsf{F}}, there exists a unique {h : X' \rightarrow X} such that {f_i \cdot h = f'_i} for each {i}. In other words, there is a (natural) isomorphism between the natural transformations {\Delta X' \mathbin{\stackrel{.}{\to}} \mathsf{F}} and the arrows {X' \rightarrow \mathrm{Lim}\,\mathsf{F}}; that is, an adjunction {\Delta \dashv \mathrm{Lim}}, with limit being right adjoint to the diagonal.

Dually, of course, colimits turn out to be left adjoints: the whole construction is encapsulated in three symbols, {\mathrm{Colim} \dashv \Delta}.


Adjunctions

Universal properties are a generalization of the notion of a Galois connection between two orderings. Or perhaps I should say: universal properties arise from adjunctions, and it is adjunctions that are a generalization of Galois connections. Adjunctions capture in an abstract categorical setting the idea of “optimal solutions to a problem”; and this idea is itself very general, capturing many of the structures underlying common patterns in programming (not to mention the rest of mathematics). Solutions to equations, products, limits of sequences of approximations, and minimality and maximality are just some of the instances of this powerful abstraction that we will make use of. In the preface to Categories for the Working Mathematician, Mac Lane wrote that “adjoint functors arise everywhere”.

Adjoint functors

Two functors {\mathsf{F} : \mathbb{D} \leadsto \mathbb{C}} and {\mathsf{G} : \mathbb{C} \leadsto \mathbb{D}} form an adjunction, written {\mathsf{F} \dashv \mathsf{G}}, if there is an isomorphism between the sets of arrows {\mathsf{F}(B) \rightarrow A} in {\mathbb{C}} and {B \rightarrow \mathsf{G}(A)} in {\mathbb{D}}. We say that {\mathsf{F}} is the left adjoint and {\mathsf{G}} the right adjoint. The essence of the isomorphism is captured by two natural transformations {\eta : \mathsf{Id} \mathbin{\stackrel{.}{\to}} \mathsf{G} \mathbin{\cdot} \mathsf{F}} in {\mathbb{D}} and {\epsilon : \mathsf{F} \mathbin{\cdot} \mathsf{G} \mathbin{\stackrel{.}{\to}} \mathsf{Id}} in {\mathbb{C}}, called the unit and counit of the adjunction; {\eta} is the image in {\mathbb{D}} of {\mathit{id}_{\mathsf{F}(B)} : \mathsf{F}(B) \rightarrow \mathsf{F}(B)} in {\mathbb{C}}, and conversely, {\epsilon} is the image in {\mathbb{C}} of {\mathit{id}_{\mathsf{G}(A)}} in {\mathbb{D}}. The unit and counit satisfy the laws

\displaystyle  \begin{array}{lcl} \epsilon_{\mathsf{F}(B)} \cdot \mathsf{F}(\eta_B) &=& \mathit{id}_{\mathsf{F}(B)} \\ \mathsf{G}(\epsilon_A) \cdot \eta_{\mathsf{G}(A)} &=& \mathit{id}_{\mathsf{G}(A)} \end{array}

From them one can construct the witnesses to the isomorphism for arbitrary arrows: for each arrow {f : \mathsf{F}(B) \rightarrow A} in {\mathbb{C}}, there is a unique arrow {g : B \rightarrow \mathsf{G}(A)} in {\mathbb{D}} such that {\epsilon_A \cdot \mathsf{F}(g) = f}, given by {g = \mathsf{G}(f) \cdot \eta_B}; and conversely, for each arrow {g : B \rightarrow \mathsf{G}(A)} in {\mathbb{D}}, there is a unique arrow {f : \mathsf{F}(B) \rightarrow A} in {\mathbb{C}} such that {\mathsf{G}(f) \cdot \eta_B = g}, given by {f = \epsilon_A \cdot \mathsf{F}(g)}; and moreover, these two constructions are each other’s inverses.
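
In Haskell, restricted to endofunctors, this all fits in a few lines. The class below is a sketch (echoing, but not identical to, the Adjunction class in Edward Kmett’s adjunctions package), with the two witnesses constructed exactly as just described:

    -- F -| G, presented by unit and counit
    class (Functor f, Functor g) => Adjunction f g where
      unit   :: a -> g (f a)          -- eta
      counit :: f (g a) -> a          -- epsilon

    -- g = G(f) . eta
    leftAdjunct :: Adjunction f g => (f b -> a) -> (b -> g a)
    leftAdjunct h = fmap h . unit

    -- f = epsilon . F(g)
    rightAdjunct :: Adjunction f g => (b -> g a) -> (f b -> a)
    rightAdjunct k = counit . fmap k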

Adjunctions from Galois connections

A preorder {(X,{\le})} forms a category: the objects of the category are the elements of the set~{X}, and between any two elements {x,y \in X}, there is a unique arrow if {x \le y}, and no arrow otherwise. That adjunctions are a generalization of Galois connections follows straightforwardly from the fact that there is at most one arrow between any two objects in a preorder category. Then monotonic functions {f : (X,{\le_X}) \rightarrow (Y,{\le_Y})} and {g : (Y,{\le_Y}) \rightarrow (X,{\le_X})} between preorders {(X,{\le_X})} and {(Y,{\le_Y})} form a Galois connection precisely if the sets of arrows {f(y) \rightarrow x} and {y \rightarrow g(x)} are isomorphic—that is, if both {f(y) \le_X x} and {y \le_Y g(x)} hold, or neither do, or in other words,

\displaystyle  f(y) \le_X x \Leftrightarrow y \le_Y g(x)

Adjoints of the diagonal functor

A very useful example of adjunctions arises in the definition of products—in the category {\mathbb{S}\mathrm{et}} of sets and total functions, for given types {A,B,C}, there is an isomorphism between the set of pair-generating functions, of type {A \rightarrow B \times C}, and the set of pairs of functions, of types {A \rightarrow B} and {A \rightarrow C}. (Indeed, given functions {f:A \rightarrow B} and {g:A \rightarrow C}, one can construct the pair-generating function {\mathit{fork}(f,g) : A \rightarrow B \times C}; and conversely, given a pair-generating function {h : A \rightarrow B \times C}, one can construct its two projections {\mathit{fst} \cdot h : A \rightarrow B} and {\mathit{snd} \cdot h : A \rightarrow C}; and moreover, these two constructions are inverses.)

The “isomorphism between sets of arrows” can be elegantly expressed as an adjunction; since it concerns pairs of arrows, one side of the adjunction involves the product category {\mathbb{S}\mathrm{et} \times \mathbb{S}\mathrm{et}}. The right adjoint is the product functor {(\times) : \mathbb{S}\mathrm{et} \times \mathbb{S}\mathrm{et} \leadsto \mathbb{S}\mathrm{et}}, mapping an object in {\mathbb{S}\mathrm{et} \times \mathbb{S}\mathrm{et}}—that is, a pair of sets—to their cartesian product as an object in {\mathbb{S}\mathrm{et}}, and an arrow in {\mathbb{S}\mathrm{et} \times \mathbb{S}\mathrm{et}}—that is, a parallel pair of functions—to a function in {\mathbb{S}\mathrm{et}} acting pointwise on pairs. In the other direction, the left adjoint is the diagonal functor {\triangle : \mathbb{S}\mathrm{et} \leadsto \mathbb{S}\mathrm{et} \times \mathbb{S}\mathrm{et}}, mapping an object {A} in {\mathbb{S}\mathrm{et}} to the object {(A,A)} in {\mathbb{S}\mathrm{et} \times \mathbb{S}\mathrm{et}}, and a function {f} to the pair of functions {(f,f)} as an arrow in {\mathbb{S}\mathrm{et} \times \mathbb{S}\mathrm{et}}. The adjunction {{\triangle} \dashv (\times)} amounts to the isomorphism

\displaystyle  \triangle A \rightarrow (B,C) \approx A \rightarrow {\times} (B,C)

or equivalently,

\displaystyle  (A \rightarrow B)\times(A \rightarrow C) \approx A \rightarrow (B\times C)

The unit and counit of the adjunction are {\eta : \mathsf{Id} \mathbin{\stackrel{.}{\to}} (\times) \mathbin{\cdot} \triangle} and {\epsilon : \triangle \mathbin{\cdot} (\times) \mathbin{\stackrel{.}{\to}} \mathsf{Id}}. In more familiar terms, the unit is a natural transformation in {\mathbb{S}\mathrm{et}}, so a polymorphic function; in fact, it’s the function of type {A \rightarrow A \times A} that we might call {\mathit{double}}. However, the counit is a natural transformation {(A \times B,A \times B) \rightarrow (A,B)} in {\mathbb{S}\mathrm{et} \times \mathbb{S}\mathrm{et}}, so not simply a (polymorphic) function; but arrows in {\mathbb{S}\mathrm{et} \times \mathbb{S}\mathrm{et}} are pairs of functions, so we might write this {(\mathit{fst},\mathit{snd}) :: (A \times B \rightarrow A, A \times B \rightarrow B)}.

Then the “fork” operation is in fact one of the two witnesses to the isomorphism between the sets of arrows: given an arrow {\triangle A \rightarrow (B,C)} in {\mathbb{S}\mathrm{et} \times \mathbb{S}\mathrm{et}}, that is, a pair {(f,g)} of functions of types {(A \rightarrow B,A \rightarrow C)}, then {\mathit{fork}(f,g)} is an arrow {A \rightarrow {\times} (B,C)} in {\mathbb{S}\mathrm{et}}, that is, a function of type {A \rightarrow B \times C}, given by the construction above:

\displaystyle  \mathit{fork}(f,g) = (\times) (f,g) \cdot \mathit{double}

or, with more points,

\displaystyle  \mathit{fork} (f,g)\,a = (f\,a, g\,a)

The laws that the unit and counit satisfy are

\displaystyle  \begin{array}{lcl} (\mathit{fst},\mathit{snd}) \cdot \triangle \mathit{double} &=& \mathit{id} \\ (\times) (\mathit{fst},\mathit{snd}) \cdot \mathit{double} &=& \mathit{id} \end{array}

or, in more familiar terms,

\displaystyle  \begin{array}{lcl} \mathit{fst} \cdot \mathit{double} &=& \mathit{id} \\ \mathit{snd} \cdot \mathit{double} &=& \mathit{id} \\ \mathit{fork} (\mathit{fst},\mathit{snd}) &=& \mathit{id} \end{array}

The universal property of {\mathit{fork}} follows from the isomorphism between sets of arrows:

\displaystyle  \begin{array}{ll} & h = \mathit{fork}(f,g) \\ \Leftrightarrow & \qquad \{ \mathit{fork} \} \\ & h = (\times) (f,g) \cdot \mathit{double} \\ \Leftrightarrow & \qquad \{ \mbox{isomorphism between arrow sets} \} \\ & (\mathit{fst},\mathit{snd}) \cdot \triangle h = (f,g) \\ \Leftrightarrow & \qquad \{ \triangle \} \\ & (\mathit{fst},\mathit{snd}) \cdot (h,h) = (f,g) \\ \Leftrightarrow & \qquad \{ \mbox{composition in~} \mathbb{S}\mathrm{et} \times \mathbb{S}\mathrm{et} \mbox{~is pointwise} \} \\ & (\mathit{fst} \cdot h,\mathit{snd} \cdot h) = (f,g) \\ \Leftrightarrow & \qquad \{ \mbox{equality of pairs is pointwise} \} \\ & \mathit{fst} \cdot h=f \land \mathit{snd} \cdot h=g \end{array}

The universal property of {\mathit{fork}} underlies all the useful laws of that operator.

Of course, the situation nicely dualizes too. Coproducts in {\mathbb{S}\mathrm{et}} arise from the isomorphism between the set of arrows {A+B \rightarrow C} and the pairs of arrows in {A \rightarrow C} and {B \rightarrow C}. Again, “pairs of arrows” suggest the product category; but this time, the diagonal functor is the right adjoint, with the coproduct functor {(+) : \mathbb{S}\mathrm{et} \times \mathbb{S}\mathrm{et} \rightarrow \mathbb{S}\mathrm{et}} (which takes a pair of sets {(A,B)} to their disjoint union) as the left adjoint. That is, the adjunction is {(+) \dashv \triangle}, and the isomorphism is

\displaystyle  (+) (A,B) \rightarrow C \approx (A,B) \rightarrow \triangle C

The unit {\eta : \mathsf{Id} \mathbin{\stackrel{.}{\to}} \triangle \mathbin{\cdot} (+)} is a natural transformation {(A,B) \rightarrow (A+B,A+B)} in {\mathbb{S}\mathrm{et} \times \mathbb{S}\mathrm{et}}, that is, a pair of functions {\mathit{inl} : A \rightarrow A+B} and {\mathit{inr} : B \rightarrow A+B}. The counit {\epsilon : (+) \mathbin{\cdot} \triangle \mathbin{\stackrel{.}{\to}} \mathsf{Id}} is a natural transformation {A+A \rightarrow A} in {\mathbb{S}\mathrm{et}}, which we might call {\mathit{merge}}. The “join” of two functions with a common range is a witness to one half of the isomorphism—given an arrow {(f,g) : (A,B) \rightarrow \triangle C} in {\mathbb{S}\mathrm{et} \times \mathbb{S}\mathrm{et}}, then {\mathit{join} (f,g)} is an arrow {(+) (A,B) \rightarrow C} in {\mathbb{S}\mathrm{et}}, defined by

\displaystyle  \mathit{join} (f,g) = \mathit{merge} \cdot (+) (f,g)

The two laws that the unit and counit satisfy are:

\displaystyle  \begin{array}{lcl} \mathit{merge} \cdot (+) (\mathit{inl},\mathit{inr}) &=& \mathit{id} \\ \triangle \mathit{merge} \cdot (\mathit{inl},\mathit{inr}) &=& \mathit{id} \end{array}

or, perhaps more perspicuously,

\displaystyle  \begin{array}{lcl} \mathit{join} (\mathit{inl},\mathit{inr}) &=& \mathit{id} \\ \mathit{merge} \cdot \mathit{inl} &=& \mathit{id} \\ \mathit{merge} \cdot \mathit{inr} &=& \mathit{id} \end{array}

Another familiar example from functional programming is the notion of currying, which arises when one can construct the function space {A \Rightarrow B} (the type of functions from {A} to {B}, for each type {A} and {B}), such that there is an isomorphism between the sets of arrows {A \rightarrow (B \Rightarrow C)} and {A \times B \rightarrow C}. Here, the adjunction is {( \times B) \dashv (B \Rightarrow )}—in this case, both functors are endofunctors on {\mathbb{S}\mathrm{et}}. The unit and counit are natural transformations {\mathsf{Id} \mathbin{\stackrel{.}{\to}} (B \Rightarrow )\mathbin{\cdot}( \times B)} and {( \times B)\mathbin{\cdot}(B \Rightarrow ) \mathbin{\stackrel{.}{\to}} \mathsf{Id}}. We might call these {\mathit{pair}} and {\mathit{apply}}, since the first is a curried pair-forming operator, and the second applies a function to an argument:

\displaystyle  \begin{array}{lcl} \mathit{pair} &:& A \rightarrow (B \Rightarrow (A \times B)) \\ \mathit{apply} &:& (B \Rightarrow A) \times B \rightarrow A \end{array}

The laws they satisfy are as follows:

\displaystyle  \begin{array}{lcl} \mathit{apply} \cdot ( \times B) \mathit{pair} &=& \mathit{id} \\ (B \Rightarrow )\mathit{apply} \cdot \mathit{pair} &=& \mathit{id} \end{array}

or, in points,

\displaystyle  \begin{array}{lcl} \mathit{apply} (\mathit{pair}\,a,b) &=& (a,b) \\ \mathit{apply} \cdot \mathit{pair}\,f &=& f \end{array}

The isomorphism itself is witnessed by the two inverse functions

\displaystyle  \begin{array}{lcl} \mathit{curry}\,f &=& (B \Rightarrow ) f \cdot \mathit{pair} \\ \mathit{uncurry}\,g &=& \mathit{apply} \cdot ( \times B) g \end{array}

where {f : A \times B \rightarrow C} and {g : A \rightarrow (B \Rightarrow C)}.
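
Transcribed into Haskell (the primed names avoid clashes with the Prelude’s curry and uncurry, which are the same functions):

    pair :: a -> (b -> (a, b))          -- the unit
    pair a = \b -> (a, b)

    apply :: (b -> a, b) -> a           -- the counit
    apply (f, b) = f b

    curry' :: ((a, b) -> c) -> (a -> (b -> c))
    curry' f = \a -> f . pair a         -- (B =>) f . pair

    uncurry' :: (a -> (b -> c)) -> ((a, b) -> c)
    uncurry' g = apply . (\(a, b) -> (g a, b))    -- apply . ( x B) g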


Universal properties and Galois connections

One recurring theme throughout this series will be that of a universal property—an identity that captures an indirect means of solving a problem, by transforming that problem into a different (and hopefully simpler) domain, while still preserving all its essential properties. In particular, the original problem has a solution if and only if the transformed problem does, and moreover, the solution to the transformed problem can easily be translated back into a solution to the original problem. One can see universal properties as a generalization of the notion of a Galois connection between two orderings, which are a similarly powerful technique of relating problems in two different settings. (In fact, the proper generalization of Galois connections is to adjunctions, but that’s a story for next time.)

Universal properties

The universal property of the {\mathit{fork}} operation for products is a representative example. Recall that {\mathit{fork}\,(f,g) :: a \rightarrow (b,c)} when {f :: a \rightarrow b} and {g :: a \rightarrow c}; and that {\mathit{fst} :: (b,c) \rightarrow b} and {\mathit{snd} :: (b,c) \rightarrow c}. Then {\mathit{fork}} is completely defined by its universal property:

\displaystyle  h = \mathit{fork}\,(f,g) \quad\Leftrightarrow\quad \mathit{fst} \cdot h = f \land \mathit{snd} \cdot h = g

This identity repays careful study.

  • It translates a problem in the more complex domain of products (namely, the problem of showing how some complicated expression {h} can be written in terms of {\mathit{fork}}) into simpler problems (here, equations about the two projections of {h}).
  • It’s an equivalence. So not only do you have an implication from left to right (any {h} expressible as a {\mathit{fork}} satisfies the two properties on the right), you also have one from right to left (any pair of functions {f,g} satisfying the two properties on the right induces a {\mathit{fork}}). In other words, {h} is a solution to the equation on the left iff it is a solution on the right; not only does a solution on the right yield a construction on the left, but also the absence of solutions on the right implies the absence on the left. Or again: the equations on the right have a unique solution in {h}—since any two solutions {h,h'} must both be equal to the same expression on the left.
  • It has many useful simple consequences. You can make the left-hand side trivially true by letting {h = \mathit{fork}\,(f,g)}; then the right-hand side must also be true:

    \displaystyle  \begin{array}{lcl} \mathit{fst} \cdot \mathit{fork}\,(f,g) &=& f \\ \mathit{snd} \cdot \mathit{fork}\,(f,g) &=& g \end{array}

    Symmetrically, you can make the right-hand side trivially true by letting {f = \mathit{fst} \cdot h} and {g = \mathit{snd} \cdot h}; then the left-hand side must also be true:

    \displaystyle  h = \mathit{fork}\,(\mathit{fst} \cdot h, \mathit{snd} \cdot h)

    If you further let {h = \mathit{id}}, you conclude that every pair consists solely of its two projections, nothing more:

    \displaystyle  \mathit{id} = \mathit{fork}\,(\mathit{fst}, \mathit{snd})

    In fact, the universal property of {\mathit{fork}} tells you everything you need to know about {\mathit{fork}}; you might take that as one justification for the term “universal”.

  • It also has many useful less obvious consequences. For example, if you’re searching for an {h} that acts independently on the two components of a pair—{\mathit{fst} \cdot h = h_1 \cdot \mathit{fst}} and {\mathit{snd} \cdot h = h_2 \cdot \mathit{snd}}—just let {f = h_1 \cdot \mathit{fst}} and {g = h_2 \cdot \mathit{snd}} in the universal property, and conclude

    \displaystyle  h = \mathit{fork}\,(h_1\cdot\mathit{fst}, h_2\cdot\mathit{snd})

    (which we’ve written “{\mathit{prod}\,(h_1,h_2)}” elsewhere). For another example, we can deduce a fusion law for {\mathit{fork}}: for what {f',g'} does the equation

    \displaystyle  \mathit{fork}\,(f,g) \cdot k = \mathit{fork}\,(f',g')

    hold? This matches the left-hand side of the universal property; expanding the right-hand side yields

    \displaystyle  \begin{array}{lclcl} f' &=& \mathit{fst}\cdot\mathit{fork}\,(f,g)\cdot k &=& f \cdot k \\ g' &=& \mathit{snd}\cdot\mathit{fork}\,(f,g)\cdot k &=& g \cdot k \end{array}

Such a rich harvest from so small a seed! (In fact, we will see later that an even smaller seed suffices.)
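
As a concrete sanity check of the fusion law just derived, one might run something like the following sketch (assuming the QuickCheck library, at one arbitrarily chosen monomorphic instance):

    import Test.QuickCheck

    fork :: (a -> b, a -> c) -> a -> (b, c)
    fork (f, g) a = (f a, g a)

    -- fusion: fork (f, g) . k = fork (f . k, g . k)
    prop_fusion :: Int -> Bool
    prop_fusion n = (fork (f, g) . k) n == fork (f . k, g . k) n
      where f = (+ 1); g = (* 2); k = subtract 3

    main :: IO ()
    main = quickCheck prop_fusion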

Galois connections

We can see the same structures that occur in universal properties like that of {\mathit{fork}} above also in relationships between orderings. As a very simple example, consider the problem of dividing a natural number {n} by two, exactly; the universal property of a solution {m} to this problem is the equivalence

\displaystyle  n / 2 = m \Leftrightarrow n = m \times 2

That is, {m} is a solution to the problem “compute {n / 2}” precisely when {n = m \times 2}; both the existence and the identification of a solution to a problem expressed in terms of division have been translated to one in terms of multiplication—which is arguably a simpler setting. Note that the universal property amounts to an equivalence

\displaystyle  f(n) = m \Leftrightarrow n = g(m)

involving the two functions {f = (/2)} and {g = (\times 2)}, which are in some sense inverses. This pattern will crop up over and over again.

The division example involved an equivalence between the two identities {f(n)=m} and {n=g(m)}. More generally, a relation other than “{=}” might be involved. Extending the previous example to integer division, rounding down, we have for {k>0}:

\displaystyle  n \div k \ge m \Leftrightarrow n \ge m \times k

Again, this relates the two (in some sense inverse) functions {(\div k)} and {(\times k)}; but this time equality is inadequate for stating the problem, and it is perhaps more convincing to claim that a more complicated problem {(\div k)} has been translated into a simpler one {(\times k)}. What is more, translating the problem via this universal property pays dividends when it comes to reasoning about the problem, because the simpler problem space is much more amenable to calculation. For example, properties of repeated division {(n \div k) \div l} (for {k,l>0}) do not trip off the tongue; but we can reason straightforwardly as follows:

\displaystyle  \begin{array}{ll} & (n \div k) \div l \ge m \\ \Leftrightarrow & \qquad \{ \mbox{universal property} \} \\ & n \div k \ge m \times l \\ \Leftrightarrow & \qquad \{ \mbox{universal property} \} \\ & n \ge (m \times l) \times k \\ \Leftrightarrow & \qquad \{ \mbox{multiplication is associative} \} \\ & n \ge m \times (l \times k) \\ \Leftrightarrow & \qquad \{ \mbox{universal property} \} \\ & n \div (l \times k) \ge m \end{array}

Thus, {(n \div k) \div l \ge m} precisely when {n \div (l \times k) \ge m}, or in other words, {(n \div k) \div l = n \div (l \times k)}.
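
The law is easily sanity-checked in Haskell too (a test, not a proof; this sketch again assumes the QuickCheck library):

    import Test.QuickCheck

    -- (n `div` k) `div` l == n `div` (l * k), for n >= 0 and k, l > 0
    prop_divDiv :: NonNegative Integer -> Positive Integer -> Positive Integer -> Bool
    prop_divDiv (NonNegative n) (Positive k) (Positive l) =
      (n `div` k) `div` l == n `div` (l * k)

    main :: IO ()
    main = quickCheck prop_divDiv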

In this case, the two problem spaces have both involved the same relation {\ge} on the same domain, namely the natural numbers; that is not essential. For example, the universal property of the floor function {\lfloor\cdot\rfloor} from reals to integers is given by:

\displaystyle  \mathit{inj}(n) \le_R x \Leftrightarrow n \le_I \lfloor x \rfloor

where, to be completely explicit, we have written {\le_R} for the usual ordering on reals and {\le_I} for the corresponding ordering on integers, and {\mathit{inj}} for the injection from the integers into the reals. This time the two problem spaces involve two different orderings on different domains; we say that the pair of functions {\mathit{inj}} and {\lfloor\cdot\rfloor} form a Galois connection between the orderings {\le_R} and {\le_I}. (We also see that the relationship between the two functions {\mathit{inj}} and {\lfloor\cdot\rfloor} is becoming less like a pure inverse relationship, and more of an embedding–projection pair.)

As a simple non-arithmetical example of a Galois connection on a single domain, consider some set {U} and a fixed subset {X \subseteq U}; then

\displaystyle  A \cap X \subseteq B \Leftrightarrow A \subseteq B \cup \overline{X}

That is, {(\cap X)} and {(\cup \overline{X})} form a Galois connection between {\subseteq} and itself.

A non-arithmetical example between two different domains is afforded by the field of formal concept analysis, which relates “objects” and their “properties”. Given are sets {O} of objects and {P} of properties, and a relation {(\vdash) \subseteq O \times P}; we write {o \mathrel{\vdash} p} to denote that object {o} has property {p}. This induces “concept-forming operators” {\mathit{intent} : 2^O \rightarrow 2^P} and {\mathit{extent} : 2^P \rightarrow 2^O} defined by:

\displaystyle  \begin{array}{lcl} \mathit{intent}(E) &=& \{ p \in P \mid \forall o \in E .\; o \mathrel{\vdash} p \} \\ \mathit{extent}(I) &=& \{ o \in O \mid \forall p \in I .\; o \mathrel{\vdash} p \} \end{array}

That is, {\mathit{intent}(E)} is the set of properties enjoyed by all objects in {E}, and {\mathit{extent}(I)} is the set of objects enjoying all the properties in {I}; a concept is a pair {(E,I)} with {\mathit{intent}(E) = I} and {\mathit{extent}(I) = E}. The concept-forming operators form a Galois connection between {\subseteq} and {\supseteq}:

\displaystyle  \begin{array}{ll} & \mathit{extent}(I) \supseteq E \\ \Leftrightarrow& \qquad \{ \mbox{characteristic of~} \mathit{extent} \} \\ & \forall o \in E .\; (\forall p \in I .\; o \mathrel{\vdash} p) \\ \Leftrightarrow& \qquad \{ \mbox{commuting quantifiers} \} \\ & \forall p \in I .\; (\forall o \in E .\; o \mathrel{\vdash} p) \\ \Leftrightarrow& \qquad \{ \mbox{characteristic of~} \mathit{intent} \} \\ & I \subseteq \mathit{intent}(E) \end{array}

This construction can be used to translate a problem about the extension of a concept (that is, an enumeration of its instances) into one about the intension (that is, the characteristic properties of its instances). It is related to the observation that “syntax and semantics are adjoint”—under the analogy that “objects” are sets of mathematical structures, “properties” are axioms, and the relation is “satisfaction”, the models of an axiomatic theory {T} are included in a set of structures {S} if and only if the theory {T} logically entails the minimal axiomatization of {S}.
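
For a small executable model of the concept-forming operators, one can work with finite lists and an explicitly enumerated relation (all of the names here are illustrative):

    -- objects o, properties p, and the relation as a list of related pairs
    type Rel o p = [(o, p)]

    intent :: (Eq o, Eq p) => Rel o p -> [p] -> [o] -> [p]
    intent rel ps es = [ p | p <- ps, and [ (o, p) `elem` rel | o <- es ] ]

    extent :: (Eq o, Eq p) => Rel o p -> [o] -> [p] -> [o]
    extent rel os is = [ o | o <- os, and [ (o, p) `elem` rel | p <- is ] ]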


Lenses are the coalgebras for the costate comonad

I took part in the Dagstuhl Seminar on Bidirectional Transformations “BX” earlier this month. It was a meeting of people from four communities—databases, graph transformations, programming languages, and software engineering—discussing their various perspectives—namely the view–update problem in databases, triple graph grammars, lenses, and model synchronization—on the common problem of “BX”.

While there, I reported on a marvellous observation made by Russell O’Connor, that lenses are exactly the coalgebras for the costate comonad. That is, the independently identified notion of a “very well-behaved lens” in the work of Pierce and others coincides exactly with the categorical notion of a “coalgebra” for a particular comonad, the “costate” comonad. I’ll unpack that claim here.

Lenses

Pierce’s lenses are pairs of functions between “source” and “view” datatypes {S} and {V}: a “get” function {g : S \rightarrow V} and a “put” function {p : S \times V \rightarrow S}. The story is that the view is some projection of the data in the source—perhaps a subset of the data, or the data in a simpler format—and so in order to update the source given a modified view, one needs also a copy of the original source from which to reconstruct the missing information.

For these two functions to capture a “well-behaved” lens, they should satisfy the so-called Get–Put and Put–Get laws:

\displaystyle  \begin{array}{lcl} p\,(s,g\,s) &=& s \\ g\,(p\,(s,v)) &=& v \end{array}

The Get–Put law says that if you “get” a view of the source, and then “put” it straight back without modifying it, the source remains unmodified: a no-op edit on the view translates into a no-op on the source. The Put–Get law says that if you “put” any view into a source and then “get” it back, you end up with the view you first thought of: nothing is lost from the view when it is put back.

Additionally, for these two functions to capture a “very well-behaved” lens, they must satisfy a third law, the Put–Put law:

\displaystyle  \begin{array}{lcl} p\,(p\,(s,v),u) &=& p\,(s,u) \end{array}

In words, “put”ting back two views {v} then {u} is equivalent to “put”ting back just the second; any changes to the source from putting back {v} are completely overwritten when putting back {u}. (This turns out to be rather a strong condition, requiring that the source basically factors into the view and a completely independent “complement”; few real applications of bidirectional transformation satisfy it. But that’s another story.)
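
In Haskell, such a lens is simply the pair of functions. Here is a sketch of the type, together with one inhabitant satisfying all three laws: the lens onto the first component of a pair, whose “complement” is exactly the second component.

    data Lens s v = Lens { get :: s -> v, put :: (s, v) -> s }

    -- view the first component; the second component is the complement
    fstLens :: Lens (a, b) a
    fstLens = Lens { get = fst, put = \((_, b), a) -> (a, b) }

    -- GetPut: put (s, get s) = s
    -- PutGet: get (put (s, v)) = v
    -- PutPut: put (put (s, v), u) = put (s, u)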

The costate comonad

Intuitively, comonads capture “data in context”. A comonad {(D,\mathit{extr},\mathit{dupl})} consists of a functor {D} together with two natural transformations {\mathit{extr} : D \mathbin{\stackrel{.}{\to}} \mathsf{Id}} and {\mathit{dupl} : D \mathbin{\stackrel{.}{\to}} DD} that extract the data from its context and duplicate the context, satisfying the three axioms:

\displaystyle  \begin{array}{lcl} \mathit{extr} \cdot \mathit{dupl} &=& \mathit{id} \\ \mathit{fmap}\,\mathit{extr} \cdot \mathit{dupl} &=& \mathit{id} \\ \mathit{fmap}\,\mathit{dupl} \cdot \mathit{dupl} &=& \mathit{dupl} \cdot \mathit{dupl} \end{array}

One example of a comonad is the “costate” construction: for fixed {V}, define functor {D} by

\displaystyle  \begin{array}{lcl} D\,A &=& V \times (V \rightarrow A) \end{array}

so that the “map” function for {D} satisfies {\mathit{fmap}\,h\,(v,f) = (v, h \cdot f)}. The operations are given by

\displaystyle  \begin{array}{lcl} \mathit{extr}\,(v,f) &=& f\,v \\ \mathit{dupl}\,(v,f) &=& (v, \lambda u \rightarrow (u,f)) \end{array}

Verifying that these definitions satisfy the comonad axioms is left as an exercise for the interested reader.

(Incidentally, I think it’s called the “costate” comonad more because it is the dual {(V\times)\cdot(V\rightarrow)} of the “state” monad {(V\rightarrow)\cdot(V\times)}, rather than because it has anything to do with stateful computations. However, it does model state in the sense of stored variables; and indeed, Russell O’Connor’s blog posting calls {D} the “store” comonad.)
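
Here is the same construction as a Haskell sketch (the names mirror the text rather than any particular library):

    -- the costate ("store") comonad for a fixed type v
    data Store v a = Store v (v -> a)

    instance Functor (Store v) where
      fmap h (Store v f) = Store v (h . f)

    extr :: Store v a -> a
    extr (Store v f) = f v

    dupl :: Store v a -> Store v (Store v a)
    dupl (Store v f) = Store v (\u -> Store u f)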

Coalgebras of a comonad

For a functor {F}, an {F}-coalgebra is a pair {(A,f)} of a type {A} and a function {f : A \rightarrow F\,A}. A “coalgebra for a comonad {D}” is a {D}-coalgebra that interacts well with the operations {\mathit{extr}} and {\mathit{dupl}} of the comonad; that is, the function {f} should also satisfy the laws:

\displaystyle  \begin{array}{lcl} \mathit{extr} \cdot f &=& \mathit{id} \\ \mathit{dupl} \cdot f &=& \mathit{fmap}\,f \cdot f \end{array}

(Another incidentally: I don’t have a feeling for what these laws mean, in the way that I do for the laws of an algebra of a monad. At least for the free monads {T} that represent terms with free variables, an algebra is a pair {(A, f : T\,A \rightarrow A)} such that {f} makes sense as an “expression evaluator”—it respects singleton variables and substitution. It’s clear to me that the laws of a coalgebra for a comonad are the obvious duals of those for the algebra of a monad; and that they describe the interesting ways of putting together the coalgebra operation with the comonad operations; but I still don’t have a direct intuition. Any comments gratefully received!)

Lenses are coalgebras of the costate comonad

Now it’s just a matter of putting the pieces together. Curry the “put” function of a lens to obtain {p : S \rightarrow (V \rightarrow S)}, and define a lens to be the fork of the “get” and “put” functions:

\displaystyle  \begin{array}{lcl} \ell\,s &=& (g\,s, p\,s) \end{array}

Note that now {\ell : S \rightarrow D\,S} where {D} is the costate comonad. The Get–Put law is equivalent to the counit axiom of the coalgebra:

\displaystyle  \begin{array}{ll} & \mathit{extr} \cdot \ell = \mathit{id} \\ \Leftrightarrow & \qquad \{ \mbox{apply to an~} s \} \\ & \mathit{extr}\,(\ell\,s) = s \\ \Leftrightarrow & \qquad \{ \ell \} \\ & \mathit{extr}\,(g\,s, p\,s) = s \\ \Leftrightarrow & \qquad \{ \mathit{extr} \} \\ & p\,s\,(g\,s) = s \end{array}

And the Put–Get and Put–Put laws together are equivalent to the coassociativity axiom:

\displaystyle  \begin{array}{ll} & \mathit{dupl} \cdot \ell = \mathit{fmap}\,\ell \cdot \ell \\ \Leftrightarrow & \qquad \{ \mbox{apply to an~} s \} \\ & \mathit{dupl}\,(\ell\,s) = \mathit{fmap}\,\ell\,(\ell\,s) \\ \Leftrightarrow & \qquad \{ \ell \} \\ & \mathit{dupl}\,(g\,s, p\,s) = \mathit{fmap}\,\ell\,(g\,s, p\,s) \\ \Leftrightarrow & \qquad \{ \mathit{fmap} \mbox{~for~} D \} \\ & \mathit{dupl}\,(g\,s, p\,s) = (g\,s, \ell \cdot p\,s) \\ \Leftrightarrow & \qquad \{ \mathit{dupl} \} \\ & (g\,s, \lambda v \rightarrow (v, p\,s)) = (g\,s, \ell \cdot p\,s) \\ \Leftrightarrow & \qquad \{ \mbox{first components are clearly equal} \} \\ & \lambda v \rightarrow (v, p\,s) = \ell \cdot p\,s \\ \Leftrightarrow & \qquad \{ \mbox{apply to a~} v \} \\ & (v, p\,s) = \ell\,(p\,s\,v) \\ \Leftrightarrow & \qquad \{ \ell \} \\ & (v, p\,s) = (g\,(p\,s\,v), p\,(p\,s\,v)) \\ \Leftrightarrow & \qquad \{ \mbox{apply second components to a~} u \} \\ & (v, p\,s\,u) = (g\,(p\,s\,v), p\,(p\,s\,v)\,u) \\ \Leftrightarrow & \qquad \{ \mbox{pair equality is pointwise} \} \\ & v = g\,(p\,s\,v) \land p\,s\,u = p\,(p\,s\,v)\,u \end{array}
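
In code, the packaging-up step is a one-liner; using the hypothetical Lens and Store types from the sketches above:

    -- curry the "put", and fork it with the "get"
    ell :: Lens s v -> (s -> Store v s)
    ell (Lens g p) s = Store (g s) (\v -> p (s, v))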


The stream monad

I read quite a nice problem on Nick Wu’s blog, which will serve as a fine warm-up exercise. It’s about the fact that streams (infinite lists) form a monad, in a different way from lists. Nick shows the “right” and two “wrong” definitions of the join or bind operation, distinguishing them on the basis of the monad laws. But I think Nick’s proofs are more complicated than they need to be, because he hasn’t fully exploited the recursion patterns that underlie his definitions.

This post will involve some language that we have not yet covered. Fear not! I hope it will be clear from context. But in case it isn’t, you might want to take a look at some of the background material (especially the paper Calculating Functional Programs).

Streams

Like Nick, for simplicity we will take the datatype of streams to be a synonym for lists; in all that follows, assume that lists are properly infinite (not finite, or partial).

\displaystyle  \mathbf{type}\;\mathit{Stream}\,a = [a]

Streams are naturally a codatatype rather than a datatype: in the category of sets and total functions, they would be represented as a final coalgebra rather than an initial algebra. In Haskell, which is roughly based on the category of CPOs and continuous functions, initial algebras and final coalgebras coincide, so we need not (indeed, we cannot) make the distinction formally. But we can make it informally, by stipulating that the basic pattern of computation for streams is the {\mathit{unfold}}:

\displaystyle  \begin{array}{l} \mathit{unfold} :: (b \rightarrow (a,b)) \rightarrow b \rightarrow \mathit{Stream}\,a \\ \mathit{unfold}\,f\,b = a : \mathit{unfold}\,f\,b' \qquad\mathbf{where}\qquad (a,b') = f\,b \end{array}

{\mathit{unfold}\,f} generates a stream from a seed, using the body {f} that transforms a seed {b} into an element {a} and a new seed {b'}. For example, the map function for streams uses the input stream as the seed, repeatedly splitting it into its head and tail:

\displaystyle  \begin{array}{l} \mathit{mapS} :: (a \rightarrow b) \rightarrow \mathit{Stream}\,a \rightarrow \mathit{Stream}\,b \\ \mathit{mapS}\,f = \mathit{unfold}\,(\mathit{fork}\,(f \cdot \mathit{head}, \mathit{tail})) \end{array}

where {\mathit{fork}} applies two functions to the same argument:

\displaystyle  \mathit{fork}\,(f,g)\,a = (f\,a, g\,a)

The crucial property of {\mathit{unfold}} is its universal property, which provides necessary and sufficient conditions for a computation to be expressible as an instance of {\mathit{unfold}}:

\displaystyle  h = \mathit{unfold}\,f \Leftrightarrow \mathit{out} \cdot h = \mathit{prod}\,(\mathit{id},h) \cdot f

where {\mathit{out} = \mathit{fork}\,(\mathit{head},\mathit{tail})} deconstructs a stream into its head and tail, and

\displaystyle  \mathit{prod}\,(f,g)\,(a,b) = (f\,a, g\,b)

From the universal property, one can easily (exercise!) prove three simple consequences (we’ll call them the “identity” and two “evaluation” rules):

\displaystyle  \begin{array}{l} \mathit{unfold}\,\mathit{out} = \mathit{id} \\ \mathit{head} \cdot \mathit{unfold}\,(\mathit{fork}\,(f,g)) = f \\ \mathit{tail} \cdot \mathit{unfold}\,(\mathit{fork}\,(f,g)) = \mathit{unfold}\,(\mathit{fork}\,(f,g)) \cdot g \end{array}

and the very important fusion law:

\displaystyle  \mathit{unfold}\,f \cdot h = \mathit{unfold}\,g \Leftarrow f \cdot h = \mathit{prod}\,(\mathit{id},h) \cdot g

allowing a preceding function {h} to be absorbed into the unfold.

Streams as a monad

Making streams a monad amounts to defining functions

\displaystyle  \begin{array}{lcl} \mathit{return} &::& a \rightarrow \mathit{Stream}\,a \\ \mathit{join} &::& \mathit{Stream}\,(\mathit{Stream}\,a) \rightarrow \mathit{Stream}\,a \end{array}

satisfying the monad laws:

\displaystyle  \begin{array}{lcl} \mathit{join} \cdot \mathit{return} &=& \mathit{id} \\ \mathit{join} \cdot \mathit{mapS}\,\mathit{return} &=& \mathit{id} \\ \mathit{join} \cdot \mathit{mapS}\,\mathit{join} &=& \mathit{join} \cdot \mathit{join} \end{array}

Looking at the type, the obvious (indeed, I think the only possible) definition one can give for {\mathit{return}} is {\mathit{return} = \mathit{repeat}} where

\displaystyle  \mathit{repeat} = \mathit{unfold}\,\mathit{double}

and {\mathit{double} = \mathit{fork}\,(\mathit{id},\mathit{id})} makes two copies of its argument. However, there are many type-correct definitions one could give for {\mathit{join}}, including {\mathit{head}}, {\mathit{mapS}\,\mathit{head}}, and {\mathit{diag}}, where

\displaystyle  \mathit{diag} = \mathit{unfold}\,\mathit{hhtt}

and where (for brevity in what follows) we define

\displaystyle  \begin{array}{lcl} \mathit{hhtt} &=& \mathit{fork}\,(\mathit{hh},\mathit{tt}) \\ \mathit{hh} &=& \mathit{head}\cdot\mathit{head} \\ \mathit{tt} &=& \mathit{tail}\cdot\mathit{mapS}\,\mathit{tail} \end{array}

Obviously, {\mathit{head}} yields the first “row” of a stream of streams (if one considers it in row-major order), and {\mathit{mapS}\,\mathit{head}} yields the first column; as the name suggests, {\mathit{diag}} yields the leading diagonal. Nick’s post demonstrates that the first two, although type-correct, do not satisfy the monad laws. He also provides a proof that the third does, which we turn to next.
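
All of these definitions transcribe directly into runnable Haskell, on built-in lists assumed infinite; the diagonal can then be sampled at the prompt:

    unfold :: (b -> (a, b)) -> b -> [a]
    unfold f b = a : unfold f b' where (a, b') = f b

    fork :: (a -> b, a -> c) -> a -> (b, c)
    fork (f, g) a = (f a, g a)

    mapS :: (a -> b) -> [a] -> [b]
    mapS f = unfold (fork (f . head, tail))

    repeat' :: a -> [a]                 -- the return of the monad
    repeat' = unfold (fork (id, id))

    diag :: [[a]] -> [a]                -- the join of the monad
    diag = unfold (fork (head . head, tail . mapS tail))

    -- ghci> take 5 (diag [ [ (i, j) | j <- [0 ..] ] | i <- [0 ..] ])
    -- [(0,0),(1,1),(2,2),(3,3),(4,4)]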

Checking the monad laws

The proofs that {\mathit{repeat}} and {\mathit{diag}} satisfy the three monad laws are very straightforward, using the universal property of {\mathit{unfold}} and its consequences.

For the first monad law, fusion gives us the condition to check:

\displaystyle  \begin{array}{ll} & \mathit{diag}\cdot\mathit{repeat} = \mathit{id} \\ \Leftarrow & \\ & \mathit{hhtt}\cdot\mathit{repeat} = \mathit{prod}\,(\mathit{id},\mathit{repeat})\cdot\mathit{fork}\,(\mathit{head},\mathit{tail}) \end{array}

Working on the right-hand side, we have:

\displaystyle  \begin{array}{ll} & \mathit{hhtt}\cdot\mathit{repeat} \\ = & \qquad \{ \mbox{definition} \} \\ & \mathit{fork}\,(\mathit{hh},\mathit{tt})\cdot\mathit{repeat} \\ = & \qquad \{ \mbox{composition distributes backwards over fork} \} \\ & \mathit{fork}\,(\mathit{hh}\cdot\mathit{repeat},\mathit{tt}\cdot\mathit{repeat}) \\ = & \qquad \{ \mbox{definitions} \} \\ & \mathit{fork}\,(\mathit{head}\cdot\mathit{head}\cdot\mathit{repeat},\mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}\cdot\mathit{repeat}) \\ = & \qquad \{ \mbox{evaluation for~} \mathit{repeat} \} \\ & \mathit{fork}\,(\mathit{head},\mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}\cdot\mathit{repeat}) \\ = & \qquad \{ \mbox{naturality; evaluation} \} \\ & \mathit{fork}\,(\mathit{head},\mathit{repeat}\cdot\mathit{tail}) \\ = & \qquad \{ \mbox{pairs} \} \\ & \mathit{prod}\,(\mathit{id},\mathit{repeat}) \cdot \mathit{fork}\,(\mathit{head},\mathit{tail}) \end{array}

discharging the proof obligation.

Similarly, for the second monad law, fusion gives us the condition:

\displaystyle  \begin{array}{ll} & \mathit{diag}\cdot\mathit{mapS}\,\mathit{repeat} = \mathit{id} \\ \Leftarrow & \\ & \mathit{hhtt}\cdot\mathit{mapS}\,\mathit{repeat} = \mathit{prod}\,(\mathit{id},\mathit{mapS}\,\mathit{repeat})\cdot\mathit{fork}\,(\mathit{head},\mathit{tail}) \end{array}

and working on the right-hand side, in almost exactly the same steps we get:

\displaystyle  \begin{array}{ll} & \mathit{hhtt}\cdot\mathit{mapS}\,\mathit{repeat} \\ = & \qquad \{ \mbox{definition} \} \\ & \mathit{fork}\,(\mathit{hh},\mathit{tt})\cdot\mathit{mapS}\,\mathit{repeat} \\ = & \qquad \{ \mbox{composition distributes backwards over fork} \} \\ & \mathit{fork}\,(\mathit{hh}\cdot\mathit{mapS}\,\mathit{repeat},\mathit{tt}\cdot\mathit{mapS}\,\mathit{repeat}) \\ = & \qquad \{ \mbox{definitions} \} \\ & \mathit{fork}\,(\mathit{head}\cdot\mathit{head}\cdot\mathit{mapS}\,\mathit{repeat},\mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}\cdot\mathit{mapS}\,\mathit{repeat}) \\ = & \qquad \{ \mbox{naturality; evaluation} \} \\ & \mathit{fork}\,(\mathit{head},\mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}\cdot\mathit{mapS}\,\mathit{repeat}) \\ = & \qquad \{ \mbox{naturality; functors; evaluation} \} \\ & \mathit{fork}\,(\mathit{head},\mathit{mapS}\,\mathit{repeat}\cdot\mathit{tail}) \\ = & \qquad \{ \mbox{pairs} \} \\ & \mathit{prod}\,(\mathit{id},\mathit{mapS}\,\mathit{repeat}) \cdot \mathit{fork}\,(\mathit{head},\mathit{tail}) \end{array}

discharging the obligation.

What about the third monad law? To apply the universal property (or fusion), we need one side to be expressed as an unfold; but neither side of the equation {\mathit{diag}\cdot\mathit{diag} = \mathit{diag}\cdot\mathit{mapS}\,\mathit{diag}} is in that form. No matter; let us hypothesize that one side—say, the left—can be expressed in the form {\mathit{unfold}\,h} for some {h}, then calculate a suitable definition for {h} (if one exists). Assuming we succeed, then we can use fusion to check that the other side equals {\mathit{unfold}\,h}. (This strategy doesn’t work if we can find no such {h}!)

Again, fusion gives us

\displaystyle  \mathit{diag}\cdot\mathit{diag} = \mathit{unfold}\,h \Leftarrow \mathit{hhtt}\cdot\mathit{diag} = \mathit{prod}\,(\mathit{id},\mathit{diag})\cdot h

so we calculate:

\displaystyle  \begin{array}{ll} & \mathit{hhtt}\cdot\mathit{diag} \\ = & \qquad \{ \mbox{definition; distribution} \} \\ & \mathit{fork}\,(\mathit{head}\cdot\mathit{head}\cdot\mathit{diag},\mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}\cdot\mathit{diag}) \\ = & \qquad \{ \mbox{evaluation for~} \mathit{diag} \} \\ & \mathit{fork}\,(\mathit{head}\cdot\mathit{head}\cdot\mathit{head},\mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}\cdot\mathit{diag}) \\ = & \qquad \{ \mbox{naturality; evaluation} \} \\ & \mathit{fork}\,(\mathit{head}\cdot\mathit{head}\cdot\mathit{head},\mathit{diag}\cdot\mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}\cdot\mathit{mapS}\,(\mathit{mapS}\,\mathit{tail})) \\ = & \qquad \{ \mbox{pairs} \} \\ & \mathit{prod}\,(\mathit{id},\mathit{diag})\cdot\mathit{fork}\,(\mathit{head}\cdot\mathit{head}\cdot\mathit{head},\mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}\cdot\mathit{mapS}\,(\mathit{mapS}\,\mathit{tail})) \end{array}

Therefore, letting

\displaystyle  \mathit{hhhttt} = \mathit{fork}\,(\mathit{head}\cdot\mathit{head}\cdot\mathit{head},\mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}\cdot\mathit{mapS}\,(\mathit{mapS}\,\mathit{tail}))

we have concluded that

\displaystyle  \mathit{diag}\cdot\mathit{diag} = \mathit{unfold}\,\mathit{hhhttt}

Now all we have to do is to check that the right-hand side of the third monad law also equals this; fusion gives us the condition

\displaystyle  \begin{array}{ll} & \mathit{diag}\cdot\mathit{mapS}\,\mathit{diag} = \mathit{unfold}\,\mathit{hhhttt} \\ \Leftarrow & \\ & \mathit{hhtt}\cdot\mathit{mapS}\,\mathit{diag} = \mathit{prod}\,(\mathit{id},\mathit{mapS}\,\mathit{diag})\cdot \mathit{hhhttt} \end{array}

and we calculate on the right-hand side:

\displaystyle  \begin{array}{ll} & \mathit{hhtt}\cdot\mathit{mapS}\,\mathit{diag} \\ = & \qquad \{ \mbox{definition; distribution} \} \\ & \mathit{fork}\,(\mathit{head}\cdot\mathit{head}\cdot\mathit{mapS}\,\mathit{diag},\mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}\cdot\mathit{mapS}\,\mathit{diag}) \\ = & \qquad \{ \mbox{naturality; evaluation} \} \\ & \mathit{fork}\,(\mathit{head}\cdot\mathit{head}\cdot\mathit{head},\mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}\cdot\mathit{mapS}\,\mathit{diag}) \\ = & \qquad \{ \mbox{functors; naturality; evaluation} \} \\ & \mathit{fork}\,(\mathit{head}\cdot\mathit{head}\cdot\mathit{head},\mathit{mapS}\,\mathit{diag}\cdot\mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}\cdot\mathit{mapS}\,(\mathit{mapS}\,\mathit{tail})) \\ = & \qquad \{ \mbox{pairs; definition} \} \\ & \mathit{prod}\,(\mathit{id},\mathit{mapS}\,\mathit{diag})\cdot\mathit{hhhttt} \end{array}

completing the proof.

As you can see, the calculations are all quite short and simple, whereas in Nick’s formulation they were rather hard work; I think that was (a) because he wasn’t exploiting the universal property, and (b) because he was working in terms of the “bind” rather than the “join” of the monad, which forced him into a more pointwise rather than pointfree style. Points are helpful when writing programs, but less so when reasoning about them.

Deducing {\mathit{diag}}

Here’s another way of looking at the problem. Nick’s blog presented three plausible (that is, type-correct) definitions for the {\mathit{join}} operation. Two of these didn’t satisfy the necessary laws, so were evidently wrong. The third, {\mathit{diag}}, does satisfy the laws, but is it the only possible definition that does? I believe that it is the only solution in the form of an unfold; but I only have a hand-waving argument as to why.

Let us suppose that indeed

\displaystyle  \mathit{join} = \mathit{unfold}\,k

for some {k}. Without loss of generality, let us suppose also that

\displaystyle  k = \mathit{fork}\,(k_1,k_2)

with

\displaystyle  \begin{array}{lcl} k_1 &::& \mathit{Stream}\,(\mathit{Stream}\,a) \rightarrow a \\ k_2 &::& \mathit{Stream}\,(\mathit{Stream}\,a) \rightarrow \mathit{Stream}\,(\mathit{Stream}\,a) \end{array}

I claimed above that {\mathit{repeat}} is the only type-correct definition of the {\mathit{return}} operation. (Ignoring bottoms, that is; which is to say, in Haskell, every type-correct definition is an approximation to {\mathit{repeat}} in the definedness ordering.)

Consideration of just the first two monad laws gives us some constraints on {k}, since we know that {\mathit{return} = \mathit{repeat}}. Since the identity on streams is itself an unfold, {\mathit{id} = \mathit{unfold}\,(\mathit{fork}\,(\mathit{head},\mathit{tail}))}, fusion applied to {\mathit{join}\cdot\mathit{return} = \mathit{id}} and {\mathit{join}\cdot\mathit{mapS}\,\mathit{return} = \mathit{id}} gives:

\displaystyle  \begin{array}{lcl} k\cdot\mathit{repeat} &=& \mathit{prod}\,(\mathit{id},\mathit{repeat})\cdot\mathit{fork}\,(\mathit{head},\mathit{tail}) \\ k\cdot\mathit{mapS}\,\mathit{repeat} &=& \mathit{prod}\,(\mathit{id},\mathit{mapS}\,\mathit{repeat})\cdot\mathit{fork}\,(\mathit{head},\mathit{tail}) \end{array}

Or in terms of {k}’s two components,

\displaystyle  \begin{array}{lcll} k_1\cdot\mathit{repeat} &=& \mathit{head} &(1)\\ k_2\cdot\mathit{repeat} &=& \mathit{repeat}\cdot\mathit{tail} &(2)\\ k_1\cdot\mathit{mapS}\,\mathit{repeat} &=& \mathit{head} &(3)\\ k_2\cdot\mathit{mapS}\,\mathit{repeat} &=& \mathit{mapS}\,\mathit{repeat}\cdot\mathit{tail} &(4) \end{array}
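As a quick sanity check in the list-based sketch above (and anticipating the answer deduced below), the components {k_1 = \mathit{head}\cdot\mathit{head}} and {k_2 = \mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}} do satisfy (1)–(4) on finite prefixes:

    -- Checking (1)-(4) for the candidates k1 = head . head
    -- and k2 = tail . map tail, on finite windows.
    k1 :: [[a]] -> a
    k1 = head . head

    k2 :: [[a]] -> [[a]]
    k2 = tail . map tail

    xs :: [Int]
    xs = [0 ..]

    cut :: [[a]] -> [[a]]   -- a 10-by-10 window on a stream of streams
    cut = map (take 10) . take 10

    eqns :: Bool
    eqns = k1 (repeat xs) == head xs                               -- (1)
        && cut (k2 (repeat xs)) == cut (repeat (tail xs))          -- (2)
        && k1 (map repeat xs) == head xs                           -- (3)
        && cut (k2 (map repeat xs)) == cut (map repeat (tail xs))  -- (4)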

I claim that (1) entails that {k_1} picks some element out of the first “column” of a stream of streams (thinking of the input as an infinite matrix in row-major order again): the equation says that when the input consists of infinitely many copies of the same stream, {k_1} picks (one of the many copies of) the head of that stream. Symmetrically, (3) entails that, when given an infinite matrix whose columns are all equal, {k_1} picks some element out of the first “row”. And because {k_1} has to be polymorphic, it cannot behave differently on special matrices like these from how it behaves in general. Putting those statements together, and waving my hands in the air, I conclude that {k_1} picks the only element that is in both the first row and the first column:

\displaystyle  k_1 = \mathit{head} \cdot \mathit{head}

Similarly, Equation (2) says that, given an infinite input matrix all of whose rows are equal, {k_2} drops the first column (and possibly some of the rows are duplicated or dropped, and the order of the rows may change; but the elements of the rows are untouched). Symmetrically, (4) says that, given an input whose columns are all equal, {k_2} drops the first row (and may duplicate, drop, or rearrange the columns, but not change any of them). And again, the behaviour in general must be consistent with these special cases. Putting these observations together, {k_2} must drop the first row and the first column, and cannot change any of the remainder of the matrix:

\displaystyle  k_2 = \mathit{tail} \cdot \mathit{mapS}\,\mathit{tail}
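Note that {\mathit{fork}\,(k_1,k_2)} with these components is precisely {\mathit{hhtt}}, so the deduced {\mathit{join} = \mathit{unfold}\,k} is exactly {\mathit{diag}}. In the sketch, the first two monad laws then also check out on finite prefixes:

    -- The deduced join coincides with diag, and satisfies
    -- join . return = id and join . mapS return = id on finite prefixes:
    joinS :: [[a]] -> [a]
    joinS = unfold (fork (k1, k2))   -- identical to diag

    laws :: Bool
    laws = take 10 (joinS (repeat xs))     == take 10 xs
        && take 10 (joinS (map repeat xs)) == take 10 xs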

What is the right framework in which to present such arguments more formally? It feels rather like Paul Hoogendijk’s relational approach to generic programming, which has to talk about largest natural transformations of a given type: the relational setting provides the conjunction one needs in order to express the two separate constraints on {k_1}.
