Extreme solutions

Not all problem statements are amenable to translation via Galois connections or adjunctions. A reasonable characterization of those that are suitable is as optimization problems—finding the least or greatest solution satisfying a given collection of constraints, according to some ordering.

Least and greatest solutions

For example, the Galois connection determining integer division that we considered a couple of posts ago

\displaystyle  n \div k \ge m \Leftrightarrow n \ge m \times k

defines {n \div k} to be the greatest solution {m} to the inequation {n \ge m \times k}. It does so in a very pithy way: reading the equivalence as an implication from left to right, instantiating {m} to {n \div k} (and exploiting the reflexivity of the ordering {\ge}), we get that {n \ge (n \div k) \times k}, so {n \div k} is indeed a solution to the inequation on the right; reading the equivalence from right to left, we get that {n \div k \ge m} for any solution {m}, so {n \div k} is in fact the greatest solution.
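
To make the “greatest solution” reading concrete, here is a minimal Haskell sketch (entirely my own, for illustration): {\mathit{divFloor}} finds the greatest such {m} by brute-force search, assuming {n \ge 0} and {k > 0}.

divFloor :: Integer -> Integer -> Integer
divFloor n k = last [ m | m <- [0 .. n], n >= m * k ]
-- e.g. divFloor 7 2 == 3: indeed 7 >= 3 * 2, but not 7 >= 4 * 2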

Similarly, the characterization of the floor function {\lfloor\cdot\rfloor} from reals to integers

\displaystyle  \mathit{inj}(n) \le_R x \Leftrightarrow n \le_I \lfloor x \rfloor

defines {\lfloor x \rfloor} as the greatest integer {n} for which {\mathit{inj}(n) \le_R x}, and the Galois connection involving {\cap} and {\cup}

\displaystyle  A \cap X \subseteq B \Leftrightarrow A \subseteq B \cup \overline{X}

characterizes {B \cup \overline{X}} as the greatest set {A} (under the usual subset ordering) for which {A \cap X \subseteq B}.

Limits and colimits

The characterization of greatest solutions might be equivalently expressed in terms of greatest lower bounds. Given a preordered set {(X,\le)}, and a subset {Y} of {X}, an element {x \in X} is a lower bound of {Y} in {X} if {x \le y} for every {y \in Y}; in addition, {x} is a greatest lower bound {\mathrm{inf}(Y)} of {Y} if {x' \le x} for any other lower bound {x'}. (Note “a” rather than “the”, as there may be multiple such. But they are all related by {\le}; if the ordering is a partial order, the {\mathrm{inf}} is unique when it exists. Note also that {\mathrm{inf}(Y)} need not be in {Y} itself, even when it does exist.)

This construction can be phrased in terms of Galois connections as follows. The two ordered sets are {(X,\le)} and {(\mathsf{P}^+(X),\le^\ast)}, where {\mathsf{P}^+(X)} is the set of nonempty subsets of {X}, with ordering {\le^\ast} defined pointwise: {Y \le^\ast Z} iff {x \le x'} for all {x \in Y, x' \in Z}. The mappings in either direction are the singleton set former {\{\cdot\} : X \rightarrow \mathsf{P}^+(X)} and greatest lower bound {\mathrm{inf} : \mathsf{P}^+(X) \rightarrow X}, related by the Galois connection {x \le \mathrm{inf}(Y) \Leftrightarrow \{x\} \le^\ast Y}. Here’s how it looks with {x' \le x = \mathrm{inf}(Y)} and {Y = \{ x_0,x_1,x_2 \}}:

The categorical perspective on greatest lower bounds is the notion of limit; it’s just the generalization of the diagram above to an arbitrary category. Here is a very brief outline. The fragment of the diagram consisting of {x, x_0, x_1, x_2} is called a cone, from vertex {x} to base {x_0,x_1,x_2} (and so is {x',x_0,x_1,x_2}). The cone {x,x_0,x_1,x_2} is called a limit when, for any other cone from vertex {x'} to the same base, there is a unique arrow {x' \rightarrow x} making the diagram commute.

Commutativity of the diagram above isn’t very interesting—because the category is a partial order, but also because the base is degenerate: just three discrete objects. In general, the base will also contain arrows; then a cone consists of a vertex ({X} in the diagram below) with arrows to each of the objects in the base ({f_i : X \rightarrow X_i}) making the diagram commute ({g_0 \cdot f_0 = f_1}, etc). As before, the cone from vertex {X} is a limit if any other cone factors uniquely through it.

Of course, it all dualizes beautifully. The categorical perspective on least upper bounds is expressed in terms of cones from a base ({X_0,X_1,X_2} below) to a vertex ({X}); such a cone is a colimit if any other cone from the same base to another vertex ({X'}) factors uniquely through it.

Initial algebras and final coalgebras

Recall that, for a functor {\mathsf{F}}, an {\mathsf{F}}-algebra is a pair {(X,f)} consisting of an object {X} and an arrow {f : \mathsf{F}(X) \rightarrow X}. A homomorphism between {\mathsf{F}}-algebras {(X,f)} and {(Y,g)} is an arrow {h : X \rightarrow Y} such that {h \cdot f = g \cdot \mathsf{F}(h)}.

The {\mathsf{F}}-algebra {(X,f)} is initial if there is a unique such {h} for each {(Y,g)}. We usually write {\mu\mathsf{F}} for the “carrier” of this initial algebra (because it is the “least fixed point” of {\mathsf{F}}, as we shall see below), and {\mathit{in} : \mathsf{F}(\mu\mathsf{F}) \rightarrow \mu\mathsf{F}} for the “constructor” (and indeed, it is an isomorphism, so a constructed piece of data can be deconstructed again); we write {h=\mathit{fold}(g)} for the unique {h} such that {h \cdot \mathit{in} = g \cdot \mathsf{F}(h)}.
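
In Haskell terms the construction might be rendered as follows (a sketch only; Haskell’s semantics blurs the initial/final distinction, a point taken up in the stream monad post below):

newtype Mu f = In (f (Mu f))   -- mu F, with constructor In

-- fold g is the unique homomorphism to the algebra (a, g);
-- by construction it satisfies fold g . In = g . fmap (fold g)
fold :: Functor f => (f a -> a) -> Mu f -> a
fold g (In x) = g (fmap (fold g) x)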

As you might expect, “initial” things are extreme solutions too, albeit not in a very interesting way. An initial object in a category is an object from which there is a unique arrow (often written “{!}”) to any other object. An initial object is a colimit of the diagram generated from the empty category—which has no objects, and hence no arrows either. (Any object forms the vertex of a (trivial) cone, so the colimiting vertex is simply one from which there is a unique arrow to any other vertex, with no additional constraints.) In particular, an initial {\mathsf{F}}-algebra is an initial object in the category of {\mathsf{F}}-algebras, whose objects are {\mathsf{F}}-algebras and whose arrows are homomorphisms between them.

And of course, it all dualizes nicely, to final coalgebras, which are in some sense “greatest fixed points” of functors; final objects are the vertices of limiting cones to the empty base.
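
Dually again, a minimal Haskell sketch of the final coalgebra:

newtype Nu f = Nu { out :: f (Nu f) }   -- nu F, with destructor out

-- unfold g is the unique homomorphism from the coalgebra (b, g);
-- it satisfies out . unfold g = fmap (unfold g) . g
unfold :: Functor f => (b -> f b) -> b -> Nu f
unfold g = Nu . fmap (unfold g) . g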

Extreme (co-)algebras as (co-)limits

Here is a more illuminating presentation of initial algebras as extreme solutions, explaining rather better in what way they correspond to “least fixed points” of functors. (The construction is well known; I’ve based this presentation on a manuscript by François Métayer.) Initial algebras can be constructed as an instance of the colimit construction above, in which the base consists of a countable chain of objects and arrows:

In the category {\mathbb{S}\mathrm{et}}, every such chain has a colimit (categories with this property are called {\omega}-categories).

If the category has an initial object {0}, then any endofunctor {\mathsf{F}} induces such a countable chain:

Under mild assumptions, the colimit of this chain is (the carrier of) an initial {\mathsf{F}}-algebra. (Besides assuming an {\omega}-category with an initial object, we have to assume that {\mathsf{F}} is {\omega}-cocontinuous—that is, that it transforms the colimit {X} of the countable chain {X_0 \rightarrow X_1 \rightarrow \cdots} into a colimit {\mathsf{F}(X)} of the countable chain {\mathsf{F}(X_0) \rightarrow \mathsf{F}(X_1) \rightarrow \cdots}. One can show that any polynomial functor—one built from constants using sum and product—is {\omega}-cocontinuous.)

The construction goes as follows. By assumption, the countable chain {0 \rightarrow \mathsf{F}(0) \rightarrow \cdots} has a colimit; let’s suggestively call the vertex {\mu \mathsf{F}}, so that the edges {u_i : \mathsf{F}^i(0) \rightarrow \mu\mathsf{F}} satisfy {u_{i+1} \cdot \mathsf{F}^i(!) = u_i} for each {i}.

Since {\mathsf{F}} is {\omega}-cocontinuous, it transforms this diagram into another colimiting cone, with base shifted one place to the right and vertex {\mathsf{F}(\mu \mathsf{F})}. But {\mu \mathsf{F}} is the vertex of another cone over the same shifted base; and since {\mathsf{F}(\mu \mathsf{F})} is the colimit, there is a unique arrow—let’s suggestively call it {\mathit{in}}—making the diagram below commute ({\mathit{in} \cdot \mathsf{F}(u_i) = u_{i+1}} etc).

All we have to do now is to show that {(\mu\mathsf{F}, \mathit{in})} is indeed the initial {\mathsf{F}}-algebra, as claimed. Suppose we are given another {\mathsf{F}}-algebra {(Y,g)}; we will (i) construct an arrow {h : \mu\mathsf{F} \rightarrow Y}, (ii) show that it is a homomorphism between the algebras, {h \cdot \mathit{in} = g \cdot \mathsf{F}(h)}, and (iii) show that it is the only such.

For (i), given the target {\mathsf{F}}-algebra {(Y,g)}, we can construct a square as follows:

which commutes by virtue of the initiality of {0}. Applying {\mathsf{F}} to this square yields another, which can be pasted alongside; and this can be repeated indefinitely, yielding the following ladder:

Then we can pick out arrows {\xi_i = \mathsf{F}^0(g) \cdot \mathsf{F}^1(g) \cdots \mathsf{F}^{i-1}(g) \cdot \mathsf{F}^i(!_Y) : \mathsf{F}^i(0) \rightarrow Y}. Moreover, these {\xi_i}s commute with the base of the colimit diagram ({\xi_{i+1} \cdot \mathsf{F}^i(!) = \xi_i}, etc) to yield another cone to vertex {Y}; we therefore conclude that there is a unique {h : \mu\mathsf{F} \rightarrow Y} such that {h \cdot u_i = \xi_i} for each {i}.

Now for (ii). Note that {Y} and the {\xi_{i+1}} also form a cone over the shifted base starting from {\mathsf{F}(0)}; and because {\mathsf{F}(\mu\mathsf{F})} is the colimit from this shifted base, we also get a unique mediating arrow {k : \mathsf{F}(\mu\mathsf{F}) \rightarrow Y} such that {k \cdot \mathsf{F}(u_i) = \xi_{i+1}} for each {i}.

Moreover, both {h \cdot \mathit{in}} and {g \cdot \mathsf{F}(h)} are also such mediating arrows:

\displaystyle  h \cdot \mathit{in} \cdot \mathsf{F}(u_i) = h \cdot u_{i+1} = \xi_{i+1} = g \cdot \mathsf{F}(\xi_i) = g \cdot \mathsf{F}(h \cdot u_i) = g \cdot \mathsf{F}(h) \cdot \mathsf{F}(u_i)

so both must equal {k} and hence also each other: {h \cdot \mathit{in} = g \cdot \mathsf{F}(h)}.

Finally, for (iii), suppose we have another {h' : \mu\mathsf{F} \rightarrow Y} for which {h' \cdot \mathit{in} = g \cdot \mathsf{F}(h')}; we have to show that {h' = h}. By the uniqueness of the mediating arrow, it suffices to show that {h \cdot u_i = \xi_i} for each {i}, which is easily done by induction.

That is, given {\mathsf{F}}-algebra {(Y,g)}, there exists a unique {h : \mu\mathsf{F} \rightarrow Y} (for which we write “{\mathit{fold}(g)}“) such that {h \cdot \mathit{in} = g \cdot \mathsf{F}(h)}. If you squint at this in the right way, you can see the inductive definition of the recursive datatype, and of the folds over it. Each {\mathsf{F}^i(0)} is an approximation to {\mu\mathsf{F}}, cut off at depth {i}; they all embed into {\mu\mathsf{F}}, and indeed, {\mu\mathsf{F}} is the least extension—the colimit—of them all. Each {\xi_i} is an approximation to {\mathit{fold}(g)}, again restricted to data structures cut off at depth {i}, and {\mathit{fold}(g)} is the completion of all the {\xi_i}.
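
As a concrete illustration (the names and the choice of functor are mine), here is the start of the chain for {\mathsf{F} = \mathsf{Maybe}}, whose initial algebra is the natural numbers; {\mathsf{F}^i(0)} has exactly {i} inhabitants, namely the naturals below {i}:

import Data.Void (Void, absurd)

-- the chain 0 -> F(0) -> F^2(0) -> ... for F = Maybe
type F0 = Void
type F1 = Maybe F0
type F2 = Maybe F1

newtype Nat = In (Maybe Nat)   -- mu Maybe, the colimit of the chain

-- the edges u_i : F^i(0) -> mu F of the colimiting cone
u0 :: F0 -> Nat
u0 = absurd                    -- the unique arrow ! out of the initial object
u1 :: F1 -> Nat
u1 = In . fmap u0
u2 :: F2 -> Nat
u2 = In . fmap u1              -- and so on: u_{i+1} = In . fmap u_i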

Naturally, it all dualizes for final coalgebras: then we need “cochains” {1 \leftarrow \mathsf{F}(1) \leftarrow \mathsf{F}^2(1) \leftarrow \cdots} to a terminal object {1}; an {\omega^{\mathrm{op}}}-category is one in which all such countable cochains have a limit; {\omega}-continuous functors preserve limits of countable cochains. (It is a bit unfortunate that the interesting extreme algebra, namely the initial algebra, is a colimit, whereas the final coalgebra is a limit, but sometimes life is like that.)

(Co-)limits as adjunctions

The definition of limits can be made more concise and precise by noting that the base corresponds to the image of some functor {\mathsf{F} : \mathbb{J} \rightarrow \mathbb{C}}, where {\mathbb{C}} is the category of interest, and index category {\mathbb{J}} determines the shape of the base—for each object {j} of {\mathbb{J}}, there is an object {\mathsf{F}(j)} of {\mathbb{C}} in the base ({X_j} in the diagram below), and for each arrow {a : i \rightarrow j} of {\mathbb{J}}, an arrow {\mathsf{F}(a) : \mathsf{F}(i) \rightarrow \mathsf{F}(j)} of {\mathbb{C}} in the base ({g_a : X_i \rightarrow X_j} below).

(In the {\mathrm{inf}} diagram, the index category is the discrete category on three objects—with no arrows other than identity arrows. In the diagram above, {\mathbb{J}} is the span {\bullet \rightarrow \bullet \leftarrow \bullet}, with three objects and two generating arrows. In the construction of initial algebras, the index category is {\omega = \bullet \rightarrow \bullet \rightarrow \cdots}, equivalent to the usual {\le} ordering on the natural numbers, whereas for final coalgebras it is {\bullet \leftarrow \bullet \leftarrow \cdots}, equivalent to {\ge} on natural numbers.)

The vertex {X} too can be seen as the image of {\mathbb{J}} under a particular, degenerate functor—the diagonal functor {\Delta X : \mathbb{J} \rightarrow \mathbb{C}}, defined by {\Delta X(j) = X} for each object {j} of {\mathbb{J}}, and {\Delta X(a) = \mathit{id}_X} for each arrow {a}. Then “the cone {f} from vertex {X} to base {\mathsf{F}}” corresponds to a natural transformation {f : \Delta X \mathbin{\stackrel{.}{\to}} \mathsf{F}}: naturality is exactly the condition that the cone commutes. We write “{\mathrm{Lim}\,\mathsf{F}}” for the limiting object, {X}; its universal property is that, for any cone {f'} from {X'} to {\mathsf{F}}, there exists a unique {h : X' \rightarrow X} such that {f_i \cdot h = f'_i} for each {i}. In other words, there is a (natural) isomorphism between the natural transformations {\Delta X' \mathbin{\stackrel{.}{\to}} \mathsf{F}} and the arrows {X' \rightarrow \mathrm{Lim}\,\mathsf{F}}; that is, an adjunction {\Delta \dashv \mathrm{Lim}}, with limit being right adjoint to the diagonal.

Dually, of course, colimits turn out to be left adjoints: the whole construction is encapsulated in three symbols, {\mathrm{Colim} \dashv \Delta}.

Adjunctions


Universal properties are a generalization of the notion of a Galois connection between two orderings. Or perhaps I should say: universal properties arise from adjunctions, and it is adjunctions that are a generalization of Galois connections. Adjunctions capture in an abstract categorical setting the idea of “optimal solutions to a problem”; and this idea is itself very general, capturing many of the structures underlying common patterns in programming (not to mention the rest of mathematics). Solutions to equations, products, limits of sequences of approximations, and minimality and maximality are just some of the instances of this powerful abstraction that we will make use of. In the preface to Categories for the Working Mathematician, Mac Lane wrote that “adjoint functors arise everywhere”.

Adjoint functors

Two functors {\mathsf{F} : \mathbb{D} \leadsto \mathbb{C}} and {\mathsf{G} : \mathbb{C} \leadsto \mathbb{D}} form an adjunction, written {\mathsf{F} \dashv \mathsf{G}}, if there is an isomorphism, natural in {A} and {B}, between the sets of arrows {\mathsf{F}(B) \rightarrow A} in {\mathbb{C}} and {B \rightarrow \mathsf{G}(A)} in {\mathbb{D}}. We say that {\mathsf{F}} is the left adjoint and {\mathsf{G}} the right adjoint. The essence of the isomorphism is captured by two natural transformations {\eta : \mathsf{Id} \mathbin{\stackrel{.}{\to}} \mathsf{G} \mathbin{\cdot} \mathsf{F}} in {\mathbb{D}} and {\epsilon : \mathsf{F} \mathbin{\cdot} \mathsf{G} \mathbin{\stackrel{.}{\to}} \mathsf{Id}} in {\mathbb{C}}, called the unit and counit of the adjunction; {\eta_B} is the image in {\mathbb{D}} of {\mathit{id}_{\mathsf{F}(B)} : \mathsf{F}(B) \rightarrow \mathsf{F}(B)} in {\mathbb{C}}, and conversely, {\epsilon_A} is the image in {\mathbb{C}} of {\mathit{id}_{\mathsf{G}(A)}} in {\mathbb{D}}. The unit and counit satisfy the laws

\displaystyle  \begin{array}{lcl} \epsilon_{\mathsf{F}(B)} \cdot \mathsf{F}(\eta_B) &=& \mathit{id}_{\mathsf{F}(B)} \\ \mathsf{G}(\epsilon_A) \cdot \eta_{\mathsf{G}(A)} &=& \mathit{id}_{\mathsf{G}(A)} \end{array}

From them one can construct the witnesses to the isomorphism for arbitrary arrows: for each arrow {f : \mathsf{F}(B) \rightarrow A} in {\mathbb{C}}, there is a unique arrow {g : B \rightarrow \mathsf{G}(A)} in {\mathbb{D}} such that {\epsilon_A \cdot \mathsf{F}(g) = f}, given by {g = \mathsf{G}(f) \cdot \eta_B}; and conversely, for each arrow {g : B \rightarrow \mathsf{G}(A)} in {\mathbb{D}}, there is a unique arrow {f : \mathsf{F}(B) \rightarrow A} in {\mathbb{C}} such that {\mathsf{G}(f) \cdot \eta_B = g}, given by {f = \epsilon_A \cdot \mathsf{F}(g)}; and moreover, these two constructions are each other’s inverses.
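
In Haskell, the same data might be sketched as a type class, restricted to endofunctors on the category of Haskell types (a formulation of my own; the adjunctions package on Hackage provides a similar class):

{-# LANGUAGE MultiParamTypeClasses #-}

class (Functor f, Functor g) => Adjunction f g where
  unit   :: a -> g (f a)            -- eta
  counit :: f (g a) -> a            -- epsilon

-- the two witnesses to the isomorphism, as constructed above:
-- leftAdjunct f = G(f) . eta_B, and rightAdjunct g = epsilon_A . F(g)
leftAdjunct :: Adjunction f g => (f b -> a) -> (b -> g a)
leftAdjunct f = fmap f . unit

rightAdjunct :: Adjunction f g => (b -> g a) -> (f b -> a)
rightAdjunct g = counit . fmap g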

Adjunctions from Galois connections

A preorder {(X,{\le})} forms a category: the objects of the category are the elements of the set~{X}, and between any two elements {x,y \in X}, there is a unique arrow if {x \le y}, and no arrow otherwise. That adjunctions are a generalization of Galois connections follows straightforwardly from the fact that there is at most one arrow between any two objects in a preorder category. Then monotonic functions {f : (X,{\le_X}) \rightarrow (Y,{\le_Y})} and {g : (Y,{\le_Y}) \rightarrow (X,{\le_X})} between preorders {(X,{\le_X})} and {(Y,{\le_Y})} form a Galois connection precisely if the sets of arrows {f(y) \rightarrow x} and {y \rightarrow g(x)} are isomorphic—that is, if both {f(y) \le_X x} and {y \le_Y g(x)} hold, or neither do, or in other words,

\displaystyle  f(y) \le_X x \Leftrightarrow y \le_Y g(x)

Adjoints of the diagonal functor

A very useful example of adjunctions arises in the definition of products—in the category {\mathbb{S}\mathrm{et}} of sets and total functions, for given types {A,B,C}, there is an isomorphism between the set of pair-generating functions, of type {A \rightarrow B \times C}, and the set of pairs of functions of types {A \rightarrow B} and {A \rightarrow C}. (Indeed, given functions {f:A \rightarrow B} and {g:A \rightarrow C}, one can construct the pair-generating function {\mathit{fork}(f,g) : A \rightarrow B \times C}; and conversely, given a pair-generating function {h : A \rightarrow B \times C}, one can construct its two projections {\mathit{fst} \cdot h : A \rightarrow B} and {\mathit{snd} \cdot h : A \rightarrow C}; and moreover, these two constructions are inverses.)

The “isomorphism between sets of arrows” can be elegantly expressed as an adjunction; since it concerns pairs of arrows, one side of the adjunction involves the product category {\mathbb{S}\mathrm{et} \times \mathbb{S}\mathrm{et}}. The right adjoint is the product functor {(\times) : \mathbb{S}\mathrm{et} \times \mathbb{S}\mathrm{et} \leadsto \mathbb{S}\mathrm{et}}, mapping an object in {\mathbb{S}\mathrm{et} \times \mathbb{S}\mathrm{et}}—that is, a pair of sets—to their cartesian product as an object in {\mathbb{S}\mathrm{et}}, and an arrow in {\mathbb{S}\mathrm{et} \times \mathbb{S}\mathrm{et}}—that is, a parallel pair of functions—to a function in {\mathbb{S}\mathrm{et}} acting pointwise on pairs. In the other direction, the left adjoint is the diagonal functor {\triangle : \mathbb{S}\mathrm{et} \leadsto \mathbb{S}\mathrm{et} \times \mathbb{S}\mathrm{et}}, mapping an object {A} in {\mathbb{S}\mathrm{et}} to the object {(A,A)} in {\mathbb{S}\mathrm{et} \times \mathbb{S}\mathrm{et}}, and a function {f} to the pair of functions {(f,f)} as an arrow in {\mathbb{S}\mathrm{et} \times \mathbb{S}\mathrm{et}}. The adjunction {{\triangle} \dashv (\times)} amounts to the isomorphism

\displaystyle  \triangle A \rightarrow (B,C) \approx A \rightarrow {\times} (B,C)

or equivalently,

\displaystyle  (A \rightarrow B)\times(A \rightarrow C) \approx A \rightarrow (B\times C)

The unit and counit of the adjunction are {\eta : \mathsf{Id} \mathbin{\stackrel{.}{\to}} (\times) \mathbin{\cdot} \triangle} and {\epsilon : \triangle \mathbin{\cdot} (\times) \mathbin{\stackrel{.}{\to}} \mathsf{Id}}. In more familiar terms, the unit is a natural transformation in {\mathbb{S}\mathrm{et}}, so a polymorphic function; in fact, it’s the function of type {A \rightarrow A \times A} that we might call {\mathit{double}}. However, the counit is a natural transformation {(A \times B,A \times B) \rightarrow (A,B)} in {\mathbb{S}\mathrm{et} \times \mathbb{S}\mathrm{et}}, so not simply a (polymorphic) function; but arrows in {\mathbb{S}\mathrm{et} \times \mathbb{S}\mathrm{et}} are pairs of functions, so we might write this {(\mathit{fst},\mathit{snd}) :: (A \times B \rightarrow A, A \times B \rightarrow B)}.

Then the “fork” operation is in fact one of the two witnesses to the isomorphism between the sets of arrows: given an arrow {\triangle A \rightarrow (B,C)} in {\mathbb{S}\mathrm{et} \times \mathbb{S}\mathrm{et}}, that is, a pair {(f,g)} of functions of types {(A \rightarrow B,A \rightarrow C)}, then {\mathit{fork}(f,g)} is an arrow {A \rightarrow {\times} (B,C)} in {\mathbb{S}\mathrm{et}}, that is, a function of type {A \rightarrow B \times C}, given by the construction above:

\displaystyle  \mathit{fork}(f,g) = (\times) (f,g) \cdot \mathit{double}

or, with more points,

\displaystyle  \mathit{fork} (f,g)\,a = (f\,a, g\,a)

The laws that the unit and counit satisfy are

\displaystyle  \begin{array}{lcl} (\mathit{fst},\mathit{snd}) \cdot \triangle \mathit{double} &=& \mathit{id} \\ (\times) (\mathit{fst},\mathit{snd}) \cdot \mathit{double} &=& \mathit{id} \end{array}

or, in more familiar terms,

\displaystyle  \begin{array}{lcl} \mathit{fst} \cdot \mathit{double} &=& \mathit{id} \\ \mathit{snd} \cdot \mathit{double} &=& \mathit{id} \\ \mathit{fork} (\mathit{fst},\mathit{snd}) &=& \mathit{id} \end{array}

The universal property of {\mathit{fork}} follows from the isomorphism between sets of arrows:

\displaystyle  \begin{array}{ll} & h = \mathit{fork}(f,g) \\ \Leftrightarrow & \qquad \{ \mathit{fork} \} \\ & h = (\times) (f,g) \cdot \mathit{double} \\ \Leftrightarrow & \qquad \{ \mbox{isomorphism between arrow sets} \} \\ & (\mathit{fst},\mathit{snd}) \cdot \triangle h = (f,g) \\ \Leftrightarrow & \qquad \{ \triangle \} \\ & (\mathit{fst},\mathit{snd}) \cdot (h,h) = (f,g) \\ \Leftrightarrow & \qquad \{ \mbox{composition in~} \mathbb{S}\mathrm{et} \times \mathbb{S}\mathrm{et} \mbox{~is pointwise} \} \\ & (\mathit{fst} \cdot h,\mathit{snd} \cdot h) = (f,g) \\ \Leftrightarrow & \qquad \{ \mbox{equality of pairs is pointwise} \} \\ & \mathit{fst} \cdot h=f \land \mathit{snd} \cdot h=g \end{array}

The universal property of {\mathit{fork}} underlies all the useful laws of that operator.
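
For reference, here is one way the operators involved might be transcribed into Haskell, uncurried to follow the notation of this series (the standard library provides curried versions as (&&&) and (***) in Control.Arrow):

fork :: (a -> b, a -> c) -> a -> (b, c)
fork (f, g) a = (f a, g a)

prod :: (a -> c, b -> d) -> (a, b) -> (c, d)
prod (f, g) (a, b) = (f a, g b)

double :: a -> (a, a)        -- the unit of the adjunction
double a = (a, a)

-- the universal property, for given h, f and g:
--   h == fork (f, g)  <=>  (fst . h == f) && (snd . h == g)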

Of course, the situation nicely dualizes too. Coproducts in {\mathbb{S}\mathrm{et}} arise from the isomorphism between the set of arrows {A+B \rightarrow C} and the set of pairs of arrows {A \rightarrow C} and {B \rightarrow C}. Again, “pairs of arrows” suggest the product category; but this time, the diagonal functor is the right adjoint, with the coproduct functor {(+) : \mathbb{S}\mathrm{et} \times \mathbb{S}\mathrm{et} \rightarrow \mathbb{S}\mathrm{et}} (which takes a pair of sets {(A,B)} to their disjoint union) as the left adjoint. That is, the adjunction is {(+) \dashv \triangle}, and the isomorphism is

\displaystyle  (+) (A,B) \rightarrow C \approx (A,B) \rightarrow \triangle C

The unit {\eta : \mathsf{Id} \mathbin{\stackrel{.}{\to}} \triangle \mathbin{\cdot} (+)} is a natural transformation {(A,B) \rightarrow (A+B,A+B)} in {\mathbb{S}\mathrm{et} \times \mathbb{S}\mathrm{et}}, that is, a pair of functions {\mathit{inl} : A \rightarrow A+B} and {\mathit{inr} : B \rightarrow A+B}. The counit {\epsilon : (+) \mathbin{\cdot} \triangle \mathbin{\stackrel{.}{\to}} \mathsf{Id}} is a natural transformation {A+A \rightarrow A} in {\mathbb{S}\mathrm{et}}, which we might call {\mathit{merge}}. The “join” of two functions with a common range is a witness to one half of the isomorphism—given an arrow {(f,g) : (A,B) \rightarrow \triangle C} in {\mathbb{S}\mathrm{et} \times \mathbb{S}\mathrm{et}}, then {\mathit{join} (f,g)} is an arrow {(+) (A,B) \rightarrow C} in {\mathbb{S}\mathrm{et}}, defined by

\displaystyle  \mathit{join} (f,g) = \mathit{merge} \cdot (+) (f,g)

The two laws that the unit and counit satisfy are:

\displaystyle  \begin{array}{lcl} \mathit{merge} \cdot (+) (\mathit{inl},\mathit{inr}) &=& \mathit{id} \\ \triangle \mathit{merge} \cdot (\mathit{inl},\mathit{inr}) &=& \mathit{id} \end{array}

or, perhaps more perspicuously,

\displaystyle  \begin{array}{lcl} \mathit{join} (\mathit{inl},\mathit{inr}) &=& \mathit{id} \\ \mathit{merge} \cdot \mathit{inl} &=& \mathit{id} \\ \mathit{merge} \cdot \mathit{inr} &=& \mathit{id} \end{array}
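
Transcribed into Haskell, with Either playing the role of the disjoint union ({\mathit{plus}} is my name for the action of {(+)} on a pair of arrows; {\mathit{join}} is an uncurried version of the Prelude’s either):

merge :: Either a a -> a      -- the counit
merge (Left a)  = a
merge (Right a) = a

plus :: (a -> c, b -> d) -> Either a b -> Either c d
plus (f, _) (Left a)  = Left (f a)
plus (_, g) (Right b) = Right (g b)

join :: (a -> c, b -> c) -> Either a b -> c
join (f, g) = merge . plus (f, g)   -- join (f,g) = merge . (+) (f,g)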

Another familiar example from functional programming is the notion of currying, which arises when one can construct the function space {A \Rightarrow B} (the type of functions from {A} to {B}, for each type {A} and {B}), such that there is an isomorphism between the sets of arrows {A \rightarrow (B \Rightarrow C)} and {A \times B \rightarrow C}. Here, the adjunction is {( \times B) \dashv (B \Rightarrow )}—in this case, both functors are endofunctors on {\mathbb{S}\mathrm{et}}. The unit and counit are natural transformations {\mathsf{Id} \mathbin{\stackrel{.}{\to}} (B \Rightarrow )\mathbin{\cdot}( \times B)} and {( \times B)\mathbin{\cdot}(B \Rightarrow ) \mathbin{\stackrel{.}{\to}} \mathsf{Id}}. We might call these {\mathit{pair}} and {\mathit{apply}}, since the first is a curried pair-forming operator, and the second applies a function to an argument:

\displaystyle  \begin{array}{lcl} \mathit{pair} &:& A \rightarrow (B \Rightarrow (A \times B)) \\ \mathit{apply} &:& (B \Rightarrow A) \times B \rightarrow A \end{array}

The laws they satisfy are as follows:

\displaystyle  \begin{array}{lcl} \mathit{apply} \cdot ( \times B) \mathit{pair} &=& \mathit{id} \\ (B \Rightarrow )\mathit{apply} \cdot \mathit{pair} &=& \mathit{id} \end{array}

or, in points,

\displaystyle  \begin{array}{lcl} \mathit{apply} (\mathit{pair}\,a,b) &=& (a,b) \\ \mathit{apply} \cdot \mathit{pair}\,f &=& f \end{array}

The isomorphism itself is witnessed by the two inverse functions

\displaystyle  \begin{array}{lcl} \mathit{curry}\,f &=& (B \Rightarrow ) f \cdot \mathit{pair} \\ \mathit{uncurry}\,g &=& \mathit{apply} \cdot ( \times B) g \end{array}

where {f : A \times B \rightarrow C} and {g : A \rightarrow (B \Rightarrow C)}.
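
In Haskell (a sketch, with primed names to avoid clashing with the Prelude’s curry and uncurry):

pair :: a -> (b -> (a, b))       -- the unit
pair a = \b -> (a, b)

apply :: (b -> a, b) -> a        -- the counit
apply (f, b) = f b

curry' :: ((a, b) -> c) -> (a -> (b -> c))
curry' f = fmap f . pair         -- fmap for (b ->) is post-composition, i.e. (B =>) f

uncurry' :: (a -> (b -> c)) -> ((a, b) -> c)
uncurry' g = apply . (\(a, b) -> (g a, b))   -- the second factor is ( x B) g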


Universal properties and Galois connections

One recurring theme throughout this series will be that of a universal property—an identity that captures an indirect means of solving a problem, by transforming that problem into a different (and hopefully simpler) domain, while still preserving all its essential properties. In particular, the original problem has a solution if and only if the transformed problem does, and moreover, the solution to the transformed problem can easily be translated back into a solution to the original problem. One can see universal properties as a generalization of the notion of a Galois connection between two orderings, which is a similarly powerful technique for relating problems in two different settings. (In fact, the proper generalization of Galois connections is to adjunctions, but that’s a story for next time.)

Universal properties

The universal property of the {\mathit{fork}} operation for products is a representative example. Recall that {\mathit{fork}\,(f,g) :: a \rightarrow (b,c)} when {f :: a \rightarrow b} and {g :: a \rightarrow c}; and that {\mathit{fst} :: (b,c) \rightarrow b} and {\mathit{snd} :: (b,c) \rightarrow c}. Then {\mathit{fork}} is completely defined by its universal property:

\displaystyle  h = \mathit{fork}\,(f,g) \quad\Leftrightarrow\quad \mathit{fst} \cdot h = f \land \mathit{snd} \cdot h = g

This identity repays careful study.

  • It translates a problem in the more complex domain of products (namely, the problem of showing how some complicated expression {h} can be written in terms of {\mathit{fork}}) into simpler problems (here, equations about the two projections of {h}).
  • It’s an equivalence. So not only do you have an implication from left to right (any {h} expressible as a {\mathit{fork}} satisfies the two properties on the right), you also have one from right to left (any pair of functions {f,g} satisfying the two properties on the right induces a {\mathit{fork}}). In other words, {h} is a solution to the equation on the left iff it is a solution on the right; not only does a solution on the right yield a construction on the left, but also the absence of solutions on the right implies the absence on the left. Or again: the equations on the right have a unique solution in {h}—since any two solutions {h,h'} must both be equal to the same expression on the left.
  • It has many useful simple consequences. You can make the left-hand side trivially true by letting {h = \mathit{fork}\,(f,g)}; then the right-hand side must also be true:

    \displaystyle  \begin{array}{lcl} \mathit{fst} \cdot \mathit{fork}\,(f,g) &=& f \\ \mathit{snd} \cdot \mathit{fork}\,(f,g) &=& g \end{array}

    Symmetrically, you can make the right-hand side trivially true by letting {f = \mathit{fst} \cdot h} and {g = \mathit{snd} \cdot h}; then the left-hand side must also be true:

    \displaystyle  h = \mathit{fork}\,(\mathit{fst} \cdot h, \mathit{snd} \cdot h)

    If you further let {h = \mathit{id}}, you conclude that every pair consists solely of its two projections, nothing more:

    \displaystyle  \mathit{id} = \mathit{fork}\,(\mathit{fst}, \mathit{snd})

    In fact, the universal property of {\mathit{fork}} tells you everything you need to know about {\mathit{fork}}; you might take that as one justification for the term “universal”.

  • It also has many useful less obvious consequences. For example, if you’re searching for an {h} that acts independently on the two components of a pair—{\mathit{fst} \cdot h = h_1 \cdot \mathit{fst}} and {\mathit{snd} \cdot h = h_2 \cdot \mathit{snd}}—just let {f = h_1 \cdot \mathit{fst}} and {g = h_2 \cdot \mathit{snd}} in the universal property, and conclude

    \displaystyle  h = \mathit{fork}\,(h_1\cdot\mathit{fst}, h_2\cdot\mathit{snd})

    (which we’ve written “{\mathit{prod}\,(h_1,h_2)}” elsewhere). For another example, we can deduce a fusion law for {\mathit{fork}}: for what {f',g'} does the equation

    \displaystyle  \mathit{fork}\,(f,g) \cdot k = \mathit{fork}\,(f',g')

    hold? This matches the left-hand side of the universal property; expanding the right-hand side yields

    \displaystyle  \begin{array}{lclcl} f' &=& \mathit{fst}\cdot\mathit{fork}\,(f,g)\cdot k &=& f \cdot k \\ g' &=& \mathit{snd}\cdot\mathit{fork}\,(f,g)\cdot k &=& g \cdot k \end{array}

Such a rich harvest from so small a seed! (In fact, we will see later that an even smaller seed suffices.)

Galois connections

The same structures that occur in universal properties like that of {\mathit{fork}} above can also be seen in relationships between orderings. As a very simple example, consider the problem of dividing a natural number {n} by two, exactly; the universal property of a solution {m} to this problem is the equivalence

\displaystyle  n / 2 = m \Leftrightarrow n = m \times 2

That is, {m} is a solution to the problem “compute {n / 2}” precisely when {n = m \times 2}; both the existence and the identification of a solution to a problem expressed in terms of division have been translated to one in terms of multiplication—which is arguably a simpler setting. Note that the universal property amounts to an equivalence

\displaystyle  f(n) = m \Leftrightarrow n = g(m)

involving the two functions {f = (/2)} and {g = (\times 2)}, which are in some sense inverses. This pattern will crop up over and over again.

The division example involved an equivalence between the two identities {f(n)=m} and {n=g(m)}. More generally, another relation than “{=}” might be involved. Extending the previous example to integer division, rounding down, we have for {k>0}:

\displaystyle  n \div k \ge m \Leftrightarrow n \ge m \times k

Again, this relates the two (in some sense inverse) functions {(\div k)} and {(\times k)}; but this time equality is inadequate for stating the problem, and it is perhaps more convincing to claim that a more complicated problem {(\div k)} has been translated into a simpler one {(\times k)}. What is more, translating the problem via this universal property pays dividends when it comes to reasoning about the problem, because the simpler problem space is much more amenable to calculation. For example, properties of repeated division {(n \div k) \div l} (for {k,l>0}) do not trip off the tongue; but we can reason straightforwardly as follows:

\displaystyle  \begin{array}{ll} & (n \div k) \div l \ge m \\ \Leftrightarrow & \qquad \{ \mbox{universal property} \} \\ & n \div k \ge m \times l \\ \Leftrightarrow & \qquad \{ \mbox{universal property} \} \\ & n \ge (m \times l) \times k \\ \Leftrightarrow & \qquad \{ \mbox{multiplication is associative} \} \\ & n \ge m \times (l \times k) \\ \Leftrightarrow & \qquad \{ \mbox{universal property} \} \\ & n \div (l \times k) \ge m \end{array}

Thus, {(n \div k) \div l \ge m} precisely when {n \div (l \times k) \ge m}, or in other words, {(n \div k) \div l = n \div (l \times k)}.
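
The derived law is easy to test, for instance with QuickCheck (restricting to non-negative {n} and positive {k,l}, where Haskell’s div agrees with the {\div} above):

import Test.QuickCheck

prop_divdiv :: NonNegative Integer -> Positive Integer -> Positive Integer -> Bool
prop_divdiv (NonNegative n) (Positive k) (Positive l) =
  (n `div` k) `div` l == n `div` (l * k)

-- to run: quickCheck prop_divdiv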

In this case, the two problem spaces have both involved the same relation {\ge} on the same domain, namely the natural numbers; that is not essential. For example, the universal property of the floor function {\lfloor\cdot\rfloor} from reals to integers is given by:

\displaystyle  \mathit{inj}(n) \le_R x \Leftrightarrow n \le_I \lfloor x \rfloor

where, to be completely explicit, we have written {\le_R} for the usual ordering on reals and {\le_I} for the corresponding ordering on integers, and {\mathit{inj}} for the injection from the integers into the reals. This time the two problem spaces involve two different orderings on different domains; we say that the pair of functions {\mathit{inj}} and {\lfloor\cdot\rfloor} form a Galois connection between the orderings {\le_R} and {\le_I}. (We also see that the relationship between the two functions {\mathit{inj}} and {\lfloor\cdot\rfloor} is becoming less like a pure inverse relationship, and more of an embedding–projection pair.)

As a simple non-arithmetical example of a Galois connection on a single domain, consider some set {U} and a fixed subset {X \subseteq U}; then

\displaystyle  A \cap X \subseteq B \Leftrightarrow A \subseteq B \cup \overline{X}

That is, {(\cap X)} and {(\cup \overline{X})} form a Galois connection between {\subseteq} and itself.

A non-arithmetical example between two different domains is afforded by the field of formal concept analysis, which relates “objects” and their “properties”. Given are sets {O} of objects and {P} of properties, and a relation {(\vdash) \subseteq O \times P}; we write {o \mathrel{\vdash} p} to denote that object {o} has property {p}. This induces “concept-forming operators” {\mathit{intent} : 2^O \rightarrow 2^P} and {\mathit{extent} : 2^P \rightarrow 2^O} defined by:

\displaystyle  \begin{array}{lcl} \mathit{intent}(E) &=& \{ p \in P \mid \forall o \in E .\; o \mathrel{\vdash} p \} \\ \mathit{extent}(I) &=& \{ o \in O \mid \forall p \in I .\; o \mathrel{\vdash} p \} \end{array}

That is, {\mathit{intent}(E)} is the set of properties enjoyed by all objects in {E}, and {\mathit{extent}(I)} is the set of objects enjoying all the properties in {I}; a concept is a pair {(E,I)} with {\mathit{intent}(E) = I} and {\mathit{extent}(I) = E}. The concept-forming operators form a Galois connection between {\subseteq} and {\supseteq}:

\displaystyle  \begin{array}{ll} & \mathit{extent}(I) \supseteq E \\ \Leftrightarrow& \qquad \{ \mbox{characteristic of~} \mathit{extent} \} \\ & \forall o \in E .\; (\forall p \in I .\; o \mathrel{\vdash} p) \\ \Leftrightarrow& \qquad \{ \mbox{commuting quantifiers} \} \\ & \forall p \in I .\; (\forall o \in E .\; o \mathrel{\vdash} p) \\ \Leftrightarrow& \qquad \{ \mbox{characteristic of~} \mathit{intent} \} \\ & I \subseteq \mathit{intent}(E) \end{array}

This construction can be used to translate a problem about the extension of a concept (that is, an enumeration of its instances) into one about the intension (that is, the characteristic properties of its instances). It is related to the observation that “syntax and semantics are adjoint”—under the analogy that “objects” are sets of mathematical structures, “properties” are axioms, and the relation is “satisfaction”, the models of an axiomatic theory {T} are included in a set of structures {S} if and only if the theory {T} logically entails the minimal axiomatization of {S}.
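
Here is a small executable model of the concept-forming operators (entirely my own rendering, with finite sets as lists and the relation {\vdash} as a boolean-valued function):

-- the ambient sets of properties ps and objects os are passed in as lists
intent :: [p] -> (o -> p -> Bool) -> [o] -> [p]
intent ps holds es = [ p | p <- ps, all (\o -> holds o p) es ]

extent :: [o] -> (o -> p -> Bool) -> [p] -> [o]
extent os holds is = [ o | o <- os, all (\p -> holds o p) is ]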


Lenses are the coalgebras for the costate comonad

I took part in the Dagstuhl Seminar on Bidirectional Transformations “BX” earlier this month. It was a meeting of people from four communities—databases, graph transformations, programming languages, and software engineering—discussing their various perspectives—namely the view–update problem in databases, triple graph grammars, lenses, and model synchronization—on the common problem of “BX”.

While there, I reported on a marvellous observation made by Russell O’Connor, that lenses are exactly the coalgebras for the costate comonad. That is, the independently identified notion of a “very well-behaved lens” in the work of Pierce and others coincides exactly with the categorical notion of a “coalgebra” for a particular comonad, the “costate” comonad. I’ll unpack that claim here.

Lenses


Pierce’s lenses are pairs of functions between “source” and “view” datatypes {S} and {V}: a “get” function {g : S \rightarrow V} and a “put” function {p : S \times V \rightarrow S}. The story is that the view is some projection of the data in the source—perhaps a subset of the data, or the data in a simpler format—and so in order to update the source given a modified view, one needs also a copy of the original source from which to reconstruct the missing information.

For these two functions to capture a “well-behaved” lens, they should satisfy the so-called Get–Put and Put–Get laws:

\displaystyle  \begin{array}{lcl} p\,(s,g\,s) &=& s \\ g\,(p\,(s,v)) &=& v \end{array}

The Get–Put law says that if you “get” a view of the source, and then “put” it straight back without modifying it, the source remains unmodified: a no-op edit on the view translates into a no-op on the source. The Put–Get law says that if you “put” any view into a source and then “get” it back, you end up with the view you first thought of: nothing is lost from the view when it is put back.

Additionally, for these two functions to capture a “very well-behaved” lens, they must satisfy a third law, the Put–Put law:

\displaystyle  \begin{array}{lcl} p\,(p\,(s,v),u) &=& p\,(s,u) \end{array}

In words, “put”ting back two views {v} then {u} is equivalent to “put”ting back just the second; any changes to the source from putting back {v} are completely overwritten when putting back {u}. (This turns out to be rather a strong condition, requiring that the source basically factors into the view and a completely independent “complement”; few real applications of bidirectional transformation satisfy it. But that’s another story.)
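
For a concrete example (mine, not from the original post), take the source to be a pair and the view its first component; the second component is exactly the independent “complement” just mentioned, so all three laws hold:

get :: (a, b) -> a
get (a, _) = a

put :: ((a, b), a) -> (a, b)
put ((_, b), a') = (a', b)

-- Get-Put: put (s, get s)      == s
-- Put-Get: get (put (s, v))    == v
-- Put-Put: put (put (s, v), u) == put (s, u)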

The costate comonad

Intuitively, comonads capture “data in context”. A comonad {(D,\mathit{extr},\mathit{dupl})} consists of a functor {D} together with two natural transformations {\mathit{extr} : D \mathbin{\stackrel{.}{\to}} \mathsf{Id}} and {\mathit{dupl} : D \mathbin{\stackrel{.}{\to}} D \mathbin{\cdot} D} that extract the data from its context and duplicate the context, satisfying the three axioms:

\displaystyle  \begin{array}{lcl} \mathit{extr} \cdot \mathit{dupl} &=& \mathit{id} \\ \mathit{fmap}\,\mathit{extr} \cdot \mathit{dupl} &=& \mathit{id} \\ \mathit{fmap}\,\mathit{dupl} \cdot \mathit{dupl} &=& \mathit{dupl} \cdot \mathit{dupl} \end{array}

One example of a comonad is the “costate” construction: for fixed {V}, define functor {D} by

\displaystyle  \begin{array}{lcl} D\,A &=& V \times (V \rightarrow A) \end{array}

so that the “map” function for {D} satisfies {\mathit{fmap}\,h\,(v,f) = (v, h \cdot f)}. The operations are given by

\displaystyle  \begin{array}{lcl} \mathit{extr}\,(v,f) &=& f\,v \\ \mathit{dupl}\,(v,f) &=& (v, \lambda u \rightarrow (u,f)) \end{array}

Verifying that these definitions satisfy the comonad axioms is left as an exercise for the interested reader.
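
For readers who prefer to check in Haskell, here is one possible transcription (the comonad package on Hackage provides an equivalent Store type):

data D v a = D v (v -> a)

instance Functor (D v) where
  fmap h (D v f) = D v (h . f)

extr :: D v a -> a
extr (D v f) = f v

dupl :: D v a -> D v (D v a)
dupl (D v f) = D v (\u -> D u f)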

(Incidentally, I think it’s called the “costate” comonad more because it is the dual {(V\times)\cdot(V\rightarrow)} of the “state” monad {(V\rightarrow)\cdot(V\times)}, rather than because it has anything to do with stateful computations. However, it does model state in the sense of stored variables; and indeed, Russell O’Connor’s blog posting calls {D} the “store” comonad.)

Coalgebras of a comonad

For a functor {F}, an {F}-coalgebra is a pair {(A,f)} of a type {A} and a function {f : A \rightarrow F\,A}. A “coalgebra for a comonad {D}” is a {D}-coalgebra that interacts well with the operations {\mathit{extr}} and {\mathit{dupl}} of the comonad; that is, the function {f} should also satisfy the laws:

\displaystyle  \begin{array}{lcl} \mathit{extr} \cdot f &=& \mathit{id} \\ \mathit{dupl} \cdot f &=& \mathit{fmap}\,f \cdot f \end{array}

(Another incidentally: I don’t have a feeling for what these laws mean, in the way that I do for the laws of an algebra of a monad. At least for the free monads {T} that represent terms with free variables, an algebra is a pair {(A, f : T\,A \rightarrow A)} such that {f} makes sense as an “expression evaluator”—it respects singleton variables and substitution. It’s clear to me that the laws of a coalgebra for a comonad are the obvious duals of those for the algebra of a monad; and that they describe the interesting ways of putting together the coalgebra operation with the comonad operations; but I still don’t have a direct intuition. Any comments gratefully received!)

Lenses are coalgebras of the costate comonad

Now it’s just a matter of putting the pieces together. Curry the “put” function of a lens to obtain {p : S \rightarrow (V \rightarrow S)}, and define a lens to be the fork of the “get” and “put” functions:

\displaystyle  \begin{array}{lcl} \ell\,s &=& (g\,s, p\,s) \end{array}

Note that now {\ell : S \rightarrow D\,S} where {D} is the costate comonad. The Get–Put law is equivalent to the counit axiom of the coalgebra:

\displaystyle  \begin{array}{ll} & \mathit{extr} \cdot \ell = \mathit{id} \\ \Leftrightarrow & \qquad \{ \mbox{apply to an~} s \} \\ & \mathit{extr}\,(\ell\,s) = s \\ \Leftrightarrow & \qquad \{ \ell \} \\ & \mathit{extr}\,(g\,s, p\,s) = s \\ \Leftrightarrow & \qquad \{ \mathit{extr} \} \\ & p\,s\,(g\,s) = s \end{array}

And the Put–Get and Put–Put laws together are equivalent to the coassociativity axiom:

\displaystyle  \begin{array}{ll} & \mathit{dupl} \cdot \ell = \mathit{fmap}\,\ell \cdot \ell \\ \Leftrightarrow & \qquad \{ \mbox{apply to an~} s \} \\ & \mathit{dupl}\,(\ell\,s) = \mathit{fmap}\,\ell\,(\ell\,s) \\ \Leftrightarrow & \qquad \{ \ell \} \\ & \mathit{dupl}\,(g\,s, p\,s) = \mathit{fmap}\,\ell\,(g\,s, p\,s) \\ \Leftrightarrow & \qquad \{ \mathit{fmap} \mbox{~for~} D \} \\ & \mathit{dupl}\,(g\,s, p\,s) = (g\,s, \ell \cdot p\,s) \\ \Leftrightarrow & \qquad \{ \mathit{dupl} \} \\ & (g\,s, \lambda v \rightarrow (v, p\,s)) = (g\,s, \ell \cdot p\,s) \\ \Leftrightarrow & \qquad \{ \mbox{first components are clearly equal} \} \\ & \lambda v \rightarrow (v, p\,s) = \ell \cdot p\,s \\ \Leftrightarrow & \qquad \{ \mbox{apply to a~} v \} \\ & (v, p\,s) = \ell\,(p\,s\,v) \\ \Leftrightarrow & \qquad \{ \ell \} \\ & (v, p\,s) = (g\,(p\,s\,v), p\,(p\,s\,v)) \\ \Leftrightarrow & \qquad \{ \mbox{apply second components to a~} u \} \\ & (v, p\,s\,u) = (g\,(p\,s\,v), p\,(p\,s\,v)\,u) \\ \Leftrightarrow & \qquad \{ \mbox{pair equality is pointwise} \} \\ & v = g\,(p\,s\,v) \land p\,s\,u = p\,(p\,s\,v)\,u \end{array}
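
In Haskell, the packaging of a lens as a coalgebra is one line (a sketch, reusing the D, extr and dupl from the transcription above):

-- g and p are the get and curried put of some lens
lens :: (s -> v) -> (s -> v -> s) -> s -> D v s
lens g p s = D (g s) (p s)

-- e.g. for the pair lens, lens fst (\(_, b) a' -> (a', b));
-- the two coalgebra laws then unpack to exactly the three lens laws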


The stream monad

I read quite a nice problem on Nick Wu’s blog, which will serve as a fine warm-up exercise. It’s about the fact that streams (infinite lists) form a monad, in a different way from lists. Nick shows the “right” and two “wrong” definitions of the join or bind operation, distinguishing them on the basis of the monad laws. But I think Nick’s proofs are more complicated than they need to be, because he hasn’t fully exploited the recursion patterns that underlie his definitions.

This post will involve some language that we have not yet covered. Fear not! I hope it will be clear from context. But in case it isn’t, you might want to take a look at some of the background material (especially the paper Calculating Functional Programs).


Like Nick, for simplicity we will take the datatype of streams to be a synonym for lists; in all that follows, assume that lists are properly infinite (not finite, or partial).

\displaystyle  \mathbf{type}\;\mathit{Stream}\,a = [a]

Streams are naturally a codatatype rather than a datatype: in the category of sets and total functions, they would be represented as a final coalgebra rather than an initial algebra. In Haskell, which is roughly based on the category of CPOs and continuous functions, initial algebras and final coalgebras coincide, so we need not (indeed, we cannot) make the distinction formally. But we can make it informally, by stipulating that the basic pattern of computation for streams is the {\mathit{unfold}}:

\displaystyle  \begin{array}{l} \mathit{unfold} :: (b \rightarrow (a,b)) \rightarrow b \rightarrow \mathit{Stream}\,a \\ \mathit{unfold}\,f\,b = a : \mathit{unfold}\,f\,b' \qquad\mathbf{where}\qquad (a,b') = f\,b \end{array}

{\mathit{unfold}\,f} generates a stream from a seed, using the body {f} that transforms a seed {b} into an element {a} and a new seed {b'}. For example, the map function for streams uses the input stream as the seed, repeatedly splitting it into its head and tail:

\displaystyle  \begin{array}{l} \mathit{mapS} :: (a \rightarrow b) \rightarrow \mathit{Stream}\,a \rightarrow \mathit{Stream}\,b \\ \mathit{mapS}\,f = \mathit{unfold}\,(\mathit{fork}\,(f \cdot \mathit{head}, \mathit{tail})) \end{array}

where {\mathit{fork}} applies two functions to the same argument:

\displaystyle  \mathit{fork}\,(f,g)\,a = (f\,a, g\,a)

The crucial property of {\mathit{unfold}} is its universal property, which provides necessary and sufficient conditions for a computation to be expressible as an instance of {\mathit{unfold}}:

\displaystyle  h = \mathit{unfold}\,f \Leftrightarrow \mathit{out} \cdot h = \mathit{prod}\,(\mathit{id},h) \cdot f

where {\mathit{out} = \mathit{fork}\,(\mathit{head},\mathit{tail})} deconstructs a stream into its head and tail, and

\displaystyle  \mathit{prod}\,(f,g)\,(a,b) = (f\,a, g\,b)

From the universal property, one can easily (exercise!) prove three simple consequences (we’ll call them the “identity” and two “evaluation” rules):

\displaystyle  \begin{array}{l} \mathit{unfold}\,\mathit{out} = \mathit{id} \\ \mathit{head} \cdot \mathit{unfold}\,(\mathit{fork}\,(f,g)) = f \\ \mathit{tail} \cdot \mathit{unfold}\,(\mathit{fork}\,(f,g)) = \mathit{unfold}\,(\mathit{fork}\,(f,g)) \cdot g \end{array}

and the very important fusion law:

\displaystyle  \mathit{unfold}\,f \cdot h = \mathit{unfold}\,g \Leftarrow f \cdot h = \mathit{prod}\,(\mathit{id},h) \cdot g

allowing a preceding function {h} to be absorbed into the unfold.

Streams as a monad

Making streams a monad amounts to defining functions

\displaystyle  \begin{array}{lcl} \mathit{return} &::& a \rightarrow \mathit{Stream}\,a \\ \mathit{join} &::& \mathit{Stream}\,(\mathit{Stream}\,a) \rightarrow \mathit{Stream}\,a \end{array}

satisfying the monad laws:

\displaystyle  \begin{array}{lcl} \mathit{join} \cdot \mathit{return} &=& \mathit{id} \\ \mathit{join} \cdot \mathit{mapS}\,\mathit{return} &=& \mathit{id} \\ \mathit{join} \cdot \mathit{mapS}\,\mathit{join} &=& \mathit{join} \cdot \mathit{join} \end{array}

Looking at the type, the obvious (indeed, I think the only possible) definition one can give for {\mathit{return}} is {\mathit{return} = \mathit{repeat}} where

\displaystyle  \mathit{repeat} = \mathit{unfold}\,\mathit{double}

and {\mathit{double} = \mathit{fork}\,(\mathit{id},\mathit{id})} makes two copies of its argument. However, there are many type-correct definitions one could give for { \mathit{join}}, including {\mathit{head}}, {\mathit{mapS}\,\mathit{head}}, and {\mathit{diag}}, where

\displaystyle  \mathit{diag} = \mathit{unfold}\,\mathit{hhtt}

and where (for brevity in what follows) we define

\displaystyle  \begin{array}{lcl} \mathit{hhtt} &=& \mathit{fork}\,(\mathit{hh},\mathit{tt}) \\ \mathit{hh} &=& \mathit{head}\cdot\mathit{head} \\ \mathit{tt} &=& \mathit{tail}\cdot\mathit{mapS}\,\mathit{tail} \end{array}

Obviously, {\mathit{head}} yields the first “row” of a stream of streams (if one considers it in row-major order), and {\mathit{mapS}\,\mathit{head}} yields the first column; as the name suggests, {\mathit{diag}} yields the leading diagonal. Nick’s post demonstrates that the first two, although type-correct, do not satisfy the monad laws. He also provides a proof that the third does, which we turn to next.
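
Before turning to the proofs, here is a runnable consolidation of the definitions so far (with repeat renamed to repeat' to avoid the Prelude clash):

type Stream a = [a]   -- assumed properly infinite

unfold :: (b -> (a, b)) -> b -> Stream a
unfold f b = a : unfold f b' where (a, b') = f b

fork :: (a -> b, a -> c) -> a -> (b, c)
fork (f, g) a = (f a, g a)

mapS :: (a -> b) -> Stream a -> Stream b
mapS f = unfold (fork (f . head, tail))

repeat' :: a -> Stream a
repeat' = unfold (fork (id, id))

diag :: Stream (Stream a) -> Stream a
diag = unfold (fork (head . head, tail . mapS tail))

-- e.g. take 4 (diag [ [ (i, j) | j <- [0 ..] ] | i <- [0 ..] ])
--      yields [(0,0),(1,1),(2,2),(3,3)]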

Checking the monad laws

The proofs that {\mathit{repeat}} and {\mathit{diag}} satisfy the three monad laws are very straightforward, using the universal property of {\mathit{unfold}} and its consequences.

For the first monad law, fusion gives us the condition to check:

\displaystyle  \begin{array}{ll} & \mathit{diag}\cdot\mathit{repeat} = \mathit{id} \\ \Leftarrow & \\ & \mathit{hhtt}\cdot\mathit{repeat} = \mathit{prod}\,(\mathit{id},\mathit{repeat})\cdot\mathit{fork}\,(\mathit{head},\mathit{tail}) \end{array}

Working on the right-hand side, we have:

\displaystyle  \begin{array}{ll} & \mathit{hhtt}\cdot\mathit{repeat} \\ = & \qquad \{ \mbox{definition} \} \\ & \mathit{fork}\,(\mathit{hh},\mathit{tt})\cdot\mathit{repeat} \\ = & \qquad \{ \mbox{composition distributes backwards over fork} \} \\ & \mathit{fork}\,(\mathit{hh}\cdot\mathit{repeat},\mathit{tt}\cdot\mathit{repeat}) \\ = & \qquad \{ \mbox{definitions} \} \\ & \mathit{fork}\,(\mathit{head}\cdot\mathit{head}\cdot\mathit{repeat},\mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}\cdot\mathit{repeat}) \\ = & \qquad \{ \mbox{evaluation for~} \mathit{repeat} \} \\ & \mathit{fork}\,(\mathit{head},\mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}\cdot\mathit{repeat}) \\ = & \qquad \{ \mbox{naturality; evaluation} \} \\ & \mathit{fork}\,(\mathit{head},\mathit{repeat}\cdot\mathit{tail}) \\ = & \qquad \{ \mbox{pairs} \} \\ & \mathit{prod}\,(\mathit{id},\mathit{repeat}) \cdot \mathit{fork}\,(\mathit{head},\mathit{tail}) \end{array}

discharging the proof obligation.

Similarly, for the second monad law, fusion gives us the condition:

\displaystyle  \begin{array}{ll} & \mathit{diag}\cdot\mathit{mapS}\,\mathit{repeat} = \mathit{id} \\ \Leftarrow & \\ & \mathit{hhtt}\cdot\mathit{mapS}\,\mathit{repeat} = \mathit{prod}\,(\mathit{id},\mathit{mapS}\,\mathit{repeat})\cdot\mathit{fork}\,(\mathit{head},\mathit{tail}) \end{array}

and working on the right-hand side, in almost exactly the same steps we get:

\displaystyle  \begin{array}{ll} & \mathit{hhtt}\cdot\mathit{mapS}\,\mathit{repeat} \\ = & \qquad \{ \mbox{definition} \} \\ & \mathit{fork}\,(\mathit{hh},\mathit{tt})\cdot\mathit{mapS}\,\mathit{repeat} \\ = & \qquad \{ \mbox{composition distributes backwards over fork} \} \\ & \mathit{fork}\,(\mathit{hh}\cdot\mathit{mapS}\,\mathit{repeat},\mathit{tt}\cdot\mathit{mapS}\,\mathit{repeat}) \\ = & \qquad \{ \mbox{definitions} \} \\ & \mathit{fork}\,(\mathit{head}\cdot\mathit{head}\cdot\mathit{mapS}\,\mathit{repeat},\mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}\cdot\mathit{mapS}\,\mathit{repeat}) \\ = & \qquad \{ \mbox{naturality; evaluation} \} \\ & \mathit{fork}\,(\mathit{head},\mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}\cdot\mathit{mapS}\,\mathit{repeat}) \\ = & \qquad \{ \mbox{naturality; functors; evaluation} \} \\ & \mathit{fork}\,(\mathit{head},\mathit{mapS}\,\mathit{repeat}\cdot\mathit{tail}) \\ = & \qquad \{ \mbox{pairs} \} \\ & \mathit{prod}\,(\mathit{id},\mathit{mapS}\,\mathit{repeat}) \cdot \mathit{fork}\,(\mathit{head},\mathit{tail}) \end{array}

discharging the obligation.

What about the third monad law? To apply the universal property (or fusion), we need one side to be expressed as an unfold; but neither side of the equation {\mathit{diag}\cdot\mathit{diag} = \mathit{diag}\cdot\mathit{mapS}\,\mathit{diag}} is in that form. No matter; let us hypothesize that one side—say, the left—can be expressed in the form {\mathit{unfold}\,h} for some {h}, then calculate a suitable definition for {h} (if one exists). Assuming we succeed, then we can use fusion to check that the other side equals {\mathit{unfold}\,h}. (This strategy doesn’t work if we can find no such {h}!)

Again, fusion gives us

\displaystyle  \mathit{diag}\cdot\mathit{diag} = \mathit{unfold}\,h \Leftarrow \mathit{hhtt}\cdot\mathit{diag} = \mathit{prod}\,(\mathit{id},\mathit{diag})\cdot h

so we calculate:

\displaystyle  \begin{array}{ll} & \mathit{hhtt}\cdot\mathit{diag} \\ = & \qquad \{ \mbox{definition; distribution} \} \\ & \mathit{fork}\,(\mathit{head}\cdot\mathit{head}\cdot\mathit{diag},\mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}\cdot\mathit{diag}) \\ = & \qquad \{ \mbox{evaluation for~} \mathit{diag} \} \\ & \mathit{fork}\,(\mathit{head}\cdot\mathit{head}\cdot\mathit{head},\mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}\cdot\mathit{diag}) \\ = & \qquad \{ \mbox{naturality; evaluation} \} \\ & \mathit{fork}\,(\mathit{head}\cdot\mathit{head}\cdot\mathit{head},\mathit{diag}\cdot\mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}\cdot\mathit{mapS}\,(\mathit{mapS}\,\mathit{tail})) \\ = & \qquad \{ \mbox{pairs} \} \\ & \mathit{prod}\,(\mathit{id},\mathit{diag})\cdot\mathit{fork}\,(\mathit{head}\cdot\mathit{head}\cdot\mathit{head},\mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}\cdot\mathit{mapS}\,(\mathit{mapS}\,\mathit{tail})) \end{array}

Therefore, letting

\displaystyle  \mathit{hhhttt} = \mathit{fork}\,(\mathit{head}\cdot\mathit{head}\cdot\mathit{head},\mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}\cdot\mathit{mapS}\,(\mathit{mapS}\,\mathit{tail}))

we have concluded that

\displaystyle  \mathit{diag}\cdot\mathit{diag} = \mathit{unfold}\,\mathit{hhhttt}

Now all we have to do is to check that the right-hand side of the third monad law also equals this; fusion gives us the condition

\displaystyle  \begin{array}{ll} & \mathit{diag}\cdot\mathit{mapS}\,\mathit{diag} = \mathit{unfold}\,\mathit{hhhttt} \\ \Leftarrow & \\ & \mathit{hhtt}\cdot\mathit{mapS}\,\mathit{diag} = \mathit{prod}\,(\mathit{id},\mathit{mapS}\,\mathit{diag})\cdot \mathit{hhhttt} \end{array}

and we calculate on the right-hand side:

\displaystyle  \begin{array}{ll} & \mathit{hhtt}\cdot\mathit{mapS}\,\mathit{diag} \\ = & \qquad \{ \mbox{definition; distribution} \} \\ & \mathit{fork}\,(\mathit{head}\cdot\mathit{head}\cdot\mathit{mapS}\,\mathit{diag},\mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}\cdot\mathit{mapS}\,\mathit{diag}) \\ = & \qquad \{ \mbox{naturality; evaluation} \} \\ & \mathit{fork}\,(\mathit{head}\cdot\mathit{head}\cdot\mathit{head},\mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}\cdot\mathit{mapS}\,\mathit{diag}) \\ = & \qquad \{ \mbox{functors; naturality; evaluation} \} \\ & \mathit{fork}\,(\mathit{head}\cdot\mathit{head}\cdot\mathit{head},\mathit{mapS}\,\mathit{diag}\cdot\mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}\cdot\mathit{mapS}\,(\mathit{mapS}\,\mathit{tail})) \\ = & \qquad \{ \mbox{pairs; definition} \} \\ & \mathit{prod}\,(\mathit{id},\mathit{mapS}\,\mathit{diag})\cdot\mathit{hhhttt} \end{array}

completing the proof.

As you’ll see, the calculations are all quite short and simple, whereas in Nick’s formulation, they were rather hard work; I think that was (a) because he wasn’t exploiting the universal property, and (b) because he was working in terms of the “bind” rather than the “join” of the monad, which forced him into a more pointwise rather than pointfree style. Points are helpful when writing programs, but less so when reasoning about them.

Deducing diag

Here’s another way of looking at the problem. Nick’s blog presented three plausible (that is, type-correct) definitions for the {\mathit{join}} operation. Two of these didn’t satisfy the necessary laws, so were evidently wrong. The third, {\mathit{diag}}, does satisfy the laws, but is it the only possible definition that does? I believe that it is the only solution in the form of an unfold; but I only have a hand-waving argument as to why.

Let us suppose that indeed

\displaystyle  \mathit{join} = \mathit{unfold}\,k

for some {k}. Without loss of generality, let us suppose also that

\displaystyle  k = \mathit{fork}\,(k_1,k_2)

where

\displaystyle  \begin{array}{lcl} k_1 &::& \mathit{Stream}\,(\mathit{Stream}\,a) \rightarrow a \\ k_2 &::& \mathit{Stream}\,(\mathit{Stream}\,a) \rightarrow \mathit{Stream}\,(\mathit{Stream}\,a) \end{array}

I claimed above that {\mathit{repeat}} is the only type-correct definition of the {\mathit{return}} operation. (Ignoring bottoms, that is; to be precise, in Haskell every type-correct definition is an approximation in the definedness ordering to {\mathit{repeat}}.)
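For concreteness, here is that operation in the Haskell sketch above (the name repeatS, avoiding the clash with the Prelude’s repeat, is mine):

-- return for the stream monad: infinitely many copies of one value
repeatS :: a -> Stream a
repeatS x = Cons x (repeatS x)

-- equivalently, as an unfold:  repeatS = unfoldS (\x -> (x, x))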

Consideration of just the first two monad laws gives us some constraints on {k}, since we know that {\mathit{return} = \mathit{repeat}}:

\displaystyle  \begin{array}{lcl} k\cdot\mathit{repeat} &=& \mathit{prod}\,(\mathit{id},\mathit{repeat})\cdot\mathit{fork}\,(\mathit{head},\mathit{tail}) \\ k\cdot\mathit{mapS}\,\mathit{repeat} &=& \mathit{prod}\,(\mathit{id},\mathit{mapS}\,\mathit{repeat})\cdot\mathit{fork}\,(\mathit{head},\mathit{tail}) \end{array}

Or in terms of {k}’s two components,

\displaystyle  \begin{array}{lcll} k_1\cdot\mathit{repeat} &=& \mathit{head} &(1)\\ k_2\cdot\mathit{repeat} &=& \mathit{repeat}\cdot\mathit{tail} &(2)\\ k_1\cdot\mathit{mapS}\,\mathit{repeat} &=& \mathit{head} &(3)\\ k_2\cdot\mathit{mapS}\,\mathit{repeat} &=& \mathit{mapS}\,\mathit{repeat}\cdot\mathit{tail} &(4) \end{array}

I claim that (1) entails that {k_1} picks some element out of the first “column” of a stream of streams (thinking of the input as an infinite matrix in row-major order again): the equation says that when the input consists of infinitely many copies of the same stream, {k_1} picks (one of the many copies of) the head of that stream. Symmetrically, (3) entails that, when given an infinite matrix whose columns are all equal, {k_1} picks some element out of the first “row”. And because {k_1} has to be polymorphic, it cannot behave differently on special matrices like these than it does in general. Putting those statements together, and waving my hands in the air, I conclude that {k_1} picks the only element that is in both the first row and the first column:

\displaystyle  k_1 = \mathit{head} \cdot \mathit{head}

Similarly, Equation (2) says that, given an infinite input matrix all of whose rows are equal, {k_2} drops the first column (and possibly some of the rows are duplicated or dropped, and the order of the rows may change; but the elements of the rows are untouched). Symmetrically, (4) says that, given an input whose columns are all equal, {k_2} drops the first row (and may duplicate, drop, or rearrange the columns, but not change any of them). And again, the behaviour in general must be consistent with these special cases. Putting these observations together, {k_2} must drop the first row and the first column, and cannot change any of the remainder of the matrix:

\displaystyle  k_2 = \mathit{tail} \cdot \mathit{mapS}\,\mathit{tail}
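The deduced components can at least be spot-checked against equations (1)–(4) on finite prefixes; here is a small test sketch of mine, building on the Haskell transcription above (checks for (3) and (4) are analogous, using mapS repeatS instead of repeatS).

-- observe finite prefixes of streams and of streams of streams
takeS :: Int -> Stream a -> [a]
takeS 0 _           = []
takeS n (Cons x xs) = x : takeS (n - 1) xs

take2 :: Int -> Int -> Stream (Stream a) -> [[a]]
take2 m n = map (takeS n) . takeS m

nats :: Stream Integer
nats = unfoldS (\n -> (n, n + 1)) 0

-- equation (1): k1 . repeat = head
check1 :: Bool
check1 = (headS . headS) (repeatS nats) == headS nats

-- equation (2): k2 . repeat = repeat . tail, observed to a 3-by-3 prefix
check2 :: Bool
check2 = take2 3 3 ((tailS . mapS tailS) (repeatS nats))
      == take2 3 3 (repeatS (tailS nats))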

Granting the hand-waving, {k} must therefore equal {\mathit{hhtt}}, and so {\mathit{diag}} is indeed the only unfold satisfying the monad laws. What is the right framework in which to present such arguments more formally? It feels rather like Paul Hoogendijk’s relational approach to generic programming, which has to talk about largest natural transformations of a given type: the relational setting provides the conjunction one needs in order to express the two separate constraints on {k_1}.


Design patterns as higher-order datatype-generic programs

Well, I got side-tracked from working directly on the book. One of the side-tracks was working on my paper Design Patterns as Higher-Order Datatype-Generic Programs from the Workshop on Generic Programming in 2006, revising and extending it for a journal; I’ve put the revised version online. In my defence, I offer that although this isn’t directly content for the book, it is indirectly so.

The central argument of the paper is that current mainstream programming languages such as Java and C# are not expressive enough to allow one to capture the code parts of design patterns such as the Visitor pattern. However, functional languages such as Haskell are sufficiently expressive—which I demonstrate in the paper by providing Haskell representations of four of the GOF patterns.

As the title of the paper suggests, the specific concepts that Haskell provides but current mainstream languages do not are higher-order (parametrization by a function, sometimes called a “lambda” or a “closure”) and datatype-generic (parametrization by a type functor) features. I’ve taken to calling programs that exploit these features “HODGPs”, and languages that provide them “HODGP languages”. A HODGP language lets you capture the code of design patterns as highly flexible, reusable software components. Interestingly, Scala also provides HODGP features, together with everything you’d expect from an OO language; I’m hopeful that Java and C# will follow Scala’s lead (or that Scala will supersede them!). Bruno Oliveira’s DPhil thesis Genericity, Extensibility and Type-Safety in the Visitor Pattern explored using Scala for HODGPs.

Of course, I admit that “capturing the code parts of a pattern” is not the same as capturing the pattern itself. There is more to the pattern than just the code; the “prose, pictures, and prototypes” form an important part of the story, and are not captured in a HODGP representation of the pattern. So the HODGP isn’t a replacement for the pattern.

The four GOF patterns I discuss in the paper are Composite, Iterator, Visitor and Builder. My claim is that Composite (hierarchical structures) amounts to recursive datatypes, Iterator (linear traversals over the elements of a collection) to maps, Visitor (structured traversals also exploiting the shape of a collection) to folds, and Builder (separating the construction of an object from its representation) to unfolds and builds. This is a simplification, both of the patterns and the paper; take a read if you want the full story!
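To give a flavour of these correspondences, here is a toy Haskell sketch of mine (monomorphic in the container type, where the paper’s versions are datatype-generic):

-- Composite: a recursive datatype of hierarchical structures
data Tree a = Leaf a | Node [Tree a]

-- Iterator: a map, transforming every element while keeping the shape
mapTree :: (a -> b) -> Tree a -> Tree b
mapTree f (Leaf a)  = Leaf (f a)
mapTree f (Node ts) = Node (map (mapTree f) ts)

-- Visitor: a fold, consuming the structure one shape at a time
-- (e.g. foldTree id sum sums the leaves of a tree of numbers)
foldTree :: (a -> r) -> ([r] -> r) -> Tree a -> r
foldTree leaf node (Leaf a)  = leaf a
foldTree leaf node (Node ts) = node (map (foldTree leaf node) ts)

-- Builder: an unfold, constructing a tree from a seed
unfoldTree :: (s -> Either a [s]) -> s -> Tree a
unfoldTree step s = case step s of
  Left a   -> Leaf a
  Right ss -> Node (map (unfoldTree step) ss)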


The story so far

I’ve already written a fair amount on these topics, and I don’t propose to repeat it all here as I go. If you’d like some background, you might want to look at the following notes of mine:

Of course, many others have written on these topics too. Some articles that have particularly inspired me are:

(I’m sure there are many others too, which I’ve forgotten to mention.)


Recursion patterns

The central idea in this book is the importance of recursion patterns in functional programming. One of the simplest examples is the map operator on lists, defined in Haskell as follows:

\begin{array}{lcl} \mathit{map}\;f\;[\,] &=& [\,] \\ \mathit{map}\;f\;(x:xs) &=& f\;x : \mathit{map}\;f\;xs \end{array}

(Intuitively, \mathit{map} applies a function f to every element of a list, returning a new list. When \mathit{map}\;f is applied to the empty list [\,], it returns the empty list; applied to a non-empty list x:xs with head x and tail xs, it returns the list whose head is f applied to x, and whose tail is \mathit{map}\;f applied recursively to xs. Because the function \mathit{map} accepts another function f as an argument, it is called a higher-order function.)

Another example familiar to Haskell programmers is the fold operator on lists:

\begin{array}{lcl} \mathit{foldr}\;f\;e\;[\,] &=& e \\ \mathit{foldr}\;f\;e\;(x:xs) &=& f\;x\;(\mathit{foldr}\;f\;e\;xs) \end{array}

(Intuitively, this “folds” a list up into a value; the empty list is folded into the initial value e, and a non-empty list is folded by recursively folding the tail then combining the result of this with the head using the binary function f.)

One can think of the operators \mathit{map} and \mathit{foldr} as abstracted patterns of computation: many useful functions can be expressed in terms of them, and identifying them as useful and reusable concepts saves a lot of repetition in code. Put another way, functional programming languages (especially lazy ones) give the programmer good tools for building their own control structures; if you don’t have the right kind of loop built in, just define it as a higher-order function.
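For instance (these are my examples, not the book’s), several familiar functions are instances of the two patterns:

-- familiar functions as instances of foldr
sum'    :: Num a => [a] -> a
sum'    = foldr (+) 0

length' :: [a] -> Int
length' = foldr (\_ n -> 1 + n) 0

-- and of map
doubleAll :: Num a => [a] -> [a]
doubleAll = map (* 2)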

But there’s another benefit to be derived from certain recursion patterns (in particular, for the map and fold above, but not for any old custom control structure that you might just have defined). Take the definition of \mathit{foldr}, but abstract from the \mathit{foldr}\;f\;e part:

\begin{array}{lcl} h\;[\,] &=& e \\ h\;(x:xs) &=& f\;x\;(h\;xs) \end{array}

and read this as an equation in the unknown h, for fixed f and e. It turns out that this equation has a unique solution (modulo some subtleties, which we’ll cover later), which is, as you might have guessed, h = \mathit{foldr}\;f\;e. This is called the universal property of \mathit{foldr}, and it is very convenient for reasoning about programs. For example, to show that some complicated expression equals a given fold, it suffices to show that the expression satisfies the universal property for that fold. Such universal properties will come up again and again throughout the book.
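As a tiny worked example (mine, not the book’s), the universal property shows with no induction at all that \mathit{map}\;f is itself a fold: one simply checks the two defining equations.

-- map f satisfies  h []     = []
--                  h (x:xs) = g x (h xs)   where g x ys = f x : ys
-- (these are exactly the defining equations of map), so by the
-- uniqueness part of the universal property:
mapAsFold :: (a -> b) -> [a] -> [b]
mapAsFold f = foldr (\x ys -> f x : ys) []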


Functional programming

So, what is the “functional programming” in the title about, then? Here’s a very brief characterization, for those who haven’t heard of it before. I’m mostly thinking of the Haskell programming language, and there’s a good explanation and many resources at www.haskell.org.

In a nutshell, functional programming restricts itself to programming with values, rather than with actions. Most programming languages are “imperative”, focussing on the actions: the program describes a series of actions to perform, and makes only incidental use of values in passing; there is a sublanguage of statements (such as assignments to variables, loops, and conditional choices) and a sublanguage of expressions (such as the computation of the value to be assigned to a variable, or the condition on which a conditional choice depends).

In contrast, pure functional programming languages dispense with the statements, and make use only of expressions. This might seem very limiting, restricting programs to variations on a mere pocket calculator. But of course, a useful functional language extends the grammar of expressions, so that they cover much more than arithmetic; indeed, expressions typically encompass complex data structures and recursive function definitions, and so are every bit as expressive as statements. Moreover, it turns out that programs expressed in terms of expressions rather than statements are usually shorter (many of the details of the equivalent imperative program are redundant) and simpler (because there are no assignment statements, variables don’t vary, and so the familiar equational reasoning principles of high-school algebra are applicable throughout).
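To make the contrast concrete, here is a small standard example of mine: summing the squares of the first n naturals is a single expression, where the equivalent imperative program would use a loop and a mutable accumulator.

-- no statements, no assignments: just an expression to evaluate;
-- since nothing varies, equational reasoning applies throughout
sumSquares :: Integer -> Integer
sumSquares n = sum [ k * k | k <- [1 .. n] ]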

The characterization of programming languages as “functional” versus “imperative” is by no means exhaustive; there are many other styles too. But many of them are also discovering the joys of functional programming; for example, good practice in object-oriented programming states that value objects should be immutable.
