## Folds on lists

The previous post turned out to be rather complicated. In this one, I return to much simpler matters: ${\mathit{foldr}}$ and ${\mathit{foldl}}$ on lists, and list homomorphisms with an associative binary operator. For simplicity, I’m only going to be discussing finite lists.

## Fold right

Recall that ${\mathit{foldr}}$ is defined as follows:

$\displaystyle \begin{array}{@{}ll} \multicolumn{2}{@{}l}{\mathit{foldr} :: (\alpha \rightarrow \beta \rightarrow \beta) \rightarrow \beta \rightarrow [\alpha] \rightarrow \beta} \\ \mathit{foldr}\;(\oplus)\;e\;[\,] & = e \\ \mathit{foldr}\;(\oplus)\;e\;(a:x) & = a \oplus \mathit{foldr}\;(\oplus)\;e\;x \end{array}$

It satisfies the following universal property, which gives necessary and sufficient conditions for a function ${h}$ to be expressible as a ${\mathit{foldr}}$:

$\displaystyle \begin{array}{@{}l} h = \mathit{foldr}\;(\oplus)\;e \quad\Longleftrightarrow\quad h\;[\,] = e \;\land\; h\;(a:x) = a \oplus (h\;x) \end{array}$

As one consequence of the universal property, we get a fusion theorem, which states sufficient conditions for fusing a following function ${g}$ into a ${\mathit{foldr}}$:

$\displaystyle \begin{array}{@{}l} g \cdot \mathit{foldr}\;(\oplus)\;e = \mathit{foldr}\; (\otimes)\;e' \quad \Longleftarrow\quad g\;e = e' \;\land\; g\;(a \oplus b) = a \otimes g\;b \end{array}$

(on finite lists—infinite lists require an additional strictness condition). Fusion is an equivalence if ${\mathit{foldr}\;(\oplus)\;e}$ is surjective. If ${\mathit{foldr}\;(\oplus)\;e}$ is not surjective, it’s an equivalence on the range of that fold: the left-hand side implies

$\displaystyle g\;(a \oplus b) = a \otimes g\;b$

only for ${b}$ of the form ${\mathit{foldr}\;(\oplus)\;e\;x}$ for some ${x}$, not for all ${b}$.

As a second consequence of the universal property (or alternatively, as a consequence of the free theorem of the type of ${\mathit{foldr}}$), we have map–fold fusion:

$\displaystyle \mathit{foldr}\;(\oplus)\;e \cdot \mathit{map}\;g = \mathit{foldr}\;((\oplus) \cdot g)\;e$

The code golf${((\oplus) \cdot g)}$” is a concise if opaque way of writing the binary operator pointfree: ${((\oplus) \cdot g)\;a\;b = g\;a \oplus b}$.

## Fold left

Similarly, ${\mathit{foldl}}$ is defined as follows:

$\displaystyle \begin{array}{@{}ll} \multicolumn{2}{@{}l}{\mathit{foldl} :: (\beta \rightarrow \alpha \rightarrow \beta) \rightarrow \beta \rightarrow [\alpha] \rightarrow \beta} \\ \mathit{foldl}\;(\oplus)\;e\;[\,] & = e \\ \mathit{foldl}\;(\oplus)\;e\;(a:x) & = \mathit{foldl}\;(\oplus)\;(e \oplus a)\;x \end{array}$

(${\mathit{foldl}}$ is tail recursive, so it is unproductive on an infinite list.)

${\mathit{foldl}}$ also enjoys a universal property, although it’s not so well known as that for ${\mathit{foldr}}$. Because of the varying accumulating parameter ${e}$, the universal property entails abstracting that argument on the left in favour of a universal quantification on the right:

$\displaystyle \begin{array}{@{}l} h = \mathit{foldl}\;(\oplus) \quad\Longleftrightarrow\quad \forall b \;.\; h\;b\;[\,] = b \;\land\; h\;b\;(a:x) = h\;(b \oplus a)\;x \end{array}$

(For the proof, see Exercise 16 in my paper with Richard Bird on arithmetic coding.)

From the universal property, it is straightforward to prove a map–${\mathit{foldl}}$ fusion theorem:

$\displaystyle \mathit{foldl}\;(\oplus)\;e \cdot \mathit{map}\;f = \mathit{foldl}\;((\cdot f) \cdot (\oplus))\;e$

Note that ${((\cdot f) \cdot (\oplus))\;b\;a = b \oplus f\;a}$. There is also a fusion theorem:

$\displaystyle (\forall e \;.\; f \cdot \mathit{foldl}\;(\oplus)\;e = \mathit{foldl}\;(\otimes)\;(f\;e)) \quad\Longleftarrow\quad f\;(b \oplus a) = f\;b \otimes a$

This is easy to prove by induction. (I would expect it also to be a consequence of the universal property, but I don’t see how to make that go through.)

## Duality theorems

Of course, ${\mathit{foldr}}$ and ${\mathit{foldl}}$ are closely related. §3.5.1 of Bird and Wadler’s classic text presents a First Duality Theorem:

$\displaystyle \mathit{foldr}\;(\oplus)\;e\;x = \mathit{foldl}\;(\oplus)\;e\;x$

when ${\oplus}$ is associative with neutral element ${e}$; a more general Second Duality Theorem:

$\displaystyle \mathit{foldr}\;(\oplus)\;e\;x = \mathit{foldl}\;(\otimes)\;e\;x$

when ${\oplus}$ associates with ${\otimes}$ (that is, ${a \oplus (b \otimes c) = (a \oplus b) \otimes c}$) and ${a \oplus e = e \otimes a}$ (for all ${a,b,c}$, but fixed ${e}$); and a Third Duality Theorem:

$\displaystyle \mathit{foldr}\;(\oplus)\;e\;x = \mathit{foldl}\;(\mathit{flip}\;f)\;e\;(\mathit{reverse}\;x)$

(again, all three only for finite ${x}$).

The First Duality Theorem is a specialization of the Second, when ${{\oplus} = {\otimes}}$ (but curiously, also a slight strengthening: apparently all we require is that ${a \oplus e = e \oplus a}$, not that both equal ${a}$; for example, we still have ${\mathit{foldr}\;(+)\;1 = \mathit{foldl}\;(+)\;1}$, even though ${1}$ is not the neutral element for ${+}$).

## Characterization

When is a function a fold?” Evidently, if function ${h}$ on lists is injective, then it can be written as a ${\mathit{foldr}}$: in that case (at least, classically speaking), ${h}$ has a post-inverse—a function ${g}$ such that ${g \cdot h = \mathit{id}}$—so:

$\displaystyle \begin{array}{@{}ll} & h\;(a:x) \\ = & \qquad \{ g \cdot h = \mathit{id} \} \\ & h\;(a : g\;(h\;x)) \\ = & \qquad \{ \mbox{let } a \oplus b = h\;(a : g\;b) \} \\ & a \oplus h\;x \end{array}$

and so, letting ${e = h\;[\,]}$, we have

$\displaystyle h = \mathit{foldr}\;(\oplus)\;e$

But also evidently, injectivity is not necessary for a function to be a fold; after all, ${\mathit{sum}}$ is a fold, but not the least bit injective. Is there a simple condition that is both sufficient and necessary for a function to be a fold? There is! The condition is that lists equivalent under ${h}$ remain equivalent when extended by an element:

$\displaystyle h\;x = h\;x' \quad\Longrightarrow\quad h\;(a:x) = h\;(a:x')$

We say that ${h}$ is leftwards. (The kernel ${\mathrm{ker}\;h}$ of a function ${h}$ is a relation on its domain, namely the set of pairs ${(x,x')}$ such that ${h\;x = h\;x'}$; this condition is equivalent to ${\mathrm{ker}\;h \subseteq \mathrm{ker}\;(h \cdot (a:))}$ for every ${a}$.) Clearly, leftwardsness is necessary: if ${h = \mathit{foldr}\;(\oplus)\;e}$ and ${h\;x = h\;x'}$, then

$\displaystyle \begin{array}{@{}ll} & h\;(a:x) \\ = & \qquad \{ h \mbox{ as } \mathit{foldr} \} \\ & a \oplus h\;x \\ = & \qquad \{ \mbox{assumption} \} \\ & a \oplus h\;x' \\ = & \qquad \{ h \mbox{ as } \mathit{foldr} \mbox{ again} \} \\ & h\;(a:x') \end{array}$

Moreover, leftwardsness is sufficient. For, suppose that ${h}$ is leftwards; then pick a function ${g}$ such that, when ${b}$ is in the range of ${h}$, we get ${g\;b = x}$ for some ${x}$ such that ${h\;x = b}$ (and it doesn’t matter what value ${g}$ returns for ${b}$ outside the range of ${h}$), and then define

$\displaystyle a \oplus b = h\;(a : g\;b)$

This is a proper definition of ${\oplus}$, on account of leftwardsness, in the sense that it doesn’t matter which value ${x}$ we pick for ${g\;b}$, as long as indeed ${h\;x = b}$: any other value ${x'}$ that also satisfies ${h\;x' = b}$ entails the same outcome ${h\;(a:x') = h\;(a:x)}$ for ${a \oplus b}$. Intuitively, it is not necessary to completely invert ${h}$ (as we did in the injective case), provided that ${h}$ preserves enough distinctions. For example, for ${h = \mathit{sum}}$ (for which indeed ${\mathit{sum}\;x = \mathit{sum}\;x'}$ implies ${\mathit{sum}\;(a:x) = \mathit{sum}\;(a:x')}$), we could pick ${g\;b = [b]}$. In particular, ${h\;x}$ is obviously in the range of ${h}$; then ${g\;(h\;x)}$ is chosen to be some ${x'}$ such that ${h\;x' = h\;x}$, and so by construction ${h\;(g\;(h\;x)) = h\;x}$—in other words, ${g}$ acts as a kind of partial inverse of ${h}$. So we get:

$\displaystyle \begin{array}{@{}ll} & a \oplus h\;x \\ = & \qquad \{ \mbox{definition of } \oplus \mbox{; let } x' = g\;(h\;x) \} \\ & h\;(a : x') \\ = & \qquad \{ h\;x' = h\;x \mbox{; assumption} \} \\ & h\;(a:x) \\ \end{array}$

and therefore ${h = \mathit{foldr}\;(\oplus)\;e}$ where ${e = h\;[\,]}$.

## Monoids and list homomorphisms

A list homomorphism is a fold after a map in a monoid. That is, define

$\displaystyle \begin{array}{@{}ll} \multicolumn{2}{@{}l}{\mathit{hom} :: (\beta\rightarrow\beta\rightarrow\beta) \rightarrow (\alpha\rightarrow\beta) \rightarrow \beta \rightarrow [\alpha] \rightarrow \beta} \\ \mathit{hom}\;(\odot)\;f\;e\;[\,] & = e \\ \mathit{hom}\;(\odot)\;f\;e\;[a] & = f\;a \\ \mathit{hom}\;(\odot)\;f\;e\;(x \mathbin{{+}\!\!\!{+}} y) & = \mathit{hom}\;(\odot)\;f\;e\;x \;\odot\; \mathit{hom}\;(\odot)\;f\;e\;y \end{array}$

provided that ${\odot}$ and ${e}$ form a monoid. (One may verify that this condition is sufficient for the equations to completely define the function; moreover, they are almost necessary—${\odot}$ and ${e}$ should form a monoid on the range of ${\mathit{hom}\;(\odot)\;f\;e}$.) The Haskell library ${\mathit{Data.Foldable}}$ defines an analogous method, whose list instance is

$\displaystyle \mathit{foldMap} :: \mathit{Monoid}\;\beta \Rightarrow (\alpha \rightarrow \beta) \rightarrow [\alpha] \rightarrow \beta$

—the binary operator ${\odot}$ and initial value ${e}$ are determined implicitly by the ${\mathit{Monoid}}$ instance rather than being passed explicitly.

Richard Bird’s Introduction to the Theory of Lists states implicitly what might be called the First Homomorphism Theorem, that any homomorphism consists of a reduction after a map (in fact, a consequence of the free theorem of the type of ${\mathit{hom}}$):

$\displaystyle \mathit{hom}\;(\odot)\;f\;e = \mathit{hom}\;(\odot)\;\mathit{id}\;e \cdot \mathit{map}\;f$

Explicitly as Lemma 4 (“Specialization”) the same paper states a Second Homomorphism Theorem, that any homomorphism can be evaluated right-to-left or left-to-right, among other orders:

$\displaystyle \mathit{hom}\;(\odot)\;f\;e = \mathit{foldr}\;(\lambda\;a\;b \;.\; f\;a \odot b)\;e = \mathit{foldl}\;(\lambda\;b\;a \;.\; b \odot f\;a)\;e$

I wrote up the Third Homomorphism Theorem, which is the converse of the Second Homomorphism Theorem: any function that can be written both as a ${\mathit{foldr}}$ and as a ${\mathit{foldl}}$ is a homomorphism. I learned this theorem from David Skillicorn, but it was conjectured earlier by Richard Bird and proved by Lambert Meertens during a train journey in the Netherlands—as the story goes, Lambert returned from a bathroom break with the idea for the proof. Formally, for given ${h}$, if there exists ${\oplus, \otimes, e}$ such that

$\displaystyle h = \mathit{foldr}\;(\oplus)\;e = \mathit{foldl}\;(\otimes)\;e$

then there also exists ${f}$ and associative ${\odot}$ such that

$\displaystyle h = \mathit{hom}\;(\odot)\;f\;e$

Recall that ${h}$ is “leftwards” if

$\displaystyle h\;x = h\;x' \Longrightarrow h\;(a:x) = h\;(a:x')$

and that the leftwards functions are precisely the ${\mathit{foldr}}$s. Dually, ${h}$ is rightwards if

$\displaystyle h\;x = h\;x' \Longrightarrow h\;(x \mathbin{{+}\!\!\!{+}} [a]) = h\;(x' \mathbin{{+}\!\!\!{+}} [a])$

and (as one can show) the rightwards functions are precisely the ${\mathit{foldl}}$s. So the Specialization Theorem states that every homomorphism is both leftwards and rightwards, and the Third Homomorphism Theorem states that every function that is both leftwards and rightwards is necessarily a homomorphism. To prove the latter, suppose that ${h}$ is both leftwards and rightwards; then pick a function ${g}$ such that, for ${b}$ in the range of ${h}$, we get ${g\;b = x}$ for some ${x}$ such that ${h\;x = b}$, and then define

$\displaystyle b \odot c = h\;(g\;b \mathbin{{+}\!\!\!{+}} g\;c)$

As before, this is a proper definition of ${\odot}$: assuming leftwardsness and rightwardsness, the result does not depend on the representatives chosen for ${g}$. By construction, ${g}$ again satisfies ${h\;(g\;(h\;x)) = h\;x}$ for any ${x}$, and so we have:

$\displaystyle \begin{array}{@{}ll} & h\;x \odot h\;y \\ = & \qquad \{ \mbox{definition of } \odot \} \\ & h\;(g\;(h\;x) \mathbin{{+}\!\!\!{+}} g\;(h\;y)) \\ = & \qquad \{ h \mbox{ is rightwards; } h\;(g\;(h\;x)) = h\;x \} \\ & h\;(x \mathbin{{+}\!\!\!{+}} g\;(h\;y)) \\ = & \qquad \{ h \mbox{ is leftwards; } h\;(g\;(h\;y)) = h\;y \} \\ & h\;(x \mathbin{{+}\!\!\!{+}} y) \end{array}$

Moreover, one can show that ${\odot}$ is associative, and ${e}$ its neutral element (at least on the range of ${h}$).

For example, sorting a list can obviously be done as a ${\mathit{foldr}}$, using insertion sort; by the Third Duality Theorem, it can therefore also be done as a ${\mathit{foldl}}$ on the reverse of the list; and because the order of the input is irrelevant, the reverse can be omitted. The Third Homomorphism Theorem then implies that there exists an associative binary operator ${\odot}$ such that ${\mathit{sort}\;(x \mathbin{{+}\!\!\!{+}} y) = \mathit{sort}\;x \odot \mathit{sort}\;y}$. It also gives a specification of ${\odot}$, given a suitable partial inverse ${g}$ of ${\mathit{sort}}$—in this case, ${g = \mathit{id}}$ suffices, because ${\mathit{sort}}$ is idempotent. The only characterization of ${\odot}$ arising directly from the proof involves repeatedly inserting the elements of one argument into the other, which does not exploit sortedness of the first argument. But from this inefficient characterization, one may derive the more efficient implementation that merges two sorted sequences, and hence obtain merge sort overall.

## The Trick

Here is another relationship between the directed folds (${\mathit{foldr}}$ and ${\mathit{foldl}}$) and list homomorphisms—any directed fold can be implemented as a homomorphism, followed by a final function application to a starting value:

$\displaystyle \mathit{foldr}\;(\oplus)\;e\;x = \mathit{hom}\;(\cdot)\;(\oplus)\;\mathit{id}\;x\;e$

Roughly speaking, this is because application and composition are related:

$\displaystyle a \oplus (b \oplus (c \oplus e)) = (a\oplus) \mathbin{\} (b\oplus) \mathbin{\} (c\oplus) \mathbin{\} e = ((a\oplus) \cdot (b\oplus) \cdot (c\oplus))\;e$

(here, ${\ }$ is right-associating, loose-binding function application). To be more precise:

$\displaystyle \begin{array}{@{}ll} & \mathit{foldr}\;(\oplus)\;e \\ = & \qquad \{ \mbox{map-fold fusion: } (\)\;(a\oplus)\;b = a \oplus b \} \\ & \mathit{foldr}\;(\)\;e \cdot \mathit{map}\;(\oplus) \\ = & \qquad \{ \mbox{fusion: } (\ e)\;\mathit{id} = e \mbox{ and } (\ e)\;((\oplus) \cdot f) = (\oplus)\;(f\;e) = (\)\;(\oplus)\;((\ e)\;f) \} \\ & (\ e) \cdot \mathit{foldr}\;(\cdot)\;\mathit{id} \cdot \mathit{map}\;(\oplus) \\ = & \qquad \{ (\cdot) \mbox{ and } \mathit{id} \mbox{ form a monoid} \} \\ & (\ e) \cdot \mathit{hom}\;(\cdot)\;(\oplus)\;\mathit{id} \end{array}$

This result has many applications. For example, it’s the essence of parallel recognizers for regular languages. Language recognition looks at first like an inherently sequential process; a recognizer over an alphabet ${\Sigma}$ can be represented as a finite state machine over states ${S}$, with a state transition function of type ${S \times \Sigma \rightarrow S}$, and such a function is clearly not associative. But by mapping this function over the sequence of symbols, we get a sequence of ${S \rightarrow S}$ functions, which should be subject to composition. Composition is of course associative, so can be computed in parallel, in ${\mathrm{log}\;n}$ steps on ${n}$ processors. Better yet, each such ${S \rightarrow S}$ function can be represented as an array (since ${S}$ is finite), and the composition of any number of such functions takes a fixed amount of space to represent, and a fixed amount of time to apply, so the ${\mathrm{log}\;n}$ steps take ${\mathrm{log}\;n}$ time.

Similarly for carry-lookahead addition circuits. Binary addition of two ${n}$-bit numbers with an initial carry-in proceeds from right to left, adding bit by bit and taking care of carries, producing an ${n}$-bit result and a carry-out; this too appears inherently sequential. But there aren’t many options for the pair of input bits at a particular position: two ${1}$s will always generate an outgoing carry, two ${0}$s will always kill an incoming carry, and a ${1}$ and ${0}$ in either order will propagate an incoming carry to an outgoing one. Whether there is a carry-in at a particular bit position is computed by a ${\mathit{foldr}}$ of the bits to the right of that position, zipped together, starting from the initial carry-in, using the binary operator ${\oplus}$ defined by

$\displaystyle \begin{array}{@{}ll} (1,1) \oplus b & = 1 \\ (0,0) \oplus b & = 0 \\ (x,y) \oplus b & = b \end{array}$

Again, applying ${\oplus}$ to a bit-pair makes a bit-to-bit function; these functions are to be composed, and function composition is associative. Better, such functions have small finite representations as arrays (indeed, we need a domain of only three elements, ${G, K, P}$; ${G}$ and ${K}$ are both left zeroes of composition, and ${P}$ is neutral). Better still, we can compute the carry-in at all positions using a ${\mathit{scanr}}$, which for an associative binary operator can also be performed in parallel in ${\mathrm{log}\;n}$ steps on ${n}$ processors.

The Trick seems to be related to Cayley’s Theorem, on monoids rather than groups: any monoid is equivalent to a monoid of endomorphisms. That is, corresponding to every monoid ${(M,{\oplus},e)}$, with ${M}$ a set, and ${(\oplus) :: M \rightarrow M \rightarrow M}$ an associative binary operator with neutral element ${e :: M}$, there is another monoid ${(N,(\cdot),\mathit{id})}$ on the carrier ${N = \{ (x\oplus) \mid x \in M \}}$ of endomorphisms of the form ${(x\oplus)}$; the mappings ${(\oplus) :: M \rightarrow N}$ and ${(\ e) :: N \rightarrow M}$ are both monoid homomorphisms, and are each other’s inverse. (Cayley’s Theorem is what happens to the Yoneda Embedding when specialized to the one-object category representing a monoid.) So any list homomorphism corresponds to a list homomorphism with endomorphisms as the carrier, followed by a single final function application:

$\displaystyle \begin{array}{@{}ll} & \mathit{hom}\;(\oplus)\;f\;e \\ = & \qquad \{ \mbox{Second Homomorphism Theorem} \} \\ & \mathit{foldr}\;((\oplus) \cdot f)\;e \\ = & \qquad \{ \mbox{The Trick} \} \\ & (\ e) \cdot \mathit{hom}\;(\cdot)\;((\oplus) \cdot f)\;\mathit{id} \end{array}$

Note that ${(\oplus) \cdot f}$ takes a list element ${a}$ the endomorphism ${(f\,a\, \oplus)}$. The change of representation from ${M}$ to ${N}$ is the same as in The Trick, and is also what underlies Hughes’s novel representation of lists; but I can’t quite put my finger on what these both have to do with Cayley and Yoneda.

Posted in Uncategorized | 5 Comments

## Asymmetric Numeral Systems

The previous two posts discussed arithmetic coding (AC). In this post I’ll discuss a related but more recent entropy encoding technique called asymmetric numeral systems (ANS), introduced by Jarek Duda and explored in some detail in a series of 12 blog posts by Charles Bloom. ANS combines the effectiveness of AC (achieving approximately the theoretical optimum Shannon entropy) with the efficiency of Huffman coding. On the other hand, whereas AC acts in a first-in first-out manner, ANS acts last-in first-out, in the sense that the decoded text comes out in reverse; this is not a problem in a batched context, but it makes ANS unsuitable for encoding a communications channel, and also makes it difficult to employ adaptive text models.

## Arithmetic coding, revisited

Here is a simplified version of arithmetic coding; compared to the previous posts, we use a fixed rather than adaptive model, and rather than ${\mathit{pick}}$ing some arbitrary element of the final interval, we pick explicitly the lower bound of the interval (by ${\mathit{fst}}$, when it is represented as a pair). We suppose a function

$\displaystyle \begin{array}{@{}l} \mathit{count} :: \mathit{Symbol} \rightarrow \mathit{Integer} \\ \end{array}$

that returns a positive count for every symbol, representing predicted frequencies of occurrence, from which it is straightforward to derive functions

$\displaystyle \begin{array}{@{}l} \mathit{encodeSym} :: \mathit{Symbol} \rightarrow \mathit{Interval} \\ \mathit{decodeSym} :: \mathit{Rational} \rightarrow \mathit{Symbol} \end{array}$

that work on a fixed global model, satisfying the central property

$\displaystyle \mathit{decodeSym}\;x = s \Leftrightarrow x \in \mathit{encodeSym}\;s$

For example, an alphabet of three symbols a’, b’, c’, with counts 2, 3, and 5 respectively, could be encoded by the intervals ${(0,\frac{1}{5}), (\frac{1}{5},\frac{1}{2}),(\frac{1}{2},1)}$. We have operations on intervals:

$\displaystyle \begin{array}{@{}ll} \mathit{weight}\;(l,r)\;x & = l + (r-l) \times x \\ \mathit{scale}\;(l,r)\;x & = (x-l)/(r-l) \\ \mathit{narrow}\;i\;(p,q) & = (\mathit{weight}\;i\;p, \mathit{weight}\;i\;q) \\ \end{array}$

that satisfy

$\displaystyle \begin{array}{@{}ll} \mathit{weight}\;i\;x \in i & \Leftrightarrow x \in \mathit{unit} \\ \mathit{weight}\;i\;x = y & \Leftrightarrow \mathit{scale}\;i\;y = x \\ \end{array}$

Then we have:

$\displaystyle \begin{array}{@{}l} \mathit{encode}_1 :: [\mathit{Symbol}] \rightarrow \mathit{Rational} \\ \mathit{encode}_1 = \mathit{fst} \cdot \mathit{foldl}\;\mathit{estep}_1\;\mathit{unit} \qquad \mathbf{where} \\ \qquad \begin{array}[t]{@{}l} \mathit{estep}_1 :: \mathit{Interval} \rightarrow \mathit{Symbol} \rightarrow \mathit{Interval} \\ \mathit{estep}_1\;i\;s = \mathit{narrow}\;i\;(\mathit{encodeSym}\;s) \medskip \end{array} \\ \mathit{decode}_1 :: \mathit{Rational} \rightarrow [\mathit{Symbol}] \\ \mathit{decode}_1 = \mathit{unfoldr}\;\mathit{dstep}_1 \qquad \mathbf{where} \\ \qquad \begin{array}[t]{@{}l} \mathit{dstep}_1 :: \mathit{Rational} \rightarrow \mathsf{Maybe}\;(\mathit{Symbol}, \mathit{Rational}) \\ \mathit{dstep}_1\;x = \mathbf{let}\;s = \mathit{decodeSym}\;x\;\mathbf{in}\; \mathit{Just}\;(s, \mathit{scale}\;(\mathit{encodeSym}\;s)\;x) \end{array} \end{array}$

For example, with the alphabet and intervals above, the input text “abc” gets encoded symbol by symbol, from left to right (because of the ${\mathit{foldl}}$), to the narrowing sequence of intervals ${(\frac{0}{1}, \frac{1}{1}), (\frac{0}{1}, \frac{1}{5}), (\frac{1}{25}, \frac{1}{10}), (\frac{7}{100}, \frac{1}{10})}$ from which we select the lower bound ${\frac{7}{100}}$ of the final interval. (This version doesn’t stream well, because of the decision to pick the lower bound of the final interval as the result. But it will suffice for our purposes.)

Note now that the symbol encoding can be “fissioned” out of ${\mathit{encode}_1}$, using map fusion for ${\mathit{foldl}}$ backwards; moreover, ${\mathit{narrow}}$ is associative (and ${\mathit{unit}}$ its neutral element), so we can replace the ${\mathit{foldl}}$ with a ${\mathit{foldr}}$; and finally, now using fusion forwards, we can fuse the ${\mathit{fst}}$ with the ${\mathit{foldr}}$. That is,

$\displaystyle \begin{array}{@{}cl} & \mathit{encode}_1 \\ = & \qquad \{ \mbox{definition} \} \\ & \mathit{fst} \cdot \mathit{foldl}\;\mathit{estep}_1\;\mathit{unit} \\ = & \qquad \{ \mbox{fusion for } \mathit{foldl} \mbox{, backwards} \} \\ & \mathit{fst} \cdot \mathit{foldl}\;\mathit{narrow}\;\mathit{unit} \cdot \mathit{map}\;\mathit{encodeSym} \\ = & \qquad \{ \mathit{narrow} \mbox{ is associative} \} \\ & \mathit{fst} \cdot \mathit{foldr}\;\mathit{narrow}\;\mathit{unit} \cdot \mathit{map}\;\mathit{encodeSym} \\ = & \qquad \{ \mbox{fusion for } \mathit{foldr} \} \\ & \mathit{foldr}\;\mathit{weight}\;0 \cdot \mathit{map}\;\mathit{encodeSym} \end{array}$

and we can fuse the ${\mathit{map}}$ back into the ${\mathit{foldr}}$; so we have ${\mathit{encode}_1 = \mathit{encode}_2}$, where

$\displaystyle \begin{array}{@{}l} \mathit{encode}_2 :: [\mathit{Symbol}] \rightarrow \mathit{Rational} \\ \mathit{encode}_2 = \mathit{foldr}\;\mathit{estep}_2\;0 \qquad\mathbf{where} \\ \qquad \begin{array}[t]{@{}l} \mathit{estep}_2 :: \mathit{Symbol} \rightarrow \mathit{Rational} \rightarrow \mathit{Rational} \\ \mathit{estep}_2\;s\;x = \mathit{weight}\;(\mathit{encodeSym}\;s)\;x \end{array} \end{array}$

Now encoding is a simple ${\mathit{foldr}}$ and decoding a simple ${\mathit{unfoldr}}$. This means that we can prove that decoding inverts encoding, using the “Unfoldr–Foldr Theorem” stated in the Haskell documentation for ${\mathit{unfoldr}}$ (in fact, the only property of ${\mathit{unfoldr}}$ stated there!). The theorem states that if

$\displaystyle \begin{array}{@{}ll} g\;(f\;x\;z) & = \mathit{Just}\;(x,z) \\ g\;e & = \mathit{Nothing} \end{array}$

for all ${x}$ and ${z}$, then

$\displaystyle \mathit{unfoldr}\;g\;(\mathit{foldr}\;f\;e\;\mathit{xs}) = \mathit{xs}$

for all finite lists ${\mathit{xs}}$. Actually, that’s too strong for us here, because our decoding always generates an infinite list; a weaker variation (hereby called the “Unfoldr–Foldr Theorem With Junk”) states that if

$\displaystyle \begin{array}{@{}ll} g\;(f\;x\;z) & = \mathit{Just}\;(x,z) \end{array}$

for all ${x}$ and ${z}$, then

$\displaystyle \exists \mathit{ys}\,.\, \mathit{unfoldr}\;g\;(\mathit{foldr}\;f\;e\;\mathit{xs}) = \mathit{xs} \mathbin{{+}\!\!\!{+}} \mathit{ys}$

for all finite lists ${\mathit{xs}}$—that is, the ${\mathit{unfoldr}}$ inverts the ${\mathit{foldr}}$ except for appending some junk ${\mathit{ys}}$ to the output. It’s a nice exercise to prove this weaker theorem, by induction.

We check the condition:

$\displaystyle \begin{array}{@{}cl} & \mathit{dstep}_1\;(\mathit{estep}_2\;s\;x) \\ = & \qquad \{ \mathit{estep}_2 \mbox{; let } x' = \mathit{weight}\;(\mathit{encodeSym}\;s)\;x \} \\ & \mathit{dstep}_1\;x' \\ = & \qquad \{ \mathit{dstep}_1 \mbox{; let } s' = \mathit{decodeSym}\;x' \} \\ & \mathit{Just}\;(s', \mathit{scale}\;(\mathit{encodeSym}\;s')\;x') \end{array}$

Now, we hope to recover the first symbol, ie we hope that ${s' = s}$:

$\displaystyle \begin{array}{@{}cl} & \mathit{decodeSym}\;x' = s \\ \Leftrightarrow & \qquad \{ \mbox{central property} \} \\ & x' \in \mathit{encodeSym}\;s \\ \Leftrightarrow & \qquad \{ x' = \mathit{weight}\;(\mathit{encodeSym}\;s)\;x \mbox{; } \mathit{weight} \} \\ & x \in \mathit{unit} \end{array}$

Fortunately, it is an invariant of the encoding process that ${x}$—the result of ${\mathit{encode}_2}$—is in the unit interval, so indeed ${s' = s}$. Continuing:

$\displaystyle \begin{array}{@{}cl} & \mathit{dstep}_1\;(\mathit{estep}_2\;s\;x) \\ = & \qquad \{ \mbox{above} \} \\ & \mathit{Just}\;(s, \mathit{scale}\;(\mathit{encodeSym}\;s)\;(\mathit{weight}\;(\mathit{encodeSym}\;s)\;x)) \\ = & \qquad \{ \mathit{scale}\;i \cdot \mathit{weight}\;i = \mathit{id} \} \\ & \mathit{Just}\;(s, x) \end{array}$

as required. Therefore decoding inverts encoding, apart from the fact that it continues to produce junk having decoded all the input; so

$\displaystyle \mathit{take}\;(\mathit{length}\;\mathit{xs})\;(\mathit{decode}_1\;(\mathit{encode}_2\;\mathit{xs})) = \mathit{xs}$

for all finite ${\mathit{xs}}$.

## From fractions to integers

Whereas AC encodes longer and longer messages as more and more precise fractions, ANS encodes them as larger and larger integers. Given the ${\mathit{count}}$ function on symbols, we can easily derive definitions

$\displaystyle \begin{array}{@{}ll} \mathit{cumul} & :: \mathit{Symbol} \rightarrow \mathit{Integer} \\ \mathit{total} & :: \mathit{Integer} \\ \mathit{find} & :: \mathit{Integer} \rightarrow \mathit{Symbol} \\ \end{array}$

such that ${\mathit{cumul}\;s}$ gives the cumulative counts of all symbols preceding ${s}$ (say, in alphabetical order), ${\mathit{total}}$ the total of all symbol counts, and ${\mathit{find}\;x}$ looks up an integer ${0 \le x < \mathit{total}}$:

$\displaystyle \mathit{find}\;x = s \Leftrightarrow \mathit{cumul}\;s \le x < \mathit{cumul}\;s + \mathit{count}\;s$

(for example, we might tabulate the function ${\mathit{cumul}}$ using a scan).

We encode a text as an integer ${x}$, containing ${\log x}$ bits of information (all logs in this blog are base 2). The next symbol ${s}$ to encode has probability ${p = \mathit{count}\;s / \mathit{total}}$, and so requires an additional ${\log (1/p)}$ bits; in total, that’s ${\log x + \log (1/p) = \log (x/p) = \log (x \times \mathit{total}/\mathit{count}\;s)}$ bits. So entropy considerations tell us that, roughly speaking, to incorporate symbol ${s}$ into state ${x}$ we want to map ${x}$ to ${x' \simeq x \times \mathit{total} / \mathit{count}\;s}$. Of course, in order to decode, we need to be able to invert this transformation, to extract ${s}$ and ${x}$ from ${x'}$; this suggests that we should do the division by ${\mathit{count}\;s}$ first:

$\displaystyle x' = (x \mathbin{\underline{\smash{\mathit{div}}}} \mathit{count}\;s) \times \mathit{total} \qquad \mbox{\textemdash{} not final}$

so that the multiplication by the known value ${\mathit{total}}$ can be undone first:

$\displaystyle x \mathbin{\underline{\smash{\mathit{div}}}} \mathit{count}\;s = x' \mathbin{\underline{\smash{\mathit{div}}}} \mathit{total}$

How do we reconstruct ${s}$? Well, there is enough headroom in ${x'}$ to add any non-negative value less than ${\mathit{total}}$ without affecting the division; in particular, we can add ${\mathit{cumul}\;s}$ to ${x'}$, and then we can use ${\mathit{find}}$ on the remainder:

$\displaystyle \begin{array}{@{}l} x' = (x \mathbin{\underline{\smash{\mathit{div}}}} \mathit{count}\;s) \times \mathit{total} + \mathit{cumul}\;s \qquad \mbox{\textemdash{} also not final} \\ x \mathbin{\underline{\smash{\mathit{div}}}} \mathit{count}\;s = x' \mathbin{\underline{\smash{\mathit{div}}}} \mathit{total} \\ \mathit{cumul}\;s = x' \mathbin{\underline{\smash{\mathit{mod}}}} \mathit{total} \\ s = \mathit{find}\;(\mathit{cumul}\;s) = \mathit{find}\;(x' \mathbin{\underline{\smash{\mathit{mod}}}} \mathit{total}) \end{array}$

But we have still lost some information from the lower end of ${x}$ through the division, namely ${x \mathbin{\underline{\smash{\mathit{mod}}}} \mathit{count}\;s}$; so we can’t yet reconstruct ${x}$. Happily, ${\mathit{find}\;(\mathit{cumul}\;s) = \mathit{find}\;(\mathit{cumul}\;s+r)}$ for any ${0 \le r < \mathit{count}\;s}$, so there is still precisely enough headroom in ${x'}$ to add this lost information too without affecting the ${\mathit{find}}$, allowing us also to reconstruct ${x}$:

$\displaystyle \begin{array}{@{}l} x' = (x \mathbin{\underline{\smash{\mathit{div}}}} \mathit{count}\;s) \times \mathit{total} + \mathit{cumul}\;s + x \mathbin{\underline{\smash{\mathit{mod}}}} \mathit{count}\;s \qquad \mbox{\textemdash{} final} \\ x \mathbin{\underline{\smash{\mathit{div}}}} \mathit{count}\;s = x' \mathbin{\underline{\smash{\mathit{div}}}} \mathit{total} \\ \mathit{cumul}\;s + x \mathbin{\underline{\smash{\mathit{mod}}}} \mathit{count}\;s = x' \mathbin{\underline{\smash{\mathit{mod}}}} \mathit{total} \\ s = \mathit{find}\;(\mathit{cumul}\;s + x \mathbin{\underline{\smash{\mathit{mod}}}} \mathit{count}\;s) = \mathit{find}\;(x' \mathbin{\underline{\smash{\mathit{mod}}}} \mathit{total}) \\ x = \mathit{count}\;s \times (x \mathbin{\underline{\smash{\mathit{div}}}} \mathit{count}\;s) + x \mathbin{\underline{\smash{\mathit{mod}}}} \mathit{count}\;s = \mathit{count}\;s \times (x' \mathbin{\underline{\smash{\mathit{div}}}} \mathit{total}) + x' \mathbin{\underline{\smash{\mathit{mod}}}} \mathit{total} - \mathit{cumul}\;s \end{array}$

We therefore define

$\displaystyle \begin{array}{@{}l} \mathit{encode}_3 :: [\mathit{Symbol}] \rightarrow \mathit{Integer} \\ \mathit{encode}_3 = \mathit{foldr}\;\mathit{estep}_3\;0 \\ \mathit{estep}_3 :: \mathit{Symbol} \rightarrow \mathit{Integer} \rightarrow \mathit{Integer} \\ \mathit{estep}_3\;s\;x = \mathbf{let}\;(q,r) = \mathit{divMod}\;x\;(\mathit{count}\;s)\;\mathbf{in}\;q \times \mathit{total} + \mathit{cumul}\;s + r \medskip \\ \mathit{decode}_3 :: \mathit{Integer} \rightarrow [\mathit{Symbol}] \\ \mathit{decode}_3 = \mathit{unfoldr}\;\mathit{dstep}_3 \\ \mathit{dstep}_3 :: \mathit{Integer} \rightarrow \mathsf{Maybe}\;(\mathit{Symbol}, \mathit{Integer}) \\ \mathit{dstep}_3\;x = \mathbf{let}\;(q,r) = \mathit{divMod}\;x\;\mathit{total} ; s = \mathit{find}\;r\;\mathbf{in}\; \mathit{Just}\;(s, \mathit{count}\;s \times q + r - \mathit{cumul}\;s) \end{array}$

Using similar reasoning as for AC, it is straightforward to show that a decoding step inverts an encoding step:

$\displaystyle \begin{array}{@{}cl} & \mathit{dstep}_3\;(\mathit{estep}_3\;s\;x) \\ = & \qquad \{ \mathit{estep}_3 \mbox{; let } (q,r) = \mathit{divMod}\;x\;(\mathit{count}\;s), x' = q \times \mathit{total} + \mathit{cumul}\;s + r \} \\ & \mathit{dstep}_3\;x' \\ = & \qquad \{ \mathit{dstep}_3 \mbox{; let } (q',r') = \mathit{divMod}\;x'\;\mathit{total} ; s' = \mathit{find}\;r' \} \\ & \mathit{Just}\;(s', \mathit{count}\;s' \times q' + r' - \mathit{cumul}\;s') \\ = & \qquad \{ r' = \mathit{cumul}\;s + r, 0 \le r < \mathit{count}\;s, \mbox{ so } s'=\mathit{find}\;r'=s \} \\ & \mathit{Just}\;(s, \mathit{count}\;s \times q' + r' - \mathit{cumul}\;s) \\ = & \qquad \{ r' - \mathit{cumul}\;s = r, q' = x' \mathbin{\underline{\smash{\mathit{div}}}} \mathit{total} = q \} \\ & \mathit{Just}\;(s, \mathit{count}\;s \times q + r) \\ = & \qquad \{ (q,r) = \mathit{divMod}\;x\;(\mathit{count}\;s) \} \\ & \mathit{Just}\;(s,x) \end{array}$

Therefore decoding inverts encoding, modulo final junk, by the Unfoldr–Foldr Theorem With Junk:

$\displaystyle \mathit{take}\;(\mathit{length}\;\mathit{xs})\;(\mathit{decode}_3\;(\mathit{encode}_3\;\mathit{xs})) = \mathit{xs}$

for all finite ${\mathit{xs}}$. For example, with the same alphabet and symbol weights as before, encoding the text “abc” (now from right to left, because of the ${\mathit{foldr}}$) proceeds through the steps ${0, 5, 14, 70}$, and the last element ${70}$ is the encoding.

In fact, the inversion property holds whatever value we use to start encoding with (since it isn’t used in the proof); in the next section, we start encoding with a certain lower bound ${l}$ rather than ${0}$. Moreover, ${\mathit{estep}_3}$ is strictly increasing on states strictly greater than zero, and ${\mathit{dstep}_3}$ strictly decreasing; which means that the decoding process can stop when it returns to the lower bound. That is, if we pick some ${l > 0}$ and define

$\displaystyle \begin{array}{@{}l} \mathit{encode}_4 :: [\mathit{Symbol}] \rightarrow \mathit{Integer} \\ \mathit{encode}_4 = \mathit{foldr}\;\mathit{estep}_3\;l \medskip \\ \mathit{decode}_4 :: \mathit{Integer} \rightarrow [\mathit{Symbol}] \\ \mathit{decode}_4 = \mathit{unfoldr}\;\mathit{dstep}_4 \\ \mathit{dstep}_4 :: \mathit{Integer} \rightarrow \mathsf{Maybe}\;(\mathit{Symbol}, \mathit{Integer}) \\ \mathit{dstep}_4\;x = \mathbf{if}\; x==l \;\mathbf{then}\; \mathit{Nothing} \;\mathbf{else}\; \mathit{dstep}_3\;x \end{array}$

then the stronger version of the Unfoldr–Foldr Theorem holds, and we have

$\displaystyle \mathit{decode}_4\;(\mathit{encode}_4\;\mathit{xs}) = \mathit{xs}$

for all finite ${\mathit{xs}}$, without junk.

## Bounded precision

The previous versions all used arbitrary-precision arithmetic, which is expensive. We now change the approach slightly to use only bounded-precision arithmetic. As usual, there is a trade-off between effectiveness (a bigger bound means more accurate approximations to ideal entropies) and efficiency (a smaller bound generally means faster operations). Fortunately, the reasoning does not depend on much about the actual bounds. We will represent the integer accumulator ${z}$ as a pair ${(x,\mathit{ys})}$ which we call a ${\mathit{Number}}$:

$\displaystyle \mathbf{type}\; \mathit{Number} = (\mathit{Integer}, [\mathit{Integer}])$

such that ${\mathit{ys}}$ is a list of digits in a given base ${b}$, and ${x}$ is an integer with ${l \le x < u}$ (where we define ${u = l \times b}$ for the upper bound of the range), under the abstraction relation

$\displaystyle z = \mathit{foldl}\;\mathit{inject}\;x\;\mathit{ys}$

where

$\displaystyle \begin{array}{@{}ll} \mathit{inject}\;x\;y & = x \times b + y \\ \mathit{extract}\;x & = \mathit{divMod}\;x\;b \end{array}$

We call ${x}$ the “window” and ${\mathit{ys}}$ the “remainder”. For example, with ${b=10}$ and ${l=100}$, the pair ${(123,[4,5,6])}$ represents ${123456}$; and indeed, for any ${z \ge l}$ there is a unique representation satisfying the invariants. According to Duda, ${\mathit{total}}$ should divide into ${l}$ evenly. On a real implementation, ${b}$ and ${l}$ should be powers of two, such that arithmetic on values up to ${u}$ fits within a single machine word.

The encoding step acts on the window in the accumulator using ${\mathit{estep}_3}$, which risks making it overflow the range; we therefore renormalize with ${\mathit{enorm}_5}$ by shifting digits from the window to the remainder until this overflow would no longer happen, before consuming the symbol.

$\displaystyle \begin{array}{@{}l} \mathit{econsume}_5 :: [\mathit{Symbol}] \rightarrow \mathit{Number} \\ \mathit{econsume}_5 = \mathit{foldr}\;\mathit{estep}_5\;(l,[\,]) \medskip \\ \mathit{estep}_5 :: \mathit{Symbol} \rightarrow \mathit{Number} \rightarrow \mathit{Number} \\ \mathit{estep}_5\;s\;(x,\mathit{ys}) = \mathbf{let}\;(x',\mathit{ys}') = \mathit{enorm}_5\;s\;(x,\mathit{ys})\;\mathbf{in}\;(\mathit{estep}_3\;s\;x',\mathit{ys}') \medskip \\ \mathit{enorm}_5 :: \mathit{Symbol} \rightarrow \mathit{Number} \rightarrow \mathit{Number} \\ \mathit{enorm}_5\;s\;(x,\mathit{ys}) = \begin{array}[t]{@{}l@{}} \mathbf{if}\;\mathit{estep}_3\;s\;x < u \;\mathbf{then}\; (x,\mathit{ys})\;\mathbf{else}\; \\ \mathbf{let}\;(q,r) = \mathit{extract}\;x \;\mathbf{in}\; \mathit{enorm}_5\;s\;(q,r:\mathit{ys}) \end{array} \end{array}$

Some calculation, using the fact that ${\mathit{total}}$ divides ${l}$, allows us to rewrite the guard in ${\mathit{enorm}_5}$:

$\displaystyle \mathit{estep}_3\;s\;x < u \Leftrightarrow x < b \times (l \mathbin{\underline{\smash{\mathit{div}}}} \mathit{total}) \times \mathit{count}\;s$

For example, again encoding the text “abc”, again from right to left, with ${r=10}$ and ${l=100}$, the accumulator starts with ${(100,[\,])}$, and goes through the states ${(205,[\,])}$ and ${(683,[\,])}$ on consuming the c’ then the b’; directly consuming the a’ would make the window overflow, so we renormalize to ${(68,[3])}$; then it is safe to consume the a’, leading to the final state ${(340,[3])}$.

Decoding is an unfold using the accumulator as state. We repeatedly output a symbol from the window; this may make the window underflow the range, in which case we renormalize if possible by injecting digits from the remainder (and if this is not possible, because there are no more digits to inject, it means that we have decoded the entire text).

$\displaystyle \begin{array}{@{}l} \mathit{dproduce}_5 :: \mathit{Number} \rightarrow [\mathit{Symbol}] \\ \mathit{dproduce}_5 = \mathit{unfoldr}\;\mathit{dstep}_5 \medskip \\ \mathit{dstep}_5 :: \mathit{Number} \rightarrow \mathsf{Maybe}\;(\mathit{Symbol}, \mathit{Number}) \\ \mathit{dstep}_5\;(x,\mathit{ys}) = \begin{array}[t]{@{}l} \mathbf{let}\; \mathit{Just}\;(s, x') = \mathit{dstep}_3\;x ; (x'',\mathit{ys}'') = \mathit{dnorm}_5\;(x',\mathit{ys}) \;\mathbf{in} \\ \mathbf{if}\; x'' \ge l \;\mathbf{then}\; \mathit{Just}\;(s,(x'',\mathit{ys}'')) \;\mathbf{else}\; \mathit{Nothing} \medskip \end{array} \\ \mathit{dnorm}_5 :: \mathit{Number} \rightarrow \mathit{Number} \\ \mathit{dnorm}_5\;(x,y:\mathit{ys}) = \mathbf{if}\; x < l \;\mathbf{then}\; \mathit{dnorm}_5\; (\mathit{inject}\;x\;y, \mathit{ys}) \;\mathbf{else}\; (x,y:\mathit{ys}) \\ \mathit{dnorm}_5\;(x,[\,]) = (x,[\,]) \end{array}$

Note that decoding is of course symmetric to encoding; in particular, when encoding we renormalize before consuming a symbol; therefore when decoding we renormalize after emitting a symbol. For example, decoding the final encoding ${(340,[3])}$ starts by computing ${\mathit{dstep}_3\;340 = \mathit{Just}\;(\mbox{\texttt{\char39{}a\char39}},68)}$; the window value 68 has underflowed, so renormalization consumes the remaining digit 3, leading to the accumulator ${(683,[\,])}$; then decoding proceeds to extract the b’ and c’ in turn, returning the accumulator to ${(100,[\,])}$.

We can again prove that decoding inverts encoding, although we need to use yet another variation of the Unfoldr–Foldr Theorem, hereby called the “Unfoldr–Foldr Theorem With Invariant”. We say that predicate ${p}$ is an invariant of ${\mathit{foldr}\;f\;e}$ and ${\mathit{unfoldr}\;g}$ if

$\displaystyle \begin{array}{@{}lcl} p\;z &\Rightarrow& p\;(f\;x\;z) \\ p\;z \land g\;z = \mathit{Just}\;(x,z') &\Rightarrow& p\;z' \end{array}$

The theorem is that if ${p}$ is such an invariant, and the conditions

$\displaystyle \begin{array}{@{}ll} g\;(f\;x\;z) & = \mathit{Just}\;(x,z) \qquad \Leftarrow p\;z \\ g\;e & = \mathit{Nothing} \end{array}$

hold for all ${x}$ and ${z}$, then

$\displaystyle \mathit{unfoldr}\;g\;(\mathit{foldr}\;f\;e\;\mathit{xs}) = \mathit{xs} \qquad \Leftarrow p\;e$

for all finite lists ${\mathit{xs}}$, provided that the initial state ${e}$ satisfies the invariant ${p}$. In our case, the invariant is that the window ${x}$ is in range (${l \le x < u}$), which is indeed maintained by ${\mathit{estep}_5}$ and ${\mathit{dstep}_5}$. Then it is straightforward to verify that

$\displaystyle \mathit{dstep}_5\;(l,[\,]) = \mathit{Nothing}$

and

$\displaystyle \mathit{dstep}_5\;(\mathit{estep}_5\;s\;(x,\mathit{ys})) = \mathit{Just}\;(s,(x,\mathit{ys}))$

when ${x}$ is in range, making use of a lemma that

$\displaystyle \mathit{dnorm}_5\;(\mathit{enorm}_5\;(x,\mathit{ys})) = (x,\mathit{ys})$

when ${x}$ is in range. Therefore decoding inverts encoding:

$\displaystyle \mathit{dproduce}_5\;(\mathit{econsume}_5\;\mathit{xs}) = \mathit{xs}$

for all finite ${\mathit{xs}}$.

The version of encoding in the previous section yields a ${\mathit{Number}}$, that is, a pair consisting of an integer window and a digit sequence remainder. It would be more conventional for encoding to take a sequence of symbols to a sequence of digits alone, and decoding to take the sequence of digits back to a sequence of symbols. For encoding, we have to flush the remaining digits out of the window at the end of the process, reducing the window to zero:

$\displaystyle \begin{array}{@{}l} \mathit{eflush}_5 :: \mathit{Number} \rightarrow [\mathit{Integer}] \\ \mathit{eflush}_5\;(0,\mathit{ys}) = \mathit{ys} \\ \mathit{eflush}_5\;(x,\mathit{ys}) = \mathbf{let}\; (x',y) = \mathit{extract}\;x \;\mathbf{in}\; \mathit{eflush}_5\;(x',y:\mathit{ys}) \end{array}$

For example, ${\mathit{eflush}_5\;(340,[3]) = [3,4,0,3]}$. Then we can define

$\displaystyle \begin{array}{@{}l} \mathit{encode}_5 :: [\mathit{Symbol}] \rightarrow [\mathit{Integer}] \\ \mathit{encode}_5 = \mathit{eflush}_5 \cdot \mathit{econsume}_5 \end{array}$

Correspondingly, decoding should start by populating an initially-zero window from the sequence of digits:

$\displaystyle \begin{array}{@{}l} \mathit{dstart}_5 :: [\mathit{Integer}] \rightarrow \mathit{Number} \\ \mathit{dstart}_5\;\mathit{ys} = \mathit{dnorm}_5\;(0,\mathit{ys}) \end{array}$

For example, ${\mathit{dstart}_5\;[3,4,0,3] = (340,[3])}$. Then we can define

$\displaystyle \begin{array}{@{}l} \mathit{decode}_5 :: [\mathit{Integer}] \rightarrow [\mathit{Symbol}] \\ \mathit{decode}_5 = \mathit{dproduce}_5 \cdot \mathit{dstart}_5 \end{array}$

One can show that

$\displaystyle \mathit{dstart}_5\;(\mathit{eflush}_5\;x) = x$

provided that ${x}$ is initially in range, and therefore again decoding inverts encoding:

$\displaystyle \begin{array}{@{}ll} & \mathit{decode}_5\;(\mathit{encode}_5\;\mathit{xs}) \\ = & \qquad \{ \mathit{decode}_5, \mathit{encode}_5 \} \\ & \mathit{dproduce}_5\;(\mathit{dstart}_5\;(\mathit{eflush}_5\;(\mathit{econsume}_5\;\mathit{xs}))) \\ = & \qquad \{ \mathit{econsume}_5 \mbox { yields an in-range window, on which } \mathit{dstart}_5 \mbox{ inverts } \mathit{eflush}_5 \} \\ & \mathit{dproduce}_5\;(\mathit{econsume}_5\;\mathit{xs}) \\ = & \qquad \{ \mathit{dproduce}_5 \mbox { inverts } \mathit{econsume}_5 \} \\ & \mathit{xs} \end{array}$

for all finite ${\mathit{xs}}$.

Note that this is not just a data refinement—it is not the case that ${\mathit{encode}_5}$ yields the digits of the integer computed by ${\mathit{encode}_4}$. For example, ${\mathit{encode}_4\;\mbox{\texttt{\char34{}abc\char34}} = 3411}$, whereas ${\mathit{encode}_5\;\mbox{\texttt{\char34{}abc\char34}} = [3,4,0,3]}$. We have really changed the effectiveness of the encoding in return for the increased efficiency.

## Streaming encoding

We would now like to stream encoding and decoding as metamorphisms: encoding should start delivering digits before consuming all the symbols, and conversely decoding should start delivering symbols before consuming all the digits.

For encoding, we have

$\displaystyle \begin{array}{@{}l} \mathit{encode}_5 :: [\mathit{Symbol}] \rightarrow [\mathit{Integer}] \\ \mathit{encode}_5 = \mathit{eflush}_5 \cdot \mathit{econsume}_5 \end{array}$

with ${\mathit{econsume}_5}$ a fold; can we turn ${\mathit{eflush}_5}$ into an unfold? Yes, we can! The remainder in the accumulator should act as a queue of digits: digits get enqueued at the most significant end, so we need to dequeue them from the least significant end. So we define

$\displaystyle \begin{array}{@{}l} \mathit{edeal}_6 :: [\alpha] \rightarrow [\alpha] \\ \mathit{edeal}_6 = \mathit{unfoldr}\;\mathit{unsnoc} \;\mathbf{where} \\ \qquad \begin{array}[t]{@{}ll} \mathit{unsnoc}\;[\,] & = \mathit{Nothing} \\ \mathit{unsnoc}\;\mathit{ys} & = \mathbf{let}\; (\mathit{ys}',y) = (\mathit{init}\;\mathit{ys}, \mathit{last}\;\mathit{ys}) \;\mathbf{in}\; \mathit{Just}\;(y, \mathit{ys}') \end{array} \end{array}$

to “deal out” the elements of a list one by one, starting with the last. Of course, this reverses the list; so we have the slightly awkward

$\displaystyle \mathit{eflush}_5 = \mathit{reverse} \cdot \mathit{edeal}_6 \cdot \mathit{eflush}_5$

Now we can use unfold fusion to fuse ${\mathit{edeal}_6}$ and ${\mathit{eflush}_5}$ into a single unfold, deriving the coalgebra

$\displaystyle \begin{array}{@{}ll} \multicolumn{2}{@{}l}{\mathit{efstep}_6 :: \mathit{Number} \rightarrow \mathsf{Maybe}\;(\mathit{Integer}, \mathit{Number})} \\ \mathit{efstep}_6\;(0,[\,]) & = \mathit{Nothing} \\ \mathit{efstep}_6\;(x,[\,]) & = \mathbf{let}\; (q,r) = \mathit{extract}\;x \;\mathbf{in}\; \mathit{Just}\;(r,(q,[\,])) \\ \mathit{efstep}_6\;(x,\mathit{ys}) & = \mathbf{let}\; (\mathit{ys}',y) = (\mathit{init}\;\mathit{ys}, \mathit{last}\;\mathit{ys}) \;\mathbf{in}\; \mathit{Just}\;(y, (x,\mathit{ys}')) \end{array}$

by considering each of the three cases, getting ${\mathit{eflush}_5 = \mathit{eflush}_6}$ where

$\displaystyle \begin{array}{@{}l} \mathit{eflush}_6 :: \mathit{Number} \rightarrow [\mathit{Integer}] \\ \mathit{eflush}_6 = \mathit{reverse} \cdot \mathit{unfoldr}\;\mathit{efstep}_6 \end{array}$

As for the input part, we have a ${\mathit{foldr}}$ where we need a ${\mathit{foldl}}$. Happily, that is easy to achieve, at the cost of an additional ${\mathit{reverse}}$, since:

$\displaystyle \mathit{foldr}\;f\;e\;\mathit{xs} = \mathit{foldl}\;(\mathit{flip}\;f)\;e\;(\mathit{reverse}\;\mathit{xs})$

on finite ${\mathit{xs}}$. So we have ${\mathit{encode}_5 = \mathit{encode}_6}$, where

$\displaystyle \begin{array}{@{}l} \mathit{encode}_6 :: [\mathit{Symbol}] \rightarrow [\mathit{Integer}] \\ \mathit{encode}_6 = \mathit{reverse} \cdot \mathit{unfoldr}\;\mathit{efstep}_6 \cdot \mathit{foldl}\;(\mathit{flip}\;\mathit{estep}_5)\;(l,[\,]) \cdot \mathit{reverse} \end{array}$

It turns out that the streaming condition does not hold for ${\mathit{efstep}_6}$ and ${\mathit{flip}\;\mathit{estep}_5}$—although we can stream digits out of the remainder:

$\displaystyle \begin{array}{@{}ll} \mathit{efstep}_6\;(123,[4,5,6]) & = \mathit{Just}\;(6,(123,[4,5])) \\ \mathit{estep}_5\;\mbox{\texttt{\char39{}c\char39}}\;(123,[4,5,6]) & = (248,[4,5,6]) \\ \mathit{efstep}_6\;(\mathit{estep}_5\;\mbox{\texttt{\char39{}c\char39}}\;(123,[4,5,6])) & = \mathit{Just}\;(6,(248,[4,5])) \end{array}$

we cannot stream the last few digits out of the window:

$\displaystyle \begin{array}{@{}ll} \mathit{efstep}_6\;(123,[\,]) & = \mathit{Just}\;(3,(12,[\,])) \\ \mathit{estep}_5\;\mbox{\texttt{\char39{}c\char39}}\;(123,[\,]) & = (248,[\,]) \\ \mathit{efstep}_6\;(\mathit{estep}_5\;\mbox{\texttt{\char39{}c\char39}}\;(123,[\,])) & = \mathit{Just}\;(8,(24,[\,])) \end{array}$

We have to resort to flushing streaming, which starts from an apomorphism

$\displaystyle \mathit{apo} :: (\beta \rightarrow \mathsf{Maybe}\;(\gamma,\beta)) \rightarrow (\beta \rightarrow [\gamma]) \rightarrow \beta \rightarrow [\gamma]$

rather than an unfold. Of course, any unfold is trivially an apo, with the trivial flusher that always yields the empty list; but more interestingly, any unfold can be factored into a “cautious” phase (delivering elements only while a predicate ${p}$ holds) followed by a “reckless” phase (discarding the predicate, and delivering the elements anyway)

$\displaystyle \mathit{unfoldr}\;g = \mathit{apo}\;(\mathit{guard}\;p\; g)\;(\mathit{unfoldr}\;g)$

where

$\displaystyle \begin{array}{@{}l} \mathit{guard} :: (b \rightarrow \mathit{Bool}) \rightarrow (b \rightarrow \mathsf{Maybe}\;(c,b)) \rightarrow (b \rightarrow \mathsf{Maybe}\;(c,b)) \\ \mathit{guard}\;p\;g\;x = \mathbf{if}\; p\;x \;\mathbf{then}\; g\;x \;\mathbf{else}\; \mathit{Nothing} \end{array}$

In particular, the streaming condition may hold for the cautious coalgebra ${\mathit{guard}\;p\;g}$ when it doesn’t hold for the reckless coalgebra ${g}$ itself. We’ll use the cautious coalgebra

$\displaystyle \begin{array}{@{}l} \mathit{efstep}_7 :: \mathit{Number} \rightarrow \mathsf{Maybe}\;(\mathit{Integer}, \mathit{Number}) \\ \mathit{efstep}_7 = \mathit{guard}\;(\mathit{not} \cdot \mathit{null} \cdot \mathit{snd})\;\mathit{efstep}_6 \end{array}$

which carefully avoids the problematic case when the remainder is empty. It is now straightforward to verify that the streaming condition holds for ${\mathit{efstep}_7}$ and ${\mathit{flip}\;\mathit{estep}_5}$, and therefore ${\mathit{encode}_6 = \mathit{encode}_7}$, where

$\displaystyle \begin{array}{@{}l} \mathit{encode}_7 :: [\mathit{Symbol}] \rightarrow [\mathit{Integer}] \\ \mathit{encode}_7 = \mathit{reverse} \cdot \mathit{fstream}\;\mathit{efstep}_7\;(\mathit{unfoldr}\;\mathit{efstep}_6)\;(\mathit{flip}\;\mathit{estep}_5)\;(l,[\,]) \cdot \mathit{reverse} \end{array}$

and where ${\mathit{encode}_7}$ streams its output. On the downside, this has to read the input text in reverse, and also write the output digit sequence in reverse.

## Streaming decoding

Fortunately, decoding is rather easier to stream. We have

$\displaystyle \mathit{decode}_5 = \mathit{dproduce}_5 \cdot \mathit{dstart}_5$

with ${\mathit{dproduce}_5}$ an unfold; can we turn ${\mathit{dstart}_5}$ into a fold? Yes, we can! In fact, we have ${\mathit{dstart}_5 = \mathit{foldl}\;\mathit{dsstep}_8\;(0,[\,])}$, where

$\displaystyle \begin{array}{@{}ll} \mathit{dsstep}_8\;(x,[\,])\;y & = \mathbf{if}\; x < l \;\mathbf{then}\; (\mathit{inject}\;x\;y, [\,]) \;\mathbf{else}\; (x, [y]) \\ \mathit{dsstep}_8\;(x,\mathit{ys})\;y & = (x, \mathit{ys} \mathbin{{+}\!\!\!{+}} [y]) \end{array}$

That is, starting with ${0}$ for the accumulator, digits are injected one by one into the window until this is in range, and thereafter appended to the remainder.

Now we have decoding as an ${\mathit{unfoldr}}$ after a ${\mathit{foldl}}$, and it is straightforward to verify that the streaming condition holds for ${\mathit{dstep}_5}$ and ${\mathit{dsstep}_8}$. Therefore ${\mathit{decode}_5 = \mathit{decode}_8}$ on finite inputs, where

$\displaystyle \begin{array}{@{}l} \mathit{decode}_8 :: [\mathit{Integer}] \rightarrow [\mathit{Symbol}] \\ \mathit{decode}_8 = \mathit{stream}\;\mathit{dstep}_5\;\mathit{dsstep}_8\;(0,[\,]) \end{array}$

and ${\mathit{decode}_8}$ streams. It is perhaps a little more symmetric to move the ${\mathit{reverse}}$ from the final step of encoding to the initial step of decoding—that is, to have the least significant digit first in the encoded text, rather than the most significant—and then to view both the encoding and the decoding process as reading their inputs from right to left.

## Summing up

Inlining definitions and simplifying, we have ended up with encoding as a flushing stream:

$\displaystyle \begin{array}{@{}l} \mathit{encode}_9 :: [\mathit{Symbol}] \rightarrow [\mathit{Integer}] \\ \mathit{encode}_9 = \mathit{fstream}\;g'\;(\mathit{unfoldr}\;g)\;f\;(l,[\,]) \cdot \mathit{reverse} \;\mathbf{where} \\ \qquad \begin{array}[t]{@{}ll} f\;(x,\mathit{ys})\;s & = \begin{array}[t]{@{}ll} \mathbf{if} & x < b \times (l \mathbin{\underline{\smash{\mathit{div}}}} total) \times \mathit{count}\;s \\ \mathbf{then} & \mathbf{let}\; (q,r) = \mathit{divMod}\;x\;(\mathit{count}\;s) \;\mathbf{in}\; (q \times \mathit{total} + \mathit{cumul}\;s + r,\mathit{ys}) \\ \mathbf{else} & \mathbf{let}\; (q,r) = \mathit{extract}\;x \;\mathbf{in}\; f\;(q, r : \mathit{ys})\;s \end{array} \\ g\;(0,[\,]) & = \mathit{Nothing} \\ g\;(x,[\,]) & = \mathbf{let}\; (q,r) = \mathit{extract}\;x \;\mathbf{in}\; \mathit{Just}\;(r,(q,[\,])) \\ g\;(x,\mathit{ys}) & = \mathbf{let}\; (\mathit{ys}',y) = (\mathit{init}\;\mathit{ys}, \mathit{last}\;\mathit{ys}) \;\mathbf{in}\; \mathit{Just}\;(y, (x,\mathit{ys}')) \\ g'\;(x,\mathit{ys}) & = \mathbf{if}\; \mathit{null}\;\mathit{ys} \;\mathbf{then}\; \mathit{Nothing} \;\mathbf{else}\; g\;(x,\mathit{ys}) \end{array} \end{array}$

and decoding as an ordinary stream:

$\displaystyle \begin{array}{@{}l} \mathit{decode}_9 :: [\mathit{Integer}] \rightarrow [\mathit{Symbol}] \\ \mathit{decode}_9 = \mathit{stream}\;g\;f\;(0,[\,]) \cdot \mathit{reverse} \;\mathbf{where} \\ \qquad \begin{array}[t]{@{}ll} g\;(x,\mathit{ys}) & = \begin{array}[t]{@{}lll} \mathbf{let} & (q,r) & = \mathit{divMod}\;x\;\mathit{total} \\ & s & = \mathit{find}\;r \\ & (x'',\mathit{ys}'') & = n\;(\mathit{count}\;s \times q + r - \mathit{cumul}\;s,\mathit{ys}) \\ \mathbf{in} & \multicolumn{2}{l}{\mathbf{if}\; x'' \ge l \;\mathbf{then}\; \mathit{Just}\;(s,(x'',\mathit{ys}'')) \;\mathbf{else}\; \mathit{Nothing}} \end{array} \\ n\;(x,\mathit{ys}) & = \begin{array}[t]{@{}l} \mathbf{if}\; x\ge l \lor \mathit{null}\;\mathit{ys} \;\mathbf{then}\; (x,\mathit{ys}) \;\mathbf{else} \\ \mathbf{let}\; (y:\mathit{ys}') = \mathit{ys} \;\mathbf{in}\; n\;(\mathit{inject}\;x\;y, \mathit{ys}') \end{array} \\ f\;(x,[\,]) y & = n\;(x,[y]) \\ f\;(x,\mathit{ys})\;y & = (x, \mathit{ys} \mathbin{{+}\!\!\!{+}} [y]) \end{array} \end{array}$

Note that the two occurrences of ${\mathit{reverse}}$ can be taken to mean that encoding and decoding both process their input from last to first. The remainder acts as a queue, with digits being added at one end and removed from the other, so should be represented to make that efficient. But in fact, the remainder can be eliminated altogether, yielding simply the window for the accumulator—for encoding:

$\displaystyle \begin{array}{@{}l} \mathit{encode} :: [\mathit{Symbol}] \rightarrow [\mathit{Integer}] \\ \mathit{encode} = h_1\;l \cdot \mathit{reverse} \qquad \mathbf{where} \\ \qquad \begin{array}[t]{@{}ll} h_1\;x\;(s:\mathit{ss}') & = \begin{array}[t]{@{}ll} \mathbf{if} & x < b \times (l \mathbin{\underline{\smash{\mathit{div}}}} \mathit{total}) \times \mathit{count}\;s \\ \mathbf{then} & \mathbf{let}\; (q,r) = \mathit{divMod}\;x\;(\mathit{count}\;s) \;\mathbf{in}\; h_1\;(q \times total + \mathit{cumul}\;s + r)\;\mathit{ss}' \\ \mathbf{else} & \mathbf{let}\; (q,r) = \mathit{extract}\;x \;\mathbf{in}\; r : h_1\;q\;(s:\mathit{ss}') \end{array} \\ h_1\;x\;[\,] & = h_2\;x \\ h_2\;x & = \mathbf{if}\; x==0 \;\mathbf{then}\; [\,] \;\mathbf{else}\;\mathbf{let}\; (q,r) = \mathit{extract}\;x \;\mathbf{in}\; r : h_2\;q \end{array} \end{array}$

and for decoding:

$\displaystyle \begin{array}{@{}l} \mathit{decode} :: [\mathit{Integer}] \rightarrow [\mathit{Symbol}] \\ \mathit{decode} = h_0\;0 \cdot \mathit{reverse} \qquad \mathbf{where} \\ \qquad \begin{array}{@{}lll} h_0\;x\;(y:\mathit{ys}) & \mid x < l & = h_0\;(\mathit{inject}\;x\;y)\;\mathit{ys} \\ h_0\;x\;\mathit{ys} && = h_1\;x\;\mathit{ys} \\ h_1\;x\;\mathit{ys} && = \begin{array}[t]{@{}ll} \mathbf{let} & \begin{array}[t]{@{}ll} (q,r) & = \mathit{divMod}\;x\;\mathit{total} \\ s & = \mathit{find}\;r \\ x' & = \mathit{count}\;s \times q + r - \mathit{cumul}\;s \end{array} \\ \mathbf{in} & h_2\;s\;x'\;\mathit{ys} \end{array} \\ h_2\;s\;x\;(y:\mathit{ys}) & \mid x < l & = h_2\;s\;(\mathit{inject}\;x\;y)\;\mathit{ys} \\ h_2\;s\;x\;\mathit{ys} & \mid x \ge l & = s : h_1\;x\;\mathit{ys} \\ h_2\;s\;x\;[\,] && = [\,] \end{array} \end{array}$

This gives rather simple programs, corresponding to those in Duda’s paper; however, the calculation of this last version from the previous steps currently eludes me.

Posted in Uncategorized | 3 Comments

## Streaming Arithmetic Coding

In the previous post we saw the basic definitions of arithmetic encoding and decoding, and a proof that decoding does indeed successfully retrieve the input. In this post we go on to show how both encoding and decoding can be turned into streaming processes.

## Producing bits

Recall that

$\displaystyle \begin{array}{@{}l} \mathit{encode}_0 :: \mathit{Model} \rightarrow [\mathit{Symbol}] \rightarrow \mathit{Rational} \\ \mathit{encode}_0\;m = \mathit{pick} \cdot \mathit{foldr}\;\mathit{narrow}\;\mathit{unit} \cdot \mathit{encodeSyms}\;m \vrule width0pt depth2ex \\ \mathit{decode}_0 :: \mathit{Model} \rightarrow \mathit{Rational} \rightarrow [\mathit{Symbol}] \\ \mathit{decode}_0\;m\;x = \mathit{unfoldr}\;\mathit{step}\;(m,x) \end{array}$

Encoding and decoding work together. But they work only in batch mode: encoding computes a fraction, and yields nothing until the last step, and so decoding cannot start until encoding has finished. We really want encoding to yield as the encoded text a list of bits representing the fraction, rather than the fraction itself, so that we can stream the encoded text and the decoding process. To this end, we replace ${\mathit{pick} :: \mathit{Interval} \rightarrow \mathit{Rational}}$ by ${\mathit{pick}_2 = \mathit{fromBits} \cdot \mathit{toBits}}$, where

$\displaystyle \begin{array}{@{}l} \mathbf{type}\;\mathit{Bit} = \mathit{Integer} - \mbox{\quad 0 or 1 only} \vrule width0pt depth2ex \\ \mathit{toBits} :: \mathit{Interval} \rightarrow [\mathit{Bit}] \\ \mathit{fromBits} :: [\mathit{Bit}] \rightarrow \mathit{Rational} \end{array}$

The obvious definitions have ${\mathit{toBits}\;i}$ yield the shortest binary expansion of any fraction within ${i}$, and ${\mathit{fromBits}}$ evaluate this binary expansion. However, we don’t do quite this—it turns out to prevent the streaming condition from holding—and instead arrange for ${\mathit{toBits}}$ to yield the bit sequence that when extended with a 1 yields the shortest expansion of any fraction within ${i}$ (and indeed, the shortest binary expansion necessarily ends with a 1), and ${\mathit{fromBits}}$ compute the value with this 1 appended.

$\displaystyle \begin{array}{@{}l} \mathit{fromBits} = \mathit{foldr}\;\mathit{pack}\;(\frac 1 2) \vrule width0pt depth2ex \\ \mathit{pack} :: \mathit{Bit} \rightarrow \mathit{Rational} \rightarrow \mathit{Rational} \\ \mathit{pack}\;b\;x = (b + x) / 2 \vrule width0pt depth2ex \\ \mathit{toBits} = \mathit{unfoldr}\;\mathit{nextBit} \vrule width0pt depth2ex \\ \mathit{nextBit} :: \mathit{Interval} \rightarrow \mathsf{Maybe}\;(\mathit{Bit}, \mathit{Interval}) \\ \mathit{nextBit}\;(l,r) \\ \begin{array}[t]{@{\quad}clcl} | & r \le \frac 1 2 &=& \mathit{Just}\;(0, (0, \frac 1 2) \mathbin{\triangleleft} (l,r)) \\ | & \frac 1 2 \le l &=& \mathit{Just}\;(1, (\frac 1 2,1) \mathbin{\triangleleft} (l,r)) \\ | & \mathbf{otherwise} &=& \mathit{Nothing} \end{array} \end{array}$

Thus, if ${r \le \frac 1 2}$ then the binary expansion of any fraction within ${[l,r)}$ starts with 0; and similarly, if ${\frac 1 2 \le l}$, the binary expansion starts with 1. Otherwise, the interval ${[l,r)}$ straddles ${\frac 1 2}$; the shortest binary expansion within is it the expansion of ${\frac 1 2}$, so we yield the empty bit sequence.

Note that ${\mathit{pick}_2 = \mathit{fromBits} \cdot \mathit{toBits}}$ is a hylomorphism, so we have

$\displaystyle \begin{array}{@{}l} \mathit{pick}_2\;(l,r) \\ \begin{array}[t]{@{\quad}clcl} | & r \le \frac 1 2 &=& \mathit{pick}_2\;((0,\frac 1 2) \mathbin{\triangleleft} (l,r)) / 2 \\ | & \frac 1 2 \le l &=& (1 + \mathit{pick}_2\;((\frac 1 2,1) \mathbin{\triangleleft} (l,r))) / 2 \\ | & \mathbf{otherwise} &=& \frac 1 2 \end{array} \end{array}$

Moreover, it is clear that ${\mathit{toBits}}$ yields a finite bit sequence for any non-empty interval (since the interval doubles in width at each step, and the process stops when it includes ${\frac 1 2}$); so this equation serves to uniquely define ${\mathit{pick}_2}$. In other words, ${\mathit{nextBit}}$ is a recursive coalgebra. Then it is a straightforward exercise to prove that ${i \ni \mathit{pick}_2\;i}$; so although ${\mathit{pick}}$ and ${\mathit{pick}_2}$ differ, they are sufficiently similar for our purposes.

Now we redefine encoding to yield a bit sequence rather than a fraction, and decoding correspondingly to consume that bit sequence:

$\displaystyle \begin{array}{@{}l} \mathit{encode}_1 :: \mathit{Model} \rightarrow [\mathit{Symbol}] \rightarrow [\mathit{Bit}] \\ \mathit{encode}_1\;m = \mathit{toBits} \cdot \mathit{foldr}\;\mathit{narrow}\;\mathit{unit} \cdot \mathit{encodeSyms}\;m \vrule width0pt depth2ex \\ \mathit{decode}_1 :: \mathit{Model} \rightarrow [\mathit{Bit}] \rightarrow [\mathit{Symbol}] \\ \mathit{decode}_1\;m = \mathit{decode}_0\;m \cdot \mathit{fromBits} \end{array}$

That is, we move the ${\mathit{fromBits}}$ part of ${\mathit{pick}_2}$ from the encoding stage to the decoding stage.

## Streaming encoding

Just like ${\mathit{encode}_0}$, the new version ${\mathit{encode}_1}$ of encoding consumes all of its input before producing any output, so does not work for encoding infinite inputs, nor for streaming execution even on finite inputs. However, it is nearly in the right form to be a metamorphism—a change of representation from lists of symbols to lists of bits. In particular, ${\mathit{narrow}}$ is associative, and ${\mathit{unit}}$ is its unit, so we can replace the ${\mathit{foldr}}$ with a ${\mathit{foldl}}$:

$\displaystyle \mathit{encode}_1\;m = \mathit{unfoldr}\;\mathit{nextBit} \cdot \mathit{foldl}\;\mathit{narrow}\;\mathit{unit} \cdot \mathit{encodeSyms}\;m$

Now that ${\mathit{encode}_1}$ is in the right form, we must check the streaming condition for ${\mathit{narrow}}$ and ${\mathit{nextBit}}$. We consider one of the two cases in which ${\mathit{nextBit}}$ is productive, and leave the other as an exercise. When ${r \le \frac 1 2}$, and assuming ${\mathit{unit} \supseteq (p,q)}$, we have:

$\displaystyle \begin{array}{@{}cl} & \mathit{nextBit}\;((l,r) \mathbin{\triangleright} (p,q)) \\ = & \qquad \{ \mathit{narrow} \} \\ & \mathit{nextBit}\;(\mathit{weight}\;(l,r)\;p, \mathit{weight}\;(l,r)\;q) \\ = & \qquad \{ (l,r) \ni \mathit{weight}\;(l,r)\;q \mbox{, so in particular } \mathit{weight}\;(l,r)\;q < r \le \frac 1 2 \} \\ & \mathit{Just}\;(0, (0, \frac 1 2) \mathbin{\triangleleft} ((l,r) \mathbin{\triangleright} (p,q))) \\ = & \qquad \{ \mathit{widen} \mbox{ associates with } \mathit{narrow} \mbox{ (see below)} \} \\ & \mathit{Just}\;(0, ((0, \frac 1 2) \mathbin{\triangleleft} (l,r)) \mathbin{\triangleright} (p,q)) \end{array}$

as required. The last step is a kind of associativity property:

$\displaystyle i \mathbin{\triangleleft} (j \mathbin{\triangleright} k) = (i \mathbin{\triangleleft} j) \mathbin{\triangleright} k$

whose proof is left as another exercise. Therefore the streaming condition holds, and we may fuse the ${\mathit{unfoldr}}$ with the ${\mathit{foldl}}$, defining

$\displaystyle \begin{array}{@{}l} \mathit{encode}_2 :: \mathit{Model} \rightarrow [\mathit{Symbol}] \rightarrow [\mathit{Bit}] \\ \mathit{encode}_2\;m = \mathit{stream}\;\mathit{nextBit}\;\mathit{narrow}\;\mathit{unit} \cdot \mathit{encodeSyms}\;m \end{array}$

which streams the encoding process: the initial bits are output as soon as they are fully determined, even before all the input has been read. Note that ${\mathit{encode}_1}$ and ${\mathit{encode}_2}$ differ, in particular on infinite inputs (the former diverges, whereas the latter does not); but they coincide on finite symbol sequences.

## Streaming decoding

Similarly, we want to be able to stream decoding, so that we don’t have to wait for the entire encoded text to arrive before starting decoding. Recall that we have so far

$\displaystyle \mathit{decode}_1\;m = \mathit{decode}_0\;m \cdot \mathit{fromBits}$

where ${\mathit{decode}_0}$ is an ${\mathit{unfoldr}}$ and ${\mathit{fromBits}}$ a ${\mathit{foldr}}$. The first obstacle to streaming is that ${\mathit{foldr}}$, which we need to be a ${\mathit{foldl}}$ instead. We have

$\displaystyle \textstyle \mathit{fromBits} = \mathit{foldr}\;\mathit{pack}\;(\frac 1 2)$

Of course, ${\mathit{pack}}$ is not associative—it doesn’t even have the right type for that. But we can view each bit in the input as a function on the unit interval: bit~0 is represented by the function ${\mathit{weight}\;(0,\frac 1 2)}$ that focusses into the lower half of the unit interval, and bit~1 by the function ${\mathit{weight}\;(\frac 1 2, 1)}$ that focusses into the upper half. The fold itself composes a sequence of such functions; and since function composition is associative, this can be written equally well as a ${\mathit{foldr}}$ or a ${\mathit{foldl}}$. Having assembled the individual focussers into one composite function, we finally apply it to ${\frac 1 2}$. (This is in fact an instance of a general trick for turning a ${\mathit{foldr}}$ into a ${\mathit{foldl}}$, or vice versa.) Thus, we have:

$\displaystyle \textstyle \mathit{fromBits}\;bs = \mathit{foldl}\;\mathit{focusf}\;\mathit{id}\;bs\;(\frac 1 2) \quad\mathbf{where}\; \mathit{focusf}\;h\;b = h \cdot \mathit{weight}\;(\mathit{half}\;b)$

where ${\mathit{half}}$ yields either the lower or the upper half of the unit interval:

$\displaystyle \begin{array}{@{}lcl} \multicolumn{3}{@{}l}{\mathit{half} :: \mathit{Bit} \rightarrow \mathit{Interval}} \\ \mathit{half}\;0 &=& (0, \frac 1 2) \\ \mathit{half}\;1 &=& (\frac 1 2, 1) \end{array}$

In fact, not only may the individual bits be seen as focussing functions ${\mathit{weight}\;(0, \frac 1 2)}$ and ${\mathit{weight}\;(\frac 1 2, 1)}$ on the unit interval, so too may compositions of such functions:

$\displaystyle \begin{array}{@{}lcl} \mathit{id} &=& \mathit{weight}\;\mathit{unit} \\ \mathit{weight}\;i \cdot \mathit{weight}\;j &=& \mathit{weight}\;(i \mathbin{\triangleright} j) \end{array}$

So any such composition is of the form ${\mathit{weight}\;i}$ for some interval ${i}$, and we may represent it concretely by ${i}$ itself, and retrieve the function via ${\mathit{weight}}$:

$\displaystyle \textstyle \mathit{fromBits}\;bs = \mathit{weight}\;(\mathit{foldl}\;\mathit{focus}\;\mathit{unit}\;bs)\;(\frac 1 2) \quad\mathbf{where}\; \mathit{focus}\;i\;b = i \mathbin{\triangleright} \mathit{half}\;b$

So we now have

$\displaystyle \textstyle \mathit{decode}_1\;m = \mathit{unfoldr}\;\mathit{step} \cdot \mathit{prepare}\;m \cdot \mathit{foldl}\;\mathit{focus}\;\mathit{unit} \quad\mathbf{where}\; \mathit{prepare}\;m\;i = (m, \mathit{weight}\;i\;(\frac 1 2))$

This is almost in the form of a metamorphism, except for the occurrence of the adapter ${\mathit{prepare}\;m}$ in between the unfold and the fold. It is not straightforward to fuse that adapter with either the fold or the unfold; fortunately, however, we can split it into the composition

$\displaystyle \mathit{prepare}\;m = \mathit{prepare}_2 \cdot \mathit{prepare}_1\;m$

of two parts, where

$\displaystyle \begin{array}{@{}lcl} \multicolumn{3}{@{}l}{\mathit{prepare}_1 :: \mathit{Model} \rightarrow \mathit{Interval} \rightarrow (\mathit{Model}, \mathit{Interval})} \\ \mathit{prepare}_1\;m\;i &=& (m,i) \vrule width0pt depth2ex \\ \multicolumn{3}{@{}l}{\mathit{prepare}_2 :: (\mathit{Model}, \mathit{Interval}) \rightarrow (\mathit{Model}, \mathit{Rational})} \\ \mathit{prepare}_2\;(m,i) &=& (m, \mathit{weight}\;i\;(\frac 1 2)) \end{array}$

in such a way that the first part ${\mathit{prepare}_1}$ fuses with the fold and the second part ${\mathit{prepare}_2}$ fuses with the unfold. For fusing the first half of the adapter with the fold, we just need to carry around the additional value ${m}$ with the interval being focussed:

$\displaystyle \mathit{prepare}_1\;m \cdot \mathit{foldl}\;\mathit{focus}\;i = \mathit{foldl}\;\mathit{mfocus}\;(m,i)$

where

$\displaystyle \begin{array}{@{}l} \mathit{mfocus} :: (\mathit{Model}, \mathit{Interval}) \rightarrow \mathit{Bit} \rightarrow (\mathit{Model}, \mathit{Interval}) \\ \mathit{mfocus}\;(m,i)\;b = (m, \mathit{focus}\;i\;b) \end{array}$

For fusing the second half of the adapter with the unfold, let us check the fusion condition. We have (exercise!):

$\displaystyle \begin{array}{@{}l} \mathit{step}\;(\mathit{prepare}_2\;(m, i)) = \mathit{fmap}\;\mathit{prepare}_2\;(\mathit{Just}\;(s, (\mathit{newModel}\;m\;s, \mathit{encodeSym}\;m\;s \mathbin{\triangleleft} i))) \\ \qquad\mathbf{where}\;s = \mathit{decodeSym}\;m\;(\mathit{weight}\;i\;(\frac 1 2)) \end{array}$

where the ${\mathit{fmap}}$ is the functorial action for the base functor of the ${\mathsf{List}}$ datatype, applying just to the second component of the optional pair. We therefore define

$\displaystyle \begin{array}{@{}l} \mathit{stepi} :: (\mathit{Model}, \mathit{Interval}) \rightarrow \mathsf{Maybe}\;(\mathit{Symbol}, (\mathit{Model}, \mathit{Interval})) \\ \mathit{stepi}\;(m,i) = \mathit{Just}\;(s, (\mathit{newModel}\;m\;s, \mathit{encodeSym}\;m\;s \mathbin{\triangleleft} i)) \\ \qquad\mathbf{where}\;s = \mathit{decodeSym}\;m\;(\mathit{weight}\;i\;(\frac 1 2)) \end{array}$

and have

$\displaystyle \mathit{step}\;(\mathit{prepare}_2\;(m, i)) = \mathit{fmap}\;\mathit{prepare}_2\;(\mathit{stepi}\;(m,i))$

and therefore

$\displaystyle \mathit{unfoldr}\;\mathit{step} \cdot \mathit{prepare}_2 = \mathit{unfoldr}\;\mathit{stepi}$

Note that the right-hand side will eventually lead to intervals that exceed the unit interval. When ${j \supseteq i}$, it follows that ${\mathit{unit} \supseteq j \mathbin{\triangleleft} i}$; but the unfolding process keeps widening the interval without bound, so it will necessarily eventually exceed the unit bounds. We return to this point shortly.

We have therefore concluded that

$\displaystyle \begin{array}{@{}lcl} \mathit{decode}_1\;m &=& \mathit{unfoldr}\;\mathit{step} \cdot \mathit{prepare}\;m \cdot \mathit{foldl}\;\mathit{focus}\;\mathit{unit} \\ &=& \mathit{unfoldr}\;\mathit{step} \cdot \mathit{prepare}_2 \cdot \mathit{prepare}_1\;m \cdot \mathit{foldl}\;\mathit{focus}\;\mathit{unit} \\ &=& \mathit{unfoldr}\;\mathit{stepi} \cdot \mathit{foldl}\;\mathit{mfocus}\;(m,\mathit{unit}) \end{array}$

Now we need to check the streaming condition for ${\mathit{mfocus}}$ and ${\mathit{stepi}}$. Unfortunately, this is never going to hold: ${\mathit{stepi}}$ is always productive, so ${\mathit{stream}\;\mathit{stepi}\;\mathit{mfocus}}$ will only take production steps and never consume any input. The problem is that ${\mathit{unfoldr}\;\mathit{stepi}}$ is too aggressive, and we need to use the more cautious flushing version of streaming instead. Informally, the streaming process should be productive from a given state ${(m,i)}$ only when the whole of interval ${i}$ maps to the same symbol in model ${m}$, so that however ${i}$ is focussed by subsequent inputs, that symbol cannot be invalidated.

More formally, note that

$\displaystyle \mathit{unfoldr}\;\mathit{stepi} = \mathit{apo}\;\mathit{safestepi}\;(\mathit{unfoldr}\;\mathit{stepi})$

where

$\displaystyle \begin{array}{@{}l} \mathit{safestepi} :: (\mathit{Model}, \mathit{Interval}) \rightarrow \mathsf{Maybe}\;(\mathit{Symbol}, (\mathit{Model}, \mathit{Interval})) \\ \mathit{safestepi}\;(m,i) \\ \begin{array}[t]{@{\quad}clcl} | & \mathit{safe}\;(m,i) &=& \mathit{stepi}\;(m,i) \\ | & \mathbf{otherwise} &=& \mathit{Nothing} \end{array} \end{array}$

and

$\displaystyle \begin{array}{@{}l} \mathit{safe} :: (\mathit{Model},\mathit{Interval}) \rightarrow \mathit{Bool} \\ \mathit{safe}\;(m, i) = \mathit{encodeSym}\;m\;s \supseteq i \quad\mathbf{where}\; s = \mathit{decodeSym}\;m\;(\mathit{midpoint}\;i) \end{array}$

That is, the interval ${i}$ is “safe” for model ${m}$ if it is fully included in the encoding of some symbol ${s}$; then all elements of ${i}$ decode to ${s}$. Then, and only then, we may commit to outputting ${s}$, because no further input bits could lead to a different first output symbol.

Note now that the interval remains bounded by unit interval during the streaming phase, because of the safety check in ${\mathit{safestepi}}$, although it will still exceed the unit interval during the flushing phase. However, at this point we can undo the fusion we performed earlier, “fissioning${\mathit{unfoldr}\;\mathit{stepi}}$ into ${\mathit{unfoldr}\;\mathit{step} \cdot \mathit{prepare}_2}$ again: this manipulates rationals rather than intervals, so there is no problem with intervals getting too wide. We therefore have:

$\displaystyle \mathit{decode}_1\;m = \mathit{apo}\;\mathit{safestepi}\;(\mathit{unfoldr}\;\mathit{step} \cdot \mathit{prepare}_2) \cdot \mathit{foldl}\;\mathit{mfocus}\;(m,\mathit{unit})$

Now let us check the streaming condition for ${\mathit{mfocus}}$ and the more cautious ${\mathit{safestepi}}$. Suppose that ${(m,i)}$ is a productive state, so that ${\mathit{safe}\;(m,i)}$ holds, that is, all of interval ${i}$ is mapped to the same symbol in ${m}$, and let

$\displaystyle \begin{array}{@{}lcl} s &=& \mathit{decodeSym}\;m\;(\mathit{midpoint}\;i) \\ m' &=& \mathit{newModel}\;m\;s \\ i' &=& \mathit{encodeSym}\;m\;s \mathbin{\triangleleft} i \end{array}$

so that ${\mathit{safestepi}\;(m,i) = \mathit{Just}\;(s, (m',i'))}$. Consuming the next input ${b}$ leads to state ${(m, \mathit{focus}\;i\;b)}$. This too is a productive state, because ${i \supseteq \mathit{focus}\;i\;b}$ for any ${b}$, and so the whole of the focussed interval is also mapped to the same symbol ${s}$ in the model. In particular, the midpoint of ${\mathit{focus}\;i\;b}$ is within interval ${i}$, and so the first symbol produced from the state after consumption coincides with the symbol ${s}$ produced from the state before consumption. That is,

$\displaystyle \mathit{safestepi}\;(\mathit{mfocus}\;(m,i)\;b) = \mathit{Just}\;(s, \mathit{mfocus}\;(m', i')\;b)$

as required. We can therefore rewrite decoding as a flushing stream computation:

$\displaystyle \begin{array}{@{}l} \mathit{decode}_2 :: \mathit{Model} \rightarrow [\mathit{Bit}] \rightarrow [\mathit{Symbol}] \\ \mathit{decode}_2\;m = \mathit{fstream}\;\mathit{safestepi}\;(\mathit{unfoldr}\;\mathit{step} \cdot \mathit{prepare}_2)\;\mathit{mfocus}\;(m,\mathit{unit}) \end{array}$

That is, initial symbols are output as soon as they are completely determined, even before all the input bits have been read. This agrees with ${\mathit{decode}_1}$ on finite bit sequences.

## Fixed-precision arithmetic

We will leave arithmetic coding at this point. There is actually still quite a bit more arithmetic required—in particular, for competitive performance it is important to use only fixed-precision arithmetic, restricting attention to rationals within the unit interval with denominator ${2^k}$ for some fixed~${k}$. In order to be able to multiply two numerators using 32-bit integer arithmetic without the risk of overflow, we can have at most ${k=15}$. Interval narrowing now needs to be approximate, rounding down both endpoints to integer multiples of ${2^{-k}}$. Care needs to be taken so that this rounding never makes the two endpoints of an interval coincide. Still, encoding can be written as an instance of ${\mathit{stream}}$. Decoding appears to be more difficult: the approximate arithmetic means that we no longer have interval widening as an exact inverse of narrowing, so the approach above no longer works. Instead, our 2002 lecture notes introduce a “destreaming” operator that simulates and inverts streaming: the decoder works in sympathy with the encoder, performing essentially the same interval arithmetic but doing the opposite conversions. Perhaps I will return to complete that story some time…

Posted in Uncategorized | 5 Comments

## Arithmetic Coding

This post is about the data compression method called arithmetic coding, by which a text is encoded as a subinterval of the unit interval, which is then represented as a bit sequence. It can often encode more effectively than Huffman encoding, because it doesn’t have the restriction of Huffman that each symbol be encoded as a positive whole number of bits; moreover, it readily accommodates adaptive models of the text, which “learn” about the text being encoded while encoding it. It is based on lecture notes that I wrote in 2002 with Richard Bird, although the presentation here is somewhat simplified; it is another application of streaming. There’s quite a lot to cover, so in this post I’ll just set up the problem by implementing a basic encoder and decoder. In the next post, I’ll show how they can both be streamed. (We won’t get into the intricacies of restricting to fixed-precision arithmetic—perhaps I can cover that in a later post.)

The basic idea behind arithmetic coding is essentially to encode an input text as a subinterval of the unit interval, based on a model of the text symbols that assigns them to a partition of the unit interval into non-empty subintervals. For the purposes of this post, we will deal mostly with half-open intervals, so that the interval ${[l,r)}$ contains values ${x}$ such that ${l \le x < r}$, where ${l,r,x}$ are rationals.

For example, with just two symbols “a” and “b”, and a static model partitioning the unit interval into ${[0, \frac 1 3)}$ for “a” and ${[\frac 1 3, 1)}$ for “b”, the symbols in the input text “aba” successively narrow the unit interval to ${[0,\frac 1 3), [\frac 1 9, \frac 1 3), [\frac 1 9, \frac 5 {27})}$, and the latter interval is the encoding of the whole input. And in fact, it suffices to pick any single value in this final interval, as long as there is some other way to determine the end of the encoded text (such as the length, or a special end-of-text symbol).

## Intervals

We introduce the following basic definitions for intervals:

$\displaystyle \begin{array}{@{}l} \mathbf{type}\;\mathit{Interval} = (\mathit{Rational}, \mathit{Rational}) \vrule width0pt depth2ex \\ \mathit{unit} :: \mathit{Interval} \\ \mathit{unit} = (0,1) \vrule width0pt depth2ex \\ \mathit{contains} :: \mathit{Interval} \rightarrow \mathit{Rational} \rightarrow \mathit{Bool} \\ \mathit{contains}\;(l,r)\;x = l \le x \land x < r \vrule width0pt depth2ex \\ \mathit{includes} :: \mathit{Interval} \rightarrow \mathit{Interval} \rightarrow \mathit{Bool} \\ \mathit{includes}\;(l,r)\;(p,q) = l \le p \land q \le r \end{array}$

We’ll write “${i \ni x}$” for ${\mathit{contains}\;i\;x}$, and “${i \supseteq j}$” for ${\mathit{includes}\;i\;j}$.

A crucial operation on intervals is narrowing of one interval by another, where ${\mathit{narrow}\;i\;j}$ is to ${i}$ as ${j}$ is to the unit interval:

$\displaystyle \begin{array}{@{}l} \mathit{narrow} :: \mathit{Interval} \rightarrow \mathit{Interval} \rightarrow \mathit{Interval} \\ \mathit{narrow}\;i\;(p,q) = (\mathit{weight}\;i\;p, \mathit{weight}\;i\;q) \vrule width0pt depth2ex \\ \mathit{weight} :: \mathit{Interval} \rightarrow \mathit{Rational} \rightarrow \mathit{Rational} \\ \mathit{weight}\;(l,r)\;x = l + (r-l) \times x \end{array}$

We’ll write “${i \mathbin{\triangleright} j}$” for ${\mathit{narrow}\;i\;j}$. Thus, ${\mathit{weight}\;(l,r)\;x}$ is “proportionately ${x}$ of the way between ${l}$ and ${r}$“, and we have

$\displaystyle \begin{array}{@{}lcl} i \ni \mathit{weight}\;i\;x & \Leftarrow& \mathit{unit} \ni x \\ i \supseteq i \mathbin{\triangleright} j &\Leftarrow& \mathit{unit} \supseteq j \end{array}$

Conversely, we can widen one interval by another:

$\displaystyle \begin{array}{@{}l} \mathit{widen} :: \mathit{Interval} \rightarrow \mathit{Interval} \rightarrow \mathit{Interval} \\ \mathit{widen}\;i\;(p,q) = (\mathit{scale}\;i\;p, \mathit{scale}\;i\;q) \vrule width0pt depth2ex \\ \mathit{scale} :: \mathit{Interval} \rightarrow \mathit{Rational} \rightarrow \mathit{Rational} \\ \mathit{scale}\;(l,r)\;x = (x-l)/(r-l) \end{array}$

We’ll write “${i \mathbin{\triangleleft} j}$” for ${\mathit{widen}\;i\;j}$. Note that ${\mathit{scale}}$ is inverse to ${\mathit{weight}}$, in the sense

$\displaystyle y = \mathit{weight}\;i\;x \Leftrightarrow \mathit{scale}\;i\;y = x$

and consequently widening is inverse to narrowing:

$\displaystyle i \mathbin{\triangleleft} (i \mathbin{\triangleright} j) = j$

## Models

We work with inputs consisting of sequences of symbols, which might be characters or some higher-level tokens:

$\displaystyle \mathbf{type}\;\mathit{Symbol} = \mathit{Char}$

The type ${\mathit{Model}}$ then must provide the following operations:

• a way to look up a symbol, obtaining the corresponding interval:

$\displaystyle \mathit{encodeSym} :: \mathit{Model} \rightarrow \mathit{Symbol} \rightarrow \mathit{Interval}$

• conversely, a way to decode a value, retrieving a symbol:

$\displaystyle \mathit{decodeSym} :: \mathit{Model} \rightarrow \mathit{Rational} \rightarrow \mathit{Symbol}$

• an initial model:

$\displaystyle \mathit{initial} :: \mathit{Model}$

• a means to adapt the model on seeing a new symbol:

$\displaystyle \mathit{newModel} :: \mathit{Model} \rightarrow \mathit{Symbol} \rightarrow \mathit{Model}$

The central property is that encoding and decoding are inverses, in the following sense:

$\displaystyle \mathit{decodeSym}\;m\;x = s \quad \Leftrightarrow \quad \mathit{encodeSym}\;m\;s \ni x$

There are no requirements on ${\mathit{initial}}$ and ${\mathit{newModel}}$, beyond the latter being a total function.

For example, we might support adaptive coding via a model that counts the occurrences seen so far of each of the symbols, represented as a histogram:

$\displaystyle \mathbf{type}\;\mathit{Model} = [(\mathit{Symbol},\mathit{Integer})]$

This naive implementation works well enough for small alphabets. One might maintain the histogram in decreasing order of counts, so that the most likely symbols are at the front and are therefore found quickest. For larger alphabets, it is better to maintain the histogram as a binary search tree, ordered alphabetically by symbol, and caching the total counts of every subtree.

## Encoding

Now encoding is straightforward to define. The function ${\mathit{encodeSyms}}$ takes an initial model and a list of symbols, and returns the list of intervals obtained by looking up each symbol in turn, adapting the model at each step:

$\displaystyle \begin{array}{@{}l} \mathit{encodeSyms} :: \mathit{Model} \rightarrow [\mathit{Symbol}] \rightarrow [\mathit{Interval}] \\ \mathit{encodeSyms}\; m = \mathit{map}\;\mathit{snd} \cdot \mathit{tail} \cdot \mathit{scanl}\;\mathit{next}\;(m,\mathit{unit}) \\ \quad \mathbf{where}\; \begin{array}[t]{@{}l} \mathit{next} :: (\mathit{Model},\mathit{Interval}) \rightarrow \mathit{Symbol} \rightarrow (\mathit{Model},\mathit{Interval}) \\ \mathit{next}\;(m,i)\;s = (\mathit{newModel}\;m\;s, \mathit{encodeSym}\;m\;s) \end{array} \end{array}$

That is,

$\displaystyle \begin{array}{@{}lcl} \mathit{encodeSyms}\;m\;[\,] &=& [\,] \\ \mathit{encodeSyms}\;m\;(s:ss) &=& \mathit{encodeSym}\;m\;s : \mathit{encodeSyms}\;(\mathit{newModel}\;m\;s)\;ss \end{array}$

We then narrow the unit interval by each of these subintervals, and pick a single value from the resulting interval:

$\displaystyle \begin{array}{@{}l} \mathit{encode}_0 :: \mathit{Model} \rightarrow [\mathit{Symbol}] \rightarrow \mathit{Rational} \\ \mathit{encode}_0\;m = \mathit{pick} \cdot \mathit{foldr}\;\mathit{narrow}\;\mathit{unit} \cdot \mathit{encodeSyms}\;m \end{array}$

All we require of ${\mathit{pick} :: \mathit{Interval} \rightarrow \mathit{Rational}}$ is that ${i \ni \mathit{pick}\;i}$; then ${\mathit{encode}_0}$ yields a fraction in the unit interval. For example, we might set ${\mathit{pick} = \mathit{midpoint}}$, where

$\displaystyle \textstyle \mathit{midpoint}\;i = \mathit{weight}\;i\;(\frac 1 2)$

## Decoding

So much for encoding; how do we retrieve the input text? In fact, we can retrieve the first symbol simply by using ${\mathit{decodeSym}}$. Expanding the encoding of a non-empty text, we have:

$\displaystyle \begin{array}{@{}cl} & \mathit{encode}_0\;m\;(s:ss) \\ = & \qquad \{ \mathit{encode}_0 \mbox{ and } \mathit{encodeSyms} \mbox{, as above; let } i = \mathit{encodeSym}\;m\;s \} \\ & \mathit{pick}\;(\mathit{foldr}\;\mathit{narrow}\;\mathit{unit}\;(i : \mathit{encodeSyms}\;(\mathit{newModel}\;m\;s)\;ss)) \\ = & \qquad \{ \mbox{fold} \} \\ & \mathit{pick}\;(i \mathbin{\triangleright} \mathit{foldr}\;\mathit{narrow}\;\mathit{unit}\;(\mathit{encodeSyms}\;(\mathit{newModel}\;m\;s)\;ss)) \\ = & \qquad \{ \mathit{pick}\;(i \mathbin{\triangleright} j) = \mathit{weight}\;i\;(\mathit{pick}\;j) \mbox{ (see below)} \} \\ & \mathit{weight}\;i\;(\mathit{pick}\;(\mathit{foldr}\;\mathit{narrow}\;\mathit{unit}\;(\mathit{encodeSyms}\;(\mathit{newModel}\;m\;s)\;ss))) \\ = & \qquad \{ \mathit{encode}_0 \mbox{ and } \mathit{encodeSyms} \mbox{ again} \} \\ & \mathit{weight}\;i\;(\mathit{encode}_0\;(\mathit{newModel}\;m\;s)\;ss) \end{array}$

The proof obligation, left as an exercise, is to show that

$\displaystyle \mathit{pick}\;(i \mathbin{\triangleright} j) = \mathit{weight}\;i\;(\mathit{pick}\;j)$

which holds when ${\mathit{pick}\;i}$ is of the form ${\mathit{weight}\;i\;x}$ for some ${x}$.

Now

$\displaystyle \begin{array}{@{}cl} & \mathit{decodeSym}\;m\;(\mathit{encode}_0\;m\;(s:ss)) = s \\ \Leftrightarrow & \qquad \{ \mbox{expansion of } \mathit{encode}_0 \mbox{, as above; let } i = \mathit{encodeSym}\;m\;s \} \\ & \mathit{decodeSym}\;m\;(\mathit{weight}\;i\;(\mathit{encode}_0\;(\mathit{newModel}\;m\;s)\;ss)) = s \\ \Leftrightarrow & \qquad \{ \mbox{requirement on models} \} \\ & i \ni \mathit{weight}\;i\;(\mathit{encode}_0\;(\mathit{newModel}\;m\;s)\;ss) \\ \Leftarrow & \qquad \{ \mathit{weight} \} \\ & \mathit{unit} \ni \mathit{encode}_0\;(\mathit{newModel}\;m\;s)\;ss \end{array}$

and indeed, encoding yields a fraction in the unit interval, so this recovers the first symbol correctly. This is the foothold that allows the decoding process to make progress; having obtained the first symbol using ${\mathit{decodeSym}}$, it can adapt the model in precisely the same way that the encoding process does, then retrieve the second symbol using that adapted model, and so on. The only slightly tricky part is that when decoding an initial value ${x}$, having obtained the first symbol ${s}$, decoding should continue on some modified value ${x'}$; what should the modification be? It turns out that the right thing to do is to scale ${x}$ by the interval associated in the model with symbol ${s}$, since scaling is the inverse operation to the ${\mathit{weight}}$s that take place during encoding. That is, we define:

$\displaystyle \begin{array}{@{}l} \mathit{decode}_0 :: \mathit{Model} \rightarrow \mathit{Rational} \rightarrow [\mathit{Symbol}] \\ \mathit{decode}_0\;m\;x = \mathit{unfoldr}\;\mathit{step}\;(m,x) \vrule width0pt depth2ex \\ \mathit{step} :: (\mathit{Model}, \mathit{Rational}) \rightarrow \mathsf{Maybe}\;(\mathit{Symbol}, (\mathit{Model},\mathit{Rational})) \\ \mathit{step}\;(m,x) = \mathit{Just}\;(s, (\mathit{newModel}\;m\;s, \mathit{scale}\;(\mathit{encodeSym}\;m\;s)\;x)) \\ \quad \mathbf{where}\;s = \mathit{decodeSym}\;m\;x \end{array}$

(Of course, ${\mathit{encodeSym}\;m\;s \ni x}$, by the inverse requirement on models, and so the new scaled value is again within the unit interval.)

Note that decoding yields an infinite list of symbols; the function ${\mathit{step}}$ is always productive. Nevertheless, that infinite list starts with the encoded text, as we shall now verify. Define the round-trip function

$\displaystyle \mathit{round}_0\;m = \mathit{decode}_0\;m \cdot \mathit{encode}_0\;m$

Then we have:

$\displaystyle \begin{array}{@{}cl} & \mathit{round}_0\;m\;(s:ss) \\ = & \qquad \{ \mbox{definition of } \mathit{round}_0 \} \\ & \mathit{decode}_0\;m\;(\mathit{encode}_0\;m\;(s:ss)) \\ = & \qquad \{ \mathit{encode}_0 \mbox{; let } i = \mathit{encodeSym}\;m\;s, m' = \mathit{newModel}\;m\;s \} \\ & \mathit{decode}_0\;m\;(\mathit{weight}\;i\;(\mathit{encode}_0\;m'\;ss)) \\ = & \qquad \{ \mathit{decode}_0 \mbox{; first decoded symbol is correct, as above} \} \\ & s : \mathit{decode}_0\;m'\;(\mathit{scale}\;i\;(\mathit{weight}\;i\;(\mathit{encode}_0\;m'\;ss))) \\ = & \qquad \{ \mathit{scale}\;i\;(\mathit{weight}\;i\;x) = x \} \\ & s : \mathit{decode}_0\;m'\;(\mathit{encode}_0\;m'\;ss) \\ = & \qquad \{ \mbox{definition of } \mathit{round}_0 \} \\ & s : \mathit{round}_0\;m'\;ss \end{array}$

From this it follows that indeed the round-trip recovers the initial text, in the sense that ${\mathit{round}_0\;m\;ss}$ yields an infinite sequence that starts with ${ss}$; in fact,

$\displaystyle \mathit{round}_0\;m\;ss = ss \mathbin{{+}\!\!\!{+}} \mathit{round}_0\;(\mathit{foldl}\;\mathit{newModel}\;m\;ss)\;[\,]$

yielding the original input followed by some junk, the latter obtained by decoding the fraction ${\frac 1 2}$ (the encoding of ${[\,]}$) from the final model ${\mathit{foldl}\;\mathit{newModel}\;m\;ss}$ that results from adapting the initial model to each symbol in ${ss}$ in turn. To actually retrieve the input text with no junk suffix, one could transmit the length separately (although that doesn’t sit well with streaming), or append a distinguished end-of-text symbol.

## What’s next

So far we have an encoder and a decoder, and a proof that the decoder successfully decodes the encoded text. In the next post, we’ll see how to reimplement both as streaming processes.

Posted in Uncategorized | 3 Comments

## The Digits of Pi

In the previous post we were introduced to metamorphisms, which consist of an unfold after a fold—typically on lists, and the fold part typically a ${\mathit{foldl}}$. A canonical example is the conversion of a fraction from one base to another. For simplicity, let’s consider here only infinite fractions, so we don’t have to deal with the end of the input and flushing the state:

$\displaystyle \begin{array}{@{}lcl@{}} \multicolumn{3}{@{}l}{\mathit{stream} :: (\beta \rightarrow \mathsf{Maybe}\;(\gamma,\beta)) \rightarrow (\beta \rightarrow \alpha \rightarrow \beta) \rightarrow \beta \rightarrow [\alpha] \rightarrow [\gamma]} \\ \mathit{stream}\;g\;f\;b\;x &=& \mathbf{case}\;g\;b\;\mathbf{of} \\ & & \quad \begin{array}[t]{@{}lcl@{}} \mathit{Just}\;(c,b') &\rightarrow& c : \mathit{stream}\;g\;f\;b'\;x \\ \mathit{Nothing} &\rightarrow& \mathbf{case}\;x\;\mathbf{of} \\ & & \quad \begin{array}[t]{@{}lcl@{}} a:x' &\rightarrow& \mathit{stream}\;g\;f\;(f\;b\;a)\;x' \end{array} \end{array} \end{array}$

So for example, we can convert an infinite fraction in base 3 to one in base 7 with

$\displaystyle \mathit{stream}\;\mathit{next}\;\mathit{step}\;(0,1)$

where

$\displaystyle \begin{array}{@{}lcl@{}} \mathit{next}\;(u,v) &=& \begin{array}[t]{@{}l} \mathbf{let}\;y = \lfloor{7 \times u \times v}\rfloor\;\mathbf{in} \\ \mathbf{if}\;\lfloor{y}\rfloor = \lfloor{7 \times (u+1) \times v}\rfloor\;\begin{array}[t]{@{}l@{\;}l}\mathbf{then}&\mathit{Just}\;(y,(u - y/(v \times 7), v \times 7))\\\mathbf{else}&\mathit{Nothing} \\ \end{array} \end{array} \\ \mathit{stepl}\;(u,v)\;d &=& (u \times 3 + d, v / 3) \end{array}$

In this post, we’ll see another number conversion problem, which will deliver the digits of ${\pi}$. For more details, see my paper—although the presentation here is simpler now.

## Series for pi

Leibniz showed that

$\displaystyle \displaystyle \frac{\pi}{4} = \sum_{i=0}^{\infty} \frac{(-1)^i}{2i+1}$

From this, using Euler’s convergence-accelerating transformation, one may derive

$\displaystyle \pi = \sum_{i=0}^{\infty} \frac{(i!)^2\,2^{i+1}}{(2i+1)!}$

or equivalently

$\displaystyle \pi = 2 + \frac{1}{3} \times \biggl(2 + \frac{2}{5}\times \biggl(2 + \frac{3}{7}\times \biggl( \cdots \biggl(2 + \frac{i}{2i+1}\times \biggl(\cdots\biggr)\biggr)\biggr)\biggr)\biggr)$

This can be seen as the number ${(2;2,2,2...)}$ in a funny mixed-radix base ${(\frac{1}{3}, \frac{2}{5}, \frac{3}{7}...)}$, just as the usual decimal expansion

$\displaystyle \pi = 3 + \frac{1}{10} \times \biggl(1 + \frac{1}{10}\times \biggl(4 + \frac{1}{10}\times \biggl( \cdots\biggr)\biggr)\biggr)$

is represented by the number ${(3;1,4,1...)}$ in the fixed-radix base ${(\frac{1}{10},\frac{1}{10},\frac{1}{10}...)}$. Computing the decimal digits of ${\pi}$ is then a matter of conversion from the mixed-radix base to the fixed-radix base.

## Conversion from a fixed base

Let’s remind ourselves of how it should work, using a simpler example: conversion from one fixed base to another. We are given an infinite-precision fraction in the unit interval

$\displaystyle x = \frac{1}{m} \times \biggl(x_0 + \frac{1}{m}\times \biggl(x_1 + \frac{1}{m}\times \biggl( \cdots\biggr)\biggr)\biggr)$

in base ${m}$, in which ${0 \le x_i < m}$ for each digit ${x_i}$. We are to convert it to a similar representation

$\displaystyle x = \frac{1}{n} \times \biggl(y_0 + \frac{1}{n}\times \biggl(y_1 + \frac{1}{n}\times \biggl( \cdots\biggr)\biggr)\biggr)$

in base ${n}$, in which ${0 \le y_j < n}$ for each output digit ${y_j}$. The streaming process maintains a state ${(u,v)}$, a pair of rationals; the invariant is that after consuming ${i}$ input digits and producing ${j}$ output digits, we have

$\displaystyle x = \frac{1}{n} \times \biggl(y_0 + \cdots + \frac{1}{n}\times \biggl(y_{j-1} + v \times (u + \frac{1}{m} \times \biggl( x_i + \frac{1}{m} \times \biggl(x_{i+1} + \cdots \biggr)\biggr)\biggr)\biggr)\biggr)$

so that ${(u,v)}$ represents a linear function ${(v\times) \cdot (u+)}$ that should be applied to the value represented by the remaining input.

We can initialize the process with ${i=0, j=0, u=0, v=1}$. At each step, we first try to produce another output digit. The remaining input digits ${x_i, x_{i+1},...}$ represent a value in the unit interval; so if ${n \times v \times (u+0)}$ and ${n \times v \times (u+1)}$ have the same integer part, then that must be the next output digit, whatever the remaining input digits are. Let ${y_j = \lfloor n \times v \times u \rfloor}$ be that integer. Now we need to find ${(u',v')}$ such that

$\displaystyle \frac{1}{n} \times \biggl(y_j + v' \times (u' + r)\biggr) = v \times (u + r)$

for any remainder ${r}$; then we can increment ${j}$ and set ${(u,v)}$ to ${(u',v')}$ and the invariant is maintained. A little algebra shows that we should take ${v' = n \times v}$ and ${u' = u - y_j/v'}$.

If ${v \times u}$ and ${v \times (u+1)}$ have different integer parts, we cannot yet tell what the next output digit should be, so we must consume the next input digit instead. Now we need to find ${(u',v')}$ such that

$\displaystyle v \times \biggl(u + \frac{1}{m} \times \biggl(x_i + r\biggr)\biggr) = v' \times (u' + r)$

for any remainder ${r}$; then we can increment ${i}$ and set ${(u,v)}$ to ${(u',v')}$ and the invariant is again maintained. Again, algebraic manipulation leads us to ${v' = v/m}$ and ${u' = m \times u + x_i}$.

For example, ${\frac{1}{4} = 0.020202..._3 = 0.151515..._7}$, and the conversion starts as follows:

$\displaystyle \begin{array}{c|c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}cc} x_i & & 0 && 2 && 0 && && 2 && && 0 && 2 \\ \hline (u,v) & \bigl(\frac{0}{1},\frac{1}{1}\bigr) && \bigl(\frac{0}{1},\frac{1}{3}\bigr) && \bigl(\frac{2}{1},\frac{1}{9}\bigr) && \bigl(\frac{6}{1},\frac{1}{27}\bigr) && \bigl(\frac{15}{7},\frac{7}{27}\bigr) && \bigl(\frac{59}{7},\frac{7}{81}\bigr) && \bigl(\frac{8}{49},\frac{49}{81}\bigr) && \bigl(\frac{24}{49},\frac{49}{243}\bigr) && \bigl(\frac{170}{49},\frac{49}{729}\bigr) & \cdots \vrule height 2.5ex depth 1.5ex width 0pt \\ \hline y_j & & && && && 1 && && 5 \end{array}$

That is, the initial state is ${u_0=\frac{0}{1}, v_0=\frac{1}{1}}$. This state does not yet determine the first output digit, so we consume the first input digit 0 to yield the next state ${u_1 = \frac{0}{1}, v_1 = \frac{1}{3}}$. This state still does not determine the first output, and nor will the next; so we consume the next two input digits 2 and 0, yielding state ${u_3 = \frac{6}{1}, v_3 = \frac{1}{27}}$. This state does determine the next digit: ${v_3 \times u_3 = 0.020_3 = 0.136..._7}$ and ${v_3 \times (u_3+1) = 0.021_3 = 0.154..._7}$ both start with a 1 in base 7. So we can produce a 1 as the first output digit, yielding state ${u_4 = \frac{15}{7}, v_4 = \frac{7}{27}}$. And so on.

The process tends to converge. Each production step widens the non-empty window ${[n \times v \times u, n \times v \times (u+1))}$ by a factor of ${n}$, so it will eventually contain multiple integers; therefore we cannot produce indefinitely. Each consumption step narrows the window by a factor of ${m}$, so it will tend towards eventually producing the next output digit. However, this doesn’t always work. For example, consider converting ${0.333..._{10}}$ to base 3:

$\displaystyle \begin{array}{c|c@{}c@{}c@{}c@{}c@{}c@{}cc} x_i & & 3 && 3 && 3 & \\ \hline (u,v) & \bigl(\frac{0}{1},\frac{1}{1}\bigr) && \bigl(\frac{3}{1},\frac{1}{10}\bigr) && \bigl(\frac{33}{1},\frac{1}{100}\bigr) && \bigl(\frac{333}{1},\frac{1}{1000}\bigr) & \cdots \vrule height 2.5ex depth 1.5ex width 0pt \\ \hline y_j & & && \end{array}$

The first output digit is never determined: if the first non-3 in the input is less than 3, the value is less than a third, and the first output digit should be a 0; if the first non-3 is greater than 3, then the value is definitely greater than a third, and it is safe to produce a 1 as the first output digit; but because the input is all 3s, we never get to make this decision. This problem will happen whenever the value being represented has a finite representation in the output base.

## Conversion from a mixed base

Let’s return now to computing the digits of ${\pi}$. We have the input

$\displaystyle \pi = 2 + \frac{1}{3} \times \biggl(2 + \frac{2}{5}\times \biggl(2 + \frac{3}{7}\times \biggl( \cdots \biggl(2 + \frac{i}{2i+1}\times \biggl(\cdots\biggr)\biggr)\biggr)\biggr)\biggr)$

which we want to convert to decimal. The streaming process maintains a pair ${(u,v)}$ of rationals—but this time representing the linear function ${(u+) \cdot (v\times)}$, since this time our expression starts with a sum rather than a product. The invariant is similar: after consuming ${i-1}$ input digits and producing ${j}$ output digits, we have

$\displaystyle \pi = y_0 + \frac{1}{10} \times \biggl(\cdots y_{j-1} + \frac{1}{10} \times \biggl(u + v \times \biggl(x_i + \frac{i}{2i+1} \times \biggl(x_{i+1} + \frac{i+1}{2i+3} \times \cdots\biggr)\biggr)\biggr)\biggr)$

Note that the output base is fixed at 10; but more importantly, the input digits ${x_i}$ are all fixed at 2, and it is the input base that varies from digit to digit.

We can initialize the process with ${i=1, j=0, u=0, v=1}$. At each step, we first try to produce an output digit. What value might the remaining input

$\displaystyle r = 2 + \frac{i}{2i+1} \times \biggl(2 + \frac{i+1}{2i+3} \times \cdots \biggr)$

represent? Each of the bases is at least ${\frac{1}{3}}$, so it is clear that ${r_{\mathrm{min}} \le r}$, where

$\displaystyle r_{\mathrm{min}} = 2 + \frac{1}{3} \times r_{\mathrm{min}}$

which has unique solution ${r_{\mathrm{min}} = 3}$. Similarly, each of the bases is less than ${\frac{1}{2}}$, so it is clear that ${r < r_{\mathrm{max}}}$, where

$\displaystyle r_{\mathrm{max}} = 2 + \frac{1}{2} \times r_{\mathrm{max}}$

which has unique solution ${r_{\mathrm{max}} = 4}$. So we consider the bounds ${\lfloor u + v \times 3 \rfloor}$ and ${\lfloor u + v \times 4 \rfloor}$; if these have the same integer part ${y_j}$, then that is the next output digit. Now we need to find ${(u',v')}$ such that

$\displaystyle y_j + \frac{1}{10} \times (u' + v' \times r) = u + v \times r$

for any remainder ${r}$, so we pick ${u' = 10 \times (u - y_j)}$ and ${v' = 10 \times v}$. Then we can increment ${j}$ and set ${(u,v)}$ to ${(u',v')}$, and the invariant is maintained.

If the two bounds have different integer parts, we must consume the next input digit instead. Now we need to find ${(u',v')}$ such that

$\displaystyle u' + v' \times r = u + v \times \biggl(x_i + \frac{i}{2i+1} \times r\biggr)$

for all ${r}$, so we pick ${u' = u + v \times x_i}$ and ${v' = v \times i / (2i+1)}$. Then we can increment ${i}$ and set ${(u,v)}$ to ${(u',v')}$, and again the invariant is maintained.

The conversion starts as follows:

$\displaystyle \begin{array}{c|c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}cc} x_i & & 2 && && 2 && 2 && && 2 && 2 \\ \hline (u,v) & \bigl(\frac{0}{1},\frac{1}{1}\bigr) && \bigl(\frac{2}{1},\frac{1}{3}\bigr) && \bigl(\frac{-10}{1},\frac{10}{3}\bigr) && \bigl(\frac{10}{3},\frac{4}{3}\bigr) && \bigl(\frac{-2}{3},\frac{4}{7}\bigr) && \bigl(\frac{-50}{3},\frac{40}{7}\bigr) && \bigl(\frac{-110}{21},\frac{160}{63}\bigr) && \bigl(\frac{-10}{63},\frac{800}{693}\bigr) & \cdots \vrule height 2.5ex depth 1.5ex width 0pt \\ \hline y_j & & && 3 && && && 1 && && \end{array}$

Happily, non-termination ceases to be a problem: the value being represented does not have a finite representation in the output base, being irrational.

## Code

We can plug these definitions straight into the ${\mathit{stream}}$ function above:

$\displaystyle \mathit{piDigits} = \mathit{stream}\;g\;f\;(1,0,0\%1,1\%1)\;(\mathit{repeat}\;2)$

where

$\displaystyle g\;(i,j,u,v) = \begin{array}[t]{@{}l} \mathbf{if}\;y == \mathit{floor}\;(u + v \times 4) \\ \mathbf{then}\;\mathit{Just}\;(y, (i,j+1, 10 \times (u - \mathit{fromIntegral}\;y), 10 \times v)) \\ \mathbf{else}\;\mathit{Nothing} \\ \mathbf{where}\;y = \mathit{floor}\;(u + v \times 3) \end{array}$

and

$\displaystyle f\;(i,j,u,v)\;x = \begin{array}[t]{@{}l} (i+1,j,u + v \times \mathit{fromIntegral}\;x, v \times i' / (2 \times i' + 1)) \\ \mathbf{where}\;i' = \mathit{fromIntegral}\;i \end{array}$

(The ${\%}$s make rational numbers in Haskell, and force the ambiguous fractional type to be ${\mathit{Rational}}$ rather than ${\mathit{Double}}$.)

In fact, this program can be considerably simplified, by inlining the definitions. In particular, the input digits are all 2, so we need not supply them. Moreover, the ${j}$ component of the state is never used, because we treat each output digit in the same way (in contrast to the input digits); so that may be eliminated. Finally, we can eliminate some of the numeric coercions if we represent the ${i}$ component as a rational in the first place:

$\displaystyle \mathit{piDigits} = \begin{array}[t]{@{}l} \mathit{go}\;((1,0,1) :: (\mathit{Rational},\mathit{Rational},\mathit{Rational}))\;\mathbf{where} \\ \qquad \mathit{go}\;(i,u,v) = \begin{array}[t]{@{}ll} \mathbf{if} & y == \mathit{floor}\;(u+v \times 4) \\ \mathbf{then} & y : \mathit{go}\;(i,10 \times (u-\mathit{fromIntegral}\;y),10 \times v) \\ \mathbf{else} & \mathit{go}\;(i+1,u+2 \times v, (v \times i) / (2 \times i+1)) \\ \multicolumn{2}{@{}l}{\qquad \mathbf{where}\; y = \mathit{floor}\;(u+v \times 3)} \end{array} \end{array}$

Then we have

$\displaystyle \mathit{piDigits} = [3,1,4,1,5,9,2,6,5,3,5,8,9,7,9,3...$

Posted in Uncategorized | 4 Comments

## Metamorphisms

It appears that I have insufficient time, or at least insufficient discipline, to contribute to this blog, except when I am on sabbatical. Which I now am… so let’s see if I can do better.

## Hylomorphisms

I don’t think I’ve written about them yet in this series—another story, for another day—but hylomorphisms consist of a fold after an unfold. One very simple example is the factorial function: ${n!}$ is the product of the predecessors ${[n,...,1]}$ of ${n}$. The predecessors can be computed with an unfold:

$\displaystyle \begin{array}{@{}lcl@{}} \mathit{preds} &::& \mathit{Integer} \rightarrow [\mathit{Integer}] \\ \mathit{preds} &=& \mathit{unfoldr}\;\mathit{step} \; \mathbf{where} \\ & & \quad \begin{array}[t]{@{}lcl@{}} \mathit{step}\;0 &=& \mathit{Nothing} \\ \mathit{step}\;n &=& \mathit{Just}\;(n, n-1) \end{array} \end{array}$

and the product as a fold:

$\displaystyle \begin{array}{@{}lcl@{}} \mathit{prod} &::& [\mathit{Integer}] \rightarrow \mathit{Integer} \\ \mathit{prod} &=& \mathit{foldr}\;(\times)\;1 \end{array}$

and then factorial is their composition:

$\displaystyle \begin{array}{@{}lcl@{}} \mathit{factorial} &::& \mathit{Integer} \rightarrow \mathit{Integer} \\ \mathit{factorial} &=& \mathit{prod} \cdot \mathit{preds} \end{array}$

Another example is a tree-based sorting algorithm that resembles Hoare’s quicksort: from the input list, grow a binary search tree, as an unfold, and then flatten that tree back to a sorted list, as a fold. This is a divide-and-conquer algorithm; in general, these can be modelled as unfolding a tree of subproblems by repeatedly dividing the problem, then collecting the solution to the original problem by folding together the solutions to subproblems.

## An unfold after a fold

This post is about the opposite composition, an unfold after a fold. Some examples:

• ${\mathit{regroup}\;n = \mathit{group}\;n \cdot \mathit{concat}}$ to reformat a list of lists to a given length;
• ${\mathit{heapsort} = \mathit{flattenHeap} \cdot \mathit{buildHeap}}$ to sort a list;
• ${\mathit{baseconv}\;(b,c) = \mathit{toBase}\;b \cdot \mathit{fromBase}\;c}$ to convert a fraction from base ${c}$ to base ${b}$;
• ${\mathit{arithCode} = \mathit{toBits} \cdot \mathit{narrow}}$ to encode a text in binary by “arithmetic coding”.

In each of these cases, the first phase is a fold, which consumes some structured representation of a value into an intermediate unstructured format, and the second phase is an unfold, which generates a new structured representation. Their composition effects a change of representation, so we call them metamorphisms.

Hylomorphisms always fuse, and one can deforest the intermediate virtual data structure. For example, one need not construct the intermediate list in the factorial function; since each cell gets constructed in the unfold only to be immediately deconstructed in the fold, one can cut to the chase and go straight to the familiar recursive definition. For the base case, we have:

$\displaystyle \begin{array}{ll} & \mathit{factorial}\;0 \\ = & \qquad \{ \mathit{factorial} \} \\ & \mathit{prod}\;(\mathit{preds}\;0) \\ = & \qquad \{ \mathit{preds} \} \\ & \mathit{prod}\;[\,] \\ = & \qquad \{ \mathit{prod} \} \\ & 1 \end{array}$

and for non-zero argument ${n}$, we have:

$\displaystyle \begin{array}{ll} & \mathit{factorial}\;n \\ = & \qquad \{ \mathit{factorial} \} \\ & \mathit{prod}\;(\mathit{preds}\;n) \\ = & \qquad \{ \mathit{preds} \} \\ & \mathit{prod}\;(n : \mathit{preds}\;(n-1)) \\ = & \qquad \{ \mathit{prod} \} \\ & n \times \mathit{prod}\;(\mathit{preds}\;(n-1)) \\ = & \qquad \{ \mathit{factorial} \} \\ & n \times \mathit{factorial}\;(n-1) \\ \end{array}$

In contrast, metamorphisms only fuse under certain conditions. However, when they do fuse, they also allow infinite representations to be processed, as we shall see.

Fusion seems to depend on the fold being tail-recursive; that is, a ${\mathit{foldl}}$:

$\displaystyle \begin{array}{@{}lcl@{}} \mathit{foldl} &::& (\beta \rightarrow \alpha \rightarrow \beta) \rightarrow \beta \rightarrow [\alpha] \rightarrow \beta \\ \mathit{foldl}\;f\;b\;(a:x) &=& \mathit{foldl}\;f\;(f\;b\;a)\;x \\ \mathit{foldl}\;f\;b\;[\,] &=& b \end{array}$

For the unfold phase, we will use the usual list unfold:

$\displaystyle \begin{array}{@{}lcl@{}} \mathit{unfoldr} &::& (\beta \rightarrow \mathsf{Maybe}\;(\gamma,\beta)) \rightarrow \beta \rightarrow [\gamma] \\ \mathit{unfoldr}\;g\;b &=& \mathbf{case}\;g\;b\;\mathbf{of} \\ & & \quad \begin{array}[t]{@{}lcl@{}} \mathit{Just}\;(c,b') &\rightarrow& c : \mathit{unfoldr}\;g\;b' \\ \mathit{Nothing} &\rightarrow& [\,] \end{array} \end{array}$

We define a metamorphism as their composition:

$\displaystyle \begin{array}{l} \mathit{meta} :: (\beta \rightarrow \mathsf{Maybe}\;(\gamma,\beta)) \rightarrow (\beta \rightarrow \alpha \rightarrow \beta) \rightarrow \beta \rightarrow [\alpha] \rightarrow [\gamma] \\ \mathit{meta}\;g\;f\;b = \mathit{unfoldr}\;g \cdot \mathit{foldl}\;f\;b \end{array}$

This transforms input of type ${[A]}$ to output of type ${[C]}$: in the first phase, ${\mathit{foldl}\;f\;b}$, it consumes all the input into an intermediate value of type ${B}$; in the second phase, ${\mathit{unfoldr}\;g}$, it produces all the output.

## Streaming

Under certain conditions, it is possible to fuse these two phases—this time, not in order to eliminate an intermediate data structure (after all, the intermediate type ${B}$ need not be structured), but rather in order to allow some production steps to happen before all the consumption steps are complete.

To that end, we define the ${\mathit{stream}}$ function as follows:

$\displaystyle \begin{array}{@{}lcl@{}} \multicolumn{3}{@{}l}{\mathit{stream} :: (\beta \rightarrow \mathsf{Maybe}\;(\gamma,\beta)) \rightarrow (\beta \rightarrow \alpha \rightarrow \beta) \rightarrow \beta \rightarrow [\alpha] \rightarrow [\gamma]} \\ \mathit{stream}\;g\;f\;b\;x &=& \mathbf{case}\;g\;b\;\mathbf{of} \\ & & \quad \begin{array}[t]{@{}lcl@{}} \mathit{Just}\;(c,b') &\rightarrow& c : \mathit{stream}\;g\;f\;b'\;x \\ \mathit{Nothing} &\rightarrow& \mathbf{case}\;x\;\mathbf{of} \\ & & \quad \begin{array}[t]{@{}lcl@{}} a:x' &\rightarrow& \mathit{stream}\;g\;f\;(f\;b\;a)\;x' \\ {[\,]} &\rightarrow& [\,] \end{array} \end{array} \end{array}$

This takes the same arguments as ${\mathit{meta}}$. It maintains a current state ${b}$, and produces an output element ${c}$ when it can; and when it can’t produce, it consumes an input element instead. In more detail, it examines the current state ${b}$ using function ${g}$, which is like the body of an unfold; this may produce a first element ${c}$ of the result and a new state ${b'}$; when it yields no element, the next element ${a}$ of the input is consumed using function ${f}$, which is like the body of a ${\mathit{foldl}}$; and when no input remains either, we are done.

The streaming condition for ${f}$ and ${g}$ is that

$\displaystyle g\;b = \mathit{Just}\;(c,b') \quad\Rightarrow\quad \forall a \mathbin{.} g\;(f\;b\;a) = \mathit{Just}\;(c, f\;b'\;a)$

Consider a state ${b}$ from which the body ${g}$ of the unfold is productive, yielding some ${\mathit{Just}\;(c,b')}$. From here we have two choices: we can either produce the output ${c}$, move to intermediate state ${b'}$, then consume the next input ${a}$ to yield a final state ${f\;b'\;a}$; or we can consume first to get the intermediate state ${f\;b\;a}$, and again try to produce. The streaming condition says that this intermediate state ${f\;b\;a}$ will again be productive, and will yield the same output ${c}$ and the same final state ${f\;b'\;a}$. That is, instead of consuming all the inputs first, and then producing all the outputs, it is possible to produce some of the outputs early, without jeopardizing the overall result. Provided that the streaming condition holds for ${f}$ and ${g}$, then

$\displaystyle \mathit{stream}\;g\;f\;b\;x = \mathit{meta}\;g\;f\;b\;x$

for all finite lists ${x}$.

As a simple example, consider the buffering’ process ${\mathit{meta}\;\mathit{uncons}\;(\mathbin{{+}\!\!\!{+}})\;[\,]}$, where

$\displaystyle \begin{array}{@{}lcl@{}} \mathit{uncons}\;x &=& \mathbf{case}\;x\;\mathbf{of} \\ & & \quad \begin{array}[t]{@{}lcl@{}} [\,] &\rightarrow& \mathit{Nothing} \\ c:x' &\rightarrow& \mathit{Just}\;(c,x') \end{array} \end{array}$

Note that ${\mathit{unfoldr}\;\mathit{uncons} = \mathit{id}}$, so ${\mathit{meta}\;\mathit{uncons}\;(\mathbin{{+}\!\!\!{+}})\;[\,]}$ is just a complicated way of writing ${\mathit{concat}}$ as a ${\mathit{foldl}}$. But the streaming condition holds for ${\mathbin{{+}\!\!\!{+}}}$ and ${\mathit{uncons}}$ (as you may check), so ${\mathit{concat}}$ may be streamed. Operationally, the streaming version of ${\mathit{concat}}$ consumes one list from the input list of lists, then peels off and produces its elements one by one; when they have all been delivered, it consumes the next input list, and so on.

## Flushing

The streaming version of ${\mathit{concat}}$ is actually rather special, because the production steps can always completely exhaust the intermediate state. In contrast, consider the regrouping’ example ${\mathit{regroup}\;n = \mathit{meta}\;(\mathit{chunk}\;n)\;(\mathbin{{+}\!\!\!{+}})\;[\,]}$ where

$\displaystyle \begin{array}{@{}lcl@{}} \mathit{chunk}\;n\;[\,] &=& \mathit{Nothing} \\ \mathit{chunk}\;n\;x &=& \mathit{Just}\;(\mathit{splitAt}\;n\;x) \end{array}$

from the introduction (here, ${\mathit{splitAt}\;n\;x}$ yields ${(y,z)}$ where ${y \mathbin{{+}\!\!\!{+}} z = x}$, with ${\mathit{length}\;y=n}$ when ${n \le \mathit{length}\;x}$ and ${y=x}$ otherwise). This transforms an input list of lists into an output list of lists, where each output chunk’ except perhaps the last has length ${n}$—if the content doesn’t divide up evenly, then the last chunk is short. One might hope to be able to stream ${\mathit{regroup}\;n}$, but it doesn’t quite work with the formulation so far. The problem is that ${\mathit{chunk}}$ is too aggressive, and will produce short chunks when there is still some input to consume. (Indeed, the streaming condition does not hold for ${\mathbin{{+}\!\!\!{+}}}$ and ${\mathit{chunk}\;n}$—why not?) One might try the more cautious producer ${\mathit{chunk'}}$:

$\displaystyle \begin{array}{@{}lclcl@{}} \mathit{chunk'}\;n\;x &\mid& n \le \mathit{length}\;x &=& \mathit{Just}\;(\mathit{splitAt}\;n\;x) \\ &\mid& \mathbf{otherwise} &=& \mathit{Nothing} \end{array}$

But this never produces a short chunk, and so if the content doesn’t divide up evenly then the last few elements will not be extracted from the intermediate state and will be lost.

We need to combine these two producers somehow: the streaming process should behave cautiously while there is still remaining input, which might influence the next output; but it should then switch to a more aggressive strategy once the input is finished, in order to flush out the contents of the intermediate state. To achieve this, we define a more general flushing stream operator:

$\displaystyle \begin{array}{@{}lcl@{}} \multicolumn{3}{@{}l@{}}{\mathit{fstream} :: (\beta \rightarrow \mathsf{Maybe}\;(\gamma,\beta)) \rightarrow (\beta \rightarrow [\gamma]) \rightarrow (\beta \rightarrow \alpha \rightarrow \beta) \rightarrow \beta \rightarrow [\alpha] \rightarrow [\gamma]} \\ \mathit{fstream}\;g\;h\;f\;b\;x &=& \mathbf{case}\;g\;b\;\mathbf{of} \\ & & \quad \begin{array}[t]{@{}lcl@{}} \mathit{Just}\;(c,b') &\rightarrow& c : \mathit{fstream}\;g\;h\;f\;b'\;x \\ \mathit{Nothing} &\rightarrow& \mathbf{case}\;x\;\mathbf{of} \\ & & \quad \begin{array}[t]{@{}lcl@{}} a:x' &\rightarrow& \mathit{fstream}\;g\;h\;f\;(f\;b\;a)\;x' \\ {[\,]} &\rightarrow& h\;b \end{array} \end{array} \end{array}$

This takes an additional argument ${h :: \beta \rightarrow [\gamma]}$; when the cautious producer ${g}$ is unproductive, and there is no remaining input to consume, it uses ${h}$ to flush out the remaining output elements from the state. Clearly, specializing to ${h\;b=[\,]}$ retrieves the original ${\mathit{stream}}$ operator.

The corresponding metamorphism uses an apomorphism in place of the unfold. Define

$\displaystyle \begin{array}{@{}lcl@{}} \mathit{apo} &::& (\beta \rightarrow \mathsf{Maybe}\;(\gamma,\beta)) \rightarrow (\beta \rightarrow [\gamma]) \rightarrow \beta \rightarrow [\gamma] \\ \mathit{apo}\;g\;h\;b &=& \mathbf{case}\;g\;b\;\mathbf{of} \\ & & \quad \begin{array}[t]{@{}lcl@{}} \mathit{Just}\;(c,b') &\rightarrow& c : \mathit{apo}\;g\;h\;b' \\ \mathit{Nothing} &\rightarrow& h\;b \end{array} \end{array}$

Then ${\mathit{apo}\;g\;h}$ behaves like ${\mathit{unfoldr}\;g}$, except that if and when ${g}$ stops being productive it finishes up by applying ${h}$ to the final state. Similarly, define flushing metamorphisms:

$\displaystyle \mathit{fmeta}\;g\;h\;f\;b = \mathit{apo}\;g\;h \cdot \mathit{foldl}\;f\;b$

Then we have

$\displaystyle \mathit{fstream}\;g\;h\;f\;b\;x = \mathit{fmeta}\;g\;h\;f\;b\;x$

for all finite lists ${x}$ if the streaming condition holds for ${f}$ and ${g}$. In particular,

$\displaystyle \begin{array}{@{}lcl@{}} \mathit{regroup}\;n\;\mathit{xs} &=& \mathit{fmeta}\;(\mathit{chunk'}\;n)\;(\mathit{unfoldr}\;(\mathit{chunk}\;n))\;(\mathbin{{+}\!\!\!{+}})\;[\,]\;\mathit{xs} \\ &=& \mathit{fstream}\;(\mathit{chunk'}\;n)\;(\mathit{unfoldr}\;(\mathit{chunk}\;n))\;(\mathbin{{+}\!\!\!{+}})\;[\,]\;\mathit{xs} \end{array}$

on finite inputs ${\mathit{xs}}$: the streaming condition does hold for ${\mathbin{{+}\!\!\!{+}}}$ and the more cautious ${\mathit{chunk'}\;n}$, and once the input has been exhausted, the process can switch to the more aggressive ${\mathit{chunk}\;n}$.

## Infinite input

The main advantage of streaming is that it can allow the change-of-representation process also to work on infinite inputs. With the plain metamorphism, this is not possible: the ${\mathit{foldl}}$ will yield no result on an infinite input, and so the ${\mathit{unfoldr}}$ will never get started, but the ${\mathit{stream}}$ may be able to produce some outputs before having consumed all the inputs. For example, the streaming version of ${\mathit{regroup}\;n}$ also works for infinite lists, providing that the input does not end with an infinite tail of empty lists. And of course, if the input never runs out, then there is no need ever to switch to the more aggressive flushing phase.

As a more interesting example, consider converting a fraction from base 3 to base 7:

$\displaystyle \begin{array}{@{}lcl@{}} \mathit{fromBase3} &=& \mathit{foldr}\;\mathit{stepr}\;0 \quad \mathbf{where}\;\mathit{stepr}\;d\;x = (d+x)/3 \\ \mathit{toBase7} &=& \mathit{unfoldr}\;\mathit{next} \quad \mathbf{where}\; \begin{array}[t]{@{}lcl@{}} \mathit{next}\;0 &=& \mathit{Nothing} \\ \mathit{next}\;x &=& \mathbf{let}\;y=7\times x\;\mathbf{in}\;\mathit{Just}\;(\lfloor y\rfloor, y-\lfloor y\rfloor) \end{array} \end{array}$

We assume that the input digits are all either 0, 1 or 2, so that the number being represented is in the unit interval.

The fold in ${\mathit{fromBase3}}$ is of the wrong kind; but we have also

$\displaystyle \begin{array}{@{}lcl@{}} \mathit{fromBase3} &=& \mathit{extract} \cdot \mathit{foldl}\;\mathit{stepl}\;(0,1) \quad \mathbf{where}\; \mathit{stepl}\;(u,v)\;d = (u \times 3 + d, v / 3) \end{array}$

Here, the intermediate state ${(u,v)}$ can be seen as a defunctionalized representation of the function ${(v\times) \cdot (u+)}$, and ${\mathit{extract}}$ applies this function to ${0}$:

$\displaystyle \begin{array}{@{}lcl@{}} \mathit{apply}\;(u,v)\;x &=& v \times (u + x) \\ \mathit{extract}\;(u,v) &=& \mathit{apply}\;(u,v)\;0 \end{array}$

Now there is an extra function ${\mathit{extract}}$ between the ${\mathit{foldl}}$ and the ${\mathit{unfoldr}}$; but that’s no obstacle, because it fuses with the ${\mathit{unfoldr}}$:

$\displaystyle \begin{array}{@{}lcl@{}} \mathit{toBase7} \cdot \mathit{extract} &=& \mathit{unfoldr}\;\mathit{next'} \quad \mathbf{where}\; \begin{array}[t]{@{}lcl@{}} \mathit{next'}\;(0,v) &=& \mathit{Nothing} \\ \mathit{next'}\;(u,v) &=& \begin{array}[t]{@{}l} \mathbf{let}\;y = \lfloor{7 \times u \times v}\rfloor\;\mathbf{in} \\ \mathit{Just}\;(y,(u - y/(v \times 7), v \times 7)) \end{array} \end{array} \end{array}$

However, the streaming condition does not hold for ${\mathit{stepl}}$ and ${\mathit{next'}}$. For example,

$\displaystyle \begin{array}{@{}lcl@{}} \mathit{next'}\;(1,{{}^{1\!}/_{\!3}}) &=& \mathit{Just}\;(2, ({{}^{1\!}/_{\!7}},{{}^{7\!}/_{\!3}})) \\ \mathit{next'}\;(\mathit{stepl}\;(1,{{}^{1\!}/_{\!3}})\;1) &=& \mathit{next'}\;(4,{{}^{1\!}/_{\!9}}) \\ &=& \mathit{Just}\;(3,({{}^{1\!}/_{\!7}},{{}^{7\!}/_{\!9}})) \end{array}$

That is, ${0.1_3 \simeq 0.222_7}$, but ${0.11_3 \simeq 0.305_7}$, so it is premature to produce the first digit 2 in base 7 having consumed only the first digit 1 in base 3. The producer ${\mathit{next'}}$ is too aggressive; it should be more cautious while input remains that might invalidate a produced digit.

Fortunately, on the assumption that the input digits are all 0, 1, or 2, the unconsumed input—a tail of the original input—again represents a number in the unit interval; so from the state ${(u,v)}$ the range of possible unproduced outputs represents a number between ${\mathit{apply}\;(u,v)\;0}$ and ${\mathit{apply}\;(u,v)\;1}$. If these both start with the same digit in base 7, then (and only then) is it safe to produce that digit. So we define

$\displaystyle \mathit{next''}\;(u,v) = \mathbf{if}\;\lfloor{u \times v \times 7}\rfloor = \lfloor{(u+1) \times v \times 7}\rfloor\;\mathbf{then}\;\mathit{next'}\;(u,v)\;\mathbf{else}\;\mathit{Nothing}$

and we have

$\displaystyle \mathit{unfoldr}\;\mathit{next'} = \mathit{apo}\;\mathit{next''}\;(\mathit{unfoldr}\;\mathit{next'})$

Now, the streaming condition holds for ${\mathit{stepl}}$ and ${\mathit{next''}}$ (as you may check), and therefore

$\displaystyle \mathit{toBase7}\;(\mathit{fromBase3}\;x) = \mathit{fstream}\;\mathit{next''}\;(\mathit{unfoldr}\;\mathit{next'})\;\mathit{stepl}\;(0,1)\;x$

on finite digit sequences ${x}$ in base 3. Moreover, the streaming program works also on infinite digit sequences, where the original does not.

(Actually, the only way this could possibly produce a finite output in base 7 would be for the input to be all zeroes. Why? If we are happy to rule out this case, we could consider only the case of taking infinite input to infinite output, and not have to worry about reaching the end of the input or flushing the state.)

Posted in Uncategorized | 8 Comments

What is the correct way to write breadth first traversal of a ${[\mathsf{Tree}]}$?

He’s thinking of “traversal” in the sense of the ${\mathit{Traversable}}$ class, and gave a concrete declaration of rose trees:

$\displaystyle \begin{array}{lcl} \mathbf{data}\;\mathsf{Tree}\;\alpha &=& \mathit{Tree}\; \{\; \mathit{root} :: \alpha, \mathit{children} :: [\mathsf{Tree}\;\alpha] \;\} \end{array}$

It’s an excellent question.

First, let’s think about breadth-first enumeration of the elements of a tree. This isn’t compositional (a fold); but the related “level-order enumeration”, which gives a list of lists of elements, one list per level, is compositional:

$\displaystyle \begin{array}{lcl} \mathit{levels} &::& \mathsf{Tree}\;\alpha \rightarrow [[\alpha]] \\ \mathit{levels}\;t &=& [\mathit{root}\;t] : \mathit{foldr}\;(\mathit{lzw}\;(\mathbin{{+}\!\!\!{+}}))\;[\,]\;(\mathit{map}\;\mathit{levels}\;(\mathit{children}\;t)) \end{array}$

Here, ${\mathit{lzw}}$ is “long zip with”; it’s similar to ${\mathit{zipWith}}$, but returns a list as long as its longer argument:

$\displaystyle \begin{array}{llllcl} \mathit{lzw} &&&&::& (\alpha\rightarrow\alpha\rightarrow\alpha) \rightarrow [\alpha]\rightarrow[\alpha]\rightarrow[\alpha] \\ \mathit{lzw}&f&(a:x)&(b:y) &=& f\;a\;b : \mathit{lzw}\;f\;x\;y \\ \mathit{lzw}&f&x&[\,] &=& x \\ \mathit{lzw}&f&[\,]&y &=& y \end{array}$

(It’s a nice exercise to define a notion of folds for ${\mathsf{Tree}}$, and to write ${\mathit{levels}}$ as a fold.)

Given ${\mathit{levels}}$, breadth-first enumeration is obtained by concatenation:

$\displaystyle \begin{array}{lcl} \mathit{bf} &::& \mathsf{Tree}\;\alpha \rightarrow [\alpha] \\ \mathit{bf} &=& \mathit{concat} \cdot \mathit{levels} \end{array}$

Incidentally, this allows trees to be foldable, breadth-first:

$\displaystyle \begin{array}{lcl} \mathbf{instance}\;\mathit{Foldable}\;\mathsf{Tree}\;\mathbf{where} \\ \quad \mathit{foldMap}\;f = \mathit{foldMap}\;f \cdot \mathit{bf} \end{array}$

## Relabelling

Level-order enumeration is invertible, in the sense that you can reconstruct the tree given its shape and its level-order enumeration.

One way to define this is to pass the level-order enumeration around the tree, snipping bits off it as you go. Here is a mutually recursive pair of functions to relabel a tree with a given list of lists, returning also the unused bits of the lists of lists.

$\displaystyle \begin{array}{lcl} \mathit{relabel} &::& (\mathsf{Tree}\;(),[[\alpha]]) \rightarrow (\mathsf{Tree}\;\alpha,[[\alpha]]) \\ \mathit{relabel}\;(t,(x:\mathit{xs}):\mathit{xss}) &=& \mathbf{let}\; (\mathit{us},\mathit{yss}) = \mathit{relabels}\; (\mathit{children}\;t,\mathit{xss}) \; \mathbf{in}\; (\mathit{Tree}\;x\;\mathit{us}, \mathit{xs}:\mathit{yss}) \medskip \\ \mathit{relabels} &::& ([\mathsf{Tree}\;()],[[\alpha]]) \rightarrow ([\mathsf{Tree}\;\alpha],[[\alpha]]) \\ \mathit{relabels}\;([],\mathit{xss}) &=& ([],\mathit{xss}) \\ \mathit{relabels}\;(t:\mathit{ts},\mathit{xss}) &=& \mathbf{let} \; (u,\mathit{yss}) = \mathit{relabel}\;(t,\mathit{xss}) \mathbin{;} (\mathit{us},\mathit{zss}) = \mathit{relabels}\;(\mathit{ts},\mathit{yss}) \; \mathbf{in} \; (u:\mathit{us},\mathit{zss}) \end{array}$

Assuming that the given list of lists is “big enough”—ie each list has enough elements for that level of the tree—then the result is well-defined. Then ${\mathit{relabel}}$ is determined by the equivalence

$\displaystyle \begin{array}{ll} & \mathit{relabel}\;(t,\mathit{xss}) = (u,\mathit{yss}) \\ \Leftrightarrow & \\ & \mathit{shape}\;u = \mathit{shape}\;t \land \mathit{lzw}\;(\mathbin{{+}\!\!\!{+}})\;(\mathit{levels}\;u)\;\mathit{yss} = \mathit{xss} \end{array}$

Here, the ${\mathit{shape}}$ of a tree is obtained by discarding its elements:

$\displaystyle \begin{array}{lcl} \mathit{shape} &::& \mathsf{Tree}\;\alpha \rightarrow \mathsf{Tree}\;() \\ \mathit{shape} &=& \mathit{fmap}\;(\mathit{const}\;()) \end{array}$

In particular, if the given list of lists is the level-order of the tree, and so is exactly the right size, then ${\mathit{yss}}$ will have no remaining elements, consisting entirely of empty levels:

$\displaystyle \mathit{relabel}\;(\mathit{shape}\;t, \mathit{levels}\;t) = (t, \mathit{replicate}\;(\mathit{depth}\;t)\;[\,])$

So we can take a tree apart into its shape and contents, and reconstruct the tree from such data.

$\displaystyle \begin{array}{lcl} \mathit{split} &::& \mathsf{Tree}\;\alpha \rightarrow (\mathsf{Tree}\;(), [[\alpha]]) \\ \mathit{split}\;t &=& (\mathit{shape}\;t, \mathit{levels}\;t) \medskip \\ \mathit{combine} &::& \mathsf{Tree}\;() \rightarrow [[\alpha]] \rightarrow \mathsf{Tree}\;\alpha \\ \mathit{combine}\;u\;\mathit{xss} &=& \mathit{fst}\;(\mathit{relabel}\;(u, \mathit{xss})) \end{array}$

This lets us traverse a tree in breadth-first order, by performing the traversal just on the contents. We separate the tree into shape and contents, perform a list-based traversal, and reconstruct the tree.

$\displaystyle \begin{array}{l} \mathbf{instance}\;\mathit{Traversable}\;\mathsf{Tree}\;\mathbf{where} \\ \quad \mathit{traverse}\;f\;t = \mathit{pure}\;(\mathit{combine}\;(\mathit{shape}\;t)) \circledast \mathit{traverse}\;(\mathit{traverse}\;f)\;(\mathit{levels}\;t) \end{array}$

This trick of traversal by factoring into shape and contents is explored in my paper Understanding Idiomatic Traversals Backwards and Forwards from Haskell 2013.

We’ve seen that level-order enumeration is invertible in a certain sense, and that this means that we perform traversal by factoring into shape and contents then traversing the contents independently of the shape. But the contents we’ve chosen is the level-order enumeration, which is a list of lists. Normally, one thinks of the contents of a data structure as simply being a list, ie obtained by breadth-first enumeration rather than by level-order enumeration. Can we do relabelling from the breadth-first enumeration too? Yes, we can!

There’s a very clever cyclic program for breadth-first relabelling of a tree given only a list, not a list of lists; in particular, breadth-first relabelling a tree with its own breadth-first enumeration gives back the tree you first thought of. In fact, the relabelling function is precisely the same as before! The trick comes in constructing the necessary list of lists:

$\displaystyle \begin{array}{lcl} \mathit{bflabel} &::& \mathsf{Tree}\;() \rightarrow [\alpha] \rightarrow \mathsf{Tree}\;\alpha \\ \mathit{bflabel}\;t\;\mathit{xs} &=& \mathbf{let}\;(u,\mathit{xss}) = \mathit{relabel}\;(t, \mathit{xs}:\mathit{xss})\;\mathbf{in}\;u \end{array}$

Note that variable ${\mathit{xss}}$ is defined cyclically; informally, the output leftovers ${\mathit{xss}}$ on one level also form the input elements to be used for relabelling all the lower levels. Given this definition, we have

$\displaystyle \mathit{bflabel}\;(\mathit{shape}\;t)\;(\mathit{bf}\;t) = t$

for any ${t}$. This program is essentially due to Geraint Jones, and is derived in an unpublished paper Linear-Time Breadth-First Tree Algorithms: An Exercise in the Arithmetic of Folds and Zips that we wrote together in 1993.

$\displaystyle \begin{array}{l} \mathbf{instance}\;\mathit{Traversable}\;\mathsf{Tree}\;\mathbf{where} \\ \quad \mathit{traverse}\;f\;t = \mathit{pure}\;(\mathit{bflabel}\;(\mathit{shape}\;t)) \circledast \mathit{traverse}\;f\;(\mathit{bf}\;t) \end{array}$

Posted in Uncategorized | 7 Comments

## Comprehensions

Prompted by some recent work I’ve been doing on reasoning about monadic computations, I’ve been looking back at the work from the 1990s by Phil Trinder, Limsoon Wong, Leonidas Fegaras, Torsten Grust, and others, on monad comprehensions as a framework for database queries.

The idea goes back to the adjunction between extension and intension in set theory—you can define a set by its extension, that is by listing its elements:

$\displaystyle \{ 1, 9, 25, 49, 81 \}$

or by its intension, that is by characterizing those elements:

$\displaystyle \{ n^2 \mid 0 < n < 10 \land n \equiv 1 (\mathop{mod} 2) \}$

Expressions in the latter form are called set comprehensions. They inspired a programming notation in the SETL language from NYU, and have become widely known through list comprehensions in languages like Haskell. The structure needed of sets or of lists to make this work is roughly that of a monad, and Phil Wadler showed how to generalize comprehensions to arbitrary monads, which led to the “do” notation in Haskell. Around the same time, Phil Trinder showed that comprehensions make a convenient database query language. The comprehension notation has been extended to cover other important aspects of database queries, particularly aggregation and grouping. Monads and aggregations have very nice algebraic structure, which leads to a useful body of laws to support database query optimization.

## List comprehensions

Just as a warm-up, here is a reminder about Haskell’s list comprehensions.

$\displaystyle [ 2 \times a + b \mid a \leftarrow [1,2,3] , b \leftarrow [4,5,6] , b \mathbin{\underline{\smash{\mathit{mod}}}} a == 0 ]$

This (rather concocted) example yields the list of all values of the expression ${2 \times a + b}$ as ${a}$ is drawn from ${[1,2,3]}$ and ${b}$ from ${[4,5,6]}$ and such that ${b}$ is divisible by ${a}$, namely ${[6,7,8,8,10,12]}$.

To the left of the vertical bar is the term (an expression). To the right is a comma-separated sequence of qualifiers, each of which is either a generator (of the form ${a \leftarrow x}$, with a variable ${a}$ and a list expression ${x}$) or a filter (a boolean expression). The scope of a variable introduced by a generator extends to all subsequent generators and to the term. Note that, in contrast to the mathematical inspiration, bound variables need to be generated from some existing list.

The semantics of list comprehensions is defined by translation; see for example Phil Wadler’s Chapter 7 of The Implementation of Functional Programming Languages. It can be expressed equationally as follows:

$\displaystyle \begin{array}{lcl} [ e \mid \epsilon ] &=& [e] \\ {} [ e \mid b ] &=& \mathbf{if}\;b\;\mathbf{then}\;[ e ]\;\mathbf{else}\;[\,] \\ {} [ e \mid a \leftarrow x ] &=& \mathit{map}\,(\lambda a \mathbin{.} e)\,x \\ {} [ e \mid q, q' ] &=& \mathit{concat}\,[ [ e \mid q' ] \mid q ] \end{array}$

(Here, ${\epsilon}$ denotes the empty sequence of qualifiers. It’s not allowed in Haskell, but it is helpful in simplifying the translation.)

Applying this translation to the example at the start of the section gives

$\displaystyle \begin{array}{ll} & [ 2 \times a + b \mid a \leftarrow [1,2,3] , b \leftarrow [4,5,6] , b \mathbin{\underline{\smash{\mathit{mod}}}} a == 0 ] \\ = & \mathit{concat}\,(\mathit{map}\,(\lambda a \mathbin{.} \mathit{concat}\,(\mathit{map}\,(\lambda b \mathbin{.} \mathbf{if}\;b \mathbin{\underline{\smash{\mathit{mod}}}} a == 0\;\mathbf{then}\;[2 \times a + b]\;\mathbf{else}\;[\,])\,[4,5,6]))\,[1,2,3]) \\ = & [6,7,8,8,10,12] \end{array}$

More generally, a generator may match against a pattern rather than just a variable. In that case, it may bind multiple (or indeed no) variables at once; moreover, the match may fail, in which case it is discarded. This is handled by modifying the translation for generators to use a function defined by pattern-matching, rather than a straight lambda-abstraction:

$\displaystyle [ e \mid p \leftarrow x ] = \mathit{concat}\,(\mathit{map}\,(\lambda a \mathbin{.} \mathbf{case}\;a\;\mathbf{of}\;p \rightarrow [ e ] \;;\; \_ \rightarrow [\,])\,x)$

or, more perspicuously,

$\displaystyle [ e \mid p \leftarrow x ] = \mathbf{let}\;h\,p = [ e ] ; h\,\_ = [\,]\;\mathbf{in}\; \mathit{concat}\,(\mathit{map}\,h\,x)$

It is clear from the above translation that the necessary ingredients for list comprehensions are ${\mathit{map}}$, singletons, ${\mathit{concat}}$, and the empty list. The first three are the operations arising from lists as a functor and a monad, which suggests that the same translation might be applicable to other monads too. But the fourth ingredient, the empty list, does not come from the functor and monad structures; that requires an extra assumption:

$\displaystyle \begin{array}{ll} \mathbf{class}\;\mathit{Monad}\,m \Rightarrow \mathit{MonadZero}\,m\;\mathbf{where} \\ \quad \mathit{mzero} :: m\,a \end{array}$

Then the translation for list comprehensions can be generalized to other monads:

$\displaystyle \begin{array}{lcl} [ e \mid \epsilon ] &=& \mathit{return}\,e \\ {} [ e \mid b ] &=& \mathbf{if}\;b\;\mathbf{then}\;\mathit{return}\,e\;\mathbf{else}\;\mathit{mzero} \\ {} [ e \mid p \leftarrow m ] &=& \mathbf{let}\;h\,p = \mathit{return}\,e ; h\,\_ = \mathit{mzero}\;\mathbf{in}\; \mathit{join}\,(\mathit{map}\,h\,m) \\ {} [ e \mid q, q' ] &=& \mathit{join}\,[ [ e \mid q' ] \mid q ] \end{array}$

(so ${[ e \mid \epsilon ] = [ e \mid \mathit{True} ]}$). The actual monad to be used is implicit; if we want to be explicit, we could use a subscript, as in “${[ e \mid q ]_\mathsf{List}}$“.

This translation is different from the one used in the Haskell language specification, which to my mind is a little awkward: the empty list crops up in two different ways in the translation of list comprehensions—for filters, and for generators with patterns—and these are generalized in two different ways to other monads (to the ${\mathit{mzero}}$ method of the ${\mathit{MonadPlus}}$ class in the first case, and the ${\mathit{fail}}$ method of the ${\mathit{Monad}}$ class in the second). I think it is neater to have a monad subclass ${\mathit{MonadZero}}$ with a single method subsuming both these operators. Of course, this does mean that the translation forces a monad comprehension with filters or possibly failing generators to be interpreted in a monad in the ${\mathit{MonadZero}}$ subclass rather than just ${\mathit{Monad}}$—the type class constraints that are generated depend on the features used in the comprehension. (Perhaps this translation was tried in earlier versions of the language specification, and found wanting?)

Taking this approach gives basically the monad comprehension notation from Wadler’s Comprehending Monads paper; it loosely corresponds to Haskell’s do notation, except that the term is to the left of a vertical bar rather than at the end, and that filters are just boolean expressions rather than introduced using ${\mathit{guard}}$.

We might impose the law that ${\mathit{mzero}}$ is a “left” zero of composition, in the sense

$\displaystyle \mathit{join}\,\mathit{mzero} = \mathit{mzero}$

or, in terms of comprehensions,

$\displaystyle [ e \mid a \leftarrow \mathit{mzero} ] = \mathit{mzero}$

Informally, this means that any failing steps of the computation cleanly cut off subsequent branches. Conversely, we do not require that ${\mathit{mzero}}$ is a “right” zero too:

$\displaystyle \mathit{join}\,(\mathit{map}\,(\lambda a \mathbin{.} \mathit{mzero})\,m) \ne \mathit{mzero} \quad\mbox{(in general)}$

This would have the consequence that a failing step also cleanly erases any effects from earlier parts of the computation, which is too strong a requirement for many monads—particularly those of the “launch missiles now” variety. (The names “left-” and “right zero” make more sense when the equations are expressed in terms of the usual Haskell bind operator ${(\gg\!=)}$, which is a kind of sequential composition.)

One more ingredient is needed in order to characterize monads that correspond to “collection classes” such as sets and lists, and that is an analogue of set union or list append. It’s not difficult to see that this is inexpressible in terms of the operations introduced so far: given only collections ${m}$ of at most one element, any comprehension using generators of the form ${a \leftarrow m}$ will only yield another such collection, whereas the union of two one-element collections will in general have two elements.

To allow any finite collection to be expressed, it suffices to introduce a binary union operator ${\uplus}$:

$\displaystyle \begin{array}{ll} \mathbf{class}\;\mathit{Monad}\,m \Rightarrow \mathit{MonadPlus}\,m\;\mathbf{where} \\ \quad (\uplus) :: m\,a \times m\,a \rightarrow m\,a \end{array}$

We require composition to distribute over union, in the following sense:

$\displaystyle \mathit{join}\,(m \uplus n) = \mathit{join}\,m \uplus \mathit{join}\,n$

or, in terms of comprehensions,

$\displaystyle [ e \mid a \leftarrow m \uplus n, q ] = [ e \mid a \leftarrow m, q ] \uplus [ e \mid a \leftarrow n, q ]$

For the remainder of this post, we will assume a monad in both ${\mathit{MonadZero}}$ and ${\mathit{MonadPlus}}$. Moreover, we will assume that ${\mathit{mzero}}$ is the unit of ${\uplus}$, and is both a left- and a right zero of composition. To stress the additional constraints, we will write “${\emptyset}$” for “${\mathit{mzero}}$” from now on. The intention is that such monads exactly capture collection classes; Phil Wadler has called these structures ringads. (He seems to have done so in an unpublished note Notes on Monads and Ringads from 1990, which is cited by some papers from the early 1990s. But Phil no longer has a copy of this note, and it’s not online anywhere… I’d love to see a copy, if anyone has one!)

$\displaystyle \begin{array}{ll} \mathbf{class}\;(\mathit{MonadZero}\,m, \mathit{MonadPlus}\,m) \Rightarrow \mathit{Ringad}\,m\;\mathbf{where} \end{array}$

(There are no additional methods; the class ${\mathit{Ringad}}$ is the intersection of the two parent classes ${\mathit{MonadZero}}$ and ${\mathit{MonadPlus}}$, with the union of the two interfaces, together with the laws above.) I used roughly the same construction already in the post on Horner’s Rule.

As well as (finite) sets and lists, ringad instances include (finite) bags and a funny kind of binary tree (externally labelled, possibly empty, in which the empty tree is a unit of the binary tree constructor). These are all members of the so-called Boom Hierarchy of types—a name coined by Richard Bird, for an idea due to Hendrik Boom, who by happy coincidence is named for one of these structures in his native language. All members of the Boom Hierarchy are generated from the empty, singleton, and union operators, the difference being whether union is associative, commutative, and idempotent. Another ringad instance, but not a member of the Boom Hierarchy, is the type of probability distributions—either normalized, with a weight-indexed family of union operators, or unnormalized, with an additional scaling operator.

## Aggregation

The well-behaved operations over monadic values are called the algebras for that monad—functions ${k}$ such that ${k \cdot \mathit{return} = \mathit{id}}$ and ${k \cdot \mathit{join} = k \cdot \mathit{map}\,k}$. In particular, ${\mathit{join}}$ is itself a monad algebra. When the monad is also a ringad, ${k}$ necessarily distributes also over ${\uplus}$—there is a binary operator ${\oplus}$ such that ${k\,(m \uplus n) = k\,m \oplus k\,n}$ (exercise!). Without loss of generality, we write ${\oplus/}$ for ${k}$; these are the “reductions” of the Bird–Meertens Formalism. In that case, ${\mathit{join} = \uplus/}$ is a ringad algebra.

The algebras for a ringad amount to aggregation functions for a collection: the sum of a bag of integers, the maximum of a set of naturals, and so on. We could extend the comprehension notation to encompass aggregations too, for example by adding an optional annotation, writing say “${[ e \mid q ]^\oplus}$“; although this doesn’t add much, because we could just have written “${\oplus/\,[e \mid q]}$” instead. We could generalize from reductions ${\oplus/}$ to collection homomorphisms ${\oplus/ \cdot \mathit{map}\,f}$; but this doesn’t add much either, because the map is easily combined with the comprehension—it’s easy to show the “map over comprehension” property

$\displaystyle \mathit{map}\,f\,[e \mid q] = [f\,e \mid q]$

Leonidas Fegaras and David Maier develop a monoid comprehension calculus around such aggregations; but I think their name is inappropriate, because nothing forces the binary aggregating operator to be associative.

Note that, for ${\oplus/}$ to be well-defined, ${\oplus}$ must satisfy all the laws that ${\uplus}$ does—${\oplus}$ must be associative if ${\uplus}$ is associative, and so on. It is not hard to show, for instance, that there is no ${\oplus}$ on sets of numbers for which ${\mathit{sum}\,(x \cup y) = \mathit{sum}\,x \oplus \mathit{sum}\,y}$; such an ${\oplus}$ would have to be idempotent, which is inconsistent with its relationship with ${\mathit{sum}}$. (So, although ${[a^2 \mid a \leftarrow x, \mathit{odd}\,a]_\mathsf{Bag}^{+}}$ denotes the sum of the squares of the odd elements of bag ${x}$, the expression ${[a^2 \mid a \leftarrow x, \mathit{odd}\,a]_\mathsf{Set}^{+}}$ (with ${x}$ now a set) is not defined, because ${+}$ is not idempotent.) In particular, ${\oplus/\emptyset}$ must be the unit of ${\oplus}$, which we write ${1_\oplus}$.

We can derive translation rules for aggregations from the definition

$\displaystyle [ e \mid q ]^\oplus = \oplus/\,[e \mid q]$

For empty aggregations, we have:

$\displaystyle \begin{array}{ll} & [ e \mid \epsilon ]^\oplus \\ = & \qquad \{ \mbox{aggregation} \} \\ & \oplus/\,[ e \mid \epsilon ] \\ = & \qquad \{ \mbox{comprehension} \} \\ & \oplus/\,(\mathit{return}\,e) \\ = & \qquad \{ \mbox{monad algebra} \} \\ & e \end{array}$

For filters, we have:

$\displaystyle \begin{array}{ll} & [ e \mid b ]^\oplus \\ = & \qquad \{ \mbox{aggregation} \} \\ & \oplus/\,[ e \mid b ] \\ = & \qquad \{ \mbox{comprehension} \} \\ & \oplus/\,(\mathbf{if}\;b\;\mathbf{then}\;\mathit{return}\,e\;\mathbf{else}\;\emptyset) \\ = & \qquad \{ \mbox{lift out the conditional} \} \\ & \mathbf{if}\;b\;\mathbf{then}\;{\oplus/}\,(\mathit{return}\,e)\;\mathbf{else}\;{\oplus/}\,\emptyset \\ = & \qquad \{ \mbox{ringad algebra} \} \\ & \mathbf{if}\;b\;\mathbf{then}\;e\;\mathbf{else}\;1_\oplus \end{array}$

For generators, we have:

$\displaystyle \begin{array}{ll} & [ e \mid p \leftarrow m ]^\oplus \\ = & \qquad \{ \mbox{aggregation} \} \\ & \oplus/\,[ e \mid p \leftarrow m ] \\ = & \qquad \{ \mbox{comprehension} \} \\ & \oplus/\,(\mathbf{let}\;h\,p = \mathit{return}\,e ; h\,\_ = \emptyset\;\mathbf{in}\;\mathit{join}\,(\mathit{map}\,h\,m)) \\ = & \qquad \{ \mbox{lift out the \textbf{let}} \} \\ & \mathbf{let}\;h\,p = \mathit{return},e ; h\,\_ = \emptyset\;\mathbf{in}\;{\oplus/}\,(\mathit{join}\,(\mathit{map}\,h\,m)) \\ = & \qquad \{ \mbox{monad algebra} \} \\ & \mathbf{let}\;h\,p = \mathit{return}\,e ; h\,\_ = \emptyset\;\mathbf{in}\;{\oplus/}\,(\mathit{map}\,(\oplus/)\,(\mathit{map}\,h\,m)) \\ = & \qquad \{ \mbox{functors} \} \\ & \mathbf{let}\;h\,p = \mathit{return}\,e ; h\,\_ = \emptyset\;\mathbf{in}\;{\oplus/}\,(\mathit{map}\,(\oplus/ \cdot h)\,m) \\ = & \qquad \{ \mbox{let~} h' = \oplus/ \cdot h \} \\ & \mathbf{let}\;h'\,p = \oplus/\,(\mathit{return}\,e) ; h'\,\_ = \oplus/\,\emptyset\;\mathbf{in}\;{\oplus/}\,(\mathit{map}\,h'\,m) \\ = & \qquad \{ \mbox{ringad algebra} \} \\ & \mathbf{let}\;h'\,p = e ; h'\,\_ = 1_\oplus\;\mathbf{in}\;{\oplus/}\,(\mathit{map}\,h'\,m) \end{array}$

And for sequences of qualifiers, we have:

$\displaystyle \begin{array}{ll} & [ e \mid q, q' ]^\oplus \\ = & \qquad \{ \mbox{aggregation} \} \\ & \oplus/\,[ e \mid q, q' ] \\ = & \qquad \{ \mbox{comprehension} \} \\ & \oplus/\,(\mathit{join}\,[ [ e \mid q'] \mid q ] \\ = & \qquad \{ \mbox{monad algebra} \} \\ & \oplus/\,(\mathit{map}\,(\oplus/)\,[ [ e \mid q'] \mid q ]) \\ = & \qquad \{ \mbox{map over comprehension} \} \\ & \oplus/\,[ \oplus/\,[ e \mid q'] \mid q ] \\ = & \qquad \{ \mbox{aggregation} \} \\ & [ [ e \mid q']^\oplus \mid q ]^\oplus \end{array}$

Putting all this together, we have:

$\displaystyle \begin{array}{lcl} [ e \mid \epsilon ]^\oplus &=& e \\ {} [ e \mid b ]^\oplus &=&\mathbf{if}\;b\;\mathbf{then}\;e\;\mathbf{else}\;1_\oplus \\ {} [ e \mid p \leftarrow m ]^\oplus &=& \mathbf{let}\;h\,p = e ; h\,\_ = 1_\oplus\;\mathbf{in}\;{\oplus/}\,(\mathit{map}\,h\,m) \\ {} [ e \mid q, q' ]^\oplus &=& [ [ e \mid q']^\oplus \mid q ]^\oplus \end{array}$

## Heterogeneous comprehensions

We have seen that comprehensions can be interpreted in an arbitrary ringad; for example, ${[a^2 \mid a \leftarrow x, \mathit{odd}\,a]_\mathsf{Set}}$ denotes (the set of) the squares of the odd elements of (the set) ${x}$, whereas ${[a^2 \mid a \leftarrow x, \mathit{odd}\,a]_\mathsf{Bag}}$ denotes the bag of such elements, with ${x}$ a bag. Can we make sense of “heterogeneous comprehensions”, involving several different ringads?

Let’s introduced the notion of a ringad morphism, extending the familiar analogue on monads. For monads ${\mathsf{M}}$ and ${\mathsf{N}}$, a monad morphism ${\phi : \mathsf{M} \mathbin{\stackrel{.}{\to}} \mathsf{N}}$ is a natural transformation ${\mathsf{M} \mathbin{\stackrel{.}{\to}} \mathsf{N}}$—that is, a family ${\phi_\alpha :: \mathsf{M}\,\alpha \rightarrow \mathsf{N}\,\alpha}$ of arrows, coherent in the sense that ${\phi_\beta \cdot \mathsf{M}\,f = \mathsf{N}\,f \cdot \phi_\alpha}$ for ${f :: \alpha \rightarrow \beta}$—that also preserves the monad structure:

$\displaystyle \begin{array}{lclcl} \phi \cdot \mathit{return}_\mathsf{M} &=& \mathit{return}_\mathsf{N} \\ \phi \cdot \mathit{join}_\mathsf{M} &=& \mathit{join}_\mathsf{N} \cdot \phi \cdot \mathsf{M}\,\phi &=& \mathit{join}_\mathsf{N} \cdot \mathsf{N}\,\phi \cdot \phi \end{array}$

A ringad morphism ${\phi : \mathsf{M} \mathbin{\stackrel{.}{\to}} \mathsf{N}}$ for ringads ${\mathsf{M},\mathsf{N}}$ is a monad morphism ${\phi : \mathsf{M} \mathbin{\stackrel{.}{\to}} \mathsf{N}}$ that also respects the ringad structure:

$\displaystyle \begin{array}{lcl} \phi\,\emptyset_\mathsf{M} &=& \emptyset_\mathsf{N} \\ \phi\,(x \uplus_\mathsf{M} y) &=& \phi\,x \uplus_\mathsf{N} \phi\,y \end{array}$

Then a ringad morphism behaves nicely with respect to ringad comprehensions—a comprehension interpreted in ringad ${\mathsf{M}}$, using existing collections of type ${\mathsf{M}}$, with the result transformed via a ringad morphism ${\phi : \mathsf{M} \mathbin{\stackrel{.}{\to}} \mathsf{N}}$ to ringad ${\mathsf{N}}$, is equivalent to the comprehension interpreted in ringad ${\mathsf{N}}$ in the first place, but with the initial collections transformed to type ${\mathsf{N}}$. Informally, there will be no surprises arising from when ringad coercions take place, because the results are the same whenever this happens. This property is straightforward to show by induction over the structure of the comprehension. For the empty comprehension, we have:

$\displaystyle \begin{array}{ll} & \phi\,[ e \mid \epsilon ]_\mathsf{M} \\ = & \qquad \{ \mbox{comprehension} \} \\ & \phi\,(\mathit{return}_\mathsf{M}\,e) \\ = & \qquad \{ \mbox{ringad morphism} \} \\ & \mathit{return}_\mathsf{N}\,e \\ = & \qquad \{ \mbox{comprehension} \} \\ & [e \mid \epsilon ]_\mathsf{N} \end{array}$

For filters, we have:

$\displaystyle \begin{array}{ll} & \phi\,[ e \mid b ]_\mathsf{M} \\ = & \qquad \{ \mbox{comprehension} \} \\ & \phi\,(\mathbf{if}\;b\;\mathbf{then}\;\mathit{return}_\mathsf{M}\,e\;\mathbf{else}\;\emptyset_\mathsf{M}) \\ = & \qquad \{ \mbox{lift out the conditional} \} \\ & \mathbf{if}\;b\;\mathbf{then}\;\phi\,(\mathit{return}_\mathsf{M}\,e)\;\mathbf{else}\;\phi\,\emptyset_\mathsf{M} \\ = & \qquad \{ \mbox{ringad morphism} \} \\ & \mathbf{if}\;b\;\mathbf{then}\;\mathit{return}_\mathsf{N}\,e\;\mathbf{else}\;\emptyset_\mathsf{N} \\ = & \qquad \{ \mbox{comprehension} \} \\ & [ e \mid b ]_\mathsf{N} \end{array}$

For generators:

$\displaystyle \begin{array}{ll} & \phi\,[ e \mid p \leftarrow m ]_\mathsf{M} \\ = & \qquad \{ \mbox{comprehension} \} \\ & \phi\,(\mathbf{let}\;h\,p = \mathit{return}_\mathsf{M}\,e ; h\,\_ = \emptyset_\mathsf{M}\;\mathbf{in}\;\mathit{join}_\mathsf{M}\,(\mathit{map}_\mathsf{M}\,h\,m)) \\ = & \qquad \{ \mbox{lift out the \textbf{let}} \} \\ & \mathbf{let}\;h\,p = \mathit{return}_\mathsf{M}\,e ; h\,\_ = \emptyset_\mathsf{M}\;\mathbf{in}\;\phi\,(\mathit{join}_\mathsf{M}\,(\mathit{map}_\mathsf{M}\,h\,m)) \\ = & \qquad \{ \mbox{ringad morphism, functors} \} \\ & \mathbf{let}\;h\,p = \mathit{return}_\mathsf{M}\,e ; h\,\_ = \emptyset_\mathsf{M}\;\mathbf{in}\;\mathit{join}_\mathsf{N}\,(\phi\,(\mathit{map}_\mathsf{M}\,(\phi \cdot h)\,m)) \\ = & \qquad \{ \mbox{let~} h' = \phi \cdot h \} \\ & \mathbf{let}\;h'\,p = \phi\,(\mathit{return}_\mathsf{M}\,e) ; h'\,\_ = \phi\,\emptyset_\mathsf{M}\;\mathbf{in}\;\mathit{join}_\mathsf{N}\,(\phi\,(\mathit{map}_\mathsf{M}\,h'\,m)) \\ = & \qquad \{ \mbox{ringad morphism, induction} \} \\ & \mathbf{let}\;h'\,p = \mathit{return}_\mathsf{N}\,e ; h'\,\_ = \emptyset_\mathsf{N}\;\mathbf{in}\;\mathit{join}_\mathsf{N}\,(\phi\,(\mathit{map}_\mathsf{M}\,h'\,m)) \\ = & \qquad \{ \mbox{naturality of~} \phi \} \\ & \mathbf{let}\;h'\,p = \mathit{return}_\mathsf{N}\,e ; h'\,\_ = \emptyset_\mathsf{N}\;\mathbf{in}\;\mathit{join}_\mathsf{N}\,(\mathit{map}_\mathsf{N}\,h'\,(\phi\,m)) \\ = & \qquad \{ \mbox{comprehension} \} \\ & [ e \mid p \leftarrow \phi\,m ]_\mathsf{N} \end{array}$

And for sequences of qualifiers:

$\displaystyle \begin{array}{ll} & \phi\,[ e \mid q, q' ]_\mathsf{M} \\ = & \qquad \{ \mbox{comprehension} \} \\ & \phi\,(\mathit{join}\,[ [ e \mid q' ]_\mathsf{M} \mid q ]_\mathsf{M}) \\ = & \qquad \{ \mbox{ringad morphism} \} \\ & \phi\,(\mathit{map}\,\phi\,[ [ e \mid q' ]_\mathsf{M} \mid q ]_\mathsf{M}) \\ = & \qquad \{ \mbox{map over comprehension} \} \\ & \phi\,[ \phi\,[ e \mid q' ]_\mathsf{M} \mid q ]_\mathsf{M} \\ = & \qquad \{ \mbox{induction} \} \\ & [ [ e \mid q' ]_\mathsf{N} \mid q ]_\mathsf{N} \\ = & \qquad \{ \mbox{comprehension} \} \\ & [ e \mid q, q' ]_\mathsf{N} \end{array}$

For example, if ${\mathit{bag2set} : \mathsf{Bag} \mathbin{\stackrel{.}{\to}} \mathsf{Set}}$ is the obvious ringad morphism from bags to sets, discarding information about the multiplicity of repeated elements, and ${x}$ a bag of numbers, then

$\displaystyle \mathit{bag2set}\,[a^2 \mid a \leftarrow x, \mathit{odd}\,a]_\mathsf{Bag} = [a^2 \mid a \leftarrow \mathit{bag2set}\,x, \mathit{odd}\,a]_\mathsf{Set}$

and both yield the set of squares of the odd members of ${x}$. As a notational convenience, we might elide use of the ringad morphism when it is “obvious from context”—we might write just ${[a^2 \mid a \leftarrow x, \mathit{odd}\,a]_\mathsf{Set}}$ even when ${x}$ is a bag, relying on the “obvious” morphism ${\mathit{bag2set}}$. This would allow us to write, for example,

$\displaystyle [ a+b \mid a \leftarrow [1,2,3], b \leftarrow \langle4,4,5\rangle ]_\mathsf{Set} = \{ 5,6,7,8 \}$

(writing ${\langle\ldots\rangle}$ for the extension of a bag), instead of the more pedantic

$\displaystyle [ a+b \mid a \leftarrow \mathit{list2set}\,[1,2,3], b \leftarrow \mathit{bag2set}\,\langle4,4,5\rangle ]_\mathsf{Set} = \{ 5,6,7,8 \}$

There is a forgetful function from any poorer member of the Boom hierarchy to a richer one, flattening some distinctions by imposing additional laws—for example, from bags to sets, flattening distinctions concerning multiplicity—and I would class these forgetful functions as “obvious” morphisms. On the other hand, any morphisms in the opposite direction—such as sorting, from bags to lists, and one-of-each, from sets to bags—are not “obvious”, and so should not be elided; and similarly, I’m not sure that I could justify as “obvious” any morphisms involving non-members of the Boom Hierarchy, such as probability distributions.

Posted in Uncategorized | 11 Comments

## Upwards and downwards accumulations

Continuing my work in regress, this post revisits—with the benefit of much hindsight—what I was working on for my DPhil thesis (which was summarized in a paper at MPC 1992) and in subsequent papers at MPC 1998 and in SCP in 2000. This is the topic of accumulations on data structures, which distribute information across the data structure. List instances are familiar from the Haskell standard libraries (and, to those with a long memory, from APL); my thesis presented instances for a variety of tree datatypes; and the later work was about making it datatype-generic. I now have a much better way of doing it, using Conor McBride’s derivatives.

## Accumulations

Accumulations or scans distribute information contained in a data structure across that data structure in a given direction. The paradigmatic example is computing the running totals of a list of numbers, which can be thought of as distributing the numbers rightwards across the list, summing them as you go. In Haskell, this is an instance of the ${\mathit{scanl}}$ operator:

$\displaystyle \begin{array}{lcl} \mathit{scanl} &::& (\beta \rightarrow \alpha \rightarrow \beta) \rightarrow \beta \rightarrow [\alpha] \rightarrow [\beta] \\ \mathit{scanl}\,f\,e\,[\,] &=& [e] \\ \mathit{scanl}\,f\,e\,(a:x) &=& e : \mathit{scanl}\,f\,(f\,e\,a)\,x \medskip \\ \mathit{totals} &::& [{\mathbb Z}] \rightarrow [{\mathbb Z}] \\ \mathit{totals} &=& \mathit{scanl}\,(+)\,0 \end{array}$

A special case of this pattern is to distribute the elements of a list rightwards across the list, simply collecting them as you go, rather than summing them. That’s the ${\mathit{inits}}$ function, and it too is an instance of ${\mathit{scanl}}$:

$\displaystyle \mathit{inits} = \mathit{scanl}\,\mathit{snoc}\,[\,] \quad\mathbf{where}\; \mathit{snoc}\,x\,a = x \mathbin{{+}\!\!\!{+}} [a]$

It’s particularly special, in the sense that it is the most basic ${\mathit{scanl}}$; any other instance can be expressed in terms of it:

$\displaystyle \mathit{scanl}\,f\,e = \mathit{map}\,(\mathit{foldl}\,f\,e) \cdot \mathit{inits}$

This is called the Scan Lemma for ${\mathit{scanl}}$. Roughly speaking, it states that a ${\mathit{scanl}}$ replaces every node of a list with a ${\mathit{foldl}}$ applied to that node’s predecessors. Read from right to left, the scan lemma is an efficiency-improving transformation, eliminating duplicate computations; but note that this only works on expressions ${\mathit{map}\,f \cdot \mathit{inits}}$ where ${f}$ is a ${\mathit{foldl}}$, because only then are there duplicate computations to eliminate. It’s an important result, because it relates a clear and simple specification on the right to a more efficient implementation on the left.

However, the left-to-right operators ${\mathit{inits}}$, ${\mathit{foldl}}$, and ${\mathit{scanl}}$ are a little awkward in Haskell, because they go against the grain of the cons-based (ie, right-to-left) structure of lists. I leave as a simple exercise for the reader the task of writing the more natural ${\mathit{tails}}$, ${\mathit{foldr}}$, and ${\mathit{scanr}}$, and identifying the relationships between them. Conversely, one can view ${\mathit{inits}}$ etc as the natural operators for snoc-based lists, which are constructed from nil and snoc rather than from nil and cons.

## Upwards and downwards accumulations on binary trees

What would ${\mathit{inits}}$, ${\mathit{tails}}$, ${\mathit{scanl}}$, etc look like on different—and in particular, non-linear—datatypes? Let’s consider a simple instance, for homogeneous binary trees; that is, trees with a label at both internal and external nodes.

$\displaystyle \mathbf{data}\;\mathsf{Tree}\,\alpha = \mathit{Leaf}\,\alpha \mid \mathit{Fork}\,\alpha\,(\mathsf{Tree}\,\alpha)\,(\mathsf{Tree}\,\alpha)$

for which the obvious fold operator is

$\displaystyle \begin{array}{lcl} \mathit{fold} &::& (\alpha\rightarrow\beta) \rightarrow (\alpha\rightarrow\beta\rightarrow\beta\rightarrow\beta) \rightarrow \mathsf{Tree}\,\alpha \rightarrow \beta \\ \mathit{fold}\,f\,g\,(\mathit{Leaf}\,a) &=& f\,a \\ \mathit{fold}\,f\,g\,(\mathit{Fork}\,a\,t\,u) &=& g\,a\,(\mathit{fold}\,f\,g\,t)\,(\mathit{fold}\,f\,g\,u) \end{array}$

I’m taking the view that the appropriate generalization is to distribute data “upwards” and “downwards” through such a tree—from the leaves towards the root, and vice versa. This does indeed specialize to the definitions we had on lists when you view them vertically in terms of their “cons” structure: they’re long thin trees, in which every parent has exactly one child. (An alternative view would be to look at distributing data horizontally through a tree, from left to right and vice versa. Perhaps I’ll come back to that another time.)

The upwards direction is the easier one to deal with. An upwards accumulation labels every node of the tree with some function of its descendants; moreover, the descendants of a node themselves form a tree, so can be easily represented, and folded. So we can quite straightforwardly define:

$\displaystyle \begin{array}{lcl} \mathit{scanu} &::& (\alpha\rightarrow\beta) \rightarrow (\alpha\rightarrow\beta\rightarrow\beta\rightarrow\beta) \rightarrow \mathsf{Tree}\,\alpha \rightarrow \mathsf{Tree}\,\beta \\ \mathit{scanu}\,f\,g\,(\mathit{Leaf}\,a) &=& \mathit{Leaf}\,(f\,a) \\ \mathit{scanu}\,f\,g\,(\mathit{Fork}\,a\,t\,u) &=& \mathit{Fork}\,(g\,a\,(\mathit{root}\,t')\,(\mathit{root}\,u'))\,t'\,u' \\ & & \quad\mathbf{where}\; t' = \mathit{scanu}\,f\,g\,t ; u' = \mathit{scanu}\,f\,g\,u \end{array}$

where ${\mathit{root}}$ yields the root of a tree:

$\displaystyle \begin{array}{lcl} \mathit{root} &::& \mathsf{Tree}\,\alpha \rightarrow \alpha \\ \mathit{root}\,(\mathit{Leaf}\,a) &=& a \\ \mathit{root}\,(\mathit{Fork}\,a\,t\,u) &=& a \end{array}$

As with lists, the most basic upwards scan uses the constructors themselves as arguments:

$\displaystyle \begin{array}{lcl} \mathit{subtrees} &::& \mathsf{Tree}\,\alpha \rightarrow \mathsf{Tree}\,(\mathsf{Tree}\,\alpha) \\ \mathit{subtrees} &=& \mathit{scanu}\,\mathit{Leaf}\,\mathit{Fork} \end{array}$

and any other scan can be expressed, albeit less efficiently, in terms of this:

$\displaystyle \mathit{scanu}\,f\,g = \mathit{fmap}\,(\mathit{fold}\,f\,g) \cdot \mathit{subtrees}$

The downwards direction is more difficult, though. A downwards accumulation should label every node with some function of its ancestors; but these do not form another tree. For example, in the homogeneous binary tree

the ancestors of the node labelled ${3}$ are the nodes labelled ${2,4,3}$. One could represent those ancestors simply as a list, ${[2,4,3]}$; but that rules out the possibility of a downwards accumulation treating left children differently from right children, which is essential in a number of algorithms (such as the parallel prefix and tree drawing algorithms in my thesis). A more faithful rendering is to define a new datatype of paths that captures the left and right turns—a kind of non-empty cons list, but with both a “left cons” and a “right cons” constructor:

$\displaystyle \mathbf{data}\;\mathsf{Path}\,\alpha = \mathit{Single}\,\alpha \mid \mathit{LCons}\,\alpha\,(\mathsf{Path}\,\alpha) \mid \mathit{RCons}\,\alpha\,(\mathsf{Path}\,\alpha)$

(I called them “threads” in my thesis.) Then we can capture the data structure representing the ancestors of the node labelled ${3}$

by the expression ${\mathit{RCons}\,2\,(\mathit{LCons}\,4\,(\mathit{Single}\,3))}$. I leave it as an exercise for the more energetic reader to work out a definition for

$\displaystyle \mathit{paths} :: \mathsf{Tree}\,\alpha \rightarrow \mathsf{Tree}\,(\mathsf{Path}\,\alpha)$

to compute the tree giving the ancestors of every node, and for a corresponding ${\mathit{scand}}$.

## Generic upwards accumulations

Having seen ad-hoc constructions for a particular kind of binary tree, we should consider what the datatype-generic construction looks like. I discussed datatype-generic upwards accumulations already, in the post on Horner’s Rule; the construction was given in the paper Generic functional programming with types and relations by Richard Bird, Oege de Moor and Paul Hoogendijk. As with homogeneous binary trees, it’s still the case that the generic version of ${\mathit{subtrees}}$ labels every node of a data structure of type ${\mathsf{T}\alpha = \mu\mathsf{F}\alpha}$ with the descendants of that node, and still the case that the descendants form a data structure also of type ${\mathsf{T}\alpha}$. However, in general, the datatype ${\mathsf{T}}$ does not allow for a label at every node, so we need the labelled variant ${\mathsf{L}\alpha = \mu\mathsf{G}\alpha}$ where ${\mathsf{G}(\alpha,\beta) = \alpha \times \mathsf{F}(1,\beta)}$. Then we can define

$\displaystyle \mathit{subtrees}_{\mathsf{F}} = \mathit{fold}_{\mathsf{F}}(\mathit{in}_{\mathsf{G}} \cdot \mathit{fork}(\mathit{in}_{\mathsf{F}} \cdot \mathsf{F}(\mathit{id},\mathit{root}), \mathsf{F}(!,\mathit{id}))) :: \mathsf{T}\alpha \rightarrow \mathsf{L}(\mathsf{T}\alpha)$

where ${\mathit{root} = \mathit{fst} \cdot \mathit{in}_{\mathsf{G}}^{-1} = \mathit{fold}_{\mathsf{G}}\,\mathit{fst} :: \mathsf{L}\alpha \rightarrow \alpha}$ returns the root label of a labelled data structure—by construction, every labelled data structure has a root label—and ${!_{\alpha} :: \alpha \rightarrow 1}$ is the unique arrow to the unit type. Moreover, we get a datatype-generic ${\mathit{scanu}}$ operator, and a Scan Lemma:

$\displaystyle \begin{array}{lcl} \mathit{scanu}_{\mathsf{F}} &::& (\mathsf{F}(\alpha,\beta) \rightarrow \beta) \rightarrow \mathsf{T}\alpha \rightarrow \mathsf{L}\beta \\ \mathit{scanu}_{\mathsf{F}}\,\phi &=& \mathsf{L}\,(\mathit{fold}_{\mathsf{F}}\,\phi) \cdot \mathit{subtrees}_{\mathsf{F}} \\ &=& \mathit{fold}_{\mathsf{F}}(\mathit{in}_{\mathsf{G}} \cdot \mathit{fork}(\phi \cdot \mathsf{F}(\mathit{id},\mathit{root}), \mathsf{F}(!,\mathit{id}))) \end{array}$

## Generic downwards accumulations, via linearization

The best part of a decade after my thesis work, inspired by the paper by Richard Bird & co, I set out to try to define datatype-generic versions of downward accumulations too. I wrote a paper about it for MPC 1998, and then came up with a new construction for the journal version of that paper in SCP in 2000. I now think these constructions are rather clunky, and I have a better one; if you don’t care to explore the culs-de-sac, skip this section and the next and go straight to the section on derivatives.

The MPC construction was based around a datatype-generic version of the ${\mathsf{Path}}$ datatype above, to represent the “ancestors” of a node in an inductive datatype. The tricky bit is that data structures in general are non-linear—a node may have many children—whereas paths are linear—every node has exactly one child, except the last which has none; how can we define a “linear version” ${\mathsf{F}'}$ of ${\mathsf{F}}$? Technically, we might say that a functor is linear (actually, “affine” would be a better word) if it distributes over sum.

The construction in the paper assumed that ${\mathsf{F}}$ was a sum of products of literals

$\displaystyle \begin{array}{lcl} \mathsf{F}(\alpha,\beta) &=& \sum_{i=1}^{n} \mathsf{F}_i(\alpha,\beta) \\ \mathsf{F}_i(\alpha,\beta) &=& \prod_{j=1}^{m_i} \mathsf{F}_{i,j}(\alpha,\beta) \end{array}$

where each ${\mathsf{F}_{i,j}(\alpha,\beta)}$ is either ${\alpha}$, ${\beta}$, or some constant type such as ${\mathit{Int}}$ or ${\mathit{Bool}}$. For example, for leaf-labelled binary trees

$\displaystyle \mathbf{data}\;\mathsf{Tree}\,\alpha = \mathit{Tip}\,\alpha \mid \mathit{Bin}\,(\mathsf{Tree}\,\alpha)\,(\mathsf{Tree}\,\alpha)$

the shape functor is ${\mathsf{F}(\alpha,\beta) = \alpha + \beta \times \beta}$, so ${n=2}$ (there are two variants), ${m_1=1}$ (the first variant has a single literal, ${\alpha}$) and ${m_2=2}$ (the second variant has two literals, ${\beta}$ and ${\beta}$), and:

$\displaystyle \begin{array}{lcl} \mathsf{F}(\alpha,\beta) &=& \mathsf{F}_1(\alpha,\beta) + \mathsf{F}_2(\alpha,\beta) \\ \mathsf{F}_1(\alpha,\beta) &=& \mathsf{F}_{1,1}(\alpha,\beta) \\ \mathsf{F}_{1,1}(\alpha,\beta) &=& \alpha \\ \mathsf{F}_2(\alpha,\beta) &=& \mathsf{F}_{2,1}(\alpha,\beta) \times \mathsf{F}_{2,2}(\alpha,\beta) \\ \mathsf{F}_{2,1}(\alpha,\beta) &=& \beta \\ \mathsf{F}_{2,1}(\alpha,\beta) &=& \beta \\ \end{array}$

Then for each ${i}$ we define a ${(k_i+1)}$-ary functor ${\mathsf{F}'_i}$, where ${k_i}$ is the “degree of branching” of variant ${i}$ (ie, the number of ${\beta}$s occurring in ${\mathsf{F}_i(\alpha,\beta)}$, which is the number of ${j}$ for which ${\mathsf{F}_{i,j}(\alpha,\beta)=\beta}$), in such a way that

$\displaystyle \mathsf{F}'_i(\alpha,\beta,\ldots,\beta) = \mathsf{F}_i(\alpha,\beta)$

and ${\mathsf{F}'_i}$ is linear in each argument except perhaps the first. It’s a bit messy explicitly to give a construction for ${\mathsf{F}'_i}$, but roughly speaking,

$\displaystyle \mathsf{F}'_i(\alpha,\beta_1,\ldots,\beta_{k_i}) = \prod_{j=1}^{m_i} \mathsf{F}'_{i,j}(\alpha,\beta_1,\ldots,\beta_{k_i})$

where ${\mathsf{F}'_{i,j}(\alpha,\beta_1,\ldots,\beta_{k_i})}$ is “the next unused ${\beta_i}$” when ${\mathsf{F}_{i,j}(\alpha,\beta)=\beta}$, and just ${\mathsf{F}_{i,j}(\alpha,\beta)}$ otherwise. For example, for leaf-labelled binary trees, we have:

$\displaystyle \begin{array}{lcl} \mathsf{F}'_1(\alpha) &=& \alpha \\ \mathsf{F}'_2(\alpha,\beta_1,\beta_2) &=& \beta_1 \times \beta_2 \end{array}$

Having defined the linear variant ${\mathsf{F}'}$ of ${\mathsf{F}}$, we can construct the datatype ${\mathsf{P}\alpha = \mu\mathsf{H}\alpha}$ of paths, as the inductive datatype of shape ${\mathsf{H}}$ where

$\displaystyle \mathsf{H}(\alpha,\beta) = \mathsf{F}(\alpha,1) + \sum_{i=1}^{n} \sum_{j=1}^{k_i} (\mathsf{F}_i(\alpha,1) \times \beta)$

That is, paths are a kind of non-empty cons list. The path ends at some node of the original data structure; so the last element of the path is of type ${\mathsf{F}(\alpha,1)}$, which records the “local content” of a node (its shape and labels, but without any of its children). Every other element of the path consists of the local content of a node together with an indication of which direction to go next; this amounts to the choice of a variant ${i}$, followed by the choice of one of ${k_i}$ identical copies of the local contents ${\mathsf{F}_i(\alpha,1)}$ of variant ${i}$, where ${k_i}$ is the degree of branching of variant ${i}$. We model this as a base constructor ${\mathit{End}}$ and a family of “cons” constructors ${\mathit{Cons}_{i,j}}$ for ${1 \le i \le n}$ and ${1 \le j \le k_i}$.

For example, for leaf-labelled binary trees, the “local content” for the last element of the path is either a single label (for tips) or void (for bins), and for the other path elements, there are zero copies of the local content for a tip (because a tip has zero children), and two copies of the void local information for bins (because a bin has two children). Therefore, the path datatype for such trees is

$\displaystyle \mathbf{data}\;\mathsf{Path}\,\alpha = \mathit{End}\,(\mathsf{Maybe}\,\alpha) \mid \mathit{Cons}_{2,1}\,(\mathsf{Path}\,\alpha) \mid \mathit{Cons}_{2,2}\,(\mathsf{Path}\,\alpha)$

which is isomorphic to the definition that you might have written yourself:

$\displaystyle \mathbf{data}\;\mathsf{Path}\,\alpha = \mathit{External}\,\alpha \mid \mathit{Internal} \mid \mathit{Left}\,(\mathsf{Path}\,\alpha) \mid \mathit{Right}\,(\mathsf{Path}\,\alpha)$

For homogeneous binary trees, the construction gives

$\displaystyle \mathbf{data}\;\mathsf{Path}\,\alpha = \mathit{External}\,\alpha \mid \mathit{Internal}\,\alpha \mid \mathit{Left}\,\alpha\,(\mathsf{Path}\,\alpha) \mid \mathit{Right}\,\alpha\,(\mathsf{Path}\,\alpha)$

which is almost the ad-hoc definition we had two sections ago, except that it distinguishes singleton paths that terminate at an external node from those that terminate at an internal one.

Now, analogous to the function ${\mathit{subtrees}_\mathsf{F}}$ which labels every node with its descendants, we can define a function ${\mathit{paths}_\mathsf{F} : \mathsf{T}\alpha \rightarrow \mathsf{L}(\mathsf{P}\alpha)}$ to label every node with its ancestors, in the form of the path to that node. One definition is as a fold; informally, at each stage we construct a singleton path to the root, and map the appropriate “cons” over the paths to each node in each of the children (see the paper for a concrete definition). This is inefficient, because of the repeated maps; it’s analogous to defining ${\mathit{inits}}$ by

$\displaystyle \begin{array}{lcl} \mathit{inits}\,[\,] &=& [[\,]] \\ \mathit{inits}\,(a:x) &=& [\,] : \mathit{map}\,(a:)\,(\mathit{inits}\,x) \end{array}$

A second definition is as an unfold, maintaining as an accumulating parameter of type ${\mathsf{P}(\alpha)\rightarrow\mathsf{P}(\alpha)}$ the “path so far”; this avoids the maps, but it is still quadratic because there are no common subexpressions among the various paths. (This is analogous to an accumulating-parameter definition of ${\mathit{inits}}$:

$\displaystyle \begin{array}{lcl} \mathit{inits} &=& \mathit{inits}'\,\mathit{id} \medskip \\ \mathit{inits}'\,f\,[\,] &=& f\,[\,] \\ \mathit{inits}'\,f\,(a:x) &=& f\,[\,] : \mathit{inits}'\,(f \cdot (a:))\,x \end{array}$

Even with an accumulating “Hughes list” parameter, it still takes quadratic time.)

The downwards accumulation itself is defined as a path fold mapped over the paths, giving a Scan Lemma for downwards accumulations. With either the fold or the unfold definition of paths, this is still quadratic, again because of the lack of common subexpressions in a result of quadratic size. However, in some circumstances the path fold can be reassociated (analogous to turning a ${\mathit{foldr}}$ into a ${\mathit{foldl}}$), leading finally to a linear-time computation; see the paper for the details of how.

## Generic downwards accumulations, via zip

I was dissatisfied with the “…”s in the MPC construction of datatype-generic paths, but couldn’t see a good way of avoiding them. So in the subsequent SCP version of the paper, I presented an alternative construction of downwards accumulations, which does not go via a definition of paths; instead, it goes directly to the accumulation itself.

As with the efficient version of the MPC construction, it is coinductive, and uses an accumulating parameter to carry in to each node the seed from higher up in the tree; so the downwards accumulation is of type ${\gamma \times \mathsf{T}\alpha \rightarrow \mathsf{L}\beta}$. It is defined as an unfold, with a body ${g}$ of type

$\displaystyle \gamma \times \mathsf{T}\alpha \rightarrow \mathsf{G}(\beta, \gamma \times \mathsf{T}\alpha)$

The result ${\mathsf{G}(\beta, \gamma \times \mathsf{T}\alpha)}$ of applying the body will be constructed from two components, of types ${\mathsf{G}(\beta, \gamma)}$ and ${\mathsf{G}(1, \mathsf{T}\alpha)}$: the first gives the root label of the accumulation and the seeds for processing the children, and the second gives the children themselves.

These two components get combined to make the whole result via a function

$\displaystyle \mathit{zip} :: \mathsf{G}(\alpha,\beta) \times \mathsf{G}(\gamma,\delta) \rightarrow \mathsf{G}(\alpha \times \gamma, \beta \times \delta)$

This will be partial in general, defined only for pairs of ${\mathsf{G}}$-structures of the same shape.

The second component of ${g}$ is the easier to define; given input ${\gamma \times \mathsf{T}\alpha}$, it unpacks the ${\mathsf{T}\alpha}$ to ${\mathsf{F}(\alpha,\mathsf{T}\alpha)}$, and discards the ${\gamma}$ and the ${\alpha}$ (recall that ${\mathsf{L}\alpha=\mu\mathsf{G}\alpha}$ is the labelled variant of ${\mathsf{T}\alpha=\mu\mathsf{F}\alpha}$, where ${\mathsf{G}(\alpha,\beta) = \alpha \times \mathsf{F}(1,\beta)}$).

For the first component, we enforce the constraint that all output labels are dependent only on their ancestors by unpacking the ${\mathsf{T}\alpha}$ and pruning off the children, giving input ${\gamma \times \mathsf{F}(\alpha,1)}$. We then suppose as a parameter to the accumulation a function ${f}$ of type ${\gamma \times \mathsf{F}(\alpha,1) \rightarrow \beta \times \mathsf{F}(1,\gamma) = \mathsf{G}(\beta,\gamma)}$ to complete the construction of the first component. In order that the two components can be zipped together, we require that ${f}$ is shape-preserving in its second argument:

$\displaystyle \mathsf{F}(!,!) \cdot \mathit{snd} = \mathsf{F}(!,!) \cdot f \cdot \mathit{snd}$

where ${! : \alpha \rightarrow 1}$ is the unique function to the unit type. Then, although the ${g}$ built from these two components depends on the partial function ${\mathit{zip}}$, it will still itself be total.

The SCP construction gets rid of the “…”s in the MPC construction. It is also inherently efficient, in the sense that if the core operation ${f}$ takes constant time then the whole accumulation takes linear time. However, use of the partial ${\mathit{zip}}$ function to define a total accumulation is a bit unsatisfactory, taking us outside the domain of sets and total functions. Moreover, there’s now only half an explanation in terms of paths: accumulations in which the label attached to each node depends only on the list of its ancestors, and not on the left-to-right ordering of siblings, can be factored into a list function (in fact, a ${\mathit{foldl}}$) mapped over the “paths”, which is now a tree of lists; but accumulations in which left children are treated differently from right children, such as the parallel prefix and tree drawing algorithms mentioned earlier, can not.

## Generic downwards accumulations, via derivatives

After another interlude of about a decade, and with the benefit of new results to exploit, I had a “eureka” moment: the linearization of a shape functor is closely related to the beautiful notion of the derivative of a datatype, as promoted by Conor McBride. The crucial observation Conor made is that the “one-hole contexts” of a datatype—that is, for a container datatype, the datatype of data structures with precisely one element missing—can be neatly formalized using an analogue of the rules of differential calculus. The one-hole contexts are precisely what you need to identify which particular child you’re talking about out of a collection of children. (If you’re going to follow along with some coding, I recommend that you also read Conor’s paper Clowns to the left of me, jokers to the right. This gives the more general construction of dissecting a datatype, identifying a unique hole, but also allowing the “clowns” to the left of the hole to have a different type from the “jokers” to the right. I think the explanation of the relationship with the differential calculus is much better explained here; the original notion of derivative can be retrieved by specializing the clowns and jokers to the same type.)

The essence of the construction is the notion of a derivative ${\Delta\mathsf{F}}$ of a functor ${\mathsf{F}}$. For our purposes, we want the derivative in the second argument only of a bifunctor; informally, ${\Delta\mathsf{F}(\alpha,\beta)}$ is like ${\mathsf{F}(\alpha,\beta)}$, but with precisely one ${\beta}$ missing. Given such a one-hole context, and an element with which to plug the hole, one can reconstruct the whole structure:

$\displaystyle \mathit{plug}_\mathsf{F} :: \beta \times \Delta\mathsf{F}(\alpha,\beta) \rightarrow \mathsf{F}(\alpha,\beta)$

That’s how to consume one-hole contexts; how can we produce them? We could envisage some kind of inverse ${\mathit{unplug}}$ of ${\mathit{plug}}$, which breaks an ${\mathsf{F}}$-structure into an element and a context; but this requires us to invent a language for specifying which particular element we mean—${\mathit{plug}}$ is not injective, so ${\mathit{unplug}}$ needs an extra argument. A simpler approach is to provide an operator that annotates every position at once with the one-hole context for that position:

$\displaystyle \mathit{positions}_\mathsf{F} :: \mathsf{F}(\alpha,\beta) \rightarrow \mathsf{F}(\alpha, \beta \times \Delta\mathsf{F}(\alpha,\beta))$

One property of ${\mathit{positions}}$ is that it really is an annotation—if you throw away the annotations, you get back what you started with:

$\displaystyle \mathsf{F}(\mathit{id},\mathit{fst})\,(\mathit{positions}\,x) = x$

A second property relates it to ${\mathit{plug}}$—each of elements in a hole position plugs into its associated one-hole context to yield the same whole structure back again:

$\displaystyle \mathsf{F}(\mathit{id},\mathit{plug})\,(\mathit{positions}\,x) = \mathsf{F}(\mathit{id},\mathit{const}\,x)\,x$

(I believe that those two properties completely determine ${\mathit{plug}}$ and ${\mathit{positions}}$.)

Incidentally, the derivative ${\Delta\mathsf{F}}$ of a bifunctor can be elegantly represented as an associated type synonym in Haskell, in a type class ${\mathit{Diff}}$ of bifunctors differentiable in their second argument, along with ${\mathit{plug}}$ and ${\mathit{positions}}$:

$\displaystyle \begin{array}{lcl} \mathbf{class}\; \mathit{Bifunctor}\,f \Rightarrow \mathit{Diff}\,f \;\mathbf{where} \\ \qquad \mathbf{type}\; \mathit{Delta}\,f :: \ast \rightarrow \ast \rightarrow \ast \\ \qquad \mathit{plug} :: (\beta, \mathit{Delta}\,f\,\alpha\,\beta) \rightarrow f\,\alpha\,\beta \\ \qquad \mathit{positions} :: f\,\alpha\,\beta \rightarrow f\,\alpha\,(\beta, \mathit{Delta}\,f\,\alpha\,\beta) \end{array}$

Conor’s papers show how to define instances of ${\mathit{Diff}}$ for all polynomial functors ${\mathsf{F}}$—anything made out of constants, projections, sums, and products.

The path to a node in a data structure is simply a list of one-hole contexts—let’s say, innermost context first, although it doesn’t make much difference—but with all the data off the path (that is, the other children) stripped away:

$\displaystyle \mathsf{P}\alpha = \mathsf{List}(\Delta\mathsf{F}(\alpha,1))$

This is a projection of Huet’s zipper, which preserves the off-path children, and records also the subtree in focus at the end of the path:

$\displaystyle \mathsf{Zipper}_\mathsf{F}\,\alpha = \mathsf{List}(\Delta\mathsf{F}(\alpha,\mu\mathsf{F}\alpha)) \times \mu\mathsf{F}\alpha$

Since the contexts are listed innermost-first in the path, closing up a zipper to reconstruct a tree is a ${\mathit{foldl}}$ over the path:

$\displaystyle \begin{array}{lcl} close_\mathsf{F} &::& \mathsf{Zipper}_\mathsf{F}\,\alpha \rightarrow \mu\mathsf{F}\alpha \\ close_\mathsf{F}\,(p,t) &=& \mathit{foldl}\,(\mathit{in}\cdot\mathit{plug})\,t\,p \end{array}$

Now, let’s develop the function ${\mathit{paths}}$, which turns a tree into a labelled tree of paths. We will write it with an accumulating parameter, representing the “path so far”:

$\displaystyle \begin{array}{lcl} \mathit{paths}_\mathsf{F} &::& \mathsf{T}\alpha \rightarrow \mathsf{L}(\mathsf{P}\alpha) \\ \mathit{paths}_\mathsf{F}\,t &=& \mathit{paths}'_\mathsf{F}\,(t,[\,]) \end{array}$

Given the components ${\mathit{in}_\mathsf{F}\,x}$ of a tree and a path ${p}$ to its root, ${\mathit{paths}'_\mathsf{F}}$ must construct the corresponding labelled tree of paths. Since ${\mathsf{L} = \mu\mathsf{G}}$ and ${\mathsf{G}(\alpha,\beta) = \alpha \times \mathsf{F}(1,\beta)}$, this amounts to constructing a value of type ${\mathsf{P}\alpha \times \mathsf{F}(1, \mathsf{L}(\mathsf{P}\alpha))}$. For the first component of this pair we will use ${p}$, the path so far. The second component can be constructed from ${x}$ by identifying all children via ${\mathit{positions}}$, discarding some information with judicious ${!}$s, consing each one-hole context onto ${p}$ to make a longer path, then making recursive calls on each child:

That is,

$\displaystyle \begin{array}{lcl} \mathit{paths}'_\mathsf{F} &::& \mathsf{T}\alpha\times\mathsf{P}\alpha \rightarrow \mathsf{L}(\mathsf{P}\alpha) \\ \mathit{paths}'_\mathsf{F}\,(\mathit{in}_\mathsf{F}\,x,p) &=& \mathit{in}_\mathsf{G}(p, \mathsf{F}(!, \mathit{paths}'_\mathsf{F} \cdot \mathit{id}\times((:p)\cdot\Delta\mathsf{F}(\mathit{id},!)) )\,(\mathit{positions}\,x)) \end{array}$

Downwards accumulations are then path functions mapped over the result of ${\mathit{paths}}$. However, we restrict ourselves to path functions that are instances of ${\mathit{foldr}}$, because only then are there common subexpressions to be shared between a parent and its children (remember that paths are innermost-first, so related nodes share a tail of their ancestors).

$\displaystyle \begin{array}{lcl} \mathit{scand}_\mathsf{F} &::& (\Delta\mathsf{F}(\alpha,1)\times\beta\rightarrow\beta) \rightarrow \beta \rightarrow \mathsf{T}\alpha \rightarrow \mathsf{L}\beta \\ \mathit{scand}_\mathsf{F}\,f\,e &=& \mathit{map}\,(\mathit{foldr}\,f\,e) \cdot \mathit{paths} \end{array}$

Moreover, it is straightforward to fuse the ${\mathit{map}}$ with ${\mathit{paths}}$, to obtain

$\displaystyle \begin{array}{lcl} \mathit{scand}_\mathsf{F}\,f\,e\,t &=& \mathit{scand}'_\mathsf{F}\,f\,(t,e) \medskip \\ \mathit{scand}'_\mathsf{F}\,f\,(\mathit{in}_\mathsf{F}\,x,e) &=& \mathit{in}_\mathsf{G}(e, \mathsf{F}(!, \mathit{scand}'_\mathsf{F}\,f \cdot \mathit{id}\times g )\,(\mathit{positions}\,x)) \\ & & \quad\mathbf{where}\; g\,d = f\,(\Delta\mathsf{F}(\mathit{id},!)\,d, e) \end{array}$

which takes time linear in the size of the tree, assuming that ${f}$ and ${e}$ take constant time.

Finally, in the case that the function being mapped over the paths is a ${\mathit{foldl}}$ as well as a ${\mathit{foldr}}$, then we can apply the Third Homomorphism Theorem to conclude that it is also an associative fold over lists. From this (I believe) we get a very efficient parallel algorithm for computing the accumulation, taking time logarithmic in the size of the tree—even if the tree has greater than logarithmic depth.

Posted in Uncategorized | 2 Comments

## Distributivity in Horner’s Rule

This is a continuation of my previous post on Horner’s Rule, and in particular, of the discussion there about distributivity in the datatype-generic version of the Maximum Segment Sum problem:

the essential property behind Horner’s Rule is one of distributivity. In the datatype-generic case, we will model this as follows. We are given an ${(\mathsf{F}\,\alpha)}$-algebra ${(\beta,f)}$ [for a binary shape functor ${\mathsf{F}}$], and a ${\mathsf{M}}$-algebra ${(\beta,k)}$ [for a collection monad ${\mathsf{M}}$]; you might think of these as “datatype-generic product” and “collection sum”, respectively. Then there are two different methods of computing a ${\beta}$ result from an ${\mathsf{F}\,\alpha\,(\mathsf{M}\,\beta)}$ structure: we can either distribute the ${\mathsf{F}\,\alpha}$ structure over the collection(s) of ${\beta}$s, compute the “product” ${f}$ of each structure, and then compute the “sum” ${k}$ of the resulting products; or we can “sum” each collection, then compute the “product” of the resulting structure. Distributivity of “product” over “sum” is the property that these two different methods agree, as illustrated in the following diagram.

For example, with ${f :: \mathsf{F}\,{\mathbb Z}\,{\mathbb Z} \rightarrow {\mathbb Z}}$ adding all the integers in an ${\mathsf{F}}$-structure, and ${k :: \mathsf{M}\,{\mathbb Z} \rightarrow {\mathbb Z}}$ finding the maximum of a (non-empty) collection, the diagram commutes.

There’s a bit of hand-waving above to justify the claim that this is really a kind of distributivity. What does it have to do with the common-or-garden equation

$\displaystyle a \otimes (b \oplus c) = (a \otimes b) \oplus (a \otimes c)$

stating distributivity of one binary operator over another? That question is the subject of this post.

## Distributing over effects

Recall that ${\delta_2 :: (\mathsf{F}\,\alpha)\mathsf{M} \mathbin{\stackrel{.}{\to}} \mathsf{M}(\mathsf{F}\,\alpha)}$ distributes the shape functor ${\mathsf{F}}$ over the monad ${\mathsf{M}}$ in its second argument; this is the form of distribution over effects that crops up in the datatype-generic Maximum Segment Sum problem. More generally, this works for any idiom ${\mathsf{M}}$; this will be important below.

Generalizing in another direction, one might think of distributing over an idiom in both arguments of the bifunctor, via an operator ${\delta : \mathsf{F} \cdot (\mathsf{M} \times \mathsf{M}) \mathbin{\stackrel{.}{\to}} \mathsf{M} \cdot \mathsf{F}}$, which is to say, ${\delta_\beta :: \mathsf{F}\,(\mathsf{M}\beta)\,(\mathsf{M}\beta) \rightarrow \mathsf{M}(\mathsf{F}\beta)}$, natural in the ${\beta}$. This is the ${\mathit{bidist}}$ method of the ${\mathit{Bitraversable}}$ subclass of ${\mathit{Bifunctor}}$ that Bruno Oliveira and I used in our Essence of the Iterator Pattern paper; informally, it requires just that ${\mathsf{F}}$ has a finite ordered sequence of “element positions”. Given ${\delta}$, one can define ${\delta_2 = \delta \cdot \mathsf{F}\,\mathit{pure}\,\mathit{id}}$.

That traversability (or equivalently, distributivity over effects) for a bifunctor ${\mathsf{F}}$ is definable for any idiom, not just any monad, means that one can also conveniently define an operator ${\mathit{contents}_{\mathsf{H}} : \mathsf{H} \mathbin{\stackrel{.}{\to}} \mathsf{List}}$ for any traversable unary functor ${\mathsf{H}}$. This is because the constant functor ${\mathsf{K}_{[\beta]}}$ (which takes any ${\alpha}$ to ${[\beta]}$) is an idiom: the ${\mathit{pure}}$ method returns the empty list, and idiomatic application appends two lists. Then one can define

$\displaystyle \mathit{contents}_{\mathsf{H}} = \delta \cdot \mathsf{H}\,\mathit{wrap}$

where ${\mathit{wrap}}$ makes a singleton list. For a traversable bifunctor ${\mathsf{F}}$, we define ${\mathit{contents}_{\mathsf{F}} = \mathit{contents}_{\mathsf{F}\cdot\triangle}}$ where ${\triangle}$ is the diagonal functor; that is, ${\mathit{contents}_{\mathsf{F}} :: \mathsf{F}\,\beta\,\beta \rightarrow [\beta]}$, natural in the ${\beta}$. (No constant functor is a monad, except in trivial categories, so this convenient definition of contents doesn’t work monadically. Of course, one can use a writer monad, but this isn’t quite so convenient, because an additional step is needed to extract the output.)

One important axiom of ${\delta}$ that I made recent use of in a paper with Richard Bird on Effective Reasoning about Effectful Traversals is that it should be “natural in the contents”: it should leave shape unchanged, and depend on contents only up to the extent of their ordering. Say that a natural transformation ${\phi : \mathsf{F} \mathbin{\stackrel{.}{\to}} \mathsf{G}}$ between traversable functors ${\mathsf{F}}$ and ${\mathsf{G}}$ “preserves contents” if ${\mathit{contents}_{\mathsf{G}} \cdot \phi = \mathit{contents}_{\mathsf{F}}}$. Then, in the case of unary functors, the formalization of “naturality in the contents” requires ${\delta}$ to respect content-preserving ${\phi}$:

$\displaystyle \delta_{\mathsf{G}} \cdot \phi = \mathsf{M}\phi \cdot \delta_{\mathsf{F}} : \mathsf{T}\mathsf{M} \mathbin{\stackrel{.}{\to}} \mathsf{M}\mathsf{G}$

In particular, ${\mathit{contents}_{\mathsf{F}} : \mathsf{F} \mathbin{\stackrel{.}{\to}} \mathsf{List}}$ itself preserves contents, and so we expect

$\displaystyle \delta_{\mathsf{List}} \cdot \mathit{contents}_{\mathsf{F}} = \mathsf{M}(\mathit{contents}_{\mathsf{F}}) \cdot \delta_{\mathsf{F}}$

to hold.

## Folding a structure

Happily, the same generic operation ${\mathit{contents}_{\mathsf{F}}}$ provides a datatype-generic means to “fold” over the elements of an ${\mathsf{F}}$-structure. Given a binary operator ${\otimes :: \beta\times\beta \rightarrow \beta}$ and an initial value ${b :: \beta}$, we can define an ${(\mathsf{F}\,\beta)}$-algebra ${(\beta,f)}$—that is, a function ${f :: \mathsf{F}\,\beta\,\beta\rightarrow\beta}$—by

$\displaystyle f = \mathit{foldr}\,(\otimes)\,b \cdot \mathit{contents}_{\mathsf{F}}$

(This is a slight specialization of the presentation of the datatype-generic MSS problem from last time; there we had ${f :: \mathsf{F}\,\alpha\,\beta \rightarrow \beta}$. The specialization arises because we are hoping to define such an ${f}$ given a homogeneous binary operator ${\otimes}$. On the other hand, the introduction of the initial value ${b}$ is no specialization, as we needed such a value for the “product” of an empty “segment” anyway.)

Incidentally, I believe that this “generic folding” construction is exactly what is intended in Ross Paterson’s Data.Foldable library.

## Summing a collection

The other ingredient we need is an ${\mathsf{M}}$-algebra ${(\beta,k)}$. We already decided last time to

stick to reductions${k}$s of the form ${\oplus/}$ for associative binary operator ${{\oplus} :: \beta \times \beta \rightarrow \beta}$; then we also have distribution over choice: ${\oplus / (x \mathbin{\underline{\smash{\mathit{mplus}}}} y) = (\oplus/x) \oplus (\oplus/y)}$. Note also that we prohibited empty collections in ${\mathsf{M}}$, so we do not need a unit for ${\oplus}$.

On account of ${\oplus/}$ being an algebra for the collection monad ${\mathsf{M}}$, we also get a singleton rule ${\oplus/ \cdot \mathit{return} = \mathit{id}}$.

## Reduction to distributivity for lists

One of the take-home messages in the Effective Reasoning about Effectful Traversals paper is that it helps to reduce a traversal problem for datatypes in general to a more specific one about lists, exploiting the “naturality in contents” property of traversability. We’ll use that tactic for the distributivity property in the datatype-generic version Horner’s Rule.

In this diagram, the perimeter is the commuting diagram given at the start of this post—the diagram we have to justify. Face (1) is the definition of ${\delta_2}$ in terms of ${\delta}$. Faces (2) and (3) are the expansion of ${f}$ as generic folding of an ${\mathsf{F}}$-structure. Face (4) follows from ${\oplus/}$ being an ${\mathsf{M}}$-algebra, and hence being a left-inverse of ${\mathit{return}}$. Face (5) is an instance of the naturality property of ${\mathit{contents}_{\mathsf{F}} : \mathsf{F}\triangle \mathbin{\stackrel{.}{\to}} \mathsf{List}}$. Face (6) is the property that ${\delta}$ respects the contents-preserving transformation ${\mathit{contents}_{\mathsf{F}}}$. Therefore, the whole diagram commutes if Face (7) does—so let’s look at Face (7)!

## Distributivity for lists

Here’s Face (7) again:

Demonstrating that this diagram commutes is not too difficult, because both sides turn out to be list folds.

Around the left and bottom edges, we have a fold ${\mathit{foldr}\,(\otimes)\,b}$ after a map ${\mathsf{List}\,(\oplus)}$, which automatically fuses to ${\mathit{foldr}\,(\odot)\,b}$, where ${\odot}$ is defined by

$\displaystyle x \odot a = (\oplus/x) \otimes a$

or, pointlessly,

$\displaystyle (\odot) = (\otimes) \cdot (\oplus/) \times \mathit{id}$

Around the top and right edges we have the composition ${\oplus/ \cdot \mathsf{M}(\mathit{foldr}\,(\otimes)\,b) \cdot \delta_{\mathsf{List}}}$. If we can write ${\delta_{\mathsf{List}}}$ as an instance of ${\mathit{foldr}}$, we can then use the fusion law for ${\mathit{foldr}}$

$\displaystyle h \cdot \mathit{foldr}\,f\,e = \mathit{foldr}\,f'\,e' \;\Leftarrow\; h\,e=e' \land h \cdot f = f' \cdot \mathit{id}\times h$

to prove that this composition equals ${\mathit{foldr}\,(\odot)\,b}$.

In fact, there are various equivalent ways of writing ${\delta_{\mathsf{List}}}$ as an instance of ${\mathit{foldr}}$. The definition given by Conor McBride and Ross Paterson in their original paper on idioms looked like the identity function, but with added idiomness:

$\displaystyle \begin{array}{lcl} \delta_{\mathsf{List}}\,[\,] &=& \mathit{pure}\,[\,] \\ \delta_{\mathsf{List}}\,(\mathit{mb} : \mathit{mbs}) &=& \mathit{pure}\,(:) \circledast \mathit{mb} \circledast \delta_{\mathsf{List}}\,\mathit{mbs} \end{array}$

In the special case that the idiom is a monad, it can be written in terms of ${\mathit{liftM}_0}$ (aka ${\mathit{return}}$) and ${\mathit{liftM}_2}$:

$\displaystyle \begin{array}{lcl} \delta_{\mathsf{List}}\,[\,] &=& \mathit{liftM}_0\,[\,] \\ \delta_{\mathsf{List}}\,(\mathit{mb} : \mathit{mbs}) &=& \mathit{liftM}_2\,(:)\,\mathit{mb}\,(\delta_{\mathsf{List}}\,\mathit{mbs}) \end{array}$

But we’ll use a third definition:

$\displaystyle \begin{array}{lcl} \delta_{\mathsf{List}}\,[\,] &=& \mathit{return}\,[\,] \\ \delta_{\mathsf{List}}\,(\mathit{mb} : \mathit{mbs}) &=& \mathsf{M}(:)\,(\mathit{cp}\,(\mathit{mb}, \delta_{\mathsf{List}}\,\mathit{mbs})) \end{array}$

where

$\displaystyle \begin{array}{lcl} \mathit{cp} &::& \mathsf{M}\,\alpha \times \mathsf{M}\,\beta \rightarrow \mathsf{M}(\alpha\times\beta) \\ \mathit{cp}\,(x,y) &=& \mathbf{do}\,\{\,a \leftarrow x \mathbin{;} b \leftarrow y \mathbin{;} \mathit{return}\,(a,b) \} \end{array}$

That is,

$\displaystyle \delta_{\mathsf{List}} = \mathit{foldr}\,(\mathsf{M}(:)\cdot\mathit{cp})\,(\mathit{return}\,[\,])$

Now, for the base case we have

$\displaystyle \oplus/\,(\mathsf{M}(\mathit{foldr}\,(\otimes)\,b)\,(\mathit{return}\,[\,])) = \oplus/\,(\mathit{return}\,(\mathit{foldr}\,(\otimes)\,b\,[\,])) = \oplus/\,(\mathit{return}\,b) = b$

as required. For the inductive step, we have:

$\displaystyle \begin{array}{ll} & \oplus/ \cdot \mathsf{M}(\mathit{foldr}\,(\otimes)\,b) \cdot \mathsf{M}(:) \cdot \mathit{cp} \\ = & \qquad \{ \mbox{functors} \} \\ & \oplus/ \cdot \mathsf{M}(\mathit{foldr}\,(\otimes)\,b \cdot (:)) \cdot \mathit{cp} \\ = & \qquad \{ \mbox{evaluation for~} \mathit{foldr} \} \\ & \oplus/ \cdot \mathsf{M}((\otimes) \cdot \mathit{id}\times\mathit{foldr}\,(\otimes)\,b) \cdot \mathit{cp} \\ = & \qquad \{ \mbox{functors; naturality of~} \mathit{cp} \} \\ & \oplus/ \cdot \mathsf{M}(\otimes) \cdot \mathit{cp} \cdot \mathsf{M}\mathit{id}\times\mathsf{M}(\mathit{foldr}\,(\otimes)\,b) \\ = & \qquad \{ \mbox{distributivity for~} \mathit{cp} \mbox{: see below} \} \\ & (\otimes) \cdot (\oplus/)\times(\oplus/) \cdot \mathsf{M}\mathit{id}\times\mathsf{M}(\mathit{foldr}\,(\otimes)\,b) \\ = & \qquad \{ \mbox{functors} \} \\ & (\otimes) \cdot (\oplus/)\times\mathit{id} \cdot \mathit{id}\times\mathsf{M}(\oplus/\cdot\mathit{foldr}\,(\otimes)\,b) \end{array}$

which completes the fusion proof, modulo the wish about distributivity for ${\mathit{cp}}$:

$\displaystyle \oplus/ \cdot \mathsf{M}(\otimes) \cdot \mathit{cp} = (\otimes) \cdot (\oplus/)\times(\oplus/)$

## Distributivity for cartesian product

As for that wish about distributivity for ${\mathit{cp}}$:

$\displaystyle \begin{array}{ll} & \oplus/ \mathbin{\hbox{\footnotesize\}} \mathsf{M}(\otimes) \mathbin{\hbox{\footnotesize\}} \mathit{cp}\,(x,y) \\ = & \qquad \{ \mbox{definition of~} \mathit{cp} \} \\ & \oplus/ \mathbin{\hbox{\footnotesize\}} \mathsf{M}(\otimes) \mathbin{\hbox{\footnotesize\}} \mathbf{do}\,\{\,a \leftarrow x \mathbin{;} b \leftarrow y \mathbin{;} \mathit{return}\,(a,b) \,\} \\ = & \qquad \{ \mbox{map over~} \mathbf{do} \} \\ & \oplus/ \mathbin{\hbox{\footnotesize\}} \mathbf{do}\,\{\,a \leftarrow x \mathbin{;} b \leftarrow y \mathbin{;} \mathit{return}\,(a \otimes b) \,\} \\ = & \qquad \{ \mbox{expanding~} \mathbf{do} \} \\ & \oplus/ \mathbin{\hbox{\footnotesize\}} \mathit{join} \mathbin{\hbox{\footnotesize\}} \mathsf{M}\,(\lambda a \mathbin{.} \mathsf{M}\,(a\otimes)\,y)\,x \\ = & \qquad \{ \oplus/ \mbox{~is an~} \mathsf{M} \mbox{-algebra} \} \\ & \oplus/ \mathbin{\hbox{\footnotesize\}} \mathsf{M}(\oplus/) \mathbin{\hbox{\footnotesize\}} \mathsf{M}\,(\lambda a \mathbin{.} \mathsf{M}\,(a\otimes)\,y)\,x \\ = & \qquad \{ \mbox{functors} \} \\ & \oplus/ \mathbin{\hbox{\footnotesize\}} \mathsf{M}\,(\lambda a \mathbin{.} \oplus/(\mathsf{M}\,(a\otimes)\,y))\,x \\ = & \qquad \{ \mbox{distributivity for collections: see below} \} \\ & \oplus/ \mathbin{\hbox{\footnotesize\}} \mathsf{M}\,(\lambda a \mathbin{.} a \otimes (\oplus/\,y))\,x \\ = & \qquad \{ \mbox{sectioning} \} \\ & \oplus/ \mathbin{\hbox{\footnotesize\}} \mathsf{M}\,(\otimes (\oplus/\,y))\,x \\ = & \qquad \{ \mbox{distributivity for collections again} \} \\ & (\otimes (\oplus/\,y))\,(\oplus/\,x) \\ = & \qquad \{ \mbox{sectioning} \} \\ & (\oplus/\,x) \otimes (\oplus/\,y) \\ = & \qquad \{ \mbox{eta-expansion} \} \\ & (\otimes) \mathbin{\hbox{\footnotesize\}} (\oplus/ \times \oplus/) \mathbin{\hbox{\footnotesize\}} (x,y) \\ \end{array}$

which discharges the proof obligation about distributivity for cartesian product, but again modulo two symmetric wishes about distributivity for collections:

$\displaystyle \begin{array}{lcl} \oplus/ \cdot \mathsf{M}(a\otimes) &=& (a\otimes) \cdot \oplus/ \\ \oplus/ \cdot \mathsf{M}(\otimes b) &=& (\otimes b) \cdot \oplus/ \\ \end{array}$

## Distributivity for collections

Finally, the proof obligations about distributivity for collections are easily discharged, by induction over the size of the (finite!) collection, provided that the binary operator ${\otimes}$ distributes over ${\oplus}$ in the familiar sense. The base case is for a singleton collection, ie in the image of ${\mathit{return}}$ (because we disallowed empty collections); this case follows from the fact that ${\oplus/}$ is an ${\mathsf{M}}$-algebra. The inductive step is for a collection of the form ${u \mathbin{\underline{\smash{\mathit{mplus}}}} v}$ with ${u,v}$ both strictly smaller than the whole (so, if the monad is idempotent, disjoint, or at least not nested); this requires the distribution of the algebra over choice ${\oplus / (u \mathbin{\underline{\smash{\mathit{mplus}}}} v) = (\oplus/u) \oplus (\oplus/v)}$, together with the familiar distribution of ${\otimes}$ over ${\oplus}$.

## Summary

So, the datatype-generic distributivity for ${\mathsf{F}}$-structures of collections that we used for the Maximum Segment Sum problem reduced to distributivity for lists of collections, which reduced to the cartesian product of collections, which reduced to that for pairs. That’s a much deeper hierarchy than I was expecting; can it be streamlined?

Posted in Uncategorized | 2 Comments