I read quite a nice problem on Nick Wu’s blog, which will serve as a fine warm-up exercise. It’s about the fact that streams (infinite lists) form a monad, in a different way from lists. Nick shows the “right” and two “wrong” definitions of the join or bind operation, distinguishing them on the basis of the monad laws. But I think Nick’s proofs are more complicated than they need to be, because he hasn’t fully exploited the recursion patterns that underlie his definitions.

This post will involve some language that we have not yet covered. Fear not! I hope it will be clear from context. But in case it isn’t, you might want to take a look at some of the background material (especially the paper Calculating Functional Programs).

## Streams

Like Nick, for simplicity we will take the datatype of streams to be a synonym for lists; in all that follows, assume that lists are properly infinite (not finite, or partial). $\displaystyle \mathbf{type}\;\mathit{Stream}\,a = [a]$

Streams are naturally a codatatype rather than a datatype: in the category of sets and total functions, they would be represented as a final coalgebra rather than an initial algebra. In Haskell, which is roughly based on the category of CPOs and continuous functions, initial algebras and final coalgebras coincide, so we need not (indeed, we cannot) make the distinction formally. But we can make it informally, by stipulating that the basic pattern of computation for streams is the ${\mathit{unfold}}$: $\displaystyle \begin{array}{l} \mathit{unfold} :: (b \rightarrow (a,b)) \rightarrow b \rightarrow \mathit{Stream}\,a \\ \mathit{unfold}\,f\,b = a : \mathit{unfold}\,f\,b' \qquad\mathbf{where}\qquad (a,b') = f\,b \end{array}$ ${\mathit{unfold}\,f}$ generates a stream from a seed, using the body ${f}$ that transforms a seed ${b}$ into an element ${a}$ and a new seed ${b'}$. For example, the map function for streams uses the input stream as the seed, repeatedly splitting it into its head and tail: $\displaystyle \begin{array}{l} \mathit{mapS} :: (a \rightarrow b) \rightarrow \mathit{Stream}\,a \rightarrow \mathit{Stream}\,b \\ \mathit{mapS}\,f = \mathit{unfold}\,(\mathit{fork}\,(f \cdot \mathit{head}, \mathit{tail})) \end{array}$

where ${\mathit{fork}}$ applies two functions to the same argument: $\displaystyle \mathit{fork}\,(f,g)\,a = (f\,a, g\,a)$

The crucial property of ${\mathit{unfold}}$ is its universal property, which provides necessary and sufficient conditions for a computation to be expressible as an instance of ${\mathit{unfold}}$: $\displaystyle h = \mathit{unfold}\,f \Leftrightarrow \mathit{out} \cdot h = \mathit{prod}\,(\mathit{id},h) \cdot f$

where ${\mathit{out} = \mathit{fork}\,(\mathit{head},tail)}$ deconstructs a stream into its head and tail, and $\displaystyle \mathit{prod}\,(f,g)\,(a,b) = (f\,a, g\,b)$

From the universal property, one can easily (exercise!) prove three simple consequences (we’ll call them the “identity” and two “evaluation” rules): $\displaystyle \begin{array}{l} \mathit{unfold}\,\mathit{out} = \mathit{id} \\ \mathit{head} \cdot \mathit{unfold}\,(\mathit{fork}\,(f,g)) = f \\ \mathit{tail} \cdot \mathit{unfold}\,(\mathit{fork}\,(f,g)) = \mathit{unfold}\,(\mathit{fork}\,(f,g)) \cdot g \end{array}$

and the very important fusion law: $\displaystyle \mathit{unfold}\,f \cdot h = \mathit{unfold}\,g \Leftarrow f \cdot h = \mathit{prod}\,(\mathit{id},h) \cdot g$

allowing a preceding function ${h}$ to be absorbed into the unfold.

Making streams a monad amounts to defining functions $\displaystyle \begin{array}{lcl} \mathit{return} &::& a \rightarrow \mathit{Stream}\,a \\ \mathit{join} &::& \mathit{Stream}\,(\mathit{Stream}\,a) \rightarrow \mathit{Stream}\,a \end{array}$ $\displaystyle \begin{array}{lcl} \mathit{join} \cdot \mathit{return} &=& \mathit{id} \\ \mathit{join} \cdot \mathit{mapS}\,\mathit{return} &=& \mathit{id} \\ \mathit{join} \cdot \mathit{mapS}\,\mathit{join} &=& \mathit{join} \cdot \mathit{join} \end{array}$

Looking at the type, the obvious (indeed, I think the only possible) definition one can give for ${\mathit{return}}$ is ${\mathit{return} = \mathit{repeat}}$ where $\displaystyle \mathit{repeat} = \mathit{unfold}\,\mathit{double}$

and ${\mathit{double} = \mathit{fork}\,(\mathit{id},\mathit{id})}$ makes two copies of its argument. However, there are many type-correct definitions one could give for ${ \mathit{join}}$, including ${\mathit{head}}$, ${\mathit{mapS}\,\mathit{head}}$, and ${\mathit{diag}}$, where $\displaystyle \mathit{diag} = \mathit{unfold}\,\mathit{hhtt}$

and where (for brevity in what follows) we define $\displaystyle \begin{array}{lcl} \mathit{hhtt} &=& \mathit{fork}\,(\mathit{hh},\mathit{tt}) \\ \mathit{hh} &=& \mathit{head}\cdot\mathit{head} \\ \mathit{tt} &=& \mathit{tail}\cdot\mathit{mapS}\,\mathit{tail} \end{array}$

Obviously, ${\mathit{head}}$ yields the first “row” of a stream of streams (if one considers it in row-major order), and ${\mathit{mapS}\,\mathit{head}}$ yields the first column; as the name suggests, ${\mathit{diag}}$ yields the leading diagonal. Nick’s post demonstrates that the first two, although type-correct, do not satisfy the monad laws. He also provides a proof that the third does, which we turn to next.

The proofs that ${\mathit{repeat}}$ and ${\mathit{diag}}$ satisfy the three monad laws are very straightforward, using the universal property of ${\mathit{unfold}}$ and its consequences.

For the first monad law, fusion gives us the condition to check: $\displaystyle \begin{array}{ll} & \mathit{diag}\cdot\mathit{repeat} = \mathit{id} \\ \Leftarrow & \\ & \mathit{hhtt}\cdot\mathit{repeat} = \mathit{prod}\,(\mathit{id},\mathit{repeat})\cdot\mathit{fork}\,(\mathit{head},\mathit{tail}) \end{array}$

Working on the right-hand side, we have: $\displaystyle \begin{array}{ll} & \mathit{hhtt}\cdot\mathit{repeat} \\ = & \qquad \{ \mbox{definition} \} \\ & \mathit{fork}\,(\mathit{hh},\mathit{tt})\cdot\mathit{repeat} \\ = & \qquad \{ \mbox{composition distributes backwards over fork} \} \\ & \mathit{fork}\,(\mathit{hh}\cdot\mathit{repeat},\mathit{tt}\cdot\mathit{repeat}) \\ = & \qquad \{ \mbox{definitions} \} \\ & \mathit{fork}\,(\mathit{head}\cdot\mathit{head}\cdot\mathit{repeat},\mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}\cdot\mathit{repeat}) \\ = & \qquad \{ \mbox{evaluation for~} \mathit{repeat} \} \\ & \mathit{fork}\,(\mathit{head},\mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}\cdot\mathit{repeat}) \\ = & \qquad \{ \mbox{naturality; evaluation} \} \\ & \mathit{fork}\,(\mathit{head},\mathit{repeat}\cdot\mathit{tail}) \\ = & \qquad \{ \mbox{pairs} \} \\ & \mathit{prod}\,(\mathit{id},\mathit{repeat}) \cdot \mathit{fork}\,(\mathit{head},\mathit{tail}) \end{array}$

discharging the proof obligation.

Similarly, for the second monad law, fusion gives us the condition: $\displaystyle \begin{array}{ll} & \mathit{diag}\cdot\mathit{mapS}\,\mathit{repeat} = \mathit{id} \\ \Leftarrow & \\ & \mathit{hhtt}\cdot\mathit{mapS}\,\mathit{repeat} = \mathit{prod}\,(\mathit{id},\mathit{mapS}\,\mathit{repeat})\cdot\mathit{fork}\,(\mathit{head},\mathit{tail}) \end{array}$

and working on the right-hand side, in almost exactly the same steps we get: $\displaystyle \begin{array}{ll} & \mathit{hhtt}\cdot\mathit{mapS}\,\mathit{repeat} \\ = & \qquad \{ \mbox{definition} \} \\ & \mathit{fork}\,(\mathit{hh},\mathit{tt})\cdot\mathit{mapS}\,\mathit{repeat} \\ = & \qquad \{ \mbox{composition distributes backwards over fork} \} \\ & \mathit{fork}\,(\mathit{hh}\cdot\mathit{mapS}\,\mathit{repeat},\mathit{tt}\cdot\mathit{mapS}\,\mathit{repeat}) \\ = & \qquad \{ \mbox{definitions} \} \\ & \mathit{fork}\,(\mathit{head}\cdot\mathit{head}\cdot\mathit{mapS}\,\mathit{repeat},\mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}\cdot\mathit{mapS}\,\mathit{repeat}) \\ = & \qquad \{ \mbox{naturality; evaluation} \} \\ & \mathit{fork}\,(\mathit{head},\mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}\cdot\mathit{mapS}\,\mathit{repeat}) \\ = & \qquad \{ \mbox{naturality; functors; evaluation} \} \\ & \mathit{fork}\,(\mathit{head},\mathit{mapS}\,\mathit{repeat}\cdot\mathit{tail}) \\ = & \qquad \{ \mbox{pairs} \} \\ & \mathit{prod}\,(\mathit{id},\mathit{mapS}\,\mathit{repeat}) \cdot \mathit{fork}\,(\mathit{head},\mathit{tail}) \end{array}$

discharging the obligation.

What about the third monad law? To apply the universal property (or fusion), we need one side to be expressed as an unfold; but neither side of the equation ${\mathit{diag}\cdot\mathit{diag} = \mathit{diag}\cdot\mathit{mapS}\,\mathit{diag}}$ is in that form. No matter; let us hypothesize that one side—say, the left—can be expressed in the form ${\mathit{unfold}\,h}$ for some ${h}$, then calculate a suitable definition for ${h}$ (if one exists). Assuming we succeed, then we can use fusion to check that the other side equals ${\mathit{unfold}\,h}$. (This strategy doesn’t work if we can find no such ${h}$!)

Again, fusion gives us $\displaystyle \mathit{diag}\cdot\mathit{diag} = \mathit{unfold}\,h \Leftarrow \mathit{hhtt}\cdot\mathit{diag} = \mathit{prod}\,(\mathit{id},\mathit{diag})\cdot h$

so we calculate: $\displaystyle \begin{array}{ll} & \mathit{hhtt}\cdot\mathit{diag} \\ = & \qquad \{ \mbox{definition; distribution} \} \\ & \mathit{fork}\,(\mathit{head}\cdot\mathit{head}\cdot\mathit{diag},\mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}\cdot\mathit{diag}) \\ = & \qquad \{ \mbox{evaluation for~} \mathit{diag} \} \\ & \mathit{fork}\,(\mathit{head}\cdot\mathit{head}\cdot\mathit{head},\mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}\cdot\mathit{diag}) \\ = & \qquad \{ \mbox{naturality; evaluation} \} \\ & \mathit{fork}\,(\mathit{head}\cdot\mathit{head}\cdot\mathit{head},\mathit{diag}\cdot\mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}\cdot\mathit{mapS}\,(\mathit{mapS}\,\mathit{tail})) \\ = & \qquad \{ \mbox{pairs} \} \\ & \mathit{prod}\,(\mathit{id},\mathit{diag})\cdot\mathit{fork}\,(\mathit{head}\cdot\mathit{head}\cdot\mathit{head},\mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}\cdot\mathit{mapS}\,(\mathit{mapS}\,\mathit{tail})) \end{array}$

Therefore, letting $\displaystyle \mathit{hhhttt} = \mathit{fork}\,(\mathit{head}\cdot\mathit{head}\cdot\mathit{head},\mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}\cdot\mathit{mapS}\,(\mathit{mapS}\,\mathit{tail}))$

we have concluded that $\displaystyle \mathit{diag}\cdot\mathit{diag} = \mathit{unfold}\,\mathit{hhhttt}$

Now all we have to do is to check that the right-hand side of the third monad law also equals this; fusion gives us the condition $\displaystyle \begin{array}{ll} & \mathit{diag}\cdot\mathit{mapS}\,\mathit{diag} = \mathit{unfold}\,\mathit{hhhttt} \\ \Leftarrow & \\ & \mathit{hhtt}\cdot\mathit{mapS}\,\mathit{diag} = \mathit{prod}\,(\mathit{id},\mathit{mapS}\,\mathit{diag})\cdot \mathit{hhhttt} \end{array}$

and we calculate on the right-hand side: $\displaystyle \begin{array}{ll} & \mathit{hhtt}\cdot\mathit{mapS}\,\mathit{diag} \\ = & \qquad \{ \mbox{definition; distribution} \} \\ & \mathit{fork}\,(\mathit{head}\cdot\mathit{head}\cdot\mathit{mapS}\,\mathit{diag},\mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}\cdot\mathit{mapS}\,\mathit{diag}) \\ = & \qquad \{ \mbox{naturality; evaluation} \} \\ & \mathit{fork}\,(\mathit{head}\cdot\mathit{head}\cdot\mathit{head},\mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}\cdot\mathit{mapS}\,\mathit{diag}) \\ = & \qquad \{ \mbox{functors; naturality; evaluation} \} \\ & \mathit{fork}\,(\mathit{head}\cdot\mathit{head}\cdot\mathit{head},\mathit{mapS}\,\mathit{diag}\cdot\mathit{tail}\cdot\mathit{mapS}\,\mathit{tail}\cdot\mathit{mapS}\,(\mathit{mapS}\,\mathit{tail})) \\ = & \qquad \{ \mbox{pairs; definition} \} \\ & \mathit{prod}\,(\mathit{id},\mathit{mapS}\,\mathit{diag})\cdot\mathit{hhhttt} \end{array}$

completing the proof.

As you’ll see, the calculations are all quite short and simple, whereas in Nick’s formulation, they were rather hard work; I think that was (a) because he wasn’t exploiting the universal property, and (b) because he was working in terms of the “bind” rather than the “join” of the monad, which forced him into a more pointwise rather than pointfree style. Points are helpful when writing programs, but less so when reasoning about them.

## Deducing diag

Here’s another way of looking at the problem. Nick’s blog presented three plausible (that is, type-correct) definitions for the ${\mathit{join}}$ operation. Two of these didn’t satisfy the necessary laws, so were evidently wrong. The third, ${\mathit{diag}}$, does satisfy the laws, but is it the only possible definition that does? I believe that it is the only solution in the form of an unfold; but I only have a hand-waving argument as to why.

Let us suppose that indeed $\displaystyle join = \mathit{unfold}\,k$

for some ${k}$. Without loss of generality, let us suppose also that $\displaystyle k = \mathit{fork}\,(k_1,k_2)$

with $\displaystyle \begin{array}{lcl} k_1 &::& \mathit{Stream}\,(\mathit{Stream}\,a) \rightarrow a \\ k_2 &::& \mathit{Stream}\,(\mathit{Stream}\,a) \rightarrow \mathit{Stream}\,(\mathit{Stream}\,a) \end{array}$

I claimed above that ${\mathit{repeat}}$ is the only type-correct definition of the ${\mathit{return}}$ operation. (Ignoring bottoms, that is. Which is to say, in Haskell, all type-correct definitions are approximations in the definedness ordering to ${\mathit{repeat}}$.)

Consideration of just the first two monad laws gives us some constraints on ${k}$, since we know that ${\mathit{return} = \mathit{repeat}}$: $\displaystyle \begin{array}{lcl} k\cdot\mathit{repeat} &=& \mathit{prod}\,(\mathit{id},\mathit{repeat})\cdot\mathit{fork}\,(\mathit{head},\mathit{tail}) \\ k\cdot\mathit{mapS}\,\mathit{repeat} &=& \mathit{prod}\,(\mathit{id},\mathit{mapS}\,\mathit{repeat})\cdot\mathit{fork}\,(\mathit{head},\mathit{tail}) \end{array}$

Or in terms of ${k}$‘s two components, $\displaystyle \begin{array}{lcll} k_1\cdot\mathit{repeat} &=& \mathit{head} &(1)\\ k_2\cdot\mathit{repeat} &=& \mathit{repeat}\cdot\mathit{tail} &(2)\\ k_1\cdot\mathit{mapS}\,\mathit{repeat} &=& \mathit{head} &(3)\\ k_2\cdot\mathit{mapS}\,\mathit{repeat} &=& \mathit{mapS}\,\mathit{repeat}\cdot\mathit{tail} &(4) \end{array}$

I claim that (1) entails that ${k_1}$ picks some element out of the first “column” of a stream of streams (thinking of the input as an infinite matrix in row-major order again)—for the equation says that when the input consists of infinitely many copies of the same stream, ${k_1}$ picks (one of the many copies of) the head of that stream. Symmetrically, (3) entails that, when given a infinite matrix whose columns are all equal, ${k_1}$ picks some element out of the first “row”. And because ${k_1}$ has to be polymorphic, it cannot behave differently on special matrices like these than it does in general. Putting those statements together, and waving my hands in the air, I conclude that ${k_1}$ picks the only element that is in both the first row and the first column: $\displaystyle k_1 = \mathit{head} \cdot \mathit{head}$

Similarly, Equation (2) says that, given an infinite input matrix all of whose rows are equal, ${k_2}$ drops the first column (and possibly some of the rows are duplicated or dropped, and the order of the rows may change; but the elements of the rows are untouched). Symmetrically, (4) says that, given an input whose columns are all equal, ${k_2}$ drops the first row (and may duplicate, drop, or rearrange the columns, but not change any of them). And again, the behaviour in general must be consistent with these special cases. Putting these observations together, ${k_2}$ must drop the first row and the first column, and cannot change any of the remainder of the matrix. $\displaystyle k_2 = \mathit{tail} \cdot \mathit{mapS}\,\mathit{tail}$

What is the right framework in which to present such arguments more formally? It feels rather like Paul Hoogendijk’s relational approach to generic programming, which has to talk about largest natural transformations of a given type: the relational setting provides the conjunction one needs in order to express the two separate constraints on ${k_1}$.

Jeremy Gibbons is Professor of Computing in Oxford University Department of Computer Science, and a fan of functional programming and patterns of computation.
This entry was posted in Uncategorized. Bookmark the permalink.

### 13 Responses to The stream monad

1. For me, it was really helpful to consider the isomorphism between Stream a and Nat -> a. You can just take the monad instance of functions (aka the reader monad) and transform them into the stream equivalent. Taking the diagonal stream is directly analogous to join on functions, where you plug in the same input (in this case the stream index) twice.

• Right – that’s much easier, because it’s non-recursive.

• Brent says:

Nice insight, I had never thought of it this way! But then it occurred to me — doesn’t this immediately generalize to all representable functors? (I.e. functors f such that f a is isomorphic to R -> a, for some “representation type” R)? Well, sure enough, all representable functors are monads in Haskell!

2. Masayuki Takagi says:

very nice post! I implemented your Stream monad and it worked well on GHC.
But.. it was very inefficient on computational time.. (;_;)
I know that the point of your post is not efficiency, however, i want more efficiently working version.
Do you have any idea?

• Where do you think it’s inefficient?

• Masayuki Takagi says:

(>>=) :: Stream a -> (a -> Stream b) -> Stream b
m >>= g = join ((fmap g) m) where fmap = map

integral :: Num a => Stream a -> Stream a
integral xs = let x = zipWith ($) (0:x) (map (+) xs) in x and compare two expressions: a. take 5000$ (integral (return 1)) >>= (return . (*2))
b. take 5000 $map (*2)$ [1,2..]

( (*2) has no special meaning.)

These two expressions give same results [2,4..], but a is much slower than b. A takes 5.6[sec] to print, while b takes only 0.01[sec] in my environment respectively. A is about x500 slower than b.

• Ah, right – I see now. The problem is that the join (and hence the bind) for the stream monad themselves are inefficient, and I think inherently so. Remember that the join has to extract the diagonal from a stream of streams; it can access the head of that diagonal in just two operations (the head of the head), but the next element takes three steps (the head of the head of the tail), then four steps (head of head of tail of tail), and so on: linearly many steps for each element, and quadratically many for each prefix.

Instead of using a monadic interface to streams, perhaps an applicative one would be better. Instead of join/bind, you’d use a zip-like operator, giving you linear time access for any prefix.

I’m curious: is this part of a larger application or library? I’ve been looking for a good use of the stream monad/idiom…

3. Masayuki Takagi says:

Thanks for your detailed explanation. I understand.

Your suggestion is nice. In fact I first used an applicative interface, but I was stuck. I asked a question hereand got hints that bind would solve my proglem, so I searched monadic interface to streams.

Yes, this is part of a library I try to design. I try to build a DSL that describes physical simulation on top of Haskell. I want to make abstraction of time evolution of physical amounts with hiding each time points. I refer to Paul Hudak’s “The Haskell School of Expression” where some time-varying reactive animation programs are written using streams.

Will you see my codes? (as pure functions, not applicative or monadic) If you want, I will email you them.

4. Joseph says:

Interesting article. Did you ever find a good practical use for this monad? I’m very new to functional programming, and have the found the instance for finite streams (as in lazily evaluated lists) where join is the last element in the stream (opposite to a normal list) to be very useful in eager languages like racket where streams are more explicit than haskell. An example would be a lexical analyzer that uses functions like String -> Stream (Token, String) where the string on the right hand side is the remain string. The monad can be used to compose these functions very neatly.

I’m asking because I think a definition of join that enumerates every element of all streams sounds more useful but I don’t understand the topic well enough to know if that’s stupid. Something like
union :: Stream a -> Stream a -> Stream a
union x y = head x : union y (tail x)

join :: Stream (Stream a) -> Stream a
join = fold union

5. I see – your “union” fairly merges two streams. This definition has the right type, but doesn’t satisfy the monad laws. In fact, it doesn’t satisfy any of the three laws. This means that reasonable-looking refactorings of your monadic programs will change their behaviour.