Streaming Arithmetic Coding

In the previous post we saw the basic definitions of arithmetic encoding and decoding, and a proof that decoding does indeed successfully retrieve the input. In this post we go on to show how both encoding and decoding can be turned into streaming processes.

Producing bits

Recall that

\displaystyle  \begin{array}{@{}l} \mathit{encode}_0 :: \mathit{Model} \rightarrow [\mathit{Symbol}] \rightarrow \mathit{Rational} \\ \mathit{encode}_0\;m = \mathit{pick} \cdot \mathit{foldr}\;\mathit{narrow}\;\mathit{unit} \cdot \mathit{encodeSyms}\;m \vrule width0pt depth2ex \\ \mathit{decode}_0 :: \mathit{Model} \rightarrow \mathit{Rational} \rightarrow [\mathit{Symbol}] \\ \mathit{decode}_0\;m\;x = \mathit{unfoldr}\;\mathit{step}\;(m,x) \end{array}

Encoding and decoding work together. But they work only in batch mode: encoding computes a fraction, and yields nothing until the last step, and so decoding cannot start until encoding has finished. We really want encoding to yield as the encoded text a list of bits representing the fraction, rather than the fraction itself, so that we can stream the encoded text and the decoding process. To this end, we replace {\mathit{pick} :: \mathit{Interval} \rightarrow \mathit{Rational}} by {\mathit{pick}_2 = \mathit{fromBits} \cdot \mathit{toBits}}, where

\displaystyle  \begin{array}{@{}l} \mathbf{type}\;\mathit{Bit} = \mathit{Integer} - \mbox{\quad 0 or 1 only} \vrule width0pt depth2ex \\ \mathit{toBits} :: \mathit{Interval} \rightarrow [\mathit{Bit}] \\ \mathit{fromBits} :: [\mathit{Bit}] \rightarrow \mathit{Rational} \end{array}

The obvious definitions have {\mathit{toBits}\;i} yield the shortest binary expansion of any fraction within {i}, and {\mathit{fromBits}} evaluate this binary expansion. However, we don’t do quite this—it turns out to prevent the streaming condition from holding—and instead arrange for {\mathit{toBits}} to yield the bit sequence that when extended with a 1 yields the shortest expansion of any fraction within {i} (and indeed, the shortest binary expansion necessarily ends with a 1), and {\mathit{fromBits}} compute the value with this 1 appended.

\displaystyle  \begin{array}{@{}l} \mathit{fromBits} = \mathit{foldr}\;\mathit{pack}\;(\frac 1 2) \vrule width0pt depth2ex \\ \mathit{pack} :: \mathit{Bit} \rightarrow \mathit{Rational} \rightarrow \mathit{Rational} \\ \mathit{pack}\;b\;x = (b + x) / 2 \vrule width0pt depth2ex \\ \mathit{toBits} = \mathit{unfoldr}\;\mathit{nextBit} \vrule width0pt depth2ex \\ \mathit{nextBit} :: \mathit{Interval} \rightarrow \mathsf{Maybe}\;(\mathit{Bit}, \mathit{Interval}) \\ \mathit{nextBit}\;(l,r) \\ \begin{array}[t]{@{\quad}clcl} | & r \le \frac 1 2 &=& \mathit{Just}\;(0, (0, \frac 1 2) \mathbin{\triangleleft} (l,r)) \\ | & \frac 1 2 \le l &=& \mathit{Just}\;(1, (\frac 1 2,1) \mathbin{\triangleleft} (l,r)) \\ | & \mathbf{otherwise} &=& \mathit{Nothing} \end{array} \end{array}

Thus, if {r \le \frac 1 2} then the binary expansion of any fraction within {[l,r)} starts with 0; and similarly, if {\frac 1 2 \le l}, the binary expansion starts with 1. Otherwise, the interval {[l,r)} straddles {\frac 1 2}; the shortest binary expansion within is it the expansion of {\frac 1 2}, so we yield the empty bit sequence.

Note that {\mathit{pick}_2 = \mathit{fromBits} \cdot \mathit{toBits}} is a hylomorphism, so we have

\displaystyle  \begin{array}{@{}l} \mathit{pick}_2\;(l,r) \\ \begin{array}[t]{@{\quad}clcl} | & r \le \frac 1 2 &=& \mathit{pick}_2\;((0,\frac 1 2) \mathbin{\triangleleft} (l,r)) / 2 \\ | & \frac 1 2 \le l &=& (1 + \mathit{pick}_2\;((\frac 1 2,1) \mathbin{\triangleleft} (l,r))) / 2 \\ | & \mathbf{otherwise} &=& \frac 1 2 \end{array} \end{array}

Moreover, it is clear that {\mathit{toBits}} yields a finite bit sequence for any non-empty interval (since the interval doubles in width at each step, and the process stops when it includes {\frac 1 2}); so this equation serves to uniquely define {\mathit{pick}_2}. In other words, {\mathit{nextBit}} is a recursive coalgebra. Then it is a straightforward exercise to prove that {i \ni \mathit{pick}_2\;i}; so although {\mathit{pick}} and {\mathit{pick}_2} differ, they are sufficiently similar for our purposes.

Now we redefine encoding to yield a bit sequence rather than a fraction, and decoding correspondingly to consume that bit sequence:

\displaystyle  \begin{array}{@{}l} \mathit{encode}_1 :: \mathit{Model} \rightarrow [\mathit{Symbol}] \rightarrow [\mathit{Bit}] \\ \mathit{encode}_1\;m = \mathit{toBits} \cdot \mathit{foldr}\;\mathit{narrow}\;\mathit{unit} \cdot \mathit{encodeSyms}\;m \vrule width0pt depth2ex \\ \mathit{decode}_1 :: \mathit{Model} \rightarrow [\mathit{Bit}] \rightarrow [\mathit{Symbol}] \\ \mathit{decode}_1\;m = \mathit{decode}_0\;m \cdot \mathit{fromBits} \end{array}

That is, we move the {\mathit{fromBits}} part of {\mathit{pick}_2} from the encoding stage to the decoding stage.

Streaming encoding

Just like {\mathit{encode}_0}, the new version {\mathit{encode}_1} of encoding consumes all of its input before producing any output, so does not work for encoding infinite inputs, nor for streaming execution even on finite inputs. However, it is nearly in the right form to be a metamorphism—a change of representation from lists of symbols to lists of bits. In particular, {\mathit{narrow}} is associative, and {\mathit{unit}} is its unit, so we can replace the {\mathit{foldr}} with a {\mathit{foldl}}:

\displaystyle  \mathit{encode}_1\;m = \mathit{unfoldr}\;\mathit{nextBit} \cdot \mathit{foldl}\;\mathit{narrow}\;\mathit{unit} \cdot \mathit{encodeSyms}\;m

Now that {\mathit{encode}_1} is in the right form, we must check the streaming condition for {\mathit{narrow}} and {\mathit{nextBit}}. We consider one of the two cases in which {\mathit{nextBit}} is productive, and leave the other as an exercise. When {r \le \frac 1 2}, and assuming {\mathit{unit} \supseteq (p,q)}, we have:

\displaystyle  \begin{array}{@{}cl} & \mathit{nextBit}\;((l,r) \mathbin{\triangleright} (p,q)) \\ = & \qquad \{ \mathit{narrow} \} \\ & \mathit{nextBit}\;(\mathit{weight}\;(l,r)\;p, \mathit{weight}\;(l,r)\;q) \\ = & \qquad \{ (l,r) \ni \mathit{weight}\;(l,r)\;q \mbox{, so in particular } \mathit{weight}\;(l,r)\;q < r \le \frac 1 2 \} \\ & \mathit{Just}\;(0, (0, \frac 1 2) \mathbin{\triangleleft} ((l,r) \mathbin{\triangleright} (p,q))) \\ = & \qquad \{ \mathit{widen} \mbox{ associates with } \mathit{narrow} \mbox{ (see below)} \} \\ & \mathit{Just}\;(0, ((0, \frac 1 2) \mathbin{\triangleleft} (l,r)) \mathbin{\triangleright} (p,q)) \end{array}

as required. The last step is a kind of associativity property:

\displaystyle  i \mathbin{\triangleleft} (j \mathbin{\triangleright} k) = (i \mathbin{\triangleleft} j) \mathbin{\triangleright} k

whose proof is left as another exercise. Therefore the streaming condition holds, and we may fuse the {\mathit{unfoldr}} with the {\mathit{foldl}}, defining

\displaystyle  \begin{array}{@{}l} \mathit{encode}_2 :: \mathit{Model} \rightarrow [\mathit{Symbol}] \rightarrow [\mathit{Bit}] \\ \mathit{encode}_2\;m = \mathit{stream}\;\mathit{nextBit}\;\mathit{narrow}\;\mathit{unit} \cdot \mathit{encodeSyms}\;m \end{array}

which streams the encoding process: the initial bits are output as soon as they are fully determined, even before all the input has been read. Note that {\mathit{encode}_1} and {\mathit{encode}_2} differ, in particular on infinite inputs (the former diverges, whereas the latter does not); but they coincide on finite symbol sequences.

Streaming decoding

Similarly, we want to be able to stream decoding, so that we don’t have to wait for the entire encoded text to arrive before starting decoding. Recall that we have so far

\displaystyle  \mathit{decode}_1\;m = \mathit{decode}_0\;m \cdot \mathit{fromBits}

where {\mathit{decode}_0} is an {\mathit{unfoldr}} and {\mathit{fromBits}} a {\mathit{foldr}}. The first obstacle to streaming is that {\mathit{foldr}}, which we need to be a {\mathit{foldl}} instead. We have

\displaystyle  \textstyle \mathit{fromBits} = \mathit{foldr}\;\mathit{pack}\;(\frac 1 2)

Of course, {\mathit{pack}} is not associative—it doesn’t even have the right type for that. But we can view each bit in the input as a function on the unit interval: bit~0 is represented by the function {\mathit{weight}\;(0,\frac 1 2)} that focusses into the lower half of the unit interval, and bit~1 by the function {\mathit{weight}\;(\frac 1 2, 1)} that focusses into the upper half. The fold itself composes a sequence of such functions; and since function composition is associative, this can be written equally well as a {\mathit{foldr}} or a {\mathit{foldl}}. Having assembled the individual focussers into one composite function, we finally apply it to {\frac 1 2}. (This is in fact an instance of a general trick for turning a {\mathit{foldr}} into a {\mathit{foldl}}, or vice versa.) Thus, we have:

\displaystyle  \textstyle \mathit{fromBits}\;bs = \mathit{foldl}\;\mathit{focusf}\;\mathit{id}\;bs\;(\frac 1 2) \quad\mathbf{where}\; \mathit{focusf}\;h\;b = h \cdot \mathit{weight}\;(\mathit{half}\;b)

where {\mathit{half}} yields either the lower or the upper half of the unit interval:

\displaystyle  \begin{array}{@{}lcl} \multicolumn{3}{@{}l}{\mathit{half} :: \mathit{Bit} \rightarrow \mathit{Interval}} \\ \mathit{half}\;0 &=& (0, \frac 1 2) \\ \mathit{half}\;1 &=& (\frac 1 2, 1) \end{array}

In fact, not only may the individual bits be seen as focussing functions {\mathit{weight}\;(0, \frac 1 2)} and {\mathit{weight}\;(\frac 1 2, 1)} on the unit interval, so too may compositions of such functions:

\displaystyle  \begin{array}{@{}lcl} \mathit{id} &=& \mathit{weight}\;\mathit{unit} \\ \mathit{weight}\;i \cdot \mathit{weight}\;j &=& \mathit{weight}\;(i \mathbin{\triangleright} j) \end{array}

So any such composition is of the form {\mathit{weight}\;i} for some interval {i}, and we may represent it concretely by {i} itself, and retrieve the function via {\mathit{weight}}:

\displaystyle  \textstyle \mathit{fromBits}\;bs = \mathit{weight}\;(\mathit{foldl}\;\mathit{focus}\;\mathit{unit}\;bs)\;(\frac 1 2) \quad\mathbf{where}\; \mathit{focus}\;i\;b = i \mathbin{\triangleright} \mathit{half}\;b

So we now have

\displaystyle  \textstyle \mathit{decode}_1\;m = \mathit{unfoldr}\;\mathit{step} \cdot \mathit{prepare}\;m \cdot \mathit{foldl}\;\mathit{focus}\;\mathit{unit} \quad\mathbf{where}\; \mathit{prepare}\;m\;i = (m, \mathit{weight}\;i\;(\frac 1 2))

This is almost in the form of a metamorphism, except for the occurrence of the adapter {\mathit{prepare}\;m} in between the unfold and the fold. It is not straightforward to fuse that adapter with either the fold or the unfold; fortunately, however, we can split it into the composition

\displaystyle  \mathit{prepare}\;m = \mathit{prepare}_2 \cdot \mathit{prepare}_1\;m

of two parts, where

\displaystyle  \begin{array}{@{}lcl} \multicolumn{3}{@{}l}{\mathit{prepare}_1 :: \mathit{Model} \rightarrow \mathit{Interval} \rightarrow (\mathit{Model}, \mathit{Interval})} \\ \mathit{prepare}_1\;m\;i &=& (m,i) \vrule width0pt depth2ex \\ \multicolumn{3}{@{}l}{\mathit{prepare}_2 :: (\mathit{Model}, \mathit{Interval}) \rightarrow (\mathit{Model}, \mathit{Rational})} \\ \mathit{prepare}_2\;(m,i) &=& (m, \mathit{weight}\;i\;(\frac 1 2)) \end{array}

in such a way that the first part {\mathit{prepare}_1} fuses with the fold and the second part {\mathit{prepare}_2} fuses with the unfold. For fusing the first half of the adapter with the fold, we just need to carry around the additional value {m} with the interval being focussed:

\displaystyle  \mathit{prepare}_1\;m \cdot \mathit{foldl}\;\mathit{focus}\;i = \mathit{foldl}\;\mathit{mfocus}\;(m,i)


\displaystyle  \begin{array}{@{}l} \mathit{mfocus} :: (\mathit{Model}, \mathit{Interval}) \rightarrow \mathit{Bit} \rightarrow (\mathit{Model}, \mathit{Interval}) \\ \mathit{mfocus}\;(m,i)\;b = (m, \mathit{focus}\;i\;b) \end{array}

For fusing the second half of the adapter with the unfold, let us check the fusion condition. We have (exercise!):

\displaystyle  \begin{array}{@{}l} \mathit{step}\;(\mathit{prepare}_2\;(m, i)) = \mathit{fmap}\;\mathit{prepare}_2\;(\mathit{Just}\;(s, (\mathit{newModel}\;m\;s, \mathit{encodeSym}\;m\;s \mathbin{\triangleleft} i))) \\ \qquad\mathbf{where}\;s = \mathit{decodeSym}\;m\;(\mathit{weight}\;i\;(\frac 1 2)) \end{array}

where the {\mathit{fmap}} is the functorial action for the base functor of the {\mathsf{List}} datatype, applying just to the second component of the optional pair. We therefore define

\displaystyle  \begin{array}{@{}l} \mathit{stepi} :: (\mathit{Model}, \mathit{Interval}) \rightarrow \mathsf{Maybe}\;(\mathit{Symbol}, (\mathit{Model}, \mathit{Interval})) \\ \mathit{stepi}\;(m,i) = \mathit{Just}\;(s, (\mathit{newModel}\;m\;s, \mathit{encodeSym}\;m\;s \mathbin{\triangleleft} i)) \\ \qquad\mathbf{where}\;s = \mathit{decodeSym}\;m\;(\mathit{weight}\;i\;(\frac 1 2)) \end{array}

and have

\displaystyle  \mathit{step}\;(\mathit{prepare}_2\;(m, i)) = \mathit{fmap}\;\mathit{prepare}_2\;(\mathit{stepi}\;(m,i))

and therefore

\displaystyle  \mathit{unfoldr}\;\mathit{step} \cdot \mathit{prepare}_2 = \mathit{unfoldr}\;\mathit{stepi}

Note that the right-hand side will eventually lead to intervals that exceed the unit interval. When {j \supseteq i}, it follows that {\mathit{unit} \supseteq j \mathbin{\triangleleft} i}; but the unfolding process keeps widening the interval without bound, so it will necessarily eventually exceed the unit bounds. We return to this point shortly.

We have therefore concluded that

\displaystyle  \begin{array}{@{}lcl} \mathit{decode}_1\;m &=& \mathit{unfoldr}\;\mathit{step} \cdot \mathit{prepare}\;m \cdot \mathit{foldl}\;\mathit{focus}\;\mathit{unit} \\ &=& \mathit{unfoldr}\;\mathit{step} \cdot \mathit{prepare}_2 \cdot \mathit{prepare}_1\;m \cdot \mathit{foldl}\;\mathit{focus}\;\mathit{unit} \\ &=& \mathit{unfoldr}\;\mathit{stepi} \cdot \mathit{foldl}\;\mathit{mfocus}\;(m,\mathit{unit}) \end{array}

Now we need to check the streaming condition for {\mathit{mfocus}} and {\mathit{stepi}}. Unfortunately, this is never going to hold: {\mathit{stepi}} is always productive, so {\mathit{stream}\;\mathit{stepi}\;\mathit{mfocus}} will only take production steps and never consume any input. The problem is that {\mathit{unfoldr}\;\mathit{stepi}} is too aggressive, and we need to use the more cautious flushing version of streaming instead. Informally, the streaming process should be productive from a given state {(m,i)} only when the whole of interval {i} maps to the same symbol in model {m}, so that however {i} is focussed by subsequent inputs, that symbol cannot be invalidated.

More formally, note that

\displaystyle  \mathit{unfoldr}\;\mathit{stepi} = \mathit{apo}\;\mathit{safestepi}\;(\mathit{unfoldr}\;\mathit{stepi})


\displaystyle  \begin{array}{@{}l} \mathit{safestepi} :: (\mathit{Model}, \mathit{Interval}) \rightarrow \mathsf{Maybe}\;(\mathit{Symbol}, (\mathit{Model}, \mathit{Interval})) \\ \mathit{safestepi}\;(m,i) \\ \begin{array}[t]{@{\quad}clcl} | & \mathit{safe}\;(m,i) &=& \mathit{stepi}\;(m,i) \\ | & \mathbf{otherwise} &=& \mathit{Nothing} \end{array} \end{array}


\displaystyle  \begin{array}{@{}l} \mathit{safe} :: (\mathit{Model},\mathit{Interval}) \rightarrow \mathit{Bool} \\ \mathit{safe}\;(m, i) = \mathit{encodeSym}\;m\;s \supseteq i \quad\mathbf{where}\; s = \mathit{decodeSym}\;m\;(\mathit{midpoint}\;i) \end{array}

That is, the interval {i} is “safe” for model {m} if it is fully included in the encoding of some symbol {s}; then all elements of {i} decode to {s}. Then, and only then, we may commit to outputting {s}, because no further input bits could lead to a different first output symbol.

Note now that the interval remains bounded by unit interval during the streaming phase, because of the safety check in {\mathit{safestepi}}, although it will still exceed the unit interval during the flushing phase. However, at this point we can undo the fusion we performed earlier, “fissioning{\mathit{unfoldr}\;\mathit{stepi}} into {\mathit{unfoldr}\;\mathit{step} \cdot \mathit{prepare}_2} again: this manipulates rationals rather than intervals, so there is no problem with intervals getting too wide. We therefore have:

\displaystyle  \mathit{decode}_1\;m = \mathit{apo}\;\mathit{safestepi}\;(\mathit{unfoldr}\;\mathit{step} \cdot \mathit{prepare}_2) \cdot \mathit{foldl}\;\mathit{mfocus}\;(m,\mathit{unit})

Now let us check the streaming condition for {\mathit{mfocus}} and the more cautious {\mathit{safestepi}}. Suppose that {(m,i)} is a productive state, so that {\mathit{safe}\;(m,i)} holds, that is, all of interval {i} is mapped to the same symbol in {m}, and let

\displaystyle  \begin{array}{@{}lcl} s &=& \mathit{decodeSym}\;m\;(\mathit{midpoint}\;i) \\ m' &=& \mathit{newModel}\;m\;s \\ i' &=& \mathit{encodeSym}\;m\;s \mathbin{\triangleleft} i \end{array}

so that {\mathit{safestepi}\;(m,i) = \mathit{Just}\;(s, (m',i'))}. Consuming the next input {b} leads to state {(m, \mathit{focus}\;i\;b)}. This too is a productive state, because {i \supseteq \mathit{focus}\;i\;b} for any {b}, and so the whole of the focussed interval is also mapped to the same symbol {s} in the model. In particular, the midpoint of {\mathit{focus}\;i\;b} is within interval {i}, and so the first symbol produced from the state after consumption coincides with the symbol {s} produced from the state before consumption. That is,

\displaystyle  \mathit{safestepi}\;(\mathit{mfocus}\;(m,i)\;b) = \mathit{Just}\;(s, \mathit{mfocus}\;(m', i')\;b)

as required. We can therefore rewrite decoding as a flushing stream computation:

\displaystyle  \begin{array}{@{}l} \mathit{decode}_2 :: \mathit{Model} \rightarrow [\mathit{Bit}] \rightarrow [\mathit{Symbol}] \\ \mathit{decode}_2\;m = \mathit{fstream}\;\mathit{safestepi}\;(\mathit{unfoldr}\;\mathit{step} \cdot \mathit{prepare}_2)\;\mathit{mfocus}\;(m,\mathit{unit}) \end{array}

That is, initial symbols are output as soon as they are completely determined, even before all the input bits have been read. This agrees with {\mathit{decode}_1} on finite bit sequences.

Fixed-precision arithmetic

We will leave arithmetic coding at this point. There is actually still quite a bit more arithmetic required—in particular, for competitive performance it is important to use only fixed-precision arithmetic, restricting attention to rationals within the unit interval with denominator {2^k} for some fixed~{k}. In order to be able to multiply two numerators using 32-bit integer arithmetic without the risk of overflow, we can have at most {k=15}. Interval narrowing now needs to be approximate, rounding down both endpoints to integer multiples of {2^{-k}}. Care needs to be taken so that this rounding never makes the two endpoints of an interval coincide. Still, encoding can be written as an instance of {\mathit{stream}}. Decoding appears to be more difficult: the approximate arithmetic means that we no longer have interval widening as an exact inverse of narrowing, so the approach above no longer works. Instead, our 2002 lecture notes introduce a “destreaming” operator that simulates and inverts streaming: the decoder works in sympathy with the encoder, performing essentially the same interval arithmetic but doing the opposite conversions. Perhaps I will return to complete that story some time…


About jeremygibbons

Jeremy is Professor of Computing in Oxford University Department of Computer Science, and a fan of functional programming and patterns of computation.
This entry was posted in Uncategorized. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s