In the previous post we saw the basic definitions of arithmetic encoding and decoding, and a proof that decoding does indeed successfully retrieve the input. In this post we go on to show how both encoding and decoding can be turned into streaming processes.
Encoding and decoding work correctly together, but only in batch mode: encoding computes a fraction and yields nothing until the last step, so decoding cannot start until encoding has finished. We really want encoding to yield the encoded text as a list of bits representing the fraction, rather than the fraction itself, so that we can stream the encoded text and the decoding process. To this end, we replace by , where
The obvious definitions have yield the shortest binary expansion of any fraction within , and evaluate this binary expansion. However, we don’t quite do this—it turns out to prevent the streaming condition from holding—and instead arrange for to yield the bit sequence that, when extended with a 1, yields the shortest expansion of any fraction within (and indeed, the shortest binary expansion necessarily ends with a 1), and compute the value with this 1 appended.
Thus, if then the binary expansion of any fraction within starts with 0; and similarly, if , the binary expansion starts with 1. Otherwise, the interval straddles ; the shortest binary expansion within is then the expansion of , so we yield the empty bit sequence.
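As a concrete sketch of these definitions (the elided names above I render here as nextBit, toBits, and fromBits, with intervals as pairs of rationals; these are my reconstructions, not necessarily the original identifiers):

```haskell
import Data.List (unfoldr)

type Bit      = Int
type Interval = (Rational, Rational)  -- (lower, upper), with lower < upper

-- Produce one bit if the interval lies entirely within one half of the
-- unit interval, doubling it back onto the unit interval; otherwise stop.
nextBit :: Interval -> Maybe (Bit, Interval)
nextBit (l, r)
  | r <= 1/2  = Just (0, (2*l,     2*r))      -- entirely in the lower half
  | 1/2 <= l  = Just (1, (2*l - 1, 2*r - 1))  -- entirely in the upper half
  | otherwise = Nothing                       -- straddles 1/2: yield nothing

-- The bit sequence that, extended with a 1, is the shortest expansion
-- of a fraction within the interval.
toBits :: Interval -> [Bit]
toBits = unfoldr nextBit

-- Evaluate a bit sequence; the understood trailing 1 supplies the
-- seed value 1/2 for the fold.
fromBits :: [Bit] -> Rational
fromBits = foldr pack (1/2)
  where pack b x = (fromIntegral b + x) / 2
```

For example, toBits (0, 1/2) = [0], and fromBits [0] = 1/4, which indeed lies within (0, 1/2); an interval straddling 1/2 yields the empty sequence, which evaluates to 1/2.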
Note that is a hylomorphism, so we have
Moreover, it is clear that yields a finite bit sequence for any non-empty interval (since the interval doubles in width at each step, and the process stops when it includes ); so this equation serves to uniquely define . In other words, is a recursive coalgebra. Then it is a straightforward exercise to prove that ; so although and differ, they are sufficiently similar for our purposes.
Now we redefine encoding to yield a bit sequence rather than a fraction, and decoding correspondingly to consume that bit sequence:
That is, we move the part of from the encoding stage to the decoding stage.
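In code, the redefinition might look as follows. The two-symbol model here is a deliberately trivial stand-in (the actual development threads a model parameter through both stages), and all names are my assumptions:

```haskell
import Data.List (unfoldr)

type Bit      = Int
type Interval = (Rational, Rational)

unit :: Interval
unit = (0, 1)

-- Focus interval i onto the subinterval of it denoted by j.
narrow :: Interval -> Interval -> Interval
narrow (l, r) (p, q) = (l + (r - l) * p, l + (r - l) * q)

-- A fixed two-symbol model, purely for illustration.
encodeSym :: Char -> Interval
encodeSym 'a' = (0, 2/3)
encodeSym _   = (2/3, 1)

decodeSym :: Rational -> Char
decodeSym x = if x < 2/3 then 'a' else 'b'

nextBit :: Interval -> Maybe (Bit, Interval)
nextBit (l, r)
  | r <= 1/2  = Just (0, (2*l, 2*r))
  | 1/2 <= l  = Just (1, (2*l - 1, 2*r - 1))
  | otherwise = Nothing

fromBits :: [Bit] -> Rational
fromBits = foldr pack (1/2) where pack b x = (fromIntegral b + x) / 2

-- Encoding now yields a bit sequence rather than a fraction...
encode :: [Char] -> [Bit]
encode = unfoldr nextBit . foldr narrow unit . map encodeSym

-- ...and decoding consumes it: the evaluation of the bits has moved
-- to the decoding stage.
decode :: [Bit] -> [Char]
decode = unfoldr nextSym . fromBits
  where nextSym x = let s      = decodeSym x
                        (p, q) = encodeSym s
                    in Just (s, (x - p) / (q - p))
```

With a static model, decoding yields an infinite symbol stream, so the caller takes as many symbols as were encoded.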
Just like , the new version of encoding consumes all of its input before producing any output, so it does not work for encoding infinite inputs, nor for streaming execution even on finite inputs. However, it is nearly in the right form to be a metamorphism—a change of representation from lists of symbols to lists of bits. In particular, is associative, and is its unit, so we can replace the with a :
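The associativity claim can be checked directly: narrowing composes the affine maps that intervals denote, and the unit interval denotes the identity map. A sketch, under assumed names:

```haskell
type Interval = (Rational, Rational)

unit :: Interval
unit = (0, 1)

-- Interval i denotes an affine map on the unit interval, and
-- narrow i j composes the maps denoted by i and j; map composition
-- is associative, with unit denoting the identity map.
narrow :: Interval -> Interval -> Interval
narrow (l, r) (p, q) = (l + (r - l) * p, l + (r - l) * q)

-- Associativity, stated as a checkable property.
assoc :: Interval -> Interval -> Interval -> Bool
assoc i j k = (i `narrow` j) `narrow` k == i `narrow` (j `narrow` k)
```

Consequently foldr narrow unit and foldl narrow unit agree on finite lists, which is what licenses the replacement.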
Now that is in the right form, we must check the streaming condition for and . We consider one of the two cases in which is productive, and leave the other as an exercise. When , and assuming , we have:
as required. The last step is a kind of associativity property:
whose proof is left as another exercise. Therefore the streaming condition holds, and we may fuse the with the , defining
which streams the encoding process: the initial bits are output as soon as they are fully determined, even before all the input has been read. Note that and differ, in particular on infinite inputs (the former diverges, whereas the latter does not); but they coincide on finite symbol sequences.
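Here is a sketch of the generic stream operator from the streaming literature, together with the fused encoder; the batch version is included for comparison, and the model and names are again my assumptions:

```haskell
import Data.List (unfoldr)

type Bit      = Int
type Interval = (Rational, Rational)

unit :: Interval
unit = (0, 1)

narrow :: Interval -> Interval -> Interval
narrow (l, r) (p, q) = (l + (r - l) * p, l + (r - l) * q)

nextBit :: Interval -> Maybe (Bit, Interval)
nextBit (l, r)
  | r <= 1/2  = Just (0, (2*l, 2*r))
  | 1/2 <= l  = Just (1, (2*l - 1, 2*r - 1))
  | otherwise = Nothing

-- Generic streaming of a metamorphism: produce output while the
-- producer is productive, otherwise consume one input.
stream :: (s -> Maybe (o, s)) -> (s -> i -> s) -> s -> [i] -> [o]
stream g f s xs = case g s of
  Just (o, s') -> o : stream g f s' xs
  Nothing      -> case xs of
                    x : xs' -> stream g f (f s x) xs'
                    []      -> []

-- An illustrative static two-symbol model.
encodeSym :: Char -> Interval
encodeSym 'a' = (0, 2/3)
encodeSym _   = (2/3, 1)

-- Streamed encoding: bits are emitted as soon as they are determined.
encodeS :: [Char] -> [Bit]
encodeS = stream nextBit (\i s -> i `narrow` encodeSym s) unit

-- The batch version, for comparison on finite inputs.
encodeB :: [Char] -> [Bit]
encodeB = unfoldr nextBit . foldl narrow unit . map encodeSym
```

On finite inputs the two agree; on infinite inputs only the streamed version is productive.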
Similarly, we want to be able to stream decoding, so that we don’t have to wait for the entire encoded text to arrive before starting decoding. Recall that we have so far
where is an and a . The first obstacle to streaming is that , which we need to be a instead. We have
Of course, is not associative—it doesn’t even have the right type for that. But we can view each bit in the input as a function on the unit interval: bit 0 is represented by the function that focusses into the lower half of the unit interval, and bit 1 by the function that focusses into the upper half. The fold itself composes a sequence of such functions; and since function composition is associative, this can be written equally well as a or a . Having assembled the individual focussers into one composite function, we finally apply it to . (This is in fact an instance of a general trick for turning a into a , or vice versa.) Thus, we have:
where yields either the lower or the upper half of the unit interval:
In fact, not only may the individual bits be seen as focussing functions and on the unit interval, so too may compositions of such functions:
So any such composition is of the form for some interval , and we may represent it concretely by itself, and retrieve the function via :
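Concretely (with assumed names: half for the focussing functions, and midpoint for the final application, the latter reflecting the appended 1):

```haskell
type Bit      = Int
type Interval = (Rational, Rational)

unit :: Interval
unit = (0, 1)

narrow :: Interval -> Interval -> Interval
narrow (l, r) (p, q) = (l + (r - l) * p, l + (r - l) * q)

-- Each bit focusses into one half of the current interval.
half :: Bit -> Interval
half 0 = (0, 1/2)
half 1 = (1/2, 1)

midpoint :: Interval -> Rational
midpoint (l, r) = (l + r) / 2

-- Compositions of focussers are represented concretely by intervals:
-- applying the composite is narrowing by the representing interval,
-- and the value of the bits (with the trailing 1) is its midpoint.
fromBits' :: [Bit] -> Rational
fromBits' = midpoint . foldl narrow unit . map half

-- The original foldr formulation, for comparison.
fromBits :: [Bit] -> Rational
fromBits = foldr pack (1/2) where pack b x = (fromIntegral b + x) / 2
```

The foldl formulation agrees with the foldr one on finite bit sequences.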
So we now have
This is almost in the form of a metamorphism, except for the occurrence of the adapter in between the unfold and the fold. It is not straightforward to fuse that adapter with either the fold or the unfold; fortunately, however, we can split it into the composition
of two parts, where
in such a way that the first part fuses with the fold and the second part fuses with the unfold. For fusing the first half of the adapter with the fold, we just need to carry around the additional value with the interval being focussed:
For fusing the second half of the adapter with the unfold, let us check the fusion condition. We have (exercise!):
where the is the functorial action for the base functor of the datatype, applying the given function just to the second component of the optional pair. We therefore define
Note that the right-hand side will eventually lead to intervals that exceed the unit interval. When , it follows that ; but the unfolding process keeps widening the interval without bound, so it will necessarily eventually exceed the unit bounds. We return to this point shortly.
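For the fused unfold, widening plays the inverse role to narrowing; a sketch under the same kind of assumptions, again with a trivial static model:

```haskell
import Data.List (unfoldr)

type Interval = (Rational, Rational)

-- A fixed two-symbol model, for illustration only.
encodeSym :: Char -> Interval
encodeSym 'a' = (0, 2/3)
encodeSym _   = (2/3, 1)

decodeSym :: Rational -> Char
decodeSym x = if x < 2/3 then 'a' else 'b'

narrow :: Interval -> Interval -> Interval
narrow (l, r) (p, q) = (l + (r - l) * p, l + (r - l) * q)

-- Widening maps an interval back out of interval j: the inverse of
-- narrowing, in the sense that widen j (narrow j i) == i.
widen :: Interval -> Interval -> Interval
widen (l, r) (p, q) = ((p - l) / (r - l), (q - l) / (r - l))

scale :: Interval -> Rational -> Rational
scale (l, r) x = (x - l) / (r - l)

type State = (Rational, Interval)

-- One step of the fused unfold: decode a symbol from the current
-- value, then widen both the value and the interval out of that
-- symbol's range. The widened interval may exceed the unit interval.
produce :: State -> Maybe (Char, State)
produce (x, i) = Just (s, (scale j x, widen j i))
  where s = decodeSym x
        j = encodeSym s
```

Note that widen (0, 2/3) (0, 1) = (0, 3/2), which already exceeds the unit interval, illustrating the problem just mentioned.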
We have therefore concluded that
Now we need to check the streaming condition for and . Unfortunately, this is never going to hold: is always productive, so will only take production steps and never consume any input. The problem is that is too aggressive, and we need to use the more cautious flushing version of streaming instead. Informally, the streaming process should be productive from a given state only when the whole of interval maps to the same symbol in model , so that however is focussed by subsequent inputs, that symbol cannot be invalidated.
More formally, note that
That is, the interval is “safe” for model if it is fully included in the encoding of some symbol ; then all elements of decode to . Then, and only then, we may commit to outputting , because no further input bits could lead to a different first output symbol.
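For an illustrative static model, the safety test (the name safe is my assumption) can be phrased as containment in the range of the symbol that the lower endpoint decodes to:

```haskell
type Interval = (Rational, Rational)

-- A fixed two-symbol model, for illustration only.
encodeSym :: Char -> Interval
encodeSym 'a' = (0, 2/3)
encodeSym _   = (2/3, 1)

decodeSym :: Rational -> Char
decodeSym x = if x < 2/3 then 'a' else 'b'

-- An interval is safe when it lies entirely within the range of the
-- symbol that its lower endpoint decodes to: every element of the
-- interval then decodes to that same symbol, so the symbol can be
-- output irrevocably.
safe :: Interval -> Bool
safe (l, r) = r <= snd (encodeSym (decodeSym l))
```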
Note now that the interval remains bounded by the unit interval during the streaming phase, because of the safety check in , although it will still exceed the unit interval during the flushing phase. However, at this point we can undo the fusion we performed earlier, “fissioning” into again: this manipulates rationals rather than intervals, so there is no problem with intervals getting too wide. We therefore have:
Now let us check the streaming condition for and the more cautious . Suppose that is a productive state, so that holds, that is, all of interval is mapped to the same symbol in , and let
so that . Consuming the next input leads to state . This too is a productive state, because for any , and so the whole of the focussed interval is also mapped to the same symbol in the model. In particular, the midpoint of is within interval , and so the first symbol produced from the state after consumption coincides with the symbol produced from the state before consumption. That is,
as required. We can therefore rewrite decoding as a flushing stream computation:
That is, initial symbols are output as soon as they are completely determined, even before all the input bits have been read. This agrees with on finite bit sequences.
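Putting the pieces together, here is a complete sketch of the flushing-stream decoder, round-tripped against a streamed encoder. The operator fstream, the static model, and all the names are my assumptions, with production guarded by the safety check and flushing reverting to a plain unfold over rationals:

```haskell
import Data.List (unfoldr)

type Bit      = Int
type Interval = (Rational, Rational)

unit :: Interval
unit = (0, 1)

narrow :: Interval -> Interval -> Interval
narrow (l, r) (p, q) = (l + (r - l) * p, l + (r - l) * q)

widen :: Interval -> Interval -> Interval
widen (l, r) (p, q) = ((p - l) / (r - l), (q - l) / (r - l))

scale :: Interval -> Rational -> Rational
scale (l, r) x = (x - l) / (r - l)

half :: Bit -> Interval
half 0 = (0, 1/2)
half 1 = (1/2, 1)

midpoint :: Interval -> Rational
midpoint (l, r) = (l + r) / 2

-- A fixed two-symbol model, for illustration only.
encodeSym :: Char -> Interval
encodeSym 'a' = (0, 2/3)
encodeSym _   = (2/3, 1)

decodeSym :: Rational -> Char
decodeSym x = if x < 2/3 then 'a' else 'b'

safe :: Interval -> Bool
safe (l, r) = r <= snd (encodeSym (decodeSym l))

-- Flushing stream: produce while productive, otherwise consume; when
-- the input is exhausted, flush the remaining state.
fstream :: (s -> Maybe (o, s)) -> (s -> [o]) -> (s -> i -> s) -> s -> [i] -> [o]
fstream g h f s xs = case g s of
  Just (o, s') -> o : fstream g h f s' xs
  Nothing      -> case xs of
                    x : xs' -> fstream g h f (f s x) xs'
                    []      -> h s

type State = (Rational, Interval)

-- Commit to a symbol only when the whole interval decodes to it.
produceS :: State -> Maybe (Char, State)
produceS (x, i)
  | safe i    = Just (s, (scale j x, widen j i))
  | otherwise = Nothing
  where s = decodeSym x
        j = encodeSym s

consume :: State -> Bit -> State
consume (_, i) b = (midpoint i', i') where i' = i `narrow` half b

-- Flushing phase: only the rational matters, so revert to the
-- unfused unfold over rationals (no interval to grow too wide).
flush :: State -> [Char]
flush (x, _) = unfoldr step x
  where step y = let s = decodeSym y in Just (s, scale (encodeSym s) y)

decodeS :: [Bit] -> [Char]
decodeS = fstream produceS flush consume (midpoint unit, unit)

-- Streamed encoding, for the round trip.
nextBit :: Interval -> Maybe (Bit, Interval)
nextBit (l, r)
  | r <= 1/2  = Just (0, (2*l, 2*r))
  | 1/2 <= l  = Just (1, (2*l - 1, 2*r - 1))
  | otherwise = Nothing

stream :: (s -> Maybe (o, s)) -> (s -> i -> s) -> s -> [i] -> [o]
stream g f s xs = case g s of
  Just (o, s') -> o : stream g f s' xs
  Nothing      -> case xs of
                    x : xs' -> stream g f (f s x) xs'
                    []      -> []

encodeS :: [Char] -> [Bit]
encodeS = stream nextBit (\i s -> i `narrow` encodeSym s) unit
```

During the flushing phase only the rational is consulted, so the over-wide intervals never arise; and with a static model the flush is an infinite stream, so the caller takes as many symbols as were encoded.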
We will leave arithmetic coding at this point. There is actually still quite a bit more arithmetic required—in particular, for competitive performance it is important to use only fixed-precision arithmetic, restricting attention to rationals within the unit interval with denominator for some fixed . In order to be able to multiply two numerators using 32-bit integer arithmetic without the risk of overflow, we can have at most . Interval narrowing now needs to be approximate, rounding down both endpoints to integer multiples of . Care needs to be taken so that this rounding never makes the two endpoints of an interval coincide. Still, encoding can be written as an instance of . Decoding appears to be more difficult: the approximate arithmetic means that we no longer have interval widening as an exact inverse of narrowing, so the approach above no longer works. Instead, our 2002 lecture notes introduce a “destreaming” operator that simulates and inverts streaming: the decoder works in sympathy with the encoder, performing essentially the same interval arithmetic but doing the opposite conversions. Perhaps I will return to complete that story some time…