In the previous post we saw the basic definitions of arithmetic encoding and decoding, and a proof that decoding does indeed successfully retrieve the input. In this post we go on to show how both encoding and decoding can be turned into streaming processes.
Producing bits
Recall that encoding narrows the unit interval according to the successive input symbols and finally picks a fraction from the resulting interval, while decoding recovers the symbols from that fraction. Encoding and decoding work together, but only in batch mode: encoding computes a fraction and yields nothing until the last step, so decoding cannot start until encoding has finished. We really want encoding to yield as the encoded text a list of bits representing the fraction, rather than the fraction itself, so that we can stream the encoded text and the decoding process. To this end, we replace the final fraction-picking step with a pair of conversions: one from the final interval to a bit sequence, and one from a bit sequence back to a fraction.
The obvious definitions would have the conversion to bits yield the shortest binary expansion of any fraction within the interval, and the conversion back evaluate that binary expansion. However, we don't do quite this (it turns out to prevent the streaming condition from holding); instead we arrange for the conversion to bits to yield the bit sequence that, when extended with a 1, gives the shortest binary expansion of any fraction within the interval (and indeed, the shortest binary expansion necessarily ends with a 1), and for the conversion back to compute the value with this final 1 appended.
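To make the discussion concrete, here is a minimal sketch of the bits-to-fraction direction, with assumed names and types (Bit, Interval, fromBits) that the post's own code may not use; the interval-to-bits direction is sketched after the case analysis below.

    type Bit      = Int                     -- 0 or 1
    type Interval = (Rational, Rational)    -- (lower, upper), a subinterval of the unit interval

    -- Evaluate a bit sequence with a final 1 appended:
    -- fromBits bs is the value of the binary fraction 0.bs1
    fromBits :: [Bit] -> Rational
    fromBits = foldr pack (1/2)
      where pack b x = (fromIntegral b + x) / 2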
Thus, if the interval lies entirely within the lower half of the unit interval, then the binary expansion of any fraction within it starts with 0; and similarly, if it lies entirely within the upper half, the binary expansion starts with 1. Otherwise, the interval straddles the midpoint 1/2; the shortest binary expansion of any fraction within it is then the expansion of 1/2 itself, so we yield the empty bit sequence.
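Continuing the sketch (and reusing the Bit and Interval types above; the names nextBit and toBits are again my own), this case analysis becomes the step function of an unfold:

    import Data.List (unfoldr)

    -- One step of bit production: if the interval sits in one half of the unit
    -- interval, emit the corresponding bit and rescale that half to the whole;
    -- stop once the interval straddles 1/2.
    nextBit :: Interval -> Maybe (Bit, Interval)
    nextBit (l, r)
      | r <= 1/2  = Just (0, (2 * l, 2 * r))
      | 1/2 <= l  = Just (1, (2 * l - 1, 2 * r - 1))
      | otherwise = Nothing

    -- The bits that, extended with a final 1, give the shortest binary
    -- expansion of a fraction within the interval.
    toBits :: Interval -> [Bit]
    toBits = unfoldr nextBit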
Note that the composition of the bits-to-fraction conversion with the interval-to-bits conversion is a fold after an unfold, that is, a hylomorphism, and so satisfies a simple recursive equation. Moreover, it is clear that the conversion to bits yields a finite bit sequence for any non-empty interval (since the interval doubles in width at each step, and the process stops once it straddles the midpoint 1/2); so this equation serves to uniquely define the composition. In other words, the bit-producing step is a recursive coalgebra. It is then a straightforward exercise to prove that the fraction obtained by converting a non-empty interval to bits and back lies within that interval; so although this round trip and the original fraction-picking function differ, they are sufficiently similar for our purposes.
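Using the assumed definitions above, the round-trip claim can at least be sanity-checked: converting a non-empty interval to bits and back should land strictly inside it.

    -- Converting a non-empty interval to bits and back lands strictly inside it.
    roundTripWithin :: Interval -> Bool
    roundTripWithin (l, r) = l < x && x < r
      where x = fromBits (toBits (l, r))

    -- e.g. roundTripWithin (1/3, 2/5) == True  (the bits are [0,1], the value 3/8)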
Now we redefine encoding to yield a bit sequence rather than a fraction, and decoding correspondingly to consume that bit sequence. That is, we move the bits-to-fraction part of the fraction-picking step from the encoding stage to the decoding stage.
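For concreteness, here is one way the redefined encoder and decoder might look, assuming a toy static model in which each symbol owns a fixed half-open subinterval of the unit interval; the post's actual model and helper names (narrow, widen, encodeSym, decodeSym and so on) may well differ.

    -- Intervals, narrowing and widening, in the spirit of the previous post.
    unit :: Interval
    unit = (0, 1)

    weight, scale :: Interval -> Rational -> Rational
    weight (l, r) x = l + (r - l) * x         -- map x from the unit interval into (l, r)
    scale  (l, r) x = (x - l) / (r - l)       -- inverse of weight

    narrow, widen :: Interval -> Interval -> Interval
    narrow i (p, q) = (weight i p, weight i q)  -- focus i onto the subinterval described by (p, q)
    widen  i (p, q) = (scale  i p, scale  i q)  -- inverse of narrow

    -- A toy static model: each symbol owns a subinterval of the unit interval.
    type Symbol = Char
    type Model  = [(Symbol, Interval)]

    encodeSym :: Model -> Symbol -> Interval
    encodeSym m s = maybe (error "unknown symbol") id (lookup s m)

    decodeSym :: Model -> Rational -> Symbol
    decodeSym m x = head [s | (s, (l, r)) <- m, l <= x, x < r]

    -- Encoding now yields bits, and decoding consumes them.
    encode2 :: Model -> [Symbol] -> [Bit]
    encode2 m = toBits . foldr narrow unit . map (encodeSym m)

    -- The decoding unfold never stops for this toy model; in practice one stops
    -- at an end-of-text symbol or after a known number of symbols.
    decode2 :: Model -> [Bit] -> [Symbol]
    decode2 m = unfoldr step . fromBits
      where step x = let s = decodeSym m x
                     in  Just (s, scale (encodeSym m s) x)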
Streaming encoding
Just like the original, the new version of encoding consumes all of its input before producing any output, so it does not work for encoding infinite inputs, nor for streaming execution even on finite inputs. However, it is nearly in the right form to be a metamorphism (a change of representation from lists of symbols to lists of bits): an unfold after a fold. In particular, interval narrowing is associative, and the unit interval is its neutral element, so we can replace the right fold over the symbol intervals with a left fold.
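In the same assumed notation, the change is just the replacement of foldr by foldl:

    -- narrow is associative and unit is its neutral element, so the folds agree:
    --   foldr narrow unit ints == foldl narrow unit ints
    encode2' :: Model -> [Symbol] -> [Bit]
    encode2' m = toBits . foldl narrow unit . map (encodeSym m)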
Now that encoding is in the right form, we must check the streaming condition for the bit-producing step and the narrowing step. We consider one of the two cases in which a bit can be produced, and leave the other as an exercise. When the current interval lies within the lower half of the unit interval, and assuming that the symbol interval being consumed lies within the unit interval, producing the 0 bit and rescaling before narrowing gives the same result as narrowing first and then producing the bit, as required. The last step of that calculation is a kind of associativity property relating rescaling and narrowing, whose proof is left as another exercise. Therefore the streaming condition holds, and we may fuse the bit-producing unfold with the narrowing fold, defining a streaming version of encoding: the initial bits are output as soon as they are fully determined, even before all the input has been read. Note that the unstreamed and streamed versions differ, in particular on infinite inputs (the former diverges, whereas the latter does not); but they coincide on finite symbol sequences.
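A sketch of the fused encoder, with one standard formulation of the streaming operator written out explicitly (again using the assumed names from the earlier sketches):

    -- Produce output from the state while possible, otherwise consume the next
    -- input; stop when the state is unproductive and the input is exhausted.
    stream :: (b -> Maybe (c, b)) -> (b -> a -> b) -> b -> [a] -> [c]
    stream f g b xs = case f b of
      Just (c, b') -> c : stream f g b' xs
      Nothing      -> case xs of
                        x : xs' -> stream f g (g b x) xs'
                        []      -> []

    -- Streaming encoder: bits are emitted as soon as the interval determines them.
    encode3 :: Model -> [Symbol] -> [Bit]
    encode3 m = stream nextBit narrow unit . map (encodeSym m)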
Streaming decoding
Similarly, we want to be able to stream decoding, so that we don't have to wait for the entire encoded text to arrive before starting decoding. Recall that decoding is so far an unfold (producing the symbols from a fraction) after the bits-to-fraction conversion, which is a right fold over the bits. The first obstacle to streaming is that right fold, which we need to be a left fold instead. Of course, the binary operator of that fold is not associative; it doesn't even have the right type for that. But we can view each bit of the input as a function on the unit interval: bit 0 is represented by the function that focusses into the lower half of the unit interval, and bit 1 by the function that focusses into the upper half. The fold itself composes a sequence of such functions; and since function composition is associative, the composition can be written equally well as a right fold or a left fold. Having assembled the individual focussers into one composite function, we finally apply it to 1/2. (This is in fact an instance of a general trick for turning a right fold into a left fold, or vice versa.) Thus, the bits-to-fraction conversion becomes a left fold that composes one focussing function per bit, each one mapping the unit interval onto its lower or upper half, followed by application of the composite function to 1/2.
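In the assumed notation, with a hypothetical focus giving each bit's half of the unit interval, the rewritten conversion might look like this:

    -- The half of the unit interval selected by a bit.
    focus :: Bit -> Interval
    focus 0 = (0, 1/2)
    focus _ = (1/2, 1)

    -- Each bit acts on the unit interval as the focussing function weight (focus b).
    -- Function composition is associative with unit id, so a left fold works just
    -- as well as a right fold; the composite is finally applied to 1/2.
    fromBits2 :: [Bit] -> Rational
    fromBits2 bs = foldl (\f b -> f . weight (focus b)) id bs (1/2)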
In fact, not only may the individual bits be seen as focussing functions onto the lower and upper halves of the unit interval; so too may compositions of such functions, because composing two focussers gives the focusser of a narrowed interval. So any such composition is the focusser of some interval, and we may represent it concretely by that interval itself, retrieving the function from the interval when needed. The bits-to-fraction conversion then narrows the unit interval by the half-intervals corresponding to the successive bits, using a left fold, and finally maps 1/2 into the resulting interval, which simply yields its midpoint. Decoding so far is therefore an unfold, after this midpoint adapter, after a left fold over the bits.
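Concretely, composing focussers corresponds to narrowing their intervals (weight i . weight j = weight (narrow i j) in the assumed notation), so the conversion and the decoding pipeline so far might be sketched as:

    -- Represent a composition of focussers by a single interval, built by a left
    -- fold of narrow, and retrieve the composite function with weight.
    fromBits3 :: [Bit] -> Rational
    fromBits3 bs = weight (foldl narrow unit (map focus bs)) (1/2)

    -- Decoding so far: an unfold, after the midpoint adapter, after a left fold.
    decode2' :: Model -> [Bit] -> [Symbol]
    decode2' m = unfoldr step . midpoint . foldl narrow unit . map focus
      where midpoint i = weight i (1/2)
            step x     = let s = decodeSym m x
                         in  Just (s, scale (encodeSym m s) x)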
This is almost in the form of a metamorphism, except for the occurrence of the adapter in between the unfold and the fold. It is not straightforward to fuse that adapter with either the fold or the unfold; fortunately, however, we can split it into the composition of two parts, in such a way that the first part fuses with the fold and the second part fuses with the unfold. For fusing the first part with the fold, we just need to carry around the midpoint as an additional value alongside the interval being focussed, recomputing it each time the interval is narrowed by another bit.
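A sketch of this first fusion, with hypothetical names dstart and dconsume for the fused fold's seed and step:

    -- Carry the midpoint along with the interval while consuming bits.
    type DState = (Interval, Rational)

    dstart :: DState
    dstart = (unit, 1/2)

    dconsume :: DState -> Bit -> DState
    dconsume (i, _) b = (i', weight i' (1/2))   -- focus, then recompute the midpoint
      where i' = narrow i (focus b)

    -- Fold fusion:
    --   (\i -> (i, weight i (1/2))) (foldl narrow unit (map focus bs))
    --     == foldl dconsume dstart bs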
For fusing the second part of the adapter with the unfold, let us check the fusion condition. We have (exercise!) that the original decoding step applied to the fraction component of the paired state matches a new decoding step on the pair, up to the functorial action for the base functor of the list datatype, applied just to the second component of the optional result. We therefore define a new decoding step on pairs that produces the same symbol as before, but also widens the carried interval correspondingly; with this definition, the unfold on fractions after the projection coincides with the unfold on pairs. Note that the right-hand side will eventually lead to intervals that exceed the unit interval: the unfolding process keeps widening the interval without bound, so it will necessarily eventually exceed the unit bounds. We return to this point shortly.
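A sketch of the second fusion; the name dproduce and the precise definition are my reconstruction rather than the post's own:

    -- One production step on the paired state: decode a symbol from the fraction,
    -- rescale the fraction out of that symbol's interval, and widen the carried
    -- interval by the same amount.  Repeated widening eventually takes the
    -- interval outside the unit interval.
    dproduce :: Model -> DState -> Maybe (Symbol, DState)
    dproduce m (i, x) = Just (s, (widen j i, scale j x))
      where s = decodeSym m x
            j = encodeSym m s

    -- Unfold fusion:  unfoldr step . snd == unfoldr (dproduce m)
    --   (with step as in decode2' above)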
We have therefore concluded that decoding is a metamorphism: an unfold over a paired state, consisting of an interval and a fraction, after a left fold over the input bits.

Now we need to check the streaming condition for this producer and consumer. Unfortunately, it is never going to hold: the producer is always productive, so the streaming process would only take production steps and never consume any input. The problem is that plain streaming is too aggressive, and we need to use the more cautious flushing version of streaming instead. Informally, the streaming process should be productive in a given state only when the whole of the current interval maps to the same symbol under the model, so that however the interval is focussed by subsequent inputs, that symbol cannot be invalidated.
More formally, say that an interval is “safe” for the model if it is fully included in the encoding interval of some symbol; then all fractions within a safe interval decode to that same symbol. Then, and only then, may we commit to outputting that symbol, because no further input bits could lead to a different first output symbol.
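A sketch of the safety test, with within and safe as assumed names:

    -- j `within` i: is the interval j fully included in the interval i?
    within :: Interval -> Interval -> Bool
    within (p, q) (l, r) = l <= p && q <= r

    -- An interval is safe for a model if it fits inside some symbol's interval;
    -- every fraction in a safe interval then decodes to that one symbol.
    safe :: Model -> Interval -> Bool
    safe m i = any (\(_, j) -> i `within` j) m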
Note now that the interval remains bounded by the unit interval during the streaming phase, because of the safety check in the cautious producer, although it would still exceed the unit interval during the flushing phase. However, at that point we can undo the fusion we performed earlier, “fissioning” the flushing step back into the plain unfold on fractions, applied to the fraction component of the state: this manipulates rationals rather than intervals, so there is no problem with intervals getting too wide.
Now let us check the streaming condition for the cautious producer and the consumer. Suppose that we are in a productive state, so that the safety condition holds: the whole of the current interval is mapped to one and the same symbol by the model, and that symbol is the one produced. Consuming the next input bit focusses the interval onto one of its halves. The resulting state is also productive, because the focussed interval is contained in the original one, and so the whole of it is still mapped to that same symbol. In particular, the midpoint of the focussed interval lies within the original interval, and so the first symbol produced from the state after consumption coincides with the symbol produced from the state before consumption, as required. We can therefore rewrite decoding as a flushing stream computation.
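Putting the sketches together: one standard formulation of the flushing stream, a cautious producer that commits only when safe, and the resulting streaming decoder. The details are my reconstruction, and the post's final definition may differ.

    -- Flushing stream: like stream, but when the input runs out, finish off the
    -- remaining output from the final state.
    fstream :: (b -> Maybe (c, b)) -> (b -> [c]) -> (b -> a -> b) -> b -> [a] -> [c]
    fstream f h g b xs = case f b of
      Just (c, b') -> c : fstream f h g b' xs
      Nothing      -> case xs of
                        x : xs' -> fstream f h g (g b x) xs'
                        []      -> h b

    -- The cautious producer: only commit to a symbol when the interval is safe.
    dproduceSafe :: Model -> DState -> Maybe (Symbol, DState)
    dproduceSafe m st@(i, _)
      | safe m i  = dproduce m st
      | otherwise = Nothing

    -- Streaming decoder: symbols are emitted as soon as they are determined; when
    -- the bits run out, flush by running the plain unfold on the fraction (which,
    -- for this toy model, again only stops at an end-of-text convention).
    decode3 :: Model -> [Bit] -> [Symbol]
    decode3 m = fstream (dproduceSafe m) (flushFrom . snd) dconsume dstart
      where flushFrom = unfoldr (\x -> let s = decodeSym m x
                                       in  Just (s, scale (encodeSym m s) x))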
That is, initial symbols are output as soon as they are completely determined, even before all the input bits have been read. This streaming version agrees with the unstreamed decoder on finite bit sequences.
Fixed-precision arithmetic
We will leave arithmetic coding at this point. There is actually still quite a bit more arithmetic required; in particular, for competitive performance it is important to use only fixed-precision arithmetic, restricting attention to rationals within the unit interval whose denominator is a fixed power of two, small enough that two numerators can be multiplied using 32-bit integer arithmetic without the risk of overflow. Interval narrowing now needs to be approximate, rounding down both endpoints to integer multiples of the chosen precision. Care needs to be taken so that this rounding never makes the two endpoints of an interval coincide. Still, encoding can be written as an instance of streaming. Decoding appears to be more difficult: the approximate arithmetic means that we no longer have interval widening as an exact inverse of narrowing, so the approach above no longer works. Instead, our 2002 lecture notes introduce a “destreaming” operator that simulates and inverts streaming: the decoder works in sympathy with the encoder, performing essentially the same interval arithmetic but doing the opposite conversions. Perhaps I will return to complete that story some time…
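For illustration only, approximate narrowing at a fixed precision might be sketched as follows; this is not the lecture notes' code, and it makes no attempt to keep the rounded endpoints apart.

    -- Round a rational down to a multiple of 2^(-k).
    approx :: Integer -> Rational -> Rational
    approx k x = fromInteger (floor (x * d)) / d
      where d = 2 ^ k

    -- Approximate narrowing at fixed precision; a real implementation must also
    -- ensure that the two endpoints never coincide after rounding.
    narrowAt :: Integer -> Interval -> Interval -> Interval
    narrowAt k i (p, q) = (approx k (weight i p), approx k (weight i q))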
Hello! On the Planet Haskell blog aggregator [1] it appears that you have written a follow-up for this blog-post, “Asymmetric Numeral Systems” (on the 4th of August, 2018). However, I cannot find that article on your blog; I’m not sure whether it was removed by mistake or on purpose, but I figured I would let you know. Thank you for all your blog-posts and nice papers! I would really love to find all this material organised in a book!
[1] https://planet.haskell.org
Apologies. Indeed, I published a draft, then immediately found a problem in it. So I’ve taken it offline until I figure out how to fix it. Shouldn’t be long.
Now fixed and republished. Thanks for asking!