The previous two posts discussed *arithmetic coding* (AC). In this post I’ll discuss a related but more recent entropy encoding technique called *asymmetric numeral systems* (ANS), introduced by Jarek Duda and explored in some detail in a series of 12 blog posts by Charles Bloom. ANS combines the *effectiveness* of AC (achieving approximately the theoretical optimum Shannon entropy) with the *efficiency* of Huffman coding. On the other hand, whereas AC acts in a first-in first-out manner, ANS acts last-in first-out, in the sense that the decoded text comes out in reverse; this is not a problem in a batched context, but it makes ANS unsuitable for encoding a communications channel, and also makes it difficult to employ adaptive text models.

## Arithmetic coding, revisited

Here is a simplified version of arithmetic coding; compared to the previous posts, we use a fixed rather than adaptive model, and rather than picking some arbitrary element of the final interval, we explicitly pick its lower bound (its first component, when the interval is represented as a pair). We suppose a function

that returns a positive count for every symbol, representing predicted frequencies of occurrence, from which it is straightforward to derive functions

that work on a fixed global model, satisfying the central property

For example, an alphabet of three symbols `a’, `b’, `c’, with counts 2, 3, and 5 respectively, could be encoded by the intervals (0, 1/5), (1/5, 1/2), and (1/2, 1). We have operations on intervals:

that satisfy

Then we have:

For example, with the alphabet and intervals above, the input text “abc” gets encoded symbol by symbol, from left to right (because of the left fold), to a narrowing sequence of intervals, from which we select the lower bound of the final interval. (This version doesn’t stream well, because of the decision to pick the lower bound of the final interval as the result. But it will suffice for our purposes.)
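To make this concrete, here is a minimal Haskell sketch of the simplified scheme for the three-symbol model above (the names `encodeSym`, `narrow`, `unit`, and `encode1` are my own labels for the functions described, not necessarily those of the original development):

```haskell
import Data.Ratio ((%))

type Interval = (Rational, Rational)

-- fixed model: counts 2, 3, 5 for 'a', 'b', 'c', as ratios of the total 10
encodeSym :: Char -> Interval
encodeSym 'a' = (0,     1 % 5)
encodeSym 'b' = (1 % 5, 1 % 2)
encodeSym 'c' = (1 % 2, 1)
encodeSym _   = error "unknown symbol"

-- narrow interval i to the subinterval of i indicated by j
narrow :: Interval -> Interval -> Interval
narrow (l, r) (p, q) = (l + (r - l) * p, l + (r - l) * q)

unit :: Interval
unit = (0, 1)

-- encode left to right, then pick the lower bound of the final interval
encode1 :: String -> Rational
encode1 = fst . foldl (\i s -> narrow i (encodeSym s)) unit
```

For “abc” this passes through the narrowing intervals (0, 1), (0, 1/5), (1/25, 1/10), and (7/100, 1/10), so the result is 7/100.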

Note now that the symbol encoding can be “fissioned” out of the fold as a separate map, using map fusion backwards; moreover, interval narrowing is associative (with the unit interval as its neutral element), so we can replace the foldl with a foldr; and finally, now using map fusion forwards, we can fuse the map with the foldr. That is,

and we can fuse the final selection back into the fold; so we have encoding as a single foldr, where

Now encoding is a simple foldr and decoding a simple unfoldr. This means that we can prove that decoding inverts encoding, using the “Unfoldr–Foldr Theorem” stated in the Haskell documentation for unfoldr (in fact, the only property of unfoldr stated there!). The theorem states that if

for all and , then

for all finite lists . Actually, that’s too strong for us here, because our decoding always generates an infinite list; a weaker variation (hereby called the “Unfoldr–Foldr Theorem With Junk”) states that if

for all and , then

for all finite lists—that is, the unfolding inverts the folding except for appending some junk to the output. It’s a nice exercise to prove this weaker theorem, by induction.
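As a toy instance of the weaker theorem (my own illustrative example, not from the original text): take the algebra to be cons, and the coalgebra to be uncons, except that on the empty list it keeps producing a junk value forever:

```haskell
import Data.List (unfoldr)

f :: Int -> [Int] -> [Int]
f = (:)

g :: [Int] -> Maybe (Int, [Int])
g (x : xs) = Just (x, xs)
g []       = Just (0, [])   -- junk: never stops producing

-- g (f x z) = Just (x, z) holds for all x and z, so
-- unfoldr g (foldr f [] xs) = xs ++ repeat 0
```

Unfolding after folding returns the original list, followed by an infinite stream of junk zeroes.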

We check the condition:

Now, we hope to recover the first symbol; that is, we hope that one decoding step undoes the last encoding step:

Fortunately, it is an invariant of the encoding process that the accumulated value—the result of the fold—lies in the unit interval, so the lookup does indeed yield the right symbol. Continuing:

as required. Therefore decoding inverts encoding, apart from the fact that it continues to produce junk after having decoded all the input; so

for all finite .

## From fractions to integers

Whereas AC encodes longer and longer messages as more and more precise fractions, ANS encodes them as larger and larger integers. Given the count function on symbols, we can easily derive definitions

such that cumul gives the cumulative count of all symbols preceding a given one (say, in alphabetical order), total the sum of all symbol counts, and find looks up the symbol whose range contains a given integer:

(for example, we might tabulate the cumulative counts using a scan).
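Concretely, for the running three-symbol model the derived functions might look like this (a sketch; the spellings of `count`, `cumul`, `total`, and `find` here are assumptions on my part):

```haskell
count :: Char -> Integer
count 'a' = 2
count 'b' = 3
count 'c' = 5
count _   = error "unknown symbol"

symbols :: [Char]
symbols = "abc"

-- cumulative count of all symbols preceding s (in alphabetical order)
cumul :: Char -> Integer
cumul s = sum [count t | t <- symbols, t < s]

-- sum of all symbol counts
total :: Integer
total = sum (map count symbols)

-- look up the symbol whose range contains r, for 0 <= r < total
find :: Integer -> Char
find r = head [s | s <- symbols, cumul s <= r, r < cumul s + count s]
```

So here cumul 'a' = 0, cumul 'b' = 2, cumul 'c' = 5, and total = 10.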

We encode a text as an integer x, containing log x bits of information (all logs in this blog are base 2). The next symbol s to encode has probability count s / total, and so requires an additional log (total / count s) bits; in total, that’s log (x × total / count s) bits. So entropy considerations tell us that, roughly speaking, to incorporate symbol s into state x we want to map x to approximately x × total / count s. Of course, in order to decode, we need to be able to invert this transformation, to extract s and x from the combined value; this suggests that we should do the division by count s first:

so that the multiplication by the known value total can be undone first:

How do we reconstruct the symbol? Well, there is enough headroom in the scaled state to add any non-negative value less than total without affecting the division; in particular, we can add cumul s to it, and then we can use find on the remainder:

But we have still lost some information from the lower end of x through the division, namely the remainder modulo count s; so we can’t yet reconstruct x. Happily, cumul s + count s ≤ total for any s, so there is still precisely enough headroom to add this lost information too without affecting the lookup, allowing us also to reconstruct x:

We therefore define

Using similar reasoning as for AC, it is straightforward to show that a decoding step inverts an encoding step:

Therefore decoding inverts encoding, modulo final junk, by the Unfoldr–Foldr Theorem With Junk:

for all finite texts. For example, with the same alphabet and symbol weights as before, encoding the text “abc” (now from right to left, because of the foldr) proceeds through the states 0, 5, 14, and 70, and the last element is the encoding.
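The encoding and decoding steps just derived can be rendered as follows (a self-contained sketch of the same toy model; `estep` and `dstep` are my names for the step functions):

```haskell
import Data.List (unfoldr)

count :: Char -> Integer
count 'a' = 2; count 'b' = 3; count 'c' = 5; count _ = error "unknown"

cumul :: Char -> Integer
cumul s = sum [count t | t <- "abc", t < s]

total :: Integer
total = sum (map count "abc")

find :: Integer -> Char
find r = head [s | s <- "abc", cumul s <= r, r < cumul s + count s]

-- fold symbol s into state x: divide by count s first,
-- then scale up by total and add cumul s plus the remainder
estep :: Char -> Integer -> Integer
estep s x = let (q, r) = x `divMod` count s
            in q * total + cumul s + r

-- extract the last-encoded symbol and the previous state
dstep :: Integer -> (Char, Integer)
dstep x = let (q, r) = x `divMod` total
              s      = find r
          in (s, q * count s + r - cumul s)

encode2 :: String -> Integer
encode2 = foldr estep 0
```

Encoding “abc” from right to left passes through the states 0, 5, 14, 70, and decoding 70 recovers ‘a’, ‘b’, ‘c’ in turn (followed by junk).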

In fact, the inversion property holds whatever value we use to start encoding with (since it isn’t used in the proof); in the next section, we start encoding with a certain lower bound rather than with zero. Moreover, the encoding step is strictly increasing on states strictly greater than zero, and the decoding step strictly decreasing; which means that the decoding process can stop when it returns to the lower bound. That is, if we pick some positive lower bound and define

then the stronger version of the Unfoldr–Foldr Theorem holds, and we have

for all finite texts, without junk.
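With a nonzero lower bound the sketch becomes (again my own rendering of the toy model, with the lower bound chosen arbitrarily):

```haskell
import Data.List (unfoldr)

count :: Char -> Integer
count 'a' = 2; count 'b' = 3; count 'c' = 5; count _ = error "unknown"
cumul s = sum [count t | t <- "abc", t < s]
total = sum (map count "abc")
find r = head [s | s <- "abc", cumul s <= r, r < cumul s + count s]

estep s x = let (q, r) = x `divMod` count s in q * total + cumul s + r
dstep x   = let (q, r) = x `divMod` total; s = find r
            in (s, q * count s + r - cumul s)

l :: Integer
l = 1   -- an arbitrary positive lower bound

encode3 :: String -> Integer
encode3 = foldr estep l

-- stop decoding once the state returns to the lower bound: no junk
decode3 :: Integer -> String
decode3 = unfoldr (\x -> if x == l then Nothing else Just (dstep x))
```

Here encoding “abc” passes through the states 1, 6, 22, 110, and decoding 110 stops exactly when the state returns to 1.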

## Bounded precision

The previous versions all used arbitrary-precision arithmetic, which is expensive. We now change the approach slightly to use only bounded-precision arithmetic. As usual, there is a trade-off between effectiveness (a bigger bound means more accurate approximations to ideal entropies) and efficiency (a smaller bound generally means faster operations). Fortunately, the reasoning does not depend on much about the actual bounds. We will represent the integer accumulator as a pair:

such that the second component is a list of digits in a given base b, and the first an integer w in the range l ≤ w < u (where we define u = b × l as the upper bound of the range), under the abstraction relation

where

We call the integer the “window” and the digit sequence the “remainder”. For example, with base 10 and lower bound 100, the pair (123, [4, 5]) represents 12345; and indeed, for any sufficiently large integer there is a unique such representation satisfying the invariants. According to Duda, total should divide the lower bound evenly. In a real implementation, the base and the lower bound should be powers of two, such that arithmetic on values up to the upper bound fits within a single machine word.
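The representation and its abstraction function can be sketched like this (with b = 10 and l = 100 as assumed illustrative parameters, and `Number`, `inject`, and `number` as my own names):

```haskell
b, l, u :: Integer
b = 10          -- digit base
l = 100         -- lower bound of the window range
u = b * l       -- upper bound: l <= window < u

-- a window together with a remainder of digits
type Number = (Integer, [Integer])

-- shift one digit into the low end of an integer
inject :: Integer -> Integer -> Integer
inject w d = w * b + d

-- the integer that an accumulator represents
number :: Number -> Integer
number (w, ds) = foldl inject w ds
```

For instance, number (123, [4, 5]) = 12345, with the window 123 indeed in the range [100, 1000).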

The encoding step acts on the window in the accumulator, which risks making it overflow the range; we therefore renormalize by shifting digits from the window to the remainder until the overflow would no longer happen, before consuming the symbol.

Some calculation, using the fact that total divides the bounds evenly, allows us to rewrite the guard in the renormalization step:

For example, again encoding the text “abc”, again from right to left, with base 10 and lower bound 100, the accumulator starts with (100, []), and goes through the states (205, []) and (683, []) on consuming the `c’ then the `b’; directly consuming the `a’ would make the window overflow, so we renormalize to (68, [3]); then it is safe to consume the `a’, leading to the final state (340, [3]).

Decoding is an unfold using the accumulator as state. We repeatedly output a symbol from the window; this may make the window underflow the range, in which case we renormalize if possible by injecting digits from the remainder (and if this is not possible, because there are no more digits to inject, it means that we have decoded the entire text).

Note that decoding is of course symmetric to encoding; in particular, when encoding we renormalize before consuming a symbol; therefore when decoding we renormalize *after* emitting a symbol. For example, decoding the final encoding (340, [3]) starts by extracting the `a’ and a new window value 68; the window value 68 has underflowed, so renormalization consumes the remaining digit 3, leading to the accumulator (683, []); then decoding proceeds to extract the `b’ and `c’ in turn, returning the accumulator to (100, []).
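Putting the pieces together, here is a sketch of the bounded-precision encoder and decoder (the names and the parameters b = 10, l = 100 are my assumptions, chosen to match the worked example):

```haskell
import Data.List (unfoldr)

count :: Char -> Integer
count 'a' = 2; count 'b' = 3; count 'c' = 5; count _ = error "unknown"
cumul s = sum [count t | t <- "abc", t < s]
total = sum (map count "abc")
find r = head [s | s <- "abc", cumul s <= r, r < cumul s + count s]

b = 10; l = 100; u = b * l :: Integer

estep s x = let (q, r) = x `divMod` count s in q * total + cumul s + r
dstep x   = let (q, r) = x `divMod` total; s = find r
            in (s, q * count s + r - cumul s)

-- shift digits from window to remainder until consuming s cannot overflow,
-- using the rewritten guard: estep s w < u  <=>  w < (u `div` total) * count s
econsume :: (Integer, [Integer]) -> Char -> (Integer, [Integer])
econsume (w, ds) s
  | w >= (u `div` total) * count s = econsume (w `div` b, (w `mod` b) : ds) s
  | otherwise                      = (estep s w, ds)

encode5 :: String -> (Integer, [Integer])
encode5 = foldr (flip econsume) (l, [])

-- emit a symbol, then renormalize the window from the remainder
demit :: (Integer, [Integer]) -> (Char, (Integer, [Integer]))
demit (w, ds) = let (s, w') = dstep w in (s, renorm w' ds)
  where renorm v (d : rest) | v < l = renorm (v * b + d) rest
        renorm v rest = (v, rest)

decode5 :: (Integer, [Integer]) -> String
decode5 = unfoldr step
  where step acc | acc == (l, []) = Nothing
                 | otherwise      = Just (demit acc)
```

Encoding “abc” yields the accumulator (340, [3]), and decoding that accumulator recovers “abc”, stopping when the accumulator returns to (100, []).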

We can again prove that decoding inverts encoding, although we need to use yet another variation of the Unfoldr–Foldr Theorem, hereby called the “Unfoldr–Foldr Theorem With Invariant”. We say that a predicate is an invariant of the fold and unfold steps if

The theorem is that if we have such an invariant, and the conditions

hold for all and , then

for all finite lists, provided that the initial state satisfies the invariant. In our case, the invariant is that the window w is in range (l ≤ w < u), which is indeed maintained by the encoding and decoding steps. Then it is straightforward to verify that

and

when the window is in range, making use of a lemma that

when the window is in range. Therefore decoding inverts encoding:

for all finite .

## Trading in sequences

The version of encoding in the previous section yields an accumulator, that is, a pair consisting of an integer window and a digit-sequence remainder. It would be more conventional for encoding to take a sequence of symbols to a sequence of digits alone, and decoding to take the sequence of digits back to a sequence of symbols. For encoding, we have to flush the remaining digits out of the window at the end of the process, reducing the window to zero:

For example, the final accumulator (340, [3]) from the running example flushes to the digit sequence [3, 4, 0, 3]. Then we can define

Correspondingly, decoding should start by populating an initially-zero window from the sequence of digits:

For example, populating from the digit sequence [3, 4, 0, 3] reconstructs the accumulator (340, [3]). Then we can define
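The two conversions can be sketched as follows (the names `flush` and `populate` are my own, with b = 10 and l = 100 as before):

```haskell
b, l :: Integer
b = 10
l = 100

-- move the digits of the window to the front of the remainder,
-- until the window is reduced to zero
flush :: (Integer, [Integer]) -> [Integer]
flush (0, ds) = ds
flush (w, ds) = flush (w `div` b, (w `mod` b) : ds)

-- refill an initially-zero window from the digit sequence,
-- stopping as soon as the window is in range
populate :: [Integer] -> (Integer, [Integer])
populate = go 0
  where go w (d : ds) | w < l = go (w * b + d) ds
        go w ds = (w, ds)
```

So flush (340, [3]) = [3, 4, 0, 3], and populate [3, 4, 0, 3] = (340, [3]).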

One can show that

provided that is initially in range, and therefore again decoding inverts encoding:

for all finite .

Note that this is not just a data refinement—it is not the case that the bounded-precision encoding yields the digits of the integer computed by the arbitrary-precision version. For example, with the parameters above the digit sequence for “abc” is [3, 4, 0, 3], whereas the arbitrary-precision encoding starting from the same lower bound yields 3411. We have really traded away some effectiveness of the encoding in return for the increased efficiency.

## Streaming encoding

We would now like to stream encoding and decoding as metamorphisms: encoding should start delivering digits before consuming all the symbols, and conversely decoding should start delivering symbols before consuming all the digits.

For encoding, we have

with encoding a fold; can we turn the flushing step into an unfold? Yes, we can! The remainder in the accumulator should act as a queue of digits: digits get enqueued at the most significant end, so we need to dequeue them from the least significant end. So we define

to “deal out” the elements of a list one by one, starting with the last. Of course, this reverses the list; so we have the slightly awkward
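Dealing can be sketched as an unfoldr (my own rendering; the coalgebra name `unsnoc` is an assumption):

```haskell
import Data.List (unfoldr)

-- split off the last element, if any
unsnoc :: [a] -> Maybe (a, [a])
unsnoc [] = Nothing
unsnoc xs = Just (last xs, init xs)

-- deal out the elements one by one, starting with the last
deal :: [a] -> [a]
deal = unfoldr unsnoc
```

So deal [1, 2, 3] = [3, 2, 1] = reverse [1, 2, 3] (quadratically, in this naive rendering).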

Now we can use *unfold fusion* to fuse and into a single unfold, deriving the coalgebra

by considering each of the three cases, getting an unfold where

As for the input part, we have a foldr where we need a foldl. Happily, that is easy to achieve, at the cost of an additional reversal, since:

on finite lists. So we have encoding as an unfold after a fold, where

It turns out that the streaming condition does not hold for this producer–consumer pair—although we can stream digits out of the remainder:

we cannot stream the last few digits out of the window:

We have to resort to *flushing streaming*, which starts from an apomorphism

rather than an unfold. Of course, any unfold is trivially an apo, with the trivial flusher that always yields the empty list; but more interestingly, any unfold can be factored into a “cautious” phase (delivering elements only while a predicate holds) followed by a “reckless” phase (discarding the predicate, and delivering the elements anyway)

where

In particular, the streaming condition may hold for the cautious coalgebra when it doesn’t hold for the reckless coalgebra itself. We’ll use the cautious coalgebra

which carefully avoids the problematic case when the remainder is empty. It is now straightforward to verify that the streaming condition holds for the cautious coalgebra, and therefore encoding streams, where

and where this version streams its output. On the downside, it has to read the input text in reverse, and also write the output digit sequence in reverse.

## Streaming decoding

Fortunately, decoding is rather easier to stream. We have

with decoding an unfold; can we turn the populating step into a fold? Yes, we can! In fact, we have a fold, where

That is, starting with a zero window and empty remainder for the accumulator, digits are injected one by one into the window until this is in range, and thereafter appended to the remainder.

Now we have decoding as an unfold after a fold, and it is straightforward to verify that the streaming condition holds for this pair. Therefore decoding streams on finite inputs, where

and this streams. It is perhaps a little more symmetric to move the reversal from the final step of encoding to the initial step of decoding—that is, to have the least significant digit first in the encoded text, rather than the most significant—and then to view both the encoding and the decoding process as reading their inputs from right to left.

## Summing up

Inlining definitions and simplifying, we have ended up with encoding as a flushing stream:

and decoding as an ordinary stream:

Note that the two occurrences of reversal can be taken to mean that encoding and decoding both process their input from last to first. The remainder acts as a queue, with digits being added at one end and removed from the other, so should be represented to make that efficient. But in fact, the remainder can be eliminated altogether, yielding simply the window for the accumulator—for encoding:

and for decoding:

This gives rather simple programs, corresponding to those in Duda’s paper; however, the calculation of this last version from the previous steps currently eludes me.
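For reference, here is a direct rendering of that final, remainder-free version, with only the window as state (a sketch under the same assumed toy model with b = 10 and l = 100; all names are mine, and it is justified here only by testing, not by calculation):

```haskell
count :: Char -> Integer
count 'a' = 2; count 'b' = 3; count 'c' = 5; count _ = error "unknown"
cumul s = sum [count t | t <- "abc", t < s]
total = sum (map count "abc")
find r = head [s | s <- "abc", cumul s <= r, r < cumul s + count s]

b = 10; l = 100; u = b * l :: Integer

-- encode with only the window as state, consuming symbols last-first;
-- digits shifted out early end up later in the output
encodeW :: String -> [Integer]
encodeW = go l . reverse
  where
    go w [] = digits w []                 -- final flush of the window
    go w (s : ss)
      | w >= (u `div` total) * count s = go (w `div` b) (s : ss) ++ [w `mod` b]
      | otherwise = let (q, r) = w `divMod` count s
                    in go (q * total + cumul s + r) ss
    digits 0 ds = ds
    digits w ds = digits (w `div` b) (w `mod` b : ds)

-- decode with only the window as state, refilling it from the digit stream
decodeW :: [Integer] -> String
decodeW ds0 = uncurry go (fill 0 ds0)
  where
    fill w ds | w >= l = (w, ds)
    fill w (d : ds) = fill (w * b + d) ds
    fill w [] = (w, [])
    go w ds
      | w == l && null ds = []
      | otherwise =
          let (q, r) = w `divMod` total
              s = find r
              (w', ds') = fill (q * count s + r - cumul s) ds
          in s : go w' ds'
```

On the running example, encodeW “abc” produces the digit sequence [3, 4, 0, 3], and decodeW recovers “abc”, matching the sequence-based version of the earlier sections.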

Apologies if you came looking for this page and found it missing. I realised the last step (eliminating the remainder) only just after publishing, so took the post offline to include it. I still don’t immediately see how to calculate it, though; I’d be grateful for any thoughts.


On further reflection, I don’t think forcing encoding to fit the fstream pattern was helpful. It’s easier to justify the final version of encode directly from encode5. Perhaps also for decoding, although I haven’t yet thought further about that.