The previous two posts discussed *arithmetic coding* (AC). In this post I’ll discuss a related but more recent entropy encoding technique called *asymmetric numeral systems* (ANS), introduced by Jarek Duda and explored in some detail in a series of 12 blog posts by Charles Bloom. ANS combines the *effectiveness* of AC (achieving approximately the theoretical optimum Shannon entropy) with the *efficiency* of Huffman coding. On the other hand, whereas AC acts in a first-in first-out manner, ANS acts last-in first-out, in the sense that the decoded text comes out in reverse; this is not a problem in a batched context, but it makes ANS unsuitable for encoding a communications channel, and also makes it difficult to employ adaptive text models.

## Arithmetic coding, revisited

Here is a simplified version of arithmetic coding; compared to the previous posts, we use a fixed rather than adaptive model, and rather than ing some arbitrary element of the final interval, we pick explicitly the lower bound of the interval (by , when it is represented as a pair). We suppose a function

that returns a positive count for every symbol, representing predicted frequencies of occurrence, from which it is straightforward to derive functions

that work on a fixed global model, satisfying the central property

For example, an alphabet of three symbols `a’, `b’, `c’, with counts 2, 3, and 5 respectively, could be encoded by the intervals . We have operations on intervals:

that satisfy

Then we have:

For example, with the alphabet and intervals above, the input text “abc” gets encoded symbol by symbol, from left to right (because of the ), to the narrowing sequence of intervals from which we select the lower bound of the final interval. (This version doesn’t stream well, because of the decision to pick the lower bound of the final interval as the result. But it will suffice for our purposes.)

Note now that the symbol encoding can be “fissioned” out of , using map fusion for backwards; moreover, is associative (and its neutral element), so we can replace the with a ; and finally, now using fusion forwards, we can fuse the with the . That is,

and we can fuse the back into the ; so we have , where

Now encoding is a simple and decoding a simple . This means that we can prove that decoding inverts encoding, using the “Unfoldr–Foldr Theorem” stated in the Haskell documentation for (in fact, the only property of stated there!). The theorem states that if

for all and , then

for all finite lists . Actually, that’s too strong for us here, because our decoding always generates an infinite list; a weaker variation (hereby called the “Unfoldr–Foldr Theorem With Junk”) states that if

for all and , then

for all finite lists —that is, the inverts the except for appending some junk to the output. It’s a nice exercise to prove this weaker theorem, by induction.

We check the condition:

Now, we hope to recover the first symbol, ie we hope that :

Fortunately, it is an invariant of the encoding process that —the result of —is in the unit interval, so indeed . Continuing:

as required. Therefore decoding inverts encoding, apart from the fact that it continues to produce junk having decoded all the input; so

for all finite .

## From fractions to integers

Whereas AC encodes longer and longer messages as more and more precise fractions, ANS encodes them as larger and larger integers. Given the function on symbols, we can easily derive definitions

such that gives the cumulative counts of all symbols preceding (say, in alphabetical order), the total of all symbol counts, and looks up an integer :

(for example, we might tabulate the function using a scan).

We encode a text as an integer , containing bits of information (all logs in this blog are base 2). The next symbol to encode has probability , and so requires an additional bits; in total, that’s bits. So entropy considerations tell us that, roughly speaking, to incorporate symbol into state we want to map to . Of course, in order to decode, we need to be able to invert this transformation, to extract and from ; this suggests that we should do the division by first:

so that the multiplication by the known value can be undone first:

How do we reconstruct ? Well, there is enough headroom in to add any non-negative value less than without affecting the division; in particular, we can add to , and then we can use on the remainder:

But we have still lost some information from the lower end of through the division, namely ; so we can’t yet reconstruct . Happily, for any , so there is still precisely enough headroom in to add this lost information too without affecting the , allowing us also to reconstruct :

We therefore define

Using similar reasoning as for AC, it is straightforward to show that a decoding step inverts an encoding step:

Therefore decoding inverts encoding, modulo final junk, by the Unfoldr–Foldr Theorem With Junk:

for all finite . For example, with the same alphabet and symbol weights as before, encoding the text “abc” (now from right to left, because of the ) proceeds through the steps , and the last element is the encoding.

In fact, the inversion property holds whatever value we use to start encoding with (since it isn’t used in the proof); in the next section, we start encoding with a certain lower bound rather than . Moreover, is strictly increasing on states strictly greater than zero, and strictly decreasing; which means that the decoding process can stop when it returns to the lower bound. That is, if we pick some and define

then the stronger version of the Unfoldr–Foldr Theorem holds, and we have

for all finite , without junk.

## Bounded precision

The previous versions all used arbitrary-precision arithmetic, which is expensive. We now change the approach slightly to use only bounded-precision arithmetic. As usual, there is a trade-off between effectiveness (a bigger bound means more accurate approximations to ideal entropies) and efficiency (a smaller bound generally means faster operations). Fortunately, the reasoning does not depend on much about the actual bounds. We will represent the integer accumulator as a pair which we call a :

such that is a list of digits in a given base , and is an integer with (where we define for the upper bound of the range), under the abstraction relation

where

We call the “window” and the “remainder”. For example, with and , the pair represents ; and indeed, for any there is a unique representation satisfying the invariants. According to Duda, should divide into evenly. On a real implementation, and should be powers of two, such that arithmetic on values up to fits within a single machine word.

The encoding step acts on the window in the accumulator using , which risks making it overflow the range; we therefore renormalize with by shifting digits from the window to the remainder until this overflow would no longer happen, before consuming the symbol.

Some calculation, using the fact that divides , allows us to rewrite the guard in :

For example, again encoding the text “abc”, again from right to left, with and , the accumulator starts with , and goes through the states and on consuming the `c’ then the `b’; directly consuming the `a’ would make the window overflow, so we renormalize to ; then it is safe to consume the `a’, leading to the final state .

Decoding is an unfold using the accumulator as state. We repeatedly output a symbol from the window; this may make the window underflow the range, in which case we renormalize if possible by injecting digits from the remainder (and if this is not possible, because there are no more digits to inject, it means that we have decoded the entire text).

Note that decoding is of course symmetric to encoding; in particular, when encoding we renormalize before consuming a symbol; therefore when decoding we renormalize *after* emitting a symbol. For example, decoding the final encoding starts by computing ; the window value 68 has underflowed, so renormalization consumes the remaining digit 3, leading to the accumulator ; then decoding proceeds to extract the `b’ and `c’ in turn, returning the accumulator to .

We can again prove that decoding inverts encoding, although we need to use yet another variation of the Unfoldr–Foldr Theorem, hereby called the “Unfoldr–Foldr Theorem With Invariant”. We say that predicate is an invariant of and if

The theorem is that if is such an invariant, and the conditions

hold for all and , then

for all finite lists , provided that the initial state satisfies the invariant . In our case, the invariant is that the window is in range (), which is indeed maintained by and . Then it is straightforward to verify that

and

when is in range, making use of a lemma that

when is in range. Therefore decoding inverts encoding:

for all finite .

## Trading in sequences

The version of encoding in the previous section yields a , that is, a pair consisting of an integer window and a digit sequence remainder. It would be more conventional for encoding to take a sequence of symbols to a sequence of digits alone, and decoding to take the sequence of digits back to a sequence of symbols. For encoding, we have to flush the remaining digits out of the window at the end of the process, reducing the window to zero:

For example, . Then we can define

Correspondingly, decoding should start by populating an initially-zero window from the sequence of digits:

For example, . Then we can define

One can show that

provided that is initially in range, and therefore again decoding inverts encoding:

for all finite .

Note that this is not just a data refinement—it is not the case that yields the digits of the integer computed by . For example, , whereas . We have really changed the effectiveness of the encoding in return for the increased efficiency.

## Streaming encoding

We would now like to stream encoding and decoding as metamorphisms: encoding should start delivering digits before consuming all the symbols, and conversely decoding should start delivering symbols before consuming all the digits.

For encoding, we have

with a fold; can we turn into an unfold? Yes, we can! The remainder in the accumulator should act as a queue of digits: digits get enqueued at the most significant end, so we need to dequeue them from the least significant end. So we define

to “deal out” the elements of a list one by one, starting with the last. Of course, this reverses the list; so we have the slightly awkward

Now we can use *unfold fusion* to fuse and into a single unfold, deriving the coalgebra

by considering each of the three cases, getting where

As for the input part, we have a where we need a . Happily, that is easy to achieve, at the cost of an additional , since:

on finite . So we have , where

It turns out that the streaming condition does not hold for and —although we can stream digits out of the remainder:

we cannot stream the last few digits out of the window:

We have to resort to *flushing streaming*, which starts from an apomorphism

rather than an unfold. Of course, any unfold is trivially an apo, with the trivial flusher that always yields the empty list; but more interestingly, any unfold can be factored into a “cautious” phase (delivering elements only while a predicate holds) followed by a “reckless” phase (discarding the predicate, and delivering the elements anyway)

where

In particular, the streaming condition may hold for the cautious coalgebra when it doesn’t hold for the reckless coalgebra itself. We’ll use the cautious coalgebra

which carefully avoids the problematic case when the remainder is empty. It is now straightforward to verify that the streaming condition holds for and , and therefore , where

and where streams its output. On the downside, this has to read the input text in reverse, and also write the output digit sequence in reverse.

## Streaming decoding

Fortunately, decoding is rather easier to stream. We have

with an unfold; can we turn into a fold? Yes, we can! In fact, we have , where

That is, starting with for the accumulator, digits are injected one by one into the window until this is in range, and thereafter appended to the remainder.

Now we have decoding as an after a , and it is straightforward to verify that the streaming condition holds for and . Therefore on finite inputs, where

and streams. It is perhaps a little more symmetric to move the from the final step of encoding to the initial step of decoding—that is, to have the least significant digit first in the encoded text, rather than the most significant—and then to view both the encoding and the decoding process as reading their inputs from right to left.

## Summing up

Inlining definitions and simplifying, we have ended up with encoding as a flushing stream:

and decoding as an ordinary stream:

Note that the two occurrences of can be taken to mean that encoding and decoding both process their input from last to first. The remainder acts as a queue, with digits being added at one end and removed from the other, so should be represented to make that efficient. But in fact, the remainder can be eliminated altogether, yielding simply the window for the accumulator—for encoding:

and for decoding:

This gives rather simple programs, corresponding to those in Duda’s paper; however, the calculation of this last version from the previous steps currently eludes me.

Apologies if you came looking for this page and found it missing. I realised about the last step (eliminating the remainder) just after publishing, so took it offline to include this step. I still don’t immediately see how to calculate it, though; I’d be grateful for any thoughts.

Pingback: Folds on lists | Patterns in Functional Programming