It appears that I have insufficient time, or at least insufficient discipline, to contribute to this blog, except when I am on sabbatical. Which I now am… so let’s see if I can do better.
I don’t think I’ve written about them yet in this series—another story, for another day—but hylomorphisms consist of a fold after an unfold. One very simple example is the factorial function: is the product of the predecessors of . The predecessors can be computed with an unfold:
and the product as a fold:
and then factorial is their composition:
Another example is a tree-based sorting algorithm that resembles Hoare’s quicksort: from the input list, grow a binary search tree, as an unfold, and then flatten that tree back to a sorted list, as a fold. This is a divide-and-conquer algorithm; in general, these can be modelled as unfolding a tree of subproblems by repeatedly dividing the problem, then collecting the solution to the original problem by folding together the solutions to subproblems.
An unfold after a fold
This post is about the opposite composition, an unfold after a fold. Some examples:
- to reformat a list of lists to a given length;
- to sort a list;
- to convert a fraction from base to base ;
- to encode a text in binary by “arithmetic coding”.
In each of these cases, the first phase is a fold, which consumes some structured representation of a value into an intermediate unstructured format, and the second phase is an unfold, which generates a new structured representation. Their composition effects a change of representation, so we call them metamorphisms.
Hylomorphisms always fuse, and one can deforest the intermediate virtual data structure. For example, one need not construct the intermediate list in the factorial function; since each cell gets constructed in the unfold only to be immediately deconstructed in the fold, one can cut to the chase and go straight to the familiar recursive definition. For the base case, we have:
and for non-zero argument , we have:
In contrast, metamorphisms only fuse under certain conditions. However, when they do fuse, they also allow infinite representations to be processed, as we shall see.
Fusion seems to depend on the fold being tail-recursive; that is, a :
For the unfold phase, we will use the usual list unfold:
We define a metamorphism as their composition:
This transforms input of type to output of type : in the first phase, , it consumes all the input into an intermediate value of type ; in the second phase, , it produces all the output.
Under certain conditions, it is possible to fuse these two phases—this time, not in order to eliminate an intermediate data structure (after all, the intermediate type need not be structured), but rather in order to allow some production steps to happen before all the consumption steps are complete.
To that end, we define the function as follows:
This takes the same arguments as . It maintains a current state , and produces an output element when it can; and when it can’t produce, it consumes an input element instead. In more detail, it examines the current state using function , which is like the body of an unfold; this may produce a first element of the result and a new state ; when it yields no element, the next element of the input is consumed using function , which is like the body of a ; and when no input remains either, we are done.
The streaming condition for and is that
Consider a state from which the body of the unfold is productive, yielding some . From here we have two choices: we can either produce the output , move to intermediate state , then consume the next input to yield a final state ; or we can consume first to get the intermediate state , and again try to produce. The streaming condition says that this intermediate state will again be productive, and will yield the same output and the same final state . That is, instead of consuming all the inputs first, and then producing all the outputs, it is possible to produce some of the outputs early, without jeopardizing the overall result. Provided that the streaming condition holds for and , then
for all finite lists .
As a simple example, consider the `buffering’ process , where
Note that , so is just a complicated way of writing as a . But the streaming condition holds for and (as you may check), so may be streamed. Operationally, the streaming version of consumes one list from the input list of lists, then peels off and produces its elements one by one; when they have all been delivered, it consumes the next input list, and so on.
The streaming version of is actually rather special, because the production steps can always completely exhaust the intermediate state. In contrast, consider the `regrouping’ example where
from the introduction (here, yields where , with when and otherwise). This transforms an input list of lists into an output list of lists, where each output `chunk’ except perhaps the last has length —if the content doesn’t divide up evenly, then the last chunk is short. One might hope to be able to stream , but it doesn’t quite work with the formulation so far. The problem is that is too aggressive, and will produce short chunks when there is still some input to consume. (Indeed, the streaming condition does not hold for and —why not?) One might try the more cautious producer :
But this never produces a short chunk, and so if the content doesn’t divide up evenly then the last few elements will not be extracted from the intermediate state and will be lost.
We need to combine these two producers somehow: the streaming process should behave cautiously while there is still remaining input, which might influence the next output; but it should then switch to a more aggressive strategy once the input is finished, in order to flush out the contents of the intermediate state. To achieve this, we define a more general flushing stream operator:
This takes an additional argument ; when the cautious producer is unproductive, and there is no remaining input to consume, it uses to flush out the remaining output elements from the state. Clearly, specializing to retrieves the original operator.
The corresponding metamorphism uses an apomorphism in place of the unfold. Define
Then behaves like , except that if and when stops being productive it finishes up by applying to the final state. Similarly, define flushing metamorphisms:
Then we have
for all finite lists if the streaming condition holds for and . In particular,
on finite inputs : the streaming condition does hold for and the more cautious , and once the input has been exhausted, the process can switch to the more aggressive .
The main advantage of streaming is that it can allow the change-of-representation process also to work on infinite inputs. With the plain metamorphism, this is not possible: the will yield no result on an infinite input, and so the will never get started, but the may be able to produce some outputs before having consumed all the inputs. For example, the streaming version of also works for infinite lists, providing that the input does not end with an infinite tail of empty lists. And of course, if the input never runs out, then there is no need ever to switch to the more aggressive flushing phase.
As a more interesting example, consider converting a fraction from base 3 to base 7:
We assume that the input digits are all either 0, 1 or 2, so that the number being represented is in the unit interval.
The fold in is of the wrong kind; but we have also
Here, the intermediate state can be seen as a defunctionalized representation of the function , and applies this function to :
Now there is an extra function between the and the ; but that’s no obstacle, because it fuses with the :
However, the streaming condition does not hold for and . For example,
That is, , but , so it is premature to produce the first digit 2 in base 7 having consumed only the first digit 1 in base 3. The producer is too aggressive; it should be more cautious while input remains that might invalidate a produced digit.
Fortunately, on the assumption that the input digits are all 0, 1, or 2, the unconsumed input—a tail of the original input—again represents a number in the unit interval; so from the state the range of possible unproduced outputs represents a number between and . If these both start with the same digit in base 7, then (and only then) is it safe to produce that digit. So we define
and we have
Now, the streaming condition holds for and (as you may check), and therefore
on finite digit sequences in base 3. Moreover, the streaming program works also on infinite digit sequences, where the original does not.
(Actually, the only way this could possibly produce a finite output in base 7 would be for the input to be all zeroes. Why? If we are happy to rule out this case, we could consider only the case of taking infinite input to infinite output, and not have to worry about reaching the end of the input or flushing the state.)