The Genuine Sieve of Eratosthenes

The “Sieve of Eratosthenes” is often used as a nice example of the expressive power of streams (infinite lazy lists) and list comprehensions:

\displaystyle  \begin{array}{@{}l} \mathit{primes} = \mathit{sieve}\;[2.\,.] \;\textbf{where} \\ \qquad \mathit{sieve}\;(p:\mathit{xs}) = p : \mathit{sieve}\;[ x \mid x \leftarrow \mathit{xs}, x \mathbin{\mathrm{mod}} p \ne 0] \end{array}

That is, {\mathit{sieve}} takes a stream of candidate primes; the head {p} of this stream is a prime, and the remaining primes are obtained by removing all multiples of {p} from the candidates and sieving what’s left. (It’s also a nice unfold.)

Unfortunately, as Melissa O’Neill observes, this nifty program is not the true Sieve of Eratosthenes! The problem is that for each prime {p}, every remaining candidate {x} is tested for divisibility. O’Neill calls this bogus common version “trial division”, and argues that the Genuine Sieve of Eratosthenes should eliminate every multiple of {p} without reconsidering all the candidates in between. That is, only {{}^{1\!}/_{\!2}} of the natural numbers are touched when eliminating multiples of 2, less than {{}^{1\!}/_{\!3}} of the remaining candidates for multiples of 3, and so on. As an additional optimization, it suffices to eliminate multiples of {p} starting with {p^2}, since by that point all composite numbers with a smaller nontrivial factor will already have been eliminated.

O’Neill’s paper presents a purely functional implementation of the Genuine Sieve of Eratosthenes. The tricky bit is keeping track of all the eliminations when generating an unbounded stream of primes, since obviously you can’t eliminate all the multiples of one prime before producing the next prime. Her solution is to maintain a priority queue of iterators; indeed, the main argument of her paper is that functional programmers are often too quick to use lists, when other data structures such as priority queues might be more appropriate.

O’Neill’s paper was published as a Functional Pearl in the Journal of Functional Programming, when Richard Bird was the handling editor for Pearls. The paper includes an epilogue that presents a purely list-based implementation of the Genuine Sieve, contributed by Bird during the editing process. And the same program pops up in Bird’s textbook Thinking Functionally with Haskell (TFWH). This post is about Bird’s program.

The Genuine Sieve, using lists

Bird’s program appears in §9.2 of TFWH. It deals with lists, but these will be infinite, sorted, duplicate-free streams, and you should think of these as representing infinite sets, in this case sets of natural numbers. In particular, there will be no empty or partial lists, only properly infinite ones.

Now, the prime numbers are what you get by eliminating the composite numbers from the “plural” naturals (those greater than one), and the composite numbers are the proper multiples of the primes—so the program is cleverly cyclic:

\displaystyle  \begin{array}{@{}l@{\;}l} \multicolumn{2}{@{}l}{\mathit{primes}, \mathit{composites} :: [\mathit{Integer}]} \\ \mathit{primes} & = 2 : ([3.\,.] \mathbin{\backslash\!\backslash} \mathit{composites}) \\ \mathit{composites} & = \mathit{mergeAll}\;[ \mathit{multiples}\;p \mid p \leftarrow \mathit{primes} ] \\ \end{array}

We’ll come back in a minute to {\mathit{mergeAll}}, which unions a set of sets to a set; but {(\backslash\!\backslash)} is the obvious implementation of list difference of sorted streams (hence, representing set difference):

\displaystyle  \begin{array}{@{}l@{\;}l@{\;}l} \multicolumn{3}{@{}l}{(\mathbin{\backslash\!\backslash}) :: \mathit{Ord}\;a \Rightarrow [a] \rightarrow [a] \rightarrow [a]} \\ (x:\mathit{xs}) \mathbin{\backslash\!\backslash} (y:\mathit{ys}) & \mid x < y & = x:(\mathit{xs} \mathbin{\backslash\!\backslash} (y:\mathit{ys})) \\ & \mid x == y & = \mathit{xs} \mathbin{\backslash\!\backslash} \mathit{ys} \\ & \mid x > y & = (x:\mathit{xs}) \mathbin{\backslash\!\backslash} \mathit{ys} \end{array}

and {\mathit{multiples}\;p} generates the multiples of {p} starting with {p^2}:

\displaystyle  \begin{array}{@{}l} \mathit{multiples}\;p = \mathit{map}\; (p\times)\;[p.\,.] \end{array}

Thus, the composites are obtained by merging together the infinite stream of infinite streams {[ [ 4, 6.\,.], [ 9, 12.\,.], [ 25, 30.\,.], \dots ]}. You might think that you could have defined instead {\mathit{primes} = [2.\,.] \mathbin{\backslash\!\backslash} \mathit{composites}}, but this doesn’t work: this won’t compute the first prime without first computing some composites, and you can’t compute any composites without at least the first prime, so this definition is unproductive. Somewhat surprisingly, it suffices to “prime the pump” (so to speak) just with 2, and everything else flows freely from there.

Returning to {\mathit{mergeAll}}, here is the obvious implementation of {\mathit{merge}}, which merges two sorted duplicate-free streams into one (hence, representing set union):

\displaystyle  \begin{array}{@{}l@{\;}l@{\;}l} \multicolumn{3}{@{}l}{\mathit{merge} :: \mathit{Ord}\;a \Rightarrow [a] \rightarrow [a] \rightarrow [a]} \\ \mathit{merge}\;(x:\mathit{xs})\;(y:\mathit{ys}) & \mid x < y & = x : \mathit{merge}\;\mathit{xs}\;(y:\mathit{ys}) \\ & \mid x == y & = x : \mathit{merge}\;\mathit{xs}\;\mathit{ys} \\ & \mid x > y & = y : \mathit{merge}\;(x:\mathit{xs})\;\mathit{ys} \end{array}

Then {\mathit{mergeAll}} is basically a stream fold with {\mathit{merge}}. You might think you could define this simply by {\mathit{mergeAll}\;(\mathit{xs}:\mathit{xss}) = \mathit{merge}\;\mathit{xs}\;(\mathit{mergeAll}\;\mathit{xss})}, but again this is unproductive. After all, you can’t merge the infinite stream of sorted streams {[ [5,6.\,.], [4,5.\,.], [3,4.\,.], \dots ]} into a single sorted stream, because there is no least element. Instead, we have to exploit the fact that we have a sorted stream of sorted streams; then the binary merge can exploit the fact that the head of the left stream is the head of the result, without even examining the right stream. So, we define:

\displaystyle  \begin{array}{@{}l} \mathit{mergeAll} :: \mathit{Ord}\;a \Rightarrow [[a]] \rightarrow [a] \\ \mathit{mergeAll}\;(\mathit{xs}:xss) = \mathit{xmerge}\;\mathit{xs}\;(\mathit{mergeAll}\;xss) \medskip \\ \mathit{xmerge} :: \mathit{Ord}\;a \Rightarrow [a] \rightarrow [a] \rightarrow [a] \\ \mathit{xmerge}\;(x:\mathit{xs})\;\mathit{ys} = x : \mathit{merge}\;\mathit{xs}\;\mathit{ys} \end{array}

This program is now productive, and {\mathit{primes}} yields the infinite sequence of prime numbers, using the genuine algorithm of Eratosthenes.

Approximation Lemma

Bird uses the cyclic program as an illustration of the Approximation Lemma. This states that for infinite lists {\mathit{xs}}, {\mathit{ys}},

\displaystyle  (\mathit{xs} = \mathit{ys}) \quad \Leftrightarrow \quad (\forall n \in \mathbb{N} \mathbin{.} \mathit{approx}\;n\;\mathit{xs} = \mathit{approx}\;n\;\mathit{ys})

where

\displaystyle  \begin{array}{@{}l@{\;}l@{\;}l} \multicolumn{3}{@{}l}{\mathit{approx} :: \mathit{Integer} \rightarrow [a] \rightarrow [a]} \\ \mathit{approx}\;(n+1)\;[\,] & = & [\,] \\ \mathit{approx}\;(n+1)\;(x:\mathit{xs}) & = & x : \mathit{approx}\;n\;\mathit{xs} \end{array}

Note that {\mathit{approx}\;0\;\mathit{xs}} is undefined; the function {\mathit{approx}\;n} preserves the outermost {n} constructors of a list, but then chops off anything deeper and replaces it with {\bot} (undefined). So, the lemma states that to prove two infinite lists equal, it suffices to prove equal all their finite approximations.

Then to prove that {\mathit{primes}} does indeed produce the prime numbers, it suffices to prove that

\displaystyle  \mathit{approx}\;n\;\mathit{primes} = p_1 : \cdots : p_n : \bot

for all {n}, where {p_j} is the {j}th prime (so {p_1=2}). Bird therefore defines

\displaystyle  \mathit{prs}\;n = \mathit{approx}\;n\;\mathit{primes}

and claims that

\displaystyle  \begin{array}{@{}ll} \mathit{prs}\;n & = \mathit{approx}\;n\;(2 : ([3.\,.] \mathbin{\backslash\!\backslash} \mathit{crs}\;n)) \\ \mathit{crs}\;n & = \mathit{mergeAll}\;[\mathit{multiples}\;p \mid p \leftarrow \mathit{prs}\;n] \end{array}

To prove the claim, he observes that it suffices for {\mathit{crs}\;n} to be well defined at least up to the first composite number greater than {p_{n+1}}, because then {\mathit{crs}\;n} delivers enough composite numbers to supply {\mathit{prs}\;(n+1)}, which will in turn supply {\mathit{crs}\;(n+1)}, and so on. It is a “non-trivial result in Number Theory” (in fact, a consequence of Bertrand’s Postulate) that {p_{n+1} < p_n^2}; therefore it suffices that

\displaystyle  \mathit{crs}\;n = c_1 : \cdots : c_m : \bot

where {c_j} is the {j}th composite number (so {c_1=4}) and {c_m = p_n^2}. Completing the proof is set as Exercise 9.I of TFWH, and Answer 9.I gives a hint about using induction to show that {\mathit{crs}\;(n+1)} is the result of merging {\mathit{crs}\;n} with {\mathit{multiples}\;p_{n+1}}. (Incidentally, there is a typo in TFWH: both the body of the chapter and the exercise and its solution have “{m = p_n^2}” instead of “{c_m = p_n^2}”.)

Unfortunately, the hint in Answer 9.I is simply wrong. For example, it implies that {\mathit{crs}\;2} (which equals {4:6:8:9:\bot}) could be constructed from {\mathit{crs}\;1} (which equals {4:\bot}) and {\mathit{multiples}\;3} (which equals {[9,12.\,.]}); but this means that the 6 and 8 would have to come out of nowhere. Nevertheless, the claim in Exercise 9.I is valid. What should the hint for the proof have been?

Fixing the proof

The problem is made tricky by the cyclicity. Tom Schrijvers suggested to me to break the cycle by defining

\displaystyle  \mathit{crsOf}\;\mathit{qs} = \mathit{mergeAll}\;[\mathit{multiples}\;p \mid p \leftarrow \mathit{qs}]

so that {\mathit{crs}\;n = \mathit{crsOf}\;(\mathit{prs}\;n)}; this allows us to consider {\mathit{crsOf}} in isolation from {\mathit{prs}}. In fact, I claim (following a suggestion generated by Geraint Jones and Guillaume Boisseau at our Algebra of Programming research group) that if {\mathit{qs} = q_1:\dots:q_n:\bot} where {1 < q_1 < \cdots < q_n}, then {\mathit{crsOf}\;\mathit{qs}} has a well-defined prefix consisting of all the composite numbers {c} such that {c \le q_n^2} and {c} is a multiple of some {q_i} with {c \ge q_i^2}, in ascending order, then becomes undefined. That’s not so difficult to see from the definition: {\mathit{multiples}\;q_i} contains the multiples of {q_i} starting from {q_i^2}; all these streams get merged; but the result gets undefined (in {\mathit{merge}}) once we pass the head {q_n^2} of the last stream. In particular, {\mathit{crsOf}\;(\mathit{prs}\;n)} has a well-defined prefix consisting of all the composites up to {p_n^2}, then becomes undefined, as required.

However, I must say that I still find this argument unsatisfactory: I have no equational proof about the behaviour of {\mathit{crsOf}}. In fact, I believe that the Approximation Lemma is unsufficient to provide such a proof: {\mathit{approx}} works along the grain of {\mathit{prs}}, but against the grain of {\mathit{crs}}. I believe we want something instead like an “Approx While Lemma”, where {\mathit{approxWhile}} is to {\mathit{approx}} as {\mathit{takeWhile}} is to {\mathit{take}}:

\displaystyle  \begin{array}{@{}l} \mathit{approxWhile} :: (a \rightarrow \mathit{Bool}) \rightarrow [a] \rightarrow [a] \\ \mathit{approxWhile}\;p\;(x:\mathit{xs}) \mid p\;x = x : \mathit{approxWhile}\;p\;\mathit{xs} \end{array}

Thus, {\mathit{approxWhile}\;p\;\mathit{xs}} retains elements of {\mathit{xs}} as long as they satisfy {p}, but becomes undefined as soon as they don’t (or when the list runs out). Then for all sorted, duplicate-free streams {\mathit{xs},\mathit{ys}} and unbounded streams {\mathit{bs}} (that is, {\forall n \in \mathbb{N} \mathbin{.} \exists b \in \mathit{bs} \mathbin{.} b > n}),

\displaystyle  (\mathit{xs} = \mathit{ys}) \quad \Leftrightarrow \quad (\forall b \in \mathit{bs} \mathbin{.} \mathit{approxWhile}\;(\le b)\;\mathit{xs} = \mathit{approxWhile}\;(\le b)\;\mathit{ys})

The point is that {\mathit{approxWhile}} works conveniently for {\mathit{merge}} and {(\mathbin{\backslash\!\backslash})} and hence for {\mathit{crs}} too. I am now working on a proof by equational reasoning of the correctness (especially the productivity) of the Genuine Sieve of Eratosthenes…

About jeremygibbons

Jeremy Gibbons is Professor of Computing in Oxford University Department of Computer Science, and a fan of functional programming and patterns of computation.
This entry was posted in Uncategorized. Bookmark the permalink.

2 Responses to The Genuine Sieve of Eratosthenes

  1. Running Spring says:

    This may not be useful for the proof, but it’s interesting to see that the Approx lemma (and ApproxWhile lemma to a lesser extent) is again a consequence of Yoneda embedding, in fact, Yoneda embedding restricted to a dense subcategory.

    Let S be the preorder of possibly infinite sequences of integers, and x <= y if x is a prefix of y. Then Yoneda embedding for this category tells us x = y holds exactly when for any (possibly infinite sequences) z, z <= x iff z <= y.
    Because the full subcategory of finite sequences is _dense_ in S, we can restrict the range of z to only finite sequences. This is the Approx lemma.

    Similarly for ApproxWhile, we take the dense subcategory to be the sequences that are bounded by some b_i in the statement of ApproxWhile lemma.

    I think this is a nice little example of restricting the Yoneda embedding to a dense subcategory, which is still faithful.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s