Primes Plus Powers, in PythonsteemCreated with Sketch.

in #steemstem6 years ago (edited)

"What is the smallest prime that is the sum of a square and a prime in two different ways?"

This small puzzle has a "tricky" solution which is 3 = 1^2 + 2 = 0^2 + 3. But every prime is a prime plus 0^2 so it's probably more in keeping with the puzzle to ask for positive squares; the answer is then 11 = 2^2 + 7 = 3^2 + 2.

What about the first prime number with three such representations? What about cubes, fourth powers, and higher? Let's turn this brain teaser into some brute-force computational mathematics, and use two of my favorite algorithms along the way.

The Sieve of Eratosthenes, in pure Python

Obviously we need a bunch of primes. The Sieve of Eratosthenes is a very efficient way of generating lots of primes, as long as we keep them relatively small--- and even for larger values, sieving methods are best for generating consecutive primes. The primesieve package is a highly-optimized multi-threaded prime sieve which can generate billions of primes in just a few seconds.

But my favorite go-to is this pure-Python snippet, from a StackOverflow answer by "Robert William Hanks" (Bruno Astrolino E Silva) to python - Fastest way to list all primes below N

def primes2(n):
    """ Input n>=6, Returns a list of primes, 2 <= p < n """
    n, correction = n-n%6+6, 2-(n%6>1)
    sieve = [True] * (n/3)
    for i in xrange(1,int(n**0.5)/3+1):
      if sieve[i]:
        k=3*i+1|1
        sieve[      k*k/3      ::2*k] = [False] * ((n/6-k*k/6-1)/k+1)
        sieve[k*(k-2*(i&1)+4)/3::2*k] = [False] * ((n/6-k*(k-2*(i&1)+4)/6-1)/k+1)
    return [2,3] + [3*i+1|1 for i in xrange(1,n/3-correction) if sieve[i]]

This sieve implementation operates on just numbers that are congruent to 1 or -1 modulo 6; all other possibilities are divisible by 2 and 3, so we can eliminate them right away. Notice that instead of looping over all multiples of a prime explicitly, the code uses Python array slice notation to do the iteration.

Heaps and Enumeration

If we want to efficiently find the smallest example of repeated representations q = p + b^n, it will be helpful if we can visit them in increasing order. That way we don't have to remember all the primes we've visited, which could take a large amount of memory.

One efficient mechanism is described by Daniel J. Bernstein (a mathematician and cryptography researcher) in "Enumerating Solutions to p(a) + q(b) = r(c) + s(d)", Mathematics of Computation, Vol 70 Issue 223, January 2001. This method is particularly easy to implement in Python.

A heap is a data structure frequently introduced in computer science courses, but which doesn't get a lot of varied uses. With a heap you can build priority queues, and implement heapsort, and... that's the point at which most people's memory runs dry. (It's used in some graph algorithms, too.)

LIKE THIS:

By Ermishin - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=12251273

**NOT LIKE THIS:** (Author's personal heap)

Today is no exception; we're going to use a heap as a priority queue. The great thing about a heap is that extracting the minimum is extremely efficient (it's constant time) and replacing that minimum by a new value is still efficient (logarithmic time.) This means we can search a quite large range of numbers without worrying about CPU costs. It's also memory-efficient as it can be implemented as a simple array. The heapq Python module implements a heap.

In the full generality described by Bernstein , the elements of a heap would be tuples (p(a)+q(b),a,b), initialized for all values of b (up to some maximum), using a = 1. The minimum value on the heap is thus the smallest possible value of the sum p(a) + q(b), as long as p is nondecreasing. We can extract that value, and then replace the minimum tuple (p(1)+q(b),1,b) with (p(2)+q(b),2,b), by subtracting p(1) and adding p(2) to get a higher value. Keep doing that until we hit the maximum value of a that we're wiling to use, and we will visit all the p(a)+q(b) values in order from smallest to largest.

The nice things about using heapq are that it is a min-heap, which is just what we wanted, and that the sort order for Python tuples is lexicographic order. So if we put the value first, as in the description above, the heap will store the entire tuple but be ordered in ascending value, without any additional work.

For the "primes plus powers" problem one of the two functions is just the identity, so we can get rid of it and store a 2-tuple instead of a 3-tuple. Here's a generator which returns all of the numbers of the form prime + x^power, up to some maximum p and maximum x, in order:

import heapq

def enumeratePowersPlusPrimes( power, maxPrime = 1000000, maxBase = 10000 ):
    # We don't need to store the prime explicitly, we can always recover it.
    # Our heap contains the total value y, and the x such that y = p + x**power
    h = [ (p+1,1) for p in primes2( maxPrime ) ]
    heapq.heapify( h )
    while len( h ) > 0:
        ( y, n ) = h[0]
        yield ( y, y - n**power, n )
        if n <= maxBase:
            yPrime = y - n ** power + (n+1) ** power
            heapq.heapreplace( h, ( yPrime, n+1 ) )
        else:
            heapq.heappop( h )

Example usage (sorry, this is the Python 2 format) showing the numbers of the form prime + x^3, in order:

>>> g = enumeratePowersPlusPrimes( 3 )
>>> g.next()
(3, 2, 1)
>>> g.next()
(4, 3, 1)
>>> g.next()
(6, 5, 1)
>>> g.next()
(8, 7, 1)
>>> g.next()
(10, 2, 2)

Check: 10 = 2 + 2^3

Last pieces

Now we just need a way to detect two of the same value in a row, which is pretty easy. There's probably some cute way to do it with zip or the itertools package, but this is fine:

def matchTwo( sequence ):
    last = sequence.next()
    for x in sequence:
        if x[0] == last[0]:
            yield ( last, x )
        else:
            last = x

And, the puzzle asked not just for any numbers with this representation but for prime numbers. What we could do, if we wanted to be fancy, is walk through the primes in ascending order, or use a fast primality test, but I got lazy and just put our primes into a set for easy reference:

def search( power, maxPrime = 1000000, maxResults = 100 ):
    primes = set( primes2( maxPrime ) )
    for (y1,p1,base1),(y2,p2,base2) in matchTwo( enumeratePowersPlusPrimes( power, maxPrime ) ):
        if y1 in primes:
            print "{}^{} + {} = {} = {}^{} + {}".format( base1, power, p1, y1, base2, power, p2 )
            maxResults -= 1
            if maxResults == 0:
                break

Results

Primes plus squares

>>> search( 2 )
2^2 + 7 = 11 = 3^2 + 2
2^2 + 19 = 23 = 4^2 + 7
2^2 + 37 = 41 = 6^2 + 5
2^2 + 43 = 47 = 4^2 + 31
2^2 + 43 = 47 = 6^2 + 11
4^2 + 37 = 53 = 6^2 + 17
4^2 + 43 = 59 = 6^2 + 23
6^2 + 31 = 67 = 8^2 + 3
2^2 + 67 = 71 = 8^2 + 7
2^2 + 79 = 83 = 4^2 + 67
2^2 + 79 = 83 = 6^2 + 47
2^2 + 79 = 83 = 8^2 + 19
2^2 + 79 = 83 = 9^2 + 2
4^2 + 73 = 89 = 6^2 + 53
2^2 + 97 = 101 = 8^2 + 37
6^2 + 67 = 103 = 10^2 + 3
2^2 + 103 = 107 = 6^2 + 71
2^2 + 103 = 107 = 8^2 + 43
2^2 + 103 = 107 = 10^2 + 7
2^2 + 109 = 113 = 4^2 + 97
2^2 + 109 = 113 = 10^2 + 13
2^2 + 127 = 131 = 8^2 + 67
2^2 + 127 = 131 = 10^2 + 31
6^2 + 101 = 137 = 8^2 + 73
6^2 + 101 = 137 = 10^2 + 37
6^2 + 113 = 149 = 12^2 + 5
6^2 + 127 = 163 = 12^2 + 19
...

There's our desired solution, right at the top. If we look a little further down the list, we can see the first prime with three representations as a prime plus a square:

47 = 2^2 + 43 = 4^2 + 31 = 6^2 + 11

and the first prime with four representations:

83 = 2^2 + 79 = 4^2 + 67 = 8^2 + 19 = 9^2 + 2

That last one is sort of special; the only way a prime can be the sum of an odd square and a prime, is if that prime is two. So most of our squares are going to be even, as you can see in the lists given here.

Whenever we have a list of numbers that we're interested in, we should look it up in the Online Encylopedia of Integer Sequences and see if somebody has already studied it. This time we come up empty: no matching sequences, as of 7/30 If we managed to show something particularly interesting about primes of this sort, we could submit it as a new sequence. But there are plenty of other sequences that involve primes and squares.

Primes plus Positive Cubes

(Every solution p = (-x)^3 + q corresponds to q = p + x^3 so we're not missing anything by not using negative numbers.)

>>> search( 3 )
2^3 + 59 = 67 = 4^3 + 3
4^3 + 163 = 227 = 6^3 + 11
4^3 + 193 = 257 = 6^3 + 41
4^3 + 199 = 263 = 6^3 + 47
2^3 + 269 = 277 = 6^3 + 61
4^3 + 283 = 347 = 6^3 + 131
2^3 + 359 = 367 = 6^3 + 151
2^3 + 389 = 397 = 6^3 + 181
2^3 + 401 = 409 = 6^3 + 193
2^3 + 431 = 439 = 6^3 + 223
4^3 + 379 = 443 = 6^3 + 227
2^3 + 449 = 457 = 6^3 + 241

Primes plus Seventh Powers

>>> search( 7 )
2^7 + 280001 = 280129 = 6^7 + 193
4^7 + 263803 = 280187 = 6^7 + 251
2^7 + 280121 = 280249 = 6^7 + 313
4^7 + 263869 = 280253 = 6^7 + 317
4^7 + 263953 = 280337 = 6^7 + 401
2^7 + 280409 = 280537 = 6^7 + 601
2^7 + 280499 = 280627 = 6^7 + 691
4^7 + 264529 = 280913 = 6^7 + 977
4^7 + 264769 = 281153 = 6^7 + 1217
2^7 + 281189 = 281317 = 6^7 + 1381
2^7 + 281291 = 281419 = 6^7 + 1483
2^7 + 281429 = 281557 = 6^7 + 1621
2^7 + 281609 = 281737 = 6^7 + 1801
4^7 + 265399 = 281783 = 6^7 + 1847
2^7 + 281669 = 281797 = 6^7 + 1861
4^7 + 265423 = 281807 = 6^7 + 1871
4^7 + 265483 = 281867 = 6^7 + 1931
2^7 + 282101 = 282229 = 6^7 + 2293
2^7 + 282281 = 282409 = 6^7 + 2473
4^7 + 266029 = 282413 = 6^7 + 2477
2^7 + 282311 = 282439 = 6^7 + 2503
...

Even that just takes a couple seconds. To search for larger powers we'd have to increase the number of primes, as even 4^11 is 4194304, and that's when things start to slow down.

Primes plus Eleventh Powers

An exercise for the reader! I tried with maxPrimes=100000000 (which took about 1.5GB of memory) and that was insufficient. A C, C++, or Rust implementation would be more efficient in both CPU and memory. Also, it would be probably desirable to replace the 2nd list of primes used for checking, with a fast primality test known to be accurate over the range of interest. (See http://miller-rabin.appspot.com/ for a collection of "best known bases" for which Miller-Rabin testing is guaranteed to be correct up to some maximum value.) Note that the code above does not terminate its search when the enumerated values are too large for the primality test to succeed.

A further optimization is to note that the difference between powers need not be recalculated each time. We could store them in a table (as Bernstein suggests) for easy lookup, which would reduce the cost of each step.

References and Related Reading:

Sort:  

Hi @markgritter!

Your post was upvoted by utopian.io in cooperation with steemstem - supporting knowledge, innovation and technological advancement on the Steem Blockchain.

Contribute to Open Source with utopian.io

Learn how to contribute on our website and join the new open source economy.

Want to chat? Join the Utopian Community on Discord https://discord.gg/h52nFrV

Coin Marketplace

STEEM 0.22
TRX 0.26
JST 0.040
BTC 96371.40
ETH 3383.73
USDT 1.00
SBD 3.17