Self-initializing quadratic sieve

From Prime-Wiki
Jump to: navigation, search

The self-initializing quadratic sieve (SIQS) is a factorization method based on the multiple polynomial quadratic sieve (MPQS).

Introduction

Let N be the number to be factored. This number must not be a perfect power. If somehow we find two integers X and Y such that [math]X^2\equiv Y^2\,\pmod N[/math] and [math]X\not\equiv \pm Y\pmod N[/math], then [math]\gcd(X+Y,\, N)[/math] will reveal a proper factor of N.

In order to find these values X and Y the method finds relations which have the form [math]t^2 \equiv u\,\pmod N[/math] where u is the product of small prime numbers. The set of these primes is the factor base. These relations will be found using sieving as will be explained below.

The relations found are combined using multiplications. The left hand side will always be a square because is a product of squares, so the goal is to have a square at the right hand side. A number is a square when all its prime factors appear an even number of times.

For example: let the number to factor be N = 1817 and we have found the following relations with factor base = {2, 7, 13}:

[math]45^2\,\equiv \,2^4*7^0*13^1[/math]
[math]123^2\,\equiv \,2^{10}*7^0*13^1[/math]

Both relations have non-square RHS, since the exponent of 13 is not even. But multiplying them we get:

[math]84^2\,\equiv \,2^{14}*13^2[/math]
[math]84^2\,\equiv \,(2^7*13)^2[/math]

Since [math]2^7*13\,\equiv\,1664[/math] we get the factor [math]\gcd(84+1664,\,1817) = 23[/math].

Which relations have to multiplied to find a square in the RHS is a linear algebra problem.

Sieving

We will sieve in the interval [-M, +M] using several polynomials [math]g_{a,b}(x)\, =\, (ax + b)^2 - N[/math] where [math]b^2 - n\, =\, ac[/math] so [math]g_{a,b}(x)\, =\,a(ax^2+bx+c)[/math] with different values of [math]a[/math] and [math]b[/math]. The advantage here is that we know the factorization of [math]a[/math], so the other factor is small compared to [math]N[/math], thus increasing the probability that all its prime factors are included in the factor base. Using the notation of the previous section, [math]t^2\equiv u\,\pmod N[/math] where [math]t = ax+b[/math] and [math]u = g_{a,b}(x)[/math].

The advantage of this method versus MPQS is that once the first polynomial to sieve is generated, a lot of polynomials using the same value of the coefficient [math]a[/math] can be quickly generated using the information used for the first one. This speeds a lot the sieving.

Generating the factor base

In order to determine the factor base, we have to notice that it includes all the possible factors of [math]g_{a,b}(x)[/math]. Since [math]g_{a,b}(x) = t^2 - N[/math], we get that [math]0 \equiv t^2 - N[/math] or [math]t^2 \equiv N[/math] modulo any prime of the factor base. This means that about half of the primes cannot be in the factor base because [math]N[/math] cannot be a quadratic residue modulo that prime. Notice that the content of this paragraph does not depend on the particular values of [math]a[/math] or [math]b[/math], so the factor base can be computed at the beginning of SIQS. The upper bound for the factor base is determined from experimentation, for example:

  • Factoring a 60 digit number: F = 60000
  • Factoring a 70 digit number: F = 350000
  • Factoring a 80 digit number: F = 900000

The next step consists in computing the modular square roots of [math]N[/math] modulo the different primes in the factor base and computing the logarithms in base 2 of the different primes rounded to the nearest integer, storing these values in the arrays [math]\operatorname{tsqrt}[/math] and [math]\operatorname{tlog}[/math].

Generating the first polynomial

Now we compute a value of [math]a[/math] as the product of several primes [math]q_1[/math], [math]q_2[/math], [math]q_3[/math], ..., [math]q_j[/math], of the factor base, such that:

  • Their product is about [math]\sqrt{2N}/M[/math].
  • Each factor is on the range 2000 to 4000.
  • When selecting the primes of the factor base for the second and further values of the coefficient [math]a[/math], there must be at least two different primes from previous selections.

The last two considerations reduce the probability of finding duplicate relations, i.e., relations that can be deduced from the others, that would be discarded by the linear algebra operations, thus increasing the computation time.

If the factors are very large, we will run out of polynomials too quickly and we have to select another value of the coefficient [math]a[/math] which is an expensive process.

Now for each value of the index [math]i[/math] such that [math]1\le i\le j[/math] perform the following:

  • Compute [math]\gamma = \operatorname{tsqrt}[q_i] * (a/q_i)^{-1}\,\bmod{q_i}[/math] where [math]a/q_i\,\pmod{q_i}[/math] is computed by multiplying [math]q_1 * q_2 * ... * q_n\,\pmod{q_i}[/math] excluding [math]q_i[/math] from the factors.
  • If [math]\gamma \gt q_i/2[/math] then replace [math]\gamma[/math] with [math]q_i - \gamma[/math].
  • Let [math]B[i] = (a/q_i)\,\gamma[/math] using multiple precision. This means that this array holds multiple precision integers.

Let [math]b = B[1] + B[2] + ... + B[j][/math] and [math]g_{a,b}(x)\, =\,(ax+b)^2-N[/math]. Then initialize the sieve array to zeros.

For each prime [math]p[/math] not in the list [math]q_i[/math] of factors of the coefficient [math]a[/math]:

  • Compute [math]\operatorname{ainv}[/math] as the inverse of [math]a[/math] modulo [math]p[/math]. This can be computed using single precision, provided that [math]p^2[/math] fits in a computer word.
  • For each value of the index [math]i[/math] such that [math]1\le i\le j[/math] compute [math]\operatorname{Bainv2}[i,p] = 2\, B[i] \,\operatorname{ainv}\,\bmod p[/math].
  • Compute the first solution as [math]\operatorname{soln1} = \operatorname{ainv} * (\operatorname{tsqrt}[p] - b) \bmod p[/math] and add [math]\operatorname{tlog}[p][/math] to all locations [math]\operatorname{soln1} + k*p[/math] of the sieve array.
  • Compute the second solution as [math]\operatorname{soln2} = \operatorname{ainv} * (-\operatorname{tsqrt}[p] - b) \bmod p[/math] and add [math]\operatorname{tlog}[p][/math] to all locations [math]\operatorname{soln2} + k*p[/math] of the sieve array.

Obtaining relations

Walk through the sieve array. For each element greater than about [math]\log(2x\sqrt {N})[/math] minus a small error term (because the logarithms are rounded and we do not sieve with powers of prime numbers), perform a trial division of [math]g_{a,b}(x)/a[/math] by the elements of the factor base. If all prime factors are less than or equal to F, save the relation. If there are enough relations (generally about 5% more relations than elements of the factor base), perform the linear algebra stage. Otherwise more relations must be collected. When the sieve stage is exhausted, change polynomial.

Changing polynomial

Suppose we generated polynomial number i and we want to generate polynomial number i+1 where [math]1\le i\le 2^{j-1}-1[/math] where j is the number of prime factors of the coefficient a. If [math]i = 2^{j-1}-1[/math], go to the Generating the first polynomial subsection.

Let v be the number of bits set to zero at the right of the number 2i when written in binary.

If the bit immediately at the left of the rightmost "1" when the number i written in binary is zero, let e = 1, otherwise let e = -1.

Let [math]b = b + e * B_v[/math] and [math]g_{a.b}(x) = (ax+b)^2 - N[/math].

Initialize the sieve array to zeros.

For each prime [math]p[/math] in the factor base that does not divide the coefficient [math]a[/math]:

  • Compute [math]\operatorname{soln1}[p] = \operatorname{soln1}[p] + e*Bainv2[v,p] \bmod{p}[/math] and add [math]\operatorname{tlog}[p][/math] to all locations [math]\operatorname{soln1} + k*p[/math] of the sieve array.
  • Compute [math]\operatorname{soln2}[p] = \operatorname{soln2}[p] + e*Bainv2[v,p] \bmod{p}[/math] and add [math]\operatorname{tlog}[p][/math] to all locations [math]\operatorname{soln1} + k*p[/math] of the sieve array.

Then go to the Obtaining relations subsection.

Large prime variation

A variation that makes the sieving about twice as fast but requiring more memory, named PSIQS, consists in selecting a large prime bound (about 64 F). If after the trial division by the primes in the factor base the quotient is less than the large prime bound, we collect this partial relation in a different location than full relations (when the final quotient is 1). If the large prime found is the same than the large prime found in a previous partial relation, we can combine them:nnLet [math]t^2\equiv u * p\,\pmod{N}[/math] and [math]r^2\equiv v * p\,\pmod{N}[/math] be two partial relations with the same large prime [math]p[/math]. Then a full relation will be:

[math]\left(\frac {t*r}{p}\right)^2\,\equiv u*v\,\pmod{N}[/math].

This full relation can be stored with the other full relations found. At the beginning of PSIQS the number of full relations found in this way will be small, but it increases as the algorithm progresses, and then there will be more relations found from partial relations than without large primes.

Complexity

The theoretical time and space complexity of the quadratic sieve is O(exp(sqrt(log n log log n))) where n is an integer. The constant e is usually used as the base of the logarithm.

External links