Elliptic curve method

The elliptic curve method (sometimes called Lenstra elliptic curve factorization, commonly abbreviated as ECM) is a factorization method which computes a large multiple of a point on a random elliptic curve modulo the number to be factored. It is currently the best algorithm known, among those whose complexity depends mainly on the size of the factor found.

This complexity can be represented by:

O (\exp \sqrt{(\log p \log \log p) (1 + O (1)})

where p is the smallest factor.

Maths

Simple Explanation

This is a simple explanation of the Elliptic Curve Method. It is not a mathematical proof of the method, nor is it an explanation of why it works; it is simply an explanation of what the method is.

The Elliptic Curve Method is a way of determining factors of a composite number. This method cannot be used when it is not known in advance that the number is composite, so it cannot be used as a primality test.

First we need to understand some simple mathematical terms. A prime number is a number greater than 1 that is only divisible by itself and 1. A composite number is a number that has divisors that are neither itself, nor 1. A highly composite number is a number that has lots and lots of divisors. We can make highly composite numbers simply by multiplying lots of different prime numbers together. So that 2 x 3 x 5 x 7 x 11 x 13 etc will be a highly composite number. But that is only the first six prime numbers multiplied together. If we multiply the first 2000 prime numbers together we get a huge number that has lots and lots of divisors. The significance of this to the Elliptic Curve Method is that a huge amount of other numbers will have factors in common with our highly composite number.

Modular arithmetic can be thought of as a system of arithmetic in which we only concern ourselves with the remainder after dividing by a specific number called the modulus. So that if we take 17 as our modulus, when we multiply 47 x 23, instead of saying that the answer is 1081, we say it leaves a remainder of 10 after division by the modulus. We can then abbreviate this to 47 x 23 = 10 (mod 17). The significance of this to the Elliptic Curve Method is that all of our maths is done with the number we are trying to factor as the modulus.

If we plug different values for x into an algebraic expression such as $y = 3 x^{2} + 10 x - 9$ , we get out different values for y. If we plug in x = 2, we get out y = 23. Plug in x = 3, we get out y = 48. If we plug in enough different values for x and plot the resultant y values on a graph, when we join the dots on the graph we get a curve. For the expression given above we would get a quadratic curve because the expression $y = 3 x^{2} + 10 x - 9$ is a quadratic expression (because the highest power of x is 2).

If we change the expression to $y^{2} \equiv x^{3} + a x + b$ (where a, b, x and y are integers like 3 or -12, and the sign $\equiv$ means that the difference between the left hand side and the right hand side is a multiple of the number to be factored) we get a different kind of curve called an elliptic curve. For example, if we want to factor the number 77, the point (x, y) = (12, 25) is located in the elliptic curve $y^{2} \equiv x^{3} + 17 x + 2$ because the difference is: $25^{2} - (12^{3} + 17 * 12 + 2) = - 1309$ , a multiple of 77.

So now we have an algebraic expression that generates an elliptic curve. We have a highly composite number that has loads and loads of divisors, and we have some arithmetic being done modulo the number being factored. How do these three go together ?

We start by calculating a point on the curve. (Naturally the computer programmes that do this do not actually draw graphs, they simply calculate the x and y values for a particular point on the curve). Now, by doing some very beautiful mathematics explained below, we can calculate the position of other points on the same curve, the multiples of the original point. This process is called "running a curve" as in "I have run 400 curves" and in terms of the computer program doing this it is one iteration of the algorithm. So we compute the product of the first point times the highly composite number, and then calculate the greatest common divisor of the coordinate x of the found point and the number to be factored. If the greatest common divisor is greater than 1 and less than the number to be factored, congratulations! we have found a proper factor. Otherwise we have to run more curves, using a different starting point.

As described above it probably sounds a bit of a hit-or-miss affair, but there are controls built into the programme. By carefully determining the starting values for the programme it is possible to limit our search to factors of no more than some size measured in bits (a bit is the smallest unit of computer memory). After running the requisite number of curves (detailed on a table on this page) at one particular bit level, we can then start to look for factors at a higher bit level. Even at the higher bit level it is still possible to find factors of the smaller size. You can think of this as being a tall man and his short wife with an umbrella. If the wife is holding the umbrella, when we look under the umbrella we will only find her. But if her husband is holding the umbrella when we look underneath it we may find either of them, or even both. By running sufficient curves at each bit level we can eliminate the possibility of their being any factors hiding under the umbrella.

More detailed explanation

Introduction

The elliptic curve stated above (Weierstrass form) is not the best in terms of computations. The Montgomery form is used instead:

B y^{2} z \equiv x (x^{2} + A x z + z^{2})

(mod

N

) (1)

which is an elliptic curve when $B \neq 0$ and $A \neq \pm 2$ .

Notice that the curve (really a surface in the 3-dimensional space) above always contains the point $x = 0$ , $y = 1$ , $z = 0$ . This point is known as the neutral element O. Also the equation does not change by multiplying all coordinates by the same value $k \neq 0$ .

In this paragraph let's suppose that we work modulo a prime $p$ (this will be an a-priori unknown prime factor of $N$ ). Given the set of points for which (1) holds, we can define the addition operation. So we can write P + Q = R where P, Q and R are points in this set. We can write P + P = S as 2P = S. Given a point P, we could compute 2P, 3P, 4P, and so on. For some value $g$ . we will have $g$ P = O. This value $g$ is known as the group order and depends on the values of $p$ , $A$ and $B$ . Its value is near $p$ , or more exactly between $p - 2 s q r t p$ and $p + 2 s q r t p$ . It should be obvious that $2 g$ P = O, $3 g$ P = O, and so on. For some values of P, it is possible to find $f$ P = O where $f$ is a divisor of the group order, but this does not change the factorization method (in fact it works better).

It is interesting to note that the elliptic curve method does not use the $y$ coordinate, thus making additions and duplications faster.

Step 1

Now if we work modulo the number to be factored, $N$ , if we start with the point P, and somehow we know a multiple of the group order, say $k g$ , we will find that the coordinates $x$ and $z$ of $k g$ P will be multiples of the unknown prime factor. So we compute the greatest common divisor between the coordinate $x$ and the number to be factored $N$ in order to reveal the prime. Notice that it is possible that $k g$ is also multiple of the group factor for another prime factor, especially if $k g$ is very high. In that case the GCD will be a composite factor of $N$ . In the worst case the GCD will be $N$ .

How do we find a multiple of the group order that is not known in advance? Using highly composite numbers. Given a bound B₁, we multiply the original point P by all prime numbers less than B₁ (each prime number is raised to a power such that the result is also about B₁). If the group order is a composite number whose factors are all less than B₁ we are done. Otherwise we have to select another curve and another initial point P or continue with step 2.

Step 2

The step 2 is useful when the group order has a prime factor $q$ between the bounds B₁ and B₂ and all the other prime factors less than B₁. Since this is almost always the case when ECM is successful, this step 2 speeds a lot the elliptic curve method.

Let Q be the point computed in the step 1 (the previous paragraph), so $q$ Q = O (considering arithmetic $(\mod p)$ ). Let $C$ be the multiple of 6 before B₁. The idea is to compute $C$ Q and 6Q. Then using the addition formula we can compute ( $C + 6$ )Q, ( $C + 12$ )Q, ( $C + 18$ )Q, ..., up to B₂Q. If $q = C + 1 + 6 k$ the point computed $(C + 6 k)$ Q will be equal to -Q and if $q = C - 1 + 6 k$ the point computed will be equal to Q. Since the coordinates $x$ of the points Q and -Q are congruent mod p, this coordinate will be congruent to the coordinate $x$ of $(C + 6 k)$ Q (mod p), so we just multiply all $X_{(C + 6 r) Q} - X_{Q} (\mod N)$ . This value will be a multiple of $p$ , so taking the GCD of the product and $N$ the factor $p$ will be revealed.

An improvement to step 2 can be done if there is enough memory to store intermediate values. As a simple example let's consider the modulus 30. The numbers coprime to it are: 1, -1, 7, -7, 11, -11, 13 and -13. So we can precompute a table of four values: Q, 7Q, 11Q and 13Q (of course the first one does not require any calculation), and the values 30Q and $C$ Q as done in the previous paragraph (in this case $C$ is the multiple of 30 before B₁). Then we multiply all $(X_{(C + 30 k) Q} - X_{Q})$ $(X_{(C + 30 k) Q} - X_{7 Q})$ $(X_{(C + 30 k) Q} - X_{11 Q})$ $(X_{(C + 30 k) Q} - X_{13 Q}) (\mod N)$ and take the GCD of the product and $N$ in order to reveal the factor.

The modulus 30 can be replaced by 210, or 2310 if there is enough memory to hold all values of $t$ Q where the number $t$ is positive, less than half the modulus and coprime to it.

Another optimization can be done when the modular multiplications are expensive: when both $C + d k + t$ and $C + d k - t$ are composite, do not multiply the value $X_{(C + d k) Q} - X_{t Q}$ .

For high values of B₂ another method, known as the birthday paradox continuation, is faster.

Formulas for addition and duplication

Since the algorithm does not use the $y$ coordinate we represent a point P as (P_x : : P_z). Notice that the second coordinate is not shown deliberately because it is not used.

If P = -Q then P_x = Q_x and P_z = Q_z (they only differ in the $y$ coordinate).

Let P = (P_x : : P_z) and Q = (Q_x : : Q_z) where P_x / P_z is not equal to Q_x / Q_z (otherwise they are the same point or negatives). Also let P + Q = (X₊ : : Z₊) and P - Q = (X_- : : Z_-).

$\frac{X_{+}}{Z_{+}} = \frac{Z_{-} (U + V)^{2}}{X_{-} (U - V)^{2}}$

where $U = (P_{x} - P_{z}) (Q_{x} + Q_{z})$ and $V = (P_{x} + P_{z}) (Q_{x} - Q_{z})$

Since the previous formula does not work when P_x / P_z = Q_x / Q_z we need another one to compute 2P.

Let P = (P_x : : P_z) and Q = 2P = (Q_x : : Q_z).

We get:

$\frac{Q_{x}}{Q_{z}} = \frac{(P_{x} + P_{z})^{2} (P_{x} - P_{z})^{2}}{T ((P_{x} - P_{z})^{2} + S T)}$

where: $S = \frac{A + 2}{4}$ which can be precomputed when the elliptic curve is selected and

$T = (P_{x} + P_{z})^{2} - (P_{x} - P_{z})^{2}$ .

All the arithmetic above must be done modulo $N$ .

Only modular additions, subtractions and multiplications are needed. No modular inversions are required (they are very expensive in execution time).

In order to compute $S$ without modular inversions, we get the first number of the sequence $A + 2$ ,

$A + 2 + N$ , $A + 2 + 2 N$ and $A + 2 + 3 N$ which is multiple of 4 (which can be seen by inspecting the two least significant bits). Then just divide by 4 that number.

Multiplication

In order to perform multiplication of a point P by a natural number $M$ (which is the main operation in step 1), there are two approaches using the formulas shown above.

Montgomery's PRAC is an almost optimal algorithm, but this is too complicated to explain it here.

Another algorithm is the same used in exponentiation. We represent the value $M$ in binary form. Then erase the first "1" (all numbers when represented in binary start with the digit "1"). Then for every "1" write the string "DA" and for every "0" write "D".

For example for the number 21 which is 10101 when written in binary, you will have the string DDADDA. The letter D means duplication and A means addition.

So if we have to compute 21P using the recipe above we will get the sequence: 2P, 4P, 5P, 10P, 20P, 21P.

A problem with this approach is that the formula for addition shown above for Q + R needs the coordinates of Q - R. This can be overcomed by using another point. When we compute kP we will be also computing (k+1)P.

Then if we start with the points (kP, (k+1)P), when the bit equals zero we compute (2kP, (2k+1)P) and when the bit equals one we compute ((2k+1)P, 2(k+1)P). In both cases we need one duplication and one addition requiring only 10 modular multiplications.

The PRAC algorithm is 30% faster in average than the algorithm shown above to multiply a point by a natural number.

Selecting curve and starting point

Using the Montgomery form of elliptic curve, the group order is always multiple of four. Suyama found a method where the group order is multiple of 12, thus increasing the probability that this number has only small prime factors.

First we have to select a number sigma ( $σ$ ) which is not equal to 0, -1, 1 or 5. Then with the following formulas we can find the coordinates $x$ and $z$ and the elliptic curve parameter $A$ . All computations must be done modulo $N$ .

u = σ^{2} - 5

v = 4 σ

x = u^{3}

z = v^{3}

A = \frac{(v - u)^{3} (3 u + v)}{4 u^{3} v} - 2

Implementations

GMP-ECM
Prime95
A Web application, written by Dario Alpern, to factorize using ECM and SIQS.

Choosing the best parameters for ECM

It's still an open question how to choose the best parameters for ECM. Although, a few conventions have been made, including the B1 parameter for factors up to 70 digits (as you can see, that is a good limit because ECM on 70-digit factors is almost impossible with current hardware). Many programs today use the default B2=100*B1:

Digits	B1(B2=B1*100)	Curves
15	2,000	25
20	11,000	90
25	50,000	300
30	250,000	700
35	1,000,000	1,800
40	3,000,000	5,100
45	11,000,000	10,600
50	43,000,000	19,300
55	110,000,000	49,000
60	260,000,000	124,000
65	850,000,000	210,000
70	2,900,000,000	340,000

GMP-ECM also has default B2 values to use, which are commonly taken by most factorers. Some say that the B2 value doesn't affect the chances of finding a factor much, but as a rule of thumb it's common to spend as many time in stage 2 as in stage 1, and with the progress made on algorithms, this value should be much higher than B1*100. As we can see, using a higher B2 does affect the chances of finding a factor, according to GMP-ECM's probability function. A red square means N/A:

Digits	Value of B1	GMP-ECM B2 default	Curves
15	2,000	147,396
20	11,000	1,873,422	86
25	50,000	12,746,592	214
30	250,000	128,992,510	430
35	1,000,000	1,045,563,762	910
40	3,000,000	5,706,890,290	2,351
45	11,000,000	35,133,391,030	4,482
50	43,000,000	240,490,660,426	7,557
55	110,000,000	776,278,396,540	17,884
60	260,000,000	3,178,559,884,516	42,057
65	850,000,000	15,892,628,251,516	69,471
70	2,900,000,000	105,101,237,217,912	102,212
75	7,600,000,000	425,332,376,469,022	188,056
80	25,000,000,000	2,551,982,328,195,322	265,557

Ongoing factorization effort with ECM

There is an ongoing effort by GIMPS to factorize composite Mersenne and Fermat numbers using ECM. As of May 2018, the lowest composite Mersenne number without any known factors is 2¹²⁷⁷-1. [1]

The largest factor found using ECM so far has 83 decimal digits and was discovered on 2013-09-07 by R. Propper.[2]

External links

Elliptic curve method

Contents

Maths

Simple Explanation

More detailed explanation

Introduction

Step 1

Step 2

Formulas for addition and duplication

Multiplication

Selecting curve and starting point

Implementations

Choosing the best parameters for ECM

Ongoing factorization effort with ECM

External links

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Prime (direct)

Prime Wiki

Tools