Guest
|
Posted: Mon Oct 06, 2008 5:02 pm Post subject: Fast inversion in GF(p^N), 4 times down to to 2 times slower |
|
|
Fast inversion in GF(p^N), 4 times down to to 2 times slower than
multiplication
The field GF(p^N) has a reduction polinomial of the form "x^N + P"
where P has exactly N digits modulo a subfield FFT_MOD, x^N is makes
it monic, and x is the p from the definition. "x^N + P" is preferred
to be irreducible.
Well as many try to invert in this field with complicated methods,
there>s a simple Euclid-like method that does the job with the cost
similar to 4 multiplications and 1 subfield inversion.
In the Euclidean algorithm we start with A = D = 1 ; B = C = 0 ; X =
_X (the input number to invert) ; Y = x^N + P and we have these
equations:
A*_X + B*_Y = X
C*_X + D*_Y = Y
Note that we could chose W = highest_digit_val(Y) /
highest_digit_val(X) and reformulate :
A = A*W - C ; B = B*W - D ; X = X*W - Y (note X is loosing it>s
highest digit)
But this method requires approx 2N subfield inversions and makes it
inefficient in small and practical situations.
If we chose: V = highest_digit_val(Y) ; W= highest_digit_val(X)
We>d use: A = A*V - C*W ; B = B*V - D*W ; X = X*V - Y*W ; (note X is
also loosing it>s highest digit)
So we doubled the number of subfield multiplications and we need no
subfield division (quite efficient when N < lg(FFT_MOD) ).
This way we deduced a method to cut one digit at a time from a 2N set
of digits onwards computing the inverse. This would lead to a metod
for inversion using 4N^2 + O(N) simple subfield operations. The catch
is that we can improve it still, by cutting 2 or more digits at a
time, and as we>ll see we>ll get to a formula that inversion time I(N)
is close to " 2 * M(N) * (1 + 1/s) " where s is the number of digits
we cut at a time, when we deduct a method to perform it, M(N) is the
time of the multiplication, for this problem we>re using M(N) = N^2, .
Let the polynomials to be reduced start with :
X = x3 x2 x1 x0 ...
Y = y3 y2 y1 y0 ...
step 1 : 1 digit cut (to enter we need x3 != 0 and y3 != 0)
X <-- X*y3 - Y*x3
X = 0 ; x2y3 - y2x3 = k1 ; x1y3 - y1x3 = k2 ; ...
Y = y3 ; y2 ; y1 ; ...
step 2 : 2 digits cut (to enter we need k1 != 0 and y3 != 0)
Y <-- Y*k1 - (X << 1)*y3
expanding:
Y <-- Y * (k1 + (x3y3 << 1)) - X * (y3y3 << 1)
let P1 = k1 + (x3y3 << 1); P2 = y3y3 << 1
P1 has 2 non-null coefficients, P2 has 1 non-null coefficient
If we calculate the operations count of the inversion when using 2
digits cut, we>d get:
I(N) = 2N * 3N / 2 + O(N) (number of coefficient to reduce * cost
per 2 coeff cutoff / number of coeff cut) = 3N^2 + O(N)
Note that after step 2, we>re back to the same problem but with X and
Y having different meanings through multiplication by P1 and P2, and 1
digit less for each, and P1 and P2 are polynomials. We could expand to
a method of cutting 4 points and reaching I(N) = 2.5N^2+ O(N).
The order of complexity is as follows:
s = 1 ; I(N) = 4N^2 + O(N)
s = 2 ; I(N) = 3N^2 + O(N)
s = 3 ; I(N) = 2.(6)N^2 + O(N)
s = 4 ; I(N) = 2.5N^2 + O(N)
s = 5 ; I(N) = 2.4N^2 + O(N)
...
s = N ; I(N) = much bigger than when s = 1 , so there must be a
breaking point depending on N and subfield bits, the calculations just
become too inefficient after some point.
Here>s some sample C++ code for the s = 1 case:
//---------------------------------------------------------------------------
/* DEFINITIONS
//---------------------------------------------------------------------------
#define FFT_MOD ((qint)0xFFFFFFFF00000001)
#define FFT_ROOT ((qint)7)
//---------------------------------------------------------------------------
typedef unsigned __int64 qint;
//---------------------------------------------------------------------------
#define MAX_N 4096
//---------------------------------------------------------------------------
typedef qint NUM[MAX_N];
typedef qint DUB[MAX_N* 2];
//---------------------------------------------------------------------------
#define let(A, B, N) for (int _rI= N; --_rI>= 0; (A)[_rI]= (B)
[_rI])
#define zero(A, N) for (int _rI= N; --_rI>= 0; (A)[_rI]= 0)
//---------------------------------------------------------------------------
int high_digit_pos(qint A[], int N)
{
for (int I= N; --I>= 0; )
if (A[I])
return I;
return -1;
}
//---------------------------------------------------------------------------
void LinearMul(qint A[], qint K, int N)
{ //operation A[I] *= K, for I = 0 to N - 1
for (int I= N; --I>= 0; )
A[I] = fft_mul(A[I], K);
}
//---------------------------------------------------------------------------
void msub(qint A[], qint B[], qint W, int N)
{ //operation: A= A- B* W; right to left scan
for (int I= N; --I>= 0; )
A[I] = fft_sub(A[I], fft_mul(B[I], W));
}
//---------------------------------------------------------------------------
FIELD SPECIFIC FUNCTIONS:
- 1. qint fft_div(qint A, qint X)
- returns "A / X % FFT_MOD", could be A* fft_exp(X, FFT_MOD -2)
- 2. qint fft_mul(qint A, qint B)
- returns "A * B % FFT_MOD"
- 3. qint fft_sub(qint A, qint B)
- returns "A - B % FFT_MOD"
*/
//---------------------------------------------------------------------------
void invert(qint _X[], qint P[], int N)
{ //operation: _X= 1/ _X % (x^N + P) ; JUST 1 (ONE) FIELD
INVERT...
//rules: A*_X+ B*_Y= X ; firstly: A= D= 1 && B= C= 0
// C*_X+ D*_Y= Y ; X= _X; Y= P
// 1 digit cut: (AV-CW)*_X+ (BV-DW)*_Y= XV- YW; when V =
highest_digit(Y), W= highest_digit(X)
// We don>t need B or D so we compute the inverse using A, C, X and Y
// time complexity:
// start end sum (per line = cost per step per big number)
// A 1 N - 1 N
// C 0 N N
// X N - 1 1 N
// Y N 0 N
// With the 1 digit cut strategy we obtain a time complexity of
O(Number_of_steps * Cost_per_step)
// = O(2N * 2N) = 4N^2 + O(N) subfield operations more precisely.
DUB _A, _C, _Y;
qint *A= _A, *C= _C,
*X= _X, *Y= _Y;
zero(A, N); A[0]= 1;
zero(C, N+ 1);
let(X, X, N);
let(Y, P, N); Y[N]= 1;
int NX= high_digit_pos(X, N),
NY= N,
NA= 0,
NC= -1;
do {
if (NX< NY) {
qint *tmp1;
tmp1= X; X= Y; Y= tmp1; //swap(X,Y)
tmp1= A; A= C; C= tmp1; //swap(A,C)
int tmp2;
tmp2= NX; NX= NY; NY= tmp2; //swap(NX, NY)
tmp2= NA; NA= NC; NC= tmp2; //swap(NA, NC)
}//NX>= NY now... (lengths)
if (NY< 0)
break;
qint &hW= X[NX], //not modified, see reduction
below
&lW= Y[NY];
int K= NX- NY;
//below X= X- [YW << (nx-ny)]
LinearMul(X, lW, NX ); msub(X+ K, Y, hW,
NY ); //-2 mul
while (--NX>= 0 && X[NX]== 0); //decrease NX
if (NX>= 0) {
LinearMul(A, lW, NA+ 1); msub(A+ K, C, hW, NC+
1);
if (NC+ K> NA)
NA= NC+ K; //NA= maximum(NA, NC+
K);
}
} while (1);
qint low= fft_div(1, X[NX]); //mul-monic denominator
(single sub-f invert)
let(_X, A, N);
LinearMul(_X, low, NA+ 1); //result= A/ X[HX]= (mul-
monic)
}
//---------------------------------------------------------------------------
Hope you find this post useful,
Badita Robert, Romania |
|