FFT-Polynomial.mws
POLYNOMIALS MULTIPLICATION USING THE FAST FOURIER TRANSFORM
Bruno Guerrieri
Florida A&M University
We will be multiplying polynomials using two different ways, (traditional and FFT), and see whether one method is consistently faster than the other one. We will let the polynomials
and
be of the same degree for simplicity and we will generate the polynomials coefficients randomly. Again, for simplicity, the coefficients are integers chosen in [-50..50]. For a more detailed explanation of what follows, please consult [1] in the Reference section.
An Analogy
Consider the following table:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8182 16384 32768
You have, of course, noticed that entry
in row 1 corresponds to entry
in row 2. Or, to put it another way, the top row lists the logarithms in base 2 of the bottom list.
What if we wanted to calculate (64)(512)? One way would be to follow the traditional algorithm that we all learned in grammar school and perform the standard multiplication. The other way would be to locate the corresponding logarithms, namely 6 and 9, ADD these two logarithms to get 15, and then determine the antilog, our answer, namely 32768. In what follows, the strategy is the same. The quantities "worked on" and the operations are different. Instead of multiplying numbers, we will be multiplying polynomials. The equivalent of taking logarithms and antilogarithms will be polynomial evaluations and interpolations and the equivalent of our ADD will be POINTWISE multiplication.
Consider two lists of numbers say
{s} = (
,
,...,
)
and
{t} = (
,
, ...,
).
Then pointwise multiplication gives the list of numbers
{
} = (
,
,...,
).
>
restart:
>
with(plots): with(inttrans):
Warning, the name changecoords has been redefined
A Series of Needed Procedures
The procedure "coefficient" generates "deg+1" random coefficients in the range [-50..50].
>
coefficient := proc(deg)
>
local i,c,die;
>
die := rand(-50..50);
>
for i from 1 to deg+1 do
>
c[i] := die();
>
od;
>
RETURN(c);
>
end:
The procedure "padding" "right-pads" a list of coefficients with "deg" zeros.
>
padding := proc(c,deg)
>
local i,d;
>
d := c;
>
for i from 1 to deg do
>
d := [op(d),0];
>
od;
>
RETURN(d);
>
end:
The procedure "poly_mult" determines the coefficient vector "c" of the resulting polynomial
in the traditional way. Vector "c" is obtained by "convolving" the input coefficients vectors "a" and "b".
>
poly_mult := proc(a,b,deg)
>
local j,k,c;
>
for j from 1 to 2*deg+1 do
>
c[j] := 0;
>
for k from 1 to j do
>
c[j] := c[j] + a[k]*b[j-k+1];
>
od;
>
od;
>
RETURN(c);
>
end:
The procedure "dft" (Discrete Fourier Transform) is present here since we wanted to, in fact compare the three processes for multiplication of two polynomials, namely the traditional, DFT, and FFT (Fast Fourier Transform) processes. One has to get into high degrees to see the FFT overtake the traditional method. The DFT is present here to make us appreciate the speed improvement that the FFT brings to the situation. The procedure "dft" which follows takes two arguments. The first one is the list to be transformed. The second argument is set so that a = 1 for DFT and a = -1 for IDFT (Inverse Discrete Fourier Transform). The last part of the code takes very small values (usually on the inverse transformation when dealing with original real data) located in the imaginary part and zero them out. So first the list is "split" into real and imaginary parts, Fourier transformation occurs, zeroing out is done where needed, and then the two lists are "sewn" back together.
>
dft := proc(h_in,a)
>
local j,k,sum1,nombre,LS1,h_out,list1,re_list,im_list;
>
LS1 := nops(h_in);
>
nombre := round(evalf(ln(LS1)/ln(2)));
>
for k from 1 to LS1 do
>
sum1 := 0;
>
for j from 1 to LS1 do
>
sum1 := sum1 + h_in[j]*exp(a*I*2*Pi*(j-1)*(k-1)/LS1);
>
h_out[k] := sum1/(LS1)^(-0.5*a+0.5);
>
od:
>
od:
>
list1 := [evalf(seq(h_out[j],j=1..LS1))];
>
re_list := map(fnormal,map(Re,list1)):
>
im_list := map(fnormal,map(Im,list1)):
>
h_out := simplify(zip((a,b)->(a+b*I),re_list,im_list));
>
RETURN(h_out);
>
end:
The FFT/IFFT procedure call is to be found below. The only reason the procedure shown below looks lengthy is because real part and imaginary part of lists going through the FFT or the IFFT have to be kept separated.
>
fastft := proc(L_in,a)
>
local LS1,nombre,re_list,im_list,re_array,im_array,L_out;
>
LS1 := nops(L_in);
>
nombre := round(evalf(ln(LS1)/ln(2)));
>
re_list := map(Re,L_in); im_list := map(Im,L_in);
>
re_array := array(1..nops(re_list),re_list); im_array := array(1..nops(im_list),im_list);
>
if (a=1) then FFT(nombre,re_array,im_array) else iFFT(nombre,re_array,im_array); fi;
>
re_list := map(fnormal,convert(re_array,list)); im_list := map(fnormal,convert(im_array,list));
>
L_out := zip((a,b)->simplify((a+b*I)),re_list,im_list);
>
RETURN(L_out);
>
end:
The "power_of_two" procedure determines the nearest power greater than deg(A(x)) + deg(B(x) + 1. This is needed, for padding purposes, since the FFT/IFFT needs "power of two" list sizes.
>
power_of_two := proc(deg)
>
local i;
>
i := 1;
>
while (2^i < 2*deg+1) do i := i+1; od;
>
RETURN(2^i);
>
end:
The " zeros_plot" procedure "reconstructs" a polynomial from its coefficients so that its zeros can be determined and plotted if necessary.
>
zeros_plot := proc(coef_list,couleur)
>
local i,p,y,N,R,points;
>
N := nops(coef_list);
>
y := coef_list[1];
>
for i from 2 to N do
>
y := y + coef_list[i]*x^(i-1);
>
od;
>
R := [ fsolve(y,x,complex) ];
>
points := map(z->[Re(z),Im(z)],R);
>
p := plot(points,style=point,symbol=circle,color=couleur):
>
RETURN(p);
>
end:
>
Initial Data
Now that all the procedures we will need have been written, let us randomly generate the "deg+1" coefficients of two polynomials A(x) and B(x) of degree "deg". For simplicity, we will take the two polynomials to be of the same degree. Otherwise, we would have to pad with zeros. We are also choosing the coefficients to be integers in [-50..50]
.
>
deg := 10:
>
pot := power_of_two(deg);
>
a := coefficient(deg): b := coefficient(deg):
>
a_list := [seq(a[i],i=1..deg+1)]: b_list := [seq(b[i],i=1..deg+1)]:
We are saving these two lists, in c_list and d_list, so that we can use the same polynomials with the DFT and then the FFT approaches.
>
c_list := a_list; d_list := b_list:
>
The Traditional Multiplication of Two Polynomials
In the straightforward multiplication (convolution) method that follows, list sizes are handled automatically. We will notice, when we use the DFT approach that the padding will always be up to a power of 2.
>
a_list := padding(a_list,deg); b_list := padding(b_list,deg):
>
st := time():
>
c := poly_mult(a_list,b_list,deg):
>
standard_time := time() - st:
The final answer provides a list of the coefficients of the resulting polynomials C(x).
>
final_answer_1 := [seq(c[i],i=1..2*deg+1)];
>
Explanation of the DFT/FFT Method
Again, for a more detailed explanation we recommend that you refer to [1]. We would like to multiply the same two polynomials seen above using the DFT. Very quickly, we will come to realize that this process requires a lot of computing time. In fact the DFT is no match for the traditional method. We wanted to investigate the DFT approach for the simple reason that it is a routine that you can write on your own by using the definition for the DFT. This way, anybody unfamiliar with the intricacies of the FFT, can feel somewhat at ease when told that the FFT is nothing but a "souped up" DFT - (Apologies to Cooley and Tukey [3]). Therefore
we recommend that you bypass the DFT process and go directly to the FFT one whenever your polynomial degrees start to get above 50.
Note that the polynomial representation we used above, where polynomials are identified by their coefficient lists, is called the "coefficient representation (CR)". One can also use the "point-value representation (PVR)". The PVR form of a polynomial of degree
is a set of (
) point-value pairs (
), (
) , ..., (
), that is (
) points that belong to the graph of the polynomial. While multiplying polynomials in CR form takes O(
), the alternative multiplicative option of using the three steps of
a) conversion to PVR,
b) pointwise multiplication,
c) reconversion to CR
only takes O(
).
The evaluation of a polynomial at a particular value of
is usually carried out using Horner's method. The "inverse" of evaluation is interpolation and one can determine the coefficients {a} = (
,
, ...,
) of a polynomial in PVR form by relying on Lagrange polynomials. Thus
-points evaluation and interpolation are therefore well-defined operations that allow one to move from CR to PVR and back again.
Let us assume in the rest of the discussion that
= deg(
= deg(
). Since the resulting polynomial is of degree
, we must start with "extended"
PVR form for both polynomials A and B. The quantity (+
) that appears at times has to do with the fact that a polynomial of degree
has
coefficients. We must also, since the FFT is at its best when processing lists whose lengths are powers of 2, let 2
be a power of 2. Be aware that there are two sorts of padding going on. One was seen at work in the traditional multiplication process. But the one we are referring to now has to do with bringing all lists to lengths that are powers of 2.
One can be clever in the choice of the
's (there will be 2
of them) at which polynomials will be evaluated so as to determine the corresponding PVR form. In fact, if we choose "complex roots of unity" as evaluation points, we can produce a PVR form of a given polynomial in CR form by taking the DFT of the coefficient vector. IDFT takes care of interpolation so as to bring us back to a CR form, from a PVR form.
Here is the algorithm we will be following
:
Start with A(x) and B(x) in coefficient (CR) form.
Create coefficient representation of
and
as polynomials of degree 2
. Time = [O(
)].
Evaluate: Compute point-value representation of
and
of length
through applications of the DFT (later the FFT) of order
to each coefficient vector. These representations contains the values of the two polynomials at the (
)th roots of unity. Time = [O(
)].
Pointwise multiply: Compute a PVR form for
=
by multiplying these values together, pointwise. The representation obtained contains the value of
at each (
)th roots of unity. Time = [O(
)].
Interpolate: Create the coefficient representation of the polynomial
through a single application of a IDFT (later IFFT) on
point-value pairs. Time = [O(
)].
What has been done is as follows: {a} convolved with {b} = IDFT[DFT(a)*DFT(b)], where vectors {a} and {b} have been padded with zeros to length
and * denotes the componentwise product of two
-element vectors. Note that {a} convolved with {b} stands for the traditional multiplication procedure we started out with.
>
List Padding as Preparation for DFT/FFT
We are padding the coefficients list with zeros up to the power of two that follows degree(
)
+ degree(
).
>
c_list_in := padding(c_list,pot - deg - 1); d_list_in := padding(d_list,pot - deg - 1);
You can, if you wish, plot the zeros of the two initial polynomials. You would have to remove the comment symbols # on the line that follows.
>
# p[1] := zeros_plot(c_list,yellow): p[2] := zeros_plot(d_list,green): display([p[1],p[2]]);
The DFT Approach
Please recall
NOT
to run the DFT if the degrees of your polynomials
and
are large, say above 50 each
. If you do, you will appreciate what the FFT brings to the DFT. For a good discussion on this point, see [2, pg 294] in the Reference section.
>
st := time();
>
c_list_out := dft(c_list_in,1); d_list_out := dft(d_list_in,1);
In the next step, we are seeing whether Parseval' identity is verified or not. This is a safety check to make sure that we are "on track"
.
You will have to "uncomment" it to see it in action.
>
# evalf(sum(abs(c_list_in[i])^2,i=1..nops(c_list_in))); evalf(sum(abs(c_list_out[i])^2,i=1..nops(c_list_in))/nops(c_list_out));
We are now performing our pointwise multiplication.
>
e_list_in := zip((a,b)->a*b,c_list_out,d_list_out);
We are now ready to perform the IDFT. We should, if everything goes well, recover the coefficients of
. Since
and
are chosen with real coefficients we should recover numbers with no or very tiny imaginary values. We will therefore extract the real part of our final list. Also keep in mind that we will be getting a list that will, most of the time, be padded with zeros. The reason is as follows: recall that
will be of degree "2*deg" and will "have" "2*deg+1" coefficients. This number will, in general, be less than power_of_two.
>
e_list_out := dft(e_list_in,-1);
>
real_part_e := map(Re,e_list_out);
>
dft_time := time() - st:
>
final_answer_2 := [seq(real_part_e[i],i=1..2*deg+1)];
How do the coefficients of the resulting polynomials match under "traditional" and "DFT". (Getting all zeros is good!)
>
final_answer_1 - final_answer_2;
The FFT Approach
We will now retrace the steps we followed when applying the DFT/IDFT combination, using the FFT/IFFT combination this time.
>
st := time();
>
c_list_out := fastft(c_list_in,1); d_list_out := fastft(d_list_in,1);
Again we check for Parseval's identity.
>
# evalf(sum(abs(c_list_in[i])^2,i=1..nops(c_list_in))); evalf(sum(abs(c_list_out[i])^2,i=1..nops(c_list_in))/nops(c_list_out));
>
e_list_in := zip((a,b)->a*b,c_list_out,d_list_out);
>
e_list_out := fastft(e_list_in,-1);
>
real_part_e := map(Re,e_list_out);
>
final_answer_3 := [seq(real_part_e[i],i=1..2*deg+1)];
>
# p[3] := zeros_plot(final_answer_2,red): display(p[3]);
>
fft_time := time() - st:
>
final_answer_1 - final_answer_3;
Time Comparisons
Here is some results on a Pentium III.
You will notice where the FFT overtakes the traditional method.
For deg(
) = 20,
standard_time = 0.015, dft_time := 15.7, fft_time = 0.275
For deg(
) = 30,
standard_time = 0.035, dft_time := 27.6, fft_time = 0.4
For deg(
) = 50,
standard_time = 0.235, dft_time := 307.2, fft_time = 1.05
For deg(
) = 100,
standard_time = 0.345, dft_time := Not Run , fft_time = 1.168
For deg(
) = 300,
standard_time = 3.723, dft_time := Not Run , fft_time = 6.009
For deg(
) = 400,
standard_time = 7.996, dft_time := Not Run , fft_time = 5.948
For deg(
) = 500,
standard_time = 10.831, dft_time := Not Run , fft_time = 5.873
For deg(
) = 2000,
standard_time = 168.822, dft_time := Not Run , fft_time = 35.424
>
standard_time; dft_time; fft_time;
>
References
[1] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Introduction to Algorithms, MIT Press (1990)
[2] David W. Kammler, A First Course in Fourier Analysis, Prentice-Hall (2000)
[3] J.W. Cooley and J.W. Tukey, An Algorithm for the Machine Computation of Complex Fourier Series,
Math. Comp
. 19(1965), 297-301.