TR-00-21.mws
Symbolic-Numeric Algorithms for Polynomials
Robert M. Corless, Deputy Director
Ontario Research Centre for Computer Algebra
Professor, Department of Applied Mathematics
Honorary Professor, Department of Computer Science
University of Western Ontario, London, Canada
Rob.Corless@uwo.ca
http://www.orcca.on.ca
Concepts, Definitions and Applications.
Approximate Polynomial
An approximate polynomial is simply a polynomial with inexactly known coefficients. You may have data with measurement errors that provides, after some kind of fitting, a polynomial that describes your data. You may have a formula with parameters for each coefficient, but the values of the parameters may be known only through experimental measurements. Approximate polynomials arise in many applied problems, from direct mathematical models through normal forms and bifurcation theory to geometric problems described in floating-point arithmetic for Computer-Aided Design.
The idea that we don't know precisely what the coefficients are moves us from the domain of algebra (in the case of polynomials, from classical algebra) to the domain of analysis. In particular, this allows us to bring modern tools of numerical analysis to problems of computational algebra and geometry.
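The sensitivity underlying this shift is easy to demonstrate. The following is a minimal NumPy sketch (Python is used for illustration here; it is not part of the original worksheet): a relative change of 1e-8 in one coefficient moves a double root by about 1e-4.

```python
import numpy as np

# (x - 1)^2 has the exact double root x = 1; perturbing the constant
# coefficient by only 1e-8 splits it into two roots near 1 +/- 1e-4.
exact = np.roots([1.0, -2.0, 1.0])             # x^2 - 2x + 1
perturbed = np.roots([1.0, -2.0, 1.0 - 1e-8])  # constant term nudged

spread = abs(perturbed[0] - perturbed[1])
print(spread)  # about 2e-4: the roots move 10^4 times as far as the coefficient
```

For an approximate polynomial, then, asking for "the" roots is the wrong question; we must ask how the set of roots behaves over the whole cloud of nearby polynomials.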
Experience with numerical linear algebra suggests that it will be useful to replace classical discrete algorithms for polynomial problems with continuous, iterative algorithms for approximate polynomial problems. In this fashion we hope to achieve a body of results useful enough to be termed numerical nonlinear algebra.
Groebner basis
Suppose that we have a system of multivariate polynomial equations. A Groebner basis (aka standard basis) is, for the purpose of discussion here, an equivalent but simpler system of multivariate polynomial equations: equivalent in that the Groebner basis describes the same variety (has the same zeros) and is simpler in that many questions about the polynomial system (e.g. ideal membership, counting the number of roots) are easier to answer. An excellent introduction to Groebner bases can be found in
David Cox, John Little, and Donal O'Shea,
Ideals, Varieties and Algorithms,
Springer-Verlag, 1992.
Groebner bases can be computed for many systems of practical interest, but not all: the Buchberger algorithm is a finite algorithm, but some Groebner bases are exponentially more lengthy than their input.
>
with(Groebner);
>
origsys := { x^2-y^2 - 1, x^3 - y^3 - 1};
>
gb := gbasis( origsys, plex(x,y) );
Notice that in the lexicographically ordered Groebner basis, the first polynomial contains only y; the second is linear in x, so once y is known, the corresponding x is known. Therefore there are 4 roots to the original system (and here they are easy to find). Lexicographic ordered bases are problematic, however, in that (a) they can be doubly exponential (!) in cost to compute, and (b) they are at least as numerically unstable as computing the characteristic polynomial is for an eigenvalue problem (indeed, this is a generalization of that process).
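The same triangular structure can be reproduced outside Maple; here is a hedged SymPy sketch (an aside, not part of the worksheet), showing the element of the lex basis that involves y alone:

```python
from sympy import groebner, symbols, Poly

x, y = symbols('x y')

# Lex-order Groebner basis of the worksheet's system.  The basis is
# triangular: one polynomial involves y alone, and back-substitution
# into the remaining (x-linear) element then recovers x.
G = groebner([x**2 - y**2 - 1, x**3 - y**3 - 1], x, y, order='lex')

univariate = [p for p in G.exprs if not p.has(x)]
print(Poly(univariate[0], y).degree())  # 4: four roots, counted with multiplicity
```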
So for computation we prefer a degree ordering:
>
gbt := gbasis( origsys, tdeg(x,y) );
To the human eye, this is not as convenient as a lex-order basis---but we will see later that computationally it is better in some ways.
>
Discontinuity
A very significant problem with Groebner bases (and with many other polynomial constructions, such as the GCD) is that they can be discontinuous with respect to parameters. This means that we will have problems with numerical computation of Groebner bases (particularly for approximate polynomials), and also with symbolic computation of Groebner bases, because the generic-case computed solution will not always be true for all values of the parameters.
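The GCD shows the same discontinuity in its simplest form, which can be sketched in SymPy (an illustration added here, not from the worksheet): an arbitrarily small coefficient perturbation collapses a nontrivial GCD to 1.

```python
from sympy import symbols, gcd, Rational

x = symbols('x')

# At the special parameter value the gcd is x - 1 ...
print(gcd(x**2 - 1, x - 1))                       # x - 1

# ... but any nonzero shift of the constant term, however tiny,
# makes the polynomials coprime: the gcd jumps discontinuously to 1.
print(gcd(x**2 - 1, x - 1 - Rational(1, 10**7)))  # 1
```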
>
restart;
>
with(Groebner):
>
origsys := {c*x^2 - y^2 - 1, x^3 - y^3 - 1};
>
gb := gbasis(origsys,tdeg(x,y));
>
map(limit, gb, c=0 );
That is what we would get by substituting c=0 into our Groebner basis, but we see that if we do things in the other order, we get a different result (indeed, the above is not even a Groebner basis).
>
gb0 := gbasis(map(limit,origsys,c=0), tdeg(x,y) );
This problem is solved in general by considering a so-called comprehensive Groebner basis (see the papers by Weispfenning, and work in REDLOG by Dolzmann and Sturm). We take a ``deferred evaluation'' approach here and simply keep this problem in mind (we will see a method in an example below).
Commuting families of matrices
A matrix family A[i], i = 1..n, is a commuting family if A[i]A[j] = A[j]A[i] for all i,j in 1..n. Of course, only special families of matrices commute with each other. When matrices commute, there are many practical consequences, the first of which is that their invariant subspaces are related.
Theorem (see e.g. Horn and Johnson): If A[i] is a commuting family, then there exists a unitary matrix V that simultaneously upper-triangularizes the family: T[i] = V* A[i] V is upper triangular for all i.
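This can be checked numerically. A minimal NumPy/SciPy sketch (our illustration, assuming B is chosen as a polynomial in A so that the pair commutes; A is an unreduced symmetric tridiagonal matrix, so its eigenvalues are distinct):

```python
import numpy as np
from scipy.linalg import schur

# Two commuting matrices: B is a polynomial in A, so A B = B A.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
B = A @ A + 3.0 * A

# A single complex Schur decomposition of A triangularizes B as well,
# because the Schur chain of invariant subspaces of A is B-invariant.
T, V = schur(A, output='complex')
Bt = V.conj().T @ B @ V

# The strictly lower-triangular part of V* B V is numerically zero.
print(np.max(np.abs(np.tril(Bt, -1))))
```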
Commuting families of matrices arise in this context as the (dual of the) standard representation for multiplication by the variable x[i] in the commutative ring P/I, where I is the ideal generated by a system of polynomial equations.
For approximate polynomials and approximate computation of the commuting families, using floating-point arithmetic, we have to think about what happens if we have only a nearly-commuting family of matrices: ||A[i]A[j] - A[j]A[i]|| < epsilon. It turns out that
Theorem (Corless, Gianni & Trager 1997): If A[i] is a nearly-commuting family with commutators bounded by epsilon, then there exists a unitary matrix V that simultaneously nearly block upper-triangularizes the family; the size of the subdiagonal entries is bounded by epsilon, and each block clusters nearby eigenvalues.
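A small numerical sketch of the nearly-commuting case (again NumPy/SciPy, with a hand-picked perturbation E; the bound below also depends on the eigenvalue separation of A, which is O(1) here):

```python
import numpy as np
from scipy.linalg import schur

# Start from an exactly commuting pair (B a polynomial in A) ...
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
B = A @ A + 3.0 * A

# ... and perturb B so that the family only *nearly* commutes.
eps = 1e-8
E = eps * np.array([[1.0, -2.0, 0.5],
                    [0.3,  1.0, -1.0],
                    [2.0,  0.1, 1.0]])
Bp = B + E
comm = np.linalg.norm(A @ Bp - Bp @ A)   # commutator is O(eps)

# The Schur vectors of A now only *nearly* triangularize Bp: the
# subdiagonal entries are small, of the order of eps.
T, V = schur(A, output='complex')
Bt = V.conj().T @ Bp @ V
print(comm, np.max(np.abs(np.tril(Bt, -1))))
```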
Problems examined to date
GCD of approximate polynomials
Corless, Gianni, Trager & Watt, ``The SVD for Polynomial Systems'', Proceedings ISSAC 1995, Montreal.
Sigsam Bulletin Special Issue on Symbolic-Numeric Algorithms for Polynomials, 1996, RMC ed.
Karmarkar & Lakshman, ``Approximate Polynomial Greatest Common Divisors and Nearest Singular Polynomials'', Proceedings ISSAC 1996, Zuerich.
Chin, Corless & Corliss, ``Optimization Strategies for the Approximate GCD Problem'', Proceedings ISSAC 1998, Rostock.
Journal of Symbolic Computation Special Issue on Symbolic-Numeric Algorithms for Polynomials, Stetter & Watt eds.
Hans J. Stetter, ``The Nearest Polynomial with a Given Zero, and Similar Problems'', Sigsam Bulletin 33, no. 4, December 1999, pp. 2--4.
Many other authors, including Andre Galligo and Victor Pan.
Functional decomposition of approximate polynomials
Corless, Giesbrecht, Jeffrey & Watt, ``Approximate Polynomial Decomposition'', Proceedings ISSAC 1999, Vancouver.
Finding roots of multivariate systems via eigenproblems
Auzinger & Stetter, ``An Elimination Algorithm for the Computation of All Zeros of a System of Multivariate Polynomial Equations'', Proc. Numerical Mathematics, Singapore, World Scientific, 1988.
Corless, Gianni, Trager & Watt, ``The SVD for Polynomial Systems'', Proceedings ISSAC 1995, Montreal.
Sigsam Bulletin Special Issue on Symbolic-Numeric Algorithms for Polynomials, 1996, RMC ed.
Journal of Symbolic Computation Special Issue on Symbolic-Numeric Algorithms for Polynomials, 1998, Stetter & Watt eds.
Many papers of Bernard Mourrain.
Many papers of Ioannis Emiris.
Numerical Implicitization by Linear Algebra
Corless, Giesbrecht, Kotsireas & Watt, ``Numerical Implicitization of Parametric Hypersurfaces with Linear Algebra'', Proceedings AISC 2000, Madrid, Springer LNAI 1930.
Division of approximate polynomials
Corless, Giesbrecht, Kotsireas & Watt, ``Division of Approximate Polynomials'', in progress.
Examples
Implicitization
>
restart;
Suppose we have the following parametric representation of a curve.
>
xp := t -> (t*(3-t^2))/(1+t^2)^2;
>
yp := t -> (t^2*(3-t^2))/(1+t^2)^2;
>
plot([xp,yp,-10..10], view=[-1..1,-1..1], scaling=CONSTRAINED);
We can find an implicit (x,y) representation of this curve from the parameterization in many ways, for example by resultants:
>
resultant(numer(X-xp(t)),numer(Y-yp(t)),t)/256;
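The same elimination can be sketched in SymPy (an aside, not part of the worksheet). Since the resultant vanishes whenever the two numerators share the root t = s, the computed implicit polynomial is exactly zero at every point of the parameterization:

```python
from sympy import symbols, together, numer, resultant, Rational

t, X, Y = symbols('t X Y')

xp = t * (3 - t**2) / (1 + t**2)**2
yp = t**2 * (3 - t**2) / (1 + t**2)**2

# Eliminate t: the resultant (in t) of the numerators of X - xp(t)
# and Y - yp(t) vanishes on the implicit curve.
p = resultant(numer(together(X - xp)), numer(together(Y - yp)), t)

# Sanity check: substitute a point of the curve; the result is exactly 0.
s = Rational(1, 2)
print(p.subs({X: xp.subs(t, s), Y: yp.subs(t, s)}))  # 0
```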
Or, by using Groebner bases:
>
with(Groebner):
>
eqs := {denom(xp(t))*X-numer(xp(t)),
denom(yp(t))*Y-numer(yp(t)),
denom(xp(t))*Z-1};
>
gb := gbasis( eqs, plex(Z,t,X,Y) ):
>
select(v->has(v,[X,Y]) and not(has(v,[t,Z])),gb);
"This is based on the theorem which says that the elements of the lex grobner basis that do not contain the auxiliary variables t,Z are a basis of the implicit ideal." Ilias Kotsireas, Sept 15 2000.
A new method, using linear algebra techniques, which works even for non-rational parameterizations, runs as follows.
>
with(LinearAlgebra):
We first assume that we know (or are prepared to search for) the degree of the implicitization. Here we simply plop down all possible terms up to degree 4.
>
M:=(x,y) -> Vector(15,[1, x, y, x^2, x*y, y^2,x^3,x^2*y,x*y^2,y^3,x^4,x^3*y,x^2*y^2,x*y^3,y^4]);
We make a matrix out of these terms:
>
M1:= M(xp(t),yp(t)) . Transpose(M(xp(t),yp(t)));
Now we integrate those functions of t over some random interval.
>
# M3:=map(evalf,map(Int, M1, t=-10..10));
Doing it that way, by numerical integration, spends a lot of time doing integration when just sampling is really good enough. We take twice as many samples as we need, to avoid sensitivity of interpolation.
>
UseHardwareFloats := false;
>
Digits := 50;
>
Mfinite := Matrix(15,15):
>
Npoints := 30:
>
for i to Npoints do
ti := -10. + 20.*(i-1)/(Npoints-1);
xpt := xp(ti);
ypt := yp(ti);
Mfinite := Mfinite + M(xpt,ypt).Transpose(M(xpt,ypt));
od:
>
Ufinite, Sfinite, Vtfinite := SingularValues(Mfinite,output=['U','S','Vt']);
>
>
# U,S,Vt := SingularValues(M3, output=['U','S','Vt']);
>
Sfinite[1];
>
Sfinite[14];
>
Sfinite[15];
>
nv := Vector(15,[seq(Vtfinite[15,i],i=1..15)]);
>
ex:=fnormal( Transpose(M(x,y)).nv ,6);
>
p := fnormal(ex/coeff(ex,y,4),6);
>
simplify(subs(x=xp(t),y=yp(t),p));
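The whole sampling computation above can be condensed into a NumPy sketch (our rendering, not the worksheet's; one design change is hedged here: we stack the sampled monomial rows and take their SVD directly, instead of forming the 15 x 15 moment matrix, which avoids squaring the condition number and so avoids needing Digits := 50):

```python
import numpy as np

def xp(t): return t * (3 - t**2) / (1 + t**2)**2
def yp(t): return t**2 * (3 - t**2) / (1 + t**2)**2

def monomials(x, y):
    # all 15 monomials in x, y of total degree <= 4, ordered as in the worksheet
    return np.array([x**(d - j) * y**j
                     for d in range(5)
                     for j in range(d + 1)])

# Evaluate the monomials at sample points on the curve; any implicit
# polynomial of degree <= 4 is a null vector of this 30 x 15 matrix.
ts = np.linspace(-10.0, 10.0, 30)
Mrows = np.array([monomials(xp(t), yp(t)) for t in ts])

U, S, Vt = np.linalg.svd(Mrows)
coeffs = Vt[-1]      # right singular vector for the smallest singular value

print(S[-2], S[-1])  # exactly one tiny singular value: a 1-D null space

# The recovered implicit polynomial vanishes (to rounding) at a fresh
# parameter value that was not among the samples.
t_new = 0.7
print(abs(monomials(xp(t_new), yp(t_new)) @ coeffs))
```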
Schoenhage's GCD example
>
restart;
>
f := x^4 + x + 1;
>
g := x^3 + eta*x;
>
S := LinearAlgebra[SylvesterMatrix]( f, g, x );
>
Sinv := LinearAlgebra[MatrixInverse](S):
>
Sinvs := map(series, Sinv, eta, 3 );
>
Si0 := map(coeff, Sinvs, eta, 0 );
>
Si1 := map(coeff, Sinvs, eta, 1 );
>
resultant(f,g,x);
>
fsolve(%,eta,complex);
>
map(abs,[%]);
>
with(LinearAlgebra):
>
SingularValues(evalf(eval(S,eta=0)));
>
%[7]/%[1];
So, for small but nonzero eta, the nearest singular Sylvester matrix is at least 0.124 away in norm; that is, the nearest pair of polynomials f + delta_f and g + delta_g that have a nontrivial GCD must have norms of delta_f and delta_g of about 0.1. Therefore, for small but nonzero eta, the GCD of f and g is 1, and moreover the approximate gcd is also 1, for tolerances smaller than about 0.1.
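The singular-value computation can be repeated in NumPy (a sketch; the sylvester helper below is our own hand-rolled construction, not a library routine, shown here at eta = 0):

```python
import numpy as np

def sylvester(f, g):
    """Sylvester matrix of f (degree m) and g (degree n), coefficients
    listed from the highest degree down."""
    m, n = len(f) - 1, len(g) - 1
    S = np.zeros((m + n, m + n))
    for i in range(n):              # n shifted copies of f
        S[i, i:i + m + 1] = f
    for i in range(m):              # m shifted copies of g
        S[n + i, i:i + n + 1] = g
    return S

# f = x^4 + x + 1 and g = x^3, the eta = 0 member of the family.
f = [1.0, 0.0, 0.0, 1.0, 1.0]
g = [1.0, 0.0, 0.0, 0.0]

sigma = np.linalg.svd(sylvester(f, g), compute_uv=False)

# res(f, g) = det(S) = +/- the product of the singular values (here 1),
# while the smallest singular value measures the 2-norm distance to the
# nearest singular Sylvester matrix.
print(np.prod(sigma), sigma[-1])
```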
But if we run Euclid's algorithm on this problem:
>
r1 := rem( f, g, x );
>
r2 := rem( g, r1, x );
>
r3 := rem( r1, r2, x );
Note carefully: r1 and r2 have a nontrivial approximate GCD if eta is small!
>
solve(r1,x);
>
map(series,[%],eta,4);
>
solve(r2,x);
>
series(%,eta,4);
So r1 and r2 have, to O(eta^2), a common root. Therefore, a small perturbation of r1 or of r2 (on the order of eta^2) will bring them to having an exact common root.
Theorem: A step of the Euclidean Algorithm does not necessarily preserve the approximate gcd.
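The phenomenon is easy to reproduce numerically; a NumPy sketch (our illustration, with eta fixed at 1e-3):

```python
import numpy as np

eta = 1e-3

# Euclid's algorithm on f = x^4 + x + 1 and g = x^3 + eta*x.
f = np.array([1.0, 0.0, 0.0, 1.0, 1.0])   # coefficients, highest degree first
g = np.array([1.0, 0.0, eta, 0.0])

_, r1 = np.polydiv(f, g)     # r1 = -eta*x^2 + x + 1
_, r2 = np.polydiv(g, r1)    # r2 is linear, with O(1/eta^2) coefficients

# gcd(f, g) = 1 (and the approximate gcd is also 1), yet r1 and r2
# share a root near x = -1 to within O(eta^2): the Euclidean step has
# manufactured a large approximate gcd.
root_r2 = np.roots(r2)[0]
gap = np.min(np.abs(np.roots(r1) - root_r2))
print(gap)   # about 1e-6, i.e. eta^2
```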
>
A Geometric Intersection Problem
>
restart;
>
with(Groebner);
>
f := x^2 + y^2 - 1;
>
g := x*y-5/13*12/13;
>
plot({sqrt(1-x^2),-sqrt(1-x^2),60/169/x},x=-1.5..1.5, view=[-1.5..1.5,-1.5..1.5],colour=black,scaling=CONSTRAINED);
>
gb := gbasis([f,g],tdeg(x,y));
>
ns,rv := SetBasis( gb, tdeg(x,y) );
>
Mx := MulMatrix(x,ns,rv,gb,tdeg(x,y));
>
My := MulMatrix(y,ns,rv,gb,tdeg(x,y));
>
Mx.My - My.Mx;
>
with(LinearAlgebra):
>
Px := JordanForm( Mx, output='Q');
>
xvals := Px^(-1).Mx.Px;
>
yvals := Px^(-1).My.Px;
>
answers := [seq([xvals[i,i],yvals[i,i]],i=1..4)];
>
for i to 4 do
subs( x=answers[i][1], y=answers[i][2], [f,g] );
od;
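For this small system the eigenvalue method can be written out by hand in NumPy (a sketch added alongside the worksheet: the normal set {1, x, y, y^2} and the reductions below were derived by hand from the basis {x^2+y^2-1, x*y-c, y^3+c*x-y}, and need not match the ordering Maple's SetBasis produces):

```python
import numpy as np

c = 60.0 / 169.0   # the worksheet's (5/13)*(12/13)

# Multiplication matrices on the normal set {1, x, y, y^2}; each column
# is the reduction of x (resp. y) times a basis monomial, e.g.
# x*x = 1 - y^2, x*y = c, x*y^2 = c*y, y*y^2 = y - c*x.
Mx = np.array([[0.0,  1.0,   c, 0.0],
               [1.0,  0.0, 0.0, 0.0],
               [0.0,  0.0, 0.0,   c],
               [0.0, -1.0, 0.0, 0.0]])
My = np.array([[0.0,    c, 0.0,  0.0],
               [0.0,  0.0, 0.0,   -c],
               [1.0,  0.0, 0.0,  1.0],
               [0.0,  0.0, 1.0,  0.0]])

# Diagonalize Mx; since the pair commutes and Mx has distinct
# eigenvalues, the same eigenvectors diagonalize My, and the paired
# diagonal entries are the (x, y) coordinates of the four roots.
xvals, V = np.linalg.eig(Mx)
yvals = np.diag(np.linalg.solve(V, My @ V))

residuals = [max(abs(x*x + y*y - 1), abs(x*y - c))
             for x, y in zip(xvals, yvals)]
print(max(residuals))
```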
>
A Lagrange Multiplier Problem
>
restart;
First load in the routines.
>
with(Groebner);
>
f := x^3 + 2*x*y*z - z^2:
>
g := x^2 + y^2 + z^2 - 1:
>
F := f + lambda*g;
>
gF := convert(linalg[grad](F, [lambda, x, y, z]),set);
>
gb := gbasis(gF, tdeg(lambda,x,y,z)):
Now call SetBasis to get the normal set of this Groebner basis.
>
ns, rv := SetBasis(gb, tdeg(lambda,x,y,z)):
>
ns;
We don't wish to look at these 12 by 12 matrices here, but we will verify that the matrices commute.
>
M[x] := MulMatrix(x, ns, rv, gb, tdeg(lambda,x,y,z)):
>
M[y] := MulMatrix(y, ns, rv, gb, tdeg(lambda,x,y,z)):
>
M[z] := MulMatrix(z, ns, rv, gb, tdeg(lambda,x,y,z)):
>
M[lambda] := MulMatrix(lambda, ns, rv, gb, tdeg(lambda,x,y,z)):
>
with(LinearAlgebra):
>
Norm( M[x].M[y]-M[y].M[x], infinity);
>
alpha := RandomVector(4);
>
Digits := trunc(evalhf(Digits));
>
alpha := evalf(alpha/Norm(alpha,2));
Since the matrices commute, we know that there exists a unitary matrix V which simultaneously upper-triangularizes all four matrices. However, multiple roots cause a problem, and it is often the case (as it is here) that, because of symmetries in the problem, each individual coordinate matrix has multiple eigenvalues even though the system itself has only simple roots (think of x^2+y^2-1 and (x+y)(x-y)=12*5/13^2). A good way to ensure that only genuine multiplicity causes a problem is to take a generic linear combination of the multiplication matrices and use that as the basis of the eigenvalue computation.
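The trick can be sketched on a toy commuting pair (hypothetical matrices invented for this illustration, not the worksheet's 12 x 12 family): the four roots are (+-1, +-1), so each coordinate matrix alone has double eigenvalues, but a random combination splits them all.

```python
import numpy as np

rng = np.random.default_rng(1234)

# Commuting pair with roots (+-1, +-1): conjugate diagonal matrices by
# a fixed invertible P so they are not trivially diagonal.
P = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0, 2.0],
              [0.0, 1.0, 0.0, 1.0]])
Pinv = np.linalg.inv(P)
Mx = P @ np.diag([1.0, 1.0, -1.0, -1.0]) @ Pinv   # double eigenvalues +-1
My = P @ np.diag([1.0, -1.0, 1.0, -1.0]) @ Pinv   # double eigenvalues +-1

# A generic combination has four distinct eigenvalues, so its
# eigenvectors simultaneously diagonalize Mx and My and pair the roots.
a = rng.standard_normal(2)
w, V = np.linalg.eig(a[0] * Mx + a[1] * My)

xvals = np.diag(np.linalg.solve(V, Mx @ V)).real
yvals = np.diag(np.linalg.solve(V, My @ V)).real

roots = sorted((int(np.rint(x)), int(np.rint(y))) for x, y in zip(xvals, yvals))
print(roots)   # [(-1, -1), (-1, 1), (1, -1), (1, 1)]
```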
>
M[combination] := alpha[1]*M[x] + alpha[2]*M[y] + alpha[3]*M[z] + alpha[4]*M[lambda];
>
P[lambda] := JordanForm(M[combination],output='Q');
>
xvals := P[lambda]^(-1).M[x].P[lambda];
>
yvals := P[lambda]^(-1).M[y].P[lambda];
>
zvals := P[lambda]^(-1).M[z].P[lambda];
>
lambdavals := P[lambda]^(-1).M[lambda].P[lambda];
>
answers := map(fnormal, [seq( [xvals[i,i], yvals[i,i], zvals[i,i], lambdavals[i,i]], i=1..12)]);
>
for i to 12 do
map(abs, eval(gF, {x=xvals[i,i], y=yvals[i,i], z=zvals[i,i], lambda=lambdavals[i,i]} ));
od;
>
Fvals := Vector(12):
>
for i to 12 do
Fvals[i] := fnormal(eval( F, {x=xvals[i,i], y=yvals[i,i], z=zvals[i,i], lambda=lambdavals[i,i]} ));
od;
>
min(seq(Re(Fvals[i]),i=1..12));
>
answers[8];
>
eval(F,[x=-2/3,y=1/3,z=2/3,lambda=4/3]);
>
evalf(%);
>
max(seq(Re(Fvals[i]),i=1..12));
>
An example from Li, Sauer, and Yorke ``The Cheater's Homotopy''
>
restart;
A problem with free parameters. This is taken from ``The Cheater's Homotopy'' by T. Y. Li, Tim Sauer, and J. A. Yorke, SIAM J. Numerical Analysis, Volume 26, No. 5, pp. 1241--1251, October 1989. This problem cannot be done by a pure lexicographic ordered Groebner basis (the answer is just too complicated to be of any use, even if we could calculate it in under two hours and 267 Mbytes). But the approach here works in under a minute on a 486, with only a small amount of memory.
>
LSY[1] := x^3*y^2+c1*x^3*y+y^2+c2*x+c3;
>
LSY[2] := c4*x^4*y^2-x^2*y+y+c5;
>
with(Groebner):
The following step does not succeed if you ask for a plex order basis.
Since this problem has free parameters, and Groebner bases can be discontinuous with respect to parameters, we have to be a bit careful. The Maple computation is guaranteed to be correct only if each leading coefficient of the result is nonzero, and if each of the contents (gcd's) removed in the computation of the primitive parts as the computation progresses is nonzero. To find out about that last sequence, we must use an undocumented feature, namely
>
infolevel[primpart] := 1;
>
gb := gbasis({LSY[1],LSY[2]},tdeg(x,y)):
Groebner/primpartscale: remove content 1
Groebner/primpartscale: remove content 1
Groebner/primpartscale: remove content 1
Groebner/primpartscale: remove content 1
Groebner/primpartscale: remove content 1
Groebner/primpartscale: remove content 1
Groebner/primpartscale: remove content 1
Groebner/primpartscale: remove content 1
Groebner/primpartscale: remove content 1
Groebner/primpartscale: remove content 1
Groebner/primpartscale: remove content 66
Groebner/primpartscale: remove content 1
Groebner/primpartscale: remove content 1840712798943858272237612105371699316541728043
Groebner/primpartscale: remove content c3^2+2*c3*c1^2+c1^4
Groebner/primpartscale: remove content 10728291952625167346729110632599977987582603096873725819108467215868918194062213685034046431801554686779885909220864
Groebner/primpartscale: remove content c4^2*c1^7*c5+3*c4^2*c1^5*c5*c3+3*c1^3*c4^2*c5*c3^2+c1*c4^2*c5*c3^3+2*c4^2*c1^8+6*c4^2*c1^6*c3+6*c4^2*c1^4*c3^2+2*c4^2*c1^2*c3^3-c1^7*c4-3*c1^5*c4*c2-3*c1^5*c4*c3-6*c1^3*c4*c3*c2-3*c1^3*c4*c3^2-3*c1*c4*c2*c3^2-c1*c4*c3^3-c1^4*c4*c2*c5-2*c1^2*c4*c3*c2*c5-c2*c3^2*c4*c5+c1^4*c2+c1^2*c2^2+2*c1^2*c3*c2+c3*c2^2+c2*c3^2
Groebner/primpartscale: remove content 103077198649121846052949180212963218623589713585678274405982928270910334890984846313914435688963200802577570247857736242879359562410393600
Groebner/primpartscale: remove content -4*c1^2*c4^2*c3^3+c3^3*c5^2*c4^3*c1-2*c3^3*c5*c4^2*c1+12*c1^4*c4^3*c5*c3^2+4*c1^9*c4^3+c1^7*c5^2*c4^3-4*c1^8*c4^2+c1^7*c4+c1*c4*c3^3+3*c1^3*c4*c3^2+3*c1^5*c4*c3+12*c4^3*c1^7*c3+4*c1^3*c4^3*c3^3-12*c1^6*c4^2*c3-12*c1^4*c4^2*c3^2+12*c4^3*c1^5*c3^2-2*c1^7*c5*c4^2-4*c1^3*c5*c3*c4^2*c2-2*c1^5*c5*c4^2*c2+12*c1^6*c5*c4^3*c3+4*c1^2*c5*c4^3*c3^3+4*c1^8*c5*c4^3+3*c1^5*c5^2*c4^3*c3-6*c1^3*c5*c4^2*c3^2-6*c1^5*c5*c4^2*c3-2*c2*c1*c4^2*c5*c3^2+2*c2*c1^5*c4+2*c2*c1*c4*c3^2+4*c2*c1^3*c4*c3-4*c2*c1^6*c4^2+c1^3*c4*c2^2+c3*c4*c1*c2^2+3*c1^3*c4^3*c3^2*c5^2-4*c2*c1^2*c4^2*c3^2-8*c2*c1^4*c4^2*c3
Groebner/primpartscale: remove content 2*c4*c1^3+c4*c1^2*c5-c1^2+2*c1*c4*c3+c3*c5*c4-c2-c3
Groebner/primpartscale: remove content 1
Groebner/primpartscale: remove content -c4*c3^2+2*c1^5*c4^2-c4*c1^4-2*c4*c1^2*c3+c4^2*c3^2*c5+2*c3*c4^2*c1^2*c5+c4^2*c1^4*c5-c2*c4*c3+4*c4^2*c1^3*c3-c4*c1^2*c2+2*c1*c4^2*c3^2
Groebner/primpartscale: remove content 1
Groebner/primpartscale: remove content 2*c4*c1^3+c4*c1^2*c5-c1^2+2*c1*c4*c3+c3*c5*c4-c2-c3
Groebner/primpartscale: remove content 1
Groebner/primpartscale: remove content 1
Groebner/primpartscale: remove content 1
Groebner/primpartscale: remove content 1
Groebner/primpartscale: remove content 1
>
leadcoeffs := map(leadcoeff, gb, tdeg(x,y) );
We will assume henceforth (for this exposition) that all of these polynomials in the parameters are nonzero for the parameter values we will use.
>
ns, rv := SetBasis(gb, tdeg(x,y)):
>
ns;
>
nops(ns);
>
infolevel[primpart] := 0;
>
M_x := MulMatrix(x, ns, rv, gb, tdeg(x,y)):
The entries of these sparse matrices are rational functions in the parameters c1 through c5. For example,
>
M_x[10,3];
>
M_y := MulMatrix(y, ns, rv, gb, tdeg(x,y)):
At this point, one may insert numerical values for c1 through c5 and find eigenvalues of a generic (convex random) combination of these two matrices, cluster any multiple roots, and use the Schur vectors to find the roots of the system. We see that there are generically 10 roots. We take random values for the parameters below, as an example, ignoring the possibility of multiple roots.
>
c1 := rand()/10.^12:
c2 := rand()/10.^12:
c3 := rand()/10.^12:
c4 := rand()/10.^12:
c5 := rand()/10.^12:
>
Digits := trunc(evalhf(Digits)):
with(LinearAlgebra):
>
M_x_f := map(eval,M_x):
>
xvals, V := Eigenvectors(evalf(M_x_f)):
>
yvalmtx := map(fnormal, V^(-1) . evalf(map(eval,M_y)) . V ):
>
yvals := [seq(yvalmtx[i,i],i=1..10)]:
Substitute the computed values of x and y into the original equations, to see how nearly the computed quantities satisfy the original equations. To know how accurate our computed x and y are, we need more than just these residuals; we should look at how perturbations in these polynomials affect the roots. We do not do that here.
>
residuals := [seq(subs(x=xvals[i],y=yvals[i],{LSY[1],LSY[2]}),i=1..10)];
>
Alternatively one can use tdeg bases to explore the bifurcation behaviour of these equations; one can quite easily determine parameter sets that force double, triple, or even quadruple roots.
What's next?
We have made some progress on the problem of factorization of approximate polynomials. This is a challenge problem, listed in Erich Kaltofen's recent J. Symb. Comp. paper on open problems in computer algebra.
A proper comparison of the methods of sparse resultants, homotopy methods, and eigenvalue methods for the solution of systems of polynomial equations needs to be done.
There is a great deal of ``cleaning up'' and implementation work to be done on these SNAP algorithms; a coherent, robust package would be very useful.
Over the horizon, there is the problem of symmetries: a polynomial system may have all its solutions generated from a much smaller set by symmetries. An example of this is quadrature formulae: they can be formulated (by smart humans) as an eigenproblem with n eigenvalues, encapsulating the n! arrangements nicely; whilst a Groebner basis formulation must necessarily lead to a commuting eigenproblem of size n!, with eigenvalues determining each possible arrangement of the nodes. An automatic procedure to detect symmetries or approximate symmetries would be very useful indeed.