Efficiency Improvements in Maple 14
Maple 14 includes numerous efficiency improvements. Some highlights where dramatic speed-ups were achieved are described below.
Intel (R) MKL 10.0 (Windows)
Cyclotomic Polynomials
Polynomial Arithmetic
Fetching Information of Polynomials via has, sort, degree, type
Faster GraphTheory Algorithms
CUDA Acceleration
. (dot) Operator Improvements
Direct Access to Solvers
The Intel (R) Math Kernel Library (Intel (R) MKL) version 10.0 replaces a prior version on 32-bit Windows, and replaces generic BLAS on 64-bit Windows. These highly optimized core routines are used in various places throughout Maple.
The following example highlights how performance in Maple 14 is significantly improved.
M:=LinearAlgebra:-RandomMatrix(2000,datatype=float[8]):
time( M.M );
0.481
On a 2.13GHz Core2 Duo, running 32-bit Windows, Maple 14 takes 2.44 seconds now compared to 7.95 seconds in Maple 13.02.
On a 2.00GHz dual quad core Xeon, running 64-bit Windows, Maple 14 takes 2.437 seconds now compared to 25.593 seconds in Maple 13.02.
A new implementation of numtheory[cyclotomic] constructs cyclotomic polynomials significantly faster than the previous implementation for large prime factors.
As a comparison, consider the following example:
t:= time():
numtheory[cyclotomic](255255,x):
time() - t;
0.228
This is about 1000 times as fast as in Maple 13. The amount of speedup increases with the size of the problem.
Polynomial multiplication and division over integers can be performed faster than in Maple 13 because of an improved implementation. Multiple threads may also be used for large polynomial multiplications.
The following example is about twenty times as fast as in Maple 13:
f,g := seq(randpoly([x,y,z],dense,degree=30),i=1..2):
assign(p=expand(f*g)):
divide(p,f,'q');
true
1.600
Higher level commands for polynomial arithmetic, such as factor, benefits from the improvements described above. The following example is about 10 times as fast as in Maple 13:
f := expand((1+x+y+z+t)^11)+1:
p:=expand(f*(f+1)):
time(factor(p));
2.828
Maple 14 adds improved algorithms for multiplication, division, inversion, and gcd of recursive dense polynomials, which is the representation used to compute with polynomials over algebraic number fields.
The new algorithms preallocate memory and run in place, which makes them significantly faster when multiple extensions are present.
The following example is about twice as fast as in Maple 13:
p := modp1(Prime(1)):
alpha := RootOf(Randprime(2,x) mod p):
beta := RootOf(Randprime(5,x,alpha) mod p):
cofs := ()->randpoly([alpha,beta],degree=4,dense):
f,g,h := seq(Expand(randpoly(x,degree=40,dense,coeffs=cofs)) mod p,i=1..3):
time(assign('a'=Expand(f*g) mod p));
0.396
time(assign('b'=Expand(g*h) mod p));
0.316
time(assign('g'=Gcd(a,b) mod p));
0.824
The commands has, sort, degree, and type are now much faster when algebraic numbers or functions are present. A new implementation reduces the time complexity of these commands.
For the examples below, Maple 14's implementation is 3 seconds faster than Maple 13 on the commands concerning F:
c := rand(0..995):
r := RootOf(numtheory[cyclotomic](997,Z),Z):
F := [seq(add(c()*Z^c()*x^i, i=0..10^4-1), j=1..10)]:
G := subs(Z=r, F):
time(map(degree, eval(F,1), x));
0.012
time(map(degree, eval(G,1), x));
time(map(type, eval(F,1), polynom(anything,x)));
0.004
time(map(type, eval(G,1), polynom(anything,x)));
0.008
Time and memory efficiency of BipartiteMatching, ConnectedComponents and MaxFlow has been improved in Maple 14 because a new algorithm is employed.
For the example below, Maple 14 is about 200 times faster and takes 150 MB less memory than Maple 13 to perform the computation.
with(GraphTheory):
with(SpecialGraphs):
bt:=CompleteBinaryTree(8):
time(BipartiteMatching(bt));
0.048
For the example below, Maple 14 takes 9 fewer seconds and 250 MB less memory than Maple 13 to perform the computation.
cg:=CycleGraph(4000):
cgc:=GraphComplement(cg):
time(ConnectedComponents(cgc));
0.112
For the example below, Maple 14 is 9 times faster than Maple 13.
sb:=SoccerBallGraph():
sbw:=MakeWeighted(sb):
time(MaxFlow(sbw,1,20));
0.020
In addition, basic GraphTheory commands (including CompleteGraph, DisjointUnion, and AddVertex) have been made more efficient for large graphs having many vertices and edges.
We have added support for accelerating LinearAlgebra routines using NVIDIA's CUDA technology. Matrix multiplication can be accelerated for a variety of datatypes and shapes. See the CUDA package and CUDA,supported_routines for more information. Note: the following example will only show a speed up if the computer on which it is run supports CUDA. See CUDA,supported_hardware for more information.
N := 20:
m1 := LinearAlgebra:-RandomMatrix( 2000,2000,outputoptions=[datatype=float[4]] ):
m2 := LinearAlgebra:-RandomMatrix( 2000,2000,outputoptions=[datatype=float[4]] ):
t := time[real]():
to N do mNoCuda := m1.m2: end:
tNoCuda := time[real]()-t;
tNoCuda≔24.942
CUDA:-Enable( true );
to N do mCuda := m1.m2: end:
tCuda := time[real]()-t;
tCuda≔2.202
We reduced the overhead of calling the . operator. This is particularly significant when working with small matrices.
In Maple 13, the following commands took over 9 seconds to complete.
A := Matrix([[9,5],[6,4]]):
B := Matrix([[6,7],[3,10]]):
n := 100:
st := time():
to n do to n do A.B; od; od:
t1 := time()-st;
t1≔0.072
Univariate polynomial solving can be accessed directly with the new command SolveTools[Polynomial] and multivariate solving can be accessed with SolveTools[PolynomialSystem]. Both commands are can be more efficient than solve on purely polynomial equations, since they avoid a large amount of preprocessing and dispatch overhead much like SolveTools[Linear]. See Enhancements to Symbolic Capabilities in Maple 14 for details.
See Also
Index of New Maple 14 Features
Download Help Document