 Efficiency - Maple Help

Efficiency Improvements in Maple 14

Maple 14 includes numerous efficiency improvements.  Some highlights where dramatic speed-ups were achieved are described below. Intel (R) MKL 10.0 (Windows)

 • The Intel (R) Math Kernel Library (Intel (R) MKL) version 10.0 replaces a prior version on 32-bit Windows, and replaces generic BLAS on 64-bit Windows.  These highly optimized core routines are used in various places throughout Maple.
 • The following example highlights how performance in Maple 14 is significantly improved.
 > M:=LinearAlgebra:-RandomMatrix(2000,datatype=float):
 > time( M.M );
 ${1.343}$ (1)

On a 2.13GHz Core2 Duo, running 32-bit Windows, Maple 14 takes 2.44 seconds now compared to 7.95 seconds in Maple 13.02.

On a 2.00GHz dual quad core Xeon, running 64-bit Windows, Maple 14 takes 2.437 seconds now compared to 25.593 seconds in Maple 13.02. Cyclotomic Polynomials

 • A new implementation of numtheory[cyclotomic] constructs cyclotomic polynomials significantly faster than the previous implementation for large prime factors.
 • As a comparison, consider the following example:
 > t:= time():
 > numtheory[cyclotomic](255255,x):
 > time() - t;
 ${0.228}$ (2)
 This is about 1000 times as fast as in Maple 13. The amount of speedup increases with the size of the problem. Polynomial Arithmetic

 • Polynomial multiplication and division over integers can be performed faster than in Maple 13 because of an improved implementation. Multiple threads may also be used for large polynomial multiplications.
 • The following example is about twenty times as fast as in Maple 13:
 > f,g := seq(randpoly([x,y,z],dense,degree=30),i=1..2):
 > t:= time():
 > assign(p=expand(f*g)):
 > divide(p,f,'q');
 ${\mathrm{true}}$ (3)
 > time() - t;
 ${1.600}$ (4)
 • Higher level commands for polynomial arithmetic, such as factor, benefits from the improvements described above.  The following example is about 10 times as fast as in Maple 13:
 > f := expand((1+x+y+z+t)^11)+1:
 > p:=expand(f*(f+1)):
 > time(factor(p));
 ${2.828}$ (5)
 • Maple 14 adds improved algorithms for multiplication, division, inversion, and gcd of recursive dense polynomials, which is the representation used to compute with polynomials over algebraic number fields.
 • The new algorithms preallocate memory and run in place, which makes them significantly faster when multiple extensions are present.
 • The following example is about twice as fast as in Maple 13:
 > p := modp1(Prime(1)):
 > alpha := RootOf(Randprime(2,x) mod p):
 > beta  := RootOf(Randprime(5,x,alpha) mod p):
 > cofs  := ()->randpoly([alpha,beta],degree=4,dense):
 > f,g,h := seq(Expand(randpoly(x,degree=40,dense,coeffs=cofs)) mod p,i=1..3):
 > time(assign('a'=Expand(f*g) mod p));
 ${0.396}$ (6)
 > time(assign('b'=Expand(g*h) mod p));
 ${0.316}$ (7)
 > time(assign('g'=Gcd(a,b) mod p));
 ${0.824}$ (8) Fetching Information of Polynomials via has, sort, degree, type

 • The commands has, sort, degree, and type are now much faster when algebraic numbers or functions are present.  A new implementation reduces the time complexity of these commands.
 • For the examples below, Maple 14's implementation is 3 seconds faster than Maple 13 on the commands concerning F:
 > c := rand(0..995):
 > r := RootOf(numtheory[cyclotomic](997,Z),Z):
 > F := [seq(add(c()*Z^c()*x^i, i=0..10^4-1), j=1..10)]:
 > G := subs(Z=r, F):
 > time(map(degree, eval(F,1), x));
 ${0.012}$ (9)
 > time(map(degree, eval(G,1), x));
 ${0.012}$ (10)
 > time(map(type, eval(F,1), polynom(anything,x)));
 ${0.004}$ (11)
 > time(map(type, eval(G,1), polynom(anything,x)));
 ${0.008}$ (12) Faster GraphTheory Algorithms

 • Time and memory efficiency of BipartiteMatching, ConnectedComponents and MaxFlow has been improved in Maple 14 because a new algorithm is employed.
 • For the example below, Maple 14 is about 200 times faster and takes 150 MB less memory than Maple 13 to perform the computation.
 > with(GraphTheory):
 > with(SpecialGraphs):
 > bt:=CompleteBinaryTree(8):
 > time(BipartiteMatching(bt));
 ${0.048}$ (13)
 • For the example below, Maple 14 takes 9 fewer seconds and 250 MB less memory than Maple 13 to perform the computation.
 > with(GraphTheory):
 > cg:=CycleGraph(4000):
 > cgc:=GraphComplement(cg):
 > time(ConnectedComponents(cgc));
 ${0.112}$ (14)
 • For the example below, Maple 14 is 9 times faster than Maple 13.
 > with(GraphTheory):
 > with(SpecialGraphs):
 > sb:=SoccerBallGraph():
 > sbw:=MakeWeighted(sb):
 > time(MaxFlow(sbw,1,20));
 ${0.020}$ (15)
 • In addition, basic GraphTheory commands (including CompleteGraph, DisjointUnion, and AddVertex) have been made more efficient for large graphs having many vertices and edges. CUDA Acceleration

 • We have added support for accelerating LinearAlgebra routines using NVIDIA's CUDA technology.  Matrix multiplication can be accelerated for a variety of datatypes and shapes.  See the CUDA package and CUDA,supported_routines for more information.  Note: the following example will only show a speed up if the computer on which it is run supports CUDA.  See CUDA,supported_hardware for more information.
 > N := 20:
 > m1 := LinearAlgebra:-RandomMatrix( 2000,2000,outputoptions=[datatype=float] ):
 > m2 := LinearAlgebra:-RandomMatrix( 2000,2000,outputoptions=[datatype=float] ):
 > t := time[real]():
 > to N do mNoCuda := m1.m2: end:
 > tNoCuda := time[real]()-t;
 ${\mathrm{tNoCuda}}{≔}{24.942}$ (16)
 > CUDA:-Enable( true );
 > t := time[real]():
 > to N do mCuda := m1.m2: end:
 > tCuda := time[real]()-t;
 ${\mathrm{tCuda}}{≔}{2.202}$ (17) . (dot) Operator Improvements

We reduced the overhead of calling the . operator.  This is particularly significant when working with small matrices.

 • In Maple 13, the following commands took over 9 seconds to complete.
 > A := Matrix([[9,5],[6,4]]):
 > B := Matrix([[6,7],[3,10]]):
 > n := 100:
 > st := time():
 > to n do to n do A.B; od; od:
 > t1 := time()-st;
 ${\mathrm{t1}}{≔}{0.072}$ (18) 