Efficiency Improvements in Maple 14
Maple 14 includes numerous efficiency improvements. Some highlights where dramatic speedups were achieved are described below.

Intel(R) MKL 10.0 (Windows)


•

The Intel(R) Math Kernel Library (Intel(R) MKL) version 10.0 replaces a prior version on 32-bit Windows, and replaces the generic BLAS on 64-bit Windows. These highly optimized core routines are used in various places throughout Maple.

•

The following example highlights how performance in Maple 14 is significantly improved.

>

M := LinearAlgebra:-RandomMatrix(2000, datatype=float[8]):

On a 2.13 GHz Core2 Duo running 32-bit Windows, Maple 14 takes 2.44 seconds, compared to 7.95 seconds in Maple 13.02.
On a 2.00 GHz dual quad-core Xeon running 64-bit Windows, Maple 14 takes 2.437 seconds, compared to 25.593 seconds in Maple 13.02.
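Speedups of this kind come from the tuned BLAS kernels in MKL, which use techniques such as cache blocking and vectorization. The following is a rough illustration of the cache-blocking idea only, written in Python rather than Maple; the function names are ours, and in pure Python the blocked version is not actually faster (interpreter overhead dominates), so it only demonstrates the memory-access pattern a tuned BLAS exploits:

```python
# Toy sketch of cache blocking, one of the techniques a tuned BLAS uses.
# Both functions compute the same product; the blocked version walks the
# matrices in small tiles so each tile can stay in cache while reused.

def matmul_naive(A, B):
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            aik = A[i][k]
            for j in range(p):
                C[i][j] += aik * B[k][j]
    return C

def matmul_blocked(A, B, bs=32):
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i0 in range(0, n, bs):
        for k0 in range(0, m, bs):
            for j0 in range(0, p, bs):
                # Update one bs-by-bs tile of C using tiles of A and B.
                for i in range(i0, min(i0 + bs, n)):
                    for k in range(k0, min(k0 + bs, m)):
                        aik = A[i][k]
                        for j in range(j0, min(j0 + bs, p)):
                            C[i][j] += aik * B[k][j]
    return C
```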


Cyclotomic Polynomials


•

A new implementation of numtheory[cyclotomic] constructs cyclotomic polynomials significantly faster than the previous implementation for large prime factors.

•

As a comparison, consider the following example:

>

numtheory[cyclotomic](255255,x):


This is about 1000 times as fast as in Maple 13. The amount of speedup increases with the size of the problem.
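For reference, the n-th cyclotomic polynomial satisfies Phi_n(x) = (x^n - 1) / prod of Phi_d(x) over proper divisors d of n, which gives a simple way to compute it. A minimal sketch of that classical method in Python follows; it is not Maple's algorithm (the new numtheory[cyclotomic] implementation is considerably more sophisticated), only a statement of what is being computed:

```python
from functools import lru_cache

def poly_divexact(num, den):
    """Exact division of dense integer coefficient lists (low degree first)."""
    num = list(num)
    q = [0] * (len(num) - len(den) + 1)
    for k in range(len(q) - 1, -1, -1):
        c = num[k + len(den) - 1] // den[-1]   # den is monic here
        q[k] = c
        for j, dj in enumerate(den):
            num[k + j] -= c * dj
    return q

@lru_cache(maxsize=None)
def cyclotomic(n):
    """Coefficients of the n-th cyclotomic polynomial, low degree first.
    Uses Phi_n(x) = (x^n - 1) / prod_{d | n, d < n} Phi_d(x)."""
    poly = [-1] + [0] * (n - 1) + [1]          # x^n - 1
    for d in range(1, n):
        if n % d == 0:
            poly = poly_divexact(poly, cyclotomic(d))
    return tuple(poly)
```

For example, `cyclotomic(6)` returns `(1, -1, 1)`, that is, x^2 - x + 1; the degree of Phi_n is always phi(n), Euler's totient.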



Polynomial Arithmetic


•

Polynomial multiplication and division over the integers are faster than in Maple 13 because of an improved implementation. Multiple threads may also be used for large polynomial multiplications.

•

The following example is about twenty times as fast as in Maple 13:

>

f,g := seq(randpoly([x,y,z],dense,degree=30),i=1..2):
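One way threads can be applied to a large product, sketched here in Python with hypothetical helper names (Maple's parallel implementation differs), is to split one operand into contiguous chunks, multiply each chunk against the other operand concurrently, and then add the shifted partial products. Note that in CPython the global interpreter lock prevents pure-Python arithmetic from scaling across threads, so this shows only the structure of the decomposition:

```python
from concurrent.futures import ThreadPoolExecutor

def poly_mul(a, b):
    """Schoolbook product of dense coefficient lists (low degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_mul_threaded(a, b, nchunks=4):
    """Multiply a*b by farming chunks of `a` out to a thread pool."""
    step = max(1, -(-len(a) // nchunks))       # ceiling division
    chunks = [(k, a[k:k + step]) for k in range(0, len(a), step)]
    out = [0] * (len(a) + len(b) - 1)
    with ThreadPoolExecutor() as pool:
        parts = pool.map(lambda c: (c[0], poly_mul(c[1], b)), chunks)
        # Each partial product is shifted by its chunk's starting degree.
        for shift, part in parts:
            for i, coeff in enumerate(part):
                out[shift + i] += coeff
    return out
```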

•

Higher-level commands for polynomial arithmetic, such as factor, benefit from the improvements described above. The following example is about 10 times as fast as in Maple 13:

>

f := expand((1+x+y+z+t)^11)+1:

•

Maple 14 adds improved algorithms for multiplication, division, inversion, and gcd of recursive dense polynomials, which is the representation used to compute with polynomials over algebraic number fields.

•

The new algorithms preallocate memory and run in place, which makes them significantly faster when multiple extensions are present.

•

The following example is about twice as fast as in Maple 13:

>

alpha := RootOf(Randprime(2,x) mod p):

>

beta := RootOf(Randprime(5,x,alpha) mod p):

>

cofs := () -> randpoly([alpha,beta], degree=4, dense):

>

f,g,h := seq(Expand(randpoly(x,degree=40,dense,coeffs=cofs)) mod p,i=1..3):

>

time(assign('a'=Expand(f*g) mod p));

>

time(assign('b'=Expand(g*h) mod p));

>

time(assign('g'=Gcd(a,b) mod p));
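A recursive dense polynomial stores an element of a tower such as Q(alpha)(beta)[x] as a dense list of coefficients in x, each of which is itself a dense list over the next extension, down to base integers. The following toy Python sketch shows the shape of the representation and its multiplication; it is not Maple's kernel code (which preallocates its output and works in place for speed), though the preallocated accumulator below loosely mirrors that idea:

```python
def rec_add(f, g):
    """Add two recursive dense polynomials of the same nesting depth.
    The integer 0 doubles as the zero polynomial at any depth."""
    if f == 0:
        return g
    if g == 0:
        return f
    if isinstance(f, int):
        return f + g
    if len(f) < len(g):
        f, g = g, f
    return [rec_add(a, b) for a, b in zip(f, g + [0] * (len(f) - len(g)))]

def rec_mul(f, g):
    """Multiply two recursive dense polynomials of the same nesting depth."""
    if f == 0 or g == 0:
        return 0
    if isinstance(f, int):
        return f * g
    out = [0] * (len(f) + len(g) - 1)   # preallocated result
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] = rec_add(out[i + j], rec_mul(fi, gj))
    return out
```

For example, with the outer level in x and the inner level in y, `f = [[0, 1], [1]]` encodes y + x, and `rec_mul(f, f)` gives `[[0, 0, 1], [0, 2], [1]]`, that is, y^2 + 2xy + x^2.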



Fetching Information about Polynomials via has, sort, degree, and type


•

The commands has, sort, degree, and type are now much faster when algebraic numbers or functions are present. A new implementation reduces the time complexity of these commands.

•

For the examples below, Maple 14 is about 3 seconds faster than Maple 13 on the commands involving F:

>

r := RootOf(numtheory[cyclotomic](997,Z),Z):

>

F := [seq(add(c()*r^c()*x^i, i=0..10^4), j=1..10)]:

>

time(map(degree, eval(F,1), x));

>

time(map(degree, eval(G,1), x));

>

time(map(type, eval(F,1), polynom(anything,x)));

>

time(map(type, eval(G,1), polynom(anything,x)));



Faster GraphTheory Algorithms


•

For the example below, Maple 14 is about 200 times faster and takes 150 MB less memory than Maple 13 to perform the computation.

>

bt:=CompleteBinaryTree(8):

>

time(BipartiteMatching(bt));
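A maximum bipartite matching can be found with the classical augmenting-path (Kuhn) method; the following is a minimal Python sketch, independent of Maple's actual BipartiteMatching implementation, showing the computation the command performs:

```python
def max_bipartite_matching(adj, n_right):
    """adj[u]: neighbours (right-side indices) of left vertex u.
    Returns the size of a maximum matching via augmenting paths."""
    match_r = [-1] * n_right          # right vertex -> matched left vertex

    def augment(u, seen):
        # Try to match u, recursively re-homing current partners if needed.
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                if match_r[v] == -1 or augment(match_r[v], seen):
                    match_r[v] = u
                    return True
        return False

    return sum(augment(u, set()) for u in range(len(adj)))
```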

•

For the example below, Maple 14 takes 9 fewer seconds and 250 MB less memory than Maple 13 to perform the computation.

>

cgc:=GraphComplement(cg):

>

time(ConnectedComponents(cgc));

•

For the example below, Maple 14 is 9 times faster than Maple 13.

>

time(MaxFlow(sbw,1,20));

•

In addition, basic GraphTheory commands (including CompleteGraph, DisjointUnion, and AddVertex) have been made more efficient for large graphs having many vertices and edges.



CUDA Acceleration


•

We have added support for accelerating LinearAlgebra routines using NVIDIA's CUDA technology. Matrix multiplication can be accelerated for a variety of datatypes and shapes. See the CUDA package and CUDA[supported_routines] for more information. Note: the following example will only show a speedup if the computer on which it is run supports CUDA. See CUDA[supported_hardware] for more information.

>

m1 := LinearAlgebra:-RandomMatrix( 2000, 2000, outputoptions=[datatype=float[4]] ):

>

m2 := LinearAlgebra:-RandomMatrix( 2000, 2000, outputoptions=[datatype=float[4]] ):

>

t := time[real]():

>

to N do mNoCuda := m1.m2: end:

>

tNoCuda := time[real]() - t;

tNoCuda := 24.942
>

CUDA:-Enable():

>

t := time[real]():

>

to N do mCuda := m1.m2: end:

>

tCuda := time[real]() - t;

tCuda := 2.202


. (dot) Operator Improvements


We reduced the overhead of calling the . operator. This is particularly significant when working with small matrices.
•

In Maple 13, the following commands took over 9 seconds to complete.

>

A := Matrix([[9,5],[6,4]]):

>

B := Matrix([[6,7],[3,10]]):

>

t := time():

>

to n do to n do A.B; od; od:

>

t1 := time() - t;

t1 := 0.072


Direct Access to Solvers







