Efficiency Improvements in Maple 13Maple 13 includes numerous efficiency improvements. Some highlights where dramatic speed-ups were achieved are described below.Linear systems over Q and over cyclotomic fieldsMore LinearAlgebra efficiency improvementsModular multivariate polynomial multiplication and divisionNew sparse modular algorithm for Greatest Common Divisors over algebraic number and function fieldsEfficiency improvements to lcoeff and tcoeffheap[extract] is more efficientMultithreaded performanceElement-wise operationsImprovements to GetSubImage in the ImageTools packageDecreased garbage collection frequency

<Text-field style="Heading 2" layout="Heading 2" bookmark="bkmrk0">Linear systems over Q and over cyclotomic fields</Text-field>A new dense p-adic lift algorithm has improved the efficiency of LinearAlgebra[LinearSolve] and LinearAlgebra[Modular][IntegerLinearSolve]. See Enhancements to LinearAlgebra for details.Maple 13 includes a fast modular algorithm for solving linear systems over cyclotomic fields. For example, if a linear system involves NiMtSSdSb290T2ZHNiRJKnByb3RlY3RlZEdGJkkoX3N5c2xpYkc2IjYkSSJmR0YoSSJ4R0Yo where NiMtSSJmRzYiNiNJInhHRiU= is a cyclotomic polynomial of degree > 1, then the fast code will be used. See also numtheory[cyclotomic].As a comparison, consider the following example:r := RootOf(x^4+x^3+x^2+x+1);cf := ()->randpoly(r, degree = 3):nv := 50:vars := seq(x[i],i=1..nv):sys := {seq(randpoly([vars],degree=1,coeffs=cf),i=1..nv)}:tt := time():solve(sys,{vars}):time()-tt;In Maple 12, this same problem took almost 7 times as long.
<Text-field style="Heading 2" layout="Heading 2" bookmark="bkmrk1">More LinearAlgebra efficiency improvements</Text-field>Copying of a Matrix to a result with a shape (indexing function) is faster. The following Cholesky decomposition operation is approximately twenty times as fast as in Maple 12.with(LinearAlgebra):N := 1000:
M := RandomMatrix(N,N,generator=-0.5..0.5, outputoptions=[datatype=float[8]]):L := Matrix(M,shape=triangular[lower]):M := Matrix(L.Transpose(L)):for i from 1 to N do M[i,i]:=i; end do:for i from 1 to N do M[i,i] := i; end do:st := time():LUDecomposition(M,method=Cholesky):time() - st;The following LU decomposition operation is approximately ten times as fast and requires only 80% of the memory allocation as compared to Maple 12.with(LinearAlgebra):N := 1000:
M := RandomMatrix(N,N,generator=-0.5..0.5, outputoptions=[datatype=float[8]]):st := time():LUDecomposition(M):time()-st;The new thin option to SingularValues gives a result with only NiMtSSRtaW5HSSpwcm90ZWN0ZWRHRiU2JEkibUc2IkkibkdGKA== right and left singular vectors. This produces a marked speedup for LeastSquares solving when the number of rows is much larger than the number of columns. The following singular values decomposition operation is approximately thirty times as fast and requires only 4% of the memory allocation as compared to the non-thin computation available in Maple 12.with(LinearAlgebra):m := 10000:
n := 100:
M := RandomMatrix(m,n,generator=-1.0..1.0, outputoptions=[datatype=float[8]]):st := time():SingularValues(M,output=[U,S,Vt],thin=true):time()-st;
<Text-field style="Heading 2" layout="Heading 2" bookmark="bkmrk2">Modular multivariate polynomial multiplication and division</Text-field>New heap-based modular multivariate polynomial multiplication and division algorithms have been added to Maple which are faster than the prior multiplication and division implementations.This expansion has sped up by approximately 50%:p1 := randpoly([x,y,z],degree=20,terms=50):tt := time():p2 := modp(Expand(p1 ^ 3),13):time()-tt;This division has sped up by approximately 60%:tt := time():modp(Divide(p2,p1),13);time()-tt;For more information about these commands, see Expand and Divide.
<Text-field style="Heading 2" layout="Heading 2" bookmark="bkmrk3">New sparse modular algorithm for Greatest Common Divisors over algebraic number and function fields</Text-field>A new algorithm has been added for evaluating Greatest Common Divisors over algebraic number and function fields, which is particularly effective for sparse polynomials. For example, this Gcd computation is about 95% faster:alias(p=RootOf(z^3+t*z^2+s*t,z)):alias(q=RootOf(z^2+2)):g := (y+p*t+q)*(t*x+s*y*p+q):a := t*x+s*y+p*s*t+p^2+q:b := x+s*x*y-t*p^2+p+t+q*s:A := evala(Expand(g*a)):B := evala(Expand(g*b)):tt := time():evala(Gcd(A, B));time()-tt;
<Text-field style="Heading 2" layout="Heading 2" bookmark="bkmrk4">Efficiency improvements to lcoeff and tcoeff</Text-field>The efficiency of lcoeff and tcoeff has been improved. The following computation of the leading and trailing coefficients took more than twice as long and used considerably more memory in Maple 12:vars := [seq(x[i],i=0..19)];f := add(i*vars[i], i=1..20);g := expand(f^6):(nops,length)(g);t := time():
(lcoeff,tcoeff)(g, vars);
time() - t;Other improvements to lcoeff and tcoeff are described in Enhancements to Symbolic Capabilities in Maple 13.
<Text-field style="Heading 2" layout="Heading 2" bookmark="bkmrk5">heap[extract] is more efficient</Text-field>The heap[extract] function now performs fewer comparisons than in prior Maple versions. The number of comparisons performed in extracting all elements from an NiNJIm5HNiI= element heap for Maple 13 versus Maple 12 is provided in the following table:

Number ofElementsMaple 12Maple 1310272620896830158116402391705033824260416295705173458064241690737483100841554

<Text-field style="Heading 2" layout="Heading 2" bookmark="bkmrk6">Multithreaded performance</Text-field>Many parts of the Maple evaluation engine have been updated to improve the performance when executing in parallel. Although there is still more work to be done, the performance in parallel is significantly faster.p := proc(A::Vector,B::Vector,C::Vector,m::integer,n::integer)
option hfloat;
local i;
for i from m to n
do
C[i] := A[i]^2+B[i]^2;
end do;
end proc:
n := 10^7:
m := n/2:
A := LinearAlgebra:-RandomVector( n, outputoptions=[datatype=hfloat] ):
B := LinearAlgebra:-RandomVector( n, outputoptions=[datatype=hfloat] ):
C := Vector( 1..n, datatype=hfloat ):tt := time[real]():
p(A,B,C,1,m):
p(A,B,C,m+1,n):
time[real]() - tt;tt := time[real]():
id1 := Threads:-Create( p(A,B,C,1,m) ):
p(A,B,C,m+1,n):
Threads:-Wait( id1 ):
time[real]() - tt;In Maple 12, the parallel case takes 35.568 seconds, slightly longer than the single threaded case.
<Text-field style="Heading 2" layout="Heading 2" bookmark="bkmrk7">Element-wise operations</Text-field>The elementwise operator syntax can be more efficient than zip or map in some cases. Specific operations have been optimized compared to the general code used by zip or map. In other cases, depending on the input, built-in hardware floating-point routines can be used directly.A := LinearAlgebra:-RandomMatrix(1000):time(zip(`*`,A,2));time( A *~ 2 );B := LinearAlgebra:-RandomMatrix(1000,outputoptions=[datatype=float[8]]):time(zip(`*`,B,2));time( B *~ 2 );time(map(`sin`,B));time( sin~(B));
<Text-field style="Heading 2" layout="Heading 2" bookmark="bkmrk8">Improvements to GetSubImage in the ImageTools package</Text-field>The GetSubImage routine of the ImageTools package is faster for the in-place case where the output is specified. The following repeated GetSubImage operation is approximately twice as fast as in Maple 12.N := 500:n := 100:image := Array(1..N,1..N,datatype=float[8],order=C_order):container := Array(1..n,1..n,datatype=float[8],order=C_order):st := time():for i from 1 to 1000 do
ImageTools:-GetSubImage(image,10,10,n,n,output=container);
end do:time() - st;
<Text-field style="Heading 2" layout="Heading 2" bookmark="bkmrk9">Decreased garbage collection frequency</Text-field>In previous versions of Maple, a garbage collection cycle was initiated after allocating approximately 10^6 words of memory. Given the increase in the amount of memory available on current computers the frequency of garbage collection can be reduced without placing a significant burden on the memory subsystem. In Maple 13, the garbage collector is activated after allocating a minimum of 10^7 words. On average, this yields an improvement of 19% on 32-bit processors and about 10% for 64-bit processors with a modest increase in memory usage. The frequency of garbage collection can be tuned to achieve the best performance for your application by adjusting the gcfreq variable using the kernelopts mechanism. For example,with(LinearAlgebra):gcTest := proc()
local A, n, resultList;
for n to 30 do
A := RandomMatrix(n,n);
resultList:= LUDecomposition(A,output=['P','L','U1','R']);
end do;
end proc:kernelopts(gcfreq=10^6);gc():
start := time():
gcTest():
t1 := time() - start;kernelopts(gcfreq=10^7);gc():
start := time():
gcTest():
t2:= time() - start;See AlsoIndex of New Maple 13 Features