Comparison of Multivariate
Optimization Methods
© Sergey Moiseev 2006, smoiseev@kodofon.vrn.ru, Kodofon, Russia

Introduction


This application demonstrates the use of Maple to compare methods of unconstrained nonlinear minimization of functions of several variables. Seven methods for nonlinear minimization of the objective function f(x1,x2,…,xn) of n variables are analyzed:
1. Minimum search by coordinate and conjugate directions descent, an original minimization method developed by the author (the Descent program);
2. Powell's method (the Powell program);
3. The modified Hooke-Jeeves method (the Hooke program);
4. The simplex Nelder-Mead method (the Nelder program);
5. The quasi-gradient method of minimum search (the Qgrad program);
6. Random directions search (the Random program);
7. Simulated annealing (the Anneal program).

All methods are direct search methods; i.e., they do not require the objective function f(x1,x2,…,xn) to be differentiable or even continuous. The optimization methods have been compared on a set of 21 test functions. The efficiency of Maple's built-in Optimization package is also compared with these programs.

Brief Description of Optimization Methods



Minimum Search By Coordinate And Conjugate Directions Descent


The method is based on the idea of coordinate descent along all the initial coordinates and, when possible, descent in the directions conjugate to the initial coordinates. Recall that two directions u and v are conjugate if u^T H v = 0, where H is the positive definite Hessian matrix of the objective function, H[i,j] = d^2 f / (dx_i dx_j). If the function f(x1,x2,…,xn) is an n-dimensional positive definite quadratic form, i.e. an n-dimensional paraboloid, its minimum is reached in n moves along n distinct conjugate directions.
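As a quick check of this definition, the following Maple sketch (illustrative code, not part of the author's programs; the matrix H and the directions u, v are made-up examples) verifies that a coordinate axis and a suitably chosen second direction are conjugate:

with(LinearAlgebra):
H := Matrix([[4, 1], [1, 3]]):   # a positive definite Hessian
u := Vector([1, 0]):             # first direction: the x1 coordinate axis
v := Vector([-1, 4]):            # chosen so that u^T H v = 0
Transpose(u) . H . v;            # evaluates to 0, so u and v are conjugate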
Schematically, the computing procedure of the algorithm implemented in the Descent program consists of the following rules. At the beginning, the search moves along the initial coordinates x1,x2,…,xn in turn. The local minimum obtained on the k-th iteration while moving along some coordinate is stored. Later, when a second minimum is found on the (k+m)-th iteration while moving along that same coordinate, a move is immediately made in the direction from the first minimum to the second. Since both minima have been obtained by moving along the same coordinate, the direction connecting them is conjugate to the direction of that coordinate. If n > 2, a third conjugate direction is constructed, connecting two minima obtained by moving along two identical previous conjugate directions. Thus the Descent program approximates a function of n variables by a sequence of two-dimensional and three-dimensional paraboloids.
If the coordinate descent is unsuccessful, the objective function is additionally evaluated at the minimum point of the n-dimensional degenerate paraboloid by which the objective function is approximated. This procedure considerably increases the reliability of the program.
Besides descending along the coordinate axes and the conjugate directions, the Descent program also moves along a curvilinear direction. The curvilinear direction is obtained by parabolic approximation through the coordinates of the last three local minima found while moving along the last coordinate. Movement along the curvilinear direction allows the program to pass efficiently through narrow curved canyons, such as that of the Rosenbrock function.
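The curvilinear move can be sketched as follows (an assumed helper for illustration only; the actual implementation is hidden in the startup code). A parabola is fitted componentwise through the last three local minima P1, P2, P3, treated as points at t = 0, 1, 2, and extrapolated to t = 3:

CurvePoint := proc(P1::list, P2::list, P3::list)
    local i;
    # quadratic extrapolation: p(3) = p(0) - 3*p(1) + 3*p(2), componentwise
    [seq(P1[i] - 3*P2[i] + 3*P3[i], i = 1 .. nops(P1))];
end proc:
CurvePoint([0, 0], [1, 1], [2, 4]);   # returns [3, 9], continuing the parabola y = x^2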


Powell's Method


In Powell's method [1], a complete set of n conjugate directions is constructed by the following recurrent rule: the k-th conjugate direction coincides with the direction connecting two local minima obtained by moving along the previous k-1 identical conjugate directions. If the function decreases considerably during the move along the k-th conjugate direction, the corresponding coordinate axis is replaced by this conjugate direction. Thus, unlike the Descent program, the Powell program approximates a function of n variables by a sequence of n-dimensional paraboloids during optimization.
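The direction update can be sketched as follows (an assumed simplification: here the first direction is always discarded, whereas Powell's method replaces the direction of largest function decrease):

UpdateDirections := proc(Dirs::listlist, Xold::list, Xnew::list)
    local d;
    d := zip(`-`, Xnew, Xold);   # new conjugate direction connecting the two minima
    [op(Dirs[2 .. -1]), d];      # drop the first old direction, append d
end proc:
UpdateDirections([[1, 0], [0, 1]], [0, 0], [1, 2]);   # returns [[0, 1], [1, 2]]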


The Modified Hooke-Jeeves Method


The Hooke-Jeeves method [1] is a modification of the simplest coordinate descent method. First, k iterations are carried out exactly as in the coordinate descent method. After a local minimum is found on the k-th iteration, the search continues in the direction connecting the initial approximation and the k-th approximation (the so-called pattern search).
The modification of the Hooke-Jeeves method performed by the author mainly consists in the following: a variable search step and quadratic interpolation are used in the linear search along each coordinate. In addition, the probability that the pattern search direction coincides with a direction conjugate to the coordinate axes is increased.
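A single pattern move can be sketched as follows (an assumed helper, not the author's Hooke program): from the previous base point X0 and the current base point Xk, the probe point is obtained by doubling the step from X0 to Xk:

PatternPoint := (X0, Xk) -> zip((a, b) -> b + (b - a), X0, Xk):
PatternPoint([0, 0], [1, 2]);   # returns [2, 4]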


Simplex Nelder-Mead Method


When searching for a minimum of a function of n variables by the Nelder-Mead method [1], the trial points are the vertices of a simplex (polyhedron). The objective function is evaluated at each vertex of the simplex. The initial polyhedron is then deformed by one of several special procedures (reflection, stretching, compression or reduction) so that the vertex with the maximum function value is replaced by a vertex with a smaller function value. The deformation of the polyhedron continues until its size decreases below the given tolerance. The barycentre of the final polyhedron is taken as the approximation to the minimum point.
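The reflection step, the most frequently used deformation, can be sketched as follows (an assumed helper reusing the worksheet's coefficient name Reflection; Worst is the vertex with the maximum function value, Others are the remaining vertices):

Reflect := proc(Worst::list, Others::listlist, Reflection)
    local n, C, i, P;
    n := nops(Worst);
    C := [seq(add(P[i], P in Others)/nops(Others), i = 1 .. n)];   # barycentre of the other vertices
    [seq(C[i] + Reflection*(C[i] - Worst[i]), i = 1 .. n)];        # reflected vertex
end proc:
Reflect([2, 2], [[0, 0], [1, 0]], 1);   # returns [-1, -2]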


Quasi-Gradient Method of Minimum Search


The search direction of this method is directly opposite to the direction of the gradient vector of the function f(x1,x2,…,xn). Unlike the gradient method, which requires the first analytic derivatives of the objective function, here the derivatives are estimated approximately by means of finite differences.
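Such a finite-difference estimate can be sketched as follows (assumed helper code; h is the difference step, and f is applied to the coordinates of the list X):

QGradEstimate := proc(f, X::list, h)
    local n, i, g, Xh;
    n := nops(X);
    g := Vector(n);
    for i to n do
        Xh := subsop(i = X[i] + h, X);        # perturb the i-th coordinate
        g[i] := (f(op(Xh)) - f(op(X))) / h;   # forward difference
    end do;
    g;
end proc:
QGradEstimate((x, y) -> x^2 + 3*y^2, [1.0, 2.0], 1e-4);   # approximately [2, 12]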


Random Directions Search


The search directions of this method are random and independent of each other. If a direction is chosen unsuccessfully, another random direction is drawn.
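Generating such a direction can be sketched as follows (an assumed helper: the components are drawn uniformly and the vector is normalized, which is adequate for illustration although not exactly uniform over the sphere):

RandomDirection := proc(n::posint)
    local r, v, i;
    r := rand(-10^6 .. 10^6);                    # integer random number generator
    v := Vector([seq(evalf(r()), i = 1 .. n)]);  # random components
    LinearAlgebra:-Normalize(v, 2);              # scale to unit Euclidean length
end proc:
RandomDirection(3);   # a random unit direction in 3 dimensions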


Simulated Annealing


Simulated annealing [2] is a random-search technique utilizing an analogy between the way in which a metal cools and freezes into a minimum-energy crystalline structure (the annealing process). Simulated annealing's major advantage over other methods is its ability to avoid becoming trapped in local minima. The algorithm employs a random search which accepts not only changes that decrease the objective function f, but also some changes that increase it. The algorithm starts with an initial feasible solution, which is set as the current solution. A neighboring solution from the solution space is chosen at random, and its energy (objective function value) f is compared with that of the current solution. If the energy is decreased, the neighbor solution is kept and set as the current solution. Otherwise, the solution is accepted with a probability that depends on the stage the algorithm is in (we designate this stage via a variable T called the "temperature"). The probability of accepting a worse solution is P = exp(-df/T), where df is the difference between the energy of the neighbor solution and that of the current solution. The temperature is managed by the cooling schedule that controls the execution of the algorithm: the temperature T decreases according to the formula T_new = Cooling*T_old, where 0 < Cooling < 1 is the cooling factor.
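The acceptance rule described above can be sketched as follows (an assumed helper, not the author's Anneal program):

AcceptMove := proc(df, T)
    local u;
    u := evalf(rand(0 .. 10^6)() / 10^6);   # uniform random number in [0, 1]
    evalb(df < 0 or u < exp(-df/T));        # improvements are always accepted
end proc:
# after each batch of trial moves the cooling schedule is applied: T := Cooling*T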




Initialization



Optimization Programs


These programs are found in the startup code of this Maple document; to view them in detail, please select Edit > Startup Code.

Minimization of the Function f(x1,x2,…,xn) by Coordinate and Conjugate Directions Descent


f - function or procedure of n variables (n = nops(Xmin0)).
Xmin0 - list of n initial approximations of the local minimum.
Step - initial search step.
epsilon << Step*Shorten - tolerance; the search stops when |Xmin - XOld| < epsilon.
0 < Shorten < 1 - step-shortening coefficient. Recommended value: Shorten = 0.15.
Nmax - maximum permissible number of function evaluations; if the number of function evaluations exceeds Nmax, the search stops.
Returns a list containing the estimated minimum of the function f, the list of the n arguments of the minimum estimate, the number of iterations, and the number of function evaluations.
The arguments Step, epsilon, Shorten, Nmax can be omitted; the default values are Step:=1; epsilon:=10.^(-6); Shorten:=0.15; Nmax:=10000.
The global listlist Descent_Search_Path contains the search path coordinates.
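A hypothetical call, with the argument order taken from the description above (the Descent procedure itself is defined in this document's startup code; the test function is the Rosenbrock function mentioned earlier):

f := (x, y) -> 100*(y - x^2)^2 + (1 - x)^2:   # Rosenbrock test function
Descent(f, [-1.2, 1.0], 1, 10.^(-6), 0.15, 10000);
# returns [fmin, [x1_min, x2_min], Iter, Nf]

The other programs are called analogously with their respective parameter lists.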


Minimization of the Function f(x1,x2,…,xn) by Powell's Method


f - function or procedure of n variables (n = nops(Xmin0)).
Xmin0 - list of n initial approximations of the local minimum.
Step - initial search step.
epsilon << Step*Shorten - tolerance; the search stops when |Xmin - XOld| < epsilon.
0 < Shorten < 1 - step-shortening coefficient. Recommended value: Shorten = 0.01.
Nmax - maximum permissible number of function evaluations; if the number of function evaluations exceeds Nmax, the search stops.
Returns a list containing the estimated minimum of the function f, the list of the n arguments of the minimum estimate, the number of iterations, and the number of function evaluations.
The arguments Step, epsilon, Shorten, Nmax can be omitted; the default values are Step:=1; epsilon:=10.^(-6); Shorten:=0.01; Nmax:=10000.
The global listlist Powell_Search_Path contains the search path coordinates.


Minimization of the Function f(x1,x2,…,xn) by the Modified Hooke-Jeeves Method


f - function or procedure of n variables (n = nops(Xmin0)).
Xmin0 - list of n initial approximations of the local minimum.
Step - initial search step.
epsilon << Step*Shorten - tolerance; the search stops when |Xmin - XOld| < epsilon.
0 < Shorten < 1 - step-shortening coefficient. Recommended value: Shorten = 0.15.
Nmax - maximum permissible number of function evaluations; if the number of function evaluations exceeds Nmax, the search stops.
Returns a list containing the estimated minimum of the function f, the list of the n arguments of the minimum estimate, the number of iterations, and the number of function evaluations.
The arguments Step, epsilon, Shorten, Nmax can be omitted; the default values are Step:=1; epsilon:=10.^(-6); Shorten:=0.15; Nmax:=10000.
The global listlist Hooke_Search_Path contains the search path coordinates.


Minimization of the Function f(x1,x2,…,xn) by the Nelder-Mead Method


f - function or procedure of n variables (n = nops(Xmin0)).
Xmin0 - list of n initial approximations of the local minimum.
Side - initial length of the simplex rib.
epsilon << Side*Compression - tolerance; the search stops when |Xmin - XOld| < epsilon.
0 < Compression < 1 - compression coefficient. Recommended value: Compression = 0.5.
Reflection - reflection coefficient. Recommended value: Reflection = 1.
Stretching > 1 - stretching coefficient. Recommended value: Stretching = 2.
Reduction > 1 - reduction coefficient. Recommended value: Reduction = 2.
Nmax - maximum permissible number of function evaluations; if the number of function evaluations exceeds Nmax, the search stops.
Returns a list containing the estimated minimum of the function f, the list of the n arguments of the minimum estimate, the number of iterations, and the number of function evaluations.
The arguments Side, epsilon, Compression, Reflection, Stretching, Reduction, Nmax can be omitted; the default values are Side:=1; epsilon:=10.^(-6); Compression:=0.5; Reflection:=1; Stretching:=2; Reduction:=2; Nmax:=10000.
The global listlist Nelder_Search_Path contains the search path coordinates of the simplex barycentre.


Minimization of the Function f(x1,x2,…,xn) by the Quasi-Gradient Method


f - function or procedure of n variables (n = nops(Xmin0)).
Xmin0 - list of n initial approximations of the local minimum.
Step - initial search step.
epsilon << Step*Shorten - tolerance; if the search step < epsilon, the search stops.
0 < Shorten < 1 - step-shortening coefficient. Recommended value: Shorten = 0.7.
Nmax - maximum permissible number of function evaluations; if the number of function evaluations exceeds Nmax, the search stops.
Returns a list containing the estimated minimum of the function f, the list of the n arguments of the minimum estimate, the number of iterations, and the number of function evaluations.
The arguments Step, epsilon, Shorten, Nmax can be omitted; the default values are Step:=1; epsilon:=10.^(-6); Shorten:=0.7; Nmax:=10000.
The global listlist Qgrad_Search_Path contains the search path coordinates.


Minimization of the Function f(x1,x2,…,xn) by Random Directions Search


f - function or procedure of n variables (n = nops(Xmin0)).
Xmin0 - list of n initial approximations of the local minimum.
Nrnd - number of random directions per iteration.
Step - initial search step.
epsilon << Step*Shorten - tolerance; if the search step < epsilon, the search stops.
0 < Shorten < 1 - step-shortening coefficient. Recommended value: Shorten = 0.15.
Nmax - maximum permissible number of function evaluations; if the number of function evaluations exceeds Nmax, the search stops.
Returns a list containing the estimated minimum of the function f, the list of the n arguments of the minimum estimate, the number of iterations, and the number of function evaluations.
The arguments Nrnd, Step, epsilon, Shorten, Nmax can be omitted; the default values are Nrnd:=60; Step:=1; epsilon:=10.^(-6); Shorten:=0.15; Nmax:=10000.
The global listlist Random_Search_Path contains the search path coordinates.


Minimization of the Function f(x1,x2,…,xn) by the Simulated Annealing Method


f - function or procedure of n variables (n = nops(Xmin0)).
Xmin0 - list of n initial approximations of the local minimum (initial solution).
Trnd - number of neighbor random solutions generated to calculate the initial temperature (T_initial = (fmin - fmax)/ln(0.8)).
0 < Cooling < 1 - cooling factor (the temperature T decreases according to the formula T_new = Cooling*T_old).
Step - initial width of the uniform distribution used to select the random search step.
epsilon << Step*Shorten - tolerance; if Step < epsilon, the search stops.
0 < Shorten < 1 - step-shortening coefficient. Recommended value: Shorten = 0.85.
Nrnd - number of random solutions generated before the temperature (T_new = Cooling*T_old) and the step (Step_new = Shorten*Step_old) are decreased.
Nmax - maximum permissible number of function evaluations; if the number of function evaluations exceeds Nmax, the search stops.
Returns a list containing the estimated minimum of the function f, the list of the n arguments of the minimum estimate, the number of iterations, and the number of function evaluations.
The arguments Trnd, Nrnd, Cooling, Step, epsilon, Shorten, Nmax can be omitted; the default values are Trnd:=10; Nrnd:=30; Cooling:=0.15; Step:=1; epsilon:=10.^(-6); Shorten:=0.95; Nmax:=10000.
The global listlist Anneal_Search_Path contains the search path coordinates.




Workplace for Comparison of Optimization Methods



Minimization


Xmin0 - initial point (list of n initial approximations of the local minimum).
Xmin - argument of the minimum estimate.
fmin - minimum of the function f estimate.
Iter - number of iterations.
Nf - number of function evaluations.



Assign an n-dimensional initial point Xmin0:

Minimization with the Default Parameters (with comparison to Maple's Optimization package)


Compare
Program := [fmin, Xmin, Iter, Nf]
MAPLE := [fmin, [x1 = Xmin[1], ..., xn = Xmin[n]]]

Continue the minimization starting from the new initial point Xmin0 = Xmin


Xmin0 = Xmin0



Minimization with User-Defined Parameters


Parameters For Minimization Programs

Step - initial search step.
eps - tolerance.
short - step-shortening coefficient (recommended values: Descent 0.15, Powell 0.01, Hooke 0.15, Qgrad 0.7, Random 0.15).
Side - initial length of the simplex rib (only for Nelder).
Compression - compression coefficient (only for Nelder; recommended value 0.5).
Reflection - reflection coefficient (only for Nelder; recommended value 1).
Stretching - stretching coefficient (only for Nelder; recommended value 2).
Reduction - reduction coefficient (only for Nelder; recommended value 2).
Nrnd - number of random directions (only for Random).
Trnd - number of random solutions generated to calculate the initial temperature (only for Anneal).
NrndA - number of random solutions generated before the temperature and Step are decreased (only for Anneal).
Cooling - cooling factor (only for Anneal).
Nmax - maximum permissible number of function evaluations.

Warning! To improve reliability, one can increase the step-shortening coefficient for non-differentiable functions and decrease it for differentiable functions.
Assign parameters


Listlists of the Search Path Coordinates


List of lists of n search path coordinates: [[x1_0, x2_0, ..., xn_0], [x1_1, x2_1, ..., xn_1], ..., [x1_k, x2_k, ..., xn_k]],
where n is the number of variables and k is the number of search directions.
NB: Very large output - not shown.



Optimization Efficiency (for the four best programs: Descent, Powell, Hooke and Nelder)



Computation Time and Accuracy of Minimization with the Default Parameters (with comparison to Maple's Optimization package)


Repetition_Number is the number of minimization repetitions. The larger Repetition_Number, the better the computation-time estimate.
Calculate
Function minimum estimates
Descent: 4.10e-26
Powell: 1.13e-33
Hooke: 3.41e-12
Nelder: 1.34e-01
Maple: 9.14e-17
Minimization time (s)
Descent: 0.266
Powell: 0.250
Hooke: 0.281
Nelder: 0.328
Maple: 0.438
Number of function evaluations
Descent: 102
Powell: 157
Hooke: 421
Nelder: 271
Time-equivalent number of function evaluations for Maple: 168

Assessing the optimization parameters:
Parameters

Optimization Efficiency as a Function of Initial Point (with comparison to Maple's Optimization package)


X0_min - first initial point.
X0_incr - increment of the initial point.
X0_number - number of initial points.
Side - step for Nelder.
max_fmin - permissible maximum value of the function minimum (for reliability evaluation).



Calculate
Reliability (percentage of runs in which fmin < max_fmin)
Descent: 100.00%
Powell: 87.50%
Hooke: 100.00%
Nelder: 65.00%
Maple: 100.00%
Median of function minimum estimates
Descent: 1.60e-21
Powell: 2.32e-30
Hooke: 3.90e-10
Nelder: 5.97e-07
Maple: 1.35e-15
Average number of function evaluations
Descent: 184
Powell: 281
Hooke: 865
Nelder: 327
Time-equivalent number of function evaluations for Maple: 473


Optimization Efficiency as a Function of Initial Search Step


Step_min - minimum initial search step.
Step_max - maximum initial search step.
Step_incr - increment of the initial search step.
Side - step for Nelder.
max_fmin - permissible maximum value of the function minimum (for reliability evaluation).



Calculate
Reliability (percentage of runs in which fmin < max_fmin)
Descent: 100.00%
Powell: 100.00%
Hooke: 100.00%
Nelder: 96.00%
Median of function minimum estimates
Descent: 2.68e-23
Powell: 1.01e-32
Hooke: 1.30e-10
Nelder: 3.95e-07
Average number of function evaluations
Descent: 85
Powell: 114
Hooke: 344
Nelder: 265


Optimization Efficiency as a Function of Tolerance


Eps_max - maximum tolerance.
Eps_min - minimum tolerance.
Eps_incr - multiplier of the tolerance.
Step - initial search step (Side = Step for Nelder).



Calculate


Optimization Efficiency as a Function of Digits


Digits_min - minimum value of Digits.
Digits_max - maximum value of Digits.
Digits_incr - increment of Digits.
Step - initial search step (Side = Step for Nelder).
Eps - tolerance.



Calculate




Optimization Method Outcomes Comparison


The optimization methods have been compared on a set of 21 test functions, mainly taken from [1].
Comparison of various methods of unconstrained nonlinear minimization of a multivariable function is difficult because of effects of deterministic chaos during optimization: even an insignificant variation of the optimization parameters can cause a significant variation in the number of objective function evaluations and in the accuracy of the minimum value approximation. Therefore the comparison must use averaged measures of optimization efficiency.
We characterize optimization efficiency by the number of objective function evaluations Nf and by the accuracy Df(xmin) of the objective function minimum estimate. Test outcomes are listed in Table 1. The initial search step varies from 0.02 to 4 in increments of 0.02. The columns Nf show the number of objective function evaluations averaged over these steps. The columns Df(xmin) show the median of the difference between the achieved minimum of the objective function and the true minimum f(xmin). A dash means failure in optimization.
Table 1. Number of Function Evaluations Nf and Accuracy of Approximated Minimum Values Df(xmin)
(Each cell shows the step-averaged Nf; a dash means failure in optimization. NB: the Df(xmin) values are not shown.)

No.  Initial point           Descent   Powell   Hooke  Nelder    Qgrad  Random  Anneal  Maple(Nf*)
 1   10                           26       39      28     133       42      37    2291        -
 2   (8, 9)                       21       21      31     235      253     233    2019      107
 3   (1.2, 1)                    112      169     373     333     1777    3638    4011      215
 3   (2.547, 1.489)              102      163     332     391    19490    2941   10031      236
 4   (0.211, 3.505)               69       60     112     253      499     774    1911       75
 5   (2, 8)                       50       38      61     277      136     782    4761       55
 6   (1.5, 3.5)                  157      164     429     381    36111    3776    8831      325
 6   (0.248, 3.082)              221      224     516     445    31179   24211   10011      166
 7   (2, 2)                       86      121      73     185   172347    1641    1911      238
 8   (0.5, 0.5)                   64       62     104     205      118   11161    1911      110
 8   (8, 0.8)                    151      102     222     253     1686   34148   50111      146
 9   (2, 2)                      100        -      40     323        -    1748    2861        -
10   (1, 1)                      135      220    1946     624        -       -       -      378
11   (1, 1)                       72       53      84     229      336     213    1911      101
12   (1, 1)                      561        -    1003    1862        -    9976    1911        -
13   (1.2, 2, 0)                 317      162     433     406     4360    1042    4761      680
14   (95, 15, 2)                 256      270     371     549      675    2157       -        -
15   (1, 1, 1)                   205      121     382     442        -   14423    9511      253
16   (3, 1, 0, 1)                345      264    1516    1005    24182       -       -      516
17   (1, 1, 1, 1)                371      356     421     493     3881   11202   14711      343
18   (2.7, 90, 1500, 10)         654      296    4336     918        -       -       -      627
19   (1, 1, 1, 1, 1)               -        -       -       -        -       -       -        -
20   (8, 8, 8, ...), n=10        573      313     499    6326     1867    1355    3191       85
20   (8, 8, 8, ...), n=50       3264    18391    2919       -        -       -       -      204
21   (8, 8, 8, ...), n=10        280      102     186       -        -       -       -      330

Nf* - time-equivalent number of function evaluations for Maple.
Note that the Qgrad, Random and Anneal programs give unsatisfactory outcomes on the majority of the test functions. The efficiency of the Descent, Powell, Hooke and Nelder programs is considerably better.
The programs can be ranked by three performance measures: reliability, quickness and accuracy. Reliability is the percentage of runs in which the value of the reached minimum is below a given threshold. Quickness is the number of objective function evaluations required to reach a given accuracy. Accuracy is the difference between the achieved minimum of the objective function and the true minimum. Since the ranking depends on the properties of the test functions, the test functions have been divided into two classes, differentiable and non-differentiable. Tables 2 and 3 rank the programs from best (top) to worst (bottom).
Table 2. Optimization efficiency for differentiable functions.

Reliability      Quickness        Accuracy
Maple, Descent   Powell, Descent  Powell
Powell           Maple 9.5        Descent
Hooke            Hooke            Maple
Nelder           Nelder           Hooke
Random           Qgrad, Random    Nelder
Anneal           Anneal           Random
Qgrad                             Anneal
                                  Qgrad



Table 3. Optimization efficiency for non-differentiable functions.

Reliability  Quickness        Accuracy
Descent      Descent, Powell  Descent
Nelder       Hooke            Nelder
Random       Nelder           Random
Hooke        Random           Hooke
Anneal       Qgrad            Anneal
Powell       Anneal           Powell
Qgrad                         Qgrad



Comparative analysis of the best programs (Descent, Powell, Hooke, Nelder and Maple) shows that, for differentiable functions, the search algorithms implemented in the Descent and Powell programs require on average a much smaller number of objective function evaluations than the Hooke-Jeeves, Nelder-Mead and Maple algorithms. The accuracy of the minimum estimate, characterized by the magnitude Df(xmin), is also much higher for the Descent and Powell programs. The number of optimization failures for the Descent program and Maple is much smaller than for the Powell, Hooke and Nelder programs.
For non-differentiable functions, the Descent program outperforms the other programs on average in reliability, quickness and accuracy. The Nelder program ranks second overall. The Maple optimization procedure fails to optimize non-differentiable functions.


Conclusions


The seven nonlinear optimization methods have been compared on a set of 21 test functions. The Qgrad, Random and Anneal programs give unsatisfactory outcomes on the majority of the test functions.
Differentiable functions
Comparative analysis of the best programs (Descent, Powell, Hooke, Nelder and Maple) shows that the search algorithms implemented in the Descent and Powell programs require on average a much smaller number of objective function evaluations than the Hooke-Jeeves, Nelder-Mead and Maple algorithms. The accuracy of the minimum estimate, characterized by the magnitude Df(xmin), is also much higher for the Descent and Powell programs. The number of optimization failures for the Descent program and Maple is much smaller than for the Powell, Hooke and Nelder programs.
Non-differentiable functions
For non-differentiable functions, the Descent program outperforms the other programs on average in reliability, quickness and accuracy. The Nelder program ranks second overall. The Maple optimization procedure fails to optimize non-differentiable functions.


References


1. Himmelblau, David M. Applied Nonlinear Programming. McGraw-Hill Book Company, 1972. 534 p.
2. Ingber, Lester. "Simulated annealing: Practice versus theory." Mathl. Comput. Modelling, vol. 18, no. 11, 1993, pp. 29-57. http://www.ingber.com/asa93_sapvt.pdf

Legal Notice: The copyright for this application is owned by the author. Neither Maplesoft nor the author is responsible for any errors contained within, and neither is liable for any damages resulting from the use of this material. This application is intended for non-commercial, non-profit use only. Contact the author for permission if you wish to use this application in for-profit activities.
