Comparison of Multivariate
Optimization Methods
© Sergey Moiseev 2006, smoiseev@kodofon.vrn.ru, Kodofon, Russia
Introduction
This application demonstrates the use of Maple to compare methods of unconstrained nonlinear minimization of a multivariable function. Seven methods for the nonlinear minimization of an objective function f(x1,x2,…,xn) of n variables are analyzed:
Minimum search by coordinate and conjugate directions descent - an original minimization method developed by the author (the Descent program);
Powell's method (the Powell program);
Modified Hooke-Jeeves method (the Hooke program);
Nelder-Mead simplex method (the Nelder program);
Quasi-gradient method of minimum search (the Qgrad program);
Random directions search (the Random program);
Simulated annealing (the Anneal program).
All methods are direct search methods; i.e., they do not require the objective function f(x1,x2,…,xn) to be differentiable or even continuous. The optimization methods have been compared on a set of 21 test functions. The efficiency of Maple's built-in Optimization package is also compared with these programs.
Brief Description of Optimization Methods
Minimum Search By Coordinate And Conjugate Directions Descent
The method is based on the idea of descent along all initial coordinates and, when possible, descent along directions conjugate to the initial coordinates. Recall that two directions s_i and s_j are conjugate if s_i^T*H*s_j = 0 for i <> j, where H is the positive definite Hessian matrix of second derivatives of f. If the function f(x1,x2,…,xn) is an n-dimensional positive definite quadratic form, i.e. an n-dimensional paraboloid, its minimum is reached in n moves along n distinct conjugate directions.
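As a small illustration (not part of the author's programs), the conjugacy condition can be checked numerically in Maple for a simple quadratic form:

    with(LinearAlgebra):
    H := Matrix([[4, 1], [1, 2]]):    # Hessian of f(x1,x2) = 2*x1^2 + x1*x2 + x2^2
    s1 := Vector([1, 0]):             # first direction: the x1 axis
    s2 := Vector([-1, 4]):            # candidate conjugate direction
    s1 . (H . s2);                    # returns 0, so s1 and s2 are conjugate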
Schematically, the computing procedure implemented in the Descent program consists of the following rules. At the beginning, the program moves along the initial coordinates x1, x2, …, xn in turn. The local minimum obtained on the k-th iteration while moving along a coordinate x_i is stored. Later, when a second minimum is found on the (k+m)-th iteration while moving along the same coordinate x_i, a move is immediately made in the direction from the first minimum to the second. Since both minima were obtained while moving along the same coordinate x_i, the direction connecting them and the direction along x_i are conjugate. If n > 2, a third conjugate direction is constructed, connecting two minima obtained by moving along two identical previous conjugate directions. Thus the Descent program approximates a function of n variables by a sequence of two-dimensional and three-dimensional paraboloids.
If the coordinate descent is unsuccessful, the objective function is additionally evaluated at the minimum point of the degenerate n-dimensional paraboloid by which the objective function is approximated. This procedure considerably increases the reliability of the program.
Besides descending along the coordinate axes and the conjugate directions, the Descent program also moves along a curvilinear direction. The curvilinear direction is obtained by parabolic approximation of the coordinates of the three last local minima found while moving along the last coordinate. Movement along a curvilinear direction allows the program to pass efficiently through narrow curved canyons, such as that of the Rosenbrock function.
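As an illustration of the parabolic approximation used in such line searches, the following sketch (with hypothetical names, not the startup-code internals) computes the vertex of the parabola through three points; for a quadratic function it returns the exact one-dimensional minimizer:

    # Abscissa of the vertex of the parabola through (a,fa), (b,fb), (c,fc).
    ParabolicMin := proc(a, fa, b, fb, c, fc)
        local num, den;
        num := (b - a)^2*(fb - fc) - (b - c)^2*(fb - fa);
        den := (b - a)*(fb - fc) - (b - c)*(fb - fa);
        b - num/(2*den);
    end proc:

    g := x -> (x - 3)^2 + 1:                    # toy quadratic with minimum at x = 3
    ParabolicMin(0, g(0), 1, g(1), 4, g(4));    # returns 3 exactly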
Powell's Method
In Powell's method [1] a complete set of n conjugate directions is constructed by the following recurrent rule: the k-th conjugate direction coincides with the direction connecting two local minima obtained by moving along the previous k-1 identical conjugate directions. If the function decreases considerably during the move along the k-th conjugate direction, the corresponding coordinate axis is replaced by this conjugate direction. Thus, unlike the Descent program, the Powell program approximates a function of n variables by a sequence of n-dimensional paraboloids during optimization.
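A minimal sketch of the direction-set update described above (illustrative names only, not the startup-code internals): the new conjugate direction, connecting two successive line-search minima Xa and Xb, replaces the k-th search direction:

    # Replace the k-th search direction with the direction from Xa to Xb.
    UpdateDirections := proc(Dirs::list, k, Xa::list, Xb::list)
        local i;
        subsop(k = [seq(Xb[i] - Xa[i], i = 1 .. nops(Xa))], Dirs);
    end proc:

    UpdateDirections([[1, 0], [0, 1]], 1, [0, 0], [1, 2]);   # returns [[1, 2], [0, 1]]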
The Modified Hooke-Jeeves Method
The Hooke-Jeeves method [1] is a modification of the simple coordinate descent method. First, k iterations are carried out exactly as in coordinate descent. After a local minimum has been found on the k-th iteration, the search proceeds in the direction connecting the initial approximation and the k-th approximation (the so-called pattern search).
The modification of the Hooke-Jeeves method performed by the author mainly consists in the following: a variable search step and quadratic interpolation are used in the linear search along each coordinate. This also increases the probability that the direction of the pattern search coincides with a direction conjugate to the coordinate axes.
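For illustration, the pattern move can be sketched as follows (hypothetical names): after the exploratory coordinate moves have taken the point from the base X0 to Xk, the search extrapolates along the line connecting them:

    # Pattern move: extrapolate from X0 through Xk by the same displacement.
    PatternMove := proc(X0::list, Xk::list)
        local i;
        [seq(2*Xk[i] - X0[i], i = 1 .. nops(Xk))];
    end proc:

    PatternMove([0, 0], [1, 2]);   # returns [2, 4]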
Nelder-Mead Simplex Method
When searching for a minimum of a function of n variables by the Nelder-Mead method [1], the trial points are the vertices of a simplex (polyhedron). The objective function is evaluated at each vertex of the simplex. The initial polyhedron is then deformed by one of several special procedures (reflection, stretching, compression, or reduction) so that the vertex with the maximum function value is transformed into a vertex with a smaller value. Deformation of the polyhedron continues until its size decreases to the given tolerance. The barycentre of the final polyhedron is taken as the approximation to the minimum point.
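For illustration, here is a sketch of the reflection step in the parameter names used later in this document (Reflection = 1 by default): the worst vertex Xh is reflected through the barycentre Xc of the remaining vertices:

    # Reflect the worst vertex Xh through the barycentre Xc:
    # Xr = Xc + Reflection*(Xc - Xh).
    ReflectVertex := proc(Xc::list, Xh::list, Reflection)
        local i;
        [seq(Xc[i] + Reflection*(Xc[i] - Xh[i]), i = 1 .. nops(Xc))];
    end proc:

    ReflectVertex([1, 1], [2, 0], 1);   # returns [0, 2]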
Quasi-Gradient Method of Minimum Search
The search direction of this method is directly opposite to the gradient vector of the function f(x1,x2,…,xn). Unlike the gradient method, which requires the analytic first derivatives of the objective function, here the derivatives are estimated approximately by finite differences.
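A minimal sketch of such a finite-difference estimate (forward differences; the procedure name is hypothetical):

    # Forward-difference estimate of the gradient of f at the point X.
    FDGradient := proc(f, X::list, h)
        local i;
        [seq((f(op(subsop(i = X[i] + h, X))) - f(op(X)))/h, i = 1 .. nops(X))];
    end proc:

    f := (x, y) -> x^2 + 3*y^2:
    FDGradient(f, [1.0, 2.0], 1.0e-6);   # approximately [2.0, 12.0]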
Random Directions Search
The search directions of this method are random and independent of each other. If a direction is chosen unsuccessfully, another random direction is drawn.
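For example, a random search direction can be drawn by normalizing a vector of uniform components (a sketch; the author's program may generate directions differently):

    # Draw a random unit direction in n dimensions.
    U := () -> evalf(rand(-10^8 .. 10^8)()/10.^8):   # uniform on [-1, 1]
    RandomDirection := proc(n)
        local d, nrm, i, x;
        d := [seq(U(), i = 1 .. n)];
        nrm := sqrt(add(x^2, x = d));
        [seq(x/nrm, x = d)];
    end proc:

    RandomDirection(3);   # a random unit direction (not uniform over the sphere)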
Simulated Annealing
Simulated annealing [2] is a random-search technique that exploits an analogy between the way a metal cools and freezes into a minimum-energy crystalline structure (the annealing process) and the search for a minimum. Its major advantage over other methods is the ability to avoid becoming trapped in a local minimum. The algorithm employs a random search that accepts not only changes that decrease the objective function f, but also some changes that increase it. The algorithm starts with an initial feasible solution, which is set as the current solution. A neighboring solution is chosen at random from the solution space, and its energy (objective function value) f is compared to that of the current solution. If the energy decreases, the neighbor is kept and set as the current solution. Otherwise, it is accepted with a probability that depends on the stage the algorithm is in (we designate this stage by a variable T called the "temperature"). The probability of accepting a worse solution is P = exp(-df/T), where df is the difference between the energy of the neighbor solution and that of the current solution. The temperature is managed by the cooling schedule that controls the execution of the algorithm: the temperature decreases according to the formula T_new = Cooling*T_old, where 0 < Cooling < 1 is the cooling factor.
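A minimal sketch of the acceptance rule and the cooling step using the formulas above (names are illustrative):

    # Metropolis acceptance: always accept a decrease (df < 0),
    # otherwise accept with probability exp(-df/T).
    U := () -> evalf(rand(0 .. 10^8)()/10.^8):   # uniform on [0, 1]
    Accept := proc(df, T)
        evalb(df < 0 or U() < exp(-df/T));
    end proc:

    T := 1.0:  Cooling := 0.95:
    Accept(0.2, T);    # true with probability exp(-0.2/1.0), about 0.82
    T := Cooling*T;    # cooling schedule: T_new = Cooling*T_old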
Initialization
Optimization Programs
These programs are found in the start-up code of this Maple document; to view them in detail, please select Edit > Startup Code.
Minimization of the Function f(x1,x2,…,xn) by Coordinate and Conjugate Directions Descent
f - function or procedure of n variables (n = nops(Xmin0)).
Xmin0 - list of the n coordinates of the initial approximation to the local minimum.
Step - initial search step.
epsilon << Step*Shorten - tolerance: the search stops when ||Xmin - XOld|| < epsilon.
0<Shorten<1 - shortening step coefficient. Recommended value: Shorten=0.15.
Nmax - maximum permissible number of function evaluations. If the number of function evaluations exceeds Nmax, the search stops.
Returns a list containing the estimated minimum of the function f, the list of the n coordinates of the estimated minimizer, the number of iterations, and the number of function evaluations.
The arguments Step, epsilon, Shorten, and Nmax can be omitted. In this case the default values are: Step:=1; epsilon:=10.^(-6); Shorten:=0.15; Nmax:=10000.
The global listlist Descent_Search_Path contains the search path coordinates.
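For example, assuming the startup code has been executed, Descent can be called on the Rosenbrock function with the default parameters (a usage sketch based on the argument list above):

    # Rosenbrock function; its known minimum is f(1, 1) = 0.
    f := (x1, x2) -> 100*(x2 - x1^2)^2 + (1 - x1)^2:

    result := Descent(f, [-1.2, 1]):    # returns [fmin, Xmin, Iter, Nf]
    fmin := result[1];  Xmin := result[2];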
Minimization of the Function f(x1,x2,…,xn) by Powell's Method
0<Shorten<1 - shortening step coefficient. Recommended value: Shorten=0.01.
The arguments Step, epsilon, Shorten, and Nmax can be omitted. In this case the default values are: Step:=1; epsilon:=10.^(-6); Shorten:=0.01; Nmax:=10000.
The global listlist Powell_Search_Path contains the search path coordinates.
Minimization of the Function f(x1,x2,…,xn) by the Modified Hooke-Jeeves Method
The global listlist Hooke_Search_Path contains the search path coordinates.
Minimization of the Function f(x1,x2,…,xn) by the Nelder-Mead Method
Side - initial length of the simplex edge.
epsilon << Side*Compression - tolerance: the search stops when ||Xmin - XOld|| < epsilon.
0<Compression<1 - Compression coefficient. Recommended value: Compression=0.5.
Reflection - Reflection coefficient. Recommended value: Reflection=1.
Stretching>1 - Stretching coefficient. Recommended value: Stretching=2.
Reduction>1 - Reduction coefficient. Recommended value: Reduction=2.
Returns a list containing the estimated minimum of the function f, the list of the n coordinates of the estimated minimizer, the number of iterations, and the number of function evaluations. The arguments Side, epsilon, Compression, Reflection, Stretching, Reduction, and Nmax can be omitted.
In this case the default values are: Side:=1; epsilon:=10.^(-6); Compression:=0.5; Reflection:=1; Stretching:=2; Reduction:=2; Nmax:=10000.
The global listlist Nelder_Search_Path contains the search path coordinates of simplex barycentre.
Minimization of the Function f(x1,x2,…,xn) by the Quasi-Gradient Method
epsilon << Step*Shorten. If the search step < epsilon, the search stops.
0<Shorten<1 - shortening step coefficient. Recommended value: Shorten=0.7.
The arguments Step, epsilon, Shorten, and Nmax can be omitted. In this case the default values are: Step:=1; epsilon:=10.^(-6); Shorten:=0.7; Nmax:=10000.
The global listlist Qgrad_Search_Path contains the search path coordinates.
Minimization of the Function f(x1,x2,…,xn) by Random Directions Search
Nrnd - number of random directions per iteration.
epsilon << Step*Shorten. If the search step < epsilon, the search stops.
The arguments Nrnd, Step, epsilon, Shorten, and Nmax can be omitted.
In this case the default values are: Nrnd:=60; Step:=1; epsilon:=10.^(-6); Shorten:=0.15; Nmax:=10000.
The global listlist Random_Search_Path contains the search path coordinates.
Minimization of the Function f(x1,x2,…,xn) by the Simulated Annealing Method
Xmin0 - list of the n coordinates of the initial approximation to the local minimum (the initial solution).
Trnd - number of neighbor random solutions generated to calculate the initial temperature (T_initial = (fmin-fmax)/ln(0.8), i.e., the initial temperature is chosen so that an energy increase of df = fmax-fmin would be accepted with probability 0.8).
0<Cooling<1 - cooling factor (the temperature T decreases according to the formula T_new=Cooling*T_old).
Step - initial width of the uniform distribution used to select the random search step.
epsilon<<Step*Shorten. If Step<epsilon, the search stops.
0 < Shorten < 1 - shortening step coefficient. Recommended value: Shorten=0.85.
Nrnd - number of random solutions after which the temperature (T_new=Cooling*T_old) and the step (Step_new=Shorten*Step_old) are decreased.
The arguments Trnd, Nrnd, Cooling, Step, epsilon, Shorten, and Nmax can be omitted.
In this case the default values are: Trnd:=10; Nrnd:=30; Cooling:=0.15; Step:=1; epsilon:=10.^(-6); Shorten:=0.95; Nmax:=10000.
The global listlist Anneal_Search_Path contains the search path coordinates.
Select Test Objective Function of n-Variables
Select a test function by clicking the button (NB: The results in this document are from selecting #3):
21: n=n (Discontinuous function with multiple local minima; n=1,2,3,…)
Workplace for Comparison of Optimization Methods
Minimization
Xmin0 - initial point (list of the n coordinates of the initial approximation to the local minimum)
Xmin - estimated argument of the minimum
fmin - estimated minimum of the function f
Iter - number of iterations
Nf - number of function evaluations
Assign an n-dimensional initial point Xmin0:
Minimization with the Default Parameters (with a comparison against Maple's Optimization package)
Compare
Program := [fmin, Xmin, Iter, Nf]
MAPLE := [fmin, [x1 = Xmin[1], ..., xn = Xmin[n]]]
Continuation of the minimization starting from the new initial point Xmin0 = Xmin
Xmin0 = Xmin0
Minimization with User-Defined Parameters
Parameters For Minimization Programs
Step - initial search step
eps - tolerance
short - shortening step coefficient (recommended values: Descent 0.15, Powell 0.01, Hooke 0.15, Qgrad 0.7, Random 0.15)
Side - initial length of the simplex edge (only for Nelder)
Compression - compression coefficient (only for Nelder; recommended value 0.5)
Reflection - reflection coefficient (only for Nelder; recommended value 1)
Stretching - stretching coefficient (only for Nelder; recommended value 2)
Reduction - reduction coefficient (only for Nelder; recommended value 2)
Nrnd - number of random directions (only for Random)
Trnd - number of random solutions generated to calculate the initial temperature (only for Anneal)
NrndA - number of random solutions after which the temperature and Step are decreased (only for Anneal)
Cooling - cooling factor (only for Anneal)
Nmax - maximum permissible number of function evaluations
Warning! To improve reliability one can increase the Shortening step coefficient for nondifferentiable functions and decrease it for differentiable functions.
Assign parameters
Listlist of the Search Paths Coordinates
List of lists of the n search path coordinates: [[x1_0, x2_0, …, xn_0], [x1_1, x2_1, …, xn_1], …, [x1_k, x2_k, …, xn_k]], where n is the number of variables and k is the number of search directions.
Plot of Search Paths
Plots of the x1 and x2 components of the n search path coordinates.
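For a two-variable problem the stored path can be drawn directly; a sketch using the global variable documented above:

    # Descent_Search_Path is a listlist of [x1, x2] points, so the standard
    # plot command draws it as a connected polyline.
    plot(Descent_Search_Path, labels = ["x1", "x2"], title = "Descent search path");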
Optimization Efficiency (for the four best programs: Descent, Powell, Hooke, and Nelder)
Computation Time and Accuracy of Minimization with the Default Parameters (with a comparison against the Maple Optimization package)
Repetition_Number is the number of minimization repetitions. The larger Repetition_Number is, the better the computation time estimate.
Function minimum estimates: Descent 4.10e-26, Powell 1.13e-33, Hooke 3.41e-12, Nelder 1.34e-01, Maple 9.14e-17
Minimization time (s): Descent 0.266, Powell 0.250, Hooke 0.281, Nelder 0.328, Maple 0.438
Number of function evaluations: Descent 102, Powell 157, Hooke 421, Nelder 271; time equivalent number of function evaluations for Maple: 168
Assessing the optimization parameters:
Optimization Efficiency as a Function of the Initial Point (with a comparison against Maple's Optimization package)
X0_min - first initial point
X0_incr - increment of the initial point
X0_number - number of initial points
Side - initial step for Nelder
max_fmin - permissible maximum value of the function minimum (for the purpose of reliability evaluation)
Reliability (percentage of runs with fmin < max_fmin): Descent 100.00%, Powell 87.50%, Hooke 100.00%, Nelder 65.00%, Maple 100.00%
Median of function minimum estimates: Descent 1.60e-21, Powell 2.32e-30, Hooke 3.90e-10, Nelder 5.97e-07, Maple 1.35e-15
Average number of function evaluations: Descent 184, Powell 281, Hooke 865, Nelder 327; time equivalent for Maple: 473
Optimization Efficiency as a Function of Initial Search Step
Step_min - minimum initial search step
Step_max - maximum initial search step
Step_incr - increment of the initial search step
Reliability (percentage of runs with fmin < max_fmin): Descent 100.00%, Powell 100.00%, Hooke 100.00%, Nelder 96.00%
Median of function minimum estimates: Descent 2.68e-23, Powell 1.01e-32, Hooke 1.30e-10, Nelder 3.95e-07
Average number of function evaluations: Descent 85, Powell 114, Hooke 344, Nelder 265
Optimization Efficiency as a Function of Tolerance
Eps_max - maximum tolerance
Eps_min - minimum tolerance
Eps_incr - multiplier of the tolerance
Step - initial search step (Side = Step for Nelder)
Optimization Efficiency as a Function of Digits
Digits_min - minimum Digits
Digits_max - maximum Digits
Digits_incr - increment of Digits
Eps - tolerance
Optimization Method Outcomes Comparison
Optimization methods have been compared on the set of 21 test functions mainly taken from [1].
Comparing various methods of unconstrained nonlinear minimization of a multivariable function is difficult because of the effects of deterministic chaos during optimization. Even an insignificant variation of the optimization parameters can cause a significant variation in the number of objective function evaluations and in the accuracy of the minimum value approximation. Therefore the comparison must use averaged measures of optimization efficiency.
We characterize optimization efficiency by the number of objective function evaluations Nf and by the accuracy Df(xmin) of the objective function minimum estimate. Test outcomes are listed in Table 1. The initial search step varies from 0.02 to 4 in increments of 0.02. The Nf columns show the number of objective function evaluations averaged over the steps. The Df(xmin) columns show the median difference between the achieved minimum of the objective function and the true minimum f(xmin). A dash means failure in optimization.
Table 1. Number of Function Evaluations Nf and Accuracy of Approximated Minimum Values Df(xmin)

Nf values are listed in the method order: Descent, Powell, Hooke, Nelder, Qgrad, Random, Anneal, Maple (Nf*). A dash means failure in optimization.

 No.  Initial point            Nf (Descent, Powell, Hooke, Nelder, Qgrad, Random, Anneal, Maple*)
  1   (10)                     26, 39, 28, 133, 42, 37, 2291, -
  2   (8, 9)                   21, 31, 235, 253, 233, 2019, 107
  3   (-1.2, 1)                112, 169, 373, 333, 1777, 3638, 4011, 215
      (-2.547, 1.489)          102, 163, 332, 391, 19490, 2941, 10031, 236
  4   (0.211, 3.505)           69, 60, 499, 774, 1911, 75
  5   (-2, 8)                  50, 38, 61, 277, 136, 782, 4761, 55
  6   (-1.5, 3.5)              157, 164, 429, 381, 36111, 3776, 8831, 325
      (0.248, -3.082)          221, 224, 516, 445, 31179, 24211, 10011, 166
  7   (2, 2)                   86, 121, 73, 185, 172347, 1641, 238
  8   (0.5, -0.5)              64, 62, 104, 205, 118, 11161, 110
      (8, 0.8)                 151, 222, 1686, 34148, 50111, 146
  9   (-2, 2)                  100, 40, 323, 1748, 2861
      (-1, 1)                  135, 220, 1946, 624, 378
 11   (1, 1)                   72, 53, 84, 229, 336, 213, 101
 12   (1, -1)                  561, 1003, 1862, 9976
 13   (-1.2, 2, 0)             317, 162, 433, 406, 4360, 1042, 680
 14   (95, 15, 2)              256, 270, 371, 549, 675, 2157
 15   (-1, -1, -1)             382, 442, 14423, 9511
 16   (3, -1, 0, 1)            345, 264, 1516, 1005, 24182
 17   (1, 1, 1, 1)             356, 421, 493, 3881, 11202, 14711, 343
 18   (2.7, 90, 1500, 10)      654, 296, 4336, 918, 627
 19   (-1, -1, -1, -1, -1)
 20   (8, 8, 8, ...) n=10      573, 313, 6326, 1867, 1355, 3191, 85
      (8, 8, 8, ...) n=50      3264, 18391, 2919, 204, 280, 186, 330

Nf* - Time equivalent number of function evaluations for Maple.