Application Center - Maplesoft

# Comparison of Multivariate Optimization Methods

You can switch back to the summary page by clicking here.

Comparison of Multivariate

Optimization Methods

© Sergey Moiseev 2006, smoiseev@kodofon.vrn.ru, Kodofon, Russia

Introduction

This application demonstrates the use of Maple to compare methods of unconstrained nonlinear minimization of multivariable function. Seven methods of nonlinear minimization of the n-variables objective function f(x1,x2,…,xn)  are analyzed:

 1 Minimum search by coordinate and conjugate directions descent - original minimization method developed by the author (the Descent program);
 2 Powell's method (the Powell program);
 3 Modified Hooke-Jeeves method (the Hooke program);
 4 Simplex Nelder-Meed method (the Nelder program);
 6 Random directions searching (the Random program);
 7 Simulated annealing (the Anneal program).

All methods are direct searching methods; i.e. they do not require the objective function f(x1,x2,…,xn) to be differentiable and continuous. Optimization methods have been compared on the set of  21 test functions. Maple's optimization efficiency is compared with these programs.

Brief Description of Optimization Methods

 Minimum Search By Coordinate And Conjugate Directions Descent The method based on the idea of coordinates descent by all initial coordinates and when possible, the descent in the directions conjugated to initial coordinates. Let's recall that two directions   and   are conjugated if   ; , where H - positive definite Hessian matrix:  . If the function f(x1,x2,…xn) is the n-dimensional positive defined quadratic form or n-dimensional paraboloid; its minimum is obtained for n moves in the direction of n various conjugated directions. Schematically the computing procedure of algorithm implemented in the Descent program consists in the following rules. At the beginning, there is movement in the line of initial coordinates x1,x2,…,xn by turn. The local minimum   obtained on the k-th iteration at movement in the line of   coordinate is maintained. Later, when finding the second minimum   on the (k+m)-th iteration movement in the direction from the first minimum   to the second one   is carried out at once. As both minimums   and   have been obtained at the movement in the line of the same coordinate  , the direction from   to   and direction in the line of coordinate   is conjugate. If n>2 the third conjugate direction connecting two minimums, obtained at moving in the line of two identical previous conjugate directions is constructed. Thus the Descent program approximates function n variables by a sequence of two-dimensional and three-dimensional paraboloids. If coordinates descent is unsuccessful, the additional evaluation of the objective function value in the point of  the minimum n-dimensional degenerate paraboloid is produced by which the objective function is approximated. This procedure allows considerably increasing reliability of the program. Except the descent in the line of coordinate axes and conjugate directions in the program Descent the moving on curvilinear direction takes place. Curvilinear direction is obtained by parabolic approximation of three last coordinates of local minimums found at the movement by the last coordinate. Movement in curvilinear direction allows efficient passing narrow curved canyons, as in Rosenbrock function.
 Powell's Method In the Powell's method [1] the complete set of n conjugate directions by the following recurrent rule is under construction: the k-th conjugate direction coincides with the direction connecting two local minimums, obtained at movement along previous k-1 identical conjugate directions. If at movement in the k-th conjugate direction the function considerably decreases, the appropriate coordinate axis on this conjugate direction is replaced. Thus, unlike the Descent program, the Powell program approximates function n variables by a sequence of n-dimensional paraboloids during optimization.
 The Modified Hooke-Jeeves Method The Hooke-Jeeves method [1] is modification of the simplest coordinates descent method. First, k iterations of this method are carried out similarly to those of the coordinates descent method. After finding the local minimum   after the k iteration the search is carried out in the direction connecting initial approximation   and k approximation   (so-called pattern search). The modification of the Hooke-Jeeves method performed by the mainly consists in the following: variable step of search and quadratic interpolation have been used at linear search on each coordinate. Besides increase of the probability the direction of pattern search coincides with direction conjugated with coordinate axes.
 Simplex Nelder-Meed Method When searching a minimum of n-variables function by the Nelder-Meed method [1] as trial values, the points located in the vertex of simplex (polyhedron) are taken. In each vertex of the simplex objective function values are calculated. Further, initial polyhedron is deformed by one of special procedures (reflection, expansion, compression or reductions) so that the vertex with a maximum function value to be transformed into a vertex with smaller function value. Deforming of the polyhedron proceeds so long as its size does not decrease up to the given tolerance. The barycentre of the final polyhedron are taken as approximation to a point of a minimum.
 Quasi-Gradient Method of Minimum Search The search direction of this method is directly opposite to the direction of a gradient vector of the function f(x1,x2,…,xn). As against the gradient method requiring knowledge of the first analytic derivatives of the objective function, derivatives is estimated approximately by meanse of difference derivation.
 Random Directions Search Search directions of this method are random and independent of each other. If the direction is chosen unsuccessfully, another random direction gets out.
 Simulated Annealing Simulated annealing [2] is a random-search technique utilizing an analogy between the way in which a metal cools and freezes into a minimum energy crystalline structure (the annealing process). Simulated annealing's major advantage over other methods is an ability to avoid becoming trapped in local minimum. The algorithm employs a random search which not only accepts changes that decrease the objective function f, but also some changes increasing it. The algorithm starts with an initial feasible solution, which is set as the current solution. Randomly, a neighboring solution from the solution space is chosen, and its energy (objective function) f is compared to that of the current solution. If the energy is decreased, this neighbor solution is kept and set as the current solution. Otherwise, this solution is accepted with a probability that is calculated according to the stage the algorithm is in (we designate this stage via a variable T called "temperature"). The probability of accepting a worse solution is P = exp( - df / T ) where df is the difference between energy of the neighbor solution and the current solution. The temperature is managed by the cooling schedule that controls the execution of the algorithm, i.e. the temperature T decreases according to the formula T_new=Cooling*T_old , where 0

Initialization

Optimization Programs

These programs are found in the start-up code of this Maple document; to view them in detail, please select Edit > Startup Code.

 Minimization Function f(x1,x2,…,xn) by Coordinate And Conjugate Directions Descent f - function or procedure of n-variables (n=nops(Xmin0)). Xmin0 - list  of  n  initial approximation of the locate minimum. Step - initial search step. epsilon<
 Minimization Function f(x1,x2,…,xn) by Powell's Method f - function or procedure of n-variables (n=nops(Xmin0)). Xmin0 - list  of  n  initial approximation of the locate minimum. Step - initial search step. epsilon<
 Minimization Function f(x1,x2,…,xn) by Modified Hooke-Jeeves Method f - function or procedure of n-variables (n=nops(Xmin0)). Xmin0 - list  of  n  initial approximation of the locate minimum. Step - initial search step. epsilon<
 Minimization Function f(x1,x2,…,xn) by Nelder-Meed Method f - function or procedure of n-variables (n=nops(Xmin0)). Xmin0 - list  of  n  initial approximation of the locate minimum. Side - initial length of simplex rib. epsilon<1 - Stretching coefficient.  Recommended value: Stretching=2. Reduction>1 - Reduction coefficient.  Recommended value: Reduction=2. Nmax - maximum permissible number of function evaluations. If  number of function evaluations exceed Nmax, the search stops. Return list containing the minimum of the function f estimation, list of n arguments of minimum estimation, number iterations, number of function evaluations. The arguments Side, epsilon, Compression, Reflection, Stretching, Reduction, Nmax can be missed. In this case default values are: Side:=1;epsilon:=10.^(-6); Compression:=0.5; Reflection:=1; Stretching:=2; Reduction:=2; Nmax:=10000. The global listlist Nelder_Search_Path contains the search path coordinates of simplex barycentre.
 Minimization Function f(x1,x2,…,xn) by Quasi-Gradient Method f - function or procedure of n-variables (n=nops(Xmin0)). Xmin0 - list  of  n  initial approximation of the locate minimum. Step - initial search step. epsilon<
 Minimization Function f(x1,x2,…,xn) by Random Searching f - function or procedure of n-variables (n=nops(Xmin0)). Xmin0 - list  of  n  initial approximation of the locate minimum. Nrnd - number of random directions per iteration. Step - initial search step. epsilon<
 Minimization Function f(x1,x2,…,xn) by Simulated Annealing Method f - function or procedure of n-variables (n=nops(Xmin0)). Xmin0 - list  of  n  initial approximation of the locate minimum (initial solution). Trnd - number of neighbor random solutions  generated to calculate the initial temperature (T_initial=(fmin-fmax)/ln(0.8)). 0

 Select Test Objective Function of n-Variables Select a test function by clicking the button (NB: The results in this document are from selecting #3):   1: n=1 2: n=2 (Quadratic function) 3: n=2 (Rosenbrock function) 4: n=2 5: n=2 6: n=2 (Function with serpentine curved canyon) 7: n=2 8: n=2 (Beale's function) 9: n=2 (Non-differentiable function) 10: n=2 (Function with narrow curved canyon and bad scaling) 11: n=2 (Function with 4 global minimums) 12: n=2 (Non-differentiable author's function) 13: n=3 14: n=3 (Least-modules method of estimation) 15: n=3 (Spiral descent) 16: n=4 (Wood's function) 17: n=4 18: n=4 (Least-squares method of estimation) 19: n=5 (Discontinuous plateau function) 20: n=n (n-Dimensional positive defined quadratic form; n=1,2,3...) 21: n=n (Discontinuous function with multiple local minimums; n=1,2,3...)

Workplace for Comparison of Optimization Methods

Minimization

 Xmin0 Xmin fmin Iter Nf Initial point ( List of n initial approximation of the locate minimum) Argument of the minimum estimation Minimum of the function f estimation Number of  iterations Number of  function evaluations

Assign a n-dimension initial point Xmin0:

 (3.1.1)

Minimization with the Default Parameters (there is comparison with Maple's Optimization)

Compare

Program:=[fmin Xmin Iter Nf]

MAPLE:= [fmin [ x1=Xmin[1], ... ,xn=Xmin[n] ]]

 Minimization continuation  starting  from  new initial  point  Xmin0=Xmin Xmin0 = Xmin0

Minimization with User-Defined Parameters

 Parameters For Minimization Programs Step Initial search step eps Tolerance short Shortening step coefficient (recomended value: Descent  0.15, Powell  0.01, Hook  0.15, Qgrad  0.7 Random  0.15) Side Initial length of simplex rib (only for Nelder) Compression Compression coefficient(only for Nelder, recomended value 0.5) Reflection Reflection coefficient(only for Nelder, recomended value 1) Stretching Stretching coefficient(only for Nelder, recomended value 2) Reduction Reduction coefficient(only for Nelder, recomended value 2) Nrnd Number of random directions (only for Random) Trnd Number of random solutions generated to calculate the initial temperature (only for Anneal) NrndA Number of random solutions that decrease the temperature and Step (only for Anneal) Cooling Cooling factor(only for Anneal) Nmax Maximum permissible number of function evaluations.

Warning!  To improve reliability one can increase the Shortening step coefficient for nondifferentiable functions and decrease it for differentiable functions.

Assign parameters

 Listlist of the Search Paths Coordinates List of lists of  n search path coordinates: [[x1_0, x2_0,.., xn_0], [x1_1, x2_1,.., xn_1],..., [x1_k, x2_k,.., xn_k]], n - number of variables, k - number of search directions.   NB: Very large output - not shown
 Plot of Search Paths Plots  for  x1, x2  out of  n search path coordinates.     Plots

Optimization Efficiency (for  four  best   Descent, Powell, Hooke and Nelder programs)

 Computation Time and Accuracy of  Minimization with the Default  Parameters (there is comparison with Maple Optimization package) Repetition_Number is the number of minimization repetitions. The more Repetition_Number the better computation time estimation.   Calculate Function minimum estimations Descent:   4.10e-26   Powell:    1.13e-33   Hooke:     3.41e-12   Nelder:    1.34e-01   Maple: 9.14e-17   Time minimization (s) Descent:   0.266      Powell:    0.250      Hooke:     0.281      Nelder:    0.328      Maple: 0.438      Number of function evaluations Descent:    102 Powell:     157 Hooke:      421 Nelder:     271 Time equivalent number of function evaluations for Maple:    168

Assessing the optimization parameters:

Parameters

Optimization Efficiency as a Function of Initial Point (there is comparison with Maple's Optimization)

 X0_min X0_incr X0_number Side max_fmin First initial point Incrementation of initial point Number of initial points Step for Nelder Permissible maximum value for function minimum (for the purpose of reliability evaluation)

 (3.2.2.1)

Calculate

Reliability (percentage of number of function evaluations when fmin<max_fmin)
Descent:   100.00%
Powell:     87.50%
Hooke:     100.00%
Nelder:     65.00%
Maple:     100.00%

Median of function minimum estimations
Descent:   1.60e-21
Powell:    2.32e-30
Hooke:     3.90e-10
Nelder:    5.97e-07
Maple:     1.35e-15

Average number of function evaluations
Descent:    184
Powell:     281
Hooke:      865
Nelder:     327

Time equivalent for Maple:     473

Optimization Efficiency as a Function of Initial Search Step

 Step_min Step_max Step_incr Side max_fmin Minimum initial search step Maximum initial search step Incrementation of initial search step Step for Nelder Permissible maximum value for function minimum (for the purpose of  reliability evaluation)

Calculate

Reliability (percentage of number of function evaluations when fmin<max_fmin)
Descent:   100.00%
Powell:    100.00%
Hooke:     100.00%
Nelder:     96.00%

Median of function minimum estimations
Descent:   2.68e-23
Powell:    1.01e-32
Hooke:     1.30e-10
Nelder:    3.95e-07

Average number of function evaluations
Descent:     85
Powell:     114
Hooke:      344
Nelder:     265

Optimization Efficiency as a Function of Tolerance

 Eps_max Eps_min Eps_incr Step Maximum tolerance Minimum tolerance Multiplier of tolerance Initial search step (Side=Step for Nelder)

Calculate

Optimization Efficiency as a Function of Digits

 Digits_min Digits_max Digits_incr Step Eps Minimum Digits Maximum Digits Incrementation of Digits Initial search step (Side=Step for Nelder) Tolerance

Calculate

Optimization Method Outcomes Comparison

Optimization methods have been compared on the set of 21 test functions mainly taken from [1].

Comparison of various methods of unconstrained nonlinear minimization of multivariable function is difficult because of effects of deterministic chaos during optimization. Even insignificant variation of optimization parameters can cause significant variation of number of objective function evaluations and accuracy of the minimum value approximation. Therefore for the comparison it is necessary to use averaged performances of optimization efficiency.

We'll characterize optimization efficiency by number of objective function evaluations Nf and accuracy of objective function minimum evaluation Df(xmin). Test outcomes are listed in Table 1. The initial searching step varies from 0.02 up to 4 with 0.02 increment. The columns Nf show number of objective function evaluations step-averaged. The columns Df(xmin) show the median of a difference between the achieved minimum of the objective function and the real minimum f(xmin). Dash means failure in optimization.

 Table 1. Number of Function Evaluations Nf  and Accuracy of Approximated Minimum Values Df(xmin) InitialPoint Descent Powell Hooke Nelder Qgrad Random Anneal Maple Nf DF(xmin) Nf DF(xmin) Nf DF(xmin) Nf DF(xmin) Nf DF(xmin) Nf DF(xmin) Nf DF(xmin) Nf DF(xmin) 1 10 26 39 28 133 42 37 2291 - - 2 (8,9) 21 21 31 235 253 233 2019 107 3 (-1.2, 1) 112 169 373 333 1777 3638 4011 215 3 (-2.547, 1.489) 102 163 332 391 19490 2941 10031 236 4 (0.211, 3.505) 69 60 112 253 499 774 1911 75 5 (-2,8) 50 38 61 277 136 782 4761 55 6 (-1.5, 3.5) 157 164 429 381 36111 3776 8831 325 6 (0.248, -3.082) 221 224 516 445 31179 24211 10011 166 7 (2,2) 86 121 73 185 172347 1641 1911 238 8 (0.5, -0.5) 64 62 104 205 118 11161 1911 110 8 (8, 0.8) 151 102 222 253 1686 34148 50111 146 9 (-2,2) 100 - - 40 323 - - 1748 2861 - - 10 (-1,1) 135 220 1946 624 - - - - - - 378 11 (1,1) 72 53 84 229 336 213 1911 101 12 (1,-1) 561 - 1003 1862 - - 9976 1911 - - 13 (-1.2,2,0) 317 162 433 406 4360 1042 4761 680 14 (95,15,2) 256 270 371 549 675 2157 - - - - 15 (-1,-1,-1) 205 121 382 442 - 14423 9511 253 16 (3,-1,0,1) 345 264 1516 1005 24182 - - - - 516 17 (1,1,1,1) 371 356 421 493 3881 11202 14711 343 18 (2.7,90,1500, 10) 654 296 4336 918 - - - - - - 627 19 (-1,-1,-1,-1,-1) - - - - - - - - - - - - - - - - 20 (8,8,8,...) n=10 573 313 499 6326 1867 1355 3191 85 20 (8,8,8,...) n=50 3264 18391 2919 - - - - - - - - 204 21 (8,8,8,...) n=10 280 102 186 - - - - - - - - 330

Nf* -  Time equivalent number of function evaluations for Maple.