simplexopt.mws
Optimization with sequential simplex of variable size
by E. G. RomeroBlanco
<eromero@costarricense.cr>
and J. F. Ogilvie
<ogilvie@cecm.sfu.ca>
Universidad de Costa Rica, Escuela de Quimica, Ciudad Universitaria Rodrigo Facio, San Jose, Costa Rica.
@ 2002 July
abstract
We present an algorithm for unconstrained optimisation of experimental data in chemical kinetics using as method a sequential simplex of variable size. An animated plot of progress of a triangular simplex descending across a surface of
to converge to an absolute minimum illustrates an implementation of this approach.
introduction
A common problem in treating chemical and physical systems is a necessity to evaluate an optimum response  a maximum or minimum, which is explicitly or implicitly a function of several factors. Although many methods exist for this purpose, conceptually the most simple is the sequential simplex method or its modifications. This approach has as its objective to locate efficiently the region or point of optimum response in a space defined by the factors and to find simultaneously the value of this extremum and to evaluate the magnitudes of factors that produce it [1, 2].
A simplex is a geometric figure defined on a response surface, or hypersurface in multidimensional space, of which the number of vertices or corners exceeds by one the number of factors, so
. Such a simplex as a geometric figure is described as belonging to a factor space of
dimensions [1,2]. The basic sequential simplex or algorithm for a sequential simplex of fixed size, introduced by Spendley, Hext and Himsworth [3], provides rules to force such a simplex to move by reflexions of vertices of an initially selected simplex to a region of optimum response.
Although originally developed for experimental optimization, the simplex method and its modifications have been applied to mathematical optimisation. For a purpose of applying the simplex method to a leastsquare fit of data to nonlinear models, Nelder and Mead [4] reported a sequential simplex with variable sizeon introducing two modifications into the original algorithm of Spendley
et alii
. These two modifications lead the simplex to expand in directions that are favourable and to contract in directions that are unfavourable, so as to provide a means according to which this figure can accelerate its progress toward a region of an optimum and can then diminish its search region until the extremum is located within desired accuracy.
Advantages of using a sequential simplex in fitting nonlinear models over other methods include its compact size, its relative independence from initial values of factors or parameters, and no necessity to calculate derivatives with respect to parameters. Like other approaches to fit a model nonlinear in parameters, the sequential simplex requires initial estimates of those parameters. Whereas the sequential simplex method converges from almost any initial values, in
sets, approaches based on derivatives require initial estimates typically near ultimate values of parameters; this property becomes important during fitting of models with many parameters.
The purpose of this work is to employ
Maple
's capabilities to demonstrate the applicability of a sequential simplex method, based on a least sum of squares of residuals and with variable size of simplex, to fit,
to previously published experimental data [2]
, a model well known in chemical kinetics of first order for the variation of concentration of a reaction product as a function of time,
in which
is absorbance of a product of a reaction of first kinetic order and is taken to be directly proportional to the concentration of that product,
t
is time,
is absorbance at infinite time, and
k
is the rate coefficient; such a relation implies that progress of a chemical reaction is being followed by means of the extent of absorption of a coloured product as a function of time.
The particular objective of solution of our optimisation task is to find numerical values of factors
and
k
that minimize the response function
, or chi squared, which is defined as a sum of squared residuals, with each residual as a difference between an observed value
of absorbance and the corresponding expectation
for experimental data
of number
n
.
With respect to the chemical system, the response to variation of independent variable time, denoted
, is the absorbance
of the product of the chemical reaction in the, for instance, visible spectrum, whereas with respect to the motion of thesimplex the response is
as a result of variation of parameters
and
in the course of fitting the kinetic model to measured values of
and
.
basis of the calculation
To begin, we specify, as an arrow function, a model or objective function to fit, in terms of an independent variable time
and two unknown factors  the rate coefficient
k
and the ultimate absorbance
of the coloured product at completion of reaction, represented as
A_inf
.
> 
F := (t, A_inf, k) > A_inf*(1  exp(k*t));

Experimental data to be fitted correspond to nine duplicated measurements of absorbance of a chemical system reacting according to first kinetic order during a defined period; we name lists
of values of time
and
for values of absorbance
.
> 
t_data := [1.5,1.5,3.0,3.0,4.5,4.5,6.0,6.0,9.0,9.0,12.0,12.0,15.0,
15.0,18.0,18.0,24.0,24.0];
A_data := [0.110,0.109,0.169,0.172,0.210,0.210,0.251,0.255,0.331,
0.325,0.326,0.330,0.362,0.383,0.381,0.372,0.422,0.411];
n_data := nops(t_data);

A plot of these data shows the behaviour of this system of first kinetic order.
> 
Gdata := plots[pointplot]({seq([t_data[i],A_data[i]],i=1..n_data)}, colour=coral,
labels=["time t", "Absorbance A"], view=[0..25, 0..0.45]):
plots[display](Gdata, titlefont=[TIMES,BOLD,12], title="points to be fitted");

Through a
Maple
command we directly evaluate parameters
and
k
that fit the model to these data on minimizing a sum of squared residuals,
, over all experimental data.
> 
lo := [minimize(add((A_data[i]  F(t_data[i], A_inf, k))^2,
i=1..n_data), A_inf=0..0.7,k=0..1.2, 'location')];

For future reference we take as
exact values
these results for parameters and a corresponding magnitude of
, denoted respectively as
A_min
,
k_min
and
chisq_min
, which we extract from the output above.
> 
A_min := evalf(subs(loc, A_inf),5);
k_min := evalf(subs(loc, k),5);
chisq_min := evalf(op(1, lo),5);

We view the topography of the response surface of
for the particular data by means of a plot in three dimensions, one for each parameter
and
k
and the third for
, over values of parameters in a convenient range. The surface of
presents a curved valley with parabolic cross sections near the minimum; that minimum response is readily located at the deepest point of the valley. On this plot in three dimensions we superimpose contours of
for selected values,
> 
GSurf0 := plot3d(add((A_data[i]  F(t_data[i], A_inf, k))^2,
i=1..n_data), A_inf=0..0.7, k=0..1.2, grid=[50,50],axes=BOXED,
contours=[seq(0.008*2^j,j=1..8)], style=PATCHCONTOUR,
shading=XYZ, orientation=[105,25]):
GSurf1 := plots[pointplot3d]([A_min, k_min, chisq_min], colour=black,
symbol=circle, symbolsize=10):
plots[display]([GSurf0, GSurf1], titlefont=[TIMES,BOLD,12],
title="surface with contours of chi squared as a function \n of parameters A_inf and k");

and plot separately these contours.
> 
GCont0 := plots[contourplot](add((A_data[i]  F( t_data[i], A_inf, k))^2, i=1..n_data),
A_inf=0..0.7, k=0..1.2, grid=[50,50], colour=orange, contours=[seq(0.008*2^j,j=1..8)]):
GCont1 := plots[pointplot]([A_min, k_min], symbol=circle, symbolsize=10, colour=black):
plots[display]([GCont0, GCont1], titlefont=[TIMES,BOLD,12],
title="contours of chi squared as a function \n of parameters A_inf and k" );

In the upper threedimensional plot, thin black lines indicate contours of
at constant values increasing ina geometric progression, 0.008, 0.008
, 0.008
and so forth, the same values as for orange lines in the lower contour plot; a small circle within the innermost contour indicates the minimum in both cases.
algorithm of a sequential simplex
To implement the method of sequential simplex with variable size we begin by constructing an initial simplex. The efficiency of any sequential simplex method depends to some extent on the size, orientation and location of the initial simplex. In this case, because two factors,
, namely
and
k
,
define explicitly the response that we apply as a sum of squared residuals,
,
to fix an
initial simplex we must define coordinates of three vertices,
, that form a triangle, in a factor space having two dimensions. For vertices of an initial simplex we select coordinates of points on a surface of
that enable us to observethe behaviour of successive simplices in a sequence with varied sizes, orientations and locations during progress toward a minimum; each initial vertex is defined as a value of a subscripted variable, so that the number of coordinates equals the number of factors.
For ourtest case we take initial vertices to have these coordinates [
,
k
]:
[0.600, 0.800], [0.638, 0.826] and [0.613, 0.897],
which define a small simplex on the side of the valley representing a surface of
remote from its minimum; generating further simplices from this initial condition, we can view motion of a simplex across the surface.
> 
A_inf_initial[1] := 0.600; k_initial[1] := 0.800;
A_inf_initial[2] := 0.638; k_initial[2] := 0.826;
A_inf_initial[3] := 0.613; k_initial[3] := 0.897;
n_factors := 2;
n_vertex := n_factors + 1;

We show this initial simplex on the contour plot.
> 
GCont2 := plots[polygonplot]([seq([A_inf_initial[j],
k_initial[j]], j=1..n_vertex)], colour=green):
plots[display]([GCont0, GCont1, GCont2]);

For this initial simplex each vertex of the three has acorresponding value of response
.
> 
for j from 1 to n_vertex do
chisq_initial[j] := evalf(add((A_data[i]  F(t_data[i], A_inf_initial[j],
k_initial[j]))^2, i=1..n_data),5);
end do;

We must rank, according to its magnitude, the response for each vertex as best, B, intermediate, N, and worst, W, such that the best vertex has the minimum response and the worst vertex has the maximum response for these three cases. For a case in which factors number more than two, the vertex next to worst would be selected instead of that here called intermediate.
> 
L_chisq := [chisq_initial[1],chisq_initial[2],chisq_initial[3]]:
chisq_B_initial := min(op(L_chisq));
chisq_W_initial :=max(op(L_chisq));
chisq_N_initial := min(op({op(L_chisq)} minus {chisq_B_initial}));
member(chisq_B_initial, L_chisq, 'NB'):
member(chisq_W_initial, L_chisq, 'NW'):
member(chisq_N_initial, L_chisq, 'NN'):
B_initial := [A_inf_initial[NB], k_initial[NB]];
W_initial := [A_inf_initial[NW], k_initial[NW]];
N_initial := [A_inf_initial[NN], k_initial[NN]];

The initial simplex is so characterized with its best values of two parameters
and
and of response
> 
A_inf_simplex[1] := A_inf_initial[NB];
k_simplex[1] := k_initial[NB];
chisq_simplex[1] := chisq_B_initial;

To make the first movement of the triangular initial simplex we reject its worst vertex W and replace it with a new vertex R, generated on reflexion of vertex W through midpoint P of the opposite edge between best B and intermediate N vertices:
> 
P_initial := evalf((B_initial + N_initial)/2.0, 5);
R_initial := P_initial + (P_initial  W_initial);

After that initial reflexion, further movements of the sequential simplex proceed according to modifications by Nelder and Mead of the original algorithm; this modified algorithm contains rules necessary to effect the operations reflexion, expansion and contraction of an initial simplex until it reaches an optimum response, or corresponding minimum value of
. The conditions "
>
"
,
"
>
"
,
"
<
" and "
<
" imply avertex that has a response
"
better than
", "
better than or equal to
",
"
worse than
" and "
worse than or equal to
" [1, 2] according to the following rules.

1) If
, use new simplex
BNR
, and go to 4).

2) If
R
>
B
, calculate and evaluate E through an
expansion
; if
E
>
B
, use new simplex
BNE
, and go to 4; if
E
<
B
, use new simplex
BNR
, and go to 4).

3) For a case
R
<
N
, if
R
>
W
, calculate and evaluate
through a
contraction
=
P
+ 0
, use new simplex
BN
, and go to 4), whereas if
, calculate and evaluate
through a
contraction
, use new simplex
BN
, and go to 4).

4) If a difference between responses or factors of a new vertex and its preceding vertex converges within a preset tolerance, stop.

5) Rank vertices of the newsimplex as
BNW
; as vertex
W
of the new simplex, assign vertex
N
of the preceding simplex.

6) Implement a
reflexion
: calculate and evaluate a new
P
and so a new
R
through
and
R = P + P  W
; go to 1).
We apply exactly these rules in the following programmed loop of commands until convergence according to a defined criterion. Before the loop begins we supply information about the initial simplex and the first reflexion
.
> 
A_inf[NB] := A_inf_initial[NB]:
k[NB] := k_initial[NB]:
A_inf[NN] := A_inf_initial[NN]:
k[NN] := k_initial[NN]:
chisq_B := chisq_B_initial:
chisq_N := chisq_N_initial:
chisq_W := chisq_W_initial:
R := R_initial:
P :=P_initial:
W := W_initial:
N := N_initial:
B := B_initial:

Following this preparation we execute a loop to construct a sequential simplex with variable size.
> 
for j from 2 do
# Evaluate a reflexion.
A_ := R[1]:
k_ := R[2]:
chisq := add((A_data[i]  F(t_data[i], A_, k_))^2, i=1..n_data):
# Calculate a new simplex: calculate and evaluate an expansion or contraction.
if chisq <chisq_B then
E := P + 2.0*(P  W):
A_ := E[1]:
k_ := E[2]:
chisq := add((A_data[i]  F(t_data[i], A_, k_))^2, i=1..n_data):
if chisq > chisq_B then
A_ := R[1]:
k_ := R[2]:
chisq := add((A_data[i]  F(t_data[i], A_, k_))^2, i=1..n_data):
end if:
elif chisq > chisq_N and chisq < chisq_W then
CR := P + 0.50*(P  W):
A_ := CR[1]:
k_ := CR[2]:
chisq := add((A_data[i]  F(t_data[i], A_, k_))^2, i=1..n_data):
elif chisq > chisq_W then
CW := P  0.50*(P  W):
A_ := CW[1];
k_ := CW[2]:
chisq := add((A_data[i]  F(t_data[i], A_, k_))^2, i=1..n_data):
end if;
# A new simplex results.
new_vertex :=[A_,k_]:
A_inf_simplex[j] := A_:
k_simplex[j] := k_:
chisq_simplex[j] := chisq:
# Prepare a plot of this simplex.
GCont3[j] := plots[polygonplot]([B, N, new_vertex], colour=green):
GCont4[j]:= plots[polygonplot]([B, N, new_vertex], style=line, colour=black):
# Test according to a criterion of convergence as a fraction of a tolerance.
if abs(A_inf_simplex[j] A_inf_simplex[j1])/A_inf_simplex[j] < 0.001
and abs(k_simplex[j]  k_simplex[j1])/k_simplex[j] < 0.001 then
cnt := j:
break:
end if:
# If the criterion is not met, rank vertices of the new simplex and
# calculate a new vertex through reflexion.
ANB := A_inf[NB]:
A_inf[1] := ANB:
A_inf[2] := A_:
kNB := k[NB]:
k[1] := kNB:
k[2] := k_:
L_chisq := [chisq_B, chisq]:
chisq_W := chisq_N:
chisq_B := min(op(L_chisq)):
chisq_N := max(op(L_chisq)):
member(chisq_B, L_chisq, 'NB'):
member(chisq_N, L_chisq, 'NN'):
W := N:
B := [A_inf[NB], k[NB]]:
N := [A_inf[NN], k[NN]]:
P := (B + N)/2.0:
R := P + (P  W):
end do:

generation of sequentialsimplices
After a criterion for convergence is satisfied, we obtain the following results for the final simplex.
> 
number_of_simplices := cnt;

> 
A_inf_final := evalf(A_inf_simplex[cnt], 5);
k_final := evalf(k_simplex[cnt], 5);
chisq_final := evalf(chisq_simplex[cnt], 5);

These final simplex results of
A_inf_final
and
k_final
correspond to values of parameters that fit the firstorder kinetic model to experimental data with
equal to
chisq_final
. We plot the fitted curve with the experimental points.
> 
Greg := plot(F(t, A_inf_final, k_final), t=0..25):
plots[display]([Gdata, Greg], titlefont=[TIMES,BOLD,12],
title=" points and fitted curve" );

We compare these final simplex results with
exact values
obtained above.
> 
A_min := A_min;
k_min := k_min;
chisq_min := chisq_min;

We calculate a fractional error as a convenient comparison of results in the two sets; this bias depends on a
criterion for convergence
fixed in the loop above.
> 
A_error := evalf(abs(A_min  A_inf_final)/A_min * 100, 5)*` %`;
k_error := evalf(abs(k_min  k_final)/k_min * 100,5)*` %`;

These errors can be decreased on diminishing a criterion of convergence in the above loop, but such a diminution has no discernible effect on movement of the simplex on a scale of contours shown above, and likely involve fitting data within the range of their uncertainties.
movement of simplex towards convergence
The following plot shows the movement of the simplex from its selected initial position on the surface of
, according to its contours, to its ultimate destination at the location of a minimum as evaluated above.
Play this animation!
> 
GContBk := plots[display]([GCont0, GCont1, GCont2]):
for j from 2 to cnt do
GContS[j] := plots[display]([GCont0, GCont1, GCont3[j]]):
end do:
plots[display]([GContBk, seq(GContS[j], j=2..cnt)], insequence=true,
title="animation of progress of sequential simplex",
titlefont=[TIMES,BOLD,12]);

The next plot tracks the sequence of simplices from an initial location until convergence, and shows how the size of each simplex expands or contracts from the preceding figure according to the slope of the surface: the simplex rapidly expands down the slope into the valley, contracts when reaching the valley, accelerates along the valley toward the minimum and finally collapses at the minimum after contractions in a continuous sequence.
> 
plots[display]([GCont0, GCont1, GCont2, seq(GCont4[j], j=2..cnt)],
titlefont=[TIMES,BOLD,12], title="sequential simplex track");

Here is a corresponding plot of values of
in a sequence towards convergence; the behaviour shown for this plot is consistent with the sequential movement of the simplex over the response surface. The variations of response produced are initially large as the simplex moves down the slope into the valley, but thereafter variations between consecutive figures are small as the simplex moves along the valley as far as the optimum.
> 
plots[pointplot]([seq([j, chisq_simplex[j]],j=1..cnt)], connect=true,
title="sequential change of chi squared",
titlefont=[TIMES,BOLD,12], labels=["simplex number","chi squared"]);

The following plots show how values of parameters
and
vary during the progress of the simplex from its initial location until convergence. The behaviour that is observed for both parameters reflects movements of the simplex across the surface: rapid expansions and contractions at the beginning of the sequential simplex produce important  even erratic  alterations in the magnitudeof each parameter. When the simplex approaches the optimum by mean of sucessive contractions, there result smaller alterations of parameters until criterion for convergence criterion is satisfied.
> 
plots[pointplot]([seq([j, A_inf_simplex[j]], j=1..cnt)], connect=true,
title="sequential change of parameter A_inf",
titlefont=[TIMES,BOLD,12], labels=["simplex number","A_inf"]);

> 
plots[pointplot]([seq([j, k_simplex[j]], j=1..cnt)],
connect=true, title="sequential change of parameter k",
titlefont=[TIMES,BOLD,12], labels=["simplex number","k"]);

table of numerical simplex progress
Here is a table of numerical values of parameters and response in progress of the simplex during optimization.
> 
print(`simplex `,A[infinity],` k`,chi^2);
for j to cnt do
print(j, evalf(A_inf_simplex[j],5), evalf(k_simplex[j],5), evalf(chisq_simplex[j],6));
end do;

other initial vertices
Varying size and location of an initial simplex results in varied behaviour across the response surface until the minimum. To explore this behaviour we suggest for an alternative initial vertex these coordinates [
,
k
] :

large initial simplex overthe minimum 
[0.200, 0.010], [0.600, 0.010] and [0.400, 0.800] ;

small initial simplex in the valley  [0.300, 0.800], [0.338, 0.826], and [0.313, 0.897] .
conclusion
Although simplex algorithms generally applied in mathematical optimisation workwith simplices of fixed or variable size, important modifications are reported in the literature to involve boundary conditions for constrained optimisation, weightedcentroid algorithms, and fitting of multiparametric models with estimates of errors of parameters [1]. Important published modifications include a supermodified simplex [5] or a compositemodified simplex [6] that seems to effect increased efficiency in the simplex. Here we have shown the applicability of
Maple
's environment to study optimisation through fitting of a model with two parameters according to a criterion of least sum of squared residuals using an algorithm for a sequential simplex of variable size; introduction and analysis of any other simplex extension is straightforward, reflecting
Maple
's capabilities for graphics and mathematical tools, easy handling of variables and powerful programming.
references
[1] Walters, F. H.; Parker, L. R.; Morgan, S. L.; Deming, S. N.
Sequential Simplex Optimisation
, CRC Press LLC, Florida, USA,
1991
.
[2] Deming, S. N.; Morgan, S. L. "Simplex optimisation of variables in analytical chemistry",
Analytical Chemistry
,
1973
, 45, 278A283A.
[3] Spendly, W.; Hext, G. R.; Himsworth, F. R. "Sequential application of simplex designs in optimisation and evolutionary operation",
Technometrics
,
1962
, 4, 441461.
[4] Nelder, A; Mead, R. "A simplex method for function minimisation",
Computer Journal
,
1965
, 7, 308313.
[5] Routh, M. W.; Swartz, R. A.; Denton, M. B. "Performance of the supermodified simplex",
Analytical Chemistry
,
1977
, 49, 14221428.
[6] Betteridge, D.; Wade, R. A.; Howard, A. G. "Reflections on the modified simplex  I, II",
Talanta
,
1985
, 32, 709734.