reliability.mws

**System Reliability and Availability**

*by Joe Riel*

*joer@k-online.com*

*www.k-online.com/~joer*

*© 2000 Joseph S. Riel*

NOTE: This worksheet demonstrates the use of Maple for computing system reliability and availability for a fault-tolerant power system. It illustrates how symbolic expressions for these quantities can be readily calculated using the __networks__ package and a few additional procedures.

**Introduction:**

Fault-tolerant systems use redundant architectures so that they continue operating after a single element has failed.
Consider a power system with n+1 power supplies, any n of which can carry the load (see Figure 1). Each supply has a constant failure rate $\lambda$. If one supply fails, the system continues to operate while the failed supply is removed and replaced; the repair rate is $\mu$. If a second supply fails before the first has been repaired, the system fails. Figure 2 shows a state diagram of this architecture.
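The state diagram describes a continuous-time Markov chain. As a sketch, assuming exponentially distributed failure and repair times (which is what constant rates imply), and writing $P_g$, $P_1$, $P_f$ (symbols introduced here for illustration) for the probabilities of the good, onefailed and fault states, with $\lambda$ the failure rate of one supply and $\mu$ the repair rate, the probabilities evolve according to the forward equations below. The fault state has no outgoing transition in this model.

$$
\begin{aligned}
\frac{dP_g}{dt} &= -(n+1)\lambda\,P_g + \mu\,P_1 \\
\frac{dP_1}{dt} &= (n+1)\lambda\,P_g - (n\lambda + \mu)\,P_1 \\
\frac{dP_f}{dt} &= n\lambda\,P_1
\end{aligned}
$$

Each right-hand side is the probability flow into a state minus the flow out of it; the three right-hand sides sum to zero, so total probability is conserved.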

`> `
**restart;**

`> `
**with(networks):**

`> `
**with(linalg,stackmatrix,coldim,minor,linsolve,iszero,det,col):**

**Procedures**

Define the procedures used in this worksheet.

`> `
**reduce := proc(AA::array)**

local A,i;

option `Copyright (c) 1999 by Joseph Riel. All rights reserved.`;

description "Remove absorbing (sink) states from state array";

A := copy(AA);

for i from coldim(A) to 1 by -1 do

if iszero(col(A,i)) then

A := minor(A,i,i) fi

od;

eval(A)

end:
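A sketch of what reduce does, assuming the convention used by diffprob (defined below): entry $A_{vw}$ is the rate from state $w$ into state $v$, so column $w$ lists the transitions out of $w$. A state with no outgoing transitions (an absorbing state) therefore has an all-zero column. Listing the transient states first for illustration, the matrix has the block form below, and reduce deletes the rows and columns of the absorbing states, leaving $A_T$:

$$
A = \begin{pmatrix} A_T & 0 \\ R & 0 \end{pmatrix}
\quad\longrightarrow\quad
\operatorname{reduce}(A) = A_T
$$

Here $R$ holds the rates from transient states into absorbing ones. It can be discarded because only the time spent in transient states contributes to the MTBF. Because the test looks only at columns, states with no incoming transitions are not removed.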

`> `
**mtbf := proc(G::graph)**

local A,i,dim;

option `Copyright (c) 1999 by Joseph Riel. All rights reserved.`;

description "Compute the MTBF of a system with repair";

A := reduce(diffprob(G));

dim := coldim(A);

if dim=1 then

-1/A[1,1]

else

# -trace(A^(-1)), since (A^(-1))[i,i] = det(minor(A,i,i))/det(A)

simplify(-add(det(minor(A,i,i)),i=1..dim)/det(A))

fi

end:
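The closing expression of mtbf can be read through the cofactor formula for the inverse, $(A^{-1})_{ii} = \det\operatorname{minor}(A,i,i)/\det A$, so for the reduced matrix $A$ the procedure returns $-\operatorname{tr}(A^{-1})$; the $1\times 1$ branch $-1/A_{11}$ is the same quantity. A sketch of why this is the MTBF, assuming the column convention of diffprob: with $N = -A^{-1}$ (notation introduced here), $N_{ij}$ is the expected time spent in transient state $i$ when starting from state $j$, so the mean time to failure from the good state $g$ is the column sum $\sum_i N_{ig}$.

$$
-\sum_{i} \frac{\det \operatorname{minor}(A,i,i)}{\det A}
= -\operatorname{tr}\!\left(A^{-1}\right)
= \sum_i N_{ii},
\qquad N = -A^{-1}
$$

The trace equals the column sum when every transient state is visited with certainty before failure, as in the chain of Figure 2, where the only route from good to fault passes through onefailed. For a chain without that property, the column sum $\sum_i N_{ig}$ is the quantity to use.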

`> `
**availability := proc(G::graph)**

local A,V,dim,indx,i;

option `Copyright (c) 1999 by Joseph Riel. All rights reserved.`;

description "Compute the availability of a system with repair";

A := diffprob(G,'indx');

dim := coldim(A);

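# Append a row of ones: the state fractions must sum to 1

A := stackmatrix(A,[1$dim]);

# Solve the balance equations together with the normalization row

V := linsolve(A,[0$dim,1]);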

table([seq(indx[i]=V[i],i=1..dim)])

end:
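The linear system that availability solves is the steady-state balance of the chain. As a sketch, assuming (as the Availability section requires) that the system can be repaired from every state so the long-run distribution is unique, the vector $\pi$ of long-run state fractions satisfies $A\pi = 0$. Because the columns of $A$ sum to zero, one of those equations is redundant, and stackmatrix appends the normalization row of ones:

$$
\begin{pmatrix} A \\ \mathbf{1}^{\mathsf{T}} \end{pmatrix} \pi
= \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}
$$

The system has dim+1 equations in dim unknowns but is consistent, so linsolve returns the unique solution, which the procedure packages as a table indexed by state name.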

`> `
**diffprob := proc(G::graph,indx)**

local vlist,v,n,i,A,nbrs,id,hd,h,t,tl,e,w,ew,p,prob_depart,prob_arrive;

option `Copyright (c) 1999 by Joseph Riel. All rights reserved.`;

description "Compute the differential probability matrix of a graph";

id := G(_EdgeIndex);

hd := G(_Head);

tl := G(_Tail);

ew := G(_Eweight);

vlist := sort([op(networks['vertices'](G))]);

if nargs > 1 then indx := vlist fi;

n := nops(vlist);

for i to n do p[vlist[i]] := i od;

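A := array(1 .. n,1 .. n,sparse);

# A[v,w] holds the total rate from state w into state v;

# A[v,v] is minus the total rate out of state v, so each column sums to 0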

for v in vlist do

nbrs := networks['neighbors'](v,G);

prob_depart := 0;

for w in nbrs do

prob_arrive := 0;

for e in id[v,w] do

t := tl[e];

h := hd[e];

if t = w then prob_arrive := prob_arrive + ew[e]

elif h = w then prob_depart := prob_depart + ew[e]

fi

od;

A[p[v],p[w]] := prob_arrive

od;

A[p[v],p[v]] := -prob_depart

od;

eval(A)

end:
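For concreteness, here is the matrix diffprob should produce for the network G1 built in the MTBF section (without the fault-to-good repair edge). It follows the convention in the code: the off-diagonal entry $A_{vw}$ collects the weights of edges from $w$ to $v$, and the diagonal entry $A_{vv}$ is minus the total weight of edges leaving $v$, so every column sums to zero. Rows and columns are labeled here in the order good, onefailed, fault for readability; diffprob orders the states with sort, so its index order may differ.

$$
\begin{array}{c|ccc}
 & \text{good} & \text{onefailed} & \text{fault} \\ \hline
\text{good} & -(n+1)\lambda & \mu & 0 \\
\text{onefailed} & (n+1)\lambda & -(n\lambda+\mu) & 0 \\
\text{fault} & 0 & n\lambda & 0
\end{array}
$$

The all-zero fault column marks the absorbing state that reduce removes before mtbf inverts the matrix.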

**MTBF of a System with n+1 Redundancy:**

Mean Time Between Failures (MTBF) is a common figure of merit for a system. For this system it is the expected time to go from the good state to the fault state, which we compute as follows.

**System Definition**

Create a network, G1, that describes the state diagram of Figure 2. Use __addvertex__ to define each state. Use __addedge__ to connect the states; the weight of each edge is the failure or repair rate of that transition, whichever applies. The repair transition from the fault state to the good state is not included here: for the MTBF calculation the fault state is absorbing.

`> `
**new(G1):**

`> `
**addvertex([good,onefailed,fault],G1):**

`> `
**addedge([good,onefailed],weights=(n+1)*lambda,G1):**

`> `
**addedge([onefailed,fault],weights=n*lambda,G1):**

`> `
**addedge([onefailed,good],weights=mu,G1):**

Verify that the graph of the network corresponds to the state diagram.

`> `
**draw(G1);**

**Calculation**

`> `
**MTBF := mtbf(G1);**
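As a cross-check on mtbf(G1), here is a hand derivation by first-step analysis, a different method from the matrix formula inside mtbf. Let $T_g$ and $T_1$ (symbols introduced here) be the expected times to reach the fault state from good and from onefailed. The chain leaves good after a mean time $1/((n+1)\lambda)$, always to onefailed; it leaves onefailed after a mean time $1/(n\lambda+\mu)$, returning to good with probability $\mu/(n\lambda+\mu)$.

$$
\begin{aligned}
T_g &= \frac{1}{(n+1)\lambda} + T_1, \\
T_1 &= \frac{1}{n\lambda+\mu} + \frac{\mu}{n\lambda+\mu}\,T_g, \\
\mathrm{MTBF} = T_g &= \frac{(2n+1)\lambda + \mu}{n(n+1)\lambda^2}.
\end{aligned}
$$

Maple may print an algebraically equivalent form of this expression.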

Typically the failure rate is much smaller than the repair rate, that is, $\lambda \ll \mu$. We can use this to approximate the MTBF by keeping only the term that contains $\mu$.

`> `
**collect(MTBF,mu);**

`> `
**MTBFapprox := select(has,%,mu);**
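Splitting the exact MTBF by hand shows which term select keeps, a sketch assuming the result simplifies to $((2n+1)\lambda+\mu)/(n(n+1)\lambda^2)$:

$$
\mathrm{MTBF} = \frac{\mu}{n(n+1)\lambda^2} + \frac{2n+1}{n(n+1)\lambda}
\;\approx\; \frac{\mu}{n(n+1)\lambda^2}
\qquad (\lambda \ll \mu)
$$

The dropped term is smaller than the kept one by the factor $(2n+1)\lambda/\mu$. As an illustration with hypothetical values $n=1$, $\lambda = 10^{-5}$ per hour and $\mu = 0.25$ per hour (a four-hour mean repair time), the approximation gives $0.25/(2\times10^{-10}) = 1.25\times10^{9}$ hours, against an exact value of $1.25015\times10^{9}$ hours.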

**Availability of a System with n+1 Redundancy:**

Availability is a common figure of merit for a fault-tolerant system. To be meaningful, the system must be repairable from any state. We achieve this by adding a transition from the fault state back to the good state (the dashed line in Figure 2).

**System Definition**

Add a transition from the fault state to the good state in the state diagram.

`> `
**addedge([fault,good],weights=mu,G1):**

Verify that the graph of the network corresponds to the state diagram.

`> `
**draw(G1);**

**Calculation**

Compute the availability. A table is returned whose entries give the long-run fraction of time the system spends in each state. The system operates in the good and onefailed states, so the availability is the sum of those two entries.

`> `
**Avail := availability(G1);**
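For reference, solving the balance equations of the repaired chain by hand gives the state fractions the Avail table should contain, possibly in a different but equivalent form. Writing $\pi_g$, $\pi_1$, $\pi_f$ (symbols introduced here) for the long-run fractions of time in good, onefailed and fault, balance at onefailed and at fault gives $(n\lambda+\mu)\pi_1 = (n+1)\lambda\pi_g$ and $\mu\pi_f = n\lambda\pi_1$; normalizing so the three fractions sum to 1 yields

$$
\pi_g = \frac{\mu(\mu+n\lambda)}{D},\quad
\pi_1 = \frac{(n+1)\lambda\mu}{D},\quad
\pi_f = \frac{n(n+1)\lambda^2}{D},\qquad
D = (\mu+n\lambda)\bigl(\mu+(n+1)\lambda\bigr).
$$

Adding the good and onefailed fractions gives the availability $A = \mu\bigl(\mu+(2n+1)\lambda\bigr)/D$.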

`> `
**A := Avail[good] + Avail[onefailed];**

The availability should be very close to 1, so a more convenient figure of merit is the unavailability, defined as $U = 1 - A$.

`> `
**U := normal(1-A);**
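Because the long-run state fractions sum to 1, $1 - A$ is simply the fraction of time spent in the fault state. Solving the balance equations of the repaired chain by hand (a sketch; normal may arrange the factors differently) gives

$$
U = 1 - A = \frac{n(n+1)\lambda^2}{(\mu+n\lambda)\bigl(\mu+(n+1)\lambda\bigr)}
$$

The denominator factors because $(\mu+n\lambda)(\mu+(n+1)\lambda) = \mu^2 + (2n+1)\lambda\mu + n(n+1)\lambda^2$.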

Typically the failure rate is much smaller than the repair rate, that is, $\lambda \ll \mu$. We can use this to approximate the unavailability by expanding $U$ in powers of $\lambda$:

`> `
**Uapprox := simplify(convert(series(U,lambda,3),polynom));**
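A hand expansion agrees with the leading term that series keeps: the numerator of $U$ is $n(n+1)\lambda^2$ and its denominator $(\mu+n\lambda)(\mu+(n+1)\lambda)$ tends to $\mu^2$ as $\lambda \to 0$, so

$$
U = \frac{n(n+1)\lambda^2}{\mu^2} + O(\lambda^3)
\quad\Longrightarrow\quad
U_{\text{approx}} = \frac{n(n+1)\lambda^2}{\mu^2}.
$$

For example, with hypothetical values $n=1$, $\lambda = 10^{-5}$ per hour and $\mu = 0.25$ per hour, $U_{\text{approx}} = 2\times10^{-10}/0.0625 = 3.2\times10^{-9}$.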

**Conclusion:**

Using Maple, we can quickly derive symbolic expressions for two common figures of merit, MTBF and availability, for redundant system architectures. The symbolic expressions let us understand the tradeoffs between different redundant configurations intuitively.

**Disclaimer:**
While every effort has been made to validate the solutions in this worksheet, Waterloo Maple Inc. and the contributors are not responsible for any errors contained and are not liable for any damages resulting from the use of this material.