System Reliability and Availability
by Joe Riel
joer@k-online.com
www.k-online.com/~joer
© 2000 Joseph S. Riel
NOTE: This worksheet demonstrates the use of Maple to compute the reliability and availability of a fault-tolerant power system. It illustrates how symbolic expressions for these quantities can be readily calculated with the networks package and a few additional procedures.
Introduction:
Fault-tolerant systems use redundant architectures so that they continue operating when a single element has failed. Consider a power system with n+1 power supplies (see Figure 1). Each supply has a constant failure rate $\lambda$. If one supply fails, the system continues to operate while the failed supply is removed and replaced; the repair rate is $\mu$. If a second supply fails before the first supply has been serviced, the system fails. Figure 2 shows a state diagram of this architecture.
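Written as rate equations, the state diagram is a continuous-time Markov chain. As a sketch, let $P_g$, $P_1$ and $P_f$ (symbols introduced here for illustration) be the probabilities of being in the good, onefailed and fault states, and leave out any repair from the fault state, as the MTBF calculation does. The probabilities then evolve as below; each column of the matrix sums to zero because probability leaving one state enters another. Up to the order of the states, this is the matrix the diffprob procedure builds from the network.

```latex
\frac{d}{dt}\begin{bmatrix}P_g\\ P_1\\ P_f\end{bmatrix}
=
\begin{bmatrix}
-(n+1)\lambda & \mu & 0\\
(n+1)\lambda & -(n\lambda+\mu) & 0\\
0 & n\lambda & 0
\end{bmatrix}
\begin{bmatrix}P_g\\ P_1\\ P_f\end{bmatrix}
```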
> restart;
> with(networks):
> with(linalg,stackmatrix,coldim,minor,linsolve,iszero,det,col):
Procedures
Define the procedures used in this worksheet.
> reduce := proc(AA::array) local A,i; option `Copyright (c) 1999 by Joseph Riel. All rights reserved.`; description "Remove absorbing (sink) states, which have no outgoing transitions, from state array"; A := copy(AA); for i from coldim(A) to 1 by -1 do if iszero(col(A,i)) then A := minor(A,i,i) fi od; eval(A) end:
> mtbf := proc(G::graph) local A,i,dim; option `Copyright (c) 1999 by Joseph Riel. All rights reserved.`; description "Compute the MTBF of a system with repair"; A := reduce(diffprob(G)); dim := coldim(A); if dim=1 then -1/A[1,1] else simplify(-sum('det(minor(A,i,i))','i'=1..coldim(A))/det(A)) fi end:
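The mtbf procedure evaluates a ratio of determinants. If $A$ is the $m\times m$ reduced matrix (absorbing states removed) and $A_{(i,i)}$ is $A$ with row and column $i$ deleted, the diagonal cofactors of $A$ are exactly these principal minors, so the returned value is minus the trace of $A^{-1}$. For $m=1$ it reduces to $-1/A_{11}$, the special case handled in the procedure.

```latex
\mathrm{MTBF}
  = -\frac{\sum_{i=1}^{m} \det A_{(i,i)}}{\det A}
  = -\frac{\operatorname{tr}(\operatorname{adj} A)}{\det A}
  = -\operatorname{tr}\!\left(A^{-1}\right)
```

Entry $(-A^{-1})_{ji}$ is the expected total time spent in transient state $j$ when starting in state $i$ (rows of $A$ receive, columns send). The mean time to reach the fault state from good is therefore the sum of the good column of $-A^{-1}$, whereas the trace adds the diagonal entries. The two agree when, starting from good, every transient state is certain to be visited, as in the n+1 model where good always passes to onefailed. For a state diagram with alternative failure paths they can differ, so the formula is worth re-checking before mtbf is applied to other architectures.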
> availability := proc(G::graph) local A,V,dim,indx,i; option `Copyright (c) 1999 by Joseph Riel. All rights reserved.`; description "Compute the availability of a system with repair"; A := diffprob(G,'indx'); dim := coldim(A); A := stackmatrix(A,[1$dim]); V := linsolve(A,[0$dim,1]); table([seq(indx[i]=V[i],i=1..dim)]) end:
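The availability procedure looks for the steady state $\pi$ with $A\pi = 0$. Because every column of $A$ sums to zero, these equations are linearly dependent, so the procedure appends a normalization row (the stackmatrix call with [1$dim]) and solves the overdetermined but consistent system below. When every state can reach every other, the solution is unique. Without a transition out of the fault state, the only steady state is the fault state itself (availability zero), which is why a repair transition from fault is added before this procedure is used.

```latex
\begin{bmatrix} A \\ 1 \;\; 1 \;\; \cdots \;\; 1 \end{bmatrix}\pi
=
\begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}
```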
> diffprob := proc(G::graph,indx) local vlist,v,n,i,A,nbrs,id,hd,h,t,tl,e,w,ew,p,prob_depart,prob_arrive; option `Copyright (c) 1999 by Joseph Riel. All rights reserved.`; description "Compute the differential probability matrix of a graph"; id := G(_EdgeIndex); hd := G(_Head); tl := G(_Tail); ew := G(_Eweight); vlist := sort([op(networks['vertices'](G))]); if nargs > 1 then indx := vlist fi; n := nops(vlist); for i to n do p[vlist[i]] := i od; A := array(1 .. n,1 .. n,sparse); for v in vlist do nbrs := networks['neighbors'](v,G); prob_depart := 0; for w in nbrs do prob_arrive := 0; for e in id[v,w] do t := tl[e]; h := hd[e]; if t = w then prob_arrive := prob_arrive + ew[e] elif h = w then prob_depart := prob_depart + ew[e] fi od; A[p[v],p[w]] := prob_arrive od; A[p[v],p[v]] := -prob_depart od; eval(A) end:
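To see the bookkeeping of diffprob and reduce outside Maple, here is a minimal Python sketch under stated assumptions: the graph is replaced by a hypothetical list of (tail, head, rate) transitions, the names diff_prob_matrix and reduce_absorbing are our own, and exact fractions with illustrative values stand in for the symbolic rates. It follows the worksheet's convention that column j holds the rates out of state j, so every column sums to zero, and states are sorted by name as in diffprob.

```python
from fractions import Fraction

def diff_prob_matrix(vertices, edges):
    """Differential probability matrix A for dP/dt = A P (sketch).

    edges holds (tail, head, rate) tuples, one per transition tail -> head.
    States are sorted; A[i][j] (i != j) is the rate into state i from
    state j, and A[i][i] is minus the total rate out of state i.
    """
    order = sorted(vertices)
    pos = {v: k for k, v in enumerate(order)}
    m = len(order)
    A = [[Fraction(0)] * m for _ in range(m)]
    for tail, head, rate in edges:
        A[pos[head]][pos[tail]] += rate   # probability arriving in head
        A[pos[tail]][pos[tail]] -= rate   # probability departing tail
    return order, A

def reduce_absorbing(order, A):
    """Drop states whose column is all zero, i.e. states with no way out."""
    m = len(order)
    keep = [j for j in range(m) if any(A[i][j] != 0 for i in range(m))]
    return [order[j] for j in keep], [[A[i][j] for j in keep] for i in keep]

# MTBF model (fault absorbing), hypothetical values n = 1, lambda = 1/100, mu = 1/2
n, lam, mu = 1, Fraction(1, 100), Fraction(1, 2)
edges = [("good", "onefailed", (n + 1) * lam),
         ("onefailed", "fault", n * lam),
         ("onefailed", "good", mu)]
order, A = diff_prob_matrix(["good", "onefailed", "fault"], edges)
states, Q = reduce_absorbing(order, A)
print(order)    # ['fault', 'good', 'onefailed']
print(states)   # ['good', 'onefailed']
```

Removing the all-zero fault column leaves the 2×2 matrix over good and onefailed, which is the kind of matrix mtbf works with.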
MTBF of a System with n+1 Redundancy:
Mean Time Between Failure (MTBF) is a common figure of merit for a system. We can compute it for this system as follows.
System Definition
Create a network, G1, that describes the state diagram of Figure 2. Use addvertex to define each state. Use addedge to connect the states; the weight of each edge is the failure or repair rate of that transition. The repair transition from the fault state to the good state is not included in this calculation: the MTBF is the mean time until the system first reaches the fault state, so fault is treated as an absorbing state.
> new(G1):
> addvertex([good,onefailed,fault],G1):
> addedge([good,onefailed],weights=(n+1)*lambda,G1):
> addedge([onefailed,fault],weights=n*lambda,G1):
> addedge([onefailed,good],weights=mu,G1):
Verify that the graph of the network corresponds to the state diagram.
> draw(G1);
Calculation
> MTBF := mtbf(G1);
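As a hand check, first-step analysis gives the same quantity. Let $T_g$ and $T_1$ (symbols introduced here) be the mean times to reach the fault state starting from good and from onefailed. A sojourn in good lasts $1/((n+1)\lambda)$ on average and always ends in onefailed; a sojourn in onefailed lasts $1/(n\lambda+\mu)$ on average and ends in repair with probability $\mu/(n\lambda+\mu)$. The result should be algebraically equivalent to what mtbf(G1) returns, although Maple may print it in a different arrangement.

```latex
\begin{aligned}
T_g &= \frac{1}{(n+1)\lambda} + T_1, \qquad
T_1 = \frac{1}{n\lambda+\mu} + \frac{\mu}{n\lambda+\mu}\,T_g \\
\Rightarrow\quad \mathrm{MTBF} = T_g &= \frac{(2n+1)\lambda + \mu}{n(n+1)\lambda^{2}}
\end{aligned}
```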
Typically the failure rate is much smaller than the repair rate, that is, $\lambda \ll \mu$. We can use this to approximate the MTBF.
> collect(MTBF,mu);
> MTBFapprox := select(has,%,mu);
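The collect/select step is meant to keep only the term proportional to $\mu$, namely $\mu/(n(n+1)\lambda^2)$, and drop $(2n+1)/(n(n+1)\lambda)$. A small Python sketch shows how little is lost when $\lambda \ll \mu$. The helper names and the rates (per hour) are hypothetical, and the exact expression is the closed form $((2n+1)\lambda+\mu)/(n(n+1)\lambda^2)$ for this model, which can be confirmed by first-step analysis. The relative error works out to $(2n+1)\lambda/((2n+1)\lambda+\mu)$.

```python
from fractions import Fraction

def mtbf_exact(n, lam, mu):
    """MTBF of the n+1 system with the fault state absorbing."""
    return ((2 * n + 1) * lam + mu) / (n * (n + 1) * lam ** 2)

def mtbf_approx(n, lam, mu):
    """Only the term proportional to mu."""
    return mu / (n * (n + 1) * lam ** 2)

n, lam, mu = 1, Fraction(1, 10000), Fraction(1, 10)   # hypothetical rates per hour
exact, approx = mtbf_exact(n, lam, mu), mtbf_approx(n, lam, mu)
print(exact, approx)               # 5015000 5000000
print((exact - approx) / exact)    # 3/1003, i.e. (2n+1)lam / ((2n+1)lam + mu)
```

With $\lambda/\mu = 0.001$ the approximation is low by about 0.3%.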
Availability of a System with n+1 Redundancy:
Availability is a common figure of merit for a fault-tolerant system. To be meaningful, the system must be repairable from any state. We can achieve this by adding a transition from the fault state back to the good state (the dashed line in Figure 2).
Add a transition from the fault state to the good state in the state diagram.
> addedge([fault,good],weights=mu,G1):
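Compute the steady-state probabilities. The availability procedure returns a table whose entries give the long-run fraction of time the system spends in each state. The availability itself is the fraction of time spent in an operational state, that is, the sum of the good and onefailed entries.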
> Avail := availability(G1);
> A := Avail[good] + Avail[onefailed];
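The same steady state can be reproduced outside Maple. In the Python sketch below, the edge-list format, the name steady_state and the values n = 1, $\lambda$ = 1/100, $\mu$ = 1/2 are hypothetical and chosen only for illustration. Instead of stacking the normalization row beneath A as availability does, it replaces one redundant balance equation with that row, which gives the same solution for a chain in which every state can reach every other, and then solves with exact Gauss-Jordan elimination.

```python
from fractions import Fraction

def steady_state(vertices, edges):
    """Long-run state probabilities of an irreducible chain (sketch).

    edges holds (tail, head, rate) tuples. A is built as in the worksheet
    (column j holds the rates out of state j); the last balance equation,
    redundant because every column of A sums to zero, is replaced by
    the normalization sum(pi) = 1.
    """
    order = sorted(vertices)
    pos = {v: k for k, v in enumerate(order)}
    m = len(order)
    A = [[Fraction(0)] * m for _ in range(m)]
    for tail, head, rate in edges:
        A[pos[head]][pos[tail]] += rate
        A[pos[tail]][pos[tail]] -= rate
    A[-1] = [Fraction(1)] * m
    b = [Fraction(0)] * (m - 1) + [Fraction(1)]
    # Gauss-Jordan elimination in exact arithmetic
    for c in range(m):
        p = next(r for r in range(c, m) if A[r][c] != 0)   # pivot row
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for r in range(m):
            if r != c and A[r][c] != 0:
                f = A[r][c] / A[c][c]
                A[r] = [x - f * y for x, y in zip(A[r], A[c])]
                b[r] -= f * b[c]
    return {v: b[pos[v]] / A[pos[v]][pos[v]] for v in order}

# Hypothetical values, with the fault -> good repair transition added
n, lam, mu = 1, Fraction(1, 100), Fraction(1, 2)
edges = [("good", "onefailed", (n + 1) * lam),
         ("onefailed", "fault", n * lam),
         ("onefailed", "good", mu),
         ("fault", "good", mu)]
pi = steady_state(["good", "onefailed", "fault"], edges)
avail = pi["good"] + pi["onefailed"]
print(avail)   # 1325/1326
```

For these values the probabilities are 25/26 (good), 25/663 (onefailed) and 1/1326 (fault).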
For a well-designed system the availability is very close to 1, so a more convenient figure of merit is the unavailability, defined as $U = 1 - A$.
> U := normal(1-A);
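Solving the balance equations by hand shows what to expect from normal(1-A); Maple's arrangement of the result may differ. With the fault-to-good repair in place, the steady-state probabilities of good, onefailed and fault are proportional to $\mu(n\lambda+\mu)$, $(n+1)\lambda\mu$ and $n(n+1)\lambda^2$, which gives the first form below. The second form is the familiar MTTR/(MTBF + MTTR) with MTTR $= 1/\mu$ and the MTBF of the model without the fault repair. It is exact here because a repair from the fault state always returns the system to the good state.

```latex
U = \frac{n(n+1)\lambda^{2}}{\mu^{2} + (2n+1)\lambda\mu + n(n+1)\lambda^{2}}
  = \frac{1/\mu}{\mathrm{MTBF} + 1/\mu},
\qquad
\mathrm{MTBF} = \frac{(2n+1)\lambda + \mu}{n(n+1)\lambda^{2}}
```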
Typically the failure rate is much smaller than the repair rate, that is, $\lambda \ll \mu$. We can use this to approximate the unavailability by keeping the leading term of its series expansion in $\lambda$:
> Uapprox := simplify(convert(series(U,lambda,3),polynom));
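The series step should reduce U to its leading term $n(n+1)\lambda^2/\mu^2$, which is $1/\mu$ divided by the approximate MTBF $\mu/(n(n+1)\lambda^2)$. A quick numerical comparison in Python, with hypothetical values n = 1, $\lambda$ = 1/100 and $\mu$ = 1/2 (so $\lambda/\mu = 0.02$) and helper names of our own, shows how close that approximation is. The exact expression used is the closed form of U for this model.

```python
from fractions import Fraction

def unavailability(n, lam, mu):
    """Exact steady-state unavailability of the n+1 system with repair."""
    d = mu ** 2 + (2 * n + 1) * lam * mu + n * (n + 1) * lam ** 2
    return n * (n + 1) * lam ** 2 / d

def unavailability_approx(n, lam, mu):
    """Leading term of the series of U in lam: n(n+1) lam^2 / mu^2."""
    return n * (n + 1) * lam ** 2 / mu ** 2

def mtbf(n, lam, mu):
    """Mean time to failure with the fault state absorbing."""
    return ((2 * n + 1) * lam + mu) / (n * (n + 1) * lam ** 2)

n, lam, mu = 1, Fraction(1, 100), Fraction(1, 2)   # hypothetical values
U = unavailability(n, lam, mu)
U_approx = unavailability_approx(n, lam, mu)
mttr = 1 / mu
print(U, U_approx)                               # 1/1326 1/1250
print(U == mttr / (mtbf(n, lam, mu) + mttr))     # True
```

Here the approximation is about 6% high, consistent with a relative correction of roughly $(2n+1)\lambda/\mu$; it improves as $\lambda/\mu$ shrinks.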
Conclusion:
By using Maple we are able to quickly derive symbolic expressions for two common figures of merit, MTBF and availability, for redundant system architectures. The symbolic expressions permit us to understand intuitively the tradeoffs among different redundant configurations.
Disclaimer: While every effort has been made to validate the solutions in this worksheet, Waterloo Maple Inc. and the contributors are not responsible for any errors contained and are not liable for any damages resulting from the use of this material.