Abstract
We devise efficient algorithms to construct, evaluate, and approximate a Markovian dependability system with cascading failures. The model, which was previously considered by Iyer et al., represents a cascading failure as a tree of components that instantaneously and probabilistically fail. Constructing the Markov chain presents significant computational challenges because it requires generating and evaluating all such possible trees, and the number of trees can grow exponentially in the size of the model. Our new algorithm reduces runtimes by orders of magnitude compared to a previous method devised by Iyer et al. Moreover, we propose efficient approximations, based on the idea of most likely paths to failure, that further reduce computation time substantially by constructing a model that uses only a subset of the trees. We also derive two new dependability measures related to the distribution of the size of a cascade. We present numerical results demonstrating the effectiveness of our approaches. For a model of a large cloud-computing system, our approximations reduce computation times by orders of magnitude with only a few percent error in the computed dependability measures.
Original language | English (US) |
---|---|
Article number | 2684785 |
Pages (from-to) | 258-280 |
Number of pages | 23 |
Journal | IEEE Transactions on Reliability |
Volume | 66 |
Issue number | 2 |
State | Published - Jun 2017 |
All Science Journal Classification (ASJC) codes
- Safety, Risk, Reliability and Quality
- Electrical and Electronic Engineering
Keywords
- Availability
- Cascading failures
- Markov processes
- Reliability modeling
- Trees