Abstract
We consider the problem of identifying the source of failure in a network after receiving alarms or observing symptoms. Locating the root cause accurately and in a timely manner in a large communication system is challenging because a single fault can often result in a large number of alarms, and multiple faults can occur concurrently. In this paper, we present a new fault localization method using a machine-learning approach. We propose to use logistic regression to study the correlation among network events based on end-to-end measurements. Then, based on the regression model, we develop the fault hypothesis that best explains the observed symptoms. Unlike previous work, the machine-learning algorithm requires as input neither the dependencies among network events, nor the prior probabilities of faults, nor the conditional probabilities of fault propagation. This low-requirement feature makes it suitable for large, complex networks where accurate dependencies and prior probabilities are difficult to obtain. We then evaluate the performance of the learning algorithm with respect to the accuracy of the fault hypothesis and the concentration property. Experimental results and theoretical analysis both show satisfactory performance.
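The abstract does not specify the exact form of the regression model or the hypothesis rule. The following is therefore only a minimal sketch under loudly stated assumptions, illustrating the general idea of learning symptom-to-fault correlations from end-to-end measurements.

The sketch makes these choices, none of which come from the paper:

- It trains one logistic-regression model per component, estimating P(component faulty | path symptoms).
- The hypothesis is the set of components whose estimated probability exceeds 0.5.
- The topology `PATHS`, the independent-fault simulator, and every function name (`symptoms`, `simulate`, `train_logistic`, `fit_models`, `fault_hypothesis`) are hypothetical.

The learner itself only sees (symptom, fault) samples. It is never given the path-component dependencies or the fault probabilities, which echoes the "low requirement" property described above.

```python
import math
import random

# Hypothetical topology (an assumption, not from the paper): each end-to-end
# probe path is the set of component IDs it traverses.
PATHS = [{0, 1}, {1, 2}, {2, 3}, {0, 3}, {1, 3}]
N_COMPONENTS = 4


def symptoms(faulty):
    """Binary symptom vector: 1 if a path traverses at least one faulty component."""
    return [1 if path & faulty else 0 for path in PATHS]


def simulate(n_samples, p_fault, rng):
    """Hypothetical data source: draw (symptoms, fault set) pairs with independent
    component failures. Only the samples, not p_fault, reach the learner."""
    data = []
    for _ in range(n_samples):
        faulty = {c for c in range(N_COMPONENTS) if rng.random() < p_fault}
        data.append((symptoms(faulty), faulty))
    return data


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, x):
    """P(y = 1 | x) for weights w; the last entry of w is the bias term."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])


def train_logistic(xs, ys, lr=1.0, epochs=300):
    """Fit logistic-regression weights by batch gradient descent on the log-loss."""
    w = [0.0] * (len(xs[0]) + 1)
    n = len(xs)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = predict(w, x) - y  # derivative of log-loss w.r.t. the logit
            for i, xi in enumerate(x):
                grad[i] += err * xi
            grad[-1] += err
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


def fit_models(data):
    """Train one model per component, estimating P(component faulty | symptoms)."""
    xs = [x for x, _ in data]
    return [train_logistic(xs, [1 if c in faulty else 0 for _, faulty in data])
            for c in range(N_COMPONENTS)]


def fault_hypothesis(models, observed, threshold=0.5):
    """Hypothetical rule: report every component whose estimated fault
    probability exceeds the threshold."""
    return {c for c, w in enumerate(models) if predict(w, observed) > threshold}


if __name__ == "__main__":
    models = fit_models(simulate(400, 0.1, random.Random(7)))
    # Localize a single fault on component 0 from its end-to-end symptoms.
    print(fault_hypothesis(models, symptoms({0})))
```

A per-component threshold is the simplest way to turn regression outputs into a hypothesis. A method aiming for the hypothesis that *best* explains the symptoms, as the abstract describes, would more likely search or rank over candidate fault sets instead.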
Field | Value |
---|---|
Original language | English (US) |
Article number | 7336493 |
Pages (from-to) | 701-708 |
Number of pages | 8 |
Journal | IEEE Internet of Things Journal |
Volume | 3 |
Issue number | 5 |
State | Published - Oct 2016 |
All Science Journal Classification (ASJC) codes
- Signal Processing
- Information Systems
- Hardware and Architecture
- Computer Science Applications
- Computer Networks and Communications
Keywords
- Complex networks
- computer network reliability
- fault diagnosis
- fault location
- logistic regression
- machine learning