TY - JOUR
T1 - Active defect discovery
T2 - A human-in-the-loop learning method
AU - Shen, Bo
AU - Kong, Zhenyu
N1 - Publisher Copyright:
© 2023 “IISE”.
PY - 2023
Y1 - 2023
N2 - Unsupervised defect detection methods are applied to an unlabeled dataset and produce a list of instances ranked by defect score. Unfortunately, many of the instances ranked highest by unsupervised algorithms are not defects, which leads to high false-positive rates. Active Defect Discovery (ADD), which sequentially selects instances and queries their labels (defect or not), has been proposed to overcome this deficiency. However, labeling is often costly, so balancing detection accuracy against labeling cost is essential. To this end, this article proposes a novel ADD method. Our approach uses the state-of-the-art unsupervised defect detection method Isolation Forest as the baseline defect detector to extract features. The sparsity of the extracted features is then exploited to adjust the defect detector so that it focuses on the features most important for defect detection. To enforce this sparsity and thereby improve detection accuracy, a new algorithm based on online gradient descent, Sparse Approximated Linear Defect Discovery (SALDD), is proposed together with a theoretical regret analysis. Extensive experiments are conducted on real-world datasets from domains including healthcare, manufacturing, and security. The results demonstrate that the proposed algorithm significantly outperforms state-of-the-art algorithms for defect detection.
AB - Unsupervised defect detection methods are applied to an unlabeled dataset and produce a list of instances ranked by defect score. Unfortunately, many of the instances ranked highest by unsupervised algorithms are not defects, which leads to high false-positive rates. Active Defect Discovery (ADD), which sequentially selects instances and queries their labels (defect or not), has been proposed to overcome this deficiency. However, labeling is often costly, so balancing detection accuracy against labeling cost is essential. To this end, this article proposes a novel ADD method. Our approach uses the state-of-the-art unsupervised defect detection method Isolation Forest as the baseline defect detector to extract features. The sparsity of the extracted features is then exploited to adjust the defect detector so that it focuses on the features most important for defect detection. To enforce this sparsity and thereby improve detection accuracy, a new algorithm based on online gradient descent, Sparse Approximated Linear Defect Discovery (SALDD), is proposed together with a theoretical regret analysis. Extensive experiments are conducted on real-world datasets from domains including healthcare, manufacturing, and security. The results demonstrate that the proposed algorithm significantly outperforms state-of-the-art algorithms for defect detection.
KW - active defect discovery
KW - isolation forest
KW - measurement feedback
KW - online gradient descent
KW - sparsity
UR - http://www.scopus.com/inward/record.url?scp=85165706344&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85165706344&partnerID=8YFLogxK
U2 - 10.1080/24725854.2023.2224854
DO - 10.1080/24725854.2023.2224854
M3 - Article
AN - SCOPUS:85165706344
SN - 2472-5854
JO - IISE Transactions
JF - IISE Transactions
ER -