TY - GEN
T1 - Anomalous Anomaly Detection
AU - Ahmed, Muyeed
AU - Neamtiu, Iulian
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Anomaly Detection (AD) is an integral part of AI, with applications ranging widely from health to finance, manufacturing, and computer security. Though AD is popular and various AD algorithm implementations are found in popular toolkits, no attempt has been made to test the reliability of these implementations. More generally, AD verification and validation are lacking. To address this need, we introduce an approach and a study of 4 popular AD algorithms as implemented in 3 popular tools, as follows. First, we checked whether implementations can perform their basic task of finding anomalies in datasets with known anomalies. Next, we checked two basic properties, determinism and consistency. Finally, we quantified differences in algorithms' outcomes so users can get an idea of the variations to be expected when using different algorithms on the same dataset. We ran our suite of analyses on 73 datasets that contain anomalies. We found that, for certain implementations, validation can fail on 10-73% of datasets. Our analysis revealed that five implementations suffer from nondeterminism (19-98% of runs are nondeterministic), and 10 out of 12 implementation pairs are inconsistent.
AB - Anomaly Detection (AD) is an integral part of AI, with applications ranging widely from health to finance, manufacturing, and computer security. Though AD is popular and various AD algorithm implementations are found in popular toolkits, no attempt has been made to test the reliability of these implementations. More generally, AD verification and validation are lacking. To address this need, we introduce an approach and a study of 4 popular AD algorithms as implemented in 3 popular tools, as follows. First, we checked whether implementations can perform their basic task of finding anomalies in datasets with known anomalies. Next, we checked two basic properties, determinism and consistency. Finally, we quantified differences in algorithms' outcomes so users can get an idea of the variations to be expected when using different algorithms on the same dataset. We ran our suite of analyses on 73 datasets that contain anomalies. We found that, for certain implementations, validation can fail on 10-73% of datasets. Our analysis revealed that five implementations suffer from nondeterminism (19-98% of runs are nondeterministic), and 10 out of 12 implementation pairs are inconsistent.
KW - AI reliability
KW - AI testing
KW - Anomaly Detection
KW - Machine Learning
KW - Nondeterminism
KW - Outlier Detection
KW - Verification
UR - http://www.scopus.com/inward/record.url?scp=85141086359&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85141086359&partnerID=8YFLogxK
U2 - 10.1109/AITest55621.2022.00009
DO - 10.1109/AITest55621.2022.00009
M3 - Conference contribution
AN - SCOPUS:85141086359
T3 - Proceedings - 4th IEEE International Conference on Artificial Intelligence Testing, AITest 2022
SP - 1
EP - 6
BT - Proceedings - 4th IEEE International Conference on Artificial Intelligence Testing, AITest 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 4th IEEE International Conference on Artificial Intelligence Testing, AITest 2022
Y2 - 15 August 2022 through 18 August 2022
ER -