Anomaly Detection (AD) is an integral part of AI, with applications ranging from health to finance, manufacturing, and computer security. Although AD is popular and implementations of various AD algorithms are available in widely used toolkits, no attempt has been made to test the reliability of these implementations; more generally, AD verification and validation are lacking. To address this need, we introduce an approach and apply it in a study of 4 popular AD algorithms as implemented in 3 popular tools, as follows. First, we checked whether the implementations can perform their basic task: finding anomalies in datasets with known anomalies. Next, we checked two basic properties, determinism and consistency. Finally, we quantified differences in the algorithms' outcomes so users can get an idea of the variation to expect when running different algorithms on the same dataset. We ran our suite of analyses on 73 datasets that contain anomalies. We found that, for certain implementations, validation can fail on 10-73% of datasets. Our analysis also revealed that 5 implementations suffer from nondeterminism (19-98% of runs are nondeterministic) and that 10 out of 12 implementation pairs are inconsistent.
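To make the determinism property concrete, the sketch below shows one way such a check could look. It uses scikit-learn's IsolationForest on a synthetic dataset purely as an illustration; the paper does not state that these are the specific tools, algorithms, or datasets studied, and the function names here are our own.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic dataset with a few injected anomalies (illustrative only,
# not one of the 73 datasets used in the study).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0, 1, size=(100, 2)),   # bulk of normal points
    rng.normal(8, 0.5, size=(5, 2)),   # 5 points far from the bulk
])

def run_once(seed):
    """Fit an AD model and return per-point labels (-1 = anomaly, 1 = normal)."""
    # random_state pins the algorithm's internal randomness; leaving it
    # unset is one way nondeterminism can surface across repeated runs.
    model = IsolationForest(random_state=seed)
    return model.fit_predict(X)

# Determinism check: identical input and identical seed should yield
# identical labels on every run.
a = run_once(seed=42)
b = run_once(seed=42)
deterministic = np.array_equal(a, b)
print("deterministic with fixed seed:", deterministic)
```

A consistency check between two implementations could follow the same pattern: run both on the same dataset and compare which points each labels as anomalous.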