Statistically rigorous testing of clustering implementations

Xin Yin, Vincenzo Musco, Iulian Neamtiu, Usman Roshan

Research output: Conference contribution (Chapter in Book/Report/Conference proceeding)

Abstract

Clustering is a widely used and well-studied branch of AI, but defining clustering correctness, as well as verifying and validating clustering implementations, remains a challenge. To address this, we propose a statistically rigorous approach that couples differential clustering with statistical hypothesis testing: we apply hypothesis tests to the outcome distribution of differential clustering to reveal problematic outcomes. We applied this approach to widely used clustering algorithms implemented in popular ML toolkits, which were tasked with clustering datasets from the Penn Machine Learning Benchmark. The results indicate statistically significant differences in clustering outcomes in a variety of scenarios where users might not expect clustering outcomes to vary.
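The abstract does not say which agreement measure or which hypothesis test the authors use. As a rough, non-authoritative illustration of "hypothesis testing on the outcome distribution of differential clustering", the sketch below assumes that each run of a clustering implementation is scored with the adjusted Rand index (ARI) against ground-truth labels, and that two implementations' score distributions are compared with a two-sided permutation test on mean ARI. These choices, and the names `adjusted_rand_index`, `permutation_test` and `compare_implementations`, are hypothetical and not taken from the paper; the labelings in the usage section are synthetic.

```python
from collections import Counter
from math import comb
from statistics import mean
import random


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index (ARI) between two flat clusterings of the same points.

    1.0 means identical partitions (up to label renaming); values near 0 mean
    chance-level agreement; negative values mean worse than chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same points")
    total_pairs = comb(len(labels_a), 2)
    # Pairs of points grouped together in both labelings (contingency cells).
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs grouped together in each labeling separately (row and column sums).
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / total_pairs if total_pairs else 0.0
    max_index = (together_a + together_b) / 2
    if max_index == expected:
        # Degenerate partitions, e.g. every point in one cluster in both labelings.
        return 1.0
    return (together_both - expected) / (max_index - expected)


def permutation_test(sample_x, sample_y, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in means; returns a p-value."""
    rng = random.Random(seed)
    observed = abs(mean(sample_x) - mean(sample_y))
    pooled = list(sample_x) + list(sample_y)
    k = len(sample_x)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign scores to the two groups
        # Small tolerance so float rounding does not hide ties with the observed value.
        if abs(mean(pooled[:k]) - mean(pooled[k:])) >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return (extreme + 1) / (n_permutations + 1)


def compare_implementations(runs_a, runs_b, ground_truth, alpha=0.05):
    """Hypothetical differential check: do two implementations' ARI scores differ?

    runs_a, runs_b: cluster labelings from repeated runs (e.g. different seeds)
    of the "same" algorithm in two toolkits on one dataset (assumed setup).
    """
    scores_a = [adjusted_rand_index(run, ground_truth) for run in runs_a]
    scores_b = [adjusted_rand_index(run, ground_truth) for run in runs_b]
    p_value = permutation_test(scores_a, scores_b)
    return p_value, p_value < alpha


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]
    runs_a = [[1, 1, 1, 0, 0, 0]] * 8  # synthetic: recovers truth (labels renamed)
    runs_b = [[0, 0, 1, 1, 0, 1]] * 8  # synthetic: a consistently poorer partition
    p, significant = compare_implementations(runs_a, runs_b, truth)
    print(f"p = {p:.4f}, significant at 0.05: {significant}")
```

A permutation test is used here only because it makes no normality assumption about ARI scores, which are bounded and often heavily skewed; a rank test such as Mann-Whitney U, or SciPy's `scipy.stats.permutation_test`, could be substituted. In a real study one would also correct for running many such comparisons across datasets and toolkits.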

Original language: English (US)
Title of host publication: Proceedings - 2019 IEEE International Conference on Artificial Intelligence Testing, AITest 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 91-98
Number of pages: 8
ISBN (Electronic): 9781728104928
State: Published, May 17 2019
Event: 1st IEEE International Conference on Artificial Intelligence Testing, AITest 2019, Newark, United States
Duration: Apr 4 2019 to Apr 9 2019

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Safety, Risk, Reliability and Quality

Keywords

  • Clustering
  • Machine Learning
  • Statistics
  • Testing
