Selecting the right correlation measure for binary data

Lian Duan, W. Nick Street, Yanchi Liu, Songhua Xu, Brook Wu

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

Finding the most interesting correlations among items is essential for problems in many commercial, medical, and scientific domains. Although there are numerous measures available for evaluating correlations, different correlation measures provide drastically different results. Piatetsky-Shapiro provided three mandatory properties for any reasonable correlation measure, and Tan et al. proposed several properties to categorize correlation measures; however, it is still hard for users to choose the desirable correlation measures according to their needs. In order to solve this problem, we explore the effectiveness problem in three ways. First, we propose two desirable properties and two optional properties for correlation measure selection and study the property satisfaction for different correlation measures. Second, we study different techniques to adjust correlation measures and propose two new correlation measures: the Simplified χ² with Continuity Correction and the Simplified χ² with Support. Third, we analyze the upper and lower bounds of different measures and categorize them by the bound differences. Combining these three directions, we provide guidelines for users to choose the proper measure according to their needs.

Original language: English (US)
Article number: 13
Journal: ACM Transactions on Knowledge Discovery from Data
Volume: 9
Issue number: 2
DOIs
State: Published - Sep 23 2014

All Science Journal Classification (ASJC) codes

  • General Computer Science

Keywords

  • Association rules
  • Correlation
  • Knowledge discovery
