This paper describes the use of a concept hierarchy for improving the results of association rule mining. Given a large set of tuples with demographic and personal interest information, association rules can be derived that associate age and gender with interests. However, it is difficult to obtain rules with high support when the mined data set is sparse. Conversely, when rules with high support can be generated, they tend to involve interests that are too abstract to be of practical use. To overcome the first problem, we have developed a method of raising data instances to higher levels in the ontology. In this paper we give a formal definition of the raising operation. We also show that in some cases data mining with raised data leads to rules that better represent reality. To avoid the second problem, namely rules that are too abstract, we formulate a notion of an optimal target level for the raising operation. We then derive two estimates for this optimal raising level. Knowing the level to which data should be raised avoids the computational effort of raising to several levels and reduces the effort users spend selecting, from a large candidate set, the mined rules that best fit their needs.
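The idea of raising can be illustrated with a minimal sketch. It is not the formal definition given in the paper: the hierarchy `PARENT`, the toy tuples in `DATA`, and the helpers `depth`, `raise_to_level`, and `support` are all hypothetical names chosen for illustration, and support is taken here simply as the fraction of tuples containing a given (demographic group, interest) pair.

```python
from collections import Counter

# Hypothetical interest hierarchy (assumption): child -> parent, root maps to None.
PARENT = {
    "interests": None,
    "sports": "interests",
    "music": "interests",
    "tennis": "sports",
    "soccer": "sports",
    "jazz": "music",
}


def depth(concept):
    """Number of edges from the concept up to the root."""
    d = 0
    while PARENT[concept] is not None:
        concept = PARENT[concept]
        d += 1
    return d


def raise_to_level(concept, level):
    """Replace a concept by its ancestor at the given depth.

    Concepts already at or above the target level are left unchanged.
    """
    while depth(concept) > level:
        concept = PARENT[concept]
    return concept


def support(tuples, pair):
    """Fraction of tuples equal to the (demographic group, interest) pair."""
    return Counter(tuples)[pair] / len(tuples)


# Toy data (assumption): (demographic group, interest) tuples.
DATA = [
    ("20-29/F", "tennis"),
    ("20-29/F", "soccer"),
    ("20-29/F", "jazz"),
    ("30-39/M", "soccer"),
]

# Raise every interest to level 1 of the hierarchy.
RAISED = [(group, raise_to_level(interest, 1)) for group, interest in DATA]

print(support(DATA, ("20-29/F", "tennis")))    # 0.25
print(support(RAISED, ("20-29/F", "sports")))  # 0.5
```

In this toy example, raising merges the sparse leaf interests "tennis" and "soccer" into "sports", so the pair for the 20-29/F group gains support. Raising to level 0 would collapse every interest into the root, which illustrates why the choice of target level matters.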
All Science Journal Classification (ASJC) codes
- Theoretical Computer Science
- Computer Vision and Pattern Recognition
- Artificial Intelligence
Keywords
- association rule mining
- high support rules
- interest hierarchy