Analysis of presence-only data via semi-supervised learning approaches

Junhui Wang, Yixin Fang

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Presence-only data arise in classification problems and consist of a sample of observations from the presence class together with a large number of background observations whose presence/absence status is unknown. Since absence data are generally unavailable, conventional semi-supervised learning approaches are no longer appropriate, as they tend to degenerate and assign all observations to the presence class. In this article, we propose a generalized class balance constraint, which can be combined with semi-supervised learning approaches to prevent them from degenerating. Furthermore, to circumvent the difficulty of model tuning with presence-only data, we develop a selection criterion based on classification stability, which measures the robustness of any given classification algorithm against sampling randomness. The effectiveness of the proposed approach is demonstrated through a variety of simulated examples, along with an application to gene function prediction.

Original language: English (US)
Pages (from-to): 134-143
Number of pages: 10
Journal: Computational Statistics and Data Analysis
Volume: 59
Issue number: 1
State: Published - Mar 2013

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Computational Mathematics
  • Computational Theory and Mathematics
  • Applied Mathematics

Keywords

  • Cross validation
  • Functional genomics
  • Stability
  • Support vector machine
  • Tuning
