Distributed Generalized Cross-Validation for Divide-and-Conquer Kernel Ridge Regression and Its Asymptotic Optimality

Ganggang Xu, Zuofeng Shang, Guang Cheng

Research output: Contribution to journal › Article › peer-review

Abstract

Tuning parameter selection is of critical importance for kernel ridge regression. To date, a data-driven tuning method for divide-and-conquer kernel ridge regression (d-KRR) has been lacking in the literature, which limits the applicability of d-KRR to large datasets. In this article, by modifying the generalized cross-validation (GCV) score, we propose a distributed generalized cross-validation (dGCV) criterion as a data-driven tool for selecting the tuning parameters in d-KRR. Not only is the proposed dGCV computationally scalable for massive datasets, it is also shown, under mild conditions, to be asymptotically optimal in the sense that minimizing the dGCV score is equivalent to minimizing the true global conditional empirical loss of the averaged function estimator, extending the existing optimality results of GCV to the divide-and-conquer framework. Supplemental materials for this article are available online.
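The abstract does not state the dGCV formula itself, but the workflow it describes can be sketched: fit kernel ridge regression separately on each data block, average the local estimators, and score candidate tuning parameters with a GCV-style criterion built from the global residuals of the averaged estimator and per-block smoothing-matrix traces. The following is a minimal, illustrative Python sketch under those assumptions; the RBF kernel, the per-block trace correction, and the exact form of the score are placeholders for illustration, not the authors' exact dGCV.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def krr_fit_block(X, y, lam, kernel):
    """Fit kernel ridge regression on one data block.

    Returns the dual coefficients alpha solving (K + n*lam*I) alpha = y and
    the trace of the local smoothing matrix A(lam) = K (K + n*lam*I)^{-1}.
    """
    n = X.shape[0]
    K = kernel(X, X)
    M = K + n * lam * np.eye(n)
    alpha = np.linalg.solve(M, y)
    trace_A = np.trace(np.linalg.solve(M, K))
    return alpha, trace_A

def dgcv_score(blocks, lam, kernel):
    """GCV-style score for the averaged d-KRR estimator (illustrative only).

    blocks: list of (X_j, y_j) pairs, one per machine/block.
    Numerator: global residual sum of squares of the averaged estimator.
    Denominator: a GCV-type correction using the average per-block trace
    as an effective degrees-of-freedom proxy (an assumed form).
    """
    m = len(blocks)
    N = sum(X.shape[0] for X, _ in blocks)
    fits = [krr_fit_block(X, y, lam, kernel) for X, y in blocks]

    rss = 0.0
    for X_k, y_k in blocks:
        # averaged estimator: mean of the m local predictions at X_k
        local_preds = [kernel(X_k, X_j) @ alpha_j
                       for (X_j, _), (alpha_j, _) in zip(blocks, fits)]
        f_bar = np.mean(local_preds, axis=0)
        rss += np.sum((y_k - f_bar) ** 2)

    avg_trace = sum(tr for _, tr in fits) / (m * N)
    return (rss / N) / (1.0 - avg_trace) ** 2
```

In practice one would split the data into m blocks and keep the tuning parameter minimizing the score over a grid, for example `blocks = [(X[idx], y[idx]) for idx in np.array_split(np.arange(len(y)), m)]` followed by `lam_hat = min(lam_grid, key=lambda lam: dgcv_score(blocks, lam, rbf_kernel))`; each block fit costs only O((N/m)^3), which is what makes the procedure scalable.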

Original language: English (US)
Pages (from-to): 891-908
Number of pages: 18
Journal: Journal of Computational and Graphical Statistics
Volume: 28
Issue number: 4
DOIs
State: Published - Oct 2, 2019

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Discrete Mathematics and Combinatorics
  • Statistics, Probability and Uncertainty

Keywords

  • Distributed GCV
  • Divide-and-conquer
  • Kernel ridge regression
  • Optimal tuning
