Optimal tuning for divide-and-conquer kernel ridge regression with massive data

Ganggang Xu, Zuofeng Shang, Guang Cheng

Research output: Chapter in Book/Report/Conference proceedingConference contribution

3 Scopus citations

Abstract

Divide-and-conquer is a powerful approach for large and massive data analysis. In the nonparameteric regression setting, although various theoretical frameworks have been established to achieve optimality in estimation or hypothesis testing, how to choose the tuning parameter in a practically effective way is still an open problem. In this paper, we propose a data-driven procedure based on divide-and-conquer for selecting the tuning parameters in kernel ridge regression by modifying the popular Generalized Cross-validation (GCV, Wahba, 1990). While the proposed criterion is computationally scalable for massive data sets, it is also shown under mild conditions to be asymptotically optimal in the sense that minimizing the proposed distributed-GCV (dGCV) criterion is equivalent to minimizing the true global conditional empirical loss of the averaged function estimator, extending the existing optimality results of GCV to the divide-and-conquer framework.

Original languageEnglish (US)
Title of host publication35th International Conference on Machine Learning, ICML 2018
EditorsJennifer Dy, Andreas Krause
PublisherInternational Machine Learning Society (IMLS)
Pages8719-8737
Number of pages19
ISBN (Electronic)9781510867963
StatePublished - 2018
Externally publishedYes
Event35th International Conference on Machine Learning, ICML 2018 - Stockholm, Sweden
Duration: Jul 10 2018Jul 15 2018

Publication series

Name35th International Conference on Machine Learning, ICML 2018
Volume12

Other

Other35th International Conference on Machine Learning, ICML 2018
Country/TerritorySweden
CityStockholm
Period7/10/187/15/18

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Human-Computer Interaction
  • Software

Fingerprint

Dive into the research topics of 'Optimal tuning for divide-and-conquer kernel ridge regression with massive data'. Together they form a unique fingerprint.

Cite this