For interpolating kernel machines, minimizing the norm of the ERM solution maximizes stability

Akshay Rangamani, Lorenzo Rosasco, Tomaso Poggio

Research output: Contribution to journalArticlepeer-review

3 Scopus citations

Abstract

In this paper, we study kernel ridge-less regression, including the case of interpolating solutions. We prove that maximizing the leave-one-out (CVloo) stability minimizes the expected error. Further, we also prove that the minimum norm solution - to which gradient algorithms are known to converge - is the most stable solution. More precisely, we show that the minimum norm interpolating solution minimizes a bound on CVloo stability, which in turn is controlled by the smallest singular value, hence the condition number, of the empirical kernel matrix. These quantities can be characterized in the asymptotic regime where both the dimension (d) and cardinality (n) of the data go to infinity (with n d → γ as d,n →∞). Our results suggest that the property of CVloo stability of the learning algorithm with respect to perturbations of the training set may provide a more general framework than the classical theory of Empirical Risk Minimization (ERM). While ERM was developed to deal with the classical regime in which the architecture of the learning network is fixed and n →∞, the modern regime focuses on interpolating regressors and overparameterized models, when both d and n go to infinity. Since the stability framework is known to be equivalent to the classical theory in the classical regime, our results here suggest that it may be interesting to extend it beyond kernel regression to other overparameterized algorithms such as deep networks.

Original languageEnglish (US)
Pages (from-to)193-215
Number of pages23
JournalAnalysis and Applications
Volume21
Issue number1
DOIs
StatePublished - Jan 1 2023
Externally publishedYes

All Science Journal Classification (ASJC) codes

  • Analysis
  • Applied Mathematics

Keywords

  • Algorithmic stability
  • high dimensional statistics
  • kernel regression
  • minimum norm interpolation
  • overparameterization

Fingerprint

Dive into the research topics of 'For interpolating kernel machines, minimizing the norm of the ERM solution maximizes stability'. Together they form a unique fingerprint.

Cite this