Computational limits of a distributed algorithm for smoothing spline

Zuofeng Shang, Guang Cheng

Research output: Contribution to journalArticlepeer-review

38 Scopus citations

Abstract

In this paper, we explore statistical versus computational trade-off to address a basic question in the application of a distributed algorithm: what is the minimal computational cost in obtaining statistical optimality? In smoothing spline setup, we observe a phase transition phenomenon for the number of deployed machines that ends up being a simple proxy for computing cost. Specifically, a sharp upper bound for the number of machines is established: when the number is below this bound, statistical optimality (in terms of nonparametric estimation or testing) is achievable; otherwise, statistical optimality becomes impossible. These sharp bounds partly capture intrinsic computational limits of the distributed algorithm considered in this paper, and turn out to be fully determined by the smoothness of the regression function. We name the asymptotic analysis on such split-and-aggregation estimation/inference as “splitotic” theory. As a side remark, we argue that sample splitting may be viewed as an alternative form of regularization, playing a similar role as smoothing parameter.

Original languageEnglish (US)
Pages (from-to)1-37
Number of pages37
JournalJournal of Machine Learning Research
Volume18
StatePublished - Oct 1 2017
Externally publishedYes

All Science Journal Classification (ASJC) codes

  • Software
  • Control and Systems Engineering
  • Statistics and Probability
  • Artificial Intelligence

Keywords

  • Computational limits
  • Divide-and-conquer
  • Smoothing spline
  • Splitotic theory

Fingerprint

Dive into the research topics of 'Computational limits of a distributed algorithm for smoothing spline'. Together they form a unique fingerprint.

Cite this