Abstract
This paper focuses on distributed learning in the nonparametric regression framework. With sufficient computational resources, the efficiency of distributed algorithms improves as the number of machines increases. We aim to analyze how the number of machines affects statistical optimality. We establish an upper bound on the number of machines under which statistical minimaxity is achieved, in two settings: nonparametric estimation and hypothesis testing. Our framework is more general than existing work: we build a unified framework for distributed inference across various regression problems, including thin-plate splines and additive regression under random design with univariate, multivariate, and diverging-dimensional covariates. The main tool for achieving this goal is a tight bound on an empirical process, obtained by introducing the Green function for equivalent kernels. Thorough numerical studies support the theoretical findings.
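To make the divide-and-conquer setup concrete, the sketch below illustrates the generic scheme the abstract refers to: the sample is split across machines, each machine fits a local kernel ridge regression estimator, and the final estimator averages the local fits. This is a minimal illustration, not the paper's implementation; all identifiers (`fit_local_krr`, `distributed_krr`, the Gaussian kernel, and the tuning values `lam` and `bandwidth`) are hypothetical choices for the example.

```python
# Minimal sketch of divide-and-conquer kernel ridge regression (KRR):
# split data across machines, fit KRR locally, average the predictors.
# Kernel choice and tuning parameters are illustrative assumptions.
import numpy as np

def gaussian_kernel(X, Z, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between rows of X and Z."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def fit_local_krr(X, y, lam=1e-2, bandwidth=1.0):
    """One machine: solve (K + n*lam*I) alpha = y on its subsample."""
    n = len(y)
    K = gaussian_kernel(X, X, bandwidth)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return lambda Xnew: gaussian_kernel(Xnew, X, bandwidth) @ alpha

def distributed_krr(X, y, n_machines, lam=1e-2, bandwidth=1.0):
    """Split the data, fit KRR on each part, average the local fits."""
    parts = np.array_split(np.random.permutation(len(y)), n_machines)
    fits = [fit_local_krr(X[idx], y[idx], lam, bandwidth) for idx in parts]
    return lambda Xnew: np.mean([f(Xnew) for f in fits], axis=0)

# Toy usage: f(x) = sin(2*pi*x) with noise, distributed over 4 machines.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (400, 1))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.normal(size=400)
f_hat = distributed_krr(X, y, n_machines=4, lam=1e-3, bandwidth=0.1)
```

The paper's question can be read off this sketch: averaging reduces per-machine computation, but if `n_machines` grows too fast relative to the sample size, each local estimator becomes too noisy for the average to remain minimax optimal.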
| Original language | English (US) |
| --- | --- |
| Pages (from-to) | 3070-3102 |
| Number of pages | 33 |
| Journal | Electronic Journal of Statistics |
| Volume | 14 |
| Issue number | 2 |
| DOIs | |
| State | Published - 2020 |
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Statistics, Probability and Uncertainty
Keywords
- Computational limit
- Divide and conquer
- Kernel ridge regression
- Minimax optimality
- Nonparametric testing