The Limits and Potentials of Local SGD for Distributed Heterogeneous Learning with Intermittent Communication

Abstract
Local SGD is a popular optimization method in distributed learning, often outperforming minibatch SGD in practice. Despite this practical success, proving the efficiency of local SGD has been difficult, creating a significant gap between theory and practice. We provide new lower bounds for local SGD under existing first-order data heterogeneity assumptions, showing that these assumptions cannot capture local SGD's effectiveness. We also demonstrate the min-max optimality of accelerated minibatch SGD under these assumptions. Our findings emphasize the need for improved modeling of data heterogeneity. Under higher-order assumptions, we provide new upper bounds that verify the dominance of local SGD over minibatch SGD when data heterogeneity is low.
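For readers unfamiliar with the two methods the abstract contrasts, the sketch below illustrates the intermittent-communication pattern: each of M machines takes K local SGD steps on its own data between communication rounds, while minibatch SGD spends the same per-round gradient budget on a single step at the shared iterate. This is a minimal illustration only, not the paper's algorithm or experiments; the toy least-squares data, step size, and budget are hypothetical assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, R, d = 4, 10, 50, 5   # machines, local steps per round, rounds, dimension
lr = 0.05                   # step size (illustrative)

# Hypothetical heterogeneous data: machine m holds its own (A_m, b_m),
# generated around a machine-specific solution (this is the "data
# heterogeneity" the abstract refers to).
A = [rng.normal(size=(20, d)) for _ in range(M)]
b = [A[m] @ rng.normal(size=d) + 0.1 * rng.normal(size=20) for m in range(M)]

def grad(m, x):
    """Stochastic gradient on machine m: one sampled row of its local data."""
    i = rng.integers(20)
    return (A[m][i] @ x - b[m][i]) * A[m][i]

def local_sgd(x0):
    x = x0.copy()
    for _ in range(R):                        # R communication rounds
        iterates = []
        for m in range(M):                    # each machine works independently,
            xm = x.copy()
            for _ in range(K):                # taking K local SGD steps,
                xm -= lr * grad(m, xm)
            iterates.append(xm)
        x = np.mean(iterates, axis=0)         # then local iterates are averaged
    return x

def minibatch_sgd(x0):
    x = x0.copy()
    for _ in range(R):                        # one step per communication round,
        g = np.mean([grad(m, x) for m in range(M) for _ in range(K)], axis=0)
        x -= lr * g                           # using a size M*K minibatch
    return x

def f(x):
    """Global objective: average of the machines' least-squares losses."""
    return np.mean([0.5 * np.mean((A[m] @ x - b[m]) ** 2) for m in range(M)])

x0 = np.zeros(d)
print("local SGD objective:    ", f(local_sgd(x0)))
print("minibatch SGD objective:", f(minibatch_sgd(x0)))
```

Both methods use the same number of communication rounds and the same total number of stochastic gradients; they differ only in where those gradients are evaluated, which is exactly the trade-off the paper's lower and upper bounds quantify under different heterogeneity assumptions.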
| Original language | English (US) |
|---|---|
| Pages (from-to) | 4115-4157 |
| Number of pages | 43 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 247 |
| State | Published - 2024 |
| Externally published | Yes |
| Event | 37th Annual Conference on Learning Theory, COLT 2024, Edmonton, Canada |
| Duration | Jun 30, 2024 → Jul 3, 2024 |
All Science Journal Classification (ASJC) codes
- Artificial Intelligence
- Software
- Control and Systems Engineering
- Statistics and Probability
Keywords
- Distributed optimization
- intermittent communication
- Local SGD
- min-max optimal