The Limits and Potentials of Local SGD for Distributed Heterogeneous Learning with Intermittent Communication

  • Kumar Kshitij Patel
  • Margalit Glasgow
  • Ali Zindari
  • Lingxiao Wang
  • Sebastian U. Stich
  • Ziheng Cheng
  • Nirmit Joshi
  • Nathan Srebro

Research output: Contribution to journal › Conference article › peer-review

Abstract

Local SGD is a popular optimization method in distributed learning, often outperforming minibatch SGD. Despite this practical success, proving the efficiency of local SGD has been difficult, creating a significant gap between theory and practice. We provide new lower bounds for local SGD under existing first-order data heterogeneity assumptions, showing that these assumptions cannot capture local SGD's effectiveness. We also demonstrate the min-max optimality of accelerated minibatch SGD under these assumptions. Our findings emphasize the need for improved modeling of data heterogeneity. Under higher-order assumptions, we provide new upper bounds that establish the dominance of local SGD over minibatch SGD when data heterogeneity is low.
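To make the object of study concrete, the following is a minimal sketch of the Local SGD pattern the abstract analyzes: each worker runs several local SGD steps on its own (heterogeneous) data between communication rounds, and the iterates are averaged at every round. The toy quadratic objective, all hyperparameters, and all variable names below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Local SGD sketch on a toy heterogeneous least-squares problem.
# Worker m holds its own data (A[m], b[m]) generated from its own optimum,
# so the workers' objectives genuinely disagree (data heterogeneity).
rng = np.random.default_rng(0)
M, d, n = 4, 10, 100                       # workers, dimension, samples/worker
A = [rng.normal(size=(n, d)) for _ in range(M)]
x_star = [rng.normal(size=d) for _ in range(M)]   # heterogeneous local optima
b = [A[m] @ x_star[m] for m in range(M)]

def local_sgd(R=50, K=10, lr=0.01):
    """R communication rounds; K local SGD steps per worker per round."""
    x = np.zeros(d)                        # shared initial model
    for _ in range(R):
        local = []
        for m in range(M):                 # conceptually in parallel
            xm = x.copy()
            for _ in range(K):             # K local steps, no communication
                i = rng.integers(n)        # sample one local data point
                grad = (A[m][i] @ xm - b[m][i]) * A[m][i]
                xm -= lr * grad
            local.append(xm)
        x = np.mean(local, axis=0)         # intermittent communication: average
    return x

x_hat = local_sgd()
loss = np.mean([np.mean((A[m] @ x_hat - b[m]) ** 2) for m in range(M)])
print(f"average loss across workers: {loss:.4f}")
```

Minibatch SGD, the baseline discussed in the abstract, instead spends each communication round on a single synchronized step using all of the round's stochastic gradients as one large minibatch; the regime of interest is K > 1, where communication is intermittent and the local iterates can drift apart.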

Original language: English (US)
Pages (from-to): 4115-4157
Number of pages: 43
Journal: Proceedings of Machine Learning Research
Volume: 247
State: Published - 2024
Externally published: Yes
Event: 37th Annual Conference on Learning Theory, COLT 2024 - Edmonton, Canada
Duration: Jun 30 2024 – Jul 3 2024

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability

Keywords

  • Distributed optimization
  • Intermittent communication
  • Local SGD
  • Min-max optimal
