
Convergence of Markov Chains for Constant Step-Size Stochastic Gradient Descent with Separable Functions

Research output: Contribution to journal › Article › peer-review

Abstract

Stochastic gradient descent (SGD) is a popular algorithm for minimizing objective functions that arise in machine learning. For constant step-size SGD, the iterates form a Markov chain on a general state space. Focusing on a class of separable (nonconvex) objective functions, we establish a "Doeblin-type decomposition," in that the state space decomposes into a uniformly transient set and a disjoint union of absorbing sets. Each of the absorbing sets contains a unique invariant measure, and the set of all invariant measures is their convex hull. Moreover, the set of invariant measures is shown to be a global attractor for the Markov chain, with a geometric convergence rate. The theory is highlighted with examples that show (1) the failure of the diffusion approximation to characterize the long-time dynamics of SGD; (2) the global minimum of an objective function may lie outside the support of the invariant measures (i.e., even if initialized at the global minimum, the SGD iterates will leave it); and (3) bifurcations may enable the SGD iterates to transition between two local minima.
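The Markov-chain viewpoint can be illustrated with a minimal sketch (not taken from the paper): with a fixed step size, each SGD update applies one of finitely many maps chosen at random, so the iterates form a time-homogeneous Markov chain that fluctuates around, rather than converges to, the minimizer. The component functions `f_1`, `f_2` below are hypothetical choices for illustration.

```python
import random

# Hypothetical 1-D example: F(x) = (f_1(x) + f_2(x)) / 2 with
# f_1(x) = (x - 1)^2 and f_2(x) = (x + 1)^2, so F(x) = x^2 + 1
# with global minimizer x* = 0.
def grad(x, i):
    """Gradient of the randomly sampled component f_i."""
    return 2.0 * (x - 1.0) if i == 0 else 2.0 * (x + 1.0)

def sgd_chain(x0, eta, n_steps, seed=0):
    """Constant step-size SGD; the iterates (x_k) form a Markov chain.

    Each step applies one of two affine maps chosen uniformly at
    random (an iterated function system): x -> (1 - 2*eta)*x +/- 2*eta.
    """
    rng = random.Random(seed)
    x = x0
    traj = []
    for _ in range(n_steps):
        i = rng.randrange(2)        # sample a component uniformly
        x = x - eta * grad(x, i)    # one SGD step with fixed eta
        traj.append(x)
    return traj

traj = sgd_chain(x0=5.0, eta=0.1, n_steps=10000)

# With constant eta the iterates do not converge to x* = 0: after a
# transient, they sample an invariant distribution centered near 0
# with nonzero spread (here Var ~ eta-dependent, roughly 1/9).
tail = traj[5000:]
mean = sum(tail) / len(tail)
var = sum((t - mean) ** 2 for t in tail) / len(tail)
```

For this choice of maps, the stationary variance solves Var = (1 - 2η)² Var + (2η)², i.e. Var = 0.04/0.36 ≈ 0.11 at η = 0.1, so the persistent fluctuation scale is visible directly from the chain.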

Original language: English (US)
Pages (from-to): 2005-2043
Number of pages: 39
Journal: SIAM Journal on Applied Dynamical Systems
Volume: 24
Issue number: 3
State: Published - 2025

All Science Journal Classification (ASJC) codes

  • Analysis
  • Modeling and Simulation

Keywords

  • Doeblin-type decomposition
  • Markov chains
  • bifurcations
  • constant step-size
  • diffusion approximation
  • iterated function systems
  • spectral gap
  • stochastic gradient descent
