Learning and generalization of one-hidden-layer neural networks, going beyond standard Gaussian data

Hongkang Li, Shuai Zhang, Meng Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

This paper analyzes the convergence and generalization of training a one-hidden-layer neural network when the input features follow a Gaussian mixture model consisting of a finite number of Gaussian distributions. Assuming the labels are generated by a teacher model with unknown ground-truth weights, the learning problem is to estimate the underlying teacher model by minimizing a non-convex risk function over a student neural network. Given a finite number of training samples, referred to as the sample complexity, the iterations are proved to converge linearly to a critical point with a guaranteed generalization error. In addition, for the first time, this paper characterizes the impact of the input distribution on the sample complexity and the learning rate.
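The teacher-student setup the abstract describes can be sketched numerically: draw inputs from a Gaussian mixture, generate labels with a fixed "teacher" one-hidden-layer ReLU network, and train a "student" of the same architecture by gradient descent on the empirical squared risk. The minimal NumPy sketch below is illustrative only; the mixture means, network sizes, initialization, and step size are hypothetical choices and do not reproduce the paper's actual algorithm or proof conditions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, n = 5, 3, 2000                      # input dim, hidden width, samples (assumed)

# Gaussian mixture input: two unit-covariance components centered at -2 and +2
comp = rng.integers(0, 2, size=n)
X = rng.standard_normal((n, d)) + np.where(comp[:, None] == 1, 2.0, -2.0)

def forward(W, X):
    """One-hidden-layer ReLU network with a fixed averaging output layer."""
    return np.maximum(X @ W.T, 0.0).mean(axis=1)

W_star = rng.standard_normal((K, d))      # unknown ground-truth teacher weights
y = forward(W_star, X)                    # labels generated by the teacher model

W = 0.5 * rng.standard_normal((K, d))     # random student initialization (assumed)
lr = 0.1                                  # illustrative step size
risk_init = 0.5 * np.mean((forward(W, X) - y) ** 2)
for _ in range(500):
    H = X @ W.T                           # (n, K) pre-activations
    err = np.maximum(H, 0.0).mean(axis=1) - y
    # gradient of the empirical squared risk w.r.t. the hidden-layer rows of W
    grad = ((err[:, None] * (H > 0)).T @ X) / (n * K)
    W -= lr * grad
risk_final = 0.5 * np.mean((forward(W, X) - y) ** 2)
```

With a small enough step size, the empirical risk decreases along the gradient descent iterations; the paper's analysis goes further, tying the convergence rate and the number of samples needed for generalization to the parameters of the input mixture.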

Original language: English (US)
Title of host publication: 2022 56th Annual Conference on Information Sciences and Systems, CISS 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 37-42
Number of pages: 6
ISBN (Electronic): 9781665417969
DOIs
State: Published - 2022
Externally published: Yes
Event: 56th Annual Conference on Information Sciences and Systems, CISS 2022 - Princeton, United States
Duration: Mar 9, 2022 - Mar 11, 2022

Publication series

Name: 2022 56th Annual Conference on Information Sciences and Systems, CISS 2022

Conference

Conference: 56th Annual Conference on Information Sciences and Systems, CISS 2022
Country/Territory: United States
City: Princeton
Period: 3/9/22 - 3/11/22

All Science Journal Classification (ASJC) codes

  • Information Systems and Management
  • Artificial Intelligence
  • Computer Science Applications
  • Information Systems

Keywords

  • Gaussian mixture model
  • convergence
  • generalization
  • neural networks
  • sample complexity
