TY - GEN
T1 - Sparse Coding and Autoencoders
AU - Rangamani, Akshay
AU - Mukherjee, Anirbit
AU - Basu, Amitabh
AU - Arora, Ashish
AU - Ganapathi, Tejaswini
AU - Chin, Sang
AU - Tran, Trac D.
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/8/15
Y1 - 2018/8/15
N2 - In this work we study the landscape of the squared loss of an autoencoder when the data generative model is that of 'Sparse Coding'/'Dictionary Learning'. The neural net considered is an \mathbb{R}^{n}\rightarrow \mathbb{R}^{n} mapping with a single ReLU activation layer of size h > n. The net has access to vectors y\in \mathbb{R}^{n} obtained as y=A^{\ast}x^{\ast}, where x^{\ast}\in \mathbb{R}^{h} are sparse high-dimensional vectors and A^{\ast}\in \mathbb{R}^{n\times h} is an overcomplete incoherent matrix. Under very mild distributional assumptions on x^{\ast}, we prove that the norm of the expected gradient of the squared loss function is asymptotically (in the sparse code dimension) negligible for all points in a small neighborhood of A^{\ast}. This is supported by experimental evidence on synthetic data. We further present experiments suggesting that A^{\ast} sits at the bottom of a well in the loss landscape, and that gradient descent on this loss function converges column-wise very close to the original dictionary even from initializations far from it. Along the way we prove that a layer of ReLU gates can be set up to automatically recover the support of the sparse codes. Since this property holds independently of the loss function, we believe it could be of independent interest. A full version of this paper is accessible at: https://arxiv.org/abs/1708.03735
AB - In this work we study the landscape of the squared loss of an autoencoder when the data generative model is that of 'Sparse Coding'/'Dictionary Learning'. The neural net considered is an \mathbb{R}^{n}\rightarrow \mathbb{R}^{n} mapping with a single ReLU activation layer of size h > n. The net has access to vectors y\in \mathbb{R}^{n} obtained as y=A^{\ast}x^{\ast}, where x^{\ast}\in \mathbb{R}^{h} are sparse high-dimensional vectors and A^{\ast}\in \mathbb{R}^{n\times h} is an overcomplete incoherent matrix. Under very mild distributional assumptions on x^{\ast}, we prove that the norm of the expected gradient of the squared loss function is asymptotically (in the sparse code dimension) negligible for all points in a small neighborhood of A^{\ast}. This is supported by experimental evidence on synthetic data. We further present experiments suggesting that A^{\ast} sits at the bottom of a well in the loss landscape, and that gradient descent on this loss function converges column-wise very close to the original dictionary even from initializations far from it. Along the way we prove that a layer of ReLU gates can be set up to automatically recover the support of the sparse codes. Since this property holds independently of the loss function, we believe it could be of independent interest. A full version of this paper is accessible at: https://arxiv.org/abs/1708.03735
UR - http://www.scopus.com/inward/record.url?scp=85052437326&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85052437326&partnerID=8YFLogxK
U2 - 10.1109/ISIT.2018.8437533
DO - 10.1109/ISIT.2018.8437533
M3 - Conference contribution
AN - SCOPUS:85052437326
SN - 9781538647806
T3 - IEEE International Symposium on Information Theory - Proceedings
SP - 36
EP - 40
BT - 2018 IEEE International Symposium on Information Theory, ISIT 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 IEEE International Symposium on Information Theory, ISIT 2018
Y2 - 17 June 2018 through 22 June 2018
ER -
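
Note (not part of the record): the abstract's setup can be illustrated with a minimal sketch, assuming a random unit-norm-column dictionary as a stand-in for the incoherent A^{\ast}, a tied-weight ReLU autoencoder, and a hypothetical bias choice; none of this is the authors' code.

    # Minimal sketch of the sparse-coding generative model y = A* x* and a
    # single-hidden-layer ReLU autoencoder, as described in the abstract.
    # Assumptions (not from the paper): random unit-norm dictionary columns,
    # tied encoder/decoder weights, and an arbitrary bias value.
    import numpy as np

    rng = np.random.default_rng(0)
    n, h, k = 50, 256, 5  # ambient dim n, code dim h > n, sparsity level k

    # Stand-in for the overcomplete incoherent dictionary A*
    A_star = rng.standard_normal((n, h))
    A_star /= np.linalg.norm(A_star, axis=0, keepdims=True)

    def sample_y():
        """Draw a k-sparse code x* and return the observation y = A* x*."""
        x_star = np.zeros(h)
        support = rng.choice(h, size=k, replace=False)
        x_star[support] = rng.uniform(1.0, 2.0, size=k)
        return A_star @ x_star, x_star

    def autoencoder(y, W, b):
        """R^n -> R^n map with one ReLU layer of size h (tied weights W)."""
        code = np.maximum(W.T @ y + b, 0.0)
        return W @ code, code

    # Squared reconstruction loss evaluated at a point W near A*
    y, _ = sample_y()
    W = A_star + 0.01 * rng.standard_normal((n, h))  # small perturbation of A*
    y_hat, _ = autoencoder(y, W, b=-0.5 * np.ones(h))  # hypothetical bias
    loss = 0.5 * np.linalg.norm(y - y_hat) ** 2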