Abstract
Modular addition tasks serve as a useful test bed for observing empirical phenomena in deep learning, including grokking. Prior work has shown that one-layer transformer architectures learn Fourier Multiplication circuits to solve modular addition tasks. In this paper, we show that Recurrent Neural Networks (RNNs) trained on modular addition tasks also use a Fourier Multiplication strategy. We identify low-rank structures in the model weights and attribute model components to specific Fourier frequencies, resulting in a sparse representation in Fourier space. We also show empirically that the RNN is robust to the removal of individual frequencies, while performance degrades drastically as more frequencies are ablated from the model.
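At its core, the Fourier Multiplication strategy rests on a trigonometric identity: the score for a candidate residue c can be written as a sum of cos(2πk(a+b−c)/p) terms, each of which factors into products of sinusoids of a and b taken separately. The sketch below is plain numpy and purely illustrative; the modulus p = 97 and the particular frequency set are assumptions for the example, not the frequencies a trained model actually selects.

```python
import numpy as np

p = 97                        # modulus; an illustrative choice, not tied to the paper's setup
freqs = [3, 14, 27, 52, 88]   # hypothetical "key frequencies"; a trained network picks its own

def fourier_mult_logits(a, b):
    """Score every candidate residue c with sum_k cos(2*pi*k*(a+b-c)/p).

    Each term factors via cos(w(a+b-c)) = cos(w(a+b))cos(wc) + sin(w(a+b))sin(wc),
    with cos(w(a+b)) and sin(w(a+b)) expanded into products of per-input sinusoids,
    so the computation only ever multiplies Fourier features of a and b."""
    c = np.arange(p)
    logits = np.zeros(p)
    for k in freqs:
        w = 2 * np.pi * k / p
        cos_ab = np.cos(w * a) * np.cos(w * b) - np.sin(w * a) * np.sin(w * b)
        sin_ab = np.sin(w * a) * np.cos(w * b) + np.cos(w * a) * np.sin(w * b)
        logits += cos_ab * np.cos(w * c) + sin_ab * np.sin(w * c)
    return logits

# The argmax recovers modular addition exactly for every input pair.
assert all(np.argmax(fourier_mult_logits(a, b)) == (a + b) % p
           for a in range(p) for b in range(p))
```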
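The frequency ablations can likewise be pictured as projecting a weight matrix onto a real Fourier basis over Z_p and zeroing the components at chosen frequencies. The following is a minimal sketch under the assumption of a residue-indexed weight matrix W of shape (p, d); the function names, the shape convention, and the choice of which matrix to ablate are hypothetical, not the paper's actual code.

```python
import numpy as np

def fourier_basis(p):
    """Orthonormal real Fourier basis over Z_p (p assumed odd, e.g. a prime modulus):
    row 0 is the constant, row 2k-1 is cos at frequency k, row 2k is sin at frequency k."""
    rows = [np.ones(p) / np.sqrt(p)]
    for k in range(1, (p - 1) // 2 + 1):
        w = 2 * np.pi * k * np.arange(p) / p
        rows.append(np.cos(w) * np.sqrt(2 / p))
        rows.append(np.sin(w) * np.sqrt(2 / p))
    return np.stack(rows)                 # shape (p, p)

def ablate_frequencies(W, freqs, p):
    """Zero the cos/sin components at the given frequencies of a residue-indexed
    weight matrix W of shape (p, d), then reconstruct it from the remaining ones."""
    F = fourier_basis(p)
    coeffs = F @ W                        # Fourier coefficients of each column of W
    for k in freqs:
        coeffs[2 * k - 1] = 0.0           # cos component at frequency k
        coeffs[2 * k] = 0.0               # sin component at frequency k
    return F.T @ coeffs                   # W with those frequencies removed
```

Re-evaluating a trained model after applying such an ablation to, say, its output weights for a growing set of frequencies would be one way to probe the robustness trend described in the abstract.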
| Original language | English (US) |
|---|---|
| Journal | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
| DOIs | |
| State | Published - 2025 |
| Event | 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Hyderabad, India; Duration: Apr 6 2025 → Apr 11 2025 |
All Science Journal Classification (ASJC) codes
- Software
- Signal Processing
- Electrical and Electronic Engineering
Keywords
- deep learning
- Fourier features
- modular addition
- recurrent networks