Abstract
This paper studies distributed Bayesian learning in a setting comprising a central server and multiple workers, focusing on mitigating the impact of stragglers. The standard one-shot, or embarrassingly parallel, Bayesian learning protocol known as consensus Monte Carlo (CMC) is generalized by proposing two straggler-resilient solutions based on grouping and coding. Two main challenges in designing straggler-resilient algorithms for CMC are the need to estimate the statistics of the workers' outputs across multiple shots, and the joint non-linear post-processing of the workers' outputs carried out at the server. This is in stark contrast to other distributed settings, such as gradient coding, which only require the per-shot sum of the workers' outputs. The proposed methods, referred to as Group-based CMC (G-CMC) and Coded CMC (C-CMC), leverage redundant computing at the workers in order to enable the estimation of global posterior samples at the server based on partial outputs from the workers. Simulation results show that C-CMC may outperform G-CMC for a small number of workers, while G-CMC is generally preferable for a larger number of workers.
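To make the baseline concrete, below is a minimal sketch of the standard one-shot CMC aggregation step that the paper generalizes (not the proposed G-CMC or C-CMC schemes): each worker draws samples from its local subposterior, and the server fuses matching samples via precision-weighted averaging. The function name `cmc_combine` and the Gaussian-motivated inverse-covariance weights are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def cmc_combine(worker_samples):
    """One-shot CMC aggregation: fuse per-worker subposterior samples
    into approximate global posterior samples.

    worker_samples: list of K arrays, each of shape (S, d), holding S
    samples of a d-dimensional parameter from one worker's subposterior.
    """
    # Gaussian-motivated choice (an assumption here): weight each worker
    # by the inverse sample covariance of its subposterior draws.
    weights = [np.linalg.inv(np.atleast_2d(np.cov(s, rowvar=False)))
               for s in worker_samples]
    norm = np.linalg.inv(sum(weights))  # (sum_k W_k)^{-1}
    # Consensus sample i is the precision-weighted average of the
    # i-th sample across all K workers.
    fused = sum(w @ s.T for w, s in zip(weights, worker_samples))
    return (norm @ fused).T  # shape (S, d)

# Toy usage: three workers, each with 500 samples of a 2-D parameter.
rng = np.random.default_rng(0)
workers = [rng.normal(loc=m, scale=1.0, size=(500, 2))
           for m in (0.2, -0.1, 0.3)]
consensus = cmc_combine(workers)
```

Note that this combination step requires an output from every worker, which is exactly why stragglers are problematic for CMC and why the paper's grouping- and coding-based schemes introduce redundant computation.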
| Original language | English (US) |
|---|---|
| Pages (from-to) | 609-614 |
| Number of pages | 6 |
| Journal | Proceedings - IEEE Global Communications Conference, GLOBECOM |
| DOIs | |
| State | Published - 2022 |
| Event | 2022 IEEE Global Communications Conference, GLOBECOM 2022 - Virtual, Online, Brazil |
| Duration | Dec 4 2022 → Dec 8 2022 |
All Science Journal Classification (ASJC) codes
- Artificial Intelligence
- Computer Networks and Communications
- Hardware and Architecture
- Signal Processing
Keywords
- Consensus Monte Carlo
- Distributed Bayesian learning
- coded computing
- grouping
- stragglers