TY - GEN
T1 - Robust Distributed Bayesian Learning with Stragglers via Consensus Monte Carlo
AU - Chittoor, Hari Hara Suthan
AU - Simeone, Osvaldo
N1 - Funding Information:
The authors have received funding from the European Research Council (ERC) under the European Union's Horizon 2020 Research and Innovation Programme (Grant Agreement No. 725731).
Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - This paper studies distributed Bayesian learning in a setting encompassing a central server and multiple workers by focusing on the problem of mitigating the impact of stragglers. The standard one-shot, or embarrassingly parallel, Bayesian learning protocol known as consensus Monte Carlo (CMC) is generalized by proposing two straggler-resilient solutions based on grouping and coding. Two main challenges in designing straggler-resilient algorithms for CMC are the need to estimate the statistics of the workers' outputs across multiple shots, and the joint non-linear post-processing of the outputs of the workers carried out at the server. This is in stark contrast to other distributed settings like gradient coding, which only require the per-shot sum of the workers' outputs. The proposed methods, referred to as Group-based CMC (G-CMC) and Coded CMC (C-CMC), leverage redundant computing at the workers in order to enable the estimation of global posterior samples at the server based on partial outputs from the workers. Simulation results show that C-CMC may outperform G-CMC for a small number of workers, while G-CMC is generally preferable for a larger number of workers.
AB - This paper studies distributed Bayesian learning in a setting encompassing a central server and multiple workers by focusing on the problem of mitigating the impact of stragglers. The standard one-shot, or embarrassingly parallel, Bayesian learning protocol known as consensus Monte Carlo (CMC) is generalized by proposing two straggler-resilient solutions based on grouping and coding. Two main challenges in designing straggler-resilient algorithms for CMC are the need to estimate the statistics of the workers' outputs across multiple shots, and the joint non-linear post-processing of the outputs of the workers carried out at the server. This is in stark contrast to other distributed settings like gradient coding, which only require the per-shot sum of the workers' outputs. The proposed methods, referred to as Group-based CMC (G-CMC) and Coded CMC (C-CMC), leverage redundant computing at the workers in order to enable the estimation of global posterior samples at the server based on partial outputs from the workers. Simulation results show that C-CMC may outperform G-CMC for a small number of workers, while G-CMC is generally preferable for a larger number of workers.
KW - Consensus Monte Carlo
KW - Distributed Bayesian learning
KW - coded computing
KW - grouping
KW - stragglers
UR - http://www.scopus.com/inward/record.url?scp=85146936813&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85146936813&partnerID=8YFLogxK
U2 - 10.1109/GLOBECOM48099.2022.10001070
DO - 10.1109/GLOBECOM48099.2022.10001070
M3 - Conference contribution
AN - SCOPUS:85146936813
T3 - 2022 IEEE Global Communications Conference, GLOBECOM 2022 - Proceedings
SP - 609
EP - 614
BT - 2022 IEEE Global Communications Conference, GLOBECOM 2022 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 IEEE Global Communications Conference, GLOBECOM 2022
Y2 - 4 December 2022 through 8 December 2022
ER -