TY - GEN
T1 - Dual-Lagrange Encoding for Storage and Download in Elastic Computing for Resilience
AU - Zhong, Xi
AU - Lu, Samuel
AU - Kliewer, Jörg
AU - Ji, Mingyue
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Coded elastic computing enables virtual machines to be preempted for high-priority tasks while allowing new virtual machines to join ongoing computation seamlessly. This paper addresses coded elastic computing for matrix-matrix multiplications with straggler tolerance by encoding both storage and download using Lagrange codes. In 2018, Yang et al. introduced the first coded elastic computing scheme for matrix-matrix multiplications, achieving a lower computational load requirement. However, this scheme lacks straggler tolerance and suffers from high upload cost. Zhong et al. (2023) later tackled these shortcomings by employing uncoded storage and Lagrange-coded download. However, their approach requires each machine to store the entire dataset. This paper introduces a new class of elastic computing schemes that utilize Lagrange codes to encode both storage and download, achieving a reduced storage size. The proposed schemes efficiently mitigate both elasticity and straggler effects, with a storage size reduced to a fraction 1/L of Zhong et al.'s approach, at the expense of doubling the download cost. Moreover, we evaluate the proposed schemes on AWS EC2 by measuring computation time under two different tasks allocations: heterogeneous and cyclic assignments. Both assignments minimize computation redundancy of the system while distributing varying computation loads across machines.
AB - Coded elastic computing enables virtual machines to be preempted for high-priority tasks while allowing new virtual machines to join ongoing computation seamlessly. This paper addresses coded elastic computing for matrix-matrix multiplications with straggler tolerance by encoding both storage and download using Lagrange codes. In 2018, Yang et al. introduced the first coded elastic computing scheme for matrix-matrix multiplications, achieving a lower computational load requirement. However, this scheme lacks straggler tolerance and suffers from high upload cost. Zhong et al. (2023) later tackled these shortcomings by employing uncoded storage and Lagrange-coded download. However, their approach requires each machine to store the entire dataset. This paper introduces a new class of elastic computing schemes that utilize Lagrange codes to encode both storage and download, achieving a reduced storage size. The proposed schemes efficiently mitigate both elasticity and straggler effects, with a storage size reduced to a fraction 1/L of Zhong et al.'s approach, at the expense of doubling the download cost. Moreover, we evaluate the proposed schemes on AWS EC2 by measuring computation time under two different tasks allocations: heterogeneous and cyclic assignments. Both assignments minimize computation redundancy of the system while distributing varying computation loads across machines.
UR - https://www.scopus.com/pages/publications/105021942109
UR - https://www.scopus.com/pages/publications/105021942109#tab=citedBy
U2 - 10.1109/ISIT63088.2025.11195520
DO - 10.1109/ISIT63088.2025.11195520
M3 - Conference contribution
AN - SCOPUS:105021942109
T3 - IEEE International Symposium on Information Theory - Proceedings
BT - ISIT 2025 - 2025 IEEE International Symposium on Information Theory, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2025 IEEE International Symposium on Information Theory, ISIT 2025
Y2 - 22 June 2025 through 27 June 2025
ER -