TY - GEN
T1 - Provably Good Randomized Strategies for Data Placement in Distributed Key-Value Stores
AU - Wang, Zhe
AU - Zhao, Jinhao
AU - Agrawal, Kunal
AU - Liu, He
AU - Xu, Meng
AU - Li, Jing
N1 - Publisher Copyright:
© 2023 Owner/Author.
PY - 2023/2/25
Y1 - 2023/2/25
N2 - Distributed storage systems are used widely in clouds, databases, and file systems. These systems store a large amount of data across multiple servers. When a request to access data comes in, it is routed to the appropriate server, queued, and eventually processed. If the server's queue is full, then requests may be rejected. Thus, one important challenge when designing the algorithm for allocating data to servers is the fact that the request pattern may be unbalanced, unpredictable, and may change over time. If some servers get a large fraction of the requests, they are overloaded, leading to many rejects. In this paper, we analyze this problem theoretically under adversarial assumptions. In particular, we assume that the request sequence is generated by an adversarial process to maximize the number of rejects and analyze the performance of various algorithmic strategies in terms of the fraction of the requests rejected. We show that no deterministic strategy can perform well. On the other hand, a simple randomized strategy guarantees that at most a constant fraction of requests are rejected in expectation. We also show that moving data to load balance is essential if we want to reject a very small fraction (1/m where m is the number of servers) of requests. We design a strategy with randomization and data transfer to achieve this performance with speed augmentation. Finally, we conduct experiments and show that our algorithms perform well in practice.
AB - Distributed storage systems are used widely in clouds, databases, and file systems. These systems store a large amount of data across multiple servers. When a request to access data comes in, it is routed to the appropriate server, queued, and eventually processed. If the server's queue is full, then requests may be rejected. Thus, one important challenge when designing the algorithm for allocating data to servers is the fact that the request pattern may be unbalanced, unpredictable, and may change over time. If some servers get a large fraction of the requests, they are overloaded, leading to many rejects. In this paper, we analyze this problem theoretically under adversarial assumptions. In particular, we assume that the request sequence is generated by an adversarial process to maximize the number of rejects and analyze the performance of various algorithmic strategies in terms of the fraction of the requests rejected. We show that no deterministic strategy can perform well. On the other hand, a simple randomized strategy guarantees that at most a constant fraction of requests are rejected in expectation. We also show that moving data to load balance is essential if we want to reject a very small fraction (1/m where m is the number of servers) of requests. We design a strategy with randomization and data transfer to achieve this performance with speed augmentation. Finally, we conduct experiments and show that our algorithms perform well in practice.
KW - distributed key-value store
KW - load balance
UR - http://www.scopus.com/inward/record.url?scp=85149311918&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85149311918&partnerID=8YFLogxK
U2 - 10.1145/3572848.3577501
DO - 10.1145/3572848.3577501
M3 - Conference contribution
AN - SCOPUS:85149311918
T3 - Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP
SP - 1
EP - 12
BT - PPoPP 2023 - Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming
PB - Association for Computing Machinery
T2 - 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, PPoPP 2023
Y2 - 25 February 2023 through 1 March 2023
ER -