TY - GEN
T1 - Work stealing for interactive services to meet target latency
AU - Li, Jing
AU - Agrawal, Kunal
AU - Elnikety, Sameh
AU - He, Yuxiong
AU - Lee, I. Ting Angelina
AU - Lu, Chenyang
AU - McKinley, Kathryn S.
N1 - Publisher Copyright: © 2016 ACM.
PY - 2016/2/27
Y1 - 2016/2/27
N2 - Interactive web services increasingly drive critical business workloads such as search, advertising, games, shopping, and finance. Whereas optimizing parallel programs and distributed server systems has historically focused on average latency and throughput, the primary metric for interactive applications is instead consistent responsiveness, i.e., minimizing the number of requests that miss a target latency. This paper is the first to show how to generalize work stealing, which is traditionally used to minimize the makespan of a single parallel job, to optimize for a target latency in interactive services with multiple parallel requests. We design a new adaptive work-stealing policy, called tail-control, that reduces the number of requests that miss a target latency. It uses instantaneous request progress, system load, and a target latency to choose when to parallelize requests with stealing, when to admit new requests, and when to limit parallelism of large requests. We implement this approach in the Intel Threading Building Blocks (TBB) library and evaluate it on real-world and synthetic workloads. The tail-control policy substantially reduces the number of requests exceeding the desired target latency and delivers up to 58% relative improvement over various baseline policies. This generalization of work stealing for multiple requests effectively optimizes the number of requests that complete within a target latency, a key metric for interactive services.
AB - Interactive web services increasingly drive critical business workloads such as search, advertising, games, shopping, and finance. Whereas optimizing parallel programs and distributed server systems has historically focused on average latency and throughput, the primary metric for interactive applications is instead consistent responsiveness, i.e., minimizing the number of requests that miss a target latency. This paper is the first to show how to generalize work stealing, which is traditionally used to minimize the makespan of a single parallel job, to optimize for a target latency in interactive services with multiple parallel requests. We design a new adaptive work-stealing policy, called tail-control, that reduces the number of requests that miss a target latency. It uses instantaneous request progress, system load, and a target latency to choose when to parallelize requests with stealing, when to admit new requests, and when to limit parallelism of large requests. We implement this approach in the Intel Threading Building Blocks (TBB) library and evaluate it on real-world and synthetic workloads. The tail-control policy substantially reduces the number of requests exceeding the desired target latency and delivers up to 58% relative improvement over various baseline policies. This generalization of work stealing for multiple requests effectively optimizes the number of requests that complete within a target latency, a key metric for interactive services.
UR - http://www.scopus.com/inward/record.url?scp=84963739865&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84963739865&partnerID=8YFLogxK
U2 - 10.1145/2851141.2851151
DO - 10.1145/2851141.2851151
M3 - Conference contribution
AN - SCOPUS:84963739865
T3 - Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP
BT - 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2016 - Proceedings
PB - Association for Computing Machinery
T2 - 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2016
Y2 - 12 March 2016 through 16 March 2016
ER -