Abstract
Owing to real-world demands for global collaboration and the growing volume of data to be analyzed, many data-intensive workflow applications are deployed across geographically distributed (geo-distributed) datacenters (DCs). In such an environment, inter-DC bandwidth is much lower than intra-DC bandwidth, so scheduling inter-DC data communication effectively and without contention is crucial to a workflow's execution time. The diversity of data privacy requirements across geo-distributed DCs poses an additional challenge. This article introduces a workflow scheduling model for geo-distributed DCs in which inter-DC communications are explicitly considered and data privacy must be protected. A Communication-contention-Aware Privacy-preserving Scheduling (CAPS) method is proposed as the first method to solve this problem. CAPS assigns workflow tasks to DCs via simulated annealing such that privacy constraints are respected and the overall inter-DC data transmission time is minimized. It adopts a list scheduling heuristic to schedule tasks and data communications onto computation and network resources. In experiments, CAPS is compared against leading-edge methods on realistic workflows and network settings. The results show that it reduces workflow makespan by 7.08% to 87.53% compared with its peers while guaranteeing data privacy and resolving all communication contention, a combination not seen in existing work.
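The placement step described above can be illustrated with a minimal sketch. It is not the CAPS implementation: it assumes a simplified cost model (the sum of data size divided by inter-DC bandwidth over edges that cross DCs), ignores communication contention and the list scheduling phase, and uses hypothetical names throughout (`transfer_time`, `anneal_placement`, `allowed`, `dc_a`, `dc_b`). Privacy is modeled, as an assumption, by a per-task set of DCs the task may run in.

```python
import math
import random


def transfer_time(assign, edges, bandwidth):
    """Total inter-DC transfer time under an assumed simple cost model."""
    total = 0.0
    for (src, dst), size in edges.items():
        a, b = assign[src], assign[dst]
        if a != b:  # intra-DC transfers are treated as free in this sketch
            total += size / bandwidth[(a, b)]
    return total


def anneal_placement(tasks, edges, allowed, bandwidth, seed=0,
                     t_start=10.0, t_end=1e-3, cooling=0.95,
                     moves_per_temp=50):
    """Simulated annealing over task-to-DC placements.

    Every candidate move draws the new DC from allowed[task], so the
    (assumed) privacy constraint holds for every placement visited.
    """
    rng = random.Random(seed)
    assign = {t: rng.choice(sorted(allowed[t])) for t in tasks}
    cost = transfer_time(assign, edges, bandwidth)
    best, best_cost = dict(assign), cost
    temp = t_start
    while temp > t_end:
        for _ in range(moves_per_temp):
            task = rng.choice(tasks)
            new_dc = rng.choice(sorted(allowed[task]))
            old_dc = assign[task]
            if new_dc == old_dc:
                continue
            assign[task] = new_dc
            new_cost = transfer_time(assign, edges, bandwidth)
            delta = new_cost - cost
            # Always accept improvements; accept worse moves with
            # probability exp(-delta / temp) (Metropolis criterion).
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = dict(assign), cost
            else:
                assign[task] = old_dc  # reject: undo the move
        temp *= cooling  # geometric cooling schedule
    return best, best_cost


# Hypothetical four-task workflow; t1 and t4 are pinned by privacy rules.
tasks = ["t1", "t2", "t3", "t4"]
edges = {("t1", "t2"): 100, ("t1", "t3"): 50,
         ("t2", "t4"): 80, ("t3", "t4"): 20}
allowed = {"t1": {"dc_a"}, "t2": {"dc_a", "dc_b"},
           "t3": {"dc_a", "dc_b"}, "t4": {"dc_b"}}
bandwidth = {("dc_a", "dc_b"): 10.0, ("dc_b", "dc_a"): 10.0}

best, best_cost = anneal_placement(tasks, edges, allowed, bandwidth)
```

Restricting moves to `allowed[task]` keeps the search inside the privacy-feasible space rather than penalizing violations afterward; this is one possible design choice, and the article's method may handle the constraint differently.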
Original language | English (US) |
---|---|
Pages (from-to) | 1887-1898 |
Number of pages | 12 |
Journal | IEEE Transactions on Services Computing |
Volume | 17 |
Issue number | 5 |
State | Published - 2024 |
All Science Journal Classification (ASJC) codes
- Hardware and Architecture
- Computer Science Applications
- Computer Networks and Communications
- Information Systems and Management
Keywords
- Communication contention
- data privacy
- geo-distributed datacenters
- simulated annealing
- workflow scheduling