Performance Prediction of Big Data Transfer Through Experimental Analysis and Machine Learning

Daqing Yun, Wuji Liu, Chase Q. Wu, Nageswara S.V. Rao, Rajkumar Kettimuthu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution


Big data transfer in next-generation scientific applications is now commonly carried out over connections with guaranteed bandwidth provisioned in High-performance Networks (HPNs) through advance bandwidth reservation. To use HPN resources efficiently, provisioning agents need to carefully schedule data transfer requests and allocate appropriate bandwidths. Such reserved bandwidths, if not fully utilized by the requesting user, could be simply wasted or cause extra overhead and complexity in management due to exclusive access. This calls for the capability of performance prediction to reserve bandwidth resources that match actual needs. Towards this goal, we employ machine learning algorithms to predict big data transfer performance based on extensive performance measurements, which are collected over a span of several years from a large number of data transfer tests using different protocols and toolkits between various end sites on several real-life physical or emulated HPN testbeds. We first identify a comprehensive list of attributes involved in a typical big data transfer process, including end host system configurations, network connection properties, and control parameters of data transfer methods. We then conduct an in-depth exploratory analysis of their impacts on application-level throughput, which provides insights into big data transfer performance and motivates the use of machine learning. We also investigate the applicability of machine learning algorithms and derive their general performance bounds for performance prediction of big data transfer in HPNs. Experimental results show that, with appropriate data preprocessing, the proposed machine learning-based approach achieves 95% or higher prediction accuracy in up to 90% of the cases with very noisy real-life performance measurements.

Original language: English (US)
Title of host publication: IFIP Networking 2020 Conference and Workshops, Networking 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 9
ISBN (Electronic): 9783903176287
State: Published - Jun 2020
Event: 2020 IFIP Networking Conference and Workshops, Networking 2020 - Paris, France
Duration: Jun 22 2020 - Jun 25 2020

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems and Management
  • Safety, Risk, Reliability and Quality


Keywords

  • Performance prediction
  • big data transfer
  • experimental analysis
  • machine learning

