Accelerating and Expanding End-to-End Data Science Workflows with DL/ML Interoperability Using RAPIDS

Bartley Richardson, Bradley Rees, Tom Drabas, Even Oldridge, David A. Bader, Rachel Allen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

The lines between data science (DS), machine learning (ML), deep learning (DL), and data mining continue to blur. This convergence opens up a wide range of capabilities, but it also brings increased complexity and a large number of tools and techniques. It is not uncommon for DL engineers to use one set of tools for data extraction and cleaning and then pivot to another library to train their models. After training and inference, data is often moved yet again, into another set of tools for post-processing. The RAPIDS suite of open source libraries not only provides a way to execute and accelerate these tasks on GPUs through familiar APIs, but also interoperates with the broader open source community and DL tools while removing the unnecessary serialization steps that slow workflows down. GPUs provide massive parallelism that DL has leveraged for some time, and RAPIDS supplies the missing pieces that extend this computing power to more traditional, yet important, DS and ML tasks (e.g., ETL and modeling). Complete pipelines covering ETL, feature engineering, ML/DL modeling, inference, and visualization can be built while avoiding typical serialization costs and allowing seamless interoperability between libraries. All experiments using RAPIDS can be scheduled, logged, and reviewed using existing public cloud options. Join our engineers and data scientists as they walk through a collection of DS and ML/DL engineering problems that show how RAPIDS running on Azure ML can be used for end-to-end, entirely GPU-based pipelines. This tutorial includes specifics on how to use RAPIDS for feature engineering, how it interoperates with common ML/DL packages, and how to create GPU-native visualizations using cuxfilter.
The use cases presented here give attendees a hands-on approach to using RAPIDS components as part of a larger workflow, seamlessly integrating with other libraries (e.g., TensorFlow) and visualization packages.
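The abstract's point about removing serialization between libraries can be illustrated with a small CPU-only sketch. It assumes NumPy 1.22 or later (for `np.from_dlpack`) and uses NumPy arrays as a stand-in for GPU data; DLPack is our choice of zero-copy protocol for illustration and is not named in the abstract. In a GPU pipeline the analogous handoff would be along the lines of cuDF's `DataFrame.to_dlpack()` consumed by `tf.experimental.dlpack.from_dlpack`, but that pairing is an assumption on our part, not a step taken from the tutorial. The feature matrix below is hypothetical.

```python
import pickle

import numpy as np


def handoff_with_serialization(features: np.ndarray) -> np.ndarray:
    """Pass data between 'libraries' by serializing and deserializing.

    Every hop pays for encoding, decoding, and a full copy of the buffer.
    """
    return pickle.loads(pickle.dumps(features))


def handoff_with_dlpack(features: np.ndarray) -> np.ndarray:
    """Pass data by handing over a DLPack view of the same memory.

    np.from_dlpack consumes any object exposing __dlpack__/__dlpack_device__
    and wraps the producer's buffer instead of copying it.
    """
    return np.from_dlpack(features)


def same_buffer(a: np.ndarray, b: np.ndarray) -> bool:
    # Compare the raw data pointers of the two arrays.
    return a.ctypes.data == b.ctypes.data


# Hypothetical feature matrix produced by an ETL step (rows x features).
features = np.arange(12, dtype=np.float32).reshape(4, 3)

copied = handoff_with_serialization(features)
shared = handoff_with_dlpack(features)

print(np.array_equal(copied, features), same_buffer(copied, features))  # True False
print(np.array_equal(shared, features), same_buffer(shared, features))  # True True
```

Both handoffs deliver identical values, but only the serialized path allocates and fills a second buffer; on large GPU-resident tables that extra copy (and the device-to-host trip serialization usually implies) is the cost the abstract refers to.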

Original language: English (US)
Title of host publication: KDD 2020 - Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 3503-3504
Number of pages: 2
ISBN (Electronic): 9781450379984
State: Published - Aug 23 2020
Event: 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2020 - Virtual, Online, United States
Duration: Aug 23 2020 to Aug 27 2020

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Conference

Conference: 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2020
Country/Territory: United States
City: Virtual, Online
Period: 8/23/20 to 8/27/20

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems

Keywords

  • data science
  • deep learning
  • gpu acceleration
  • machine learning
  • pydata ecosystem
