Design of scalable parallel algorithms for large scale data analysis and visualization

Qishi Wu, Jinzhu Gao

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

Abstract

The rapid advances in computing power made possible by modern supercomputing technologies have enabled a wide spectrum of next-generation scientific applications with a primary goal of either simulating very complex physical phenomena or creating digital representations of real-world objects for educational or scientific purposes. These applications generate vast amounts of scientific data, on the order of terabytes at present and petabytes in the foreseeable future, which must be transferred, visualized, and analyzed by a collaborative team of geographically distributed scientists. The capability of collaboratively analyzing and visualizing today's abundant scientific data and ongoing computations over wide-area networks is critical to ensure both the success of mission-critical applications and the utilization of expensive computing resources. In this chapter, we discuss various issues in the design of scalable parallel algorithms for large-scale data analysis and visualization and present a number of novel techniques with a focus on scalability improvement. The main goal of algorithm design for parallel computing is to balance workload as well as minimize runtime communication and synchronization costs. We present several data management and distribution schemes that promise highly scalable parallel performance by utilizing both out-of-core schemes and effective data traversal along a space-filling curve. These schemes ensure balanced workload for different parallel implementations and avoid data replication and runtime redistribution to maximize the utilization of limited system resources. We also present a linear pipeline scheme for image compositing in parallel visualization to support efficient image delivery to a remote client. The linear pipeline scheme arranges an arbitrary number of parallel processors within a cluster in a linear order and divides the image into a carefully selected number of segments, which flow through the linear in-cluster pipeline and wide-area networks to the remote client consecutively. We analytically determine the segment size that minimizes the final image display time and derive the conditions under which the proposed image compositing and delivery scheme outperforms the traditional scheme based on the binary swap algorithm. The efficacy of these approaches is demonstrated by case studies that involve data analysis and visualization techniques widely used in scientific fields.
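
The linear pipeline idea described in the abstract can be pictured with a small simulation. The following is a minimal sketch under stated assumptions, not the chapter's implementation: pixels are premultiplied (color, alpha) pairs blended with the standard "over" operator, processors are ordered back to front, and every name (`over`, `split`, `linear_pipeline_composite`) is hypothetical. The abstract does not give the segment-size formula, so the number of segments is left as a plain parameter.

```python
def over(front, back):
    """Composite a premultiplied (color, alpha) pixel `front` over `back`."""
    fc, fa = front
    bc, ba = back
    return (fc + (1.0 - fa) * bc, fa + (1.0 - fa) * ba)


def split(image, num_segments):
    """Split a flat pixel list into contiguous, near-equal segments."""
    n = len(image)
    bounds = [n * i // num_segments for i in range(num_segments + 1)]
    return [image[bounds[i]:bounds[i + 1]] for i in range(num_segments)]


def linear_pipeline_composite(partials, num_segments):
    """Simulate compositing along a linear chain of processors.

    partials[0] is assumed to be the back-most partial image and
    partials[-1] the front-most. Returns (final_image, pipeline_steps).
    """
    num_procs = len(partials)
    segs = [split(img, num_segments) for img in partials]
    in_flight = [None] * num_segments  # state of each segment in the pipeline
    delivered = []                     # pixels handed to the remote client
    steps = 0
    for t in range(num_procs + num_segments - 1):
        # At step t, processor p works on segment t - p, so all processors
        # that have a segment available operate on different segments.
        for p in range(num_procs):
            s = t - p
            if 0 <= s < num_segments:
                local = segs[p][s]
                if p == 0:
                    in_flight[s] = list(local)
                else:
                    in_flight[s] = [over(f, b)
                                    for f, b in zip(local, in_flight[s])]
                if p == num_procs - 1:
                    # The last processor forwards finished segments in order.
                    delivered.extend(in_flight[s])
        steps += 1
    return delivered, steps
```

In this toy model, P processors and S segments finish in P + S - 1 pipeline steps, and the client receives the first finished segment after P steps rather than waiting for the whole image. The model ignores per-segment communication overhead, which is one reason a real system has to choose the segment size carefully.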

Original language: English (US)
Title of host publication: Cluster Computing and Multi-Hop Network Research
Publisher: Nova Science Publishers, Inc.
Pages: 109-136
Number of pages: 28
ISBN (Print): 9781608761869
State: Published - Dec 1 2010
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Computer Science(all)
