TY - GEN

T1 - Latency tolerance through parallelization of time in scientific applications

AU - Srinivasan, Ashok

AU - Chandra, Namas

N1 - Funding Information:
This work was funded by NSF grant # CMS-0403746. We also wish to thank Sri S.S. Baba for help with debugging the code, among other things.

PY - 2004

Y1 - 2004

N2 - Distributed computing environments, such as the Grid, promise enormous raw computational power, but involve high communication overheads. It is therefore believed that they are primarily suited for "embarrassingly parallel" applications, such as Monte Carlo, and for certain applications where the loosely-coupled nature of the science involved in the simulations leads to a coarse-grained computation. In a typical application, however, such coarse granularity is not feasible. We discuss our solution strategy, based on scalable functional decomposition, which can be used to keep the computation coarse-grained, even on a large number of processors. Such a decomposition can be attempted through a variety of means. We discuss the use of time parallelization to achieve this. We demonstrate results with a model problem, and then discuss its implementation for an important problem in nanomaterials simulation. We also show that this technique can be extended to make it inherently fault-tolerant.

UR - http://www.scopus.com/inward/record.url?scp=12444256378&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=12444256378&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:12444256378

SN - 0769521320

SN - 9780769521329

T3 - Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2004 (Abstracts and CD-ROM)

SP - 1595

EP - 1605

BT - Proceedings - 18th International Parallel and Distributed Processing Symposium, IPDPS 2004 (Abstracts and CD-ROM)

T2 - Proceedings - 18th International Parallel and Distributed Processing Symposium, IPDPS 2004 (Abstracts and CD-ROM)

Y2 - 26 April 2004 through 30 April 2004

ER -