TY - JOUR
T1 - A super-programming approach for mining association rules in parallel on PC clusters
AU - Jin, Dejiang
AU - Ziavras, Sotirios G.
N1 - Funding Information: This work was supported in part by the US Department of Energy under grant DE-FG02-03CH11171.
PY - 2004/9
Y1 - 2004/9
N2 - PC clusters have become popular in parallel processing. They do not involve specialized interprocessor networks, so the latency of data communications is rather high. The programming models for PC clusters are often different from those for parallel machines or supercomputers containing sophisticated interprocessor communication networks. For PC clusters, load balancing among the nodes becomes a more critical issue in achieving high performance. We introduce a new model for program development on PC clusters, namely, the Super-Programming Model (SPM). The workload is modeled as a collection of Super-Instructions (SIs). We propose that a set of SIs be designed for each application domain. They should constitute an orthogonal set of frequently used high-level operations in the corresponding application domain. Each SI should normally be implemented as a high-level language routine that can execute on any PC. Application programs are modeled as Super-Programs (SPs), which are coded using SIs. SIs are dynamically assigned to available PCs at runtime. Because the granularity of SIs is known, an upper bound on their execution time can be estimated statically. Therefore, dynamic load balancing becomes an easier task. Our motivation is to support dynamic load balancing and code porting, especially for applications with diverse sets of inputs, such as data mining. Here, we apply SPM to the implementation of an Apriori-like algorithm for mining association rules. Our experiments show that the average idle time per node is kept very low.
AB - PC clusters have become popular in parallel processing. They do not involve specialized interprocessor networks, so the latency of data communications is rather high. The programming models for PC clusters are often different from those for parallel machines or supercomputers containing sophisticated interprocessor communication networks. For PC clusters, load balancing among the nodes becomes a more critical issue in achieving high performance. We introduce a new model for program development on PC clusters, namely, the Super-Programming Model (SPM). The workload is modeled as a collection of Super-Instructions (SIs). We propose that a set of SIs be designed for each application domain. They should constitute an orthogonal set of frequently used high-level operations in the corresponding application domain. Each SI should normally be implemented as a high-level language routine that can execute on any PC. Application programs are modeled as Super-Programs (SPs), which are coded using SIs. SIs are dynamically assigned to available PCs at runtime. Because the granularity of SIs is known, an upper bound on their execution time can be estimated statically. Therefore, dynamic load balancing becomes an easier task. Our motivation is to support dynamic load balancing and code porting, especially for applications with diverse sets of inputs, such as data mining. Here, we apply SPM to the implementation of an Apriori-like algorithm for mining association rules. Our experiments show that the average idle time per node is kept very low.
UR - http://www.scopus.com/inward/record.url?scp=4544277998&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=4544277998&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2004.37
DO - 10.1109/TPDS.2004.37
M3 - Article
AN - SCOPUS:4544277998
SN - 1045-9219
VL - 15
SP - 783
EP - 794
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 9
ER -