TY - GEN
T1 - Predicting Large-scale Protein-protein Interactions by Extracting Coevolutionary Patterns with MapReduce Paradigm
AU - Hu, Lun
AU - Zhao, Bo Wei
AU - Yang, Shicheng
AU - Luo, Xin
AU - Zhou, Mengchu
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - Protein-protein interactions are of great significance for understanding the functional mechanisms of proteins. With the rapid development of high-throughput genomic technology, the amount of protein-protein interaction data has become so large that most existing prediction algorithms are no longer applicable. To address this problem, we develop a distributed framework by reimplementing one of the state-of-the-art algorithms, i.e., CoFex, using MapReduce. In particular, we adopt a novel tree-based data structure to reduce the heavy memory consumption caused by the huge volume of protein sequence information. The procedure of CoFex is then modified to follow the MapReduce paradigm so that the prediction task can be completed in a distributed manner, thus fulfilling the demanding requirements of large-scale protein-protein interaction prediction. A series of experiments has been conducted to evaluate the performance of the proposed distributed framework in terms of both efficiency and effectiveness. Experimental results demonstrate that the proposed framework improves the computational efficiency of CoFex by more than two orders of magnitude while retaining a comparable level of accuracy.
AB - Protein-protein interactions are of great significance for understanding the functional mechanisms of proteins. With the rapid development of high-throughput genomic technology, the amount of protein-protein interaction data has become so large that most existing prediction algorithms are no longer applicable. To address this problem, we develop a distributed framework by reimplementing one of the state-of-the-art algorithms, i.e., CoFex, using MapReduce. In particular, we adopt a novel tree-based data structure to reduce the heavy memory consumption caused by the huge volume of protein sequence information. The procedure of CoFex is then modified to follow the MapReduce paradigm so that the prediction task can be completed in a distributed manner, thus fulfilling the demanding requirements of large-scale protein-protein interaction prediction. A series of experiments has been conducted to evaluate the performance of the proposed distributed framework in terms of both efficiency and effectiveness. Experimental results demonstrate that the proposed framework improves the computational efficiency of CoFex by more than two orders of magnitude while retaining a comparable level of accuracy.
KW - MapReduce
KW - Protein-protein interaction
KW - large-scale prediction
KW - systems biology
UR - http://www.scopus.com/inward/record.url?scp=85124317098&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85124317098&partnerID=8YFLogxK
U2 - 10.1109/SMC52423.2021.9658839
DO - 10.1109/SMC52423.2021.9658839
M3 - Conference contribution
AN - SCOPUS:85124317098
T3 - Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics
SP - 939
EP - 944
BT - 2021 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2021
Y2 - 17 October 2021 through 20 October 2021
ER -