TY - GEN
T1 - Big data solutions for predicting risk-of-readmission for congestive heart failure patients
AU - Zolfaghar, Kiyana
AU - Meadem, Naren
AU - Teredesai, Ankur
AU - Roy, Senjuti Basu
AU - Chin, Si Chi
AU - Muckian, Brian
PY - 2013
Y1 - 2013
N2 - Developing holistic predictive modeling solutions for risk prediction is extremely challenging in healthcare informatics. Risk prediction involves integration of clinical factors with socio-demographic factors, health conditions, disease parameters, hospital care quality parameters, and a variety of variables specific to each health care provider making the task increasingly complex. Unsurprisingly, many of such factors need to be extracted independently from different sources, and integrated back to improve the quality of predictive modeling. Such sources are typically voluminous, diverse, and vary significantly over the time. Therefore, distributed and parallel computing tools collectively termed big data have to be developed. In this work, we study big data driven solutions to predict the 30-day risk of readmission for congestive heart failure (CHF) incidents. First, we extract useful factors from National Inpatient Dataset (NIS) and augment it with our patient dataset from Multicare Health System (MHS). Then, we develop scalable data mining models to predict risk of readmission using the integrated dataset. We demonstrate the effectiveness and efficiency of the open-source predictive modeling framework we used, describe the results from various modeling algorithms we tested, and compare the performance against baseline non-distributed, non-parallel, non-integrated small data results previously published to demonstrate comparable accuracy over millions of records.
AB - Developing holistic predictive modeling solutions for risk prediction is extremely challenging in healthcare informatics. Risk prediction involves integration of clinical factors with socio-demographic factors, health conditions, disease parameters, hospital care quality parameters, and a variety of variables specific to each health care provider making the task increasingly complex. Unsurprisingly, many of such factors need to be extracted independently from different sources, and integrated back to improve the quality of predictive modeling. Such sources are typically voluminous, diverse, and vary significantly over the time. Therefore, distributed and parallel computing tools collectively termed big data have to be developed. In this work, we study big data driven solutions to predict the 30-day risk of readmission for congestive heart failure (CHF) incidents. First, we extract useful factors from National Inpatient Dataset (NIS) and augment it with our patient dataset from Multicare Health System (MHS). Then, we develop scalable data mining models to predict risk of readmission using the integrated dataset. We demonstrate the effectiveness and efficiency of the open-source predictive modeling framework we used, describe the results from various modeling algorithms we tested, and compare the performance against baseline non-distributed, non-parallel, non-integrated small data results previously published to demonstrate comparable accuracy over millions of records.
KW - Healthcare
KW - Knowledge-Discovery
KW - Risk Prediction
UR - http://www.scopus.com/inward/record.url?scp=84893330267&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84893330267&partnerID=8YFLogxK
U2 - 10.1109/BigData.2013.6691760
DO - 10.1109/BigData.2013.6691760
M3 - Conference contribution
AN - SCOPUS:84893330267
SN - 9781479912926
T3 - Proceedings - 2013 IEEE International Conference on Big Data, Big Data 2013
SP - 64
EP - 71
BT - Proceedings - 2013 IEEE International Conference on Big Data, Big Data 2013
PB - IEEE Computer Society
T2 - 2013 IEEE International Conference on Big Data, Big Data 2013
Y2 - 6 October 2013 through 9 October 2013
ER -