Developing holistic predictive modeling solutions for risk prediction is extremely challenging in healthcare informatics. Risk prediction involves integration of clinical factors with socio-demographic factors, health conditions, disease parameters, hospital care quality parameters, and a variety of variables specific to each health care provider making the task increasingly complex. Unsurprisingly, many of such factors need to be extracted independently from different sources, and integrated back to improve the quality of predictive modeling. Such sources are typically voluminous, diverse, and vary significantly over the time. Therefore, distributed and parallel computing tools collectively termed big data have to be developed. In this work, we study big data driven solutions to predict the 30-day risk of readmission for congestive heart failure (CHF) incidents. First, we extract useful factors from National Inpatient Dataset (NIS) and augment it with our patient dataset from Multicare Health System (MHS). Then, we develop scalable data mining models to predict risk of readmission using the integrated dataset. We demonstrate the effectiveness and efficiency of the open-source predictive modeling framework we used, describe the results from various modeling algorithms we tested, and compare the performance against baseline non-distributed, non-parallel, non-integrated small data results previously published to demonstrate comparable accuracy over millions of records.