Collaborative Research: Nonparametric Bayesian Aggregation for Massive Data

Project: Research project

Project Details


Modern massive data arrive in ever-increasing volume and with high heterogeneity. Examples include data from internet searches, social networks, mobile devices, satellites, genomics, and medical scans. Bayesian approaches are particularly useful in such contexts because the complex structures in the data can be incorporated naturally into Bayesian hierarchical models. Moreover, uncertainty quantification can be carried out easily through Bayesian computation. However, due to storage and computational bottlenecks, traditional Bayesian computation implemented on a single machine is no longer feasible for modern massive data. In this project, a set of nonparametric Bayesian aggregation procedures with theoretical justification is developed, based on a standard parallel computing strategy known as divide-and-conquer. This research will significantly enhance the availability of Bayesian tools and software for analyzing massive data. The educational plan of the project consists of graduate student advising and special topics courses.

This project consists of three major components. First, the PIs will establish a Gaussian approximation of general nonparametric posterior distributions that serves as a theoretical foundation for general distributed Bayesian algorithms. Second, the PIs will develop a nonparametric Bayesian aggregation procedure with theoretical guarantees that is particularly useful for handling massive data in parallel. Third, the PIs will develop an efficient parallel Markov chain Monte Carlo (MCMC) algorithm for nonparametric Bayesian models that performs as well as traditional MCMC at substantially lower computational cost. This research will lead to the emergence of 'Splitotics (Split+Asymptotics) Theory', providing theoretical guidelines for Bayesian practice. The smoothing spline inference results recently obtained by the PIs will be used as a promising tool for achieving the above goals.
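To make the divide-and-conquer idea concrete, here is a minimal sketch on a toy problem. It is not the PIs' nonparametric procedure: it assumes a parametric Gaussian mean model with known variance and a conjugate Gaussian prior, chosen only because the aggregation can be checked exactly. Each shard gets a "fractionated" prior (variance inflated by the number of shards), and the shard posteriors are combined by precision weighting. All function names and parameter values are hypothetical.

```python
import random


def gaussian_posterior(ys, sigma2, prior_mean, prior_var):
    """Posterior (mean, variance) of theta for y_i ~ N(theta, sigma2)
    under a N(prior_mean, prior_var) prior (conjugate update)."""
    precision = 1.0 / prior_var + len(ys) / sigma2
    mean = (prior_mean / prior_var + sum(ys) / sigma2) / precision
    return mean, 1.0 / precision


def aggregate(subposteriors):
    """Combine Gaussian subposteriors by multiplying their densities:
    precisions add, and the mean is the precision-weighted average."""
    precisions = [1.0 / var for _, var in subposteriors]
    total = sum(precisions)
    mean = sum(p * m for p, (m, _) in zip(precisions, subposteriors)) / total
    return mean, 1.0 / total


def divide_and_conquer_posterior(ys, num_shards, sigma2, prior_mean, prior_var):
    """Split the data into shards, compute one subposterior per shard,
    then aggregate. Each shard uses the prior raised to the power
    1/num_shards, i.e. a Gaussian with variance num_shards * prior_var,
    so that the prior is counted exactly once after aggregation."""
    shards = [ys[k::num_shards] for k in range(num_shards)]
    subposteriors = [
        gaussian_posterior(shard, sigma2, prior_mean, num_shards * prior_var)
        for shard in shards
    ]
    return aggregate(subposteriors)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(2.0, 1.0) for _ in range(10_000)]
    full = gaussian_posterior(data, 1.0, 0.0, 10.0)
    combined = divide_and_conquer_posterior(data, 8, 1.0, 0.0, 10.0)
    print("single-machine posterior:", full)
    print("aggregated posterior:    ", combined)
```

In this conjugate toy model the aggregated posterior matches the single-machine posterior up to floating-point error. In nonparametric models no such exact identity is available in general, which is where a Gaussian approximation of the posterior and theoretical guarantees for the aggregation step, as described above, become relevant.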

Effective start/end date: 8/1/19 to 8/31/20


  • National Science Foundation: $30,604.00

