Scalable Inference of Quantile Regression for Large-Scale Health Care Data

Project: Research project

Project Details


Project Summary: Increasing attention is being paid to the high list prices for health care services, given that uninsured patients may have to pay such prices on their own. The recent emergence of the price transparency movement in medicine has contributed to a surge of interest in the prices charged for, and the payments made for, health care services. It is therefore extremely important for both insured and uninsured patients to have first-hand information on the charge-to-payment ratios of health care services. The Physician and Other Supplier Public Use File (PUF) provides information on services and procedures provided to Medicare beneficiaries by physicians and other healthcare professionals. It contains information on utilization, payment (allowed amount and Medicare payment), and submitted charges, organized by National Provider Identifier (NPI), Healthcare Common Procedure Coding System (HCPCS) code, and place of service. The data currently available in the Physician and Other Supplier PUF cover calendar years 2012 through 2015. These large and growing data provide an unprecedented opportunity to examine the charge-to-payment ratios of health care services. With opportunity come challenges. The 2012 dataset is 1.7 GB and contains more than 9 million records; the 2013 dataset, 1.7 GB and 9 million records; the 2014 dataset, 1.9 GB and 9 million records; and the 2015 dataset, 2.0 GB and 10 million records. New datasets will become available in the years to come. However, scalable statistical methods for analyzing such growing large-scale data are lacking. In this project, we will develop novel scalable statistical methods and scalable inference procedures for analyzing growing large-scale data. In particular, we will develop quantile regression approaches via stochastic gradient descent algorithms, along with scalable inference procedures based on random perturbation.
Moreover, algorithms for computational implementation will be proposed, and theoretical properties will be derived. The results from this project will benefit 27 million uninsured Americans by providing them with the charge-to-payment ratios of health care services. At the same time, the project will expose graduate students in the Department of Mathematical Sciences at New Jersey Institute of Technology to research on large-scale data and big data analyses, and it will strengthen the Masters Program of Data Science, a brand-new program jointly formed by the Departments of Mathematical Sciences and Computer Science at New Jersey Institute of Technology.
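
As a rough illustration of the approach named in the summary, the sketch below fits a quantile regression by stochastic gradient descent in a single pass over the data and obtains standard errors by random perturbation. It is a minimal sketch under stated assumptions, not the project's actual procedure: the function name `sgd_quantile_regression`, the Exponential(1) perturbation weights, the step-size schedule `lr0 / (t + 1) ** decay`, and the use of iterate averaging are all choices made here for illustration.

```python
import numpy as np

def sgd_quantile_regression(X, y, tau=0.5, n_perturb=50, lr0=1.0, decay=0.51, seed=0):
    """One-pass averaged SGD for quantile regression (hypothetical helper).

    Row 0 of `beta` is the ordinary SGD path; rows 1..n_perturb are randomly
    perturbed copies that see the same data, each with its gradient rescaled
    by an independent mean-one random weight (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros((n_perturb + 1, p))   # current iterates
    beta_bar = np.zeros_like(beta)        # running averages of the iterates

    for t in range(n):
        x_t, y_t = X[t], y[t]
        lr = lr0 / (t + 1) ** decay       # slowly decaying step size

        # Weight 1 for the unperturbed path, Exponential(1) for the copies.
        w = np.concatenate(([1.0], rng.exponential(1.0, size=n_perturb)))

        # Negative subgradient of the check loss is x * (tau - 1{residual < 0}).
        resid = y_t - beta @ x_t
        direction = tau - (resid < 0).astype(float)
        beta += lr * (w * direction)[:, None] * x_t[None, :]

        # Update the running average of each path.
        beta_bar += (beta - beta_bar) / (t + 1)

    estimate = beta_bar[0]
    # Spread of the perturbed averages serves as the standard-error estimate.
    std_err = beta_bar[1:].std(axis=0, ddof=1)
    return estimate, std_err
```

Because every perturbed copy is updated from the same observation as it arrives, the data are read only once and never held in memory as a whole, which is the property that makes this kind of scheme attractive when new yearly files keep being added.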
Effective start/end date: 5/15/19 to 4/30/22


  • National Institute on Aging: $288,801.00

