Abstract
Subdata selection is necessary because of challenges arising from statistical analysis of big data using limited computing resources. The existing work on subdata selection relies heavily on a specified model, which calls for an approach that is robust to model misspecification. We propose the use of space-filling designs for subdata selection and examine a fast algorithm for its implementation. Our algorithm performs surprisingly well when compared to the reference distribution given by complete search. Simulations are conducted to compare our approach with a recently introduced IBOSS method, and the results show that our method is not just robust to model misspecification but also robust to model uncertainty. While robustness to model misspecification and uncertainty may be expected due to the nature of space-filling designs, we discover that our method enjoys an additional property of robustness when there exist substantial correlations among covariates.
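To make the idea of space-filling subdata selection concrete, the following is a minimal sketch of a greedy farthest-point heuristic that aims for a maximin-distance subset. It is an illustrative stand-in and not the algorithm examined in the paper. The function name `greedy_maximin_subdata`, its parameters, and the choice to rescale each covariate to [0, 1] are assumptions made for this example only.

```python
import numpy as np


def greedy_maximin_subdata(X, k, start=0):
    """Pick k row indices of X that are spread out in covariate space.

    Hypothetical helper: a greedy farthest-point heuristic, not the
    paper's method. It uses only the covariates, so no model is assumed.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= number of rows")

    # Assumption: rescale every covariate to [0, 1] so that no single
    # column dominates the Euclidean distances.
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant columns: avoid division by zero
    Z = (X - lo) / span

    selected = [start]
    # d[i] = distance from point i to its nearest already-selected point
    d = np.linalg.norm(Z - Z[start], axis=1)
    d[start] = -1.0  # mark as taken so it is never chosen again

    for _ in range(k - 1):
        nxt = int(np.argmax(d))  # point farthest from the current subset
        selected.append(nxt)
        d = np.minimum(d, np.linalg.norm(Z - Z[nxt], axis=1))
        d[nxt] = -1.0

    return np.array(selected)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    full = rng.normal(size=(10_000, 5))  # synthetic covariates
    idx = greedy_maximin_subdata(full, k=100)
    print(idx.shape)
```

Each step costs O(n) distance evaluations, so selecting k points takes O(nk) time in total, which is what makes a heuristic of this kind usable on large data where a complete search over subsets is infeasible.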
Original language | English (US) |
---|---|
Article number | 82 |
Journal | Journal of Statistical Theory and Practice |
Volume | 15 |
Issue number | 4 |
State | Published - Dec 2021 |
Externally published | Yes |
All Science Journal Classification (ASJC) codes
- Statistics and Probability
Keywords
- Massive data
- Maximin distance design
- Model-independent method
- Space-filling design