TY - JOUR
T1 - Multiple testing for pattern identification, with applications to microarray time-course experiments
AU - Sun, Wenguang
AU - Wei, Zhi
N1 - Funding Information:
Wenguang Sun is Assistant Professor, Department of Statistics, North Carolina State University, Raleigh, NC 27606-8203 (E-mail: [email protected]). Zhi Wei is Assistant Professor, Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102. This work was supported in part by National Science Foundation grant DMS-10-07675. We thank the Associate Editor and two referees for detailed and constructive comments which led to a much improved article.
PY - 2011/3
Y1 - 2011/3
N2 - In time-course experiments, it is often desirable to identify genes that exhibit a specific pattern of differential expression over time and thus gain insights into the mechanisms of the underlying biological processes. Two challenging issues in the pattern identification problem are: (i) how to combine the simultaneous inferences across multiple time points and (ii) how to control the multiplicity while accounting for the strong dependence. We formulate a compound decision-theoretic framework for set-wise multiple testing and propose a data-driven procedure that aims to minimize the missed set rate subject to a constraint on the false set rate. The hidden Markov model proposed in Yuan and Kendziorski (2006) is generalized to capture the temporal correlation in the gene expression data. Both theoretical and numerical results are presented to show that our data-driven procedure controls the multiplicity, provides an optimal way of combining simultaneous inferences across multiple time points, and greatly improves the conventional combined p-value methods. In particular, we demonstrate our method in an application to a study of systemic inflammation in humans for detecting early and late response genes.
KW - Compound decision problem
KW - Conjunction and partial conjunction tests
KW - False discovery rate
KW - Hidden Markov models
KW - Microarray time-course data
KW - Simultaneous set-wise testing
UR - http://www.scopus.com/inward/record.url?scp=79954483579&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79954483579&partnerID=8YFLogxK
U2 - 10.1198/jasa.2011.ap09587
DO - 10.1198/jasa.2011.ap09587
M3 - Article
AN - SCOPUS:79954483579
SN - 0162-1459
VL - 106
SP - 73
EP - 88
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
IS - 493
ER -