TY - JOUR
T1 - Predictive performance of multi-model ensemble forecasts of COVID-19 across European nations
AU - Sherratt, Katharine
AU - Gruson, Hugo
AU - Grah, Rok
AU - Johnson, Helen
AU - Niehus, Rene
AU - Prasse, Bastian
AU - Sandmann, Frank
AU - Deuschel, Jannik
AU - Wolffram, Daniel
AU - Abbott, Sam
AU - Ullrich, Alexander
AU - Gibson, Graham
AU - Ray, Evan L.
AU - Reich, Nicholas G.
AU - Sheldon, Daniel
AU - Wang, Yijin
AU - Wattanachit, Nutcha
AU - Wang, Lijing
AU - Trnka, Jan
AU - Obozinski, Guillaume
AU - Sun, Tao
AU - Thanou, Dorina
AU - Pottier, Loic
AU - Krymova, Ekaterina
AU - Meinke, Jan H.
AU - Barbarossa, Maria Vittoria
AU - Leithauser, Neele
AU - Mohring, Jan
AU - Schneider, Johanna
AU - Wlazlo, Jaroslaw
AU - Fuhrmann, Jan
AU - Lange, Berit
AU - Rodiah, Isti
AU - Baccam, Prasith
AU - Gurung, Heidi
AU - Stage, Steven
AU - Suchoski, Bradley
AU - Budzinski, Jozef
AU - Walraven, Robert
AU - Villanueva, Inmaculada
AU - Tucek, Vit
AU - Smid, Martin
AU - Zajicek, Milan
AU - Perez Alvarez, Cesar
AU - Reina, Borja
AU - Bosse, Nikos I.
AU - Meakin, Sophie R.
AU - Castro, Lauren
AU - Fairchild, Geoffrey
AU - Michaud, Isaac
AU - Osthus, Dave
AU - Alaimo Di Loro, Pierfrancesco
AU - Maruotti, Antonello
AU - Eclerova, Veronika
AU - Kraus, Andrea
AU - Kraus, David
AU - Pribylova, Lenka
AU - Bertsimas, Dimitris
AU - Li, Michael Lingzhi
AU - Soni, Saksham
AU - Dehning, Jonas
AU - Mohr, Sebastian
AU - Priesemann, Viola
AU - Redlarski, Grzegorz
AU - Bejar, Benjamin
AU - Ardenghi, Giovanni
AU - Parolini, Nicola
AU - Ziarelli, Giovanni
AU - Bock, Wolfgang
AU - Heyder, Stefan
AU - Hotz, Thomas
AU - Singh, David E.
AU - Guzman-Merino, Miguel
AU - Aznarte, Jose L.
AU - Morina, David
AU - Alonso, Sergio
AU - Alvarez, Enric
AU - Lopez, Daniel
AU - Prats, Clara
AU - Burgard, Jan Pablo
AU - Rodloff, Arne
AU - Zimmermann, Tom
AU - Kuhlmann, Alexander
AU - Zibert, Janez
AU - Pennoni, Fulvia
AU - Divino, Fabio
AU - Catala, Marti
AU - Lovison, Gianfranco
AU - Giudici, Paolo
AU - Tarantino, Barbara
AU - Bartolucci, Francesco
AU - Jona Lasinio, Giovanna
AU - Mingione, Marco
AU - Farcomeni, Alessio
AU - Srivastava, Ajitesh
AU - Montero-Manso, Pablo
AU - Adiga, Aniruddha
AU - Hurt, Benjamin
AU - Lewis, Bryan
AU - Marathe, Madhav
AU - Porebski, Przemyslaw
AU - Venkatramanan, Srinivasan
AU - Bartczuk, Rafal P.
AU - Dreger, Filip
AU - Gambin, Anna
AU - Gogolewski, Krzysztof
AU - Gruziel-Slomka, Magdalena
AU - Krupa, Bartosz
AU - Moszyński, Antoni
AU - Niedzielewski, Karol
AU - Nowosielski, Jedrzej
AU - Radwan, Maciej
AU - Rakowski, Franciszek
AU - Semeniuk, Marcin
AU - Szczurek, Ewa
AU - Zielinski, Jakub
AU - Kisielewski, Jan
AU - Pabjan, Barbara
AU - Kirsten, Holger
AU - Kheifetz, Yuri
AU - Scholz, Markus
AU - Biecek, Przemyslaw
AU - Bodych, Marcin
AU - Filinski, Maciej
AU - Idzikowski, Radoslaw
AU - Krueger, Tyll
AU - Ozanski, Tomasz
AU - Bracher, Johannes
AU - Funk, Sebastian
N1 - Publisher Copyright:
© 2023, eLife Sciences Publications Ltd. All rights reserved.
PY - 2023/4/21
Y1 - 2023/4/21
N2 - Background: Short-term forecasts of infectious disease burden can contribute to situational awareness and aid capacity planning. Based on best practice in other fields and recent insights in infectious disease epidemiology, one can maximise the predictive performance of such forecasts if multiple models are combined into an ensemble. Here, we report on the performance of ensembles in predicting COVID-19 cases and deaths across Europe between 08 March 2021 and 07 March 2022. Methods: We used open-source tools to develop a public European COVID-19 Forecast Hub. We invited groups globally to contribute weekly forecasts for COVID-19 cases and deaths reported by a standardised source for 32 countries over the next 1-4 weeks. Teams submitted forecasts from March 2021 using standardised quantiles of the predictive distribution. Each week we created an ensemble forecast, where each predictive quantile was calculated as the equally-weighted average (initially the mean and then from 26th July the median) of all individual models' predictive quantiles. We measured the performance of each model using the relative Weighted Interval Score (WIS), comparing models' forecast accuracy relative to all other models. We retrospectively explored alternative methods for ensemble forecasts, including weighted averages based on models' past predictive performance. Results: Over 52 weeks, we collected forecasts from 48 unique models. We evaluated 29 models' forecast scores in comparison to the ensemble model. We found a weekly ensemble had a consistently strong performance across countries over time. Across all horizons and locations, the ensemble performed better on relative WIS than 83% of participating models' forecasts of incident cases (with a total N=886 predictions from 23 unique models), and 91% of participating models' forecasts of deaths (N=763 predictions from 20 models). 
Across a 1-4 week time horizon, ensemble performance declined with longer forecast periods when forecasting cases, but remained stable over 4 weeks for incident death forecasts. In every forecast across 32 countries, the ensemble outperformed most contributing models when forecasting either cases or deaths, frequently outperforming all of its individual component models. Among several choices of ensemble methods we found that the most influential and best choice was to use a median average of models instead of using the mean, regardless of methods of weighting component forecast models. Conclusions: Our results support combining forecasts from individual models into an ensemble in order to improve predictive performance across epidemiological targets and populations during infectious disease epidemics. Our findings further suggest that median ensemble methods yield better predictive performance than ones based on means. Our findings also highlight that forecast consumers should place more weight on incident death forecasts than incident case forecasts at forecast horizons greater than 2 weeks. Funding: AA, BH, BL, LWa, MMa, PP, SV funded by National Institutes of Health (NIH) Grant 1R01GM109718, NSF BIG DATA Grant IIS-1633028, NSF Grant No.: OAC-1916805, NSF Expeditions in Computing Grant CCF-1918656, CCF-1917819, NSF RAPID CNS-2028004, NSF RAPID OAC-2027541, US Centers for Disease Control and Prevention 75D30119C05935, a grant from Google, University of Virginia Strategic Investment Fund award number SIF160, Defense Threat Reduction Agency (DTRA) under Contract No. HDTRA1-19-D-0007, and respectively Virginia Dept of Health Grant VDH-21-501-0141, VDH-21-501-0143, VDH-21-501-0147, VDH-21-501-0145, VDH-21-501-0146, VDH-21-501-0142, VDH-21-501-0148. AF, AMa, GL funded by SMIGE - Modelli statistici inferenziali per governare l'epidemia, FISR 2020-Covid-19 I Fase, FISR2020IP-00156, Codice Progetto: PRJ-0695. 
AM, BK, FD, FR, JK, JN, JZ, KN, MG, MR, MS, RB funded by Ministry of Science and Higher Education of Poland with grant 28/WFSN/2021 to the University of Warsaw. BRe, CPe, JLAz funded by Ministerio de Sanidad/ISCIII. BT, PG funded by PERISCOPE European H2020 project, contract number 101016233. CP, DL, EA, MC, SA funded by European Commission - Directorate-General for Communications Networks, Content and Technology through the contract LC-01485746, and Ministerio de Ciencia, Innovacion y Universidades and FEDER, with the project PGC2018-095456-B-I00. DE., MGu funded by Spanish Ministry of Health / REACT-UE (FEDER). DO, GF, IMi, LC funded by Laboratory Directed Research and Development program of Los Alamos National Laboratory (LANL) under project number 20200700ER. DS, ELR, GG, NGR, NW, YW funded by National Institute of General Medical Sciences (R35GM119582; the content is solely the responsibility of the authors and does not necessarily represent the official views of NIGMS or the National Institutes of Health). FB, FP funded by InPresa, Lombardy Region, Italy. HG, KS funded by European Centre for Disease Prevention and Control. IV funded by Agencia de Qualitat i Avaluacio Sanitaries de Catalunya (AQuAS) through contract 2021-021OE. JDe, SMo, VP funded by Netzwerk Universitatsmedizin (NUM) project egePan (01KX2021). JPB, SH, TH funded by Federal Ministry of Education and Research (BMBF; grant 05M18SIA). KH, MSc, YKh funded by Project SaxoCOV, funded by the German Free State of Saxony. Presentation of data, model results and simulations also funded by the NFDI4Health Task Force COVID-19 (https://www.nfdi4health.de/task-force-covid-19-2) within the framework of a DFG-project (LO-342/17-1). 
LP, VE funded by Mathematical and Statistical modelling project (MUNI/A/1615/2020), Online platform for real-time monitoring, analysis and management of epidemic situations (MUNI/11/02202001/2020); VE also supported by RECETOX research infrastructure (Ministry of Education, Youth and Sports of the Czech Republic: LM2018121), the CETOCOEN EXCELLENCE (CZ.02.1.01/0.0/0.0/17-043/0009632), RECETOX RI project (CZ.02.1.01/0.0/0.0/16-013/0001761). NIB funded by Health Protection Research Unit (grant code NIHR200908). SAb, SF funded by Wellcome Trust (210758/Z/18/Z).
KW - COVID-19
KW - Europe
KW - ensemble
KW - epidemiology
KW - forecast
KW - global health
KW - modelling
KW - prediction
UR - http://www.scopus.com/inward/record.url?scp=85158061738&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85158061738&partnerID=8YFLogxK
U2 - 10.7554/eLife.81916
DO - 10.7554/eLife.81916
M3 - Article
C2 - 37083521
AN - SCOPUS:85158061738
SN - 2050-084X
VL - 12
JO - eLife
JF - eLife
ER -