Predictive performance of multi-model ensemble forecasts of COVID-19 across European nations

K. Sherratt, H. Gruson, R. Grah, H. Johnson, R. Niehus, B. Prasse, F. Sandmann, J. Deuschel, D. Wolffram, S. Abbott, A. Ullrich, G. Gibson, E. L. Ray, N. G. Reich, D. Sheldon, Y. Wang, N. Wattanachit, L. Wang, J. Trnka, G. Obozinski, T. Sun, D. Thanou, L. Pottier, E. Krymova, J. H. Meinke, M. V. Barbarossa, N. Leithäuser, J. Mohring, J. Schneider, J. Włazło, J. Fuhrmann, B. Lange, I. Rodiah, P. Baccam, H. Gurung, S. Stage, B. Suchoski, J. Budzinski, R. Walraven, I. Villanueva, V. Tuček, M. Šmíd, M. Zajíček, C. Pérez Álvarez, B. Reina, N. I. Bosse, S. Meakin, L. Castro, G. Fairchild, I. Michaud, D. Osthus, P. Alaimo Di Loro, A. Maruotti, V. Eclerová, A. Kraus, D. Kraus, L. Pribylova, B. Dimitris, M. L. Li, S. Saksham, J. Dehning, S. Mohr, V. Priesemann, G. Redlarski, B. Bejar, G. Ardenghi, N. Parolini, G. Ziarelli, W. Bock, S. Heyder, T. Hotz, D. E. Singh, M. Guzman-Merino, J. L. Aznarte, D. Moriña, S. Alonso, E. Álvarez, D. López, C. Prats, J. P. Burgard, A. Rodloff, T. Zimmermann, A. Kuhlmann, J. Zibert, F. Pennoni, F. Divino, M. Català, G. Lovison, P. Giudici, B. Tarantino, F. Bartolucci, G. Jona Lasinio, M. Mingione, A. Farcomeni, A. Srivastava, P. Montero-Manso, A. Adiga, B. Hurt, B. Lewis, M. Marathe, P. Porebski, S. Venkatramanan, R. Bartczuk, F. Dreger, A. Gambin, K. Gogolewski, M. Gruziel-Słomka, B. Krupa, A. Moszynski, K. Niedzielewski, J. Nowosielski, M. Radwan, F. Rakowski, M. Semeniuk, E. Szczurek, J. Zieliński, J. Kisielewski, B. Pabjan, Y. Kheifetz, H. Kirsten, M. Scholz, P. Biecek, M. Bodych, M. Filinski, R. Idzikowski, T. Krueger, T. Ozanski, J. Bracher, S. Funk

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

Background: Short-term forecasts of infectious disease contribute to situational awareness and capacity planning. Based on best practice in other fields and recent insights in infectious disease epidemiology, one can maximise forecasts’ predictive performance by combining independent models into an ensemble. Here we report the performance of ensemble predictions of COVID-19 cases and deaths across Europe from March 2021 to March 2022.

Methods: We created the European COVID-19 Forecast Hub, an online open-access platform where modellers upload weekly forecasts for 32 countries with results publicly visualised and evaluated. We created a weekly ensemble forecast from the equally-weighted average across individual models’ predictive quantiles. We measured forecast accuracy using a baseline and relative Weighted Interval Score (rWIS). We retrospectively explored ensemble methods, including weighting by past performance.

Results: We collected weekly forecasts from 48 models, of which we evaluated 29 models alongside the ensemble model. The ensemble had a consistently strong performance across countries over time, performing better on rWIS than 91% of forecasts for deaths (N=763 predictions from 20 models), and 83% of forecasts for cases (N=886 predictions from 23 models). Performance remained stable over a 4-week horizon for death forecasts but declined with longer horizons for cases. Among ensemble methods, the most influential choice came from using a median average instead of the mean, regardless of weighting component models.

Conclusions: Our results support combining independent models into an ensemble forecast to improve epidemiological predictions, and suggest that median averages yield better performance than methods based on means. We highlight that forecast consumers should place more weight on incident death forecasts than case forecasts at horizons greater than two weeks.
Funding: European Commission, Ministerio de Ciencia, Innovación y Universidades, FEDER; Agència de Qualitat i Avaluació Sanitàries de Catalunya; Netzwerk Universitätsmedizin; Health Protection Research Unit; Wellcome Trust; European Centre for Disease Prevention and Control; Ministry of Science and Higher Education of Poland; Federal Ministry of Education and Research; Los Alamos National Laboratory; German Free State of Saxony; NCBiR; FISR 2020 Covid-19 I Fase; Spanish Ministry of Health / REACT-UE (FEDER); National Institutes of General Medical Sciences; Ministerio de Sanidad/ISCIII; PERISCOPE European H2020; PERISCOPE European H2021; InPresa; National Institutes of Health; NSF; US Centers for Disease Control and Prevention; Google; University of Virginia; Defense Threat Reduction Agency

Background

Epidemiological forecasts make quantitative statements about a disease outcome in the near future. Forecasting targets can include measures of prevalent or incident disease and its severity, for some population over a specified time horizon. Researchers, policy makers, and the general public have used such forecasts to understand and respond to the global outbreaks of COVID-19 [1]–[3]. At the same time, forecasters use a variety of methods and models for creating and publishing forecasts, which differ both in how they define the forecast outcome and in how they report the probability distribution of outcomes [4], [5].

Within Europe, comparing forecasts across both models and countries can support a range of national policy needs simultaneously. European public health professionals operate across national, regional, and continental scales, with strong existing policy networks in addition to rich patterns of cross-border migration influencing epidemic dynamics. A majority of European countries also cooperate in setting policy with inter-governmental European bodies such as the European Centre for Disease Prevention and Control (ECDC).
In this case, a consistent approach to forecasting across the continent as a whole can help to accurately inform cross-European monitoring, analysis, and guidance [3]. At a regional level, multi-country forecasts can support a better understanding of the impact of regional migration networks. Meanwhile, where there is limited capacity for infectious disease forecasting at a national level, forecasters generating multi-country results can provide an otherwise-unavailable opportunity for forecasts to inform national situational awareness. Some independent forecasting models have sought to address this by producing multi-country results [6]–[9].

Variation in forecast methods and presentation makes it difficult to compare predictive performance between forecast models, and from there to derive objective arguments for using one forecast over another. This confounds the selection of a single representative forecast and reduces the reliability of the evidence base for decisions based on forecasts.

A “forecast hub” is a centralised effort to improve the transparency and usefulness of forecasts, by standardising and collating the work of many independent teams producing forecasts [10]. A hub sets a commonly agreed-upon structure for forecast targets, such as type of disease event, spatio-temporal units, or the set of quantiles of the probability distribution to include from probabilistic forecasts. For instance, a hub may collect predictions of the total number of cases reported in a given country for each day in the next two weeks. Forecasters can adopt this format and contribute forecasts for centralised storage in the public domain. This shared infrastructure allows forecasts produced from diverse teams and methods to be visualised and quantitatively compared on a like-for-like basis, which can strengthen public and policy use of disease forecasts.
The underlying approach to creating a forecast hub was pioneered in climate modelling and adapted for collaborative epidemiological forecasts of dengue [11] and influenza in the USA [10], [12]. This infrastructure was adapted for forecasts of short-term COVID-19 cases and deaths in the US [13], [14], prompting similar efforts in some European countries [15]–[17].

Standardising forecasts allows for combining multiple forecasts into a single ensemble with the potential for an improved predictive performance. Evidence from previous efforts in multi-model infectious disease forecasting suggests that forecasts from an ensemble of models can be consistently high performing compared to any one of the component models [11], [12], [18]. Elsewhere, weather forecasting has a long-standing use of building ensembles of models using diverse methods with standardised data and formatting in order to improve performance [19], [20].

The European COVID-19 Forecast Hub [21] is a project to collate short-term forecasts of COVID-19 across 32 countries in the European region. The Hub is funded and supported by the ECDC, with the primary aim of providing reliable information about the near-term epidemiology of the COVID-19 pandemic to the research and policy communities and the general public [3]. Second, the Hub aims to create infrastructure for storing and analysing epidemiological forecasts made in real time by diverse research teams and methods across Europe. Third, the Hub aims to maintain a community of infectious disease modellers underpinned by open science principles.

We started formally collating and combining contributions to the European Forecast Hub in March 2021. Here, we investigate the predictive performance of an ensemble of all forecasts contributed to the Hub in real time each week, as well as the performance of variations of ensemble methods created retrospectively.
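To make the idea of combining quantile forecasts concrete, the following is a minimal sketch of an unweighted quantile-wise ensemble, computed once with the mean and once with the median. It is an illustration under stated assumptions, not the Hub's own code: the model names, quantile levels, and values are invented toy data, and `quantile_ensemble` is a hypothetical helper.

```python
from statistics import mean, median

# Hypothetical toy data: three models predict weekly deaths for one country
# and horizon, each reporting the same set of predictive quantile levels.
forecasts = {
    "model_a": {0.25: 80.0, 0.5: 100.0, 0.75: 120.0},
    "model_b": {0.25: 90.0, 0.5: 110.0, 0.75: 140.0},
    "model_c": {0.25: 200.0, 0.5: 400.0, 0.75: 900.0},  # an outlying model
}


def quantile_ensemble(forecasts, combine):
    """Combine models quantile by quantile with an unweighted summary function."""
    # Assumes every model reports exactly the same quantile levels.
    levels = sorted(next(iter(forecasts.values())))
    return {q: combine([f[q] for f in forecasts.values()]) for q in levels}


mean_ensemble = quantile_ensemble(forecasts, mean)
median_ensemble = quantile_ensemble(forecasts, median)

for q in sorted(mean_ensemble):
    print(f"q={q}: mean={mean_ensemble[q]:.1f}, median={median_ensemble[q]:.1f}")
```

In this toy case the single outlying model pulls every quantile of the mean ensemble upwards, while the median ensemble stays close to the two models that agree. This is one plausible reason a median could be more robust than a mean, though the example by itself does not demonstrate the study's result.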
Methods

We developed infrastructure to host and analyse prospective forecasts of COVID-19 cases and deaths. The infrastructure is compatible with equivalent research software from the US [22], [23] and the German and Polish [24] COVID-19 Forecast Hubs, and is easy to replicate for new forecasting collaborations. All data and code for this analysis are publicly available on GitHub [25].
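The abstract reports accuracy in terms of the weighted interval score (WIS) relative to a baseline. As a rough illustration, the sketch below computes the WIS for a forecast given as a median plus central prediction intervals, using the standard interval-score definition, and then divides by a baseline's score. The function names, interval levels, and numbers are hypothetical, and the simple ratio shown here is an assumption for illustration; the relative score used in the study may be computed differently.

```python
def interval_score(lower, upper, y, alpha):
    """Interval score for a central (1 - alpha) prediction interval."""
    score = upper - lower  # width penalises overly wide intervals
    if y < lower:
        score += (2.0 / alpha) * (lower - y)  # observation below the interval
    elif y > upper:
        score += (2.0 / alpha) * (y - upper)  # observation above the interval
    return score


def weighted_interval_score(median, intervals, y):
    """WIS from a median and a dict mapping alpha -> (lower, upper)."""
    total = 0.5 * abs(y - median)
    for alpha, (lower, upper) in intervals.items():
        total += (alpha / 2.0) * interval_score(lower, upper, y, alpha)
    return total / (len(intervals) + 0.5)


# Hypothetical numbers: observed value, one model forecast, one baseline forecast.
observed = 130.0
model_wis = weighted_interval_score(
    110.0, {0.5: (90.0, 140.0), 0.1: (70.0, 180.0)}, observed
)
baseline_wis = weighted_interval_score(
    100.0, {0.5: (60.0, 150.0), 0.1: (20.0, 260.0)}, observed
)

# Simplified relative score (an assumption): values below 1 mean the model
# scored better than the baseline on this single forecast.
relative_wis = model_wis / baseline_wis
print(round(model_wis, 2), round(baseline_wis, 2), round(relative_wis, 3))
```

Lower WIS is better: the score rewards narrow intervals that still contain the observation, so a forecast can lose either by being too wide or by missing the observed value.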

Original language: English (US)
Article number: e81916
Journal: eLife
Volume: 12
State: Published - Apr 2023
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • General Neuroscience
  • General Biochemistry, Genetics and Molecular Biology
  • General Immunology and Microbiology
