- Ensemble forecasting is a popular choice to improve predictions.
- The method has been widely used to model various aspects of the current pandemic.
- Ensembles may be our best option to predict future disease outbreaks.
An explosion of COVID-19 literature in the last year has been accompanied by a corresponding increase in the use of ensemble forecasting to model various aspects of the pandemic. Ensemble forecasting combines models developed independently by different modeling groups into one and is widely used to improve predictions. Compared to specialized modeling techniques, ensemble modeling has been shown to be more robust for complex systems like pandemics. It is also better at providing a complete range of possible outcomes [2] and performs better when dealing with high-dimensional data [3].
What is an Ensemble Forecast?
Image: Simple example of 3 probabilistic forecasts, each with weight w and probability p. A linear combination of these weighted forecasts leads to one ensemble forecast.
Ensemble forecasting is mostly used to improve on the performance of any single model, or to reduce the probability of choosing a single poor model from a set. Multiple models with independent projections are combined into one improved forecast to guide decision-making. The goal isn’t to provide a definite answer, but rather to give simple and informative recommendations [2].
The individual models can be agent-based, compartmental, or statistical. One common method is the multimodel ensemble, the simple combination of a set of models to improve on the prediction capability of a single model. An alternative is to use one single model and perturb, or change, the initial conditions to produce a set of different results. Hurricane projections often use the latter approach: the familiar “spaghetti models” are the result of such initial perturbations.
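The perturbation approach can be sketched in a few lines of Python. This is a minimal illustration, not a calibrated epidemic model: the discrete-time SIR update, the parameter values (`beta`, `gamma`, `population`), and the function names are all assumptions chosen for the example.

```python
import random


def simulate_infected(initial_infected, population=10_000, beta=0.3,
                      gamma=0.1, days=30):
    """Discrete-time SIR model; returns the number infected after `days` days.

    All parameter values are illustrative assumptions, not fitted estimates.
    """
    susceptible = population - initial_infected
    infected = float(initial_infected)
    for _ in range(days):
        new_infections = beta * susceptible * infected / population
        new_recoveries = gamma * infected
        susceptible -= new_infections
        infected += new_infections - new_recoveries
    return infected


def perturbed_ensemble(base_infected=10.0, n_members=50, spread=0.2, seed=0):
    """Run the same model from randomly perturbed initial conditions."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_members):
        # Perturb the starting number of infections by up to +/- `spread`.
        start = base_infected * (1 + rng.uniform(-spread, spread))
        results.append(simulate_infected(start))
    return results


members = perturbed_ensemble()
mean = sum(members) / len(members)
print(f"day-30 infections: min={min(members):.0f}, "
      f"mean={mean:.0f}, max={max(members):.0f}")
```

Each ensemble member is one "strand of spaghetti": the same model, started from a slightly different state. The spread between the minimum and maximum gives a rough sense of how sensitive the projection is to uncertainty in the starting conditions.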
There are many ways to go about this process, from taking a simple average to using weighted linear combinations. Nonlinear weighting schemes improve the accuracy of input models in basic research, while equal-weighting schemes or simple ensemble averages are usually the method of choice in operational forecasting. More complex approaches include the Ensemble Kalman Filter (EnKF), a Monte Carlo approximation of the Kalman filter which produces better results for nonlinear models with uncertain initial states.
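As a rough sketch of the two simplest schemes, the snippet below combines three probabilistic forecasts first with equal weights and then with unequal weights. The function name `combine_forecasts` and the probabilities and weights are hypothetical values made up for illustration.

```python
def combine_forecasts(probabilities, weights=None):
    """Linear combination of forecast probabilities.

    With no weights, every forecast counts equally (a simple average).
    Weights are normalized so the result stays a valid probability.
    """
    if weights is None:
        weights = [1.0] * len(probabilities)
    if len(weights) != len(probabilities):
        raise ValueError("need exactly one weight per forecast")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * p for w, p in zip(weights, probabilities)) / total


# Three hypothetical forecasts of the same event.
forecasts = [0.2, 0.5, 0.8]

equal_weighted = combine_forecasts(forecasts)
custom_weighted = combine_forecasts(forecasts, weights=[0.5, 0.3, 0.2])

print(f"equal weights:  {equal_weighted:.2f}")
print(f"custom weights: {custom_weighted:.2f}")
```

How the weights are chosen is the hard part in practice. This sketch takes them as given, whereas a real system would typically estimate them from each model's past performance.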
In data science, ensemble machine learning methods such as bagging (which includes random forests), boosting, and stacking are popular:
- Bagging (bootstrap aggregating), which includes random forests, decreases variance in models. Different training data subsets are randomly drawn with replacement. Each subset is then used to train a different classifier of the same type. The classifiers are combined by “majority vote”.
- Boosting (for example, AdaBoost or gradient-boosted decision trees) decreases bias (systematic error) in models [7]. The ensemble is built by resampling, which works to provide the most informative training data for each consecutive classifier. As in bagging, the classifiers are combined by vote, although methods such as AdaBoost weight each classifier's vote by its accuracy.
- Stacking combines multiple classification or regression models. A learning algorithm is trained to combine predictions from several other learning algorithms.
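To make the bagging recipe concrete, here is a minimal pure-Python sketch: draw bootstrap samples with replacement, train one classifier of the same type on each, and combine them by majority vote. The base classifier (a one-feature threshold rule, often called a decision stump), the function names, and the toy data are assumptions made for illustration. In practice a library implementation would normally be used.

```python
import random
from collections import Counter


def train_stump(points):
    """Fit a threshold rule to (x, label) pairs with labels 0 or 1."""
    best = None
    for threshold in sorted({x for x, _ in points}):
        for above_label in (0, 1):
            # Count training errors for "predict above_label when x >= threshold".
            errors = sum(
                1 for x, y in points
                if (above_label if x >= threshold else 1 - above_label) != y
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, above_label)
    _, threshold, above_label = best
    return lambda x: above_label if x >= threshold else 1 - above_label


def bagging_fit(points, n_models=25, seed=0):
    """Train one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(points) for _ in points]
        models.append(train_stump(sample))
    return models


def bagging_predict(models, x):
    """Combine the individual classifiers by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Made-up toy data: label 1 for larger x, with some overlap near the middle.
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0), (0.55, 0),
        (0.45, 1), (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]

models = bagging_fit(data)
print(bagging_predict(models, 0.05), bagging_predict(models, 0.95))
```

Each stump sees a slightly different sample, so the stumps disagree near the overlapping points in the middle. Voting averages out that disagreement, which is the variance reduction bagging is known for.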
On the Future of Ensemble Methods for Pandemic Forecasting
At first glance, ensemble forecasts appear to be the solution to some major pitfalls with modeling. For example, they remove the difficult task of choosing a single model from a variety: why choose one when all can be combined? But ensemble modeling itself is relatively new, and using this type of modeling for pandemic data is in its infancy.
Some notable applications from the COVID-19 pandemic (all published in the last year) include:
- Diagnosing COVID-19 from routine blood tests: One study achieved a remarkable accuracy of 99.88% in distinguishing positive from negative cases. [4]
- Modeling COVID-19 transmission: One study, using data from 17 cities in Hubei province, China, showed remarkable accuracy in modeling the spread of the pandemic. [5]
- Forecasting COVID-19 outbreaks: One study showed that stacked deep ensemble learning models perform better than individual deep learning models, improving the predictive accuracy for confirmed cases. [6]
- Identifying suitable sites for COVID-19 vaccine trials: Data-driven models can contribute to fast-tracking trials in pandemics “where every day counts”. [2]
Ensemble modeling has also been used to model various other aspects of infectious diseases like dengue, Ebola, and the flu. [2]
While historical data can be used to provide input data and test results, new variants of COVID-19 (or any other pandemic disease, for that matter) do not necessarily behave in the same way as the original strain. Additionally, as with all modeling techniques, there are “bad” ensembles and there are “good” ones [8]. The only way to sort the good from the bad is to compare projections with subsequently observed data [2]. When lives are at stake, that may not be good enough, but it may be the best we have for now.
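A minimal sketch of that comparison, assuming projections and observations are simple lists of counts: score each model, and the ensemble built from them, by mean absolute error once the observed data arrive. The model names and all numbers below are invented for illustration and do not come from any real forecast.

```python
def mean_absolute_error(projected, observed):
    """Average absolute gap between projected and observed values."""
    if len(projected) != len(observed):
        raise ValueError("series must have the same length")
    return sum(abs(p - o) for p, o in zip(projected, observed)) / len(observed)


# Hypothetical weekly counts observed after the projections were made.
observed = [100, 120, 150, 170]

# Hypothetical projections from two individual models.
projections = {
    "model_a": [90, 110, 160, 200],
    "model_b": [120, 130, 140, 150],
}

# Equal-weight ensemble: average the two models week by week.
projections["ensemble_mean"] = [
    sum(values) / len(values)
    for values in zip(projections["model_a"], projections["model_b"])
]

scores = {
    name: mean_absolute_error(series, observed)
    for name, series in projections.items()
}
ranking = sorted(scores, key=scores.get)

for name in ranking:
    print(f"{name}: MAE = {scores[name]:.1f}")
```

In this made-up case the two models err in opposite directions, so the average beats both. That is not guaranteed: when the input models share the same bias, the ensemble inherits it, which is why the check against observed data matters.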
References
[1] COVID-19 Forecasts: Deaths.
[2] Ensemble forecast modeling for the design of COVID-19 vaccine effic…
[3] Applying Machine Learning Models with An Ensemble Approach for…
[4] Ensemble learning model for diagnosing COVID-19 from routine blood …
[5] Estimating Parameters of Two-Level Individual-Level Models of the C…
[6] Deep Ensemble Learning Method to Forecast COVID-19 Outbreak
[7] The Importance of Ensemble Techniques for Operational Space Weather…
[8] Ensemble forecasting and data assimilation: two problems with the s…