Using AI to Predict Influenza, C19 and Other Infectious Disease Rates


Infectious diseases are a leading cause of death worldwide. In 2007, the World Health Organization (WHO) issued a report with a foreboding warning of a two-fold emergent threat: new infectious diseases are being discovered, and existing diseases are spreading, at rates higher than ever recorded in modern history[1]. Fast forward to today and the world finds itself in a Coronavirus Disease 2019 (COVID-19) pandemic. Scientists, doctors, health practitioners, and public policy makers are scrambling to deliver countermeasures to mitigate the devastating effects and spread of the disease. Given the large scale and highly negative impact of COVID-19 and other infectious diseases, we at Xyonix naturally wonder:

How can artificial intelligence be used to mitigate infectious disease propagation? 

The answer, in part, lies in our ability to collect relevant data and uncover the patterns that emerge in order to forecast future events. Like volcanic eruptions, earthquakes, and major terrorist attacks, a pandemic is considered a rare event: the 20th century saw only three influenza pandemics, the Spanish (1918), Asian (1957), and Hong Kong (1968)[3], and COVID-19 now joins that short list. The low frequency of such rare events makes model forecasting and evaluation particularly difficult, since the number of observations is usually too small to make reliable statistical inferences regarding performance[4]. In other words, rare event data require special handling, and traditional statistical modeling techniques won't cut it. However, we hypothesize that COVID-19 is similar to seasonal influenza in how the disease spreads[16]. Thus, we may benefit from studying historical flu patterns when forecasting additional waves of COVID-19 propagation as quarantine and confinement restrictions are lifted.

Seasonal influenza is a commonly occurring event that has widespread health implications. WHO estimates that seasonal influenza is associated with 250,000 to 500,000 deaths worldwide per year[5]. As the timing and intensity of seasonal influenza can vary widely from region to region, the ability to accurately forecast future regional events is paramount in preparing communities for adverse impacts and combating its spread[2]. In this context,

AI can be used to generate regional forecasts of infectious disease rates that, in turn, empower government and other leaders to make prudent social distancing and other preemptive modifications.

CASE STUDY IDENTIFICATION

As a case study, we will look at influenza data supplied by the Centers for Disease Control and Prevention. The data come from ILINet, a national outpatient influenza illness surveillance program: healthcare providers that participate in the program (ILINet providers) report information on patient visits for influenza-like illness (ILI) to the Influenza-like Illness Surveillance Network (ILINet). Figure 1 below shows the histories of ILI, expressed as a percentage of patient population, for a sampling of states from Oct 2010 to present day, Apr 2020.

Fig. 1: Weekly Influenza-like illnesses (ILI) expressed as a percentage of patient population for selected states. The red shaded areas represent the estimated time span of public exposure to the COVID-19 disease, Dec 2019 to present.

Our goal is to use ILI data to train a model that will accurately predict future seasonal flu levels. But what model should we use? To keep things simple, we restrict ourselves to univariate modeling, i.e., one where we only use the ILI series itself to forecast future values. We will consider multivariate forecasting models of infectious disease in a future blog post. To help us identify the best univariate approach, it is helpful to briefly discuss conventional models and how deep learning has recently come into play as a potentially superior technique.

Time series are often thought of as being comprised of trend, seasonal, cyclical, and irregular noise components [6]. For example, the strong seasonal component and relatively weak irregular noise component of historical influenza data are visually evident in Fig. 1. Classic forecasting models, such as Holt-Winters and seasonal autoregressive moving average (SARIMA) models, seek to estimate parameters of these components based on historical patterns exhibited in the data. Given the recent success of artificial intelligence in areas such as computer vision and natural language processing, one might naturally assume that deep learning (DL) models would fare better than classical models. However in time series forecasting problems, DL forecasting models have historically struggled to outperform classical statistical techniques [7,8,9]. That recently changed when a hybrid model (modern recurrent neural network + classic Holt-Winters) won the prestigious M4 time series competition in 2018[10, 11]. Then in 2019,  Oreshkin et al. developed a pure DL architecture known as N-BEATS (neural basis expansion analysis for interpretable times series) that outperformed the M4 winner with an impressive 3% improvement in standard competition metrics[7]. The authors claim that N-BEATS is attractive for many reasons, including:
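To make the idea of exploiting seasonal structure concrete, here is a minimal seasonal-naive baseline in Python (our own illustrative sketch, not code from the case study): it simply echoes the values observed one season earlier. Classical models like Holt-Winters refine exactly this kind of pattern with smoothed trend and seasonal terms.

```python
import numpy as np

def seasonal_naive_forecast(series, season_length=52, horizon=12):
    """Forecast each future week with the value observed one season earlier,
    a common baseline that classical seasonal models refine."""
    series = np.asarray(series, dtype=float)
    last_season = series[-season_length:]
    # Repeat the most recent seasonal pattern to cover the horizon.
    reps = int(np.ceil(horizon / season_length))
    return np.tile(last_season, reps)[:horizon]

# Toy weekly series with an exact 52-week period: the forecast echoes
# last "year's" values.
t = np.arange(208)  # four years of weekly data
y = 2.0 + np.sin(2 * np.pi * t / 52)
fc = seasonal_naive_forecast(y, season_length=52, horizon=12)
```

On real ILI data the seasonality is not exactly periodic, which is where fitted models earn their keep; a baseline like this is still useful as a sanity check.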

  • interpretability: while N-BEATS has no explicit reliance on the time series components used in traditional models, the architecture can be adapted to produce model weights related to such components, e.g., trend and seasonality, in a way that doesn’t sacrifice model performance.

  • simplicity: the base architecture is relatively simple, relying on blocks of fully connected neural networks with RELU induced nonlinearities that are stacked into a few residual blocks.

N-BEATS MODELING OF SEASONAL INFLUENZA

As N-BEATS is a "pure" deep learning architecture, and hence represents a purely AI solution, and given that it has shown great promise in delivering accurate forecasts on the M4 competition data, we chose N-BEATS as our model for forecasting seasonal influenza. For the analytically inclined, we've made the source for our case study publicly available (click here to download and experiment on your own). We began by trimming the red shaded regions from the ILI series shown in Fig. 1 to remove any possible crossover effect between COVID-19 and seasonal flu; for example, doctors may have mistakenly diagnosed patients as having seasonal flu rather than COVID-19, since at the time they may not have been aware of COVID-19 and many of the initial symptoms are common to both diseases. Next, we formed a 70/30 split of our ILI data into training and test sets, respectively. We selected a forecast horizon of 12 weeks, meaning that our model will produce forecasts for weeks 1 through 12 for each training and test sample. We must also choose how many historical points to use in forming our training samples, i.e., a lookback window (see Fig. 2). In general, the optimal lookback is not known a priori, so we tested model performance with multiple lookback windows: 36, 60, 84, 120, 144, and 180 weeks.

Figure 2: Two potential lookback windows (blue shaded regions) of length 60 and 180 weeks and a forecast horizon of 20 weeks (green shaded window). With a single training data segment like the one shown above, many training samples may be formed by sliding a window comprised of the lookback and horizon windows from left to right, one point at a time.
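The sliding-window construction of training samples described in Fig. 2 can be sketched as follows (a simplified illustration; the function and variable names are our own, not from the case study source):

```python
import numpy as np

def make_windows(series, lookback, horizon):
    """Slide a (lookback + horizon)-wide window across the series one point
    at a time, yielding (input, target) pairs: the model sees `lookback`
    historical points and is trained to predict the next `horizon` points."""
    series = np.asarray(series, dtype=float)
    n = len(series) - lookback - horizon + 1
    X = np.stack([series[i : i + lookback] for i in range(n)])
    Y = np.stack([series[i + lookback : i + lookback + horizon] for i in range(n)])
    return X, Y

# With 100 weekly points, a 60-week lookback, and a 12-week horizon,
# sliding one point at a time yields 100 - 60 - 12 + 1 = 29 samples.
series = np.arange(100, dtype=float)
X, Y = make_windows(series, lookback=60, horizon=12)
```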

We trained separate N-BEATS models for each of the lookback-state combinations. Figure 3 illustrates the convergence of an N-BEATS model at three stages of training. At each stage, four randomly drawn test samples are scored with the current model to produce forecasts (red line), which can then be compared to the actual values (green line).  As expected, the model does poorly in the initial stage (row 1 of Fig. 3) but does increasingly better as the stages progress, giving us confidence that the model is indeed learning. 

Figure 3: Visual validation of convergence for the California N-BEATS model with a lookback window of 120 weeks. Shown are four randomly drawn test samples (blue) evaluated at three stages of model training. In red are the 12-week-ahead forecasted values, which should be compared to the actual values shown in green. In the first row the model weights are randomly initialized, so the forecasts are far off the mark; as training evolves, the predicted values move closer to the actual values.

We can quantify the efficacy of the trained models via forecast error statistics, e.g., mean absolute error (MAE), mean squared error (MSE), and symmetric mean absolute percentage error (sMAPE) [12]. Figure 4a shows the distributions of these metrics for each state. Given that peak ILI% levels vary from state to state (see Fig. 1), arguably the most relevant of these metrics is sMAPE, as it scales the error by the average of the forecast and ground truth, i.e., sMAPE is a scale-independent metric (see Fig. 4b). Figure 4c better illustrates the variability of the sMAPE statistics as a function of lookback for each state model. For example, California has a low average sMAPE around 0.2, and its model performance is mostly agnostic to the size of the lookback window. By contrast, New York's sMAPE ranges from 0.58 (lookback = 36) to 0.9 (lookback = 144): non-stellar and highly variable forecasting performance. Based solely on sMAPE, N-BEATS performed best for CA, well on {MA, LA, IL, TX}, moderately on {NC, WA}, and relatively poorly for {NY, OR}.
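For reference, these error metrics are straightforward to compute; here is a minimal NumPy sketch using the standard definitions (sMAPE is expressed as a fraction here, matching the values quoted above):

```python
import numpy as np

def mae(actual, forecast):
    """Mean absolute error."""
    return np.mean(np.abs(np.asarray(forecast, float) - np.asarray(actual, float)))

def mse(actual, forecast):
    """Mean squared error."""
    return np.mean((np.asarray(forecast, float) - np.asarray(actual, float)) ** 2)

def smape(actual, forecast):
    """Symmetric mean absolute percentage error: each absolute error is
    scaled by the mean magnitude of the actual and forecast values, making
    the metric comparable across states with different ILI% levels."""
    actual = np.asarray(actual, float)
    forecast = np.asarray(forecast, float)
    denom = (np.abs(actual) + np.abs(forecast)) / 2.0
    return np.mean(np.abs(forecast - actual) / denom)

actual = np.array([2.0, 4.0])
forecast = np.array([3.0, 3.0])
```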

(a) N-BEATS forecast efficacy metrics.

(b) sMAPE N-BEATS forecast efficacy by state.

(c) N-BEATS forecast error (sMAPE) expressed as a lookback-state heatmap.

Figure 4: (a) Inference efficacy for state models using various metrics. The error bars represent the 95% confidence intervals corresponding to the distribution of errors on all forecasts, which include the lookback models created for each state. (b) inference efficacy results ordered by the maximum sMAPE value recorded for each state. (c) model performance heatmap.

Figure 5 shows another view of the forecasting efficacy via distributional forecast error heatmaps. N-BEATS does an impressive job of forecasting California ILI%, highlighted by the fact that 62% of 12-week-out forecasts fell within 0.25% of the actual values. In general, we are looking for a dark blue horizontal stripe across most or all weeks at the (ILI% error = 0%) level, representing highly accurate forecasts over most or all time horizons. By this measure, N-BEATS performed well for California, Washington, and Massachusetts. However, the distributional forecast error heatmap for New York (Fig. 5d) is highly variable, with the statistics spread mostly away from the (ILI% error = 0%) level. Thus, in New York's case, N-BEATS did a poor job of accurately forecasting seasonal influenza.

(a) California error heatmap.

(b) Washington error heatmap.

(c) Massachusetts error heatmap.

(d) New York error heatmap.

Figure 5: Distributional forecast error heatmaps for (a) California, (b) Washington, (c) Massachusetts, and (d) New York. Each cell of the heatmap is located by a forecast horizon (weeks out) and a forecast error bin (ILI% ± 0.25%). The value of each cell is the percentage of forecasts that meet the cell's conditions. For example, the 0.58 value for the (0.0 ILI%, 1 week out) cell of the Massachusetts map means that 58% of the 1-week-out forecasts were within 0.25% of the true value. The 0.29 value above it means that 29% of the 1-week-out forecasted values had ILI% errors in the range 0.25%–0.75%, and so on. The values of the cells in each 'weeks out' column sum to unity.
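Constructing such a heatmap amounts to binning the forecast errors at each horizon into 0.5-ILI%-wide bins centered at 0 and normalizing each column. A minimal sketch (our own illustrative code, not from the case study source):

```python
import numpy as np

def error_heatmap(actual, forecast, bin_width=0.5, n_bins=9):
    """Distributional forecast error heatmap: for each horizon step, the
    fraction of forecasts whose error falls in each bin of width `bin_width`
    centered at 0 (i.e., 0 +/- 0.25, 0.5 +/- 0.25, ...). Inputs are arrays
    of shape (n_samples, horizon); each output column sums to unity."""
    err = np.asarray(forecast, float) - np.asarray(actual, float)
    centers = (np.arange(n_bins) - n_bins // 2) * bin_width
    edges = np.concatenate([centers - bin_width / 2, [centers[-1] + bin_width / 2]])
    # Fold outliers into the edge bins so every forecast is counted.
    err = np.clip(err, edges[0], edges[-1] - 1e-9)
    heat = np.stack(
        [np.histogram(err[:, h], bins=edges)[0] / err.shape[0]
         for h in range(err.shape[1])],
        axis=1,
    )  # shape (n_bins, horizon)
    return centers, heat

# Demo with synthetic forecast errors over a 12-week horizon.
rng = np.random.default_rng(0)
actual = np.zeros((500, 12))
forecast = rng.normal(0.0, 0.5, size=(500, 12))
centers, heat = error_heatmap(actual, forecast)
```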

SUMMARY

N-BEATS was able to produce accurate seasonal influenza forecasts over a 3-month time horizon for a majority of the states we studied, thereby solidifying the potential of this “pure” deep learning technique to forecast the spread of infectious diseases. While these results are satisfying, we see many areas of potential improvement, including:

  • hyperparameter tuning: no effort was made to tune the N-BEATS model (hyper)parameters, e.g., learning rate, lookback window size, batch size, hidden layer units, et cetera. We would likely see a modest to moderate increase in accuracy by tuning these parameters.

  • ensembling: Oreshkin et al. describe an ensemble approach that improved model performance considerably[7]. While we generated various independent lookback models for each state, we did not combine them into a single, potentially more performant, ensemble model.

  • spatiotemporal multivariate extension: N-BEATS is designed for univariate time series forecasting, and while we have shown it to be performant for most of the state models, it is conceivable that leveraging multiple sources of time series data, e.g., regional temperature, aggregated general health and wellness, population density, etc., may increase the accuracy of the models.

  • model comparison: For completeness, N-BEATS efficacy metrics should be compared to those from other models such as: conventional (e.g., SARIMA), novel (e.g., weekly delta densities), and naive estimators (e.g., forecast value is simply a previously recorded value such as a one-year ago estimate).
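As one example of the ensembling idea above, forecasts from independently trained lookback models can be combined pointwise. The sketch below (our own illustrative code, not from the case study) uses a pointwise median, a common robust choice for combining ensemble members:

```python
import numpy as np

def ensemble_forecast(member_forecasts):
    """Combine forecasts from independently trained models (e.g., one per
    lookback window) by taking the pointwise median, which damps the
    influence of any single badly-behaved member."""
    return np.median(np.stack(member_forecasts), axis=0)

# Three hypothetical 12-week forecasts from lookback-36/60/84 models;
# the third model is an outlier that the median largely ignores.
members = [np.full(12, 1.0), np.full(12, 1.2), np.full(12, 5.0)]
combined = ensemble_forecast(members)
```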

LOOKING FORWARD

The goal of this exercise was to assess the efficacy of AI in forecasting regional infectious disease rates. Certainly, AI can be used for other problems of societal interest. For example, as we live through the COVID-19 pandemic, many are asking questions related to preparedness and policy making surrounding disease propagation, mitigation, and the supply chain management needed to battle the disease, e.g., the development and distribution of personal protective equipment.

With N-BEATS showing promise as an effective AI tool for forecasting future influenza infections, we believe that it, or a similar DL architecture, can be helpful in forecasting future COVID-19 case counts, hospital load, and deaths.

In general, AI may play an important role in how we approach these important issues, both now and in the future. For example, can AI help leaders stage a more granular approach to isolation and quarantine practices during an epidemic? Is there an optimal way to release the public from quarantine that keeps the general population as safe as possible without delaying the release so long that it contributes to long-term economic hardship? In addition, there are many questions related to the COVID-19 disease itself and whether exogenous variables can help answer them. For example, does tracking body temperature help in estimating the impact and transmission of the disease[13,14]? Can wastewater measurements be used to infer regional spread[15]? These and related questions generally establish an analytical framework where we need to analyze multiple variables, across time and regions, to forecast the impact and spread of a disease. In other words, to answer these questions we need to invoke multivariate spatiotemporal analysis techniques, an approach we will pursue in a future article.


ACKNOWLEDGEMENTS

A PyTorch and Keras implementation of N-BEATS was created by Philippe Rémy et al. and can be found here. We are grateful to Philippe for making this code available and thank him for his communication with us during the writing of this post.

SOURCE

The source used to generate the results in this article can be found here. Please feel free to download and experiment with the code.

REFERENCES

  1. https://www.bcm.edu/departments/molecular-virology-and-microbiology/emerging-infections-and-biodefense/introduction-to-infectious-diseases

  2. Brooks LC, Farrow DC, Hyun S, Tibshirani RJ, Rosenfeld R (2018) Nonmechanistic forecasts of seasonal influenza with iterative one-week-ahead distributions. PLoS Comput Biol 14(6): e1006134. https://doi.org/10.1371/journal.pcbi.1006134

  3. Kilbourne, E. D. (2006). Influenza Pandemics of the 20th Century. Emerging Infectious Diseases, 12(1), 9-14. https://dx.doi.org/10.3201/eid1201.051254.

  4. https://www.iarpa.gov/index.php/working-with-iarpa/requests-for-information/forecasting-rare-events

  5. World Health Organization. WHO | Influenza (Seasonal); 2016. Available from: http://www.who.int/mediacentre/factsheets/fs211/en/.

  6. Hyndman, R.J., Koehler, A.B., Ord, J.K., and Snyder, R.D. (2008) Forecasting with exponential smoothing: the state space approach, Springer-Verlag. http://www.exponentialsmoothing.net.

  7. Oreshkin, Boris; Carpov, Dmitri; Chapados, Nicolas; Bengio, Yoshua (2019). N-BEATS: Neural basis expansion analysis for interpretable time series forecasting. https://arxiv.org/pdf/1905.10437.pdf

  8. S Makridakis, E Spiliotis, and V Assimakopoulos. Statistical and machine learning forecasting methods: Concerns and ways forward. PLoS ONE, 13(3), 2018a.

  9. Spyros Makridakis and Michèle Hibon. The M3-Competition: results, conclusions and implications. International Journal of Forecasting, 16(4):451–476, 2000.

  10. https://en.wikipedia.org/wiki/Makridakis_Competitions

  11. Makridakis, Spyros; Spiliotis, Evangelos; Assimakopoulos, Vassilios (January 2020). "The M4 Competition: 100,000 time series and 61 forecasting methods". International Journal of Forecasting. 36 (1): 54–74. doi:10.1016/j.ijforecast.2019.04.014.

  12. https://en.wikipedia.org/wiki/Symmetric_mean_absolute_percentage_error

  13. https://www.thelancet.com/journals/laninf/article/PIIS1473-3099(20)30198-5/fulltext

  14. https://www.medrxiv.org/content/10.1101/2020.02.22.20025791v1

  15. https://www.statnews.com/2020/04/07/new-research-wastewater-community-spread-covid-19/

  16. https://now.tufts.edu/articles/how-does-covid-19-compare-flu