Modeling COVID-19 Data: Forecasting and Clustering

Hudson Golino

hfg9s@virginia.edu

Abstract

The goal of the current analysis is to provide an updated, day-to-day analysis of the available COVID-19 data using state-of-the-art quantitative methods.

Report updated on: May 05, 2020

Introduction

A novel coronavirus named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was detected in Wuhan (Hubei, China) in December 2019. Coronaviruses are a large family of viruses that cause illness ranging from the common cold to more severe diseases such as Severe Acute Respiratory Syndrome (SARS) and coronavirus disease 2019 (COVID-19; WHO, 2020, n.d.).

The outbreak has spread to 187 countries and regions, with 3,583,055 confirmed cases as of May 05, 2020.

Many scientific efforts are under way in response to this ever-evolving public health emergency. Dong, Du, & Gardner (2020), for example, developed an online interactive dashboard, hosted by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University, Baltimore, MD, USA, to visualize and track reported cases of coronavirus disease 2019 (COVID-19) in real time. They also maintain a GitHub repository that is updated daily using a semi-automated living data stream strategy.

The current report uses the data gathered by Dong et al. (2020), updated daily, to understand how COVID-19 is spreading in the US in general and in the state of Virginia in particular. The report has three goals: 1) to forecast the number of confirmed cases over the next 10 days; 2) to understand how US states can be grouped based on the trajectory of their number of confirmed cases; and 3) to understand the latent trend of each group of states.

Forecasting

Forecasting is difficult, but it is also essential for effective and efficient planning (Hyndman & Athanasopoulos, 2018). The 10-day forecasts below were produced with a bootstrapped time series approach (see Hyndman & Athanasopoulos, 2018) based on an exponential smoothing state space model (ETS; Hyndman, Koehler, Ord, & Snyder, 2008). The procedure generates many bootstrapped versions of the observed series, fits a model to each of them, and averages the resulting forecasts, a process known as bootstrap aggregating (bagging). In the tables, the row labels give the time index (day number) of the series, and the Lo and Hi columns give the lower and upper bounds of the 80% and 95% prediction intervals.
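The bagging idea described above can be sketched in a few lines of standard-library Python. This is a simplified illustration under stated assumptions, not the report's implementation: it swaps in Holt's linear trend method with fixed smoothing parameters (alpha = 0.5 and beta = 0.3 are arbitrary choices) for a fully estimated ETS model, and it resamples the one-step residuals with a moving block bootstrap instead of following the Box-Cox and decomposition steps that Hyndman & Athanasopoulos (2018) describe for bagged ETS. All function names are hypothetical.

```python
import random
from statistics import mean


def holt_fit(y, alpha=0.5, beta=0.3):
    """Holt's linear trend method (the point-forecast recursion of ETS(A,A,N)).
    Smoothing parameters are fixed here rather than estimated (an assumption).
    Returns the final level, the final trend and the one-step-ahead fitted values."""
    assert len(y) >= 2, "need at least two observations"
    level, trend = y[0], y[1] - y[0]
    fitted = [y[0]]  # no prior forecast for the first point; use it as its own fit
    for t in range(1, len(y)):
        fitted.append(level + trend)  # one-step-ahead forecast of y[t]
        new_level = alpha * y[t] + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return level, trend, fitted


def holt_forecast(y, h, alpha=0.5, beta=0.3):
    """Point forecasts for horizons 1..h from a Holt fit."""
    level, trend, _ = holt_fit(y, alpha, beta)
    return [level + k * trend for k in range(1, h + 1)]


def block_bootstrap(resid, block_size, rng):
    """Moving block bootstrap: concatenate randomly chosen contiguous blocks
    of residuals until the original length is reached."""
    n = len(resid)
    block_size = min(block_size, n)
    out = []
    while len(out) < n:
        start = rng.randrange(0, n - block_size + 1)
        out.extend(resid[start:start + block_size])
    return out[:n]


def bagged_holt_forecast(y, h=10, n_boot=100, block_size=7, seed=1):
    """Bagging: forecast many bootstrapped series and average horizon by horizon."""
    rng = random.Random(seed)
    _, _, fitted = holt_fit(y)
    resid = [obs - fit for obs, fit in zip(y, fitted)]
    all_fc = [holt_forecast(y, h)]  # include the forecast from the original series
    for _ in range(n_boot):
        boot_resid = block_bootstrap(resid, block_size, rng)
        y_boot = [fit + r for fit, r in zip(fitted, boot_resid)]
        all_fc.append(holt_forecast(y_boot, h))
    return [mean(fc[k] for fc in all_fc) for k in range(h)]
```

For example, `bagged_holt_forecast(cases, h=10)`, with `cases` a list of daily cumulative counts, returns ten averaged point forecasts. Averaging over bootstrapped series is what makes the result a bagged forecast; a real analysis would also estimate the smoothing parameters and select among ETS model forms for each bootstrapped series.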

Forecast for the US (next 10 days)

Day  Point Forecast       Lo 80    Hi 80        Lo 95    Hi 95
105         1212677    789896.7  1635457    566090.29  1859264
106         1238747    658491.3  1819003    351322.38  2126172
107         1264817    429492.2  2100143    -12702.35  2542337
108         1290888    121286.9  2460488   -497862.29  3079638
109         1316958   -258449.3  2892365  -1092419.51  3726335
110         1343028   -710772.2  3396828  -1797988.38  4484045
111         1369098  -1242454.8  3980651  -2624927.55  5363124
112         1395169  -1864695.4  4655032  -3590363.13  6380700
113         1421239  -2592743.0  5435220  -4717616.57  7560094
114         1447309  -3445966.6  6340584  -6036310.23  8930928
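The interval bounds in this table appear to be symmetric around the point forecast, and the ratio of the 95% half-width to the 80% half-width is about 1.53, which is the ratio of the standard normal quantiles 1.960 / 1.282. That pattern is consistent with Gaussian prediction intervals of the form point ± z·σ. The sketch below assumes exactly that form (an inference from the numbers, not something the report states) and uses it to recover the implied forecast standard deviation from the 80% bound and rebuild the 95% bounds for the first US row. The helper names are hypothetical.

```python
from statistics import NormalDist

# Two-sided standard normal quantiles for 80% and 95% intervals.
Z80 = NormalDist().inv_cdf(0.90)   # about 1.2816
Z95 = NormalDist().inv_cdf(0.975)  # about 1.9600


def implied_sd(point, lo80):
    """Forecast standard deviation implied by a symmetric Gaussian 80% interval."""
    return (point - lo80) / Z80


def interval(point, sd, z):
    """Symmetric Gaussian prediction interval: point ± z * sd."""
    return point - z * sd, point + z * sd


# First row of the US table (day 105).
point, lo80 = 1212677, 789896.7
sd = implied_sd(point, lo80)
lo95, hi95 = interval(point, sd, Z95)
print(round(lo95), round(hi95))
```

The rebuilt bounds agree with the table's Lo 95 and Hi 95 to within rounding. A symmetric interval of this kind also explains why several US lower bounds fall below zero even though cumulative counts cannot: the forecast standard deviation grows quickly with the horizon, and nothing in the Gaussian form keeps the lower bound non-negative.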

Forecast for the state of Virginia (next 10 days)

Day  Point Forecast     Lo 80     Hi 80     Lo 95     Hi 95
105        20453.59  20324.19  20583.00  20255.68  20651.51
106        21320.25  21142.26  21498.23  21048.04  21592.45
107        22186.90  21932.16  22441.63  21797.31  22576.48
108        23053.55  22701.99  23405.10  22515.89  23591.20
109        23920.20  23456.53  24383.86  23211.08  24629.31
110        24786.85  24198.38  25375.31  23886.86  25686.83
111        25653.50  24929.12  26377.87  24545.66  26761.33
112        26520.15  25649.82  27390.48  25189.10  27851.20
113        27386.80  26361.24  28412.35  25818.35  28955.25
114        28253.45  27063.99  29442.90  26434.34  30072.56

Number of New Cases per Day: Forecast for the US (next 10 days)

Day  Point Forecast     Lo 80     Hi 80      Lo 95     Hi 95
104        22335.32  19572.54  25098.10  18110.012  26560.62
105        22335.32  18428.35  26242.28  16360.132  28310.50
106        22335.32  17550.36  27120.27  15017.362  29653.27
107        22335.32  16810.17  27860.46  13885.341  30785.29
108        22335.32  16158.05  28512.58  12888.004  31782.63
109        22335.32  15568.48  29102.15  11986.339  32684.29
110        22335.32  15026.32  29644.31  11157.170  33513.46
111        22335.32  14521.68  30148.95  10385.396  34285.24
112        22335.32  14047.72  30622.91   9660.530  35010.10
113        22335.32  13599.43  31071.20   8974.933  35695.70

Number of New Cases per Day: Forecast for the state of Virginia (next 10 days)

Day  Point Forecast      Lo 80     Hi 80     Lo 95     Hi 95
104        1054.683   920.1562  1189.210  848.9419  1260.424
105        1085.827   949.4560  1222.198  877.2657  1294.388
106        1115.488   975.2437  1255.732  901.0030  1329.972
107        1143.736   997.1593  1290.313  919.5661  1367.906
108        1170.640  1015.0999  1326.180  932.7620  1408.518
109        1196.263  1029.1864  1363.339  940.7417  1451.783
110        1220.665  1039.6919  1401.639  943.8904  1497.440
111        1243.906  1046.9644  1440.848  942.7097  1545.102
112        1266.040  1051.3673  1480.713  937.7263  1594.354
113        1287.121  1053.2442  1520.997  929.4375  1644.804
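The daily new-case series are presumably obtained by first-differencing the cumulative confirmed counts (an assumption; the report does not say so explicitly). If the cumulative series has n days, its first differences have n − 1, which would explain why the daily tables start at day 104 while the cumulative tables start at day 105. The sketch below shows the differencing and its inverse on illustrative numbers, not data from the report; the function names are hypothetical.

```python
def daily_new_cases(cumulative):
    """First differences of a cumulative series: new[t] = cum[t] - cum[t - 1].
    The result is one element shorter than the input."""
    return [curr - prev for prev, curr in zip(cumulative, cumulative[1:])]


def cumulative_from_daily(last_total, daily):
    """Add daily counts (observed or forecast) onto the last cumulative total."""
    totals = []
    running = last_total
    for d in daily:
        running += d
        totals.append(running)
    return totals


# Illustrative numbers only, not data from the report.
cum = [100, 130, 175, 240]
new = daily_new_cases(cum)
print(new, cumulative_from_daily(cum[-1], [70, 80]))
```

Adding the daily point forecasts onto the last observed total need not reproduce the cumulative forecasts, because the two series appear to be modeled separately: the US cumulative point forecasts grow by about 26,070 cases per day, while the US daily point forecast is flat at 22,335.32.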

References

Boker, S. M., Deboeck, P. R., Edler, C., & Keel, P. (2010). Generalized local linear approximation of derivatives from time series. In S. M. Chow, E. Ferrer, & F. Hsieh (Eds.), The Notre Dame series on quantitative methodology. Statistical methods for modeling human dynamics: An interdisciplinary dialogue (pp. 161–178). Routledge/Taylor & Francis Group.

Deboeck, P. R., Montpetit, M. A., Bergeman, C., & Boker, S. M. (2009). Using derivative estimates to describe intraindividual variability at multiple time scales. Psychological Methods, 14(4), 367–386. https://doi.org/10.1037/a0016622

Dong, E., Du, H., & Gardner, L. (2020). An interactive web-based dashboard to track COVID-19 in real time. The Lancet Infectious Diseases.

Golino, H., & Christensen, A. P. (2019). EGAnet: Exploratory graph analysis: A framework for estimating the number of dimensions in multivariate data using network psychometrics. Retrieved from https://CRAN.R-project.org/package=EGAnet

Golino, H. F., & Epskamp, S. (2017). Exploratory graph analysis: A new approach for estimating the number of dimensions in psychological research. PLoS ONE, 12(6), e0174035. https://doi.org/10.1371/journal.pone.0174035

Golino, H., Shi, D., Garrido, L. E., Christensen, A. P., Nieto, M. D., Sadana, R., … Martinez-Molina, A. (2020). Investigating the performance of exploratory graph analysis and traditional techniques to identify the number of latent factors: A simulation and tutorial. Psychological Methods, Advance online publication. https://doi.org/10.1037/met0000255

Hyndman, R. J., & Athanasopoulos, G. (2018). Forecasting: Principles and practice. OTexts.

Hyndman, R., Koehler, A. B., Ord, J. K., & Snyder, R. D. (2008). Forecasting with exponential smoothing: The state space approach. Springer Science & Business Media.

Takens, F. (1981). Detecting strange attractors in turbulence. In Lecture Notes in Mathematics (Vol. 898, pp. 366–381). Springer. https://doi.org/10.1007/BFb0091924

Whitney, H. (1936). Differentiable manifolds. The Annals of Mathematics, 37(3), 645–680. https://doi.org/10.2307/1968482

WHO. (2020). Coronavirus disease 2019 (COVID-19) situation report, 46. Retrieved from https://apps.who.int/iris/bitstream/handle/10665/331443/nCoVsitrep06Mar2020-eng.pdf

WHO. (n.d.). Coronavirus. Retrieved from https://www.who.int/health-topics/coronavirus