Abstract

The goal of this report is to provide a daily updated analysis of the available COVID-19 data using state-of-the-art quantitative methods.

Introduction

A novel coronavirus named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was detected in Wuhan (Hubei, China) in December 2019. Coronaviruses are a collection of viruses that cause illness ranging from the common cold to more severe diseases such as Severe Acute Respiratory Syndrome (SARS-CoV) and the coronavirus disease 2019 (COVID-19; WHO, 2020, n.d.).

The outbreak spread to 187 countries and regions, with more than 3.58 million confirmed cases (3,583,055) as of May 5, 2020.

Many scientific activities are ongoing in response to this ever-evolving public health emergency. Dong, Du, & Gardner (2020), for example, developed an online interactive dashboard, hosted by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University, Baltimore, MD, USA, to visualize and track reported cases of coronavirus disease 2019 (COVID-19) in real time. They created a GitHub repository with daily updates using a semi-automated living data stream strategy.

The current report uses the data gathered by Dong et al. (2020), which are updated daily, to understand how COVID-19 is spreading in the US in general and in the state of Virginia in particular. The report has three goals: 1) to forecast the number of confirmed cases over the next 10 days; 2) to understand how US states can be grouped based on the trajectory of their number of confirmed cases; and 3) to understand the latent trend for each group of states.

Forecasting

Forecasting is very difficult, but it is also very important for effective and efficient planning (Hyndman & Athanasopoulos, 2018). The 10-day forecast below was implemented using a bootstrapped time series approach (see Hyndman & Athanasopoulos, 2018) with an exponential smoothing state space model (ETS; Hyndman, Koehler, Ord, & Snyder, 2008). The procedure generates several bootstrapped versions of the series, forecasts each one, and averages the resulting forecasts, following a “bootstrap aggregating” (or bagging) process.
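The bagging idea described above can be illustrated with a minimal Python sketch. It is not the report's implementation: it assumes Holt's linear-trend smoothing with fixed, hand-picked smoothing parameters in place of a fully estimated ETS model, and it resamples blocks of one-step residuals rather than the remainder of a decomposition. All function names and default values are hypothetical.

```python
import random

def holt_fit(y, alpha=0.8, beta=0.2):
    """Holt's linear-trend smoothing with fixed parameters (an assumption).
    Returns the final level, the final trend, and one-step-ahead fitted values."""
    level, trend = y[0], y[1] - y[0]
    fitted = [y[0]]
    for obs in y[1:]:
        pred = level + trend                      # one-step-ahead prediction
        fitted.append(pred)
        new_level = alpha * obs + (1 - alpha) * pred
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return level, trend, fitted

def holt_forecast(y, h):
    """Extrapolate the last level and trend h steps ahead."""
    level, trend, _ = holt_fit(y)
    return [level + (k + 1) * trend for k in range(h)]

def block_resample(resid, block, rng):
    """Draw contiguous blocks of residuals until the original length is reached."""
    out = []
    while len(out) < len(resid):
        start = rng.randrange(len(resid) - block + 1)
        out.extend(resid[start:start + block])
    return out[:len(resid)]

def bagged_forecast(y, h=10, n_boot=100, block=7, seed=1):
    """Average the forecasts from the original series and n_boot - 1 bootstrapped series."""
    rng = random.Random(seed)
    _, _, fitted = holt_fit(y)
    resid = [obs - fit for obs, fit in zip(y, fitted)]
    forecasts = [holt_forecast(y, h)]
    for _ in range(n_boot - 1):
        noise = block_resample(resid, block, rng)
        y_star = [fit + e for fit, e in zip(fitted, noise)]   # bootstrapped series
        forecasts.append(holt_forecast(y_star, h))
    return [sum(step) / len(step) for step in zip(*forecasts)]
```

Calling `bagged_forecast(series, h=10)` on a list of daily cumulative counts returns ten averaged point forecasts. The series must contain at least `block` observations, and prediction intervals such as those in the tables below are outside the scope of this sketch.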

Forecast for the US (next 10 days)

Day Point Forecast      Lo 80   Hi 80       Lo 95   Hi 95
105        1212677   789896.7 1635457   566090.29 1859264
106        1238747   658491.3 1819003   351322.38 2126172
107        1264817   429492.2 2100143   -12702.35 2542337
108        1290888   121286.9 2460488  -497862.29 3079638
109        1316958  -258449.3 2892365 -1092419.51 3726335
110        1343028  -710772.2 3396828 -1797988.38 4484045
111        1369098 -1242454.8 3980651 -2624927.55 5363124
112        1395169 -1864695.4 4655032 -3590363.13 6380700
113        1421239 -2592743.0 5435220 -4717616.57 7560094
114        1447309 -3445966.6 6340584 -6036310.23 8930928

Forecast for the state of Virginia (next 10 days)

Day Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
105       20453.59 20324.19 20583.00 20255.68 20651.51
106       21320.25 21142.26 21498.23 21048.04 21592.45
107       22186.90 21932.16 22441.63 21797.31 22576.48
108       23053.55 22701.99 23405.10 22515.89 23591.20
109       23920.20 23456.53 24383.86 23211.08 24629.31
110       24786.85 24198.38 25375.31 23886.86 25686.83
111       25653.50 24929.12 26377.87 24545.66 26761.33
112       26520.15 25649.82 27390.48 25189.10 27851.20
113       27386.80 26361.24 28412.35 25818.35 28955.25
114       28253.45 27063.99 29442.90 26434.34 30072.56

Number of New Cases per Day: Forecast for the US (next 10 days)

Day Point Forecast    Lo 80    Hi 80     Lo 95    Hi 95
104       22335.32 19572.54 25098.10 18110.012 26560.62
105       22335.32 18428.35 26242.28 16360.132 28310.50
106       22335.32 17550.36 27120.27 15017.362 29653.27
107       22335.32 16810.17 27860.46 13885.341 30785.29
108       22335.32 16158.05 28512.58 12888.004 31782.63
109       22335.32 15568.48 29102.15 11986.339 32684.29
110       22335.32 15026.32 29644.31 11157.170 33513.46
111       22335.32 14521.68 30148.95 10385.396 34285.24
112       22335.32 14047.72 30622.91  9660.530 35010.10
113       22335.32 13599.43 31071.20  8974.933 35695.70

Number of New Cases per Day: Forecast for the state of Virginia (next 10 days)

Day Point Forecast     Lo 80    Hi 80    Lo 95    Hi 95
104       1054.683  920.1562 1189.210 848.9419 1260.424
105       1085.827  949.4560 1222.198 877.2657 1294.388
106       1115.488  975.2437 1255.732 901.0030 1329.972
107       1143.736  997.1593 1290.313 919.5661 1367.906
108       1170.640 1015.0999 1326.180 932.7620 1408.518
109       1196.263 1029.1864 1363.339 940.7417 1451.783
110       1220.665 1039.6919 1401.639 943.8904 1497.440
111       1243.906 1046.9644 1440.848 942.7097 1545.102
112       1266.040 1051.3673 1480.713 937.7263 1594.354
113       1287.121 1053.2442 1520.997 929.4375 1644.804

Clustering and Latent Trends

To understand the evolution of the number of confirmed COVID-19 cases in each state, and to model the similarities among the states' trajectories, a two-step approach termed Dynamic Exploratory Graph Analysis (DynEGA) is used. Here, the number of confirmed cases per state was transformed into the number of confirmed cases per 100,000 inhabitants. The first step of the DynEGA method transforms each time series (i.e., the number of confirmed cases per 100,000 inhabitants) into a time delay embedding matrix, which can be used to reconstruct the attractor of a dynamical system from a single sequence of observations (Takens, 1981; Whitney, 1936). An attractor is a set of values toward which a system tends to evolve from a given set of starting conditions, and it contains useful information about the dynamical system. In many empirical situations, however, the collection of possible system states (the phase space) and the equations governing the system are unknown. In such situations, attractor reconstruction techniques can be used to reconstruct the phase-space dynamics from, for example, a single time series of observed values.
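As a rough illustration of the two data transformations described above, the following Python sketch rescales a series to cases per 100,000 inhabitants and builds a time delay embedding matrix. The function names, the embedding dimension, and the lag `tau` are assumptions made for this example; the report does not state the values it used.

```python
def per_100k(cases, population):
    """Rescale raw case counts to cases per 100,000 inhabitants."""
    return [c * 100_000 / population for c in cases]

def time_delay_embedding(series, n_embed, tau=1):
    """Return a matrix whose i-th row is
    [x[i], x[i + tau], ..., x[i + (n_embed - 1) * tau]]."""
    span = (n_embed - 1) * tau
    if len(series) <= span:
        raise ValueError("series is too short for this embedding")
    return [
        [series[i + k * tau] for k in range(n_embed)]
        for i in range(len(series) - span)
    ]

# Hypothetical example: five daily counts for a population of one million.
rates = per_100k([50, 100, 180, 260, 400], population=1_000_000)
embedded = time_delay_embedding(rates, n_embed=3, tau=1)
```

Each row of `embedded` is a short window of consecutive observations, so a series of length 5 with an embedding dimension of 3 and a lag of 1 yields a 3 x 3 matrix.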

After a time delay embedding matrix is created for each state, derivatives are estimated using generalized local linear approximation (GLLA; Boker, Deboeck, Edler, & Keel, 2010; Deboeck, Montpetit, Bergeman, & Boker, 2009). GLLA is a technique for estimating how a variable changes as a function of time. The instantaneous change in one variable with respect to another is known as a derivative. Derivatives can represent different aspects of change, such as the velocity (rate of change) and the acceleration in the number of confirmed COVID-19 cases (first- and second-order derivatives, respectively).
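The derivative estimation step can be sketched in Python as a local polynomial least-squares fit over each embedded window, which is the idea behind GLLA: with a loading matrix L whose k-th column holds the centered time offsets raised to the power k and divided by k!, the estimates are obtained as X L (L'L)^-1. The helper names and defaults (embedding dimension 5, lag 1, unit time step) are assumptions for illustration and are not taken from the report or from the EGAnet package.

```python
from math import factorial

def solve(a, b):
    """Solve A Z = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def glla_weights(n_embed, tau=1, delta=1.0, order=2):
    """Return (L'L)^-1 L' as a list of order + 1 weight vectors."""
    center = (n_embed - 1) / 2
    offsets = [(i - center) * tau * delta for i in range(n_embed)]
    L = [[v ** k / factorial(k) for k in range(order + 1)] for v in offsets]
    cols = range(order + 1)
    LtL = [[sum(row[i] * row[j] for row in L) for j in cols] for i in cols]
    Lt = [[row[i] for row in L] for i in cols]
    return solve(LtL, Lt)

def glla(series, n_embed=5, tau=1, delta=1.0, order=2):
    """Estimate derivatives of order 0..order at the center of each window."""
    weights = glla_weights(n_embed, tau, delta, order)
    span = (n_embed - 1) * tau
    rows = [[series[i + k * tau] for k in range(n_embed)]
            for i in range(len(series) - span)]
    return [[sum(w * x for w, x in zip(wk, row)) for wk in weights]
            for row in rows]
```

For each window, `glla` returns the smoothed value, the first derivative (velocity), and the second derivative (acceleration). A quick sanity check is a quadratic series such as t squared, for which the second-derivative estimate should be 2 in every window.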

In the second step, a network approach for dimensionality assessment and reduction termed exploratory graph analysis (EGA; Golino & Epskamp, 2017; Golino et al., 2020) is used to estimate clusters of states. The clusters reflect how the numbers of confirmed COVID-19 cases change together across states.
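Network methods of this kind start from pairwise associations between variables. The Python sketch below only computes a Pearson correlation matrix between the states' derivative series, which is one plausible input to such a method; it does not reproduce the network estimation or community detection steps of EGA. The state labels and values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long series (non-zero variance assumed)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_matrix(series_by_state):
    """Return the sorted state labels and their pairwise correlation matrix."""
    names = sorted(series_by_state)
    matrix = [[pearson(series_by_state[a], series_by_state[b]) for b in names]
              for a in names]
    return names, matrix

# Hypothetical first-order derivative series for three states.
velocities = {
    "S1": [1.0, 2.0, 3.0, 4.0],
    "S2": [2.0, 4.0, 6.0, 8.0],
    "S3": [4.0, 3.0, 2.0, 1.0],
}
names, corr = correlation_matrix(velocities)
```

States whose rates of change rise and fall together have correlations near 1 and would tend to end up in the same cluster, whereas states with opposite patterns have correlations near -1.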

In the current analysis, the EGAnet package (Golino & Christensen, 2019) is used to implement the DynEGA method.

The figure above shows how the states cluster, given the rate of change (first-order derivative) in the number of confirmed COVID-19 cases per 100,000 inhabitants. It is important to note that these clusters are dynamic and can change from day to day as new data are gathered by the authorities.

The states and their respective clusters are shown in the table below:

State	Cluster Number
AL	1
AR	1
CA	1
CO	1
CT	1
DC	1
DE	1
GA	1
IN	1
KY	1
MD	1
MA	1
MS	1
NC	1
ND	1
OH	1
PA	1
RI	1
SD	1
TX	1
UT	1
AZ	2
IL	2
IA	2
KS	2
MN	2
NE	2
NH	2
NM	2
TN	2
VA	2
WI	2
WY	2
AK	3
HI	3
ID	3
LA	3
MT	3
VT	3
WA	3
FL	4
ME	4
MI	4
MO	4
NV	4
NJ	4
NY	4
OK	4
OR	4
SC	4
WV	4

The latent trends for each cluster can be seen in the plot below:

Finally, the maps below show the average rate of change in the number of confirmed COVID-19 cases per 100,000 inhabitants per state, and the geographical distribution of the clusters.

The average rate of change for each cluster can be seen below:

Cluster	Mean Rate of Change	Variance of the Rate of Change
1	3.439825	2898.387
2	1.988028	161.392
3	1.726087	1049.183
4	3.957361	17556.313
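A summary of this kind can be computed by grouping the estimated rates of change by cluster. The Python sketch below pools all values within a cluster and reports the mean and the sample variance; whether the report pooled daily values or averaged per state first, and whether it used the sample or population variance, is not stated, so both choices are assumptions. The input data are hypothetical.

```python
from statistics import mean, variance

def summarize_by_cluster(rates, clusters):
    """Pool the rates of change of all states in a cluster and return
    {cluster: (mean, sample variance)}."""
    grouped = {}
    for state, values in rates.items():
        grouped.setdefault(clusters[state], []).extend(values)
    return {c: (mean(v), variance(v)) for c, v in sorted(grouped.items())}

# Hypothetical rates of change for three states in two clusters.
rates = {"S1": [1.0, 3.0], "S2": [2.0, 2.0], "S3": [5.0, 7.0]}
clusters = {"S1": 1, "S2": 1, "S3": 2}
summary = summarize_by_cluster(rates, clusters)
```

Each cluster needs at least two pooled values, since the sample variance is undefined for a single observation.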

References

Boker, S. M., Deboeck, P. R., Edler, C., & Keel, P. (2010). Generalized local linear approximation of derivatives from time series. In S. M. Chow, E. Ferrer, & F. Hsieh (Eds.), The Notre Dame series on quantitative methodology. Statistical methods for modeling human dynamics: An interdisciplinary dialogue (pp. 161–178). Routledge/Taylor & Francis Group.

Deboeck, P. R., Montpetit, M. A., Bergeman, C., & Boker, S. M. (2009). Using derivative estimates to describe intraindividual variability at multiple time scales. Psychological Methods, 14(4), 367–386. https://doi.org/10.1037/a0016622

Dong, E., Du, H., & Gardner, L. (2020). An interactive web-based dashboard to track COVID-19 in real time. The Lancet Infectious Diseases.

Golino, H., & Christensen, A. P. (2019). EGAnet: Exploratory graph analysis: A framework for estimating the number of dimensions in multivariate data using network psychometrics. Retrieved from https://CRAN.R-project.org/package=EGAnet

Golino, H. F., & Epskamp, S. (2017). Exploratory graph analysis: A new approach for estimating the number of dimensions in psychological research. PloS One, 12(6), e0174035. https://doi.org/10.1371/journal.pone.0174035

Golino, H., Shi, D., Garrido, L. E., Christensen, A. P., Nieto, M. D., Sadana, R., … Martinez-Molina, A. (2020). Investigating the performance of exploratory graph analysis and traditional techniques to identify the number of latent factors: A simulation and tutorial. Psychological Methods, Advance online publication. https://doi.org/10.1037/met0000255

Hyndman, R. J., & Athanasopoulos, G. (2018). Forecasting: Principles and practice. OTexts.

Hyndman, R., Koehler, A. B., Ord, J. K., & Snyder, R. D. (2008). Forecasting with exponential smoothing: The state space approach. Springer Science & Business Media.

Takens, F. (1981). Detecting strange attractors in turbulence. In Lecture notes in mathematics (vol. 898) (pp. 366–381). Springer. https://doi.org/10.1007/BFb0091924

Whitney, H. (1936). Differentiable manifolds. The Annals of Mathematics, 37(3), 645–680. https://doi.org/10.2307/1968482

WHO. (2020). Coronavirus disease 2019 (COVID-19) situation report, 46. Retrieved from https://apps.who.int/iris/bitstream/handle/10665/331443/nCoVsitrep06Mar2020-eng.pdf

WHO. (n.d.). Coronavirus. Retrieved from https://www.who.int/health-topics/coronavirus

Modeling COVID-19 Data: Forecasting and Clustering

Hudson Golino

Report Updated on: