Abstract
The goal of the current analysis is to provide an updated, day-to-day analysis of the available COVID-19 data using state-of-the-art quantitative methods.
May 05, 2020
A novel coronavirus named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was detected in Wuhan (Hubei, China) in December 2019. Coronaviruses are a family of viruses that cause illnesses ranging from the common cold to more severe diseases such as Severe Acute Respiratory Syndrome (SARS-CoV) and coronavirus disease 2019 (COVID-19; WHO, 2020, n.d.).
The outbreak spread to 187 countries and regions, with more than 3.58 million (3.583055 × 10^6) confirmed cases as of May 05, 2020.
Many scientific activities are ongoing in response to this ever-evolving public health emergency. Dong, Du, & Gardner (2020), for example, developed an online interactive dashboard, hosted by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University, Baltimore, MD, USA, to visualize and track reported cases of coronavirus disease 2019 (COVID-19) in real time. They created a GitHub repository with daily updates using a semi-automated living data stream strategy.
The current report uses the data gathered by Dong et al. (2020), updated on a daily basis, to understand how COVID-19 is spreading in the US in general and in the state of Virginia in particular. The report has three goals: 1) to forecast the number of confirmed cases over the next 10 days; 2) to understand how US states can be grouped based on the trajectory of their number of confirmed cases; and 3) to understand the latent trend for each group of states.
Forecasting is very difficult, but it is also very important for effective and efficient planning (Hyndman & Athanasopoulos, 2018). The 10-day forecasts below were produced with a bootstrapped time series approach (see Hyndman & Athanasopoulos, 2018) based on an exponential smoothing state space model (ETS; Hyndman, Koehler, Ord, & Snyder, 2008). The procedure generates forecasts from several bootstrapped versions of the series and averages them, following a “bootstrap aggregating” (or bagging) process.
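The bagging idea can be illustrated with a minimal sketch in Python. It is not the code used for this report: it replaces the full ETS state space model with a hand-written Holt linear-trend smoother, uses a plain moving-block bootstrap of the one-step residuals, and skips the Box-Cox transformation and decomposition steps described by Hyndman & Athanasopoulos (2018). The function names, smoothing parameters, block length, and toy series are all assumptions made for illustration.

```python
import random
from statistics import mean


def holt_fit(y, alpha=0.8, beta=0.2):
    """Holt's linear-trend smoothing: final level, final trend, residuals."""
    level, trend = y[0], y[1] - y[0]
    residuals = [0.0]  # no one-step forecast exists for the first point
    for obs in y[1:]:
        pred = level + trend  # one-step-ahead forecast
        residuals.append(obs - pred)
        new_level = alpha * obs + (1 - alpha) * pred
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return level, trend, residuals


def holt_forecast(y, horizon, alpha=0.8, beta=0.2):
    """Point forecasts 1..horizon steps ahead from the last level and trend."""
    level, trend, _ = holt_fit(y, alpha, beta)
    return [level + h * trend for h in range(1, horizon + 1)]


def bagged_forecast(y, horizon=10, n_boot=100, block=5, seed=1):
    """Average the forecasts from the series and its bootstrapped versions."""
    rng = random.Random(seed)
    _, _, resid = holt_fit(y)
    fitted = [obs - r for obs, r in zip(y, resid)]
    block = min(block, len(resid))
    forecasts = [holt_forecast(y, horizon)]  # the original series counts too
    for _ in range(n_boot):
        # Moving-block bootstrap: resample contiguous runs of residuals.
        boot_resid = []
        while len(boot_resid) < len(y):
            start = rng.randrange(0, len(resid) - block + 1)
            boot_resid.extend(resid[start:start + block])
        boot_series = [f + r for f, r in zip(fitted, boot_resid)]
        forecasts.append(holt_forecast(boot_series, horizon))
    return [mean(f[h] for f in forecasts) for h in range(horizon)]


# Hypothetical cumulative counts, for illustration only (not the report's data).
toy = [100, 130, 165, 210, 260, 300, 355, 410, 470, 520, 585, 640]
print(bagged_forecast(toy, horizon=10))
```

Resampling residuals in blocks rather than one at a time keeps some of the short-range dependence in the series, which is the usual reason a block bootstrap is preferred for time series.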
Time index | Point Forecast | Lo 80 | Hi 80 | Lo 95 | Hi 95 |
---|---|---|---|---|---|
105 | 1212677 | 789896.7 | 1635457 | 566090.29 | 1859264 |
106 | 1238747 | 658491.3 | 1819003 | 351322.38 | 2126172 |
107 | 1264817 | 429492.2 | 2100143 | -12702.35 | 2542337 |
108 | 1290888 | 121286.9 | 2460488 | -497862.29 | 3079638 |
109 | 1316958 | -258449.3 | 2892365 | -1092419.51 | 3726335 |
110 | 1343028 | -710772.2 | 3396828 | -1797988.38 | 4484045 |
111 | 1369098 | -1242454.8 | 3980651 | -2624927.55 | 5363124 |
112 | 1395169 | -1864695.4 | 4655032 | -3590363.13 | 6380700 |
113 | 1421239 | -2592743.0 | 5435220 | -4717616.57 | 7560094 |
114 | 1447309 | -3445966.6 | 6340584 | -6036310.23 | 8930928 |

Time index | Point Forecast | Lo 80 | Hi 80 | Lo 95 | Hi 95 |
---|---|---|---|---|---|
105 | 20453.59 | 20324.19 | 20583.00 | 20255.68 | 20651.51 |
106 | 21320.25 | 21142.26 | 21498.23 | 21048.04 | 21592.45 |
107 | 22186.90 | 21932.16 | 22441.63 | 21797.31 | 22576.48 |
108 | 23053.55 | 22701.99 | 23405.10 | 22515.89 | 23591.20 |
109 | 23920.20 | 23456.53 | 24383.86 | 23211.08 | 24629.31 |
110 | 24786.85 | 24198.38 | 25375.31 | 23886.86 | 25686.83 |
111 | 25653.50 | 24929.12 | 26377.87 | 24545.66 | 26761.33 |
112 | 26520.15 | 25649.82 | 27390.48 | 25189.10 | 27851.20 |
113 | 27386.80 | 26361.24 | 28412.35 | 25818.35 | 28955.25 |
114 | 28253.45 | 27063.99 | 29442.90 | 26434.34 | 30072.56 |

Time index | Point Forecast | Lo 80 | Hi 80 | Lo 95 | Hi 95 |
---|---|---|---|---|---|
104 | 22335.32 | 19572.54 | 25098.10 | 18110.012 | 26560.62 |
105 | 22335.32 | 18428.35 | 26242.28 | 16360.132 | 28310.50 |
106 | 22335.32 | 17550.36 | 27120.27 | 15017.362 | 29653.27 |
107 | 22335.32 | 16810.17 | 27860.46 | 13885.341 | 30785.29 |
108 | 22335.32 | 16158.05 | 28512.58 | 12888.004 | 31782.63 |
109 | 22335.32 | 15568.48 | 29102.15 | 11986.339 | 32684.29 |
110 | 22335.32 | 15026.32 | 29644.31 | 11157.170 | 33513.46 |
111 | 22335.32 | 14521.68 | 30148.95 | 10385.396 | 34285.24 |
112 | 22335.32 | 14047.72 | 30622.91 | 9660.530 | 35010.10 |
113 | 22335.32 | 13599.43 | 31071.20 | 8974.933 | 35695.70 |

Time index | Point Forecast | Lo 80 | Hi 80 | Lo 95 | Hi 95 |
---|---|---|---|---|---|
104 | 1054.683 | 920.1562 | 1189.210 | 848.9419 | 1260.424 |
105 | 1085.827 | 949.4560 | 1222.198 | 877.2657 | 1294.388 |
106 | 1115.488 | 975.2437 | 1255.732 | 901.0030 | 1329.972 |
107 | 1143.736 | 997.1593 | 1290.313 | 919.5661 | 1367.906 |
108 | 1170.640 | 1015.0999 | 1326.180 | 932.7620 | 1408.518 |
109 | 1196.263 | 1029.1864 | 1363.339 | 940.7417 | 1451.783 |
110 | 1220.665 | 1039.6919 | 1401.639 | 943.8904 | 1497.440 |
111 | 1243.906 | 1046.9644 | 1440.848 | 942.7097 | 1545.102 |
112 | 1266.040 | 1051.3673 | 1480.713 | 937.7263 | 1594.354 |
113 | 1287.121 | 1053.2442 | 1520.997 | 929.4375 | 1644.804 |
To understand the evolution of the number of confirmed COVID-19 cases in each state, and to model the similarities in the states' trajectories, a two-step approach termed Dynamic Exploratory Graph Analysis (DynEGA) is used. Here, the number of confirmed cases per state was transformed into the number of confirmed cases per 100,000 inhabitants. The first step of the DynEGA method transforms each time series (i.e., the number of confirmed cases per 100,000 inhabitants) into a time delay embedding matrix, which can be used to reconstruct the attractor of a dynamical system from a single sequence of observations (Takens, 1981; Whitney, 1936). An attractor carries useful information about the dynamical system: it is the set of values toward which the system tends to evolve, given a set of starting conditions. In many empirical situations, however, the collection of possible system states (the phase space) and the equations governing the system are unknown. In such situations, attractor reconstruction techniques can be used to reconstruct the phase-space dynamics from, for example, a single time series of observed values.
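As an illustration of this first step, the sketch below rescales a series to cases per 100,000 inhabitants and builds a time delay embedding matrix from it. The function names, the number of embedding dimensions, the lag `tau`, and the toy numbers are assumptions chosen for the example; they are not the settings used in this report.

```python
def per_100k(cases, population):
    """Rescale raw case counts to cases per 100,000 inhabitants."""
    return [100_000 * c / population for c in cases]


def time_delay_embed(series, n_embed, tau=1):
    """Build a time delay embedding matrix.

    Each row is an overlapping window of the series:
    [x_t, x_(t + tau), ..., x_(t + (n_embed - 1) * tau)].
    """
    n_rows = len(series) - (n_embed - 1) * tau
    if n_rows < 1:
        raise ValueError("series is too short for this embedding")
    return [
        [series[t + j * tau] for j in range(n_embed)]
        for t in range(n_rows)
    ]


# Hypothetical counts and population, for illustration only.
toy_cases = [12, 20, 35, 51, 80, 110, 150]
rates = per_100k(toy_cases, population=2_000_000)
for row in time_delay_embed(rates, n_embed=3, tau=1):
    print(row)
```

Each row of the resulting matrix is a short window of consecutive observations, so a series of length n with n_embed columns and lag tau yields n - (n_embed - 1) * tau rows.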
After a time delay embedding matrix is created for each state, derivatives are estimated using generalized local linear approximation (GLLA; Boker, Deboeck, Edler, & Keel, 2010; Deboeck, Montpetit, Bergeman, & Boker, 2009). GLLA is a technique for estimating how a variable changes as a function of time. The instantaneous rate of change of one variable with respect to another is known as a derivative. Derivatives can represent different aspects of change, such as the velocity (rate of change) and the acceleration in the number of confirmed COVID-19 cases (first- and second-order derivatives, respectively).
In the second step, exploratory graph analysis, a network approach for dimensionality assessment and reduction (Golino & Epskamp, 2017; Golino et al., 2020), is used to estimate clusters of states. The clusters reflect how the numbers of confirmed COVID-19 cases are changing together across states.
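The GLLA computation can be sketched in a few lines of dependency-free Python. This follows the usual formulation, in which each row of the embedding matrix is regressed on a small polynomial basis so that the derivative estimates are Y = X W with W = L (L'L)^-1. The function names, the default embedding dimension, and the test series are assumptions for illustration, and the sketch is not the EGAnet implementation.

```python
from math import factorial


def time_delay_embed(series, n_embed, tau=1):
    """Rows are windows [x_t, x_(t + tau), ..., x_(t + (n_embed - 1) * tau)]."""
    n_rows = len(series) - (n_embed - 1) * tau
    return [[series[t + j * tau] for j in range(n_embed)] for t in range(n_rows)]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def glla_weights(n_embed, tau=1, delta=1.0, order=2):
    """Weight matrix W = L (L'L)^-1, with L[j][k] = offset_j ** k / k!."""
    centre = (n_embed - 1) / 2  # offsets are centred on the window midpoint
    offsets = [(j - centre) * tau * delta for j in range(n_embed)]
    L = [[v ** k / factorial(k) for k in range(order + 1)] for v in offsets]
    k = order + 1
    LtL = [[sum(L[r][a] * L[r][b] for r in range(n_embed)) for b in range(k)]
           for a in range(k)]
    inv = invert(LtL)
    return [[sum(L[r][a] * inv[a][b] for a in range(k)) for b in range(k)]
            for r in range(n_embed)]


def glla(series, n_embed=5, tau=1, delta=1.0, order=2):
    """Per window: estimated level, first derivative, second derivative, ..."""
    X = time_delay_embed(series, n_embed, tau)
    W = glla_weights(n_embed, tau, delta, order)
    return [[sum(row[r] * W[r][b] for r in range(n_embed))
             for b in range(order + 1)] for row in X]


# Sanity check on x(t) = t^2, whose derivatives are 2t and 2.
for level, velocity, acceleration in glla([t * t for t in range(10)]):
    print(round(level, 6), round(velocity, 6), round(acceleration, 6))
```

Here `delta` is the time between successive observations (one day for daily counts). The estimates refer to the midpoint of each window, which is why the offsets are centred before the basis is built.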
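To give a rough intuition for clustering states by how their derivatives move together, the sketch below uses a deliberately simplified stand-in and not exploratory graph analysis itself. EGA as described by Golino & Epskamp (2017) estimates a regularized partial correlation network and applies a community detection algorithm; the sketch instead links two series when their Pearson correlation exceeds a threshold and labels each connected component as a cluster. The function names, threshold, state labels, and values are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined for constant series


def cluster_by_correlation(derivs, threshold=0.7):
    """Simplified stand-in for EGA: thresholded correlation network,
    with each connected component treated as one cluster."""
    names = sorted(derivs)
    neighbours = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(derivs[a], derivs[b]) >= threshold:
                neighbours[a].add(b)
                neighbours[b].add(a)
    clusters, seen, label = {}, set(), 0
    for name in names:
        if name in seen:
            continue
        label += 1
        stack = [name]  # depth-first walk over one component
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            clusters[current] = label
            stack.extend(neighbours[current] - seen)
    return clusters


# Hypothetical first-order derivative series for four made-up units.
toy_derivs = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.5],
    "C": [4.0, 3.0, 2.0, 1.0],
    "D": [8.0, 6.0, 4.0, 2.5],
}
print(cluster_by_correlation(toy_derivs))
```

A hard threshold on marginal correlations is much cruder than a regularized partial correlation network, so the sketch conveys only the general idea of grouping series that change together.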
In the current analysis, the EGAnet package (Golino & Christensen, 2019) is used to implement the DynEGA method.
The figure above shows how the states cluster, given the rate of change (first-order derivative) in the number of confirmed COVID-19 cases per 100,000 inhabitants. It is important to point out that these clusters are dynamic and can change from day to day as new data are gathered by the authorities.
The states and their respective clusters are listed in the table below:
State | Cluster Number |
---|---|
AL | 1 |
AR | 1 |
CA | 1 |
CO | 1 |
CT | 1 |
DC | 1 |
DE | 1 |
GA | 1 |
IN | 1 |
KY | 1 |
MD | 1 |
MA | 1 |
MS | 1 |
NC | 1 |
ND | 1 |
OH | 1 |
PA | 1 |
RI | 1 |
SD | 1 |
TX | 1 |
UT | 1 |
AZ | 2 |
IL | 2 |
IA | 2 |
KS | 2 |
MN | 2 |
NE | 2 |
NH | 2 |
NM | 2 |
TN | 2 |
VA | 2 |
WI | 2 |
WY | 2 |
AK | 3 |
HI | 3 |
ID | 3 |
LA | 3 |
MT | 3 |
VT | 3 |
WA | 3 |
FL | 4 |
ME | 4 |
MI | 4 |
MO | 4 |
NV | 4 |
NJ | 4 |
NY | 4 |
OK | 4 |
OR | 4 |
SC | 4 |
WV | 4 |
The latent trends for each cluster can be seen in the plot below:
Finally, the maps below show the average rate of change in the number of confirmed COVID-19 cases per 100,000 inhabitants per state, and the geographical distribution of the clusters.
The average rate of change for each cluster can be seen below:
Cluster | Mean Rate of Change | Variance of the Rate of Change |
---|---|---|
1 | 3.439825 | 2898.387 |
2 | 1.988028 | 161.392 |
3 | 1.726087 | 1049.183 |
4 | 3.957361 | 17556.313 |
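A per-cluster summary of this kind can be computed as in the short sketch below. The rates and cluster labels are made up, and the use of the sample variance (dividing by n - 1) is an assumption, since the report does not state which variance estimator was used.

```python
from collections import defaultdict
from statistics import mean, variance


def summarize_clusters(rates, clusters):
    """Mean and sample variance of the rate of change within each cluster."""
    grouped = defaultdict(list)
    for unit, rate in rates.items():
        grouped[clusters[unit]].append(rate)
    return {
        label: (mean(values), variance(values))  # variance needs >= 2 values
        for label, values in sorted(grouped.items())
    }


# Hypothetical rates of change and cluster labels, for illustration only.
toy_rates = {"S1": 1.0, "S2": 3.0, "S3": 10.0, "S4": 14.0}
toy_clusters = {"S1": 1, "S2": 1, "S3": 2, "S4": 2}
for label, (avg, var) in summarize_clusters(toy_rates, toy_clusters).items():
    print(label, avg, var)
```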
Boker, S. M., Deboeck, P. R., Edler, C., & Keel, P. (2010). Generalized local linear approximation of derivatives from time series. In S. M. Chow, E. Ferrer, & F. Hsieh (Eds.), The Notre Dame series on quantitative methodology. Statistical methods for modeling human dynamics: An interdisciplinary dialogue (pp. 161–178). Routledge/Taylor & Francis Group.
Deboeck, P. R., Montpetit, M. A., Bergeman, C., & Boker, S. M. (2009). Using derivative estimates to describe intraindividual variability at multiple time scales. Psychological Methods, 14(4), 367–386. https://doi.org/10.1037/a0016622
Dong, E., Du, H., & Gardner, L. (2020). An interactive web-based dashboard to track COVID-19 in real time. The Lancet Infectious Diseases.
Golino, H., & Christensen, A. P. (2019). EGAnet: Exploratory graph analysis: A framework for estimating the number of dimensions in multivariate data using network psychometrics. Retrieved from https://CRAN.R-project.org/package=EGAnet
Golino, H. F., & Epskamp, S. (2017). Exploratory graph analysis: A new approach for estimating the number of dimensions in psychological research. PloS One, 12(6), e0174035. https://doi.org/10.1371/journal.pone.0174035
Golino, H., Shi, D., Garrido, L. E., Christensen, A. P., Nieto, M. D., Sadana, R., … Martinez-Molina, A. (2020). Investigating the performance of exploratory graph analysis and traditional techniques to identify the number of latent factors: A simulation and tutorial. Psychological Methods, Advance online publication. https://doi.org/10.1037/met0000255
Hyndman, R. J., & Athanasopoulos, G. (2018). Forecasting: Principles and practice. OTexts.
Hyndman, R., Koehler, A. B., Ord, J. K., & Snyder, R. D. (2008). Forecasting with exponential smoothing: The state space approach. Springer Science & Business Media.
Takens, F. (1981). Detecting strange attractors in turbulence. In Lecture notes in mathematics (Vol. 898, pp. 366–381). Springer. https://doi.org/10.1007/BFb0091924
Whitney, H. (1936). Differentiable manifolds. The Annals of Mathematics, 37(3), 645–680. https://doi.org/10.2307/1968482