Introduction

In the hopes of better informing public health decision-making, researchers have developed many prediction models to forecast the COVID-19 pandemic. Effective forecasts capable of identifying reliable leading indicators of emerging outbreaks could improve policy recommendations. To this end, factors such as mask-wearing1,2, weather3,4, and demography5 have been found to be associated with rates of infection in the United States. The effectiveness of other non-pharmaceutical interventions (NPIs) such as government lockdowns is also well studied6,7,8, although some questions still remain. For instance, it is challenging to disentangle the effects of overlapping NPIs, such as the rapid increase in mask-wearing in early April 2020 alongside widespread lockdowns in many parts of the United States.

Cell phone mobility data has emerged as an appealing surrogate for government mandates. Since it is a directly observable measure of human movement, it contains more information than the duration of government orders. In addition, it may serve as a better proxy for the actual quantity that government actions are intended to reduce: the relative frequency of risky in-person interactions where transmissions may occur. Mobility information is available through public APIs such as Google’s Community Mobility Reports9 and SafeGraph’s completely at-home metric10. The ubiquity of accessible mobility data, together with the lack of alternative sources of data such as contact tracing information, has made mobility an attractive proxy for interactions.

As mobility plummeted to unprecedented levels during the first wave of the pandemic, these publicly available data sources received widespread attention. Mainstream media such as the Washington Post11,12, Wall Street Journal13, New York Times14, Los Angeles Times15, and National Public Radio16 have all analyzed cell phone mobility and highlighted its record drop in 2020. Moreover, public-facing epidemiology dashboards, such as those of the US Centers for Disease Control and Prevention17 and the Institute for Health Metrics and Evaluation18, prominently list mobility as a metric of interest. As articles in leading scientific journals began to suggest that mobility data could be a valuable tool for battling the pandemic19,20,21, it is not surprising that many COVID-19 forecasts have used mobility as a data source.

Although a large body of work uses mobility to predict COVID-19 spread, many of the resulting conclusions are not broadly applicable outside of the initial wave of the pandemic. In particular, data limitations and inherent modeling assumptions restrict the applicability of these earlier works8,21,22,23,

Fig. 1: Weekly log infection growth rate yi,t of the inferred true incidence of infection.

Each county i is faceted into panels by its US Census division. Within a panel, each row represents a county, and counties in the same combined statistical area (CSA) are grouped together in adjacent rows. For example, the cluster of rows in the bottom third of the Mid Atlantic with rapidly declining growth rates in April 2020 represents counties in the New York-Newark CSA. Counties within the same CSA tend to exhibit similar trends in log growth rate. At a high level, the national surge in fall 2020, followed by declining infection rates in early 2021, is pronounced.

Google’s mobility trends capture six distinct types of mobility: grocery/pharmacy, residential, retail/recreation, workplace, transit, and parks. Figure 2 shows the weekly trend for each of these variables for each county in three CSAs: New York City, San Francisco, and Green Bay, WI. Mobility values are reported relative to a baseline level in January 2020 for each county, which normalizes for population and pre-pandemic mobility levels. The rapid drop in mobility following widespread lockdowns in March 2020 is present in all locations. Furthermore, it is clear that these six mobility variables are tightly connected: grocery/pharmacy, retail/recreation, workplace, and transit are positively correlated, while residential mobility is negatively correlated with the others.

Fig. 2: Illustrative county-level mobility.

County-level weekly % changes from baseline mobility for six mobility categories (grocery and pharmacy, parks, residential, retail and recreation, transit stations, and workplace) are shown for three CSAs.

Overly flexible models lead to incorrect and misleading inferences

Before presenting results from our final model, we begin with some examples of how overly flexible models can overfit and lead to confusing conclusions. Each column of Fig. 3 highlights a different type of pitfall that can occur when the models under consideration are not properly constrained. The top row of the figure plots specific predictor variables (e.g., temperature or mobility) of interest. The middle row shows the estimated coefficients learned by a model for the variables in the top row. The bottom row shows the observed and fitted (i.e., predicted) values from the model.

Fig. 3: Illustrative model shortcomings.

Pitfalls due to collinearity in covariates (left), too much flexibility in mobility (middle), and too much flexibility by including temperature (right). Observed covariates and infection rates from two counties are used to demonstrate these limitations; the same data is used for both overflexible examples. Observed covariate values (top), estimated time-varying effects (middle), and fitted and observed growth rates (bottom) are shown with median (solid lines) and 95% quantiles (shaded).

The left column (labeled “collinear”) of Fig. 3 illustrates an issue caused by collinear input predictor variables. It is tempting to include each of the Google mobility types as separate variables in a model that predicts weekly infection growth rates from covariates such as mobility. However, the strong correlations between the different mobility variables often lead to misleading estimated associations between distinct mobility variables and infection growth rates. The top-left pane of the figure displays the retail/recreation and workplace mobility, two highly correlated mobility metrics within this CSA. Nonsensically, the learned association between retail mobility and infection rates is negative throughout the first three waves. This misleadingly suggests that higher levels of retail-related mobility correlate with lower infection rates, but is clearly an artifact of the collinearity between retail and workplace mobility. In our final model, we collapse the original six Google mobility measures into a single value using principal components analysis to avoid such unintentional side effects caused by collinearity29. This univariate feature captures over 60% of the variability in the original six mobility measures.
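
As a rough illustration of collapsing correlated mobility series into one feature, the sketch below extracts a first principal component by power iteration. It is not the authors' implementation: the standardization step, the synthetic data, and all function and variable names are assumptions made for this example only.

```python
import math
import random


def first_principal_component(rows, n_iter=500):
    """Return (loadings, scores, explained_fraction) for the first PC.

    rows: list of observations, each a list of p numeric features.
    Assumes every feature is non-constant; features are standardized
    first (an assumption, not necessarily the paper's preprocessing).
    """
    n, p = len(rows), len(rows[0])
    # Standardize each column to mean 0 and unit sample variance.
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
        for j in range(p)
    ]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    # Correlation matrix of the standardized features.
    corr = [
        [sum(zr[a] * zr[b] for zr in z) / (n - 1) for b in range(p)]
        for a in range(p)
    ]
    # Power iteration converges to the leading eigenvector.
    v = [1.0 / (j + 1) for j in range(p)]
    for _ in range(n_iter):
        w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the leading eigenvalue (v has unit length).
    eigenvalue = sum(
        v[a] * sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)
    )
    explained = eigenvalue / sum(corr[a][a] for a in range(p))
    scores = [sum(zr[j] * v[j] for j in range(p)) for zr in z]
    return v, scores, explained


# Synthetic stand-in for six mobility series driven by one latent factor;
# the fifth series moves opposite to the others, as residential mobility does.
random.seed(0)
latent = [random.gauss(0, 1) for _ in range(200)]
signs = [1, 1, 1, 1, -1, 0.3]
rows = [[s * f + random.gauss(0, 0.5) for s in signs] for f in latent]
loadings, scores, explained = first_principal_component(rows)
```

Here `scores` plays the role of the univariate mobility feature, and `explained` is the share of total variance it captures, the same kind of quantity as the "over 60%" figure reported above.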

The middle (“overflexible-mobility”) column of Fig. 3 identifies another pitfall caused by too much model flexibility. The first principal component of mobility is plotted, along with its estimated association with growth rates from a model that allows the association to vary freely each month. The model’s effect of mobility over time for this location varies considerably and appears to be overfit.

The right (“overflexible-temperature”) column of Fig. 3 shows results from a different model, now allowing both the effect of temperature and the univariate measure of mobility to vary smoothly over time. Allowing the effect of temperature to also vary over time overpowers much of the signal contained in mobility, and the near-perfect fit makes clear that the model is overfitted.

Properly constrained models lead to meaningful inferences

We incorporated the findings from these pitfalls into our final model. We use the first principal component of mobility as a univariate summary of the original six mobility metrics, and allow its effect to vary over four “waves” of 13 weeks each, spanning February 2020 to February 2021. Full details of the model can be found in Methods.
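
To make the wave structure concrete, the following minimal sketch maps a week to one of four 13-week waves. The start date `FIRST_WEEK_START` is a hypothetical placeholder chosen for illustration (the exact boundaries are given in Methods, not here), and the function name is likewise an assumption.

```python
from datetime import date, timedelta

WEEKS_PER_WAVE = 13
N_WAVES = 4
# Hypothetical start of the study period; the true date is not stated here.
FIRST_WEEK_START = date(2020, 2, 23)


def wave_index(week_start, first_week_start=FIRST_WEEK_START):
    """Return the 1-based wave number (1..4) for a week, or None if outside."""
    offset_days = (week_start - first_week_start).days
    if offset_days < 0:
        return None
    wave = offset_days // (7 * WEEKS_PER_WAVE) + 1
    return wave if wave <= N_WAVES else None


# Weeks 0-12 fall in wave 1, weeks 13-25 in wave 2, and so on.
example = wave_index(FIRST_WEEK_START + timedelta(weeks=13))
```

With a per-wave indicator like this, a mobility coefficient can be indexed by wave instead of by month or week, which is the constraint that separates the final model from the overly flexible ones above.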

As a first set of qualitative checks, we display in Fig. 4 how well our final model fits the observed infection rates for three CSAs chosen to illustrate heterogeneity in conclusions and model fit. The columns show results from San Francisco, New York City, and Green Bay, WI, respectively. For each location, the aggregate mobility metric is plotted over time, along with the model’s coefficients for mobility per wave and the fitted and observed infection rate values. New York had a strong association between mobility and growth rates at the beginning of the pandemic, Green Bay had a strong association later in the pandemic, and San Francisco never had a strong association.

Fig. 4: Illustrative model data, estimated effects, and fitted values.

Mobility covariates (top), estimated time-varying effect of mobility (middle), and fitted and observed infection growth rate values (bottom) for three example CSAs. New York has a strong estimated effect of mobility in waves one and two, whereas Green Bay has a strong estimated effect of mobility in the second to fourth waves. San Jose has a moderate effect of mobility in the fourth wave. Median (solid lines) and 95% quantiles (shaded) are shown.

Mobility was most predictive in urban areas during spring 2020; elsewhere exhibited substantial variation

Figure 5 presents the R2 of our model across different subsets of data. Panel (a) shows the overall R2 of the model for each week and the R2 across counties with varying population sizes. The overall fit is best during the first months of the pandemic and for the largest counties (populations of more than 250,000, comprising 64% of the total US population). R2 is low across the rural 46% of counties with a population of less than 25,000. Panel (b) shows similar R2 results according to the US Census region. The Northeast exhibits the best fit while the South has the poorest fit. Panels (c) and (d) show additional R2 results as a function of overall relative mobility levels across all locations and time. Model performance is highest during the first wave in most urban counties when mobility levels are at their lowest values. Interestingly, during the third and fourth waves, there is minimal difference in R2 as a function of mobility levels, suggesting that at this coarse level of analysis mobility’s association with infection growth rates weakened over time.
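
The subset-level comparison described here can be sketched as follows. This assumes the conventional definition R2 = 1 - SS_res / SS_tot computed within each subset, which may differ from the exact estimator used in the paper; the record layout, the helper names, and the numbers are all hypothetical.

```python
from collections import defaultdict


def r_squared(observed, fitted):
    """Conventional R2: 1 - residual sum of squares / total sum of squares."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    return 1.0 - ss_res / ss_tot


def r_squared_by_group(records, key):
    """Compute R2 separately for each value of records[i][key]."""
    groups = defaultdict(lambda: ([], []))
    for rec in records:
        obs, fit = groups[rec[key]]
        obs.append(rec["observed"])
        fit.append(rec["fitted"])
    return {g: r_squared(obs, fit) for g, (obs, fit) in groups.items()}


# Made-up growth rates: a perfect fit in one group, a mean-only fit in the other.
records = [
    {"size": "large", "observed": 0.1, "fitted": 0.1},
    {"size": "large", "observed": -0.2, "fitted": -0.2},
    {"size": "large", "observed": 0.3, "fitted": 0.3},
    {"size": "small", "observed": 0.2, "fitted": 0.0},
    {"size": "small", "observed": -0.2, "fitted": 0.0},
]
by_size = r_squared_by_group(records, "size")
```

Swapping the grouping key (for example week, region, or wave) gives the other breakdowns, since only the partition of county-weeks changes.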

Fig. 5: Model performance.

a R2 per week, overall, and by county population. b R2 per region and overall. At these coarse levels, models fit the best during April–May 2020. Fits were poor in summer and improved in some places during fall and winter, but never returned to the initial high levels. c R2 as a function of the overall level of mobility, further broken down by county population. Mobility contains the most signal in the highest population counties when its overall value is extremely low. d R2 as a function of the overall level of mobility, further broken down by wave. Mobility contains the most signal in the first wave at extremely low values. Median (solid lines) and 95% quantiles (shaded) are shown.

In Fig. 6, we visualize the effect of mobility alongside the corresponding R2 for each wave and CSA on a map of the US. There is a striking degree of non-stationarity in the estimated effects over time and space. In the first wave, the estimated effect of mobility is close to zero throughout most of the South, as well as much of the West and Midwest. The signal weakens considerably in the second wave, while in the third wave the signal is strongest in the Midwest. Although the estimated effects of mobility sometimes appear strong, as in the fourth wave spanning winter 2020 into early 2021, the corresponding R2 values are often fairly weak.

Fig. 6: Estimated coefficients and R2 by CSA across the four waves.

Maps created using the R package usmap.

Overly rigid models underfit and wash out spatial and temporal effects

To assess whether our final model can be made simpler without sacrificing accuracy, we consider simpler models that restrict how mobility’s effect may vary across time and space. We construct an ablation study of six models: letting mobility’s effect vary by CSA, by region, or be fixed nationally; and letting mobility’s effect vary for each wave, or be fixed in time.
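
The six-model grid is the cross product of three spatial groupings and two temporal forms, which the sketch below enumerates. The coefficient count is only a rough measure of flexibility under the assumption of one free mobility coefficient per group and period; the labels, the function name, and the CSA count used in the example are hypothetical.

```python
from itertools import product

SPATIAL = ("national", "region", "CSA")
TEMPORAL = ("constant", "by_wave")

# Every combination of spatial grouping and temporal form: 3 x 2 = 6 models.
ablations = [{"spatial": s, "temporal": t} for s, t in product(SPATIAL, TEMPORAL)]


def n_mobility_coefficients(spec, n_regions, n_csas, n_waves=4):
    """Rough count of free mobility coefficients implied by one model spec."""
    n_groups = {"national": 1, "region": n_regions, "CSA": n_csas}[spec["spatial"]]
    n_periods = n_waves if spec["temporal"] == "by_wave" else 1
    return n_groups * n_periods


# Illustrative counts with 4 regions and a hypothetical 100 CSAs.
counts = {
    (m["spatial"], m["temporal"]): n_mobility_coefficients(m, 4, 100)
    for m in ablations
}
```

The count grows from 1 for the national constant model to `n_csas * n_waves` for the CSA-by-wave model, which is why the latter can capture effects that the rigid models average away.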

For the three example CSAs shown previously, we display the estimated effect of mobility across time for each ablation in Fig. 7. Comparisons of models allowing differential effects of mobility across locations show that rigid national grouping averages over effects visible at finer spatial groupings, such as by region and CSA. Similar limitations are observed with constant temporal effects for mobility. This averaging is not just superficial: our conclusions on the association between mobility and the infection growth rate change. For example, in our final model, we conclude that there is no effect of mobility on the infection growth rate in New York during the third wave. However, all other progressions would conclude that there is a strong association. Likewise, the simpler model that allows mobility’s effect to vary by CSA but forces it to be fixed in time would conclude that New York and Green Bay have very similar associations between mobility and infection rates. However, the final model clearly shows that they are actually quite different, as New York had the strongest association early on while the opposite trend held in Green Bay.

Fig. 7: Overly rigid models average over spatiotemporal effects.

The estimated effect of mobility for different spatial clustering (rows) and the form of temporal effects (line type) for three illustrative CSAs are displayed. Median (solid lines) and 95% quantiles (shaded) are shown.

Table 1 tabulates the overall and by-region R2 for each of the six model progressions. As expected, greater flexibility generally results in a higher overall R2. The greatest differences in R2 are observed at finer disaggregations: the simplest model has an R2 of just 19% in the Northeast, whereas our four-wave CSA model achieves an R2 of over 40%, indicating that both time-varying coefficients and choice of clustering are critical. In Supplementary Fig. 4 we show that our final model does not overfit.

Table 1 R2 for no spatial clustering, clustering by region, or clustering by CSA, and constant or time-varying mobility coefficients.

Assessing the mask effect

On April 4, 2020, the Centers for Disease Control and Prevention (CDC) began recommending public mask use, a stark reversal of earlier guidance. This led to an increase in mask use across the United States coincident with large drops in mobility. As a result of these concurrent events, mask use and mobility are strongly correlated in the first wave. To facilitate interpretation, we model the association between masks and the infection growth rate as a national effect that is constant across time. All other factors held constant, we estimate an expected 2% decrease in the infection growth rate due to an additive increase in mask adherence of 10%.

To untangle the effect of masks and mobility in the first wave, we compare the R2 by date in models with and without a mask variable. In the 4-week period following April 4, 2020, we find that overall R2 increases by approximately 10% when the mask variable is included in the model; see Supplementary Fig. 5 for additional details.

Conclusions are robust across different mobility data sources

To assess whether our conclusions are sensitive to the choice of mobility measure, we consider SafeGraph’s completely at home data measure (completely_home_prop_7dav)10,30 in place of the first principal component of Google’s mobility indicators. Our conclusions are very similar when using either Google’s or SafeGraph’s mobility measure. Panel (a) of Fig. 8 displays performance, as measured by R2, over time and by county population. As in Fig. 5, R2 is highest at the beginning of the pandemic and in high population counties.

Fig. 8: Conclusions are robust across mobility data sources.

a R2 obtained using SafeGraph’s completely at home metric in place of the first principal component of Google’s mobility trends dataset. Median (solid lines) and 95% quantiles (shaded) are shown. b Rolling median 3-month correlation between SafeGraph’s completely at home mobility metric and each of the six Google mobility indicators. From March–June, absolute correlations were high, indicating consistency between all measures of mobility. However, the strong relationship decayed through May–October, suggesting person-to-person contact patterns may not be well captured through coarse cell phone mobility after the initial period.

In panel (b) of Fig. 8, the rolling 3-month correlation between SafeGraph’s completely at home measure and each of Google’s six mobility measures is plotted. From March–June 2020, we see correlations with large magnitudes across all variables, providing evidence that stay-at-home orders, lockdown orders, and general uncertainty resulted in a large correlated shift in mobility that is observable across different measures. As a result, during the first wave of the pandemic, any of these mobility measures should have a similar ability to predict infection rates. However, as the pandemic progressed this relationship eroded, potentially suggesting that coarse cell phone measures of mobility began to capture different aspects of mobility, in ways that may not explain person-to-person contact patterns as reliably.
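
A rolling correlation of this kind can be sketched for a single pair of series as below. The sketch uses a trailing window of a fixed number of observations (for weekly data, 13 weeks would approximate three months, which is an assumption), does not reproduce the median-across-locations step mentioned in the Fig. 8 caption, and uses hypothetical function names and toy inputs.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if degenerate)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def rolling_correlation(x, y, window):
    """Correlation over each trailing window; one value per complete window."""
    return [
        pearson(x[i - window:i], y[i - window:i])
        for i in range(window, len(x) + 1)
    ]


# Toy series that move in exactly opposite directions, as an at-home share
# and a workplace-visits index might during a lockdown.
at_home = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
workplace = [-1.0, -2.0, -3.0, -4.0, -5.0, -6.0]
rolling = rolling_correlation(at_home, workplace, window=3)
```

Plotting the magnitude of such a series over time would show the pattern described above: values near 1 while all measures shift together, and smaller values as the measures diverge.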