Spatio-temporal clustering analysis using generalized lasso with an application to reveal the spread of Covid-19 cases in Japan

Rahardiantoro, Septian; Sakamoto, Wataru

doi:10.1007/s00180-023-01331-x

Original paper
Published: 11 April 2023

Volume 39, pages 1513–1537, (2024)

Computational Statistics

Abstract

This study addressed the issue of determining multiple potential clusters with regularization approaches for the purpose of spatio-temporal clustering. The generalized lasso framework is flexible enough to incorporate adjacencies between objects in the penalty matrix and to detect multiple clusters. A generalized lasso model with two ${L}_{1}$ penalties is proposed, which can be separated into two generalized lasso models: trend filtering of the temporal effect and fused lasso of the spatial effect for each time point. To select the tuning parameters, the approximate leave-one-out cross-validation (ALOCV) and generalized cross-validation (GCV) are considered. A simulation study is conducted to evaluate the proposed method compared to other approaches in different problems and structures of multiple clusters. The generalized lasso with ALOCV and GCV provided smaller MSE in estimating the temporal and spatial effects compared to the unpenalized method, ridge, lasso, and generalized ridge. In temporal effects detection, the generalized lasso with ALOCV and GCV provided relatively smaller and more stable MSE than other methods, for different structures of true risk values. In spatial effects detection, the generalized lasso with ALOCV provided a higher index of edges detection accuracy. The simulation also suggested using a common tuning parameter over all time points in spatial clustering. Finally, the proposed method was applied to the weekly Covid-19 data in Japan from March 21, 2020, to September 11, 2021, along with the interpretation of the dynamic behavior of multiple clusters.

1 Introduction

Spatio-temporal data consist of information about objects or events located in space over a period of time. Ansari et al. (2020) classified spatio-temporal data into five types and provided a review on spatio-temporal clustering. Our study focuses on geo-referenced time series clustering, which aims to identify the dynamic behavior of clusters of objects over time. The preceding literature on geo-referenced time series clustering includes fuzzy clustering methods (Izakian et al. 2013, 2015), the NeuCube spiking neural network architecture for brain data (Doborjeh and Kasabov 2015; Doborjeh et al. 2018), and Correlation-based Clustering of Big Spatiotemporal Datasets (CorClustST) (Husch et al. 2020).

Recently, regularization approaches have attracted attention in spatio-temporal analysis. One approach that has great potential is the generalized lasso. The generalized lasso (Tibshirani and Taylor 2011; Arnold and Tibshirani 2016), a general form of the lasso (Tibshirani 1996), imposes constraints on regression coefficients based on a general structure or geometry using the ${L}_{1}$ penalty. Let ${\varvec{y}}\in {\mathbb{R}}^{n}$ be a response vector, ${\varvec{X}}\in {\mathbb{R}}^{n\times p}$ be a predictor matrix, and ${\varvec{\theta}}\in {\mathbb{R}}^{p}$ be a parameter vector. Then the generalized lasso can be formulated as

$$\arg \,\mathop {\,\min }\limits_{\varvec{\theta} } \left\{ {\left\| {\varvec{y} - \varvec{X} \varvec{\theta} } \right\|_{2}^{2} + \lambda \left\| {\varvec{D} \varvec{\theta} } \right\|_{1} } \right\},$$

(1)

where $\lambda \left( { \ge 0} \right)$ is a tuning parameter, and ${\varvec{D}} \in {\mathbb{R}}^{m \times p}$ is a penalty matrix, of which each row constructs a linear combination of ${\varvec{\theta}}$ to define the desired structural or geometric property of the problem. If ${\varvec{D}}={\varvec{I}}$, then the problem (1) becomes the ordinary lasso.

The generalized lasso has various applications by considering different forms of the penalty matrix ${\varvec{D}}$ and the predictor matrix ${\varvec{X}}$ in the model. If we specify the predictor matrix as ${\varvec{X}}={\varvec{I}}$, then (1) becomes the coefficient smoothing problem, widely known as the fused lasso (Tibshirani et al. 2005; Tibshirani and Wang 2008), trend filtering (Kim et al. 2009; Tibshirani 2014), and the wavelet smoothing (Donoho and Johnstone 1995), according to the specified structure in ${\varvec{D}}$. In contrast, in the case ${\varvec{X}}\ne {\varvec{I}}$, the applications are extended to the modeling problems, such as a modeling for MRI image data (Tibshirani and Taylor 2011), spatially varying coefficient models (Zhao and Bondell 2020; Rahardiantoro and Sakamoto 2021, 2022b), and outlier detection (She and Owen 2010).

The generalized lasso has been applied to spatial data and time series data by determining the penalty matrix ${\varvec{D}}$ appropriately. For spatial clustering analysis, a special form of the generalized lasso is the fused lasso on an irregular graph (Tibshirani and Taylor 2011; Arnold and Tibshirani 2016). In this case, the penalty matrix ${\varvec{D}}$ encodes the structure of the graph, so that each of its rows corresponds to the difference of coefficients between a pair of nodes connected by an edge. A collection of nodes on which the coefficients are estimated as common is considered to form a cluster. An application of the generalized lasso to time series is trend filtering (Tibshirani 2014). In this case, the penalty matrix ${\varvec{D}}$ contains discrete difference operators of a specified order, that is, the first-order difference for estimating a piecewise constant structure, the second-order difference for estimating a piecewise linear structure, etc.

In the preceding literature on spatio-temporal clustering, ordinary lasso approaches have been mainly used in combination with existing clustering methods. Kamenetsky et al. (2022) proposed a lasso approach to detect potential clusters using a scan statistic by implementing a sparse matrix representation of the effects of potential clusters. Chen et al. (2018) built separate lasso sub-models at each time point to detect influential predictors for different historical lags of up to 8 time points and included the neighborhood between objects within a specified radius as one of the predictors. However, these methods are limited in determining multiple potential clusters, because they are highly dependent on the specified radius of the neighborhood.

In this study, we propose a more flexible approach for spatio-temporal clustering, using the generalized lasso framework with two ${L}_{1}$ penalties, in which one penalty corresponds to roughness on the temporal scale, and the other to the fusion of adjacent locations at each time point. The proposed model can be separated into two generalized lasso problems: trend filtering on the temporal scale and fused lasso for spatial clustering at each time point. In the trend filtering problem, a smoothed temporal pattern is estimated from the average value over all locations at each time point. In the fused lasso problem, clusters are constructed at each time point and their relative magnitudes can be compared. Therefore, our proposed method can reveal the dynamic behavior of spatial clusters as time proceeds. One advantage of our proposed method is its flexibility, that is, we can incorporate adjacencies between objects in the penalty matrix, and it is possible to detect multiple clusters.

An essential aspect to obtain appropriate estimates of parameters is to select the optimum tuning parameter. The most common method is the $k$-fold cross-validation. For example, Zhao and Bondell (2020) applied the 10-fold cross-validation to select the tuning parameter in the generalized lasso problem. However, it is known that the $k$-fold cross-validation suffers from large biases in estimation of the out-of-sample prediction error (Rad and Maleki 2020; Rad et al. 2020). In addition, spatio-temporal data have time-dependent and neighbor-dependent structures, so splitting such data into the test and training sets may fail in estimating some coefficients depending on the structures of the penalty matrix ${\varvec{D}}$. The leave-one-out cross-validation (LOOCV), which is the case $k=n$, can reduce the bias in estimating the out-of-sample prediction error, but it requires high computational cost. We consider using the approximate leave-one-out cross-validation (ALOCV) (Rad and Maleki 2020), and its generalized cross-validation (GCV) (Craven and Wahba 1979) version. The ALOCV and GCV provide an approximation of the leave-one-out predicted values based on the primal and dual formulations of the general regularization problems, and can be used formally without breaking the structures of penalty matrix ${\varvec{D}}$.

Then, we apply the proposed method to weekly Covid-19 data in Japan as a real data application. Many studies on spatio-temporal clustering have aimed to reveal the pattern of Covid-19 outbreaks in individual countries, such as Brazil (Castro et al. 2021), the United States (Wang et al. 2021b), and China (Wang et al. 2021a). Takemura et al. (2022) applied an adjusted Echelon scan method to detect multiple space–time clusters of daily Covid-19 cases in Japan, in which they detected time intervals and regions with significantly higher risk of infection than their surrounding ones, and considered the factors that caused them and affected the changes in a cluster’s shape. In contrast, our proposed method can reveal the overall trend of the temporal effect and detect the dynamic behavior of spatial clusters.

The motivation for this study lies in three aspects: its purpose, its contribution to generalized lasso studies, and its real data application. First, we propose an approach to estimate the temporal effect and detect multiple clusters by using the generalized lasso. Second, this paper contributes to ongoing generalized lasso applications by extending them to spatio-temporal data, where the method can be used to estimate the smoothed temporal pattern and identify the dynamic behavior of spatial clusters according to adjacent locations over time. Finally, we are interested in revealing the pattern of the temporal effect and spatial clusters in weekly Covid-19 cases in Japan. Our source code is available via Supplementary Information which can be accessed online at the link https://github.com/Rahardiantoro/Spatiotemporal-Generalized-Lasso-.

The sections are arranged as follows. Section 2 explains our proposed method on the generalized lasso for spatio-temporal clustering. Section 3 contains the methods for selecting the optimum tuning parameter. In Sect. 4, we perform the simulation study to investigate the performances of the proposed method compared to some existing methods. Section 5 contains the real data application of the Covid-19 data in Japan. Finally, Sect. 6 is the conclusion of this study.

2 The generalized lasso for spatio-temporal clustering

We explain our proposed method for spatio-temporal data, which combines two types of the generalized lasso: trend filtering on the temporal scale and the fused lasso on an irregular graph for spatial clustering. We consider the spatio-temporal observations ${y}_{it}$ with locations indexed by $i=1,2,\dots ,S$ and time points indexed by $t=1,2,\dots ,T$. We represent ${y}_{it}$, using the temporal effect ${\alpha }_{t}$ and the spatial effect ${\beta }_{it}$ at each time point, as a linear model

$$y_{it} = \alpha_{t} + \beta_{it} + \varepsilon_{it} , \, i = 1,2, \ldots ,S, \, t = 1,2, \ldots ,T,$$

(2)

where $\varepsilon_{it}$ indicates the noise at the $i$-th location and $t$-th time point. To estimate $\boldsymbol{\alpha }={\left({\alpha }_{1},\dots ,{\alpha }_{T}\right)}^{T}$ and ${{\varvec{\beta}}}_{t}={\left({\beta }_{1t},\dots ,{\beta }_{St}\right)}^{T}$, we use a regularization method to obtain a smoothed temporal effect and clusters in space over time points, by minimizing

$$\mathop \sum \limits_{t = 1}^{T} \mathop \sum \limits_{i = 1}^{S} \left( {y_{it} - \alpha_{t} - \beta_{it} } \right)^{2} + \lambda_{T} P_{T} \left( {\varvec{\alpha}} \right) + \mathop \sum \limits_{t = 1}^{T} \lambda_{S,t} P_{S} \left( {{\varvec{\beta}}_{t} } \right),$$

(3)

where $P_{T} \left( {\varvec{\alpha}} \right)$ and $P_{S} \left( {{\varvec{\beta}}_{t} } \right)$ indicate the penalty terms of temporal effect and spatial effect, respectively, with corresponding tuning parameters $\lambda_{T}$ and $\lambda_{S,t}$. In this case, we use the $L_{1}$ penalty term for 1-dimensional trend filtering (Tibshirani and Taylor 2011)

$$P_{T} \left( {\varvec{\alpha}} \right) = \mathop \sum \limits_{t = 3}^{T} \left| {\alpha_{t} - 2\alpha_{t - 1} + \alpha_{t - 2} } \right|,$$

(4)

and the $L_{1}$ penalty term for the fused lasso on the graph (Tibshirani and Wang 2008; Tibshirani and Taylor 2011)

$$P_{S} \left( {{\varvec{\beta}}_{t} } \right) = \mathop \sum \limits_{{\left( {i,j} \right) \in {\mathcal{E}}}} \left| {\beta_{it} - \beta_{jt} } \right|,$$

(5)

where ${\mathcal{E}}$ is the set of edges on the graph defining adjacency.
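As a minimal sketch of how the penalties (4) and (5) translate into penalty matrices, the following Python code builds a second-order difference matrix and an edge-difference matrix and evaluates $\left\| {{\varvec{D}}{\varvec{\theta}}} \right\|_{1}$. The function names, the 0-based indexing, and the four-node toy graph are illustrative assumptions and are not taken from the paper's source code.

```python
def second_difference_matrix(T):
    """Rows encode alpha_t - 2*alpha_{t-1} + alpha_{t-2} for t = 3, ..., T."""
    D = []
    for t in range(2, T):
        row = [0.0] * T
        row[t - 2], row[t - 1], row[t] = 1.0, -2.0, 1.0
        D.append(row)
    return D


def edge_difference_matrix(S, edges):
    """One row per edge (i, j), encoding beta_i - beta_j (0-based node indices)."""
    D = []
    for i, j in edges:
        row = [0.0] * S
        row[i], row[j] = 1.0, -1.0
        D.append(row)
    return D


def l1_penalty(D, theta):
    """Evaluate the L1 norm of D @ theta."""
    return sum(abs(sum(d * x for d, x in zip(row, theta))) for row in D)


# Temporal penalty matrix for T = 25 weekly time points.
D1 = second_difference_matrix(25)

# Hypothetical toy graph with 4 nodes (not the prefecture adjacency).
toy_edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
D2 = edge_difference_matrix(4, toy_edges)

linear_trend = [0.5 * t for t in range(25)]   # a straight line in time
two_clusters = [1.0, 1.0, 3.0, 3.0]           # two groups of equal values

print(len(D1), len(D1[0]))            # number of rows and columns of D1
print(l1_penalty(D1, linear_trend))   # second differences of a line vanish
print(l1_penalty(D2, two_clusters))   # only edges crossing the clusters count
```

A linear trend incurs no temporal penalty, and the spatial penalty counts only the edges that join different clusters, which is why the two penalties favor piecewise linear trends and piecewise constant spatial effects, respectively.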

We can rewrite the first term in (3) as

$$\mathop \sum \limits_{t = 1}^{T} \mathop \sum \limits_{i = 1}^{S} \left( {y_{it} - \alpha_{t} - \beta_{it} } \right)^{2} = \mathop \sum \limits_{t = 1}^{T} \mathop \sum \limits_{i = 1}^{S} \left( {y_{it} - \overline{y}_{ \cdot t} - \beta_{it} + \overline{\beta }_{ \cdot t} } \right)^{2} + S\mathop \sum \limits_{t = 1}^{T} \left( {\overline{y}_{ \cdot t} - \alpha_{t} - \overline{\beta }_{ \cdot t} } \right)^{2} ,$$

(6)

where $\overline{y}_{ \cdot t} = S^{ - 1} \mathop \sum \limits_{i = 1}^{S} y_{it}$ and $\overline{\beta }_{ \cdot t} = S^{ - 1} \mathop \sum \limits_{i = 1}^{S} \beta_{it}$. If we impose the constraint $\overline{\beta }_{ \cdot t} = 0$ for identifiability of the parameters, Eq. (6) can be expressed as

$$\mathop \sum \limits_{t = 1}^{T} \mathop \sum \limits_{i = 1}^{S} \left( {y_{it} - \alpha_{t} - \beta_{it} } \right)^{2} = \mathop \sum \limits_{t = 1}^{T} \mathop \sum \limits_{i = 1}^{S} \left( {y_{it} - \overline{y}_{ \cdot t} - \beta_{it} } \right)^{2} + S\mathop \sum \limits_{t = 1}^{T} \left( {\overline{y}_{ \cdot t} - \alpha_{t} } \right)^{2} .$$

(7)

Thus, we can express the problem (3) as

$$\mathop \sum \limits_{t = 1}^{T} \mathop \sum \limits_{i = 1}^{S} \left( {y_{it} - \overline{y}_{ \cdot t} - \beta_{it} } \right)^{2} + S\mathop \sum \limits_{t = 1}^{T} \left( {\overline{y}_{ \cdot t} - \alpha_{t} } \right)^{2} + \lambda_{T} P_{T} \left( {\varvec{\alpha}} \right) + \mathop \sum \limits_{t = 1}^{T} \lambda_{S,t} P_{S} \left( {{\varvec{\beta}}_{t} } \right).$$

(8)

Therefore, we can solve the problem of minimizing (8) as separate minimizations for the temporal effect and the spatial effects over time, that is, for estimating $\boldsymbol{\alpha }$ we need only minimize

$$S\mathop \sum \limits_{t = 1}^{T} \left( {\overline{y}_{ \cdot t} - \alpha_{t} } \right)^{2} + \lambda_{T} P_{T} \left( {\varvec{\alpha}} \right),$$

(9)

as a 1-dimensional trend filtering problem, and for estimating ${\varvec{\beta}}_{t}$ we need only minimize, for each $t = 1,2, \ldots ,T$,

$$\mathop \sum \limits_{i = 1}^{S} \left( {y_{it} - \overline{y}_{ \cdot t} - \beta_{it} } \right)^{2} + \lambda_{S,t} P_{S} \left( {{\varvec{\beta}}_{t} } \right),$$

(10)

as a fused lasso problem on the graph. In this study, both the problems (9) and (10) have the form of the generalized lasso (1), and we can apply the R package “genlasso” (Arnold and Tibshirani 2016).
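The separation rests on the identity (7), which holds for any $\boldsymbol{\alpha }$ once each ${\varvec{\beta}}_{t}$ is centred over locations. A small numerical check in Python is sketched below; the dimensions, the random seed, and the variable names are arbitrary choices made for illustration only.

```python
import random

random.seed(0)
S, T = 5, 4  # small illustrative dimensions

# y[i][t], alpha[t], beta[i][t] drawn at random
y = [[random.gauss(0, 1) for _ in range(T)] for _ in range(S)]
alpha = [random.gauss(0, 1) for _ in range(T)]
beta = [[random.gauss(0, 1) for _ in range(T)] for _ in range(S)]

# Impose the identifiability constraint: mean of beta over locations is 0 at each t.
for t in range(T):
    mean_t = sum(beta[i][t] for i in range(S)) / S
    for i in range(S):
        beta[i][t] -= mean_t

# Location averages of y at each time point.
y_bar = [sum(y[i][t] for i in range(S)) / S for t in range(T)]

# Left-hand side of (7): full residual sum of squares.
lhs = sum((y[i][t] - alpha[t] - beta[i][t]) ** 2
          for t in range(T) for i in range(S))

# Right-hand side of (7): spatial part plus temporal part.
spatial = sum((y[i][t] - y_bar[t] - beta[i][t]) ** 2
              for t in range(T) for i in range(S))
temporal = S * sum((y_bar[t] - alpha[t]) ** 2 for t in range(T))

print(abs(lhs - (spatial + temporal)))  # difference is at rounding-error level
```

Because the spatial part involves only ${\varvec{\beta}}_{t}$ and the temporal part only $\boldsymbol{\alpha }$, each can be paired with its own penalty and minimized on its own.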

3 Methods for selecting optimum tuning parameters

LOOCV evaluates the mean squared prediction error obtained by predicting each observation in turn from a model fitted to the remaining $n-1$ observations (Stone 1974). In the context of the generalized lasso problem (1), the LOOCV error for a specified $\lambda$ can be stated as

$$LOOCV\left( \lambda \right) = \frac{1}{n}\sum\limits_{{c = 1}}^{n} {\left( {y_{c} - \varvec{x}_{c}^{T} \widehat{\varvec{\theta} }^{{/c}} } \right)} ^{2} ,$$

(11)

where ${\widehat{\varvec{\theta} }}^{/c}$ represents the leave-one-out estimate of ${\varvec{\theta}}$ when the $c$-th observation is omitted.

In the case of generalized lasso using the ${L}_{1}$ penalty, we have no exact explicit form of the leave-one-out estimate ${\widehat{{\varvec{\theta}}}}^{/c}$ in (11), and solving the generalized lasso problem for each $c$ requires high computational cost. However, we can apply the approximate leave-one-out cross-validation (ALOCV) to reduce computation time, which is based on the primal and dual formulations of non-differentiable regularization problems (Wang et al. 2018; Rad and Maleki 2020). In the context of generalized lasso problem (1), for each given $\lambda$, the algorithm of ALOCV can be described as follows (Wang et al. 2018).

(a) Estimate ${\varvec{\theta}}$ as a solution of the primal problem (1).
(b) Estimate ${\varvec{u}}$ as a solution of the dual problem of (1), which can be expressed as
$$\mathop {{\text{arg }}\,{\text{min}}}\limits_{{{\varvec{\gamma}},{\varvec{u}}}} \frac{1}{2}\left\| {{\varvec{\gamma}} - {\varvec{y}}} \right\|_{2}^{2} \quad {\text{s.t.}}\quad \left\|{\varvec{u}} \right\|_{\infty } \le \lambda\quad {\text{and}}\quad {\varvec{X}}^{T} {\varvec{\gamma}} = {\varvec{D}}^{T} {\varvec{u}}.$$
(12)
(c) Remove the rows of ${\varvec{D}}$ belonging to the index set $E=\left\{s=1,\dots ,m : \left|{\widehat{u}}_{s}\right|=\lambda \right\}$, to construct a submatrix ${{\varvec{D}}}_{-E}$.
(d) Construct the matrix ${\varvec{A}}={\varvec{X}}{\varvec{B}}$, where the columns of ${\varvec{B}}$ span the null space of ${{\varvec{D}}}_{-E}$.
(e) Compute ${{\varvec{H}}}^{\boldsymbol{*}}={\varvec{A}}{{\varvec{A}}}^{+}$, where ${{\varvec{A}}}^{+}$ represents the Moore–Penrose pseudoinverse of ${\varvec{A}}$.
(f) Calculate the ALOCV error as
$$\frac{1}{n}\mathop \sum \limits_{c = 1}^{n} \left( {\frac{{y_{c} - \varvec{x}_{c}^{T} \widehat{\varvec{\theta} }}}{{1 - h_{cc}^{*} }}} \right)^{2} ,$$
(13)

where $h_{cc}^{*}$ is the $c$-th diagonal component of ${\varvec{H}}^{\varvec{*}}$.
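Steps (c) to (f) can be sketched in Python for one special case only: ${\varvec{X}}={\varvec{I}}$ with the graph fused lasso penalty. Under that assumption the null space of ${{\varvec{D}}}_{-E}$ consists of vectors that are constant on each connected component of the graph formed by the retained edges, so $h_{cc}^{*}$ reduces to one over the size of the component containing node $c$. The function names are hypothetical, and `theta_hat` and `u_hat` are supplied by hand for a four-node path graph (they satisfy the optimality conditions of the $\frac{1}{2}$-scaled squared loss that matches (12)); in practice they would come from a solver such as "genlasso".

```python
import math


def component_labels(n, kept_edges):
    """Label connected components of the graph on n nodes (union-find)."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i, j in kept_edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
    return [find(i) for i in range(n)]


def alocv_fused_identity(y, theta_hat, edges, u_hat, lam, tol=1e-8):
    """ALOCV error (13) for X = I and the graph fused lasso penalty."""
    # Step (c): drop the rows of D whose dual variable is on the boundary |u_s| = lambda.
    kept = [e for e, u in zip(edges, u_hat) if abs(abs(u) - lam) > tol]
    # Steps (d)-(e): the null space of D_{-E} is spanned by component indicators,
    # so the diagonal of H* is 1 / (size of the component of each node).
    labels = component_labels(len(y), kept)
    sizes = {}
    for lab in labels:
        sizes[lab] = sizes.get(lab, 0) + 1
    h = [1.0 / sizes[lab] for lab in labels]
    # Step (f): a singleton component gives h = 1; returning infinity in that
    # case is a choice made for this sketch.
    if any(1.0 - hc < tol for hc in h):
        return math.inf, h
    score = sum(((yc - tc) / (1.0 - hc)) ** 2
                for yc, tc, hc in zip(y, theta_hat, h)) / len(y)
    return score, h


# Hand-made example on the path graph 0 - 1 - 2 - 3 with lambda = 0.5.
edges = [(0, 1), (1, 2), (2, 3)]
y = [1.0, 1.4, 3.0, 3.4]
theta_hat = [1.45, 1.45, 2.95, 2.95]
u_hat = [-0.45, -0.5, -0.45]

score, h = alocv_fused_identity(y, theta_hat, edges, u_hat, lam=0.5)
print(score, h)
```

For a general ${\varvec{X}}$ or ${\varvec{D}}$, the null-space basis and the pseudoinverse would have to be computed numerically rather than read off from the graph.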

Then, the optimum tuning parameter $\lambda$ can be selected as the one minimizing ALOCV error (13). Our simulation study (Rahardiantoro and Sakamoto 2022a) suggested that, in the context of spatial clustering with spatially varying coefficient models, the ALOCV could yield slightly smaller out-of-sample prediction error and could detect edges in a graph with differences shrunk more appropriately, compared to $k$-fold cross-validation.

In practical computation, we may fail to obtain the ALOCV error for very small $\lambda$ values. Since ${h}_{cc}^{*}\to 1$ as $\lambda \to 0$, the denominator $1-{h}_{cc}^{*}$ for some $c$ in (13) may become close to zero, and then computation of ALOCV may be unstable, as illustrated in our application to real data (Sect. 5). Rad and Maleki (2020) suggested the generalized cross-validation (GCV) approach, which is to approximate as $h_{cc}^{*} \approx tr\left( {{\varvec{H}}^{\varvec{*}} } \right)/n$, to obtain the following score

$$\frac{1}{n}\mathop \sum \limits_{c = 1}^{n} \left( {\frac{{y_{c} - \varvec{x}_{c}^{T} \widehat{\varvec{\theta} }}}{{1 - tr\left( {{\varvec{H}}^{\varvec{*}} } \right)/n}}} \right)^{2} .$$

(14)

However, its performance does not seem to have been well investigated. We also adopt the GCV approach in our simulation study and application to real data.
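The GCV score (14) needs only the residuals and the trace of ${\varvec{H}}^{\varvec{*}}$, as the following Python sketch shows. The function name is hypothetical, and the example values are hand-made; the trace of 2 assumes the special case ${\varvec{X}}={\varvec{I}}$ with a graph fused lasso penalty, where $tr\left( {{\varvec{H}}^{\varvec{*}} } \right)$ equals the number of fused groups.

```python
def gcv_score(y, fitted, trace_h):
    """GCV score (14): every leverage is replaced by the average tr(H*) / n."""
    n = len(y)
    denom = 1.0 - trace_h / n
    return sum(((yc - fc) / denom) ** 2 for yc, fc in zip(y, fitted)) / n


# Hand-made example: four observations fused into two groups.
y = [1.0, 1.4, 3.0, 3.4]
fitted = [1.45, 1.45, 2.95, 2.95]
gcv = gcv_score(y, fitted, trace_h=2.0)
print(gcv)
```

Since the denominator depends only on the average leverage, a single observation with $h_{cc}^{*}$ close to 1 does not make the score blow up, provided $tr\left( {{\varvec{H}}^{\varvec{*}} } \right) < n$.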

4 Simulation study

In this simulation study, we investigate the performance of our proposed method with the generalized lasso compared to some existing regularization methods. The problem of minimizing (3) involves two penalties, but as explained in Sect. 2, it can be separated into two generalized lasso problems, each with a single penalty. Therefore, we compare our proposed method with regularization methods that involve a single penalty, such as lasso (Tibshirani 1996), ridge (Hoerl and Kennard 1970), and generalized ridge (Zhao and Bondell 2020). Table 1 shows the corresponding penalties for ${P}_{T}\left(\boldsymbol{\alpha }\right)$ and ${P}_{S}\left({{\varvec{\beta}}}_{t}\right)$.

Table 1 The penalties ${P}_{T}\left(\boldsymbol{\alpha }\right)$ and ${P}_{S}\left({{\varvec{\beta}}}_{t}\right)$ used in the simulation study

To address identifiability issues in lasso and ridge, we form groups of adjacent time points (or spatial locations), and pool the temporal effects ${\alpha }_{t}$ (or the spatial effects ${\beta }_{it}$) in the same group. Let ${\boldsymbol{\alpha }}^{*}={\left({\alpha }_{1}^{*},\dots ,{\alpha }_{{T}_{1}}^{*}\right)}^{T}$ be the vector of pooled temporal effects, $\overline{{\varvec{y}} }={\left({\overline{y} }_{\cdot 1},\dots ,{\overline{y} }_{\cdot T}\right)}^{T}$, and ${{\varvec{X}}}_{1}\in {\mathbb{R}}^{T\times {T}_{1}}$ be a block-diagonal predictor matrix with a vector of ones for each block, representing how the elements are pooled. For example, suppose that we have 12 time points, grouped into ${T}_{1}=3$ groups, each containing 4 adjacent time points. In this case, ${\alpha }_{1}^{*},{\alpha }_{2}^{*},{\alpha }_{3}^{*}$ are the coefficients for groups 1, 2, and 3, respectively, and the predictor matrix can be stated as ${{\varvec{X}}}_{1}=\left[\begin{array}{ccc} \varvec{1}& \varvec{0}& \varvec{0}\\ \varvec{0}& \varvec{1}& \varvec{0}\\ \varvec{0}& \varvec{0}& \varvec{1}\end{array}\right]$, where $\varvec{1}=\left[\begin{array}{c}1\\ \vdots \\ 1\end{array}\right]$ and $\varvec{0}=\left[\begin{array}{c}0\\ \vdots \\ 0\end{array}\right]$ are vectors of length 4.

For ridge problems, we can obtain the solution in closed form. The problem of minimizing (9) using the ridge penalty for the temporal effect is rewritten as

$$S\left\| {\overline{\varvec{y}} - {\varvec{X}}_{1} {\varvec{\alpha}}^{\varvec{*}} } \right\|_{2}^{2} + \lambda_{T} \left\| {{\varvec{\alpha}}^{\varvec{*}} } \right\|_{2}^{2} ,$$

(15)

and the solution of minimizing (15) is $\widehat{{\boldsymbol{\alpha }}^{*}}={\left({{\varvec{X}}}_{1}^{T}{{\varvec{X}}}_{1}+{\lambda }_{T}{\varvec{I}}\right)}^{-1}{{\varvec{X}}}_{1}^{T}\overline{{\varvec{y}} }$.
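The pooling matrix and the closed-form expression stated above can be sketched in Python as follows. Because ${{\varvec{X}}}_{1}^{T}{{\varvec{X}}}_{1}$ is diagonal with the group sizes on its diagonal, each pooled coefficient is simply the group sum of $\overline{y}_{\cdot t}$ divided by (group size $+ {\lambda }_{T}$). The function names are hypothetical, and letting the last group absorb any remainder is a convention assumed for this sketch.

```python
def pooling_matrix(T, group_size):
    """Block indicator matrix X1 (T x T1); the last group absorbs any remainder."""
    n_groups = T // group_size
    X = [[0.0] * n_groups for _ in range(T)]
    for t in range(T):
        g = min(t // group_size, n_groups - 1)
        X[t][g] = 1.0
    return X


def pooled_ridge(y_bar, X, lam):
    """Evaluate (X^T X + lam I)^{-1} X^T y_bar for an indicator matrix X."""
    n_groups = len(X[0])
    # Diagonal of X^T X: number of time points in each group.
    sizes = [sum(row[g] for row in X) for g in range(n_groups)]
    # X^T y_bar: sum of y_bar within each group.
    sums = [sum(row[g] * v for row, v in zip(X, y_bar)) for g in range(n_groups)]
    return [s / (n + lam) for s, n in zip(sums, sizes)]


# The 12-point example from the text: three groups of four time points.
X1 = pooling_matrix(12, 4)
y_bar = [1.0] * 4 + [2.0] * 4 + [3.0] * 4

print(pooled_ridge(y_bar, X1, lam=0.0))  # group means, no shrinkage
print(pooled_ridge(y_bar, X1, lam=4.0))  # group means shrunk towards zero
```

With group size 4, a tuning parameter of 4 halves each group mean, which illustrates how the ridge penalty shrinks the pooled coefficients towards zero.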

Similarly, let ${{\varvec{\beta}}}_{t}^{*}={\left({\beta }_{1t}^{*},\dots ,{\beta }_{{S}_{1}t}^{*}\right)}^{T}$ be the vector of pooled spatial effect, ${\widetilde{{\varvec{y}}}}_{t}$ be the vector of ${y}_{it}-{\overline{y} }_{\cdot t}$, and ${{\varvec{X}}}_{2}\in {\mathbb{R}}^{S\times {S}_{1}}$ be a block-diagonal predictor matrix representing how the elements are pooled. Then, the problem of minimizing (10) using the ridge penalty for spatial effect over time is rewritten as

$$\left\| {\widetilde{\varvec{y}}_{t} - {\varvec{X}}_{2} {\varvec{\beta}}_{t}^{*} } \right\|_{2}^{2} + \lambda_{S,t} \left\| {{\varvec{\beta}}_{t}^{*} } \right\|_{2}^{2}$$

(16)

for $t=1,\dots ,T$, and the solution of minimizing (16) is $\widehat{{{\varvec{\beta}}}_{t}^{*}}={\left({{\varvec{X}}}_{2}^{T}{{\varvec{X}}}_{2}+{\lambda }_{S,t}{\varvec{I}}\right)}^{-1}{{\varvec{X}}}_{2}^{T}{\widetilde{{\varvec{y}}}}_{t}$. For lasso problems, the penalties in (15) and (16) are replaced with ${L}_{1}$ penalties. In this simulation study, we used the R package “glmnet” to solve the lasso.

In the generalized ridge problem, we can also obtain the solution in closed form as follows. The problem of minimizing (9) using the generalized ridge penalty for the temporal effect can be expressed in matrix form as:

$$S\left\| {\overline{\varvec{y}} - \varvec{I\alpha }} \right\|_{2}^{2} + \lambda_{T} \left\| {{\varvec{D}}_{1} {\varvec{\alpha}}} \right\|_{2}^{2} ,$$

(17)

where ${\varvec{D}}_{1} \in {\mathbb{R}}^{{m_{1} \times T}}$ is the penalty matrix forming the second-order difference. The solution for ${\varvec{\alpha}}$ is written as $\hat{\varvec{\alpha }} = \left( {{\varvec{I}}^{T} {\varvec{I}} + \lambda_{T} {\varvec{D}}_{1}^{T} {\varvec{D}}_{1} } \right)^{ - 1} {\varvec{I}}^{T} \overline{\varvec{y}}$. In contrast, the problem of minimizing (10) using the generalized ridge penalty for spatial effect over time can be expressed in matrix form as:

$$\left\| {\widetilde{\varvec{y}}_{t} - \varvec{I\beta }_{t} } \right\|_{2}^{2} + \lambda_{S,t} \left\| {{\varvec{D}}_{2} {\varvec{\beta}}_{t} } \right\|_{2}^{2}$$

(18)

for $t = 1, \ldots ,T$, where ${{\varvec{D}}}_{2}\in {\mathbb{R}}^{{m}_{2}\times S}$ is the penalty matrix forming the first-order difference on the set of edges $\mathcal{E}$. The solution for ${\varvec{\beta}}_{t}$ is written as $\hat{\varvec{\beta }}_{t} = \left( {{\varvec{I}}^{T} {\varvec{I}} + \lambda_{S,t} {\varvec{D}}_{2}^{T} {\varvec{D}}_{2} } \right)^{ - 1} {\varvec{I}}^{T} \widetilde{\varvec{y}}_{t}$. We also compare the proposed method with the unpenalized estimation in (2), that is $\hat{\alpha }_{t} = \overline{y}_{\cdot t}$ and $\hat{\beta }_{it} = y_{it} - \overline{y}_{\cdot t}$.

We applied LOOCV using the R function “cv.glmnet” to select the tuning parameter in the lasso problem. For ridge and generalized ridge problems, we applied the efficient LOOCV (Meijer 2010). In the case of minimizing (17), an efficient formula of the LOOCV error for given $\lambda_{T}$ is represented in a closed form as

$$\frac{1}{T}\mathop \sum \limits_{t = 1}^{T} \left( {\frac{{\overline{y}_{ \cdot t} - \hat{\alpha }_{t} }}{{1 - h_{tt} }}} \right)^{2} ,$$

(19)

where $h_{tt}$ is the $t$-th diagonal element of the hat-matrix ${\varvec{H}}_{T} = \left( {{\varvec{I}}^{T} {\varvec{I}} + \lambda_{T} {\varvec{D}}_{1}^{T} {\varvec{D}}_{1} } \right)^{ - 1}$. Then, we select $\lambda_{T}$ minimizing LOOCV error (19). Similarly, in the case of minimizing (18), the LOOCV error for given $\lambda_{S,t}$ can be expressed as

$$\frac{1}{S}\mathop \sum \limits_{i = 1}^{S} \left( {\frac{{y_{it} - \overline{y}_{ \cdot t} - \hat{\beta }_{it} }}{{1 - h_{ii} }}} \right)^{2}$$

(20)

for $t = 1, \ldots ,T$, where $h_{ii}$ is the $i$-th diagonal element of the hat-matrix ${{\varvec{H}}}_{S,t}={\left({{\varvec{I}}}^{T}{\varvec{I}}+{\lambda }_{S,t}{{{\varvec{D}}}_{2}}^{T}{{\varvec{D}}}_{2}\right)}^{-1}$. We select ${\lambda }_{S,t}$ minimizing LOOCV error (20) for each $t$, or a common ${\lambda }_{S}\equiv {\lambda }_{S,t}$ minimizing the sum of (20) for $t=1,\dots ,T$.
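A minimal Python sketch of the generalized ridge fit for the temporal effect and the LOOCV shortcut (19) is given below. It forms ${\varvec{H}}_{T}$ by direct Gauss-Jordan inversion, which is adequate for small $T$ but is a choice made for this sketch rather than the implementation used in the study; all function names are hypothetical.

```python
def second_difference_matrix(T):
    """Second-order difference matrix D1 with T - 2 rows."""
    D = []
    for t in range(2, T):
        row = [0.0] * T
        row[t - 2], row[t - 1], row[t] = 1.0, -2.0, 1.0
        D.append(row)
    return D


def invert(M):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    A = [list(row) + [1.0 if r == c else 0.0 for c in range(n)]
         for r, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0.0:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [row[n:] for row in A]


def generalized_ridge_loocv(y_bar, lam):
    """Return the fit H y_bar, the LOOCV error (19), and the diagonal of H."""
    T = len(y_bar)
    D = second_difference_matrix(T)
    # M = I + lam * D^T D
    M = [[(1.0 if r == c else 0.0)
          + lam * sum(D[k][r] * D[k][c] for k in range(T - 2))
          for c in range(T)] for r in range(T)]
    H = invert(M)
    alpha_hat = [sum(H[r][c] * y_bar[c] for c in range(T)) for r in range(T)]
    h = [H[t][t] for t in range(T)]
    score = sum(((y_bar[t] - alpha_hat[t]) / (1.0 - h[t])) ** 2
                for t in range(T)) / T
    return alpha_hat, score, h


# A straight line has zero second differences, so it is reproduced exactly.
line = [0.5 * t for t in range(8)]
alpha_hat, score, h = generalized_ridge_loocv(line, lam=2.0)
print(score)

# Selecting lambda on a small grid for a noisy-looking series (made-up values).
series = [1.0, 1.3, 0.9, 1.6, 2.1, 1.8, 2.6, 2.4]
grid = [0.1, 1.0, 10.0]
best_lam = min(grid, key=lambda lam: generalized_ridge_loocv(series, lam)[1])
print(best_lam)
```

The same routine applies to (18) and (20) after replacing the second-difference matrix with the edge-difference matrix ${\varvec{D}}_{2}$ and $\overline{y}_{\cdot t}$ with $y_{it} - \overline{y}_{\cdot t}$.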

In this simulation study, we assessed the performance of the regularization methods explained above by using the mean square error (MSE) of the coefficients, which indicates the closeness between estimates and the true coefficients. We computed the MSE of the estimated temporal effect and estimated spatial effect. Moreover, to assess the accuracy for detecting clusters in the estimated spatial effect, we used the index of edges detection accuracy ($IEDA$) to evaluate the accuracy of detecting edges with zero differences, that is, zero elements of the vector ${{\varvec{D}}}_{2}{{\varvec{\beta}}}_{t}$. For 100 data replications, $IEDA$ can be stated as

$$IEDA = \frac{1}{100}\mathop \sum \limits_{z = 1}^{100} \frac{{2 \times Sens_{z}^{E} \times PPV_{z}^{E} }}{{Sens_{z}^{E} + PPV_{z}^{E} }},$$

(21)

where $Sens_{z}^{E}$ and $PPV_{z}^{E}$ indicate the sensitivity and PPV (positive predictive value) for detecting the edges with zero differences, respectively, in the $z$-th replication. They are calculated as

$$Sens_{z}^{E} = \frac{{len\left( {\left\{ {s:\left( {{\varvec{D}}_{2} \widehat{\varvec{\beta }}_{t} } \right)_{s} = 0} \right\} \cap \left\{ {s:\left( {{\varvec{D}}_{2} {\varvec{\beta}}_{t} } \right)_{s} = 0} \right\}} \right)}}{{len\left( {\left\{ {s:\left( {{\varvec{D}}_{2} {\varvec{\beta}}_{t} } \right)_{s} = 0} \right\}} \right)}},$$

(22)

$$PPV_{z}^{E} = \frac{{len\left( {\left\{ {s:\left( {{\varvec{D}}_{2} \widehat{\varvec{\beta }}_{t} } \right)_{s} = 0} \right\} \cap \left\{ {s:\left( {{\varvec{D}}_{2} {\varvec{\beta}}_{t} } \right)_{s} = 0} \right\}} \right)}}{{len\left( {\left\{ {s:\left( {{\varvec{D}}_{2} \widehat{\varvec{\beta }}_{t} } \right)_{s} = 0} \right\}} \right)}},$$

(23)

for $t = 1, \ldots ,T,$ where $len$ denotes the length of a vector, $\left\{s:{\left({{\varvec{D}}}_{2}{\widehat{{\varvec{\beta}}}}_{t}\right)}_{s}=0\right\}$ is the index vector of estimated edges with zero differences, and $\left\{s:{\left({{\varvec{D}}}_{2}{{\varvec{\beta}}}_{t}\right)}_{s}=0\right\}$ is the index vector of actual edges with zero differences. An $IEDA$ close to 1 means that the estimates detect edges with zero differences appropriately, which indicates that the objects are clustered correctly. We calculated $IEDA$ when selecting a common tuning parameter (${\lambda }_{S}$) over all time points and when selecting a different tuning parameter (${\lambda }_{S,t}$) for each time point. We also calculated the averages of sensitivity and PPV over 100 replications, that is,

$$\overline{{Sens^{E} }} = \frac{1}{100}\mathop \sum \limits_{z = 1}^{100} Sens_{z}^{E} ,\quad \overline{{PPV^{E} }} = \frac{1}{100}\mathop \sum \limits_{z = 1}^{100} PPV_{z}^{E} .$$
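The quantities (21) to (23) can be sketched in Python as follows. The function names, the numerical tolerance used to decide that a difference is zero, and the convention of returning 0 when a denominator is empty are assumptions made for this illustration; the example uses a five-node path graph and two made-up replications rather than 100.

```python
def zero_difference_edges(beta, edges, tol=1e-8):
    """Indices s of edges (i, j) whose coefficient difference is (numerically) zero."""
    return {s for s, (i, j) in enumerate(edges) if abs(beta[i] - beta[j]) <= tol}


def sens_ppv(beta_hat, beta_true, edges, tol=1e-8):
    """Sensitivity (22) and PPV (23) for detecting zero-difference edges."""
    est = zero_difference_edges(beta_hat, edges, tol)
    true = zero_difference_edges(beta_true, edges, tol)
    hits = len(est & true)
    sens = hits / len(true) if true else 0.0  # convention assumed for empty sets
    ppv = hits / len(est) if est else 0.0
    return sens, ppv


def ieda(pairs):
    """Average over replications of the harmonic mean of sensitivity and PPV (21)."""
    f_scores = [2 * s * p / (s + p) if s + p > 0 else 0.0 for s, p in pairs]
    return sum(f_scores) / len(f_scores)


# Path graph 0 - 1 - 2 - 3 - 4 with two true clusters {0, 1, 2} and {3, 4}.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
beta_true = [1.0, 1.0, 1.0, 5.0, 5.0]

rep1 = sens_ppv([1.0, 1.0, 2.0, 5.0, 5.0], beta_true, edges)  # one fused edge missed
rep2 = sens_ppv([1.2, 1.2, 1.2, 4.8, 4.8], beta_true, edges)  # clusters recovered

print(rep1, rep2, ieda([rep1, rep2]))
```

In the first replication the estimate splits node 2 from its cluster, so one true zero-difference edge is missed and the sensitivity drops while the PPV stays at 1.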

Because we are motivated by revealing the spread of Covid-19 positive cases in Japan, we constructed data simulating cases for each prefecture in Japan. Japan consists of 47 prefectures, with codes 1–47 assigned roughly from north to south, and is grouped into 8 regions: Hokkaido (1), Tohoku (2–7), Kanto (8–14), Chubu (15–23), Kansai (24–30), Chugoku (31–35), Shikoku (36–39), Kyushu and Okinawa (40–47). We suppose that the adjacency between each pair of prefectures is defined based on whether they are connected by land, bridges/tunnels, or ocean transportation (National Statistics Center 2016). We set the number of weekly time points to $T=25$ to represent about 6 months.

We generated new positive cases ${o}_{it}$ in the $i$-th prefecture ($i=1, 2, \dots ,47$) at the $t$-th week ($t=1, 2, \dots ,25$) from a Poisson distribution with mean ${\mu }_{it}\times {N}_{i}$, where $\mathrm{ln}\left({\mu }_{it}\right)={\alpha }_{t}+{\beta }_{it}+{\varepsilon }_{it}$. Here, the noise ${\varepsilon }_{it}$ was generated independently from a normal distribution with mean 0 and standard deviation 3, and ${N}_{i}$ is the population in the $i$-th prefecture, which was obtained from the 2020 Population Census of Japan (Portal Site of Official Statistics of Japan (e-Stat), 2021). We define three cases of the true ${\alpha }_{t}+{\beta }_{it}$ with values 1, 5, and 10 as shown in Fig. 1, to represent different problems and structures of clusters as follows.

(a) Case 1 represents one aggregated region of higher risk that moves as time goes by, as in Fig. 1a. In this case, we simulated a cluster of prefectures in the Tohoku, Kanto, and Chubu Regions, which has a steadily higher risk for four weeks. Then, the higher risk region moves to southwest prefectures within four weeks and becomes steady on most prefectures in the Chubu, Kansai, and Chugoku Regions for eight weeks. After that, the higher risk region moves again to southwest prefectures within four weeks and becomes steady in the Kansai, Chugoku, Shikoku, and Kyushu Regions for the remaining five weeks.

(b) Case 2 represents one aggregated region of higher risk that increases and decreases in size, as in Fig. 1b. In the first four weeks, the higher risk region stays steady on several prefectures in the Kanto, Chubu, Kansai, and Chugoku Regions. Then, the higher risk region spreads to other surrounding prefectures within four weeks until it becomes steady up to seven regions: Tohoku, Kanto, Chubu, Kansai, Chugoku, Shikoku, and Kyushu Regions within eight weeks. After that, the higher risk region decreases for four weeks, and then returns to the initial size and stays steady for five weeks.

(c) Case 3 represents several aggregated regions of higher risk that appear and disappear, as in Fig. 1c. For the first two weeks, there is no region of higher risk. Then, a higher risk region appears on prefectures in the Tohoku, Kanto, and Chubu Regions for 14 weeks. In week 7, a second higher risk region appears on prefectures in the Kansai and Chugoku Regions for 16 weeks. Meanwhile, a third higher risk region appears on prefectures in the Shikoku and Kyushu Regions from week 13 for 8 weeks.

A region of adjacent prefectures with a common higher risk value constitutes a cluster. The last row of Fig. 1 shows the number of regions separated by the adjacency of prefectures and by different levels of ${\alpha }_{t}+{\beta }_{it}$.

For each of Cases 1, 2, and 3, we generated 100 replicated data sets. Then, for each data set, we applied the transformation ${y}_{it}=\mathrm{ln}\left(\frac{{o}_{it}}{{N}_{i}}\right)$ and fitted the models explained above. For the generalized lasso/ridge, we defined the second-order difference penalty matrix ${{\varvec{D}}}_{1}\in {\mathbb{R}}^{23\times 25}$ for the temporal effect. Moreover, the definition of adjacency between prefectures yields 93 edges, from which we obtained the penalty matrix ${{\varvec{D}}}_{2}\in {\mathbb{R}}^{93\times 47}$ for the spatial effect. For the lasso and ridge, we pooled 4 successive coefficients for the temporal effect to obtain ${\alpha }_{1}^{*},\dots ,{\alpha }_{6}^{*}$ (the last one covers five weeks) and pooled coefficients based on the 8 prefectural regions of Japan to obtain ${\beta }_{1t}^{*},\dots ,{\beta }_{8t}^{*}$, so that ${T}_{1}=6$, ${S}_{1}=8$, ${{\varvec{X}}}_{1}\in {\mathbb{R}}^{25\times 6}$ and ${{\varvec{X}}}_{2}\in {\mathbb{R}}^{47\times 8}$.
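The two penalty matrices described above can be sketched in a few lines. The following is a minimal illustration, not the authors' code: the function names and the toy edge list are hypothetical, and the sign convention of each row (for example $(1,-2,1)$ versus $(-1,2,-1)$) is an assumption that does not affect an ${L}_{1}$ penalty.

```python
def second_difference_matrix(n_time):
    """Return the (n_time - 2) x n_time second-order difference matrix."""
    rows = []
    for k in range(n_time - 2):
        row = [0] * n_time
        # Each row penalizes alpha_k - 2 * alpha_{k+1} + alpha_{k+2}.
        row[k], row[k + 1], row[k + 2] = 1, -2, 1
        rows.append(row)
    return rows


def edge_difference_matrix(n_nodes, edges):
    """Return the |edges| x n_nodes matrix with one (+1, -1) row per adjacent pair."""
    rows = []
    for i, j in edges:
        row = [0] * n_nodes
        # Each row penalizes beta_i - beta_j for one pair of adjacent areas.
        row[i], row[j] = 1, -1
        rows.append(row)
    return rows


# T = 25 weeks gives the 23 x 25 temporal penalty matrix.
D1 = second_difference_matrix(25)
print(len(D1), len(D1[0]))  # 23 25

# Hypothetical toy adjacency among 4 areas (not the 93 real prefecture edges).
toy_edges = [(0, 1), (1, 2), (2, 3), (0, 2)]
D2 = edge_difference_matrix(4, toy_edges)
print(len(D2), len(D2[0]))  # 4 4
```

With the real adjacency list of 93 prefecture pairs, the same construction would give a matrix of size $93\times 47$.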

Figure 2 shows the MSE for the estimate of the temporal effect ${\alpha }_{t}$. In Case 1, the lasso had the smallest MSE among the methods, followed by the generalized lasso with ALOCV and GCV.

The true temporal effect ${\alpha }_{t}$ was almost constant, and so we conjecture that the lasso estimates of pooled coefficients were at an advantage. In Case 2, the generalized lasso with ALOCV mainly provided the smallest MSE, followed by the generalized lasso with GCV. In this case, the MSEs of the lasso and ridge fluctuated strongly, especially at the points where the true risk values changed.

In Case 3, although the lasso had the smallest MSE for several weeks, the MSE of the generalized lasso with GCV was the most stable, followed by that of the generalized lasso with ALOCV. The MSEs of the lasso and ridge also fluctuated when the number of clusters increased or decreased. We conjecture that pooling the coefficients caused the poor performance of the lasso and ridge. Generally, in the temporal effect estimation, the generalized lasso with ALOCV and GCV provided relatively smaller and more stable MSE than the other methods, for different patterns of true risk values.

Figure 3 shows the average MSE for the estimates of the spatial effect ${\beta }_{it}$ over the 47 prefectures at each time point. When using a common tuning parameter ${\lambda }_{S}$ (Fig. 3a), the generalized lasso with GCV mainly provided the minimum MSE in all cases, followed by the generalized lasso with ALOCV. In Case 1, when the true risk values were steady in weeks 9 to 16, the MSE of the lasso was smaller than that of the generalized lasso with ALOCV. However, in Case 2 and Case 3, when the true risk values changed, the MSEs of the other methods were relatively higher than that of the generalized lasso. The result when using a different tuning parameter ${\lambda }_{S,t}$ at each time point (Fig. 3b) was slightly different. The generalized ridge mainly provided the smallest MSE. The MSEs of the generalized lasso with GCV and ALOCV were higher at several intermediate weeks in Case 1, and at the first and last several weeks in Case 2, but were smaller than those of the lasso and ridge in the other cases.

Figure 4 shows the plots of $IEDA$ for all time points in clustering prefectures. We only show the results of the generalized lasso with ALOCV and GCV and of the lasso, because all edges take non-zero differences for the other methods. We can see that the generalized lasso with ALOCV outperformed the others, as indicated by the highest $IEDA$ for most cases in Fig. 4a and b. The $IEDA$ generally increased when the number of separated regions was small and decreased when the number of separated regions was large. Table 2 shows the averages of $\overline{{Sens }^{E}}$, $\overline{{PPV }^{E}}$ and $IEDA$ over all time points. The generalized lasso with ALOCV provided higher sensitivity and $IEDA$ than the generalized lasso with GCV and the lasso, although the coefficients of the lasso were pooled in advance based on the 8 prefectural regions. Moreover, we obtained slightly higher $IEDA$ when using a common tuning parameter than when using different tuning parameters, for all the cases and methods. With a common tuning parameter, the chosen value was not too small, so that the differences between coefficients on the edges tended to shrink to zero, which resulted in more accurate clustering. In contrast, with a different tuning parameter at each time point, the chosen value varied greatly and was small for many time points. As a result, the differences between coefficients on the edges did not tend to shrink to zero, which decreased the clustering accuracy.
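As a rough illustration of edge-level detection accuracy, the sketch below counts, for one time point, how many edges with a truly non-zero difference are also estimated as non-zero. It assumes that sensitivity and positive predictive value are taken over edges with "positive" meaning a non-zero difference; the exact definitions of $\overline{{Sens }^{E}}$, $\overline{{PPV }^{E}}$ and $IEDA$ are given earlier in the paper and are not reproduced here. The function name, the tolerance `tol`, and the toy values are hypothetical.

```python
def edge_detection_rates(true_beta, est_beta, edges, tol=1e-8):
    """Edge-level sensitivity and PPV for detecting non-zero differences."""
    tp = fp = fn = 0
    for i, j in edges:
        truly_diff = abs(true_beta[i] - true_beta[j]) > tol
        found_diff = abs(est_beta[i] - est_beta[j]) > tol
        if truly_diff and found_diff:
            tp += 1
        elif found_diff:
            fp += 1
        elif truly_diff:
            fn += 1
    # Rates are undefined (NaN) when their denominators are zero.
    sens = tp / (tp + fn) if tp + fn else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return sens, ppv


# Toy example: 4 areas on a line; the true boundary lies between areas 1 and 2.
edges = [(0, 1), (1, 2), (2, 3)]
sens, ppv = edge_detection_rates([0, 0, 1, 1], [0, 0, 0.5, 1], edges)
print(sens, ppv)  # 1.0 0.5
```

In the toy example the estimate finds the true boundary but also splits areas 2 and 3, so sensitivity is 1 while the positive predictive value drops to 0.5. This mirrors the argument above that a tuning parameter that is too small leaves spurious non-zero differences on the edges.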

Table 2 Averages of $\overline{{Sens }^{E}}$, $\overline{{PPV }^{E}}$, and $IEDA$ over all time points


In summary, our simulation study showed that the proposed method performed well in estimating the temporal effect, as suggested by its lower MSE. Moreover, the proposed method was also very flexible in detecting multiple clusters, as shown by high $IEDA$ values. The generalized lasso with ALOCV performed best in detecting clusters, while the generalized lasso with GCV performed well in estimating coefficients.

5 Real case data application: weekly Covid-19 cases in Japan

Since the first confirmed case was detected on January 16, 2020, Japan experienced 5 major waves of the spread of Covid-19 until September 2021. As of September 11, 2021, the total number of Covid-19 cases in all prefectures in Japan was 1,627,898, with a 98% recovery rate (Ministry of Health, Labor, and Welfare, 2021). At that time, Japan's cumulative number of confirmed Covid-19 cases was the 26th highest in the world (WHO 2021). In our study, we chose March 21, 2020, as the starting point, because on that date the total number of confirmed Covid-19 cases exceeded 1,000, spread across 39 of the 47 (83%) prefectures of Japan.

Figure 5 shows daily reported Covid-19 cases in Japan, with (a)–(d) indicating the period of each emergency status declaration. In response to the first wave of Covid-19 spread, the first emergency status was declared on April 7, 2020, initially in seven prefectures, and was then expanded nationwide on April 16, 2020 (Fig. 5a). The second wave occurred in August 2020, but the government did not declare an emergency status until the end of the year. After the number of cases decreased in autumn, the third wave occurred at the end of 2020, in which the number of infections reached 230,000 people. The second emergency status was declared for Saitama, Chiba, Tokyo, and Kanagawa on January 8, 2021, and was expanded to 11 prefectures on January 13, 2021. This emergency status lasted until March 7, 2021 (Fig. 5b). The first dose of Covid-19 vaccination was implemented on April 1, 2021, while Japan was being hit by the fourth wave of the outbreak. The third emergency declaration was issued for Tokyo, Osaka, Kyoto, and Hyogo on April 25, 2021, and was expanded to five other prefectures on May 16, 2021; it was lifted on June 20, 2021, except in Okinawa (Fig. 5c). The fifth wave occurred around July to September 2021, during which the Olympic Summer Games were held in Tokyo, and the fourth state of emergency was declared in several prefectures, particularly to prevent the spread of the highly contagious Delta variant (Fig. 5d).

We applied the minimization problem (3) for spatio-temporal analysis, which can be decomposed into the two generalized lasso problems (9) and (10), with tuning parameters ${\lambda }_{T}$ and ${\lambda }_{S,t}$ selected by ALOCV and GCV, to understand the temporal effect and the prefectural clusters formed at each time point. We used the weekly Covid-19 positive case data for each prefecture in Japan from March 21, 2020, to September 11, 2021 (the data file covid_jpn_prefecture.csv in Takaya (2021)). Therefore, we have $S=47$ and $T=78$. Let ${y}_{it}^{*}$ be the number of weekly positive cases in the $i$-th prefecture at the $t$-th week, and ${N}_{i}$ be the population of the $i$-th prefecture. Here, we used the log-transformed number of positive cases per population, ${y}_{it}=\mathrm{log}\left(\frac{{y}_{it}^{*}}{{N}_{i}}\right)$, as the response variable in the generalized lasso problem. The adjacency between each pair of prefectures was introduced as constraints in the same way as in Sect. 4, and hence we have ${{\varvec{D}}}_{1}\in {\mathbb{R}}^{76\times 78}$ and ${{\varvec{D}}}_{2}\in {\mathbb{R}}^{93\times 47}$.

We calculated an unbiased estimate of the DF for ${\lambda }_{T}$ or ${\lambda }_{S,t}$ to evaluate the complexity of the model. According to Tibshirani and Taylor (2011), the DF for a given $\lambda$ in the generalized lasso (1) is defined as

$$df = E\left[ {nullity\left( {{\varvec{D}}_{ - E} } \right)} \right], \tag{24}$$

where $nullity\left({{\varvec{D}}}_{-E}\right)$ is the dimension of the null space of ${{\varvec{D}}}_{-E}$, the matrix obtained by removing from the penalty matrix ${\varvec{D}}$ the rows corresponding to the boundary index set $E$ of a solution of the dual problem (12). The DF for the ${L}_{1}$ penalty suggests the number of fused groups. In estimating the temporal effect with (9), we used formula (24) to calculate the DF for the selected ${\lambda }_{T}$. In estimating the spatial effect for each time point with (10), we selected a common ${\lambda }_{S}$ for all $t=1,2,\dots ,78$, based on the simulation study described in Sect. 4, in which a common tuning parameter resulted in higher $IEDA$ values. In this case, the DF was calculated as the average of the DF over all $t=1,2,\dots ,78$.
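For the spatial penalty, where each row of ${\varvec{D}}$ is the difference across one edge, the nullity in (24) equals the number of connected components of the graph that keeps only the edges outside the boundary set, which is why the DF can be read as the number of fused groups. The sketch below counts these components with a union-find structure. It assumes, following Tibshirani and Taylor (2011), that the boundary set indexes the rows removed from ${\varvec{D}}$; the function name and the toy graph are hypothetical.

```python
def count_fused_groups(n_nodes, edges, boundary):
    """Nullity of D_{-E} for an edge-difference matrix D.

    `boundary` holds the indices of the rows (edges) removed from D.
    The nullity equals the number of connected components of the graph
    formed by the remaining edges.
    """
    parent = list(range(n_nodes))

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for k, (i, j) in enumerate(edges):
        if k in boundary:
            continue  # this row was removed, so the edge does not fuse i and j
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
    return len({find(v) for v in range(n_nodes)})


# Toy example: 5 areas on a line; removing edge index 1 splits them into 2 groups.
path_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(count_fused_groups(5, path_edges, boundary={1}))  # 2
```

For the second-order difference matrix used in trend filtering, any subset of rows is linearly independent, so the same nullity reduces to the number of removed rows plus 2.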

5.1 Result of estimating temporal effect

We considered minimizing (9) to estimate the temporal effect ${\alpha }_{t}$. Table 3 contains the selected ${\lambda }_{T}$ and the DF based on the proposed method with ALOCV and GCV. We limited the maximum DF to $\frac{3}{4}\left(78\right)=58.5$ to avoid an extremely rough temporal effect. The generalized lasso with ALOCV selected a larger ${\lambda }_{T}$ and a smaller DF than the generalized lasso with GCV.

Table 3 Selected ${\lambda }_{T}$ and DF for estimating the temporal effect ${\alpha }_{t}$


Figure 6 shows the estimated temporal effect ${\widehat{\alpha }}_{t}$ for each $t$ based on the ${\lambda }_{T}$ selected using the generalized lasso with ALOCV and GCV, together with each emergency status period (a)–(d). The break points in the estimated trend should suggest some change of conditions, such as an emergency status declaration. The trend estimated using ALOCV for the ${L}_{1}$ penalty has slightly fewer break points than the one estimated using GCV. During the first emergency status period (a), the estimated temporal effect first reached its first peak and then fell. After period (a) ended, it rose quickly and reached the second peak in the summer of 2020. During the second emergency status period (b), it first reached the third peak and then fell again. During the third emergency status period (c), it increased slightly for a while, reached the fourth peak, and then decreased quickly. During the fourth emergency status period (d), it increased for more than one month, reached the fifth peak, and then decreased.

5.2 Result of estimating spatial effect

We considered minimizing (10) to estimate the spatial effect ${\beta }_{it}$. In this case, we assumed that ${\lambda }_{S}\equiv {\lambda }_{S,t}$ for all $t=1,2,\dots ,78$ and selected it, for each criterion separately, by minimizing the sum of the ALOCV errors or of the GCV errors over all $t$.

Table 4 contains the selected ${\lambda }_{S}$ and the DF. We limited the maximum DF to $\frac{3}{4}\left(47\right)=35.25$ to avoid an extremely rough spatial effect. ALOCV was not computable at lower ${\lambda }_{S}$ (less than 3.47) because of the division-by-zero issue described in the last paragraph of Sect. 3, so that ALOCV selected a larger ${\lambda }_{S}$ with a lower DF than GCV.
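The selection rule for a common ${\lambda }_{S}$ can be sketched as a search over a grid of candidates: sum the criterion over all time points, skip candidates whose average DF exceeds the cap, and skip candidates for which the criterion is not computable. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name, the dictionary layout, and all numbers in the example are hypothetical, and non-computable values are represented as `None` or NaN.

```python
import math


def select_common_lambda(errors_by_lambda, df_by_lambda, max_df):
    """Pick the candidate lambda with the smallest criterion summed over time.

    errors_by_lambda: {lambda: [criterion value at t = 1, ..., T]}
    df_by_lambda:     {lambda: DF averaged over all t}
    """
    best_lam, best_total = None, math.inf
    for lam, errors in errors_by_lambda.items():
        if df_by_lambda[lam] > max_df:
            continue  # fit is too rough
        if any(e is None or math.isnan(e) for e in errors):
            continue  # criterion not computable for this candidate
        total = sum(errors)
        if total < best_total:
            best_lam, best_total = lam, total
    return best_lam


# Hypothetical criterion values for three candidates over two time points.
errors = {1.0: [float("nan"), 0.2], 4.0: [0.3, 0.3], 8.0: [0.5, 0.4]}
avg_df = {1.0: 30.0, 4.0: 12.0, 8.0: 5.0}
print(select_common_lambda(errors, avg_df, max_df=0.75 * 47))  # 4.0
```

In the example, the smallest candidate is dropped because its criterion is undefined at one time point, so a larger value is selected, which is the same mechanism by which ALOCV ends up with a larger ${\lambda }_{S}$ than GCV above.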

Table 4 Selected common ${\lambda }_{S}$ and DF for estimating spatial effect


The estimates ${\widehat{\beta }}_{it}$ of the spatial effect for these methods of selecting ${\lambda }_{S}$ are plotted in Figs. 7, 8, and 9, in which the prefectures are arranged roughly from north (top) to south (bottom), and a darker red color indicates higher values. Figure 7 shows the unpenalized estimates of the spatial effect, ${\widehat{\beta }}_{it}={y}_{it}-{\overline{y} }_{.t}$. Based on this figure, we can see the relative spread of Covid-19 for each prefecture in every week. However, the unpenalized ${\widehat{\beta }}_{it}$ looks very rough, and hence it is very difficult to grasp commonalities and differences in the spatial effect between regions for each week. Figure 8 shows the estimated spatial effect ${\widehat{\beta }}_{it}$ with ${\lambda }_{S}$ selected by ALOCV. It looks very smooth: one or a few clusters covered all prefectures in most of the weeks, and in some weeks the estimated spatial effect differed greatly between the clustered regions. Figure 9 shows the estimated spatial effect ${\widehat{\beta }}_{it}$ with ${\lambda }_{S}$ selected by GCV. It suggests that there were some clusters of prefectures with the same values. We can see that the largest cluster consisted of most of the prefectures during the emergency status periods. However, in other weeks, the prefectures were divided into several clusters.
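The unpenalized estimate shown in Fig. 7 is simply the response centered by the weekly mean over prefectures. A minimal sketch follows, with a hypothetical function name and toy numbers; `y[i][t]` stands for ${y}_{it}$.

```python
def unpenalized_spatial_effect(y):
    """Return y[i][t] minus the mean of y over areas i at each week t."""
    n_area, n_time = len(y), len(y[0])
    weekly_mean = [sum(y[i][t] for i in range(n_area)) / n_area
                   for t in range(n_time)]
    return [[y[i][t] - weekly_mean[t] for t in range(n_time)]
            for i in range(n_area)]


# Toy example with 2 areas and 2 weeks.
print(unpenalized_spatial_effect([[1.0, 2.0], [3.0, 6.0]]))
# [[-1.0, -2.0], [1.0, 2.0]]
```

Because every week is centered separately, the values within a week sum to zero, so the estimates express relative rather than absolute risk.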

5.3 Clustering the regions based on the estimated spatial effect

Figure 10 shows the heatmap of the estimated spatial effect ${\widehat{\beta }}_{it}$ with a common ${\lambda }_{S}$ selected by GCV for the generalized lasso, as in Fig. 9, but with the prefectures rearranged based on agglomerative hierarchical clustering. The rearranged heatmap can display relative infection risk, that is, how the infection occurred in a specific area and then spread to other areas.
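The rearrangement of rows can be illustrated with a small agglomerative clustering routine that repeatedly merges the two closest groups and returns the resulting row order. The paper does not state the distance or linkage used, so Euclidean distance and average linkage are assumptions here, as are the function name and the toy data; the order produced is a plain concatenation of merged groups and need not match the leaf order of any particular software.

```python
import math


def agglomerative_order(rows):
    """Average-linkage agglomerative clustering; returns the row indices
    in the order obtained by concatenating merged clusters."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(rows[a], rows[b])))

    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                # Average pairwise distance between the two clusters.
                d = sum(dist(a, b) for a in clusters[p] for b in clusters[q])
                d /= len(clusters[p]) * len(clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
    return clusters[0]


# Toy spatial effects for 4 areas over 2 weeks: areas 0 and 2 behave alike,
# as do areas 1 and 3.
toy_beta = [[0.0, 0.0], [5.0, 5.0], [0.5, 0.0], [5.0, 6.0]]
print(agglomerative_order(toy_beta))  # [0, 2, 1, 3]
```

Plotting the rows of the heatmap in the returned order places areas with similar trajectories next to each other, which is what makes clusters visible as contiguous bands.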

Based on Fig. 10, we can detect six major clusters of prefectures from top to bottom: (1) all prefectures in the Kyushu region, (2) all prefectures in the Chugoku and Shikoku regions, (3) all prefectures in the Kanto, Chubu, and Kansai regions, together with Fukushima prefecture (the central part of Japan), (4) the prefectures in the Tohoku region except Fukushima, (5) Hokkaido prefecture, and (6) Okinawa prefecture. We provide the following interpretation of the dynamic behavior of the spatial clusters based on the result of generalized lasso clustering, separated into the five waves that Japan has experienced.

i. First wave (March 21 to June 27, 2020)

During the first wave of infections, relative infection risk increased gradually in the central part of Japan (cluster 3) and Okinawa (cluster 6) and decreased in the remaining clusters. Then, while the first emergency status was in effect from mid-April to May 2020, relative infection risk was much higher in Hokkaido (cluster 5) and decreased gradually in the other clusters.

ii. Second wave (July 4 to October 17, 2020)

In the second wave of outbreaks, relative infection risk was higher in Kyushu (cluster 1), central Japan (cluster 3), and Okinawa (cluster 6), and lower in the other clusters. In cluster 1, the outbreak reached a peak in August 2020 and then decreased gradually. During this period, the relative risk increased gradually in cluster 3. In Okinawa, it was the highest and remained stagnant.

iii. Third wave (October 24, 2020, to February 6, 2021)

In the third wave, relative risk was higher in central and northern parts of Japan (clusters 3, 4, 5, and 6). Then, it decreased gradually while the second emergency status was in effect from January to March 2021.

iv. Fourth wave (February 13 to June 5, 2021)

After a period of higher risk in the Kanto and Tohoku regions in March 2021, the fourth wave spread to other regions. While the third emergency status was in effect from April to June 2021, relative risk was higher in Okinawa (cluster 6) in April and in the Kyushu region (cluster 1) in May, but was lower in the other clusters.

v. Fifth wave (June 12 to September 11, 2021)

In the fifth wave of infections, infection risk increased first in Okinawa (cluster 6), spread into central Japan (cluster 3), Tohoku (cluster 4), and Hokkaido (cluster 5), next into Chugoku and Shikoku (cluster 2), and then into Kyushu (cluster 1).

In summary, we can see that the outbreaks that occurred in central Japan (cluster 3) spread into outer regions such as the Chugoku-Shikoku region (cluster 2) and the Tohoku region (cluster 4) within one month, and then spread into the Kyushu region (cluster 1) a few months later. We can also see that the outbreaks in some regions leaped into Hokkaido (cluster 5) and Okinawa (cluster 6) a few months later.

6 Conclusion

In this study, we proposed a regularization approach using a modified generalized lasso model with two ${L}_{1}$ penalties, one for the temporal effect and one for the spatial effect. The proposed method can be separated into two generalized lasso problems: trend filtering to estimate a smooth temporal effect and the fused lasso to detect clusters of spatial locations at each time point. Through the proposed method, we can understand the dynamic behavior of spatial clusters over time more flexibly, based on the relative magnitude of the estimated spatial effect at each time point.

To select the appropriate tuning parameters in the generalized lasso, we considered using ALOCV and GCV. Our simulation study suggested that estimation of temporal and spatial effects using generalized lasso with ALOCV and GCV was comparable or superior in terms of MSE to existing regularization methods such as lasso, ridge, and generalized ridge. Also, we showed that the generalized lasso with ALOCV provided higher $IEDA$, the accuracy of detecting edges with non-zero difference. In addition, our simulation study suggested that a common tuning parameter over all time points was preferable in spatial clustering.

Then, through the analysis of Japan's weekly Covid-19 panel data, we illustrated how to understand the spread of Covid-19 infection using our modified generalized lasso model. In estimating the spatial effect over weeks, the generalized lasso with a common tuning parameter over all time points, selected by GCV, provided a reasonable result.

This study mainly used the “genlasso” package of the R software to solve the generalized lasso problems using the dual path algorithm (Arnold and Tibshirani 2016). However, we may consider using the coordinate descent algorithm suggested in Yamamura et al. (2021), which was reported to have better estimation accuracy and speed than the algorithm used in “genlasso”. Moreover, when detecting spatial clusters in the spread of a disease in epidemiological studies, the response variable is often observed as count data. The application of a modified generalized lasso to count data was proposed by Choi et al. (2018), which we intend to pursue in our future work.

References

Ansari MY, Ahmad A, Khan SS, Bhushan G, Mainuddin. (2020) Spatiotemporal clustering: a review. Artif Intell Rev 53(4):2381–2423. https://doi.org/10.1007/s10462-019-09736-1
Arnold TB, Tibshirani RJ (2016) Efficient implementations of the generalized lasso dual path algorithm. J Comput Graph Stat 25(1):1–27. https://doi.org/10.1080/10618600.2015.1008638
Castro MC, Kim S, Barberia L, Ribeiro AF, Gurzenda S, Ribeiro KB, Abbott E, Blossom J, Rache B, Singer BH (2021) Spatiotemporal pattern of COVID-19 spread in Brazil. Science 372(6544):821–826. https://doi.org/10.1126/science.abh1558
Chen Y, Ong JHY, Rajarethinam J, Yap G, Ng LC, Cook AR (2018) Neighbourhood level real-time forecasting of dengue cases in tropical urban Singapore. BMC Med 16(1):129. https://doi.org/10.1186/s12916-018-1108-5
Choi H, Song E, Hwang S, Lee W (2018) A modified generalized lasso algorithm to detect local spatial clusters for count data. AStA Adv Statis Anal 102(4):537–563. https://doi.org/10.1007/s10182-018-0318-7
Craven P, Wahba G (1979) Estimating the correct degree of smoothing by the method of generalized cross-validation. Numer Math 31:377–403
Doborjeh MG, Kasabov N (2015) Dynamic 3D clustering of spatio-temporal brain data in the neucube spiking neural network architecture on a case study of fMRI data. In: Arik S, Huang T, Lai WK, Liu Q (eds) Neural information processing. Springer International Publishing, Cham, pp 191–198. https://doi.org/10.1007/978-3-319-26561-2_23
Doborjeh MG, Kasabov N, Doborjeh ZG (2018) Evolving, dynamic clustering of spatio/spectro-temporal data in 3D spiking neural network models and a case study on EEG data. Evol Syst 9(3):195–211. https://doi.org/10.1007/s12530-017-9178-8
Donoho DL, Johnstone IM (1995) Adapting to unknown smoothness via wavelet shrinkage. J Am Stat Assoc 90(432):1200. https://doi.org/10.2307/2291512
Hoerl AE, Kennard RW (1970) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12(1):55. https://doi.org/10.2307/1267351
Hüsch M, Schyska BU, von Bremen L (2020) CorClustST—Correlation-based clustering of big spatio-temporal datasets. Futur Gener Comput Syst 110:610–619. https://doi.org/10.1016/j.future.2018.04.002
Izakian H, Pedrycz W, Jamal I (2013) Clustering spatiotemporal data: an augmented fuzzy C-means. IEEE Trans Fuzzy Syst 21(5):855–868. https://doi.org/10.1109/TFUZZ.2012.2233479
Izakian H, Pedrycz W, Jamal I (2015) Fuzzy clustering of time series data using dynamic time warping distance. Eng Appl Artif Intell 39:235–244. https://doi.org/10.1016/j.engappai.2014.12.015
Kamenetsky ME, Lee J, Zhu J, Gangnon RE (2022) Regularized spatial and spatio-temporal cluster detection. Spatial Spatio-Temporal Epidemiol 41:100462. https://doi.org/10.1016/j.sste.2021.100462
Kim S-J, Koh K, Boyd S, Gorinevsky D (2009) l1 trend filtering. SIAM Rev 51(2):339–360. https://doi.org/10.1137/070690274
Ministry of Health, Labour and Welfare (2021) Current situation in Japan. https://www.mhlw.go.jp/stf/covid-19/kokunainohasseijoukyou_00006.html
Meijer R (2010) Efficient approximate leave-one-out cross-validation for ridge and lasso. Delft University of Technology, Netherlands
National Statistics Center. (2016). Publication of counted and indexed lists of combined adjacent blocks of prefectures in Japan (in Japanese). https://www.nstac.go.jp/technology/research/prefcomp/
Portal Site of Official Statistics of Japan (e-Stat). (2021, Oct 24). Population Census 2020. https://www.e-stat.go.jp/
Rad KR, Zhou W, Maleki A (2020) Error bounds in estimating the out-of-sample prediction error using leave-one-out cross validation in high-dimensions. In: Proceedings of the 23rd international conference on artificial intelligence and statistics (AISTATS), pp 108
Rad KR, Maleki A (2020) A scalable estimate of the extra-sample prediction error via approximate leave-one-out. J R Statis Soc Series B. 82(4):965–996
Rahardiantoro S, Sakamoto W (2021) Clustering regions based on socio-economic factors which affected the number of COVID-19 cases in Java Island. J Phys: Conf Series 1863(1):012014. https://doi.org/10.1088/1742-6596/1863/1/012014
Rahardiantoro S, Sakamoto W (2022) Optimum tuning parameter selection in generalized lasso for clustering with spatially varying coefficient models. IOP Conf Series: Earth Environ Sci 950(1):012093. https://doi.org/10.1088/1755-1315/950/1/012093
Rahardiantoro S, Sakamoto W (2022) Spatially varying coefficient modeling of numerical and categorical predictor variables in the generalized lasso. J Environ Sci Sustain Soc 11(Supplement PP05):16–19
She Y, Owen AB (2010) Outlier detection using nonconvex penalized regression. Unpublished manuscript. http://www-stat.stanford.edu/~owen/reports/theta-ipod.pdf
Stone M (1974) Cross-validatory choice and assessment of statistical predictions. J R Statis Soc Series B (Methodol) 36:111–147
Takaya H (2021) COVID-19 dataset in Japan. Kaggle Dataset. https://www.kaggle.com/lisphilar/covid19-dataset-in-japan
Takemura Y, Ishioka F, Kurihara K (2022) Detection of space–time clusters using a topological hierarchy for geospatial data on COVID-19 in Japan. Japan J Statis Data Sci 5(1):279–301. https://doi.org/10.1007/s42081-022-00159-x
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B Stat Methodol 58(1):267–288
Tibshirani RJ (2014) Adaptive piecewise polynomial estimation via trend filtering. Ann Statis. https://doi.org/10.1214/13-AOS1189
Tibshirani RJ, Taylor J (2011) The solution path of the generalized lasso. Ann Statis. https://doi.org/10.1214/11-AOS878
Tibshirani R, Wang P (2008) Spatial smoothing and hot spot detection for CGH data using the fused lasso. Biostatistics 9(1):18–29. https://doi.org/10.1093/biostatistics/kxm013
Tibshirani R, Saunders M, Rosset S, Zhu J, Knight K (2005) Sparsity and smoothness via the fused lasso. J R Statis Soc: Series B (Statis Methodol) 67(1):91–108. https://doi.org/10.1111/j.1467-9868.2005.00490.x
Wang Q, Dong W, Yang K, Ren Z, Huang D, Zhang P, Wang J (2021a) Temporal and spatial analysis of COVID-19 transmission in China and its influencing factors. Int J Infect Dis 105:675–685. https://doi.org/10.1016/j.ijid.2021.03.014
Wang Y, Liu Y, Struthers J, Lian M (2021b) Spatiotemporal characteristics of the COVID-19 epidemic in the United States. Clin Infect Dis 72(4):643–651. https://doi.org/10.1093/cid/ciaa934
Wang S, Zhou W, Maleki A, Lu H, Mirrokni V (2018) Approximate leave-one-out for high-dimensional non-differentiable learning problems. arXiv:1810.02716v1 [cs.LG]
World Health Organization (WHO). (2021, October 23). WHO Coronavirus (COVID-19) Dashboard. https://covid19.who.int/table
Yamamura M, Ohishi M, Yanagihara H (2021) Spatio-temporal adaptive fused lasso for proportion data. In: Czarnowski I, Howlett RJ, Jain LC (eds) Intelligent Decision Technologies: Proceedings of the 13th KES-IDT 2021 Conference. Springer Singapore, Singapore, pp 479–489. https://doi.org/10.1007/978-981-16-2765-1_40
Zhao Y, Bondell H (2020) Solution paths for the generalized lasso with applications to spatially varying coefficients regression. Comput Statis Data Anal 142:106821. https://doi.org/10.1016/j.csda.2019.106821


Acknowledgements

This research is supported by JICA (Japan International Cooperation Agency).

Author information

Authors and Affiliations

Department of Human Ecology, Graduate School of Environmental and Life Science, Okayama University, Okayama, 700-8350, Japan
Septian Rahardiantoro & Wataru Sakamoto
Department of Statistics, Faculty of Mathematics and Natural Science, IPB University, Bogor, 16680, Indonesia
Septian Rahardiantoro

Corresponding author

Correspondence to Septian Rahardiantoro.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Rahardiantoro, S., Sakamoto, W. Spatio-temporal clustering analysis using generalized lasso with an application to reveal the spread of Covid-19 cases in Japan. Comput Stat 39, 1513–1537 (2024). https://doi.org/10.1007/s00180-023-01331-x


Received: 16 September 2022
Accepted: 27 January 2023
Published: 11 April 2023
Issue Date: May 2024
DOI: https://doi.org/10.1007/s00180-023-01331-x
