1 Introduction

Spatio-temporal data consist of information about objects or events located in space over a period of time. Ansari et al. (2020) classified spatio-temporal data into five types and provided a review of spatio-temporal clustering. Our study focuses on geo-referenced time series clustering, which aims to identify the dynamic behavior of clusters of objects over time. The preceding literature on geo-referenced time series clustering includes a fuzzy clustering method (Izakian et al. 2013, 2015), the NeuCube spiking neural network architecture for brain data (Doborjeh and Kasabov 2015; Doborjeh et al. 2018), and Correlation-based Clustering of Big Spatiotemporal Datasets (CorClustST) (Husch et al. 2020).

Recently, regularization approaches have attracted attention in spatio-temporal analysis. One approach with great potential is the generalized lasso. The generalized lasso (Tibshirani and Taylor 2011; Arnold and Tibshirani 2016), a general form of the lasso (Tibshirani 1996), imposes constraints on the regression coefficients according to a general structure or geometry through an \({L}_{1}\) penalty. Let \({\varvec{y}}\in {\mathbb{R}}^{n}\) be a response vector, \({\varvec{X}}\in {\mathbb{R}}^{n\times p}\) be a predictor matrix, and \({\varvec{\theta}}\in {\mathbb{R}}^{p}\) be a parameter vector. Then the generalized lasso can be formulated as

$$\arg \,\mathop {\,\min }\limits_{\varvec{\theta} } \left\{ {\left\| {\varvec{y} - \varvec{X} \varvec{\theta} } \right\|_{2}^{2} + \lambda \left\| {\varvec{D} \varvec{\theta} } \right\|_{1} } \right\},$$
(1)

where \(\lambda \left( { \ge 0} \right)\) is a tuning parameter, and \({\varvec{D}} \in {\mathbb{R}}^{m \times p}\) is a penalty matrix, of which each row constructs a linear combination of \({\varvec{\theta}}\) to define the desired structural or geometric property of the problem. If \({\varvec{D}}={\varvec{I}}\), then the problem (1) becomes the ordinary lasso.
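To make the roles of \({\varvec{X}}\), \({\varvec{D}}\), and \(\lambda\) in (1) concrete, the following minimal sketch evaluates the objective for small hand-picked inputs. It is an illustration in plain Python rather than part of the authors' code; the function names and the toy numbers are assumptions made for this example.

```python
def matvec(M, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def genlasso_objective(y, X, D, theta, lam):
    """Value of ||y - X theta||_2^2 + lam * ||D theta||_1, as in (1)."""
    residual = [yi - fi for yi, fi in zip(y, matvec(X, theta))]
    fit = sum(r * r for r in residual)
    penalty = sum(abs(d) for d in matvec(D, theta))
    return fit + lam * penalty


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
first_diff = [[-1.0, 1.0, 0.0], [0.0, -1.0, 1.0]]  # hypothetical D of adjacent differences

y = [1.0, 2.0, 4.0]
theta = [1.0, 1.0, 3.0]

# D = I: the penalty is lam * sum |theta_j|, i.e. the ordinary lasso.
lasso_value = genlasso_objective(y, identity, identity, theta, lam=0.5)
# D = adjacent differences: the penalty acts on theta_2 - theta_1 and theta_3 - theta_2.
fused_value = genlasso_objective(y, identity, first_diff, theta, lam=0.5)
```

Both calls share the same fit term; only the choice of \({\varvec{D}}\) changes which structure in \({\varvec{\theta}}\) is penalized.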

The generalized lasso covers various applications through different choices of the penalty matrix \({\varvec{D}}\) and the predictor matrix \({\varvec{X}}\). If we specify the predictor matrix as \({\varvec{X}}={\varvec{I}}\), then (1) becomes a coefficient smoothing problem, known as the fused lasso (Tibshirani et al. 2005; Tibshirani and Wang 2008), trend filtering (Kim et al. 2009; Tibshirani 2014), or wavelet smoothing (Donoho and Johnstone 1995), depending on the structure specified in \({\varvec{D}}\). In contrast, when \({\varvec{X}}\ne {\varvec{I}}\), the applications extend to modeling problems, such as modeling of MRI image data (Tibshirani and Taylor 2011), spatially varying coefficient models (Zhao and Bondell 2020; Rahardiantoro and Sakamoto 2021, 2022b), and outlier detection (She and Owen 2010).

The generalized lasso has been applied to spatial data and time series data by determining the penalty matrix \({\varvec{D}}\) appropriately. For spatial clustering analysis, a special form of the generalized lasso is the fused lasso on an irregular graph (Tibshirani and Taylor 2011; Arnold and Tibshirani 2016). In this case, the penalty matrix \({\varvec{D}}\) encodes the structure of the graph, so that each of its rows corresponds to the difference of coefficients between a pair of nodes connected by an edge. A collection of nodes whose coefficients are estimated to be equal is considered to form a cluster. An application of the generalized lasso to time series is trend filtering (Tibshirani 2014). In this case, the penalty matrix \({\varvec{D}}\) contains discrete difference operators of a specified order: the first-order difference for estimating a piecewise constant structure, the second-order difference for estimating a piecewise linear structure, and so on.

In the preceding literature on spatio-temporal clustering, ordinary lasso approaches have mainly been used in combination with existing clustering methods. Kamenetsky et al. (2022) proposed a lasso approach that detects potential clusters using a scan statistic by implementing a sparse matrix representation of the effects of potential clusters. Chen et al. (2018) built separate lasso sub-models at each time point to detect influential predictors for different historical lags of up to 8 time points and included the neighborhood between objects within a specified radius as one of the predictors. However, these methods are limited in determining multiple potential clusters, because they depend heavily on the specified radius of the neighborhood.

In this study, we propose a more flexible approach for spatio-temporal clustering, using the generalized lasso framework with two \({L}_{1}\) penalties: one penalizes roughness on the temporal scale, and the other fuses adjacent locations at each time point. The proposed model can be separated into two generalized lasso problems: trend filtering on the temporal scale and the fused lasso for spatial clustering at each time point. In the trend filtering problem, a smoothed temporal pattern is estimated from the average value over all locations at each time point. In the fused lasso problem, clusters are constructed at each time point and their relative magnitudes can be compared. Therefore, our proposed method can reveal the dynamic behavior of spatial clusters over time. One advantage of our proposed method is its flexibility: adjacencies between objects can be incorporated in the penalty matrix, and multiple clusters can be detected.

An essential step in obtaining appropriate parameter estimates is selecting the optimum tuning parameter. The most common method is \(k\)-fold cross-validation. For example, Zhao and Bondell (2020) applied 10-fold cross-validation to select the tuning parameter in the generalized lasso problem. However, \(k\)-fold cross-validation is known to suffer from large biases in estimating the out-of-sample prediction error (Rad and Maleki 2020; Rad et al. 2020). In addition, spatio-temporal data have time-dependent and neighbor-dependent structures, so splitting such data into test and training sets may make some coefficients impossible to estimate, depending on the structure of the penalty matrix \({\varvec{D}}\). Leave-one-out cross-validation (LOOCV), which is the case \(k=n\), can reduce the bias in estimating the out-of-sample prediction error, but at a high computational cost. We therefore consider approximate leave-one-out cross-validation (ALOCV) (Rad and Maleki 2020) and its generalized cross-validation (GCV) (Craven and Wahba 1979) version. ALOCV and GCV approximate the leave-one-out predicted values based on the primal and dual formulations of general regularization problems, and can be applied without breaking the structure of the penalty matrix \({\varvec{D}}\).

We then apply the proposed method to weekly Covid-19 data in Japan as a real data application. Many studies on spatio-temporal clustering have sought to reveal the pattern of Covid-19 outbreaks in individual countries, such as Brazil (Castro et al. 2021), the United States (Wang et al. 2021b), and China (Wang et al. 2021a). Takemura et al. (2022) applied an adjusted Echelon scan method to detect multiple space–time clusters of daily Covid-19 cases in Japan; they detected time intervals and regions with significantly higher risk of infection than their surroundings, and considered the factors that caused them and affected the changes in a cluster’s shape. In contrast, our proposed method can reveal the overall trend of the temporal effect and detect the dynamic behavior of spatial clusters.

The motivation for this study lies in three aspects: its purpose, its contribution to generalized lasso studies, and its real data application. First, we propose an approach to estimate the temporal effect and detect multiple clusters using the generalized lasso. Second, this paper extends the application of the generalized lasso to spatio-temporal data, where it can be used to estimate a smoothed temporal pattern and identify the dynamic behavior of spatial clusters of adjacent locations over time. Finally, we are interested in revealing the weekly pattern of the temporal effect and spatial clusters in Covid-19 cases in Japan. Our source code is available via the Supplementary Information, which can be accessed online at https://github.com/Rahardiantoro/Spatiotemporal-Generalized-Lasso-.

The sections are arranged as follows. Section 2 explains our proposed method on the generalized lasso for spatio-temporal clustering. Section 3 contains the methods for selecting the optimum tuning parameter. In Sect. 4, we perform the simulation study to investigate the performances of the proposed method compared to some existing methods. Section 5 contains the real data application of the Covid-19 data in Japan. Finally, Sect. 6 is the conclusion of this study.

2 The generalized lasso for spatio-temporal clustering

We explain our proposed method for spatio-temporal data, which combines two types of the generalized lasso: trend filtering on the temporal scale and the fused lasso on an irregular graph for spatial clustering. We denote the spatio-temporal observations by \({y}_{it}\), with locations indexed by \(i=1,2,\dots ,S\) and time points indexed by \(t=1,2,\dots ,T\). We represent \({y}_{it}\), using the temporal effect \({\alpha }_{t}\) and the spatial effect \({\beta }_{it}\) at each time point, as the linear model

$$y_{it} = \alpha_{t} + \beta_{it} + \varepsilon_{it} , \, i = 1,2, \ldots ,S, \, t = 1,2, \ldots ,T,$$
(2)

where \(\varepsilon_{it}\) indicates the noise at the \(i\)-th location and \(t\)-th time point. To estimate \(\boldsymbol{\alpha }={\left({\alpha }_{1},\dots ,{\alpha }_{T}\right)}^{T}\) and \({{\varvec{\beta}}}_{t}={\left({\beta }_{1t},\dots ,{\beta }_{St}\right)}^{T}\), we use a regularization method that yields a smoothed temporal effect and spatial clusters at each time point, by minimizing

$$\mathop \sum \limits_{t = 1}^{T} \mathop \sum \limits_{i = 1}^{S} \left( {y_{it} - \alpha_{t} - \beta_{it} } \right)^{2} + \lambda_{T} P_{T} \left( {\varvec{\alpha}} \right) + \mathop \sum \limits_{t = 1}^{T} \lambda_{S,t} P_{S} \left( {{\varvec{\beta}}_{t} } \right),$$
(3)

where \(P_{T} \left( {\varvec{\alpha}} \right)\) and \(P_{S} \left( {{\varvec{\beta}}_{t} } \right)\) indicate the penalty terms of temporal effect and spatial effect, respectively, with corresponding tuning parameters \(\lambda_{T}\) and \(\lambda_{S,t}\). In this case, we use the \(L_{1}\) penalty term for 1-dimensional trend filtering (Tibshirani and Taylor 2011)

$$P_{T} \left( {\varvec{\alpha}} \right) = \mathop \sum \limits_{t = 3}^{T} \left| {\alpha_{t} - 2\alpha_{t - 1} + \alpha_{t - 2} } \right|,$$
(4)

and the \(L_{1}\) penalty term for the fused lasso on the graph (Tibshirani and Wang 2008; Tibshirani and Taylor 2011)

$$P_{S} \left( {{\varvec{\beta}}_{t} } \right) = \mathop \sum \limits_{{\left( {i,j} \right) \in {\mathcal{E}}}} \left| {\beta_{it} - \beta_{jt} } \right|,$$
(5)

where \({\mathcal{E}}\) is the set of edges on the graph defining adjacency.
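The penalties (4) and (5) can both be written as \(\left\| {{\varvec{D}}{\varvec{v}}} \right\|_{1}\) for a suitable difference matrix. The sketch below builds the two matrices and evaluates the penalties on toy inputs. It is a plain-Python illustration under stated assumptions: indices are 0-based, the 4-node graph is hypothetical, and the helper names are not taken from the genlasso package.

```python
def second_difference_matrix(T):
    """Rows (1, -2, 1) sliding over T time points: shape (T - 2) x T."""
    rows = []
    for t in range(T - 2):
        row = [0] * T
        row[t], row[t + 1], row[t + 2] = 1, -2, 1
        rows.append(row)
    return rows


def edge_difference_matrix(edges, S):
    """One row per edge (i, j), with +1 at i and -1 at j: shape |E| x S."""
    rows = []
    for i, j in edges:
        row = [0] * S
        row[i], row[j] = 1, -1
        rows.append(row)
    return rows


def l1_of_product(D, v):
    """Compute ||D v||_1."""
    return sum(abs(sum(d * x for d, x in zip(row, v))) for row in D)


alpha = [0.0, 1.0, 2.0, 4.0, 6.0]            # piecewise linear with one kink
D1 = second_difference_matrix(len(alpha))
P_T = l1_of_product(D1, alpha)               # the sum in (4)

edges = [(0, 1), (1, 2), (2, 3), (0, 3)]     # hypothetical 4-node graph
beta_t = [1.0, 1.0, -1.0, -1.0]              # two groups of equal coefficients
D2 = edge_difference_matrix(edges, len(beta_t))
P_S = l1_of_product(D2, beta_t)              # the sum in (5)
```

Only the kink in `alpha` and the two edges that cross between the groups of `beta_t` contribute to the penalties, which is why these penalties favor piecewise linear trends and piecewise constant spatial effects.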

We can rewrite the first term in (3) as

$$\mathop \sum \limits_{t = 1}^{T} \mathop \sum \limits_{i = 1}^{S} \left( {y_{it} - \alpha_{t} - \beta_{it} } \right)^{2} = \mathop \sum \limits_{t = 1}^{T} \mathop \sum \limits_{i = 1}^{S} \left( {y_{it} - \overline{y}_{ \cdot t} - \beta_{it} + \overline{\beta }_{ \cdot t} } \right)^{2} + S\mathop \sum \limits_{t = 1}^{T} \left( {\overline{y}_{ \cdot t} - \alpha_{t} - \overline{\beta }_{ \cdot t} } \right)^{2} ,$$
(6)

where \(\overline{y}_{ \cdot t} = S^{ - 1} \mathop \sum \limits_{i = 1}^{S} y_{it}\) and \(\overline{\beta }_{ \cdot t} = S^{ - 1} \mathop \sum \limits_{i = 1}^{S} \beta_{it}\). If we impose the constraint \(\overline{\beta }_{ \cdot t} = 0\) for identifiability of the parameters, Eq. (6) can be expressed as

$$\mathop \sum \limits_{t = 1}^{T} \mathop \sum \limits_{i = 1}^{S} \left( {y_{it} - \alpha_{t} - \beta_{it} } \right)^{2} = \mathop \sum \limits_{t = 1}^{T} \mathop \sum \limits_{i = 1}^{S} \left( {y_{it} - \overline{y}_{ \cdot t} - \beta_{it} } \right)^{2} + S\mathop \sum \limits_{t = 1}^{T} \left( {\overline{y}_{ \cdot t} - \alpha_{t} } \right)^{2} .$$
(7)

Thus, we can express the problem (3) as

$$\mathop \sum \limits_{t = 1}^{T} \mathop \sum \limits_{i = 1}^{S} \left( {y_{it} - \overline{y}_{ \cdot t} - \beta_{it} } \right)^{2} + S\mathop \sum \limits_{t = 1}^{T} \left( {\overline{y}_{ \cdot t} - \alpha_{t} } \right)^{2} + \lambda_{T} P_{T} \left( {\varvec{\alpha}} \right) + \mathop \sum \limits_{t = 1}^{T} \lambda_{S,t} P_{S} \left( {{\varvec{\beta}}_{t} } \right).$$
(8)

Therefore, minimizing (8) separates into a minimization over the temporal effect and minimizations over the spatial effects at each time point. That is, to estimate \(\boldsymbol{\alpha }\) we need only minimize

$$S\mathop \sum \limits_{t = 1}^{T} \left( {\overline{y}_{ \cdot t} - \alpha_{t} } \right)^{2} + \lambda_{T} P_{T} \left( {\varvec{\alpha}} \right),$$
(9)

as a 1-dimensional trend filtering problem, and to estimate \({\varvec{\beta}}_{t}\), we need only minimize, for each \(t = 1,2, \ldots ,T\),

$$\mathop \sum \limits_{i = 1}^{S} \left( {y_{it} - \overline{y}_{ \cdot t} - \beta_{it} } \right)^{2} + \lambda_{S,t} P_{S} \left( {{\varvec{\beta}}_{t} } \right),$$
(10)

as a fused lasso problem on the graph. In this study, both the problems (9) and (10) have the form of the generalized lasso (1), and we can apply the R package “genlasso” (Arnold and Tibshirani 2016).
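The separation into (9) and (10) rests on the identity (7), which holds whenever the spatial effects sum to zero over locations at each time point. The following plain-Python check confirms the identity on a toy panel; all numbers are hypothetical and chosen only so that the centering constraint holds.

```python
def mean(values):
    return sum(values) / len(values)


# Hypothetical toy panel: y[t][i] for T = 2 time points and S = 3 locations.
y = [[1.0, 2.0, 6.0], [4.0, 4.0, 1.0]]
alpha = [2.5, 3.5]
# Spatial effects centred within each time point (their mean over i is zero).
beta = [[-1.0, 0.0, 1.0], [0.5, 0.5, -1.0]]

T, S = len(y), len(y[0])
y_bar = [mean(row) for row in y]

# Left-hand side of (7): the full sum of squares.
lhs = sum((y[t][i] - alpha[t] - beta[t][i]) ** 2
          for t in range(T) for i in range(S))
# Right-hand side of (7): a part involving only beta and a part involving only alpha.
spatial_part = sum((y[t][i] - y_bar[t] - beta[t][i]) ** 2
                   for t in range(T) for i in range(S))
temporal_part = S * sum((y_bar[t] - alpha[t]) ** 2 for t in range(T))
```

Because `spatial_part` does not involve `alpha` and `temporal_part` does not involve `beta`, each part can be minimized together with its own penalty.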

3 Methods for selecting optimum tuning parameters

LOOCV evaluates the mean squared prediction error obtained by predicting each observation from the model fitted to the remaining \(n-1\) observations (Stone 1974). In the context of the generalized lasso problem (1), the LOOCV error for a specified \(\lambda\) can be stated as

$$LOOCV\left( \lambda \right) = \frac{1}{n}\sum\limits_{{c = 1}}^{n} {\left( {y_{c} - \varvec{x}_{c}^{T} \widehat{\varvec{\theta} }^{{/c}} } \right)} ^{2} ,$$
(11)

where \({\widehat{\varvec{\theta} }}^{/c}\) represents the leave-one-out estimate of \({\varvec{\theta}}\) when the \(c\)-th observation is omitted.

For the generalized lasso with the \({L}_{1}\) penalty, no explicit form of the leave-one-out estimate \({\widehat{{\varvec{\theta}}}}^{/c}\) in (11) is available, and solving the generalized lasso problem for each \(c\) is computationally expensive. However, we can apply the approximate leave-one-out cross-validation (ALOCV) to reduce computation time, which is based on the primal and dual formulations of non-differentiable regularization problems (Wang et al. 2018; Rad and Maleki 2020). In the context of the generalized lasso problem (1), for each given \(\lambda\), the ALOCV algorithm can be described as follows (Wang et al. 2018).

(a) Estimate \({\varvec{\theta}}\) as a solution of the primal problem (1).

(b) Estimate \({\varvec{u}}\) as a solution of the dual problem of (1), which can be expressed as

$$\mathop {{\text{arg }}\,{\text{min}}}\limits_{{{\varvec{\gamma}},{\varvec{u}}}} \frac{1}{2}\left\| {{\varvec{\gamma}} - {\varvec{y}}} \right\|_{2}^{2} \quad {\text{s.t.}}\quad \left\|{\varvec{u}} \right\|_{\infty } \le \lambda\quad {\text{and}}\quad {\varvec{X}}^{T} {\varvec{\gamma}} = {\varvec{D}}^{T} {\varvec{u}}.$$
(12)

(c) Remove the rows of \({\varvec{D}}\) belonging to the index set \(E=\left\{s=1,\dots ,m : \left|{\widehat{u}}_{s}\right|=\lambda \right\}\), to construct a submatrix \({{\varvec{D}}}_{-E}\).

(d) Construct the matrix \({\varvec{A}}={\varvec{X}}{\varvec{B}}\), where the columns of \({\varvec{B}}\) span the null space of \({{\varvec{D}}}_{-E}\).

(e) Compute \({{\varvec{H}}}^{\boldsymbol{*}}={\varvec{A}}{{\varvec{A}}}^{+}\), where \({{\varvec{A}}}^{+}\) represents the Moore–Penrose pseudoinverse of \({\varvec{A}}\).

(f) Calculate the ALOCV error as

$$\frac{1}{n}\mathop \sum \limits_{c = 1}^{n} \left( {\frac{{y_{c} - \varvec{x}_{c}^{T} \widehat{\varvec{\theta} }}}{{1 - h_{cc}^{*} }}} \right)^{2} ,$$
(13)

where \(h_{cc}^{*}\) is the \(c\)-th diagonal component of \({\varvec{H}}^{\varvec{*}}\).

Then, the optimum tuning parameter \(\lambda\) can be selected as the one minimizing ALOCV error (13). Our simulation study (Rahardiantoro and Sakamoto 2022a) suggested that, in the context of spatial clustering with spatially varying coefficient models, the ALOCV could yield slightly smaller out-of-sample prediction error and could detect edges in a graph with differences shrunk more appropriately, compared to \(k\)-fold cross-validation.
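Steps (c) to (f) can be sketched for the special case \({\varvec{X}}={\varvec{I}}\) with \({\varvec{D}}\) the edge-difference matrix of a graph. Under that assumption, the null space of \({{\varvec{D}}}_{-E}\) is spanned by the indicator vectors of the connected components that remain when only the edges outside \(E\) are kept, so \({{\varvec{H}}}^{\boldsymbol{*}}\) averages within components and \(h_{cc}^{*}\) is one over the size of the component containing \(c\). The code below is a plain-Python illustration of this shortcut, not the authors' implementation; the function names, the numerical tolerance, and the toy inputs standing in for solver output are all hypothetical.

```python
def active_rows(u_hat, lam, tol=1e-8):
    """Index set E = {s : |u_s| = lam}, up to a numerical tolerance."""
    return {s for s, u in enumerate(u_hat) if abs(abs(u) - lam) <= tol}


def component_sizes(n_nodes, edges, removed):
    """Size of each node's connected component, using edges not in `removed`."""
    parent = list(range(n_nodes))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for s, (i, j) in enumerate(edges):
        if s not in removed:
            parent[find(i)] = find(j)
    roots = [find(a) for a in range(n_nodes)]
    return [roots.count(r) for r in roots]


def alocv_fused_identity(y, theta_hat, edges, u_hat, lam):
    """ALOCV error (13) for X = I and a graph edge-difference matrix D."""
    E = active_rows(u_hat, lam)
    sizes = component_sizes(len(y), edges, E)
    total = 0.0
    for yc, fc, size in zip(y, theta_hat, sizes):
        h = 1.0 / size                          # diagonal entry of H*
        total += ((yc - fc) / (1.0 - h)) ** 2   # ZeroDivisionError if size == 1
    return total / len(y)


# Hypothetical values standing in for solver output on a 4-node chain.
edges = [(0, 1), (1, 2), (2, 3)]
y = [0.0, 1.0, 10.0, 11.0]
theta_hat = [1.5, 1.5, 9.5, 9.5]
u_hat = [-3.0, -4.0, -3.0]
alocv_error = alocv_fused_identity(y, theta_hat, edges, u_hat, lam=4.0)
```

In this toy case only the middle edge is active, so the nodes split into two components of size 2 and every \(h_{cc}^{*}\) equals 0.5. A node that forms a component by itself has \(h_{cc}^{*}=1\), and the score is then undefined.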

In practical computation, we may fail to obtain the ALOCV error for very small \(\lambda\) values. Since \({h}_{cc}^{*}\to 1\) as \(\lambda \to 0\), the denominator \(1-{h}_{cc}^{*}\) for some \(c\) in (13) may become close to zero, and then computation of ALOCV may be unstable, as illustrated in our application to real data (Sect. 5). Rad and Maleki (2020) suggested the generalized cross-validation (GCV) approach, which is to approximate as \(h_{cc}^{*} \approx tr\left( {{\varvec{H}}^{\varvec{*}} } \right)/n\), to obtain the following score

$$\frac{1}{n}\mathop \sum \limits_{c = 1}^{n} \left( {\frac{{y_{c} - \varvec{x}_{c}^{T} \widehat{\varvec{\theta} }}}{{1 - tr\left( {{\varvec{H}}^{\varvec{*}} } \right)/n}}} \right)^{2} .$$
(14)

However, its performance does not seem to have been well investigated. We also adopt the GCV approach in our simulation study and application to real data.
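The difference between scores (13) and (14) can be illustrated with a minimal plain-Python sketch. The residuals and leverages below are hypothetical numbers rather than output of a fitted model, and the function names are assumptions for this example.

```python
def alocv_score(residuals, leverages):
    """Score (13): each residual is rescaled by its own 1 - h*_cc."""
    n = len(residuals)
    return sum((r / (1.0 - h)) ** 2 for r, h in zip(residuals, leverages)) / n


def gcv_score(residuals, leverages):
    """Score (14): every h*_cc is replaced by the average tr(H*) / n."""
    n = len(residuals)
    mean_leverage = sum(leverages) / n
    return sum((r / (1.0 - mean_leverage)) ** 2 for r in residuals) / n


residuals = [0.2, -0.1, 0.3, -0.4]
balanced = [0.5, 0.5, 0.5, 0.5]          # equal leverages: the two scores agree
one_near_one = [0.999, 0.2, 0.2, 0.2]    # hypothetical: one h*_cc close to 1

same_alocv = alocv_score(residuals, balanced)
same_gcv = gcv_score(residuals, balanced)
unstable_alocv = alocv_score(residuals, one_near_one)
stable_gcv = gcv_score(residuals, one_near_one)
```

With a single leverage close to 1, the ALOCV score is dominated by one term, while the GCV score stays moderate because it depends on the leverages only through their average.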

4 Simulation study

In this simulation study, we investigate the performance of our proposed method with the generalized lasso compared to some existing regularization methods. The problem of minimizing (3) involves two penalties, but as explained in Sect. 2, it can be separated into two generalized lasso problems, each with a single penalty. Therefore, we compare our proposed method with regularization methods that use a single penalty, such as the lasso (Tibshirani 1996), ridge (Hoerl and Kennard 1970), and generalized ridge (Zhao and Bondell 2020). Table 1 shows the corresponding penalties for \({P}_{T}\left(\boldsymbol{\alpha }\right)\) and \({P}_{S}\left({{\varvec{\beta}}}_{t}\right)\).

Table 1 The penalties \({P}_{T}\left(\boldsymbol{\alpha }\right)\) and \({P}_{S}\left({{\varvec{\beta}}}_{t}\right)\) used in the simulation study

To handle identifiability issues in lasso and ridge, we form groups of adjacent time points (or spatial locations), and pool the temporal effects \({\alpha }_{t}\) (or the spatial effects \({\beta }_{it}\)) within the same group. Let \({\boldsymbol{\alpha }}^{*}={\left({\alpha }_{1}^{*},\dots ,{\alpha }_{{T}_{1}}^{*}\right)}^{T}\) be the vector of pooled temporal effects, \(\overline{{\varvec{y}} }={\left({\overline{y} }_{\cdot 1},\dots ,{\overline{y} }_{\cdot T}\right)}^{T}\), and \({{\varvec{X}}}_{1}\in {\mathbb{R}}^{T\times {T}_{1}}\) be a block-diagonal predictor matrix with a vector of ones for each block, representing how the elements are pooled. For example, suppose that we have 12 time points, grouped into \({T}_{1}=3\) groups, each containing 4 adjacent time points. In this case, \({\alpha }_{1}^{*},{\alpha }_{2}^{*},{\alpha }_{3}^{*}\) are the coefficients for groups 1, 2, and 3, respectively, and the predictor matrix can be stated as \({{\varvec{X}}}_{1}=\left[\begin{array}{ccc} \varvec{1}& \varvec{0}& \varvec{0}\\ \varvec{0}& \varvec{1}& \varvec{0}\\ \varvec{0}& \varvec{0}& \varvec{1}\end{array}\right]\), where \(\varvec{1}=\left[\begin{array}{c}1\\ \vdots \\ 1\end{array}\right]\) and \(\varvec{0}=\left[\begin{array}{c}0\\ \vdots \\ 0\end{array}\right]\) are vectors of length 4.
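The pooling matrix in this example can be generated mechanically from the group sizes. The following plain-Python sketch builds it and shows how it expands pooled coefficients back to the original time points; the function name `pooling_matrix` and the coefficient values are hypothetical.

```python
def pooling_matrix(group_sizes):
    """Block-diagonal 0/1 matrix: each row has a single 1 in its group's column."""
    n_groups = len(group_sizes)
    rows = []
    for g, size in enumerate(group_sizes):
        for _ in range(size):
            row = [0] * n_groups
            row[g] = 1
            rows.append(row)
    return rows


X1 = pooling_matrix([4, 4, 4])           # 12 time points pooled into 3 groups
alpha_star = [0.5, 1.0, -2.0]            # hypothetical pooled coefficients
# X1 times alpha_star repeats each pooled coefficient over its group.
expanded = [sum(x * a for x, a in zip(row, alpha_star)) for row in X1]
```

Unequal group sizes are handled the same way; for instance, `pooling_matrix([4, 4, 4, 4, 4, 5])` gives a 25 × 6 matrix.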

For ridge problems, the solution is available in closed form. The problem of minimizing (9) using the ridge penalty for the temporal effect is rewritten as

$$S\left\| {\overline{\varvec{y}} - {\varvec{X}}_{1} {\varvec{\alpha}}^{\varvec{*}} } \right\|_{2}^{2} + \lambda_{T} \left\| {{\varvec{\alpha}}^{\varvec{*}} } \right\|_{2}^{2} ,$$
(15)

and the solution of minimizing (15) is \(\widehat{{\boldsymbol{\alpha }}^{*}}={\left({{\varvec{X}}}_{1}^{T}{{\varvec{X}}}_{1}+{\lambda }_{T}{\varvec{I}}\right)}^{-1}{{\varvec{X}}}_{1}^{T}\overline{{\varvec{y}} }\).

Similarly, let \({{\varvec{\beta}}}_{t}^{*}={\left({\beta }_{1t}^{*},\dots ,{\beta }_{{S}_{1}t}^{*}\right)}^{T}\) be the vector of pooled spatial effect, \({\widetilde{{\varvec{y}}}}_{t}\) be the vector of \({y}_{it}-{\overline{y} }_{\cdot t}\), and \({{\varvec{X}}}_{2}\in {\mathbb{R}}^{S\times {S}_{1}}\) be a block-diagonal predictor matrix representing how the elements are pooled. Then, the problem of minimizing (10) using the ridge penalty for spatial effect over time is rewritten as

$$\left\| {\widetilde{\varvec{y}}_{t} - {\varvec{X}}_{2} {\varvec{\beta}}_{t}^{*} } \right\|_{2}^{2} + \lambda_{S,t} \left\| {{\varvec{\beta}}_{t}^{*} } \right\|_{2}^{2}$$
(16)

for \(t\)\(=1,\dots ,T\), and the solution of minimizing (16) is \(\widehat{{{\varvec{\beta}}}_{t}^{*}}={\left({{\varvec{X}}}_{2}^{T}{{\varvec{X}}}_{2}+{\lambda }_{S,t}{\varvec{I}}\right)}^{-1}{{\varvec{X}}}_{2}^{T}{\widetilde{{\varvec{y}}}}_{t}\). For lasso problems, the penalties in (15) and (16) are replaced with \({L}_{1}\) penalties. In this simulation study, we used the R package “glmnet” to solve the lasso.
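When \({{\varvec{X}}}_{2}\) is a pooling matrix, \({{\varvec{X}}}_{2}^{T}{{\varvec{X}}}_{2}\) is diagonal with the group sizes on the diagonal, so the closed-form ridge solution of (16) reduces to a per-group formula. The sketch below illustrates this in plain Python; the data, the grouping, and the function names are hypothetical.

```python
def pooled_ridge(y_tilde, group_of, n_groups, lam):
    """Minimiser of (16) when X_2 pools locations into groups.

    X_2^T X_2 is diagonal with the group sizes, so each coefficient equals
    (sum of y_tilde in the group) / (group size + lam).
    """
    sums = [0.0] * n_groups
    counts = [0] * n_groups
    for value, g in zip(y_tilde, group_of):
        sums[g] += value
        counts[g] += 1
    return [s / (c + lam) for s, c in zip(sums, counts)]


def objective_16(y_tilde, group_of, beta_star, lam):
    """Value of ||y_tilde - X_2 beta*||_2^2 + lam * ||beta*||_2^2."""
    fit = sum((v - beta_star[g]) ** 2 for v, g in zip(y_tilde, group_of))
    return fit + lam * sum(b * b for b in beta_star)


y_tilde = [1.0, 3.0, -2.0, -2.0, 0.0]    # centred observations at one time point
group_of = [0, 0, 1, 1, 1]               # hypothetical grouping of 5 locations
beta_hat = pooled_ridge(y_tilde, group_of, n_groups=2, lam=2.0)
best_value = objective_16(y_tilde, group_of, beta_hat, lam=2.0)
```

Each pooled coefficient is the group mean shrunk toward zero, with stronger shrinkage for larger \(\lambda_{S,t}\) and for smaller groups.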

In the generalized ridge problem, we can also obtain the solution in closed form as follows. The problem of minimizing (9) using the generalized ridge penalty for the temporal effect can be expressed in matrix form as:

$$S\left\| {\overline{\varvec{y}} - \varvec{I\alpha }} \right\|_{2}^{2} + \lambda_{T} \left\| {{\varvec{D}}_{1} {\varvec{\alpha}}} \right\|_{2}^{2} ,$$
(17)

where \({\varvec{D}}_{1} \in {\mathbb{R}}^{{m_{1} \times T}}\) is the penalty matrix forming the second-order difference. The solution for \({\varvec{\alpha}}\) is written as \(\hat{\varvec{\alpha }} = \left( {{\varvec{I}}^{T} {\varvec{I}} + \lambda_{T} {\varvec{D}}_{1}^{T} {\varvec{D}}_{1} } \right)^{ - 1} {\varvec{I}}^{T} \overline{\varvec{y}}\). In contrast, the problem of minimizing (10) using the generalized ridge penalty for spatial effect over time can be expressed in matrix form as:

$$\left\| {\widetilde{\varvec{y}}_{t} - \varvec{I\beta }_{t} } \right\|_{2}^{2} + \lambda_{S,t} \left\| {{\varvec{D}}_{2} {\varvec{\beta}}_{t} } \right\|_{2}^{2}$$
(18)

for \(t = 1, \ldots ,T\), where \({{\varvec{D}}}_{2}\in {\mathbb{R}}^{{m}_{2}\times S}\) is the penalty matrix forming the first-order difference on the set of edges \(\mathcal{E}\). The solution for \({\varvec{\beta}}_{t}\) is written as \(\hat{\varvec{\beta }}_{t} = \left( {{\varvec{I}}^{T} {\varvec{I}} + \lambda_{S,t} {\varvec{D}}_{2}^{T} {\varvec{D}}_{2} } \right)^{ - 1} {\varvec{I}}^{T} \widetilde{\varvec{y}}_{t}\). We also compare the proposed method with the unpenalized estimation in (2), that is \(\hat{\alpha }_{t} = \overline{y}_{\cdot t}\) and \(\hat{\beta }_{it} = y_{it} - \overline{y}_{\cdot t}\).

We applied LOOCV using the R function “cv.glmnet” to select the tuning parameter in the lasso problem. For ridge and generalized ridge problems, we applied the efficient LOOCV (Meijer 2010). In the case of minimizing (17), an efficient formula of the LOOCV error for given \(\lambda_{T}\) is represented in a closed form as

$$\frac{1}{T}\mathop \sum \limits_{t = 1}^{T} \left( {\frac{{\overline{y}_{ \cdot t} - \hat{\alpha }_{t} }}{{1 - h_{tt} }}} \right)^{2} ,$$
(19)

where \(h_{tt}\) is the \(t\)-th diagonal element of the hat-matrix \({\varvec{H}}_{T} = \left( {{\varvec{I}}^{T} {\varvec{I}} + \lambda_{T} {\varvec{D}}_{1}^{T} {\varvec{D}}_{1} } \right)^{ - 1}\). Then, we select \(\lambda_{T}\) minimizing LOOCV error (19). Similarly, in the case of minimizing (18), the LOOCV error for given \(\lambda_{S,t}\) can be expressed as

$$\frac{1}{S}\mathop \sum \limits_{i = 1}^{S} \left( {\frac{{y_{it} - \overline{y}_{ \cdot t} - \hat{\beta }_{it} }}{{1 - h_{ii} }}} \right)^{2}$$
(20)

for \(t = 1, \ldots ,T\), where \(h_{ii}\) is the \(i\)-th diagonal element of the hat-matrix \({{\varvec{H}}}_{S,t}={\left({{\varvec{I}}}^{T}{\varvec{I}}+{\lambda }_{S,t}{{{\varvec{D}}}_{2}}^{T}{{\varvec{D}}}_{2}\right)}^{-1}\). We select \({\lambda }_{S,t}\) minimizing LOOCV error (20) for each \(t\), or a common \({\lambda }_{S}\equiv {\lambda }_{S,t}\) minimizing the sum of (20) for \(t=1,\dots ,T\).
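The shortcut (19) can be checked against explicit refitting on a small example. The sketch below is a plain-Python illustration, not the authors' code. It assumes the hat matrix \({\varvec{H}}_{T}\) exactly as stated above, so that the factor \(S\) in (17) is treated as absorbed into \(\lambda_{T}\); the data, the helper names, and the small Gaussian-elimination solver are all hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[r]) + [b[r]] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def penalty_gram(T):
    """D_1^T D_1 for the second-order difference matrix D_1."""
    G = [[0.0] * T for _ in range(T)]
    for t in range(T - 2):
        stencil = {t: 1.0, t + 1: -2.0, t + 2: 1.0}
        for a, va in stencil.items():
            for b, vb in stencil.items():
                G[a][b] += va * vb
    return G


def system(T, lam, weights):
    """W + lam * D_1^T D_1, with W = diag(weights)."""
    G = penalty_gram(T)
    return [[lam * G[a][b] + (weights[a] if a == b else 0.0) for b in range(T)]
            for a in range(T)]


def loocv_shortcut(y, lam):
    """Formula (19): residuals rescaled by 1 - h_tt, without refitting."""
    T = len(y)
    K = system(T, lam, [1.0] * T)
    alpha_hat = solve(K, y)
    total = 0.0
    for t in range(T):
        unit = [1.0 if s == t else 0.0 for s in range(T)]
        h_tt = solve(K, unit)[t]             # t-th diagonal entry of K^{-1}
        total += ((y[t] - alpha_hat[t]) / (1.0 - h_tt)) ** 2
    return total / T


def loocv_refit(y, lam):
    """Brute force: drop y_t from the fit term, refit, predict position t."""
    T = len(y)
    total = 0.0
    for t in range(T):
        w = [0.0 if s == t else 1.0 for s in range(T)]
        rhs = [wi * yi for wi, yi in zip(w, y)]
        alpha_minus_t = solve(system(T, lam, w), rhs)
        total += (y[t] - alpha_minus_t[t]) ** 2
    return total / T


y_bar = [1.0, 2.5, 2.0, 4.0, 3.5, 6.0]       # hypothetical weekly averages
shortcut = loocv_shortcut(y_bar, lam=1.5)
refit = loocv_refit(y_bar, lam=1.5)
```

The two values agree up to rounding, while the shortcut needs a single fit instead of one fit per left-out time point.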

In this simulation study, we assessed the performance of the regularization methods explained above by using the mean square error (MSE) of the coefficients, which indicates the closeness between estimates and the true coefficients. We computed the MSE of the estimated temporal effect and estimated spatial effect. Moreover, to assess the accuracy for detecting clusters in the estimated spatial effect, we used the index of edges detection accuracy (\(IEDA\)) to evaluate the accuracy of detecting edges with zero differences, that is, zero elements of the vector \({{\varvec{D}}}_{2}{{\varvec{\beta}}}_{t}\). For 100 data replications, \(IEDA\) can be stated as

$$IEDA = \frac{1}{100}\mathop \sum \limits_{z = 1}^{100} \frac{{2 \times Sens_{z}^{E} \times PPV_{z}^{E} }}{{Sens_{z}^{E} + PPV_{z}^{E} }},$$
(21)

where \(Sens_{z}^{E}\) and \(PPV_{z}^{E}\) indicate the sensitivity and the positive predictive value (PPV) for detecting the edges with zero differences, respectively, in the \(z\)-th replication. They are calculated as

$$Sens_{z}^{E} = \frac{{len\left( {\left\{ {s:\left( {{\varvec{D}}_{2} \widehat{\varvec{\beta }}_{t} } \right)_{s} = 0} \right\} \cap \left\{ {s:\left( {{\varvec{D}}_{2} {\varvec{\beta}}_{t} } \right)_{s} = 0} \right\}} \right)}}{{len\left( {\left\{ {s:\left( {{\varvec{D}}_{2} {\varvec{\beta}}_{t} } \right)_{s} = 0} \right\}} \right)}},$$
(22)
$$PPV_{z}^{E} = \frac{{len\left( {\left\{ {s:\left( {{\varvec{D}}_{2} \widehat{\varvec{\beta }}_{t} } \right)_{s} = 0} \right\} \cap \left\{ {s:\left( {{\varvec{D}}_{2} {\varvec{\beta}}_{t} } \right)_{s} = 0} \right\}} \right)}}{{len\left( {\left\{ {s:\left( {{\varvec{D}}_{2} \widehat{\varvec{\beta }}_{t} } \right)_{s} = 0} \right\}} \right)}}.$$
(23)

for \(t = 1, \ldots ,T\), where \(len\) denotes the length of a vector, \(\left\{s:{\left({{\varvec{D}}}_{2}{\widehat{{\varvec{\beta}}}}_{t}\right)}_{s}=0\right\}\) is the index vector of estimated edges with zero differences, and \(\left\{s:{\left({{\varvec{D}}}_{2}{{\varvec{\beta}}}_{t}\right)}_{s}=0\right\}\) is the index vector of actual edges with zero differences. An \(IEDA\) close to 1 means that the estimates detect the edges with zero differences appropriately, which indicates that the objects are clustered correctly. We calculated \(IEDA\) when selecting a common tuning parameter (\({\lambda }_{S}\)) over all time points and when selecting a different tuning parameter (\({\lambda }_{S,t}\)) for each time point. We also calculated the averages of sensitivity and PPV over 100 replications, that is,

$$\overline{{Sens^{E} }} = \frac{1}{100}\mathop \sum \limits_{z = 1}^{100} Sens_{z}^{E} ,\quad \overline{{PPV^{E} }} = \frac{1}{100}\mathop \sum \limits_{z = 1}^{100} PPV_{z}^{E} .$$
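The quantities in (22) and (23), and the summand of (21), can be computed for a single time point in a single replication as in the following plain-Python sketch. The chain graph, the coefficient vectors, the numerical tolerance, and the function names are hypothetical choices for illustration.

```python
def zero_edges(D, beta, tol=1e-8):
    """Indices s with (D beta)_s = 0, up to a numerical tolerance."""
    return {s for s, row in enumerate(D)
            if abs(sum(d * b for d, b in zip(row, beta))) <= tol}


def sens_ppv_f1(D, beta_hat, beta_true):
    """Sensitivity (22), PPV (23), and their harmonic mean, the summand of (21).

    Assumes at least one true and one estimated zero edge, and a non-empty overlap.
    """
    estimated = zero_edges(D, beta_hat)
    actual = zero_edges(D, beta_true)
    hits = len(estimated & actual)
    sens = hits / len(actual)
    ppv = hits / len(estimated)
    return sens, ppv, 2 * sens * ppv / (sens + ppv)


# Hypothetical 5-node chain graph: one row of D per edge.
D2 = [[1, -1, 0, 0, 0],
      [0, 1, -1, 0, 0],
      [0, 0, 1, -1, 0],
      [0, 0, 0, 1, -1]]
beta_true = [1.0, 1.0, 1.0, -1.0, -1.0]   # true zero differences on edges 0, 1, 3
beta_hat = [1.0, 1.0, 0.5, -1.0, -1.0]    # estimate misses the zero on edge 1
sens, ppv, f1 = sens_ppv_f1(D2, beta_hat, beta_true)
```

Averaging the third returned value over replications gives \(IEDA\) as in (21).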

Because we are motivated by revealing the spread of Covid-19 positive cases in Japan, we constructed data simulating cases for each prefecture in Japan. Japan consists of 47 prefectures, with codes 1–47 assigned roughly from north to south, grouped into 8 regions: Hokkaido (1), Tohoku (2–7), Kanto (8–14), Chubu (15–23), Kansai (24–30), Chugoku (31–35), Shikoku (36–39), Kyushu and Okinawa (40–47). We define the adjacency between each pair of prefectures based on whether they are connected by land, bridges/tunnels, or ocean transportation (National Statistics Center 2016). We set the number of weekly time points to \(T=25\) to represent about 6 months.

We generated new positive cases \({o}_{it}\) in the \(i\)-th prefecture (\(i=1, 2, \dots ,47\)) at the \(t\)-th week (\(t=1, 2, \dots ,25\)) from a Poisson distribution with mean \({\mu }_{it}\times {N}_{i}\), where \(\mathrm{ln}\left({\mu }_{it}\right)={\alpha }_{t}+{\beta }_{it}+{\varepsilon }_{it}\). Here, the noise \({\varepsilon }_{it}\) was generated independently from a normal distribution with mean 0 and standard deviation 3. \({N}_{i}\) is the population of the \(i\)-th prefecture, which was obtained from the 2020 Population Census of Japan (Portal Site of Official Statistics of Japan (e-Stat), 2021). We define three cases of the true \({\alpha }_{t}+{\beta }_{it}\) with values 1, 5, and 10 as shown in Fig. 1, to represent different problems and structures of clusters as follows.

Fig. 1 True \({\alpha }_{t}+{\beta }_{it}\), \(i=1,2,\dots ,47\), \(t=1,2,\dots ,25\), for each case. The row label shows the prefecture code, and the column label shows the week. The last row indicates the number of separated regions at each time point

(a) Case 1 represents a single aggregated region of higher risk that moves over time, as in Fig. 1a. In this case, we simulated a cluster of prefectures in the Tohoku, Kanto, and Chubu Regions with a steady higher risk for four weeks. Then, the higher risk region moves to southwest prefectures within four weeks and becomes steady on most prefectures in the Chubu, Kansai, and Chugoku Regions for eight weeks. After that, the higher risk region moves again to southwest prefectures within four weeks and becomes steady in the Kansai, Chugoku, Shikoku, and Kyushu Regions for the remaining five weeks.

(b) Case 2 represents a single aggregated region of higher risk that grows and shrinks in size, as in Fig. 1b. In the first four weeks, the higher risk region remains steady on several prefectures in the Kanto, Chubu, Kansai, and Chugoku Regions. Then, the higher risk region spreads to surrounding prefectures within four weeks until it becomes steady over as many as seven regions (Tohoku, Kanto, Chubu, Kansai, Chugoku, Shikoku, and Kyushu) within eight weeks. After that, the higher risk region shrinks for four weeks, and then returns to its initial size and remains steady for five weeks.

(c) Case 3 represents several aggregated regions of higher risk that appear and disappear, as in Fig. 1c. For the first two weeks, there is no region of higher risk. Then, a higher risk region appears on prefectures in the Tohoku, Kanto, and Chubu Regions for 14 weeks. In week 7, a second higher risk region appears on prefectures in the Kansai and Chugoku Regions for 16 weeks. Meanwhile, a third higher risk region appears on prefectures in the Shikoku and Kyushu Regions from week 13 for 8 weeks.

Regions of adjacent prefectures with a higher risk value are regarded as clusters. The last row of Fig. 1 shows the number of regions separated by the adjacency of prefectures and the different levels of \({\alpha }_{t}+{\beta }_{it}\).
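Counting such regions amounts to counting connected components after keeping only the edges whose endpoints share the same level. The following plain-Python sketch shows one way to do this with a union-find structure; the 6-node graph and its levels are hypothetical and do not reproduce the prefecture adjacency used in the study.

```python
def count_regions(levels, edges):
    """Number of connected groups of adjacent nodes sharing the same level."""
    parent = list(range(len(levels)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i, j in edges:
        if levels[i] == levels[j]:       # merge only across equal-level edges
            parent[find(i)] = find(j)
    return len({find(a) for a in range(len(levels))})


levels = [1, 1, 5, 5, 1, 1]              # hypothetical levels at one time point
chain = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
n_chain = count_regions(levels, chain)
# An extra adjacency between nodes 1 and 4 joins the two low-level regions.
n_linked = count_regions(levels, chain + [(1, 4)])
```

The example shows that the count depends on the adjacency definition as well as on the levels: the same levels give three regions on the chain and two once nodes 1 and 4 are adjacent.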

For each of Cases 1, 2, and 3, 100 data sets were generated. Then, for each data set, we applied the transformation \({y}_{it}=\mathrm{ln}\left(\frac{{o}_{it}}{{N}_{i}}\right)\) and fitted the models explained above. For generalized lasso/ridge, we defined the second-order difference penalty matrix \({{\varvec{D}}}_{1}\in {\mathbb{R}}^{23\times 25}\) for the temporal effect. Moreover, the definition of adjacency between prefectures yields 93 edges, from which we obtained the penalty matrix \({{\varvec{D}}}_{2}\in {\mathbb{R}}^{93\times 47}\) for the spatial effect. For lasso and ridge, we pooled 4 successive coefficients for the temporal effect to obtain \({\alpha }_{1}^{*},\dots ,{\alpha }_{6}^{*}\) (the last one covers five weeks) and pooled coefficients based on the 8 regions of Japan to obtain \({\beta }_{1t}^{*},\dots ,{\beta }_{8t}^{*}\), so that \({T}_{1}=6\), \({S}_{1}=8\), \({{\varvec{X}}}_{1}\in {\mathbb{R}}^{25\times 6}\) and \({{\varvec{X}}}_{2}\in {\mathbb{R}}^{47\times 8}\).

Figure 2 shows the MSE for the estimate of the temporal effect \({\alpha }_{t}\). In Case 1, the lasso had the smallest MSE among the methods, followed by the generalized lasso with ALOCV and GCV.

Fig. 2 Line-plots of MSE for coefficient \({\alpha }_{t}\) each time point from 100 replications. The Cases 1, 2, and 3 correspond to the true \({\alpha }_{t}+{\beta }_{it}\) displayed in Fig. 1

The true temporal effect \({\alpha }_{t}\) was almost constant, so we conjecture that the lasso estimates of the pooled coefficients had an advantage. In Case 2, the generalized lasso with ALOCV provided the smallest MSE at most time points, followed by the generalized lasso with GCV. In this case, the MSEs of lasso and ridge fluctuated strongly, especially at the points where the true risk values changed.

In Case 3, although the lasso had the smallest MSE for several weeks, the MSE of the generalized lasso with GCV was the most stable, followed by that of the generalized lasso with ALOCV. The MSEs of lasso and ridge also fluctuated when the number of clusters increased or decreased. We conjecture that pooling the coefficients caused the poor performance of lasso and ridge. Overall, in estimating the temporal effect, the generalized lasso with ALOCV and GCV provided smaller and more stable MSE than the other methods across the different patterns of true risk values.

Figure 3 shows the average MSE for the estimates of the spatial effect \({\beta }_{it}\) over 47 prefectures at each time point. When a common tuning parameter \({\lambda }_{S}\) was used (Fig. 3a), the generalized lasso with GCV provided the minimum MSE at most time points in all cases, followed by the generalized lasso with ALOCV. In Case 1, when the true risk values were steady in weeks 9 to 16, the MSE of lasso was smaller than that of the generalized lasso with ALOCV. However, in Case 2 and Case 3, when the true risk values changed, the MSEs of the other methods were higher than those of the generalized lasso. The result with a different tuning parameter \({\lambda }_{S,t}\) at each time point (Fig. 3b) was slightly different. The generalized ridge provided the smallest MSE at most time points. The MSEs of the generalized lasso with GCV and ALOCV were higher at several intermediate weeks in Case 1, and at the first and last several weeks in Case 2, but were smaller than those of lasso and ridge in other cases.

Fig. 3 Line-plots of average and the range of MSE from \({\beta }_{it}\) over 47 prefectures for each time point, in the case of using (a) a common \({\lambda }_{S}\) for all time points and (b) different \({\lambda }_{S,t}\) for each time point. The Cases 1, 2, and 3 correspond to the true \({\alpha }_{t}+{\beta }_{it}\) displayed in Fig. 1

Figure 4 shows the plots of \(IEDA\) for all time points in clustering prefectures. We show results only for the generalized lasso with ALOCV and GCV and for the lasso, because the other methods produce non-zero differences on all edges. The generalized lasso with ALOCV outperformed the other methods, as indicated by the highest \(IEDA\) for most cases in Fig. 4a and b. The \(IEDA\) generally increased when the number of separated regions was small and decreased when the number of separated regions was large. Table 2 shows the averages of \(\overline{{Sens }^{E}}\), \(\overline{{PPV }^{E}}\) and \(IEDA\) over all time points. The generalized lasso with ALOCV provided higher sensitivity and \(IEDA\) than the generalized lasso with GCV and the lasso, even though the coefficients of the lasso were pooled in advance based on the 8 prefectural regions. Moreover, we obtained slightly higher \(IEDA\) with a common tuning parameter than with different tuning parameters, for all cases and methods. With a common tuning parameter, the chosen value was not too small, so the differences between coefficients on the edges tended to shrink to zero, which resulted in more accurate clustering. In contrast, with a different tuning parameter at each time point, the chosen value varied greatly and was small for many time points. As a result, the differences between coefficients on the edges did not tend to shrink to zero, which decreased the clustering accuracy.

Fig. 4 Line-plots of \(IEDA\) for each time point, in the case of using (a) a common \({\lambda }_{S}\) for all time points and (b) different \({\lambda }_{S,t}\) for each time point. The Cases 1, 2, and 3 correspond to the true \({\alpha }_{t}+{\beta }_{it}\) displayed in Fig. 1

Table 2 Averages of \(\overline{{Sens }^{E}}\), \(\overline{{PPV }^{E}}\), and \(IEDA\) over all time points

In summary, our simulation study showed that the proposed method performed well in estimating the temporal effect, as indicated by its lower MSE. Moreover, the proposed method was flexible in detecting multiple clusters, as shown by its high \(IEDA\) values. The generalized lasso with ALOCV performed best in detecting clusters, while the generalized lasso with GCV performed well in estimating coefficients.

5 Real case data application: weekly Covid-19 cases in Japan

After the first confirmed case was detected on January 16, 2020, Japan experienced five major waves of Covid-19 through September 2021. As of September 11, 2021, the total number of Covid-19 cases in all prefectures in Japan was 1,627,898, with a 98% recovery rate (Ministry of Health, Labor, and Welfare, 2021). At that time, Japan's cumulative number of confirmed Covid-19 cases was the 26th highest in the world (WHO 2021). In our study, we chose March 21, 2020 as the starting point, because on that date the total number of confirmed Covid-19 cases exceeded 1,000, spread over 39 of the 47 (83%) prefectures of Japan.

Figure 5 shows daily reported Covid-19 cases in Japan, with (a)–(d) indicating the period of each emergency status declaration. In response to the first wave of Covid-19, the first emergency status was declared on April 7, 2020, initially in seven prefectures, and was expanded nationwide on April 16, 2020 (Fig. 5a). The second wave occurred in August 2020, but the government did not declare an emergency status for the rest of that year. After the number of cases decreased in autumn, the third wave occurred at the end of 2020, in which the number of infections reached 230,000 people. The second emergency status was declared for Saitama, Chiba, Tokyo, and Kanagawa on January 8, 2021, and was expanded to 11 prefectures on January 13, 2021. This emergency status lasted until March 7, 2021 (Fig. 5b). The first doses of Covid-19 vaccine were administered on April 1, 2021, at which time Japan was being hit by the fourth wave of the outbreak. The third emergency declaration was issued for Tokyo, Osaka, Kyoto, and Hyogo on April 25, 2021, was expanded to five other prefectures on May 16, 2021, and was lifted on June 20, 2021, except in Okinawa (Fig. 5c). The fifth wave occurred around July to September 2021, during which the Olympic Summer Games were held in Tokyo, and the fourth state of emergency was declared in several prefectures, particularly to prevent the spread of the highly contagious Delta variant (Fig. 5d).

Fig. 5 Daily reported Covid-19 positive cases in Japan from March 18, 2020, to September 11, 2021, with emergency status periods (a)-(d)

We applied the minimization problem (3) for spatio-temporal analysis, which can be decomposed into the two generalized lasso problems (9) and (10), with the tuning parameters \({\lambda }_{T}\) and \({\lambda }_{S,t}\) selected by ALOCV and GCV, to understand the temporal effect and the prefectural clusters formed at each time point. We used the weekly Covid-19 positive case data for each prefecture in Japan from March 21, 2020, to September 11, 2021 (the data file covid_jpn_prefecture.csv in Takaya (2021)). Therefore, we have \(S=47\) and \(T=78\). Let \({y}_{it}^{*}\) be the number of weekly positive cases in the \(i\)-th prefecture at the \(t\)-th week, and \({N}_{i}\) be the population of the \(i\)-th prefecture. We used the log-transformed number of positive cases per population, \({y}_{it}=\mathrm{log}\left(\frac{{y}_{it}^{*}}{{N}_{i}}\right)\), as the response variable in the generalized lasso problem. The adjacency between each pair of prefectures was introduced as constraints in the same way as in Sect. 4, and hence we have \({{\varvec{D}}}_{1}\in {\mathbb{R}}^{76\times 78}\) and \({{\varvec{D}}}_{2}\in {\mathbb{R}}^{93\times 47}\).

We calculated an unbiased estimate of the DF for \({\lambda }_{T}\) or \({\lambda }_{S,t}\) to evaluate complexity of the model. According to Tibshirani and Taylor (2011), the DF for given \(\lambda\) in the generalized lasso (1) is defined as,

$$df = E\left[ {nullity\left( {{\varvec{D}}_{ - E} } \right)} \right],$$
(24)

where \(nullity\left({{\varvec{D}}}_{-E}\right)\) is the dimension of the null space of \({{\varvec{D}}}_{-E}\), and \({{\varvec{D}}}_{-E}\) denotes the penalty matrix \({\varvec{D}}\) with the rows indexed by the boundary set \(E\) of a solution of the dual problem (12) removed. For the \({L}_{1}\) penalty, the DF indicates the number of fused groups. In estimating the temporal effect with (9), we used formula (24) to calculate the DF for the selected \({\lambda }_{T}\). In estimating the spatial effect for each time point with (10), we selected a common \({\lambda }_{S}\) for all \(t=\mathrm{1,2},\dots ,78\), based on the simulation study described in Sect. 4, in which a common tuning parameter resulted in higher \(IEDA\) values. In this case, the DF was calculated as the average value of DF over all \(t=\mathrm{1,2},\dots ,78\).
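For the spatial fused lasso, the link between formula (24) and the number of fused groups can be illustrated with a small sketch. When \({\varvec{D}}\) is an edge-incidence matrix, removing the rows in \(E\) leaves the incidence matrix of a subgraph, and its nullity equals the number of connected components of that subgraph. The sketch below assumes that \(E\) coincides with the edges whose fitted difference is non-zero, and counts the components with a union-find structure. The function name, the tolerance, and the toy path graph are hypothetical; this evaluates the quantity for a single fit and is not the authors' implementation.

```python
def fused_group_count(beta, edges, tol=1e-8):
    """Connected components of the graph that keeps only edges with (near-)equal endpoint estimates."""
    parent = list(range(len(beta)))

    def find(i):
        # Follow parent links to the root, shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in edges:
        if abs(beta[i] - beta[j]) <= tol:   # edge outside E: the two estimates are fused
            parent[find(i)] = find(j)
    return len({find(i) for i in range(len(beta))})


# Toy example: 5 locations on a path, with two fused groups in the estimate.
toy_beta = [0.5, 0.5, 0.5, -0.2, -0.2]
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(fused_group_count(toy_beta, toy_edges))   # 2
```

Averaging this count over all time points gives a quantity analogous to the averaged DF described above for a common \({\lambda }_{S}\).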

5.1 Result of estimating temporal effect

We considered minimizing (9) to estimate the temporal effect \({\alpha }_{t}\). Table 3 contains the selected \({\lambda }_{T}\) and the DF based on the proposed method with ALOCV and GCV. We limited the maximum DF to \(\frac{3}{4}\left(78\right)=58.5\) to avoid an extremely rough temporal effect. The generalized lasso with ALOCV selected a larger \({\lambda }_{T}\) and a smaller DF than the generalized lasso with GCV.

Table 3 Selected \({\lambda }_{T}\) and DF for estimating the temporal effect \({\alpha }_{t}\)

Figure 6 shows the estimated temporal effect \({\widehat{\alpha }}_{t}\) for each \(t\) based on the \({\lambda }_{T}\) selected by the generalized lasso with ALOCV and GCV, together with the emergency status periods (a)-(d). The break points in the estimated trend suggest changes of conditions, such as an emergency status declaration. The trend estimated using ALOCV for the \({L}_{1}\) penalty has slightly fewer break points than the one estimated using GCV. During the first emergency status period (a), the estimated temporal effect reached its first peak and then fell. After period (a) ended, it rose quickly and reached its second peak in the summer of 2020. During the second emergency status period (b), it reached its third peak and then fell again. During the third emergency status period (c), it increased slightly for a while, reached its fourth peak, and then decreased quickly. During the fourth emergency status period (d), it increased for more than one month, reached its fifth peak, and then decreased.

5.2 Result of estimating spatial effect

Fig. 6 Plot of estimated temporal effect \({\widehat{\alpha }}_{t}\) based on \({\lambda }_{T}\) selected by using generalized lasso with ALOCV and GCV, with emergency status periods (a)-(d)

We considered minimizing (10) to estimate the spatial effect \({\beta }_{it}\). In this case, we assumed that \({\lambda }_{S}\equiv {\lambda }_{S,t}\) for all \(t=\mathrm{1,2},\dots ,78\) and selected it by minimizing the sum over all \(t\) of the ALOCV error or of the GCV error, respectively.

Table 4 contains the selected \({\lambda }_{S}\) and the DF. We limited the maximum DF to \(\frac{3}{4}\left(47\right)=35.25\) to avoid an extremely rough spatial effect. ALOCV was not computable at lower \({\lambda }_{S}\) (less than 3.47) because of the division-by-zero issue described in the last paragraph of Sect. 3, so ALOCV selected a higher \({\lambda }_{S}\) with a lower DF than GCV.
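The selection of a common \({\lambda }_{S}\) can be sketched as a grid search over the summed criterion. In the sketch below, `errors[k][t]` stands for the ALOCV or GCV error at the \(k\)-th candidate value and time point \(t\), and `None` marks a value that could not be computed. Skipping a candidate as soon as one time point is not computable is an assumption made here for illustration, as are the function name and the toy numbers; the paper does not spell out this detail.

```python
def select_common_lambda(lambdas, errors):
    """Return (lambda, total error) minimizing the error summed over time points.

    errors[k][t] is the criterion value for lambdas[k] at time t; None means not computable.
    """
    best, best_total = None, float("inf")
    for lam, row in zip(lambdas, errors):
        if any(e is None for e in row):
            continue                      # assumption: drop candidates with a missing value
        total = sum(row)
        if total < best_total:
            best, best_total = lam, total
    return best, best_total


# Hypothetical grid of three candidates evaluated at three time points.
toy_lambdas = [1.0, 3.5, 5.0]
toy_errors = [
    [None, 0.20, 0.10],   # not computable at the first time point
    [0.30, 0.25, 0.20],
    [0.40, 0.30, 0.35],
]
chosen, total = select_common_lambda(toy_lambdas, toy_errors)
print(chosen)             # 3.5
```

The toy grid mirrors the situation described above: the smallest candidate is excluded because the criterion is undefined there, so a larger value is selected.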

Table 4 Selected common \({\lambda }_{S}\) and DF for estimating spatial effect

The estimates \({\widehat{\beta }}_{it}\) of the spatial effect for these methods of selecting \({\lambda }_{S}\) are plotted in Figs. 7, 8, and 9, in which the prefectures are ordered roughly from North (top) to South (bottom), and a darker red color indicates higher values. Figure 7 shows the unpenalized estimates of the spatial effect, \({\widehat{\beta }}_{it}={y}_{it}-{\overline{y} }_{.t}\). This figure shows the relative spread of Covid-19 in each prefecture in every week. However, the unpenalized \({\widehat{\beta }}_{it}\) is very rough, and hence it is difficult to grasp the commonalities and differences in the spatial effect between regions in each week. Figure 8 shows the estimated spatial effect \({\widehat{\beta }}_{it}\) with \({\lambda }_{S}\) selected by ALOCV. It is very smooth: one or a few clusters covered all prefectures in most weeks, and in some weeks the estimated spatial effect differed greatly between the clustered regions. Figure 9 shows the estimated spatial effect \({\widehat{\beta }}_{it}\) with \({\lambda }_{S}\) selected by GCV. It suggests that there were several clusters of prefectures sharing the same value. The largest cluster consisted of most of the prefectures during the emergency status periods, whereas in other weeks the prefectures were divided into several clusters.
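As a small worked example of the unpenalized estimate \({\widehat{\beta }}_{it}={y}_{it}-{\overline{y} }_{.t}\), the following sketch computes the log rate for each prefecture and week and then subtracts the weekly mean over prefectures. The function names and the two-prefecture toy counts are hypothetical, and the sketch assumes strictly positive counts so that the logarithm is defined.

```python
import math


def log_rate(cases, population):
    """y[i][t] = log(cases[i][t] / population[i]); assumes every count is positive."""
    return [[math.log(c / n) for c in row] for row, n in zip(cases, population)]


def center_by_week(y):
    """Subtract from each entry the mean over locations at the same time point."""
    S, T = len(y), len(y[0])
    means = [sum(y[i][t] for i in range(S)) / S for t in range(T)]
    return [[y[i][t] - means[t] for t in range(T)] for i in range(S)]


# Hypothetical counts for 2 locations over 2 weeks.
toy_cases = [[10, 40], [30, 60]]
toy_population = [1000, 3000]
beta_hat = center_by_week(log_rate(toy_cases, toy_population))

# Week 1: both rates are 0.01, so both centered values are 0.
# Week 2: rates 0.04 and 0.02 give centered values +log(2)/2 and -log(2)/2.
print([[round(v, 4) for v in row] for row in beta_hat])
```

By construction the centered values sum to zero over locations at every time point, which is why they describe relative rather than absolute risk.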

5.3 Clustering the regions based on the estimated spatial effect

Fig. 7 Heatmap of unpenalized estimated spatial effect \({\widehat{\beta }}_{it}\) with a common \({\lambda }_{S}\)

Fig. 8 Heatmap of estimated spatial effect \({\widehat{\beta }}_{it}\) with a common \({\lambda }_{S}\) selected by ALOCV

Fig. 9 Heatmap of estimated spatial effect \({\widehat{\beta }}_{it}\) with a common \({\lambda }_{S}\) selected by GCV

Figure 10 shows the heatmap of the estimated spatial effect \({\widehat{\beta }}_{it}\) with a common \({\lambda }_{S}\) selected by GCV for the generalized lasso, as in Fig. 9, but with the prefectures rearranged based on agglomerative hierarchical clustering. After this rearrangement, the heatmap displays relative infection risk in a way that shows how the infection arose in a specific area and then spread to other areas.
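The rearrangement of heatmap rows can be illustrated with a compact sketch of agglomerative hierarchical clustering. The text does not state which distance or linkage was used, so the Euclidean distance and average linkage below are assumptions, as are the function name and the toy rows. The function repeatedly merges the two closest clusters and returns the row indices in the order in which they end up grouped.

```python
import math


def agglomerative_order(rows):
    """Average-linkage agglomerative clustering; returns row indices grouped by merge order."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between the members of the two clusters.
                d = sum(dist(rows[i], rows[j]) for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0]


# Hypothetical spatial-effect profiles for 4 locations over 2 weeks.
toy_rows = [[1.0, 1.0], [-1.0, -1.0], [1.1, 0.9], [-0.9, -1.2]]
print(agglomerative_order(toy_rows))   # [0, 2, 1, 3]
```

In the toy example, rows 0 and 2 have similar profiles, as do rows 1 and 3, so each pair becomes adjacent in the returned order, which is the effect the rearranged heatmap relies on.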

Fig. 10 Heatmap of estimated spatial effect \({\widehat{\beta }}_{it}\) with a common \({\lambda }_{S}\) selected by using GCV for generalized lasso, in which prefectures are arranged based on agglomerative hierarchical clustering. The blue horizontal boxes indicate the separated clusters and the black vertical dashed lines indicate the separators between waves

Based on Fig. 10, we can detect six major clusters of prefectures from top to bottom: (1) all prefectures in the Kyushu region, (2) all prefectures in the Chugoku and Shikoku regions, (3) all prefectures in the Kanto, Chubu, and Kansai regions, together with Fukushima prefecture (the central part of Japan), (4) the prefectures in the Tohoku region except Fukushima, (5) Hokkaido prefecture, and (6) Okinawa prefecture. We provide the following interpretation of the dynamic behavior of the spatial clusters, based on the result of generalized lasso clustering and separated into the five waves that Japan has experienced.

(i) First wave (March 21 to June 27, 2020). During the first wave of infections, relative infection risk increased gradually in the central part of Japan (cluster 3) and Okinawa (cluster 6) and decreased in the remaining clusters. Then, while the first emergency status was in effect from mid-April to May 2020, relative infection risk was markedly higher in Hokkaido (cluster 5) and decreased gradually in the other clusters.

(ii) Second wave (July 4 to October 17, 2020). In the second wave of outbreaks, relative infection risk was higher in Kyushu (cluster 1), central Japan (cluster 3), and Okinawa (cluster 6), and lower in the other clusters. In cluster 1, the outbreak reached a peak in August 2020 and then decreased gradually. During this period, the relative risk increased gradually in cluster 3. It was the highest and stagnant in Okinawa.

(iii) Third wave (October 24, 2020, to February 6, 2021). In the third wave, relative risk was higher in central and northern parts of Japan (clusters 3, 4, 5, and 6). It then decreased gradually while the second emergency status was in effect from January to March 2021.

(iv) Fourth wave (February 13 to June 5, 2021). After higher risk in the Kanto and Tohoku regions in March 2021, the fourth wave spread to other regions. While the third emergency status was in effect from April to June 2021, relative risk was higher in Okinawa (cluster 6) in April and in the Kyushu region (cluster 1) in May, but was lower in the other clusters.

(v) Fifth wave (June 12 to September 11, 2021). In the fifth wave of infections, infection risk increased first in Okinawa (cluster 6), spread into central Japan (cluster 3), Tohoku (cluster 4), and Hokkaido (cluster 5), next into Chugoku and Shikoku (cluster 2), and then into Kyushu (cluster 1).

In summary, the outbreaks that occurred in central Japan (cluster 3) spread into outer regions such as the Chugoku-Shikoku region (cluster 2) and the Tohoku region (cluster 4) within one month, and then spread into the Kyushu region (cluster 1) a few months later. We can also see that the outbreaks in some regions leaped into Hokkaido (cluster 5) and Okinawa (cluster 6) a few months later.

6 Conclusion

In this study, we proposed a regularization approach using a modified generalized lasso model with two \({L}_{1}\) penalties, one for the temporal effect and one for the spatial effect. Our proposed method can be separated into two generalized lasso problems: trend filtering to estimate a smooth temporal effect, and the fused lasso to detect clusters of spatial locations at each time point. Through our proposed method, we can understand the dynamic behavior of spatial clusters over time more flexibly, based on the relative magnitude of the estimated spatial effect at each time point.

To select the appropriate tuning parameters in the generalized lasso, we considered using ALOCV and GCV. Our simulation study suggested that estimation of temporal and spatial effects using generalized lasso with ALOCV and GCV was comparable or superior in terms of MSE to existing regularization methods such as lasso, ridge, and generalized ridge. Also, we showed that the generalized lasso with ALOCV provided higher \(IEDA\), the accuracy of detecting edges with non-zero difference. In addition, our simulation study suggested that a common tuning parameter over all time points was preferable in spatial clustering.

Then, through the analysis of Japan’s weekly Covid-19 panel data, we illustrated how to understand the spread of Covid-19 infection using our modified generalized lasso model. In estimating the spatial effect over weeks, the generalized lasso with a common tuning parameter over all time points, selected by GCV, provided a reasonable result.

This study mainly used the “genlasso” package of R software to solve the generalized lasso problems using the dual path algorithm (Arnold and Tibshirani 2016). However, we may consider using the coordinate descent algorithm suggested in Yamamura et al. (2021), which was reported to have better estimation accuracy and speed than the algorithm used in “genlasso”. Moreover, when detecting spatial clusters in the spread of disease as a task in epidemiological studies, the response variable is often observed as count data. An application of the modified generalized lasso to count data was proposed by Choi et al. (2018), which we will consider closely in our future work.