Introduction

Reference context

The increasing urbanization of recent years is transforming every aspect of urban society and affecting its sustainable development [1,2,3,4]. In fact, as urbanization continues to grow, it brings significant social and economic benefits (e.g., additional urban services and employment opportunities), while also presenting challenges in city management, such as resource planning (water, electricity), traffic, air and water quality, public policy, and public safety services.

Among the main urban issues, criminal activity is one of the most important social problems in metropolitan areas: it can severely affect public safety, harm the economy and the sustainable development of a society, and reduce the quality of life and well-being of citizens. For this reason, improving strategies to effectively manage and utilize limited public security resources has become a crucial issue for policymakers and urban management departments.

At the same time, ICT and sensor infrastructures are enabling public organizations and police departments to gather and store increasing volumes of crime-related data, enriched with spatial and temporal information. This offers the opportunity to apply data analytics methodologies to extract useful knowledge models, which can effectively detect spatial and temporal patterns of crime events. By extracting useful predictive models and applying appropriate methods for data analysis, police departments can better utilize their limited resources and implement more effective strategies for crime prevention.

Motivations and contributions

Several criminal justice studies show that the incidence of criminal events is not uniformly distributed within a city [2, 3, 5, 6]. In fact, crime trends are strongly affected by the geographic location of the area (there are low-risk and high-risk areas), and they can vary with the period of the year (there can be seasonal patterns, peaks, and dips). For this reason, an effective predictive model must be able to automatically determine which city neighborhoods are most affected by crime-related incidents, namely crime hotspots, as well as how the crime rate in each particular hotspot evolves over time. This knowledge can allow police departments to allocate their resources more efficiently over the urban territory, deploying officers to high-risk areas or moving them away from areas expecting a decline in criminal activity, thus more efficiently preventing or more promptly responding to crimes.

In the literature, classic density-based clustering algorithms have been widely exploited to discover spatial hotspots [7,8,9,10,11]. However, due to the adoption of global parameters, they fail to identify multi-density hotspots (i.e., different regions having various densities [12, 13]) unless the clusters (or hotspots) are clearly separated by sparse regions [14]. This is a key issue when analyzing crime data and thus correctly detecting the real crime hotspots: the density of population, traffic, or events in large cities can vary widely from one area to another [5], which also makes the incidence of crime events extremely dissimilar in terms of density.

Such a spatial density variation in crime events challenges the discovery of proper hotspots when classic density-based algorithms perform the analysis. For example, the well-known DBSCAN [14] receives two global input parameters (\(\epsilon\) and \(min\_points\)), which result in a single minimum density threshold \(\delta _{min}\) exploited for clustering the whole dataset. The chosen value of \(\delta _{min}\) determines the densities of the discovered hotspots and cannot accommodate large density variations in urban data. Indeed, if \(\delta _{min}\) is too small, the algorithm can discover several small non-significant hotspots that do not actually represent dense crime regions, while if \(\delta _{min}\) is too large, it can discover a few large regions with high intra-cluster density variations. Thus, classic density-based clustering algorithms fail to identify proper hotspots characterized by different density levels, and their application to crime hotspot discovery can produce inaccurate results, particularly in urban environments. A recent study in Cesario et al. [5] shows that multi-density clustering achieves higher performance than classic approaches for discovering hotspots in multi-density urban environments.
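To make this concrete, the following minimal Python sketch (ours, purely illustrative; the synthetic data and parameter values are made up) shows how DBSCAN's single global \(\epsilon\) forces a trade-off on multi-density data: a small radius shatters the sparse region into noise, while a large radius blurs the dense one.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(42)
dense = rng.normal(loc=(0.0, 0.0), scale=0.05, size=(300, 2))   # high-density region
sparse = rng.normal(loc=(3.0, 3.0), scale=0.60, size=(300, 2))  # low-density region
points = np.vstack([dense, sparse])

for eps in (0.05, 0.30):  # a small and a large global radius
    labels = DBSCAN(eps=eps, min_samples=10).fit_predict(points)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise = int(np.sum(labels == -1))
    print(f"eps={eps}: {n_clusters} clusters, {n_noise} noise points")
```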

This paper presents the design and implementation of MD-CrimePredictor (Multi-Density Crime Predictor), an approach based on multi-density crime hotspots and regressive models to automatically detect high-risk crime areas in urban environments and to reliably forecast crime trends in each area. The algorithm is composed of three main steps. First, multi-density crime hotspots are detected by applying a multi-density clustering algorithm (i.e., CHD, proposed in Cesario et al. [5]), where the densities, shapes, and number of the detected regions are automatically computed by the algorithm without any pre-fixed division into areas. Then, a specific regressive model is discovered from each detected hotspot, analyzing the partitions discovered during the previous step; in this paper, this is done by exploiting both SARIMA [15] and LSTM [16] models, and a comparative experimental analysis is presented in terms of error measures. The final result of the algorithm is a spatio-temporal crime forecasting model, composed of a set of crime hotspots, their densities, and a set of associated crime predictors, each one representing a predictive model to forecast the number of crimes estimated to happen in its specific hotspot. The experimental evaluation of the proposed approach has been performed by analyzing a large area of Chicago, involving more than two million crime events over a period of 19 years. The evaluation, aimed at assessing the effectiveness of the approach over rolling prediction horizons, presents a comparative analysis between SARIMA and LSTM regression models, showing the higher accuracy of the former with respect to the latter. We also provide a comparative assessment of the proposed approach against other studies proposed in the literature, drawing a comparison in terms of hotspot detection and crime forecasting accuracy. Overall, the results show the effectiveness of the approach, achieving good accuracy in spatial and temporal crime forecasting over rolling time horizons.

Plan of the paper

The rest of the paper is organized as follows. Section "Related work" reports the most important approaches proposed in the literature for crime hotspot detection and crime forecasting. Section "Problem definition and proposed approach" outlines the problem statement, describes the approach proposed in the paper, and reports its steps in detail. Section "Experimental evaluation and results" provides the experimental evaluation of the proposed approach on a real-world scenario, showing a comparative analysis between SARIMA and LSTM performances, as well as a comparison between the results achieved by the presented approach and other methodologies proposed in the literature. Finally, Sect. "Conclusion" concludes the paper and outlines future research directions.

Related work

Recently, crime hotspot detection and crime forecasting have emerged as hot topics within the research community. This section briefly reviews the most representative research works in both areas.

Crime forecasting

One of the first frameworks proposed in the literature for crime data analysis is CrimeTracer [17], which is based on a probabilistic approach to model the spatial behavior of known offenders within the areas they frequent, called activity spaces. The framework relies on the assumption, derived from crime pattern theories, that offenders frequently commit serial violent crimes in places they are most familiar with (namely, their activity space). Also, the authors claim that taxi flows can provide useful information to correlate activity spaces, even if they are not geographically connected. Experiments carried out on real-world crime data have shown that criminals frequently commit crimes within their activity spaces, rather than venturing into unknown territories. CrimeTracer is indeed able to predict the location of the next crime committed by known offenders, but it does not provide information about the time window of the next crime events. Also, it requires a dataset with information related to specific offenders, which may not be available in general.

The work in Catlett et al. [7] presents a predictive approach based on spatial analysis and auto-regressive models to detect high-risk regions in urban areas and to forecast crime trends in each region. The approach exploits the DBSCAN algorithm to detect high-risk regions and ARIMA models to fit crime predictors. It has been validated on two crime datasets (i.e., the Chicago and New York City areas) comprising crime events spanning from 2001 to 2016. The study shows good performance on both datasets, considering a three-year-ahead forecasting window, which is a long-term time horizon. The approach is capable of detecting crime-dense regions of any shape; however, its main drawback is that DBSCAN detects overly wide regions or a large number of outliers, as it cannot tackle the multi-density nature of urban datasets.

The study described in Zhu et al. [3] proposes a hierarchical crime prediction framework, which integrates a modified gated GCN (Graph Convolutional Network) and VMD (variational mode decomposition), to holistically predict short-term crime patterns in different communities and support proactive policing. The approach is composed of several steps. First, the temporal dependency is decomposed in the frequency domain, and a network is constructed to capture the spatial relationships within the sub-frequencies. Then, human mobility traces are exploited to characterize the dynamic relationships within the network. The experimental evaluation has been focused on the evolution of crime distribution in Chicago, to holistically predict short-term criminal events in the different communities. The study concludes that social interactions based on human activity data can characterize dynamic crime distribution relationships, as well as spatial crime distribution evolutions. The main strength of the research study proposed in Zhu et al. [3] lies in leveraging the dynamic relationships between human mobility and crimes, which represents a relevant methodological difference with respect to other approaches proposed in the literature; in particular, the analysis of human mobility also allows characterizing the dynamic distribution and evolution of crimes within and across areas, which is strongly affected by social interactions among individuals. However, while the approach exhibits reasonable effectiveness in taking a relationship-based perspective for crime forecasting, the theoretical description needs further verification (as also claimed by the authors): in fact, as human activity data is multi-source, multi-granular, and multi-mode, and involves complex relationships, a more refined classification of human mobility trends is needed to understand their effects on different crime evolutions.

A general framework for crime data mining, exploited for several analysis tasks in collaboration with the Tucson and Phoenix Police departments, is presented in Chen et al. [18]. In particular, the paper describes three examples of its use in practice. First, entity extraction algorithms have been used to automatically identify persons, addresses, vehicles, and personal characteristics from police narrative reports (usually containing many typos, spelling errors, grammatical mistakes, etc.). Second, a text mining algorithm has been explored for deceptive identity detection, to discover the real identity of suspects who have given false names, faked birth dates, or false addresses. Third, a concept-based approach has been exploited to identify subgroups or key members in criminal networks, and to study interaction patterns among them. In our opinion, the main strength of this study is its innovativeness in providing investigators with a framework for automatically applying crime entity-extraction techniques on crime data, aiming to extract serial offenders' behavioral patterns. However, using only crime department data could limit the applicability and effectiveness of the framework; as also observed in Chen et al. [18], additional heterogeneous data (e.g., citizenship, secret services, immigration, web, and social data) could enable the development of more intuitive techniques for crime pattern and network visualization, and higher accuracy in criminal activity predictions.

Authors of Liang et al. [19] propose a framework, named CrimeTensor, to predict the number of crime incidents belonging to different categories within each target region. The framework, based on tensor learning with spatio-temporal consistency techniques, aims to offer fine-scale prediction results by considering the spatio-temporal and categorical correlations of crime events. Crime data is modeled as a tensor, and an objective function leveraging spatial, temporal, and categorical information is defined. The prediction task is performed by applying CANDECOMP/PARAFAC decomposition to find an optimal solution for the defined objective function. The approach is validated by conducting experiments on two real-world crime datasets.

Crime hotspot detection

Concerning hotspot detection, the work in Cesario et al. [25] compares several density-based clustering algorithms on two artificial multi-density datasets (Zahn Compound and Ordered Chess). The results of this experimental evaluation are reported in Tables 2 and 3, where the clustering results are compared through several performance indexes (for each index, the best achieved result is reported in bold). The analysis shows that the HDBSCAN and CHD algorithms are the most effective in detecting clusters in multi-density datasets, and that CHD performs better than HDBSCAN on the second dataset (see Table 3). Other approaches presented in the literature are specifically tailored for clustering spatio-temporal data. The work in Nanni et al. [26] presents the TF-OPTICS algorithm, designed for time-focused clustering. The algorithm processes a set of spatio-temporal objects, each one represented by a trajectory of values as a function of time, and computes distances between trajectories by searching for the best possible time interval. This algorithm, as well as those tailored for clustering trajectories of moving objects, does not suit the proposed use case, because we focus on crime events characterized both in time and space, which cannot be aggregated into a set of well-defined trajectories. A more fitting algorithm for clustering spatio-temporal data is presented in Agrawal et al. [27]. The algorithm, called ST-OPTICS, is density-based and exploits two different \(\epsilon\) parameters, one for clustering points in space and the other for clustering points in time. A comparison between the proposed approach, based on CHD, and an alternative one, based on the ST-OPTICS algorithm, is provided in Sect. "Comparative analysis with ST-OPTICS on hotspots detection and crime forecasting".

Table 2 Performance comparison between different density-based clustering algorithms on the Zahn Compound dataset [25]
Table 3 Performance comparison between different density-based clustering algorithms on the Ordered Chess dataset [25]

Main differences and novelty of MD-CrimePredictor

With respect to the summarized works, this paper presents two main novelties. First, it introduces MD-CrimePredictor, where a multi-density clustering algorithm (i.e., CHD) is exploited for crime hotspot detection; to the best of our knowledge, this is the first research study in the crime data analysis domain showing results on multi-density crime hotspots. The exploited CHD approach is able to automatically detect multi-density (and multi-shape) crime hotspots, which differentiates it w.r.t. all the other approaches reviewed here and brings important benefits to urban data analysis. MD-CrimePredictor relies on both seasonal regressive (SARIMA) and deep-learning (LSTM) models for crime forecasting in each discovered hotspot and, as a second contribution, the paper provides an extensive comparative evaluation of the results produced by the two forecasting algorithms. Also, to assess the effectiveness of the CHD-based approach for hotspot detection, we show a comparative analysis of the proposed approach with other studies proposed in the literature, drawing a comparison in terms of hotspot detection and crime forecasting accuracy.

Problem definition and proposed approach

This section presents the problem formulation and the approach proposed in the paper to forecast crime events in multi-density crime hotspots. Specifically, Sect. "Problem definition and goals" describes the problem under investigation and its goals, whereas Sect. "The multi-crime-predictor approach" details the algorithm proposed in the paper.

Problem definition and goals

We begin by fixing a proper notation to be used throughout the paper. Let \(T=\langle t_1,t_2,\ldots ,t_H\rangle\) be an ordered timestamp list, such that \(t_h<t_{h+1}, \forall _{ 0\le h<H}\), where all \(t_h\) are at equal time intervals (e.g., every hour, day, or week). Let \(\mathcal{C}\mathcal{D}\) be a crime dataset collecting crime events, \(\mathcal{C}\mathcal{D}=\langle CD_1,CD_2,\ldots ,CD_N\rangle\), where each \(CD_i\) is a data instance described by \(\langle latitude,longitude,t\rangle\), i.e., the coordinates of the place and the time (with \(t \in T\)) at which the event occurs. Now, let us consider a future temporal horizon, \(S=\langle t_s, t_{s+1}, \ldots\rangle\), with \(s>H\). The goal of the analysis is to discover a set of crime hotspots in the city (which can have a multi-density distribution of events) and predictive models for reliably forecasting the number of crimes in each hotspot at a given timestamp \(t_s \in S\). More specifically, the proposed approach aims at achieving the following goals (mirrored by the data-model sketch after the list):

  1. Discover a set \(\mathcal{C}\mathcal{H}\) of crime hotspots, \(\mathcal{C}\mathcal{H} = \{CH_1, \ldots , CH_K\}\), where a crime hotspot \(CH_k\) is a spatial area in which criminal events occur with a higher density than in other areas of the city;

  2. Compute a set \(\Sigma\) of crime hotspot densities, \(\Sigma =\{\sigma _1,\sigma _2,\ldots ,\sigma _K\}\), where each \(\sigma _k\) is the spatial density of the events occurring in the hotspot \(CH_k\);

  3. Extract a set \(\mathcal {F}_{crimes}\) of crime predictors, \(\mathcal {F}_{crimes} = \{\mathcal {F}^1_{crimes}, \dots , \mathcal {F}^K_{crimes}\}\), where each function \(\mathcal {F}^k_{crimes}:S\rightarrow \mathcal {R}\), given a timestamp \(t_s \in S\), returns the number of crimes \(N \in \mathcal {R}\) predicted to happen in the crime hotspot \(CH_k \in \mathcal{C}\mathcal{H}\) at timestamp \(t_s\).
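To make the notation concrete, the following minimal Python sketch (ours; all names are hypothetical and purely illustrative) mirrors the three goals as data types:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class CrimeEvent:                 # one record CD_i of the crime dataset CD
    latitude: float
    longitude: float
    t: int                        # index of its timestamp t_h in T

HotspotId = int
Hotspots = Dict[HotspotId, List[CrimeEvent]]   # CH_1, ..., CH_K
Densities = Dict[HotspotId, float]             # sigma_1, ..., sigma_K
CrimePredictor = Callable[[int], float]        # F^k_crimes: timestamp index -> crimes
Predictors = Dict[HotspotId, CrimePredictor]   # F^1_crimes, ..., F^K_crimes
```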

Fig. 1 The multi-crime-predictor algorithm workflow

The multi-crime-predictor approach

The approach proposed in this paper is sketched in Fig. 1, and its meta-code is reported in Algorithm 1. The algorithm is composed of three main steps, as described in the following.

Step 1. Multi-density Crime Hotspots detection. The first step consists in the detection of multi-density crime hotspots from the original dataset, that is, areas where crime events occur with greater density than in adjacent areas. The goal of this step is to detect the spatial areas of interest for crime forecasting, in order to conduct the subsequent analysis over areas rather than single points. This step is performed by the DiscoverCrimeHotspots(\(\mathcal {D}\)) method (line 1 of Algorithm 1), which returns the set \(\mathcal{C}\mathcal{H}=\{CH_1,\ldots ,CH_K\}\) of crime hotspots and their corresponding densities \(\Sigma =\{\sigma _1,\sigma _2,\ldots ,\sigma _K\}\). This task has been modeled as a geo-spatial clustering instance and performed, as described in Sect. "Detection of multi-density crime hotspots", using the City Hotspot Detector (CHD) multi-density clustering algorithm [5]. The number of hotspots is automatically determined by the algorithm, and their shapes are traced without any pre-fixed division into areas. The parameter setting for CHD is chosen by adopting a parameter-sweeping methodology, that is, by running several instances of the CHD algorithm with varying input parameters and choosing the parameter setting that maximizes a set of internal indexes comprising Silhouette [28], DBCV [29], CDBW [30], Calinski-Harabasz [31], and Davies-Bouldin [32], as sketched below.
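A minimal sketch of the parameter sweep (ours; chd() is a hypothetical wrapper around the CHD implementation, and only the Silhouette index is shown among the five adopted):

```python
import numpy as np
from sklearn.metrics import silhouette_score

def sweep_chd_parameters(points, omega_grid, k=64, s=5000):
    """Return the omega value maximizing the Silhouette of the CHD clustering."""
    best_omega, best_score = None, -np.inf
    for omega in omega_grid:
        labels = chd(points, k=k, s=s, omega=omega)   # hypothetical CHD wrapper
        mask = labels != -1                           # exclude noise points
        if len(set(labels[mask])) < 2:                # silhouette needs >= 2 clusters
            continue
        score = silhouette_score(points[mask], labels[mask])
        if score > best_score:
            best_omega, best_score = omega, score
    return best_omega, best_score
```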

Step 2. Crime Time Series Extraction. The second step consists in spatially splitting the original crime data, based on the clustering model discovered in the previous step. In other words, the crime events assigned to the \(i^{th}\) hotspot are transformed into a time series and gathered in the \(i^{th}\) output dataset, for \(i = 1,\ldots,K\). At the end of this step, K different time series datasets are available, each one containing the time series of the crimes occurred in its associated dense region, aggregated on a weekly basis.
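A minimal sketch of this step (ours, assuming the events are held in a pandas DataFrame with a datetime 'timestamp' column and the hotspot labels computed in Step 1):

```python
import pandas as pd

def extract_weekly_series(events: pd.DataFrame) -> dict:
    """One weekly crime-count series per hotspot (noise label -1 excluded)."""
    events = events[events["hotspot"] != -1]
    series = {}
    for hotspot_id, group in events.groupby("hotspot"):
        # count the events of this hotspot falling in each week
        series[hotspot_id] = group.set_index("timestamp").resample("W").size()
    return series
```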

Step 3. Predictive Crime Models extraction. The third step is aimed at extracting a specific crime prediction model \(\mathcal {F}^i_{crimes}\) for each \(i^{th}\) crime hotspot, analyzing the crime data split in the previous step. This task can be accomplished by applying different regression techniques; in our approach it has been implemented by exploiting both SARIMA and LSTM techniques (which proved to be the most effective approaches for this purpose), as described in Sect. "Extraction of crime predictors".

Algorithm 1 MultiCrimePredictor
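Since the meta-code of Algorithm 1 is rendered as an image, the following sketch (ours) reconstructs its structure from the three steps described above; the Python names mirror the methods cited in the text:

```python
def multi_crime_predictor(crime_data):
    # Step 1: multi-density hotspot detection via CHD (line 1 of Algorithm 1)
    hotspots, densities = discover_crime_hotspots(crime_data)
    predictors = {}
    for k, hotspot in hotspots.items():
        # Step 2: weekly crime time series of the k-th hotspot
        ts = extract_weekly_series_for(crime_data, hotspot)
        # Step 3: one regressive model (SARIMA or LSTM) per hotspot
        predictors[k] = discover_local_crime_predictor(ts)  # line 4 of Algorithm 1
    return hotspots, densities, predictors
```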

Detection of multi-density crime hotspots

The detection of crime hotspots has been done by exploiting the CHD algorithm [5], a multi density-based clustering algorithm that has been purposely designed for processing urban spatial data and discover multi-density hotspots. The algorithm is composed of several steps, as reported in Algorithm 2. First, given a fixed k variable, the k-nearest neighbors distance for each point is computed and exploited as an estimator of the density of each data point (line 1). Then, the points are sorted with respect to their estimated density, and the density variation between each consecutive couple of points in the ordered list is computed (line 2). The obtained density variation list can show very frequent fluctuations between subsequent values (in particular, in the analysis of real-wold urban data), thus a moving average filtering over windows of size s is applied to smooth out such fluctuations and highlight main trends (line 3). The data points are then partitioned into several density level sets (each one characterized by homogeneous density distributions), on the basis of the smoothed density variations (line 4). Then, a different \(\epsilon\) value is estimated for each density level set (line 5). Finally, each set is analyzed by the DBSCAN algorithm (lines 7–12). Specifically, each instance takes as input the specific \(\epsilon\) value computed for the analyzed density level set. The set of clusters detected for each partition constitutes the final result of the CHD algorithm. More details about CHD can be found in [5]. Moreover, in Cesario et al. [25] CHD has been proven to be effective in detecting clusters characterized by different densities in urban spatial datasets.

Algorithm 2 The CityHotspotDetector algorithm
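Since Algorithm 2 is also rendered as an image, the following compact sketch (our reconstruction from the description above, not the reference implementation of [5]; the cut rule and the per-level \(\epsilon\) estimate are simplifying assumptions) illustrates the CHD pipeline:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

def chd(points, k, s, cut_threshold):
    # line 1: k-nearest-neighbor distance as a per-point density estimate
    dist, _ = NearestNeighbors(n_neighbors=k + 1).fit(points).kneighbors(points)
    density = dist[:, -1]                          # distance to the k-th neighbor
    # line 2: sort by estimated density, compute consecutive density variations
    order = np.argsort(density)
    variation = np.diff(density[order])
    # line 3: moving-average smoothing over windows of size s
    smoothed = np.convolve(variation, np.ones(s) / s, mode="same")
    # line 4: partition into density level sets where the smoothed variation is large
    cuts = np.where(smoothed > cut_threshold)[0] + 1
    level_sets = np.split(order, cuts)
    # lines 5-12: estimate a per-level eps and run DBSCAN on each level set
    labels = -np.ones(len(points), dtype=int)
    next_label = 0
    for level in level_sets:
        if len(level) <= k:
            continue
        eps = float(np.median(density[level]))     # per-level eps estimate (assumed)
        sub = DBSCAN(eps=eps, min_samples=k).fit_predict(points[level])
        for c in set(sub) - {-1}:
            labels[level[sub == c]] = next_label
            next_label += 1
    return labels
```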

Extraction of crime predictors

Given a specific crime hotspot, the DiscoverLocalCrimePredictor() method (line 4 in Algorithm 1) extracts a regressive model to forecast the number of crimes that will happen in its specific area. In this paper, this has been performed by exploiting SARIMA (Seasonal AutoRegressive Integrated Moving Average) and LSTM (Long Short-Term Memory) models. Such models and their principles are briefly summarized in the following.

SARIMA models

Multiple regression models have been defined with the goal of forecasting a variable of interest using a linear combination of predictors [33]. In particular, in an auto-regression model the variable of interest is forecasted using a linear combination of its past values (the term auto-regression indicates that it is a regression of the variable against itself), while a moving average model uses past forecast errors in a regression-like model. Sometimes, as a preliminary step to the regressive analysis, time series need a differencing transformation to stabilize their mean, thus eliminating (or reducing) trend and seasonality. A combination of differencing, auto-regression, and moving average methods is known as the AutoRegressive Integrated Moving Average model (more frequently referred to by its acronym, ARIMA) [33], formally defined in the following.

Let us consider the time series \(\{y_t: t=1\ldots n\}\), where \(y_t\) is the value of the time series at timestamp t. Then, an ARIMA(p, d, q) model is written in the form

$$y^{(d)}_t = c + \phi _1 y^{(d)}_{t-1} + \ldots + \phi _p y^{(d)}_{t-p} + \theta _1 e_{t-1} + \ldots + \theta _q e_{t-q} + e_t$$

where:

  • \(y^{(d)}_t\) is the \(d^{th}\)-differenced series of \(y_t\), that is: \(y^{(d)}_t=y^{(d-1)}_t-y^{(d-1)}_{t-1},~...~,y^{(d)}_{t-p}=y^{(d-1)}_{t-p}-y^{(d-1)}_{t-p-1}\);

  • \(\phi _1,\ldots ,\phi _p\) are the regression coefficients of the auto-regressive part;

  • \(\theta _1,\ldots ,\theta _q\) are the regression coefficients of the moving average part;

  • \(e_{t-1},\ldots ,e_{t-q}\) are lagged errors;

  • \(e_t\) is white noise and takes into account the forecast error;

  • c is a constant term.

The regression model described above is referred to as ARIMA(p, d, q), where the order of the model is stated by three parameters: p (order of the auto-regressive part), d (degree of differencing involved), and q (order of the moving average part). A useful notation commonly adopted for this kind of model is the 'backshift' notation [34,35,36], which is based on the B operator. Applying B (\(B^d\)) to \(y_t\) has the effect of shifting the data back one period (d periods), i.e., \(B y_t = y_{t-1}\). This is very useful when combining differences, as the operator can be treated using ordinary algebraic rules. By using the backshift operator, the full model can be written as:

$$(1-\phi _1B - \ldots - \phi _pB^p)(1-B)^dy_t = (1 - \theta _1B - \ldots - \theta _qB^q)e_t$$

whose derivation is out of the scope of this work; a formal treatment can be found in [33,34,35].

In order to deal with seasonality, classical ARIMA processes have been generalized and extended to SARIMA (i.e., Seasonal ARIMA) models. A SARIMA model is formed by including additional seasonal terms (modeling a seasonal component that repeats with a given periodicity) in the classic ARIMA models previously introduced. The seasonal part of the model consists of terms that are very similar to the non-seasonal components; in the final formula, the additional seasonal terms are simply multiplied with the non-seasonal terms. A seasonal ARIMA model is referred to as \(SARIMA(p,d,q)(P,D,Q)_m\), where m is the seasonal periodicity.

The SARIMA model can be written as [15]:

$$\phi _p(B)\Phi _P(B^m)\nabla ^d\nabla _m^D y_t = \theta _q(B)\Theta _Q(B^m)e_t$$

where p and q represent the non-seasonal ARIMA orders, P and Q represent the seasonal ARIMA orders, d is the number of time differences, and D is the number of seasonal differences. B is the backshift operator, defined such that \(B^s y_t=y_{t-s}\). \(\phi _p(B) = (1-\phi _1B - \ldots - \phi _pB^p)\) is the AR operator and \(\theta _q(B) = (1 - \theta _1B - \ldots - \theta _qB^q)\) is the MA operator. \(\Phi _P(B^m) = (1-\Phi _1B^m - \ldots - \Phi _PB^{Pm})\) is the seasonal AR operator and \(\Theta _Q(B^m) = (1 - \Theta _1B^m - \ldots - \Theta _QB^{Qm})\) is the seasonal MA operator. \(y_t\), which has both seasonal and non-seasonal components, is differenced d times (at lag one) and D times (at lag m): \(\nabla ^d = (1-B)^d\) is the non-seasonal differencing operator and \(\nabla _m^D = (1 - B^m)^D\) is the seasonal differencing operator. \(e_t\) represents random shocks that are not autocorrelated (white noise).
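For instance, with weekly data and yearly seasonality (\(m=52\)), a \(SARIMA(1,1,1)(0,1,1)_{52}\) model instantiates the formula above as:

$$(1-\phi _1B)(1-B)(1-B^{52})y_t = (1 - \theta _1B)(1 - \Theta _1B^{52})e_t$$

combining one non-seasonal and one seasonal difference with first-order non-seasonal AR and MA terms and a first-order seasonal MA term.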

Once the differencing orders (i.e., the d and D values) have been chosen, the estimation of the best model order and of the regression coefficients is performed by applying the Hyndman-Khandakar algorithm. Briefly, the algorithm performs a step-wise search traversing the model space to discover the optimal combination of p, q, P, and Q values, based on the minimization of the AIC (Akaike's Information Criterion) [33]. Then, the regression parameters of both the non-seasonal part (i.e., \(\phi _1,\ldots ,\phi _p\) and \(\theta _1,\ldots ,\theta _q\)) and the seasonal part (\(\Phi _1,\ldots ,\Phi _P\) and \(\Theta _1,\ldots ,\Theta _Q\)) are estimated through MLE (Maximum Likelihood Estimation) [33], i.e., by maximizing the likelihood of the observed data.
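A minimal sketch of this estimation step, assuming Python and the pmdarima package (whose auto_arima function implements the Hyndman-Khandakar stepwise search), applied to one of the weekly series produced in Step 2:

```python
import pmdarima as pm

def fit_sarima(weekly_counts, m=52):
    """Step-wise, AIC-driven search over (p, q, P, Q); coefficients fitted by MLE."""
    return pm.auto_arima(weekly_counts, seasonal=True, m=m,
                         stepwise=True, information_criterion="aic",
                         suppress_warnings=True)

# usage: four-week-ahead forecasts for one hotspot
# model = fit_sarima(weekly_counts)
# forecasts = model.predict(n_periods=4)
```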

LSTM

The LSTM model is a recurrent neural network designed to overcome the exploding/vanishing gradient problems that typically arise when learning long-term dependencies, even when the minimal time lags are very long [16]. The LSTM architecture consists of a set of recurrently connected sub-networks, known as memory blocks. The idea behind the memory block is to maintain its state over time and regulate the information flow through non-linear gating units [37]. The output of the block is recurrently connected back to the block input and to all of the gates. As shown in Fig. 2, the LSTM has an internal state variable, which is passed from one cell to the next and modified by the following operation gates [37] (transcribed in the numpy sketch after the list):

  • Forget gate: it is a sigmoid layer that takes the output at time \(t-1\) and the current input at time t, concatenates them, and applies a linear transformation followed by a sigmoid:

    $$f^{(t)} = \sigma (W_f [ h^{(t-1)},x^{(t)} ] + b_f)$$
  • Input gate: it takes the previous output and the new input and passes them through another sigmoid layer, so this gate returns a value between 0 and 1:

    $$i^{(t)} = \sigma (W_i [ h^{(t-1)},x^{(t)} ] + b_i)$$

    This value is multiplied with the output of the candidate layer:

    $$\tilde{C}^{(t)} = \tanh (W_c [ h^{(t-1)},x^{(t)} ] + b_c)$$

    The candidate layer applies a hyperbolic tangent, returning a candidate vector to be added to the internal state, which is updated as follows:

    $$C^{(t)} = f^{(t)}C^{(t-1)} + i^{(t)}\tilde{C}^{(t)}$$

    The previous state is multiplied by the forget gate and then added to the fraction of the new candidate allowed by the input gate.

  • Output gate: it controls how much of the internal state is passed to the output, and works in a similar way to the other gates:

    $$o^{(t)} = \sigma (W_o [ h^{(t-1)},x^{(t)} ] + b_o)$$
    $$h^{(t)} = o^{(t)} \tanh (C^{(t)})$$
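The gate equations above can be transcribed directly into code; the following numpy sketch (ours, purely illustrative: the weight matrix and bias are left to the caller) performs one LSTM cell step:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One cell step; W maps the concatenation [h_prev, x_t] to 4*n units."""
    z = np.concatenate([h_prev, x_t])
    n = h_prev.shape[0]
    f = sigmoid(W[:n] @ z + b[:n])                    # forget gate
    i = sigmoid(W[n:2*n] @ z + b[n:2*n])              # input gate
    C_tilde = np.tanh(W[2*n:3*n] @ z + b[2*n:3*n])    # candidate layer
    C = f * C_prev + i * C_tilde                      # internal state update
    o = sigmoid(W[3*n:] @ z + b[3*n:])                # output gate
    h = o * np.tanh(C)
    return h, C
```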
Fig. 2 LSTM architecture

Once the number of layers, the number of nodes/units, and the activation function per layer have been chosen, the estimation of the best model weights is performed by applying the backpropagation algorithm, one of the most popular neural network training algorithms, which computes the necessary corrections to the weights (initially set at random). Briefly, the algorithm can be decomposed into the following steps [38]:

  • Feed-forward computation: given an input for the network, the output is computed by evaluating the network layer by layer, from the input to the output layers.

  • Back propagation: the error (loss) of the output layer is computed by comparing the network output with the reference. Once the layer error has been identified, it is exploited to compute the error of the previous layer, thus propagating it backward. This is repeated for all the layers, back to the input one.

  • Weight updates: once the errors in all the network layers have been computed, the weights are changed in order to reduce the error, by exploiting the gradient descent algorithm.

The algorithm is stopped when the changes in the value of the chosen loss function become lower than a given threshold value.

Experimental evaluation and results

To assess the performance and usefulness of the algorithm described above, we conducted an extensive experimental analysis by running several experiments on a real-world case study represented by a large area of Chicago. Our analysis aims to identify the most significant multi-density crime hotspots and build efficient prediction models that can forecast the number of future crimes likely to occur in each hotspot. We also present a comparative analysis between SARIMA and LSTM forecasting models. The rest of this section is organized as follows. Section "Data description" describes the area selected for the analysis and the gathered data, Sect. "Crime hotspots: results and discussion" reports the results in terms of multi-density crime hotspots, and Sect. "Crime forecasting models: results and discussion" describes the evaluation of the regressive models, i.e., SARIMA and LSTM, comparing the accuracy achieved in predicting crimes in the detected hotspots. Sect. "Comparative analysis with ST-OPTICS on hotspots detection and crime forecasting" furnishes a comparative evaluation of CHD and ST-OPTICS, contrasting the crime prediction accuracy achieved on hotspots based on CHD and on those based on ST-OPTICS. Finally, Sect. "Comparison with other crime forecasting approaches on the Chicago Crimes dataset" reports a comparison of the performance of MD-CrimePredictor with other crime forecasting approaches proposed in the literature [21,22,23].

Crime hotspots: results and discussion

The quality of the detected hotspots has been evaluated through internal indexes, which measure the goodness of a clustering structure without reference to external labels. To do so, the following set of internal indexes is here adopted: Silhouette [28], DBCV [29], CDBW [30], Calinski-Harabasz [31], and Davies-Bouldin [32], which are used in the literature to evaluate clustering quality in terms of compactness, separation, number of clusters, and density when no external information is available [39].

The first set of experimental results is reported in Fig. 5, which shows the performance achieved by the CHD algorithm with \(\omega\) varying from \(-0.3\) to \(-0.25\). In particular, Figure 5a shows how the aforementioned internal indexes, evaluating the clustering quality, vary with respect to the \(\omega\) values. We can observe that the quality of the detected hotspots is very sensitive to \(\omega\), whose best value can in this case be clearly estimated as \(\omega ^* = -0.27\). On the other hand, Figure 5b shows how the number of noise points (blue curve) and the number of detected hotspots (red curve) vary with respect to the \(\omega\) values. Noise points are data instances that do not meet the criteria for falling into any of the detected clusters (and are considered outliers by the algorithm), while the number of detected hotspots depends on the algorithm's ability to find a balanced trade-off between separability and compactness. We can observe that for \(\omega ^*=-0.27\) the number of detected noise points is 18,929, while the number of detected clusters is 200.

Fig. 5 CHD clustering quality, number of hotspots and number of noise points vs \(\omega\), with \(k=64\) and \(s=5000\)

As reported above, we have run several experimental tests to find the parameter setting capable of detecting the highest-quality city hotspots. For such a reason, in the following we present the results achieved by fixing \(\omega = -0.27\), \(k=64\), and \(s=5000\), which the previous analysis assessed as best suiting our application scenario and the considered dataset.

Now, let us analyze in more detail the crime hotspots detected in the considered scenario. As reported in Sect. "The multi-crime-predictor approach", the clustering algorithm exploited in this work first partitions the original data into several density level sets (each one characterized by a homogeneous density distribution, on the basis of density variations), then analyzes each density level set through a specific density-based clustering instance to detect proper clusters in each partition. The final hotspots discovered by the algorithm (200 in total) are shown in Fig. 6, where each region is represented by a different color. Interestingly, this image shows how crime events are clustered on the basis of a density criterion; for example, the algorithm detects several significant crime regions clearly recognizable through different colors: a large crime region (in red) in the central part of the area, along with seven smaller areas (in green, blue, and light-blue) on the left and right sides, corresponding to the zones with the highest concentration of crimes. The five most relevant crime hotspots (\(CH\#197\), \(CH\#198\), \(CH\#8\), \(CH\#21\), and \(CH\#15\)) are zoomed in on the left and right sides of Fig. 6. Many other detected hotspots represent areas with minor crime densities w.r.t. the highlighted ones, or local high-density crime zones surrounded by low-density ones. Table 4 shows several statistics about the whole area and the five most numerous crime hotspots. Overall, these regions cover about 22% of the whole area extension and include about 55% of the crime events recorded in the whole area between 2001 and 2019.

Fig. 6 Detected crime hotspots in the selected area of Chicago, of which the top-5 largest are zoomed in on the left and right

Table 4 Descriptive statistics—whole area and crime hotspots

Finally, in order to make a comparative analysis among classic density-based algorithms and multi-density approaches for hotspot detection, we report a comparative table (Table 5) showing the results of four algorithms (two classic approaches, DBSCAN and OPTICS-\(\xi\), and two multi-density approaches, CHD and HDBSCAN). Table 5 shows, for each algorithm, the selected input parameters and some statistics related to the achieved results (i.e., number of detected hotspots, percentage of noise points, and Silhouette evaluation measure) on the Chicago crime dataset exploited in this paper and described in Sect. "Data description". By observing the results in Table 5, we can see that HDBSCAN and CHD achieve higher clustering quality than DBSCAN and OPTICS-\(\xi\); in fact, HDBSCAN and CHD (multi-density algorithms) attain Silhouette values equal to \(-0.19\) and \(-0.23\), respectively, which are better than the results of DBSCAN and OPTICS-\(\xi\), whose clustering qualities settle at \(-0.28\) and \(-0.46\). Such results show that multi-density clustering (i.e., HDBSCAN and CHD) is able to distinguish and identify proper hotspots in urban environments better than classic density-based techniques. Moreover, focusing on the results of the two multi-density algorithms, we can observe that CHD achieves a slightly lower Silhouette than HDBSCAN, but it labels a much lower percentage of noise points (5.7%) with respect to HDBSCAN (34.6%). For such a reason, CHD proved to be the best algorithm to exploit in our crime data analysis case study. A more detailed comparison among these algorithms is reported in [25].

Table 5 Comparative results achieved by DBSCAN, OPTICS-\(\xi\), CHD and HDBSCAN to detect crime hotspots, on the Chicago crime dataset [25]

Crime forecasting models: results and discussion

As described in Sect. "The multi-crime-predictor approach", the next steps of the algorithm consist of (i) transforming the original crime dataset into several time series, and (ii) training local crime predictors for each crime hotspot. In particular, as described in Sect. "Extraction of crime predictors", the extraction of crime regressors has been performed by applying SARIMA and LSTM models to each hotspot. Specifically, we present here the details of the regressive models obtained by both algorithms for the whole area and the three largest crime hotspots, i.e., CH#197, CH#198, and CH#8. Then, we show the predictive performance of the models on the test set for all hotspots.

The regressive models extracted by SARIMA are reported in Table 6. For each area, the table shows the order of the models, the final autoregressive formulas (in back-shift notation), and the final coefficient values. It is worth noting that the predictive crime models differ among the hotspots, showing that each area presents specific crime trends and patterns, thus making the discovery of different predictive models reasonable.

The models extracted by LSTM are reported in Table 7. For each area, the neural networks are trained with 4 layers, the ReLU [40] activation function, 50 epochs, and a customized batch size and number of units/nodes per layer. In each of the presented models, the mean absolute error (mae) loss function is used. One of the most important factors in neural network training is the learning rate, i.e., the step size at which weights are changed during training, a hyperparameter typically set to a small positive value between 0.0 and 1.0 [41]. A learning rate of 0.01 produced better results in the models reported here than other values. Even in the case of the LSTM models, each hotspot exhibits specific crime trends and patterns.
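For reference, the following Keras sketch (ours) is consistent with the configuration described above; the number of units and the batch size are placeholders, since the paper customizes them per hotspot:

```python
import tensorflow as tf

def build_lstm(n_lags, units=64):
    # 4 layers: two stacked LSTMs, a ReLU dense layer, and a linear output
    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(units, return_sequences=True, input_shape=(n_lags, 1)),
        tf.keras.layers.LSTM(units),
        tf.keras.layers.Dense(units, activation="relu"),
        tf.keras.layers.Dense(1),                  # next-week crime count
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
                  loss="mae")
    return model

# usage: model = build_lstm(n_lags=12); model.fit(X, y, epochs=50, batch_size=32)
```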

Table 6 Details of the SARIMA models trained for the whole area and the top 3 largest crime hotspots in Chicago
Table 7 Details of the LSTM models trained for the whole area and the top 3 largest crime-dense regions in Chicago

In order to assess the effectiveness and accuracy of the regressive functions, we performed an evaluation analysis on the test set consisting of the last three years of data (i.e., years 2017–2019). In particular, for each crime hotspot and for the whole area, the associated SARIMA and LSTM models have been exploited to predict the number of crimes likely to happen in that hotspot, week by week. Figures 7 and 8 show the observed, SARIMA-forecasted, and LSTM-forecasted data (plotted in blue, orange, and green, respectively) for the whole area (Fig. 7) and for the largest crime hotspot, CH#197 (Fig. 8). We consider four prediction horizons on the test set, from one- to four-week-ahead. We note that forecasts generally adhere very well to the observed data over the whole test period. However, the forecasting accuracy clearly decreases (particularly for LSTM) as the prediction horizon increases.

Fig. 7 Observed vs forecasted crimes, on the whole area. Number of crimes observed, SARIMA-forecasted and LSTM-forecasted (blue, orange and green lines) on the Chicago test set, for the whole area and several prediction horizons

Fig. 8 Observed vs forecasted crimes, on the largest hotspot. Number of crimes observed, SARIMA-forecasted and LSTM-forecasted (blue, orange and green lines) on the Chicago test set, for hotspot 197 and several prediction horizons

Now, let us give a quantitative evaluation of the performance of the regressive models and their effectiveness in making predictions on the corresponding test sets. To this end, we computed six error measures (MAE, MAPE, MSE, RMSE, Max Error, Mean Error), which are commonly used in the regression analysis literature to quantify forecast performance [12].
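For clarity, the six measures can be computed as follows (our transcription of the standard definitions; y holds the observed weekly counts and f the forecasts, as numpy arrays):

```python
import numpy as np

def forecast_errors(y, f):
    e = y - f
    return {
        "MAE":       float(np.mean(np.abs(e))),
        "MAPE":      float(np.mean(np.abs(e / y))) * 100.0,  # assumes y != 0
        "MSE":       float(np.mean(e ** 2)),
        "RMSE":      float(np.sqrt(np.mean(e ** 2))),
        "MaxError":  float(np.max(np.abs(e))),
        "MeanError": float(np.mean(e)),                      # signed bias
    }
```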

Table 8 reports the values of the error measures described above achieved by SARIMA and LSTM models for the whole area and the three largest detected crime hotspots. Looking at the values reported in the table, we can make the following observations.

Table 8 MAE, MAPE, MSE, RMSE, Max Error and Mean Error vs several weekly prediction horizons, for the whole area and the top three largest crime hotspots in Chicago City
Fig. 9 MAE for each hotspot. Mean Absolute Error (MAE) for the whole area and the top 5 largest crime hotspots, achieved by SARIMA and LSTM

The smaller the hotspot, the lower the MAE. Looking at the values in the table, we can observe that MAE values decrease as the hotspot areas get smaller. In fact, considering one-week-ahead forecasting, the MAE achieved by the SARIMA models monotonically decreases from 77.44 (whole area) to 24.42, 21.09, and 12.59 (three largest crime hotspots, ordered by decreasing size), and similarly for all other forecasting horizons. LSTM forecasts show decreasing MAE values as well. The trend is clearly recognizable in Fig. 9, which plots the MAE achieved by both SARIMA and LSTM for the whole area and the top five largest crime hotspots. The chart clearly shows that the smaller the hotspot, the lower the error. This is a reasonable outcome: predictions are more precise over smaller hotspot areas, thus providing city administrators and police officers with more detailed information for deciding how to distribute resources and efforts among the various parts of the city.

Higher forecasting accuracy when the forecasting horizon is shorter. For example, the MAE of the LSTM forecasts for the whole area monotonically increases from 91.06 (one-week-ahead forecasts) to 97.86, 113.70, and 140.41 (two-, three-, and four-week-ahead forecasts), and similarly for all other indices and areas. This is a reasonable result, considering that forecasts are based on previous historical trends: the farther the forecasting timestamp is from the most recent historical data, the less accurate the forecast. The increasing trend can also be seen in Fig. 10, which shows the MAE versus several weekly forecasting horizons. The increasing trend is more evident for the whole area and the largest cluster, and it is particularly marked for the LSTM-based forecasts.

Fig. 10 MAE vs n. of weeks. Mean Absolute Error (MAE) versus the number of weeks in the test set, achieved by SARIMA and LSTM, for the whole area and the top 3 largest crime hotspots

Fig. 11 MAPE vs n. of weeks. Mean Absolute Percentage Error (MAPE) versus the number of weeks in the test set, achieved by SARIMA and LSTM, for the whole area and the top 3 largest crime hotspots

Fig. 12 Distribution of the residuals (with the overlaid normal curve) on the test set, for the top 2 largest crime hotspots, for one-week-ahead forecasting

Fig. 13 QQ-plot for the top 2 largest crime hotspots

SARIMA models outperform LSTM models (for large hotspots). Percentage errors (MAPE column) show that the adopted SARIMA models (Table 6) forecast the number of crimes with an average error ranging from 5.09% (whole area, one-week ahead) to 13.37% (crime hotspot #8, four-week ahead), which appears to be a very interesting result. On the other hand, LSTM models achieve MAPE values ranging from 5.93% to 12.81%, respectively. For a more complete view of these results, Fig. 11 shows the MAPE versus several weekly forecasting horizons. From the plot, we can observe that the percentage errors of both SARIMA and LSTM models increase as the prediction horizon grows, and that SARIMA models generally outperform LSTM regressors (except for the smallest hotspot). Also, by observing the values in Table 8 and Fig. 11, we can observe that the smaller the hotspot area, the higher the percentage error. However, the MAPE index, as defined above, does not take into account the coverage level of each hotspot: the growth in forecasting errors is compensated by a more precise identification of the specific area where crime events will occur, thus giving more exhaustive information to city administrators and police officers for planning how to distribute resources and efforts in the different regions of the city.

Finally, to understand whether the forecast errors can be approximated by a normal distribution with mean zero and variance \(\sigma ^2\), we show in Fig. 12 the distribution of the residuals of the SARIMA models (with the normal curve having the same mean and standard deviation overlaid) for the two largest detected crime hotspots. In particular, the figure presents the histograms of the forecast errors over one-week-ahead forecasts, which show that the distributions of forecast errors are only slightly shifted towards positive or negative values with respect to a normal curve centered on 0 (the ideal case). This is also confirmed by the Normal QQ-plot (quantile-quantile plot) shown in Fig. 13, which can be exploited as a graphical tool to assess whether residuals plausibly follow a normal distribution. Both plots graphically confirm that the residuals approximately follow a normal distribution, as expected.
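The diagnostics shown in Figs. 12 and 13 can be reproduced with a few lines; a sketch of ours, based on scipy and matplotlib:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def plot_residual_diagnostics(residuals):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    # histogram of the forecast errors with the matching normal curve overlaid
    ax1.hist(residuals, bins=30, density=True, alpha=0.6)
    x = np.linspace(residuals.min(), residuals.max(), 200)
    ax1.plot(x, stats.norm.pdf(x, residuals.mean(), residuals.std()))
    ax1.set_title("Residuals vs normal curve")
    # normal QQ-plot of the residuals
    stats.probplot(residuals, dist="norm", plot=ax2)
    plt.show()
```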

Comparative analysis with ST-OPTICS on hotspots detection and crime forecasting

To make our evaluation more accurate and complete, we performed a comparative analysis of the proposed approach, based on CHD for hotspot detection, with a similar approach based on ST-OPTICS [27], a density-based clustering algorithm specifically designed to analyze spatio-temporal data. ST-OPTICS was selected among others since it was purposely designed for clustering datasets characterized by time-based features, and thus is not directly comparable with the other (purely spatial) clustering algorithms previously mentioned (see Table 5). In a nutshell, ST-OPTICS is a modified version of the OPTICS algorithm, obtained by extending the notion of density-reachability. It exploits two radii, \(\epsilon _1\) and \(\epsilon _2\), where \(\epsilon _1\) defines the reachability with respect to the spatial attributes and \(\epsilon _2\) defines the reachability w.r.t. the non-spatial (temporal) attributes; on the basis of such definitions, a point \(p_i\) is considered in the neighborhood of \(p_j\) if the distance between \(p_i\) and \(p_j\) is less than \(\epsilon _1\) w.r.t. the spatial attributes and less than \(\epsilon _2\) w.r.t. the non-spatial attributes. The ST-OPTICS implementation we exploited is publicly available and takes as input parameters \(\langle \epsilon _2, min\_pts,\xi \rangle\), where \(\epsilon _2\) is a threshold on the maximum radius w.r.t. the non-spatial attributes, \(min\_pts\) is the minimum number of neighbors required to define a core point, and \(\xi\) determines the minimum steepness on the reachability plot that constitutes a cluster boundary. The reachability plot takes into account both the spatial and non-spatial radii. It is also worth noting that \(min\_pts\) and \(\xi\) are exploited as in the well-known OPTICS-\(\xi\) algorithm.
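As an illustration, the two-radius neighborhood test can be transcribed as follows (our sketch of the definition above, not the cited implementation):

```python
import numpy as np

def st_neighbors(points, times, i, eps1, eps2):
    """Indices j such that p_j is in the spatio-temporal neighborhood of p_i."""
    spatial_dist = np.linalg.norm(points - points[i], axis=1)   # spatial attributes
    temporal_dist = np.abs(times - times[i])                    # non-spatial attribute
    mask = (spatial_dist < eps1) & (temporal_dist < eps2)
    mask[i] = False                                             # exclude p_i itself
    return np.where(mask)[0]
```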

To perform the comparative analysis between the results achieved by ST-OPTICS and CHD, we first evaluated the characteristics of the five most relevant hotspots detected by the two algorithms, and then the forecasting performance achieved for crime prediction in each hotspot. The dataset exploited for the comparative analysis is the one described in Sect. "Data description", and predictions have been compared across different forecasting horizons.

As a first result, ST-OPTICS has been applied to discover spatial hotspots from the geo-referenced crime data. In order to detect high-quality crime-dense regions, input parameter tuning has been performed to achieve the best results of the algorithm. In particular, the clustering quality has been evaluated by computing the internal indexes (Silhouette, DBCV, CDBW, Calinski-Harabasz, Davies-Bouldin) adopted in Sect. "Crime hotspots: results and discussion", by varying \(\xi\) from 0.05 to 0.1 and \(\epsilon _2\) from 4 to 24 (with a step size of 4). The results are reported in Figure 14a, which shows the performance achieved by varying \(\xi\), with \(\epsilon _2=24\) and \(k=64\) fixed (the combination corresponding to the best performance in the considered scenario). In particular, the figure shows that the best quality of detected hotspots is achieved for \(\xi ^* = 0.07\). Comparing these results with those reported in Sect. "Crime hotspots: results and discussion", we notice that CHD performs better than ST-OPTICS on the Silhouette, Calinski-Harabasz, and Davies-Bouldin indexes, while ST-OPTICS is better on the DBCV index. On the other hand, Figure 14b shows how the number of noise points (blue curve) and the number of detected hotspots (red curve) vary with respect to the \(\xi\) values. We can observe that for \(\xi ^*=0.07\) the number of detected noise points is 23,947, while the number of detected clusters is 49. With respect to CHD, ST-OPTICS detects a higher number of noise points (23,947 versus 18,929) and a lower number of hotspots (49 versus 200). The results shown below only refer to the run with the best combination of parameters (i.e., \(\xi =0.07\), \(\epsilon _2=24\), \(k=64\)).

Fig. 14 Hotspots detection: ST-OPTICS clustering quality, number of hotspots and number of noise points vs \(\xi\), with \(k=64\) and \(\epsilon _2 = 24\)

Table 9 Crime forecasting: MAE, MAPE, MSE and RMSE for the top five most numerous crime hotspots in Chicago, detected by ST-OPTICS and CHD

The comparative forecasting performance analysis on the hotspots detected by ST-OPTICS and CHD has been carried out by focusing on the five most numerous clusters returned by the two algorithms. In particular, as SARIMA models have shown higher predictive accuracy in Sect. "Experimental evaluation and results", we exploit these regressive models here to compare the achieved results. Table 9 reports the values of four error measures (MAPE, MAE, MSE, RMSE) achieved by SARIMA models on the five largest hotspots detected by ST-OPTICS and CHD (sorted by decreasing size), versus one-, two-, three-, and four-week-ahead forecasting horizons. Looking at the values reported in the table, we can observe that the two largest clusters detected by ST-OPTICS (clusters #0 and #4) and CHD (clusters #197 and #198) are very different in terms of number of points, while the other ones have comparable sizes. Also, by comparing MAPE, MAE, MSE, and RMSE, we can observe that forecasts generally achieve lower errors on the hotspots detected by CHD than on those detected by ST-OPTICS. This result, partly due to the smaller size of the clusters, indicates higher forecasting accuracy on the hotspots detected by CHD. For a more complete view of the MAPE results, Fig. 15 shows the MAPE versus several weekly forecasting horizons. From the plot, we can observe that percentage errors are lower on CHD-detected hotspots than on ST-OPTICS-detected ones (except for the largest cluster).

Fig. 15 MAPE achieved by the SARIMA models, for the top 5 most numerous clusters detected by ST-OPTICS and CHD

Comparison with other crime forecasting approaches on the Chicago Crimes dataset

With the aim of making the comparative analysis for crime forecasting more accurate and complete, we report here some comparative results between MD-CrimePredictor and other approaches selected from the crime forecasting literature (i.e., [21,22,23]). Specifically, to ensure a fair and consistent comparison, we selected four algorithms that have been specifically applied to the Chicago crime data, i.e., the same dataset we exploited to evaluate MD-CrimePredictor. The approaches have been compared in terms of MAPE, a scale-independent metric (making it suitable for comparisons between different datasets or models) largely used in crime forecasting performance evaluation [1]. Table 10 summarizes the results of the comparison, showing for each approach (i) the exploited models, (ii) the period of the Chicago crimes dataset exploited as training set, (iii) the period exploited as test set, (iv) the total number of forecasted days, and (v) the related MAPE index for one-day-ahead forecasts, as reported in the corresponding references [21,22,23] (reviewed in Sect. "Related work"). By observing the table, it is worth noting that MD-CrimePredictor has been tested on the longest time horizon (365 days), while the other approaches have been tested on time horizons no longer than six months (184 days for the approaches proposed in [23]). Moreover, MD-CrimePredictor outperforms the other methodologies w.r.t. the MAPE index (0.12), resulting slightly more effective than the second best result reported in the table (0.14). The comparison confirms the effectiveness of the presented approach, even when considering short (one-day-ahead) time windows.

Table 10 Comparative results on crime forecasting with other approaches proposed in literature on the Chicago crimes dataset, for one day-ahead forecasts

Conclusion

This paper presented the design and implementation of MD-CrimePredictor (Multi-Density Crime Predictor), an approach based on multi-density clustering and regressive models to automatically detect high-risk crime areas in urban environments and to reliably forecast crime trends in each area. First, the algorithm detects multi-density crime hotspots by applying a multi-density clustering algorithm, where the densities, shapes, and number of the detected regions are automatically computed without any pre-fixed division into areas. Then, a specific regressive model is discovered from each detected hotspot, analyzing the partitions discovered during the previous step. The final result is a spatio-temporal crime forecasting model composed of a set of crime hotspots, their densities, and a set of associated crime predictors. Forecasting models are extracted by exploiting both SARIMA and LSTM models, and a comparative experimental analysis is presented in terms of error measures. The experimental evaluation of the proposed approach, performed on a large area of Chicago (involving more than two million crime events), has shown the higher accuracy of the former with respect to the latter. We have also provided a comparative evaluation of CHD against ST-OPTICS, comparing the crime prediction accuracy achieved on the hotspots identified by the two algorithms, as well as a comparative analysis with other crime forecasting methods proposed in the literature and specifically tested on Chicago crime data. Overall, the results show the effectiveness of the approach proposed in the paper, achieving good accuracy in spatial and temporal crime forecasting over rolling time horizons.

In future work, other research issues may be investigated. First, we will further explore the application of other multi-density approaches for the detection of crime hotspots, with the aim of performing a comparative evaluation between different clustering algorithms (multi-density vs classic density-based approaches) in crime spatial analysis. Second, we will study how other urban events can affect crime trends, and how such data can be correlated to criminal activities.