Abstract
We present a fast, accurate estimation method for multivariate Hawkes self-exciting point processes, which are widely used in seismology, criminology, finance and other areas. The method has two main ingredients. The first is an analytic derivation of exact maximum likelihood estimates of the nonparametric triggering density. We develop this for the multivariate case and add regularization to improve stability and robustness. The second is a moment-based method for estimating the background rate and triggering matrix, which we extend here to the spatiotemporal case. Our method combines the two efficiently, and we prove the consistency of the combined approach. Extensive numerical experiments, with synthetic data and real-world social network data, show that our method improves on the accuracy, scalability and computational efficiency of prevailing estimation approaches. Moreover, it greatly boosts the performance of Hawkes process-based models on social network reconstruction and helps to understand the spatiotemporal triggering dynamics over social media.
Notes
We obtain latitude and longitude coordinates from https://www.flickr.com/places/info.
References
Achab, M., Bacry, E., Gaïffas, S., Mastromatteo, I., Muzy, J.-F. (2017). Uncovering causality from multivariate Hawkes integrated cumulants. The Journal of Machine Learning Research, 18(1), 6998–7025.
Bacry, E., Bompaire, M., Gaïffas, S., Poulsen, S. (2017). Tick: A python library for statistical learning, with a particular emphasis on time-dependent modelling. arXiv preprint arXiv:1707.03003.
Bacry, E., Mastromatteo, I., Muzy, J.-F. (2015). Hawkes processes in finance. Market Microstructure and Liquidity, 1(01), 1550005.
Bacry, E., Muzy, J.-F. (2016). First-and second-order statistics characterization of Hawkes processes and non-parametric estimation. IEEE Transactions on Information Theory, 62(4), 2184–2202.
Balderama, E., Schoenberg, F. P., Murray, E., Rundel, P. W. (2012). Application of branching models in the study of invasive species. Journal of the American Statistical Association, 107(498), 467–476.
Bao, J., Zheng, Y., Mokbel, M. F. (2012). Location-based and preference-aware recommendation using sparse geo-social networking data. In Proceedings of the 20th international conference on advances in geographic information systems (pp. 199–208).
Brantingham, P. J., Yuan, B., Herz, D. (2020a). Is gang violent crime more contagious than non-gang violent crime? Journal of Quantitative Criminology, https://doi.org/10.1007/s10940-020-09479-1.
Brantingham, P. J., Yuan, B., Sundback, N., Schoenberg, F. P., Bertozzi, A. L., Gordon, J., et al. (2020b). Does violence interruption work? UCLA preprint, www.stat.ucla.edu/~frederic/papers/brantingham2.pdf.
Brillinger, D. R., Guttorp, P. M., Schoenberg, F. P., El-Shaarawi, A. H., Piegorsch, W. W. (2002). Point processes, temporal. Encyclopedia of Environmetrics, 3, 1577–1581.
Bronstein, M. M., Bruna, J., LeCun, Y., Szlam, A., Vandergheynst, P. (2017). Geometric deep learning: Going beyond Euclidean data. IEEE Signal Processing Magazine, 34(4), 18–42.
Chen, S., Shojaie, A., Shea-Brown, E., Witten, D. (2017). The multivariate Hawkes process in high dimensions: Beyond mutual excitation. arXiv preprint arXiv:1707.04928.
Chiang, W.-H., Yuan, B., Li, H., Wang, B., Bertozzi, A., Carter, J., Ray, B., Mohler, G. (2019). Sos-EW: System for overdose spike early warning using drug mover’s distance-based Hawkes processes. In Joint European conference on machine learning and knowledge discovery in databases (pp. 538–554). Berlin: Springer.
Cho, E., Myers, S. A., Leskovec, J. (2011). Friendship and mobility: User movement in location-based social networks. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 1082–1090). ACM.
Daley, D. J., Vere-Jones, D. (2003). An introduction to the theory of point processes: Volume I: Probability and its Applications. New York: Springer.
Daley, D. J., Vere-Jones, D. (2007). An introduction to the theory of point processes: Volume II: General theory and structure. New York: Springer.
Du, N., Farajtabar, M., Ahmed, A., Smola, A. J., Song, L. (2015). Dirichlet–Hawkes processes with applications to clustering continuous-time document streams. In Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 219–228). ACM.
Duchi, J., Hazan, E., Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12, 2121–2159.
Eichler, M., Dahlhaus, R., Dueck, J. (2017). Graphical modeling for multivariate Hawkes processes with nonparametric link functions. Journal of Time Series Analysis, 38(2), 225–242.
Farajtabar, M., Wang, Y., Rodriguez, M. G., Li, S., Zha, H., Song, L. (2015). Coevolve: A joint point process model for information diffusion and network co-evolution. Advances in Neural Information Processing Systems, 1954–1962.
Fox, E. W., Short, M. B., Schoenberg, F. P., Coronges, K. D., Bertozzi, A. L. (2016). Modeling e-mail networks and inferring leadership using self-exciting point processes. Journal of the American Statistical Association, 111(514), 564–584.
Granger, C. W. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica: Journal of the Econometric Society, 37, 424–438.
Hall, E. C., Willett, R. M. (2016). Tracking dynamic point processes on networks. IEEE Transactions on Information Theory, 62(7), 4327–4346.
Hawkes, A. G. (1971). Spectra of some self-exciting and mutually exciting point processes. Biometrika, 58(1), 83–90.
Kaipio, J., Somersalo, E. (2006). Statistical and computational inverse problems, Vol. 160. New York: Springer.
Kingma, D. P., Ba, J. (2015). Adam: A method for stochastic optimization. In International conference on learning representations.
Lai, E. L., Moyer, D., Yuan, B., Fox, E., Hunter, B., Bertozzi, A. L., Brantingham, P. J. (2016). Topic time series analysis of microblogs. IMA Journal of Applied Mathematics, 81(3), 409–431.
Lee, D. D., Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755), p. 788.
Lewis, E., Mohler, G. (2011). A nonparametric EM algorithm for multiscale Hawkes processes. Journal of Nonparametric Statistics, 1(1), 1–20.
Linderman, S., Adams, R. (2014). Discovering latent network structure in point process data. In International conference on machine learning (pp. 1413–1421). Beijing, China: JMLR: W&C.
Malinverno, A. (2002). Parsimonious Bayesian Markov chain Monte Carlo inversion in a nonlinear geophysical problem. Geophysical Journal International, 151(3), 675–688.
Mark, B., Raskutti, G., Willett, R. (2018). Network estimation from point process data. IEEE Transactions on Information Theory, 65, 2953–2975.
Marsan, D., Lengline, O. (2008). Extending earthquakes’ reach through cascading. Science, 319(5866), 1076–1079.
Mohler, G. O. (2014). Marked point process hotspot maps for homicide and gun crime prediction in Chicago. International Journal of Forecasting, 30(3), 491–497.
Mohler, G. O., Short, M. B., Brantingham, P. J., Schoenberg, F. P., Tita, G. E. (2011). Self-exciting point process modeling of crime. Journal of the American Statistical Association, 106(493), 100–108.
Neumaier, A. (1998). Solving ill-conditioned and singular linear systems: A tutorial on regularization. SIAM Review, 40(3), 636–666.
Ogata, Y. (1978). The asymptotic behaviour of maximum likelihood estimators for stationary point processes. Annals of the Institute of Statistical Mathematics, 30(1), 243–261.
Ogata, Y. (1998). Space-time point-process models for earthquake occurrences. Annals of the Institute of Statistical Mathematics, 50(2), 379–402.
Porter, M. D., White, G., et al. (2012). Self-exciting hurdle models for terrorist activity. The Annals of Applied Statistics, 6(1), 106–124.
Reinhart, A. (2018). A review of self-exciting spatio-temporal point processes and their applications. Statistical Science, 33(3), 299–318.
Schoenberg, F. P. (2006). On non-simple marked point processes. Annals of the Institute of Statistical Mathematics, 58(2), 223–233.
Schoenberg, F. P. (2013). Facilitated estimation of ETAS. Bulletin of the Seismological Society of America, 103(1), 601–605.
Schoenberg, F. P., Brillinger, D. R., Guttorp, P. (2013). Point processes, spatial-temporal. Encyclopedia of Environmetrics, 4, 1573–1578.
Schoenberg, F. P., et al. (2018a). Comment on “A review of self-exciting spatio-temporal point processes and their applications” by Alex Reinhart. Statistical Science, 33(3), 325–326.
Schoenberg, F. P., Gordon, J. S., Harrigan, R. J. (2018b). Analytic computation of nonparametric Marsan–Lengliné estimates for Hawkes point processes. Journal of Nonparametric Statistics, 30(3), 742–775.
Veen, A., Schoenberg, F. P. (2008). Estimation of space-time branching process models in seismology using an EM-type algorithm. Journal of the American Statistical Association, 103(482), 614–624.
Wang, B., Luo, X., Zhang, F., Yuan, B., Bertozzi, A. L., Brantingham, P. J. (2018). Graph-based deep modeling and real time forecasting of sparse spatio-temporal data. arXiv preprint arXiv:1804.00684.
Yuan, B., Li, H., Bertozzi, A. L., Brantingham, P. J., Porter, M. A. (2019). Multivariate spatiotemporal Hawkes processes and network reconstruction. SIAM Journal on Mathematics of Data Science, 1(2), 356–382.
Yuan, B., Wang, X., Ma, J., Zhou, C., Bertozzi, A. L., Yang, H. (2020). Variational autoencoders for highly multivariate spatial point processes intensities. In International conference on learning representations.
Zhu, S., Xie, Y. (2019). Spatial–temporal–textual point processes with applications in crime linkage detection. arXiv preprint arXiv:1902.00440.
Zhuang, J., Ogata, Y., Vere-Jones, D. (2002). Stochastic declustering of space-time earthquake occurrences. Journal of the American Statistical Association, 97(458), 369–380.
Acknowledgements
This work was supported by the City of Los Angeles Gang Reduction Youth Development Project, by NSF grant DMS-2027277 and by NSF grant DMS-1737770. Baichuan Yuan gratefully acknowledges the fellowship support of the National Institute of Justice (NIJ) under Award Number 2018-R2-CX-0013.
Appendices
Appendix 1: Simulation data
1.1 \(U=1\) data
We simulate a univariate ST-Hawkes process with \(K=1/6\), \(\mu =0.01\), \(T=2.1\times 10^5\), \(X,Y \in (0,10)\), \(f(r)=\frac{1}{2\pi \sigma ^2}\exp (-r^2/2\sigma ^2)\) (\(\sigma ^2=0.2\)) and \(h(t)=\omega \exp (-\omega t)\) (\(\omega =10\)). The regularization parameter is \(\alpha =0.5\).
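The setup above can be illustrated with a minimal simulation sketch based on the branching (cluster) representation of a Hawkes process. It is not the authors' simulation code. It assumes that \(\mu\) is the total background rate per unit time over the spatial window, that background events are uniform in space, and that offspring falling outside the window are discarded together with their descendants. The function name `simulate_st_hawkes` and the shortened horizon `T=2.0e4` are illustrative choices.

```python
import numpy as np

def simulate_st_hawkes(mu, K, omega, sigma2, T, X, Y, seed=0):
    """Simulate a univariate spatiotemporal Hawkes process by its branching structure.

    Returns an array of shape (n, 3) with columns (t, x, y), sorted by time.
    """
    rng = np.random.default_rng(seed)
    # Generation 0: homogeneous Poisson background events.
    # Assumption: mu is the total background rate per unit time over the window.
    n_bg = rng.poisson(mu * T)
    generation = np.column_stack([
        rng.uniform(0, T, n_bg),
        rng.uniform(0, X, n_bg),
        rng.uniform(0, Y, n_bg),
    ])
    all_events = [generation]
    while len(generation) > 0:
        # Each event triggers a Poisson(K) number of direct offspring.
        n_children = rng.poisson(K, size=len(generation))
        parents = np.repeat(generation, n_children, axis=0)
        m = len(parents)
        # Time lag from h(t) = omega * exp(-omega * t).
        dt = rng.exponential(1.0 / omega, size=m)
        # Spatial offset from the isotropic Gaussian f with variance sigma2 per axis.
        dxy = rng.normal(0.0, np.sqrt(sigma2), size=(m, 2))
        children = parents + np.column_stack([dt, dxy])
        # Assumption: offspring outside the observation window are dropped.
        inside = (
            (children[:, 0] < T)
            & (children[:, 1] >= 0) & (children[:, 1] <= X)
            & (children[:, 2] >= 0) & (children[:, 2] <= Y)
        )
        generation = children[inside]
        all_events.append(generation)
    events = np.vstack(all_events)
    return events[np.argsort(events[:, 0])]

# Parameters from the U=1 data set, with a shorter horizon to keep the run small.
events = simulate_st_hawkes(mu=0.01, K=1/6, omega=10.0, sigma2=0.2,
                            T=2.0e4, X=10.0, Y=10.0)
print(len(events))
```

Under these assumptions the expected number of events is roughly \(\mu T/(1-K)\), slightly reduced by the offspring lost at the boundary.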
1.2 \(U=100\) data
Using the same triggering densities, this data set has \(U=100\), background rate \(\varvec{\mu }=(0.01,\ldots ,0.01)\), \(T=10^5\), \(X,Y \in (0,10)\), \(\sigma ^2=0.2\) and \(\omega =10\), and contains 172,943 events. For the triggering matrix in Fig. 2, each yellow pixel is 1/20, each cyan pixel is 1/40 and each dark pixel is 0.
1.3 \(U=10\) data
With the same densities, the parameters are \(U=10\), \(\varvec{\mu }=(0.01,\ldots ,0.01)\), \(T=10^6\), \(X,Y \in (0,10)\), \(\sigma ^2=0.2\) and \(\omega =10\), with \(\varvec{K}\) as shown in Fig. 3. Here, each yellow pixel is 1/6 and each dark pixel is 0. The regularization parameter is \(\alpha =0.55\).
1.4 \(U=10\) data with a Pareto triggering density in time
We keep the same parameters as for the \(U=10\) data above and change only the triggering densities. The temporal density is \(h(t)=(p-1)c^{p-1}/(t+c)^p\) with \(c=2\) and \(p=2.5\), and the spatial triggering density has the same form as before with \(\sigma ^2=0.1\). The regularization parameter is \(\alpha =0.38\).
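As an illustration of how time lags could be drawn from this Pareto density, the sketch below uses inverse-CDF sampling. Integrating \(h\) gives the CDF \(H(t)=1-(c/(t+c))^{p-1}\), so \(t=c\,((1-u)^{-1/(p-1)}-1)\) for uniform \(u\). The function names are hypothetical and the source does not state how the lags were actually generated.

```python
import numpy as np

def pareto_cdf(t, c=2.0, p=2.5):
    """CDF of h(t) = (p-1) c^(p-1) / (t+c)^p on t >= 0."""
    return 1.0 - (c / (t + c)) ** (p - 1.0)

def sample_pareto_delay(n, c=2.0, p=2.5, seed=0):
    """Draw n time lags from h by inverting the CDF."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.0, 1.0, size=n)  # u in [0, 1), so 1 - u is in (0, 1]
    return c * ((1.0 - u) ** (-1.0 / (p - 1.0)) - 1.0)

delays = sample_pareto_delay(200_000)
# Median of h for c=2, p=2.5: solve H(t) = 1/2.
theoretical_median = 2.0 * (2.0 ** (1.0 / 1.5) - 1.0)
print(np.median(delays), theoretical_median)
```

With \(p=2.5\) the density has a finite mean but infinite variance, so the median is a more stable sanity check than the sample mean.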
1.5 \(U=10\) data with a uniform triggering density in time
As in the previous section, we change the temporal density, here to the uniform density \(h(t)=0.1\), and use the spatial triggering density with \(\sigma ^2=0.1\). The regularization parameter is \(\alpha =0.4\). We threshold the estimated \(\varvec{{\tilde{K}}}\) with \(\epsilon = 0.01\) to remove noise.
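The thresholding step can be sketched as follows. The exact rule is not spelled out in the text, so this assumes that entries of the estimated matrix below \(\epsilon\) are set to zero. The matrix `K_noisy` is made up for illustration and is not an estimate from the paper.

```python
import numpy as np

def threshold_triggering_matrix(K_est, eps):
    """Zero out entries of an estimated triggering matrix that fall below eps."""
    K_est = np.asarray(K_est, dtype=float)
    return np.where(K_est < eps, 0.0, K_est)

# Hypothetical noisy estimate: small off-diagonal entries are treated as noise.
K_noisy = np.array([[0.160, 0.004],
                    [0.008, 0.170]])
K_clean = threshold_triggering_matrix(K_noisy, eps=0.01)
print(K_clean)
```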
1.6 \(U=10\) data with a power-law triggering density in space
Similarly, we use the power-law density \(f(r)=\frac{1}{(r^2+1)^2}\) in space and the exponential triggering density in time with \(\omega =10\). The regularization parameter is \(\alpha =0.28\). We threshold the estimated \(\varvec{{\tilde{K}}}\) with \(\epsilon = 0.02\) to remove noise.
1.7 \(U=10\) data with a uniform triggering density in space
Given the same parameters as above, we change the spatial density to \(f(r)=0.25\) and keep the exponential triggering density in time with \(\omega =10\). The regularization parameter is \(\alpha =0.36\). We threshold the estimated \(\varvec{{\tilde{K}}}\) with \(\epsilon = 0.01\) to remove noise (Fig. 9).
Appendix 2: Gowalla and Brightkite data sets
In this section, we describe the preprocessing procedure for the Gowalla and Brightkite data sets. We focus on local friendship subnetworks within different US cities, including San Diego (SD), Chicago (CHI), Los Angeles (LA) and San Francisco (SF). They have diverse network sizes and ST patterns within the same time period.
1.1 Brightkite-SD
We study check-ins in SD in the Brightkite data set. We use a bounding box (with a north latitude of 33.1142, a south latitude of 32.5348, an east longitude of \(-\,116.9058\), and a west longitude of \(-\,117.2824\); see the Notes for the source of the coordinates) to locate check-ins in SD. We consider “active” users, defined as those with more than 300 check-ins during the period. This gives us a small subnetwork with 25 active users and a total of 13,760 check-ins in SD.
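The preprocessing described above could look like the following sketch. The record layout `(user_id, latitude, longitude)` and the function names are hypothetical, and the sketch assumes that the activity threshold is applied to check-ins inside the bounding box, which the text does not state explicitly.

```python
from collections import Counter

# Bounding box for SD, using the coordinates quoted in the text.
SD_BOX = {"north": 33.1142, "south": 32.5348, "east": -116.9058, "west": -117.2824}

def in_box(lat, lon, box):
    """True if (lat, lon) lies inside the latitude/longitude bounding box."""
    return box["south"] <= lat <= box["north"] and box["west"] <= lon <= box["east"]

def active_user_checkins(checkins, box, min_checkins):
    """Keep check-ins inside the box that belong to users with more than min_checkins of them."""
    local = [c for c in checkins if in_box(c[1], c[2], box)]
    counts = Counter(c[0] for c in local)
    active = {user for user, n in counts.items() if n > min_checkins}
    return [c for c in local if c[0] in active]

# Toy records (user_id, latitude, longitude); not taken from Brightkite.
toy = [
    ("a", 32.72, -117.16), ("a", 32.75, -117.10), ("a", 32.80, -117.20),
    ("b", 32.70, -117.15),
    ("a", 34.05, -118.24),  # outside the SD box
]
kept = active_user_checkins(toy, SD_BOX, min_checkins=2)
print(kept)
```

For the SD subnetwork the text uses a threshold of 300, so the corresponding call would pass `min_checkins=300`.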
1.2 Gowalla-CHI
We apply the same procedure as for Brightkite-SD to the Gowalla check-in data for CHI. The bounding box for CHI has a north latitude of 42.0229, a south latitude of 41.6446, an east longitude of \(-\,87.5245\) and a west longitude of \(-\,87.9395\). After selecting only active users (with more than 100 check-ins), we have a medium-sized subnetwork with 96 users and 27,326 check-ins.
1.3 Brightkite-LA
We apply the same procedure as for Brightkite-SD to the Brightkite check-in data for LA. The bounding box for LA has a north latitude of 34.34, a south latitude of 33.70, an east longitude of \(-\,118.16\) and a west longitude of \(-\,118.67\). After selecting only active users (with more than 150 check-ins), we have a medium-sized subnetwork with 168 users and 89,127 check-ins.
1.4 Gowalla-SF
We apply the same procedure as for Brightkite-SD to the Gowalla check-in data for SF. The bounding box for SF has a north latitude of 37.93, a south latitude of 37.64, an east longitude of \(-\,122.28\) and a west longitude of \(-\,123.17\). After selecting only active users (with more than 65 check-ins), we have a large subnetwork with 515 users and 102,673 check-ins.
Appendix 3: Assumptions for Theorem 1
There are two separate sets of general assumptions for the consistency of GMM and MLE in Hawkes processes. We only list assumptions that are relevant to our proof.
The first set of assumptions is from Ogata (1978) about the point process and intensity functions.
Assumption 1 (Consistency of MLE estimation)
- The multivariate Hawkes process \((\varvec{N}_{t,x,y})\) is stationary, ergodic and absolutely continuous with respect to the standard Poisson process.
- The conditional intensity function \(\lambda _{\Theta }\) with parameters \(\Theta \) is predictable for all compact metric spaces and continuous in \(\Theta \).
- When \(t=0\), \(\lambda _{\Theta }\) is positive almost surely and \(\lambda _{\Theta _1}= \lambda _{\Theta _2}\) almost surely if and only if \(\Theta _1=\Theta _2\); for any \(\Theta \) from a compact metric space, there exists a neighborhood \(U(\Theta )\) of \(\Theta \) such that for all \(\Theta ' \in U(\Theta )\), \(|\lambda _{\Theta '}|\) and \(|\log \lambda _{\Theta '}|\) are bounded by random variables with finite second moments.
- For any \(\Theta \) from a compact metric space, there is a neighborhood \(U(\Theta )\) of \(\Theta \) such that \(\sup _{\Theta ' \in U(\Theta )}|\lambda (\Theta ')-{\mathbb {E}}(\lambda (\Theta '))| \rightarrow 0\) in probability as \(t \rightarrow \infty \) and (for some \(\alpha >0\)) \(\sup _{\Theta ' \in U(\Theta )}|\log {\mathbb {E}}(\lambda (\Theta '))|\) has a finite \((2+\alpha ){\text{th}}\) moment that is uniformly bounded with respect to \(t\).
On top of Assumption 1, we also need GMM-related assumptions from Achab et al. (2017).
Assumption 2 (Consistency of GMM estimation)
- For (25), the GMM approximation error \(L(\varvec{R})=0\) if and only if \(\varvec{R} = (\varvec{I-K^{\rm T}})^{-1}\).
- For (22–24), the supports of the triggering density X, Y, H satisfy \({\tilde{X}}^2/X\), \({\tilde{Y}}^2/Y\), \({\tilde{H}}^2/T \rightarrow 0\) separately as \(X,Y,H \rightarrow \infty \).
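The identity in the first item of Assumption 2 links \(\varvec{R}\) and the triggering matrix \(\varvec{K}\) by a matrix inverse, so each can be recovered from the other whenever \(\varvec{I}-\varvec{K}^{\rm T}\) is invertible. The following sketch checks this numerically. The function names and the \(2\times 2\) matrix are illustrative assumptions, not part of the paper's estimation procedure.

```python
import numpy as np

def r_from_k(K):
    """R = (I - K^T)^{-1}, as in the first item of Assumption 2."""
    K = np.asarray(K, dtype=float)
    return np.linalg.inv(np.eye(K.shape[0]) - K.T)

def k_from_r(R):
    """Invert the relation: K = (I - R^{-1})^T."""
    R = np.asarray(R, dtype=float)
    return (np.eye(R.shape[0]) - np.linalg.inv(R)).T

# Hypothetical triggering matrix with entries 1/6, as in the synthetic U=10 data.
K = np.array([[1/6, 0.0],
              [1/6, 1/6]])
# Spectral radius below 1 guarantees that I - K^T is invertible.
rho = max(abs(np.linalg.eigvals(K)))
R = r_from_k(K)
K_back = k_from_r(R)
print(rho, np.allclose(K_back, K))
```

When the spectral radius of \(\varvec{K}\) is below 1, \(\varvec{R}\) also equals the convergent series \(\sum _{n\ge 0}(\varvec{K}^{\rm T})^n\), which is a standard property of the matrix inverse used here.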
About this article
Cite this article
Yuan, B., Schoenberg, F.P. & Bertozzi, A.L. Fast estimation of multivariate spatiotemporal Hawkes processes and network reconstruction. Ann Inst Stat Math 73, 1127–1152 (2021). https://doi.org/10.1007/s10463-020-00780-1
DOI: https://doi.org/10.1007/s10463-020-00780-1