Review of Statistical Approaches for Modeling High-Frequency Trading Data

Sankhya B

Abstract

Due to technological advancements over the last two decades, algorithmic trading strategies are now widely used in financial markets. In turn, these strategies have generated high-frequency (HF) data sets, which provide information at an extremely fine scale and are useful for understanding market behaviors, dynamics, and microstructures. In this paper, we discuss how information flow impacts the behavior of HF traders and how certain high-frequency trading (HFT) strategies significantly impact market dynamics (e.g., asset prices). The paper also reviews several statistical modeling approaches for analyzing HFT data. We discuss four popular approaches for handling HFT data: (i) aggregating data into regularly spaced bins and then applying regular time series models, (ii) modeling jumps in price processes, (iii) point process approaches for modeling the occurrence of events of interest, and (iv) modeling sequences of inter-event durations. We discuss two methods for defining events, one based on the asset price and the other based on both the price and the volume of the asset. We construct durations based on these two definitions and apply models to tick-by-tick data for assets traded on the New York Stock Exchange (NYSE). We discuss some open challenges arising in HFT data analysis, including some empirical analysis, and also review applications of HFT data in finance and economics, outlining several research directions.

Notes

  1. Indeed, this is one of the criticisms of HFT. The May 6, 2010 “Flash Crash,” in which the Dow Jones Industrial Average dropped by almost 1,000 points in 30 min, was the result of an execution algorithm that considered only volume, not time. As a result, $4.1 billion of E-Mini S&P 500 futures contracts were sold on the Chicago Mercantile Exchange in a mere 20-min interval (Goldstein et al. 2014).

  2. A number of approaches can be used to classify trades as buyer- or seller-initiated, including the Lee-Ready algorithm, the tick rule, and bulk volume classification (see Easley et al. 2016 and references therein).
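To illustrate the simplest of the classification schemes named in note 2, the tick rule can be sketched as follows. This is a minimal sketch and not the implementation used in Easley et al. (2016): the function name `tick_rule` and the +1 / -1 / `None` encoding are illustrative assumptions. A trade on an uptick is labeled buyer-initiated, a trade on a downtick seller-initiated, and a trade at an unchanged price inherits the label of the most recent price change.

```python
from typing import List, Optional


def tick_rule(prices: List[float]) -> List[Optional[int]]:
    """Label trades as buyer-initiated (+1) or seller-initiated (-1).

    Hypothetical helper for illustration only. Trades that occur before
    the first observed price change cannot be classified and stay None.
    """
    labels: List[Optional[int]] = []
    last_label: Optional[int] = None   # label of the most recent price change
    prev_price: Optional[float] = None
    for price in prices:
        if prev_price is not None:
            if price > prev_price:     # uptick: buyer-initiated
                last_label = 1
            elif price < prev_price:   # downtick: seller-initiated
                last_label = -1
            # unchanged price: keep the previous label (zero-tick case)
        labels.append(last_label)
        prev_price = price
    return labels


if __name__ == "__main__":
    trades = [10.00, 10.01, 10.01, 10.00, 10.00, 10.02]
    print(tick_rule(trades))
```

The Lee-Ready algorithm additionally compares the trade price with the prevailing quote midpoint, which requires quote data and is not shown here.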

References

  • Aït-Sahalia, Y. and Jacod, J. (2009). Testing for jumps in a discretely observed process. The Annals of Statistics 184–222.

  • Aït-Sahalia, Y., Fan, J. and Xiu, D. (2010). High-frequency covariance estimates with noisy and asynchronous financial data. Journal of the American Statistical Association 105, 492, 1504–1517.

  • Aït-Sahalia, Y., Jacod, J. and Li, J. (2012). Testing for jumps in noisy high frequency data. Journal of Econometrics 168, 2, 207–222.

  • Alizadeh, S., Brandt, M.W. and Diebold, F.X. (2002). Range-based estimation of stochastic volatility models. Journal of Finance 57, 3, 1047–1091.

  • Andersen, T.G. and Bollerslev, T. (1997). Intraday periodicity and volatility persistence in financial markets. Journal of Empirical Finance, 4, 2-3, 115–158.

  • Andersen, T.G. and Bollerslev, T. (1998). Answering the skeptics: yes, standard volatility models do provide accurate forecasts. International Economic Review 39, 4, 885–905.

  • Andersen, T.G., Benzoni, L. and Lund, J. (2002). An empirical investigation of continuous-time equity return models. The Journal of Finance, 57, 3, 1239–1284.

  • Andersen, T.G., Bollerslev, T. and Diebold, F.X. (2007). Roughing it up: including jump components in the measurement, modeling, and forecasting of return volatility. The Review of Economics and Statistics 89, 4, 701–720.

  • Ardia, D., Bluteau, K., Boudt, K., Catania, L. and Trottier, D.-A. (2019). Markov-switching GARCH models in R: the MSGARCH package. Journal of Statistical Software 91(4).

  • Asai, M., Chang, C. -L. and McAleer, M. (2017). Realized stochastic volatility with general asymmetry and long memory. Journal of Econometrics 199, 2, 202–212.

  • Baillie, R.T. (1996). Long memory processes and fractional integration in econometrics. Journal of Econometrics, 73, 1, 5–59.

  • Baillie, R.T., Bollerslev, T. and Mikkelsen, H.O. (1996). Fractionally integrated generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 74, 1, 3–30.

  • Barndorff-Nielsen, O.E. and Shephard, N. (2002). Econometric analysis of realized volatility and its use in estimating stochastic volatility models. Journal of the Royal Statistical Society Series B (Statistical Methodology) 64, 2, 253–280.

  • Barndorff-Nielsen, O.E. and Shephard, N. (2004). Power and bipower variation with stochastic volatility and jumps. Journal of Financial Econometrics 2, 1, 1–37.

  • Barndorff-Nielsen, O.E. and Shephard, N. (2005). Variation, jumps, market frictions and high frequency data in financial econometrics.

  • Barndorff-Nielsen, O.E. and Shephard, N. (2006). Econometrics of testing for jumps in financial economics using bipower variation. Journal of Financial Econometrics 4, 1, 1–30.

  • Barndorff-Nielsen, O.E., Hansen, P.R., Lunde, A. and Shephard, N. (2011). Multivariate realised kernels: consistent positive semi-definite estimators of the covariation of equity prices with noise and non-synchronous trading. Journal of Econometrics, 162, 2, 149–169.

  • Bauwens, L. and Giot, P. (2000). The logarithmic ACD model: an application to the bid-ask quote process of three NYSE stocks. Annales d’Economie et de Statistique, (60):117–149.

  • Bauwens, L. and Hautsch, N. (2006). Stochastic conditional intensity processes. Journal of Financial Econometrics 4, 3, 450–493.

  • Bauwens, L. and Veredas, D. (2004). The stochastic conditional duration model: a latent variable model for the analysis of financial durations. Journal of Econometrics 119, 2, 381–412.

  • Belfrage, M. (2016). ACDM: tools for autoregressive conditional duration models. (R package version 1.0.4).

  • Beran, J. (1994). Statistics for long-memory processes. CRC Press.

  • Bibinger, M. (2011). Efficient covariance estimation for asynchronous noisy high-frequency data. Scandinavian Journal of Statistics 38, 1, 23–45.

  • Billio, M., Getmansky, M., Lo, A.W. and Pelizzon, L. (2012). Econometric measures of connectedness and systemic risk in the finance and insurance sectors. Journal of Financial Economics 104, 3, 535–559.

  • Bjursell, J. and Gentle, J.E. (2012). Identifying jumps in asset prices. In: Handbook of computational finance, pp. 371–399. Springer.

  • Black, F. (1976). Studies of stock market volatility changes. In: Proceedings of the American statistical association business and economic statistics section, pp. 177–181.

  • Black, F. (1986). Noise. The Journal of Finance 41, 3, 528–543.

  • Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31, 3, 307–327.

  • Boudt, K., Cornelissen, J., Payseur, S., Kleen, O. and Sjoerup, E. (2021a). Highfrequency: tools for highfrequency data analysis. https://CRAN.R-project.org/package=highfrequency. R package version 0.9.0.

  • Boudt, K., Kleen, O. and Sjørup, E. (2021b). Analyzing intraday financial data in R: the highfrequency package. Available at SSRN 3917548.

  • Brouste, A., Fukasawa, M., Hino, H., Iacus, S., Kamatani, K., Koike, Y., Masuda, H., Nomura, R., Ogihara, T., Shimizu, Y. et al (2014). The yuima project: a computational framework for simulation and inference of stochastic differential equations. Journal of Statistical Software 57, 1–51.

  • Buccheri, G., Bormetti, G., Corsi, F. and Lillo, F. (2021a). A score-driven conditional correlation model for noisy and asynchronous data: an application to high-frequency covariance dynamics. Journal of Business & Economic Statistics, 39, 4, 920–936.

  • Buccheri, G., Corsi, F. and Peluso, S. (2021b). High-frequency lead-lag effects and cross-asset linkages: a multi-asset lagged adjustment model. Journal of Business & Economic Statistics, 39, 605–621.

  • Cameron, A.C. and Trivedi, P.K. (2013). Regression analysis of count data. Cambridge University Press, Cambridge.

  • Cao, W., Hurvich, C. and Soulier, P. (2017). Drift in transaction-level asset price models. Journal of Time Series Analysis, 38, 5, 769–790.

  • Carr, P. and Wu, L. (2003). The finite moment log stable process and option pricing. The Journal of Finance 58, 2, 753–777.

  • Carr, P., Madan, D. and Chang, E. (1998). The variance gamma process and option pricing. European Finance Review 2, 1, 79–105.

  • Carr, P., Geman, H., Madan, D.B. and Yor, M. (2002). The fine structure of asset returns: an empirical investigation. The Journal of Business 75, 2, 305–332.

  • Cartea, A. and Jaimungal, S. (2013). Modelling asset prices for algorithmic and high-frequency trading. Applied Mathematical Finance, 20, 6, 512–547.

  • Chakrabarti, A. and Sen, R. (2019). Copula estimation for nonsynchronous financial data. arXiv:1904.10182.

  • Chan, K. (1992). A further analysis of the lead–lag relationship between the cash market and stock index futures market. The Review of Financial Studies 5, 1, 123–152.

  • Chen, C.W.S., Gerlach, R., Hwang, B.B.K. and McAleer, M. (2012). Forecasting Value-at-Risk using nonlinear regression quantiles and the intra-day range. International Journal of Forecasting 28, 3, 557–574.

  • Chen, F., Diebold, F.X. and Schorfheide, F. (2013). A Markov-switching multifractal inter-trade duration model, with application to US equities. Journal of Econometrics 177, 2, 320–342.

  • Chib, S., Nardari, F. and Shephard, N. (2002). Markov chain Monte Carlo methods for stochastic volatility models. Journal of Econometrics 108, 2, 281–316.

  • Chib, S., Omori, Y. and Asai, M. (2009). Multivariate stochastic volatility. In: Handbook of financial time series, pp. 365–400. Springer.

  • Christensen, K., Kinnebrock, S. and Podolskij, M. (2010). Pre-averaging estimators of the ex-post covariance matrix in noisy diffusion models with non-synchronous data. Journal of Econometrics 159, 1, 116–133.

  • Christensen, K., Oomen, R.C. and Podolskij, M. (2014). Fact or friction: jumps at ultra high frequency. Journal of Financial Economics 114, 3, 576–599.

  • Cont, R. (2011). Statistical modeling of high-frequency financial data. IEEE Signal Processing Magazine, 28, 5, 16–25.

  • Cont, R. and Tankov, P. (2004). Financial modeling with jump processes. Chapman & Hall/CRC, Boca Raton.

  • Coroneo, L. and Veredas, D. (2012). A simple two-component model for the distribution of intraday returns. The European Journal of Finance 18, 9, 775–797.

  • Corsi, F. and Audrino, F. (2012). Realized covariance tick-by-tick in presence of rounded time stamps and general microstructure effects. J. Financ. Econom. 10, 591–616.

  • Cox, D.R. (1972). Regression models and life-tables. J. R. Stat. Soc. Ser. B (Methodol.) 34, 187–202.

  • Cox, D.R. and Oakes, D. (2018). Analysis of survival data. Chapman and Hall/CRC.

  • Daley, D.J. and Vere-Jones, D. (2003). An introduction to the theory of point processes: Volume I: elementary theory and methods. Springer.

  • De Jong, F. and Nijman, T. (1997). High frequency analysis of lead-lag relationships between financial markets. J. Empir. Finance 4, 259–277.

  • Deo, R., Hsieh, M. and Hurvich, C.M. (2010). Long memory in intertrade durations, counts and realized volatility of NYSE stocks. J. Stat. Plan. Inference 140, 3715–3733.

  • Diamond, D.W. and Verrecchia, R.E. (1987). Constraints on short-selling and asset price adjustment to private information. J. Financ. Econ. 18, 277–311.

  • Diebold, F.X. and Yılmaz, K. (2014). On the network topology of variance decompositions: measuring the connectedness of financial firms. J. Econ. 182, 119–134.

  • Dionne, G., Duchesne, P. and Pacurar, M. (2009). Intraday value at risk (IVaR) using tick-by-tick data with application to the Toronto Stock Exchange. J. Empir. Finance 16, 777–792.

  • Dobrev, D. and Schaumburg, E. (2017). High-frequency cross-market trading: model free measurement and applications. Perspectives.

  • Duffie, D., Pan, J. and Singleton, K. (2000). Transform analysis and asset pricing for affine jump-diffusions. Econometrica 68, 1343–1376.

  • Dufour, A. and Engle, R.F. (2000). Time and the price impact of a trade. J. Financ. 55, 2467–2498.

  • Easley, D. and O’Hara, M. (1992). Time and the process of security price adjustment. J. Financ. 47, 577–605.

  • Easley, D., Kiefer, N.M., O’Hara, M. and Paperman, J.B. (1996). Liquidity, information, and infrequently traded stocks. J. Financ. 51, 1405–1436.

  • Easley, D., Hvidkjaer, S. and O’Hara, M. (2002). Is information risk a determinant of asset returns? J. Financ. 57, 2185–2221.

  • Easley, D., de Prado, M.M.L. and O’Hara, M. (2012a). The volume clock: insights into the high-frequency paradigm. J. Portfolio Manag. 39, 19–29.

  • Easley, D., López de Prado, M.M. and O’Hara, M. (2012b). Flow toxicity and liquidity in a high frequency world. Rev. Financ. Stud. 25, 1457–1493.

  • Easley, D., de Prado, M.L. and O’Hara, M. (2016). Discerning information from trade data. J. Financ. Econ. 120, 269–285.

  • Easley, D., López de Prado, M., O’Hara, M. and Zhang, Z. (2021). Microstructure in the machine age. Rev. Financ. Stud. 34, 3316–3363.

  • Eberlein, E. and Keller, U. (1995). Hyperbolic distributions in finance. Bernoulli 281–299.

  • Efron, B. (1986). Double exponential families and their use in generalized linear regression. J. Am. Stat. Assoc. 81, 709–721.

  • Embrechts, P., Klüppelberg, C. and Mikosch, T. (2013). Modelling extremal events: for insurance and finance. Springer Science & Business Media.

  • Engle, R. (2002a). Dynamic conditional correlation: a simple class of multivariate generalized autoregressive conditional heteroskedasticity models. J. Bus. Econ. Stat. 20, 339–350.

  • Engle, R. (2002b). New frontiers for ARCH models. J. Appl. Econom. 17, 425–446.

  • Engle, R.F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econom.: J. Econom. Soc. 50, 987–1007.

  • Engle, R.F. and Russell, J.R. (1998). Autoregressive conditional duration: a new model for irregularly spaced transaction data. Econometrica 66, 1127–1162.

  • Epps, T.W. (1979). Comovements in stock prices in the very short run. J. Am. Stat. Assoc. 74, 291–298.

  • Evans, K.P. (2011). Intraday jumps and US macroeconomic news announcements. J. Bank. Finance 35, 2511–2527.

  • Fan, J., Li, Y. and Yu, K. (2012). Vast volatility matrix estimation using high-frequency data for portfolio selection. J. Am. Stat. Assoc. 107, 412–428.

  • Feng, Y. and Zhou, C. (2015). Forecasting financial market activity using a semiparametric fractionally integrated log-ACD. Int. J. Forecast. 31, 349–363.

  • Fernandes, M. and Grammig, J. (2006). A family of autoregressive conditional duration models. J. Econ. 130, 1–23.

  • Fissler, T. and Ziegel, J.F. (2016). Higher order elicitability and Osband’s principle. Ann. Stat. 44, 1680–1707.

  • Fleming, T.R. and Harrington, D.P. (2011). Counting processes and survival analysis. Wiley, New York.

  • Gerlach, R. and Chen, C.W. (2015). Bayesian expected shortfall forecasting incorporating the intraday range. J. Financ. Econom. 14, 128–158.

  • Gerlach, R. and Wang, C. (2016). Forecasting risk via realized GARCH, incorporating the realized range. Quant. Finance 16, 501–511.

  • Ghalanos, A. (2020). Rugarch: univariate GARCH models. R package version 1.4-4.

  • Giot, P. (2005). Market risk models for intraday data. Eur. J. Finance 11, 309–324.

  • Goldstein, M.A., Kumar, P. and Graves, F.C. (2014). Computerized and high-frequency trading. Financ. Rev. 49, 177–202.

  • Gordy, M.B. and Juneja, S. (2010). Nested simulation in portfolio risk measurement. Manag. Sci. 56, 1833–1848.

  • Granger, C.W. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econom. J. Econom. Soc., 424–438.

  • Hansen, P.R. and Lunde, A. (2005). A forecast comparison of volatility models: does anything beat a GARCH(1,1)? J. Appl. Econ. 20, 873–889.

  • Hansen, P.R., Lunde, A. and Nason, J.M. (2003). Choosing the best volatility models: the model confidence set approach. Oxf. Bull. Econ. Stat. 65, 839–861.

  • Hansen, P.R., Huang, Z. and Shek, H.H. (2012). Realized GARCH: a joint model for returns and realized measures of volatility. J. Appl. Econ. 27, 877–906.

  • Harris, L. (2003). Trading and exchanges: market microstructure for practitioners. Oxford University Press, Oxford.

  • Harvey, A., Ruiz, E. and Shephard, N. (1994). Multivariate stochastic variance models. Rev. Econ. Stud. 61, 247–264.

  • Harvey, A.C. and Shephard, N. (1996). Estimation of an asymmetric stochastic volatility model for asset returns. J. Bus. Econ. Stat. 14, 429–434.

  • Hasbrouck, J. (2007). Empirical market microstructure: the institutions, economics, and econometrics of securities trading. Oxford University Press, Oxford.

  • Hautsch, N. (2011). Econometrics of financial high-frequency data. Springer Science & Business Media.

  • Hautsch, N., Klausurtagung, S. and Risiko, O. (2006). Generalized autoregressive conditional intensity models with long range dependence.

  • Hawkes, A.G. (1971). Spectra of some self-exciting and mutually exciting point processes. Biometrika 58, 83–90.

  • Hayashi, T. and Koike, Y. (2017). Multi-scale analysis of lead-lag relationships in high-frequency financial markets. arXiv:1708.03992.

  • Hayashi, T., Yoshida, N. et al (2005). On covariance estimation of non-synchronously observed diffusion processes. Bernoulli 11, 359–379.

  • Heckman, J.J. and Singer, B. (1984). Econometric duration analysis. J. Econ. 24, 63–132.

  • Heinen, A. (2003). Modelling time series count data: an autoregressive conditional Poisson model. Available at SSRN 1117187.

  • Hoffmann, M., Rosenbaum, M. and Yoshida, N. (2013). Estimation of the lead-lag parameter from non-synchronous data. Bernoulli 19, 426–461.

  • Hosszejni, D. and Kastner, G. (2019). Modeling univariate and multivariate stochastic volatility in R with stochvol and factorstochvol. arXiv:1906.12123.

  • Hsieh, M.-C., Hurvich, C. and Soulier, P. (2019). Modeling leverage and long memory in volatility in a pure-jump process. High Frequency 2, 124–141.

  • Huang, D., Zhu, S., Fabozzi, F.J. and Fukushima, M. (2010). Portfolio selection under distributional uncertainty: a relative robust CVaR approach. Eur. J. Oper. Res. 203, 185–194.

  • Iacus, S.M. and Yoshida, N. (2018). Simulation and inference for stochastic processes with yuima. A comprehensive R framework for SDEs and other stochastic processes. Use R.

  • Jacod, J., Li, Y., Mykland, P.A., Podolskij, M. and Vetter, M. (2009). Microstructure noise in the continuous case: the pre-averaging approach. Stoch. Process. Appl. 119, 2249–2276.

  • Jacquier, E., Polson, N.G. and Rossi, P.E. (2004). Bayesian analysis of stochastic volatility models with fat-tails and correlated errors. J. Econ. 122, 185–212.

  • Jasiak, J. (1999). Persistence in intertrade durations. Available at SSRN: https://ssrn.com/abstract=162008.

  • Jiang, G.J. and Oomen, R. (2005). A new test for jumps in asset prices. Preprint.

  • Jiang, G.J. and Oomen, R.C. (2008). Testing for jumps when asset prices are observed with noise—a “swap variance” approach. J. Econ. 144, 352–370.

  • Kalbfleisch, J.D. and Prentice, R.L. (2011). The statistical analysis of failure time data. Wiley, New York.

  • Keim, D.B. and Madhavan, A. (1996). The upstairs market for large-block transactions: analysis and measurement of price effects. Rev. Financ. Stud. 9, 1–36.

  • Kleinbaum, D.G. and Klein, M. (2010). Survival analysis. Springer, Berlin.

  • Kwok, S.S.M., Li, W.K. and Yu, P.L.H. (2009). The autoregressive conditional marked duration model: statistical inference to market microstructure. J. Data Sci.

  • Lancaster, T. (1979). Econometric methods for the duration of unemployment. Econom. J. Econom. Soc. 47, 939–956.

  • Lane, W.R., Looney, S.W. and Wansley, J.W. (1986). An application of the Cox proportional hazards model to bank failure. J. Bank. Finance 10, 511–531.

  • Lee, S.S. and Mykland, P.A. (2008). Jumps in financial markets: a new nonparametric test and jump dynamics. Rev. Financ. Stud. 21, 2535–2563.

  • Li, J., Todorov, V., Tauchen, G. and Lin, H. (2019). Rank tests at jump events. J. Bus. Econ. Stat. 37, 312–321.

  • Liboschik, T., Fokianos, K. and Fried, R. (2017). tscount: an R package for analysis of count time series following generalized linear models. J. Stat. Softw. 82, 1–51.

  • Liu, H., Zou, J. and Ravishanker, N. (2018). Multiple day biclustering of high-frequency financial time series. Stat 7, e176.

  • Liu, H., Zou, J. and Ravishanker, N. (2021). Clustering high-frequency financial time series based on information theory, forthcoming. Appl. Stoch. Models Bus. Ind.

  • Liu, S. and Tse, Y.-K. (2015). Intraday value-at-risk: an asymmetric autoregressive conditional duration approach. J. Econ. 189, 437–446.

  • Madan, D.B. and Seneta, E. (1990). The variance gamma (VG) model for share market returns. J. Bus. 511–524.

  • Mancino, M.E. and Sanfelici, S. (2011). Estimating covariance via Fourier method in the presence of asynchronous trading and microstructure noise. J. Financ. Econom. 9, 367–408.

  • Manganelli, S. (2005). Duration, volume and volatility impact of trades. J. Financ. Mark. 8, 377–399.

  • Martens, M. and Van Dijk, D. (2007). Measuring volatility with the realized range. J. Econ. 138, 181–207.

  • Meng, X. and Taylor, J.W. (2020). Estimating value-at-risk and expected shortfall using the intraday low and range data. Eur. J. Oper. Res. 280, 191–202.

  • Mies, F., Bibinger, M., Steland, A. and Podolskij, M. (2020). High-frequency inference for stochastic processes with jumps of infinite activity. PhD thesis, RWTH Aachen University.

  • Mukherjee, A., Peng, W., Swanson, N.R. and Yang, X. (2020). Financial econometrics and big data: a survey of volatility estimators and tests for the presence of jumps and co-jumps. In: Handbook of statistics, vol 42, pp 3–59. Elsevier.

  • Nelson, D.B. (1991). Conditional heteroskedasticity in asset returns: a new approach. Econom. J. Econom. Soc. 59, 347–370.

  • N.Y.S.E. Trade and Quote Database (2019). Retrieved from Wharton Research Data Services accessed.

  • O’Hara, M. (1997). Market microstructure theory. Wiley, New York.

  • Pacurar, M. (2008). Autoregressive conditional duration models in finance: a survey of the theoretical and empirical literature. J. Econ. Surv. 22, 711–751.

  • Palma, W. (2007). Long-memory time series: theory and methods. Wiley, New York.

  • Parkinson, M. (1980). The extreme value method for estimating the variance of the rate of return. J. Bus. 53, 61–65.

  • Peluso, S., Corsi, F. and Mira, A. (2014). A Bayesian high-frequency estimator of the multivariate covariance of noisy and asynchronous returns. J. Financ. Econom. 13, 665–697.

  • Renò, R. (2003). A closer look at the Epps effect. Int. J. Theor. Appl. Finance 6, 87–102.

  • Robinson, P.M. (2003). Time series with long memory. Advanced Texts in Econometrics.

  • Rocco, M. (2014). Extreme value theory in finance: a survey. J. Econ. Surv. 28, 82–108.

  • Russell, J.R. (1999). Econometric modeling of multivariate irregularly-spaced high-frequency data. Working Paper, University of Chicago.

  • Rydberg, T.H. and Shephard, N. (2000). BIN models for trade-by-trade data. Modelling the number of trades in a fixed interval of time. Econometric Society World Congress 2000 Contributed Papers 0740, Econometric Society. https://ideas.repec.org/p/ecm/wc2000/0740.html.

  • Sen, R. (2009). Jumps and microstructure noise in stock price volatility. Volatility, 163.

  • Shirota, S., Hizu, T. and Omori, Y. (2014). Realized stochastic volatility with leverage and long memory. Comput. Stat. Data Anal. 76, 618–641.

  • So, M.K. and Xu, R. (2013). Forecasting intraday volatility and value-at-risk with high-frequency data. Asia-Pac. Finan. Markets 20, 83–111.

  • So, M.K., Chu, A.M., Lo, C.C. and Ip, C.Y. (2021). Volatility and dynamic dependence modeling: review, applications, and financial risk management. Wiley Interdiscip. Rev.: Comput. Stat., e1567.

  • Song, X., Kim, D., Yuan, H., Cui, X., Lu, Z., Zhou, Y. and Wang, Y. (2021). Volatility analysis with realized GARCH-Itô models. J. Econ. 222, 393–410.

  • Stroud, J.R. and Johannes, M.S. (2014). Bayesian modeling and forecasting of 24-hour high-frequency volatility. J. Am. Stat. Assoc. 109, 1368–1384.

  • Sun, W., Rachev, S., Fabozzi, F.J. and Kalev, P.S. (2008). Fractals in trade duration: capturing long-range dependence and heavy tailedness in modeling trade duration. Ann. Finance 4, 217–241.

  • Swishchuk, A. and Huffman, A. (2020). General compound Hawkes processes in limit order books. Risks 8, 28.

  • Takahashi, M., Omori, Y. and Watanabe, T. (2009). Estimating stochastic volatility models using daily returns and realized volatility simultaneously. Comput. Stat. Data Anal. 53, 2404–2426.

  • Takahashi, M., Watanabe, T. and Omori, Y. (2016). Volatility and quantile forecasts by realized stochastic volatility models with generalized hyperbolic distribution. Int. J. Forecast. 32, 437–457.

  • Tay, A.S., Ting, C., Tse, Y.K. and Warachka, M. (2004). Transaction-data analysis of marked durations and their implications for market microstructure.

  • Tay, A.S., Ting, C., Kuen Tse, Y. and Warachka, M. (2011). The impact of transaction duration, volume and direction on price dynamics and volatility. Quant. Finance 11, 447–457.

  • Taylor, S.J. (1982). Financial returns modelled by the product of two stochastic processes-a study of the daily sugar prices 1961–75. Time Ser. Anal. Theory Pract. 1, 203–226.

  • Taylor, S.J. (1994). Modeling stochastic volatility: a review and comparative study. Math. Financ. 4, 183–204.

  • Thavaneswaran, A., Ravishanker, N. and Liang, Y. (2015). Generalized duration models and optimal estimation using estimating functions. Ann. Inst. Stat. Math. 67, 129–156.

  • Therneau, T.M. (2021). Survival: a package for survival analysis in R. R package version 3.2-13.

  • Tsai, P.-C. and Shackleton, M.B. (2016). Detecting jumps in high-frequency prices under stochastic volatility: a review and a data-driven approach. In: Handbook of high-frequency trading and modeling in finance, pp 137–181.

  • Tsay, R.S. (2005). Analysis of financial time series. Wiley, New York.

  • Vasileios, S. (2015). acp: autoregressive conditional poisson (R package version 2.1).

  • Wang, Q., Figueroa-López, J.E. and Kuffner, T.A. (2021). Bayesian inference on volatility in the presence of infinite jump activity and microstructure noise. Electron. J. Stat. 15, 506–553.

  • Wang, Y. and Zou, J. (2014). Volatility analysis in high-frequency financial data. Wiley Interdiscip. Rev. Comput. Stat. 6, 393–404.

  • Yan, B. and Zivot, E. (2003). Analysis of high-frequency financial data with S-PLUS. Working paper, UWEC-2005-03. http://ideas.repec.org/p/udb/wpaper/uwec-2005-03.html.

  • Yu, J. and Meyer, R. (2006). Multivariate stochastic volatility models: Bayesian estimation and model comparison. Econ. Rev. 25, 361–384.

  • Zaatour, R. (2014). Hawkes: Hawkes process simulation and calibration toolkit (R package version 0.0-4).

  • Zhang, L. (2011). Estimating covariation: Epps effect, microstructure noise. J. Econ. 160, 33–47.

  • Zhang, Y., Zou, J., Ravishanker, N. and Thavaneswaran, A. (2019). Modeling financial durations using penalized estimating functions. Comput. Stat. Data Anal. 131, 145–158.

  • Zheng, Y., Li, Y. and Li, G. (2016). On Fréchet autoregressive conditional duration models. J. Stat. Plan. Inference 175, 51–66.

  • Žikeš, F., Baruník, J. and Shenai, N. (2017). Modeling and forecasting persistent financial durations. Econom. Rev. 36, 1081–1110.

  • Zivot, E. and Wang, J. (2007). Modeling financial time series with S-PLUS®, vol 191. Springer Science & Business Media.

Acknowledgements

The authors are very grateful to the reviewers and editors for their helpful suggestions for improving the paper.

Funding

This paper was based upon work partially supported by the National Science Foundation under Grant DMS-1638521 to the Statistical and Applied Mathematical Sciences Institute. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. In addition, the work of SB was supported in part by an NSF award (DMS-1812128).

Author information

Corresponding author

Correspondence to Chiranjit Dutta.

Ethics declarations

Conflict of Interest

The authors declare no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Dutta, C., Karpman, K., Basu, S. et al. Review of Statistical Approaches for Modeling High-Frequency Trading Data. Sankhya B 85 (Suppl 1), 1–48 (2023). https://doi.org/10.1007/s13571-022-00280-7

  • DOI: https://doi.org/10.1007/s13571-022-00280-7
