Log in

A martingale approach to scan statistics

  • Scan Statistics
  • Published:
Annals of the Institute of Statistical Mathematics Aims and scope Submit manuscript

Abstract

Scan statistics are commonly used in biology, medicine, engineering and other fields where interest is in the probability of observing clusters of events in a window at an unknown location. Due to the dependent nature of the number of events in a large number of overlap** window locations, even approximate solutions for the simplest scan statistics may require elaborate calculations. We propose a new martingale method which allows one to approximate the distribution for a wide variety of scan statistics, including some for which analytical results are computationally infeasible.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
EUR 32.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or Ebook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  • Aalen, O. O. (1978). Nonparametric inference for a family of counting processes,Annals of Statistics,6, 701–726.

    MATH  MathSciNet  Google Scholar 

  • Aki, S. and Hirano, K. (1999). Sooner and later waiting time problems for runs in Markov dependent bivariate trials,Annals of the Institute of Statistical Mathematics,51, 17–29.

    Article  MATH  MathSciNet  Google Scholar 

  • Andersen, P. K., Borgan, O., Gill, R. D. and Keiding, N. (1993).Statistical Methods Based on Counting Processes, Springer Series in Statistics, Springer-Verlag, New York.

    Google Scholar 

  • Antzoulakos, D. (2001). Waiting times for patterns in a sequence of multistate trials,Journal of Applied Probability,38, 508–518.

    Article  MATH  MathSciNet  Google Scholar 

  • Balakrishnan, N. and Koutras, M. V. (2002).Runs and Scans with Applications, Wiley Series in Probability and Statistics, John Wiley & Sons, New York.

    MATH  Google Scholar 

  • Biggins, J. D. and Cannings, C. (1987). Markov renewal processes, counters and repeated sequences in Markov chains,Advances in Applied Probability,19, 521–545.

    Article  MATH  MathSciNet  Google Scholar 

  • Blom, G. and Thorburn, D. (1982). How many random digits are required until given sequences are obtained?,Journal of Applied Probability,19, 518–531.

    Article  MATH  MathSciNet  Google Scholar 

  • Blom, G., Holst, L. and Sandell, D. (1994).Problem and Snapshots from the World of Probability, Springer-Verlag, New York.

    Google Scholar 

  • Breen, S., Waterman, M. and Zhang, N. (1985). Renewal theory for several patterns,Journal of Applied Probability,22, 228–234.

    Article  MATH  MathSciNet  Google Scholar 

  • Chao, M. T. and Fu, J. C. (1991). The reliability of large series systems under Markov structure,Advances in Applied Probability,23, 894–908.

    Article  MATH  MathSciNet  Google Scholar 

  • Chrysaphinou, O. and Papastavridis, S. (1990). The occurrence of a sequence of patterns in repeated dependent experiments,Theory of Probability and Applications,35, 145–152.

    Article  MATH  MathSciNet  Google Scholar 

  • Coulston, J. and Riitters, K. (2003). Geographic analysis of forest health indicators using spatial scan statistics,Environmental Management,31, 764–773.

    Article  Google Scholar 

  • Durand, D. and Sankoff, D. (2003). Tests for gene clustering,Journal of Computational Biology,10, 453–482.

    Article  Google Scholar 

  • Enemark, L., Ahrens, P., Juel, D., Petersen, E., Petersen, R., Andersen, J., Lind, P. and Thamsborg, S. (2002). Molecular characterization of Danish Cryptosporidium parvum isolates,Parasitology,125, 331–341.

    Article  Google Scholar 

  • Feller, W. (1968).An Introduction to Probability Theory and Its Applications, Vol. 1, 3rd ed., Wiley, New York.

    MATH  Google Scholar 

  • Fu, J. C. (1986). Reliability of consecutive-k-out-of-n: F systems with (k−1)-step Markov dependence,IEEE Transactions on Reliability,R35, 602–606.

    Article  MATH  Google Scholar 

  • Fu, J. C. (1996). Distribution theory of runs and patterns associated with a sequence of multi-state trials,Statistics Sinica,6, 957–974.

    MATH  Google Scholar 

  • Fu, J. (2001). Distribution of the scan statistics for a sequence of bistate trials,Journal of Applied Probability,38, 908–916.

    Article  MATH  MathSciNet  Google Scholar 

  • Fu, J. and Chang, Y. (2002). On probability generating functions for waiting time distribution of compound patterns in a sequence of multistate trials,Journal of Applied Probability,39, 70–80.

    Article  MATH  MathSciNet  Google Scholar 

  • Fu, J. C. and Koutras, M. V. (1994). Distribution theory of runs: A Markov chain approach.Journal of the American Statistical Association,78, 168–175.

    MathSciNet  Google Scholar 

  • Fu, J. C. and Lou, W. Y. W. (2003).Distribution Theory of Runs and Patterns, World Scientific Publishing, Singapore.

    MATH  Google Scholar 

  • Gerber, H. and Li, S. (1981). The occurrence of sequence patterns in repeated experiments and hitting times in a Markov chain,Stochastic Processes and Their Applications,11, 101–108.

    Article  MATH  MathSciNet  Google Scholar 

  • Glaz, J. and Balakrishnan, N. (eds.) (1999).Recent Advances on Scan Statistics, Birkhauser Publishers, Boston.

    Google Scholar 

  • Glaz, J. and Naus, J. (1991). Tight bounds for scan statistics probabilities for discrete data,Annals of Applied Probability,1, 306–318.

    MATH  MathSciNet  Google Scholar 

  • Glaz, J., Naus, J. and Wallenstein, S. (2001).Scan Statistics, Springer, New-York.

    MATH  Google Scholar 

  • Goldstein, L. and Waterman, M. S. (1992). Poisson, compound Poisson and process approximations for testing statistical significance in sequence comparisons,Bulletin of Mathematical Biology,54, 785–812.

    MATH  Google Scholar 

  • Han, Q. and Hirano, K. (2003). Sooner and later waiting time problems for patterns in Markov dependent trials,Journal of Applied Probability,40, 73–86.

    Article  MATH  MathSciNet  Google Scholar 

  • Kaminski, R., Jefferis, E. and Chanhatasilpa, C. (2003). A spatial analysis of American police killed in the line of duty,Atlas of Crime: Map** the Criminal Landscape (eds. L. S. Turnbull, E. H. Hendrix and B. D. Dent), Oryx Press, Phoenix, Arizona.

    Google Scholar 

  • Karlin, S. and Brendel, V. (1992). Chance and statistical significance in protein and DNA sequence analysis,Science,257, 39–49.

    Article  Google Scholar 

  • Kulldorff, M. (1997). A spatial scan statistic,Communications in Statistics: Theory and Methods,26, 1481–1496.

    MATH  MathSciNet  Google Scholar 

  • Li, S. (1980). A martingale approach to the study of occurrence of sequence patterns in repeated experiments,the Annals of Probability,8, 1171–1176.

    MATH  MathSciNet  Google Scholar 

  • Loader, C. (1991). Large deviation approximations to distribution of scan statistics,Advances in Applied Probability,23, 751–771.

    Article  MATH  MathSciNet  Google Scholar 

  • Margai, F. and Henry, N. (2003). A community-based assessment of learning disabilities using environmental and contextual risk factors,Social Science and Medicine,56, 1073–1085.

    Article  Google Scholar 

  • Naus, J. I. (1965). The distribution of the size of the maximum cluster of points on a line,Journal of the American Statistical Association,60, 532–538.

    Article  MathSciNet  Google Scholar 

  • Naus, J. I. and Sheng, K. N. (1997). Matching among multiple random sequences,Bulletin of Mathematical Biology,59, 483–496.

    MATH  Google Scholar 

  • Naus, J. I. and Stefanov, V. T. (2002). Double-scan statistics,Methodology and Computing in Applied Probability,4, 163–180.

    Article  MATH  MathSciNet  Google Scholar 

  • Naus, J. I. and Wartenberg, D. A. (1997). A double-scan statistic for clusters of two types of events,Journal of the American Statistical Association,92, 1105–1113.

    Article  MATH  MathSciNet  Google Scholar 

  • Pozdnyakov, V. and Kulldorff, M. (2003). On the occurrence of sequence patterns: An alternative proof and extended results (preprint).

  • Robin, S. and Daudin, J.-J. (2001). Exact distribution of the distances between any occurence of a set of words,Annals of the Institute of Statistical Mathematics,53, 895–905.

    Article  MATH  MathSciNet  Google Scholar 

  • Sheng, K.-N. and Naus, J. (1994). Pattern matching between two non-aligned random sequences,Bulletin of Mathematical Biology,56, 1143–1162.

    MATH  Google Scholar 

  • Shiryaev, A. N. (1995).Probability, 2nd ed., Springer, New York.

    MATH  Google Scholar 

  • Shmueli, G. (2003a). Computing consecutive-type reliabilities non-recursively,IEEE Transactions on Reliability,52, 367–372.

    Article  Google Scholar 

  • Shmueli, G. (2003b). System-wide probabilities for systems with runs and scans rules,Methodology and Computing in Applied Probability,4, 401–419.

    MathSciNet  Google Scholar 

  • Stefanov, V. T. (2000). On some waiting time problems,Journal of Applied Probability,37, 756–764.

    Article  MATH  MathSciNet  Google Scholar 

  • Stefanov, V. T. (2003). The intersite distances between pattern occurrences in strings generated by general discrete- and continuous-time models: An algorithmic approach,Journal of Applied Probability,40, 881–892.

    Article  MATH  MathSciNet  Google Scholar 

  • Stefanov, V. T. and Pakes, A. G. (1997). Explicit distributional results in pattern formation,Annals of Applied Probability,7, 666–678.

    Article  MATH  MathSciNet  Google Scholar 

  • Uchida, M. (1998). On generating functions of waiting time problems for sequence patterns of discrete random variables,Annals of the Institute of Statistical Mathematics,50, 655–671.

    Article  MATH  MathSciNet  Google Scholar 

  • Williams, D. (1991).Probability with Martingales, Cambridge University Press, Cambridge.

    MATH  Google Scholar 

  • Yoshida, M., Naya, Y. and Miyashita, Y. (2003). Anatomical organization of forward fiber projections from area TE to perirhinal neurons representing visual long-term memory in monkeys,Proceedings of the National Academy of Sciences of the United States of America,100, 4257–4262.

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

About this article

Cite this article

Pozdnyakov, V., Glaz, J., Kulldorff, M. et al. A martingale approach to scan statistics. Ann Inst Stat Math 57, 21–37 (2005). https://doi.org/10.1007/BF02506876

Download citation

  • Received:

  • Revised:

  • Issue Date:

  • DOI: https://doi.org/10.1007/BF02506876

Key words and phrases

Navigation