Abstract
Scan statistics are commonly used in biology, medicine, engineering and other fields where interest is in the probability of observing clusters of events in a window at an unknown location. Due to the dependent nature of the number of events in a large number of overlap** window locations, even approximate solutions for the simplest scan statistics may require elaborate calculations. We propose a new martingale method which allows one to approximate the distribution for a wide variety of scan statistics, including some for which analytical results are computationally infeasible.
Similar content being viewed by others
References
Aalen, O. O. (1978). Nonparametric inference for a family of counting processes,Annals of Statistics,6, 701–726.
Aki, S. and Hirano, K. (1999). Sooner and later waiting time problems for runs in Markov dependent bivariate trials,Annals of the Institute of Statistical Mathematics,51, 17–29.
Andersen, P. K., Borgan, O., Gill, R. D. and Keiding, N. (1993).Statistical Methods Based on Counting Processes, Springer Series in Statistics, Springer-Verlag, New York.
Antzoulakos, D. (2001). Waiting times for patterns in a sequence of multistate trials,Journal of Applied Probability,38, 508–518.
Balakrishnan, N. and Koutras, M. V. (2002).Runs and Scans with Applications, Wiley Series in Probability and Statistics, John Wiley & Sons, New York.
Biggins, J. D. and Cannings, C. (1987). Markov renewal processes, counters and repeated sequences in Markov chains,Advances in Applied Probability,19, 521–545.
Blom, G. and Thorburn, D. (1982). How many random digits are required until given sequences are obtained?,Journal of Applied Probability,19, 518–531.
Blom, G., Holst, L. and Sandell, D. (1994).Problem and Snapshots from the World of Probability, Springer-Verlag, New York.
Breen, S., Waterman, M. and Zhang, N. (1985). Renewal theory for several patterns,Journal of Applied Probability,22, 228–234.
Chao, M. T. and Fu, J. C. (1991). The reliability of large series systems under Markov structure,Advances in Applied Probability,23, 894–908.
Chrysaphinou, O. and Papastavridis, S. (1990). The occurrence of a sequence of patterns in repeated dependent experiments,Theory of Probability and Applications,35, 145–152.
Coulston, J. and Riitters, K. (2003). Geographic analysis of forest health indicators using spatial scan statistics,Environmental Management,31, 764–773.
Durand, D. and Sankoff, D. (2003). Tests for gene clustering,Journal of Computational Biology,10, 453–482.
Enemark, L., Ahrens, P., Juel, D., Petersen, E., Petersen, R., Andersen, J., Lind, P. and Thamsborg, S. (2002). Molecular characterization of Danish Cryptosporidium parvum isolates,Parasitology,125, 331–341.
Feller, W. (1968).An Introduction to Probability Theory and Its Applications, Vol. 1, 3rd ed., Wiley, New York.
Fu, J. C. (1986). Reliability of consecutive-k-out-of-n: F systems with (k−1)-step Markov dependence,IEEE Transactions on Reliability,R35, 602–606.
Fu, J. C. (1996). Distribution theory of runs and patterns associated with a sequence of multi-state trials,Statistics Sinica,6, 957–974.
Fu, J. (2001). Distribution of the scan statistics for a sequence of bistate trials,Journal of Applied Probability,38, 908–916.
Fu, J. and Chang, Y. (2002). On probability generating functions for waiting time distribution of compound patterns in a sequence of multistate trials,Journal of Applied Probability,39, 70–80.
Fu, J. C. and Koutras, M. V. (1994). Distribution theory of runs: A Markov chain approach.Journal of the American Statistical Association,78, 168–175.
Fu, J. C. and Lou, W. Y. W. (2003).Distribution Theory of Runs and Patterns, World Scientific Publishing, Singapore.
Gerber, H. and Li, S. (1981). The occurrence of sequence patterns in repeated experiments and hitting times in a Markov chain,Stochastic Processes and Their Applications,11, 101–108.
Glaz, J. and Balakrishnan, N. (eds.) (1999).Recent Advances on Scan Statistics, Birkhauser Publishers, Boston.
Glaz, J. and Naus, J. (1991). Tight bounds for scan statistics probabilities for discrete data,Annals of Applied Probability,1, 306–318.
Glaz, J., Naus, J. and Wallenstein, S. (2001).Scan Statistics, Springer, New-York.
Goldstein, L. and Waterman, M. S. (1992). Poisson, compound Poisson and process approximations for testing statistical significance in sequence comparisons,Bulletin of Mathematical Biology,54, 785–812.
Han, Q. and Hirano, K. (2003). Sooner and later waiting time problems for patterns in Markov dependent trials,Journal of Applied Probability,40, 73–86.
Kaminski, R., Jefferis, E. and Chanhatasilpa, C. (2003). A spatial analysis of American police killed in the line of duty,Atlas of Crime: Map** the Criminal Landscape (eds. L. S. Turnbull, E. H. Hendrix and B. D. Dent), Oryx Press, Phoenix, Arizona.
Karlin, S. and Brendel, V. (1992). Chance and statistical significance in protein and DNA sequence analysis,Science,257, 39–49.
Kulldorff, M. (1997). A spatial scan statistic,Communications in Statistics: Theory and Methods,26, 1481–1496.
Li, S. (1980). A martingale approach to the study of occurrence of sequence patterns in repeated experiments,the Annals of Probability,8, 1171–1176.
Loader, C. (1991). Large deviation approximations to distribution of scan statistics,Advances in Applied Probability,23, 751–771.
Margai, F. and Henry, N. (2003). A community-based assessment of learning disabilities using environmental and contextual risk factors,Social Science and Medicine,56, 1073–1085.
Naus, J. I. (1965). The distribution of the size of the maximum cluster of points on a line,Journal of the American Statistical Association,60, 532–538.
Naus, J. I. and Sheng, K. N. (1997). Matching among multiple random sequences,Bulletin of Mathematical Biology,59, 483–496.
Naus, J. I. and Stefanov, V. T. (2002). Double-scan statistics,Methodology and Computing in Applied Probability,4, 163–180.
Naus, J. I. and Wartenberg, D. A. (1997). A double-scan statistic for clusters of two types of events,Journal of the American Statistical Association,92, 1105–1113.
Pozdnyakov, V. and Kulldorff, M. (2003). On the occurrence of sequence patterns: An alternative proof and extended results (preprint).
Robin, S. and Daudin, J.-J. (2001). Exact distribution of the distances between any occurence of a set of words,Annals of the Institute of Statistical Mathematics,53, 895–905.
Sheng, K.-N. and Naus, J. (1994). Pattern matching between two non-aligned random sequences,Bulletin of Mathematical Biology,56, 1143–1162.
Shiryaev, A. N. (1995).Probability, 2nd ed., Springer, New York.
Shmueli, G. (2003a). Computing consecutive-type reliabilities non-recursively,IEEE Transactions on Reliability,52, 367–372.
Shmueli, G. (2003b). System-wide probabilities for systems with runs and scans rules,Methodology and Computing in Applied Probability,4, 401–419.
Stefanov, V. T. (2000). On some waiting time problems,Journal of Applied Probability,37, 756–764.
Stefanov, V. T. (2003). The intersite distances between pattern occurrences in strings generated by general discrete- and continuous-time models: An algorithmic approach,Journal of Applied Probability,40, 881–892.
Stefanov, V. T. and Pakes, A. G. (1997). Explicit distributional results in pattern formation,Annals of Applied Probability,7, 666–678.
Uchida, M. (1998). On generating functions of waiting time problems for sequence patterns of discrete random variables,Annals of the Institute of Statistical Mathematics,50, 655–671.
Williams, D. (1991).Probability with Martingales, Cambridge University Press, Cambridge.
Yoshida, M., Naya, Y. and Miyashita, Y. (2003). Anatomical organization of forward fiber projections from area TE to perirhinal neurons representing visual long-term memory in monkeys,Proceedings of the National Academy of Sciences of the United States of America,100, 4257–4262.
Author information
Authors and Affiliations
About this article
Cite this article
Pozdnyakov, V., Glaz, J., Kulldorff, M. et al. A martingale approach to scan statistics. Ann Inst Stat Math 57, 21–37 (2005). https://doi.org/10.1007/BF02506876
Received:
Revised:
Issue Date:
DOI: https://doi.org/10.1007/BF02506876