Abstract
We present a new technique to encode a deterministic finite automaton (DFA). Based on the specific properties of Glushkov's nondeterministic finite automaton (NFA) construction algorithm, we are able to encode the DFA using $(m+1)(2^{m+1} + |\Sigma|)$ bits, where $m$ is the number of characters (excluding operator symbols) in the regular expression and $\Sigma$ is the alphabet. This compares favorably against the worst case of $(m+1)\,2^{m+1}\,|\Sigma|$ bits needed by a classical DFA representation and $m(2^{2m+1} + |\Sigma|)$ bits needed by the Wu and Manber approach implemented in Agrep.
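To get a feel for the gap between these three bounds, the following minimal sketch evaluates them for one sample setting. The function names and the choice of m = 10 and |Σ| = 256 are assumptions made here for illustration; only the formulas come from the abstract.

```python
# Space bounds (in bits) quoted in the abstract, as functions of
# m (number of characters in the regular expression) and sigma (|Sigma|).
# Function names are hypothetical, chosen only for this illustration.

def compact_dfa_bits(m: int, sigma: int) -> int:
    """Proposed encoding: (m + 1) * (2^(m+1) + |Sigma|)."""
    return (m + 1) * (2 ** (m + 1) + sigma)


def classical_dfa_bits(m: int, sigma: int) -> int:
    """Worst case of a classical DFA table: (m + 1) * 2^(m+1) * |Sigma|."""
    return (m + 1) * 2 ** (m + 1) * sigma


def wu_manber_bits(m: int, sigma: int) -> int:
    """Wu and Manber approach (Agrep): m * (2^(2m+1) + |Sigma|)."""
    return m * (2 ** (2 * m + 1) + sigma)


if __name__ == "__main__":
    m, sigma = 10, 256  # assumed sample values, not taken from the paper
    for name, bound in [
        ("compact", compact_dfa_bits),
        ("classical", classical_dfa_bits),
        ("Wu-Manber", wu_manber_bits),
    ]:
        bits = bound(m, sigma)
        print(f"{name:>10}: {bits:>12,} bits ({bits / 8 / 1024:,.1f} KiB)")
```

For these sample values the bounds come to 25,344 bits for the proposed encoding, 5,767,168 bits for the classical table, and 20,974,080 bits for the Wu and Manber representation. The proposed bound adds the $2^{m+1}$ and $|\Sigma|$ terms where the classical bound multiplies them, which is where the saving comes from.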
Our approach is practical and simple to implement, and it permits searching regular expressions of moderate size (which include most cases of interest) faster than with any previously existing algorithm, as we show experimentally.
Partially supported by ECOS-Sud project C99E04 and, for the first author, Fondecyt grant 1-990627.
References
A. Aho, R. Sethi, and J. Ullman. Compilers: Principles, Techniques and Tools. Addison-Wesley, 1985.
R. Baeza-Yates and G. Gonnet. A new approach to text searching. CACM, 35(10):74–82, October 1992.
G. Berry and R. Sethi. From regular expression to deterministic automata. Theoretical Computer Science, 48(1):117–126, 1986.
A. Brüggemann-Klein. Regular expressions into finite automata. Theoretical Computer Science, 120(2):197–213, November 1993.
C.-H. Chang and R. Paige. From regular expression to DFA’s using NFA’s. In Proceedings of the 3rd Annual Symposium on Combinatorial Pattern Matching, LNCS v. 664, pages 90–110, 1992.
V.-M. Glushkov. The abstract theory of automata. Russian Mathematical Surveys, 16:1–53, 1961.
E. Myers. A four-russian algorithm for regular expression pattern matching. J. of the ACM, 39(2):430–448, 1992.
G. Navarro and M. Raffinot. Fast regular expression search. In Proceedings of the 3rd Workshop on Algorithm Engineering, LNCS v. 1668, pages 199–213, 1999.
G. Navarro and M. Raffinot. Fast and flexible string matching by combining bit-parallelism and suffix automata. ACM Journal of Experimental Algorithmics (JEA), 5(4), 2000. http://www.jea.acm.org/2000/NavarroString.
K. Thompson. Regular expression search algorithm. CACM, 11(6):419–422, 1968.
B. Watson. Taxonomies and Toolkits of Regular Language Algorithms. PhD dissertation, Eindhoven University of Technology, The Netherlands, 1995.
S. Wu and U. Manber. Agrep - a fast approximate pattern-matching tool. In Proc. of USENIX Technical Conference, pages 153–162, 1992.
S. Wu and U. Manber. Fast text searching allowing errors. CACM, 35(10):83–91, October 1992.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Navarro, G., Raffinot, M. (2001). Compact DFA Representation for Fast Regular Expression Search. In: Brodal, G.S., Frigioni, D., Marchetti-Spaccamela, A. (eds) Algorithm Engineering. WAE 2001. Lecture Notes in Computer Science, vol 2141. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44688-5_1
DOI: https://doi.org/10.1007/3-540-44688-5_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42500-7
Online ISBN: 978-3-540-44688-0
eBook Packages: Springer Book Archive