Collecting Valuable Information from Fast Text Streams

Qi, Baoyuan; Ma, Gang; Shi, Zhongzhi; Wang, Wei

doi:10.1007/978-3-662-44980-6_11


Part of the book series: IFIP Advances in Information and Communication Technology (IFIPAICT, volume 432)

Included in the following conference series:

International Conference on Intelligent Information Processing


Abstract

Collecting valuable information from fast text streams has become a challenging task. In this work, we propose a method that extracts useful information effectively and efficiently. First, we maintain an analyzer based on the Trie structure and a dynamic N-Gram tokenizer. Second, unlike the traditional search-engine principle, we treat the documents as queries by building indexes over the whole query base. The experimental results show that, compared with conventional methods, the method offers strong adaptability, low latency, and high-quality support for complex query combinations.
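The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, since the full chapter text is not reproduced here. The names `Trie`, `QueryBase`, `register`, and `match` are hypothetical. Two further assumptions are made for illustration: queries are treated as plain AND-combinations of terms, and "dynamic" N-Gram is read as an n-gram length that grows with the longest registered term. The sketch stores the terms of all standing queries in a trie, scans each incoming document for those terms, and uses an inverted index over the queries to find which queries the document satisfies.

```python
from collections import defaultdict


class Trie:
    """Character trie holding the vocabulary of all registered query terms."""

    def __init__(self):
        self.root = {}

    def add(self, term):
        node = self.root
        for ch in term:
            node = node.setdefault(ch, {})
        node["$end"] = True  # multi-character key, cannot clash with a single character

    def scan(self, text, max_len):
        """Return the set of vocabulary terms that occur anywhere in text.

        From each start offset we walk at most max_len characters, which is
        equivalent to testing every n-gram (n = 1..max_len) against the trie.
        """
        found = set()
        for start in range(len(text)):
            node = self.root
            for end in range(start, min(start + max_len, len(text))):
                node = node.get(text[end])
                if node is None:
                    break
                if "$end" in node:
                    found.add(text[start:end + 1])
        return found


class QueryBase:
    """Indexes standing queries so that each incoming document acts as the query."""

    def __init__(self):
        self.trie = Trie()
        self.term_to_queries = defaultdict(set)  # inverted index over queries
        self.query_terms = {}                    # query id -> required terms
        self.max_len = 0                         # assumed "dynamic" n-gram bound

    def register(self, query_id, terms):
        """Register a conjunctive (AND) query given as a list of terms."""
        self.query_terms[query_id] = set(terms)
        for term in terms:
            self.trie.add(term)
            self.term_to_queries[term].add(query_id)
            self.max_len = max(self.max_len, len(term))

    def match(self, document):
        """Return the sorted ids of all queries whose terms all occur in document."""
        hits = defaultdict(int)
        for term in self.trie.scan(document, self.max_len):
            for query_id in self.term_to_queries[term]:
                hits[query_id] += 1
        return sorted(q for q, n in hits.items() if n == len(self.query_terms[q]))


qb = QueryBase()
qb.register("q1", ["stream", "index"])
qb.register("q2", ["trie"])
qb.register("q3", ["stream", "latency"])
print(qb.match("a trie index over a fast text stream"))  # ['q1', 'q2']
```

In this arrangement the cost of handling a document depends on its length and on the terms it actually contains, not on the number of documents seen so far, which is one plausible reading of why indexing the query base suits a fast stream.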



Author information

Authors and Affiliations

Key Lab of Intelligent Information Processing, Institute of Computing Technology, CAS, Beijing, 100190, China
Baoyuan Qi, Gang Ma & Zhongzhi Shi
University of Chinese Academy of Sciences, Beijing, 100190, China
Baoyuan Qi & Gang Ma
Beijing Lexo Technologies Co., Ltd., Beijing, 100080, China
Wei Wang


Editor information

Editors and Affiliations

Institute of Computing Technology, Chinese Academy of Sciences, 100190, Beijing, China
Zhongzhi Shi
Department of Computer Science, Zhejiang University, 310027, Hangzhou, China
Zhaohui Wu
Computer Science Department, Indiana University, 47405, Bloomington, IN, USA
David Leake
School of Computer Science, University of Manchester, M13 9PL, Manchester, UK
Uli Sattler


About this paper

Cite this paper

Qi, B., Ma, G., Shi, Z., Wang, W. (2014). Collecting Valuable Information from Fast Text Streams. In: Shi, Z., Wu, Z., Leake, D., Sattler, U. (eds) Intelligent Information Processing VII. IIP 2014. IFIP Advances in Information and Communication Technology, vol 432. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44980-6_11


DOI: https://doi.org/10.1007/978-3-662-44980-6_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44979-0
Online ISBN: 978-3-662-44980-6
eBook Packages: Computer Science, Computer Science (R0)
