Abstract
This paper applies an approach for retrieving price information from internet sites to real-world problems. The Web Information Retrieval System (WIRS) uses a Hidden Markov Model (HMM) for its ability to process temporal information. The HMM is a flexible tool that has been applied successfully to a wide variety of stochastic modeling tasks. So that the prices and features of products from different web sites can be compared, the WIRS extracts the prices and descriptions of products from web pages. The WIRS is evaluated on real-world problems and compared with a conventional method, and the results are reported.
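To make the idea of HMM-based extraction concrete, the following is a minimal sketch of labeling the tokens of a product listing with Viterbi decoding. It is not the WIRS implementation: the state set, the coarse token classes, and every probability below are hypothetical values chosen for illustration, whereas a real system would estimate them from labeled web pages.

```python
import math

# Hypothetical hidden states; the paper's actual state set is not given in the abstract.
STATES = ["description", "price", "other"]

# Hypothetical model parameters (illustrative only, not estimated from data).
START = {"description": 0.6, "price": 0.1, "other": 0.3}
TRANS = {
    "description": {"description": 0.6, "price": 0.3, "other": 0.1},
    "price": {"description": 0.1, "price": 0.6, "other": 0.3},
    "other": {"description": 0.3, "price": 0.1, "other": 0.6},
}
EMIT = {
    "description": {"word": 0.90, "number": 0.08, "currency": 0.02},
    "price": {"word": 0.05, "number": 0.50, "currency": 0.45},
    "other": {"word": 0.80, "number": 0.15, "currency": 0.05},
}


def token_class(token):
    """Map a raw token to a coarse observation symbol (an assumed feature)."""
    if token in {"$", "USD", "€"}:
        return "currency"
    if token.replace(".", "", 1).replace(",", "").isdigit():
        return "number"
    return "word"


def viterbi(tokens):
    """Return the most likely state sequence for the tokens under the toy HMM."""
    if not tokens:
        return []
    obs = [token_class(t) for t in tokens]
    # Work in log space to avoid underflow on long sequences.
    scores = [{s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}]
    backpointers = []
    for symbol in obs[1:]:
        prev = scores[-1]
        current, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for reaching state s at this position.
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            current[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][symbol])
            pointers[s] = best
        scores.append(current)
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state.
    path = [max(STATES, key=lambda s: scores[-1][s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    tokens = ["Canon", "camera", "$", "299.99"]
    for token, state in zip(tokens, viterbi(tokens)):
        print(f"{token}\t{state}")
```

With these toy parameters, the tokens `Canon camera $ 299.99` are labeled description, description, price, price. Tokens tagged `price` can then be joined into a price field and compared across sites.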
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kim, TH. et al. (2011). A Web Information Retrieval System. In: Chen, R. (eds) Intelligent Computing and Information Science. ICICIS 2011. Communications in Computer and Information Science, vol 135. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-18134-4_81
DOI: https://doi.org/10.1007/978-3-642-18134-4_81
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-18133-7
Online ISBN: 978-3-642-18134-4
eBook Packages: Computer Science, Computer Science (R0)