A Web Information Retrieval System

  • Conference paper
Intelligent Computing and Information Science (ICICIS 2011)

Abstract

This paper applies an approach for retrieving price information from internet sites to real-world application problems. The Web Information Retrieval System (WIRS) uses a Hidden Markov Model (HMM), chosen for its ability to process temporal information. HMMs are flexible and have been applied successfully to a wide variety of stochastic modeling tasks. So that the prices and features of products can be compared across web sites, the WIRS extracts the prices and descriptions of products from web pages. The WIRS is evaluated on real-world problems and compared with a conventional method, and the results are reported in this paper.
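The abstract does not describe the WIRS model itself, so the following is only a minimal sketch of how an HMM can label a token sequence and pull out price tokens, using standard Viterbi decoding. The state set, the coarse token features, every probability value, and the helper names `token_feature`, `viterbi`, and `extract_prices` are illustrative assumptions, not the authors' design or trained parameters.

```python
import math

# Hypothetical hidden states; the paper's actual state set is not given.
STATES = ("other", "description", "price")

# Hand-picked illustrative probabilities, NOT trained values from the paper.
START = {"other": 0.6, "description": 0.3, "price": 0.1}
TRANS = {
    "other":       {"other": 0.6, "description": 0.3, "price": 0.1},
    "description": {"other": 0.2, "description": 0.6, "price": 0.2},
    "price":       {"other": 0.6, "description": 0.3, "price": 0.1},
}
EMIT = {
    "other":       {"word": 0.80, "number": 0.15, "currency": 0.05},
    "description": {"word": 0.90, "number": 0.08, "currency": 0.02},
    "price":       {"word": 0.05, "number": 0.25, "currency": 0.70},
}


def token_feature(token):
    """Map a non-empty raw token to a coarse observation symbol (assumed features)."""
    if token[0] in "$€£":
        return "currency"
    if token.replace(",", "").replace(".", "").isdigit():
        return "number"
    return "word"


def viterbi(observations):
    """Return the most likely state sequence for a non-empty observation list."""
    # Work in log space to avoid underflow on long sequences.
    scores = [{s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
               for s in STATES}]
    backpointers = []
    for obs in observations[1:]:
        prev = scores[-1]
        row, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            row[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][obs])
            ptr[s] = best
        scores.append(row)
        backpointers.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def extract_prices(text):
    """Return the whitespace-separated tokens that the HMM labels as 'price'."""
    tokens = text.split()
    if not tokens:
        return []
    labels = viterbi([token_feature(t) for t in tokens])
    return [t for t, label in zip(tokens, labels) if label == "price"]


if __name__ == "__main__":
    print(extract_prices("Laptop bag $19.99"))
```

In a real system the probabilities would be estimated from labeled pages rather than written by hand, and the observation symbols would likely draw on HTML structure as well as token shape.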

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kim, TH. et al. (2011). A Web Information Retrieval System. In: Chen, R. (eds) Intelligent Computing and Information Science. ICICIS 2011. Communications in Computer and Information Science, vol 135. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-18134-4_81

  • DOI: https://doi.org/10.1007/978-3-642-18134-4_81

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-18133-7

  • Online ISBN: 978-3-642-18134-4

  • eBook Packages: Computer Science, Computer Science (R0)
