Declustering Web Content Indices for Parallel Information Retrieval

  • Conference paper
Web Intelligence: Research and Development (WI 2001)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2198)

Abstract

We consider an information retrieval (IR) system on a low-cost, high-performance PC cluster. The IR system replicates Web pages locally, indexes them with an inverted-index file (IIF), and uses the vector space model as its ranking strategy. The IIF is partitioned into pieces using either the lexical or the greedy declustering method, and the pieces are distributed to the cluster nodes and stored on each node's hard disk. The lexical method assigns the terms in the IIF, in lexicographic order, to the processing nodes in turn; the greedy method is based on the probability of co-occurrence of an arbitrary pair of terms in the IIF. For each incoming user query with multiple terms, the terms are sent to the nodes that hold the relevant pieces of the IIF and are evaluated in parallel. We study how query performance is affected by the two declustering methods for IIFs of various sizes. In our experiments, the greedy method shows an overall improvement of about 3.7% over the lexical method.
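
The two declustering methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the greedy algorithm, so the heuristic below (place each term on the node where it co-occurs least with terms already stored there, breaking ties by load) is an assumption, as is estimating co-occurrence from a sample of term sets. All function and parameter names (`lexical_decluster`, `greedy_decluster`, `route_query`, `term_sets`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def lexical_decluster(terms, num_nodes):
    """Assign lexicographically sorted terms to the nodes in turn (round-robin)."""
    return {term: i % num_nodes for i, term in enumerate(sorted(terms))}


def cooccurrence_counts(term_sets):
    """Count how often each unordered pair of terms appears in the same term set."""
    counts = defaultdict(int)
    for term_set in term_sets:
        for a, b in combinations(sorted(set(term_set)), 2):
            counts[(a, b)] += 1
    return counts


def greedy_decluster(terms, num_nodes, term_sets):
    """Assumed greedy heuristic: put each term on the node where it co-occurs
    least with the terms already placed there; ties go to the lighter node."""
    counts = cooccurrence_counts(term_sets)
    nodes = [[] for _ in range(num_nodes)]
    assignment = {}
    for term in sorted(terms):
        def cost(n):
            overlap = sum(
                counts.get(tuple(sorted((term, other))), 0) for other in nodes[n]
            )
            return (overlap, len(nodes[n]), n)

        best = min(range(num_nodes), key=cost)
        nodes[best].append(term)
        assignment[term] = best
    return assignment


def route_query(query_terms, assignment):
    """Group a query's terms by the node that stores their piece of the index."""
    plan = defaultdict(list)
    for term in query_terms:
        if term in assignment:
            plan[assignment[term]].append(term)
    return dict(plan)


if __name__ == "__main__":
    terms = ["date", "apple", "cherry", "banana"]
    samples = [["apple", "cherry"], ["apple", "cherry"], ["banana", "date"]]
    for name, assignment in [
        ("lexical", lexical_decluster(terms, 2)),
        ("greedy", greedy_decluster(terms, 2, samples)),
    ]:
        print(name, route_query(["apple", "cherry"], assignment))
```

In this toy example the lexical assignment places "apple" and "cherry" on the same node, so a query containing both is served by a single node, whereas the assumed greedy heuristic separates them so that the two terms can be evaluated in parallel.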

1 This paper was supported in part by the Korea Science and Engineering Foundation under contract No. 2000-2-30300-002-3.



Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chung, Y., Kwon, HC., Chung, SH., Ryu, K.R. (2001). Declustering Web Content Indices for Parallel Information Retrieval. In: Zhong, N., Yao, Y., Liu, J., Ohsuga, S. (eds) Web Intelligence: Research and Development. WI 2001. Lecture Notes in Computer Science, vol 2198. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45490-X_41

  • DOI: https://doi.org/10.1007/3-540-45490-X_41

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42730-8

  • Online ISBN: 978-3-540-45490-8

  • eBook Packages: Springer Book Archive
