Abstract
A Web site generally covers a wide range of topics, providing information for users with different interests and goals. This information is not scattered at random but organized under a hierarchy encoded in the site's hyperlink structure, which is intended to shape the user's mental model of how the information is organized. User traversals over the hyperlinks between Web pages, in turn, can reveal semantic relationships between those pages. Unfortunately, the link structure of a Web site, which reflects the Web designer's expectations of visitors, may differ considerably from the organization that visitors themselves expect. Discovering the conceptual page hierarchy from the user's point of view can help Webmasters gain insight into the real relationships among Web pages and refine the site's link structure to facilitate effective navigation. In this paper, we propose a method for generating a conceptual page hierarchy of a Web site from user traversal history. We use maximal forward references to model users' traversal behavior over the underlying link hierarchy of the site. We then build a weighted directed graph to represent the inter-relationships between Web pages. Finally, we apply a “Maximum Spanning Tree” (MST) algorithm to generate a conceptual page hierarchy of the Web site. We demonstrate the effectiveness of our approach with a preliminary experiment on real-world Web data.
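The three steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the edge weight (how often one page directly precedes another within a maximal forward reference) is an assumption, and the tree is grown with a simple greedy, Prim-style rule as a stand-in for whatever MST algorithm the paper actually applies. On a directed graph this greedy rule is not guaranteed to find the maximum-weight tree; an exact method such as Chu-Liu/Edmonds would be needed for that.

```python
from collections import Counter


def maximal_forward_references(session):
    """Split one traversal sequence into maximal forward references.

    A backward move (revisiting a page on the current path) ends the
    current forward path; the path is emitted only if the previous move
    was a forward one.
    """
    path, refs, moving_forward = [], [], False
    for page in session:
        if page in path:
            if moving_forward:
                refs.append(list(path))
            # Backtrack: truncate the path to the revisited page.
            path = path[: path.index(page) + 1]
            moving_forward = False
        else:
            path.append(page)
            moving_forward = True
    if moving_forward and path:
        refs.append(list(path))
    return refs


def edge_weights(sessions):
    """Weighted directed graph as {(parent, child): count}.

    Assumption: weight = number of maximal forward references in which
    `child` immediately follows `parent`.
    """
    weights = Counter()
    for session in sessions:
        for ref in maximal_forward_references(session):
            for parent, child in zip(ref, ref[1:]):
                weights[(parent, child)] += 1
    return weights


def greedy_hierarchy(weights, root):
    """Grow a tree from `root`, always adding the heaviest edge that
    leaves the tree. Returns {page: parent}; the root maps to None.

    This is a greedy stand-in, not an exact directed maximum spanning tree.
    """
    parent = {root: None}
    while True:
        candidates = [
            (w, u, v)
            for (u, v), w in weights.items()
            if u in parent and v not in parent
        ]
        if not candidates:
            return parent
        _, u, v = max(candidates)  # ties broken by page name, deterministically
        parent[v] = u


if __name__ == "__main__":
    # Hypothetical session: each letter stands for one page visit.
    sessions = [list("ABCDCBEGHGWAOUOV")]
    tree = greedy_hierarchy(edge_weights(sessions), root="A")
    for page, par in sorted(tree.items()):
        print(page, "<-", par)
```

Pages unreachable from the chosen root are simply left out of the returned mapping, so in practice the root would be the site's home page or another entry point common to most sessions.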
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chen, X., Li, M., Zhao, W., Chen, DY. (2005). Discovering Conceptual Page Hierarchy of a Web Site from User Traversal History. In: Li, X., Wang, S., Dong, Z.Y. (eds) Advanced Data Mining and Applications. ADMA 2005. Lecture Notes in Computer Science, vol 3584. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11527503_64
DOI: https://doi.org/10.1007/11527503_64
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27894-8
Online ISBN: 978-3-540-31877-4
eBook Packages: Computer Science (R0)