WebSAM-Adapter: Adapting Segment Anything Model for Web Page Segmentation

Ren, Bowen; Qian, Zefeng; Sun, Yuchen; Gao, Chao; Zhang, Chongyang

doi:10.1007/978-3-031-56027-9_27

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14608)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

Web page segmentation, which divides web pages into semantically coherent units, has become increasingly important for web-related applications as internet technology advances. Conventional purely visual approaches depend on traditional edge detection and generalize poorly to complex web pages. The Segment Anything Model (SAM) has recently demonstrated remarkable visual understanding and segmentation abilities, which suggests that it also has great potential for web page segmentation. However, the lack of web-specific training data has hindered its direct adaptation to the web page segmentation domain. To address this challenge, we propose WebSAM-Adapter, an effective adaptation of SAM with a three-module architecture tailored to web page segmentation that adds only a minimal number of trainable parameters. First, a patch embedding tune module adjusts the frozen patch embedding features, which is crucial for modifying the distribution of the original model. Second, an edge components tune module learns significant structural features within each web page. Finally, the outputs of these two modules are fed into our key Adapter module, which uses a lightweight multi-layer perceptron (MLP) to fuse the enriched features and generate web-page-specific knowledge. To the best of our knowledge, our method is the first successful adaptation of a large visual model such as SAM to web page segmentation. Empirical evaluations on the comprehensive Webis-WebSeg-20 dataset demonstrate our model's state-of-the-art performance.
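
The data flow through the three modules can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract does not specify layer sizes, the edge operator, or how the features are fused. The Sobel filter, the element-wise sum before the MLP, the residual addition to the frozen features, and all names (`WebAdapterSketch`, `sobel_edges`, `patchify`) are assumptions made for illustration only.

```python
import numpy as np


def gelu(x):
    # Tanh approximation of the Gaussian error linear unit.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def sobel_edges(img):
    """Gradient magnitude of a 2-D grayscale image (assumed edge operator)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    return np.hypot(gx, gy)


def patchify(img, patch):
    """Split an (H, W) image into flattened, non-overlapping patch rows."""
    h, w = img.shape
    blocks = img.reshape(h // patch, patch, w // patch, patch)
    return blocks.transpose(0, 2, 1, 3).reshape(-1, patch * patch)


class WebAdapterSketch:
    """Hypothetical stand-in for the three-module design in the abstract."""

    def __init__(self, embed_dim, patch, hidden, rng):
        scale = 0.02
        self.patch = patch
        # Patch embedding tune: projects the frozen patch embeddings.
        self.w_pe = rng.normal(0.0, scale, (embed_dim, hidden))
        # Edge components tune: projects patches of the edge map.
        self.w_edge = rng.normal(0.0, scale, (patch * patch, hidden))
        # Adapter: lightweight two-layer MLP.
        self.w1 = rng.normal(0.0, scale, (hidden, hidden))
        self.w2 = rng.normal(0.0, scale, (hidden, embed_dim))

    def __call__(self, frozen_embed, image):
        f_pe = frozen_embed @ self.w_pe
        f_edge = patchify(sobel_edges(image), self.patch) @ self.w_edge
        # Assumed fusion: sum the two feature sets, then apply the MLP.
        prompt = gelu((f_pe + f_edge) @ self.w1) @ self.w2
        # Assumed injection: add the result to the frozen features.
        return frozen_embed + prompt


rng = np.random.default_rng(0)
page = rng.random((32, 32))            # toy grayscale "screenshot"
frozen = rng.normal(size=(16, 12))     # 16 patches, 12-dim frozen embeddings
adapter = WebAdapterSketch(embed_dim=12, patch=8, hidden=6, rng=rng)
print(adapter(frozen, page).shape)     # (16, 12)
```

In this sketch only the four small weight matrices would be trained, while `frozen` stands in for the output of the unchanged backbone. That reflects the parameter-efficient intent described above, under the stated assumptions.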

Acknowledgements

This work was financially supported by the Joint Research Fund of China Pacific Insurance (Group) Co. and SJTU-Artificial Intelligence Institute.

Author information

Authors and Affiliations

School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
Bowen Ren, Zefeng Qian, Yuchen Sun & Chongyang Zhang
MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, 200240, China
Chongyang Zhang
China Pacific Insurance (Group) Co., Ltd., Shanghai, 200010, China
Chao Gao

Corresponding author

Correspondence to Chongyang Zhang.

Editor information

Editors and Affiliations

Georgetown University, Washington, DC, USA
Nazli Goharian
University of Pisa, Pisa, Italy
Nicola Tonellotto
King's College London, London, UK
Yulan He
University College London, London, UK
Aldo Lipani
University of Glasgow, Glasgow, UK
Graham McDonald
University of Glasgow, Glasgow, UK
Craig Macdonald
University of Glasgow, Glasgow, UK
Iadh Ounis

Cite this paper

Ren, B., Qian, Z., Sun, Y., Gao, C., Zhang, C. (2024). WebSAM-Adapter: Adapting Segment Anything Model for Web Page Segmentation. In: Goharian, N., et al. Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14608. Springer, Cham. https://doi.org/10.1007/978-3-031-56027-9_27

DOI: https://doi.org/10.1007/978-3-031-56027-9_27
Published: 20 March 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-56026-2
Online ISBN: 978-3-031-56027-9
eBook Packages: Computer Science, Computer Science (R0)