Abstract
The Transformer architecture has driven significant advances across many fields, but its high computational cost and long training times pose challenges for models built on it. To address these issues, we propose an improved ALBERT-based model that replaces ALBERT's self-attention mechanism with an additive attention mechanism. This modification reduces computational complexity and enhances the model's flexibility. We compare the proposed model with other Transformer-based models and show that it has fewer parameters and substantially lower computational complexity. Extensive evaluations on diverse datasets establish the efficiency advantage of the proposed model over the alternatives. With its reduced parameter count, the proposed model offers a promising way to improve the efficiency and practicality of Transformer-based models. Notably, it remains practical to train under resource and time constraints, highlighting its adaptability and versatility in real-world scenarios.
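For illustration, the minimal PyTorch sketch below shows a Fastformer-style additive attention layer of the kind that can stand in for self-attention: queries and keys are pooled into single global vectors, so no pairwise query-key matrix is formed. The class name AdditiveAttention, the single-head simplification, the residual placement, and the hidden size are illustrative assumptions, not the exact configuration of our model.

import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Fastformer-style additive attention: cost grows linearly with sequence length."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.query = nn.Linear(hidden_dim, hidden_dim)
        self.key = nn.Linear(hidden_dim, hidden_dim)
        self.value = nn.Linear(hidden_dim, hidden_dim)
        # Scalar scores per position; pooling with these replaces the
        # O(n^2) pairwise query-key interactions of standard self-attention.
        self.query_score = nn.Linear(hidden_dim, 1)
        self.key_score = nn.Linear(hidden_dim, 1)
        self.out = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden_dim)
        q = self.query(x)
        k = self.key(x)
        v = self.value(x)

        # Pool queries into a single global query vector, O(seq_len).
        alpha = torch.softmax(self.query_score(q), dim=1)      # (batch, seq_len, 1)
        global_q = (alpha * q).sum(dim=1, keepdim=True)        # (batch, 1, hidden_dim)

        # Mix keys with the global query, then pool into a global key vector.
        p = k * global_q
        beta = torch.softmax(self.key_score(p), dim=1)
        global_k = (beta * p).sum(dim=1, keepdim=True)         # (batch, 1, hidden_dim)

        # Mix values with the global key, project, and add a query residual.
        return self.out(global_k * v) + q

# Example: a batch of 2 sequences of length 128 with hidden size 768 (ALBERT-base width).
layer = AdditiveAttention(768)
out = layer(torch.randn(2, 128, 768))
print(out.shape)  # torch.Size([2, 128, 768])

Because each step pools over the sequence exactly once, the per-layer cost scales linearly with sequence length, which is the source of the complexity reduction described above.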
Cite this paper
Zhang, Z., Chen, H., Xiong, J., Hu, J., Ni, W.: A Study on Improving ALBERT with Additive Attention for Text Classification. In: Lu, H., Blumenstein, M., Cho, S.B., Liu, C.L., Yagi, Y., Kamiya, T. (eds.) Pattern Recognition. ACPR 2023. LNCS, vol. 14407. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-47637-2_15