Element Extraction from Computer Science Academic Papers for AI Survey Writing

Luo, Fan; Yu, Xinguo

doi:10.1007/978-981-97-1332-5_21


Part of the book series: Communications in Computer and Information Science (CCIS, volume 2060)

Included in the following conference series:

International Artificial Intelligence Conference

Abstract

With the exponential growth of research papers, text summarization tools have emerged. However, existing tools merely extract sentences or words based on their frequency and may not be well suited to academic papers. To address this gap, this study develops a model based on DistilBERT, focusing primarily on information extraction and on dataset labeling and augmentation techniques. The model’s central objective is entity recognition, aiming to identify two specific entities from the full text of research papers. It takes critical segments of papers as input and aims to identify the research problems and content contained within them. In response to the limitations of existing datasets, this research augments a dataset with over 4000 full-text arXiv computer algorithm papers through manual annotation.
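To make the entity recognition setup concrete, the following is a minimal sketch of how manually annotated spans might be turned into per-token labels for a tagger. It assumes a BIO tagging scheme and the label names PROBLEM and CONTENT, which are hypothetical; the abstract states only that two entity types are recognized and does not describe the annotation format.

```python
from typing import List, Optional, Tuple

# Assumption: (start token index, end token index exclusive, entity type).
# The entity type names used below are hypothetical, not taken from the paper.
Span = Tuple[int, int, str]


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Convert annotated spans into one BIO tag per token."""
    tags = ["O"] * num_tokens
    for start, end, kind in spans:
        if not 0 <= start < end <= num_tokens:
            raise ValueError(f"span out of range: {(start, end, kind)}")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"overlapping span: {(start, end, kind)}")
        tags[start] = f"B-{kind}"
        for i in range(start + 1, end):
            tags[i] = f"I-{kind}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode a BIO tag sequence (e.g. model predictions) back into spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    kind: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        inside = tag.startswith("I-") and kind == tag[2:]
        if start is not None and not inside:
            spans.append((start, i, kind))
            start, kind = None, None
        if tag.startswith("B-"):
            start, kind = i, tag[2:]
    return spans


tokens = "We study entity recognition in papers using a DistilBERT tagger".split()
annotated = [(2, 6, "PROBLEM"), (7, 10, "CONTENT")]
bio_tags = spans_to_bio(len(tokens), annotated)
decoded = bio_to_spans(bio_tags)
```

In this sketch, `bio_tags` would serve as training targets for a token classifier, and `bio_to_spans` recovers entity spans from predicted tags; stray `I-` tags with no preceding `B-` are ignored.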

The developed model performs strongly on several evaluation metrics, including accuracy, precision, recall, and F1 score. For comparative experiments, we employed several baseline models based on BERT and trained our models using three different dataset training methods. Additionally, to evaluate our dataset’s quality and underline the importance of full-text data, we manually annotated a random selection of 4000 papers from the ARXIV Data dataset, extracting only their titles and abstracts. Our proposed model outperforms all the baseline models, achieving an accuracy of 0.823 and an F1 score of 0.798, and models trained on the proposed full-text annotated dataset outperform those trained on other datasets. These results demonstrate the effectiveness of the proposed model.
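For readers unfamiliar with the four reported metrics, here is a minimal sketch of how accuracy, precision, recall, and F1 are computed from binary gold and predicted labels. The abstract does not say whether scores are computed per token or per entity, or how they are averaged across the two entity types, so the binary per-item setup and the toy labels below are assumptions for illustration only.

```python
from typing import Dict, List


def binary_metrics(gold: List[int], pred: List[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for 0/1 labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)  # true positives
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)  # false positives
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)  # false negatives
    tn = sum(1 for g, p in pairs if g == 0 and p == 0)  # true negatives

    # Guard against empty denominators by returning 0.0 in those cases.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels, not data from the paper: 1 marks an item tagged as an entity.
gold_labels = [1, 1, 1, 0, 0, 0, 1, 0]
pred_labels = [1, 0, 1, 0, 1, 0, 1, 0]
scores = binary_metrics(gold_labels, pred_labels)
```

F1 is the harmonic mean of precision and recall, which is why it can differ from accuracy when the two classes are imbalanced.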



Author information

Authors and Affiliations

Central China Normal University, Wuhan, China
Fan Luo & Xinguo Yu


Corresponding author

Correspondence to Xinguo Yu.

Editor information

Editors and Affiliations

Huazhong University of Science and Technology, Wuhan, Hubei, China
Hai Jin
Chinese Academy of Science, Shenzhen, China
Yi Pan
Nanjing University of Science and Technology, Nanjing, China
Jianfeng Lu


About this paper

Cite this paper

Luo, F., Yu, X. (2024). Element Extraction from Computer Science Academic Papers for AI Survey Writing. In: Jin, H., Pan, Y., Lu, J. (eds) Computer Networks and IoT. IAIC 2023. Communications in Computer and Information Science, vol 2060. Springer, Singapore. https://doi.org/10.1007/978-981-97-1332-5_21

DOI: https://doi.org/10.1007/978-981-97-1332-5_21
Published: 03 April 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-1331-8
Online ISBN: 978-981-97-1332-5
eBook Packages: Computer Science, Computer Science (R0)
