Element Extraction from Computer Science Academic Papers for AI Survey Writing

  • Conference paper
Computer Networks and IoT (IAIC 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 2060)


Abstract

The exponential growth in the number of research papers has prompted the development of text summarization tools. However, existing tools merely extract sentences or words based on their frequency and may not be well suited to academic papers. To address this gap, this study develops a model based on DistilBERT, focusing on information extraction and on dataset labeling and augmentation techniques. The model's central objective is entity recognition: it takes critical segments of a paper's full text as input and aims to identify two specific entities within them, the research problem and the research content. In response to the limitations of existing datasets, this research augments a dataset with over 4000 manually annotated full-text arXiv papers on computer algorithms.
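The entity recognition objective described above can be illustrated with a minimal sketch of how token-level tags might be grouped into the two entity types. The abstract does not state the tagging scheme, so the BIO scheme, the label names `PROBLEM` and `CONTENT`, the helper `decode_bio`, and the sample sentence are all assumptions made for illustration, not the authors' method.

```python
from typing import List, Tuple

# Hypothetical label set: the abstract names two entity types
# (research problem, research content) but not a tagging scheme.
LABELS = ["O", "B-PROBLEM", "I-PROBLEM", "B-CONTENT", "I-CONTENT"]


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    spans: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Emit the span collected so far, if any.
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always closes the open span and starts a new one.
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type continues the open span.
            current_tokens.append(token)
        else:
            # "O" or an I- tag with no matching open span: close and skip.
            flush()
            current_type, current_tokens = None, []
    flush()
    return spans


# Made-up example sentence with made-up tags.
tokens = ["We", "study", "text", "summarization", "using",
          "a", "DistilBERT", "tagger", "."]
tags = ["O", "O", "B-PROBLEM", "I-PROBLEM", "O",
        "B-CONTENT", "I-CONTENT", "I-CONTENT", "O"]
spans = decode_bio(tokens, tags)
print(spans)
```

In a full pipeline the tags would come from a token classifier rather than being written by hand; the decoding step is independent of which encoder produces them.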

The developed model performs strongly on several evaluation metrics, including accuracy, precision, recall, and F1 score. For comparison, we employed several baseline models based on BERT and trained our models using three different dataset training methods. Additionally, to evaluate the quality of our dataset and underline the importance of full-text data, we manually annotated a random selection of 4000 papers from the ARXIV Data dataset, extracting only their titles and abstracts. Our proposed model outperforms all the baseline models, achieving an accuracy of 0.823 and an F1 score of 0.798, and models trained on the proposed full-text annotated dataset outperform those trained on the other datasets. These results demonstrate the effectiveness of the proposed model.
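For readers unfamiliar with the four metrics, the following is a minimal sketch of how they relate, computed at the token level over tag sequences. The abstract does not say whether the reported scores are token-level or span-level, so the token-level choice, the function `token_metrics`, and the toy labels are assumptions; the numbers produced here are unrelated to the paper's results.

```python
from typing import Dict, List


def token_metrics(gold: List[str], pred: List[str],
                  outside: str = "O") -> Dict[str, float]:
    """Token-level accuracy, precision, recall, and F1 for entity tags.

    A token counts as a true positive when the predicted tag equals the
    gold tag and that tag is not the "outside" label.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    correct = sum(g == p for g, p in zip(gold, pred))
    true_pos = sum(g == p and g != outside for g, p in zip(gold, pred))
    pred_pos = sum(p != outside for p in pred)
    gold_pos = sum(g != outside for g in gold)

    accuracy = correct / len(gold) if gold else 0.0
    precision = true_pos / pred_pos if pred_pos else 0.0
    recall = true_pos / gold_pos if gold_pos else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels for illustration only.
gold = ["O", "B-PROBLEM", "I-PROBLEM", "O", "B-CONTENT", "O"]
pred = ["O", "B-PROBLEM", "O", "O", "B-CONTENT", "B-CONTENT"]
metrics = token_metrics(gold, pred)
print(metrics)
```

Because accuracy also rewards correctly predicted outside tokens, it can differ from F1 on the same predictions, which is consistent with a paper reporting both figures.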



Author information


Corresponding author

Correspondence to **nguo Yu .


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Luo, F., Yu, X. (2024). Element Extraction from Computer Science Academic Papers for AI Survey Writing. In: **, H., Pan, Y., Lu, J. (eds) Computer Networks and IoT. IAIC 2023. Communications in Computer and Information Science, vol 2060. Springer, Singapore. https://doi.org/10.1007/978-981-97-1332-5_21


  • DOI: https://doi.org/10.1007/978-981-97-1332-5_21


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-1331-8

  • Online ISBN: 978-981-97-1332-5

  • eBook Packages: Computer Science, Computer Science (R0)
