High-Throughput Extraction of Phase–Property Relationships from Literature Using Natural Language Processing and Large Language Models

Montanelli, Luca; Venugopal, Vineeth; Olivetti, Elsa A.; Latypov, Marat I.

doi:10.1007/s40192-024-00344-8

Thematic Section: Harnessing the Power of Materials Data
Open access
Published: 19 March 2024

Volume 13, pages 396–405, (2024)

Integrating Materials and Manufacturing Innovation

Luca Montanelli ORCID: orcid.org/0000-0002-7784-7627¹,
Vineeth Venugopal¹,
Elsa A. Olivetti¹ &
…
Marat I. Latypov²

Abstract

Consolidating published research on aluminum alloys into insights about microstructure–property relationships can simplify alloy design and reduce its cost. For many heat-treatable alloys that derive superior properties from precipitation, phases are a critical design consideration: as key microstructure constituents, they can have a decisive impact on engineering properties. Here, we present a computational framework for high-throughput extraction of phases and their impact on properties from scientific papers. Our framework includes transformer-based and large language models to identify sentences with phase–property information in papers, recognize phase and property entities, and extract phase–property relationships and their “sentiment.” We demonstrate the application of our framework on aluminum alloys, for which we build a database of 7,675 phase–property relationships extracted from a corpus of almost 5,000 full-text papers. We comment on the extracted relationships based on common metallurgical knowledge.

Introduction

Global demand for metals is expected to increase two to sixfold by 2100 [1,2,3]. This is especially true for aluminum, as there is a growing demand for high-performance, lightweight, recyclable structural alloys across industries [4, 5]. In the context of recycling, shifts in end-uses lead to uncertainty in future scrap stream compositions, which is further exacerbated by the accumulation of detrimental elements as alloys are recycled [6]. For example, increasing recycled content often leads to the emergence of phases (e.g., iron-containing intermetallics) detrimental to mechanical and other properties [7, 8].

Understanding microstructure–property relationships is the foundation for any alloy design effort (including recyclability considerations). Microstructure constituents of special interest in aluminum alloys are phases—spatial regions of uniform crystal structure and chemistry. Many beneficial properties are achieved based on the formation of desirable phases, for example, in the form of fine precipitates [9]. Conversely, many performance characteristics sharply deteriorate in the presence of phases with undesirable size or morphology. Given the importance of phases as key microstructure constituents, a large body of work has been dedicated to experimental observation of phase formation in response to metallurgical processing. Systematically organizing the knowledge published in the literature over decades of research could greatly benefit the current alloy design endeavors.

In recent years, natural language processing (NLP) has emerged as a powerful tool for analysis of large sets of scientific texts. It has been applied to the design and discovery of battery materials [10], complex oxides [11], zeolites [12, 13], nanoparticles [14], and more [15, 16]. However, development and application of NLP to the design of structural alloys are still in early stages. Sample research includes the text-mining of millions of papers to efficiently design high-entropy alloys [17] as well as predicting the pitting potential for corrosion-resistant alloy design using embeddings of literature excerpts [18]. Relevant to aluminum, Liu et al. have created a labeled dataset of material entities from the literature focused on the Al-Si alloy system [19]. Their use of active learning to supplement their manual labeling of entities, however, highlights the need for an automated high-throughput extraction method applicable to different regions of the alloy space. On the other hand, Pfeiffer et al. considered the entire range of aluminum alloy series and extracted 14,884 aluminum alloy compositions, along with 1,278 properties from 5,172 research papers [20]. While covering wide independent ranges and distributions of engineering properties, their database does not contain links between compositions and properties.

To address this gap, we develop an NLP framework to automatically extract phases from the literature together with their "sentiment," i.e., their positive or negative impact on properties. We leverage large language models (LLMs) to perform a wide variety of NLP tasks, including named entity recognition (NER) and relationship extraction (RE), without the need for extensive manually labeled datasets [21]. By performing automated collection of relevant sentences, NER, and relationship inference using transformer-based models and LLMs, we create a database of existing phase–property relationships. We demonstrate the use of this database for gaining insights that we confirm against established metallurgical knowledge. We focus on aluminum alloys, but the framework presented here is flexible and can be applied to other metallic systems. We develop the framework in Sect. 2 and show how it can derive key insights from the aluminum system in Sect. 3. The framework’s uses and implications for researchers are discussed in Sect. 4.

NLP Framework for High-Throughput Extraction of Phase–Property Relationships

In this section, we present an NLP framework for extracting phase–property relationships in alloys from the literature and apply it to aluminum alloys. We (i) collect a corpus of relevant papers, (ii) extract sentences from full-body papers, (iii) perform NER and extract phase–property relationships, and (iv) aggregate or disambiguate the extracted entities (Fig. 1).

Paper Corpus Collection

We first build a corpus of papers related to aluminum alloys culled from our in-house database of more than 5.7 million full texts of papers published in academic journals [22]. Our search for relevant papers included two strategies: (i) rule-based regular expression (regex) matching of words in titles and abstracts of the in-house database and (ii) querying the Scopus database [23]. In the first search strategy, we used the following five rules, which checked for the presence of:

The words alumin(i)um and alloy in the title,
Alloy denominations in the title (e.g., "Al6061"),
Alloy series in the title (e.g., "7xxx", "6xxx"),
Alloy names using chemical elements (e.g., "Al-Si", "Al-Mg-Sc-Zr") along with the word alloy in the title,
Alloy numbers consisting of three or four consecutive digits (e.g., "5182", "A382") with a mention of alumin(i)um in the title or abstract.

A paper satisfying any of those rules was considered an aluminum text. We found a total of 19,356 articles in our database of full texts. To complement this search, we also queried the Scopus database for papers on the subject of aluminum alloys. We queried papers that contained the strings "alumin*um" or "Al-" and "alloy" in the titles but excluded those containing "-Al" to remove papers with aluminum as an alloying element. The Scopus queries provided a further 1,164 articles that were not already present in the list of relevant papers identified with the regex search. Having the combined list of articles on aluminum alloys, we downloaded full texts of these articles from our in-house database to obtain a final corpus of 20,520 full texts in the JSON format.
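The five title/abstract rules can be sketched with Python's `re` module. This is a minimal illustration only: the patterns below are assumptions written to match the examples quoted above, not the authors' actual expressions, and the function name `is_aluminum_paper` is hypothetical. Rule (5) is assumed to look for the alloy number in the title; a pattern this loose also matches four-digit years, so a production filter would need further constraints.

```python
import re

# Hypothetical patterns approximating the five rules described in the text.
ALUMINUM = r"\balumini?um\b"   # matches "aluminum" and "aluminium"
ALLOY = r"\balloys?\b"


def _has(pattern, text):
    """Case-insensitive search helper."""
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def is_aluminum_paper(title, abstract=""):
    """Return True if any of the five (assumed) rules fires."""
    rules = [
        # (1) "alumin(i)um" and "alloy" both in the title
        _has(ALUMINUM, title) and _has(ALLOY, title),
        # (2) alloy denomination such as "Al6061" in the title
        re.search(r"\b(?:Al|AA)\s?\d{4}\b", title) is not None,
        # (3) alloy series such as "7xxx" in the title
        _has(r"\b[1-9]xxx\b", title),
        # (4) element-based name such as "Al-Si" (Al listed first) plus "alloy"
        re.search(r"(?<![-–\w])Al(?:[-–][A-Z][a-z]?)+\b", title) is not None
        and _has(ALLOY, title),
        # (5) 3- or 4-digit alloy number in the title, aluminum in title or abstract
        re.search(r"\b[A-Z]?\d{3,4}\b", title) is not None
        and _has(ALUMINUM, title + " " + abstract),
    ]
    return any(rules)
```

The negative lookbehind in rule (4) mirrors the idea behind excluding "-Al" in the Scopus query: titles such as "Ti-Al-V alloy", where aluminum is an alloying element, are not flagged.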

Sentence Dataset Collection

From the paper corpus, we then extract the sentences that contain information on phases and properties to build a sentence dataset. We choose the sentence as the main unit of text because a single paper in the metallurgical literature often discusses multiple phases and properties. Focusing on a smaller unit of text reduces the possibility of ambiguous relationships, whereas larger units of text (e.g., paragraphs) make it harder to extract unambiguously coupled phase–property pairs and the sentiment of their relationships. Furthermore, we hypothesize that the metallurgical literature describes phases and their impact on properties at the sentence level in sufficient detail for insights to be gathered. Finally, focusing on a small unit of text enables the use of a wide spectrum of NLP tools and LLMs, including those with limited context windows.
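As a rough illustration of working at the sentence level, the sketch below splits paragraphs into candidate sentences with a simple regular expression. The record layout (`"id"`, `"paragraphs"`) and the function names are assumptions made for this example; the text does not specify the JSON schema or the splitter actually used. The rule is deliberately naive and will mis-split after abbreviations followed by a capital letter (e.g., "et al. The").

```python
import re

# Split after sentence-ending punctuation when the next token starts with a
# capital letter, so "Fig. 1" or "e.g., paragraphs" do not trigger a split.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_sentences(paragraph):
    """Return the non-empty sentences of one paragraph."""
    return [s.strip() for s in _BOUNDARY.split(paragraph) if s.strip()]


def candidate_sentences(paper):
    """Yield (paper_id, sentence) pairs from a hypothetical paper record."""
    for paragraph in paper["paragraphs"]:
        for sentence in split_sentences(paragraph):
            yield paper["id"], sentence
```

Each yielded pair keeps the paper identifier, so any relationship extracted later can be traced back to its source.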

The prototypical sentence that we targeted for inclusion in the sentence dataset reads as "[Phase A] leads to an increase in [property B]". Such sentence extraction can be approached as a classification problem, i.e., whether or not a given sentence contains phase–property information, or whether or not it resembles our prototypical sentence. Here, we chose BERT-type transformer models coupled with a classification head to perform this task. For best performance, we fine-tuned and evaluated four BERT models: the uncased versions of the original BERT […]

[…] [52], which in turn can form a foundation for interactive systems for fast and user-friendly retrieval of materials information. Our sentence dataset can be used as an information-dense source of text data that serves as domain-specific context for conversational LLMs (e.g., for retrieval-augmented generation [53]).
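To make the retrieval-augmented use of a sentence dataset concrete, the following sketch ranks sentences against a question and places the best matches in a prompt as context. It is a toy under stated assumptions: it uses bag-of-words cosine similarity rather than the dense embeddings typical of retrieval-augmented generation, the function names and prompt layout are hypothetical, and no LLM is called.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, sentences, k=2):
    """Return the k sentences most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        sentences, key=lambda s: cosine(q, Counter(tokenize(s))), reverse=True
    )
    return ranked[:k]


def build_prompt(question, sentences, k=2):
    """Assemble a context-plus-question prompt (hypothetical layout)."""
    context = "\n".join(f"- {s}" for s in retrieve(question, sentences, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

Because every sentence in such a dataset has already been classified as carrying phase–property information, even a simple retriever operates on a dense pool of relevant context.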

We finally note the key role of LLMs in building our framework without the need for excessive amounts of manually labeled data. Specifically, we observed remarkable performance of LLMs in NER and RE tasks using only a handful of labeled examples (Sect. 2.3). Manually annotating sentences for NER and RE tasks with more traditional NLP approaches would have been extremely time consuming. Furthermore, using few-shot learning, we could significantly improve the model performance without expensive fine-tuning. The sentence classification was addressed by fine-tuning BERT-type models, for which assembling a manually annotated dataset requires significantly less effort than for NER and RE tasks.
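The few-shot idea can be illustrated by how a prompt is assembled from a handful of labeled examples and how a completion is parsed back into triples. Everything in this sketch is an assumption for illustration: the example sentences are made up, the `phase | property | sentiment` output format is hypothetical, and the call to an LLM API is intentionally left out because the text does not specify the prompts or request format used.

```python
# Made-up labeled examples in a hypothetical "phase | property | sentiment" format.
FEW_SHOT_EXAMPLES = [
    ("The precipitation of Mg2Si increases the hardness of the alloy.",
     "Mg2Si | hardness | positive"),
    ("Coarse beta-AlFeSi platelets reduce ductility.",
     "beta-AlFeSi | ductility | negative"),
]

INSTRUCTION = (
    "Extract every (phase | property | sentiment) triple from the sentence. "
    "Sentiment is 'positive' or 'negative'. Write one triple per line."
)


def build_few_shot_prompt(sentence, examples=FEW_SHOT_EXAMPLES):
    """Instruction, then labeled examples, then the sentence to annotate."""
    parts = [INSTRUCTION, ""]
    for text, answer in examples:
        parts += [f"Sentence: {text}", f"Triples: {answer}", ""]
    parts += [f"Sentence: {sentence}", "Triples:"]
    return "\n".join(parts)


def parse_triples(completion):
    """Keep only well-formed lines; free-form model text is ignored."""
    triples = []
    for line in completion.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) == 3 and fields[2].lower() in {"positive", "negative"}:
            triples.append((fields[0], fields[1], fields[2].lower()))
    return triples
```

Adding or swapping examples changes model behavior without any retraining, which is why few-shot prompting is much cheaper than fine-tuning for NER and RE.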

Limitations and Future Opportunities

In this work, we developed a framework for high-throughput extraction of phases, properties, and their relationships from published literature on aluminum alloys. Ideally, the framework should be fully automated; in its current state, however, some (semi-)manual intervention was still needed, most notably the aggregation of alias notations of the same phases/properties and their verification. For example, 15% of the extracted samples of the property "strength" have been aliased from otherwise worded terms referring to strength. Similarly, our database contains the property "corrosion," which aggregates not only the term "corrosion" itself but also other related terms that constituted 79% of the final aggregated "corrosion" samples. We expect that rapid progress in NLP and LLMs will eliminate the need for these additional steps and allow extraction of one-to-one relationships of unique phases and properties.
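The alias aggregation step, and the share of samples that reach a canonical name through an alias, can be sketched as follows. The alias table is hypothetical and far smaller than a curated one would be, and the function names are illustrative; the actual mapping was built (semi-)manually.

```python
from collections import Counter

# Hypothetical alias table: alias term -> canonical property name.
PROPERTY_ALIASES = {
    "tensile strength": "strength",
    "yield strength": "strength",
    "corrosion resistance": "corrosion",
    "pitting corrosion": "corrosion",
}


def _normalize(term):
    """Lowercase and collapse whitespace."""
    return " ".join(term.lower().split())


def canonical(term, aliases=PROPERTY_ALIASES):
    """Map a raw extracted term to its canonical name."""
    key = _normalize(term)
    return aliases.get(key, key)


def aliased_share(terms, aliases=PROPERTY_ALIASES):
    """For each canonical name, the fraction of samples that came from an alias."""
    total, from_alias = Counter(), Counter()
    for term in terms:
        key = _normalize(term)
        name = aliases.get(key, key)
        total[name] += 1
        if name != key:
            from_alias[name] += 1
    return {name: from_alias[name] / total[name] for name in total}
```

A per-property share computed this way corresponds to the kind of figure quoted above for "strength" and "corrosion".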

This study focused on qualitative relationships between phases and properties, i.e., whether a given phase has a positive or negative impact on a property. Future efforts in this direction can pursue quantitative relationships as well as the additional extraction of alloy chemical composition to further aid computational alloy design.

Finally, Fig. 5 shows that relationships described in the literature are about 70–80% positive. This indicates a clear bias towards reporting "positive results," e.g., phases and phenomena that are beneficial to alloy properties. This bias results in unbalanced extracted datasets regardless of how good the NLP framework for extraction is. Computational alloy design leveraging state-of-the-art NLP could benefit from more balanced reporting of both negative and positive research results by the community.
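The positive share of relationships per property can be computed from extracted triples with a few lines of Python. This is a sketch assuming each relationship is stored as a `(phase, property, sentiment)` tuple; the storage format of the actual database is not described here, and the function name is hypothetical.

```python
from collections import defaultdict


def positive_fraction(relationships):
    """Fraction of positive-sentiment relationships for each property."""
    counts = defaultdict(lambda: [0, 0])  # property -> [positive, total]
    for _phase, prop, sentiment in relationships:
        counts[prop][1] += 1
        if sentiment == "positive":
            counts[prop][0] += 1
    return {prop: pos / total for prop, (pos, total) in counts.items()}
```

Tracking this fraction alongside any downstream model makes the reporting bias explicit, for example when deciding whether to reweight the minority (negative) class.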

Conclusion

In summary, we present a novel methodology for extracting phase–property relationships from the metallurgical literature using natural language processing and large language models. The study focuses on the aluminum system and leverages the power of NLP and LLMs to systematically organize knowledge from a vast corpus of research papers. The insights generated from the extracted database show its value as a guide for alloy designers and researchers seeking to optimize alloy performance.

The results presented here show that this framework is useful for rapidly extracting insights from the literature on alloys. The knowledge we have derived on the aluminum system would traditionally be held in textbooks that would have taken experts years to write. As research on alloy properties continues to grow, these tools will become indispensable for quickly screening the literature and gaining insights.

Code Availability

The code and database are openly available on GitHub at the following address: https://github.com/olivettigroup/phase-sentiment.

References

1. Saevarsdottir G, Kvande H, Welch B (2019) Aluminum production in the times of climate change: the global challenge to reduce the carbon footprint and prevent carbon leakage. JOM 11:72. https://doi.org/10.1007/s11837-019-03918-6
2. Cullen JM, Allwood JM (2013) Mapping the global flow of aluminum: from liquid aluminum to end-use goods. Environ Sci Technol 47(7):3057–3064. https://doi.org/10.1021/es304256s
3. Watari T, Nansai K, Nakajima K (2021) Major metals demand, supply, and environmental impacts to 2100: a critical review. Resour Conserv Recycl 164:105107. https://doi.org/10.1016/j.resconrec.2020.105107
4. Raabe D, Ponge D, Uggowitzer PJ, Roscher M, Paolantonio M, Liu C et al (2022) Making sustainable aluminum by recycling scrap: the science of “dirty” alloys. Prog Mater Sci 128:100947. https://doi.org/10.1016/j.pmatsci.2022.100947
5. Raabe D, Tasan C, Olivetti E (2019) Strategies for improving the sustainability of structural metals. Nature 11(575):64–74. https://doi.org/10.1038/s41586-019-1702-5
6. Gaustad G, Olivetti E, Kirchain R (2011) Toward sustainable material usage: evaluating the importance of market motivated agency in modeling material flows. Environ Sci Technol 45(9):4110–4117. https://doi.org/10.1021/es103508u
7. Yang H, Ji S, Fan Z (2015) Effect of heat treatment and Fe content on the microstructure and mechanical properties of die-cast Al–Si–Cu alloys. Mater Des 85:823–832. https://doi.org/10.1016/j.matdes.2015.07.074
8. Basak C, Hari Babu N (2017) Improved recyclability of cast Al-alloys by engineering β-Al9Fe2Si2 phase. In: Light metals. Springer, pp 1139–1147
9. Wang J (2018) Physical metallurgy of aluminum alloys. In: Aluminum science and technology. ASM International. https://doi.org/10.31399/asm.hb.v02a.a0006503
10. Huang S, Cole J (2020) A database of battery materials auto-generated using ChemDataExtractor. Sci Data 08:7. https://doi.org/10.1038/s41597-020-00602-2
11. Young SR, Maksov A, Ziatdinov M, Cao Y, Burch M, Balachandran J et al (2018) Data mining for better material synthesis: the case of pulsed laser deposition of complex oxides. J Appl Phys 123(11):115303. https://doi.org/10.1063/1.5009942
12. Schwalbe-Koda D, Kwon S, Paris C, Bello-Jurado E, Jensen Z, Olivetti E et al (2021) A priori control of zeolite phase competition and intergrowth with high-throughput simulations. Science 374(6565):308–315. https://doi.org/10.1126/science.abh3350
13. Jensen Z, Kwon S, Schwalbe-Koda D, Paris C, Gómez-Bombarelli R, Román-Leshkov Y et al (2021) Discovering relationships between OSDAs and zeolites through data mining and generative neural networks. ACS Cent Sci 7(5):858–867. https://doi.org/10.1021/acscentsci.1c00024
14. Cruse K, Trewartha A, Lee S, Wang Z, Huo H, He T et al (2022) Text-mined dataset of gold nanoparticle synthesis procedures, morphologies, and size entities. Sci Data 05(9):234. https://doi.org/10.1038/s41597-022-01321-6
15. Tshitoyan V, Dagdelen J, Weston L, Dunn A, Rong Z, Kononova O et al (2019) Unsupervised word embeddings capture latent knowledge from materials science literature. Nature 571(7763):95–98. https://doi.org/10.1038/s41586-019-1335-8
16. Lee J, Lee M, Min K (2023) Natural language processing techniques for advancing materials discovery: a short review. Int J Precis Eng Manufact Green Technol 06:10. https://doi.org/10.1007/s40684-023-00523-6
17. Pei Z, Yin J, Liaw PK, Raabe D (2023) Toward the design of ultrahigh-entropy alloys via mining six million texts. Nat Commun. https://doi.org/10.1038/s41467-022-35766-5
18. Sasidhar KN, Siboni NH, Mianroodi JR, Rohwerder M, Neugebauer J, Raabe D (2023) Enhancing corrosion-resistant alloy design through natural language processing and deep learning. Sci Adv 9(32):eadg7992. https://doi.org/10.1126/sciadv.adg7992
19. Liu Y, Yao C, Niu C, Li W, Yin J, Shen T (2021) Text mining of hypereutectic Al–Si alloys literature based on active learning. Mater Today Commun 26:102032. https://doi.org/10.1016/j.mtcomm.2021.102032
20. Pfeiffer O, Liu H, Montanelli L, Latypov M, Sen F, Hegadekatte V et al (2022) Aluminum alloy compositions and properties extracted from a corpus of scientific manuscripts and US patents. Sci Data 03(9):128. https://doi.org/10.1038/s41597-022-01215-7
21. Dunn A, Dagdelen J, Walker N, Lee S, Rosen AS, Ceder G et al (2022) Structured information extraction from complex scientific text with fine-tuned large language models. arXiv:2212.05238
22. Kim E, Huang K, Tomala A, Matthews S, Strubell E, Saunders A et al (2017) Machine-learned and codified synthesis parameters of oxide materials. Sci Data 4:sdata2017127. https://doi.org/10.1038/sdata.2017.127
23. Boyle F, Sherman D (2006) Scopus™: the product and its development. Ser Libr 49(3):147–153. https://doi.org/10.1300/J123v49n03_12
24. Devlin J, Chang MW, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805
25. Beltagy I, Lo K, Cohan A (2019) SciBERT: a pretrained language model for scientific text. arXiv:1903.10676
26. Trewartha A, Walker N, Huo H, Lee S, Cruse K, Dagdelen J et al (2022) Quantifying the advantage of domain-specific pre-training on named entity recognition tasks in materials science. Patterns 3(4):100488. https://doi.org/10.1016/j.patter.2022.100488
27. Gupta T, Zaki M, Krishnan NMA, Mausam M (2022) MatSciBERT: a materials domain language model for text mining and information extraction. NPJ Comput Mater 12:8. https://doi.org/10.1038/s41524-022-00784-w
28. Cohere LLM API. Accessed 30 Sept 2023. https://cohere.com/
29. Liang P, Bommasani R, Lee T, Tsipras D, Soylu D, Yasunaga M et al (2022) Holistic evaluation of language models. arXiv:2211.09110
30. Lin CY (2004) ROUGE: a package for automatic evaluation of summaries. In: Text summarization branches out, pp 74–81
31. Grootendorst M (2022) BERTopic: neural topic modeling with a class-based TF-IDF procedure. arXiv:2203.05794
32. Mrówka G (2010) Influence of chemical composition variation and heat treatment on microstructure and mechanical properties of 6xxx alloys. Arch Mater Sci Eng 12:46
33. Usta M, Glicksman M, Wright R (2004) The effect of heat treatment on Mg2Si coarsening in aluminum 6105 alloy. Metall Mater Trans A 02(35):435–438. https://doi.org/10.1007/s11661-004-0354-7
34. Jawalkar C, Verma AS, Suri N et al (2017) Fabrication of aluminium metal matrix composites with particulate reinforcement: a review. Mater Today Proc 4(2):2927–2936. https://doi.org/10.1016/j.matpr.2017.02.174
35. Arunkumar S, Sundaram MS, Vigneshwara S et al (2020) A review on aluminium matrix composite with various reinforcement particles and their behaviour. Mater Today Proc 33:484–490. https://doi.org/10.1016/j.matpr.2020.05.053
36. Wang X, Jha A, Brydson R (2004) In situ fabrication of Al3Ti particle reinforced aluminium alloy metal-matrix composites. Mater Sci Eng A 364(1–2):339–345. https://doi.org/10.1016/j.msea.2003.08.049
37. Menzemer C, Lam PC, Srivatsan TS, Wittel CF (1999) An investigation of fusion zone microstructures of welded aluminum alloy joints. Mater Lett 41(4):192–197. https://doi.org/10.1016/S0167-577X(99)00129-9
38. Myhr OR, Grong Ø, Fjær HG, Marioara CD (2004) Modelling of the microstructure and strength evolution in Al–Mg–Si alloys during multistage thermal processing. Acta Mater 52(17):4997–5008. https://doi.org/10.1016/j.actamat.2004.07.002
39. Robles Hernández FC, Sokolowski JH (2006) Comparison among chemical and electromagnetic stirring and vibration melt treatments for Al-Si hypereutectic alloys. J Alloy Compd 426(1):205–212. https://doi.org/10.1016/j.jallcom.2006.09.039
40. Dash SS, Chen D (2023) A review on processing–microstructure–property relationships of Al–Si alloys: recent advances in deformation behavior. Metals. https://doi.org/10.3390/met13030609
41. Kim JC, Nishida Y, Arima H, Ando T (2003) Microstructure of Al-Si-Mg alloy processed by rotary-die equal channel angular pressing. Mater Lett 57(11):1689–1695. https://doi.org/10.1016/S0167-577X(02)01053-4
42. Natori K, Utsunomiya H, Tanaka T (2017) Improvement in formability of semi-solid cast hypoeutectic Al-Si alloys by equal-channel angular pressing. J Mater Process Technol 240:240–248. https://doi.org/10.1016/j.jmatprotec.2016.09.022
43. Al-Qutub AM, Allam IM, Qureshi TW (2006) Effect of sub-micron Al2O3 concentration on dry wear properties of 6061 aluminum based composite. J Mater Process Technol 172(3):327–331. https://doi.org/10.1016/j.jmatprotec.2005.10.022
44. Mahdavi S, Akhlaghi F (2011) Effect of SiC content on the processing, compaction behavior, and properties of Al6061/SiC/Gr hybrid composites. J Mater Sci 03(46):1502–1511. https://doi.org/10.1007/s10853-010-4954-x
45. Yu H, Huang X, Lei F, Tan X, Han Y (2013) Preparation and electrochemical properties of Cr(III)-Ti-based coatings on 6063 Al alloy. Surf Coat Technol 03(218):137–141. https://doi.org/10.1016/j.surfcoat.2012.12.042
46. Chong Z, Yang X, Wang Y, Zhang DQ, Chen Y (2019) Synergistic effect between glutamic acid and rare earth cerium (III) as corrosion inhibitors on AA5052 aluminum alloy in neutral chloride medium. Ionics 03:25. https://doi.org/10.1007/s11581-018-2605-4
47. Li T, Li X, Dong C, Cheng Y (2010) Characterization of atmospheric corrosion of 2A12 aluminum alloy in tropical marine environment. J Mater Eng Perform 06(19):591–598. https://doi.org/10.1007/s11665-009-9506-7
48. Ghosh R, Venugopal A, Rao S, Narayanan P, Pant B, Cherian RM (2018) Effect of temper condition on the corrosion and fatigue performance of AA2219 aluminum alloy. J Mater Eng Perform 01(27):423–433. https://doi.org/10.1007/s11665-018-3125-0
49. Osório WR, Spinelli JE, Ferreira IL, Garcia A (2007) The roles of macrosegregation and of dendritic array spacings on the electrochemical behavior of an Al-4.5wt% Cu alloy. Electrochimica Acta 52(9):3265–3273. https://doi.org/10.1016/j.electacta.2006.10.004
50. Ma J, Wen J, Li Q, Zhang Q (2013) Electrochemical polarization and corrosion behavior of Al–Zn–In based alloy in acidity and alkalinity solutions. Int J Hydrogen Energy 38(34):14896–14902. https://doi.org/10.1016/j.ijhydene.2013.09.046
51. Andreatta F, Terryn H, de Wit JHW (2004) Corrosion behaviour of different tempers of AA7075 aluminium alloy. Electrochimica Acta 49(17):2851–2862. https://doi.org/10.1016/j.electacta.2004.01.046
52. Venugopal V, Pai S, Olivetti E (2022) MatKG: the largest knowledge graph in materials science–entities, relations, and link prediction through graph representation learning. arXiv:2210.17340
53. Lewis P, Perez E, Piktus A, Petroni F, Karpukhin V, Goyal N et al (2020) Retrieval-augmented generation for knowledge-intensive NLP tasks. Adv Neural Inf Process Syst 33:9459–9474

Acknowledgements

The authors gratefully acknowledge the support from Novelis and NSF (grant CBET-2243914). We express our gratitude to Mrigi Munjal and Thorben Prein for providing a source of inspiration for the approach we used as well as important code snippets.

Funding

Open Access funding provided by the MIT Libraries.

Author information

Authors and Affiliations

Department of Material Science and Engineering, Massachusetts Institute of Technology, Cambridge, 02139, MA, USA
Luca Montanelli, Vineeth Venugopal & Elsa A. Olivetti
Department of Material Science and Engineering, University of Arizona, Tucson, 85721, AZ, USA
Marat I. Latypov

Corresponding author

Correspondence to Luca Montanelli.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Montanelli, L., Venugopal, V., Olivetti, E.A. et al. High-Throughput Extraction of Phase–Property Relationships from Literature Using Natural Language Processing and Large Language Models. Integr Mater Manuf Innov 13, 396–405 (2024). https://doi.org/10.1007/s40192-024-00344-8

Received: 29 November 2023
Accepted: 25 January 2024
Published: 19 March 2024
Issue Date: June 2024
DOI: https://doi.org/10.1007/s40192-024-00344-8
