Introduction

Global demand for metals is expected to increase two- to sixfold by 2100 [1,2,3]. This is especially true for aluminum, as there is a growing demand for high-performance, lightweight, recyclable structural alloys across industries [4, 5]. In the context of recycling, shifts in end uses lead to uncertainties in future scrap stream compositions, which are further exacerbated by the accumulation of detrimental elements as alloys are recycled [6]. For example, increasing recycled content often leads to the emergence of phases (e.g., iron-containing intermetallics) that are detrimental to mechanical and other properties [7, 8].

Understanding microstructure–property relationships is the foundation for any alloy design effort (including recyclability considerations). Microstructure constituents of special interest in aluminum alloys are phases—spatial regions of uniform crystal structure and chemistry. Many beneficial properties are achieved based on the formation of desirable phases, for example, in the form of fine precipitates [9]. Conversely, many performance characteristics sharply deteriorate in the presence of phases with undesirable size or morphology. Given the importance of phases as key microstructure constituents, a large body of work has been dedicated to experimental observation of phase formation in response to metallurgical processing. Systematically organizing the knowledge published in the literature over decades of research could greatly benefit the current alloy design endeavors.

In recent years, natural language processing (NLP) has emerged as a powerful tool for analysis of large sets of scientific texts. It has been applied to the design and discovery of battery materials [10], complex oxides [11], zeolites [12, 13], nanoparticles [14], and more [15, 16]. However, development and application of NLP to the design of structural alloys are still in early stages. Sample research includes the text-mining of millions of papers to efficiently design high-entropy alloys [17] as well as predicting the pitting potential for corrosion-resistant alloy design using embeddings of literature excerpts [18]. Relevant to aluminum, Liu et al. have created a labeled dataset of material entities from the literature focused on the Al-Si alloy system [19]. Their use of active learning to supplement their manual labeling of entities, however, highlights the need for an automated high-throughput extraction method applicable to different regions of the alloy space. On the other hand, Pfeiffer et al. considered the entire range of aluminum alloy series and extracted 14,884 aluminum alloy compositions, along with 1,278 properties from 5,172 research papers [20]. While covering wide independent ranges and distributions of engineering properties, their database does not contain links between compositions and properties.

To address this gap, we develop an NLP framework that automatically extracts phases from the literature together with their "sentiment," i.e., their positive or negative impact on properties. We leverage large language models (LLMs) to perform a wide variety of NLP tasks, including named entity recognition (NER) and relationship extraction (RE), without the need for extensive manually labeled datasets [21]. By performing automated collection of relevant sentences, NER, and relationship inference using transformer-based models and LLMs, we create a database of existing phase–property relationships. We demonstrate the use of this database for gaining insights, which we confirm against established metallurgical knowledge. We focus on aluminum alloys, but the framework presented here is flexible and can be applied to other metallic systems. We develop the framework in Sect. 2 and show how it can derive key insights from the aluminum system in Sect. 3. The framework's uses and implications for researchers are discussed in Sect. 4.

NLP Framework for High-Throughput Extraction of Phase–Property Relationships

In this section, we present an NLP framework for extracting phase–property relationships in alloys from the literature and apply it to aluminum alloys. We (i) collect a corpus of relevant papers, (ii) extract sentences from the full texts of these papers, (iii) perform NER and extract phase–property relationships, and (iv) aggregate or disambiguate the extracted entities (Fig. 1).

Fig. 1 Summary of our NLP framework

Paper Corpus Collection

We first build a corpus of papers related to aluminum alloys culled from our in-house database of more than 5.7 million full texts of papers published in academic journals [22]. Our search for relevant papers included two strategies: (i) rule-based regular expression (regex) matching of words in titles and abstracts of the in-house database and (ii) querying the Scopus database [23]. In the first search strategy, we used the following five rules, which checked the presence of:

  • The words alumin(i)um and alloy in the title,

  • Alloy denominations in the title (e.g., "Al6061"),

  • Alloy series in the title (e.g., "7xxx", "6xxx"),

  • Alloy names using chemical elements (e.g., "Al-Si", "Al-Mg-Sc-Zr") along with the word alloy in the title,

  • Alloy numbers consisting of 3 or 4 consecutive digits (e.g., "5182", "A382") with a mention of alumin(i)um in the title or abstract.

A paper satisfying any of those rules was considered an aluminum text. This search found a total of 19,356 articles in our database of full texts. To complement it, we also queried the Scopus database for papers on the subject of aluminum alloys. We queried papers that contained the strings "alumin*um" or "Al-" and "alloy" in the titles but excluded those having "-Al" to remove papers with aluminum as an alloying element. The Scopus queries provided a further 1,164 articles that were not already present in the list of relevant papers identified with the regex search. With the combined list of articles on aluminum alloys, we downloaded the full texts of these articles from our in-house database to obtain a final corpus of 20,520 full texts in JSON format.
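The five title and abstract rules can be sketched with Python's `re` module. This is a minimal illustration, not the expressions actually used to build the corpus, which are not given here: every pattern and the helper name `is_aluminum_paper` are assumptions, and the last rule is assumed to search the title and abstract together.

```python
import re

# Illustrative patterns only; the exact expressions used for the corpus are not given.
ALUMINUM = re.compile(r"\balumin(?:i)?um\b", re.IGNORECASE)   # aluminum / aluminium
ALLOY = re.compile(r"\balloys?\b", re.IGNORECASE)
DENOMINATION = re.compile(r"\bAl\s?\d{4}\b")                  # e.g. "Al6061"
SERIES = re.compile(r"\b[1-8]xxx\b", re.IGNORECASE)           # e.g. "7xxx"
# "Al" must start the element chain, so "Ti-Al-V" is not treated as an Al alloy.
ELEMENT_NAME = re.compile(r"(?<![\w-])Al(?:-[A-Z][a-z]?)+\b")  # e.g. "Al-Mg-Sc-Zr"
ALLOY_NUMBER = re.compile(r"\b[A-Z]?\d{3,4}\b")               # e.g. "5182", "A382"


def is_aluminum_paper(title: str, abstract: str = "") -> bool:
    """Return True if any of the five rules fires (hypothetical helper)."""
    text = f"{title} {abstract}"
    rules = (
        bool(ALUMINUM.search(title) and ALLOY.search(title)),
        bool(DENOMINATION.search(title)),
        bool(SERIES.search(title)),
        bool(ELEMENT_NAME.search(title) and ALLOY.search(title)),
        # Assumption: number and alumin(i)um mention may each be in title or abstract.
        bool(ALLOY_NUMBER.search(text) and ALUMINUM.search(text)),
    )
    return any(rules)
```

Note that the crude number pattern also matches years such as "2020", so in practice the last rule would likely need extra filtering.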

Sentence Dataset Collection

From the paper corpus, we then extract the sentences that contain information on phases and properties to build a sentence dataset. We choose the sentence as the main unit of text because papers in the metallurgical literature often discuss multiple phases and properties in a single paper. Focusing on a smaller unit of text reduces the possibility of ambiguous relationships, whereas larger units of text (e.g., paragraphs) make it harder to extract unambiguously coupled phase–property pairs and the sentiment of their relationships. Furthermore, we hypothesize that the metallurgical literature describes phases and their impact on properties at the sentence level in sufficient detail for insights to be gathered. Finally, focusing on a small unit of text enables the use of a wide spectrum of NLP tools and LLMs, including those with limited context windows.
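The ambiguity argument can be made concrete with a toy example: pairing every phase with every property mentioned in a unit of text produces more spurious candidates at the paragraph level than at the sentence level. The sketch below is only an illustration. The keyword lists, the naive regex sentence splitter, and the example sentences are all assumptions, and a real pipeline would need a splitter that handles abbreviations such as "e.g." and "Fig.".

```python
import re
from itertools import product

# Hypothetical keyword lists, for illustration only.
PHASES = ("Mg2Si", "Al5FeSi")
PROPERTIES = ("strength", "ductility")


def split_sentences(paragraph: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def candidate_pairs(text: str) -> list[tuple[str, str]]:
    """All (phase, property) combinations co-occurring in the given text."""
    phases = [p for p in PHASES if p in text]
    props = [q for q in PROPERTIES if q in text.lower()]
    return list(product(phases, props))


paragraph = (
    "Mg2Si precipitates increase strength. "
    "Coarse Al5FeSi platelets reduce ductility."
)

# Paragraph level: 2 phases x 2 properties = 4 candidates, 2 of them spurious.
paragraph_pairs = candidate_pairs(paragraph)
# Sentence level: one unambiguous candidate per sentence.
sentence_pairs = [candidate_pairs(s) for s in split_sentences(paragraph)]
```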

The prototypical sentence that we targeted to include in the sentence dataset reads as "[Phase A] leads to an increase in [property B]". Such sentence extraction can be approached as a classification problem, i.e., whether or not a given sentence contains the phase–property information, or whether or not it resembles our prototypical sentence. Here, we chose BERT-type transformer models coupled with a classification head to perform this task. For best performance, we fine-tuned and evaluated four BERT models: the uncased versions of the original BERT [52], which in turn can form a foundation for interactive systems of fast and user-friendly retrieval of materials information. Our sentence dataset can serve as an information-dense source of text data that provides domain-specific context for conversational LLMs (e.g., for retrieval-augmented generation [53]).

Finally, we note the key role of LLMs in building our framework without the need for excessive amounts of manually labeled data. Specifically, we observed remarkable performance of LLMs in NER and RE tasks using only a handful of labeled examples (Sect. 2.3). Manually annotating sentences for NER and RE tasks with more traditional NLP approaches would have been extremely time-consuming. Furthermore, using few-shot learning, we could significantly improve the model performance without expensive fine-tuning. The sentence classification was addressed by fine-tuning BERT-type models, for which building a manually annotated dataset requires significantly less effort than for NER and RE tasks.
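Few-shot prompting of this kind can be sketched as follows. The sketch assembles a prompt from a handful of labeled examples and parses a JSON reply; it makes no call to any model. The instruction wording, the output schema, the example sentences, and the helper names `build_prompt` and `parse_answer` are all assumptions made for illustration, not the prompts used in this work.

```python
import json

# Hypothetical labeled examples; the actual few-shot examples are not given here.
FEW_SHOT_EXAMPLES = [
    {
        "sentence": "Fine Mg2Si precipitates increase the strength of the alloy.",
        "relations": [
            {"phase": "Mg2Si", "property": "strength", "sentiment": "positive"}
        ],
    },
    {
        "sentence": "Coarse Al5FeSi platelets reduce ductility.",
        "relations": [
            {"phase": "Al5FeSi", "property": "ductility", "sentiment": "negative"}
        ],
    },
]

INSTRUCTION = (
    "Extract every phase, every property, and the sentiment (positive or "
    "negative) of the phase's impact on the property. Answer with a JSON list."
)


def build_prompt(sentence: str, examples=FEW_SHOT_EXAMPLES) -> str:
    """Assemble instruction, labeled examples, and the query sentence."""
    lines = [INSTRUCTION]
    for ex in examples:
        lines.append(f"Sentence: {ex['sentence']}")
        lines.append(f"Answer: {json.dumps(ex['relations'])}")
    lines.append(f"Sentence: {sentence}")
    lines.append("Answer:")
    return "\n".join(lines)


def parse_answer(raw: str) -> list[dict]:
    """Keep only well-formed relation records; return [] on malformed output."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    required = {"phase", "property", "sentiment"}
    return [r for r in data if isinstance(r, dict) and required <= r.keys()]
```

Requesting structured output and discarding malformed replies is one simple way to keep unparseable model answers out of a database.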

Limitations and Future Opportunities

In this work, we developed a framework for high-throughput extraction of phases, properties, and their relationships from published literature on aluminum alloys. Ideally, the framework should be fully automated; in its current state, however, some (semi-)manual intervention was still needed, most notably the aggregation of alias notations of the same phases or properties and their verification. For example, 15% of the extracted samples of the property "strength" have been aliased from otherwise worded terms referring to strength. Similarly, our database contains the property "corrosion," which aggregates not only the term "corrosion" itself but also other related terms that constituted 79% of the final aggregated "corrosion" samples. We expect that rapid progress in NLP and LLMs will eliminate the need for these additional steps and allow extraction of one-to-one relationships of unique phases and properties.

This study focused on qualitative relationships between phases and properties, i.e., whether a given phase has a positive or negative impact on a property. Future efforts in this direction can pursue quantitative relationships as well as the additional extraction of alloy chemical compositions to further aid computational alloy design.

Finally, Fig. 5 shows that about 70–80% of the relationships described in the literature are positive. This indicates a clear bias towards reporting "positive results," e.g., phases and phenomena that are beneficial to alloy properties. This bias results in unbalanced extracted datasets regardless of how good the NLP extraction framework is. Computational alloy design leveraging state-of-the-art NLP could benefit from more balanced reporting of both negative and positive research results by the community.
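The class imbalance can be quantified per property as the fraction of extracted relationships labeled positive. The following sketch assumes a simple record layout with `property` and `sentiment` keys and uses toy records; neither the schema nor the numbers come from the actual database.

```python
from collections import defaultdict


def positive_share_by_property(records: list[dict]) -> dict[str, float]:
    """Fraction of positive relationships for each property."""
    tally = defaultdict(lambda: [0, 0])  # property -> [positive count, total count]
    for rec in records:
        if rec["sentiment"] not in ("positive", "negative"):
            continue  # skip records without a clear sentiment label
        tally[rec["property"]][1] += 1
        if rec["sentiment"] == "positive":
            tally[rec["property"]][0] += 1
    return {prop: pos / total for prop, (pos, total) in tally.items()}


# Toy records, for illustration only.
records = [
    {"phase": "Mg2Si", "property": "strength", "sentiment": "positive"},
    {"phase": "Al3Sc", "property": "strength", "sentiment": "positive"},
    {"phase": "Al3Zr", "property": "strength", "sentiment": "positive"},
    {"phase": "Al5FeSi", "property": "strength", "sentiment": "negative"},
    {"phase": "Al3Sc", "property": "ductility", "sentiment": "positive"},
    {"phase": "Al5FeSi", "property": "ductility", "sentiment": "negative"},
]
shares = positive_share_by_property(records)
```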

Conclusion

In summary, we present a novel methodology for extracting phase–property relationships from the metallurgical literature using natural language processing and large language models. The study focuses on the aluminum system and leverages NLP and LLMs to systematically organize knowledge from a vast corpus of research papers. The insights generated from the extracted database show its value as a guide for alloy designers and researchers seeking to optimize alloy performance.

The results presented here show that this framework is useful for rapidly extracting insights from the literature on alloys. The knowledge we have derived on the aluminum system would traditionally be held in textbooks that take experts years to write. As research on alloy properties continues to grow, these tools will become indispensable for quickly screening the literature and gaining insights.