Introduction

High-entropy alloys (HEAs), also called multi-principal element alloys1,2,3, are chemically disordered but topologically ordered, forming random solid-solution (SS) structures such as face-centered cubic (FCC), body-centered cubic (BCC), or hexagonal close-packed (HCP). Understanding the composition–structure–property relationship has long been a topic of great interest in HEAs. Extensive studies have therefore been carried out on various HEAs, and many attractive properties have been achieved in the last two decades. These properties include good plasticity, high strength and hardness, outstanding resistance to high-temperature softening, and unique electrical and magnetic properties. In the past few years, high-entropy materials have expanded beyond metallic systems to ceramics made of carbides, borides, or nitrides of group IV and V transition metals, which have remarkable properties4,5,6. Owing to these unique properties and the large composition space, high-entropy materials have promising potential applications under extreme conditions, such as in high-temperature structural components, corrosion-resistant parts, coatings, and nuclear materials7.

However, with regard to the property-oriented design of HEAs, some challenges remain to be solved. (1) Owing to the chemically disordered structure, HEAs are not necessarily equimolar compositions; that is, many potential elements in the periodic table can conceivably be incorporated into HEAs via microalloying or principal element substitution. Therefore, an essentially infinite number of HEAs are available. Since the compositions of HEAs can be adjusted continuously, the properties of interest can be optimized. Conceptually, this poses a serious challenge: how can potential HEAs with properties of interest be fine-tuned efficiently in such a large composition space rather than in a conventional “trial and error” manner8? (2) Given that a full understanding of the complicated interplay between constituents and properties is a prerequisite for designing new HEAs, how can the intrinsic relationships in a vast and complex database be uncovered? To date, inspired by the Materials Genome Initiative (MGI), high-throughput techniques (preparation, characterization, and calculation) and data-driven machine learning (ML) methods have been adopted to combine experiment, theory, and computation in a tightly integrated, high-throughput manner, and to predict and optimize HEAs at an unparalleled scale and in an effective way9. These tools can be used to screen an extensive composition space for a desired property and simultaneously pinpoint specific alloys with the desired properties. Specifically, high-throughput techniques are able to bridge the gap between experiments and ML modeling; that is, high-throughput approaches can provide valuable materials information for the subsequent ML, and, conversely, ML can provide intelligent feedback to the experiments10,11,12. Through continuing efforts to integrate experiment, computation, and data-driven ML, the structure–property relationships underlying the materials genome can be revealed and thus seed a new generation of advanced HEAs13.

This review aims to present a brief state-of-the-art overview of the materials genome strategy (MGS) applied to HEAs and to provide a timely focus on key developments, including challenges and opportunities, in this interdisciplinary area. The “Introduction” gives a brief account of the development of HEAs and the application of MGI in this field, and briefly lists some challenges. In section “High-throughput preparation and characterization of HEAs”, the main high-throughput preparation and characterization techniques for HEAs are discussed in detail, and critical issues that need to be solved are identified. In section “High-throughput computing for HEAs”, we present and discuss applications of high-throughput computation methods in accelerating the development of HEAs. An in-depth discussion of data-driven ML strategies for HEAs is provided in section “Data-driven machine learning strategies”. Finally, in the “Outlook” section, we give an outlook on potential research activities to be explored and the main scientific challenges to be addressed in the future. The core purpose of this brief review is to advance the understanding of MGS as employed in HEAs and to offer researchers a platform to foster new ideas.

High-throughput preparation and characterization of HEAs

The design of HEAs poses a significant challenge when exploring the phase structure and desirable properties across the vast multicomponent compositional space available14. Unconventional high-throughput preparation techniques are therefore crucially important, particularly for effectively narrowing down the candidate alloys in a wide composition space. A variety of such techniques have been applied to HEAs, including combinatorial thin film deposition, laser additive manufacturing (LAM), rapid alloy prototyping, diffusion multiples, and welding-based methods. In what follows, we give an overview of the different high-throughput techniques that have been used to prepare multi-component HEAs and point out some critical issues that need to be resolved.

High-throughput preparation techniques for HEAs

LAM

Combinatorial LAM offers both high heating and cooling rates and has been used as an efficient method for the synthesis of HEAs. Among the various LAM methods, laser metal deposition (LMD) is the preferred technique for making HEA combinatorial libraries. During the LMD process, the feedstock nozzles convey the raw material powder in an inert gas flow to a rapidly moving melt pool formed by a laser. LMD is well suited to high-throughput synthesis owing to its real-time, variable feeding system, which uses two or more hoppers with different powder feeders to permit changes in the deposited powder composition15,16,17,18,19,20,21.

Combinatorial laser deposition of compositionally graded complex alloys has been regarded as an attractive approach for assessing the composition–microstructure–property relationships of HEAs. LMD is quite capable of synthesizing refractory HEAs that are otherwise difficult to make19. Melia et al. prepared a MoNbTaW alloy system by additive manufacturing from commercial refractory elemental powders with good spherical morphology, combining the additive manufacturing process with mechanical testing to enable rapid alloy exploration, as shown in Fig. 1. In the steady state, there was an evident linear spatial trend in the composition and a significant variation of hardness with composition, dominated by solution strengthening (Fig. 1d)19. Compared with other mechanical properties (e.g., strength, plasticity, and toughness), hardness is the simplest to obtain efficiently by automated mechanical testing in regions of different composition on small samples. In view of the hardness–strength relationship (\(H_{v} \approx 3\sigma_{y}/9.81\), with the yield strength \(\sigma_{y}\) in MPa)22, hardness allows for indirect and efficient evaluation of mechanical properties.
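The hardness–strength relation quoted above can be inverted to turn a hardness line scan into a rough yield-strength profile. The following is a minimal sketch in Python; the function names and the sample hardness values are hypothetical and purely illustrative, and the factor of 3 is empirical, so the result should be read as an estimate rather than a measured strength.

```python
# Minimal sketch of the relation H_v ≈ 3 * sigma_y / 9.81 quoted in the text,
# with sigma_y in MPa and H_v in HV (kgf/mm^2). All names are hypothetical.
KGF_PER_MM2_TO_MPA = 9.81
CONSTRAINT_FACTOR = 3.0  # empirical factor; it can differ between alloy classes


def yield_strength_from_hardness(hv):
    """Estimate the yield strength (MPa) from a Vickers hardness value (HV)."""
    return hv * KGF_PER_MM2_TO_MPA / CONSTRAINT_FACTOR


def hardness_from_yield_strength(sigma_y_mpa):
    """Estimate the Vickers hardness (HV) from a yield strength (MPa)."""
    return CONSTRAINT_FACTOR * sigma_y_mpa / KGF_PER_MM2_TO_MPA


# Made-up hardness readings along a compositionally graded sample.
hardness_profile = [400.0, 450.0, 500.0]
strength_profile = [yield_strength_from_hardness(h) for h in hardness_profile]
```

Because the conversion is linear, the ranking of compositions by hardness carries over directly to the estimated strength, which is what makes hardness mapping a useful screening step.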

Fig. 1: Analysis of the additive manufacturing processed (MoTaW)x(Nb)1−x compositionally graded part cross-section.

a An optical image. b {100} Pole figure oriented parallel to the build direction with maximum intensity of 5.19 MRD. c IPF map. d The composition and hardness gradients along the height of the part. The arrows in d show the two axes of the hardness data19 (adapted with permission from ref. 19. Copyright 2020 Elsevier).

Borkar et al. studied compositionally graded AlxCrCuFeNi2 (0 < x < 1.5) HEAs produced by laser deposition from a blend of elemental powders, using a double powder feeder with two hoppers containing CrCuFeNi2 and Al2CrCuFeNi2 powders, respectively. A cylindrical sample was deposited with a smooth change of alloy composition along its height15. A similar laser deposition method, laser-engineered net shaping (LENS), was also applied to construct compositional and microstructural libraries of AlxCoCrFeNi in a high-throughput manner18. The difference between the LENS study and the above-mentioned case was that the substrate for LENS (a CoCrFeNi plate) was made beforehand by arc melting and copper-mold casting, while in Borkar’s work, a blend of powders with a nominal composition of CrCuFeNi2 was used. During the LENS process, the laser power and travel speed remained unchanged, and the feeding rate of Al powder for each monolayer patch was increased in set increments. The entire deposition process included the addition of Al and two subsequent remelting passes perpendicular to the deposition direction to improve the mixing and compositional homogeneity of the alloyed region18.
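The two-hopper arrangement described above fixes the local Al content through the blend ratio of the two feeds. As a minimal sketch, assuming complete mixing in the melt pool and counting the blend per formula unit (both assumptions of this example, not details reported in the cited work), the Al index x and the corresponding atomic percentage can be computed as follows; all function names are hypothetical.

```python
def al_index(fraction_al_rich, x_lean=0.0, x_rich=2.0):
    """Return x in Al_x CrCuFeNi2 for a blend of the two hopper feeds.

    fraction_al_rich is the fraction of formula units supplied by the
    Al2CrCuFeNi2 hopper (x_rich = 2); the rest comes from CrCuFeNi2 (x_lean = 0).
    Assumes ideal mixing, which is an assumption of this sketch.
    """
    if not 0.0 <= fraction_al_rich <= 1.0:
        raise ValueError("fraction_al_rich must lie between 0 and 1")
    return (1.0 - fraction_al_rich) * x_lean + fraction_al_rich * x_rich


def al_atomic_percent(x):
    """Atomic percent of Al in Al_x Cr1 Cu1 Fe1 Ni2 (x + 5 atoms per formula unit)."""
    return 100.0 * x / (x + 5.0)


# Hypothetical feed schedule: seven layers ramping the Al-rich feed from 0 to 75 %.
n_layers = 7
schedule = [0.75 * i / (n_layers - 1) for i in range(n_layers)]
gradient = [(al_index(f), al_atomic_percent(al_index(f))) for f in schedule]
```

A linear ramp of the feed fraction gives a linear ramp in x but a slightly nonlinear ramp in atomic percent, which is worth keeping in mind when comparing a planned gradient with measured compositions.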

The design and parameter adjustment of the LMD process have an important effect on sample preparation. For example, the substrate greatly influences the composition and microstructure of the deposited alloys; this influence can be reduced by increasing the stack thickness or by a reasonable experimental design. The former not only increases the preparation cost but also affects the microstructural uniformity. A reasonable experimental design could consist of selecting the main component of the alloy as the substrate material, depositing the sample in the thickness direction so that it is less affected by the substrate, and controlling the composition gradient17,23.

Combinatorial deposition of thin film materials libraries

Combinatorial thin film synthesis by sputtering from multiple deposition sources is a state-of-the-art route for constructing materials libraries that span a wide range of gradually changing alloy compositions24,25. Continuous preparation of multiple gradients can be achieved by adjusting processing parameters such as the compositions of the targets, the power and angle of each gun, and the material and rotation of the substrate. For HEAs, several sputtering-based approaches have been employed for alloy design by tuning these parameters25,26,27,44.

Shukla et al. performed micro-hardness tests at four depth levels below the top surface of a sample, with indents spaced 0.5 mm apart along the alloying path, so that the hardness and moduli were obtained systematically. The increase in moduli and hardness values was attributed to solute–matrix interaction. The as-cast ε phase-dominant microstructure showed a modulus of ∼153 GPa, while a completely γ microstructure with supersaturated Cu content reached up to ∼224 GPa52. When measuring the nanohardness and modulus of multicomponent samples, the nanoindentation path is usually set along the gradient direction of a specific alloying element, and the discrete points and micro-hardness can be analyzed for multiscale observation70. According to the hardness–strength relationship, one can efficiently select potential HEA candidates with the desired mechanical properties, such as higher yield strength, in a relatively large composition space. Although the absolute values of the mechanical properties may differ between high-throughput and bulk samples, the trends in composition and properties of interest can still provide effective information for the design of new HEAs.

Due to the small scale of the HEAs prepared by the above-mentioned high-throughput methods (e.g., additive manufacturing and sputtering), it is usually difficult to cut bulk specimens from the thin layers or coatings. The small punch test (SPT) is an evolving small-specimen test technique with the potential to extract mechanical properties (ductility, elastic modulus, yield strength, ultimate tensile strength, fracture toughness, etc.) from small-volume HEA specimens prepared by high-throughput methods67,69. A prerequisite for using this test is to establish, a priori, correlations between SPT and conventional mechanical tests, such as tensile testing, for HEAs. However, the SPT response is easily influenced by test parameters such as specimen shape and thickness, test speed, and ball diameter. It is therefore imperative to understand the effects of these parameters and to optimize them so as to obtain nearly unique SPT responses, at least for a class of HEA materials. The conventional and SPT results then need to be related through empirical and analytical relations.

Additionally, the cooling rates of the commonly used additive manufacturing and sputtering high-throughput methods are much higher than those of the traditional casting methods used to prepare bulk HEAs. Notably, in some extreme cases, owing in part to the multi-principal-element nature of HEAs, the HEA coatings or layers made by high-throughput methods can form amorphous structures, which makes their mechanical properties quite different from those of bulk HEAs. Thus, optimizing the preparation parameters so that the cooling rate approaches that of casting is important for the formation of HEAs. In sum, although there are some discrepancies between thin-film and bulk HEA materials, SPT methods can at least determine the mechanical properties and guide researchers toward better HEAs in such a large composition space.

Physical properties

Magnetic properties, among the typical physical properties of HEAs, depend heavily on the size, microstructure, and preparation process of the sample. Many efforts have been made to measure and map magnetic properties at very high spatial resolution. Borkar et al. presented a new combinatorial approach, based on laser additive deposition of compositionally graded alloys, for the rapid assessment of the composition–microstructure–magnetic property relationships in AlxCrCuFeNi2 (0 < x < 1.5) HEAs. Along the alloy gradient, the microstructure changes with increasing Al content from FCC solid solution to FCC/L12, then to mixed FCC/L12 + BCC/B2, and finally to predominantly BCC/B2. Owing to this change in microstructure, the low-Al FCC/L12 regions are weakly ferromagnetic, while the BCC/B2 regions with higher Al contents are strongly ferromagnetic, exhibiting lower coercivity and higher saturation magnetization15. For the FeMnCoCrAl HEA system, Marshal et al. developed thin-film libraries for the combinatorial evaluation of phase formation and magnetic properties, combined with spatially resolved atom probe tomography and DFT simulation. It was found that the addition of Al promotes the formation of a BCC structure, which exhibits soft ferromagnetic behavior. A further increase in the non-ferromagnetic Al content beyond 8 wt% decreased the overall saturation magnetization due to the substitution of ferromagnetic species by paramagnetic Al and to lattice distortions, in agreement with DFT predictions32. As these cases show, high-throughput techniques are efficient in explaining microalloying effects on the magnetic properties of HEAs and therefore have great potential for the future design of soft magnetic HEAs with better performance. However, it should also be noted that the size effect and magnetocrystalline anisotropy caused by the thin-film geometry may lead to some artifacts, which can be eliminated by increasing the thickness of the as-prepared film/layer libraries or by changing the measurement direction during magnetic testing.

Besides saturation magnetization, high-throughput studies of other physical properties of HEAs are rather sparse. However, when the scope is expanded to other materials synthesized via high-throughput techniques, various physical properties of interest have been examined. For example, useful combinatorial methods for examining magnetic properties include magnetic force microscopy and scanning magneto-optical Kerr effect imaging40. In addition, the Decay microwave probe microscope, with very high micro-region resolution, can measure magnetic properties, including susceptibility and spin resonance. Combined with automatic sample stage control and data acquisition, it makes possible the high-throughput automatic electromagnetic measurement of composite material chips71,89.

Various software packages have been developed based on the CALPHAD approach. A typical commercial package is Thermo-Calc, which includes high-throughput modules such as TC-Python that allow users to run batch calculations over many varied parameters in a high-throughput manner. Many attempts have been made to develop thermodynamic modeling for a variety of alloy systems using the high-throughput CALPHAD method, including phase diagrams and thermodynamic properties90,91,92,93,94,95. In view of the limitations of the empirical VEC rule in different HEA systems, Zhong et al. recently proposed a data screening procedure to develop new HEAs via a high-throughput CALPHAD approach (as shown in Fig. 7)94 and found the relationship between phase formation behavior and VEC. Additionally, Zhang et al.90 reported a sufficiently large database of the Al–Co–Cr–Cu–Fe–Ni HEA system to calculate the primary solidification phase. Klaver et al.93 used Thermo-Calc to determine the phase evolution behavior of AlCrMnMoTi, AlCrMoNbTiV, AlCrMnNbTiV, and AlCrFeTiV alloys at different temperatures and found that AlCrMnNbTiV and AlCrMoNbTiV were better HEA formers. Gurao and Biswas91 studied 1287 equiatomic quinary alloys using the CALPHAD method to find single-phase FCC and BCC HEAs. Guided by the calculation results, they arrived at optimized alloy compositions by preparing only two FCC alloys and seven BCC alloys, which dramatically increased the efficiency of alloy design. In particular, CALPHAD can predict the phase diagram under extreme conditions, such as high temperatures and high pressures, which are difficult to explore experimentally.
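The scale of such a screen is easy to reproduce: 1287 is the number of five-element subsets of a 13-element palette. The sketch below enumerates these equiatomic quinaries in Python and labels each with the empirical VEC rule as a cheap stand-in for the CALPHAD calculation used in the cited studies. The 13-element palette is hypothetical (the text does not list the elements), and the thresholds of 8.0 and 6.87 are the commonly quoted VEC limits, which are not given in the text; as noted above, the rule has known limitations.

```python
from itertools import combinations
from math import comb

# Valence electron counts per element. The choice of these 13 elements is a
# hypothetical palette for illustration, not the one used in the cited work.
VEC = {"Al": 3, "Ti": 4, "V": 5, "Cr": 6, "Mn": 7, "Fe": 8, "Co": 9,
       "Ni": 10, "Cu": 11, "Zr": 4, "Nb": 5, "Mo": 6, "Ta": 5}


def vec_equiatomic(elements):
    """Valence electron concentration of an equiatomic alloy."""
    return sum(VEC[e] for e in elements) / len(elements)


def vec_label(vec, fcc_min=8.0, bcc_max=6.87):
    """Label by the empirical VEC rule (thresholds are assumptions of this sketch)."""
    if vec >= fcc_min:
        return "FCC"
    if vec < bcc_max:
        return "BCC"
    return "FCC+BCC"


# Enumerate every equiatomic quinary that can be drawn from the palette.
quinaries = list(combinations(sorted(VEC), 5))
assert len(quinaries) == comb(13, 5)

# Label each candidate and tally the labels.
screened = {alloy: vec_label(vec_equiatomic(alloy)) for alloy in quinaries}
counts = {}
for label in screened.values():
    counts[label] = counts.get(label, 0) + 1
```

In a real workflow the `vec_label` call would be replaced by a batch thermodynamic calculation for each composition; the enumeration and bookkeeping around it stay the same.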

Fig. 7: The schematic of discovering HEAs with a high-throughput CALPHAD approach.

Al–Co–Cr–Fe–Ni quinary systems were used as the case study to investigate the reliability of the VEC rule and its application to materials design94 (adapted with permission from ref. 94. Copyright 2020 Elsevier).

As a newly emerging technology, HTC still faces critical challenges. First, most integrated calculation programs currently available are based on first-principles calculations; thus, the materials data are obtained from systems of a few to dozens of atoms, and HTC needs to be developed further toward larger scales. In this regard, combining ML and first principles to develop high-precision potential functions for MD simulations is a significant attempt96. Second, the classification of the accumulated materials data is still vague, making it difficult to maintain a materials database in the future. The data should be clearly divided according to an authoritative materials classification system. In addition, the data format should be strictly followed in the acquisition process. For an in-depth understanding of HEAs, given their multiple principal elements and their thermodynamically metastable state, there is an urgent need to develop a reliable thermodynamic database that contains a series of composition, temperature, and phase-equilibrium data for HEA systems. In this regard, the related binary and ternary systems should be gathered and assessed by implementing experiments and calculations on HEA systems.

Data-driven ML strategies

The enormous composition space for designing HEAs offers not only opportunities but also great challenges, requiring intelligent and efficient strategies for materials discovery. Data-driven methods such as ML, which are used to study the wealth of existing experimental and computational data, have become a very exciting and burgeoning area of research in materials science. ML refers to programs that automatically improve their ability to perform tasks by learning from experience in many scenarios. This automates the time-consuming knowledge acquisition process, which is essential to speed up computing and reduce the cost of developing data-based systems. Given enough data and a rule-discovery algorithm, computers can analyze the trends in datasets and help one to understand the relationships between properties and different parameters, which is beneficial in guiding materials modeling. ML is most useful in situations in which human learning is impractical, such as when the data and the interactions within the data are too complicated and intractable for human understanding and conceptualization97.

Datasets for HEAs

The first and most important step in ML is to generate robust datasets for training the ML model. The selection of suitable data can be deceptive in ML, which is why so much emphasis is placed on the visualization of the datasets98. The construction of a dataset is task-oriented; that is, the final prediction plays a decisive role in what type of data should be collected.

ML studies of HEAs have mainly focused on the formation of single-phase solid solutions (i.e., BCC, FCC, and HCP), while some work has addressed mechanical properties such as hardness and modulus. Compared with traditional metallic materials, HEAs are newcomers that have been studied for only about two decades. To date, most HEA data have been collected from published experimental work or from simulations. Miracle and Senkov’s review summarizes a dataset containing 648 entries of HEAs in different systems14. Based on this dataset, Zhuang et al. constructed a dataset of 401 HEAs, consisting of 174 SS phases, 54 intermetallics (IM), and 173 SS + IM phases, by removing duplicate alloys with the same composition99. Later, in 2020, Gao et al. built a dataset of 1252 samples (625 single-phase and 627 multi-phase alloys) covering binary and multi-component systems100. Besides experimental data, computational methods, such as high-throughput ab initio and DFT-based approaches, are used as an alternative source of phase formation information. Curtarolo et al. developed a high-throughput ab initio method called LTVC (Lederer–Toher–Vecchio–Curtarolo) to predict the transition temperature of multi-component systems88. In this way, a dataset containing a total of 1798 unique equiatomic compositions was constructed, consisting of 117 binaries, 441 ternaries, 1110 quaternaries, and 130 quinaries. Based on this dataset, Vecchio et al. built a data-driven workflow for predicting the composition–phase–structure relationship101.
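Removing repeated compositions, as in the 648-to-401 reduction described above, requires a canonical form for each alloy so that the same composition written in different ways is recognized as one entry. The following Python sketch shows one possible way to do this and to check the resulting class balance. It is not the procedure of the cited work; the element symbols "A" to "E" and the phase labels are placeholders, not measured data.

```python
from collections import Counter


def composition_key(composition, ndigits=3):
    """Canonical key: normalize to atomic fractions, sort by element, round."""
    total = sum(composition.values())
    return tuple(sorted(
        (element, round(amount / total, ndigits))
        for element, amount in composition.items() if amount > 0
    ))


def deduplicate(entries):
    """Keep the first phase label seen for each distinct composition."""
    unique = {}
    for composition, phase in entries:
        unique.setdefault(composition_key(composition), phase)
    return unique


# Placeholder entries; elements and labels are illustrative only.
raw_entries = [
    ({"A": 1, "B": 1, "C": 1, "D": 1}, "SS"),
    ({"D": 25, "C": 25, "B": 25, "A": 25}, "SS"),   # same alloy, given in at.%
    ({"A": 1, "B": 1, "C": 1, "D": 1, "E": 1}, "SS"),
    ({"A": 1, "B": 1, "C": 1, "D": 1, "E": 3}, "SS+IM"),
]

unique_entries = deduplicate(raw_entries)
class_counts = Counter(unique_entries.values())
```

Keeping the first label is a simplification: when duplicated compositions carry conflicting phase labels (for example, from different processing routes), a real curation step would have to resolve the conflict explicitly rather than silently keep one of them.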

Besides the phase formation data, there are property datasets for HEAs. Using an integrated CALPHAD-ML approach, Sun and Lu et al. predicted the hardness of Ti–Zr–Nb–Ta refractory HEAs (RHEAs); the workflow included building a database of 100 quaternary alloys, training the ML model, predicting hardness, and experimental verification102. The database of alloy composition and hardness data for the Ti–Zr–Nb–Ta RHEAs was established with the aid of CALPHAD. To search for high-entropy ceramics, Vecchio et al. applied an ML framework to 56 previously reported entropy-formation ability values, including nine synthesized compositions (six single-phase and three multi-phase). The high-entropy ceramics in the dataset are mainly composed of eight carbide-forming metal elements (Hf, Nb, Ta, Ti, Mo, V, W, and Zr)103. Regarding the modulus, Chen et al. combined first principles and ML to predict the elasticity of severely lattice-distorted HEAs, with experimental validation. The ML models were trained on 6826 ordered inorganic compounds from the Materials Project database to predict the Voigt–Reuss–Hill averages of the bulk and shear moduli with log-normalization104. For experimental modulus data, Roy et al. compiled a Young’s modulus dataset of only 87 HEA entries from the limited experimental reports available105. All the above-mentioned datasets are summarized in Table 1.
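For readers unfamiliar with the regression targets mentioned above, the sketch below computes the Voigt–Reuss–Hill (VRH) bulk and shear moduli for the special case of a cubic crystal from its three independent elastic constants, and applies a base-10 logarithm as the normalization. The restriction to cubic symmetry, the choice of base 10, and the function names are assumptions of this example; the text does not specify how the cited work implemented the log-normalization.

```python
import math


def vrh_cubic(c11, c12, c44):
    """Voigt-Reuss-Hill bulk and shear moduli of a cubic crystal.

    Inputs and outputs share the same units (e.g., GPa).
    """
    # For cubic symmetry the Voigt and Reuss bulk moduli coincide.
    bulk = (c11 + 2.0 * c12) / 3.0
    shear_voigt = (c11 - c12 + 3.0 * c44) / 5.0
    shear_reuss = 5.0 * (c11 - c12) * c44 / (4.0 * c44 + 3.0 * (c11 - c12))
    shear_hill = 0.5 * (shear_voigt + shear_reuss)
    return bulk, shear_hill


def log_normalize(modulus):
    """Map a positive modulus to log10 space (base 10 is an assumption)."""
    return math.log10(modulus)


def inverse_log_normalize(value):
    """Map a prediction made in log10 space back to a modulus."""
    return 10.0 ** value


# Illustrative, elastically isotropic input (c11 - c12 = 2 * c44), chosen so
# that the Voigt and Reuss shear bounds coincide; not data for a real alloy.
bulk, shear = vrh_cubic(200.0, 100.0, 50.0)
targets = (log_normalize(bulk), log_normalize(shear))
```

Training on logarithms compresses the wide range of moduli found across inorganic compounds, so that errors are effectively weighted in relative rather than absolute terms.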

Table 1 Datasets of phase and mechanical properties for HEAs

Despite substantial progress in the construction of datasets for HEAs, the available data remain far from sufficient. As a result, calculations and predictions based on these databases may deviate significantly from experimental results. Moreover, researchers tend to publish only favorable data, while unfavorable data points are often dropped. This leaves the datasets unbalanced and degrades the performance of subsequent ML models. Therefore, there is an urgent need to develop reliable and robust databases dedicated to HEAs. High-throughput preparation and characterization, as well as HTC, would be a reliable approach to the batch production of HEA libraries that include composition and property information.

Phase formation prediction

As a new paradigm for developing HEAs, the data-to-knowledge ML strategy has the potential to explore the complex structure and property space efficiently. It can also yield valuable insights into the key factors that determine macroscopic performance and thus guide the design of HEAs with enhanced properties. As mentioned above, ML in the field of HEAs relies on the availability of libraries of compositions, structures, and properties that have been assembled and scrutinized by experimental and computational methods. Given the different data sizes available, phase formation behavior (i.e., single solid solution formation for HEAs) has attracted the most attention from the academic community105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120. In addition, there are a growing number of studies on the physical or mechanical properties of HEAs. From the perspective of ML, these two cases correspond to classification and regression problems, respectively. In this section, we review ML techniques and discuss possibilities for the further development of ML in HEAs.

Phase formation behavior is crucial to the performance of HEAs. While computer simulations, such as first-principles calculations and MD simulations, have become commonly used tools for materials discovery, their computational expense limits their application in the accelerated exploration of potential HEAs. The recent implementation of data-driven techniques has provided a possible alternative for efficiently predicting phase formation in HEAs109,110,111,112,117,119,121,122. ML can recognize the patterns within data and construct a model to make quick predictions for unseen samples. Based on very sparse data, Raabe et al. proposed an active learning framework comprising three main steps (targeted composition generation, physics-informed screening, and experimental feedback) to accelerate the design of high-entropy Invar alloys in an almost infinite compositional space (see Fig. 8). Compared with the conventional design approach, which requires years and many experiments, this ML workflow requires only a few months to develop HEAs with desirable properties121. Wu et al. used ML to successfully predict eutectic HEAs with excellent mechanical properties in the Al–Co–Cr–Fe–Ni system and analyzed the key elements for forming eutectic HEAs117. Islam et al. established a neural network model to predict the formation of HEA phases; cross-validation revealed a predictive accuracy of 83% on this limited dataset109. Amitava et al. used more algorithms to establish multiple prediction models and to predict the different structures of the solid solution (FCC, BCC). The prediction accuracy exceeded 90%, which was attributed to the random forest model having clear advantages over the artificial neural network algorithm in dealing with small datasets111. Thus, understanding and applying multiple ML algorithms is necessary for the prediction of HEA phase formation. Moreover, to address the data shortage for HEAs, Lee utilized a conditional generative adversarial network to find a model distribution that emulates the distribution of known HEAs, generated realistic samples based on the feature representation, and thereby expanded the original dataset119. The results show that data augmentation significantly improved the accuracy of the model.
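The three-step loop described above (candidate generation, screening, and experimental feedback) can be illustrated with a deliberately simple, generic active learning sketch in Python. It is not the framework of the cited work, which couples generative models, DFT, and thermodynamic calculations. Here the "experiment" is a synthetic function of a single composition variable, the surrogate is a nearest-neighbour average, and the screening step is a placeholder; every name and constant is hypothetical.

```python
import random


def hidden_property(x):
    """Stand-in for an experiment: a synthetic property of composition x in [0, 1]."""
    return -(x - 0.62) ** 2


def predict(x, data, k=3):
    """Surrogate model: mean of the k nearest measurements.

    The distance to the nearest measured point serves as a crude uncertainty.
    """
    nearest = sorted(data, key=lambda point: abs(point[0] - x))[:k]
    mean = sum(y for _, y in nearest) / len(nearest)
    uncertainty = abs(nearest[0][0] - x)
    return mean, uncertainty


def active_learning(n_rounds=10, n_candidates=200, kappa=0.5, seed=0):
    rng = random.Random(seed)
    # Sparse initial data, as in the scenario described in the text.
    data = [(x, hidden_property(x)) for x in (0.0, 0.5, 1.0)]
    for _ in range(n_rounds):
        # Step 1: generate candidate compositions.
        candidates = [rng.random() for _ in range(n_candidates)]
        # Step 2: screening (placeholder for physics-informed filters).
        candidates = [x for x in candidates if 0.0 <= x <= 1.0]

        # Step 3: pick the candidate with the best upper confidence bound,
        # "measure" it, and feed the result back into the dataset.
        def score(x):
            mean, uncertainty = predict(x, data)
            return mean + kappa * uncertainty

        chosen = max(candidates, key=score)
        data.append((chosen, hidden_property(chosen)))
    return data


measurements = active_learning()
best_x, best_y = max(measurements, key=lambda point: point[1])
```

The parameter `kappa` sets the balance between exploiting compositions predicted to be good and exploring regions far from existing measurements, which is the central design choice in any loop of this kind.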

Fig. 8: Schematic flow chart of the active learning framework.

This framework aims to design the composition of HEAs, combining ML models, DFT calculations, thermodynamic simulations, and experimental feedback121 (adapted with permission from ref. 121. Copyright 2022 AAAS).

Compared with the original ML modeling method, using feature engineering to construct new descriptors can effectively determine the structure–performance relationship123. Material descriptors and models determine the robustness of the ML prediction. Pei et al. carried out an ML analysis of the link between many parameters and the phases formed, and identified the physical parameters that are crucial to the formation of solid solutions, such as bulk modulus and melting temperature100. Dai et al. used feature engineering and an ML strategy to extend the descriptor space from its original low dimension to a high dimension114. Because each algorithm is constructed differently, the best-performing model depends on an effective combination of dataset, descriptors, and algorithm. In this regard, Zhang et al. proposed a systematic framework that utilized a genetic algorithm (GA) to efficiently select the ML model and materials descriptors from a huge number of alternatives and demonstrated its efficiency on two phase-formation problems in HEAs114. Generally, the prediction accuracy of the model can be improved through hyperparameter optimization, such as increasing the number of hidden layers and neurons in a neural network107. Overfitting and underfitting are common problems that any ML model may encounter113, and the prediction of HEA phases by ML is no exception. Huang et al. observed overfitting in ML phase prediction: by adjusting the hyperparameters involved in the training process, the training accuracy can always be raised to a higher level99. Wen et al. proposed ML models to predict the solid-solution strength/hardness of HEAs123. Figure 9 shows the prediction error for the hardness of HEAs obtained by five-fold cross-validation with possible combinations of different features (ξ, δXr, ε, etc.). All ML models, including random forests (RF), support vector regression (SVR), kernel ridge regression (KRR), Gaussian process (GP), extreme gradient boosting (XGB), and Bayesian regularized neural networks (BRNN), show a basin-like tendency, indicating that too many or too few features reduce the accuracy. According to the “Occam’s razor” principle, the simplest and most interpretable model, using the minimum number of features needed for adequate accuracy, is preferred. Using more features complicates the interpretation of the model and risks overfitting.
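The feature-combination study described above can be sketched as an exhaustive search over feature subsets, each scored by five-fold cross-validation. The Python example below uses a plain k-nearest-neighbour regressor and synthetic data in which only two of eight features carry signal; the regressor, the data, and all names are assumptions of this sketch and do not reproduce the models or descriptors of the cited work.

```python
import random
from itertools import combinations


def knn_predict(train_x, train_y, x, k=3):
    """Predict by averaging the targets of the k nearest training points."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )[:k]
    return sum(train_y[i] for i in order) / len(order)


def cv_error(features, targets, columns, n_folds=5):
    """Mean absolute error from k-fold cross-validation using only `columns`."""
    reduced = [[row[c] for c in columns] for row in features]
    errors = []
    for fold in range(n_folds):
        test_idx = [i for i in range(len(reduced)) if i % n_folds == fold]
        train_idx = [i for i in range(len(reduced)) if i % n_folds != fold]
        train_x = [reduced[i] for i in train_idx]
        train_y = [targets[i] for i in train_idx]
        errors += [abs(knn_predict(train_x, train_y, reduced[i]) - targets[i])
                   for i in test_idx]
    return sum(errors) / len(errors)


def rank_feature_subsets(features, targets, n_features):
    """Score every non-empty subset of feature columns."""
    results = {}
    for size in range(1, n_features + 1):
        for columns in combinations(range(n_features), size):
            results[columns] = cv_error(features, targets, columns)
    return results


# Synthetic data: eight features, of which only columns 0 and 1 affect the target.
rng = random.Random(0)
N_FEATURES = 8
X = [[rng.random() for _ in range(N_FEATURES)] for _ in range(50)]
y = [2.0 * row[0] - row[1] + 0.05 * rng.gauss(0.0, 1.0) for row in X]

results = rank_feature_subsets(X, y, N_FEATURES)
best_subset = min(results, key=results.get)
```

Plotting the best error at each subset size against the number of features is one way to look for the basin-like trend mentioned above; exhaustive search is feasible here only because eight features give just 255 subsets.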

Fig. 9: Feature selection based on combinations of features from different ML algorithms.

The prediction error is shown for each model trained on a subset of the eight features in the dataset123 (adapted with permission from ref. 123. Copyright 2020 Elsevier).

In the absence of unified evaluation criteria, excessive optimism is often reported116 as a result of overfitting and the use of inappropriate training and test data. It is necessary to propose new standard criteria that can be used to evaluate the true accuracy and performance of ML models. An emphasis on experimental validation and repeatability through code archiving also helps overcome this challenge. The regularization method can be incorporated into the ML model to improve the generalizability of the model119. The hyperparameters of the model can also be optimized by the Bayesian optimization method to obtain good generalizability under the condition of high accuracy. In addition, constructing new rules with strong interpretability and universality through ML is desirable, which can be explored using conformable regression. Therefore, combining experimental results with theoretical guidance to analyze specific target characteristics is imperative to screen new HEAs with good performance115.

Prediction of mechanical properties

As a new kind of structural material that can serve under extreme environments, HEAs exhibit unique mechanical properties, such as high strength and hardness, and low moduli. These properties are generally used as selection parameters in the search for new alloys. This raises the question of whether ML algorithms can be readily used to search for candidate alloys with better mechanical properties in such a large composition space.

As one of the most typical mechanical properties of HEAs, hardness correlates strongly with other properties, and these correlations require an in-depth understanding. For instance, based on a reliable hardness–strength relationship, complex mechanical tests can be replaced to some extent by efficient and inexpensive hardness tests for a fast and comprehensive assessment of mechanical properties. Hence, developing data-driven methods, in addition to experimental methods, is essential to effectively calculate, predict, and evaluate the hardness of HEAs. In this regard, several studies have explored ML as an aid in hardness assessment. For example, using an integrated CALPHAD-ML approach, Sun and Lu et al. predicted the hardness of Ti–Zr–Nb–Ta refractory HEAs; the workflow included building a database of 100 quaternary alloys, training the ML model, predicting the hardness, and verifying the predictions experimentally, as shown in Fig. 10 (ref. 102). Menou et al. used a multi-objective optimization GA, together with solid solution hardening and thermodynamic modeling (CALPHAD), to design HEAs with high hardness124. Combining the radial basis function neural network algorithm and first-principles calculations, Zhu et al. found that Al plays a key role and significantly influences hardness when modeling the Al–Cr–Fe–Ni system125. In the related Al–Co–Cr–Cu–Fe–Ni system, Su et al. formulated a property-oriented materials design strategy combining ML, design of experiments, and experimental feedback to search for HEAs with high hardness126. On this basis, they further proposed ML models, including feature engineering and physical models, to provide insights for predicting the hardness of these HEAs.
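
To make the idea of a hardness–strength relationship concrete, the sketch below uses the widely quoted empirical Tabor-type rule, in which the yield strength is roughly one third of the hardness. This rule is an assumption introduced for illustration and is not taken from the studies cited above; the constraint factor for a specific HEA would need experimental calibration, and the function names are hypothetical.

```python
KGF_PER_MM2_TO_MPA = 9.80665  # 1 kgf/mm^2 expressed in MPa

def vickers_to_mpa(hv):
    """Convert a Vickers hardness number (kgf/mm^2) to MPa."""
    return hv * KGF_PER_MM2_TO_MPA

def estimate_yield_strength_mpa(hv, constraint_factor=3.0):
    """Rough yield-strength estimate from hardness: sigma_y ~ H / C.

    C = 3 is the commonly quoted constraint factor; the appropriate value
    for a given alloy is an assumption that must be checked against tests.
    """
    return vickers_to_mpa(hv) / constraint_factor

# Example: an alloy with a measured hardness of 300 HV.
print(round(estimate_yield_strength_mpa(300.0), 1), "MPa")
```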

Fig. 10: Hardness distributions as functions of the Ta content.

a 0.05–0.2 at.% Ta. b 0.25–0.4 at.% Ta. c 0.45–0.6 at.% Ta. d 0.65–0.8 at.% Ta. e Hardness values predicted using the ML model for RHEAs with a Ta content of 0.35. f Hardness values predicted using the ML model for RHEAs with a Ta content of 0.4 (adapted with permission from ref. 102. Copyright 2021 AIP Publishing).

Recent developments in the field of HEAs have also sparked interest in using ML to predict elastic moduli. Balasubramanian et al. implemented gradient boosting algorithms to predict Young’s modulus (\(E\)) as well as the phase structure of low-, medium-, and high-entropy alloys composed of refractory elements. The ML results were in good agreement with experiments and revealed that the melting temperature and the enthalpy of mixing are the key features determining the \(E\) of refractory HEAs105. Fewer studies have applied ML to the plasticity or strength of HEAs than to other mechanical properties (e.g., hardness and modulus). A principal reason is that plasticity and strength data are very sensitive to the preparation process and sample size, which degrades the quality of the input dataset. Despite these obstacles, some attempts have been made to build ML frameworks for predicting the plasticity and strength of disordered alloys. Recently, Liu et al. constructed a data set through high-throughput preparation of solid solutions by powder metallurgy, with Zr–Ti–Nb–O alloys as target materials127. Their study suggests that the plasticity of HEAs can be enhanced by tailoring key features through tuning the element content.
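
The principle of gradient boosting for a regression target such as \(E\) can be sketched in a few lines. The example below is a simplified illustration with decision stumps and squared loss, not the implementation used in ref. 105; the two features mirror those named above, but the numerical values are invented and all function names are hypothetical.

```python
def fit_stump(X, residuals):
    """Best single-feature threshold split (least squares) on the residuals."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = 0.5 * (lo + hi)
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    return best[1:]  # (feature index, threshold, left value, right value)

def stump_predict(stump, row):
    j, thr, lm, rm = stump
    return lm if row[j] <= thr else rm

def fit_gradient_boosting(X, y, n_rounds=50, learning_rate=0.1):
    """Squared-loss boosting: each new stump is fitted to the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Residuals are the negative gradient of 0.5 * (y - prediction)^2.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, row)
                 for p, row in zip(preds, X)]
    return base, stumps, learning_rate

def predict(model, row):
    base, stumps, lr = model
    return base + lr * sum(stump_predict(s, row) for s in stumps)

def mse(model, X, y):
    return sum((predict(model, row) - yi) ** 2 for row, yi in zip(X, y)) / len(y)

# Invented toy data: [melting temperature (K), mixing enthalpy (kJ/mol)] -> E (GPa).
# These numbers are placeholders for illustration, not measurements.
X = [[1800.0, -12.0], [2100.0, -8.0], [2400.0, -5.0], [2700.0, -3.0],
     [3000.0, -1.0], [3300.0, 0.5], [2000.0, -10.0], [2900.0, -2.0]]
y = [95.0, 120.0, 150.0, 185.0, 220.0, 260.0, 110.0, 205.0]

baseline = fit_gradient_boosting(X, y, n_rounds=0)
model = fit_gradient_boosting(X, y, n_rounds=100, learning_rate=0.1)
print("MSE, mean-only model:", round(mse(baseline, X, y), 2))
print("MSE, boosted stumps: ", round(mse(model, X, y), 2))
```

The errors printed here are training errors only; in practice the number of rounds and the learning rate would be chosen by cross-validation to avoid the overfitting discussed earlier.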

ML force fields

MD simulations are normally conducted with classical interatomic potentials. As the cost of these potentials often scales linearly with the number of atoms, they are computationally inexpensive, and the loss in accuracy is tolerated to enable longer simulations or simulations of large-scale systems containing hundreds of thousands of atoms. However, the construction of force fields and tight-binding parameters is not straightforward. Given this, ML methods provide a useful option for creating a reliable representation of the potential energy. Machine learning potentials (MLPs) are mathematical representations of the multidimensional potential-energy surface as a function of atomic positions. Unlike traditional potentials, the reference databases of MLPs are usually generated by DFT calculations without experimental information. The other two ingredients required for MLPs are (i) local structural descriptors that represent atomic configurations, such as atom-centered symmetry functions128, the smooth overlap of atomic positions129, and spectral neighbor analysis potential (SNAP) descriptors130,131,132, and (ii) supervised learning models that map structure to energy, forces, or the stress tensor133,134,135.
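
As an example of a local structural descriptor, the sketch below evaluates a radial atom-centered symmetry function of the Behler–Parrinello type, \(G_i = \sum_{j \ne i} e^{-\eta (r_{ij} - r_s)^2} f_c(r_{ij})\), with a cosine cutoff \(f_c\). It is a simplified illustration under stated assumptions: a single element type, no periodic boundary conditions, and arbitrary parameter values chosen for the example rather than taken from ref. 128.

```python
import math

def cutoff(r, r_c):
    """Cosine cutoff: decays smoothly from 1 at r = 0 to 0 at r = r_c."""
    if r >= r_c:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0)

def radial_symmetry_function(positions, i, eta, r_s, r_c):
    """Radial descriptor of atom i: sum over neighbours j of
    exp(-eta * (r_ij - r_s)^2) * f_c(r_ij)."""
    total = 0.0
    for j, pos_j in enumerate(positions):
        if j == i:
            continue
        r_ij = math.dist(positions[i], pos_j)
        total += math.exp(-eta * (r_ij - r_s) ** 2) * cutoff(r_ij, r_c)
    return total

# Four atoms on a square of side 2.8 (arbitrary length units, no periodicity).
positions = [(0.0, 0.0, 0.0), (2.8, 0.0, 0.0), (0.0, 2.8, 0.0), (2.8, 2.8, 0.0)]
g0 = radial_symmetry_function(positions, 0, eta=0.5, r_s=2.8, r_c=6.0)

# The descriptor depends only on interatomic distances, so a rigid shift
# of the whole cluster leaves it unchanged.
shifted = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in positions]
g0_shifted = radial_symmetry_function(shifted, 0, eta=0.5, r_s=2.8, r_c=6.0)
print(round(g0, 4), round(g0_shifted, 4))
```

A set of such functions with different \(\eta\) and \(r_s\) values forms the input vector of the supervised model, which is why the descriptors must be invariant to translation, rotation, and permutation of like atoms.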

MLPs have greatly promoted studies of the structure, thermodynamics, and mechanical properties of HEAs. Short-range ordering (SRO) refers to local chemical/structural ordering, which is a common structural feature in HEAs. It arises from the chemical interactions of the constituent elements and significantly affects structural stability as well as magnetic and mechanical properties136,137,138. Meshkov et al. used a low-rank potential in combination with Monte Carlo (MC) simulations to investigate chemical SRO in the equiatomic fcc CoCrFeNi HEA, and demonstrated that Fe and Cr form sublattices139. Similar schemes were also employed by Kostiuchenko et al. to study the phase stability, phase transitions, and chemical SRO of the bcc NbMoTaW HEA140. They claimed that if local lattice distortions are introduced, the single phase remains stable, rather than separating into sublattices, as the temperature drops to room temperature. Later, Grabowski et al. developed a new algorithm combining the thermodynamic integration method with moment tensor potentials to study the anharmonic free energy of the five-component VNbMoTaW refractory HEA, achieving DFT-level accuracy141. DeepMD was also applied to molten TiZrHfNb using ab initio molecular dynamics (AIMD) trajectories142. Structural analyses of a VZrNbHfTa melt via partial radial distribution functions (RDFs) and SRO parameters were carried out using a high-dimensional neural network potential, indicating that vanadium atoms are repelled by the other atom types143. Another NbMoTaW potential, based on the SNAP model, was applied to study the complex strengthening mechanisms by modeling Nb segregation to the grain boundaries. With this SNAP model, polycrystalline models with and without Monte Carlo/MD simulations were obtained, as shown in Fig. 11a, b (ref. 144). Byggmästar et al. developed a set of Gaussian approximation potentials that were used to study segregation and radiation damage in the bcc refractory VNbMoTaW HEA145,146. 
The potentials show good accuracy and transferability in terms of elasticity, thermal stability, liquid and defect structures, and surface properties145. Figure 11c, d shows that the final defect structure of irradiated VNbMoTaW contains only dislocation loops that are smaller than those in pure W. Overall, the reduced interstitial migration, the immobile dislocation loops, and the increased vacancy mobility together promote the recombination of defects rather than their clustering in HEAs146. In addition, MLPs have been developed for medium-entropy alloys147,148,149 and high-entropy ceramics4,5,6. For example, Pak et al. used canonical Monte Carlo simulations with ML interatomic potentials to determine the temperature conditions for the formation of single-phase and multi-phase high-entropy ceramics, and claimed that for TiZrNbHfTaC5 produced by electric arc discharge, the single-phase formation temperature was as high as 2000 K6.
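
The canonical Monte Carlo schemes mentioned above typically rely on Metropolis swap moves, which exchange two atoms and accept the move with probability min(1, exp(−ΔE/k_BT)). The sketch below illustrates this on a toy model and does not reproduce any of the cited studies: a binary alloy on a small periodic square lattice, with invented nearest-neighbour pair energies standing in for the energy that an MLP would provide.

```python
import math
import random

K_B = 8.617333262e-5  # Boltzmann constant in eV/K

def total_energy(lattice, pair_energy):
    """Sum nearest-neighbour pair energies on a periodic square lattice."""
    n = len(lattice)
    energy = 0.0
    for x in range(n):
        for y in range(n):
            a = lattice[x][y]
            # Count each bond once: right and down neighbours only.
            for b in (lattice[(x + 1) % n][y], lattice[x][(y + 1) % n]):
                energy += pair_energy[tuple(sorted((a, b)))]
    return energy

def metropolis_swap(lattice, pair_energy, temperature, n_steps, rng):
    """Canonical MC: swap two random sites, accept with min(1, exp(-dE / kT))."""
    n = len(lattice)
    energy = total_energy(lattice, pair_energy)
    for _ in range(n_steps):
        x1, y1, x2, y2 = (rng.randrange(n) for _ in range(4))
        if lattice[x1][y1] == lattice[x2][y2]:
            continue  # swapping identical species changes nothing
        lattice[x1][y1], lattice[x2][y2] = lattice[x2][y2], lattice[x1][y1]
        new_energy = total_energy(lattice, pair_energy)  # O(N), fine for a toy
        d_e = new_energy - energy
        if d_e <= 0 or rng.random() < math.exp(-d_e / (K_B * temperature)):
            energy = new_energy  # accept the swap
        else:
            # Reject: undo the swap.
            lattice[x1][y1], lattice[x2][y2] = lattice[x2][y2], lattice[x1][y1]
    return energy

rng = random.Random(1)
species = ["A", "B"] * 18  # equiatomic binary on a 6 x 6 lattice
rng.shuffle(species)
lattice = [species[6 * i: 6 * i + 6] for i in range(6)]

# Invented pair energies in eV (an assumption): unlike pairs are favoured.
pair_energy = {("A", "A"): 0.0, ("B", "B"): 0.0, ("A", "B"): -0.05}

e_start = total_energy(lattice, pair_energy)
e_end = metropolis_swap(lattice, pair_energy, temperature=300.0,
                        n_steps=2000, rng=rng)
print("energy before:", round(e_start, 3), "eV, after:", round(e_end, 3), "eV")
```

Swap moves conserve the composition, which is what makes the ensemble canonical. A production code would evaluate only the local energy change of the two swapped sites instead of recomputing the total energy at every step.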

Fig. 11: Polycrystalline models obtained via simulation method.

a The same polycrystalline model after random initialization with equimolar quantities of Nb, Mo, W, and Ta144 (adapted with permission from ref. 144. Copyright 2020 Springer Nature). b Snapshot of polycrystalline model after hybrid Monte Carlo/MD simulations. c, d Defect evolution during annealing146 (adapted with permission from ref. 146. Copyright 2021 American Physical Society).

In general, ML-based interatomic potentials help to address the longstanding dilemma between efficiency and accuracy in MD simulations, but some challenges remain. First, building complete databases for the potentials of multicomponent chemically disordered systems is complicated and non-standardized, and short- or medium-range order exacerbates the problem. Additionally, it is difficult to apply MLPs outside the range of their training databases, because their greater flexibility comes with weaker extrapolation. Another concern is that MLPs are not based on physical information150. While active learning approaches151 and physically informed MLPs152 may offer solutions, further development is still needed.

Outlook

This paper presents a concise review covering several aspects of this rapidly growing field over the past two decades, from high-throughput experiments and computations to the data-driven ML of HEAs. To inspire and spur new ideas, we present some perspectives and possible research directions in HEAs.

High-throughput characterization techniques and high-quality data acquisition for HEAs

To keep pace with continuous advancements in high-throughput material preparation methods, it is crucial to develop high-throughput characterization techniques that offer high resolution, efficiency, and affordability. From a microdomain or in situ measurement perspective, synchrotron X-ray techniques are well suited to the high-throughput characterization of large numbers of material samples owing to their high brightness and high temporal and spatial resolution, thereby alleviating the throughput bottleneck in high-throughput experiments. In addition, curating high-quality data remains an ongoing challenge. Manually extracting data with expert knowledge from thousands of articles is time-consuming. Thus, it is increasingly necessary to develop methods for automated data extraction that are both rapid and accurate. Techniques such as web crawling, natural language processing, and pattern recognition could facilitate the automatic extraction of information from articles or from characterization data such as SEM images, EBSD maps, and synchrotron XRD patterns.
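
A very small example of rule-based text extraction is sketched below. It is a hypothetical illustration rather than a published pipeline: the regular expressions, function names, and the sample sentence are all invented, and the formula parser does not check that the symbols are real elements.

```python
import re

# An element symbol followed by an optional numeric coefficient, e.g. "Al0.5" or "Co".
ELEMENT_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")
# A number followed by the Vickers hardness unit, e.g. "250 HV".
HARDNESS = re.compile(r"(\d+(?:\.\d+)?)\s*HV\b")

def parse_alloy_formula(formula):
    """Parse a formula such as 'Al0.5CoCrFeNi' into normalised atomic fractions.
    A missing coefficient is read as 1."""
    amounts = {}
    for symbol, coeff in ELEMENT_TOKEN.findall(formula):
        amounts[symbol] = amounts.get(symbol, 0.0) + (float(coeff) if coeff else 1.0)
    total = sum(amounts.values())
    return {el: amt / total for el, amt in amounts.items()}

def extract_hardness_values(text):
    """Return every number that is immediately followed by 'HV'."""
    return [float(v) for v in HARDNESS.findall(text)]

# Invented sentence, used only to exercise the two functions.
sentence = ("The as-cast Al0.5CoCrFeNi alloy reached 250 HV, "
            "rising to 512.5 HV after aging.")
fractions = parse_alloy_formula("Al0.5CoCrFeNi")
hardness = extract_hardness_values(sentence)
print({el: round(f, 3) for el, f in fractions.items()})
print(hardness)
```

Hand-written patterns like these break down quickly on real articles, where compositions and units appear in many notations; this is the gap that natural language processing methods are expected to fill.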

Metastable state of HEAs

Because HEAs contain multiple principal elements and are often in a metastable state, there is an urgent need to understand their nonequilibrium thermodynamics from both experimental and computational perspectives. The cooling rates of some high-throughput methods are much higher than those of the traditional casting methods used to prepare bulk HEAs. In some extreme cases, owing in part to the multi-principal nature of HEAs, the combinatorial materials libraries made by high-throughput methods can form amorphous structures, whose properties differ considerably from those of bulk HEAs. In terms of high-throughput CALPHAD, developing a reliable thermodynamic database for HEA systems requires the related binary and ternary systems to be gathered and assessed through experiments and calculations.

Analysis of SRO in HEAs

To understand comprehensively the correlations between SRO and properties, and to facilitate the development of innovative alloys, it is imperative to scientifically describe and quantitatively characterize SRO in these compositionally complex alloys. However, the multi-principal element nature of HEAs poses significant challenges for direct experimental observation and accurate description of the SRO. Detailed chemical ordering information can be obtained by combining ML techniques with AIMD simulations or reverse Monte Carlo refinement methods.
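
One standard way to quantify chemical SRO is the Warren–Cowley parameter, \(\alpha_{ij} = 1 - P_{ij}/c_j\), where \(P_{ij}\) is the probability of finding a \(j\) atom in a given neighbour shell of an \(i\) atom and \(c_j\) is the overall concentration of \(j\). The sketch below is a minimal illustration that assumes a periodic one-dimensional chain as a stand-in for a real lattice; the function names are hypothetical.

```python
def warren_cowley(species, neighbours, center, other):
    """alpha = 1 - P(other | center) / c_other for one neighbour shell.

    P is the fraction of `other` atoms among all neighbours of `center` atoms.
    """
    c_other = species.count(other) / len(species)
    n_pairs = 0
    n_other = 0
    for i, s in enumerate(species):
        if s != center:
            continue
        for j in neighbours[i]:
            n_pairs += 1
            if species[j] == other:
                n_other += 1
    return 1.0 - (n_other / n_pairs) / c_other

def ring_neighbours(n):
    """Nearest neighbours on a periodic 1D chain of n sites."""
    return [[(i - 1) % n, (i + 1) % n] for i in range(n)]

ordered = ["A", "B"] * 6            # perfectly alternating
segregated = ["A"] * 6 + ["B"] * 6  # two separate blocks
nbrs = ring_neighbours(12)

alpha_ordered = warren_cowley(ordered, nbrs, "A", "B")
alpha_segregated = warren_cowley(segregated, nbrs, "A", "B")
print(alpha_ordered, round(alpha_segregated, 3))
```

A negative value indicates a preference for unlike A–B neighbours, a positive value indicates that A and B avoid each other, and a value near zero corresponds to a random solid solution. For an alloy with n components, the same function can be evaluated for every element pair and neighbour shell.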

Evaluation criteria and interpretability of ML methods for HEAs

In the absence of unified evaluation criteria, excessive optimism is frequently observed, resulting from overfitting and the use of unsuitable training and test data. It is essential to propose new standardized criteria to properly assess the true accuracy and performance of ML models. Prioritizing experimental validation and repeatability through code archiving can also help mitigate this issue. Additionally, the interpretability of ML models remains limited, and this gap needs to be bridged. There is a need to develop new rules with robust interpretability and universality through ML exploration using appropriate algorithms. Techniques such as partial dependence plots, individual conditional expectation, permutation feature importance, global surrogates, local surrogates (LIME), and SHAP (SHapley Additive exPlanations) differ in their technical characteristics, and each can help enhance interpretability.
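
Of the techniques listed, permutation feature importance is the simplest to write down: a feature is important if shuffling its values degrades the model error. The sketch below is a minimal, model-agnostic illustration; the "fitted model" is a hypothetical fixed function and the data are synthetic, so the numbers carry no physical meaning.

```python
import random

def mse(model, X, y):
    return sum((model(row) - yi) ** 2 for row, yi in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Importance of feature j = mean increase in MSE after shuffling column j."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the data set with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Hypothetical fitted model: strong dependence on feature 0, weak on 1, none on 2.
def model(row):
    return 4.0 * row[0] + 0.5 * row[1]

rng = random.Random(42)
X = [[rng.random(), rng.random(), rng.random()] for _ in range(200)]
y = [model(row) for row in X]  # noise-free targets, so the baseline MSE is 0

importances = permutation_importance(model, X, y)
print([round(v, 3) for v in importances])
```

Because the method only needs model predictions, it applies equally to any of the regressors discussed in this review. Its main caveat is that strongly correlated descriptors, which are common among HEA features, can share or mask each other's importance.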

In summary, future studies of high-throughput experiments, computations, and data-driven ML in HEAs will focus on a comprehensive workflow design, incorporating rational experimental design, automated high-throughput synthesis, fundamental principles of high-throughput materials characterization, computational modeling, and data mining techniques. This multidisciplinary approach will offer a robust framework for the rational design and discovery of materials.