Introduction

Aluminum (Al) and its alloys provide a unique combination of properties that makes them economical, versatile and attractive metallic materials for many uses, from highly ductile, soft wrapping foil to the most demanding structural applications.1

As pure aluminum is rather soft, alloying elements are used to improve and control the properties of Al alloys. The most common additions are manganese, copper, silicon, zinc and magnesium. Up to 2 wt.% in total of these elements is typically present in an Al alloy, with some specialized alloys containing even larger amounts of additives. Some minor alloying elements are also added in amounts < 0.5%. These elements control specific properties, e.g., recrystallisation or corrosion resistance. Al alloys are classified as wrought, casting and rapidly solidified/powder alloys, which are further subdivided into age- and work-hardenable alloys.1 These classes are further subdivided into various systems based on the selection of alloying elements.

Aluminum alloy design, since the beginning of the twentieth century, has been essentially an iterative and empirical process, based on the lessons learned from experience and in-service use.2 A hill-climbing approach is taken in the traditional development and design of Al alloys.3 This traditional method of research and development is laborious, expensive and does not consider the full spectrum of potential properties. Testing of billions of combinations of alloys is not possible.4,5 Alloy development by mixing a combination of alloying elements and characterizing their structure and testing their properties is slow, costly and does not fully harness the data that have been accumulated before. Examples of empirical Al-alloy discovery include the gradual and continuous development of Al-Cu alloys to Al-Cu-Mg, then to Al-Cu-Zn-Mg and Al-Cu-(Mg)-Li alloys, based on the requirements of structural applications.6

Currently, 7 million tons of recycled aluminum scrap and around 22 million tons of new Al are being used. Recycling aluminum has both financial and environmental benefits. The production of 1 ton of primary Al requires 14,000 kWh of energy.7,8,9 In contrast, remelting and recycling 1 ton of Al requires only 5% of this energy. Ideally, recycled and virgin Al alloys would be of equal quality.
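The energy comparison above can be made concrete with a short calculation. This is only an illustrative sketch that uses the figures quoted in the text (14,000 kWh per ton, 5%, 7 million tons); the function names are hypothetical, and the total assumes that every recycled ton would otherwise have been produced as primary metal.

```python
# Figures quoted in the text above.
PRIMARY_KWH_PER_TON = 14_000   # energy to produce 1 ton of primary Al
RECYCLE_FRACTION = 0.05        # remelting needs about 5% of that energy
RECYCLED_TONS = 7_000_000      # recycled scrap currently in use


def recycling_energy_kwh_per_ton(primary_kwh=PRIMARY_KWH_PER_TON,
                                 fraction=RECYCLE_FRACTION):
    """Energy to remelt and recycle 1 ton of Al, in kWh."""
    return primary_kwh * fraction


def energy_saved_twh(tons=RECYCLED_TONS):
    """Energy avoided by recycling `tons` rather than producing primary Al, in TWh.

    Assumption: each recycled ton replaces one ton of primary metal.
    """
    saved_per_ton = PRIMARY_KWH_PER_TON - recycling_energy_kwh_per_ton()
    return tons * saved_per_ton / 1e9  # 1 TWh = 1e9 kWh


if __name__ == "__main__":
    print(f"Recycling: {recycling_energy_kwh_per_ton():.0f} kWh per ton")
    print(f"Energy avoided: {energy_saved_twh():.1f} TWh")
```

Under these assumptions, recycling needs about 700 kWh per ton, and the 7 million tons of recycled scrap correspond to roughly 93 TWh of avoided energy use.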

However, the recycling of compositionally complex wrought alloys faces several challenges. The main difficulty in producing wrought Al alloys from scrap is attaining the desired chemical composition with minimal additions of pure alloying elements and primary aluminum.5 Technically, mixing various kinds of scrap leads to uncontrolled concentrations of critical elements such as iron, copper, manganese, magnesium, zinc and silicon.12 Once the concentration of any of these critical elements falls outside the limit for a particular wrought alloy, the only remedy is dilution with primary Al or modification of the alloy composition. Another technical solution, carefully sorting and combining scrap at the pre-melting stage to ensure a suitable chemical composition of the batch, is costly.7,10 Therefore, given the large search space of alloying elements and the great variety of existing Al alloys, grouping alloys into classes would be a potential solution.
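The dilution remedy mentioned above follows from a simple mass balance. For a scrap melt of mass m_s with concentration c_s of a critical element, diluted with primary Al of mass m_p and concentration c_p, the limit c_max is met when m_p >= m_s (c_s - c_max) / (c_max - c_p). The sketch below illustrates this relation; the function names, compositions and limits are hypothetical and do not come from any alloy specification.

```python
def primary_al_for_dilution(scrap_mass, scrap_conc, limit_conc, primary_conc=0.0):
    """Minimum mass of primary Al needed to bring one element within its limit.

    Mass balance: (m_s*c_s + m_p*c_p) / (m_s + m_p) <= c_max
    which gives   m_p >= m_s * (c_s - c_max) / (c_max - c_p).
    Concentrations are in wt.%; masses are in any consistent unit.
    """
    if scrap_conc <= limit_conc:
        return 0.0  # already within the limit, no dilution needed
    if primary_conc >= limit_conc:
        raise ValueError("primary metal must be purer than the limit")
    return scrap_mass * (scrap_conc - limit_conc) / (limit_conc - primary_conc)


def dilution_for_batch(scrap_mass, scrap, limits, primary=None):
    """Primary Al needed so that every limited element is within its limit.

    The element that requires the most dilution governs the batch.
    """
    primary = primary or {}
    return max(
        primary_al_for_dilution(
            scrap_mass, scrap.get(el, 0.0), limits[el], primary.get(el, 0.0)
        )
        for el in limits
    )


if __name__ == "__main__":
    # Hypothetical scrap batch and limits (wt.%), for illustration only.
    scrap = {"Fe": 0.70, "Cu": 0.40}
    limits = {"Fe": 0.35, "Cu": 0.30}
    print(dilution_for_batch(1000.0, scrap, limits))  # kg of primary Al
```

In this hypothetical case iron governs: halving its concentration takes as much primary Al as there is scrap. The same dilution also lowers the intended alloying elements, which then have to be added back, so the sketch gives only a lower bound on the primary material consumed.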

Another well-known issue is the “compositional-tolerance limit” that exists especially for wrought alloys. It becomes critical in recycling, where unusual and unexpected impurities can inadvertently creep in, and even normal impurities may build up and accumulate to an unacceptable degree.10 In most cases, the influence of these tolerance limits on the properties (and particularly on the selected properties) has not been well investigated but is known from experience with wrought alloys.7,11 These recycling issues show an urgent need for optimization of Al alloy grades to narrow down their search space.

The optimization of existing alloys and the discovery of new alloys have until now been based on thermodynamic modeling11 and ‘ab initio’ methods.12 However, these methods have limitations in linking the composition to the properties and to the real processing and service conditions. Traditional methods are not suitable for finding new optimal alloy compositions in the large compositional search space.

Advanced methods such as machine learning (ML) and artificial intelligence (AI), by contrast, aim to develop self-learning models that can solve these problems, acting like the human brain. Machine learning aims at building statistical models for data analysis and prediction. Artificial intelligence algorithms construct an inference model that connects the targeted property to material descriptors by learning from past data and recognizing the patterns in the data, which enables a rational choice of the next experiment and accurate predictions.13

Today, machine learning algorithms are successfully used to solve a variety of problems in materials science.14 The prediction of new stable materials, the calculation of multiple material properties and the acceleration of first-principles calculations are only a few examples of the numerous machine learning applications in materials research that have already been shown to be effective.15

For instance, Raccuglia et al.16 successfully designed new materials by employing a machine learning strategy to learn the rules of material synthesis from failed experiments. High-performance copper alloys were obtained through inverse compositional design, using a method that designs alloy compositions starting from the target properties; this method was also used to design the composition of piezoelectric materials.50,53,54,55 There is consensus that the metastable coherent β″ phase serves as the main strengthening phase in Al-Mg-Si ternary alloys.53,54,55

When Cu is added to the Al-Mg-Si alloys in the 6XXX series, the Al-Mg-Si-Cu family of alloys is created. These quaternary alloys do not have a distinct designation within the 6XXX alloy series.50,56,57 The possibility of forming a quaternary phase is a significant underlying characteristic shared by all these alloys.58 This phase has been extensively studied and is known as the Q phase.59,60,61,62 While at normal aging temperatures the compositions of ternary Al-Mg-Si alloys fall in a three-phase field of the equilibrium phase diagram, i.e., (Al) + β (Mg2Si) + (Si),58 the coexisting equilibrium three-phase fields expand into three tetrahedral composition spaces upon the addition of Cu. Inside each of these spaces, the equilibrium phase diagram shows a four-phase equilibrium consisting of the two common phases, i.e., the quaternary Q phase and (Al), and two other phases selected from β (Mg2Si), θ (CuAl2) and (Si).58 Regarding precipitation, which is a metastable process, the presence of Cu modifies the order of precipitation as follows:63

At a lower Cu concentration: SSS → GP zones → β″ (with Cu?) → Si → numerous variations of β′ (with Cu) including β′C → Mg2Si and Si.

At a higher Cu concentration: SSS → GP zones → β″ (with Cu?) → θ′ and/or Q′ → Si → numerous variations of β′ (with Cu) including β′C, Q′ and θ′ → Q (AlMgSiCu), Mg2Si, Al2Cu and Si.

These two precipitation paths may form at intermediate copper concentrations, in cases of natural aging preceding artificial aging, and at temperatures > 200°C. It is also known that an excess of Si relative to the Mg2Si stoichiometry (revealed by the Mg:Si ratio) is beneficial to hardening because of its effect on the composition and structure of GP zones. The addition of copper also makes 6XXX series alloys less sensitive to the negative effects of natural aging on hardening upon artificial aging.63

Based on this general understanding of the metallurgy of 6XXX series alloys, we can suggest the metallurgical reasons behind the grouping of the alloys in different clusters. Let us look primarily at the yield strength, as it is the property that responds most directly to precipitation hardening.

Cluster 4: Cu = 0.85%, high concentration of Mg and Si, excess of Si (Mg:Si = 1–1.67), low Fe. High YS is due to the formation of β″ modified with Cu.

Cluster 2: Medium range of Cu = 0.275–0.95%, higher concentration of Mg and Si, large excess of Si (Mg:Si < 1). High YS due to the formation of β″ modified with Cu.

Cluster 1: Cu = 0.2–0.275%, medium Si and high Mg concentration, Mg:Si is close to stoichiometry ~ 1.667, high Fe. Medium YS due to fewer β″ particles and high Fe that takes some Si from the solid solution.

Cluster 3: Cu = 0.1–0.2%, high concentration of Mg and Si and large excess of Si (Mg:Si < 1), high Fe. Medium YS due to the precipitation of β″ but in lesser quantities due to the consumption of Si by Fe-containing phases.

Cluster 0: Cu = 0.05–0.1%, lower concentration of Mg and Si and moderate excess of Si (Mg:Si = 1.3–1.68). The lowest YS due to the overall smaller concentration of solute elements and the resultant smaller amount of the hardening β″ phase.
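The Mg:Si ratio used in the cluster descriptions above can be computed directly from the nominal composition. The following sketch is an illustration only: the stoichiometric value 1.667 and the "large excess" boundary (Mg:Si < 1) are taken from the text, while the tolerance band, the label for ratios above stoichiometry and the example compositions are assumptions introduced here.

```python
STOICHIOMETRIC_MG_SI = 1.667  # Mg:Si value quoted in the text for Mg2Si stoichiometry


def mg_si_ratio(mg_wt, si_wt):
    """Mg:Si ratio from concentrations in wt.%."""
    if si_wt <= 0:
        raise ValueError("Si concentration must be positive")
    return mg_wt / si_wt


def si_excess_label(mg_wt, si_wt, stoich=STOICHIOMETRIC_MG_SI, tol=0.02):
    """Describe the Si balance of an alloy relative to the stoichiometric ratio.

    `tol` is an assumed tolerance band around the stoichiometric ratio.
    """
    ratio = mg_si_ratio(mg_wt, si_wt)
    if ratio < 1.0:
        return "large excess of Si"   # boundary used in the cluster descriptions
    if ratio < stoich - tol:
        return "excess of Si"
    if ratio <= stoich + tol:
        return "near stoichiometry"
    return "excess of Mg"             # assumed label, not used in the text


if __name__ == "__main__":
    # Hypothetical compositions (Mg wt.%, Si wt.%), for illustration only.
    for mg, si in [(0.6, 0.9), (0.9, 0.7), (1.0, 0.6), (1.2, 0.5)]:
        print(mg, si, round(mg_si_ratio(mg, si), 3), si_excess_label(mg, si))
```

The Mg:Si ranges reported for the clusters overlap, so a label of this kind describes the Si balance of a single alloy and does not by itself assign the alloy to a cluster.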

These observations proved that clustering had a metallurgical reasoning behind it as the variation in the tensile properties between the clusters was due to the various precipitation hardening features. These clusters will now be the basis for future optimization and reduction of the number of the alloy grades without compromising their tensile properties.

Conclusion

In the present study, we attempted to refine the search space of 6XXX series aluminum alloys (extruded, T6 condition) into a few clusters by a combined PCA and K-means algorithm. We successfully created five clusters, each having distinct ranges of composition and properties with very few overlaps and outliers. To find the characteristic features that make each cluster special, we used a LIME algorithm. This allowed us to identify the alloying elements with their combinations and concentrations that affected the yield strength and modulus of toughness of the alloys in each cluster.
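As an illustration of the combined PCA and K-means step, the sketch below standardizes a feature matrix, projects it onto two principal components and partitions the projected points into five clusters. It is a minimal NumPy sketch and not the implementation used in this study: the number of components, the random initialization, the function names and the synthetic data are all assumptions.

```python
import numpy as np


def standardize(X):
    """Scale each column to zero mean and unit variance."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled
    return (X - X.mean(axis=0)) / std


def pca(X, n_components=2):
    """Project the rows of X onto the leading principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T
    explained = (S ** 2) / np.sum(S ** 2)  # explained variance ratio
    return scores, explained[:n_components]


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with randomly chosen initial centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


def cluster_alloys(X, k=5, n_components=2, seed=0):
    """Standardize, reduce with PCA, then cluster the scores with K-means."""
    scores, explained = pca(standardize(X), n_components)
    labels, centroids = kmeans(scores, k, seed=seed)
    return scores, explained, labels, centroids


if __name__ == "__main__":
    # Synthetic stand-in for a composition/property table (rows = alloy grades).
    X = np.random.default_rng(1).uniform(0.0, 1.0, size=(50, 5))
    scores, explained, labels, centroids = cluster_alloys(X)
    print("explained variance ratio:", explained)
    print("cluster sizes:", np.bincount(labels, minlength=5))
```

Standardizing before PCA keeps elements present at very different concentration levels from dominating the components. The outcome of K-means depends on the initial centroids, so in practice several initializations are usually compared.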

In addition to this formal selection of the features determining the formation of each cluster, we also examined the alloys in the clusters in light of the established metallurgical knowledge of precipitation hardening in 6XXX series alloys, as precipitation hardening has a direct impact on the yield strength. This analysis showed that the amount of alloying elements, the Mg:Si ratio and the Fe content are the parameters that determine the inclusion of an alloy in a given cluster.

This study showed that the machine learning algorithms gave us a meaningful selection of five clusters that incorporated 50 commercial 6XXX series alloy grades. This is a good base for optimization and reducing the number of the alloy grades without compromising their tensile properties. In the future, we will widen the selection of properties in our dataset.