
Data-driven physics-constrained recurrent neural networks for multiscale damage modeling of metallic alloys with process-induced porosity

  • Original Paper
  • Published:
Computational Mechanics

Abstract

Computational modeling of heterogeneous materials increasingly relies on multiscale simulations, which typically leverage homogenization theory for scale coupling. Such simulations are prohibitively expensive and memory-intensive, especially when modeling damage and fracture in large 3D components such as cast metallic alloys. To address these challenges, we develop a physics-constrained deep learning model that serves as a surrogate for the microscale simulations. We build this model within a mechanistic data-driven framework such that it accurately predicts the effective microstructural responses under irreversible elasto-plastic hardening and softening deformations. To achieve high accuracy while reducing the reliance on labeled data, we design the architecture of our deep learning model based on damage mechanics and introduce a new loss component that increases the thermodynamic consistency of the model. We use mechanistic reduced-order models to generate the training data and demonstrate that, in addition to achieving high accuracy on unseen deformation paths that include severe softening, our model can be embedded in 3D multiscale simulations with fracture. With this embedding, we also demonstrate that state-of-the-art techniques such as teacher forcing result in deep learning models that cause divergence in multiscale simulations. Our numerical experiments indicate that our model is more accurate than purely data-driven models and much more efficient than mechanistic reduced-order models.



Notes

  1. The accuracy of the surrogate in predicting the microstructural response given deformation paths that are not seen in training.

  2. By transferable we mean an RNN that can be used as the constitutive law at all macro IPs in a wide range of multiscale simulations.

  3. In 3D, a load path consists of 6 strain sequences where each sequence is of length \(n_{load}\).

  4. Interestingly, classical constitutive laws also use state variables, such as the over-stress tensor that is used when modeling multi-axial large-strain kinematic hardening.

  5. \(\tilde{{\mathbb {C}}}_{n+1}^{alg}\) is equal to the elastic modulus in unloading.

  6. \(\tilde{{\mathbb {C}}}_{n+1}^{alg}=(1-{\tilde{D}}_{n+1}){\mathbb {C}}^{el}\) in unloading.

References

  1. Feyel Frédéric, Chaboche Jean-Louis (2000) FE2 multiscale approach for modelling the elastoviscoplastic behaviour of long fibre SiC/Ti composite materials. Comput Methods Appl Mech Eng 183(3–4):309–330

  2. Kanouté Pascale, Boso DP, Chaboche Jean-Louis, Schrefler BA (2009) Multiscale methods for composites: a review. Archiv Comput Methods Eng 16(1):31–75

  3. Wu Jian-Ying, Nguyen Vinh Phu, Nguyen Chi Thanh, Sutula Danas, Sinaie Sina, Bordas Stéphane PA (2020) Phase-field modeling of fracture. Adv Appl Mech 53:1–183

  4. Griffith Alan Arnold (1921) VI. The phenomena of rupture and flow in solids. Philos Trans Royal Soc London Ser A Contain Papers Math Phys Character 221:163–198

  5. Dugdale Donald S (1960) Yielding of steel sheets containing slits. J Mech Phys Solids 8(2):100–104

  6. Bouchard Pierre-Olivier, Bay François, Chastel Yvan (2003) Numerical modelling of crack propagation: automatic remeshing and comparison of different criteria. Comput Methods Appl Mech Eng 192(35–36):3887–3908

  7. Nguyen Vinh Phu, Nguyen-Xuan Hung (2013) High-order B-splines based finite elements for delamination analysis of laminated composites. Compos Struct 102:261–275

  8. Wu Jian-Ying (2011) Unified analysis of enriched finite elements for modeling cohesive cracks. Comput Methods Appl Mech Eng 200(45–46):3031–3050

  9. Moës Nicolas, Dolbow John, Belytschko Ted (1999) A finite element method for crack growth without remeshing. Int J Numer Methods Eng 46(1):131–150

  10. Moës Nicolas, Gravouil Anthony, Belytschko Ted (2002) Non-planar 3D crack growth by the extended finite element and level sets–Part I: mechanical model. Int J Numer Methods Eng 53(11):2549–2568

  11. Rashid YR (1968) Ultimate strength analysis of prestressed concrete pressure vessels. Nuclear Eng Design 7(4):334–344

  12. Cervera Miguel, Wu Jian-Ying (2015) On the conformity of strong, regularized, embedded and smeared discontinuity approaches for the modeling of localized failure in solids. Int J Solids Struct 71:19–38

  13. Krajcinovic Dusan (1989) Damage mechanics. Mech Mater 8(2–3):117–197

  14. Jirásek Milan (2007) Mathematical analysis of strain localization. Revue européenne de génie civil 11(7–8):977–991

  15. Simo Juan C, Ju JW (1987) Strain- and stress-based continuum damage models-I. Formulation. Int J Solids Struct 23(7):821–840

  16. De Borst R, Sluys LJ (1991) Localisation in a Cosserat continuum under static and dynamic loading conditions. Comput Methods Appl Mech Eng 90(1–3):805–827

  17. Bazant Zdenek P, Belytschko Ted B, Chang Ta-Peng et al (1984) Continuum theory for strain-softening. J Eng Mech 110(12):1666–1692

  18. Bazant Zdenek P, Jirásek Milan (2002) Nonlocal integral formulations of plasticity and damage: survey of progress. J Eng Mech 128(11):1119–1149

  19. Poh Leong Hien, Sun Gang (2017) Localizing gradient damage model with decreasing interactions. Int J Numer Methods Eng 110(6):503–522

  20. Vandoren Bram, Simone A (2018) Modeling and simulation of quasi-brittle failure with continuous anisotropic stress-based gradient-enhanced damage models. Comput Methods Appl Mech Eng 332:644–685

  21. Dvorak George J (1992) Transformation field analysis of inelastic composite materials. Proc Royal Soc London Ser A Math Phys Sci 437(1900):311–327

  22. Roussette Sophie, Michel Jean-Claude, Suquet Pierre (2009) Nonuniform transformation field analysis of elastic-viscoplastic composites. Compos Sci Technol 69(1):22–27

  23. Liu Zeliang, Bessa MA, Liu Wing Kam (2016) Self-consistent clustering analysis: an efficient multi-scale scheme for inelastic heterogeneous materials. Comput Methods Appl Mech Eng 306:319–341

  24. Tang Shaoqiang, Zhang Lei, Liu Wing Kam (2018) From virtual clustering analysis to self-consistent clustering analysis: a mathematical study. Comput Mech 62(6):1443–1460

  25. Deng Shiguang, Soderhjelm Carl, Apelian Diran, Bostanabad Ramin (2022) Reduced-order multiscale modeling of plastic deformations in 3D alloys with spatially varying porosity by deflated clustering analysis. Comput Mech 70(3):517–548

  26. Deng Shiguang, Apelian Diran, Bostanabad Ramin (2023) Adaptive spatiotemporal dimension reduction in concurrent multiscale damage analysis. Comput Mech 72:1–33

  27. Planas R, Oune N, Bostanabad R (2021) Evolutionary Gaussian processes. J Mech Design 143(11):111703. https://doi.org/10.1115/1.4050746

  28. Oune N, Bostanabad R (2021) Latent map Gaussian processes for mixed variable metamodeling. Comput Methods Appl Mech Eng 387:114128. https://doi.org/10.1016/j.cma.2021.114128

  29. Chen W, Iyer A, Bostanabad R (2022) Data centric design: a new approach to design of microstructural material systems. Engineering 10:89–98. https://doi.org/10.1016/j.eng.2021.05.022

  30. Zanjani Foumani Zahra, Shishehbor Mehdi, Yousefpour Amin, Bostanabad Ramin (2023) Multi-fidelity cost-aware Bayesian optimization. Comput Methods Appl Mech Eng 407:115937. https://doi.org/10.1016/j.cma.2023.115937

  31. Mehrez Loujaine, Fish Jacob, Aitharaju Venkat, Rodgers Will R, Ghanem Roger (2017) A PCE-based multiscale framework for the characterization of uncertainties in complex systems. Comput Mech 61(1–2):219–236. https://doi.org/10.1007/s00466-017-1502-4

  32. Mora Carlos, Eweis-Labolle Jonathan Tammer, Johnson Tyler, Gadde Likith, Bostanabad Ramin (2023) Probabilistic neural data fusion for learning from an arbitrary number of multi-fidelity data sets. Comput Methods Appl Mech Eng 415:116207. https://doi.org/10.1016/j.cma.2023.116207

  33. Jones RE, Templeton JA, Sanders CM, Ostien JT (2018) Machine learning models of plastic flow based on representation theory. Comput Model Eng Sci 117:309–342. https://doi.org/10.31614/cmes.2018.04285

  34. Furukawa Tomonari, Yagawa Genki (1998) Implicit constitutive modelling for viscoplasticity using neural networks. Int J Numer Methods Eng 43(2):195–219

  35. Furukawa Tomonari, Hoffman Mark (2004) Accurate cyclic plastic analysis using a neural network material model. Eng Anal Bound Elem 28(3):195–204

  36. Fernández Mauricio, Rezaei Shahed, Mianroodi Jaber Rezaei, Fritzen Felix, Reese Stefanie (2020) Application of artificial neural networks for the prediction of interface mechanics: a study on grain boundary constitutive behavior. Adv Model Simul Eng Sci 7(1):1–27

  37. Lu Xiaoxin, Yvonnet Julien, Detrez Fabrice, Bai Jinbo (2017) Multiscale modeling of nonlinear electric conductivity in graphene-reinforced nanocomposites taking into account tunnelling effect. J Comput Phys 337:116–131

  38. Mianroodi Jaber Rezaei, Siboni Nima H, Raabe Dierk (2021) Teaching solid mechanics to artificial intelligence–A fast solver for heterogeneous materials. NPJ Comput Mater 7(1):1–10

  39. Haghighat Ehsan, Raissi Maziar, Moure Adrian, Gomez Hector, Juanes Ruben (2021) A physics-informed deep learning framework for inversion and surrogate modeling in solid mechanics. Comput Methods Appl Mech Eng 379:113741

  40. Peivaste Iman, Siboni Nima H, Alahyarizadeh Ghasem, Ghaderi Reza, Svendsen Bob, Raabe Dierk, Mianroodi Jaber Rezaei (2022) Machine-learning-based surrogate modeling of microstructure evolution using phase-field. Comput Mater Sci 214:111750

  41. Mozaffar M, Bostanabad R, Chen W, Ehmann K, Cao Jian, Bessa MA (2019) Deep learning predicts path-dependent plasticity. Proc Natl Acad Sci 116(52):26414–26420

  42. Wang Kun, Sun WaiChing (2018) A multiscale multi-permeability poroplasticity model linked by recursive homogenizations and deep learning. Comput Methods Appl Mech Eng 334:337–380

  43. Wu Ling, Kilingar Nanda Gopala, Noels Ludovic et al (2020) A recurrent neural network-accelerated multi-scale model for elasto-plastic heterogeneous materials subjected to random cyclic and non-proportional loading paths. Comput Methods Appl Mech Eng 369:113234

  44. Ghavamian F, Simone A (2019) Accelerating multiscale finite element simulations of history-dependent materials using a recurrent neural network. Comput Methods Appl Mech Eng 357:112594

  45. Logarzo Hernan J, Capuano German, Rimoli Julian J (2021) Smart constitutive laws: inelastic homogenization through machine learning. Comput Methods Appl Mech Eng 373:113482

  46. Otero Fermin, Oller Sergio, Martinez Xavier (2018) Multiscale computational homogenization: review and proposal of a new enhanced-first-order method. Archiv Comput Methods Eng 25(2):479–505

  47. Tang Shaoqiang, Yang Yang (2021) Why neural networks apply to scientific computing? Theor Appl Mech Lett 11(3):100242

  48. Hornik Kurt, Stinchcombe Maxwell, White Halbert (1989) Multilayer feedforward networks are universal approximators. Neural Netw 2(5):359–366

  49. Lipton Zachary C, Berkowitz John, Elkan Charles (2015) A critical review of recurrent neural networks for sequence learning. arXiv preprint arXiv:1506.00019

  50. Hanin Boris (2018) Which neural net architectures give rise to exploding and vanishing gradients? In: Advances in neural information processing systems, vol 31

  51. Staudemeyer Ralf C, Morris Eric Rothstein (2019) Understanding LSTM–A tutorial into long short-term memory recurrent neural networks. arXiv preprint arXiv:1909.09586

  52. Karpathy Andrej, Johnson Justin, Fei-Fei Li (2015) Visualizing and understanding recurrent networks. arXiv preprint arXiv:1506.02078

  53. Silhavy Miroslav (2013) The mechanics and thermodynamics of continuous media. Springer, Berlin

  54. Yang Han, Sinha Sumeet Kumar, Feng Yuan, McCallen David B, Jeremić Boris (2018) Energy dissipation analysis of elastic-plastic materials. Comput Methods Appl Mech Eng 331:309–326

  55. Feigenbaum Heidi P, Dafalias Yannis F (2007) Directional distortional hardening in metal plasticity within thermodynamics. Int J Solids Struct 44(22–23):7526–7542

  56. Xiang Zixue, Peng Wei, Liu Xu, Yao Wen (2022) Self-adaptive loss balanced physics-informed neural networks. Neurocomputing 496:11–34

  57. Márquez-Neila Pablo, Salzmann Mathieu, Fua Pascal (2017) Imposing hard constraints on deep networks: promises and limitations. arXiv preprint arXiv:1706.02025

  58. Goodfellow Ian, Bengio Yoshua, Courville Aaron (2016) Deep learning. MIT Press, Cambridge

  59. Deng Shiguang, Mora Carlos, Apelian Diran, Bostanabad Ramin (2022) Data-driven calibration of multifidelity multiscale fracture models via latent map Gaussian process. J Mech Design 145(1):011705

  60. Bazant Zdenek P (2010) Can multiscale-multiphysics methods predict softening damage and structural failure? Int J Multiscale Comput Eng 8(1):61–67

  61. Bengio Samy, Vinyals Oriol, Jaitly Navdeep, Shazeer Noam (2015) Scheduled sampling for sequence prediction with recurrent neural networks. In: Advances in neural information processing systems, vol 28

  62. Li Hengyang, Kafka Orion L, Gao Jiaying, Cheng Yu, Nie Yinghao, Zhang Lei, Tajdari Mahsa, Tang Shan, Guo Xu, Li Gang et al (2019) Clustering discretization methods for generation of material performance databases in machine learning and design optimization. Comput Mech 64:281–305

  63. Liu Daoping, Yang Hang, Elkhodary KI, Tang Shan, Liu Wing Kam, Guo Xu (2022) Mechanistically informed data-driven modeling of cyclic plasticity via artificial neural networks. Comput Methods Appl Mech Eng 393:114766

  64. Bostanabad Ramin, Liang Biao, Gao Jiaying, Liu Wing Kam, Cao Jian, Zeng Danielle, Su Xuming, Xu Hongyi, Li Yang, Chen Wei (2018) Uncertainty quantification in multiscale simulation of woven fiber composites. Comput Methods Appl Mech Eng 338:506–532

  65. Osanov Mikhail, Guest James K (2016) Topology optimization for architected materials design. Annu Rev Mater Res 46:211–233

  66. Ma Zheng-Dong, Kikuchi Noboru, Pierre Christophe, Basavaraju R (2006) Multidomain topology optimization for structural and material designs. J Appl Mech 73(4):565–573

  67. Deng Shiguang, Suresh Krishnan (2016) Multi-constrained 3D topology optimization via augmented topological level-set. Comput Struct 170:1–12

  68. Deng Shiguang, Suresh Krishnan (2015) Multi-constrained topology optimization via the topological sensitivity. Struct Multidiscip Optim 51(5):987–1001

  69. Oliver Javier (1989) A consistent characteristic length for smeared cracking models. Int J Numer Methods Eng 28(2):461–474

  70. Oliver Javier, Huespe Alfredo Edmundo, Pulido MDG, Chaves E (2002) From continuum mechanics to fracture mechanics: the strong discontinuity approach. Eng Fract Mech 69(2):113–136

  71. Liu Zeliang, Fleming Mark, Liu Wing Kam (2018) Microstructural material database for self-consistent clustering analysis of elastoplastic strain softening materials. Comput Methods Appl Mech Eng 330:547–577

  72. Smith Michael (2009) ABAQUS standard user’s manual, Version 6.9. Dassault Systèmes Simulia Corp

  73. Oliver Javier, Huespe Alfredo Edmundo, Cante JC (2008) An implicit/explicit integration scheme to increase computability of non-linear material and contact/friction problems. Comput Methods Appl Mech Eng 197(21–24):1865–1889

  74. Liu Gui-Rong (2009) Meshfree methods: moving beyond the finite element method. CRC Press, Boca Raton

  75. Jönsthövel TB, Van Gijzen MB, Vuik C, Kasbergen C, Scarpas A (2009) Preconditioned conjugate gradient method enhanced by deflation of rigid body modes applied to composite materials. Comput Model Eng Sci (CMES) 47(2):97

  76. Saha Sourav, Kafka Orion L, Ye Lu, Cheng Yu, Liu Wing Kam (2021) Macroscale property prediction for additively manufactured IN625 from microstructure through advanced homogenization. Integr Mater Manuf Innov 10:360–372

  77. Kafka Orion L, Cheng Yu, Cheng Puikei, Wolff Sarah J, Bennett Jennifer L, Garboczi Edward J, Cao Jian, Xiao Xianghui, Liu Wing Kam (2022) X-ray computed tomography analysis of pore deformation in IN718 made with directed energy deposition via in-situ tensile testing. Int J Solids Struct 256:111943

  78. Yang Yang, Zhang Lei, Tang Shaoqiang (2022) A comparative study of cluster-based methods at finite strain. Acta Mechanica Sinica 38(4):421153

  79. Nie Yinghao, Li Zheng, Cheng Gengdong (2021) Efficient prediction of the effective nonlinear properties of porous material by FEM-Cluster based Analysis (FCA). Comput Methods Appl Mech Eng 383:113921

  80. Dispinar D, Akhtar Shahid, Nordmark Arne, Di Sabatino Marisa, Arnberg LJMS (2010) Degassing, hydrogen and porosity phenomena in A356. Mater Sci Eng A 527(16–17):3719–3725


Acknowledgements

The authors appreciate the support from the ACRC consortium, the feedback from the anonymous reviewers, and the helpful discussions with Dr. Ling Wu and Dr. Ludovic Noels. Ramin Bostanabad also acknowledges support from NSF (award number 2211908) and the Office of Naval Research (award number N00014-23-1-2485).

Author information


Corresponding author

Correspondence to Ramin Bostanabad.

Ethics declarations

Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices


A Multiscale Continuum Damage Model

There are two popular approaches for modeling damage [69, 70]: (1) a discrete approach based on fracture mechanics, which models displacement discontinuities at interfaces, and (2) a continuous approach based on continuum mechanics, which models damage as strain softening caused by inelastic strains. In our work, we adopt the latter approach to generate a database of microstructural responses. The continuous approach, however, often suffers from non-objectivity: localized damage bands become progressively narrower upon mesh refinement, i.e., they are mesh dependent. This dependency is caused by governing equations that become ill-posed once damage destroys the positive definiteness of the stiffness. One way to address this issue is to modify the continuum constitutive law by introducing a crack bandwidth via nonlocal functions.

However, integrating nonlocal functions within a multiscale damage model is quite difficult [18, 60]. On the one hand, a material characteristic length \(l_m\) is needed at the microscale to stabilize the ill-posed governing equations (caused by a non-positive definite stiffness matrix) and remove the mesh dependency. On the other hand, a macroscale characteristic length \(l_M\) is required to provide nonlocality among distant macro elements. Simultaneously imposing \(l_m\) and \(l_M\) helps to stabilize the multiscale damage model but is not physically realistic. If only \(l_m\) is imposed and \(l_M\) is neglected, the multiscale model merely transfers the damage-induced localization to the finer scale. The result is equivalent to a single-scale model with \(l_m\), which defeats the purpose of multiscale modeling because microscale damage is not properly transmitted to the macroscale [60].

One way to properly impose nonlocal functions in multiscale damage models is to introduce a material characteristic length in each RVE that accounts for the influence of damage on neighboring RVEs (and associated macro elements) [71]. We adopt another approach to properly impose both the micro and macro damage regularizations [26]: our micro damage model is regularized by a predefined material fracture energy and its resulting effective damage parameter is subsequently regularized by a macro damage model via a nonlocal function for neighboring RVEs. We provide more details on our approach below.

We adopt a continuum damage model to simulate strain softening in ductile metals whose load-carrying capacity drops due to the degradation of yield stress and stiffness. To simulate the onset of softening, we choose a ductile damage initiation criterion which models the effective strain at damage initiation, i.e., \({\bar{E}}_{d}^{pl}\), as a function of stress and strain states. We presume that \({\bar{E}}_{d}^{pl}\) is a constant and that damage begins when the equivalent plastic strain reaches or exceeds it, i.e., \({\bar{E}}^{pl} \geqslant {\bar{E}}_{d}^{pl}\).

A major challenge of continuum damage models is the softening-induced non-positive definite stiffness matrix that results in slow solution convergence and negative wave speeds [60]. Specifically, the ill-posed problem causes the equilibrium equations to lose objectivity with respect to mesh size, which shows up as spurious mesh sensitivity. To address this lack of mesh objectivity in microstructural simulations, we convert the stress–strain relation in the constitutive equation to a stress-displacement relation and formulate the micro-damage evolution after damage initiation as:

$$\begin{aligned} G_f = \int _{{\bar{E}}_0^{pl}}^{{\bar{E}}_f^{pl}} l_e S_y \,d{{\bar{E}}^{pl}} = \int _{0}^{{\bar{u}}_f^{pl}} S_y \,d{{\bar{u}}^{pl}} \end{aligned}$$
(A-1)

where \(l_e\) indicates the element’s characteristic length in an arbitrary RVE, and \(G_f\) represents the dissipated energy that opens a unit area of crack after damage initiation. The equivalent plastic displacement \({\bar{u}}^{pl}\) is the fracture work conjugate to the yield stress \(S_y\) from the onset of damage (with the effective plastic strain \({\bar{E}}_0^{pl}\) and zero plastic displacement \({\bar{u}}^{pl}\)) until the final failure (with the effective fracture strain \({\bar{E}}_f^{pl}\) and the fracture displacement \({\bar{u}}_{f}^{pl}\)). Using Equation (A-1) we define the microscale damage evolution based on an exponential form of the released energy [72] as:

$$\begin{aligned} D_m = 1 - \exp \left( -\frac{1}{G_f} \int _{0}^{{\bar{u}}^{pl}} S_y \,d{{\bar{u}}^{pl}}\right) \end{aligned}$$
(A-2)

where \(D_m\) represents the microscale damage parameter that monotonically increases in the range of [0.0, 1.0]. We note that in our context of isotropic continuum damage, \(D_m\) is a scalar and it becomes a tensor in anisotropic damage models. In addition, we note that \(D_m\) approaches 1.0 asymptotically with infinitely large \({\bar{u}}^{pl}\) in Equation (A-2). In practice, we set \(D_m\) as 1.0 when the dissipated energy exceeds \(0.99G_f\). In our continuum damage model, we formulate a softening response of a micro point with an elasto-plastic behavior as:

$$\begin{aligned} {\varvec{S}}_m = (1-D_m){\varvec{S}}^0_m; \hspace{0.3cm} {\varvec{S}}^0_m = {\mathbb {C}}^{el}:{\varvec{E}}^{el}_m = {\mathbb {C}}^{el}:({\varvec{E}}_m-{\varvec{E}}^{pl}_m) \end{aligned}$$
(A-3)

where \({\varvec{S}}_m\) and \({\varvec{S}}^0_m\) are, respectively, the damaged stress and the reference stress that undergoes the same deformation path but in the absence of damage at a micro material point. \({\mathbb {C}}^{el}\) represents the fourth-order elasticity tensor. \({\varvec{E}}_m\), \({\varvec{E}}^{el}_m\) and \({\varvec{E}}^{pl}_m\) are the microscale total strain, elastic strain, and plastic strain, respectively.
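The micro-damage update in Equations (A-1) to (A-3) can be illustrated with a minimal sketch. It assumes a simple rectangle-rule accumulation of the dissipated energy with the yield stress held constant over each increment; the function names, argument names, and numerical values are hypothetical and not taken from the paper.

```python
import math


def update_micro_damage(dissipated, yield_stress, d_eq_plastic_strain, l_e, g_f):
    """Advance the dissipated energy by one increment and return (energy, D_m).

    Assumption: the integral in Equation (A-1) is accumulated with a
    rectangle rule, i.e. the yield stress is constant over the increment.
    """
    # Equation (A-1): plastic displacement increment = l_e * plastic strain increment.
    d_u_pl = l_e * d_eq_plastic_strain
    dissipated += yield_stress * d_u_pl
    # Practical cutoff from the text: D_m is set to 1.0 beyond 0.99 * G_f.
    if dissipated > 0.99 * g_f:
        return dissipated, 1.0
    # Equation (A-2): exponential damage evolution in the released energy.
    return dissipated, 1.0 - math.exp(-dissipated / g_f)


def damaged_stress(d_m, reference_stress):
    """Equation (A-3): scale every reference stress component by (1 - D_m)."""
    return [(1.0 - d_m) * s for s in reference_stress]


# Hypothetical values: three equal plastic strain increments after damage initiation.
energy, d_m = 0.0, 0.0
for _ in range(3):
    energy, d_m = update_micro_damage(
        energy, yield_stress=200.0, d_eq_plastic_strain=0.01, l_e=0.5, g_f=10.0
    )
stress = damaged_stress(d_m, [180.0, 0.0, 0.0])
```

Because the strain increment is multiplied by the element length \(l_e\) before the energy is accumulated, a smaller element needs a proportionally larger plastic strain to dissipate the same \(G_f\), which is the regularizing effect described above.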

From the microscale stress, we can compute the effective stress via Equation (2). Additionally, we compute the RVE’s effective damage parameter [71] by:

$$\begin{aligned} D_M = 1 - \frac{\Vert {\varvec{S}}_M: {\varvec{S}}_M^0 \Vert }{\Vert {\varvec{S}}_M^0:{\varvec{S}}_M^0 \Vert } \end{aligned}$$
(A-4)

where the homogenized damage parameter \(D_M\) indicates the damage status of the RVE. Its value depends on the values of the effective stress \({\varvec{S}}_M\) and the effective reference stress \({\varvec{S}}_M^0\) without damage; thus, it is clear that the effective damage parameter is not a function of homogenized plastic strains.
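As an illustration of Equation (A-4), the following sketch evaluates the effective damage parameter from two 3×3 stress tensors stored as nested lists. The helper names and the guard for a vanishing reference stress are assumptions made for this example only.

```python
def double_contraction(a, b):
    """Return A:B = sum_ij A_ij * B_ij for two 3x3 tensors."""
    return sum(a[i][j] * b[i][j] for i in range(3) for j in range(3))


def effective_damage(s_eff, s_ref):
    """Equation (A-4): D_M = 1 - |S_M : S_M^0| / |S_M^0 : S_M^0|."""
    denom = double_contraction(s_ref, s_ref)
    if denom == 0.0:
        # Assumption: an unloaded RVE (zero reference stress) is undamaged.
        return 0.0
    return 1.0 - abs(double_contraction(s_eff, s_ref)) / abs(denom)


# Hypothetical stresses: the damaged stress is 70% of the reference stress.
s_ref = [[100.0, 20.0, 0.0], [20.0, 50.0, 0.0], [0.0, 0.0, 10.0]]
s_eff = [[0.7 * value for value in row] for row in s_ref]
d_M = effective_damage(s_eff, s_ref)
```

Only the two homogenized stresses enter the computation, which reflects the remark that \(D_M\) is not a function of homogenized plastic strains.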

We proceed to constrain the effective damage parameter \(D_M\) via an integral-type non-local damage model to mitigate the spurious mesh dependency on the macroscale as:

$$\begin{aligned} {\hat{D}}_M({\varvec{P}}, {\varvec{P}}') = \int _{B} {\omega (\Vert {\varvec{P}}-{\varvec{P}}'\Vert )}D_M({\varvec{P}}') \,d{\varvec{P}}' \end{aligned}$$
(A-5)

where \({\hat{D}}_M({\varvec{P}}, {\varvec{P}}')\) is the non-local damage parameter at the macroscopic point \({\varvec{P}}\) surrounded by points \({\varvec{P}}'\) in the compact neighborhood B. \(D_M({\varvec{P}}')\) represents the local damage parameter at \({\varvec{P}}'\) and \(\omega \) indicates the non-local weighting function which depends on the distance \(\Vert {\varvec{P}}-{\varvec{P}}'\Vert \) between the studied point and its supporting points. In this work, we define \(\omega \) by a polynomial bell-shape function as:

$$\begin{aligned} {\omega (\Vert {\varvec{P}}-{\varvec{P}}'\Vert )} = \frac{\omega _\infty (\Vert {\varvec{P}}-{\varvec{P}}'\Vert )}{\int _{B} {\omega _\infty (\Vert {\varvec{P}}-{\varvec{P}}'\Vert )} \,d{\varvec{P}}'} \end{aligned}$$
(A-6a)
$$\begin{aligned} {\omega _\infty (\Vert {\varvec{P}}-{\varvec{P}}'\Vert )} = \left\langle 1 - \frac{4(\Vert {\varvec{P}}-{\varvec{P}}'\Vert )^2}{l_d^2} \right\rangle ^2 \end{aligned}$$
(A-6b)

where \(\langle \dots \rangle \) is the Macaulay bracket defined as \(\langle x \rangle = \max (0,x)\), \(l_d\) denotes the strain localization bandwidth whose value represents the non-local interacting radius, and the support domain B is a sphere with a radius of \(l_d/2\) in 3D models.
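A discrete version of the non-local averaging in Equations (A-5) and (A-6) might look like the sketch below. It assumes the integrals are replaced by sums over macroscopic integration points, each carrying a quadrature volume; the function names and the sample data are hypothetical.

```python
import math


def bell_weight(distance, l_d):
    """Equation (A-6b): polynomial bell function with the Macaulay bracket."""
    return max(0.0, 1.0 - 4.0 * distance**2 / l_d**2) ** 2


def nonlocal_damage(points, local_damage, volumes, l_d):
    """Discrete Equations (A-5)/(A-6a): normalized weighted average of D_M.

    Assumption: each integral over B becomes a sum over integration points,
    weighted by the (hypothetical) quadrature volume of each point.
    """
    smoothed = []
    for p in points:
        numerator = 0.0
        denominator = 0.0
        for q, d_q, v_q in zip(points, local_damage, volumes):
            w = bell_weight(math.dist(p, q), l_d) * v_q
            numerator += w * d_q
            denominator += w
        # The denominator is positive because the point itself has weight v_q.
        smoothed.append(numerator / denominator)
    return smoothed


# Hypothetical data: two nearby points interact, the distant one does not.
points = [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (2.0, 0.0, 0.0)]
d_hat = nonlocal_damage(points, [1.0, 0.0, 0.0], [1.0, 1.0, 1.0], l_d=1.0)
```

The weight vanishes for distances of \(l_d/2\) or more, so only points inside the spherical support contribute, and the normalization in Equation (A-6a) leaves a uniform damage field unchanged.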

In our multiscale damage model, we compute the regularized macro stress, \({\varvec{S}}\), as:

$$\begin{aligned} {\varvec{S}}=(1-{\hat{D}}_M){\varvec{S}}^0_M \end{aligned}$$
(A-7)

where we assume the effective reference stress \({\varvec{S}}^0_M\) can be closely approximated by the effective damaged stress (\({\varvec{S}}_M\)) and the effective damage parameter (\(D_M\)) from microstructural analyses. That is:

$$\begin{aligned} {\varvec{S}}^0_M={\varvec{S}}_M/(1-D_M) \end{aligned}$$
(A-8)

We can directly relate \({\varvec{S}}\) to the RVEs’ effective damaged stress (\({\varvec{S}}_M\)) via \({\varvec{S}}={\varvec{S}}_M(1-{\hat{D}}_M)/(1-D_M)\). These two stresses are identical if there are no macro nonlocal functions, i.e., \({\hat{D}}_M=D_M\). However, as discussed before, it is important to properly impose damage regularization on each scale to address the mesh dependency issue of continuum damage mechanics [18, 60]. In our model, we regularize micro damage by material fracture energy in Equation (A-2) and macro damage by nonlocal functions in Equation (A-5). Hence, the two stresses (\({\varvec{S}}\) and \({\varvec{S}}_M\)) in our case are directly related via a coefficient of \((1-{\hat{D}}_M)/(1-D_M)\).
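The scaling between the two stresses can be written as a short sketch. The cap `d_max` on the local damage is a hypothetical numerical guard against division by zero near full damage and is not part of the formulation above.

```python
def regularized_macro_stress(s_eff, d_local, d_nonlocal, d_max=0.999):
    """Combine Equations (A-7) and (A-8): S = S_M * (1 - D_hat_M) / (1 - D_M).

    Assumption: d_max caps the local damage so the ratio stays finite.
    """
    d_local = min(d_local, d_max)
    factor = (1.0 - d_nonlocal) / (1.0 - d_local)
    return [[factor * value for value in row] for row in s_eff]


# Hypothetical uniaxial effective stress returned by an RVE analysis.
s_M = [[70.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
s_same = regularized_macro_stress(s_M, d_local=0.3, d_nonlocal=0.3)
s_more = regularized_macro_stress(s_M, d_local=0.3, d_nonlocal=0.5)
```

When the non-local and local damage coincide the factor is one and the RVE stress is returned unchanged; a larger non-local damage lowers the macro stress further.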

We now demonstrate that the relation in Equation (A-8) is directly related to the definition of the effective damage parameter in Equation (A-4):

$$\begin{aligned} {\varvec{S}}_M = (1-D_M) {\varvec{S}}_M^0 \end{aligned}$$
(A-9a)
$$\begin{aligned} {\varvec{S}}_M: {\varvec{S}}_M^0 = (1-D_M) {\varvec{S}}_M^0: {\varvec{S}}_M^0 \end{aligned}$$
(A-9b)
$$\begin{aligned} ({\varvec{S}}_M:{\varvec{S}}_M^0)/({\varvec{S}}_M^0:{\varvec{S}}_M^0) = (1-D_M) ({\varvec{S}}_M^0:{\varvec{S}}_M^0)/({\varvec{S}}_M^0:{\varvec{S}}_M^0) \end{aligned}$$
(A-9c)
$$\begin{aligned} ({\varvec{S}}_M:{\varvec{S}}_M^0)/({\varvec{S}}_M^0:{\varvec{S}}_M^0) = (1-D_M) \end{aligned}$$
(A-9d)

where we can obtain the definition of the effective damage parameter as in Equation (A-4) since \(0 \le 1-D_M \le 1\).

B Hybrid Constitutive Integration

The non-positive definiteness of the stiffness matrix is the primary reason for the slow convergence of classic implicit time integration schemes that are used in continuum damage simulations. For illustration, consider the constitutive equation of an isotropic damage model integrated by an implicit backward-Euler integration scheme. Its algorithmic tangent operator at an arbitrary macroscopic IP can be written as:

$$\begin{aligned} {\mathbb {C}}_{n+1}^{alg} = \frac{\partial {\varvec{S}}_{n+1}}{\partial {\varvec{E}}_{n+1}} = (1-D_{n+1}){\mathbb {C}}^{el} - \frac{S_{n+1} - H_{n}{\bar{E}}_{n+1}^{pl}}{({\bar{E}}_{n+1}^{pl})^3} {\varvec{S}}_{n+1}^{0} \otimes {\varvec{S}}_{n+1}^{0} \end{aligned}$$
(B-1)

where \({\mathbb {C}}_{n+1}^{alg}\), \({\bar{E}}_{n+1}^{pl}\), \(S_{n+1}\), \({\varvec{S}}_{n+1}^{0}\) and \(H_{n}\) represent the fourth-order algorithmic tangent operator, equivalent plastic strain, equivalent stress, reference stress tensor, and softening modulus, respectively. The subscripts denote time steps and the symbol \(\otimes \) represents the tensor (dyadic) product. Softening causes negative values for \(H_{n}\), which can render \({\mathbb {C}}_{n+1}^{alg}\) indefinite. A tangent operator that is not positive definite leads to an ill-conditioned elemental stiffness matrix with near-zero or negative eigenvalues, and further degrades the global stiffness matrix during element assembly. Such ill-conditioned matrices dramatically reduce the efficiency of iterative solvers (e.g., Newton–Raphson methods) and often cause simulations to abort before convergence.
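The loss of positive definiteness can be illustrated with a small sketch in Voigt notation. Here `coeff` is a hypothetical stand-in for the scalar factor that multiplies \({\varvec{S}}_{n+1}^{0} \otimes {\varvec{S}}_{n+1}^{0}\) in Equation (B-1), the elastic tensor is assumed isotropic, and all values are non-dimensional and chosen only for illustration.

```python
def isotropic_voigt(lam, mu):
    """6x6 isotropic elasticity matrix in Voigt notation (Lame constants)."""
    c = [[0.0] * 6 for _ in range(6)]
    for i in range(3):
        for j in range(3):
            c[i][j] = lam
        c[i][i] = lam + 2.0 * mu
        c[i + 3][i + 3] = mu
    return c


def tangent(c_el, damage, coeff, s_ref):
    """(1 - D) * C_el - coeff * (s_ref outer s_ref), mirroring Equation (B-1)."""
    return [
        [(1.0 - damage) * c_el[i][j] - coeff * s_ref[i] * s_ref[j] for j in range(6)]
        for i in range(6)
    ]


def quadratic_form(mat, v):
    """Return v . mat . v; a non-positive value for some nonzero v rules out
    positive definiteness."""
    return sum(v[i] * mat[i][j] * v[j] for i in range(6) for j in range(6))


# Hypothetical non-dimensional values with a uniaxial reference stress.
c_el = isotropic_voigt(lam=1.0, mu=1.0)
s_ref = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
q_mild = quadratic_form(tangent(c_el, 0.2, 0.0, s_ref), s_ref)    # positive
q_severe = quadratic_form(tangent(c_el, 0.2, 3.0, s_ref), s_ref)  # negative
```

With `coeff = 0` the quadratic form along `s_ref` equals \((1-D)(\lambda + 2\mu) = 2.4\); once `coeff` exceeds that value the form turns negative, so the tangent is no longer positive definite.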

To fundamentally resolve the convergence issue, we adopt a hybrid time integration scheme [26, 73] that integrates the elasto-plastic and softening constitutive equations in a combined explicit-implicit manner. The basic idea of the hybrid integration is to maintain the positive definiteness of the system’s algorithmic tangent operator by integrating the constitutive equations in two consecutive stages via explicit and implicit schemes. In the first stage, we explicitly extrapolate the internal material state variables at time step \(n+1\) from step n to compute the explicit stress state \(\tilde{{\varvec{S}}}_{n+1}\) that balances the equilibrium equation between internal and external forces. In the second stage, we compute the implicit stress state \({\varvec{S}}_{n+1}\) based on the current strain state \({\varvec{E}}_{n+1}\) using the classic backward Euler method to update the trial stress tensor and yield functions for the next time step. Throughout, the tangent operator between \(\tilde{{\varvec{S}}}_{n+1}\) and \({\varvec{E}}_{n+1}\) is kept positive definite.

For the elasto-plastic model, we choose the material state variable as the incremental plastic strain tensor \(\triangle \tilde{{\varvec{E}}}_{n+1}^{pl}\) such that \(\tilde{{\varvec{S}}}_{n+1}\) can be computed as:

$$\begin{aligned}{} & {} \tilde{{\varvec{S}}}_{n+1}(\triangle \tilde{{\varvec{E}}}_{n+1}^{pl}) = \tilde{{\varvec{S}}}_{n+1}^{trial} - {\mathbb {C}}^{el}:\triangle \tilde{{\varvec{E}}}_{n+1}^{pl} = {\mathbb {C}}^{el}:{\varvec{E}}_{n+1} \nonumber \\{} & {} \quad - {\mathbb {C}}^{el}:{\varvec{E}}_{n}^{pl} - {\mathbb {C}}^{el}:\triangle \tilde{{\varvec{E}}}_{n+1}^{pl} \nonumber \\{} & {} \quad \triangle \tilde{{\varvec{E}}}_{n+1}^{pl} = \frac{\triangle t_{n+1}}{\triangle t_n} \triangle {\varvec{E}}_n^{pl} \end{aligned}$$
(B-2)

where \(\triangle {\varvec{E}}_n^{pl}\) represents the implicit incremental plastic strain tensor at time step n, and \(\triangle t_n\) and \(\triangle t_{n+1}\) indicate the lengths of two consecutive time steps. The algorithmic tangent operator (under loading, see Footnote 5) is therefore computed as:

$$\begin{aligned}{} & {} \tilde{{\mathbb {C}}}_{n+1}^{alg} = \frac{\partial {\tilde{{\varvec{S}}}_{n+1}(\triangle \tilde{{\varvec{E}}}_{n+1}^{pl})}}{\partial {{\varvec{E}}_{n+1}}} \nonumber \\{} & {} \quad = \frac{\partial ({\mathbb {C}}^{el}:{\varvec{E}}_{n+1} - {\mathbb {C}}^{el}:{\varvec{E}}_{n}^{pl} - {\mathbb {C}}^{el}:\triangle \tilde{{\varvec{E}}}_{n+1}^{pl})}{\partial {{\varvec{E}}_{n+1}}} = {\mathbb {C}}^{el} \nonumber \\ \end{aligned}$$
(B-3)
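
Equation (B-3) can be checked numerically. The sketch below is an illustration under stated assumptions (Voigt notation, illustrative elastic constants and state values, hypothetical function names): it evaluates the explicit stress of Equation (B-2) with the plastic increment extrapolated from step n, and confirms by central finite differences that its derivative with respect to \({\varvec{E}}_{n+1}\) is \({\mathbb {C}}^{el}\), because the extrapolated increment does not depend on the current strain.

```python
import numpy as np

def isotropic_stiffness(young, poisson):
    """Return the 6x6 isotropic elastic stiffness in Voigt notation."""
    lam = young * poisson / ((1.0 + poisson) * (1.0 - 2.0 * poisson))
    mu = young / (2.0 * (1.0 + poisson))
    C = np.zeros((6, 6))
    C[:3, :3] = lam + 2.0 * mu * np.eye(3)
    C[3:, 3:] = mu * np.eye(3)
    return C

def explicit_stress(C_el, strain_np1, eps_pl_n, d_eps_pl_n, dt_n, dt_np1):
    """Explicit stress of Eq. (B-2); the plastic increment is extrapolated from step n."""
    d_eps_pl_tilde = (dt_np1 / dt_n) * d_eps_pl_n
    return C_el @ (strain_np1 - eps_pl_n - d_eps_pl_tilde)

def finite_difference_tangent(stress_fn, strain, h=1.0e-4):
    """Central-difference approximation of d(stress)/d(strain)."""
    n = strain.size
    tangent = np.zeros((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = h
        tangent[:, j] = (stress_fn(strain + step) - stress_fn(strain - step)) / (2.0 * h)
    return tangent

# Illustrative state (assumed values, MPa and dimensionless strains).
C_el = isotropic_stiffness(70.0e3, 0.33)
strain_np1 = np.array([0.012, -0.004, -0.004, 0.001, 0.0, 0.0])
eps_pl_n = np.array([0.002, -0.001, -0.001, 0.0, 0.0, 0.0])
d_eps_pl_n = np.array([2.0e-4, -1.0e-4, -1.0e-4, 0.0, 0.0, 0.0])

def stress_fn(e):
    return explicit_stress(C_el, e, eps_pl_n, d_eps_pl_n, dt_n=0.1, dt_np1=0.05)

tangent = finite_difference_tangent(stress_fn, strain_np1)
max_error = np.abs(tangent - C_el).max()
print("max |tangent - C_el| =", max_error)
```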

In a similar manner, for isotropic continuum damage models, we choose the explicitly extrapolated material state variable in the hybrid integration as the incremental plastic multiplier \(\triangle {\tilde{\lambda }}_{n+1}\), i.e., \(\triangle {\tilde{\lambda }}_{n+1} = (\triangle t_{n+1} / \triangle t_n) \triangle \lambda _n\). We can then write its explicit damaged stress and algorithmic tangent operator (under loading, see Footnote 6) as:

$$\begin{aligned}{} & {} \tilde{{\varvec{S}}}_{n+1} = (1-{\tilde{D}}_{n+1}) {\varvec{S}}_{n+1}^0 = (1-{\tilde{D}}_{n+1}) {\mathbb {C}}^{el}:{\varvec{E}}_{n+1};\nonumber \\{} & {} {\tilde{D}}_{n+1} = {\tilde{D}}_{n+1} (D_n, \triangle {\tilde{\lambda }}_{n+1}) \end{aligned}$$
(B-4)
$$\begin{aligned}{} & {} \tilde{{\mathbb {C}}}_{n+1}^{alg} = \frac{\partial {\tilde{{\varvec{S}}}_{n+1}}}{\partial {{\varvec{E}}_{n+1}}} = (1-{\tilde{D}}_{n+1}) {\mathbb {C}}^{el} \end{aligned}$$
(B-5)

where \({\varvec{S}}_{n+1}^0\) is the effective stress tensor, and \({\tilde{D}}_{n+1}\) represents the explicit state of the damage variable which is a function of its previous implicit state \(D_n\) and the current explicit incremental plastic multiplier \(\triangle {\tilde{\lambda }}_{n+1}\).

In the hybrid integration scheme, the loading tangent operators of the elasto-plastic model in Equation (B-3) and the damage model in Equation (B-5) are simply \({\mathbb {C}}^{el}\) and \((1-{\tilde{D}}_{n+1}) {\mathbb {C}}^{el}\), respectively. Hence, the hybrid integration scheme preserves the positive definiteness of the tangent operators and also allows the global stiffness matrix to be assembled only once before online simulations. The global stiffness matrix remains constant in the elasto-plastic regime and only needs partial updates of the matrix entries associated with the softening IPs via Equation (B-5). As softening is often highly localized in small regions, the global stiffness matrix can be incrementally updated during the entire elasto-plastic-hardening-softening process [26], which significantly reduces the memory footprint while providing robust convergence.
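
The explicit stage for the damage model can be sketched as follows. This is a minimal illustration rather than the authors' code: the damage evolution law in `explicit_damage` (linear in the extrapolated multiplier and capped below 1) is a hypothetical placeholder for the function \({\tilde{D}}_{n+1} (D_n, \triangle {\tilde{\lambda }}_{n+1})\), which the text leaves unspecified, and all numerical values are assumed.

```python
import numpy as np

def isotropic_stiffness(young, poisson):
    """Return the 6x6 isotropic elastic stiffness in Voigt notation."""
    lam = young * poisson / ((1.0 + poisson) * (1.0 - 2.0 * poisson))
    mu = young / (2.0 * (1.0 + poisson))
    C = np.zeros((6, 6))
    C[:3, :3] = lam + 2.0 * mu * np.eye(3)
    C[3:, 3:] = mu * np.eye(3)
    return C

def extrapolate_multiplier(dlam_n, dt_n, dt_np1):
    """Explicit extrapolation: dlam_tilde = (dt_{n+1} / dt_n) * dlam_n."""
    return (dt_np1 / dt_n) * dlam_n

def explicit_damage(damage_n, dlam_tilde, rate=20.0, damage_max=0.99):
    """Hypothetical damage law (an assumption): linear growth, capped below 1."""
    return min(damage_n + rate * dlam_tilde, damage_max)

def explicit_damaged_response(C_el, strain_np1, damage_tilde):
    """Explicit stress, Eq. (B-4), and tangent, Eq. (B-5)."""
    tangent = (1.0 - damage_tilde) * C_el
    return tangent @ strain_np1, tangent

C_el = isotropic_stiffness(70.0e3, 0.33)
strain_np1 = np.array([0.012, -0.004, -0.004, 0.001, 0.0, 0.0])

dlam_tilde = extrapolate_multiplier(dlam_n=2.0e-3, dt_n=0.1, dt_np1=0.05)
damage_tilde = explicit_damage(damage_n=0.40, dlam_tilde=dlam_tilde)
stress_tilde, tangent = explicit_damaged_response(C_el, strain_np1, damage_tilde)

# Cholesky factorization succeeds only for a symmetric positive definite matrix.
np.linalg.cholesky(tangent)
print("explicit damage:", damage_tilde)
```

Because the explicit damage does not depend on the current strain, the tangent is a positive multiple of \({\mathbb {C}}^{el}\) as long as the damage stays below 1, and only the entries of softening IPs change between steps.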

C Deflated Clustering Analysis

Simulating microstructural softening via the classic FE2 method is computationally demanding, which makes it prohibitive for generating large training datasets for machine learning models. To accelerate the database generation, we adopt our previously developed mechanistic ROM, i.e., deflated clustering analysis (DCA) [25, 26]. Its high efficiency comes from two facts: (1) the number of unknown variables in the system is dramatically reduced from a large number of finite elements to a few clusters by agglomerating elements via clustering as shown in Fig. 19, and (2) the algebraic equations of the reduced system contain far fewer close-to-zero eigenvalues, which results in better convergence compared to the classic FE system.

Our DCA utilizes k-means clustering, i.e., an unsupervised machine learning technique for data interpretation and grouping, to agglomerate neighboring elements into a set of interactive irregular-shape clusters. The clustering begins with feeding the coordinates of element centroids into a feature space where randomly scattered cluster seeds serve as initial cluster means. Clusters accept or reject elements by iteratively minimizing the within-cluster variance until all elements are assigned to a cluster. The clustering procedure can be mathematically stated as a minimization problem as:

$$\begin{aligned} {\varvec{C}} = \min \limits _{{\varvec{C}}}\sum \limits _{I = 1}^k \sum \limits _{n \in C^I} \Vert \pmb {\varphi }_n - \bar{\pmb {\varphi }}_I \Vert ^2 \end{aligned}$$
(C-1)

where \({\varvec{C}}\) represents the k clusters with \({\varvec{C}} = \{C^1, C^2, \dots , C^k\}\). \(\pmb {\varphi }_n\) and \(\bar{\pmb {\varphi }}_I\) indicate the coordinates of the centroid of the \(n^{th}\) element and the mean of the coordinates of the \(I^{th}\) cluster, respectively. A clustering example is illustrated in Fig. 19 where the discrete domain of a generic 2D RVE with 5,000 elements is decomposed into 100 clusters.
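
The minimization in Equation (C-1) is commonly solved with Lloyd's iteration. The sketch below is a generic NumPy version, not the authors' DCA package: the regular grid of 5,000 element centroids, the random seeding, and the function name `kmeans_centroids` are assumptions made for illustration.

```python
import numpy as np

def kmeans_centroids(points, k, n_iter=50, seed=0):
    """Cluster element centroids by Lloyd's iteration for Eq. (C-1)."""
    rng = np.random.default_rng(seed)
    # Randomly chosen elements serve as the initial cluster means.
    means = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dist2 = ((points[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        labels = dist2.argmin(axis=1)          # assign each element to its nearest mean
        new_means = means.copy()
        for i in range(k):
            members = points[labels == i]
            if len(members) > 0:               # keep the old mean if a cluster is empty
                new_means[i] = members.mean(axis=0)
        converged = np.allclose(new_means, means)
        means = new_means
        if converged:
            break
    # Final assignment and within-cluster sum of squares for the returned means.
    dist2 = ((points[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    labels = dist2.argmin(axis=1)
    inertia = dist2[np.arange(len(points)), labels].sum()
    return labels, means, inertia

# Hypothetical 2D RVE: 100 x 50 = 5,000 square elements with unit edge length.
xs, ys = np.meshgrid(np.arange(100) + 0.5, np.arange(50) + 0.5)
centroids = np.column_stack([xs.ravel(), ys.ravel()])

labels, means, inertia = kmeans_centroids(centroids, k=100)
total_variance = ((centroids - centroids.mean(axis=0)) ** 2).sum()
print("within-cluster sum of squares:", inertia)
print("sum of squares with one cluster:", total_variance)
```

Lloyd's iteration only finds a local minimum of Equation (C-1), so the result depends on the seeds; library implementations usually rerun it from several initializations.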

Fig. 19 Demonstration of clustering in ROM: The domain of a generic 2D RVE with 5,000 elements in (a) is decomposed into 100 clusters in (b) where elements in the same cluster are assigned the same color

We construct a clustering-based reduced mesh via Delaunay triangulation by connecting cluster centroids, where the topological relations between clusters are preserved from the original FE mesh. By assuming the motions of cluster centroids are directly related to clustering nodes, we can compute the nodal displacements via polynomial-augmented radial point interpolation [74] as:

$$\begin{aligned} {\varvec{u}}_c = {\varvec{R}}{\varvec{a}} + {\varvec{Z}}{\varvec{b}} \end{aligned}$$
(C-2)

where \({\varvec{u}}_c\) represents the displacements of cluster centroids. \({\varvec{a}}\) is the coefficient vector of the radial basis function matrix \({\varvec{R}}\), and \({\varvec{b}}\) is the coefficient vector of the polynomial basis matrix \({\varvec{Z}}\). Meanwhile, the radial coefficients and the polynomial basis need to satisfy the following equation for every node per cluster and every polynomial basis function to ensure solution uniqueness [74]:

$$\begin{aligned} {\varvec{Z}}^{T}{\varvec{a}} = {\varvec{0}} \end{aligned}$$
(C-3)

Fig. 20 Multiscale cube model: (a) Every integration point of the macro-cube model is associated with a porous RVE; and (b) The RVE domain is discretized by different numbers of clusters
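
The interpolation in Equations (C-2) and (C-3) can be sketched generically as follows. This is not the authors' implementation: it assumes the standard form of polynomial-augmented radial point interpolation, in which the coefficients solve an augmented linear system with the orthogonality constraint \({\varvec{Z}}^{T}{\varvec{a}} = {\varvec{0}}\). The multiquadric basis, the linear polynomial basis, the shape parameter, and the scalar test field are all illustrative choices.

```python
import numpy as np

def multiquadric(points_a, points_b, shape):
    """Multiquadric radial basis sqrt(r^2 + c^2) between two point sets."""
    dist = np.linalg.norm(points_a[:, None, :] - points_b[None, :, :], axis=2)
    return np.sqrt(dist ** 2 + shape ** 2)

def linear_basis(points):
    """Polynomial basis [1, x, y] evaluated at each point."""
    return np.hstack([np.ones((len(points), 1)), points])

def rpim_coefficients(nodes, values, shape=0.25):
    """Solve [[R, Z], [Z^T, 0]] [a; b] = [u; 0] for the coefficients a and b."""
    R = multiquadric(nodes, nodes, shape)
    Z = linear_basis(nodes)
    m = Z.shape[1]
    system = np.block([[R, Z], [Z.T, np.zeros((m, m))]])
    rhs = np.concatenate([values, np.zeros(m)])
    solution = np.linalg.solve(system, rhs)
    return solution[:len(nodes)], solution[len(nodes):]

def rpim_evaluate(points, nodes, a, b, shape=0.25):
    """Evaluate u = R a + Z b at arbitrary points, following Eq. (C-2)."""
    return multiquadric(points, nodes, shape) @ a + linear_basis(points) @ b

# Hypothetical data: a 5 x 5 grid of nodes carrying a linear displacement field.
grid = np.linspace(0.0, 1.0, 5)
nodes = np.array([(x, y) for x in grid for y in grid])
values = 2.0 + 3.0 * nodes[:, 0] - nodes[:, 1]

a, b = rpim_coefficients(nodes, values)
query = np.random.default_rng(0).random((10, 2))
predicted = rpim_evaluate(query, nodes, a, b)
exact = 2.0 + 3.0 * query[:, 0] - query[:, 1]
print("max interpolation error:", np.abs(predicted - exact).max())
```

The polynomial augmentation is what lets the interpolant reproduce a linear field up to round-off, which the example checks at random query points.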

The displacements of cluster centroids are augmented with rotational degrees of freedom to represent the six rigid body motions in a 3D deflation space [75], including three translations and three rotations. Upon the completion of a non-linear analysis on the reduced mesh, the displacement solutions can be projected back to the original FE mesh by:

$$\begin{aligned} {\varvec{u}}_{i}^{j} = {\varvec{W}}_{i}^{j} \pmb {\lambda }_{j} \end{aligned}$$
(C-4)

where \({\varvec{u}}_{i}^{j}\) represents the displacement vector at the \(i^{th}\) node in the \(j^{th}\) cluster. In addition, \(\pmb {\lambda }_{j}\) is the rigid body motion of the centroid of the \(j^{th}\) cluster, while the \({\varvec{W}}_{i}^{j}\) indicates the deflation matrix for the \(i^{th}\) node in the \(j^{th}\) cluster as:

$$\begin{aligned} \pmb {\lambda }_{j}= & {} [u_{jx}, u_{jy}, u_{jz}, \theta _{jx}, \theta _{jy}, \theta _{jz}]^{T}; \nonumber \\ {\varvec{W}}_{i}^{j}= & {} \begin{bmatrix} 1 &{} 0 &{} 0 &{} 0 &{} z_{i}^{j} &{} -y_{i}^{j} \\ 0 &{} 1 &{} 0 &{} -z_{i}^{j} &{} 0 &{} x_{i}^{j} \\ 0 &{} 0 &{} 1 &{} y_{i}^{j} &{} -x_{i}^{j} &{} 0 \end{bmatrix} \end{aligned}$$
(C-5)

where \(u_{jx}\) and \(\theta _{jx}\) are the displacement and rotation of the \(j^{th}\) cluster along the x axis, and (\(x_{i}^{j}\), \(y_{i}^{j}\), \(z_{i}^{j}\)) are the relative 3D coordinates of the \(i^{th}\) node with respect to the centroid of the \(j^{th}\) cluster. By assuming all elements in the same cluster share identical stress and strain fields, microstructural effective responses can be reproduced in a highly efficient manner: the number of unknowns drops from the FE system, which tracks distinct field variables per element, to the reduced system, which has far fewer distinct solutions, one per cluster.
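
The projection in Equations (C-4) and (C-5) amounts to applying a small rigid body motion to every node of a cluster. The following sketch, with hypothetical node coordinates and function names, builds \({\varvec{W}}_{i}^{j}\) exactly as written in Equation (C-5) and compares the rotational part with the cross product \(\pmb {\theta } \times {\varvec{r}}\).

```python
import numpy as np

def deflation_matrix(rel_coords):
    """Build the 3x6 deflation matrix of Eq. (C-5) from relative coordinates."""
    x, y, z = rel_coords
    return np.array([
        [1.0, 0.0, 0.0, 0.0,   z,  -y],
        [0.0, 1.0, 0.0,  -z, 0.0,   x],
        [0.0, 0.0, 1.0,   y,  -x, 0.0],
    ])

def project_to_nodes(node_coords, centroid, rigid_motion):
    """Apply Eq. (C-4), u_i = W_i @ lambda, to every node of one cluster."""
    return np.array([deflation_matrix(p - centroid) @ rigid_motion for p in node_coords])

# Hypothetical cluster with four nodes (coordinates are illustrative only).
node_coords = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
centroid = node_coords.mean(axis=0)

# Pure translation: every node receives the same displacement.
translation = np.array([0.1, 0.0, 0.0, 0.0, 0.0, 0.0])
u_translation = project_to_nodes(node_coords, centroid, translation)

# Small rotation about z: the displacement equals theta x r for each node.
theta = np.array([0.0, 0.0, 1.0e-3])
rotation = np.concatenate([np.zeros(3), theta])
u_rotation = project_to_nodes(node_coords, centroid, rotation)
u_cross = np.cross(theta, node_coords - centroid)
print("max difference to theta x r:", np.abs(u_rotation - u_cross).max())
```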

To demonstrate the efficacy of our DCA, we compare its simulation results on a 3D multiscale cube against the classic FE2 method in Fig. 20. The macro-cube is fully constrained at its bottom surface and is subjected to an upward extension of \(d = 7\) mm on its top surface. The cube is meshed with 12 reduced-integration tetrahedral elements (one IP at the center of each tetrahedron). We assume each macro-IP is associated with the same porous RVE containing one spherical pore in the middle as shown in Fig. 20a.

Fig. 21 Results of the multiscale cube model: (a) Comparison of the softening load–displacement curves between FE2 and FE-ROM with different clusters; and (b) Comparison of computational time

To determine the number of clusters for a given problem (for any clustering-based ROM, e.g., DCA, SCA, or SCA’s variants), we can perform a quick preliminary convergence study in which we gradually increase the number of clusters and determine the minimum number above which the results no longer change significantly. This convergence study can also be done by comparing the results of the ROM to those of direct numerical simulations (DNS). Yet another method is to formulate a data-driven inverse optimization problem [59] where the cluster number is treated as an optimization variable. In this work, we carry out a convergence study to ensure that our ROM’s solutions do not change as the number of clusters increases and that they are consistent with the DNS, i.e., FE2. Specifically, we apply four clustering levels (k) of 400, 800, 1,200 and 1,600 to an RVE meshed with 15,000 elements and investigate the effects of k on the RVE’s effective softening behaviors, see Fig. 20.

We compare the reaction force-displacement curves from FE2 and FE-ROM in Fig. 21a. By considering the FE2 solutions as the benchmark, we observe that: (1) the FE-ROM solutions with \(k = 400\) slightly overestimate the component’s strength as insufficient clustering in the RVE artificially strengthens the material [23, 25]; and (2) as k increases, the FE-ROM responses (especially the post-failure behaviors) become progressively closer to the benchmark. Specifically, we observe that when k increases to 1,200 and 1,600, FE-ROMs achieve sufficiently accurate results compared to FE2.

Fig. 22 Architectures of GRU layer and cells: The internal structure and mathematical operations are demonstrated in the GRU cell at time step t

Fig. 23 Experimental characterization of our aluminum alloy A356: (a) A356 ingots are melted in a high-temperature furnace with degassing to remove porosity; (b) Heat treatment of cast tensile bars; (c) Composition analysis; and (d) Tensile tests of the cast alloys

We compare the computational costs of the different solvers in Fig. 21b. All experiments are performed on an HPC cluster using 60 parallel CPU cores with 360 GB of RAM. The wall-clock time of FE2 is the longest (about 24.9 hours). The wall-clock times of the ROM with 1,200 and 1,600 clusters are about 2.5 and 3.2 hours, corresponding to acceleration factors of 9.9 and 7.8, respectively. Considering that the ROM with \(k = 1,200\) is about \(28\%\) faster than its counterpart with \(k = 1,600\) while achieving similar accuracy, we adopt \(k = 1,200\) when building the training dataset in Sect. 4.
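
The acceleration factors follow from ratios of the wall-clock times. The short check below recomputes them from the rounded hours quoted in the text, so it reproduces the reported factors only approximately; the variable names are illustrative.

```python
# Wall-clock times in hours, as quoted (rounded) in the text.
fe2_hours = 24.9
rom_hours = {1200: 2.5, 1600: 3.2}

# Acceleration factor of each ROM relative to FE2.
speedup = {k: fe2_hours / hours for k, hours in rom_hours.items()}

# Relative extra cost of k = 1,600 compared with k = 1,200.
extra_cost = rom_hours[1600] / rom_hours[1200] - 1.0

for k, factor in speedup.items():
    print(f"k = {k}: acceleration factor of about {factor:.1f}")
print(f"k = 1600 takes about {100.0 * extra_cost:.0f}% longer than k = 1200")
```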

For efficient generation of (micro)structure-performance datasets, we note that many other ROMs can also be used for porous microstructural analyses. For example, self-consistent clustering analysis (SCA) [23, 76, 77] and virtual clustering analysis (VCA) [24] can achieve highly efficient and accurate microstructural homogenization results by treating pores as a soft material whose modulus is \(0.1\%\) of that of the matrix material [78]. Another method is the FEM-cluster-based analysis (FCA) [79] where the Hill-Mandel theorem is replaced with the energy equivalence theorem without filling pores with reference material properties. As our focus in this paper is on building a deep learning model that can faithfully surrogate microstructural analyses, we use our in-house DCA package and plan to leverage other methods such as SCA in future work.

D Gated Recurrent Unit

To alleviate the vanishing and exploding gradient issues of RNNs in processing long sequential data, long short-term memory (LSTM) and gated recurrent unit (GRU) cells are typically used. GRU is a variant of the LSTM that, while providing similar accuracy, is more parsimonious and hence computationally more efficient. It is for this reason that we choose GRU as the memory cell in our proposed RNN architecture as in Fig. 4.

To demonstrate the working mechanism of GRUs, we show three interconnected cells of a GRU layer in Fig. 22. In a GRU layer, a typical cell at an arbitrary time step t generates predictions \(\hat{{\varvec{y}}}_t\) and internal memory-like hidden variables \({\varvec{h}}_t\) after reading in the current inputs \({\varvec{x}}_t\) and the hidden variables \({\varvec{h}}_{t-1}\) from the previous cell. Compared to the RNN cell in Fig. 3b, the GRU cell uses reset and update gates to regulate its internal information flow. The reset gate \({\varvec{r}}_t\) reads \({\varvec{x}}_t\) and \({\varvec{h}}_{t-1}\) to determine the candidate hidden state \(\tilde{{\varvec{h}}}_t\) by filtering out less important information passing from the previous cell. Its operations include:

$$\begin{aligned} {\varvec{r}}_t&=\sigma \left( {\varvec{W}}_{h r} {\varvec{h}}_{t-1}+{\varvec{W}}_{x r} {\varvec{x}}_t+{\varvec{b}}_r\right) \end{aligned}$$
(D-1a)
$$\begin{aligned} \tilde{{\varvec{h}}}_t&={\text {tanh}}\left( {\varvec{r}}_t \odot {\varvec{W}}_{h {\tilde{h}}} {\varvec{h}}_{t-1}+{\varvec{W}}_{x {\tilde{h}}} {\varvec{x}}_t+{\varvec{b}}_{{\tilde{h}}}\right) \end{aligned}$$
(D-1b)

where \(\sigma \) is the sigmoid activation function that returns a value in the range of [0, 1], tanh is the hyperbolic tangent function, and \(\odot \) represents the Hadamard product. \({\varvec{W}}_{hr}\), \({\varvec{W}}_{xr}\), \({\varvec{W}}_{h {\tilde{h}}}\), \({\varvec{W}}_{x {\tilde{h}}}\) are the weight matrices associated with the hidden state, the input state, the hidden-to-candidate hidden state and the input-to-candidate hidden state, respectively. \({\varvec{b}}_r\) and \({\varvec{b}}_{{\tilde{h}}}\) are the biases applied to the sigmoid function in the reset gate and the hyperbolic tangent function, respectively.

The update gate (which has its own weights and biases) similarly operates on \({\varvec{x}}_t\) and \({\varvec{h}}_{t-1}\): it linearly interpolates between the previous hidden state \({\varvec{h}}_{t-1}\) and the candidate hidden state \(\tilde{{\varvec{h}}}_t\) to update the memory-like hidden state \({\varvec{h}}_t\), which is then passed to the next cell:

$$\begin{aligned} {\varvec{u}}_t&=\sigma \left( {\varvec{W}}_{hu} {\varvec{h}}_{t-1}+{\varvec{W}}_{x u} {\varvec{x}}_t+{\varvec{b}}_u\right) \end{aligned}$$
(D-2a)
$$\begin{aligned} {\varvec{h}}_t&={\varvec{u}}_t \odot {\varvec{h}}_{t-1}+\left( 1-{\varvec{u}}_t\right) \odot \tilde{{\varvec{h}}}_t+{\varvec{b}}_h \end{aligned}$$
(D-2b)

where \({\varvec{W}}_{hu}\) and \({\varvec{W}}_{xu}\) are the weights applied to the hidden state and input state in the update gate. \({\varvec{b}}_u\) and \({\varvec{b}}_h\) are the two biases associated with the sigmoid function and the generation of the current hidden state. The cell output at the current time step \(\hat{{\varvec{y}}}_t\) is then obtained by linearly transforming the hidden state:

$$\begin{aligned} \hat{{\varvec{y}}}_t ={\varvec{W}}_{h y} {\varvec{h}}_t+{\varvec{b}}_y \end{aligned}$$
(D-3)

where \({\varvec{W}}_{hy}\) and \({\varvec{b}}_y\) are the weights and biases associated with the current output state \(\hat{{\varvec{y}}}_t\). We note that all the weights and biases of the GRU networks are iteratively updated by BPTT during training.
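
Equations (D-1) to (D-3) can be collected into a single forward pass. The NumPy sketch below follows the equations as written in this appendix, including the bias \({\varvec{b}}_h\) in Equation (D-2b). The layer sizes, the random initialization, the dictionary keys, and the random input sequence are assumptions for illustration; training by BPTT is not shown.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid used by the reset and update gates."""
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in, n_hid, n_out, seed=0):
    """Randomly initialize GRU weights; biases start at zero (assumed choice)."""
    rng = np.random.default_rng(seed)
    w = lambda rows, cols: rng.normal(scale=0.1, size=(rows, cols))
    return {
        "W_hr": w(n_hid, n_hid), "W_xr": w(n_hid, n_in), "b_r": np.zeros(n_hid),
        "W_hc": w(n_hid, n_hid), "W_xc": w(n_hid, n_in), "b_c": np.zeros(n_hid),
        "W_hu": w(n_hid, n_hid), "W_xu": w(n_hid, n_in), "b_u": np.zeros(n_hid),
        "b_h": np.zeros(n_hid),
        "W_hy": w(n_out, n_hid), "b_y": np.zeros(n_out),
    }

def gru_cell(x_t, h_prev, p):
    """One GRU step following Eqs. (D-1a) to (D-3)."""
    r_t = sigmoid(p["W_hr"] @ h_prev + p["W_xr"] @ x_t + p["b_r"])                # (D-1a)
    h_cand = np.tanh(r_t * (p["W_hc"] @ h_prev) + p["W_xc"] @ x_t + p["b_c"])     # (D-1b)
    u_t = sigmoid(p["W_hu"] @ h_prev + p["W_xu"] @ x_t + p["b_u"])                # (D-2a)
    h_t = u_t * h_prev + (1.0 - u_t) * h_cand + p["b_h"]                          # (D-2b)
    y_t = p["W_hy"] @ h_t + p["b_y"]                                              # (D-3)
    return y_t, h_t

def gru_layer(x_seq, p, n_hid):
    """Unroll the cell over a sequence, passing the hidden state forward."""
    h_t = np.zeros(n_hid)
    outputs = []
    for x_t in x_seq:
        y_t, h_t = gru_cell(x_t, h_t, p)
        outputs.append(y_t)
    return np.array(outputs), h_t

# Hypothetical sizes: 6 input components, 8 hidden units, 6 outputs, 5 time steps.
n_in, n_hid, n_out, n_steps = 6, 8, 6, 5
params = init_params(n_in, n_hid, n_out)
x_seq = np.random.default_rng(1).normal(scale=0.01, size=(n_steps, n_in))

y_seq, h_last = gru_layer(x_seq, params, n_hid)
print("output sequence shape:", y_seq.shape)
```

With \({\varvec{b}}_h\) set to zero, the hidden state is a convex combination of its previous value and a tanh output, so it stays within [-1, 1] when started from zero.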

E Experimental Material Characterization

For the microstructural simulations in Appendix C, we assume the microstructure only contains porosity and the matrix material (i.e., aluminum alloy A356). In this section, we therefore briefly discuss the experimental characterization process that can be used to obtain the effective elastoplastic and damage properties of the matrix material, see Fig. 23. Our experiment consists of several steps. In the first step, we melt aluminum A356 ingots in a furnace which is pre-heated to about \(800^{\circ }\,\hbox {C}\). During the melting process, we apply degassing [80] to remove gases (e.g., hydrogen contents) and gas-induced porosity before casting the tensile coupons. In the second step, we apply a standard T6 heat treatment to improve the A356 alloy’s strength and toughness. The heat treatment involves a high-temperature treatment at \(540^{\circ }\,\hbox {C}\) for 8 hours to dissolve alloy elements into the aluminum matrix, a quenching process to freeze alloy elements within the solid solution, and an artificial aging process at about \(155^{\circ }\,\hbox {C}\) for 3.5 hours to precipitate alloy elements and form grain structures. We also perform composition analysis and find that our A356 alloy contains about \(92.05\%\) aluminum (weight fraction), \(6.72\%\) silicon, \(0.09\%\) steel, \(0.0028\%\) magnesium, and other alloy elements. In the third step, we use X-ray computed tomography (CT) to inspect the tensile coupons for porosity defects to ensure the cast alloy is free of pores. Finally, we perform tensile tests on the coupons and measure their averaged elastoplastic and damage parameters (which are provided in Sect. 4.1).


Cite this article

Deng, S., Hosseinmardi, S., Wang, L. et al. Data-driven physics-constrained recurrent neural networks for multiscale damage modeling of metallic alloys with process-induced porosity. Comput Mech 74, 191–221 (2024). https://doi.org/10.1007/s00466-023-02429-1
