Introduction

Computational design of high-performance epoxy resins calls for methods to circumvent costly experiments. Chemistry-specific molecular models are critically needed to bridge the gap in scales between molecular dynamics (MD) simulations and experiments, while predicting accurately the highly tunable macroscopic properties of epoxy resins and their composites1,2,3. This remains a challenging problem to tackle due to the chemical complexity4,5,6 of epoxy resins, the high number of properties that must be targeted for realistic predictions, and their strong dependence on the degree of crosslinking (DC)7,8,9,10,11,12. This up-scaling problem requires multi-dimensional functional calibration, taking inputs from high-fidelity simulations such as all-atomistic simulations. All-atom (AA) MD simulations have demonstrated great success in predicting the effect of DC on the glass-transition temperature (Tg), thermal expansion coefficient and elastic response13,14 of epoxy resins, and the fracture behavior of epoxy composites15,16. This makes AA-MD suitable for informing larger-scale models, provided that the data required for upscaling is not prohibitively expensive to obtain. While theoretical tools such as time-temperature superposition have been instrumental in bridging temporal scales17,18, AA simulations on their own remain prohibitively expensive for high-throughput design.

Systematically coarse-grained (CG) models can extend the length and time scales of MD simulations by orders of magnitude, but chemistry-specificity requires calibration of a complex force-field to match the properties of underlying AA simulations or experimental data. Most CG models proposed for epoxies matched the structural features19 or the thermomechanical properties20,21 for highly-crosslinked networks. Prior models have generally not addressed the question of transferability of the model over different temperatures or curing states, which is challenging because of the smoother energy landscape and reduced degrees of freedom of CG models compared to AA models22,23. This particular aspect requires a functional calibration of the force-field parameters against DC, temperature (T), or any other variable over which transferability is desired. Machine Learning (ML) tools can efficiently handle such a parametric functional calibration in a complex force field. Despite the growing interest in utilizing ML approaches to CG modeling24,25,26, complex chemistries such as epoxy resins have not been explored extensively. Progress was made on this issue in a recent epoxy CG model27 where a particle swarm optimization algorithm was used to calibrate a T-dependent force-field for three different curing states with elastic modulus as the only target property. A general CG framework for epoxy resins that can target multiple properties at different DCs and demonstrate the method for more than one cure chemistry remains to be established. An accurate description of the dynamics and mechanical properties of partially cured epoxies is particularly relevant in the context of epoxy-based composites, where the exploitation of partial and multi-step curing processes can lead to enhanced performance of the epoxy resin for storage, additive manufacturing or functionalization28. 
Additionally, a model that can account for differences in curing degree across the material can be used to capture gradient properties within interphase regions of composites like CFRP29.

To address this issue, here we simultaneously target the DC dependence of density, dynamics, modulus, and yield strength of two model epoxy resins. A parametric functional calibration requires the functional form to be defined a priori30,31. This is not required by non-parametric methods that construct the calibration functions through a reproducing kernel Hilbert space32,33. However, either approach requires additional assumptions when used to calibrate functions in high-dimensional spaces to avoid identifiability issues34,35,36. For this reason, we employ a physics-informed strategy, leveraging our recently developed energy renormalization (ER)

The first step in the calibration of the CG force field was to set the parameters of the bonded potentials, which was done through a Boltzmann inversion (BI)54 approach to match the probability distributions obtained from AA simulations. The details of the parametrization of the bonded terms are fully reported in our Supplementary Note 1 and Supplementary Fig. 1, and the potential form and parameters are listed in Table 1.
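As an illustration of the Boltzmann inversion step, the following is a minimal sketch rather than the procedure used for Table 1. It assumes a harmonic bond of the form U(r) = ½k(r − r0)² and an approximately Gaussian bond-length distribution, so that r0 is the sample mean and k = kBT/variance. The function name, the temperature, and the synthetic bond lengths are hypothetical.

```python
import math
import random

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def boltzmann_invert_harmonic(samples, temperature):
    """Return (r0, k) of U(r) = 0.5*k*(r - r0)**2 from sampled bond lengths.

    For a Gaussian distribution P(r) ~ exp(-U(r)/(kB*T)), the mean gives r0
    and the variance gives k = kB*T / var.
    """
    n = len(samples)
    r0 = sum(samples) / n
    var = sum((r - r0) ** 2 for r in samples) / (n - 1)
    return r0, KB * temperature / var

# Synthetic stand-in for AA bond lengths (hypothetical values, Angstrom, 300 K).
rng = random.Random(1)
k_true, r0_true, temp = 50.0, 4.0, 300.0
sigma = math.sqrt(KB * temp / k_true)
bond_lengths = [rng.gauss(r0_true, sigma) for _ in range(20000)]

r0_fit, k_fit = boltzmann_invert_harmonic(bond_lengths, temp)
print(f"r0 = {r0_fit:.3f} A, k = {k_fit:.1f} kcal/mol/A^2")
```

For strongly non-Gaussian distributions one would instead invert a histogram directly, U(r) = −kBT ln P(r), and fit the chosen potential form to the result.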

Table 1 List of all the bonded interaction parameters of the CG model obtained from Boltzmann inversion of the distributions of bonds and angles in the AA simulations, calculated between the centers of mass of the corresponding CG beads.

To determine the non-bonded parameters, we first extracted initial values for the cohesive energies and bead sizes \(\left[ {\varepsilon _i,\sigma _i} \right],(i = 1, \ldots ,7)\) from the AA radial distribution functions of all seven CG beads of the model using BI. These non-bonded parameters correctly reproduce the structure of the AA system in CG representation but fail to capture the macroscopic dynamics and mechanical properties of the system. This inadequacy makes the model insufficient to extract quantitative information from the simulations and guide the experimental design of these materials. In this study, we treat the non-bonded force field parametrization as a multi-objective optimization problem where we aim to determine 14 parameters \(\left[ {\varepsilon _i,\sigma _i} \right],\left( {i = 1, \ldots ,7} \right)\) to simultaneously match the target density, Debye-Waller factor 〈u2〉, Young’s modulus, and yield stress at all DCs.

Figure 2 reports the values of density, 〈u2〉, Young’s modulus, and yield stress of the AA systems for DGEBA+PACM and DGEBA+D400. We note that the values found for the Young’s modulus of the high DC systems are in line with experimental results47,49, in the range of 2.5–3 GPa. For both systems, the density and mechanical properties increase with increasing DC, while 〈u2〉, a marker of mobility, decreases. This is expected, and more pronounced in the DGEBA+PACM system, which has stiffer and less mobile chain networks due to the rigidity of the curing agent PACM. Flexibility introduced by D400 increases mobility and reduces density as well as mechanical properties of the DGEBA+D400 system47. A quantitative comparison of 〈u2〉 between simulations and future experiments should be made with caution. In experiments, 〈u2〉 is extracted from the neutron scattering intensity55 and can depend on the scattering wavevector Q. Moreover, the formal definition of the Debye–Waller factor includes the whole exponential term \(\mathrm{DWF} = \exp\left( - \frac{Q^2\langle u^2\rangle}{3}\right)\), whereas molecular simulation studies customarily use the term DWF for the 〈u2〉 value extracted from MSD functions56.
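To make the distinction between 〈u2〉 and the full Debye–Waller factor concrete, the short sketch below evaluates the exponential expression quoted above for a single 〈u2〉 value at several Q. The numerical values are hypothetical and serve only to show that the same 〈u2〉 maps to different DWF values depending on Q.

```python
import math

def debye_waller_factor(q, u2):
    """DWF = exp(-Q^2 <u^2> / 3), with q in 1/Angstrom and u2 in Angstrom^2."""
    return math.exp(-q * q * u2 / 3.0)

u2 = 0.5  # hypothetical <u^2> in Angstrom^2
factors = {q: debye_waller_factor(q, u2) for q in (0.5, 1.0, 2.0)}
for q, dwf in factors.items():
    print(f"Q = {q:.1f} 1/A -> DWF = {dwf:.3f}")
```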

Fig. 2: Target macroscopic properties of the AA simulations.

a density, b Debye-Waller factor 〈u2〉, c Young’s modulus, and d yield stress as a function of DC for the DGEBA+PACM and DGEBA+D400 systems. Error bars result from the variance of statistically independent simulations. Density, modulus, and yield stress increase with increasing DC, while 〈u2〉, related to the mobility of the system, decreases. The D400 system, with the longer and flexible curing agent, has a lower density, higher mobility, and softer mechanical response. The dependence of these properties on DC is different in the CG model due to the different changes in configurational entropy caused by the reduction in degrees of freedom. This is typically discussed for changes in temperature, and here observed during the curing process of the polymer network. For this reason, a DC-independent parametrization of the CG model cannot fully capture the features of the AA model at all DC values (see Supplementary Figs. 2 and 3), and an energy renormalization procedure is needed.

Young’s modulus in particular changes differently depending on DC in the two systems, since the spatial density of crosslinks is higher in the DGEBA+PACM system due to the lower molecular weight of PACM compared to D400. In other words, because of the different chain configurations of the curing agent, increasing DC leads to different changes in configurational entropy caused by the reduction in degrees of freedom. In addition, we observe that the dependence of Young’s modulus on DC is nonlinear, indicating complex changes of configurational entropy with increasing DC in the epoxy resin networks.

Non-bonded CG force field: sensitivity analysis

No fixed parametrization of the CG model is able to match the properties of the AA system at all DC values, as we show in Supplementary Figs. 2 and 3 in our Supplementary Note 2. This is arguably caused by the different rates at which the configurational entropy of the AA and CG models changes with varying DC, similarly to what happens with varying T37. Thus, we introduced a DC dependence for all non-bonded parameters \(\left[ {\varepsilon _i,\sigma _i} \right] = [\varepsilon _i\left( {{{{\mathrm{DC}}}}} \right),\sigma _i({{{\mathrm{DC}}}})],(i = 1, \ldots ,7)\). In previous models with highly homogeneous polymers and few CG bead types, it was possible to study the dependence on temperature with manual parameter sweeps. ER in these circumstances required only one T-dependent function to rescale all cohesive energies (the εi) and another to rescale all the effective sizes of the CG beads (the σi). We found that this was not possible in our current epoxy model due to the high complexity of the system, including the effect of crosslinks and the large number of CG beads with different cohesive energies and sizes. Here, we introduced a generalization of previous protocols that relies on ML to explore the high-dimensional space of the model parameters. The idea is to replace the AA and CG models with Gaussian random process surrogates and then to minimize the difference between the CG and AA surrogates at all DC values with respect to the calibration functions. Because it preserves the central idea of the ER procedure, the protocol outlined in this paper can be easily generalized to any CG model. We used the simulation data presented in Fig. 2 to train the AA Gaussian process models: 19 samples for the DGEBA+PACM system and 20 samples for the DGEBA+D400 system. In the AA model, DC is the only input variable. For the CG model, DC and the non-bonded parameters [εi, σi] are the input parameters. 
The range of the parameters was determined by preliminary simulations calibrating the cohesive energies either to match the dynamics of the AA systems at DC = 0% or the Young’s modulus at DC = 90% or 95% (the highest DC we can achieve for the DGEBA+PACM or DGEBA+D400 AA networks respectively). This gave us extremes for the values of cohesive energies εi, and we further expanded them by ~20%. We also selected a range of ~±20% for the σi parameters from the initial estimate obtained from the BI of the radial distribution functions. We report the final range for all parameters \(\left[ {\varepsilon _i,\sigma _i} \right],(i = 1, \ldots ,7)\) in Supplementary Table 1 of the Supplementary Note 7. Our ranges were post-validated by our final calibration, as discussed in the following.
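The surrogate idea can be sketched with a small Gaussian process regression written in plain Python. This is a one-input illustration in the spirit of the AA surrogate (DC as the only input), not the authors' implementation: the squared-exponential kernel, its hyperparameters, the homoscedastic noise level, and the synthetic density values are all assumptions made for the example.

```python
import math

def sq_exp_kernel(x1, x2, length, amplitude):
    """Squared-exponential covariance between two scalar inputs."""
    return amplitude ** 2 * math.exp(-0.5 * ((x1 - x2) / length) ** 2)

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def solve_lower(low, b):
    """Solve L y = b by forward substitution."""
    y = [0.0] * len(b)
    for i in range(len(b)):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    return y

def solve_upper(low, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x

def gp_fit(xs, ys, noise=0.01, length=0.3, amplitude=1.0):
    """Fit a GP with constant mean and one noise level for all points."""
    mean_y = sum(ys) / len(ys)
    n = len(xs)
    cov = [[sq_exp_kernel(xs[i], xs[j], length, amplitude)
            + (noise ** 2 if i == j else 0.0) for j in range(n)] for i in range(n)]
    low = cholesky(cov)
    alpha = solve_upper(low, solve_lower(low, [y - mean_y for y in ys]))
    return {"xs": list(xs), "low": low, "alpha": alpha, "mean": mean_y,
            "length": length, "amplitude": amplitude}

def gp_predict(model, x):
    """Return the predictive mean and standard deviation at input x."""
    ks = [sq_exp_kernel(x, xi, model["length"], model["amplitude"])
          for xi in model["xs"]]
    mean = model["mean"] + sum(k * a for k, a in zip(ks, model["alpha"]))
    v = solve_lower(model["low"], ks)
    var = model["amplitude"] ** 2 - sum(vi * vi for vi in v)
    return mean, math.sqrt(max(var, 0.0))

# Synthetic density-vs-DC data (hypothetical numbers, DC as a fraction).
dc_train = [0.1 * i for i in range(10)]
rho_train = [1.0 + 0.1 * dc for dc in dc_train]
surrogate = gp_fit(dc_train, rho_train)
print(gp_predict(surrogate, 0.25))
```

A CG surrogate would take DC together with the 14 non-bonded parameters as inputs, which requires a multi-dimensional kernel, but the fit and predict steps have the same structure.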

We trained the Gaussian process surrogate models on 700 simulation samples of the CG DGEBA+PACM system, which also allowed us to fine-tune the extremes for the calibration parameters. Then we trained on 500 simulation samples of the CG DGEBA+D400 system, where fewer simulations were needed thanks to the initial fine-tuning. With these surrogates it was possible to perform a variance-based sensitivity analysis, as reported in Fig. 3. This type of analysis provided insight into how the responses of the surrogate models depend on their inputs57,58.

Fig. 3: Sensitivity analysis of the target properties varying \(\left[ {\varepsilon _i,\sigma _i} \right]\) and DC across 1200 CG simulations.

The main sensitivity index measures the effect of varying a single input variable on the output. The total sensitivity analysis measures how changing a single input variable affects its contribution to the variance of an output measure while accounting for its interaction with the rest of the input parameters. The density of the systems (b) is dominated by the σi variables, as one would expect. Interestingly, DC has a stronger effect on 〈u2〉 (a) and the yield stress (d) than on the density and the Young’s modulus (c). The analysis sheds light on the role of different cohesive energies on the dynamics and mechanical properties of the systems, and it is a useful tool to guide the ML parametrization with the physical insight gained on the model.

As one would expect, the analysis revealed a strong influence of the σi parameters on the density, while the dynamics and mechanical properties of the system depend more on the cohesive energies εi. This separation was already assumed in previous ER models40 and is confirmed here. Since the main sensitivity (white, thinner bars) accounts for most of the total sensitivity (which includes the higher-order interaction effects between the input parameters) in all cases, the response of the CG model can be approximated with a first-degree polynomial. This also suggests that many of the functional relations between the force-field parameters and DC can be described through a linear function, since the target responses presented in Fig. 2 are also close to linear. The relative contribution of the different cohesive energies to our target properties is similar for 〈u2〉, Young’s modulus, and yield stress. DC is as relevant as the cohesive energies for 〈u2〉 and yield stress, while its role is suppressed for the Young’s modulus. We notice the prominent influence of the parameter σ6 on all four measures used here to quantify the mechanical and dynamical properties of the DGEBA+D400 network. This is expected, as bead 6 is a relatively large bead in the repeat unit of the longer D400 molecule. As such, bead 6 accounts for ~28% of all the CG beads of the network, and close to 40% of the total bead volume. Variations of σ6 lead to large changes in the density of the system, as well as in its dynamics and mechanical properties.
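The relation between main and total sensitivity indices can be illustrated with a generic Monte Carlo estimate of Sobol indices on toy functions. The sketch below is not the analysis pipeline used for Fig. 3. It assumes independent inputs uniform on [0, 1] and uses the commonly cited Saltelli estimator for the main effect and the Jansen estimator for the total effect; the toy models and all function names are hypothetical.

```python
import random

def sobol_indices(model, dim, n_samples=20000, seed=0):
    """Estimate main (first-order) and total Sobol indices for each input."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(dim)] for _ in range(n_samples)]
    b = [[rng.random() for _ in range(dim)] for _ in range(n_samples)]
    ya = [model(x) for x in a]
    yb = [model(x) for x in b]
    pooled = ya + yb
    mean = sum(pooled) / len(pooled)
    var = sum((y - mean) ** 2 for y in pooled) / (len(pooled) - 1)
    main, total = [], []
    for i in range(dim):
        # Matrix A with column i replaced by the corresponding column of B.
        yab = [model(ra[:i] + [rb[i]] + ra[i + 1:]) for ra, rb in zip(a, b)]
        # Main effect: mean of (f(B) - mean) * (f(AB_i) - f(A)); centering
        # f(B) leaves the expectation unchanged and reduces the noise.
        v_main = sum((fb - mean) * (fab - fa)
                     for fa, fb, fab in zip(ya, yb, yab)) / n_samples
        # Total effect (Jansen): half the mean squared change from resampling x_i.
        v_total = sum((fa - fab) ** 2 for fa, fab in zip(ya, yab)) / (2 * n_samples)
        main.append(v_main / var)
        total.append(v_total / var)
    return main, total

def additive_model(x):
    """Purely additive response; the third input is inert."""
    return 2.0 * x[0] + x[1]

def interacting_model(x):
    """Response driven by an interaction between the two inputs."""
    return x[0] * x[1]

main_add, total_add = sobol_indices(additive_model, dim=3)
main_int, total_int = sobol_indices(interacting_model, dim=2)
print("additive   :", main_add, total_add)
print("interacting:", main_int, total_int)
```

For the additive model the main and total indices coincide (about 0.8, 0.2, and 0), which is the situation in which a first-degree polynomial describes the response well. For the interacting model the total index clearly exceeds the main index.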

CG force field optimization and validation

Before the calibration of the CG force-field, we needed to identify a flexible candidate class of calibration functions for the non-bonded parameters of the CG model. Previous ER papers40,41,42 for simple glass-forming polymers used a sigmoid function for the temperature dependence of cohesive energy and bead size. The choice is theoretically supported38 by the transition from the Arrhenius regime of liquids at high temperature to the glassy regime below the glass-transition temperature Tg, with the supercooled phase in between dominated by the caging dynamics and α-relaxation processes. We initially assumed a similar sigmoidal function for DC, roughly equating an increase in DC to a decrease in temperature given that both slow down the dynamics. We found this constraint to be too restrictive for our systems: minimizing the discrepancy between the AA and CG response (Eq. (3) in our Methods section) did not yield a reasonable parametrization using sigmoid functions alone, as shown by Supplementary Figs. 4 and 5 in the Supplementary Note 3.

To uncover what functions best describe the DC dependence of the 14 non-bonded parameters, we employed a class of radial basis functions (RBF) described in our Methods section. We assumed that each calibration function shares the same shape parameter ω and that we have three centers for each calibration parameter x = [0%, 50%, 100%]. The number of centers can be increased to capture more complex behavior, but at the cost of overfitting the data and getting unrealistic approximations of the ‘true’ calibration functions. Our goal was to obtain the simplest force field that is still able to capture the response of the system. To demonstrate the effect of an overfitting parametrization, we include an example in the Supplementary Note 4 (see Supplementary Figs. 6 and 7) where the model has been calibrated at DC = 5% increments without analytical description.
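A calibration function with three centers and a shared shape parameter can be sketched as follows. The exact RBF form is given in the Methods section, which is not reproduced here, so the Gaussian kernel φ(r) = exp(−(ωr)²), the value of ω, and the target values at the centers are assumptions chosen for illustration.

```python
import math

CENTERS = (0.0, 0.5, 1.0)  # DC = 0%, 50%, 100% expressed as fractions

def rbf_value(dc, weights, omega, centers=CENTERS):
    """Evaluate sum_j w_j * exp(-(omega * |dc - c_j|)^2)."""
    return sum(w * math.exp(-(omega * (dc - c)) ** 2)
               for w, c in zip(weights, centers))

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_weights(values_at_centers, omega, centers=CENTERS):
    """Find weights so that the RBF passes through the given center values."""
    phi = [[math.exp(-(omega * (ci - cj)) ** 2) for cj in centers]
           for ci in centers]
    return solve_linear(phi, list(values_at_centers))

# Hypothetical cohesive-energy values (kcal/mol) at DC = 0, 50, 100%.
omega = 2.0
weights = fit_weights([1.0, 1.5, 1.2], omega)
print([round(rbf_value(dc, weights, omega), 3) for dc in (0.0, 0.25, 0.5, 1.0)])
```

With one shared ω and three weights per parameter, describing all 14 non-bonded parameters this way takes 14 × 3 + 1 = 43 free parameters.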

The approach described so far using RBF for all the parameters gave us a possible solution for our force-field (see Supplementary Figs. 8 and 9 in the Supplementary Note 5), but at the cost of a highly complex parametrization. We wanted to simplify our model by reducing the degrees of freedom of the parametrization without affecting the model’s accuracy. Given that our CG and AA models have intrinsic uncertainty that is approximated with our Gaussian process models through the assumption of homoscedasticity, we calculated the probability that for a specific set of calibration parameters the CG models came from the same distribution as the AA models through an objective function that captures the goodness of fit:

$$\begin{array}{c} \mathcal{L}\left( \varepsilon ,\sigma \right) = \int\limits_0^1 \prod\limits_{i = 1}^4 \int\limits_{\mathcal{Y}} P\left( f_{i,\mathrm{P}}^{(\mathrm{CG})}\left( \mathrm{DC},\varepsilon_{\mathrm{P}}(\mathrm{DC}),\sigma_{\mathrm{P}}(\mathrm{DC}) \right) = y \right) P\left( f_{i,\mathrm{P}}^{(\mathrm{AA})}(\mathrm{DC}) = y \right) \mathrm{d}y\,\mathrm{d}\mathrm{DC} \; + \\ \qquad\qquad \int\limits_0^1 \prod\limits_{i = 1}^4 \int\limits_{\mathcal{Y}} P\left( f_{i,\mathrm{D}}^{(\mathrm{CG})}\left( \mathrm{DC},\varepsilon_{\mathrm{D}}(\mathrm{DC}),\sigma_{\mathrm{D}}(\mathrm{DC}) \right) = y \right) P\left( f_{i,\mathrm{D}}^{(\mathrm{AA})}(\mathrm{DC}) = y \right) \mathrm{d}y\,\mathrm{d}\mathrm{DC}, \end{array}$$
(1)

where the subscript i indexes the ith response variable and the subscripts P and D denote the DGEBA+PACM and DGEBA+D400 systems, respectively. Equation (1) has properties similar to those of a likelihood function and thus lends itself to use in an approximate Bayesian computation scheme to obtain a posterior approximation of the parameters that make up the calibration functions. Through a quasi-random sampling scheme, we approximated the first two statistical moments of the calibration functions.
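One term of Eq. (1) can be evaluated in closed form when both surrogates return Gaussian predictive distributions, because the integral over y of the product of two Gaussian densities N(y; m1, s1²) and N(y; m2, s2²) equals a Gaussian density in m1 − m2 with variance s1² + s2². The sketch below uses this identity and a trapezoid rule over DC. It assumes that the integration domain covers the whole real line; the toy predictors and function names are hypothetical.

```python
import math

def gaussian_overlap(m1, s1, m2, s2):
    """Integral over y of N(y; m1, s1^2) * N(y; m2, s2^2)."""
    var = s1 * s1 + s2 * s2
    return math.exp(-0.5 * (m1 - m2) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def goodness_of_fit(cg_predict, aa_predict, n_grid=101):
    """Trapezoid-rule estimate of one system's term in Eq. (1).

    cg_predict(dc) and aa_predict(dc) each return a list of (mean, std)
    pairs, one pair per response variable, for DC given as a fraction.
    """
    step = 1.0 / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        dc = k * step
        product = 1.0
        for (m_cg, s_cg), (m_aa, s_aa) in zip(cg_predict(dc), aa_predict(dc)):
            product *= gaussian_overlap(m_cg, s_cg, m_aa, s_aa)
        weight = 0.5 if k in (0, n_grid - 1) else 1.0
        total += weight * product * step
    return total

# Toy predictors with two responses (hypothetical means and uncertainties).
def aa_toy(dc):
    return [(1.0 + 0.1 * dc, 0.02), (2.0 + 1.0 * dc, 0.2)]

def cg_good(dc):
    return [(1.0 + 0.1 * dc, 0.02), (2.1 + 0.9 * dc, 0.2)]

def cg_poor(dc):
    return [(1.1 + 0.1 * dc, 0.02), (3.0 + 1.0 * dc, 0.2)]

print(goodness_of_fit(cg_good, aa_toy), goodness_of_fit(cg_poor, aa_toy))
```

The full objective would add the corresponding term for the second system and use four responses per system.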

The green curves in Fig. 4 show the functions in the RBF class that maximize the objective function of the CG and AA models yielding the same target properties, where the uncertainty quantification for each function is also reported (green band). Note that some of the calibration functions have a large envelope of uncertainty (e.g., ε6 and ε7), while others have a small uncertainty envelope (e.g., σ2 and σ6). If the uncertainty envelope is small, we were able to make a well-informed decision on the class of functions that would be most suited to model the non-bonded force field relation to DC. When the uncertainty bounds are large, then the choice of function is not consequential to the calibration accuracy, and we were able to simplify the function. In essence, the quantified uncertainty provides a decision support tool that gives modelers insight into what calibration functions are most significant to the calibration accuracy. The functions’ uncertainty reported in Fig. 4 is a local measure of uncertainty around the function mean value considering all the target properties, while the sensitivity analysis of Fig. 3 is a global measure in the whole parameter space for each property separately. Still, it is possible to connect the two quantities considering the joint probability distributions. We discuss this briefly in our Supplementary Note 8 (see Supplementary Figs. 11 and 12), and we will report these technical findings in detail in an upcoming paper focused on the statistical analysis approach to functional calibration.

Fig. 4: Optimized DC-dependent functions of the non-bonded force field parameters \([\varepsilon _i\left( {DC} \right),\sigma _i(DC)]\).

The green curves are RBFs yielding maximum goodness of fit, see equation (1), between the AA and CG target properties. The green bands quantify the uncertainty of each parameter, which tells us how sensitive the final response of the model is to that parameter. Where large uncertainties are present, e.g., in the ε6 and ε7 functions, we were able to modify the class of function of that parameter to either linear or constant without loss of accuracy of the model’s response, thus simplifying the parametrization. The black curves are obtained after simplifying the class of functions and minimizing the squared difference in the AA and CG model response. Note that once a new class of functions is chosen, the new function is not necessarily an approximation of the RBF for each individual parameter. The simplified formulation maintained a fair match59 with the AA models with an average root mean squared percentage error (RMSPE) of 10%. We did not observe a noticeable loss of accuracy of the model compared to calibrations of much higher complexity, see Supplementary Figs. 7 and 9.

With this procedure, it was possible to drastically simplify our parametrization, reducing most functional forms either to linear functions or constants with changing DC. For the simplification, we used the results presented in Fig. 4 and considered either a constant function or a linear function if it would fit within the envelope of uncertainty (where we preferred constant over linear as it requires one fewer parameter). With this initial guess, we used Eq. (3) (see our Methods section) to minimize the squared difference for the new set of calibration functions. The results of this simplification are the black lines in Fig. 4: only the parameter ε3 required an RBF; ε2, ε5, σ1 and σ3 required a linear dependence on DC, while the remaining 9 parameters could be kept constant. The number of free parameters needed for this parametrization was reduced from 43 (all RBF) to 21 (simplified formulation), see Table 2. We note that once an inference has been made on the new class of function that can be used for each parameter in the simplified formulation, the goal is to globally minimize the discrepancy between the AA and CG models response. As such, each simplified function (black curves in Fig. 4) is not necessarily an analytical approximation of their respective RBF (green curves). Some of the trends obtained are in line with our expectations, like a general increase of ε3 with increasing DC as the main parameter to control the system’s response, given its preeminent role in determining the dynamics and mechanical properties of the CG model, as observed in the sensitivity analysis shown in Fig. 3. The parameters associated with beads 1-3 (the DGEBA molecule) showed the strongest trends. This makes sense, as DGEBA is present in both networks. For the bead sizes in particular, the DC dependence of both systems is controlled uniquely through σ1 and σ3, all other bead sizes being kept constant. 
The increase of ε2 and ε3 with increasing DC controls the increase of Young’s modulus and yield stress, and the decrease of 〈u2〉, in the DGEBA+D400 network, since ε6 and ε7 (part of the D400 molecule) are kept constant. A downward trend of ε5 (a bead present in both the PACM and D400 molecules) likely compensates for the effect of ε2 and ε3. We want to stress that this solution might not be unique, within small variations of overall accuracy, and the specific details of these functional calibration parameters will depend on the search space of the algorithm, the details of the training data set and other protocol-dependent parameters. This is particularly true for parameters with a large uncertainty envelope, where the model’s outputs are not strongly affected by variations of the parameter. Nevertheless, the convergence of the algorithm ensures an excellent match between the target properties in the AA and CG force fields, as we show in the following, which is robust against these variations. For reproducibility purposes, we include in our supplementary materials our complete data set, inputs and outputs of all AA and CG simulations, as well as the LAMMPS input files and structure used to obtain these results.
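The simplification rule and the resulting parameter count can be sketched as below. The envelope test is a simplified stand-in for the selection described above: a constant is accepted if some horizontal line fits inside the band, and a linear function is accepted if the least-squares line through the mean curve stays inside the band, which is a heuristic since another line might fit even when that one does not. The parameter accounting (one value per constant, two per linear function, three weights per RBF plus one shared shape parameter) is an assumption, though it reproduces the reported counts of 43 and 21. All names and envelope values are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def simplest_class(xs, mean, lower, upper):
    """Pick the simplest function class compatible with the envelope."""
    if max(lower) <= min(upper):  # some constant lies inside the band
        return "constant"
    slope, intercept = fit_line(xs, mean)
    if all(lo <= slope * x + intercept <= hi
           for x, lo, hi in zip(xs, lower, upper)):
        return "linear"
    return "rbf"

def count_free_parameters(classes, n_centers=3):
    """Count free parameters for a list of per-parameter function classes."""
    per_class = {"constant": 1, "linear": 2, "rbf": n_centers}
    count = sum(per_class[c] for c in classes)
    if "rbf" in classes:
        count += 1  # shape parameter shared by all RBFs
    return count

def band(mean, half_width):
    """Symmetric envelope around a mean curve."""
    return [m - half_width for m in mean], [m + half_width for m in mean]

dc = [0.0, 0.25, 0.5, 0.75, 1.0]
trend = [1.0, 1.1, 1.2, 1.3, 1.4]
wiggle = [1.0, 1.3, 1.0, 1.3, 1.0]
print(simplest_class(dc, trend, *band(trend, 0.5)))    # wide band
print(simplest_class(dc, trend, *band(trend, 0.05)))   # narrow band, straight mean
print(simplest_class(dc, wiggle, *band(wiggle, 0.05))) # narrow band, curved mean

all_rbf = ["rbf"] * 14
simplified = ["rbf"] + ["linear"] * 4 + ["constant"] * 9
print(count_free_parameters(all_rbf), count_free_parameters(simplified))
```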

Table 2 Parameters for the simplified analytical description of all cohesive energies and bead sizes, as shown in Fig. 4.

We report in Table 2 the analytical description of all the parameters in the simplified formulation shown in the black curves of Fig. 4. For each parametrization, the ML algorithm predicted the response of the CG model for all target properties as a function of DC, which was compared to the values of the same properties in the AA Gaussian process model through Eq. (3). For the parametrization shown in Fig. 4, the ML-predicted response of the CG model compared to the AA values is reported in Fig. 5. For each target property, the ML interpolation assigned a confidence interval in addition to the expected value for both the AA and the CG systems. The intervals are larger for complex properties like the Young’s modulus, which has a higher measurement uncertainty (see Fig. 2c) and, for the CG model, a large sensitivity to the variation of the force field parameters. The CG prediction is in line with the AA values for all properties and at any DC.

Fig. 5: Validation of the predictive power of the Gaussian process interpolation.

Comparison of the target properties as a function of DC between the Gaussian process AA model (red lines), the CG model (blue lines) with the simplified parametrization shown in Fig. 4, and the results of the corresponding CG simulations (black stars). Debye-Waller factor, density, Young’s modulus and yield stress are reported for the DGEBA+PACM system (a–d) and for the DGEBA+D400 system (e–h). The confidence intervals were obtained from the data of Fig. 2 for the AA simulations and the design of experiments simulations for the CG model. The error bars on the black stars result from the variance of statistically independent CG simulations. The parametrization of Fig. 4 gives a fair agreement59 for all our targets from the uncrosslinked systems to the fully crosslinked epoxy networks (average RMSPE = 10%). The CG simulation data are in line with the ML-CG prediction, and close to the AA prediction. Slightly higher accuracy is possible with different parametrizations, but at the cost of greatly increasing the complexity of the force field. We discussed other formulations in our Supplementary Notes.

Our parametrization has a high level of accuracy, and we found a fair agreement59 (average RMSPE = 10%) between the AA and CG responses. We also note that the limit on the accuracy of our prediction lies in the competition between the different responses (dynamics and mechanical properties in particular), and the ML protocol proposed is able to obtain a much higher accuracy if calibrated on individual responses separately, as shown in Supplementary Fig. 10 of our Supplementary Note 6. A perfect calibration of 〈u2〉 for the high DC systems for example (Fig. 5a, e) would require a lower mobility of the CG model, which would increase the value of the Young’s modulus (Fig. 5c, g) above the target AA value. Our optimization provided the best solution taking into account the simultaneous calibration of the targets. Additionally, this protocol is easily generalizable to any system, for any set of target properties. Higher accuracy can be achieved, if needed, at the cost of a more complex force field. We discuss other possible parametrizations in our Supplementary Notes. We note that the framework developed here can be generalized to different systems of high chemical complexity, where a tradeoff between accuracy and generality of the CG force field must be considered depending on the goal and application of the model. Our method can be readily applied to multi-objective parametrizations, where proper weights are attributed, tailoring the force field to specific applications.
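For reference, the error metric quoted above can be computed as in the sketch below. It assumes the common definition RMSPE = 100 × sqrt(mean(((CG − AA)/AA)²)), with the AA value as the reference. The exact averaging over properties and DC values used in the paper is not specified here, and the sample numbers are hypothetical.

```python
import math

def rmspe(reference, predicted):
    """Root mean squared percentage error, in percent."""
    rel_sq = [((p - r) / r) ** 2 for r, p in zip(reference, predicted)]
    return 100.0 * math.sqrt(sum(rel_sq) / len(rel_sq))

# Hypothetical AA (reference) and CG (predicted) values of one property.
aa_values = [100.0, 200.0]
cg_values = [110.0, 180.0]
print(rmspe(aa_values, cg_values))
```

Because each deviation is normalized by its reference value, properties with very different units and magnitudes (density, 〈u2〉, modulus, yield stress) can be averaged on the same footing.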

Finally, we discuss the results of the CG simulations performed with the parameters reported in Table 2. The stars in Fig. 5 correspond to the values of the target properties extracted from CG simulations performed with the simplified parametrization of Fig. 4, showing the agreement between the CG Gaussian process prediction and the actual CG simulation.

CG model predictivity beyond target properties

With the validated approach and optimized CG force field parameters, we now report the overall dynamics and mechanical response of the CG and AA systems with varying DC.

Figure 6 shows the MSD and stress–strain curves up to 20% tensile deformation for both DGEBA+PACM and DGEBA+D400 systems at DC = 0%, 50%, and 90–95% (for PACM and D400 respectively). The CG curves validate the prediction of the ML model and show good agreement with the AA values for 〈u2〉, Young’s modulus, and yield stress of the systems. In addition, the comparison with the AA curves of corresponding DC shows that by matching modulus and yield stress, we captured the overall stress under tensile deformation for the system. By matching the Debye–Waller factor 〈u2〉 we expected to match the overall MSD curve at longer timescales as well, given the theoretical relationships that link the picosecond caging dynamics to the segmental dynamics of glass-forming systems and that were validated in previous ER models for simpler homopolymers. For the current model, we do not find strong evidence of this. Despite matching the picosecond caging dynamics of the AA and CG systems, the AA model has faster dynamics at longer timescales for the uncrosslinked systems. We are not sure of the origin of this effect, but it could be caused by the variety of CG beads with different sizes and cohesive energy, which might create a broader spectrum of caging scales and relaxation times. This discrepancy is greatly reduced in the fully crosslinked network of interest for experimental applications, where the system is strongly restrained in the network conformation and diffusion is suppressed.
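As an illustration of how a modulus can be read from a stress–strain curve such as those in Fig. 6, the sketch below fits a straight line to the small-strain region. The 2% strain cutoff, the function names, and the synthetic curve are assumptions made for this example; the extraction procedure actually used is described in the Methods section, which is not reproduced here.

```python
def linear_fit(xs, ys):
    """Least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def youngs_modulus(strain, stress, max_strain=0.02):
    """Slope of the stress-strain curve over strains up to max_strain."""
    pairs = [(e, s) for e, s in zip(strain, stress) if e <= max_strain]
    slope, _ = linear_fit([e for e, _ in pairs], [s for _, s in pairs])
    return slope

# Synthetic curve: linear response with E = 3 GPa up to 5% strain, then a plateau.
strain = [0.002 * i for i in range(101)]          # 0 to 20% strain
stress = [min(3.0 * e, 0.15) for e in strain]     # stress in GPa
print(youngs_modulus(strain, stress))
```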

Fig. 6: CG validation of the ML parametrization.

For our DGEBA+PACM and DGEBA+D400 systems the CG parameters chosen for the non-bonded interactions not only match the target properties we selected (as shown in Fig. 5) but can also predict the whole MSD (a, b) and tensile stress curves (c, d), validating our choice of targets as good predictors of the systems dynamics and mechanical properties.

Overall, the current parametrization showed a high level of accuracy and accounted for the variation in the degree of crosslinking of the network. Even if intermediate DC values might be less practical for this specific system, the problem of the ER for CG models is relevant outside of this particular chemistry, and the protocol outlined in this work can be easily generalized. The developed ML model has two aspects of great relevance: (i) it provides reliable insight into unknown physics by accounting for the uncertainty in the training data and the response surface approximations, and (ii) it is computationally tractable compared to fully Bayesian parametric and non-parametric calibration schemes that are known to struggle with problems with >10 parameters33. The CG simulations of this study run ~10³ times faster than the AA simulations for the same system size. The increased efficiency of our CG model makes it possible to investigate epoxy networks beyond the nanoscale, for instance to examine factors such as heterogeneity or fracture processes that may exhibit scale dependence.