1 Introduction

The development of ground motion models using machine learning (ML) algorithms has been a topic of wide research in the past years, due to the ability of data-driven ML models to recognize patterns and nonlinear relationships of underlying phenomena in large datasets (e.g., Bishop 2006). This fact constitutes a clear advantage over the choice of predefined functional forms that traditional ground motion models (GMMs) employ (e.g., Ahmad et al. 2008; Gandomi et al. 2011); which can be rather complex and cumbersome to implement (e.g., Bommer et al. 2010), and are susceptible to generalization errors (e.g., Kuehn et al. 2009). The “functional form” of ML models is established from the data and the inherent dependencies between the input parameters and ground motion observations (e.g., Derras et al. 2012). However, this advantage may lead to biased GMMs if ML algorithms are trained on limited ground motion datasets or when complex models are employed (i.e., high dimensionality). Nowadays, the increasing availability of ground motion records has enabled the development of robust ML GMM on large datasets, potentially able to cover a wide range of scenarios (i.e., earthquake magnitude, source-to-site distances, site conditions).

In the past decades, various ML algorithms have been employed to predict ground shaking intensity measures (IMs) in the time and frequency domain, using strong motion databases with recordings from all over the world. Artificial neural networks (ANNs; e.g., Perlovsky 2000) was the first ML algorithm applied for this purpose, due to the universal approximation theorem (e.g., Hornik et al. 1989; Cybenko 1989). The first studies developed ANNs using small GM datasets available from specific countries (e.g., Kerh and Ting 2005; Liu et al. 2006; Güllü and Erçelebi 2007; Ahmad et al. 2008). Subsequently, alternative ML algorithms were also applied leveraging the availability of larger datasets, such as Support Vector Machine (e.g., Thomas et al. 2017); Gene expression Programming (e.g., Javan-Emrooz et al. 2018); ensemble decision tree models (e.g., Hamze-Ziabari and Bakhshpoori 2018); and combination of a few ML algorithms (e.g., Alavi and Gandomi 2011; Akhani et al. 2019). It is worth mentioning that Khosravikia and Clayton (2021) compared the performance of four different ML algorithms on the prediction of spectral ordinates and PGA, concluding that Random Forests seemed to have the best performance. A detailed overview regarding the state-of-the-art application of ML algorithms on ground motion modelling can be found in **e et al. (2020) and Kong et al. (2018).

ANNs have received special attention in this field, as evidenced by the large number of studies in the literature that employed this approach (**e et al. 2020). Particularly, several notable studies have developed ANN models to predict spectral ordinates and peak ground motion IMs using a combination of common input variables, namely, Joyner-Boore distance (Rjb) or closest to the rupture distance (Rrup); moment magnitude (Mw); average shear wave velocity in the top 30 m (Vs30); hypocentral depth (Dhyp); and faulting type (FT). Derras et al. (2014) developed an ANN model using the RESORCE database (e.g., Akkar et al. 2014a) and adapted the mixed-effects algorithm proposed by Abrahamson and Youngs (1992) to estimate the model’s parameters (weights and biases) and random effects. Using the RESORCE database along with the NGA-West2 database (e.g., Ancheta et al. 2014), Derras et al. (2016) trained ANNs and assessed the performance of site-condition input parameter proxies. Also utilizing the NGA-West2 database, Dhanya and Raghukanth (2017) proposed an alternative hybrid global search training algorithm for ANNs, while Derakhshani and Foruzan (2019) suggested a deep learning approach opting for more than one hidden layer. Finally, Wang et al. (2020) proposed an ANN to account for the uncertainties in Mw and Vs30.

We expand on the existing literature by training an ANN using data from the Pan-European Engineering Strong-Motion Database (ESM; Lanzano et al. 2019; Bindi et al 2019), which is a revised and extended version of previous European databases (e.g., RESORCE) that includes significantly more ground motion records from Europe and the Middle East, and the NGA-West2. We apply a recent formulation for the characterization of the ground shaking uncertainty, which allows the estimation of the between-event and between-site terms. Moreover, due to the use of a larger set of ground motion records, the ground motion models proposed herein has a larger range of applicability in terms of distance and magnitudes. We compared the results from the ground motion model with other common models in the literature, and verified its performance against recordings from recent earthquakes in Italy. The present study is the first part of an extension of the work of Kalakonas and Silva (2021), where ANNs were used to derive fragility and vulnerability functions for predominant building classes in the Balkans. In the companion study, the ground motion model described herein is used to carry out earthquake scenario simulations for several historical earthquakes in the Balkans, to compare for the first time the performance of ANNs against traditional methodologies and empirical damage and loss data.

2 Ground motion database compilation

The study of Derras et al. (2016) showed that using ground motion records from stations with only measured Vs30 profiles can reduce the aleatory variability of GMMs up to 15%, compared to inferred Vs30 values. Following this finding, the flatfile of the ESM database was initially filtered to include data from stations with measured Vs30 and to exclude data with unknown Mw, Rjb, and FT, leading only to 707 records. Moreover, a lack of very strong motions with measured Vs30 in the ESM database was observed. For these reasons, a subset with measured Vs30 profiles from the NGA-West2 flatfile was adopted, primarily containing strong motions from California and Europe. A minimum threshold of 0.01 g of PGA was set to compensate for the lack of strong motions in the extracted subset of ESM.

For the vast majority of the excluded records from the ESM, only the Rjb value was missing, while epicentral distance (Repi) was present for all records. Moreover, no information was found regarding the rake, strike and dip angles for these records, that could allow inferring the finiteness of the rupture, and consequently the calculation of Rjb. To overcome this issue, a decision was made to estimate the missing Rjb values using a simple ANN trained using the records with complete data. The resulting ANN was trained to estimate Rjb using both subsets (ESM and NGA), after excluding data from seismic events with Mw > 6.5 recorded at Repi < 20 km. The selected data were pre-processed by applying the log10 to Rjb, Repi, Dhyp and a one-hot transformation to FT to transform the categorical data describing the faulting type into numbers (i.e., 0 representing Normal fault, 1 Reverse fault, 2 Strike-slip fault). The Rjb prediction ANN model was trained following the approach described in Kalakonas and Silva (2021), using a fivefold cross-validation to assess the model’s performance on unseen data. In this process, the entire dataset was randomly divided into five equal subsets, and subsequently five ANNs were successively trained using four folds as the training dataset and one fold as the testing dataset until all folds were used for testing. The best performing ANNs were found to be structured by one hidden layer with 4 neurons (equal to the number of input parameters), using the sigmoid activation function for the hidden layer and the linear function for the output layer. The mean regression metrics amongst the 5 fitted ANNs are reported in Table 1, and the correlation between the Rjb values documented in the databases and the ones predicted by the ANN is illustrated in Fig. 1. This ANN is openly available through this public GitHub repository https://github.com/petroskalakonas/ANN-Ground-Motion-Model-for-Europe.

Table 1 Mean regression metrics of the trained ANN models to predict Rjb using fivefold cross-validation
Fig. 1
figure 1

Recorded versus predicted Rjb correlation plot

The selected Rjb prediction ANN model was used to fill the missing Rjb values in the ESM database for all the seismic events within the range of input parameters mentioned above. The authors acknowledge that this step introduced additional epistemic uncertainty, although given the high accuracy of the Rjb regression model, we argue that it can be neglected. At this stage, the compiled database was filtered with the following criteria:

  • Only events with a hypocentral depths in the range of 1 ≤ Dhyp ≤ 30 km were considered, in order to be consistent with the observed seismicity in the Balkan region.

  • Due to the limited amount of data with Mw higher than 7, which are recorded only from a handful events, an upper threshold of 7 was set to Mw. The included values of Mw span in the range of 3.5 and 7.0.

  • An upper threshold of 1200 m/s was applied to Vs30, in order to exclude motions recorded on bedrock and include only ground motions relevant for the assessment of the seismic response of buildings. Thus, the truncated values of Vs30 span between 90 and 1200 m/s.

  • All the values of Rjb were used, which include ground motions recorded at distances up to 550 km.

  • Three primary fault types were considered as documented in the databases: Normal; Reverse (or thrust) and Strike-Slip.

The final compiled database is comprised by 3928 recordings (3134 from the ESM and 794 from the NGA-West2), from 584 events, recorded at 867 stations. The data distribution is presented in Fig. 2.

Fig. 2
figure 2

Data distribution of the compiled database

3 Training of the ground motion ANN using mixed-effects

The type of neural network employed herein is a feed-forward ANN structured by consecutive dense layers, which are composed of processing nodes (units) carrying out numerical operations termed neurons (e.g., Haykin 2009). Each neuron is defined by a set of synaptic weights and a bias, which collectively are the parameters of the neural network to be determined by the training process. The synaptic weights are the multipliers of the outputs of the previous layer and the bias is a constant added to the outcome before it is passed to the activation function, which is common for every neuron in a given layer. The objective of the ANN training is the determination of its parameters through the minimization of a loss function and the application of the backpropagation technique. Detailed information regarding the mathematical operations and nature of such neural networks can be found in the existing literature (e.g., Perlovsky 2000; Bishop 2005; Haykin 2009).

In the context of ground motion modelling, typically empirical models are developed following the mixed-effects approach, which include fixed-effects and random-effects components. The former is the attenuation formula fitted to the empirical data (e.g., GMM functional form) and the latter is the ground motion variability, usually partitioned into inter- and intra- event variability terms (e.g., Joyner and Boore 1981; Brillinger and Preisler 1985). The mixed-effects regression analysis accounts for implicit dependencies of observations (e.g., ground motion IMs) belonging to the same groups (i.e., records at different stations from the same event or records recorded at the same station from different events). Derras et al. (2014) adapted the widely used algorithm of Abrahamson and Youngs (1992) to ANNs, where the fixed terms (i.e., ANN’s weights and biases) and random terms (i.e., variances) are estimated iteratively by maximizing the likelihood of a given equation, partitioning the total ground motion variability in between- and within- event terms. Khosravikia and Clayton (2021) extended the algorithm proposed by Derras et al. (2014) to any ML regression model to account for site random effects, and partitioned the total variability into 3 terms: between-event, site-to-site, and residual (e.g., Joyner and Boore 1993; Chen and Tsai 2002).

In the present study, the algorithm proposed by Khosravikia and Clayton (2021) was followed, although the likelihood function and maximization method were adopted from Bates et al. (2015) due to its high computational efficiency. In this process, the ANN is trained using a fixed-effects regression, where the weights and biases of the network are calculated using the approach described in the following paragraphs. Subsequently, the variances of the random-effects are estimated by likelihood maximization, and the between-event and between-site terms are estimated and subtracted from the ground motion observations. Finally, the ANN model is retrained on the updated data and these steps are repeated until the overall likelihood is maximized. The mathematical expression of the ground motion model employed herein is (e.g., Joyner and Boore 1993; Stafford 2014):

$$ \log_{10} \left( {IM_{i,j} } \right) = f\left( {{\varvec{X}}|{\varvec{\theta}}} \right) + \delta_{E,i} + \delta_{S,j} + \varepsilon_{i,j} $$
(1)

where X are the input parameters of the ANN (i.e., Mw, Rjb, Vs30, Dhyp and FT); θ are the ANN parameters (i.e., synaptic weights and biases); f(X|θ) is the ANN median prediction of the ground motion log10(IM) for event i at site j; δE,i is the event random effect term accounting for the between-event variability for event i; δS,j is the site random effect term accounting for the between-site variability at site j; and εi,j is the residual term of the model representing the within-event/site variability. These random effect terms are assumed to be independent random variables following a standard normal distribution (e.g., Al-Atik et al. 2010), with standard deviations τ, φ and ψ, respectively. Essentially, the total aleatory variability of the ground motions is partitioned into components related to characteristics of the earthquake rupture, seismic wave propagation, and local site effects which are not captured by the input parameters. For example, events with the same Mw, Dhyp and FT may have considerably different stress drop and directivity effects, and sites with the same Vs30 may have a systematic (de)amplification of the ground motion related to other local site parameters. The residual term represents the part of the aleatory variability of ground motion for a given event at sites with the same Rjb and Vs30, which is related to the other factors such as the effect of different travel paths and topographic effects. Finally, the total standard deviation can be expressed as:

$$ \sigma = \sqrt {\tau^{2} +\upphi ^{2} + \psi^{2} } $$
(2)

As for the outputs of the model, a group of common IMs used in seismic risk assessment was selected. These include the median (RotD50) of: PGA, PGV, PGD, Arias Intensity (IA; Arias 1970), and 5% damped spectral acceleration (SA) values spanning in the period range of 0.01–4.0 s, for a total of 27 periods. In order to divide the dataset into training and testing subsets, the cross-validation approach (e.g., Picard and Cook 1984) was employed, a robust method to evaluate the model’s performance. In particular, a fivefold cross validation was used (e.g., Huang and Burton 2019; Khosravikia and Clayton 2021), in which the compiled database was split randomly into 5 equal subsets. Five ANNs were trained successively using four folds for training and one fold for testing the model’s prediction on unseen data, until all subsets have served as the testing dataset. The cross-validation regression metrics are calculated as the average of the 5 trained ANNs, enabling a robust way to detect and avoid overfitting (i.e., evidently better performance on the training data) and underfitting (i.e., poor overall performance). Concerning the selection and tunning of the model’s hyperparameters (e.g., optimization algorithm, activation function), the procedure and findings described in Kalakonas and Silva (2021) were adopted. For the sake of clarity, each step and modelling choices are summarised below:

  • Pre-processing of input parameters and IMs The efficiency of the training process of ANNs is greatly enhanced when the input and output data span in similar ranges. For this reason, the log10 was applied to Rjb (in km), Dhyp (in km), Vs30 (in m/s) and all IMs (in cm, cm/s and cm/s2), while a one-hot encoding was applied to FT.

  • Optimization algorithm and loss function The mean squared error (MSE) was selected as the loss function which was minimized by the adaptive moment estimation algorithm (ADAM; e.g., Kingma and Ba 2017).

  • Number of hidden layers and activation functions One hidden layer was used due to its sufficiency to approximate any real function according to the universal approximation theorem (e.g., Auer et al. 2008; Morfidis and Kostinakis 2018). The hyperbolic tangent (tanh) and the linear function were employed in the hidden and the output layer, respectively.

  • Number of neurons The trial-and-error approach was followed to determine the optimal number of neurons in the hidden layer by comparing the performance of the models on the training and cross-validation datasets. It was found that 8 neurons maximized the models’ predictive power, while a higher number led to overfitting.

The training of the ANN was performed in a Python environment using the TensorFlow open-source ML library (e.g., Abadi et al. 2016). The average regression metrics amongst the 5 trained models are presented in Table 2. We note that these metrics are calculated from the fixed-effect regression before the application of the mixed-effect algorithm, and their purpose is to identify potential over- or under- fitting errors related to the selection of hyperparameters. The results from the mixed-effects regression algorithm for the selected model are reported in Table 3, where the total within-event standard deviation (i.e., \(\sqrt {\varphi^{2} + \psi^{2} }\)) is also reported. The schematic representation of the ANN is illustrated in Fig. 3 and the mathematical expression of its median prediction for an IM is given by the following formula:

$$ \overline{{\log_{10} \left( {IM_{j} } \right)}} = f_{lin} \left( {b_{j} + \mathop \sum \limits_{h = 1}^{8} W_{j,h} {* }tanh\left( {b_{h} + \mathop \sum \limits_{i = 1}^{5} W_{i,h} X_{i} } \right)} \right) $$
(3)

where Xi is the input parameter i (i.e., Mw, Rjb, Vs30, Dhyp and FT); Wi,h is the weight of the hidden neuron h for the input Xi; bh is the bias of the hidden neuron h; Wj,h is the weight of the IM’s neuron j for the output of neuron h; bj is the bias of the IM’s neuron j; tanh and flin are the activations functions of the hidden and output layer, respectively. The ANN along with the cross-validation and mixed-effects regression metrics are publicly available through the GitHub repository https://github.com/petroskalakonas/ANN-Ground-Motion-Model-for-Europe.

Table 2 Mean regression metrics of the trained ANN models using a fivefold cross validation
Table 3 Random-effects standard deviation of the selected ANN ground motion model
Fig. 3
figure 3

Schematic representation of the ANN ground motion model

4 GMM evaluation and input parameters scaling relationships

In this section, the predictive power of the developed ANN was evaluated and its median ground motion predictions were compared to other common GMMs for European regions. Scaling and attenuation relationships for two selected IMs against the input parameters were also studied to evaluate the model’s physical trends and consistency. The latter is quite important to be satisfied as the ANN will be used in earthquake scenario simulations of building portfolios, where a wide range of combinations of input parameters (i.e., Mw, Rjb, Vs30) usually occurs.

Typically, the correlation between the recorded IMs and the ones predicted by the ML models is quantified by the R or R2 correlation coefficient (e.g., Dhanya and Raghukanth 2017; Derakhshani and Foruzan 2019; Khosravikia and Clayton 2021). In Fig. 4, the recorded values of SA(0.3 s) and SA(1.0 s) are plotted against the values predicted by the ANN. These two IMs are commonly employed in seismic vulnerability modelling for low-rise and mid-rise building typologies, respectively (e.g., Yepes-Estrada et al. 2016; Martins and Silva 2020). The relatively high reported R2 values suggest a strong correlation, demonstrating the predictive power of the ANN and robustness of the mixed-effects algorithm. More importantly, this strong correlation is observed throughout the entire range of intensities for both training and testing datasets, which indicates a fine tunned and consistent model. It should be noted that similar results were observed for all the other IMs.

Fig. 4
figure 4

Recorded versus predicted correlation plot for the two selected IMs using the entire compiled database [left: SA(0.3 s) and right SA(1.0 s)]

The Mw scaling and Rjb attenuation relationships are presented in Figs. 5 and 6, respectively, assuming an intermediate hypocentral depth of 10 km and a relatively stiff soil site with Vs30 = 500 m/s in both cases. In the case of Mw, the scaling relationship of SA(0.3 s) and SA(1.0 s) are studied at short (10 km), intermediate (50 km) and large (100 km) Rjb distance. As for the case of Rjb, the attenuation of the two IMs is explored assuming a small (Mw = 4), moderate (Mw = 5.5) and large (Mw = 7) seismic event.

Fig. 5
figure 5

Mw scaling relationship for the two selected IMs [left: SA(0.3 s) and right SA(1.0 s)]

Fig. 6
figure 6

Rjb attenuation relationship for the two selected IMs [left: SA(0.3 s) and right SA(1.0 s)]

Even though the maximum value of Mw included in the training of the ANN is 7, in Fig. 5 we extended the horizontal axis up to 7.5 in order to assess the model’s behaviour beyond the applicable range. Evidently, the linear increasing trend is preserved beyond Mw 7 (e.g., Kotha et al. 2016), allowing the potential use of the model for larger seismic events. Overall, the impact of faulting type in Fig. 5 is quite similar for both IMs and suggests that ground motion intensity scales steeper for reverse fault types from moderate to large events (Mw = 5.5–7.5). Moreover, the scaling of SA(0.3 s) is more gradual at 10 and 50 km in comparison to SA(1.0 s), a trend that is also observed in Fig. 6. Additionally, the effect of faulting type is negligible at short distances in all cases, where a saturation of ground motion is observed especially for small and moderate events. This trend of ground motion models has been observed in several studies (e.g., Schmedes and Archuleta 2008; Boore et al. 2014) and the saturation length (plateau) is shown to be also depth dependent (Kotha et al. 2020). The results herein suggest that the length of the plateau increases with Mw.

Regarding the scaling of ground motion with Vs30, the results are presented for the two IMs in Fig. 7 for a moderate seismic event of Mw 5.5 with an intermediate depth of 10 km. The effect of Vs30 is more influential for SA(1.0 s) than SA(0.3 s) at short to intermediate distances. In both cases, the amplification of ground motion with decreasing Vs30 increases with distance.

Fig. 7
figure 7

Vs30 scaling relationship for the two selected IMs [left: SA(0.3 s) and right SA(1.0 s)]

Three GMMs developed using recordings from Europe and the Middle East were used to compare against the results from the ANN. These GMMs employ similar input parameters to those used by the ANN. These include Akkar et al. (2014b), Bindi et al. (2014) and Kotha et al. (2016). We note that for the case of Kotha et al. (2016) which includes a region specific site term, the site was assumed outside of Italy or Turkey. These three GMMs along with the ANN are used in the companion study to calculate ground shaking for several historical events.

Figures 8 and 9 present the median response spectra predicted by the ground motion models at a stiff soil site of Vs30 = 500 m/s. The former assumes a normal fault at an intermediate depth of 10 km, while the latter assumes a strike-slip fault at a deeper hypocentral depth of 20 km. The response spectra predicted by the ANN for small and moderate Mw appear to attenuate faster at intermediate and large distances, especially for the shallower event and normal faulting. Regarding the distance of 10 km, the ANN led to higher ground shaking for moderate Mw. On the other hand, the opposite trend is observed for Mw = 7, where at 50 and 100 km the ANN predicts relatively higher shaking at short to moderate periods of vibration. This trend is more predominant for the deeper event and strike-slip faulting. It was found that the NGA subset of the compiled database includes significantly stronger ground motion records than the ESM subset from Strike-Slip earthquakes, especially for events with large Mw. Thus, the ANN model estimates higher ground shaking for Strike-slip events with large Mw at intermediate to long distances. The authors recognize that the GMMs considered herein were developed using a different ground motion database (RESORCE), and therefore some differences are expected.

Fig. 8
figure 8

Median response spectra predicted by 3 European GMMs and the ANN model. In all cases, Vs30 = 500 m/s, Dhyp = 10 km and Normal fault were considered. Abbreviations used: A14—Akkar et al. (2014b); B14—Bindi et al. (2014); K16—(Kotha et al. 2016)

Fig. 9
figure 9

Median response spectra predicted by 3 European GMMs and the ANN model. In all cases, Vs30 = 500 m/s, Dhyp = 20 km and Strike-slip fault were considered Abbreviations used: A14—Akkar et al. (2014b); B14—Bindi et al. (2014); K16—Kotha et al. (2016)

The ANN’s predictions were also compared to recorded ground motion observations from historical events. To this end, six recent seismic events in Italy with adequate number of stations with measured Vs30 were extracted from the ESM database. In Figs. 10, 11 and 12 we present the comparisons of the ANN’s median PGA prediction (μ) ± one between-event standard deviation (τ) to the recorded PGA values. Only stations with soil class B according to EC8 were considered (360 m/s ≤ Vs30 ≤ 800 m/s) and the mean Vs30 value of the stations was used by the ANN model. An overall satisfactory agreement is observed for all the events, especially for the Amatrice 2016 and Norcia 2016 events, where the ANN is able to capture effectively the attenuation of PGA from short to long distances.

Fig. 10
figure 10

PGA median predictions of the ANN model ± 1 between-event standard deviation for the Mw 6.0Amatrice 2016 earthquake mainshock and one aftershock. The PGA observations from stations belonging to site class B according to EC8 are also plotted

Fig. 11
figure 11

PGA median predictions of the ANN model ± 1 between-event standard deviation for the Mw 6.1 Emilia Romagna 2012 earthquake mainshock and aftershock. The PGA observations from stations belonging to site class B according to EC8 are also plotted

Fig. 12
figure 12

PGA median predictions of the ANN model ± 1 between-event standard deviation for the Mw 6.5 Norcia 2016 earthquake mainshock and foreshock. The PGA observations from stations belonging to site class B according to EC8 are also plotted

5 Random effect terms and aleatory uncertainty

In Fig. 13, the standard deviation of the random-effects is plotted against the periods of vibration (T). The within-event standard deviation is also illustrated, which is calculated as \(\sqrt{{\varphi }^{2}+{\psi }^{2}}\). Surprisingly, the between-event (τ) is the dominant source of aleatory uncertainty throughout the entire range of 0.01–4 s. The variation of τ with respect to T shows a peak at T = 0.1 s for short to intermediate periods, which is in agreement to the results obtained by other studies (e.g., Kotha et al. 2016, 2020; Bindi et al. 2019). Its maximum value is reached at T = 4.0 s similar to the combined between-event standard deviation by Kotha et al. (2020). The relatively small between-site (φ) standard deviation is a result of employing ground motions recorded only at stations with measured Vs30. As demonstrated by Weatherill et al. (2020), a substantial decrease in the between-site aleatory uncertainty is achieved using measured Vs30 as opposed to inferred values. Similar to that study, φ has a global maximum at T = 2.0 s, whilst it shows a local peak at T = 0.1 s. The latter was also observed by other studies (e.g., Bindi et al. 2019; Kotha et al. 2020) and as suggested by Bindi et al. (2019), it points out a high variability of short period site-response. Concerning the residual variability standard deviation (ψ), it remains relatively constant and close to the between-event until T = 0.5 s, similar to the results obtained by Bindi et al. (2019). Beyond T = 0.5 s, ψ decreases rapidly and it is also smaller than φ for T > 0.8 s.

Fig. 13
figure 13

Standard deviation estimates of the components of aleatory uncertainty

The relatively high between-event standard deviation (τ) can be attributed to a few key factors. The high number and spatial distribution of the considered events in the compiled database can lead to a large variability. Rodriguez-Marek et al. (2013) found that the between-event variance may vary regionally, while Stafford (2014) noted that this part of the aleatory variability includes a major component of epistemic uncertainty related to the nature of the between-event random effect for a given country or region. For these reasons, modern GMMs developed using large regional datasets (e.g., Kotha et al. 2020; Weatherill et al. 2020) typically include region or country specific terms to account for this part of ground motion variability. Moreover, the GMM of Kotha et al. (2020) estimated a considerably higher aggregated between-event standard deviation than its predecessor GMM (Kotha et al. 2016). Kotha et al. (2020) attributed this increase to the increased number of events with Mw between 4 and 5 included in the ESM in comparison to RESORCE. Weatherill et al. (2020) suggested that in this range, the estimates of Mw are affected by errors which may propagate into the between-event terms, consequently overestimating τ.

Overall, these results highlight the implication of the ergodic assumption commonly adopted in ground motion modelling, which assumes that the ground motion observed in a given space at a given time is equivalent to the ground motion at a different space and time caused by similar sources at sites with similar conditions (e.g., Stafford 2014; Kotha et al. 2017). Kotha et al. (2020) suggested the employment of regional attenuation, tectonic locality, and site-specific random-effect terms in order to constrain the total ground motion variability of GMMs when employing large spatially distributed datasets.

In Fig. 14, the distributions of the between-event residuals (δEi) of SA(0.3 s) and SA(1.0 s) are plotted against the Mw of the events. The error bars demonstrate the mean and ± one standard deviation of the residuals for each Mw bin of 0.5 size. Although a minor heteroscedasticity is apparent, a generally unbiased trendline of the mean residuals for both IMs is observed. An exception occurs for SA(1.0 s) in the Mw range between 4 and 5, where a slight overestimation trend is evidenced. A similar offset was observed by Bindi et al. (2019) in this range, attributing it to the high number of events in the ESM database for which Mw is estimated from local magnitude (ML).

Fig. 14
figure 14

Distribution of the between-event terms with respect to Mw for the two selected IMs [left: SA(0.3 s) and right SA(1.0 s)]. Binned means every 0.5 Mw units and ± one standard deviations are also plotted

Furthermore, the distribution of the between-station residuals (δSi) of SA(0.3 s) and SA(1.0 s) are shown with respect to Vs30 in Fig. 15. In this case, the error bars illustrate the mean and ± one standard deviation of the residuals per Vs30 bin of 100 m/s size. A considerable offset of the mean residuals is observed for Vs30 > 700 m/s and Vs30 > 500 m/s for SA(0.3 s) and SA(1.0 s), respectively. This trend is more predominant for the latter and it extends until 900 m/s. Kotha et al. (2020) obtained similar trends studying the distribution of site-to-site residuals of SA(0.1 s) and SA(1.0 s) with respect to the measured Vs30 of the stations.

Fig. 15
figure 15

Distribution of the between-station terms with respect to Vs30 for the two selected IMs [left: SA(0.3 s) and right SA(1.0 s)]. Binned means every 100 m/s and ± one standard deviations are also plotted

Finally, the residual terms (εi,j) are plotted against Rjb for both selected IMs in Fig. 16 to investigate their dependecy with source-to-site distance. Throughout the entire range of Rjb, the standard deviation of the residuals remains very close to ψ, while their respective mean is essentially zero. However, a slight bias and higher standard deviation are observed for very long distances (Rjb > 450 km), due to the limited amount of data in this range. However, for the purposes of seismic risk assessment, we note that such distances tend to be less relevant due to the associated low ground shaking.

Fig. 16
figure 16

Distribution of the residual terms with respect to Rjb for the two selected IMs [top: SA(0.3 s) and bottom SA(1.0 s)]. Binned means with increasing size in km and ± one standard deviations are also plotted

6 Conclusions

The work presented herein is the first part of a study that explores the application of artificial neural networks in earthquake scenario simulations of building portfolios. An ANN ground motion model was trained for the European region using a compiled ground motion database from two subsets of the ESM and NGA-West2 databases. The ANN’s input parameters are Mw, Rjb, Dhyp, Vs30 and FT, and the outputs are the median values of common IMs employed in seismic risk assessment. The hyperparameters were selected by applying a fivefold cross-validation, where a single hidden layer with 8 neurons and the tanh activation function provided a fine-tuned model. The GMM was developed following the mixed-effects approach and the algorithm proposed by Khosravikia and Clayton (2021). The weights and biases of the network were calculated iteratively and the total ground motion variability was partitioned into between-event, between-site, and residual terms.

The scaling relationships of the input parameters were evaluated for two common IMs used in seismic vulnerability studies. The results demonstrated physically sound trends documented by several studies in the past. The median response spectra predicted by the ANN for several cases were compared to other European GMMs. The comparisons illustrated differences amongst the GMMs in scaling with Mw and attenuation with source-to-site distance. Moreover, the median PGA predictions by the ANN were compared to the recorded values from several stations for six past events in Italy, revealing an overall satisfactory agreement.

The study of the random effect terms and their standard deviation with respect to period of vibration revealed interesting findings. On the one hand, similar results were also reported by recently developed GMMs for Europe. On the other hand, we noticed a relatively high between-event variance for all IMs, which can be attributed to the high spatial distribution of ground motion data included in the compiled database. We note that this is an implication of the ergodic assumption when diversified ground motion datasets from various regions are employed to develop GMMs. Consequently, future studies using ANN or ML algorithms should focus on develo** partially or fully non-ergodic GMMs, accounting for region and country specific ground motion variability, whilst transferring a part of it to epistemic uncertainty.

The applicability and reliability of ML models is generally limited to the range of input variables in the training dataset. This limitation of data-driven regression models has been discussed in various studies (e.g., Derras et al. 2014; Khosravikia and Clayton 2021). Although ANNs tend to perform better in comparison to other ML algorithms in extrapolating beyond the training dataset range, it is recommended that the proposed ANN GMM should not be employed for cases outside of the dataset range presented in Fig. 2. This is certainly a limitation in the approach proposed herein, in comparison to conventional GMMs that use mathematical formulations that facilitate the extrapolation of the model to other input parameters. This paucity in the training datasets can be mitigated using synthetic recordings that can be generated using physics-based models. These models can effectively capture aspects such as nonlinear site response and be employed to different combinations of input parameters (e.g., soil class, magnitude, distance). However, we note the computational demand of physics-based models might be significant.

The developed GMM is used in the companion study (Kalakonas and Silva 2022) to carry out earthquake scenario simulations for several historical events in the Balkan region. The ANN is publicly available in a GitHub repository along with the cross-validation metrics and the results of the mixed-effects algorithm. Finally, the ANN will be implemented in the OpenQuake-engine (e.g., Pagani et al. 2014) where it will be used for seismic risk assessment of building portfolios.