1 Introduction

When roads and railway lines are constructed, they often require slope cuttings (also known as cut slopes), especially in hilly topography. When cuttings fail, the economic and social consequences can be severe, especially when debris collides with infrastructure or vehicles. Cutting failures are widespread along roads in low- to lower–middle-income countries (LIC/LMIC), where natural susceptibility to slope failure from heavy rain and hilly topography is exacerbated by inadequate design of cuttings and their stabilisation measures. To plan, design and cost stabilisation measures along transportation infrastructure, it is useful for practitioners to have a measure of the stability of a cutting (Liu and Wang 2021; Corominas and Moya 2008). Factor of Safety (FoS) values are approximate indicators of safety based on a single best estimate of the slope conditions; the FoS expresses a margin of safety whose required value depends strongly on the problem considered. For example, most geotechnical standards require FoS = 1.3 for a slope, whilst they typically require FoS = 4 against pi** failure. One might conclude that the safety margin against pi** failure is much higher, but the difference between the threshold values largely stems from model and parameter uncertainties. Moreover, two slopes with the same FoS can have vastly different probabilities of failure. The probability of failure (P(f)) and its complement, the probability of survival, are therefore better metrics for conveying a slope’s (lack of) safety to practitioners and public authorities, since their meaning is well understood outside civil engineering.

Landslide hazard and risk assessment studies often estimate an annual probability of slope failure, often in map form (Corominas et al. 2014; Dai et al. 2002; Tang et al. 2018). Cost assessment studies require a frequency within a time window. Annual probability can also be thought of as a frequency: a probability set in a reference time frame (Tang et al. 2018). These probabilities are typically determined either through statistical models, which analyse landslide databases (e.g. Lu et al. (2020); Guzzetti et al. (2008)), or through mechanistic models, which involve slope stability analyses (e.g. Lu et al. (2022a)). A drawback of statistical models is that they require a large volume of reliable historical data, which can be difficult (and costly) to obtain, especially in LIC/LMICs that are often data-poor (particularly for smaller-scale road cuttings). In addition, statistical approaches often do not consider the local geology and geomaterial conditions and, therefore, represent only the average slope conditions over a region. Statistical models are also of limited use for planning slope stabilisation measures because they are necessarily based on a database of historic failures, which encompass a wide range not only of site-specific conditions (e.g. geometry, hydrology and material properties) but also of stabilisation measures.

Mechanistic models have been widely used to account for uncertainty in geomaterial mechanical properties and their spatial variability, in addition to uncertainty in geometric parameters, e.g. the position of contact surfaces in a multilayered slope. Many mechanistic models have been developed to predict landslides across landscapes by coupling hydrological models of varying complexity (e.g. SHALSTAB, Montgomery and Dietrich (1994); TRIGRS, Baum et al. (2008); GEOtop-FS, Formetta et al. (2014)) with the infinite slope model. Even models that treat the slope stability component more completely assume that failure occurs within the soil or at the soil–bedrock boundary (Hess et al. 2017; van Zadelhoff et al. 2022; Bellugi et al. 2015). Because of how they represent the failure surface, these models are not appropriate for cutting failures. Conversely, mechanistic models that better capture the failure surface do not currently deal with complex upslope topography extending to the watershed of the slope (e.g. Manning et al. (2008); Rouainia et al. (2009); Holcombe et al. (2012)). We therefore suggest that there is a gap in the literature: complete models that represent both upslope hydrological conditions and the stability of cuttings at a high level of detail.

Probability of failure given some trigger event can be estimated with mechanistic models by conducting a probabilistic stability analysis (Dou et al. 2014; Zhang et al. 2014). The resulting probability of failure is therefore always conditional, though this is not always explicitly recognised. The most common triggers of slope failure are pore water pressure and seismic shaking, which are driven by rainfall and earthquakes, respectively. These triggers are not binary: they have characteristics (e.g. pore pressure or seismic acceleration) that both (1) affect the probability of failure (i.e. the probability of failure is conditional on the trigger characteristics); and (2) vary in space and time, such that their future values at a given location are highly uncertain. Therefore, estimating the probability of failure within some time window (e.g. annual probability) for a given location should account not only for the probability of failure conditional on a particular trigger event (P(f|T)) but also for the probability of that trigger event (or those trigger conditions) occurring (P(T)). Further, since multiple trigger events result in non-zero failure probability (i.e. P(f|Ti)), occurrence probabilities are required for multiple trigger conditions P(Ti), for example in the form of a probability distribution of trigger intensities (Mori et al. 2020; Lu et al. 2022a, b). Yet many studies do not account for variability in trigger intensities and focus only on the variability of the geomaterial properties. In these cases, the failure probabilities are really conditional probabilities (i.e. P(f|T)); it is essential that they are reported as such, together with the trigger conditions on which they are conditional, and that they are not interpreted as true probabilities associated with some time window (e.g. annual probabilities) from which estimates of hazard or risk can be made.
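The argument above amounts to the law of total probability, \(P(f) = \sum_i P(f \mid T_i)\,P(T_i)\). As a minimal sketch, assuming the trigger conditions can be discretised into mutually exclusive classes \(T_i\) whose occurrence probabilities within the chosen reference window sum to one, the unconditional failure probability for that window could be computed as below. The function name and the example numbers are hypothetical and are not results from this study.

```python
def unconditional_failure_probability(p_f_given_t, p_t, tol=1e-9):
    """Law of total probability: P(f) = sum_i P(f|T_i) * P(T_i).

    Assumes the trigger classes T_i are mutually exclusive and exhaustive
    within one reference time window, so the P(T_i) must sum to one.
    """
    if len(p_f_given_t) != len(p_t):
        raise ValueError("need one P(T_i) for every P(f|T_i)")
    if abs(sum(p_t) - 1.0) > tol:
        raise ValueError("trigger probabilities must sum to 1")
    return sum(pf * pt for pf, pt in zip(p_f_given_t, p_t))


# Hypothetical example: three trigger-intensity classes (low, moderate, extreme).
p_f = unconditional_failure_probability([0.0, 0.05, 0.4], [0.9, 0.08, 0.02])
# 0.0 + 0.004 + 0.008, i.e. about 0.012
```

With these illustrative numbers, reporting only the conditional value for the extreme class (0.4) as if it were the window probability would overstate it by more than an order of magnitude, which is the misinterpretation the paragraph warns against.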
Annual probability of a rainfall-triggered slope failure is often assumed to be controlled by the most critical rainfall (the rainfall under which the failure probability of the slope is maximum amongst all rainfall events in a year) (Lu et al. 2022a; Tang et al. 2018). The use of a critical rainfall to represent annual probability of failure raises additional concerns regarding its derivation: (1) how do we know that this is the most critical rainfall (i.e. how do we know that it triggers max P(f|T))?; (2) how do we decide what year to use (i.e. large rainfall events could be a one in 10-year event, a one in 50-year event or a one in 100-year event)?

When the variability in trigger characteristics is accounted for, this is often done assuming that trigger probability is time independent, e.g. modelling pore pressure using rainfall events drawn from an Intensity–Duration–Frequency (IDF) curve (Holcombe et al. 2012; Lu et al. 2022b). IDF curves are joint probability distributions of the intensity and duration of storm rainfall. Intensity and duration of rainfall are widely considered as the primary rainfall characteristics responsible for generating landslide-triggering pore pressure distributions within slopes (Caine 1980; Guzzetti et al. 2008). IDF curves are generally based on very long records of rainfall and have been used in both statistical (e.g. Guzzetti et al. (2008)) and mechanistic (e.g. Tang et al. (2018)) rainfall-induced landslide models. Tang et al. (2018) determine a conditional slope failure probability and annual failure probability (alongside a deterministic FoS) of a partially saturated soil slope triggered by rainfall, using ‘random rainfall patterns’ (rainfall intensity over time) simulated by a random cascade model characterised by an IDF curve. They use a probabilistic framework (a Monte Carlo Simulation, MCS) to determine the conditional probability of failure, varying the random rainfall patterns. They determine the annual failure probability by multiplying the failure probability conditional on a set of rainfall intensities (obtained from MCS) with the occurrence probabilities of those rainfall intensities (obtained from an IDF curve). De Leon and Garduño (2020) account for uncertainty in geomaterial properties and rainfall in determining an annual failure probability for soil slopes under heavy rainfall. 
Soil density and strength, and rainfall intensity and duration, are treated as stochastic variables: the soil properties are assumed to be lognormally distributed and the rainfall intensity to be exponentially distributed (the correlation between intensity and duration is taken into account). Mori et al. (2020) also determine the probability of slope failure accounting for uncertainty in material properties (soil strength and permeability) and trigger variability (rainfall intensity from an IDF curve) using an MCS framework. They use a smoothed particle hydrodynamics stability model with random field modelling to capture the spatial correlation of the soil properties, and show that variability in saturated hydraulic conductivity can strongly influence the probability of failure.

A key limitation of IDF curves is the common assumption of constant rainfall over the entire duration of the storm, so that the shape of the storm time series is lost (the method used by Tang et al. (2018) relaxes this assumption). In addition, studies that determine rainfall-induced slope failure probability using IDF curves assume that rainfall is time independent and, therefore, do not account for antecedent rainfall conditions. In reality, rainfall is rarely time independent; indeed, many processes associated with landslide triggering are rarely truly time independent: earthquakes trigger aftershocks (e.g. Parsons (2002)) and damage rock, altering its material properties (e.g. Jones et al. (2021)); rainfall events cluster over a range of timescales from days to years (e.g. Wang et al. (2005)); and phreatic surfaces are influenced by previous rainfall, sometimes over weeks or months (e.g. Iverson (2000)). Thus, an alternative to time-independent sampling is needed where triggers display consistent and/or significant time dependence (e.g. areas that experience dry and wet seasons). Some studies have sought to address this within an IDF-based framework by sampling different antecedent conditions (e.g. Frattini et al. (2009)); others generate synthetic time series sampled from IDF curves (e.g. Tang et al. (2018)). However, there have been surprisingly few attempts to account for the time-dependent probability due to rainfall sequencing through direct simulation of rainfall time series (see Jones et al. (2021); Ozturk (2022) for earthquake-triggered equivalents).

In accounting for rainfall variability, a hydrological model is required to derive time-dependent pore water pressure conditions from the rainfall record. For cuttings, this work has focussed on representing rainfall infiltration: De Leon and Garduño (2020) use a 2D Richards’ equation (RE) solver to model rainfall infiltration on a 2D slope but do not take into account the whole slope domain; Tang et al. (2018) perform rainfall infiltration analyses using 2D seepage analysis characterised by the soil–water characteristic curve and the permeability function curve; Mori et al. (2020) simulate infiltration into the slope with rainfall intensity specified through the Gumbel distribution (where the rate of infiltration depends on permeability and on pressure head differences between soil layers), using transient seepage analysis based on Darcy’s law. Holcombe et al. (2012) are an exception in that they use a 1D RE solver to model vertical infiltration but also represent groundwater flow below the phreatic surface with an explicit solution to the Darcy equation. A drawback of all these methods is that they focus on the cutting and do not account for the influence of the surrounding topography on the groundwater regime, i.e. the effect of hydrogeology and antecedent rainfall on the phreatic surface. Current approaches to modelling rainfall infiltration and seepage in cuttings therefore have several limitations: (1) they are computationally expensive; (2) hydrological parameters (e.g. hydraulic conductivity) and their variation in space are uncertain and poorly constrained, but have a substantial influence on the model output; and (3) lateral inputs (i.e. from upslope) are rarely taken into account. Models that predict landslides across landscapes have used a wider range of hydrological treatments: some consider only infiltration (e.g. Iverson (2000)), but many highlight the importance of lateral subsurface flow, representing it either in isolation (e.g. Talebi et al. (2008); Montgomery and Dietrich (1994)) or in combination with infiltration (though these are computationally expensive, e.g. Formetta et al. (2014)). However, these treatments have not been applied to the problem of cutting stability.

In this research, we address these limitations by choosing a hydrological model that captures some of the hillslope hydrological dynamics but runs in minutes rather than hours. This model uses Boussinesq’s groundwater theory to characterise groundwater flow (solving Darcy’s equation coupled with the conservation of mass equation). Boussinesq’s theory provides a diffusion wave solution that is less restrictive than other solutions of the governing groundwater flow equations (e.g. kinematic wave solutions, Fan and Bras (1998); Troch et al. (2002); or solutions that simulate steady-state rather than dynamic flow, Beven and Kirkby (1979); Talebi et al. (2008)). Note that the Boussinesq equation neglects the effect of capillary rise above the groundwater table and follows the Dupuit–Forchheimer approximation for flow through an unconfined aquifer, namely that the streamlines are approximately parallel to the impermeable boundary (Boussinesq 1877). The original 1D Boussinesq theory cannot account for the 3D geometry of a slope (e.g. convergence or divergence, convexity or concavity), yet slope geometry is known to significantly influence the hydrologic response (Troch et al. 2003, Eq. 6).

Fan and Bras (1998) and Troch et al. (2002) developed simple models of groundwater flow for more complex geometries. Building on this work, Troch et al. (2003) reformulated the Boussinesq equations in terms of storage to develop a more complete hillslope-storage equation (relative to the previous kinematic wave approximations) that is applicable to more complex hillslopes. The continuity and Darcy equations underlying the Boussinesq equation are reformulated in terms of storage along the hillslope to derive the so-called Hillslope-Storage Boussinesq (HSB) equation (see Eq. 1 in Sect. 2.2). By introducing soil moisture storage, the 3D groundwater flow problem is recast as a 1D flow problem for a slope of variable planform width with an inclined planar impermeable boundary. The HSB model can be applied to any slope that can be treated as a continuous medium, such as soil slopes and rock slopes with no dominant fracture orientation (a dominant fracture orientation would affect the flow direction). In these cases, Darcy’s equation and the continuity equation are valid.

Further simplifications enable faster numerical or even analytical solutions (e.g. Troch et al. (2004); Hilberts et al. (2004); Talebi et al. (2008)), but these are unnecessary here since the Troch et al. (2003) model is already fast enough to run thousands of simulations within hours. The HSB model of Troch et al. (2003) is therefore used in this study to simulate dynamic phreatic surface changes in response to rainfall time series and to capture the influence of upslope hillslope geometry. Paniconi et al. (2003) found that the HSB model captures the broad shapes of the storage and outflow profiles for all the hillslope profiles they tested.

Here we present a computationally efficient mechanistic model for the probability of rainfall-triggered slope failure that accounts for trigger variability, using rainfall time series, and for upslope influence, by coupling to a hillslope hydrological model. We choose a mechanistic rather than a statistical model so that its application is not limited to data-rich regions. We believe that this method is novel for establishing the frequency of cutting failure, as it treats both the complexity of the failure mechanics and the hillslope hydrology in a computationally efficient way, enabling large-scale application. A conditional probability based on a single representative storm ignores known variability and uncertainty in trigger conditions. Our model is also novel in that it accounts for uncertainty in rainfall using a time-dependent method, as well as for uncertainty in geomaterial properties. Whereas previous models have sampled from IDF curves to generate triggering rainfall, we drive our model with real rainfall time series to correctly capture failure probability in situations where the phreatic surface geometry, and thus the failure probability, may depend strongly on antecedent conditions and rainfall sequencing. This situation is particularly prevalent in areas with seasonal weather patterns (e.g. a dry and wet season, monsoons). The hydrological model we use accounts for lateral inputs by modelling the whole hillslope domain whilst remaining computationally efficient. The model couples a probabilistic slope stability analysis with a hillslope hydrological model to predict time-varying phreatic surface conditions resulting from input rainfall over a long time window (>10 years), accounting for the entire hillslope topography.

We used our model to estimate the absolute frequency of rainfall-triggered failure over 11 years (\(F_f\)), which is closely related to the annual failure probability, for a road slope in Nepal. We also explore the limitations of the conditional probability of failure for a specific pore pressure scenario by examining how P(f|T) varies over time for a road cut slope in Nepal.

2 Methodology

Our methodology can be divided into three key sections:

  1. Using MCS to determine a FoS distribution, with FoS as a function of the phreatic surface level and the varied geomaterial parameters (Sect. 2.1).
  2. Using the HSB model (Troch et al. 2003) to generate phreatic surface time series (Sect. 2.2).
  3. Combining the outputs of the MCS and HSB model to determine the time-varying P(f|T) and \(F_f\) (Sect. 2.3).

First, an MCS is conducted to determine multiple FoS values that capture the aleatoric uncertainty in the geomaterial strength properties for several seepage scenarios, each associated with a different phreatic surface height at the upslope boundary (Sect. 2.1). The strength of the slope geomaterial is characterised by the Generalised Hoek–Brown (G-H–B) failure criterion, which is currently the most widely used criterion for characterising rock mass strength (Wyllie and Mah 2017). In addition, well-constrained estimates of some G-H–B parameters can be reliably established from geological and geomorphological field observations (Robson et al. 2022). Second, the hillslope-storage Boussinesq (HSB) equation is solved to generate a phreatic surface time series for the slope, accounting for local hydrological conditions in response to rainfall (Sect. 2.2). Meteorological information is easier to acquire than slope strength or hydraulic conductivity, though in many countries the rain gauge network remains sparse; in the following analysis we assume that a local daily rainfall time series is available. By matching the phreatic surface levels in the HSB time series to those assessed in the MCS, a FoS time series is generated for each parameter realisation. The time-varying P(f|T) and \(F_f\) can then be calculated from the FoS time series (Sect. 2.3).
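The coupling step described above can be sketched as follows. This is a minimal Python illustration, not the authors’ implementation, assuming that the LEM results are stored as an \(N_r \times N_z\) array of FoS values, that each phreatic level from the HSB time series is matched to the tested levels Z by linear interpolation (a nearest-level lookup would be an equally plausible reading of the text), and that a realisation fails when FoS < 1. The definition of \(F_f\) belongs to Sect. 2.3 and is not reproduced; only the time-varying P(f|T) is computed. Function names and values are hypothetical.

```python
import numpy as np


def fos_time_series(fos_table, z_levels, z_series):
    """Map a phreatic-level time series onto FoS for every realisation.

    fos_table: (N_r, N_z) FoS from the LEM, one row per GSI realisation.
    z_levels:  (N_z,) increasing total-head levels Z tested in the MCS.
    z_series:  (N_t,) phreatic level at the upslope boundary from the HSB model.
    Levels outside the tested range are clamped to the end values by np.interp.
    """
    return np.array([np.interp(z_series, z_levels, row) for row in fos_table])


def pf_given_t_series(fos_ts, threshold=1.0):
    """Time-varying P(f|T): fraction of realisations with FoS < threshold."""
    return (np.asarray(fos_ts) < threshold).mean(axis=0)


# Hypothetical: 2 GSI realisations, 3 tested levels, 3 time steps.
fos_table = np.array([[1.5, 1.2, 0.9],
                      [2.0, 1.6, 1.1]])
fos_ts = fos_time_series(fos_table, np.array([0.0, 1.0, 2.0]),
                         np.array([0.0, 1.5, 2.0]))
pf_t = pf_given_t_series(fos_ts)  # [0.0, 0.0, 0.5]
```

Clamping below the lowest tested Z is consistent with how the Z range is chosen (below that level the phreatic surface no longer affects the FoS).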

The proposed methodology requires the following input data: a rainfall time series, G-H–B parameters characterising the slope geomaterial, estimates of hydrological parameters, and the slope geometry. Our approach is especially advantageous when a geotechnical investigation cannot be conducted, either due to cost or difficult terrain (i.e. mountainous, remote areas). The questions raised over the use of a time-dependent model to represent trigger uncertainty, what slope failure probability means, and how it can be used are discussed in Sect. 5.2. The uncertainties and assumptions of the model are discussed in Sects. 5.1 and 5.3, respectively.

An overview of the methodology is presented as a flowchart in Fig. 1. Table 1 outlines all the parameter symbols used in this paper, categorised by the geotechnical, hydrological and failure frequency model.

Fig. 1

Flowchart outlining the key steps of the methodology to estimate the frequency of failure and the conditional probability of failure. \(N_r\) (number of realisations of G-H–B parameters varied in the Monte Carlo Simulation), \(N_z\) (number of phreatic surfaces tested in the seepage analysis) and \(N_k\) (number of realisations of hydraulic conductivity) were determined through convergence tests. Abbreviations: DEM = Digital Elevation Model, FE = Finite Element, FoS = Factor of Safety, G-H–B = Generalised Hoek–Brown, HSB = Hillslope-Storage Boussinesq, k = hydraulic conductivity, LEM = Limit Equilibrium Method, MCS = Monte Carlo Simulation, and Z = total head boundary condition.

Table 1 Table of all parameter symbols used in this paper

2.1 Monte Carlo Simulations

It is well known that model parameters are subject to two sources of uncertainty: (1) uncertainty due to intrinsic material conditions; and (2) spatial variability in the analysed domain. Regarding spatial variability, abundant experimental evidence shows that all rocks and soils are heterogeneous (Phoon and Kulhawy 1999a, b). This could be modelled using random field theory (e.g. Griffiths and Fenton (2000); Dyson and Tolooiyan (2019); Gravanis et al. (2014)), but this requires defining the length scales of the variability (spatial correlation lengths). These length scales are extremely poorly constrained even for the most intensively monitored sites globally (Shokri et al. 2019); thus, there is a lack of general guidance on reasonable length scales of variability for different material properties in different settings (e.g. rock types). Furthermore, estimating these length scales requires intensive field testing, which would be prohibitively expensive. For these reasons, we neglect the spatial arrangement of variability and concentrate solely on the intrinsic conditions.

To account for aleatory uncertainty in material properties, we represent them as probability density functions. For an assigned phreatic surface level, we perform Monte Carlo Simulations (MCS) to propagate the material uncertainties to the probability of failure of the slope. The MCS propagate the parameter variability captured in the probability density functions through the stability model to find the conditional probability of failure (i.e. the number of realisations that fail divided by the total number of realisations) (Fenton and Griffiths 2008).
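The estimator described above is the failure fraction of the realisations. The sketch below also returns the binomial standard error \(\sqrt{p(1-p)/N}\); using that standard error to judge when \(N_r\) is large enough is our assumption of one common convergence criterion, not necessarily the test used in this study. FoS < 1 is taken as the failure condition, and the FoS values in the example are hypothetical.

```python
import math


def mcs_failure_probability(fos_samples, threshold=1.0):
    """Conditional failure probability from N Monte Carlo realisations.

    p = (number of realisations with FoS < threshold) / N, returned together
    with its binomial standard error sqrt(p * (1 - p) / N).
    """
    n = len(fos_samples)
    if n == 0:
        raise ValueError("need at least one realisation")
    n_fail = sum(1 for fos in fos_samples if fos < threshold)
    p = n_fail / n
    return p, math.sqrt(p * (1.0 - p) / n)


# Hypothetical FoS values from four realisations at one phreatic level Z.
p, se = mcs_failure_probability([0.8, 1.2, 0.95, 1.5])  # p = 0.5, se = 0.25
```

One could keep increasing \(N_r\) until the standard error is small relative to p; note that it is zero when p is exactly 0 or 1, so that criterion needs care for very rare failures.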

In the MCS, only the material parameters that are significant to stability are varied. To identify them, we conducted a one-at-a-time sensitivity analysis to determine which G-H–B parameters the model is most sensitive to and, therefore, which should be varied in the MCS. Literature-based estimates of the most likely value and of the upper and lower limits of each parameter were input into the model, and the standard deviation (STDEV) of the output FoS was computed for each parameter tested. In our case, the FoS is most sensitive to the Geological Strength Index (GSI). Physically, a reduction in GSI represents degradation of the exposed rock mass, taking into account the blockiness of the mass and its surface conditions (Marinos and Hoek 2000).
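A one-at-a-time sensitivity analysis of this kind could be organised as below. The stability model is passed in as a callable (in practice each call would be an LEM run); the linear `toy_fos` function, the parameter names and the ranges in the example are purely hypothetical stand-ins chosen only to show the mechanics. The sample standard deviation (`statistics.stdev`) is used, matching the usual meaning of STDEV.

```python
import statistics


def one_at_a_time_sensitivity(fos_model, base, ranges):
    """STDEV of FoS when each parameter alone takes its lower, most likely
    and upper value while all others stay at their most likely values.

    fos_model: callable mapping a dict of parameters to a FoS.
    base:      dict of most likely parameter values.
    ranges:    dict of parameter name -> (lower, upper).
    """
    result = {}
    for name, (lower, upper) in ranges.items():
        fos_values = []
        for value in (lower, base[name], upper):
            params = dict(base)
            params[name] = value
            fos_values.append(fos_model(params))
        result[name] = statistics.stdev(fos_values)
    return result


# Hypothetical stand-in for an LEM run: FoS rises with GSI and, weakly, with mi.
def toy_fos(p):
    return 0.02 * p["GSI"] + 0.01 * p["mi"]


sens = one_at_a_time_sensitivity(
    toy_fos, base={"GSI": 40, "mi": 10}, ranges={"GSI": (30, 50), "mi": (8, 12)}
)
most_sensitive = max(sens, key=sens.get)  # "GSI" for this toy model
```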

The parameters chosen to be varied in the MCS were sampled from a lognormal distribution. Studies that examine or use rock strength distributions generally agree that the distribution should be unimodal, and many researchers suggest a lognormal distribution as a reasonable model for the physical properties of rock (Hoek 1998; Parkin and Robinson 1992; Nour et al. 2002; Massih et al. 2008; Griffiths and Fenton 2009; Pandit et al. 2019), particularly for properties with a strong physical constraint on the distribution’s lower bound (e.g. values cannot be negative). A lognormal distribution also ensures that the sampled parameter values are always positive. In a probabilistic analysis, any correlation between parameters should be accounted for (Zeng et al. 2014). However, the correlation coefficients between the G-H–B parameters are poorly constrained. Pan et al. (2017) plot failure probabilities as a function of normalised slope height for correlated and independent variables and find that P(f) is slightly reduced when the G-H–B parameter correlations are considered; however, they conclude that the influence of these correlations on P(f) is ‘rather small’ (p. 6). Correlations between the G-H–B parameters are neglected in a number of probabilistic slope stability studies (Li et al. 2008, 2012; Pandit et al. 2019; Farichah and Hutama 2020). Given the uncertainty in the correlation coefficients, the correlations are also neglected here. \(N_r\) random realisations of GSI were generated from the lognormal distribution (with \(N_r\) determined by a convergence analysis).
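The text does not state how the lognormal distribution of GSI is parameterised. As a minimal sketch, assuming it is specified by an arithmetic mean m and standard deviation s, the standard moment-matching relations \(\sigma_{\ln}^2 = \ln(1 + (s/m)^2)\) and \(\mu_{\ln} = \ln m - \sigma_{\ln}^2/2\) give the parameters of the underlying normal distribution. The values 40 and 6 below are hypothetical. Because GSI is bounded above by 100, a practical implementation might also truncate or reject samples above that limit; this sketch does not.

```python
import numpy as np


def sample_lognormal(mean, std, n, seed=None):
    """Draw n positive samples with arithmetic mean `mean` and std `std`.

    Moment matching for a lognormal variable:
    sigma_ln^2 = ln(1 + (std / mean)^2), mu_ln = ln(mean) - sigma_ln^2 / 2.
    """
    sigma2 = np.log1p((std / mean) ** 2)
    mu = np.log(mean) - 0.5 * sigma2
    rng = np.random.default_rng(seed)
    return rng.lognormal(mean=mu, sigma=np.sqrt(sigma2), size=n)


# Hypothetical GSI distribution: mean 40, standard deviation 6, N_r = 1000.
gsi_samples = sample_lognormal(40.0, 6.0, 1000, seed=0)
```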

We performed \(N_r\) deterministic stability analyses, one for each value of GSI, using the Morgenstern-Price (M-P) Limit Equilibrium Method (LEM) in Rocscience Slide2. The M-P method was employed because it satisfies all equations of equilibrium (a rigorous equilibrium method) and has been found to be the most accurate across different slope conditions (Duncan 1996). We used the default half-sine interslice force function; the choice of interslice force function has a negligible influence for homogeneous slopes (Fredlund and Krahn 1977). A convergence test was first carried out to determine the optimal number of slices for the LEM. The external boundaries of the model were chosen such that they have no effect on the resulting FoS.

MCS (\(N_r\) deterministic LEM analyses) were performed for the slope cutting for \(N_z\) phreatic surfaces imposed at various heights to capture phreatic surface variability. The phreatic surfaces were generated using Finite Element (FE) steady-state seepage analyses, with \(N_z\) total head boundary conditions at different heights, Z, imposed on the upslope boundary of the model. Z ranges from the ground surface down to the elevation below which the phreatic surface no longer influences the failure surface and thus the FoS. The Z values are equally spaced between these two limits, with the spacing set by the mesh element size, since the phreatic surface is insensitive to finer spacing. The number of mesh elements for the FE analysis was optimised by convergence testing on the resulting FoS. We assumed homogeneous, isotropic hydraulic conductivity and no vertical inputs (these assumptions simplify our model to 1D, allowing a proof of concept for the overall methodology). Unsaturated zones in the slope were neglected. The value of hydraulic conductivity (k) prescribed to the model affects neither the resulting phreatic surface nor the FoS, since the seepage analysis is at steady state: with homogeneous k, fixed-head boundaries and no vertical inputs, k cancels out of the steady-state flow equation. Thus, the analyses did not need to be performed for different k realisations.

In summary, we performed \(N_r \times N_z\) LEM stability analyses and obtained FoS as a function of the phreatic surface level Z and the GSI value. For a fixed phreatic surface level, FoS increases monotonically with GSI, and each Z yields a different curve, with FoS decreasing as Z increases. The key steps of the MCS used in our model are highlighted in blue in the schematic diagram in Fig. 2. The next section describes the model used to evaluate the phreatic surface level time series under unsteady (transient) conditions.
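Because FoS increases monotonically with GSI at a fixed Z, each level Z has a single critical GSI below which the slope fails, and P(f|Z) equals the fraction of GSI realisations below that value. The sketch below is our illustration of that consequence, not a step the text describes; it assumes FoS < 1 as the failure criterion and linear interpolation between the LEM results. Function names and example values are hypothetical.

```python
import numpy as np


def critical_gsi(gsi, fos, threshold=1.0):
    """GSI at which FoS crosses `threshold` for one phreatic level Z.

    Relies on FoS increasing monotonically with GSI, so FoS(GSI) can be
    inverted by linear interpolation. Returns nan if the threshold is not
    crossed within the analysed GSI range.
    """
    order = np.argsort(gsi)
    g = np.asarray(gsi, dtype=float)[order]
    f = np.asarray(fos, dtype=float)[order]
    if not f[0] <= threshold <= f[-1]:
        return float("nan")
    return float(np.interp(threshold, f, g))


def pf_given_z(gsi_samples, gsi_crit):
    """P(f|Z): share of GSI realisations below the critical GSI."""
    return float(np.mean(np.asarray(gsi_samples, dtype=float) < gsi_crit))


# Hypothetical LEM results at one Z, and four GSI realisations.
g_crit = critical_gsi([30, 40, 50], [0.8, 1.0, 1.2])  # 40.0
p = pf_given_z([25, 35, 45, 55], g_crit)              # 0.5
```

A nan result means the crossing lies outside the analysed GSI range, and the caller should handle it explicitly rather than pass it to `pf_given_z`.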

Fig. 2

Schematic diagram highlighting the key elements of the probabilistic (system highlighted in blue) and hydrological (system highlighted in red) models. The yellow area of the slope shows the stability model domain (the external boundaries of the stability model were chosen such that they have no effect on the resulting FoS). \(N_r\) Limit Equilibrium Method (LEM) analyses are conducted representing each realisation of GSI for \(N_z\) phreatic surfaces generated by Finite Element (FE) method seepage analyses from total head nodes (Z) at the upslope boundary of the stability model. \(N_k\) phreatic surface time series are generated through the Hillslope-Storage Boussinesq (HSB) model representing each value of hydraulic conductivity (k) (Color figure online)

2.2 Phreatic Surface Time Series

In the methodology presented here, we aim to determine how frequently the slope experiences each phreatic surface level considered in the MCS. For this purpose, we use a simple 1D hydrological model to simulate the phreatic surface height (and thus the total head used in the FE seepage analysis) at the upslope boundary of the slope stability model. As input to this model, we use a rainfall time series rather than sampling storms from an IDF distribution, to account for the correct timing and shape of storms and the build-up of pore water pressure in the slope (a time-dependent system). However, the structure of our model is such that a storm-based slope stability analysis could easily be performed instead if a suitable IDF curve were available for the site.

The HSB model of Troch et al. (2003) is used to generate a phreatic surface time series (evolution of the phreatic surface) from the rainfall time series at the location of the upslope boundary of the stability model domain in the hillslope. Table 1 outlines the input parameters required for the HSB model (hydrological model). The boundary conditions are presented in Table 2.

Table 2 Table of initial and boundary conditions for the hillslope hydrological model (Hillslope-Storage Boussinesq model)

All variables depend on time (t) and space (x), with x being a distance measured from the channel to the ridge. As per the requirements of the HSB model presented by Troch et al. (2003), \(B_p\) and \(\alpha\) are constant. \(B_w\) is constrained to be in the range between zero (no water) and \(B_p\). This means that no flux at the slope surface is modelled.

By combining the continuity equation with Darcy’s equation, reformulated in terms of storage (\(S(t,x) = n_f w B_w\)), a version of the HSB partial differential equation (Troch et al. 2003, Eq. 6), second order in space and first order in time, is obtained:

$$\begin{aligned} n_f\frac{\partial S}{\partial t}=&\;\frac{k \cos {\alpha }}{n_f}\frac{\partial }{\partial x}\left[ \frac{S}{w}\left( \frac{\partial S}{\partial x} - \frac{S}{w} \frac{\partial w}{\partial x} \right) \right] \nonumber \\ {}&\quad + k \sin {\alpha } \frac{\partial S}{\partial x} +n_frw \end{aligned}$$
(1)

For many hillslopes, the relationship between hillslope width (i.e. the width of the flow strip) and distance upslope can be approximated by an exponential function of the form (Troch et al. 2003, p. 7):

$$\begin{aligned} w(x)=w_0 \exp {(\beta x)} \end{aligned}$$
(2)

where \(w_0\) is the hillslope width at \(x=0\) and \(\beta\) is a shape factor controlling the variation of the width along the x-axis. If \(\beta =0\), a constant width profile is obtained (i.e. \(w(x)=w_0\)), negative values of \(\beta\) reflect divergent hillslope topography with respect to the toe of the slope and positive values reflect convergent topography.

Equation 1 is numerically integrated once the initial condition and the boundary conditions have been imposed. The equations are solved using a finite difference method implemented in MATLAB, and the main variable computed is the storage time series. The phreatic surface time series is then computed from the storage as \(B_w = S/(n_f w)\).
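To make the numerical step concrete, the following is a minimal Python sketch (not the authors' MATLAB code) of an explicit finite-difference discretisation of Eq. 1 in flux form. It uses the Darcy flux \(Q = -\frac{k\cos\alpha}{n_f^2}\frac{S}{w}\left(\frac{\partial S}{\partial x} - \frac{S}{w}\frac{\partial w}{\partial x}\right) - \frac{k\sin\alpha}{n_f}S\) and \(\partial S/\partial t = -\partial Q/\partial x + r w\), which, multiplied by \(n_f\), recovers Eq. 1. The boundary conditions (zero storage at the channel, zero flux at the ridge), the upwinding of the gravity term, the sub-stepping, the function names and the example parameter values are all assumptions made for illustration.

```python
import math


def hsb_step(S, x, w, k, alpha_deg, n_f, r, b_p, dt):
    """Advance storage S (m^2) by one forward-Euler step of Eq. (1) in flux form.

    x: uniform grid (m) from the channel (x[0] = 0) to the ridge (x[-1] = L);
    w: hillslope width (m) at each node; k: hydraulic conductivity (m/s);
    r: recharge rate (m/s); b_p: permeable-layer thickness (m).
    Assumed boundary conditions: S = 0 at the channel, zero flux at the ridge.
    """
    n = len(S)
    dx = x[1] - x[0]
    ca = math.cos(math.radians(alpha_deg))
    sa = math.sin(math.radians(alpha_deg))
    # Darcy flux Q (m^3/s) on the n - 1 faces between nodes j and j + 1;
    # negative values mean flow towards the channel (decreasing x).
    q = []
    for j in range(n - 1):
        s_f = 0.5 * (S[j] + S[j + 1])
        w_f = 0.5 * (w[j] + w[j + 1])
        dsdx = (S[j + 1] - S[j]) / dx
        dwdx = (w[j + 1] - w[j]) / dx
        q_diff = -(k * ca / n_f**2) * (s_f / w_f) * (dsdx - (s_f / w_f) * dwdx)
        q_grav = -(k * sa / n_f) * S[j + 1]  # gravity term, upwinded from the upslope node
        q.append(q_diff + q_grav)
    new = [0.0] * n  # new[0] stays 0 (channel boundary)
    for j in range(1, n):
        q_up = q[j] if j < n - 1 else 0.0  # zero-flux ridge boundary
        cell = dx if j < n - 1 else 0.5 * dx  # half cell at the ridge node
        dS_dt = -(q_up - q[j - 1]) / cell + r * w[j]  # dS/dt = -dQ/dx + r w
        s_max = n_f * w[j] * b_p  # keeps 0 <= B_w <= B_p
        new[j] = min(max(S[j] + dt * dS_dt, 0.0), s_max)
    return new


def hsb_run(recharge, x, w, k, alpha_deg, n_f, b_p, dt, n_sub=1):
    """Integrate a recharge series (one value per interval of length dt, s) from
    S = 0 and return B_w = S / (n_f w) at every node after each interval.
    n_sub sub-steps per interval keep the explicit scheme stable."""
    S = [0.0] * len(x)
    out = []
    for r in recharge:
        for _ in range(n_sub):
            S = hsb_step(S, x, w, k, alpha_deg, n_f, r, b_p, dt / n_sub)
        out.append([s / (n_f * wi) for s, wi in zip(S, w)])
    return out


# Hypothetical 200 m hillslope, 10 days of constant recharge, hourly sub-steps.
x = [10.0 * i for i in range(21)]
w = [20.0 * math.exp(0.002 * xi) for xi in x]
b_w = hsb_run([1e-7] * 10, x, w, k=5e-6, alpha_deg=15.0, n_f=0.1,
              b_p=20.0, dt=86400.0, n_sub=24)
```

An explicit scheme is the simplest to read but needs small time steps; an implicit scheme would allow daily steps directly at the cost of solving a nonlinear system each step.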

In situ, k is strongly dependent on the characteristics of the geomaterials (e.g. degree of weathering, fragmentation, and grain size heterogeneity), so a single deterministic value is not physically reasonable. However, constraining spatial variability in k would be prohibitively expensive computationally. To account for the variability of k, a statistical approach was used: multiple simulations of the model were conducted, each with a unique value of k drawn from a lognormal distribution. A lognormal distribution is chosen to prevent negative values. The distribution is characterised by its 1st and 99th percentiles, which are set to the lower and upper ends of the k values for the slope, so that values outside this range occasionally occur. A convergence analysis was conducted to determine the minimum number of realisations of k required for the model (\(N_k\)). Phreatic surface time series are generated for each value of k using the HSB model. We compare the resulting time-varying phreatic surface fluctuations with our expectations based on qualitative field observations at the site to constrain the values for the 1st and 99th percentiles of k. As in Troch et al. (2003), it is assumed that the variation in k with depth can be approximated as a step function, with uniform-k geomaterial above an impermeable boundary (discussed further in Sect. 5.3).
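A lognormal distribution fixed by two percentiles has a closed form: if \(p_{1}\) and \(p_{99}\) are the 1st and 99th percentiles, then \(\ln k\) is normal with mean \(\mu = \tfrac{1}{2}(\ln p_{1} + \ln p_{99})\) and standard deviation \(\sigma = (\ln p_{99} - \ln p_{1})/(2\,\Phi^{-1}(0.99))\), with \(\Phi^{-1}(0.99) \approx 2.326\). The Python sketch below (standard library only; function names are hypothetical) computes these parameters and draws \(N_k\) samples, using the case-study percentiles from Sect. 3.2 as an example.

```python
import math
import random
from statistics import NormalDist


def lognormal_from_percentiles(p_low, p_high, tail=0.01):
    """Return (mu, sigma) of ln(X) such that p_low and p_high are the `tail`
    and `1 - tail` quantiles of X (1st and 99th percentiles by default)."""
    z = NormalDist().inv_cdf(1.0 - tail)  # ~2.326 for tail = 0.01
    mu = 0.5 * (math.log(p_low) + math.log(p_high))
    sigma = (math.log(p_high) - math.log(p_low)) / (2.0 * z)
    return mu, sigma


def sample_k(p_low, p_high, n_k, seed=None):
    """Draw n_k hydraulic conductivities (m/s) from the fitted lognormal."""
    mu, sigma = lognormal_from_percentiles(p_low, p_high)
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, sigma) for _ in range(n_k)]


# Example with the case-study percentiles of Sect. 3.2 (N_k = 2000).
mu, sigma = lognormal_from_percentiles(1.7e-6, 1e-5)
ks = sample_k(1.7e-6, 1e-5, 2000, seed=1)
```

The median of the fitted distribution, \(e^{\mu}\), is the geometric mean of the two percentiles.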

The value of geomaterial porosity (\(n_f\)) used in the model was taken as an average of literature values for the rock type at the site, as it is not as variable as k (Singhal and Gupta 2010) (further discussed in Sect. 5.3). The catchment area upslope of the cutting (i.e. the area that would drain through it) was defined using a Digital Elevation Model (DEM) in ArcGIS (see Fig. 3). In doing so, the watershed was defined by projecting a line upslope from the cutting along the line of steepest inclination to the ridge (defined at the point where the inclination goes to zero). This line is the hillslope length (L), as depicted in white in Fig. 3. The extent of the slope width is defined as where the hillslope switches from convex to concave form at a gully (these slope extents are shown in lime green in Fig. 3). The shape factor (\(\beta\)) is estimated by measuring the width of the hillslope (distance between the two slope width extents) at three points along the slope length (\(w_0\), \(w_1\) and \(w_2\) in Fig. 3) and taking the best-fitting exponential of these three widths.
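The "best-fitting exponential" can be obtained, for example, by an ordinary least-squares fit of \(\ln w\) against x, since Eq. 2 becomes linear in that form (\(\ln w = \ln w_0 + \beta x\)). The sketch below assumes this log-linear fit (other fitting criteria are possible); the function name and the widths in the example are hypothetical.

```python
import math


def fit_width_profile(x, widths):
    """Fit w(x) = w0 * exp(beta * x) (Eq. 2) by ordinary least squares on ln(w).

    x: distances upslope from the channel (m); widths: measured widths (m).
    Returns (w0, beta); beta > 0 convergent, beta < 0 divergent, beta = 0 constant width.
    """
    n = len(x)
    y = [math.log(wi) for wi in widths]  # ln w = ln w0 + beta * x is linear in x
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    beta = sxy / sxx
    w0 = math.exp(y_mean - beta * x_mean)
    return w0, beta


# Hypothetical widths measured at x = 0 and two points further upslope.
w0, beta = fit_width_profile([0.0, 150.0, 320.0], [180.0, 120.0, 75.0])
```

Here the widths narrow upslope, so the fitted \(\beta\) is negative (a divergent hillslope in the sense of Eq. 2).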

Fig. 3

Hillslope measurements to define catchment area. The extent of the slope widths is shown in light green (defined as where the hillslope switches from convex to concave form at a gully), measurements of the slope width (\(w_0\), \(w_1\) and \(w_2\)) are shown in pale pink and the length of the slope in white. \(w_0\), \(w_1\) and \(w_2\) are used to estimate the shape factor (\(\beta\)). Image sourced from Google Earth (Color figure online)

The depth to the impermeable boundary and its inclination were based on assumptions, because few constraints can be drawn from observations at this site and there is no accepted theory in the scientific community on the depth of the permeable region in a hillslope (referred to as the critical zone) (Anderson et al. 2019; Grant and Dietrich 2017; Flinchum et al. 2018; Clair et al. 2015). In the literature, there is general agreement that the inclination of the impermeable boundary tapers towards the ridge, meaning that the permeable layer is thicker at the ridge (Anderson et al. 2019; Grant and Dietrich 2017; Flinchum et al. 2018; Clair et al. 2015; Anderson et al. 2013; Medwedeff et al. 2022). Accordingly, we assumed that the inclination of the impermeable boundary (\(\alpha\)) is 10\(^\circ\) lower than the inclination of the ground surface. The sensitivity of the model to this assumption is explored in Sect. 5.3.

The downslope boundary condition of the HSB model implies that the impermeable boundary intersects the ground surface at this boundary. In most cases, this is a river at the base of the hillslope, with the impermeable boundary at the river bed and a wedge of permeable material extending upslope. We therefore placed the impermeable boundary so that it intersects the ground surface at the river (the downslope boundary of the hillslope). Figure 4 displays a 2D section of the hillslope in profile, highlighting the geometrical properties of the model.

Fig. 4

Illustration of the Hillslope-Storage Boussinesq slope profile highlighting the geometry of the impermeable boundary. The Slide2 stability model domain is shown in yellow (the external boundaries were chosen such that they have no effect on the resulting FoS). The hillslope length (L) is measured from the channel to the ridge. The impermeable boundary is highlighted in brown, where \(\alpha\) denotes its inclination. The thickness of the permeable layer (\(B_p\)) is measured at the upslope boundary of the Slide2 model, with the impermeable boundary meeting the ground surface at the river channel. A potential phreatic surface is shown in blue, with the phreatic surface height above the impermeable boundary denoted by \(B_w\) (Color figure online)

We obtained an 11-year daily rainfall time series from a local meteorological station and duplicated it to produce a 22-year time series, so that the first 11 years act as a spin-up period and the subsurface flow is not affected by the (unknown) conditions preceding our analysis. If there is no information on the phreatic surface, its initial height is set to zero (i.e. the level of the impermeable boundary). The model should be spun up to minimise the influence of this assumption.

The HSB model is written in MATLAB. \(N_k\) model simulations are conducted each with a unique value of k, resulting in \(N_k\) phreatic surface time series. The key steps of the HSB model are highlighted in red in a schematic diagram presented in Fig. 2. The uncertainties and assumptions made in the HSB model are discussed in Sect. 5.

2.3 Combining the Models

In this section, we combine the results obtained from Sects. 2.1 and 2.2 to obtain the probability of failure and the frequency of failure (presented as a schematic diagram in Fig. 5). For an assigned value of k, we discretised the phreatic surface time series according to Z values that were used to generate the phreatic surfaces for the MCS. Prior to doing so, the reference frame of the output of the HSB model (total head time series) has to be converted to match the horizontal/vertical Cartesian reference frame of the phreatic surface total head boundary condition in the stability model (Z). This is done by dividing the phreatic surface from the HSB model by the cosine of the inclination of the impermeable boundary. Once the reference frame conversion is made, the ground level in the phreatic surface time series is made equal to the ground level of the stability model. By discretising each phreatic surface time series according to Z, FoS can be expressed as a discretised function of the phreatic surface for one GSI realisation and k realisation; therefore, we obtain a unique discretised FoS time series for each GSI realisation and k realisation.
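A minimal Python sketch of this coupling step is given below. It assumes that each HSB value is mapped to the nearest tabulated Z level (the rounding rule is our choice), with values outside the tabulated range clamped to the end levels; the offset `z_base` (the level of the impermeable boundary in the stability-model frame, chosen so that the two ground levels coincide) and all function names are hypothetical. The example grid is the case-study grid of Sect. 3.1.

```python
import bisect
import math


def hsb_to_total_head(b_w, alpha_deg, z_base):
    """Convert HSB phreatic heights B_w (measured normal to the impermeable boundary)
    to vertical total head Z: Z = z_base + B_w / cos(alpha).
    z_base is a hypothetical offset: the level of the impermeable boundary in the
    stability-model frame, chosen so that the two ground levels coincide."""
    c = math.cos(math.radians(alpha_deg))
    return [z_base + h / c for h in b_w]


def nearest_level(z, z_levels):
    """Index of the tabulated Z (ascending) closest to z; values outside the range
    are clamped to the lowest or highest level."""
    i = bisect.bisect_left(z_levels, z)
    if i == 0:
        return 0
    if i == len(z_levels):
        return len(z_levels) - 1
    return i if z_levels[i] - z < z - z_levels[i - 1] else i - 1


def failure_series(b_w, fos_by_level, z_levels, alpha_deg, z_base):
    """Binary failure series (True where FoS < 1) for one k realisation (b_w) and one
    GSI realisation, whose MCS results are fos_by_level[i] at z_levels[i]."""
    zs = hsb_to_total_head(b_w, alpha_deg, z_base)
    return [fos_by_level[nearest_level(z, z_levels)] < 1.0 for z in zs]


# Case-study grid of Sect. 3.1: N_z = 25 levels equally spaced from 45 to 92.48 m.
Z_LEVELS = [45.0 + i * (92.48 - 45.0) / 24 for i in range(25)]
```

Because the FoS values are only looked up rather than recomputed, the cost of generating each failure series is negligible compared with the \(N_r \times N_z\) stability analyses.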

Fig. 5

A schematic diagram highlighting how the outputs of the Monte Carlo Simulations (\(N_z\) FoS distributions) and the Hillslope-Storage Boussinesq model (\(N_k\) phreatic surface time series) are combined to determine a time-varying conditional probability of failure (P(f|T)) and a frequency of failure (\(F_f\))

Combining the results of the probabilistic stability analyses and the hydrological model in this way significantly reduces the computational time that would otherwise be required to determine a FoS for every water level in this time series for every rock parameter realisation and every value of k.

The FoS time series for each realisation is converted to a binary ‘failure’ time series, with failure where FoS < 1 and stability otherwise (FoS \(\ge\) 1).

2.3.1 Method to Determine Time-Varying Conditional Failure Probability

Combining the models in this way yields \(N_r \times N_k\) discretised FoS time series. We consider the slope to fail when FoS < 1. If \(N_{f}(t)\) denotes the number of FoS time series in failure at a fixed time step t, then the conditional probability of failure at that time step, P(f|T(t)) (which accounts for the phreatic surface), is given by

$$\begin{aligned} P(f|T(t))= N_{f}(t)/(N_r \times N_k) \end{aligned}$$
(3)
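Eq. 3 amounts to averaging the binary failure indicators across all realisations at each time step. A minimal Python sketch, assuming the \(N_r \times N_k\) failure series are held as equal-length lists of booleans (the function name is hypothetical; for the 2,000,000 series of the case study an array library would be far faster):

```python
def conditional_pf(failure_series_list):
    """P(f|T(t)) of Eq. (3): fraction of the N_r x N_k binary failure series
    that are in failure (True) at each time step t."""
    n = len(failure_series_list)
    n_steps = len(failure_series_list[0])
    return [sum(series[t] for series in failure_series_list) / n for t in range(n_steps)]


# Two hypothetical realisations over three time steps.
pf = conditional_pf([[False, True, True], [False, False, True]])
```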

2.3.2 Method to Estimate the Frequency of Failure

Assuming that the slope is returned to its pre-failure condition after each failure (this assumption is discussed further in Sect. 5.3), each ‘failure’ time series is worked through chronologically, and when failure occurs the following steps are taken: (1) the failure count for that time series is incremented; and (2) the failure time series is zeroed (preventing further failures from being counted) for a number of remediation days, to allow time for debris to be cleared and the cutting to be reinstated as it was (the choice of remediation days is discussed in Sect. 3.3, and the sensitivity of the model output to this parameter is tested in Sect. 5.3). If \(N_{f}\) denotes the total number of failures over all of the FoS time series, then the number of landslides over the duration of the rainfall time series (\(F_f\)) is given by

$$\begin{aligned} F_f= N_f /(N_r \times N_k) \end{aligned}$$
(4)
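The counting procedure and Eq. 4 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: we assume that the failure day itself is counted and the following `remediation_days` time steps (one step per day) are zeroed, and the function names are hypothetical.

```python
def count_failures(failure, remediation_days):
    """Count failures in one binary series. After each counted failure, the next
    remediation_days steps are skipped ('zeroed'), as the slope is assumed to be
    under repair and then reinstated to its pre-failure condition."""
    count, t = 0, 0
    while t < len(failure):
        if failure[t]:
            count += 1
            t += 1 + remediation_days  # skip the remediation window
        else:
            t += 1
    return count


def frequency_of_failure(failure_series_list, remediation_days=90):
    """F_f of Eq. (4): total failures over all series divided by N_r x N_k,
    i.e. the mean number of failures per series over the record."""
    total = sum(count_failures(s, remediation_days) for s in failure_series_list)
    return total / len(failure_series_list)
```

With 90 remediation days, a series that is permanently in failure over 200 days yields three counted failures (days 0, 91 and 182).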

2.4 Case Study

We demonstrated this methodology on a road cutting on the Narayanghat-Mugling road in Chitawan, Nepal (see Fig. 6 for map). Elevations in Nepal range from less than 100 m in the Terai region in the south to more than 8000 m in the Himalayan mountains. The region is tectonically active and experiences a summer monsoon season for four months of the year, during which 80% of Nepal’s annual rainfall occurs (Shakya and Nirula 2008). The case study site is at an elevation of around 240 m. It is situated in an area of sedimentary to low-grade metamorphic rocks of Proterozoic age (2500–539 Ma) in the Lesser Himalaya geological region. At this site, there is an above-road cut slope around 25 m in height and inclined at 70\(^\circ\), made up of weathered phyllite (see Fig. 7 for an image of the cutting). This cut slope lies in a valley-side hillslope that is inclined at c. 25\(^\circ\) above the cutting. The road was originally excavated by blasting around 40 years ago (personal communication with a consultant, November 2019). A 2 m tall gabion wall constructed along the cutting collapsed due to a minor rockfall during the 2019 monsoon season. Below the road, a 15 m long slope descends into the Trishuli River.

Fig. 6

Map showing the location of the case study site along the Narayanghat-Mugling highway in the Chitawan District of Nepal

Fig. 7

Image of road cutting case study along the Narayanghat-Mugling road in Chitawan, Nepal. Image taken in November 2019

The slope cutting of exposed rock mass is characterised according to the G-H–B failure criterion. For all the G-H–B parameters and unit weight (\(\gamma\)), the most likely, upper and lower limit values of each parameter were estimated based on field observations of the slope and values from the literature (outlined in Table 3). GSI is a measure of the physical condition of the exposed rock mass (Marinos and Hoek 2000), whilst the unconfined compressive strength (\(\sigma _{ci}\)) accounts for the strength of the rock (Hoek 1998). GSI and \(\sigma _{ci}\) were estimated according to field observations (the rock, phyllite, is disintegrated with a highly weathered surface), using Marinos and Hoek (2000) as guidance. The values for the G-H–B material constant, m\(_i\), and \(\gamma\) were estimated based on expected ranges for the rock type (weathered phyllite) taken from the literature (\(\gamma\) from Fine (2021) and \(m_i\) from Marinos and Hoek (2000)).

Table 3 Most likely, upper and lower limit estimates for values of G-H–B parameters and unit weight (\(\gamma\)) for a slope along the Narayanghat-Mugling road based on geological and geomorphological field observations

The Disturbance Factor (D) in the G-H–B criterion accounts for the blast damage and stress relaxation of a rock mass. Despite this cutting being excavated by historical blasting, which is likely to have caused disturbance, D was taken as zero across the slope for the sake of simplicity. This choice is discussed further in Sect. 5.1.

Daily rainfall data spanning an 11-year period from January 2010 to December 2020, recorded at the Sakhar meteorological station in the Tanahu district (Gandaki Province), were acquired from the Government of Nepal Department of Hydrology and Meteorology (Fig. 8). Sakhar meteorological station is around 13 km from the research site and lies around 60 m higher than the road at the research site. It should also be noted that no data were available from the station for 2013; the missing data were therefore replaced by repeating the 2014 record to enable continuous simulation, since a break in the simulation would have necessitated re-defining the model initial conditions.

Fig. 8

Daily rainfall (m/s) from 2010–2020 recorded at Sakhar meteorological station in the district of Tanahu acquired from the Government of Nepal Department of Hydrology and Meteorology. Data for 2013 were missing and have been replaced with a copy of the data from 2012 (highlighted in orange) (Color figure online)

The application of our methodology to this case study is discussed in the next section (Sect. 3).

3 Site-Specific Methodology

3.1 Site-Specific Probabilistic Slope Stability Model

2D stability analyses were performed using M-P LEM in Rocscience, Slide2. Literature-based estimates for most likely, upper limit and lower limit values of each parameter (outlined in Sect. 2.4) were input to the model, and the STDEV in output FoS was computed for each parameter tested. For \(\gamma\), the STDEV was found to be negligible at 0.01, and, therefore, a single value of \(\gamma\) was employed. The STDEV of the FoS for GSI, m\(_i\) and \(\sigma _{ci}\) were found to be 0.5, 0.1 and 0.1, respectively.

Given that the model exhibits high sensitivity to GSI compared with the other parameters (Pandit et al. 2019; Chen and Lin 2019; Hoek 1998; Pan et al. 2017), GSI was varied as part of the MCS, whereas \(\sigma _{ci}\) and m\(_i\) were kept constant.

The rock mass of the cutting examined here was identified through geological observations to be phyllite. Field observations at the cutting suggested that GSI ranges from 15 to 30, using the GSI charts of Marinos and Hoek (2000) as guidance. These values were used to parameterise a lognormal distribution, assuming that the lower and upper limits correspond to the 1st and 99th percentiles of the distribution, respectively. This choice reflects the expected unimodal form of the distribution and allows for the occasional occurrence of values outside the typical range (the 1st and 99th percentiles were chosen so that only a very small proportion of values fall outside this range). The lognormal distribution of GSI is shown in Fig. 9.
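As a worked illustration of this parameterisation (our own arithmetic from the stated limits, not a value reported elsewhere in the paper), taking GSI = 15 and GSI = 30 as the 1st and 99th percentiles and \(\Phi^{-1}(0.99) \approx 2.326\) as the corresponding standard normal quantile gives the mean and standard deviation of \(\ln(\mathrm{GSI})\) as

$$\begin{aligned} \mu &= \tfrac{1}{2}\left(\ln 15 + \ln 30\right) \approx 3.055, \qquad \sigma = \frac{\ln 30 - \ln 15}{2\,\Phi^{-1}(0.99)} = \frac{\ln 2}{2 \times 2.326} \approx 0.149 \end{aligned}$$

so that the median GSI of the distribution is \(e^{\mu} = \sqrt{15 \times 30} \approx 21.2\).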

Fig. 9

Lognormal distribution of the Geological Strength Index, parameterised by its 1st and 99th percentiles, taken as the lower and upper limits of the reasonable range for the cutting made up of phyllite

Deterministic stability analyses were performed using the M-P 2D LEM in Rocscience, Slide2. A convergence test was first carried out, which showed that 50 slices is the optimal number for the LEM slope stability analyses. The external boundaries of the model were chosen such that they have no effect on the resulting FoS. A convergence analysis showed that the optimal number of realisations of GSI for the MCS is 1000 (i.e. \(N_r=1000\)); further realisations result in <1% change in the output of the model. See Table 4 for the parameter values used in each MCS analysis.

Table 4 Table displaying values for Generalised Hoek–Brown parameter (intact rock parameters, m\(_i\), unconfined compressive strength, \(\sigma _{ci}\), and disturbance, D) and unit weight (\(\gamma\)) used in each deterministic analysis

FE steady-state seepage analyses were performed in Slide2, solving Darcy’s equation together with the continuity equation. The mesh for the FE analyses was made up of 10,000 six-noded triangles (optimised based on convergence testing). Total head Z at the upslope boundary ranges from 92.48 m above river level (when the phreatic surface is at the ground surface) to 45 m above river level (below which the phreatic surface does not influence the failure surface and thus the FoS). Seepage analyses were performed for 25 total head values (\(N_z=25\)), each with a different Z, equally spaced from 45 to 92.48 m (see Fig. 10 for the model set-up).

Fig. 10

Stability analysis model set-up in Rocscience, Slide2. The cutting is 25 m in height, inclined at 70\(^\circ\). The topography upslope of the cutting is inclined at 25\(^\circ\). Total head boundary conditions (Z), equally spaced between 45 and 92.48 m, are imposed at the ridge side of the slope to generate phreatic surfaces

One thousand deterministic slope stability analyses (varying the GSI according to the lognormal distribution) were carried out for each of twenty-five phreatic surface scenarios (i.e. 1000 \(\times\) 25 LEM stability analyses were conducted). The predominant failure mechanisms observed in these analyses were shallow in terms of aspect ratio and constrained to the cut slope itself. Figure 11 displays probability density functions of the 1000 factors of safety determined through these deterministic slope stability analyses for every phreatic surface. As expected, as Z increases the distribution of FoS shifts towards lower values and the fraction of runs with FoS < 1 (i.e. the conditional probability of failure for that phreatic surface scenario) increases.

Fig. 11

Probability density functions of the 1000 factor of safety values determined for the cutting for 25 phreatic surface scenarios. Phreatic surfaces are determined by carrying out Finite Element (FE) seepage analyses with a total head boundary condition (Z) on the ridge side of the model varying from 45 m to 92.48 m

3.2 Site-Specific Hillslope Hydrological Model

The cutting examined here is made up of weathered phyllite. Phyllite is a crystalline, foliated, low-grade metamorphic rock. Such rocks generally have comparatively high porosity but low permeability (Singhal and Gupta 2010). Values of k for fractured crystalline metamorphic rock were taken from Singhal and Gupta (2010) (\(10^{-9}\) to \(10^{-5}\) m/s) and applied in the model to predict time-varying phreatic surface fluctuations. We observed no evidence of surface runoff at this cutting and inferred that surface saturation is rare within or upslope of the cutting. Therefore, we imposed a rule that the 1st percentile value of k cannot result in a phreatic surface at ground level for a sustained period (more than a day). This considerably increased the 1st percentile of k from the minimum value found in the literature (\(10^{-9}\) m/s) to \(1.7\times 10^{-6}\) m/s, which is perhaps indicative of a relatively high degree of weathering on the slope. The 99th percentile is set to the maximum value of k for metamorphosed crystalline rock (\(1\times 10^{-5}\) m/s). The 1st and 99th percentiles are used to characterise a lognormal distribution of k, from which two thousand realisations of k (\(N_k=2000\)) are generated; a convergence analysis showed that 2000 is the minimum number of realisations needed in the model.

The value of \(n_f\) used in the model was taken from an average of fractured crystalline metamorphic rocks (Singhal and Gupta 2010). The catchment area upslope of the cutting is defined using the Shuttle Radar Topography Mission (SRTM) DEM of 1-arc second resolution in ArcGIS. The slope width is measured at three places along the slope length (at x = 0 plus two random locations further towards the ridge, see Fig. 3) to estimate \(\beta\) (by taking the best-fitting exponential of these three widths).

Based on the observation that there is no evidence of the water table daylighting at the ground surface above the cutting on the hillslope, we infer that the impermeable boundary lies below the ground surface along the full length of the hillslope. Based on the literature, we assume that the inclination of the impermeable boundary tapers with distance towards the ridge (Anderson et al. 2019; Grant and Dietrich 2017; Flinchum et al. 2018; Clair et al. 2015; Anderson et al. 2013; Medwedeff et al. 2022). Accordingly, the impermeable boundary is assumed to be inclined at 15\(^\circ\), 10\(^\circ\) lower than the ground surface. The slope profile is composed of a planar 25\(^\circ\) slope with a steeper 70\(^\circ\) cutting near its toe. The impermeable boundary is configured such that it intersects the ground surface at the river, resulting in a permeable layer thickness of c. 51.28 m at the upslope boundary of the Slide2 model. The key input parameters for the HSB model for this case study are outlined in Table 5.

Table 5 Case study input parameters for the HSB model

The rainfall time series is shown in Fig. 8. This is an 11-year record of daily rainfall. Given that we have no information on the phreatic surface, the initial condition of the phreatic surface was set to a height of zero. To minimise the influence that this assumption could have on the model, the model was spun up using a duplication of the 11-year rainfall record. We believe that this duration (an additional 11 years) is sufficient for the subsurface flux to avoid the effects of the conditions that occurred before our analysis.

MATLAB was used to run the HSB model according to the method outlined in Sect. 2.2 for 2000 realisations of k taken from the lognormal distribution, giving 2000 phreatic surface time series. Figure 12 displays the model output over 11 years for 100 realisations of k spanning the minimum to the maximum k in the distribution. The lower the value of k, the higher the phreatic surface. The effect of the annual variation in rainfall can be seen in this figure: the phreatic surface in the slope rises during the monsoon season and falls again during the dry season.

Fig. 12

One hundred phreatic surface time series derived using the HSB model based on one hundred realisations of k. 50 m represents the ground surface and 0 m is the impermeable boundary

Figure 13 displays probability density distributions of the phreatic surfaces (across all realisations of k) at four time steps to showcase the variation in trends: (1) time = 0.74 years (peak, higher peak values); (2) time = 1.33 years (trough, higher peak values); (3) time = 6.80 years (peak, lower peak values); and (4) time = 7.46 years (trough, lower peak values). As expected, the spread of the phreatic surfaces is much wider at the peaks than at the troughs. In addition, when the phreatic surface peaks at its highest values, the spread of values is wider (as it is in the trough that follows the high peak), whereas when the peak is lower, the spread of values is more constrained (as it is in the trough that follows the low peak).

Fig. 13

Probability density distributions of phreatic surface height above the impermeable surface (across all values of k) at four time steps: (1) time = 0.74 years (peak, higher peak values); (2) time = 1.33 years (trough, higher peak values); (3) time = 6.80 years (peak, lower peak values) and (4) 7.46 years (trough, lower peak values). The peaks are shown in a solid line, whereas the troughs are in a dashed line

3.3 Site-Specific Model Coupling

To couple the outputs from the probabilistic (MCS) and hillslope hydrological (HSB) models, the phreatic surface time series output from the HSB model were discretised according to the 25 Z values (\(N_z= 25\)) that were used to generate the phreatic surfaces for the MCS. Prior to doing so, a reference frame conversion is made by dividing the phreatic surface from the HSB model by the cosine of the inclination of the impermeable boundary (15\(^\circ\)). By discretising each phreatic surface time series according to Z, a unique FoS time series is generated for each pair of GSI and k realisations, giving 1000 \(\times\) 2000 (\(N_r \times N_k\)) FoS time series. The FoS time series for each realisation is converted to a binary ‘failure’ time series, with failure for FoS < 1 and stability otherwise.

The time-varying conditional probability of failure (P(f|T)) was determined for each time step by dividing the number of failures (where FoS < 1) at that time step across all FoS time series by the product of the number of realisations of GSI (\(N_r= 1000\)) and the number of realisations of k (\(N_k= 2000\)), according to Eq. 3.

To determine a frequency of failure (\(F_f\)), each ‘failure’ time series was worked through chronologically; whenever failure occurred, the failure count for that time series was incremented and the failure time series was zeroed for 90 remediation days, preventing further failures from being counted within that 90-day window. The remediation time was assumed to be 90 days for Nepal because the monsoon season lasts 3 to 4 months and work cannot be carried out during this time. The total number of landslides was then summed across all ‘failure’ time series and normalised by \(N_r \times N_k\) (1000 GSI realisations \(\times\) 2000 realisations of k) to obtain a number of landslides per 11 years (according to Eq. 4).

4 Results

4.1 Time-Varying Conditional Probability of Failure

Figure 14 displays the time-varying conditional probability of failure determined by following the method outlined in Sect. 2.3.1. The figure exhibits similar general trends to the phreatic surface time series plot (Fig. 12). Figure 12 displays annual cyclicity, with peaks in the phreatic surfaces coinciding with the monsoon season and troughs in the dry season. The same trend can also be observed in Fig. 14, with peaks in P(f|T) during the monsoon season and significant dips during the dry season. In Fig. 12, the peaks of the phreatic surfaces during the monsoon seasons of the first 2 years are the highest; the peaks then decline every year to the lowest in the 7th year. After the 7th year, the monsoon peaks start to increase, drop off again in the 10th year, and then increase sharply in the 11th year.

Fig. 14

Time-varying conditional probability of failure (P(f|T)) over 11 years for the case study site. P(f|T) is calculated for every time step by dividing the total number of failures across all ‘failure’ time series by the product of the number of GSI realisations and the number of realisations of hydraulic conductivity

4.2 Frequency of Failure

By summing all the landslides from each FoS time series and normalising by the number of k realisations (\(N_k= 2000\)) and the number of GSI realisations (\(N_r= 1000\)), it is estimated that 5.10 failures of this cutting will occur every 11 years.

As shown by Ross (1972), the rate of landslide occurrence (\(\lambda\)) can be estimated as:

$$\begin{aligned} \lambda =\frac{N(t^*)}{t^*} \end{aligned}$$
(5)

where \(N(t^*)\) is the number of failures in a time period \(t^*\). In this case, \(\lambda =5.10/11=0.46\), meaning that failure is expected at a rate of 0.46 per year (annual \(F_f\)), i.e. approximately one failure every other year. Informal comments from consultants on the Narayanghat-Mugling road suggest that failures at this frequency have been observed in the past in the area. Because it is based on an 11-year rainfall time series, the methodology assumes that this \(F_f\) remains valid for future years (e.g. in 20 years’ time).

The probability of failure can be determined from the frequency of failure using a Poisson model, which is a continuous-time model for the occurrence of random point events, in this case landslides (Crovelli 2000). According to the Poisson model, the probability of one or more landslides during a time t is:

$$\begin{aligned} P \{ N(t)\ge 1 \} = 1-\exp(-\lambda t) \end{aligned}$$
(6)

Thereby, the annual probability of failure for this cutting is estimated to be 0.37 (\(1-\exp(-0.46 \times 1)=0.37\)). The probability of one or more cutting failures occurring in 11 years is 0.99 (\(1-\exp(-0.46 \times 11)=0.99\)), and the probability of one or more cutting failures occurring in 100 years is close to one (\(1-\exp(-0.46 \times 100)\approx 1\)).
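The arithmetic of Eqs. 5 and 6 can be reproduced with a few lines of Python (function names are ours, for illustration). Using the unrounded rate \(\lambda = 5.10/11 \approx 0.464\) gives the same probabilities to two decimal places:

```python
import math


def failure_rate(n_failures, period_years):
    """Eq. (5): lambda = N(t*) / t*, in failures per year."""
    return n_failures / period_years


def prob_one_or_more(rate, t_years):
    """Eq. (6): P{N(t) >= 1} = 1 - exp(-lambda t) for a Poisson process."""
    return 1.0 - math.exp(-rate * t_years)


lam = failure_rate(5.10, 11)
for t in (1, 11, 100):
    print(t, round(prob_one_or_more(lam, t), 2))
# prints: "1 0.37", then "11 0.99", then "100 1.0"
```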

Figure 15 displays a histogram of the number of slope failures over 11 years, grouped into 100 bins of hydraulic conductivity values. The number of failures was normalised by the number of realisations (\(N_r= 1000\)) and the size of the bin (40 realisations of k). This figure shows that the lower the k of the slope, the greater the \(F_f\).

Fig. 15

Histogram of the number of landslides in 11 years for the case study site binned according to the discretised realisations of hydraulic conductivity. There are 40 realisations of k in each bin

Figure 16 displays a histogram of the number of slope failures per 11 years, grouped into 100 bins of GSI. The number of failures was normalised by the number of realisations (\(N_r= 1000\)) and the size of the bin (40 realisations of GSI). This figure shows that, as would be expected, the lower the GSI of the slope, the greater the \(F_f\).

Fig. 16

Histogram of the number of landslides in 11 years for the case study site binned according to realisations of the Geological Strength Index (GSI). There are 40 realisations of GSI in each bin

5 Discussion

Following the methodology presented in this paper, the time-varying conditional probability of failure and frequency of failure of a slope cutting triggered by rainfall can be estimated. In this section, the uncertainties of the method and the sensitivity of the model to key assumptions made are discussed.

5.1 Uncertainties

The HSB model contains a large number of assumptions that enable a fast and simple approximation of the RE. Paniconi et al. (2003) compared the Hillslope-Storage Boussinesq equation with the RE for various hillslope geometries. They found that the two models match more closely for convergent than for divergent hillslopes, and under drainage rather than recharge conditions. They report “remarkably good matches of the diversity of shapes, including peaks and spreads, that characterise the storage and outflow dynamics of the different hillslopes” (Paniconi et al. 2003, p. 9). The difference in outcomes for the recharge scenarios is attributed to the role of the unsaturated zone in the RE, which slows the vertical transmission of rainfall through the hillslope soil. Because of the unsaturated zone, the storage profile simulated by the RE is lower than that from the HSB model. This relationship is exhibited in the hydrographs they present: there is a closer match between the HSB and RE hydrographs for a convergent slope than for a divergent slope, as the convergent slope drains more slowly and remains more saturated, so the unsaturated zone plays a less important role. Conversely, the HSB and RE hydrographs for a divergent hillslope exhibit greater differences, because divergent slopes drain faster and the unsaturated zone therefore plays a more important role. Thus, whilst the assumptions of the HSB model are known to introduce some additional model uncertainty, they are necessary to render the approach tractable, and this additional uncertainty is small in the context of the very large uncertainty in material properties (e.g. permeability, porosity, and geometry of the impermeable layer) for hillslopes in general (this cutting is no exception).

The uncertainties in the rock parameters used in the LEM analysis are partially accounted for by carrying out probabilistic stability analyses; they are only partially accounted for because the parameter values used to define the probabilistic analysis are themselves uncertain. Local spatial variability in the cutting is neglected, which amounts to assuming an infinite correlation length.
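To make the infinite-correlation-length assumption concrete, the sketch below runs a Monte Carlo estimate of P(f) in which each realisation draws a single value of each strength parameter and applies it to the whole slope. It is not the paper's LEM/G-H–B workflow: it swaps in an infinite-slope Mohr–Coulomb factor of safety with a pore water pressure ratio \(r_u\), and every distribution, unit weight and geometry value in it is a hypothetical placeholder.

```python
import math
import random


def infinite_slope_fos(c_eff, phi_deg, gamma, depth, slope_deg, r_u):
    """Infinite-slope factor of safety (Mohr-Coulomb, pore pressure ratio r_u)."""
    beta = math.radians(slope_deg)
    cohesion_term = c_eff / (gamma * depth * math.sin(beta) * math.cos(beta))
    friction_term = (1.0 - r_u / math.cos(beta) ** 2) * math.tan(math.radians(phi_deg)) / math.tan(beta)
    return cohesion_term + friction_term


def monte_carlo_pf(r_u, n_realisations=5000, seed=42):
    """Fraction of realisations with FoS < 1.

    Infinite correlation length: each realisation uses ONE value of c' and phi'
    for the entire slope, so no spatial variability is represented.
    """
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_realisations):
        c_eff = rng.lognormvariate(math.log(20.0), 0.3)  # kPa, hypothetical
        phi_deg = rng.normalvariate(35.0, 3.0)           # degrees, hypothetical
        fos = infinite_slope_fos(c_eff, phi_deg, gamma=24.0, depth=3.0,
                                 slope_deg=45.0, r_u=r_u)  # hypothetical geometry
        if fos < 1.0:
            failures += 1
    return failures / n_realisations


pf_dry = monte_carlo_pf(r_u=0.0)
pf_wet = monte_carlo_pf(r_u=0.3)
```

Raising \(r_u\) (a crude stand-in for a higher phreatic surface) shifts the whole FoS distribution downwards, which is the mechanism through which the hydrological model drives P(f) in the coupled approach.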

As previously stated, the predominant failure mechanisms observed in the LEM stability analyses were shallow (in terms of aspect ratio) and confined to the cut slope itself. Shallow failure of the cut slope is commonly observed along this road, as well as along several other roads in Nepal, owing to over-steepened cut slopes.

The Narayanghat-Mugling road was excavated by blasting when it was first constructed. However, the disturbance caused by blasting is not accounted for in this study. This simplification keeps the slope model simple, as the case study serves to demonstrate the methodology presented in this paper. In addition, there are no data on the intensity and extent of the damage caused by blasting. If the blasting was carried out in an uncontrolled manner, the disturbance towards the face of the cutting may be considerable, which could significantly reduce the stability of the cutting.

5.2 Probability of Failure

The similar trends in the phreatic surface time series (Fig. 12) and the P(f|T) time series (Fig. 14) highlight the influence of the phreatic surface level on P(f|T). The phreatic level peaks in the first two monsoon seasons because the rainfall used for the spin-up year preceding the simulation (a copy of the daily rainfall for 2020; see Fig. 8) is very high. This finding suggests that the key driver of failure is not the rainfall on any single day but prolonged heavy rain. It is therefore important, for an area with a monsoon season, to account for the temporal variability of rainfall using a time-dependent system (e.g. a rainfall time series) rather than a time-independent one (e.g. rainfall events drawn from an IDF curve).
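The memory effect behind this conclusion can be illustrated with a deliberately crude stand-in for the HSB model: a single linear reservoir whose storage (a loose proxy for the phreatic level) drains at a rate proportional to the storage itself. The 30-day residence time and both rainfall scenarios are hypothetical. Even though its maximum daily rainfall is much lower, the prolonged moderate rain produces a higher peak storage than the single intense day.

```python
def linear_reservoir(rain_mm, residence_days=30.0):
    """Toy storage model: S[t+1] = S[t] + R[t] - S[t]/tau (daily explicit step).

    Storage integrates past rainfall, so it retains a 'memory' of prolonged wet spells.
    """
    storage, series = 0.0, []
    for r in rain_mm:
        storage = storage + r - storage / residence_days
        series.append(storage)
    return series


# Hypothetical 60-day scenarios
single_storm = [200.0] + [0.0] * 59   # one very intense day
prolonged_rain = [15.0] * 60          # moderate rain every day

peak_storm = max(linear_reservoir(single_storm))
peak_prolonged = max(linear_reservoir(prolonged_rain))
```

An event-based input (one design storm from an IDF curve) cannot represent this accumulation, whereas a rainfall time series captures it naturally.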

The annual cyclicity and variability of P(f|T) observed in Fig. 14 also suggest that an annual probability may not be representative for an area that has a monsoon season and a hydrological system with long-term memory. If a single instantaneous failure probability is used in stability analysis and design, the results could be dramatically over-conservative or under-conservative, depending on when it is evaluated. For example, we estimated an annual probability of failure of 0.37 for this cutting. Compared with the time-varying probability of failure (Fig. 14), this value grossly overestimates the probability of failure of this cutting for the majority of the 11 years analysed.

As discussed, Fig. 14 shows how P(f|T) varies over time and how it reflects fluctuations in the phreatic surface time series. However, it is difficult to estimate a rate of slope failure occurrence from this plot, because the time-varying conditional probability is an instantaneous value that is not defined over a time window and, as discussed above, a single rate of occurrence may not represent the variability it displays. It is also not straightforward to use a time-varying P(f|T) to inform slope stability design. Moreover, no slope design standards currently express threshold values in terms of failure probability. By contrast, \(F_f\) is a much more intelligible value, which could be highly beneficial for stability design. For example, \(F_f\) can be used in a cost–benefit analysis in which stabilisation measures are compared according to their \(F_f\) value and cost.

5.3 Assumptions

A significant assumption in the HSB model is that flow is oriented parallel to the bed slope. This differs from the RE model, in which flow can be resolved in any direction and the flow direction emerges from the solution (Paniconi et al. 2003). Another assumption of the HSB model is that it does not account for vertical infiltration, meaning that the water flux at the toe of the slope will be higher than in a model that accounts for vertical infiltration (e.g. the RE) (Paniconi et al. 2003).
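For reference, the storage form of the Boussinesq equation on a sloping bed used in hillslope-storage models can be written as follows (after the general form of Troch et al. 2003; the notation here is ours and may differ from theirs). Here \(x\) is the distance along the bed, increasing upslope from the outlet, \(w(x)\) the hillslope width, \(f\) the effective (drainable) porosity, \(\bar{h}\) the water-table height above the bed, \(i\) the bed inclination, \(k\) the hydraulic conductivity and \(N\) the recharge rate. Both assumptions discussed above are visible: the flux \(Q\) has only an along-bed component, and recharge enters the storage directly, with no unsaturated zone to delay it.

\[
S(x,t) = f\,w(x)\,\bar{h}(x,t), \qquad
Q(x,t) = -\frac{k\,S}{f}\left(\cos i\,\frac{\partial \bar{h}}{\partial x} + \sin i\right), \qquad
\frac{\partial S}{\partial t} = -\frac{\partial Q}{\partial x} + N\,w .
\]

Substituting \(\bar{h} = S/(f\,w)\) into the flux expression yields a single equation in the storage \(S\).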

The model’s sensitivity to the remediation time, i.e. the time required to return the slope to its pre-failure condition (during which FoS < 1 does not result in a countable landslide), was tested using values of 7–365 days (Table 6). The number of failures declines steeply as the remediation time increases from 7 to 90 days and then plateaus. This trend shows that shorter remediation times can considerably increase the estimated frequency of failure. It also shows that the cutting used to demonstrate this methodology is very unstable: if remediation simply returns it to its pre-failure condition, it will quickly fail again, which indicates that slope stabilisation measures are necessary. Although \(F_f\) is strongly sensitive to the remediation time, this assumption can be refined by a practitioner working on a project, who is likely to have a good knowledge of the expected number of days until remediation occurs. It is also assumed that each remediation following a failure reinstates the cutting to its pre-failure condition. Although the slope is unlikely to be returned to exactly the same condition, the reinstated condition may not be very different.
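The counting rule can be sketched as follows for a single daily FoS series, under our assumption that each counted failure opens a remediation window of fixed length starting on the failure day. How the paper aggregates counts across realisations of k is not reproduced here, and the FoS series is hypothetical.

```python
def count_failures(fos_series, remediation_days):
    """Count landslides in a daily FoS series.

    A day with FoS < 1 is counted as a new failure only if it falls outside the
    remediation window opened by the previous counted failure; during that
    window the slope is assumed to be under repair, so FoS < 1 is not counted.
    """
    failures = 0
    next_countable_day = 0
    for day, fos in enumerate(fos_series):
        if fos < 1.0 and day >= next_countable_day:
            failures += 1
            next_countable_day = day + remediation_days
    return failures


def frequency_of_failure(fos_series, remediation_days, years):
    """Counted failures per year over the simulated period."""
    return count_failures(fos_series, remediation_days) / years


# Hypothetical daily FoS series containing two wet spells
fos = [1.2] * 10 + [0.9] * 5 + [1.1] * 20 + [0.95] + [1.3] * 10
```

With a 7-day window this series yields two failures; with a 30-day window the second wet spell falls inside the repair period and only one failure is counted, the same direction of change as the decline reported in Table 6.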

Table 6 Sensitivity test on how the number of remediation days (days after a landslide) affects the frequency of slope failures over 11 years

The 1st and 99th percentiles of the lognormal distribution of k were initially defined from values of k reported for fractured crystalline metamorphic rock by Singhal and Gupta (2010) (\(10^{-9}\) to \(10^{-5}\) m/s). The value of the 1st percentile was then adjusted using predictions from the HSB model, evaluated against the observational constraint that the cutting showed no evidence of overflow and, therefore, should not experience prolonged periods of surface saturation. For the 99th percentile there is no observational constraint, so it was set to the upper end of the values found in the literature (\(k=1\times 10^{-5}\) m/s). The sensitivity of the model to this value was tested (Table 7) using values from \(6\times 10^{-6}\) to \(2.2\times 10^{-5}\) m/s in increments of \(4\times 10^{-6}\) m/s. Values lower than \(6\times 10^{-6}\) m/s were not tested, as these would be too close to the 1st percentile (\(1.7\times 10^{-6}\) m/s), and values higher than \(2.2\times 10^{-5}\) m/s were not tested, as the phreatic surface was levelling out at the impermeable boundary. The test shows that the greater the value of the 99th percentile of k, the lower the \(F_f\): \(F_f\) falls steeply from \(6\times 10^{-6}\) to \(1.4\times 10^{-5}\) m/s, with little change thereafter. The range of k reported in the literature for fractured crystalline rock is wide (\(10^{-9}\) to \(10^{-5}\) m/s), but the observational constraints reduce the plausible range for the cutting to \(1.7\times 10^{-6}\) to \(10^{-5}\) m/s. Given that this range lies at the upper end for fractured crystalline rock, it can be inferred that the cutting is highly fractured.
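Defining a lognormal distribution by two percentiles, as described above, reduces to solving two linear equations in log space. The sketch below uses Python's standard-library `statistics.NormalDist` with the 1st and 99th percentiles quoted in the text; the sample size of 2000 echoes the number of k values mentioned later in this section, and the random seed is arbitrary.

```python
import math
from statistics import NormalDist


def lognormal_from_percentiles(k_p01, k_p99):
    """Parameters (mu, sigma) of ln(k) whose 1st and 99th percentiles are k_p01 and k_p99."""
    z99 = NormalDist().inv_cdf(0.99)          # ~2.326
    ln_lo, ln_hi = math.log(k_p01), math.log(k_p99)
    mu = 0.5 * (ln_lo + ln_hi)                # percentiles are symmetric about mu in log space
    sigma = (ln_hi - ln_lo) / (2.0 * z99)
    return mu, sigma


# Percentiles quoted in the text (m/s)
mu, sigma = lognormal_from_percentiles(1.7e-6, 1.0e-5)
ln_k = NormalDist(mu, sigma)
median_k = math.exp(mu)                       # geometric mean of the two percentiles
k_samples = [math.exp(x) for x in ln_k.samples(2000, seed=1)]
```

Because the two percentiles are symmetric about the median in log space, changing only the 99th percentile (as in Table 7) also shifts the median and widens the distribution.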

Table 7 Sensitivity of the frequency of failure (\(F_f\)) model to the 99th percentile of the lognormal distribution defining hydraulic conductivity

The HSB model assumes that the variation in k with depth can be approximated as a step function, with uniform-k material above an impermeable boundary. This is clearly an approximation, but it is underpinned by theoretical and observational studies of the permeable layer sometimes referred to as the critical zone (the weathered mantle between fresh bedrock and the atmosphere) (Anderson et al. 2019; Grant and Dietrich 2017; Flinchum et al. 2018; Clair et al. 2015). Many of these studies note a decline in permeability, fracture density or openness, and/or degree of weathering with depth, and also note that this decline is typically nonlinear (with much of the permeability concentrated near the surface; e.g. Jiang et al. 2009; Ameli et al. 2016) and/or characterised by a sharp boundary between disturbed and fresh bedrock (Clair et al. 2015).
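One way to see why a step function can stand in for a nonlinear decline is to compare the total lateral transmissivity of the two profiles. Assuming, for illustration only, an exponential decline \(k(z) = k_0 e^{-z/m}\) with depth \(z\) below the surface, decay length \(m\) and fresh bedrock at depth \(D\), and writing \(B_p\) for the thickness of the uniform permeable layer in the step-function model:

\[
T_{\mathrm{exp}} = \int_{0}^{D} k_0\, e^{-z/m}\,\mathrm{d}z = k_0\, m\left(1 - e^{-D/m}\right), \qquad
T_{\mathrm{step}} = k\, B_p .
\]

Choosing \(k\,B_p = k_0\, m\,(1 - e^{-D/m})\) makes the two fully saturated profiles transmit the same lateral flow under the same hydraulic gradient; when \(D \gg m\) this reduces to \(k\,B_p \approx k_0\, m\). Matching transmissivity does not reproduce the vertical distribution of flow, which is why the step function remains an approximation.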

Hydraulic conductivity and porosity are linked, as both are influenced by the lithology and the density of fractures in the rock, so their values in a model should be correlated. However, we did not correlate these values, and the porosity in the HSB model was held constant for simplicity, to demonstrate the overall methodology. The porosity used in the HSB model was estimated as the average of the range of porosity values for fractured crystalline metamorphic rock (5–10%) given by Singhal and Gupta (2010). The sensitivity of the model to porosity was tested, and the results are displayed in Table 8. The higher the porosity, the lower the frequency of failure. If the porosity of the cutting were actually at the lower end of the range for fractured crystalline metamorphic rock, the model output would remain the same. Conversely, if the porosity were at the higher end, the model would overestimate the frequency of failure by 12%. These differences are relatively small compared with the other uncertainties examined here, suggesting that the model is not very sensitive to porosity for this rock type and that it is therefore acceptable to select a deterministic value for this parameter in this case. The range of porosity values for most rock types generally varies by 5–15% (with the exception of basalt and highly weathered crystalline rock) (Singhal and Gupta 2010). With this in mind, the model is likely to be insensitive to porosity across the range of variability found in most rocks, and it is therefore broadly acceptable to neglect the spatial variability of porosity.
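The direction of this effect follows from a simple volume balance: the same volume of water stored in a layer with a higher effective porosity raises the water table less, so the slope experiences lower pore pressures. The sketch below ignores drainage and uses a hypothetical 0.1 m of net recharge; 7.5% is the midpoint of the 5–10% range (the average adopted above).

```python
def water_table_rise(stored_water_m, effective_porosity):
    """Rise in water table (m) when a depth of water is stored in the pore space.

    Simple volume balance that ignores drainage during the wet spell.
    """
    return stored_water_m / effective_porosity


# 0.1 m of net recharge (hypothetical) across the 5-10% porosity range
rises = {n: water_table_rise(0.1, n) for n in (0.05, 0.075, 0.10)}
```

Halving the porosity from 10% to 5% doubles the rise, from 1 m to 2 m; in the full HSB model drainage damps this effect, consistent with the modest sensitivity reported in Table 8.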

Table 8 Sensitivity test on how the effective porosity in the HSB affects frequency of slope failures (\(F_f\)) over 11 years

Given the lack of in situ hydrological data, we assumed, based on evidence from the literature (Clair et al. 2015), that the impermeable boundary is inclined less steeply than the ground surface, so that the permeable layer thickens towards the ridge. As with the boundary conditions imposed by Troch et al. (2003), the impermeable boundary at the river channel is fixed to the depth of the river channel. We tested the model’s sensitivity to the inclination of the impermeable boundary (see Table 9). Inclinations steeper than the ground surface were not tested, as there is no evidence of this subsurface architecture in the literature (Clair et al. 2015; Riebe et al. 2017; Anderson et al. 2013; Lebedeva and Brantley 2013; Flinchum et al. 2018; Hayes et al. 2019; Medwedeff et al. 2022). \(B_p\) (the thickness of the permeable layer) was varied with the layer inclination to ensure that the impermeable boundary intersects the ground surface at the river channel.
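The geometry described above can be sketched with two idealised planes, assuming a hypothetical planar hillslope in which the ground surface and the impermeable boundary meet at the river channel and thickness is measured vertically. It shows both effects at work in the sensitivity test: lowering the boundary inclination thickens the permeable layer upslope, while raising it lifts the boundary (and hence any water perched on it) further above the channel.

```python
import math


def boundary_elevation(x_m, boundary_deg):
    """Height (m) of the impermeable boundary above the channel at horizontal distance x_m upslope."""
    return x_m * math.tan(math.radians(boundary_deg))


def permeable_thickness(x_m, surface_deg, boundary_deg):
    """Vertical thickness (m) of the permeable layer at horizontal distance x_m upslope.

    Ground surface and impermeable boundary are idealised as planes that
    intersect at the river channel (x_m = 0).
    """
    if boundary_deg > surface_deg:
        raise ValueError("boundaries steeper than the ground surface are not considered")
    surface = x_m * math.tan(math.radians(surface_deg))
    return surface - boundary_elevation(x_m, boundary_deg)
```

For example, 100 m upslope of the channel on a hypothetical 30° surface, a 20° boundary leaves about 21.3 m of permeable material and a 25° boundary about 11.1 m, while the boundary itself sits about 36.4 m and 46.6 m above the channel, respectively.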

\(F_f\) increases steeply as the inclination of the impermeable boundary increases, because a steeper impermeable boundary raises the phreatic surface, leading to greater instability. Increasing the gradient of the impermeable boundary introduces two competing drivers: (1) faster and, therefore, thinner lateral subsurface flow, which lowers the phreatic surface; and (2) a higher phreatic surface, because the phreatic surface is perched on the impermeable boundary (increasing the gradient of the boundary raises it above datum and thus raises the phreatic surface above datum). Our results (Table 9) indicate that the geometric effect of raising the boundary on which the subsurface flow is perched has a greater influence on the phreatic surface level than the increase in flow velocity caused by the steeper gradient. The outputs of this sensitivity analysis are slightly biased towards overstating the sensitivity, given the link between k and the inclination: different combinations of permeability and inclination of the permeable layer can lead to the same phreatic surface (Beven 1996). The 1st percentile of k was chosen based on the observation-driven constraint that it should not result in frequent surface saturation of the slope. This constraint will be violated by some of the realisations of k from the original distribution in scenarios where the bedrock inclination is increased. Conversely, the constraint will be over-conservative (and the 1st percentile of k overestimated) in scenarios where the bedrock inclination is reduced.

Table 9 Sensitivity test on how the inclination of the impermeable boundary in the HSB model affects the frequency of slope failures over 11 years

In summary, this analysis shows that the methodology is particularly sensitive to the number of remediation days and the 1st percentile of k (which is linked with the inclination of the impermeable boundary). Given that practitioners should be able to estimate the number of remediation days with confidence, the sensitivity to this input parameter is not a concern.

5.4 Use by Practitioners

Estimates of \(F_f\) can be very useful in cost–benefit analyses that compare different slope stabilisation measures: each measure incurs a different cost, and a corresponding value of \(F_f\) can be calculated for it.
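A minimal sketch of such a comparison is shown below, assuming an undiscounted design-life cost and entirely hypothetical costs and \(F_f\) values; the "do nothing" rate of 0.5 failures per year simply mirrors the "fail every other year" result reported for the current cutting. A real appraisal would likely also need discounting, indirect losses and the uncertainty in \(F_f\).

```python
from dataclasses import dataclass


@dataclass
class StabilisationOption:
    name: str
    capital_cost: float        # up-front cost of the measure (hypothetical units)
    f_f: float                 # frequency of failure with the measure in place (failures/year)
    cost_per_failure: float    # clean-up, repair and disruption cost per failure


def expected_total_cost(option, design_life_years):
    """Undiscounted capital cost plus expected failure costs over the design life."""
    expected_failures = option.f_f * design_life_years
    return option.capital_cost + expected_failures * option.cost_per_failure


options = [
    StabilisationOption("do nothing", 0.0, 0.5, 50_000.0),
    StabilisationOption("regrade cut slope", 150_000.0, 0.1, 50_000.0),
    StabilisationOption("regrade + drainage", 400_000.0, 0.02, 50_000.0),
]
best = min(options, key=lambda o: expected_total_cost(o, design_life_years=20))
```

Under these assumed numbers the intermediate option has the lowest expected total cost (250,000 versus 500,000 for doing nothing and 420,000 for the most robust measure).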

Although this methodology is neither costly nor challenging in terms of ground investigation, the bulk deterministic analyses in LEM can be costly in computational time. To carry out the stability analyses (1000 analyses for each of the 25 phreatic surfaces), 10 virtual machines, each with 32 GB and 8 cores, were run in parallel. Alternatively, the stability analyses could be conducted using an analytical approach, with the probabilistic analysis carried out using a first-order second-moment (FOSM) method (e.g. Huang 2021). However, with an analytical approach, pore water pressure can only be included as a pore water pressure ratio (\(r_u\)), which could result in an unrealistic set of trigger scenarios. This approach would also require the G-H–B parameters to be converted to M-C parameters, which can be done using the approach of Renani and Martin (2020), who propose a new equation for converting parameters for slope stability analysis based on elastic stress analysis. By contrast, the hydrological model is very fast to run (running it for 2000 values of k takes less than five minutes).
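For comparison, a minimal FOSM sketch of the analytical alternative is shown below. It is not the formulation of Huang (2021): it swaps in an infinite-slope Mohr–Coulomb model in which pore pressure enters only through \(r_u\), assumes the M-C parameters have already been obtained (for example by a G-H–B conversion), treats the inputs as independent, and takes \(P(f) = \Phi(-\beta)\) with reliability index \(\beta\), which presumes a normally distributed FoS. All numbers are hypothetical.

```python
import math
from statistics import NormalDist


def infinite_slope_fos(c_eff, phi_deg, gamma, depth, slope_deg, r_u):
    """Infinite-slope factor of safety with Mohr-Coulomb strength and pore pressure ratio r_u."""
    beta = math.radians(slope_deg)
    cohesion_term = c_eff / (gamma * depth * math.sin(beta) * math.cos(beta))
    friction_term = (1.0 - r_u / math.cos(beta) ** 2) * math.tan(math.radians(phi_deg)) / math.tan(beta)
    return cohesion_term + friction_term


def fosm(fos_fn, means, stds, rel_step=1e-4):
    """First-order second-moment estimate for independent inputs.

    Returns (mean FoS, reliability index, P(f) = Phi(-reliability index)); the
    last step assumes FoS is normally distributed. Derivatives are taken by
    central finite differences at the mean point.
    """
    mean_fos = fos_fn(*means)
    variance = 0.0
    for i, (m, s) in enumerate(zip(means, stds)):
        h = rel_step * max(abs(m), 1.0)
        up, down = list(means), list(means)
        up[i], down[i] = m + h, m - h
        derivative = (fos_fn(*up) - fos_fn(*down)) / (2.0 * h)
        variance += (derivative * s) ** 2
    reliability_index = (mean_fos - 1.0) / math.sqrt(variance)
    return mean_fos, reliability_index, NormalDist().cdf(-reliability_index)


def make_fos_fn(r_u):
    # Hypothetical geometry: 3 m deep slip surface, 45 degree slope, unit weight 24 kN/m3
    return lambda c_eff, phi_deg: infinite_slope_fos(c_eff, phi_deg, 24.0, 3.0, 45.0, r_u)


# Hypothetical M-C parameters: c' in kPa, phi' in degrees
mean_fos, beta_r, pf_low = fosm(make_fos_fn(0.1), means=(20.0, 35.0), stds=(6.0, 2.0))
_, _, pf_high = fosm(make_fos_fn(0.3), means=(20.0, 35.0), stds=(6.0, 2.0))
```

Each evaluation is a closed-form expression, so the cost is negligible compared with repeated LEM runs; the price is the cruder representation of pore pressure noted above.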

6 Conclusion

We introduced a novel, computationally efficient methodology that couples probabilistic slope stability analyses with the Hillslope-Storage Boussinesq hydrological model to estimate the time-varying probability of failure and the frequency of failure of a cutting triggered by rainfall. Unlike many other studies, this method accounts for both the complexity of the failure mechanics and the hillslope hydrology. We aimed to further understand the implications of using a time-dependent system (a rainfall time series) to represent rainfall variability in an area with a monsoon season, and to further explore the implications of using an instantaneous conditional probability of failure. The mechanistic model is suitable for use in a LIC/LMIC setting. We demonstrated the methodology on a road cutting in a mountainous area of Nepal subject to markedly seasonal wet–dry weather. The frequency-of-failure analysis suggested that this cutting, in its current condition, will fail every other year. The time-varying conditional probability of failure showed annual cyclicity, peaking during the monsoon season, and reflected the trends displayed in the phreatic surface time series.

Our results point to the need to calculate the phreatic surface used in the seepage analyses of slope stability from a hydrogeological model, rather than postulating the phreatic groundwater level as is often done in current practice. This is for two reasons: (1) the phreatic level is highly dependent on the hydrogeology of the upslope area, which in turn is a function of rainfall and the geometry of the catchment; and (2) the phreatic level varies significantly over time. We therefore believe that a relatively simple model such as the 1D HSB model adopted here is a good choice, since it captures the key physical drivers of groundwater flow within the slope without being computationally expensive.

These findings indicate that it is important to use a time-dependent system to represent rainfall variability in slope failure probability analysis, rather than a time-independent system, e.g. one based on an Intensity–Duration–Frequency curve. Our sensitivity analyses show that the frequency of failure is very sensitive to the value chosen to represent the 99th percentile of k in the distribution, the inclination of the impermeable boundary, and the number of remediation days between landslide counts. The number of remediation days can be better constrained by knowledge of the remediation measures typically employed locally following a failure event. However, the hydraulic conductivity may be more difficult to constrain; care should therefore be taken in interpreting the results if the value of hydraulic conductivity is based on assumptions.