1 Introduction

For numerous basins in the Mediterranean semi-arid regions, mountains are a major ‘water tower’ for the surrounding plains [1, 2] where agriculture plays a crucial role in the economy. This is the case in the Moroccan High Atlas Mountains, which are considered as the headwaters for the surrounding plains where agricultural, industrial, and touristic activities consume more than 90% of the available water resources [3, 4]. Previous studies indicated that this area experienced pronounced drought events during the last decades [5, 6]. Additionally, a recent study has projected a significant trend toward drier conditions in this region towards the end of the century [7]. Air temperatures have also been continuously increasing since 1989 [8] and the demand for water is expanding due to the growth of population and socio-economic activities. As such, water resources in the region are threatened due to their vulnerability to climate change and the low storage capacity and absence of artificial reservoirs for water regularization in some basins [9]. Consequently, the need for effective water resources management in the headwater region of the Moroccan High Atlas region is an urgent priority.

It is commonly known within the water management community that increasing water management efficiency based on streamflow forecasts is critical for reducing the impact of water scarcity [10]. Streamflow is a key variable in the water balance and quantifying the amount of water supplied in the watershed could be a first step toward better water management [11], but it remains challenging. At the scale of the Tensift basin, several previous works have used conceptual, process-based hydrological models to simulate and predict runoff and resulting streamflow, such as the Soil and Water Assessment Tool model (SWAT) [9], the Snowmelt Runoff Model (SRM) [12] and the Génie Rural à 2 paramètres model (GR2M) [8, 13] in the Rheraya sub-basin in Tensift.

The High Atlas is characterized by a high spatial and temporal heterogeneity of hydrological processes [9, 14,15,16], data scarcity and complex runoff generation mechanisms, which makes the application of hydrological models for streamflow simulation a challenging task [3]. As such, Artificial Intelligence (AI)-based data-driven techniques could be a promising alternative to process-based models for hydrological simulations [17]. Various AI-based techniques have been widely used to forecast streamflow [18,19,20,21,22,23,24]. Their main objective is to minimize the error between the simulated and the target variable [25]. Artificial Neural Network (ANN) techniques have been commonly applied for runoff modelling and achieved good results [26,27,28]. [29] successfully utilized the multilayer perceptron (MLP) and the radial basis function neural network (RBFNN) to predict streamflow at multiple gauging stations in the agricultural Eucha watershed located in north-west Arkansas and north-east Oklahoma. In the ungauged Luvuvhu River in South Africa, [30] used RBFNN to perform 1-day forecasts of streamflow. Data scarcity in some developing countries can make the application of data-intensive models difficult, and [30] demonstrated that artificial neural networks remain effective for streamflow forecasting in this context. [31] demonstrated the effectiveness of ANNs for monthly streamflow prediction in poorly gauged catchments as well. In parallel, the Support Vector Machine (SVM) has been popularized as a new statistical learning method and verified to be a robust and efficient algorithm for both classification and regression [32]. SVM methods are based on mapping the input data set into a high-dimensional feature space to resolve classification problems and reproduce the relationship between the input variables and the target. [33] studied the use of SVM for daily rainfall-runoff modeling and found it to provide accurate prediction of streamflow.
Seasonal and hourly multi-scale streamflow prediction was conducted by [34] using the SVM method, which showed a promising performance. [35, 36] assessed SVM’s ability to forecast monthly streamflow. They concluded that SVM, when combined with techniques for input selection and hyper-parameter tuning, provides an effective streamflow prediction at a monthly step. In semi-arid environments, [37] examined three different data-driven methods for forecasting river flow: ANN, SVM and the Adaptive Neuro-Fuzzy Inference System (ANFIS). Their study showed that SVM performed better than ANN and ANFIS. The aforementioned studies showed that AI techniques can offer a viable alternative to process-based hydrological models due to their capability of handling the non-linear and non-stationary nature of the hydrological processes [38]. Furthermore, the integration of Geographic Information Systems (GIS) with Artificial Intelligence (AI) models has emerged as a promising approach in hydrological research, enabling improved spatial analysis and visualization capabilities. GIS serves as a robust instrument for evaluating and managing water resources by facilitating the spatial mapping of watershed characteristics, which supports AI-driven modeling efforts. For instance, [39] illustrated the effectiveness of GIS and remote sensing in assessing flood impact through morphometric parameters in the Niger Delta region. Additionally, the rise of hybrid AI models presents promising progress in capturing the intricate dynamics of hydrological systems. These models combine various AI techniques or incorporate AI with process-based models, potentially enhancing the precision of streamflow forecasts. [40] investigated an approach employing artificial neural networks combined with a hybrid technique utilizing wavelet transform to evaluate the water quality of the Tallo River in Indonesia, demonstrating the efficacy of hybrid AI approaches.
Similarly, [41] devised an enhanced multi-stage genetic programming model for streamflow prediction, emphasizing the potential of hybrid models in improving prediction accuracy and dependability compared to conventional models alone. These advancements in GIS integration and hybrid AI models illustrate the evolving landscape of hydrological modeling and open new pathways for precise, dependable, and comprehensive water resource management solutions. As AI algorithms become widely available, their flexibility and accessibility make them worth evaluating for simulating hydrological response.

Considering the main challenges related to water resources within the Moroccan context, this research aims to make a valuable contribution to the field of water management. Firstly, the objective is to understand how well machine learning (ML) techniques can simulate daily streamflow in the semi-arid and mountainous region of Tensift. Secondly, the goal is to select the optimal configuration that can serve as a foundation for future research to develop a hybrid modelling framework. Two ML techniques, Support Vector Regression (SVR) and Random Forest (RF), were selected for their robustness and effectiveness in addressing nonlinear relationships, and were subsequently compared against the conventional Multiple Linear Regression (MLR) model. The study focused on the Rheraya sub-basin given its unique climatic conditions and limited hydrological data, allowing for a deep exploration of machine learning’s potential in water management. Given the high temporal variability of the hydrological processes and resulting streamflow in this region, the temporal stability of these models was also assessed. The influence of the variable hydroclimatic conditions on the stability of the performance of the tested AI techniques was further analyzed and discussed.

2 Materials and methods

2.1 Study area

This study was conducted in the Rheraya sub-basin located in the High Atlas Mountains in Morocco. The watershed covers a surface area of 228 km\(^2\) and is characterized by a large altitudinal gradient ranging from 1060 m a.s.l. to 4167 m a.s.l. at the Jbel Toubkal summit, the highest summit of North Africa (Fig. 1). The hydro-climatic context is characterized by a strong heterogeneity both in space and time. The mean annual precipitation increases with elevation by an average of 166 mm km\(^{-1}\) [42]. The mean temperature lapse rate varies between \(-\) 4.39 and \(-\) 4.85\(^{\circ }\)C km\(^{-1}\) annually, from \(-\) 3.67 to \(-\) 5.21\(^{\circ }\)C km\(^{-1}\) monthly and from \(-\) 2.75 to \(-\) 7.1\(^{\circ }\)C km\(^{-1}\) daily [12, 42]. The Rheraya sub-basin is an appropriate site for this study due to its relatively dense network of measurement stations compared to other basins in Tensift, and its urgent need for water management.

Fig. 1

Map of the Tensift river basin and studied Rheraya sub-basin

2.2 Dataset

The datasets used in the present study are divided into three categories: (i) in-situ data, (ii) simulated data and (iii) satellite data, spanning a 13-year period from 1st September 2003 to 31st August 2016 (Table 1). The daily streamflow records between 2003 and 2016 were supplied by the Tensift hydraulic water agency (ABHT: https://www.eau-tensift.net). Figure 2 illustrates the seasonal variability of the daily observed streamflow and highlights the presence of several extreme values in each hydrological year. Meteorological data were provided by the Joint International Laboratory LMI-TREMA (Télédétection et Ressources en Eau en Méditerranée semi-Aride, lmi-trema.ma) [43]. Table 2 provides a comprehensive overview of the statistical values of streamflow and its predictors from 2003 to 2016. The predictors were collected from ten stations located around the watershed (Fig. 1). All meteorological series were subject to several pre-processing steps: (i) aggregating all variables to daily averages; (ii) deriving lapse rates of air temperature, relative humidity, and precipitation; and (iii) spatially distributing the daily variables over the whole Rheraya catchment by combining an altitudinal lapse rate and spatial interpolation of temperature and precipitation. Further details about these processing steps are provided by [44]. Snowmelt and snow water equivalent (SWE) were simulated by the classical temperature-index model (TI) using air temperature (Ta) as a unique index of melt energy. The model has a single coefficient, called the “degree-day factor” (DDF, in mm \(^{\circ }\)C\(^{-1}\) d\(^{-1}\)), which was previously calibrated by [45]. The snow cover area (SCA) was calculated using the Normalized Difference Snow Index (NDSI) from the MODIS Collection V006 (MODIS Snow Cover Daily L3 Global 500 m SIN GRID V006) [46].
SCA was derived from both MODIS daily snow cover products MOD10A1 and MYD10A1 generated respectively from the Terra and Aqua satellites [47]. SCA from Terra and Aqua were blended and clouds and other missing pixels were filled using a spatiotemporal filter implemented by [4]. More information is given by [44].
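The temperature-index approach described above can be illustrated with a minimal sketch. It is not the calibrated model of [45]: the melt threshold of 0 °C, the rain/snow threshold of 0 °C, the function names and the DDF value used in the example are all assumptions made for illustration.

```python
def temperature_index_step(swe, precip, ta, ddf, t_melt=0.0, t_snow=0.0):
    """Advance the snowpack by one day; returns (new_swe_mm, melt_mm).

    swe and precip are in mm, ta in degrees C, ddf in mm per degree C per day.
    t_melt and t_snow are assumed thresholds, not values from the study.
    """
    # Assumption: precipitation falls as snow when Ta is at or below t_snow.
    snowfall = precip if ta <= t_snow else 0.0
    swe = swe + snowfall
    # Potential melt is proportional to the positive degrees above t_melt.
    potential_melt = ddf * max(ta - t_melt, 0.0)
    # Melt cannot exceed the water stored in the snowpack.
    melt = min(potential_melt, swe)
    return swe - melt, melt


def simulate_snowpack(precip_series, ta_series, ddf, swe0=0.0):
    """Run the daily temperature-index model over two aligned series."""
    swe, swe_out, melt_out = swe0, [], []
    for p, ta in zip(precip_series, ta_series):
        swe, melt = temperature_index_step(swe, p, ta, ddf)
        swe_out.append(swe)
        melt_out.append(melt)
    return swe_out, melt_out


# Hypothetical three-day example: one snowfall day followed by two warm days.
swe_series, melt_series = simulate_snowpack([10.0, 0.0, 0.0], [-2.0, 1.0, 5.0], ddf=3.0)
```

On the last day the potential melt exceeds the remaining snowpack, so melt is capped by the available SWE.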

Table 1 Datasets used in the present study
Table 2 Statistical values of streamflow and its predictors over the period 2003–2016
Fig. 2

Distribution of daily streamflow values per hydrological year over the 2003–2016 period at the Rheraya outlet (Tahanout station)

2.3 Methods

The methodological framework involved three main steps, which are summarized in Fig. 3: (i) cross-correlation analysis between streamflow and potential predictors, (ii) model calibration and cross-validation, and (iii) streamflow simulation and model comparison. Step (i) is crucial for detecting statistically significant associations and optimal time lags that impact streamflow and hence improve the performance of the applied models. This step involves analyzing the importance factor of each potential predictor by using cross-correlation analyses to identify lagged relationships between predictors and streamflow, and the correlation matrix to detect predictor collinearity. Step (ii) applied a ‘leave-one-year-out’ cross-validation for the three models. The last step (iii) compared the mean performance of the three models using common error metrics, defined below. The subsequent sections present the details of each methodological step.

Fig. 3

Flowchart of the methodology used in this study

2.3.1 Support Vector Regression (SVR)

The Support Vector Machine (SVM), which was created by [32], is one of the effective techniques used to solve classification (Support Vector Classification, SVC) and regression (Support Vector Regression, SVR) problems [48,49,53,54,55]. SVM projects the inputs into a high-dimensional feature space, in which a linear function defined by an optimal hyperplane and an offset is fitted to the data. The goal of SVR is to find a regression function f(x) that can adequately characterize the connection between the given input dataset \(x = \{x_1, x_2, x_3,\ldots , x_n\}\) and the target values \(y = \{y_1, y_2, y_3,\ldots , y_n\}\) as follows:

$$\begin{aligned} f(x)=w\phi (x)+b \end{aligned}$$
(1)

where w denotes the weight vector, b the offset factor and \(\phi \) the nonlinear mapping. To avoid overfitting the calibration data samples, SVR minimizes an objective function (Eq. 2) subject to a set of constraints (Eq. 3) to obtain the regression parameters in Eq. (1):

$$\begin{aligned} \min \; \frac{1}{2}\Vert w\Vert ^2+ C\sum _{i=1}^{n}(\xi _i + \xi _i^*) \end{aligned}$$
(2)
$$\begin{aligned} \text {subject to:}\quad {\left\{ \begin{array}{ll} y_i - (w\phi (x_i)+b)\le \epsilon + \xi _i\\ (w\phi (x_i)+b)- y_i \le \epsilon + \xi _i^*\\ \xi _i\ge 0,\ \xi _i^*\ge 0,\ i = 1,\ldots ,n \end{array}\right. } \end{aligned}$$
(3)

where \(\epsilon \) is a positive error threshold, C is a user-defined penalty coefficient, and \(\xi _i\) and \(\xi _i^*\) are the positive slack variables that measure the deviation of the calibration data beyond \(\epsilon \). Prior to model calibration, the hyperparameters, i.e., the type of kernel (linear, polynomial, sigmoid or Gaussian radial basis function (RBF)) and the corresponding parameters, were optimized using the GridSearchCV algorithm from the Python Sklearn library, as demonstrated in [56]. The SVR model was implemented with the linear kernel, as selected by GridSearchCV.
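The role of \(\epsilon \) and of the slack variables in Eqs. (2) and (3) can be illustrated with the \(\epsilon \)-insensitive loss, which is zero for residuals inside the \(\epsilon \) tube and grows linearly outside it. The sketch below is a plain-Python illustration with hypothetical function names and toy values; it is not the Sklearn implementation used in the study.

```python
def epsilon_insensitive_slacks(y_true, y_pred, epsilon):
    """Per-sample deviation beyond the epsilon tube (the slack in Eq. 3)."""
    return [max(0.0, abs(y - f) - epsilon) for y, f in zip(y_true, y_pred)]


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Mean epsilon-insensitive loss over all samples."""
    slacks = epsilon_insensitive_slacks(y_true, y_pred, epsilon)
    return sum(slacks) / len(slacks)


# Toy values: the first residual lies inside the tube and costs nothing.
observed = [1.0, 2.0, 3.0]
simulated = [1.05, 2.5, 1.0]
slacks = epsilon_insensitive_slacks(observed, simulated, epsilon=0.1)
mean_loss = epsilon_insensitive_loss(observed, simulated, epsilon=0.1)
```

In Eq. (2), the sum of these slacks is weighted by C, so a larger C penalizes samples outside the tube more strongly relative to the \(\Vert w\Vert ^2\) term.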

2.3.2 Random Forest (RF)

The RF model was proposed by [58]. The number of trees (NTree) and the number of predictors chosen at each tree split (NPred) are the two most sensitive hyper-parameters of the RF algorithm [59]. Since the RF algorithm does not overfit as more trees are added, the number of trees does not have a significant impact on the model performance [60] and a minimum of 500 trees is typically adequate (1300 trees were used in this study). However, reducing the number of predictors at each split affects the computation time, the correlation between trees and their predictive ability [61]. In this study, the RF model was implemented using the RandomForestRegressor class of the Python library Sklearn. The optimization of the hyperparameters (NTree and NPred) was done using the same grid-search process as for SVR, and the hyperparameter values that gave the optimal score (mean-squared error) were retained.

2.3.3 Multiple Linear Regression (MLR)

Multiple Linear Regression (MLR) is an extension of the ordinary least squares (OLS) regression technique. MLR establishes the linear connection between explanatory inputs and a dependent variable. Each value of the independent variable x is associated with a value of the dependent variable y. Given n predictors, the equation of MLR can be described as follows:

$$\begin{aligned} y = \alpha _0+\alpha _1 x_1+\alpha _2 x_2+\cdots +\alpha _n x_n+ \epsilon \end{aligned}$$
(4)

where y is the target; \(x_i\) is a predictor; \(\alpha _0\) is the constant term; \(\alpha _i\) is the slope coefficient of predictor \(x_i\); and \(\epsilon \) is the residual error term. The coefficients are estimated by the least-squares approach during calibration, which consists of obtaining the best agreement between the target and the predictors [62]. Contrary to SVR and RF, the simple MLR procedure does not require optimizing hyperparameters.
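As an illustration of the least-squares estimation of the coefficients in Eq. (4), the sketch below solves the normal equations with Gauss-Jordan elimination in plain Python. The function names and the synthetic data are assumptions for illustration; in practice a library routine would normally be used.

```python
def fit_mlr(X, y):
    """Least-squares estimate of [alpha_0, alpha_1, ..., alpha_n] in Eq. (4)."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    p = len(rows[0])
    # Augmented normal equations [A^T A | A^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; the system is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for k in range(p):
            if k != col:
                factor = aug[k][col]
                aug[k] = [vk - factor * vc for vk, vc in zip(aug[k], aug[col])]
    return [aug[i][p] for i in range(p)]


def predict_mlr(coefs, X):
    """Apply Eq. (4) without the residual term."""
    return [coefs[0] + sum(a * x for a, x in zip(coefs[1:], r)) for r in X]


# Synthetic, noise-free data generated from y = 1 + 2*x1 + 3*x2.
X_demo = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y_demo = [1, 3, 4, 6, 8]
coefs = fit_mlr(X_demo, y_demo)
```

The explicit singularity check reflects why collinear predictors are screened out before fitting: with strongly inter-correlated inputs the normal equations become ill-conditioned.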

2.4 Evaluation criteria

To assess the performance of the models, the Nash Sutcliffe efficiency (NSE, Eq. 5) and the Root Mean Squared Error (RMSE, Eq. 6) were used:

$$\begin{aligned} NSE=1- \frac{\sum _{i=1}^{n} (y_i-\hat{y}_i)^2}{\sum _{i=1}^{n}(y_i-\bar{y})^2} \end{aligned}$$
(5)
$$\begin{aligned} RMSE=\sqrt{\frac{\sum _{i=1}^{n}(y_i-\hat{y}_i)^2}{n}} \end{aligned}$$
(6)

where \(y_i\) is the observed streamflow, \(\hat{y}_i\) is the predicted streamflow, \(\bar{y}\) is the mean observed streamflow and n is the number of observations.
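The two criteria can be computed directly from their definitions. The following is a minimal sketch in plain Python; the function names and the toy series are hypothetical.

```python
import math


def nse(obs, sim):
    """Nash-Sutcliffe efficiency (Eq. 5): 1 is a perfect fit, 0 matches the mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


def rmse(obs, sim):
    """Root mean squared error (Eq. 6), in the units of the streamflow."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


# Toy series: predicting the observed mean everywhere gives NSE = 0.
observed = [1.0, 2.0, 3.0, 4.0]
nse_mean_model = nse(observed, [2.5, 2.5, 2.5, 2.5])
rmse_offset = rmse(observed, [2.0, 3.0, 4.0, 5.0])
```

An NSE of 0 therefore means that the model is no better than the mean of the observations, which is a useful reference when reading the year-by-year scores.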

2.5 Model construction and validation

2.5.1 Predictor selection

Predictor selection consists of reducing the number of input variables of a predictive model to those that are the most useful, removing predictors that are non-informative or redundant for predicting the target variable. In general, the selection of appropriate predictors has a positive effect on the performance of ML models [35, 63]. First, cross-correlation analysis based on Spearman’s rank coefficient was used to detect the linear and non-linear correlation between the streamflow and the predictors at lag times of up to 15 days. Then, the correlation matrix was used to highlight predictor collinearity. Predictors that were not collinear with each other and that had a significant correlation with the target (streamflow), with the threshold fixed at 0.19 given the generally low values observed, were retained as potential inputs for the ML models.
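The lagged cross-correlation step can be sketched as follows, assuming two aligned daily series. This is a plain-Python illustration with hypothetical function names and a synthetic series; it is not the code used in the study.

```python
import math


def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return result


def pearson(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman's rank coefficient: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def lagged_spearman(predictor, target, max_lag=15):
    """Correlation between target[t] and predictor[t - lag] for each lag."""
    out = {}
    for lag in range(max_lag + 1):
        x = predictor[: len(predictor) - lag]
        y = target[lag:]
        out[lag] = spearman(x, y)
    return out


# Synthetic check: the target is the predictor delayed by two days.
predictor = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2, 3, 8, 4]
target = [0, 0] + predictor[:-2]
corr = lagged_spearman(predictor, target, max_lag=5)
best_lag = max(corr, key=corr.get)
```

The lag with the highest absolute correlation would then be the one retained for each candidate predictor, subject to the correlation threshold.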

2.5.2 Model calibration

Model calibration and model validation are two important steps in the development and evaluation of a machine learning model. They are critical for ensuring that the model is reliable and accurate. Model calibration consists of adjusting the model parameters to improve the accuracy and reliability of its predictions. This step is based on observed data, with the model fitted using both the predictors and the target. Model validation, in turn, is the process of assessing the accuracy of the model by comparing its predictions to observed data that were not used in the calibration process. This step tests the generalization ability of the model to new and unseen data. The cross-validation method was first introduced by [64,65,66]. It consists of using different portions of the data to train and test the model over successive iterations. There are several types of cross-validation, but the most common is k-fold cross-validation. In k-fold cross-validation, the dataset is randomly divided into k groups (or folds) of approximately equal size. The model is trained on all but one of the folds (k-1), and then evaluated on the remaining fold. This process is repeated k times, with a different fold reserved for evaluation (and excluded from training) each time. Cross-validation helps to detect overfitting and provides insight into how the model will generalize to new data. In the current study, the calibration and validation of the models were carried out with a ‘leave-one-year-out’ method, which is a variant of k-fold cross-validation in which the folds are hydrological years rather than random groups. It consists of removing one hydrological year at a time from calibration (training), training the model on the remaining years, and validating (testing) on the left-out year. The process is repeated 13 times (13-fold CV), once for each of the 13 hydrological years from September 2003 to August 2016.
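The ‘leave-one-year-out’ splitting can be sketched as below, assuming a hydrological year that runs from 1 September to 31 August, consistent with the study period. The function names are hypothetical and the sketch only produces index splits; model fitting is left out.

```python
from datetime import date, timedelta


def hydrological_year(day):
    """Label a date by its hydrological year, assumed to start on 1 September."""
    start = day.year if day.month >= 9 else day.year - 1
    return f"{start}/{start + 1}"


def leave_one_year_out(dates):
    """Yield (held_out_year, train_idx, test_idx) for each hydrological year."""
    labels = [hydrological_year(d) for d in dates]
    for year in sorted(set(labels)):
        test_idx = [i for i, lab in enumerate(labels) if lab == year]
        train_idx = [i for i, lab in enumerate(labels) if lab != year]
        yield year, train_idx, test_idx


# Daily dates covering the study period, 1 September 2003 to 31 August 2016.
first_day = date(2003, 9, 1)
n_days = (date(2016, 8, 31) - first_day).days + 1
dates = [first_day + timedelta(days=k) for k in range(n_days)]
folds = list(leave_one_year_out(dates))
```

Holding out whole years, rather than random days, keeps strongly autocorrelated consecutive days from appearing on both sides of a split.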

3 Results

3.1 Predictor selection

The cross-correlation analysis reveals the relationship between the current day’s streamflow (day t) and its predictors over multiple lag times, up to 15 days before the streamflow event (Fig. 4). Daily streamflow can be explained by the antecedent streamflow and any additional runoff during a given day [67]. Consequently, antecedent streamflow, which occurs in the days preceding the current day, was considered as a useful potential predictor in this study in addition to climatic variables. The maximum correlation between streamflow and its antecedent values occurs at a lag of one day (lag 1) and decreases gradually thereafter. For the other hydroclimatic variables, streamflow is strongly and positively correlated with precipitation at lag 0, followed by SWE and SCA on the preceding day (lag 1). The weakest correlations are found with Melt at lag 2 (r = 0.21), humidity (RH, r = 0.20) at lag 0, and temperature (Ta, r = \(-\) 0.19) at lag 0, with the magnitude of all correlations decreasing at longer lags.

In parallel, the correlation matrix (Fig. 5) reveals potential collinearity among predictors. Besides the hydroclimatic variables, the streamflow on the three preceding days was considered as a potential predictor based on its high correlation with streamflow at day t, as shown in the cross-correlation analysis (Figs. 4 and 5). \(SWE_{t-1}\) and \(SCA_{t-1}\) are highly and positively correlated with each other (r = 0.76), while Ta is negatively correlated with humidity (RH) (r = \(-\) 0.51), \(SWE_{t-1}\) (r = \(-\) 0.54) and \(SCA_{t-1}\) (r = \(-\) 0.64). \(Melt_{t-2}\) is moderately correlated with \(SCA_{t-1}\) (r = 0.36) and \(SWE_{t-1}\) (r = 0.46), while streamflow during the three previous days is strongly inter-correlated. Consequently, precipitation on the current day (\(P_{t}\)), streamflow on the preceding day (\(Q_{t-1}\)) and snow cover area on the preceding day (\(SCA_{t-1}\)) were chosen as potential, uncorrelated predictors. \(SCA_{t-1}\) was chosen instead of \(SWE_{t-1}\), even though SWE at day t-1 had a higher correlation with streamflow. This is because the difference in correlation between the two predictors was small and satellite-derived SCA has a greater operational potential than simulated SWE. Also, SWE is simulated with a temperature-index approach using station-based precipitation measurements [45], which are sparse and of variable quality in the catchment [68], so the simulation probably underestimates the spatial variability of SWE. SCA derived from MODIS, on the other hand, represents a direct observation of snow cover conditions, even if the 500 m resolution does not fully capture the spatial heterogeneity of the snow cover [44].

Fig. 4

The cross correlation of potential predictors and streamflow up to 15 days in lag time

Fig. 5

Correlation matrix of the input variables

3.2 Model mean performance and inter-comparison

During calibration (training), all models exhibited good performance, with NSE values above 0.65 and RMSE values under 1.40 \(\text{m}^3\,\text{s}^{-1}\) (Fig. 6). In the validation phase, although NSE values decreased, they remained above 0.50, while the RMSE decreased for SVR and MLR but increased for the RF model, with \(NSE = 0.59, 0.53, 0.54\) and \(RMSE = 1.18, 1.18, 1.01\) \(\text{m}^3\,\text{s}^{-1}\) for SVR, RF and MLR, respectively. Overall, the satisfactory NSE range (\(0.53 \le NSE \le 0.59\)) [69] during validation underscores the robustness of the models, particularly the SVR model, which outperformed the others with an NSE of 0.59. For a more detailed comparison of the models’ performances, Fig. 7 illustrates the observed streamflow and the streamflow simulated by the three models during the test year of 2011–2012.

Fig. 6

Mean calibration and validation performance for the three models (SVR, RF, MLR) derived from annual cross-validation calibration; a Nash–Sutcliffe efficiency (NSE) score; b root-mean squared error (RMSE)

3.3 Model stability

Despite the generally acceptable performance highlighted in Fig. 6, annual cross-validation tests revealed significant instability of the models across the different annual tests, as depicted in Fig. 8a. During the annual cross-validation tests, the NSE ranged widely, from as low as 0.05–0.16 to as high as 0.81–0.87. While the three models had their lowest performance during the 2012/2013 hydrological test year, the behavior of each model differed to some extent during other test years. Notably, RF and SVR covary more closely with each other than with MLR. For the 2010/2011 test year, SVR and RF had a low performance (NSE = 0.29 and 0.38, respectively), while MLR exhibited its highest performance (NSE = 0.85). Moreover, MLR had a low performance (NSE = 0.26) in 2004/2005 when RF and SVR performed well (NSE = 0.69 and 0.71, respectively). Thus, the performance of the three models was somewhat heterogeneous.

Fig. 7

Daily observed streamflow and streamflow simulated by SVR, RF and MLR over the test year 2011/2012

Fig. 8

Model performance during annual cross-validation and the distribution of hydroclimatic conditions during the calibration and validation periods. a Nash–Sutcliffe efficiency (NSE) score for each left out year during cross-validation; b–f distribution of hydroclimatic conditions during the multiannual calibration and annual validation period. A logarithmic transformation was applied on all positive data with skewed distributions. b Log-transformed streamflow; c log-transformed precipitation; d log-transformed snow cover area, e relative humidity; f air temperature over the 2003–2016 period

4 Discussion

4.1 General model performance

Passing from calibration to validation, the models exhibited almost the same behavior, with both the NSE and the RMSE decreasing, except for the RF model, for which the RMSE increased (Fig. 6). Overall, the SVR model showed the highest performance during the validation in terms of NSE (0.59), followed by MLR (0.54) and RF (0.53), while the MLR model outperformed the other models in terms of RMSE (RMSE = 1.01, 1.18, 1.18 \(\text{m}^3\,\text{s}^{-1}\) for MLR, SVR and RF, respectively). Considering the challenges mentioned above, especially the quality of the ground data measurements and their spatial distribution, a mean performance of NSE = 0.55 for the three models is acceptable as a first step towards an integrated modelling approach that includes artificial intelligence and hydrological models, remotely sensed data, and the physical processes. Previous studies have demonstrated the successful application of SVR and RF for streamflow prediction [36, 37, 67, 70], as they provide a good accuracy when dealing with abrupt streamflow fluctuations [71] and for predicting peak flow [72]. MLR has also been widely applied to streamflow prediction problems [62, 73, 74]. However, the achieved accuracy of the three models (\(NSE < 0.60\)) in the present study was lower than in similar studies conducted either in wet or in arid regions [62, 75, 76]. In wet regions, numerous studies have demonstrated the effectiveness of SVR, RF and MLR in modelling streamflow. [77] applied SVR to simulate monthly streamflow in the Kurau River in Malaysia with an \(R^2\) equal to 0.71, demonstrating SVR’s capability in capturing complex hydrological patterns in tropical climates. [67] simulated daily streamflow of the North American River and the Chehalis River in the US using SVR and obtained NSE scores of 0.83 and 0.93, respectively. RF was also used for annual streamflow prediction in the source region of the Yangtze River (SRYR) with an NSE of 0.82 [72].
[78] modelled the monthly mean streamflow over three different stations in Turkey (Durucasu, Sutluce and Kale) with a good performance expressed by a correlation (R) value greater than 0.80. Furthermore, MLR was tested in the upper reaches of the Yangtze River in China for a ten-day streamflow forecast, yielding an NSE greater than 0.80 [79]. In a different study, MLR was employed for short-term streamflow forecasting in the East River basin in China, yielding an R value greater than 0.90 [80]. In arid and semi-arid regions, SVR, RF and MLR models have also resulted in accurate streamflow modelling. [51] simulated the monthly streamflow of the semi-arid Wei River Basin in China using SVR and achieved a very high performance of NSE = 0.99. Another study conducted by [81] in the Jinsha River basin, Southwest China, modelled monthly streamflow with SVR with an NSE equal to 0.96. Besides, in the Sevier River Basin located in South-Central Utah, USA, [34] applied SVR to predict hourly streamflow and obtained an \(R^2\) equal to 0.97. On the other hand, [82] used an RF model for monthly streamflow forecasting at the Aswan High Dam (AHD) in Egypt, which gave an \(R^2\) equal to 0.90. In the Karaj reservoir in Iran, [83] modelled daily streamflow with an NSE equal to 0.97. Furthermore, [73] used an MLR model for daily river flow prediction over the Seybouse River Basin located in northeastern Algeria and obtained an \(R^2\) equal to 0.90. In another arid region, the Karoon River in Iran, MLR yielded an \(R^2\) equal to 0.74 when simulating daily river flow [84]. Consequently, the accuracy of the three daily prediction models obtained in the present study is lower than that usually achieved in other studies, including those in arid regions at a daily time scale.

4.2 Model stability and relation to hydroclimatic conditions

The annual cross-validation tests demonstrated notable variability in the performance of the SVR, RF and MLR models, highlighting a fundamental challenge in hydrological forecasting. Not only were the performances heterogeneous from one annual split to the other for the same model, but they also differed from one model to another for the same split. Despite the known ability of SVR and RF to handle hydrological forecasts [79, 85,86,87,88,89], they failed to perform well for all the splits in the present study. We suggest that the quality of streamflow records, often marred by missing or inaccurate data, coupled with the heterogeneous hydro-climatic conditions in the studied area, likely contributed to the observed performance instability.

Streamflow measurements in Rheraya are subject to uncertainties since they are based on a stage-discharge rating curve that is not regularly updated. In practice, streamflow measurements depend on the rating curve and the measured water depth. During the large streamflow events that episodically occur in the semi-arid and mountainous Rheraya sub-basin (see extreme values in Fig. 2), the riverbed can be significantly reworked, which affects the rating curve. Therefore, rating curves must be updated regularly. However, doing so is costly and time-consuming for the Tensift hydraulic water agency (ABHT). This results in an outdated rating curve and a loss of accuracy in streamflow measurements [9, 43, 90, 91]. This suggests that data quality may have an impact on the stability of the model accuracies. Besides, streamflow in Rheraya is also characterized by strong temporal variability (Fig. 2). To illustrate this, we analyzed the differences in streamflow distribution between the calibration and validation periods for each annual split of the cross-validation (Fig. 8b). The streamflow was log-transformed to address its highly skewed distribution, enabling a more accurate comparison between calibration and validation periods. Over the 13-year study period, the lowest performance was obtained for 2012/2013, for all models (Fig. 8a). For this validation year, the range of the streamflow distribution during the test is included in that of the calibration period. In contrast, for the hydrological year 2010/2011, which had a low performance for the SVR and RF models, the range of the streamflow distribution during the test exceeded that of the calibration period. For the validation years for which simulations by the three models were relatively satisfactory, mainly 2006/2007, 2009/2010 and 2011/2012, the range of the streamflow distribution during validation was either equal to, or fell within, the range of the streamflow distribution of the calibration period.
It can be concluded that the models’ performance was closely tied to how the range of the streamflow distribution in the validation period compared with that of the calibration period. We hypothesize that when the range of the streamflow distribution during the validation year was equal to or included within the range of the calibration distribution, the models provided adequate accuracy (\(NSE \ge 0.50\)). Conversely, when the range of the streamflow distribution for the validation year was larger than that of the calibration set, the performance degraded. The variation in streamflow distribution over the different splits is mainly caused by the strong temporal variability of the hydrological processes in the Rheraya sub-basin [9, 90,91,92]. To understand the underpinnings of streamflow variability, we examined the annual variations in the distribution of other hydroclimatic variables (P, SCA, Ta and RH) during both the calibration and validation periods (Fig. 8c–f). Precipitation and SCA were also log-transformed to better portray their highly skewed distributions, which aids in the analysis of their temporal variations. Although Ta and RH were not used as predictors in our models, analyzing their distributions can help understand the variability of other critical predictors like SCA and precipitation. A more restricted distribution of precipitation (P) during calibration than during validation is frequently seen, which in some years matches the same phenomenon in streamflow and the aforementioned degradation in model performance (e.g., 2010/2011: RF and SVR; 2012/2013: all models). For the other variables (SCA, RH and Ta), by contrast, the distributions between calibration and validation are more balanced. [8, 14, 93, 94] highlighted the significant seasonal and interannual variability in the hydrological processes, which explains the frequent imbalance between the calibration and validation distributions seen in Fig. 8.
In this context, the effect of the North Atlantic Oscillation (NAO) on regional hydrological patterns, particularly temperature and precipitation, has been the subject of various studies. López-Moreno et al. (2011) analyzed the effect of the NAO on temperature and precipitation over the Mediterranean mountains, including the Atlas Mountains, and concluded that the NAO strongly impacts snowpack dynamics. This conclusion was confirmed by another study conducted by [95] in the Moroccan Atlas Mountains, which also highlighted a possible impact of the NAO on snowpack dynamics through rainfall and temperature. Furthermore, the physiography of this basin is characterized by a rugged topography and steep slopes with sparse vegetation that accentuate the spatial heterogeneity of runoff processes. In addition, the scattered rains and the large infiltration and evapotranspiration rates in the large foothill and plain segment of the catchment favor the spatial and temporal variability of rainfall and discharge at the basin outlet. This explains the high temporal variability of streamflow in the Rheraya sub-basin. Given this high variability, the tested models have difficulty extrapolating beyond their calibration conditions. This suggests a need for longer time series covering a larger spectrum of streamflow and precipitation variations to strengthen the prediction accuracy of the models. Still, the three tested models can overall be considered reliable for daily streamflow prediction based on their mean performance during validation. However, their low performance for certain years reduces their usefulness and points to the need to obtain longer, quality-controlled streamflow time series to better sample the heterogeneous hydrological variability during model calibration. In parallel, further fine-tuning of model hyperparameters could also help to improve the forecasting capacity of the SVR and RF models. 
A further challenge is model stability under climate change conditions. Temperatures are projected to increase and precipitation to decrease in Morocco, as in other Mediterranean countries [96,97,98], which has a direct impact on river discharge [99]. Using longer records to develop ML models for daily streamflow prediction would thus require updating the model calibration frequently to adapt to the warming and drying climate in the area.

4.3 Practical implications in water resources engineering

The application of SVR, RF, and MLR models improves the accuracy of streamflow prediction to a reasonable extent, thereby helping stakeholders to advance water management strategies, which include efficient irrigation scheduling, drought management, and flood risk reduction [24, 100]. Accurate predictions play a crucial role in adjusting to climate change, guiding infrastructure development, and shaping policy decisions related to the sustainable allocation and utilization of water resources. By providing a foundation for more informed choices regarding infrastructure investments, such as the building of dams and the establishment of early warning systems, the study contributes to addressing the difficulties brought about by water scarcity and competition. Ultimately, the findings derived from this research support the maintenance of water sustainability and ecosystem conservation in arid regions, supporting efforts to enhance water security, optimize resource utilization, and promote the sustainable growth of at-risk communities and ecosystems.

5 Conclusions

In this study, daily streamflow modelling was conducted in the Rheraya sub-basin of the Moroccan High Atlas range, using observed daily precipitation, antecedent streamflow, and remotely sensed snow cover data as inputs to two machine learning models, SVR and RF, as well as MLR. A leave-one-year-out cross-validation approach was used for model validation for the period 2003–2016. Next, the mean performance of SVR, RF and MLR was compared in calibration and validation. Finally, the influence of the temporal variability of hydroclimatic conditions on the quality of the streamflow simulations was analyzed, and the following conclusions were obtained: (1) RF, MLR and especially SVR models showed an overall adequate mean performance during validation (NSE: 0.53, 0.54 and 0.59 respectively), making them promising tools for daily streamflow prediction in Rheraya. Still, this performance was lower than that reported in previous daily streamflow modelling studies in both wet and dry regions. (2) The hydrological processes in the Tensift region are highly variable in time, and the resulting heterogeneous streamflow conditions during the 13-year observational period greatly affected model stability and transferability between the calibration periods and individual validation years. The cross-validation procedure demonstrated the unstable predictive performance of the models over the studied period: changing the calibration or validation period can lead to a different simulation accuracy. Consequently, the models are hardly transferable between years. (3) The instability of the model performance is mainly ascribed to the natural temporal variability of hydroclimatic conditions but can also be related to the uncertainty in the measured streamflow; more quality-controlled observations are needed to reduce this source of uncertainty. 
(4) In a semi-arid region characterized by a high temporal variability of hydroclimatic conditions, a direct application of ML techniques for surface water modelling may not achieve good accuracy because streamflow conditions differ markedly over time. It is suggested that under these conditions the calibration period should not be chosen randomly, and that it should be long enough to adequately capture the natural temporal variability of streamflow. If these challenges can be overcome, ML-driven streamflow models would be a useful tool as they do not require a large amount of hydro-meteorological forcing data, nor do they require information on catchment properties, unlike some hydrological models. Hence, ML models could be a helpful tool for runoff monitoring in semi-arid catchments affected by data scarcity. Nevertheless, a rigorous validation of the reliability of this type of model is needed in the other sub-basins of Tensift to evaluate its transferability, given the spatial variability in climate and resulting hydrological processes. In semi-arid regions characterized by data scarcity, the transferability of ML methods for hydrological modelling from a gauged basin to an ungauged one is of great interest. The aim is to reproduce the hydrological response of an ungauged sub-basin without requiring observations in that sub-basin. It would therefore be helpful to assess whether models trained in a gauged basin can reproduce the streamflow of an ungauged one. In Tensift, most of the sub-basins are ungauged and even the existing measurements are subject to errors. While this approach can help generate the hydrological response of these sub-basins, it requires a rigorous evaluation beforehand. 
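The sensitivity of model accuracy to the choice of calibration and validation years, summarized in conclusions (2) to (4), can be illustrated with a minimal leave-one-year-out sketch. It is not the implementation used in this study: the data are synthetic, the three predictor columns are only stand-ins for precipitation, antecedent streamflow and SCA, and the helper names `nse` and `loyo_nse` are hypothetical. An ordinary least-squares fit stands in for MLR; SVR or RF could be substituted in the same loop.

```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def loyo_nse(X, y, years):
    """Leave-one-year-out validation of a multiple linear regression.

    Each year is held out in turn, the model is calibrated on the
    remaining years, and the NSE is computed on the held-out year.
    """
    X = np.column_stack([np.ones(len(y)), X])  # add an intercept column
    scores = {}
    for year in np.unique(years):
        test = years == year
        coef, *_ = np.linalg.lstsq(X[~test], y[~test], rcond=None)
        scores[int(year)] = float(nse(y[test], X[test] @ coef))
    return scores


# Synthetic example: 5 hypothetical years of 60 daily records each.
rng = np.random.default_rng(0)
n_years, n_days = 5, 60
years = np.repeat(np.arange(2004, 2004 + n_years), n_days)
X = rng.gamma(shape=1.0, scale=2.0, size=(years.size, 3))  # stand-in predictors
y = X @ np.array([0.5, 0.3, 0.1]) + rng.normal(0.0, 0.2, size=years.size)

scores = loyo_nse(X, y, years)
mean_nse = float(np.mean(list(scores.values())))
print(scores)
print(f"mean validation NSE: {mean_nse:.2f}")
```

Reporting the per-year scores alongside their mean makes it visible when an acceptable average hides individual validation years with poor accuracy.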
In conclusion, this research presents an initial investigation of ML’s potential in semi-arid hydrology, paving the way for future enhancements like hybrid models and ensemble techniques, which will significantly contribute to efficient water resource management in semi-arid regions.