Abstract
Estimating hydraulic parameters and predicting them successfully in arid and/or semiarid regions is challenging because of various hydrogeological complexities. In this study, machine learning (ML) algorithms, namely random forest (RF), random tree (RT), support vector machine (SVM), Gaussian process (GP) and long short-term memory (LSTM), are used and their performances compared for the prediction of hydraulic parameters, utilizing multiple hydrogeological attributes derived from a geographical information system (GIS) and geophysical investigations in Sindhudurg District, Maharashtra, India. Multiple hydrogeological attributes are used as inputs, and hydraulic parameters derived by geoelectrical methods, e.g., hydraulic conductivity (K) and transmissivity (T), are used as targets for building the predictive models. To enhance model performance, a logarithmic data transformation was employed, and correlation analysis was conducted to hypothesize the different input configurations during ML model building. The data were divided into two components: training (80%) and testing (20%). Qualitative and quantitative performance measures were evaluated to examine the predictive power of the ML models. Based on the test results, RF is found to be the best predictive model, with Pearson’s correlation coefficients of ~0.93 and 0.83 for modeling K and T, respectively. The results also reveal that model performance mainly depends on ML architecture, data structure, data accuracy, and the amount of data used. Thus, the present study successfully facilitates the predictive modeling of hydraulic parameters in the study area, and the proposed method could be further explored in other complex hydrogeological areas around the world.
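The evaluation workflow outlined in the abstract (logarithmic transformation, an 80%/20% train/test split, and scoring by Pearson's correlation coefficient) can be sketched roughly as follows. This is only an illustrative sketch and not the authors' implementation: the data are synthetic and invented for the example, the helper names (`log_transform`, `train_test_split`, `pearson_r`, `fit_line`) are hypothetical, and a least-squares line stands in for the ML models (RF, RT, SVM, GP, LSTM) so that the example needs only the Python standard library.

```python
import math
import random


def log_transform(values):
    """Return log10 of each strictly positive value."""
    return [math.log10(v) for v in values]


def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle rows with a fixed seed and split them into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def fit_line(x, y):
    """Least-squares slope and intercept (assumed stand-in for the ML models)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic, made-up data: one positive attribute and a target "K" that
# follows a noisy power law, so the relation is roughly linear after log10.
rng = random.Random(42)
attribute = [rng.uniform(1.0, 1000.0) for _ in range(100)]
k_values = [0.05 * a ** 0.8 * math.exp(rng.gauss(0.0, 0.2)) for a in attribute]

# Log-transform both input and target, then split 80% / 20%.
rows = list(zip(log_transform(attribute), log_transform(k_values)))
train, test = train_test_split(rows, train_fraction=0.8, seed=1)

# Fit on the training part only; score on the held-out test part.
slope, intercept = fit_line([r[0] for r in train], [r[1] for r in train])
predicted = [slope * r[0] + intercept for r in test]
r_test = pearson_r([r[1] for r in test], predicted)

print(len(train), len(test))
print(round(r_test, 3))
```

Scoring only on the held-out 20% mirrors the abstract's statement that the reported correlation coefficients are based on the test result. Swapping the stand-in line fit for a random forest regressor would change only the fitting and prediction steps.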
Acknowledgements
We thank the Director of IIT (ISM), Dhanbad for permitting us to publish the work.
Funding
PKG is grateful to IIT (ISM) for the SRF fellowship. SM acknowledges partial financial support from the Science and Engineering Research Board (SERB), Department of Science and Technology (DST), Govt. of India, New Delhi (Grant No. CRG/2018/001368) and the TexMin project (Grant No. PSF-1H-1Y-007) for neural network research and development.
Ethics declarations
Conflict of interest
The authors state that there are no interests to declare.
About this article
Cite this article
Gupta, P.K., Maiti, S. Enhancing the prediction of hydraulic parameters using machine learning, integrating multiple attributes of GIS and geophysics. Hydrogeol J 31, 501–520 (2023). https://doi.org/10.1007/s10040-022-02567-5
DOI: https://doi.org/10.1007/s10040-022-02567-5