Enhancing the prediction of hydraulic parameters using machine learning, integrating multiple attributes of GIS and geophysics

Hydrogeology Journal

Abstract

Estimating hydraulic parameters and predicting them successfully in an arid and/or semiarid region is challenging because of various hydrogeological complexities. In this study, five machine learning (ML) algorithms, namely random forest (RF), random tree (RT), support vector machine (SVM), Gaussian process (GP) and long short-term memory (LSTM), are used and their performances are compared for the prediction of hydraulic parameters, utilizing multiple hydrogeological attributes derived from a geographical information system (GIS) and geophysical investigations in Sindhudurg District, Maharashtra, India. The multiple hydrogeological attributes serve as inputs, and hydraulic parameters derived by geoelectrical methods, e.g., hydraulic conductivity (K) and transmissivity (T), serve as targets for building the predictive models. To enhance model performance, a logarithmic data transformation was applied, and correlation analysis was conducted to propose the different input configurations tested during ML model building. The data were divided into two subsets: training (80%) and testing (20%). Qualitative and quantitative performance measures were evaluated to examine the predictive power of the ML models. Based on the test results, RF is found to be the best predictive model, with Pearson's correlation coefficients of ~0.93 and 0.83 for modeling K and T, respectively. The results also reveal that model performance depends mainly on the ML architecture, data structure, data accuracy, and the amount of data used. Thus, the present study successfully facilitates the predictive modeling of hydraulic parameters in the study area, and the proposed method could be further explored in other complex hydrogeological areas around the world.
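The evaluation steps named in the abstract (logarithmic transformation, an 80/20 train/test split, and Pearson's correlation coefficient on the test set) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the data are synthetic, `resistivity` is a hypothetical single input attribute, and an ordinary least-squares line stands in for the ML models (RF, RT, SVM, GP, LSTM), which are not reproduced. It is not the authors' implementation.

```python
import math
import random


def log10_transform(values):
    """Return log10 of each value; inputs must be strictly positive."""
    return [math.log10(v) for v in values]


def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle rows reproducibly, then split into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares fit y = a + b*x (a simple stand-in for an ML regressor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    return my - b * mx, b


# Synthetic, purely illustrative data: one hypothetical attribute and a
# noisy power-law target standing in for hydraulic conductivity K.
rng = random.Random(42)
resistivity = [rng.uniform(10.0, 1000.0) for _ in range(50)]
conductivity = [0.01 * r ** 0.8 * math.exp(rng.gauss(0.0, 0.2)) for r in resistivity]

# Log-transform inputs and targets, then split 80% / 20%.
rows = list(zip(log10_transform(resistivity), log10_transform(conductivity)))
train, test = train_test_split(rows, train_fraction=0.8)

# Fit on the training subset only; evaluate on the held-out test subset.
a, b = fit_line([x for x, _ in train], [y for _, y in train])
predicted = [a + b * x for x, _ in test]
r_test = pearson_r(predicted, [y for _, y in test])
print(f"train={len(train)} test={len(test)} Pearson r (test) = {r_test:.3f}")
```

Keeping the test subset out of the fitting step is what makes the reported correlation a measure of predictive skill rather than of goodness of fit.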

Acknowledgements

We thank the Director of IIT (ISM), Dhanbad for permitting us to publish the work.

Funding

PKG is grateful to IIT (ISM) for the SRF fellowship. SM acknowledges partial financial support from the Science and Engineering Research Board (SERB), Department of Science and Technology (DST), Govt. of India, New Delhi (Grant No. CRG/2018/001368) and the TexMin project (Grant No. PSF-1H-1Y-007) for neural network research and development.

Author information

Corresponding author

Correspondence to Praveen Kumar Gupta.

Ethics declarations

Conflict of interest

The authors state that there are no interests to declare.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

ESM 1 (PDF 604 kb)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gupta, P.K., Maiti, S. Enhancing the prediction of hydraulic parameters using machine learning, integrating multiple attributes of GIS and geophysics. Hydrogeol J 31, 501–520 (2023). https://doi.org/10.1007/s10040-022-02567-5

  • DOI: https://doi.org/10.1007/s10040-022-02567-5
