Abstract
This study evaluates resampling approaches (under-sampling, over-sampling, and hybrid sampling) applied to the random forest (RF) model for landslide susceptibility assessment (LSA). The study area is Hokkaido, Japan, where the 2018 Iburi earthquake triggered a total of 5,625 landslides in a single event. The objective is to address the class imbalance issue and improve the accuracy of LSA. Conditioning factors are obtained from multiple data sources, and objective absence data sampling based on Mahalanobis distance is employed to tackle the unlabeled sample problem. The RF model is used to calculate landslide susceptibility values and generate the susceptibility assessment. These values are then evaluated using two diagnostic tools, the Area Under the Receiver Operating Characteristic curve (AUROC) and the Area Under the Precision-Recall Curve (AUPRC), which help validate and interpret binary classification models for imbalanced data. The results demonstrate improved performance with larger sample sizes, and the resampling approach yields better consistency than random sampling within the study area. To enhance the accuracy and consistency of machine learning techniques in reducing landslide risks, the study recommends using a hybrid sampling technique and Mahalanobis distance-based absence data sampling in LSA.
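The sampling ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the quantile threshold used to pick absence cells, and the plain random over- and under-sampling (a stand-in for the specific resampling techniques the study compares) are all assumptions made for illustration. It computes the Mahalanobis distance from candidate cells to the centroid of the landslide-presence samples, draws absence samples from the more distant candidates, and then balances the two classes to a common size.

```python
import numpy as np


def mahalanobis_to_presence(presence, candidates):
    """Mahalanobis distance of each candidate row to the presence centroid."""
    mean = presence.mean(axis=0)
    cov = np.cov(presence, rowvar=False)
    # Pseudo-inverse tolerates a near-singular covariance matrix.
    cov_inv = np.linalg.pinv(cov)
    diff = candidates - mean
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(np.maximum(d2, 0.0))


def sample_absence(presence, candidates, n, quantile=0.5, seed=0):
    """Draw n absence rows from candidates at or beyond a distance quantile.

    The quantile rule is a hypothetical selection criterion, not taken
    from the study.
    """
    rng = np.random.default_rng(seed)
    d = mahalanobis_to_presence(presence, candidates)
    eligible = np.flatnonzero(d >= np.quantile(d, quantile))
    idx = rng.choice(eligible, size=n, replace=False)
    return candidates[idx], d[idx]


def hybrid_resample(x_min, x_maj, target, seed=0):
    """Randomly over-sample the minority and under-sample the majority.

    Assumes len(x_min) <= target <= len(x_maj). Returns two arrays with
    `target` rows each.
    """
    rng = np.random.default_rng(seed)
    extra = rng.integers(0, len(x_min), size=target - len(x_min))
    over = np.vstack([x_min, x_min[extra]])
    under = x_maj[rng.choice(len(x_maj), size=target, replace=False)]
    return over, under
```

With conditioning factors stored as rows of a feature matrix, `sample_absence(presence, candidates, n=200)` would return 200 absence rows, and `hybrid_resample(presence, absence, target=100)` would return two balanced sets that could then be passed to a classifier such as a random forest.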
Acknowledgments
This study was carried out with the support of the R&D Program for Forest Science Technology (Project No. 2023476C10-2325-BB01) provided by the Korea Forest Service (Korea Forestry Promotion Institute).
Cite this article
Nam, K., Kim, J. & Chae, BG. Exploring class imbalance with under-sampling, over-sampling, and hybrid sampling based on Mahalanobis distance for landslide susceptibility assessment: a case study of the 2018 Iburi earthquake induced landslides in Hokkaido, Japan. Geosci J 28, 71–94 (2024). https://doi.org/10.1007/s12303-023-0033-6
DOI: https://doi.org/10.1007/s12303-023-0033-6