Landslide Susceptibility Prediction based on Decision Tree and Feature Selection Methods

  • Research Article
Journal of the Indian Society of Remote Sensing

Abstract

Landslides cause considerable destruction and loss of life in hilly areas. To reduce the damage in these vulnerable regions, predicting landslide incidents with good accuracy remains a key challenge. Over the years, machine learning models have been used to increase the accuracy and precision of landslide predictions. These models are sensitive to the data to which they are applied. Feature selection is therefore a crucial step, as carefully selected features can significantly improve model performance, decrease learning time, and increase comprehensibility. In this paper, we consider three feature selection methods: chi-squared, extra tree classifier, and heat map. The paper substantiates that feature selection can significantly increase the performance of the model. The study was carried out on landslide data from the Kullu to Rohtang Pass transport corridor in Himachal Pradesh, India. The classification score and receiver operating characteristic (ROC) curves were used to evaluate model performance. The results show that eliminating one or more features using different feature selection methods increased the comprehensibility of the model by reducing the dimensionality of the dataset. The model achieved an accuracy of 90.74% and an area under the ROC curve (AUROC) of 0.979. Furthermore, with a reduced number of features, the model learns faster without affecting the actual result.
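To make two of the quantities named in the abstract concrete, the following is a minimal sketch of chi-squared feature scoring and of AUROC, written in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the feature names (`slope_class`, `aspect_class`), the eight toy samples, and the helper names `chi2_score` and `auroc` are all hypothetical, and the textbook Pearson statistic used here may differ from the exact variant applied in the study.

```python
from collections import Counter


def chi2_score(feature, labels):
    """Pearson chi-squared statistic between a categorical feature and class labels."""
    n = len(labels)
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    observed = Counter(zip(feature, labels))
    stat = 0.0
    for f, fc in feature_counts.items():
        for l, lc in label_counts.items():
            # Count expected in this cell if feature and label were independent.
            expected = fc * lc / n
            stat += (observed[(f, l)] - expected) ** 2 / expected
    return stat


def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 8 map cells, label 1 = landslide, 0 = no landslide.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "slope_class": ["steep", "steep", "steep", "gentle",
                    "gentle", "gentle", "gentle", "steep"],
    "aspect_class": ["N", "S", "N", "S", "N", "S", "N", "S"],
}

# Rank features by chi-squared score; low-scoring features are candidates for removal.
ranking = sorted(features, key=lambda name: chi2_score(features[name], labels), reverse=True)
print(ranking)  # ['slope_class', 'aspect_class']

# Hypothetical predicted landslide probabilities for the same 8 cells.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.1]
print(auroc(scores, labels))  # 0.9375
```

In this toy data, `slope_class` scores 2.0 and `aspect_class` scores 0.0, so a chi-squared filter would drop `aspect_class` first. The pairwise AUROC shown here is quadratic in the number of samples, which is acceptable for illustration; a library routine would normally be used on real data.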

Acknowledgements

This study forms part of my Ph.D. research in the Department of Geography, Delhi School of Economics, University of Delhi, India. We are obliged to the University Grants Commission (UGC) for granting a fellowship for the research. We are also grateful to the National Disaster Management Authority (NDMA), Government of India; the Border Road Organization (BRO), Manali; and the Public Work Department (PWD), Kullu, for providing landslide data. We also thank O P Gupta, Network Administrator, Central Library, University of Delhi, for his contribution to improving the manuscript.

Author information

Corresponding author

Correspondence to Mukesh Prasad.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Nirbhav, Malik, A., Maheshwar et al. Landslide Susceptibility Prediction based on Decision Tree and Feature Selection Methods. J Indian Soc Remote Sens 51, 771–786 (2023). https://doi.org/10.1007/s12524-022-01645-1

  • DOI: https://doi.org/10.1007/s12524-022-01645-1
