Abstract
Low visibility events at King Khalid airport in Riyadh, Saudi Arabia, are investigated using hourly time series of meteorological and air pollution data from April 2015 to December 2017. The binary classification analysis uses two machine learning classifiers: random forest (RF) and K-nearest neighbors (KNN). Six models are presented, each built on a feature set selected using RF feature importance and the Pearson correlation matrix. Two resampling approaches (random oversampling and random undersampling) are applied to address the imbalance between the visibility event classes. An important finding is that oversampling outperforms undersampling for both classifiers, achieving higher accuracy and F1 scores. The RF classifier outperforms KNN under both resampling approaches, and RF with oversampling provides the best overall performance in terms of accuracy, F1 score, and area under the receiver operating characteristic curve (AUROC). The best model scores above 0.95 on every evaluation metric considered in the study. Air temperature and dewpoint temperature have minimal impact on performance, whereas particulate matter with aerodynamic diameter <10 μm (PM10) has a profound impact. PM10 has the highest importance (52%) for low visibility events according to the RF feature importance analysis, while the other pollutants and meteorological variables each show a relative importance between 5 and 10%. Overall, the best model is obtained when all variables except air temperature and dewpoint temperature are used to predict the visibility classes.
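The two resampling approaches and two of the evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code (the article lists code availability as not applicable); the function names, the toy PM10 values, and the class sizes below are all hypothetical and chosen only to show the mechanics. It uses the Python standard library rather than any particular machine learning package.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Collect feature rows under their class label."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes up to the largest class size."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        X_out.extend(rows + extra)
        y_out.extend([label] * target)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of larger classes down to the smallest class size."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        X_out.extend(rng.sample(rows, target))
        y_out.extend([label] * target)
    return X_out, y_out


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: one synthetic PM10 feature per hour;
# label 1 = low visibility event (rare), label 0 = normal visibility.
X = [[50.0 + i] for i in range(18)] + [[900.0], [1200.0]]
y = [0] * 18 + [1] * 2

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print("oversampled class counts:", Counter(y_over))
print("undersampled class counts:", Counter(y_under))

# A trivial classifier that never predicts a low visibility event.
always_normal = [0] * len(y)
print("accuracy:", accuracy(y, always_normal))
print("F1 score:", f1_score(y, always_normal))
```

On imbalanced data such as this toy set, a classifier that never predicts the rare class still reaches high accuracy while its F1 score for that class is zero, which is one reason a study of this kind reports F1 and AUROC alongside accuracy. In practice, resampling is usually applied to the training split only, so that the evaluation data keep the original class proportions.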
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00704-023-04697-6/MediaObjects/704_2023_4697_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00704-023-04697-6/MediaObjects/704_2023_4697_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00704-023-04697-6/MediaObjects/704_2023_4697_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00704-023-04697-6/MediaObjects/704_2023_4697_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00704-023-04697-6/MediaObjects/704_2023_4697_Fig5_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00704-023-04697-6/MediaObjects/704_2023_4697_Fig6_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00704-023-04697-6/MediaObjects/704_2023_4697_Fig7_HTML.png)
Data Availability
Data are available from the corresponding author upon request.
Code availability
Not applicable.
Funding
The authors received funding through a student scholarship from the Ministry of Education in Saudi Arabia.
Author information
Contributions
Saleh H. Alhathloul: formal analysis, investigation, writing (original draft). Abdul Khan: writing (review and editing). Ashok Mishra: writing (review and editing).
Ethics declarations
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
All authors have agreed to the present version of the manuscript and have no objection to its publication.
Conflict of interest
The authors declare no competing interests.
About this article
Cite this article
Alhathloul, S.H., Mishra, A.K. & Khan, A.A. Low visibility event prediction using random forest and K-nearest neighbor methods. Theor Appl Climatol 155, 1289–1300 (2024). https://doi.org/10.1007/s00704-023-04697-6
DOI: https://doi.org/10.1007/s00704-023-04697-6