Estimation of soil organic matter in the Ogan-Kuqa River Oasis, Northwest China, based on visible and near-infrared spectroscopy and machine learning

  • Research Article
Journal of Arid Land

Abstract

Visible and near-infrared (vis-NIR) spectroscopy allows fast and efficient determination of soil organic matter (SOM). However, predicting SOM from vis-NIR spectra first requires the effective removal of redundant spectral information. This study therefore compared three wavelength selection strategies for extracting the spectral response characteristics of SOM. The SOM content and spectra of 110 soil samples from the Ogan-Kuqa River Oasis were measured under laboratory conditions in July 2017. Pearson correlation analysis was used to preselect, from the preprocessed spectra, the wavelengths whose correlation with SOM passed the significance test at the 0.01 level. The successive projection algorithm (SPA), competitive adaptive reweighted sampling (CARS), and the Boruta algorithm were then used to select the optimal variables from the preselected wavelengths. Finally, partial least squares regression (PLSR) and random forest (RF) models were built on the optimal wavelengths to estimate SOM content quantitatively. The results show that the selected optimal variables were located mainly near the spectral absorption features at 1400.0, 1900.0, and 2200.0 nm, and that CARS and the Boruta algorithm also selected a few visible wavelengths in the range of 480.0–510.0 nm. Both models predicted SOM content satisfactorily, and the RF model was more accurate than the PLSR model. The model combining the Boruta algorithm with RF performed best: it used 23 variables and achieved a coefficient of determination (R²) of 0.78 and a residual prediction deviation (RPD) of 2.38. The Boruta algorithm effectively removed redundant information and identified the optimal wavelengths, improving the accuracy of the estimated SOM content.
Combining vis-NIR spectroscopy with machine learning is therefore an important approach for improving the accuracy of SOM prediction in arid lands.

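The core of SPA is a chain of orthogonal projections: starting from one wavelength, each step keeps the variable whose projection onto the subspace orthogonal to the previously selected variables has the largest norm, which favors wavelengths with low collinearity. The sketch below assumes the starting column `k0` and the number of variables `n_select` are given; in practice both are usually chosen by comparing validation errors, which is omitted here. The function and argument names are hypothetical.

```python
import math


def spa(X, k0, n_select):
    """Hypothetical SPA projection step.

    X: list of samples (rows), each a list of variable values.
    k0: index of the starting variable; n_select: number of variables to keep.
    """
    n_vars = len(X[0])
    # Column-major copy so that each wavelength is one vector
    cols = [[row[j] for row in X] for j in range(n_vars)]
    selected = [k0]
    for _ in range(n_select - 1):
        last = cols[selected[-1]]  # already projected in earlier steps
        last_sq = sum(v * v for v in last)
        best_j, best_norm = None, -1.0
        for j in range(n_vars):
            if j in selected:
                continue
            # Project column j onto the orthogonal complement of `last`
            coef = sum(a * b for a, b in zip(cols[j], last)) / last_sq
            cols[j] = [a - coef * b for a, b in zip(cols[j], last)]
            norm = math.sqrt(sum(v * v for v in cols[j]))
            if norm > best_norm:
                best_j, best_norm = j, norm
        selected.append(best_j)
    return selected
```

Because every remaining column is overwritten by its projection, later steps automatically measure what is left after removing all earlier selections, not just the most recent one.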

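CARS repeatedly fits PLS models on Monte Carlo subsets of the samples and shrinks the wavelength set after each run: an exponentially decreasing function (EDF) first forces the number of retained wavelengths down, and adaptive reweighted sampling based on the magnitude of the PLS regression coefficients then picks among them. The sketch below computes only the EDF schedule, r_i = a * exp(-k * i) with a = (p/2)^(1/(N-1)) and k = ln(p/2)/(N-1), as in the commonly used formulation of CARS, where p is the number of preselected wavelengths and N the number of sampling runs. The PLS fitting, reweighting, and cross-validated choice of the final subset are not shown, and the function name is hypothetical.

```python
import math


def cars_retention_schedule(n_vars, n_runs):
    """Hypothetical helper: wavelengths kept after each CARS run under the EDF.

    The ratio r_i = a * exp(-k * i) equals 1 at run 1 and 2 / n_vars at the
    last run, so the final run keeps two wavelengths.
    """
    a = (n_vars / 2) ** (1 / (n_runs - 1))
    k = math.log(n_vars / 2) / (n_runs - 1)
    return [round(n_vars * a * math.exp(-k * i)) for i in range(1, n_runs + 1)]
```

The fast early shrinkage discards most uninformative wavelengths in the first runs, while the slow tail lets the algorithm compare many small candidate subsets.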


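The two accuracy statistics reported in the abstract can be computed as follows, assuming the common definitions R² = 1 - SS_res / SS_tot and RPD = standard deviation of the observed values divided by the RMSE of prediction (sample standard deviation). The study may use slightly different conventions, for example R² as a squared correlation coefficient; the function name is hypothetical.

```python
import math
import statistics


def r2_and_rpd(observed, predicted):
    """Hypothetical helper: coefficient of determination and residual prediction deviation."""
    mean_obs = statistics.fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / len(observed))
    rpd = statistics.stdev(observed) / rmse  # spread of the data relative to the prediction error
    return r2, rpd
```

For observed values [1, 2, 3, 4, 5] and predictions [1.1, 1.9, 3.2, 3.8, 5.0], this gives R² = 0.99 and RPD of about 11.2.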

Acknowledgements

This study was supported by the Key Project of the Natural Science Foundation of Xinjiang Uygur Autonomous Region, China (2021D01D06) and the National Natural Science Foundation of China (41961059). We thank the anonymous reviewers for their insightful comments, which helped improve the quality of this manuscript.

Author information

Correspondence to Jianli Ding.

About this article

Cite this article

Zhou, Q., Ding, J., Ge, X. et al. Estimation of soil organic matter in the Ogan-Kuqa River Oasis, Northwest China, based on visible and near-infrared spectroscopy and machine learning. J. Arid Land 15, 191–204 (2023). https://doi.org/10.1007/s40333-023-0094-4

  • DOI: https://doi.org/10.1007/s40333-023-0094-4