Multi-source information fusion-driven corn yield prediction using the Random Forest from the perspective of Agricultural and Forestry Economic Management

Yang, Xuziqi; Hua, Zekai; Li, Liang; Huo, **ngheng; Zhao, Ziqiang

doi:10.1038/s41598-024-54354-9

Article
Open access
Published: 19 February 2024

Volume 14, article number 4052, (2024)

Scientific Reports

Xuziqi Yang¹,
Zekai Hua¹,
Liang Li²,
**ngheng Huo² &
…
Ziqiang Zhao³

Abstract

The objective of this study is to support prompt and accurate resource allocation, scientifically guide grain distribution, and enhance the precision and application stability of crop yield prediction (CYP), particularly for corn. A digital camera is used to capture digital images of a 60 m × 10 m experimental cornfield. The obtained data on corn yield and statistical growth then serve as inputs for multi-source information fusion (MSIF). The study proposes an MSIF-based CYP Random Forest model built on the fluctuating corn yield dataset. With regard to the spatial variability of the experimental cornfield, the fitting degree and prediction ability of the proposed MSIF-based CYP Random Forest are analyzed, with statistics collected from 1-hectare, 10-hectare, 20-hectare, 30-hectare, and 50-hectare experimental cornfields. Results indicate that the proposed MSIF-based CYP Random Forest model outperforms control models such as support vector machine (SVM) and Long Short-Term Memory (LSTM), achieving the highest prediction accuracy of 89.30%, surpassing SVM and LSTM by approximately 13.44%. Meanwhile, as the experimental field size increases, the proposed model demonstrates higher prediction accuracy, reaching a maximum of 98.71%. This study is anticipated to offer early warnings of potential factors affecting crop yields and to further advocate for the adoption of MSIF-based CYP. These findings hold significant research implications for personnel involved in Agricultural and Forestry Economic Management within the context of a developing agricultural economy.

Introduction

Food security remains an enduring global challenge, particularly in regions with limited food resources, such as North Africa and the Middle East, which are heavily reliant on food imports. Even in developed nations, the specter of famine persists. Over time, the concept of food security has taken on new dimensions¹. Food scarcity is the primary catalyst for famine, triggering a cascade of issues such as the escalation of global food prices. Notably, geopolitical conflicts, trade restrictions, economic sanctions, and a decelerating global economy can lead to significant spikes, with wheat and flour prices in North Africa soaring by over 100% and corn prices escalating by more than 70%²,³. In 2020, the Food and Agriculture Organization (FAO) of the United Nations (UN) and the World Food Programme (WFP) jointly published an early warning analysis report on regions experiencing extreme food insecurity⁴. The report underscores that without intervention, populations in Burkina Faso, northeastern Nigeria, southern Sudan, and parts of Yemen may imminently face famine if the situation continues to deteriorate⁵. Escalation of conflicts or impediments to humanitarian assistance could elevate the risk of famine. On a global scale, regions facing food insecurity have expanded from the initial four countries to 16, all at a high risk of extreme hunger⁶.

In 2022, the WFP appealed for donations to address the global famine crisis. However, given the intricate nature of the factors behind global famine, financial support alone can only alleviate surface-level problems without eradicating them⁷. As outlined in the 2022–2023 Global Agricultural Product Supply and Demand Forecast Report, global corn and soybean yields have diminished since February 2022, while global wheat and rice yields have increased. The structural changes can be attributed to ongoing declines in corn and soybean production in Argentina due to drought, wheat production in Russia and Ukraine, and increased rice production. Specifically, corn yield, exports, and consumption have all experienced reductions while wheat production continues to rise. Month-on-month (MoM), soybean production, crushing capacity, and inventory have decreased, while rice production and consumption have increased⁸. The report also underscores risks such as natural disasters, climate change, geopolitical tensions, and import–export policies. Ukraine, heavily reliant on corn as its primary export, faces a potential decrease in corn yield from 25.6 million tons in 2022 to 21.7 million tons in 2023, with the sown area expected to shrink from over 4 million hectares to 3.6 million hectares due to the Russia–Ukraine conflict⁹.

This study aims to scientifically guide the problem of food distribution by timely and accurate allocation of corn resources, and effectively improve the accuracy and application stability of corn yield prediction. The study focuses on experimental corn fields of different sizes, including plots of 1, 10, 20, 30, and 50 hectares. These fields of different sizes are selected to comprehensively evaluate the adaptability and predictive power of the model at various scales. This helps to better understand the performance of the model in diverse spatial dimensions and provides more comprehensive research results for the management of agricultural fields and agricultural economic systems at different scales. The results could offer early warning of factors that may affect crop yields and further promote the use of multivariate information data to predict corn yields. It has important research significance for the Agricultural & Forestry Economic Management (AFEM) and the development and construction of the agricultural economy. This study looks at the immediate causes of food shortages and delves into how these challenges can be addressed by improving crop yield prediction (CYP) models to ensure the sustainability of agricultural production.

In summary, the first paragraph of the introduction describes the challenges to global food security, particularly how food shortages can be a catalyst for famine and rising global food prices. The second paragraph elaborates on the multiple factors leading to food shortages, such as climate change, geopolitical tensions, etc., which exacerbate food insecurity. The third paragraph indicates that the CYP model must be improved to address these challenges effectively. This study aims to develop a more accurate prediction model for corn yields to better manage resources and afford early warning to mitigate the impact of food shortages. These three paragraphs are closely linked and together form this study's background, questions, and objectives.

Literature review

Machine learning-based CYP

In research on the application of machine learning methods in different fields, Zhang et al.¹⁰ amalgamated the deep belief network (DBN) and support vector machine (SVM) for cyberattack detection, achieving a notable level of accuracy. In a separate study, Volpato et al.¹¹ employed the Kalman filter to estimate the winter wheat yield at a national level, employing the Normalized Difference Vegetation Index (NDVI) to standardize the time series model. Additionally, Beguería et al.¹² utilized a gray model based on arable land data in Jilin Province, China, for predicting grain yield. The model considered key factors influencing grain production, including fertilizer usage, livestock, and acreage, resulting in a partial Mean Absolute Percentage Error (MAPE) of 6.67% and an overall MAPE of 5.20% within Jilin Province¹². Kross et al.¹³ predicted the relative yield of summer crops harvested in 2018 using remote sensors and multiple regression models. Their findings revealed that 20% of CYP errors were below 2%, and 40% were less than 5%¹³. Similarly, Olson et al.¹⁴ applied remote sensor-based CYP under the Crop Water Stress (CWS) scale, indicating effective CWS and soil moisture using the temperature vegetation drought index. Their study explored the interplay between temperature vegetation dryness index, solar radiation, and yields of winter crops in humid areas. Results demonstrated a Mean Relative Error (MRE) of 13.34%, with incident radiation significantly impacting crops in such regions¹⁴. Lin et al.¹⁵ proposed an SVM-enabled World Food Studies (WOFOST) grain yield prediction model, using corn yield in Changchun City, China, as an illustrative example. Comparative testing against independent SVM models revealed the proposed model’s superior accuracy, particularly in predicting crop disasters¹⁵. Furthermore, PS et al.¹⁶ integrated Principal Component Analysis (PCA) and Extreme Learning Machine (ELM) to predict short-term grain yields. 
Their comparison of predicted results with actual data yielded an MRE of 1.90% and 2.08% for short-term grain yield over three and five years, respectively. Simultaneously, accurate short-term grain yield predictions were achieved using the Backpropagation Neural Network (BPNN)¹⁶.

CYP using multi-source information fusion

In the realm of multi-source information fusion (MSIF)-based CYP research, Shook et al.¹⁷ utilized Remote Sensing Technology (RST) and satellites to monitor winter wheat growth. Their work introduced a winter wheat remote monitoring system and a yield prediction system strategically designed to safeguard the interests of farmers¹⁷. Building on extensive experimentation, Murtaza et al.¹⁸ integrated RST and Geographic Information System (GIS) to develop a comprehensive CYP system. Ji et al.¹⁹ integrated artificial neural networks (ANNs) and statistical methods into CYP models, incorporating various vegetation coefficients. Sharifi et al.²⁰ completed a regression analysis for US crop yield based on the NDVI, normalized water index, and dual-band enhanced vegetation index, achieving highly accurate results. Wolanin et al.²¹ formulated a corn yield regression equation by combining process model theory and RST during their study of corn yield in the Northeast agricultural lands of China. Their successful predictions aided local farmers in planning effective strategies²¹. Nevavuori et al.²² predicted winter wheat yield by linear regression models, incorporating resampling particle filter algorithms with county-level univariate data. They identified influencing factors related to specific management models affecting winter wheat yield per unit area²². Abdel Fattah et al.²³ utilized multi-temporal Unmanned Aerial Vehicle (UAV) remote sensing data to predict summer corn yield, demonstrating the superior predictive efficacy of multi-generational remote sensing over single-generation long-term predictions. Hara et al.²⁴ leveraged meteorological data to analyze soil water content, employing multi-linear regression to derive an optimal model. The resulting simple equation, characterized by coefficients that aptly explained and accurately estimated crop yield, showcased promising results²⁴. 
Archontoulis et al.²⁵ delved into the climate impact on seasonal CYP, incorporating dynamic factors like temperature, radiation, and rainfall. Their study revealed the significant influence of the proposed dynamic climate model on crop yield²⁵. Meanwhile, Dang et al.²⁶ established a rice yield regression model utilizing a Random Forest algorithm, primarily considering the rice spectral index. Although the proposed model demonstrated simplicity, ease of data acquisition, and high implementation efficiency, its limited robustness and failure to consider other characteristics of crop yield formation posed challenges in interpreting and analyzing yield prediction results²⁶.

Scholars have thus combined MSIF and machine learning methodologies for CYP research, predominantly acquiring multi-source data through hyperspectral images, drone images, and similar sources. However, operational constraints and elevated costs persist in practical applications. In the context of CYP research, the deployment of high-resolution cameras emerges as a viable alternative for crop monitoring. Nonetheless, utilizing a single machine learning approach for CYP emphasizes internal influencing factors of crops while overlooking external factors. The term “internal influencing factors” refers to intrinsic elements that affect crop growth and yield, such as soil quality, moisture levels, and fertilization. “External influencing factors”, by contrast, pertain to the impact of environmental elements on crop growth and yield, including climate, weather conditions, and pest infestations. This study focuses on how to consider and analyze these internal and external factors simultaneously in order to predict crop yield more comprehensively. Moreover, neural network models are susceptible to data limitations and may not capture the many factors influencing grain production, leading to substantial prediction errors. Based on the above analysis, this study endeavors to address these deficiencies in the existing literature, provide insights into the factors impacting crop yield (specifically corn), and advocate for the broader adoption of the MSIF technique in CYP. To achieve this objective, the Random Forest methodology is introduced to mitigate model overfitting and enhance noise robustness.
Specifically, the introduction of the MSIF technique in the Random Forest model combines with a volatile corn yield dataset, allowing the model to consider the influence of various internal and external factors on crop yield. Consequently, in this study, Random Forest is not merely utilized as a tool but is integrated with the MSIF technique to improve the accuracy and reliability of the model. Overall, this study anticipates corn yield within a designated geographical area with meticulous consideration for spatial variability. The research critically examines the fitting degree and CYP capabilities of the Random Forest model. Empirical findings indicate a notably high level of prediction accuracy and commendable reliability.

Research methodology

The core idea of the Random Forest algorithm

The Random Forest is an ensemble machine learning algorithm that aggregates the output of multiple decision trees to yield an enhanced outcome. Refining the “bagging” technique, the Random Forest assembles a robust learner from numerous parallel, independent weak learners of the same type. In classification tasks, the votes of the individual weak classifiers collectively determine the result. For regression problems, in contrast, the Random Forest algorithm computes the mean of the outputs of the weak learners, which suits CYP as a representative regression problem. Hence, this research opts for the Random Forest methodology, employing “bagging” across multiple binary decision trees²⁷,²⁸. The training process of a Random Forest is depicted in Fig. 1.

In Fig. 1, a Random Forest combines the outputs of a collection of independent decision trees, each of which poses a sequence of yes/no queries about elements within the dataset, to produce the final result. Prediction performance generally improves, with diminishing returns, as more decision trees are integrated into the Random Forest model. Because a Random Forest reflects the collective decision of the majority of its constituent trees, its result is typically better than that of any individual member. Meanwhile, the voting process among member trees curtails the errors of individual trees, since uncorrelated errors tend to cancel out. Figure 2 elucidates the specific training process of an individual decision tree.
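The aggregation step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the plot yields, and the per-tree outputs are hypothetical values chosen only to show bootstrap resampling (the core of bagging), averaging for regression, and majority voting for classification.

```python
import random
from collections import Counter
from statistics import mean


def bootstrap_sample(rows, rng):
    """Draw len(rows) items with replacement (the resampling step of bagging)."""
    return [rng.choice(rows) for _ in rows]


def aggregate_regression(tree_outputs):
    """Regression forest: average the outputs of the individual trees."""
    return mean(tree_outputs)


def aggregate_classification(tree_votes):
    """Classification forest: return the label that receives the most votes."""
    return Counter(tree_votes).most_common(1)[0][0]


rng = random.Random(0)
plots = [1400.0, 1650.0, 1800.0, 2050.0, 2200.0]  # hypothetical plot yields
sample = bootstrap_sample(plots, rng)  # training set for one tree

# Hypothetical outputs of four trees for one unseen plot.
forest_yield = aggregate_regression([1700.0, 1820.0, 1760.0, 1900.0])
forest_label = aggregate_classification(["high", "low", "high", "high"])
```

Each tree would be trained on its own bootstrap sample; only the aggregation rule differs between the regression and classification settings.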

In Fig. 2, training a decision tree centers on selecting and evaluating features and segmentation points. An exhaustive search is employed to identify them. Specifically, the procedure traverses all values of the C-th feature within the training set. Each value serves as a candidate segmentation point, and the impurity after segmentation is computed. The impurity of each candidate is then compared with the minimum impurity found so far for the current node. If the former is smaller than the latter, the segmentation point and the corresponding segmentation feature are stored. Once the optimal segmentation is determined, the training set is divided into two sets: the left subnode and the right subnode. The segmentation process is repeated recursively until every sub-node meets a stopping condition and is returned as a leaf²⁹,³⁰. The purity of the segmented nodes, as measured by Eq. (1), gauges the quality of features and segmentation points:

$$ G\left( {x_{i} ,v_{ij} } \right) = \frac{{n_{{{\text{left}}}} }}{{N_{s} }}H\left( {X_{{{\text{left}}}} } \right) + \frac{{n_{{{\text{right}}}} }}{{N_{s} }}H\left( {X_{{{\text{right}}}} } \right) $$

(1)

In Eq. (1), x_i and v_ij represent the segmented variable and its segmented value, respectively. n_left and n_right denote the numbers of training samples in the left and right sub-nodes. N_s signifies the total number of training samples at the node being split. The function H(X) denotes the node impurity function. Equations (2) and (3) illustrate two frequently employed impurity functions tailored for regression problems.

$$ H\left( {X_{m} } \right) = \frac{1}{{N_{m} }}\sum\limits_{{i \in N_{m} }} \left( {y_{i} - \overline{{y_{m} }} } \right)^{2} $$

(2)

$$ H\left( {X_{m} } \right) = \frac{1}{{N_{m} }}\sum\limits_{{i \in N_{m} }} \left| {y_{i} - \overline{{y_{m} }} } \right| $$

(3)

Equations (2) and (3) compute the Mean Square Error (MSE) and the Mean Absolute Error (MAE), respectively. N_m is the number of training samples at node m. y_i and $\overline{{y_{m} }}$ represent the true value of sample i and the value predicted at node m (the mean of the true values at that node), respectively. For the prediction of corn yield in this research, the MSE criterion is chosen. Equation (4) illustrates the regression results for a specific segmentation point:

$$ G\left( {x,v} \right) = \frac{1}{{N_{s} }}\left( {\mathop \sum \limits_{{y_{i} \in X_{left} }} \left( {y_{i} - \overline{{y_{{{\text{left}}}} }} } \right)^{2} + \mathop \sum \limits_{{y_{j} \in X_{right} }} \left( {y_{j} - \overline{{y_{{{\text{right}}}} }} } \right)^{2} } \right) $$

(4)

In Eq. (4), G(x, v) represents the weighted sum of the impurity levels of the two sub-nodes. N_s denotes the number of training samples at the node being split. The variables y_i and y_j denote the actual values of samples i and j in the left and right sub-nodes, respectively. Additionally, $\overline{{y_{left} }}$ and $\overline{{y_{right} }}$ represent the mean values of the training samples assigned to the left and right sub-nodes. Following the establishment of decision trees, the classification outcome of Random Forest is computed using Eq. (5):

$$ H\left( x \right) = \mathop {\arg \max }\limits_{Y} \sum\limits_{i = 1}^{k} W\left( {h_{i} \left( x \right) = Y} \right) $$

(5)

In Eq. (5), H(x) signifies the ultimate outcome derived from the Random Forest. W represents the Classification and Regression Tree (CART) model. The term h_i(x) denotes the classification model for each individual decision tree, and Y represents the classification result of h_i(x).
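The exhaustive split search described for Fig. 2, combined with the MSE criterion of Eq. (4), can be sketched as follows. This is an illustrative sketch rather than the authors' code: the helper names `sse` and `best_split`, the two example features (temperature and sunshine suitability), and the yield values are assumptions made for the example.

```python
def sse(values):
    """Sum of squared deviations from the mean (N_m times the MSE of Eq. (2))."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(X, y):
    """Search every feature and candidate threshold, minimising G(x, v) of Eq. (4).

    X is a list of feature rows and y the list of targets at the current node.
    Returns (feature_index, threshold, G), or None if no valid split exists.
    """
    n_s = len(y)
    best = None
    for c in range(len(X[0])):                    # traverse every feature
        for v in sorted({row[c] for row in X}):   # every observed value
            left = [t for row, t in zip(X, y) if row[c] <= v]
            right = [t for row, t in zip(X, y) if row[c] > v]
            if not left or not right:
                continue                          # skip degenerate splits
            g = (sse(left) + sse(right)) / n_s    # Eq. (4)
            if best is None or g < best[2]:
                best = (c, v, g)
    return best


# Hypothetical node: [temperature in deg C, sunshine suitability] -> yield.
X = [[16, 4.8], [18, 5.2], [24, 5.0], [28, 5.1]]
y = [1400.0, 1500.0, 2100.0, 2200.0]
split = best_split(X, y)
```

In this toy node the search selects feature 0 with threshold 18, which separates the two low-yield samples from the two high-yield ones. A full tree would apply `best_split` recursively to the resulting left and right subsets.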

Random Forests demonstrate proficiency in managing high-dimensional data, where the significance of trained features plays a pivotal role in influencing prediction outcomes³¹. The computation for the importance of a node k in a Random Forest is expressed as shown in Eq. (6):

$$ n_{k} = w_{k} G_{k} - w_{{{\text{left}}}} G_{{{\text{left}}}} - w_{{{\text{right}}}} G_{{{\text{right}}}} $$

(6)

In Eq. (6), w_k represents the ratio of the number of training samples at node k to the total number of training samples. Likewise, w_left and w_right denote the ratios of the number of training samples on the left subnode and the right subnode to the total number of training samples, respectively. Additionally, G_k, G_left, and G_right signify the impurity levels of node k, the left subnode, and the right subnode, respectively. The computation for feature importance is articulated as shown in Eq. (7):

$$ f_{i} = \frac{{\sum\nolimits_{{j \in {\text{nodes}}\,{\text{split}}\,{\text{on}}\,{\text{feature}}\,i}} {n_{j} } }}{{\sum\nolimits_{{k \in {\text{all}}\,{\text{nodes}}}} {n_{k} } }} $$

(7)

In Eq. (7), the denominator sums the importance n_k over all nodes, while the numerator sums n_j over the nodes j at which feature i is used for segmentation. Ultimately, the importance of features undergoes normalization, ensuring that their cumulative sum equates to 1. The precise calculation is elucidated as shown in Eq. (8):

$$ f_{ni} = \frac{{f_{i} }}{{\mathop \sum \limits_{{j \in {\text{ total features}}}} f_{j} }} $$

(8)
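Equations (6) to (8) can be traced with a short Python sketch. The node records below are hypothetical (invented weights and impurities for a three-node tree), and the function names are introduced only for illustration.

```python
from collections import defaultdict


def node_importance(w_k, g_k, w_left, g_left, w_right, g_right):
    """Eq. (6): weighted impurity decrease produced by splitting node k."""
    return w_k * g_k - w_left * g_left - w_right * g_right


def feature_importances(nodes):
    """Eqs. (7)-(8): share of total node importance per feature, normalised.

    Each node is (feature_name, w_k, G_k, w_left, G_left, w_right, G_right).
    """
    per_feature = defaultdict(float)
    for feature, *terms in nodes:
        per_feature[feature] += node_importance(*terms)
    total = sum(per_feature.values())
    f = {name: n / total for name, n in per_feature.items()}  # Eq. (7)
    norm = sum(f.values())
    return {name: value / norm for name, value in f.items()}  # Eq. (8)


# Hypothetical internal nodes of a single small tree.
nodes = [
    ("temperature", 1.0, 100.0, 0.5, 40.0, 0.5, 60.0),
    ("crop_coefficient", 0.5, 40.0, 0.25, 20.0, 0.25, 20.0),
    ("temperature", 0.5, 60.0, 0.25, 30.0, 0.25, 50.0),
]
importances = feature_importances(nodes)
```

For a single tree the values from Eq. (7) already sum to 1, so the normalization of Eq. (8) mainly matters once importances are combined across the trees of a forest.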

MSIF-based corn yield data collection

In order to prognosticate corn yield, this section deploys a digital camera positioned above the experimental field, capturing a comprehensive image of the cornfield measuring 60 m in length and 10 m in width. Subsequently, data pertaining to corn yield, development, and growth within the specified area are incorporated as a source of multi-source information for CYP. Figure 3 illustrates the digital image of the experimental cornfield.

In Fig. 3, datasets from the cornfield’s RGB (Red, Green, and Blue) and hyperspectral camera comprise Portable Network Graphics (PNG) and Matrix (MAT) file formats. The analysis of multi-source information from the experimental cornfield is depicted in Fig. 4.

Within Fig. 4, the growth and development of corn are categorized into six stages. In the sowing and germination stage, optimal conditions for corn growth are a temperature range of 16 to 18 °C, sunshine suitability of 4.77, and a crop coefficient of 0.354. The germination and jointing stage and the jointing and tasseling stage exhibit optimal conditions with a growth temperature of 24–28 °C, sunshine suitability of 5.08, and a crop coefficient of 0.773. In the tasseling and filling stage, the corn thrives under a suitable temperature of 20–25 °C, sunshine suitability of 5.16, and a crop coefficient of 1.288. Similarly, the filling and milk stage benefits from a suitable growth temperature, sunshine suitability, and crop coefficient of 20 to 25 °C, 5.21, and 1.167, respectively. Lastly, the milk and mature stage demonstrates optimal growth conditions at a temperature range of 18 to 23 °C, sunshine suitability of 5.24, and a crop coefficient of 0.615. According to El-Hendawy et al.’s³² study, spring wheat yield can reach up to 1050 ** under suitable conditions encompassing soil fertility, climate, vegetation variety, and field management. Spring wheat yield and land productivity under different conditions were estimated by a multivariate ensemble model integrating biophysical parameters and hyperspectral index³². Therefore, based on pertinent data, an estimate suggests that the 30 m × 10 m crop field could yield between 1400 and 2200 **. The corn yield of the experimental cornfield in this study also falls within this range of 1400–2200 **. Consequently, the implementation of digital image-based CYP is considered reasonable. Based on this premise, a CYP Random Forest model is executed using the corn fluctuation yield dataset. Figure 5 illustrates the proposed MSIF-based CYP Random Forest model.

Experimental preparation

The experimental environment is configured with the Windows 10 Operating System, featuring the AMD R7-5800H 3.2 GHz Central Processing Unit (CPU), 16 GB Random Access Memory (RAM), Python 3.6, and an integrated development environment of Python 1.3. The camera utilized in the experiment is a high-altitude parabolic network camera with a resolution of 2560 × 1440, equipped with a 1/1.8-inch black light level image sensor and an F1.4 large aperture lens. This camera supports parabolic event alarm, trajectory rendering, message push, video viewing, intelligent perimeter defense, and motion detection. Furthermore, for assessing the performance of the proposed MSIF-based CYP Random Forest model, evaluation metrics such as MSE, Root Mean Squared Error (RMSE), and the coefficient of determination R² are selected as indicators. The specific calculations for R² and RMSE are shown in Eqs. (9) and (10), respectively:

$$ R^{2} = 1 - \frac{{\sum\limits_{i = 1}^{n} \left( {y_{i} - \widehat{{y_{i} }}} \right)^{2} }}{{\sum\limits_{i = 1}^{n} \left( {y_{i} - \overline{y} } \right)^{2} }} $$

(9)

$$ RMSE = \sqrt {\frac{{\sum\limits_{i = 1}^{n} \left( {y_{i} - \widehat{{y_{i} }}} \right)^{2} }}{n}} $$

(10)

In Eqs. (9) and (10), n represents the number of samples, i indexes the ith sample, and y_i represents the actual production data for the ith sample. The simulated yield of the ith sample is denoted by $\widehat{{y_{i} }}$, while $\overline{y}$ signifies the average of the sample data. The evaluation indexes for assessing the model’s performance are calculated through Eqs. (11) and (12):

$$ AE = \left| {y_{p} - y_{r} } \right| $$

(11)

$$ {\text{Accuracy }} = 1 - \left| {\frac{{y_{p} - y_{r} }}{{y_{r} }}} \right| $$

(12)

In Eqs. (11) and (12), AE, y_r, and y_p represent the absolute error, the actual output, and the simulated output, respectively.
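The four evaluation measures of Eqs. (9) to (12) translate directly into code. The sketch below uses plain Python and hypothetical yield values; it is meant only to make the definitions concrete, not to reproduce the study's evaluation pipeline.

```python
from math import sqrt


def r_squared(actual, predicted):
    """Eq. (9): coefficient of determination."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def rmse(actual, predicted):
    """Eq. (10): root mean squared error."""
    n = len(actual)
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def absolute_error(y_p, y_r):
    """Eq. (11): absolute error between simulated and actual output."""
    return abs(y_p - y_r)


def accuracy(y_p, y_r):
    """Eq. (12): one minus the relative error of the simulated output."""
    return 1.0 - abs((y_p - y_r) / y_r)


# Hypothetical actual and simulated yields for three plots.
actual = [1400.0, 1800.0, 2200.0]
predicted = [1350.0, 1750.0, 2100.0]

r2 = r_squared(actual, predicted)
err = rmse(actual, predicted)
ae = absolute_error(predicted[2], actual[2])
acc = accuracy(predicted[2], actual[2])
```

Note that RMSE and AE carry the unit of the yield itself, whereas R² and the accuracy of Eq. (12) are dimensionless.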

Results and discussions

Performance analysis of an MSIF-based CYP Random Forest model

This section utilizes the data from the 60 m × 10 m experimental field as input for the prediction of corn yield. The performance results are illustrated in Fig. 6.

As depicted in Fig. 6, the corn yield within the experimental area spans from 1400 to 2200 **, whereas the predicted corn yield varies between 1350 and 2100 **. Notably, the average yield for the experimental area is 1820.72 **, compared with an average predicted corn yield of 1602.765 **. The average accuracy of the proposed MSIF-based CYP Random Forest model is 87.72%. Moreover, the model’s MSE and RMSE are computed as 0.0196 and 0.14, respectively. Furthermore, SVM, Long Short-Term Memory (LSTM), BPNN, and Multiple Linear Regression (MLR) are designated as control models. A comparative analysis of the performance between the control models and the proposed CYP Random Forest model is presented in Fig. 7.
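As a quick arithmetic check on the figures reported for Fig. 6, the sketch below applies Eq. (12) to the two average yields and verifies that the two reported error values are related as a squared error and its root. The variable names are illustrative. Applying Eq. (12) to the averages is an assumption made here for the check; the reported 87.72% is presumably an average of per-sample accuracies, which need not equal the accuracy of the averages.

```python
from math import isclose, sqrt

actual_mean = 1820.72      # average measured yield reported for Fig. 6
predicted_mean = 1602.765  # average predicted yield reported for Fig. 6

# Eq. (12) applied to the two averages rather than to individual samples.
accuracy_of_means = 1 - abs((predicted_mean - actual_mean) / actual_mean)

# RMSE is the square root of MSE, so the larger reported error value
# should be the square root of the smaller one.
errors_consistent = isclose(sqrt(0.0196), 0.14)
```

The accuracy of the averages comes out at roughly 88.0%, close to the reported average accuracy, and 0.14 is indeed the square root of 0.0196.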

According to Fig. 7, the proposed CYP Random Forest model exhibits the smallest MSE and MAE on the same test sample, whereas the BPNN model has the largest error. The error magnitude of the proposed model is smaller than that of the control models, indicating a better reflection of the actual situation. Moreover, as illustrated in Fig. 7c, the highest prediction accuracy achieved by the proposed CYP model is 89.30%, outperforming LSTM, SVM, BPNN, and MLR with prediction accuracies of 78.30%, 83.50%, 76.80%, and 71.20%, respectively. In summary, the proposed MSIF-based CYP Random Forest model demonstrates superior performance, surpassing SVM and LSTM by a prediction accuracy margin of 13.44%.

Validation of the proposed MSIF-based CYP Random Forest model

Subsequently, the fitting degree and prediction ability of the proposed MSIF-based CYP Random Forest model are scrutinized using statistical corn yield data as a verification dataset. Specifically, the corn yield is predicted within the 1-hectare, 10-hectare, 20-hectare, 30-hectare, and 50-hectare experimental fields, and the results are illustrated in Fig. 8.

Figure 8 illustrates that the CYP curve for each field size aligns closely with the true value. Specifically, on the 1-hectare, 10-hectare, 20-hectare, 30-hectare, and 50-hectare fields, the predicted corn yield ranges are 19,680.4–25,814.92 **, 217,263.7–438,867.9 **, 443,898.6–559,433.2 **, 668,475.7–853,015.6 **, and 157,907.1–1,436,498 **, respectively. The forecasted corn yield for different-sized fields falls within the actual value range. The accuracy of the proposed MSIF-based CYP Random Forest model is depicted in Fig. 9.

According to Fig. 9, the accuracy of the proposed MSIF-based CYP Random Forest model on 1-hectare, 10-hectare, 20-hectare, 30-hectare, and 50-hectare fields is 85.81%, 86.57%, 88.12%, 88.98%, and 90.35%, respectively. Moreover, as the experimental field expands in size, the proposed MSIF-based CYP Random Forest model demonstrates higher accuracy, reaching a maximum of 98.71%.

Conclusion

In order to advance the application of the MSIF technique in CYP and agricultural and forestry management, this study introduces the Random Forest method to predict corn yield, taking into account its spatial variation. Consequently, the MSIF-based CYP Random Forest model is proposed, and its fitting degree and prediction ability are evaluated, yielding highly accurate prediction results. The research findings reveal that the proposed model achieves a peak prediction accuracy of 89.30%. Specifically, the accuracy on 1-hectare, 10-hectare, 20-hectare, 30-hectare, and 50-hectare test fields reaches 85.81%, 86.57%, 88.12%, 88.98%, and 90.35%, respectively. Therefore, the proposed MSIF-based CYP Random Forest model proves effective in predicting corn yield. Finally, it is essential to acknowledge certain research limitations, such as the omission of regional and terrain differences and various other factors influencing corn yield. Future research endeavors should aim to incorporate additional factors for a more accurate prediction of corn yield in the study (Supplementary Informations 1, 2, 3, 4).

Data availability

All data generated or analyzed during this study are included in this published article and its supplementary information files.

Author information

Authors and Affiliations

College of Economics and Management, Northwest A&F University, Yangling, 712100, Shaanxi, China
Xuziqi Yang & Zekai Hua
Laboratory of Walnut Research Center, College of Forestry, Northwest A&F University, Yangling, 712100, Shaanxi, China
Liang Li & Xingheng Huo
College of Humanities and Social Development, Northwest A&F University, Yangling, 712100, Shaanxi, China
Ziqiang Zhao

Contributions

X.Y., Z.H., and L.L. contributed to conception and design of the study. X.H. organized the database. Z.Z. performed the statistical analysis. X.Y. and Z.H. wrote the first draft of the manuscript. L.L., X.H., and Z.Z. wrote sections of the manuscript. All authors contributed to manuscript revision, read, and approved the submitted version.

Corresponding author

Correspondence to Xuziqi Yang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information 1.

Supplementary Information 2.

Supplementary Information 3.

Supplementary Information 4.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Yang, X., Hua, Z., Li, L. et al. Multi-source information fusion-driven corn yield prediction using the Random Forest from the perspective of Agricultural and Forestry Economic Management. Sci Rep 14, 4052 (2024). https://doi.org/10.1038/s41598-024-54354-9

Received: 11 August 2023
Accepted: 12 February 2024
Published: 19 February 2024
DOI: https://doi.org/10.1038/s41598-024-54354-9
Springer Nature Limited
