Introduction

Biological invasions can have serious global consequences, including ecological destruction and economic losses, if they are not handled properly. In agriculture in particular, crop losses and pest control can be extremely expensive1. Increases in international trade and transportation have established novel pathways for the spread of invasive species2, which worsens the situation.

The codling moth, Cydia pomonella L. (Lepidoptera: Tortricidae), is one of the most damaging and economically important apple pests, and it can cause complete crop losses in untreated apple orchards3. The codling moth is a multivoltine species, and adaptive traits such as facultative diapause and multiple generations per breeding season have allowed it to adapt to diverse climatic conditions. Although the flight capacity of the codling moth is limited4, it can spread over long distances through the transportation of infested fruit and packing material, and this has become the most common route for colonization of new habitats. The codling moth is considered to have originated in south-eastern Europe5. Over the last two centuries, it has dispersed throughout the world and reached an almost global distribution. The codling moth is now a cosmopolitan insect that occurs in almost every country where apples are grown, making it one of the most successful pest insect species in terms of invasiveness6.

Prevention of biological invasions is much less expensive than post-entry control1. Detailed knowledge of the geographic and ecological distribution of a species is fundamental for conservation planning and forecasting7. To reduce the ecological destruction and economic losses caused by codling moth invasions, it is essential to understand the potential distribution of codling moths for risk assessment and decision making. In recent years, ecological niche models (ENMs) have become an effective tool for assessing the potential risk of establishment of invasive species. Two main types of ecological niche model are used: correlative models (e.g., Maxent, GARP, ENFA) and process-based distribution models (e.g., CLIMEX). Several researchers have used these models to estimate the potential risk posed by codling moths. Liang et al. analysed the suitability for codling moths in China based on biological data of codling moths and meteorological data from 760 weather sites by using CLIMEX and ArcGIS8. Svobodova et al. investigated the historical occurrence of the codling moth in southern Moravia and northern Austria by using CLIMEX9. Vavrovic et al. used the CLIMEX model to estimate the potential codling moth infestation pressure in Slovakia under the conditions of climate change10. Zhao et al. adopted an ecological niche model, Maxent, to interpret the disjunct distribution and potential distribution of codling moths in China and identified the relative roles of climate, humans and vegetation with respect to the present codling moth distribution11. Kumar et al. used the CLIMEX and Maxent models to map the global risk of codling moth establishment and compared the results of the two models12.

However, the current studies have some limitations. Firstly, the codling moth occurrence records that have been used to fit the models are inadequate. Secondly, most of the studies have focused only on the climatic conditions for the establishment of codling moths, ignoring the availability of host plants and the increased possibility of transportation. Although climatic conditions are a major determinant of the potential distribution of codling moths, the risk of establishment in new places is also strongly influenced by human factors. Thirdly, many studies have adopted mechanistic models, which are suitable for broad-scale predictions but perform poorly at local scales. Lastly, most research is regional, and there are few studies on the potential distribution of codling moths at the global scale. To address these problems, codling moth occurrence records were collected from multiple sources in this study. In addition, a maximum entropy model, which is a machine learning method, was used to simulate the potential global distribution of codling moths with global accessibility data, apple yield data, elevation data and 19 bioclimatic variables. Together, these inputs account for the ecological characteristics and the expansion channels that cover the processes from growth and survival to the dispersal of codling moths.

Results

Potential distribution of codling moth

The potential distribution of codling moth predicted by the Maxent model is in good agreement with the regions where the codling moth is currently known to occur (Fig. 1). Overall, the areas predicted to be suitable for codling moths are distributed on all continents except Antarctica, and they are located mainly in Europe, North America and Asia. It is noteworthy that few areas between the latitudes of 20°N and 20°S, or poleward of 70°N or 70°S, are predicted to be suitable for codling moths. By contrast, most of the suitable areas are distributed between the latitudes of 30° and 60°. In addition, the distribution of codling moths differs markedly among continents.

Figure 1

Global potential distribution of codling moth using Maxent.

In Asia (Supplementary Fig. S1), the areas that were predicted to be suitable for codling moths are primarily distributed in Central Asia and East Asia. The model predicted higher codling moth suitability in China but lower suitability in Central Asian countries including Kazakhstan, Uzbekistan, and Tajikistan. The suitable regions covered most of the apple-growing countries such as China, Turkey, Azerbaijan, Japan, North Korea, South Korea, India, Iran, Pakistan and Kazakhstan. In China, the model predicted no or very low suitability in the southern provinces and the Qinghai-Tibetan Plateau, which might result from the lack of host plants. Medium or highly suitable areas are distributed in most of the apple-producing provinces including [...]

[...] Zhao's11 and Kumar's12. Kumar's study predicted more suitable areas in Central Asia and northeast China than this study. By contrast, our study predicted more suitable areas in Northern Europe, such as Norway, Sweden and Finland. One reason might be the different distribution of occurrence records; another is that we considered more input variables, such as global accessibility and apple yield data. In addition, the suitability patterns in our study differ from those of Kumar and Zhao. In Kumar's study, the degree of suitability was distributed roughly along lines of latitude, and suitability in the two hemispheres was almost symmetrical regardless of the discontinuous continents. In our study, the predicted suitability for codling moth is less regularly distributed and, at the global scale, looks similar to that in Zhao's study. From a local perspective, however, there are notable differences: the suitability in our study includes more detail, which is highly correlated with global accessibility.

The model also predicted suitable environments for codling moths in some regions where they do not yet occur but where their preferred host plants are present. Finally, the major codling moth predictors were extracted: global accessibility, mean temperature of the coldest quarter, precipitation of the driest month, annual mean temperature and apple yield were the most important predictors associated with the global distribution of codling moths. This information is useful for assessing the risk of codling moth colonization in new areas.

However, this study has some limitations that could be addressed by additional research. The codling moth occurrence data are still insufficient in some regions, and the Maxent model prediction may be affected by occurrence points that are not uniformly distributed. In addition, biological invasions are very complex processes, and other factors also affect the potential distribution of codling moths. Therefore, more complex models and additional factors will be the focus of our future research.

Methods

The main work of this paper can be summarized in the following steps.

Step 1: Codling moth occurrence records were collected from multiple sources, ensuring the highest possible data integrity.

Step 2: A high-resolution spatial dataset was produced, which included the ecological characteristics and the expansion channels that cover the processes from growth and survival to the invasion of codling moth.

Step 3: A Maxent model was built to simulate the potential global distribution of codling moths.

The technical flow chart of this study is shown in Fig. 5.

Figure 5

Technical flow chart of this study.

Occurrence data

Georeferenced occurrence data of codling moths were collected from three different sources: (1) existing open source data, accessed mainly via the online Global Biodiversity Information Facility (GBIF) database14, which is one of the most popular sources of species distribution data; (2) published articles and maps of codling moth occurrences, from which occurrence locations were extracted; and (3) government documents, reports and related supporting materials. Occurrence data from the GBIF covered most regions of the world where the codling moth is known to occur except Asia, South America and Africa. Therefore, occurrence data from the published literature and government reports were used to supplement the GBIF data. After duplicate occurrence records were removed, a total of 1776 occurrence records remained as Maxent input occurrence data (Fig. 6).
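The duplicate-removal step can be illustrated with a minimal sketch. It assumes that each record is a (longitude, latitude, source) tuple and that two records are duplicates when their coordinates agree after rounding. The function name `deduplicate_records`, the rounding precision and the sample coordinates are hypothetical and are not taken from this study.

```python
from typing import Iterable, List, Tuple

# Hypothetical record layout: (longitude, latitude, source label).
Record = Tuple[float, float, str]


def deduplicate_records(records: Iterable[Record], decimals: int = 4) -> List[Record]:
    """Keep the first record seen at each rounded (lon, lat) position."""
    seen = set()
    unique = []
    for lon, lat, source in records:
        key = (round(lon, decimals), round(lat, decimals))
        if key not in seen:
            seen.add(key)
            unique.append((lon, lat, source))
    return unique


# Illustrative coordinates only, not real occurrence data.
sample = [
    (16.6000, 49.2000, "GBIF"),
    (16.6000, 49.2000, "literature"),  # same location reported by a second source
    (87.6000, 43.8000, "report"),
]
unique_sample = deduplicate_records(sample)
print(len(unique_sample), "unique records")
```

Keeping the first record at each location means that the order in which sources are merged decides which source label survives; a different tie-breaking rule may be preferable in practice.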

Figure 6

Worldwide codling moth occurrence records. The map was generated by ArcGIS 10.2 software35; the red points represent the occurrence records from the GBIF, and the yellow points represent the occurrence records extracted from the existing literature and reports.

Factor data

The factor data required by the Maxent model consist of bioclimatic variables, global accessibility, apple yield data and elevation data (Table 1). The bioclimatic variables were acquired from the WorldClim dataset15 at a resolution of 30 arc seconds (cells of approximately 1 km²). These variables are derived from monthly temperature and rainfall values so that they are more biologically meaningful, and they represent annual trends (e.g., mean annual temperature, annual precipitation), seasonality (e.g., annual ranges of temperature and precipitation) and extreme or limiting environmental factors (e.g., temperature of the coldest and warmest months and precipitation of the wet and dry quarters). The global accessibility map was downloaded from the Joint Research Centre of the European Commission's science and knowledge service16, and it reflects the global connectivity of transportation and the concentration of economic activities; the pixel values on the map represent the travel time in minutes to major cities at a resolution of 30 arc seconds (cells of approximately 1 km²). The apple production data were acquired from EarthStat17 at a spatial resolution of 5 arc minutes (cells of approximately 10 km × 10 km at the equator); these data represent the total apple production in metric tons on the land area of a grid cell. The elevation data were obtained from the NASA Shuttle Radar Topography Mission (SRTM)18, which provides high-quality digital elevation models (DEMs) for the entire globe with a spatial resolution of 3 arc seconds, which is approximately 250 m. To ensure spatial consistency of these variables, we converted all data to a spatial resolution of 0.05 degrees.
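The relationship between the angular resolutions above and the common 0.05 degree grid can be checked with a short calculation. The sketch below assumes roughly 111.32 km per degree at the equator (east-west distances shrink with latitude) and lists the angular resolutions as stated in the text. The function names are hypothetical, and the resampling method actually used in this study is not specified here.

```python
# Approximate length of one degree at the equator, in km (an assumption).
KM_PER_DEGREE_AT_EQUATOR = 111.32


def arcsec_to_degrees(arcsec: float) -> float:
    """Convert an angular resolution from arc seconds to degrees."""
    return arcsec / 3600.0


def cell_size_km(arcsec: float) -> float:
    """Approximate side length of a grid cell at the equator, in km."""
    return arcsec_to_degrees(arcsec) * KM_PER_DEGREE_AT_EQUATOR


def cells_per_target(arcsec: float, target_deg: float = 0.05) -> float:
    """Number of source cells spanning one target cell along one axis."""
    return target_deg / arcsec_to_degrees(arcsec)


# Angular resolutions as stated in the text (5 arc minutes = 300 arc seconds).
layers = {
    "bioclimatic": 30,
    "accessibility": 30,
    "apple production": 300,
    "elevation": 3,
}
for name, arcsec in layers.items():
    print(f"{name}: {cell_size_km(arcsec):.2f} km per cell, "
          f"{cells_per_target(arcsec):.1f} source cells per 0.05 degree cell")
```

A ratio above 1 means the layer is finer than the target grid and has to be aggregated, whereas a ratio below 1 (the apple production layer) means the layer is coarser and has to be interpolated or replicated onto the finer grid.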

Table 1 Variables for Maxent input.

The 22 variables were selected as Maxent model inputs for three main reasons:

Firstly, as climatic conditions are the major determinants of the establishment of codling moths, nineteen bioclimatic variables and elevation data were selected, following existing studies11,19, to indicate the ecological conditions required for codling moth survival and growth.

Secondly, the global apple yield data represent the availability of apple trees, the host plant of the codling moth, which is another important indicator of the risk of codling moth establishment. If host plants exist in a region, the region might also provide suitable environmental conditions for codling moths.

Thirdly, the long-distance spread of codling moths to new habitats mainly occurs through international trade and transportation. The global accessibility (travel time to major cities) data reflect the connectivity and the concentration of international trade and transportation. As mentioned above, codling moths have a broad environmental tolerance and, with the assistance of humans, can opportunistically establish populations in areas with low climate suitability. Human factors are therefore indispensable for assessing the potential codling moth distribution.

Together, these three kinds of indicators cover the processes from growth and survival to the spread of codling moths and take as many aspects as possible into account, which supports the rationality of the model.

Maximum entropy model (Maxent)

Many models are used to assess the potential distribution of species. According to comparative studies of different models, Maxent outperforms GARP20 and some other presence-only methods (e.g., DOMAIN, ENFA)21 and has advantages over BIOCLIM22. Maxent is a general-purpose machine learning method with a precise mathematical formulation, and several of its properties make it well suited to species distribution modelling: it uses a regularization multiplier to control model complexity and thus reduce over-fitting20,23,24, it allows the contribution of each environmental variable to the suitability to be analysed, and it has low data requirements25,26,27. Therefore, the Maxent niche model has been widely used to model potential species distributions28,29,30,31. In this study, the Maxent model was selected to simulate the potential risk area of codling moth.

The Maxent model applies a machine learning method called maximum entropy modelling32. It follows the principle of maximum entropy: when approximating an unknown probability distribution, the best approach is to ensure that the approximation satisfies any known constraints on the unknown distribution and, subject to those constraints, has maximum entropy33. The entropy is defined as follows:

$$H(\hat{\pi })=-\sum _{x\in X}\hat{\pi }(x)\,\ln \hat{\pi }(x)$$
(1)

where π is the unknown probability distribution; \(\hat{\pi }\) is the approximation of π; X is a finite set; x is an individual element in set X; and ln is the natural logarithm. The entropy is nonnegative and is at most the natural log of the number of elements in X.
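The two properties stated above (nonnegativity and the upper bound of ln|X|) can be checked numerically. The following is a minimal sketch that evaluates Eq. (1) for a discrete distribution; the function name `entropy` and the example distributions are illustrative choices and do not come from this study.

```python
import math
from typing import Sequence


def entropy(probabilities: Sequence[float]) -> float:
    """Entropy H = -sum p ln p over a finite set, taking 0 * ln 0 as 0."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)


# Two illustrative distributions over a set X with four elements.
uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.97, 0.01, 0.01, 0.01]

h_uniform = entropy(uniform)
h_peaked = entropy(peaked)
upper_bound = math.log(len(uniform))  # ln|X|

print(h_uniform, h_peaked, upper_bound)
```

The uniform distribution attains the upper bound, while a distribution concentrated on one element has entropy close to zero. This is why maximizing entropy favours the least committed distribution that still satisfies the constraints.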

Maxent integrates species presence locations with a set of environmental variables (e.g., temperature, precipitation) across a study area that is divided into grid cells and generates probabilities of species presence or predicted local abundance20. Maxent identifies areas that have conditions that are most similar to the current known occurrences of a species and ranks them from 0 (unsuitable) to 1 (most suitable).
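How presence locations and environmental variables combine can be sketched with a toy example. As background that is not spelled out in this section, the maximum entropy solution takes an exponential (Gibbs) form, q(x) ∝ exp(λ·f(x)), where λ is chosen so that the expected feature value under q matches its average over the presence cells. The sketch below assumes a single feature, no regularization and plain gradient ascent; it is not the algorithm implemented in the Maxent software, and all names and values are hypothetical.

```python
import math
from typing import List


def gibbs_distribution(features: List[float], lam: float) -> List[float]:
    """q(x) proportional to exp(lam * f(x)), normalized over all grid cells."""
    weights = [math.exp(lam * f) for f in features]
    total = sum(weights)
    return [w / total for w in weights]


def fit_lambda(features: List[float], presence: List[int],
               steps: int = 2000, rate: float = 0.5) -> float:
    """Gradient ascent on the mean log-likelihood of the presence cells.

    The gradient is (empirical feature mean) - (feature mean under q).
    """
    empirical_mean = sum(features[i] for i in presence) / len(presence)
    lam = 0.0
    for _ in range(steps):
        q = gibbs_distribution(features, lam)
        model_mean = sum(qi * fi for qi, fi in zip(q, features))
        lam += rate * (empirical_mean - model_mean)
    return lam


# Toy study area: six grid cells with one standardized environmental feature.
cell_features = [-1.5, -0.5, 0.0, 0.5, 1.0, 1.5]
presence_cells = [3, 4, 4, 5]  # indices of cells with recorded presences

lam = fit_lambda(cell_features, presence_cells)
q = gibbs_distribution(cell_features, lam)
model_mean = sum(qi * fi for qi, fi in zip(q, cell_features))
print(lam, q, model_mean)
```

In this toy case the fitted distribution places more probability on cells whose feature values resemble those of the presence cells, which mirrors the ranking of grid cells by similarity to known occurrences described above.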