Introduction

When evaluating the efficacy of a new pest management tool, population models are used before conducting field trials. Population models provide a logical knowledge base because the results of field trials generally contain various sources of noise, making it difficult to reach definitive conclusions. Historically, there have been two types of models in pest management sciences: highly realistic simulation models and simple analytical models (Onstad 1988). Highly realistic simulation models comprise complicated systems models that inherit the essence of the old-school systems approach (Garfinkel and Sack 1964; Ruesink 1976; Watt 1966). Although these models emerged in pest management science in the early 1960s, they are currently out of fashion (Liebhold 1994). Simple analytical models comprise various analytical models that focus on simple but fundamental structures of the target pest population and the interacting biotic and abiotic factors (Murray 1989; Renshaw 1991; Tuljapurkar and Caswell 1997). Analytical models still provide many insights into the key mechanisms of pest regulation, although there is persistent criticism as to whether such simple models are realistic enough for practical pest management (Grimm 1994; Taylor 1983).

This review provides a brief overview of the various population models and prediction tools. Fundamental problems associated with the use of prediction models are highlighted. I also evaluate the significance of these models in the context of scientific philosophy. Then, the idea of assimilation is introduced, and I advocate that state-space modeling, which is an assimilation technique, can serve as a remedy that connects model structure to reality in the field. Finally, I introduce the idea of artificial intelligence (AI), specifically focusing on deep neural networks (DNNs), which have been rapidly implemented in various practical applications, and discuss how they can be utilized in pest management problems. I hope that this review helps scholars and practitioners who are not yet familiar with population models to consider how they can utilize such models for practical pest management.

Population models in pest management

If a virtual world can be set up on a high-speed computer to test several pest management options and generate highly accurate predictions, an integrated pest management (IPM) program can be fully optimized. In a standard IPM program, farmers must decide in advance whether chemical pesticides should be applied, considering the economically tolerable damage to crops at later harvesting periods, and the efficacy of biological and abiotic (non-chemical) control should be maximized (Nakasuji 1997). Systems models are the tools that assist with such decisions. They originated from the systems approach proposed by the famous limnological ecologists Howard and Eugene Odum (Odum and Barrett 1953). In this approach, all main elements in a crop field are modeled as sub-models, which usually take the form of differential equations and are formulated according to real-world observations and data. Subsequently, all sub-models are connected as an integrated virtual ecosystem. As the systems model has a biologically and physiologically explainable structure, many applied entomologists have assumed that this approach can produce highly realistic predictions for various pest management scenarios (Garfinkel and Sack 1964; Ruesink 1976; Watt 1966).

Many models of various crop systems with pests have been developed as an important part of IPM programs since the early 1960s (Furuhashi et al. 1983; Rabbinge 1976; Shoemaker 1973; Yamanaka and Shimada 2002). For example, Shoemaker (1973) constructed a simple systems model of a pest moth, Ephestia kuehniella Zeller (Lepidoptera: Pyralidae), and its parasitoid, Venturia canescens (Gravenhorst) (Hymenoptera: Ichneumonidae), in stored grains to determine the optimal combination of biological agents and insecticides. Rabbinge (1976) constructed a detailed systems model of the fruit tree red spider mite, Panonychus ulmi (Koch) (Acari: Tetranychidae), and a native predatory mite, Amblyseius potentillae (Garman) (Acari: Phytoseiidae), and explored the main factors affecting control success using sensitivity analyses. Another application of systems models to the citrus red mite, Panonychus citri (McGregor) (Acari: Tetranychidae), was reported in Japan (Furuhashi et al. 1983). As shown in the schematic flowchart in Fig. 1, the model structure consists of seven stages of citrus red mites. All stages are individually affected by temperature, precipitation, pesticide application, biological agents, etc., in the form of ordinary differential equations. A general correspondence is observed between the field monitoring records of citrus red mites in the actual orchard (solid lines) and the simulation predictions (dashed lines in Fig. 1, right panels).

Fig. 1

A practical systems model for citrus red mite. The schematic flowchart on the left shows the model structure consisting of seven stages of citrus red mite. All stages are affected by temperature, precipitation, pesticide application, biological agents, and so on. Right panels show the field monitoring records of citrus red mites (solid lines) and simulation predictions (dashed lines) at three stations in Shizuoka in 1981. Arrows indicate the timings of acaricide applications. All parameters in the model were estimated from independent field experiments. The figures are modified by T. Yamanaka from the original with special permission from Dr. Furuhashi

While systems models were extensively applied in various pest management programs until the late 1980s, they are currently not widely in use, especially those related to biocontrol projects with natural enemies. This is mainly because the social background in many countries has drastically changed since the 1980s. Specifically, augmentative biological control became popular in greenhouses (Yano 2004), while field applications of exotic biological agents were gradually reduced because of their environmental risks as invasive alien species (Howarth 1991). After augmentative biological control became popular, the systems models approach became less relevant to applied entomologists and field practitioners than it was in classical biological control in the field (Naranjo et al. 2015), because parasitoid wasps and predatory mites can be applied as easily as pesticides and are commercially available at any time.

In addition to such changes in the social background, systems models have become less appealing to theoretical ecologists because they are difficult to develop, communicate, and understand (Grimm et al. 1999). It is usually impossible to enumerate all the major players in the system a priori, and we cannot estimate all the parameters from experiments. Consequently, such an imperfect structure results in poor prediction performance (Liebhold 1994). Because complicated models typically comprise many equations, they cannot be explained verbally without losing rigor (Grimm et al. 1999). More importantly, results obtained from complicated models are difficult to interpret. Although each part of a systems model has a biological meaning, comparable results can be obtained from different combinations of factors and/or parameter settings (Scheffer and Beets 1994).

In contrast to complicated models, simple analytical models for pest management have continuously contributed to understanding the general principles of pest management (Murray 1989; Renshaw 1991; Tuljapurkar and Caswell 1997). Because the number of variables and parameters is manageable in such simple models, their behavior can be fully explored using well-developed mathematical tools. Almost all pest management techniques have been modeled using simple analytical methods. For example, the efficacy of parasitoid wasps as biological agents has been well studied in terms of system stability (Hassell and May 1973). Pesticide resistance management has attracted the attention of theoretical ecologists as a natural experiment in rapid evolution (Comins 1986; Coyne 1951; Georghiou and Taylor 1977; Sudo et al. 2018). Ecologists have explored various possibilities for slowing the development of resistance using simple theoretical models. Knipling’s (1955) work is a classic study of the sterile insect release method, in which males are sterilized by radiation or chemicals and released to suppress the normal mating success of wild females with wild males. Knipling and McGuire (1966) developed a theoretical basis for mate-interference technologies using synthetic sex pheromones, such as mating disruption, where pheromone lures confuse male moths and prevent them from mating, and mass trapping, where traps kill as many males as possible to leave the majority of wild females unmated. Yamanaka (2007) extended the work of Knipling and McGuire (1966) by showing the general superiority of mating disruption over mass trapping in lepidopteran pests (Fig. 2). That study showed that even if no males were caught, population growth could still be suppressed by masking the sex pheromones released by wild females.

Fig. 2

Unified model of mating disruption and mass trapping. Two parameters, the effect of female priority in the overlapped area (b) and the catchability of the trap (c), are thoroughly examined. Contour lines in the lower panel indicate the ratio of mating success during the simulation period. Mating disruption corresponds to the case where males are confused by synthetic pheromones and no male is killed by traps (b ≅ 0, c = 0), whereas mass trapping kills every male attracted to the trap (c ≅ 1)

Analytical models have made tremendous contributions to pest management by providing a theoretical basis for each method. However, there is always deep reluctance among applied entomologists to directly apply model predictions to the field (Grimm 1994; Taylor 1983). Practitioners are skeptical about whether the results of such analytical models reflect reality in practical crop fields, while theoretical ecologists continue to energetically explore the key mechanisms expressed by the models. We cannot deny that some theoreticians are interested only in the behavior of their own models, which are built solely on previous studies by other theoreticians. Sometimes, neither their models nor their research rationales are closely connected to the actual practice of agriculture (Grimm 1994).

Fundamental problems in model predictability

May (1976) published a seminal paper in which he showed that even a simple mathematical model can exhibit highly complicated and unpredictable behavior, i.e., chaos. For example, a classic Ricker model of a difference equation can be written as

$$N_{t + 1} = N_t \exp \left( {a - bN_t } \right)$$
(1)

where a is an intrinsic growth rate, and b is a parameter regulating population size.

Surprisingly, the population dynamics expressed in Eq. 1 are perfectly deterministic yet can be unpredictable over a broad range of values of parameter a (the value of b matters much less) within a biologically plausible domain. Under such conditions, two populations that start with only a trivial difference behave completely differently after a few generations (Fig. 3a). We call such unpredictable behavior chaos. Nevertheless, the dynamics defined in Eq. 1 remain within a certain range of population sizes and are clearly distinguishable from white noise. In short, population models cannot always predict the exact future even if their structures are simple and deterministic.

Fig. 3

Unpredictable dynamics in a simple Ricker model (Eq. 1 with a = 3.4, b = 0.2). a Two populations start from 0.3 and 0.30001, respectively, and diverge from each other after 14 generations. b Cobweb analysis for visual inspection of the system stability. The population size in the next time step is first calculated by Eq. 1 with N0 = 0.3 and then reflected across the line Nt+1 = Nt. This is iterated for 100 time steps to see how the population fluctuates
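The divergence shown in Fig. 3a can be reproduced in a few lines of code. The following minimal Python sketch (illustrative only, using the same parameter values and starting densities as in Fig. 3) iterates Eq. 1 for two nearly identical initial populations:

```python
import numpy as np

def ricker(n0, a=3.4, b=0.2, steps=50):
    """Iterate the deterministic Ricker map of Eq. 1."""
    n = np.empty(steps + 1)
    n[0] = n0
    for t in range(steps):
        n[t + 1] = n[t] * np.exp(a - b * n[t])
    return n

# two trajectories that differ only by 1e-5 in the initial density
pop1 = ricker(0.3)
pop2 = ricker(0.30001)

# the trajectories stay close at first and then diverge completely,
# although both remain bounded (clearly distinguishable from white noise)
for t in (0, 5, 10, 15, 20, 25):
    print(f"t={t:2d}  N={pop1[t]:8.3f}  N'={pop2[t]:8.3f}")
```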

Stochasticity also influences the predictability of population models, in addition to the deterministic chaos mentioned above (Bjørnstad and Grenfell 2001; de Valpine and Hastings 2002). Two types of stochasticity, “observation error” and “process error”, are explained sequentially in this section. Here, we assume \(N_t\) is the actual population size and \(Y_t\) and \(Y_t^{\prime}\) are the observed values. When the population process is deterministically controlled by the Ricker model but error exists only in the observations, the model of observation error can be formulated as random sampling from either Poisson or negative binomial (NB) distributions (Eq. 2).

$$\begin{aligned} & r = \log \left( {\frac{{N_{t + 1} }}{N_t }} \right) = a - bN_t \\ & \lambda = N_{t + 1} = N_t \exp \left( r \right) \\ & Y_{t + 1} \sim {\text{Poisson}}\left( \lambda \right) \\ & Y_{t + 1}^{\prime} \sim {\text{NB}}\left( {k = 5, p = \frac{k}{k + \lambda }} \right) \\ \end{aligned}$$
(2)

The Poisson distribution, \(Y_{t + 1} \sim {\text{Poisson}}\left( \lambda \right)\), generates non-negative integers whose mean and variance both equal \(\lambda\). By contrast, the NB distribution, \(Y_{t + 1}^{{\prime}} \sim {\text{NB}}\left( {k = 5, p = k/(k + \lambda )} \right)\), counts the number of failures before the k-th success for events with a success probability of \(p = k/(k + \lambda )\). Technically, the Poisson distribution assumes a sample from randomly distributed populations, whereas NB assumes a spatially biased distribution (Shimada et al. 2005). As a result, NB contains more uncertainty than the Poisson distribution.
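The two observation-error models of Eq. 2 can be simulated directly. The sketch below (an illustrative Python example, not code from the cited studies) iterates the deterministic Ricker map as the true process and draws observations from the Poisson and NB distributions; numpy's negative_binomial counts failures before the k-th success, matching the parameterization p = k/(k + λ):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, k = 2.4, 0.2, 5              # parameter values as in the text
T = 100

N = np.empty(T)                    # true (deterministic) population sizes
Y_pois = np.empty(T, dtype=int)    # observations with Poisson error
Y_nb = np.empty(T, dtype=int)      # observations with NB error

N[0] = 10.0
Y_pois[0] = rng.poisson(N[0])
Y_nb[0] = rng.negative_binomial(k, k / (k + N[0]))

for t in range(T - 1):
    lam = N[t] * np.exp(a - b * N[t])            # next true density (Eq. 2)
    N[t + 1] = lam
    Y_pois[t + 1] = rng.poisson(lam)             # mean = variance = lambda
    # negative_binomial(k, p) counts failures before the k-th success;
    # with p = k / (k + lambda) its mean is lambda but its variance exceeds lambda
    Y_nb[t + 1] = rng.negative_binomial(k, k / (k + lam))

print("variance of Poisson observations:", round(Y_pois.var(), 1))
print("variance of NB observations:     ", round(Y_nb.var(), 1))
```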

In contrast to observation error, it is plausible that the population process itself is not perfectly controlled by the Ricker model. To model such process error, unpredictable events can be incorporated into the population growth rate, as follows:

$$\begin{aligned} r & = \log \left( {\frac{{N_{t + 1} }}{N_t }} \right) \sim {\text{Normal}}\left( {a - bN_t , \sigma_e^2 + \frac{\sigma_d^2 }{{N_t }}} \right) \\ Y_{t + 1}^{{\prime} {\prime} } & = N_{t + 1} = N_t \exp \left( r \right) \\ \end{aligned}$$
(3)

where the net annual growth rate is drawn from a normal distribution with a mean of \(\left( {a - bN_t } \right)\) and a variance composed of environmental \(\left( {\sigma_e^2 } \right)\) and demographic \(\left( {\sigma_d^2 /N_t } \right)\) components (Eq. 3). Fluctuating climates or other external factors are sources of such environmental stochasticity, whereas demographic stochasticity matters when the population size is small and the number of neonates changes by chance. The derivation of the environmental and demographic stochasticity terms in Eq. 3 can be found in Sæther et al. (2000).
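Eq. 3 can be simulated in the same way. In the following sketch (illustrative, with the parameter values used later in the text), the log growth rate is drawn from a normal distribution whose variance combines the environmental and demographic components; the clipping of extreme draws is a numerical safeguard for the example only:

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = 2.4, 0.2
sigma_e, sigma_d = 1.5, 1.5        # environmental and demographic SDs (as in the text)
T = 100

N = np.empty(T)
N[0] = 10.0
for t in range(T - 1):
    # the demographic component of the variance shrinks as the population grows (Eq. 3)
    var_r = sigma_e**2 + sigma_d**2 / N[t]
    r = rng.normal(a - b * N[t], np.sqrt(var_r))
    r = np.clip(r, -10.0, 10.0)    # keep the sketch numerically stable at tiny N
    N[t + 1] = N[t] * np.exp(r)

print(N[:10].round(2))
```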

Bjørnstad and Grenfell (2001) employed a novel approach to test how these different sources of stochasticity affect the predictability of the models. They utilized simplex reconstruction of nonlinear forecasting as an ad hoc probe of model predictability. Simplex reconstruction estimates the best dimension of the system (in our case, dimension = 1) and can reconstruct the full dynamics of a single species by utilizing its own past record (Sugihara and May 1990). Simulations of the four models defined in Eqs. 2 and 3, i.e., the Poisson and NB observation error models and the demographic and environmental process error models, were executed for 100 time steps starting from \(N_0\), which was randomly sampled from a uniform distribution [5, 20], over 1000 trials. Parameters were tuned so that the population fluctuated periodically with moderate stochasticity (a = 2.4, b = 0.2, k = 5, \(\sigma_e\) = \(\sigma_d\) = 1.5). The prediction skill (ρ) was then calculated as the correlation between the simulated observations (\(Y_t\), \(Y_t^{\prime}\), \(Y_t^{{\prime} {\prime} }\)) in the final 91–100 time steps of the 1000 trials and the predictions τ time steps ahead (τ = 1, 2, …, 9) produced by the simplex reconstruction. The simplex reconstruction was tuned individually for each of the 1000 trials using the simulated observations from time steps 1 to 90.
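The essence of this predictability probe can be illustrated with a simplified sketch. The code below is not the rEDM implementation used for Fig. 4: it replaces the full simplex reconstruction with a plain one-dimensional nearest-neighbour forecast, pools only 200 trials, and uses only the environmental process error model, but it follows the same design of a 90-step library, forecasts for the last 10 steps, and the correlation ρ as the prediction skill:

```python
import numpy as np

rng = np.random.default_rng(2)
a, b, sigma_e = 2.4, 0.2, 1.5

def simulate_env(T=100):
    """Ricker dynamics with environmental process error only (cf. Eq. 3)."""
    N = np.empty(T)
    N[0] = rng.uniform(5, 20)
    for t in range(T - 1):
        r = rng.normal(a - b * N[t], sigma_e)
        N[t + 1] = N[t] * np.exp(r)
    return N

def simplex_forecast(library, predictors, tau, k=2):
    """Simplified 1-D simplex projection: each forecast is a distance-weighted
    average of the tau-step-ahead values of the k nearest library points."""
    lib_x, lib_y = library[:-tau], library[tau:]
    preds = []
    for x in predictors:
        d = np.abs(lib_x - x)
        idx = np.argsort(d)[:k]
        w = np.exp(-d[idx] / (d[idx].min() + 1e-12))   # exponential distance weights
        preds.append(np.sum(w * lib_y[idx]) / w.sum())
    return np.array(preds)

for tau in range(1, 10):
    obs_all, pred_all = [], []
    for _ in range(200):                               # pool forecasts over 200 trials
        N = simulate_env()
        lib, targets = N[:90], np.arange(90, 100)
        pred_all.append(simplex_forecast(lib, N[targets - tau], tau))
        obs_all.append(N[targets])
    rho = np.corrcoef(np.concatenate(pred_all), np.concatenate(obs_all))[0, 1]
    print(f"tau = {tau}: prediction skill rho = {rho:.2f}")
```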

When the observation error is the only source of stochasticity, the predictability does not drastically decline from τ = 1 to 9 time steps ahead, even if the observation error is conspicuous (Fig. 4, gray line of NB). However, process errors of demographic and environmental stochasticity do affect predictability, as it drastically decreases within a few time steps ahead (Fig. 4, blue and red lines).

Fig. 4

Different sources of stochasticity affect the predictability of population models based on the Ricker model (Bjørnstad and Grenfell 2001). Parameters are tuned so that the population periodically fluctuates with moderate stochasticity (a = 2.4, b = 0.2, k = 5, \(\sigma_e\) = \(\sigma_d\) = 1.5), and simulations were iterated over 1000 trials for each model to calculate the prediction skill (ρ). If uncertainty exists only in the observations (Poisson and NB, grey dotted lines), then predictability does not drastically decline from τ = 1 to 9 time steps ahead. In contrast, when stochasticity exists in the process (demographic and environmental stochasticity, blue and red lines, respectively), predictability drops drastically as τ increases. The simplex reconstruction was executed using the R library rEDM (ver 1.14.0: https://cran.r-project.org/web/packages/rEDM/, accessed June 2023)

Population models in the context of science philosophy

Currently, population ecologists recognize that even a simple model can exhibit complicated behavior and that stochasticity in the process can make the system more unpredictable. Consequently, I dare to state that population models may not always be the most suitable choice for predictions.

Historically, there have been two opposing schools of science philosophy—empiricism and rationalism—both of which developed in European countries up to the nineteenth century. Population models, whether realistic simulations or analytical models, belong to rationalism, as mathematics generally belongs to the school of rationalism. Rationalism draws on principles derived from well-defined axioms. Rationalism originated from Renaissance humanism in the sixteenth century and was based on a deep trust in human intelligence. After the continuous victories of rationalism in physics, from Newton’s mechanics to high-energy physics, people once believed that all natural phenomena could be explained by deductive approaches, just as early ecologists attempted to solve ecological problems using systems models. However, as seen in this review, deductive approaches do not work well in some practical ecosystems because even a very simple model can easily be unpredictable, and stochasticity worsens the prediction.

By contrast, empiricism deeply appreciates the facts recorded in experiments or field observations and attempts to draw general principles from data, excluding human presuppositions. Traditional statistics, which belong to the school of empiricism, became a strong tool for building national policy in Europe at the end of the eighteenth century, during the first wave of big data science. As a result of their importance to the centralized states of Europe, many documents related to taxation and military recruitment were gathered, printed, and accumulated in ministries (Hacking 1990). These records were then subjected to traditional statistical analyses, and the results provided objective solutions for European states competing with neighboring great powers.

Traditional statistics, especially those related to linear models based on the universal assumption of a Gaussian error structure, have been recognized as a completed methodology and are still standard tools in experimental sciences (Zar 1999). In addition, since personal computers became popular in the late 1990s, computer statistics, such as classification and regression trees (De’ath and Fabricius 2000) and random forests (Prasad et al. 2006), have emerged. These methods typically use nonparametric iterative algorithms and are robust for non-Gaussian data structures. Such inductive statistical approaches remain powerful in many sciences, including pest management, and computer statistics are gradually merging into AIs, which will be described later (Fig. 5).

Fig. 5

Schematic flow of the developments in science philosophy. The flow is modified by T. Yamanaka from the original (Luo et al. 2011)

Data-model assimilation

As mentioned in the previous section, population models are typically considered deductive approaches rooted in rationalism. However, they can also be used as inductive statistical tools through a procedure known as assimilation (Bolker 2008).

Three broad assimilation methods are available. Trajectory matching is the simplest and most brute-force method of the three (Fig. 6b). The data and model predictions are either adjusted visually by changing key parameters in the model or fitted using an optimization algorithm, for example by minimizing the mean squared error or maximizing the joint likelihood of the data given the model predictions. However, trajectory matching implicitly assumes that only observation errors exist in the data.
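As a concrete illustration of trajectory matching, the sketch below fits the deterministic Ricker map of Eq. 1 to a hypothetical observed series by minimizing the sum of squared errors over a, b, and the initial density; the data values, starting guesses, and choice of scipy's Nelder-Mead optimizer are all assumptions made for the example:

```python
import numpy as np
from scipy.optimize import minimize

def ricker_trajectory(a, b, n0, steps):
    """Deterministic trajectory of Eq. 1 (exponent clipped for numerical safety)."""
    n = np.empty(steps)
    n[0] = n0
    for t in range(steps - 1):
        n[t + 1] = n[t] * np.exp(np.clip(a - b * n[t], -20.0, 20.0))
    return n

def sse(params, observed):
    """Sum of squared errors between the deterministic trajectory and the data."""
    a, b, n0 = params
    return np.sum((ricker_trajectory(a, b, n0, len(observed)) - observed) ** 2)

# hypothetical observed counts (e.g., annual trap catches)
observed = np.array([12.0, 20.0, 9.0, 26.0, 6.0, 30.0, 4.0, 33.0, 3.0, 35.0])

fit = minimize(sse, x0=[1.0, 0.05, observed[0]], args=(observed,),
               method="Nelder-Mead")
print("estimated a, b, N0:", fit.x.round(3))
```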

Fig. 6

Three assimilation methods (b–d). a Canadian lynx records at the Hudson’s Bay Company as an example (Bulmer 1974); b trajectory matching directly fits the model to the data; c one-step-ahead fitting transforms the data into the generational relationship, i.e., the reproductive curve; and d state-space modeling assumes both system error and observation error

One-step-ahead fitting is a method in which population records are transformed into relationships of the focal process, generally reproduction, and a model is then fitted to the data (Fig. 6c). In contrast to the aforementioned trajectory matching, observation errors are not considered in this method. If the population size can be counted perfectly, so that observation error is negligible, one-step-ahead fitting can fully account for the stochasticity in the system process. This method has been applied to long-term laboratory experiments on insect pests to test inter- and intra-species regulation (Bellows 1982; Melbourne and Hastings 2008; Shimada 1989).
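A minimal sketch of one-step-ahead fitting under the Ricker model: the counts (assumed here to be measured without observation error, and entirely hypothetical) are transformed into per-generation growth rates r_t = log(N_{t+1}/N_t), and a and b are then estimated by ordinary least squares from the linear relationship r = a − bN:

```python
import numpy as np

# hypothetical perfectly censused population sizes
N = np.array([8.0, 21.0, 11.0, 25.0, 7.0, 28.0, 5.0, 31.0, 4.0, 33.0])

r = np.log(N[1:] / N[:-1])          # observed per-generation growth rates
X = np.column_stack([np.ones(len(r)), N[:-1]])

# least-squares estimates of the intercept (a) and slope (-b) in r = a - b*N
coef, *_ = np.linalg.lstsq(X, r, rcond=None)
a_hat, b_hat = coef[0], -coef[1]
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}")
```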

State-space modeling is the exact solution for data assimilation when both process and observation errors are present, and the records of practical pest management trials in the field inevitably contain both. In state-space modeling, a stochastic model with two layers is constructed as the skeleton structure (Fig. 6d). The primary layer describes the system updates of the pest population, and the population sizes (Xt) are assumed to be the true values in the model. The second layer assumes an observation error, and the observed population size (Yt) contains measurement or sampling errors. For example, a stochastic variant of the Ricker map (mean = \(X_t \exp \left( {a - bX_t } \right)\), variance = \(\sigma_{{\text{proc}}}^2\), cf. Eq. 3) can serve as the first layer, and the second layer is described as random sampling from a normal distribution with a mean of Xt and a variance of \(\sigma_{{\text{obs}}}^2\) (Eq. 4):

$$\begin{aligned} X_{t + 1} & \sim {\text{Normal}}\left( {X_t \exp \left( {a - bX_t } \right), \sigma_{{\text{proc}}}^2 } \right) \\ Y_t & \sim {\text{Normal}}\left( {X_t , \sigma_{{\text{obs}}}^2 } \right) \\ \end{aligned}$$
(4)

There are several tools available for fitting such a state-space model to field data. The Kalman filter is a classic method that assumes normal distributions in both the system and observation updates, as in Eq. 4 (de Valpine and Hastings 2002). Owing to the linearity implied by the normal distributions in Eq. 4, the likelihoods of the observations (Yt) given the true process values (Xt) are parameterized by the model parameters and the initial value (X0). Consequently, we can obtain an analytical solution that maximizes the joint likelihood of the data given the model predictions. Yamamura (2016) analyzed 50 years of records of Chilo suppressalis Walker (Lepidoptera: Crambidae) and Nephotettix cincticeps (Uhler) (Hemiptera: Cicadellidae) in Japan’s national pest-monitoring program using state-space modeling with a Kalman filter. He successfully quantified the effects of external control and of internal regulation from long-term data while accounting for observation error, and concluded that the population of N. cincticeps is strongly regulated by intra-specific control, whereas that of C. suppressalis is mainly influenced by external perturbations.
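For illustration, the sketch below simulates data from Eq. 4 and estimates the parameters by maximizing the filtering likelihood. Because the Ricker state equation is nonlinear, the sketch uses an extended Kalman filter that linearizes the state update at each step; this choice, like the parameter values and optimizer, is an assumption made for the example rather than the exact procedure of the studies cited above:

```python
import numpy as np
from scipy.optimize import minimize

def ekf_negloglik(params, Y):
    """Negative log-likelihood of Eq. 4 via an extended Kalman filter."""
    a, b, sd_proc, sd_obs = params
    if sd_proc <= 0 or sd_obs <= 0:
        return np.inf
    x, P = Y[0], sd_obs**2                     # initialize the filter at the first observation
    negll = 0.0
    for y in Y[1:]:
        # prediction: propagate the state estimate and its variance through f(x)
        f = x * np.exp(a - b * x)
        F = np.exp(a - b * x) * (1.0 - b * x)  # df/dx, the local linearization
        x_pred, P_pred = f, F**2 * P + sd_proc**2
        # update: weigh the new observation by the Kalman gain
        S = P_pred + sd_obs**2                 # innovation variance
        K = P_pred / S
        x = x_pred + K * (y - x_pred)
        P = (1.0 - K) * P_pred
        negll += 0.5 * (np.log(2.0 * np.pi * S) + (y - x_pred) ** 2 / S)
    return negll

# simulate observations from the state-space Ricker model of Eq. 4
rng = np.random.default_rng(3)
a_true, b_true, sp_true, so_true = 1.5, 0.05, 1.0, 2.0
T = 60
X = np.empty(T)
X[0] = 10.0
for t in range(T - 1):
    X[t + 1] = rng.normal(X[t] * np.exp(a_true - b_true * X[t]), sp_true)
Y = rng.normal(X, so_true)

fit = minimize(lambda p: ekf_negloglik(p, Y), x0=[1.0, 0.03, 1.0, 1.0],
               method="Nelder-Mead")
print("ML estimates (a, b, sd_proc, sd_obs):", fit.x.round(3))
```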

The Kalman filter is limited in that it can only handle normal distributions. If the system process has a stochastic component that follows a distribution other than the normal, the Kalman filter cannot function. Markov chain Monte Carlo (MCMC) is a powerful, computationally intensive method that overcomes the limitations of the Kalman filter (Fukaya 2016; Meyer and Millar 1999). It simulates the true population value (Xt) at time step t and evaluates its concordance with the observed data (Yt), assuming that all other population sizes are derived from previously estimated values. Parameter values are then randomly adjusted to improve the agreement between the model predictions and the data. These two processes are iterated hundreds of thousands of times over all time steps until all parameters converge to certain values.
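A deliberately simplified Metropolis-within-Gibbs sketch of this idea for the state-space Ricker model of Eq. 4 is shown below; it assumes flat priors, uses naive single-site updates of the latent states, and runs far fewer iterations than a real analysis would require. In practice, general-purpose MCMC software is normally used instead.

```python
import numpy as np

rng = np.random.default_rng(4)

# simulate observations from the state-space Ricker model of Eq. 4
a_true, b_true, sp_true, so_true = 1.5, 0.05, 1.0, 2.0
T = 50
X = np.empty(T)
X[0] = 10.0
for t in range(T - 1):
    X[t + 1] = rng.normal(X[t] * np.exp(a_true - b_true * X[t]), sp_true)
Y = rng.normal(X, so_true)

def log_posterior(params, states):
    """Joint log density of the latent states and data (flat priors on parameters)."""
    a, b, sp, so = params
    if sp <= 0 or so <= 0:
        return -np.inf
    mean_next = states[:-1] * np.exp(a - b * states[:-1])
    lp = np.sum(-0.5 * (states[1:] - mean_next) ** 2 / sp**2 - np.log(sp))
    lp += np.sum(-0.5 * (Y - states) ** 2 / so**2 - np.log(so))
    return lp

# Metropolis-within-Gibbs: alternate parameter and latent-state updates
params = np.array([1.0, 0.03, 1.5, 1.5])
states = Y.copy()
samples = []
n_iter = 3000                        # far fewer iterations than a real analysis needs
for it in range(n_iter):
    # block update of the four parameters with a random-walk proposal
    prop = params + rng.normal(0.0, [0.05, 0.002, 0.05, 0.05])
    if np.log(rng.uniform()) < log_posterior(prop, states) - log_posterior(params, states):
        params = prop
    # single-site updates of each latent population size X_t
    for t in range(T):
        cand = states.copy()
        cand[t] += rng.normal(0.0, 1.0)
        if np.log(rng.uniform()) < log_posterior(params, cand) - log_posterior(params, states):
            states = cand
    if it >= n_iter // 2:
        samples.append(params.copy())

print("posterior means (a, b, sd_proc, sd_obs):", np.mean(samples, axis=0).round(3))
```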

State-space modeling with MCMC estimation is one of the best solutions for connecting population models to actual field records. Osada et al. (2018) constructed a multistate occupancy model, a variant of the state-space model for categorical records, to fit spatially and temporally extensive pine wilt disease (PWD) records in Japan. They revealed a positive density dependence (Allee effect) in Monochamus alternatus Hope (Coleoptera: Cerambycidae) population growth, which is a vector of PWD. Climatic conditions limited the invasion spread of PWD in the northern regions but were generally weaker than the Allee effect in M. alternatus.

State-space modeling quantifies the relative contributions of observation and system errors, and this modeling procedure facilitates an understanding of the mechanisms behind the data (Bolker 2008). In this sense, assimilation by state-space modeling is a good compromise between the beauty of deductive population models and inductive statistics. Nevertheless, striking the right balance between model complexity, data quality, and the underlying structures to be uncovered is often challenging, and achieving successful results demands a degree of artistic sensibility. Incorrectly specified model relationships can lead to erroneous conclusions.

Regarding predictability, state-space modeling is superior to other statistical methods, such as generalized mixed-effect models or autoregressive models. Even so, its performance largely depends on whether we can capture the main mechanism in the data (Luo et al. 2011). If we fail to depict the fundamental structures in the model, the parameter estimation will not converge in the MCMC algorithm or will have wide confidence intervals (Bolker 2008). Thus, we obtain only a poor prediction or none at all.

Data-driven science and AI

Currently, we are confronting a new “big data era”. Satellite images, automated measurements in the field, and personal information from cellular phones are accumulating in cloud storage at scales of hundreds of terabytes to petabytes. With the accumulation of pest occurrence records, AI has become popular in various scientific fields, including pest management science (Kishi et al. 2023). In particular, DNNs shocked the world in 2015 by surpassing the image-recognition ability of humans in a computer-vision competition (Langlotz et al. 2019).

The basic idea of a neural network dates back to 1943, when a neurophysiologist and a mathematician wrote down the neuronal transmission pattern using mathematical expressions (McCulloch and Pitts 1943). The inputs (data) are sent to a neuron that integrates the input information on the basis of their importance (weights: w1, w2, and w3) (Fig. 7a). After adding the bias (b) to the integrated information, the neuron propagates its activation status, which is calculated using an activation function, to the next neuron (output). If there is only one neuron and one output, it behaves like a simple linear multiple regression model. However, neurons can be connected in a network structure, and, in theory, a wide variety of nonlinear output responses can be described (see the structure of Fig. 7b) (Smith 1993).
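The computation performed by a single artificial neuron (Fig. 7a) can be written in a few lines; the weights, bias, and ReLU activation below are arbitrary illustrative choices:

```python
import numpy as np

def relu(z):
    """Rectified linear unit activation."""
    return np.maximum(0.0, z)

def neuron(inputs, weights, bias, activation=relu):
    """Weighted sum of the inputs plus a bias, passed through an activation function."""
    return activation(np.dot(weights, inputs) + bias)

x = np.array([0.2, 1.5, -0.7])      # three inputs (data)
w = np.array([0.8, -0.3, 0.5])      # their weights (w1, w2, w3)
b = 0.1                             # bias

print(neuron(x, w, b))              # activation propagated to the next neuron
```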

Fig. 7

Schematic explanation of neural networks. a An artificial neuron in a network. b An example of a generic deep neural network naively applied to a time series (an inappropriate structure for sequential data; see text). c Long short-term memory (LSTM) internal architecture. d Unfolded expression of the LSTM learning procedure. Yt observation data, wt weights, b bias, ht hidden state variables (or short-term memories), ct memory cells (or long-term memories)

Since then, many researchers have attempted to develop multilayer neural networks, known as DNNs, but achieving human-level recognition remained elusive until recently. Despite being much simpler than human brains, DNNs still involve thousands of parameters and face challenges due to the “vanishing-gradient” problem. In DNNs, parameters are estimated sequentially backward from output to input on the basis of their gradient values against the loss function. The loss function typically measures the deviation between the true values and the model estimations. This gradient calculation process, in which each gradient is computed from the previous gradients in a chain, is called “backpropagation”. When there are numerous network paths from end to start, gradients near the input tend to be diluted, making parameter estimation a challenging task.
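A simple numerical illustration of why gradients near the input tend to be diluted: by the chain rule, the gradient reaching an early layer is (ignoring the weights, for simplicity) a product of per-layer activation derivatives, and because the sigmoid's derivative never exceeds 0.25, that product shrinks rapidly as the chain of layers grows:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(5)
z = rng.normal(0.0, 1.0, size=50)               # pre-activation values along one path
local_grads = sigmoid(z) * (1.0 - sigmoid(z))   # sigmoid derivative, at most 0.25

for depth in (2, 5, 10, 30, 50):
    # backpropagated gradient factor accumulated over `depth` layers
    print(f"depth {depth:2d}: gradient factor = {np.prod(local_grads[:depth]):.3e}")
```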

After 2010, the vanishing-gradient problem was gradually alleviated. Researchers started using the rectified linear unit (ReLU) as an activation function rather than a sigmoid or hyperbolic tangent because it is robust against gradient dilution (Fig. 7a) (Gu et al. 2018). Shortcut connections inserted every few layers in a DNN architecture can keep gradient descent valid even in the upstream layers (He et al. 2016). Novel optimization techniques have also been developed specifically for backpropagation (Bach and Moulines 2011; Shamir and Zhang 2013).

In addition to the technical breakthroughs described above, DNNs and AIs in general have the unique feature of super empiricism, i.e., predictions are of primary importance (see Fig. 5 of science philosophy). To achieve high predictability, AI divides the records, i.e., pairs of inputs (predictors = environmental measurements) and outputs (response values = pest occurrence records), into three groups: training data, validation data, and test data (Stevens and Antiga 2019). Training data are fed for data-model fitting, and validation data are specifically employed to estimate hyperparameters such as model structures. The final test data are preserved to objectively evaluate the model’s accuracy. The model can be considered a black box because we are not interested in the structure of the model itself. Such an AI attitude reminds us of Karl Pearson, an empiricist incarnate of the early twentieth century. He insisted that no mechanical structures behind the data could be observed, and that any elucidations of causal relationships inevitably contain human prejudice and are useless in science (Otsuka 2020). In fact, because of their impressive predictability, DNNs that do not have any explainable structure have already been applied not only to information engineering but also to robotics, medical science, social science, economics, and other disciplines. However, it should be noted that DNNs generally have many parameters, which, in turn, require a large amount of data for training and validation.

DNNs can be applied to predict pest prevalence, but not in a simple manner. Pest-monitoring records are generally time series, and generic DNNs cannot properly predict the future without considering the sequential nature of the data (Fig. 7b). A special type of DNN, the recurrent neural network (RNN), has historically been employed for this purpose (Elman 1990). In an RNN, the same architecture is recursively recalled in the gradient calculation while the time-series records are fed into the model one by one (Fig. 7d). Long short-term memory (LSTM) is one of the most successful RNN architectures, equipped with both short-term and long-term memories (Hochreiter and Schmidhuber 1997; Jiang and Schotten 2020).

Compared with DNNs in general, RNNs suffer from an aggravated “vanishing problem” and an additional “exploding problem” during the gradient calculation process. Because the parameters in an RNN are shared throughout the data-feeding process, they are repeatedly multiplied during the gradient calculation for optimization. Therefore, the gradient can easily vanish (decrease to zero) or explode (grow exponentially). LSTM overcomes this problem because the calculation of long-term memory (ct in Fig. 7c) does not involve repeated multiplications of itself. Short-term memory (ht) functions as a hidden process, which works like the system values in state-space modeling and is controlled by long-term memory (ct) to determine the amount of information that should be transmitted to the next time step. Because of this memory-controlling mechanism, LSTM enables us to deal with very long time series with multiple states, e.g., long-term monitoring records of specific pests at multiple stations.
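A minimal sketch of applying an LSTM to a univariate monitoring series, written with PyTorch; the synthetic "pest count" series, the tiny network, and the short training run are all illustrative assumptions rather than a recommended configuration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# synthetic "monitoring record": a noisy periodic series standing in for pest counts
t = torch.arange(0, 200, dtype=torch.float32)
series = torch.sin(0.3 * t) + 0.1 * torch.randn(200)

# build (input window, next value) pairs for one-step-ahead forecasting
window = 20
X = torch.stack([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]
X = X.unsqueeze(-1)                 # shape: (samples, time steps, 1 feature)

class LSTMForecaster(nn.Module):
    def __init__(self, hidden=16):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        out, _ = self.lstm(x)       # hidden states h_t for every time step
        return self.head(out[:, -1, :]).squeeze(-1)   # predict from the last h_t

model = LSTMForecaster()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

train_X, train_y = X[:150], y[:150]          # simple train/test split
test_X, test_y = X[150:], y[150:]

for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(train_X), train_y)
    loss.backward()
    optimizer.step()

with torch.no_grad():
    test_loss = loss_fn(model(test_X), test_y)
print(f"train MSE {loss.item():.4f}, test MSE {test_loss.item():.4f}")
```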

Although practical applications of LSTMs to entomological records are still scarce, LSTMs are intensively applied to epidemiological time series (ArunKumar et al. 2021; Gomez-Cravioto et al. 2021; Kırbaş et al. 2020). As human health issues are the most critical for public interest among all scientific disciplines, governments and international organizations spend large budgets on disease monitoring. Consequently, a large amount of data is available for feeding into LSTMs. Generally, LSTMs predict the future far better than traditional statistical approaches. For example, Gomez-Cravioto et al. (2021) reported that an LSTM predicted the future incidence of COVID-19 in Latin America 47.2% better than traditional autoregressive models. As far as I know, there is no clear comparison of the predictability of LSTMs and state-space modeling. Although both have a similar structure, LSTM can model more flexible relationships in the data than state-space modeling; therefore, I believe that LSTM is likely to predict the future better than state-space modeling.

If we can accumulate vast amounts of pest-monitoring records in addition to physical and environmental data in the field with high precision, we may be able to employ LSTMs or other DNNs to achieve highly accurate predictions of pest prevalence in the future. Moreover, LSTMs and DNNs in general are highly generalizable: coding skills and sufficient data are the only requirements. Conversely, deductive approaches, including state-space modeling, require professional skills in statistics and biology because we need to envision, a priori, meaningful structures behind the data to obtain valid predictions.

Concluding remarks

There is no denying that empiricism and rationalism contributed to the rapid development of modern science and technology, acting as the two driving forces of the twentieth century. We should elucidate the mechanistic structure of nature using deductive approaches, such as mathematical models or simulations, and test the consequences of such models in rigorous inductive experiments or long-term observations. State-space modeling is a good example of the fusion of empiricism and rationalism.

AIs, specifically DNNs, are powerful inductive prediction tools in the contemporary world, and their importance will continue to increase. However, DNNs are highly complicated and resemble black boxes. It is difficult to determine which biological or physical factors drive pest populations in a DNN analysis. Consequently, we may be unable to construct rational countermeasures. Therefore, any deductive approach that explains the causality or mechanistic structures behind DNN predictions will be helpful for a deeper understanding of the behavior of DNNs and will provide important insights for detecting deficient information in the data.