1 Introduction

China is one of the world’s pioneering countries in promoting the development and acceptance of electric vehicles (EVs), owing to the unprecedented challenges of severe air pollution and a significant amount of CO2 emissions countrywide. In 2013, large-scale smog afflicted more than 100 large cities spread across 30 provinces. The dramatic growth in the number of motor vehicles over the past decade has partly contributed to the heavy haze in many metropolitan areas [1].

Both pure battery EVs and plug-in hybrid EVs are very promising technical alternatives to fossil-fuel-powered vehicles, and they play an important role in decarbonizing the transportation sector and in significantly reducing exhaust gas emissions in urban areas. However, forming an EV-oriented ecosystem for future transportation is a totally new and complicated decision process, necessitating a great deal of trial and error. Governments are putting forward incentive policies to support the development of EVs, yet these are based on very limited information about the future behavior of customers, compounded by uncertainty about the ultimate shape of the EV systems. The large difference between the government’s plans and reality is discussed in recent reports, and the goal of having 0.5 million EVs sold and in use by 2015 is facing a big challenge [2]. In this highly connected world, the future EV systems will no doubt have significant impacts on the future power systems as well [A1 and A2 in Appendix compare the authors’ results with Deloitte’s for China, and show very similar distributions.

2.2 Importance ranking of factors

Even if only the key factors are selected and surveyed in a limited number of questions, a complete understanding of people’s preferences regarding these key factors still proves difficult, especially when the number of questionnaires is limited. Although each question in the questionnaire relates to only one factor, the whole answer sheet still reveals a person’s deterministic view of the relations among all factors. Thus, the joint probability information of the psychological thresholds of all five factors must be embedded in the complete set of questionnaires collected.

If a multi-agent model is built to replace a group of respondents, the threshold value of each factor for each agent must be tuned based on the high-dimensional joint probability distribution that reflects the purchasing willingness of the survey respondents. However, 200 samples are still far from enough to create an accurate joint probability map of the five factors by any conventional method. To overcome this difficulty, one more question is included in the questionnaire, which requires each respondent to sort the factors by their importance. The proportions of respondents choosing each factor as the most important one are, in descending order: purchase price (37.0%), range (36.5%), charge time (11.5%), price difference (10.5%), and fuel price (4.5%). The proportions of these factors being selected as the second most important one are: purchase price (12.0%), range (28.5%), charge time (37.5%), price difference (12.5%), and fuel price (9.5%), as shown in Appendix B. However, it is impossible to know from these proportions alone the probability that respondents take the range as the second most important factor while taking the purchase price as the most important one. This question is answered neither by this table nor by the results shown in [18]. The actual conditional probability is 16.4%, which is not equal to the total probability of 28.5%. In order to reproduce the joint probabilistic distribution of such a group of factors after bulk sampling, the sampling agents have to be simulated based on the joint probabilistic distribution.
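The difference between the conditional and the total probability can be illustrated with a minimal sketch. The function names and the toy rankings below are hypothetical and are not taken from the survey data; the sketch only assumes that each answer sheet is stored as a tuple of factors ordered from most to least important.

```python
def marginal_at_position(rankings, position, factor):
    """Share of all sheets that place `factor` at the given rank position."""
    return sum(1 for r in rankings if r[position] == factor) / len(rankings)


def second_given_first(rankings, first, second):
    """Share of sheets ranking `second` 2nd among those ranking `first` 1st."""
    conditioned = [r for r in rankings if r[0] == first]
    if not conditioned:
        return None  # the conditioning event never occurs in the sample
    return sum(1 for r in conditioned if r[1] == second) / len(conditioned)


# Hypothetical toy answer sheets (not the survey data).
rankings = [
    ("purchase_price", "range", "charge_time"),
    ("purchase_price", "charge_time", "range"),
    ("purchase_price", "charge_time", "range"),
    ("range", "purchase_price", "charge_time"),
    ("charge_time", "range", "purchase_price"),
]

total_prob = marginal_at_position(rankings, 1, "range")
cond_prob = second_given_first(rankings, "purchase_price", "range")
```

In this toy sample the two values differ (2 of 5 sheets versus 1 of 3 sheets), which mirrors the gap between 28.5% and 16.4% reported above.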

In order to store joint probabilistic data effectively, a data structure with multiple layers representing conditional probabilities is adopted. The concept of “sorted importance layer” is defined because the ordering of factors in the sampling process may influence how the information from the questionnaires is utilized. In the data structure, layer i corresponds to the frequency distribution of the i-th important factor over all n − i + 1 possibilities, and to the joint frequency distribution between layer i and the n − i possibilities in layer i + 1, where n is the total number of factors. The latter distribution depends not only on the sampling in the current layer, but also on previous sampling results. The ordering of the sampling process uses the importance data associated with sample factors, rather than the reverse. The joint probability of a layer with smaller i implies larger entropy; it therefore needs a sufficiently large number of samples in order to obtain a reliable statistical result. The entropy of layers with larger i is relatively smaller, hence approximate statistical results can be used.

Given the above discussions, Fig. 1 illustrates a tree-shaped correlation data structure among adjacent layers for sorting factor importance, reflecting the correlations of frequency distributions. Since there is no uncertainty associated with the final layer, the “tree” structure has 4 layers to represent the correlations among the five factors. Here, a_k (k = 1, 2, 3, 4, 5) represents the five features respectively, namely range, charge time, price difference, purchase price and fuel price.

Fig. 1 Multi-layer frequency distributions
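One possible way to hold such a layered structure in memory is sketched below. It is an illustrative assumption rather than the authors’ implementation: each node of the tree is keyed by the ordered prefix of factors already placed, and stores the counts of the factor chosen next. The function name and the toy rankings are hypothetical.

```python
from collections import Counter, defaultdict


def build_importance_tree(rankings):
    """Map each ranking prefix to a Counter of the factor that follows it."""
    tree = defaultdict(Counter)
    for ranking in rankings:
        # The last position is fully determined by the earlier ones,
        # so only n - 1 layers carry information.
        for i in range(len(ranking) - 1):
            tree[tuple(ranking[:i])][ranking[i]] += 1
    return tree


# Hypothetical toy answer sheets with n = 3 factors.
rankings = [("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c")]
tree = build_importance_tree(rankings)

first_layer = tree[()]        # counts of the most important factor
second_given_a = tree[("a",)]  # counts of the 2nd factor when "a" is 1st
```

A node at prefix length i − 1 can hold at most n − i + 1 distinct factors, which matches the number of possibilities stated for layer i.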

2.3 Rules for data reconciliation for insufficient number of samples

The joint frequencies on the correlation “tree” are counted layer by layer from the top to the bottom. However, insufficient samples often occur at the bottom layers of the “tree”. If the frequency is too low for a certain node, the information passed to the next layers may be meaningless, and approximate distributions are therefore used to replace the original ones to compensate for the missing information. The designed rules to control the replacement are as follows:

Rule 1: If the number of samples in the selected layer is sufficient, the frequency counting is strictly conducted on the corresponding data collected from the questionnaires.

Rule 2: If the number of samples in the selected layer is insufficient, i.e. the number is less than a threshold value α (α is set as 8 in Fig. 1), the correlation between factors is ignored and the independent distribution of the corresponding factor is used directly.

Rule 2 is adopted only in cases of very low frequencies, so its influence on the accuracy is limited, which is confirmed through a number of simulation studies.
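The switch between Rule 1 and Rule 2 can be sketched as follows. This is a minimal sketch under stated assumptions: the function name is hypothetical, and the “independent distribution” is interpreted here as the frequency of each factor at the same rank position over all answer sheets, with already placed factors removed and the remainder renormalized. The source does not spell out this last detail.

```python
from collections import Counter

ALPHA = 8  # threshold value quoted for Fig. 1


def next_factor_distribution(rankings, prefix, alpha=ALPHA):
    """Return (distribution of the next factor, number of the rule applied)."""
    depth = len(prefix)
    matching = [r for r in rankings if tuple(r[:depth]) == tuple(prefix)]
    if len(matching) >= alpha:
        # Rule 1: enough sheets share this prefix, count them directly.
        counts = Counter(r[depth] for r in matching)
        rule = 1
    else:
        # Rule 2: ignore the correlation and use the independent
        # distribution of this rank position over all sheets (assumption).
        counts = Counter(r[depth] for r in rankings)
        rule = 2
    for used in prefix:
        counts.pop(used, None)  # a factor cannot be placed twice
    total = sum(counts.values())
    return {f: c / total for f, c in counts.items()}, rule


# Hypothetical toy answer sheets.
rankings = [("price", "range", "charge")] * 10 + [("charge", "price", "range")] * 2

dist_rule1, rule1 = next_factor_distribution(rankings, ("price",))
dist_rule2, rule2 = next_factor_distribution(rankings, ("charge",))
```

Only two toy sheets start with "charge", so the second call falls back to the distribution over all twelve sheets.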

2.4 Information extraction for different psychological thresholds of factors

Figure 2 shows the upper-triangular correlation matrix reflecting the joint probability between the 1st sorted importance layer and the 2nd layer (Table B2 in Appendix B shows the independent distribution of each single factor’s psychological threshold). Including repeated ones, the matrix contains n × n sub-matrices, where n is the number of factors. Here, the sub-matrix (i, j) records the frequency with which factor j is drawn in the 2nd layer when factor i is selected in the 1st layer. For example, the sub-matrix (2, 4) records the frequency with which the charge time is placed at the 1st importance layer and the purchase price is placed at the 2nd layer.

Fig. 2 Joint distribution of psychological thresholds of factors in a two layer structure

The dimension of each sub-matrix is decided by the number of thresholds of the corresponding factor, which is equal to 5 in Fig. 2. Every row represents a threshold value \( d_{1.g_{1}} \) of the 1st important factor, while every column represents a threshold value \( d_{2.g_{2}} \) of the 2nd important factor. The value of a sub-matrix’s element records the joint frequency with which thresholds \( d_{1.g_{1}} \) and \( d_{2.g_{2}} \) are both selected by the participants of the questionnaire-based survey. Figure 3 is the flow chart to compute the complete joint frequency \( d_{1.g_{1}} d_{2.g_{2}} d_{3.g_{3}} d_{4.g_{4}} d_{5.g_{5}} \) among all factors.

Fig. 3 Sorting algorithm based on psychological thresholds of factors
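Counting one sub-matrix of joint threshold frequencies can be sketched as below. The record layout is an assumption made for illustration: each answer sheet is a dictionary holding the importance ranking and, for each factor, the index (0 to 4) of the chosen threshold option. The function name and the toy sheets are hypothetical.

```python
N_LEVELS = 5  # number of threshold options per factor, as in Fig. 2


def threshold_submatrix(sheets, first, second, n_levels=N_LEVELS):
    """Joint threshold counts for sheets ranking `first` 1st and `second` 2nd."""
    matrix = [[0] * n_levels for _ in range(n_levels)]
    for sheet in sheets:
        ranking = sheet["ranking"]
        if ranking[0] == first and ranking[1] == second:
            row = sheet["levels"][first]   # threshold index of the 1st factor
            col = sheet["levels"][second]  # threshold index of the 2nd factor
            matrix[row][col] += 1
    return matrix


# Hypothetical toy answer sheets.
sheets = [
    {"ranking": ("charge_time", "purchase_price"),
     "levels": {"charge_time": 1, "purchase_price": 3}},
    {"ranking": ("charge_time", "purchase_price"),
     "levels": {"charge_time": 1, "purchase_price": 3}},
    {"ranking": ("charge_time", "purchase_price"),
     "levels": {"charge_time": 0, "purchase_price": 4}},
    {"ranking": ("purchase_price", "charge_time"),
     "levels": {"charge_time": 2, "purchase_price": 2}},
]

sub = threshold_submatrix(sheets, "charge_time", "purchase_price")
```

The last toy sheet ranks the two factors in the opposite order, so it would be counted in a different sub-matrix and is skipped here.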

Since there are more choices for the psychological threshold of a factor, Rule 3 is introduced to fully utilize the information from the answer sheets and to reduce the approximation error.

Rule 3: If the number of samples is less than a threshold value β in a group where the participants choose exactly the given psychological threshold, then a new group is used for counting, in which participants choose a value equal to or larger than the given psychological threshold.
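Rule 3 can be sketched as a group-selection step. The sketch assumes that the threshold options of a factor are encoded as ordered integer indices, so that “equal to or larger than” is a plain numeric comparison; the function name and the toy data are hypothetical.

```python
BETA = 40  # value of the threshold used in Fig. 7b


def conditioning_group(sheets, factor, level, beta=BETA):
    """Select the answer sheets used for counting under Rule 3."""
    exact = [s for s in sheets if s[factor] == level]
    if len(exact) >= beta:
        return exact  # enough sheets chose exactly this threshold
    # Too few exact matches: widen the group to thresholds >= level.
    return [s for s in sheets if s[factor] >= level]


# Hypothetical toy answer sheets: threshold index chosen for the range.
sheets = [{"range": level} for level in (0, 1, 1, 2, 3, 4)]

kept_exact = conditioning_group(sheets, "range", 1, beta=2)
widened = conditioning_group(sheets, "range", 2, beta=2)
```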

3 Multi-agent modeling with full set of behaviors

The first half of Fig. 4 shows the algorithm to build multi-agents reflecting customers’ willingness to buy EVs based on the multi-dimensional information embedded in the collected questionnaires. The key part is the extraction of joint probabilistic distributions from the sorted importance data of different factors, and the distribution of psychological thresholds of these factors. The second half of Fig. 4 uses these distributions to generate as many individual agents as needed.

Fig. 4 Flow chart for generating individual agents
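The generation step can be illustrated for the importance ordering alone. The sketch below is an assumption-laden simplification, not the flow chart of Fig. 4: it draws the ordering of one agent layer by layer, uses the sheets sharing the current prefix when at least α of them exist, and otherwise falls back to all sheets. The function name and the toy rankings are hypothetical, and the sampling of threshold values is omitted.

```python
import random
from collections import Counter


def sample_ranking(rankings, rng, alpha=8):
    """Draw one importance ordering, layer by layer, from the answer sheets."""
    factors = list(rankings[0])
    prefix = []
    while len(prefix) < len(factors) - 1:
        depth = len(prefix)
        matching = [r for r in rankings if list(r[:depth]) == prefix]
        # Use the conditioned sheets if there are enough, else all sheets.
        source = matching if len(matching) >= alpha else rankings
        counts = Counter(r[depth] for r in source if r[depth] not in prefix)
        if not counts:
            # No usable observation: fall back to a uniform choice (assumption).
            counts = Counter(f for f in factors if f not in prefix)
        names = list(counts)
        weights = [counts[name] for name in names]
        prefix.append(rng.choices(names, weights=weights)[0])
    # The last factor is the only one left.
    prefix.extend(f for f in factors if f not in prefix)
    return tuple(prefix)


# Hypothetical toy answer sheets.
rankings = [("price", "range", "charge")] * 9 + [("range", "price", "charge")] * 3

rng = random.Random(0)
agents = [sample_ranking(rankings, rng) for _ in range(1000)]
```

With enough agents, the share of each ordering among the generated agents approaches its share in the toy sheets.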

4 Verification of multi-agent simulation results

In the verification, different target EV types are tested with all factors being randomly selected, and the ratio of questionnaires in which all the thresholds are reached is used as the benchmark reflecting respondents’ willingness to buy EVs for comparison purposes. Meanwhile, different sets of Monte-Carlo simulations, each with 100000 agents, are generated using the above algorithm to acquire the ratio of potential buyers who are satisfied with this EV type (the purchase ratio below for short), and the errors in comparison with the benchmark results are recorded. The statistics of simulation errors from a large number of trials with different EV types confirm the effectiveness of the rules adopted above for extracting the joint distribution statistics and of the multi-agent models developed.
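The purchase ratio and its relative error against the benchmark can be sketched as follows. The factor names, units, and the direction in which each threshold counts as “reached” are assumptions made for illustration (a range at least as long as required, a charge time and a price no higher than accepted); the source does not list them explicitly.

```python
# +1: the vehicle value must be at least the threshold;
# -1: the vehicle value must be at most the threshold (assumed directions).
DIRECTION = {"range_km": +1, "charge_time_h": -1, "purchase_price": -1}


def satisfied(thresholds, vehicle):
    """True if the vehicle reaches every psychological threshold."""
    return all(
        DIRECTION[f] * (vehicle[f] - thresholds[f]) >= 0 for f in thresholds
    )


def purchase_ratio(population, vehicle):
    """Share of respondents (or agents) satisfied with the vehicle."""
    return sum(satisfied(t, vehicle) for t in population) / len(population)


def relative_error(simulated, benchmark):
    """Relative deviation of a simulated ratio from the benchmark ratio."""
    return abs(simulated - benchmark) / benchmark


# Hypothetical target vehicle and toy threshold sets.
vehicle = {"range_km": 200, "charge_time_h": 2, "purchase_price": 150}
population = [
    {"range_km": 150, "charge_time_h": 4, "purchase_price": 200},
    {"range_km": 300, "charge_time_h": 4, "purchase_price": 200},
    {"range_km": 200, "charge_time_h": 2, "purchase_price": 150},
    {"range_km": 100, "charge_time_h": 1, "purchase_price": 300},
]

benchmark = purchase_ratio(population, vehicle)
error = relative_error(0.45, benchmark)
```

The same `purchase_ratio` function applies to the answer sheets (giving the benchmark) and to a generated agent population (giving the simulated value).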

For a certain subset containing m answer sheets, samples are randomly selected from the total for building the statistical multi-agent model. The purchase ratio is acquired from simulations with EV type 2 (as shown in Appendix C) used as the target vehicle, and presented as a point in Fig. 5. Repeating the procedure with m from 80 to 200, in steps of 10, the results are presented as a dashed curve. Then the whole process is repeated with new random selections to obtain the 10 dashed curves in the figure.

Fig. 5 Comparison between agent-based simulation results and data taken from questionnaires

The solid line with circles shows the benchmark, where the value with respect to m = 200 is the actual purchase ratio. As the value of m increases, all individual curves converge to the benchmark.

A curve marked by n in Fig. 6 is the average of n dashed curves in Fig. 5. Figure 6 shows that the curve with a large enough n, e.g. 10, closely matches the benchmark.

Fig. 6 Comparison between average agent-based simulation results and data taken from questionnaires

Figure 7 shows the influence of the rules on the simulation error. The horizontal axis represents the number of questionnaires taken for extracting the distributions of willingness to buy EVs. The vertical axis shows the relative error. Otherwise, the symbols and legends are the same as in Fig. 6. Figure 7a shows the results where the thresholds α and β, which control the switches towards Rule 2 and Rule 3 respectively, are both set to 0, so that these two rules are never used. It shows that the relative error can be confined to a level below 10% only when the number of questionnaires is equal to or greater than 150. This indicates that the degree of information loss has a significant impact on the simulation precision.

Fig. 7 Impact of the number of questionnaires on the simulation error

In Fig. 7b, where α and β are set to 8 and 40 respectively, the relative error has clearly dropped. The comparison confirms that the algorithm proposed in this paper remains valid for a smaller number of questionnaires.

5 Simulation analysis of group behaviors

5.1 Hybrid EE-based simulation using both multi-agents with uncertain behaviors and human participants

In this paper, it has been shown that the multi-agent system should be modeled with as much deep information extracted from the questionnaires as possible, especially the joint probability information of important factors, in order to accurately reflect the statistical regularities of respondents’ decisions. With the multi-agent system, a study based on hybrid simulation, including a large number of individuals controlled either by human experimenters or by the probabilistic multi-agent model, can be performed.

The following 2 experiments are our recent attempts along this research direction. The first experiment tests the agent model in different scenarios of vehicle types, in order to extract more information from the questionnaires and help EV producers to evaluate the popularity of their vehicle types; the second one extrapolates possible future morphology of the EV market based on the spatio-temporal information taken from the current questionnaires by partially customizing agents’ preferences.

5.2 Influence of vehicle parameters on EV purchasers’ willingness

This simulation studies the correlation of vehicle parameters and the influence of different psychological thresholds of factors on the EV purchasers’ willingness. The 3 test scenarios are listed in Appendix C.

As shown in Fig. 8a, the ratio of willingness to buy EVs increases slightly when the range varies from 0 to 80 km; the ratio rises quickly when the range is within 80–320 km; and finally the ratio’s growth rate drops again when the range is greater than 320 km. Figure 8b shows a linear relation between the charge time and the ratio. The relations between price difference and the ratio, and between purchase price and the ratio, are exponential to different degrees, as shown in Fig. 8c, d respectively. The relation between fuel price and the ratio has two linear parts divided by a discontinuity point, above which customers are more sensitive to variation in the parameter, as shown in Fig. 8e.

Fig. 8 Influence of EV parameters on willingness to buy EVs

5.3 Influence of customers’ preference variation on purchase ratio

Customers’ preferences vary across places and over time [18]. The influence of preference variation on the willingness to buy EVs can be studied by altering the probability distribution used in the multi-agent models, even turning it into a time-varying one. In Fig. 9, the preference for the range is studied, as the customers are more sensitive to this parameter. Curve 0 is the result obtained from the original questionnaires. For Curve 1, the psychological thresholds of all the customers are set higher. Compared with Curve 0, Curve 1 leads to the most significant drop of purchase ratio at the range of 320 km (the median of the range’s possible variation). On Curve 2, all the customers’ psychological thresholds are distributed uniformly. The simulation result indicates that while the market for low-end EVs improves, the market for high-end EVs gets worse.

Fig. 9 Influence of preferences to the range on the ratio of willingness to buy EVs

6 Conclusions

The hybrid EE-based simulation techniques combining multi-agents and human participants strike a balance among different computation considerations involving strong subjective willingness of participants and a large number of simulated individuals. In this paper, a model is first developed to describe the uncertain psychological thresholds for different characteristic factors. For research on people’s willingness to buy EVs, the joint probability distributions can be extracted by this model from a limited number of relevant questionnaires collected from participants. A probabilistic multi-agent model is then constructed to reproduce the statistical distribution of respondents’ willingness in response to different characteristic factors. Case studies with various target EV types in this paper confirm the validity of the model. Further, by tuning parameters of the probabilistic agent model, the influence of preference variations on the purchase ratio can be investigated. The method proposed in this paper helps to increase the simulation scale, maintain comparability among repeated trials, and reflect the effect of online attendance of human participants. It further offers a modeling approach for human behaviors, and the statistical rules proposed can help to deal effectively with difficulties arising from an insufficient number of samples. In summary, the proposed method provides a powerful simulation platform for analyzing the influences of various factors on the development of the EV industry.