Identification of potential causal variables for statistical downscaling models: effectiveness of graphical modeling approach

  • Original Paper
  • Theoretical and Applied Climatology

Abstract

Selection of potential causal variables (PCVs) from a pool of many possibly associated variables is a critical issue, since it can significantly affect the performance of any statistical downscaling model. Generally, the variable to be downscaled is associated with many other hydrologic and climatic (aka hydroclimatic) variables. Most of the existing approaches, such as correlation analysis (CA), partial correlation analysis (PaCA), and stepwise regression analysis (SRA), rely mostly on mutual association for the selection of PCVs. However, none of these approaches investigates the detailed dependence structure, which may help in eliminating unwanted information and in selecting the PCVs efficiently for downscaling the target variable. In this study, the effectiveness of the graphical modeling (GM) approach is explored for the selection of the PCVs, as GM can effectively identify the detailed conditional independence structure among all the associated variables. For demonstration, downscaling of monthly precipitation is undertaken using the PCVs identified by CA, PaCA, SRA, and the proposed GM approach. Two different downscaling models, namely the statistical downscaling model (SDSM) and a support vector regression (SVR)–based downscaling model, are utilized. The results show that the PCVs identified through the proposed GM approach provide consistent as well as robust performance across different regions and seasons, owing to the ability of GM to capture the complete conditional independence structure among the variables. The downscaled monthly precipitation obtained using the proposed approach matches the observed data better in terms of the mean, the variance, and the probability distribution. Overall, this study recommends the GM approach for the identification of the PCVs for downscaling models.

Funding

The work was partially supported by the Department of Science and Technology, Climate Change Programme (SPLICE), Government of India (Ref No. DST/CCP/CoE/79/2017(G)), through a sponsored project.

Author information

Corresponding author

Correspondence to Rajib Maity.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix 1. Mathematical details of correlation analysis, partial correlation analysis, and stepwise regression analysis

Appendix 1.1. Correlation analysis

Correlation analysis (CA) is the most commonly used approach for the selection of PCVs. A strong correlation between the target variable and a candidate variable, drawn from a pool of possibly associated hydroclimatic variables, is the most basic criterion for selecting PCVs. In this approach, the selection is governed by the correlation coefficient between each associated variable and the target variable to be downscaled. A certain value of the correlation coefficient is taken as the threshold, and all the associated variables with an equal or higher correlation are considered the PCVs for downscaling. Pearson’s correlation coefficient is used in this study and can be expressed as follows,

$$ {r}_{xy}=\frac{\sum \limits_{i=1}^n\left({x}_i-\overline{x}\right)\left({y}_i-\overline{y}\right)}{\sqrt{\sum \limits_{i=1}^n{\left({x}_i-\overline{x}\right)}^2\sum \limits_{i=1}^n{\left({y}_i-\overline{y}\right)}^2}} $$
(5)

where \( r_{xy} \) is Pearson’s correlation coefficient between an associated variable (X) and the predictand (Y), n is the number of observations, \( x_i \) and \( y_i \) are the observations of X and Y respectively, and \( \overline{x} \) and \( \overline{y} \) are the means of X and Y respectively. The p value is evaluated at the 95% confidence level, taking the test statistic \( t={r}_{xy}\sqrt{\left(n-2\right)/\left(1-{r}_{xy}^2\right)} \) to follow a t distribution with n − 2 degrees of freedom. The associated variables with a p value less than 0.05, i.e., those significantly correlated with the predictand, are selected as the PCVs of the statistical downscaling model.
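As a minimal sketch of the calculation above, the following Python snippet evaluates Eq. (5) and the associated t statistic using only the standard library. The function names, the toy data, and the tabulated critical value are illustrative assumptions and are not taken from the study; the significance decision is made by comparing |t| with a critical value supplied by the user rather than by computing the p value directly.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient, following Eq. (5)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have equal length of at least 3")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def t_statistic(r, n):
    """t statistic for testing zero correlation (n - 2 degrees of freedom).

    Undefined for |r| = 1, where the denominator vanishes.
    """
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical toy data, not taken from the study.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r = pearson_r(x, y)
t = t_statistic(r, len(x))
t_crit = 3.182  # tabulated two-sided 5% critical t value for 3 degrees of freedom
print(f"r = {r:.4f}, t = {t:.4f}, significant = {abs(t) > t_crit}")
```

With these five toy observations, the correlation is fairly high (about 0.77) but the t statistic stays below the critical value, which illustrates why a significance test is applied in addition to a threshold on the coefficient when the sample is small.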

Appendix 1.2. Partial correlation analysis

Partial correlation is the measure of association between two variables (a particular associated variable and target variable), while controlling the effect of other associated variables. The partial correlation analysis (PaCA) can be used to identify the PCVs for downscaling as it adjusts the effect of other associated variables. The partial correlation coefficient between two variables controlling the third variable can be expressed as follows,

$$ {r}_{xy,z}=\frac{r_{xy}-{r}_{xz}{r}_{yz}}{\sqrt{\left(1-{r}_{xz}^2\right)\left(1-{r}_{yz}^2\right)}} $$
(6)

where \( r_{xy,z} \) is the partial correlation between two variables X and Y when the third variable Z is controlled, and \( r_{xy} \), \( r_{xz} \), and \( r_{yz} \) are the correlation coefficients between X and Y, X and Z, and Y and Z respectively. The p value is evaluated at the 95% confidence level, taking the test statistic \( t={r}_{xy,z}\sqrt{\left(n-3\right)/\left(1-{r}_{xy,z}^2\right)} \) to follow a t distribution with n − 3 degrees of freedom. The associated variables with a p value less than 0.05 are selected as the PCVs of the statistical downscaling model.
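Eq. (6) can be sketched in a few lines of Python. The snippet below is only an illustration for the single-control-variable case: the function names, the correlation values, the sample size, and the tabulated critical value are assumptions chosen for the example, not values from the study.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y controlling Z, Eq. (6)."""
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denom


def partial_t_statistic(r_partial, n):
    """t statistic for a first-order partial correlation (n - 3 degrees of freedom)."""
    return r_partial * math.sqrt((n - 3) / (1.0 - r_partial ** 2))


# Hypothetical pairwise correlations and sample size.
r_xy_z = partial_r(r_xy=0.5, r_xz=0.5, r_yz=0.5)
t = partial_t_statistic(r_xy_z, n=30)
t_crit = 2.052  # tabulated two-sided 5% critical t value for 27 degrees of freedom
print(f"r_xy,z = {r_xy_z:.4f}, t = {t:.4f}, significant = {abs(t) > t_crit}")
```

In this hypothetical case, controlling Z reduces the association from 0.5 to about 0.33, and the partial correlation is no longer significant at the 5% level for n = 30, whereas the zero-order correlation of 0.5 would be. This is the kind of adjustment that distinguishes PaCA from CA.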

Appendix 1.3. Stepwise regression analysis

Stepwise regression analysis (SRA) is a method of fitting a regression model by stepwise removal of the least significant variables until all the remaining variables are significant. This method is often used for the selection of PCVs when a large number of associated variables are available, and to deal with issues related to multi-collinearity. In this technique, all the causal variables are initially considered in the model. At each step of the analysis, a variable is included in or excluded from the model, usually on the basis of a partial F-test. If F is greater than the critical F value, the causal variable can be included in the equation. The partial F statistic can be expressed as follows,

$$ F=\frac{\left({R}_q^2-{R}_{q-1}^2\right)\left(n-q-1\right)}{\left(1-{R}_q^2\right)} $$
(7)

where R is the multiple correlation coefficient between the criterion variable and the prediction equation (so that \( {R}_q^2 \) and \( {R}_{q-1}^2 \) are the coefficients of determination with q and q − 1 causal variables), q is the number of causal variables in the equation, and n is as defined before. If the test statistic is less than the critical F value at the 95% confidence level with (1, n − q − 1) degrees of freedom, the causal variable should be excluded from the equation. The causal variables with a p value less than 0.05 are selected as the PCVs of the statistical downscaling model.
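The partial F-test of Eq. (7) can be illustrated with a short Python sketch. The function name, the R² values, the sample size, and the tabulated critical value below are hypothetical and serve only to show how the statistic is evaluated for one candidate variable; a full stepwise procedure would repeat this test while refitting the regression at each step.

```python
def partial_f(r2_full, r2_reduced, n, q):
    """Partial F statistic of Eq. (7).

    r2_full    : R^2 of the model with q causal variables
    r2_reduced : R^2 of the model with q - 1 causal variables
    n          : number of observations
    """
    if not 0.0 <= r2_reduced <= r2_full < 1.0:
        raise ValueError("expected 0 <= r2_reduced <= r2_full < 1")
    if n - q - 1 <= 0:
        raise ValueError("n - q - 1 must be positive")
    return (r2_full - r2_reduced) * (n - q - 1) / (1.0 - r2_full)


# Hypothetical values: adding the 4th variable raises R^2 from 0.5 to 0.6.
f_stat = partial_f(r2_full=0.6, r2_reduced=0.5, n=25, q=4)
f_crit = 4.35  # tabulated 5% critical F value for (1, 20) degrees of freedom
keep_variable = f_stat > f_crit
print(f"F = {f_stat:.2f}, keep variable = {keep_variable}")
```

Here the statistic is about 5.0, which exceeds the tabulated critical value, so the candidate variable would be retained in this hypothetical step.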

About this article

Cite this article

Dutta, R., Maity, R. Identification of potential causal variables for statistical downscaling models: effectiveness of graphical modeling approach. Theor Appl Climatol 142, 1255–1269 (2020). https://doi.org/10.1007/s00704-020-03372-4

  • DOI: https://doi.org/10.1007/s00704-020-03372-4
