1 Introduction

In statistics, one often needs to measure the dependence between two variables in order to understand their behavior. It is useful to know whether there is some relationship, how strong it is, and whether it can be used, for instance, to predict future observations. Consequently, it is important to have a suitable coefficient that works as a measure of dependence.

Several different options have been introduced for this purpose over the years. Already in the 19th century, Pearson’s correlation coefficient was defined to identify linear dependence between variables. Its definition was later extended so that also non-linear and non-monotonic dependence could be detected: C. Spearman introduced Spearman’s correlation coefficient in 1904 (Spearman, 1904), H. Gebelein the maximal correlation in 1941 (Gebelein, 1941), and G.J. Székely et al. the distance correlation in 2007 (Székely et al. 2007). The birth of C. Shannon’s information theory (Shannon, 1948) in the 1940s enabled measuring non-functional dependence by using the mutual information, as formulated in 1957 by E.H. Linfoot (Linfoot, 1957), and yet another quantity named the maximal information coefficient (MIC) was proposed in 2011 by D.N. Reshef et al. (Reshef et al. 2011). Furthermore, there exist local measures of dependence, such as the correlation curve (Bjerve and Doksum, 1993) and the local Gaussian correlation (Tjøstheim and Hufthammer, 2013), and dependence between random variables can also be described with a type of multivariate cumulative distribution function called a copula (Sklar, 1959).

It is important to note that a coefficient consisting of a single number or index cannot fully reveal the real nature of the underlying dependence (Balakrishnan and Lai, 2009) but, given their simple expression, the different correlation coefficients, the mutual information, and the MIC are very useful and therefore interesting topics of study. However, the number of these coefficients raises the question of which one of them should be used in a given situation. In (Rényi, 1959), A. Rényi introduced seven fundamental properties for a measure of dependence, including symmetry, values ranging over the interval [0,1], and the value 0 indicating independence. Since most of the requirements by Rényi are trivially fulfilled by the aforementioned coefficients or their slightly modified versions, we do not consider these properties here but instead use the following three criteria, of which the first and the third were introduced in (Reshef et al. 2011) and the second is notably studied in (Kinney and Atwal, 2014).

Firstly, we need to consider the generality of the measures of dependence because it is important that a chosen coefficient can be applied in different situations. Does our quantity only detect linear, monotonic, or functional dependence, or can it also recognize more complicated relationships between the variables? It must also be taken into account whether the coefficient is designed for continuous or discrete variables, and how many observations it needs to work properly.

The other significant requirement is the power of the coefficient. How effective is the measure when used in a statistical test to decide whether there is some association between the variables or not? Namely, we can use any of our measures to test a null hypothesis of no dependence between two variables by first choosing a suitable threshold value from data of independent variables so that the probability of rejecting a true null hypothesis is fixed, and then computing the probability of rejecting a false null hypothesis with the chosen threshold and data of two dependent variables. It is known that the amount of statistical noise in the relationship affects the power of the coefficients and, in particular, the MIC has been criticized for having too little power in the case of noisy data (Simon and Tibshirani, 2014).

The third criterion is the equitability of the measures of dependence. Does the coefficient give similar values for relationships that are based on different functions but have the same level of noise? This property was first attributed to the MIC in (Reshef et al. 2011) but, according to (Kinney and Atwal, 2014), the MIC is not as equitable as originally claimed.

While each of the coefficients considered here has already been studied separately (Asoodeh et al. 2015; Kinney and Atwal, 2014; Xiao et al. 2016) and there is a survey article by D. Tjøstheim et al. (Tjøstheim et al. 2022) about copulas and local measures of dependence, there is relatively little research comparing the different non-local, single-number measures with each other. Our aim in this article is to fill this gap by studying Pearson’s and Spearman’s correlation coefficients, the maximal correlation, the distance correlation, the mutual information, and the MIC together. To find out whether there is some coefficient that always detects dependence better than the others, we study them experimentally through several simulations implemented with the programming language R.

The structure of this article is as follows. First, we define all the measures of dependence studied here and explain the methods for their computation in Section 2. In Section 3, we introduce our models and check what kind of values our coefficients give for them. Then, in Section 4, we compare the power of our coefficients, also considering how it is affected by certain elements, such as the exact type of dependence, the amount of noise, and the number of observations. Finally, in Section 5, we study the equitability of the coefficients under different functional relationships.

2 Preliminaries

Let us first define all the measures of dependence and show how they can be computed with the programming language R. If we have observations (xi,yi), i = 1,...,n, from two variables X and Y, we can estimate the correlation between these variables by computing Pearson’s correlation coefficient (Xiao et al. 2016, (4), p. 3868)

$$ \begin{array}{@{}rcl@{}} r=\frac{{\sum}_{i=1}^{n}(x_{i}-{\overline{x}})({y_{i}}-{\overline{y}})}{\sqrt{{\sum}_{i=1}^{n}({x_{i}}-{\overline{x}})^{2}{{\sum}_{i=1}^{n}}(y_{i}-{\overline{y}})^{2}}}\in[-1,1], \end{array} $$
(2.1)

where \({\overline {x}}\) and \(\overline {y}\) denote the means of vectors (x1,...,xn) and (y1,...,yn), respectively. This coefficient was designed for measuring linear dependence between two variables whose marginal distributions are assumed to be normal, but it can also recognize non-linear dependence as long as it is monotonic.

One of the most well-known alternatives to Pearson’s correlation coefficient is Spearman’s correlation coefficient rs, which is found for n paired observations (xi,yi) by first converting them into their rank numbers and then calculating Pearson’s correlation coefficient of these ranks (Xiao et al. 2016, p. 3869). Spearman’s coefficient also lies in the interval [− 1,1] but, compared to Pearson’s coefficient, it is better suited to situations where the dependence is non-linear but monotonic or the variables are not normally distributed. Still, neither Pearson’s nor Spearman’s correlation coefficient is a good choice when the relationship between the variables is non-monotonic.
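For illustration, the following minimal R sketch (not taken from the paper) verifies the rank-based definition on simulated data:

```r
# A minimal illustration that Spearman's coefficient equals Pearson's
# coefficient computed from the rank numbers (no ties in this simulated data).
set.seed(1)
x <- rnorm(20); y <- x^3 + rnorm(20, sd = 0.1)
cor(rank(x), rank(y))             # Pearson's r of the ranks
cor(x, y, method = "spearman")    # the same value via the built-in option
```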

However, we can use the maximal correlation (Asoodeh et al. 2015, (1), p. 27)

$$ \begin{array}{@{}rcl@{}} \rho_{\max}=\sup\{\rho(f_{0}(X);f_{1}(Y))\}\in[0,1] \end{array} $$
(2.2)

to measure all types of functional dependence, regardless of whether they are monotonic or not. Above, the supremum is taken over all the real-valued functions f0, f1 defined for the values of the variables X and Y, respectively, such that E(f0(X)) = E(f1(Y)) = 0 and E(f0(X)2) = E(f1(Y)2) = 1. The notation ρ(;) denotes here the population correlation, which can be estimated from the data by computing Pearson’s coefficient r.

Another measure of dependence based on the definition of correlation is the (sample) distance correlation

$$ \begin{array}{@{}rcl@{}} \rho_{\text{dist}}=\sqrt{\frac{\mathcal{V}^{2}_{n}(X;Y)}{\sqrt{\mathcal{V}^{2}_{n}(X)\mathcal{V}^{2}_{n}(Y)}}}\in[0,1], \end{array} $$
(2.3)

where, for n paired observations (xi,yi) from the variables X and Y,

$$ \begin{array}{@{}rcl@{}} &&\mathcal{V}^{2}_{n}(X;Y) \!= \!\frac{1}{n^{2}}\sum\limits_{j=1}^{n}\sum\limits_{k=1}^{n}A_{j,k}B_{j,k}, \! \!\quad \mathcal{V}^{2}_{n}(X) \!= \!\mathcal{V}^{2}_{n}(X;X), \! \! \!\quad \mathcal{V}^{2}_{n}(Y) \!= \!\mathcal{V}^{2}_{n}(Y;Y),\\ &&A_{j,k}=|x_{j} \!- \!x_{k}|-\frac{1}{n}\sum\limits_{l=1}^{n}|x_{j} \!- \!x_{l}|-\frac{1}{n}\sum\limits_{l=1}^{n}|x_{k}-x_{l}| \!+ \!\frac{1}{n^{2}}\sum\limits_{l=1}^{n}\sum\limits_{h=1}^{n}|x_{l}-x_{h}|,\\ &&\text{and}\\ &&B_{j,k}=|y_{j}-y_{k}|-\frac{1}{n}\sum\limits_{l=1}^{n}|y_{j}-y_{l}|-\frac{1}{n}\sum\limits_{l=1}^{n}|y_{k}-y_{l}|+\frac{1}{n^{2}}\sum\limits_{l=1}^{n}\sum\limits_{h=1}^{n}|y_{l}-y_{h}|. \end{array} $$

This coefficient is much newer than the previous ones and should be able to recognize different kinds of functional relationships. Note that if the denominator in Eq. 2.3 is 0, we simply set ρdist = 0.
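The double-centred matrices above can be implemented directly. The following R sketch (an illustration, not the authors' code) computes the sample distance correlation of Eq. 2.3 and compares it with the function dcor from the package energy; the two values should agree up to numerical precision.

```r
# Sample distance correlation computed directly from Eq. 2.3.
library(energy)

dcor_direct <- function(x, y) {
  n <- length(x)
  a <- as.matrix(dist(x))                       # |x_j - x_k|
  b <- as.matrix(dist(y))                       # |y_j - y_k|
  A <- a - outer(rowMeans(a), rep(1, n)) - outer(rep(1, n), colMeans(a)) + mean(a)
  B <- b - outer(rowMeans(b), rep(1, n)) - outer(rep(1, n), colMeans(b)) + mean(b)
  V2xy <- mean(A * B)                           # V_n^2(X;Y)
  V2x <- mean(A * A)                            # V_n^2(X)
  V2y <- mean(B * B)                            # V_n^2(Y)
  if (V2x * V2y == 0) return(0)                 # convention: rho_dist = 0 if the denominator is 0
  sqrt(V2xy / sqrt(V2x * V2y))
}

set.seed(2)
x <- rnorm(100); y <- x^2 + rnorm(100, sd = 0.5)
c(direct = dcor_direct(x, y), energy = dcor(x, y))
```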

A slightly different way to identify dependence is to compute the mutual information between the variables X and Y, which is defined as a sum (Veyrat-Charvillon and Standaert, 2009, p. 431)

$$ \begin{array}{@{}rcl@{}} I(X;Y)=\underset{i}{\sum}\underset{j}{\sum} p(x_{i},y_{j})\log_{2}\left( \frac{p(x_{i},y_{j})}{p(x_{i})p(y_{j})}\right)\in[0,\infty) \end{array} $$
(2.4)

for discrete random variables X and Y with values xi and yj, and as an integral (Linfoot, 1957, (14), p. 88)

$$ \begin{array}{@{}rcl@{}} I(X;Y)={\int}_{x\in\mathcal{X}}{\int}_{y\in\mathcal{Y}} p(x,y)\log_{2}\left( \frac{p(x,y)}{p(x)p(y)}\right)dxdy\in[0,\infty). \end{array} $$
(2.5)

for continuous random variables X and Y with value sets \(\mathcal {X}\) and \(\mathcal {Y}\). While the exact value of the mutual information is often quite difficult to find because it requires knowing the probability distribution function p, this quantity can be estimated by dividing the domain into small bins and then using the so-called naive estimate (Kinney and Atwal, 2014, (6), p. 3356)

$$ \begin{array}{@{}rcl@{}} I_{\text{naive}}(X;Y)=\underset{\widetilde{x}, \widetilde{y}}{\sum}\hat{p}(\widetilde{x},\widetilde{y})\log_{2}\left( \frac{\hat{p}(\widetilde{x},\widetilde{y})}{\hat{p}(\widetilde{x})\hat{p}(\widetilde{y})}\right), \end{array} $$
(2.6)

where \(\hat {p}(\widetilde {x},\widetilde {y})\) is the fraction of data points inside one bin and \(\hat {p}(\widetilde {x})\) and \(\hat {p}(\widetilde {y})\) are the corresponding marginal fractions. The mutual information tells us the expected amount of information that the observations of one variable give about the other variable, and this measure therefore also describes non-functional relationships.
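As an illustration of the naive estimate in Eq. 2.6, the following R sketch bins the data into equal-width bins; the bin count is an arbitrary example choice rather than the setting used in our simulations.

```r
# Naive binned estimate of the mutual information (in bits), cf. Eq. 2.6.
naive_mi <- function(x, y, nbins = 10) {
  bx <- cut(x, breaks = nbins)               # discretize X into equal-width bins
  by <- cut(y, breaks = nbins)               # discretize Y into equal-width bins
  p_xy <- table(bx, by) / length(x)          # joint bin fractions
  p_x <- rowSums(p_xy)                       # marginal fractions of X
  p_y <- colSums(p_xy)                       # marginal fractions of Y
  terms <- p_xy * log2(p_xy / outer(p_x, p_y))
  sum(terms[p_xy > 0])                       # empty bins contribute nothing
}

set.seed(3)
x <- rnorm(1000); y <- sin(3 * x) + rnorm(1000, sd = 0.3)
naive_mi(x, y)
```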

By denoting the estimate of the mutual information found with the bins of a rectangular nx × ny-grid G by IG(X;Y ), we can write the definition of the maximal information coefficient (MIC) as (Kinney and Atwal, 2014, (7), p. 3356)

$$ \begin{array}{@{}rcl@{}} \text{MIC}(X;Y)=\underset{n_{x}\times n_{y}}{\max}\frac{\max_{G} I_{G}(X;Y)}{\log_{2}(\min\{n_{x},n_{y}\})}\in[0,1]. \end{array} $$
(2.7)

Here, the product nx × ny usually has some upper bound, such as B(n) = n0.6, where n is the number of paired observations. Clearly, the MIC is a non-parametric measure of dependence between the variables X and Y and, since its definition is based on that of the mutual information, it should also be able to detect both functional and non-functional dependence.

One of the issues when comparing these measures of dependence is that they are defined on different intervals. Here, we are interested in a coefficient whose value is 0 if the variables X and Y are independent, 1 if one of these variables fully determines the values of the other, and some number from the interval (0,1) if there is a relationship between X and Y, so that this value decreases as the amount of noise in the data increases. The maximal correlation, the distance correlation, and the MIC already fulfill this condition, but below we will consider only the absolute values of Pearson’s and Spearman’s correlation coefficients, since their negative values merely indicate the direction of the dependence. Furthermore, because the mutual information is measured in bits and sometimes takes values greater than 1, we consider the information coefficient of correlation (Linfoot, 1957, (13), p. 88)

$$ \begin{array}{@{}rcl@{}} r_{1}=\sqrt{1-e^{-2\cdot I(X;Y)}}\in[0,1], \end{array} $$
(2.8)

which was introduced in 1957 by E.H. Linfoot so that the value of the mutual information could be interpreted more easily.

Let us briefly introduce the methods of computation used in our simulations. Firstly, Pearson’s correlation coefficient can be computed with the base R function cor, and the same function also returns Spearman’s coefficient if we choose the value “spearman” for its parameter “method”. The maximal correlation is found by first maximizing the linear correlation with the alternating conditional expectations algorithm ace from the package acepack and then using the function cor. The distance correlation can be computed with the function dcor from the package energy. The coefficient r1 is obtained by first discretizing the data with discretize from the package infotheo, estimating the mutual information with the function mutinformation from the same package, and then applying the formula in Eq. 2.8 in R. Finally, the MIC is computed with the function mine from the package minerva. We use the default settings of each function; more details can be found in the manuals of these R packages. A sketch of these computations is given below.
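The following R sketch collects these steps into one helper function; it uses the default settings of the packages mentioned above and is an illustration rather than our exact simulation code. Note that mutinformation returns the estimate in natural-log units, which is the form assumed by Eq. 2.8.

```r
# A helper that returns all six coefficients for one data set (a sketch).
library(acepack)    # ace
library(energy)     # dcor
library(infotheo)   # discretize, mutinformation
library(minerva)    # mine

coeffs <- function(x, y) {
  a <- ace(x, y)                                        # alternating conditional expectations
  mi <- mutinformation(discretize(x), discretize(y))    # mutual information (in nats)
  c(r        = abs(cor(x, y)),                          # |Pearson|
    rs       = abs(cor(x, y, method = "spearman")),     # |Spearman|
    rho_max  = as.numeric(cor(a$tx, a$ty)),             # maximal correlation
    rho_dist = dcor(x, y),                              # distance correlation
    r1       = sqrt(1 - exp(-2 * mi)),                  # information coefficient, Eq. 2.8
    MIC      = as.numeric(mine(x, y)$MIC))              # maximal information coefficient
}
```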

3 Generality for Different Types of Dependence

In this section, we define nine different models of dependence, illustrated in Fig. 1. For each type of dependence, we study the values of the six measures introduced in the previous section. The models below are built by generating observations from the normal distribution for the explanatory variable, but they can easily be redefined for some other marginal distribution.

Figure 1: Scatter plots of one simulation from the models (3.1)–(3.4) with σ = 0 and n = 1000

In our simulations of functional dependence, the observations i = 1,...,n of the variables X and Y are generated according to the model

$$ x_{i}\sim N(0,1),\quad y_{i}=f_{j}(x_{i})+\epsilon_{i},\quad \epsilon_{i}\sim N(0,\sigma^{2}), $$
(3.1)

in which the function fj is either the linear, logarithmic, cubic, quadratic, sinusoidal, or piecewise function, defined as

$$ \begin{array}{@{}rcl@{}} &&f_{1}(x)=x,\quad f_{2}(x)=5\ln(|x+5|),\quad f_{3}(x)=0.3x^{3},\quad f_{4}(x)=0.7x^{2},\quad f_{5}(x)=1.3\sin(3x), \\ &&f_{6}(x)=\min\{\max\{1\slash x,-3\},3\}. \end{array} $$
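As an illustration, one simulation from this model can be generated as follows (a sketch with example choices of fj and σ, not our exact code):

```r
# Generating n observations from the model (3.1) with the sinusoidal f_5.
set.seed(4)
n <- 1000; sigma <- 0.5                   # example values
f5 <- function(x) 1.3 * sin(3 * x)        # the sinusoidal function f_5
x <- rnorm(n)                             # x_i ~ N(0, 1)
y <- f5(x) + rnorm(n, sd = sigma)         # y_i = f_5(x_i) + eps_i, eps_i ~ N(0, sigma^2)
plot(x, y)                                # compare with Fig. 1
```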

We also compute our coefficients for three non-functional models of dependence, including the cross-shaped dependence

$$ \begin{array}{@{}rcl@{}} &&x_{i}\sim N(0,1),\quad y_{i}\sim N(0,(\sigma\slash3)^{2}) \quad\text{for }i=1,...,\lfloor{n\slash2}\rfloor,\\ &&x_{i}\sim N(0,(\sigma\slash3)^{2}),\quad y_{i}\sim N(0,1) \quad\text{for }i=\lfloor{n\slash2}\rfloor+1,...,n, \end{array} $$
(3.2)

the circular dependence

$$ \begin{array}{@{}rcl@{}} (x_{i},y_{i})\in\{(h_{i}\cos(k_{i}),h_{i}\sin(k_{i}))\text{ }|\text{ }h_{i}\sim N(1,(\sigma\slash7)^{2}),k_{i}\sim N(0,1)\},\quad i=1,...,n, \end{array} $$
(3.3)

and the checkerboard dependence

$$ \begin{array}{@{}rcl@{}} &&x_{i}=k_{i0},\quad y_{i}=k_{i1}+\epsilon_{i},\quad\epsilon_{i}\sim N(0,(\sigma\slash2)^{2}),\quad i=1,...,n,\quad\text{where} \\ && \begin{pmatrix} k_{i0}\\ k_{i1} \end{pmatrix} \in \left.\left\{ \begin{pmatrix} k_{0}\\ k_{1} \end{pmatrix} \sim N\left( \begin{pmatrix} 0\\ 0 \end{pmatrix} , \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix} \right) \text{~}\right|\text{~} \lfloor{0.7k_{0}}\rfloor-\lfloor{0.7k_{1}}\rfloor\equiv 0\pmod{2} \right\}. \end{array} $$
(3.4)

These models have been created so that the amount of statistical noise in the data can be increased by increasing the value of the parameter σ > 0 in all the models except the cross-shaped model (3.2), where the amount of noise is increasing with respect to σ ∈ [0,3], decreasing with respect to σ ≥ 3, and the data comes from two independent, normally distributed variables if σ = 3.
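For instance, the cross-shaped model (3.2) can be simulated with the following sketch (illustrative values of n and σ):

```r
# Generating n observations from the cross-shaped model (3.2);
# sigma = 3 reduces the data to two independent N(0,1) variables.
set.seed(5)
n <- 1000; sigma <- 1
half <- floor(n / 2)
x <- c(rnorm(half, mean = 0, sd = 1),         rnorm(n - half, mean = 0, sd = sigma / 3))
y <- c(rnorm(half, mean = 0, sd = sigma / 3), rnorm(n - half, mean = 0, sd = 1))
plot(x, y)
```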

First, let us consider the noiseless versions of these models with 1000 observations to see how our coefficients recognize different types of dependence without any disrupting factors. For each model, we compute the average values of the coefficients |r|, |rs|, \(\rho _{\max \limits }\), ρdist, r1, and MIC in 1000 simulations with n = 1000 and σ = 0, as sketched below. The results of this experiment are collected in Table 1.
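A sketch of this averaging experiment for one model, reusing the coeffs() helper sketched in Section 2, could look as follows (here the noiseless quadratic model is used as an example):

```r
# Average values of the coefficients over 1000 noiseless simulations.
set.seed(6)
vals <- replicate(1000, {
  x <- rnorm(1000)
  y <- 0.7 * x^2                  # model (3.1) with f_4 and sigma = 0
  coeffs(x, y)
})
round(rowMeans(vals), 2)          # one average per coefficient (cf. Table 1)
```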

Table 1 The average values of the coefficients |r|, |rs|, \(\rho _{\max \limits }\), ρdist, r1, and MIC in 1000 simulations of the models (3.1)–(3.4) with n = 1000 and σ = 0

From Table 1, we see that Pearson’s correlation coefficient |r| has a value of 1 only for the linear dependence, Spearman’s coefficient |rs| is 1 for all the monotonic relationships, whereas the MIC is 1 for all the functional models. Clearly, the first two coefficients cannot detect non-monotonic dependence properly, and their values are very small for the symmetric models, like the cross-shaped, circular, and quadratic types of dependence. Interestingly, the maximal correlation \(\rho _{\max \limits }\) always has larger values than the coefficients ρdist and r1, and it also exceeds the MIC for the models (3.2) and (3.4), even though the maximal correlation was designed only for identifying functional relationships.

By changing the values of the parameters σ and n in the simulations, we can see what kind of an impact the amount of noise and the number of observations, respectively, have on our measures of dependence. As we can see from Table 2, the values of the MIC decrease notably faster than those of the other coefficients when the noise level grows. According to Table 3, the correlation coefficients seem to decrease while the values of r1 and the MIC increase with respect to n. Note here that, even though Pearson’s coefficient can be defined for as few as 3 observations, our methods of computation return 0 for the value of r1 if n ≤ 7 and, similarly, the distance correlation cannot be computed if n ≤ 4.

Table 2 The average values of the coefficients |r|, |rs|, \(\rho _{\max \limits }\), ρdist, r1, and MIC in 1000 simulations with n = 1000 observations from the model (3.1) with the linear function f1(x) = x, when the value of σ varies
Table 3 The average values of the coefficients |r|, |rs|, \(\rho _{\max \limits }\), ρdist, r1, and MIC in 1000 simulations of the model (3.1) with the linear function f1(x) = x and σ = 1, when the number n of observations varies

We can also study how our coefficients behave if we modify the model (3.1) so that the observations of the variable X are generated from some distribution other than the standard normal distribution, such as the uniform, exponential, or Poisson distribution. For instance, all the quantities give values close to 1 in the case of linear dependence, regardless of the exact marginal distribution of X, but the value of the distance correlation is greater for the sinusoidal model if we choose \(X\sim \text {Pois}(3)\) instead. It must be noted that these changes obviously also affect the shape of the data, and the noise parameter should therefore be chosen so that the amount of noise is proportional to the range of the variable X.

However, the values of our measures of dependence do not tell us very much without any additional information. In order to draw any conclusions about whether dependence in the data can be properly identified if, for instance, the MIC has a value of 0.3, we need to compare this result to the value of the coefficient computed from data without any dependence. Consequently, we need to study the power of our coefficients.

4 Power for Identifying Dependence

In this section, we study the power of the six coefficients: the absolute values of Pearson’s and Spearman’s correlation coefficients r and rs, the maximal correlation \(\rho _{\max \limits }\), the distance correlation ρdist, the coefficient r1, and the MIC. We apply the models (3.1)–(3.4) to create different types of dependence in our simulations. Furthermore, we consider how the amount of noise and the number of observations affect our results.

Recall that the power of a statistical test is the probability of rejecting a false null hypothesis. When studying the dependence between two variables, our null hypothesis is that there is no association between them, and we must therefore find out how likely it is to recognize the cases with some underlying dependence present. In order to measure this probability, we first need to decide the critical values of the coefficients, which are used to decide whether the null hypothesis is rejected or not at the significance level α. In other words, the power of some coefficient q is formally defined as the probability

$$ \begin{array}{@{}rcl@{}} P(q(X,Y)>q_{\text{crit}}\text{~}|\text{~}X\not\perp Y)\quad\text{for}\quad\{q_{\text{crit}}\in[0,1]\text{~}|\text{~}P(q(X,Z)> q_{\text{crit}}\text{~}|\text{~}X\perp Z)=\alpha\}. \end{array} $$
(4.1)

Consequently, let us compute the values of the coefficients |r|, |rs|, \(\rho _{\max \limits }\), ρdist, r1, and MIC in 3000 simulations, each of which consists of n = 1000 observations from two independent, similarly distributed variables. We then have approximations for the distributions of the values of these coefficients when the null hypothesis holds and, by taking the (1 − α)-quantiles from their histograms, we obtain estimates for their critical values for α, as sketched below. Table 4 contains these estimates in the cases where both the variables follow the standard normal distribution N(0,1) and α = 1, 5, 10%.
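A sketch of this procedure, reusing the coeffs() helper from Section 2, is given below; the null distribution yields the critical values, and the power is then the proportion of simulations from a dependent model that exceed them.

```r
# Estimating critical values under independence and the power for one model.
set.seed(7)
alpha <- 0.05; reps <- 3000; n <- 1000

# Null distribution: two independent N(0,1) variables.
null_vals <- replicate(reps, coeffs(rnorm(n), rnorm(n)))
crit <- apply(null_vals, 1, quantile, probs = 1 - alpha)   # critical values (cf. Table 4)

# One alternative: model (3.1) with f_1(x) = x and sigma = 0.1.
alt_vals <- replicate(reps, { x <- rnorm(n); coeffs(x, x + rnorm(n, sd = 0.1)) })
rowMeans(alt_vals > crit)                                  # estimated power per coefficient
```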

Table 4 The critical values of the coefficients |r|, |rs|, \(\rho _{\max \limits }\), ρdist, r1, and MIC estimated from 3000 simulations with n = 1000 observations from two independent, normally distributed variables, when the significance level varies

Now, we can estimate the power of our coefficients by computing what proportion of their values in 3000 simulations lie above the critical values in Table 4. In one experiment for all the models (3.1)–(3.4) with the parameter choices n = 1000, σ = 0.1, and α = 5%, it was observed that the powers of the coefficients \(\rho _{\max \limits }\), ρdist, r1, and MIC were 1 for all these models. The estimated powers of the absolute values of Pearson’s and Spearman’s correlation coefficients were 1 for all the monotonic relationships (the model (3.1) with j = 1,2,3), but notably less than this for the other models. In particular, the powers of these two coefficients are close to 0 in the case of symmetric non-monotonic dependence, like the cross-shaped dependence of the model (3.2).

Next, let us inspect how the amount of noise affects the power of our coefficients. To do this, we first choose some model and an appropriate interval of the noise parameter σ for this model. For each value of σ, we compute the values of our coefficients in 3000 simulations with n = 1000 observations and estimate the powers from these results by using the critical values of Table 4 for α = 5%; a sketch of this sweep is given below. We plot the final results for three specific models.
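Such a sweep can be sketched as follows, reusing coeffs() and the critical values crit estimated above; the number of repetitions is reduced here purely for illustration.

```r
# Power of each coefficient as a function of sigma for the cubic model.
sigmas <- 0:30
power_curve <- sapply(sigmas, function(s) {
  vals <- replicate(500, {
    x <- rnorm(1000)
    coeffs(x, 0.3 * x^3 + rnorm(1000, sd = s))   # model (3.1) with f_3
  })
  rowMeans(vals > crit)
})
matplot(sigmas, t(power_curve), type = "l", xlab = "sigma", ylab = "estimated power")
```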

Figure 2 contains the powers of all our coefficients when the model is (3.1) with the cubic function f3(x) = 0.3x3 and σ = 0,1,...,30. For the first few values of σ, all our coefficients have a power of 1, but the powers of the MIC and the coefficient r1 decrease very fast when σ > 3. The most powerful measure of dependence for this model is Pearson’s correlation coefficient |r|, followed by the coefficients |rs|, \(\rho _{\max \limits }\), and ρdist, whose powers all have very similar values.

Figure 2: The estimated powers of the coefficients |r|, |rs|, \(\rho _{\max \limits }\), ρdist, r1 and MIC for n = 1000 observations of the model (3.1) with the cubic function f3(x) = 0.3x3, when σ = 0,1,...,30

Let us then consider the model (3.1) but choose the sinusoidal function \(f_{5}(x)=1.3\sin \limits (3x)\) instead and let σ = 0,0.5,...,15. Since neither Pearson’s nor Spearman’s correlation coefficient is well-suited for non-monotonic dependence, we only consider the coefficients \(\rho _{\max \limits }\), ρdist, r1, and MIC. From Fig. 3, we see that the maximal correlation \(\rho _{\max \limits }\) is considerably more powerful than the coefficients ρdist and r1, whereas the MIC has the least power.

Figure 3: The estimated powers of the coefficients |r|, |rs|, \(\rho _{\max \limits }\), ρdist, r1, and MIC for n = 1000 observations of the model (3.1) with the sinusoidal function \(f_{5}(x)=1.3\sin \limits (3x)\), when σ = 0,0.5,...,15

Our third model is the cross-shaped dependence of the model (3.2). Recall that σ = 0 gives us here a noiseless dependence whereas σ = 3 means that the data comes from two fully independent normal variables, so the powers of our coefficients should decrease from 1 to the value of α as σ increases from 0 to 3. Figure 4 is plotted by using the values σ = 0,0.1,...,3 and, as we can see, the power of the MIC drops quickly to nearly 0 around σ = 0.6 and only the maximal correlation has values over 0.9 when σ exceeds 1.5.

Figure 4: The estimated powers of the coefficients |r|, |rs|, \(\rho _{\max \limits }\), ρdist, r1, and MIC for n = 1000 observations of the cross-shaped model (3.2), when σ = 0,0.1,...,3

By running similar experiments for all the other models introduced in Section 3, it can be noticed that the results found above do not change much. Namely, Pearson’s correlation coefficient |r| is the most powerful measure for monotonic dependence and the maximal correlation has the most power for detecting non-monotonic relationships, regardless of whether they are functional or not. The MIC is very sensitive to noise and therefore has less power than the coefficients \(\rho _{\max \limits }\), ρdist, and r1 whenever there is at least a little noise in the model. This result was not affected by changing the level of significance to 10% or 1% with the corresponding critical values from Table 4.

However, if the number n of observations is clearly less than 100, it influences the power of the coefficients. For each n = 10,11,...,50, we run 30000 simulations consisting of n observations of two independent normal variables, use this data to compute the critical values of the coefficients with the significance level α = 5%, and then estimate the power of these coefficients from 30000 simulations with n observations from the model (3.1), where f is the linear function f1(x) = x and σ = 1. As can be seen from Fig. 5, Pearson’s coefficient |r| has the greatest power, followed closely by the coefficients ρdist and |rs|, while the maximal correlation has the least power.

Figure 5: The estimated powers of the coefficients |r|, |rs|, \(\rho _{\max \limits }\), ρdist, r1, and MIC for n observations from the model (3.1) with the linear function f1(x) = x and σ = 1, when n = 10,11,...,50

Figure 5 also shows us that the powers of the coefficient r1 and the MIC are not always increasing with respect to the number n of observations. This is because of our methods of computation: The mutual information needed to obtain the value of r1 is estimated by using \(\sqrt [3]{n}\) bins and the MIC is computed on a grid whose size is limited with the function \(B(n)=\max \limits \{n^{0.6},4\}\). By changing these default settings, we could fix this issue.

5 Equitability for Functional Types of Dependence

In this section, we study the equitability properties of the maximal correlation, the distance correlation, the coefficient r1, and the MIC. By equitability, we mean here the property that a measure of dependence gives similar values for equally noisy relationships, regardless of the exact type of the association. We focus on the model (3.1), where the function fj is one of the six options defined in Section 3: linear, logarithmic, cubic, quadratic, sinusoidal, or piecewise.

Recall the noiseless simulations of Table 1. It is clear that neither Pearson’s nor Spearman’s coefficient is equitable because they do not recognize non-monotonic types of dependence, so we do not consider these coefficients here. Similarly, the distance correlation cannot have this property because its values vary from 0.36 to 1 for functional relationships with σ = 0. Still, we can use the coefficient ρdist as a control when assessing the equitability of \(\rho _{\max \limits }\), r1, and the MIC, which all have values close to 1 for these noiseless relationships.

However, in order to inspect the impact of the noise level on our coefficients across several models, we need a way to measure the amount of noise that, unlike the previously used parameter σ, does not depend on the choice of the function fj in the model (3.1). Consequently, we consider the coefficient of determination, defined as (Kinney and Atwal, 2014, p. 3355)

$$ R^{2}=R^{2}(f(X);Y)=(\rho(f(X);Y))^{2}\in[0,1], $$
(5.1)

where the function f defines the relationship between X and Y so that Y = f(X) + 𝜖 for some third variable 𝜖, and ρ(;) is the population correlation estimated with Pearson’s coefficient r. Since the amount of noise is decreasing with respect to R2, we consider here the difference 1 − R2 instead. Note also that, according to (Kinney and Atwal, 2014, p. 3355), no non-trivial measure of dependence can be fully R2-equitable, but it is still useful to know whether some of our coefficients are closer to fulfilling this property than the others.
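For one simulated data set, the noise level 1 − R2 can be estimated as in the following sketch (the function and noise level are example choices):

```r
# Estimating R^2 of Eq. 5.1 and the noise level 1 - R^2 for one data set.
set.seed(8)
f <- function(x) 1.3 * sin(3 * x)       # the sinusoidal function f_5
x <- rnorm(1000); y <- f(x) + rnorm(1000, sd = 1)
R2 <- cor(f(x), y)^2                    # estimate of R^2(f(X); Y)
1 - R2                                  # noise level used on the x-axis of Fig. 6
```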

Figure 6 shows us how the values of each of the coefficients \(\rho _{\max \limits }\), ρdist, r1, and MIC change for different functional types of dependence when the noise measured with 1 − R2 grows. This figure was produced by generating 1000 times n = 1000 values for X and Y according to the model (3.1) and, during each iteration round, computing the values of the different coefficients and 1 − R2, where R2 is the squared Pearson correlation between f(X) and Y obtained with the function cor in the R code. The results suggest that the most equitable coefficient is r1, which is compatible with prior research (Kinney and Atwal, 2014), where mutual information was noted to be able to measure different types of dependence in a consistent way. The MIC fulfills the equitability criterion better than the distance correlation here, but not as well as the maximal correlation.

Figure 6: The values of the coefficients \(\rho _{\max \limits }\), ρdist, r1, and MIC against the noise measured with 1 − R2 in 1000 simulations of n = 1000 observations from the model (3.1) with the linear, logarithmic, cubic, quadratic, sinusoidal, and piecewise functions fj

We also notice here one interesting aspect of the maximal correlation. Namely, for several different functions f in the model (3.1), it follows from the similarities in the definitions (5.1) and (2.2) that \(\rho _{\max \limits }\geq \sqrt {R^{2}}\). For instance, suppose that f(X) = X so that our variables are \(X\sim N(0,1)\) and Y = X + 𝜖 with \(\epsilon \sim N(0,\sigma ^{2})\) independent of X. Now, E(X) = E(Y) = 0, Var(X) = 1 and Var(Y) = 1 + σ2, so by the definition of correlation,

$$ \begin{array}{@{}rcl@{}} \sqrt{R^{2}}&=&\rho(X;Y)=\frac{E((X-E(X))(Y-E(Y)))}{\sqrt{\text{Var}(X)\text{Var}(Y)}}=\frac{E(XY)}{\sqrt{1+\sigma^{2}}}=\frac{1}{\sqrt{1+\sigma^{2}}}\\ &=&E\left( X\frac{Y}{\sqrt{1+\sigma^{2}}}\right)=\rho\left( X;\frac{Y}{\sqrt{1+\sigma^{2}}}\right)\leq\rho_{\max}, \end{array} $$

as can be visually verified from Fig. 6 even though our computational methods are not fully accurate.

The equitability cannot be directly studied for non-functional relationships because the coefficient R2 is only defined for measuring noise from data that follows some functional model. Still, we know from Tables 1 and 2 that the values of the MIC are around 0.6 for both the cross-shaped dependence with no noise and the linear dependence with σ ≈ 0.6 or, equivalently, R2 ≈ 0.7. Since the maximal correlation has values close to 1 for all non-functional types of dependence and, unlike the MIC, this coefficient is not very sensitive to the noise, it probably has reasonably good equitability properties when measuring non-functional relationships.

6 Conclusions

According to our three criteria of generality, power, and equitability, the best choice of a measure of dependence is often the maximal correlation. The information coefficient of correlation r1 and the distance correlation also work relatively well. However, Pearson’s and Spearman’s correlation coefficients are greatly limited by the type of the dependence and the MIC is not well-suited for noisy data.

Both Pearson’s and Spearman’s correlation coefficients can be used to recognize monotonic dependence also when it is non-linear, but they do not detect non-monotonic dependence properly, especially if it is symmetric. Surprisingly, the maximal correlation also identifies non-functional relationships, even better than the coefficients that were actually designed for this purpose. The distance correlation and the coefficient r1 work in an expected way, but the MIC is considerably more sensitive to the amount of noise than any of the other coefficients. The number of observations does not affect the values of these quantities very much, but there need to be at least 8 or so observations so that our methods of computation work properly.

For monotonic types of dependence, Pearson’s correlation coefficient is the most powerful measure of dependence, regardless of the number of observations. In the case of non-monotonic or non-functional dependence, the maximal correlation has the most power, assuming we have at least 100 observations in the data. If we have fewer than 50 observations from a non-monotonic model, the distance correlation is a good choice for a measure of dependence because it is the most powerful of the coefficients able to recognize this association and its power is not very sensitive to the exact number of observations. Predictably, the power of the MIC is very weak compared to the other quantities in all cases with at least some noise.

The coefficient r1 can be used to measure functional dependence in quite an equitable way. The maximal correlation fulfills this property relatively well and, while the MIC is less equitable than the coefficient r1 and the maximal correlation, it still gives values close to 1 for all functional relationships with no noise and then decreases as the amount of noise grows. In turn, the distance correlation is not equitable at all because its values vary considerably depending on the function behind the dependence, even when there is no noise.