The Linear Model: t-test and ANOVA

Chapter in: Environmental Data Analysis

Abstract

At the end of this (long) chapter …

  • you will know the t-test in its many different variations;
  • you will understand that the idea of variance analysis is to divide the total variance into explainable and unexplained variance;
  • you will know the F-test for calculating the significance of a variance analysis;
  • you will understand the close relationship between ANOVA and regression.

If you give people a linear model function you give them something dangerous.

—John Fox (fortunes(49))


Notes

  1. We will also briefly consider the case that \(\sigma\) depends on \(\mathbf{X}\), but this is generally not possible in the linear model. The notation f(.) indicates that non-linear functions could also be considered, but we will not do that here.

  2. At this point it is obligatory to note that “Student” was the pseudonym under which W. S. Gosset published the t-test while working for the Guinness Brewery. His employer considered quality-assurance statistics a trade secret. However, Gosset’s colleagues in mathematics knew the person behind the pseudonym.

  3. The central limit theorem states that (most) parameter estimators are (asymptotically) normally distributed, even if the variables under consideration are not normally distributed. For example, if we estimate the median of a sample, this estimate has a certain error because we consider only one sample. The error of this median is normally distributed, although our sample might be crooked and skewed!
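    As a quick illustration (a minimal sketch in R, not from the book; the data and seed are arbitrary), we can resample the median of a skewed sample and look at the distribution of the estimates:

        ## Bootstrap the median of a right-skewed sample
        set.seed(1)
        x <- rexp(100)                   # exponential data: clearly skewed
        meds <- replicate(5000, median(sample(x, replace = TRUE)))
        hist(meds, breaks = 40)          # roughly bell-shaped, as promised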

  4. That’s a men’s size 6.5 for those of you in the UK and a men’s size 7 for you Yankees out there.

  5. In Libre/OpenOffice Calc with the function TDIST or in Microsoft Excel with the function TINV. The reasons why we should never rely on MS Excel for statistical calculations are given by McCullough and Heiser (2008).
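    In R, the corresponding functions are pt() and qt() (a minimal sketch; the values are arbitrary):

        pt(2.0, df = 10, lower.tail = FALSE)  # P(T > 2) for 10 degrees of freedom
        qt(0.975, df = 10)                    # two-sided 5% critical value (about 2.23)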

  6. In fact, the normal distribution is also shown in the right figure, but is indistinguishable from the t-distribution with df = 500.
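    You can convince yourself of this in R (a minimal sketch):

        curve(dnorm(x), from = -4, to = 4, lty = 2)            # normal density, dashed
        curve(dt(x, df = 500), from = -4, to = 4, add = TRUE)  # t-density: the curves coincide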

  7. The phrase “less significant” is rightly frowned upon by statisticians. If we have a threshold for significance, conventionally 0.05, then any value below it is significant, full stop. “Less significant” is akin to “more pregnant”.

  8. See Sect. 11.3.2 on page 160 for a more detailed derivation. For the moment, we can think of it as a measure of how much effort we put into the calculation of means: the more classes, the more degrees of freedom we use. A more reasonable explanation unfortunately has to wait until we look at ANOVA and regression as two sides of the same coin.

  9. The squared correlation coefficient between the observed data \(\mathbf{y}\) and the model fit \(\hat{\mathbf{y}}\) is exactly \(R^2\). You will find both \(r^2\) and \(R^2\) in the literature. In the simple regression model \(R^2 = r^2\) holds, but this is no longer the case with non-linear models. There, \(R^2\) loses its clear interpretability, since the null model is no longer necessarily a sub-model, and thus the comparison of the sums of squares is meaningless.
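    A minimal sketch in R, using the built-in cars data (not an example from the book):

        fit <- lm(dist ~ speed, data = cars)
        cor(cars$dist, fitted(fit))^2   # squared correlation between data and fit
        summary(fit)$r.squared          # the reported R^2: identical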

  10. By the way, there are different ways to calculate the degrees of freedom. It is typically assumed, for example, that all groups contain the same number of data points (a so-called balanced design), which unfortunately is rarely the case in the real world. We therefore use a generally valid equation here, which will also be useful to us later on when we combine ANOVA and regression.
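    In symbols, the general bookkeeping for n data points in k groups (valid also for unbalanced designs) is:

    $$ df_{\text{total}} = n - 1, \quad df_{\text{between}} = k - 1, \quad df_{\text{within}} = n - k. $$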

  11. This is especially important when comparing with other literature. In other textbooks, ANOVA is often suggested for use with categorical predictors only. We see here that this line of thought is a bit narrow-minded.

  12. Well, this actually shouldn’t come as a surprise: we have been using the F-value throughout this chapter to test whether the predictor has a significant influence on the variances.

  13. So we first shift all values so that the smallest value is 0. Then we take the smallest value above 0 and add half of it to all values. If possible, it is better not to use the ANOVA but to stay with a GLM. More on this later (Sect. 11.4).
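    As a minimal sketch in R (illustrative values, not from the book):

        y <- c(-3, 0, 2, 5, 10)          # contains zero and negative values
        y2 <- y - min(y)                 # shift: the smallest value is now 0
        y2 <- y2 + min(y2[y2 > 0]) / 2   # add half of the smallest positive value
        log(y2)                          # the logarithm is now defined for all values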

  14. For positive y-values only (so \(y > 0\)), the Yeo-Johnson transformation presented here is identical to the Box-Cox transformation. However, the Box-Cox transformation only works for positive y-values (and shifts the values if necessary), while the Yeo-Johnson transformation can also appropriately transform negative values without shifting. In their original work, the authors also show that their transformation often gets closer to the normal distribution, and is never worse than the Box-Cox (Yeo and Johnson 2000). Here is the original (two-parameter) Box-Cox transformation (Box and Cox 1964):

    $$ y' = \begin{cases} \big((y+c)^\lambda - 1\big)/\lambda, & \text{if } \lambda \ne 0, \\ \log(y+c), & \text{if } \lambda = 0. \end{cases} $$

    The parameters \(\lambda\) and c (the latter only if y also contains non-positive values) are estimated via the log-likelihood (i.e. fitted to a normal distribution). Since we will not deal with this transformation further (it’s too old school), here are the relevant R functions: bcPower and yjPower in car; yeo.johnson in VGAM; boxcox in MASS.
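    A minimal sketch of two of these functions (assuming the MASS and car packages are installed; the built-in trees data are just an example):

        library(MASS)
        fit <- lm(Volume ~ Height + Girth, data = trees)
        boxcox(fit, lambda = seq(-1, 2, 0.1))   # profile the log-likelihood over lambda
        library(car)
        bcPower(trees$Volume, lambda = 1/3)     # apply a chosen lambda to y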

  15. For example: Duncan’s new multiple range test, the Dunnett test, the Friedman test (non-parametric, therefore usable for the Kruskal-Wallis test), the Scheffé method, the Holm correction, the false discovery rate correction.

    In some of these tests (such as the Newman-Keuls test), the comparisons are first sorted by the difference in mean values and then tested one after another. As soon as a difference is no longer significant, we can stop, since the differences thereafter are even smaller (and the variance is the same everywhere, an assumption of ANOVA). Thus we manage to get by with fewer comparisons, which leads to a less conservative statement than the Bonferroni correction.

    With the frequently used Holm correction, all comparisons are made, but then the P values are sorted and the smallest is corrected just like with Bonferroni (multiplied by the number of comparisons k), the second one is only multiplied by \(k-1\), the third by \(k-2\), and so on. This makes the Holm correction less conservative than the Bonferroni correction.
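    Both corrections are available in base R via p.adjust() (a minimal sketch with made-up P values):

        p <- c(0.001, 0.01, 0.02, 0.04)      # k = 4 raw P values
        p.adjust(p, method = "bonferroni")   # each multiplied by 4
        p.adjust(p, method = "holm")         # multiplied by 4, 3, 2, 1 (kept monotone)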

  16. Analysis of whether the units on the left side of the equation are identical to those on the right.

  17. The deviance is computed (for all practical purposes) as \(-2\ell\). (Actually there is another term, the log-likelihood of the so-called saturated model, that comes in here, but in the context of the GLM this is largely immaterial.) Thus, the ratio of likelihoods is equivalent to the difference of log-likelihoods, which is why the \(\chi^2\)-test is computed on their differences.
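    In R this is the familiar comparison of nested models (a minimal sketch with the built-in warpbreaks data):

        m0 <- glm(breaks ~ 1, data = warpbreaks, family = poisson)        # null model
        m1 <- glm(breaks ~ tension, data = warpbreaks, family = poisson)  # with predictor
        anova(m0, m1, test = "Chisq")   # chi-squared test on the deviance difference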

  18. On the homepage of statistics professor Frank Harrell (Vanderbilt University, Nashville, Tennessee), this tip is listed under Philosophy of Biostatistics as the third point. The first two are also noteworthy: http://biostat.mc.vanderbilt.edu/wiki/Main/FrankHarrell.

  19. A possible reason is that the relationship is not linear and we should insert a squared term: point 4 on Harrell’s list.
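    In R, such a squared term is added with I() (a minimal sketch, again with the built-in cars data):

        fit2 <- lm(dist ~ speed + I(speed^2), data = cars)
        summary(fit2)   # a significant I(speed^2) coefficient indicates curvature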

References

  1. Box, G. E. P., & Cox, D. R. (1964). The analysis of transformations. Journal of the Royal Statistical Society B, 26, 211–252.

  2. Crawley, M. J. (2002). Statistical computing. An introduction to data analysis using S-Plus. Chichester: Wiley.

  3. Crawley, M. J. (2007). The R Book. Chichester, UK: Wiley.

  4. Dalgaard, P. (2002). Introductory statistics with R. Berlin: Springer.

  5. Day, R. W., & Quinn, G. P. (1989). Comparisons of treatments after an analysis of variance in ecology. Ecological Monographs, 59, 433–463.

  6. Fisher, R. A. (1918). The correlation between relatives on the supposition of Mendelian inheritance. Transactions of the Royal Society of Edinburgh, 52, 399–433.

  7. Fisher, R. A. (1925). Statistical methods for research workers. Edinburgh: Oliver & Boyd.

  8. Mann, H. B. (1949). Analysis and design of experiments: Analysis of variance and analysis of variance designs. New York: Dover Publications.

  9. McCullough, B. D., & Heiser, D. A. (2008). On the accuracy of statistical procedures in Microsoft Excel 2007. Computational Statistics & Data Analysis, 52, 4570–4578.

  10. O’Hara, R. B., & Kotze, D. J. (2010). Do not log-transform count data. Methods in Ecology and Evolution, 1(2), 118–122.

  11. Quinn, G. P., & Keough, M. J. (2002). Experimental design and data analysis for biologists. Cambridge, UK: Cambridge University Press.

  12. Underwood, A. J. (1997). Experiments in ecology: Their logical design and interpretation using analysis of variance. Cambridge, UK: Cambridge University Press.

  13. Warton, D. I., & Hui, F. K. C. (2011). The arcsine is asinine: The analysis of proportions in ecology. Ecology, 92, 3–10.

  14. Withers, C. S., & Nadarajah, S. (2014). Simple alternatives for Box-Cox transformations. Metrika, 77(2), 297–315.

  15. Yeo, I.-K., & Johnson, R. A. (2000). A new family of power transformations to improve normality or symmetry. Biometrika, 87(4), 954–959.

  16. Zar, J. H. (2013). Biostatistical analysis (5th ed.). Pearson.

  17. Zuur, A. F., Ieno, E. N., Walker, N. J., Saveliev, A. A., & Smith, G. M. (2009). Mixed effects models and extensions in ecology with R. Berlin: Springer.


Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Dormann, C. (2020). The Linear Model: t-test and ANOVA. In: Environmental Data Analysis. Springer, Cham. https://doi.org/10.1007/978-3-030-55020-2_11
