5.1 Introduction

Data in the form of counts arise regularly in studies that investigate the number of occurrences of an event: the number of insects, birds, or weeds in agricultural or agroecological studies; the number of plants transformed or regenerated using modern breeding techniques; the number of individuals with a certain disease in a medical study; or the number of defective products in a quality improvement study, among others. Counts may be recorded per unit of time, area, or volume. When a generalized linear model (GLM) with a Poisson distribution is fitted to such data, it is often found that there is excess dispersion (extra variation) not captured by the Poisson model. In these cases, the data can be modeled with a negative binomial distribution, which has the same mean as the Poisson distribution but a variance greater than the mean. Most experiments also have some form of structure due to the experimental design (completely randomized design (CRD), randomized complete block design (RCBD), incomplete block, or split-plot design) or the sampling design, and this structure must be incorporated into the linear predictor to adequately model the data.

5.2 The Poisson Model

A Poisson distribution with parameter λ belongs to the exponential family and is a discrete random variable, whose probability function is equal to

$$ f(y)=\frac{e^{-\lambda }{\lambda}^y}{y!};\lambda >0,y=0,1,2,\cdots . $$

The mean and variance of a Poisson random variable are equal, i.e., E(y) = Var(y) = λ. A Poisson distribution is often used to model responses that are “counts.” As λ increases, the Poisson distribution becomes more symmetric and eventually it can be reasonably approximated by a normal distribution.
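To make these properties concrete, the following stdlib-only Python sketch (with an illustrative value of λ, unrelated to any example in this chapter) evaluates the Poisson probability function over a truncated support and recovers E(y) = Var(y) = λ numerically:

```python
import math

def poisson_pmf(y, lam):
    """P(Y = y) for a Poisson random variable with rate lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

lam = 4.0
support = range(60)  # the probability mass beyond y = 59 is negligible here
probs = [poisson_pmf(y, lam) for y in support]

total = sum(probs)                                  # ~1.0
mean = sum(y * p for y, p in zip(support, probs))   # E(y)  = lam
var = sum((y - mean) ** 2 * p
          for y, p in zip(support, probs))          # Var(y) = lam
```

Both moments recover λ, which is the defining mean–variance relationship exploited (and sometimes violated) throughout this chapter.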

Let yij be the value of the count variable associated with unit i at level one and with unit j at level two, given a set of explanatory variables. Therefore, we can express this as

$$ f\left({y}_{ij}\right)=\frac{e^{-{\lambda}_{ij}}{\lambda}_{ij}^{y_{ij}}}{y_{ij}!},{y}_{ij}=0,1,2,\cdots $$

and the logarithm of the likelihood is given by:

$$ \log f\left({y}_{ij}\right)=\log \left(\frac{e^{-{\lambda}_{ij}}{\lambda}_{ij}^{y_{ij}}}{y_{ij}!}\right)=-{\lambda}_{ij}+{y}_{ij}\log \left({\lambda}_{ij}\right)-\log \left({y}_{ij}!\right). $$
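As a quick check of this identity, the snippet below (plain Python, arbitrary illustrative values of λ and y) evaluates both sides, using `math.lgamma(y + 1)` for log(y!):

```python
import math

lam, y = 3.5, 6  # arbitrary illustrative values

# Logarithm of the Poisson probability function taken directly
log_pmf = math.log(math.exp(-lam) * lam ** y / math.factorial(y))

# Expanded form from the text, with lgamma(y + 1) playing the role of log(y!)
log_lik = -lam + y * math.log(lam) - math.lgamma(y + 1)
```

The two quantities agree to floating-point precision for any valid pair (λ, y).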

A Poisson distribution has very particular mathematical properties that are used when we model “counts.” For example, the expected value of y is equal to the variance of y, such that

$$ E\left({y}_{ij}\right)=\mathrm{Var}\left({y}_{ij}\right)={\lambda}_{ij}. $$

Thus, λij is necessarily a nonnegative number, which could lead to difficulties if we were to use the identity link function in this context. The natural logarithm is therefore the link function most commonly used for expected “counts.” For single-factor data, a Poisson regression model is used, in which we work with the natural logarithm of the mean count, log(λi), whereas for multilevel data (more than two factors), mixed models with a Poisson response, again modeling the logarithm of λij, are a better choice.

Suppose that, given the random effects bj, the counts y1, y2, ⋯, yn are conditionally independent with yij ∣ bj~Poisson(λij), where

$$ \log \left({\lambda}_{ij}\right)=\eta +{\tau}_i+{b}_j. $$

This is a special case of a generalized linear mixed model (GLMM) in which the link function of this family of distributions is g(λij) =  log (λij). The dispersion parameter ϕ, in this case, is equal to 1.

Sometimes, when the counts are extremely large, their distribution can be approximated by a continuous distribution. If all the counts are reasonably large, taking the square root of the counts stabilizes the variance and makes a normal-theory model viable. However, as mentioned in previous chapters, estimation under normality can be problematic, as it can produce negative fitted values and predictions, which is illogical for counts.
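The variance-stabilizing effect of the square root can be checked by simulation. The sketch below (plain Python, with a simple Poisson sampler based on Knuth's method; the values of λ are illustrative) shows that Var(√y) stays near 1/4 regardless of λ:

```python
import math
import random

random.seed(1)

def rpoisson(lam):
    """Poisson sampler via Knuth's method (fine for moderate lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Var(sqrt(y)) is roughly 1/4 whatever the value of lambda, which is why
# the square root stabilizes the variance of large counts
variances = {}
for lam in (10, 25, 50):
    roots = [math.sqrt(rpoisson(lam)) for _ in range(20000)]
    variances[lam] = variance(roots)  # each value is close to 0.25
```

By the delta method, Var(√y) ≈ (1/(2√λ))² · λ = 1/4, independent of λ, which the simulated values confirm.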

5.2.1 CRD with a Poisson Response

A CRD is a design in which each of t treatments is randomly assigned to r experimental units. The linear predictor describing the mean structure of this GLM is

$$ {\eta}_{ij}=\eta +{\tau}_i $$

where ηij denotes the ijth link function value for the ith treatment in the jth observation, η is the intercept, and τi is the fixed effect due to treatment i (i = 1, 2, ⋯, t; j = 1, 2, ⋯, ri), with t treatments and ri replicates of each treatment i.

Example

Effect of a subculture on the number of shoots during micropropagation of sugarcane.

The objective of micropropagation in sugarcane is to produce vegetative material identical to the donor so that its genetic integrity is preserved. Despite this, somaclonal variation has been observed in plants derived from in vitro culture regardless of explant, variety, ploidy level, number of subcultures, and generation route used, among others. A total of 8 explants were planted in temporary immersion bioreactors (explant/bioreactor) to determine whether the number of subcultures (10 subcultures) influences the number of shoots observed per explant. In this example, we have ri observations (j = 1, 2, …, ri) on each of the 10 subcultures (i = 1, 2, …, 10) in a completely randomized design (Appendix 1: Data: Subcultures). The analysis of variance (ANOVA) table (Table 5.1) for this model is given below:

Table 5.1 Analysis of variance

The components of the GLM are set out below:

$$ \mathrm{Distribution}:{y}_{ij}\sim \mathrm{Poisson}\left({\lambda}_{ij}\right) $$
$$ \mathrm{Linear}\ \mathrm{predictor}:{\eta}_{ij}=\eta +{\tau}_i $$
$$ \mathrm{Link}\ \mathrm{function}:\log \left({\lambda}_{ij}\right)={\eta}_{ij} $$

where yij denotes the number of shoots observed on explant j in subculture i (i = 1, 2, ⋯, 10; j = 1, 2, ⋯, 8), ηij is the ijth link function, η is the intercept, and τi is the fixed effect of subculture i.

The following Statistical Analysis Software (SAS) code fits a CRD with a Poisson response.

proc glimmix data=sugar method=laplace;
  class rep1 sub1;
  model nb=sub / dist=poisson s link=log;
  lsmeans sub / lines ilink;
run; quit;

While most of the commands used have been explained before, the options “dist,” “s,” and “link” in the model statement tell SAS the distribution of the data, request the fixed effects solution, and specify the link function, respectively. In addition, the “lines” option asks the GLIMMIX procedure, through the “lsmeans” (least squares means) statement, for mean comparisons, and the “ilink” option applies the inverse link function.

Part of the output is shown in Table 5.2, where part (a) shows the model and the methods used to fit the statistical model, whereas part (b) lists the dimensions of the relevant matrices in the model specification.

Table 5.2 Model information and estimation methods

Due to the absence of random effects in this model, there are no columns in matrix Z. The 11 columns in matrix X comprise an intercept and 10 columns for the effect of subcultures.

The goodness-of-fit statistics of the model are shown in part (a) of Table 5.3. The value of the generalized chi-square statistic over its degrees of freedom (DFs) is less than 1 (Pearson’s chi-square/DF = 0.79). This indicates that there is no overdispersion and that the variability in the data has been adequately modeled with the Poisson distribution.
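The diagnostic used here, Pearson's chi-square divided by its degrees of freedom, is simple to compute by hand. A minimal Python sketch (with invented counts and fitted group means, not the sugarcane data):

```python
def pearson_chisq_ratio(y, fitted, n_params):
    """Pearson chi-square over its degrees of freedom for a Poisson fit.

    Under a Poisson model Var(y) = mean, so values near 1 support the
    assumed mean-variance relation; values well above 1 signal
    overdispersion.
    """
    chisq = sum((yi - mu) ** 2 / mu for yi, mu in zip(y, fitted))
    df = len(y) - n_params
    return chisq / df

# Invented counts for two groups, with fitted values equal to group means
y = [3, 5, 4, 4, 12, 9, 11, 8]
fitted = [4, 4, 4, 4, 10, 10, 10, 10]
ratio = pearson_chisq_ratio(y, fitted, n_params=2)  # 1.5 / 6 = 0.25
```

Each squared residual is scaled by the fitted mean because, under the Poisson assumption, the fitted mean is also the fitted variance.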

Subsection (b) of Table 5.3 shows the maximum likelihood (ML) parameter estimates (“Estimate”), standard errors, and t-tests for the hypotheses about the parameters.

Table 5.3 Fit statistics and estimated parameters

Table 5.4 (part (a)) shows significance tests for the fixed effects in the model (“Type III tests of fixed effects”). These tests are Wald tests, not likelihood ratio tests. The effect of subculture on the number of shoots is highly significant (P < 0.0001), indicating that the 10 subcultures do not produce the same number of shoots; that is, subculture affects the average shoot production of the explant.

The least squares means obtained with “lsmeans” (part (b) in Table 5.4) are the values under the column “Estimate,” which, along with the standard errors, were calculated with the linear predictor \( {\hat{\eta}}_i=\hat{\eta}+{\hat{\tau}}_i \). These estimates are on the model scale, whereas the “Mean” column values and their respective standard errors are on the data scale; the latter were obtained by applying the inverse link to obtain the \( {\hat{\lambda}}_i \) values, i.e., \( {\hat{\lambda}}_i=\exp \left({\hat{\eta}}_i\right) \), with their respective standard errors.
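The two scales can be illustrated with a short stdlib-only Python sketch (the shoot counts below are hypothetical, not the sugarcane data). For a single-factor Poisson GLM with a log link, the ML estimate of a treatment mean on the data scale is simply the sample mean of that group, and the model-scale estimate is its logarithm:

```python
import math

# Hypothetical shoot counts for one treatment group (not the actual data)
counts = [14, 18, 15, 17, 16, 14, 19, 15]

# For a single-factor Poisson GLM with a log link, the ML estimate of a
# group mean on the data scale is the sample mean of that group
lam_hat = sum(counts) / len(counts)   # "Mean" column (data scale)
eta_hat = math.log(lam_hat)           # "Estimate" column (model scale)

# Applying the inverse link, exp(eta_hat), recovers the data-scale mean
recovered = math.exp(eta_hat)
```

This round trip (log to the model scale, exp back to the data scale) is exactly what the “ilink” option automates.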

Table 5.4 Type III tests of fixed effects and least squares means (means)

A comparison of means, using the option “lines,” is presented in Fig. 5.1. In this figure, we can see that the average production is minimal in the first subcultures, increases from subcultures 5 to 8, and, in subculture 9, the average number of shoots per explant begins to decrease.

Fig. 5.1 Average number of shoots per subculture. Bars with different letters are statistically different using α = 0.05

5.2.2 Example 2: CRDs with Poisson Response

Researchers want to determine whether the application of a new growth compound to walnut trees changes the number of nuts produced per tree. The compound was applied at three different times (pre-flowering = 1, flowering = 2, and post-flowering = 3) and in two formulations (A and B), plus a control (C) in which no compound was applied. In total, 7 treatments (Trt) were randomly assigned to the experimental units (trees), i.e., 35 trees, in a rectangular arrangement. The number of nuts yij observed for each combination of formulation and time of application is provided in Table 5.5.

Table 5.5 Number of nuts per tree (yij) in each of the combinations of the two factors

The components of the GLMM are listed below:

$$ {\displaystyle \begin{array}{c}\mathrm{Distribution}:{y}_{ij}\mid {r}_j\sim \mathrm{Poisson}\left({\lambda}_{ij}\right)\\ {}{r}_j\sim N\left(0,{\upsigma}_{\mathrm{tree}}^2\right)\end{array}} $$
$$ \mathrm{Linear}\ \mathrm{predictor}:{\eta}_{ij}=\eta +{\tau}_i+{r}_j $$
$$ \mathrm{Link}\ \mathrm{function}:\log \left({\lambda}_{ij}\right)={\eta}_{ij} $$

where yij denotes the number of nuts in treatment i on tree j (i = 1, 2, ⋯, 7; j = 1, 2, ⋯, 5), ηij is the linear predictor, η is the intercept, τi is the fixed effect due to treatment i, and rj is the random effect due to tree j.

The following SAS statements fit a GLMM in a completely randomized design with a Poisson response variable.

proc glimmix data=crd_nuez nobound method=laplace;
  class trt rep;
  model count = trt / dist=Poi link=log;
  random rep;
  lsmeans trt / lines ilink;
run;

The options “dist” and “link” in the model statement tell SAS the distribution of the data and the link function to use, respectively. In addition, the “lines” option asks the GLIMMIX procedure, through the “lsmeans” (least squares means) statement, for mean comparisons, and the “ilink” option applies the inverse link function.

Part of the results is presented in Table 5.6. The value of the fit statistic for the conditional distribution (part (a)) indicates strong overdispersion (Pearson’s chi-square/DF = 3.62), and the variance component estimate due to sampling in the experimental units (trees) is \( {\hat{\sigma}}_{\mathrm{tree}}^2=0.035 \) (part (b)).

Table 5.6 Results of the analysis of variance

In addition, Table 5.6 (part (c)) shows the type III tests of fixed effects, indicating a significant difference between treatments in the average number of nuts per tree (P = 0.0001). However, it is not recommended to continue with the inference and analysis of the experiment due to the presence of extra variation (commonly known as overdispersion; Pearson’s chi-square/DF = 3.62) in the data, which strongly affects the F-test and the standard errors of the means.

A highly effective alternative for dealing with overdispersion in the data is to use a distribution other than the Poisson distribution. The negative binomial distribution is an excellent option for count data with overdispersion. Assume that the conditional distribution of the observations is given by:

$$ {y}_{ij}\mid {r}_j\sim \mathrm{Poisson}\left({\lambda}_{ij}\right), $$

where λij ~ Gamma(1/ϕ, ϕ), with ϕ the scale parameter, and \( {r}_j\sim N\left(0,{\upsigma}_{\mathrm{tree}}^2\right) \). The resulting new GLMM is:

$$ {\displaystyle \begin{array}{c}\mathrm{Distribution}:{y}_{ij}\mid {r}_j\sim \mathrm{Negative}\kern0.50em \mathrm{Binomial}\left({\lambda}_{ij},\phi \right),\\ {}{r}_j\sim N\left(0,{\upsigma}_{\mathrm{tree}}^2\right)\end{array}} $$
$$ \mathrm{Linear}\ \mathrm{predictor}:{\eta}_{ij}=\eta +{\tau}_i+{r}_j $$
$$ \mathrm{Link}\ \mathrm{function}:\log \left({\lambda}_{ij}\right)={\eta}_{ij} $$
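The Gamma–Poisson mixture underlying this model can be checked by simulation. The stdlib-only Python sketch below uses illustrative values λ = 8 and ϕ = 0.5 (unrelated to the walnut data) and one common parameterization, shape 1/ϕ and scale ϕλ, so that the mixing distribution has mean λ; the mixed counts then show the negative binomial mean λ and variance λ + ϕλ²:

```python
import math
import random

random.seed(2)

def rpoisson(lam):
    """Poisson sampler via Knuth's method (fine for moderate lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

lam, phi, n = 8.0, 0.5, 60000  # illustrative values only

# Gamma with shape 1/phi and scale phi*lam has mean lam and variance
# phi*lam**2; mixing Poisson rates over it yields a negative binomial
draws = [rpoisson(random.gammavariate(1.0 / phi, phi * lam))
         for _ in range(n)]

mean = sum(draws) / n                          # ~ lam = 8
var = sum((d - mean) ** 2 for d in draws) / n  # ~ lam + phi*lam**2 = 40
```

The simulated variance (about 40) is five times the mean (about 8), which is precisely the kind of extra-Poisson variation the negative binomial accommodates.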

The following GLIMMIX statements fit this model under a negative binomial distribution in a CRD.

proc glimmix data=crd_nuez nobound method=laplace;
  class trt rep;
  model count = trt / dist=Negbin link=log;
  random rep;
  lsmeans trt / lines ilink;
run;

Part of the results is listed below. The information criteria in part (a) of Table 5.7 are helpful in choosing which model best fits the dataset; clearly, the negative binomial distribution provides the better fit to these data. In the conditional fit statistics (part (b)), we observe that the Poisson model had strong overdispersion (Pearson’s chi-square/DF = 3.62) and that fitting the data under a negative binomial distribution removed it (Pearson’s chi-square/DF = 0.91).

Table 5.7 Poisson and negative binomial model fit statistics

Table 5.8 shows the variance component estimates (part (a)) and the type III tests of fixed effects (part (b)). The estimated variance parameter, due to trees, is \( {\hat{\sigma}}_{\mathrm{tree}}^2=0.04288 \), and the estimated scale parameter (Scale) is \( \hat{\phi}=0.06141 \). The type III tests of fixed effects (part (b)) show that there is a highly significant effect of treatments on the average number of nuts (P < 0.0001).

Table 5.8 Variance component estimates and fixed effects tests

The values under the “Estimate” column are the estimates of the linear predictor \( {\hat{\eta}}_i \) (the model scale), and the values under “Mean” are the means \( {\hat{\lambda}}_i \) (the data scale) with their respective standard errors, obtained with the “lsmeans” statement and the “ilink” option (Table 5.9). The results show that all treatments produced a higher average number of walnuts than the control treatment C. In general, formulation B applied to the walnut trees at the full-flowering stage produced the highest nut yield.

Table 5.9 Estimates on the model scale (“Estimate”) and means on the data scale (“Mean”)

Interest often arises in the agricultural and biological sciences in experiments that involve random effects (blocks, locations, etc.) and response variables that are not normally distributed. For example, suppose that a certain number of treatments are tested at several locations selected at random from a sufficiently large number of locations. At each location, the experimental units are randomly assigned to the treatments. Let yij be the number of (observed) individuals possessing the characteristic of interest under the ith treatment in the jth block. The model for the mean structure of this experiment is

$$ {\eta}_{ij}=\eta +{\tau}_i+{b}_j $$

where η is the intercept, τi is the fixed effect due to treatment i, and bj is the random effect of block j with \( {b}_j\sim N\left(0,{\upsigma}_{\mathrm{block}}^2\right) \).

5.2.3 Example 3: Control of Weeds in Cereal Crops in an RCBD

One of the main problems when growing cereal crops is the competition between weeds and seedlings. A field supervisor is interested in testing five weed-control treatments plus a control in cereal crops, using a randomized complete block design with four blocks. Table 5.10 shows the number of weed plants observed under each treatment (yij), with the treatment number given in parentheses.

Table 5.10 Number of weeds in each treatment (the number in parentheses corresponds to the treatment number)

Table 5.11 shows the sources of variation and the degrees of freedom of a randomized complete block design used in this experiment.

Table 5.11 Analysis of variance

Since the response is a count, it is modeled using a GLMM with a Poisson response variable, as stated below:

$$ {\displaystyle \begin{array}{c}\mathrm{Distribution}:{y}_{ij}\mid {b}_j\sim \mathrm{Poisson}\left({\lambda}_{ij}\right)\\ {}{b}_j\sim N\left(0,{\upsigma}_{\mathrm{block}}^2\right)\end{array}} $$
$$ \mathrm{Linear}\ \mathrm{predictor}:{\eta}_{ij}=\eta +{\tau}_i+{b}_j $$
$$ \mathrm{Link}\ \mathrm{function}:\log \left({\lambda}_{ij}\right)={\eta}_{ij} $$

where yij denotes the number of weed plants observed in treatment i and block j (i = 1, 2, ⋯, 6; j = 1, 2, 3, 4), ηij is the linear predictor, η is the intercept, τi is the fixed effect due to treatment i, and bj is the random block effect \( \left({b}_j\sim N\left(0,{\sigma}_{\mathrm{block}}^2\right)\right) \).

Using the GLIMMIX procedure, the following syntax specifies the analysis of a GLMM with a Poisson response.

proc glimmix nobound method=laplace;
  class Block Trt;
  model Count = Trt / dist=Poisson s;
  random block;
  lsmeans Trt / diff lines ilink;
run; quit;

Note that in the above syntax, we use “method=laplace” (alternatively, “method=quadrature”) to fit the mixed model and obtain the chi-square/DF fit statistic for the conditional distribution. If no integration method is specified, only the generalized chi-square/DF statistic is obtained. The auxiliary options after the “lsmeans” statement are as follows: “diff” provides pairwise comparisons between treatments, “lines” summarizes those comparisons using letter groupings, and “ilink” provides the value of the inverse link function. Some of the outputs are listed below.

Table 5.12 (a) presents the basic information about the model and estimation procedure used.

Table 5.12 Basic model information

Subsection (b) of Table 5.12 lists the “Dimensions” of the relevant matrices used in the model. The random effects matrix Z has four columns due to blocks, and the fixed effects matrix X has one column for the intercept plus six columns due to treatments.

The “Fit statistics” and “Fit statistics for conditional distribution” (parts (a) and (b) of Table 5.13, respectively) provide information about the fit of the GLMM. The generalized chi-square statistic measures the residual sum of squares of the final model, and its ratio to the degrees of freedom is a measure of the variability of the observations about the fitted mean.

The value of Pearson’s chi-square/DF for the conditional distribution is 11.8, well above 1. This value gives strong evidence of overdispersion in the dataset. In other words, it calls our distribution and linear predictor assumptions into question, suggesting that the variance function was not adequately specified.

Table 5.13 Model fit statistics

The F-test of H0: τ1 = τ2 = ⋯ = τ6, or equivalently μ1 = μ2 = ⋯ = μ6, indicates a highly significant difference (P < 0.0001) in the average number of weeds for at least one treatment (part (c) of Table 5.14).

Table 5.14 Variance component estimates, parameter estimates, and type III tests of fixed effects

The estimates of the linear predictor on the model scale for each treatment \( \left({\hat{\eta}}_i\right) \) and their inverse-linked values \( \left({\hat{\lambda}}_i\right) \) on the data scale (with their respective standard errors) are calculated as \( {\hat{\eta}}_i=\hat{\eta}+{\hat{\tau}}_i \) and \( {\hat{\lambda}}_i=\exp \left({\hat{\eta}}_i\right) \), respectively. These values are listed in Table 5.15.

Table 5.15 Estimated least squares means (“Mean”)

The “plots” option in the “proc GLIMMIX” statement creates a set of plots for the raw residuals, Pearson residuals, and studentized residuals.

The panel consists of a plot of the studentized residuals versus the linear predictor \( \left({\hat{\eta}}_i\right) \), a histogram of the residuals with a normal density superimposed, a quantile–quantile plot of the residuals, and a box plot of the residuals. The panel of studentized residuals suggests a slightly skewed distribution (Fig. 5.2). In this figure, we can see that the spread of the residuals changes with the value of the linear predictor, indicating that the assumption of constant variance is not met. The quantile plot confirms this violation. A nonconstant variance may also suggest an incorrect choice of the response distribution or variance function.

Fig. 5.2 Studentized conditional residuals

5.2.4 Overdispersion in Poisson Data

Linear mixed models assume that the observations have a normal distribution conditional on the model effects. In addition, the mean μ is independent of the variance σ2, whereas in most GLMMs that assume a binomial or Poisson distribution, the dispersion is fixed at 1; that is, once the mean is known, the variance is assumed known as well. Overdispersion is the extra variability not predicted by the generalized linear model’s random component. It arises because the mean and variance of a GLM are related, both depending on the same parameter that is predicted through the linear predictor. If overdispersion is present in a dataset, the estimated standard errors and goodness-of-fit test statistics are distorted and adjustments must be made. In other words, when there is overdispersion, the standard errors of the estimated parameters are too small, which leads to test statistics for the model parameters that are too large (i.e., the type I error rate increases).

Overdispersion can be caused by several factors: omission of predictor variables from the model, high correlation among the observations due to nested effects, misspecification of the systematic component, or an incorrect choice of the distribution of the data. Systematic deviations or overdispersion may thus result from incorrect assumptions about the stochastic and/or systematic component of the model. The model may also fit the dataset poorly because of an incorrect choice of link function, or because random effects or the dependence among observations were omitted; adding the appropriate random factors generally addresses such violations.

Fig. 5.3 Conditional residuals versus predicted values on the data scale

According to Stroup (2013), overdispersion occurs when the variance exceeds the theoretical variance under the assumed distribution of the data. Overdispersion is theoretically possible for any distribution with a nontrivial variance function that belongs to the one-parameter exponential family, because such distributions lack a scale parameter to moderate the mean–variance relationship; models based on the Poisson distribution are therefore particularly vulnerable. In summary, overdispersion occurs when:

  (a) The variance is larger than expected, which leads to incorrect standard errors.

  (b) The mean structure is not well specified.

  (c) The linear predictor η is not well specified.

  (d) The chosen distribution of the data is not appropriate.

  (e) Predictor variables are omitted.

  (f) Observations are significantly correlated.

If we do not account for overdispersion, we underestimate the standard errors and inflate the test statistics, which inflates the type I error rate and makes the confidence intervals unreliable. Figure 5.3 shows that as the predicted mean \( \hat{\mu} \) increases, the residuals have a larger spread, indicating that the variance may increase as a function of the mean, whereas Fig. 5.4 shows a nonconstant variance.
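The underestimation of standard errors can be seen in a short simulation (plain Python; the values λ = 10 and ϕ = 0.4 are illustrative, and the counts are generated from a Gamma–Poisson mixture so that they are overdispersed by construction):

```python
import math
import random

random.seed(3)

def rpoisson(lam):
    """Poisson sampler via Knuth's method (fine for moderate lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

# Overdispersed counts: Gamma-Poisson mixture with mean 10 and variance
# 10 + 0.4 * 10**2 = 50 (lam and phi are illustrative values)
lam, phi, n = 10.0, 0.4, 5000
y = [rpoisson(random.gammavariate(1.0 / phi, phi * lam)) for _ in range(n)]

ybar = sum(y) / n
s2 = sum((v - ybar) ** 2 for v in y) / (n - 1)

se_poisson = math.sqrt(ybar / n)  # SE of the mean if Var(y) = mean held
se_actual = math.sqrt(s2 / n)     # SE of the mean from the observed variance

# se_poisson understates se_actual by about sqrt(Var/mean) = sqrt(5) here,
# which is why Poisson-based tests become too liberal under overdispersion
```

The naive Poisson standard error is too small by a factor of roughly √(Var/mean), so every test statistic built from it is correspondingly too large.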

Fig. 5.4 Residuals on the model scale

In the fit statistics obtained under the GLMM with the Poisson distribution (part (b), Table 5.13), the value of Pearson’s chi-square/DF = 11.8 indicates strong overdispersion in the dataset. Another relevant value in the output is the test statistic F (F = 523.57) tabulated in part (c) of Table 5.14; a value this large may indicate that the fit is incorrect. Once overdispersion has been detected, the researcher must choose a strategy to remedy it. There are three possible alternatives for testing and eliminating overdispersion, which we review below.

5.2.4.1 Using the Scale Parameter

The first alternative is to add a scale parameter, replacing Var(yij| bj) = λij by Var(yij| bj) = ϕλij. This amounts to replacing the conditional log-likelihood yij log (λij) − λij −  log (yij!) by the quasi-likelihood (yij log (λij) − λij)/ϕ, assuming that a ϕ > 1 can adequately model the observed variance.

The following GLIMMIX syntax invokes this alternative of adding a scale parameter under a Poisson response variable.

proc glimmix;
  class Block Trt;
  model Count = Trt / dist=Poisson;
  random intercept / subject=block;
  random _residual_;
  lsmeans Trt / ilink;
run;

The SAS code is very similar to that used previously, with the addition of the “random _residual_” statement. Note that the Laplace integration method (“method=laplace”) has been removed, so estimation is performed by the pseudo-likelihood (PL) method; the scale parameter is estimated and used to adjust the standard errors and test statistics. The GLIMMIX procedure uses the generalized chi-square divided by its degrees of freedom \( \left(\mathrm{Gener}.\mathrm{chi}-\mathrm{square}/\mathrm{DF}=\hat{\phi}\right) \) as the estimate of the scale parameter. All standard errors are multiplied by \( \sqrt{\hat{\phi}} \), and all F-values are divided by \( \hat{\phi} \). Table 5.16 shows part of the results.

In Table 5.16, we observe the fit statistics (part (a)), the covariance parameter estimates (part (b)), and the value of the scale parameter, \( \hat{\phi}=19.4848 \) (Residual (VC)). The value of the F-statistic under the Poisson distribution is now 26.87 (part (c)), obtained by dividing the F-value from the previous analysis by the scale estimate \( \left(523.57/\hat{\phi}\right) \). The results indicate that even under this adjustment overdispersion persists, with the estimate increasing from 11.8 to 19.4848 (part (a)). The inclusion of the scale parameter affects the variance estimate due to blocks \( {\sigma}_{\mathrm{block}}^2 \) as well as the estimates of the treatment means (part (d)), but the main impact is on the standard errors.
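The arithmetic of this adjustment can be verified directly from the values reported for this example (no assumptions beyond the reported numbers):

```python
import math

# Values reported for this example: the F statistic from the Poisson fit
# and the generalized chi-square / DF used as the scale estimate
f_poisson = 523.57
phi_hat = 19.4848

# GLIMMIX divides F statistics by phi-hat and multiplies every standard
# error by sqrt(phi-hat) under the "random _residual_" adjustment
f_adjusted = f_poisson / phi_hat   # about 26.87, as reported in the output
se_inflation = math.sqrt(phi_hat)  # each SE grows by this factor
```

Every standard error in the adjusted output is thus about 4.4 times its unadjusted counterpart, which is why the inference changes so substantially.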

The inclusion of the scale parameter implies working with a quasi-likelihood rather than a true likelihood; therefore, there is no true likelihood-based process yielding an expected value of λ and a variance of ϕλ.

Table 5.16 Results of the adjustment by adding the scale parameter

5.2.4.2 Linear Predictor Review

For count and binomial response variables, it is important to check whether the linear predictor is correctly specified, that is, whether λij is randomly affected by the experimental units within blocks. If it is, then the ANOVA table should include the block × treatment source of variation, and this term must be specified in the linear predictor of the GLMM. Thus, the linear predictor is specified as

$$ {\eta}_{ij}=\eta +{\tau}_i+{b}_j+{\left( b\tau \right)}_{ij} $$
$$ {\displaystyle \begin{array}{c}\mathrm{Distribution}:{y}_{ij}\mid {b}_j,{\left( b\tau \right)}_{ij}\sim \mathrm{Poisson}\left({\lambda}_{ij}\right)\\ {}{b}_j\sim N\left(0,{\upsigma}_{\mathrm{block}}^2\right)\\ {}{\left( b\tau \right)}_{ij}\sim N\left(0,{\upsigma}_{\mathrm{block}\times \tau}^2\right)\end{array}} $$
$$ \mathrm{Linear}\ \mathrm{predictor}:{\eta}_{ij}=\eta +{\tau}_i+{b}_j+{\left( b\tau \right)}_{ij} $$
$$ \mathrm{Link}\ \mathrm{function}:\log \left({\lambda}_{ij}\right)={\eta}_{ij}. $$

The following GLIMMIX program fits the above model:

proc glimmix method=laplace;
  class Block Trt;
  model Count = Trt / dist=Poisson;
  random intercept Trt / subject=block;
  lsmeans Trt / ilink;
run;

Part of the output is shown in Table 5.17. The results tabulated in part (a) indicate that the overdispersion has been eliminated \( \left(\hat{\phi}=0.11\right) \), although a value this far below 1 carries a risk of underestimating the variance; ideally, \( \hat{\phi} \) should be close to 1. The estimated variance components (part (b)) for blocks and block × treatments are \( {\sigma}_{\mathrm{block}}^2=0.05969 \) and \( {\sigma}_{\mathrm{block}\times \mathrm{Trt}}^2=0.1152 \), respectively.

The type III tests of fixed effects are highly significant (P = 0.0001), indicating that the six treatments are not equally effective for weed control (part (c)). The values in part (d) under the “Mean” column are the means on the original scale of the data for each treatment with their respective standard errors. Compared with the previous analysis (using the scale parameter), the means do not vary much, but the standard errors differ more markedly.

Table 5.17 Results of the fit by redefining the predictor of the model

5.2.4.3 Using a Different Distribution

Another way to address overdispersion when using a Poisson distribution is to change the assumed distribution of the response variable. A Poisson variable has equal mean and variance, but for biological count data this assumption is not always true. A negative binomial distribution is a good alternative (see Example 5.2), as previously discussed. A negative binomial variable has mean λ > 0 and variance λ + ϕλ2, where ϕ > 0 is the scale parameter; that is, E(y) = λ and Var(y) = λ + ϕλ2. The components of this model are shown below:

Given that yij ∣ bj ~ Poisson(λij), it is assumed that λij ~ Gamma(1/ϕ, ϕ), with ϕ the scale parameter and \( {b}_j\sim N\left(0,{\sigma}_{\mathrm{block}}^2\right) \). The new specification of the resulting GLMM is as follows:

$$ {\displaystyle \begin{array}{c}\mathrm{Distribution}:{y}_{ij}\mid {b}_j\sim \mathrm{Negative}\ \mathrm{Binomial}\left({\lambda}_{ij},\phi \right)\\ {}{b}_j\sim N\left(0,{\upsigma}_{\mathrm{block}}^2\right)\end{array}} $$
$$ \mathrm{Linear}\ \mathrm{predictor}:{\eta}_{ij}=\eta +{\tau}_i+{b}_j $$
$$ \mathrm{Link}\ \mathrm{function}:\log \left({\lambda}_{ij}\right)={\eta}_{ij}. $$
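The mean–variance relationship above can be checked numerically. The sketch below (illustrative Python, not part of the SAS analysis) evaluates the negative binomial probability mass function in the (λ, ϕ) parameterization used in this chapter and confirms that E(y) = λ and Var(y) = λ + ϕλ2; the values λ = 4 and ϕ = 0.5 are arbitrary.

```python
import math

def negbin_pmf(k, lam, phi):
    """P(Y = k) for a negative binomial with mean lam and variance
    lam + phi*lam**2 (Poisson-Gamma mixture parameterization)."""
    n = 1.0 / phi                      # Gamma shape parameter
    p = 1.0 / (1.0 + phi * lam)        # "success" probability
    log_pmf = (math.lgamma(k + n) - math.lgamma(n) - math.lgamma(k + 1)
               + n * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)

lam, phi = 4.0, 0.5                    # arbitrary illustrative values
probs = [negbin_pmf(k, lam, phi) for k in range(2000)]
mean = sum(k * q for k, q in enumerate(probs))
var = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
# mean recovers lam (4.0) and var recovers lam + phi*lam**2 (12.0)
```

Setting ϕ → 0 collapses the variance back to λ, which is why the Poisson model can be viewed as the limiting case of the negative binomial.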

The following GLIMMIX statements fit the model with a negative binomial distribution.

proc glimmix method=laplace;
 class block Trt;
 model count = Trt / dist=NegBin;
 random block;
 lsmeans Trt / ilink;
run;

Some of the most relevant outputs from GLIMMIX are presented in Table 5.18. The Pearson chi-square/DF value of 0.88 (part (a)) shows that the overdispersion in the dataset has been removed. The estimated scale parameter tabulated in part (b) (Scale) is \( \hat{\phi}=0.1080 \). This is not the same as the scale parameter estimated under the Poisson model with the “random _residual_” statement, since the two are calculated differently. However, as mentioned above, both scale parameters govern the relationship between the mean and variance in the Poisson and negative binomial distributions.

Table 5.18 Fitting results by redefining the model structure

The value of the test statistic shown in part (c) of Table 5.18, under the negative binomial distribution for the effect of treatments, is highly similar to the value obtained with the Poisson distribution when the block × treatment interaction was added to the linear predictor. The values under “Estimate” are estimates of the linear predictor on the model scale (part (d)), whereas those under the “Mean” column are the treatment means on the data scale, using the negative binomial distribution. Of the three alternatives proposed for fitting these data, the last two (including the block × treatment interaction in the predictor and assuming a negative binomial distribution) provide a better fit.

5.2.5 Factorial Designs

Many experiments involve studying the effects of two or more factors. Factorial designs are the most efficient for these types of experiments. In a factorial design, all possible combinations of factor levels are investigated in each replicate. If there are a levels of factor A and b levels of factor B, then each replicate contains all ab treatment combinations.
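The enumeration of all ab combinations can be made concrete with a short sketch (illustrative Python; the factor labels are hypothetical placeholders): for a = 2 and b = 4, each replicate contains all eight combinations.

```python
from itertools import product

# Hypothetical labels for a 2 x 4 factorial (a = 2 levels of A, b = 4 of B)
factor_a = ["A1", "A2"]
factor_b = ["B1", "B2", "B3", "B4"]

# Every replicate contains all a*b treatment combinations
combinations = list(product(factor_a, factor_b))
```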

5.2.5.1 Example: A 2 × 4 Factorial with a Poisson Response

This application refers to a factorial experiment involving explants from cotyledons of cucumber (Cucumis sativus L.) with two factors, i.e., genotype (two levels) and culture medium (four levels). Each of the eight combinations of genotype and culture medium levels was applied to four Petri dishes, each containing six leaf explants. The response variable was the number of buds in each of the leaf explants, i.e., a count. There are two sources of variation in this application, namely, variation between Petri dishes and variation between the explants within the Petri dishes (Table 5.19).

Table 5.19 Number of buds counted in the cucumber experiment

The sources of variation and degrees of freedom for this experiment are shown in Table 5.20.

Table 5.20 Sources of variation and degrees of freedom

The components that define this model are shown below:

$$ {\displaystyle \begin{array}{c}\mathrm{Distribution}:{y}_{ijkl}\mid \mathrm{petri}.{\mathrm{dish}}_k,\mathrm{explant}{\left(\mathrm{petri}.\mathrm{dish}\right)}_{l(k)}\sim \mathrm{Poisson}\left({\lambda}_{ijkl}\right),\\ {}\mathrm{petri}.{\mathrm{dish}}_k\sim N\left(0,{\upsigma}_{\mathrm{petri}.\mathrm{dish}}^2\right),\\ {}\mathrm{explant}{\left(\mathrm{petri}.\mathrm{dish}\right)}_{l(k)}\sim N\left(0,{\upsigma}_{\mathrm{explant}\left(\mathrm{petri}.\mathrm{dish}\right)}^2\right)\end{array}} $$
$$ \mathrm{Linear}\ \mathrm{predictor}:{\eta}_{ij kl}=\eta +{\alpha}_i+{\beta}_j+{\left(\alpha \beta \right)}_{ij}+\mathrm{petri}.{\mathrm{dish}}_k+\mathrm{explant}{\left(\mathrm{petri}.\mathrm{dish}\right)}_{l(k)} $$
$$ \mathrm{Link}\ \mathrm{function}:\mathit{\log}\left({\lambda}_{ijkl}\right)={\eta}_{ijkl} $$

where ηijkl is the linear predictor in genotype i (i = 1, 2), culture medium j (j = 1, 2, 3, 4), Petri.dish k (k = 1, 2, 3, 4), and explant l (l = 1, 2, 3, 4, 5, 6), η is the intercept, αi is the fixed effect due to genotype i, βj is the fixed effect due to culture medium j, (αβ)ij is the effect of the interaction between genotype i and culture medium j, Petri.dishk is the random effect of the Petri.dish, and explant(Petri.dish)l(k) is the random effect of the explant within the Petri.dish, assuming \( \mathrm{Petri}.{\mathrm{dish}}_k\sim N\left(0,{\upsigma}_{\mathrm{Petri}.\mathrm{dish}}^2\right) \) and \( \mathrm{explant}{\left(\mathrm{Petri}.\mathrm{dish}\right)}_{l(k)}\sim N\left(0,{\upsigma}_{\mathrm{explant}\left(\mathrm{Petri}.\mathrm{dish}\right)}^2\right) \).

The following GLIMMIX procedure fits a factorial experiment with a Poisson response.

proc glimmix method=laplace;
 class genotype culture petri_dish explant;
 model y = genotype|culture / dist=Poisson;
 random petri_dish explant(petri_dish);
 lsmeans genotype|culture / ilink lines;
run;

Some of the SAS output is shown in Table 5.21, including the fit statistics for this dataset in part (a). Note that “method=laplace” was used for the estimation process and to obtain Pearson’s fit statistic χ2/DF. The result indicates that there is evidence of overdispersion (Pearson chi-square/DF = 1.84).

Overdispersion, as discussed before, implies more variability in the data than the model expects, which explains the lack of fit of the Poisson model. Part (b) shows the variance component estimate due to Petri dishes, \( {\hat{\upsigma}}_{\mathrm{Petri}.\mathrm{dish}}^2=0.003616 \), and that for explants within Petri dishes, \( {\hat{\upsigma}}_{\mathrm{explant}\left(\mathrm{Petri}.\mathrm{dish}\right)}^2=0.01462 \). In addition, the type III tests of fixed effects indicate a statistically significant effect of genotype, culture medium, and the interaction of both factors (part (c)).
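The diagnostic quoted here, Pearson’s chi-square divided by its degrees of freedom, is easy to compute from the observed counts and the fitted means. A minimal sketch (illustrative Python with made-up numbers, not the cucumber data):

```python
def pearson_chisq_over_df(y, mu, n_params):
    """Pearson chi-square / DF for a Poisson fit, where Var(y) = mu.
    Values well above 1 signal overdispersion; values near 1 are
    consistent with the Poisson mean-variance assumption."""
    chisq = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chisq / (len(y) - n_params)

# Made-up counts and fitted means for illustration only
y = [0, 2, 1, 3, 5, 1]
mu = [2.0] * 6
ratio = pearson_chisq_over_df(y, mu, n_params=2)
# ratio of 2.0 here would indicate overdispersion
```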

Table 5.21 Conditional fit statistics, variance component estimates, and type III tests of fixed effects under the Poisson distribution

The plot of residuals against the linear predictor in Fig. 5.5 provides further evidence of possible overdispersion.

Fig. 5.5
Four panels of conditional studentized residuals: residuals versus the linear predictor, a normal quantile plot, a histogram of the residuals, and a boxplot.

Studentized conditional residuals

The least squares means on the model scale for the genotype (part (a)), the culture medium (part (b)), and the interaction between both factors (part (c)) are listed under the “Estimate” column of Table 5.22, whereas under the “Mean” column are the means of these factors but in terms of the data.

Table 5.22 Estimates on the model scale and means on the data scale under the Poisson distribution

Since there is overdispersion in the data, we will fit the GLMM again using the negative binomial distribution. That is, under the following GLMM:

$$ {\displaystyle \begin{array}{c}\mathrm{Distribution}:{y}_{ijkl}\mid \mathrm{Petri}.{\mathrm{dish}}_k,\mathrm{explant}{\left(\mathrm{Petri}.\mathrm{dish}\right)}_{l(k)}\sim \mathrm{Negative}\ \mathrm{Binomial}\left({\lambda}_{ijkl},\phi \right),\\ {}\mathrm{Petri}.{\mathrm{dish}}_k\sim N\left(0,{\upsigma}_{\mathrm{Petri}.\mathrm{dish}}^2\right),\\ {}\mathrm{explant}\ {\left(\mathrm{Petri}.\mathrm{dish}\right)}_{l(k)}\sim N\left(0,{\upsigma}_{\mathrm{explant}\left(\mathrm{Petri}.\mathrm{dish}\right)}^2\right),\end{array}} $$
$$ \mathrm{Linear}\ \mathrm{predictor}:{\eta}_{ij kl}=\eta +{\alpha}_i+{\beta}_j+{\left(\alpha \beta \right)}_{ij}+\mathrm{Petri}.{\mathrm{dish}}_k+\mathrm{explant}{\left(\mathrm{Petri}.\mathrm{dish}\right)}_{l(k)} $$
$$ \mathrm{Link}\ \mathrm{function}:\log \left({\lambda}_{ijkl}\right)={\eta}_{ijkl} $$

and the scale parameter ϕ.

The following GLIMMIX program allows us to fit a GLMM with a negative binomial response variable.

proc glimmix method=laplace;
 class genotype culture petri_dish explant;
 model y = genotype|culture / dist=NegBin link=log;
 random petri_dish explant(petri_dish);
 lsmeans genotype|culture / lines ilink;
run;

It should be noted that this program is very similar to the previous one; the only difference is that a negative binomial distribution is now used (“dist=NegBin”). Part of the results is presented in Table 5.23. As already mentioned, the negative binomial distribution is another model for count variables when there is overdispersion in the dataset. If Pearson’s chi-squared value divided by its degrees of freedom is less than or equal to 1, then the overdispersion is zero or close to zero, meaning that the model adequately captures the extra variation. Based on the conditional distribution, the Pearson chi-square fit statistic (χ2/DF = 0.83) indicates no evidence of overdispersion, which justifies the negative binomial distribution over the Poisson distribution implemented above. Part (b) shows that the estimated scale parameter is \( \hat{\phi}=0.1712 \). This value is not the same as the parameter for the quasi-Poisson model obtained with the “random _residual_” statement. Note that the variance components were slightly affected. Additionally, the type III tests of fixed effects in part (c) of Table 5.23 show a significant effect of genotype, culture medium, and their interaction (genotype*culture) on the number of buds per leaf explant.
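Why the negative binomial accommodates overdispersed counts better can be seen directly from the log-likelihoods. In the sketch below (illustrative Python; the counts and the value ϕ = 1 are arbitrary assumptions, not the bud data), the negative binomial log-likelihood exceeds the Poisson one because the data are far more variable than their mean:

```python
import math

def poisson_logpmf(k, mu):
    return k * math.log(mu) - mu - math.lgamma(k + 1)

def negbin_logpmf(k, lam, phi):
    """Log pmf with mean lam and variance lam + phi*lam**2."""
    n, p = 1.0 / phi, 1.0 / (1.0 + phi * lam)
    return (math.lgamma(k + n) - math.lgamma(n) - math.lgamma(k + 1)
            + n * math.log(p) + k * math.log(1.0 - p))

# Arbitrary overdispersed counts: variance far exceeds the mean of 8
y = [0, 0, 1, 2, 15, 30]
mu = sum(y) / len(y)
pois_ll = sum(poisson_logpmf(k, mu) for k in y)
nb_ll = sum(negbin_logpmf(k, mu, phi=1.0) for k in y)
# nb_ll is much larger (less negative) than pois_ll for these counts
```

In practice, ϕ is estimated from the data (as GLIMMIX does) rather than fixed in advance.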

Table 5.23 Conditional fit statistics, variance component estimates, and type III tests of fixed effects under the negative binomial distribution

The “lines” option in the “lsmeans” statement is used to obtain Fisher’s least significant difference (LSD) comparisons of means for both factors and their interaction. The means and their respective standard errors, on the model scale (“Estimate” column) and on the data scale (“Mean” column), are tabulated in Table 5.24 for the genotype, in Table 5.25 for the culture medium, and in Table 5.26 for the interaction between both factors. The estimated values in the mean comparison for genotype (Table 5.24) correspond to the values of the linear predictor \( {\hat{\eta}}_i \) on the model scale, whereas the means on the data scale are \( {\hat{\lambda}}_i \) (part (a)); the comparison of means (on the model scale) is tabulated in part (b).

Table 5.24 Estimates on the model scale and means on the data scale under the negative binomial distribution

For the culture medium (Table 5.25), the estimated values in this comparison of means correspond to the values of the linear predictor \( {\hat{\eta}}_j \) (on the model scale), but, by applying the inverse link to \( {\hat{\eta}}_j, \) we obtain the values under the “Mean” column that provide the means on the data scale (part (a)). The mean comparisons on the model scale are shown in part (b).

Table 5.25 Means estimates on the model scale and data scale for the culture medium

The results indicate that culture media 2 and 3 produced a statistically similar average number of buds, higher than that obtained in culture media 1 and 4 (see Fig. 5.6).

Fig. 5.6
A bar graph with error bars of the average number of buds by culture medium: medium 2 has the highest number of buds (about 14), whereas medium 4 has the lowest (about 5).

Comparison of the average number of buds as a function of the type of culture medium (LSD, α = 0.05)

For the interaction between both factors, the average number of buds and the mean comparisons are shown in Table 5.26.

Table 5.26 Estimates on the model scale and means on the data scale for the interaction between genotype and culture medium

The values under “Estimates” (Table 5.26) correspond to those of the linear predictor \( {\hat{\eta}}_{ij} \) (model scale), but the values under “Mean” correspond to the means \( {\hat{\lambda}}_{ij} \) on the data scale.

Graphically, Fig. 5.7 shows that genotype 1 in culture medium 2 provides the highest number of buds, whereas the lowest number of buds was observed in culture medium 4. For genotype 2, the highest number of buds was observed in culture media 2 and 3. Finally, culture medium 4 is less suitable for both genotypes.

Fig. 5.7
A grouped bar graph with error bars of the average number of buds by culture medium and genotype: medium 2 has the highest number of buds for genotypes 1 and 2 (about 17 and 12.5, respectively), whereas medium 4 has the lowest (about 4.5) for both.

Effect of the cultivar × culture medium interaction on the average number of buds (LSD, α = 0.05)

5.2.6 Latin Square (LS) Design

A Latin square (LS) design is used where heterogeneity is associated with the crossing of two factors, generally both with the same number of levels. This design was originally used in agricultural experimentation, with plots placed in a square arrangement and heterogeneity expected along the rows and columns of the square; the design therefore blocks in two directions, across rows and columns. When blocking in two directions is appropriate, an LS design is a good option. Some examples are provided below to illustrate the use of this experimental design:

  • Field experiments on plots set in a square arrangement with rows and columns that contribute to the heterogeneity between plots. For example, gradients of fertility, moisture, management practices, and so on.

  • Experiments in greenhouses, rooms with a controlled environment, or growth chambers where the placement of shelves, trays, etc. with respect to walls or light sources can introduce systematic variability related to temperature, humidity, or light in different directions (e.g., left to right, back to front, or top to bottom).

  • Laboratory experiments in which there are two potential sources of variability (e.g., technicians, machines, etc.) and researchers are aware of the possible impact of variation from both sources.

For an LS layout, the number of rows (r) and columns (c) should be equal to the number of treatments (t) and the number of replicates of each treatment. The assignment of treatments is such that each treatment appears exactly once in each row and column, with each row and column containing a full set of treatments. Thus, the treatment effect estimates are independent of the differences between rows or columns, and the rows, columns, and treatments are orthogonal to each other.
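The row/column constraint is easy to state programmatically. The sketch below (illustrative Python) builds a cyclic t × t Latin square and verifies that every treatment occurs exactly once per row and per column; in a real experiment, the rows, columns, and treatment labels would then be randomized.

```python
def cyclic_latin_square(t):
    """t x t Latin square via cyclic shifts: cell (i, j) gets treatment (i + j) mod t."""
    return [[(i + j) % t for j in range(t)] for i in range(t)]

def is_latin_square(square):
    """True if every symbol appears exactly once in each row and each column."""
    t = len(square)
    symbols = set(range(t))
    columns = list(zip(*square))
    return (all(set(row) == symbols for row in square)
            and all(set(col) == symbols for col in columns))

square = cyclic_latin_square(6)   # same size as a 6 x 6 layout
```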

The analysis of variance for this experimental design, assuming that there are r rows, c columns, and t treatments, with r = c = t, contains the following sources of variability (Table 5.27).

Table 5.27 Sources of variation and degrees of freedom of a Latin square design

From the analysis of variance table, the linear model for an LS design with t treatments is as follows:

$$ {y}_{ijk}=\mu +{f}_j+{c}_k+{\tau}_i+{\varepsilon}_{ijk} $$

where yijk is the response observed for treatment i in row j and column k, μ is the overall mean, fj is the random effect of row j assuming \( {f}_j\sim N\left(0,{\sigma}_f^2\right) \), ck is the random effect of column k with \( {c}_k\sim N\left(0,{\sigma}_c^2\right) \), τi is the fixed effect of treatment i, and εijk is the random error term, distributed N(0, σ2). Note that treatment i is allocated to the jkth cell (row j and column k).

5.2.6.1 Latin Square Design with a Poisson Response

In a series of field experiments, several “inducer-attractant” strategies were tested to control insect pests in oilseed rape. In one experiment, the use of wild turnip rape (turnip rape) as an earlier flowering trap crop (TR) (the “attractor”) was tested together with the use of a repellent (an antifeedant) applied to oilseed rape in spring (S, the “inducer”). Untreated oilseed rape (U) was included as a control. The experiment was set up as a 6 × 6 Latin square with two replicates of each of the three treatments per row and column. An assessment of the number of mature pollen beetles was made on 10 plants per plot in early April, 1 day after spraying the repellent (antifeedant). The average number of adult beetles sampled on 10 plants per plot was recorded (Appendix 1: Data: Beatles). The question is: Is there evidence that the attractor or inducer works? That is, are fewer beetles present in the proposed treatments compared to the control?

The model components that define this GLMM are as described below:

$$ {\displaystyle \begin{array}{c}\mathrm{Distribution}:{y}_{ijkl}\mid {f}_j,{c}_k\sim \mathrm{Poisson}\left({\lambda}_{ijkl}\right)\\ {}{f}_j\sim N\left(0,{\sigma}_f^2\right),{c}_k\sim N\left(0,{\sigma}_c^2\right)\end{array}} $$
$$ \mathrm{Linear}\ \mathrm{predictor}:{\eta}_{ijkl}=\eta +{f}_j+{c}_k+{\tau}_i $$
$$ \mathrm{Link}\ \mathrm{function}:\log \left({\lambda}_{ijkl}\right)={\eta}_{ijkl} $$

where ηijkl is the linear predictor that relates the effect of replicate l (l = 1, 2) in row j (j = 1, 2, ⋯, 6) and column k (k = 1, 2, ⋯, 6) when treatment i (i = 1, 2, 3) is applied; η is the intercept; τi is the fixed effect of treatment i; fj is the random effect of row j; and ck is the random effect due to column k, assuming that there is no interaction between rows and columns or between treatments and rows or columns. The assumed distributions for rows and columns are \( {f}_j\sim N\left(0,{\sigma}_f^2\right) \) and \( {c}_k\sim N\left(0,{\sigma}_c^2\right) \), respectively. The model uses the linear predictor (ηijkl) to estimate the treatment means (λijkl = μijkl).

The following GLIMMIX program fits a Latin square design with a Poisson response:

proc glimmix nobound method=laplace;
 class row column treatment;
 model count = treatment / dist=Poisson link=log;
 random row column;
 lsmeans treatment / lines ilink;
run;

Part of the output is shown in Table 5.28. In the values of the fit statistics (part (a)), we observe that the value of Pearson’s chi-square divided by the degrees of freedom is less than 1 \( \left(\frac{\chi^2}{DF}=0.55\right) \), indicating that there is no overdispersion in the data and that the Poisson distribution adequately models the dataset.

The type III tests of fixed effects in part (b) indicate that there is no significant evidence of differences between the treatments (P = 0.0621).

Table 5.28 Results of the analysis of variance

Part (c) of Table 5.28 shows the estimates of treatments on the model scale (“Estimate”) and on the data scale (“Mean”) with their respective standard errors. The values 4.6191, 6.9396, and 5.1561 (under the “Mean” column) correspond to the treatment means for S, TR, and U, respectively.
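The relationship between the “Estimate” and “Mean” columns is simply the inverse link: with a log link, the data-scale mean is the exponential of the model-scale estimate, and its standard error can be approximated by the delta method. A sketch (illustrative Python; only the mean 4.6191 for treatment S comes from Table 5.28, and the standard error 0.18 is an assumed value):

```python
import math

# Model-scale estimate backed out from the reported S mean; assumed SE
eta, se_eta = math.log(4.6191), 0.18

mean = math.exp(eta)        # inverse link: data-scale mean
se_mean = mean * se_eta     # delta-method SE on the data scale
```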

5.2.6.2 Randomized Complete Block Design in a Split Plot

Sometimes the researcher is interested in testing multiple factors using different sizes of experimental units, and, in most cases, the treatment combinations cannot be completely randomized. Suppose that one wishes to test two factors, A and B, with a and b levels, respectively. The levels of the first factor (A) are randomly applied to the primary experimental units. Then, the levels of the second factor (B) are applied to the secondary units formed within each primary unit. In other words, the primary experimental unit (whole plot) receives a level of the first factor and is then divided into secondary experimental units (subplots) that receive the levels of the second factor. Since a split-plot design has two sizes of experimental units, the whole plots (primary units) and subplots (secondary units) have different experimental errors. Split-plot experiments originated in agriculture with Fisher (1925), and their importance in industrial experimentation has been widely recognized (Yates 1935).

As a simple illustration, consider a study of the effect of three pulp preparation methods (factor A) and four temperature levels (factor B) on paper tensile strength (paper quality). A batch of pulp is produced by one of the three methods and is then divided into four equal portions (samples), each of which is cooked at a specific temperature. The assignment of treatments to whole plots and subplots is shown in Table 5.29.

Table 5.29 Assigning treatments to whole plots and subplots

The standard ANOVA model for two factors in a split-plot design, in which there are three levels of factor A and four levels of factor B nested within factor A, is described below:

$$ {y}_{ij k}=\mu +{\alpha}_i+{r}_k+\alpha {(r)}_{ik}+{\beta}_j+{\left(\alpha \beta \right)}_{ij}+{\varepsilon}_{ij k} $$

where yijk is the observed response at level i (i = 1, 2, 3) of factor A and at level j (j = 1, 2, 3, 4) of factor B in block k (k = 1, 2, 3), μ is the overall mean, αi is the effect at level i of factor A, rk is the random effect of blocks assuming \( {r}_k\sim N\left(0,{\sigma}_r^2\right) \), α(r)ik is the random effect of the whole-plot error assuming \( \alpha {(r)}_{ik}\sim N\left(0,{\sigma}_{\alpha (r)}^2\right) \), βj is the effect at level j of factor B, (αβ)ij is the fixed interaction effect at level i of factor A and level j of factor B, and εijk is the normal random experimental error {εijk ~ iid N(0, σ2)}. The ANOVA table with sources of variation for this experimental design is shown in Table 5.30.

Table 5.30 Sources of variation and degrees of freedom for a randomized block design with a split-plot treatment arrangement
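The degrees-of-freedom bookkeeping in Table 5.30 follows directly from a, b, and r. The sketch below (illustrative Python) computes the standard decomposition for hypothetical sizes a = 3, b = 4, and r = 3:

```python
def split_plot_df(a, b, r):
    """Degrees of freedom for a split plot in an RCBD: a whole-plot levels,
    b subplot levels, r blocks (standard ANOVA decomposition)."""
    return {
        "blocks": r - 1,
        "A": a - 1,
        "whole-plot error (A x blocks)": (a - 1) * (r - 1),
        "B": b - 1,
        "A x B": (a - 1) * (b - 1),
        "residual": a * (b - 1) * (r - 1),
        "total": a * b * r - 1,
    }

df = split_plot_df(a=3, b=4, r=3)   # hypothetical sizes
# the component degrees of freedom sum to the total, abr - 1
```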

Example 5.1

A split-plot design in randomized complete block arrangement with a Poisson response

A split plot is probably the most common design structure in plant and soil research. Such experiments involve two or more treatment factors. Typically, large units called whole plots are grouped into blocks. The levels of the first factor are randomly assigned to whole plots. Each whole plot is divided into smaller units, called subplots (split plots). Next, the levels of the second factor are randomly assigned to units of split plots within each whole plot.

In this example, four blocks were implemented, which were divided into seven parts for the seven levels of the first factor (A1, A2, A3, A4, A5, A6, and A7), as whole plots. Then, each whole plot was divided into four units for randomly assigning the four levels of factor B, known as subplots (B1, B2, B3, and B4). Both factors were used to control the growth of a particular weed. Both factors were randomly allocated in each block, as shown below:

Block 1                                ⋯   Block 4

A1  A7  A3  A2  A5  A4  A6                 A6  A3  A7  A2  A1  A5  A4
B3  B3  B4  B1  B2  B1  B3                 B3  B3  B4  B1  B2  B1  B3
B1  B2  B3  B3  B1  B2  B2                 B1  B2  B3  B3  B1  B2  B2
B2  B4  B1  B4  B3  B3  B4                 B2  B4  B1  B4  B3  B3  B4
B4  B1  B2  B2  B4  B4  B1                 B4  B1  B2  B2  B4  B4  B1

(Each column lists the four subplots of the whole plot headed by the A level above it; blocks 2 and 3 are not shown.)

The sources of variation and degrees of freedom for this experiment are shown below in Table 5.31:

Table 5.31 Sources of variation and degrees of freedom for a randomized block design with a split-plot treatment arrangement

In this experiment, the response variable was the number of weeds in each of the plots (Appendix 1: Weed counts). The components that define this GLMM are as shown below:

$$ {\displaystyle \begin{array}{c}\mathrm{Distribution}:{y}_{ijk}\mid {r}_k,\alpha {(r)}_{ik}\sim \mathrm{Poisson}\left({\lambda}_{ijk}\right)\\ {}{r}_k\sim N\left(0,{\sigma}_r^2\right),\alpha {(r)}_{ik}\sim N\left(0,{\sigma}_{ar}^2\right)\end{array}} $$
$$ \mathrm{Linear}\ \mathrm{predictor}:{\eta}_{ij k}=\eta +{\alpha}_i+{r}_k+\alpha {(r)}_{ik}+{\beta}_j+{\left(\alpha \beta \right)}_{ij} $$
$$ \mathrm{Link}\ \mathrm{function}:\log \left({\lambda}_{ijk}\right)={\eta}_{ijk} $$

where ηijk is the linear predictor that relates the effect of factor A with i levels (i = 1, 2, ⋯, 7) and factor B with j levels (j = 1, 2, 3, 4) in block k (k = 1, 2, 3, 4); η is the intercept; αi is the fixed effect at level i of factor A; βj is the fixed effect at level j of factor B; (αβ)ij is the fixed effect of the interaction between level i of factor A and level j of factor B; rk is the random effect due to blocks; and α(r)ik is the random error effect of the whole plot, assuming \( {r}_k\sim N\left(0,{\sigma}_r^2\right) \) and \( \alpha {(r)}_{ik}\sim N\left(0,{\sigma}_{AR}^2\right) \), respectively. The model uses the aforementioned linear predictor (ηijk) to estimate the treatment means (λijk = μijk).

The following GLIMMIX program fits a split-plot block design with a Poisson response variable:

proc glimmix method=laplace;
 class block a b;
 model count = a|b / dist=Poisson link=log;
 random block block*a;
 lsmeans a|b / lines ilink;
run;

Part of the output is shown below.

Table 5.32 Results of the analysis of variance

As in the previous examples, the Poisson model was found to be inadequate because the value of Pearson’s chi-squared statistic divided by the degrees of freedom is greater than 1 \( \left(\frac{\chi^2}{df}=4.50\right) \). This indicates that we have probably misspecified either the conditional distribution of y ∣ b or the linear predictor; in this case, there is evidence that we need to consider another distribution for this dataset (part (a), Table 5.32). In addition, in part (b), the variance component estimates due to blocks and blocks × A are tabulated \( \left({\hat{\sigma}}_r^2=0.01526;{\hat{\sigma}}_{ra}^2=0.2454\right) \). On the other hand, the type III tests of fixed effects (part (c)) show a significant effect of factor B and of the interaction between both factors.

An alternative for reducing the overdispersion is to keep the same linear predictor but replace the Poisson distribution of the response variable with the negative binomial distribution, that is:

$$ {\displaystyle \begin{array}{c}\mathrm{Distribution}:{y}_{ijk}\mid {r}_k,\alpha {(r)}_{ik}\sim \mathrm{Negative}\ \mathrm{binomial}\left({\lambda}_{ijk},\phi \right)\\ {}{r}_k\sim iid\ N\left(0,{\sigma}_r^2\right),\alpha {(r)}_{ik}\sim iid\ N\left(0,{\sigma}_{AR}^2\right)\end{array}} $$
$$ \mathrm{Linear}\ \mathrm{predictor}:{\eta}_{ij k}=\eta +{\alpha}_i+{r}_k+\alpha {(r)}_{ik}+{\beta}_j+{\left(\alpha \beta \right)}_{ij} $$
$$ \mathrm{Link}\ \mathrm{function}:\log \left({\lambda}_{ijk}\right)={\eta}_{ijk} $$

The following syntax fits a GLMM under a negative binomial distribution.

proc glimmix method=laplace;
 class block a b;
 model count = a|b / dist=NegBin link=log;
 random intercept a / subject=block;
 lsmeans a|b / lines ilink;
run;

Part of the output is shown in Table 5.33. The results tabulated in part (a) indicate that the overdispersion has been removed \( \left(\frac{\chi^2}{df}=0.71\right) \). The variance component estimates, tabulated in part (b), are \( {\sigma}_r^2=0.0024 \) and \( {\sigma}_{AR}^2=0.1222 \) for blocks and blocks × A, respectively, and the estimated scale parameter is \( \hat{\phi}=0.3458 \). Note that the results under the negative binomial distribution differ from those obtained under the Poisson distribution, which is due, of course, to the fact that the negative binomial distribution better captures the overdispersion. The F-test for the fixed effect of factor A is significant at the 5% level (part (c)), whereas factor B and the interaction do not significantly influence the response variable.

Table 5.33 Results of the analysis of variance

Example 5.2

A split-split plot in time in a randomized complete block design with a Poisson response.

The propagation of coffee seedlings through grafting in nurseries depends on several factors such as the type of substrate, the rootstock of the plant that will host the graft, type of graft, light intensity, type and size of the container, humidity, temperature, and so forth. The objective of this experiment was to evaluate the effect of shade cloth (light intensity), type of container, and clone on the number of leaves produced by the Coffea canephora P. clones grafted with the Coffea arabica L. variety Oro azteca.

The factors studied were the color of the shade cloth (black, pearl, and red), the container type (tray, 0.5-kg tube, and 1-kg tube), and five coffee clones of the variety Coffea canephora P. plus an own-rooted “franc foot” plant (Pf; Coffea arabica L. var. Oro azteca) (Appendix 1: Coffee data). The clones used in the experiment are listed below (Table 5.34). Different physiological parameters were evaluated over a period of 11 months.

Table 5.34 Clones of Coffea canephora P

This work was implemented in four randomized complete blocks. The following table exemplifies how a block was constructed.

Shade cloth red                 Shade cloth pearl               Shade cloth black
Tray  0.5 kg  1 kg              Tray  0.5 kg  1 kg              Tray  0.5 kg  1 kg
C2    C5      C4                C4    C5      C2                C5    C2      C4
C4    Pf      C3                C3    Pf      C4                Pf    C3      C2
C3    C1      C5                C5    C1      C3                C1    C5      C3
C5    C2      Pf                Pf    C2      C5                C2    Pf      C5
Pf    C4      C1                C1    C4      Pf                C4    C1      Pf
C1    C3      C2                C2    C3      C1                C3    C4      C1

The statistical model describing a split-split plot in time design is described below:

$$ {\displaystyle \begin{array}{c}{y}_{ijklm}=\mu +{\alpha}_i+{r}_m+{(ar)}_{im}+{\beta}_j+{\left(\alpha \beta \right)}_{ij}+{\gamma}_k+{\left(\alpha \gamma \right)}_{ik}+{\left(\beta \gamma \right)}_{jk}+{\left(\alpha \beta \gamma \right)}_{ijk}\\ {}+{\left( rab\gamma \right)}_{ijkm}+{\tau}_l+{\left(\alpha \tau \right)}_{il}+{\left(\beta \tau \right)}_{jl}+{\left(\alpha \beta \tau \right)}_{ijl}+{\left(\gamma \tau \right)}_{kl}+{\left(\alpha \gamma \tau \right)}_{ikl}\\ {}+{\left(\beta \gamma \tau \right)}_{jkl}+{\left(\alpha \beta \gamma \tau \right)}_{ijkl}+{\varepsilon}_{ijklm}\end{array}} $$
$$ i=1,2,3;j=1,2,3,4,5;k=1,2,3;l=1,\cdots, 11;m=1,2,3,4 $$

where yijklm is the response variable in block m, shade cloth i, clone j, and tray k at time l; μ is the overall mean; αi is the fixed effect of the type of shade cloth; βj, γk, and τl are the fixed effects of clone type, tray, and sampling time, respectively; (αβ)ij, (αγ)ik, (βγ)jk, (ατ)il, (βτ)jl, and (γτ)kl are the two-way interaction effects among shade cloth, clone, tray, and sampling time; (αβγ)ijk, (αβτ)ijl, (αγτ)ikl, (βγτ)jkl, and (αβγτ)ijkl are the three- and four-way interaction effects of the factors under study; rm, (ar)im, and (rabγ)ijkm are the random effects due to blocks, block × shade cloth (the whole-plot error), and block × shade cloth × clone × tray, assuming \( {r}_m\sim N\left(0,{\sigma}_r^2\right) \), \( {(ar)}_{im}\sim N\left(0,{\sigma}_{r\alpha}^2\right) \), and \( {\left( rab\gamma \right)}_{ijkm}\sim N\left(0,{\sigma}_{\alpha \beta \gamma \left(\mathrm{rep}\right)}^2\right) \); and εijklm is the random error {εijklm~N(0, σ2)}.

The following SAS program fits a GLMM in a split-split plot in time under a randomized complete block design with a Poisson response.

proc glimmix data=work.Nhojas_cafe nobound method=laplace;
 class shade clone tray rep time;
 model y = shade|clone|tray|time / dist=Poisson link=log;
 random intercept shade shade*clone*tray / subject=rep type=ar(1);
 lsmeans shade|clone|tray|time / lines ilink;
run;

Some of the results are listed below. To determine which correlation structure best fits this experimental design, five variance–covariance structures were tested (Table 5.35): compound symmetry (“CS”), first-order autoregressive (“AR(1)”), unstructured (“UN”), Toeplitz (“TOEP(1)”), and first-order ante-dependence (“ANTE(1)”). The structure to be tested is specified with the “type” option of the “random” statement; this is where the variance–covariance structure must be changed. The fit statistics indicate that the variance–covariance structure that best fits the model is the first-order autoregressive structure, AR(1), as shown by the goodness-of-fit statistics reported in Table 5.35.
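The AR(1) structure selected here assumes that the correlation between repeated measurements decays geometrically with their separation in time, corr(yt, yt+h) = ρ^h. A minimal sketch (illustrative Python; ρ = 0.5 and four time points are arbitrary values):

```python
def ar1_corr(n_times, rho):
    """n x n AR(1) correlation matrix: entry (i, j) is rho**|i - j|,
    so adjacent time points are most correlated and distant ones least."""
    return [[rho ** abs(i - j) for j in range(n_times)]
            for i in range(n_times)]

R = ar1_corr(4, 0.5)   # arbitrary illustrative values
```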

Table 5.35 Fit statistics for choosing the correlation structure

Table 5.36 shows the conditional fit statistics and variance component estimates. The fit statistic Pearson’s chi-square/DF = 0.57 in part (a) indicates that, in the conditional model, there is no evidence of misspecification of the distribution or the linear predictor. In other words, there is no overdispersion in the dataset, and it is therefore reasonable to base the analysis and inference on the Poisson model.
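The overdispersion diagnostic reported by GLIMMIX can be reproduced by hand: the generalized Pearson chi-square divided by its degrees of freedom should be close to 1 under a correctly specified Poisson model. A sketch with toy counts and fitted means (not the coffee-graft data):

```python
def pearson_chisq_ratio(y, mu, n_params):
    """Generalized Pearson chi-square divided by its degrees of freedom.

    For a Poisson model, Var(y) = mu, so each squared residual is scaled
    by the fitted mean.  A ratio well above 1 suggests overdispersion;
    a ratio near (or below) 1 supports the Poisson assumption.
    """
    chisq = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    df = len(y) - n_params
    return chisq / df

# Toy counts and fitted Poisson means, for illustration only:
y = [3, 5, 2, 7, 4, 6]
mu = [4.0, 4.5, 3.0, 6.0, 4.5, 5.0]
ratio = pearson_chisq_ratio(y, mu, n_params=2)
```

A value such as the 0.57 in Table 5.36 would, by this criterion, give no reason to abandon the Poisson model.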

Table 5.36 Conditional fit statistics and variance component estimates

The type III tests of fixed effects (Table 5.37) indicate highly significant main effects of shade cloth type (P = 0.0001), clone (P = 0.0001), and tray (P = 0.0001), as well as of most of the interactions, except for shade_cloth*clone (P = 0.3846), shade_cloth*tray*time (P = 0.9289), clone*tray*time (P = 0.9760), and shade_cloth*clone*tray*time (P = 0.2484).

Table 5.37 Type III fixed effects tests

The means and standard errors of each of the main effects, on the data scale, for shade_cloth, tray, and clone are shown in the “Mean” column in part (a) of Table 5.38, whereas in part (b), the mean comparisons for the type of shade cloth are shown.
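The data-scale means in the “Mean” column are obtained from the model-scale estimates by applying the inverse link, exp(η) for the log link (the `ilink` option in the `lsmeans` statement), with standard errors carried over by the delta method. A sketch with placeholder numbers, not the estimates in Table 5.38:

```python
import math

# Model-scale estimate (log link) and its standard error.
# Placeholder values for illustration, not Table 5.38.
eta, se_eta = 1.85, 0.07

# Data-scale mean: inverse link exp(eta).
mean = math.exp(eta)

# Delta method: SE(exp(eta)) is approximately exp(eta) * SE(eta).
se_mean = mean * se_eta
```

This is exactly the relationship between the “Estimate” and “Mean” columns (and their standard errors) in the lsmeans output.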

Table 5.38 Estimated means on the model scale and on the data scale for the shade cloth

Table 5.39 presents the estimates of the linear predictor (“Estimates” column) in terms of the model scale and treatment means in terms of the data scale (“Mean” column) for the type of clone (part (a)). In addition, in Table 5.39 (part (b)), the mean comparisons are presented for the type of clone.

Table 5.39 Estimated means on the model scale and on the data scale for the type of clone

Table 5.40 presents the estimates for the levels of the tray on both scales (part (a)). Similarly, in this table (part (b)), the treatment mean comparisons are presented for the levels of the tray.

Table 5.40 Estimated means on the model scale and on the data scale for the tray factor

Tables 5.41, 5.42, 5.43, and 5.44 show the means and standard errors on both scales of the two-factor and three-factor interactions.

Interaction type of shade cloth*clone

Table 5.41 Estimated means on the model scale and on the data scale for the type of shade cloth*clone

Interaction type of shade cloth*tray

Table 5.42 Estimated means on the model scale and on the data scale for the interaction type of shade cloth*tray

Interaction clone*tray

Table 5.43 Estimated means on the model scale and on the data scale for the clone–tray interaction

Interaction shade*clone*tray

Table 5.44 Estimated means on the model scale and on the data scale for the shade–clone–tray interaction

Although it is not the objective of this book, part of the results is discussed below. Figure 5.8 shows that the red shade cloth significantly stimulates leaf production in coffee grafts, followed by the black and pearl shade cloths. Leaf production in coffee grafts shows a bimodal pattern that may be due to factors such as humidity and temperature; extreme conditions of both factors cause stress at the growing points and, therefore, affect the emergence of leaves.

Fig. 5.8
Effect of shade cloth type on the average number of leaves. The highest averages occur at month 6 for the red cloth (about 8 leaves) and at month 5 for the black and pearl cloths (7.3 and 7.5, respectively); the lowest averages occur at month 2 (about 1 for black and pearl and 1.3 for red).

Regarding the type of clone used as rootstock, the clones showed their best average leaf production in months 5 and 6, whereas the lowest production was observed in months 1, 2, 8, and 9. The franc foot (own-rooted plant) showed a higher average number of leaves compared to the rest of the clones (Fig. 5.9).

Fig. 5.9
Effect of clone type on the average number of leaves. The highest averages occur at month 11 for the franc foot (about 9 leaves), at month 6 for clones 2 and 5 (6 and 8.1, respectively), and at month 5 for clones 1, 3, and 4 (about 8); the lowest averages occur at month 2 for all clones.

5.3 Exercises

Exercise 5.3.1

A researcher in the area of plant sciences wants to know how the number of shoots produced by an explant (yij) in an in vitro plant culture responds to different concentrations (ppm) of a chemical compound. The data for this experiment are given below (Table 5.45):

Table 5.45 In vitro culture (Conc = concentration in ppm)
  (a) Write down the analysis of variance table (sources of variation and degrees of freedom).

  (b) Write down the components of the GLMM.

  (c) Analyze the dataset with the model proposed in (b).

  (d) Compare and contrast the results of these analyses. If necessary, reanalyze the dataset using the same model as above, but now assume that the data have a negative binomial distribution.

  (e) Summarize the relevant results.

Exercise 5.3.2

Earthworms (Lumbricus terrestris L.) were counted in four replicates of a factorial experiment at the W.K. Kellogg Biological Station in Battle Creek, Michigan, in 1995. A 2⁴ factorial experiment was conducted. The factors and treatment levels were plowing (chiseled and unplowed), input level (conventional and low), manure application (yes/no), and crop (corn and soybean). The question of interest was whether L. terrestris density varies according to these management protocols and how the various factors act and interact. The table shows the total worm counts (per square foot, juvenile and adult worms, not pooled) for the 64 experimental units (2⁴ × 4) of the factorial design. The numbers in each cell of the table correspond to the counts in the replicates (Table 5.46).
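The treatment structure of a 2⁴ factorial can be enumerated directly; the sketch below (factor names paraphrased from the text) confirms the 16 treatment combinations and 64 experimental units:

```python
from itertools import product

# The four two-level management factors of the earthworm experiment.
factors = {
    "plowing": ["chiseled", "unplowed"],
    "input_level": ["conventional", "low"],
    "manure": ["yes", "no"],
    "crop": ["corn", "soybean"],
}

# All treatment combinations of the 2^4 factorial: 2*2*2*2 = 16.
treatments = list(product(*factors.values()))

# With 4 replicates per combination, the experiment has 64 units.
n_units = len(treatments) * 4
```

Listing the combinations this way is also a convenient starting point for building the design matrix or the CLASS-variable layout of the analysis.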

Table 5.46 Results of the experiment with earthworms
  (a) Write down the analysis of variance table (sources of variation and degrees of freedom).

  (b) Write down the components of the GLMM.

  (c) Analyze the dataset with the model proposed in (b).

  (d) Summarize the relevant results.

Exercise 5.3.3

This experiment involves an investigation of genotypic variation within cultivars of leek (Allium porrum L.) with respect to adventitious shoot formation in callus tissue. The data in Table 5.47 refer to 20 genotypes of 1 cultivar. Each genotype is represented by six calluses, and the observations are the number of shoots per callus. The data are subject to two sources of variation, i.e., variation between genotypes and variation between calluses within genotypes.

Table 5.47 Results of the callus tissue experiment
  (a) Write down the analysis of variance table (sources of variation and degrees of freedom).

  (b) Write down the components of the GLMM.

  (c) Analyze the dataset with the model proposed in (b).

  (d) Reanalyze the dataset using the same model as above, but now assume that the data have a negative binomial distribution.

  (e) Compare and contrast the results of these analyses.

  (f) Summarize the relevant results.

Exercise 5.3.4

In an experiment at the Research Institute for Animal Production “Schoonoord” in the Netherlands, the effects of active immunization against androstenedione on the fertility of Texel ewes were studied (Engel and te Brake 1993). The number of fetuses per ewe can be considered the net result of a process that determines the number of ovulations and a probability process by which these ovulations produce fetuses. The goals of this study are to model and analyze (a) the number of ovulations and the number of fetuses in relation to Fecundin (androstenedione-7a-carboxyethylthioether) treatment, animal age, and mating period, and (b) the number of fetuses in relation to treatment, animal age, and the number of ovulations observed. A summary of the experiment and of the data is shown below (Table 5.48).

Table 5.48 Factors T (treatment: 1: Fecundin; 2: control), A (age class: 1: A ≤ 0.5; 2: 0.5 < A ≤ 1.5; 3: 1.5 < A ≤ 2.5; 4: A ≥ 2.5 years), M (mating period: 1: October 1; 2: October 22), n (number of ovulations), and x (number of fetuses)

Of the 125 Texel ewes, 63 were treated with Fecundin, whereas the remaining 62 served as a control group. The ewes were sorted into four age classes (≤0.5, 0.5–1.5, 1.5–2.5, and >2.5 years) and two mating periods (starting on October 1 and October 22, 1986, respectively). Age was entered as a factor rather than as a covariate because the interactions with age are of interest and a factor is easier to handle. The number of animals in the four age classes was 25, 44, 24, and 32, respectively. Age class was evenly distributed across the combinations of mating period and treatment group. Ewes were slaughtered 75–80 days after the last mating, and the number of ovulations and the number of fetuses were determined. Ovulation numbers ranged from 1 to 5. For six animals, the number of ovulations was not known, so these ewes were excluded from the dataset.

  (a) Analyze the dataset using a GLMM with the linear predictor ηijkl = η + τi + αj + βk + (τα)ij + (τβ)ik + (ταβ)ijk + bl, where τ, α, and β are the fixed effects of treatment, age, and mating period, respectively, and bl is the random effect due to animal, assuming that each bl has a normal distribution with a zero mean and variance \( {\sigma}_b^2 \) and that the number of ovulations and the number of fetuses each follow a Poisson distribution.

  (b) From the analyses performed, do you observe the presence of overdispersion in the dataset? If so, propose an alternative distribution for the analysis of this dataset.

  (c) Reanalyze the dataset using the same model as before with the new data distribution.

  (d) Compare and contrast the results of these analyses.

  (e) Summarize the relevant results.
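Under the log link implicit in part (a), the linear predictor maps to the conditional Poisson mean as λ = exp(η + b). A sketch with hypothetical effect values (none of these numbers come from the ewe data):

```python
import math

# Hypothetical sum of fixed effects for one treatment-age-period cell:
# eta + tau_i + alpha_j + beta_k + the interaction terms.
eta_fixed = 0.9

# Hypothetical realized random animal effect, b_l ~ N(0, sigma_b^2).
b_l = 0.15

# Conditional mean number of ovulations for this animal under the log link.
lam = math.exp(eta_fixed + b_l)
```

The same mapping applies to the fetus counts, with its own set of effect estimates.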

Exercise 5.3.5

The following example deals with one of the most harmful insects of the root system of major crops, commonly known as the “blind hen” (white grub). The experiment consisted of six treatments formulated for larval control (A, B, C, D, E, and F) in a randomized block arrangement. The count per area gives the number of larvae in two age groups (a and b) (Table 5.49).

Table 5.49 Results of the blind hen experiment
  (a) Write down the analysis of variance table (sources of variation and degrees of freedom).

  (b) Write down the components of the GLMM.

  (c) Analyze the dataset with the model proposed in (b).

  (d) Does the model proposed in (b) adequately describe the variation observed in the dataset? Summarize the relevant results.