1 Introduction

Over the last few decades, countries have made efforts to reduce the discharge of traditional contaminants (metals, nutrients, and persistent organic pollutants, also known as “forever chemicals”) into aquatic environments. Some developed countries have reported significant decreases in contaminants such as the nutrients nitrogen and phosphorus [1, 2], metals and organometallic biocides like tributyltin (TBT) [3] and mercury [4], and organic compounds like the pesticide DDT (1,1,1-trichloro-2,2-bis(p-chlorophenyl)ethane, or dichlorodiphenyltrichloroethane) [5, 6].

Besides traditional contaminants, compounds of emerging concern (CECs) also enter the aquatic environment [7]. CECs include pharmaceuticals and personal care products (PPCPs), microfibers, polar pesticides, micro- and nano-plastics, industrial chemicals such as bisphenol A (BPA), food additives, and lifestyle products like caffeine. The frequent detection of CECs in aquatic environments (wastewater, stormwater, surface water, and sediment) poses a threat to human health and ecosystems [8,9,10]. Despite the strong link between human health and ocean health, little is known about how the two interact [11, 12].

In the last few decades, increasing investment in the health care sector, a growing world population, aging societies, growing market availability, and the use of veterinary pharmaceuticals in intensive animal farming and industrially processed food and feed have resulted in significant usage of PPCPs [13]. Advances in technology for detecting some PPCPs at concentrations of micrograms per liter, or even picograms per liter, in aquatic and terrestrial environments explain the increase in scientific publications on the occurrence of PPCPs [14].

There are over 4000 pharmaceuticals primarily used for preventing or treating human and animal disease, for veterinary health care, or for promoting the growth of livestock on animal farms [15]. PPCPs are released into the environment from three main types of sources: point, diffuse, and non-point. Point sources include the pharmaceutical industry, where PPCPs are released during the manufacture of drugs. Unfortunately, improperly managed PPCPs from businesses, medical facilities, and homes are also released into the environment without adequate treatment [16]. Wastewater treatment plants (WWTPs) are the main sources of discharged wastewater and sewage sludge.

Treated, partially treated, or untreated effluents from WWTPs are widely applied for irrigation [17], and sewage sludge is used for fertilization of agricultural land [18, 19]. Unfortunately, chemical standards to assess environmental risk do not cover effluents used for irrigation. Additionally, using reclaimed water (clean, transparent, and odorless treated wastewater) for irrigation on golf courses and other recreational areas can be an additional source of pharmaceutical contaminants [20]. PPCPs in reclaimed water can be absorbed by plant roots and eventually travel up the food chain [21]. Fertilizing soil with natural fertilizers made from animal excreta, or with organic fertilizers like manure and slurry, is another significant and widespread source of PPCPs [22]. Finally, a non-point source is created when snow or rain travels naturally across the landscape, picks up contaminants such as fertilizers and bacteria from pet waste, and deposits them into lakes, rivers, coastal waterways, and groundwater.

In recent years, there have been extensive publications on the occurrence of PPCPs in surface waters, seawaters, and groundwaters [15, 23,24,25,26,27]. The presence of PPCPs in aquatic habitats, their bioaccumulation in marine organisms, and the development of antibiotic-resistant microbes pose potential dangers to human and animal health. Because WWTPs are the most significant source of PPCPs in water, the study of various removal treatment processes and PPCP effluent quality has gained importance in many countries [28, 29].

While the effluent of municipal sewage treatment plants is often investigated as a source of organic wastewater compounds in the environment, the solid end-product of wastewater treatment (i.e., digested municipal sludge or biosolids) has only begun to receive attention [30]. It is therefore essential to gain additional knowledge of the chemical composition of biosolids, because digested sewage sludge is applied as fertilizer in agriculture, forestry, and landscaping. In future studies, we recommend that scientists acquire such information while collecting data from the WWTP.

Because WWTPs are the most significant source of PPCPs in water, the effluent quality of various removal treatment procedures must be studied using statistically valid methods. The most widely used statistical procedures in this field are ANOVA and paired or classical t-tests. These methods have significant limitations. The t-test is appropriate when only one treatment method, such as final effluent, is tested and the observations are independent. The test is invalid if the data collection does not ensure the independence of the observations [29]. For example, if the WWTP discharges water into a lake and data are collected from multiple sites, the data are likely correlated, and we should not rely on the t-test result. Furthermore, if the study investigates many treatment procedures, as in our case, the independent t-test is invalid. Some studies use a paired t-test based on the percentage of PPCPs removed. This approach is reliable and widely used. The paired t-test, however, requires an equal number of points from each treatment group. When one of the samples in a pair is missing, the entire pair is removed from the data, reducing the number of observations and, as a result, the power of the test if the sample size is small. Consequently, there is a significant possibility that the test will miss the presence of PPCPs in the effluents.

The ANOVA test, which is also often employed, assumes that all observations are independent of one another, and that each factor’s variance is the same [31]. If, for instance, the WWTP data collection is carried out as in Fig. 1, at least one assumption is violated.

The repeated measures ANOVA works best with samples collected over time. However, if the assumption that the variances of the differences between all group pairs are identical (i.e., sphericity) is not fulfilled, the likelihood of a Type I error can increase. If the data have a moderate sample size, we can apply Mauchly’s test to assess sphericity [32]. In small samples, the test may fail to detect violations of sphericity, and in large samples it may flag trivial departures. If the sphericity assumption is not satisfied, we can modify the degrees of freedom of the F-tests using either the Greenhouse-Geisser or the Huynh-Feldt correction [33, 34].

This paper proposes a linear mixed effects model (LMEM) approach that overcomes these challenges by incorporating fixed and random effects and allowing data dependency via a variance-covariance matrix. The LMEM makes it possible to analyze interactions between treatments and seasons, which have never before been addressed in this discipline in a mixed effects context. The LMEM also has the property that if the data meet the assumptions of a widely used statistical procedure, such as the t-test, then the LMEM’s results agree with those of that procedure.

2 Methodology

The LMEM allows us to investigate the fixed and random factors of interest while simultaneously accounting for variability within and across participants and items. Whereas fixed effects account for population-level effects (i.e., average trends), random effects account for trends across levels of a grouping factor (e.g., participants or items). That is, mixed effects models capture the behavior of particular participants or items that deviate from the overall trend.

When observations are nested within participants, repeated measures ANOVA is preferable to standard ANOVAs and multiple regression. The repeated measures ANOVA, however, models either person-level or item-level variability, but not both simultaneously. Because observations within a condition are collapsed across either participants or items in repeated measures ANOVA, important information about variability among participants or items is lost, reducing statistical power so that the analysis may fail to detect an effect if one exists [35].

The LMEM also handles missing data and unbalanced designs better than ANOVA. In the LMEM approach, each observation reflects one of several responses within an individual, whereas in ANOVA all responses within a participant form part of the same observation. In other words, if one response is missing, the entire observation is removed and none of the data from that individual (or item) are included in the ANOVA; the LMEM, by contrast, retains the remaining responses, so individuals (or items) with more missing cases have less distorting effect on the parameter estimates [36].

The complexity of the error term is the main difference between analyses of independent data (e.g., between-subjects analyses) and dependent data (e.g., within-subjects analyses). When the data are independent, the error term is quite straightforward because there is only one source of random error. When the data are dependent, however, the error term has multiple components: (1) differences between subjects, known as the (by-subject) random intercept; (2) differences between subjects in how they are affected by predictor variables, known as (by-subject) random slopes; and (3) random error, which captures all other sources of error, such as unreliable measurement. These random error terms are incorporated into the LMEM.

Finally, it is worth noting that ANOVA, t-tests, and multiple regression are all special cases of the general linear model (GLM), which is itself a special case of the LMEM. That is, the GLM is the LMEM without the random effects.

2.1 The LMEM model

The LMEM method is best suited for analyzing our WWTP data, since observations were collected monthly for each treatment process (multilevel/hierarchical, longitudinal), implying a correlation structure. Also, some groups of compounds had more missing observations than others. Furthermore, we had both fixed and random factors to consider when developing the model.

In RStudio, we used the lme function from the nlme package, which fits linear mixed-effects models as described by Laird and Ware [37]. The “correlation” option describes the within-group correlation structure. We explored numerous autocorrelation structures. Using the likelihood ratio test and the Akaike information criterion (AIC), we concluded that the autoregressive lag 1, AR(1), model was the best correlation structure for our data set. For fitting models, lme offers maximum likelihood (ML) and restricted maximum likelihood (REML) estimation. We used the ML method.
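For intuition, the AR(1) structure implies that the correlation between residuals decays geometrically with the number of months separating them. The following is a minimal illustrative sketch of the implied correlation matrix (not the nlme implementation, and the value of rho is made up):

```python
import numpy as np

def ar1_corr(n, rho):
    """n x n AR(1) correlation matrix: corr(e_i, e_j) = rho ** |i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

R = ar1_corr(4, 0.5)
# Adjacent months correlate at rho, months two apart at rho**2, and so on.
```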

We fit the model

$$\begin{aligned} \textbf{y}=\textbf{X}{{\varvec{\beta }}}+\textbf{Z}\textbf{u}+{{\varvec{\epsilon }}} \end{aligned}$$
(1)

where \(\textbf{y}\) is a \(N\times 1\) column vector of the outcome variable, \(\textbf{X}\) is a \(N\times p\) matrix of the p predictor variables and their interactions, \({{\varvec{\beta }}}\) is a \(p\times 1\) column vector of the fixed-effects regression coefficients, \(\textbf{Z}\) is a \(N\times (qk)\) matrix for the q random effects and k groups, \(\textbf{u}\) is a \((qk)\times 1\) column vector of the q random effects for the k groups, and \({{\varvec{\epsilon }}}\) is a \(N\times 1\) column vector of random errors, i.e., the part of \(\textbf{y}\) not explained by the model \(\textbf{X}{{\varvec{\beta }}}+\textbf{Z}\textbf{u}\). In our data, the response variable \(\textbf{y}\) is the concentration level, \(\textbf{X}\) denotes the design matrix for the study factors of group, treatment, and season and their two- and three-way interactions, \(\textbf{Z}\) encodes a random individual-specific effect of month, and \({{\varvec{\epsilon }}}\) is the within-individual measurement error.
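To make the dimensions in Eq. (1) concrete, the following sketch builds a toy response from the model's components, with a random intercept per month as the single random effect (q = 1, k = 12 months). All sizes and values are illustrative and are not taken from the WWTP data:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, q, k = 24, 3, 1, 12                 # observations, fixed effects, random effects, groups (months)

X = np.column_stack([np.ones(N), rng.normal(size=(N, p - 1))])  # N x p fixed-effects design
beta = np.array([1.0, 0.5, -0.3])                               # p x 1 fixed effects

month = np.repeat(np.arange(k), N // k)   # grouping factor: two observations per month
Z = np.zeros((N, q * k))                  # N x (qk) random-effects design
Z[np.arange(N), month] = 1.0              # random intercept indicator for each month
u = rng.normal(scale=0.4, size=q * k)     # (qk) x 1 random effects
eps = rng.normal(scale=0.1, size=N)       # N x 1 within-individual error

y = X @ beta + Z @ u + eps                # Eq. (1): y = X*beta + Z*u + eps
```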

3 Results and discussion

3.1 Sample data

California generates over 4 billion gallons of wastewater daily, which is managed by more than 900 WWTPs. A dataset from one of these WWTPs in California was examined in this study. Phonsiri (2018) (see Sect. 3.3 in [38]) collected data once a month on the 17th or 18th of each month from April 2017 to March 2018. The flowchart of the treatment processes with sample data locations is shown in Fig. 1.

Preliminary, primary, and secondary treatment procedures are used to remove PPCPs in the WWTP. Large objects such as branches and other items that could clog or harm the treatment plant’s machinery are screened out during the initial stage of the treatment process, known as preliminary treatment or influent screening. Following this screening, wastewater enters a grit chamber where gravity causes small particles, such as sand, eggshells, and coins, to settle to the bottom (treatment Influent, I). After the grit chamber, the wastewater is ready for the primary treatment procedures (treatment Primary Effluent, P.E). To facilitate the physical process in which heavy particles sink and light particles float, the wastewater is fed into a primary settling tank, or clarifier. After this procedure, the wastewater is ready for biological, or secondary, treatment. Both the Activated Sludge (treatment A) and the secondary Trickling Filter Effluent (treatment T) processes use microorganisms that feed on organic materials to remove pollutants. A secondary sedimentation tank receives the wastewater from treatment T after the organisms have absorbed and digested the organic content. A mixture of 20% from treatment T and 80% from treatment A, known as the Final Effluent (treatment F.E), is the last phase and is used as reclaimed water.

Fig. 1
figure 1

The treatment processes: I (influent), PE (primary effluent), T (trickling filter effluent), A (activated sludge effluent), FE (final effluent), and SP (sampling points)

We studied five treatments (I, P.E, T, A, and F.E). We regarded F.E as a treatment, even though it is only a combination of two existing treatments (i.e., not a chemical process), so that it could be compared with the baseline treatment I. Also, to explore seasonal fluctuations and the effect of seasons on the treatment processes, we assigned each data point to the relevant season.

Our main objective is to investigate similar chemical compounds together rather than individually. Thus, the 19 PPCPs are divided into three groups: (1) hormones, (2) analgesic and anti-inflammatory medicines, and (3) antibiotics. Table 1 displays the compound names and their groupings.

Table 1 The groups of PPCPs and compound names

3.2 Handling missing values

Not applicable (NA) and Non-detect (ND) values are two types of missing values found in environmental data collection. For statistical analysis to be successful, it is critical to handle missing values correctly.

Table 2 The method detection limit (MDL) for compounds

We omitted the NA missing values from the data, as we had no information on why they were missing. We knew, however, that the ND missing values were caused by the method detection limit (MDL). Regardless of technological advancements, the instrument’s detection limit, the MDL, may prevent reliable readings of contaminants below that threshold. The MDL is defined as the amount of a substance at which there is a 95% chance that the analyte concentration is greater than zero.

Maximum likelihood estimation, extrapolation, and simple replacement are the common methods for imputing ND values. The most popular technique is “simple replacement,” in which missing values are replaced by zero, the MDL, or a fraction of the MDL. The most frequently used fractions are 1/2 and \(1/\sqrt{2}\). According to Zhang et al. [39], confidence intervals based on the 1/2 MDL fraction lead to the same conclusion as the maximum likelihood method. Also, using zero tends to bias estimates downward, while using the MDL tends to bias estimates upward. We therefore replaced the ND values with \(\text {MDL}/2\). Table 2 displays the MDL values from Phonsiri (2018).
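The imputation rule can be sketched as follows; this is a hypothetical helper for illustration, not the code used in the study, with NA values omitted and ND values replaced by MDL/2:

```python
def impute_nd(values, mdl):
    """Drop NA (None) values and replace non-detects ("ND") with MDL/2."""
    out = []
    for v in values:
        if v is None:            # NA: no information about the cause, so omit
            continue
        if v == "ND":            # non-detect: below the method detection limit
            out.append(mdl / 2)
        else:
            out.append(v)
    return out

impute_nd([12.0, "ND", None, 3.5], mdl=1.0)  # -> [12.0, 0.5, 3.5]
```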

3.3 Descriptive statistics

We analyzed the data in RStudio [40], an integrated development environment for the free software R.

The literature in this field typically reports summary data along with confidence intervals, as in Tables 3 and 4, or displays boxplots, as in Figs. 2 and 3. Because dependence between observations and two-way or three-way interactions were not investigated at this point, one should take precautions when making inferences from these confidence intervals.

Table 3 The descriptive statistics of the removal of compounds for groups

We used the summary statistics table only to decide whether to compute z-scores, or standardized values. Table 3 shows that the concentration levels (measured in ng/L) for the distinct groups have a wide range and spread. As a result, we standardized the data so that all concentration levels fall within the same range. Standardization allows the comparison of different measures and prevents bias in the statistical analysis. We calculated z-scores for each compound value by subtracting its mean and dividing by its standard deviation. The new measurements have a mean of 0 and a standard deviation of 1.
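The standardization step amounts to the following sketch (illustrative values, using the sample standard deviation):

```python
import statistics

def z_scores(xs):
    """Standardize each value: subtract the mean, divide by the standard deviation."""
    m = statistics.mean(xs)
    s = statistics.stdev(xs)
    return [(x - m) / s for x in xs]

z = z_scores([10.0, 20.0, 30.0, 40.0])
# The standardized values have mean 0 and standard deviation 1.
```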

Figure 2 shows the distributions of chemical removal for the groups across the treatment techniques. The mean value is denoted by the “dot” between the \(Q_1\) and \(Q_3\) values. We observed that the distributions were right-skewed, that Group 2 had the highest mean concentration levels for all treatments, and that Group 1 had the most outliers among the three groups.

For comparison, we used the median values. Because the P.E. sample was split into two samples for treatments T and A, we compared treatments P.E. and T, P.E. and A, and T and A. We observed minor differences in treatments in Group 1. Treatment P.E outperformed treatment I in Group 3, but not in Groups 1 and 2. In Group 3, only treatment T outperformed treatment P.E, and treatment T surpassed treatment A in Groups 2 and 3. Treatments T and A were more efficient in removing PPCPs than treatment P.E in Group 2. Overall, the comparison of treatments I and F.E demonstrated that the final treatment procedure F.E was successful in removing PPCPs from Groups 2 and 3.

Fig. 2
figure 2

The boxplots of the removal of compounds for groups across treatment processes, with standard error bars

We were also interested in investigating seasonal variations among treatments and groups because consumption rates, precipitation amounts, and socio-geographical variables (such as tourism in the summer and winter) affect the concentration levels of PPCPs in WWTPs. The results are displayed in Table 4 and Fig. 3. The treatment distributions were right-skewed across all groups and seasons, with the exception of treatment I in the spring and treatment T in the winter.

Table 4 The descriptive statistics of the removal of compounds for groups across seasons

Group 1: The baseline concentration levels (i.e., treatment I) were highest in the fall, followed by the winter, and similar in the other seasons. Treatment A outperformed treatment T in the fall but performed similarly to treatment F.E. In the spring and summer, none of the treatments outperformed treatment I. Treatments T and A outperformed treatment P.E in reducing PPCPs in the winter, while treatments A and F.E worked equally well.

Group 2: None of the treatments outperformed treatment I in the fall. Treatment P.E performed similarly to the other treatments in the spring; however, it was more successful than treatment I in removing contaminants. Surprisingly, treatment F.E was not as successful as treatment I and was the second worst treatment after treatment A. Although treatments P.E and T were superior to treatment I, they both removed PPCPs with comparable results.

Group 3: While the removal performance of treatments I and P.E was equivalent in the fall, treatment T marginally outperformed treatment P.E. Although treatment A was the worst, its blend with treatment T, i.e., treatment F.E, was the best. The seasons with the highest PPCP concentration levels were spring, followed by winter. Geographical region, seasonal variations in infectious illnesses, or variations in antibiotic prescription patterns could all contribute to the elevated concentration levels. The best treatment in the spring was F.E, followed by P.E. All treatments outperformed treatment I, and treatment T was superior to treatment A. The best treatment over the summer was treatment T, which was an improvement over treatment I; treatments P.E, A, and F.E were inferior to treatment I. In the winter, treatments F.E and T performed the worst, and treatment A outperformed them all. Treatment P.E, the initial treatment, was more successful at removing PPCPs than treatment I.

Fig. 3
figure 3

The boxplots of the removal of compounds for groups in seasonal variation across treatment processes, with standard error bars

In summary, we found that the treatment processes had varied effects on the different groups (i.e., a two-way interaction between treatment processes and groups), as well as two-way interaction effects between groups and seasons. However, because data dependency was not addressed, we should be cautious about accepting the boxplot results as valid inferences.

3.4 Data analysis

We investigated visually, with a three-way plot, whether the interaction between two factors differed depending on the level of the third factor. Figure 4 shows that there might be a weak three-way interaction, since the effects of the treatment procedures within groups differ slightly between seasons. There appears to be a treatment-season interaction, because each treatment process behaves differently in different seasons. Also, there may be a two-way interaction between group and season, because each group behaves differently in different seasons. However, we did not observe that the treatment processes vary among groups. We used these preliminary graphical findings in building the model.

Fig. 4
figure 4

The LMEM results used to examine the three-way interaction

We may rewrite the model Eq. (1) in R-syntax as

$$\begin{aligned} Concentration&= Group+Treatment+Season+Group*Treatment\\&\quad +Group*Season+Treatment*Season\\&\quad +Group*Treatment*Season+(1\mid Month)+Error \end{aligned}$$

where AR(1) is the correlation structure for the error. The term \((1\mid Month)\) incorporates the dependency of the data by including month as a random component in the model. We chose to fit the model using z-scores rather than raw concentration levels, because the raw values vary widely across compounds.

First, we investigated whether there was three-way interaction between groups, treatments, and seasons.

$$\begin{aligned} H_0&: \text {There is no three-way interaction} \\ H_a&: \text {There is a three-way interaction} \end{aligned}$$

Table 5 shows that the mean concentration levels for the treatments and groups were unaffected by seasons (\(p = 0.2540\)). Because there was no three-way interaction, we investigated whether one factor influenced the effect of another (i.e., two-way interactions). Although the mean concentration levels for treatments were not statistically significantly different across groups (\(p = 0.9992\)), the mean concentration levels for seasons differed by group (\(p < 0.0001\)) and by treatment (\(p = 0.0027\)).

Table 5 The LMEM for three-way interaction model

Because there were two-way interactions, the effect of simultaneous changes cannot be determined by examining the main effects separately. That is, assessing the effects of treatment, group, and season in isolation is meaningless. Given the significant interactions, the next step was to investigate where the important differences lie. In the next section, Tukey’s HSD (Honestly Significant Difference) and Bonferroni tests are used to determine which means differ significantly from each other.

3.5 Contrasts of means with Tukey’s HSD and Bonferroni procedures

As post hoc tests, we implemented Tukey’s HSD (Honestly Significant Difference) and the Bonferroni procedure, also known as the Dunn-Bonferroni procedure. The Tukey’s HSD test provides a true \(\alpha\) correction for the number of comparisons while maintaining statistical power, whereas the Bonferroni test controls the familywise error rate by computing a new pairwise \(\alpha\) that keeps the familywise \(\alpha\) at the specified value. We present only the pairs of means that were statistically significant at \(\alpha =0.05\) in Tables 6, 7, 8, and 9.
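The Bonferroni adjustment described above amounts to simple arithmetic; a minimal sketch (an illustration, not tied to the study's software):

```python
def bonferroni_alpha(alpha, m):
    """Per-comparison alpha that keeps the familywise error rate at alpha over m tests."""
    return alpha / m

def bonferroni_adjust(pvals):
    """Equivalently, multiply each p-value by the number of tests (capped at 1)."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

bonferroni_alpha(0.05, 4)         # per-comparison threshold for 4 comparisons
bonferroni_adjust([0.004, 0.03])  # adjusted p-values for 2 comparisons
```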

Table 6 The LMEM model’s estimations are used to perform two-way interaction between groups and seasons

Because the Tukey’s HSD and Bonferroni tests compared the same number of contrasts, Table 6 yields the same findings for comparisons between seasons and groups of compounds. There were statistical differences in the fall between Groups 1 and 3 (\(p = 0.0004\)) and Groups 2 and 3 (\(p < 0.0001\)). The only difference in the spring was between Groups 1 and 3 (\(p = 0.0353\)).

The comparisons between treatments and seasons are reported in Table 7. Except for the p-values, both approaches produced identical estimates for a given comparison. The p-values differed because the number of tests varied between the two procedures.

In the fall, both tests found statistical significance between treatments P.E and F.E (\(p =0.0287\) for Tukey’s HSD and \(p=0.0351\) for Bonferroni). In the winter, treatment A, a biological procedure, removed more pollutants than the baseline treatment I (\(p=0.0229\) for Tukey’s HSD and \(p=0.0275\) for Bonferroni).

Table 7 The LMEM model’s estimations are used to perform two-way interaction between treatments and seasons

3.6 Comparison of two-sample t-tests with LMEM

To compare widely used statistics in the analysis of WWTP data with the LMEM method, we chose (Welch) two-sample t-tests with unequal sample variances. We should not rely on the findings of the t-tests, because the WWTP data are dependent. Because we observed two-way interactions in the LMEM model between treatments and seasons and between groups and seasons, we investigated the same relations with the two-sample t-tests (Tables 8 and 9).
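For reference, the Welch statistic and its Welch-Satterthwaite degrees of freedom can be computed directly; the sketch below uses made-up samples, not the study data:

```python
import math
import statistics

def welch_t(x, y):
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    nx, ny = len(x), len(y)
    vx, vy = statistics.variance(x), statistics.variance(y)   # sample variances
    se2 = vx / nx + vy / ny                                   # squared standard error
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(se2)
    df = se2 ** 2 / ((vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1))
    return t, df

t, df = welch_t([1, 2, 3, 4], [2, 4, 6, 8, 10])
# When the variances differ, df is smaller than the pooled t-test's nx + ny - 2.
```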

The degrees of freedom were the key difference between the two methods, as seen in Tables 8 and 9. The LMEM approach had significantly more degrees of freedom than the t-test. There was also no consistent trend in the estimates of the contrasts or their standard errors; for example, we cannot assert that the LMEM’s standard error estimates were always lower than those of the t-tests. Although the t-tests discovered the same group differences as the LMEM approach in the fall and spring, they also found differences between Groups 1 and 3 in the summer and between Groups 2 and 3 in the winter, spring, and summer (Table 8). Also, in comparing treatments and seasons, there were no common results between the LMEM and the t-tests (Table 9).

In conclusion, because the LMEM addresses the data dependency through the variance-covariance matrix, incorporates random effects alongside fixed effects, and handles missing values better, we advocate relying on its conclusions.

Table 8 Welch two-sample t-tests of groups with unequal sample variances within seasons
Table 9 Welch two-sample t-tests of treatment processes with unequal sample variances within seasons

4 Conclusion

In this research, we investigated five treatment techniques as well as seasonal effects on the removal of PPCPs in the WWTP. Seasons had no effect on the mean concentration levels for the treatments or groups, revealing that there was no three-way interaction. We observed, however, that one factor influenced the effect of another (i.e., two-way interactions). While there was no statistically significant difference in treatment mean concentration levels across groups (no treatment-group interaction), treatment mean concentration levels differed by season (treatment-season interaction), and group mean concentration levels differed by season (group-season interaction). Based on the estimates, we found that the final effluent achieved its goal of eliminating compounds when compared with treatment P.E in the fall. Also, treatment A is preferred over the baseline treatment I in the winter.

To conclude, we recommend that researchers analyze the WWTP dataset using the LMEM technique. It is simple and gives reliable results because the model adequately handles the data correlation structure, mixed effects factors, and missing values. Furthermore, researchers can explore factor inter-relationships through interactions.