Introduction

Degradation of the water environment is a major issue in watershed management and poses a severe threat to surface water security. Both natural processes, e.g., rock–water interactions (Li et al. 2014), and anthropogenic activities deteriorate surface water quality (Sundaray et al. 2006).

Positive matrix factorization (PMF)

PMF is one of the most widely used receptor models for source apportionment (Al-Dabbous and Kumar 2015; Hajigholizadeh 2016; Li et al. 2015; Mohammed et al. 2016). It is a multivariate receptor model that decomposes an n × m data matrix X, in which n is the number of samples and m is the number of chemical species, into two matrices, factor contributions (G) and factor profiles (F), plus a residual matrix (E) (see Eq. 1). Two input files were supplied to the PMF model: one containing the concentrations of the four examined water quality parameters and one containing the uncertainty values calculated as per Eq. 2. The optimum number of factors was obtained by performing several runs of the model and selecting the run/solution with the lowest value of Q (robust); this parameter, defined in Eq. 3, indicates how well the model fits the data (Bzdusek et al. 2006). The main task of factor analysis by PMF is to minimize the objective function (Q) with respect to G and F under the constraint that all, or at least some, of the elements of G and F are nonnegative (Paatero 1997).

$$X = GF + E$$
(1)

The uncertainties were calculated using Eq. 2.

$$\text{Uncertainty} = \frac{5}{6} \times \text{MDL}$$
(2)

where MDL is the method detection limit of each chemical species included as input in the modeling (Norris et al. 2014).
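As a minimal sketch of how the uncertainty input file could be assembled from Eq. 2, the snippet below builds an uncertainty matrix with one row per station and one column per parameter. The MDL values and the names `mdl`, `uncertainty_from_mdl` and `build_uncertainty_matrix` are hypothetical placeholders for illustration; they are not taken from the study.

```python
import numpy as np

# Hypothetical method detection limits; placeholders, not values from the study.
mdl = {"pH": 0.01, "DO": 0.2, "COD": 5.0, "NH3-N": 0.025}
species = ["pH", "DO", "COD", "NH3-N"]


def uncertainty_from_mdl(mdl_value: float) -> float:
    """Eq. 2: uncertainty = (5/6) * MDL."""
    return 5.0 / 6.0 * mdl_value


def build_uncertainty_matrix(n_samples: int, species: list, mdl: dict) -> np.ndarray:
    """Repeat the per-species uncertainty for every sample (rows = samples)."""
    row = np.array([uncertainty_from_mdl(mdl[s]) for s in species])
    return np.tile(row, (n_samples, 1))


# One row per monitoring station (27 in the text), one column per parameter.
S = build_uncertainty_matrix(27, species, mdl)
print(S.shape)
```

The resulting matrix has the same shape as the concentration matrix, so it can be used element by element when the residuals are scaled in Eq. 3.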

$$Q = \sum\limits_{{i = 1}}^{n} {\sum\limits_{{j = 1}}^{m} {(e_{{ij}} /s_{{ij}} )^{2} } }$$
(3)

where Q is the sum, over all samples i and species j, of the squared differences (eij) between the original data matrix (X) and the PMF output (GF), each divided by the corresponding computed uncertainty (sij).
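Eq. 3 can be evaluated directly once X, G, F and the uncertainties are available. The sketch below is only an illustration of the formula with a tiny made-up example; the function name `pmf_q` and the numbers are assumptions, and it computes the plain Q rather than the robust Q reported by EPA PMF.

```python
import numpy as np


def pmf_q(X: np.ndarray, G: np.ndarray, F: np.ndarray, S: np.ndarray) -> float:
    """Eq. 3: sum over i, j of (e_ij / s_ij)^2, with E = X - GF from Eq. 1."""
    E = X - G @ F
    return float(np.sum((E / S) ** 2))


# Made-up 2 x 2 example with a single factor.
X = np.array([[1.0, 2.0], [3.0, 4.0]])
G = np.array([[1.0], [2.0]])   # factor contributions (n x p)
F = np.array([[1.0, 2.0]])     # factor profiles (p x m)
S = np.full_like(X, 0.5)       # uncertainties s_ij

# Only one residual is nonzero (e_21 = 1), so Q = (1 / 0.5)^2.
print(pmf_q(X, G, F, S))  # 4.0
```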

The study was carried out using EPA PMF 5, which is based on Paatero's PMF model. The optimum number of factors was determined from the value of Q, which indicates the model's fitting capability. A global minimum was sought by changing the seed value from 1 to 20 across model runs (Bzdusek et al. 2006).
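The multi-seed strategy can be sketched as follows. This is not the algorithm used inside EPA PMF; as a stand-in, it uses uncertainty-weighted multiplicative updates for nonnegative matrix factorization, which minimize the same plain Q under nonnegativity constraints. The synthetic data, the function names `weighted_nmf` and `best_of_seeds`, and the iteration count are all assumptions made for illustration.

```python
import numpy as np


def weighted_nmf(X, S, n_factors, seed, n_iter=500, eps=1e-12):
    """Minimize Q = sum(((X - GF) / S)^2) with G, F >= 0 (stand-in solver)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = 1.0 / S**2                           # weights from the uncertainties
    G = rng.random((n, n_factors)) + 0.1     # random nonnegative start
    F = rng.random((n_factors, m)) + 0.1
    for _ in range(n_iter):
        GF = G @ F
        G *= ((W * X) @ F.T) / ((W * GF) @ F.T + eps)
        GF = G @ F
        F *= (G.T @ (W * X)) / (G.T @ (W * GF) + eps)
    Q = float(np.sum(((X - G @ F) / S) ** 2))
    return G, F, Q


def best_of_seeds(X, S, n_factors, seeds=range(1, 21)):
    """Run the solver once per seed and keep the run with the lowest Q."""
    runs = [(seed, *weighted_nmf(X, S, n_factors, seed)) for seed in seeds]
    return min(runs, key=lambda run: run[3])


# Synthetic nonnegative data: 27 samples, 4 species, 2 underlying factors.
rng = np.random.default_rng(0)
X = rng.random((27, 2)) @ rng.random((2, 4)) + 0.01
S = np.full_like(X, 0.05)

seed, G, F, Q = best_of_seeds(X, S, n_factors=2)
print(seed, round(Q, 3))
```

Because each seed starts from a different random G and F, the runs can end in different local minima; keeping the lowest Q is the simple selection rule described in the text.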

Multiple correspondence analysis (MCA)

Correspondence analysis is a statistical visualization technique that illustrates the association between the members of two sets of data. MCA is an extension of correspondence analysis to multiway tables and allows relationships among two or more variables to be established. The main purpose of MCA is to find categories and separate them: variables of the same category are plotted close to one another, while variables of different categories are plotted far apart (Ambarita et al. 2016; Pacheco 1998). Here, we performed MCA to categorize the 27 water quality monitoring stations located along the river stretch and to visualize their spatial distribution based on the four water quality parameters (Fig. 3).
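One common way to compute MCA is to run a correspondence analysis on the one-hot indicator matrix of the categorical variables. The sketch below follows that route with NumPy only. It assumes the water quality parameters have first been binned into categories (the "low"/"high" labels and the four demo stations are invented), which may differ from how the analysis in this study was actually set up.

```python
import numpy as np


def mca_row_coordinates(categories, n_dims=2):
    """Principal row coordinates from correspondence analysis of the indicator matrix."""
    categories = np.asarray(categories)
    blocks = []
    for j in range(categories.shape[1]):
        levels = np.unique(categories[:, j])
        # One indicator column per observed level of variable j.
        blocks.append((categories[:, [j]] == levels[None, :]).astype(float))
    Z = np.hstack(blocks)
    P = Z / Z.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    resid = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, _ = np.linalg.svd(resid, full_matrices=False)
    rows = (U * sv) / np.sqrt(r)[:, None]
    return rows[:, :n_dims]


# Invented categorical profiles for four stations (pH, DO, COD, NH3-N classes).
demo = [
    ["low", "high", "low", "low"],
    ["low", "high", "low", "low"],
    ["high", "low", "high", "high"],
    ["low", "high", "high", "low"],
]
coords = mca_row_coordinates(demo)
print(coords.shape)
```

Stations with identical category profiles receive identical coordinates, and stations whose profiles differ in more categories end up farther apart, which is the behavior the MCA plot relies on.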

Results

Spatiotemporal variation in the water quality data matrix was evaluated using several multivariate statistical techniques along with PMF modeling. To assess spatial similarity, HCA was applied to the 27 water quality monitoring sites. HCA classified the monitoring stations into four groups of similar water quality characteristics based on the four water quality parameters analyzed in this study, as shown by the dendrogram in Fig. 2. HCA helps to reduce the number of monitoring stations with only a minor loss of information (Simeonov et al. 2003).

Fig. 2

Dendrogram of the 27 water quality monitoring stations obtained with the Ward method based on water quality parameters of the Huaihe River basin
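A Ward-linkage clustering of this kind can be sketched with SciPy as shown below. The station data are random placeholders, the z-score standardization step is an assumption (the text does not state how the parameters were scaled), and `cluster_stations` is a hypothetical helper name.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_stations(station_means: np.ndarray, n_clusters: int = 4):
    """Ward hierarchical clustering of stations (rows) on their parameters (columns)."""
    # Assumed preprocessing: z-score each parameter so no single unit dominates.
    z = (station_means - station_means.mean(axis=0)) / station_means.std(axis=0)
    tree = linkage(z, method="ward")          # Euclidean distance, Ward criterion
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return tree, labels


# Placeholder data: 27 stations x 4 parameters (pH, DO, COD, NH3-N).
rng = np.random.default_rng(42)
station_means = rng.random((27, 4))

tree, labels = cluster_stations(station_means)
print(labels)
```

The `tree` array can be passed to `scipy.cluster.hierarchy.dendrogram` to draw a dendrogram, while `labels` assigns each station to one of at most four groups.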

MCA was carried out to identify the spatial distribution patterns of the water quality monitoring stations in the Huaihe River basin. Three stations (3, 9 and 21) are located away from the rest of the sites, while the remaining stations lie close to each other, as shown in Fig. 3. This suggests that most characteristics of these three stations differ from those of the remaining stations owing to unique pollutant emission sources. Stations with similar characteristics lie close to each other because they are exposed to approximately the same NPS pollution (Ambarita et al. 2016; Zhao et al. 2015).

Fig. 3

MCA categorization of water quality monitoring stations in the Huaihe River basin based on mean values of water quality parameters

PMF analysis was carried out to apportion NPS pollution originating from different land uses. It yielded a number of NPS pollution factors based on the principle described in the materials and methods section. The model was run 20 times, each time with a different initial seed value. Four NPS pollution factors were identified for each group and each season (winter, summer (wet), spring and autumn (wet)).

Four pollution sources were identified for the winter season. All parameters contribute to Factor 1 (pH, 35.5%; DO, 61.6%; COD, 21.5%; and NH3-N, 5.8%) and Factor 3 (pH, 43.2%; DO, 34.4%; COD, 29.9%; and NH3-N, 12.4%), which are therefore identified as diffused land use with multiple NPS pollution. This may be due to the closing of gates in the dry season to store water for local supply, which retains all kinds of pollutants produced by agricultural, industrial and urban land uses (World Bank, 1997). Factor 2 (pH, 3.3%; COD, 8.8%; and NH3-N, 75.5%) is characterized by a uniquely high loading of NH3-N and negligible loadings of the remaining three water quality parameters. Hence, this factor is identified as agricultural land use. The Huaihe River basin is the main grain-producing area of China, and the high NH3-N loading may be due to excessive application of fertilizers and pesticides for crop production in the river valley (Zhong 2006). Factor 4 (pH, 18%; DO, 4%; COD, 39.8%; and NH3-N, 6.3%) is dominated by pH and COD, as can be seen from Fig. 4, and is therefore suggested to be related to urban land use (Huang et al. 2013; Pratt and Chang 2012; **ao et al. 2016; Zhao et al. 2015).
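The winter percentages quoted above can be tabulated to check how they are normalized and which parameter dominates each factor. The short sketch below uses only the reported numbers. Reading the percentages as each parameter's share across the four factors is an inference from the fact that they sum to about 100% per parameter, and treating the unreported DO share of Factor 2 as zero is likewise an assumption.

```python
# Winter factor profiles (%), as reported in the text.
winter = {
    "Factor 1": {"pH": 35.5, "DO": 61.6, "COD": 21.5, "NH3-N": 5.8},
    "Factor 2": {"pH": 3.3, "COD": 8.8, "NH3-N": 75.5},  # DO share not reported
    "Factor 3": {"pH": 43.2, "DO": 34.4, "COD": 29.9, "NH3-N": 12.4},
    "Factor 4": {"pH": 18.0, "DO": 4.0, "COD": 39.8, "NH3-N": 6.3},
}
species = ["pH", "DO", "COD", "NH3-N"]

# Sum each parameter across factors; a missing entry is assumed to be 0.
totals = {
    s: sum(profile.get(s, 0.0) for profile in winter.values()) for s in species
}

# Parameter with the largest share in each factor.
dominant = {name: max(profile, key=profile.get) for name, profile in winter.items()}

for s in species:
    print(s, round(totals[s], 1))
print(dominant)
```

Under these assumptions every parameter sums to 100% across the four factors, and NH3-N is the largest entry only in Factor 2, which is consistent with the agricultural interpretation given for that factor.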

Fig. 4

Temporal factor loadings obtained from PMF analysis of water quality parameters of the Huaihe River basin. Factors are given along with their corresponding NPS pollution sources. (For winter season: Factor 1 and Factor 3 = diffused land use, Factor 2 = agricultural land use and Factor 4 = urban land use. For summer season: Factor 1 = agricultural land use, Factor 2 = industrial land use, Factor 3 and Factor 4 = diffused land use. For spring season: Factor 1 and Factor 4 = diffused land use, Factor 2 = agricultural land use and Factor 3 = urban land use. For autumn season: Factor 1 and Factor 4 = diffused land use, Factor 2 = industrial land use and Factor 3 = agricultural land use.)

Four pollution sources were identified for the summer season. Factor 1 (pH, 4.6%; DO, 0.4%; COD, 7.7%; and NH3-N, 80.3%) is dominated almost exclusively by NH3-N, as is evident from the source profile shown in Fig. 4. Hence, this factor is most likely linked to agricultural land use. The profiles of agricultural land use in summer and winter are very similar. Rains in summer sweep fertilizers and pesticides from fields to the river. Factor 2 (pH, 1.6%; COD, 31.6%; and NH3-N, 1.5%) is dominated by COD, with negligible loadings of the other water quality parameters, and is therefore classified as industrial land use. The high loading of COD may be due to routine industrial activities accompanied by seasonal agricultural product processing enterprises at village and township level, a situation aggravated by outdated technology and a lack of pollution treatment facilities (Zhong 2006). All parameters contribute to Factor 3 (pH, 43.2%; DO, 64.9%; COD, 21.7%; and NH3-N, 7%) and Factor 4 (pH, 50.6%; DO, 34.7%; COD, 38.9%; and NH3-N, 11.2%), which are identified as diffused land use with multiple NPS pollution. This may be attributed to high water consumption for bathing due to the rise in temperature and to heavy rains in those particular areas. Storm water sweeps all kinds of contaminants from different land uses to the river (Huang et al. 2013; Pratt and Chang 2012; ** et al. 2017). Agricultural land continuously impairs surface water quality throughout the year, with the strongest impacts in the dry season (summer and autumn) due to lower dilution (Liu et al. 2017). Industrial land use has a positive association with COD. Water in industrial areas is highly contaminated compared with urban and suburban areas (Zhao et al. 2015). A lack of treatment facilities in industrial and urban areas badly deteriorates surface water quality (Ding et al. 2015; Ho and Hui 2001; Sun et al. 2013).

Contribution of land use to seasonal contamination risk

Seasonal variation of surface water quality is associated with land use composition. The results of the current study suggest that, for a given site, water quality exhibits seasonality that depends on its land use composition. The rainy season alters surface water quality through instream flushing and dilution (Park et al. 2011), resulting in seasonal variations of point and NPS pollution (Ye et al. 2014). NH3-N and COD loadings are higher in the rainless season owing to the lower dilution effect. NH3-N concentration is higher in spring, which may be due to intensive agricultural activities in the region (Liu et al. 2017). The study area is characterized by a double cropping system, i.e., wheat and maize. Fertilizer application is common in March and April for winter wheat. Irrigation tailwater discharged from the conventional irrigation system in the study area deteriorates surface water quality via nutrient loss with soil erosion (Yu et al. 2016). Furthermore, the seasonal first flush of rainfall is a second potential cause of water quality degradation (Liu et al. 2016). Late spring rainfall, after a long dry season, drains fertilizers and pesticides to nearby watercourses, degrading surface water quality (Liu et al. 2017).

Local management implication for Huaihe River

NPS pollution strongly depends on land use, which varies from one monitoring site to another. Here, NPS pollutants are identified via multiple water quality variables. Therefore, comprehensive best management practices (BMPs) need to be implemented at the level of individual land uses to address the multiple causes of water quality deterioration. The seasonal variation of NPS pollution reflects the seasonal behavior of pollutant emissions. NH3-N and COD concentrations are higher in the dry season in agricultural regions and urbanized areas, and agricultural and urban areas pose a high risk of contamination during spring. Knowledge of this seasonal variability can help in controlling seasonal contamination risk via BMPs (Liu et al. 2017). It is of the utmost importance to control urban runoff in order to meet national discharge standards, while sluice regulation, precision farming, terraced fields and buffer zones will help to improve water quality (**a et al. 2018).

Limitations

The authors faced some difficulties in apportioning NPS pollution by land use. First, the identification of NPS pollution was hampered by the limited number of water quality parameters available. Second, field benchmark NPS pollution emission profiles for different sources at different land use levels were unavailable. Such field emission profiles serve as benchmarks against which PMF results can be validated.

Conclusions

Multivariate statistical techniques and PMF analysis were used to extract the underlying information from the complex multidimensional water quality data matrix of the Huaihe River basin. HCA clustered the twenty-seven water quality monitoring stations into four groups of similar water quality characteristics based on four water quality parameters. MCA showed that some water quality monitoring stations (3, 9 and 21) are located away from the rest, which suggests that they have different water quality characteristics due to unique pollutant emission sources. Box and whisker plots suggested that temporal trends are possibly influenced by temperature and rainfall, while spatial trends are linked with NPS pollution from different land uses, i.e., agricultural, urban and industrial land uses. PMF identified four factors for each group and each season based on land use, which gives a clear picture of the NPS pollution originating from different land uses. Each factor identified by PMF analysis reflects the severity of anthropogenic activities at a different land use level. In addition, NPS pollution varies with season, which indicates possible linkages with natural processes such as the hydrological regime. The seasonal contamination patterns will be beneficial in controlling seasonal pollution risk. Overall, Huaihe River water quality was mainly impaired by land use variation and by flows regulated by sluices and dams, among other factors. In highly regulated rivers, scientific regulation of dams and sluices may help to alleviate water pollution problems. The proposed pollution apportionment approach supports the management and planning of the ongoing Huaihe River water pollution control project.