Background

Artificial insemination (AI) pig industry focuses mainly on maximizing the number of insemination doses produced from each boar ejaculate. To achieve this goal, the ability of boars to produce high-quality semen (high motility and progressive motility with low levels of morphological defects) in sufficient quantity (large number of sperm cells per ejaculate) is decisive [1].

In recent years, with the fast advances in high-throughput genoty** and in molecular techniques in general, there is an increased interest in the study of the molecular processes and genetic mechanisms that affect semen traits. Genes and markers associated with pig semen traits have been described in the literature [12], is a method that allows estimation of SNP effects using genomic estimated breeding values (GEBV) from single-step genomic best linear unbiased prediction (ssGBLUP, [13]) based on all phenotyped, genotyped and pedigree-related animals. In addition, it allows unequal variances for SNPs, which results in improved precision of the estimation of SNP effects [12]. Therefore, when the number of animals with both phenotypes and genotypes is small and the traits are controlled by QTL with large effects, the WssGWAS may perform better than traditional GWAS methods. Recent studies have used this method for production, carcass and reproductive traits in livestock [14,15,16,17,18,19,20,21,22,23].

In a post-GWAS study, a gene network analysis can be performed for candidate genes related to QTL regions identified in GWAS. The gene network is used to investigate pathways and biological processes that are shared by these genes [24]. In addition, the biological information provided by these gene networks are helpful to understand genetic differences between populations for the same trait [25].

In this study, our aim was to identify QTL regions that are associated with four semen traits (motility, progressive motility, number of sperm cells per ejaculate and total morphological defects) in pigs. In addition, our aim was to identify candidate genes within those QTL regions that explained the highest proportions of genetic variance. To achieve our goal, we performed a WssGWAS in two commercial pig lines (L1: Large White type and L2: Landrace type), followed by gene network analyses to investigate the biological processes shared by genes that were identified for the same semen traits in these two lines.

Methods

Phenotypic, genotypic and pedigree data

Phenotypic data were available from two commercial pig lines, a Large White type line (L1) and a Landrace type line (L2), on ejaculate samples that were collected between January 2007 and October 2014. The evaluated traits were: (1) sperm motility (MOT), which is the proportion of moving sperm cells in an ejaculate; (2) sperm progressive motility (PROMOT), defined as the proportion of sperm cells that move in a straight line; (3) abnormal sperm cell number (ABN), which is the total number of sperm cells with morphological abnormalities; and (4) the total number of sperm cells in the ejaculate (Ncells per 106 sperm cells). MOT and PROMOT were evaluated using the UltiMateTM CASA system (Hamilton Thorne Inc., Beverly, MA, USA). Ncells was calculated as the product of the semen volume (mL) and concentration (106 mL−1, measured by the CASA system). The values of this measure were not normally distributed, and thus log-transformed before further analyses (lnNcells). Ejaculates evaluated for ABN were analyzed microscopically at a 1000× magnification by a trained technician with a phase contrast microscope and a thermal plate (BH-2, Olympus, Tokyo, Japan), counting 100 sperm cells per assessment. All semen traits were assessed on the day of semen collection using fresh semen.

The phenotypic data for MOT, PROMOT and lnNcells consisted of 43,455 ejaculates for L1 (866 boars) and 39,161 ejaculates for L2 (900 boars). For ABN, the phenotypic data consisted of 13,366 ejaculates for L1 (849 boars) and 9853 ejaculates for L2 (886 boars). The average number of ejaculates per boar (with standard deviations in parenthesis) for MOT, PROMOT and lnNcells were 50.18 (38.12) for L1 and 43.51 (36.37) for L2. For ABN, the average number of ejaculates per boar were 15.74 (11.19) for L1 and 11.12 (8.99) for L2. Number of boars with phenotypic data, mean, standard deviation, minimum and maximum values of semen traits in L1 and L2 are in Table 1.

Table 1 Descriptive statistics of semen traits

Genoty** data for 3737 animals (856 males and 2881 females) from L1 and 3307 animals (953 males and 2354 females) from L2 were available. A majority of the animals (2718 for L1 and 2394 for L2) were genotyped using the Illumina PorcineSNP60 BeadChip (Illumina, Inc., San Diego, CA) but part of the animals from L1 (n = 1019) and L2 (n = 913) were genotyped using the (Illumina, Inc.) GeneSeek Custom 80 K SNP chip (GeneSeek Inc., Lincoln, NE). Quality control was performed by excluding SNPs that had an unknown position on the pig genome build 10.2 [26], that were located on sex chromosomes, that had a call rate lower than 0.95 or a minor allele frequency lower than 0.01, or that were in strong deviation from Hardy–Weinberg equilibrium (χ2 > 600). Animals for which the frequency of missing genotypes was higher than 0.05 were also excluded. After quality control, missing genotypes from animals genotyped with the SNP60 BeadChip were imputed with the software Beagle version 3.3.2 [27] to the set of SNPs on the SNP60 BeadChip that passed quality control. In addition, genotypes from the animals genotyped with the GeneSeek Custom 80 K SNP chip were imputed to the set of SNPs on the SNP60 BeadChip that passed quality control. After quality control, 39,945 and 41,478 SNPs remained for L1 and L2, respectively, and were used for the GWAS.

The complete pedigree included 8352 animals for L1 and 8271 animals for L2. The total number of animals that remained after pedigree pruning was 6724 for L1 and 6502 for L2. Most animals had either phenotypic or genotypic data. The number of animals with both phenotypes and genotypes was 349 for L1 and 446 for L2.

Statistical analyses

The weighted ssGBLUP analysis was conducted within line using the BLUPF90 software family [28] adapted for genomic analyses [29]. First, variance components were estimated using AIREMLF90, which were then used in BLUPF90 to predict GEBV. SNP effects were then calculated using postGSf90 software.

The single-trait animal model for ssGBLUP was as follows:

$${\mathbf{y}} = {\mathbf{X}}\varvec{\upbeta} + {\mathbf{Za}} + {\mathbf{Wp}} + {\varvec{\upvarepsilon}},$$

where \({\mathbf{y}}\) is the vector of phenotypic observations; \({\varvec{\upbeta}}\) is the vector of fixed effects (combined effects of AI station, year and month of semen collection; the laboratory where the samples were analyzed and the covariates of interval between two subsequent semen collections in days and age of the boar at the time of collection in months); \({\mathbf{a}}\) is the vector of random additive genetic effects; \({\mathbf{p}}\) is the vector of random permanent environmental effects; \({\varvec{\upvarepsilon}}\) is the vector of random residuals; and \({\mathbf{X}}\), \({\mathbf{Z}}\) and \({\mathbf{W}}\) are the incidence matrices of \({\varvec{\upbeta}}\), \({\mathbf{a}}\) and \({\mathbf{p}}\), respectively.

It was assumed that \({\mathbf{a}}\sim{\text{N}}\left( {0,{\mathbf{H}}\sigma_{a}^{2} } \right)\), \({\mathbf{p}}\sim{\text{N}}\left( {0,{\mathbf{I}}\sigma_{p}^{2} } \right)\) and \({\varvec{\upvarepsilon}}\sim{\text{N}}\left( {0,{\mathbf{I}}\sigma_{e}^{2} } \right),\) where \(\sigma_{a}^{2}\), \(\sigma_{p}^{2}\) and \(\sigma_{e}^{2}\) are the additive genetic, permanent environmental and residual variances, respectively; \({\mathbf{H}}\) is the matrix that combines pedigree and genomic information [13], and \({\mathbf{I}}\) is an identity matrix. The inverse of \({\mathbf{H}}\) needed for mixed model equations is given by:

$${\mathbf{H}}^{ - 1} = {\mathbf{A}}^{ - 1} + \left[ {\begin{array}{*{20}c} 0 &\quad 0 \\ 0 &\quad {{\mathbf{G}}^{ - 1 } - {\mathbf{A}}_{22}^{ - 1} } \\ \end{array} } \right],$$

where \({\mathbf{A}}\) is the numerator relationship matrix based on pedigree for all animals; \({\mathbf{A}}_{22}\) is the numerator relationship matrix for genotyped animals; and \({\mathbf{G}}\) is the genomic relationship matrix [30]. This matrix was obtained as follows:

$${\mathbf{G}} = \frac{{{\mathbf{ZDZ}}^{{\prime }} }}{{\sum_{{{\text{i}} = 1}}^{\text{M}} 2{\text{p}}_{\text{i}} \left( {1 - {\text{p}}_{\text{i}} } \right)}},$$

where \({\mathbf{Z}}\) is a matrix of gene content adjusted for allele frequencies (0, 1 or 2 for aa, Aa and AA, respectively); \({\mathbf{D}}\) is a diagonal matrix of weights for SNP variances (initially \({\mathbf{D}} = {\mathbf{I}}\)); \({\text{M}}\) is the number of SNPs, and \({\text{p}}_{\text{i}}\) is the minor allele frequency of \({\text{i}}\)-th SNP.

Estimates of SNP effects and weights for WssGWAS were obtained according to Wang et al. [12] by the following steps:

  1. 1.

    In the first iteration (\({\text{t}} = 1\)): \({\mathbf{D}} = {\mathbf{I}}\); \({\mathbf{G}}_{{\left( {\text{t}} \right)}} = {\mathbf{D}}_{{\left( {\mathbf{t}} \right)}} {\mathbf{Z}}^{{\prime }}\uplambda\), where \(\uplambda = \frac{1}{{\sum_{i = 1}^{M} 2p_{i} \left( {1 - p_{i} } \right)}}\) [30];

  2. 2.

    GEBV were calculated for the entire dataset using ssGBLUP;

  3. 3.

    GEBV were converted to estimates of SNP effects (\({\hat{\mathbf{u}}}\)): \({\hat{\mathbf{u}}}_{{\left( {\text{t}} \right)}} =\uplambda{\mathbf{D}}_{{\left( {\text{t}} \right)}} {\mathbf{Z}}^{{\prime }} {\mathbf{G}}_{{\left( {\text{t}} \right)}}^{ - 1} \hat{\varvec{a}}_{g}\), where \(\hat{\varvec{a}}_{g}\) is the GEBV of animals that were also genotyped;

  4. 4.

    The weight for each SNP to be used in the next iteration was calculated as: \(d_{{{\text{i}}\left( {{\text{t}} + 1} \right)}} = \hat{u}_{{{\text{i}}\left( {\text{t}} \right)}}^{2} 2p_{\text{i}} (1 - p_{\text{i}} )\), where \({\text{i}}\) is the \({\text{i}}\)-th SNP;

  5. 5.

    The SNP weights were normalized to keep the total genetic variance constant:

    $${\mathbf{D}}_{{\left( {{\text{t}} + 1} \right)}} = \frac{{{\text{tr}}\left( {{\mathbf{D}}_{\left( 1 \right)} } \right)}}{{{\text{tr}}\left( {{\mathbf{D}}_{{\left( {{\text{t}} + 1} \right)}} } \right)}}{\mathbf{D}}_{{\left( {{\text{t}} + 1} \right)}} ;$$
  6. 6.

    \({\mathbf{G}}_{{\left( {{\text{t}} + 1} \right)}} = {\mathbf{ZD}}_{{\left( {{\text{t}} + 1} \right)}} {\mathbf{Z}}^{{\prime }}\uplambda\) was calculated;

  7. 7.

    \({\text{t }} = {\text{t }} + 1\) and loop to step 2.

This procedure was run for three iterations based on the realized accuracies of GEBV according to Legarra et al. [31] and performed by Wang et al. [14]. At each iteration, the weights for SNPs were updated (steps 4 and 5), used to construct the \({\mathbf{G}}\) matrices (step 6), update the GEBV (step 2) and, consequently, the estimated SNP effects (step 3). The percentage of genetic variance explained by the \({\text{i}}\)-th set of consecutive SNPs (\({\text{i}}\)-th SNP window) was calculated as described by Wang et al. [14] as:

$$\frac{{Var \left( {a_{\text{i}} } \right)}}{{\sigma_{a}^{2} }} \times 100\% = \frac{{Var\left( {\sum_{j = 1}^{x} {\mathbf{Z}}_{j} \hat{u}_{\text{j}} } \right)}}{{\sigma_{a}^{2} }} \times 100\% ,$$

where \(a_{\text{i}}\) is the genetic value of the \({\text{i}}\)-th SNP window that consists of a region of consecutive SNPs located within 0.4 Mb, which is the average haplotype block size in commercial pig lines [32], including the lines considered in the present study; \(\sigma_{a}^{2}\) is the total additive genetic variance; \({\mathbf{Z}}_{j}\) is the vector of gene content of the \(j\)-th SNP for all individuals and \(\hat{u}_{j}\) is the effect of the \(j\)-th SNP within the \({\text{i}}\)-th window. Manhattan plots showing these windows were created using the R software [33].

Selection of relevant SNP windows, search for candidate genes, and gene network analyses

The selection of relevant SNP windows and the search for genes within these QTL regions were performed in three steps. First, for the four traits (MOT, PROMOT, lnNcells and ABN), the SNP windows that explained 1% or more of the total genetic variance based on the WssGWAS were selected within each line (L1 and L2). The threshold of 1% was chosen based on the literature [19, 20, 34] and on the expected contribution of SNP windows [35]. For example, assuming an equal contribution of all windows (on average, we had 4223 and 4229 windows for each trait in L1 and L2, respectively), the expected proportion of genetic variance explained by each window was 0.02% for both lines (100/4223 for L1 and 100/4229 for L2). The threshold of 1% used in the present study is equal to 50 times the expected variance (0.02% × 50 = 1%) and considered a suitable threshold for our purposes. Then, after selecting the windows that explained more than 1% of the genetic variance, a search for overlap** windows for two or more traits of the same line was performed. Windows were considered to overlap if their midpoints were less than 0.4 Mb apart. Second, the three most important windows (that explained the highest proportion of genetic variance) for each trait and the 0.4 Mb region on either side of these windows midpoints were also identified. Third, based on the windows selected in the first step (> 1% of variance explained), overlap** windows for the same traits, but across lines, were investigated (also considering a maximum distance of 0.4 Mb between midpoints).

Based on the start and end positions of each identified and selected QTL region in the third step, we identified the genes in these QTL regions based on the Gene database for Sus scrofa available at “National Center for Biotechnology Information” (NCBI, [36]). For all the identified genes, we manually searched the literature if they had a previously identified relationship with the traits under study. In addition, based on human genes with the same description, we carried out four gene network analyses that described the biological processes and relations between the L1 and L2 sets of genes that were identified for the same traits by using the ClueGO and CluePedia Cytoscape plug-ins [37, 38]. The ClueGO plug-in combines Gene Ontology (GO) terms and KEGG/BioCarta pathways and develops a GO/pathway network. It also calculates enrichment and depletion tests for groups of genes based on the hypergeometric distribution and corrects the P-values for multiple testing [37]. Using the CluePedia plug-in, new associations between genes can be discovered with enrichments and added to the ClueGO pathways [38].

Results

Estimates of variance components for all semen traits in both lines are in Additional file 1: Table S1. Low to moderate heritabilities ranging from 0.10 to 0.31 were estimated (Table 2), with higher estimates in L1 than L2 for MOT, PROMOT and ABN.

Table 2 Estimates (standard error) of heritabilities for the evaluated semen traits in two lines

For L1, the three most important windows for MOT, PROMOT, lnNcells and ABN explained 14.4, 18.2, 15.67 and 18.8% of the genetic variance, respectively (Fig. 1) and Additional file 2: Table S2. For L2, the three most important windows explained 21.9, 18.7, 18.3 and 13.8% of the genetic variance of each trait, respectively (Fig. 2) and Additional file 3: Table S3. For L1, 20 relevant QTL regions (single and overlap**) were found on SSC1 (SSC for Sus scrofa chromosome), 3, 4, 5, 6, 8, 9, 10, 12, 13, 14, and 15 (Table 3). For L2, 16 relevant QTL regions were located on SSC1, 2, 3, 6, 7, 8, 9, 11, 13, 15 and 18 (Table 4). For L1, 110 genes located in the relevant QTL regions were identified, of which six genes were described to be related to MOT, PROMOT, lnNcells and ABN in the literature (Table 3). For L2, 106 genes were found in the detected relevant regions, of which five genes were related to MOT, PROMOT, lnNcells and ABN based on literature (Table 4).

Fig. 1
figure 1

GWAS results of semen traits in the Large White type line (L1). a Sperm motility, b sperm progressive motility, c number of sperm cells per ejaculate, d total morphological abnormalities. Each dot represents one SNP window of 0.4 Mb. On the y-axis is the percentage of genetic variance explained by windows

Fig. 2
figure 2

GWAS results of semen traits in the Landrace type line (L2). a Sperm motility, b sperm progressive motility, c number of sperm cells per ejaculate, d total morphological abnormalities. Each dot represents one SNP window of 0.4 Mb. On the y-axis is the percentage of genetic variance explained by windows

Table 3 Individual and overlap** QTL regions for semen traits in L1
Table 4 Individual and overlap** QTL regions for semen traits in L2

We found no overlap** QTL regions between the two lines for the same traits in this study. Nevertheless, the gene network analysis for PROMOT revealed two genes in L1 (PLA2G4A and PTGS2) and one gene in L2 (HPGDS) that shared the biological processes of eicosanoid biosynthesis and arachidonic acid metabolism. The genes PTGS2 in L1 and HPGDS in L2 were also found to share the biological process of cyclooxygenase pathway (Fig. 3).

Fig. 3
figure 3

Gene network of biological processes for progressive motility. Complete network and important shared pathways (with zoom) are shown. Blue color indicates pathways for the Large White type line (L1) and green color indicates pathways for the Landrace type line (L2). Processes shared by PTGS2 and PLA2G4A genes (L1) and HPGDS gene (L2) are connected by blue nodes. Processes shared by PTGS2 and HPGDS are connected by grey nodes. Green dots are biological processes for HPGDS

Discussion

In the past years, interest in investigating genomic regions that control boar semen traits has increased due to advances in molecular and genoty** techniques, statistical methods, and the ease of GWAS application. According to Wang et al. [12], two of the most used GWAS methods are (1) single-SNP GWAS, which fits each SNP separately as a fixed effect in a model that accounts for population stratification; and (2) Bayesian methods, which analyze all SNPs at the same time. Nonetheless, when the number of animals that are both phenotyped and genotyped is relatively small, the application of single-SNP GWAS and Bayesian methods is limited, since the calculation of pseudo-phenotypes (i.e. deregressed breeding values [39]) is necessary [14]. An alternative method is single-step GWAS (ssGWAS), which was proposed by Wang et al. [12] and converts the GEBV of genotyped animals obtained by ssGBLUP into SNP effects. However, the ssGWAS model, which assumes equal variance for all marker effects, may be limited when traits are affected by large QTL [12, 40]. To overcome this limitation, Wang et al. [12] proposed the WssGWAS approach, in which SNP effects are weighted according to their importance for the trait of interest, which improves QTL detection [12].

To perform the WssGWAS, first, the \({\mathbf{H}}^{ - 1}\) matrix is calculated, by combining all known pedigree and genotype information, and then used in the ssGBLUP procedure to estimate GEBV for all animals. Then, the GEBV of the genotyped animals are used to estimate effects for the SNPs. Finally, SNP effects are used to calculate the percentage of genetic variance that is explained by sets of consecutive SNPs (SNP windows). In this approach, the SNP effects are not directly estimated from the model and there are no measures of uncertainty for the statistical tests. However, the WssGWAS provides information on the most important SNP windows, based on the proportion of explained genetic variance, which is a general concept that is widely accepted in QTL detection analyses [12,13,14] but does not allow for a formal testing of significance. In this study, we chose the WssGWAS approach because: (1) it can integrate all phenotypic, genotypic and pedigree data simultaneously, thus avoiding the need to calculate pseudo-phenotypes for genotyped animals to incorporate all phenotypic information; (2) it allows the use of different weights for SNPs according to their importance, which is a deviation from the non-realistic GBLUP assumption of the infinitesimal model and improves the precision of estimates of SNP effects; and (3) it provides the possibility to work with SNP windows, since a window of consecutive SNPs in the GWAS may be more successful in finding QTL regions compared to individual SNP analysis because of linkage disequilibrium (LD). In general, analyses that consider the association of individual SNPs may overestimate the number of detected QTL [41].

In spite of the small number of animals that were both genotyped and phenotyped (349 for L1 and 446 for L2), all the information of the 3737 and 3307 genotyped animals, the 866 and 900 phenotyped boars (849 and 886 for ABN), and the 6724 and 6502 animals in the pedigree file, for L1 and L2, respectively, were used to calculate GEBV, and consequently, to estimate SNP effects in the WssGWAS. Obtaining a large dataset of both genotyped and phenotyped animals is unlikely or difficult for semen traits because phenotypes can be recorded only for animals that are used for AI. The number of males and females genotyped in this study also shows the relatively small number of boars, 856 and 953 for L1 and L2, respectively, compared to the number of females, 2881 and 2354. In this context, the advantage of WssGWAS is to supply additional information on relatives without genotypes in order to improve the statistical power of QTL detection [12].

In our previous study [7], a classical GWAS (single-SNP GWAS) was performed for sperm motility in a subset of the L1 and L2 populations that were used in the present study: 645 and 1886 phenotyped and genotyped animals for L1, respectively, and 760 and 1972 animals for L2. Due to the small number of animals that were both genotyped and phenotyped, deregressed breeding values [39] were calculated and included as response variables in the GWAS model, a polygenic effect was added in the model to account for population structure, and a genome-wide false discovery rate (FDR) was applied (q-values) to avoid false positives due to multiple testing. The results showed no SNPs with a significant association (q-value ≤ 0.05) with sperm motility for L1 but six SNPs associated with the trait for L2 (SSC1, from 117.26 to 119.50 Mb). In this QTL region on SSC1, a novel candidate gene (MTFMT) that affects translation efficiency of proteins in sperm cells, was identified. The number of relevant QTL regions that was identified for MOT in the present study was greater than in [7], which indicates that the WssGWAS was more successful in detecting QTL. All the recommended steps for the GWAS method used were performed in each study and reliable candidate genes with biological meaning were found. In addition, Marques et al. [42] reported high genetic correlations between MOT, PROMOT and ABN, which corroborate our findings regarding the overlap** QTL regions that explained more than 1% of the genetic variance for these traits. Therefore, we believe that the QTL results of the present and previous studies should not be considered as false positives and we conclude that the single-SNP method with pseudo-phenotypes and the amount of data in the previous study were not able to identify some of the relevant QTL related to MOT. For L1, we identified overlap** QTL regions for MOT and other semen traits on SSC1 (Table 3), but they did not include the MTFMT gene described in our previous study [7], likely as a result of the larger number of animals used in the current study and the different statistical models used in the two studies, i.e. WssGWAS based on the \({\mathbf{H}}\) relationship matrix in the current study and single-SNP GWAS using the \({\mathbf{A}}\) relationship matrix and deregressed phenotypes in the previous study.

In the WssGWAS, we identified several relevant QTL regions associated with the traits under study, which confirms the assumption that these traits have a complex genetic determinism. In this study, the region used to search for candidate genes was not limited to the SNP window, but also included the upstream and downstream flanking regions. The use of a larger genomic region for the identification of genes is important because SNPs within a window may be in high LD with QTL in the surrounding area.

We detected several candidate genes for the semen traits in both L1 and L2 pig lines (Tables 3 and 4). For L1, the dynein axonemal intermediate chain 2 (DNAI2) gene, located on SSC12, was considered the best candidate gene for MOT in the QTL region. The DNAI2 gene encodes axonemal dyneins; the axoneme is a microtubular structure located in the center of all motile cilia and flagella, including sperm flagella [43]. Dyneins are large multisubunit ATPases that interact with microtubules to generate the driving force for flagellar motility [44]. Mutations in the human DNAI2 are involved in defects of the axoneme [45].

For L1, some overlap** QTL regions were identified between MOT and PROMOT. On SSC5, we identified one candidate gene, i.e. sodium voltage-gated channel alpha subunit 8 (SCN8A). Pinto et al. [46] described the expression of SCN8A in human sperm flagellum principal piece and showed that sodium channels contribute to the regulation of human sperm motility. Another candidate gene for both traits was identified on SSC13, i.e. IQ motif containing G (IQCG). Li et al. [10] showed that Iqcg knockout mice presented severe malformation and total immobility of their spermatozoa because of disorganized sperm flagellum axoneme. Harris et al. [47] reported that mice with mutations in the Iqcg gene presented spermiogenesis defects, with incomplete sperm tail formation.

We detected one overlap** QTL region in L1 for MOT, PROMOT and ABN on SSC14. This QTL region includes the gene LOC102167830, which is described as a spermatogenesis associated (SPATA) protein 31E1-like. SPATA genes form a large gene family that plays a very important role in testis development and spermatogenesis [61], the METTL3 protein catalyzes the methylation and formation of N6-methyladenosine (m6A), which is the most prevalent and reversible RNA epigenetic modification in mammalian mRNA. Yang et al. [4), which includes the spermatogenesis associated 7 gene (SPATA7). This gene was first identified in rat and human spermatocytes and may be involved in preparing chromatin for the initiation of meiotic recombination [63]. According to Ferguson et al. [64], meiotic recombination ties homologous chromosomes together and facilitates proper segregation of chromosomes during meiosis. Errors in the formation of crossovers can result in the production of aneuploid gametes. Sun et al. [65] reported a higher frequency of sperm aneuploidies for some chromosomes in men with sperm morphological defects compared to men with normal sperm concentrations. Thus, the SPATA7 gene in this QTL region on SSC7 is considered the best candidate gene for MOT and ABN.

Conclusions

We identified several QTL regions that are associated with semen traits in two pig lines using the weighted single-step GWAS, which allowed detection of QTL in spite of the small number of animals having both phenotypes and genotypes. A large part of the genetic variance of the semen traits was explained by different genes in the two lines but the gene network analysis revealed candidate genes for these two lines that are involved in shared biological pathways in the mammalian testes. These results can be used to search for causative mutations and for marker-assisted selection to enhance the production and quality of semen for a more efficient use of AI in pig breeding and production.