Introduction

Sheep (Ovis aries) were among the first domesticated livestock, providing humans with meat, milk, fur, and wool products. Domestication, together with natural and artificial selection, has resulted in remarkable phenotypic diversity in animal appearance, growth rate, local adaptation, and fertility rate [14] and Trimmomatic (version 0.36) [15]. High-quality paired-end reads were mapped to the sheep reference genome OAR4.0 using the BWA-MEM alignment tool [18]. Finally, all SNPs were annotated with ANNOVAR [19] according to NCBI’s gene annotation database.

Population structure and genomic diversity analysis

Based on the autosomal genetic variants, PLINK v1.9 [20] was used to calculate pairwise genetic distances between individual sheep. MEGA v7.0 [21] was then used to construct a neighbor-joining (NJ) tree from the genetic distance matrix. The fourfold degenerate sites were also used to build maximum-likelihood (ML) and NJ trees. Principal component analysis of all sheep was conducted using vcftools and PLINK with the parameters ‘--maf 0.05 --max-missing 0.9 --chr-set 26’. Nucleotide diversity (π) was calculated using vcftools with the parameters ‘--window-pi 50000 --window-pi-step 25000’. The PopLDdecay software [22] was used to calculate r² (-minMAF 0.05 -hwcutoff 0.001 -Het 0.88 -Miss 0.25) for pairs of SNPs and to plot the linkage disequilibrium (LD) decay curves. To remove the bias introduced by differing sample sizes, individuals in each population were randomly sampled to maintain a consistent sample size during the calculations (7 individuals per group). Only SNPs with a minor allele frequency (MAF) greater than 0.05 were considered.
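To illustrate what a windowed π value represents, the following is a minimal Python sketch of nucleotide diversity in 50 kb windows with a 25 kb step. It is not the vcftools implementation: it assumes biallelic SNPs, no missing genotypes, and that all non-listed sites in a window are monomorphic. The function names and the input layout (position, alternative-allele count, number of sampled chromosomes) are hypothetical.

```python
def site_pi(alt_count, n_chrom):
    """Per-site nucleotide diversity for a biallelic SNP.

    Fraction of chromosome pairs that differ at the site:
    2k(n - k) / (n(n - 1)), with k alternative alleles among n chromosomes.
    """
    if n_chrom < 2:
        return 0.0
    return 2.0 * alt_count * (n_chrom - alt_count) / (n_chrom * (n_chrom - 1))


def windowed_pi(snps, chrom_length, window=50_000, step=25_000):
    """Average pi per base pair in sliding windows.

    snps: iterable of (position, alt_count, n_chrom), 1-based positions.
    Returns a list of (start, end, pi) tuples; the last windows are
    truncated at the chromosome end.
    """
    snps = list(snps)
    results = []
    start = 1
    while start <= chrom_length:
        end = min(start + window - 1, chrom_length)
        total = sum(site_pi(k, n) for pos, k, n in snps if start <= pos <= end)
        results.append((start, end, total / (end - start + 1)))
        start += step
    return results


# Toy example: 7 diploid individuals give 14 sampled chromosomes per site.
toy_snps = [(10, 7, 14), (30_000, 1, 14), (60_000, 7, 14)]
for start, end, pi in windowed_pi(toy_snps, chrom_length=100_000):
    print(start, end, f"{pi:.3e}")
```

With 7 individuals per group, each site is sampled from 14 chromosomes, which is why equalizing the sample size across populations keeps the per-site estimates comparable.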

Genomic selective sweep analysis

Selective sweep signals were identified using the population differentiation index (\({F}_{\mathrm{ST}}\); the DS group vs. the STH and MG groups) and the locus-specific branch length (LSBL) [23, 24], based on a sliding-window strategy (window size 50 kb; step size 25 kb). LSBL was estimated from the pairwise \({F}_{\mathrm{ST}}\) values [25] of each polymorphic site among three groups: Target (DS), Control (STH), and Background (MG). The formula LSBL = (\({F}_{\mathrm{ST}}\)(DS–STH) + \({F}_{\mathrm{ST}}\)(DS–MG) − \({F}_{\mathrm{ST}}\)(STH–MG))/2 was used. The threshold for identifying putative selection regions in the \({F}_{\mathrm{ST}}\) and LSBL analyses was empirically set at the top 1% of values. The putative genes under selection were submitted to DAVID [26] for Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis. Fisher’s Exact Test was used for p-value correction. Only terms with a p-value less than 0.05 were considered significant and listed. Data analysis and visualization were carried out with customized R scripts.
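The LSBL formula and the top 1% cutoff can be sketched in a few lines of Python. This is an illustrative sketch only, not the customized R scripts used in the study: it assumes window-level pairwise \({F}_{\mathrm{ST}}\) values have already been computed, and the function names and the dictionary of window scores are hypothetical.

```python
def lsbl(fst_ds_sth, fst_ds_mg, fst_sth_mg):
    """Locus-specific branch length of the target (DS) branch.

    LSBL = (FST(DS-STH) + FST(DS-MG) - FST(STH-MG)) / 2
    """
    return (fst_ds_sth + fst_ds_mg - fst_sth_mg) / 2.0


def top_fraction_cutoff(values, fraction=0.01):
    """Smallest score that still falls within the top `fraction` of values."""
    if not values:
        raise ValueError("no scores supplied")
    ranked = sorted(values, reverse=True)
    n_keep = max(1, int(len(ranked) * fraction))
    return ranked[n_keep - 1]


def outlier_windows(scores, fraction=0.01):
    """Window identifiers whose score is at or above the top-fraction cutoff.

    scores: dict mapping a window identifier to its FST or LSBL value.
    """
    cutoff = top_fraction_cutoff(list(scores.values()), fraction)
    return sorted(w for w, s in scores.items() if s >= cutoff)


# Toy example: a window strongly differentiated in DS but not between STH and MG.
print(round(lsbl(0.80, 0.70, 0.10), 3))
```

A large LSBL requires high differentiation of DS from both STH and MG together with low differentiation between STH and MG, so the statistic isolates changes specific to the DS branch.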

Results

Population structure and genomic diversity

Whole-genome sequencing was carried out at an average depth of 18.14× (Table S2) on ear tissues collected from sheep in different regions of Inner Mongolia, China. After rigorous filtering, a total of 25,375,422 high-quality SNPs were obtained for further analysis. Among them, 15,525,859 SNPs were in intergenic regions, while 171,462 SNPs were in exonic regions (Table S3). The genetic relationships between the sheep breeds were explored based on all the genetic variants and on fourfold degenerate sites. The phylogenetic tree constructed by the neighbor-joining (NJ) method showed that each breed population formed a distinct clade (DairyMeade sheep and East Friesian sheep, DS; Small Tail Han sheep, STH; Mongolian sheep, MG) (Fig. S2a). Similar genetic affinities were obtained in phylogenetic trees constructed by the NJ (Fig. S3a) and maximum-likelihood (ML) methods (Fig. S3b) using fourfold degenerate sites. Principal component analysis (PCA) also revealed distinct population structure among DS, MG and STH, and PC1 (4.06%) separated the fat-tailed and thin-tailed sheep (Fig. S2b). ADMIXTURE analysis revealed that the fat-tailed and thin-tailed sheep belonged to different clusters (K = 2), with no evidence of genetic exchange (Fig. S4).

The genetic diversity index was calculated based on the whole-genome genetic variants. Compared with STH and MG, DS showed a lower nucleotide diversity (DS, π = 2.533e−3; STH, π = 2.79e−3; MG, π = 2.87e−3) (Figs. S2c, S5) and a slower decay of linkage disequilibrium (LD dropped to half of its maximum at 79 kb in DS, compared with 62 kb in STH and 46 kb in MG) (Fig. S2d). These results suggest that the indigenous breeds (MG and STH) have higher genetic diversity, whereas bottlenecks and/or inbreeding have occurred in the two dairy sheep breeds.
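The half-decay distance quoted for each group can be read off an LD decay curve as follows. This is a minimal Python sketch rather than the PopLDdecay procedure: it assumes the curve is already summarized as mean r² per distance bin, and the function name and the toy curve are hypothetical values chosen for illustration only.

```python
def half_decay_distance(curve):
    """Distance at which mean r2 first falls to half of its maximum.

    curve: list of (distance_kb, mean_r2) pairs sorted by increasing distance.
    Returns the first distance whose r2 is at or below half the maximum,
    or None if the curve never decays that far.
    """
    if not curve:
        raise ValueError("empty LD curve")
    half = max(r2 for _, r2 in curve) / 2.0
    for distance_kb, r2 in curve:
        if r2 <= half:
            return distance_kb
    return None


# Toy curve (not data from this study): maximum r2 of 0.6, so the target is 0.3.
toy_curve = [(1, 0.60), (10, 0.45), (50, 0.31), (80, 0.29), (200, 0.20)]
print(half_decay_distance(toy_curve))
```

A longer half-decay distance means that LD persists over larger physical distances, which is the pattern expected after a bottleneck or under inbreeding.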

Selective signatures in fat- and thin-tailed sheep

Tail size was the prominent phenotypic difference between DS and MG/STH. We analyzed the inter- and intra-population diversity of the highly significant sweep regions to explore the genetic basis underlying fat deposition in the tail. The population differentiation index \({F}_{\mathrm{ST}}\) and the LSBL of DS, STH and MG were calculated on a sliding-window basis (50 kb window with a 25 kb step) to detect candidate divergent regions. A total of 798 genomic regions showed increased differentiation between DS and STH–MG (\({F}_{\mathrm{ST}}\) > 0.42; LSBL > 0.435; both at the top 1% threshold) (Fig. 1a; Table S4). In total, 510 protein-coding genes shared between the two methods (619 and 614 genes were identified by \({F}_{\mathrm{ST}}\) and LSBL, respectively) showed signatures of selection (Table S5), accounting for 1.96% of all annotated genes in the genome (26,076 in total). The functional enrichment analysis (in terms of KEGG) of the genes under selection revealed that the overrepresented functional categories were associated with cell growth and immunity, such as focal adhesion (adjusted p-value = 0.00086) and the T cell receptor signaling pathway (adjusted p-value = 0.0013) (Table S6).
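The gene-level bookkeeping in this paragraph amounts to a set intersection followed by a percentage. The Python sketch below shows the operation under stated assumptions: the toy gene lists are hypothetical and only illustrate the intersection, while the counts passed to the percentage function are the ones reported above.

```python
def shared_genes(genes_fst, genes_lsbl):
    """Genes detected by both the FST scan and the LSBL scan."""
    return set(genes_fst) & set(genes_lsbl)


def percent_of_annotated(n_genes, n_annotated):
    """Share of the annotated gene set, in percent."""
    return 100.0 * n_genes / n_annotated


# Hypothetical toy lists; only the intersection is kept as a candidate.
toy_fst = ["GLIS1", "PDGFD", "T", "GENE_A"]
toy_lsbl = ["GLIS1", "PDGFD", "T", "GENE_B"]
print(sorted(shared_genes(toy_fst, toy_lsbl)))

# Reported counts: 510 shared genes out of 26,076 annotated genes.
print(round(percent_of_annotated(510, 26_076), 2))
```

Requiring support from both statistics is a conservative choice: a gene flagged by only one scan is excluded from the candidate list.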

Among the candidate divergent regions, two putative sweeps showed the highest population differentiation scores. One was located on chromosome 1 (LSBL = 0.86 and \({F}_{\mathrm{ST}}\) = 0.79), as displayed in the Manhattan plots (Fig. 1a). This region, from 27.75 to 27.86 Mb, harbors only the GLIS1 gene (Fig. 1b). Further haplotype analysis showed that the haplotype pattern in DS was strikingly different from that in STH and MG (Figs. 1c, S6). A nonsynonymous point mutation (g.27807636G>T) found within GLIS1 in STH–MG resulted in a Pro107 → Thr (P107T) substitution, making STH–MG different from DS and other thin-tailed mammals at this locus (Fig. 2). The second putative sweep appeared in a locus on chromosome 13 (LSBL = 0.82 and \({F}_{\mathrm{ST}}\) = 0.78) harboring three pseudogenes, LOC101117953, LOC101118207 and LOC101110166 (Fig. S7). Another genomic region (from 3.825 to 3.90 Mb) on chromosome 15, which harbors the PDGFD gene, a member of the platelet-derived growth factor family, also exhibited strong selection signatures (LSBL = 0.92, 0.93) between DS and STH–MG (Fig. S8). Other genes related to sheep tail traits, such as T (LSBL = 1.02, \({F}_{\mathrm{ST}}\) = 0.53), were also found in this study.

Fig. 1

Selective-sweep analysis comparing the genomes of thin-tailed DS (dairy sheep, DairyMeade and East Friesian) and fat-tailed STH–MG (Small Tail Han sheep and Mongolian sheep). (a) Distribution of the population differentiation index (\({F}_{\mathrm{ST}}\), top panel) and the locus-specific branch length (LSBL, bottom panel) between DS and STH–MG in 50 kb sliding windows with a 25 kb step across all autosomes. (b) π and LSBL values around the genomic region on chromosome 1 (from 27.4 to 28.4 Mb) in the DS and STH–MG populations. GLIS1 is located in this genomic sweep region. Red, green and blue represent the DS, STH and MG populations, respectively. (c) Haplotype pattern of the selective-sweep region, defined by SNPs that are at a high frequency in DS and at a low frequency in STH–MG. Each column is a polymorphic genomic location, each row is a phased haplotype, and the colored column on the left denotes the population identity of the individuals. The reference/alternative allele is indicated in light yellow/green. (Color figure online)

Fig. 2

Alignment of the amino acid sequences of the GLIS1 protein in different mammals. Positions at which the amino acids differ are highlighted in grey.

Discussion

Dairy sheep breeds with a typical thin-tail phenotype, including DairyMeade and East Friesian, have recently been crossed with fat-tailed Mongolian sheep and Small Tail Han sheep to create a new dairy sheep breed in Inner Mongolia, China. To our surprise, all the F1 sheep have significantly thinner tails than Mongolian sheep or Small Tail Han sheep, without character segregation. Since these two dairy sheep breeds had not previously been used in studies of tail fat deposition, we collected all the available pedigrees of DairyMeade and East Friesian in China and compared their genomes with those of local fat-tailed Mongolian sheep and Small Tail Han sheep.

The fat tail phenotype in sheep results from the action of multiple genes. Using new genomic materials, this study revealed a recent selective sweep at the GLIS1 locus in the ovine genome. GLIS1 is a zinc finger protein that acts as both an activator and a repressor of transcription [27]. During mouse embryonic development, GLIS1 expression begins in the forelimb, hindlimb and tail at 10.0 days post coitum (dpc); at 10.5 dpc it is expressed in the anterior region of the forelimb, the ventral part of the body and the tail, and expression increases at 11.0 dpc, which is consistent with mesoderm differentiation [28]. In a recent study, GLIS1 was recognized as a novel pro-adipogenic transcription factor. It is highly expressed in bipotent muscle satellite cells. When it is overexpressed, increased occupancy of GLIS1 is observed at the promoters of the adipogenic genes Adipoq, Cebpa and Ucp1, and it drives brown adipogenesis both in vitro and in vivo [29]. However, the role of GLIS1 in sheep has not been extensively studied. A SNP in GLIS1 affects feed efficiency in Dual Purpose and Blackface rams [30], which may also be related to differences in the muscle-to-fat ratio of the carcass. DS and MG/STH show remarkable differences in growth rate and tail phenotype. In both newborn and adult DS, almost no fat is deposited in the tail, whereas MG and STH deposit a large amount of fat in the ventral region of the tail and in subcutaneous tissue. Notably, fat deposition in the ventral region of the tail was observed as early as the postnatal stage, indicating that the tail phenotype is determined during fetal development. Thus, it could be an innate adaptation that helps MG and STH face the cold and food scarcity of the lambing season (March to April) in northern China.
Taken together, we hypothesize that GLIS1, as a pro-adipogenic factor, plays a key role in mesodermal cell differentiation during fetal development in fat-tailed sheep, initiating the differentiation of preadipocytes and fat accumulation.

Previous studies reported that LOC101117953 and BMP2 (bone morphogenetic protein 2, from 48,387,181 to 48,400,679 bp on chromosome 13) were related to tail-fat deposition [6,7,8]. LOC101117953 is a retro-copy of PPP1CC (protein phosphatase 1 catalytic subunit gamma) that is not expressed in adult tissues because it lacks a promoter region, and it is thus less likely to be the causative gene for the tail phenotypes [8]. Previous studies also revealed that PDGFD, which promotes proliferation and inhibits differentiation of preadipocytes, is a likely causal gene for fat deposition in the sheep tail [11, 31,32,33]. Two SNPs in PDGFD significantly affect tail length and width [34]. T, a key regulator of mesoderm formation during early development, was found to be related to the short-tail phenotype in Hulunbuir sheep, a subpopulation of Mongolian sheep [35]. It may also be related to the differences in caudal vertebra phenotype between DS and STH/MG, as DS have long straight tails while STH/MG have relatively shorter tails with a slightly curved tip.

This study revealed that the ovine genome has recently undergone a selective sweep at the GLIS1 locus. As a novel pro-adipogenic transcription factor, GLIS1 may initiate the accumulation and differentiation of preadipocytes in the tail during fetal development and thereby affect the tail phenotype in sheep.

Conclusions

The fat tail in sheep results from the action of multiple genes. This study demonstrated that GLIS1, LOC101117953, PDGFD and T have undergone recent selective sweeps. A nonsynonymous point mutation (g.27807636G>T) within the GLIS1 locus in STH–MG resulted in a Pro to Thr substitution. As a pro-adipogenic factor, GLIS1 may play critical roles in mesodermal cell differentiation during fetal development and affect fat deposition in sheep tails. This study gives new insight into the genetic basis of species-specific adaptive traits in sheep and provides a novel opportunity to develop therapies for complex diseases related to fat metabolism.