Background

Over the past decade, genome-wide association studies (GWAS) have successfully discovered hundreds of thousands of genetic variants associated with complex traits and diseases in both human and livestock species [1,2,3]. As the majority of these variants are located in noncoding regions [4], it is challenging to understand how they impact complex phenotypes. Previous studies have illustrated that trait-associated variants are significantly enriched in regulatory regions, such as promoters and enhancers, in well-studied species (e.g., humans and mice) [5, 6]. Currently, global efforts such as the Functional Annotation of Animal Genomes (FAANG) initiative and the Farm animal Genotype-Tissue Expression (FarmGTEx) project are working to uncover basic knowledge of genomic function and regulation in livestock species [7,8,9]. However, a comprehensive atlas of regulatory elements (REs) is still lacking for most livestock species, which limits our understanding of the functional biology of species evolution and restricts the genetic improvement of complex traits in livestock. As abundant terrestrial herbivores [10], ruminants, such as cattle, sheep, and goats, have a unique history of species differentiation and play an important economic role in modern animal husbandry. Therefore, a comprehensive comparison of REs between major ruminants and other species will provide novel insights into functional genome evolution specific to ruminants. Moreover, it will allow us to explore the genetic basis underlying complex traits of economic value in these farm animal species.

The emergence of cross-species comparative epigenomics has provided a new approach both for elucidating genome evolution and for identifying potential functional noncoding variants associated with complex traits and diseases [11]. By comparing the chromatin landscapes of primary aortic endothelial cells during the acute NF-κB response among humans, mice, and cattle, Alizada et al. found that inflammation- and cardiovascular-associated genetic variants discovered by GWAS were significantly enriched in species-conserved regulatory regions near NF-κB target genes [12]. In addition, through cross-species mapping of epigenomic marks, Liu et al. found that the genetic control of immune and reproductive traits is conserved to a certain degree between humans and cattle [13]. These findings indicate that evolutionarily conserved REs play key roles in shaping complex phenotypes across species [14, 15]. Although previous studies have investigated the evolution of the transcriptome (e.g., long noncoding RNAs) in ruminants [8, 16], a comprehensive comparison of epigenetic regulation and its potential impacts on other molecular phenotypes and complex traits is still lacking.

Here, using the liver as a representative tissue, we systematically detected and functionally characterized the epigenomic landscapes and explored the dynamics of REs across three ruminant species (i.e., cattle, sheep, and goat) and three non-ruminant species (i.e., pig, mouse, and human). By integrating 137 multi-omics datasets, including histone modifications, gene expression, chromatin accessibility, and DNA methylation (Additional file 1: Fig. S1), we annotated an average of 81,081 REs per species (17,154 promoters and 63,927 enhancers). By detecting lineage-specific REs and associating them with expression quantitative trait loci (eQTLs) and large-scale GWAS datasets for 44 and 52 complex traits in cattle and humans, respectively, we further explored how cross-species comparative epigenomics can help us understand the evolutionary and genetic mechanisms of complex phenotypes. Overall, our study provides a valuable resource of REs in ruminants and highlights the key roles of conserved functional elements in complex traits in both human and livestock species.

Results

Overview of multi-omics datasets

To study epigenomic changes during ruminant evolution, we performed chromatin immunoprecipitation sequencing (ChIP-seq) for H3K27ac and H3K4me3 in the livers of cattle, sheep, and goats (Fig. 1A). In total, we generated 17 ChIP-seq datasets, with three biological replicates per species. We also generated nine RNA sequencing (RNA-seq) datasets and nine whole-genome bisulfite sequencing (WGBS) datasets from the same liver samples. We further retrieved a total of 41 public liver datasets (H3K27ac and H3K4me3 ChIP-seq, WGBS, and RNA-seq) from three non-ruminant species (i.e., pigs, humans, and mice), each with at least two biological replicates (Additional file 2: Table S1A-C) [17, 18]. We processed all data with the same analysis pipeline so that the previously generated human, mouse, and pig datasets were comparable to the newly generated ruminant datasets. We also collected datasets from seven other cattle tissues to investigate the dynamic epigenetic landscape across tissues [19]. Overall, we uniformly analyzed 35 new genome-wide omics datasets from the livers of three ruminants and integrated them with 102 previously published datasets. After filtering low-quality reads, we obtained over 25 billion mapped reads with an average mapping rate of 91.24% (Fig. 1A and Additional file 2: Table S1A-D).

Fig. 1

Summary and characterization of 137 epigenetic and gene expression datasets in six mammals. A Datasets analyzed in this study. B The number of regulatory regions (promoters and enhancers) identified in the liver of each species. C Fold enrichments of regulatory elements (REs) for 14 chromatin states previously predicted in cattle and pig liver [19]. These chromatin states mainly represent enhancers (CTCF/Enhancer, Active_Enhancer, and Primed_Enhancer), promoters (CTCF/Promoter, Active_Promoter, Promoter, and Poised_Promoter), repressed regions (Insulator, Low_Signal, and Polycomb_Repressed), open regions (Active_Element), and TSS-proximal regions (CTCF/TSS, Active_TSS, and Flanking TSS). D The percentages of REs in cattle liver that overlapped with public data (blue) or were newly annotated in this study (orange). E Sample clustering based on pairwise Spearman correlation of gene expression. F Similarity of sample clustering patterns across different omics data types, measured by the Rand index
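The Rand index used in panel F scores how often two clusterings agree on whether a pair of samples belongs together (same cluster in both, or different clusters in both). A minimal sketch, assuming each clustering is given as one cluster label per sample in the same sample order; the function name and example labels are hypothetical and not taken from the study:

```python
from itertools import combinations
from typing import Hashable, Sequence


def rand_index(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Fraction of sample pairs on which two clusterings agree.

    A pair agrees if both clusterings put the two samples together,
    or both put them apart. Label values themselves do not matter.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same samples")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two samples")
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical clusterings of six liver samples from two data types.
expression_clusters = [0, 0, 1, 1, 2, 2]
methylation_clusters = [0, 0, 1, 1, 1, 2]
print(rand_index(expression_clusters, methylation_clusters))  # 12 of 15 pairs agree
```

Because only pairwise co-membership is compared, relabelling clusters (for example, 0/1 versus 5/7) does not change the score.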

Through signal saturation analysis, we found that 20 million reads were required to reach saturation of consistent peak detection for H3K27ac and H3K4me3 in single-end ChIP-seq samples, whereas 37.5 million reads were required for paired-end samples (Additional file 1: Fig. S2A and S2B). We detected 66,000–108,000 (mean = 83,338) H3K27ac-enriched regions and 18,000–29,000 (mean = 23,576) H3K4me3-enriched regions in the liver (q < 0.01, Additional file 1: Fig. S2C and S2D). We then defined two categories of REs: (1) promoters, which were marked by both H3K4me3 and H3K27ac, and (2) enhancers, which were enriched for H3K27ac only. We identified an average of 81,081 REs per species, including 63,927 enhancers and 17,154 promoters (Fig. 1B and Additional file 3: Table S2). Enhancers exhibited higher tissue specificity than promoters: the majority (85.94%) of liver enhancers were tissue-specific, compared with only 22.92% of promoters (Additional file 1: Fig. S2E). Consistent with previous findings [17], 78.09% of all promoters were located near transcription start sites (TSSs; distance ≤ 5 kb), whereas most (77.88%) enhancers were distal to TSSs (distance > 5 kb) (Additional file 1: Fig. S3A). We calculated the fold enrichment of REs for 14 chromatin states previously predicted by ChromHMM [19] and observed that these REs were significantly enriched for the corresponding chromatin states (Fig. 1C and Additional file 1: Fig. S3B and S3C). For instance, enhancers were significantly enriched for “Active_Enhancer” and “CTCF/Enhancer” (fold enrichment = 21.78 and 16.61, respectively). We further validated that over 70% of our newly detected promoters and over 45% of our newly detected enhancers overlapped with REs identified from publicly available liver datasets of cattle, pigs, and mice (Fig. 1D and Additional file 1: Fig. S3D and S3E) [17, 19, …].
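The RE definitions, the 5 kb TSS cutoff, and the fold-enrichment statistic described above can be illustrated with a small interval-based sketch. It is not the study's pipeline: it assumes peaks are half-open `(chrom, start, end)` tuples, treats an H3K27ac peak that overlaps any H3K4me3 peak as a promoter (the exact overlap rule is an assumption), and uses the common ratio (bases in both RE and state / bases in RE) / (bases in state / genome size). All function names and coordinates are hypothetical, and the naive pairwise scans are only suitable for toy inputs.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open [start, end)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def classify_res(h3k27ac: List[Interval], h3k4me3: List[Interval]) -> Dict[str, List[Interval]]:
    """Hypothetical rule: H3K27ac peak + any H3K4me3 overlap -> promoter, else enhancer."""
    promoters, enhancers = [], []
    for peak in h3k27ac:
        if any(overlaps(peak, mark) for mark in h3k4me3):
            promoters.append(peak)
        else:
            enhancers.append(peak)
    return {"promoter": promoters, "enhancer": enhancers}


def distance_to_nearest_tss(region: Interval, tss_by_chrom: Dict[str, List[int]]) -> float:
    """Distance in bp to the closest TSS (0 if one lies inside); TSS lists must be sorted."""
    chrom, start, end = region
    tss = tss_by_chrom.get(chrom, [])
    i = bisect_left(tss, start)  # index of the first TSS at or after the region start
    best = float("inf")
    for t in tss[max(i - 1, 0): i + 1]:  # only the neighbours on each side can be closest
        if t < start:
            best = min(best, start - t)
        elif t >= end:
            best = min(best, t - (end - 1))
        else:
            return 0
    return best


def tss_category(distance: float, cutoff: int = 5_000) -> str:
    """Proximal if within the cutoff (5 kb, as in the text), otherwise distal."""
    return "proximal" if distance <= cutoff else "distal"


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals so no base is counted twice."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def fold_enrichment(res: List[Interval], state: List[Interval], genome_size: int) -> float:
    """(bp in RE and state / bp in RE) / (bp in state / genome size)."""
    res_m, state_m = merge(res), merge(state)
    both = sum(
        max(0, min(a[2], b[2]) - max(a[1], b[1]))
        for a in res_m for b in state_m if a[0] == b[0]
    )
    re_bp = sum(e - s for _, s, e in res_m)
    state_bp = sum(e - s for _, s, e in state_m)
    return (both / re_bp) / (state_bp / genome_size)


# Toy usage with made-up peaks.
res = classify_res(
    [("chr1", 100, 600), ("chr1", 10_000, 10_800), ("chr2", 50, 300)],
    [("chr1", 400, 700)],
)
tss = {"chr1": [500, 20_000]}
for region in res["enhancer"]:
    print(region, tss_category(distance_to_nearest_tss(region, tss)))
print(fold_enrichment(res["enhancer"], [("chr1", 10_000, 10_400)], 1_000_000))
```

At genome scale, the same operations are usually delegated to dedicated interval tools rather than Python loops; the sketch only shows what each quantity measures.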

According to bootstrapping analysis, the total length of the phylogenetic trees varied widely across these omics data types (Fig. 2F), reflecting differences in their evolutionary rates. Notably, the gene expression tree was significantly shorter (P < 0.0001) than the other seven epigenomic phylogenetic trees, indicating that gene expression levels were more conserved than their regulatory elements during mammalian evolution. Furthermore, promoters were more conserved than distal enhancers [17, 22]. Compared with histone modifications (H3K27ac and H3K4me3) and chromatin accessibility, DNA methylation in REs evolved faster (Fig. 2F) […].
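The bootstrapped comparison of total tree length can be sketched as follows. This is an illustration under stated assumptions, not the study's method: it assumes each data type is summarized as a feature × species matrix, uses 1 − Pearson correlation between species as the distance (a hypothetical choice), resamples features with replacement, and builds a neighbor-joining tree for each replicate. The helper names `nj_total_length` and `bootstrap_tree_lengths` are hypothetical.

```python
import numpy as np


def nj_total_length(dist: np.ndarray) -> float:
    """Total branch length of a neighbor-joining tree from a symmetric distance matrix."""
    d = np.asarray(dist, dtype=float).copy()
    active = list(range(d.shape[0]))
    if len(active) < 2:
        return 0.0
    total = 0.0
    while len(active) > 2:
        m = len(active)
        sub = d[np.ix_(active, active)]
        r = sub.sum(axis=1)
        # NJ criterion Q(i, j) = (m - 2) d(i, j) - r_i - r_j; join the minimizing pair.
        q = (m - 2) * sub - r[:, None] - r[None, :]
        np.fill_diagonal(q, np.inf)
        i, j = np.unravel_index(np.argmin(q), q.shape)
        a, b = active[i], active[j]
        limb_a = 0.5 * d[a, b] + (r[i] - r[j]) / (2 * (m - 2))
        limb_b = d[a, b] - limb_a
        total += limb_a + limb_b
        # Reuse row/column a for the new internal node and drop b.
        new_row = 0.5 * (d[a, :] + d[b, :] - d[a, b])
        d[a, :] = new_row
        d[:, a] = new_row
        d[a, a] = 0.0
        active.remove(b)
    total += d[active[0], active[1]]  # final edge joining the last two nodes
    return float(total)


def bootstrap_tree_lengths(features: np.ndarray, n_boot: int = 100, seed: int = 0) -> np.ndarray:
    """Resample features (rows) with replacement and return one NJ tree length per replicate.

    `features` has shape (n_features, n_species); the distance between species is
    assumed here to be 1 - Pearson correlation of their feature profiles.
    """
    rng = np.random.default_rng(seed)
    n_feat = features.shape[0]
    lengths = np.empty(n_boot)
    for k in range(n_boot):
        idx = rng.integers(0, n_feat, size=n_feat)
        dist = 1.0 - np.corrcoef(features[idx], rowvar=False)
        lengths[k] = nj_total_length(dist)
    return lengths


# Toy additive distances for four species; the true tree has total length 8.
toy = np.array([[0, 3, 5, 5], [3, 0, 6, 6], [5, 6, 0, 2], [5, 6, 2, 0]], dtype=float)
print(nj_total_length(toy))
```

Running `bootstrap_tree_lengths` separately on, for example, an expression matrix and a methylation matrix yields two distributions of tree lengths whose spread and location can be compared; how the study derived its P value is not described in this section, so any test applied to these distributions would be an additional assumption.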