Background

Understanding how species have adapted to their environment has long interested evolutionary biologists [1]. Adaptive phenotypes can result from changes in protein-coding sequences that affect protein structure and function [2,3,4], and from gene expression alterations [5]. Gene expression is an intermediate phenotype that links DNA sequence and physiological traits [6]. Alterations in gene expression are more likely to cause adaptive changes in morphology and development than changes in protein sequences [7]. For example, environmental adaptations in humans are tenfold more likely to affect gene expression than amino acid sequences [8]. Adaptive phenotypes driven by alterations in gene expression patterns are more flexible than changes in amino acid sequence, and species can temporarily adapt to the environment by regulating gene expression [9]. Therefore, it is necessary to examine gene expression changes that occur during environmental adaptation.

Diet plays a pivotal role in the evolutionary history of animals [10]. Dietary specialization has resulted in the evolution of similar morphological, physiological, behavioral and biochemical adaptations [11,12,13,19,20,21]. DNA methylation patterns are not static but can be altered by diet and multiple other factors [22,1). We used the red panda reference genome and reference annotation downloaded from DNA ZOO (https://www.dnazoo.org/). The red panda genome from DNA ZOO had been reassembled using the 3D-DNA pipeline [58] and reviewed using Juicebox Assembly Tools [59], based on the draft assembly ASM200746v1 (GCA_002007465.1) [16]. Low-quality reads and adapter sequences were removed using NGS QC Toolkit [60] with a quality score cutoff of 20. High-quality reads that passed the filter thresholds were mapped using HISAT2 [61]. The final alignment rate of RNA-seq reads varied from 85.01 to 99.44% across species (Table S1). SAMtools was then used to convert the alignments from SAM format to BAM format. The reference annotations were then read in, and fragments were counted over all exons grouped by gene using featureCounts [62].

Definition of orthologous genes

We performed an extensive orthologous gene comparison to investigate the expression level differences between obligate bamboo-eating pandas and other mammal species. Only the longest protein sequence was retained for each unique gene. OrthoFinder 2.3.7 [63] was used to determine one-to-one (1:1) orthologues between species for the liver and pancreas data sets, using the reciprocal best hit method in BLASTp with an E-value cutoff of 1e-5. The orthologous gene IDs and symbols of the giant panda were used as proxies in the following descriptions of genes.
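The reciprocal best hit idea can be illustrated with a minimal Python sketch. It is not the OrthoFinder implementation; the gene names, hit tuples and function names below are hypothetical, and the best hit is assumed to be the one with the highest bit score among hits that pass the E-value cutoff.

```python
def best_hits(hits, evalue_cutoff=1e-5):
    """Map each query to its best subject (highest bit score) among hits
    whose E-value does not exceed the cutoff."""
    best = {}
    for query, subject, evalue, bitscore in hits:
        if evalue > evalue_cutoff:
            continue  # hit fails the E-value cutoff
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a, evalue_cutoff=1e-5):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b, evalue_cutoff)
    reverse = best_hits(b_vs_a, evalue_cutoff)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical toy BLASTp hits: (query, subject, E-value, bit score)
panda_vs_ferret = [
    ("pandaG1", "ferretG1", 1e-50, 300.0),
    ("pandaG1", "ferretG2", 1e-20, 120.0),
    ("pandaG2", "ferretG2", 1e-3, 40.0),  # fails the E-value cutoff
]
ferret_vs_panda = [
    ("ferretG1", "pandaG1", 1e-48, 295.0),
    ("ferretG2", "pandaG1", 1e-18, 110.0),
]

pairs = reciprocal_best_hits(panda_vs_ferret, ferret_vs_panda)
print(pairs)
```

In this toy input only pandaG1 and ferretG1 name each other as best hit, so only that pair would be treated as a 1:1 orthologue.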

Phylogenetic analysis

We identified 9,219 single-copy genes from the eight species. The amino acid sequences of all orthologous genes were aligned and concatenated to construct the phylogenetic tree. The maximum likelihood (ML) tree was inferred with RAxML [64] under the PROTGAMMAJTT model, with 100 bootstrap replicates. This process followed the Python 3 script available on GitHub (https://github.com/dongwei1220/EasySpeciesTree).

Expression level normalization

In cross-species comparisons, RNA-seq data differ not only in gene length but also in sequencing depth. Gene expression levels for 1:1 orthologues were therefore normalized using GeTMM (gene length corrected TMM) [65]. This method combines gene length correction with the trimmed mean of M-values (TMM) normalization procedure (as implemented in edgeR) to make the expression levels of orthologous genes comparable between species. We constructed gene expression matrices for liver and pancreas samples separately, with each column representing a sample and each row representing the expression of an orthologue. Lowly expressed genes were filtered out, retaining only genes with counts greater than 0 in the samples of the same species. We then defined each species as a group and computed a set of scaling factors using TMM to normalize the library sizes. Normalized GeTMM values were used in downstream analyses. We assessed data quality by comparing the coefficient of variation (CV; the ratio of the standard deviation to the mean) of the gene expression data before and after normalization. The CV of the normalized data was lower than that of the non-normalized data (Figure S4), indicating that the bias due to species and biological replicates was reduced after normalization.
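Two of the simpler steps described above, gene length correction and the CV check, can be sketched in plain Python. This is an illustration only: the TMM scaling itself is left to edgeR and is not reimplemented here, the counts and gene lengths are hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def reads_per_kilobase(counts, gene_lengths_bp):
    """Divide each gene's raw counts by its length in kilobases.
    counts: one row per gene, one column per sample."""
    return [
        [count / (length / 1000.0) for count in row]
        for row, length in zip(counts, gene_lengths_bp)
    ]


def coefficient_of_variation(values):
    """CV = standard deviation / mean (sample standard deviation assumed)."""
    m = mean(values)
    return stdev(values) / m if m else float("nan")


# Hypothetical counts for two genes across two samples
counts = [[100, 200], [50, 50]]
gene_lengths_bp = [2000, 500]

rpk = reads_per_kilobase(counts, gene_lengths_bp)
cv_per_gene = [coefficient_of_variation(row) for row in rpk]
print(rpk)
print(cv_per_gene)
```

Computing the per-gene CV on the matrix before and after normalization gives the kind of comparison summarized in Figure S4.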

Principal component analysis (PCA)

The normalized gene expression matrix of each tissue was log2 transformed. PCA was performed on the transformed data using the ‘prcomp’ function in the R package ‘stats’.

Correlation analysis between species

Liver and pancreas expression matrices of n rows (genes) by m columns (samples) were constructed. We calculated the Spearman correlation between each pair of samples using the function ‘cor’ in R, and plotted the results using the function ‘heatmap.2’ in the package ‘gplots’.
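For readers unfamiliar with the statistic, the Spearman correlation between two samples is the Pearson correlation of their rank-transformed expression values. The following Python sketch illustrates this on hypothetical values; it is not the R code used in the study, and ties are assumed to receive average ranks.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical expression values of four orthologues in two samples
sample_a = [1.0, 2.0, 3.0, 4.0]
sample_b = [10.0, 20.0, 40.0, 30.0]
rho = spearman(sample_a, sample_b)
print(rho)
```

Applying `spearman` to every pair of columns yields the m by m matrix that a heatmap then displays.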

Differentially expressed genes

The GeTMM values were used to analyze gene expression differences with generalized linear models (GLMs) in edgeR. Giant panda and red panda samples were each compared separately to the non-panda samples. Genes with |log2FC| > 1 and a Benjamini-Hochberg FDR-adjusted P-value < 0.05 were considered significant DEGs. DEGs that were shared between each panda species and the other non-herbivorous species were identified as convergent expression genes. DEGs were categorized and clustered by columns using the function ‘pheatmap’ in the package ‘pheatmap’. We then performed gene ontology and pathway enrichment analysis of the DEGs using the ‘enricher’ function in the ‘clusterProfiler’ package in R [66], with all genes of the giant panda as the reference gene set.
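The significance filter can be illustrated with a short Python sketch that applies the Benjamini-Hochberg adjustment and then thresholds on fold change and FDR. It takes |log2FC| > 1 as the fold-change cutoff (the usual convention), uses hypothetical gene names and values, and does not reproduce the edgeR model fitting that produces the P-values.

```python
def benjamini_hochberg(pvalues):
    """Return BH FDR-adjusted P-values in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # walk from the largest P-value down, enforcing monotonicity
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def call_degs(genes, log2fc, pvalues, lfc_cutoff=1.0, fdr_cutoff=0.05):
    """Keep genes with |log2FC| above the cutoff and FDR below the cutoff."""
    fdr = benjamini_hochberg(pvalues)
    return [
        gene
        for gene, lfc, q in zip(genes, log2fc, fdr)
        if abs(lfc) > lfc_cutoff and q < fdr_cutoff
    ]


# Hypothetical test results for four genes
genes = ["geneA", "geneB", "geneC", "geneD"]
log2fc = [2.0, -1.8, 3.0, 0.4]
pvalues = [0.001, 0.02, 0.04, 0.5]

fdr = benjamini_hochberg(pvalues)
degs = call_degs(genes, log2fc, pvalues)
print(fdr)
print(degs)
```

Here geneC has a large fold change but an adjusted P-value just above 0.05, so it is not called, which shows why both criteria are applied together.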

Real-time quantitative PCR

Experimental C57BL/6 J mice (8 weeks, n = 4) and Wistar Han rats (8 weeks, n = 4) were purchased from Chengdu Dossy Experimental Animals Co., Ltd. Pancreas samples were collected from the mice and rats following the ‘Guide for the care and use of laboratory animals’. Total RNA was extracted using the M5 Universal RNA Mini Kit (Mei5 Biotechnology, China) according to the manufacturer’s instructions. RNA from giant pandas, red pandas, ferrets, mice and rats was used for the real-time PCR assay (Table S18). The isolated RNA was converted to double-stranded cDNA using the M5 Sprint qPCR RT Kit (Mei5 Biotechnology, China). After cDNA synthesis, quantification of 10 mRNA levels was conducted by real-time PCR on a CFX96 Real-Time PCR Detection System. The expression of 5 shared adaptive convergence DEGs associated with carbohydrate metabolism and respiratory electron transport-related genes was verified. All primer sequences used for amplification of those 5 mRNAs are shown in Table S19.

The 10 μl real-time PCR reaction mix contained 5 μl 2X M5 HiPer SYBR Premix EsTaq (with Tli RNaseH) (Mei5 Biotechnology, China), 0.2 μl forward primer (10 pmol/μl), 0.2 μl reverse primer (10 pmol/μl), 1 μl cDNA, which served as the template, and 3.6 μl ddH2O. Negative controls containing water as the template were also included in each run. The cycling conditions were as follows: 1 cycle of 95 °C for 30 s; 40 cycles of 95 °C for 5 s and 60 °C for 30 s. The expression levels of the mRNAs above were then analyzed using relative quantification (delta-Ct method). The housekeeping gene GAPDH was included as an internal control in all RT-qPCR runs. The expression of each gene verified by RT-qPCR is shown in Table S20.
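As an illustration of the delta-Ct calculation, the sketch below subtracts the GAPDH Ct from the target Ct and converts the difference to a relative expression value. The Ct values are hypothetical, and the conversion 2^(-ΔCt) assumes roughly 100% amplification efficiency for both target and reference, which the text does not state.

```python
def delta_ct(ct_target, ct_reference):
    """Delta-Ct: target Ct minus reference (housekeeping gene) Ct."""
    return ct_target - ct_reference


def relative_expression(ct_target, ct_reference):
    """Expression relative to the reference gene, assuming the product
    doubles every cycle (about 100% amplification efficiency)."""
    return 2.0 ** -delta_ct(ct_target, ct_reference)


# Hypothetical Ct values for one target gene and GAPDH in one sample
ct_target = 24.0
ct_gapdh = 18.0

dct = delta_ct(ct_target, ct_gapdh)
rel = relative_expression(ct_target, ct_gapdh)
print(dct, rel)
```

A larger delta-Ct means the target crosses the detection threshold later than GAPDH and is therefore less abundant relative to it.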

DNA methylation of convergently expressed nutrition metabolism-related genes in the promoter region

To explore the epigenetic mechanisms regulating convergently expressed nutrition metabolism-related genes in both panda species, we performed whole-genome bisulfite sequencing on liver and pancreas tissues of adult giant pandas and red pandas, and downloaded the corresponding tissue methylation data of humans and mice from the SRA database for comparative analysis (Table 2). The DNA purity and concentration of the afu_1025 liver, afu_0708 liver, afu_1025 pancreas, afu_0527 pancreas, afu_1219 pancreas, aml_PP pancreas and aml_SE liver samples were substandard, so these samples were removed from further analysis. The procedures for library preparation, bisulfite sequencing and read mapping were described in a previous study [67]. For each sample, 109-251 Gb of clean bases were generated after data quality control, and the final alignment rate of BS-seq reads ranged from 61.1 to 79.3%.

Table 2 Summary of collected samples and reads quality for WGBS

Promoter methylation is generally negatively correlated with gene expression levels. We therefore focused on the DNA methylation levels of all convergently expressed nutrition metabolism-related genes in the promoter region. The methylation level of the 1,000 bp promoter (the region from −1000 bp to the transcription start site) was calculated using the formula: Methylation level of promoter = ∑mC / ∑(mC + C), where mC is the number of methylated reads in the promoter and C is the number of unmethylated reads. Promoters had to contain at least 2 CpG sites that were each covered by more than five reads. All genes met these requirements except for the promoter region of the red panda COQ8A gene. Differentially methylated promoters were identified between giant panda and human, giant panda and mouse, red panda and human, and red panda and mouse. Differences in methylation level between two groups were considered significant at P-value < 0.05 using a two-tailed t-test.
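The promoter methylation formula and the coverage requirement can be written as a short Python sketch. The read counts are hypothetical, and restricting the sums to CpG sites that pass the coverage filter is an assumption, since the text does not say how low-coverage sites were handled.

```python
def promoter_methylation_level(cpg_sites, min_sites=2, min_coverage=5):
    """Compute sum(mC) / sum(mC + C) over the CpG sites of one promoter.

    cpg_sites: list of (methylated_reads, unmethylated_reads) per CpG site.
    Sites must be covered by MORE than min_coverage reads; returns None if
    fewer than min_sites sites qualify.
    """
    covered = [(m, u) for m, u in cpg_sites if m + u > min_coverage]
    if len(covered) < min_sites:
        return None  # promoter fails the coverage requirement
    methylated = sum(m for m, _ in covered)
    total = sum(m + u for m, u in covered)
    return methylated / total


# Hypothetical promoter with three CpG sites; the last one has only 2 reads
sites = [(8, 2), (3, 7), (1, 1)]
level = promoter_methylation_level(sites)
print(level)

# Hypothetical promoter with a single well-covered site: excluded
print(promoter_methylation_level([(8, 2), (1, 1)]))
```

A promoter returning `None` here corresponds to a case like the red panda COQ8A promoter, which did not meet the coverage requirement.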

Gene cloning, plasmid construction and luciferase assays

To explore whether the down-regulated AASS that we had identified in giant panda liver was regulated by NR3C1, we performed luciferase assays. First, total RNA was extracted from giant panda liver using the M5 Universal RNA Mini Kit (Mei5 Biotechnology, Co., Ltd) following the manufacturer's protocol. The extracted RNA samples were then used for cDNA synthesis with the M5 Super plus qPCR RT kit (Mei5 Biotechnology, Co., Ltd). The giant panda NR3C1 was amplified by PCR from the cDNA template with primers (Table 3) designed according to the sequences from the giant panda genome. The 25 μl PCR reaction contained 2.5 μl 10 × Taq Buffer (without MgCl2), 2 μl dNTP Mixture (2.5 mM each), 0.5 μl each of forward and reverse primer (10 μmol/L), 0.5 μl Taq DNA Polymerase (Vazyme) and 0.5 μl cDNA, made up to 25 μl with nuclease-free H2O.

Table 3 Primers for the PCR amplifications of NR3C1 protein coding region

Then, 30 mg of giant panda liver was used for DNA extraction according to the instructions of the Tissue Genomic DNA Extraction Kit (Tiangen). The 1,000 bp region upstream of the AASS transcription start site was amplified based on the annotation of the giant panda genome (Primers: F 5’GTCTTGGGGCAATGGTCTA3’; R 5’TAACAGGGTGTCCGTTCTG3’). The 50 μl PCR reaction contained 25 μl 2 × Phanta Max Buffer, 1 μl dNTP Mix (10 mM each), 2 μl each of forward and reverse primer (10 μmol/L), 1 μl Phanta Max super-fidelity DNA Polymerase (Vazyme) and 1 μl DNA, made up to 50 μl with nuclease-free H2O.

The sequences of the AASS gene promoter region and the NR3C1 protein-coding region (Figure S5) were sent to a company (** Kairui, Wuhan) for synthesis. The synthesized fragments contained KpnI and XhoI restriction sites. The AASS promoter was cloned into pGL3-basic (Promega) and named AASS1000-pGL3-basic. NR3C1 was cloned into pcDNA3.1 (Invitrogen) and named pcDNA3.1-NR3C1. The orientations and sequences of the inserts were verified by restriction digestion and sequencing.

293 cells were cultured at a density of 2 × 10^4 cells/well in 96-well culture plates and co-transfected with 0.2 μg of the luciferase reporter construct and the internal control vector pRL-TK (Promega, Madison, WI) at a ratio of 20:1 (reporter construct: control vector) using Lipofectamine™ 2000 (Invitrogen, Carlsbad, CA) according to the manufacturer's instructions. At 5 h post-transfection, the transfection medium was removed and replaced with medium containing 6 μM curcumin (Sigma-Aldrich, St. Louis, MO) solubilized in 100% dimethylsulfoxide (DMSO) (Sigma). At 48 h post-transfection, luciferase activity was measured using the Dual-Luciferase® Reporter Assay System (Promega). Firefly luciferase activity was normalized to Renilla luciferase activity in cells co-transfected with the reporter construct and the control vector.
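The normalization of firefly to Renilla activity can be sketched in Python as follows. The luminescence readings are hypothetical, and the comparison of NR3C1-transfected wells against empty-vector wells is an assumed experimental contrast used only to show how normalized ratios might be compared.

```python
from statistics import mean


def normalized_luciferase(firefly, renilla):
    """Firefly signal divided by the Renilla internal-control signal."""
    if renilla <= 0:
        raise ValueError("Renilla reading must be positive")
    return firefly / renilla


def fold_change(test_wells, control_wells):
    """Mean normalized activity of test wells over that of control wells.
    Each well is a (firefly, renilla) pair."""
    test = mean(normalized_luciferase(f, r) for f, r in test_wells)
    control = mean(normalized_luciferase(f, r) for f, r in control_wells)
    return test / control


# Hypothetical readings: reporter plus NR3C1 vector vs reporter plus empty vector
nr3c1_wells = [(1200.0, 400.0), (900.0, 300.0)]
empty_vector_wells = [(3000.0, 500.0), (2400.0, 400.0)]

fc = fold_change(nr3c1_wells, empty_vector_wells)
print(fc)
```

Dividing by the Renilla signal corrects for well-to-well differences in transfection efficiency and cell number before conditions are compared.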