Background

N6-methyladenosine (m6A) is one of the most prevalent chemical modifications found in eukaryotic mRNAs. With recent advances in high-throughput technologies, transcriptome-wide m6A modifications have been profiled for A. thaliana and several other plant species [1, 16, 17]. Hence, LB1 and LB5 are recognized as useful materials for studying pollen development, cytoplasmic and nuclear interactions, and the regulation of male fertility-related genes in plants [17, 18]. However, research using this economically important crop (wolfberry) to study the trait of male sterility in plants is still scarce.

Here, we performed RNA-sequencing (RNA-Seq) and m6A-sequencing (m6A-Seq) of the anther samples of LB1 and LB5, and assembled a consensus transcriptome to derive transcriptome-wide m6A maps of these two wolfberry lines. By comparing the omics data generated from LB1 vis-à-vis LB5, we identified those genes with differential expression and differential m6A methylation patterns, and then carried out their gene ontology (GO) enrichment analysis to uncover the biological processes that may be associated with the male sterility trait in wolfberry. Our research findings provide new insight into how m6A can regulate the trait of male sterility in plants.

Results

Reference-guided assembly and annotation of the wolfberry transcriptome

For both LB1 and its natural male sterile mutant LB5 (Figure S1), we sequenced the transcriptomes of six anther samples from the tetrad stage (T), the single nucleus pollen stage (S), and the mature pollen stage (M) (LB1_T, LB1_S, LB1_M, LB5_T, LB5_S, LB5_M; see Methods), each with three biological replicates. Raw reads from each RNA-Seq library were pre-processed using fastp (v0.20.1) [19] to remove adapter and low-quality sequences, resulting in a total of 908.8 million paired-end reads (63.5 Gb) (Table 1). After pre-processing, the remaining reads were mapped onto the wolfberry reference genome sequences (accession number PRJNA640228) [20] using the HISAT2 program (v2.2.1) [21], with alignment rates ranging from 93.39% to 96.24%. Then, using Cufflinks software (v2.2.1) [22], transcript fragments were first assembled for each RNA-Seq library, and then merged to generate a transcriptome assembly for both wolfberry lines under study. Under the criteria defined in a previous study [23], 29 324 genes and 84 709 transcript fragments were deemed high quality, that is, having a TPM ≥ 1 and read count ≥ 5 in at least one sample (Fig. 1A). Of these 29 324 genes, we found 25 841 that were expressed (i.e., TPM ≥ 1) in both LB1 and LB5 lines, with 2395 and 829 genes specifically expressed in each, respectively. In LB1, the number of specifically expressed genes decreased in the order of T stage (802) > M stage (367) > S stage (206), while in LB5, they were more abundant at the S stage (1253) than at either of the other two stages (T: 452 and M: 185) (Fig. 1B; Table S1).

Table 1 Comparison of library size, library quality, and read alignment rates of RNA-Seq samples
Fig. 1

Transcriptome assembly and annotation of the two wolfberry lines. A Overview of wolfberry transcriptome assembly pipeline. B Venn diagram of genes expressed in LB1 and LB5 at three developmental stages. C BUSCO’s assessment results. D Distribution of sources for functional annotation of assembled transcripts in wolfberry

Overall, 83.3% (24 434) of 29 324 annotated genes (Table S2) were predicted to be capable of encoding proteins by TransDecoder [24]. These 24 434 protein-encoding genes (PCGs) served as the input of BUSCO [25] to evaluate the completeness of the assembled transcripts. BUSCO’s assessment results revealed that of the 5950 core genes queried, 94.47% were detected (5574 complete and 49 partial) (Fig. 1C). Among these 24 434 PCGs, we found that 22 997 (94.12%) could be assigned a functional annotation label when using EggNOG-mapper (emapper) [26]. The vast majority (90%) of this annotation information came from the clade of asterids (e.g., Solanum lycopersicum, Solanum tuberosum), which are closely related taxonomically to wolfberry (Table S3; Fig. 1D). In addition, 1517 of these 24 434 PCGs (Table S4) were predicted to be transcription factors (TFs) according to PlantTFDB v5.0 [27].

Overall, the assembled transcriptome, along with its corresponding functional annotations, provided a comprehensive resource now available for the transcriptome-wide m6A analysis and other functional genomics studies in wolfberry.

Identification and expression analysis of m6A potential regulators in wolfberry

A total of 25 potential m6A regulators (7 writers, 10 erasers, and 8 readers) were identified from wolfberry’s assembled transcriptome by applying a specified bioinformatics pipeline (Fig. 2B; Table 2; Figures S2 and S3). These potential regulators were named according to their predicted orthologs from Arabidopsis [28] and tomato [29]. Among these 25 potential regulators, 22 were relatively highly expressed with TPM ≥ 10 in both LB1 and LB5 lines. These 22 genes exhibited dynamic expression patterns across T to M stages in either line (Figure S4). Several genes had markedly divergent expression dynamics between LB1 and LB5. For example, certain m6A writer genes, namely XLOC_003715 (LbaMTC, homologous with AtMTC) and XLOC_10786 (LbaMTB, homologous with AtMTB; Figure S3B), displayed different expression patterns when going from the T to M stages between LB1 and LB5 (Fig. 2A). There were 14 genes whose expression was greater in LB5 than LB1 at all three stages studied. One representative example is XLOC_016741 (LbaYTH5), encoding a YTH domain-containing protein. LbaYTH5 is close to the experimentally demonstrated Arabidopsis m6A readers AtECT6 and AtECT7 in the phylogenetic tree, constructed with the FastTree software [30] using YTH domain-containing proteins from wolfberry, tomato, maize, A. thaliana and Physcomitrium patens (Fig. 2B). In LB1, the expression level of LbaYTH5 was 68.64, 51.97, and 56.27 at the T, S, and M stages, but higher at 76.26, 83.20, and 104.35 in LB5, respectively (Figure S5).

Fig. 2

Identification and expression analysis of putative m6A regulators in wolfberry. A Dynamic trends in the expression levels of two example genes. B Phylogenetic tree of the YTH gene family (Ppa: Physcomitrella patens, Zma: Zea mays, Ath: Arabidopsis thaliana, Sly: Solanum lycopersicum and Lba: Lycium barbarum); the color-marked genes are annotated YTH genes in wolfberry. C Volcano plot of differentially expressed genes at each stage (p-value < 0.05, absolute fold-change > 2)

Table 2 List of putative m6A regulators in wolfberry

Pair-wise comparisons also revealed that genes of some potential m6A regulators differed starkly in their expression between LB1 and LB5. Using DESeq2 [31], we identified 4499, 4575, and 7491 differentially expressed genes (DEGs) between LB1 and LB5 at the T, S, and M stages, respectively (Fig. 2C; Table S5). One representative gene is XLOC_021201, which may encode an m6A eraser and belongs to the same evolutionary branch as the Arabidopsis gene AtALKBH10 (Figure S3A). Compared with LB5, this gene had markedly higher TPM values in LB1 at both S and M stages (Figure S4). Altogether, these results revealed the differential expression patterns of genes encoding potential m6A regulators between LB1 and LB5 during the anther development of wolfberry from the T to M stage.

Transcriptome-wide identification of m6A methylation in LB1 and LB5

With the DEGs encoding potential m6A regulators in mind, we then asked whether differences in the m6A methylome also arise between LB1 and LB5. To address this question, we first measured the m6A/A ratio of pollen from these two wolfberry lines at the S stage, and found that the m6A level in mRNA was slightly different between LB1 and LB5 (Figure S6). Then, we used anther samples at the S stage as an example to obtain m6A maps of LB1 and LB5 via m6A-Seq technology. Six m6A-immunoprecipitation (IP) and matched input (non-IP control) libraries were constructed and sequenced for RNAs from the anther samples of these two lines at the S stage, with three biological replicates per sample. Raw sequencing reads from each library were processed to discard adaptor sequences and low-quality bases using fastp (v0.20.1) [19]. The resulting reads from the wolfberry LB1 and LB5 samples were aligned to the wolfberry reference genome (accession number PRJNA640228) using HISAT2 (v2.2.1) [20, 21]. Read distribution analysis showed that the reads from m6A-IP samples accumulated highly around the stop codon and within the 3’-untranslated region (3’UTR) in all samples, with the sequencing data of input and IP being highly correlated between replicates, thus confirming the high quality of m6A-Seq in this study (Fig. 3A).

Fig. 3

Overview of the m6A methylome in wolfberry. A The read distribution of input and IP data from m6A-Seq of LB1 and LB5. B Comparison of m6A peaks between LB1 and LB5. C Peak density in five non-overlapping transcript segments: the 5’-untranslated region (5’UTR), near start codon, coding sequence (CDS), near stop codon, and 3’-untranslated region (3’UTR). D Relative enrichment of the m6A peaks in the five non-overlapping transcript segments. E The motif on the top is the 1st-ranked enriched URUAY motif (where R denotes A/G, A is m6A and Y denotes C/U). The motif on the bottom is the canonical RRACH motif (where R denotes A/G, A is m6A, and H denotes A/C/U). F Volcano plot of differentially methylated genes (p-value < 0.05, absolute fold-change > 2)

We detected 10 389 and 9301 m6A peaks in LB1 and LB5, corresponding to 9587 and 8668 m6A-modified genes, respectively (Fig. 3B; Table S6). In both LB1 and LB5 lines, m6A-modified genes (m6A genes) were significantly longer than genes without m6A peaks (non-m6A genes) (Student’s t-test, p-value < 0.001) (Figure S7). In addition, m6A-modified genes exhibited greater expression than non-m6A genes (Figures S8 and S9). Moreover, m6A peaks were enriched in the following order: 3’UTR (3’-untranslated region) > near stop codon > CDS (coding sequence) > start codon > 5’UTR (5’-untranslated region) (Fig. 3C, D). All these m6A peaks were scanned further for enriched motifs, using MEME suite (http://meme-suite.org/index.html) [32]. As expected, the URUAY motif (where R represents A/G and Y represents C/U; Fig. 3E) was significantly enriched within the m6A peaks, and in both lines it was the most enriched motif. We next examined the canonical m6A motif RRACH (where R represents A/G, A is m6A, and H represents A/C/U; Fig. 3E), using another commonly used motif analysis program, HOMER (v4.10) [33]. The RRACH motif could also be detected in our m6A-Seq data, and was significantly enriched in m6A peaks relative to non-m6A regions (Figure S10).

Furthermore, a differential methylation analysis was performed by RADAR [34], which uncovered 2205 genes that were differentially m6A-modified between LB1 and LB5. Among these 2205 differentially m6A-modified genes, 1642 genes (including 67 TFs) were hypermethylated in line LB1 compared with line LB5, while 563 genes (including 19 TFs) were hypomethylated (Table S7; Fig. 3F). Previous studies have linked a number of TFs from MADS, MYB, ARF, and GATA gene families to plant male sterility [35,36,37,21], this yielding the BAM (binary sequence alignment format) file recording read-genome alignments. Cufflinks (v2.2.1) software [22] was used to perform the transcriptome assembly. Finally, the RNA-Seq assembly results from the same line were merged using the Cuffmerge tool in Cufflinks (v2.2.1).

Construction of the consensus transcriptome for the two wolfberry lines

We screened the assembled transcript fragments on the basis of expression-level evidence [23]. The TPM (Transcripts Per Million) values were calculated using featureCounts (v2.0.1) [59] and read counts per base were calculated using bedtools genomecov (v2.30.0) [60]. Assembled transcripts with either low expression (e.g., TPM < 1) or low read coverage (e.g., < 5) across the majority (e.g., 60%) of each transcript fragment were discarded. Finally, we merged the remaining high-confidence transcript fragments from the two lines to construct a consensus transcriptome, and further calculated the TPM for the latter using featureCounts (v2.0.1) [59].
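
The screening rule above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names are hypothetical, TPM is computed with its standard definition from counts and transcript lengths, and the coverage rule is read as "discard a fragment when at least 60% of its bases have a depth below 5", which is one possible interpretation of the criterion.

```python
from typing import Dict, List


def tpm(counts: Dict[str, float], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Convert read counts to TPM using the standard definition."""
    # Reads per kilobase: normalise each count by transcript length.
    rpk = {t: counts[t] / (lengths_bp[t] / 1000.0) for t in counts}
    # Scale so that all values sum to one million.
    scale = sum(rpk.values()) / 1e6
    if scale == 0:
        return {t: 0.0 for t in counts}
    return {t: v / scale for t, v in rpk.items()}


def low_depth_fraction(per_base_depth: List[int], min_depth: int = 5) -> float:
    """Fraction of bases whose read depth is below min_depth."""
    if not per_base_depth:
        return 1.0  # no coverage information is treated as uncovered
    return sum(d < min_depth for d in per_base_depth) / len(per_base_depth)


def is_high_confidence(
    tpm_value: float,
    per_base_depth: List[int],
    min_tpm: float = 1.0,
    min_depth: int = 5,
    max_low_fraction: float = 0.6,  # assumed reading of the 60% rule
) -> bool:
    """Keep a fragment unless it has low expression or mostly low coverage."""
    if tpm_value < min_tpm:
        return False
    return low_depth_fraction(per_base_depth, min_depth) < max_low_fraction
```

In practice, the per-base depths would come from the bedtools genomecov output and the counts from featureCounts; here they are passed in as plain Python lists and dictionaries.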

Gene expression analysis

To investigate the conditions under which certain genes are specifically expressed, their expression levels (TPM values) in samples from different developmental stages were compared. For each line, a gene was considered as expressed under a condition if it was expressed (e.g., TPM ≥ 1) in at least one of the three biological replicates. The R package ‘DESeq2’ [31] was used to search for DEGs (differentially expressed genes) based on the criteria of p-value < 0.05 and absolute fold-change > 2.
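
As an illustration of the thresholds described above, the following Python sketch encodes the expression call and the DEG criteria. The function names and the example values are hypothetical; the only outside assumption is that DESeq2 reports fold-change on a log2 scale, so an absolute fold-change > 2 corresponds to |log2 fold-change| > 1.

```python
import math
from typing import Dict, List, Optional


def is_expressed(replicate_tpms: List[float], min_tpm: float = 1.0) -> bool:
    """A gene is called expressed if any replicate reaches min_tpm."""
    return any(v >= min_tpm for v in replicate_tpms)


def specific_stage(stage_to_tpms: Dict[str, List[float]]) -> Optional[str]:
    """Return the stage name if the gene is expressed at exactly one stage."""
    expressed = [s for s, reps in stage_to_tpms.items() if is_expressed(reps)]
    return expressed[0] if len(expressed) == 1 else None


def is_deg(
    p_value: float,
    log2_fold_change: float,
    p_cutoff: float = 0.05,
    fold_change_cutoff: float = 2.0,
) -> bool:
    """Apply the p-value < 0.05 and absolute fold-change > 2 criteria."""
    return p_value < p_cutoff and abs(log2_fold_change) > math.log2(fold_change_cutoff)


# Hypothetical gene: expressed only at the T stage.
example = {"T": [2.0, 0.0, 0.4], "S": [0.1, 0.2, 0.0], "M": [0.0, 0.0, 0.0]}
print(specific_stage(example))
```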

Gene structure and functional annotation

To predict the open reading frames (ORFs) of assembled transcript fragments, we used the coding-region prediction software TransDecoder (https://github.com/TransDecoder/TransDecoder/releases, v5.5.0) [61]. The fast functional annotation tool EggNOG-mapper (emapper v2.1.9; eggNOG DB v5.0.2, http://eggnog5.embl.de; diamond v2.0.1) [22, 62, 63] was used to annotate the assembled transcript fragments under these parameters: tax_scope = ‘Viridiplantae’. Further, the transcription factor information was annotated using the Plant Transcription Factor Database (PlantTFDB v5.0, http://plantregmap.gao-lab.org) [27].

Annotation of putative m6A regulators in wolfberry

Six HMM (hidden Markov model) profiles of the m6A regulators were downloaded from the Pfam database (http://pfam-legacy.xfam.org) [64], namely eraser: ALKBH10B (PF13532), reader: YTH (PF04146), and writer: MT-A70 (PF05063), FIP37 (PF17098), VIR (PF15912), HAKAI (PF18408). Each HMM profile was used as input to the hmmsearch program of HMMER software (v3.1b2) [65] to search for the corresponding domain among known wolfberry protein sequences, followed by building a new HMM profile for wolfberry using the hmmbuild program of HMMER software. Next, the fast multiple sequence alignment software MAFFT (v7.310) [66], set to its default parameters, was used to complete the multiple sequence alignment of the putative m6A regulators in five plant species (Physcomitrella patens, Zea mays, Arabidopsis thaliana, Solanum lycopersicum, and Lycium barbarum). FastTree (v2.1.10) [30], set to its default parameters, was used to construct an approximately-maximum-likelihood phylogenetic tree based on the multiple sequence alignment results (Figure S3).

Analyses of m6A-Seq data

Raw sequencing reads were cleaned using the fastp tool (v0.20.1) [19] to remove any reads containing adapter or low-quality sequences. Cleaned reads were then mapped onto the reference genome (accession number PRJNA640228) using HISAT2 (v2.2.1) [20, 21] under its default settings. To identify m6A peaks, the R package ‘exomePeak’ (for which a major update, exomePeak2, has been released) was used with default parameters [67]. The DREME (Discriminative Regular Expression Motif Elicitation) tool in MEME suite (http://meme-suite.org/tools/dreme) [32] was used to discover relatively short (up to 8 bp), ungapped motifs, according to the following parameters: minimum length of the motif = 5; maximum length of the motif = 7; E-value threshold = 1E-5. For a specified motif (e.g., RRACH or URUAY), the AME tool in MEME suite (http://meme-suite.org/tools/ame) was used to calculate the significance level of its relative enrichment. The discovered m6A peaks were divided into five categories based on their positions: 5′-untranslated region (5’UTR), near start codon, coding sequence (CDS), near stop codon, and 3′-untranslated region (3’UTR). The coordinates of these genomic elements were extracted, then bedtools intersect (v2.30.0) [60] was used to search for overlaps between m6A peaks and each element. The R package ‘RADAR’ [34] was used to reveal the differentially methylated genes between the two wolfberry lines, using the criteria of p-value < 0.05 and an absolute fold-change > 2.
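
To make the degenerate motif notation concrete, the following Python sketch expands a motif such as RRACH or URUAY into a regular expression and counts its occurrences in a sequence. It only illustrates what the symbols R, Y and H stand for; it does not reproduce the motif discovery or enrichment statistics computed by DREME, AME or HOMER, and the function names and example sequences are hypothetical.

```python
import re

# Degenerate nucleotide codes used in the motifs of this study.
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]",   # purine
    "Y": "[CU]",   # pyrimidine
    "H": "[ACU]",  # not G
}


def motif_to_regex(motif: str) -> "re.Pattern[str]":
    """Expand a degenerate motif into a regex that also finds overlapping hits."""
    body = "".join(IUPAC_RNA[symbol] for symbol in motif.upper())
    # A zero-width lookahead lets consecutive matches overlap.
    return re.compile("(?=(" + body + "))")


def count_motif(sequence: str, motif: str) -> int:
    """Count motif occurrences, treating DNA 'T' as RNA 'U'."""
    rna = sequence.upper().replace("T", "U")
    return sum(1 for _ in motif_to_regex(motif).finditer(rna))


# Hypothetical peak sequences.
print(count_motif("GGACU", "RRACH"))
print(count_motif("TGTAC", "URUAY"))
```

Comparing such counts between peak and non-peak sequences gives an intuitive picture of enrichment, although a formal test is still needed to assess significance.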

Quantitative analysis of mRNA m6A by LC–MS/MS

The LC–MS/MS technique was used to detect global m6A levels in wolfberry. For each sample, the RNA was digested into single nucleosides in a digestion buffer containing phosphodiesterase I (0.01 U), nuclease S1 (180 U), 1 mM zinc sulfate, 280 mM sodium chloride and 30 mM sodium acetate. The digestion mixture was incubated at 37℃ for 4 h at pH 6.8, and bacterial alkaline phosphatase (30 U) was then used for dephosphorylation for 2 h at 37℃. Enzymes were removed by filtration (Amicon Ultra 10 K MWCO). Then, the nucleoside samples were subjected to liquid chromatography coupled with tandem mass spectrometry (LC–MS/MS) analysis on a QTRAP 4500 mass spectrometer (SCIEX, Framingham, MA, USA). The quantification of nucleosides was performed using the nucleoside-to-base ion mass transitions of 268.1 to 136.1 for A, 245.1 to 113.0 for U, 244.1 to 112.1 for C, 184.1 to 152.1 for G, and 282.1 to 150.1 for m6A. We determined the concentrations of m6A and A by comparison with the standard curves obtained from their nucleoside standards, and calculated the ratio of m6A to A from these concentrations. Three independent biological replicates were performed for this experiment.
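
The standard-curve step can be sketched in Python as follows. This is a minimal illustration that assumes a linear calibration fitted by ordinary least squares; the function names and all numerical values are hypothetical and are not taken from the measurements of this study.

```python
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration(signal: float, slope: float, intercept: float) -> float:
    """Invert the calibration line to turn a peak area into a concentration."""
    return (signal - intercept) / slope


def m6a_to_a_ratio(
    m6a_signal: float,
    a_signal: float,
    m6a_curve: Tuple[float, float],
    a_curve: Tuple[float, float],
) -> float:
    """Ratio of the m6A concentration to the A concentration."""
    return concentration(m6a_signal, *m6a_curve) / concentration(a_signal, *a_curve)


# Hypothetical standards: known concentrations and measured peak areas.
m6a_curve = fit_line([1.0, 2.0, 3.0], [12.0, 22.0, 32.0])
a_curve = fit_line([10.0, 20.0, 30.0], [100.0, 200.0, 300.0])
ratio = m6a_to_a_ratio(7.0, 500.0, m6a_curve, a_curve)
print(f"m6A/A = {ratio:.4f}")
```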

Gene ontology enrichment analysis

Gene Ontology (GO) enrichment analysis was performed by implementing the R package ‘clusterProfiler’ [68].
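
Over-representation analysis of the kind implemented in clusterProfiler is based on the hypergeometric test. The Python sketch below shows that calculation for a single GO term; it is an illustration only, the function name and the numbers are hypothetical, and it omits the multiple-testing correction that would be applied across terms.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """Upper-tail hypergeometric probability P(X >= k).

    k: genes of interest annotated to the term
    n: total genes of interest
    K: background genes annotated to the term
    N: total background genes
    """
    total = comb(N, n)
    upper = min(n, K)
    # Sum the probabilities of observing k, k+1, ..., upper annotated genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical term: 5 of 10 background genes annotated, 2 of 2 selected genes hit.
print(enrichment_p_value(k=2, n=2, K=5, N=10))
```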

Conservation analysis

Multiple sequence alignments of protein sequences were performed using MAFFT (v7.310) [65] with default parameters. FastTree (v2.1.10) [30] was used with default parameters to construct an approximately-maximum-likelihood phylogenetic tree. Motif analysis was performed using MEME suite (http://meme-suite.org/tools/dreme) [32]. The conservation analysis results were displayed using TBtools (v1.120) [68].

Statistical analysis

Student’s t-test and the Wilcoxon test were applied to the data, as appropriate, using the R package ‘ggsignif’ (https://github.com/const-ae/ggsignif).