Introduction

Cotton (Gossypium spp.) is one of the most important cash crops in the world because its main product, fiber, is an important natural raw material for the textile industry. Among four species of the Gossypium genus (G. hirsutum, G. barbadense, G. arboreum, and G. raimondii), G. hirsutum (upland cotton) is the most widely planted owing to its high yield and adaptability [42]. Cotton fiber development is divided into four stages: initiation, elongation, secondary cell wall deposition, and maturation [12]. The first two stages determine the number and length of fibers and thereby affect fiber yield. Consequently, many studies have explored the genetic mechanisms underlying fiber initiation and elongation, contributing to the improvement of cotton production [16, 17, 19, 29, 65].

Fiberless, fuzzless, and lintless cotton mutants are good materials for studying the mechanism of fiber initiation. When auxin and gibberellin (GA) were applied to in vitro cultured ovules of two fiberless mutants of Asian cotton, fiber cells differentiated from the ovule epidermis at temperatures below 30 °C but not above 32 °C, which indicated that auxin and GA promote fiber development under specific conditions [2]. A comparison of SNPs obtained by RNA-Seq showed that the glabrous mutant Xu142fl may be a progeny of G. barbadense. Based on F2 and BC1 populations derived from TM-1 and Xu142fl, the Li3 gene, which encodes an MYB-MIXTA-like transcription factor, was mapped adjacent to MYB25-like on chromosome D12 [60]. Evaluation of the inheritance of fuzzless seed in segregating populations suggested that the interaction of three loci (N1, n2, and n3) contributes to the fuzzless seed phenotype [48]; two of these loci, N1 and n2, are located on the pair of homoeologous chromosomes A12/D12 [6]. Both N1N1 homozygous and N1n1 heterozygous plants produced fuzzless seeds [48]. The n3 locus, which can produce fiberless seed, was identified by genetic analysis of the progeny of a cross between N1N1 and n2n2 [48]. A fourth locus, named \({n}_t^4\), was identified from the analysis of an ethyl methanesulfonate (EMS) induced mutant whose homozygous seeds exhibited a partially naked phenotype [3]. All these mutants defective in fiber development provide suitable materials for studying fiber development.

With the advance of next-generation sequencing (NGS), RNA-Seq has been widely used to reveal the expression of genes and transcripts, some of which have been identified as non-coding RNAs (ncRNAs) because they lack protein-coding capacity. NcRNAs include microRNAs (miRNAs), long non-coding RNAs (lncRNAs), and others, and have emerged as key regulators of gene expression through their direct and indirect actions on chromatin [23,24,25]. In Oryza sativa, 1,254 differentially expressed lncRNAs (DELs) were identified from BIL progenies [26]. Another RNA-Seq study showed that 328 of 444 DELs were associated with meiosis and low fertility in autotetraploid rice [27]. LncRNAs are also involved in abiotic stress responses, such as drought and re-watering in Brassica napus [46] and osmotic and salt stress in Medicago truncatula [53]. The differences in gene expression and regulation between fiberless mutants and the wild type have been investigated using omics methods [14, 28, 45, 51]. Using the fiberless mutant Xu142fl and its counterpart Xu142, a previous comparative small RNAome analysis uncovered a possible network of fiber initiation-related miRNAs in cotton ovules, comprising seven miRNAs expressed in cotton ovules, each with functionally specific targets [51]. Another study showed that 54 miRNAs are differentially expressed during fiber initiation between Xu142fl and its wild type, and that these potentially target transcription factors (TFs) and other genes such as MYB, auxin response factor, and leucine-rich repeat receptor genes [45]. Using multi-omics, differentially expressed genes (1,953), proteins (187), and phosphoproteins (131) were identified by comparing Xu142 and Xu142fl [28]. Genetic markers for fiber development, including 302 SNPs, were also developed and validated based on deep sequencing of Xu142 and Xu142fl [28].
In particular, a transcriptomic survey revealed that 645 and 651 lncRNAs were preferentially expressed in Xu142fl and Xu142, respectively. Further study showed that down-regulating two lncRNAs, XLOC_545639 and XLOC_039050, in Xu142fl increased the number of fiber initials on the ovules, while silencing XLOC_079089 in Xu142 shortened the fiber length [14], indicating the important and diverse roles of lncRNAs in fiber development.

LncRNAs are ncRNAs longer than 200 nt that lack protein-coding ability [4, 51, 69]. Until now, most studies on non-coding RNAs in cotton have been limited to small RNAs. For instance, many miRNAs specifically expressed during anther development in male-sterile cotton, or in callus during cotton somatic embryogenesis, have been identified [57, 64]. Gong et al. identified 33 conserved miRNA families shared between the A and D genomes [9]. At the genomic level, the expression of 79 miRNA families was studied and 257 novel miRNAs related to cotton fiber elongation were identified [63]. In addition, two key miRNAs, miR828 and miR858, were shown to regulate the homoeologous MYB2 genes (GhMYB2A and GhMYB2D) in G. hirsutum fiber development [11].

As a class of ncRNA, lncRNAs provide additional regulatory mechanisms for gene expression, protein synthesis, chromatin remodeling, etc., but the specific lncRNAs involved in fiber development and their underlying mechanisms remain unclear. A previous study identified 30,550 lincRNA loci and 4,718 lncNAT loci, which are rich in repetitive sequences and preferentially expressed in a tissue-specific manner with weak evolutionary conservation. Furthermore, lncRNAs showed overall higher methylation levels, and their expression was less affected by gene body methylation [52]. Using epidermal cells from ovules at 0 and 5 days post anthesis (DPA) from Xu142 and Xu142fl, 35,802 lncRNAs and 2,262 circular RNAs (circRNAs) were identified, of which 645 lncRNAs were preferentially expressed in the fiberless mutant Xu142fl and 651 lncRNAs were preferentially expressed in the fiber-attached lines; three lncRNAs, XLOC_545639, XLOC_039050, and XLOC_079089, were all shown by VIGS assay to function in fiber development [14]. Here, a novel glabrous mutant, ZM24fl, which shows excellent somatic embryogenesis induction, was used to identify key lncRNAs involved in fiber initiation [59].

In total, 3,288 lncRNA transcripts were identified from the -2 DPA, 0 DPA, and 5 DPA ovules of ZM24 and fl, a number markedly different from those reported for Xu142fl [14] and G. barbadense L. cv. 3-79 [52]. To identify the causal lncRNAs for fiber initiation, several comparisons were set up to analyze the differentially expressed lncRNAs and mRNAs during fiber initiation and early elongation. The DELs and DEGs identified in the comparisons of 0 DPA vs -2 DPA and 5 DPA vs 0 DPA in ZM24 and fl indicated that many lncRNAs and coding genes are involved in fiber initiation and early development, whereas few lncRNAs and coding genes may be involved in ovule development. Analysis of the DEGs further showed that fatty acid metabolism, very-long-chain fatty acid synthesis, and sugar metabolism play important roles in the fiber initiation of ZM24, supporting previous results [15, 39]. Moreover, genes encoding MYB family and bHLH-type TFs were also found to play important roles in fiber initiation, in agreement with the functions of these TFs reported previously [10, 16, 29, 36, 40, 49, 50]. To uncover upstream factors such as lncRNAs, we focused on the comparisons ZM24_0 DPA vs fl_0 DPA and ZM24_0 DPA vs ZM24_-2 DPA to find the lncRNAs common to both, which are likely key regulators of fiber initiation. Consequently, one lncRNA, MSTRG.2723.1, was obtained; it is located on chromosome A02 (84218766-84219942), is a lncNAT, and covers most of the coding region and part of the 3’-terminal untranslated region of Ghicr24_A02G147600 (Figure S4). Co-expression analysis further identified its potential targets, including 3-ketoacyl-CoA synthase, MYB family proteins, phosphatase 2C family proteins, pectin lyase, and some uncharacterized proteins, which may be involved in fiber initiation through the fatty acid pathway, cell wall plasticity, MYB-mediated signaling, etc.
These results provide important clues to the upstream regulatory lncRNAs in fiber initiation and novel information on the regulatory network of fiber development. In addition, MSTRG.3390.1, MSTRG.48719.1, and MSTRG.31176.1 were also found to be positively correlated with fiber development and ovule development. Sequence analysis indicated that these lncRNAs differ from the previously reported lncRNAs XLOC_545639, XLOC_039050, and XLOC_079089 [14]. The target analysis also implied possible interactions between different lncRNAs mediated by common targets, which provides novel clues for exploring the regulatory lncRNAs and underlying mechanisms in fiber development. Despite this progress, more work is needed to understand how lncRNAs regulate their targets or chromatin remodeling.

Conclusion

Here, a novel glabrous cotton mutant, ZM24fl, was identified and used to study potential lncRNAs involved in fiber development by high-throughput sequencing. ZM24fl is derived from the elite cultivar ZM24, which possesses high callus induction and somatic embryogenesis ability, making it a valuable recipient for cotton genetic transformation [59]. Through RNA-Seq analysis of ovules of ZM24 and fl at different stages, 3,288 lncRNAs were identified, including differentially expressed lncRNAs associated with fiber (lint and fuzz) initiation and early fiber elongation. Collectively, four lncRNAs, MSTRG.2723.1, MSTRG.3390.1, MSTRG.48719.1, and MSTRG.31176.1, showed potentially important roles in fiber development, and the target analysis implied that MSTRG.2723.1 may function upstream of fatty acid metabolism, the MYB25-mediated pathway, and pectin metabolism to regulate fiber initiation. The co-expression analysis between lncRNAs and targets further indicated distinct modes of action for different lncRNAs and interactions between lncRNAs, which provides valuable information for elucidating the molecular mechanism of lncRNAs in cotton fiber development.

Materials and methods

Plant Materials

Gossypium hirsutum L. acc. Zhongmiansuo24 (ZM24) and a natural fuzzless-lintless (fl) mutant derived from ZM24 were grown under standard field conditions at the Institute of Cotton Research of the Chinese Academy of Agricultural Sciences (Zhengzhou research base, Henan). Ovule tissues were collected from cotton bolls at -2, 0, and 5 DPA using a sterile knife. All materials were frozen immediately in liquid nitrogen and stored at -80 °C for the following experiments.

Microscopic observation of fiber initiation on the ovule epidermis

To study the fiber initiation phenotypes of ZM24 and fl, cotton bolls of the two lines were collected at -2, 0, 1, and 2 DPA. The ovules were then stripped from the middle region of the bolls and immediately examined by scanning electron microscopy (Hitachi) to observe the ovule epidermis, as described previously [14].

Strand-specific library construction and sequencing

Total RNA of each ovule sample was extracted using the RNAprep Pure Plant Kit (Tiangen, Beijing, China) following the manufacturer’s instructions. Total RNA of each sample was quantified and qualified with an Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA), a Nanodrop 2000 (Thermo Fisher Scientific Inc.), and a 1% agarose gel. RNA with a RIN value above 7 was used for subsequent library construction. The rRNA was removed using the Ribo-Zero™ rRNA removal Kit. The ribosomal-depleted RNA was then used for sequencing library preparation according to the manufacturer’s protocol (NEBNext® Ultra™ Directional RNA Library Prep Kit for Illumina®). The cDNA libraries with different indices were multiplexed and loaded on an Illumina HiSeq 2500 to generate 150 base pair (bp) paired-end (PE150) raw reads according to the manufacturer’s instructions (Illumina, San Diego, CA, USA). The RNA-Seq raw data were deposited in the NCBI Sequence Read Archive (http://www.ncbi.nlm.nih.gov/sra/) under accession number SRP285346; the accession numbers of the twenty-four runs are SRR12710181-SRR12710192 and SRR12718970-SRR12718981.

Mapping to the reference genome and lncRNA identification

The raw data in FASTQ format were filtered with Cutadapt (v1.9.1) [30]. Clean data were obtained by removing reads that contained adapters, poly-N, or bases with Phred quality < 20 at the 3’ or 5’ end; reads shorter than 75 bp after filtering were also removed. Finally, the GC percentage and Q30 of each sample were calculated using FastQC (https://www.babraham.ac.uk/) and are shown in Table S1. Clean data were mapped to the ZM24 genome (https://github.com/gitmalm/Genome-data-of-Gossypium-hirsutum/) [66] using HISAT2 (v2.1.0) [20, 21] with the parameter “--rna-strandness RF”. The transcriptome of each sample was assembled from the mapped reads, and the assemblies were merged with StringTie (v2.0) [34, 35]. Transcript annotation was performed using Cuffcompare [47]. Long non-coding RNAs were identified in the following steps: 1) transcripts with class codes “i”, “u”, “x”, and “j”, representing intronic transcripts, long intergenic noncoding RNAs (lincRNAs), long noncoding natural antisense transcripts (lncNATs), and sense transcripts, respectively, were selected; 2) transcripts with length > 200 bp, coverage > 1, and FPKM > 0.5 were retained; 3) CNCI [44], CPC [22], and PfamScan were used to assess protein-coding ability [7], with the thresholds CPC score < 0 and CNCI score < 0.
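The three identification steps can be read as a chain of filters over the assembled transcripts. The following is a minimal sketch of that logic, not the pipeline actually used: the `Transcript` record, its field names, and the rule that any PfamScan hit marks a transcript as coding are assumptions made for illustration, since the text gives no PfamScan threshold.

```python
from dataclasses import dataclass

# Cuffcompare class codes named in the text:
# "i" intronic, "u" intergenic (lincRNA), "x" antisense (lncNAT), "j" sense.
LNCRNA_CLASS_CODES = {"i", "u", "x", "j"}


@dataclass
class Transcript:
    """Hypothetical record combining assembly and coding-potential results."""
    transcript_id: str
    class_code: str
    length: int          # transcript length in bp
    coverage: float      # per-base coverage reported by the assembler
    fpkm: float
    cpc_score: float
    cnci_score: float
    has_pfam_hit: bool   # assumption: any PfamScan hit counts as coding evidence


def is_candidate_lncrna(t: Transcript) -> bool:
    """Apply the three steps in order: class code, size/expression, coding potential."""
    if t.class_code not in LNCRNA_CLASS_CODES:
        return False
    if not (t.length > 200 and t.coverage > 1 and t.fpkm > 0.5):
        return False
    return t.cpc_score < 0 and t.cnci_score < 0 and not t.has_pfam_hit


def filter_lncrnas(transcripts):
    """Return the IDs of transcripts that pass every filter."""
    return [t.transcript_id for t in transcripts if is_candidate_lncrna(t)]


# Toy records with made-up values, for illustration only.
examples = [
    Transcript("toy.1", "x", 1177, 5.2, 3.1, -1.0, -0.2, False),   # passes
    Transcript("toy.2", "u", 180, 4.0, 2.0, -1.0, -0.2, False),    # too short
    Transcript("toy.3", "=", 900, 4.0, 2.0, -1.0, -0.2, False),    # known transcript
    Transcript("toy.4", "i", 900, 4.0, 2.0, 0.8, -0.2, False),     # CPC suggests coding
]
```

Calling `filter_lncrnas(examples)` keeps only the first toy record; the other three each fail one step of the filter.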

Differential expression analysis

The FPKM values and counts of genes and lncRNAs in each sample were calculated using StringTie and Ballgown [35]. Differential expression analyses were conducted with the edgeR package in R [37, 38]. DEGs and DELs were identified in each pairwise comparison using the criteria FPKM > 1.0, false discovery rate (FDR) < 0.001, and |log2(fold change)| ≥ 1.
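The selection criteria can be illustrated with a small classifier applied to the statistics of one gene or lncRNA in one pairwise comparison. This is a sketch under stated assumptions: the log2 fold change and FDR are taken as already computed (for example by edgeR), the function and argument names are hypothetical, and "FPKM > 1.0" is interpreted here as requiring the higher of the two group means to exceed 1.0, which the text does not specify.

```python
def classify_expression_change(log2_fc, fdr, fpkm_a, fpkm_b,
                               min_fpkm=1.0, max_fdr=0.001, min_abs_log2_fc=1.0):
    """Label one feature as 'up', 'down', or 'not_significant'.

    log2_fc and fdr are assumed to come from an upstream test such as edgeR;
    fpkm_a and fpkm_b are the mean FPKM values of the two compared groups.
    """
    # Assumption: the expression filter applies to the higher of the two groups.
    if max(fpkm_a, fpkm_b) <= min_fpkm:
        return "not_significant"
    if fdr >= max_fdr:
        return "not_significant"
    if abs(log2_fc) < min_abs_log2_fc:
        return "not_significant"
    return "up" if log2_fc > 0 else "down"
```

For instance, a feature with log2 fold change 2.3, FDR 1e-5, and group FPKM values of 0.8 and 4.0 would be labelled "up", whereas the same feature with FDR 0.01 would not pass.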

Co-expression analysis between lncRNA and mRNA

To unveil the potential functions of DELs between the two genotypes, two interaction models between lncRNAs and protein-coding genes (lncRNAs/PC-genes), cis- and trans-targeting, were analyzed: 1) the Pearson correlation coefficient (PCC) between differentially expressed lncRNAs and mRNAs was calculated from the expression profiles (FPKM) using the OmicShare tools (https://www.omicshare.com/), and lncRNA-mRNA pairs with |PCC| > 0.95 and p-value < 0.01 were regarded as trans interactions; 2) protein-coding genes located less than 20 kb upstream or downstream of a lncRNA were regarded as putative cis targets. The co-expression networks were visualized with Cytoscape 3.6.1 [41].
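The two target models reduce to a correlation test on expression profiles and a distance test on genomic coordinates. The sketch below shows both in plain Python; it is not the OmicShare implementation. The function names are hypothetical, the p-value criterion is left out because it needs a t-distribution (a statistics library would supply it), and treating overlapping features as distance 0 is an assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def is_trans_candidate(lnc_fpkm, mrna_fpkm, min_abs_pcc=0.95):
    """|PCC| criterion only; the p-value < 0.01 criterion is not checked here."""
    return abs(pearson(lnc_fpkm, mrna_fpkm)) > min_abs_pcc


def interval_gap(start_a, end_a, start_b, end_b):
    """Distance in bp between two intervals; 0 if they overlap (assumption)."""
    return max(0, max(start_a, start_b) - min(end_a, end_b))


def is_cis_candidate(chrom_a, start_a, end_a, chrom_b, start_b, end_b,
                     window=20_000):
    """A gene is a putative cis target if it lies within `window` bp on the same chromosome."""
    return chrom_a == chrom_b and interval_gap(start_a, end_a, start_b, end_b) < window
```

Under these definitions an antisense lncRNA that overlaps its host gene is always a cis candidate for that gene, since the gap between them is 0.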

GO and KEGG

To explore the functions of DEGs and lncRNAs between ZM24 and fl, gene ontology (GO) enrichment was performed using the BLASTP program [1] and the GO databases (http://archive.geneontology.org/latest-lite/) and (http://ftp.ncbi.nlm.nih.gov/gene/DATA/). Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis was performed using the KOBAS 3.0 website [58, 61] (http://kobas.cbi.pku.edu.cn/kobas3).

Q-PCR analysis

Ovules from bolls at -2, 0, 2, and 5 DPA were collected, and total RNA was extracted using the RNAprep Pure Plant Kit (Polysaccharides & Polyphenolics-rich, Tiangen, Beijing, China) following the manufacturer’s instructions. Each reverse transcription reaction was performed with 1 μg RNA using TransScript® First-Strand cDNA Synthesis SuperMix (AT301-02, TransGen). Real-time PCR was performed on a Roche 480 PCR system with SYBR-Green Real-time PCR SuperMix (AQ101-01, TransGen). Each 20 μL reaction contained 1 μL cDNA, 8.2 μL sterile water, 10 μL Mix, and 0.4 μL each of the forward and reverse primers. The Q-PCR program was as follows: pre-incubation at 95 °C for 30 s; followed by denaturation at 95 °C for 10 s, primer annealing at 55 °C for 10 s, and extension at 72 °C for 30 s; and finally a melting curve at 95 °C for 30 s to check primer specificity. The GhHistone3 (AF024716) gene was used as the reference gene. The \(2^{-\Delta Ct}\) method was used to calculate the relative expression of each gene, with three technical replicates and three biological replicates. Data are shown as mean ± SD. Student’s t-test was used to assess statistical significance. The primer sequences used in the present study are listed in Additional file 9.
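As a worked illustration of the relative expression calculation, the sketch below computes ΔCt as the Ct of the target minus the Ct of the reference gene and reports 2 raised to the power of −ΔCt. It assumes that technical replicates are averaged within each biological replicate before ΔCt is taken and that the SD is the sample standard deviation across biological replicates; the text does not state either choice, and the function names are hypothetical.

```python
from statistics import mean, stdev


def relative_expression(target_ct, reference_ct):
    """2^-dCt for one biological replicate.

    target_ct and reference_ct are lists of technical-replicate Ct values;
    averaging them first is an assumption of this sketch.
    """
    delta_ct = mean(target_ct) - mean(reference_ct)
    return 2.0 ** (-delta_ct)


def summarize_replicates(biological_replicates):
    """Mean and sample SD of 2^-dCt across biological replicates.

    biological_replicates is a list of (target_ct, reference_ct) pairs.
    """
    values = [relative_expression(t, r) for t, r in biological_replicates]
    return mean(values), stdev(values)
```

With a target Ct of 24 and a reference Ct of 22 in every technical replicate, ΔCt is 2 and the relative expression is 2^-2 = 0.25.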