Background

The discovery of the class of RNA regulatory genes known as microRNAs (miRNAs) introduced a whole new layer of gene regulation in eukaryotes [1]. Since the discovery of the first miRNA (lin-4) in Caenorhabditis elegans, thousands of miRNAs have been identified experimentally or computationally from a variety of species [1]. miRNAs are currently estimated to comprise 1 to 5% of animal genes and collectively regulate up to 30% of genes, making them one of the most abundant classes of regulators [2]. However, while the importance of miRNAs in animal ontogeny has been rapidly elucidated, their role in phylogeny remains largely unknown. Recent studies have provided important clues indicating that these approximately 22-nucleotide non-coding RNAs might have been a causative factor in increasing organismal complexity through their action in regulating gene expression [3-6]. Indeed, vertebrates possess many more miRNAs than any invertebrate sampled to date, and the emergence of vertebrates is characterized by an unprecedented increase in the rate of miRNA family innovation [4-6]. However, how this increase in the miRNA repertoire relates to the emergence of the complex vertebrate body plan is currently unclear because groups that might provide insight (such as amphioxus) have not yet been thoroughly studied.

As the living invertebrate relative of the vertebrates, amphioxus affords the best available glimpse of a proximate invertebrate ancestor of the vertebrates and is likely to exemplify many of the starting conditions at the dawn of vertebrate evolution [7, 8]. The completion of the amphioxus genome project provides a tremendous opportunity for identifying miRNAs in this organism [9]. According to the rules proposed by Ambros et al. [10] and Berezikov et al. [11], a genuine miRNA should fulfill two basal requirements for miRNA annotation: its expression should be confirmed experimentally (the expression criterion) and the putative miRNA should be embedded within a canonical stem-loop hairpin precursor (the structural criterion). Furthermore, an optional but commonly used criterion is that the mature miRNA sequence and the predicted hairpin structure should be conserved in different species. Non-conserved miRNAs require more careful examination. In this work, we have proposed an integrative strategy combining an experimental screen with bioinformatic analysis to identify miRNAs fulfilling all these requirements (Figure 1). Our strategy has four steps: investigating all small RNAs expressed in the amphioxus Branchiostoma belcheri (Gray) via Solexa, a massively parallel sequencing technology [4b, miRNAs conserved throughout the Bilateria). These miRNAs are phylogenetically conserved despite several hundred million years of divergent evolution, suggesting ancient roles for them in activating the terminal differentiation of organs, tissues and specific cell types common to metazoans. 
Protostomes and chordates appear to have miRNAs that are specific to each clade, as most invertebrate miRNAs have been lost in the chordate lineage (Figure 4b, homologs of invertebrate miRNAs), and many novel miRNAs present in both chordates and vertebrates have been fixed in the chordate genome and perpetuated under intense purifying selection over evolutionary time (Figure 4b, miRNAs present in both chordates and vertebrates). This observation suggests that chordates have abandoned most ancestral characters and are more vertebrate-like than any other invertebrate. Since many vertebrate miRNAs have homologs in amphioxus, these miRNAs must have been present in the last common ancestor of amphioxus and vertebrates. Thus, the profound reorganization of the miRNA repertoire in amphioxus (the continuous expansion of the miRNA inventory and the loss of ancient miRNAs) highlights the importance of amphioxus as a model for understanding the transition from invertebrates to vertebrates.

Comparison of the miRNA repertoires of cephalochordates and tunicates

miRNAs can also serve as valuable markers for resolving outstanding evolutionary questions. For instance, a fundamental evolutionary question is whether cephalochordates or tunicates are the closest living invertebrate relatives of the vertebrates [23]. Living invertebrate chordates comprise the urochordate tunicates (the most familiar of which are the ascidians) and the cephalochordate amphioxus. Traditionally, cephalochordates are considered to be the closest living relatives of vertebrates, with tunicates representing the earliest chordate lineage [7, 8]. However, recent phylogenetic analyses with large concatenated gene sets suggest that the evolutionary positions of tunicates and cephalochordates should be reversed [24]. To address this question, we reconstructed the evolutionary histories of tunicates and cephalochordates according to their miRNA histories.

If tunicates are more vertebrate-like, then they should possess a subset of miRNAs conserved across chordates and vertebrates, but few invertebrate-specific miRNAs. However, by tracing the phylogenetic histories of miRNAs in Oikopleura dioica, Ciona intestinalis, and B. belcheri (Gray), we found that several phylogenetically conserved miRNAs were either lost or no longer recognizable in O. dioica (for example, miR-33, miR-34, miR-125, miR-133, miR-184, and miR-210), and we did not detect any miRNAs present in both chordates and vertebrates. Likewise, some phylogenetically conserved miRNAs were also lost in C. intestinalis (for example, miR-1, miR-9 and miR-10). In contrast, many phylogenetically conserved miRNAs, as well as miRNAs present in both chordates and vertebrates (for example, miR-216, miR-217, miR-22, miR-25, and miR-96), could be reliably traced back to B. belcheri (Gray). As can be seen in Figure 4c, amphioxus, in comparison to tunicates, shares additional miRNAs with zebrafish and has lost most ancestral miRNAs. These data strongly suggest that amphioxus miRNAs are less divergent from vertebrate miRNAs than are tunicate miRNAs. In agreement with this, the cephalochordate body plan is more vertebrate-like than that of any tunicate, as amphioxus possesses many homologs of vertebrate organs (for example, the pineal and pronephric kidneys) that are not found in tunicates [25]. Thus, the most appropriate organisms to use as a simple model for deciphering the fundamentals of vertebrate development are turning out to be the amphioxus cephalochordates, whose body plans and miRNA repertoires are more vertebrate-like than those of the tunicates. In contrast, tunicates are morphologically and molecularly derived with a trend towards genomic simplification.

Discussion

One important question in evolutionary biology concerns the origin of vertebrates from invertebrates. Amphioxus is generally accepted as an ideal model to use as a proxy for the ancestral vertebrates [7, 8, 26]. Recent advances in molecular biology and microanatomy have supported homology of body parts between vertebrates and amphioxus [8, 27, 28]. Thus, a thorough knowledge of the morphology and genetic programs of amphioxus may provide us with a unique opportunity to reconstruct the major events of early vertebrate evolution and decipher how the vertebrate body plan evolved.

While amphioxus is an outstanding model organism to bridge the huge gap between invertebrates and vertebrates, no amphioxus miRNAs have been registered in the miRNA database miRBase 12.0 [18]. The study of miRNAs in vertebrates such as mice, rats and humans as well as invertebrates such as C. elegans and Drosophila melanogaster has far outpaced that in amphioxus. Given the important position of amphioxus in metazoan phylogeny, the identification of novel miRNAs from amphioxus will contribute greatly to our understanding of both miRNA evolution and the possible role of miRNAs in facilitating the evolution of more complex animal forms.

Previously, miRNAs were defined as non-coding RNAs that fulfill a combination of expression and biogenesis criteria [10, 11]. First, a mature miRNA should be expressed as a distinct transcript of approximately 22 nucleotides that is detectable by Northern blot analysis or other experimental means such as cloning from size-fractionated small RNA libraries. Second, a mature miRNA should originate from a precursor with a characteristic secondary structure, such as a hairpin or fold-back, that does not contain large internal loops or bulges. The mature miRNA should occupy the stem part of the hairpin. By this method, a large portion of the small RNAs, such as breakdown products of mRNA transcripts, other endogenous non-coding RNAs (for example, tRNAs, rRNAs and natural antisense small interfering RNAs), as well as exogenous small interfering RNAs, are filtered out from the population of miRNAs [10, 11]. However, hairpin structures are common in eukaryotic genomes and are not a unique feature of miRNAs. Many random inverted repeats (termed pseudo-hairpins) can also fold into dysfunctional hairpins [14, 17]. To eliminate the false positive pseudo-hairpins, an optional but commonly used criterion that requires miRNA sequence and hairpin structure be conserved in different species [10, 11] was employed in the present study. By this definition, we detected 55 conserved miRNA genes in the amphioxus B. belcheri (Gray) that encode 45 non-redundant mature miRNAs. All of these conserved miRNAs meet the expression and structure criteria required for miRNA annotation, and many have additional supporting evidence such as multiple observations of expression, genomic clustering, and cloning of the star sequences. Unfortunately, the problem has not been solved thoroughly since a large number of non-conserved pre-miRNAs with species-specific expression patterns do exist in eukaryotes [16]. 
To surmount the technical shortfalls of comparative methods for identifying species-specific and non-conserved pre-miRNAs, several ab initio predictive approaches have been developed [14, 17]. With these methods, many non-conserved miRNAs have been discovered and experimentally verified in viruses and humans [14, 17]. Here, we used MiPred, an ab initio prediction approach for identifying pre-miRNAs without relying on phylogenetic conservation, to remove the irrelevant genomic pool of pseudo-hairpins without sacrificing putative non-conserved pre-miRNAs [14, 17]. Among 69 pre-miRNA-like hairpins, 11 were classified as pseudo-pre-miRNAs and 58 as authentic pre-miRNAs. Thus, 58 miRNA genes constitute the final collection of non-conserved miRNA genes in amphioxus, and these encode 53 non-redundant mature miRNAs. Likewise, all of these miRNAs meet the expression and structural criteria required for miRNA annotation, and many have additional supporting evidence, including multiple observations of expression, genomic clustering and cloning of star sequences. However, the set of non-conserved miRNAs differed markedly from the set of conserved miRNAs in abundance, as the non-conserved miRNAs were represented by only 23,613 tags compared to 246,524 tags for the conserved miRNAs. These results indicate that the non-conserved miRNAs are expressed at substantially lower levels or in limited cell types or circumstances.

While we were writing this manuscript, Luo and Zhang [29] reported the computational prediction of 28 miRNAs in amphioxus using a homology search of Branchiostoma floridae v1.0 (an incomplete amphioxus genome). However, prediction of miRNAs without experimental proof is not sufficient, since predicted miRNAs only meet the structural criterion for being authentic miRNAs [10]. Furthermore, the computational approach provides no information on the expression levels of amphioxus miRNAs. After carefully comparing our result with that of Luo and Zhang, we found that the dataset from their study is just a subset of the Solexa dataset (Table S10 in Additional data file 1). In addition to computer-aided algorithms, Sanger-based molecular cloning strategies have been frequently used to identify new miRNAs in metazoans [30, 31]. By using this method, Dai et al. [32] provided experimental evidence for 33 evolutionarily conserved miRNAs and 35 amphioxus-specific miRNAs in the amphioxus Branchiostoma japonicum. However, the Sanger-based molecular cloning approach is highly biased towards abundantly and/or ubiquitously expressed miRNAs [17], making it unsuitable for identifying miRNAs that are expressed at low levels, at very specific stages or in rare cell types. This limitation, however, can be overcome by massively parallel sequencing technologies that significantly increase sequencing depth [11]. Accordingly, we employed Solexa sequencing, a massively parallel sequencing technology, to identify miRNAs from amphioxus. Solexa is a breakthrough sequencing technology characterized by numerous distinct advantages over conventional Sanger-based cloning technologies. In addition to avoiding the bacterial cloning steps inherent in Sanger sequencing, Solexa enables hundreds of thousands of short sequencing reads to be generated in one run, thereby boosting the discovery of many expressed small RNAs and resulting in the identification of more candidate miRNAs.

Consistent with this idea, our results compare favorably with those of Dai et al. in four respects. First, the sequencing depth of the two studies differs by orders of magnitude: Dai et al. identified 841 sequences (out of 2,217 effective reads) as amphioxus miRNAs, whereas we identified 246,524 sequences (out of 313,313 effective reads) as amphioxus miRNAs. Second, after carefully comparing our dataset with that from Dai et al.'s study, we found that all the conserved miRNAs identified by Dai et al. are a subset of the conserved miRNAs identified by us, and 23 out of 35 amphioxus-specific miRNAs were identified in both studies (Table S10 in Additional data file 1). Third, besides expression and structural criteria, Dai et al. provided no additional evidence supporting the correct annotation of amphioxus-specific miRNAs. As can be seen in Table S10 in Additional data file 1, most of the 12 amphioxus-specific miRNAs identified from B. japonicum but not found in B. belcheri (Gray) are classified as pseudo-pre-miRNAs and represented by a single read. Thus, these non-conserved miRNAs require more careful examination before they can be annotated as genuine miRNAs. Fourth, we showed that Solexa can produce highly accurate and definitive readouts of many low-level miRNAs, such as miRNA*s. In contrast, no miRNA*s were found in B. japonicum by the Sanger-based cloning approach. This result further suggests that the Sanger-based molecular cloning approach is unsuitable for identifying miRNAs that are expressed at low levels.

When this manuscript was submitted, miRBase 13.0 was released. Since our analysis was based on miRBase 12.0, we updated the analysis by comparing our dataset with miRBase 13.0. No new miRNAs were identified and none of the major conclusions changed, except that some amphioxus-specific miRNAs were assigned corresponding names (Table S10 in Additional data file 1). Taken together, these comparisons indicate that Solexa sequencing is a particularly powerful tool for miRNA discovery. More importantly, comparison of the miRNAs identified from B. belcheri (Gray), B. floridae, and B. japonicum will confirm the existence of some identical miRNAs in amphioxus and provide important clues to the roles of some special miRNAs.

We also present a comprehensive analysis of the organization of amphioxus miRNA genes. Consistent with the miRNA organization in zebrafish, mouse and humans, many amphioxus miRNAs have multiple copies in the genome and/or are organized in clusters. The implications of miRNA gene amplification are still unknown, but miRNA genes with multiple copies may augment or amplify the physiological functions of individual miRNA genes. Our observations support the hypothesis that duplication events causing the rapid spread of miRNA genes throughout the genome occurred extensively in the lineage leading to vertebrates.

Previous studies have suggested that animals with complex organs have increased their cell type repertoire and morphological complexity over geological time in a manner strikingly similar to the expansion of their miRNAs [4-6]. The availability of more miRNAs in animals with complex organs might help to further modulate the developmental network in complex tissues and organs. Interestingly, we noted that although amphioxus does not possess as many miRNAs as vertebrates, it shares a set of key miRNAs with vertebrates that may have had a huge impact on phenotypic diversity and cell lineage decisions during animal phylogeny. For instance, miR-183, miR-184 and miR-96 dominate the population of expressed miRNAs in sensory organs in vertebrates [33], and these were also detected in amphioxus. Consistent with this, amphioxus possesses a frontal eye (homologous to the vertebrate paired eyes) and a lamellar organ (homologous to vertebrate pineal photoreceptors) [28]. Likewise, in agreement with the presence of gastric endocrine cells in amphioxus that are possibly homologous to the pancreatic islet cells of mammals [34], miR-216, miR-217, miR-7, and miR-375, which are characteristic of pancreatic tissue [35], are well established in amphioxus. Although the detailed spatial expression of these miRNAs remains to be shown, it is intriguing to speculate that a pool of such miRNAs contributed greatly to the evolution of complex vertebrate body plans. Further comparison of the body part homology and miRNA repertoires of amphioxus and vertebrates will allow us to model more precisely what our ancestors were like and, thereby, provide a unique opportunity to decipher how the vertebrate body plan evolved.

Another interesting observation is that none of the miRNAs involved in adaptive immunity (for example, miR-181a, miR-155, and miR-223) could be reliably traced back to amphioxus or previous protostomes [36]. When and how adaptive immunity emerged is an evolutionary mystery. It is generally believed that adaptive immunity emerged suddenly and is only present in jawed vertebrates [37]. We hypothesize that certain key miRNAs, such as miR-181a, miR-155, and miR-223, played a fundamental role in the genesis of the molecular machinery of the adaptive immune system. In this regard, the absence of these miRNAs in invertebrates (including amphioxus) explains why adaptive immunity is restricted to jawed vertebrates. However, to understand better the evolutionary origins of adaptive immune systems, more comparative data from jawless vertebrates (for example, lamprey and hagfish) are clearly needed.

Conclusions

Our current study introduces an accurate and efficient approach for miRNA discovery and will aid the identification of many miRNAs in other species. More importantly, our study provides the basis for future analysis of miRNA function in amphioxus. Further comparison of the body part homology and miRNA repertoire between amphioxus and vertebrates will allow us to model more precisely what our ancestors were like and offer a unique opportunity to decipher how the vertebrate body plan evolved.

Materials and methods

Animal collection and RNA isolation

Adults of the Chinese amphioxus B. belcheri (Gray) were collected from Beihai, Guangxi, China and kept alive with seawater and sea alga. For Solexa sequencing, 12 adult animals were pooled together, and total RNA was extracted from pooled samples with Trizol (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's instructions.

Solexa sequencing

The sequencing procedure was conducted as previously described [12]. Briefly, after PAGE purification of small RNA molecules (under 30 bases) and ligation of a pair of Solexa adaptors to their 5' and 3' ends, the small RNA molecules were amplified using the adaptor primers for 17 cycles and fragments of around 90 bp (small RNA + adaptors) were isolated from agarose gels. The purified DNA was used directly for cluster generation and sequencing analysis using the Illumina Genome Analyzer (Illumina, San Diego, CA, USA) according to the manufacturer's instructions. The image files generated by the sequencer were then processed to produce digital-quality data. After masking of adaptor sequences and removal of contaminated reads, clean reads were processed for computational analysis.

In silico analysis

Solexa reads were aligned against the amphioxus genome (Branchiostoma floridae v2.0) [9] using SOAP [15]. Sequences with a perfect match or one mismatch were retained for further analysis. To analyze the RNA secondary structures surrounding matched Solexa reads, 100 nucleotides of genomic sequence flanking each side of these sequences were extracted, and the secondary structures were predicted using RNAfold [38] and analyzed by MIREAP [39] under default settings. MIREAP is a computational tool specially designed to identify genuine miRNAs from deeply sequenced small RNA libraries; it fully considers miRNA biogenesis, sequencing depth and structural features to improve the sensitivity and specificity of miRNA identification. Stem-loop hairpins were considered typical only when they fulfilled three criteria: mature miRNAs are present in one arm of the hairpin precursors, which lack large internal loops or bulges; the secondary structures of the hairpins are stable, with the free energy of hybridization lower than -20 kcal/mol; and hairpins are located in intergenic regions or introns. Those genes whose sequences and structures satisfied all of these criteria were considered candidate miRNA genes. Subsequently, we used a computational approach named miRAlign to predict new miRNA genes that are paralogs or orthologs of known miRNAs [13]. Finally, all remaining candidates were subjected to MiPred to filter out pseudo-pre-miRNAs. MiPred is a random forest-based method for classifying genuine pre-miRNAs and pseudo-pre-miRNAs using a hybrid feature set (including local contiguous structure-sequence composition, minimum free energy of the secondary structure and P-value of a randomization test) [14]. Given a sequence, MiPred decides whether it is a pre-miRNA-like hairpin sequence or not. If the sequence is a pre-miRNA-like hairpin, the random forest-based classifier predicts whether it is a genuine pre-miRNA (minimum free energy < -20 kcal/mol and P-value < 0.05) or a pseudo-pre-miRNA (minimum free energy > -20 kcal/mol or P-value > 0.05).
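The flank extraction and the final threshold rule described above can be illustrated with a minimal Python sketch. It is not the MiPred random forest classifier itself; it only reproduces the cut-offs quoted in the text. The names `flanked_window`, `Hairpin` and `classify` are hypothetical, coordinates are assumed to be 0-based and end-exclusive, and values lying exactly on a threshold (which the text leaves unspecified) are assumed to be treated as pseudo-pre-miRNAs.

```python
from dataclasses import dataclass

MFE_CUTOFF = -20.0  # kcal/mol, threshold quoted in the text
P_CUTOFF = 0.05     # randomization-test P-value threshold quoted in the text
FLANK = 100         # nucleotides of genomic flank taken on each side of a read


def flanked_window(read_start, read_end, contig_length, flank=FLANK):
    """Return the (start, end) window around a mapped read, clipped to the contig.

    Coordinates are assumed to be 0-based and end-exclusive (an assumption).
    """
    return max(0, read_start - flank), min(contig_length, read_end + flank)


@dataclass
class Hairpin:
    name: str
    mfe: float      # minimum free energy of the predicted fold, kcal/mol
    p_value: float  # P-value of the randomization test


def classify(hairpin):
    """Apply the two quoted thresholds to a pre-miRNA-like hairpin.

    Boundary values (exactly -20 kcal/mol or exactly 0.05) are assumed
    to fall on the pseudo-pre-miRNA side.
    """
    if hairpin.mfe < MFE_CUTOFF and hairpin.p_value < P_CUTOFF:
        return "pre-miRNA"
    return "pseudo-pre-miRNA"
```

For example, a hypothetical hairpin with a minimum free energy of -35.2 kcal/mol and a P-value of 0.01 would be labelled a pre-miRNA, whereas the same hairpin with a P-value of 0.2 would be labelled a pseudo-pre-miRNA.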

Stem-loop quantitative RT-PCR assay

Assays to quantify the mature miRNAs were conducted as previously described [19, 20]. Briefly, 1 μg of total RNA was reverse-transcribed to cDNA using AMV reverse transcriptase (TaKaRa Co., Tokyo, Japan) and looped antisense primers. The mix was incubated at 16°C for 15 minutes, 42°C for 60 minutes, and 85°C for 5 minutes. This allowed for the creation of a library of multiple miRNA cDNAs. Real-time PCR was performed using an Applied Biosystems 7300 Sequence Detection system (Applied Biosystems, Foster City, CA, USA) following a standardized protocol. In each assay, 1 μl cDNA (1:50 dilution) was used for amplification. The reactions were incubated in a 96-well optical plate at 95°C for 5 minutes, followed by 40 cycles of 95°C for 15 s and 60°C for 1 minute. All reactions were run in triplicate. After the reaction, the threshold cycle (CT) was determined using default threshold settings. The CT is defined as the fractional cycle number at which the fluorescence passes the fixed threshold. To calculate the expression levels of miRNAs, a series of synthetic miRNA oligonucleotides of known concentration were also reverse-transcribed and amplified. The absolute amount of each miRNA was then calculated by reference to the standard curve.
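The standard-curve calculation can be sketched as follows. This is an illustrative Python example under stated assumptions, not the analysis code used in the study: it assumes the usual linear relationship between CT and the log10 of the input amount, fits it by ordinary least squares, and averages triplicate CT values before interpolation. The function names and the units of the standards are hypothetical.

```python
import math


def fit_standard_curve(amounts, ct_values):
    """Least-squares fit of CT = slope * log10(amount) + intercept.

    `amounts` are the known quantities of the synthetic miRNA standards
    (units are an assumption, for example fmol).
    """
    xs = [math.log10(a) for a in amounts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_ct(replicates):
    """Average the CT values of replicate (for example triplicate) reactions."""
    return sum(replicates) / len(replicates)


def amount_from_ct(ct, slope, intercept):
    """Invert the standard curve to obtain the absolute amount for a CT."""
    return 10 ** ((ct - intercept) / slope)
```

In use, the curve would be fitted once per miRNA from the dilution series of its synthetic standard, and each sample's mean CT would then be converted to an absolute amount with `amount_from_ct`.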

Microarray experiments

The 795 complementary probes (in triplicate) against miRNAs, corresponding to 537 human, 204 mouse, and 54 rat miRNAs, were designed based on miRBase release 12.0 [18]. RNA labeling, microarray hybridization and array scanning were performed as previously described [21]. Briefly, 25 μg of total RNA was used to isolate the low molecular weight RNA using polyethylene glycol solution precipitation. Subsequently, low molecular weight RNAs were labeled with Cy3 and hybridized with miRNA microarrays (CapitalBio Corp., Beijing, China). Finally, hybridization signals were detected and quantified. Four independent adult amphioxus RNA samples were hybridized with miRNA microarrays separately. Hybridization intensity values from each amphioxus sample were filtered and global median normalized. We considered candidate miRNAs with a signal above 3,000 and P < 0.001 from a Student's t-test (compared with the blank spotting solution) to be positive.
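The normalization and calling steps can be illustrated with a short Python sketch. The exact scaling used for global median normalization is not given in the text, so the version below is an assumption: each array is rescaled so that its median equals the mean of all array medians. The P-values are taken as precomputed inputs, and the function names are hypothetical.

```python
from statistics import median

SIGNAL_CUTOFF = 3000  # signal threshold quoted in the text
P_CUTOFF = 0.001      # t-test P-value threshold quoted in the text


def global_median_normalize(arrays):
    """Rescale each array so that all arrays share the same median.

    `arrays` maps a sample name to a {probe: intensity} dict. The common
    target (mean of the per-array medians) is an assumption; array medians
    are assumed to be positive.
    """
    medians = {sample: median(probes.values()) for sample, probes in arrays.items()}
    target = sum(medians.values()) / len(medians)
    return {
        sample: {probe: value * target / medians[sample] for probe, value in probes.items()}
        for sample, probes in arrays.items()
    }


def is_positive(signal, p_value):
    """Call a probe positive when it passes both quoted thresholds."""
    return signal > SIGNAL_CUTOFF and p_value < P_CUTOFF
```

Both thresholds must hold for a probe to be called positive, so a strong signal with a P-value of 0.01 would be rejected, as would a highly significant probe with a signal of 2,500.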

Pearson's correlation coefficient

Correlation is a technique for investigating the relationship between two quantitative, continuous variables. Pearson's correlation coefficient R, also known as the product-moment coefficient of correlation, is a measure of the strength of the association between the two variables. The first step in studying the relationship between two continuous variables is to draw a scatter plot of the variables to check for linearity. The nearer the scatter of points is to a straight line, the higher the strength of association between the variables. The Pearson's correlation coefficient R may take any value from -1 to +1.
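As a concrete illustration of the definition above, the following Python sketch computes the product-moment coefficient from two paired series of measurements. It is a minimal example using only the standard library; the function name `pearson_r` is hypothetical, and the input values in any usage are assumed to be paired observations of two continuous variables.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient of paired values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)
```

Points lying exactly on a rising straight line give a value of +1 and points on a falling line give -1; scattered points give values in between.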

Additional data files

The following additional data are available with the online version of this paper: Tables S1 to S10 (Additional data file 1); a figure showing the predicted stem-loop structures of conserved amphioxus miRNAs (Additional data file 2); a figure showing the predicted stem-loop structures of amphioxus-specific miRNAs (Additional data file 3).