
In 1959, H. Harris deduced that most nuclear RNAs are likely to be non-protein-coding [1]. More than 50 years later, the existence of functional noncoding RNAs has become conventional knowledge. High-throughput sequencing technology developed in recent years has dramatically expanded the scope of transcriptomics research [2], so the expression of noncoding RNAs can now be investigated far more accurately. Differentially expressed long noncoding RNAs are reported almost weekly [3]. Amid this trend of mining retrospectively sourced experimental data, circular RNAs have eventually taken the stage.

Circular RNAs (circRNAs) are a type of regulatory noncoding RNA whose 3′ and 5′ ends are covalently joined, resulting in a circular form. The circular form was visualized by electron microscopy in 1979 [4]. In 2012, Salzman et al. [5, 6] developed an algorithm to detect scrambled exons in RNA-Seq datasets and reported that circular RNA isoforms are in fact the predominant transcript isoforms of many human genes. Later, an improved version of the algorithm, with an added search for AU/AC splice-site signals, was used to show that circRNAs can serve as natural microRNA (miRNA) “sponges”: they are enriched in miRNA target sites and act as competitive endogenous RNAs (ceRNAs) [7, 8]. A circRNA named CDR1as [9], expressed in human and mouse brain, was shown to negatively regulate miR-7 post-transcriptionally [7, 9, 10]; this mechanism appears to be evolutionarily conserved [7]. Given their potential to regulate miRNAs, circRNAs have attracted wide interest in the research field [11]. CircRNAs can be enriched in a sample by treating it with RNase R before RNA-Seq [12, 13].

In the following years, the identification of circRNAs in mouse [14], fly [15], and other animals [16] suggested that circRNAs are ubiquitous and evolutionarily conserved. The reported circRNAs are not isolated cases; these experimental results have tended to be reproducible. Further evidence indicates that human circRNA expression is tissue-specific, and tens of thousands of circRNAs have now been found and reported across human tissues [14, 17,39]. In addition, 465 transcriptome sequencing data sets were collected from the NCBI Sequence Read Archive [23, 40], including datasets used in recent publications [14, 2.

Enrichment analysis of the ORCEL genes

An enrichment analysis of the 728 ORCEL genes was conducted with DAVID [43]. The analysis showed that many ORCEL genes participate in important KEGG [44] pathways, such as Ubiquitin mediated proteolysis, Pathways in cancer, Focal adhesion, and Progesterone-mediated oocyte maturation, as summarized in Table 1. For the 32 genes involved in hsa05200: Pathways in cancer, 115 miRNAs and 45 circRNA transcripts were found to participate in the ORCEL, as illustrated in the network in Fig. 1. The network was generated with Cytoscape [45].
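DAVID scores term over-representation with the EASE score, a conservative modification of the one-tailed Fisher exact test in which one gene is subtracted from the number of list genes annotated to the term. The sketch below illustrates that idea with a hypergeometric upper tail computed from Python's `math.comb`. The function names are hypothetical, and DAVID's exact contingency-table construction may differ in detail.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N population genes, K annotated, n drawn)."""
    k = max(k, 0)
    upper = min(n, K)
    if k > upper:
        return 0.0
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def ease_score(list_hits: int, list_size: int,
               population_hits: int, population_size: int) -> float:
    """EASE-style P-value: one-tailed test after removing one gene from the list hits."""
    return hypergeom_upper_tail(list_hits - 1, list_size,
                                population_hits, population_size)
```

Subtracting one hit always yields a P-value at least as large as the plain one-tailed test, which penalizes terms supported by very few genes.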

Table 1 KEGG pathway enrichment of ORCEL genes
Fig. 1

Network of 32 ORCEL genes enriched in pathways in cancer. The genes are illustrated as green rectangle nodes, the circRNAs as yellow circles, and the miRNAs involved as pink diamond nodes.

From the network in Fig. 1, it can be seen that among the 32 genes enriched in pathways in cancer, only the ORCEL of HIA1 does not involve miRNAs that also target multiple other genes in the pathway. A complete table of the enrichment can be found in Additional file 3. As illustrated in the network in Fig. 2, for the 19 genes involved in hsa04120: Ubiquitin mediated proteolysis, 52 miRNAs and 22 circRNA transcripts were found to participate in the ORCEL. It is also worth mentioning that 61 ORCEL genes were enriched in the GO term GO:0046907~intracellular transport and 121 ORCEL genes in GO:0031974~membrane-enclosed lumen. The ORCEL phenomenon may therefore be correlated with intracellular transport.
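The observation about HIA1 amounts to a simple query on the network: for each pathway gene, check whether any miRNA in its ORCEL also targets other genes of the same pathway. This is a minimal sketch. It assumes the network is available as two hypothetical dictionaries, one mapping each gene to its ORCEL miRNAs and one mapping each miRNA to the pathway genes it targets. The `min_other_genes` parameter is an assumption standing in for "multiple other genes".

```python
def genes_without_shared_mirnas(orcel_mirnas: dict, mirna_targets: dict,
                                pathway_genes: set, min_other_genes: int = 1) -> list:
    """Return pathway genes whose ORCEL miRNAs target too few other pathway genes."""
    isolated = []
    for gene in sorted(pathway_genes):
        others = set(pathway_genes) - {gene}
        # A gene is "connected" if at least one of its ORCEL miRNAs
        # also targets min_other_genes or more of the other pathway genes.
        shared = any(
            len(mirna_targets.get(mirna, set()) & others) >= min_other_genes
            for mirna in orcel_mirnas.get(gene, set())
        )
        if not shared:
            isolated.append(gene)
    return isolated
```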

Fig. 2

Network of 19 ORCEL genes enriched in ubiquitin mediated proteolysis. The genes are illustrated as green rectangle nodes, the circRNAs as yellow circles, and the miRNAs involved as pink diamond nodes.


In this study, the putative roles of circRNAs as endogenous miRNA sponges were investigated through analysis of transcriptome sequencing data sets and published results. Owing to their head-to-tail junction structure, circular RNAs are more stable than other kinds of long noncoding RNAs. Hence, circRNAs hypothetically accumulate more readily in cells, and their longer half-lives allow the miRNA target sites they carry to build up within cells. Putative circRNA miRNA sponges were identified through statistical analysis of the abundance of miRNA target seeds on circRNA sequences. Because recent experimental results suggest that a certain threshold of miRNA target sites must be reached for ceRNAs to have physiological effects [46, 47], the analysis focused on the abundance of miRNA target sites. Using the FPKM of the putative circRNA sequences and the SRPBM of the back-spliced junction sites as thresholds should increase the stringency of circRNA sequence prediction in this study. Consistent with recent studies, our results suggest that only a certain subset of expressed circRNAs potentially serves as natural miRNA sponges. Among the miRNAs predicted to be sponged by these circRNAs, many had been experimentally verified to target the circRNA source genes. From these observations, we hypothesize that genes targeted by miRNAs tend to conserve enriched miRNA target sites in their coding regions, so that circRNAs derived from these regions can sponge those miRNAs when overexpressed. We named this phenomenon the Ouroboros Resembling Competitive Endogenous Loop (ORCEL) in circular RNAs. The term was inspired by Friedrich August Kekulé and his famous discovery of the benzene ring [48], whose structure he reportedly conceived after dreaming of a snake seizing its own tail. Because this phenomenon was observed in genes involved in cancer pathways, its validation could significantly impact research in medical science.
ORCEL could potentially serve as a control mechanism that guards against miRNA overdose. Its existence is rationalized by the fact that miRNA-sponging circRNAs originate from regions enriched in miRNA target sites, while the genes encoded in these regions are conserved as miRNA targets.


The bioinformatics analysis showed that for a certain subset of circRNAs, the putatively sponged miRNAs had been experimentally verified to target the circRNA host gene. This observation suggests the existence of a competitive endogenous loop between circRNAs and their host genes. Given the self-regulating and self-inducing nature of these circRNAs, this phenomenon was named the Ouroboros Resembling Competitive Endogenous Loop (ORCEL) in circular RNAs.
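The loop criterion can be expressed as a set intersection. It compares the miRNAs predicted to be sponged by a circRNA with the miRNAs experimentally verified (for example, in miRTarBase) to target its host gene. The sketch below assumes these inputs have already been loaded into plain dictionaries. The function name and the identifiers in the usage line are hypothetical.

```python
def find_orcel(sponged: dict, host_gene: dict, validated_targets: dict) -> dict:
    """Return circRNA id -> sorted miRNAs that close a loop with the host gene.

    sponged: circRNA id -> miRNAs predicted to be sponged (e.g. P < 0.005)
    host_gene: circRNA id -> host gene symbol
    validated_targets: miRNA -> experimentally verified target genes
    """
    loops = {}
    for circ_id, mirnas in sponged.items():
        gene = host_gene.get(circ_id)
        looped = sorted(m for m in mirnas if gene in validated_targets.get(m, set()))
        if looped:
            loops[circ_id] = looped
    return loops


print(find_orcel(
    {"circ_1": {"miR-a", "miR-b"}, "circ_2": {"miR-c"}},
    {"circ_1": "GENE1", "circ_2": "GENE2"},
    {"miR-a": {"GENE1"}, "miR-c": {"GENE3"}},
))  # {'circ_1': ['miR-a']}
```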


The data analysis process of this research is summarized in Fig. 3. First, to identify circRNAs, transcriptome sequencing data sets were obtained from the NCBI Sequence Read Archive (SRA). The back-spliced junction sites in each RNA-seq sample were identified using find_circ [50], a circRNA discovery pipeline adapted from the scripts provided on circBase [7, 49]. The detected back-spliced junction sites, along with junction sites collected from previous reports, were compared with the hg19 human genome annotation from RefSeq to annotate circRNA isoform sequences. The annotated sequences were then used to predict circRNA-miRNA interactions. Occurrences of miRNA target seeds in circRNA isoforms were counted and normalized by isoform length. The significance of each interaction was evaluated against the background distribution of miRNA seeds in all transcripts, and only circRNA-miRNA interactions with P-values < 0.005 were retained. Expression of the circRNAs in each sample collected from the SRA was profiled in two ways: as normalized counts of reads spanning the back-spliced junction sites (SRPBM) [13], and as normalized counts of reads aligned to the annotated circRNA sequences, in units of FPKM. Only circRNAs whose estimated expression levels exceeded the thresholds and that were found in multiple studies or samples were analyzed in this study.

Fig. 3

Overview of the data analysis process in this research. The general process of ORCEL discovery is illustrated in this figure.

Detection of the back spliced junction sites

Reported human back-spliced junction sites were collected from 22 recent studies [6, 7, 13, 20, 22, 24,39]. In addition, 465 transcriptome sequencing data sets were collected from the NCBI Sequence Read Archive [23, 40]. The back-spliced junction sites in each RNA-seq sample were identified using the circRNA discovery pipeline find_circ [7, 49, 50]. We applied the criteria defined in the pipeline, so the detected junction sites met the same standards as those in previous reports, as described in the Memczak et al. 2013 study.
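find_circ, as described by Memczak et al. (2013), maps 20-nt anchors taken from both ends of reads that do not align contiguously to the genome. It flags a candidate back-splice when the two anchors align collinearly but in reversed order. The sketch below shows only that orientation test. Breakpoint extension and the splice-signal and quality filters of the real pipeline are omitted, and the `AnchorHit` type and the `max_span` limit are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnchorHit:
    chrom: str
    strand: str  # strand of the anchor alignment, "+" or "-"
    start: int   # leftmost aligned genomic position of the 20-nt anchor


def is_back_splice_candidate(head: AnchorHit, tail: AnchorHit,
                             max_span: int = 100_000) -> bool:
    """head: anchor from the read's 5' end; tail: anchor from the read's 3' end."""
    if head.chrom != tail.chrom or head.strand != tail.strand:
        return False
    if abs(head.start - tail.start) > max_span:
        return False
    if head.strand == "+":
        # Linear splicing keeps the head anchor upstream of the tail anchor;
        # a head-to-tail (back) splice reverses that order.
        return tail.start < head.start
    # On the minus strand the read runs from higher to lower coordinates.
    return tail.start > head.start
```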

Annotation of circRNA full sequence

The method was described in our previous study [23] and was applied here to updated data from reports published between 2014 and 2016. To acquire the full-length nucleotide sequences from RNA-seq reads, back-spliced junction sites were compared with the hg19 human genome annotation obtained from the UCSC genome browser and RefSeq [51, 52]. Recent research has shown that multiple circRNA isoforms can originate from the same back-spliced junction site [6, 53], so we annotated multiple circRNA isoforms per back-spliced junction site. The annotation followed these guidelines:

(1) For back-spliced junction sites located exactly at the “head” and “tail” loci of exons of the same RefSeq transcript [51, 52], all exons of that transcript flanked by the junction sites were considered part of the same circRNA.

(2) For back-spliced junction sites flanking multiple RefSeq isoforms [51, 52], the existence of multiple circRNA isoforms was assumed.

(3) For circRNAs whose back-spliced junction sites are slightly misaligned with exon locations, the flanked exons plus a small portion of intronic sequence at the head or tail were considered part of the isoforms.

(4) For junction sites located in intergenic regions, as well as those that overlap genes but lie on their antisense strands, the entire flanked sequence was considered the sequence of the circRNA.

The SRPBM measure used to quantify reads on back-spliced junctions (Eq. 2) is defined as:

$$ \mathrm{SRPBM}=\frac{\mathrm{Reads\ count}\times {10}^9}{\mathrm{Read\ length}\times \mathrm{Mapped\ reads}} \tag{2} $$

The resulting sequence annotation was used for expression analysis and miRNA target search. The annotation, together with the gene transcripts, was recorded in GTF format for expression profiling.
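Annotation guidelines (1) to (4) and the GTF output can be sketched per transcript as below. This is a simplified illustration, not the study's exact implementation. Coordinates are assumed to be 0-based and half-open. Only exons of same-strand transcripts are passed in, and an empty exon list stands for an intergenic or antisense junction. The helper names, the `tolerance` value, and the GTF `source` label are hypothetical.

```python
def circ_isoform_blocks(junction_start: int, junction_end: int,
                        exons: list, tolerance: int = 10):
    """Exon blocks of one circRNA isoform for one same-strand transcript.

    exons: sorted, non-overlapping (start, end) pairs, 0-based half-open.
    Returns None when the junction does not match this transcript's exons.
    """
    if not exons:
        # Guideline (4): intergenic or antisense junction, keep the whole span.
        return [(junction_start, junction_end)]
    inside = [(s, e) for s, e in exons if e > junction_start and s < junction_end]
    if not inside:
        return None
    if (abs(inside[0][0] - junction_start) > tolerance
            or abs(inside[-1][1] - junction_end) > tolerance):
        return None
    blocks = list(inside)
    # Guidelines (1) and (3): exact boundaries are unchanged; small misalignments
    # extend (or trim) the outermost exons to the junction coordinates.
    blocks[0] = (junction_start, blocks[0][1])
    blocks[-1] = (blocks[-1][0], junction_end)
    return blocks


def gtf_exon_lines(chrom: str, strand: str, gene_id: str, transcript_id: str,
                   blocks: list, source: str = "circ_annotation") -> list:
    """GTF exon records (1-based, inclusive coordinates) for one isoform."""
    attributes = f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'
    return [
        "\t".join([chrom, source, "exon", str(s + 1), str(e), ".", strand, ".", attributes])
        for s, e in blocks
    ]
```

Guideline (2) follows from calling `circ_isoform_blocks` once per overlapping RefSeq transcript and keeping every non-`None` result as a separate isoform.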

Identification of potential miRNA sponges

Building on the results of our previous studies [23, 54], we identified potential miRNA-circRNA interactions through a statistical analysis of the number of miRNA binding sites on the annotated circRNA sequences.

The typical miRNA target site types, namely 6mer, 7mer-A1, 7mer-m8, and 8mer sites [42], were derived from miRNA sequences obtained from miRBase [55]. Sites of perfect complementarity were located on the annotated circRNA and gene transcript sequences by iterative searching. To normalize the number of occurrences of these sites by transcript length, the following formula was used:

$$ \mathrm{Frequency}\ \mathrm{of}\ \mathrm{Nmer}=\frac{Number\ of\ target\ seeds\times 1000}{N\times Length\ of\ CircRNA} $$

where N is the length of the site: N = 6 for 6mers, N = 7 for 7mers, and N = 8 for 8mers. This formula yields four frequency values for each circRNA-miRNA pair. To distinguish circRNAs from their linear isoforms, frequency values were also calculated for the gene transcripts (mRNAs). We calculated the frequency values of all circRNAs and linear isoforms paired with each miRNA, computed Z-scores under a normal distribution, and converted them into one-tailed P-values through the survival function.
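The four canonical site types can be derived from each mature miRNA sequence. By the usual convention, the 6mer pairs with miRNA nucleotides 2 to 7 and the 7mer-m8 with nucleotides 2 to 8. The 7mer-A1 and 8mer add an adenosine opposite miRNA position 1. The sketch below derives these sites and counts exact, possibly overlapping matches by iterative searching. The function names are hypothetical. The sketch treats each sequence as linear, so sites spanning a circRNA's back-spliced junction are not counted. Each site type is counted independently, so a single 8mer match also yields 7mer-m8, 7mer-A1, and 6mer matches.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def target_sites(mature_mirna: str) -> dict:
    """Canonical site sequences, written 5'->3' on the target RNA."""
    rna = mature_mirna.upper().replace("T", "U")
    site_2_7 = reverse_complement(rna[1:7])  # pairs with miRNA nucleotides 2-7
    site_2_8 = reverse_complement(rna[1:8])  # pairs with miRNA nucleotides 2-8
    return {
        "6mer": site_2_7,
        "7mer-A1": site_2_7 + "A",  # adenosine opposite miRNA position 1
        "7mer-m8": site_2_8,
        "8mer": site_2_8 + "A",
    }


def count_sites(site: str, target: str) -> int:
    """Count exact, possibly overlapping, occurrences of a site in a linear sequence."""
    target = target.upper().replace("T", "U")
    count, pos = 0, target.find(site)
    while pos != -1:
        count += 1
        pos = target.find(site, pos + 1)
    return count


print(target_sites("UGAGGUAGUAGGUUGUAUAGUU")["8mer"])  # CUACCUCA (hsa-let-7a-5p)
```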

CircRNA-miRNA pairs with P < 0.005 were considered to have high regulatory potential. The miRNAs and their experimentally verified gene targets were collected from miRTarBase [54].
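The frequency normalization, Z-scoring, and P < 0.005 cut-off can be sketched as follows. This sketch assumes that, for a given miRNA and site type, the background is the set of frequency values of all circRNA and linear transcripts, summarized by its mean and population standard deviation. The study's exact variance estimator is not specified here, and the helper names are hypothetical. The one-tailed P-value uses the standard normal survival function, written with `math.erfc`.

```python
from math import erfc, sqrt
from statistics import mean, pstdev

# N in the frequency formula: the length of each site type.
SITE_LENGTH = {"6mer": 6, "7mer-A1": 7, "7mer-m8": 7, "8mer": 8}


def site_frequency(n_sites: int, site_type: str, transcript_length: int) -> float:
    """Frequency of Nmer = number of sites * 1000 / (N * transcript length)."""
    return n_sites * 1000 / (SITE_LENGTH[site_type] * transcript_length)


def one_tailed_p_values(frequencies: dict) -> dict:
    """Map each transcript id to P(Z >= z); call once per miRNA and site type."""
    values = list(frequencies.values())
    mu, sd = mean(values), pstdev(values)
    p_values = {}
    for transcript_id, value in frequencies.items():
        z = (value - mu) / sd if sd > 0 else 0.0
        p_values[transcript_id] = 0.5 * erfc(z / sqrt(2))  # normal survival function
    return p_values


def significant(p_values: dict, alpha: float = 0.005) -> list:
    """Transcripts whose pairing with the miRNA passes the P < alpha cut-off."""
    return sorted(t for t, p in p_values.items() if p < alpha)
```

For example, three 6mer sites on a 1,000-nt circRNA give a frequency of 3 × 1000 / (6 × 1000) = 0.5.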

Expression profiling of circRNAs

As previously described, the abundance of circRNAs within the collected samples was estimated with the transcript deconvolution algorithm of the Cufflinks pipeline [41]. To further increase the stringency of circRNA detection within the transcriptome, the normalized counts of sequence reads spanning the back-spliced junctions were also considered.

The normalized count of reads on back spliced junctions

To normalize the number of sequence reads spanning the junction sites, the measure of spliced reads per billion mapping (SRPBM) was applied [13]. The number of reads mapped onto the hg19 human genome was obtained with the aligner STAR [56]. SRPBM was calculated as shown in Eq. 2, and junction sites with SRPBM values greater than 1.0 were selected.
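Eq. 2 and the SRPBM > 1.0 cut-off amount to a one-line calculation. A minimal sketch; the function names and the example counts are hypothetical.

```python
def srpbm(junction_reads: int, read_length: int, mapped_reads: int) -> float:
    """Spliced reads per billion mapping: junction reads * 1e9 / (read length * mapped reads)."""
    return junction_reads * 1e9 / (read_length * mapped_reads)


def passes_srpbm(junction_reads: int, read_length: int, mapped_reads: int,
                 threshold: float = 1.0) -> bool:
    """Apply the strict SRPBM > threshold selection rule."""
    return srpbm(junction_reads, read_length, mapped_reads) > threshold


# 5 junction reads, 100-nt reads, 20 million mapped reads: 5e9 / 2e9
print(srpbm(5, 100, 20_000_000))  # 2.5
```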

Transcript deconvolution of circRNAs

The RNA-seq aligner STAR [56] was used to realign the sequence reads from the 465 RNA-seq samples to the human genome. Using the aforementioned GTF file containing the annotated exon loci of circRNAs and mRNAs, together with the BAM files generated by STAR, we estimated the abundance of the annotated sequences in each sample with Cufflinks [41]. Transcripts with FPKM greater than 1.0 were selected.
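For reference, FPKM expresses fragments per kilobase of transcript per million mapped fragments. The naive formula below only approximates what Cufflinks reports. Cufflinks additionally uses an effective transcript length and distributes fragments shared among isoforms through its deconvolution model. The function name is hypothetical.

```python
def naive_fpkm(fragments: int, transcript_length: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments (no length correction)."""
    kilobases = transcript_length / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions)
```

Under the cut-off used here (FPKM > 1.0), a transcript at exactly 1.0, such as 10 fragments on a 500-nt sequence with 20 million mapped fragments, would not be selected.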

Co-occurrence analysis of circRNA

Based on the results of a recent comparison study [50], we inferred that inconsistencies between circRNA detection tools and false discoveries of highly abundant circRNAs could occur in our analysis. Hence, in addition to the two estimates of circRNA abundance, we applied the following conditions:

The number of previous peer-reviewed reports in which the back-spliced junction site was reported.

The number of samples, among the 465 collected, in which the back-spliced junction site was found to meet the criteria defined in find_circ [7, 49, 50].

Only circRNAs for which the sum of these two values exceeded 10 were considered in the analysis of this report.
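Taken together, the selection criteria can be sketched as a single predicate. This assumes each circRNA has already been summarized into one SRPBM value, one FPKM value, and the two support counts. How per-sample values were aggregated is not specified in this section, and the function name is hypothetical.

```python
def keep_circrna(srpbm_value: float, fpkm_value: float, n_reports: int, n_samples: int,
                 min_srpbm: float = 1.0, min_fpkm: float = 1.0, min_support: int = 10) -> bool:
    """Apply both expression thresholds and the co-occurrence threshold."""
    expressed = srpbm_value > min_srpbm and fpkm_value > min_fpkm
    supported = n_reports + n_samples > min_support
    return expressed and supported
```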