Background

Natural antisense transcripts (NATs) are endogenous transcripts that exhibit sequence complementarity to transcripts of known function, termed sense transcripts. NATs were first described in prokaryotes [1], where they were found to down-regulate the expression of sense transcripts involved in diverse biological functions, such as transposition, plasmid replication and gene expression [2]. Since the discovery of the first NAT in human [3], an increasing number of NATs in mammalian organisms have been reported to be related to genomic imprinting [4], RNA interference [5], alternative splicing [6] and X-inactivation [16]. Although increasing evidence suggests that trans NATs might perform more significant and versatile functions than previously expected [11], most of the high-throughput searches focused on cis NATs and overlooked trans ones [10, 12, 19]. We found two human NAT (hNAT) pairs displaying a significant inverse expression pattern before and after insulin injection.

Results and Discussion

Identification of hNATs from RefSeq with BLASTN

Previous large-scale hNAT searches have significantly expanded our knowledge of the prevalence of hNATs [1010, 12, 2022]. To date, the mechanisms underlying trans NATs are less well understood than those of cis NATs. The identification of novel trans NATs in this work may shed some light on this field.

It is notable that Lehner et al. concluded that 51 of their 80 trans hNATs were likely to be chimeric mRNAs containing sequences from two different chromosomal loci, owing to artifacts of cDNA library construction and chromosomal rearrangements [11]. However, after carefully checking the trans hNATs they reported, we found that 21 of the 23 RefSeq trans hNAT pairs thought to involve suspected chimeras can now be mapped to definite loci and are thus true trans NATs. Among these 21 NATs, the chromosomal location information of 4 entries had been modified in the Genome database, suggesting that the presence of trans NATs did affect genome assembly and gene localization, as had been feared. It seems that the then-unfinished human genome assembly was a major obstacle in their survey of trans hNATs.

Our data do not cover all of the previously reported pairs [101012]. We classified the 550 hNAT pairs that have CDS location data into 6 types according to the pairing region: 1) 5'UTR vs. 5'UTR, 2) 5'UTR vs. CDS, 3) 5'UTR vs. 3'UTR, 4) CDS vs. CDS, 5) CDS vs. 3'UTR, 6) 3'UTR vs. 3'UTR (Table 2). More than 87% of the 550 NAT pairs involved a 5' or 3' UTR, that is, belonged to types 1, 2, 3, 5 or 6, supporting the significance of UTRs in antisense-mediated regulation [9, 25, 26].

Table 2 Classification of hNAT pairs based on pairing region

According to our data, the subtype 6-cis, or 3'UTR vs. 3'UTR-cis, is the most common form of overlapping. However, analysis of adjacent gene sets in S. cerevisiae suggested that there might be evolutionary pressure to select against convergent genes [27], and the overlapping arrangement of convergent genes restricted the elongation of both transcripts, resulting in a severe reduction in mRNA accumulation, termed transcriptional collision [28]. This collision effect could be a direct physical impediment to the transcription machinery or an indirect effect caused by supercoiling changes to the DNA template during transcription, and seemed unrelated to interference by antisense RNAs [28]. The precise biological implications of the convergent arrangement of genes in yeast and mammalian genomes remain to be elucidated. The 6-cis dataset of 116 hNAT pairs might provide a basis for future in-depth investigation.

The subtype 1-cis, or 5'UTR vs. 5'UTR-cis NAT pairs, could be involved in bidirectional transcription driven by a divergent promoter, which seems to be a preferred structure in prokaryotes for gene regulation such as transcription coupling [29]. Transcription coupling differs from the traditional concept of antisense phenomena, and the choice between transcription coupling and antisense transcript-directed inhibition may depend on the overlapping length and the transcription factor binding sites involved. Since bidirectional transcription is not as common in eukaryotes as in prokaryotes, the precise organization and significance of the 20 members of type 1-cis are intriguing. An in-depth analysis is still in progress.

Splice variants involved in hNATs

Since the wide existence of alternative splicing has been well established [30], and there is evidence for the involvement of NATs in alternative splicing [31, 32], we also investigated the splice variants involved in hNATs. Among the present 568 NAT pairs, 63 genes involved in 168 NAT pairs have splice variants (see Additional Table 2A). Forty-nine of the 63 genes, involved in 121 pairs, have pairing regions unaffected by alternative splicing, while the other 14 genes, involved in 47 pairs, have variable pairing regions due to alternative splicing. As an example of the latter case, the SPAG8 gene has two splice variants, NM_012436 and NM_172312, which may pair with the antisense transcript NPR2 with a 308 bp overlap of 99% identity and a 110 bp overlap of 100% identity, respectively. It is conceivable that alternative splicing can also remove the whole pairing region. The IL18BP gene (see Additional Table 2A), for instance, has four splice variants, but only three of them pair with the antisense transcript NUMA1. To see how frequently this happens, we checked the remaining 400 hNAT pairs (568-168) using AceView [33] and found that an additional 367 NAT pairs involved splice variants (see Additional Table 2B). In these cases, only one of the transcript variants pairs with its countertranscript, while the others lose the pairing region because of alternative splicing; these cases were therefore not included in Additional Table 2A. Among the 535 NAT pairs (168+367) related to splice variants, 22.6% (121/535) have splice variants sharing the same pairing region, and 77.4% (414/535, that is, 47+367) have pairing regions affected or completely eliminated by alternative splicing. The remarkably high percentage of the latter suggests a significant relationship between alternative splicing and antisense-directed regulation.

Taking the THRA (c-erbAα) and NR1D1 (Rev-erbAα) pair (No. 472 in Additional Table 1) as an example, the sense transcript c-erbAα encodes two structurally related proteins, R-erbAα1 and R-erbAα2, by alternative splicing [23]. The antisense rev-erbAα transcript is complementary to the last exon of r-erbAα2 mRNA but not to the r-erbAα1 mRNA. It was indicated that the rev-erbAα messenger prevents splicing of the sense r-erbAα primary transcript into r-erbAα2 mRNA by RNA masking, thus tilting the balance towards R-erbAα1 synthesis and ultimately modulating the cellular response to hormone [6, 22, 23, 31, 34]. The inhibition of splicing by a NAT complementary to sequences remote from the splice site is specific and efficient, which might be attributed to blocking of regulatory elements within the exon essential for exon selection and intron removal, disruption of pre-mRNA secondary structure important for splicing, or disruption of the RNP structure required for assembly of a functional RNA-splicing complex [22, 35]. As a result, the NAT dictates how a sense transcript is differentially spliced. Further analysis of the 414 NAT pairs involving splice variants with different overlaps may help uncover the underlying mechanisms.

Also in the above example, the antisense transcript encodes the protein Rev-erbAα (NR1D1), which belongs to the thyroid/steroid hormone receptor family, as do the products of the sense transcripts [9]. This example illustrates the dual roles of some NATs: template for translation and regulator of sense gene expression. Such a gene structure is an elegant and economical way to organize functionally related genes.

One-to-many relationship in hNATs

Initially, NAT pairs were considered to have a one-to-one relationship. However, an antisense transcript might form a duplex with more than one splice variant of a sense gene, as in the examples given in the previous section. In theory, an antisense transcript also has the potential to pair with more than one paralogous sense transcript. Finally, more than one antisense transcript may pair with one sense transcript at different parts of its sequence. Among the 568 hNAT pairs, besides the 168 entries that form one-to-many relationships through alternative splicing, an additional 97 hNAT pairs were found to have one-to-many relationships due to different pairing regions. Together these make up 47% ((97+168)/568) of the total, allowing NATs to form complex regulatory networks. Figure 1 shows an example of such a network, comprising 29 genes involved in 40 hNATs. Interestingly, all hNAT pairs in this network with genomic mapping data are of the trans type. In fact, the hNAT pairs in this paper formed 38 such networks of various sizes (see Additional Table 3). It is worth noting that these networks mainly involve trans NATs, as in the example in Figure 1. The trans relationship seems more flexible and thus provides a greater chance for transcripts to "communicate" with each other. In addition to the RNA masking mentioned above, there is now strong evidence that the interaction of antisense partners can also affect gene expression via the activation of dsRNA-dependent pathways [22, 36]. These might include RNA interference (RNAi)-dependent gene silencing. It was reported that several different microRNAs could regulate the expression of the same target mRNA based on complementarity between microRNA and mRNA [21]. It seems that the one-to-many relationship, notably in transcriptional regulation, might prove common as the list of known one-to-many regulatory interactions becomes more comprehensive. From this perspective, the biological significance of trans NATs deserves further systematic investigation.
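The grouping of one-to-many hNAT pairs into networks can be illustrated with a short sketch. The text does not say how the networks were delineated, so the approach here is an assumption: genes are treated as nodes, NAT pairs as edges, and each connected component with more than two genes is reported as one network. The function name and the placeholder gene labels are hypothetical.

```python
from collections import defaultdict

def nat_networks(pairs):
    """Group genes into networks (assumed here to be connected components).

    pairs: iterable of (sense_gene, antisense_gene) tuples.
    Components containing only two genes, i.e. a plain one-to-one pair,
    are not reported as networks.
    """
    neighbours = defaultdict(set)
    for sense, antisense in pairs:
        neighbours[sense].add(antisense)
        neighbours[antisense].add(sense)

    seen, networks = set(), []
    for gene in neighbours:
        if gene in seen:
            continue
        # Depth-first traversal collects every gene reachable from this one.
        component, stack = set(), [gene]
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(neighbours[current] - component)
        seen |= component
        if len(component) > 2:
            networks.append(component)
    return networks

# Placeholder labels, not real gene symbols: B pairs with both A and C,
# so A, B and C form one network, while D/E stays a one-to-one pair.
example = nat_networks([("A", "B"), ("B", "C"), ("D", "E")])
```

Under this reading, counting the returned components for the full pair list would give the number of networks, and the size of each set gives the number of genes in it.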

Figure 1

Human NAT interaction network constituted by one-to-many relationships. Each gene symbol represents a gene. Boldface gene symbols represent genes with variants that could pair with the same countertranscript. A dashed line indicates an uncertain type due to a lack of genomic mapping data; a solid line with bi-directional arrows indicates a trans type. No cis NAT is shown in this figure.

Identification of two hNAT pairs with inverse expression pattern

There have been reports that sense and antisense transcripts show an inverse expression pattern, as well as examples of coordinated regulation of both transcripts [9, 10, 12, 37]. More intriguingly, sense and antisense transcripts are differentially expressed depending on tissue type or developmental stage [25, 38, 39]. All these phenomena illustrate that the underlying mechanisms are complicated and not yet well understood. In this work we focused on identifying an inverse expression pattern under a given condition for the hNAT pairs we found, considering that this pattern is the most common one for sense-antisense expression.

Microarray analysis is a high-throughput technology for analyzing gene expression and has recently been applied to the study of NATs [12]. In this work, we used data from SMD [19] for in silico expression analysis of the hNATs we found. Rome et al. [40] designed a microarray of 29308 cDNA probes to evaluate the gene expression pattern of skeletal muscle cells from six independent volunteers before and after insulin injection. We found that 150 of the 568 NATs reported in this work had representative probes on the array, and we selected the expression data of the 121 NATs with no fewer than 3 repeat samples for analysis. A t-test with multiplicity adjustment showed that the relative quantity (RQ) values of two NAT pairs, SARM1/MGC9564 (No. 550 in Additional Tables 1 and 4) and HARS/WDR55 (No. 443 in Additional Tables 1 and 4), changed significantly after insulin injection, showing that the two pairs displayed an inverse expression pattern. The results of the calculation are presented in Additional Table 4.

For the SARM1/MGC9564 pair, the RQ value rose after insulin injection, indicating that SARM1 is up-regulated relative to MGC9564. SARM1 mRNA encodes a conserved protein with a SAM motif and is highly expressed in liver and kidney [41]. It was reported that a 0.4 kb antisense transcript was coordinately expressed with the SARM1 gene, but this is apparently not the same NAT as the one we found [41], suggesting that the SARM1 gene has at least two antisense partners. MGC9564 is an experimentally supported full-length mRNA (2096 bp) with a predicted coding sequence. According to the pairing data, SARM1 mRNA and MGC9564 mRNA possibly form a CDS vs. 3'UTR cis NAT pair.

For the second pair, HARS/WDR55, the relative quantity decreased after insulin injection, indicating HARS is down-regulated relative to WDR55. HARS mRNA codes for histidyl-tRNA synthetase, which is essential for the incorporation of histidine into proteins [42]. The WDR55 cDNA was cloned recently [43], and has not yet been subjected to final review in the latest release of NCBI RefSeq database. Based on the chromosomal location information, the two genes are mapped closely (5q31.3), but do not overlap. They may form a 3' UTR vs. 3' UTR trans NAT pair. This is a novel pair that we reported for the first time.

The apparent inverse expression pattern of the above two hNAT pairs suggests their possible regulatory roles in skeletal muscle cells after insulin injection. Further experimental studies are needed to unravel the underlying mechanism.

Conclusion

Through a systematic analysis of hNATs using the RefSeq dataset, we identified 568 hNATs. The total number is smaller than those reported by Yelin et al. [12] and Chen et al. [44]. Alternative splicing information was obtained from AceView [33]. Microarray data used for in silico expression analysis were downloaded from the Stanford Microarray Database (SMD) [19].

Search for hNATs

The BLASTN program was used to identify putative hNATs with an e-value cutoff of 1e-9 and an identity threshold of 98%. Query sequences were all human RNA sequences extracted from the RefSeq database, and the subject database was made up of the reverse complement sequences of all query sequences. Option -S was set to 1 to prevent the program from reverse-complementing query sequences automatically. After obtaining BLASTN hits, we compared the pairing sequence segments against the repetitive sequence database RepBase [45] and confirmed that they contained no known repetitive sequences. We also used ClustalW to align all pairing segments to make sure there were no novel repeats. These two steps are necessary to rule out the possibility that the prevalence of hNATs is an artifact of repetitive elements.
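The search set-up described above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline actually used: the function names are hypothetical, and the hit format is assumed to be the common 12-column tab-separated BLAST layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score), which the text does not specify.

```python
# Illustrative sketch only; names and the hit format are assumptions.

# U is read as T so that RNA and cDNA spellings give the same result.
COMPLEMENT = str.maketrans("ACGTUNacgtun", "TGCAANtgcaan")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(COMPLEMENT)[::-1]

E_VALUE_CUTOFF = 1e-9   # e-value cutoff given in the text
MIN_IDENTITY = 98.0     # identity threshold (percent) given in the text

def passes_cutoffs(hit_line: str) -> bool:
    """Apply both cutoffs to one tab-separated hit line.

    Assumed column layout: percent identity in column 3 and
    e-value in column 11 (1-based).
    """
    fields = hit_line.rstrip("\n").split("\t")
    identity = float(fields[2])
    e_value = float(fields[10])
    return identity >= MIN_IDENTITY and e_value <= E_VALUE_CUTOFF
```

Because the subject database already holds reverse complements, a same-strand hit between a query and a subject entry corresponds to an antisense match between the two original transcripts, which is why the query strand must not be reverse-complemented again during the search.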

Classification of hNATs according to genomic location and pairing region

Human NATs were divided into cis and trans types according to relative genome location, using genomic mapping data from the Genome database [44]. That is, a NAT transcribed from the opposite strand of the same genomic locus as its sense RNA is cis, while a NAT transcribed from a genomic locus different from that of its sense counterpart is trans. Based on the pairing region of sense and antisense transcripts, we also defined 6 types: 1) 5'UTR vs. 5'UTR, 2) 5'UTR vs. CDS, 3) 5'UTR vs. 3'UTR, 4) CDS vs. CDS, 5) CDS vs. 3'UTR, and 6) 3'UTR vs. 3'UTR, where "CDS" covers "CDS", "CDS+3'UTR", "CDS+5'UTR" and "5'UTR+CDS+3'UTR".
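The two classifications above can be expressed compactly in code. The following is a minimal sketch, assuming 1-based inclusive coordinates on each transcript and assuming that "same genomic locus" means overlapping intervals on opposite strands of one chromosome; the function names and tuple layouts are hypothetical and are not taken from the original analysis.

```python
# Illustrative sketch; coordinate conventions and names are assumptions.

TYPE_BY_REGIONS = {
    ("5'UTR", "5'UTR"): 1,
    ("5'UTR", "CDS"): 2,
    ("5'UTR", "3'UTR"): 3,
    ("CDS", "CDS"): 4,
    ("CDS", "3'UTR"): 5,
    ("3'UTR", "3'UTR"): 6,
}
REGION_ORDER = {"5'UTR": 0, "CDS": 1, "3'UTR": 2}

def region_of(start, end, cds_start, cds_end):
    """Label a pairing segment on one transcript.

    Any segment that touches the CDS is labelled "CDS", which covers the
    "CDS+3'UTR", "CDS+5'UTR" and "5'UTR+CDS+3'UTR" cases named in the text.
    """
    if end < cds_start:
        return "5'UTR"
    if start > cds_end:
        return "3'UTR"
    return "CDS"

def pairing_type(region_a, region_b):
    """Return the type number (1 to 6) for an unordered pair of region labels."""
    key = tuple(sorted((region_a, region_b), key=REGION_ORDER.__getitem__))
    return TYPE_BY_REGIONS[key]

def genomic_type(locus_a, locus_b):
    """Return "cis" or "trans" for two loci given as (chromosome, start, end, strand)."""
    chrom_a, start_a, end_a, strand_a = locus_a
    chrom_b, start_b, end_b, strand_b = locus_b
    overlapping = chrom_a == chrom_b and start_a <= end_b and start_b <= end_a
    return "cis" if overlapping and strand_a != strand_b else "trans"
```

Combining the two results gives subtype labels of the kind used in the text, such as 6-cis for a 3'UTR vs. 3'UTR pair on overlapping loci. Under the overlap assumption, two genes that map close together on the same chromosome without overlapping are labelled trans.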

Expression analysis of hNATs using SMD data

Gene expression data of human skeletal muscle cells before and after insulin injection are available in SMD [19, 40]. Since the human cDNA probes in these data are identified by UniGene cluster IDs, antisense transcripts can be mapped to the corresponding probes through the relationship between UniGene cluster ID and RefSeq accession number (ACC), and the expression data of hNATs were retrieved in this way. Human NATs with data from 3 or more repeat samples were selected for statistical analysis. The ratio of sense to antisense transcript expression levels represents the relative quantity (RQ) of a NAT pair, RQ = S/A. The change in this ratio after insulin injection is expressed as the ratio of two relative quantities, R = RQa/RQb, where RQb and RQa are the relative quantities before and after insulin injection, respectively. This R value is used to check whether a NAT pair shows an inverse expression pattern under the given condition. A t-test with a significance level of 0.05 and Bonferroni's adjustment for multiplicity were used to evaluate the significance of an R value, that is, of the change in expression pattern of a NAT pair.
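The quantities defined above can be computed as in the sketch below. Several choices are assumptions rather than details from the text: the t-test is taken to be a one-sample test of log2(R) against 0 (that is, R = 1, no change), and the Bonferroni threshold is obtained by dividing the significance level by the number of pairs tested. All function names are hypothetical. The p-value itself is not computed here because the Python standard library has no t distribution; the statistic would be compared against a t table or passed to a statistics package.

```python
import math
from statistics import mean, stdev

def relative_quantity(sense, antisense):
    """RQ = S/A for one sample."""
    return sense / antisense

def ratio_change(rq_before, rq_after):
    """R = RQa/RQb for one sample."""
    return rq_after / rq_before

def t_statistic(r_values):
    """One-sample t statistic of log2(R) against 0, with degrees of freedom.

    Assumption: the test variant is not stated in the text; a log
    transform is used so that up- and down-regulation are symmetric.
    Requires at least 2 values with non-zero spread.
    """
    logs = [math.log2(r) for r in r_values]
    n = len(logs)
    return mean(logs) / (stdev(logs) / math.sqrt(n)), n - 1

def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold after Bonferroni adjustment."""
    return alpha / n_tests

# Made-up expression values for three repeat samples of one pair:
# (S before, A before, S after, A after).
samples = [(1.0, 1.0, 2.0, 1.0), (1.0, 1.0, 4.0, 1.0), (1.0, 1.0, 8.0, 1.0)]
r_values = [
    ratio_change(relative_quantity(sb, ab), relative_quantity(sa, aa))
    for sb, ab, sa, aa in samples
]
t, df = t_statistic(r_values)
```

An R value above 1 means the sense transcript rose relative to the antisense transcript after treatment, and a value below 1 means the opposite, which matches the way the two significant pairs are interpreted in the Results.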