Background

DNA replication is an essential process and is generally conserved across all three domains of life, making use of two different DNA replication apparatuses (bacterial-type and eukaryotic-type)[1, 2]. DNA replication initiates from a single origin in bacteria, whereas multiple origins are utilized in eukaryotes[3]. The study of replication origins in archaea has been ongoing for more than a decade, and multiple replication origins have been identified in several archaeal species[49] and Halobacterium sp. strain NRC-1[9].

These experimental data revealed that the basic structure of replication origins is conserved among archaea, normally containing an AT-rich unwinding element and several conserved repeats (Origin Recognition Box, ORB)[9]. The ORB elements were proven to be the recognition sites for the Orc/Cdc6 initiation protein via biochemical[5] and structural approaches[15, 16]. In addition, distinct from the ORBs identified in the oriC1 of S. solfataricus[5], a halophile-specific “G-string” (long G-stretches located at the end of ORBs) was observed in all origins from H. volcanii[9]. Whereas Cdc6 and the ORC complex proteins (Orc1-6) act together to recruit the MCM (minichromosome maintenance) complex to an origin of replication in eukaryotes[3, 17], archaea have adopted a subset of initiator proteins (Orc/Cdc6) that are related to both Orc1 and Cdc6 of eukaryotes. Therefore, archaeal Orc/Cdc6 is considered to possess both origin recognition and MCM-loading activities[3]. Previous studies in S. solfataricus revealed that origin identity was determined by the specific recognition of Orc/Cdc6 proteins[18]. Interestingly, the multiple origins, especially the ORB sequences and their associated Orc/Cdc6 proteins, are quite diverse in all three experimentally characterized archaea (S. solfataricus, H. volcanii and Halobacterium sp. NRC-1)[5, 9]. These common features provided us with a reference standard to predict replication origins in H. hispanica. Briefly, only those intergenic regions (IRs) that contain ORB-like elements and are directly adjacent to orc/cdc6 genes were considered to be putative orc/cdc6-associated replication origins. Although they were outside the scope of this study, we do not exclude the possibility of replication origins that are not directly adjacent to orc/cdc6 genes or that lack classical ORB-like elements. Replication origins with these characteristics were shown to exist in Sulfolobus spp.[5] and may exist in Halobacterium sp. NRC-1[9]. However, in contrast to other characterized archaeal origins with at least two ORB repeats flanking an AT-rich unwinding element, only one ORB-like element was observed in each IR flanking the cdc6D gene, which was therefore considered to be a deficient origin (oriC3-cdc6D*) when examined by hand (Figure 1A and Additional file 1). Accordingly, seven replication origins were predicted in H. hispanica: two in the main chromosome (oriC1-cdc6A and oriC2-cdc6E), four in the minichromosome (oriC4-cdc6G, oriC5-cdc6H, oriC6-cdc6I and oriC7-cdc6J), and one in the megaplasmid (oriP-cdc6K) (Figure 1A and B).
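The prediction rule described above can be illustrated with a minimal sketch. The class name, field names and the two-ORB threshold below are illustrative assumptions drawn from the text (an IR directly adjacent to an orc/cdc6 gene, with at least two ORB repeats for a typical origin and a single ORB-like element for a deficient one); the sketch does not reproduce the motif search or the manual inspection used in this study.

```python
from dataclasses import dataclass


@dataclass
class IntergenicRegion:
    """Hypothetical record for one intergenic region (IR)."""
    name: str
    adjacent_to_orc_cdc6: bool  # IR lies directly next to an orc/cdc6 gene
    orb_count: int              # number of ORB-like elements found in the IR


def classify_ir(ir: IntergenicRegion, min_orbs: int = 2) -> str:
    """Classify an IR using the simplified rule described in the text.

    min_orbs=2 is an assumption based on the statement that characterized
    archaeal origins carry at least two ORB repeats.
    """
    if not ir.adjacent_to_orc_cdc6 or ir.orb_count == 0:
        return "not predicted"
    if ir.orb_count < min_orbs:
        return "deficient"
    return "putative origin"


# Made-up ORB counts, for illustration only.
examples = [
    IntergenicRegion("IR_near_cdc6A", True, 4),
    IntergenicRegion("IR_near_cdc6D", True, 1),
    IntergenicRegion("IR_near_cdc6B", True, 0),
]
for ir in examples:
    print(ir.name, "->", classify_ir(ir))
```

In practice the ORB count would come from a motif search, and borderline cases would still need the manual check mentioned in the text.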

Figure 1

Bioinformatic and genetic identification of replication origins in H. hispanica. A. Seven replication origins, oriC1-cdc6A and oriC2-cdc6E in the main chromosome; oriC4-cdc6G, oriC5-cdc6H, oriC6-cdc6I and oriC7-cdc6J in the minichromosome; and oriP-cdc6K in the megaplasmid, were predicted by searching for ORB motifs (indicated with small triangles) in the IRs located directly adjacent to orc/cdc6 genes (indicated with red arrowheads) using MEME software. Logo representations of ORB elements are presented on the right, and the spaces represent sequences that are not conserved. oriC3*: predicted deficient origin adjacent to the cdc6D gene. B. Replication assay for plasmids containing the origins predicted in A. (Top) Southern blot analysis with a bla gene probe: lane T contains crude DNA extracted from the H. hispanica transformants, and lane P represents the purified plasmid as an input control; (bottom) summary of the identification of origins in H. hispanica. The five origins with ARS activity (oriC 1, 2, 6, 7, P) are indicated with filled ovals and are shown in bold in A.

To confirm these putative replication origins, we performed a genetic assay to test their autonomous replication activities. As a control, we also examined whether oriC3-cdc6D* and the IRs around cdc6B, cdc6C and cdc6F, where no ORBs were detected, could function as origins. DNA fragments, including the orc/cdc6 genes plus their flanking IRs, were cloned into a nonreplicating plasmid, pBI101[32, 33], to assay for the presence of an autonomously replicating sequence (ARS) (Figure 1, Additional file 2). Of the eleven orc/cdc6 genes with adjacent IRs, oriC1-cdc6A and oriC2-cdc6E in the main chromosome, oriC6-cdc6I and oriC7-cdc6J in the minichromosome and oriP-cdc6K in the megaplasmid were able to confer replication ability on the nonreplicating plasmid (Figure 1B, Additional file 2), which was indicative of the ARS activities of these origins. As expected, no replicating ability was observed for plasmids constructed with oriC3-cdc6D* or with the fragments containing cdc6B, cdc6C and cdc6F (Additional file 2). Although the remaining two predicted replication origins, oriC4-cdc6G and oriC5-cdc6H, shared a conserved structure with characteristic archaeal origins (Additional file 1), they could not drive autonomous replication under our experimental conditions, which is reminiscent of the dormant origins found in eukaryotes[34]. Dormant replication origins are normally inactive, but they can be activated as a cellular response to replicative stress[35, 36]. In the future, it would be interesting to further analyze the utilization of these likely dormant replication origins in H. hispanica.

Most orc/cdc6 genes are predicted to associate with replication origins in haloarchaea

To date, the genomes of 15 haloarchaea have been made available through NCBI (before October 2011), and 14 of these 15 genomes include the minichromosomes and/or megaplasmids, which provided us the opportunity to perform a comparative genomic analysis of replication origins in haloarchaea. To focus on the orc/cdc6-associated replication origins, we first conducted an exhaustive search of the orc/cdc6 genes in the 15 sequenced haloarchaeal genomes (Table 1).

Table 1 Predicted origin-associated Orc/Cdc6 homologs in the haloarchaeal genomes

Multiple Orc/Cdc6 homologs are encoded in each of the 15 sequenced haloarchaeal genomes. Based on a previous study[15], origin-associated Orc/Cdc6 proteins contain two important domains, an N-terminal AAA+ domain and a C-terminal winged-helix domain, and almost all have a length greater than 300 amino acids. A total of 154 Orc/Cdc6 homologs fulfilling these criteria were collected from the 15 sequenced haloarchaeal genomes (Table 1 and Additional file 3), and the IRs flanking these orc/cdc6 genes were collected for motif searching. Interestingly, distinct ORB-like elements harboring a G-string were found in the IRs flanking nearly two-thirds (102 of 154) of the orc/cdc6 genes (Table 1 and Additional files 3 and 4), and the predicted replication origins were rechecked manually to remove deficient origins such as oriC3-cdc6D* in H. hispanica. As expected, multiple replication origins were predicted in all of the analyzed haloarchaeal genomes (Table 1). Haloterrigena turkmenica has the greatest number of predicted origins at 12, and 7 of those origins are located on its chromosome (Table 1). On average, within the haloarchaeal chromosomes, more than half of the orc/cdc6 genes have predicted origins nearby: a maximum of 75% (3 of 4) in Halobacterium spp. and a minimum of 33% (1 of 3) in Halomicrobium mukohataei (Table 1). In contrast to the chromosomes, the overwhelming majority (greater than 80%) of the orc/cdc6 genes in the extrachromosomal elements (minichromosomes and megaplasmids) are associated with predicted replication origins (Table 1).
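The selection criteria for origin-associated Orc/Cdc6 homologs can be written as a simple filter. This is only a sketch: the function and parameter names are hypothetical, domain presence is assumed to come from a separate domain annotation step, and the text states that "almost all" such proteins exceed 300 amino acids, so treating 300 as a hard cutoff is a simplifying assumption.

```python
def is_candidate_orc_cdc6(length_aa: int,
                          has_aaa_plus: bool,
                          has_winged_helix: bool,
                          min_length: int = 300) -> bool:
    """Return True if a homolog meets the criteria described in the text.

    The criteria are an N-terminal AAA+ domain, a C-terminal winged-helix
    domain and a length above min_length (a hard cutoff is an assumption).
    """
    return has_aaa_plus and has_winged_helix and length_aa > min_length


def fraction(part: int, total: int) -> float:
    """Fraction of genes in a category, e.g. those with ORB-bearing IRs."""
    return part / total


# Counts reported in the text: 102 of 154 homologs have ORB-like elements nearby.
print(f"{fraction(102, 154):.3f}")  # close to two-thirds
```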

In addition to the replication origins that have been experimentally mapped in H. hispanica (Figure 1), H. volcanii[9] and Halobacterium sp. NRC-1[9], five additional replication origins were predicted in this study. As discussed above, these additional predicted origins might be weak or dormant replication origins, which are not easily identified by experimental approaches.

In summary, our bioinformatic approach not only is important for identifying active replication origins in haloarchaea but also provides novel information for predicting likely dormant replication origins, which is also important for the future study of replication regulation and adaptation in archaea.

Diversity of orc/cdc6-associated replication origins in haloarchaea

A recent report suggested that Orc/Cdc6 initiators specifically determine origin discrimination in archaea[18]. To investigate this further, a phylogenetic analysis of ori-associated Orc/Cdc6 proteins in haloarchaea was performed, and the resulting tree showed that Orc/Cdc6 homologs cluster into different families (Figure 2A), which suggested that various orc/cdc6-associated replication origins have been adopted in haloarchaea. Different Orc/Cdc6 families have been suggested in previous work[14, 37]; herein, we focused on the putative origin-associated Orc/Cdc6 homologs with the intention of providing a detailed classification of predicted replication origins. Although setting precise boundaries was difficult, the predicted replication origins could be sorted into distinct families based on a combination of the phylogenetic tree of the Orc/Cdc6 homologs (Figure 2A) and a comparison of ORB sequences (Figure 2B). It is noteworthy that BLAST analyses confirmed that only those Orc/Cdc6 homologs showing high identities (at least 80%) were grouped into the same family in this study. Specifically, the origins adjacent to the specific Orc/Cdc6 conserved among all haloarchaea were named oriC1, as previously reported[9]. As the two origins exhibit different structures and these two haloarchaea grow in different environmental conditions, these observations may provide novel insight into differential utilization of replication origins in haloarchaea.

Novel replication origins accompany newly acquired genomic content

As described above, the replication origins of two Haloarcula species, H. hispanica and H. marismortui were predicted, and their ARS activities were also examined in H. hispanica (Figure 1). Although their chromosomes show a high degree of conservation (Figure 4B), the two species harbor several different replication origins (Table 1 and Figure 4A). Thus, an in-depth study of these origins would be helpful in understanding the processes involved in the diversity of haloarchaeal replication origins.

Figure 4

Comparative analysis of the orc/cdc6-associated replication origins between the chromosomes of H. hispanica and H. marismortui. A. Distribution of the candidate orc/cdc6-associated replication origins in the chromosomes of H. hispanica (inside) and H. marismortui (outside). G + C content of the chromosome of H. hispanica was plotted, and significant variations in the two divergent regions are indicated with blue arrows. The predicted orc/cdc6-associated replication origins are indicated as ovals on the chromosome circle, and the shared orc/cdc6-associated replication origins in the two Haloarcula species, oriC1 and oriC2, are highlighted as filled ovals. B. Genome alignment of the chromosomes of H. hispanica and H. marismortui. Their shared orc/cdc6-associated replication origins are indicated as in A. Regions A and B represent discrepancies between the two chromosomes, which are exactly in accordance with the positions of their specific orc/cdc6-associated replication origins; oriC3-cdc6D* of H. hispanica and oriC3-cdc6i of H. marismortui are located in region A, and oriC4-cdc6g of H. marismortui is located in region B. The divergent regions and the edges of the similar regions were confirmed by BLASTN alignments of sequences, and shaded regions denote a similarity of over 70%. Linearized scaled bars are provided. C. A schematic representation of the two divergent regions (1 kb scale for Hhis_A, Hmar_A and Hhis_B; 2 kb scale for Hmar_B) between the two chromosomes. The orc/cdc6 genes are indicated. The polysaccharide biosynthesis genes are in yellow, transposase genes in purple, other genes with known functions in pink and hypothetical genes in gray. The species with the closest match in the BLAST analysis is indicated on top of each gene: M, Methanobacterium; A, other non-halophilic archaea; B, eubacteria (the colors are designed to correspond to the marks in Additional file 6). The genes in clusters are also in clusters in other haloarchaea, as indicated at the top of the clusters.

In addition to the two shared replication origins, oriC1-cdc6A and oriC2-cdc6E in H. hispanica and the corresponding oriC1-cdc6d and oriC2-cdc6h in H. marismortui, there are one or two other predicted origins specific to each strain: oriC3-cdc6D* in H. hispanica, and oriC3-cdc6i and oriC4-cdc6g in H. marismortui (Figure 4A and B). The two shared origins, oriC1 and oriC2, were likely present in the ancestor of Haloarcula, and the species-specific origins, oriC3-cdc6D* in H. hispanica and oriC3-cdc6i and oriC4-cdc6g in H. marismortui, may have been acquired later through translocation processes following the divergence of these species. An alternative hypothesis is that all three species-specific origins were also present in the ancestor of Haloarcula but were lost differently in H. hispanica and H. marismortui. However, these three predicted origins (oriC3-cdc6D*, oriC3-cdc6i and oriC4-cdc6g) are located in two divergent regions (regions A and B in Figure 4A and B) with significant G + C content variations (Figure 4A), which is indicative of newly acquired genomic content specific to each of the two strains[38]. Thus, the most likely explanation is that these predicted species-specific origins were newly acquired as a part of new genomic content (i.e., the haloarchaeal genomes might recruit novel replication origins accompanying new genes). This hypothesis is reinforced by the abundance of transposases observed around these specific origins (Figure 4C and Additional file 3).

Among the genes with annotated functions, and excluding those predicted to be transposases, the majority of genes within the two divergent regions were found to be involved in polysaccharide biosynthesis (Figure 4C). Subsequently, a BLAST analysis against the NCBI non-redundant protein database was performed on all of the genes in regions A and B in both chromosomes (Figures 4B and C and Additional file 6). The genes were conserved across several different organisms (Figure 4C); most were similar to other haloarchaeal homologs, but for several genes, the closest homologs were outside of haloarchaea. The two linked glycosyltransferase genes in region A of H. hispanica were most similar to those found in Methanobacterium (Figure 4C and Additional file 6). Several genes in region B of both chromosomes showed the greatest similarity to genes found in bacteria, especially a cluster in region B of H. hispanica (Figure 4C and Additional file 6). In addition, those genes found in clusters in the two Haloarcula species were also usually found in clusters in other organisms (Figure 4C), suggesting that these genes were acquired in clusters.

A previous report on Salinibacter ruber suggested that genes with related functions but different origins might have been assembled together and introduced concurrently into the genome of S. ruber[31]. Similarly, our comparative analyses indicated that the convergence of closely related functional genes from different sources is an important way through which new genomic content is acquired in haloarchaea and that foreign replication origins are usually introduced as a component of this new content. We cannot be certain whether the new genomic content (a mixture of new genes and foreign replication origins) is introduced through single or multiple transfers, as the mechanism is not well understood; however, our analyses strongly suggested that the novel replication origins may be important for the acquisition of new genomic content and that the newly acquired genes from the surroundings may help haloarchaeal cells adapt to changeable environments.

Recruitment of novel replication origins in the reconstruction of the extrachromosomal replicons

The haloarchaeal genomes in this study, except that of Halorhabdus utahensis, generally harbor extrachromosomal replicon(s), ranging in number from one in H. mukohataei and H. walsbyi to eight in H. marismortui (Table 1). In addition, orc/cdc6 genes were found on most of the extrachromosomal elements (Table 1), suggesting that the orc/cdc6-associated replication origins are responsible for replication initiation on most of these replicons. Therefore, an in-depth analysis could further elucidate the evolution of these replication origins.

The H. marismortui genome contains eight extrachromosomal replicons (minichromosome II and 7 megaplasmids, pNG100 to pNG700), whereas H. hispanica contains only two (minichromosome II and megaplasmid pHH400). Among these minireplicons, only megaplasmids pHH400 and pNG700 are collinear (Figure 5), suggesting that they may have been present in a common ancestor of the two Haloarcula species. The lengths of the minichromosomes of H. marismortui and H. hispanica are 288 kb and 488 kb, respectively. They share homology over approximately 100 kb, with a few inversions and gaps (Figure 5), indicating that this region was likely rearranged in the two Haloarcula species and thus that the two minichromosomes are only distantly related. In addition, the megaplasmids from pNG100 to pNG600 are unique to H. marismortui. However, regions orthologous to the minichromosome of H. hispanica are observed in these megaplasmids, especially in pNG500, with orthologous segments as large as 30 kb (Figure 5). Together with the abundant ISH (insertion sequence from Halobacteriaceae) elements encoded in these replicons, our data imply that the extrachromosomal replicons were significantly rearranged after the divergence of the two species and that new DNA content was acquired from surrounding organisms. These results are also reminiscent of previous reports on the evolution of the large dynamic replicons found in Halobacterium spp.[22, 39].

Figure 5

Comparative genomic analysis of the extrachromosomal replicons of H. hispanica and H. marismortui. The orc/cdc6 genes (those from H. hispanica and H. marismortui are highlighted with a purple asterisk and a dark green round dot, respectively) that are associated with candidate replication origins are indicated, and the shared origins associated with cdc6G/cdc6a, cdc6K/cdc6k of the two strains are highlighted in bold. The homologous regions are boxed, and the lines in the box represent the regions that are continuous in H. marismortui.

To understand the different composition of the extrachromosomal elements in the two Haloarcula species, the orc/cdc6-associated replication origins in these minireplicons were also examined. In H. hispanica, four predicted orc/cdc6-associated replication origins are distributed in the minichromosome, and one is present in the megaplasmid pHH400. The two origins (oriC6-cdc6I and oriC7-cdc6J) in the minichromosome and the one (oriP-cdc6K) in pHH400 were confirmed by ARS activity (Figures 1 and 5). In H. marismortui, the predicted orc/cdc6-associated replication origins are distributed among the extrachromosomal replicons as follows: two in the minichromosome, one in pNG700, one in pNG600, two in pNG500 and one in pNG100 (Figure 5). No orc/cdc6 genes are encoded by either pNG400 or pNG200, and no candidate replication origin was identified adjacent to the orc/cdc6 gene in pNG300, indicating that other types of replication origins are involved in the initiation of replication in these replicons. This concept is reinforced by the identification of rep genes in these replicons (Table 1)[40]. Among these replication origins, only two are shared by the two Haloarcula species: oriP-cdc6K in pHH400 and the origin (proximal to cdc6k) in pNG700, as well as the origin proximal to cdc6G and cdc6a in the minichromosomes of H. hispanica and H. marismortui, respectively (Figure 5). In contrast to the high conservation found in the megaplasmids pHH400 and pNG700, the region around cdc6G and cdc6a shows no collinearity (Figure 5), strongly suggesting that this origin might not have been present in their ancestor and instead was employed by H. hispanica and H. marismortui after their divergence. Surprisingly, a specific origin (oriC7-cdc6J) in the minichromosome of H. hispanica, which proved functional (Figure 1), was located in the region with high orthology to H. marismortui (Figure 5). This observation suggested that this replication origin was recruited into this region in H. hispanica or was lost in H. marismortui during rearrangement of minichromosomes in the two Haloarcula species. Similarly, the specific origins in pNG600, pNG500 and pNG100 and the rep-associated origins in pNG400, pNG300 and pNG200 were all likely recruited to accomplish the construction of these replicons in H. marismortui.

Multiple evolutionary mechanisms account for multiple orc/cdc6-associated origins in haloarchaea

The analysis above clearly indicated that the replication origins in haloarchaea are quite diverse and that different haloarchaea can share a few different origins. Although we cannot exclude the possibility that origin loss contributes to mosaic replication origins in haloarchaea, it is unlikely that all of the origins currently shared by different haloarchaea were present in the ancestor of each genus of Halobacteriaceae as oriC1. Archaeal species often harbor mobile elements within their genomes, which are mobilized via integrases[41] or transposases encoded by insertion sequence (IS) elements[42]. Our comparative analyses of the genomic context of the replication origins in the two Haloarcula species demonstrated the presence of mobile elements near their specific origins (Figure 4). These indicators of translocation processes were further analyzed in the genomic regions proximal to the origins in other haloarchaea. Forty-two of 102 potential replication origins have integrases or transposases nearby (Table 2 and Additional file 3), which might contribute to accelerating the translocation of these origins. In haloarchaeal chromosomes, the ratios of later-acquired origins are comparatively low, with a maximum of 50% for H. marismortui, H. utahensis and H. walsbyi and none in H. borinquense, H. jeotgali B3, H. mukohataei, H. xanaduensis and N. pharaonis (Table 2). By comparison, these later-acquired replication origins are widespread in extrachromosomal elements. For example, they account for 80% (4 of 5), 83% (5 of 6) and 87.5% (7 of 8) of the replication origins in the extrachromosomal elements of H. salinarum R1, H. volcanii DS2 and H. lacusprofundi, respectively (Table 2). These observations suggest that a portion of the replication origins in haloarchaea, especially those in extrachromosomal elements, were introduced through recent translocation processes.
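The proportions quoted above follow directly from the reported counts. The short sketch below recomputes them; the dictionary labels are ours, and only the counts stated in the text are used.

```python
def percent(part: int, total: int) -> float:
    """Percentage of origins in a category, given counts."""
    return 100.0 * part / total


# (origins with integrases/transposases nearby, total predicted origins)
reported_counts = {
    "all predicted origins": (42, 102),
    "H. salinarum R1, extrachromosomal": (4, 5),
    "H. volcanii DS2, extrachromosomal": (5, 6),
    "H. lacusprofundi, extrachromosomal": (7, 8),
}

for label, (part, total) in reported_counts.items():
    print(f"{label}: {part} of {total} = {percent(part, total):.1f}%")
```

The first entry corresponds to the "more than 40%" figure for all predicted origins; the remaining three match the 80%, 83% and 87.5% values given for extrachromosomal elements.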

Table 2 Predicted later-acquired replication origins in the haloarchaeal genomes

Contrary to the complete conservation of the replication origin oriC1[9]. This closeness might benefit the preservation of origins over evolutionary time. Another type of origin in the oriCb family, including the origins proximal to Htu_5222 in H. turkmenica, Hje_08365 in H. jeotgali, Nma_3611 in N. magadii and Hxa_0635 in H. xanaduensis, was observed; this origin showed no similarity with respect to the order of the genes flanking the origin in different genomes (Figure 6C and Additional file 7). This finding implies a totally distinct evolutionary process. As three out of the four members of this type of origin were identified in extrachromosomal elements, it is plausible that these origins were recruited independently for the construction of novel extrachromosomal replicons. Environmental gene pools have been proposed to explain the adaptation of prokaryotes to changeable environments[31, 43]. Similarly, the diversity of replication origins can be thought of as an environmental pool of origins that can be recruited for the construction of novel replicons. This hypothesis sheds light not only on the random distribution of conserved origins in different haloarchaea but also on the presence of extremely variable extrachromosomal replicons in haloarchaea.

Conclusion

In this study, orc/cdc6-associated replication origins were predicted in 15 sequenced haloarchaeal genomes through Orc/Cdc6 protein analyses and adjacent ORB searching. Multiple replication origins were found in all of the analyzed genomes, and nearly two-thirds of the orc/cdc6 genes were found to be associated with the predicted replication origins. We also experimentally investigated the predicted replication origins in H. hispanica and demonstrated that 5 out of 7 predicted origins possess ARS activity and that the remaining 2 putative replication origins appear to be dormant in experimental conditions. In conjunction with ORB comparisons and phylogenetic analysis of the Orc/Cdc6 homologs, various families of these predicted replication origins were revealed in haloarchaea. The diversity of multiple replication origins in haloarchaea was mainly driven by the diversity of Orc/Cdc6 proteins that specifically associate with distinct ORB elements. Interestingly, origins within the same family may have different functions among the various haloarchaea, e.g., although belonging to the oriCa family, the active origin in Halobacterium sp. NRC-1 (proximal to orc10)[9]. These observations suggested differential origin utilization under different replicative conditions and demonstrated the advantage of our bioinformatic approaches in the identification of dormant or weak replication origins in haloarchaea.

Phylogenetic analysis of Orc/Cdc6 proteins suggested that multiple replication origins in haloarchaeal genomes can be categorized into at least two types: oriC1, which was present in an ancestor of archaea, and the other origins, which are likely specific to haloarchaea. We also revealed that transposases or integrases flank more than 40% of predicted replication origins; this flanking is indicative of the translocation of a portion of the replication origins among haloarchaea. In conjunction with comparative analyses of two families of replication origins (oriCa and oriCb), we suggested that different evolutionary mechanisms account for the diversity of replication origins in haloarchaea: preservation from ancestors (e.g., oriC1 was maintained from the original ancestor of archaea, and one type of origin in oriCb was maintained from the closest ancestor of H. volcanii, H. borinquense and H. lacusprofundi), differential loss, and translocation among haloarchaea. In particular, a comparative genomic analysis of two Haloarcula species revealed that species-specific origins in the main chromosome were introduced along with new genes, whereas in the extrachromosomal replicons, the recruitment of novel replication origins usually accompanied the construction and/or rearrangement of minireplicons. The concept of an “origins pool” was proposed, and the introduction of novel origins in conjunction with the acquisition of new genomic content may be linked to the mechanisms involved in the adaptation of haloarchaeal cells to changeable environments. Taken together, our analyses of the diversity and evolution of the potential replication origins in haloarchaea may open avenues to understanding the significance of multiple replication origins in the domain Archaea.

Methods

Strains, plasmids and culturing

Escherichia coli was grown in Luria-Bertani medium at 37 °C, and 100 μg/mL of ampicillin was added when required. H. hispanica was cultivated at 37 °C in nutrient-rich medium AS-168 (per liter: 5.0 g Bacto Casamino Acids, 5.0 g yeast extract, 1.0 g sodium glutamate, 3.0 g trisodium citrate, 200 g NaCl, 20 g MgSO4 · 7H2O, 2.0 g KCl, traces of FeSO4 · 4H2O and MnCl2 · 4H2O, pH 7.2), and 3 μg/mL of mevinolin was added when required[9, 14]. Briefly, the transformant on the plate was transferred into 200 μL of double-distilled H2O and 100 μL of phenol-chloroform and vortexed briefly. The supernatant (crude DNA) was collected for Southern blot analysis.

Identification of Orc/Cdc6 homologs in the haloarchaeal genomes

Fifteen haloarchaeal genomes were available through NCBI, including the H. hispanica genome sequenced by our laboratory[9] were considered as candidate ORB elements. The IRs were verified by hand, and only those that contained inverted ORB repeats and were structurally similar to characterized archaeal replication origins were considered to be candidate orc/cdc6-associated replication origins. The results are summarized in Additional file 4. Logo representations of ORB elements were generated using the program WebLogo (http://weblogo.berkeley.edu).
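The requirement for inverted ORB repeats can be checked programmatically once motif hits are available. The following is a minimal sketch, not the procedure used in this study (where the IRs were verified by hand): it assumes a hypothetical list of hits given as (position, strand) pairs, and treats an IR as containing inverted repeats when ORB-like elements occur on both strands.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_inverted_repeats(hits: List[Tuple[int, str]]) -> bool:
    """True if ORB-like hits occur on both strands of the IR.

    hits is a hypothetical list of (position, strand) pairs, with strand
    given as "+" or "-", e.g. parsed from motif-search output.
    """
    strands = {strand for _, strand in hits}
    return "+" in strands and "-" in strands


# A motif occurrence on the minus strand appears on the plus strand
# as the reverse complement of the motif.
print(reverse_complement("GGGTTC"))
print(has_inverted_repeats([(120, "+"), (480, "-")]))
```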

Phylogenetic analysis

16S rRNA sequences were collected from the 15 haloarchaeal genomes to estimate the evolutionary distance between them. The 16S rRNA sequence nearest the haloarchaeal-conserved replication origin (oriC1) was selected when there was more than one rRNA operon in the genome. Multiple alignments of the 16S rRNA sequences were performed using Clustal[50] implemented in MEGA[51]. A phylogenetic tree was constructed using the neighbor-joining method[52] and the maximum composite likelihood model implemented in MEGA, with 1000 bootstrap replicates. The Orc/Cdc6 homologs that were predicted to be associated with replication origins were collected from each of the 15 haloarchaeal genomes. The Orc/Cdc6 proteins experimentally shown to recognize replication origins in other archaea (Pyrococcus abyssi[4], Sulfolobus solfataricus[5, 6], Aeropyrum pernix[7, 8]) were also included in this phylogenetic analysis. Multiple alignments of Orc/Cdc6 homologs were generated using Clustal (substitution matrix = BLOSUM; gap-opening penalty = 10; gap-extension penalty = 0.1), and the result was adjusted manually to remove columns with many gaps. For maximum likelihood (ML) phylogeny, we used PHYML v3.0 with an LG substitution model and 100 nonparametric bootstrap replicates[53]. The data used to build the trees were deposited in TreeBASE (http://purl.org/phylo/treebase/phylows/study/TB2:S12601).
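Removing gap-rich alignment columns, which was done manually here, can also be expressed as a simple rule. The sketch below is an illustration under stated assumptions rather than the procedure actually applied: the function name is hypothetical, and the 50% gap threshold is an arbitrary choice, since the text does not specify how "many gaps" was defined.

```python
from typing import List


def drop_gappy_columns(alignment: List[str],
                       max_gap_fraction: float = 0.5) -> List[str]:
    """Remove alignment columns whose gap fraction exceeds the threshold.

    alignment is a list of equal-length aligned sequences using "-" for gaps.
    max_gap_fraction=0.5 is an assumed cutoff, not a value from the study.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n_seqs = len(alignment)
    keep = [
        col for col in range(length)
        if sum(seq[col] == "-" for seq in alignment) / n_seqs <= max_gap_fraction
    ]
    return ["".join(seq[col] for col in keep) for seq in alignment]


# Toy alignment: the third column is all gaps and is removed.
print(drop_gappy_columns(["MK-A", "M--A", "MR-A"]))
```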

Comparative genomics and gene analysis

Whole genome alignments were performed using mummer and mummerplot algorithms in MUMmer[54] with default parameters. The GC plot was drawn using DNAplotter (window size: 50000; step size: 1000)[55]. Genome context analysis of the regions flanking the orc/cdc6-associated replication origins was performed using the NCBI Genome Workbench and scrutinized manually. Gene analysis was carried out using BlastP against the NCBI non-redundant proteins database (http://blast.ncbi.nlm.nih.gov/).
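The G + C content profile can be approximated with a sliding-window calculation using the window and step sizes given above (50000 and 1000). The sketch below is not DNAplotter's implementation: it treats the sequence as linear and ignores windows that would wrap around a circular chromosome, and the function names are ours.

```python
from typing import List, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def sliding_gc(seq: str, window: int = 50000,
               step: int = 1000) -> List[Tuple[int, float]]:
    """Return (window start, GC fraction) pairs for full-length windows.

    Linear treatment only: windows spanning the end of a circular
    replicon are not produced (a simplifying assumption).
    """
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy example with a small window so the output is easy to follow.
for start, gc in sliding_gc("GGCCAATT", window=4, step=2):
    print(start, gc)
```

Regions whose windowed G + C content departs markedly from the chromosome-wide value are the kind of variation highlighted for the two divergent regions in Figure 4A.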