Background

The bacterial alternative sigma factor RpoN recognizes and binds a -24/-12-type promoter with the following consensus sequence: 5'-TGGCACG-N4-TTGCW-3' (the bold G and C are situated at position -24 and -12 relative to the transcription start site, respectively) [1]. Subsequently, the core DNA-dependent RNA polymerase (core-RNAP) binds to the RpoN-DNA secondary complex to form a stable closed promoter complex. The closed promoter complex is unable to initiate transcription by itself. For this, melting of the double-stranded (ds) DNA within the closed complex is required [2]. This is accomplished by the nucleotide hydrolysis activity of an activator or enhancer-binding protein (EBP). EBPs bind to enhancer sites situated 100 base-pairs (bp) or more upstream of the transcription initiation site. Each EBP is controlled by its own signal transduction pathway, thereby responding to different conditions [3,4,5]. As there is almost no leaky expression in the absence of EBPs, the expression of different RpoN-dependent genes is tightly regulated by the different EBPs.
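As an illustration only, the consensus quoted above can be written as a regular expression, with the IUPAC code W expanded to A or T and N4 to any four nucleotides. This is a minimal sketch, not the method used in the analysis: exact matching is much stricter than weight-matrix scoring, and the function name and example sequence are hypothetical. Within the 16-nucleotide match, the only G and C that lie 12 positions apart are at offsets 2 and 14, which correspond to positions -24 and -12.

```python
import re

# Consensus 5'-TGGCACG-N4-TTGCW-3' (W = A or T, N = any nucleotide).
# A lookahead is used so that overlapping matches are also reported.
CONSENSUS = re.compile(r"(?=(TGGCACG[ACGT]{4}TTGC[AT]))")

def find_consensus_sites(seq):
    """Return (0-based start, matched 16-mer) for every exact consensus match."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in CONSENSUS.finditer(seq)]

# Hypothetical example sequence, not taken from any genome.
sites = find_consensus_sites("AAATGGCACGAGCTTTGCATCC")
```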

RpoN is also known as σ54 (from the 54 kDa molecular weight of the Escherichia coli polypeptide), σN (this sigma factor was initially discovered for its requirement for the expression of nitrogen metabolism genes) and NtrA or GlnF (names now not in common use). Because the members of this protein family vary considerably in molecular weight, the designation σ54 cannot be correctly applied to all of them. Furthermore, we use the name RpoN instead of σN as the link with the encoding gene rpoN is more obvious. As some bacteria (for example, Bradyrhizobium japonicum, Rhizobium etli and Mesorhizobium loti) have two copies of the gene, the respective proteins can be easily distinguished as RpoN1 and RpoN2.

RpoN helps initiate the transcription of genes encoding proteins for very diverse functions in a broad range of bacteria [5,6,7]. The processes controlled by RpoN are not essential for cell survival and growth under favorable conditions, except in Myxococcus xanthus [8]. The most widespread RpoN-regulated function in bacteria is the assimilation of ammonia [6,7]. The expression of genes coding for glutamine synthetase, an ammonium transporter and the PII proteins is controlled by the NtrC EBP and RpoN. In the enteric nitrogen regulation (Ntr) paradigm, NtrC is activated via phosphorylation by NtrB. The PII protein, a signal transducer, stimulates dephosphorylation of NtrC-P under conditions of nitrogen excess (reflected by a high glutamine/2-ketoglutarate ratio), whereas it does not affect the NtrC phosphorylation status under nitrogen-limiting conditions [9]. Consequently, under low nitrogen conditions, NtrC-P activates transcription of the NtrC-RpoN regulon.

Species of the genera Allorhizobium, Azorhizobium, Bradyrhizobium, Mesorhizobium, Rhizobium and Sinorhizobium are generally referred to as rhizobia. Rhizobial rpoN mutants are deficient in symbiotic nitrogen fixation [10,11,12,13,14]. RpoN ensures the transcription of most of the nitrogen fixation genes (nif/fix) whose gene products constitute the nitrogenase complex and accessory proteins [15,16]. However, several other symbiosis-related genes are reported to be RpoN-dependent [17,18,19,20,21,22,23,24,25,26,27]. So far, no large-scale effort has been undertaken to unravel the RpoN-regulon in rhizobia. The recent publication of several rhizobial genomes and symbiotic regions is a good opportunity to identify RpoN-regulated genes in rhizobia. To obtain a good view on what (symbiotic) functions are controlled by RpoN, we carried out an in silico analysis on the presence of -24/-12-type promoters in the symbiotic regions of R. etli CFN42, Rhizobium sp. NGR234, B. japonicum USDA110 and M. loti R7A [28,29,30] and in the genomes of M. loti MAFF303099 and Sinorhizobium meliloti 1021 [31,32]. Two closely related non-symbiotic species belonging to the Rhizobiales order of the α-proteobacteria, namely Agrobacterium tumefaciens C58 and Brucella melitensis 16M [33,34], were also included. These are non-symbiotic plant and animal pathogens, respectively. To date, there is only one report on RpoN-dependent functions in A. tumefaciens [35] whereas no information whatsoever is available on the RpoN-regulon of B. melitensis. The possible RpoN-dependent genes predicted by the screening were complemented with data from the literature and classified according to the function of the encoded proteins.

Results and discussion

In silico identification of potential RpoN-dependent promoters

The upstream intergenic sequences were extracted from the genomes of M. loti MAFF303099, S. meliloti 1021, A. tumefaciens C58 and B. melitensis 16M and the symbiotic regions of R. etli CFN42, Rhizobium sp. NGR234, B. japonicum USDA110 and M. loti R7A (see Materials and methods) [28,29,30,31,32,33,34]. The upper strand of these sequences was scored against a weight matrix using PATSER (see Materials and methods). Positive matches (possible RpoN-binding sites or -24/-12-type promoters) were classified according to the functions of the encoded gene products (see Additional data files, pages 8,9). To ensure a sound functional description, the predicted proteins were individually screened against the protein databases of the National Center for Biotechnology Information (NCBI) using BLASTP.

In our analysis, the number of false-positive matches is estimated to be very low. First, the use of PATSER in combination with a strong weight matrix is a preferred method to identify true binding sites [36]. The weight matrix used here was based on a set of 186 characterized RpoN-binding sites of 44 different bacterial species (Table 1) [1]. The lower threshold for the scores was chosen such that only matches strongly resembling the consensus were retained (see Materials and methods). This high stringency allows for a high specificity at the expense of the sensitivity. Indeed, some reports mention the presence of very poorly conserved -24/-12-type promoters that appear to be functional to some degree (see Additional data files, pages 8,9). Our stringent procedure most probably misses these sites. Therefore, the retained matches might represent an underestimation of the actual number of active -24/-12-type promoters. Second, only intergenic sequences were considered for the screening, as all known -24/-12-type promoters are situated in intergenic regions [7]. Moreover, the matches all have the correct orientation, as the upper strand of the intergenic sequences was used (for matches on the lower strand, see 'Additional control mechanisms involving RpoN'). The median of the distances from the -12 (C) position - the cytidylate residue on position -12 relative to the transcription initiation site - of a match to the start codon of the downstream coding sequence (CDS) varies from 74 to 156 bp (Table 2). This is somewhat different from the situation in E. coli, where the average distance amounts to 50 bp [7]. Although roughly 75% of the matches are within 200 bp upstream of the start codon, matches with distances over 1,000 bp were also retained (Table 2). A good example to justify this is the case of the mapped promoter of the B. japonicum gene fixB, which is situated 720 bp upstream of the coding region [37]. As is the case with other in silico prediction methods, the results of our analysis may have been biased by the approach used.

Table 1 Weight matrix based on 186 characterized -24/-12-type promoters [1]
Table 2 Comparison of different members of the Rhizobiales

A good test case to evaluate the reliability of our predictions is to compare the matches with experimental data. In our laboratory we confirmed the RpoN-dependent expression of several R. etli genes with predicted RpoN-binding sites (see Additional data files, pages 3,4,6,7,12,13). The products of these genes are involved in a wide variety of functions, such as nitrogen fixation (nifH and iscN-nifUS) [38,39], oxidative stress and gene regulation (spxA-rpoN2) [13,40], and transport (yp104-103 and yp100) [58]. It was stated that, as RpoN is able to bind to -24/-12-type promoters in the absence of core-RNAP, the negative autoregulation of the rpoN genes occurs by direct interference of RpoN with the binding of σ70-holo-RNAP complex to the -10 promoter region of the rpoN gene. Site-directed mutagenesis of the highly conserved GG to TT in the putative RpoN-binding site of the rpoN promoter relieved the negative autoregulation, giving strong support to the above hypothesis [14].

The in silico screening revealed the presence of oppositely oriented possible RpoN-binding sites upstream of the rpoN genes of M. loti, S. meliloti, A. tumefaciens and B. melitensis (Table 2). A comparison with the rpoN coding sequences of R. etli, R. leguminosarum, Rhizobium sp. NGR234, B. japonicum, M. loti and S. meliloti revealed that the rpoN genes of A. tumefaciens and B. melitensis were incorrectly annotated. Their coding sequences should be 63 bp longer and shorter, respectively. An alignment of the rpoN promoter regions of these species shows the strong conservation of these promoters (Figure 1). In addition, the screening of the lower strand of the intergenic sequences revealed a slightly lower number of matches than that of the correctly oriented RpoN-binding sites (Table 2). It is thus not inconceivable that RpoN could alter the expression of these genes in a way similar to its own autoregulation, that is, by interference with the binding of the holo-RNAP or a regulatory protein to the promoter. This would significantly broaden the working domain of RpoN.

Figure 1

Manual alignment of rpoN promoter sequences. At (Agrobacterium tumefaciens, GI: 17738659); Bj (Bradyrhizobium japonicum, GI: 152137); Bm (Brucella melitensis, GI: 17983821); Ml (Mesorhizobium loti, GI: 14023393); NGR (Rhizobium sp. NGR234, GI: 152431); Re (Rhizobium etli, GI: 1046228); Rl (Rhizobium leguminosarum bv. viciae, GI: 5759116); Sm (Sinorhizobium meliloti, GI: 152389). Upper line, -35 and -10 consensus sequences of Escherichia coli and the transcription start site (*) as determined for S. meliloti [69]. Lower line, consensus sequence of -24/-12-type promoter [1]. Nucleotides are shaded in black (100% conserved) or gray (75% conserved). The numbers represent the distance (in bp) from the end of the alignment to the start codon of the downstream rpoN gene.

Conclusions

A highly specific in silico screening method was applied to predict members of the RpoN-regulon in eight different species of the Rhizobiales. The matches obtained were individually checked and classified according to protein function, resulting in a highly annotated and manually curated dataset. This dataset was complemented with available literature data on members of the RpoN-regulon in Rhizobiales. In addition, a screening was carried out to identify possible EBPs controlling the expression of RpoN-dependent genes. Together, these data serve as a source of exhaustive information on the (possible) roles of RpoN in symbiotic and non-symbiotic processes.

RpoN-binding sites were found upstream of genes involved in common RpoN-dependent functions, such as assimilation of ammonium and uptake of C4-dicarboxylic acids. The symbiotic members of the Rhizobiales seem to have recruited RpoN for the expression of nitrogen fixation and other symbiotic genes. This is illustrated by the high number of possible RpoN-binding sites in the symbiotic regions of these bacteria. Other RpoN-dependent symbiotic functions might include detoxification and transport or might be controlled indirectly through other regulatory proteins. Whereas an A. tumefaciens rpoN mutant only displays common RpoN-dependent phenotypes, the relatively high number of possible RpoN-binding sites present in its genome points to several other, yet unidentified, RpoN-dependent functions. So far, no reports are available on RpoN-dependent phenotypes in B. melitensis. This animal pathogen has a significantly lower number of possible RpoN-binding sites than the other members of the Rhizobiales. B. melitensis RpoN might be required for infection of the host organism, as is the case for other pathogens. Furthermore, the species screened seem to have recruited RpoN independently, in a species-specific manner, for the transcription of different gene sets. The high percentage of hypothetical conserved and non-conserved CDSs preceded by a possible RpoN-binding site opens up ample opportunities for future research. Several uncharacterized EBPs were identified besides the 'classic' EBPs such as NtrC, NtrX, NifA and DctD. This implies that signals other than nitrogen, oxygen and C4-dicarboxylates control the expression of RpoN-dependent genes in species of the Rhizobiales. Identification of these signals will give better insight into yet uncharacterized RpoN-dependent functions. Finally, a similar number of possible RpoN-binding sites were found on the lower strand of the upstream intergenic sequences. RpoN might thus significantly extend its working domain by blocking the binding of transcription regulatory factors. Such is the case, for instance, in the negative autoregulation of rpoN.

Although much consideration was given in our analysis to the design and, to some extent, the experimental validation of the approach, experimental confirmation will ultimately be required to establish the biological meaning of the predicted -24/-12-type promoters, as is the case with all computer predictions.

In conclusion, a highly efficient method was applied to predict the RpoN-regulon of different members of the Rhizobiales group. The same approach might be used for the prediction of RpoN-dependent genes in other bacterial species.

Materials and methods

Retrieval of intergenic sequences

Complete DNA sequences from A. tumefaciens C58 (circular chromosome: NC_003304, linear chromosome: NC_003305, plasmid AT: NC_003306, plasmid Ti: NC_003308), B. japonicum USDA110 (symbiotic chromosomal region: AF322012 and AF322013), B. melitensis 16M (chromosome I: NC_003317, chromosome II: NC_003318), M. loti MAFF303099 (chromosome: NC_002678, plasmid pMla: NC_002679, plasmid pMlb: NC_002682), M. loti R7A (symbiotic island: AL672111), Rhizobium sp. NGR234 (plasmid pNGR234a: NC_000914), R. etli CFN42 (plasmid p42d: NC_004041) and S. meliloti 1021 (chromosome: NC_003047, plasmid pSymA: NC_003037, plasmid pSymB: NC_003078) were extracted from GenBank. Intergenic sequences were identified by automatically parsing the corresponding GenBank files using the modules of INCLUsive [59,60]. An intergenic region is defined as the non-coding region between two genes. Intergenic regions smaller than 10 nucleotides were discarded, as the corresponding genes are likely to belong to an operon.
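The definition of an intergenic region given above can be sketched as follows. This is a simplified illustration and not the INCLUsive implementation: it assumes that gene coordinates have already been parsed into 1-based inclusive (start, end) pairs for a single replicon, and it ignores strand and the wrap-around of circular replicons. The function name and the example coordinates are hypothetical.

```python
def intergenic_regions(genes, min_length=10):
    """Return the gaps between consecutive genes as (start, end) pairs.

    genes: iterable of (start, end), 1-based inclusive, on one replicon.
    Gaps shorter than min_length nucleotides are discarded, as the
    flanking genes are then assumed to belong to the same operon.
    """
    regions = []
    prev_end = None
    for start, end in sorted(genes):
        if prev_end is not None:
            gap_start, gap_end = prev_end + 1, start - 1
            if gap_end - gap_start + 1 >= min_length:
                regions.append((gap_start, gap_end))
        # Track the furthest gene end seen so far, so overlapping genes
        # never produce a spurious gap.
        prev_end = end if prev_end is None else max(prev_end, end)
    return regions

# Hypothetical coordinates: a 4-nt gap (discarded), a 49-nt gap (kept),
# two overlapping genes (no gap) and a 19-nt gap (kept).
example = [(1, 100), (105, 200), (250, 400), (390, 500), (520, 600)]
gaps = intergenic_regions(example)
```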

Prediction of possible RpoN-binding sites

The intergenic regions were screened with the PATSER module of the Regulatory Sequence Analysis Tools (RSAT) [61,62,63] for the presence of the -24/-12 promoter consensus sequence. PATSER scores N-mers (in this case 16-mers) from a sequence against a given weight matrix. A set of 186 RpoN-dependent promoters from different bacterial species [1] was used to generate the weight matrix (Table 1). Initially, this matrix was trained against 67 -24/-12-type promoters with a known transcriptional start site. From the distribution of these scores (Figure 2), it was decided to retain all matches with a score higher than or equal to the fifth percentile (8.9). PATSER was run with the GC content of the intergenic sequences as a measure for the a priori probabilities of the nucleotides. The intergenic GC content differs markedly from the total GC content of the genomes (Table 2).
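The scoring and thresholding steps described above can be sketched in general terms. The code below is not PATSER: it uses a generic log-odds weight matrix with a pseudocount, a background derived from a GC fraction, and a nearest-rank percentile, all of which are assumptions chosen for illustration. PATSER's exact pseudocount handling and the percentile definition used by the authors may differ. All function names and the toy count matrix are hypothetical.

```python
import math

def background_from_gc(gc):
    """A priori nucleotide probabilities for a given GC fraction."""
    return {"A": (1 - gc) / 2, "C": gc / 2, "G": gc / 2, "T": (1 - gc) / 2}

def log_odds_matrix(counts, background, pseudocount=1.0):
    """Turn per-position base counts into natural-log odds weights."""
    matrix = []
    for column in counts:
        total = sum(column.values())
        matrix.append({
            b: math.log(
                (column.get(b, 0) + pseudocount * background[b])
                / (total + pseudocount)
                / background[b]
            )
            for b in "ACGT"
        })
    return matrix

def scan(seq, matrix, threshold):
    """Score every window of the matrix width; keep those >= threshold."""
    width = len(matrix)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(b not in "ACGT" for b in window):
            continue  # skip windows containing ambiguous bases
        score = sum(matrix[j][b] for j, b in enumerate(window))
        if score >= threshold:
            hits.append((i, window, score))
    return hits

def percentile_threshold(scores, fraction=0.05):
    """Nearest-rank percentile of a list of training scores."""
    ordered = sorted(scores)
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]

# Toy 3-position matrix (the matrix in Table 1 has 16 positions).
toy_counts = [{"T": 10}, {"G": 10}, {"C": 10}]
toy_matrix = log_odds_matrix(toy_counts, background_from_gc(0.5))
toy_hits = scan("AATGCAA", toy_matrix, threshold=0.0)
```

In the analysis itself, the threshold (8.9) was taken from the scores of the 67 promoters with mapped start sites; in this sketch, the same role would be played by calling percentile_threshold on those 67 scores.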

Figure 2

Distribution of PATSER scores (see Materials and methods) of 67 -24/-12-type promoters with mapped transcriptional start sites. (GenBank GI number: 2979503, 141885, 141892, 38664, 38679, 1769418, 142336, 142378, 142326, 39977, 550310, 408911, 39532, 39516, 39526, 152106, 152283, 152315, 39548, 312974, 12620419, 152100, 152280, 152317, 2316081, 144194, 896457, 144223, 262651, 3493239, 7208421, 262651, 145911, 556890, 41774, 146158, 1004098, 41568, 42538, 149241, 43802, 149252, 149256, 149246, 43857, 43864, 149273, 149275, 150095, 950651, 150993, 490170, 6492432, 151643, 6636054, 46254, 152305, 152230, 340664, 46285, 46324, 46324, 550144, 550144, 664946, 453435, 1649033). Cumulative percentage: black line.

Prediction of possible EBPs

An estimate of the number of possible EBPs in the respective proteomes was obtained by looking for the presence of the Pfam Sigma54_activat domain [64]. Sigma54_activat is a conserved domain present in EBPs that is involved in the ATP-dependent interaction with RpoN. The predicted proteome sequences of A. tumefaciens, B. melitensis, S. meliloti and M. loti MAFF303099 were downloaded from Proteome Analysis @ EBI [65] and the protein sequences of the symbiotic regions of B. japonicum, M. loti R7A, Rhizobium sp. NGR234 and R. etli were obtained from the NCBI protein database [66]. The protein sequences were subsequently queried with the PF00158 HMM motif using HMMER 2.2g [67]. Matches with an E-value less than or equal to 10^-25 were retained.
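The retention criterion can be illustrated with a short sketch. It assumes that the HMMER output has already been reduced to (protein identifier, E-value) pairs; no HMMER output format is parsed here, and the function name and identifiers are hypothetical.

```python
def retain_ebp_candidates(hits, max_evalue=1e-25):
    """Keep identifiers whose domain match has an E-value <= max_evalue."""
    return [protein_id for protein_id, evalue in hits if evalue <= max_evalue]

# Hypothetical hits: the threshold is inclusive, so protein_b is kept.
example_hits = [("protein_a", 1e-30), ("protein_b", 1e-25), ("protein_c", 1e-10)]
candidates = retain_ebp_candidates(example_hits)
```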

Additional data files

An additional data file listing all genes and proteins included in this analysis is available. Each protein is accompanied by a functional description, the species from which it comes, GenBank GI number and transcription unit; additional references for each protein are also included in the file. Information on the gene's regulation is provided where available. The data were obtained from an in silico analysis (see Materials and methods) and the literature [12,13,14,17,18,19,20,21,22,23,24,25,26,27,29,30,31,32,33,34,39,40,41,42,43,44,46,50,51,56,58,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88].