Introduction

Vibrio cholerae, the causative agent of the diarrheal disease cholera, causes natural pandemics. Strains of the O1 Classical biotype caused the first six pandemics, and the O1 El Tor biotype currently causes the 7th pandemic1,2,3. Pandemic strains cause diarrheal disease with the virulence factors cholera toxin (CT) and toxin co-regulated pilus (TCP)4,5,6,7. Several non-O1 strains, however, carry these main virulence factors and cause isolated cases of cholera-like illness without causing pandemic outbreaks8,9,10. The full set of factors driving V. cholerae pandemicity is unknown.

In its aquatic reservoir and the human small intestine, V. cholerae competes with other bacteria and predatory eukaryotic cells via the type VI secretion system (T6SS), a contractile nanomachine resembling a T4 bacteriophage that kills competitors through the contact-dependent translocation of toxic effectors11,12,13,14,15. The components of the T6SS are encoded in three clusters (the large cluster, auxiliary cluster 1 (Aux1) and auxiliary cluster 2 (Aux2)), each terminating in an effector/immunity (E/I) pair16,17,18. While T6 effectors are toxic to distinct bacteria, kin cells are protected by cognate immunity proteins18,19,20. It is hypothesised that this allows a strain to propagate clonally21. Comparative genomic studies of V. cholerae T6SS clusters show that all pandemic strains carry an identical set of effector genes (A-type), but environmental strains encode variable E/I subtypes22,23. Pan-genome phylogeny of V. cholerae does not reflect the dispersion of these effector subtypes23, suggesting E/I evolution by horizontal gene transfer (HGT). V. cholerae in both the estuarine environment and its human host is exposed to exogenous DNA, bacteriophage and conjugative elements. Further, when in contact with chitin, V. cholerae upregulates the T6SS and natural competence machinery24,25,26, driving rapid evolution via inter- and intra-species competition and the uptake of prey DNA. Recently, chitin-induced horizontal transfer of V. cholerae T6SS effectors was demonstrated in vitro27. These studies indicate the aquatic environment as a reservoir for the acquisition of new E/I subtypes.

Some T6SS components are bacteriophage structural homologues12,13,14, suggesting that the T6SS arose through the repurposing of one or more prophages. V. cholerae T6SS clusters do not, however, reflect typical prophage genomic organisation or encode functional recombinases. Seventh pandemic El Tor biotype strains encode several genomic islands that do encode phage-like recombination machinery and catalyse site-specific recombination: CTX phage, the SXT element, VPI-1, VPI-2, VSP-I and VSP-II28,29,30,31,32,33. For three of these elements (VPI-1, VPI-2 and VSP-II), integration into and excision from the host chromosome are catalysed by the tyrosine recombinases IntV1, IntV2 and IntV3, respectively30,31,34. Tyrosine recombinases do not effectively catalyse excision from the chromosome on their own and require assistance from small DNA-binding proteins called recombination directionality factors (RDFs).

Alignment of the Aux3 region in these nine environmental strains reveals variability in the additional sequence between VCA0281 and VCA0284, with most of the variability in the 5′ half of the region (Supplementary Fig. 4a). Further, all environmental strains lack VCA0282 (Supplementary Fig. 4a). Analysis of these nine environmental strains by PHASTER45 predicts that the Aux3 region in non-pandemic strains resembles an intact prophage of the Myoviridae family (Fig. 2b and Supplementary Fig. 5). Closer examination of the annotated coding regions in the environmental Aux3 elements reveals that the 5′ half of each element is composed primarily of phage regulatory genes like cro and cII, toxins, methylases, holins and other non-structural genes, but these cassettes vary between strains (Fig. 2c and Supplementary Data 2). The 3′ half of each environmental Aux3 element is more highly conserved and is composed of tailed phage structural genes including capsid, tail, sheath, tube and baseplate (Fig. 2c and Supplementary Data 2). To assess whether this region produces a phage particle, we collected and precipitated supernatants from V. cholerae 1154-74 and O395. V. cholerae O395 produces the filamentous CTX phage, while 1154-74 encodes a predicted Inovirus (filamentous phage) and the predicted Aux3 Myovirus (tailed phage). We were able to isolate filamentous phage from both O395 and 1154-74, but were not able to detect any tailed phage particles in the 1154-74 supernatant (Supplementary Fig. 4b). Despite its genetic resemblance to an intact prophage sequence, we cannot state that Aux3E encodes an intact prophage.

We performed a core genome alignment of 69 pandemic and environmental V. cholerae strains as well as 8 Vibrio sp. and one V. mimicus isolate (outgroup), which shows that the incidence of Aux3 in environmental strains is not reflective of phylogeny (Fig. 3). This scenario leads us to conclude that while Aux3P likely expanded clonally in pandemic strains, Aux3E may circulate environmentally by HGT. We hypothesise that the evolution of Aux3P in the pandemic lineage began with the integration of a horizontally transferred phage-like element which then underwent a large deletion event to generate the smaller module (Fig. 2d). The inverse event, in which Aux3P gained excess prophage-related genes in a large insertion event to form Aux3E, is also a possible scenario. All Aux3E strains lack insH (VCA0282) (Supplementary Fig. 4a), leading us to assume that the insertion of this element occurred in an evolutionary intermediate (Fig. 2d). We have not yet identified a strain encoding this intermediate Aux3P that lacks the IS5 element. These data support the idea that Aux3 exists in two basic states, environmental Aux3 (Aux3E) and pandemic Aux3 (Aux3P), that share a common origin.

Fig. 3: The Aux3 element is enriched in pandemic V. cholerae and sporadically distributed in environmental strains.

A phylogenetic tree was constructed using the GTR Gamma Maximum likelihood model in RAxML based on core genome SNP alignment of 69 V. cholerae, 8 Vibrio sp. and 1 V. mimicus genome sequences. Bootstrap support values are indicated next to their respective branches. Nodes with support values <70 were collapsed. Presence (black square) or absence (white square) of CT, TCP, O1/O139 antigen and the Aux3E or Aux3P module is indicated. Environmental (yellow), O1 Classical (green), Pre-7th Pandemic O1 El Tor (light blue), 7th Pandemic O1 El Tor (dark blue) and O139 (red) strains are highlighted.

Aux3 is excised from the host chromosome at a defined site

A BLASTP search for the Aux3 integrase amino acid sequence returned a conserved domain hit for “integrase P4”, a common integrase in temperate phages and PAIs known to catalyse integration and excision30,31,46,47. During excision, recombination occurs between attL and attR to reform attB at the chromosomal excision junction and attP on the excised circular DNA element48,49 (Fig. 4a). Thus, we aimed to determine if Aux3 excises from the genome to form a circular element. We tested this hypothesis by inverse PCR with primers outside of the att sites (P1/P4) and primers inside the att sites facing outward (P2/P3 or P2.2/P3.2; Fig. 4a). With this design, P1/P4 will only be brought into proximity for amplification upon excision and P2/P3 will only be in the right orientation upon circularisation. We tested two Aux3E strains (AM-19226 and 1154-74), three Aux3P strains (N16961, C6706 and A1552), and two Aux3-naïve strains (DL4215 and DL4211)50 for excision/circularisation. After 4 h of logarithmic growth, excision of the element is detectable in all Aux3-encoding strains (Fig. 4b). A band indicative of excision is also evident in the tested Aux3-naïve environmental strains because the Aux3-naïve and Aux3-excised states are identical at this junction. Further, the circular Aux3 module was present in all Aux3-encoding strains and absent from Aux3-naïve strains (Fig. 4b). PCR products were validated by Sanger sequencing against the expected chromosomal and plasmid excision junctions (Supplementary Fig. 6a).

Fig. 4: Integrase truncation and the loss of vefD leads to reduced excision of Aux3P.

a Inverse PCR schematic showing integrated and excised Aux3P. Aux3 genes are green, genomic flanks are blue and att sites are orange triangles. Primers are represented by arrows with expected band sizes below. b PCR amplification of excision junctions, attP_Aux3 (P2/P3) and attB_Aux3 (P1/P4), on Aux3E (AM-19226, 1154-74), Aux3P (N16961, C6706 and A1552) and Aux3-naïve (DL4215, DL4211) strains. c Quantification of Aux3 excision by qPCR with primers designed against the naïve repeat 1 (grey) and attB_Aux3 (orange) (Supplementary Fig. 6b) on gDNA from Aux3E and Aux3P strains. Significance was determined by a one-way ANOVA with Tukey’s multiple comparisons test (ns non-significant; **p = 0.0018, 0.0033 and 0.0020; ***p = 0.0002, 0.0003 and 0.0002). d Quantification of attP_Aux3 and attB_Aux3 in AM-19226 by qPCR in comparison to AM-19226 growth. Growth curve values are shown on the left-axis. Relative incidence of excision values are shown on the right-axis. Schematic of primers designed against the attP_Aux3 are shown (dark orange) (Supplementary Fig. 6b). Significance was determined by two-way ANOVA with Sidak’s multiple comparisons test (ns non-significant; **p = 0.0042; ****p < 0.0001). e Circular excision junction PCR on A1552 wild-type, A1552 single and double recombinase null mutants, and DL4215. f Circular excision junction PCR on A1552 and AM-19226 wild-type strains, associated int-null mutants, and DL4215. Null mutants from each strain were trans-complemented with an empty mTn7 (Tn), Tn with the native integrase, or Tn with the opposing Aux3-type integrase. g Circular excision junction PCR on A1552 and A1552 Δint with trans-complementation of both integrase types, each with and without over-expression of the putative Aux3E RDF VefD. E.V. = pBAD24, VefD = pBAD24-vefD, Ara = 0.1% arabinose, and Dex = 0.1% dextrose. b, e–g White arrows indicate ladder band sizes. Gels are representative of at least three distinct experiments (n = 3). c, d Quantitative results are from three distinct experiments (n = 3). Horizontal bars (c) or points (d) represent the mean and error bars indicate ±SD. b–g Source data are provided as a Source Data file.

To assess the likelihood of Aux3 module transfer to a naïve strain, we measured the incidence of Aux3 excision in each strain by quantitative PCR (qPCR). Primers were designed against the Aux3-naïve repeats to amplify either repeat 1 or attB_Aux3 as well as the circular Aux3 junction attP_Aux3. This experimental setup allows us to quantify excision (reversion to the naïve state) at each chromosomal site and the presence of circular Aux3 modules (Supplementary Fig. 6b). With two repeat sites in the intergenic flanks, there are two potential integration states of Aux3. Measuring the reversion to a naïve site at both repeat 1 and attB_Aux3 allows us to confirm the site of recombination. Our results show that when normalised to total genomic DNA, repeat 1 is present at a ratio of ~1 in all tested strains (Fig. 4c), indicating that repeat 1 is constant. The incidence of attB_Aux3 is ~1/50 genomes for Aux3E strains and ~1/200 genomes for Aux3P strains (Fig. 4c), supporting attB_Aux3 as the site of recombination. Time course analysis was performed to assess changes in excision and circularisation in Aux3E strain AM-19226 during the progression to stationary phase. The proportion of genomes with a reformed attB_Aux3 remains constant over the AM-19226 growth curve, while the normalised quantity of circular Aux3E module increases over the same period (Fig. 4d). We find that by 4 h of logarithmic growth there are significantly more Aux3E attP_Aux3 junctions than attB_Aux3 junctions. This may indicate that the Aux3E module carries an origin of replication, further supporting the idea that Aux3E is of prophage origin. Finally, while we can detect both the recombined, circular Aux3 module and the chromosomal excision scar in A1552 and AM-19226, we are unable to isolate colonies that have lost the Aux3 module (Supplementary Fig. 6d), leading us to hypothesise that Aux3 is likely excised from the genomes of dying cells.

Aux3E and Aux3P strains catalyse excision differentially

To investigate the role of the Aux3P-encoded int and insH recombinases in modular excision, each recombinase was deleted from the A1552 chromosome. Aux3P circularisation was assessed by inverse PCR with primers over the circular junction. New circularisation primers (P2.2/P3.2) were designed because the original P2 primer binds within the deleted integrase sequence (Fig. 4a). Neither single recombinase deletion nor a double knockout abolished circularisation of the Aux3P module in A1552 (Fig. 4e). This could indicate the involvement of an unidentified Aux3-extrinsic recombinase in A1552, as integrase cross-talk between V. cholerae PAIs has been previously reported40. Deletion of the corresponding int gene in the Aux3E strain AM-19226 largely suppressed modular circularisation, and trans-complementation of the Aux3E int gene restored circularisation to wild-type levels (Fig. 4f).

These data, along with the excision qPCR (Fig. 4c), suggest that there are disparities in the mechanism of site-specific recombination between Aux3P and Aux3E strains. One potential explanation for this difference is the presence of the IS5 module in Aux3P. A BLASTP search for the int amino acid sequence predicts this protein as a P4-like integrase and tyrosine recombinase. Pairwise alignment of the amino acid sequences of pandemic and environmental int proteins with other known tyrosine recombinases shows that both have all appropriate catalytic residues intact and strong homology to each other (Supplementary Fig. 7a). At the C-terminus, however, the Aux3E integrase protein is significantly longer than the Aux3P homologue. Closer investigation revealed that the IS5 element in Aux3P inserted immediately downstream of the catalytic Y375 residue, blunting the C-terminal tail of the protein and adding seven nonsense residues encoded by the 5′ end of the IS5 element (Supplementary Fig. 7b). We generated a predictive model of both the full-length and truncated integrase (Supplementary Fig. 7c, d). While the orientation of the catalytic residues is unaffected, IS5 blunting results in a short, disordered C-terminal tail compared to two tyrosine-rich α-helices in the full-length protein (Supplementary Fig. 7c, d), which could explain the decreased incidence of excision seen in Aux3P strains. To test this hypothesis, we trans-complemented the Aux3P integrase into AM-19226 Δint and found that it was unable to rescue Aux3 excision, supporting the conclusion that the Aux3P integrase has lost some functionality (Fig. 4f). In the reverse experiment, trans-complementation of the Aux3E integrase into A1552 Δint does not appear to raise Aux3P excision to environmental levels (Fig. 4f). This suggests that the incidence of excision is reliant on both integrase structure and integrase-extrinsic factors that differ between environmental and pandemic strains.

Loss of an RDF gene contributes to differential excision

We next aimed to identify the integrase-extrinsic factors that contribute to the reduced excision of Aux3P relative to Aux3E. Aux3E is much longer than Aux3P and carries many phage-like genes, and so we hypothesised that Aux3E may encode a functional RDF gene that was lost in the transition to Aux3P. Loss of the RDF would shift the Aux3 integrase activity towards integration and favour maintenance of the Aux3 module in the chromosome. We first sought to identify a putative RDF gene in the Aux3E modules. We found that one gene conserved in all 9 Aux3E modules was predicted to be a helix-turn-helix MerR superfamily protein (Fig. 2c and Supplementary Data 2). The lambda phage RDF …

… the att site were modified to remove the attP_Aux3 site. Regions containing either attPWT or attPKO sites were amplified off the Gibson assembled fragments and assembled into the SmaI-cut pKNOCK-Kan vector.

Identification of att sites and bacteriophage elements

To identify potential att sites, intergenic sequences from VCA0280 to VCA0281 and VCA0286 to VCA0287 were concatenated and input into REPFIND69 with a minimum repeat size of 10 and a P-value cutoff of 0.0001. To identify putative prophages, GenBank files for Aux3E strains were submitted to PHASTER45.
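The idea behind the repeat search can be illustrated with a minimal sketch. This is not the REPFIND algorithm, which also assigns a P-value to each repeat; it only lists exact repeats of a fixed minimum length (10, matching the setting above) in a concatenated sequence. The function name and the toy sequence are hypothetical.

```python
from collections import defaultdict


def find_exact_repeats(seq, min_len=10):
    """Return {k-mer: [start positions]} for every min_len-mer seen at least twice.

    Simplification: only exact, same-strand repeats of exactly min_len bases
    are reported, and no statistical significance is computed.
    """
    seq = seq.upper()
    positions = defaultdict(list)
    for start in range(len(seq) - min_len + 1):
        positions[seq[start:start + min_len]].append(start)
    return {kmer: starts for kmer, starts in positions.items() if len(starts) > 1}


# Toy input: two copies of a 10-bp motif separated by a short spacer,
# standing in for the concatenated intergenic flanks.
toy = "ACGTTGCAAG" + "TTTTT" + "ACGTTGCAAG"
repeats = find_exact_repeats(toy)
```

In a real analysis, candidate repeats found this way would still need to be filtered by significance and checked against the observed excision junctions.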

Nucleotide/amino acid sequence alignment

All genomes used for alignments can be found in Supplementary Table 3. All nucleotide alignments outside of phylogenetic analyses were performed in Geneious Prime (v2019.0.4). Nucleotide sequences encompassing more than one open reading frame were aligned using the Progressive MAUVE algorithm70 to account for insertions, deletions and rearrangements. Single gene, intergenic region or single protein sequence pairwise alignments were performed using MUSCLE (v3.8.425)71.

Aux3 enrichment analysis

MegaBlast queries were performed in Geneious Prime (v2019.0.4). Downstream manipulations and plots were done in RStudio (R version 3.3.2 (2016-10-31) -- “Sincere Pumpkin Patch”). V. cholerae genomic FASTA files were downloaded from the PATRIC database44. Nucleotide sequences for tseH (VCA0285), tseL (VC1418), vasX (VCA0020), vgrG3 (VCA0123), tcpA (VC0828), and ctxAB (VC1456-VC1457) from O1 El Tor type strain N16961 were queried by MegaBlast against a custom database of PATRIC FASTA sequences to generate a grade (a weighted metric combining query coverage (0.50), e-value (0.25) and pairwise identity (0.25)) for each gene locus in each strain. Strains were grouped based on a 99% grade cutoff for tseH and the three A-type effectors tseL, vasX and vgrG3 to create 4 groups (Supplementary Table 4) and assess occurrence of tseH in AAA pandemic strains by Fisher’s exact test. PATRIC strains were clustered by Partitioning Around Medoids (k-medoids; pam, R package cluster v2.1.0) based on grades for tseH, tseL, vasX, vgrG3, ctxAB and tcpA. Mean grade was determined at each locus for each cluster and plotted as a heat map (pheatmap, R package pheatmap v1.0.12).
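The grade and the enrichment test can be sketched as follows. This is an illustrative Python sketch, not the Geneious or R code used in the study. It assumes that all three grade components are already expressed on a 0-100 scale (the text does not state how the e-value is scaled) and that the Fisher's exact test is one-sided; both are assumptions, and the function names are hypothetical.

```python
from math import comb

# Weights as stated in the text: coverage 0.50, e-value 0.25, identity 0.25.
WEIGHTS = {"coverage": 0.50, "evalue": 0.25, "identity": 0.25}


def grade(coverage_pct, evalue_score_pct, identity_pct):
    """Weighted grade in percent; every input is assumed to lie in 0-100."""
    return (WEIGHTS["coverage"] * coverage_pct
            + WEIGHTS["evalue"] * evalue_score_pct
            + WEIGHTS["identity"] * identity_pct)


def is_present(locus_grade, cutoff=99.0):
    """Apply the 99% grade cutoff used for grouping strains."""
    return locus_grade >= cutoff


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, e.g. with
    a = AAA strains carrying tseH, b = AAA strains lacking tseH,
    c = other strains carrying tseH, d = other strains lacking tseH.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(total - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(total, col1)
```

A locus with full coverage, a maximal e-value score and 96% identity would receive a grade of 99 under these assumptions, which sits exactly at the cutoff.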

Phylogenetic analysis and tree building

Genomic FASTA files for tree building were obtained from the PATRIC database44 or NCBI and annotated using Prokka (v1.12)72. A core genome was extracted from Prokka-output GFF3 files and aligned using Roary (v3.11.2)73. The core genome alignment was reduced to loci harbouring polymorphisms using SNP-sites (v2.4.1)74. The phylogenetic tree was built using the RAxML (v 7.0.4) GTR Gamma Maximum Likelihood model. Statistical branch support was obtained from 100 bootstrap repeats. Phylogenetic trees were visualised from RAxML-generated newick files using TreeGraph 2 (v2.15.0-887 beta)75. Branches with bootstrap support values <70 were collapsed. Presence of TCP and CTX was determined by MegaBlast for tcpA (VC0828) and ctxAB (VC1456-VC1457). O1 antigen status was determined from the literature. Presence of tseH was determined as described above.

Functional prediction of phage genes in Aux3E modules

Aux3E genomic regions were extracted from gcvT to thrS and regions were re-annotated by Prokka (v1.12)72. All annotated genes (from both original Genbank files and re-annotated files) were extracted and translated. Amino acid sequences for all extracted annotations were submitted to NCBI Conserved Domain Search (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) to identify putative functional domain hits. Further functional prediction of select genes was performed by submission to HHpred51 (MPI Bioinformatics Toolkit, https://toolkit.tuebingen.mpg.de/tools/hhpred).

Excision/circularisation PCR and quantitative PCR

Bacterial strains analysed by excision/circularisation PCR or qPCR were grown overnight as described above. Approximately equivalent growth for all analysed strains was verified (Supplementary Fig. 6c). For Fig. 4b, e, f, g overnight cultures were subcultured 1:50 in 5 mL of fresh LB and grown for 4 h. For Fig. 4g, 0.1% arabinose or 0.1% dextrose was added at the 1-h time point. Cultures were normalised to OD600 and pelleted (4300 × g, 10 min) and resuspended at 10X concentration in nuclease free H2O. Cell suspensions were boiled for 5 min to release nucleic acids. PCR was performed on 3 μL of each lysate with the indicated primers. In all, 3% DMSO was added for reactions using P2.2/P3.2 due to lower primer efficiency. For excision/circularisation qPCR, overnight cultures were subcultured 1:50 in 5 mL of fresh LB and grown for 6 h. In all, 1 mL of culture was collected (at 1, 2, 3, 4 and 6 h) and pelleted (14,000 rpm, 2 min). DNA was extracted by phenol/chloroform extraction. All DNA samples were normalised to 20 ng μL−1 and 50 ng μL−1. qPCR was performed on 250 ng (Fig. 4c) or 100 ng (Fig. 4d) of each sample in a 20-μL reaction volume with Bio-Rad SYBR Green Master Mix according to the product manual. Primers targeted repeat 1, attB_Aux3, and ompW or attP_Aux3 and ompW. Data was collected in Bio-Rad CFX Manager 3.1. All targets were measured by absolute quantification against the following standard curves: A1552 ΔAux3 genomic DNA (Aux3-naïve) for repeat 1, attB_Aux3, and ompW and pUC19-attP plasmid DNA for attP_Aux3. Repeat 1, attB_Aux3, and attP_Aux3 signal was normalised to ompW to control for variability in input DNA. Averages of at least three independent experiments (±standard deviation) are provided.
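The absolute quantification and ompW normalisation can be sketched as below. This is a minimal illustration under stated assumptions, not the CFX Manager analysis: it assumes a log-linear standard curve (Cq against log10 of template quantity) fitted by ordinary least squares, and the function names are hypothetical.

```python
def fit_standard_curve(log10_quantities, cq_values):
    """Least-squares fit of Cq = slope * log10(quantity) + intercept."""
    n = len(log10_quantities)
    mean_x = sum(log10_quantities) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_quantities)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_quantities, cq_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_cq(cq, slope, intercept):
    """Invert the standard curve to estimate template quantity from a Cq."""
    return 10 ** ((cq - intercept) / slope)


def normalised_incidence(target_cq, target_curve, ompw_cq, ompw_curve):
    """Target quantity divided by ompW quantity from the same sample.

    A value near 1 means the target is present in roughly every genome;
    a value near 0.02 corresponds to roughly 1 in 50 genomes.
    """
    target = quantity_from_cq(target_cq, *target_curve)
    reference = quantity_from_cq(ompw_cq, *ompw_curve)
    return target / reference
```

Each target would use the curve built from its own standard (Aux3-naïve genomic DNA for repeat 1, attB_Aux3 and ompW; pUC19-attP plasmid DNA for attP_Aux3), and the replicate values would then be averaged.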

Aux3 excision tracking by colony-forming unit counts

Strains with an Aux3 internal kanamycin resistance cassette were struck out for isolated colonies on LB agar plates with the addition of rifampicin and kanamycin (A1552) or streptomycin and kanamycin (AM-19226). Three individual clones were selected for each tested strain. Clones were inoculated in 5 mL of LB media with rifampicin (A1552) or streptomycin (AM-19226) and grown shaking at 37 °C. At 24 h each culture was subcultured at a ratio of 1:100 into 5 mL of fresh LB with the above indicated antibiotics. Remaining culture was serially diluted at a 1:10 ratio to 10−7. Dilution series were spotted (5 μL) on LB agar plates both with (Aux3-maintained colony-forming unit (CFU)) and without (total CFU) the addition of kanamycin. This process was repeated at the 48-h time point. Colonies were counted from the highest countable dilution spot to determine viable CFU.
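The conversion from spot counts to CFU can be sketched as follows. This is a minimal sketch assuming 10-fold serial dilutions and 5 μL spots as described above; the function names and example counts are hypothetical.

```python
def cfu_per_ml(colony_count, dilution_exponent, spot_volume_ul=5.0):
    """CFU per mL of the undiluted culture.

    dilution_exponent is n for the 10^-n dilution that was counted;
    spot_volume_ul is the volume spotted, in microlitres.
    """
    dilution_factor = 10 ** dilution_exponent
    return colony_count * dilution_factor * 1000.0 / spot_volume_ul


def fraction_maintained(kan_cfu_per_ml, total_cfu_per_ml):
    """Fraction of viable cells still carrying the kanamycin-marked Aux3."""
    if total_cfu_per_ml <= 0:
        raise ValueError("total CFU must be positive")
    return kan_cfu_per_ml / total_cfu_per_ml


# Hypothetical counts: 12 colonies at the 10^-5 dilution on both plate types.
total = cfu_per_ml(12, 5)
maintained = cfu_per_ml(12, 5)
ratio = fraction_maintained(maintained, total)
```

A ratio that stays close to 1 across time points would indicate that cells lacking the module do not accumulate in the population.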

Aux3 module transfer experiments

Overnight cultures of recipient strains (A1552 ΔAux3 with variable mTn7 constructs) and donor strains (S17 λpir with variable pKNOCK vectors) were pelleted (4300 × g, 10 min) and resuspended at 10X concentration in LB media. In total, 10 μL of each concentrated cell suspension was resuspended in 1 mL LB (1:100), from which serial dilutions were prepared and plated as spots on LB agar to determine input CFU. Donor and recipient strains were mixed in all combinations at a ratio of 10:1 donor to recipient. Mixtures were plated as 25 μL spots on Durapore 0.22 μm PVDF filters (Millipore Sigma) on pre-dried, pre-warmed LB agar plates with either arabinose or dextrose. Spots were dried and incubated at 37 °C for 24 h. After 24-h incubation, filters were collected and resuspended by vortexing in 1 mL LB media. Serial dilutions were prepared from each suspension, and dilution spots were plated on LB agar with kanamycin (donors), rifampicin + gentamycin (recipients), or rifampicin + gentamycin + kanamycin (cointegrates) to determine CFU for each subset of cells. Colonies were counted from the lowest countable dilution. Cointegrate formation frequency was determined by dividing cointegrate CFU/mL by total recipient CFU/mL. Averages of at least three independent experiments (±standard deviation) are provided (Supplementary Table 5).
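The frequency calculation and replicate summary can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not state which form was used.

```python
from statistics import mean, stdev


def cointegrate_frequency(cointegrate_cfu_per_ml, recipient_cfu_per_ml):
    """Cointegrate CFU/mL divided by total recipient CFU/mL."""
    if recipient_cfu_per_ml <= 0:
        raise ValueError("recipient CFU/mL must be positive")
    return cointegrate_cfu_per_ml / recipient_cfu_per_ml


def summarise_replicates(replicates):
    """Mean and sample SD of the frequency across independent experiments.

    replicates is a list of (cointegrate CFU/mL, recipient CFU/mL) pairs,
    one pair per experiment; at least two pairs are needed for the SD.
    """
    frequencies = [cointegrate_frequency(c, r) for c, r in replicates]
    return mean(frequencies), stdev(frequencies)
```

Because the frequency is a ratio of two counts from the same filter, differences in recovery between filters affect numerator and denominator alike.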

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.