Background

Anopheles gambiae sensu stricto is the major sub-Saharan vector for the human malaria parasite Plasmodium falciparum and the nominotypical member of a set of morphologically indistinguishable species that comprise the Anopheles gambiae complex [1]. The two molecular forms of An. gambiae s.s. (M and S), along with Anopheles arabiensis, constitute the major malaria vectors within this species complex. Despite their close evolutionary relationship, other members of the complex display either little (Anopheles merus, Anopheles melas and Anopheles bwambae) or no (Anopheles quadriannulatus A and Anopheles quadriannulatus B) vectorial capacity for human malaria [2].

Interestingly, An. quadriannulatus, the sole non-vector member of this species complex, is nevertheless competent for P. falciparum infection [3, 4], and molecular evidence suggests that the karyotype of this species derived directly from that of the main vector An. gambiae s.s. [62] that house three Orco-expressing ORNs [13, 16]. Therefore, while An. gambiae antennae might possess a very slight advantage in OR-mediated odor sensitivity, our transcriptional data largely agree with the comparative morphological study, implying that both species share equivalent olfactory capabilities [62].

Similarly, in both species half of the total tuning OR transcript abundance in the antenna was accounted for by a small, largely identical subset of ORs: 7 ORs in An. gambiae and 8 ORs in An. quadriannulatus. Within this top 50%, 5 ORs were shared between species (Ors 11, 15, 24, 68 and 75), and these had an average dN/dS below that of the OR class as a whole. Therefore, most of the predominant antennal ORs shared between the species were also more conserved at the sequence level.
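To make the “top 50%” subset concrete, the sketch below ranks tuning ORs by RPKM, accumulates abundance until half of the total is reached, and intersects the resulting subsets of the two species. It is a minimal Python illustration, not the pipeline used in this study. It assumes a plain dictionary mapping OR names to RPKM, and the OR names and values in the usage lines are hypothetical toy numbers.

```python
from typing import Dict, List, Set


def top_half_subset(rpkm: Dict[str, float]) -> List[str]:
    """Return the smallest set of ORs, ranked by RPKM, whose summed RPKM
    reaches at least half of the total tuning-OR RPKM."""
    total = sum(rpkm.values())
    # Highest-abundance ORs first (sorted() is stable, so ties keep input order)
    ranked = sorted(rpkm, key=rpkm.get, reverse=True)
    subset: List[str] = []
    running = 0.0
    for name in ranked:
        subset.append(name)
        running += rpkm[name]
        if running >= total / 2:
            break
    return subset


def shared_top_half(rpkm_a: Dict[str, float], rpkm_b: Dict[str, float]) -> Set[str]:
    """ORs that fall in the top-50% subset of both species."""
    return set(top_half_subset(rpkm_a)) & set(top_half_subset(rpkm_b))


if __name__ == "__main__":
    # Hypothetical toy RPKM values, for illustration only
    ag = {"OrA": 50.0, "OrB": 30.0, "OrC": 10.0, "OrD": 10.0}
    aq = {"OrA": 40.0, "OrB": 15.0, "OrC": 30.0, "OrD": 15.0}
    print(top_half_subset(ag))       # ['OrA']
    print(shared_top_half(ag, aq))   # {'OrA'}
```

The greedy ranking yields the smallest subset that reaches half of the total, which matches the “top 50%” reading used above.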

Beyond these similarities, the composition of the remainder of the tuning OR pool appeared to vary substantially between the two species (Figure 3). In total, 49 of 58 (84%) tuning ORs showed significant differences in transcript abundance, and 16 of these were enriched more than 2-fold in one of the species.

In An. gambiae antennae, the most noticeable overall trend in differential OR abundance was the degree to which select ORs were enriched relative to An. quadriannulatus (Figure 4). Although no ORs appeared to be expressed specifically in the An. gambiae antenna, 29 tuning ORs showed significant enrichment in An. gambiae, with ORs 36, 60, 69, and 75 each showing as much as 4–6-fold enrichment (Figure 3). Overall, these An. gambiae-enriched ORs were 6-fold more abundant than the combined pool of depleted ORs. This stands in marked contrast to the balanced distribution of ORs in An. quadriannulatus, where enriched and depleted ORs showed similar expression levels in terms of total RPKM (Figure 4). Taken together, OR-mediated odor coding in the An. gambiae antenna appears to rely on an overrepresented subset (Fisher’s Exact test, p = 2.2 × 10^-16) of ORs whose orthologs are also present in An. quadriannulatus. This sizeable skew in the OR distribution implies that the An. gambiae antenna predominantly expresses only a subset of the ORs expressed in the An. quadriannulatus antenna.

Figure 4

Distribution of differentially abundant antennal Ors and their relative abundance levels in An. gambiae and An. quadriannulatus. Individual tuning Or orthologs are represented by bubbles whose areas are scaled to their respective abundance (RPKM) in either An. gambiae (red) or An. quadriannulatus (blue). Or orthologs are arranged horizontally according to their enrichment (GFOLD value) in either An. gambiae (left) or An. quadriannulatus (right). Total RPKMs for each quadrant are indicated in the center. The asterisk denotes the larger-than-expected proportion of Or abundance in An. gambiae attributable to Ors that are also enriched in An. gambiae (Fisher’s Exact Test, p = 2.2 × 10^-16).
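The quadrant totals in Figure 4 can be reproduced by summing each species' RPKM over ORs grouped by the sign of their GFOLD value. The same sums give the roughly 6-fold excess of An. gambiae-enriched over depleted OR abundance. The minimal sketch below assumes that a positive GFOLD means higher abundance in An. gambiae. That sign convention is hypothetical and depends on how the comparison is set up. The values in the test are invented toy numbers.

```python
from typing import Dict


def quadrant_totals(
    gfold: Dict[str, float],
    rpkm_ag: Dict[str, float],
    rpkm_aq: Dict[str, float],
) -> Dict[str, Dict[str, float]]:
    """Sum each species' RPKM over ORs enriched in An. gambiae (GFOLD > 0)
    and over ORs enriched in An. quadriannulatus (GFOLD < 0).

    Assumed (hypothetical) sign convention: positive GFOLD = higher in
    An. gambiae. ORs with GFOLD == 0 (no significant difference) are skipped.
    """
    totals = {
        "ag_enriched": {"ag": 0.0, "aq": 0.0},
        "aq_enriched": {"ag": 0.0, "aq": 0.0},
    }
    for or_name, g in gfold.items():
        if g == 0:
            continue
        group = "ag_enriched" if g > 0 else "aq_enriched"
        totals[group]["ag"] += rpkm_ag[or_name]
        totals[group]["aq"] += rpkm_aq[or_name]
    return totals


def enriched_to_depleted_ratio(totals: Dict[str, Dict[str, float]], species: str) -> float:
    """RPKM in ORs enriched in `species` divided by RPKM in ORs depleted in it."""
    enriched = "ag_enriched" if species == "ag" else "aq_enriched"
    depleted = "aq_enriched" if species == "ag" else "ag_enriched"
    return totals[enriched][species] / totals[depleted][species]
```

A ratio well above 1 in An. gambiae, alongside a ratio near 1 in An. quadriannulatus, is the kind of skew described in the text.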

When differential levels of OR transcripts were viewed in the context of molecular divergence (Figure 5), there was no significant correlation between transcript enrichment and dN/dS ratio. However, ORs with higher evolutionary rates were also more variable in transcript enrichment and tended to display higher enrichment levels. When ORs were grouped into quartiles by dN/dS ratio, the upper three quartiles (dN/dS ≥ 0.1) showed significantly higher median and variance of transcript enrichment than the first quartile, whether considered individually or collectively (see Additional file 7: Table S4). Interestingly, the opposite trend was observed across the antennal transcriptome as a whole, where genes in the first quartile (with lower dN/dS ratios) displayed greater magnitude and variability of transcript enrichment (see Additional file 7: Table S4). In addition, ORs with dN/dS ratios above the transcriptome median (0.0611) comprised the majority of detectable ORs. These ORs showed significantly higher levels of enrichment than transcriptome background genes in the upper half of the dN/dS distribution (Wilcoxon rank sum test, p = 0.02792). This contrast again highlights that ORs are evolving rapidly at both the sequence and the expression levels.
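The quartile comparison works as follows: genes are ranked by dN/dS and split into four groups of near-equal size, and the median and variance of absolute GFOLD are computed for each group. This is an illustrative Python sketch. The rank-based split, rather than a particular quantile-interpolation rule, is an assumption, as are the toy values in the test. The significance tests reported in Additional file 7: Table S4 are not reproduced here.

```python
import statistics
from typing import Dict, List, Tuple


def quartile_enrichment_stats(
    dnds: Dict[str, float], abs_gfold: Dict[str, float]
) -> List[Tuple[float, float]]:
    """Split genes into four rank-based dN/dS quartiles (lowest first) and
    return (median, sample variance) of |GFOLD| for each quartile."""
    genes = sorted(dnds, key=dnds.get)  # ascending dN/dS
    n = len(genes)
    if n < 8:
        raise ValueError("need at least two genes per quartile")
    result = []
    for q in range(4):
        members = genes[q * n // 4:(q + 1) * n // 4]
        values = [abs_gfold[g] for g in members]
        result.append((statistics.median(values), statistics.variance(values)))
    return result
```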

Figure 5

Differential expression of antennal ORs plotted against dN/dS. The x-axis represents the absolute GFOLD score (reliable log2 fold difference in transcript abundance) for Ors enriched in either An. gambiae (left half) or An. quadriannulatus (right half). Ors with no significant difference in transcript abundance are plotted at zero. The y-axis is the interspecific dN/dS for each Or. Ors are color coded as follows: grey, conserved in sequence and in transcription; blue, conserved in sequence but diverged in transcription; yellow, diverged in sequence but conserved in transcription; green, diverged in both sequence and transcription. The horizontal dashed line marks the threshold for the top 10% of transcriptome-wide dN/dS values, and the vertical dashed line marks the threshold for the top 10% of transcriptome-wide absolute fold change.

Overall, 11 ORs fell within the top 10% of the transcriptome profile in evolutionary rate, and 9 fell within the top 10% in absolute transcript enrichment (Figure 5). Four of these ORs showed both high sequence divergence and large abundance differences, while the remaining genes differed in either sequence or abundance alone. This pattern suggests that sequence divergence and differential abundance represent two non-mutually exclusive mechanisms for the evolution of ORs, and perhaps of other chemosensory genes. ORs with exceptionally high levels of sequence divergence and/or transcript enrichment are likely to play important roles in chemosensory-mediated behavioral differences between An. gambiae and An. quadriannulatus. Some of the more conserved ORs may also be of interest. For instance, Or35 is the most conserved tuning OR, yet its absolute fold change ranked within the top 20% of the antennal transcriptome profile.
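The four color classes in Figure 5 follow from two thresholds: the top-10% cut points of transcriptome-wide dN/dS and of absolute fold change. A minimal sketch of that classification is below. It assumes the cut point is computed with Python's `statistics.quantiles` (default “exclusive” method) and that “diverged” means strictly above the cut. Both are illustrative choices, not the study's documented procedure.

```python
import statistics
from typing import Sequence


def top_decile_cut(values: Sequence[float]) -> float:
    """90th-percentile cut point, i.e. the threshold for the top 10% of values
    (statistics.quantiles with its default 'exclusive' method)."""
    return statistics.quantiles(values, n=10)[-1]


def classify_or(dnds: float, abs_fc: float, dnds_cut: float, fc_cut: float) -> str:
    """Assign the Figure 5 color class; 'diverged' means above the top-10% cut."""
    seq_diverged = dnds > dnds_cut
    expr_diverged = abs_fc > fc_cut
    if seq_diverged and expr_diverged:
        return "green"   # diverged in sequence and in transcription
    if seq_diverged:
        return "yellow"  # diverged in sequence, conserved in transcription
    if expr_diverged:
        return "blue"    # conserved in sequence, diverged in transcription
    return "grey"        # conserved in both
```

For example, `classify_or(0.5, 1.0, dnds_cut=0.3, fc_cut=2.0)` returns `"yellow"`: an OR that is diverged in sequence but not in transcription.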

Differential receptivity analysis

We have previously integrated OR functional data with RNAseq data to model the receptivity profile of the An. gambiae antenna following a bloodmeal [28]. This analytical approach synthesized the effects of many small changes in the expression of individual tuning ORs in order to treat the antenna as a single chemosensory unit. Here we applied the same methodology, mapping An. gambiae odorant receptivity onto the An. quadriannulatus OR transcriptome profile, to model potential odor-coding differences between the two species. This approach assumes general functional conservation among interspecific OR orthologs. We consider this a reasonable assumption because the non-conservative substitutions observed among the ORs occur in the transmembrane and intracellular loop regions. These substitutions are therefore most likely to affect the channel properties of the Orco-OR complex rather than OR-ligand interactions [59].

This analysis showed that the two species share a similar level of receptivity toward three floral compounds (fenchone, isobutyl acetate and methyl benzoate). However, relative receptivity to many of the other odorants tested appears to be generally reduced in the An. gambiae antenna. An. quadriannulatus appeared more receptive to a wide range of chemical classes, including most aromatic compounds and many alcohols (Figure 6; Additional file 8: Table S5). Many of these compounds are plant associated, but some are also components of human skin [36, 38, 63–65]. Among the compounds to which An. quadriannulatus appears more receptive, the two indolic compounds are known to be important to the chemical ecology of many mosquito species [36, 37, 64, 66–68]. Both indole and 3-methylindole have been characterized as human-associated compounds [36, 64, 69], but they are also associated with other natural sources, including decaying organic material and animal excreta [66]. Accordingly, we cannot discount the possibility that the same odorant elicits different perceptions depending on ecological context. Nevertheless, the presence of these compounds, along with several other human-associated odorants, is also consistent with human host seeking, since An. quadriannulatus displays limited anthropophagic behavior as well [70].

Figure 6

Differences in OR-mediated odorant receptivity between An. gambiae and An. quadriannulatus antennae. The vertical axis represents computed interspecific differences in antennal receptivity to a panel of odors. Results are sorted left to right by each odor’s relative receptivity enhancement in either An. gambiae (positive values) or An. quadriannulatus (negative values). The grey region around zero denotes an absolute change in relative receptivity of 10% or less. Chemical names are color coded by chemical class, and asterisks denote chemical classes whose receptivity is disproportionately represented in one species (Fisher’s Exact Test, p<0.05). Red points denote odors that have been detected in human skin emanations.

In contrast, the OR-mediated olfactory specialization of the An. gambiae antenna appears biased (Fisher’s Exact test, p=0.06) toward odors previously associated with human skin emanations, including a majority of the esters assayed (Fisher’s Exact test, p=0.04). Furthermore, if we consider only compounds whose relative receptivity changed by more than 10% in either species, excluding those that show only minimal changes, the apparent enhanced receptivity of An. gambiae to human-associated odor chemicals becomes more significant (Fisher’s Exact test, p=0.02). Moreover, some human-associated odors show a greater receptivity enhancement in An. gambiae than any odor does in An. quadriannulatus (Figure 6). This notable trend agrees with both the molecular and the transcriptional analyses above. Together, these results further suggest that, at the molecular level, the OR-mediated sensitivity of the An. gambiae antenna is more focused and specialized than that of An. quadriannulatus.

Conclusions

In this study we examined the RNA composition of the peripheral chemosensory tissues of An. gambiae s.s. and An. quadriannulatus, two closely related members of the An. gambiae species complex. Because these two species differ phenotypically in their host-seeking preferences, we looked specifically at differences within the chemosensory gene classes at both the molecular and the transcriptional levels. Overall, while the chemosensory gene repertoire was highly conserved, each of the chemosensory gene families evolved more rapidly than the genomic background. In particular, we identified considerable levels of radical amino acid changes between orthologous OR genes that may result in functional differences. To our knowledge, this is the first comparative study of the chemosensory gene repertoire between sibling species that diverged only several thousand years ago. Comparisons of more distantly related species often reveal dramatic copy-number changes. In contrast, these results suggest that functional divergence between orthologous chemosensory genes may be key in driving behavioral differences in the immediate aftermath of speciation events.

A careful analysis of their antennal transcriptome profiles also revealed both overall conservation of some critical chemosensory transcripts (e.g. Orco) and large abundance differences among some individual gene family members. The observed similarities agree with prior morphological studies reporting that the antennae of both species share similar overall sensilla densities [62]. Although no ORs appeared to be exclusively expressed within the An. gambiae antenna, the divergence in the overall transcriptional profile of the ORs was considerable. The ORs whose transcripts make up the preponderance of OR transcripts in the An. gambiae antenna are also greatly enriched relative to An. quadriannulatus. This indicates that, in terms of OR composition, the An. gambiae antenna is most likely a specialization of the An. quadriannulatus antenna.

When these interspecific abundance differences in the OR gene family were integrated in silico with AgOr functional data, the resulting antennal “receptivities” again indicated that the human-biased odor receptivity of An. gambiae is most likely a refinement of that of An. quadriannulatus. Moreover, this biased receptivity of the An. gambiae antenna toward human-derived odors may be further augmented by the functional differences between orthologous ORs suggested by our sequence analyses. Future functional tests of AqOr-odor tuning will further improve our understanding in this regard.

Taken together, and given the central role that ORs play in defining host specificity, the anthropophagy of An. gambiae most likely did not arise from the evolution of any single OR dedicated to human host seeking. Instead, we posit that the receptivity bias of the An. gambiae antenna toward human host odors is likely the result of two cumulative effects: functional divergence, and changes in the abundance and distribution of common ORs already present within the An. gambiae species complex.

Methods

Gene annotation

The genome assemblies of An. gambiae (version AgamP3) and An. quadriannulatus (version 1) were downloaded from the VectorBase (http://www.vectorbase.org) and Broad Institute (olive.broadinstitute.org) websites, respectively. Chemosensory genes were annotated following a previously described protocol [45]. In brief, previously reported chemosensory genes from An. gambiae, Aedes aegypti, Culex quinquefasciatus, and D. melanogaster were used as queries in TBLASTN [71] searches against the two anopheline genomes. Putative chemosensory gene coding loci were identified after filtering out low-scoring BLAST hits. For each locus, the query sequence that yielded the highest bit score was selected as the reference for homology-based gene prediction using GeneWise (version 2.2.0; [72]). All gene models were manually inspected and modified where needed. All genomic data are available through VectorBase, and the annotated chemoreceptor sequences are listed in supplementary Table S1.
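The locus-level step of the annotation protocol filters weak TBLASTN hits and then picks the highest-scoring query as the GeneWise reference. It can be sketched as below. This is a hypothetical illustration: it assumes hits have already been grouped into loci. The bit-score cutoff `min_bits` is a placeholder because the text does not state the value used, and the locus and query identifiers in the test are invented.

```python
from typing import Dict, Iterable, Tuple

Hit = Tuple[str, str, float]  # (locus_id, query_id, bit_score)


def best_query_per_locus(hits: Iterable[Hit], min_bits: float) -> Dict[str, str]:
    """Drop hits below `min_bits`, then keep the highest-scoring query per locus.

    `min_bits` is a hypothetical cutoff; the original threshold is not given.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for locus, query, bits in hits:
        if bits < min_bits:
            continue  # low-scoring hit filtered out
        if locus not in best or bits > best[locus][1]:
            best[locus] = (query, bits)
    return {locus: query for locus, (query, _) in best.items()}
```

Loci whose hits all fall below the cutoff drop out entirely, mirroring the filtering of low-scoring hits described above.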

Phylogenetic analysis

For each of the OR/GR/IR/OBP families, protein sequences of genes in the two mosquitoes were aligned using MAFFT (version 7.037b; [73]). The multiple sequence alignments were manually curated, and poorly aligned regions were removed using trimAl (version 1.4; [74]) with the “automated1” option. Maximum-likelihood trees were constructed using RAxML (version 7.4.7; [75]), and the reliability of the tree topology was evaluated with 100 bootstrap replicates. The resulting gene trees were reconciled with the species phylogeny to estimate ancestral gene copy numbers and gene gain and loss events. An orthologous group was defined as a highly supported clade (bootstrap support greater than 90%) representing a single gene in the common ancestor of An. gambiae and An. quadriannulatus.

Analysis of sequence divergence

For each orthologous pair of chemosensory genes in An. gambiae and An. quadriannulatus, protein sequences were aligned using MAFFT, and the corresponding nucleotide alignment was generated using a custom script (available upon request). The rate of amino acid substitution and the dN/dS ratio were calculated using PROTDIST (from the PHYLIP package version 3.69) and CodeML (from the PAML package version 4.7; [76]), respectively. The dR/dC ratio was calculated using the Zhang method [77], with radical and conservative amino acid changes defined by the Dayhoff classes (“AGPST”, “DENQ”, “HKR”, “ILMV”, “FWY”, and “C”). The topologies of Or proteins were predicted using TOPCONS [78], and the number of radical and conservative amino acid changes in transmembrane domain regions was counted accordingly.
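The Dayhoff classes above determine whether an amino acid difference is radical (between classes) or conservative (within a class). The sketch below counts both kinds of difference between two aligned protein sequences. The count can optionally be restricted to intervals such as TOPCONS-predicted transmembrane segments. Note that it only tallies observed differences. Zhang's dR/dC is a rate estimate computed from codon alignments and normalized by the numbers of potential radical and conservative sites, and this illustration does not attempt that. The interval convention (0-based, half-open alignment coordinates) is an assumption.

```python
from typing import Iterable, Optional, Tuple

# Dayhoff classes as listed in the text
DAYHOFF_CLASSES = ("AGPST", "DENQ", "HKR", "ILMV", "FWY", "C")
CLASS_OF = {aa: i for i, group in enumerate(DAYHOFF_CLASSES) for aa in group}


def count_radical_conservative(
    seq1: str,
    seq2: str,
    regions: Optional[Iterable[Tuple[int, int]]] = None,
) -> Tuple[int, int]:
    """Count (radical, conservative) differences between two aligned proteins.

    regions: optional 0-based, half-open (start, end) alignment intervals,
    e.g. predicted transmembrane segments; None means the whole alignment.
    Gaps and residues outside the Dayhoff classes are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    if regions is None:
        positions = range(len(seq1))
    else:
        positions = (i for start, end in regions for i in range(start, end))
    radical = conservative = 0
    for i in positions:
        a, b = seq1[i].upper(), seq2[i].upper()
        if a == b or a not in CLASS_OF or b not in CLASS_OF:
            continue  # identical residue, gap, or ambiguous residue
        if CLASS_OF[a] == CLASS_OF[b]:
            conservative += 1
        else:
            radical += 1
    return radical, conservative
```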

To identify additional orthologous gene pairs between the two mosquitoes, a de novo transcriptome assembly of An. quadriannulatus was generated and likely coding regions were extracted, both using Trinity (version 2012-10-05; [79]). Orthologous groups were then constructed from annotated genes in An. gambiae (version AgamP3.7) and likely coding sequences in An. quadriannulatus using orthoMCL (version 2.0.5; [80]). Protein divergence, dN/dS ratio, and dR/dC ratio were calculated for each 1-to-1 orthologous pair in the same way as for the chemosensory gene pairs.
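Selecting the 1-to-1 orthologous pairs from the orthoMCL groups can be sketched as a simple filter over group lines. This assumes the common orthoMCL output layout, with one group per line written as `groupID: taxon|geneID taxon|geneID ...`. The taxon labels and gene identifiers in the test are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def one_to_one_pairs(
    group_lines: Iterable[str], taxon_a: str, taxon_b: str
) -> List[Tuple[str, str]]:
    """Extract 1-to-1 ortholog pairs from orthoMCL-style group lines.

    Assumed line format: "groupID: taxon|geneID taxon|geneID ...".
    A group is kept only if it contains exactly one gene from each taxon
    and no genes from any other taxon.
    """
    pairs = []
    for line in group_lines:
        line = line.strip()
        if not line:
            continue
        _, members = line.split(":", 1)
        by_taxon: Dict[str, List[str]] = {}
        for member in members.split():
            taxon, gene = member.split("|", 1)
            by_taxon.setdefault(taxon, []).append(gene)
        if (
            set(by_taxon) == {taxon_a, taxon_b}
            and len(by_taxon[taxon_a]) == 1
            and len(by_taxon[taxon_b]) == 1
        ):
            pairs.append((by_taxon[taxon_a][0], by_taxon[taxon_b][0]))
    return pairs
```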

Mosquitoes and mosquito rearing

An. gambiae sensu stricto (SUA 2La/2La, an M-form isolate originating from Suakoko, Liberia) and An. quadriannulatus (SKUQUA, an A form isolate originating from Skukuza, South Africa) were reared in the Vanderbilt Insectary Facility as described previously [21]. Adult mosquitoes were reared under 12:12 light–dark conditions and had constant access to 10% sucrose solution.

RNA isolation and RNA sequencing

Four- to six-day-old adult female mosquitoes of each species were collected in the middle of the light phase (~ZT6) for antennal resection. For each collection, antennae were hand-resected into TRIzol, and total RNA was isolated. mRNA isolation and cDNA library preparation were carried out using the Illumina mRNA sequencing kit (Illumina Inc.; San Diego, CA). Libraries were barcoded and sequenced in paired-end fashion (50PE for An. quadriannulatus, 100PE for An. gambiae) on an Illumina HiSeq2000. Approximately 30 million reads were generated for each sample. No biological replicates were performed because sample-to-sample variation in RNAseq results among anopheline antennae has been observed to be very low (Additional file 9: Figure S3).

Data processing and abundance profiling

Individual Illumina read files (FASTQ) were trimmed and filtered using Trimmomatic, a software package designed for trimming NGS reads. The paired-end Trimmomatic parameters were LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36. FastQC was used for data set quality checking.
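The Trimmomatic parameters listed above work as follows:

- LEADING:3 and TRAILING:3 remove bases with quality below 3 from the read ends.
- SLIDINGWINDOW:4:15 cuts the read once the average quality in a 4-base window falls below 15.
- MINLEN:36 discards reads shorter than 36 bases after trimming.

The sketch below re-expresses that logic on a single read's Phred scores. It is a simplified illustration, not Trimmomatic's implementation, whose handling of bases inside the failing window may differ. It also assumes Phred+33 quality encoding.

```python
from typing import List, Optional, Tuple


def phred33(quality_string: str) -> List[int]:
    """Decode a FASTQ quality string assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def trim_read(
    quals: List[int],
    leading: int = 3,
    trailing: int = 3,
    window: int = 4,
    window_min: float = 15.0,
    min_len: int = 36,
) -> Optional[Tuple[int, int]]:
    """Simplified LEADING/TRAILING/SLIDINGWINDOW/MINLEN trimming.

    Returns the half-open (start, end) interval of bases kept, or None if the
    read is dropped. Not Trimmomatic's code; edge handling may differ.
    """
    start, end = 0, len(quals)
    # LEADING: remove 5' bases while quality is below `leading`
    while start < end and quals[start] < leading:
        start += 1
    # TRAILING: remove 3' bases while quality is below `trailing`
    while end > start and quals[end - 1] < trailing:
        end -= 1
    # SLIDINGWINDOW: scan 5'->3' and cut at the first window whose mean < window_min
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) / window < window_min:
            end = i
            break
    # MINLEN: drop reads that are now too short
    if end - start < min_len:
        return None
    return start, end
```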

To better quantify transcript abundances in An. quadriannulatus, a modified version of the An. gambiae reference genome was prepared to eliminate potential bias caused by genomic sequence differences between the two species. The An. quadriannulatus reads were first mapped to the An. gambiae reference genome (version AgamP3) using Tophat2 (version 2.0.8), guided by the gene annotation (version AgamP3.7), with only one alignment reported for each mapped read. Fixed differences between the species were called and filtered using SAMtools (version 0.1.18) with a minimum read depth of 5 and a minimum variant quality score of 60. We then replaced the nucleotide in the An. gambiae reference genome at each site of fixed difference with that site’s most frequent alternative allele. This modified reference genome sequence was used for subsequent analyses of the An. quadriannulatus transcriptome. Finally, reads were aligned to the respective indexed genome using Tophat2 [81].
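The reference-modification step can be sketched as follows: each called fixed difference that passes the depth and quality filters replaces the An. gambiae base with the most frequent alternative allele. This minimal Python sketch makes several assumptions:

- Variants have already been parsed into dictionaries. This is a hypothetical structure, not SAMtools output as such.
- The filters are treated as minimums (depth ≥ 5, quality ≥ 60).
- Only single-nucleotide differences are handled.

```python
from typing import Dict, Iterable, Mapping


def patch_reference(
    reference: Mapping[str, str],
    variants: Iterable[dict],
    min_depth: int = 5,
    min_qual: float = 60.0,
) -> Dict[str, str]:
    """Return a copy of `reference` with fixed single-base differences applied.

    Each variant is a dict with keys: chrom, pos (1-based), ref, alt_counts
    (alternative allele -> supporting read count), qual, depth. Variants are
    assumed to already be fixed differences; only SNVs are handled.
    """
    seqs = {chrom: list(seq) for chrom, seq in reference.items()}
    for v in variants:
        if v["depth"] < min_depth or v["qual"] < min_qual:
            continue  # fails the depth or quality filter
        counts = v["alt_counts"]
        alt = max(counts, key=counts.get)  # most frequent alternative allele
        if len(v["ref"]) != 1 or len(alt) != 1:
            continue  # indels are outside the scope of this sketch
        idx = v["pos"] - 1  # convert 1-based position to 0-based index
        if seqs[v["chrom"]][idx].upper() != v["ref"].upper():
            raise ValueError(f"reference mismatch at {v['chrom']}:{v['pos']}")
        seqs[v["chrom"]][idx] = alt
    return {chrom: "".join(bases) for chrom, bases in seqs.items()}
```

Working on a copy keeps the original An. gambiae reference intact, so both genomes remain available for the species-specific alignments.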

Differential transcript abundance calculation

Statistical significance and fold change were determined by pairwise comparison of the Tophat2 alignments for the two species using GFOLD (version 1.0.9; [82]) configured for a 99 percent confidence interval. The result was a set of GFOLD values (GFOLD’s “reliable” log2 fold change) for each An. gambiae gene identifier (AGAP). Genes with non-zero GFOLD values were considered significantly differentially expressed.

Odorant receptivity changes

Relative differences in odorant receptivity between An. gambiae and An. quadriannulatus were calculated from physiological odorant-response data from the previously published functional deorphanization of An. gambiae odorant receptors [25, 26]. The SSR data were first filtered to remove any Ors or chemicals that failed to elicit an increase of at least 100 spikes/second over baseline in at least one assay. This threshold was chosen to retain only the more robustly responding receptors and ligands, in an attempt to mitigate any small potency differences that might exist between the species. Odor-induced decreases in spiking frequency were considered indeterminate and set to zero. The response of each AgOr (spikes/second increase) to each odorant was then weighted by the normalized abundance level (RPKM [83]) of that Or. The weighted responses (weighted spikes/second) were then summed for each odorant in each species, yielding an “antennal receptivity” for that species. Finally, the interspecific “receptivity change” of the antenna was calculated by dividing the “antennal receptivity” of An. gambiae by that of An. quadriannulatus.
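The receptivity calculation described above reduces to a filter, a weighting, a sum and a ratio. The sketch below expresses those steps in Python. It makes the following assumptions:

- Responses are stored as a mapping from (OR, odorant) to the spikes/second change over baseline.
- The 100 spikes/second criterion means “at least 100”.
- ORs without an RPKM value get zero weight.
- Odorants with zero An. quadriannulatus receptivity are skipped.

OR names, odorants and values in the test are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple

Responses = Dict[Tuple[str, str], float]  # (OR, odorant) -> spikes/s change


def filter_responses(responses: Responses, threshold: float = 100.0) -> Responses:
    """Keep ORs and odorants reaching `threshold` spikes/s in at least one assay,
    and set odor-induced decreases (negative values) to zero."""
    ors = {o for (o, _), r in responses.items() if r >= threshold}
    odorants = {c for (_, c), r in responses.items() if r >= threshold}
    return {
        (o, c): max(r, 0.0)
        for (o, c), r in responses.items()
        if o in ors and c in odorants
    }


def antennal_receptivity(filtered: Responses, rpkm: Dict[str, float]) -> Dict[str, float]:
    """Sum RPKM-weighted responses over ORs for each odorant."""
    totals: Dict[str, float] = defaultdict(float)
    for (or_name, odorant), spikes in filtered.items():
        totals[odorant] += spikes * rpkm.get(or_name, 0.0)  # missing OR -> weight 0
    return dict(totals)


def receptivity_change(
    filtered: Responses, rpkm_ag: Dict[str, float], rpkm_aq: Dict[str, float]
) -> Dict[str, float]:
    """Ratio of An. gambiae to An. quadriannulatus antennal receptivity."""
    ag = antennal_receptivity(filtered, rpkm_ag)
    aq = antennal_receptivity(filtered, rpkm_aq)
    # Odorants with zero receptivity in An. quadriannulatus are skipped
    return {c: ag[c] / aq[c] for c in ag if aq.get(c, 0.0) > 0}
```

With these conventions, a value above 1 indicates higher relative receptivity in An. gambiae, and a value below 1 indicates higher receptivity in An. quadriannulatus.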