Background

The cartilaginous fishes (Chondrichthyes) diverged from a common ancestor with other vertebrates around 450 million years ago (mya) and comprise Holocephali (chimaeras) and Elasmobranchii (sharks, skates, and rays), which likely split between 300 and 420 mya [1, 2]. They represent the most phylogenetically distant relatives of mammals to have an adaptive immune system based on somatically rearranging immunoglobulins (i.e. antibodies) and T cell receptors, as well as major histocompatibility complex molecules [3, 4]. Despite their key evolutionary position, the only high-quality genome assembly available for this group is that of the elephant shark (Callorhinchus milii), a chimaera [5]. This dataset has been used to infer the presence or absence of many genes in the cartilaginous fishes [5]. However, distinct scenarios of gene family evolution are likely to have played out within cartilaginous fish evolutionary history, most notably across the vast time separating chimaeras and elasmobranchs (e.g. [6]), which calls into question the use of a single species to infer the presence or absence of genes in an entire vertebrate class.

In this respect, an initial survey of the elephant shark genome suggested the immune gene repertoire of cartilaginous fishes was very different from that of bony jawed vertebrates, lacking many CD4+ T cell-associated genes present in mammals [5]. T cells expressing the CD4 co-receptor are vital for mounting an adaptive immune response [10].

For phylogenetic analyses involving putative cartilaginous fish IL-6Rα sequences, we included the closely related IL-11Rα and CNTFRα proteins [60], and employed a relaxed clock rooting approach [87]. The results firmly place the root between IL-6Rα and the other two proteins (RPP = 0.98), indicating that IL-11Rα and CNTFRα are more closely related to each other than to IL-6Rα (BPP = 1.00; UB = 99%) (Fig. 6a). Both Bayesian and maximum likelihood phylogenetic analyses strongly support direct orthology of cartilaginous fish IL-6Rα sequences to those in other jawed vertebrates (BPP = 0.99; UB = 99%), demonstrating that an IL-6Rα gene was present in the jawed vertebrate ancestor (Fig. 6a). Moreover, this approach also robustly supports the existence of cartilaginous fish orthologues of IL-11Rα (BPP = 1.00; UB = 92%) and CNTFRα (BPP = 1.00; UB = 100%) (Fig. 6a).

Fig. 6

Phylogenetic analysis of (a) the IL-6Rα family, and (b) the IL-2Rα/IL-15Rα family. Details as per Fig. 3.

IL-2Rα/IL-15Rα family

IL-2Rα forms part of the IL-2R heterotrimer, which is pivotal to maintenance and growth of the immunomodulatory Treg lineage [17], but is thought to be missing from cartilaginous fishes [5]. Dijkstra [18] suggested that IL-2Rα separated from IL-15Rα early in tetrapod evolution, and that IL-15Rα functionally accommodates the role(s) of IL-2Rα in teleost fishes. Our BLAST and HMMER searches identified putative orthologues of IL-15Rα and, as no appropriate outgroup is known, we performed relaxed clock rooted phylogenetic analyses of IL-2Rα and IL-15Rα. The result appears to verify the identity of cartilaginous fish IL-15Rα (BPP = 0.92; UB = 100%) (Fig. 6b). Interestingly, however, we found no evidence for IL-2Rα emerging from IL-15Rα; rather, it seems that they diverged from a common ancestor prior to the divergence of cartilaginous fishes and bony vertebrates (RPP ≥ 0.97) (Fig. 6b).

IL-23R and the class 1 group 2 cytokine receptor family

IL-23R is a cytokine receptor specific to TH17 cells [8, 10, 12, 15]. To verify the putative IL-23R identified by BLAST in cartilaginous fishes, and to better understand the evolution of cytokine receptors, we carried out a phylogenetic analysis of the class 1 group 2 cytokine receptor family [61]. This revealed that putative cartilaginous fish IL-23R falls sister to IL-23R of tetrapods (BPP = 1.00; UB = 99%; PPP = 1.00), indicating the presence of an IL-23R orthologue in cartilaginous fishes (Fig. 7). The analyses support inclusion of IL-23R within a subfamily that also contains IL-27Rα and IL-12Rβ2 (BPP = 1.00; UB = 100%; PPP = 1.00) (Fig. 7). IL-27Rα and IL-12Rβ2 are involved in TH1 cell differentiation and, given their relationships to bony vertebrate sequences, our data suggest that direct orthologues of these genes exist in cartilaginous fishes (BPP = 1.00; UB = 97%; PPP = 1.00, and BPP = 1.00; UB = 93%; PPP = 1.00, respectively) (Fig. 7).

Fig. 7

Phylogenetic analysis of class 1 group 2 cytokine receptors reveals an IL-23R orthologue in cartilaginous fishes. Details as per Fig. 3.

This analysis also provides insights into the evolution of the other included class 1 group 2 cytokine receptors. GP-130 (also known as IL-6Rβ or IL-6ST), which forms complexes with many IL-6 and IL-6R family members, plays a key role in both promoting and suppressing inflammation, and is essential for embryo survival in mammals [98]. Three copies of GP-130 exist in cartilaginous fishes [5], which evidently result from two lineage-specific duplications (BPP ≥ 0.97; UB ≥ 72%; PPP ≥ 0.9) (Fig. 7). A GCSFR clade, which also contains cartilaginous fish sequences (BPP = 1.00; UB = 98%; PPP = 1.00), falls sister to GP-130 (BPP = 1.00; UB = 93%; PPP = 1.00), and together these are sister to the IL-23R, IL-27Rα, and IL-12Rβ2 clade (BPP = 1.00; UB = 93%; PPP = 0.93) (Fig. 7). Outside this group, a cartilaginous fish sequence falls within an OSMR (multifunctional) and LIFR (tumour metastasis suppressor [99]) clade (BPP = 1.00; UB = 100%; PPP = 1.00) (Fig. 7). Finally, cartilaginous fishes possess a putative orthologue of leptin receptor (LEP-R), a hypothalamic appetite-controlling hormone receptor, as this sequence formed a clade with bony vertebrate LEP-R (BPP = 1.00; UB = 100%; PPP = 1.00), and relaxed clock rooting analysis best places the root between LEP-R and the other family members (RPP = 0.66) (Fig. 7).

ROR transcription factor family

Having identified orthologues of two cytokine receptors associated with the TH17 subset (IL-23R and IL-6R), we performed a variety of phylogenetic rooting analyses to look for evidence of the transcription factor ROR-γ, the master regulator of TH17 cells [100]. ROR-γ is a member of the larger ROR family and was reported missing in elephant shark [5]. We tested a relaxed clock rooting method (Fig. 8a) and two alternative outgroups, fruit fly HR3 (Fig. 8b) and the human RAR family (Fig. 8c), both closely related nuclear receptors [67]. These approaches did not provide congruent support for any root position, which may result from the major difference in evolutionary rate between ROR-γ and the other RORs (Fig. 8). However, our results are consistent with two new findings: (i) ROR-γ existed in the jawed vertebrate ancestor, though evidence for its presence in cartilaginous fishes depends on root placement in relaxed clock analyses (Fig. 8a), and (ii) a fourth member of the vertebrate ROR family, which falls sister to ROR-β (BPP = 0.99; UB ≥ 79%), exists but has possibly been lost in mammals and teleosts. We propose the name ROR-δ for this new family member (Fig. 8).

Fig. 8

Phylogenetic analyses of the vertebrate ROR family show that ROR-γ existed in the jawed vertebrate ancestor and reveal a new vertebrate ROR-β paralogue not found in mammals (which we name ROR-δ). Alternative rooting strategies, using (a) a relaxed clock model, (b) fruit fly HR3 as outgroup, or (c) human RARs as outgroup, show that the root of the ROR phylogeny cannot be confidently placed. Gene-level clades are collapsed in (b) and (c), but contain the same taxa as in (a). Other details as per Fig. 3.

FOXP transcription factor family

FOXP3 is the master regulator of Treg cell development and function in mammals [101]. A FOXP3 homologue was identified in elephant shark, but presumed non-functional by Venkatesh and colleagues [5] based on analysis of the DNA-binding domain. However, Dijkstra [18] has suggested that this inference may be premature. Our phylogenetic analyses suggest that cartilaginous fishes possess orthologues of all four mammalian FOXP family members (Fig. 9). As with the ROR family, the relationships between these genes are not easily resolved, as different root positions are favoured when the tree is rooted with either relaxed clocks or invertebrate FOXP sequences (Fig. 9). Another feature shared by the FOXP and ROR families is a striking increase in evolutionary rate in the family member involved in T cell biology (i.e. the immune-functioning RORC and FOXP3) compared to the other family members (Figs. 8 and 9). To explore the issue of FOXP3 functionality in cartilaginous fishes and the jawed vertebrate ancestor, we generated a multiple sequence alignment of cartilaginous fish FOXP3 DNA-binding domains against those of other jawed vertebrates. This revealed that the sites predicted by Venkatesh et al. [5] to lead to non-functionality in cartilaginous fishes are not noticeably more divergent from human than those of other non-mammals, and certainly no more so than expected in the context of species phylogeny and divergence times [1, 2].
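The comparison of focal DNA-binding sites described above can be illustrated with a minimal sketch. It assumes a pre-computed alignment in which all sequences have equal length. The sequences, species labels, and site indices below are invented for illustration and are not taken from our FOXP3 alignment.

```python
def differences_at_sites(reference, query, sites):
    """Count focal alignment columns where the query differs from the reference."""
    if len(reference) != len(query):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for i in sites if reference[i] != query[i])


# Hypothetical aligned fragments (NOT real FOXP3 sequences).
toy_alignment = {
    "human": "RPPFTYATLI",
    "hypothetical_nonmammal_a": "RPPFTYASLI",
    "hypothetical_nonmammal_b": "KPPFSYATLV",
    "hypothetical_shark": "RPAFTYASLI",
}

# Hypothetical 0-based columns standing in for the sites of interest.
focal_sites = [0, 2, 4, 7, 9]

reference = toy_alignment["human"]
counts = {
    name: differences_at_sites(reference, seq, focal_sites)
    for name, seq in toy_alignment.items()
    if name != "human"
}

for name, n_diff in counts.items():
    print(f"{name}: {n_diff} of {len(focal_sites)} focal sites differ from human")
```

In this toy case the shark sequence differs from human at two focal sites, which falls within the range shown by the other non-mammals (one and three). This is the kind of pattern that would argue against the shark sequence being atypical.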

Fig. 9

Phylogenetic analyses of the vertebrate FOXP family verify the existence of cartilaginous fish orthologues of FOXP1–4, but alternative rooting strategies, using (a) a relaxed clock model, or (b) invertebrate FOXP sequences as an outgroup, show that the root of the FOXP phylogeny cannot be confidently placed. Other details for (a) and (b) as per Figs. 3 and 6. (c) Alignment of the FOXP3 DNA-binding domain from phylogenetically representative vertebrates suggests that cartilaginous fish FOXP3 is not atypical.

Discussion

Transcriptomes, taxonomy, and gene discovery

As available genomic data remain relatively sparse for cartilaginous fishes, we generated a normalised multi-tissue transcriptome for the small-spotted catshark, with the goal of maximising representation of novel transcripts. We applied a variety of trimming approaches and evaluated the resulting assemblies using several statistical measures. While some of the assemblies contained excessive numbers of transcripts considering the number of genes typical of a vertebrate genome, we did not introduce filters by coverage, length, or contamination, thus retaining as many novel transcripts as possible. The results indicate that while statistical methods can be useful to determine the most contiguous assembly (e.g. highest N50) or the most complete assembly (e.g. fewest missing BUSCOs), choosing a single ‘best’ assembly may lead to loss of interesting data, such as novel sequences or full-length transcripts. The findings similarly highlighted the differential presence of transcripts of interest in our datasets compared to those of past transcriptome studies of the same species [6, 30], to the genome of the distantly related elephant shark [5], and to genomic data from other shark species. Our study thus supports the notion that using a single genome [102], transcriptome assembly [103], or species [104] is grossly insufficient to adequately assess gene presence or absence in a vertebrate class. Our results also suggest that paired-end data, or longer reads than those applied here, should be utilised where possible. Despite this limitation, the data generated in this study contain novel sequences for cartilaginous fishes, and other researchers should benefit from this resource.
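For readers less familiar with contiguity statistics, the N50 mentioned above is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The following is a minimal sketch of that standard definition. The transcript lengths are invented toy values, not those of our assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of transcript (or contig) lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that brings the running total to half the assembly.
        if running >= half_total:
            return length


# Two hypothetical assemblies with the same total length (12,000 bp).
toy_assembly_a = [5000, 3000, 2000, 1000, 500, 500]
toy_assembly_b = [4000, 4000, 2000, 1000, 500, 300, 200]

print("N50 of assembly A:", n50(toy_assembly_a))
print("N50 of assembly B:", n50(toy_assembly_b))
```

Assembly B has the higher N50 yet lacks the single longest sequence found in assembly A, which illustrates why ranking assemblies on one summary statistic can discard transcripts of interest.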

Adequate phylogenetic modelling of fast-evolving immune genes

A precarious balance must be maintained in immune gene evolution to uphold structural integrity and functionality, while avoiding pathogen subversion. As such, immune genes evolve rapidly, but with strong site-specific evolutionary pressures, both of which can contribute to accumulation of hidden substitutions (homoplasy) over time, which is known to cause phylogenetic errors. In line with this, standard phylogenetic models inadequately predicted the diversity of amino acid alphabets across sites in the immune gene datasets tested in this study. Such a failure to capture site-specific biochemical constraints indicates that a model has an impaired capacity to infer hidden substitutions in the data [44]. To the best of our knowledge this is the first report of the inadequacy of standard phylogenetic models for immune gene datasets, though this result is not surprising given the complex evolutionary pressures imposed on immune genes by the host-pathogen arms race. In stark contrast, and consistent with our hypothesis that site-heterogeneous models would better accommodate the rapid and complex evolutionary patterns of immune genes, CAT-based models adequately captured site-specific amino acid alphabet diversity for all tested datasets. These findings imply (based on [44]) that standard models will often fit poorly to immune gene datasets, and that CAT-based models should typically produce more accurate phylogenetic trees for immune genes in future studies.
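The logic of this adequacy test can be sketched as follows. The sketch assumes that the statistic is the mean number of distinct amino acids observed per alignment column, and that the observed value is compared with values computed from datasets simulated under the fitted model. The alignment and the simulated values below are invented for illustration. The function names are hypothetical and do not correspond to any particular software.

```python
from statistics import mean, stdev


def mean_site_diversity(alignment, ignored="-?X"):
    """Mean number of distinct amino acids per column, skipping all-gap columns."""
    if not alignment:
        raise ValueError("alignment is empty")
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must have equal length")
    per_site = []
    for column in zip(*alignment):
        residues = {aa for aa in column if aa not in ignored}
        if residues:
            per_site.append(len(residues))
    return mean(per_site)


def predictive_z_score(observed, simulated):
    """Standardised deviation of the observed statistic from simulated values."""
    return (observed - mean(simulated)) / stdev(simulated)


# Hypothetical toy alignment (four sequences, five columns).
toy_alignment = ["MKVLA", "MKILA", "MRVLS", "MKV-A"]
observed = mean_site_diversity(toy_alignment)

# Hypothetical statistics from data simulated under a site-homogeneous model.
simulated = [2.0, 2.2, 2.1, 1.9, 2.3]

print("observed mean diversity:", observed)
print("z-score:", round(predictive_z_score(observed, simulated), 2))
```

A strongly negative z-score, as in this toy case, means the model predicts more amino acid states per site than are observed. That is the signature of a model that underestimates site-specific constraint.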

The problem of rooting rate-asymmetric phylogenetic trees

Increased attention has been given recently to the prevalence of asymmetric evolutionary rates between different members of gene families and the negative impact this has on phylogenetic inference [105, 106]. Here, for the ROR and FOXP transcription factor family phylogenetic analyses we found that the immune genes RORC and FOXP3 had drastically increased evolutionary rates compared to their relatives. In the case of outgroup-free relaxed clock rooting analyses, the root fell between the fast-evolving immune gene and the rest of the family, although this was never the case using outgroups. This suggests that clock rooting may be susceptible to error in the face of extreme rate asymmetry, even when an uncorrelated relaxed clock model [87] is applied. However, multiple alternative outgroups were tested for the ROR family and these resulted in different root positions, meaning that the root placement under the relaxed clock cannot be reliably dismissed. Interestingly, it appears that for families of immune genes with a shared fast evolutionary rate this phylogenetic difficulty is not as prevalent, with clocks and outgroups supporting a common root position (e.g. IL-6R family, IL-7/9 family, IL-4/13 family). As such, while many factors may contribute to the phylogenetic incongruence in the transcription factor families analysed here (e.g. rediploidisation following genome duplication events prior to divergence of cartilaginous and bony vertebrates [37, 39, 107], or selective pressure changes associated with the functional shift to immune gene status inducing compositional heterogeneity among branches, heterotachy, and/or heteropecilly [108]) we nonetheless predict that rate asymmetry is likely a key player, promoting the case for it being a somewhat overlooked phenomenon [105]. We suspect that this may derive from standard substitution models being designed to accommodate rate asymmetry, and the resultant branching errors when this fails being less obvious at the level of genes than species, where there are often morphology-derived topological expectations.
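One simple way to visualise rate asymmetry between paralogues is to compare mean root-to-tip path lengths (in substitutions per site) between the two clades of a gene tree. The following is a minimal sketch under stated assumptions. The tree, its branch lengths, and the tip names are invented, and the nested-tuple tree representation is a convenience for illustration rather than the format of any phylogenetics package.

```python
from statistics import mean


def tip_distances(node, distance=0.0):
    """Yield (tip name, summed branch length from the root) for every tip.

    A node is a tuple: (name, branch_length, list_of_child_nodes).
    """
    name, branch_length, children = node
    total = distance + branch_length
    if not children:
        yield name, total
    for child in children:
        yield from tip_distances(child, total)


def mean_tip_distance(tree, prefix):
    """Mean root-to-tip distance over tips whose name starts with `prefix`."""
    distances = [d for name, d in tip_distances(tree) if name.startswith(prefix)]
    if not distances:
        raise ValueError(f"no tips start with {prefix!r}")
    return mean(distances)


# Hypothetical gene tree: one fast-evolving and one slow-evolving paralogue.
toy_tree = ("root", 0.0, [
    ("fast_clade", 0.9, [("fast_sp1", 0.6, []), ("fast_sp2", 0.8, [])]),
    ("slow_clade", 0.1, [("slow_sp1", 0.1, []), ("slow_sp2", 0.2, [])]),
])

fast = mean_tip_distance(toy_tree, "fast_")
slow = mean_tip_distance(toy_tree, "slow_")
print(f"fast/slow root-to-tip ratio: {fast / slow:.1f}")
```

Note that this ratio depends on where the tree is rooted, which is itself the quantity in doubt for the ROR and FOXP families. The sketch therefore describes the asymmetry for a given rooting and is not a means of choosing between root positions.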

CD4+ T cell subsets in cartilaginous fishes and the jawed vertebrate ancestor

Venkatesh et al. proposed that cartilaginous fishes have only basic or primordial T cell function [5]. Here, having employed detailed phylogenetic analyses, we identified orthologues of several additional genes integral to CD4+ T cell-subset induction and function in cartilaginous fishes. Combined with previous findings [5, 18, 19], these results show that cartilaginous fishes possess the molecules necessary to generate an array of CD4+ helper and regulatory T cells comparable to that of mammals. In fact, we present a new model of helper and regulatory T cell evolution wherein all key genes (in some form) and/or pathways found in mammals existed in the jawed vertebrate ancestor (Fig. 10).

Fig. 10

A full set of T helper and T regulatory cell associated genes existed in the jawed vertebrate ancestor. The figure and gene selection are based on Fig. 5 from Venkatesh et al. [5], but here refer to the ancestral jawed vertebrate gene set rather than that of cartilaginous fishes. Boxed lineages were predicted to have emerged in the ancestor of jawed vertebrates by Venkatesh et al. [5] (black boxed lineages), by this study (red boxed lineages), or by this study and Dijkstra [18] (blue boxed lineages). All genes listed, except for IL-9 and IL-2Rα, have now been identified in cartilaginous fishes.

We have provided new insights on the controversy surrounding the absence of IL-2R and FOXP3 functioning in cartilaginous fishes, both of which are required for the development and function of Treg cells, a subset that helps maintain self-tolerance by dampening inflammation and suppressing immune responses [12, 17]. For example, in teleost fishes a common IL-2/15 receptor binds both IL-2 and IL-15 [18, 94, 109]; in a similar manner IL-15R, which our study shows is present in cartilaginous fishes, could functionally compensate for the lack of IL-2R, which appears to be lost from both cartilaginous fishes and teleosts. Also, while cartilaginous fish FOXP3 shows poor conservation of the amino acids that facilitate DNA binding in mammalian FOXP3, we find that this is not unusual among non-mammals. Further, these residues vary naturally between FOXP subfamily members (all of which can bind DNA [110]), so lack of conservation of these elements in FOXP3 does not necessarily equate to an absence of Treg cells in cartilaginous fishes [18]. Finally, while Venkatesh et al. used the apparent absence of T helper cell subsets in general, and TFH cells in particular, to explain the long lag-times associated with humoral immune responses in cartilaginous fishes [5], our results contradict this idea. Indeed, our data suggest cartilaginous fishes are capable of producing both TH2 and TFH cells, a finding that fits better with the antibody affinity maturation and immunological memory previously evidenced in cartilaginous fishes [20, 21].

While our data are consistent with the presence of a sophisticated, mammalian-like set of T cell subtypes in cartilaginous fishes, several lineage-specific novelties were also observed; for example, GP-130 (IL-6Rβ/IL-6ST; the signalling component of the IL-6 receptor) is triplicated in cartilaginous fishes, potentially increasing the diversity of signalling that can be induced by IL-6. In line with this, IL-6 is also duplicated (and possibly triplicated) in cartilaginous fishes. Enigmatic orthology, as observed for the IL-4/13 family, may result from independent duplications in many lineages, combined with exon shuffling or conversion events [19]. Lineage-specific loss events have also played a role, for example the potential loss of IL-9 and IL-2Rα in cartilaginous fishes, or of ROR-δ in mammals.

Finally, it must be noted that the data presented here do not provide conclusive evidence for the existence of any T cell subset in cartilaginous fishes, or the jawed vertebrate ancestor, but do strongly reject past conclusions regarding their absence. Importantly, although a canonical (i.e. mammalian-like) CD4 was reported as absent from cartilaginous fishes [5, 19], one of several CD4/LAG3-like molecules identified by Venkatesh et al. [5] has since been shown to have a CD4-like expression profile and thus may act as the functional equivalent in sharks (Martin F. Flajnik, personal communication). Together with our data, this suggests that a fully developed set of CD4+ helper and regulatory T cell subsets equivalent to that of mammals evolved in the jawed vertebrate ancestor and still exists, with lineage-specific modifications, in cartilaginous fishes today. While more work is required to fully understand T cell biology in cartilaginous fishes, our results show that this arm of their adaptive immune system is likely no more ‘primordial’ than that of mammals.