Background

Catfish belong to the order Siluriformes, the second most diverse vertebrate order, containing 39 families with more than 4100 species [1, 2]. Their basal phylogenetic position among teleosts makes them valuable models for comparative biological studies; they are economically important for sport fishing and are third in global aquaculture production, following only the carps and tilapias [3]. Channel catfish (Ictalurus punctatus) and blue catfish (I. furcatus), native to North America, lead aquaculture production in the USA. The species exhibit differential production and performance traits: channel catfish grow faster in culture but provide lower processing yields than blue catfish; channel catfish are resistant to columnaris disease but susceptible to enteric septicemia of catfish (ESC), while blue catfish are highly resistant to ESC but susceptible to columnaris disease [4]. Channel catfish and blue catfish are also useful research models for morphological, developmental, and environmental studies. They share a similar morphology, but exhibit sharp differences in body color, anal fin structure, and head size: channel catfish are light brown with pigmented spots on the body, while blue catfish are silver-blue and rarely have body spots; channel catfish have fewer than 28 anal fin rays while blue catfish have more than 30; channel catfish have a broader head while blue catfish have a smaller head relative to body size [4]. In nature, channel catfish rarely grow to more than 30 pounds, while blue catfish can grow to over 100 pounds; channel catfish inhabit the bottom of the water column while blue catfish inhabit the middle of the water column, reflecting their differences in tolerance to low dissolved oxygen and adaptation to visible light [4].

While channel catfish is cultured more than blue catfish, their F1 hybrid produced from mating female channel catfish and male blue catfish (CXB hybrid) exhibits a high level of heterosis in growth rate, but the reciprocal F1 hybrid produced from mating female blue catfish and male channel catfish (BXC hybrid) does not [5]. Because of the superior growth performance of the CXB F1 hybrid, it is now the predominant genotype used in the US aquaculture industry, but artificial spawning must be conducted to produce the hybrid because of the reproductive isolation of the parent species. Thus, channel catfish and blue catfish are also a useful animal model to study heterosis and speciation. Understanding their genomes would also facilitate the development of superior brood stocks for aquaculture and support their sustainability for fisheries. The fertility of the female CXB F1 hybrids is extremely low. However, backcrossing of the male F1 CXB hybrid with female channel catfish is reasonably productive. In fact, successive generations of backcross progenies can be successfully produced, and the fourth generation of backcross fish can mate naturally in aquaculture ponds [6, 7], suggesting their reproductive isolation had been overcome. Such high levels of genetic similarity between reproductively isolated sibling species provide an excellent model to determine the genomic basis for speciation.

Chromosomal rearrangements are broadly considered important for speciation because they can disrupt meiosis in heterozygous hybrids, thereby causing postzygotic reproductive isolation [8,9,10]. Research in plants has provided concrete evidence for this hypothesis [11], but evidence from animal systems has been rare other than from fruit flies. Previous research also indicated co-localization of genes contributing to reproductive isolation within chromosomal inversions [10]. In addition, greater genetic divergence was observed within fixed chromosomal inversions than in collinear regions of the genome [9], supporting the hypothesis that gene flow is impeded by pericentric inversions. The genetic basis of reproductive isolation between channel catfish and blue catfish is unknown, but comparative analysis of their genomic architecture should provide insights. In the present research, we sequenced and assembled chromosomal reference genome sequences for both channel catfish and blue catfish, using PacBio long reads for framework contig construction, paired-end Illumina sequencing for consensus correction, optical mapping for contig scaffolding, and high-density genetic linkage mapping for validation of chromosome structure. The highly contiguous and accurate genome assemblies permitted comparative analysis of genome architecture, coding capacities, and genomic incompatibilities. Here we report major chromosome inversions on three different chromosomes between channel catfish and blue catfish, their genomic architecture, coding capacities and characteristics, repetitive elements, characteristics of their centromeres and telomeres, and specific expansion of gene families related to their biological characteristics.

Results

Sequencing and assembly of the channel catfish and blue catfish genomes

We present a highly contiguous, chromosome-scale reference genome for each of the sibling species blue catfish and channel catfish, with an average nucleotide conservation of 94.5% between the species. A doubled haploid, homozygous individual served as the reference individual for each species [12], and the channel catfish was the same individual used for the generation of the Coco_1.2 genome assembly [13]. The channel catfish donor genome was sequenced to a depth of 75X with PacBio contiguous long-read (CLR) data and 48X Illumina data (Additional file 1: Table S1), and the channel catfish optical map was produced from 233X coverage of molecules filtered to a minimal length of 200 kb (Additional file 1: Table S2). The blue catfish donor genome was sequenced to a depth of 93X with PacBio (CLR) sequences and a depth of 77X with Illumina sequence; the blue catfish optical map was produced from 306X genome coverage of molecules filtered to a minimal length of 200 kb. Assembled contigs were scaffolded with the optical maps and further scaffolded by alignment with high-resolution genetic maps.

The reference assemblies of channel catfish and blue catfish each contain 29 chromosomes, matching their karyotypes (Fig. 1A, B). The channel catfish chromosome assembly, Coco_2.0, includes 814 Mb in 96 contigs, while the blue catfish chromosome assembly, Billie_1.0, includes 815 Mb in 168 contigs (Table 1). There are 67 gaps within the channel catfish assembled chromosomes, 17 of which are within the repetitive satellite DNA of centromeres (Fig. 1A; Additional file 1: Table S3). Similarly, there are 139 gaps within the blue catfish assembled chromosomes, 22 of which are within the repetitive satellite DNA of centromeres (Fig. 1B; Additional file 1: Table S3). Most gaps are relatively small, while larger gaps involve repetitive sequences of tandem arrays such as rRNA gene clusters or lie within centromeric or telomeric regions in both genome assemblies.
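The contig and gap counts above are mutually consistent if each gap splits a chromosome into one additional contig: 29 chromosomes with 67 gaps give 96 contigs, and 29 chromosomes with 139 gaps give 168 contigs. The following is a minimal Python sketch of that bookkeeping, assuming gaps are encoded as runs of N in the chromosome sequences (a common convention, but an assumption here); the function names and the toy input are hypothetical and are not part of the assembly pipeline described in this study.

```python
import re

# Assumption: a gap is any run of one or more N characters inside a chromosome.
GAP_PATTERN = re.compile(r"N+", re.IGNORECASE)


def count_gaps_and_contigs(chromosomes):
    """Return (total_gaps, total_contigs) for a dict of name -> sequence.

    Leading and trailing N runs are ignored because they do not
    separate two contigs.
    """
    total_gaps = 0
    total_contigs = 0
    for seq in chromosomes.values():
        core = seq.strip("Nn")
        if not core:
            continue  # sequence made only of N: no contig to count
        gaps = len(GAP_PATTERN.findall(core))
        total_gaps += gaps
        total_contigs += gaps + 1  # each gap adds one contig
    return total_gaps, total_contigs


def expected_contigs(n_chromosomes, n_gaps):
    """Contig count implied by chromosome and gap counts."""
    return n_chromosomes + n_gaps


if __name__ == "__main__":
    toy = {"chr1": "ACGTNNNACGT", "chr2": "ACGT"}
    print(count_gaps_and_contigs(toy))
    print(expected_contigs(29, 67), expected_contigs(29, 139))
```

Applied to real data, the same count would be run over the chromosome-level sequences only, since unplaced scaffolds are reported separately in the text.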

Fig. 1

Presentation of the reference genome assemblies. A Channel catfish (Ictalurus punctatus) assembly Coco_2.0 with 29 chromosomes: Centromere positions are denoted by red triangles, telomere presence is denoted by a blue cap, and sequence gaps are denoted by black lines. B Blue catfish (I. furcatus) assembly Billie_1.0 with 29 chromosomes; annotated as above. The chromosome length is scaled in megabases. For both A and B, chromosomes are presented with the centromeres in the upper half of the chromosomes, including those of chromosomes 6, 11, and 24 where pericentric inversions are present between the two species. Concordance of marker positions on the genome sequences and genetic maps of channel catfish (C) and blue catfish (D) are presented as plot of chromosome physical length (x-axis) versus genetic distance (y-axis). E Circos presentation of the linear relationships between the channel catfish and blue catfish genomes, with GC content (track a), repeat elements density (track b), gene density (track c), and the collinearity of protein-coding genes (track d). F Dot plot of MUMmer whole-genome sequence alignments of the channel catfish (x-axis) versus blue catfish (y-axis) chromosomes

Table 1 Summary and comparison of genome assemblies and annotation of channel catfish and blue catfish

The reference genome assemblies are close to complete, representing 96.6% of the channel catfish genomic sequence (base pair quality value 37) and 98.7% of blue catfish genomic sequence (base pair quality value 39), respectively. There is more unassigned sequence (28.8 Mb) in the channel assembly than in the blue catfish assembly (10.3 Mb), presumably due to a greater fraction of repetitive elements in the genome of channel catfish (47.6%) than in blue catfish (45.5%) (Table 1).

The channel catfish Coco_2.0 assembly contains 59 Mb (7.8%) more sequence than the previous Illumina-based sequence assembly (Coco_1.2). Every chromosome was longer with a significant fraction of the newly assembled additional sequences being repetitive elements (Additional file 1: Fig. S1). Most striking is the decrease in the number of gaps from 24,080 in Coco_1.2 to 67 in Coco_2.0, and the new assembly included both centromeric and telomeric sequences.

The accuracy of the reference genome assemblies was enhanced using independent methodologies. Assembled sequence contigs were scaffolded by integration into optical maps which mainly produced chromosome arm scaffolds. The integration of scaffolds into each respective genetic linkage map produced full-length chromosomes. The marker positions were fully concordant with the genome sequences (Fig. 1C, D) indicating the assemblies accurately represented the chromosomes. The relationship between genetic distance and chromosomal position was not entirely linear because of the lack of recombination around centromeric regions on the linkage maps of blue catfish (our unpublished data) and channel catfish [14,15,16]. The sequences of the 29 chromosomes of channel catfish and blue catfish genomes were highly co-linear, as demonstrated by collinearity of protein-coding genes (Fig. 1E) and sequence alignment (Fig. 1F), with exceptions of major chromosomal inversions as described below.

Chromosomal inversions and structural variations

The genomic sequences of channel catfish and blue catfish were compared to determine structural variations (SVs). Using Coco_2.0 as the reference, we identified 29,593 SVs ≥ 500 bp in the Billie_1.0 assembly (Fig. 2), comprising 2435 insertions, 1838 deletions, 179 inversions, 3413 translocations, 17,592 duplications, and other complex variations (Additional file 1: Table S4). Four SVs exceeded 1 Mb in size; these included three large chromosomal inversions and one large segmental duplication. The three large inversions are pericentric, on chromosomes 6, 11, and 24. The inversion on chromosome 6 was the largest, involving 29.81 Mb of blue catfish sequence and 29.66 Mb of channel catfish sequence. The inversion on chromosome 11 involved 17.0 Mb of blue catfish sequence and 16.70 Mb of channel catfish sequence. The inversion on chromosome 24 involved 14.25 Mb of blue catfish sequence and 15.97 Mb of channel catfish sequence (Additional file 1: Table S5). These inversions have been confirmed in independent blue and channel haplotype assemblies produced from F1 hybrid genomic DNA (data not shown). While these major inversions caused chromosomal structural changes, the number and content of genes within the inverted segments were very similar between the two species (Additional file 1: Table S6). In contrast, the structural variation on chromosome 16 involved a segmental duplication of 2.1 Mb (Additional file 1: Table S5), and it represented a structural variation between individuals, detected in the Billie genome but not in other blue catfish genomic templates.

Fig. 2

Structural variations (SVs) between the channel catfish and blue catfish genomes. Channel catfish chromosomes are orange lines and blue catfish chromosomes are blue lines. Major inversions are evident in chromosomes 6, 11, and 24

We conducted gene-based syntenic analysis [17] to determine if such pericentric inversions are present in related catfish species and if they are related to phylogenies (Fig. 3A). It appeared that similar inversions had also occurred in a number of catfish species, although the exact break points varied among the species. For chromosome 6, the orientation of the inversional segment was shared among blue catfish, black bullhead (Ameiurus melas), and southern catfish (S. meridionalis), while the inverted orientation was observed in channel catfish, iridescent shark catfish (P. hypophthalmus), yellow catfish (P. fulvidraco), and redtail catfish (H. wyckioides) (Fig. 3B). For chromosome 11, blue catfish, black bullhead, and iridescent shark catfish shared the same orientation of the inversional segment, while the remaining four species shared the other orientation (Fig. 3C). For chromosome 24, only blue catfish and southern catfish shared the same orientation, while the other five species shared the other orientation of the inversional segment (Fig. 3D). These results suggested that the pericentric inversions on chromosomes 6, 11, and 24 occurred broadly among catfish species, and that they occurred independently in these taxa.

Fig. 3

Collinearity analysis of the pericentric inversions observed between blue catfish (Ictalurus furcatus) and channel catfish (I. punctatus). Collinearity syntenic analysis of the three pericentric inversions. A Phylogenetic dendrogram of the species involved in the analysis. Asian species include Southern catfish (Silurus meridionalis), Asian redtail catfish (Hemibagrus wyckioides), Yellow catfish (Pelteobagrus fulvidraco), and striped catfish (Pangasianodon hypophthalmus). North American species include black bullhead catfish (Ameiurus melas), blue catfish (Ictalurus furcatus), and channel catfish (I. punctatus). Zebrafish (Danio rerio) and Mexican tetra (Astyanax mexicanus) are included as outgroups. Divergence times are in million years ago. B–D Collinearity analyses of the inversions in chromosome 6 (B), chromosome 11 (C), and chromosome 24 (D)

Molecular and genetic evidence of the chromosomal inversions

Analysis of reference genome sequences revealed the three pericentric inversions on chromosomes 6, 11, and 24. To demonstrate and validate the pericentric inversions, we took three different approaches: (1) analysis of long sequencing reads across the inversion junctions from multiple individuals; (2) genetic linkage analysis of a common set of markers that can be mapped to both blue catfish and channel catfish; and (3) PCR analysis across the inversion junctions. With analysis of long reads across inversion junctions, we independently mapped long reads from two blue catfish and two channel catfish unrelated to the reference genomes. As shown in Fig. 4, in all cases, long reads across the inversion junctions from both blue catfish aligned well with the reference genome of blue catfish Billie_1.0, but not with the reference genome of channel catfish; similarly, in all cases, long reads across the inversion junctions from both channel catfish aligned well with the reference genome of channel catfish Coco_2.0, but not with reference genome of blue catfish, providing additional sequence support for the accurate assemblies of the reference genomes of blue catfish and channel catfish. With the second approach, genetic linkage analysis was conducted using a common set of markers (Additional file 2: Table S7) that can be mapped to both blue catfish and channel catfish. As shown in Fig. 4, common markers were mapped to opposite locations within the inverted segments, providing genetic evidence for the pericentric inversions. With the third approach, PCR primers were designed flanking five of the six inversion junctions for both blue catfish and channel catfish (one primer set on chromosome 24 could not be designed uniquely because the flanking sequence consisted of telomeric repeats). As shown in Fig. 4, PCR amplicons were produced as expected of the pericentric inversions, providing molecular support for the pericentric inversions.
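The genetic linkage evidence rests on a simple observation: markers shared by both species run in the same order in collinear regions and in the opposite order inside an inverted segment. As a hedged illustration only, the Python sketch below scores pairwise order agreement between two maps with Kendall's tau-a, where a value near +1 indicates collinearity and a value near −1 indicates inversion. The marker names, positions, and the use of tau are assumptions for illustration and are not the linkage analysis method used in this study.

```python
from itertools import combinations


def marker_order_concordance(pos_a, pos_b):
    """Kendall's tau-a for the order of markers shared by two maps.

    pos_a and pos_b map marker name -> position (cM or bp) in each
    species. Returns a value in [-1, 1]: +1 for identical order,
    -1 for fully reversed order.
    """
    shared = sorted(set(pos_a) & set(pos_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared markers")
    concordant = 0
    discordant = 0
    for m1, m2 in combinations(shared, 2):
        product = (pos_a[m1] - pos_a[m2]) * (pos_b[m1] - pos_b[m2])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # product == 0 means a tie in one map; it counts toward neither
    total_pairs = len(shared) * (len(shared) - 1) // 2
    return (concordant - discordant) / total_pairs


if __name__ == "__main__":
    # Hypothetical markers inside one candidate inverted segment.
    channel = {"m1": 1.0, "m2": 2.0, "m3": 3.0, "m4": 4.0}
    blue = {"m1": 4.0, "m2": 3.0, "m3": 2.0, "m4": 1.0}
    print(marker_order_concordance(channel, blue))
```

In practice the score would be computed separately for markers inside and outside each candidate segment, so that a reversed order confined to the segment stands out against collinear flanks.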

Fig. 4

Evidence of pericentric inversions between the genomes of channel catfish and blue catfish. Three lines of evidence supported the presence of major pericentric inversions on chromosome 6, chromosome 11, and chromosome 24: (1) long reads mapped at the junctions (left panel); (2) genetic linkage mapping (middle panel); and (3) junction PCR (right panel). With long reads across the inversional junctions, two additional blue catfish and two channel catfish were sequenced in addition to the sequencing templates that were used to generate the reference genomes. Alignments were contiguous when the junctional long reads from blue catfish individuals were mapped against the reference genome sequence of blue catfish, but not against the reference genome sequence of channel catfish and vice versa. This was true for all inverted chromosomes 6, 11, and 24, and for both the left and the right inversion junctions. With genetic mapping, a common set of markers (Additional file 1: Table S17) were identified within the inverted junctions, and the inversion was evident for all inverted chromosomes of 6, 11, and 24. Finally, we designed PCR primers to amplify across the inversion junctions (Additional file 1: Table S17). As expected, the PCR amplicon matched the expectation of inversions, except for the left junction PCR using primers of channel catfish, which generated a band from blue catfish as well, but not of expected size. We believe this band was generated from non-specific primer binding as an artifact

The major pericentric inversions on chromosomes 6, 11, and 24 imply that gene flow between channel catfish and blue catfish could be restricted on these chromosomes. If so, the recombination rates would be lower in the inversional segments on these chromosomes in the gametes of F1 hybrid (channel catfish × blue catfish). We determined the recombination rates using genetic linkage analysis. Recombination rates were calculated from genetic linkage mapping using channel catfish intraspecific resource families [14]; those for blue catfish were calculated from genetic linkage mapping using blue catfish intraspecific families (unpublished), and those for hybrids were calculated from genetic linkage mapping using interspecific backcross progenies, where channel catfish were mated with interspecific F1 hybrid (channel catfish female × blue catfish male) to produce the backcross progenies [15]. As shown in Fig. 5, there was no recombination in the interspecific hybrid within the inversional segments, other than very low numbers of double crossovers, as predicted. In contrast, there were recombination events within the inversional segments with channel catfish or blue catfish. Significantly higher recombination rates were observed within channel catfish or blue catfish than in backcross progenies, despite the overall low recombination rates surrounding the centromeres (Fig. 5), suggesting postzygotic inhibition of recombination or mortalities of the recombinants.
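Recombination rate in cM per Mb, the unit plotted in Fig. 5, can be approximated from pairs of adjacent markers as the difference in genetic position divided by the difference in physical position. The Python sketch below shows that calculation under stated assumptions: the marker coordinates are invented for illustration, and the adjacent-marker estimate (without windowing or smoothing) is an assumption rather than the exact procedure used in this study.

```python
def recombination_rates(markers):
    """Estimate recombination rate between adjacent markers.

    markers: iterable of (physical_position_bp, genetic_position_cM).
    Returns a list of (interval_midpoint_Mb, rate_cM_per_Mb), ordered
    by physical position. Intervals of zero physical length are skipped.
    """
    ordered = sorted(markers)
    rates = []
    for (bp1, cm1), (bp2, cm2) in zip(ordered, ordered[1:]):
        span_mb = (bp2 - bp1) / 1e6
        if span_mb <= 0:
            continue  # two markers at the same physical position
        rate = abs(cm2 - cm1) / span_mb
        midpoint_mb = (bp1 + bp2) / 2e6
        rates.append((midpoint_mb, rate))
    return rates


if __name__ == "__main__":
    # Hypothetical markers: the second interval shows no recombination,
    # as expected inside an inversion in a heterozygous hybrid.
    toy = [(1_000_000, 0.0), (3_000_000, 4.0), (5_000_000, 4.0)]
    for midpoint, rate in recombination_rates(toy):
        print(f"{midpoint:.1f} Mb: {rate:.2f} cM/Mb")
```

A run of intervals with a rate of zero that spans an inverted segment in the backcross map, but not in either intraspecific map, corresponds to the pattern described above.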

Fig. 5

Recombination rates within pericentric inversions. Left panel: Dot plots of MUMmer alignments of channel catfish and blue catfish chromosome 6 (A), chromosome 11 (D), and chromosome 24 (G) are presented, all using reverse complement sequences of blue catfish as deposited in NCBI. Middle panel: Plot of genetic positions (y-axis) against physical positions (x-axis) of markers on Coco_2.0 (upper) and Billie_1.0 (lower) for chromosome 6 (B), chromosome 11 (E), and chromosome 24 (H). The orange background denotes the boundaries of the chromosomal inversions. Right panel: Plot of recombination rates (cM per Mb, y-axis) vs physical positions (Mb) of genetic markers on the genomic sequences for chromosome 6 (C), chromosome 11 (F), and chromosome 24 (I), with recombination rates of channel catfish indicated in black, blue catfish in blue, and hybrid catfish in red. The black triangle and blue triangle indicate the position of the centromere in channel catfish and blue catfish, respectively

Genome annotation and protein-coding capacities

To annotate the channel catfish and blue catfish protein-coding genes, we combined results obtained from protein-homology-based prediction, RNA-seq-based prediction, and BRAKER2 prediction. A total of 25,035 high-confidence protein-coding genes were predicted in the channel catfish genome, of which 24,558 (98.1%) genes were included in the 29 chromosomes (Additional file 1: Table S8). The total number of protein-coding genes was increased by 1935 from Coco_1.2, and the number of protein-coding genes unassigned to chromosomes was decreased from 823 to 477. Similarly, a total of 23,546 high-confidence protein-coding genes were predicted in the blue catfish genome, of which 23,444 genes (99.6%) were included in the 29 chromosomes; only 102 blue catfish protein-coding genes were unassigned to chromosomes (Additional file 1: Table S8).

The numbers of protein-coding genes identified from the channel catfish and blue catfish genomes compare favorably with their orthologous counterparts from well assembled fish species (Fig. 6A; Additional file 1: Table S9). Of the 3640 Actinopterygii (ray-finned fish) BUSCO genes [18], 3517 (96.6%) and 3480 (95.6%) were detected in the channel catfish genome Coco_2.0 and blue catfish genome Billie_1.0, respectively, as compared to 3475 (95.47%) in the Danio rerio genome (Fig. 6B).

Fig. 6

Annotation of the channel catfish and blue catfish genomes. A Comparison of sequence orthology between 16 fish species showing number of genes in each ortholog category, with channel catfish and blue catfish highlighted in the red box. B Comparison of the 3640 Actinopterygii (ray-finned fish) BUSCO genes among the genome assemblies of 16 fish species. C Analysis of distinctive genes (or gene families) among seven catfish species whose genome has been sequenced. Channel catfish and blue catfish are highlighted in the red box, and genes specific to them are highlighted in purple. D Enrichment analysis of genes specific to channel catfish and blue catfish. E Summary of the commonality and difference of channel catfish and blue catfish genes. F Enrichment analysis of genes specific to channel catfish. G Enrichment analysis of genes specific to blue catfish

We also compared the distinct protein-coding gene families of the seven catfish genomes for which whole-genome sequences are available, including southern catfish (Silurus meridionalis) [19, 20], Asian redtail catfish (Hemibagrus wyckioides) [21], yellow catfish (Pelteobagrus fulvidraco) [22], striped catfish (Pangasianodon hypophthalmus) [23], black bullhead catfish (A. melas, GenBank Accession GCA_012411375.1), and channel catfish and blue catfish. A total of 18,320 gene families were inferred from the seven catfishes, of which 13,255 gene families were shared by all seven catfish species. The remaining 5095 gene families were shared by a variable number of 1 to 6 catfish species (Fig. 6C, displaying only the top 15 shared groups of gene families). Of particular interest were the 508 gene families that were specific to the ictalurid channel catfish and blue catfish (Fig. 6C; Additional file 2: Table S10). Enrichment analysis indicated that the 508 channel catfish- and blue catfish-specific gene families were enriched for functions related to spermatid development, negative regulation of transposition, and RNA hydrolysis (Fig. 6D).

Comparison of the protein-coding capacities between channel catfish and blue catfish revealed 732 gene families specific to channel catfish (Additional file 2: Table S11) and 434 gene families specific to blue catfish (Additional file 2: Table S12). The channel catfish gene families included 1127 individual genes, and the blue catfish gene families included 606 individual genes (Fig. 6E). Enrichment analysis indicated that 1127 channel catfish-specific genes were enriched with chromatin structure of euchromatin and heterochromatin, especially ATP- and AMP-binding activities related to histone H3 modifications (Fig. 6F). In contrast, the 606 blue catfish-specific genes were enriched for (1) amino acid modification involving amino- and metalloexo-peptidase activities; (2) responses to light involving rhodopsin-mediated signaling pathway; (3) cellular motility-related functions involving dynein and microtubules; and (4) immune-related function involving MHC class I biosynthesis and interleukin 18 production (Fig. 6G).

Expansion of repeatome

The blue catfish and the channel catfish genome assemblies contain 45.5 and 47.6% repetitive elements, respectively. The top 17 categories of repetitive elements (representing at least 1% of the repeatome) in blue catfish accounted for 66.2% of all repetitive elements in the blue catfish genome (Fig. 7A, Additional file 2: Table S11), with Tc1/mariner transposons (22.1%) most abundant followed by simple sequence repeats (9.2%), repetitive proteins (7.8%), LINE/L2 (4.4%), DNA/hAT-Ac (3.6%), LTR/Ngaro (3.5%), LTR/Gypsy (3.1%), LTR/DIRS (2.7%), uncharacterized DNA transposon (2.4%), Satellite (1.5%), LTR/ERV1 (1.4%), LINE/Rex-Babar (1.2%), Repetitive non-coding RNA (1.7%), DNA/PIF-Harbinger (1.4%), DNA/CMC-EnSpm (1.4%), DNA/hAT-Charlie (1.4%), and LINE/Rex-Babar (1.2%). The remaining 50 categories of known repetitive elements accounted for 14.2% of the repeatome of blue catfish. A total of 19.6% of repetitive elements are unknown in nature (Additional file 1: Table S13).

Fig. 7

The repeatomes of channel catfish and blue catfish and their specific expansion of immunoglobulin-related genes and the Xba elements. Repeatomes of channel catfish and blue catfish. A The most abundant categories of the repeatomes of blue catfish (left) and channel catfish (right), with each category representing at least 1% of the repetitive elements of their repeatome, respectively. The complete list and annotation of their repetitive elements are presented in Additional file 1: Table S14. B Distribution and quantity (in Mb) of immunoglobulin-related proteins (Immunoglobulins) and Xba elements in various catfish species. The amount in Mb of immunoglobulin-related proteins and Xba elements are indicated at the bottom of the figures. Phylogenetic analysis was conducted with cytochrome b sequences, and each bar corresponds to the species within the phylogenetic tree with immunoglobulins in blue, and Xba elements in orange

The categories and proportions of the repetitive elements in the channel catfish genome are similar to those in the blue catfish genome [24], with the exception of the Xba elements and immunoglobulin-related repetitive proteins (Additional file 1: Table S14). The channel catfish genome contains significantly more Xba elements than the blue catfish genome, accounting for 1.7% of its repeatome, as compared to 0.35% in blue catfish. The Xba elements are centromeric (see below), but another major repeatome expansion of channel catfish and blue catfish is repetitive proteins, accounting for almost 8% of their repeatome (approximately 4% of the genome). In particular, the immunoglobulin-related genes are significantly expanded in the catfishes (Siluriformes) compared to other teleost and vertebrate taxa. Immunoglobulin-related gene sequences comprise 4.8 Mb (channel catfish) and 2.9 Mb (blue catfish) of the genome (Fig. 7B). The amount of immunoglobulin-related gene sequences in the genomes of various catfishes exhibited an interesting pattern. Of the 18 species analyzed, channel catfish has the largest proportion of immunoglobulin-related gene sequences in its genome, followed by blue catfish, black bullhead (A. melas), Neosho madtom (N. placidus), and Giant Mekong catfish (Pangasianodon gigas), while immunoglobulin-related gene content declined in proportion to the phylogenetic distance from ictalurid catfishes (Fig. 7B).

Centromeres and telomeres

The genome assemblies permitted characterization of centromeric and telomeric repeats in channel catfish and blue catfish. The centromeres of channel catfish and blue catfish are composed of satellite sequences of Xba elements [25, 26]. The Xba elements are highly AT-rich (65.5%); the vast majority are 321–325 bp long organized in head-to-tail tandem arrays in the centromeres (Fig. 8A, B). Relative centromere positions were also conserved between channel catfish and blue catfish except for chromosomes 6, 11, and 24 due to the inversions (Fig. 8C; Additional file 1: Table S15). Gene synteny surrounding the centromeres on these three chromosomes was entirely conserved.

Fig. 8

Analysis of centromeres of channel catfish and blue catfish. A Southern blot analysis of channel catfish genomic DNA digested with Xba I restriction endonuclease (adapted from [25]), showing tandem structure as a ladder was produced with incremental amounts of Xba I enzyme (Lanes 1–6). The molecular weight standards are indicated on the right margin. B Fluorescence in situ hybridization of Xba elements labeled with digoxigenin and detected with FITC-labeled anti-digoxigenin on channel catfish metaphase chromosomes (2n = 58), adapted from Quiniou et al. [26]. C Comparison of relative centromeric locations and sizes of channel catfish (orange) and blue catfish (blue) chromosome scaffolds. Note that the relative size of centromeres was magnified twofold to clearly show the difference between channel catfish and blue catfish. D Quantitative real-time PCR using Xba element-specific primers. The Ct values of 8.96, 10.43, and 24.63 were observed for channel catfish Xba (orange), blue catfish Xba (blue), and a single-copy microsatellite marker (gray), respectively. E Divergence rates of Xba elements of channel catfish (orange) and blue catfish (blue) as denoted by the Xba elements as a percentage of total genomic repeats versus substitution rate (% substitution per site)

The numbers of Xba tandem repeats varied greatly among chromosomes of both species (Additional file 1: Table S16). All chromosomes in the Coco_2.0 assembly contained centromeres, with nine chromosomes containing ungapped centromeres. The largest ungapped centromere contained 3146 Xba repeat units (chromosome 20) with a total length of over one million base pairs. Similarly, centromeres were identified in all but one blue catfish assembled chromosome, seven contained ungapped centromeres, the largest of which contained 635 Xba repeat units. While ungapped assembly of Illumina-corrected CLR sequence may not necessarily equate to complete centromeres, the genomic sequence pointed to larger centromere sizes in channel catfish than in blue catfish (Fig. 8C). Therefore, we validated the size difference using real-time quantitative PCR on genomic DNA from unrelated channel and blue catfish and found 2.25-fold more centromeric DNA in the channel catfish genome than in the blue catfish genome (Fig. 8D).

The sequences of the Xba elements are highly conserved, with the highest levels of conservation among Xba elements within a single centromere of channel catfish. The sequences of Xba elements are significantly more divergent in blue catfish (Fig. 8E), even among repeat units within a single centromere (Additional file 1: Fig. S2). The average substitution rate of Xba elements was 3.1% in channel catfish and 11.7% in blue catfish, almost four times higher. In both channel catfish and blue catfish, the Xba element sequence varied more at the beginning and end of each centromere, and sequences of the internal repeat units were most highly conserved.

Despite high levels of sequence conservation, centromeric Xba sequences appear to be present only in some species within Ictaluridae. We searched catfish whole-genome sequences in GenBank for Xba elements. In addition to channel catfish and blue catfish, Xba elements were present in the Ictalurids A. melas and N. placidus but not in any other organisms, including various catfish species (Fig. 7B). We do not know if the Xba elements serve as centromeres in A. melas as they do in channel catfish and blue catfish, but their distribution in various chromosomes in the form of tandem repeats suggests that role. However, copy numbers of Xba elements in N. placidus are very low—either this is an artifact of the sequencing platform or Xba elements may not be centromeric in this species.

Telomeric sequences were identified in all 29 chromosomes of channel catfish; in 22 of them, TTAGGG repeats were identified at both the beginning and the end (Additional file 1: Table S15). For the 7 remaining chromosomes (2, 3, 10, 11, 15, 17, and 21), telomeric repeats present in the unlocated sequencing data could not be uniquely placed at the p-arm of the chromosomes. Three of the latter chromosomes (15, 17, 21) were acrocentric or telocentric based on optical mapping data (Additional file 1: Table S15). Similarly, for blue catfish, telomeric TTAGGG repeats were present at both ends of 19 chromosomes. Again, the p-arm sequence began with centromeric Xba elements in the acrocentric/telocentric chromosomes 17 and 21. Centromeric sequences were missing from the p-arm of chromosome 15. Five additional chromosomes did not have telomeric sequence on the p-arm and two did not have telomeric sequence on the q-arm (Additional file 1: Table S15).

There are some variations of telomere repeats in channel catfish. For example, chromosome 12 has a long repeat sequence of 102 bp that appears to form a higher-order repeat (HOR) (GGGCTTCCCCAGGCTCGGTGAGTGATTTTCGGGCAAAATGACAAACTTCCACAGGCGTTTCCCTTGAACCGAGCTCCATCAGGGGCTTCAGTACT/GGGTTA); chromosome 13 has a repeat sequence of AGAGGGG at the beginning but regular TTAGGG at the end; chromosome 22 has repeats of AAACAGTTAG(T/C)GATG/GGGTTA; chromosome 27 has TTAGGG on the same strand at both the 5′-end and the 3′-end of the chromosome.

Discussion

We report reference genome sequences of channel catfish and blue catfish. These reference genomes will be valuable resources for various biological, environmental, and evolutionary studies. We previously published a reference genome sequence for channel catfish [13] that was produced using second-generation sequencing technology. The continuity of the current Coco_2.0 assembly is drastically enhanced, from a total of 34,615 contigs and 9974 sequence scaffolds in Coco_1.2 to only 96 contigs in Coco_2.0. In addition to continuity and more repetitive content, Coco_2.0 includes 1935 more protein-coding genes compared to Coco_1.2. Much like Coco_2.0 for channel catfish, the blue catfish genome sequence is highly contiguous (Table 1).

A blue catfish genome assembly has been recently reported [27], but we believe the Billie_1.0 assembly is more robust and more accurately reflects the blue catfish genome. The three large chromosomal inversions between blue catfish and channel catfish genomes were not reported by Wang et al. [27]; another 6.8 Mb inversion, on chromosome 7 (position 20,122,920–26,935,075), may represent an artifact in their assembly (Additional file 1: Fig. S3 and Fig. S4). The Billie_1.0 assembly was produced using three independent resources: long-read sequencing and optical mapping from the D&B strain blue catfish genome donor, and genetic mapping of three unrelated blue catfish full-sibling families derived from the Rio Grande strain or from parents collected from the Mississippi River. The markers on the genetic map and on the physical sequence are concordant. Furthermore, we have produced two additional blue catfish haploid assemblies and two channel catfish haploid assemblies derived from genomic sequences of two F1 hybrid individuals, and all four new assemblies confirm the inversions in Billie_1.0 compared to Coco_2.0. Scaffolding of the blue catfish genome assembly by Wang et al. [27] utilized genetic linkage maps constructed for channel catfish that were derived from either channel catfish resource families or interspecific hybrid resource families [14,15,16], suggesting that caution should be exercised when conducting reference-guided assemblies even of closely related species.

Three lines of evidence support the inversions we report here between blue catfish and channel catfish genomes on chromosomes 6, 11, and 24 (Fig. 4). First, long reads across the inversion junctions from unrelated blue catfish and channel catfish are all compatible with the inversions. Second, genetic linkage mapping, conducted using resource families that are unrelated to any of the multiple sequencing templates, also supported the chromosomal inversions. Third, direct PCR tests using primers across the inversion junctions also supported the chromosomal inversions, although on chromosome 24 unique PCR primers could only be designed for the inversion junction at the beginning of the chromosome (Fig. 4). In addition, the primer pair at the beginning of chromosome 11 produced an unexpected band from blue catfish, likely from non-specific primer binding because the band size would have been wrong even in the absence of an inversion.

The functional importance of the pericentric inversions in speciation of channel and blue catfish is unknown at present, because combining genomes that differ by large chromosomal inversions does not necessarily create a postzygotic barrier. Many fish species have multiple large inversions segregating in their populations (e.g., Atlantic cod) with impact on recombination but no apparent impact on hybrid fitness [28]. The observation of few recombinants among backcross progenies (Fig. 5) could indicate either lack of recombination or mortality of the recombinants. The detection of a low rate of double crossover recombinants, but not single crossover recombinants, among backcross progenies suggested the latter, indicating that the pericentric inversions could be a postzygotic barrier for survival of the recombinants. Although anecdotal, our previous research [7, 29] also reported low hatching and survival rates in the first generation of backcross progenies (female channel catfish × male F1 hybrid), but increasingly higher hatching and survival rates were observed with higher generations of backcrosses, suggesting that “homogenization” of chromosomes through continuous backcrossing would be an effective approach to introgress chromosomal segments from blue catfish into fertile hybrids.

Overall quality of the channel and blue reference genome sequence assemblies was assessed with a set of standards and metrics recommended by the G10K Consortium [30]. The channel catfish genome assembly Coco_2.0 had x.y.P.Q.C. metrics of > 12. > 29.-0.37.97 and the blue catfish genome assembly had x.y.P.Q.C. metrics of > 6. > 30.-0.39.99 (Table 2). The channel catfish genome sequence was slightly more contiguous with 67 gaps than the blue catfish genome sequence with 139 gaps. However, the blue catfish genome assembly Billie_1.0 was more complete with 98.7% of the genome sequences assigned to chromosomes while 96.6% of genome sequences of channel catfish were assigned to chromosomes. Longer centromeric repeat arrays and more unlocated repetitive elements in channel catfish contributed to the 2.1% difference. However, larger gaps in the Billie_1.0 assembly, especially one on chromosome 14, were due to tandem arrays of rDNA and tRNA genes near the ends of the chromosome. Those arrays lacked polymorphic genetic markers and could not be oriented uniquely on the chromosomes. The accuracy of the reference genome sequence was demonstrated by concordance of large numbers of SNP marker positions on the reference genome sequences with those on the genetic linkage maps [14,15,16] (and our unpublished data). Haplotype blocks could not be assessed from the reference genome sequences because homozygous, doubled haploid sequencing templates were used for sequencing with both channel catfish and blue catfish. 
These high-quality genome sequences, channel catfish genome assembly Coco_2.0 and blue catfish genome assembly Billie_1.0, and assemblies from other fish species, such as zebrafish [31], cavefish [32], Atlantic salmon [33], sterlet sturgeon [34], Silver Sillago [35], half-smooth tongue sole [36], common carp [37], tilapia and related cichlids [38], and European seabass [39], will serve as long-term resources for genetic and genomic research with teleost species, which account for more than 50% of all vertebrate species.

Table 2 Quality assessment of the channel catfish genome assembly Coco_2.0 and blue catfish genome assembly Billie_1.0 using International Genome 10K (G10K) Consortium metrics [30]

The high-quality assemblies of these two closely related species provided a more complete landscape of genome architecture, gene annotation, repetitive elements, TE insertions, and centromere and telomere sequence characteristics. Of particular interest were the 508 genes present in channel catfish and blue catfish but absent from five other catfish species. Enrichment analysis indicated overrepresentation of several categories of genes including piRNA-binding, RNA silencing, fertility-related functions, and negative regulation of transposition, suggesting the importance in catfish of piRNA-induced silencing complexes (piRISCs) in fertility and transposon silencing [40], as they are in worms, flies, and mice [41,42,43]. However, such speculation is based on the assumption that the reference genomes of the species under comparison are complete; we can only assess the quality of the reference genomes of blue catfish and channel catfish reported here but have no assessments for the other catfish species used in the analysis.

A large set of genes was present in the channel catfish genome but not in the blue catfish genome. Enrichment analysis indicated that the major overrepresented terms in channel catfish were related to chromatin structure involving histone H3 modifications such as H3-K4, H3-K9, and H3-K27 methylation. Similarly, a total of 606 genes in the blue catfish genome were not found in the channel catfish genome. Enrichment analysis indicated that the major overrepresented terms of these genes were involved in peptidase activities, responses to light involving rhodopsin signaling, cellular motility-related functions involving dynein and microtubules, and immune-related functions involving MHC class I and interleukin 18 production. We do not know the functional importance of these enriched genes for each species, but we do know that these differences in gene contents between blue catfish and channel catfish are real because of the completeness of the reference genomes of both blue catfish and channel catfish.

The channel catfish and blue catfish genomes are characterized by significant expansions of immunoglobulin-related genes [44], which correlate well with the phylogenetic relationship of various catfishes (Fig. 7B). Ictalurid catfishes are endemic to temperate areas in North America [45], whereas the more distantly related catfish species are distributed in tropical or subtropical South America or southern Asia. Perhaps more significantly, the taxa with significant expansion of immunoglobulin-related genes are all scaleless catfishes that could be exposed more frequently to microorganisms. Taxa with a smaller repertoire of immunoglobulin-related genes typically have protective skin structures. The body of Bagarius species is entirely or almost entirely covered by heavily keratinized skin superficially differentiated into unculiferous plaques or tubercles [46], while Corydoras catfishes have special scales made of bony dermal plates [13]. These protective skin structures, like the scales of other teleost fish, could offer more protection from direct exposure to microorganisms.

Centromeres are important for chromosome segregation during meiosis and mitosis, and telomeres are important for maintenance of chromosomal integrity and cell longevity. However, all existing catfish genome assemblies in NCBI, including those for channel catfish, yellow catfish (Tachysurus fulvidraco) [15], suggesting that the pericentric inversions interrupt postzygotic recombination or survival of recombinants. This work, therefore, has practical implications for breeding programs. Blue catfish is of particular interest with its superior traits of disease resistance against ESC bacterial disease, greater processing yields, and better harvestability [49]. Introgression of superior production and performance traits from the blue catfish genome by interspecific hybridization, followed by one or two generations of backcrossing to achieve homologous chromosome pairs of chromosomes 6, 11, and 24, is a logical step for breeding.

Methods

Production of gynogenetic doubled haploid blue and channel catfish

The homozygous catfish used as genome sequencing templates were produced through gynogenesis. The channel catfish, “Coco”, was the same individual used to produce the first channel catfish genome assembly [13] and was produced using established methods [12]. The blue catfish, “Billie”, was produced using a similar approach, except that blue catfish eggs were fertilized with irradiated channel catfish sperm and the pressure shock was applied 90 min post-fertilization. Homozygosity was validated using microsatellite markers [50].

DNA isolation and sequencing

Genomic DNA was isolated from peripheral red blood cells using a standard method of Proteinase K-SDS digestion, ammonium acetate protein precipitation, and precipitation of nucleic acids by 2-propanol. High molecular weight (HMW) DNA was randomly sheared to produce a 350-bp insert library, and paired-end sequences were produced on an Illumina NextSeq 500 platform. For the blue catfish long reads, HMW DNA was sheared with a Covaris® G-tube targeting > 20 kb fragments. Sheared DNA was prepared for PacBio sequencing (Pacific Biosciences, Menlo Park, CA) using the SMRTbell™ Template Prep Kit, and size selected with the Blue Pippin (Sage Sciences). Sequencing was performed on a PacBio® RS II System on SMRT®Cell 8Pac V3 cells using P6-C4 chemistry. To target Continuous Long Reads, the libraries were sequenced using 6-h movies on 90 SMRT®Cells. For the channel catfish long reads, HMW libraries were produced as described for blue catfish above and sequencing was performed on a PacBio® Sequel System on 12 LR SMRT®Cells 1 M v3 using SMRTLink version 6 software. Continuous Long Reads were produced using 15-h movies. This channel catfish sample was also run on an 8 M SMRT®Cell on a PacBio® Sequel II System using v7.0 software.

Sequence assembly

A total of 6,935,942 CLR reads (76,971,401,043 bp) were produced from the blue catfish genome, with an N50 read length of 16,065 bp. A total of 3,696,288 CLR reads (63,729,299,415 bp) were produced from the channel catfish genome with an N50 read length of 25,713 bp. The CLR reads were assembled using Canu v1.8 [51], and sequence accuracy of assembled contigs was improved with two iterations of Arrow using the CLR reads followed by one iteration of FreeBayes using 77X (blue) or 48X (channel) coverage of Illumina reads [

Genetic linkage mapping

The channel catfish genetic linkage map was constructed using single-nucleotide polymorphic markers (SNPs) with 576 fish from three resource families of 192 fish each [14]. The interspecific hybrid linkage map was constructed using SNP markers in 288 backcross progenies with 96 individuals from each of the three backcross families [15]. The blue catfish linkage map was newly constructed for this project using SNP markers of the catfish 690 K SNP array [16]. A total of 141 individuals from three full-sib families of blue catfish were genotyped. The mapping procedures followed the protocols as described [16] with some modifications. Genotype calling of the signal intensity data in CEL files was performed using the Axiom Analysis Suite software. SNPs classified as “PolyHighResolution” and “NoMinorHom” were retained for further analysis. SNPs with call rate lower than 95%, minor allele frequency (MAF) less than 0.05, or missing value more than 10 were excluded using the SVS software package (SNP & Variation Suite, Version 8.3). The filtered genotyping data were then imported into PLINK 1.0 [53] to examine pedigree information based on pairwise identity-by-state (IBS) distance analysis. Outlier samples were removed when their distances were significantly larger than the normal level. Mendelian segregation of SNP markers in the three mapping families was checked using the chi-square test of the R package “Onemap” [54]. Markers with significant segregation distortion (p < 0.001) were eliminated from linkage analysis. The linkage map was constructed using Lep-MAP3 [55]. First, SNP genotyping data from the three families were combined and converted to genotype likelihoods (posteriors) using the “linkage2post.awk” script. Then, the “SeparateChromosomes2” module was applied to cluster markers into linkage groups (LGs). A logarithm of the odds (LOD) score limit of 12 and a minimum LG size of 60 (lodLimit = 12 sizeLimit = 60) were applied to form 29 LGs. 
Singular markers were added to the established LGs using the “JoinSingles” module with a LOD score limit of 5 and a minimum difference of 2 (lodLimit = 5 lodDifference = 2). Finally, the “OrderMarkers2” module was used to order markers in each linkage group (LG), allowing different recombination probabilities in the two sexes. Two rounds of marker ordering were carried out, with 10 iterations per round, to obtain the order with the best likelihood. After the second round of ordering, genetic distance was calculated with the Kosambi mapping function accounting for both male and female meiosis. Sex-specific recombination rates were then calculated with the same marker order. All genetic linkage maps were drawn with MapChart (version 2.3).
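
The Kosambi mapping function mentioned above converts a recombination fraction r into a map distance that allows for crossover interference. The following minimal Python sketch is for illustration only and is not the Lep-MAP3 implementation; the function names are hypothetical.

```python
import math


def kosambi_cm(r: float) -> float:
    """Kosambi map distance in centimorgans for a recombination fraction r.

    Standard form: d (Morgans) = 1/4 * ln((1 + 2r) / (1 - 2r)),
    multiplied by 100 here to give cM.
    """
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def kosambi_inverse(d_cm: float) -> float:
    """Recombination fraction corresponding to a Kosambi distance in cM."""
    return 0.5 * math.tanh(2.0 * d_cm / 100.0)


if __name__ == "__main__":
    for r in (0.01, 0.10, 0.25):
        print(f"r = {r:.2f} -> {kosambi_cm(r):.2f} cM")
```

For small r the distance in cM is close to 100 × r; the two diverge as r approaches 0.5.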

Assessment and validation of the sequence assembly

The accuracy of the sequence assembly was assessed using MUMmer [56] to compare SNP marker positions on the genetic map with their positions on the genomic sequence scaffolds.

Genome annotation

The repetitive elements were identified using RepeatModeler 1.0.8 containing RECON [57] and RepeatScout [58] with default parameters. The derived repetitive sequences were searched against Dfam and Repbase [59, 60]. If the sequences were classified as “Unknown”, they were further searched against the non-redundant nucleotide database using blastn 2.11.0+ for analysis of repetitive elements. The results, along with a custom library from RepeatMasker, were merged. We used the comprehensive species-specific repeat element library to mask the repeats from known families (replaced with N) and their location information was collected as intergenic. All repetitive regions were soft-masked before annotation of protein-coding genes.

Structural annotation was conducted using three strategies: ab initio, homology-based, and RNA-seq-based prediction. To conduct ab initio gene prediction, the genome data and RNA-seq short reads (SRR11951631, SRR11951633, SRR11951635, SRR11951637, SRR11951639, SRR11951641, SRR11951643, and SRR392744) were input to the BRAKER2 pipeline [61], which performed iterative gene prediction to train and refine gene models by invoking GeneMark-ES [62] and Augustus [63]. RNA-seq reads were assembled in a genome-guided manner using HISAT2 (v2.1.0) [64] and StringTie (v2.1.4) [65]. Afterward, the genome-guided transcript sets were sent to TransDecoder (https://github.com/TransDecoder) to identify coding sequences by open reading frame (ORF) prediction and homology searches. For homology-based protein prediction, protein sequences of closely related fish species were downloaded from Ensembl, including Astyanax mexicanus, Danio rerio, Ictalurus punctatus, Oryzias latipes, and Pangasianodon hypophthalmus. Finally, we produced an integrated gene set from the MAKER pipeline [66] using the abovementioned three annotations as input datasets. Functional annotation was performed using Diamond (v2.0.15) [15]. For all genetic maps, marker orders were compared to physical positions in the channel catfish or blue catfish genome sequence assemblies. The relationship between genetic and physical positions was demonstrated by a scatter plot with the markers’ genetic positions (cM) versus physical positions (Mb). The local recombination rates were estimated and displayed by a smooth line chart in non-overlapping 2 Mb windows with the Loess (locally weighted scatterplot smoothing) method.
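
As an illustration of the windowed recombination-rate estimate described above, the following Python sketch bins markers from one chromosome into non-overlapping 2 Mb windows and reports cM per Mb for each window. It is a simplified sketch under stated assumptions: the rate is taken from the first and last marker in each window, the Loess smoothing step is omitted, and the function name and input layout are hypothetical.

```python
from collections import defaultdict

WINDOW_BP = 2_000_000  # non-overlapping 2 Mb windows, as in the text


def window_recombination_rates(markers, window_bp=WINDOW_BP):
    """Estimate local recombination rate (cM/Mb) per window.

    markers: iterable of (physical_position_bp, genetic_position_cM)
             for a single chromosome (hypothetical input layout).
    Returns {window_start_bp: rate}; windows whose markers do not span
    at least two distinct physical positions are skipped.
    """
    bins = defaultdict(list)
    for pos_bp, pos_cm in markers:
        bins[(pos_bp // window_bp) * window_bp].append((pos_bp, pos_cm))

    rates = {}
    for start, points in sorted(bins.items()):
        points.sort()
        (bp_first, cm_first), (bp_last, cm_last) = points[0], points[-1]
        if bp_last > bp_first:
            span_mb = (bp_last - bp_first) / 1e6
            rates[start] = (cm_last - cm_first) / span_mb
    return rates


if __name__ == "__main__":
    # Made-up marker positions, for demonstration only.
    demo = [(100_000, 0.0), (1_100_000, 2.0), (2_500_000, 3.0), (3_500_000, 3.5)]
    for start, rate in window_recombination_rates(demo).items():
        print(f"window starting at {start:>9,d} bp: {rate:.2f} cM/Mb")
```

A smoother (for example Loess) could then be applied to the per-window values before plotting.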

Gene space completeness

The final assemblies of the channel catfish and blue catfish genomes were assessed using Benchmarking Universal Single-Copy Orthologs (BUSCO) [84] with the lineage database Actinopterygii_odb10. Genome assemblies for the 16 species under comparison were downloaded from NCBI. The 3640 Actinopterygii (ray-finned fish) BUSCO genes were used as a benchmark to assess the genome completeness. Homologous gene pairs between channel catfish and blue catfish were constructed through the reciprocal best hit (RBH) method using all-against-all BLASTP (v2.10.1+).
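
The reciprocal best hit idea can be sketched as follows: a pair (a, b) is kept only when b is the best-scoring hit of a and a is the best-scoring hit of b. This Python sketch assumes hits have already been parsed into (query, subject, bitscore) tuples, for example from BLASTP tabular output; the gene identifiers, function names, and the use of bitscore as the ranking criterion are assumptions for illustration, not details taken from the study.

```python
def best_hits(hits):
    """Return {query: subject} keeping the highest-bitscore hit per query.

    hits: iterable of (query_id, subject_id, bitscore) tuples.
    Ties keep the first hit encountered.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where each gene is the other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


if __name__ == "__main__":
    # Hypothetical identifiers and scores, for demonstration only.
    channel_vs_blue = [("ch1", "bl1", 500.0), ("ch1", "bl2", 300.0), ("ch2", "bl2", 200.0)]
    blue_vs_channel = [("bl1", "ch1", 480.0), ("bl2", "ch1", 310.0), ("bl2", "ch2", 190.0)]
    print(reciprocal_best_hits(channel_vs_blue, blue_vs_channel))
```

In the demonstration data only ch1 and bl1 are reciprocal best hits, because the best hit of bl2 is ch1 rather than ch2.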

Analysis of centromeres

The Xba elements were arranged in head-to-tail tandem arrays. The repeat sequences were extracted from the genome sequences of channel catfish and blue catfish. The positions on each chromosome were located, and their sizes were quantified using the repeat numbers of the Xba elements. Sequences of Xba elements were aligned using Clustal Omega with the online platform of EMBL-EBI (https://www.ebi.ac.uk/Tools/msa/clustalo/). Their tandem nature was confirmed by both sequence analysis and our Southern blot experiments [25]. Similarly, fluorescent in situ hybridization previously conducted in our laboratory demonstrated chromosomal position [26]. The observed copy number difference between channel catfish and blue catfish was confirmed by quantitative PCR.

Calculation of divergence rates of Xba elements

The average number of substitutions per site (K) for each Xba repeat unit was subtotaled. The K value was calculated based on the Jukes-Cantor formula: K = −(300/4) × ln(1 − 4D/300), where D represents the proportion of each Xba repeat unit differing from the consensus sequence [85].
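
The Jukes-Cantor correction above can be sketched in a few lines of Python. The sketch assumes that D is expressed as a percentage (which is what the factors 300/4 and 4/300 imply when compared with the textbook form K = −(3/4) ln(1 − 4p/3) for a fraction p), so that K is also returned as a percentage; the function name is hypothetical.

```python
import math


def jukes_cantor_percent(d_percent: float) -> float:
    """Jukes-Cantor corrected divergence, in percent.

    Implements K = -(300/4) * ln(1 - 4*D/300), with D assumed to be the
    observed percent difference from the consensus sequence.
    """
    arg = 1.0 - 4.0 * d_percent / 300.0
    if arg <= 0.0:
        # The correction is undefined at or beyond 75% observed difference.
        raise ValueError("D must be below 75 percent")
    return -(300.0 / 4.0) * math.log(arg)


if __name__ == "__main__":
    # Illustrative D values only; not data from the study.
    for d in (1.0, 3.0, 10.0):
        print(f"D = {d:.1f}% -> K = {jukes_cantor_percent(d):.2f}%")
```

The corrected value K is always at least as large as D, because it accounts for multiple substitutions at the same site.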

Quantitative real-time PCR

A quantitative real-time PCR assay was designed and optimized to confirm the relative copy number of Xba elements in the blue and channel catfish genomes. Triplicate 20 µL reactions were performed with the SsoAdvanced Universal SYBR Green Mix (Bio-Rad Laboratories, Hercules, CA) and 500 nM of each primer targeting either the Xba elements (Xba_76F: GTGCTCTTTAKVCGCTCAAAACGC, Xba_145R: AAAAACCACTTTCCTTTGCTCCT) or a single-copy locus on chromosome 12 (Chr12_03F: TCTACAGTTTGGTCCGTATGATC and Chr12_03R: CAATGTCCAGAGAGCTGGCATG). Each assay was tested with a temperature gradient, melt curve analysis, and standard curve. The loci were amplified by heating for 3 min at 98 °C, followed by 40 cycles of 10 s at 98 °C and 30 s at 62 °C and a melt curve, on a CFX96 Touch Real-Time PCR Detection System (Bio-Rad Laboratories). Normalized quantities were calculated from three replicates each from four channel catfish and four blue catfish using the 2^−ΔΔCt method [86].
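
The 2^−ΔΔCt calculation can be sketched as below. This is a minimal Python illustration, assuming roughly 100% amplification efficiency for both assays, the single-copy chromosome 12 locus as the reference, and one species as the calibrator; the function name and all Ct values are hypothetical and are not data from this study.

```python
from statistics import mean


def fold_change_ddct(ct_target_sample, ct_ref_sample,
                     ct_target_calibrator, ct_ref_calibrator):
    """Relative quantity of the target by the 2^-ddCt method.

    Each argument is a list of replicate Ct values.
    dCt  = mean Ct(target) - mean Ct(reference)
    ddCt = dCt(sample) - dCt(calibrator)
    """
    dct_sample = mean(ct_target_sample) - mean(ct_ref_sample)
    dct_calibrator = mean(ct_target_calibrator) - mean(ct_ref_calibrator)
    return 2.0 ** -(dct_sample - dct_calibrator)


if __name__ == "__main__":
    # Hypothetical triplicate Ct values, for demonstration only.
    sample_target = [15.0, 15.1, 14.9]      # repeat assay, sample genome
    sample_ref = [25.0, 25.0, 25.0]         # single-copy locus, sample genome
    calibrator_target = [16.0, 16.1, 15.9]  # repeat assay, calibrator genome
    calibrator_ref = [25.0, 25.0, 25.0]     # single-copy locus, calibrator genome
    ratio = fold_change_ddct(sample_target, sample_ref,
                             calibrator_target, calibrator_ref)
    print(f"relative copy number (sample / calibrator): {ratio:.2f}")
```

A one-cycle difference in ΔCt corresponds to a twofold difference in relative copy number under the efficiency assumption stated above.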