Background

Grass carp has a breeding history of more than 1700 years in China. Since the 1980s, it has been introduced directly or indirectly to many countries, such as the United States, Mexico, India and Hungary [1]. Its artificial breeding began in 1958, and it is now the most productive species in freshwater fish farming worldwide, with great economic value, providing a large amount of high-quality protein and trace elements for people worldwide. In 2020, the total production of freshwater fish farming in China was 30.89 million tons, of which grass carp had the highest production (5.57 million tons), accounting for about 18% of the total [2]. Owing to its strong adaptability, rapid growth and large size, grass carp is known as one of the “Four Domesticated Fish” of freshwater culture in China, together with the bighead carp (Hypophthalmichthys nobilis), silver carp (Hypophthalmichthys molitrix) and black carp (Mylopharyngodon piceus). It should be noted that the “Four Domesticated Fish” are placed in the family Xenocyprididae in the latest classification system (Eschmeyer’s Catalog of Fishes); they were previously classified in Cyprinidae. Grass carp is a typical herbivorous fish and is mainly distributed in the Yangtze, Pearl and Heilongjiang rivers in China. The food habit transition that occurs during grass carp development facilitates rapid growth. Previous research has shown that the body weight, body length and intestine length of grass carp that have undergone the transition are significantly greater than those of grass carp that have not. Genes involved in circadian rhythm, lipid synthesis and metabolic pathways have undergone adaptive changes after the food habit transition, enabling more effective use of the nutrients in plants [3].

High-throughput whole-genome sequencing allows the base sequence of a species to be determined accurately and its genetic information to be deciphered. It can reveal the complexity and genetic diversity of a species' genome, which provides new research methods and solutions for exploring the mechanisms of species development and environmental adaptability, thereby speeding up the breeding of new varieties [4, 5]. In recent years, the genome sequences of major economically important freshwater fishes in China have been published in succession, such as blunt snout bream (Megalobrama amblycephala) [6, […]

[…] 69] was used for multiple sequence alignment of clean data, and HTSeq (v.0.11.2) [70] was used to calculate TPM values for gene expression. Although the whole-genome sequence of M. amblycephala has been published, the resulting gene structure prediction file has not been made public [[…]

Conservation or loss of CNEs in teleost fish genomes

A CNE was considered present in a cyprinid fish genome if it showed coverage of at least 30% with a zebrafish CNE in the Multiz [81] alignment. To identify CNEs that could have been missed in the Multiz alignments because of genome rearrangements, or because of partitioning of the CNEs among cyprinid fish duplicate genes, we searched the zebrafish CNEs against the genome of the cyprinid fish using BLASTN (E-value < 1e-10; ≥ 80% identity; ≥ 30% coverage). CNEs that had no significant match in a cyprinid fish genome were considered missing from that genome. CNE annotation followed previously described methods [20, 82]. We visualized CNEs using the online tool VISTA (https://genome.lbl.gov/vista) [83].
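The two-step presence test described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `BlastHit` record, the `classify_cnes` helper and the definition of coverage as aligned length divided by CNE length are assumptions made for the example, and parsing of Multiz or BLASTN output is left out.

```python
from dataclasses import dataclass

@dataclass
class BlastHit:
    """Hypothetical record for one BLASTN hit of a zebrafish CNE (the query)."""
    query_id: str
    query_len: int    # length of the zebrafish CNE
    align_len: int    # aligned length on the query (assumed coverage numerator)
    pident: float     # percent identity, 0-100
    evalue: float

def passes_thresholds(hit, max_evalue=1e-10, min_identity=80.0, min_coverage=0.30):
    """Apply the thresholds quoted in the text: E-value < 1e-10, >= 80% identity, >= 30% coverage."""
    coverage = hit.align_len / hit.query_len
    return (hit.evalue < max_evalue
            and hit.pident >= min_identity
            and coverage >= min_coverage)

def classify_cnes(cne_ids, multiz_coverage, blast_hits, min_coverage=0.30):
    """Label each CNE 'present' or 'missing' in one target genome.

    multiz_coverage maps CNE id -> fraction of the CNE covered in the Multiz alignment.
    A CNE is present if it passes the Multiz coverage test or has a passing BLASTN hit.
    """
    present = {c for c in cne_ids if multiz_coverage.get(c, 0.0) >= min_coverage}
    present.update(h.query_id for h in blast_hits if passes_thresholds(h))
    return {c: ("present" if c in present else "missing") for c in cne_ids}

if __name__ == "__main__":
    hits = [BlastHit("cne2", 200, 100, 85.0, 1e-20),
            BlastHit("cne3", 200, 40, 90.0, 1e-30)]
    print(classify_cnes(["cne1", "cne2", "cne3"], {"cne1": 0.5, "cne2": 0.1}, hits))
```

In this toy input, `cne2` is rescued by the BLASTN step after failing the Multiz coverage test, while `cne3` has a hit that covers only 20% of the element and is therefore treated as missing.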

Expansion and contraction of gene families

For greater insight into the evolutionary dynamics of the genes, the expansion and contraction of orthologous gene clusters were determined (p value < 0.01) among the 19 species by comparing cluster sizes between ancestors and each extant species using CAFE (v.4.2.1) [84]. Gene gain and loss along each lineage of the RAxML tree were calculated by CAFE under a random birth and death process model. A probabilistic graphical model (PGM) was used to calculate the probability of transitions in gene family size from parent to child nodes on the phylogeny. Expanded and contracted gene families in grass carp were identified by comparison with other species, and those in other species were identified by comparison with ancestors. KEGG and GO analyses were conducted using clusterProfiler [85] on gene families exclusively present, or specifically expanded or contracted, in cyprinid fish.
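The decision rule implied above (compare the family size at a node with that of its parent, and keep only changes with p value < 0.01) can be illustrated with a short sketch. The function names and the input tuples are hypothetical, and the p values are assumed to have been taken from the CAFE output beforehand; the likelihood computation itself is not reproduced here.

```python
from collections import Counter

def classify_family(parent_size, child_size, p_value, alpha=0.01):
    """Label a size change along one branch as expanded, contracted or unchanged.

    parent_size: inferred family size at the ancestral node
    child_size:  family size at the descendant node or extant species
    p_value:     significance of the change, assumed to come from CAFE
    """
    if p_value >= alpha or child_size == parent_size:
        return "unchanged"
    return "expanded" if child_size > parent_size else "contracted"

def summarize(families, alpha=0.01):
    """Count labels over an iterable of (family_id, parent_size, child_size, p_value)."""
    return Counter(classify_family(p, c, pv, alpha) for _, p, c, pv in families)

if __name__ == "__main__":
    example = [
        ("fam1", 3, 9, 0.001),   # significant gain
        ("fam2", 8, 2, 0.004),   # significant loss
        ("fam3", 4, 6, 0.200),   # gain, but not significant
    ]
    print(summarize(example))
```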

Identification of positively selected genes (PSGs)

All one-to-one orthologous genes extracted from the 19 species were used to identify PSGs. Multiple sequence alignments were generated and used to estimate three types of ω (the ratio of the nonsynonymous substitution rate to the synonymous substitution rate) using the branch model in the codeml program of the PAML package (v.4.8) [78]. The branch model (model = 2, NSsites = 0) was used to estimate ω for the designated test branch (ω0), the average ω of all other branches (ω1) and the mean ω across all branches (ω2). A χ2 test was then used to check whether ω0 was significantly higher than ω1 and ω2 at a threshold of p value < 0.01, which suggested that these genes were under positive selection or fast evolution.
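One common way to apply such a χ2 test to codeml branch-model results is a likelihood-ratio test between the one-ratio model (a single ω for all branches) and the two-ratio model (separate ω for the test branch), with one degree of freedom. The text does not state that this exact procedure was used, so the sketch below is an assumption, as are the function names and the log-likelihood inputs, which would be read from the codeml output files.

```python
import math

def chi2_sf_df1(x):
    """Survival function of the chi-square distribution with 1 degree of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))

def branch_lrt(lnl_one_ratio, lnl_two_ratio, omega0, omega1, omega2, alpha=0.01):
    """Likelihood-ratio test for an elevated omega on the test branch.

    lnl_one_ratio: log-likelihood under a single omega (omega2 in the text)
    lnl_two_ratio: log-likelihood with separate test-branch (omega0) and
                   background (omega1) ratios
    A gene is flagged only if the test is significant and omega0 exceeds both
    omega1 and omega2.
    """
    stat = max(0.0, 2.0 * (lnl_two_ratio - lnl_one_ratio))
    p_value = chi2_sf_df1(stat)
    candidate = p_value < alpha and omega0 > omega1 and omega0 > omega2
    return {"stat": stat, "p_value": p_value, "candidate": candidate}

if __name__ == "__main__":
    # Illustrative numbers only, not results from the study.
    print(branch_lrt(-1000.0, -995.0, omega0=1.5, omega1=0.2, omega2=0.3))
```

Using `math.erfc` keeps the sketch dependency-free; for other degrees of freedom a statistics library would be the more natural choice.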