Introduction

Larimichthys crocea, commonly known as the large yellow croaker, is a sciaenid fish species. Ecological and genetic studies of the large yellow croaker matter because the species is economically important in Chinese coastal regions. Wild large yellow croaker was originally distributed from the southern Yellow Sea to the South China Sea [1]. However, wild stocks have suffered severely from overfishing and are on the brink of extinction. The species was first domesticated in the early 1980s, and its annual aquaculture yield in China has been greater than that of any other domesticated marine fish [2]. After the initial artificial breeding attempts succeeded in the 1980s, enhancement and release programmes were also carried out. The first enhancement and release, of 16,200 individuals, reportedly took place in the Ningde Sea as early as 1987, followed by a rapid increase to millions of individuals released annually [3]. Given this history, it is hard to determine whether a large yellow croaker captured in the sea is wild or of domesticated origin. Therefore, in this study, we classified our samples into a sea-captured population and a farmed population.

The culture performance of farmed large yellow croaker populations has declined, mainly because of irrational artificial breeding, inbreeding, and blind introduction. These practices have caused the degradation of genetic resources and hybrid germplasm in the large yellow croaker [4]. Previous genetic studies of population structure are available for both domesticated and sea-captured populations; however, these studies have been limited to putatively neutral markers, e.g., microsatellites [5] and single nucleotide polymorphism (SNP) loci [6], and to narrow regions of the genome [7]. Such data are insufficient to describe the structure of the population. In this study, whole-genome resequencing of 198 croakers was performed to better resolve the critical uncertainties associated with population structure, genetic diversity and the analysis of mixed stocks across the domesticated and sea-captured populations of large yellow croaker.

As mentioned above, in each generation over recent decades, large yellow croaker has been exposed to directional selection for an increasing number of economically important traits, such as growth, anti-freeze capacity, desirable flesh characteristics and disease resistance [8,9]. Genetic improvement of large yellow croaker strains for commercial aquaculture is currently being established in China [23].

The reference genome sequence of large yellow croaker was downloaded from Ensembl Release 100 (http://www.ensembl.org) and indexed for BWA v0.7.16a-r1181 [24]. The filtered reads were aligned to the large yellow croaker reference genome using BWA-MEM with default parameters.

Variant calling, filtering and annotations

Variant calling was performed using samtools mpileup with default parameters [25]. To obtain high-quality SNPs, only biallelic SNPs were analysed further. Variants with a call rate < 100% or a minor allele frequency < 5% were filtered out. For variant annotation, the large yellow croaker genome annotation was downloaded from Ensembl Release 100, and SNPs were annotated with SnpEff v4.2 [26]. Finally, we retained 6,302,244 SNPs as the initial dataset for the downstream analysis.
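The site-level filters described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: genotypes are encoded as alternate-allele counts (0, 1 or 2) per diploid sample, `None` marks an uncalled genotype, and the function names are hypothetical rather than part of any tool used in this study.

```python
# Illustrative sketch of the SNP filters (biallelic, call rate, MAF).
# Assumption: genotypes are alt-allele counts 0/1/2; None = missing call.

def call_rate(genotypes):
    """Fraction of samples with a called genotype."""
    called = [g for g in genotypes if g is not None]
    return len(called) / len(genotypes)


def minor_allele_frequency(genotypes):
    """Frequency of the rarer allele among called diploid genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def keep_site(n_alt_alleles, genotypes, min_call_rate=1.0, min_maf=0.05):
    """Keep biallelic sites with full call rate and MAF of at least 5%."""
    if n_alt_alleles != 1:          # more than one ALT allele: not biallelic
        return False
    if call_rate(genotypes) < min_call_rate:
        return False
    return minor_allele_frequency(genotypes) >= min_maf


if __name__ == "__main__":
    print(keep_site(1, [0, 1, 2, 0]))      # common variant, fully called
    print(keep_site(1, [0, 1, None, 0]))   # one missing genotype
```

With `min_call_rate=1.0`, a single missing genotype is enough to discard a site, which matches the 100% call-rate requirement stated in the text.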

PCA and UMAP analysis

The SNP data in VCF format were converted to PLINK binary format using PLINK v1.90b4.5 [27]. PCA was then carried out with PLINK. UMAP was performed on the top five principal components with the R package umap.
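As a rough guide, the two PLINK steps could be assembled as below. The sketch only builds the command lines and does not run them. The file names are hypothetical, and `--allow-extra-chr` is an assumption that may be needed when chromosome names are not in PLINK's default human-style set; the exact options used in this study are not stated.

```python
# Sketch: build PLINK 1.9 command lines for VCF conversion and PCA.
# Assumptions: file names are placeholders; --allow-extra-chr is added
# because non-human chromosome/contig names are otherwise rejected.

def plink_pca_commands(vcf_path, prefix, n_components=5):
    """Return [convert_cmd, pca_cmd] as argument lists."""
    convert = [
        "plink", "--vcf", vcf_path,
        "--allow-extra-chr",
        "--make-bed",               # write .bed/.bim/.fam
        "--out", prefix,
    ]
    pca = [
        "plink", "--bfile", prefix,
        "--allow-extra-chr",
        "--pca", str(n_components),  # write prefix.eigenvec / prefix.eigenval
        "--out", prefix,
    ]
    return [convert, pca]


if __name__ == "__main__":
    for cmd in plink_pca_commands("croaker.vcf", "croaker"):
        print(" ".join(cmd))
        # To execute for real: subprocess.run(cmd, check=True)
```

The per-sample coordinates in the resulting `.eigenvec` file would then serve as the input to UMAP.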

Construction of the neighbour-joining tree

To construct a neighbour-joining phylogenetic tree of the samples, we calculated pairwise genome-wide identity-by-state (IBS) distances from the SNPs using PLINK. A neighbour-joining tree was then constructed from the pairwise distance matrix (1-IBS) using MEGA7 [28].
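To make the 1-IBS distance concrete, the following sketch computes it for genotype vectors coded as alternate-allele counts. It assumes no missing genotypes (consistent with the 100% call-rate filter) and uses the common definition in which two diploid genotypes share 2 − |g1 − g2| alleles at a site. The function names are illustrative, and this is not PLINK's implementation.

```python
# Sketch of a pairwise 1-IBS distance matrix.
# Assumption: genotypes are alt-allele counts 0/1/2 with no missing data.

def ibs_distance(a, b):
    """1 minus the proportion of alleles shared identical-by-state."""
    if len(a) != len(b) or not a:
        raise ValueError("genotype vectors must be non-empty and equal length")
    shared = sum(2 - abs(x - y) for x, y in zip(a, b))
    return 1.0 - shared / (2 * len(a))


def distance_matrix(samples):
    """samples: dict of name -> genotype list. Returns (names, matrix)."""
    names = list(samples)
    matrix = [[ibs_distance(samples[p], samples[q]) for q in names]
              for p in names]
    return names, matrix


if __name__ == "__main__":
    toy = {"s1": [0, 1, 2, 0], "s2": [0, 1, 2, 0], "s3": [2, 1, 0, 2]}
    names, d = distance_matrix(toy)
    for name, row in zip(names, d):
        print(name, row)
```

The resulting symmetric matrix, with zeros on the diagonal, is the kind of input a neighbour-joining program expects.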

Admixture analysis

The admixture analysis was performed with ADMIXTURE v1.3.0 [29]. Cross-validation (CV) errors were estimated for each K-value. The K-value with the lowest CV error was regarded as optimal for estimating the level of admixture in each sample.
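Selecting K by the lowest CV error can be sketched as follows. The example assumes that the ADMIXTURE logs for all runs have been concatenated into one string and that each run reports a line of the form `CV error (K=3): 0.48`. The numbers in the usage example are made up for illustration and are not results from this study.

```python
# Sketch: pick the K with the lowest cross-validation error from logs.
# Assumption: log lines look like "CV error (K=3): 0.48123".
import re

CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def parse_cv_errors(log_text):
    """Return {K: cv_error} for every CV line found in the text."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}


def best_k(cv_errors):
    """Return the K with the smallest CV error."""
    if not cv_errors:
        raise ValueError("no CV errors found")
    return min(cv_errors, key=cv_errors.get)


if __name__ == "__main__":
    # Made-up values, for illustration only.
    example_log = (
        "CV error (K=1): 0.61\n"
        "CV error (K=2): 0.55\n"
        "CV error (K=3): 0.58\n"
    )
    errors = parse_cv_errors(example_log)
    print(errors, best_k(errors))
```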

Analysis of effective population sizes

The effective population size was calculated for each group using SNeP v1.1 [30]. The maximum number of SNPs per chromosome was set to 10,000.
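For orientation, LD-based estimators of this kind build on Sved's approximation E[r²] ≈ 1/(α + 4Nₑc), where c is the recombination rate between markers, and an estimate at a given c is usually read as the population size about 1/(2c) generations ago. The sketch below applies that relationship directly. It assumes α = 1 and takes c as given, and it does not reproduce SNeP's sample-size or phasing corrections, whose settings are not stated here.

```python
# Sketch: effective population size from mean LD (r^2) at recombination
# distance c, via Sved's approximation E[r^2] = 1 / (alpha + 4 * Ne * c).
# Assumptions: alpha = 1 (no mutation correction); c supplied in Morgans.

def ne_from_ld(mean_r2, c, alpha=1.0):
    """Solve E[r^2] = 1 / (alpha + 4*Ne*c) for Ne."""
    if not 0.0 < mean_r2 <= 1.0:
        raise ValueError("mean_r2 must be in (0, 1]")
    if c <= 0.0:
        raise ValueError("c must be positive")
    return (1.0 / mean_r2 - alpha) / (4.0 * c)


def generations_ago(c):
    """Approximate time depth (generations) probed by markers c apart."""
    return 1.0 / (2.0 * c)


if __name__ == "__main__":
    # Toy numbers, for illustration only.
    print(ne_from_ld(0.2, 0.01), generations_ago(0.01))
```

Because closely linked markers (small c) reflect older population sizes, binning marker pairs by distance yields a trajectory of effective size over time.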

Fst analysis

Fixation index (Fst) values between populations were calculated for all SNPs using PLINK. Per-SNP Fst values greater than 0.25 were regarded as significant. In addition, average Fst values were calculated in 40 kb sliding windows with a 10 kb step, and window averages greater than 0.05 were regarded as significant. Significant regions were merged, and the genes in these regions were reported.
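The windowed scan and region merging can be sketched as below for a single chromosome. The sketch assumes that the window average is the plain mean of per-SNP Fst values, that windows start at position 0, and that overlapping or touching significant windows are merged. These details are assumptions rather than a description of the exact procedure used.

```python
# Sketch: 40 kb sliding windows (10 kb step) over per-SNP Fst values on
# one chromosome, followed by merging of significant windows.
# Assumption: window Fst = unweighted mean of per-SNP Fst in the window.

def window_means(positions, fst_values, window=40_000, step=10_000):
    """Return [(start, end, mean_fst)] for windows containing >= 1 SNP."""
    if not positions:
        return []
    results = []
    last = max(positions)
    start = 0
    while start <= last:
        end = start + window
        vals = [f for p, f in zip(positions, fst_values) if start <= p < end]
        if vals:
            results.append((start, end, sum(vals) / len(vals)))
        start += step
    return results


def merge_significant(windows, threshold=0.05):
    """Merge overlapping/touching windows whose mean Fst exceeds threshold."""
    merged = []
    for start, end, mean_fst in windows:
        if mean_fst <= threshold:
            continue
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    # Toy data, for illustration only.
    pos = [5_000, 15_000, 100_000]
    fst = [0.20, 0.00, 0.01]
    print(merge_significant(window_means(pos, fst)))
```

Genes overlapping the merged intervals could then be looked up in the genome annotation.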

Gene ontology analysis

Gene ontology (GO) analysis was performed with Metascape [31] using the following parameters: Min Overlap = 3, Min Enrichment = 1.5, P cut-off value = 0.05. Input gene lists were analysed as zebrafish gene lists. The p-values were adjusted for multiple comparisons.
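The three thresholds can be illustrated with a simple over-representation test for a single GO term. The sketch below uses a cumulative hypergeometric p-value and an enrichment factor defined as the observed overlap fraction divided by the background fraction. It is an assumption-laden illustration, not Metascape's code, and it leaves out the multiple-testing adjustment.

```python
# Sketch: over-representation test for one GO term with the thresholds
# Min Overlap = 3, Min Enrichment = 1.5, P cut-off = 0.05.
# Assumptions: hypergeometric model; no multiple-testing correction here.
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for N background genes, K annotated, n genes in the list."""
    total = comb(N, n)
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def enrichment_factor(k, N, K, n):
    """Observed overlap fraction divided by the background fraction."""
    return (k / n) / (K / N)


def term_passes(k, N, K, n, min_overlap=3, min_enrichment=1.5, p_cutoff=0.05):
    """Apply the overlap, enrichment and p-value thresholds together."""
    return (
        k >= min_overlap
        and enrichment_factor(k, N, K, n) >= min_enrichment
        and hypergeom_sf(k, N, K, n) < p_cutoff
    )


if __name__ == "__main__":
    # Toy numbers: 100 background genes, 10 annotated, 10 in the input list.
    print(term_passes(5, 100, 10, 10), term_passes(2, 100, 10, 10))
```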

Ethics approval and consent to participate

The farmed fish were reared at a nucleus farm, the ‘Fisheries research institute of Zhoushan’ in Zhoushan City, Zhejiang Province, P.R. China, and at Fufa Aquaculture Co. Ltd in Ningde City, Fujian Province, P.R. China. The Zhoushan sea-captured individuals were caught by trawler around the Zhongjieshan islands (August–September 2019; latitude: 30.198; longitude: 122.682), and the Ningde sea-captured individuals were captured around **yang Island (November–December 2019; latitude: 26.508; longitude: 120.53). This study was approved by the Animal Care and Use Committee of Zhejiang Ocean University. All experimental protocols followed the ARRIVE guidelines. All participants consented to publish the paper.

Consent for publication

Consent for publication is not applicable to this study because it contains no individual person’s data.