Background

With a total of 107 chicken breeds, China has one of the richest local breed resources [1]. This diverse chicken genetic resource is a vital part of the world's biological genetic diversity and provides excellent material for breeding new varieties or genetically improving existing breeds.

China is the second-largest broiler producer and consumer in the world, accounting for approximately 11% of global chicken production (FAOSTAT, 2017). In China, chicken is the second-largest meat product after pork, making up 17% of total meat production. Chicken meat is mainly obtained from introduced white-feathered broilers and domestic yellow-feathered meat-type chickens (meat-type local breeds, meat-type bred varieties, and related strains containing the consanguinity of Chinese indigenous chickens), each accounting for half of the consumption. However, the current challenge is how to effectively protect and maintain the existing local varieties. In addition, improved breeding efficiency would accelerate the breeding of new chicken lines. The genome-wide SNP chip, also known as an SNP array, arranges up to 25 million DNA marker flanking sequences on a glass or special silicon chip to form the SNP probe array. It works through base pairing between the flanking sequences fixed on the chip and the target genome, so that genetic information can be identified accurately.

Genotyping arrays have been developed for pig [2], cow [3], dairy cattle [4], sheep [5], salmon [6], and buffalo [7], among others. In chicken, the first 3 K genotyping array was developed in 2005 with 3072 SNPs [8]. After that, in 2008, Groenen et al. developed a 60 K bead chip for chicken that evenly covered the whole genome [9]. To date, the only commercially available array for chicken is the Affy 600 K SNP Array (Axiom Genome-Wide Chicken Genotyping Array), which was developed by Kranis et al. [10]. The other arrays are privately owned by commercial companies. Such arrays are an important tool for genetic diversity analysis, breed relationship analysis, GWAS, QTL mapping, investigation of selective evolution, and genomic selection [11]. To date, SNP arrays remain the most efficient way to perform SNP genotyping, biodiversity measurement, QTL mapping and genomic selection. These applications provide improved technical support for the conservation of indigenous breeds and the development of new genetic lines/breeds.

One pitfall of all current chicken SNP arrays is their bias towards western commercial lines: they lack the genomic variation information of Chinese indigenous breeds. Therefore, it is imperative to develop a new genome-wide SNP chip of moderate throughput for the chicken breeding industry that also contains the genetic variation specific to Chinese indigenous breeds. Overlap with the current arrays on the different platforms (Axiom and Illumina) is essential to link the new array to the commercial SNP arrays.

Through whole genome re-sequencing of a variety of Chinese native breeds and commercial chicken lines, and by integrating SNPs associated with economic traits detected in a cross between indigenous and commercial chickens, a new publicly available moderate-density (55 K) chicken array (IASCHICK) has been developed.

Results

SNP selection was performed in four groups. The roadmap is shown in Fig. 1, and the establishment of the four groups is described in the following paragraphs.

Fig. 1 The roadmap for the design of the new chicken 55K SNP array

Genome re-sequencing of chickens supplying the first SNP group

Eight Chinese local chicken breeds or inbred lines were selected for whole genome sequencing. Each breed/line was represented by 3 pooled libraries of 16 individuals each, without individual barcodes (Table 1). The data summary of each library is provided in Additional file 1. The number of SNPs per breed/line varied from 7.09 million to 9.41 million. The average number of detected SNPs was 8.61 M in the local lines and 7.73 M in the commercial broilers. The total number of SNPs detected across all 8 breeds/lines was 15.2 M. SNPs with minor allele frequency (MAF) < 0.05 and with low ΔF were excluded from further steps. The 140 K SNPs whose allele frequencies were distinct from those of the control breeds were subsequently used as the first group of candidate SNPs.

Table 1 Sequenced chickens and the number of SNPs detected from different breeds

Selection of the second group of candidate SNPs based on the GWAS of 15 traits

The 7.42 K SNPs ranking in the top 1% for genome-wide significance across 15 traits were selected as the second group of SNPs. The details are shown in Additional file 2.

Selection of the third group of candidate SNPs based on the genes associated with economic traits

SNPs in the regions of 861 candidate genes related to economic traits were used according to previous studies of gene/protein expression profiles. A total of 66.37 K SNPs in 383 genes for breast muscle and intramuscular fat development in embryonic and post-hatching periods [...]

The fourth group of candidate SNPs derived from whole genome sequences of low- and high-RFI chickens

Whole genome sequencing of low- and high-RFI chickens was performed to locate the genomic variants for RFI based on differences in allelic frequency between high- and low-RFI chickens, as described in our previous study [...]

Designing the Affy 55K genotyping array

Based on the above four groups of candidate SNPs, a custom-made algorithm was used to finalize the array. In total, 52,184 SNPs were selected for the final array. The mean physical distance between SNPs on each chromosome is shown in Table 2. The priority 1 SNPs (the SNPs in groups 2, 3 and 4) and 25 INDELs were placed on the final SNP panel first. The next step was the addition of the priority 2 SNPs (the SNPs in group 1). The remaining 18.41 K SNPs were selected to fill the blank windows of the chicken genome that the SNPs in the four groups could not cover.

Table 2 The number of SNPs of the 55K array on each chromosome and their distance

The SNP positions of the 55K array are given in Additional file 5. The selected SNPs were derived from the following five groups (Table 3): (i) 19.2 K SNPs from whole genome sequencing of the eight chicken breeds/lines; (ii) 7.42 K trait-related SNPs from the Illumina 60 K SNP Bead Chip, which were significantly associated with 15 economic traits; (iii) 15.98 K SNPs from 861 candidate genes of target traits and the region related to high IgY level; (iv) 4.32 K SNPs related to chicken RFI; and (v) 18.41 K SNPs from chicken SNPdb. In the final genotyping array, 99.85% of SNPs could be annotated (Table 4). The distribution of SNPs on the chromosomes is shown in Fig. 2.

Table 3 The number of SNPs from five candidate groups in the final 55K array
Table 4 Summary of the SNPs effect prediction in 55K array
Fig. 2 The chromosome-wise SNP density of the 55K SNP array. Chromosome length is shown on the left axis (based on galGal-5) and SNP density on the right axis

The comparisons of the Affy 55K array with the existing chicken arrays (Affy 600 K array, and Illumina 60 K)

All the SNPs of this 55K array, the Affy 600 K array [10], and the Illumina 60 K array [9] were mapped to the latest chicken genome (GRCg6a). The overlap of the 3 arrays is shown in Fig. 3. There are 6740 SNPs (13% of the 55K array) that overlap between the Affy 55K array and the Illumina 60 K array. Compared with the Affy 600 K array, 24,227 SNPs of the 55K array overlap, which accounts for 46%. A total of 21,412 SNPs in the 55K array are new compared to the existing arrays.

Fig. 3 The comparison of the overlap of the SNP positions among the Affy 55K array, the Affy 600 K array and the Illumina 60 K array

Validation of the 55K array in 13 chicken breeds/lines

All samples from 10 Chinese local breeds (Chahua, Dagu, Liyang, Luhua, Qingyuan, Silkie, Wenchang, Bai’er, [...] data to the high-density SNP genotyping data is possible. In the new 55K genotyping array, 69% of SNPs are within genes (non-intergenic variants); this proportion is higher than that of the Affy 600 K array (54%) and lower than that of the Illumina 60 K array (86%).

To investigate the ability of our 55K panel to detect polymorphisms and population structure in local or commercial breeds/lines, nine Chinese local breeds (Chahua, Dagu, Liyang, Luhua, Qingyuan, Silkie, Wenchang, Bai’er, and [...] array can be used to determine genetic variation both in various local Chinese breeds and in commercial meat-type and egg-type breeds.

According to the results of MDS analysis (Fig. 4), individuals originating from the commercial broilers, Hubbard and Cobb, clustered together tightly, and the two Chinese indigenous egg-type breeds, [...] array. The 55K array has a medium SNP density, is cost-efficient, and is better suited to Chinese local breeds than the existing 600 K commercial array. Furthermore, the 55K genotyping array incorporates known SNP loci with a high potential for association with economic traits, including traits that are expensive and difficult to measure, which will be of interest for both GWAS and genomic selection (GS) projects.

With the rapid development of next-generation sequencing technologies and the reduction of their costs, genotyping with re-sequencing (IBS) will be the focus of future research. In the current phase, however, the IBS system is more complex and less robust than the SNP array. Array genotype data can be easily analyzed and standardized because the SNP positions on the array are constant, and batch effects between different laboratories and companies can be excluded.

Conclusions

In conclusion, we developed an Affy 55K genotyping array designed to use SNPs that segregate in Chinese local chicken breeds and commercial lines/breeds, and in which a large number of SNPs are associated with economic traits. Compared to the existing Affy 600 K and Illumina 60 K arrays, 21.41 K new SNPs were included in the 55K SNP array. The results from our Affy 55K genotyping array can be imputed to high-density SNP genotyping data. This array offers a wide range of potential applications, such as the evaluation of germplasm resources of chicken breeds, investigation of the diversity of different chicken breeds, and implementation of genome-wide association studies and genomic selection.

Methods

Animals

For whole genome sequencing, 384 chickens were sampled from eight local breeds or inbred lines (Table 1). Chickens were supplied by the Institute of Animal Sciences of CAAS (local breed Beijing-You, inbred Jingxing-Huang line), Jiangsu Lihua Co. Ltd. (Cyan-shank Partridge lines with fast and medium growth rates, respectively), the Institute of Poultry Sciences of CAAS (Sanhuang chicken and Recessive White chicken), and Xinguang Nongmu Co. Ltd. (paternal and maternal lines of Cobb in the parental generation). In addition, a set of 15 to 21 chickens from each breed/line was used for SNP array evaluation; these were sampled from 9 local breeds and 3 commercial lines. These chickens were supplied by the Institute of Poultry Sciences of CAAS (Bai’er chicken, Chahua chicken, Dagu chicken, Liyang chicken, Qingyuan chicken, Silkie, Wenchang chicken, Luhua chicken and Xianju chicken), Xinguang Nongmu Co. Ltd. (paternal lines in the parent generation from Cobb and Hubbard), and the Institute of Animal Sciences of CAAS (White Leghorn). Two groups with 87 and 100 chickens from Jingxing-Huang and Cobb were also used for SNP array evaluation. The blood samples used in this study were all collected from chickens under veterinary supervision and in accordance with the Guidelines for Experimental Animals established by the Ministry of Science and Technology (Beijing, China), and with the approval of the Animal Ethics Committee of the Institute of Animal Sciences. No anaesthesia or euthanasia methods were used. Health examination showed no evidence that any of the involved chickens had clinical diseases caused by the sampling.

Whole genome re-sequencing

Genomic DNA was isolated from blood samples by the phenol-chloroform method. The quality of the DNA samples was validated by gel electrophoresis and Nanophotometer. The individual DNA samples (48 from each breed/line) were pooled to construct three libraries, with each library containing 8 males and 8 females. The libraries were constructed using the Nextera DNA Library Preparation Kit (Illumina Inc., San Diego, CA) according to the manufacturer’s standard protocol. All libraries were sequenced on the Illumina Hiseq2500 (2 × 125 bp).

Genome sequence alignment and detection of the first group of candidate SNPs

Reads were filtered for low quality (> 10 consecutive nucleotides with Phred scores < 10), adaptor sequences, and sequences without a quality control-passed paired read using NGSQC toolkit (v2.3.3) [22]. The sequencing coverage of each trimmed pool is shown in Table S5. Filtered reads were mapped to the reference genome (Gallus_gallus_4.0) by BWA software (v0.7.10) [23]. PCR duplicates were removed with the rmdup command in Samtools (version 0.1.1.18) [24]. SNPs were identified and genotyped for each data set with the mpileup function in Samtools, then called by VarScan [25]. Only highly confident variants supported by both methods were kept for downstream analyses. The detailed SNP calling parameters were described by Liu et al. [16]. The SNPs with MAF < 0.05 and the INDELs in each breed/line were filtered by vcftools [26]. ΔF was calculated as the MAFs in Beijing-You chicken, Jingxing-Huang chicken, Sanhuang chicken, and the two lines of Cyan-shank Partridge minus the MAFs of the Cobb paternal line, and as the MAFs of Recessive White chicken and the paternal and maternal lines of Cobb minus the MAFs of Beijing-You chicken, respectively. The SNPs with low ΔF were excluded. The ΔF threshold was adjusted so that 140 K SNPs were retained in local breeds and commercial lines to generate the first group of candidate SNPs. The thresholds of ΔF in local breeds and commercial lines are 0.609 and 0.731, respectively. The SNPs acquired through genome re-sequencing of the eight breeds/lines supplied the major data for the first group of SNPs in the array. SNPs specific to chromosome W were removed and were not considered in the current design. There are also 25 INDELs of special interest, which were defined as priority 1.
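The ΔF filtering step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes ΔF is the absolute allele-frequency difference between a target breed and its control, and the function names and the dictionary layout (SNP id mapped to allele frequency) are hypothetical.

```python
def delta_f(freq_target, freq_control):
    """Allele-frequency difference for one SNP (assumed to be taken as an absolute value)."""
    return abs(freq_target - freq_control)


def select_candidates(freqs_target, freqs_control, threshold, min_maf=0.05):
    """Return the SNP ids that pass the MAF filter in the target breed
    and whose delta-F against the control breed reaches the threshold."""
    kept = []
    for snp_id, f_target in freqs_target.items():
        if snp_id not in freqs_control:
            continue  # SNP not observed in the control breed
        maf = min(f_target, 1.0 - f_target)
        if maf < min_maf:
            continue  # excluded by the MAF < 0.05 rule
        if delta_f(f_target, freqs_control[snp_id]) >= threshold:
            kept.append(snp_id)
    return kept


# Toy allele frequencies (invented for illustration only).
local = {"snp1": 0.90, "snp2": 0.50, "snp3": 0.02}
control = {"snp1": 0.10, "snp2": 0.40, "snp3": 0.90}
candidates = select_candidates(local, control, threshold=0.609)
```

With the toy values above, only `snp1` is retained: `snp2` has too small a frequency difference and `snp3` fails the MAF filter.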

Selection of the second group of candidate SNPs based on GWAS analysis of 15 traits

The second group of candidate SNPs was selected according to a GWAS analysis of 15 traits. Phenotype and genotype data were generated from the CAAS chicken F2 resource population as described in Sun’s report [27]. Briefly, the population was derived from a cross between local Beijing-You chickens and commercial Cobb broilers (Cobb-Vantress, Inc.). The weight, carcass, immune and meat quality traits were measured in 367 F2 chickens. The 15 traits were as follows: (a.) body weight at day 28 and day 42, (b.) carcass traits including total weight percentage after slaughtering, breast muscle weight percentage, leg muscle weight percentage, abdominal fat percentage, (c.) meat quality traits including the breast muscle intramuscular fat ratio, ultimate pH (24 h), meat lightness, redness value and yellowness value of breast muscle, (d.) immune traits including IgY level to sheep red blood cell, the heterophil and lymphocyte ratio, IgY level in serum, and the average red blood cell backlog.

SNPs were genotyped using the Illumina 60 K SNP Bead chip for chicken [9]. A full description of the phenotypes was reported by Sun et al. in 2013 [27]. To maximize the polymorphism resources for the SNP array, the GWAS analysis was performed with the GLM procedure in PLINK software (version 1.07) [28] on the 42,585 SNPs that passed quality control. The details were described by Sun et al. [27]. The SNPs with the top 1% lowest p-values were used in the following procedures.
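The ranking step can be illustrated with a short sketch. It assumes, purely for illustration, that the top 1% is taken separately for each trait and the per-trait selections are then merged; the text does not state whether the cut was applied per trait or across traits, and the function names and input layout are hypothetical.

```python
def top_percent(pvalues, percent=1):
    """Return the ids of the SNPs with the lowest p-values (top `percent` %, at least one)."""
    n_keep = max(1, len(pvalues) * percent // 100)
    ranked = sorted(pvalues, key=pvalues.get)  # ascending p-value
    return set(ranked[:n_keep])


def merge_traits(per_trait_pvalues, percent=1):
    """Union of the top-ranked SNPs over all traits (assumed merging rule)."""
    selected = set()
    for pvalues in per_trait_pvalues.values():
        selected |= top_percent(pvalues, percent)
    return selected


# Toy input: 200 SNPs and two traits with opposite rankings.
trait_a = {f"snp{i}": (i + 1) / 1000 for i in range(200)}
trait_b = {f"snp{i}": (200 - i) / 1000 for i in range(200)}
second_group = merge_traits({"trait_a": trait_a, "trait_b": trait_b})
```

Using an integer percentage avoids floating-point rounding when the number of SNPs to keep is computed.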

Selection of the third group of candidate SNP based on the associated genes for target traits

Known candidate genes for economic traits were collected and used for the SNP array design. All genes were identified through previous research by our group [12, 13, 29, 30]. We retrieved a total of 861 genes related to skeletal muscle and intramuscular fat development, chicken fat metabolism, Salmonella enteritidis resistance, etc. (Additional file 2). The SNPs were annotated by the Ensembl tool VEP [31]. Mutations and SNPs in the exons, splicing regions, and UTRs were selected first. A maximum of 5 candidate SNPs were selected for each gene.

In addition, the SNPs in this group also included a batch of SNPs detected by capture sequencing of Chr. 11, Chr. 16, and Chr. 19 of White Leghorns and Beijing-You chickens with low or high serum IgY (Liu et al., unpublished, Supplement Table S3).

Selection of the fourth group of candidate SNPs for RFI

The fourth group of candidate SNPs was selected from a whole genome re-sequencing study of low- and high-RFI Cobb and Beijing-You chickens. SNP calling showed that 8,505,214 and 8,479,041 single nucleotide polymorphisms (SNPs) were detected in low- and high-RFI Beijing-You chickens, respectively; 8,352,008 and 8,372,769 SNPs were detected in low- and high-RFI Cobb chickens, respectively. The SNPs with Fst value < 5% in each breed were excluded, followed by SNPs with mean ΔF < 0.35 between low- and high-RFI chickens. After these filtering steps, 3.74 K SNPs assigned to 1137 candidate genes in Beijing-You chickens and 0.58 K SNPs (448 genes) in Cobb chickens were retained [16].

Selection of the SNPs from chicken SNPs database

The first four groups cannot cover the whole genome evenly. For the fifth group, SNPs were therefore selected from the NCBI chicken SNP database (ftp://ftp.ncbi.nih.gov/snp/organisms/archive/chicken_9031/).

SNP screening according to the scoring of probes

All the SNP positions were transformed from WASHUC2.1 (Illumina 60 K) and Gallus_gallus-4.0 (Affy 600 K) to Gallus_gallus-5.0 (Affy 55 K) by the LiftOver tool on the UCSC Genome Browser. Using all SNPs from the five candidate groups above, in silico validation was performed with the AxiomGTv1 algorithm of APT, which generated an output score file containing p-convert values, which indicate SNP array quality, and a list of recommended and non-recommended SNP probes. For a high-quality SNP array design, all non-recommended SNP probes were excluded from the following procedure.

SNPs selection procedure for the final 55K array

The final SNPs selection was done in multiple steps using several criteria. The roadmap is shown in Fig. 1.

A custom-made algorithm was applied as described below. According to Gallus_gallus-5.0, the chicken genome length is about 1.2 Gb. To ensure that the probe positions were evenly distributed across the chicken genome, the whole genome was divided into windows of 22 kb. Each subsequent window started from the position of the preceding probe. The selection for the final array was performed on each chromosome separately. The SNPs of the first four groups were divided into 2 priorities. The SNPs in group 2, group 3, group 4, and the INDELs in group 1 were defined as priority 1, and the SNPs in group 1 were defined as priority 2.
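As a rough consistency check on the window size (an illustration added here, not a calculation taken from the design procedure), dividing the approximate genome length by the window length gives the number of non-overlapping 22 kb windows. The result is close to the 55K probe budget of the array; because each window actually starts from the preceding probe, the real count will differ somewhat.

```latex
\[
\frac{1.2 \times 10^{9}\ \text{bp}}{2.2 \times 10^{4}\ \text{bp per window}}
\approx 5.45 \times 10^{4}\ \text{windows}
\]
```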

  1. The selection of the SNPs in priority 1. a) If there is no SNP in a 22 kb window, the window is reserved for the later steps. b) If there are one or two SNPs in the window, the SNP(s) are retained. c) If there are 3 or more SNPs in a window, only 2 SNPs in this window are retained, chosen so that the SNPs are evenly distributed in the window according to the following formula: \( SD^2 = \frac{(S-\bar{x})^2 + (N_i-\bar{x})^2 + (N_j-\bar{x})^2 + (E-\bar{x})^2}{4} \). In this formula, \( S \) and \( E \) are the start and end positions of the window, respectively, and \( N_i \) and \( N_j \) are the positions of the target SNPs in the window. The SNPs \( N_i \) and \( N_j \) that minimize \( SD^2 \) are retained.

  2. The selection of priority 2 SNPs. Windows that already hold 1 or 2 SNPs are skipped. Windows without an SNP are filled with one priority 2 SNP according to the formula described above.

  3. Windows still without any SNP are filled with 1 SNP from the NCBI SNPdb of chicken, with validated SNPs given priority for filling.
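The priority 1 rule for a single window can be sketched as below. This is an illustrative reading of the formula, not the authors' custom algorithm: the text does not define x̄, so the sketch assumes it is the mean of the four positions S, N_i, N_j and E, and the function names are hypothetical.

```python
from itertools import combinations


def sd2(start, end, n_i, n_j):
    """SD^2 of the window start, the two SNP positions and the window end.
    Assumption: x-bar is the mean of these four positions."""
    points = (start, n_i, n_j, end)
    mean = sum(points) / 4
    return sum((p - mean) ** 2 for p in points) / 4


def pick_in_window(start, end, positions):
    """Apply rules a) to c): keep 0, 1 or 2 SNPs as they are,
    otherwise keep the pair of SNPs that minimizes SD^2."""
    inside = sorted(p for p in positions if start <= p < end)
    if len(inside) <= 2:
        return inside
    best_pair = min(combinations(inside, 2),
                    key=lambda pair: sd2(start, end, *pair))
    return list(best_pair)


# Toy 22 kb window with four candidate SNP positions (invented values).
chosen = pick_in_window(0, 22_000, [1_000, 10_000, 12_000, 21_000])
```

Under this reading of x̄ the criterion favours the pair closest to the window midpoint (here 10,000 and 12,000); a different definition of x̄, for example one based on the spacing between neighbouring positions, would change which pair is chosen.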

The final array contains 55K probes for 52 K SNPs and was manufactured by Affymetrix® using photolithography. The redundant probes are used for interrogating each SNP [32, 33]. The final 52 K SNPs were annotated by the online tool Ensembl VEP [34].

The comparisons of the 55K Affy array with the existing arrays (Affy 600 K array, and Illumina 60 K)

All the SNP positions were transformed from WASHUC2.1 (Illumina 60 K), Gallus_gallus-4.0 (Affy 600 K) and Gallus_gallus-5.0 (Affy 55 K) to GRCg6a by the LiftOver tool on the UCSC Genome Browser. All the SNP positions of the three genotyping arrays were compared. The SNPs on the 600 K array and the 60 K array were also annotated with Ensembl VEP [31]. The Venn diagram of overlaps was drawn with the Calculate and draw custom Venn diagrams website (http://bioinformatics.psb.ugent.be/webtools/Venn/).
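Once all arrays share the same coordinate system, the position comparison reduces to set intersection. The sketch below is a minimal illustration under the assumption that each SNP is identified by a (chromosome, position) pair; the helper name and the toy coordinates are hypothetical, and only the counts in the last lines are taken from the Results.

```python
def overlap_summary(array_a, array_b):
    """Number of shared positions and their share (in %) of array_a."""
    shared = array_a & array_b
    return len(shared), 100.0 * len(shared) / len(array_a)


# Toy position sets after lifting both arrays to one assembly (invented values).
new_array = {("1", 100), ("1", 200), ("2", 50), ("Z", 7)}
old_array = {("1", 200), ("Z", 7), ("3", 9)}
n_shared, pct_shared = overlap_summary(new_array, old_array)

# Percentages for the reported overlap counts, relative to the 52,184 SNPs of the 55K array.
pct_vs_60k = 100.0 * 6740 / 52184
pct_vs_600k = 100.0 * 24227 / 52184
```

The two reported counts correspond to roughly 13% and 46% of the 55K array, matching the percentages given in the Results.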

Validation of the 55K array in 13 chicken breeds/lines

The genomic DNA from 12 breeds/lines (Chahua, Dagu, Liyang, Luhua, Qingyuan, Silkie, Wenchang, Bai’er, and [...] was done on Axiom® arrays using the Affymetrix® GeneTitan® system according to the procedure described by Affymetrix (https://assets.thermofisher.com/TFS-Assets/LSG/manuals/702899_PI.pdf) at Beijing Compass Biotechnology Co., Ltd. (Beijing, China).

Basic genotype statistics for each marker, including call rate, MAF, Hardy-Weinberg Equilibrium (HWE), and allele and genotype counts, were calculated using the Quality Assurance Module from the SNP Variation Suite version 7 (SVS; Golden Helix Inc., Bozeman, Montana: www.goldenhelix.com). The following quality control criteria (filtering) were applied: SNPs with a call rate below 95% were removed from further analysis, as were SNPs with MAF below 0.05. SNPs were tested for HWE (P < 0.001) to identify possible typing errors. Samples with more than 10% missing genotypes were removed from the study.
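The marker-level filters can be sketched for a single SNP as follows. This is not the SVS implementation: it assumes that SNPs failing the HWE test are dropped, and it swaps in a one-degree-of-freedom chi-square approximation for the HWE test because the text does not say which test was used. The function name and argument layout are hypothetical.

```python
import math


def snp_passes_qc(n_aa, n_ab, n_bb, n_missing,
                  min_call_rate=0.95, min_maf=0.05, hwe_alpha=0.001):
    """Return True if a SNP passes the call-rate, MAF and HWE filters."""
    n_called = n_aa + n_ab + n_bb
    n_total = n_called + n_missing
    if n_called == 0:
        return False
    if n_called / n_total < min_call_rate:
        return False
    p = (2 * n_aa + n_ab) / (2 * n_called)  # frequency of allele A
    if min(p, 1.0 - p) < min_maf:
        return False
    # Expected genotype counts under Hardy-Weinberg proportions.
    expected = (n_called * p * p,
                2 * n_called * p * (1.0 - p),
                n_called * (1.0 - p) ** 2)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return p_value >= hwe_alpha


# Toy genotype counts (AA, AB, BB, missing), invented for illustration.
in_equilibrium = snp_passes_qc(25, 50, 25, 0)
no_heterozygotes = snp_passes_qc(50, 0, 50, 0)
```

The first toy SNP sits exactly at Hardy-Weinberg proportions and passes, while the second has no heterozygotes and is rejected by the HWE filter.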

MDS was performed using the genotype data of the SNPs from the 55K panel on the samples from all breeds (n = 226) to assess the utility of the panel in detecting population structure. Population structure among the 12 breeds was analyzed with the MDS method in PLINK software (version 1.90b3) [28], and the plot was drawn with ggplot2 [35]. The linkage disequilibrium in 2 populations was analyzed with GAPIT [36]. The LD decay plots produced by PopLDdecay software are presented at the whole genome level and at the chromosome level, with a smaller break point size of 5 Kb and a bigger break point size of 40 Kb [37].