Introduction

Genetic variants of the FTO gene were the first common genetic polymorphisms to be associated with increased body weight and obesity.1 This association was replicated in several distinct populations.2, 3, 4, 5, 6, 7, 8, 9

SOLiD sequencing

Libraries were prepared from each pool, and emulsion PCR was carried out according to the instructions from Applied Biosystems (Foster City, CA, USA). Sequencing was performed on the Applied Biosystems SOLiD 3 platform using a 50-bp read length on standard slides, according to the manufacturer's protocol. The reads were mapped to the FTO reference sequence (hg18, chr16:52 285 069-52 716 675) using the Corona Lite algorithm (Applied Biosystems) with default settings.

Variant calling and filtering

The methodology used to identify SNPs and insertions and deletions (indels), and to filter out false positives, is covered in greater detail in Supplementary Information 1. Briefly, SNPs are reported by the SOLiD system as valid adjacent mismatches (that is, not sequencing errors) compared with the reference sequence. However, to limit the number of false-positive SNPs, we implemented a strategy that we previously used on data from SOLiD sequencing of pooled DNA.26 This method focuses on filtering out excessively amplified reads containing valid adjacent mismatches, which are probably derived from errors in previous PCR amplification steps rather than from true polymorphic sites. A quality score of unique valid adjacent mismatches (UVAM) was calculated for each candidate polymorphic site, and a UVAM threshold was determined by comparing the score distribution of the detected candidate sites with that of sites found in the dbSNP database and the 1000 Genomes Project (Figures 2a–d). Sites with a quality score below the threshold were considered false positives and excluded. The filtration step was validated by genotyping 48 candidate SNPs, selected before filtration, with a 48-plex GoldenGate assay from Illumina (San Diego, CA, USA).28
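The thresholding idea can be illustrated with a minimal sketch. It is not the procedure from Supplementary Information 1: the null model here is a plain binomial standing in for the distribution used in the study, and the names `binom_tail`, `pick_threshold`, `filter_sites`, `n_sites`, `n_trials` and `p_error` are hypothetical, as are all numbers. The sketch picks the smallest score threshold at which the expected share of false positives among retained sites is at most 1%, then drops sites scoring below it.

```python
from math import comb


def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); an assumed stand-in null model."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def pick_threshold(scores, n_sites, n_trials, p_error, fdr=0.01):
    """Smallest integer threshold whose estimated false discovery rate <= fdr.

    scores   : UVAM-like scores of the candidate sites (toy data)
    n_sites  : number of non-polymorphic positions examined (assumption)
    n_trials : binomial size of the null model (assumption)
    p_error  : per-trial error probability of the null model (assumption)
    """
    for t in range(1, max(scores) + 1):
        retained = sum(s >= t for s in scores)
        if retained == 0:
            break
        # Expected number of non-SNP sites that would still reach score t.
        expected_false = n_sites * binom_tail(t, n_trials, p_error)
        if expected_false / retained <= fdr:
            return t
    return None


def filter_sites(site_scores, threshold):
    """Keep sites at or above the threshold; lower-scoring sites are excluded."""
    return {site: s for site, s in site_scores.items() if s >= threshold}


toy_scores = {"siteA": 1, "siteB": 1, "siteC": 1, "siteD": 2, "siteE": 2,
              "siteF": 3, "siteG": 8, "siteH": 9, "siteI": 10}
threshold = pick_threshold(list(toy_scores.values()),
                           n_sites=1000, n_trials=20, p_error=0.01)
kept = filter_sites(toy_scores, threshold)
```

In this toy example only the three highest-scoring sites survive, which mirrors the qualitative picture in which most novel candidates fall below the threshold.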

Figure 2

To remove false-positive SNPs, the distribution of unique valid adjacent mismatches (UVAM) from the SOLiD sequencing was compared between known SNPs from the dbSNP database (a) and the 1000 Genomes Project (b) and the candidate SNPs from the sequencing of the obese (c) and lean (d) groups. The clear difference in distribution between previously known and novel candidate SNPs indicates that most of the candidates are false positives. Based on this, a binomial hypergeometric distribution was used to model the probability that a non-SNP site has a given UVAM score in the obese (e) and lean (f) groups. This allowed us to set a threshold for each group (marked in red) that corresponds to a false discovery rate of 1%. The color reproduction of this figure is available at the International Journal of Obesity online.

Indels with an estimated minor allele frequency >1% were detected using two methods for gapped read mapping and included for further analysis.

Data analysis

Association tests based on allele frequencies were performed using Fisher's exact test, together with an approximation of the 95% confidence interval of the odds ratio for each SNP and indel, using the software R (http://www.r-project.org, R Foundation for Statistical Computing, Vienna, Austria). To adjust for multiple testing, the significance level was corrected for the 44 haplotype blocks, each containing SNPs in linkage disequilibrium, that we identified across the sequenced region in a previous study of a Swedish population.29 Consequently, P-values below 1.14 × 10−3 (0.05/44) were considered significant.
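As an illustration of the allele-based test and the corrected significance level, the following minimal sketch implements a two-sided Fisher's exact test on a 2 × 2 table of allele counts using only the Python standard library. The study itself used R. The allele counts are invented, the function names are hypothetical, and the Woolf (log odds ratio) interval is an assumed approximation that may differ from the one used by the authors.

```python
from math import comb, exp, log, sqrt


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def odds_ratio_woolf_ci(a, b, c, d, z=1.96):
    """Sample odds ratio with an approximate 95% CI (Woolf's log method)."""
    odds_ratio = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, exp(log(odds_ratio) - z * se), exp(log(odds_ratio) + z * se)


# Significance level corrected for the 44 haplotype blocks named in the text.
alpha_corrected = 0.05 / 44

# Invented allele counts: rows = obese / lean pool, columns = minor / major allele.
p_value = fisher_exact_two_sided(60, 340, 30, 370)
or_est, ci_low, ci_high = odds_ratio_woolf_ci(60, 340, 30, 370)
is_significant = p_value < alpha_corrected
```

Formatting `alpha_corrected` to three significant figures reproduces the 1.14 × 10−3 cut-off stated above.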

Results from the individual genotyping (GoldenGate and TaqMan assays) were analyzed with PLINK (http://pngu.mgh.harvard.edu/purcell/plink/). The validated SNPs were checked for deviation from Hardy–Weinberg equilibrium using Pearson's χ2-test (1 d.f.) before being tested for association with obesity using logistic regression to calculate odds ratios with 95% confidence intervals, controlling for sex and assuming an additive model. For details on the ENCODE data, see Supplementary Information 1.
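The Hardy–Weinberg check can be sketched as follows. This is a minimal standard-library illustration of a Pearson χ2-test with 1 degree of freedom on genotype counts, not a reproduction of PLINK's implementation; the function name `hwe_chi2_test` and the genotype counts are hypothetical.

```python
from math import erfc, sqrt


def hwe_chi2_test(n_aa, n_ab, n_bb):
    """Pearson chi-square test (1 d.f.) for Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb are observed counts of the three genotypes.
    Returns the chi-square statistic and its P-value.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # For 1 d.f., the upper tail of the chi-square distribution equals
    # the two-sided tail of a standard normal at sqrt(chi2).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Invented genotype counts for a single SNP.
chi2_stat, hwe_p = hwe_chi2_test(298, 489, 213)
```

A SNP whose P-value falls below a chosen cut-off would be flagged as deviating from equilibrium before the association analysis; the cut-off itself is not stated in this section.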

Genotyping

For details on genotyping of candidate SNPs and validation in Latvian and Greek populations, see Supplementary Information 1.