Background

Nuclear receptors are critical for proper development and function of many physiological pathways including lipid metabolism, inflammation, and cell growth [1-3]. Over the past 25 years, it has become clear that nuclear receptors are also critical for the onset and progression of many diseases, including cancer. In breast cancer, for example, estrogen receptor-α (ERα) is expressed and drives tumor growth in approximately 2/3 of cases. However, only recently has it been appreciated that proper nuclear receptor function is absolutely dependent on the interaction with coregulator proteins [4]. These proteins couple nuclear receptors with RNA polymerase II and chromatin remodeling machinery to either activate (coactivators) or repress (corepressors) nuclear receptor-mediated gene transcription. Because a single coregulator or a subset of coregulators can simultaneously regulate multiple cellular processes through multiple nuclear receptors, they have been classified as 'master regulators' [3]. In keeping with this classification, many coregulators have been implicated in numerous human diseases, including breast cancer [5-10].

Family history is one of the strongest risk factors for breast cancer, with the risk approximately doubled in first-degree relatives of women with breast cancer compared to the general population [11]. Because of this, many attempts have been made to identify genetic risk factors using multiple approaches. However, despite the identification of mutations in the major risk factor genes such as BRCA1, BRCA2, PTEN, CHEK2, and ATM, it is estimated that ~75% of familial breast cancers have as yet unidentified risk alleles [12]. ERα is expressed in, and drives, a large fraction of breast cancer cases and is therefore an excellent candidate gene for identifying breast cancer risk factors. Recently, a significant association with familial breast cancer risk has been observed for the C allele of ESR1_rs2747648 in an allele dose-dependent manner. This variant is located in a miRNA-binding site in the 3' untranslated region of ESR1 [13]. However, historically very few associations have been found between SNPs in ERα and breast cancer risk. Further, a recent comprehensive search of all SNPs in ERα revealed no major risk associations (n>55,000 breast cancer cases and controls) [14]. This suggests that other players in the ER signaling pathway may be important for breast cancer risk. Because of the critical importance of coregulators for ERα function, we hypothesized that breast cancer risk is influenced by SNPs within the coactivators SRC-1/NCoA1 and SRC-3/NCoA3/AIB1 and the corepressors NCoR and SMRT/NCoR2.

We previously reported two SNPs in SRC-3 (rs2230782 and rs2076546) associated with reduced breast cancer risk in a case-control study of German and Polish high-risk, BRCA1/2 mutation-negative women (cases: 775, controls: 1628) [15]. In a recent study by Haiman et al [

Methods

SNP Discovery

Target sequence obtained from NCBI, consisting of all exons, 500 bp of proximal promoter, and 25 bp of flanking introns from SRC-1, SRC-3, NCoR, and SMRT, was submitted for primer design and Sanger sequencing to Polymorphic DNA Technologies Inc. (Alameda, CA). DNA from 96 samples (48 Caucasian American, 48 African American) obtained from the Coriell Institute (Camden, NJ, USA) (sample sets: HD100CAU and HD100AA) was sequenced in both directions and aligned to the NCBI reference sequence and to previously reported SNPs in dbSNP. These samples had been collected and anonymized by the National Institute of General Medical Sciences. Chromatograms were visually inspected for heterozygous calls.

Genotyping Cohort

A case-control study was conducted investigating a German familial breast cancer study cohort. Unrelated, German, female BRCA1/2 mutation-negative index cases from breast cancer families were used in this study. The samples, all of Caucasian origin, were collected during the years 1997-2005 by six centers of the German Consortium for Hereditary Breast and Ovarian Cancer (GC-HBOC: centers of Heidelberg, Würzburg, Cologne, Kiel, Düsseldorf and Munich, see authors' affiliations). Familial cases were identified based on (A1) families with two or more breast cancer cases including at least two cases with onset below the age of 50 years; (A2) families with at least one male breast cancer case; (B) families with at least one breast cancer and one ovarian cancer case; (C) families with at least two breast cancer cases including one case diagnosed before the age of 50 years; (D) families with at least two breast cancer cases diagnosed after the age of 50 years; (E) single cases of breast cancer with age of diagnosis before 35 years. These selection criteria, which have previously been reported [17], enrich for cases caused by genetic factor(s). The control population included healthy and unrelated female blood donors collected by the Institute of Transfusion Medicine and Immunology (Mannheim), sharing the ethnic background and sex with the breast cancer patients. The age distribution in the controls and cases was similar (controls: mean age 45.6 years, median age 46 years, age range from 18 to 68 years old; cases: mean age 45.1 years, median age 45 years, age range from 19 to 87 years old). According to the German guidelines for blood donation, all blood donors were examined by a standard questionnaire and gave their informed consent. They were randomly selected during the years 2004-2007 for this study and no further inclusion criteria were applied during recruitment. The study was approved by the Ethics Committee of the University of Heidelberg (Heidelberg, Germany).

Genotyping

Genotyping was conducted using TaqMan allelic discrimination assays. Primers and TaqMan MGB probes were purchased from Applied Biosystems (Foster City, CA).

SRC-3 Q586H: forward 5'-CTGGGCTTTTATTGCGACCAAA-3', reverse 5'-GCTCTCCTTACTTTCTTTGTCACTGA-3'; TaqMan probes: forward 5'-TTCAATGTGTCACTCAAAT-3'-VIC, reverse 5'-CAATGTGTCAGTCAAAT-3'-FAM.

SRC-3 T960T: forward 5'-CCTGCACTGGGTGGCT-3', reverse 5'-CTCGCACCTGGTATGCTATTAGAC-3'; TaqMan probes: forward 5'-CTATTCCCACATTGCCTC-3'-VIC, reverse 5'-TTCCCACGTTGCCTC-3'-FAM.

SRC-3 R218C: forward 5'-AGACATAAACGCCAGTCCTGAAATG-3', reverse 5'-GCCAGAGATATGAAACAATGCAGTG-3'; TaqMan probes: forward 5'-TGAAATGCGCCAGAG-3'-VIC, reverse 5'-TGAAATGTGCCAGAG-3'-FAM.

SRC-1 P1272S: forward 5'-CCCTCCTCCTCAGAGTTCTCT-3', reverse 5'-CCTTCATGTCTGGTGACTGATACC-3'; TaqMan probes: forward 5'-CAGGTGGAGTTTGC-3'-VIC, reverse 5'-CAGGTGAAGTTTGC-3'-FAM.

SMRT A1706T: forward 5'-ACCTCGCAGCAGATGCA-3', reverse 5'-GAGGCCCCTCAGCATATCAG-3'; TaqMan probes: forward 5'-CCACAACACGGCCAC-3'-VIC, reverse 5'-CACAACGCGGCCAC-3'-FAM.

Genotyping call rates for all studies were >97%. The SNP assays were validated by re-genotyping 5% of all samples. The concordance rate for all SNPs varied from 99 to 100%.

Statistical Analysis

Hardy-Weinberg equilibrium was tested using the chi-square "goodness-of-fit" test. Crude odds ratios (ORs), 95% confidence intervals (95% CIs) and P values were computed by unconditional logistic regression using a tool offered by the Institute of Human Genetics, Technical University Munich, Germany http://ihg.gsf.de/cgi-bin/hw/hwa1.pl. Power calculations were performed using the power and sample size calculator software PS version 2.1.31 http://www.mc.vanderbilt.edu/prevmed/ps/. With the total sample size, we had 80% power to detect ORs of 0.79/1.26 and 0.57/1.56 for carrier frequencies of 30% and 5%, respectively.
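To make the Hardy-Weinberg check concrete, the following is a minimal sketch of a chi-square goodness-of-fit test with one degree of freedom. It is not the implementation of the online tool cited above; the function name `hwe_chi_square` and the genotype counts are hypothetical and serve only as illustration.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test (1 df) for Hardy-Weinberg equilibrium.

    Arguments are observed genotype counts: homozygous A, heterozygous,
    homozygous B. Returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p
    if p in (0.0, 1.0):
        raise ValueError("monomorphic locus: expected counts would be zero")
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square variable with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical control genotype counts, for illustration only.
chi2, p_value = hwe_chi_square(30, 40, 30)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

A p value above the chosen threshold (commonly 0.05) gives no evidence of departure from equilibrium in the controls.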

Haplotype Analysis

Haplotypes of variants located in the same gene were determined using the PHASE 2 software created by Stephens et al. [18], or SNPHAP 1.3 software created by David Clayton http://www-gene.cimr.cam.ac.uk/clayton/software/snphap.txt. Each individual was assumed to carry the most likely pair of haplotypes and the haplotype distributions were estimated based on the controls.
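The assignment step described above can be sketched as follows, assuming that the phasing software has already produced, for each individual, a list of candidate haplotype pairs with posterior probabilities. The data layout, the function names, and the example haplotypes are hypothetical; this is not the PHASE or SNPHAP output format.

```python
from collections import Counter


def most_likely_pairs(phased):
    """Pick, for each individual, the haplotype pair with the highest probability.

    `phased` maps an individual ID to a list of (hap1, hap2, probability) tuples.
    """
    return {
        ind: max(candidates, key=lambda c: c[2])[:2]
        for ind, candidates in phased.items()
    }


def haplotype_frequencies(pairs):
    """Estimate haplotype frequencies by counting both haplotypes per individual."""
    counts = Counter(hap for pair in pairs.values() for hap in pair)
    total = sum(counts.values())  # two chromosomes per individual
    return {hap: count / total for hap, count in counts.items()}


# Hypothetical phasing output for two controls (three SNPs per haplotype).
controls = {
    "ctrl1": [("C-G-A", "C-G-A", 0.9), ("C-C-G", "T-G-A", 0.1)],
    "ctrl2": [("C-C-G", "C-G-A", 0.6), ("C-G-G", "C-C-A", 0.4)],
}
pairs = most_likely_pairs(controls)
print(haplotype_frequencies(pairs))
```

Taking only the most likely pair ignores phasing uncertainty, which matters most for rare haplotypes.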

Results/Discussion

SNP Discovery

Complete coding regions and 25 bp of the flanking intronic regions of SRC-1, SRC-3, NCoR, and SMRT were fully sequenced in both directions using Sanger sequencing in 96 apparently normal individuals (48 Caucasian American, 48 African American), generating a total of ~5.8 Mb of sequence. From this effort we identified 120 SNPs (61 in SMRT, 33 in NCoR, 18 in SRC-3, and 8 in SRC-1). A summary of the results is shown in Table 1 and details are provided in Additional File 1. Of these, 86 coding SNPs were identified, resulting in 36 nonsynonymous SNPs (nsSNPs). SMRT contained the largest number of SNPs (61 total, 43 coding, and 17 nsSNPs). Despite its close relationship with SMRT, NCoR contained far fewer SNPs (33 total, 25 coding, and 10 nsSNPs). This is especially evident when only common SNPs are considered (minor allele frequency [MAF]>5%; 16 in SMRT, 1 in NCoR). The positions of the coding SNPs and their MAFs are schematically presented in Figure 1.

Figure 1

SNP discovery in (A) SRC-1, (B) SRC-3, (C) NCoR, and (D) SMRT. Vertical lines delineate the position of SNPs identified by our resequencing effort. The height of the vertical lines represents the frequency at which the SNP was found. Black lines represent novel SNPs, grey lines represent SNPs found in dbSNP. Solid lines represent nonsynonymous SNPs, dashed lines represent synonymous SNPs. Positions of SNPs genotyped for risk associations are pointed out by arrows.

Table 1 SNP Discovery Summary.

By conducting the sequencing in two populations, we were able to distinguish SNPs unique to a particular population. We identified 66 SNPs unique to African Americans and 23 SNPs unique to Caucasian Americans (see Additional File 1). This distribution is similar to that reported previously in the SNP@Ethnos database for Yoruban and European populations and is hypothesized to arise from bottlenecks in non-African population history [19]. However, most of the unique SNPs found in Caucasians were rare, possibly suggesting that these are recent alterations, since only 4 out of the 23 unique SNPs (17%) were found in more than a single individual. On the other hand, 31 out of the 66 unique SNPs (47%) in African Americans were found in more than a single individual. It is important to note that some of the population-unique SNPs are rare and, since only 48 individuals were sequenced for each population, they could appear as unique SNPs purely by chance.
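The caveat about chance can be illustrated with a short calculation. This is a sketch under a simplifying assumption that is not part of the original analysis: the 2n chromosomes of n sequenced individuals are treated as independent draws, so an allele with frequency p goes unobserved with probability (1 - p)^(2n). The function name is hypothetical.

```python
def prob_allele_unobserved(maf, n_individuals):
    """Probability that an allele of frequency `maf` is absent from a sample.

    Assumes the 2 * n_individuals chromosomes are independent draws.
    """
    return (1.0 - maf) ** (2 * n_individuals)


# 48 individuals were sequenced per population (96 chromosomes).
for maf in (0.005, 0.01, 0.05):
    missed = prob_allele_unobserved(maf, 48)
    print(f"MAF = {maf:.3f}: probability of not observing the allele = {missed:.3f}")
```

Under this assumption, an allele present at 1% in a population would be missed in 48 individuals more than a third of the time, so a rare SNP seen in only one population is weak evidence that it is truly absent from the other.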

From our sequencing effort we identified 74 SNPs in these four coregulators not previously represented in dbSNP or reported in the recent study by Haiman et al [15] (rs2230782 and rs2076546). Additionally, we genotyped other coregulator SNPs that we reasoned may have functional consequences based on the severity of the amino acid change and proximity to functional domains [rs1804645 (SRC-1), rs6094752 (SRC-3), and rs2229840 & rs7978237 (SMRT)] (positions are highlighted in Figure 1). For example, rs1804645 (SRC-1 P1272S) was chosen since it is the only non-synonymous SNP in SRC-1, is located in the second activation domain, and is predicted to be 'probably damaging' by a polymorphism phenotype prediction tool (PolyPhen, http://genetics.bwh.harvard.edu/pph/). Rs6094752 (SRC-3 R218C) was chosen because of the loss of charge and size as a result of the amino acid substitution, and because it is one of the most common non-synonymous SNPs in SRC-3. The SNPs in SMRT, rs2229840 (A1706T) and rs7978237 (G781E), were chosen for genotyping due to high frequency, severity of amino acid change, and location in a functional domain. Several approaches to design TaqMan assays for rs7978237 failed. We were therefore unable to obtain genotyping information for this SNP.

The genotyping results were in Hardy-Weinberg equilibrium in controls for all SNPs investigated (p = 0.309 for rs1804645; p = 0.112 for rs6094752; p = 0.058 for rs2230782; p = 0.067 for rs2076546; p = 0.140 for rs2229840). The three SNPs that we reasoned may have functional consequences and were able to genotype, namely SRC-1 P1272S (rs1804645), SRC-3 R218C (rs6094752), and SMRT A1706T (rs2229840), did not significantly associate with breast cancer risk (Table 2). Stratification by age (≥50 years and <50 years of age), undertaken to investigate a possible risk influence in pre- or postmenopausal women, also revealed no significant associations except for rs6094752, where a significant effect could be detected for heterozygous carriers only (Table 3). However, this is most likely a chance effect due to multiple testing. Stratification by bilateral cases revealed no significant associations (Table 4). We observed a protective effect for homozygous C-allele carriers of SRC-3 Q586H rs2230782 (GG+GC versus CC: OR = 0.45, 95%CI = 0.041, Table 2), similar to the findings that have been reported before (GG+GC versus CC: OR = 0.39, 95%CI = 0.14-1.05, p = 0.061) [15]. As our study included a portion of the samples from the previously reported study, it is noteworthy that the results of the current study excluding the previously analyzed samples show the same protective effect and borderline significance (GG+GC versus CC: OR = 0.37, 95%CI = 0.13-1.08, p = 0.059). However, we failed to replicate previous associations between the SRC-3 rs2076546 (T960T) SNP and breast cancer risk. The haplotype analysis of the variants analysed in SRC-3 revealed a protective haplotype including the C-C-G alleles of R218C, Q586H and T960T, respectively (Table 5). As the haplotype is very rare, occurring with a frequency of 0.03 in controls, this result has to be verified in further multi-center collaboration studies.

Table 2 Summary of associations in entire population
Table 3 Associations according to age stratification
Table 4 Associations with stratification by bilateral cases
Table 5 Associations of SRC-3 haplotypes
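The genotype-group comparisons reported above (for example GG+GC versus CC) reduce to a 2x2 table of cases and controls. The following sketch computes a crude odds ratio with a Wald 95% confidence interval (Woolf's method); for a single binary predictor this coincides with the estimate and Wald interval from unconditional logistic regression. The function name and all counts are hypothetical and are not taken from the study tables.

```python
import math


def odds_ratio_ci(case_exp, case_unexp, ctrl_exp, ctrl_unexp, z=1.96):
    """Crude odds ratio and Wald confidence interval from a 2x2 table.

    "Exposed" is the genotype group of interest (here CC homozygotes);
    "unexposed" is the reference group (here GG+GC). All cells must be > 0.
    """
    odds_ratio = (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)
    # Standard error of log(OR): square root of the sum of reciprocal cell counts.
    se = math.sqrt(1 / case_exp + 1 / case_unexp + 1 / ctrl_exp + 1 / ctrl_unexp)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts: CC and GG+GC carriers among cases and controls.
or_, lower, upper = odds_ratio_ci(case_exp=5, case_unexp=995, ctrl_exp=10, ctrl_unexp=990)
print(f"OR = {or_:.2f}, 95% CI = {lower:.2f}-{upper:.2f}")
```

With so few homozygous carriers the interval is wide, which is why associations for rare genotypes often reach only borderline significance.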

The discordant findings between our studies and the Haiman study [16] with respect to SRC-3 Q586H may be due to the inherent differences in the populations examined. For example, our studies exclusively examined Europeans while the study by Haiman et al. examined a range of ethnic backgrounds. A number of recent studies suggest that a SNP association could be specific to the genetic background of a certain ethnic group [22, 23]. It is possible that the Q586H effect is only seen in European populations, and/or that the lower number of unselected European cases within the Haiman study provided insufficient power to detect this effect. The selection of high-risk, BRCA1/BRCA2 mutation-negative cases in our study is expected to act as a multiplier to further increase our power to detect associations. Lastly, since only nonsynonymous SNPs were genotyped in the Haiman study, the stronger effect seen in the two-SNP SRC-3 haplotype could not be observed. We did not genotype the two SNPs (SMRT H52R and CALCOCO1 R12H) identified in the Haiman study to be associated with breast cancer risk since they were found either exclusively or predominantly in African Americans (European population MAF: SMRT H52R = 0%, CALCOCO1 R12H = 0.6%). Since our study exclusively contains Europeans, it was unlikely that we would obtain sufficient power to detect an association.

Conclusions

In summary, these results illustrate the dramatic differences in polymorphism frequency that can be seen amongst closely related genes. Further, the fact that so many novel SNPs were identified through our sequencing effort, even common SNPs with MAF>5%, illustrates the huge amount of genetic diversity that has yet to be discovered. Finally, the strengthening of the association between the SRC-3 Q586H SNP and decreased breast cancer risk, and the identification of a rare haplotype within SRC-3 associated with decreased risk, suggest that this information could be used to help identify a subgroup of high-risk women at a more modest risk. However, this remains to be verified prospectively.