Abstract
Current catalogs of brain expression quantitative trait loci (eQTL) are incomplete and the findings do not replicate well across studies. All existing cortical eQTL studies are small and emphasize the need for a meta-analysis. We performed a meta-analysis of 424 brain samples across five studies to identify regulatory variants influencing gene expression in human cortex. We identified 3584 genes in autosomes and chromosome X with false discovery rate q<0.05 whose expression was significantly associated with DNA sequence variation. Consistent with previous eQTL studies, local regulatory variants tended to occur symmetrically around transcription start sites and the effect was more evident in studies with large sample sizes. In contrast to random SNPs, we observed that significant eQTLs were more likely to be near 5’-untranslated regions and intersect with regulatory features. Permutation-based enrichment analysis revealed that SNPs associated with schizophrenia and bipolar disorder were enriched among brain eQTLs. Genes with significant eQTL evidence were also strongly associated with diseases from OMIM (Online Mendelian Inheritance in Man) and the NHGRI (National Human Genome Research Institute) genome-wide association study catalog. Surprisingly, we found that a large proportion (28%) of ~1000 autosomal genes encoding proteins needed for mitochondrial structure or function were eQTLs (enrichment P-value=1.3 × 10−9), suggesting a potential role for common genetic variation influencing the robustness of energy supply in brain and a possible role in the etiology of some psychiatric disorders. This systematically generated eQTL information should be a valuable resource in determining the functional mechanisms of brain gene expression and the underlying biology of associations with psychiatric disorders.
Introduction
Psychiatric disorders like schizophrenia, bipolar disorder, major depressive disorder, autism and substance use disorders account for a significant proportion of disability world-wide1 and cause enormous personal and societal burdens.2 The lifetime prevalence estimates range from 0.1% (autism spectrum disorder) to 24% (nicotine dependence).3 These disorders have a significant genetic component, with estimates of heritability ranging from 37% (major depressive disorder) to 81% (schizophrenia).3
Recent genome-wide association studies (GWAS) investigating the genetic architecture of psychiatric disorders have identified many common variants that meet consensus criteria for significance and replication.4, 5, 6 Understanding the biological mechanisms by which these common variants contribute to complex traits is challenging. The main reason is that the majority (>90%) of disease-associated variants from many GWAS lie in noncoding regions,7 making evaluation of their function difficult. However, accumulating evidence suggests that these noncoding common variants are involved in transcriptional regulatory mechanisms such as promoter and enhancer elements8 and enriched within expression quantitative trait loci (eQTL).8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 In addition, about 77% of SNPs implicated in GWAS were within or in high linkage disequilibrium (LD) with DNase I hypersensitivity sites, a marker for open chromatin subject to transcriptional regulation.7,21,22
eQTL studies measure genetic variation and gene expression in the same individuals, and thus link DNA variation to mRNA variation.8 These studies have received particular attention due to their inherent relevance to the control of gene expression and because they provide a way to generate hypotheses about the functional meaning of GWAS findings via relatively simple database queries.11,23,24
There are relatively few eQTL studies of human brain tissue25, 26, 27, 28 or brain disease.23,29 Current catalogs of brain eQTLs are incomplete and the findings do not replicate well across studies—all existing brain eQTL studies are small and highlight the need for a meta-analysis.8,9 We evaluated this broad hypothesis using enrichment analyses.
First, we assessed whether SNPs associated with psychiatric disorders were enriched among genetic variants that were part of a cortical eQTL (that is, SNP–gene pair) using permutation tests.10 Specifically, we evaluated the overlap between eQTLs in human cortex with five psychiatric disorders studied by the Psychiatric Genomics Consortium (PGC): attention-deficit hyperactivity disorder, autism, bipolar disorder, major depressive disorder and schizophrenia (SCZ).5 We obtained results files from the PGC website (https://pgc.unc.edu/Sharing.php#SharingOpp) from a GWAS meta-analysis of these disorders in independent cases and controls.47 There were 1 065 656 GWAS SNPs common to the five PGC results files and our brain eQTL SNPs. We excluded the extended major histocompatibility locus (eMHC, chr6:25–34 Mb) given its high gene density, LD and functionally clustered genes. We compared LD-pruned sets of GWAS SNPs generated via PLINK (--indep-pairwise 100 25 0.8).48 For each disorder, we generated 10 000 randomized SNP sets, each the same size as the original list of associated GWAS SNPs at a given P-value threshold, matched on the MAF distribution of the original list and sampled without replacement from the null set. For each set, we determined the number of significant eQTL SNPs at an FDR threshold of 0.05. These permutations yielded an empirical enrichment P-value, calculated as the proportion of the 10 000 randomized sets in which the number of eQTL SNPs exceeded the originally observed number of eQTL SNPs at the FDR threshold. We repeated this analysis for a recent larger SCZ GWAS.49
Second, we evaluated whether genes that were part of a SNP–gene eQTL in brain were enriched for functional roles in biological pathways or similar cellular functions. We evaluated the following gene sets previously associated with SCZ: expert-curated lists of synaptic genes,50 genes encoding postsynaptic density proteins,51 genes encoding the NMDA (N-methyl-D-aspartate) receptor52 and activity-regulated cytoskeleton-associated protein complex,52 genes whose mRNAs interact with FMRP,53 genes encoding components of voltage-gated calcium channels (all CACN* RefSeq genes)49 and genes whose proteins interact with a calcium channel subunit.54 We also evaluated OMIM disease genes,55 genes with an eQTL in peripheral blood from the largest human eQTL study,49 We focused on genes with q<0.05 and performed clumping using PLINK to retain eQTL SNPs with r2<0.6 within 500-kb windows (--clump-p1 0.05 --clump-p2 0.05 --clump-r2 0.6 --clump-kb 500). To guard against a falsely inflated intersection rate with the GWAS catalog SNPs, we used q-values rather than P-values as input for clumping and identified GWAS catalog SNPs with reported P-values <1 × 10−9 that intersected the clumped regions.
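The permutation test described for the first enrichment analysis can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the equal-width MAF binning used to match the MAF distribution, and the input data structures are all assumptions introduced here. The comparison is strict ("exceeds"), as stated in the text.

```python
import random
from collections import defaultdict


def maf_bin(maf: float, n_bins: int = 10) -> int:
    """Map a minor allele frequency in (0, 0.5] to one of n_bins equal-width bins.

    Equal-width binning is an ASSUMPTION; the text only says sets were
    matched on the MAF distribution.
    """
    return min(int(maf / 0.5 * n_bins), n_bins - 1)


def enrichment_p(gwas_snps, null_snps, maf, eqtl_snps, n_perm=10_000, seed=0):
    """Empirical P-value: fraction of MAF-matched random SNP sets whose
    eQTL count exceeds the count observed in the GWAS-associated set.

    gwas_snps : list of associated SNP ids at a given P-value threshold
    null_snps : list of LD-pruned SNP ids to sample from (hypothetical input)
    maf       : dict mapping SNP id -> minor allele frequency
    eqtl_snps : set of SNP ids that are significant eQTL SNPs (FDR 0.05)
    """
    rng = random.Random(seed)
    observed = sum(s in eqtl_snps for s in gwas_snps)

    # Candidate null SNPs grouped by MAF bin.
    pool = defaultdict(list)
    for s in null_snps:
        pool[maf_bin(maf[s])].append(s)

    # How many SNPs to draw from each bin to mirror the GWAS set.
    need = defaultdict(int)
    for s in gwas_snps:
        need[maf_bin(maf[s])] += 1

    exceed = 0
    for _ in range(n_perm):
        count = 0
        for b, k in need.items():
            # rng.sample draws without replacement; it raises ValueError
            # if a bin holds fewer than k null SNPs.
            count += sum(s in eqtl_snps for s in rng.sample(pool[b], k))
        exceed += count > observed
    return exceed / n_perm
```

With 10 000 permutations the smallest nonzero P-value this procedure can return is 1 × 10−4, so a result of zero is best read as P<1 × 10−4.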
Results
Meta-analysis of eQTL
We first conducted eQTL analyses for each of the five cortical studies. After quality control, sample sizes ranged from 24 to 189 and the numbers of transcripts ranged from 10 038 to 15 857 per study (17 537 genes were evaluated in at least one study). Supplementary Figures S5–S9 show plots of gene location versus eQTL location. We defined a local eQTL as an SNP–gene eQTL in which the SNP lies within ±1 Mb of the transcription start or end site of the gene, and as distant otherwise. As expected, local eQTLs tended to have stronger effects than distant eQTLs. Studies with small sample sizes showed much weaker local eQTLs.
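The local versus distant definition can be expressed as a small classifier. The sketch below is illustrative only: the `Gene` record, the function name and the example coordinates are hypothetical, and it assumes that SNPs inside the gene body also count as local.

```python
from dataclasses import dataclass

WINDOW_BP = 1_000_000  # +/-1 Mb, as stated in the text


@dataclass(frozen=True)
class Gene:
    chrom: str
    start: int  # transcription start site (bp)
    end: int    # transcription end site (bp)


def classify_eqtl(snp_chrom: str, snp_pos: int, gene: Gene,
                  window: int = WINDOW_BP) -> str:
    """Return 'local' if the SNP lies within `window` bp of the gene's
    transcription start or end site, otherwise 'distant'.

    ASSUMPTION: the whole interval [start - window, end + window] is
    treated as local, so SNPs inside the gene body are local too.
    """
    if snp_chrom != gene.chrom:
        return "distant"
    lo = min(gene.start, gene.end) - window
    hi = max(gene.start, gene.end) + window
    return "local" if lo <= snp_pos <= hi else "distant"


# Hypothetical gene and SNP positions, for illustration only.
example_gene = Gene("chr1", 5_000_000, 5_050_000)
print(classify_eqtl("chr1", 4_200_000, example_gene))
print(classify_eqtl("chr1", 9_000_000, example_gene))
```

Using `min` and `max` over the two coordinates keeps the window correct for genes on the reverse strand, where the start coordinate is larger than the end coordinate.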
We next conducted a meta-analysis of 424 brain samples across five studies to identify regulatory variants influencing gene expression in human cortex (Table 1). As previous eQTL studies14,
Figure. Predicted functional consequences of local eQTL SNPs. (a) Functional consequences of significant eQTLs (q<0.05, 143 679 unique SNPs) using Ensembl Variant Effect Predictor tool. Each SNP was assigned to the most severe predicted consequence. The ratio on each bar represents number of SNPs with regulatory features divided by number of SNPs in each functional category. (b) Functional consequences of randomly selected, MAF-matched, insignificant eQTLs (q>0.5, 143 679 unique SNPs). eQTL, expression quantitative trait loci; MAF, minor allele frequency; SNP, single-nucleotide polymorphism.
We evaluated whether there were significant differences between the classifications of significant and randomly selected nonsignificant eQTL SNPs. The overall distributions were significantly different between the two sets of SNPs (χ2 P<1 × 10−4). Comparing each functional consequence with the intergenic category also revealed significant differences between the two sets of SNPs (Supplementary Table S4). Odds ratios ranged from 3.5 to 9.2 and all P-values were <1 × 10−4. SNPs in the 5’-untranslated region showed the largest difference and were 9.2 times more likely to be significant eQTLs.
Prior studies observed clustering of significant local SNP–gene eQTLs near transcription start sites.11,26,78 Analysis was restricted to 4074 GO categories containing between 5 and 3000 genes to account for pathway sizes. We used permutation to obtain empirical P-values per pathway and to correct for multiple testing. Multiple GO pathways related to mitochondrial structure and function were ranked as top pathways (Supplementary Table S7). This result is consistent with the DAVID results, indicating that the mitochondrial pathway findings are robust to the choice of gene-set enrichment method. We tested for enrichment in mitochondrial pathways by further analyses using nuclear-encoded mitochondrial genes from MitoCarta (http://www.broadinstitute.org/pubs/MitoCarta),75 autosomal oxidative phosphorylation genes,76 and nuclear-encoded transcriptional regulators of mitochondrial genes.76,79 Of 914 nuclear-encoded mitochondrial genes, 257 genes (28%) overlapped with genes showing significant eQTL evidence. We observed strong enrichment of significant eQTL genes in autosomal mitochondrial genes (odds ratio=1.60, P=1.3 × 10−9 using all genes; odds ratio=1.39, P=1.0 × 10−4 using brain-expressed genes). However, no enrichment was observed for nuclear regulators of mitochondrial genes (P=0.80 using all genes, P=0.95 using brain-expressed genes) or for oxidative phosphorylation genes (P=0.06 using all genes, P=0.11 using brain-expressed genes).
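To make the gene-set enrichment arithmetic concrete, the sketch below computes an odds ratio and a one-sided hypergeometric tail probability from a 2×2 table. The counts 257, 914, 3584 and 17 537 come from the text, but treating the 17 537 evaluated genes as the background and the 3584 eQTL genes as the foreground is an assumption made here; the authors' exact background set and test are not given in this section, so the output is not expected to reproduce the reported values exactly. Under this assumed table the odds ratio comes out at roughly 1.56, close to but not identical to the reported 1.60.

```python
from math import comb


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def hypergeom_upper_tail(k: int, K: int, n: int, N: int) -> float:
    """P(X >= k) when n genes are drawn from N genes, K of which are in the set."""
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total  # exact big-integer ratio converted to float


# Counts taken from the text; the choice of background is an ASSUMPTION.
N_GENES = 17_537  # genes evaluated in at least one study (assumed background)
N_EQTL = 3_584    # genes with a significant eQTL (q < 0.05)
N_MITO = 914      # nuclear-encoded mitochondrial genes
N_BOTH = 257      # mitochondrial genes with a significant eQTL

a = N_BOTH                # mitochondrial, eQTL
b = N_MITO - N_BOTH       # mitochondrial, no eQTL
c = N_EQTL - N_BOTH       # non-mitochondrial, eQTL
d = N_GENES - N_MITO - c  # non-mitochondrial, no eQTL

print("odds ratio:", round(odds_ratio(a, b, c, d), 2))
print("one-sided P:", hypergeom_upper_tail(N_BOTH, N_MITO, N_EQTL, N_GENES))
```

The tail sum uses exact integer binomial coefficients, which avoids floating-point underflow for very small probabilities without requiring any third-party library.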
Second, genes expressed in multiple tissues tend to have local regulatory elements.80 To evaluate the hypothesis, we compared eQTL genes in peripheral blood57 We restricted our search to GWAS SNPs with P<1 × 10−9, yielding 2946 SNPs for 471 traits from 869 papers. Of the 2946 SNPs implicated by GWAS, 528 (17.9%) were part of a local eQTL (178 directly associated with 94 traits and 350 SNPs indirectly via a proxy SNP with r2>0.2). The 10 most frequent traits were height (12 SNPs), inflammatory bowel disease (11), Crohn’s disease (9), plasma phospholipid levels (6), total cholesterol (5), chronic kidney disease (4), coronary heart disease (4), HDL cholesterol (4), metabolite levels (4) and red blood cell traits (4). We evaluated brain eQTLs that overlap with the SNPs associated with central nervous system-related phenotypes (Table 5), and identified overlap with bipolar disorder, Parkinson’s disease and nicotine dependence.
We compared our eQTL genes to OMIM,55 which catalogs genes often containing rare variation with strong effects. We observed significant enrichment of genes with significant brain eQTL evidence among OMIM disease genes (odds ratio=1.15, P=0.009). Mitochondrial complex I deficiency and Leigh syndrome were the second most frequent diseases in our data (FOXRED1, NDUFA2, NDUFA10, NDUFAF1, NDUFAF2, NDUFAF4 and NDUFS2).
Protein–protein interaction
We used DAPPLE81 to evaluate whether genes with strong evidence of local eQTLs were connected via protein–protein interactions. Genes with evidence of local eQTLs showed somewhat higher network connectivity (direct P=0.04 and indirect P=0.01). However, many of these genes were in small networks rather than a single network, suggesting that there is no dominant functional network related to all these genes (Supplementary Figure S15).
Discussion
We performed a meta-analysis of local regulatory variation of 424 postmortem brain samples from five human brain eQTL studies. Our analysis of local eQTLs in this relatively large sample size allowed us to identify more eQTLs than those from individual studies.
Consistent with prior findings, we observed that local regulatory variants tend to occur symmetrically around transcription start sites, and the effect was more evident in studies with large sample sizes. Significant eQTLs tended to be near 5’-untranslated regions and intersect with regulatory features. In accordance with previous eQTL studies showing that eQTLs are more likely to overlap with SNPs implicated in GWAS,10 we observed that SNPs associated with SCZ and bipolar disorder were enriched among brain eQTLs. Many brain eQTLs are also associated with central nervous system-related diseases (Table 5).
We compared our results with previous findings from the literature. Myers et al.26 and Liu et al.28 reported significant associations between RPS26 and rs11171739 in prefrontal cortex. Cheung et al.82 reported a significant association between RPS26 and rs2271194 (in high LD with rs11171739) in lymphoblastoid cells. We observed strong associations for both RPS26–rs11171739 (q=7.7 × 10−11) and RPS26–rs2271194 (q=1.3 × 10−10). Another eQTL study of human liver identified a significant RPS26 and rs2292239 relationship and suggested RPS26 as a candidate susceptibility gene for type 1 diabetes.83 We observed a strong correlation for the RPS26–rs2292239 pair (q=5 × 10−5) as well.
There are expression variants that are specific to tissues, cells, anatomical regions and diseases.14,34 However, the substantial overlap (24%) between eQTLs from the largest eQTL study in peripheral blood83
We identified a significant number of brain eQTLs that influence the expression of nuclear-encoded genes involved in mitochondrial function and strong evidence of functional clusters related to mitochondrial function (for example, nuclear-encoded mitochondrial genes, P=1.3 × 10−9). Moreover, genes underlying mitochondrial complex I deficiency frequently overlapped with local eQTL genes. This raises the intriguing possibility that common genetic variation influences the expression of sets of autosomal genes that influence the number and/or function of mitochondria. Nuclear-encoded autosomal genes (~1000 based on MitoCarta75) and mitochondrial-encoded genes (13 genes in humans84) are involved in ATP synthesis, cellular energy metabolism and oxidative phosphorylation, as well as regulation of cellular calcium levels, steroid synthesis, production of free radicals and regulation of apoptosis.85 The central nervous system has a very high metabolic rate because neurons require large amounts of ATP for maintenance of ionic gradients across the cell membranes and for neurotransmission. Neuronal function and survival depend critically on mitochondrial function and oxygen supply.86 Thus, it is conceivable that minor deviations from normal mitochondrial function can have devastating consequences on the integrity of cells and influence a variety of diseases, including aging,87 cancer,88 metabolic traits,89 neurodegenerative diseases85 and psychiatric disorders.90,91
Although most patients with psychiatric disorders do not have classical mitochondrial diseases caused by mutations of nuclear or mitochondrial DNA, multiple lines of evidence suggest that impairment in any process related to normal mitochondrial function may be critical in the neurobiology of psychiatric disorders.85,91 A study of large, rare CNVs in SCZ observed significant enrichment in gene products localized to mitochondria.92 Impaired neuronal differentiation in hair follicle-derived induced pluripotent stem cells from SCZ cases is associated with mitochondrial dysfunction.93 A recent meta-analysis of autism spectrum disorders suggests an association with mitochondrial dysfunction.94 Mutations and deletions in mitochondrial DNA have been reported to be associated with mood disorders and bipolar disorder.95,96 Postmortem brain samples of bipolar disorder cases showed a pronounced decrease in the expression of nuclear genes regulating oxidative phosphorylation.97 Taken together, gene pathways or networks involving mitochondrial function may have an etiological role in some psychiatric disorders.
There are several limitations of this study. First, more data are required. Our sample size was less than that required for confident local eQTL identification.30 Second, this investigation included only normal adult brain samples. Inclusion of data from cases with psychiatric disorders or from earlier developmental stages would likely be informative. Third, although consistent quality control steps were applied, different DNA and RNA platforms across studies may have impacted our findings. To evaluate the impact of between-study heterogeneity, we performed a random-effect meta-analysis using the ‘REML’ method in the metafor R package (Supplementary File, http://cran.r-project.org/web/packages/metafor/index.html). We observed that the P-values from the random-effect model tended to be larger than those from the fixed-effect model. This is not surprising since fixed-effect models are known to produce tighter confidence intervals and more significant P-values than random-effect models in the presence of between-study heterogeneity.98, 99, 100 The genomic control inflation factors for the fixed-effect and random-effect analyses were 1.08 and 0.87, respectively. Top signals from the random-effect model and the fixed-effect model were quite different. Many significant SNP–gene pairs from the fixed-effect model became nonsignificant under the random-effect model.101 Small sample sizes, different expression platforms and unknown differences across our studies could have introduced large variation in effect sizes and thus inflated between-study heterogeneity. The fixed-effect results should therefore be interpreted with caution. On the other hand, with a limited number of studies there can be large uncertainty in a meta-analysis about the presence and extent of between-study heterogeneity. It has been pointed out that strong inferences about heterogeneity or lack thereof should be avoided.98 Finally, analysis of postmortem human brain tissue faces many challenges, as we cannot fully control for all potential confounders that might have impacted the integrity of brain expression assessment (for example, antemortem history, medication use, licit or illicit substance use disorders, cause of death or postmortem delay).
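The contrast between fixed-effect and random-effect pooling can be illustrated with a short sketch. Note the substitution: the analysis described above used the REML estimator in the metafor R package, whereas this example uses the simpler DerSimonian-Laird moment estimator of the between-study variance, chosen here only because it has a closed form. The function names and the per-study effect sizes are hypothetical.

```python
from math import erfc, sqrt


def fixed_effect(betas, ses):
    """Inverse-variance weighted pooled estimate and its standard error."""
    w = [1.0 / se**2 for se in ses]
    beta = sum(wi * bi for wi, bi in zip(w, betas)) / sum(w)
    return beta, sqrt(1.0 / sum(w))


def random_effect_dl(betas, ses):
    """DerSimonian-Laird random-effects estimate.

    Returns (pooled beta, standard error, tau^2). This is NOT the REML
    estimator used in the paper; it is a closed-form stand-in.
    """
    w = [1.0 / se**2 for se in ses]
    beta_fe, _ = fixed_effect(betas, ses)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (bi - beta_fe) ** 2 for wi, bi in zip(w, betas))
    denom = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(betas) - 1)) / denom)
    # Between-study variance tau^2 is added to every study's variance.
    w_re = [1.0 / (se**2 + tau2) for se in ses]
    beta = sum(wi * bi for wi, bi in zip(w_re, betas)) / sum(w_re)
    return beta, sqrt(1.0 / sum(w_re)), tau2


def two_sided_p(beta, se):
    """Two-sided normal P-value for the Wald statistic beta / se."""
    return erfc(abs(beta / se) / sqrt(2.0))


# Hypothetical per-study effect sizes and standard errors for one SNP-gene pair.
betas = [0.5, 0.1, 0.6, -0.1, 0.4]
ses = [0.10, 0.15, 0.20, 0.25, 0.12]

b_fe, se_fe = fixed_effect(betas, ses)
b_re, se_re, tau2 = random_effect_dl(betas, ses)
print("fixed :", b_fe, se_fe, two_sided_p(b_fe, se_fe))
print("random:", b_re, se_re, two_sided_p(b_re, se_re), "tau^2 =", tau2)
```

When the estimated between-study variance is greater than zero, the random-effect standard error is necessarily larger than the fixed-effect one, which is why heterogeneous SNP–gene pairs tend to lose significance under the random-effect model.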
Despite these limitations, the eQTLs and pathways identified in this investigation warrant further exploration as potential candidates involved in pathogenesis of psychiatric disorders. Annotating SNPs identified from GWAS of psychiatric disorders with brain eQTL information will be a valuable resource to characterize the functions of causal variants and generate testable hypotheses for the mechanism underlying GWAS findings.