Introduction

Psychiatric disorders like schizophrenia, bipolar disorder, major depressive disorder, autism and substance use disorders account for a significant proportion of disability world-wide1 and cause enormous personal and societal burdens.2 The lifetime prevalence estimates range from 0.1% (autism spectrum disorder) to 24% (nicotine dependence).3 These disorders have a significant genetic component, with estimates of heritability ranging from 37% (major depressive disorder) to 81% (schizophrenia).3

Recent genome-wide association studies (GWAS) investigating the genetic architecture of psychiatric disorders have identified many common variants that meet consensus criteria for significance and replication.4, 5, 6 Understanding the biological mechanisms by which these common variants contribute to complex traits is challenging, mainly because the majority (>90%) of disease-associated variants from many GWAS lie in noncoding regions,7 making evaluation of their function difficult. However, accumulating evidence suggests that these noncoding common variants are involved in transcriptional regulatory mechanisms such as promoter and enhancer elements8 and are enriched within expression quantitative trait loci (eQTL).8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 In addition, about 77% of SNPs implicated in GWAS were within or in high linkage disequilibrium (LD) with DNase I hypersensitivity sites, a marker for open chromatin subject to transcriptional regulation.7,21,22

eQTL studies measure genetic variation and gene expression in the same individuals, and thus link DNA variation to mRNA variation.8 These studies have received particular attention because of their inherent relevance to the control of gene expression and because they provide a way to generate hypotheses about the functional meaning of GWAS findings via relatively simple database queries.11,23,24

There are relatively few eQTL studies of human brain tissue25, 26, 27, 28 or brain disease.23,29 Current catalogs of brain eQTLs are incomplete and the findings do not replicate well across studies. All existing brain eQTL studies are small, which highlights the need for a meta-analysis.8,9 We evaluated this broad hypothesis using enrichment analyses.

First, we assessed whether SNPs associated with psychiatric disorders were enriched among genetic variants that were part of a cortical eQTL (that is, SNP–gene pair) using permutation tests.10 Specifically, we evaluated the overlap between eQTLs in human cortex and GWAS findings for five psychiatric disorders studied by the Psychiatric Genomics Consortium (PGC): attention-deficit hyperactivity disorder, autism, bipolar disorder, major depressive disorder and schizophrenia (SCZ).5 We obtained results files from the PGC website (https://pgc.unc.edu/Sharing.php#SharingOpp) from a GWAS meta-analysis of these disorders in independent cases and controls.47 There were 1 065 656 GWAS SNPs common to the five PGC results files and our brain eQTL SNPs. We excluded the extended major histocompatibility locus (eMHC, chr6:25–34 Mb) given its high gene density, LD and functionally clustered genes. We compared LD-pruned sets of GWAS SNPs generated via PLINK (--indep-pairwise 100 25 0.8).48 For each disorder, we generated 10 000 randomized SNP sets, each the same size as the original list of associated GWAS SNPs at a given P-value threshold, matched on the MAF distribution of the original list and sampled without replacement from the null set. For each set, we determined the number of significant eQTL SNPs at an FDR threshold of 0.05. These permutations yielded an empirical enrichment P-value, calculated as the proportion of the 10 000 randomized sets in which the number of eQTL SNPs exceeded the originally observed number of eQTL SNPs at the FDR threshold. We repeated this analysis for a recent larger SCZ GWAS.49
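The permutation scheme described above can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study: the function names, the MAF bin width of 0.05 and the input data structures (a dict of MAF values and a set of significant eQTL SNPs) are all assumptions made for the example.

```python
import random
from collections import defaultdict

def maf_bin(maf, width=0.05):
    """Map a minor allele frequency to a bin index (bin width is an assumption)."""
    return min(int(maf / width), int(0.5 / width) - 1)

def empirical_enrichment_p(gwas_snps, null_snps, maf, eqtl_snps,
                           n_perm=10_000, seed=0):
    """Return (observed eQTL count, empirical P-value).

    The P-value is the proportion of MAF-matched random SNP sets whose
    eQTL count exceeds the count observed in the GWAS-associated set.
    """
    rng = random.Random(seed)
    observed = sum(s in eqtl_snps for s in gwas_snps)

    # Group null SNPs by MAF bin so random sets can mirror the MAF distribution.
    null_by_bin = defaultdict(list)
    for s in null_snps:
        null_by_bin[maf_bin(maf[s])].append(s)

    # Number of SNPs to draw from each bin, copied from the associated set.
    need = defaultdict(int)
    for s in gwas_snps:
        need[maf_bin(maf[s])] += 1

    exceed = 0
    for _ in range(n_perm):
        count = 0
        for b, k in need.items():
            # rng.sample draws without replacement within a bin;
            # it raises ValueError if the bin holds fewer than k null SNPs.
            drawn = rng.sample(null_by_bin[b], k)
            count += sum(s in eqtl_snps for s in drawn)
        if count > observed:
            exceed += 1
    return observed, exceed / n_perm
```

Matching within MAF bins is one simple way to preserve the MAF distribution of the original list; a finer bin width gives closer matching but requires more null SNPs per bin.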

Second, we evaluated whether genes that were part of a SNP–gene eQTL in brain were enriched for functional roles in biological pathways or similar cellular functions. We evaluated the following gene sets previously associated with SCZ: expert-curated lists of synaptic genes,50 genes encoding postsynaptic density proteins,51 genes encoding the NMDA (N-methyl-D-aspartate) receptor52 and activity-regulated cytoskeleton-associated protein complex,52 genes whose mRNAs interact with FMRP,53 genes encoding components of voltage-gated calcium channels (all CACN* RefSeq genes)49 and genes whose proteins interact with a calcium channel subunit.54 We also evaluated OMIM disease genes,55 genes with an eQTL in peripheral blood from the largest human eQTL study,49 We focused on genes with q<0.05 and performed clumping using PLINK to retain eQTL SNPs with r2<0.6 within 500-kb windows (--clump-p1 0.05 --clump-p2 0.05 --clump-r2 0.6 --clump-kb 500). To guard against a falsely inflated intersection rate with the GWAS catalog SNPs, we used q-values rather than P-values as input for clumping and identified GWAS catalog SNPs with reported P-values <1 × 10−9 that intersected the clumped regions.
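The final intersection step, finding GWAS catalog SNPs that fall inside clumped eQTL regions, might look like the sketch below. It assumes that clumped regions are available as (chromosome, start, end) tuples and catalog entries as (rsID, chromosome, position, P-value) tuples; these layouts, the function name and the example identifiers are hypothetical and are not taken from PLINK output or the GWAS catalog format.

```python
def snps_in_clumped_regions(catalog_snps, regions, p_max=1e-9):
    """Return rsIDs of catalog SNPs with P < p_max located inside any clumped region."""
    # Index regions by chromosome for a simple per-chromosome scan.
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))

    hits = []
    for rsid, chrom, pos, p in catalog_snps:
        if p >= p_max:
            continue  # keep only strongly associated catalog SNPs
        if any(start <= pos <= end for start, end in by_chrom.get(chrom, [])):
            hits.append(rsid)
    return hits

# Hypothetical inputs for illustration only.
regions = [("1", 1_000_000, 1_400_000), ("6", 50_000_000, 50_200_000)]
catalog = [
    ("rsA", "1", 1_200_000, 5e-12),   # inside a region, passes the P-value filter
    ("rsB", "1", 1_200_000, 1e-6),    # inside a region, fails the P-value filter
    ("rsC", "2", 1_200_000, 1e-15),   # no region on this chromosome
]
overlapping = snps_in_clumped_regions(catalog, regions)
```

A linear scan per chromosome is adequate for a few thousand catalog SNPs; an interval tree would be a natural replacement for much larger inputs.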

Results

Meta-analysis of eQTL

We first conducted eQTL analyses for each of the five cortical studies. After quality control, sample sizes ranged from 24 to 189 and the numbers of transcripts ranged from 10 038 to 15 857 per study (17 537 genes were evaluated in at least one study). Supplementary Figures S5–S9 show plots of gene location versus eQTL location. We defined a local eQTL as an SNP–gene eQTL in which the SNP was within ±1 Mb of the transcription start or end site of the gene, and a distant eQTL otherwise. As expected, local eQTLs tended to have stronger effects than distant eQTLs. Studies with small sample sizes showed much weaker local eQTLs.
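The local versus distant definition can be expressed as a short function. This is a sketch under two assumptions that the text does not spell out: SNPs inside the gene body are treated as local, and SNP–gene pairs on different chromosomes are treated as distant. The function and parameter names are illustrative.

```python
def classify_eqtl(snp_chrom, snp_pos, gene_chrom, gene_start, gene_end,
                  window=1_000_000):
    """Label a SNP-gene pair as 'local' or 'distant'.

    'local' means the SNP lies on the same chromosome and no more than
    `window` bp upstream of the transcription start site or downstream
    of the transcription end site (gene body included, by assumption).
    """
    if snp_chrom != gene_chrom:
        return "distant"
    # Order the coordinates so the check works for genes on either strand.
    lo, hi = min(gene_start, gene_end), max(gene_start, gene_end)
    if lo - window <= snp_pos <= hi + window:
        return "local"
    return "distant"
```

For example, a SNP at position 1 500 000 and a gene spanning 2 000 000 to 2 050 000 on the same chromosome would be classified as local, because the SNP is 500 kb from the nearer gene boundary.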

We next conducted a meta-analysis of 424 brain samples across five studies to identify regulatory variants influencing gene expression in human cortex (Table 1). As previous eQTL studies14,

Figure 2

Predicted functional consequences of local eQTL SNPs. (a) Functional consequences of significant eQTLs (q<0.05, 143 679 unique SNPs) using the Ensembl Variant Effect Predictor tool. Each SNP was assigned to the most severe predicted consequence. The ratio on each bar represents the number of SNPs with regulatory features divided by the number of SNPs in each functional category. (b) Functional consequences of randomly selected, MAF-matched, nonsignificant eQTLs (q>0.5, 143 679 unique SNPs). eQTL, expression quantitative trait loci; MAF, minor allele frequency; SNP, single-nucleotide polymorphism.

We evaluated whether there were significant differences between the classifications of significant and randomly selected nonsignificant eQTL SNPs. The overall distributions were significantly different between the two sets of SNPs (χ2 P<1 × 10−4). Comparing each functional consequence with the intergenic category also revealed significant differences between the two sets of SNPs (Supplementary Table S4). Odds ratios ranged from 3.5 to 9.2 and all P-values were <1 × 10−4. SNPs in the 5′-untranslated region showed the largest difference, with 9.2-fold higher odds of being significant eQTLs.
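For readers who want to see how an odds ratio relative to the intergenic category is formed, a minimal sketch follows. The counts in the example are invented for illustration and are not the study's data; the function name and category labels are likewise assumptions.

```python
def odds_ratio_vs_reference(sig_counts, nonsig_counts, reference="intergenic"):
    """Odds ratio of being a significant eQTL SNP for each category,
    relative to the reference category.

    For a category with a significant and b nonsignificant SNPs, and a
    reference with c significant and d nonsignificant SNPs, the odds
    ratio is (a / b) / (c / d) = (a * d) / (b * c).
    """
    c, d = sig_counts[reference], nonsig_counts[reference]
    ratios = {}
    for category, a in sig_counts.items():
        if category == reference:
            continue
        b = nonsig_counts[category]
        ratios[category] = (a * d) / (b * c)
    return ratios

# Invented counts, chosen only to make the arithmetic easy to follow.
sig = {"intergenic": 100, "5_prime_UTR": 90, "intron": 700}
nonsig = {"intergenic": 1000, "5_prime_UTR": 100, "intron": 2000}
ratios = odds_ratio_vs_reference(sig, nonsig)
```

With these invented counts the 5′-untranslated region has odds 90/100 against intergenic odds of 100/1000, giving an odds ratio of 9.0.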

Prior studies observed clustering of significant local SNP–gene eQTLs near transcription start sites.11,26,78 Analysis was restricted to 4074 GO categories containing between 5 and 3000 genes to account for pathway sizes. We used permutation to obtain empirical P-values per pathway and to correct for multiple testing. Multiple GO pathways related to mitochondrial structure and function were ranked as top pathways (Supplementary Table S7). This result is consistent with the DAVID results, indicating that the mitochondrial pathway findings are robust to the choice of gene-set enrichment method. We tested for enrichment in mitochondrial pathways by further analyses using nuclear-encoded mitochondrial genes from MitoCarta (http://www.broadinstitute.org/pubs/MitoCarta),75 autosomal oxidative phosphorylation genes,76 and nuclear-encoded transcriptional regulators of mitochondrial genes.76,79 Of 914 nuclear-encoded mitochondrial genes, 257 genes (28%) overlapped with genes showing significant eQTL evidence. We observed strong enrichment of significant eQTL genes in autosomal mitochondrial genes (odds ratio=1.60, P=1.3 × 10−9 using all genes; odds ratio=1.39, P=1.0 × 10−4 using brain-expressed genes). However, no enrichment was observed for nuclear regulators of mitochondrial genes (P=0.80 using all genes, P=0.95 using brain-expressed genes) or oxidative phosphorylation genes (P=0.06 using all genes, P=0.11 using brain-expressed genes).
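A gene-set enrichment of this kind can be illustrated with a 2 × 2 table and a one-sided hypergeometric (Fisher-type) test. The text does not state which test produced the reported odds ratios and P-values, so the sketch below is one plausible formulation rather than a reproduction of the analysis; the function names and the toy numbers are assumptions.

```python
from math import comb

def filter_gene_sets(gene_sets, min_size=5, max_size=3000):
    """Keep gene sets whose size lies within [min_size, max_size]."""
    return {name: genes for name, genes in gene_sets.items()
            if min_size <= len(genes) <= max_size}

def enrichment(n_universe, n_set, n_eqtl, n_overlap):
    """Odds ratio and one-sided P-value for the overlap between a gene set
    and the eQTL genes, given a background universe of genes.

    The P-value is the hypergeometric probability of an overlap at least
    as large as n_overlap. The odds ratio is undefined (division by zero)
    when either off-diagonal cell of the table is empty.
    """
    a = n_overlap                   # in set, eQTL gene
    b = n_set - n_overlap           # in set, not an eQTL gene
    c = n_eqtl - n_overlap          # not in set, eQTL gene
    d = n_universe - n_set - c      # not in set, not an eQTL gene
    odds_ratio = (a * d) / (b * c)

    total = comb(n_universe, n_eqtl)
    upper = min(n_set, n_eqtl)
    tail = sum(comb(n_set, k) * comb(n_universe - n_set, n_eqtl - k)
               for k in range(a, upper + 1))
    return odds_ratio, tail / total

# Toy numbers: 20 genes in total, a set of 5, 10 eQTL genes, 4 in both.
toy_or, toy_p = enrichment(n_universe=20, n_set=5, n_eqtl=10, n_overlap=4)
```

Restricting the universe to brain-expressed genes, as in the second set of reported results, corresponds to changing `n_universe` and recounting the other three inputs within that background.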

Second, genes expressed in multiple tissues tend to have local regulatory elements.80 To evaluate the hypothesis, we compared eQTL genes in peripheral blood57 We restricted our search to GWAS SNPs with P<1 × 10−9, yielding 2946 SNPs for 471 traits from 869 papers. Of the 2946 SNPs implicated by GWAS, 528 (17.9%) were part of a local eQTL (178 directly associated with 94 traits and 350 SNPs indirectly via a proxy SNP with r2>0.2). The 10 most frequent traits were height (12 SNPs), inflammatory bowel disease (11), Crohn’s disease (9), plasma phospholipid levels (6), total cholesterol (5), chronic kidney disease (4), coronary heart disease (4), HDL cholesterol (4), metabolite levels (4) and red blood cell traits (4). We evaluated brain eQTLs that overlap with the SNPs associated with central nervous system-related phenotypes (Table 5), and identified overlap with bipolar disorder, Parkinson’s disease and nicotine dependence.

Table 5 eQTL SNP clumping regions and brain diseases from the NHGRI GWAS catalog

We compared our eQTL genes to OMIM,55 which catalogs genes often containing rare variation with strong effects. We observed significant enrichment of genes with significant brain eQTL evidence among OMIM disease genes (odds ratio=1.15, P=0.009). Mitochondrial complex I deficiency and Leigh syndrome were the second most frequent diseases in our data (FOXRED1, NDUFA2, NDUFA10, NDUFAF1, NDUFAF2, NDUFAF4 and NDUFS2).

Protein–protein interaction

We used DAPPLE81 to evaluate whether genes with strong evidence of local eQTLs were connected via protein–protein interactions. Genes with evidence of local eQTLs showed somewhat higher network connectivity (direct P=0.04 and indirect P=0.01). However, many of these genes were in small networks rather than a single network, suggesting that there is no dominant functional network related to all these genes (Supplementary Figure S15).