Introduction

Osteoporosis is a highly polygenic disease characterized by low bone mass and deterioration of bone microarchitecture, leading to increased skeletal fragility and fracture risk [1, 2, 3]. Bone mineral density (BMD) measures the mineral component of bone; low BMD, independently of bone quality or structure, is a strong, clinically relevant risk factor for osteoporosis and a key indicator for its diagnosis and treatment [4, 5]. Although BMD is most often measured by dual-energy X-ray absorptiometry (DXA) scanning in clinical settings, it can also be estimated from ultrasound, typically at the heel (referred to here as estimated BMD (eBMD)). A previous genome-wide association study (GWAS) of eBMD that used heel ultrasound parameters identified 84% of all currently known genome-wide significant loci for DXA-derived BMD [32, 33].

We used strict thresholds: PP3 > 0.9 for evidence of trait−gene expression associations caused by multiple distinct causal variants from a GWAS and an eQTL and PP4 > 0.8 for evidence of trait−gene expression associations caused by a joint signal from a GWAS and an eQTL [34].

Assessment of gene−disease associations

To assess whether the candidate genes are functionally linked to osteoporosis, and therefore more likely to be causal, we evaluated the associations of biological function between the candidate genes and osteoporosis using VarElect [35, 36], a Variant Election application for disease/phenotype-dependent gene variant prioritization. VarElect ranks the genes in a shortlist by their likelihood of being associated with the disease of interest and produces a list of prioritized, scored, and contextually annotated genes with direct links to supporting evidence and additional information. VarElect uses the LifeMap Knowledgebase to infer “direct” or “indirect” associations of biological function between genes and phenotypes. A “direct” association is supported by studies showing that the gene can directly affect disease development, whereas an “indirect” association is based on shared pathways, protein−protein interaction networks, paralogy relationships, domain sharing, and mutual publications.

Protein−protein interaction (PPI) network and pathway enrichment analysis

The functional networks of genes that were found to be significantly associated with osteoporosis by the TWAS were further validated using the STRING and CluePedia tools. STRING (Search Tool for the Retrieval of Interacting Genes) is an online tool designed to evaluate PPI networks [37, 38], and CluePedia is a plugin of Cytoscape software that searches for potential genes associated with certain signaling pathways by calculating linear and nonlinear statistical dependencies from experimental data [39, 40]. The PPI networks of the significant genes identified by the TWAS were constructed using STRING. The functional pathways were detected and visualized using CluePedia.

Differential analysis of gene expression

To further validate the functional causality of the candidate genes, we searched the Gene Expression Omnibus (GEO) database and the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) database to identify gene expression profiling studies of subjects with osteoporosis. The following key search terms were used: “osteoporosis,” “gene expression,” and “microarray.” We obtained gene expression profiles from four different sources and included original microarray studies that analyzed the differential gene expression profiles between patients with osteoporosis and normal controls, as shown in Table 2. Heterogeneity among microarray studies, which arises from differences in microarray platforms, gene nomenclature, and clinical samples, makes direct comparison of the gene expression data infeasible, so normalization is necessary to minimize it. We therefore applied the robust multiarray average (RMA) approach [41] for background correction and normalization. The original GEO data were then converted into expression measures. The Limma package [42] was used to identify probe sets that were differentially expressed between patients with osteoporosis and normal controls. Gene-specific t tests were performed, and p values were calculated. After adjustment for multiple testing, genes with adjusted p values < 0.05 were selected as differentially expressed genes (DEGs).
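
The final selection step can be sketched as follows. The text does not name the multiple-testing procedure, so this minimal example assumes the Benjamini-Hochberg false discovery rate adjustment; the function names and the toy p values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(gene_p_values, alpha=0.05):
    """Return the genes whose adjusted p value is below alpha."""
    genes = list(gene_p_values)
    adjusted = benjamini_hochberg([gene_p_values[g] for g in genes])
    return [g for g, q in zip(genes, adjusted) if q < alpha]


# Hypothetical p values for illustration only (assumption, not study data).
toy_p_values = {"geneA": 0.001, "geneB": 0.02, "geneC": 0.04, "geneD": 0.5}
print(select_degs(toy_p_values))
```

In this toy input, geneC has a raw p value below 0.05 but is not selected after adjustment, which is the behavior the adjusted cut-off is meant to enforce.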

Results

TWAS-based identification of candidate genes for the treatment of osteoporosis

We first applied the TWAS method to GWAS summary data from the GEFOS consortium to identify candidate genes associated with osteoporosis. We used the eBMD GWAS summary dataset rather than the fragility fracture dataset because the TWAS of fragility fractures yielded no significant enrichment, as shown in Supplementary Table 7. The gene expression reference panel for muscle-skeletal tissue, which contains 13,416 expressed genes, was used. The TWAS identified 204 significantly associated genes with a p value < 3.7E-06, as shown in Fig. 1.
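
The transcriptome-wide threshold is consistent with a Bonferroni correction over the 13,416 genes in the reference panel. The text does not state how the threshold was derived, so the correction method and the family-wise alpha of 0.05 in this short sketch are assumptions.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# 13,416 is the number of expressed genes reported for the reference panel;
# alpha = 0.05 is an assumed family-wise error rate.
threshold = bonferroni_threshold(0.05, 13416)
print(f"{threshold:.1e}")
```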

Fig. 1

Manhattan plot of the results from the TWAS (upper panel) and GWAS (lower panel) of osteoporosis. The transcriptome-wide significance threshold was p value = 3.7E-06; the genome-wide significance threshold was p value = 6.6E-09. A total of 1103 conditionally independent SNPs at 515 loci among n = 426,824 UK Biobank participants passed the criteria for genome-wide significance

The TWAS method detects candidate causal genes by predicting gene expression from genetic variants and testing the predicted expression for association with the trait. The TWAS revealed the following four biological patterns (Fig. 2). First, for SNPs in coding regions (introns and exons) that were significantly associated with osteoporosis, the causal genes identified by the GWAS and the TWAS were likely to be consistent, as shown in Fig. 2a. The effect of rs10411210 (PGWAS = 1.6E-119) on osteoporosis obtained from the GWAS is consistent with its effect on RHPN2 (PTWAS = 4.4E-73) gene expression identified by the TWAS. Second, for SNPs in noncoding regions, the candidate genes might be close to the significant eQTLs but differ from the GWAS hits, as shown in Fig. 2b. The variant rs2785197 (PGWAS = 6.5E-44) in 11p13 was mapped to PDHX by the GWAS, but our TWAS results indicated that its causal gene was more likely to be CD44 (PTWAS = 1.1E-32). The colocalization analysis showed that CD44 gene expression (PP4 = 0.99 in Supplementary Table 2) was regulated by the single variant rs2785197, which might be regarded as its expression regulation element. Third, the candidate genes might be regulated by relatively distant significant SNPs in noncoding regions, as shown in Fig. 2c. Our TWAS results indicated that rs4792909 (PGWAS = 1.5E-74) in 17q21.31 might be associated with G6PC3 (PTWAS = 4.2E-26). The distance between rs4792909 and G6PC3 is 387 kb, and no gene near rs4792909 was identified by the GWAS. Fourth, candidate genes were discovered based on SNPs that were not significantly associated with osteoporosis in the GWAS. In one such nonsignificant region, rs1003260 (PGWAS = 3.6E-08) in 6q13 was associated with RIMS1 (PTWAS = 2.1E-08), as shown in Fig. 2d. To our knowledge, this is the first report of RIMS1 as a novel locus associated with BMD, and further investigation was performed.
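
The fourth pattern is easiest to see by comparing each reported p value with its threshold. The sketch below uses only the p values and thresholds quoted in the text and the figure captions; the variable and function names are illustrative.

```python
GWAS_THRESHOLD = 6.6e-09  # genome-wide threshold quoted in the Fig. 1 caption
TWAS_THRESHOLD = 3.7e-06  # transcriptome-wide threshold quoted in the text

# (lead SNP, TWAS gene, P_GWAS, P_TWAS) for the four examples in Fig. 2.
EXAMPLES = [
    ("rs10411210", "RHPN2", 1.6e-119, 4.4e-73),
    ("rs2785197", "CD44", 6.5e-44, 1.1e-32),
    ("rs4792909", "G6PC3", 1.5e-74, 4.2e-26),
    ("rs1003260", "RIMS1", 3.6e-08, 2.1e-08),
]


def significance_flags(examples):
    """Map each gene to (GWAS significant?, TWAS significant?)."""
    return {
        gene: (p_gwas < GWAS_THRESHOLD, p_twas < TWAS_THRESHOLD)
        for _snp, gene, p_gwas, p_twas in examples
    }


for gene, (gwas_ok, twas_ok) in significance_flags(EXAMPLES).items():
    print(f"{gene}: GWAS significant={gwas_ok}, TWAS significant={twas_ok}")
```

Only RIMS1 falls short of the genome-wide threshold while still passing the transcriptome-wide one, which is why it is described as a locus the GWAS alone would not have reported.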

Fig. 2

Biological patterns identified by the TWAS. (a) For significant SNPs in coding regions, rs10411210 (PGWAS = 1.6E-119) in 19q13.11 is associated with RHPN2 (PTWAS = 4.4E-73). (b) For SNPs in noncoding regions, the GWAS mapped rs2785197 (PGWAS = 6.5E-44) in 11p13 to PDHX (marked in green), but our TWAS indicated that the causal gene is more likely to be CD44 (marked in red; PTWAS = 1.1E-32). (c) rs4792909 (PGWAS = 1.5E-74) in 17q21.31 might be associated with G6PC3 (PTWAS = 4.2E-26). The distance between rs4792909 and G6PC3 is 387 kb, and no gene near rs4792909 has been identified by a GWAS. (d) rs1003260 (PGWAS = 3.6E-08) in 6q13 is associated with RIMS1 (PTWAS = 2.1E-08)

Gene expression differences identified by a TWAS might be causally associated with the phenotype of interest but can also arise from linkage disequilibrium between variants or from co-expression of gene products [43, 44]. To pinpoint the causal relationship between the target gene of an eQTL and a complex trait, we performed a colocalization analysis using the COLOC method (see the Methods section). We used a strict threshold of PP4 > 0.8 for single-variant colocalization and a stricter threshold of PP3 > 0.9 for multiple-variant colocalization because the PP3 category appears slightly more inflated than PP4 (see the QQ plot in Supplementary Figure 4). The results showed that 103 TWAS associations had strong evidence of distinct causal variants (PP3 > 0.9; Supplementary Table 1) and 101 had evidence of a single shared causal variant (PP4 > 0.8; Supplementary Table 2).
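
The two cut-offs amount to a simple decision rule on the posterior probabilities. The following sketch is an illustration of that rule, not the COLOC implementation; the function name, the category labels, and the PP3 value used in the example are hypothetical, whereas PP4 = 0.99 is the value reported for CD44.

```python
def classify_colocalization(pp3, pp4, pp3_cutoff=0.9, pp4_cutoff=0.8):
    """Assign a colocalization category from the PP3 and PP4 posteriors."""
    if pp4 > pp4_cutoff:
        # GWAS and eQTL signals share a single causal variant.
        return "single shared causal variant"
    if pp3 > pp3_cutoff:
        # GWAS and eQTL signals are driven by distinct causal variants.
        return "distinct causal variants"
    return "inconclusive"


# PP4 = 0.99 is reported for CD44; the PP3 value is a placeholder (assumption).
print(classify_colocalization(pp3=0.005, pp4=0.99))
```

Because the posterior probabilities of all hypotheses sum to one, a gene cannot satisfy both cut-offs at once, so the order of the two checks does not change the result.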

Compared with previous GWASs, 51 of the identified genes had already been implicated in osteoporosis risk by GWASs reported in the literature, whereas 153 genes had not been reported to be associated with osteoporosis risk in previous GWASs, as shown in Fig. 3a, b.

Fig. 3

Significant genes and candidate genes in muscle-skeletal tissue identified by the TWAS. (a) Comparison of significant genes found using the TWAS and GWAS methods. (b) The top 20 candidate genes that were not reported in previous GWASs. The red bars indicate upregulated gene expression, and the blue bars indicate downregulated gene expression (full lists can be found in Supplementary Figure 1, Supplementary Table 1 and Supplementary Table 2).

Assessment of the candidate gene−osteoporosis associations

For the 153 candidate genes, we evaluated the associations with osteoporosis using VarElect. The results showed that 20 genes were “directly” associated (Supplementary Table 3), 83 genes were “indirectly” associated (Supplementary Table 4), and the remaining 50 genes could not be classified. A direct association indicates that the target gene is supported by rich evidence (the relevant literature, gene function annotation, etc.). The score shown in Supplementary Table 3 indicates the strength of the association between the gene and osteoporosis: a higher score indicates stronger evidence. Indirectly associated genes might interact with intermediaries to influence the development of osteoporosis through a PPI network and pathways (Supplementary Table 5). We considered the remaining unclassified genes, which were mainly lncRNAs, pseudogenes, and antisense genes, as novel candidate genes. These novel candidate markers are potential disease factors for which no evidence is currently available and thus need further investigation.

Functional pathways of the candidate genes

To further verify the associations between the significant genes identified by the TWAS and osteoporosis, we explored the biological function pathways of these genes using the STRING and CluePedia tools. Four pathways that may improve our understanding of the mechanism of osteoporosis were enriched (adjusted p value < 0.05), as shown in Table 1. However, most eBMD-related genes have not yet been well studied, and few overlaps were found with the KEGG database. Other nonsignificant KEGG pathways may also have important roles in osteoporosis and provide additional clues, as shown in Supplementary Table 5. Some of these pathways interact with each other, as shown in Supplementary Figure 5 (e.g., PI3K-Akt signaling, focal adhesion, and ECM-receptor interaction). These results show that the significant genes identified by the TWAS are involved in many biological mechanisms in the development of osteoporosis.

Table 1 Functional pathways of significant genes identified by the TWAS

Functional validation for the candidate genes

Previous expression-profiling research that used gene signatures from cellular models to characterize gene involvement in bone metabolism and disease processes revealed that impaired osteoblastic differentiation reduces bone formation and causes severe osteoporosis in animals [45]. We analyzed four gene expression profile datasets from bone, bone marrow, monocytes, and B cells, which compared either patients with osteoporosis and normal controls or high- and low-BMD groups. Based on the cut-off criterion for the identification of DEGs (adjusted p value < 0.05), a total of 15 significant genes identified by the TWAS were replicated, and 11 of these genes were not found by the GWAS, as shown in Table 2. The results of the functional pathway analysis also supported our findings. As shown in Fig. 4, four differentially expressed genes were enriched in four KEGG pathways that have been strongly linked to osteoporosis. SLC11A2, which is enriched in the mineral absorption pathway (Ppathway = 0.019), regulates the fine-tuned balance between bone resorption and bone formation and thus affects bone density [46]; MAP2K5 is enriched in the MAPK signaling pathway (Ppathway = 0.388), which is involved in the regulation of many cellular physiological functions, such as proliferation, differentiation, inflammation, and apoptosis, and affects bone formation [47, 48]; NFATC4 is enriched in the Wnt signaling pathway (Ppathway = 0.104) and is a candidate for therapeutic intervention aimed at increasing bone mass and strength in treated patients [49, 50]; HSP90B1 is enriched in the PI3K-AKT signaling pathway (Ppathway = 0.020), which is involved in the inhibition of osteoporosis through the promotion of osteoblast proliferation, differentiation, and bone formation [51, 52]. Therefore, we inferred that these genes are very likely to be causal pathogenic genes of osteoporosis.
Because of the small sample sizes of the mRNA expression datasets, more experiments and other types of RNA datasets are needed in the future.

Table 2 Significant genes identified by the TWAS that show significantly differential gene expression in the four gene expression profile datasets. Genes marked in red were not identified by the GWASs

Fig. 4

Biological function verification of the significant genes identified by the TWAS. SLC11A2, NFATC4 and HSP90B1 showed significantly differential expression in bone tissue and bone marrow cells between patients with osteoporosis and normal controls. MAP2K5 gene expression is significantly downregulated in B cells of low-BMD samples

OP osteoporosis, NOR normal, BMD bone mineral density

Discussion

Multiple GWASs with considerable sample sizes have been performed to dissect the heritability of osteoporosis, but progress toward understanding the mechanism of the disease has been limited. Most GWAS hits lie in noncoding regions, which makes their downstream biological consequences difficult to infer, and in most cases the nearest genes are reported [53, 54]. However, SNPs in noncoding regions do not necessarily regulate the nearest gene. Integrating GWAS and transcriptome data can enable novel discoveries and, most importantly, help pinpoint causality. The TWAS method calculates local SNP−gene expression correlations and then estimates the likelihood of gene causality. Therefore, for a significant SNP in a coding region, the causal genes identified by the GWAS and TWAS methods should be, and indeed are, consistent, as shown in Fig. 2a. For SNPs in noncoding regions, the causal genes might be close to the significant eQTLs and might differ from the GWAS hits, as shown in Fig. 2b. The most relevant GWAS variants and their nearest genes were not always enriched as causal variants/genes by the TWAS. Accordingly, we compared the GWAS-reported genes and the TWAS-enriched genes in all significant GWAS regions, as shown in Supplementary Table 8. The TWAS method can even discover causal genes in regions with no significant GWAS hits, as shown in Fig. 2d, and genes regulated by relatively distant significant SNPs, as shown in Fig. 2c. Additional region plots can be found in Supplementary Figure 2.
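
How local SNP−expression weights are combined with GWAS statistics can be illustrated with one widely used summary-statistics formulation, in which the gene-level statistic is z_TWAS = w'z / sqrt(w'Σw), where w holds the expression prediction weights of the cis-SNPs, z holds their GWAS z-scores, and Σ is their linkage disequilibrium (correlation) matrix. The text does not state which TWAS implementation was used, so this formulation is an assumption, and the weights, z-scores, and LD matrix below are hypothetical toy values.

```python
import math


def twas_z(weights, gwas_z, ld):
    """Gene-level TWAS z-score from SNP weights, GWAS z-scores, and LD."""
    n = len(weights)
    # Weighted sum of the SNP-level GWAS z-scores.
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    # Variance of that weighted sum under the null, given the LD matrix.
    variance = sum(
        weights[i] * ld[i][j] * weights[j]
        for i in range(n)
        for j in range(n)
    )
    return numerator / math.sqrt(variance)


# Toy inputs for two cis-SNPs in moderate LD (assumptions, not study data).
weights = [1.0, 1.0]
gwas_z = [2.0, 2.0]
ld = [[1.0, 0.5], [0.5, 1.0]]
print(round(twas_z(weights, gwas_z, ld), 3))
```

The denominator is what keeps correlated SNPs from being counted as independent evidence: with the same weights and z-scores but no LD between the SNPs, the statistic would be larger.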

We found 204 significant candidate genes through our TWAS. Among these genes, 103 genes were regulated by two distinct causal variants, and 101 genes were regulated by a single causal variant. In comparison with the GWAS, 51 genes overlapped with previously reported genes. For the remaining 153 genes, an analysis of their biological functions revealed that 20 genes directly affected pathways closely related to the development of osteoporosis: IBSP, EIF2B2, CD44, FEN1, UBA7, MARCO, ATF1, CBFB, G6PC3, SLC11A2, MST1R, PLEKHM1, ATRIP, CCDC36, AKAP7, EPRS, CTSB, CRHR1, FADS1, and MAP1LC3A. For example, IBSP (score = 13.74) is the gene most strongly associated with osteoporosis. The COLOC analysis showed that the SNPs rs1471403 and rs1054627 might co-regulate IBSP gene expression. In addition, previous studies have shown that IBSP is expressed in all major bone cells, including osteoblasts, osteocytes, and osteoclasts [55], and encodes a major noncollagenous bone matrix protein that binds to calcium and hydroxyapatite via acidic amino acid clusters and is involved in the PI3K-AKT signaling pathway [51]. In contrast, we found that 83 genes appear to affect the development of osteoporosis through a PPI network. As shown in Supplementary Figure 3, RAC3 and NFATC4 were enriched in the MAPK signaling pathway (Ppathway = 0.388) through interactions with genes (ESR1, FOS, IGF1, TGFB1, JUN, NFATC1, LRP5, TNF, and PRKACA) that are known to be associated with osteoporosis. More information on gene interactions can be found in Supplementary Tables 4 and 5. In addition, 50 significant markers were considered novel candidate genes because they are not associated with osteoporosis based on existing knowledge; these include 19 genes, 13 lincRNAs, 9 pseudogenes, and 9 antisense genes. Some of these candidate genes, such as AF131215.2 (PTWAS = 1.92E-66) and RP11-73M18.6 (PTWAS = 2.72E-51), were highly significant.
We also found that RIMS1 was located in a new locus and that its causal SNPs were not significantly associated with osteoporosis in the GWAS. RIMS1 is an RAS gene superfamily member and plays a role in the regulation of voltage-gated calcium channels during neurotransmitter and insulin release. Although previous studies have not provided evidence supporting a causal association with osteoporosis, these genes might be potential causal biomolecules for osteoporosis, and more experiments are needed to verify their biological function.
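
The gene counts reported in this paragraph can be cross-checked with a few lines of arithmetic. All numbers below are taken from the text; only the variable names are illustrative.

```python
total_twas_genes = 204
distinct_variant_genes = 103   # PP3 > 0.9
single_variant_genes = 101     # PP4 > 0.8
reported_by_gwas = 51
direct, indirect = 20, 83      # VarElect categories
novel_breakdown = {"genes": 19, "lincRNAs": 13, "pseudogenes": 9, "antisense": 9}

# Candidate genes not previously reported by GWASs.
candidates = total_twas_genes - reported_by_gwas
# Candidates that VarElect classified neither as direct nor as indirect.
novel = candidates - direct - indirect
# Genes with some prior link to osteoporosis (GWAS, direct, or indirect).
previously_associated = reported_by_gwas + direct + indirect

print(candidates, novel, previously_associated)
```

The two colocalization categories sum to the 204 TWAS genes, the four novel classes sum to the 50 unclassified candidates, and the 154 previously associated genes match the figure given in the summary.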

Furthermore, we obtained additional evidence by comparing differentially expressed genes across four types of gene expression profiles. Our results identified 15 significantly differentially expressed genes, as shown in Table 2. Among them, 11 genes were not discovered by the GWAS; these included SLC11A2, G6PC3, and MAP1LC3A, which were classified as directly associated with osteoporosis, and NFATC4, HSP90B1, TP53I13, MTMR9, PCGF2, MAPT-AS1, and MAP2K5, which are considered to be indirectly associated with osteoporosis based on PPI networks and the literature. Notably, SLC11A2, NFATC4, HSP90B1, and MAP2K5 were enriched in four important pathways. In addition, GPATCH1, SPTBN1, DPP8, and ISYNA1 were also found in the GWAS, and our results once again confirmed their potential as candidate disease markers. The biological function information of these 15 genes can be found in Supplementary Table 6.

To our knowledge, this investigation constitutes the largest study integrating the GWAS and TWAS methods to identify osteoporosis susceptibility genes. We used data from 426,824 UK Biobank participants in the eBMD GWAS and 860 samples from GTEx in our analyses. Although the study yielded many findings, it has some limitations. First, the current TWAS method cannot explain variants that influence disease independently of cis expression because it was trained only on cis-eQTL analysis. Second, some bias might exist because normal muscle-skeletal tissues from GTEx were used to make predictions. Third, tissue sensitivity and tissue specificity are important issues to consider when performing a TWAS. Prediction models built on gene expression data from osteoblast cells of osteoporosis patients will help identify additional candidate genes associated with osteoporosis [56].

In summary, we integrated GWAS and transcriptome expression data to identify 204 significant genes associated with osteoporosis. Of these, 154 genes have been previously associated with osteoporosis (through the literature, protein−protein interaction networks, pathways, etc.), and 50 genes have not been previously discovered. We also analyzed the biological patterns of these loci and explained their pathway interactions. We hope that our findings will provide novel insights for future pathogenetic studies of osteoporosis.