Background

Heart failure (HF) with preserved ejection fraction (HFpEF) represents an unmet public health concern with rising prevalence [1, 2]. Accumulating evidence indicates HFpEF is clinically and biologically distinct from HF with reduced ejection fraction (HFrEF), as reflected by missing therapy response in HFpEF patients to drugs effective in HFrEF [3]. HF patients suffer from a wide range of comorbidities, which are considered important for HF development and progression [4]. In the pathogenesis of HFpEF, comorbidities have been suggested as causal factors [3, 5] and could possibly be linked to genetic etiology. Treatment of comorbidity has also been shown to have beneficial effects of cardiac physiology [6], emphasizing the potential to address HF subtypes through their comorbidities.

Systems medicine attempts to model disease in a holistic manner. One facet of this, network medicine, is used to analyze complex systems such as patients, organs, or cells via network representation [7, 8]. Comorbidity networks represent diseases as nodes, connected via edges based on co-occurrence in patients. These networks can be used to define disease modules or explore topological changes between patient cohorts [9,10,11,12]. Previous work has shown that disease comorbidity is also often linked to shared disease genes that locate close together in gene-based networks like protein–protein interaction networks [12, 13]. This observation is often the basis of network-based gene prediction, where novel disease genes are predicted based on network proximity to known disease genes.

Cardiovascular diseases are particularly suited for system medicine approaches due to the typical multiorgan involvement [14] and multifactorial etiology [15]. To date, such approaches to study HFpEF have been limited, though the comorbidity-driven pathophysiology of HFpEF makes it a promising subject. In addition, despite the technological advances in multi-omics, knowledge of molecular characteristics of HFpEF remains limited, possibly due to difficulties of biopsy acquisition in HFpEF patients [16] and heterogeneity of HFpEF patients [17].

In this study, we applied a network medicine approach to describe comorbidity patterns in HFpEF and investigate a shared genetic background associated with these patterns. We first demonstrated that comorbidity profiles vary between HFpEF and HFrEF patients and derived distinct comorbidity profiles for each cohort. Then, we built a comorbidity network that contained disease clusters relevant for HF patients. The construction of a multilayer heterogeneous network by integration of prior knowledge resources allowed us to translate the comorbidity profiles into a gene signature for HFpEF. We corroborated this signature in the cardiac transcriptome of a murine HFpEF model. This network medicine approach allowed us to identify distinct comorbidity profiles and novel genetic patterns in HFpEF.

Methods

Study population

The study population was derived from a research data warehouse containing data from patients that visited the Department of Cardiology, Angiology, and Pneumology at Heidelberg University Hospital, Heidelberg, Germany [18]. Heidelberg University Hospital acts as a tertiary care center for the surrounding region, specializing in the treatment of cardiomyopathy. From this data warehouse, we identified patients with HF, visiting between 01.01.2008 and 01.01.2021. The study protocol was approved by the local ethics committee. HF was defined as two or more HF-relevant International Classification of Disease, version 10 (ICD-10) diagnosis codes (I50*, I11.0, I13.0, I13.2, I42.0, I42.5, I42.8, I42.9, I25.5) or at least one HF-relevant diagnosis and at least one of the following criteria: (i) elevated N-terminal pro b-type natriuretic peptide (NTproBNP) (> 120 ng/ml), (ii) recorded New York Heart Association functional class, (iii) echocardiography based E/e’ > 15 ( ratio of early diastolic mitral inflow velocity to early diastolic mitral annulus velocity), (iv) echocardiography or MRI-based left ventricular ejection fraction (LvEF) < 50%, and (v) documented loop diuretic. Patients with HF before age 40, those with a diagnosis of inheritable cardiomyopathy (I42.1-I42.4, I42.6, I42.7), and heart transplant patients (Z94.3) were excluded from the HF cohort. Within the HF cohort, HF subtypes were identified, based on echocardiographic or MRI-based LvEF. Patients with LvEF ≥ 50% were labeled HFpEF, LvEF 40–50% HFmrEF (HF with mid-range ejection fraction), and ≤ 40% HFrEF (Fig. 1). For all patients in the HF cohort, demographics, ICD-10 codes, operational and procedural codes, and targeted clinically relevant measurements were processed (Additional file 1: Fig. S1A, B) [11, 19,20,21,22,23,24,25,26,27,Full size image

Multiple correspondence analysis (MCA)

Disease profiles of HFpEF, HFrEF, and HFmrEF cohorts were captured as binary variables (0—patient has no record, 1—patient has a record of disease) of 569 phenome-wide association scan codes (PheCodes) (Additional file 1: Supplementary Methods). In this feature space (569 comorbidities × 17,665 HF patients), we performed MCA (R-package FactoMiner [39]). Each MCA dimension was then tested for association with clinical covariates with linear regression models (e.g., MCA-dimension 1 ~ age). For each covariate, we then summed the variance associated to all significantly associated dimensions (p-value < 0.05) as an estimate for the total associated variance.

Patient classifier

We trained random forest and regularized logistic regression (elastic net) models to predict HFpEF and HFrEF cohort labels on 569 PheCodes. The highest achieved mean area under the receiver operating characteristic (AUROC) in hyperparameter tuning was reported as an estimate for the model test error (Additional file 1: Supplementary Methods).

To derive the comorbidity profiles for HFpEF and HFrEF, we performed a forward selection with an L1-regularized logistic regression model of the 196 non-zero features from the elastic net model. Models were trained in R with R packages tidymodels using model engines from glmnet [40] and ranger [41].

HF comorbidity network (HFnet) construction

In disease comorbidity networks, nodes represent diseases while edges represent statistical association of co-occurrence, resulting in the graphical depiction of comorbidities as diseases that are statistically dependent. In detail, we selected edges using Fisher’s exact test for estimating statistical dependence and its Benjamini–Hochberg (BH) corrected p-value (< 0.0001) to discard non-significant disease pairs and keep a more sparse network structure. To determine strength of association, we calculated ɸ correlation, which can be interpreted as a Pearson correlation for binary variables. We selected all edges with positive correlation. To account for bias in ɸ correlations, we scaled the values by dividing by mean correlation values for every disease and assigned these values as edge weights [42].

Network node characteristics, such as betweenness, closeness, and degree centrality, and transitivity were calculated with the igraph R package. To calculate metrics based on graph distance, we replaced weights for edge i (Wi) with a new edge score Si: \(S_i= \mathrm{max}(W)-W_i\). The constructed network was then compared to other comorbidity networks (Additional file 1: Supplementary Methods).

HF heterogeneous network (HFhetnet) construction

Disease-gene association

We used disease–gene associations provided by DisGeNet v7.0 [43, 44] and mapped the ICD-10 codes in DisGeNet to PheCodes (Additional file 1: Supplementary Methods). To ensure that the most frequent diseases in our cohort were mapped, we selected the most frequent 3-digit ICD-10 codes that were not mapped to DisGeNet and performed manual annotation via Unified Medical Language System (UMLS) IDs for 23 disease entities (Additional file 2: Supp. Table 1), e.g., PheCode 427.2 (atrial fibrillation) was manually mapped to the UMLS ID C000423. We only included disease–gene associations with a DisGeNet confidence score > 0.29. This cut-off was chosen, such that either one curated source or multiple experimental sources were necessary for disease–gene associations. Details on DisGeNET score calculation can be found at https://www.disgenet.org/dbinfo.

Gene–gene association

To consider multiple layers of gene organization, we constructed a multilayer gene network from different sources.

Omnipath [45, 46] is a meta resource of a multitude of biological knowledge databases, and we curated a network by connecting two genes if a resource provides a co-membership for a signaling pathway. We used the number of resources that reported a relationship as an estimate for the confidence in the relationship, which we introduced as edge weights in the Omnipath layer.

The protein–protein interaction (PPI) network was constructed based on the union of publicly available data from experimental and literature curated data (HuRi-union [47]) [48].

Gene Ontology (GO) gene networks have been constructed before, and we used the GO networks constructed by [49].

Each gene network was reduced to remove loops and multiple edges. To filter for genes relevant in cardiac tissue, gene networks were subset to genes expressed in the human heart on RNA or protein level. For protein expression, we used proteomic data [50, 51], where we selected all peptides that were detected in the human heart and used the leading gene associated with each peptide. For gene expression, we selected genes that were detected in the heart tissue in the Genotype-Tissue Expression (GTEx) Project v8 with a transcript per million value > 1 [52]. We chose this threshold to discard non-expressed genes but include lowly expressed genes. To ensure that gene programs only active in diseased hearts were also captured, we also included genes that were captured in a meta-analysis of HF transcriptomes [32, 33].

Disease–gene prediction and prioritization

To predict genes from diseases within the HFhetnet, we relied on a network propagation algorithm developed for multilayer networks (random walk with restart on multiplex heterogeneous networks; RWR-MH [53]). This algorithm is an extension of the random walk algorithm that tries to find a stationary distribution of probabilities that a node is visited when a random walk on the network is initiated in a set of seed nodes.

We assessed the performance of the link prediction task within the HFhetnet. For a given disease that was present in the HFhetnet and directly linked to two or more genes, we attempted to predict those genes after removing the direct links from the HFhetnet and running RWR-MH with the disease as seed node. The position of the target genes in the resulting probability ranking was then assessed with multiple metrics to estimate success of disease gene recovery (Additional file 1: Supplementary Methods).

For HFpEF and HFrEF gene prediction, we applied the RWR-MH, using the comorbidity profiles from the patient classifier as the seed nodes in the HFhetnet. This yielded two vectors of RW probabilities for each comorbidity profile. The top 500 genes yielded non-zero probability values for each profile. To select gene candidates that were differently ranked, we calculated a prioritization score for HFpEF and HFrEF. For this, we calculated \(G_i= {P_i}*|\Delta R_i|\).

G is the gene prioritization score, P is the RW-based probability, ΔR is the rank difference between HFpEF and HFrEF rankings for gene i.

Transcriptome analysis

We filtered lowly expressed genes and normalized samples using the Trimmed mean of M-values (edgeR [54]) and subsequent variance-stabilizing transformation (limma voom) and performed differential expression analysis (limma [55]. We performed principal component analysis and Gene Ontology enrichment with the enrichr [56] R package. For the overrepresentation analysis, we ranked genes by t-statistic and performed gene set enrichment analysis (fgsea R package [57]) of the top predicted HFpEF and HFrEF genes using different cut-offs.