Characterising the genetic architecture of changes in adiposity during adulthood using electronic health records

Venkatesh, Samvida S.; Ganjgahi, Habib; Palmer, Duncan S.; Coley, Kayesha; Linchangco, Gregorio V.; Hui, Qin; Wilson, Peter; Ho, Yuk-Lam; Cho, Kelly; Arumäe, Kadri; Wittemans, Laura B. L.; Nellåker, Christoffer; Vainik, Uku; Sun, Yan V.; Holmes, Chris; Lindgren, Cecilia M.; Nicholson, George

doi:10.1038/s41467-024-49998-0

Open access
Published: 10 July 2024

Volume 15, article number 5801 (2024)

Abstract

Obesity is a heritable disease, characterised by excess adiposity that is measured by body mass index (BMI). While over 1,000 genetic loci are associated with BMI, less is known about the genetic contribution to adiposity trajectories over adulthood. We derive adiposity-change phenotypes from 24.5 million primary-care health records in over 740,000 individuals in the UK Biobank, Million Veteran Program USA, and Estonian Biobank, to discover and validate the genetic architecture of adiposity trajectories. Using multiple BMI measurements over time increases power to identify genetic factors affecting baseline BMI by 14%. In the largest reported genome-wide study of adiposity-change in adulthood, we identify novel associations with BMI-change at six independent loci, including rs429358 (APOE missense variant). The SNP-based heritability of BMI-change (1.98%) is 9-fold lower than that of BMI. The modest genetic correlation between BMI-change and BMI (45.2%) indicates that genetic studies of longitudinal trajectories could uncover novel biology of quantitative traits in adulthood.

Introduction

Obesity, the accumulation of excess body fat¹, is associated with increased disease burden^2,3 and has a strong genetic component⁴. The heritability of body mass index (BMI) is estimated to be 40–70%^4,5,6, and genome-wide association studies (GWASs) have implicated over 1000 independent loci associated with a range of obesity traits⁴. The dynamic process of change in weight over time is also thought to have a genetic component^7,8. Recent studies using age-stratified GWASs reveal the shifting genetic landscape of infant, childhood, and adolescent BMI and detect age-specific transient effects^9,10,11. Adult twin studies^12,13,14 and an electronic health record (EHR)-based population study¹⁵ indicate that long-term patterns of change in adiposity are heritable and have a genetic component distinct from that of baseline obesity levels. However, less is known about the specific variants and genes that contribute to patterns of adulthood adiposity change. This paucity of GWASs of long-term trajectories of weight change can be partially attributed to the challenges in building and maintaining large-scale genetics cohorts that follow participants over their lifetime¹⁶.

Longitudinal data are a key feature of EHRs, whose increased adoption in the clinic and integration into biobanks has powered cost-efficient and scalable genetics research^17,18. Despite biases in EHR data, including sparsity, non-random missingness, data inaccuracies, and informed presence, EHR-based genetics studies reliably replicate results from purpose-built cohorts^19,20,21. Recent advances in the extraction of phenotypes from longitudinal EHRs at scale show that, as expected^22,23, the mean of repeat quantitative measurements can outperform cross-sectional phenotypes for genetic discovery^24,25. Repeat measurements further allow for the estimation of longitudinal metrics of trait change, such as trajectory-based clusters²⁶, linear slope²⁷, and within-individual variability over time²⁸, all of which may provide additional information to uncover the genetic underpinnings of disease.

A variety of approaches are available for harnessing the longitudinal component of trajectories in EHR data. Simple models target the gradient of a linear fit over time, such as in a longitudinal linear mixed-effects model framework^28,29,30. More complex regression modelling approaches are employed to investigate non-linear changes over time. For example, semi-parametric regression models³¹ generate flexible longitudinal patterns from combinations of basis functions, such as B-splines, regularised to induce a suitable degree of temporal smoothness^32,33,34,35. Subgroups of individuals with similar non-linear trajectories are often identified through clustering approaches, with subgroup membership then tested for association with clinical outcomes or genetic variation^36,37,38,39,40,41. Although it is possible to fit full joint models that incorporate both genetic data and longitudinal trajectories simultaneously²⁸, two-stage approaches wherein summary metrics from models of longitudinal EHRs are taken forward for genetic association analyses are popular for their computational efficiency²⁷.

In this study, we leveraged longitudinal EHRs linked to the UK Biobank (UKBB)⁴², Million Veteran Program (MVP)^43,44, and Estonian Biobank (EstBB)⁴⁵ to study the genetic architecture of change in adiposity over adulthood. We developed a two-stage analytical pipeline, utilising statistical methods with a history of application in the EHR data context, to derive linear and non-linear trajectories of BMI and weight over time, and to identify clusters of individuals with similar adiposity trajectories. In the second stage, we carried forward the latent phenotypes from these models, which capture both baseline obesity trait levels and change in obesity traits over time, to perform the largest reported genome-wide association analyses for adiposity change in adulthood. Our results demonstrate the added value of EHR-derived longitudinal phenotypes for genetic discovery.

Results

Longitudinal data help identify novel genetic signals for obesity

We obtained BMI and weight records for up to 177,098 individuals of white–British ancestry with up to 1.48 million measurements in UKBB longitudinal records from general practitioner (GP) and UKBB assessment centre measurements (Table 1 and Supplementary Fig. 3). For each individual, we estimated linear change in BMI or weight over time using a linear mixed-effects (LME) model with random intercepts and random longitudinal gradients (Fig. 1A) within six strata—defined as the pair-wise combinations of two adiposity traits (BMI, weight) with three sex subsets (women-only, men-only, combined sexes). We sought replication of genetic findings in two external cohorts with longitudinal EHR data—MVP (N = 437,703) and EstBB (N = 127,769)—whose demographic and obesity trait characteristics are distinct from UKBB. Individuals in MVP are predominantly male (92.4%) and on average 3.5 units of BMI heavier than male participants in the UKBB; on the other hand, participants in EstBB are of similar BMI to those in the UKBB, but are on average 6–8 years younger than their UKBB counterparts (Supplementary Data 23).
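
For orientation, a random-intercept, random-gradient LME model of the kind described above can be written in its generic textbook form as follows. This is a sketch only: the symbols are introduced here for illustration, and the covariates and adjustments actually used are those given in the Methods, not shown here.

$$
y_{ij} = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\, t_{ij} + \varepsilon_{ij}, \qquad
(b_{0i}, b_{1i})^{\top} \sim \mathcal{N}(\mathbf{0}, \Sigma), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2})
$$

Here $y_{ij}$ is the $j$-th BMI or weight measurement of individual $i$ at time $t_{ij}$, $\beta_0$ and $\beta_1$ are the population-level intercept and gradient, and $b_{0i}$ and $b_{1i}$ are the individual-level random intercept (baseline) and random gradient (linear change) terms.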

Table 1 Characterisation of obesity trait data in longitudinal records curated from UK Biobank assessment centre visits and linked general practitioner (GP) records

**Fig. 1: Modelling of longitudinal obesity trait trajectories.**

We first investigated whether the individual-level random-intercept terms output by the longitudinal LME model, by sharing information across multiple BMI measurements, provided higher statistical power for GWAS than a single, cross-sectional BMI measurement per individual. Despite our GWAS being 4-fold smaller than the largest published analyses⁴⁶, we identify 14 novel loci and refine 53 previously described signals for obesity traits among the 374 unique fine-mapped lead single-nucleotide polymorphisms (SNPs) (P < 5 × 10⁻⁸) across all strata (Fig. 2A, Supplementary Fig. 13, and Supplementary Data 2; see Methods for the conditional analysis used to classify novel, refined, and reported SNPs⁴⁷). The 53 refined SNPs are conditionally independent of and represent stronger associations (P < 0.05) than published SNPs in this population. Together, the refined and novel SNPs explain 0.33% of variance in baseline BMI (in addition to the 2.7% explained by previously published SNPs), and 0.83% of variance in baseline weight (in addition to the 4.7% explained by previously reported SNPs) (Fig. 2B). We further quantified the power gained from estimating baseline BMI over repeat longitudinal measurements per individual by comparing genome-wide significant (GWS) SNPs from our baseline BMI GWAS to the largest published BMI meta-analysis to date⁴⁶. We observe an increase in median chi-squared statistics of GWS SNPs from either study of between 13.4% (females) and 14.8% (males) in our GWAS over what would be expected from a cross-sectional GWAS of equivalent sample size.

**Fig. 2: Genome-wide novel and refined SNP associations with baseline obesity estimated over the measurement window for each individual.**

Nine of the 14 novel SNPs replicate at P < 3.6 × 10⁻³ (family-wise error rate (FWER) controlled at 5% across 14 tests using the Bonferroni method) in at least one of (1) baseline obesity estimated with LME model intercepts in up to 437,703 individuals in the MVP cohort, (2) baseline obesity estimated with LME model intercepts in up to 125,209 individuals in the EstBB cohort, or (3) UKBB assessment centre measurements of cross-sectional obesity in up to 230,861 individuals not included in the discovery GWAS (Supplementary Data 3). These include rs6769383, whose nearest gene EDEM1 is involved in carbohydrate metabolism⁴⁸, rs2861761, whose nearest gene TENM2 is enriched in white adipocytes⁴⁹, rs11156978, whose nearest gene CHD8 is associated with impaired glucose tolerance in mouse knockouts⁵⁰, and rs7962636, whose nearest gene MED13L is a transcriptional regulator of white adipocyte differentiation⁵¹. We also replicate in MVP the male-specific BMI association of rs79586444, whose nearest gene, DUSP26, is associated with decreased high-density lipoprotein (HDL) cholesterol in mouse knockouts⁵².
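
The Bonferroni thresholds quoted in this article follow from dividing the 5% family-wise error rate by the number of tests. A minimal sketch of that arithmetic is given below; the function name and dictionary labels are illustrative choices, while the test counts are taken from the text.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that controls the FWER at `alpha`."""
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


# Numbers of tests quoted in the text; the labels are only descriptive.
test_counts = {
    "novel baseline SNPs": 14,
    "adiposity-change variants x traits": 6 * 6,
    "longitudinal biomarkers": 45,
    "lifespan-associated variants": 138,
}

thresholds = {name: bonferroni_threshold(0.05, n) for name, n in test_counts.items()}

for name, value in thresholds.items():
    print(f"{name}: P < {value:.2e}")
```

After rounding, these values agree with the thresholds of 3.6 × 10⁻³ (14 tests), 1.39 × 10⁻³ (six variants and six traits), and 3.6 × 10⁻⁴ (138 tests) reported in the Results.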

Intra-individual variance is another longitudinal metric of interest; however, we (Supplementary Fig. 15) and others²⁸ find no genetic variants associated with intra-individual variance in weight over time. While the intra-individual mean and baseline trait modelled from LME are phenotypically (R² > 0.95) and genetically highly correlated (R² > 0.99) (Supplementary Fig. 17), the LME intercept appears better powered for genetic association testing than the average trait, as we discover up to 1.2× more GWS variants associated with the former (Supplementary Data 20).

Ascertainment bias in our discovery cohort could arise from the over-representation of heavier participants in EHR data (Supplementary Data 4)⁵³. On average, women with ten or more weight measurements are 8.3 kg (3.7 units of BMI) heavier than their counterparts with 1–3 measurements; for men, this is an 8.2 kg (3.1 units of BMI) difference. However, the BMI-intercept metric from our longitudinal data is genetically perfectly correlated with the un-ascertained cross-sectional BMI in Genetic Investigation of ANthropometric Traits (GIANT) 2019⁴⁶ (r_G = 1 and P < 1 × 10⁻¹⁶ in all strata), and 96% of the GWS associations (P < 5 × 10⁻⁸) identified in our GWAS have either been reported, or are correlated with reported obesity-associated SNPs in the GWAS Catalog⁵⁴ (Supplementary Data 1).

APOE variant associated with weight loss over time, independent of baseline obesity

To identify genetic variants that affect change in adiposity over time, we performed GWASs for patterns of BMI and weight change adjusted for baseline measurements, defined in two ways. First, we created a linear phenotype from subject-specific random gradients, estimated within the LME model framework. Second, to capture non-linear patterns of temporal change, we modelled longitudinal variation in obesity traits using a regularised high-dimensional B-spline basis³¹ (Fig. 1). Within each of the six strata, we identified four clusters of individuals using k-medoids clustering^55,56, representing high gain (k1), moderate gain (k2), stable (k3), and loss (k4) trajectories, and estimated each individual’s probability of belonging to a cluster based on their posterior non-linear obesity trait trajectory (Fig. 1 and Supplementary Fig. 5). We performed GWASs on the linear slope-change phenotype and on individuals’ logit-transformed posterior probabilities of membership in the high gain cluster (k1), high and moderate gain clusters (k1 and k2), or all but the loss cluster (k1, k2, and k3). All analyses were adjusted for baseline obesity trait and confounders, including length of follow-up and number of follow-up measures, to mitigate survivor bias.
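
The cluster-based phenotypes can be illustrated with a short sketch of the logit transform. The posterior probabilities below are hypothetical, summing cluster probabilities to obtain combined membership (k1 and k2; k1, k2, and k3) is an assumption about how the combined phenotypes are formed, and the clipping constant `eps` is an added safeguard rather than something described in the text.

```python
import math


def logit(p: float, eps: float = 1e-6) -> float:
    """Log-odds of p; p is clipped to [eps, 1 - eps] (an assumed safeguard)."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def inverse_logit(x: float) -> float:
    """Map a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical posterior probabilities of membership in clusters k1..k4 for one person.
posterior = {"k1": 0.40, "k2": 0.35, "k3": 0.20, "k4": 0.05}

# Assumed construction: combined membership is the sum of cluster probabilities.
phenotypes = {
    "k1": logit(posterior["k1"]),
    "k1_k2": logit(posterior["k1"] + posterior["k2"]),
    "k1_k2_k3": logit(posterior["k1"] + posterior["k2"] + posterior["k3"]),
}
```

The transform maps probabilities in (0, 1) onto the whole real line, which is why effects on these phenotypes are naturally reported as odds ratios.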

A common missense variant in APOE (rs429358) is associated with decrease in both BMI and weight over time, and lower posterior probabilities of gain-cluster membership in all analysis strata (Table 2). Each copy of the minor C allele of rs429358 (minor allele frequency (MAF) = 0.16) is associated with 0.060 standard deviation (SD) decrease (95% confidence interval (CI) = 0.050–0.069, P = 8.6 × 10⁻³⁵) in expected BMI slope over time and 0.063 SD decrease (0.054–0.072, P = 6.0 × 10⁻⁴²) in expected weight slope over time (Fig. 3A). Independent of baseline obesity, carriers of the minor C allele of rs429358 are at lower odds of membership in the high-gain BMI and weight clusters (odds ratio (OR) = 0.976, 95% CI = 0.97–0.98, P < 4.9 × 10⁻¹⁹), lowering the membership posterior probability from 40% to 39% on average (Fig. 3B). Although the minor allele of rs429358 is also associated with lower baseline BMI (β = 0.015 SD lower BMI-intercept, 95% CI = 0.0054–0.024) and weight (β = 0.011 SD lower weight intercept, 95% CI = 0.0029–0.020), these associations do not reach GWS (P > 0.002).
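
To see how an odds ratio of 0.976 corresponds to a drop in posterior probability from roughly 40% to 39%, the odds-to-probability conversion can be sketched as follows. The function name is illustrative, and the 40% starting value is the average quoted above.

```python
def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Probability obtained after multiplying the baseline odds by `odds_ratio`."""
    odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Average high-gain membership probability and per-allele OR quoted in the text.
per_allele = apply_odds_ratio(0.40, 0.976)
print(f"{per_allele:.3f}")
```

The result is a little above 0.39, in line with the reported shift from 40% to 39%.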

Table 2 Lead SNPs identified from genome-wide association studies (GWAS) of posterior probability of membership in an adiposity-change cluster (high gain k1, high/moderate gain k1/k2, or high/moderate gain and steady k1/k2/k3), independent of baseline obesity

**Fig. 3: Association of minor C allele of rs429358, missense variant in *APOE*, with various longitudinal phenotypes.**

The association of rs429358 with adiposity-change phenotypes was replicated at P < 1.39 × 10⁻³ (FWER controlled at 5% across six variants and six traits tested) in: (1) up to 437,703 individuals in the MVP cohort, (2) up to 125,209 individuals in the EstBB, and (3) up to 17,035 individuals in UKBB with multiple measurements of weight and BMI at repeat assessment centre visits who were excluded from the discovery analyses (Fig. 4 and Supplementary Data 5). Further, based on 301,943 UKBB participants who were not included in the discovery GWASs, and who reported weight change in the last year as “gain”, “about the same”, or “loss”, we found that each additional copy of the minor C allele of rs429358 is associated with lower odds (OR = 0.956, 95% CI = 0.94–0.97) of being in a higher ordinal weight-gain category, independent of BMI (Fig. 3C and Supplementary Data 6). We observe consistent effect direction of the rs429358 association with both estimated and self-reported weight loss over time in individuals who self-identify as Asian (maximum N = 8324 individuals), Black (6796), mixed (2681), white not in the white–British ancestry subset (47,174), and other (3994) ethnicities (see Methods for ancestral group definitions, Supplementary Fig. 1 and Supplementary Data 7).

**Fig. 4: Effect sizes of rs429358 on BMI-change phenotypes in discovery (UK Biobank (UKBB)) and replication (Million Veterans Program (MVP) and Estonian Biobank (EstBB)) datasets.**

Finally, we tested for the effect of rs429358 on change in abdominal adiposity in up to 44,154 individuals of white–British ancestry in UKBB who were not in the discovery set, with repeated assessment centre measurements of waist circumference (WC) and waist-to-hip ratio (WHR). Each copy of the C allele is associated with 0.040 SD decrease (95% CI = 0.021–0.049, P = 2.3 × 10⁻⁵) in expected WC slope over time and 0.031 SD decrease (0.012–0.050, P = 1.1 × 10⁻³) in expected WHR slope over time, independent of baseline values (Fig. 3D and Supplementary Data 6). While the effect direction remains consistent, these associations are no longer significant upon adjustment for BMI (all P > 0.1), suggesting that the observed loss in abdominal adiposity over time may represent a reduction in overall adiposity.

We additionally performed a longitudinal phenome-wide scan to test for the association of rs429358 with changes in 45 quantitative biomarkers obtained from the UKBB-linked primary care records. Each copy of the C allele is associated with an increase in expected slope change over time of total cholesterol (β = 0.030 SD increase, P = 6.4 × 10⁻¹²), C-reactive protein (CRP) (β = 0.026, P = 9.6 × 10⁻⁷), and HDL cholesterol (β = 0.022, P = 1.0 × 10⁻⁵), but a decrease in expected slope change over time of triglycerides (β = −0.027, P = 2.7 × 10⁻⁷), potassium (β = −0.023, P = 3.9 × 10⁻⁶), lymphocytes (β = −0.020, P = 4.0 × 10⁻⁵), and haemoglobin concentration (β = −0.016, P = 1.0 × 10⁻³) (FWER controlled at 5% across 45 tests via the Bonferroni method) (Fig. 3E and Supplementary Data 8).

The APOE locus is a highly pleiotropic region that is associated with lipid levels^57,58, Alzheimer’s disease^59,60, and lifespan^61,62, among other traits⁶³, both in the UKBB (Supplementary Fig. 14) and elsewhere. Excluding the 242 individuals with diagnoses of dementia or Alzheimer’s disease in our replication datasets did not alter associations of rs429358 with any of the longitudinal obesity traits (Supplementary Fig. 2), indicating that they are unlikely to be driven solely by weight loss that accompanies dementia. Despite the association of rs429358 with lifespan, we found no association between this variant and follow-up metrics in our study (Supplementary Data 22); we also found no significant difference in the effect of this variant on adiposity change from two sets of models: (1) without including age and related covariates, i.e., follow-up metrics and year of birth, and (2) with these covariates (heterogeneity P value P_het > 0.05) (Supplementary Fig. 16). Finally, we observe no associations between 135 of 138 published lifespan-associated genetic variants and our adiposity-change phenotypes at P < 3.6 × 10⁻⁴ (FWER controlled at 5% across 138 tests via the Bonferroni method). Of the three SNPs associated with both weight change and lifespan, two (rs429358 and rs7412) are variants in the APOE gene, and rs1085251 is a known obesity association in the FTO locus (Supplementary Data 16).

Genome-wide architecture of change in adiposity over time is distinct from baseline adiposity

We identify six independent genetic loci associated with distinct longitudinal trajectories of obesity traits (Table 2). These include the APOE locus above and five signals in intergenic regions. rs9467663 (OR = 1.011 for membership in the high-gain weight cluster, P = 1.6 × 10⁻⁹) and chr6:26076446 (OR = 1.012 for membership in the high-gain BMI cluster, P = 2.1 × 10⁻⁹) are reported associations with haematological traits⁶⁴. We identify two SNPs, rs11778922 and rs61955499, with female-specific effects on BMI change. rs11778922 (OR = 0.984 for membership in the high-gain BMI cluster, P = 1.3 × 10⁻⁸, sex-heterogeneity P_sexhet = 5.8 × 10⁻⁴, see Methods) has previously been nominally associated with BMI in females⁴⁶, and rs61955499 (OR = 1.070 for membership in the BMI loss cluster, P = 3.4 × 10⁻⁸, P_sexhet = 4.7 × 10⁻⁵) has previously been nominally associated with low-density lipoprotein (LDL) cholesterol levels⁶⁵. Finally, rs12953815 is associated with male-specific weight change (OR = 1.012 for membership in the weight loss cluster, P = 1.7 × 10⁻⁸, P_sexhet = 2.0 × 10⁻⁵) and has been previously nominally associated with lung function⁶⁶.

Other than rs429358, none of the lead variants for adiposity change replicated in either MVP or EstBB at P < 1.39 × 10⁻³ (FWER controlled at 5% across six variants and six traits via the Bonferroni method) (Supplementary Data 5). However, we were only sufficiently powered to replicate the effects of three of these in MVP (rs9467663, chr6:26076446, and the male-specific variant rs12953815), and none in EstBB, as replication at 80% power required sample sizes of between 116,000 and 234,000 individuals with repeat measurements of BMI (Supplementary Data 25).
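
The dependence of replication power on effect size can be illustrated with a common normal-approximation formula for a single additive variant and a standardised quantitative phenotype. This is a rough sketch under stated assumptions, not the power calculation behind Supplementary Data 25: the function name is hypothetical, and the inputs are the rs429358 BMI-slope effect and allele frequency quoted in the Results, used only because they are reported in SD units.

```python
from statistics import NormalDist


def required_sample_size(beta_sd: float, maf: float, alpha: float,
                         power: float = 0.80) -> float:
    """Approximate N for a two-sided test of one additive SNP effect.

    Assumes a standardised phenotype, Hardy-Weinberg genotype variance
    2 * maf * (1 - maf), and a small fraction of variance explained.
    """
    z = NormalDist()
    variance_explained = 2.0 * maf * (1.0 - maf) * beta_sd ** 2
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_power = z.inv_cdf(power)
    return (z_alpha + z_power) ** 2 / variance_explained


# Illustrative inputs: rs429358 effect on BMI slope (0.060 SD), MAF 0.16,
# and the Bonferroni threshold across six variants and six traits.
n_needed = required_sample_size(beta_sd=0.060, maf=0.16, alpha=0.05 / 36)
print(round(n_needed))
```

Because the required sample size scales with the inverse square of the effect size, halving the per-allele effect quadruples the number of individuals needed, which is consistent with the weaker lead variants needing far larger replication cohorts than rs429358.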

While all lead variants in the discovery GWASs remain significant at P < 5 × 10⁻⁷ in GWASs that are not adjusted for follow-up metrics, we discover three variants in the FTO locus that are associated with BMI or weight gain only in analyses that are unadjusted for follow-up metrics (Supplementary Data 21). These associations may reflect genetic contributions to baseline weight rather than weight change, as FTO is among the strongest known loci for obesity, and follow-up metrics are strongly positively correlated with baseline obesity (Supplementary Data 4).

The smaller number of independent GWS associations with adiposity change (six, compared to 374 unique lead SNPs associated with baseline obesity traits) is expected given the 7- to 9-fold lower heritability of adiposity change. The heritability explained by genotyped SNPs ($h_{G}^{2}$)⁶⁷ of the posterior probability of belonging to an adiposity-gain cluster is between 1.38% (standard error (SE) = 0.53) in men and 2.82% (0.59) in women, while the $h_{G}^{2}$ of baseline obesity traits varies between 21.6% (1.09) and 29.0% (1.72) across strata (Fig. 5). Furthermore, we observe that the heritability of BMI and weight trajectories is higher in women than in men (2.89% (0.56) vs 1.05% (0.59) for BMI slopes, P_sexhet = 0.012; and 3.42% (0.53) vs 1.69% (0.52) for weight slopes, P_sexhet = 9.9 × 10⁻³). Similarly, we estimate the heritability of BMI slopes in the EstBB to be higher in women (2.15% (0.56) in women vs 1.80% (0.98) in men); however, these values are low and must be interpreted with caution. We do not observe a corresponding difference in the $h_{G}^{2}$ of baseline BMI or weight between the sexes (P_sexhet > 0.1). Finally, baseline and change in obesity traits are genetically correlated, with r_G ranging from 0.35 (95% CI = 0.24–0.45) for weight in women to 0.91 (0.59–1.23) for BMI in men (Fig. 5). As expected given their positive correlation, we observe inflation of the χ² statistics for adiposity-change slope associations amongst lead variants for baseline adiposity (Supplementary Fig. 19). While the genetic correlation between baseline adiposity and adiposity change appears to be higher in men as compared to women, these estimates have wide CIs (overlapping 1) and P_sexhet > 0.05 for both BMI and weight.
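
A generic way to compare two estimates with standard errors, such as sex-specific heritabilities, is a z-test on their difference. The sketch below assumes the two estimates are independent and approximately normal. It is not necessarily the test used to obtain P_sexhet, which is defined in the Methods, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def difference_z_test(est_a: float, se_a: float, est_b: float, se_b: float):
    """z statistic and two-sided P value for est_a - est_b (independent estimates)."""
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p_two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Heritability of BMI slopes (%), women vs men, with SEs as quoted in the text.
z_stat, p_two_sided = difference_z_test(2.89, 0.56, 1.05, 0.59)
p_one_sided = p_two_sided / 2.0  # if a directional alternative is assumed
print(f"z = {z_stat:.2f}")
```

Whether a one-sided or two-sided P value is appropriate depends on the hypothesis stated in advance, so both are computed here.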

**Fig. 5: Genotyped SNP-based heritability of, and genetic correlation between, baseline obesity trait and obesity-change phenotypes.**

Throughout this study, we evaluate both BMI and weight as obesity traits, and expect these to track closely in adults as height does not change significantly over time. In the 161,891 individuals in our discovery strata with multiple measurements of both BMI and weight, there is a strong correlation between the slopes for weight and BMI change (r² = 0.88) and between the posterior probabilities of membership in the BMI-gain and weight-gain clusters (r² = 0.73) (Supplementary Data 9, all P < 1 × 10⁻¹⁶). Moreover, the genetic correlation between change in BMI and weight is nearly perfect (r_G for slope terms = 0.98, r_G for posterior probability of membership in gain cluster = 0.95, all P < 1 × 10⁻¹⁶), indicating that the genetic architecture highlighted here is robust to the metric of adiposity used to define trajectories.

Discussion

In this large-scale EHR- and genetics-based study of longitudinal trajectories of obesity traits, we demonstrate that modelling multiple observations across time increases power to identify genome-wide signals for baseline BMI and weight and enables the discovery of genetic variants associated with changes in adiposity, which are less heritable than and only partially shared with baseline adiposity. Modelling ~1.5 million observations of BMI and weight from >170,000 individuals in the UKBB enabled us to identify 14 novel, biologically plausible, genetic signals associated with obesity traits. The discovery of these novel loci highlights that repeat measurements can contribute to narrowing the “missing heritability” gap. Leveraging the bespoke longitudinal adiposity phenotypes developed here, we find six genetic loci associated with changes in BMI and weight over time, including a missense variant in APOE that replicates in two external cohorts in the United States and Estonia. While previous studies have investigated the associations of cross-sectional BMI SNPs or obesity polygenic scores with adiposity trajectories^15,68, to the best of our knowledge, this study reports the first genome-wide scan of variants associated with obesity trait trajectories over adulthood.

Accounting for the influence of genetic variation on adiposity change may provide opportunities to personalise obesity prevention and treatment^69,70. While several studies have investigated the association between BMI-related genetic variants and weight loss guided by medical⁷⁰, surgical^71,72, dietary⁷³, or behavioural^70,74,75,76 interventions, results are inconsistent across studies, intervention types, and genes assessed. Given our evidence that the genetic basis of adiposity change is distinct from baseline levels, we hypothesise that genetic variants associated with longitudinal weight trajectories may be better predictors of long-term weight change following treatment or lifestyle interventions than variants associated with baseline BMI. Moreover, incorporating information on the genetic signals associated with adiposity trajectories will complement current genetics-based strategies to identify genes for pharmaceutical targets⁷⁷ for obesity treatment.

Previous studies have estimated continuity in the genetic correlation of BMI measured at different ages⁷⁸, which is theorised to emerge by two possible mechanisms⁷⁹: (1) common genetic (or environmental) factors are associated with the rates of change in BMI over time, which we test in this study, and (2) that these correlations are induced by time-specific genetic (or environmental) factors in an autoregressive manner, i.e., BMI genetics at time-point t−1 causally affect BMI at time t. Studies testing the latter hypothesis have arrived at opposing conclusions: Gillespie et al.⁸⁰ find that on a genome-wide scale, age-specific genetic effects in an autoregressive framework do not explain differences in BMI heritability across ages 40–73 years, while Winkler et al.⁷⁹ did identify 15 genetic loci with differential effects on BMI in younger adults (age <50 years) and older adults (age >50 years). Both studies were pseudo-longitudinal, i.e., the same individuals were not monitored over a period of time, but rather cross-sectional individual data was grouped into age bins. Our work tests a distinct hypothesis and is also, to our knowledge, the first to perform a truly longitudinal genetic study with repeated measures in this age group.

Leveraging EHR to derive longitudinal metrics for genetic discovery may be affected by various biases described earlier⁸¹. We attempted to mitigate these biases in three ways: (1) While EHR data over-represent sick patients and individuals with higher BMI, UKBB participants are, on average, healthier and have lower BMIs than the population of the UK⁸². Therefore, our UKBB-linked EHR discovery cohort is more overweight than a random sampling of UKBB, but in contrast, UKBB as a whole is ascertained towards lower BMI individuals than a random sampling of the UK. (2) Appending the more accurate UKBB assessment centre measurements to the EHR data improves overall data quality. (3) Stringent quality control at both the population and individual levels increases the signal-to-noise ratio by filtering out a subset of inaccurate data entries. Although we were powered to replicate four of the six UKBB-identified variants for adiposity-change in the MVP cohort, only one replicated; the lack of signal for other variants may imply these are false positive results. However, it is also important to consider the differences in the demographic and obesity-related characteristics between these cohorts, as participants in the MVP are much more likely to have cardiovascular disease and be overweight⁴⁴ compared to those in UKBB; and assigning individuals in the former cohort to adiposity trajectory clusters from the latter may distort the phenotypes. Nevertheless, a majority of the baseline adiposity variants in our discovery GWASs as well as the rs429358 variant for adiposity-change replicate across the UKBB, MVP, and EstBB, suggesting that linking EHRs with biobank data may provide a robust framework for genetic discovery.

The two-stage nature of our approach to associate genetic variants with longitudinal trajectories of obesity traits is highly advantageous because of its computational efficiency and convenience. In particular, our method is composable, as the longitudinal analysis of raw data can first be performed separately using a choice of popular, efficient implementations of models; the first-stage outputs can then be taken forward to a GWAS performed in its own bespoke, highly optimised software. The two-stage method approximates the fitting of a full joint model incorporating raw measurement data and genome-wide SNP data. While a full joint model would propagate posterior uncertainty from the longitudinal sub-model through to the GWAS, the approximation here takes forward a single point estimate, i.e. a best linear unbiased predictor (BLUP) or posterior probability of cluster membership, to GWAS. However, in EHR datasets, the number of measurements, and hence estimation precision, can vary across individuals. The propagation of uncertainty between model components, in a similar vein to Markov melding⁸³, has the potential to further improve the quality of genetic discovery. An interesting area for future research will be to allow for the principled propagation of posterior uncertainty in traits through the highly optimised, multi-locus, mixed-model GWAS methods to perform genetic association in the presence of relatedness and population stratification⁸⁴.
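
The two-stage idea can be sketched on toy data. To keep the example dependency-free, stage 1 here uses a per-individual ordinary least-squares slope as a simple stand-in for the LME-derived BLUP, and stage 2 uses a plain linear regression of that slope on genotype dosage in place of a mixed-model GWAS. All identifiers and data values are hypothetical.

```python
from statistics import mean


def ols_fit(x, y):
    """Ordinary least-squares fit of y on x; returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Toy longitudinal records: years since first measurement, BMI, and the
# dosage (0, 1, or 2 copies) of a hypothetical effect allele.
records = {
    "id1": {"dosage": 0, "t": [0, 1, 2], "bmi": [25.0, 25.5, 26.0]},
    "id2": {"dosage": 1, "t": [0, 1, 2], "bmi": [27.0, 27.3, 27.6]},
    "id3": {"dosage": 2, "t": [0, 1, 2], "bmi": [24.0, 24.1, 24.2]},
    "id4": {"dosage": 1, "t": [0, 1, 2], "bmi": [30.0, 30.3, 30.6]},
}

# Stage 1: summarise each individual's trajectory as (intercept, slope).
stage1 = {pid: ols_fit(r["t"], r["bmi"]) for pid, r in records.items()}

# Stage 2: regress the stage-1 slope on genotype dosage for one variant.
dosages = [records[pid]["dosage"] for pid in records]
slopes = [stage1[pid][1] for pid in records]
_, per_allele_effect = ols_fit(dosages, slopes)
print(f"per-allele change in BMI slope: {per_allele_effect:.2f} units/year")
```

As in the approach described above, only a point estimate per individual reaches stage 2, so differences in how precisely each slope is estimated are not propagated into the association test.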

It is also important that the choice of trajectory metric utilised in genetic analysis is phenotype-aware. While the variance within an individual’s trait value over time may capture meaningful biology for biomarkers such as blood pressure or triglycerides, whose fluctuations are associated with disease development and progress^85,86, weight is a more stable trait that shows a steady pattern of change over many years^87,88. Our adiposity-change metrics, derived from regression models incorporating linear and non-linear temporal trends, are better suited to identify the genetic component of BMI and weight trajectories, and are robust to the manner in which this is defined. For example, despite self-report being an imprecise metric⁸⁹, lead SNPs from our obesity-change GWASs are also associated with self-reported weight change. However, our results indicate the relative difficulty of identifying genetic associations with longitudinal changes in obesity traits, compared with identifying loci associated with cross-sectional BMI. Variants associated with cross-sectional BMI must have had a causal impact on expected longitudinal BMI at some periods in individuals’ lifespans; i.e. a cross-sectional BMI phenotype captures the cumulative longitudinal effects of each BMI-associated genotype up to the age at which the individual is measured. In contrast, our derived measures of longitudinal change target the rate of change of BMI over a shorter average time period, and the magnitude of the genetic signal thus tends to be smaller in the longitudinal analysis compared to the cross-sectional one. This means that the weaker longitudinal genetic signal can be obscured by the non-genetic contribution from individuals’ short- and long-term environment, whilst the stronger cross-sectional genetic signal may be detected with higher power as the signal-to-noise ratio is larger. More broadly, several factors might affect the relative power to detect longitudinal effects: sample size, which is typically smaller in longitudinal studies; the length and frequency of follow-up, with longer and more frequent follow-up giving greater power; and the particular statistical methods used to estimate cross-sectional versus longitudinal traits, which can affect the accuracy and precision of estimates, and hence the strength of genetic signal detected.

The SNP rs429358 (missense variant in APOE) is robustly associated with loss in BMI and weight, independent of baseline obesity, across men and women, across three global cohorts of European ancestry. APOE codes for apolipoprotein E, which is a core component of plasma lipoproteins that is essential for cholesterol transport and homoeostasis in several tissues across the body, including the central nervous system, muscle, heart, liver, and adipose tissue^90,91. The precise pathway by which this variant affects weight change is difficult to pinpoint, as APOE is a highly pleiotropic locus associated with hundreds of biomarkers and diseases⁶³. Here as well, we find associations between rs429358 and 11 biomarker trajectories. Obesity is cross-sectionally associated with several of these, including levels of triglycerides and cholesterol^92,93, markers of chronic inflammation⁹⁴, and haematological traits⁹⁵. Some of the effects of rs429358 are discordant with previously reported phenotypic correlations between obesity and these biomarkers; however, the causal, longitudinal, and pleiotropic nature of these associations remains to be established. As rs429358 is also the strongest genetic risk factor for Alzheimer’s disease^59,60, which is preceded by weight loss⁹⁶, we ensured that our findings were robust to the exclusion of individuals with dementia. As longevity may confound the APOE-weight loss association^61,62, we adjusted analyses for the length of follow-up in EHR to mitigate survivor bias; however, we also present age-unadjusted analyses and demonstrate that other lifespan-associated variants are not associated with adiposity change in our GWASs. We thus hypothesise that the APOE effect on weight loss may act through cholesterol- and lipid-metabolism pathways that partly determine response to dietary and environmental factors, as seen in mouse models^97,98.
Indeed, it has recently been suggested that APOE-mediated cholesterol dysregulation in the brain may influence the onset and severity of Alzheimer’s disease⁹⁹, suggesting that ageing-associated systemic aberrations in cholesterol homoeostasis could have far-ranging consequences, from weight loss to cognitive decline.

Patterns of weight change in mid-to-late adulthood have been observed to be sex-specific, particularly as women undergo significant changes in weight and body fat distribution around menopause¹⁰⁰. Here, we find that the heritability of changes in obesity traits is higher in women than in men, supporting a previous finding that obesity polygenic scores are more strongly associated with weight change trajectories in women than in men⁶⁸. This is in contrast to baseline obesity, which is equally heritable in men and women, both in our study and as previously reported⁴⁶. The lower genetic correlation between baseline obesity and obesity-change in women as compared to men, while not statistically significant, may nevertheless indicate sex-differential genome-wide contributions to these phenotypes. We hypothesise that sex hormones could explain some of this sex-specificity, particularly through their role in altering overall obesity and fat distribution around menopause^101,102. We were underpowered to study the genome-wide architecture of change in adult WC and WHR (10-fold fewer observations than BMI and weight), whose cross-sectional levels are genetically sex-specific with higher heritability in women⁴⁶, so more work is needed to disentangle the genetic contribution to changes in adult body fat distribution over time.

While the EHR-linked UKBB cohort has driven genetic discovery for a vast array of human traits in populations of European ancestry¹⁰³, sample sizes remain under-powered to detect genome-wide associations in other ancestral groups. We were thus limited to replicating European-ancestry associations in other populations, without the ability to discover ancestry-specific variants associated with adult adiposity trajectories. Furthermore, despite the inclusion of >200,000 individuals in the UKBB EHR data, sample sizes remain too low to analyse the genetics of longitudinal trajectory metrics, which have lower heritability than the averaged trait value^15,104 (~7–9x lower in our study) and are thus more challenging to characterise genetically without corresponding increases in sample size. Another limitation of our study was the exclusion of time-varying covariates, such as medication use, smoking status, and other dietary and environmental covariates, from models of adiposity change. It is challenging to extract time-dependent values of these variables from EHRs and difficult to ascertain the direction of causality by which these covariates may be associated with weight change. For example, the use of medication to lower blood pressure may be connected to weight gain, mediated indirectly by change in appetite¹⁰⁵, but high blood pressure may itself be a consequence of weight gain¹⁰⁶. Inappropriate adjustments along this causal pathway may lead to unexpected collider biases¹⁰⁷. In general, despite their longitudinal nature, it is challenging to assign causality to the associations between weight change and covariates or disease diagnoses from EHR observations alone, as there is no prospective study design to follow¹⁰⁸. Advances in emulating randomised controlled trials from longitudinal EHR are beginning to overcome these challenges^109,110, and in the future, it will be critical to incorporate information on genetic risk into these simulated studies.

To the best of our knowledge, this is the largest study to date that characterises the genome-wide architecture of adult adiposity trajectories, and the first to identify specific variants that alter BMI and weight in mid- to late-adulthood. We add evidence to support the growing utility of EHRs in genetics research, and particularly highlight opportunities for incorporating longitudinal information to boost power and identify novel associations. In particular, the APOE-associated weight loss identified here contributes to a growing body of evidence on the ageing-associated effects of cholesterol dysregulation. Heterogeneity between men and women in the genome-wide architecture of obesity-change and genetic correlation with baseline obesity highlights the importance of distinguishing between the genetic contributions to mean and lifetime trajectories of phenotypes in sex-specific analyses. In the future, the growing integration of EHR with genetic data in large biobanks will allow us to assess the time-varying associations of rare variants with outsize effects on quantitative traits, as well as to establish genetic and phenotypic relationships among the trajectories of multiple correlated biomarkers across adulthood.

Methods

Identification and quality control of longitudinal obesity records

UK Biobank

This study was conducted using the UKBB resource, which is a prospective UK-based cohort study with approximately 500,000 participants aged 40–69 years at recruitment, on whom a range of medical, environmental, and genetic information has been collected⁴². Here, we included 409,595 individuals in the white–British ancestry subset identified by Bycroft et al.¹¹¹ who passed genotype quality control (QC) (see below).

Repeat obesity trait measurements

Obesity-associated traits including BMI and weight were recorded at initial baseline assessment (between 2006 and 2010), as well as at repeat assessments of 20,345 participants (between 2012 and 2013), and at imaging assessments of 52,596 participants (in 2014 and later). We curated a longitudinal research resource by integrating these repeat UKBB assessment centre measurements with the interim release of primary care records provided by GPs for approximately 45% of the UKBB cohort (~230,000 participants, randomly selected)¹¹² (Supplementary Fig. 3). Each individual with at least one BMI record (coded as Clinical Practice Research Datalink (CPRD) code 22K.) or weight record (coded as CPRD code 22A) in the GP data had their respective UKBB assessment centre measurements appended. Following phenotype and genotype QC, this resulted in 162,666 participants of white–British ancestry with multiple BMI measurements and 177,472 participants with multiple weight measurements (Supplementary Fig. 3).

Quality control

We performed both population-level and individual-level longitudinal QC. Participants with codes for history of bariatric surgery (Supplementary Data 10, as identified by Kuan et al.¹¹³) were excluded entirely, while BMI and weight observations up to the date of surgery were retained for individuals where this could be determined. Only those measures recorded in adulthood (ages 20–80 years) were retained. We excluded implausible observations, defined as more extreme than ±10% of the UKBB assessment centre minimum and maximum values, respectively (BMI <10.9 kg/m² or >82.1 kg/m² and weight <27 kg or >217 kg). We further removed any extreme values >5 SDs away from the population mean to exclude possible technical errors. At the individual level, we excluded multiple observations on the same day, which are likely to be recording errors, by only retaining the observation closest to the individual’s median value of the trait across all time points. Finally, we excluded any extreme measurements at the individual level. For individual i with J_i data points represented as (measurement, age) pairs (y_i,j, t_i,j) for j = 1, …, J_i ordered chronologically, i.e., ${t}_{i,1} < \ldots < {t}_{i,{J}_{i}}$, a “jump” P_i,j for j = 1,…, J_i − 1 was defined as:

$${P}_{i,j}={\log }_{2}\frac{| {y}_{i,j+1}-{y}_{i,j}| /{y}_{i,j}}{{t}_{i,j+1}-{t}_{i,j}}$$

(1)

We removed data points associated with extreme jumps (>3 SDs away from the population mean jump, to exclude possible technical errors) by excluding the observation farther from the individual’s median value of the trait across all time points.
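
The jump-based filter in (1) can be illustrated with a minimal Python sketch. The function names, the toy BMI series, and the population mean and SD of jumps passed in below are hypothetical; the sketch assumes same-day duplicates have already been removed (so consecutive ages differ), skips zero-change pairs whose log is undefined, and applies the filter in a single pass, which may differ from the authors' implementation.

```python
import math
from statistics import median


def log2_jumps(values, ages):
    """Jump P_j of Eq. (1) for each consecutive pair of observations.

    Assumption: a pair with no change (log2 of zero) is recorded as -inf.
    """
    jumps = []
    for j in range(len(values) - 1):
        rel_change = abs(values[j + 1] - values[j]) / values[j]
        dt = ages[j + 1] - ages[j]
        jumps.append(math.log2(rel_change / dt) if rel_change > 0 else -math.inf)
    return jumps


def drop_extreme_jump_points(values, ages, pop_mean, pop_sd, n_sd=3.0):
    """Drop, for each extreme jump, the observation farther from the individual's median."""
    med = median(values)
    drop = set()
    for j, p in enumerate(log2_jumps(values, ages)):
        if math.isfinite(p) and abs(p - pop_mean) > n_sd * pop_sd:
            left_dev = abs(values[j] - med)
            right_dev = abs(values[j + 1] - med)
            drop.add(j if left_dev > right_dev else j + 1)
    keep = [k for k in range(len(values)) if k not in drop]
    return [values[k] for k in keep], [ages[k] for k in keep]


# Toy BMI series with one implausible spike; pop_mean and pop_sd are made-up values.
bmi = [25.0, 25.5, 50.0, 26.0]
age = [40.0, 41.0, 41.1, 42.0]
clean_bmi, clean_age = drop_extreme_jump_points(bmi, age, pop_mean=-4.0, pop_sd=2.0)
```

In practice the population mean and SD would be computed from the jumps of all individuals before any single individual's series is filtered.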

BMI and weight validation data

Participants with BMI and weight observations in UKBB assessment centre measurements who were not included in the interim release of the GP data were held out of discovery analyses (Supplementary Fig. 3). This resulted in 245,447 individuals with at least one BMI observation and 230,861 individuals with at least one weight observation for replication of cross-sectional results. For the replication of longitudinal results, a subset of individuals was used comprising 17,006 individuals with multiple observations of BMI, and 17,035 individuals with multiple observations of weight, from repeat assessment centre visits.

Self-reported weight change data

At each UKBB assessment centre visit, participants were asked the question: “Compared with one year ago, has your weight changed?”, reported as “No—weigh about the same”, “Yes—gained weight”, “Yes—lost weight”, “Do not know”, or “Prefer not to answer”. We coded the 1-yr self-reported weight change response at the first assessment centre visit as an ordinal categorical variable with three levels: “loss”, “no change”, and “gain”, excluding individuals who did not respond or responded with “Do not know” or “Prefer not to answer”. We retained 301,943 individuals of white–British ancestry who were not included in any of the discovery analyses.

Abdominal adiposity data

Similar to the BMI and weight validation datasets, we retained the 44,154 participants with multiple WC and hip circumference (HC) records across repeat assessment centre visits who were not included in the interim release of the GP data, and hence held out of discovery analyses. WHR was calculated at each visit by taking the ratio of WC to HC. We further calculated WC adjusted for BMI (WCadjBMI) and WHR adjusted for BMI (WHRadjBMI) values at each visit for which WC, HC, and BMI were recorded simultaneously by taking the residual of WC and WHR in linear regression models with BMI as the sole predictor.
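
As a minimal sketch of the BMI adjustment described above, the residual of a trait after simple linear regression on BMI can be computed as follows. The function name and the toy measurements are hypothetical, and the sketch assumes ordinary least squares with an intercept, fitted across all individuals at a given visit.

```python
def residualise_on_bmi(trait, bmi):
    """Residuals of `trait` from an ordinary least-squares fit on BMI (with intercept)."""
    n = len(bmi)
    mean_x = sum(bmi) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in bmi)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(bmi, trait))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(bmi, trait)]


# Toy values for four individuals at one visit (not real data).
wc = [80.0, 95.0, 102.0, 88.0]    # waist circumference, cm
hc = [98.0, 104.0, 108.0, 100.0]  # hip circumference, cm
bmi = [22.0, 27.5, 31.0, 24.0]    # kg/m^2

whr = [w / h for w, h in zip(wc, hc)]
wc_adj_bmi = residualise_on_bmi(wc, bmi)
whr_adj_bmi = residualise_on_bmi(whr, bmi)
```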

Models to define baseline adiposity and adiposity change traits

Individual i has J_i data points represented as (measurement, age) pairs (y_i,j, t_i,j) for j = 1, …, J_i ordered chronologically, i.e. ${t}_{i,1} < \ldots < {t}_{i,{J}_{i}}$. The following models are all fitted separately in three strata: female-specific, male-specific, and sex-combined.

Intercept and slope traits for GWAS

We implement a two-stage algorithm to estimate and preprocess local intercepts and slopes of obesity traits to be taken forward to GWAS in both discovery and validation datasets.

1.
Fit a random-slope, random-intercept mixed model with the maximum likelihood estimation procedure in the lme4¹¹⁴ package in R¹¹⁵. We target two quantities: the baseline value of each individual’s clinical trait (the β₀ + u_i,0 below); and the linearly approximated rate of change in the trait during each individual’s measurement window (the β₁ + u_i,1 below):
$$\begin{aligned}{y}_{i,j}&={x}_{i}^{T}\gamma+({\beta }_{0}+{u}_{i,0})+({\beta }_{1}+{u}_{i,1})\cdot ({t}_{i,j}-{t}_{i,1})+{\varepsilon }_{i,j}\\ {u}_{i,k}&\sim \mathrm{N}(0,{\sigma }_{u,k}^{2}),\quad k=0,1\\ {\varepsilon }_{i,j}&\sim \mathrm{N}(0,{\sigma }_{\varepsilon }^{2}),\end{aligned}$$
(2)
where individual-specific covariates x_i comprise: baseline age, (baseline age)², data provider, year of birth, and sex. Variance parameters ${\sigma }_{u,k}^{2}$ and ${\sigma }_{\varepsilon }^{2}$ are estimated. Fitting model (2) outputs fixed effect model estimates $\hat{\gamma }$, ${\hat{\beta }}_{0}$, ${\hat{\beta }}_{1}$ and BLUPs of the random effects ${\hat{u}}_{i,0}$ and ${\hat{u}}_{i,1}$.
2.
Linearly adjust and transform the outputted BLUPs. We fit and subtract the linear predictor in each of the linear models:
$${\hat{u}}_{i,0}={x}_{i,0}^{T}{\gamma }_{0}+{\varepsilon }_{i,0}$$
(3)
$${\hat{u}}_{i,1}={x}_{i,1}^{T}{\gamma }_{1}+{\varepsilon }_{i,1}$$
(4)
where the vector of intercept-adjusting covariates x_i,0 in (3) comprises: baseline age, (baseline age)², sex, year of birth, assessment centre, number of follow-ups, and total length of follow-up (in years). The vector of slope-adjusting covariates x_i,1 in (4) comprises the same covariates as x_i,0 and additionally includes the intercept BLUP ${\hat{u}}_{i,0}$. The coefficient vectors γ₀ and γ₁ in (3) and (4) are estimated by least squares and are distinct from the previously estimated γ in (2). We finally apply a deterministic rank-based inverse-normal transformation¹¹⁶ to the residuals from fitting models (3) and (4). For example, the intercept trait for individual i taken forward to GWAS is
$${\tilde{u}}_{i,0}={\Phi }^{-1}\left(\frac{r({\hat{u}}_{i,0}-{x}_{i,0}^{T}\hat{{\gamma }_{0}})-c}{N-2c+1}\right)$$
(5)
where $r({\hat{u}}_{i,0}-{x}_{i,0}^{T}\hat{{\gamma }_{0}})$ is the rank of the ith residual among all N residuals, the offset c is 0.5, and Φ( ⋅ ) is the cumulative distribution function (CDF) of the standard Gaussian distribution.

The distributions of residuals and BLUPs from the LME models are heavy-tailed relative to a Gaussian (Supplementary Figs. 10–12). Such model misspecification could potentially lead to miscalibration of CIs and hypothesis tests based on the standard linear mixed model, although this is likely to be mitigated by the large sample size owing to the central limit theorem. We therefore take forward covariate-adjusted and inverse-normal transformed BLUPs, as described in (5), for genome-wide association testing.
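
The transformation in (5) can be sketched in a few lines of Python using only the standard library. The function name is hypothetical, and breaking ties by input order is an assumption, since the text does not state how ties are handled.

```python
from statistics import NormalDist


def rank_inverse_normal(residuals, c=0.5):
    """Rank-based inverse-normal transform of Eq. (5) with offset c.

    Assumption: tied residuals are ranked in input order.
    """
    n = len(residuals)
    order = sorted(range(n), key=lambda i: residuals[i])  # stable sort
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    std_normal = NormalDist()  # standard Gaussian, mean 0 and SD 1
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


# Toy covariate-adjusted BLUP residuals for three individuals (not real data).
transformed = rank_inverse_normal([0.3, -1.2, 5.0])
```

With c = 0.5 the argument of Φ⁻¹ reduces to (rank − 0.5)/N, so the transformed values are symmetric about zero and never reach the infinite quantiles at 0 and 1.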

Modelling non-linear trajectories with regularised splines

We model non-linear changes in obesity traits using a regularised B-spline basis of degree 3 (i.e., a cubic spline model) with n_df = 100 degrees of freedom, incorporating n_df − 4 (i.e., n_df − 3[degree] − 1 [intercept]) knots that are spaced evenly across each individual’s first T = 7500 post-baseline days ≈ 20.5 years. It is common practice in semi-parametric regression to use regularised splines with a relatively large number of knots, thereby allowing functional expressiveness without overfitting^31,117. Conditional on the spline coefficients, b_i, the likelihood for measurements y_i (individual i’s J_i-vector of measurements taken at days ${t}_{i,1},\ldots,{t}_{i,{J}_{i}}$) is

$$p({{{{{{\boldsymbol{y}}}}}}}_{i}| {{{{{{\boldsymbol{b}}}}}}}_{i},{\sigma }^{2})={{{{{\rm{MVN}}}}}}({{{{{{\boldsymbol{y}}}}}}}_{i}| {{{{{{\boldsymbol{Z}}}}}}}_{i}{{{{{{\boldsymbol{X}}}}}}}_{B}{{{{{{\boldsymbol{b}}}}}}}_{i},\,{{{{{\boldsymbol{I}}}}}}{\sigma }^{2})$$

(6)

where: the n_df-vector b_i contains the ith individual’s spline basis coefficients; X_B is the (T + 1) × n_df matrix of spline basis functions evaluated at days 0, …, T post-baseline; and Z_i is a J_i × (T + 1) matrix whose jth row extracts day t_i,j − t_i,1 post-baseline, i.e.,

$${[{{{{{{\boldsymbol{Z}}}}}}}_{i}]}_{j,k}=\left\{\begin{array}{ll}1\quad &\,{{\mbox{if}}}\,k={t}_{i,j}-{t}_{i,1}+1\\ 0\quad &\,{{\mbox{otherwise}}}\,.\quad\quad\end{array}\right.$$

We specify an order-1 autoregressive (AR(1)) model as a smoothing prior on spline coefficients, b_i, which vary smoothly around an individual-specific mean value, μ_i. On μ_i we specify a non-informative prior: $\,{{\mbox{N}}}\,({\mu }_{i}| 0,{\sigma }_{\mu }^{2})$ with large SD σ_μ. The resulting μ_i-marginalised prior for b_i is

$$\begin{aligned}p({\boldsymbol{b}}_{i})&=\mathrm{MVN}({\boldsymbol{b}}_{i}\mid \mathbf{0},\;{\mathbf{\Sigma }}_{B})\\ {\mathbf{\Sigma }}_{B}&:={\mathbf{\Sigma }}_{AR(1)}+{\sigma }_{\mu }^{2}\overrightarrow{\boldsymbol{1}}\\ {\left[{\mathbf{\Sigma }}_{AR(1)}\right]}_{k,{k}^{\prime }}&:={\sigma }_{AR(1)}^{2}{\phi }^{|k-{k}^{\prime }|},\end{aligned}$$

(7)

where: Σ_AR(1) is the n_df × n_df autocovariance matrix implied by an AR(1) model with lag-1 autocorrelation $\phi \in \left[0,1\right)$ and scale parameter ${\sigma }_{AR(1)}^{2} > 0$; and $\overrightarrow{{{{{{\boldsymbol{1}}}}}}}$ is an n_df × n_df matrix of ones.

The prior at (7) and likelihood at (6) are a specific case of the Bayes linear model¹¹⁸, for which the posterior is available in closed form:

$$\begin{aligned}p({\boldsymbol{b}}_{i}\mid {\boldsymbol{y}}_{i},{\boldsymbol{\Sigma }}_{B},{\sigma }^{2})&=\mathrm{MVN}({\boldsymbol{b}}_{i}\mid {\boldsymbol{m}}_{i},\,{\sigma }^{2}{\boldsymbol{V}}_{i})\\ {\boldsymbol{V}}_{i}&:={\left({\boldsymbol{X}}_{B}^{T}{\boldsymbol{Z}}_{i}^{T}{\boldsymbol{Z}}_{i}{\boldsymbol{X}}_{B}+{\boldsymbol{\Sigma }}_{B}^{-1}\right)}^{-1}\\ {\boldsymbol{m}}_{i}&:={\boldsymbol{V}}_{i}{\boldsymbol{X}}_{B}^{T}{\boldsymbol{Z}}_{i}^{T}{\boldsymbol{y}}_{i}.\end{aligned}$$

(8)
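
As a rough numerical illustration of (7) and (8), the sketch below builds the prior covariance and evaluates the closed-form posterior with NumPy. The function names are hypothetical, and the design matrix (standing in for Z_i X_B) is filled with random numbers rather than a real B-spline basis, so only the linear algebra is illustrated, not the spline construction.

```python
import numpy as np


def ar1_prior_cov(n_df, sigma2_ar1=2.5, phi=0.99, sigma_mu=100.0):
    """Sigma_B of Eq. (7): AR(1) autocovariance plus sigma_mu^2 times a matrix of ones."""
    k = np.arange(n_df)
    sigma_ar1 = sigma2_ar1 * phi ** np.abs(k[:, None] - k[None, :])
    return sigma_ar1 + sigma_mu**2 * np.ones((n_df, n_df))


def spline_posterior(design, y, prior_cov):
    """Posterior mean m_i and matrix V_i of Eq. (8); `design` plays the role of Z_i X_B.

    The posterior covariance is sigma^2 * V_i, following the text.
    """
    precision = design.T @ design + np.linalg.inv(prior_cov)
    V = np.linalg.inv(precision)
    m = V @ design.T @ y
    return m, V


# Toy individual: 6 measurements, 5 basis functions (random stand-in for a spline basis).
rng = np.random.default_rng(0)
design = rng.uniform(size=(6, 5))
y = rng.normal(size=6)
m, V = spline_posterior(design, y, ar1_prior_cov(5))
```

Because each individual's posterior depends only on that individual's own design matrix and measurements, a function of this kind could be mapped over individuals in parallel.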

The posterior at (8) can be evaluated separately and in parallel across individuals because the (y_i, b_i) are conditionally independent across individuals i given the hyperparameters ${\sigma }_{AR(1)}^{2}$, ϕ, σ_μ and σ². Values of hyperparameters in the smoothing prior are chosen subjectively, via visualisation of randomly selected samples of individual data trajectories, to reflect empirical levels of smoothness: ${\sigma }_{AR(1)}^{2}:=2.5$, ϕ ≔ 0.99, σ_μ ≔ 100 (Supplementary Fig. 4). We additionally compared cluster allocations for 5000 randomly selected individuals across the following settings of hyperparameters: (${\sigma }_{AR(1)}^{2}:=0.5$, ϕ ≔ 0.9, σ_μ ≔ 10), (${\sigma }_{AR(1)}^{2}:=2.5$, ϕ ≔ 0.99, σ_μ ≔ 100), and (${\sigma }_{AR(1)}^{2}:=10$, ϕ ≔ 0.999, σ_μ ≔ 500) (Supplementary Fig. 8).

For each trait separately, we set σ² to the median of its individual-specific maximum likelihood estimates (MLEs), i.e., ${\sigma }^{2}:=\,{{\mbox{median}}}\,\{\frac{1}{{J}_{i}}| | {{{{{{\boldsymbol{y}}}}}}}_{i}-{{{{{{\boldsymbol{Z}}}}}}}_{i}{{{{{{\boldsymbol{X}}}}}}}_{B}{{{{{{\boldsymbol{m}}}}}}}_{i}| {| }_{2}^{2}:i=1,\ldots,n\}$ where each MLE is calculated from (6) after substituting for b_i its maximum a posteriori estimate, m_i from (8) (Supplementary Data 12).

The measurements y_i inputted into the likelihood for the regularised spline model at (6) are pre-processed by taking the standardised residual from the linear model with the following covariates: baseline age, (baseline age)², data provider, year of birth, and sex, i.e. from the model ${{{{{{\boldsymbol{y}}}}}}}_{i,j}={x}_{i}^{T}\gamma+{\varepsilon }_{i,j}$ fitted across all i = 1, …, N individuals and j = 1, …, J_i time points. Standardisation of residuals then proceeds by subtracting the mean and dividing by the SD of residuals across all individuals and time points.

We focus on individual i’s posterior change from baseline, i.e. on

$${\tilde{\boldsymbol{b}}}_{i}:={(0,{b}_{i,2}-{b}_{i,1},{b}_{i,3}-{b}_{i,1},\ldots )}^{T}$$

(9)

$$\equiv \boldsymbol{D}{\boldsymbol{b}}_{i}$$

(10)

where the jth row of D is ${({{{{{{\boldsymbol{e}}}}}}}_{j}-{{{{{{\boldsymbol{e}}}}}}}_{1})}^{T}$ and e_k is the kth basis vector, i.e. a column n_df-vector with zeroes everywhere except the kth entry, which is one. To calculate the posterior for ${\tilde{{{{{{\boldsymbol{b}}}}}}}}_{i}$ we linearly transform the posterior at (8) so that

$$p({\tilde{{{{{{\boldsymbol{b}}}}}}}}_{i}| {{{{{{\boldsymbol{y}}}}}}}_{i},{{{{{{\boldsymbol{\Sigma }}}}}}}_{B},{\sigma }^{2})=\,{{\mbox{MVN}}}\,({\tilde{{{{{{\boldsymbol{b}}}}}}}}_{i}| {{{{{\boldsymbol{D}}}}}}{{{{{{\boldsymbol{m}}}}}}}_{i},\,{\sigma }^{2}{{{{{\boldsymbol{D}}}}}}{{{{{{\boldsymbol{V}}}}}}}_{i}{{{{{{\boldsymbol{D}}}}}}}^{T})$$

(11)

with m_i and V_i defined at (8).

Soft clustering of individuals by non-linear adiposity trajectory patterns

See Supplementary Fig. 5 for an overview of the clustering protocol.

Any two individuals typically have quite distinct measurement profiles, with different numbers of measurements taken at ages which may be quite disparate. Therefore the precision with which we can estimate any particular spline coefficient varies across individuals. To incorporate this heteroscedasticity into our clustering framework, we define the following scaled Euclidean distance between each pair of individuals $(i,{i}^{{\prime} })$ in the space of baselined spline basis coefficients:

$$d(i,{i}^{{\prime} })=\sqrt{\sum_{k=1}^{{n}_{{{{{{\rm{df}}}}}}}}\frac{{({[{{{{{\boldsymbol{D}}}}}}{{{{{{\boldsymbol{m}}}}}}}_{i}]}_{k}-{[{{{{{\boldsymbol{D}}}}}}{{{{{{\boldsymbol{m}}}}}}}_{{i}^{{\prime} }}]}_{k})}^{2}}{({[{{{{{\boldsymbol{D}}}}}}{{{{{{\boldsymbol{V}}}}}}}_{i}{{{{{{\boldsymbol{D}}}}}}}^{T}]}_{k,k}+{[{{{{{\boldsymbol{D}}}}}}{{{{{{\boldsymbol{V}}}}}}}_{{i}^{{\prime} }}{{{{{{\boldsymbol{D}}}}}}}^{T}]}_{k,k}){\sigma }^{2}}}$$

(12)

where m_i and σ²V_i are the posterior mean and covariance of individual i’s spline coefficients b_i taken from (8). For each spline coefficient k in (12), the squared difference between the mean coefficients of individuals i and ${i}^{{\prime} }$ is standardised by the sum of the corresponding variances.
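
The distance in (12) is straightforward to compute once the baselined posterior means and the diagonals of D V_i Dᵀ are available. The sketch below is illustrative only: the function name and toy inputs are hypothetical, and skipping coordinates with zero pooled variance (such as the first baselined coefficient, which is identically zero) is an assumption made to avoid a 0/0 term.

```python
import math


def scaled_distance(mean_a, var_a, mean_b, var_b, sigma2):
    """Scaled Euclidean distance of Eq. (12) between two individuals.

    mean_*: baselined posterior means D m_i; var_*: diagonals of D V_i D^T.
    Assumption: coordinates whose pooled variance is zero are skipped.
    """
    total = 0.0
    for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b):
        denom = (va + vb) * sigma2
        if denom > 0:
            total += (ma - mb) ** 2 / denom
    return math.sqrt(total)


# Two toy individuals with three baselined coefficients each (not real data).
d = scaled_distance(
    mean_a=[0.0, 1.0, 2.0], var_a=[0.0, 0.5, 1.0],
    mean_b=[0.0, 2.0, 0.0], var_b=[0.0, 0.5, 1.0],
    sigma2=1.0,
)
```

A pairwise matrix of such distances is the kind of input a k-medoids routine accepts in place of raw coordinates.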

We perform k-medoids clustering using the partitioning around medoids (PAM) algorithm^55,56 as implemented in the pam function in the cluster package¹¹⁹ in R¹¹⁵. We train cluster centroids on a randomly selected subset of 80% of individuals in each analysis stratum. We filter individuals in the training set to retain only those with at least L = 2 observations. For a fixed number of clusters, K = 4, we initialise cluster membership according to bins ${\mathcal{B}}_{1:K}$ demarcated by the $0,\frac{1}{K},\frac{2}{K},\ldots,1$ empirical quantiles of the estimated fold change in obesity trait between baseline and year M = 2:

$$\begin{aligned}{\mathcal{B}}_{k}&:=\left[{\hat{F}}^{-1}\left(\frac{k-1}{K}\right),{\hat{F}}^{-1}\left(\frac{k}{K}\right)\right),\quad k=1,\ldots,K\\ \hat{F}(\cdot )&:=\mbox{empirical CDF of}\ \left\{\frac{{[{\boldsymbol{X}}_{B}\boldsymbol{D}{\boldsymbol{m}}_{i}]}_{M+1}}{{[{\boldsymbol{X}}_{B}\boldsymbol{D}{\boldsymbol{m}}_{i}]}_{1}}:\,i=1,\ldots,N\right\}\\ \mbox{individual}\ i\ \mbox{in bin}\ k&\iff \frac{{[{\boldsymbol{X}}_{B}\boldsymbol{D}{\boldsymbol{m}}_{i}]}_{M+1}}{{[{\boldsymbol{X}}_{B}\boldsymbol{D}{\boldsymbol{m}}_{i}]}_{1}}\in {\mathcal{B}}_{k}.\end{aligned}$$

(13)

To ensure robustness, we run the clustering algorithm S = 10 times, each on a random sub-sample of size 5000 (without replacement). For each clustering output s = 1, …, S, we calculate the point-wise mean of each cluster’s constituent individuals:

$${{{{{{\boldsymbol{c}}}}}}}_{k,s}: \!\!=\frac{1}{| {{{{{{\mathcal{C}}}}}}}_{k}^{(s)}| }\sum_{i\in {{{{{{\mathcal{C}}}}}}}_{k}^{(s)}}{{{{{\boldsymbol{D}}}}}}{{{{{{\boldsymbol{m}}}}}}}_{i}$$

(14)

For each clustering s, we observe all trajectories c_1:K,s to be monotonic and non-overlapping (Supplementary Fig. 6). We can therefore define ordered cluster means c_(k),s,

$$k \, < \, {k}^{{\prime} }\iff {[{{{{{{\boldsymbol{c}}}}}}}_{(k),s}]}_{j} > {[{{{{{{\boldsymbol{c}}}}}}}_{({k}^{{\prime} }),s}]}_{j}\quad \forall j=1,\ldots,{n}_{{{{{{\rm{df}}}}}}},$$

(15)

and average the kth ordered mean across S clusterings, where the highest-weight cluster mean is given by c₍₁₎ and the lowest by c_(K):

$${{{{{{\boldsymbol{c}}}}}}}_{(k)}: \!\!=\frac{1}{S}\sum\limits_{s=1}^{S}{{{{{{\boldsymbol{c}}}}}}}_{(k),s},$$

(16)

with corresponding point-wise SEs. We investigate the sensitivity of the resulting clusters to the number of clusters K, the filter parameter L (minimum number of measurements), and the cluster initialisation parameter M appearing in (13) via silhouette values¹²⁰, which evaluate the similarity between cluster members (cohesion) vs others (separation) (Supplementary Fig. 6). We test K ∈ {2, …, 8}, filtering parameter L ∈ {2, 5, 10}, and initialisation parameter M ∈ {1, 2, 5, 10} or random initialisation to choose a combination of parameters that produces dense and separable clusters, i.e. K = 4, L = 2, M = 2. We also qualitatively evaluate cluster centroids across all parameter settings (Supplementary Fig. 7). Finally, we compared cluster allocations over each of the 10 random training runs for a set of 5000 randomly sampled individuals held out of the training splits (Supplementary Fig. 9).

Once cluster centroids have been calculated, we define individual i’s soft cluster membership probability of belonging to cluster k as the posterior probability of being closest in Euclidean distance to cluster k’s centroid:

$${\pi }_{i,(k)}: \!\!=\int\,{\mathbb{I}}\left(k=\,{\mbox{argmin}}_{{k}^{{\prime} }}\,| | {\tilde{{{{{{\boldsymbol{b}}}}}}}}_{i}-{{{{{{\boldsymbol{c}}}}}}}_{({k}^{{\prime} })}| {| }_{2}\right)\,{{\mbox{MVN}}}\,({\tilde{{{{{{\boldsymbol{b}}}}}}}}_{i}| {{{{{\boldsymbol{D}}}}}}{{{{{{\boldsymbol{m}}}}}}}_{i},\,{\sigma }^{2}{{{{{\boldsymbol{D}}}}}}{{{{{{\boldsymbol{V}}}}}}}_{i}{{{{{{\boldsymbol{D}}}}}}}^{T})d{\tilde{{{{{{\boldsymbol{b}}}}}}}}_{i}$$

(17)

where the second term in the integrand is the posterior for ${\tilde{\boldsymbol{b}}}_{i}$ from (11), and we approximate the integral in (17) using 100 Monte Carlo samples from the posterior.
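
A minimal Monte Carlo sketch of (17) is given below, using only the Python standard library. The function name, the toy posterior, and the two toy centroids are hypothetical. The sketch draws b_i from its posterior at (8) via a supplied Cholesky factor of σ²V_i and then baselines each draw, which is equivalent to sampling the baselined coefficients directly but avoids factorising the singular matrix D V_i Dᵀ.

```python
import math
import random


def soft_membership(m, chol, centroids, n_samples=100, seed=0):
    """Monte Carlo estimate of the soft cluster probabilities in Eq. (17).

    m: posterior mean of b_i; chol: lower-triangular Cholesky factor of sigma^2 V_i;
    centroids: list of K baselined cluster centroid vectors.
    """
    rng = random.Random(seed)
    n = len(m)
    counts = [0] * len(centroids)
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # b = m + L z is a draw from MVN(m, L L^T).
        b = [m[r] + sum(chol[r][c] * z[c] for c in range(r + 1)) for r in range(n)]
        b_tilde = [bk - b[0] for bk in b]  # change from baseline, i.e. D b
        dists = [math.dist(b_tilde, c) for c in centroids]
        counts[dists.index(min(dists))] += 1
    return [cnt / n_samples for cnt in counts]


# Toy example: a tightly estimated, rising trajectory and two centroids (gain vs loss).
tight = [[1e-6, 0.0, 0.0], [0.0, 1e-6, 0.0], [0.0, 0.0, 1e-6]]
centroids = [[0.0, 0.5, 1.0], [0.0, -0.5, -1.0]]
probs = soft_membership([1.0, 1.5, 2.0], tight, centroids)
```

With 100 samples the estimated probabilities are multiples of 0.01, which is why values of exactly 0 or 1 can occur and need special handling before a logit transform.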

We then validate the clustering by comparing cluster properties of the randomly selected 80% training set used to define cluster centroids, with the held-out 20% validation set. We assign each individual to the cluster for which they have highest membership probability and compare the proportion of individuals assigned to each cluster, as well as distributions of sex, baseline age, number of follow-up measures, and total length of follow-up of individuals assigned to each cluster. These metrics are similar across training and validation sets in all strata (Supplementary Data 13).

Finally, we take forward bounded logit-transformed cumulative cluster probabilities to GWAS. These outputs are defined as bounded logit(π_i,(1)), bounded logit(π_i,(1) + π_i,(2)), and bounded logit(π_i,(1) + π_i,(2) + π_i,(3)), i.e., the bounded log odds of being in the highest (k1), highest two (k1 or k2), and highest three (k1, k2, or k3) weight clusters respectively. To prevent infinite log odds at π ∈ {0, 1} we defined the following bounded logit transform¹²¹:

$$\,{{\mbox{bounded logit}}}(\pi )\equiv {{\mbox{logit}}}\,\left(\frac{(S-1)\pi+0.5}{S}\right)\quad \pi \in [0,1],$$

(18)

where S = 100 here denotes the number of Monte Carlo samples from the posterior used to approximate (17), not the S = 10 clustering runs described above.
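
The bounded logit in (18) and the cumulative cluster traits built from it can be sketched as follows. The function names and the toy probabilities are hypothetical; the cap on the running sum is an added guard against floating-point drift above 1, not something stated in the text.

```python
import math


def bounded_logit(pi, s=100):
    """Bounded logit of Eq. (18); finite for every pi in [0, 1]."""
    shrunk = ((s - 1) * pi + 0.5) / s
    return math.log(shrunk / (1.0 - shrunk))


def cumulative_cluster_traits(probs, s=100):
    """Bounded log odds of being in the highest 1, 2, ..., K-1 weight clusters.

    probs: soft membership probabilities ordered from highest- to lowest-weight cluster.
    """
    traits, running = [], 0.0
    for p in probs[:-1]:
        running += p
        traits.append(bounded_logit(min(running, 1.0), s))  # guard against drift above 1
    return traits


# Toy membership probabilities for K = 4 clusters (not real data).
traits = cumulative_cluster_traits([0.25, 0.25, 0.25, 0.25])
```

With S = 100 the transform maps π = 0 and π = 1 to −log(199) and +log(199) respectively, so the GWAS phenotype stays finite for individuals whose Monte Carlo samples all fall in one cluster.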

Genome-wide association studies

QC of UK Biobank genotyped and imputed data

Genotyping, initial genotype QC, and imputation on genome build hg19 were performed by UKBB¹¹¹. We performed post-imputation QC to retain only bi-allelic SNPs with MAF >0.01, info score >0.8, missing call rate <5%, and Hardy-Weinberg equilibrium (HWE) exact test P > 1 × 10⁻⁶. We additionally performed sample QC to exclude individuals with sex chromosome aneuploidies, whose self-reported sex did not match inferred genetic sex, with an excess of third-degree relatives in UKBB, identified as heterozygosity or missingness outliers, excluded from autosome phasing or kinship inference, and any other UKBB recommended exclusions¹¹¹.

Linear mixed model association analyses for quantitative traits

An overview of the traits carried forward for GWAS is provided in Supplementary Fig. 18. The following association analyses are all performed separately in three strata: female-specific, male-specific, and sex-combined. The intercept and slope traits for GWAS, i.e., ${\tilde{u}}_{i,0}$ and ${\tilde{u}}_{i,1}$, were tested for association with genetic variants, adjusted for the first 21 genetic principal components (PCs) and genotyping array, using the BOLT-LMM software⁸⁴. We also performed GWAS for the inverse-normal transformed within-individual mean adiposity trait, adjusting for the same covariates described for ${\tilde{u}}_{i,0}$. A similar protocol was followed for the logit-transformed soft clustering probability traits, i.e. ${\pi }_{i,1}^{\prime \prime}$, ${\pi }_{i,2}^{\prime \prime}$, and ${\pi }_{i,3}^{\prime \prime}$ with additional adjustments for baseline trait, baseline age, (baseline age)², sex, year of birth, assessment centre, number of follow-ups, and total length of follow-up (in years).

Fine-mapping SNP associations

We identified putative causal variants at all GWS loci (defined by merging windows of 1.5 Mb around SNPs with P < 5 × 10⁻⁸), using FINEMAP¹²² to select variants (lead SNPs) with a posterior inclusion probability >95%. Lead SNPs were annotated to the nearest gene transcription start site.

Classifying baseline BMI and weight SNPs as reported, refined, or novel obesity associations

We curated a list of SNPs associated with any of 44 obesity-related traits in the GWAS Catalog⁵⁴ accessed on 02 Nov 2021, henceforth referred to as published obesity-associated variants (Supplementary Data 1). We then conducted conditional analysis using GCTA-COJO¹²³ for each lead SNP in our GWAS and published obesity-associated variants within 500 kb, classifying variants as reported, refined, or novel based on previously recommended criteria⁴⁷. Reported SNPs in our study are those whose effects are fully accounted for by published obesity-associated variants within 500 kb. Refined SNPs fulfil all of the following criteria: (1) the refined SNP is correlated (linkage disequilibrium (LD) r² ≥ 0.1) with at least one published obesity-associated variant within 500 kb, (2) the refined SNP has a significantly stronger effect (P < 0.05 in a two-sample t test for difference in mean effect sizes) on the BMI- or weight-intercept trait than published obesity-associated SNPs and also accounts for the effect of published obesity-associated SNPs in conditional analysis (conditional P > 0.05), and (3) published obesity-associated SNPs cannot fully account for the effect of the refined SNP in conditional analysis (conditional P < 0.05). Finally, a SNP in our study was declared novel if it was not in LD with (r² < 0.1), and conditionally independent of (conditional P < 0.05), all published obesity-associated variants within 500 kb.

Replication of GWS associations in UK Biobank hold-out sets

BMI and weight intercept-trait genetic associations

We created cross-sectional obesity phenotypes for the 245,447 individuals in the hold-out set for BMI and 230,861 individuals in the hold-out set for weight (Supplementary Fig. 3) by retaining the observed trait value closest to the individual’s median trait value (if multiple observations present). Deterministic rank-based inverse-normal transformation¹¹⁶ was applied to the residual of the obesity trait adjusted for age, age², year of birth, data provider, and sex. We then tested this trait for association with genetic variants, adjusted for the first 21 genetic PCs and genotyping array, using the BOLT-LMM software⁸⁴.
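A rank-based inverse-normal transformation of this kind can be sketched as below. The exact formula used in the study is defined elsewhere in the Methods; here the Blom offset (3/8) and average ranks for ties are assumptions, and the function name `rank_inverse_normal` is hypothetical. The input would be the covariate-adjusted residuals.

```python
from statistics import NormalDist


def rank_inverse_normal(values, offset=0.375):
    """Map values to standard-normal quantiles of their (tie-averaged) ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    # Blom-type transformation: Phi^{-1}((rank - c) / (n - 2c + 1))
    return [std_normal.inv_cdf((r - offset) / (n - 2 * offset + 1)) for r in ranks]
```

Because ties receive their average rank and no random tie-breaking is used, the output is deterministic for a given input.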

BMI and weight slope-trait genetic associations

We created adiposity slope phenotypes for the 17,006 individuals with multiple observations of BMI and 17,035 individuals with multiple observations of weight from repeat assessment centre visits (Supplementary Fig. 3 and Supplementary Data 19) with BLUPs from LME models as described in the slope-trait modelling section above. We tested for association of this slope-trait with GWS variants associated with adiposity change in our discovery analyses, adjusted for the first 21 genetic PCs and genotyping array, via the linear regression framework implemented in PLINK¹²⁴. As PLINK does not account for family structure, we compared each pair of second-degree or closer related individuals (kinship coefficient >0.0884)¹¹¹ and excluded the individual in the pair having higher genotyping missingness. We repeated the same protocol within each self-identified ethnic group of individuals not of white–British ancestry (Supplementary Data 11).
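The relatedness pruning rule can be illustrated with a short sketch. The function name, input layout, and the choice to skip pairs in which one member has already been excluded are assumptions made for illustration, not details taken from the study.

```python
def prune_related(pairs, missingness, kinship_threshold=0.0884):
    """Return the set of individuals to exclude from related pairs.

    pairs: iterable of (id_a, id_b, kinship_coefficient) tuples.
    missingness: dict mapping individual ID to genotype missingness rate.
    """
    excluded = set()
    for id_a, id_b, kinship in sorted(pairs):  # sorted for reproducibility
        if kinship <= kinship_threshold:
            continue  # more distant than second-degree: keep both
        if id_a in excluded or id_b in excluded:
            continue  # assumption: pair already broken by an earlier exclusion
        # Drop the member with higher missingness (ties drop id_b).
        drop = id_a if missingness[id_a] > missingness[id_b] else id_b
        excluded.add(drop)
    return excluded
```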

Genetic associations with BMI and weight cluster probabilities

We fit regularised splines as detailed above to the 17,006 individuals with multiple observations of BMI and 17,035 individuals with multiple observations of weight from repeat assessment centre visits (Supplementary Fig. 3). Soft cluster membership probabilities for these individuals were calculated, and the three logit-transformed π_i traits were carried forward for association testing with GWS variants associated with adiposity change in our discovery analyses. As above, we pruned out second-degree or closer related individuals and performed association analysis, adjusted for baseline trait, baseline age, (baseline age)², assessment centre, first 21 genetic PCs and genotyping array, via the linear regression framework implemented in PLINK¹²⁴. We repeated the same protocol within each self-identified ethnic group of individuals not of white–British ancestry.

Genetic associations with self-reported weight change

We fit proportional odds logistic regression models implemented in the MASS package¹²⁵ in R¹¹⁵ to estimate the additive effect of lead SNPs on self-reported one-year weight change coded as an ordinal categorical variable with three levels: “loss”, “no change”, and “gain” in 301,943 individuals (described in the data section above). All models were adjusted for BMI, age, sex, year of birth, data provider, assessment centre, first 21 genetic PCs and genotyping array. We repeated the same protocol within each self-identified ethnic group of individuals not of white–British ancestry.

Replication of GWS associations in external cohorts

Quality control, modelling of adiposity change, and GWAS in external cohorts were all performed exactly as in the UKBB discovery analyses, with any exceptions noted below.

Million Veteran Program

The MVP mega-biobank, with ~950,000 participants enrolled to date, is actively recruiting participants from the 6.9 million eligible individuals who make use of the services provided by the Veterans Health Administration (VHA) from around 50 Veterans Affairs (VA) facilities across the United States of America (USA)⁴³. Eligible candidates are registered VHA users who are at least 18 years of age, possess a valid mailing address, and have the ability to provide informed consent. The VA Central Institutional Review Board (IRB) 10-02 protocol gained approval from the VA Central IRB in 2010, and the enrolment of study participants commenced in early 2011. Genetic data for this study were obtained from the custom-genotyped dataset with imputation to the 1000 Genomes project on genome build hg19, and filtered to markers with imputation information score >0.30 and minor allele count >30¹²⁶. Full characteristics of the MVP cohort⁴³ and associated genetic data¹²⁶ have been described previously.

Weight, height, and other covariate records were compiled from the MVP Baseline Survey, which collected information on demographics, health status, lifestyle habits, military experience, and physical traits, and supplemented with EHRs. A survey cleaning algorithm was used to process self-reported data, ensuring quality through expert-defined rules, full details of which have been described previously⁴⁴. Following population-level and individual-level QC of repeat BMI measurements as described above, we retained 404,503 male European-ancestry participants with 20.6 million observations of BMI and 33,200 female European-ancestry participants with 1.94 million observations of BMI.

For each participant, we calculated linear rates of change in BMI over time with the LME models described in (2); we also calculated each individual’s soft cluster membership probability of belonging to clusters whose centroids were defined in the UKBB discovery data (Supplementary Data 24). All analyses were performed in sex-specific and sex-combined strata. Genetic association analysis was performed using REGENIE v2.2.4, software for whole genome regression modelling of large GWASs that accounts for relatedness and population stratification¹²⁷. All GWASs were adjusted for baseline age, (baseline age)², the first 10 genetic PCs, and sex (in sex-combined analyses).

Estonian biobank

EstBB is a volunteer-based sample of Estonian residents comprising ~20% of the Estonian adult population (N > 210,000), recruited by medical personnel and through media campaigns. Various health and demographic data have been collected from the participants, both by medical workers and via self-reports, since 2002. The cohort has been described in detail by Leitsalu et al.⁴⁵. Genetic data for this study was obtained from genotyping with the Illumina global screening array (GSA) microchip, with imputation using a customised reference panel aligned to the hg19 genome, as described previously¹²⁸.

BMI was available for 193,490 participants. BMI measurements were collected by doctors (through measurements of height and weight) from 2001 to 2023. Population-level and individual-level QC of repeat BMI measurements were performed as described for the UKBB discovery cohort; we additionally excluded individuals with records of use of GLP-1 receptor agonists such as semaglutide (blood glucose-lowering drugs that typically also result in weight loss, drug codes A10BJ*). In total, 82,034 female participants with 281,438 measurements of BMI and 45,735 male participants with 164,166 measurements of BMI were retained. Of these, 125,209 passed genotyping QC.

For each participant, we calculated linear rates of change in BMI over time with the LME model described in (2); we also calculated each individual’s soft cluster membership probability of belonging to clusters whose centroids were defined in the UKBB discovery data (Supplementary Data 24). All analyses were performed in sex-specific and sex-combined strata. Genetic association analysis was performed using REGENIE v3.2 software for whole genome regression modelling¹²⁷. All GWASs were adjusted for baseline age, (baseline age)², the first 20 genetic PCs, and sex (in sex-combined analyses).

Power calculations for replication sample sizes

We corrected the observed effect sizes from discovery GWASs for winner’s curse through an implementation first described by Palmer et al.¹²⁹. Briefly, we solve for the bias using the following maximum likelihood model,

$${\beta }_{obs}={\beta }_{true}+s\frac{\phi \left(\frac{{\beta }_{true}}{s}-c\right)-\phi \left(\frac{-{\beta }_{true}}{s}-c\right)}{\Phi \left(\frac{{\beta }_{true}}{s}-c\right)+\Phi \left(\frac{-{\beta }_{true}}{s}-c\right)}$$

(19)

where β_obs is the effect size in the discovery GWAS, β_true is the (assumed true) effect size in the source population, s is the standard error of β_obs, ϕ and Φ are the standard normal density and cumulative distribution functions, and c = 5.33 is the test statistic corresponding to a discovery α = 5 × 10⁻⁸. The sample size required to replicate the (assumed true) unbiased effect size is then calculated for nominally significant α = 0.05 and Bonferroni-adjusted for the number of independent variants tested, M_var ($\alpha=\frac{0.05}{{M}_{var}}$) as follows:

$$\,{{\mbox{power}}}(\alpha,{{\mbox{ncp}}})=1-{\chi }_{1}^{2}({({\chi }_{1}^{2})}^{-1}(1-\alpha ),{{\mbox{ncp}}})$$

(20)

under the alternative distribution which is non-central ${\chi }_{1}^{2}$ with non-centrality parameter per variant (ncp) estimated for a normalised trait with variance 1 as:

$$\,{{\mbox{ncp}}}\, \approx \, N\frac{2{\beta }_{obs}^{2}{{{{{\rm{AF}}}}}}(1-{{{{{\rm{AF}}}}}})}{1-2{\beta }_{obs}^{2}{{{{{\rm{AF}}}}}}(1-{{{{{\rm{AF}}}}}})}$$

(21)

where AF is the variant allele frequency.
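The calculations in (19) to (21) can be sketched numerically as follows. This is an illustration under stated assumptions, not the implementation used in the study: the denominator terms in (19) are read as the standard normal cumulative distribution function, s is taken to be the standard error of the observed effect, (19) is solved for β_true by bisection, the 80% target power is an arbitrary default, and all function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def expected_observed_beta(beta_true, se, c=5.33):
    """Right-hand side of Eq. (19) for a given true effect size."""
    z = beta_true / se
    num = _STD_NORMAL.pdf(z - c) - _STD_NORMAL.pdf(-z - c)
    den = _STD_NORMAL.cdf(z - c) + _STD_NORMAL.cdf(-z - c)
    return beta_true + se * num / den


def correct_winners_curse(beta_obs, se, c=5.33, n_iter=200):
    """Solve Eq. (19) for beta_true by bisection on [0, |beta_obs|]."""
    sign = 1.0 if beta_obs >= 0 else -1.0
    target = abs(beta_obs)
    lo, hi = 0.0, target  # the corrected effect is shrunk towards zero
    for _ in range(n_iter):
        mid = (lo + hi) / 2
        if expected_observed_beta(mid, se, c) < target:
            lo = mid
        else:
            hi = mid
    return sign * (lo + hi) / 2


def replication_power(beta, af, n, alpha=0.05):
    """Eq. (20) with the non-centrality parameter of Eq. (21)."""
    var_explained = 2 * beta**2 * af * (1 - af)
    ncp = n * var_explained / (1 - var_explained)
    q = _STD_NORMAL.inv_cdf(1 - alpha / 2)  # sqrt of the chi-square(1) critical value
    root = sqrt(ncp)
    # With 1 df, a non-central chi-square variate equals (Z + sqrt(ncp))^2.
    return _STD_NORMAL.cdf(root - q) + _STD_NORMAL.cdf(-root - q)


def required_sample_size(beta, af, alpha=0.05, target_power=0.8):
    """Smallest N reaching target_power (assumes beta != 0 and alpha < target_power)."""
    if beta == 0:
        raise ValueError("beta must be non-zero")
    lo, hi = 0, 1
    while replication_power(beta, af, hi, alpha) < target_power:
        lo, hi = hi, hi * 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if replication_power(beta, af, mid, alpha) < target_power:
            lo = mid
        else:
            hi = mid
    return hi
```

For a Bonferroni-adjusted calculation, `alpha` would be set to 0.05 divided by the number of independent variants tested.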

Power comparison to GIANT 2019 meta-analysis of BMI

We accessed publicly available summary statistics from the GIANT consortium’s meta-analysis of BMI across UKBB and previous GIANT releases in female-specific (max N = 434,793), male-specific (max N = 374,755), and sex-combined strata (max N = 806,834)⁴⁶. SNPs included in both the GIANT 2019 meta-analysis and our in-house BMI-intercept GWAS that reached GWS in either study were carried forward for power comparisons, resulting in 26,812 (female-specific strata), 22,123 (male-specific strata), and 82,559 (sex-combined strata) SNPs. Per variant, we calculated the χ² statistic (as $\frac{{\beta }^{2}}{S{E}^{2}}$) and obtained the ratio of ${\chi }_{in-house}^{2}$ to ${\chi }_{GIANT}^{2}$. Median $\frac{{\chi }_{in-house}^{2}}{{\chi }_{GIANT}^{2}}$ across all GWS SNPs was then compared to the median ratio of sample sizes, i.e. $\frac{{N}_{in-house}}{{N}_{GIANT}}$, to determine the boost in power over that expected from the sample size difference between the two studies.
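The comparison can be sketched as below. This is a minimal illustration: the dictionary layout and function name are hypothetical, the inputs are assumed to be already restricted to SNPs reaching GWS in either study, and expressing the boost as the ratio of the two medians is an assumption about how the comparison is summarised.

```python
from statistics import median


def power_boost(in_house, giant):
    """Compare median chi-square ratio with median sample-size ratio.

    in_house, giant: dicts mapping SNP ID to (beta, se, n) (hypothetical layout).
    Returns (median chi2 ratio, median N ratio, ratio of the two medians).
    """
    shared = sorted(set(in_house) & set(giant))  # SNPs present in both studies
    chi2_ratios, n_ratios = [], []
    for snp in shared:
        beta_1, se_1, n_1 = in_house[snp]
        beta_2, se_2, n_2 = giant[snp]
        chi2_ratios.append((beta_1 / se_1) ** 2 / (beta_2 / se_2) ** 2)
        n_ratios.append(n_1 / n_2)
    med_chi2, med_n = median(chi2_ratios), median(n_ratios)
    return med_chi2, med_n, med_chi2 / med_n
```

A ratio of medians above 1 would indicate more power per sample than expected from sample size alone.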

Single-variant analyses

The following analyses were all conducted in female-specific, male-specific, and sex-combined strata.

Abdominal adiposity change traits

Slope changes in WC, WHR, WCadjBMI, and WHRadjBMI for up to 44,154 individuals with repeat observations were calculated using LME models, adjusted and rank-based inverse-normal transformed¹¹⁶ for genetic association testing as described in the slope modelling section above. We estimated the additive association of number of copies of each lead variant minor allele (0, 1, or 2) with slope traits adjusted for the first 21 genetic PCs and genotyping array via linear regression (Supplementary Data 17).

Longitudinal phenome-wide association

We curated a longitudinal research resource for 45 additional quantitative phenotypes in up to 146,099 individuals of white–British ancestry (Supplementary Data 14, as identified by Kuan et al.¹³⁰) by integrating UKBB assessment centre measurements with the interim release of primary care records provided by GPs, with QC performed as described above for obesity traits. Slope changes in each of these phenotypes were calculated using the LME models described in (2). A deterministic rank-based inverse-normal transformation¹¹⁶, as described in (5), was applied to the slope BLUP ${\hat{u}}_{i,1}$. The transformed slope-trait was tested for additive association with number of copies of each lead variant minor allele (0, 1, or 2), adjusted for the intercept BLUP ${\hat{u}}_{i,0}$, baseline age, (baseline age)², sex, year of birth, number of follow-ups, total length of follow-up (in years), assessment centre, first 21 genetic PCs and genotyping array (Supplementary Data 18).

Identification of individuals with Alzheimer’s or dementia diagnoses

We identified participants with codes for history or diagnosis of dementia in either primary care or hospital in-patient records (Supplementary Data 15, as identified by Kuan et al.¹¹³). We performed sensitivity analyses for the replication of rs429358 associations with all obesity-change phenotypes after excluding up to 242 individuals of white–British ancestry with recorded history or diagnosis of dementia.

Identification of lifespan-associated variants

We curated a list of 138 independent variants associated with longevity in the GWAS Catalog⁵⁴, accessed on 27 March 2023 (Supplementary Data 16). We identified independent SNPs that passed genotyping and imputation QC filters in UKBB by pair-wise pruning variants in LD (r² > 0.1) within a 1 Mb window. One of the lead variants identified in this study, i.e., rs429358 in the APOE locus, was pruned out in favour of rs4420638, which is 11 kb away from the lead variant and in LD with rs429358 with r² = 0.69. We looked up the effects of these variants in the various adiposity-change GWAS summary statistics and established significance at P = 3.60 × 10⁻⁴ (Bonferroni-corrected at 5% across 138 tests).

SNP heritability and genetic correlations

We estimated the heritability explained by genotyped SNPs (${h}_{G}^{2}$) and genetic correlations (r_G) between obesity-intercept and obesity-change traits, from summary statistics, using LD score regression implemented in the LDSC software⁶⁷,¹³¹, with pre-computed LD-scores based on European-ancestry samples of the 1000 Genomes Project¹³² restricted to HapMap3 SNPs¹³³. The same protocol was followed to determine r_G between BMI-intercept in our in-house study and BMI in the GIANT 2019 meta-analysis.

Joint modelling of intra-individual mean and variance

Analyses were performed using the TrajGWAS package²⁸ in Julia¹³⁴, for 177,472 unrelated individuals of white–British ancestry with multiple measurements of weight included in the discovery analyses. Briefly, TrajGWAS analysis is conducted in two stages to test for genetic effects on longitudinal trajectory mean, intra-individual variance, and a joint effect on either mean or variance in an LME model framework²⁸. In the first stage, we fit a null model for weight with fixed effects for the intercept, age, age², sex, and 21 genetic PCs; we included random effects for the intercept and linear slope of age. In the second stage, we performed score testing with the saddle-point approximation under the full model, i.e. including genome-wide effects for all variants with MAF >1% in the genotyped and imputed UKBB data that passed QC.

Sex-heterogeneity testing

We tested for sex-heterogeneity in the effects of adiposity-change lead SNPs by calculating Z-statistics and corresponding P-values for the difference in female-specific and male-specific effects as:

$${Z}_{sexhet}=\frac{({\hat{\beta }}_{(F)}-{\hat{\beta }}_{(M)})}{\sqrt{(S{E}_{(F)}^{2}+S{E}_{(M)}^{2})}}$$

(22)

A similar statistic and test were used to assess heterogeneity between males and females in ${h}_{G}^{2}$ of all traits, and in r_G between obesity-intercept and obesity-change traits.
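The statistic in (22) is straightforward to compute; the sketch below assumes a two-sided P-value from the standard normal distribution, which the text does not state explicitly, and uses a hypothetical function name.

```python
from math import sqrt
from statistics import NormalDist


def sex_heterogeneity(beta_f, se_f, beta_m, se_m):
    """Z-statistic of Eq. (22) and its two-sided P-value (assumed two-sided)."""
    z = (beta_f - beta_m) / sqrt(se_f**2 + se_m**2)
    p = 2 * NormalDist().cdf(-abs(z))
    return z, p
```

The same function applies to sex-specific heritability or genetic correlation estimates by passing those estimates and their standard errors in place of the SNP effects.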

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The GWAS summary statistics generated in this study have been deposited in the GWAS Catalog⁵⁴. They can be downloaded from the parent directory: ftp://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90429001-GCST90430000/ using the accession numbers provided in Supplementary Data 26 (ranging from GCST90429765 to GCST90429794).

Code availability

All code required to reproduce analyses is publicly available at: https://github.com/lindgrengroup/longitudinal_primarycare/tree/main/adiposity/scripts/manuscript¹³⁵.

References

Bluher, M. Obesity: global epidemiology and pathogenesis. Nat. Rev. Endocrinol. 15, 288–298 (2019).
Collaborators, G. B. D. O. et al. Health effects of overweight and obesity in 195 countries over 25 years. N. Engl. J. Med. 377, 13–27 (2017).
Must, A. et al. The disease burden associated with overweight and obesity. JAMA 282, 1523–1529 (1999).
Loos, R. J. F. & Yeo, G. S. H. The genetics of obesity: from discovery to biology. Nat. Rev. Genet. 23, 120–133 (2022).
Maes, H. H., Neale, M. C. & Eaves, L. J. Genetic and environmental factors in relative body weight and human adiposity. Behav. Genet. 27, 325–351 (1997).
Elks, C. E. et al. Variability in the heritability of body mass index: a systematic review and meta-regression. Front. Endocrinol. (Lausanne) 3, 29 (2012).
Khera, A. V. et al. Polygenic prediction of weight and obesity trajectories from birth to adulthood. Cell 177, 587–596.e9 (2019).
Hardy, R. et al. Life course variations in the associations between fto and mc4r gene variants and body size. Hum. Mol. Genet. 19, 545–552 (2010).
Silventoinen, K. et al. Changing genetic architecture of body mass index from infancy to early adulthood: an individual based pooled analysis of 25 twin cohorts. Int. J. Obes. (Lond.) 46, 1901–1909 (2022).
Helgeland, O. et al. Characterization of the genetic architecture of infant and early childhood body mass index. Nat. Metab. 4, 344–358 (2022).
Couto Alves, A. et al. Gwas on longitudinal growth traits reveals different genetic factors influencing infant, child, and adult BMI. Sci. Adv. 5, eaaw3095 (2019).
Hjelmborg, J. et al. Genetic influences on growth traits of bmi: a longitudinal study of adult twins. Obesity 16, 847–852 (2008).
Fabsitz, R. R., Sholinsky, P. & Carmelli, D. Genetic influences on adult weight gain and maximum body mass index in male twins. Am. J. Epidemiol. 140, 711–720 (1994).
Austin, M. A. et al. Genetic influences on changes in body mass index: a longitudinal analysis of women twins. Obes. Res. 5, 326–331 (1997).
Xu, J. et al. Exploring the clinical and genetic associations of adult weight trajectories using electronic health records in a racially diverse biobank: a phenome-wide and polygenic risk study. Lancet Digit Health 4, e604–e614 (2022).
Shilo, S., Rossman, H. & Segal, E. Axes of a revolution: challenges and promises of big data in healthcare. Nat. Med. 26, 29–38 (2020).
Wolford, B. N., Willer, C. J. & Surakka, I. Electronic health records: the next wave of complex disease genetics. Hum. Mol. Genet. 27, R14–R21 (2018).
Wei, W. Q. & Denny, J. C. Extracting research-quality phenotypes from electronic health records to support precision medicine. Genome Med. 7, 41 (2015).
Gottesman, O. et al. The electronic medical records and genomics (emerge) network: past, present, and future. Genet. Med. 15, 761–771 (2013).
Monda, K. L. et al. A meta-analysis identifies new loci associated with body mass index in individuals of african ancestry. Nat. Genet. 45, 690–696 (2013).
Postmus, I. et al. Pharmacogenetic meta-analysis of genome-wide association studies of ldl cholesterol response to statins. Nat. Commun. 5, 5068 (2014).
Chiu, Y. F., Justice, A. E. & Melton, P. E. Longitudinal analytical approaches to genetic data. BMC Genet. 2, 4 (2016).
Fan, R. et al. Longitudinal association analysis of quantitative traits. Genet. Epidemiol. 36, 856–869 (2012).
Furlotte, N. A., Eskin, E. & Eyheramendy, S. Genome-wide association mapping with longitudinal data. Genet. Epidemiol. 36, 463–471 (2012).
Goldstein, J. A. et al. Labwas: novel findings and study design recommendations from a meta-analysis of clinical labs in two independent biobanks. PLoS Genet. 16, e1009077 (2020).
Justice, A. E. et al. Genome-wide association of trajectories of systolic blood pressure change. BMC Proc. 10, 321–327 (2016).
Gauderman, W. J. et al. Longitudinal data analysis in pedigree studies. Genet. Epidemiol. 1, S18–28 (2003).
Ko, S. et al. Gwas of longitudinal trajectories at biobank scale. Am. J. Hum. Genet. 109, 433–445 (2022).
Laird, N. M. & Ware, J. H. Random-effects models for longitudinal data. Biometrics 38, 963–974 (1982).
Xu, H. et al. High-throughput and efficient multilocus genome-wide association study on longitudinal outcomes. Bioinformatics 36, 3004–3010 (2020).
Ruppert, D., Wand, M. P. & Carroll, R. J. Semiparametric regression. Cambridge Series in Statistical and Probabilistic Mathematics. https://www.cambridge.org/core/books/semiparametric-regression/02FC9A9435232CA67532B4D31874412C (Cambridge University Press, Cambridge, 2003).
Das, K. et al. A dynamic model for genome-wide association studies. Hum. Genet. 129, 629–639 (2011).
Das, K. et al. Dynamic semiparametric Bayesian models for genetic mapping of complex trait with irregular longitudinal data. Stat. Med. 32, 509–523 (2013).
Li, Z. & Sillanpää, M. J. A bayesian nonparametric approach for mapping dynamic quantitative traits. Genetics 194, 997–1016 (2013).
Li, J., Wang, Z., Li, R. & Wu, R. Bayesian group lasso for nonparametric varying-coefficient models with application to functional genome-wide association studies. Ann. Appl. Stat. 9, 640–664 (2015).
Anh Luong, D. T. & Chandola, V. A K-means approach to clustering disease progressions. In: 2017 IEEE International Conference on Healthcare Informatics (ICHI), 268–274 (2017).
Hedman, A. K. et al. Identification of novel pheno-groups in heart failure with preserved ejection fraction using machine learning. Heart 106, 342–349 (2020).
Lee, C. & Schaar, M. V. D. Temporal phenotyping using deep predictive clustering of disease progression. In: Proceedings of the 37th International Conference on Machine Learning, 5767–5777 (PMLR, 2020). https://proceedings.mlr.press/v119/lee20h.html. ISSN: 2640-3498.
Mullin, S. et al. Longitudinal K-means approaches to clustering and analyzing EHR opioid use trajectories for clinical subtypes. J. Biomed. Inform. 122, 103889 (2021).
Lee, C., Rashbass, J. & van der Schaar, M. Outcome-oriented deep temporal phenotyping of disease progression. IEEE Trans. Biomed. Eng. 68, 2423–2434 (2021).
Carr, O., Javer, A., Rockenschaub, P., Parsons, O. & Durichen, R. Longitudinal patient stratification of electronic health records with flexible adjustment for clinical outcomes. In Proceedings of Machine Learning for Health. https://proceedings.mlr.press/v158/carr21a.html. 220–238 (PMLR, 2021).
Sudlow, C. et al. Uk biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).
Gaziano, J. M. et al. Million veteran program: a mega-biobank to study genetic influences on health and disease. J. Clin. Epidemiol. 70, 214–223 (2016).
Nguyen, X. T. et al. Baseline characterization and annual trends of body mass index for a mega-biobank cohort of us veterans 2011-2017. J. Health Res. Rev. Dev. Ctries 5, 98–107 (2018).
Leitsalu, L. et al. Cohort profile: Estonian biobank of the estonian genome center, university of tartu. Int. J. Epidemiol. 44, 1137–1147 (2015).
Pulit, S. L. et al. Meta-analysis of genome-wide association studies for body fat distribution in 694 649 individuals of european ancestry. Hum. Mol. Genet. 28, 166–174 (2019).
Benonisdottir, S. et al. Epigenetic and genetic components of height regulation. Nat. Commun. 7, 13490 (2016).
Shenkman, M. et al. Mannosidase activity of edem1 and edem2 depends on an unfolded state of their glycoprotein substrates. Commun. Biol. 1, 172 (2018).
Tews, D. et al. Teneurin-2 (tenm2) deficiency induces ucp1 expression in differentiating human fat cells. Mol. Cell Endocrinol. 443, 106–113 (2017).
Jung, H. et al. Sexually dimorphic behavior, neuronal activity, and gene expression in chd8-mutant mice. Nat. Neurosci. 21, 1218–1228 (2018).
Mo, D. et al. Transcriptome landscape of porcine intramuscular adipocytes during differentiation. J. Agric Food Chem. 65, 6317–6328 (2017).
Groza, T. et al. The international mouse phenotyping consortium: comprehensive knockout phenotyping underpinning the study of human disease. Nucleic Acids Res. 51, D1038–D1045 (2023).
Pirastu, N. et al. Genetic analyses identify widespread sex-differential participation bias. Nat. Genet. 53, 663–671 (2021).
Welter, D. et al. The nhgri gwas catalog, a curated resource of snp-trait associations. Nucleic Acids Res. 42, D1001–6 (2014).
Reynolds, A. P., Richards, G., de la Iglesia, B. & Rayward-Smith, V. J. Clustering rules: a comparison of partitioning and hierarchical clustering algorithms. J. Math. Model. Algorithms 5, 475–504 (2006).
Schubert, E. & Rousseeuw, P. J. Faster k-medoids clustering: improving the PAM, CLARA, and CLARANS algorithms. In: Amato, G., Gennaro, C., Oria, V. & Radovanović, M. (eds.) Similarity Search and Applications, Lecture Notes in Computer Science, 171–187 (Springer International Publishing, Cham, 2019).
Surakka, I. et al. The impact of low-frequency and rare variants on lipid levels. Nat. Genet. 47, 589–597 (2015).
Hoffmann, T. J. et al. A large electronic-health-record-based genome-wide study of serum lipids. Nat. Genet. 50, 401–413 (2018).
Shen, L. et al. Whole genome association study of brain-wide imaging phenotypes for identifying quantitative trait loci in mci and ad: a study of the adni cohort. Neuroimage 53, 1051–1063 (2010).
Nazarian, A., Yashin, A. I. & Kulminski, A. M. Genome-wide analysis of genetic predisposition to alzheimer’s disease and related sex disparities. Alzheimers Res. Ther. 11, 5 (2019).
Joshi, P. K. et al. Variants near chrna3/5 and apoe have age- and sex-related effects on human lifespan. Nat. Commun. 7, 11174 (2016).
Pilling, L. C. et al. Human longevity: 25 genetic loci associated in 389,166 uk biobank participants. Aging (Albany NY) 9, 2504–2520 (2017).
Lumsden, A. L., Mulugeta, A., Zhou, A. & Hypponen, E. Apolipoprotein e (apoe) genotype-associated disease risks: a phenome-wide, registry-based, case-control study utilising the uk biobank. EBioMedicine 59, 102954 (2020).
Astle, W. J. et al. The allelic landscape of human blood cell trait variation and links to common complex disease. Cell 167, 1415–1429 e19 (2016).
Kettunen, J. et al. Genome-wide study for circulating metabolites identifies 62 loci and reveals novel systemic effects of lpa. Nat. Commun. 7, 11122 (2016).
Shrine, N. et al. New genetic signals for lung function highlight pathways and chronic obstructive pulmonary disease associations across multiple ancestries. Nat. Genet. 51, 481–493 (2019).
Bulik-Sullivan, B. K. et al. Ld score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
Song, M. et al. Associations between genetic variants associated with body mass index and trajectories of body fatness across the life course: a longitudinal analysis. Int. J. Epidemiol. 47, 506–515 (2018).
Bray, M. S. et al. Nih working group report-using genomic information to guide weight management: from universal to precision treatment. Obes. (Silver Spring) 24, 14–22 (2016).
Delahanty, L. M. et al. Genetic predictors of weight loss and weight regain after intensive lifestyle modification, metformin treatment, or standard care in the diabetes prevention program. Diab Care 35, 363–366 (2012).
Liou, T. H. et al. Esr1, fto, and ucp2 genes interact with bariatric surgery affecting weight loss and glycemic control in severely obese patients. Obes. Surg. 21, 1758–1765 (2011).
Sarzynski, M. A. et al. Associations of markers in 11 obesity candidate genes with maximal weight loss and weight regain in the sos bariatric surgery cases. Int J. Obes. 35, 676–683 (2011).
Zhang, X. et al. Fto genotype and 2-year change in body composition and fat distribution in response to weight-loss diets: the pounds lost trial. Diabetes 61, 3005–3011 (2012).
Papandonatos, G. D. et al. Genetic predisposition to weight loss and regain with lifestyle intervention: analyses from the diabetes prevention program and the look ahead randomized controlled trials. Diabetes 64, 4312–4321 (2015).
McCaffery, J. M. et al. Genetic predictors of change in waist circumference and waist-to-hip ratio with lifestyle intervention: the trans-nih consortium for genetics of weight loss response to lifestyle intervention. Diabetes 71, 669–676 (2022).
Holzapfel, C. et al. Association between single nucleotide polymorphisms and weight reduction in behavioural interventions-a pooled analysis. Nutrients 13, 819 (2021).
Nelson, M. R. et al. The support of human genetic evidence for approved drug indications. Nat. Genet. 47, 856–860 (2015).
Silventoinen, K. & Kaprio, J. Genetics of tracking of body mass index from birth to late middle age: evidence from twin and family studies. Obes. Facts 2, 196–202 (2009).
Winkler, T. W. et al. The influence of age and sex on genetic associations with adult body size and shape: a large-scale genome-wide interaction study. PLOS Genet. 11, e1005378 (2015).
Gillespie, N. A. et al. Determining the stability of genome-wide factors in BMI between ages 40 to 69 years. PLOS Genet. 18, e1010303 (2022).
Beesley, L. J., Fritsche, L. G. & Mukherjee, B. A modeling framework for exploring sampling and observation process biases in genome and phenome-wide association studies using electronic health records. bioRxiv. https://www.biorxiv.org/content/early/2019/05/14/499392 (2019).
Fry, A. et al. Comparison of sociodemographic and health-related characteristics of UK Biobank participants with those of the general population. Am. J. Epidemiol. 186, 1026–1034 (2017).
Goudie, R. J. B., Presanis, A. M., Lunn, D., Angelis, D. D. & Wernisch, L. Joining and splitting models with Markov melding. Bayesian Anal. 14, 81–109 (2019).
Loh, P. R. et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 47, 284–290 (2015).
Li, H. et al. Triglyceride-glucose index variability and incident cardiovascular disease: a prospective cohort study. Cardiovasc. Diabetol. 21, 105 (2022).
Nuyujukian, D. S. et al. Blood pressure variability and risk of heart failure in ACCORD and the VADT. Diabetes Care 43, 1471–1478 (2020).
Speakman, J. R. et al. Set points, settling points and some alternative models: theoretical options to understand how genes and environments combine to regulate body adiposity. Dis. Model. Mech. 4, 733–745 (2011).
Muller, M. J., Geisler, C., Heymsfield, S. B. & Bosy-Westphal, A. Recent advances in understanding body weight homeostasis in humans. F1000Res 7, F1000 (2018).
Nawaz, H., Chan, W., Abdulrahman, M., Larson, D. & Katz, D. L. Self-reported weight and height: implications for obesity research. Am. J. Prev. Med. 20, 294–298 (2001).
Kowal, R. C., Herz, J., Goldstein, J. L., Esser, V. & Brown, M. S. Low density lipoprotein receptor-related protein mediates uptake of cholesteryl esters derived from apoprotein E-enriched lipoproteins. Proc. Natl. Acad. Sci. USA 86, 5810–5814 (1989).
Kockx, M., Traini, M. & Kritharides, L. Cell-specific production, secretion, and function of apolipoprotein E. J. Mol. Med. 96, 361–371 (2018).
Garrison, R. J. et al. Obesity and lipoprotein cholesterol in the Framingham Offspring Study. Metabolism 29, 1053–1060 (1980).
Albrink, M. J. et al. Intercorrelations among plasma high density lipoprotein, obesity and triglycerides in a normal population. Lipids 15, 668–676 (1980).
Panagiotakos, D. B., Pitsavos, C., Yannakoulia, M., Chrysohoou, C. & Stefanadis, C. The implication of obesity and central fat on markers of chronic inflammation: the Attica study. Atherosclerosis 183, 308–315 (2005).
Purdy, J. C. & Shatzel, J. J. The hematologic consequences of obesity. Eur. J. Haematol. 106, 306–319 (2021).
Gillette Guyonnet, S. et al. IANA (International Academy on Nutrition and Aging) expert group: weight loss and Alzheimer’s disease. J. Nutr. Health Aging 11, 38–48 (2007).
von Hardenberg, S., Gnewuch, C., Schmitz, G. & Borlak, J. ApoE is a major determinant of hepatic bile acid homeostasis in mice. J. Nutr. Biochem. 52, 82–91 (2018).
Wang, J. et al. ApoE and the role of very low density lipoproteins in adipose tissue inflammation. Atherosclerosis 223, 342–349 (2012).
Blanchard, J. W. et al. APOE4 impairs myelination via cholesterol dysregulation in oligodendrocytes. Nature 611, 769–779 (2022).
Greendale, G. A. et al. Changes in body composition and weight during the menopause transition. JCI Insight. 4, e124865 (2019).
Davies, K. M., Heaney, R. P., Recker, R. R., Barger-Lux, M. J. & Lappe, J. M. Hormones, weight change and menopause. Int. J. Obes. Relat. Metab. Disord. 25, 874–879 (2001).
Chen, Y. W., Hang, D., Kvaerner, A. S., Giovannucci, E. & Song, M. Associations between body shape across the life course and adulthood concentrations of sex hormones in men and pre- and postmenopausal women: a multicohort study. Br. J. Nutr. 127, 1000–1009 (2022).
Conroy, M. et al. The advantages of UK Biobank’s open-access strategy for health research. J. Intern. Med. 286, 389–397 (2019).
Coady, S. A. et al. Genetic variability of adult body mass index: a longitudinal assessment in Framingham families. Obes. Res. 10, 675–681 (2002).
Singh, P. et al. Statins decrease leptin expression in human white adipocytes. Physiol. Rep. 6, e13566 (2018).
McCarron, D. A. & Reusser, M. E. Body weight and blood pressure regulation. Am. J. Clin. Nutr. 63, 423S–425S (1996).
Hernan, M. A., Hernandez-Diaz, S. & Robins, J. M. A structural approach to selection bias. Epidemiology 15, 615–625 (2004).
Beesley, L. J. et al. The emerging landscape of health research based on biobanks linked to electronic health records: existing resources, statistical challenges, and potential opportunities. Stat. Med. 39, 773–800 (2020).
Kutcher, S. A., Brophy, J. M., Banack, H. R., Kaufman, J. S. & Samuel, M. Emulating a randomised controlled trial with observational data: an introduction to the target trial framework. Can. J. Cardiol. 37, 1365–1377 (2021).
Shortreed, S. M., Rutter, C. M., Cook, A. J. & Simon, G. E. Improving pragmatic clinical trial design using real-world data. Clin. Trials 16, 273–282 (2019).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Team, U. B. UK Biobank Primary Care Linked Data (2019), version 1.0 edn. https://biobank.ndph.ox.ac.uk/showcase/showcase/docs/primary_care_data.pdf (2019).
Kuan, V. et al. A chronological map of 308 physical and mental health conditions from 4 million individuals in the English National Health Service. Lancet Digit. Health 1, e63–e77 (2019).
Bates, D., Machler, M., Bolker, B. & Walker, S. Fitting linear mixed-effects models using lme4. J. Stat. Softw. 67, 1–48 (2015).
R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, Vienna, Austria, 2021). https://www.R-project.org/.
Beasley, T. M., Erickson, S. & Allison, D. B. Rank-based inverse normal transformations are increasingly used, but are they merited? Behav. Genet. 39, 580–595 (2009).
Eilers, P. H. C. & Marx, B. D. Flexible smoothing with B-splines and penalties. Stat. Sci. 11, 89–121 (1996).
O’Hagan, A. & Kendall, M. G. Kendall’s Advanced Theory of Statistics: Bayesian Inference. Volume 2B (Edward Arnold, 1994).
Maechler, M., Rousseeuw, P., Struyf, A., Hubert, M. & Hornik, K. cluster: Cluster Analysis Basics and Extensions. https://CRAN.R-project.org/package=cluster. R package version 2.1.4; for new features, see the ‘Changelog’ file (in the package source) (2022).
Rousseeuw, P. J. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65 (1987).
Smithson, M. & Verkuilen, J. A better lemon squeezer? Maximum-likelihood regression with beta-distributed dependent variables. Psychol. Methods 11, 54–71 (2006).
Benner, C. et al. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493–1501 (2016).
Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).
Venables, W. N. & Ripley, B. D. Modern Applied Statistics with S (Springer, New York, 2002), fourth edn. https://www.stats.ox.ac.uk/pub/MASS4/ (2002).
Hunter-Zinck, H. et al. Genotyping array design and data quality control in the Million Veteran Program. Am. J. Hum. Genet. 106, 535–548 (2020).
Mbatchou, J. et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat. Genet. 53, 1097–1103 (2021).
Mitt, M. et al. Improved imputation accuracy of rare and low-frequency variants using population-specific high-coverage WGS-based imputation reference panel. Eur. J. Hum. Genet. 25, 869–876 (2017).
Palmer, C. & Pe’er, I. Statistical correction of the winner’s curse explains replication variability in quantitative trait genome-wide association studies. PLoS Genet. 13, e1006916 (2017).
Denaxas, S. et al. A semi-supervised approach for rapidly creating clinical biomarker phenotypes in the UK Biobank using different primary care EHR and clinical terminology systems. JAMIA Open 3, 545–556 (2020).
Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).
The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
The International HapMap 3 Consortium. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58 (2010).
Bezanson, J., Edelman, A., Karpinski, S. & Shah, V. B. Julia: a fresh approach to numerical computing. SIAM Rev. 59, 65–98 (2017).
Venkatesh, S. S. & Nicholson, G. The genetic architecture of changes in adiposity during adulthood. GitHub repository https://doi.org/10.5281/zenodo.11108733 (2024).


Acknowledgements

S.S.V. was supported by the Rhodes Scholarships, Clarendon Fund, and the Medical Sciences Doctoral Training Centre at the University of Oxford. K.C. was supported by the University of Leicester (College of Life Sciences) and Health Data Research UK. K.A. was supported by the Estonian Research Council’s Personal Starting Grant PSG759. L.B.L.W. was supported by the Wellcome Trust. U.V. was supported by the Estonian Research Council’s Personal Starting Grant PSG759. C.H. wishes to acknowledge support from the Alan Turing Institute, the EPSRC grant Bayes4Health, Novartis, and Novo Nordisk. C.M.L. is supported by the Li Ka Shing Foundation, NIHR Oxford Biomedical Research Centre, Oxford, NIH (1P50HD104224-01), Gates Foundation (INV-024200), and a Wellcome Trust Investigator Award (221782/Z/20/Z). G.N. acknowledges funding from the NIHR Biomedical Research Centre, Oxford (grant no. NIHR203311). This research has been conducted using the UK Biobank Resource under Application Number 11867. This research was partially supported by the Wellcome Trust Core Award Grant Number 203141/Z/16/Z with additional support from the NIHR Oxford BRC. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR, or the Department of Health. This research is partially supported by funding from the Department of Veterans Affairs Office of Research and Development, Million Veteran Program Grant I01-BX003340 and I01-BX004821. This publication does not represent the views of the Department of Veterans Affairs or the United States Government. This study was partially funded by the European Union through the European Regional Development Fund Project No. 2014-2020.4.01.15-0012 GENTRANSMED. Data analysis was carried out in part in the High-Performance Computing Centre of the University of Tartu. The activities of the EstBB are regulated by the Human Genes Research Act, which was adopted in 2000 specifically for the operations of the EstBB. 
Individual-level data analysis in the EstBB was carried out under ethical approvals of 1.1-12/1409 and 1.1-12/2161 from the Estonian Committee on Bioethics and Human Research (Estonian Ministry of Social Affairs), using data according to release application 6-7/GI/31993 from the EstBB.

Author information

Authors and Affiliations

Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford, UK
Samvida S. Venkatesh & Cecilia M. Lindgren
Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
Samvida S. Venkatesh, Habib Ganjgahi, Duncan S. Palmer, Christoffer Nellåker & Cecilia M. Lindgren
Department of Statistics, University of Oxford, Oxford, UK
Habib Ganjgahi, Chris Holmes & George Nicholson
Nuffield Department of Population Health, Medical Sciences Division, University of Oxford, Oxford, UK
Duncan S. Palmer
Department of Population Health Sciences, University of Leicester, Leicester, UK
Kayesha Coley
Department of Epidemiology, Emory University Rollins School of Public Health, Atlanta, GA, USA
Gregorio V. Linchangco Jr., Qin Hui & Yan V. Sun
Atlanta VA Health Care System, Decatur, GA, USA
Gregorio V. Linchangco Jr., Qin Hui, Peter Wilson & Yan V. Sun
Department of Medicine, Emory University School of Medicine, Atlanta, GA, USA
Peter Wilson
Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), Veterans Affairs Boston Healthcare System, Boston, MA, USA
Yuk-Lam Ho & Kelly Cho
Division of Aging, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
Kelly Cho
Institute of Psychology, Faculty of Social Sciences, University of Tartu, Tartu, Estonia
Kadri Arumäe & Uku Vainik
Novo Nordisk Research Centre Oxford, Oxford, UK
Laura B. L. Wittemans
Nuffield Department of Women’s and Reproductive Health, Medical Sciences Division, University of Oxford, Oxford, UK
Laura B. L. Wittemans, Christoffer Nellåker & Cecilia M. Lindgren
Estonian Genome Centre, Institute of Genomics, Faculty of Science and Technology, University of Tartu, Tartu, Estonia
Andres Metspalu, Lili Milani, Tõnu Esko, Reedik Mägi, Mari Nelis, Georgi Hudjashov & Uku Vainik
Department of Neurology and Neurosurgery, Faculty of Medicine and Health Sciences, McGill University, Montreal, Canada
Uku Vainik
Nuffield Department of Medicine, Medical Sciences Division, University of Oxford, Oxford, UK
Chris Holmes
The Alan Turing Institute, London, UK
Chris Holmes
Broad Institute of Harvard and MIT, Cambridge, MA, USA
Cecilia M. Lindgren

Authors

Samvida S. Venkatesh
Habib Ganjgahi
Duncan S. Palmer
Kayesha Coley
Gregorio V. Linchangco Jr.
Qin Hui
Peter Wilson
Yuk-Lam Ho
Kelly Cho
Kadri Arumäe
Laura B. L. Wittemans
Christoffer Nellåker
Uku Vainik
Yan V. Sun
Chris Holmes
Cecilia M. Lindgren
George Nicholson

Consortia

Million Veteran Program

Gregorio V. Linchangco Jr., Qin Hui, Peter Wilson, Yuk-Lam Ho & Kelly Cho

Estonian Biobank Research Team

Kadri Arumäe, Uku Vainik, Andres Metspalu, Lili Milani, Tõnu Esko, Reedik Mägi, Mari Nelis & Georgi Hudjashov

Contributions

S.S.V., G.N., and C.M.L. conceptualised the study. Data curation and formal analyses were conducted by S.S.V., Kayesha C., G.V.L., Q.H., K.A., U.V., and G.N. S.S.V., H.G., and G.N. developed methodology and software. Data collection was performed by P.W., Y.H., and Kelly C. Funding was acquired by U.V., Y.V.S., C.H., and C.M.L. C.H., G.N., and C.M.L. were responsible for supervision. S.S.V. and G.N. wrote the original draft. S.S.V., H.G., D.S.P., L.B.L.W., C.N., C.H., C.M.L., and G.N. edited the draft.

Corresponding authors

Correspondence to Samvida S. Venkatesh, Cecilia M. Lindgren or George Nicholson.

Ethics declarations

Competing interests

L.B.L.W. is currently employed by Novo Nordisk Research Centre Oxford but, while she conducted the research described in this manuscript, was only affiliated with the University of Oxford. C.H. reports grants from Novo Nordisk and Novartis; C.M.L. reports grants from Bayer AG and Novo Nordisk and has a partner who works at Vertex. The other authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Andrea Ganna, Zoltán Kutalik and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Peer Review File

Description of Additional Supplementary Files

Supplementary Data 1-26

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Venkatesh, S.S., Ganjgahi, H., Palmer, D.S. et al. Characterising the genetic architecture of changes in adiposity during adulthood using electronic health records. Nat Commun 15, 5801 (2024). https://doi.org/10.1038/s41467-024-49998-0


Received: 25 January 2023
Accepted: 25 June 2024
Published: 10 July 2024
DOI: https://doi.org/10.1038/s41467-024-49998-0
Springer Nature Limited
