Introduction

Genomic studies have established the importance of copy number variants (CNVs) in rare disease aetiology, and particularly as causal factors for neurodevelopmental disorders [1]. Despite conferring substantial risk for severe outcomes, CNVs often exhibit incomplete penetrance and wide variability in clinical manifestations [2,3,4,5], suggesting complex mechanisms for disease liability.

One of the most common genomic rearrangements is a recurrent hemizygous deletion on chromosome 22q11.2. In 90% of cases, the deletion occurs de novo via non-allelic homologous recombination of low copy repeats [6]. In addition to being associated with characteristic physical features, cognitive deficits, heart problems, and neuropsychiatric symptoms [6,7,8,9], it is one of the most common aetiologies for schizophrenia (SCZ), with a penetrance of 25–40% [10, 11]. Such incomplete penetrance could be associated with environmental and/or genetic factors, as proposed for neurodevelopmental manifestations linked to 16p11.2 rearrangements [12]. Multiple lines of evidence support this “multiple hit” model, whereby secondary hits (i.e., modifying genetic factors) in addition to the threshold-lowering first hit (i.e., 22q11.2 deletion) modulate the clinical outcomes [13, 14]. While in rare cases the potential modifier effects of the 22q11.2 deletion syndrome (22q11.2DS) were attributed to variable deletion size [15], hemizygosity of single nucleotide variants (SNVs) on the intact allele [16, 17], and additional rare CNVs [18], an increasing body of evidence suggests that common allelic variation of SNVs pertaining to SCZ biology could explain variability in neuropsychiatric symptoms of 22q11.2DS [16, 19], specifically psychosis development and cognitive decline [20, 21].

Trait-associated SNV effects pooled into a single number, namely a polygenic risk score (PRS), can capture a meaningful proportion of phenotypic variance (e.g., 7.7% for SCZ phenotype [22]) and thus facilitate the estimation of genetic liability for a trait of interest as well as for endophenotypes and biologically overlapping outcomes. SCZ PRS has been associated with prodromal motor deficits [23], cognitive ability [24, 25], and disorganized symptoms in the general population [25], negative symptoms and anxiety in adolescence [26], and greater illness severity and worse cognition within a psychosis cohort [27]. SCZ PRS was also linked with decreased total brain volume and cortical thickness [28, 29], reduced neurite density index, especially in the thalamus, basal ganglia, and hippocampus [30], thinner frontotemporal cortices and a smaller hippocampal subfield volume [31], as well as with impaired mnemonic hippocampal activity [47]. Cortical thickness was computed as the shortest distance between the white and the pial cortical surfaces [48, 49] and surface area was measured at the grey/white matter boundary. Average measures of cortical thickness and surface area were extracted from 68 regions based on the Desikan parcellation [50]. An automated segmentation technique published with FreeSurfer v6.0 [51] was employed to obtain the volume of the whole hippocampus and seven relevant subfields, including CA1, CA2/3, CA4, GC-DG, ML, tail, and subiculum. All the obtained images were visually inspected and excluded from downstream analysis if the quality of the segmentation was sub-optimal, as explained in detail in Mancini et al. [52].

Genotyping

One hundred and twenty-two individuals whose DNA samples were available within the Swiss 22q11.2DS cohort were subjected to whole-genome genotyping with the Illumina Global Diversity Array v1. Quality control was carried out with PLINK v2.0 [53] (webpage: https://www.cog-genomics.org/plink/2.0/) using the following criteria: (i) exclusion of individuals with genotype call rate <95%; (ii) exclusion of single nucleotide variants (SNVs) with call rate <95%, Hardy-Weinberg equilibrium (HWE) p < 1e-4, minor allele frequency (MAF) < 0.01, and with A/T or G/C alleles to avoid strand issues; (iii) removal of outliers who deviated ± 3 standard deviations from the samples’ heterozygosity rate mean, and (iv) verification that the data did not contain closely related individuals (PI_HAT > 0.2) and that phenotype and genotype sex matched. Of first-degree relatives, one member of each related pair was excluded, preferentially retaining samples that had more complete phenotype data. Deletion carrier status was confirmed with bcftools cnv calling plugin (https://samtools.github.io/bcftools/howtos/cnv-calling.html) [54]. The 1000 Genomes Project data [55] were used as reference to exclude samples that showed an ancestral background other than European based on principal component analysis (PCA) (Supplementary Fig. 3). Genetic principal components were calculated with QTLtools pca mode using variant sites separated by 5000 base pairs [56] (webpage: https://qtltools.github.io/qtltools/). Haplotype Reference Consortium reference panel [57] (webpage: http://www.haplotype-reference-consortium.org/) was used for array imputation with the following parameters: build hg19, reference panel apps@hrc-r1.1, population European, phasing eagle. After imputation, SNVs with low imputation quality score R² < 0.3, HWE p < 1e-6 and MAF < 0.05 were filtered out. The final quality controlled SNV set contained 6,462,855 biallelic SNVs for 103 individuals.
Six individuals were further excluded as no phenotype data was available either due to their young age for completing SIPS or due to sub-optimal MRI data, thus reducing the sample set to 97 patients.
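
The pre-imputation filters listed under criteria (ii) and (iii) can be illustrated with a minimal sketch. It is not the PLINK workflow used in the study; the function names and the input layout (per-variant summary values and a dictionary of per-sample heterozygosity rates) are assumptions made for illustration, while the thresholds are those quoted above.

```python
import statistics

# Strand-ambiguous allele pairs (A/T and G/C), removed to avoid strand issues.
AMBIGUOUS = {frozenset("AT"), frozenset("GC")}


def variant_passes_qc(call_rate, hwe_p, maf, ref, alt):
    """Hypothetical pre-imputation variant filter using the quoted thresholds."""
    if call_rate < 0.95:      # SNV call rate below 95%
        return False
    if hwe_p < 1e-4:          # deviation from Hardy-Weinberg equilibrium
        return False
    if maf < 0.01:            # minor allele frequency below 1%
        return False
    if frozenset((ref, alt)) in AMBIGUOUS:
        return False
    return True


def heterozygosity_outliers(het_rates, n_sd=3.0):
    """Return sample IDs whose heterozygosity rate lies more than n_sd
    sample standard deviations from the cohort mean."""
    values = list(het_rates.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return {s for s, h in het_rates.items() if abs(h - mean) > n_sd * sd}
```

Relatedness (PI_HAT), sex checks and ancestry PCA are deliberately left out of the sketch, since they depend on genome-wide computations that dedicated tools handle.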

Derivation of the polygenic risk score for schizophrenia (SCZ PRS)

For constructing and identifying the SCZ PRS with the best predictive performance, we used the summary statistics from the SCZ genome-wide association analysis (GWAS) wave3 by the Psychiatric Genomics Consortium conducted primarily on samples of European ancestry [22], phenotype and genotype data collected within the Estonian Biobank (EstBB) [58] and the LDpred algorithm [59].

EstBB is a population-based biobank in Northern Europe, comprising 201,146 individuals aged ≥18 years. All biobank participants have signed a broad informed consent form, which allows continuous updating of epidemiologic data through periodical linking to national electronic repositories (hospital databases, national registries), and recontacting of participants. Medical history and health status are recorded according to the International Classification of Diseases, Tenth Revision (ICD-10 codes) [58]. EstBB participants have been genotyped using Illumina Global Screening Arrays with quality control conducted according to best practices (exclusion of individuals with call rate <95%, mismatch of genotype and phenotype sex, exclusion of SNVs with call rate <95%, HWE p < 1e-4, MAF < 1%). Pre-phasing was carried out with Eagle v2.3 [60] and imputation with Beagle v5 (28Sep18.79)8 [61] using the population specific imputation reference panel built from 2297 whole genome sequencing samples [62].

Genome-wide SCZ PRSs were constructed with LDpred, a Bayesian approach that applies a continuous shrinkage model to modify effect sizes based on the strength of each variant’s association in the GWAS and the underlying linkage disequilibrium (LD) structure [59]. We started with 7,585,078 SNVs for which summary-statistics-level data from the SCZ GWAS wave3 were available [22] (https://www.med.unc.edu/pgc/download-results). The EstBB SNV content was (i) filtered for the quality controlled SNV content captured in the Swiss 22q11.2DS genotype data to ensure a uniform set of SNVs in both datasets (resulting in 5,459,498 SNVs), (ii) filtered for the quality controlled SNV content (MAF > 0.01 and imputation quality score >0.8) in EstBB data (resulting in 5,235,126 SNVs), and (iii) clumped for maximum LD between SNVs to reduce multicollinearity (r² = 0.99; resulting in 2,473,370 SNVs). Ten different SCZ PRSs were derived by varying the fraction of causal SNVs (infinitesimal, p ≤ 1, p ≤ 0.3, p ≤ 0.1, p ≤ 0.03, p ≤ 0.01, p ≤ 0.003, p ≤ 0.001, p ≤ 0.0003, and p ≤ 0.0001) and using the EstBB LD reference panel to account for LD between SNVs.
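
To give an intuition for how an infinitesimal prior shrinks GWAS effect sizes, the sketch below covers only the simplest special case, in which markers are assumed to be unlinked (no LD). LDpred itself additionally uses an LD reference panel, so this is not the LDpred implementation. The function name and all input values (number of markers, GWAS sample size, heritability) are hypothetical.

```python
def infinitesimal_shrinkage(beta_marginal, n_markers, n_samples, h2):
    """Posterior mean effects under an infinitesimal prior, assuming
    unlinked markers: each marginal effect is multiplied by
    h2 / (h2 + M / N), where M is the marker count and N the sample size."""
    factor = h2 / (h2 + n_markers / n_samples)
    return [b * factor for b in beta_marginal]


# Illustrative values only: the larger M / N is relative to h2,
# the more strongly every effect is pulled towards zero.
weights = infinitesimal_shrinkage([0.10, -0.20], n_markers=1000,
                                  n_samples=1000, h2=0.5)
```

Under this assumption all variants keep a non-zero weight, which is what distinguishes the infinitesimal model from the nine settings that restrict the fraction of causal SNVs.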

For testing and validating the SCZ PRSs in EstBB, we excluded EstBB participants whose data were included in the SCZ GWAS wave3, one member of each related pair (PI_HAT > 0.2) and individuals with non-European ancestry in reference to 1000 Genomes Project samples [55]. SCZ cases were defined using two sub-group criteria based on ICD-10 codes in electronic health records: (i) relaxed “Schizophrenia Spectrum Disorder” diagnosis (ICD-10 F2* “Schizophrenia, schizotypal, delusional, and other non-mood psychotic disorders” category; resulted in 1,356 SCZ cases), and (ii) strictly “Schizophrenia” diagnosis (ICD-10 code F20.* “Schizophrenia” category; resulted in 572 SCZ cases). Based on consultation with practising Estonian psychiatrists to establish the definition of SCZ diagnosis using ICD-10 codes reported in the national healthcare system, we opted for testing the SCZ PRSs using two SCZ definition groups to account for the following factors: (1) loss of power due to volunteer-based recruitment resulting in a low number of strictly SCZ cases (i.e., considering ICD-10 F20.*), (2) possible increase in noise when relaxing SCZ diagnosis criteria (i.e., considering ICD-10 F2*). We considered SCZ cases with at least one report of an ICD-10 code for Schizophrenia Spectrum Disorder/Schizophrenia given by a psychiatrist or a neurologist and excluded individuals carrying SCZ diagnosis as a comorbid condition only or diagnosed by a non-specialist. EstBB participants without ICD-10 F2* were considered as controls (n = 108,201). Individuals with mania (ICD-10 F30.* “Manic episodes” category) and bipolar disorder (F31.* “Bipolar disorder” category) were excluded from all sets given the considerable genetic overlap between these psychiatric disorders and SCZ [63] (further information in the Supplementary Note).
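
The code-based part of these case and control definitions can be sketched as a simple prefix rule. This is an illustration only: the function name and return labels are hypothetical, and the additional requirements described above (diagnosis given by a psychiatrist or neurologist, not a comorbid-only diagnosis) are not modelled.

```python
def classify_participant(icd10_codes):
    """Assign a participant to a group from a list of ICD-10 codes.

    Returns "excluded", "scz_strict", "scz_spectrum" or "control".
    Participants labelled "scz_strict" (F20.*) also belong to the
    broader F2* spectrum group.
    """
    codes = [c.upper().replace(".", "") for c in icd10_codes]
    if any(c.startswith(("F30", "F31")) for c in codes):
        return "excluded"        # mania or bipolar disorder
    if any(c.startswith("F20") for c in codes):
        return "scz_strict"      # strictly Schizophrenia
    if any(c.startswith("F2") for c in codes):
        return "scz_spectrum"    # other F2* diagnoses
    return "control"
```

Checking the exclusion codes first mirrors the statement that individuals with mania or bipolar disorder were removed from all sets, cases and controls alike.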

Next, two-thirds of the EstBB cohort were allocated into a testing set (71,412 controls; 894 SCZ cases with F2* diagnosis, and 377 SCZ cases with F20* diagnosis) and one-third into a validation set (36,789 controls; 462 SCZ cases with F2* diagnosis and 195 SCZ cases with F20* diagnosis) for identifying and validating the best performing PRS, respectively (an overview of the characteristics of the testing and validation sets is given in Supplementary Table 1). All ten SCZ PRSs retrieved with the LDpred method were computed for all individuals with STEROID v0.1.1 (https://genomics.ut.ee/en/tools) by multiplying the genotype dosage of each risk allele for each SNV by its respective weight and then summing across all SNVs into a score. For determining the best predicting PRS, we considered ten standardized SCZ PRSs separately and used a logistic model with diagnosis status (SCZ case or control) as a dependent variable and sex, age, and five genotype PCs as covariates. The model with the highest odds ratio was selected for replication in the validation set. The score with the best discriminative capacity in the validation set was additionally assessed based on maximal area under the receiver operating characteristic curve (AUC) for considered logistic regression models using R/pROC package [64] and using R/survival package [65] (the latter was used to account for age effect using left truncation and right censoring). Individual level data analysis was carried out under ethical approval 1.112/624 from the Estonian Committee on Bioethics and Human Research (Estonian Ministry of Social Affairs) and data release N05 from EstBB.
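
The score calculation and the discrimination measure described above can be sketched in a few lines. This is not the STEROID or pROC code: the function names are hypothetical, the AUC is computed directly on the scores rather than on fitted values from a covariate-adjusted logistic model, and the pairwise loop is only practical for small inputs.

```python
import statistics


def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0 to 2) across SNVs for one person."""
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(scores):
    """Centre scores to mean 0 and scale them to standard deviation 1."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [(s - mean) / sd for s in scores]


def auc(case_scores, control_scores):
    """Probability that a randomly chosen case scores higher than a randomly
    chosen control, counting ties as one half. This equals the area under
    the ROC curve for the raw score."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

An AUC of 0.5 corresponds to no discrimination between cases and controls, and 1.0 to perfect separation.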

Association testing in the 22q11.2DS cohort

The SNVs and their adjusted weights of the best performing SCZ PRS (i.e., infinitesimal SCZ PRS model) were used for calculating the SCZ PRS for 22q11.2DS patients with STEROID v0.1.1 (https://genomics.ut.ee/en/tools) and standardized such that it followed a normal distribution with mean 0 and standard deviation 1 (Supplementary Fig. 6a).

To test for an association between 19 SIPS variables (ordered factor dependent variables) and SCZ PRS, we used ordinal logistic regression implemented in R/MASS package (polr function) [66] for cross-sectional analysis and random-effects ordinal regression implemented in R/ordinal package (clmm function) [67] for longitudinal analysis. In the latter approach, each participant with SIPS variable data captured at multiple timepoints was modelled as a random effect. Age, sex and first three genotype PCs were accounted for as covariates in cross-sectional analysis, while age² was added in longitudinal analysis. SIPS data acquired at the timepoint in which the age was the closest to the median age of the 22q11.2DS cohort (median 16.43, mean 17.30, SD 4.91) were considered in cross-sectional analysis. Violation of the proportional odds assumption was tested with the Brant test (R/brant package [68]), which assesses whether the observed deviations from the ordinal logistic regression model are larger than what could be attributed to chance alone. The probabilities for each model with SIPS variables are given in Supplementary Table 2. No evidence for violating the proportional odds assumption was found (p > 0.05). To correct for multiple testing, false discovery rate (FDR) and Bonferroni correction were applied for cross-sectional and longitudinal analysis, respectively, accounting for 19 tests (R/qvalue package [69]). Given the small sample size, we additionally applied bootstrapping for each longitudinally tested model and carried out 1000 runs using sampling with replacement. Next, we considered the mean of p-values across bootstrapping runs for each item and ranked the models by the proportion (%) of the 1000 bootstrapping runs in which the model was significant at a nominal p-value of <0.05. Items that surpassed Bonferroni correction and that were supported by bootstrapping were deemed significant.
SIPS variables were available for 88 individuals from 213 timepoints. To test whether SCZ PRS was correlated with positive or negative symptoms at different ages, we divided the cohort into two subsets using 18 years as the cut-off and carried out association testing cross-sectionally and longitudinally. A “positive symptoms” variable and a “negative symptoms” variable were derived by pooling values across respective category items. In cross-sectional analysis, the mean age of the younger sub-group (<18 years, n = 54) was 14.51 with median 14.67 and SD 2.16; and the mean age of the older sub-group (≥18 years, n = 34) was 21.72 with median 20.39 and SD 4.81. In longitudinal analysis, data of 76 individuals from 111 timepoints and 49 individuals from 102 timepoints were available for the younger and for the older sub-group, respectively.
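
The two multiple-testing corrections mentioned above can be illustrated as follows. Note the substitution: the study used the R/qvalue package, whereas this sketch implements the Benjamini-Hochberg procedure as a simpler, widely used FDR method, so the adjusted values are not expected to match q-values exactly. Function names are hypothetical.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

With 19 tests, a Bonferroni correction at the 5% level is equivalent to requiring a raw p-value below 0.05/19, roughly 0.0026, which is why it is the stricter of the two criteria.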

Linear regression was used to test for an association between SCZ PRS and IQ and MRI variables cross-sectionally using data from the timepoint in which the age was the closest to the median age of the cohort (median 16.43, mean 17.30, SD 4.91). SCZ PRS was regressed on age, IQ test type/MRI scanner, and first three genotype PCs. Next, we used longitudinal data for associating cognitive function and brain imaging variables captured at multiple timepoints with SCZ PRS. To this end, we used linear mixed modelling implemented in R/lme4 package (lmer function) [70] to account for within-subject correlations by including a random intercept for each subject and considered age, age², IQ test type/MRI scanner, and first three genotype PCs as covariates. For cognition, we first tested full scale IQ independently and then conducted a sub-analysis by considering verbal IQ and performance IQ measurements. For hippocampus, we carried out a secondary, region of interest analysis and considered fourteen volumetric hippocampal subfield variables. FDR correction [69] was applied for multiple testing. IQ measurements and brain imaging variables were available for 93 individuals from 212 timepoints, and 93 individuals from 207 timepoints, respectively, and were standardized such that these followed a normal distribution with mean 0 and standard deviation 1.
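
One way to write out the longitudinal random-intercept model is given below. This is a sketch under the assumption that the standardized phenotype is the outcome and SCZ PRS enters as a fixed effect alongside the listed covariates; the symbols are chosen here for illustration and do not come from the original analysis code.

```latex
y_{ij} = \beta_0 + \beta_{\mathrm{PRS}}\,\mathrm{PRS}_i
       + \beta_1\,\mathrm{age}_{ij} + \beta_2\,\mathrm{age}_{ij}^{2}
       + \boldsymbol{\gamma}^{\top}\mathbf{c}_{ij} + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^{2}),
\quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2})
```

Here y_ij is the standardized IQ or MRI measurement of subject i at visit j, c_ij collects the IQ test type or MRI scanner indicator and the first three genotype PCs, u_i is the subject-specific random intercept that absorbs within-subject correlation, and β_PRS is the change in the phenotype (in SD units) per one SD increase in SCZ PRS.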

To account for the five individuals with smaller 1.5 Mb deletion, we conducted a sensitivity analysis for all neuropsychiatric phenotypes considering 3 Mb deletion carriers only. While the test statistics show attenuation due to reduced power, these followed the same trend as in the main analyses (Supplementary Table 6).

Statistical analyses were conducted with R software version 3.6.2 [71].

Results

Swiss 22q11.2DS longitudinal cohort

Ninety-seven genotyped individuals (49 females) aged from 6 to 44 years (mean = 17.67, SD = 6.30) with a molecularly confirmed diagnosis of 22q11.2DS were included in the present study. Each participant was phenotypically assessed at an average of 2.2 timepoints (range = 1–5). Mean age at first visit was 15 years (SD = 6.66) and mean time interval between visits was 3.8 years (SD = 1.07; Table 1, Supplementary Fig. 2).

Identification of the best performing polygenic risk score for schizophrenia

Using LDpred, we constructed ten candidate SCZ PRSs using summary statistics from the SCZ GWAS wave3 [22] and tested and validated their predictive performance in EstBB comprising 201,146 individuals of European ancestry [58]. Using a testing set of 894 Schizophrenia Spectrum Disorder cases and 71,412 controls (Supplementary Table 1), we showed that the infinitesimal model, i.e., all genetic variants deemed causal for SCZ, showed the strongest effect in discriminating SCZ cases from control subjects (Fig. 1a, b; Supplementary Fig. 4a). One SD difference in SCZ PRS corresponded to an odds ratio (OR) of 1.73 (95% confidence interval (CI) 1.57–1.90, P = 1.47 × 10⁻²⁹). These results were in concordance with estimates when considering a lower number of SCZ cases determined with stricter SCZ diagnostic criteria (Fig. 1c, d; Supplementary Fig. 4b; Supplementary Table 1). The prediction accuracy for the infinitesimal model was additionally assessed using maximal area under the receiver operating characteristic curve (AUC). For the model containing covariates only (sex, age, five population structure PCs), the AUC was 0.653. Adding SCZ PRS to the model increased the AUC to 0.703, an increase of 0.05 (5 percentage points) (Supplementary Fig. 5). As age was the main predictor, we additionally determined the discrimination capacity of SCZ PRS between SCZ cases and controls at the same age. Harrell’s C statistic of the model with age as timescale and without SCZ PRS in the model was 0.58 (95% CI 0.51–0.64) and 0.68 (95% CI 0.54–0.81) when considering Schizophrenia Spectrum Disorder and strictly Schizophrenia cases, respectively, and with SCZ PRS in the model increased to 0.64 (95% CI 0.58–0.70) and to 0.77 (95% CI 0.68–0.85) using the respective SCZ diagnostic criteria groups. These results agree with prior findings underscoring high polygenicity for SCZ [22, 72] as well as with AUC estimates determined for SCZ PRS [73, 74].
The SNVs and their adjusted weights of the infinitesimal SCZ PRS model were used for calculating SCZ PRS for 22q11.2DS patients. No discordance in the distributions of SCZ PRS values between EstBB and Swiss 22q11.2DS samples was identified in agreement with previous data [21] (Supplementary Fig. 6b).
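
As a back-of-the-envelope consistency check on the reported odds ratio, the log-odds, its standard error and an approximate Wald p-value can be recovered from the OR and its 95% CI. The sketch assumes a symmetric normal-approximation interval on the log-odds scale; the function name is hypothetical, and because the published OR and CI are rounded, only the order of magnitude of the p-value can be expected to agree.

```python
import math


def wald_from_or(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Recover the log-odds, standard error, z statistic and two-sided
    p-value from an odds ratio and its 95% confidence interval."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return beta, se, z, p


beta, se, z, p = wald_from_or(1.73, 1.57, 1.90)
print(f"beta={beta:.3f} se={se:.3f} z={z:.2f} p={p:.1e}")
```

The recovered p-value falls on the order of 10⁻²⁹, in line with the value reported for the infinitesimal model in the testing set.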

Fig. 1: Predictive ability of SCZ PRS in EstBB.

Odds ratios and 95% confidence intervals for ten SCZ PRSs in the testing set (a, c) and boxplots of the best performing SCZ PRS (infinitesimal model) in SCZ cases and controls (CTL) in the validation set (b, d). Schizophrenia Spectrum Disorder diagnosis and strictly Schizophrenia diagnosis were used for determining SCZ cases in the upper (a, b) and lower panels (c, d), respectively.

Polygenic burden for schizophrenia and phenotypic variance of 22q11.2DS

We first set out to determine whether the severity of clinical symptoms associated with psychosis can be explained by SCZ genetic load among 22q11.2 deletion carriers. To this end, we correlated SCZ PRS with 19 SIPS-derived items categorized into positive, negative, disorganized, and general symptoms. Cross-sectional analysis revealed that only “impaired tolerance to normal stress” was associated with SCZ PRS at FDR 5%, indicating that for one SD increase in SCZ PRS, the odds of scoring higher on the stress intolerance item doubled (OR 2.03, 95% CI 1.34–3.13, P = 0.001, Fig. 2a). When relaxing the FDR threshold to 10%, “social anhedonia” (OR 1.61, 95% CI 1.08–2.43, P = 0.02) and “ideational richness” (OR 1.69, 95% CI 1.14–2.54, P = 0.01) within negative symptoms, and “dysphoric mood” (OR 1.75, 95% CI 1.16–2.69, P = 0.009) within general symptoms, but none of the items within the positive symptoms category, showed a significant association with SCZ PRS (Fig. 2a; Supplementary Table 2, Supplementary Fig. 7). The distribution of SCZ PRS did not differ between psychosis positive and psychosis negative patients (P = 0.76, Fig. 2c).

Fig. 2: SCZ PRS association with SIPS variables.

Overview of (a) cross-sectional and (b) longitudinal analyses results for SCZ PRS and SIPS variables with colour darkness indicating association strength after multiple testing correction: NS, not significant (white); FDR 5% (light blue); and Bonferroni/bootstrapped, associations that surpassed Bonferroni correction and were supported by bootstrapping (dark blue). c Boxplot of SCZ PRS values for psychosis positive vs psychosis negative deletion carriers. d Distributions of score values for four SIPS variables displaying the strongest association with SCZ PRS (i.e., from left to right “disorganized communication” within the positive symptoms category, “social anhedonia” and “occupational functioning” within the negative symptoms category, and “impaired tolerance to normal stress” within the general symptoms category) over age and coloured by increasing SCZ PRS quintiles (dark blue, light blue, grey, orange, and red). Each dot represents a score determined at a given timepoint (visit) connected with a straight line for each 22q11.2DS patient.

To extend the findings of the cross-sectional analysis, we next investigated whether the 22q11.2DS patients with higher genomic burden for SCZ displayed steeper longitudinal increase/reduction on any symptomatic scale over time. To rule out false-positive associations due to small sample size, we used Bonferroni correction as well as bootstrapping validation. Random-effects ordinal regression modelling revealed that one SD increase in SCZ PRS corresponded on average to significantly greater odds of scoring higher on “disorganized communication” (OR 2.37, 95% CI 1.41–3.99) within positive symptoms, “social anhedonia” (OR 2.09, 95% CI 1.42–3.07), and “occupational functioning” (OR 1.82, 95% CI 1.32–2.51) within negative symptoms, “impairment in personal hygiene” (OR 1.82, 95% CI 1.29–2.56) within disorganized symptoms, and “dysphoric mood” (OR 2.0, 95% CI 1.28–3.11) and “impaired tolerance to normal stress” (OR 1.76, 95% CI 1.31–2.36) within general symptoms across time (Table 2, Fig. 2b, d; Supplementary Table 2, Supplementary Figs. 8, and 9). Whereas a sensitivity analysis did not allow us to show robustly that SCZ PRS was correlated with negative and positive symptoms at different ages, we found in our longitudinal analysis with the younger sub-group (<18 years) that “disorganized communication” of positive symptoms showed stronger association with SCZ PRS, surviving Bonferroni correction (OR 2.95, 95% CI 1.43–6.30, P = 0.003), than “avolition” of negative symptoms that only survived FDR 10% correction (OR 1.55, 95% CI 1.05–2.29, P = 0.03; Supplementary Table 3, Supplementary Figs. 10 and 11). Altogether, our results suggest that 22q11.2DS patients with higher genetic liability to SCZ are specifically predisposed to a worsening negative and a general symptoms course.

Table 2 Longitudinal association analyses.

We next interrogated whether higher genetic burden for SCZ predisposes 22q11.2DS patients to a worsening in the trajectory of cognitive abilities. While none of the IQ variables reached statistical significance threshold in cross-sectional analysis (Supplementary Table 4), mixed linear modelling using longitudinal FSIQ measurements revealed a significant association between increasing SCZ PRS and cognitive decline (β = –0.25, standard error (SE) 0.11, P = 0.02, Table 2, Fig. 3a; Supplementary Table 4). It was driven by more severe decline in verbal capabilities (VIQ, β = –0.25, SE 0.11, P = 0.02), rather than underperformance in visuospatial intellectual abilities (PIQ, β = –0.19, SE 0.1, P = 0.08; Table 2; Supplementary Table 4, Supplementary Fig. 12) with one SD increase in PRS predicting a 3-point lower VIQ level on average.

Fig. 3: SCZ PRS association with cognition and brain imaging variables.

Distribution of (a) FSIQ measurements and (b–f) volumetric MRI measurements (total cortical grey matter, right and left hippocampus, right CA1 and left tail) across time for 22q11.2DS patients. Each dot denotes a measurement determined at given timepoint (visit) connected by a straight line for each 22q11.2DS patient. The subjects are coloured based on their clustering on SCZ PRS distribution. The blue and red denote the lowest and the highest SCZ PRS quintile, respectively, with grey marking joint three middle quintiles.

Lastly and given previous findings linking SCZ PRS with cortical and hippocampal features in the general population [30,31,20, 21], yet corroborating results obtained in the general population [25,26,27]. It was hypothesized that the genetic liability for SCZ might more strongly index molecular pathways manifesting as negative and general symptoms which in essence can reflect broad and heterogeneous clinical outcomes, and only weakly affect mechanisms that result in positive symptoms such as hallucinations and delusions [26]. Additionally, the diminished gene dosage resulting from the 22q11.2 deletion per se might account for the development of positive symptoms through mechanisms not captured by PRS [76]. However, and not in contradiction, given that the sample sets assessed in previous 22q11.2DS studies were considerably older [20, 21], and that the longitudinal analysis for symptoms course in the current study did indicate a positive association between SCZ PRS and delusional and persecutory ideas at more relaxed multiple test correction (Table 2; Supplementary Table 2), it is possible that patients at higher polygenic risk are yet to develop psychosis to its full extent. While our study, with its small sample size and age range, does not properly allow us to assess whether SCZ PRS correlates with different symptom dimensions at different ages, our preliminary results warrant further investigation.

Secondly, as expected, the polygenic burden for SCZ amplified cognitive decline among 22q11.2DS patients. This result recapitulates the negative genetic correlation between cognition and SCZ [77,78,79] and replicates the previous report for 22q11.2DS patients [20]. It remains to be investigated whether the 22q11.2DS patients at increased genetic risk for SCZ and with lower cognitive levels exhibit more severe psychosis transitions compared to those with low SCZ PRS, and whether the stronger association with verbal IQ results from common variant burden functioning through domains affecting verbal rather than visuospatial abilities. In support of this hypothesis, 22q11.2 patients with psychotic symptoms did show an earlier decline specifically in VIQ [7]. Nevertheless, given that the higher levels of negative symptoms combined with the lower levels of cognition precede psychosis development [25, 80,81,82] and that the effect of the polygenic burden on SCZ could be partially mediated through cognition-relevant pathways [24], our results support the neurodevelopmental continuum model for psychosis [83]. These results also indicate that the assessment of SCZ polygenic burden could provide valuable information for prognosis, patient monitoring and treatment allocation for 22q11.2DS patients.

Thirdly, the association between SCZ PRS and bilateral hippocampal volume reduction indicates that the reduced hippocampal volume present in 22q11.2 patients [52, 84,85,86] is further aggravated by SCZ genome-wide burden. Prior estimates displaying a genetic overlap between idiopathic SCZ and hippocampal volume [87, 88] support the hypothesis that deviations from the normal hippocampal developmental trajectory could be a genetically-mediated intermediate phenotype for SCZ risk [22] as well as applied a PRS calculation method shown to outperform methods used in previous studies [20, 21, 25], thereby potentially resulting in more accurate downstream assessment with trait associated variables [90]. By using an external ancestrally matched cohort for deriving and validating the best performing SCZ PRS, we recapitulated prior assessments substantiating that SCZ is highly polygenic with genetic effects diluted across the whole genome [22, 72, 73]. We acknowledge that a Swiss population-specific dataset would have allowed us to derive the optimal SCZ PRS for association testing, but such data are unavailable. To minimize any bias stemming from sub-population stratification, we limited SCZ PRS calculation in the EstBB and in the Swiss 22q11.2DS cohort to a strictly common set of SNVs and used only samples of European ancestry that match the genetic background of samples used in SCZ GWAS [22]. No discordance in SCZ PRS value distributions was identified between the two datasets (Supplementary Fig. 6b). Furthermore, given that transferring a PRS to a different population but with the same ancestral background results in underestimation rather than in overestimation of risk prediction [91], the associations identified in the current study could be considered as conservative estimates.
Lastly, given the small sample size and multiple testing burden, we could not reasonably perform a discovery analysis to identify brain regions most significantly impacted by SCZ polygenic burden but had to restrict ourselves to a candidate approach. Still, our results for all cortical and sub-cortical volume, surface area and thickness measurements according to the Desikan Killiany atlas indicate that hippocampus exhibits the strongest signal and is in line with previous reports [30, 31, 84, 89] (Supplementary Table 7).

In conclusion, our findings support the notion that the phenotypic expression resulting from a large-effect genetic variant is modified by secondary lower-effect SNVs. We demonstrate here that a higher polygenic burden for SCZ is associated with a worsened symptoms course, cognitive decline, and hippocampal volume reduction in 22q11.2 deletion carriers. These results substantiate that a genome-wide integrative analysis of allelic variation across the entire frequency spectrum is required to fully comprehend the genetic architecture and phenotypic variability of developmental disorders caused by a high-effect genetic variant [12, 19, 92,93,94]. Whether large-effect variants and polygenic burden act independently and additively or operate epistatically warrants investigation.