Background

By March 2022, most children and young people (CYP) in the United Kingdom (UK) appeared to have been exposed to SARS-CoV-2, with antibodies found in 82% and 99% of primary and secondary school aged pupils, respectively [1]. Given the scale of infection, a substantial number could develop symptoms of Long Covid (also referred to as Post Covid Condition). Long Covid in CYP can be defined as the presence of one or more impairing, persisting, physical symptom(s) lasting 12 or more weeks after initial SARS-CoV-2 infection that may fluctuate or relapse, either continuing or developing post-infection [2]. Hence, it is important to study Long Covid, particularly given its potential impact on healthcare systems and the need for planning.

Systematic reviews demonstrate that common symptoms of Long Covid in CYP at 3 months post-testing/infection include fatigue, insomnia, loss of smell, and headaches [3]. The Long Covid (CLoCk) study is the largest matched cohort study of Long Covid in CYP in the world [4]. Based in England, CLoCk collected data over a two-year period on over 30,000 CYP who tested positive or negative between September 2020 and March 2021. CLoCk followed 6,804 CYP 3 months after a SARS-CoV-2 PCR-test and found that over half of CYP testing negative and 67% of those testing positive reported at least one symptom 3 months post-testing [5]. The most common symptoms amongst test positives were tiredness (39%), headache (23%) and shortness of breath (23%), with test negatives reporting mainly tiredness (24%) and headache (14%). Results from this, and all other studies, need to be assessed against their methodological limitations, two of which are considered here. First, response rates to study invitation are generally low; for example, the response rate at the 3-months post-testing sweep of the CLoCk study was 13.4% [5]. Similarly, the UK Office for National Statistics’ [6] COVID-19 infection survey had a response rate of 12%. Second, all longitudinal studies suffer from attrition over time [7], which is typically more pronounced in studies with longer follow-up periods [8].

If non-response and attrition over time systematically differ by sub-groups in the envisioned population, findings could be biased and attempts to generalise findings to the wider population limited [9, 10, 11]. For example, those with particular characteristics (e.g., older individuals, females and those from specific ethnic groups) are more likely to respond positively to a study invitation [12]. Reasons for attrition over time include study withdrawal, individuals becoming uncontactable [e.g., due to change in contact details; 13] or lacking motivation to continue participating. Indeed, both initial non-respondents and those lost to follow-up are often socioeconomically disadvantaged and less healthy [14]. With studies on Long Covid, particularly those comparing test-positives to test-negatives, an additional source of bias could exist. For example, within the CLoCk study, to isolate the effect of Long Covid from that of living through a pandemic, researchers originally excluded from the analytic sample those (re)infected, that is, test-negatives who subsequently tested positive and test-positive CYP who were subsequently reinfected [15]. This criterion yields a cohort of CYP who, as per the data available, appear to have either (i) always tested negative, or (ii) tested positive only once. However, these CYP may not be representative of the larger population of CYP in England. One well-established method to assess the impact of potential bias due to non-response, attrition and sample selection is weighting, that is, emphasising the contribution of some individuals over others in an analysis to reconstruct the target population and/or general population [9]. Such weighting methodology is appropriate when data are missing (due to non-response, attrition, and sample selection) at random [16], that is, the missingness is dependent on fully observed characteristics such as sex, age, socioeconomic disadvantage and health status. Yet, this powerful statistical technique to address potential selection biases has been underutilised in epidemiological research [9].

In this manuscript we construct weights for the CLoCk study [17] and, as an illustrative example, apply them to published findings showing the overall prevalence of shortness of breath and tiredness increases in CYP from baseline (i.e., at the time of their index PCR test) to 12-months post-baseline [15]. Specifically, to assess the robustness of conclusions drawn from CLoCk data about Long Covid’s symptomatology and trajectory in CYP, the present study aims to (i) create weights for the CLoCk study at its data collection sweeps 3-, 6- and 12-months post-index PCR-test, and (ii) apply developed weights to the analysis of shortness of breath and tiredness over a 12-month period to determine whether accounting for any biases in response, attrition or (re)infection affects published results.

Methods

The CLoCk study identified 219,175 CYP (91,014 SARS-CoV-2 Positive and 128,161 SARS-CoV-2 Negative) who had a SARS-CoV-2 PCR-test between September 2020 and March 2021 through the UK Health Security Agency’s (UKHSA) database containing the outcomes of all such tests. At study invitation, test-positives were matched to test-negatives on age, sex, region of residence and month of test. Consenting SARS-CoV-2 Positive and Negative CYP complete a questionnaire about their mental and physical health 3-, 6-, 12- and 24-months post-index PCR-test [4]. Of note, the sweeps of data collection depend on the CYP’s month of test, with 3-, 6-, 12-, and 24-month data available for some (tested in January-March 2021), while for others only 6-, 12-, and 24-month (tested in October-December 2020), or 12- and 24-month (tested in September 2020 and an additional cohort from December 2020) data were collected. This manuscript is based on all data collected for the 3-, 6-, and 12-month timepoints. The analytic samples for previous CLoCk publications [5, 15, 18] were such that: (i) CYP must have responded within a pre-specified timeframe (i.e., < 24, ≤34, and ≤ 60 weeks post-testing for the 3-, 6-, and 12-month questionnaires, respectively) and (ii) Initial SARS-CoV-2 Negative CYP must have never reported a positive test, with initial SARS-CoV-2 Positive CYP never reporting being reinfected. The latter requirement was determined using a combination of self-report and UKHSA held data. See Figs. 1 and 2 for exclusion criteria at each stage and participant flow.

Fig. 1

Logic model for inclusion in the analytic sample at 3-, 6-, and 12-months

a Initially, due to funding constraints, only a portion of those tested in December 2020 were contacted to participate at 6 months. Hence, some children and young people tested in December 2020 provided both 6- and 12- month data, whereas others only 12-month data

b Determined through self-report and UKHSA data. (Re)infected refers to (i) a SARS-CoV-2 Negative subsequently testing positive, or (ii) a SARS-CoV-2 Positive testing positive again

Fig. 2

Flow diagram of participants at 3-, 6-, and 12 months

a Determined using the following cut off points: < 24 weeks post-testing for the 3-month questionnaire; ≤ 34 weeks post-testing for the 6-month questionnaire; ≤ 60 weeks post-testing for the 12-month questionnaire

b Determined through self-report and UKHSA data. (Re)infected refers to (i) a SARS-CoV-2 Negative subsequently testing positive, or (ii) a SARS-CoV-2 Positive testing positive again

c By definition of a COVID positive episode [19], a test-positive person cannot be reinfected by 3 months

Research ethics approval was granted by the Yorkshire and The Humber - South Yorkshire Research Ethics Committee (REC reference: 21/YH/0060; IRAS project ID: 293495).

Measures

Index COVID status, age, sex and region were determined from data held at UKHSA. Socioeconomic status was proxied using the Index of Multiple Deprivation (IMD), obtained using CYP’s lower super output area (i.e., small local area level-based geographic hierarchy), where higher values are indicative of lower deprivation [20]. Ethnicity was self-reported and collected at registration. Current (i.e., at time of questionnaire completion) health, current loneliness, and number of symptoms being experienced, including tiredness and shortness of breath [out of a possible 21, consistent with the ISARIC Paediatric Working Group; 5], were self-reported at each data collection sweep. Standardised measures were also collected at each sweep: the Short Warwick and Edinburgh Mental Wellbeing Scale [SWEMWS; 21], EuroQol Visual Analogue Scale [EQ-VAS; 22], EQ-5D-Y [23], Strengths and Difficulties Questionnaire [SDQ; 24], UCLA Loneliness Scale [25], and Chalder Fatigue Scale [CFS; 26]. See Additional File 1: Table 1 for further information.

For each data collection sweep, three indicator variables were created:

  • Responding given envisioned to take part (Yes/No): If participants completed the whole questionnaire.

  • Responding timely given responded (Yes/No): If participants who responded, responded to the questionnaire < 24 weeks post-testing (3-month questionnaire); ≤ 34 weeks post-testing (6-month questionnaire) and ≤ 60 weeks post-testing (12-month questionnaire).

  • (Re)infected given timely response (Yes/No): ‘Yes’ indicates, among those responding timely, SARS-CoV-2 index-test Positives that were reinfected and SARS-CoV-2 index-test Negatives that subsequently tested positive. ‘No’ indicates, among those responding timely, initial SARS-CoV-2 Positives that never report another positive test and initial SARS-CoV-2 Negatives that never report a positive test. A combination of the UKHSA’s testing data and self-reported information on having ever tested positive was used to generate this.

In total nine indicator variables were created: three at each data collection sweep.

Analysis

Analyses were conducted using Stata v17 [27].

Weight generation

At each data collection sweep, and corresponding to the three indicator variables described above, three ‘mini’ survey weights were generated to account for CYP being lost due to (i) non-response, (ii) responding after the established cut-off points or (iii) (re)infection with SARS-CoV-2. A fourth, combined ‘envisioned population’ weight was created, which accounted for loss from the analytic sample due to all three factors. These four survey weights (three ‘mini’ survey weights and one ‘envisioned population’ weight) were generated for each data collection sweep (i.e., 3-, 6- and 12-months post-SARS-CoV-2 test); see Fig. 3 for details.

Fig. 3

Steps in weight generation

a Determined using the following cut off points: < 24 weeks post-testing for the 3-month questionnaire; ≤ 34 weeks post-testing for the 6-month questionnaire; ≤ 60 weeks post-testing for the 12-month questionnaire

b Determined through self-report and UKHSA data. (Re)infected refers to (i) a SARS-CoV-2 Negative subsequently testing positive, or (ii) a SARS-CoV-2 Positive testing positive again

Here, the term ‘envisioned’ population refers to all CYP that could have taken part at the relevant time point (i.e., it is the maximum number of CYP that could provide data at a specific time point and was 50,845, 127,894 and 219,175 at 3-, 6-, and 12-months respectively). The ‘target’ population varies depending on the specific research question. For example, in the illustrative example described below, the target population is all CYP that could have taken part at 6 months (i.e., N = 127,894; see Fig. 4).

Fig. 4

Participant flow in the published CLoCk study [15] to be replicated

a Here, the target population is all children and young people that could have taken part at 6 months

b A late response at 6 months is defined as not responding ≤ 34 weeks post-testing

c Determined through self-report and UKHSA data. (Re)infected refers to (i) a SARS-CoV-2 Negative subsequently testing positive, or (ii) a SARS-CoV-2 Positive testing positive again

d A late response at 12 months is defined as not responding ≤ 60 weeks post-testing

e Of these, 1,826 children and young people registered at 3 months (806 SARS-CoV-2 Negative and 1,020 SARS-CoV-2 Positive)

The three ‘mini’ survey weights were calculated for (i) response given envisioned to take part, (ii) timely response given response, and (iii) (re)infection given timely response. Each ‘mini’ survey weight was calculated as the reciprocal of its corresponding conditional probability (Fig. 3). These conditional probabilities were computed using logistic regression (described below).

For the logistic regression of responding given envisioned to take part, all available data (held at UKHSA for study-design matching) and pair-wise interactions were considered as explanatory variables. For the logistic regressions of (i) responding timely given responded and (ii) (re)infected given timely response, questionnaire data were also available for use as predictors. Forward (p < 0.157) and backward (p < 0.200) stepwise selection processes were used to refine models used to predict these probabilities, with cut-offs selected as per recommendations [28]. Our weighting approach is appropriate when data are missing at random [16]. In an attempt to ensure this assumption is valid, we included sex, age, region, index COVID Status and IMD in all but one (see below) of the logistic regression models. Of these, age and IMD were continuous variables, while the others were categorical. We determined the appropriate functional form for the relationship between age/IMD and the log odds of the probability of the (three) outcomes by modelling the relationship (i) linearly, (ii) categorically (age: 11–13, 14–15, 16–17 years; IMD deciles, 1–5), (iii) with linear and quadratic terms and (iv) using fractional polynomials with up to two degrees. The functional forms with the lowest Akaike’s information criterion (i.e., the best fitting model) were used in our subsequent models. Importantly, index COVID Status was excluded as a predictor of the probability of being (re)infected given CYP responded timely at 3 months. This is because, by definition of a COVID positive episode [19], once a person tests positive, they would only be considered to be reinfected should they test positive more than 3 months after the initial positive test. Table 1 summarises the variables included in each model to predict the three conditional probabilities at the three timepoints. When issues with variables perfectly predicting the outcome were encountered, relevant variables were dropped. This only happened at the 3-month time-point. The concordance statistic (C) was used to assess the predictive performance of the models, with values of 0.7 and 0.8 denoting good and strong performance, respectively, and a value of ≤ 0.5 indicating poor prediction [Table 1; 29, 30].
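For readers less familiar with the concordance statistic, the following is a minimal sketch of how it can be computed from predicted probabilities and observed outcomes. It is written in Python purely for illustration (the study analyses were run in Stata), the function name and the example values are hypothetical, and the pairwise loop is only practical for small samples.

```python
def concordance_statistic(predicted, outcome):
    """C statistic: share of (event, non-event) pairs in which the event
    has the higher predicted probability; tied pairs count as half."""
    events = [p for p, y in zip(predicted, outcome) if y == 1]
    non_events = [p for p, y in zip(predicted, outcome) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                score += 1.0
            elif p_event == p_non_event:
                score += 0.5
    return score / (len(events) * len(non_events))


# Hypothetical predicted probabilities of responding and observed response (1 = yes).
predicted = [0.9, 0.4, 0.6, 0.2]
responded = [1, 1, 0, 0]
c = concordance_statistic(predicted, responded)
```

In this toy example three of the four event/non-event pairs are ordered correctly, so C is 0.75. A model that cannot separate the two groups at all gives 0.5.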

Table 1 Variables included in logistic regression models used to produce conditional probabilities for weight generation

At each time-point, the envisioned population weight was calculated as the product of the three corresponding ‘mini’ survey weights. Taking the example of 3 months post-testing: to re-weight from the previously used analytic sample to the envisioned CLoCk population, the fourth created survey weight comprised the product of the following three survey weights: Response (3 months), Timely response (3 months), and (Re)infection (3 months) (Fig. 3). The four survey weights at each time point (twelve in total) are flexible and can be combined as required to create final survey weights for the target population, as described in the illustrative example.
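The arithmetic behind the ‘mini’ and envisioned population weights can be sketched as follows. This is an illustrative Python fragment, not the study's Stata code. The class and function names are hypothetical, the probabilities are made up, and we assume that the third probability is that of remaining in the analytic sample (i.e., not being (re)infected) given a timely response.

```python
from dataclasses import dataclass


@dataclass
class RetentionProbabilities:
    """Hypothetical fitted probabilities for one CYP at one sweep."""
    p_response: float        # P(responded | envisioned to take part)
    p_timely: float          # P(responded timely | responded)
    p_not_reinfected: float  # P(not (re)infected | responded timely); assumed form


def mini_weights(p):
    """Each 'mini' weight is the reciprocal of its conditional probability."""
    return (1.0 / p.p_response, 1.0 / p.p_timely, 1.0 / p.p_not_reinfected)


def envisioned_population_weight(p):
    """Envisioned population weight: product of the three 'mini' weights."""
    w_response, w_timely, w_reinfection = mini_weights(p)
    return w_response * w_timely * w_reinfection


# Made-up probabilities for a single participant.
example = RetentionProbabilities(p_response=0.2, p_timely=0.8, p_not_reinfected=0.5)
weights = mini_weights(example)
combined = envisioned_population_weight(example)
```

With these made-up values the three ‘mini’ weights are 5, 1.25 and 2, so this participant would stand in for 12.5 members of the envisioned population.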

Weighting to the general population

Generated survey weights re-weight the analytic sample to the CLoCk envisioned population, that is, CYP invited to participate if they had a PCR-test within the pre-specified timepoints. However, as PCR testing varied by region and stage of the pandemic [31, 32], the envisioned population may not be fully representative of the general population of CYP in England. This is because, for example, not all CYP in England will have been able to access/complete a PCR-test. Hence, final survey weights used to get to the required target population were re-calibrated to the general population, using data on sex, age, and region from the 2021 UK Census [33]. To do this, ratios of the Census data to CLoCk data reweighted to the target population of interest were produced (see Additional File 2 for the interactive tool used to calculate these) with the final target population survey weights then multiplied by these ratios. See Additional File 2 for how this was done for the illustrative example below.
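As a rough illustration of the re-calibration step, the sketch below computes, for each demographic cell, the ratio of a census share to the weighted sample share and multiplies the weights by that ratio. It is a Python sketch under stated assumptions: the cell labels, weights and census shares are invented, the ratios are taken on shares within fully crossed cells, and the interactive tool in Additional File 2 may define the ratios differently.

```python
from collections import defaultdict


def calibration_ratios(cells, weights, census_share):
    """Ratio of the census share to the weighted sample share for each cell."""
    total = sum(weights)
    weighted = defaultdict(float)
    for cell, w in zip(cells, weights):
        weighted[cell] += w
    return {cell: census_share[cell] / (weighted[cell] / total) for cell in weighted}


def calibrate(cells, weights, census_share):
    """Multiply each weight by the ratio for the cell it belongs to."""
    ratios = calibration_ratios(cells, weights, census_share)
    return [w * ratios[cell] for cell, w in zip(cells, weights)]


# Hypothetical cells (e.g., sex by age group by region) and target-population weights.
cells = ["A", "A", "B"]
weights = [2.0, 2.0, 1.0]
census_share = {"A": 0.5, "B": 0.5}  # invented shares, not Census 2021 figures
calibrated = calibrate(cells, weights, census_share)
```

After calibration the weighted share of each cell matches the supplied census share, and the sum of weights is unchanged provided the census shares of the cells present in the sample sum to one.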

Weight trimming

All survey weights (i.e., each of the response given envisioned to take part, timely response given response, (re)infection given timely response, and the ‘envisioned population’ survey weights) were trimmed to reduce the likelihood of extremely large survey weights increasing variance [34]. This was done by reducing extreme survey weights to a cut-off defined as the median + k × interquartile range. k is typically either 3 or 4 [35]. In the present study we took a conservative approach and set k as 3. All survey weights were multiplied by a factor to re-calibrate back up to the original sum of weights [36]. When combining survey weights for the illustrative example below, untrimmed survey weights were initially used with the final survey weights trimmed.
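The trimming rule can be sketched in a few lines. The Python fragment below is illustrative only: the function names and example weights are hypothetical, and the quartile definition (Python's "inclusive" method) is an assumption that may differ slightly from the one used by Stata.

```python
import statistics


def trimming_cutoff(weights, k=3.0):
    """Cut-off defined as the median + k x interquartile range of the weights."""
    q1, _, q3 = statistics.quantiles(weights, n=4, method="inclusive")
    return statistics.median(weights) + k * (q3 - q1)


def trim_and_rescale(weights, k=3.0):
    """Cap weights at the cut-off, then rescale to the original sum of weights."""
    cutoff = trimming_cutoff(weights, k)
    capped = [min(w, cutoff) for w in weights]
    factor = sum(weights) / sum(capped)
    # Rescaling can lift the largest weights back above the cut-off.
    return [w * factor for w in capped]


weights = [1.0, 2.0, 3.0, 4.0, 100.0]  # hypothetical weights with one extreme value
trimmed = trim_and_rescale(weights, k=3.0)
```

Here the median is 3 and the interquartile range is 2, so with k = 3 the cut-off is 9. The extreme weight of 100 is capped at 9 before all weights are scaled up by a common factor to restore the original total of 110.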

Illustrative example: replicating published findings

Findings from CLoCk show the overall prevalence of tiredness and shortness of breath are high in CYP at baseline (i.e., at the time of their index PCR test) and increase over time to 12 months [15]. Here we compare the prevalence of tiredness and shortness of breath over a 12-month period from a previous publication [15] to prevalences that were weighted to the (i) target, and (ii) general populations. We demonstrate how uncertainty around generated weights can be accounted for via bootstrapping (with 1000 replications) and supply illustrative code for this (Additional File 1: Text 1). To be included in the published analytic sample (n = 5,085), CYP first registering in January-March 2021 must have completed their 3-month questionnaire (to provide information about their symptoms at the time of their PCR-test, i.e., at baseline), and be in the analytic sample at 6- and 12-months. For those registering in October-December 2020, they must meet the requirements to be included in the analytic samples at both 6- and 12-months (see Fig. 1 for cohort breakdown and Fig. 4 for participant flow for this example). Therefore, longitudinal weights were created by combining the survey weights as detailed in Fig. 5 and further illustrated in the bootstrap example in Text 1 (Additional File 1).
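The idea of carrying weight uncertainty into the final estimate can be sketched as follows. This Python fragment is not the code supplied in Additional File 1. To stay self-contained it swaps the logistic regression models for simple response rates within strata, and the data, names and number of replications are all hypothetical. What it shares with the approach described above is that the weights are re-estimated inside every bootstrap replicate rather than treated as fixed.

```python
import random
from collections import defaultdict


def cell_response_weights(people):
    """Inverse response rate per stratum (a stand-in for the logistic models)."""
    total, responded = defaultdict(int), defaultdict(int)
    for p in people:
        total[p["stratum"]] += 1
        responded[p["stratum"]] += p["responded"]
    return {s: total[s] / responded[s] for s in total if responded[s] > 0}


def weighted_prevalence(people):
    """Re-estimate the weights, then compute the weighted symptom prevalence."""
    weights = cell_response_weights(people)
    numerator = denominator = 0.0
    for p in people:
        if p["responded"] and p["stratum"] in weights:
            w = weights[p["stratum"]]
            numerator += w * p["symptom"]
            denominator += w
    return numerator / denominator if denominator > 0 else float("nan")


def bootstrap_interval(people, replications=1000, seed=0, alpha=0.05):
    """Percentile interval; weights are re-estimated in every replicate."""
    rng = random.Random(seed)
    n = len(people)
    estimates = []
    for _ in range(replications):
        sample = [people[rng.randrange(n)] for _ in range(n)]
        estimate = weighted_prevalence(sample)
        if estimate == estimate:  # drop replicates with no respondents (NaN)
            estimates.append(estimate)
    estimates.sort()
    lower = estimates[int((alpha / 2) * len(estimates))]
    upper = estimates[min(len(estimates) - 1, int((1 - alpha / 2) * len(estimates)))]
    return lower, upper


def make_people(stratum, n, n_responded, n_symptom):
    """Hypothetical stratum: symptom status is known only for respondents."""
    people = []
    for i in range(n):
        responded = 1 if i < n_responded else 0
        symptom = (1 if i < n_symptom else 0) if responded else None
        people.append({"stratum": stratum, "responded": responded, "symptom": symptom})
    return people


population = make_people("a", 100, 50, 10) + make_people("b", 100, 20, 10)
point = weighted_prevalence(population)
low, high = bootstrap_interval(population, replications=200)
```

In this invented population the unweighted prevalence among respondents is 20/70, whereas weighting for the lower response rate in stratum "b" gives 0.35. The bootstrap interval reflects both sampling variability and the variability of the estimated weights.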

Fig. 5

Steps taken to combine survey weights to replicate published CLoCk findings [15]

Note. To be included in the analytic sample, children and young people must have provided information about their symptoms at the time of their PCR test (i.e., 0 months). This information is gathered at study enrolment meaning criteria for inclusion varied depending on month of index PCR-test. Children and young people with an index test in January, February and March 2021 must have responded to the 3-month questionnaire (to gather information about baseline symptoms) as well as meet the criteria for inclusion in the analytic samples at 6- and 12-months post-testing (i.e., responded, done so timely and not (re)infected). Children and young people with an index-test in October, November, and December 2020 only had to meet the criteria for inclusion in the analytic samples at 6- and 12-months

Results

At the 3-month sweep, 7,135 CYP were included in the analytic sample, constituting 14% of the envisioned population at that time-point (N = 50,845, Table 2; Fig. 2). The analytic sample at 6 months (n = 12,946) comprised 10% of the envisioned population (N = 127,894); at 12 months, 15,624 were included in the analytic sample, forming 7% of the 12-month envisioned population (N = 219,175). Overall, 31,012 CYP completed at least one questionnaire, with 42,264 questionnaires completed. CYP in the analytic samples at 3-, 6-, and 12-months completed the questionnaire at a median of 14.9 (IQR: 13.1–18.9), 27.9 (IQR: 26.3–29.7), and 52.7 (IQR: 51.3–54.9) weeks post-testing, respectively. Compared to the envisioned population, CYP in the analytic samples were more likely to be older, female and from less deprived areas (Table 2).

Table 2 Characteristics of the 3-, 6-, and 12-month envisioned and analytic populations

Weight generation

The C statistics for all required conditional probabilities ranged from 0.60 (responding timely given responded at 12 months) to 0.77 ((re)infected given timely response at 6- and 12-months; see Table 1). Table 3 displays the survey weights generated for each data collection sweep along with the relevant Ns, medians, and interquartile ranges.

Table 3 Survey weights generated for each data collection sweep (N, Median, and Interquartile Range [IQR])

Re-weighting published findings

Consistent with published findings [15], the overall prevalence of tiredness and shortness of breath increased from baseline to 12-months post-index PCR-test in both test-positive and test-negative CYP even after weighting (and trimming) to the target and general populations (Tables 4 and 5; Figs. 6 and 7). For example, at time of testing, the unweighted overall prevalence of tiredness in CYP who tested negative for SARS-CoV-2 was 3.63%. When weighted (and trimmed) to the target population the prevalence was 3.51% and when weighted (and trimmed) to the general population the prevalence was 3.69% (Table 4). Likewise, prevalences of tiredness and shortness of breath by time of first report remained similar to published findings (Figs. 6 and 7). Results using trimmed and untrimmed weights were broadly similar (Additional File 1: Tables 2 and 3; Figs. 1 and 2). Table 4 (Additional File 1) shows the uncertainty around the generated target population weight (untrimmed); results are broadly consistent.

Table 4 Weighted and unweighted tiredness prevalences from baseline to 12 months post-index PCR-test
Fig. 6

Weighted (trimmed) and unweighted tiredness prevalences 0-12-months post-index PCR-test by time of first report

Fig. 7

Weighted (trimmed) and unweighted shortness of breath prevalences 0-12-months post-index-PCR-test by time of first report

Table 5 Weighted and unweighted shortness of breath prevalences from baseline to 12 months post-index PCR-test

Discussion

The present study aimed to (i) create weights for the CLoCk study at its data collection sweeps 3-, 6- and 12-months post-index PCR-test, and (ii) apply the developed survey weights to the analysis of shortness of breath and tiredness over the 12-month period to determine whether accounting for any biases in the target population, response, attrition or (re)infection affected published results. Flexible survey weights for the CLoCk study were developed and applied in an illustrative example. When applying the survey weights, results were consistent with published CLoCk findings [15]. That is, the overall prevalence of tiredness and shortness of breath increased over time from baseline to 12-months post-testing in both test-positive and test-negative CYP.

A major strength of the present study is the flexibility of the survey weights developed, whereby the creation of separate ‘mini’ survey weights (i.e., response, timely response and (re)infection) and the overall ‘envisioned population’ weight ensures researchers are able to combine them to re-create their specific target population, which will vary depending on the specific research question being asked. The interactive tool provided will allow researchers to re-calibrate their target population weights to the general population of CYP in England using the recent Census 2021 data. This re-calibration attempts to address the potential bias in the envisioned CLoCk population due to variation in PCR testing by region and stage of the pandemic [31, 32]. Furthermore, by trimming survey weights using a technique that is unaffected by the size of the largest survey weight [34], we improve the accuracy and precision of final parameter estimates in re-weighted analyses [37]. Moreover, we used a range of data from both the UKHSA dataset and the CLoCk questionnaire to develop the models that predicted the required conditional probabilities. We acknowledge that the C statistics, particularly for models used to predict the probability of responding given envisioned to take part and the probability of responding timely given responded, were somewhat low, ranging between 0.60 and 0.73. However, for the probability of responding given envisioned to take part, it should be noted that the C statistic cannot be further improved due to the lack of additional data relating to the envisioned CLoCk population (here, only data held on the UKHSA database for matching was available). Thus, for all survey weight generation, but here in particular, one should note the constraint deriving from the variables used to generate conditional probabilities and the potential for the non-response/attrition/selection mechanisms to be dependent on unmeasured variables. For example, it might be that those with severe tiredness are less likely to respond. Relatedly, our approach is appropriate when missingness is assumed to be dependent on observed characteristics, but as mentioned above this may not be the case. This is an important potential limitation, the implication being that survey weights do not fully adjust for such (non-response, attrition, and sample selection) bias, though we attempt to minimise its impact. In an attempt to avoid potential recall bias, for the latter two ‘mini’ weights, we made the pragmatic decision to only consider questionnaire data asked in relation to health and wellbeing at the time of questionnaire completion.

We acknowledge concerns regarding the use of stepwise selection processes, whereby inclusion of too many candidate variables may result in nuisance variables being selected over true variables, meaning the best model is not provided [38]. We were mindful of this when selecting the initial list of potential predictors, determined the best functional forms of continuous variables used in all regressions, and used theoretical arguments to inform our selection, as recommended [39]. Finally, it should be noted that the survey weights are estimated, and if they are treated as observed there is a risk of overestimating the precision of the estimates. To address this, we provide an example of how variability due to generating the weights can be accounted for via bootstrapping.

Conclusions

CLoCk is the largest known prospective study of Long Covid in non-hospitalised CYP, with over 30,000 respondents. Like all longitudinal population-based studies, issues regarding selection into the study and attrition over time need to be considered. The present findings suggest the CLoCk sample is representative of the envisioned and general populations of CYP in England, although the developed weights need to be utilised in multiple and different contexts to assess their impact and identify whether current conclusions are consistent across other CLoCk analyses. The same approach can and should be taken in other research studies to assess sample representativeness. Importantly, application of survey weights more generally is beneficial as a way of addressing the impact of potential bias.