Introduction

Online Mendelian Inheritance in Man (OMIM), an updated catalogue of human genes and genetic phenotypes, contains over 16,000 genes [1, 2] and more than 9,300 Mendelian phenotypes, including more than 6,000 with a known molecular defect. In addition, there are more than 1,500 confirmed Mendelian phenotypes or phenotypic loci with an unknown molecular basis and more than 1,700 additional phenotypes of suspected Mendelian origin [3]. According to Orphanet [4], there are roughly 7,000 rare diseases (RDs), 80% of which are thought to have a genetic cause; most of these are Mendelian/monogenic disorders [5, 6].

Non-monogenic genetic diseases, also known as complex genetic diseases, are conversely caused by a combination of genetic, environmental, and lifestyle factors. Unlike monogenic diseases, which are caused by a mutation in a single gene, non-monogenic diseases involve multiple genes, each contributing a small effect, as well as environmental and lifestyle factors.

According to the European definition, rare diseases are life-threatening or chronically debilitating conditions with a prevalence of less than one case in 2,000 individuals, while the US figure is less than one case in 1,500 people [7, 8]. Although individually rare, these disorders affect 264–446 million people worldwide and 17.8–30.3 million in Europe [7, 9]. About 50 to 60% of RDs affect children, 12% of which are congenital and 42% of which have an onset in the first two years of life [9, 10]. Patients with RDs share similar needs and face diagnostic delay, uncertainty in genetic counselling, and a lack of proper clinical management and care, since an effective treatment is available for only about 400 diseases. This is also due to the failure to identify the molecular defects underlying a large number of these diseases.

Genetic testing confirms or rules out a suspected genetic condition, is diagnostic in a proportion of clinically unsolved cases, and determines an individual's chance of developing or passing on a genetic disorder. Over the past decade, the development of next generation sequencing (NGS) technologies and of bioinformatic pipelines to manage and analyse genomic data, together with an impressive reduction in sequencing costs, has led to the widespread implementation of genomic sequencing, most often whole exome sequencing (WES). These tools, which can identify the molecular defect causing Mendelian disorders [11], have been shown to be effective and sustainable in genomic medicine.
As a powerful tool, genomic medicine has the potential to improve outcomes and reduce costs in primary care settings [12]. The application of whole genome sequencing (WGS) and WES in newborns and children suffering from a severe disorder of likely genetic origin is expected to improve targeted, effective care and management [13, 14].

WGS and WES are increasingly used for diagnostic purposes in critically ill infants and children admitted to Neonatal Intensive Care Units (NICU) and Paediatric Intensive Care Units (PICU) with a suspected genetic disorder [14,15,16]. Traditional genetic testing reaches a diagnosis in around 20% of cases [14]. Thus, acutely ill neonates with suspected genetic diseases are often discharged, or die, before a diagnosis is reached. As a result, NICU treatment of genetic diseases is usually empirical and may lack efficacy, be inappropriate, or even cause adverse effects [17].

Currently, WES is more commonly used globally than WGS because of its easier data storage and processing [18] and its cost-saving benefits [19]. Nevertheless, despite the widespread adoption of WES, previous research has demonstrated that WGS has the potential to yield a greater number of diagnoses than WES, both in undiagnosed adults and in newborns with suspected genetic diseases. In particular, across a large number of studies, the diagnostic yield attained by WES ranges between 25 and 50%, while the WGS diagnostic yield is about 40–60% [20, 21]. Furthermore, a recent systematic review and meta-analysis showed a greater diagnostic yield for WGS (0.41, 95% CI 0.34–0.48, I² = 44%) than for WES (0.36, 95% CI 0.33–0.40, I² = 83%), although the difference was not statistically significant, and than for usual care (UC) (0.10, 95% CI 0.08–0.12, I² = 81%) [22]. In addition, another systematic review reported a diagnostic yield ranging from 3 to 79% for WES and from 17 to 73% for WGS [23].

Recent studies have also supported the clinical utility of WGS, compared to standard testing, in the NICU, highlighting a higher diagnostic yield, a sharp increase in changes in clinical management, and a shorter time to diagnosis thanks to the PCR-free WGS approach [24]. Therefore, a wider use of WGS could change acute management and life outcomes in children with chronic diseases using stratified therapeutics [14].

Marshall et al. [25]. In addition, the translation of WGS into clinical settings has been hindered by the lack of access to technology, complex infrastructure, and expert personnel. At present, in a context of limited healthcare resources, evidence is needed on how to integrate WGS technology into diagnostics while fulfilling the criteria of both clinical utility and cost-effectiveness [26].

The aim of this study was to perform a systematic review and meta-analysis to assess the effectiveness of WGS, with respect to WES and/or UC, for the diagnosis of suspected genetic disorders among the paediatric population.

Materials and methods

Search strategy and selection criteria

A systematic review of the literature was conducted by querying relevant electronic databases, including MEDLINE, EMBASE, ISI Web of Science, and Scopus, from January 2010 to June 2022 in order to retrieve peer-reviewed articles. The Population, Intervention, Comparison, Outcome (PICO) [27] framework was adopted to formulate the following research question: “Is implementing WGS for the care of the paediatric population effective?”. A comprehensive search strategy was created and implemented according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [28] checklist. First, controlled descriptors and the related key words were identified and verified in each scientific database. Afterwards, a Boolean search string was used, combining Medical Subject Headings (MeSH) and free-text words such as “new born”, “infant”, “pediatric”, “paediatric”, “child”, “next-generation sequencing”, “whole genome sequencing”, “whole exome sequencing”, “genomic testing”, “panel test”, “diagnostic yield”, “effectiveness”, “appropriateness”, “efficacy”, “clinical efficacy”, “NICU”, “PICU”, “emergency”. Full search strings for each database are detailed in the Supplementary Material. The database search was complemented by a hand search in order to identify missing articles (i.e., snowball searching). Additional relevant articles were found by analysing bibliographic citations.

The inclusion criteria for this systematic review were defined as follows: paediatric patients affected by severe life-threatening disorders of likely genetic origin undergoing a WGS and/or WES diagnostic test, either in an emergency setting (i.e., NICU or PICU) or in an outpatient setting. Where available in the included studies, UC was also considered. UC (e.g., chromosomal micro-array [CMA], targeted gene panel, array CGH, fluorescence in-situ hybridization, karyotype) was defined as methods that do not involve massively parallel sequencing and do not allow simultaneous screening for mutations in hundreds of loci in genetically heterogeneous disorders, whole-genome screening for novel mutations, or sequence-based detection of novel pathogens that cause human diseases [29]. Inclusion was restricted to articles written in English and published between January 2010 and June 2022. This timespan was chosen because the new sequencing technologies were not available in older publications, or the technologies described there have become outdated owing to technological developments [30]. The search was further restricted to full texts published in peer-reviewed journals and by type of article, excluding non-primary literature such as commentaries, books, theses, and reviews. The primary outcome of our search was the diagnostic yield, measured as the number of patients in whom the genetic test suggested the definitive diagnosis out of the total number of patients undergoing the test.

Assessment of the eligibility criteria was carried out independently by three authors; in the case of divergence, a fourth author was consulted. After the removal of duplicate articles, and according to the inclusion and exclusion criteria, three independent researchers performed the preliminary screening by evaluating the titles and abstracts. The same researchers then screened the full text of each study to determine its potential eligibility. In both screening phases, all disagreements were resolved by a fourth author through discussion of the inclusion and exclusion criteria as applied to the article.
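
The definition of the primary outcome can be illustrated with a minimal Python sketch. The function names and the example counts (40 diagnoses among 100 tested patients) are hypothetical and do not come from the included studies; the Wilson score interval is shown only as one common way to attach a confidence interval to a single proportion and is not stated to be the method used in this review.

```python
from math import sqrt
from typing import Tuple


def diagnostic_yield(diagnosed: int, tested: int) -> float:
    """Proportion of tested patients in whom the test gave a definitive diagnosis."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    if not 0 <= diagnosed <= tested:
        raise ValueError("diagnosed must lie between 0 and tested")
    return diagnosed / tested


def wilson_interval(diagnosed: int, tested: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for one proportion (illustrative assumption,
    not necessarily the interval used by the authors)."""
    p = diagnostic_yield(diagnosed, tested)
    denom = 1 + z ** 2 / tested
    centre = (p + z ** 2 / (2 * tested)) / denom
    half_width = z * sqrt(p * (1 - p) / tested + z ** 2 / (4 * tested ** 2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical study: 40 of 100 patients received a definitive diagnosis.
print(diagnostic_yield(40, 100))
print(wilson_interval(40, 100))
```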

Data analysis

Data extraction was completed by three independent investigators. A pre-determined data extraction spreadsheet was designed including the following variables: study characteristics, country of the study, setting, sample size, age, intervention, comparator, indicators, and main findings. The methodological quality of studies evaluating diagnostic yield was assessed using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) scale [29], as recommended by the Agency for Healthcare Research and Quality (AHRQ), the Cochrane Collaboration, and the National Institute for Health and Clinical Excellence (NICE). The use of QUADAS-2 involves four phases: (1) state the review question, (2) develop review-specific guidance, (3) review the published flow diagram for the primary study, or construct a flow diagram if none is reported, and (4) judge bias and applicability. The scale includes four domains: (1) patient selection, (2) index test, (3) reference standard, and (4) flow and timing. The first part of each key domain concerns bias and includes the information used to support the risk of bias judgement, a set of signalling questions to help reviewers reach that judgement, and the judgement of risk of bias itself. For each signalling question, the investigator could select “yes”, “no”, or “unclear”. The risk of bias was judged as “low”, “high”, or “unclear”. If all signalling questions for a domain were answered “yes”, the risk of bias was judged “low”, while if any signalling question was answered “no”, the risk of bias was considered “high”. The term “unclear” was used whenever the risk of bias could not be assessed due to missing information. The second part of the first three domains concerned applicability. The applicability sections were similar to the bias sections except that they had no signalling questions. Concerns regarding applicability were rated as “low”, “high”, or “unclear”, the latter being used when insufficient data were reported.
Studies rated as “low” on all domains regarding bias or applicability received an overall judgement of “low risk of bias” or “low concern regarding applicability”. Studies rated as “high” in one or more domains were judged “at risk of bias” or as having “concern regarding applicability”. QUADAS-2 assessments were summarized through the corresponding tabular and graphical displays [30].
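
The decision rules described above for the risk of bias judgement can be sketched in Python as follows. This is only an illustration of the rules as stated in the text; the function names are hypothetical, and the overall label returned when no domain is “high” but at least one is “unclear” is an assumption, since that case is not specified above.

```python
from typing import Dict, List

VALID_ANSWERS = {"yes", "no", "unclear"}


def domain_risk_of_bias(answers: List[str]) -> str:
    """Risk of bias for one domain from its signalling-question answers."""
    if not answers or any(a not in VALID_ANSWERS for a in answers):
        raise ValueError("answers must be a non-empty list of yes/no/unclear")
    if any(a == "no" for a in answers):
        return "high"      # any "no" -> high risk of bias
    if all(a == "yes" for a in answers):
        return "low"       # all "yes" -> low risk of bias
    return "unclear"       # missing information


def overall_risk_of_bias(domain_ratings: Dict[str, str]) -> str:
    """Overall judgement across domains."""
    ratings = list(domain_ratings.values())
    if any(r == "high" for r in ratings):
        return "at risk of bias"
    if all(r == "low" for r in ratings):
        return "low risk of bias"
    return "unclear"       # assumption: this case is not defined in the text


# Hypothetical study with invented answers for the four domains.
study = {
    "patient selection": domain_risk_of_bias(["yes", "yes", "yes"]),
    "index test": domain_risk_of_bias(["yes", "unclear"]),
    "reference standard": domain_risk_of_bias(["yes", "yes"]),
    "flow and timing": domain_risk_of_bias(["yes", "no", "yes"]),
}
print(study)
print(overall_risk_of_bias(study))
```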

The Grading of Recommendations, Assessment, Development and Evaluations (GRADE) approach was adopted to assess the evidence quality for the outcome of interest across the included studies. The GRADE method categorizes the level of evidence quality into: high quality, further research is extremely unlikely to change the credibility of the pooled results; moderate quality, further research is likely to influence the credibility of pooled results and may change the estimate; low quality, further research is extremely likely to influence the credibility of pooled results and is likely to change the estimate; very low quality, the pooled results have extreme uncertainty [31]. The online GRADE profiler (GRADEpro) [32] was adopted to create evidence profiles and summary of findings tables. For the outcome of interest, the quality of evidence was downgraded if the risk of bias, inconsistency, indirectness, imprecision, and publication bias were assessed as having serious limitations [33]. The overall quality was rated as either “high”, “moderate”, “low”, or “very low”.
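
The downgrading logic can be sketched in Python under simplifying assumptions. The function name is hypothetical, the starting level is left as a parameter because it is not stated above, and the sketch subtracts exactly one level per domain with serious limitations; the full GRADE approach also allows two-level downgrades for very serious limitations and upgrades, which are not modelled here.

```python
from typing import Dict

LEVELS = ["very low", "low", "moderate", "high"]
DOMAINS = (
    "risk of bias",
    "inconsistency",
    "indirectness",
    "imprecision",
    "publication bias",
)


def grade_quality(start_level: str, serious: Dict[str, bool]) -> str:
    """Downgrade one level per domain flagged as having serious limitations.

    Simplification: one level per flagged domain, floor at "very low".
    """
    if start_level not in LEVELS:
        raise ValueError(f"unknown level: {start_level}")
    unknown = set(serious) - set(DOMAINS)
    if unknown:
        raise ValueError(f"unknown domains: {sorted(unknown)}")
    downgrades = sum(1 for d in DOMAINS if serious.get(d, False))
    return LEVELS[max(0, LEVELS.index(start_level) - downgrades)]


# Hypothetical assessment: serious inconsistency and imprecision.
print(grade_quality("high", {"inconsistency": True, "imprecision": True}))
```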

Quantitative data synthesis was performed using the diagnostic yield, expressed as the proportion of cases detected out of the total, as the reference outcome throughout. Firstly, the diagnostic yield was meta-analysed, and differences between techniques (WES, WGS, and UC) were inspected via subgroup analyses. To this purpose, in view of the expected heterogeneity among studies [34], random-effects models were developed according to DerSimonian and Laird [35], and heterogeneity was inspected using the I² statistic (threshold level for significant heterogeneity: ≥ 50%) and the chi-squared test for homogeneity (significance level for heterogeneity: p < 0.1) [36]. Given the available number of studies, a meta-regression model was also built in order to compare the techniques while adjusting for relevant covariates (ICU vs non-ICU setting, Mendelian vs non-Mendelian disease, publication before vs after 2017). Another meta-regression model was run stratifying by the value (i.e., low or high) of the diagnostic yield reported by the primary studies included in the review. The cut-off value was set according to the pooled diagnostic yield estimated through the random-effects meta-analysis.
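
For readers unfamiliar with the DerSimonian and Laird estimator, the following pure-Python sketch shows how a random-effects pooled proportion and the I² statistic can be computed. It is not the analysis code of this study, which was run in R. The function name and the example counts are hypothetical, and pooling on the logit scale with approximate inverse-variance weights is an assumption, since the transformation used is not stated above. The chi-squared p-value for Cochran's Q is omitted to keep the example dependency-free.

```python
from math import exp, log, sqrt
from typing import Dict, Sequence


def pooled_yield_dl(events: Sequence[int], totals: Sequence[int]) -> Dict[str, float]:
    """DerSimonian-Laird random-effects pooling of proportions (logit scale)."""
    if len(events) != len(totals) or len(events) < 2:
        raise ValueError("need at least two studies with matching lengths")
    if any(e <= 0 or e >= n for e, n in zip(events, totals)):
        raise ValueError("each study needs 0 < events < total for the logit")

    # Logit-transformed yields and their approximate within-study variances.
    y = [log(e / (n - e)) for e, n in zip(events, totals)]
    v = [1 / e + 1 / (n - e) for e, n in zip(events, totals)]

    # Fixed-effect (inverse-variance) estimate and Cochran's Q.
    w = [1 / vi for vi in v]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1

    # Method-of-moments estimate of the between-study variance tau^2.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights, pooled logit, and its standard error.
    w_re = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = sqrt(1 / sum(w_re))

    # I^2: percentage of total variability attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    def expit(x: float) -> float:
        return 1 / (1 + exp(-x))

    return {
        "pooled": expit(mu),
        "ci_low": expit(mu - 1.96 * se),
        "ci_high": expit(mu + 1.96 * se),
        "tau2": tau2,
        "q": q,
        "i2": i2,
    }


# Hypothetical data: diagnoses / patients tested in three invented studies.
print(pooled_yield_dl([30, 45, 20], [100, 100, 100]))
```

With the threshold given above, an I² of 50% or more returned by such a function would be read as significant heterogeneity.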

Secondly, a network meta-analysis was performed by considering all studies comparing at least two of the three techniques. A frequentist approach based on the Mantel–Haenszel method for binary data, as described by Efthimiou et al. [37], was adopted. Heterogeneity was quantified through the test of inconsistency (Cochran’s Q statistic), and the odds ratio was chosen as summary measure, as widely recommended for indirect comparisons of binary variables because of the symmetry and invariance of this measure to the coding of event and non-event [38,39,40].
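
The symmetry property that motivates the choice of the odds ratio can be checked with a short Python sketch. This is not the Mantel–Haenszel network meta-analysis itself; it only compares the odds ratio with the risk ratio on a single hypothetical 2×2 table (the counts are invented) to show that recoding “diagnosed” as “not diagnosed” inverts the odds ratio exactly, whereas the risk ratio does not behave this way.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a 2x2 table.

    a, b: events and non-events in group 1; c, d: the same in group 2.
    """
    return (a * d) / (b * c)


def risk_ratio(a: int, b: int, c: int, d: int) -> float:
    """Risk ratio for the same 2x2 table."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical counts: test A diagnoses 40 of 100, test B diagnoses 30 of 100.
diagnosed_a, undiagnosed_a = 40, 60
diagnosed_b, undiagnosed_b = 30, 70

# Event coded as "diagnosed".
or_event = odds_ratio(diagnosed_a, undiagnosed_a, diagnosed_b, undiagnosed_b)
rr_event = risk_ratio(diagnosed_a, undiagnosed_a, diagnosed_b, undiagnosed_b)

# Event recoded as "not diagnosed" (rows unchanged, columns swapped).
or_nonevent = odds_ratio(undiagnosed_a, diagnosed_a, undiagnosed_b, diagnosed_b)
rr_nonevent = risk_ratio(undiagnosed_a, diagnosed_a, undiagnosed_b, diagnosed_b)

# The odds ratio is simply inverted by the recoding; the risk ratio is not.
print(or_event * or_nonevent)
print(rr_event * rr_nonevent)
```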

All statistical analyses were carried out, and plots were drawn, using the statistical software R (version 4.0.5) [41]: specifically, the “meta” package (version 5.0.0) [42] was used for the meta-analysis of proportions and meta-regression, while the “netmeta” package (version 2.0-0) was used for the network meta-analysis [43]. Two-sided p-values < 0.05 were considered statistically significant.

Role of the funding source

The funding source had no involvement in study design; in the collection, analysis and interpretation of data; in the writing of the report; and in the decision to submit the article for publication.

Results

The database search resulted in 4,927 publications, and 18 studies were retrieved through the snowball search method. After removal of duplicates, 3,955 titles and abstracts were screened. A total of 63 articles were identified for full-text screening. After full-text examination, 24 papers were excluded since they did not fulfil the eligibility criteria. Thus, 39 [15, 16, 24, 82]. Further research rigorously and transparently assessing the costs, effectiveness, cost-effectiveness, organizational impacts, and ethical aspects of WGS from a health technology assessment perspective [83] is mandatory to allow for a more informed decision-making process in this context.

Moreover, additional primary studies (preferably high-quality RCTs with larger samples), evaluating first the comparison between WGS and WES and then that between WGS and standard genetic testing, are required to investigate in depth the differences in costs and diagnostic yield and to increase the quality of the evidence.

Conclusion

Whole genome sequencing for the paediatric population with suspected genetic disorders allows an accurate and early genetic diagnosis in a high proportion of cases. This provides an understanding of the molecular mechanisms underlying diseases and supports tailored treatments and accurate genetic counselling, while reducing the burden of unsolved cases that weighs on patients and their families [25] by putting an end to the so-called “diagnostic odyssey” [84, 85]. The present review supports the use of WGS in the diagnostic workup of ill paediatric patients with suspected genetic disorders, strengthened by evidence at policy, program, and intervention levels. Our study also reinforces the use of methodologies capable of providing robust evidence for the formulation of health policy on funding, to overcome present hurdles in transitioning WGS from the research setting into routine clinical practice. However, efficiently allocating limited healthcare resources remains a pressing issue for HTA agencies when it comes to WGS approaches. Overcoming this challenge will be critical to realizing the potential benefits of WGS for improving patient outcomes and reducing healthcare costs.