Introduction

Clinical trials are essential to the scientific transition from basic research to clinical practice, whether they concern drug development or non-drug interventions [1]. Clinical trial protocols are the trial documents of reference, detailing every step for participants enrolled in the trial. One of the critical protocol sections is the list of eligibility criteria [2]. Although this list is specific to each trial, it often includes criteria that recur within the same field or disease area, with trial-specific differences in cut-off values [2]. For example, a trial for the neurodegenerative disease amyotrophic lateral sclerosis (ALS) will in the majority of cases include a vital capacity (VC) measurement as an eligibility criterion. Some ALS trials only include patients with a VC above 50% (national clinical trial [NCT] number NCT05633459), while others require a VC equal to or greater than 65% (NCT04248465). Enrolling the right participants in a clinical trial is essential as it (1) could allow for a personalized medicine approach [3,4,5,6], (2) might be the only option to access drugs in development for patients suffering from diseases with no cure [7,8,9,10], (3) should ensure that motivated participants complete the entire study without dropping out, thus maximizing patient retention [11,12,13,14], and (4) ultimately ensures good-quality clinical trial data [15]. However, trial enrollment can also be challenging, especially when it comes to efficiently identifying potentially eligible candidates during the pre-screening process [16,17,18]. This requires identifying participants who meet the most stringent criteria and are therefore highly likely to pass the screening process. The pre-screening procedure is crucial as it decreases the screen failure rate, which varies drastically between trials, disease areas, and countries [19,20,21,22]. Considering that screen failures are associated with participant burden and also negatively impact the study budget, there is a need to develop clinical trial recruitment strategies targeting these aspects [23,24,25,26].

Typically, trial pre-screening is staff-bound, with a designated staff member in charge of the pre-screening process. Research teams usually have team-specific pre-screening processes, as there is no national consensus, guideline, or universal standard operating procedure (SOP) on how to conduct pre-screening for clinical trials. A typical pre-screening process may include an internal check of paper-based hospital medical records, a review of electronic medical records, a review of medical records sent by traditional mail in case of a referral, direct emails from patients to the research team, and more [27,28,29,30]. This creates a pool of information derived from several sources, with no standardized system assuring quality and replicability and no audit trail for quality assurance and control, which overall creates inequitable trial access for patients [31]. Clinical trial eligibility criteria often include both demographic data and disease-specific information. This information is captured in most disease registries/population registries/patient registries, and there is a growing interest in using such registries for pre-screening due to their high-quality and easily accessible data [32,33,34,35]. Such disease registries/population registries/patient registries are to be distinguished from other types of databases such as electronic health records (EHR). For the purpose of this review, we use the term patient registry when referring to a specific database aiming to capture data on all patients from a patient population in a specific site, state, or country [32, 36, 37].

The use of population and disease registries for trial pre-screening has previously been investigated through a literature review covering the period 2004–2013, which reported limited registry use for clinical trial pre-screening but advocated for more systematic use, as this was deemed an efficient method [38]. The combined use of registries and medical record data has been described as optimizing trial recruitment [23], and since 2013 we have observed an explosion of clinical trials in many different fields, as reported by the International Clinical Trials Registry Platform (ICTRP) of the World Health Organization (WHO) [39]. The ICTRP collects trial registration information from different databases such as ClinicalTrials.gov and reported 34,291 clinical trials in 2013 versus 59,964 clinical trials in 2021. The increasing number of clinical trials globally highlights the need for efficient and equitable pre-screening processes, but also for an updated review, considering that the last review on this topic was not conducted with a systematic methodology [38].

In this systematic review, we characterized the use of population and disease registries as a pre-screening tool for clinical trials, without distinguishing between drug and non-drug trials. We included studies published between January 2014 and December 2022, as a non-systematic review had already covered the 2004 to 2013 timeframe. We aimed to describe the type of registries used, the disease areas, the type of clinical trials linked to the registry-based pre-screening, and the potential assets the method brought to the pre-screening process.

Methods

Inclusion and exclusion criteria

Citations and references obtained from the search were screened using the Rayyan software, and our inclusion and exclusion criteria are listed in Table 1. Eligible studies had to be written in English, published in peer-reviewed journals, and report the use of population/patient/disease registries for trial pre-screening. Included studies also needed to concern trials on patients and not on healthy individuals. Studies had to have been published in our targeted window between January 2014 and December 2022, and abstracts had to be available for review. Finally, we included studies of high to moderate quality. Since there was no standardized tool to judge the quality of the included studies, we developed a quality assurance checklist, the List of Included Studies and quality Assurance in Review (LISA-R) tool. The checklist was developed using guidance provided on the Parsifal platform, which supports researchers conducting systematic reviews and wishing to establish new quality assurance tools. The tool consists of 11 items, each judged on a two-level scale (yes/no) (blank LISA-R tool available in Supplementary material 2). One point was attributed for each “yes”, giving a score range from 0 to 11. An overall score > 8 was interpreted as high quality, 6–8 as moderate quality, and < 6 as low quality.
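The LISA-R scoring rule described above can be illustrated with a minimal Python sketch. The function name and the representation of the checklist as a list of yes/no answers are assumptions made for illustration; the actual 11 items are given in Supplementary material 2 and are not reproduced here.

```python
def lisa_r_quality(answers):
    """Score a LISA-R checklist and map the score to a quality band.

    `answers` is a hypothetical representation: a list of 11 booleans,
    True meaning "yes" for the corresponding checklist item.
    """
    if len(answers) != 11:
        raise ValueError("LISA-R has 11 items")
    score = sum(1 for a in answers if a)  # one point per "yes", range 0-11
    if score > 8:
        band = "high"
    elif score >= 6:
        band = "moderate"   # 6-8
    else:
        band = "low"        # < 6
    return score, band


# Example: 8 "yes" answers fall in the moderate band.
print(lisa_r_quality([True] * 8 + [False] * 3))
```

Under this rule, only studies in the "high" or "moderate" band would be retained.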

Table 1 Inclusion and exclusion criteria used for article selection in Rayyan

Search and selection strategy

The study protocol was registered in PROSPERO with the identification number CRD42023433968 and followed the PRISMA requirements [40]. A literature search was performed in the following databases: MEDLINE, Embase, and Web of Science Core Collection. The last search was conducted on June 22, 2023.

The search strategy was developed in MEDLINE (Ovid) in collaboration with librarians at the Karolinska Institutet University Library. For each search concept, medical subject headings (MeSH terms) and free-text terms were identified (Supplementary material 1). The search was then translated to the other databases, with Polyglot Search Translator used to translate the controlled vocabulary [41].

The search was restricted to English-language publications and to the years 2014–2022, as a previous non-systematic review covered the 2004–2013 period [38]. De-duplication was done using the method described by Bramer et al. [1]. One final step was added to compare digital object identifiers (DOIs) to finalize de-duplication. The full search strategies for all databases are available in Supplementary material 1. The review of papers was conducted independently by two of the authors (JF and LA) and then cross-checked. If selection conflicts arose, a third author (CI) was asked to resolve them by setting up a meeting in which JF and LA could present their reasoning and CI could make the final decision. A first review (phase 1) was based on titles and abstracts only, while the second review was based on full texts (phase 2). Only publications of moderate and high quality as per the LISA-R tool were included in the final set of studies (phase 3).
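As an illustration of the final DOI comparison step only, the following Python sketch removes records that share a DOI. It does not implement the Bramer et al. method itself, and the record structure (dictionaries with a "doi" key), the function names, and the example DOIs are hypothetical placeholders.

```python
def normalize_doi(doi):
    """Return a lowercase DOI without resolver prefix, or None if missing."""
    if not doi:
        return None
    d = doi.strip().lower()  # DOIs are case-insensitive
    for prefix in ("https://doi.org/", "http://doi.org/",
                   "https://dx.doi.org/", "http://dx.doi.org/", "doi:"):
        if d.startswith(prefix):
            d = d[len(prefix):]
            break
    return d.strip() or None


def dedupe_by_doi(records):
    """Keep the first record for each DOI; records without a DOI are kept."""
    seen = set()
    unique = []
    for rec in records:
        key = normalize_doi(rec.get("doi"))
        if key is not None and key in seen:
            continue  # same DOI already kept: treat as a duplicate
        if key is not None:
            seen.add(key)
        unique.append(rec)
    return unique


# Placeholder records, not taken from the review.
records = [
    {"title": "A", "doi": "10.1000/example.1"},
    {"title": "A (copy)", "doi": "https://doi.org/10.1000/EXAMPLE.1"},
    {"title": "B", "doi": None},
    {"title": "C", "doi": "doi: 10.1000/example.2"},
]
print([r["title"] for r in dedupe_by_doi(records)])
```

Records lacking a DOI are deliberately kept, since a missing identifier cannot establish that two citations are the same publication.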

Data extraction

Data extraction was conducted by two of the authors (JF and LA), who read the full texts and summarized the information in table format in an Excel form. This data extraction form was created for the sole purpose of this systematic review. The form covered the information we wished to extract from the included studies: trial type (drug trial versus non-drug trial), clinical trial name, NCT number, registry name and scope, patient population, and age. In order to specifically look into enrollment and pre-screening rates, we extracted the number of patients identified through the registries, the number of patients eligible for the trial in question, and the number of patients enrolled in the clinical trial. Two enrollment rates were calculated when possible and expressed as percentages: (1) the number of patients enrolled relative to the number identified in the pre-screening process and (2) the number of patients enrolled relative to the number of patients actually eligible after screening.
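The two enrollment rates can be sketched as follows. This is a minimal illustration, assuming each rate is a simple ratio expressed as a percentage; the function name, the output keys, and the example counts are hypothetical and do not come from any included study.

```python
def enrollment_rates(identified, eligible, enrolled):
    """Return the two enrollment rates as percentages (None if not computable)."""
    def pct(numerator, denominator):
        # A rate is only calculated "when possible": both counts reported
        # and a non-zero denominator.
        if numerator is None or denominator is None or denominator == 0:
            return None
        return round(100 * numerator / denominator, 1)

    return {
        # (1) enrolled relative to patients identified through the registry
        "enrolled_of_identified_pct": pct(enrolled, identified),
        # (2) enrolled relative to patients eligible after screening
        "enrolled_of_eligible_pct": pct(enrolled, eligible),
    }


# Hypothetical counts: 200 identified, 50 eligible, 25 enrolled.
print(enrollment_rates(identified=200, eligible=50, enrolled=25))
```

Returning None rather than zero keeps studies that did not report a count distinguishable from studies with a genuinely low rate.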

Results

Review process

A total of 1430 citations were identified through the literature search. Of these, 1369 were excluded based on title and abstract review as they did not meet the inclusion criteria (Table 1). One citation was excluded as a duplicate (Fig. 1). The full texts of the 60 remaining publications were reviewed, and 35 publications were subsequently excluded as they did not meet the inclusion criteria. The remaining 25 publications were assessed using the LISA-R tool and articles of low quality were excluded, leaving 24 included papers (Supplementary material 3a and b). The list of excluded papers is available upon request.
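The counts in the selection process can be checked with a short arithmetic sketch. The function and key names are illustrative assumptions; the number of low-quality exclusions is derived here from the reported totals (25 assessed, 24 included) rather than stated explicitly in the text.

```python
def selection_flow(identified, excluded_title_abstract, duplicates,
                   excluded_full_text, included):
    """Recompute the intermediate counts of the selection process."""
    full_text_reviewed = identified - excluded_title_abstract - duplicates
    quality_assessed = full_text_reviewed - excluded_full_text
    excluded_low_quality = quality_assessed - included  # derived, not reported
    return {
        "full_text_reviewed": full_text_reviewed,
        "quality_assessed": quality_assessed,
        "excluded_low_quality": excluded_low_quality,
    }


flow = selection_flow(identified=1430, excluded_title_abstract=1369,
                      duplicates=1, excluded_full_text=35, included=24)
print(flow)
```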

Fig. 1 Flowchart of the selection process

Included articles—descriptive characteristics

Out of the 24 articles included and reporting the use of a population/disease registry for a clinical trial pre-screening [42,43,44,45,46,47,48,49,50,51,52,53,54,38, 74,75,76,77,78,79]. Our search highlighted that such recruitment registries seem to be extensively used in Alzheimer's disease and dementia research [80,81,82,83,84,85,86,87]. This could explain why patient registries may, surprisingly, not be the first choice for trial pre-screening, as “recruitment registries” are proliferating to support different trials. However, recruitment registries should be carefully considered as they raise ethical concerns. Indeed, they can lead to consenting patients who are already enrolled in other trials, or to relying on records in which changes in patients' disease status have not been updated [88].

In terms of disease areas, 11 of the 24 included studies reported use in either cardiovascular health or oncology. This is aligned with the ICTRP website, which ranks oncology and cardiovascular trials first and third for the number of trials by health category (neuropsychiatric conditions ranking second) [39]. Half of the included studies (12 of 24) reported registry usage for pre-screening of drug trials rather than other interventions [44, 46,47,48, 51, 97]. One might argue that these benefits do not reflect the current reality: since January 2014, a mean of 50,000 clinical trials have been running each year [39], yet only 24 studies between 2014 and 2022 reported using population registries for pre-screening despite the advantages of this method. However, the literature is known to under-report recruitment strategies in clinical trials, from protocols to publications [98, 99]. This limits the available data, as this systematic review only reflects research that reported registry use in a clinical trial pre-screening setting. It is important to consider that more clinical trials may pre-screen and recruit patients from registries without reporting it in either their protocol or their published methodology. This means that registry use for trial pre-screening may be much more widespread than reported in this review. Furthermore, studies reporting the use of population and disease registries for trial pre-screening have failed to address questions around data privacy and protection. The majority of disease registries around the world are accessible to two types of users: patients, who may directly enter information into the registries, and health care professionals. These registries have data agreements in place regarding privacy, sharing, and use, such as data extraction for research purposes. When pre-screening for clinical trials, clinical trial sponsors do request pre-screening logs. This is done for financial reasons, as clinical research teams negotiate in their clinical trial budget to be compensated for the time spent pre-screening patients for a specific trial. Pre-screening logs are provided by sponsors and collect limited data, in compliance with the information privacy regulations that apply locally, such as the General Data Protection Regulation (GDPR) in the European Union. It is essential to continue using tools such as pre-screening logs as buffers that minimize data sharing from registries to sponsors (most often pharmaceutical companies) and maintain compliance with information privacy regulations. The main difference in this respect is between the USA and Europe, as US regulations allow race data to be collected, which is not permitted in Europe. This limits the evaluation of racial representativeness in European clinical trials, which may be biased toward enrolling a vast majority of Caucasian participants.

Finally, as artificial intelligence (AI) develops, studies are now reporting the use of machine learning for patient pre-screening into trials: Su et al. recently cited a pilot trial from the Mayo Clinic in Rochester using an AI-based trial matching system [2]. The paper reported an enrollment increase of 80% due to the quick and accurate matching of patients to the oncology trial run at the Mayo Clinic [2], a system that could be applied to patient registries. Oncology has also brought us algorithms for clinical trial pre-screening, specifically Evolutionary Strategy algorithms (ES algorithms) [100, 101], which are commonly used in machine learning [102]. Ni et al. reported a 450% increase in the efficiency of clinical trial pre-screening using electronic health records rather than a patient registry, despite the fact that 10% of eligible patients were missed in the process [101]. More broadly, data-driven technologies and strategies are increasingly reported in the literature, whether they support prevention, diagnosis, or decision-making [103,104,105,106]. The impact of such strategies on time optimization and the associated cost reduction could be of great help both to small trial centers working with limited staff and resources and to bigger trial centers dealing with a large volume of patients and trials.

Future studies are needed to address the limitations specific to certain disease fields to better describe the disease-specific needs around the use of registries for clinical trial pre-screening.

Conclusion

In conclusion, we aimed to describe the type of registries used, the disease areas, the type of clinical trials linked to the registry-based pre-screening, and the potential assets the method brought to the pre-screening process. Only 24 studies between 2014 and 2022 reported using population and disease registries for clinical trial pre-screening, despite the time optimization and financial advantages of the method. A majority of the registries used were national, and half of the trials for which pre-screening was performed were drug trials. Pre-screening strategies remain under-reported, and the use of population and disease registries for trial pre-screening may be much more widespread than what is described in this review, both for drug trials and non-drug trials. Our review therefore stresses the need for standardized methodological guidelines for clinical trial pre-screening and encourages the reporting of pre-screening processes in trial protocols and publications.