Introduction

In January 2020, the first infections with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) were detected in Germany. Starting with these very first infections, viral sequencing data was used to better understand disease transmission1. With the help of viral sequencing data, it has also been possible to determine the origin and approximate time of the introduction of the virus to, for example, New York City, California, Iceland, and Bavaria2,3,4,5. Similarly, genomic analysis has been applied to trace local infection chains in hospitals and care facilities6,7,8. SARS-CoV-2 remains a daily challenge for healthcare facilities, highlighting the importance of defining transmission pathways to improve the safety of patients and healthcare workers alike. The need to better understand transmission chains argues for the use of viral genome sequencing. However, it is not yet well understood how state-of-the-art viral genome sequencing combined with bioinformatics approaches for transmission tracing compares to conventional, interview-based contact tracing in healthcare institutions. This question extends far beyond the current SARS-CoV-2 pandemic to all common infectious diseases encountered in clinical and hospital settings.

This analysis aimed to understand in-hospital transmission clusters of SARS-CoV-2 at the university medical center of the Technical University of Munich (TUM) (Klinikum rechts der Isar) in Munich, Germany. To this end, SARS-CoV-2 whole viral genome sequencing was combined with two bioinformatically defined clustering approaches, and the resulting clusters were compared to those obtained by interview-based contact tracing.

Methods

Viral whole genome sequences were obtained from residual diagnostic material positive for SARS-CoV-2 by PCR. Samples were collected at Klinikum rechts der Isar in Munich, Germany, from February 3, 2020, to January 10, 2021 (“TUM samples”, Suppl Fig. 1, cf. Suppl Methods). CleanPlex®9 or Artic10 SARS-CoV-2 sequencing panels were used for library preparation. Sequencing was performed on Illumina platforms for a total of 926 samples from 622 probands at three sequencing sites across Germany (cf. Suppl Methods). A proband is defined as any SARS-CoV-2-positive individual included in the study.

Bwa-mem17,18,19 that demonstrate the utility of viral genome sequencing in the identification of transmission clusters within healthcare institutions and beyond and identified divergent clusters between viral genome sequencing and interview-based contact tracing. We found that viral genome clusters derived using two different computational approaches tended to be smaller, to be more closely related genetically, and to span spatially larger portions of the hospital. Similarly, a study using SARS-CoV-2 whole genome sequencing analyses in a tertiary referral hospital in Madrid, Spain, showed that the introduction of five different SARS-CoV-2 strains was responsible for what interview-based contact tracing had assumed to be a homogeneous outbreak due to a single transmission chain17. In addition, local viral genome sequencing data covering the same time period from outside the hospital ruled out the involvement of two cases in the outbreak, due to the high probability of community-acquired infections17. Czech-Sioli et al.8, who investigated 284 samples, came to the similar conclusion that temporally preceding index cases and transmission routes can be missed when using only interview-based contact tracing. Through alignment with GISAID data15, they also showed that placing sequences in a local context is essential to distinguish independent entries from in-hospital transmission. Additionally, as interview-based contact tracing cannot identify cross-infections, it cannot be used to assess the efficacy of containment procedures.

The bioinformatic analysis of sequencing data provides information on transmission pathways that were not previously suspected. These include, for example, staff members from service areas who are not involved in direct patient care and for whom no connection to transmission clusters was expected. Further, by providing largely unbiased, depersonalized information, transmission chain tracing using viral genome sequencing can eliminate the (perceived) denunciation involved in personal contact tracing while at the same time providing more accurate results.

Virus genome sequencing also harbours the potential to better understand cryptic transmission events at micro-scale. Several studies describing patterns of within-host diversity found evidence of co-infections20,21,22 and suggested that co-infection with certain strains might result from infection from two different sources20,21. This could be similar to our observations, where the predominant virus strain changed during the hospital stay in almost 20% of the individuals for whom this data was available. While we cannot fully exclude the possibility of sample mix-ups or cross-contaminations, this surprisingly high number of individuals with more than one viral strain during their hospital treatment highlights the continued importance of individual isolation measures in SARS-CoV-2-infected individuals to limit in-hospital viral persistence and spread.

While the viral genome sequencing approach is likely more precise in identifying in-hospital clusters of SARS-CoV-2 transmission, important limitations still challenge its widespread use. It is not always possible to obtain high-quality genomes with quick turn-around times, due to, for instance, low viral load, technical limitations or high costs. This can result in incomplete datasets and, consequently, incomplete transmission chains. Also, the necessary technical and computational resources are often only available at larger-scale academic institutions, hampering widespread implementation of the approach in routine clinical practice. Lastly, genetic tracing is challenging for newly emerging pathogens with low genetic diversity, as illustrated by our data during the first wave of SARS-CoV-2 infections (January 2020 to June 2020), where it is difficult to distinguish between actual transmission and incidental genetic similarity.
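The difficulty with low genetic diversity can be illustrated with a minimal sketch. It is not the clustering method used in this study; it assumes a simple pairwise SNP-distance threshold with single-linkage grouping, and the function names, the threshold of one SNP, and the ten-base toy sequences are all hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count differing positions between two aligned sequences.

    Positions where either sequence has a non-ACGT character
    (e.g. N or a gap) are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1 for x, y in zip(a, b)
        if x != y and x in "ACGT" and y in "ACGT"
    )


def single_linkage_clusters(seqs: dict[str, str], max_snps: int) -> list[set[str]]:
    """Group samples linked by a chain of pairs that differ by <= max_snps."""
    parent = {name: name for name in seqs}

    def find(name: str) -> str:
        # Union-find root lookup with path compression.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= max_snps:
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(clusters.values(), key=sorted)


# Hypothetical toy data: an "early wave" with almost no diversity ...
early_wave = {
    "P1": "ACGTACGTAC",
    "P2": "ACGTACGTAC",
    "P3": "ACGTACGTAT",
    "P4": "ACGTACGTAC",
}
# ... and a "later wave" in which two lineages have diverged.
later_wave = {
    "P1": "ACGTACGTAC",
    "P2": "ACGTACGTAT",
    "P3": "TCGAACCTAC",
    "P4": "TCGAACCTAG",
}

early_clusters = single_linkage_clusters(early_wave, max_snps=1)
later_clusters = single_linkage_clusters(later_wave, max_snps=1)
```

With these toy inputs, all four early-wave samples fall into a single cluster regardless of whether they were actually linked, whereas the later-wave samples separate into two clusters. Under these assumptions, genetic distance alone carries little information about transmission when circulating genomes are nearly identical.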

In our experience, interview-based contact tracing alone is not sufficient to fully map transmission pathways and clusters in a large university hospital. Complementation with viral genome sequencing data proved very beneficial, especially by highlighting in-hospital transmission chains that were spatially more expansive than expected. This information is of paramount importance for efficiently containing transmission chains.

While the SARS-CoV-2 pandemic provided an impetus for the implementation of viral genome sequencing in clinical practice, the same advantages will also apply to transmission tracing of nearly all other pathogens, and metagenomic approaches can further broaden the scope of pathogens detected.