Abstract
Background
Since the first complete genome sequencing of SARS-CoV-2 in December 2019, more than 550,000 genomes have been submitted to the GISAID database. Sequencing of the SARS-CoV-2 genome might allow the identification of variants with increased contagiousness, different clinical patterns and/or different responses to vaccines. A highly automated next generation sequencing (NGS)-based method might facilitate active genomic surveillance of the virus.
Methods
RNA was extracted from 27 nasopharyngeal swabs obtained from citizens of the Italian Campania region in March–April 2020 who tested positive for SARS-CoV-2. Following viral RNA quantification, sequencing was performed using the Ion AmpliSeq SARS-CoV-2 Research Panel on the Genexus Integrated Sequencer, an automated technology for library preparation and sequencing. The SARS-CoV-2 complete genomes were built using the pipeline SARS-CoV-2 RECoVERY (REconstruction of COronaVirus gEnomes & Rapid analYsis) and analysed by IQ-TREE software.
Results
The complete genome (100%) of SARS-CoV-2 was successfully obtained for 21/27 samples. In particular, the complete genome was fully sequenced for all 15 samples with high viral titer (> 200 copies/µl), for the two samples with a viral genome copy number < 200 but greater than 20, and for 4/10 samples with a viral load < 20 copies/µl. The complete genome sequences were classified into the B.1 and B.1.1 SARS-CoV-2 lineages. In comparison to the reference strain Wuhan-Hu-1, 48 nucleotide variants were observed in total: 26 non-synonymous substitutions, 18 synonymous substitutions and 4 variants in untranslated regions (UTRs). Of the 26 non-synonymous variants, 10 were observed in ORF1ab, 7 in S, 1 in ORF3a, 2 in M and 6 in N genes.
Conclusions
The Genexus system proved successful for SARS-CoV-2 complete genome sequencing, even in samples with low viral copy numbers. The use of this highly automated system might facilitate the standardization of SARS-CoV-2 sequencing protocols and speed up the identification of novel variants during the pandemic.
Background
Coronavirus disease-19 (COVID-19), declared a pandemic by the WHO in March 2020, is an infectious disease caused by the Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2). COVID-19 is a global public health concern because a relatively large fraction of infected people develop a severe and often fatal interstitial pneumonia [1,2,3]. Indeed, since December 2019, SARS-CoV-2 has infected more than 114 million individuals worldwide, causing more than 2.5 million deaths [4].
Italy was the first European country to be hard hit by the SARS-CoV-2 epidemic. The first wave of infection mainly affected the Northern Italian regions, causing thousands of deaths, especially among the most fragile individuals [2].
In contrast, the mean value for samples with high titer was 95.6% (range 80.6–99.8%) with a median of 98.2% (s.d. ± 8).
The complete genome (100%) of SARS-CoV-2 was successfully obtained for 21 samples with a mean coverage > 100× (max coverage 13,393×; min coverage 142×) (s.d. ± 4.378). In particular, the complete genome was fully sequenced for all samples with high viral titer (> 200 copies/µl), for the two samples with a viral genome copy number < 200 but greater than 20, and for 4/10 samples with a viral load < 20 copies/µl (Table 2).
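Genome completeness and mean coverage, as reported above, are commonly derived from per-position read depths along the reference. The sketch below is a minimal illustration, assuming depths are available as a list of integers (one per base of the reference) and using a hypothetical 10× minimum depth; the threshold actually applied by the RECoVERY pipeline is not stated in this section.

```python
def genome_completeness(depths, min_depth=10):
    """Percentage of reference positions with read depth >= min_depth.

    depths: per-position read depths along the reference genome.
    min_depth: hypothetical calling threshold (not taken from the study).
    """
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return 100.0 * covered / len(depths)


def mean_depth(depths):
    """Arithmetic mean read depth over all reference positions."""
    if not depths:
        raise ValueError("depths must not be empty")
    return sum(depths) / len(depths)


# Toy 10-position "genome" with two poorly covered positions.
toy_depths = [150, 142, 300, 0, 5, 900, 1200, 480, 220, 175]
print(genome_completeness(toy_depths))  # 80.0
print(mean_depth(toy_depths))           # 357.2
```

Applied to a 29,903-element depth list for Wuhan-Hu-1, a return value of 100.0 would correspond to a "complete genome (100%)" under the assumed threshold.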
Phylogenetic analysis
Phylogenetic analysis was performed on 21 sequences (Table 3). In comparison to the reference strain Wuhan-Hu-1 (Accession number: NC_045512.2), 48 variants were observed in total: 26 non-synonymous substitutions, 18 synonymous substitutions and 4 variants in untranslated regions (UTRs) (Table 3). Of the 26 non-synonymous variants, 10 were observed in ORF1ab, 7 in S, 1 in ORF3a, 2 in M and 6 in N genes.
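The counts above can be cross-checked: the three effect categories should add up to the 48 variants, and the per-gene figures should add up to the 26 non-synonymous substitutions. A minimal sketch using only the numbers given in the text (the function name `summarize` is illustrative):

```python
from collections import Counter

# Variant counts relative to Wuhan-Hu-1 as reported in the text (Table 3).
by_effect = Counter({"non-synonymous": 26, "synonymous": 18, "UTR": 4})
# Per-gene breakdown of the non-synonymous substitutions only.
nonsyn_by_gene = Counter({"ORF1ab": 10, "S": 7, "ORF3a": 1, "M": 2, "N": 6})


def summarize(by_effect, nonsyn_by_gene):
    """Return the total variant count and each gene's share (%) of the
    non-synonymous substitutions, checking that the sub-totals agree."""
    n_nonsyn = by_effect["non-synonymous"]
    if sum(nonsyn_by_gene.values()) != n_nonsyn:
        raise ValueError("per-gene counts do not sum to the non-synonymous total")
    shares = {gene: 100 * n / n_nonsyn for gene, n in nonsyn_by_gene.items()}
    return sum(by_effect.values()), shares


total, shares = summarize(by_effect, nonsyn_by_gene)
print(total)  # 48
for gene, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{gene}: {pct:.0f}%")
```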
ORF1ab is the longest gene of the CoV genome (21,289 of 29,903 bp) and encodes polyproteins that are cleaved into 16 non-structural proteins (NSP1–NSP16). Among the NSPs, NSP2 and NSP3 carried the highest number of variants, with 5 and 7 mutations, respectively.
The most common variants were C241T in the 5′ UTR, C3037T in NSP3 and C14408T in NSP12 within ORF1ab, and D614G in the S protein. We compared the variants found in our complete genomes with those reported in the SARS-CoV-2 Mutation Browser v-1.3 database, which contains sequence analyses of 10,416 SARS-CoV-2 strains from 111 locations and 6678 mutating positions. Interestingly, 12 variants had not been detected before, of which 5 were in ORF1ab and 4 in the S gene (Table 3). We confirmed 11 of the 12 previously unreported variants by resequencing samples with available RNA on an S5 XL instrument (Additional file 1: Table S1); resequencing of sample p12, which carries the F2L variant in the S protein, failed. In addition, the variants F561 and E565K in ORF1ab NSP2, K356R in the spike gene and V77F in ORF3a were also confirmed by Sanger sequencing in samples with available RNA.
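Variants written as `C241T` encode the reference base, the 1-based position on Wuhan-Hu-1 and the alternative base, so the affected region can be looked up from the reference annotation. The sketch below assumes the gene coordinates of the NCBI RefSeq record NC_045512.2 as listed in the code (they should be verified against the record before reuse); the helper names are hypothetical. The codon calculation assumes a single reading frame, so it holds for S but not downstream of the ORF1ab ribosomal frameshift.

```python
import re

# Gene coordinates on Wuhan-Hu-1 (NC_045512.2), 1-based and inclusive, as
# given in the RefSeq annotation; verify against the record before reuse.
# ORF7a and ORF7b overlap by a few bases; the first match wins.
REGIONS = [
    (1, 265, "5'UTR"),
    (266, 21555, "ORF1ab"),
    (21563, 25384, "S"),
    (25393, 26220, "ORF3a"),
    (26245, 26472, "E"),
    (26523, 27191, "M"),
    (27202, 27387, "ORF6"),
    (27394, 27759, "ORF7a"),
    (27756, 27887, "ORF7b"),
    (27894, 28259, "ORF8"),
    (28274, 29533, "N"),
    (29558, 29674, "ORF10"),
    (29675, 29903, "3'UTR"),
]

SNV = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def region_of(position):
    """Return the annotated region containing a 1-based genome position."""
    for start, end, name in REGIONS:
        if start <= position <= end:
            return name
    return "intergenic"


def annotate(variant):
    """Split a substitution such as 'C241T' into (ref, position, alt, region)."""
    m = SNV.match(variant)
    if m is None:
        raise ValueError(f"not a single-nucleotide substitution: {variant!r}")
    ref, pos, alt = m.group(1), int(m.group(2)), m.group(3)
    return ref, pos, alt, region_of(pos)


def codon_number(position, cds_start):
    """1-based codon index of a position in a CDS read in a single frame."""
    return (position - cds_start) // 3 + 1


for v in ["C241T", "C3037T", "C14408T", "A23403G", "G28882A"]:
    print(v, annotate(v)[3])
print(codon_number(23403, 21563))  # 614, the codon altered by D614G
```

The last line illustrates why the nucleotide change A23403G is reported at the protein level as D614G.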
According to the GISAID classification, the variants C241T, C3037T and A23403G are the most commonly detected in SARS-CoV-2 isolates throughout Europe. These mutations are characteristic of clade G, which comprises the large Italian outbreak (since 29/01/2020 and still ongoing). Seventeen Italian complete genomes from this study carried all three mutations, indicating their classification in clade G, lineage B.1. The remaining four sequences carried an additional mutation (G28882A), indicating their classification in clade GR, lineage B.1.1, which originated from B.1 and is also mostly reported in Europe and in Italy. The classification within the G and GR clades was confirmed by phylogenetic analysis. The maximum likelihood tree showed 9 clusters within the G clade and 3 clusters within the GR clade.
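The marker-based reasoning in the paragraph above (three shared markers for G, plus G28882A for GR) can be written as a simple rule. This is a sketch following only the markers named in the text; GISAID maintains its own clade definitions, so this is not GISAID's classifier, and the function name is hypothetical.

```python
# Simplified marker rule following the description in the text; GISAID's
# actual clade definitions may include further markers.
G_MARKERS = frozenset({"C241T", "C3037T", "A23403G"})
GR_MARKERS = G_MARKERS | {"G28882A"}


def assign_clade(variants):
    """Return 'GR', 'G' or 'other' for a collection of SNV labels."""
    found = set(variants)
    if GR_MARKERS <= found:
        return "GR"
    if G_MARKERS <= found:
        return "G"
    return "other"


genome_a = {"C241T", "C3037T", "C14408T", "A23403G"}
genome_b = genome_a | {"G28882A"}
print(assign_clade(genome_a))  # G
print(assign_clade(genome_b))  # GR
```

Testing the more specific GR rule first ensures that a genome carrying all four markers is not reported as plain G.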
Within the G clade, 13 sequences did not cluster with any other sequence. Three were reported from the municipality of Napoli and did not cluster with sequences from this study (p01, p12, p38).
One cluster included p17 and p07, both from the municipality of Napoli, while another included p40 and p04, which came from different municipalities. The largest cluster of sequences from this study contained 6 sequences: p11, p03, p41, p06 and p59 from Napoli, and p37 from the province of Caserta, which fell outside the sub-cluster. Two clusters included sequences reported in other Italian regions. The p44 strain clustered with hCoV-19/Italy/LAZ-INMI-8/2020 (EPI_ISL_424342) from Central Italy (Lazio region), and with hCoV-19/Italy/VEN-UniVR-6/2020 (EPI_ISL_492985) and hCoV-19/Italy/LOM-UniMI-L160/2020 (EPI_ISL_542155), collected at the beginning of March in Northern Italy (Veneto and Lombardia regions).
Sequences p19 and p24 were phylogenetically related to sequences from the Lazio, Marche and Abruzzo regions collected in March 2020: hCoV-19/Italy/LAZ-INMI-9/2020 (EPI_ISL_424343), hCoV-19/Italy/MAR-UnivPM-78955-2/2020 (EPI_ISL_516088), hCoV-19/Italy/LAZ-INMI11-B/2020 (EPI_ISL_451304) and hCoV-19/Italy/ABR-IZSGC-TE7097/2020 (EPI_ISL_528929); p34 and hCoV-19/Italy/ABR-IZSGC-TE5472/2020 (EPI_ISL_420564) fell outside this cluster.
Within the GR clade, one sequence (p39) did not cluster, while p42 clustered with hCoV-19/Italy/LOM-UniMI-L182/2020 (EPI_ISL_542173) from Northern Italy (Lombardia region) and hCoV-19/Italy/SAR-AMVRC-28/2020 (EPI_ISL_458085) from Sardinia. Sequences p31 and p45, from different municipalities, clustered together.
The variants of p03 and p34, who had a history of contact with each other, were inspected. The p34 sample shared two variants, P512 in ORF1ab NSP3 and D3G in M, with samples p19 and p24, whereas p03 did not carry these mutations, supporting a different origin of infection in the two cases. Based on the phylogenetic analysis, related sequences were found in different municipalities, and the large municipality of Napoli showed the circulation of several viral sequences with point mutations shared among strains that were or were not phylogenetically related. As reported in Table 1, p03 and p34 declared close contact with each other; however, they occupied different positions in the phylogenetic tree and showed different point mutations (Fig. 1).
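Comparing two epidemiologically linked cases, as done for p03 and p34, amounts to set operations on their variant lists: markers common to the whole lineage are uninformative, while private variants distinguish the two infections. The sketch below uses hypothetical variant sets; apart from "M:D3G" (named in the text for p34), the labels `private_x` and `private_y` are placeholders, not data from the study.

```python
def compare_variants(a, b):
    """Return (shared, only_in_a, only_in_b, n_differences) for two variant sets."""
    a, b = set(a), set(b)
    return a & b, a - b, b - a, len(a ^ b)


# Hypothetical variant sets: the four common markers listed in the text,
# "M:D3G" from p34, and placeholders for mutations not listed here.
p34 = {"C241T", "C3037T", "C14408T", "A23403G", "M:D3G", "private_x"}
p03 = {"C241T", "C3037T", "C14408T", "A23403G", "private_y"}

shared, only_p34, only_p03, n_diff = compare_variants(p34, p03)
print(sorted(only_p34))  # ['M:D3G', 'private_x']
print(n_diff)            # 3
```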
Fig. 1 Maximum likelihood tree built using 133 SARS-CoV-2 complete genomes downloaded from the GISAID database and the 21 Italian strains from this study (shown in bold). The tree was built with IQ-TREE using the best-fit model selected by ModelFinder, as implemented in IQ-TREE, and 1000 bootstrap replicates. Bootstrap values > 70 are shown at the nodes.
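A tree like the one described in the caption could be produced with an IQ-TREE call combining ModelFinder model selection and bootstrapping. The sketch below only builds the command line; it assumes IQ-TREE 1.x option names (`-s`, `-m MFP`, `-b` for standard and `-bb` for ultrafast bootstrap) and a hypothetical alignment file name. The IQ-TREE version and bootstrap type used in the study are not stated here.

```python
import shlex


def iqtree_command(alignment, replicates=1000, ultrafast=False):
    """Build an IQ-TREE 1.x command line: ModelFinder model selection
    (-m MFP) plus standard (-b) or ultrafast (-bb) bootstrap."""
    cmd = ["iqtree", "-s", alignment, "-m", "MFP"]
    cmd += ["-bb" if ultrafast else "-b", str(replicates)]
    return cmd


cmd = iqtree_command("campania_gisaid_aligned.fasta")  # hypothetical file name
print(shlex.join(cmd))
# iqtree -s campania_gisaid_aligned.fasta -m MFP -b 1000
```

Returning a list rather than a string lets the command be passed directly to `subprocess.run(cmd, check=True)` on a machine where IQ-TREE is installed, without shell quoting issues.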
Discussion
Since the first complete genome sequencing of SARS-CoV-2 on 31 December 2019 and the first case of COVID-19 in Italy [17], more than 550,000 complete genomes have been sequenced worldwide and released on the GISAID database within 1 year of the pandemic. To date, NGS is the most widely used and successful method for obtaining complete genomes. Here we present complete sequencing of SARS-CoV-2 genomes using the Ion Torrent Genexus System, a highly automated sequencer. In this study, 21 of the 27 SARS-CoV-2 RNA samples tested with the Genexus System were fully sequenced.
The 6 samples that were not fully sequenced had copy numbers below the limit of quantification of the Real Time PCR assay (20 copies), and 3 of them had too little RNA to be quantified by Qubit. These results suggest that the amount of viral RNA in these samples was close to the sequencing limit of the amplicon-based AmpliSeq technology. However, four samples were fully sequenced despite low copy numbers and RNA amounts too low to be detected by Qubit, suggesting that other factors related to sample quality may also affect the success of library preparation and of the sequencing reaction.
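The relationship between viral load and sequencing success can be summarized per stratum using the counts given in the Results (15/15 above 200 copies/µl, 2/2 between 20 and 200, 4/10 below 20), together with the 3 further samples reported as nearly complete (> 95%). A minimal sketch; the dictionary layout and function name are illustrative:

```python
# (fully sequenced, tested) per viral-load stratum, as reported in the text.
STRATA = {
    "> 200 copies/µl": (15, 15),
    "20-200 copies/µl": (2, 2),
    "< 20 copies/µl": (4, 10),
}
NEAR_COMPLETE = 3  # additional samples with > 95% of the genome recovered


def success_summary(strata):
    """Per-stratum success (%) plus overall successes, total and success (%)."""
    per_stratum = {k: 100 * ok / n for k, (ok, n) in strata.items()}
    ok_total = sum(ok for ok, _ in strata.values())
    n_total = sum(n for _, n in strata.values())
    return per_stratum, ok_total, n_total, 100 * ok_total / n_total


per_stratum, ok_total, n_total, overall = success_summary(STRATA)
print(f"{ok_total}/{n_total} complete ({overall:.1f}%)")  # 21/27 complete (77.8%)
print(f"{100 * (ok_total + NEAR_COMPLETE) / n_total:.1f}% at > 95% completeness")  # 88.9% ...
```

The overall figure rounds to the 78% quoted in the Discussion, and the per-stratum view makes explicit that all failures fell in the < 20 copies/µl group.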
Nevertheless, the results highlight the potential of the Genexus System: (i) users are involved only in sample preparation and quantification; (ii) the automated process allows users to focus on checking the raw NGS data and on the subsequent bioinformatic analysis; (iii) the technology achieved complete genome sequencing for 78% of the samples (21/27) obtained from the routine SARS-CoV-2 diagnostic process, with 3 additional samples showing nearly complete genome sequences (> 95%). Moreover, the Genexus System can sequence 32 multiplexed samples in less than 24 h, making it a useful method for SARS-CoV-2 surveillance during a pandemic. Compared with Sanger sequencing, the Genexus System could allow several viral strains to be analysed within a few days, for example from patients with an abnormal clinical presentation such as very late viral clearance, or to identify possible new variants when infections increase rapidly.
The complete genome sequences obtained were phylogenetically analysed and compared with 133 Italian complete genomes from the GISAID database collected in the same period as the samples analysed in this study. The sequences reported here were collected in March and April 2020 from nasopharyngeal swabs from Napoli province and nearby towns in the Campania region. They clustered within two lineages, B.1 and B.1.1, which were the most frequently detected in Italy and worldwide since February 2020, when the D614G mutation of the B.1 lineage was first reported.
Within B.1, the Italian sequences formed 9 clusters, while another 3 clusters fell within B.1.1, confirming the heterogeneity of the circulating strains. In addition, most of the sequences under study (14/21) were collected in Napoli and formed 7 different clusters, consistent with an area of high population density. Two clusters (those including p19 and p44) contained strains from this study that were phylogenetically related to SARS-CoV-2 sequences reported in other Italian regions, suggesting a common origin of these viral strains. The cluster formed by the p40 and p04 sequences also showed the circulation of related sequences in different municipalities, suggesting inter-municipality commuting.
To date, the only differences reported among SARS-CoV-2 strains are point mutations, with the exception of the deletions described in the B.1.1.7 (United Kingdom) and P.1 (Brazil) variants, the former first reported in December 2020. The point mutations reported in this study and shared by all complete genomes are those associated with the B.1 and B.1.1 lineages, such as D614G.
Other mutations were found in only one or two sequences and had never been reported before. Since these mutations have not become fixed in the viral population, we can only speculate about a possible correlation between these SNPs and viral adaptation. The regions with the highest number of point mutations were the spike gene (11), NSP3 (6) and N (7), three regions under positive selection [18, 19].
Interestingly, two patients who tested positive after a business meeting (p03 and p34) carried genomic sequences that were not closely related, suggesting different origins of infection. Sequencing of the viral genome can therefore also help clarify the dynamics of infection spread in hospitals and in communities.
Since the appearance of SARS-CoV-2, several mutations have been reported in the spike gene, and novel mutations are continuously described [10, 20,21,22,23]. Selected mutations, such as D614G, might provide an advantage to the virus by increasing cellular infectivity and virus transmissibility. Recently, an N501Y mutation in the spike gene has been reported [24] to confer a higher affinity for the human ACE2 protein than D614G. Surveillance of circulating strains is highly relevant today because novel variants carry multiple mutations in the spike glycoprotein, the key target of virus-neutralizing antibodies, raising concerns about vaccine efficacy against these strains.
Conclusions
During a pandemic, surveillance of circulating strains is crucial to understand the evolution of viral strains and the emergence of novel variants. This study is a first observation of variants detected in the Campania region, a region less affected than the Northern Italian regions in the first phase of the pandemic in Italy. In particular, we reported the circulation of different variants within the Napoli province and the heterogeneity of strains circulating between and within municipalities. In addition, a novel automated technology such as the Ion Torrent Genexus Integrated System allowed complete genome sequencing even of samples with relatively low viral titer in a relatively short timeframe, thus facilitating the continuous surveillance of novel variants.
Availability of data and materials
The datasets used and/or analyzed during the current study are available from GISAID database (https://www.gisaid.org).
Abbreviations
- COVID-19: Coronavirus disease-19
- SARS-CoV-2: Severe Acute Respiratory Syndrome coronavirus 2
- NGS: Next Generation Sequencing
- MERS-CoV: Middle East Respiratory Syndrome Coronavirus
- ORFs: Open reading frames
- S: Spike
- E: Envelope
- M: Membrane
- N: Nucleocapsid
- FAM: 6-Carboxyfluorescein
- ROX: 6-Carboxy-X-rhodamine
- LOD: Limit of detection
- RECoVERY: REconstruction of COronaVirus gEnomes & Rapid analysis
- UTRs: Untranslated regions
- NSP: Non-structural proteins
- ACE2: Angiotensin-converting enzyme 2
References
Chan JFW, Yuan S, Kok KH, To KKW, Chu H, Yang J, et al. A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster. Lancet. 2020;395(10223):514–23.
Li Q, Guan X, Wu P, Wang X, Zhou L, Tong Y, et al. Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia. N Engl J Med. 2020;382(13):1199–207.
Lipsitch M, Swerdlow DL, Finelli L. Defining the epidemiology of Covid-19—studies needed. N Engl J Med. 2020;382(13):1194–6.
Coronavirus Resource Center. https://coronavirus.jhu.edu/map.html. Accessed Mar 2.
Cereda D, Tirani M, Rovida F, Demicheli V, Ajelli M, Poletti P, Trentini F, Guzzetta G, Marziano V, Barone A, Magoni M, Deandrea S, Diurno G, Lombardo M, Faccini M, Pan A, Bruno R, Pariani E, Grasselli G, Piatti A, Gramegna M, Baldanti F, Melegaro A, Merler S. The early phase of the COVID-19 outbreak in Lombardy, Italy. arXiv:2003.09320 [q-bio.PE]. 2020.
Lu R, Zhao X, Li J, Niu P, Yang B, Wu H, et al. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. Lancet. 2020;395(10224):565–74.
Bar-On YM, Flamholz A, Phillips R, Milo R. SARS-CoV-2 (COVID-19) by the numbers. Elife. 2020;9:e57309.
Rambaut A, Holmes EC, O’Toole A, Hill V, McCrone JT, Ruis C, et al. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat Microbiol. 2020;5(11):1403–7.
Korber B, Fischer WM, Gnanakaran S, Yoon H, Theiler J, Abfalterer W, et al. Tracking changes in SARS-CoV-2 spike: evidence that D614G increases infectivity of the COVID-19 virus. Cell. 2020;182(4):812–27.e19.
Plante JA, Liu Y, Liu J, Xia H, Johnson BA, Lokugamage KG, et al. Spike mutation D614G alters SARS-CoV-2 fitness. Nature. 2021;592(7852):116–21.
Tegally H, Wilkinson E, Giovanetti M, Iranzadeh A, Fonseca V, Giandhari J, et al. Emergence and rapid spread of a new severe acute respiratory syndrome-related coronavirus 2 (SARS-CoV-2) lineage with multiple spike mutations in South Africa. medRxiv. 2020. https://doi.org/10.1101/2020.12.21.20248640.
Faria NR, Claro IM, Candido D, Moyses Franco LA, Andrade PS, Coletti TM, Silva CAM, Sales FC, Manuli ER, Aguiar RS, Gaburo N, Camilo CdC, Fraiji NA, Esashika Crispim MA, Carvalho MdPSS, Rambaut A, Loman N, Pybus OG, Sabino EC, on behalf of CADDE Genomic Network. Genomic characterisation of an emergent SARS-CoV-2 lineage in Manaus: preliminary findings. 2021. https://virological.org/t/genomic-characterisation-of-an-emergent-sars-cov-2-lineage-in-manaus-preliminary-findings/586.
De Sabato L, Vaccari G, Knijn A, Ianiro G, Di Bartolo I, Morabito S. SARS-CoV-2 RECoVERY: a multi-platform open-source bioinformatic pipeline for the automatic construction and analysis of SARS-CoV-2 genomes from NGS sequencing data. bioRxiv. 2021. https://doi.org/10.1101/2021.01.16.425365.
Nguyen LT, Schmidt HA, von Haeseler A, Minh BQ. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol. 2015;32(1):268–74.
Rakha A, Rasheed H, Batool Z, Akram J, Adnan A, Du J. COVID-19 variants database: a repository for human SARS-CoV-2 polymorphism data. bioRxiv. 2020. https://doi.org/10.1101/2020.06.10.145292.
Capobianchi MR, Rueca M, Messina F, Giombini E, Carletti F, Colavita F, et al. Molecular characterization of SARS-CoV-2 from the first case of COVID-19 in Italy. Clin Microbiol Infect. 2020;26(7):954–6.
Lo Presti A, Rezza G, Stefanelli P. Selective pressure on SARS-CoV-2 protein coding genes and glycosylation site prediction. Heliyon. 2020;6(9):e05001.
Benvenuto D, Giovanetti M, Ciccozzi A, Spoto S, Angeletti S, Ciccozzi M. The 2019-new coronavirus epidemic: evidence for virus evolution. J Med Virol. 2020;92(4):455–9.
Saha P, Banerjee AK, Tripathi PP, Srivastava AK, Ray U. A virus that has gone viral: amino acid mutation in S protein of Indian isolate of coronavirus COVID-19 might impact receptor binding, and thus, infectivity. Biosci Rep. 2020. https://doi.org/10.1042/BSR20201312.
Dawood AA. Mutated COVID-19 may foretell a great risk for mankind in the future. New Microbes New Infect. 2020;35:100673.
Sheikh JA, Singh J, Singh H, Jamal S, Khubaib M, Kohli S, et al. Emerging genetic diversity among clinical isolates of SARS-CoV-2: lessons for today. Infect Genet Evol. 2020;84:104330.
van Dorp L, Acman M, Richard D, Shaw LP, Ford CE, Ormond L, et al. Emergence of genomic diversity and recurrent mutations in SARS-CoV-2. Infect Genet Evol. 2020;83:104351.
Mathavan S, Kumar S. Evaluation of the effect of D614g, N501y and S477n mutation in Sars-Cov-2 through computational approach. Preprints. 2020. https://doi.org/10.20944/preprints202012.0710.v1.
Acknowledgements
Not applicable.
Funding
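Project “Caratterizzazione bio-molecolare del virus SARS-CoV-2 e dei cofattori dell'infiammazione implicati nella patogenesi della covid-19” (bio-molecular characterization of the SARS-CoV-2 virus and of the inflammatory cofactors involved in the pathogenesis of COVID-19), funded by the Campania Region. NGS reagents for Genexus were kindly provided by Thermo Fisher Scientific.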
Author information
Contributions
AMR, CR, RP, FB: NGS sequencing; MC, MF, DT: isolation of viral RNA; LDS: phylogenetic analysis; EC, GB, GP, GV, NN: study conception and design; LDS, AMR, GV, NN: analysis and interpretation of data; AMR, LDS, GV, NN: wrote the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
The research protocol was approved by the Institutional Review Board (IRB) of the Istituto Nazionale Tumori “Fondazione Giovanni Pascale”.
Consent for publication
All authors approved the manuscript and agreed to its publication.
Competing interests
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1: Table S1.
Confirmation of unreported variants by S5 and Sanger sequencing.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Rachiglio, A.M., De Sabato, L., Roma, C. et al. SARS-CoV-2 complete genome sequencing from the Italian Campania region using a highly automated next generation sequencing system. J Transl Med 19, 246 (2021). https://doi.org/10.1186/s12967-021-02912-4
DOI: https://doi.org/10.1186/s12967-021-02912-4