Background

Coronavirus disease 2019 (COVID-19), declared a pandemic by the WHO in March 2020, is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). COVID-19 represents a global public health concern because a relatively large fraction of infected people develop a severe and often fatal interstitial pneumonia [1,2,3]. Indeed, since December 2019, SARS-CoV-2 has infected more than 114 million individuals worldwide, causing more than 2.5 million deaths [4].

Italy was the first European country to be hit hard by the SARS-CoV-2 epidemic. The first wave of infection mainly affected the Northern Italian regions, causing thousands of deaths, especially among the most fragile individuals [2). In contrast, the mean value for samples with high titer was 95.6% (range 80.6–99.8%) with a median of 98.2% (s.d. ± 8).

The complete genome (100%) of SARS-CoV-2 was successfully obtained for 21 samples with a mean coverage > 100× (max coverage 13,393×; min coverage 142×) (s.d. ± 4.378). In particular, the complete genome was fully sequenced for all samples with high viral titer (> 200 copies), for the two samples with a viral genome copy number < 200 but greater than 20, and for 4/10 samples with viral load < 20 (Table 2).
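The per-stratum counts above can be tallied in a few lines. The following is only an illustrative sketch: the stratum labels are ours, and the number of high-titer samples (15) is not stated in this section but inferred from the totals (21 complete genomes minus 2 minus 4), so it should be read as an assumption.

```python
# (fully sequenced, tested) per viral-load stratum, in genome copies.
# The high-titer count (15) is inferred as 21 - 2 - 4; it is an assumption.
strata = {
    "> 200 copies": (15, 15),
    "20-200 copies": (2, 2),
    "< 20 copies": (4, 10),
}


def success_rates(counts):
    """Return {stratum: percent fully sequenced}, rounded to one decimal."""
    return {
        name: round(100 * done / tested, 1)
        for name, (done, tested) in counts.items()
    }


rates = success_rates(strata)
total_done = sum(done for done, _ in strata.values())
total_tested = sum(tested for _, tested in strata.values())

for name, pct in rates.items():
    print(f"{name}: {pct}% fully sequenced")
print(f"overall: {total_done}/{total_tested}")
```

Under these assumptions, the only stratum with incomplete genomes is the one below 20 copies.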

Phylogenetic analysis

Phylogenetic analysis was performed on 21 sequences (Table 3). In comparison to the reference strain Wuhan-Hu-1 (Accession number: NC_045512.2), 48 variants were observed in total: 26 non-synonymous substitutions, 18 synonymous substitutions and 4 in untranslated regions (UTRs) (Table 3). Of the 26 non-synonymous substitutions, 10 were observed in ORF1ab, 7 in S, 1 in ORF3a, 2 in M and 6 in N.
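Nucleotide variants such as C241T are named as reference base, 1-based position in the reference genome, and observed base. As a minimal sketch of that naming convention (not the variant-calling pipeline used in this study), the hypothetical helper below lists substitutions between two sequences that are assumed to be already aligned; the 12-nt sequences are invented for illustration.

```python
def list_substitutions(reference, sample):
    """Return substitutions as strings like 'C241T' (1-based positions).

    Both sequences must already be aligned and of equal length. Gaps ('-')
    and ambiguous bases (e.g. 'N') are skipped rather than reported.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    variants = []
    pairs = zip(reference.upper(), sample.upper())
    for pos, (ref_base, alt_base) in enumerate(pairs, start=1):
        if ref_base != alt_base and ref_base in "ACGT" and alt_base in "ACGT":
            variants.append(f"{ref_base}{pos}{alt_base}")
    return variants


# Toy sequences, invented for illustration (not real SARS-CoV-2 data).
toy_reference = "ATGCCGTTAGCA"
toy_sample = "ATGTCGTTAGGA"
print(list_substitutions(toy_reference, toy_sample))  # ['C4T', 'C11G']
```

Classifying each substitution as synonymous, non-synonymous or UTR additionally requires the gene annotation of the reference, which is outside the scope of this sketch.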

Table 3 Variants of Italian SARS-CoV-2 complete genome

ORF1ab is the longest gene of the CoV genome (21,289/29,903 bp) and encodes a polyprotein that is cleaved into non-structural proteins (NSP1-NSP16). Among the NSPs, NSP2 and NSP3 had the highest number of variants, with 5 and 7 mutations, respectively.

The most common variants were C241T and C3037T in NSP3, C14408T in NSP12 within ORF1ab, and D614G within the S protein. We compared the variants found in our complete genomes with those reported in the SARS-CoV-2 Mutation Browser v-1.3 database, which contains sequence analysis of 10,416 SARS-CoV-2 strains from 111 locations and 6678 mutating positions. Interestingly, 12 variants had never been detected before, of which 5 were in ORF1ab and 4 in the S gene (Table 3). We confirmed 11 of the 12 previously unreported variants by resequencing samples with available RNA on an S5 XL apparatus (Additional file 1: Table S1). Sequencing of the p12 sample, which carries the F2L variant in the S protein, failed. In addition, the variants F561, E565K in ORF1ab NSP2, K356R in the spike gene and V77F in ORF3 were also confirmed by Sanger sequencing in samples with available RNA.

Following the GISAID classification, the variants C241T, C3037T and A23403G are the most commonly detected in SARS-CoV-2 isolates throughout Europe. These mutations are characteristic of clade G, which comprises the large Italian outbreak (since 29/01/2020 and still ongoing). Seventeen Italian complete genomes from this study showed all three mutations, suggesting their classification in clade G, lineage B.1. In addition, four sequences showed a further mutation (G28882A), suggesting their classification in clade GR, lineage B.1.1, which originated from B.1 and is another clade mostly reported in Europe and in Italy. The classification within the G and GR clades was confirmed by phylogenetic analysis. The maximum likelihood tree showed 9 clusters within the G clade and 3 clusters within the GR clade.
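The marker-based rule described above can be written as a small decision function. This is a sketch that mirrors only the rule stated in this paragraph (three G markers, plus G28882A for GR); it is not GISAID's full clade definition, and the function name and the example mutation lists are hypothetical.

```python
# Marker mutations as named in the text (nucleotide-level notation).
G_MARKERS = {"C241T", "C3037T", "A23403G"}
GR_EXTRA_MARKER = "G28882A"


def assign_clade(mutations):
    """Assign 'G', 'GR' or 'other' from a collection of mutation names."""
    observed = set(mutations)
    if not G_MARKERS <= observed:
        # At least one of the three G markers is missing.
        return "other"
    return "GR" if GR_EXTRA_MARKER in observed else "G"


# Hypothetical mutation lists, for illustration only.
print(assign_clade(["C241T", "C3037T", "C14408T", "A23403G"]))  # G
print(assign_clade(["C241T", "C3037T", "A23403G", "G28882A"]))  # GR
print(assign_clade(["C3037T", "A23403G"]))                      # other
```

A marker-based assignment of this kind is only a first pass; as in the text, it should be confirmed by phylogenetic analysis.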

Within the G clade, 13 sequences did not cluster with any other sequence. Three were reported from the municipality of Napoli and did not cluster with sequences from this study (p01, p12, p38).

One cluster showed sequences from the municipality of Napoli, p17 and p07, while p40 and p04 were from different municipalities. The major cluster with sequences from this study had 6 sequences represented by p11, p03, p41, p06, p59 from Napoli and the p37 sequence out of the sub-cluster from the province of Caserta. Two clusters showed the presence of sequences reported in other Italian regions. The p44 strain clustered with hCoV-19/Italy/LAZ-INMI-8/2020 (EPI_ISL_424342) from Central Italy (Lazio region), hCoV-19/Italy/VEN-UniVR-6/2020 (EPI_ISL_492985) and hCoV-19/Italy/LOM-UniMI-L160/2020 (EPI_ISL_542155) collected at the beginning of March in Northern Italy (Veneto and Lombardia regions).

The p19 and p24 showed evolutionary correlation with sequences from Lazio, Marche and Abruzzo regions collected in March 2020: hCoV-19/Italy/LAZ-INMI-9/2020 (EPI_ISL_424343), hCoV-19/Italy/MAR-UnivPM-78955-2/2020 (EPI_ISL_516088), hCoV-19/Italy/LAZ-INMI11-B/2020 (EPI_ISL_451304), hCoV-19/Italy/ABR-IZSGC-TE7097/2020 (EPI_ISL_528929) with p34 and hCoV-19/Italy/ABR-IZSGC-TE5472/2020 (EPI_ISL_420564) out of the former cluster.

Within the clade GR, one sequence did not cluster (p39) while p42 clustered with hCoV-19/Italy/LOM-UniMI-L182/2020 (EPI_ISL_542173) from Northern Italy (Lombardia region) and hCoV-19/Italy/SAR-AMVRC-28/2020 (EPI_ISL_458085) from Central Italy (Sardinia region). The p31 and p45 from different municipalities clustered together.

The variants shared by p03 and p34, who had a history of contact, were inspected. The p34 sample shared two variants, P512 in ORF1ab NSP3 and D3G in M, with the p19 and p24 samples, whereas p03 did not show these mutations, confirming the different origin of viral infection in the two cases. Based on the phylogenetic analysis, related sequences were reported from different municipalities, and the large municipality of Napoli showed the circulation of several viral sequences, with point mutations shared among strains whether or not they were phylogenetically correlated. In addition, as reported in Table 1, p03 and p34 declared close contact with each other; however, they clustered in different positions within the phylogenetic tree and showed different point mutations (Fig. 1).

Fig. 1

Maximum likelihood tree built using 133 SARS-CoV-2 complete genomes downloaded from the GISAID database and 21 Italian strains, reported in bold. The phylogenetic tree was built with IQ-TREE, using the best-fit model indicated by ModelFinder, as implemented in IQ-TREE, and 1000 bootstrap replicates. Bootstrap values > 70 are reported at nodes
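For readers who wish to run a comparable analysis, the sketch below assembles an IQ-TREE command line matching the settings named in the caption. Several points are assumptions rather than details from this study: the executable name (`iqtree`), the alignment file name, and the use of the standard nonparametric bootstrap (`-b`), since the caption does not say whether standard or ultrafast bootstrap was used. The command is only built and printed, not executed.

```python
import shlex


def iqtree_command(alignment, replicates=1000, executable="iqtree"):
    """Build an IQ-TREE argument list.

    -s   input alignment
    -m   MFP: ModelFinder model selection followed by tree inference
    -b   number of standard nonparametric bootstrap replicates
    """
    return [executable, "-s", alignment, "-m", "MFP", "-b", str(replicates)]


# "alignment.fasta" is a hypothetical file name.
cmd = iqtree_command("alignment.fasta")
print(shlex.join(cmd))  # iqtree -s alignment.fasta -m MFP -b 1000
```

If the ultrafast bootstrap is preferred, the corresponding option is `-bb` in IQ-TREE 1.x and `-B` in IQ-TREE 2.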

Discussion

Since the first complete genome sequencing of SARS-CoV-2 on 31 December 2019, and the first case of COVID-19 in Italy [17], more than 550,000 complete genomes have been sequenced worldwide and released on the GISAID database after 1 year of pandemic. To date, NGS is the most widely used and successful method for obtaining complete genomes. We present here the complete sequencing of SARS-CoV-2 genomes using the Ion Torrent Genexus System, a highly automated sequencer. In this study, 21 out of 27 SARS-CoV-2 RNA samples tested with the Genexus System were fully sequenced.

The 6 samples that were not fully sequenced had a copy number below the limit of quantification of the Real Time PCR assay (20 copies), and 3 of them had too little RNA for Qubit quantification. These results suggest that the amount of viral RNA in these samples was close to the sequencing limit of the amplicon-based AmpliSeq technology. However, despite a low copy number and an RNA amount too low to be detected by the Qubit, four samples were fully sequenced, suggesting that other factors related to sample quality may also affect the success of library preparation and of the sequencing reaction.

The results obtained nevertheless highlight the potential of the Genexus System: (i) users are involved only in sample preparation and quantification; (ii) the automated process allows users to focus on checking the NGS raw data and on the subsequent bioinformatic analysis; (iii) the technology allowed complete genome sequencing of 78% of the samples (21/27) obtained from the routine SARS-CoV-2 diagnostic process, with 3 additional samples showing nearly complete sequencing of the genome (> 95%). Moreover, the Genexus System permits the sequencing of 32 multiplexed samples in less than 24 h, making it a useful method for SARS-CoV-2 surveillance during a pandemic. Compared with Sanger sequencing, the Genexus System could be used to analyse within a few days several viral strains from patients with an abnormal clinical presentation, such as very late viral clearance, or to identify possible new variants when contagions increase rapidly.

The complete genome sequences obtained were subjected to phylogenetic analysis and compared with the 133 Italian complete genomes in the GISAID database collected in the same period as the samples analysed in this study. The sequences reported here were obtained from nasopharyngeal swabs collected in March and April 2020 in Napoli province and nearby towns in the Campania region. The sequences clustered within two lineages, B.1 and B.1.1, which have been the most frequently detected in Italy and worldwide since February 2020, when the D614G mutation of the B.1 lineage was reported for the first time.

The Italian sequences within B.1 formed 9 clusters, while 3 further clusters were in B.1.1, confirming the heterogeneity of circulating strains. In addition, most of the sequences under study (14/21) were collected in Napoli and formed 7 different clusters, consistent with an area of high population density. Two clusters (e.g., those of p19 and p44) were formed by strains from this study that showed evolutionary correlation to SARS-CoV-2 sequences reported in other Italian regions, suggesting a common origin of the viral strains. The cluster formed by the p40 and p04 sequences also showed the circulation of correlated sequences in different municipalities, suggesting inter-municipality commutes.

To date, the only differences reported among SARS-CoV-2 strains are point mutations, except for the deletion described in B.1.1.7 and P.1, the United Kingdom and Brazilian variants reported for the first time in December 2020. The point mutations reported in this study and shared by all complete genomes, such as D614G, are related to the B.1 and B.1.1 lineages.

Other mutations were found in only one or two sequences and had never been reported before. Since these mutations never became fixed in the viral population, we can speculate that a correlation between these SNPs and viral adaptation may exist. The genes with the highest numbers of point mutations were the spike gene with 11, NSP3 with 6 and N with 7, three genes under positive selection [18, 19].

It is interesting to note that two patients who tested positive after a business meeting (p03 and p34) carried viral genomic sequences that were not closely correlated, suggesting different origins of the infection. Sequencing of the viral genome can therefore also help to clarify some dynamics of the spread of the infection in hospitals or in communities.

Since the appearance of SARS-CoV-2, several mutations have been reported in the spike gene and novel mutations are continuously described [10, 20,21,22,23]. Selected mutations such as D614G might provide an advantage to the virus by increasing cellular infectivity and virus transmissibility. Recently, an N501Y mutation in the spike gene has been reported [24], showing a higher affinity for the human ACE2 protein compared with D614G. Surveillance of circulating strains is highly relevant today because the novel variants carry multiple mutations in their spike glycoproteins, which are key targets of virus-neutralizing antibodies, raising concern about vaccine efficacy against the novel strains.

Conclusions

During a pandemic, the surveillance of circulating strains is crucial to understand the evolution of viral strains and the emergence of novel variants. This study is a first observation of the variants detected in the Campania region, a region less affected than the Northern Italian regions in the first phase of the pandemic in Italy. In particular, we reported the circulation of different variants within the Napoli province and the heterogeneity of the strains circulating between and within municipalities. In addition, a novel automated technology such as the Ion Torrent Genexus Integrated system allowed complete genome sequencing, even of samples with relatively low viral titer, in a relatively short timeframe, thus facilitating the continuous surveillance of novel variants.