Introduction

The Coronavirus Disease 2019 (COVID-19) is a new contagious disease caused by the novel coronavirus (SARS-CoV-2)1, which belongs to the genus Betacoronavirus, the same genus as the coronavirus that caused the SARS epidemic between 2002 and 20032. COVID-19 has spread to more than 200 countries/regions, with over 102 million confirmed cases and 2.2 million lives claimed as of January 31, 20213. The outbreak has been declared a pandemic and a public health emergency of international concern4.

As the specific symptoms of COVID-19 are now well-publicised, symptomatic transmissions are being contained in most countries. However, disease transmission by pre-symptomatic and asymptomatic viral carriers is extremely difficult to deal with due to its hidden nature5. Clinical data reveal that viral load becomes significant before the symptom onset6,7,8. Epidemiological investigations have identified clear cases of pre-symptomatic transmission soon after the initial outbreak9,10,11,12. Expert estimates of the percentage of total transmission due to this group of viral carriers vary greatly, ranging from as low as 18% to over 50%13,14,15. An early model-based study by Ferretti et al.16 suggested that pre-symptomatic transmission alone could yield a basic reproduction number R0,p = 0.9, close to the critical value of 1.0 that sustains epidemic growth. Under intense surveillance of the pandemic, pre-symptomatic and asymptomatic transmissions have become the main focus in outbreak control5.

While the actual viral shedding is influenced by many factors, patient viral load during the course of disease progression is more universal. This suggests a modelling approach that starts with clinical observations of symptom onset, and treats disease transmission as a dependent process that is further shaped by living and social conditions, including control measures to reduce physical contact. Following this strategy, we first introduce a model for an unprotected population and calibrate the model parameters against clinical case reports during the initial outbreak. Subsequently, we estimate the percentage reduction in the basic reproduction number (estimated to be around 3.87 at an exponential growth rate of 0.3/day) due to contact tracing, mask wearing and other measures, individually or in combination. Additionally, we present our findings against the epidemic development curves around the world to highlight the level of social mobilisation required to contain the spread of COVID-19.

Results

A renewal process centred on symptom onset

In epidemiological studies, the central quantity is the average number of secondary infections per unit time r(t) by a viral carrier on day t since the individual’s own infection17,18. In the case of COVID-19, disease transmission peaks around the symptom onset time of the individual7,8, as illustrated by the infectiousness curve shown in Fig. 1a (left panel). This property, when averaged over the population, gives an r(t) (Fig. 1a, right panel) that closely resembles the symptom onset time distribution, which we denote by pO(t) (Fig. 1a, middle panel). In fact, when the time window of transmission is narrowly centred around the symptom onset, we have approximately

$$r\left( t \right) \approx R_{\mathrm{E}}p_{\mathrm{O}}\left( {t + \theta _{\mathrm{S}}} \right).$$
(1)
Fig. 1: A stochastic model for COVID-19 disease progression, transmission and intervention.

a The mean reproduction rate r(t) (black curve) of a patient on day t since infection is expressed as a convolution of the symptom onset time distribution pO(t) (red curve) and the infectiousness curve REpI(Δt) (blue curve), where Δt is measured from the symptom onset. The mean reproduction number RE sets the overall level of the epidemic. The peak of the normalised infectiousness function pI(Δt) is shifted from the symptom onset by an amount θP, which takes a positive value on the pre-symptomatic side. The peak of the mean reproduction rate r(t) is shifted from the peak of the symptom onset time distribution pO(t) by θS. b A compartmentalised model. A person infected first goes through a non-infectious latent phase (L) until tL, followed by an infectious period that spans across symptom onset at tO. In the pre-symptomatic phase A, the person is infectious without symptoms. The A phase is further split into two subphases, A1 with a constant transmission rate (orange region) and A2 with a declining transmission rate (blue region). At the symptom onset time tO, the person enters the S phase, and continues to be infectious (light blue region). Contact tracing brings an infected person out of the transmission cycle at the point of isolation, while testing does so only when the result is positive.

The mean reproduction number RE sets the overall level of disease transmission in the population, and equals the basic reproduction number R0 when the infectious disease first breaks into a community. Its actual value could change over time due to factors such as the intervention and containment measures considered below. The shift parameter θS (Fig. 1a, right panel) accommodates the actual shape of the infectiousness curve as well as effects resulting from intervention measures, e.g., isolation delays of infected cases.

To link up Eq. (1) with actual transmission data, we developed a compartmentalised epidemic spreading model as illustrated in Fig. 1b. A total of four phases are introduced to accommodate the infectiousness curve in Fig. 1a, left panel. Three of these phases reside in the pre-symptomatic period: a non-infectious latent phase L, followed by infectious phases A1 and A2 before and after the infectiousness peak. Starting from the day of infection, an individual first stays in the latent phase L. Transition to phase A1 takes place at a rate αL(t) that depends on the elapsed time t since infection. Once in phase A1, the individual is infectious with a daily transmission rate βA. Duration of the A1 phase is variable and follows Poisson statistics with an exit rate constant αA. On the other hand, duration of the succeeding phase A2 is fixed at θP, after which symptoms develop and the person enters the symptomatic phase S. Upon entering A2, the patient’s disease transmission rate βB(τ) weakens with the elapsed time τ = t − tO + θP to match the right wing of the infectiousness curve. Note that, due to the variable duration of A1, the population-averaged infectiousness of this phase rises towards the symptom onset.
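The progression rules above can be sketched as a Monte Carlo draw of individual timelines. This is a minimal sketch, assuming a constant latent exit rate αL (the model itself allows a time-dependent αL(t)); the function name sample_timelines and all parameter values are illustrative and are not the calibrated values of this work.

```python
import numpy as np

def sample_timelines(n, alpha_L, alpha_A, theta_P, seed=0):
    """Draw n individual timelines, in days since infection.

    Assumption: alpha_L is constant, so the latent period is exponential;
    the model in the text allows a time-dependent alpha_L(t).
    """
    rng = np.random.default_rng(seed)
    t_L = rng.exponential(1.0 / alpha_L, size=n)   # end of latent phase L
    d_A1 = rng.exponential(1.0 / alpha_A, size=n)  # variable duration of A1
    t_peak = t_L + d_A1                            # A1 -> A2 at the infectiousness peak
    t_O = t_peak + theta_P                         # A2 lasts exactly theta_P, then onset
    return t_L, t_peak, t_O

# Hypothetical parameter values (rates in 1/day, theta_P in days).
t_L, t_peak, t_O = sample_timelines(200_000, alpha_L=0.5, alpha_A=1.0, theta_P=0.7)
mean_incubation = t_O.mean()  # compare with 1/alpha_L + 1/alpha_A + theta_P
```

Under these assumptions the mean incubation period is 1/αL + 1/αA + θP, which the sample mean should approach for large n.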

Applying the above rules of disease transmission to a large and well-mixed population, the number of new infections per unit time JL(T) on day T satisfies the renewal equation

$$J_{\mathrm{L}}\left( T \right) = \int_{0}^{\infty} {r\left( t \right)J_{\mathrm{L}}\left( {T - t} \right){\mathrm{d}}t} ,$$
(2)

where the kernel function is given by

$$r\left( t \right) =\! \int_{0}^{t}\! {\alpha _{\mathrm{L}}\left( {t_1} \right)q_{\mathrm{L}}\left( {t_1} \right){\mathrm{e}}^{ - \alpha _{\mathrm{A}}\left( {t - t_1} \right)}} \left[ {\beta _{\mathrm{A}} + \alpha _{\mathrm{A}}\int_{0}^{t - t_1}\! {\beta _{\mathrm{B}}\left( {t_2} \right){\mathrm{e}}^{\alpha _{\mathrm{A}}t_2}{\mathrm{d}}t_2} } \right]{\mathrm{d}}t_1,$$
(3)

with \(q_{\mathrm{L}}\left( t \right) = {\mathrm{e}}^{ - {\int}_0^t {\alpha _{\mathrm{L}}\left( {t_1} \right){\mathrm{d}}t_1} }\) being the probability that an individual remains in the latent phase t days after infection. Derivations of these results are presented in Supplementary Section 1, together with dynamic equations governing the size of each subgroup.
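For a concrete feel of Eq. (3), the kernel can be evaluated numerically. The sketch below assumes a constant αL and an exponentially decaying βB(τ) = βA e^(−αBτ); both are simplifying assumptions made here for illustration, as are the function name kernel_r and the parameter values. The outer integral over t1 is a convolution, approximated with a rectangle rule.

```python
import numpy as np

def kernel_r(t, alpha_L, alpha_A, alpha_B, beta_A):
    """Evaluate Eq. (3) on a uniform grid t that starts at 0.

    Assumptions (not from the paper's calibration): constant alpha_L and
    beta_B(tau) = beta_A * exp(-alpha_B * tau), with alpha_A != alpha_B.
    """
    dt = t[1] - t[0]
    # Density of leaving the latent phase at t1: alpha_L * q_L(t1).
    p_L = alpha_L * np.exp(-alpha_L * t)
    # Transmission rate s days after entering A1: the bracket of Eq. (3)
    # times exp(-alpha_A * s), with the inner integral done in closed form.
    g = beta_A * np.exp(-alpha_A * t) + alpha_A * beta_A * (
        np.exp(-alpha_B * t) - np.exp(-alpha_A * t)
    ) / (alpha_A - alpha_B)
    # Outer integral over t1 as a discrete convolution (rectangle rule).
    return np.convolve(p_L, g)[: t.size] * dt

# Hypothetical parameter values, rates in 1/day.
t = np.arange(0.0, 60.0, 0.01)
r = kernel_r(t, alpha_L=0.5, alpha_A=1.0, alpha_B=2.0, beta_A=1.5)
R_E_numeric = r.sum() * (t[1] - t[0])   # area under r(t)
R_E_exact = 1.5 / 1.0 + 1.5 / 2.0       # beta_A/alpha_A + beta_A/alpha_B
```

The area under r(t) should match βA/αA + βA/αB up to discretisation error, which gives a quick consistency check on the kernel.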

Equations (2) and (3) can be solved by performing the Laplace transform. In this respect our model is as mathematically tractable as the susceptible-exposed-infectious-recovered (SEIR) type models defined by a set of rate equations19. As we show below, the explicit representation of the temporal structure for disease progression and transmission in the present case facilitates direct model calibration from clinical data and also quantitative evaluation of intervention measures against epidemic development.
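To illustrate the Laplace-transform route: substituting an exponential solution JL(T) ∝ e^(λT) into Eq. (2) gives the condition ∫ r(t) e^(−λt) dt = 1 (integral from 0 to ∞) for the growth rate λ. The sketch below solves this condition by bisection, assuming a constant αL and βB(τ) = βA e^(−αBτ), for which the transform of Eq. (3) has a simple closed form. The function names and parameter values are hypothetical.

```python
def laplace_r(lam, alpha_L, alpha_A, alpha_B, beta_A):
    """Laplace transform of the kernel in Eq. (3) at rate lam, assuming
    constant alpha_L and beta_B(tau) = beta_A * exp(-alpha_B * tau)."""
    latent = alpha_L / (alpha_L + lam)   # transform of the latent exit density
    a1 = beta_A / (alpha_A + lam)        # transmission while in A1
    a2 = alpha_A / (alpha_A + lam) * beta_A / (alpha_B + lam)  # after the peak
    return latent * (a1 + a2)

def growth_rate(alpha_L, alpha_A, alpha_B, beta_A, hi=10.0, tol=1e-10):
    """Bisection for lam > 0 with laplace_r(lam) = 1 (needs R_E > 1)."""
    args = (alpha_L, alpha_A, alpha_B, beta_A)
    if laplace_r(0.0, *args) <= 1.0:
        raise ValueError("R_E <= 1: no positive growth rate")
    if laplace_r(hi, *args) >= 1.0:
        raise ValueError("upper bound hi does not bracket the root")
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # laplace_r decreases with lam, so the root lies above mid if > 1.
        if laplace_r(mid, *args) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical parameter values, rates in 1/day.
lam = growth_rate(alpha_L=0.5, alpha_A=1.0, alpha_B=2.0, beta_A=1.5)
```

At λ = 0 the transform reduces to the mean reproduction number, so the same function also returns RE for the assumed parameters.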

In Supplementary Section 1.3, we show that the mean reproduction number of the model is given by \(R_{\mathrm{E}} = R_{\mathrm{E}}^{\mathrm{A}} + R_{\mathrm{E}}^{\mathrm{S}}\), with \(R_{\mathrm{E}}^{\mathrm{A}} = \beta _{\mathrm{A}}/\alpha _{\mathrm{A}} + {\int}_0^{\theta _{\mathrm{P}}} {\beta _{\mathrm{B}}\left( \tau \right){\mathrm{d}}\tau }\) and \(R_{\mathrm{E}}^{\mathrm{S}} = {\int}_{\theta _{\mathrm{P}}}^\infty {\beta _{\mathrm{B}}\left( \tau \right){\mathrm{d}}\tau }\) being reproduction numbers associated with pre-symptomatic and symptomatic transmissions, respectively. When the right wing of the infectiousness curve in Fig. 1a takes the form of an exponentially decaying function \(\beta _{\mathrm{B}}\left( \tau \right) = \beta _{\mathrm{A}}{\mathrm{e}}^{ - \alpha _{\mathrm{B}}\tau }\) with a sufficiently large decay rate αB, we recover Eq. (1) which was initially proposed on heuristic grounds. The shift parameter is given approximately by

$$\theta _{\mathrm{S}} \approx \theta _{\mathrm{P}} - \frac{{\alpha _{\mathrm{A}}}}{{\alpha _{\mathrm{B}}\left( {\alpha _{\mathrm{A}} + \alpha _{\mathrm{B}}} \right)}}.$$
(4)
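As a worked example of the expressions above, the sketch below evaluates the pre-symptomatic and symptomatic reproduction numbers and the shift in Eq. (4) in closed form for βB(τ) = βA e^(−αBτ). The parameter values are hypothetical, chosen only to illustrate the arithmetic, and Eq. (4) is only approximate unless αB is sufficiently large.

```python
import math

def reproduction_split(beta_A, alpha_A, alpha_B, theta_P):
    """Pre-symptomatic and symptomatic reproduction numbers for
    beta_B(tau) = beta_A * exp(-alpha_B * tau)."""
    tail = math.exp(-alpha_B * theta_P)
    # A1 contribution plus the integral of beta_B over [0, theta_P].
    R_A = beta_A / alpha_A + beta_A * (1.0 - tail) / alpha_B
    # Integral of beta_B over [theta_P, infinity).
    R_S = beta_A * tail / alpha_B
    return R_A, R_S

def shift_theta_S(theta_P, alpha_A, alpha_B):
    """Approximate shift of r(t) relative to p_O(t), Eq. (4)."""
    return theta_P - alpha_A / (alpha_B * (alpha_A + alpha_B))

# Hypothetical values (rates in 1/day, theta_P in days).
R_A, R_S = reproduction_split(beta_A=1.5, alpha_A=1.0, alpha_B=2.0, theta_P=0.7)
presymptomatic_fraction = R_A / (R_A + R_S)
theta_S = shift_theta_S(theta_P=0.7, alpha_A=1.0, alpha_B=2.0)
```

The two reproduction numbers sum to βA/αA + βA/αB regardless of θP, since θP only sets where the exponential tail is divided between the pre-symptomatic and symptomatic phases.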

Parameter calibration

By combining three data sets7.

Result of the MLE is given by two exponential functions meeting at −0.68 days (red solid line). Also shown are distributions with a constant bridge (dashed line), or with a dome cap (dash-dotted line), with slightly lower likelihood values (see Supplementary Section 2.2). Thin grey curves are from bootstrapping for model #1 (see the “Methods” section). c Serial interval statistics outside the Hubei province in China from January 9 to February 13, 202022,23: whole period (solid circles), first two weeks (open squares), and last two weeks (open triangles). The grey dashed line on the right indicates exponential decay at a rate −0.31/day. The red curve is the convolution of the two red curves shown in a and b.