Introduction

The chromatin of a differentiated eukaryotic cell forms heterochromatin that coexists with euchromatin. In many cell types, heterochromatin is observed at the vicinity of the nuclear membrane and the nucleolus1. The compartmentalization of heterochromatin regions was also demonstrated in Hi-C experiments2,3. The genes in a euchromatin are actively expressed, while the genes in a heterochromatin are only rarely expressed. The genomic region of constitutive heterochromatin, such as centromeric and telomeric regions, does not depend on cell types, while facultative heterochromatin can switch to euchromatin and vice versa during development.

Biophysically, heterochromatin has been thought to be assembled by the phase separation of chromatin4,5,6,7,8,9,10,11,12,13,14,15,16,17,18. The constitutive heterochromatin is characterized by the post-translational modification of histone tails, H3K9me2/3, of nucleosomes. HP1 proteins selectively bind to H3K9 methylated histone tails and show liquid-liquid phase separation due to the multi-valent interaction between these proteins. The multi-valent interaction between HP1 proteins bound to nucleosomes with H3K9me2/3 has been thought to be the driving force of the heterochromatin assembly.

Fission yeast is a classical model system that has been used in molecular biology experiments to study the assembly of heterochromatin19,20. A fission yeast has three chromosomes, each having a centromeric region of 40 - 110 kbps (which are estimated to be 24 - 67 Kuhn units). The molecular mechanism of the heterochromatin assembly has been revealed in the last decades. The transcription is a rare event in heterochromatin regions. Nevertheless, the transcription is essential because the RNA interference (RNAi) pathway is the main molecular mechanism of the assembly and maintenance of centromeric heterochromatin of fission yeast21,22,23,24: RITS complexes bind to nascent RNAs and recruit CLRC complexes to methylate the H3K9 of nucleosomes of heterochromatin25 or RDRC/Dicers to produce small RNAs26. Recent experiments suggest that the binding of RITS complexes to nascent RNAs is small RNA-dependent when they methylate the H3K9 of nucleosomes and is H3K9me2/3-dependent when they produce small RNAs27,28: the H3K9 methylation and the small RNA production form the positive feedback, mutually enhancing each other. Experiments suggest that RDRC/Dicers are localized at the inner surface of the nuclear membrane in fission yeast29,30, implying that the small RNA production (and probably H3K9 methylation as well) occur at vicinity of the surface.

Because the H3K9 methylation and small RNA production occur during transcription, one may wonder what will happen if one upregulates the transcription. Indeed, the heterochromatin mark, such as the level of small RNAs, in the centromeric regions is enhanced by the upregulation of transcription31. Why does the transcription form heterochromatin in centromeric regions, but not in euchromatin regions? The centromeric regions are composed of repeat sequences that contain many transcription start sites inside them31. Asanuma and coworkers thus knocked-in tandemly repeated euchromatic genes in a euchromatic region to mimic this situation and found that these tandemly repeated genes are favorable substrates for RNAi-mediated heterochromatin—repeat-induced RNAi31. This implies that the repeat sequence and many transcription start sites are the key genomic signature, at which constitutive heterochromatin is assembled. It is well-known in polymer physics that polymers adhere to a surface much stronger than their monomers due to the connectivity of monomers along the polymer chain32,33, see Fig. 1. In a similar manner, the polymeric nature of tandemly repeated genes may enhance the binding of nascent RNAs to RDRC/dicers at the surface of the nuclear membrane.

Fig. 1: Adhesion of polymers to a surface.
figure 1

a Polymers composed of N adhesive units (cyan beads) and NL linker units (white beads). Adhesive units bind to the surface with the rate kon if they are located in 0 < z < b and bound units are unbound from the surface with the rate koff. Unbound units (for both adhesive and linker units) are repelled by the surface. b The probability qon that more than one (adhesive) unit is bound to the surface is shown as a function of the binding constant \({k}_{{{{{{\rm{off}}}}}}}/{k}_{{{{{{\rm{on}}}}}}}\) for N = 1 (cyan), 10 (black), and 20 (magenta). c. The fraction of units bound to the surface, provided that at least one (adhesive) unit is bound to the surface, is shown as a function of the binding constant \({k}_{{{{{{\rm{off}}}}}}}/{k}_{{{{{{\rm{on}}}}}}}\) for N = 1 (cyan), 10 (black), and 20 (magenta). b and c are derived by using the scaling theory shown in Supplementary Note 1, see also Supplementary Figures 15 and Supplementary Table 1. We used NL = 200 to derive b.

We therefore construct a theoretical model to predict the assembly mechanism of heterochromatin via repeat-induced RNAi. In this model, we take into account the connectivity of tandemly repeated genes and the diffusion of small RNAs in the kinetic equation of the H3K9 methylation and the small RNA production. Our theory predicts that the polymeric nature of the tandemly repeated genes ensures the steady production of small RNAs and H3K9 methylation through the stable binding to RDRC/Dicers at the surface of the nuclear membrane. We take into account the compaction of the H3K9 methylated chromatin due to Swi6/HP1, motivated by our recent experiments that Epe1, a putative H3K9 demethylase in fission yeast34,35,36, is required for repeat-induced RNAi31. For simplicity, we do not treat the suppression of transcription due to HDAC that keeps hypoacetylation in heterochromatin. This theory predicts that the compaction limits the accessibility of Pol IIs to the tandemly repeated genes and that the depletion of H3K9 demethylase decreases the production rate of small RNAs if the number of genes in the repeat is large enough. Our predictions are consistent with our recent experiments31. Small RNA-dependent gene silencing is not limited to fission yeast, but is also found in higher organisms37,38,39. Extensions of our theory may provide insight in the gene silencing mechanism of such systems.

Results

Kinetic model of assembly of heterochromatin

We treat tandemly repeated genes in a long chromosome at the vicinity of the nuclear membrane. We take into account the connectivity of these genes along the chromatin in the kinetic equations of transcription, H3K9 methylation, and small RNA production, coupled with the diffusion equation of small RNAs, see Fig. 2a. This approach is the fusion of systems biology, which often quantifies biochemical reactions by kinetic equations, and polymer physics, which studies the properties arising from the connectivity of repeated units along a polymer chain. Our model predicts the probability p that the nascent RNAs produced from genes during transcription are connected to RDRC/Dicers localized at the surface of the nuclear membrane, which characterizes the extent of small RNA production (the glossary of symbols are given in Supplementary Table 2).

Fig. 2: Model of heterochromatin assembly.
figure 2

a The key processes of heterochromatin assembly taken into account in our model are transcription (1), the binding of nascent RNAs (long green chain) to RDRC/Dicers (black complex) at the surface of the nuclear membrane (2), the production of small RNAs (short green chain) (3), the diffusion of small RNAs (4), and the H3K9 methylation of nucleosomes (cyan circles) (5). The production of small RNAs is H3K9me-dependent and the H3K9 methylation is small RNA-dependent. These processes thus mutually enhance to each other (positive feedback). Nascent RNAs are necessary for the binding of nascent RNAs to RDRC/Dicers and H3K9 methylation. The latter processes happen during transcriptional elongation. RITS complexes (magenta circle) act as a hub that mediate the binding of nascent RNAs and RDRC/Dicers or CLRCs (purple), but are taken into account in the model only implicitly. If more than one gene are bound to a RDRC/Dicer, unbound genes in the tandem repeat are also localized at the vicinity of RDRC/Dicers because the genes are connected through the chromatin and RDRC/Dicers are localized at the surface of the nuclear membrane, see the thick cyan and black lines. b Each gene is either of the three state: Pol II are absent from the gene (unbound), Pol II is bound to the promoter (bound), and Pol II has engaged in the transcriptional elongation (elongation). The transition between the unbound and bound states is faster than the transition from the bound to elongation states.

We use a simplified version of the Stasevich model40 to treat the transcription, see Fig. 2b. With this model, a gene is either of the three states: the unbound state (where Pol IIs are absent from the gene), the bound state (where a Pol II is bound to the promoter of the gene, but has not yet initiated transcription), the elongation state (where the gene is in the middle of the transcription). The kinetic equations of the state transitions are shown in the Materials and Methods (see also Supplementary Table 2 for the glossary of symbols). Because the binding and unbinding of Pol IIs to the promoter is faster than the initiation of transcriptional elongation, the kinetics of transcription can be represented by the Michaelis-Menten kinetics, where Pol IIs act as enzymes and the promoters act as substrates. Both H3K9 methylation and small RNA production happen during the transcriptional elongation. In the steady state, the fraction nelo of genes during the transcriptional elongation thus has the form

$${n}_{{{{{{\rm{elo}}}}}}}=\frac{{k}_{{{{{{\rm{ini}}}}}}}{\tau }_{{{{{{\rm{elo}}}}}}}\rho }{(1+{k}_{{{{{{\rm{ini}}}}}}}{\tau }_{{{{{{\rm{elo}}}}}}})\rho +{K}_{{{{{{\rm{p}}}}}}}}$$
(1)

where kini is the rate of transition from the bound to elongation states, τelo is the elongation time, ρ is the local concentration of Pol II, and Kp is the equilibrium constant that accounts for the binding of Pol IIs to the promoter, see Materials and Methods for the derivation.

In polymer physics, a long polymer, including chromatin, is modeled as repeated units, each has Kuhn length b, connected along a chain. The Kuhn length of chromatin is estimated to be 50 nm (≈1.65 kbps)41, which is approximately the length of typical genes, such as ade6 (≈1.66 kpbs). We treat such cases, in which each chromatin unit has one gene. Nascent RNAs produced by the transcription of heterochromatin regions are retained to the chromatin via RITS complexes28,42,43. A complex of the chromatin unit and nascent RNA can be thus viewed as one unit that can bind to RDRC/Dicers at the surface of the nuclear membrane. The kinetic equation of the fraction p of units bound to RDRC/Dicers thus has the form

$$\frac{{dp}}{{dt}}={k}_{{{{{{\rm{on}}}}}}}\frac{b}{\xi }{\sigma n}_{{{{{{\rm{elo}}}}}}}\left(1-p\right)-\frac{p}{{\tau }_{{{{{{\rm{elo}}}}}}}}$$
(2)

see also Fig. 2a. The first term of Eq. (2) is the rate of the binding of chromatin units to RDRC/Dicers at the surface and the second term represents the fact that bound units are released from RDRC/Dicers at the termination of transcription. The binding of a unit to RDRC/Dicers happens only if the nucleosomes of the unit are H3K9 methylated28 and is during the transcriptional elongation. The first term of Eq. (2) is thus proportional to the fraction σ of H3K9 methylated nucleosomes and the probability nelo that this unit is in the elongation state. Polymer physics has revealed that polymer units are confined in a layer of thickness ξ, which is the size of a subchain between units bound to the surface32,33, see also Supplementary Note 1. The polymeric nature of the tandemly repeated genes is taken into account in the factor b/ξ in the first term of Eq. (2). We use the simplest approximation in polymer physics—ideal chain approximation, to derive the expression of the size ξ of the subchain in the next section, see Fig. 3a, and take into account the compaction of the subchain due to the attractive interaction between Swi6 in the following sections, see Fig. 3b.

Fig. 3: Coil-globule transition.
figure 3

If we neglect the interaction between units (ideal chain approximation), the size of a section composed of g units in a chromatin is \(\xi =b{g}^{1/2}\) (coil) (a). In general, the H3K9 methylated units in the section show attractive interaction via Swi6. The interaction parameter χ represents the magnitude of this interaction. If the interaction parameter is large enough, the size of the section is \(\xi \propto b{g}^{1/3}\) (globule) (b).

Small RNAs are produced by RDRC/Dicers that cut nascent RNAs into pieces. The rate of the production of small RNAs per unit area has the form

$$S=\frac{{s}_{0}{Np}}{1-{\left(1-p\right)}^{N}}\frac{1}{{\xi }_{N}^{2}}$$
(3)

see also Fig. 2a and Supplementary equation (S35). s0 is the production rate of small RNAs from a unit. The production rate S per unit area is proportional to the number of units that are bound to RDRC/Dicers and to the inverse of the area \({\xi }_{N}^{2}\) occupied by the tandemly repeated genes. The produced small RNAs diffuse away from the nuclear membrane. In the steady state, the local concentration of small RNAs has the form

$$c\left(z\right)=\frac{S}{{k}_{{{{{{\rm{d}}}}}}}\lambda }{e}^{-\frac{z}{\lambda }}.$$
(4)

Equation (4) represents the fact that small RNAs are localized at the region of thickness λ. The diffusion length \(\lambda \, (=\sqrt{D/{k}_{{{{{{\rm{d}}}}}}}})\) is the distance by which small RNAs can diffuse (with the diffusivity D) until they are degraded (with the rate kd). While small RNAs diffuse away from the nuclear membrane, they are bound to (and unbound from) argonaute proteins in RITS complexes. The concentration c(z) includes both bound and unbound populations of small RNAs.

The kinetic equation of the degree σ of H3K9 methylation has the form

$$\frac{d\sigma }{{dt}}={k}_{{{{{{\rm{m}}}}}}}\Lambda {n}_{{{{{{\rm{elo}}}}}}}\left(1-\sigma \right)-{k}_{{{{{{\rm{dm}}}}}}}\sigma$$
(5)

see Fig. 2a. Equation (5) represents the fact that the degree of H3K9 methylation is determined by the balance of H3K9 methylation rate (the first term) and H3K9 demethylation rate (the second term). Equation (5) also assumes that the number of CLRC and RITS complexes is large and does not limit the kinetics of H3K9 methylation and that H3 proteins in nucleosomes are stably bound to DNA and are not evicted during the transcription, reflecting the recent structural biology experiments44. The parameter Λ is proportional to the probability that small RNAs that are bound to RITS complexes are bound to the nascent RNAs produced from a gene,

$$\Lambda \simeq \frac{1}{\xi }{\int }_{0}^{\xi }{dz}c\left(z\right)=\frac{S}{{k}_{{{{{{\rm{d}}}}}}}\xi }\left(1-{{{{{{\rm{e}}}}}}}^{-\frac{\xi }{\lambda }}\right)$$
(6)

see also Fig. 2a. To derive Eq. (6), we assumed that the dynamics of the subchain is faster than the enzymatic reaction of the H3K9 methylation.

The binding probability p of units is derived by solving Eqs. (16) for the steady state and the relationship between the size ξ of the tandemly repeated genes and the probability p that depend on polymer models (shown in the following sections). This conditional probability is effective for the bound state, at which at least one unit in the repeat is bound to RDRC/Dicers at the surface of nuclear membrane. The probability qon that the repeat is in the bound state follows the kinetic equation

$$\frac{d}{{dt}}{q}_{{{{{{\rm{on}}}}}}}={k}_{{{{{{\rm{on}}}}}}}\frac{3}{2}\frac{N}{{N}_{{{{{{\rm{L}}}}}}}}{\sigma }_{0}\left(1-{q}_{{{{{{\rm{on}}}}}}}\right)-N\frac{p{\left(1-p\right)}^{N-1}}{1-{\left(1-p\right)}^{N}}\frac{{q}_{{{{{{\rm{on}}}}}}}}{{\tau }_{{{{{{\rm{elo}}}}}}}}$$
(7)

Equation (7) represents the fact that the probability qon increases if one of the units in unbound tandemly repeated genes is bound to a RDRC/Dicer (the first term) and decreases if the transcription of the gene at the last bound unit is terminated before other genes are bound to RDRC/Dicers (the second term). σ0 is the degree of H3K9 methylation of the tandemly repeated genes due to the primary small RNA for cases that all of the genes are not bound to RDRC/Dicers. NL is the number of units of the linker chromatin (between the tandemly repeated genes and the bound chromatin region at the neighbor, such as telomere), see Fig. 1a. To derive the second term, we assumed that the relaxation time for the binding of units in the tandemly repeated genes is shorter than the time scale of the kinetics of qon and that the unbinding of units results from the termination of transcription (a more general derivation of qon is shown in sec. S1.4 and S1.6 in the SI).

Polymeric nature of tandemly repeated genes ensures steady production of small RNAs

The tandemly repeated genes are confined in a layer of thickness ξ of the size of a subchain between units bound to RDRC/Dicers at the surface of the nuclear membrane. Because pN units are bound to the surface, the average number of units in a subchain is p−1. The simplest treatment of the polymeric nature of the tandemly repeated genes is the ideal chain approximation that takes into account only the connectivity of the genes32. With this approximation, its size ξ has the form

$$\xi =b{p}^{-\frac{1}{2}},$$
(8)

see also Fig. 3a and Supplementary Note 1. ξN in Eq. (3) is the size of the tandemly repeated genes (composed of N units) and is bN1/2 in the ideal chain approximation. We also neglect the excluded volume interaction between chromatin units and Pol IIs. The local concentration ρ of Pol II at the genes is thus equal to the concentration ρ0 of Pol II in the nucleosol, ρ = ρ0. The values of the parameters used for the numerical calculations are summarized in Table 1. The general feature of our results does not depend on these values, see also Supplementary Figures 68.

Table 1 Independent parameters of the ideal chain model.

The binding probability p is derived as a function of the inverse of the (rescaled) elongation time 1/(kon τelo), see Fig. 4. The latter dimensionless quantity corresponds to the (rescaled) dissociation rate \({k}_{{{{{{\rm{off}}}}}}}/{k}_{{{{{{\rm{on}}}}}}}\) in the polymer adhesion problem, see Fig. 1b. For cases in which the number of genes in the repeat is smaller than a critical value Nc, the binding probability p increases continuously with increasing the elongation time τelo, see the cyan line in Fig. 4. In contrast, for cases in which the number of genes in the repeat is larger than the critical value Nc, the binding probability p has two stable solutions (the magenta solid line Fig. 4) and one unstable solution (the magenta broken line in Fig. 4). The binding probability p is approximately zero for one of the two solutions, implying that the units in the repeat are rarely bound to RDRC/Dicers and do not produce small RNAs steadily. This feature is analogous to euchromatin. The binding probability p is relatively large in the other stable solution, implying that the units in the tandemly repeated genes are almost always bound to RDRC/Dicers and steadily produce small RNAs. This feature is analogous to heterochromatin. The euchromatin solution is stable for \({\tau }_{{{{{{\rm{elo}}}}}}}^{-1}\, > \,{\tau }_{{{{{{\rm{sp}}}}}}2}^{-1}\), while the heterochromatin solution is stable for \({\tau }_{{{{{{\rm{elo}}}}}}}^{-1}\, < \,{\tau }_{{{{{{\rm{sp}}}}}}1}^{-1}\), see the magenta line in Fig. 4. The probability p therefore jumps from zero to a finite value at \({\tau }_{{{{{{\rm{elo}}}}}}}={\tau }_{{{{{{\rm{sp}}}}}}2}\) and from a finite value to zero at \({\tau }_{{{{{{\rm{elo}}}}}}}={\tau }_{{{{{{\rm{sp}}}}}}1}\). Our theory therefore predicts the discontinuous transition (the first order phase transition) between euchromatin and heterochromatin. Another important prediction is that not only the number of genes in the repeat, but also the elongation time, which is the length of each gene divided by the elongation of Pol II, are critical parameters for the assembly of heterochromatin. The latter prediction may be experimentally accessible.

Fig. 4: Binding probability p vs elongation time τelo – ideal chain chromatin model.
figure 4

The binding probability p is shown as a function of the inverse of the elongation time τelo as predicted by using the ideal chain chromatin model, see Eq. (8). We performed numerical calculations for N = 2 (cyan), 10 (black), and 25 (magenta). The stable solutions are shown by the solid lines and the unstable solution is shown by the broken line. τsp1 and τsp2 are the values of the elongation time at which the heterochromatin and euchromatin solutions become unstable, respectively, see the light green dotted lines. The parameters used for the calculations are summarized in Table 1. The derivation of this figure, including the critical number Nc of genes, is shown in Supplementary Note 2.

The probability qon of the repeat at the bound state increases as the elongation time τelo increases and eventually becomes qon ≈ 1, where the repeat is stably bound to the surface of nuclear membranes, see Fig. 5a. Notably, the width of the window of elongation time at which qon ≈ 1 increases as the number N of genes increases. It is because the rate with which all the genes in the repeat are unbound from RDRC/Dicers (shown in the second term of Eq. (7)) decreases exponentially with increasing the number N of genes. This mechanism is essentially the same as the case of the surface adhesion of polymers, see Fig. 1b, c. The average degree σqon of H3K9 methylation and the average production rate of small RNAs increase with increasing the number N of genes and eventually saturate for the regime of long elongation time, reflecting the feature of qon, see Fig. 5b, c. The increase of the level of H3K9 methylation and small RNAs with the number N of genes in the tandem repeat is consistent with our recent experiments31. Our theory therefore predicts that the polymeric nature of the tandemly repeated genes enhances the positive feedback between the small RNA production and the H3K9 methylation through the stable binding to RDRC/Dicers at the surface of the nuclear membrane, ensuring the steady production of small RNAs. Yet, the binding probability p in the bound state does not increase with N because although the production rate of small RNAs increases in proportion to the number N of genes in the repeat, the produced small RNAs are diluted due to the fact that the area \({\xi }_{N}^{2}\) occupied by this repeat increases in proportion to the number N.

Fig. 5: Degree of H3K9 methylation and production rate of small RNAs vs elongation time – ideal chain chromatin model.
figure 5

The probability qon that at least one unit in the tandemly repeated genes is bound to RDRC/Dicers (a), the degree σqon of DNA methylation (b), and the rescaled production rate \(p{q}_{{{{{{\rm{on}}}}}}}/(1-{\left(1-p\right)}^{N})\) of small RNAs (c) are shown as functions of the inverse of the elongation time τelo as predicted by the ideal chain chromatin model. We performed numerical calculations for N = 2 (cyan), 5 (light green), 10 (black), and 25 (magenta). The values of parameters used for the calculations are summarized in Table 1.

Chromatin compaction due to Swi6/HP1 suppresses transcription and reduces the production of small RNA

Swi6/HP1 binds to H3K9 methylated nucleosomes45 and the self-association of Swi6 has been thought to promote the compaction of the centromeric regions46,47,48,49. The extent of the chromatin compaction is quantified by the volume fraction ϕc of chromatin. The volume fraction ϕc is determined by the equation of state

$$-{p}^{\frac{4}{3}}{\phi }_{{{{{{\rm{c}}}}}}}^{\frac{1}{3}}+{p}^{\frac{2}{3}}{\phi }_{{{{{{\rm{c}}}}}}}^{\frac{5}{3}}-\chi {\sigma }^{2}{\phi }_{{{{{{\rm{c}}}}}}}^{2}-{\chi }_{0}{\phi }_{{{{{{\rm{c}}}}}}}^{2}-{\phi }_{{{{{{\rm{c}}}}}}}-{{\log }}\left(1-{\phi }_{{{{{{\rm{c}}}}}}}\right)=0.$$
(9)

The size ξ of a subchain is derived from the expression of the volume fraction \({\phi }_{\rm c}={b}^{3}{p}^{-1}/{\xi }^{3}\). Equation (9) represents the balance of the elastic stress due to the entropic elasticity of chromatin (the first and second terms), the stress due to the attractive interaction between chromatin units via Swi6 (the third term), and the stress due to the excluded volume interaction between chromatin units (the fourth, fifth, and sixth terms). Equation (9) is an extension of the equation of state that predicts the coil-globule transition of polymers and is derived by using the free energy in the spirit of the Flory theory50, see also Fig. 3 and Eq. (17) in Materials and Methods (the detail of the derivation is shown in Supplementary Note 3). This treatment is effective for cases in which the binding and unbinding of the units is rate limited: in such cases, the polymer dynamics associated with the binding and unbinding of the units is negligible and thus the local equilibrium approximation is applicable in the length scale of the subchain. χ is the interaction parameter that represents the magnitudes of the attractive interaction between chromatin units via HP1, see the dependence of the third term on σ2. χ0 is the interaction parameter that represents the magnitudes of the interaction independent of Swi6. We here use \({\chi }_{0}=\frac{1}{2}\) to approximately treat the screening of the excluded volume interaction due to the overlap** of chains: the chromatin return to ideal chains in the limit of vanishing Swi6 interaction, χ, see Eq. (3) (because the first and second terms of Eq. (9) dominate the other terms in this limit). The treatment used in this section is thus a direct extension of the treatment used to derive Fig. 1 (which agrees with the molecular dynamics simulation, see Supplementary Note 1).

In the globular state, the subchains composed of p−1 units in the tandemly repeated genes do not interpenetrate each other. The area occupied by the tandemly repeated genes is thus \({\xi }_{N}^{2}={\xi }^{2}{pN}\). This relationship is also valid for the coil state (see also the discussion below Eq. (8)) and will be used for any values of χ and χ0. The local concentration ρ of Pol II at the genes has the form

$$\rho =\left(1-{\phi }_{{{{{{\rm{c}}}}}}}\right){\rho }_{0}$$
(10)

which reflects the excluded volume interaction between Pol IIs and chromatin. The values of the parameters used for the numerical calculations are summarized in Table 2. The general feature of our results does not depend on these values, see Supplementary Figures 914. In the following, we mainly focus on the feature of the bound state of tandem repeats composed of many genes, where it does not depend on the number NL of units in linker chromatin.

Table 2 Independent parameters of the coil-globule transition model.

The interaction parameter χ is a standard parameter that is widely used in polymer physics, but its value for the chromatin interaction via Swi6 is not available in the literature. We thus calculated the binding probability p of units for various values of the interaction parameter χ. Our theory predicts that the binding probability p decreases as the interaction parameter χ increases for a relatively large elongation time, see Fig. 6a. This implies that the chromatin compaction due to Swi6 indeed suppresses the assembly of heterochromatin. Indeed, the production rate \(S{\xi }_{N}^{2}\) of small RNAs and the degree σ of H3K9 methylation in the bound state decreases with increasing the interaction parameter χ for the regime of long elongation time, see Fig. 6b, c. It is because the chromatin compaction decreases the local concentration of Pol IIs at the tandemly repeated genes and thus decreases the transcription rate of these genes, see Fig. 6d and Eq. (10).

Fig. 6: Effect of chromatin compaction on heterochromatin assembly – coil globule transition model.
figure 6

The binding probability p (a), the degree σ of H3K9 methylation in the bound state (b), the RNA production rate \(S{\xi }_{N}^{2}\) in the bound state (c), and the volume fraction ρ of Pol II (d) are shown as functions of the inverse of the elongation time for the values of the interaction parameters χ = 0.0 (black), 5.0 (light green), and 10.0 (orange) as predicted by using the coil-globule transition model. The black broken line is derived by using Eq. (9) with ρ = ρ0. The number N of units in the tandemly repeated genes is set to 25. The values of other parameters used for the calculations are summarized in Table 2.

H3K9 loss enhances the production of small RNA if the number of repeated genes is large

Epe1 is a putative H3K9 demethylase in fission yeast35,36. We have experimentally demonstrated that the depletion of Epe1 significantly decreases the production of small RNAs31. In our theory, the action of Epe1 is implicitly taken into account through the H3K9 demethylation rate kdm. We thus analyzed the dependence of the features of the heterochromatin on the demethylation rate kdm, see Fig. 7. Our theory predicts that the binding probability p of units is a non-monotonic function of the demethylation rate kdm if the interaction parameter is large enough, see the orange and magenta lines in Fig. 7a. It implies that if the inverse demethylation rate \({k}_{{{{{{\rm{dm}}}}}}}^{-1}\) is larger than the maximum in the wild type, the binding probability p decreases with decreasing the rate kdm (which corresponds to the depletion of Epe1). In this regime, the production rate of small RNAs in the bound state also decreases with decreasing the demethylation rate kdm, while the degree of H3K9 methylation increases with decreasing the demethylation rate \({k}_{{{{{{\rm{dm}}}}}}}\), see the orange and magenta lines in Fig. 7b, c. The down-regulation of small RNA production by the depletion of Epe1 results from the fact that the local concentration ρ of Pol II in the tandemly repeated genes decreases because this chromatin region becomes more compact with the depletion of Epe1, see the orange and magenta lines in Fig. 7d.

Fig. 7: Effect of demethylase upregulation on heterochromatin assembly – coil-globule transition model.
figure 7

The binding probability (a), the degree σ of H3K9 methylation in the bound state (b), the rescaled RNA production rate \(p/(1-{\left(1-p\right)}^{N})\) in the bound state (c), and the volume fraction of Pol II (d) are shown as functions of \(\frac{{s}_{0}}{{b}^{2}\sqrt{D{k}_{{{{{{\rm{d}}}}}}}}}\frac{{k}_{{{{{{\rm{m}}}}}}}}{{k}_{{{{{{\rm{dm}}}}}}}}\) (which is proportional to the inverse of the H3K9 demethylation rate kdm) for χ = 0 (black), 5.0 (light green), 10.0 (orange), and 20.0 (magenta) as predicted by the coil-globule transition model. The black broken line is derived by using Eq. (9) with ρ = ρ0. The inverse of rescaled elongation time \({\left({k}_{{{{{{\rm{on}}}}}}}{\tau }_{{{{{{\rm{elo}}}}}}}\right)}^{-1}\) is set to 0.05. The green dotted lines are derived by using the value of the binding probability for \(\frac{{s}_{0}}{{b}^{2}\sqrt{D{k}_{{{{{{\rm{d}}}}}}}}}\frac{{k}_{{{{{{\rm{m}}}}}}}}{{k}_{{{{{{\rm{dm}}}}}}}}\to \infty\). The number N of units in the tandemly repeated genes is set to 25. The values of other parameters used for the calculations are summarized in Table 2.

Our theory takes into account the primary small RNAs through the degree σ0 of H3K9 methylation in the unbound state, see Eq. (7). The degree σ0 increases with the depletion of H3K9 demethylase. For the case of only a gene of single copy (N = 1), the H3K9 methylation is mainly due to the primary small RNAs and the degree of H3K9me thus increases with the depletion of H3K9 demethylase. The H3K9 methylation of tandemly repeated genes is mainly due to the secondary small RNAs and the degree of H3K9me increases with the depletion of H3K9 demethylase, as described in the preceding section.

Transcription upregulation increases both RNA production and H3K9 methylation

The idea of chromatin compaction dependent gene accessibility of Pol II is attractive because the contribution of Epe1 to the heterochromatin assembly is explained only by the putative role of Epe1 as H3K9 demethylase. Epe1 has not only a JmjC domain, which is the putative domain of H3K9 demethylation activity, but also a transcription activation domain51. The relationship between the H3K9 demethylation and transcriptional activation is not fully elucidated. We first assume that the dominant role of epe1 is the activation of transcription, rather than the H3K9methyaltion, and use the ideal chain model to analyze if this assumption is consistent with our experimental results. In our theory, the transcriptional activator increases the rescaled transcription rate constant \(\frac{{k}_{{{{{{\rm{ini}}}}}}}}{{k}_{{{{{{\rm{on}}}}}}}}\frac{{\rho }_{0}}{{\rho }_{0}+{K}_{{{{{{\rm{p}}}}}}}}\), see Fig. 2 and Table 1. Our theory predicts that both the small RNA production rate and the degree σ of H3K9 methylation decrease with decreasing the transcription rate constant, namely with the depletion of transcriptional activator, if the latter constant is larger than the value at the euchromatin-heterochromatin transition, see the solid lines in Fig. 8a, b. Our recent experiments suggest that the level of H3K9 methylation increases with the depletion of Epe131, implying that these experimental results are not explained only by the role of Epe1 as a transcription activator.

Fig. 8: Dependence of small RNA production and H3K9 methylation on transcription rate – ideal chain model.
figure 8

The binding probability p, which is proportional to the production rate of small RNAs, (a) and the degree σ of H3K9 methylation (b) are shown as functions of the rescaled transcription rate constant \(\frac{{k}_{{{{{{\rm{ini}}}}}}}}{{k}_{{{{{{\rm{on}}}}}}}}\frac{{\rho }_{0}}{{\rho }_{0}+{K}_{{{{{{\rm{p}}}}}}}}\) for the inverse of the rescaled H3K9 demethylation rate \(\frac{{s}_{0}}{\sqrt{D{k}_{{{{{{\rm{d}}}}}}}}{b}^{2}}\frac{{k}_{{{{{{\rm{m}}}}}}}}{{k}_{{{{{{\rm{dm}}}}}}}}\) = 0.1 (black), 0.15 (blue), and 0.2 (red), as predicted by the ideal chain model. The solid line is derived by the numerical calculation and the dotted lines are derived by using Eqs. (11) and (12). The inverse of elongation time \(1/({k}_{{{{{{\rm{on}}}}}}}{\tau }_{{{{{{\rm{elo}}}}}}})\) is fixed to 0.02. Other parameters are shown in Table 1.

Both of the rescaled transcription rate constant \(\frac{{k}_{{{{{{\rm{ini}}}}}}}}{{k}_{{{{{{\rm{on}}}}}}}}\frac{{\rho }_{0}}{{\rho }_{0}+{K}_{{{{{{\rm{p}}}}}}}}\) and the demethylation rate constant kdm may decrease with the depletion of Epe1 due to its dual role as a H3K9 demethylase and a transcriptional activator. Both the production rate of small RNAs and the degree σ of H3K9 methylation increase with decreasing the demethylation rate kdm, see the black and red lines in Fig. 8a, b. This implies that whether the dual role of Epe1 is consistent with our experiments depends on the details of the modulation of the kinetic parameters by Epe1 depletion, which have not been well-characterized.

The analytical forms may be useful for future experiments because they represent the explicit dependence on the rate constants. For p ≈ 1, σ < 1/2, and N ≫ 1, the binding probability p, which is proportional to the production rate of small RNAs, has an approximate form

$$p=1-\frac{1}{{k}_{{{{{{\rm{on}}}}}}}{\tau }_{{{{{{\rm{elo}}}}}}}}\frac{{b}^{2}\sqrt{D{k}_{{{{{{\rm{d}}}}}}}}}{{s}_{0}}\frac{{k}_{{{{{{\rm{dm}}}}}}}}{{k}_{{{{{{\rm{m}}}}}}}}\frac{1}{{n}_{{{{{{\rm{elo}}}}}}}^{2}}$$
(11)

where it depends on the transcription rate constant via nelo given by Eq. (1), see the dotted lines in Fig. 8a. The degree σ of H3K9 methylation has an approximate form

$$\sigma =\frac{{s}_{0}}{{b}^{2}\sqrt{D{k}_{{{{{{\rm{d}}}}}}}}}\frac{{k}_{{{{{{\rm{m}}}}}}}}{{k}_{{{{{{\rm{dm}}}}}}}}{n}_{{{{{{\rm{elo}}}}}}}-\frac{1}{{k}_{{{{{{\rm{on}}}}}}}{\tau }_{{{{{{\rm{elo}}}}}}}{n}_{{{{{{\rm{elo}}}}}}}}$$
(12)

see the dotted lines in Fig. 8b. The binding probability p and the degree σ of H3K9 methylation are sensitive to the transcription rate constant at the vicinity of the euchromatin-heterochromatin transition.

Discussion

We have constructed a theory of the assembly of the constitutive heterochromatin in fission yeast. This theory is motivated by the fact that the repeat sequence and many transcription start sites are the genomic signature of heterochromatin assembly in fission yeast31. It is analogous to the surface adhesion of polymers32,33, given the fact that small RNAs are produced by RDRC/Dicers at the surfaces of the nuclear membrane29,30. We thus take into account the essence of the surface adhesion of polymers in the kinetic equations of the small RNA production and the H3K9 methylation. Quantifying biochemical reactions by using the kinetic equations is an approach often used in systems biology, while the properties arising from the connectivity of repeated units along a chain have been studied in polymer physics: our approach is the fusion of the soft matter physics and the systems biology. It is very different from other theories of heterochromatin assembly, which are based on the theory of phase separation4,5,6,7,8,9,10,11,12,13,14,15,16.

Our theory predicts that the polymeric nature of the tandemly repeated genes enhances the positive feedback between the small RNA production and the H3K9 methylation through the stable binding to RDRC/Dicers at the surface of the nuclear membrane, ensuring the steady production of small RNAs. The stable binding is promoted by the localization of nascent RNAs along DNA and of RDRC/Dicers on the surface of the nuclear membrane with a mechanism analogous to the surface adhesion of polymers: if a nascent RNA produced from a gene is bound to a RDRC/Dicer, other nascent RNAs in the tandem repeat are at the vicinity to the surface, where other RDRC/Dicers are localized. This is a new insight over the model proposed in our recent experimental paper31 that the tandem repeat increases the local concentration of nascent RNAs. This mechanism also acts to the cases in which the sequence is not a sequence of exact repeat, but a sequence including many transcription start sites, such as the case of dg/dh in centromeric region of fission yeast, which can produce transcripts containing the same sequence. We note that the fact that RDRC/Dicers are localized at the surface of a nuclear membrane plays a similar role in the localization of RDRC/Dicers(/RITS) in a condensate assembled by phase separation. A similar principle may thus act in a completely different system, such as the stable binding of tandemly repeated rDNAs to the surfaces of subcompartments in nucleoli52,53,54 and the stable binding of tandemly repeated enhancers in a superenhancer to the surfaces of transcriptional condensate55. Our theory also predicts that not only the number of genes in the repeat, but also the elongation time, which is the length of each gene divided by the elongation rate of Pol II, are important parameters to the assembly of heterochromatin, see Fig. 5. The latter prediction may be accessible by measuring the dependence of the levels of small RNAs and H3K9 methylation on the length of each gene in the repeat. The dependence of the heterochromatin assembly on the elongation rate is not because of a specific regulation in transcription56, but because a long elongation time increases the chance for nascent RNAs to bind to RDRC/Dicers and CLRC.

We also studied the contribution of the chromatin compaction due to the self-association of Swi6 to the small RNA production and the H3K9 methylation, see Fig. 6. While it is an issue separate from the nature of the heterochromatin assembly as the polymer surface adhesion, it can regulate the conformation of the tandemly repeated genes and the accessibility of Pol II to these genes. The chromatin compaction mainly affects the local concentration of Pol II at the vicinity of the genes, see Eq. (10). Our theory predicts that the production of small RNAs decreases as the H3K9 demethylation rate decreases if the number of genes in the repeat is large because it decreases the local concentration of Pol II. This prediction is consistent with our experimental result on the depletion of a putative H3K9 demethylase, Epe131. This does not exclude the possibility that the dual effect of Epe1 on H3K9 demethylation and transcriptional activation, but not via the chromatin compaction, can explain the experimental result, see Fig. 8. However, it depends on the details of the modulation of the kinetic parameters by Epe1, which is beyond the scope of this paper. In both cases, it is necessary to take into account the H3K9 demethylation activity of Epe1 to be consistent with our experiments31. More experiments are necessary to determine which is the dominant mechanism of the enhancement of the repeat-induced RNAi by Epe1.

Liquid-liquid phase separation (LLPS) has been thought to be the principle of the assembly of biological condensates57,58. Architectural RNAs (arcRNAs) are a class of RNAs that are essential to the assembly of some condensates59. The LLPS is driven by the multi-valent interactions between RNA-binding proteins (RBPs) bound to arcRNAs. RBPs bound to an arcRNA are constrained along the chain from the thermal diffusion of RBPs and this enhances the phase separation60, much like the Flory-Huggins mechanism of the phase separation of polymers32. The latter mechanism is effective for relatively long RNAs. In the assembly of heterochromatin, nascent RNAs produced by transcription are cut into small pieces. Small RNAs play the opposite role: the thermal diffusion of small RNAs allows the long-range interaction between separate regions of chromatin31,61. Cells regulate the length of RNAs to achieve different functions. The repeat sequence amplifies the long-range interaction via steady production of small RNAs. The gene silencing through small RNAs is not limited to fission yeast, but is also found in other organisms37,38,39. Our theory can be extended to understand the mechanism of the gene silencing in these organisms.

Methods

Kinetics of transcription

The kinetic equations of the state transition have the forms

$$\frac{d{n}_{{{{{{\rm{on}}}}}}}}{{dt}}={k}_{{{{{{\rm{on}}}}}}}^{{{{{{\rm{p}}}}}}}\rho {n}_{{{{{{\rm{off}}}}}}}-{k}_{{{{{{\rm{off}}}}}}}^{{{{{{\rm{p}}}}}}}{n}_{{{{{{\rm{on}}}}}}}-{k}_{{{{{{\rm{ini}}}}}}}{n}_{{{{{{\rm{on}}}}}}}$$
(13)
$$\frac{d{n}_{{{{{{\rm{off}}}}}}}}{{dt}}={k}_{{{{{{\rm{off}}}}}}}^{{{{{{\rm{p}}}}}}}{n}_{{{{{{\rm{on}}}}}}}-{k}_{{{{{{\rm{on}}}}}}}^{{{{{{\rm{p}}}}}}}\rho {n}_{{{{{{\rm{off}}}}}}}+\frac{{n}_{{{{{{\rm{elo}}}}}}}}{{\tau }_{{{{{{\rm{elo}}}}}}}}$$
(14)
$$\frac{d{n}_{{{{{{\rm{elo}}}}}}}}{{dt}}={k}_{{{{{{\rm{ini}}}}}}}{n}_{{{{{{\rm{on}}}}}}}-\frac{{n}_{{{{{{\rm{elo}}}}}}}}{{\tau }_{{{{{{\rm{elo}}}}}}}}$$
(15)

where non, noff, and nelo are the fraction of genes in the bound, unbound, and elongation states, respectively (\({n}_{{{{{{\rm{on}}}}}}}+{n}_{{{{{{\rm{off}}}}}}}+{n}_{{{{{{\rm{elo}}}}}}}=1\)). The state transition of the genes is driven by the binding of a Pol II to the promoter from the nucleosol (the first term of Eq. (13) and the second term of Eq. (14)), the unbinding of the Pol II from the promoter to the nucleosol (the second term of Eq. (13) and the first term of Eq. (14)), the initiation of the transcriptional elongation (the third term of Eq. (13) and the first term of Eq. (15)), and the termination of the transcriptional elongation (the third term of Eq. (14) and the second term of Eq. (15)). ρ is the local concentration of Pol II. kini is the initiation rate and τelo is the elongation time. \({k}_{{{{{{\rm{on}}}}}}}^{{{{{{\rm{p}}}}}}}\) and \({k}_{{{{{{\rm{off}}}}}}}^{{{{{{\rm{p}}}}}}}\) are the rate constants that account for the binding and unbinding of Pol IIs to/from the promoter, respectively.

In general, the binding and unbinding of Pol IIs to the promoter is faster than the initiation of the transcription. In the relevant time scale, the binding and unbinding dynamics of Pol IIs reach the chemical equilibrium, \({k}_{{{{{{\rm{on}}}}}}}^{{{{{{\rm{p}}}}}}}\rho {n}_{{{{{{\rm{off}}}}}}}-{k}_{{{{{{\rm{off}}}}}}}^{{{{{{\rm{p}}}}}}}{n}_{{{{{{\rm{on}}}}}}}=0\). This situation is analogous to the Michaelis-Menten kinetics, where Pol II acts as “enzyme” and the promoter acts as “substrate”. The equilibrium condition leads to the form

$${n}_{{{{{{\rm{on}}}}}}}=\frac{\rho }{\rho +{K}_{{{{{{\rm{p}}}}}}}}{n}_{0}$$
(16)

where \({n}_{0}(={n}_{{{{{{\rm{on}}}}}}}+{n}_{{{{{{\rm{off}}}}}}}=1-{n}_{{{{{{\rm{elo}}}}}}})\) is the fraction of genes in the bound or unbound state. \({K}_{{{{{{\rm{p}}}}}}}(={k}_{{{{{{\rm{off}}}}}}}^{{{{{{\rm{p}}}}}}}/{k}_{{{{{{\rm{on}}}}}}}^{{{{{{\rm{p}}}}}}})\) is the equilibrium constant that accounts for the binding and unbinding of Pol IIs to the promoter. Substituting Eq. (16) into Eq. (15) leads to a kinetic equation only of the fraction nelo. In the steady state, the fraction nelo has the form of Eq. (1).

Free energy of heterochromatin

The free energy of a subchain composed of g units has the form

$$F={F}_{{{{{{\rm{ela}}}}}}}+{F}_{{{{{{\rm{mix}}}}}}}+{F}_{{{{{{\rm{int}}}}}}}-\mu \frac{{\xi }^{3}}{{b}^{3}}\left(\rho +{\phi }_{{{{{{\rm{c}}}}}}}\left({n}_{{{{{{\rm{on}}}}}}}+{n}_{{{{{{\rm{elo}}}}}}}\right)\right)+{\Pi }_{{{{{{\rm{osm}}}}}}}{\xi }^{3}$$
(17)

This free energy is the sum of the elastic free energy Fela of the subsection, the free energy Fmix due to the mixing entropy, and the free energy Fint due to the interaction between units. The fourth and fifth terms of Eq. (17) are the contributions of chemical potential μ, which represents the exchange of Pol II with the exterior (nucleosol), and osmotic pressure Πosm, which represents the mechanical balance with the exterior. We derive the size ξ of the subchain and the volume fraction ρ of Pol II in the volume pervaded by the subchain by using the free energy.

The elastic free energy has the form

$$\frac{{F}_{{{{{{\rm{ela}}}}}}}}{{k}_{{{{{{\rm{B}}}}}}}T}=\frac{3}{2}\frac{{\xi }^{2}}{g{b}^{2}}+\frac{3}{2}\frac{g{b}^{2}}{{\xi }^{2}}$$
(18)

Because the elasticity of a polymer results from the conformational entropy, the free energy scales as the thermal energy \({k}_{{{{{{\rm{B}}}}}}}T\), where kB is the Boltzmann constant and T is the absolute temperature. The first term is the elastic free energy due to the stretching of the subsection and the second term is the elastic free energy due to the compression of the subsection.

The mixing free energy has the form

$$\frac{{F}_{{{{{{\rm{mix}}}}}}}}{{k}_{{{{{{\rm{B}}}}}}}T}=\frac{{\xi }^{3}}{{b}^{3}}\left[\rho \,{{\log }}\,\rho +\left(1-\rho -\frac{{b}^{3}g}{{\xi }^{3}}\right){{\log }}\left(1-\rho -\frac{{b}^{3}g}{{\xi }^{3}}\right)\right]$$
(19)

The first term of Eq. (19) is the free energy due to the mixing entropy of Pol II and the second term of this equation is the free energy due to the mixing entropy of solvent. For simplicity, we neglected the excluded volume of Pol IIs that are bound to the chromatin units because it is not essential to the physics of heterochromatin assembly and increases the number of unknown parameters, see also Supplementary Note 3 and Supplementary Figure 15.

The interaction free energy has the form

$$\frac{{F}_{{{{{{\rm{int}}}}}}}}{{k}_{{{{{{\rm{B}}}}}}}T}=-\frac{{\xi }^{3}}{{b}^{3}}\chi {\sigma }^{2}{\left(\frac{{b}^{3}g}{{\xi }^{3}}\right)}^{2}-\frac{{\xi }^{3}}{{b}^{3}}{\chi }_{0}{\left(\frac{{b}^{3}g}{{\xi }^{3}}\right)}^{2}$$
(20)

The free energy is a function of the size ξ of the subchain and the volume fraction ρ of Pol IIs. The first derivatives of the free energy with respect to ξ and ρ lead to the balance of the osmotic pressure (the equation of state) and the equality of chemical potential of Pol IIs between the subchain and the exterior. Equation (9) is derived by rewriting the equation of state with the chromatin volume fraction

$${\phi }_{{{{{{\rm{c}}}}}}}=\frac{{b}^{3}g}{{\xi }^{3}}$$
(21)

Equation (10) is derived by rewriting the equality of chemical potentials by using \({\rho }_{0}=1/(1+{{{{{{\rm{e}}}}}}}^{-\mu /({k}_{{{{{{\rm{B}}}}}}}T)})\).

Diffusion of small RNA

Small RNAs produced by RDRC/Dicers diffuse away from the nuclear membrane. The local concentration c(z,t) of small RNAs follow the diffusion equation

$$\frac{\partial }{\partial t}c\left(z,t\right)=D\frac{{\partial }^{2}}{\partial {z}^{2}}c\left(z,t\right)-{k}_{{{{{{\rm{d}}}}}}}c\left(z,t\right)$$
(22)

where D is the diffusivity of small RNAs and kd is the degradation rate of small RNAs. z is the coordinate from the surface of the nuclear membrane, see Fig. 2. The boundary condition of the local concentration c(z,t) has the form

$$-D{\left.\frac{\partial }{\partial z}c\left(z,t\right)\right|}_{z\to 0}=S$$
(23)

where the rate S of small RNA production is given by Eq. (7). Equation (4) is derived by solving Eq. (22) in the steady state and use Eq. (23) to determine the integral constant.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.