Introduction

Satellite cells (SCs) are a heterogeneous population of stem and progenitor cells that have been demonstrated to play a pivotal role in skeletal muscle (SkM) hypertrophy, regeneration, and remodeling1,2. The SCs are normally kept in a quiescent state and activated upon exposure to stimuli, such as exercise or SkM injury. When committed to myogenic differentiation, SCs proliferate further, fuse to existing SkM fibers, and contribute new nuclei to the growing and regenerating fibers3. Aged human SkMs show a decline in the number and proliferative potential of the SCs4. As a consequence, a dysfunctional SC compartment is envisaged as a major contributor to age-related defects, including reduced capacity to respond to hypertrophic stimuli such as exercise and impaired recovery from muscle disuse and injury1,5,6. Furthermore, SCs have been shown to contribute to differentiated fibers in non-injured muscles of adult sedentary animals7,8. The basal turnover of nuclei in adult fibers appears to be less crucial in the protection from sarcopenia7, a progressive loss of SkM mass and function, which culminates in a highly disabling condition affecting up to 29% of the population aged 85 years9. Nonetheless, SCs play an essential role in limiting the occurrence of fibrosis in the SkM of mice affected by sarcopenia7 and their function in the human pathology needs to be further characterized.

A well-known factor in the decline of stem cell function is the loss of genome integrity10, for example, caused by the appearance of somatic mutations11. These modifications of the genome range from single-base changes (single-nucleotide variants (SNVs)) to insertions or deletions of a few bases (indels) to chromosomal rearrangements and occur during the whole life, starting from the first division of the embryo. In contrast to germline variants, somatic variants are not propagated to the whole individual but to a subpopulation of cells in the body, with the final consequence that adult human tissues are a mosaic of genetically different cells12,13,14. Moreover, somatic mutation burden increases during a lifetime15,16,17,18 as a result of accumulating errors occurring either during cell division or because of environment-induced DNA damage. At present, nothing is known about somatic mutation burden in human SCs or SkM.

Here, we investigate the genetic changes that occur with aging in the genome of human adult SCs and use the results to elucidate mutational processes and SC replication rate occurring in vivo in adult human muscles. We assess the functional effects of somatic mutations on SC proliferation and differentiation and predict the global consequence on muscle aging and sarcopenia. Our analyses reveal an accumulation of 13 mutations per genome per year that results in a 2–3-fold higher mutation load in active genes and promoters in aged SCs. High mutation burden correlates with defective SC function. Overall, our work points to the accumulation of somatic mutations as an intrinsic factor contributing to impaired muscle function with aging.

Results

Increased somatic mutation burden in aged SCs

We examined the somatic genetic variation in SCs from the leg muscle vastus lateralis of a group of young (21–24 years, n = 3) and old (64–78 years, n = 4) individuals by whole-genome sequencing (WGS). Because the sequencing of clonal populations is an effective strategy to study the genome of single cells19,20, freshly isolated cells expressing the SC antigen CD56 were single cell plated and cultured for 17–20 divisions prior to DNA extraction (Fig. 1a). Even though the selection of a pure SC population from muscle biopsies requires the use of multiple markers besides CD5621, the combined effect of our isolation protocol (Supplementary Fig. 1a) and culture conditions ensured the almost exclusive growth of satellite cell clones (SCCs, 57/58 tested colonies, Supplementary Table 1). Two to five SCCs/biopsy and the relative SkM and blood bulk DNA were subjected to WGS with an average depth of 30× (Supplementary Table 2). The SC origin of the sequenced clones was tested with either a differentiation assay or quantitative PCR (qPCR) for the expression of the SC markers Pax7, MyoD, and myogenin. Twenty one out of 21 tested SCCs resulted positive (Supplementary Table 2 and Supplementary Fig. 1b and c).

Fig. 1
figure 1

Age-related increase of somatic mutations in satellite cells. a Experimental strategy for somatic variant detection with whole-genome sequencing (WGS). The reference sequence was obtained from the blood of every individual. The muscle vastus lateralis was sequenced as a population (muscle bulk) or used to isolate CD56+ cells. These were either tested to verify the percentage of satellite cells or single cell plated and sequenced as a clonal population (SCCs) in order to detect cell-specific somatic mutations. The satellite cell origin of sequenced SCCs was tested in all clones that produced a sufficient number of cells (21/29) b Total number of somatic single-nucleotide variants (SNVs) and insertion/deletions (indels) per SCC normalized to the percentage of autosomes covered by the sequencing. The linear fit of mutation numbers and age is shown. P-values are calculated using a robust mixed model. Different colors correspond to the donors. c Average number of SNVs and indels accumulated per year by the SCCs according to the linear fit shown in b. d Absolute contribution of signatures 1, 5, and 8 to the mutation catalogue of each SCC. e Average percentage of SNVs/total SNVs for the most relevant signatures in young and old SCCs. *P < 0.05 two-sided t-test. f Linear fit of cancer signatures 1, 5, and 8 with the age of the SCCs. P-values were calculated with a robust mixed model with Bonferroni correction for multiple testing of different signatures (n = 3). g Yearly increase in the selected cancer signatures obtained by the linear fits in f was plotted as a proportion of the total number of SNVs/Gb accumulating every year in the SCCs

Next, we analyzed somatic mutations in 29 SCCs. The identified somatic SNVs ranged between 209 and 1371 per SCC (Fig. 1b). Considering a calculated false negative rate of 0.41 (Supplementary Table 3), the actual number of somatic mutations per genome was estimated to range between 354 and 2323, which is consistent with studies in clonally expanded cells from human fibroblasts19,22 and stem cells from the intestine and liver16. The pipeline for calling somatic variants was validated by analyzing two technical WGS replicates and showed a 94% and 99% validation rate of the somatic SNVs (Supplementary Table 4). In addition, WGS-independent validations of exonic SNVs selected from several clones (n = 16) showed a 100% validation rate (Supplementary Table 5). Somatic indels ranged between 8 and 109 per SCC (Fig. 1b and Supplementary Tables 2 and 6), with a validation rate of 84% and 98% in WGS of technical replicates (Supplementary Table 1D).

The somatic variants showed an age-dependent accumulation of mutations in the SC genomes (Fig. 1b; P = 1.47 × 10–4 for SNVs and P = 2.77 × 10−3 for indels), with a mutation rate of 13.1 ± 2.7 SNVs per genome per year during adult life (Fig. 1c). Somatic mutations have been suggested to arise in the quiescent state13, but errors occurring during cell division are generally considered the major source of mutations23,24. If we only consider cell division-coupled mutations and use a mutation rate of 2.5 SNVs per genome per cell division25,28. These data provide a map of the exons, promoters, and enhancers that are active in cultured SCs and represent a set of genes important for SC maintenance and proliferation. Moreover, the regions expressed in response to differentiating stimuli were registered, giving an indication of the enhancers and genes important for SC differentiation and fusion to myotubes. In all of these regions, old SCCs carried a number of somatic mutations that was 2–3-fold higher compared to young cells.

We wondered whether the higher mutation burden could cause an impairment of the SC functions, e.g., proliferation and differentiation. Alterations in the stem cell function during aging have been ascribed to deregulation of both extrinsic and intrinsic factors35. Extrinsic factors include molecules systemically delivered to the muscle or secreted by the local niche in the microenvironment where the SCs reside. It has been demonstrated that the SC environment changes over time and influences the SC self-renewal and capacity for tissue regeneration36. On the other hand, the age-related reduction in the number of SCs that are able to repopulate the niche and differentiate into muscle fiberrs has been demonstrated to also rely on cell-intrinsic factors, such as the activation of signaling pathways37,38 and epigenetic mechanisms39. With our model, we could explore how an intrinsic factor, i.e., genetic alterations, affects the SC activity, providing unique genome-wide data and direct assessment of the ex vivo performance of SCs of different ages. First, we found a negative correlation between the number of cells able to proliferate and differentiate and the average mutation burden in our biopsies. However, these correlations were based on data from the SC populations. To analyze the specific proliferation ability linked to each sequenced genome, we used data on the time required to complete the clonal culture. We found that the longer the time required for the colony to expand in culture, the higher the number of somatic mutations in the genome of the founder cell. This was true only for old SCCs, while no correlation was found for young SCCs. Overall, these results suggest that there is a threshold of mutations above which SC proliferation is impaired. Young SCCs showed a mutation burden lower than this threshold and proliferated normally. Conversely, old SCCs that carried a number of mutations above the threshold generally completed the clonal culture in abnormally long time. Similar results were obtained when we attributed a lethality score to every somatic mutation using CADD29. Old SCCs with normal proliferative rate and old SCCs with impaired proliferative capacity can be grouped and distinguished on the basis of the mutation lethality score. Interestingly, the lethality score that was only based on mutations in expressed regions could not distinguish normally and abnormally proliferating old SCCs. Hence, the effect of somatic mutations on cell fitness was dependent on the whole mutation burden, not only on expressed regions. In summary, our data on the proliferation of sequenced SCCs support the mutation burden as an important intrinsic factor regulating SC function during aging.

Besides affecting the SC activity per se, somatic mutation burden can be propagated to the mature tissue if SCs fuse to differentiated fibers. Therefore, we assessed whether mutations found in the SCs could be detected in the tissue of origin. To this end, we tested six SNVs affecting exons and introns of muscle-expressed genes and that showed high CADD scores. We found that a missense SNV in the HSPG2 gene was present in 1.3% of the alleles, which relates to 2.6% of the cells of the muscle of the old individual from which the SC was derived. We also found that this mutation was expressed at detectable levels. This HSPG2 variant has not previously been reported as pathogenic. However, it is harbored in a conserved region (PhastCons = 1) and a missense variant that changes the same amino acid (p.R2609Q) was found in two different genetic diseases, dyssegmental dysplasia (OMIM 224410) and Schwartz-Jampel syndrome type 1 (OMIM 255800). Thus these results show that variants found in single SCCs can be found at detectable levels in the muscle and, if pathogenic, hamper the function of the whole tissue. In line with this view, we used the somatic variants found in SCCs to predict possible deleterious effects of somatic mutations in aged SkMs. We analyzed the mutation load in a number of gene sets that regulate muscle function such as genes differentially expressed after muscle training, pathways involved in the response to training, especially those that deteriorates with aging5, and genes differentially expressed with age. Our results showed a mutation load increase of 5–6-fold with increased age and suggested a link between the increased mutation burden in old SCs and the defective response of aged muscles to training.

In summary, we have shown that SCs of the vastus lateralis muscle accumulate somatic mutations during life and provided evidence that intrinsic factors, such as the loss of genome integrity, contribute to aging of the SkM. Transcribed regions were specifically affected, and the mutation burden correlated with the capacity of SCs to proliferate and differentiate. Moreover, SCs could propagate their genomic defects to differentiated fibers and these defects might explain the decrease of muscle mass and response to stimuli observed in sarcopenia. Additional studies are required to assess similarities and differences in the age-related accumulation of mutations in muscles other than the vastus lateralis.

Methods

Clonal expansion of human SCs

Human SkM biopsies were obtained from healthy volunteers according to Ethical Permit EPN 2015/847-31/1, the Stockholm regional ethical review board. Informed consent was obtained from all donors. In total, 50–100 mg of tissue from the muscle vastus lateralis were obtained using a percutaneous needle biopsy technique40. A part of the tissue was frozen, while ~40 mg were freshly digested using 5 ml Trypl-LX in two rounds of 20 min incubations at 37 °C, 5% CO2 and gentle agitation. Cells were pre-plated in the growth medium in a 10-cm cell culture dish (Dulbecco’s modified Eagle’s medium (DMEM) F12 containing 1% ABAM and 20% fetal bovine serum (FBS)) for 30 min at 37 °C and 5% CO2. The non-attached cells were transferred to a 6-cm cell culture dish coated with collagen I (5 µg/cm2 of Collagen I bovine protein, Gibco #A10644-01, following the “thin coating procedure”). The cells were left to adhere for 48 h. Then the cells were detached and stained for CD56-PE (clone MY31, BD #347747, dilution 1:200). Using a BD FACSAria™ Mu cell sorter (BD Biosciences, USA), the cells were sorted based on their size and strong positivity for the antigen (see Supplementary Fig. 1) and were single cell plated in at least one 96-multiwell collagen-coated plate per biopsy. The cells were grown in a conditioned medium, i.e., growth medium that was collected every 48 h from a culture of feeder SCs (40–60% cell confluency), filtered through a 2-µm-wide pore filter and supplemented with FBS 1/10 of the volume. Colonies were counted after 14 days and scored as positive when at least 16 cells were visible in the well. Approximately 21 days after plating, the colonies were moved to 24-multiwell plates. Depending on the cell confluency, the colonies were then moved to six-multiwell plates. The new plates were collagen coated but growth medium instead of conditioned medium was provided. After an average of 52.9 ± 2.2 days in culture, the colonies were confluent and used for the DNA extraction.

DNA extraction

DNA was extracted from the confluent wells of the six-multiwell plate using the Gentra Puregen Kit, Qiagen. DNA was extracted from 30 mg of muscle biopsy using the Gentra Puregen Kit, supplemented with a lysis buffer containing Proteinase K as recommended by the supplier. DNA was extracted from 3 ml of total blood that was collected in EDTA as recommended by the instructions of the Gentra Puregen Blood Kit.

Whole-genome sequencing

The library preparation and sequencing were carried out at NGI Sweden, Science for Life Laboratories, Stockholm, following standard methods. For the SCCs, the library preparation was performed by a semiautomatic NeoPrep station using the Illumina TruSeq Nano Kit (350 bp average insert size) and 25 ng of DNA starting material. The libraries of the bulk DNA samples were prepared with Illumina TruSeq PCR-free library preparations (350 bp average insert size). Sequencing was performed on Illumina HiSeq X, PE 2×150 bp.

SC purity of cultured CD56+ populations

The purity of the SC populations isolated with our protocol was compared in different muscle biopsies by assessing the transcript levels of the SC marker Pax7. In parallel with single-cell sorting, CD56+ cells were sorted in a 24-well plate (Supplementary Table 1) and cultured for 1 week to expand the population prior to RNA extraction. In addition, the purity of the single-cell culture was tested by assessing the percentage of clones derived from single-cell sorted cells that expressed the myocyte marker MyoD. RNA levels were tested in those SCCs that were able to grow confluent in a 24-well plate and 97.4% of clones (CES1 n = 3, CES6 n = 9, CES7 n = 14, and CES8 n = 13) were found positive. The relative expression of SC markers was measured using the same calibrator, the CD56+ sorted population from CES3, expanded in culture for 4 weeks (Pax7 expression characterized in Supplementary Fig. 1b). As negative controls, a human fibroblast population and two clones from human subcutaneous fat were used. Negative control samples never showed amplification of the SC genes.

To further characterize the contaminant cell types that were present in our CD56+ sorted populations, we expanded for 3 weeks and froze the unsorted populations left from the original cell sorting of the CES biopsies. Thawed cells were sorted for CD56 and fluorescence-activated cell sorting analysis was performed to assess the percentage of cells expressing markers of SCs (Pax7 monoclonal, Hybridoma Bank, 10 μg/ml) and bone marrow-derived cells (CD45-APC, clone HI30, BD Biosciences, USA). At least 7000 cells per sample were counted. In addition, 5000 cells were cytospun on glass slides and immunofluorescently labeled for the fibroblast marker TE-7 (CBL271, Millipore, CA, USA). At least 300 cells per sample were counted. Thawed cells sorted for CD56 were grown as single-cell clones with the same protocol used in the original experiment and again the myogenic origin was tested by MyoD expression. 100% of clones resulted positive (CES3 = 4, CES6 = 3, CES7 = 6, CES8 = 6).

Verification of SC origin of the sequenced SCCs

Culture of SCCs from CD56+ sorted cells does not guarantee the exclusive growth of SC colonies, due to the presence of contaminant cells of myogenic and non-myogenic origin in the CD56+ pool (see Supplementary Table 1). For this reason, the SC origin of the sequenced SCCs was tested in a myotube-formation assay or by assessing the expression of the SC markers Pax7, MyoD, and myogenin via qPCR. This test could be performed for 21 of the 29 clones. The remaining eight clones showed a scarce growth, and after DNA extraction, no cells were available for additional tests.

Cell-culture assays

Proliferation assays were performed on the seven biopsies included in the study (Supplementary Table 1). For every biopsy, at least 96 SCs were plated as single cells (CES1, CES7, and CES8: N = 96; CES2, CES3, CES4, and CES6: N = 144). After 2 weeks in culture, the percentage of cells able to form a colony of at least 16 cells was scored (colony-forming SCs). SCs that were able to grow up to 30,000–100,000 cells (confluent well of six-well plate) were scored positive for long-lasting proliferative potential independent of the time in culture. The differentiation assay was performed on confluent cells, by replacing growing medium with differentiating medium (DMEM-F12, 1% FBS) for 5 days. The clones were scored for the presence of visible myotubular structures and the RNA was extracted to verify the myogenic origin. For this assay, the original plating of the single-cell clones was used for biopsies CES1 n = 37, CES2 n = 39, and CES4 n = 29. For biopsies CES3, CES6, CES7, and CES8, the frozen expanded populations of unsorted cells were thawed and grown for 1 week, than sorted for CD56, single cell plated, and grown to confluency in 24-well plates (CES3 n = 21, CES6 n = 37, CES7 n = 24, and CES8 n = 28 SCCs).

Long-culture SCCs

Freshly isolated SCs from a young individual (Supplementary Table 1) were expanded in culture for 50 days using growth medium (DMEM F12 containing 1% ABAM and 20% FBS), then the SCs were single cell plated and cultured according to our protocol for clonal expansion of SCs from human SkMs. The clones and blood bulk DNA from the donor were sequenced, and the somatic variants were analyzed as a control for the in vitro-induced mutations.

Somatic variant calling

Raw reads were aligned to the human reference genome (GRCh37/hg19 assembly version), using bwa mem 0.7.1241. Samtools 0.1.1942 was used for alignment sorting and indexing, and qualimap v2.243 for the alignment quality-control statistics. The raw alignments were then processed using the GATK best practice44 with version 3.3 of the GATK software suite. GATK RealignerTargetCreator and IndelRealigner were used to realign around indels, Picard MarkDuplicates 1.120 to mark duplicates, and GATK BaseRecalibrator to recalibrate base quality scores. Finally, genomic VCF files were created using the GATK HaplotypeCaller 3.3. Reference files came from the GATK 2.8 resource bundle and steps were coordinated using Piper v1.4.0 (www.github.com/NationalGenomicsInfrastructure/piper).

To identify somatic variants, a specific pipeline was developed. For each SCC, the union of variants called with HaplotypeCaller (GATK)45, MuTect2 (GATK 3.5.0), and FermiKit version r17846 were subjected to further filtering steps. First, the variants present in any of the SCCs were gathered in a comprehensive list of interesting positions, which was specific to each individual. For every clone of the individual, the read distribution of the interesting positions was derived from the .bam files and matched to the relative blood bulk. Variants were called when all of the following criteria were met: the read fraction supporting the alternative allele fell in the desired region (0.4–0.6), the read fraction in the blood was low (alternative < 0.1), and the coverage in both the clone and blood was at least 15×. Chromosomes X and Y were excluded from the analyses. Additional quality filters were applied as follows: the reads supporting the variants were on both strands, the maximum coverage was 1000×, and the variants that were located in problematic regions47,48 and those detected in more than one individual were removed. The variants were annotated using the Ensembl Variant Effector Predictor from ref. 49. Filtered somatic mutations in young, old, and long-term cultured human SC identified in the study are available in Supplementary Data 13.

Estimate of false negative rate

The estimate of false negative rate was done by counting how often heterozygous SNVs were detected in the clone bam file with an allele frequency that allows our pipeline to classify it as have occurred in vivo, i.e., 0.4–0.6 allele frequency. First, a set of high-confidence SNVs for which the individual was germline heterozygote (bulk and blood) and present with an allele frequency > 0.3 in the SweGen population was generated. Second, the germline heterozygote SNVs with a clone depth minimum of 15× were selected. Third, the number of SNVs that were detected in the clone bam file by GATK and/or Fermikit and/or MuTect2 were collected. Finally, the number of SNVs with allele frequency 0.4–0.6 were counted using the Samtools mpileup. Similarly, we estimated how often homozygous SNVs are detected with minor allele frequency under 0.1 in blood. High-confident homozygous SNPs were created in the same way as above, and Samtools mpileup was used to count the number of SNVs with allele frequency below 0.1 in the blood. To yield the false negative rate, the positive prediction rate (e.g., the fraction of SNVs with allele frequency 0.4–0.6 of the total number of SNVs and correct called homozygous SNVs) was subtracted from 1.

Variant validation

The variant validation was performed on technical replicates of WGS. Clone P2703_113 was sequenced twice with independent library preparations. Clone P2703_116 was split into 2 wells during the cell culture (1000 cell-stage) and resulted in 2 independently grown clones (P2703_116 and P2703_119) derived from the same ancestor cell. The DNA was extracted and sequenced independently, but clone P2703_119 was not included in the study. Variants were called in clones P2703_113 and P2703_116 (discovery set) according to our somatic variant calling pipeline. Called variants that had a minimum coverage of 10× in both the discovery and the validation sets were used for the validation. In total, 798 SNVs and 82 indels of clone P2703_113 and 1140 SNVs and 66 indels of clone P2703_116 were tested. Variants were considered validated when at least three reads supporting the alternative alleles were present in the validation set. As a control for the background signal, we validated the variants in unrelated clones, e.g., clones derived from a different founder cell obtained from the same or a different biopsy. For a WGS-independent validation, we used Agena genoty** (sequenom maldi-tof technology), which was performed at the Mutation Analysis Facility at the Karolinska Institute. The SNVs predicted to have high-impact consequences on the encoded protein were selected from all clones and tested when assays were designable (n = 11) in all clone, muscle bulk, and blood bulk DNA included in the study. The variants were considered validated when the expected genotype was found in the clone from which the SNV was discovered, and the reference allele was found in all the other samples, including the relative blood bulk sample. In addition, six SNVs selected based on their impact on muscle physiology were validated by digital droplet PCR (ddPCR; Bio-Rad) in the clone from which the SNV was discovered and the relative bulk samples.

RNA extraction and qPCR

RNA was extracted from muscle biopsies using TriZol® Reagent (Invitrogen) and from cultured cells using the RNeasy Mini Kit (Qiagen), according to the manufacturers' instructions. cDNA synthesis was performed using random hexamers and SuperScript Reverse Transcriptase (Invitrogen). Quantitative reverse transcriptase PCR (RT-PCR) was performed using Taqman qPCR reagents and an ABI7500 fast system sequencing detection instrument (Applied Biosystems). Primers and probes were assays on demand from Applied Biosystem (Pax7 Hs00242962_m1; MyoD1, Hs00159528_m1; myogenin Hs01072232_m1) and the normalizer gene was glyceraldehyde 3-phosphate dehydrogenase (#4352665, Applied Biosystems).

Digital droplet PCR

The rare event detection (RED) analysis was performed using the QX200 ddPCR system (Bio-Rad, Hercules, CA, USA). DdPCR assays (Bio-Rad, Hercules, CA, USA) were designed to detect six SNVs, namely, HSPG2 (chr1: 22174499G>A, assay id: dHsaMDS988410713), TTN (chr2: 179560728G>A, assay id: dHsaMDS896792909), FN1 (chr2: 216272892C>T, assay id: dHsaMDS668887829), FNDC1 (chr6: 159682281T>A, assay id: dHsaMDS711365861), RAP1GDS1 (chr4: 99338610C>T, assay id: dHsaMDS467491822), and CACNA2D2 (chr3: 50426879C>T, assay id: dHsaMDS892017893). The PCR reactions were performed using 2× ddPCR Supermix for Probes (no dUTP) (Bio-Rad), 20× ddPCR mut assays (FAM/HEX labeled) (Bio-Rad), 5 U of HindIII restriction enzyme (New England BioLabs, Ipswich, MA, USA), and template DNA and ran according to the manufacturer’s instructions. gBlocks gene fragments (IDT, Coralville, IA, USA), carrying one of the six SNVs each, were spiked into the human DNA sample and used for testing the ddPCR assays and optimizing the PCR thermal cycling conditions prior to the analysis of the samples. Approximately 30 ng of sample DNA per well were used in the analysis, except for the case of DNA from the additional clones from individual CES6 for which only 5 ng per well were used due to the low amount of available DNA. All samples were run in two or more replicate wells, and the data were merged from all the replicate wells to calculate the fractional abundances of the mutant alleles. Sample data were only accepted when falling within established detection parameters, which include a minimum of 3 positive droplets per sample and 10,000 accepted droplets per well.

Owing to the exonic position of the HSPG2 and FN1 mutations, the same assay could be used with both DNA and cDNA. To perform the RED on the cDNA samples, 1 μg of RNA was treated with DNase using RQ1 RNase-Free DNase (Promega, Madison, WI, USA) prior to the cDNA synthesis via the SuperScript First-Strand Synthesis System for RT-PCR (Invitrogen, Carlsbad, CA, USA), which was performed according to the manufacturer’s instructions. The RED was performed as described for DNA. Approximately 50 ng of sample cDNA per well were used for the analysis, except for the case of cDNA from the SC population from a control individual, for which the amount of cDNA had to be reduced to 7.5 ng per wall to prevent overloading of the assay.

Mutational signatures

All analyses of base substitutions (Supplementary Fig. 2b) and mutational signatures (Fig. 1 and Supplementary Figs. 3 and 4) were carried out using the R-package MutationalPatterns16,50 following the creators’ instructions. In brief, base substitution patterns of the somatic mutations detected in SCCs were analyzed and visualized with MutationalPatterns using the reference genome set to “BSgenome.Hsapiens.UCSC.hg19”. Using the function “mut_matrix”, the relative contribution of the six single base substitution subtypes: C:G>A:T, C:G>G:C, C:G>T:A, T:A>A:T, T:A>C:G, and T:A>G:C, was refined by including the sequence context 5ʹ and 3ʹ of each mutated base to obtain a 96 trinucleotide substitution count matrix. We performed de novo extraction of mutational signatures from the SCC mutations using non-negative matrix factorization51 by applying the function “extract_signatures” to our data. We used the 96 trinucleotide substitution count matrices for all SCCs (n = 29) as input and specified that three signatures should be extracted (rank = 3). The contribution of the 96 possible trinucleotide substitution types to the extracted signatures was visualized using the function “plot_96_profile” (Supplementary Fig. 1f). The relative contribution of the extracted signatures to the mutational catalog was used for the PCA (Supplementary Fig. 1g). We used the function “fit_to_signatures” to assess how well the mutational catalog of our SCCs can be expressed as a combination of the previously identified cancer signatures (http://cancer.sanger.ac.uk/cosmic/signatures). Whole-genome data were then expressed as SNVs/Gbase/SCC and replotted (Fig. 1f). For each signature, the number of mutations per SCC was plotted and the linear fit with donor age was used to calculate the year increase (Fig. 1g). Analyses in R were done using RStudio Version 0.98.1025. Signature data were analyzed using the following pipeline to obtain principal components of the 29 sequenced SSCs and previously published sequence data from human fibroblasts27: (i) data were normalized on a per clone base; (ii) normalized data were transformed into principal components (NB: these principal components are the ones shown in the plots); (iii) 100,000 permutations of k-means clustering algorithm (with k = 2) have been performed on data restricted to the first three principal components. This pipeline has been implemented using MATLAB, and all programs have been run with default parameters.

Genomic distribution of mutations

The distribution of somatic mutations across the human genome (Supplementary Fig. 5a) was visualized using the function “plot_rainfall” of the R-package MutationalPatterns16,50. The observed and expected numbers of mutations in the different genomic regions (Fig. 2b, c) were analyzed using the function “genomic_distribution”. Input data were the somatic mutations detected in SCCs, genomic regions with enough sequencing depth (15×) in every SCC, and the genomic regions that should be tested for enrichment or depletion of somatic mutations, all represented as GRanges objects52. Regulatory regions were extracted from Ensembl using the BioMart database “regulation”, dataset “hsapiens_regulatory_feature” and GRCh “37”. Promoter regions were extracted using the filter “regulatory_feature_type_name” set to “Promoter”. Promoter flanking regions were extracted using the filter “regulatory_feature_type_name” set to “Promoter Flanking Region”. Open chromatin regions were extracted using the filter “regulatory_feature_type_name” set to “Open chromatin”. Transcription factor (TF)-binding regions were extracted using the filter “regulatory_feature_type_name” set to “TF binding site”. To test for significant depletion or enrichment of somatic mutations in genomic regions, we used the function “enrichment_depletion_test” with the observed and expected number of mutations in the genomic regions as input. The results were visualized using the fuction “plot_enrichment_depletion”, with the significance test data as input. Exon data were extracted from Ensembl using the BioMart database “ensembl”, dataset “hsapiens_gene_ensembl” and GRCh “37”. Information on genes that were detected or not detected on RNA level in the SkM was extracted from the human protein atlas (http://proteinatlas.com). A list of genes detected in the SkM was obtained by concatenating the results from the search terms “tissue_specificity_rna:skeletal muscle;elevated AND sort_by:tissue specific score”, “tissue_specificity_rna:any;expressed in all”, and “tissue_specificity_rna:skeletal muscle;mixed” at http://proteinatlas.org. A list of genes not detected in the SkM was generated by the search term “tissue_specificity_rna:skeletal muscle;not detected” at http://proteinatlas.org. Based on Ensembl ID, we selected the subsets of all exons extracted from the BioMart database “ensemble” that were present in the list of genes detected in the SkM or in the list of genes not detected in the SkM.

The distribution of mutations in 173 muscle disease genes was tested against a null model as follows. SCCs were assigned to the young (n = 13) and old (n = 16) groups and all genes harboring a mutation were registered for each group. The total number of mutations with encode annotation to a gene (e.g., harbored in extended gene regions including exons, introns, and 5 kb upstream and downstream of each gene) was counted for each group. We sampled 104 times the same number of exons from the list of all exons (“hsapiens_gene_ensembl” previously described) proportionally to their length and then computed how many extracted extended gene regions were present in the list of muscle disease genes. P-value and Z-score were then evaluated by comparing the empirical number of mutations with the distribution of the null-models using one-sided t-test. Full gene length was obtained from Gencode, release 19.

Mutation load in specific genomic regions and gene sets

For the analysis of genes, promoters and enhancers actively expressed during myoblast to myotube differentiation (Fig. 3a) data were downloaded from the FANTOM5 project at http://fantom.gsc.riken.jp/data/ (human_permissive_enhancers_phase_1_and_2_expression_tpm_matrix.txt.gz and hg19.cage_peak_phase1and2combined_counts_ann.osc.txt.gz)53. Expression data were available at 8 different time points during the myoblast to myotube differentiation (CNhs13847-14585): day 00 (basal), day 01, 02, 03, 04, 06, 08, 10, and 12 upon exposure to differentiation stimuli. Exploiting different time points, two expression sets, “basal” and “upon differentiation”, were created. “Basal” corresponds to transcribed regions on day 00. For “upon differentiation”, transcribed regions at all time points were intersected with expressed regions on day 00 and the non-overlap** portion of transcribed regions at all time points was taken. To determine actively transcribed promoters and enhancers, we determined an arbitrary threshold based on surveyed region size (see Supplementary Fig. 8). For enhancers, all regions with an expression score ≥1 were included and the considered regions were enlarged +/−100 bp. For promoters, all regions with an expression score ≥30 were included and the considered regions were enlarged of an asymmetric window of 860 bp upstream and 100 bp downstream. For exons, gene names were derived from expressed regions reported in the promoter list and the exonic regions of each gene were derived from “hsapiens_gene_ensembl” previously described. Three different donor replicates were used and regions transcribed with intensity equal or above the threshold in at least one replicate were included. Size of surveyed regions was 4525728 and 32409968 bp for enhancers in “basal” and “upon differentiation”, respectively, 5335985 and 3903736 for promoters, and 17542615 and 11123995 for exons. Somatic SNVs and indels were annotated to the described regions and quantified for every SCC and the average of mutations in young and old SCCs was reported. When enhancer and promoter regions overlapped, mutations were attributed to either one based on manual investigation of the region.

For the analysis of the numbers of mutations in gene sets involved in muscle function (Fig. 4b), genes were derived as described in Supplementary Tables 7 to 9. Lists of exonic mutations in young and old clones were screened and every mutation that was annotated with a gene name included in the gene set was scored. The average number of mutations per gene was calculated by dividing the number of mutations in the gene set by the number of tested SCC (young n = 13, old n = 16) and the number of genes in the gene set.

Statistical analyses

Unless otherwise indicated, P-values were calculated using two-tailed distribution, two-sample unequal variance Student’s t-tests, with significance defined as P < 0.05 (*P < 0.05, **P < 0.01, ***P < 0.001). The results are presented as the mean ± standard error of the mean (SEM). All calculations were performed using the GraphPad Prism software. The linear fits between mutation numbers and age shown in Fig. 1 were obtained using a robust mixed model where the dependent variable is the number of mutations or a given mutational signature, the fixed effect is age, and the random effect is the individual. Bonferroni correction for the number of tested signatures was applied in Fig. 1f. Analyses were performed in R.

Data availability

The sequence data that support the findings of this study are available within the article and its Supplementary Information (Supplementary Data 13). Raw sequence data are only available from the corresponding author upon request and will be made available to research projects that fulfill the informed consent, owing to regulations pertaining to the authors' deposition of these data in public repositories.