Background

Dystrophinopathies are the most common X-linked inherited muscle diseases. Their manifestations range from mild phenotypes, such as an asymptomatic increase in the serum concentration of creatine phosphokinase (CK), to severe phenotypes that include Duchenne muscular dystrophy (DMD, MIM 310200), Becker muscular dystrophy (BMD, MIM 300376), and DMD-associated dilated cardiomyopathy (DCM, MIM 302045) [1]. DMD usually presents in early childhood. Affected children are often wheelchair dependent by age 12 years, and few survive beyond the third decade; respiratory complications and progressive cardiomyopathy are common causes of death. BMD is characterized by later onset and relatively slow progression, and heart failure from DCM is a common cause of death in BMD [2, 3]. Exact prevalence data for dystrophinopathies are not available. DMD is more common than BMD, with a reported incidence of 1:4,700 live male births in Canada [4] and 1:3,917 live male births in southeast Norway [5].

The molecular basis of DMD/BMD and DCM is pathogenic variation in the DMD gene (MIM 300377), the largest gene in humans, which spans 2.2 Mb of genomic sequence at Xp21 and consists of 79 exons. The DMD gene contains at least seven independent, tissue-specific promoters and two polyA-addition sites [6]. Among the resulting isoforms, three full-length isoforms share the same number of exons but are derived from three independent promoters (each with its own exon 1) active in the brain (Dp427c), muscle (Dp427m), and Purkinje cerebellar neurons (Dp427p) [2]. Although many variants have been documented within this gene, the majority affect the expression of the muscle isoform (Dp427m) [2]. About 65% of DMD gene pathogenic variants are exonic deletions, ~ 10% are exonic duplications, and about 25% are small variants, including point mutations, small insertions/deletions (indels), and others [3].

Numerous molecular genetic methods are available for mutation screening of the DMD gene. Multiplex polymerase chain reaction (M-PCR), which targets mutation hotspots, can detect approximately 98% of exonic deletions [7]. Multiplex ligation-dependent probe amplification (MLPA) [8] is more widely used in clinical laboratories for DMD mutation screening because it can simultaneously detect exonic deletions and duplications. Next-generation sequencing (NGS) enables rapid and comprehensive screening of single nucleotide variations (SNVs) and small indels across the 79 exons of the DMD gene. A genetic diagnosis can be confirmed in around 98% of DMD/BMD patients by MLPA combined with NGS [9]. However, these traditional screening methods cannot resolve complex structural variants (SVs) of the DMD gene. For example, they cannot discern whether DMD exonic duplications are located extragenically or intragenically, or whether they are arranged in tandem. This information is important for determining the pathogenicity of duplications [10]. Recently, long-read sequencing (LRS) methods have emerged that can generate genome assemblies of unprecedented quality. Leveraging the advantages of longer reads, LRS has been successfully employed in the genetic testing of monogenic diseases with structural complexity, including thalassemia [11] and congenital adrenal hyperplasia [12].

In this study, we selected three unique families with duplication variants affecting exon 1 and/or exon 2 in the DMD gene to explore the structural characteristics of exonic duplications through LRS. Our investigation aimed to shed light on the pathogenicity of these variants and provide further insights into their implications.

Results

MLPA results

In pedigree 1, the index patient (II4) was found to carry a duplication of exons 1–2 in the DMD gene during routine expanded carrier screening (ECS). Subsequently, MLPA confirmed that the duplication involved exons 1–2 in the Dp427m isoform and exon 1 in the Dp427c isoform of the DMD gene. Further investigation of the family demonstrated that three other females (II2, II3, and III8) were heterozygous duplication carriers. Unexpectedly, three male members (I1, III3, and III5) harbored the same duplication in the hemizygous state without clinical manifestations of DMD/BMD, and without abnormal biochemical indicators in those tested (I1, III3). The lack of genotype–phenotype cosegregation suggested that the duplication variant affecting exons 1–2 in the Dp427m isoform and exon 1 in the Dp427c isoform of the DMD gene was likely benign (see Fig. 1B and Supplementary Fig. 1).

Fig. 1

Genetic analysis of the DMD gene in Pedigree 1. A and B show the family pedigree and the DMD gene analysis results detected by MLPA. Male members I1, III3, and III5 had hemizygous duplication variants of consecutive DMD exons 1–2 in Dp427m and exon 1 in Dp427c, and female members II2, II3, II4, and III8 were heterozygous. C shows the critical breakpoint in a screenshot of the Integrative Genomics Viewer (IGV) based on LRS data analysis. D is a schematic diagram showing the location of the breakpoint and the architectural features of the duplication variant. E shows the Sanger sequencing verification of the critical breakpoint. The red dashed line indicates the breakpoints and the red single arrow indicates the same critical breakpoint

In pedigree 2, the affected boy (II1) carried a duplication variant involving exon 2 in the DMD gene and displayed the characteristic clinical phenotype of DMD. His sister (II2, index patient) carried the heterozygous duplication of exon 2, with a mild phenotype of abnormal biochemical indicators. Their mother (I2) was identified as a carrier of the heterozygous duplication variant of exon 2 in the DMD gene, with no abnormal phenotype (see Fig. 2B).

Fig. 2

Genetic analysis of the DMD gene in Pedigree 2. A and B show the family pedigree and the DMD gene analysis results detected by MLPA. The affected boy (II1) had a hemizygous duplication variant of exon 2 in the DMD gene, and his mother and sister were heterozygous. C shows the critical breakpoint in a screenshot of the Integrative Genomics Viewer (IGV) based on LRS data analysis. D is a schematic diagram showing the locations of the breakpoints and the architectural features of the duplication variant. E shows the Sanger sequencing verification of the critical breakpoint. The red dashed line indicates the breakpoints and the red single arrow indicates the same critical breakpoint

In pedigree 3, the proband carried a duplication variant involving exon 1 in Dp427c and displayed typical clinical phenotypes of DMD. His pregnant mother was identified as a carrier of the heterozygous variant (see Fig. 3A).

Fig. 3

Genetic analysis of the DMD gene in Pedigree 3. A shows the family pedigree and DMD gene analysis results detected by MLPA. The proband (II1) had hemizygous duplication variants of exon 1 of the Dp427c transcript in the DMD gene, and his mother was heterozygous. B shows critical breakpoints from screenshots of the integrative genomics viewer (IGV) based on LRS data analysis, and corresponding verification results by Sanger sequencing. From top to bottom are the joints of fragments a and c, fragments c and e, fragments e and a (indicated in C). The red single arrows indicate the breakpoints. C Schematic diagram shows the locations of breakpoints and architectural features of the duplication variant. The red dashed lines indicate the breakpoints involved in recombination, and the coordinates of the breakpoints in the genome are shown next to them

Breakpoints and architectural features identification by LRS and validation

Whole-genome LRS was performed to identify the breakpoints, and the sequencing data parameters are described in Supplementary Table 2. In the index patient (II4) in pedigree 1, LRS revealed that the duplication variant was located at chrX:33,019,224–33,822,717. This duplication encompassed a contiguous segment of approximately 803.5 kb, spanning consecutive DMD exons 1–2 within the Dp427m transcript and exon 1 within the Dp427c transcript. The duplication occurred in a tandem arrangement. The critical breakpoint was confirmed by Sanger sequencing (see Fig. 1C–E).

In pedigree 2, a fresh lymphocyte sample was unavailable from the affected boy (II1), but a fresh sample was obtained from II2. LRS on II2 indicated that the duplication variant was ~ 71.0 kb in size and located at chrX:32,999,023–33,070,000. This duplication involved exon 2 of the DMD gene alone and was arranged in tandem, potentially disrupting the reading frame of the DMD gene. The critical breakpoint was validated by Sanger sequencing (see Fig. 2C–E).
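The reading-frame argument for a tandem, intragenic exon duplication can be illustrated with a short check. This is a minimal sketch rather than the authors' analysis: the function name is hypothetical, and the exon 2 length of 62 bp is an assumption that is not stated in this text and should be confirmed against the reference transcript. The rule applies only to exons lying entirely within the coding sequence, so it does not directly cover exon 1, which carries the start of translation.

```python
from typing import Sequence


def preserves_reading_frame(duplicated_exon_lengths_bp: Sequence[int]) -> bool:
    """Return True if duplicating these exons in tandem adds a multiple of 3 nt.

    Assumes every exon in the list is fully coding and that the duplicated
    copy is spliced into the transcript directly after the original.
    """
    return sum(duplicated_exon_lengths_bp) % 3 == 0


# ASSUMPTION: DMD exon 2 length in bp; verify against the reference transcript.
EXON2_LENGTH_BP = 62

if __name__ == "__main__":
    in_frame = preserves_reading_frame([EXON2_LENGTH_BP])
    print("exon 2 duplication keeps frame:", in_frame)
```

Under the assumed exon length, the added sequence is not a multiple of three nucleotides, which agrees with the statement that the duplication potentially disrupts the reading frame.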

In pedigree 3, the proband (II1) carried a complex duplication spanning approximately 688.9 kb within the DMD gene, as identified by LRS. This duplication involved a single inverted copy of exon 1 of the Dp427c transcript. The initial tandem duplication segment was potentially substantial (~ 6.3 Mb), ranging from chrX:33,154,365 to chrX:39,474,769. Additionally, two internal deletions were observed, which either followed or occurred simultaneously with the tandem duplication. These deletions encompassed fragments b (chrX:33,202,989–33,283,460) and d (chrX:33,523,779–39,074,862), respectively. Furthermore, an inversion (fragment c, chrX:33,283,460–33,523,779) was also detected. These complex genomic rearrangements result in an out-of-frame variant. Three critical breakpoints were verified by Sanger sequencing (see Fig. 3B, C).
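The reported sizes can be cross-checked from the breakpoint coordinates given above. The sketch below is illustrative only: it assumes each size is simply the end coordinate minus the start coordinate (whether the coordinates are inclusive changes each result by a single base), and the variable names are hypothetical. For pedigree 3, the net duplicated length is taken as the initial segment minus deleted fragments b and d; the inversion of fragment c does not change the length.

```python
def span_bp(start: int, end: int) -> int:
    """Interval length as end minus start, using coordinates as reported."""
    return end - start


# Pedigree 1: chrX:33,019,224-33,822,717
ped1_bp = span_bp(33_019_224, 33_822_717)   # 803,493 bp

# Pedigree 2: chrX:32,999,023-33,070,000
ped2_bp = span_bp(32_999_023, 33_070_000)   # 70,977 bp

# Pedigree 3: initial tandem segment, then internal deletions b and d
initial_bp = span_bp(33_154_365, 39_474_769)  # 6,320,404 bp
frag_b_bp = span_bp(33_202_989, 33_283_460)   # 80,471 bp
frag_d_bp = span_bp(33_523_779, 39_074_862)   # 5,551,083 bp
ped3_net_bp = initial_bp - frag_b_bp - frag_d_bp  # 688,850 bp

if __name__ == "__main__":
    for label, bp in [
        ("pedigree 1 duplication", ped1_bp),
        ("pedigree 2 duplication", ped2_bp),
        ("pedigree 3 initial segment", initial_bp),
        ("pedigree 3 net duplication", ped3_net_bp),
    ]:
        print(f"{label}: {bp:,} bp")
```

These values correspond to the approximately 803.5 kb, 71.0 kb, 6.3 Mb, and 688.9 kb figures quoted in the text.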

No pathogenic SNVs or indels were identified in any of the subjects based on nanopore sequencing data. The sizes of the duplication variants revealed by LRS were consistent with those determined by CNV-Seq using NGS with a resolution of 100 kb (see Supplementary Fig. 2).

Sequence characteristics of breakpoints

In the three pedigrees examined in this study, repeat sequences were observed surrounding most of the breakpoints. These include various types of repeat elements, such as short interspersed nuclear elements (SINE), long interspersed nuclear elements (LINE), long terminal repeat elements (LTR), and low complexity repeats. Detailed information can be found in Supplementary Fig. 3.

Discussion

Exonic duplications are a frequent type of pathogenic variant in the DMD gene [3, 7, 13], and duplication of exon 2 is the most prevalent duplication variant among DMD patients [13]. While MLPA and NGS methods are commonly employed in clinical settings to detect exonic duplication variants in DMD, they often fail to discern the precise physical locations of breakpoints and the structural characteristics of genomic rearrangements. LRS, however, overcomes the assembly problems encountered when dealing with long and complex sequences. Kubota et al. [14] used LRS to characterize a DMD patient with complex genomic rearrangements involving exon 2 duplication and simultaneously detected the normal intact DMD gene sequence, suggesting mosaicism in the patient. For individuals clinically diagnosed with DMD or BMD, long-read whole-genome sequencing presents a valuable approach for identifying potential structural variants within the DMD gene when conventional methods are unable to confirm the genetic diagnosis.

Detection of variants

Raw data in fastq format were obtained by basecalling the electrical signal generated by the PromethION with Guppy (v5.0.16). To maintain analysis accuracy and data integrity, NanoFilt (v2.8.0, https://github.com/wdecoster/nanofilt) was applied to remove low-quality reads (Qphred ≤ 7) and short reads (< 1000 bp) from the raw data. Additionally, 50 bp were trimmed from both the head and tail ends of the reads. Minimap2 (https://github.com/lh3/minimap2) was employed to align the reads to the hg19 (GRCh37) reference genome. Subsequently, samtools (v1.2, https://github.com/samtools/samtools) was used to convert the resulting SAM file to BAM format for further processing and analysis. Sniffles2 (https://github.com/fritzsedlazeck/Sniffles) was used to detect structural variations (SVs) from the BAM files. To refine the results, the calls were screened on the basis of high-quality variant reads, and the karyotype diagnosis report was examined. Combining these analyses yielded preliminary SV results with improved accuracy and reliability. To examine single nucleotide variants (SNVs) and indels, PEPPER-Margin-DeepVariant (r0.8-gpu, https://github.com/kishwarshafin/pepper) was run with the BAM file as input.
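The read-filtering criteria described above can be illustrated in plain Python. This is a minimal sketch and not NanoFilt itself: the function names are hypothetical, mean quality is computed by averaging per-base error probabilities (one common convention), 50 bp is assumed to be trimmed from each end, and the thresholds are assumed to be applied before trimming.

```python
import math
from typing import Iterable, Iterator, Tuple

# (read name, sequence, quality string encoded as Phred+33)
Read = Tuple[str, str, str]


def mean_phred(quality: str) -> float:
    """Mean read quality, averaged in error-probability space."""
    probs = [10 ** (-(ord(ch) - 33) / 10) for ch in quality]
    return -10 * math.log10(sum(probs) / len(probs))


def filter_and_trim(
    reads: Iterable[Read],
    max_removed_q: float = 7.0,  # reads with mean Q <= 7 are removed
    min_len: int = 1000,         # reads shorter than 1000 bp are removed
    crop: int = 50,              # ASSUMPTION: bases trimmed from each end
) -> Iterator[Read]:
    """Yield reads passing the quality and length filters, with ends trimmed."""
    for name, seq, qual in reads:
        if not seq or len(seq) < min_len:
            continue
        if mean_phred(qual) <= max_removed_q:
            continue
        yield name, seq[crop:len(seq) - crop], qual[crop:len(qual) - crop]


if __name__ == "__main__":
    demo = [
        ("good", "A" * 1200, "5" * 1200),   # Q20, long enough
        ("short", "A" * 500, "5" * 500),    # too short
        ("lowq", "A" * 1200, "&" * 1200),   # Q5, below threshold
    ]
    for name, seq, _ in filter_and_trim(demo):
        print(name, len(seq))
```

In practice, the filtered reads would then be passed to the aligner and SV caller named in the text; the sketch covers only the filtering step.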

Analysis of sequence characteristics near breakpoints

The flanking sequences at each breakpoint were analyzed manually for the presence of repetitive elements. Sequences of 100 or 200 bp both upstream and downstream of each breakpoint were extracted, and the "Repeat Masker" program from the UCSC Genome Browser was used to search for repetitive elements within these extracted sequences.
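The flanking windows used for such a repeat search can be computed from a breakpoint coordinate as follows. This is a hedged sketch: the function name is hypothetical, coordinates are assumed to be 1-based and inclusive, and the breakpoint base itself is assumed to be excluded from both windows. The example uses the proximal breakpoint reported for pedigree 1.

```python
from typing import Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), 1-based inclusive


def flanking_windows(
    chrom: str, breakpoint: int, size: int = 100
) -> Tuple[Interval, Interval]:
    """Return the upstream and downstream windows of `size` bp around a breakpoint."""
    upstream = (chrom, max(1, breakpoint - size), breakpoint - 1)
    downstream = (chrom, breakpoint + 1, breakpoint + size)
    return upstream, downstream


if __name__ == "__main__":
    for size in (100, 200):
        up, down = flanking_windows("chrX", 33_019_224, size)
        print(f"{size} bp upstream:   {up[0]}:{up[1]}-{up[2]}")
        print(f"{size} bp downstream: {down[0]}:{down[1]}-{down[2]}")
```

The resulting intervals could be used to retrieve the corresponding reference sequence for the repeat-element search.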

NGS sequencing and copy number variants (CNVs) analysis

Approximately 50 ng of genomic DNA (gDNA) was fragmented using the DNA Fragment kit (KT100804248, Yikon, China), followed by library preparation using the DNA library prepare kit (XK038, Yikon, China). The quality of the resulting library was assessed using the Agilent 2100 Bioanalyzer (Agilent, USA). Subsequently, the DNA libraries were sequenced on the NextSeq 500 system (Illumina, USA). Copy number quantification across the genome was performed using NGS reads, following established protocols.

PCR and Sanger sequencing validation

The breakpoints of the DMD gene identified by nanopore sequencing were confirmed by PCR and Sanger sequencing. PCR primers were designed using MFEprimer-3.1 (https://mfeprimer3.igenetech.com/), and the primer sequences are listed in Supplementary Table 1. Template gDNA was amplified using 25 µl of 2 × GoldStar Best MasterMix (CW0655M, Cwbio, China), 2 µl of forward primer, and 2 µl of reverse primer to obtain PCR products of around 1 kb. PCR products were confirmed on agarose gels, sequenced on an ABI Prism 3730xl Genetic Analyser (Applied Biosystems), and analyzed with Chromas software (Technelysium, Australia).