-
Article
Unveiling novel genetic variants in 370 challenging medically relevant genes using the long read sequencing data of 41 samples from 19 global populations
Background: A large number of challenging medically relevant genes (CMRGs) are situated in complex or highly repetitive regions of the human genome, hindering comprehensive characterization of genetic variants...
-
Article (Open Access)
MethPhaser: methylation-based long-read haplotype phasing of human genomes
The assignment of variants across haplotypes, phasing, is crucial for predicting the consequences, interaction, and inheritance of mutations and is a key step in improving our understanding of phenotype and di...
-
Article
Unveiling microbial diversity: harnessing long-read sequencing technology
Long-read sequencing has recently transformed metagenomics, enhancing strain-level pathogen characterization, enabling accurate and complete metagenome-assembled genomes, and improving microbiome taxonomic cla...
-
Article (Open Access)
Profiling complex repeat expansions in RFC1 in Parkinson’s disease
A biallelic (AAGGG) expansion in the poly(A) tail of an AluSx3 transposable element within the gene RFC1 is a frequent cause of cerebellar ataxia, neuropathy, vestibular areflexia syndrome (CANVAS), and more rece...
-
Article
Analysis and benchmarking of small and large genomic variants across tandem repeats
Tandem repeats (TRs) are highly polymorphic in the human genome, have thousands of associated molecular traits and are linked to over 60 disease phenotypes. However, they are often excluded from at-scale studi...
-
Article (Open Access)
Publisher Correction: Detection of mosaic and population-level structural variants with Sniffles2
-
Article
Characterization and visualization of tandem repeats at genome scale
Tandem repeat (TR) variation is associated with gene expression changes and numerous rare monogenic diseases. Although long-read sequencing provides accurate full-length sequences and methylation of TRs, there...
-
Article (Open Access)
Detection of mosaic and population-level structural variants with Sniffles2
Calling structural variations (SVs) is technically challenging, but using long reads remains the most accurate way to identify complex genomic alterations. Here we present Sniffles2, which improves over curren...
-
Article (Open Access)
Inference of phylogenetic trees directly from raw sequencing reads using Read2Tree
Current methods for inference of phylogenetic trees require running complex pipelines at substantial computational and labor costs, with additional constraints in sequencing coverage, assembly and annotation q...
-
Article
Improved sequence mapping using a complete reference genome and lift-over
Complete, telomere-to-telomere (T2T) genome assemblies promise improved analyses and the discovery of new variants, but many essential genomic resources remain associated with older reference genomes. Thus, th...
-
Article (Open Access)
Genomic variant benchmark: if you cannot measure it, you cannot improve it
Genomic benchmark datasets are essential to driving the field of genomics and bioinformatics. They provide a snapshot of the performances of sequencing technologies and analytical methods and highlight future ...
-
Article
Scalable Nanopore sequencing of human genomes provides a comprehensive view of haplotype-resolved variation and methylation
Long-read sequencing technologies substantially overcome the limitations of short-reads but have not been considered as a feasible replacement for population-scale projects, being a combination of too expensiv...
-
Article
The complete sequence of a human Y chromosome
The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications [1–3]. As a result, mo...
-
Article (Open Access)
Multiscale analysis of pangenomes enables improved representation of genomic diversity for repetitive and clinically relevant genes
Advancements in sequencing technologies and assembly methods enable the regular production of high-quality genome assemblies characterizing complex regions. However, challenges remain in efficiently interpreti...
-
Article
Variant calling and benchmarking in an era of complete human genome sequences
Genetic variant calling from DNA sequencing has enabled understanding of germline variation in hundreds of thousands of humans. Sequencing technologies and variant-calling methods have advanced rapidly, routin...
-
Article (Open Access)
FixItFelix: improving genomic analysis by fixing reference errors
The current version of the human reference genome, GRCh38, contains a number of errors including 1.2 Mbp of falsely duplicated and 8.04 Mbp of collapsed regions. These errors impact the variant calling of 33 p...
-
Article (Open Access)
SVhound: detection of regions that harbor yet undetected structural variation
Recent population studies keep growing in sample size to investigate the diversity of a population or species. These studies reveal new polymorphisms that lead to important insights into the mechanism...
-
Article (Open Access)
Truvari: refined structural variant comparison preserves allelic diversity
The fundamental challenge of multi-sample structural variant (SV) analysis such as merging and benchmarking is identifying when two SVs are the same. Common approaches for comparing SVs were developed alongsid...
-
Article (Open Access)
Whole genome sequencing identifies structural variants contributing to hematologic traits in the NHLBI TOPMed program
Genome-wide association studies have identified thousands of single nucleotide variants and small indels that contribute to variation in hematologic traits. While structural variants are known to cause rare bl...
-
Article (Open Access)
The multiple de novo copy number variant (MdnCNV) phenomenon presents with peri-zygotic DNA mutational signatures and multilocus pathogenic variation
The multiple de novo copy number variant (MdnCNV) phenotype is defined as having four or more constitutional de novo CNVs (dnCNVs) arising independently throughout the human genome within one generation. It is ...