-
Article
Open Access: Local read haplotagging enables accurate long-read small variant calling
Long-read sequencing technology has enabled variant detection in difficult-to-map regions of the genome and enabled rapid genetic diagnosis in clinical settings. Rapidly evolving third-generation sequencing pl...
-
Article
Open Access: Unsupervised representation learning on high-dimensional clinical data improves genomic discovery and prediction
Although high-dimensional clinical data (HDCD) are increasingly available in biobank-scale datasets, their use for genetic discovery remains challenging. Here we introduce an unsupervised deep learning model, ...
-
Article
Open Access: The complete sequence and comparative analysis of ape sex chromosomes
Apes possess two sex chromosomes—the male-specific Y chromosome and the X chromosome, which is present in both males and females. The Y chromosome is crucial for male reproduction, with deletions being linked ...
-
Chapter and Conference Paper
Multimodal LLMs for Health Grounded in Individual-Specific Data
Foundation large language models (LLMs) have shown an impressive ability to solve tasks across a wide range of fields including health. To effectively solve personalized health tasks, LLMs need the ability to ...
-
Article
Open Access: Improving variant calling using population data and deep learning
Large-scale population variant data is often used to filter and aid interpretation of variant calls in a single sample. These approaches do not incorporate population information directly into the process of v...
-
Article
Inference of chronic obstructive pulmonary disease with deep learning on raw spirograms identifies new genetic loci and improves risk models
Chronic obstructive pulmonary disease (COPD), the third leading cause of death worldwide, is highly heritable. While COPD is clinically defined by applying thresholds to summary measures of lung function, a qu...
-
Article
Open Access: A draft human pangenome reference
Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These...
-
Article
DeepConsensus improves the accuracy of sequences with a gap-aware sequence transformer
Circular consensus sequencing with Pacific Biosciences (PacBio) technology generates long (10–25 kilobases), accurate ‘HiFi’ reads by combining serial observations of a DNA molecule into a consensus sequence. ...
-
Reference Work Entry In depth
Flavanols
Flavan-3-ols, also known as flavanols, are the most widespread flavonoid compounds in the human diet and are abundant in some fruits, vegetables, and other plant-derived foods. In nature, they exist as monomer...
-
Living Reference Work Entry In depth
Flavanols
Flavan-3-ols, also known as flavanols, are the most widespread flavonoid compounds in the human diet and are abundant in some fruits, vegetables, and other plant-derived foods. In nature, they exist as monomer...
-
Article
Open Access: Structural variant analysis of a cancer reference cell line sample using multiple sequencing technologies
The cancer genome is commonly altered with thousands of structural rearrangements including insertions, deletions, translocations, inversions, duplications, and copy number variations. Thus, structural variant ...
-
Article
Open Access: Accelerated identification of disease-causing variants with ultra-rapid nanopore genome sequencing
Whole-genome sequencing (WGS) can identify variants that cause genetic disease, but the time required for sequencing and analysis has been a barrier to its use in acutely ill patients. In the present study, we...
-
Article
Open Access: DeepNull models non-linear covariate effects to improve phenotypic prediction and association power
Genome-wide association studies (GWASs) examine the association between genotype and phenotype while adjusting for a set of covariates. Although the covariates may have non-linear or interactive effects, due t...
-
Article
Open Access: Hidden biases in germline structural variant detection
Genomic structural variations (SVs) are important determinants of genotypic and phenotypic changes in many organisms. However, the detection of SVs from next-generation sequencing data remains challenging.
-
Article
Open Access: A population-specific reference panel for improved genotype imputation in African Americans
There is currently a dearth of accessible whole genome sequencing (WGS) data for individuals residing in the Americas with Sub-Saharan African ancestry. We generated whole genome sequencing data at intermediat...
-
Article
Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads
Long-read sequencing has the potential to transform variant detection by reaching currently difficult-to-map regions and routinely linking together adjacent variations to enable read-based phasing. Third-gener...
-
Article
Open Access: Chromosome-scale, haplotype-resolved assembly of human genomes
Haplotype-resolved or phased genome assembly provides a complete picture of genomes and their complex genetic variations. However, current algorithms for phased assembly either do not generate chromosome-scale...
-
Article
Author Correction: A robust benchmark for detection of germline large deletions and insertions
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
-
Article
A robust benchmark for detection of germline large deletions and insertions
New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution and comprehensiveness. To help translate these methods to routine r...
-
Article
Open Access: A diploid assembly-based benchmark for variants in the major histocompatibility complex
Most human genomes are characterized by aligning individual reads to the reference genome, but accurate long reads and linked reads now enable us to construct accurate, phased de novo assemblies. We focus on a...