Introduction

In the last ten to fifteen years, research has increasingly focused on genomic mosaicism, the phenomenon in which individual cells or entire cell lineages within a tissue harbor genetic mutations that are not present in every cell of the individual [1]. These mutations are generally a consequence of errors during DNA replication or repair and can be caused by intrinsic or extrinsic mutational mechanisms [2,3,4]. If such an event occurs early in embryogenesis, the resulting variant will be present in multiple organs and tissue types throughout the body; if it occurs later, after cell fate has been determined, it will only be detected in that specific lineage [5, 6]. In the nervous system, a mutation that arises in the neural stem cells lining the neural tube early in development [7,8,9] would be present in all or many cells of the brain [5]. Conversely, a mutation that occurs in a terminally differentiated neuron will be present only in that single cell and will persist for the remaining lifetime of the neuron [10].

Historically, studies of mosaicism have predominantly focused on its impact on human disease [1, 73]. Although mosaic structural variants (mSVs) are significantly rarer events than mSNVs or mInDels, they may have a much larger functional impact, depending largely on their size, the affected genomic regions, and whether they change copy number.

One interesting sub-class of mSVs that has recently received increased scrutiny is mosaic copy number neutral loss of heterozygosity (mCN-LOH) [74,75,76]. Loss of heterozygosity refers to a phenomenon in which, instead of carrying one paternal and one maternal copy of a genomic region, a cell carries two copies from the same parent. These events are referred to as ‘copy number neutral’ because two alleles are still present and the region is still considered diploid, even though both copies derive from one parent [49]. These mSVs can range in size from very small genomic intervals to entire chromosomes. mCN-LOH variants have been studied in cancer genetics in the context of clonal selection, as an LOH event can convert a heterozygous cancer-driving variant to a homozygous state [49, 77, 78]. Conversely, in individuals carrying severe mutations, mCN-LOH events may also be selected for because they result in milder phenotypes, as demonstrated by an intriguing study by Lee and colleagues [79].

There is one additional variant type, technically a sub-class of mSVs, that leads to genomic mosaicism and is significantly enriched in brain tissue: retrotransposition of mobile elements. The most commonly studied type is LINE-1 (L1): L1-related sequences comprise 17% of the human genome, and the L1 family contains the most active transposable element in the human genome [80, 81]. These retrotransposons can create mosaic cell populations by inserting new copies of their sequence at different genomic locations in different cells. Although largely quiescent in most somatic cells, LINE-1 elements have been shown to be highly active in developing neuronal progenitors [82, 83]. The frequency of these events has been disputed, and current estimates range from <0.6 to 13.7 insertions per neuron [84, 85]. Because these insertions can be relatively large (up to thousands of base pairs), they may significantly affect gene expression, an effect compounded by their potential to interfere with splicing when integrated into an intronic region [85]. They are also interesting from an evolutionary viewpoint, as they can rapidly enable sequences to be transcribed and expressed [86]; however, whether an analogous mechanism is important in the context of brain mosaicism remains unexplored.

Detection of Mosaic Variants

For this review, we distinguish three types of mosaicism detection (Fig. 3A, Table 1): (1) visualization of mutations in tissues or at the level of individual cells; (2) bulk genomic DNA analysis, in which mutant alleles are detected using specialized mosaicism detection tools; and (3) single-cell genomic DNA analysis. For the latter two approaches, the most common technology is direct sequencing. All three approaches can be used to detect either previously unknown or already known variants.

Fig. 3

Types and scale of mosaicism detection approaches. A Mosaic mutations in a subpopulation of cells may be detected by three approaches: (1) through direct visualization of mutations employing FISH or chromosome spreads, (2) through bulk analysis of genomic material, or (3) through assessment of genomic material at the level of single cells. B Biological insights obtained from mosaicism analysis are heavily dependent on the scale of sampling. For instance, mosaicism may be detected in an entire tissue such as the neocortex or in microdissections of different sizes, each of which provides distinct information because of the drastically different scale.

Table 1 Approaches to discover novel mosaic variants

In the realm of visualizing genomic mutations, chromosomal karyotyping, in use since the 1950s, images entire chromosomes following arrest in metaphase [87]. With the addition of techniques such as G-banding, it is possible to identify mutations at the partial-arm or whole-chromosome level [88]. Although these methods are often used in clinical genetics to determine a patient’s karyotype or large-scale structural variants, they inherently operate at the single-cell level. This makes them excellent tools for understanding genomic mosaicism if a sufficient number of cells is assayed. Indeed, this approach allows the identification of constitutive as well as mosaic deletions, duplications, or translocations of sufficient size [88, 89].

Employing RNA or DNA fluorescent probes, fluorescence in situ hybridization (FISH) allows the sequence-specific detection of chromosomal loci in interphase and metaphase cells [90,91,92]. Using FISH, researchers and clinicians can thus assay specific chromosomal intervals for aneuploidies [93, 94]. FISH can also be applied within intact tissue [95]; combining single-molecule FISH with allele-specific probes theoretically also allows the assessment of mSNVs located in expressed genes [96, 97]. Although not currently employed for this purpose, further technological advances and combination with super-resolution microscopy may eventually enable the direct detection of genomic mosaic variants.

While the previous methods are useful for interrogating mSVs visually at low throughput, they are generally not very scalable. In the 2000s and 2010s, DNA microarrays became popular as a higher-resolution method to interrogate DNA copy number [98, 99]. Array comparative genomic hybridization (aCGH) panels were designed to compare an experimental sample against a diploid reference sample, and these arrays were further improved by adding single nucleotide polymorphism (SNP) genotyping panels that could also identify LOH. This approach vastly improved the ability to detect mCNVs and mCN-LOH, down to a resolution of approximately 50 kb at AFs as low as 0.01 in bulk samples [100,101,102]. Such SNP genotyping data were leveraged effectively to detect these types of mosaic variants in blood, enabling the detection of clonal hematopoiesis and the study of its impact on neurological disorders [71, 78, 102, 103]. Although this approach still has limited resolution compared to next-generation sequencing, DNA microarrays remain relevant because of their competitive cost for large-scale genomic analyses.
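To illustrate how SNP genotyping exposes copy-neutral LOH, the sketch below computes the B-allele frequency (BAF) expected at a heterozygous SNP inside a mosaic CN-LOH segment. It is a minimal model under stated assumptions: an idealized diploid sample with no genotyping noise, in which a fraction of cells (`cell_fraction`; this parameter and the function names are hypothetical) carries two copies of one parental haplotype.

```python
def expected_baf(cell_fraction: float, duplicated_haplotype: str = "B") -> float:
    """Expected B-allele frequency (BAF) at a heterozygous A/B SNP inside a
    mosaic copy-neutral LOH segment (idealized model, no noise).

    Unaffected cells contribute BAF 0.5; affected cells carry two copies of one
    haplotype, contributing BAF 1.0 (B duplicated) or 0.0 (A duplicated).
    Total copy number stays at two, so only the allelic balance shifts.
    """
    if not 0.0 <= cell_fraction <= 1.0:
        raise ValueError("cell_fraction must lie in [0, 1]")
    if duplicated_haplotype not in ("A", "B"):
        raise ValueError("duplicated_haplotype must be 'A' or 'B'")
    shift = cell_fraction / 2
    return 0.5 + shift if duplicated_haplotype == "B" else 0.5 - shift


def cell_fraction_from_baf(baf: float) -> float:
    """Invert the model: the BAF deviation from 0.5 equals half the cell fraction."""
    return 2 * abs(baf - 0.5)


# A 2% mosaic CN-LOH splits heterozygous SNPs slightly above and below 0.5.
print(expected_baf(0.02), expected_baf(0.02, duplicated_haplotype="A"))
```

Because total copy number stays at two, such a segment leaves the array intensity signal essentially unchanged; it is the split of heterozygous BAF values away from 0.5 that reveals it, which is why aCGH alone cannot detect mCN-LOH.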

Detection of Genomic Mosaicism from Tissues

The advent of next-generation sequencing (NGS) fundamentally changed our approach to detecting mosaicism [104]. Because NGS inherently assesses the sequence of individual DNA molecules, it is uniquely suited for mosaicism research and enables the detection of mSNVs. The theoretical limit of sensitivity for mosaic variant detection is set by sequencing depth and the statistics of random sampling. For instance, using simple binomial calculations, a variant present at 0.1 AF in a non-limiting DNA sample will be captured by at least one read 65.13% of the time when sequencing at 10×, or >99.99% of the time at 100×. Similarly, a variant present at 0.01 AF will be captured only 9.56% of the time at 10×, or 63.40% of the time at 100×.
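The binomial figures above can be reproduced with a few lines of Python. This sketch assumes, as the text does, that each read is an independent draw from the sample and that a single variant read suffices; the function name is hypothetical.

```python
def prob_at_least_one_alt_read(allele_fraction: float, depth: int) -> float:
    """Chance that at least one of `depth` reads carries the variant, treating each
    read as an independent draw from a sample with the given allele fraction:
    1 - (1 - AF) ** depth. Sequencing errors are ignored."""
    if not 0.0 <= allele_fraction <= 1.0 or depth < 0:
        raise ValueError("allele_fraction must be in [0, 1] and depth >= 0")
    return 1.0 - (1.0 - allele_fraction) ** depth


for af in (0.1, 0.01):
    for depth in (10, 100):
        p = prob_at_least_one_alt_read(af, depth)
        print(f"AF={af:<4}  depth={depth:>3}x  P(>=1 alt read)={p:.4%}")
```

At 0.1 AF and 100× the exact value is about 99.997%.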

Because achievable read depth is constrained by cost and by the size of the sequenced genomic space, whole-genome sequencing (WGS) is generally performed at lower depths (~30–60×) than whole-exome sequencing (WES; ~50–100×) or targeted panels (often >1,000×) [105, 106]. With decreasing sequencing costs, these numbers are moving targets, and we and others have employed deeper sequencing to study mosaicism at higher sensitivity [23, 107,108,109,110]. This approach is especially powerful for unbiased assessment of bulk mosaicism at the tissue level or within a microdissection (Fig. 3B). When many microdissections are analyzed, by contrast, it is common to perform standard-depth or even shallower sequencing across a larger number of samples [24, 56, 111]. An alternative approach is mosaicism detection from RNA sequencing data [112]. Although conceptually similar to detecting mosaicism from exome sequencing, this approach involves biological and technical complications that need careful consideration, such as the level of transcription from regions containing mosaic variants, splice variants that may cause mosaic variants to be missed, or RNA editing that may produce false positives. However, RNA sequencing analysis may reveal additional functional consequences of mosaic variants at the level of the transcript or further downstream (Table 1).
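Because mean depth equals the number of sequenced on-target bases divided by the size of the target, the cost trade-off can be expressed as a one-line calculation. The sketch below assumes perfectly uniform coverage; the target sizes (about 3.1 Gb for a human genome, a hypothetical 35 Mb exome and 0.5 Mb panel), the depths, and the function name are illustrative assumptions, not specifications of any kit or study.

```python
def required_yield_bp(mean_depth: float, target_size_bp: float,
                      usable_fraction: float = 1.0) -> float:
    """Sequenced bases needed for a given mean depth over a target region.

    `usable_fraction` discounts off-target and duplicate reads (1.0 = ideal).
    Coverage is assumed to be perfectly uniform.
    """
    if not 0.0 < usable_fraction <= 1.0:
        raise ValueError("usable_fraction must lie in (0, 1]")
    return mean_depth * target_size_bp / usable_fraction


# Illustrative target sizes and depths only; real kits and studies differ.
SCENARIOS = [("WGS", 3.1e9, 30), ("WES", 35e6, 100), ("panel", 0.5e6, 1000),
             ("deep WGS", 3.1e9, 250)]

for name, size_bp, depth in SCENARIOS:
    gb = required_yield_bp(depth, size_bp) / 1e9
    print(f"{name:>8}: {depth:>5}x over {size_bp / 1e6:,.1f} Mb -> {gb:,.1f} Gb")
```

Under these assumptions, 30× WGS requires roughly 93 Gb of sequence, whereas 1,000× over a 0.5 Mb panel needs only about 0.5 Gb, which is why very deep sequencing is usually restricted to small targets.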

While sensitivity correlates strongly with read depth, the sensitivity calculations above assume that a single mutant read is sufficient to detect a mosaic variant. In practice, NGS and its computational processing have inherent error rates that impair our ability to detect mosaic variants [29]. Thus, in addition to sequencing at sufficient depth, mosaicism analysis requires specialized analytical pipelines [113]. Because unbiased mosaicism analysis from bulk sequencing was pioneered in cancer research, variants were typically expected to be positively selected, and consequently present at relatively high AFs, and to be found only in the tumor but not in ‘normal’ control tissue. Algorithms were therefore initially designed specifically for this purpose [17].
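Requiring more than one supporting read, as practical pipelines do to guard against errors, lowers sensitivity. The sketch below generalizes the binomial calculation above to at least `k` variant reads; the function name and the three-read threshold are arbitrary illustrative choices, and sequencing errors are still ignored.

```python
from math import comb


def prob_at_least_k_alt_reads(allele_fraction: float, depth: int, k: int) -> float:
    """P(X >= k) for X ~ Binomial(depth, allele_fraction): the chance that a true
    variant is supported by at least k reads. Sequencing errors are ignored."""
    p_fewer = sum(
        comb(depth, i) * allele_fraction**i * (1 - allele_fraction) ** (depth - i)
        for i in range(k)
    )
    return max(0.0, 1.0 - p_fewer)  # guard against tiny negative rounding


# Compare "one read is enough" with a three-read requirement at 100x.
for af in (0.1, 0.05, 0.01):
    print(af, round(prob_at_least_k_alt_reads(af, 100, 1), 4),
          round(prob_at_least_k_alt_reads(af, 100, 3), 4))
```

For example, at 0.01 AF and 100×, the chance of observing at least one variant read is about 63%, but the chance of observing at least three falls to roughly 8%.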

More recently, the focus of mosaicism research has shifted to include lower-abundance mosaic variants (<0.05 AF) and those shared among tissues. Many of these modern pipelines still include classical tools such as Mutect2 [114, 115], but also employ additional classifiers that provide a second level of evaluation and increase specificity (i.e., reduce the number of false-positive mosaic variants) [116,117,118]. Alternatively, some programs provide both variant detection and classification, such as our previously developed tool MosaicHunter [119]. Each of these pipelines has particular strengths and weaknesses, and they may require different experimental designs (e.g., a tumor-normal comparison). Therefore, many analytical workflows combine methods to improve sensitivity, specificity, or both. If a variant is known (or a group of variants is routinely observed), these approaches are often replaced by more specialized pipelines (e.g., for drivers of clonal hematopoiesis) [120]. Notably, these analytical approaches are evolving rapidly alongside sequencing technologies and computational innovations. For instance, duplex sequencing can significantly improve the specificity of detected variants but comes at the cost of higher sequencing depth requirements [121, 122].
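The principle behind duplex sequencing can be sketched as follows: reads are grouped by molecular tag into families for each strand of an original DNA molecule, a consensus is formed per strand, and a base is accepted only if both strands agree, so that errors arising on a single strand are discarded. This is a simplified illustration under assumptions (reads already grouped by tag and strand, bottom-strand bases reported in top-strand orientation, a hypothetical 90% agreement threshold, hypothetical function names); it is not the implementation of any published pipeline.

```python
from collections import Counter
from typing import Optional, Sequence


def strand_consensus(bases: Sequence[str], min_agreement: float = 0.9) -> Optional[str]:
    """Consensus base for reads from one strand of one tagged molecule, or None
    when the family is empty or no base reaches `min_agreement`."""
    if not bases:
        return None
    base, count = Counter(bases).most_common(1)[0]
    return base if count / len(bases) >= min_agreement else None


def duplex_consensus(top: Sequence[str], bottom: Sequence[str],
                     min_agreement: float = 0.9) -> Optional[str]:
    """Keep a base only if both strands of the original molecule independently
    support it. Bottom-strand bases are assumed to be in top-strand orientation."""
    t = strand_consensus(top, min_agreement)
    b = strand_consensus(bottom, min_agreement)
    return t if t is not None and t == b else None


# A change seen on only one strand (e.g., PCR error or damage) is rejected.
print(duplex_consensus(["T", "T", "T"], ["T", "T"]))  # both strands agree
print(duplex_consensus(["T", "T", "T"], ["C", "C"]))  # strands disagree -> None
```

Because every accepted base needs several reads from each of the two strands, the raw depth required per duplex-supported molecule is considerably higher, which is the cost noted above.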

An important part of many mosaicism detection experiments from bulk samples is the subsequent validation of candidate variants through orthogonal approaches. We focus here on two of the most common, targeted amplicon sequencing and droplet digital polymerase chain reaction (ddPCR), while acknowledging that there are many others, such as subcloning of amplified products, multiplex ligation-dependent probe amplification (MLPA), and denaturing high-performance liquid chromatography (DHPLC) [19, 123, 124]. Targeted amplicons sequenced at high depth are commonly used for validation and quantification by us and others [6, 125, 126]. ddPCR enables the genotyping of single molecules by partitioning the sample into water-in-oil droplets [127]. Although ddPCR has inherent advantages, such as being independent of NGS and highly sensitive, it is relatively expensive unless the same variants are tested repeatedly, and it is less scalable. As with the unbiased detection methods, the choice of validation approach depends on the specific question, the number of variants tested, and similar considerations.
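ddPCR quantification rests on Poisson statistics: because a droplet can contain more than one template molecule, the fraction of positive droplets is converted into mean copies per droplet as lambda = -ln(1 - positive/total). The sketch below estimates a mutant AF from a two-probe assay under that assumption, with each channel scored independently; the droplet counts are hypothetical and the function names are illustrative.

```python
from math import log


def copies_per_droplet(positive: int, total: int) -> float:
    """Poisson-corrected mean copies per droplet, lambda = -ln(1 - positive/total).

    A droplet can hold more than one template, so the raw positive fraction
    underestimates the number of molecules; the correction accounts for this.
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("require total > 0 and 0 <= positive < total")
    return -log(1.0 - positive / total)


def mutant_allele_fraction(mut_positive: int, wt_positive: int, total: int) -> float:
    """Mutant AF from a two-channel assay (one probe per allele, each channel
    scored independently across the same droplets)."""
    lam_mut = copies_per_droplet(mut_positive, total)
    lam_wt = copies_per_droplet(wt_positive, total)
    total_lambda = lam_mut + lam_wt
    if total_lambda == 0:
        raise ValueError("no positive droplets in either channel")
    return lam_mut / total_lambda


# Hypothetical run: 20,000 droplets, 150 mutant-positive, 8,000 wild-type-positive.
print(f"{mutant_allele_fraction(150, 8_000, 20_000):.4f}")
```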

Detection of Genomic Mosaicism from Individual Cells

Finally, mosaicism can also be detected at the single-cell or single-nucleus level (Table 1). Technological advances now allow the high-fidelity amplification of a single genome and the detection of mSNVs [129]. This has been used to great effect in neurons to study development, aging, and disease, as well as in cardiomyocytes [25, 26, 130]. Similarly, using whole-genome amplification, mSVs can be detected in single cells; this has been studied extensively in sperm and also in neurons [73, 131,132,133,134]. A variation of this approach is the in situ analysis of a clone (e.g., crypts in the gut), where it is possible to isolate tissues that are monoclonal or only slightly polyclonal. This enables the study of mosaicism across phylogenies as well as of mutation rates within such clones in human tissues [24, 56, 135]. Single-cell and single-nucleus approaches can also be combined with functional readouts, such as single-cell RNA sequencing [136].

If a cell type is capable of clonal expansion (e.g., skin cells, neural progenitors, cancer cells), individual cells can be expanded and the resulting population sequenced as a representation of the genomic mosaicism present in the founder cell [21, 137, 138]. This takes advantage of the superior amplification of genomic DNA by the cellular machinery. For cell types that cannot be clonally expanded, it is possible to transfer their nuclei into proliferation-competent recipient cells; this has been demonstrated for postmitotic neurons in mice [139]. However, this is a complicated process that has not yet been successfully applied to human neurons.

While it is inherently attractive to assess mosaicism at the level of a single cell, this approach has technical limitations. First, amplification of genomic material from a single cell or nucleus is error-prone and may result in a larger number of false-positive mosaic variants [140]. There are, however, strategies to remedy this, such as genotyping additional material to confirm clonal mosaicism or restricting analysis to ‘phased’ haplotypes [110, 141]. The latter relies on the assumption that a mosaic variant should be restricted to the parental haplotype on which it originally arose. If a mosaic mutation is instead found on both haplotypes, it should be considered an artifact, as it is exceedingly unlikely for the same mosaic mutation to occur twice within a sample or cell [29, 116]. Similarly, clonal expansion, where possible, may suffer from cell culture artifacts, such as mutations acquired after isolation from primary tissue or selection for certain genotypes. Second, independent of possible errors, single-cell analysis has an additional conceptual limitation. We often refer to this approach as ‘bottom-up’, in contrast to the bulk-based ‘top-down’ approach. While single cells offer the highest sensitivity for mosaic variant detection, they provide a less comprehensive picture of clonal mosaicism, depending on the sampling strategy and the number of cells assessed.
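The haplotype-based filter described above can be sketched as a simple classification of reads that cover both the candidate site and a nearby phased germline heterozygous SNP. The haplotype labels, the two-read support threshold, the category names, and the function name are hypothetical choices for illustration; real pipelines also model sequencing and amplification errors.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def classify_by_phase(reads: Iterable[Tuple[str, str]], alt: str,
                      min_alt_reads: int = 2) -> str:
    """Classify a candidate variant from reads that also cover a nearby phased
    germline heterozygous SNP. Each read is (haplotype_label, base_at_candidate)."""
    alt_counts = defaultdict(int)
    totals = defaultdict(int)
    for haplotype, base in reads:
        totals[haplotype] += 1
        if base == alt:
            alt_counts[haplotype] += 1

    carriers = [h for h in totals if alt_counts[h] > 0]
    if len(carriers) > 1:
        return "artifact"        # variant seen on both parental haplotypes
    if not carriers or alt_counts[carriers[0]] < min_alt_reads:
        return "insufficient"    # too little linked support to decide
    h = carriers[0]
    if alt_counts[h] == totals[h]:
        return "germline-like"   # every read of that haplotype carries it
    return "mosaic"              # present on a subset of one haplotype only


reads = [("hap1", "T")] * 3 + [("hap1", "C")] * 5 + [("hap2", "C")] * 8
print(classify_by_phase(reads, alt="T"))
```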

Impact and Utility of Genomic Mosaicism

As discussed in the preceding sections, various types of mosaic variation can be detected through different technical and conceptual approaches. But why detect mosaicism in the first place? We propose that mosaicism detection can serve three fundamental purposes. (1) Genomic mosaicism may directly affect observed phenotypes (Fig. 4A), either through positive selection of the mutation and a resulting change in clonality or through a dominant phenotype. Detecting these mutations may therefore aid our understanding of disease pathology or enable treatment in the future. In addition, somatic mosaic variants can be exploited for their utility: (2) as clonal lineage marks to study normal development or mutation rates (Fig. 4B); and (3) as a readout of environmental processes that induced certain types or patterns of mutations at past developmental or cellular time points (Fig. 4C). We discuss each of these three with a focus on the brain, without providing a comprehensive treatment of each point; we therefore draw attention to other excellent reviews that highlight these concepts in the brain, especially in the context of disease [10, 18, 72, 142,143,144,145,146].

Fig. 4

Impact and utility of natural mosaicism. A Mosaic mutations may act as drivers of disease. Clones harboring mosaic mutations can be positively selected for continued expansion and proliferation, which may directly result in disease. Alternatively, mosaic mutations may exhibit a dominant phenotype. Note that these two scenarios are not mutually exclusive. B Natural mosaicism marks cellular lineages and can be used for lineage reconstruction or clonal analysis. For instance, in this example, distinct clones are marked by Neutral Mutation (NM) 1 and NM 3, whereas NM 3 marks a sub-clonal lineage in combination with NM 1. C Natural mosaicism can be used as a molecular readout of the microenvironment that cells encounter. Exposure to different environmental mutagens, such as reactive oxygen species or toxins, can lead to very specific ‘mutational signatures’.

Mosaic variants within the nervous system have been identified as drivers of neurological disease in several instances. Focal cortical dysplasia (FCD) and hemimegalencephaly (HME), for example, are classical mosaicism-driven neurological diseases, characterized by dysmorphism and hyperexcitability of a small region of the cortex or of an entire cerebral hemisphere, respectively [147]. Analysis of brain tissue from FCD and HME patients has revealed activating somatic mutations in the mTOR-AKT3-PIK3CA pathway as well as loss-of-function mutations in genes encoding its negative regulators [13, 16, 17, 148,149,150,151]. Importantly, these somatic mutations were mostly found only in brain tissue, suggesting that they were acquired later in neurodevelopment [152]. Conceptually, these disorders form a spectrum in which the exact phenotypic presentation depends on the timing of the driver mutation and its abundance within the tissue. Mechanistically, they likely involve a combination of positive selection and dominant effects: there is evidence of associated overgrowth syndromes, and a patch of hyperexcitable dysplastic cells may induce drug-resistant epilepsy within a network of neurons.

In other neurological disorders, such as Alzheimer’s disease, schizophrenia, or ASD, it is speculated that mosaic variants may also contribute to or exacerbate the overall phenotype [71, 107, 108, 153,154,155,156,157]. While the contribution of mosaicism to such disorders is clear in cases where it is also detectable in blood (especially when it involves a known disease mutation), the direct contribution of ‘cryptic’ (i.e., brain-specific) mosaicism has been more elusive. Unlike FCD, in which clearly pathogenic mutations are found even at low abundance (often below 0.10 or even 0.05 AF) [151], the minimum abundance of causative mutations in other disorders remains unclear. This remains an exciting, ongoing field of research.

Utility of Natural Genomic Mosaicism for Lineage Tracing

Just as the timing of mosaic mutations in development is important for understanding disease pathogenesis, researchers can also use neutral somatic mutations to study developmental processes and clonal lineages directly (Fig. 4B). Indeed, naturally occurring somatic mutations have been used as lineage ‘barcodes’ to study embryonic development and, more generally, human cellular lineages [24, 158]. We and others have employed this framework to study these processes specifically in the brain [23, 25, 110, 138]. An important addition to such studies, whether based on bulk or single-cell analyses, is the restriction of analysis to specific cell types (often neurons). This is typically achieved through fluorescence-activated nuclei sorting, because most human brain samples are frozen immediately after collection, which significantly complicates the sorting of intact cells [159].
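Given a record of which cells (or samples) carry which neutral mutations, a lineage tree can be recovered in the idealized case where every mutation arose once and was never lost (a ‘perfect phylogeny’), because carrier sets are then either nested or disjoint. The sketch below implements that nesting logic; it assumes error-free genotypes with no dropout, uses hypothetical mutation and cell labels and a hypothetical function name, and is not the reconstruction method of the cited studies.

```python
from typing import Dict, Hashable, Optional, Set


def build_lineage_tree(carriers: Dict[str, Set[Hashable]]) -> Dict[str, Optional[str]]:
    """Map each mutation to its parent mutation (None = attached to the zygote root).

    Assumes a perfect phylogeny: every mutation arose once and was never lost,
    so any two carrier sets are either nested or disjoint.
    """
    order = sorted(carriers, key=lambda m: len(carriers[m]), reverse=True)
    parent: Dict[str, Optional[str]] = {}
    for i, mut in enumerate(order):
        cells = carriers[mut]
        parent[mut] = None
        for earlier in order[:i]:  # earlier entries are at least as large
            other = carriers[earlier]
            if other & cells and not cells <= other:
                raise ValueError(f"{earlier} and {mut} are neither nested nor disjoint")
            if cells <= other:
                parent[mut] = earlier  # later supersets are smaller, i.e. closer
    return parent


# Hypothetical example: NM3 is carried by a subset of the NM1 clone.
carriers = {"NM1": {"c1", "c2", "c3"}, "NM2": {"c4", "c5"}, "NM3": {"c2", "c3"}}
print(build_lineage_tree(carriers))
```

Real data violate the nesting assumption through missed calls and artifacts, which is why published approaches add error models on top of this basic logic.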

What insights have been derived from employing mosaicism for lineage analysis? Work by us and others has focused on the spread of clones that can be distinguished by mosaic variants [23, 110]. In 2015, the Walsh Lab sequenced the genomes of 36 single neurons and identified thousands of somatic mutations [25]. From these, they were able to reconstruct a lineage tree and identify points of divergence. As an extension, Bizzotto and colleagues performed high-depth sequencing of multiple human tissues to identify mSNVs [110]. They concluded that at the onset of gastrulation, there exists a pool of approximately 170 cells, with 50 to 100 founders committed to the forebrain. These analyses also revealed that the spread of clones across the cortical surface is largely inversely correlated with their observed abundance, although there are exceptions. A recent study by us further revealed that within the neocortex, in contrast to the overall patterning of the neural tube, clones are first separated along the left-right axis and only then along the anterior-posterior axis [23]; however, the same left-right separation did not extend to the hindbrain. Focusing on the neocortical hemispheres, we further proposed a neural progenitor founder pool of approximately 90 to 200 cells at the time of left-right separation.
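The link between allele fraction and founder-pool size can be illustrated with a deliberately simplified model: if a heterozygous mutation is present in exactly one of N diploid founder cells and all founders contribute equally to the sampled tissue, its expected AF is 1/(2N). This is an assumption-laden back-of-the-envelope sketch (no unequal contributions, selection, or sampling noise; hypothetical function names), not the estimation procedure used in the cited studies.

```python
def expected_af(n_founders: int) -> float:
    """Expected AF in descendant tissue of a heterozygous mutation carried by exactly
    one of `n_founders` equally contributing diploid founder cells."""
    if n_founders < 1:
        raise ValueError("n_founders must be >= 1")
    return 1.0 / (2 * n_founders)


def founders_from_af(allele_fraction: float) -> float:
    """Invert the idealized model: N = 1 / (2 * AF)."""
    if not 0.0 < allele_fraction <= 0.5:
        raise ValueError("allele_fraction must lie in (0, 0.5]")
    return 1.0 / (2 * allele_fraction)


# Under this model, founder pools of 90 and 200 cells correspond to these AFs:
for n in (90, 200):
    print(n, round(expected_af(n), 4))
```

Such low AFs (well below 0.01) illustrate why deep sequencing is needed to detect the mutations that inform these estimates.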

Clonal analyses focusing on lineages of defined cell types further confirmed, for the first time in humans, previous findings from rodent models [23, 110, 136, 160,161,162,163]. These studies also suggested the existence of developmental bottlenecks or restrictions that can modify the contributions of early lineages within similar tissues in the absence of selection. While these studies described some aspects of neurodevelopment comprehensively, many questions remain that will require careful experimental (or sampling) design and analysis; these include the migratory and developmental patterns of interneurons, the lineages and clonality of microglia, and the developmental trajectories of non-neocortical brain regions.

Utility of Natural Genomic Mosaicism to Map Mutational Histories

Finally, genomic mosaicism can also serve as a useful biomarker of cellular environment or stressors (Fig. 4C). The rate or frequency of mosaic mutations may reflect endogenous or exogenous mutators. For instance, somatic mutations accumulate significantly faster with age in neurons of individuals with mutations in the DNA repair machinery [26]. Mutation burden can also be increased in seemingly healthy individuals, in both germ cells and neurons, owing to alterations in the same or similar pathways [108, 164]. This approach even enables a distinction of repair fidelity across development: for instance, the earliest cell divisions in an embryo appear to show higher mutation rates than later divisions, possibly because the repair machinery is inherited through the egg cell [6, 44, 165].

Beyond the frequency of mutations, the types of mutations observed also vary with the mutator. mSNVs are mainly categorized from the perspective of the pyrimidine of the mutated base pair (i.e., cytosine or thymine) and the newly acquired base: there are thus six possible substitutions, three for each pyrimidine. As mentioned above, C>T substitutions mainly derive from the deamination of methylated cytosine [52]; similarly, other cellular processes may drive different types of mutations. These classification schemes were mainly developed in cancer studies but have been widely applied across genomic mosaicism research [166].
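The six-class convention can be expressed as a small function: a substitution reported on a purine reference (A or G) is described on the complementary strand, so that, for example, G>A and C>T fall into the same class. This is a minimal sketch with a hypothetical function name.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
BASES = {"A", "C", "G", "T"}


def substitution_class(ref: str, alt: str) -> str:
    """Collapse a single-base substitution onto the pyrimidine strand, giving one
    of six classes: C>A, C>G, C>T, T>A, T>C or T>G."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in ("A", "G"):  # purine reference: describe the paired pyrimidine instead
        ref, alt = ref.translate(COMPLEMENT), alt.translate(COMPLEMENT)
    return f"{ref}>{alt}"


# G>A read on one strand is the same event as C>T on the other strand.
print(substitution_class("G", "A"), substitution_class("C", "T"))
```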

In addition to these six categories, more recently, the sequence context of a mutation, that is, its neighboring bases, has also been considered. This represented a significant innovation, as the two bases immediately flanking the mutated base allow a finer dissection of molecular mutation mechanisms [167,168,169,170]. This is achieved through the statistical isolation of ‘mutational signatures’, which represent distinct potential drivers of mutations. While some of these turned out to be artifactual, others have been directly connected to endogenous processes and environmental mutagens, such as DNA replication, ultraviolet light, tobacco smoke, or certain chemotherapeutics [171,172,173]. An important limitation of mutational signature analysis is the requirement for a sufficient number of observed mutations. This approach is therefore mainly applicable to large collections of cancer genomes or to studies that leverage the genomes of individual cells, such as those centered on neurons [51]. In certain situations, lower numbers may be sufficient if driven by specific mutations, as demonstrated by a study focused on transgenerational mutation rates [164]. Where this approach can be implemented, it opens a window into the environment that cells experienced during development or in the context of disease.
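Adding the two flanking bases yields the widely used 96-channel representation (6 substitution classes × 16 combinations of 5' and 3' neighbors), in which purine-centred contexts are reverse-complemented. The sketch below builds such a count vector from hypothetical (trinucleotide, alternative base) calls; the function names are illustrative, and extracting signatures from many such vectors, typically by non-negative matrix factorization, is not shown.

```python
from collections import Counter
from itertools import product
from typing import Iterable, List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# 6 pyrimidine-centred substitutions x 4 possible 5' bases x 4 possible 3' bases = 96.
CHANNELS: List[str] = [
    f"{five}[{ref}>{alt}]{three}"
    for ref, alts in (("C", "AGT"), ("T", "ACG"))
    for alt in alts
    for five, three in product("ACGT", repeat=2)
]


def channel(trinucleotide: str, alt: str) -> str:
    """Channel label for a substitution at the middle base of a reference
    trinucleotide; purine-centred contexts are reverse-complemented."""
    tri, alt = trinucleotide.upper(), alt.upper()
    if len(tri) != 3 or tri[1] == alt:
        raise ValueError("expected a trinucleotide and a different alt base")
    if tri[1] in "AG":
        tri = tri.translate(COMPLEMENT)[::-1]  # reverse complement of the context
        alt = alt.translate(COMPLEMENT)
    return f"{tri[0]}[{tri[1]}>{alt}]{tri[2]}"


def spectrum(mutations: Iterable[Tuple[str, str]]) -> List[int]:
    """Count (trinucleotide, alt) pairs into a 96-element vector ordered as CHANNELS."""
    counts = Counter(channel(tri, alt) for tri, alt in mutations)
    return [counts[c] for c in CHANNELS]


# Hypothetical calls: a CpG C>T seen on either strand lands in the same channel.
vec = spectrum([("ACG", "T"), ("CGT", "A"), ("TTA", "C")])
print(sum(vec), vec[CHANNELS.index("A[C>T]G")])
```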

For instance, neuroinflammation is a common feature of many neurodegenerative disorders and is associated with a dysregulated redox balance in the brain [174, 175]. Higher levels of reactive oxygen or nitrogen species can result in elevated rates of somatic mutation in individual cells or their lineages in the brain [176,177,178]. Comparing the rates and types of somatic mutations in neurotypical versus diseased individuals provides information on the disease environment and may even identify contributing factors. In a recent example, this approach was applied to Alzheimer’s disease, confirming an inflammatory environment that causes oxidative DNA damage in neuronal nuclei [179]. The human genome effectively acts as a tape recorder of its environment; when this information is combined with developmental lineage analysis, it is also possible to resolve the embryonic environment. However, an efficient implementation requires large data sets that allow the stratification of distinct developmental stages at higher resolution. The National Institutes of Health has recognized this limitation and in 2022 initiated the Somatic Mosaicism across Human Tissues (SMaHT) network, which plans to provide a database of human genomic mosaicism and related technological toolboxes.

Utility of Engineered Genomic Mosaicism

Conceptually, both lineage analysis and mutational signatures present a fascinating conundrum: in both cases, adult tissues, often from deceased individuals, are used to understand embryonic processes that occurred decades earlier. While this allows insights into early human development that would otherwise be intractable, it comes with caveats. First, lineage reconstructions require many assumptions, including that mutations and lineages are selectively neutral and are not lost; while this is mostly an appropriate approximation, it has been shown to be inaccurate in some cases, especially for the first cell divisions [110, 138, 180]. Second, because of the limited natural mutation rate and technical issues, lineage trees often remain incomplete. Thus, while these approaches have yielded interesting and fundamental insights into neurodevelopment despite these limitations, there are clear advantages when mosaicism can be engineered rather than passively detected.

Lineage tracing through engineered mosaicism has a long history in neuroscience research. Conceptually, the goal is to mark the genome of a subpopulation of cells with a permanent change that can be detected at a later time point [181, 182]. For instance, classical lineage tracing experiments in the brain by Walsh and Cepko employed retroviral vectors in which a subset of cells was labeled based on their location at the time of injection [183,184,185]. Depending on the vector, the mosaic change was read out via a visual phenotype or directly as a known genetic sequence. The advent of advanced mouse genetics enabled the now-classical lineage tracing with a fluorophore or another reporter protein that is irreversibly activated by Cre recombinase (Fig. 5A) [181, 182]. More complex reporters include combinatorial multi-fluorophore systems that distinguish individual lineages in parallel, or Mosaic Analysis with Double Markers (MADM), which sparsely labels individual daughters of a single cell [186, 187]. The most significant drawback of these systems for lineage tracing is that fluorophores can distinguish only a small number of lineages in parallel.

Fig. 5

Types of engineered mosaicism. A One of the most widely used methods to track lineages involves fluorescent markers, such as GFP. One possible configuration employs a stop cassette that is flanked by loxP sites and prevents the transcription of GFP. When Cre recombinase is expressed from a lineage-defined locus, the stop cassette is removed and GFP is expressed in this cell and its daughters. B A more recent innovation for tracking lineages employs ‘genomic writers’ (e.g., Cas9) that are targeted to a defined locus, often denoted as ‘genomic tape’. These writers can introduce either random or defined mutations that act as genomic barcodes to distinguish cells and their lineages. These barcodes are subsequently read out through targeted sequencing. C Using writers in combination with other systems, it is possible to further encode the temporal resolution of defined signals (i), the expression status of a cell (ii), or protein binding to genomic regions (iii).

A potential solution is the direct readout of genomic alterations that can distinguish many lineages in parallel based on combinatorial sequence variation. The fundamental idea of such systems is the use of a genomic ‘writer’ with controlled or known expression that can introduce genomic changes into a locus containing ‘genomic tape’ (Fig. 5B). In 2016, a method named GESTALT (genome editing of synthetic target arrays for lineage tracing) used CRISPR/Cas9 to barcode cells in this way throughout early development [188]. This technique allows sequencing at a later developmental time point and the reconstruction of a lineage tree from the barcoded sequences. A fundamental limitation of GESTALT is that it introduces random insertions and deletions that may interfere with each other (for example, when a single deletion spans several target sites), which complicates lineage reconstruction. The same laboratory addressed this through sequential genome editing in a proof-of-concept study in cell lines [189]. A similar approach has also been employed in mice [190].
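The benefit of ordered recording can be illustrated with a toy simulation: each division appends one symbol to the next empty slot of a cell’s tape, so daughters inherit their parent’s history and the order of events is preserved. This is a hypothetical model with hypothetical function names, written for illustration only; it does not reproduce GESTALT or the sequential editing system cited above, and the four-letter alphabet is an assumption that makes chance matches between unrelated cells common.

```python
import random
from typing import List


def simulate_tape(generations: int, slots: int, alphabet: str = "ACGT",
                  seed: int = 0) -> List[str]:
    """Toy sequential recorder. Each cell carries an ordered tape of written symbols;
    at every division both daughters inherit the tape and each writes one random
    symbol into the next empty slot, until all `slots` are used."""
    rng = random.Random(seed)
    cells = [""]  # the founder cell starts with a blank tape
    for _ in range(generations):
        daughters = []
        for tape in cells:
            for _daughter in range(2):
                if len(tape) < slots:
                    daughters.append(tape + rng.choice(alphabet))
                else:
                    daughters.append(tape)  # tape exhausted: no new information
        cells = daughters
    return cells


def shared_prefix(a: str, b: str) -> int:
    """Length of the recorded history two tapes have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


cells = simulate_tape(generations=3, slots=5)
print(cells)
print(shared_prefix(cells[0], cells[1]), shared_prefix(cells[0], cells[7]))
```

Two sister cells (adjacent entries in the output list) always share the full tape of their mother, so the shared-prefix length approximates how recently two cells diverged; chance matches with a small alphabet can only lengthen it, which is one reason real recorders use more diverse barcodes.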

Employing an orthogonal but related concept, Kalhor and colleagues devised a homing CRISPR method in which homing guide RNAs target the very locus that encodes them [191]. Cas9 can then introduce a variety of mutations into the homing guide RNA locus, which act as cellular barcodes. They applied this method to the developing murine brain, assessing barcodes from the left and right sides of the forebrain, midbrain, and hindbrain. They found that commitment to the anterior-posterior axis occurs before commitment to the left-right axis; however, they did not resolve this pattern in the neocortex itself. The homing CRISPR technique demonstrates the feasibility of using engineered mosaicism to identify and track lineages both spatially and temporally. While technically different, the use of Cre recombinase on the complex engineered ‘Polylox’ locus follows a similar logic [192]: here, combinatorial recombination distinguishes distinct lineages, as demonstrated in the hematopoietic system. Common to all these techniques is that the combinatorial, lineage-defining loci can be retrieved by direct sequencing, because the genomic tape is known a priori. After retrieval of the engineered mosaic marks, specialized algorithms allow the reconstruction of lineages, similar to genome-wide natural mosaicism.

The two biggest advantages of such methods are the tunable lineage resolution, set by the activity of the writer, and the reduced sequencing cost, since only the known genomic tape needs to be sequenced. However, the methods described above are inherently unable to reflect any ‘cellular states’ other than their initiation condition (e.g., expression of a Cre recombinase). To address this limitation, several alternative approaches have been developed that record such states (Fig. 5C). Fundamentally, these methods aim to store features such as gene expression or protein binding to chromatin as a permanent record in the genome.
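
The trade-off behind ‘tunable resolution’ can be sketched with a simple probabilistic model, which is an assumption for illustration rather than a description of any specific system: each target site is edited independently with probability p per cell generation, and edits are irreversible. The expected unedited fraction after g generations is then (1 - p)^g. A highly active writer quickly saturates the tape and mainly resolves early divisions, whereas a weak writer spreads edits over more generations but adds fewer marks per division. The function names and the 5% saturation threshold are hypothetical choices.

```python
def expected_unedited_fraction(p, generations):
    """Expected fraction of target sites still unedited after the given number
    of cell generations, if each site is edited independently with probability p
    per generation and edits are irreversible."""
    return (1 - p) ** generations


def expected_new_edits(num_sites, p, generation):
    """Expected number of sites first edited in a given generation (1-based)
    along a single lineage."""
    return num_sites * p * (1 - p) ** (generation - 1)


def generations_until_saturation(p, threshold=0.05):
    """Smallest number of generations after which the expected unedited
    fraction drops below threshold, i.e. the tape is effectively full."""
    if not 0 < p <= 1 or not 0 < threshold <= 1:
        raise ValueError("p and threshold must lie in (0, 1]")
    g = 0
    while expected_unedited_fraction(p, g) >= threshold:
        g += 1
    return g


if __name__ == "__main__":
    for p in (0.1, 0.3):
        print(p, generations_until_saturation(p))  # 0.1 -> 29, 0.3 -> 9
```

Under this model, tripling the editing probability from 0.1 to 0.3 shortens the recording window from 29 to 9 generations, which illustrates why writer activity must be matched to the developmental period of interest.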

For instance, Chen and colleagues propose driving multiple writers with distinct signatures from distinct enhancers [193]; this would enable the recording of a temporally resolved sequence of a predetermined number of input signals, which could then be used for lineage tracing. Going one step further, bacterial systems can store copies of expressed RNA in their genomes [194,195,196]. Although currently used only to record highly expressed RNAs, this approach could also mark lineages through the variable integration of these transcripts. However, these systems have not yet been translated into eukaryotes or used to distinguish clones of cells; a genome-wide recorder of transcriptional activity is therefore currently unavailable in mammals. Finally, a method named ‘Calling Cards’, developed by the Mitra laboratory, provides a distinct recording of cellular state [197, 198]. This method marks the binding sites of transcription factors by permanently integrating a transposon into the genome near those sites. The genome-wide binding of a protein of interest can therefore be assessed from a customizable genomic scar.
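
The readout logic of a transposon-based recorder such as Calling Cards can be illustrated with a deliberately simplified, hypothetical sketch: mapped insertion sites are counted in fixed-size genomic bins, and bins with many insertions are reported as candidate binding regions. The bin size, the count threshold, and the input format (chromosome, position) are assumptions for illustration. This is not the published analysis pipeline; a real analysis would also need to model background (untargeted) insertions before calling bound regions.

```python
from collections import Counter


def bin_insertions(insertions, bin_size):
    """Count transposon insertions per (chromosome, bin index) pair.

    insertions: iterable of (chromosome, position) tuples, one per mapped insertion."""
    return Counter((chrom, pos // bin_size) for chrom, pos in insertions)


def enriched_bins(insertions, bin_size, min_count):
    """Return (chromosome, start, end, count) for bins with at least min_count
    insertions, sorted by genomic position. A fixed count threshold stands in
    for a proper comparison against background insertion rates."""
    counts = bin_insertions(insertions, bin_size)
    return sorted(
        (chrom, b * bin_size, (b + 1) * bin_size, n)
        for (chrom, b), n in counts.items()
        if n >= min_count
    )


if __name__ == "__main__":
    insertions = [("chr1", 1050), ("chr1", 1200), ("chr1", 1900),
                  ("chr1", 5000), ("chr2", 300)]
    print(enriched_bins(insertions, bin_size=1000, min_count=3))
    # [('chr1', 1000, 2000, 3)]
```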

While engineered mosaicism has advantages over natural mosaicism for lineage tracing, the latter also has important applications in model organisms. For instance, Uchimura and colleagues recently used natural mosaicism to trace somatic and germ cell lineages [199]. Importantly, their approach also recovered mutation rates and signatures, which allows inference of the mutational environment during development. Moreover, natural mosaicism avoids potentially complex breeding strategies to introduce the necessary tunable genomic writers and genomic tape. As sequencing costs continue to decrease, the most important remaining advantage of engineered mosaicism is its superior resolution. However, this gap could conceivably be narrowed by increasing mutation rates through chemical mutagens or genetic backgrounds with impaired DNA repair mechanisms. Although current bioinformatic analyses are not optimized for this purpose, they could be adapted for model organisms as needed.
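
One reason natural mosaic variants carry developmental information is that their variant allele fraction (VAF) reflects when they arose. The sketch below is an idealized model, not the approach of the cited study: it assumes a diploid genome, a heterozygous variant arising in one daughter of a cleavage division, strictly symmetric divisions, equal contribution of all descendants to the sampled tissue, and no selection. Under these assumptions, a variant arising at division k is carried by 1/2^k of cells and has an expected VAF of 1/2^(k+1), so the first division yields a VAF of 0.25. The function names are hypothetical.

```python
import math


def expected_vaf(division):
    """Expected VAF of a heterozygous variant that arose in one daughter cell of
    the given cleavage division (1 = first division), under symmetric growth."""
    if division < 1:
        raise ValueError("division must be >= 1")
    return 0.5 / 2 ** division


def estimated_division(vaf):
    """Invert expected_vaf: estimate the division at which a variant with the
    observed VAF arose. Non-integer results reflect noise or asymmetric growth."""
    if not 0 < vaf <= 0.25:
        raise ValueError("VAF must lie in (0, 0.25] for this model")
    return math.log2(0.5 / vaf)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, expected_vaf(k))  # 0.25, 0.125, 0.0625
```

In real tissues, asymmetric contributions of early cells and sampling noise blur this relationship, so such estimates are approximate at best.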

Conclusions

Driven by technological advances, research on genomic mosaicism in the brain has progressed significantly in the last decade. This review provides a bird’s-eye view of current trends in this field, including conceptual definitions, current methodological approaches, and an overview of the impact and utility of genomic mosaicism. As sequencing is one of the main drivers of current discovery, this field benefits from the rapidly decreasing costs of next-generation and, prospectively, third-generation sequencing. This allows researchers interested in lineage tracing in the brain to apply genomic mosaicism approaches more easily. We hope this review will act as a primer for interested readers and encourage wider adoption of the concepts described here.