Introduction

The duplication of entire genomes leads to polyploidy and occurs in many cell types and organisms. The resulting polyploids often differ from their progenitors, and are mostly viewed as aberrant or not successful in evolutionary terms. In contrast, evidence is accumulating that polyploidization may be a driving force in evolution as it increases the adaptive potential in stressful conditions (van de Peer et al. 2017), leading to evolutionary innovations and diversification (Walden et al. 2020; Ostendorf et al. 2021).

Sometimes, polyploid cells lose parts of their chromosome set, resulting in aneuploidy. For various eukaryotes, aneuploidy is mostly harmful or even lethal (Birchler and Veitia 2012; Torres et al. 2008). For example, aneuploidy is a hallmark of cancer, with about 68% of solid tumours in humans being aneuploid (Duijf et al. 2013; Passerini et al. 2016). It is well established that chromosomal instability causes aneuploidy which drives tumour formation, but there is growing evidence that aneuploidy itself might contribute to tumorigenesis (Ben-David and Amon, 2020). In humans, aneuploidy caused by the addition of one single chromosome, as extensively investigated in the chromosomal-disorder disease trisomy 21, has severe consequences and leads to characteristic phenotypical alterations. Here, the majority of genes on the multiplied chromosome 21 showed a quantitative stoichiometric 1.5 fold increase in expression (Amano et al. 2004). However, regions with altered gene expression occur all over the genome, revealing that aneuploidy affects global transcript patterns (Letourneau et al. 2014).

In contrast to aneuploids, euploid organisms deriving from a whole-genome duplication (WGD) are viable and show fewer phenotypical deviations. The phenotypical effects of WGDs in plants include increased cell sizes and biomass production (Wu et al. 2012; del Pozo and Ramirez-Parra 2015). Similar to aneuploidy, a WGD can result in qualitative changes in gene expression, for example by an upregulation stronger than anticipated by the increased gene dosage (Guo et al. 1996), as well as in an unaltered level of gene products, presumably caused by gene dosage compensation mechanisms (Birchler and Veitia 2012; Shi et al. 2015).

In allopolyploids with their chromosome sets originating from different taxa, a synergy between chromosome duplication and hybrid vigor or heterosis effect may occur, associated with increased growth rates, a diverging morphology and an improved ability to adapt to new environmental conditions (Comai 2005; Sattler et al. 2016). Therefore, allopolyploidization is an attractive strategy for the optimization of crop plants in agriculture (Matsuoka 2011; Behling et al. 2020), and allows them to take over new niches (Cheng et al. 2018). For example, there is molecular evidence for allopolyploidy in some mosses of the genus Physcomitrium which are important land pioneers (Beike et al. 2014; Medina et al. 2018). However, in autopolyploids, with chromosome sets from the same taxon, a hybrid vigor effect is lacking and hence the overall impact of a pure WGD on the genome is weaker (Spoelhof et al. 2017). It is unclear to what extent a pure WGD affects gene expression, not only quantitatively due to increased gene dosage but also qualitatively at the global level. A qualitative change in gene expression might contribute to phenotypic effects observed after artificial pure WGDs, like a smaller fruit size in autotetraploid Hylocereus monacanthus plants (Cohen et al. 2013) or a reduced viability in stationary phase in isogenic yeast tetraploids (Andalis et al. 2004).

In contrast to animals, land plants undergo an alternation of generations between the haploid gametophyte and the diploid sporophyte. In most cases, this alternation is heteromorphic, i.e. gametophyte and sporophyte have different morphologies. Whilst the sporophyte dominates in angiosperms, the gametophyte dominates in mosses. Thus, most mosses are haploid in the dominating stage of their life cycle (Reski 1998a), although diploid or even triploid gametophytes exist, for example in the ecologically important peat mosses (Heck et al. 2021). While the genetic regulator for the developmental switch between gametophytic and sporophytic generation has been identified in the moss Physcomitrella (Horst et al. 2016; Horst and Reski 2016), it remains unclear why these haploid plants are so successful in evolutionary terms, and not prone to excess mutations.

The discovery that Physcomitrella repairs DNA double-strand breaks (DSBs) preferably via the homologous recombination (HR) mechanism may provide an explanation for this enigma. This highly efficient HR machinery facilitates the precise and efficient integration of foreign DNA via gene targeting (GT) with success rates of more than 90% (Girke et al. 1998; Kamisugi et al. 2005, 2006; Schaefer and Zrÿd 1997; Schaefer et al. 2010; Strepp et al. 1998). Subsequently, highly efficient HR was also described for the moss Ceratodon purpureus (Trouiller et al. 2007). In contrast, non-homologous end joining (NHEJ) is the preferred mode for the repair of DNA–DSBs in angiosperms. NHEJ relies on a protein complex comprising Ku70, Ku80, DNA-PKcs, XRCC4 and DNA ligase 4 (Weterings and Chen 2008), leads to a random integration pattern of a transgene in the genome, and thereby results in low GT rates (Britt and May 2003; Iiizumi et al. 2008). Hence, attempts to establish efficient GT strategies in seed plants were not particularly successful, with reported frequencies as low as 10⁻⁴ to 10⁻⁵ (Beetham et al. 1999; Dong et al. 2006; Okuzaki and Toriyama 2004; Zhu et al. 1999). More recently, the CRISPR/Cas9 system was successfully applied for GT in angiosperms (Steinert et al. 2016), as well as for the realization of various agronomic traits (Qi et al. 2020; Waltz 2016). However, GT rates are still low and require elaborate screening (Barone et al. 2020; Schindele et al. 2020).

It is still puzzling why HR is so efficient in some mosses. Physcomitrella is a convenient model organism to address this question since it can be easily cultivated under controlled conditions and protocols for precise genetic engineering by GT are well established (Decker et al. 2015). Its genome sequence is available, assembled and annotated, and provides evidence for at least two WGDs in its evolutionary past (Rensing et al. 2008; Lang et al. 2018), although Physcomitrella is a functional haploid (Reski 1999). Several explanations for the high GT rates have been discussed, like an altered HR mechanism compared to angiosperms encompassing slight variations in the proteins required for HR or differential expression of their encoding genes (Puchta 2002; Reski 1998b; Strotbek et al. 2013). HR-based DNA–DSB repair in Physcomitrella relies on MRE11 and RAD50 (Kamisugi et al. 2012), which are part of a protein complex binding to the ends of broken DNA strands. Targeted knock-out (KO) of the recombinase RAD51 or the SOG1-like protein SOL proved the importance of these proteins in HR and moved DNA–DSB repair to faster but non-sequence conservative repair pathways (Goffová et al. 2019; Markmann-Mulisch et al. 2007; Schaefer et al. 2010). Further, the simultaneous presence of the kinases ATM and ATR, which are also involved in the reprogramming of Physcomitrella leaf cells into stem cells after DNA damage (Gu et al. 2020), is indispensable for GT via HR (Martens et al. 2020). A number of additional proteins have been identified that are favourable but not crucial for GT, like the homology-dependent DSB end-resection protein PpCtIP (Kamisugi et al. 2016) and both subunits of the XPF-ERCC1 endonuclease complex involved in the removal of 3’ non-homologous termini (Guyon-Debast et al. 2019). Additionally, two RecQ helicases have crucial, distinct functions in HR and influence GT frequency, where RecQ6 is an enhancer and RecQ4 a repressor of HR (Wiedemann et al. 2018). Similarly, Polymerase Q (POLQ) acts as an inhibitor of the HR pathway (Mara et al. 2019).

Hypotheses that are more general were proposed early on: haploidy of the tissue may favour high HR (Schaefer and Zrÿd 1997), or an unusual cell-cycle arrest may be advantageous (Reski 1998b). Physcomitrella chloronema cells stay predominantly at the G2/M-boundary (Schween et al. 2003a). This cell-cycle phase may be correlated with efficient HR, as HR requires preferentially a sister chromatid as source of the homologous nucleotide sequence that is only available in the late S-phase and in the G2-phase (Heyer et al. 2010; Watanabe et al. 2009). Indeed, B1‐type CDKs and B1‐type cyclins are important regulators of HR in the angiosperm model Arabidopsis thaliana, linking the activity of HR to the G2-phase (Weimer et al. 2016).

A technical way to achieve GT in Physcomitrella is PEG-mediated protoplast transformation. In protoplasts, the recovery from cell-wall removal and isolation of single cells is expected to happen in the same period as the integration of the transgene via HR. This is assumed to be completed within the first 72 h after isolation before the first cell division ([…]

[…] kit (Combimatrix Corp.) and reused up to four times. The experimental procedure was the same as described previously (Beike et al. 2015; Kamisugi et al. 2016; Wolf et al. 2010).

Microarray data analysis

Microarray expression values were investigated with the Expressionist Analyst Pro software (v5.0.23, Genedata, Basel, Switzerland). The probe sets were median condensed, and linear array-to-array normalization was applied using median normalization to a reference value of 10,000. Differentially expressed genes were detected using the Bayesian regularised unpaired CyberT test (Baldi and Long 2001) with Benjamini–Hochberg false discovery rate correction and a minimum |log2 fold change| > 1 (Richardt et al. 2010). A false discovery rate of q < 0.05 was taken as cut-off for the first microarray time series experiment. For the second microarray time series experiment, p < 0.001 was chosen for the comparison of gene expression between the ploidy levels and for the comparison of gene expression between different time-points in regenerating protoplasts. K-means clustering with k = 2 identified upregulated and downregulated genes. An overview of the plant lines and sample sources used for the different comparisons to compute DEGs is compiled in Supplementary Table T2.
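The normalization and filtering steps described above can be illustrated with a minimal sketch. It is not the Expressionist Analyst Pro implementation: the function names are hypothetical, the p values are assumed to come from a separate test such as CyberT (which is not reimplemented here), and only the reference value of 10,000, the Benjamini–Hochberg correction, the |log2 fold change| > 1 threshold and the q < 0.05 cut-off are taken from the text.

```python
from statistics import median

REFERENCE = 10_000  # reference value stated in the text


def median_normalize(array, reference=REFERENCE):
    """Scale one array linearly so that its median equals `reference`."""
    factor = reference / median(array)
    return [value * factor for value in array]


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values (q values) in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that q values stay monotonic.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


def call_degs(log2_fold_changes, pvalues, q_cutoff=0.05, lfc_cutoff=1.0):
    """Return indices of genes passing both the q value and fold-change filters.

    `pvalues` are assumed to be raw p values from an upstream test
    (for example CyberT); that test is outside the scope of this sketch.
    """
    qvalues = benjamini_hochberg(pvalues)
    return [
        i
        for i, (lfc, q) in enumerate(zip(log2_fold_changes, qvalues))
        if abs(lfc) > lfc_cutoff and q < q_cutoff
    ]


if __name__ == "__main__":
    # Toy intensities for one array and toy statistics for four genes.
    print(median_normalize([1, 2, 3]))
    print(call_degs([1.5, 0.5, -2.0, -0.2], [0.01, 0.04, 0.03, 0.005]))
```

Scaling each array by a single factor keeps the normalization linear, as stated in the text; any software-specific details of the median condensing of probe sets are not modelled.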

SuperSAGE library construction

SuperSAGE libraries were constructed by GenXPro (Frankfurt am Main, Germany) following a protocol based on Matsumura et al. (2010) as described by El Kelish et al. (2014) with the implementation of GenXPro-specific technology and improved procedures for quality control as well as specific bias proved adapters for elimination of PCR artefacts (True-Quant methodology). In total, 17 SuperSAGE libraries (including replicates) were constructed from 11 biological samples. The biological samples encompassed the transcriptomes of Haploid A and Diploid A after protoplast isolation (0 h) and 4 h and 24 h after transfection; haploid as well as diploid protonema mRNA in duplicates; and transcript data of WT protoplasts from 0 h, 4 h and 24 h, with triplicates for 4 h and 24 h. A detailed overview of the libraries is provided in Supplementary Table T3.

SuperSAGE data analysis

The quality of the processed libraries was checked with FastQC (v0.11.4, Andrews, 2010) and reads were mapped with HISAT2 (v2.0.3, Kim et al. 2015) to the V3 assembly of the P. patens genome (Lang et al. 2018) in the Galaxy platform (Freiburg Galaxy instance, http://galaxy.uni-freiburg.de, Afgan et al. 2016). Mapping parameters allowed for no mismatches and only known splice sites were considered. A count table was constructed from the mapped reads using the featureCounts (v1.4.6.p5, Liao et al. 2014) tool from the Galaxy platform by counting all the reads mapped to exons or untranslated regions of each gene. Multiple alignments of reads were allowed, while reads with overlaps on the meta-feature (gene) level were disregarded for the construction of the count table. For specific parameters, see Supplementary Table T4 and Supplementary Table T5. Statistical analysis for differential gene expression was performed by pairwise comparison of library count tables using GFOLD (v1.1.4, Feng et al. 2012) and by two two-factor analyses with the DESeq2 package in Galaxy with default parameters (Galaxy Version 2.11.40.6, Love et al. 2014). In the two-factor analyses, ploidy-dependent gene expression was determined in the presence of tissue as secondary factor. All libraries originating from protonema and different protoplast material were used as input for the first two-factor analysis and the libraries of mock transformed WT protoplasts at 4 h and 24 h were considered as replicates to the libraries of transformed WT protoplasts at the corresponding time-points. Only libraries derived from protoplasts of the lines WT and Diploid A were considered for the second two-factor analysis. In GFOLD analysis, genes with a GFOLD(0.01) value (representing the log2 fold change of gene expression adapted for adjusted p value, Feng et al. 2012) of < −1 or > 1 were considered to be differentially expressed, whereas in DESeq2 analysis genes with a |log2 fold change| > 1 and an adjusted p value < 0.1 were considered as differentially expressed. Further data exploration was performed using functions from SAMtools (v1.3.1, Li et al. 2009).
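The two DEG criteria stated above can be written as simple predicates. The following is a minimal sketch that assumes the GFOLD(0.01) values, log2 fold changes and adjusted p values have already been exported from the respective tools. The function names and the example gene labels are hypothetical, and treating a missing adjusted p value (which DESeq2 reports as NA for some genes) as "not differentially expressed" is an assumption made here, not something stated in the text.

```python
def is_deg_gfold(gfold_value, cutoff=1.0):
    """GFOLD criterion from the text: GFOLD(0.01) value < -1 or > 1."""
    return gfold_value < -cutoff or gfold_value > cutoff


def is_deg_deseq2(log2_fold_change, padj, lfc_cutoff=1.0, padj_cutoff=0.1):
    """DESeq2 criterion from the text: |log2 fold change| > 1 and padj < 0.1.

    `padj` may be None when no adjusted p value is available; such genes
    are not called differentially expressed (assumption of this sketch).
    """
    if padj is None:
        return False
    return abs(log2_fold_change) > lfc_cutoff and padj < padj_cutoff


def filter_deseq2(results):
    """Return the gene IDs passing the DESeq2 criterion.

    `results` maps a gene ID to a (log2 fold change, adjusted p value) tuple.
    """
    return [
        gene
        for gene, (lfc, padj) in results.items()
        if is_deg_deseq2(lfc, padj)
    ]


if __name__ == "__main__":
    # Hypothetical gene labels and values, for illustration only.
    toy = {
        "geneA": (2.3, 0.01),
        "geneB": (-1.4, 0.20),
        "geneC": (0.6, 0.001),
        "geneD": (-3.1, None),
    }
    print(filter_deseq2(toy))
    print(is_deg_gfold(-1.7), is_deg_gfold(0.4))
```

Both thresholds are strict inequalities, so a gene with a value of exactly 1 or −1 is not counted as differentially expressed.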

Computational analysis of DEGs

Annotation of DEGs was obtained using Phytozome (v12.1.5, Goodstein et al. 2012) and the PpGML DB (Fernandez‐Pozo et al. 2020). For the computation of the overlap between DEGs identified in the microarray and SuperSAGE data, and to generate a combined set of DEGs comprising all DEGs from both technologies, gene IDs of DEGs identified in the second microarray experiment were converted to Physcomitrella V3.3 IDs (Lang et al. 2018). If one ID mapped to several genes of the V3.3 annotation, all of them were considered as DEGs. In case the IDs of multiple DEGs mapped to the same V3.3 ID, the mean of the log2 fold change values was taken. Similarly, in the comparison between the DEGs identified in our study and DEGs found by […]

[…] Martin et al. (2009) showed that the production of double FtsZ-mutants can be as effective as the production of single mutants, confirming that the amount of cDNA during transformation is sufficient for several loci at the same time. Hence, in diploid Physcomitrella lines increased expression of the gene encoding XRCC4 correlates with a suppression of GT and thereby, the NHEJ pathway gains in significance over HR, the main DNA–DSB repair mechanism of the haploid-dominant moss (Kamisugi et al. 2006). We interpret high NHEJ rates in diploids as a reduced selective pressure for accurate DNA repair due to the additional information back-up available in the form of a second set of chromosomes. Elevated NHEJ rates in diploids support the hypothesis that the haploid phase of Physcomitrella is interlinked with high integration rates of transgenes via HR (Schaefer and Zrÿd 1997).
Yet, ploidy is unlikely to be the sole factor that determines GT rates in plants, for several reasons: (i) GT frequencies of seed plants did not increase with haploid tissues (Mengiste and Paszkowski 1999), (ii) GT in other haploid species like Volvox is not as efficient as in Physcomitrella (Reski 1998b), and (iii) the GT rate we measured in diploid Physcomitrella plants is still several-fold higher than GT rates observed in polyploid angiosperms. Another factor potentially contributing to the GT efficiency in Physcomitrella is the G2/M-phase arrest of the protonema tissue used for transformation. This was, however, unchanged after WGD in our diploids.

As we analysed the transcriptomic responses in bulks of 300,000 protoplasts each, DEGs may have been masked by different transformation efficiencies or by the bulk of untransformed protoplasts. However, we did not observe different transformation efficiencies between haploid and diploid protoplasts based on the highly standardized procedures developed by us (Hohe et al. 2004). Single-cell transcriptomic studies are gaining popularity (Cole et al. 2021) but are still in their infancy in Physcomitrella (Kubo et al. 2019) and thus not yet sufficiently standardized for the series of quantitative studies we performed here. The differences in gene expression between haploids and diploids having an identical, albeit duplicated, genome might be to some extent caused by ploidy-dependent epigenetic regulation of the transcriptome. Epigenetic regulation of chromatin accessibility is partially mediated via chromatin marks. Xiao et al. (2012) showed that various methyltransferases are DEGs during protoplast regeneration in Physcomitrella. This may indicate an important mechanism for epigenetic regulation of DNA repair pathways. Indeed, epigenetic alterations (Wolffe and Matzke 1999) as well as the adaptation of gene-regulatory networks and direct changes in the genome structure, among others by an altered transposable element activity or homologous and non-homologous recombination (Adams and Wendel 2005; del Pozo and Ramirez-Parra 2015; Liu and Wendel 2003; Otto 2007), already happen in the first generations very shortly after a WGD. They are reactions to challenges arising in newly formed polyploids, like genetic instability (Soltis et al. 2015), an increased demand of energy and a higher number of chromosomes to deal with during mitosis (del Pozo and Ramirez-Parra 2015; Doyle et al. 2008).

With the creation of artificial diploid Physcomitrella plants we have imitated a WGD event, which is an important driving force of evolution that happened several times over the past 200 million years in land plants (Renny-Byfield and Wendel 2014; Soltis and Soltis 2016; van de Peer et al. 2017), including Physcomitrella (Lang et al. 2018). Our studies provide an insight into the adaptation of gene expression following a WGD. Such findings might help to retrace how autopolyploids became established during evolution. Additionally, we are one step closer to unmasking the mysteries surrounding GT in plants by further elucidating the regulation of DNA repair mechanisms. Understanding the mechanism of HR is the basis for transferring the technique and efficiency to create genetically modified organisms via GT from Physcomitrella to other plant species (Collonnier et al. 2017). The biological relevance of DEGs described here will be analysed in loss-of-function moss mutants generated by GT in forthcoming studies.