Abstract
Background
Transcription factors (TFs) are vital elements that regulate transcription and the spatio-temporal expression of genes, thereby ensuring the accurate development and functioning of an organism. The identification of TF-encoding genes in a liverwort, Marchantia polymorpha, offers insights into TF organization in the members of the most basal lineages of land plants (embryophytes). Therefore, a comparison of Marchantia TF genes with other land plants (monocots, dicots, bryophytes) and algae (chlorophytes, rhodophytes) provides the most comprehensive view of the rates of expansion or contraction of TF genes in plant evolution.
Results
In this study, we report the identification of TF-encoding transcripts in M. polymorpha for the first time, as evidenced by deep RNA sequencing data. In total, 3,471 putative TF encoding transcripts, distributed in 80 families, were identified, representing 7.4% of the generated Marchantia gametophytic transcriptome dataset. Overall, TF basic functions and distribution across families appear to be conserved when compared to other plant species. However, it is of interest to observe the genesis of novel sequences in 24 TF families and the apparent termination of 2 TF families with the emergence of Marchantia. Out of 24 TF families, 6 are known to be associated with plant reproductive development processes. We also examined the expression pattern of these TF-encoding transcripts in six male and female developmental stages in vegetative and reproductive gametophytic tissues of Marchantia.
Conclusions
The analysis highlighted the importance of Marchantia, a model plant system, in an evolutionary context. The dataset generated here provides a scientific resource for TF gene discovery and other comparative evolutionary studies of land plants.
Background
Regulation of gene expression is central to all organisms [1] and is imperative for determining the morphology, functional competence, and development of a multicellular organism [2]. This regulation is tightly coordinated by a number of mechanisms, such as DNA methylation [3]; chromatin organization [4]; dimerization; and sequence-specific DNA binding, which is executed primarily by transcription factors (TFs). Depending upon the combinatorial control of protein-protein interactions, a TF may simultaneously function as an activator of one set of genes and a repressor of others [5]. For example, TFs have been known to determine the identity of floral organs in plants [6]. These TFs, referred to as organ identity genes, control the transcriptional regulation of target genes, thereby triggering organ formation in sexual plant reproduction. Via their various actions, these modular proteins play a pivotal role in controlling the spatial and temporal expression patterns of genes in all living organisms.
Usually, TFs are comprised of a DNA-binding domain (DBD) that interacts with the cis-regulatory elements of its target genes [7] and a protein-protein interaction domain that facilitates oligomerization between TFs and other regulators [8]. The majority of TFs may be grouped into a number of different families according to their structural features, i.e., the type of DBD that is present within their sequence [5]. Usually, each TF has only one type of DBD, occurring in either single or multiple copies.
Eukaryotes have a more sophisticated transcription regulation mechanism than prokaryotes. Multicellular eukaryotes must address cell differentiation and consequently employ a more intricate regulatory mechanism, which uses a large number of TFs [9–12]. Reports have also shown that TF families are strongly conserved across eukaryotic organisms, especially plants [13]. Approximately 45% of Arabidopsis TFs belong to families that are specific to plants [1]. As in animals, TF families have been considerably expanded in plant lineages, suggesting that they are involved in the regulation of clade-specific functions [1, 8, 14, 15]. Overall, plants have more TF genes than animals [13, 16]. A significant number of protein-encoding genes are dedicated to regulating the transcription machinery and gene expression [1]. In plants, ~7% of all genes encode TFs. For example, the genome of Arabidopsis thaliana includes 27,416 protein-coding genes (TAIR http://www.arabidopsis.org/), of which about 6% (more than 1,700) encode TFs.
The completion of various genome sequencing projects has provided a unique opportunity for comparative studies of transcriptional regulatory networks. Distribution and sequence analyses suggested that TF genes in plants evolved via genome duplication [17], exon capture, translocation, and mutation. The retention of duplicated TF genes led to gene family expansions, which further complicated the genomes of higher plants [18]. TF families that have significantly expanded in the past 600–100 million years are mainly the MADS box proteins, basic-region leucine-zipper proteins (bZIP), and the MYB and bHLH families [8, 19, 20].
Plants and animals are known to have originated from a common ancestor. Structural conservation of TF DBDs among plants and animals suggests that these domains may have originated before these two eukaryotic kingdoms diverged. Little structural conservation has also been reported among different eukaryotic TFs. This suggests that eukaryotes use only a limited number of DBDs to achieve various regulatory purposes, in combination with other functional activation domains. Thus, TFs may be viewed as molecular switches that link signal transduction pathways to gene expression [7]. The function of a few TF families has remained conserved between plants and animals separated by over a billion years of evolution [1]; one example is the E2F family, which controls basic cell cycle functions [1, 21]. On the other hand, many TF families may exhibit altered or diverse functions due to minor sequence changes in different plant and animal lineages [11]. Thus, these evolutionary changes in sequences and TF functions may complicate the detection of paralogous/orthologous relationships between organisms.
Liverworts are among the earliest diverging plant lineages, thus constituting a sister group to all other land plants [22–27]. The bryophyte fossil record shows that liverworts are at least 475 million years old [28]. M. polymorpha is a common liverwort with a wide distribution around the world and is one of the most intensively studied bryophytes. Because they belong to the clade of the most basal plant lineages, liverworts occupy a very important position with respect to understanding early land plant evolution [29]. No evolutionary study can be complete without data from Marchantia. Unfortunately, only minimal genomic information has been available for this bryophyte, until now. Although some expressed sequence tags (ESTs) have been produced and some male and female gene-based markers have been developed, full-fledged functional genomics studies in liverworts have not been initiated. Since M. polymorpha is a dioecious plant, ESTs have been generated [30, 31] in an attempt to identify key genes involved in sex differentiation mechanisms and the development of male and female plants but are limited in coverage. Here, we present the entire repertoire of regulatory factors in this liverwort for the first time and predict a set of TF-encoding transcripts in M. polymorpha on the basis of stringent sequence similarity with known TF genes. Sequence comparisons alone would not have provided the appropriate information regarding the alterations of TF function during evolution; hence, we also examined the expression profiles of the TF-encoding transcripts in M. polymorpha. In this study, we also focused on the evolution of TF gene families based on a comprehensive comparison of TF gene distribution in liverworts, mosses, higher plants, and their algal ancestors.
Results and discussion
Identification of TF-encoding transcripts through transcriptome sequencing and de novo assembly
The transcriptome of M. polymorpha was sequenced from RNA isolated from six different male and female tissues, as described in the Methods section and shown in Additional file 1, using short reads on an Illumina HiSeq™ 2000 platform (Sharma et al., unpublished observations). The tissue samples chosen for RNA isolation and sequencing represented a comprehensive range of vegetative and reproductive stages of both male and female gametophytic tissues. The Marchantia transcriptome dataset generated in this study is a new source for the identification of novel regulatory transcripts and provides a glimpse of their expression profiles in vegetative and reproductive tissues.
Approximately 80 million paired-end sequence reads, each 90 bp in length, were generated from RNA sequencing (Sharma et al., unpublished observations). Low-quality reads were filtered out before assembly. De novo transcriptome assembly was performed with Velvet [32] and Oases [33] using the same parameters used by Garg and colleagues for their transcriptome assembly [34]. De novo assembly of the Marchantia transcriptome resulted in a total of 46,533 non-redundant (NR) transcripts from 46,070 predicted loci. The sequence dataset generated is deposited at NCBI in the Short Read Archive (SRA) database under accession number SRP029610.
The total genome size of M. polymorpha was estimated to be 280 Mb based on flow cytometry, and the total number of genes was estimated to be ~20,000 [35]. In this study, 46,533 transcripts from 46,070 loci, potentially representing an estimated number of genes, were predicted from the transcriptome data of M. polymorpha. This number likely includes the alternatively spliced variants and non-coding transcripts. In fact, only 20,000 out of 46,533 transcripts generated BLASTX hits, with an E-value cut-off of 1e-05, against the protein sequences of embryophytes that were extracted from the NR NCBI database (Sharma et al., unpublished observations). Hence, we assume that most of the Marchantia genes, including TF genes, were detected by our RNA-Seq data. Our results indicate that the obtained transcript dataset may be fragmentary. Thus, the number of transcripts/genes encoding for TFs is likely to be fewer than what is presented in the data below. Further, the genome sequence information for Marchantia may provide more information about the fragmentation of transcripts in this liverwort.
The assembled NR transcripts of Marchantia were compared with known TF gene sequences of other sequenced plants listed in PlnTFDB [36] using BLASTX. In total, 3,471 putative Marchantia TF-encoding transcripts, distributed in at least 80 families, were identified, representing 7.4% of the total Marchantia transcripts detected in our study. Major TF gene families are depicted in Figure 1. The organization of TF families in Marchantia resembled that of Physcomitrella patens [36–39].
Hence, the description of TF-encoding transcripts from Marchantia provided insight into the organization and biological functions of TFs in lower plants as well as their evolution. From a biotechnological standpoint, TF identification is useful for studying the transcriptional regulatory switches involved in plant development and reproduction and in generating responses and sequential adaptations to the changing environment.
Comparison of TF-encoding genes in plants and their algal ancestors
In the present study, we first summarized the knowledge of TF-encoding genes in plants and algae, while updating the classification of Marchantia TF-encoding transcripts and their categorization in all 80 different TF families. PlnTFDB [40] includes 85 families of TFs and TRs from 20 sequenced plant species other than liverworts, ranging from unicellular red and green algae to highly complex angiosperms, thereby including >1.6 billion years of gene regulatory network evolution and encompassing 26,184 distinct proteins. Sequence data showing the number of TF encoding genes in red algae, green algae, Selaginella, Physcomitrella, Chlamydomonas, and other higher plants is listed in Table 1.
Data presented in Table 1 show that the number of genes encoding TFs is smallest for algae; the number increases from liverworts to mosses, and increases further in monocots and dicots. More complex organisms execute complex mechanisms to control gene expression by employing a greater number of TFs [2, 9–12, 15, 41]. In eukaryotes, an appreciable number of protein-coding genes encode TFs. The number of TF-encoding genes ranges from 2% to 9% of the total protein-coding genes of the 20 organisms considered. As expected from published reports, the smallest proportions of TF genes were found in the most primitive organisms, e.g., Chlamydomonas and Physcomitrella, where TF genes made up 2–4% of the total annotated genes. In higher plants, the greater complexity of form and function presumably mandates an increased number of TF genes (e.g., monocot and dicot plants have 5–9% TF genes) [13]. This was clearly demonstrated in some earlier reports, which are summarized in Table 2. The number of total predicted protein-coding genes and the number of predicted TF genes identified are also indicated.
Existing knowledge of plant TF genes was acquired from various studies conducted on an exemplar genetic model in plant biology—Arabidopsis thaliana. Despite Arabidopsis being an important and very useful plant model for studying various developmental processes and regulatory mechanisms common to all higher plants [13], it lacks certain traits that are concomitant with the evolutionary movement of plants from aquatic conditions to land, such as the loss of genes associated with an aquatic environment and acquisition of genes for tolerating terrestrial stresses. These traits are of immense value to lower plants, and this may support the concept of evolution of plants from their algal ancestors. Hence, it was of great interest to perform a more comprehensive comparative analysis of TF genes between alga, moss, spike moss, liverwort, and higher plants. We considered the identity of organisms when evaluating gene family sizes, as various organisms are reported to have different rates of gene duplication and retention, and differences in gene content may reflect species-specific adaptations [39].
Figure 2 shows 85 TF families, color-coded according to the lineage of land plants in which they were commonly found. A strikingly important observation made from the analyzed comparative dataset is that, out of the 85 gene families taken into consideration, 24 appear to originate as liverworts evolved (marked as orange blocks). These families are present in all land plants, including liverworts, but are absent in red algae (rhodophytes) and green algae (chlorophytes): Alfin-like, ARF, AUX/IAA, BBR/BPC, BES1, CAMTA, DBP, EIL, FAR1, GeBP, GRAS, GRF, HRT, LFY, LOB, LUG, NAC, NOZZLE, OFP, SRS, TCP, Tify, Trihelix and zf-HD. An initial report stated that these 21 TF families arose within the earliest land plants or in their aquatic ancestor [2]. However, taking Marchantia into consideration, given that it is the earliest diverging lineage, transcriptome sequencing provided us with new findings. The numbers of TF-encoding genes in all studied organisms are given in Additional file 2. Some TFs, which originated together with the evolution of liverworts, contribute to the stress tolerance capacity of plants: for example, CAMTA [54] and Alfin-like [55] regulate salt tolerance; ARF [56] and AUX/IAA [57] play roles in auxin regulation; EIL [58] is known for ethylene signaling in higher plants; and GRF [59], LFY [60], LOB [61], LUG [62], NAC [63], NOZZLE [64], OFP [65], and Tify [66] regulate meristem elongation, flowering initiation, and flowering organ development [6]. Trihelix TFs are known to be involved in diverse functions in seed plants, such as abiotic stress tolerance [103]. In Marchantia, WRKY proteins are transcriptional regulators that are proposed to play a role in proper cellular responses to internal and external stimuli. 
Other transcripts showing preferential expression pattern for reproductive stages code for AP2-EREBP – a regulator of floral organ identity [104], HB which is involved in cell differentiation and controls cell-growth [105], LOB which functions in plant development in lateral organs like the leaf or flower [61], MYB which controls cellular proliferation and the commitment to development [99], PHD which controls chromatin or transcription [106], SET which is involved in histone methylation [107], and TIG which is involved in DNA binding [75]. Thus, these TFs are proposed to play similar roles in Marchantia.
In plants, the manifestation of fundamental biological processes and proper development requires some genes to be expressed constitutively, while others are expressed in a specific spatio-temporal pattern (organ-limited, stimulus-responsive, development-dependent, and cell-cycle specific manners). Both patterns of expression rely on the interaction of TFs with cis-acting elements or with other TFs for the regulation of cell activities. Hence, any change in the expression profile of TF genes in tissues normally leads to dramatic changes in plant development, and structural changes to these genes may signify an important evolutionary force [95]. As a practical approach, studying the expression pattern of these TF-encoding transcripts in liverworts provides us with strong evolutionary support for models and emphasizes the importance of this model plant system.
Putative functions of TF-encoding transcripts
All 3,471 TF-encoding transcripts were subjected to a BLASTX search against the non-redundant (NR) database of the NCBI (National Center for Biotechnology Information), using an E-value cut-off of 1e-05. Out of 3,471 transcripts, 3,395 (97.8%) resulted in hits, supporting the view that these are protein-coding genes. Of these 3,395 transcripts, 94.8% had hits to plant sequences. A list of BLASTX hits is provided in Additional file 8.
qPCR validation
qPCR analysis was used to compare the expression of selected variably expressed transcripts across a spectrum of tissues, including vegetative, immature, and mature reproductive stages. Transcripts displaying consistent expression across the spectrum of tissues were considered as candidate reference genes. Homologues of actin (MpACT1) and CDPK (MpCDPK) exhibited variable expression across the six stages considered when checked by qPCR, as shown in Additional file 9. Hence, CDPK and actin were not taken as reference genes. Instead, based on the RPKM values, a transcript with consistent expression was selected as the reference gene and was cross-checked by qPCR as well (Additional file 9). qPCR results confirmed the in silico RPKM calculations for most of the transcripts, as shown in Figure 4. The de novo assembled Marchantia TF expression data presented here will also be beneficial for other functional genomics and comparative genomics studies.
Our in silico inspection of the expression patterns of these TF-encoding genes in different vegetative and reproductive tissues suggested tissue-specific and/or stress-responsive attributes in accordance with their expression patterns. The tissue-specific expression profile of a gene could also be used to discuss the combinatorial usage of TFs for dictating the transcriptional program of different tissues. Members of different TF gene families appear to differ in their time and level of expression as they responded to multiple environmental signals and different developmental signs. Consequently, specific lower-plant traits may derive from some unique TF gene expression patterns. Additionally, it is possible that the same TF gene family members variably express in different plants [95]. Hence, the differential expression of similar TF genes upon exposure to contrasting environmental stimuli could be due to cis-acting elements. Clearly, the regulation of TF gene expression and function involves a vital network of interrelated processes.
Statistical analysis
Analysis of variance showed highly significant differences among ranks (p < 0.0001) in terms of the number of genes coding for TFs, as depicted in Additional file 10. The number of TF-encoding genes appears to increase significantly with organism rank, and thus with the complexity of the organisms involved. Comparison of ranks using Gabriel’s comparison limits revealed three major groups. The two most primitive organisms (ranks 1 and 2) had similarly small numbers of TF-encoding genes. Organisms classified as ranks 5 and 6 (most developed) exhibited similarly high numbers of TF-encoding genes. Organisms in ranks 3 and 4 showed intermediate numbers of genes and were placed between these two extremes, as shown in Figure 5. The variance partitioning showed that nearly 59% of the total variation in the number of genes coding for TFs was between organisms. Differences between ranks contributed 39% of the variation, and only 2% of the variation existed between organisms grouped within a given rank.
Conclusions
Liverworts as the sister of all land plants represent the basal lineage of land plants, providing a unique perspective on the regulatory origin of TFs and the genetic complexity of terrestrial plants. Marchantia, among the liverworts, is particularly easy to grow, transformable, and may prove to be a crucial model for future study of the origin of regulatory genetic systems. The availability of the complete genomic sequences of an increasing diversity of important plant species has provided us with a unique opportunity for comparative studies on the expansion and contraction of TF families. The expansion of regulatory protein numbers and interactions, as well as changes to their spatial and temporal expression, constitute part of the evolutionary process that has led to increasingly complex organisms.
The comparison of Marchantia TF genes to other sequenced plant genomes reveals the emergence of new TF families within Marchantia that have been preferentially retained and have particularly diversified in higher plants. Among these, TF families such as GRAS, LFY, LUG, NOZZLE, Tify and Trihelix play important roles in sexual plant reproduction. Liverworts therefore appear to be a critical lineage with respect to terrestrial trait development through the origin and diversification of TF genes regulating specialized functions in reproduction. The evolution of these TF families in Marchantia may allow the activation of gene expression during male/female reproductive organ formation and differentiation. However, two TF families present in lower plants and in green and red algae appear to terminate with Marchantia and were not inherited by higher plants.
This study identifies TF genes and provides a detailed analysis of TF gene expression as a means of understanding the impact of TF diversification on the evolution of liverworts and their importance in the origin of modern land plants from bryophytes to flowering plants. Thus, we have demonstrated the utility of short read sequence data to characterize TF-encoding transcripts using Marchantia as a basal lineage in the context of genetic change in a broad comparison of terrestrial plants with their charophytic and algal ancestors. Further chromosomal sequence analysis is expected to increase our knowledge of organism diversification and genome reorganization. In addition, the identification of cis- and trans-acting elements associated with plant TFs is expected to reveal additional mechanisms that regulate gene expression in a more tightly regulated genetic context. Future studies are expected to build on the current liverwort TF gene transcriptome through construction of a broader interactome (protein-protein interaction) and to elucidate the regulons controlled by each TF. The establishment of such a TF interactome within a fairly short time span is a feasible and important goal. Such an interactome will encompass TF-TF interactions as well as TF-DNA interactions and will highlight the underlying complexity of gene regulation in liverworts.
Methods
Plant material and growth conditions
Male and female M. polymorpha plants were collected from local wild colonies growing in nurseries in Melbourne, Australia. Male and female lines for RT-PCR and real-time PCR experiments were established from a single gemma of the thallus. Plants were maintained and propagated in a growth cabinet at 20°C under continuous white light (60 μmol photons m⁻² s⁻¹) and far-red (FR) light (730 nm). Tissues were collected for RNA sequencing from male and female vegetative thalli (VM and VF), immature male and female reproductive structures (antheridial and archegonial discs) - 2 mm in height (IMM and IMF), and mature male and female reproductive structures (antheridial and archegonial discs) > 2 mm in height (MM and MF), as shown in Additional file 1.
RNA sequencing and assembly
Total RNA was extracted from the male and female vegetative thalli and immature and mature reproductive gametophytic tissues of M. polymorpha (obtained from nurseries across Melbourne) using an RNeasy extraction kit (Qiagen, Australia), according to the manufacturer’s recommendations. RNA samples were quantified using a Nanodrop ND-1000 spectrophotometer (Biolabgroup, Australia). RNA sequencing was performed by the Beijing Genomics Institute (BGI), China. In total, six cDNA paired-end libraries were generated using the mRNA-Seq assay for transcriptome sequencing on the Illumina HiSeq™ 2000 platform.
Briefly, Oligo(dT) beads were used to isolate poly(A) mRNA from the total RNA preparations. The mRNA was fragmented into short fragments and, using these fragments as templates, random hexamer primers were used to synthesize first-strand cDNA. Second-strand cDNA was synthesized using dNTPs, RNaseH and DNA polymerase I. Short fragments were purified and resolved for end repair and poly(A) addition. The short fragments were then ligated to sequencing adapters, and suitable fragments were selected by agarose gel electrophoresis as templates for PCR amplification. Finally, the library was sequenced on the Illumina HiSeq™ 2000.
Raw sequence reads were filtered to remove low-quality reads and trimmed of 3′ adaptor sequences. All short read assemblies were performed using publicly available programs: Velvet (version 1.1.05; http://www.ebi.ac.uk/~zerbino/velvet/), developed for de novo short read assembly using de Bruijn graphs [32], and Oases (version 0.1.22; http://www.ebi.ac.uk/~zerbino/oases/), a de novo transcriptome assembler for very short reads [33]. After Velvet assembly, the resulting contigs were clustered into small groups (loci) using Oases to produce transcript isoforms. Various parameters of these programs (e.g., k-mer length, set to 49) were optimized, with reference to the N50 length, to obtain the best assembly results with our dataset.
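Because the N50 length is used above as an assembly metric, a minimal sketch of how it is conventionally computed may be helpful. This is an illustration only and not the authors' code; the function name n50 and the toy lengths are assumptions introduced here, and the values are not taken from the Marchantia assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or transcript lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy lengths for illustration only (hypothetical, not real assembly data).
example_lengths = [2, 3, 4, 5, 6]
example_n50 = n50(example_lengths)
```

In practice, such a function could be applied to the transcript lengths obtained at each candidate k-mer value in order to compare assemblies.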
Similarity search and identification of TF-encoding transcripts
For the identification of TF-encoding transcripts in M. polymorpha, all of the assembled transcripts were subjected to a homology search (BLASTX) against known transcription factors (TFs) and other transcriptional regulators (TRs), as classified in the Plant Transcription Factor Database (PlnTFDB; version 3.0; http://plntfdb.bio.uni-potsdam.de/v3.0/) [40, 108], with an E-value cut-off of 1e-05 and otherwise default parameters. PlnTFDB is an integrative database that provides complete sets of TFs and TRs for the plant species listed in it, all of which have completely sequenced and annotated genomes.
Protein sequences for all of the genes from the 20 species listed in PlnTFDB were downloaded from http://plntfdb.bio.uni-potsdam.de/v3.0/downloads.php; the file contained 29,473 sequences. This file served as the database for the local BLASTX search, and the query file contained all of the assembled Marchantia transcript sequences. The BLASTX results were parsed for their top hits using an in-house Python script, and putative M. polymorpha transcripts coding for TFs were thereby identified.
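The in-house script is not described further, so the following is only a sketch of how a top-hit filter of this kind might look. It assumes that the BLASTX results were written in the standard 12-column tabular format (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score); the function name top_hits and all identifiers in the toy input are hypothetical.

```python
def top_hits(tabular_text, evalue_cutoff=1e-5):
    """Keep, for each query, the highest-scoring hit that passes the cut-off.

    Assumes 12-column tab-separated BLAST output; this is an assumption,
    not a description of the authors' script.
    """
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > evalue_cutoff:
            continue  # hit is weaker than the E-value cut-off
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Toy input with hypothetical transcript and TF identifiers.
example = "\n".join([
    "\t".join(["tx1", "TF_A", "80.0", "100", "20", "0", "1", "300", "1", "100", "1e-30", "150"]),
    "\t".join(["tx1", "TF_B", "60.0", "100", "40", "0", "1", "300", "1", "100", "1e-10", "80"]),
    "\t".join(["tx2", "TF_C", "30.0", "50", "35", "0", "1", "150", "1", "50", "0.5", "25"]),
])
hits = top_hits(example)
```

Here tx1 retains only its best-scoring subject, while tx2 is discarded because its only hit exceeds the cut-off. Assigning each retained subject to a TF family would require a separate subject-to-family lookup, which is not shown.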
Comparison of TF-encoding genes in plants and their algal ancestors
In order to better understand the evolution of TFs, comparative studies of TF gene families were carried out across 21 algal and plant species: the 20 species listed in the Plant Transcription Factor Database (PlnTFDB) plus Marchantia (represented by its transcripts). We investigated TF gene evolution based on the phylogenetic positions of the plants listed in PlnTFDB and by comparing the number of genes coding for a particular TF family in the different plant and algal species considered. Comparative analysis was performed on the number of TF genes by highlighting similarities and differences in TF gene populations among these organisms. The percentages of identified TF genes relative to the total number of protein-encoding genes in the genome were also analyzed for all species. We took into account the events of emergence, halt, expansion and contraction of particular TF gene families by considering the number of genes/transcripts that encoded a specific TF in the various species.
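The comparison described above can be sketched as a presence/absence classification over per-species family counts. The sketch below is an illustration under stated assumptions, not the procedure actually used: the function names, category labels, species keys, family names (FAM_X, FAM_Y, FAM_Z) and all counts are hypothetical placeholders.

```python
def tf_percentage(n_tf_genes, n_protein_coding_genes):
    """Percentage of protein-coding genes that encode TFs."""
    return 100.0 * n_tf_genes / n_protein_coding_genes


def classify_family(counts, algae, early_land, higher_plants):
    """Label one TF family from its per-species gene counts.

    counts maps species name -> number of genes in the family
    (missing species are treated as zero).
    """
    def present(group):
        return [counts.get(species, 0) > 0 for species in group]

    in_algae = any(present(algae))
    if not in_algae and all(present(early_land)) and all(present(higher_plants)):
        return "emerged with land plants"
    if in_algae and any(present(early_land)) and not any(present(higher_plants)):
        return "halted before higher plants"
    return "other"


# Hypothetical species groups and toy counts, for illustration only.
ALGAE = ["red_alga", "green_alga"]
EARLY_LAND = ["liverwort", "moss"]
HIGHER = ["monocot", "dicot"]
toy_counts = {
    "FAM_X": {"liverwort": 3, "moss": 5, "monocot": 20, "dicot": 25},
    "FAM_Y": {"red_alga": 1, "green_alga": 2, "liverwort": 1},
    "FAM_Z": {"red_alga": 1, "green_alga": 1, "liverwort": 2,
              "moss": 2, "monocot": 9, "dicot": 12},
}
labels = {
    family: classify_family(counts, ALGAE, EARLY_LAND, HIGHER)
    for family, counts in toy_counts.items()
}
```

Expansion and contraction would additionally require comparing the counts themselves between lineages, which this presence/absence sketch does not attempt.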
Expression patterns of TF-encoding transcripts of M. polymorpha
We mapped all of the reads from the six libraries onto the non-redundant set of assembled transcripts using Bowtie [109], allowing up to 3 mismatches per read, to quantify transcript abundance. Transcript expression in each tissue was calculated using the RPKM (reads per kilobase per million reads) method [110], and an RPKM value was determined for each transcript in all six tissues. TF-encoding transcripts were quantified by the formula:
$$\mathrm{RPKM}(A) = \frac{10^{9}\, C}{N\, L}$$
where RPKM(A) is the expression of transcript A, C is the number of reads that uniquely aligned to transcript A, N is the total number of reads that are uniquely aligned to all transcripts and L is the number of bases on transcript A. The RPKM method eliminated the influence of different gene lengths and sequencing levels on the calculation of gene expression. Therefore, the calculated gene expression could be directly used to compare the difference in gene expression between samples.
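With C, N and L as defined above, the conventional RPKM calculation is 10^9 × C / (N × L). A minimal sketch follows; the function and argument names are assumptions introduced for illustration, and the example numbers are arbitrary rather than drawn from the dataset.

```python
def rpkm(read_count, total_mapped_reads, transcript_length):
    """Reads per kilobase of transcript per million mapped reads.

    read_count         -- C, reads uniquely aligned to the transcript
    total_mapped_reads -- N, reads uniquely aligned to all transcripts
    transcript_length  -- L, number of bases in the transcript
    """
    if total_mapped_reads <= 0 or transcript_length <= 0:
        raise ValueError("N and L must be positive")
    return 1e9 * read_count / (total_mapped_reads * transcript_length)


# Arbitrary illustrative values: 100 reads on a 1 kb transcript
# in a library of one million uniquely mapped reads.
example_rpkm = rpkm(100, 1_000_000, 1000)
```

Doubling the transcript length or the library size halves the value, which is the length and depth normalization described in the text.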
RT-PCR analysis
RT-PCR was carried out to detect transcripts that were expressed at specific stages, as revealed by the assembly and RPKM methods. One-step RT-PCR reactions (Superscript™ One step RT-PCR with Platinum® Taq, Invitrogen, Australia) were performed in 10 μl volumes using 20 ng of total RNA, according to the manufacturer’s instructions. cDNA synthesis was carried out for 45 min at 50°C. The amplification conditions were as follows: pre-denaturation for 2 min at 94°C, followed by 35 cycles of denaturation at 94°C for 15 s, annealing at 58°C for 30 s, and extension at 72°C for 1 min, followed by a final extension at 72°C for 5 min. PCR products were run on a 1% (w/v) agarose gel to confirm the size of the amplification products and to verify the presence of a unique PCR product. Total RNA used in the RT-PCR and real-time PCR experiments was extracted from clean cultures of Marchantia. These RNA preparations were entirely independent from the ones used in RNA sequencing. Two technical replicates were done for each of the nine transcripts. Primers suitable for amplification of each transcript were designed using an online tool from Invitrogen, OligoPerfect™ Designer (http://tools.invitrogen.com/content.cfm?pageid=9716). A list of primers used is given in Additional file 11.
Real-time RT-PCR analysis
Real-time PCR for selected TF-encoding transcripts was performed in duplicate using Brilliant III Ultra-fast SYBR QPCR Master mix (Agilent Technologies, Mulgrave, Victoria, Australia), according to the manufacturer’s instructions, with a 3-step PCR cycle. Quantitative expression differences between samples were estimated using cDNA from male and female vegetative, immature and mature reproductive stages, obtained using the Invitrogen Superscript™ III First strand cDNA synthesis kit according to the manufacturer’s instructions. After purification and measurement, ~50 ng of cDNA from each of the 6 developmental stages was used as template for real-time PCR analysis. PCR amplifications were performed on the MX3000P real-time PCR instrument (Agilent Technologies, Mulgrave, Victoria, Australia). The data generated were analysed using MxPro software. All experiments were performed with two technical replicates. The RNA preparations were pooled mixtures from several rounds of isolation for each sample and were entirely independent from the ones used in RNA sequencing; hence, the preparations themselves contained multiple biological replicates. The quantity of cDNA for each sample was calculated by the software in nanograms and plotted for the reference transcripts (the actin and CDPK genes and the transcript with uniform RPKM values in all six stages; Additional file 9). The starting concentration of each transcript in a sample was expressed relative to the starting concentration of the reference transcript. For each examined transcript, the ΔCt value between each tested sample and the reference gene was calculated and plotted. A list of primers used is given in Additional file 11.
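The exact calculation performed by the instrument software is not stated, so the sketch below shows only the conventional form of a ΔCt comparison against a reference transcript. It assumes that product doubles every cycle (an amplification factor of 2); the function names and Ct values are hypothetical.

```python
def delta_ct(ct_target, ct_reference):
    """Difference in threshold cycle between target and reference transcript."""
    return ct_target - ct_reference


def relative_quantity(ct_target, ct_reference, amplification_factor=2.0):
    """Target abundance relative to the reference transcript.

    amplification_factor=2.0 assumes perfect doubling per cycle,
    which is an assumption made for this sketch.
    """
    return amplification_factor ** (-delta_ct(ct_target, ct_reference))


# Hypothetical Ct values for one sample (not measured data).
example_delta = delta_ct(25.0, 22.0)
example_ratio = relative_quantity(25.0, 22.0)
```

A target that crosses the threshold three cycles after the reference is therefore estimated at one eighth of the reference abundance under this assumption.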
Statistical analysis
The data given in Additional file 2 were divided into 6 ranks according to the group of organisms analysed and were analysed with Statistical Analysis Software (SAS) version 9.2. To test whether the number of genes encoding TFs differs significantly among organisms (as grouped in ranks), all data were subjected to analysis of variance using PROC GLM of SAS. Sub-rank nested within rank (i.e. rank (sub-rank)), which refers to individual organisms within a rank, was used as the error term for the significance test of ranks. Data were log-transformed prior to analysis to meet the assumptions of homogeneous and normally distributed residuals. Pair-wise comparisons between ranks were made using Gabriel’s comparison intervals (95% confidence intervals). PROC NESTED (SAS) was then used to determine how variance was partitioned among the different sources of variation (i.e. rank, sub-rank, genes). Tukey’s Studentized Range (HSD) test also grouped the 6 ranks into groups A, B, C and D according to the similarity in the number of TF-encoding genes among organisms.
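The logic of testing ranks against organisms nested within ranks can be sketched in Python as follows. This is an illustrative re-implementation, not the SAS PROC GLM procedure used in the study: the function names and toy counts are hypothetical, the log10(count + 1) transform is an assumption chosen to tolerate zero counts, and the sums of squares are the simple sequential ones, which agree with a standard nested ANOVA only for balanced data.

```python
import math


def log_transform(data):
    """Apply log10(count + 1) to every count (ASSUMED form of the transform)."""
    return {rank: {org: [math.log10(c + 1) for c in counts]
                   for org, counts in orgs.items()}
            for rank, orgs in data.items()}


def nested_anova(data):
    """Nested ANOVA for values arranged as {rank: {organism: [values]}}.

    The F statistic for rank uses the mean square of organisms within
    ranks as its error term, mirroring the design described in the text.
    """
    all_values = [y for orgs in data.values() for ys in orgs.values() for y in ys]
    grand_mean = sum(all_values) / len(all_values)
    ss_rank = ss_sub = ss_err = 0.0
    n_orgs = 0
    for orgs in data.values():
        rank_values = [y for ys in orgs.values() for y in ys]
        rank_mean = sum(rank_values) / len(rank_values)
        ss_rank += len(rank_values) * (rank_mean - grand_mean) ** 2
        for ys in orgs.values():
            org_mean = sum(ys) / len(ys)
            ss_sub += len(ys) * (org_mean - rank_mean) ** 2
            ss_err += sum((y - org_mean) ** 2 for y in ys)
            n_orgs += 1
    df_rank = len(data) - 1
    df_sub = n_orgs - len(data)
    df_err = len(all_values) - n_orgs
    ms_rank, ms_sub, ms_err = ss_rank / df_rank, ss_sub / df_sub, ss_err / df_err
    return {
        "F_rank": ms_rank / ms_sub, "df_rank": (df_rank, df_sub),
        "F_subrank": ms_sub / ms_err, "df_subrank": (df_sub, df_err),
    }


# Hypothetical TF-family counts for two ranks with two organisms each.
toy_counts = {
    "rank_a": {"org_1": [2, 5, 9], "org_2": [3, 6, 12]},
    "rank_b": {"org_3": [20, 45, 80], "org_4": [25, 60, 95]},
}
print(nested_anova(log_transform(toy_counts)))
```

The returned F statistics would be compared against the F distribution with the listed degrees of freedom; the pair-wise comparisons and variance partitioning described above are not covered by this sketch.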
Availability of supporting data
The sequence datasets supporting the results of this article are available at NCBI in the Short Read Archive (SRA) database under accession number SRP029610.
References
Feller A, Machemer K, Braun EL, Grotewold E: Evolutionary and comparative analysis of MYB and bHLH plant transcription factors. Plant J. 2011, 66 (1): 94-116. 10.1111/j.1365-313X.2010.04459.x.
Lang D, Weiche B, Timmerhaus G, Richardt S, Riano-Pachon DM, Correa LG, Reski R, Mueller-Roeber B, Rensing SA: Genome-wide phylogenetic comparative analysis of plant transcriptional regulation: a timeline of loss, gain, expansion, and correlation with complexity. Genome Biol Evol. 2010, 2: 488-503. 10.1093/gbe/evq032.
Aceituno FF, Moseyko N, Rhee SY, Gutierrez RA: The rules of gene expression in plants: organ identity and gene body methylation are key factors for regulation of gene expression in Arabidopsis thaliana. BMC Genomics. 2008, 9: 438-10.1186/1471-2164-9-438.
Barrera LO, Ren B: The transcriptional regulatory code of eukaryotic cells–insights from genome-wide analysis of chromatin organization and transcription factor binding. Curr Opin Cell Biol. 2006, 18 (3): 291-298. 10.1016/j.ceb.2006.04.002.
Mochida K, Yoshida T, Sakurai T, Yamaguchi-Shinozaki K, Shinozaki K, Tran LS: In silico analysis of transcription factor repertoires and prediction of stress-responsive transcription factors from six major gramineae plants. DNA Res. 2011, 18 (5): 321-332. 10.1093/dnares/dsr019.
Airoldi CA: Determination of sexual organ development. Sex Plant Reprod. 2010, 23 (1): 53-62. 10.1007/s00497-009-0126-z.
Katagiri F, Chua NH: Plant transcription factors: present knowledge and future challenges. Trends Genet. 1992, 8 (1): 22-27. 10.1016/0168-9525(92)90020-5.
Shiu SH, Shih MC, Li WH: Transcription factor families have much higher expansion rates in plants than in animals. Plant Physiol. 2005, 139 (1): 18-26. 10.1104/pp.105.065110.
Carroll SB: Chance and necessity: the evolution of morphological complexity and diversity. Nature. 2001, 409 (6823): 1102-1109. 10.1038/35059227.
Gutierrez RA, Green PJ, Keegstra K, Ohlrogge JB: Phylogenetic profiling of the arabidopsis thaliana proteome: what proteins distinguish plants from other organisms?. Genome Biol. 2004, 5 (8): R53-10.1186/gb-2004-5-8-r53.
Hsia CC, McGinnis W: Evolution of transcription factor function. Curr Opin Genet Dev. 2003, 13 (2): 199-206. 10.1016/S0959-437X(03)00017-0.
Levine M, Tjian R: Transcription regulation and animal diversity. Nature. 2003, 424 (6945): 147-151. 10.1038/nature01763.
Libault M, Joshi T, Benedito VA, Xu D, Udvardi MK, Stacey G: Legume transcription factor genes: what makes legumes so special?. Plant Physiol. 2009, 151 (3): 991-1001. 10.1104/pp.109.144105.
Dias AP, Braun EL, McMullen MD, Grotewold E: Recently duplicated maize R2R3 Myb genes provide evidence for distinct mechanisms of evolutionary divergence after duplication. Plant Physiol. 2003, 131 (2): 610-620. 10.1104/pp.012047.
Lespinet O, Wolf YI, Koonin EV, Aravind L: The role of lineage-specific gene family expansion in the evolution of eukaryotes. Genome Res. 2002, 12 (7): 1048-1059. 10.1101/gr.174302.
Riechmann JL, Heard J, Martin G, Reuber L, Jiang C, Keddie J, Adam L, Pineda O, Ratcliffe OJ, Samaha RR, et al: Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes. Science. 2000, 290 (5499): 2105-2110. 10.1126/science.290.5499.2105.
Edger PP, Pires JC: Gene and genome duplications: the impact of dosage-sensitivity on the fate of nuclear genes. Chromosome Res. 2009, 17 (5): 699-717. 10.1007/s10577-009-9055-9.
Freeling M, Thomas BC: Gene-balanced duplications, like tetraploidy, provide predictable drive to increase morphological complexity. Genome Res. 2006, 16 (7): 805-814. 10.1101/gr.3681406.
Becker A, Winter KU, Meyer B, Saedler H, Theissen G: MADS-Box gene diversity in seed plants 300 million years ago. Mol Biol Evol. 2000, 17 (10): 1425-1434. 10.1093/oxfordjournals.molbev.a026243.
Chen K, Rajewsky N: The evolution of gene regulation by transcription factors and microRNAs. Nat Rev Genet. 2007, 8 (2): 93-103.
Inze D, De Veylder L: Cell cycle regulation in plant development. Annu Rev Genet. 2006, 40: 77-105. 10.1146/annurev.genet.40.110405.090431.
Qiu YL, Cho Y, Cox JC, Palmer JD: The gain of three mitochondrial introns identifies liverworts as the earliest land plants. Nature. 1998, 394 (6694): 671-674. 10.1038/29286.
Steinhauser S, Beckert S, Capesius I, Malek O, Knoop V: Plant mitochondrial RNA editing. J Mol Evol. 1999, 48 (3): 303-312. 10.1007/PL00006473.
Pruchner D, Beckert S, Muhle H, Knoop V: Divergent intron conservation in the mitochondrial nad2 gene: signatures for the three bryophyte classes (mosses, liverworts, and hornworts) and the lycophytes. J Mol Evol. 2002, 55 (3): 265-271. 10.1007/s00239-002-2324-2.
Groth-Malonek M, Pruchner D, Grewe F, Knoop V: Ancestors of trans-splicing mitochondrial introns support serial sister group relationships of hornworts and mosses with vascular plants. Mol Biol Evol. 2005, 22 (1): 117-125.
Forrest LL, Davis EC, Long DG, Crandall-Stotler BJ, Clark A, Hollingsworth ML: Unraveling the evolutionary history of the liverworts (marchantiophyta): multiple taxa, genomes and analyses. Bryologist. 2006, 109 (3): 303-334. 10.1639/0007-2745(2006)109[303:UTEHOT]2.0.CO;2.
He-Nygren X, Juslen A, Ahonen I, Glenny D, Piippo S: Illuminating the evolutionary history of liverworts (marchantiophyta) - towards a natural classification. Cladistics. 2006, 22 (1): 1-31. 10.1111/j.1096-0031.2006.00089.x.
Wellman CH, Osterloff PL, Mohiuddin U: Fragments of the earliest land plants. Nature. 2003, 425 (6955): 282-285. 10.1038/nature01884.
Bowman JL, Floyd SK, Sakakibara K: Green genes-comparative genomics of the green branch of life. Cell. 2007, 129 (2): 229-234. 10.1016/j.cell.2007.04.004.
Nishiyama R, Yamato KT, Miura K, Sakaida M, Okada S, Kono K, Takahama M, Sone T, Takenaka M, Fukuzawa H, et al: Comparison of expressed sequence tags from male and female sexual organs of marchantia polymorpha. DNA Res. 2000, 7 (3): 165-174. 10.1093/dnares/7.3.165.
Nagai J, Yamato KT, Sakaida M, Yoda H, Fukuzawa H, Ohyama K: Expressed sequence tags from immature female sexual organ of a liverwort, marchantia polymorpha. DNA Res. 1999, 6 (1): 1-11. 10.1093/dnares/6.1.1.
Zerbino DR, Birney E: Velvet: algorithms for de novo short read assembly using de bruijn graphs. Genome Res. 2008, 18 (5): 821-829. 10.1101/gr.074492.107.
Schulz MH, Zerbino DR, Vingron M, Birney E: Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics. 2012, 28 (8): 1086-1092. 10.1093/bioinformatics/bts094.
Garg R, Patel RK, Tyagi AK, Jain M: De novo assembly of chickpea transcriptome using short reads for gene discovery and marker identification. DNA Res. 2011, 18 (1): 53-63. 10.1093/dnares/dsq028.
Kohchi T, Ishizaki K, Fukuzawa H, Yamato KT: Toward functional and comparative genomics with the liverwort, marchantia polymorpha. Plant & Animal Genomes XVIII Conference: January 9–13, 2010. 2010, San Diego, CA: Town & Country Convention Center
He K, Guo AY, Gao G, Zhu QH, Liu XC, Zhang H, Chen X, Gu X, Luo J: Computational identification of plant transcription factors and the construction of the PlantTFDB database. Methods Mol Biol. 2010, 674: 351-368. 10.1007/978-1-60761-854-6_21.
Zhang H, Jin J, Tang L, Zhao Y, Gu X, Gao G, Luo J: PlantTFDB 2.0: update and improvement of the comprehensive plant transcription factor database. Nucleic Acids Res. 2011, 39 (Database issue): D1114-1117.
Guo AY, Chen X, Gao G, Zhang H, Zhu QH, Liu XC, Zhong YF, Gu X, He K, Luo J: PlantTFDB: a comprehensive plant transcription factor database. Nucleic Acids Res. 2008, 36 (Database issue): D966-969.
Rensing SA, Lang D, Zimmer AD, Terry A, Salamov A, Shapiro H, Nishiyama T, Perroud PF, Lindquist EA, Kamisugi Y, et al: The Physcomitrella genome reveals evolutionary insights into the conquest of land by plants. Science. 2008, 319 (5859): 64-69. 10.1126/science.1150646.
Perez-Rodriguez P, Riano-Pachon DM, Correa LG, Rensing SA, Kersten B, Mueller-Roeber B: PlnTFDB: updated content and new features of the plant transcription factor database. Nucleic Acids Res. 2010, 38 (Database issue): D822-827.
Pires ND, Dolan L: Morphological evolution in land plants: new designs with old genes. Philos Trans R Soc Lond B Biol Sci. 2012, 367 (1588): 508-518. 10.1098/rstb.2011.0252.
Misumi O, Matsuzaki M, Nozaki H, Miyagishima SY, Mori T, Nishida K, Yagisawa F, Yoshida Y, Kuroiwa H, Kuroiwa T: Cyanidioschyzon merolae genome. A tool for facilitating comparable studies on organelle biogenesis in photosynthetic eukaryotes. Plant Physiol. 2005, 137 (2): 567-585. 10.1104/pp.104.053991.
Weber AP, Oesterhelt C, Gross W, Brautigam A, Imboden LA, Krassovskaya I, Linka N, Truchina J, Schneidereit J, Voll H, et al: EST-analysis of the thermo-acidophilic red microalga galdieria sulphuraria reveals potential for lipid a biosynthesis and unveils the pathway of carbon export from rhodoplasts. Plant Mol Biol. 2004, 55 (1): 17-32. 10.1007/s11103-004-0376-y.
Slapeta J, Lopez-Garcia P, Moreira D: Global dispersal and ancient cryptic species in the smallest marine eukaryotes. Mol Biol Evol. 2006, 23 (1): 23-29.
Worden AZ, Lee JH, Mock T, Rouze P, Simmons MP, Aerts AL, Allen AE, Cuvelier ML, Derelle E, Everett MV, et al: Green evolution and dynamic adaptations revealed by genomes of the marine picoeukaryotes micromonas. Science. 2009, 324 (5924): 268-272. 10.1126/science.1167222.
Palenik B, Grimwood J, Aerts A, Rouze P, Salamov A, Putnam N, Dupont C, Jorgensen R, Derelle E, Rombauts S, et al: The tiny eukaryote ostreococcus provides genomic insights into the paradox of plankton speciation. Proc Natl Acad Sci U S A. 2007, 104 (18): 7705-7710. 10.1073/pnas.0611046104.
Derelle E, Ferraz C, Rombauts S, Rouze P, Worden AZ, Robbens S, Partensky F, Degroeve S, Echeynie S, Cooke R, et al: Genome analysis of the smallest free-living eukaryote ostreococcus tauri unveils many unique features. Proc Natl Acad Sci U S A. 2006, 103 (31): 11647-11652. 10.1073/pnas.0604795103.
Blanc G, Duncan G, Agarkova I, Borodovsky M, Gurnon J, Kuo A, Lindquist E, Lucas S, Pangilinan J, Polle J, et al: The Chlorella variabilis NC64A genome reveals adaptation to photosymbiosis, coevolution with viruses, and cryptic sex. Plant Cell. 2010, 22 (9): 2943-2955. 10.1105/tpc.110.076406.
Blanc G, Agarkova I, Grimwood J, Kuo A, Brueggeman A, Dunigan DD, Gurnon J, Ladunga I, Lindquist E, Lucas S, et al: The genome of the polar eukaryotic microalga coccomyxa subellipsoidea reveals traits of cold adaptation. Genome Biol. 2012, 13 (5): R39-10.1186/gb-2012-13-5-r39.
Banks JA, Nishiyama T, Hasebe M, Bowman JL, Gribskov M, dePamphilis C, Albert VA, Aono N, Aoyama T, Ambrose BA, et al: The Selaginella genome identifies genetic changes associated with the evolution of vascular plants. Science. 2011, 332 (6032): 960-963. 10.1126/science.1203810.
Yu J, Hu S, Wang J, Wong GK, Li S, Liu B, Deng Y, Dai L, Zhou Y, Zhang X, et al: A draft sequence of the rice genome (oryza sativa L. ssp. Indica). Science. 2002, 296 (5565): 79-92. 10.1126/science.1068037.
Goff SA, Ricke D, Lan TH, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P, Varma H, et al: A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science. 2002, 296 (5565): 92-100. 10.1126/science.1068275.
Hu TT, Pattyn P, Bakker EG, Cao J, Cheng JF, Clark RM, Fahlgren N, Fawcett JA, Grimwood J, Gundlach H, et al: The arabidopsis lyrata genome sequence and the basis of rapid genome size change. Nat Genet. 2011, 43 (5): 476-481. 10.1038/ng.807.
Bouche N, Scharlat A, Snedden W, Bouchez D, Fromm H: A novel family of calmodulin-binding transcription activators in multicellular organisms. J Biol Chem. 2002, 277 (24): 21851-21861. 10.1074/jbc.M200268200.
Winicov I: Alfin1 transcription factor overexpression enhances plant root growth under normal and saline conditions and improves salt tolerance in alfalfa. Planta. 2000, 210 (3): 416-422. 10.1007/PL00008150.
Guilfoyle TJ, Ulmasov T, Hagen G: The ARF family of transcription factors and their role in plant hormone-responsive transcription. Cell Mol Life Sci. 1998, 54 (7): 619-627. 10.1007/s000180050190.
Liscum E, Reed JW: Genetics of Aux/IAA and ARF action in plant growth and development. Plant Mol Biol. 2002, 49 (3–4): 387-400.
Solano R, Stepanova A, Chao Q, Ecker JR: Nuclear events in ethylene signaling: a transcriptional cascade mediated by ETHYLENE-INSENSITIVE3 and ETHYLENE-RESPONSE-FACTOR1. Genes Dev. 1998, 12 (23): 3703-3714. 10.1101/gad.12.23.3703.
Kim JH, Choi D, Kende H: The AtGRF family of putative transcription factors is involved in leaf and cotyledon growth in Arabidopsis. Plant J. 2003, 36 (1): 94-104. 10.1046/j.1365-313X.2003.01862.x.
Parcy F, Nilsson O, Busch MA, Lee I, Weigel D: A genetic framework for floral patterning. Nature. 1998, 395 (6702): 561-566. 10.1038/26903.
Husbands A, Bell EM, Shuai B, Smith HM, Springer PS: LATERAL ORGAN BOUNDARIES defines a new family of DNA-binding transcription factors and can interact with specific bHLH proteins. Nucleic Acids Res. 2007, 35 (19): 6663-6671. 10.1093/nar/gkm775.
Conner J, Liu Z: LEUNIG, a putative transcriptional corepressor that regulates AGAMOUS expression during flower development. Proc Natl Acad Sci U S A. 2000, 97 (23): 12902-12907. 10.1073/pnas.230352397.
Olsen AN, Ernst HA, Leggio LL, Skriver K: NAC transcription factors: structurally distinct, functionally diverse. Trends Plant Sci. 2005, 10 (2): 79-87. 10.1016/j.tplants.2004.12.010.
Schiefthaler U, Balasubramanian S, Sieber P, Chevalier D, Wisman E, Schneitz K: Molecular analysis of NOZZLE, a gene involved in pattern formation and early sporogenesis during sex organ development in Arabidopsis thaliana. Proc Natl Acad Sci U S A. 1999, 96 (20): 11664-11669. 10.1073/pnas.96.20.11664.
Hackbusch J, Richter K, Muller J, Salamini F, Uhrig JF: A central role of Arabidopsis thaliana ovate family proteins in networking and subcellular localization of 3-aa loop extension homeodomain proteins. Proc Natl Acad Sci U S A. 2005, 102 (13): 4908-4912. 10.1073/pnas.0501181102.
Nishii A, Takemura M, Fujita H, Shikata M, Yokota A, Kohchi T: Characterization of a novel gene encoding a putative single zinc-finger protein, ZIM, expressed during the reproductive phase in arabidopsis thaliana. Biosci Biotechnol Biochem. 2000, 64 (7): 1402-1409. 10.1271/bbb.64.1402.
Xie ZM, Zou HF, Lei G, Wei W, Zhou QY, Niu CF, Liao Y, Tian AG, Ma B, Zhang WK, et al: Soybean trihelix transcription factors GmGT-2A and GmGT-2B improve plant tolerance to abiotic stresses in transgenic arabidopsis. PLoS One. 2009, 4 (9): e6898-10.1371/journal.pone.0006898.
Breuer C, Kawamura A, Ichikawa T, Tominaga-Wada R, Wada T, Kondou Y, Muto S, Matsui M, Sugimoto K: The trihelix transcription factor GTL1 regulates ploidy-dependent cell growth in the Arabidopsis trichome. Plant Cell. 2009, 21 (8): 2307-2322. 10.1105/tpc.109.068387.
Gao MJ, Lydiate DJ, Li X, Lui H, Gjetvaj B, Hegedus DD, Rozwadowski K: Repression of seed maturation genes by a trihelix transcriptional repressor in arabidopsis seedlings. Plant Cell. 2009, 21 (1): 54-71. 10.1105/tpc.108.061309.
Brewer PB, Howles PA, Dorian K, Griffith ME, Ishida T, Kaplan-Levy RN, Kilinc A, Smyth DR: PETAL LOSS, a trihelix transcription factor gene, regulates perianth architecture in the arabidopsis flower. Development. 2004, 131 (16): 4035-4045. 10.1242/dev.01279.
Bolle C: The role of GRAS proteins in plant signal transduction and development. Planta. 2004, 218 (5): 683-692. 10.1007/s00425-004-1203-z.
Smalle J, Kurepa J, Haegman M, Gielen J, Van Montagu M, Van Der Straeten D: The trihelix DNA-binding motif in higher plants is not restricted to the transcription factors GT-1 and GT-2. Proc Natl Acad Sci U S A. 1998, 95 (6): 3318-3322. 10.1073/pnas.95.6.3318.
Bowman JL: Walkabout on the long branches of plant evolution. Curr Opin Plant Biol. 2013, 16 (1): 70-77. 10.1016/j.pbi.2012.10.001.
Kenrick P, Crane PR: The origin and early evolution of plants on land. Nature. 1997, 389 (6646): 33-39. 10.1038/37918.
Bork P, Doerks T, Springer TA, Snel B: Domains in plexins: links to integrins and transcription factors. Trends Biochem Sci. 1999, 24 (7): 261-263. 10.1016/S0968-0004(99)01416-4.
Duncan L, Nishii I, Harryman A, Buckley S, Howard A, Friedman NR, Miller SM: The VARL gene family and the evolutionary origins of the master cell-type regulatory gene, regA, in volvox carteri. J Mol Evol. 2007, 65 (1): 1-11. 10.1007/s00239-006-0225-5.
Timme RE, Delwiche CF: Uncovering the evolutionary origin of plant molecular processes: comparison of coleochaete (coleochaetales) and spirogyra (zygnematales) transcriptomes. BMC Plant Biol. 2010, 10: 96-10.1186/1471-2229-10-96.
Zobell O, Faigl W, Saedler H, Munster T: MIKC* MADS-box proteins: conserved regulators of the gametophytic generation of land plants. Mol Biol Evol. 2010, 27 (5): 1201-1211. 10.1093/molbev/msq005.
Tanabe Y, Hasebe M, Sekimoto H, Nishiyama T, Kitani M, Henschel K, Munster T, Theissen G, Nozaki H, Ito M: Characterization of MADS-box genes in charophycean green algae and its implication for the evolution of MADS-box genes. Proc Natl Acad Sci U S A. 2005, 102 (7): 2436-2441. 10.1073/pnas.0409860102.
Lee JH, Lin H, Joo S, Goodenough U: Early sexual origins of homeoprotein heterodimerization and evolution of the plant KNOX/BELL family. Cell. 2008, 133 (5): 829-840. 10.1016/j.cell.2008.04.028.
Floyd SK, Zalewski CS, Bowman JL: Evolution of class III homeodomain-leucine zipper genes in streptophytes. Genetics. 2006, 173 (1): 373-388. 10.1534/genetics.105.054239.
Ozsolak F, Milos PM: RNA sequencing: advances, challenges and opportunities. Nat Rev Genet. 2011, 12 (2): 87-98. 10.1038/nrg2934.
Bowman JL: The YABBY gene family and abaxial cell fate. Curr Opin Plant Biol. 2000, 3 (1): 17-22. 10.1016/S1369-5266(99)00035-7.
Byzova MV, Franken J, Aarts MG, de Almeida-Engler J, Engler G, Mariani C, Van Lookeren Campagne MM, Angenent GC: Arabidopsis STERILE APETALA, a multifunctional gene regulating inflorescence, flower, and ovule development. Genes Dev. 1999, 13 (8): 1002-1014. 10.1101/gad.13.8.1002.
Carles CC, Choffnes-Inada D, Reville K, Lertpiriyapong K, Fletcher JC: ULTRAPETALA1 Encodes a SAND domain putative transcriptional regulator that controls shoot and floral meristem activity in arabidopsis. Development. 2005, 132 (5): 897-911. 10.1242/dev.01642.
Mitsuda N, Hisabori T, Takeyasu K, Sato MH: VOZ; isolation and characterization of novel vascular plant transcription factors with a one-zinc finger from arabidopsis thaliana. Plant Cell Physiol. 2004, 45 (7): 845-854. 10.1093/pcp/pch101.
Doerks T, Copley R, Bork P: DDT – a novel domain in different transcription and chromosome remodeling factors. Trends Biochem Sci. 2001, 26 (3): 145-146. 10.1016/S0968-0004(00)01769-2.
Gardiner J, Donner TJ, Scarpella E: Simultaneous activation of SHR and ATHB8 expression defines switch to preprocambial cell state in arabidopsis leaf development. Dev Dyn. 2011, 240 (1): 261-270. 10.1002/dvdy.22516.
Lipsick JS: One billion years of Myb. Oncogene. 1996, 13 (2): 223-235.
Lynch M, Conery JS: The evolutionary fate and consequences of duplicate genes. Science. 2000, 290 (5494): 1151-1155. 10.1126/science.290.5494.1151.
Carretero-Paulet L, Galstyan A, Roig-Villanova I, Martinez-Garcia JF, Bilbao-Castro JR, Robertson DL: Genome-wide classification and evolutionary analysis of the bHLH family of transcription factors in arabidopsis, poplar, rice, moss, and algae. Plant Physiol. 2010, 153 (3): 1398-1412. 10.1104/pp.110.153593.
Pires N, Dolan L: Origin and diversification of basic-helix-loop-helix proteins in plants. Mol Biol Evol. 2010, 27 (4): 862-874. 10.1093/molbev/msp288.
Kranz H, Scholz K, Weisshaar B: c-MYB oncogene-like genes encoding three MYB repeats occur in all major plant lineages. Plant J. 2000, 21 (2): 231-235. 10.1046/j.1365-313x.2000.00666.x.
Rabinowicz PD, Braun EL, Wolfe AD, Bowen B, Grotewold E: Maize R2R3 Myb genes: sequence analysis reveals amplification in the higher plants. Genetics. 1999, 153 (1): 427-444.
Liu L, White MJ, MacRae TH: Transcription factors and their genes in higher plants functional domains, evolution and regulation. Eur J Biochem. 1999, 262 (2): 247-257. 10.1046/j.1432-1327.1999.00349.x.
Lin R, Ding L, Casola C, Ripoll DR, Feschotte C, Wang H: Transposase-derived transcription factors regulate light signaling in Arabidopsis. Science. 2007, 318 (5854): 1302-1305. 10.1126/science.1146281.
Fredericq H, De Greef J: Red (R), far-red (FR) photoreversible control of growth and chlorophyll content in light-grown thalli of marchantia polymorpha L. Naturwissenschaften. 1966, 53 (13): 337-
Wann FB: Some of the factors involved in the sexual reproduction of marchantia polymorpha. Am J Bot. 1925, 12 (6): 307-318. 10.2307/2435337.
Martin C, Paz-Ares J: MYB transcription factors in plants. Trends Genet. 1997, 13 (2): 67-73. 10.1016/S0168-9525(96)10049-4.
Purugganan MD, Rounsley SD, Schmidt RJ, Yanofsky MF: Molecular evolution of flower development: diversification of the plant MADS-box regulatory gene family. Genetics. 1995, 140 (1): 345-356.
Yamaguchi N, Wu MF, Winter CM, Berns MC, Nole-Wilson S, Yamaguchi A, Coupland G, Krizek BA, Wagner D: A molecular framework for auxin-mediated initiation of flower primordia. Dev Cell. 2013, 24 (3): 271-282. 10.1016/j.devcel.2012.12.017.
Buck MJ, Atchley WR: Phylogenetic analysis of plant basic helix-loop-helix proteins. J Mol Evol. 2003, 56 (6): 742-750. 10.1007/s00239-002-2449-3.
Eulgem T, Rushton PJ, Robatzek S, Somssich IE: The WRKY superfamily of plant transcription factors. Trends Plant Sci. 2000, 5 (5): 199-206. 10.1016/S1360-1385(00)01600-9.
Riechmann JL, Meyerowitz EM: The AP2/EREBP family of plant transcription factors. Biol Chem. 1998, 379 (6): 633-646.
Kappen C: The homeodomain: an ancient evolutionary motif in animals and plants. Comput Chem. 2000, 24 (1): 95-103.
Bienz M: The PHD finger, a nuclear protein-interaction domain. Trends Biochem Sci. 2006, 31 (1): 35-40. 10.1016/j.tibs.2005.11.001.
Marmorstein R: Structure of SET domain proteins: a new twist on histone methylation. Trends Biochem Sci. 2003, 28 (2): 59-62. 10.1016/S0968-0004(03)00007-0.
Riano-Pachon DM, Ruzicic S, Dreyer I, Mueller-Roeber B: PlnTFDB: an integrative plant transcription factor database. BMC Bioinforma. 2007, 8: 42-10.1186/1471-2105-8-42.
Langmead B, Trapnell C, Pop M, Salzberg SL: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009, 10 (3): R25-10.1186/gb-2009-10-3-r25.
Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008, 5 (7): 621-628. 10.1038/nmeth.1226.
Acknowledgements
NS acknowledges the receipt of the University of Melbourne postgraduate scholarship during the course of this study. We also thank Dr. Chol-hee Jung and Dr. Nuno D. Pires for valuable suggestions on the manuscript and Dr. Mohsen B. Mesgaran for helping with statistical analysis. We thank three anonymous reviewers whose comments greatly improved the manuscript. Special thanks go to Prof. Scott D. Russell for suggestions for the improvement of the final manuscript. Financial support from the Australian Research Council in the form of the ARC Discovery grants ARC DP0988972 and ARCDP1097262 is also gratefully acknowledged.
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
Conceived and designed the study: NS, PB, MS. Performed computational analysis and experiments: NS. Wrote the paper: NS, PB, MS. All authors have read and approved the manuscript.
Electronic supplementary material
12864_2013_5616_MOESM1_ESM.pptx
Additional file 1: Developmental stages of Marchantia polymorpha selected for RNA-Seq. VM (male vegetative thallus), VF (female vegetative thallus), IMM (immature reproductive male), IMF (immature reproductive female), MM (mature reproductive male) and MF (mature reproductive female). Immature male and female reproductive structures (antheridial and archegonial discs) – 2 mm in height and mature male and female reproductive structures (antheridial and archegonial discs) > 2 mm in height are taken into consideration for experimental purposes. (PPTX 519 KB)
12864_2013_5616_MOESM2_ESM.pdf
Additional file 2: Number of TF-encoding genes in the 20 organisms taken into consideration in the study. Based on published reports, the numbers of TF-encoding genes in 20 organisms are recorded in the table. Organisms are classified into broader categories: red algae, green algae, liverwort, moss, spike moss, monocots and dicots. Liverwort data are the result of our study. All these classes of organisms are grouped into 6 ranks for statistical analysis. Ranks are also displayed in the table. A bar graph of this distribution is shown in Figure 1. (PDF 40 KB)
12864_2013_5616_MOESM3_ESM.xlsx
Additional file 3: Transcription factor families and their potential function. 85 TF families and their functions as listed in the Plant Transcription Factor Database (http://plntfdb.bio.uni-potsdam.de/v3.0/). (XLSX 14 KB)
12864_2013_5616_MOESM4_ESM.xlsx
Additional file 4: 24 TF families that evolved with Marchantia. Functions of the 24 TF families that arose with the emergence of Marchantia. TF families highlighted in yellow play a role in sexual reproduction. (XLSX 15 KB)
12864_2013_5616_MOESM5_ESM.xlsx
Additional file 5: RPKM values of Marchantia transcripts encoding TFs in 6 developmental tissues. Sheet 1: List of 3,471 transcripts with details of their length and RPKM values in six developmental tissues in Marchantia. Sheet 2: List of 96 TF-encoding transcripts specifically expressed in only one tissue. (XLSX 374 KB)
12864_2013_5616_MOESM6_ESM.xlsx
Additional file 6: List of transcripts expressing specifically in vegetative and reproductive stages. List of transcripts with details of their length, TF family they are encoding and RPKM values in six developmental tissues in Marchantia. (XLSX 31 KB)
12864_2013_5616_MOESM7_ESM.xlsx
Additional file 7: Table showing nature of correlation coefficients between different gene pairs in each TF family. List of number of gene pairs with significant positive, negative correlation and no significant correlation between expression patterns of genes in each TF family. (XLSX 195 KB)
12864_2013_5616_MOESM8_ESM.xlsx
Additional file 8: Top hits of BLASTX of TF encoding transcripts against nr database. List of transcripts with their top hits when blasted against NCBI nr database. (XLSX 2 MB)
12864_2013_5616_MOESM9_ESM.pptx
Additional file 9: Expression profiles of Actin and CDPK genes of Marchantia and the reference transcript. In qPCR analysis, the quantity of cDNA in each sample was calculated in nanograms by the MxPro software and plotted for the reference transcripts (actin and CDPK) and for the transcript with uniform expression in all six stages. (PPTX 64 KB)
12864_2013_5616_MOESM10_ESM.docx
Additional file 10: Statistical results showing analysis of variance. Organisms under consideration are divided into 6 ranks: red algae – rank 1, green algae – rank 2, liverworts – rank 3, moss and spike moss – rank 4, monocots – rank 5 and dicots – rank 6. All data on TF-encoding transcripts were then analysed in SAS to test whether the number of genes encoding TFs differs significantly among organisms (as grouped in ranks). The Tukey’s Studentized Range (HSD) test results also show the grouping of ranks 1, 2, 3, 4, 5 and 6 into 4 groups, A, B, C and D, on the basis of differences in means. Higher plants (monocots (5) and dicots (6)) are grouped together as A; red algae and green algae are grouped together as D. Liverworts and mosses form the separate groups B and C, respectively, between the two extreme groups A and D. (DOCX 16 KB)
12864_2013_5616_MOESM11_ESM.xlsx
Additional file 11: List of primer sequences used for RT-PCR and real-time PCR experiments. Forward and Reverse primer sequences used in PCR. (XLSX 13 KB)
Rights and permissions
Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Sharma, N., Bhalla, P.L. & Singh, M.B. Transcriptome-wide profiling and expression analysis of transcription factor families in a liverwort, Marchantia polymorpha. BMC Genomics 14, 915 (2013). https://doi.org/10.1186/1471-2164-14-915