Background

Multigene families govern the growth and development of plants. A DNA-binding domain, a nuclear localization signal, a transcription activation domain, and an oligomerization site are the four primary domains that transcription factors typically contain and that play a significant role in the regulation of gene transcription [1]. Through the activation or suppression of the transcriptional process, these four domains cooperate to regulate a wide range of aspects of plant development and growth [2]. Analysis of each transcription factor’s distinct roles is complicated by the fact that multigene families frequently encode transcription factors [1]. The MYB transcription factors of higher plants are more widely distributed across the genome than those of fungi and mammals [3].

MYB transcription factors are made up of two distinct regions: an N-terminal conserved MYB DNA-binding domain and a C-terminal variable modulator region that regulates protein activity. The N-terminal MYB domain is highly conserved in plants, and the proteins typically contain one to four repeats, namely R1, R2, R3, and R4. Each repeat contains 50–53 amino acids that form three α-helices, of which the second and third form the helix-turn-helix (HTH) structure [4]. The third α-helix interacts with the major groove of DNA and forms the DNA recognition site of the transcription factor [5]. A set of highly conserved tryptophan (W) residues in the MYB domain is involved in sequence-specific DNA binding [6]. In contrast, the C-terminal region of MYB proteins is highly variable, which contributes to the wide range of regulatory roles played by the MYB gene family [7,8,9]. R2R3-MYB proteins are the predominant form in higher plants [10]. Plant MYB transcription factors have been shown to play a role in many aspects of plant growth and development, including the regulation of primary and secondary metabolism, control of the cell cycle, and responses to abiotic and biotic stresses [11, 12].

Guava (Psidium guajava L.; PG) is a significant fruit crop in tropical and subtropical regions of the world. Guava has 2n = 22 chromosomes and a genome size of over 450 Mb. It belongs to the family Myrtaceae, which comprises about 150 species [13]. Guava is mostly produced in India, Mexico, Pakistan, Taiwan, Thailand, Colombia, and Indonesia, with small-scale plantations also operating in Malaysia, Australia, and South Africa [14]. Approximately 400 guava varieties are produced worldwide, each with a distinct fruit pulp and peel colour. When a fruit reaches maturity, the peel changes from green to yellow or red, and the fruit pulp can range from white to deep pink. This trait varies between cultivars and depends on the climate [15].

Consumers typically choose visually appealing, colorful fruits because of their attractiveness and better nutritional characteristics. The color of fruits and vegetables is regulated by genes involved in secondary metabolic pathways, particularly phenylalanine ammonia-lyase (PAL), anthocyanidin synthase (ANS), dihydroflavonol 4-reductase (DFR), chalcone synthase (CHS), flavonol synthase/flavanone 3-hydroxylase (F3H), and UDP-glucose:flavonoid 3-O-glucosyltransferase (UFGT), together with regulators such as MYB transcription factors, basic helix-loop-helix (bHLH) proteins, and tryptophan-aspartic acid repeat proteins, among others [16,17,18,19,20,21].

Although Psidium guajava L. (PG) is a lucrative economic crop, it suffers huge losses due to wilt susceptibility. The specific etiology of wilt is unknown, but it has been linked to pathogens including Fusarium oxysporum, Fusarium solani, Rhizoctonia bataticola, Macrophomina phaseoli, Gliocladium roseum, and Cephalosporium sp. To combat wilt, the ICAR-Central Institute for Subtropical Horticulture in Lucknow has developed an interspecific hybrid guava rootstock, Psidium guajava × Psidium molle (PGPM). The rootstock has been shown to be wilt-resistant and has been grafted successfully with commercial guava varieties [22, 23]. In light of the foregoing, it was hypothesized that MYB genes play a crucial role in determining the peel and pulp colour of guava fruit in PG as well as wilt resistance in PGPM. The first step toward improvement through systematic breeding or genome editing is the identification of MYB candidate genes and the pathways they control [24, 25]. The scarcity of genomic data for guava presents a significant challenge for genetic studies. However, NGS techniques and bioinformatics pipelines are aiding the development of genomic and transcriptomic resources and tools. Although physiological studies have been conducted, no attempt has been made to describe the MYB genes implicated in guava development. The objective of this study was to identify MYB genes by in silico analysis of guava root transcriptome data and to evaluate their expression in guava fruit pulp, root, and seed to gain insight into their function.

Materials and methods

Transcriptome sequencing and in silico analysis

The transcriptome of PGPM was sequenced on the Illumina platform using paired-end sequencing. Libraries were prepared with the Illumina TruSeq Stranded Total RNA Library Preparation Kit following the manufacturer's instructions, using less than 1 µg of input total RNA. High-quality reads were assembled de novo using Trinity software with default parameters. The transcripts were further processed with CD-HIT software for the prediction of unigenes. BLAST analysis identified more than 35,000 CDS with precise gene annotations. To determine the chromosomal placement and location of the MYB CDS, nucleotide sequences were analyzed by nucleotide BLAST against mango genome data at the National Center for Biotechnology Information (NCBI). The nucleotide CDS were converted into amino acid sequences using the online Sequence Manipulation Suite (https://www.bioinformatics.org/sms2/). Motifs were identified with the MOTIF Search tool (https://www.genome.jp/tools/motif/). Multiple sequence alignments of the MYB proteins derived from the transcriptome CDS were carried out with the MUSCLE tool (https://www.ebi.ac.uk/Tools/msa/muscle/). The WebLogo tool (https://weblogo.berkeley.edu) was used to prepare sequence logos for the R2 and R3 MYB domains. Gene expression analysis of the mined MYB family genes was carried out using commonly occurring CDS (based on common NR BLAST hit accessions), with Psidium guajava (PG, wilt-susceptible) taken as the control and the hybrid Psidium guajava × Psidium molle (PGPM, wilt-resistant) as the treated sample. Gene Ontology (GO) and subcellular localization analyses of the mined MYB family genes were performed using the online web server BUSCA (Bologna Unified Subcellular Component Annotator) (https://busca.biocomp.unibo.it/).
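
The CDS-to-protein conversion step can be illustrated with a minimal Python sketch that translates a coding sequence using the standard genetic code. This is not the Sequence Manipulation Suite implementation. The function name translate_cds and the example sequence are hypothetical, and the sketch assumes that the CDS starts in frame and that translation ends at the first stop codon.

```python
# Standard genetic code, written in the conventional T, C, A, G ordering
# of first, second, and third codon positions.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_cds(cds: str) -> str:
    """Translate an in-frame CDS; stop at the first stop codon.

    Codons containing ambiguous bases (e.g. N) are reported as 'X'.
    Trailing bases that do not complete a codon are ignored.
    """
    seq = cds.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(seq) - len(seq) % 3, 3):
        residue = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if residue == "*":  # stop codon ends translation
            break
        protein.append(residue)
    return "".join(protein)


if __name__ == "__main__":
    # Hypothetical toy CDS, not taken from the guava transcriptome.
    print(translate_cds("ATGTGGAAATAA"))
```

In practice each unigene CDS would be read from a FASTA file and translated in the annotated frame before domain and motif analysis.
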
The phylogenetic position of the mined MYB family genes with respect to the reference MYB gene of Arabidopsis thaliana (AAD46772) was determined using MEGA version 5.2 software. The protein sequences of the mined MYB family genes were aligned with the built-in MUSCLE alignment tool of MEGA version 5.2. The evolutionary history was inferred using the Maximum Likelihood method based on the Poisson correction model with 500 bootstrap replications. Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using a JTT model, and then selecting the topology with the superior log likelihood value [26].
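
The Poisson correction model assumes that all amino acid sites evolve at the same rate. In its distance form, the observed proportion of differing sites p between two aligned sequences is converted to an estimated number of substitutions per site, d = -ln(1 - p). The following Python sketch illustrates only this relationship; it is not the likelihood computation performed by MEGA. The function names are hypothetical, and gapped positions are dropped pairwise here, whereas the analysis described above eliminated all positions containing gaps across the full alignment.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap ('-') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    differences = sum(1 for a, b in pairs if a != b)
    return differences / len(pairs)


def poisson_distance(seq_a: str, seq_b: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p), in substitutions per site."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        raise ValueError("sequences differ at every site; distance undefined")
    return -math.log(1.0 - p)


if __name__ == "__main__":
    # Hypothetical toy alignment; one difference in four sites gives p = 0.25.
    print(round(poisson_distance("AAAA", "AAAT"), 4))
```

The correction matters more as sequences diverge, because multiple substitutions at the same site make the raw proportion p underestimate the true number of changes.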

Semi-quantitative RT-PCR analysis

Frozen guava samples of Lalit (pulp, seed, and root) and Shweta (pulp) were pulverized to a fine powder in liquid nitrogen using a mortar and pestle. Total RNA was isolated using the Spectrum™ Plant Total RNA Kit (Sigma, USA) following the manufacturer's instructions. A NanoDrop was used to assess the purity and quantity of total RNA. For each sample, 2 µg of total RNA was reverse transcribed using the Maxima First Strand cDNA Synthesis Kit for RT-qPCR (Genetix) with oligo(dT) and random primers according to the manufacturer's guidelines. The synthesized cDNA was diluted 1:5 in nuclease-free water before PCR amplification. Primers (Table 1) were designed from the CDS acquired from fruit transcriptome data using IDT PrimerQuest software with the following parameters: an optimal length of 25 base pairs, a GC content of 50–55%, a melting temperature of 57 °C, and an amplicon length of 100–200 base pairs. OligoAnalyzer was used to check for stable hairpins and dimers. To confirm the specificity for each gene in silico, each primer pair was then aligned against all guava CDS. PCR amplification was performed in a total volume of 10 µl containing 100 ng of cDNA, 0.5 µM of each primer, 2.5 mM dNTPs, 1 unit of Taq DNA polymerase, and 1 × PCR buffer. The reaction was run for 35 cycles in a Bio-Rad thermal cycler, with denaturation at 94 °C for 30 s, annealing at 57 °C for 30 s, and extension at 72 °C for 30 s, followed by a final extension at 72 °C for 5 min. PCR-amplified products were resolved on a 2.0% agarose gel prepared with 1 × TBE buffer and 0.5 µg of ethidium bromide and visualized on a transilluminator.
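
The primer design criteria above can be expressed as a simple screening check. The following Python sketch is an illustration only and does not reproduce IDT PrimerQuest or OligoAnalyzer: the function names and the example primer are hypothetical (the primer is not from Table 1). Melting temperature is deliberately not computed, because its value depends on the thermodynamic model and salt conditions used by the design software.

```python
def gc_content(primer: str) -> float:
    """Return the GC content of a primer as a percentage."""
    seq = primer.upper()
    if not seq:
        raise ValueError("primer sequence is empty")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def check_primer(primer: str, amplicon_len: int,
                 optimal_len: int = 25,
                 gc_range: tuple = (50.0, 55.0),
                 amplicon_range: tuple = (100, 200)) -> dict:
    """Compare one primer and its amplicon length with the stated criteria.

    Defaults mirror the parameters given in the text: optimal length 25 bp,
    GC content 50-55%, amplicon length 100-200 bp.
    """
    gc = gc_content(primer)
    return {
        # Signed difference from the optimal primer length, in bases.
        "length_offset": len(primer) - optimal_len,
        "gc_ok": gc_range[0] <= gc <= gc_range[1],
        "amplicon_ok": amplicon_range[0] <= amplicon_len <= amplicon_range[1],
    }


if __name__ == "__main__":
    # Hypothetical 25-mer with 13 G/C bases (52% GC), for illustration only.
    example = "ATGGCTCGTACCGATCAGTTGCAGA"
    print(gc_content(example), check_primer(example, amplicon_len=150))
```

A check of this kind only screens candidates; hairpin, dimer, and specificity checks still require the dedicated tools named above.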

Table 1 Primers used in semi-quantitative RT-PCR analysis

Results and discussion

In this study, 2.64 GB of data were generated from root transcriptome analysis of PGPM. De novo assembly of the high-quality reads with Trinity software identified 170,027 transcripts with a maximum length of 962 bp. The top-hit species distribution showed that Eucalyptus grandis received the majority of hits. We mined 15 different MYB transcription factor genes/transcripts (MYB3, MYB4, MYB5, MYB6, MYB23, MYB44, MYB46, MYB51, MYB86, MYB82, MYB90, MYB114, MYB305, MYB308, and MYB330) from the root transcriptome data of guava. The phylogenetic position of the mined MYB family genes with respect to the reference MYB gene of Arabidopsis thaliana (AAD46772) is depicted in Fig. 1. Phylogenetic analysis grouped the mined MYB family genes into four clusters.

Fig. 1

Molecular phylogenetic analysis by maximum likelihood method. The tree with the highest log likelihood (-2952.9718) is shown. The percentage of trees in which the associated taxa clustered together is shown next to the branches. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. The analysis involved 16 amino acid sequences. All positions containing gaps and missing data were eliminated. There were a total of 65 positions in the final dataset

Domain analysis of the coding sequences revealed that all mined MYB genes possess the conserved R2-MYB and R3-MYB domains, and the genes were found to be unevenly distributed across the guava chromosomes (Fig. 2; Table 2). WebLogo was used to build sequence logos showing conservation at specific positions. The conserved amino acids shared by the MYB domains were strikingly similar, as seen in Figs. 3 and 4. According to these findings, the R2 and R3 repeats of the guava MYB proteins contain a large number of conserved amino acids, including the characteristic Trp (W) residues. The R2 repeat contains three conserved Trp (W) residues. In the R3 repeat, only the second and third Trp (W) residues were conserved, while the first was frequently replaced with phenylalanine (F) or isoleucine (I) (Figs. 3 and 4). Substitution at the first Trp (W) residue may lead to the recognition of new target genes and/or reduce DNA-binding activity against target genes. The C-terminal domain is more variable, whereas the N-terminal domain is more conserved [6, 27]. Gene Ontology (GO) and subcellular localization analysis of the identified MYB family genes indicated that MYB5, MYB3, MYB1, MYB308, MYB51, MYB86, MYB4, MYB46, MYB23, MYB330, MYB114, MYB305, and MYB44 were localized in the nucleus, whereas MYB90 and MYB5 were assigned to the chloroplast and the endomembrane system, respectively (Table 3 and supplementary data).
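
The conservation pattern described above, with some alignment columns fixed as Trp (W) and others variable, can be checked programmatically from a multiple sequence alignment. The following Python sketch is a simplified stand-in for what a sequence logo conveys visually. The function names, the 0.8 threshold, and the toy alignment are hypothetical; the toy sequences are not the guava R3 repeats shown in Fig. 3.

```python
from collections import Counter


def column_consensus(aligned, min_fraction=0.8):
    """Return {column index: residue} for conserved alignment columns.

    A column counts as conserved when its most common residue (gaps
    excluded) occurs in at least min_fraction of all sequences.
    """
    lengths = {len(seq) for seq in aligned}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    conserved = {}
    for col in range(lengths.pop()):
        counts = Counter(seq[col] for seq in aligned if seq[col] != "-")
        if not counts:
            continue  # column contains only gaps
        residue, n = counts.most_common(1)[0]
        if n / len(aligned) >= min_fraction:
            conserved[col] = residue
    return conserved


def tryptophan_columns(aligned, min_fraction=0.8):
    """Return the 0-based indices of columns conserved as Trp (W)."""
    consensus = column_consensus(aligned, min_fraction)
    return [col for col, residue in consensus.items() if residue == "W"]


if __name__ == "__main__":
    # Hypothetical toy alignment: the first column varies between F and I,
    # while the fifth column is W in every sequence.
    toy = ["FTEEWKR", "ITPEWRK", "FSDEWQR"]
    print(column_consensus(toy), tryptophan_columns(toy))
```

Applied to the R3 alignment, a check of this kind would be expected to flag the second and third Trp positions but not the first, where F or I frequently occurs.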

Fig. 2

Distribution of MYB protein domains and motifs in different CDS of guava

Table 2 Chromosomal location and nucleotide positioning of different CDS sequences having R2–R3 MYB domain

Fig. 3

Multiple sequence alignments of MYB proteins. a R2MYB. b R3MYB of guava

Fig. 4

Logo sequences for a R2MYB, b R3MYB domain of guava MYB proteins prepared from Weblogo

Table 3 Gene ontology (GO) and Subcellular localization analysis of the identified MYB family genes

The expression of MYB3, MYB4, MYB23, MYB86, MYB90, and MYB308 was also evaluated in Lalit fruit pulp (red pulp), seed, and root, and in Shweta pulp (white pulp) (Fig. 5). MYB23 and MYB308 were expressed only in root tissue, suggesting that they play a role in root development. MYB4 was expressed in pulp and root tissue (Fig. 5). MYB86 and MYB90 expression was detected in all tissues, including root, seed, and pulp, suggesting that they play a role in the development of fruits and roots (Fig. 5). MYB3 was expressed only in the fruit pulp, suggesting a function in fruit ripening (Fig. 5). The heatmap analysis also showed that, among the mined genes, only MYB23 had significantly higher expression in wilt-resistant PGPM than in wilt-susceptible Psidium guajava (Fig. 6). This result indicates that higher expression of MYB23, which was expressed only in root, might play a crucial role in conferring wilt resistance in PGPM.
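
The tissue expression pattern reported above can be summarized as a presence/absence table and queried for tissue-specific genes. The following Python sketch encodes only the qualitative pattern stated in the text. The function names are hypothetical, and Lalit and Shweta pulp are collapsed into a single "pulp" category as a simplifying assumption, because the text does not report them separately.

```python
# Presence/absence of expression by tissue, as described in the text.
EXPRESSION = {
    "MYB3": {"pulp"},
    "MYB4": {"pulp", "root"},
    "MYB23": {"root"},
    "MYB86": {"pulp", "seed", "root"},
    "MYB90": {"pulp", "seed", "root"},
    "MYB308": {"root"},
}


def tissue_specific(expression, tissue):
    """Return genes expressed in the given tissue and in no other."""
    return sorted(gene for gene, tissues in expression.items()
                  if tissues == {tissue})


def expressed_everywhere(expression):
    """Return genes expressed in every tissue present in the table."""
    all_tissues = set().union(*expression.values())
    return sorted(gene for gene, tissues in expression.items()
                  if tissues == all_tissues)


if __name__ == "__main__":
    print("root-specific:", tissue_specific(EXPRESSION, "root"))
    print("pulp-specific:", tissue_specific(EXPRESSION, "pulp"))
    print("all tissues:", expressed_everywhere(EXPRESSION))
```

Because semi-quantitative RT-PCR is scored from gel bands, a presence/absence encoding of this kind captures the reported pattern but not differences in band intensity.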

Fig. 5

Semi-quantitative RT-PCR analysis of different MYB in root, pulp, and seed of guava

Fig. 6

Heatmap of the 15 mined MYB genes showing upregulation and downregulation. Scaling is based on the base mean value; dark red represents upregulated genes, whereas green represents downregulated genes. MYB family genes of PG (Psidium guajava) were taken as the control and those of PGPM (hybrid Psidium guajava × Psidium molle) as the treated sample

Gene functions can be better understood by understanding the patterns of gene expression. Recent research has revealed that the pear’s skin, bud, and fruit express MYB genes [28, 43]. The MYB gene is responsible for controlling the synthesis of anthocyanins and proanthocyanidins, which affects plant structure and seed colour, respectively [44, 45].

Conclusion

It has been established that members of the MYB gene family play many regulatory roles in controlling how plants respond to diverse biotic and abiotic stresses. In the present study, root transcriptome analysis of the Psidium guajava × Psidium molle hybrid generated 2.64 GB of data, and in silico mining, motif analysis, and expression analysis of guava MYBs were carried out. A total of 15 MYB family members were identified. They were unevenly distributed among the guava chromosomes, probably as a result of gene duplication. Among the mined genes, only MYB23, which was expressed only in root, showed significantly higher expression in wilt-resistant PGPM, indicating that it might play a crucial role in conferring wilt resistance in PGPM. Furthermore, the expression patterns of some MYB genes suggest that they might participate in the regulation of fruit ripening and of seed and root development. The identification and study of MYB genes may help improve guava fruit quality in the future. The insights from these findings may also support future functional studies of the MYB genes to clarify their biological functions in guava.