Background

After rice and maize, cassava (Manihot esculenta Crantz) is the third most widely cultivated crop in the world [1]. Apart from providing food, cassava is also used in the production of starch-based products, animal feed, ethanol, and biofuel. In Southeast Asia [2], cassava is grown as an industrial crop by more than 2 million households. In Thailand alone, 1.28 million hectares of farmland are used for the cultivation of this crop, providing work for more than 600,000 families in 50 provinces. Typically, the country produces 28–30 million tons of cassava annually, and demand can exceed 40 million tons [3]. Cultivar Kasetsart 50 (KU50) is one of the most important cassava cultivars in the world and is cultivated in several Asian countries. In 2009, the adoption of KU50 reached its peak; farmers grew this cultivar on 718,400 hectares of land, 52% of which was in Thailand. Its excellent yield, high starch content, good germination, and vigorous growth explain the increasing popularity of this cultivar. It also adapts well to different agro-ecological zones [4, 5]. Unfortunately, in recent years the production of this valuable crop has been affected by pests and pathogens, reducing productivity.

Cassava mosaic disease (CMD) is the most important disease of cassava in Africa and Asia [6]. CMD first emerged in Southeast Asia (SEA) in 2015 [2, 7]. Currently, most CMD outbreaks are reported in mainland SEA, including Cambodia, Vietnam, Thailand, and Laos [7,8,9,10,11]. Uke et al. reported that cassava root yield decreased by 16–33% and starch content by 22–28% after infected cuttings were planted in Vietnam [10]. To date, all CMD outbreaks in SEA have been caused by a single viral species, Sri Lankan cassava mosaic virus (SLCMV) [12]. Although outbreaks have affected SEA for 8 years, there are currently no commercially available SLCMV-resistant cassava cultivars. Nonetheless, field trials and greenhouse testing indicated that KU50 showed some tolerance to CMD, with only mildly symptomatic infections and slow disease progression, resulting in better preserved yields compared to other commercial cassava varieties [13, 14]. Consequently, the Department of Agriculture Extension, Thailand, has approved KU50 as an acceptable substitute for susceptible cassava varieties until more CMD-resistant varieties are developed. SLCMV is a member of the cassava mosaic geminiviruses (family Geminiviridae, genus Begomovirus). It is one of eleven cassava mosaic geminivirus species [15] that are primarily transmitted by a whitefly vector (Bemisia tabaci), although transmission also occurs through vegetative propagation, when infected stems are planted [16]. It appears that the practice of vegetative cassava propagation is the main factor in disease spread in SEA.

Proteomics studies analyze proteins after post-translational modifications (PTMs), providing vital insight into biological functions [17]. PTMs are crucial for plant growth, development, and responses to biotic and abiotic stresses. As PTMs cannot be detected by genome sequencing or transcriptional analysis, proteomics has great utility in understanding the importance of these changes [18,25].

Because cassava is mostly propagated vegetatively in Thailand, the use of potentially infected stems is the primary cause of CMD spread, and farmers also use this approach to replace other cassava cultivars with KU50. Despite previous proteomics studies investigating cassava leaves infected with Indian cassava mosaic virus (ICMV) and SLCMV, there are no available data on how infected KU50 plants respond to the virus. The objective of this study is to compare the proteomes of healthy and infected KU50 leaves to identify changes induced by SLCMV infection. By identifying proteins that are differentially expressed during infection, this study contributes to our understanding of the molecular mechanisms involved in the pathogenesis of SLCMV infection in cassava.

Materials and methods

Sample collection

Healthy and SLCMV-infected cassava cv. KU50 stems were vegetatively propagated. The planting materials were obtained from the Thai Tapioca Development Institute (TTDI), Nakhon Ratchasima, Thailand. The stems were cut into 15-cm lengths, containing 3–4 buds each, and planted in 20-cm diameter plastic pots in the greenhouse of the Department of Plant Pathology, Faculty of Kasetsart University, Thailand. After 45 days of cultivation, the apex of healthy and infected stems was collected, pooling leaves from three plants into one sample. To stop all metabolic activity, the leaves were immediately flash frozen in liquid nitrogen, and the frozen samples were stored at -80 °C until protein and nucleic acid extraction.

SLCMV detection by PCR

DNA was extracted from the frozen cassava leaves using the CTAB method [26]. Cassava tissues were ground to a powder in liquid nitrogen and 700 µL CTAB buffer was added to 200 mg of ground leaf powder. This suspension was incubated at 65 °C for 30 min, then 700 μL chloroform:isoamyl alcohol mixture (24:1) was added to the tubes. DNA was precipitated by the addition of 700 μL isopropanol, and the tubes were incubated at -20 °C for 3 h. The pelleted DNA was washed twice in 70% ethanol and the pellet was dried at room temperature. The DNA was resuspended in ddH2O containing 100 μg/mL RNase (Thermo Fisher Scientific, Waltham, MA, USA) and stored at -20 °C. DNA samples were quantitated using a NanoDrop spectrophotometer (NanoDrop Technologies, Thermo Scientific) and checked for integrity by agarose gel electrophoresis.

To identify infected plants, the AV1 gene of SLCMV was amplified using gene-specific primers (forward: 5’-GTT GAA GGT ACT TAT TCC C-3’ and reverse: 5’-TAT TAA TAC GGT TGT AAA CGC-3’) [27]. Amplification was performed in a 25-μL reaction volume containing 1 × PCR buffer (PCR Biosystems, London, UK), 0.2 μM each of forward and reverse primers, and 50 ng of the genomic DNA. After an initial denaturation at 94 °C for 5 min, 35 cycles were carried out, each consisting of denaturation at 94 °C for 40 s, annealing at 55 °C for 40 s, and elongation at 72 °C for 40 s. The final elongation was carried out at 72 °C for 5 min. Amplified products were examined by gel electrophoresis on a 1.5% agarose TAE gel containing RedSafe Nucleic Acid Staining Solution (iNtRON Biotechnology, Sangdaewon, South Korea). Sanger sequencing was used to establish the sequence of the amplified product (Macrogen, The Netherlands), and the identity of the virus was confirmed by BLAST searches in the National Center for Biotechnology Information database (https://blast.ncbi.nlm.nih.gov/Blast.cgi).
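As a quick cross-check of the cycling program described above, the programmed hold times can be tallied in a few lines of Python. This is only a minimal sketch: the function and variable names are illustrative, and ramp times between temperatures, which depend on the thermocycler, are ignored.

```python
# Hold times (seconds) taken from the cycling conditions in the text.
# Ramp times between temperatures are instrument dependent and ignored here.
INITIAL_DENATURATION_S = 5 * 60
CYCLE_STEPS_S = {"denaturation_94C": 40, "annealing_55C": 40, "elongation_72C": 40}
N_CYCLES = 35
FINAL_ELONGATION_S = 5 * 60


def total_hold_seconds(initial_s, cycle_steps_s, n_cycles, final_s):
    """Sum the programmed hold times of a three-stage PCR program."""
    return initial_s + n_cycles * sum(cycle_steps_s.values()) + final_s


total_s = total_hold_seconds(
    INITIAL_DENATURATION_S, CYCLE_STEPS_S, N_CYCLES, FINAL_ELONGATION_S
)
print(f"Programmed hold time: {total_s / 60:.0f} min")
```

Under these assumptions the program amounts to 80 min of hold time; the actual run will be longer once ramping is included.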

Protein extraction

Cassava tissues were ground to a fine powder in liquid nitrogen, then 1 mL of 0.5% sodium dodecyl sulfate (SDS) was added to 100 mg of tissue powder. After agitating for 1 h, tubes were centrifuged at 9,000 g for 15 min. The supernatant was transferred to a fresh tube, mixed with 1,400 μL of cold acetone, and incubated at -20 °C. The mixture was centrifuged at 9,000 g for 15 min and the supernatant was discarded. The protein pellet was dried and stored at -80 °C.

Determination of protein concentration using the Lowry method

The pellets were resuspended in 0.5% SDS and the protein concentration was determined by the Lowry method [28]. The absorbance at 750 nm (OD750) was measured and the protein concentration was calculated using a standard curve established using a serial dilution of BSA.
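The standard-curve calculation described above can be sketched as follows. The BSA concentrations and OD750 readings are hypothetical values chosen for illustration, and a straight-line fit is an assumption: Lowry curves are often linear only over a limited concentration range, so the fitting model should be checked against the actual standards.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_od(od750, slope, intercept):
    """Invert the standard curve to estimate protein concentration."""
    return (od750 - intercept) / slope


# Hypothetical BSA standards (µg/mL) and OD750 readings, for illustration only.
bsa_ug_per_ml = [0, 50, 100, 200]
od750 = [0.05, 0.15, 0.25, 0.45]

slope, intercept = fit_line(bsa_ug_per_ml, od750)
sample_conc = concentration_from_od(0.35, slope, intercept)
print(f"Estimated concentration: {sample_conc:.1f} µg/mL")
```

Samples whose absorbance falls outside the range of the standards would normally be diluted and re-measured rather than extrapolated.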

In-solution digestion

Samples containing 5 µg of total protein were subjected to in-solution protease digestion. Samples were dissolved in 10 mM ammonium bicarbonate (AMBIC) buffer. Disulfide bonds were reduced by adding dithiothreitol (DTT) to a final concentration of 5 mM, followed by incubation at 60 °C for 1 h. Sulfhydryl groups were then alkylated by incubating the samples for 45 min in AMBIC buffer containing 15 mM iodoacetamide at room temperature in the dark [29]. Next, samples were mixed with 50 ng/µL of sequencing grade trypsin (1:20 ratio) (Promega, Germany) and incubated at 37 °C overnight to digest the protein content. Prior to LC–MS/MS analysis, the digested samples were dried and protonated with 0.1% formic acid.
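The amount of trypsin implied by these figures can be worked out with a short calculation. The sketch below assumes that the 1:20 ratio is an enzyme-to-protein mass ratio and that the 50 ng/µL value is the concentration of the trypsin stock; neither is stated explicitly in the text, and the function name is illustrative.

```python
def trypsin_volume_ul(protein_ug, protein_per_enzyme, stock_ng_per_ul):
    """Volume of trypsin stock needed for a given enzyme:protein mass ratio.

    protein_per_enzyme is the protein mass per unit enzyme mass,
    e.g. 20 for a 1:20 (enzyme:protein, w/w) digestion (assumed meaning).
    """
    trypsin_ng = protein_ug * 1000 / protein_per_enzyme
    return trypsin_ng / stock_ng_per_ul


volume_ul = trypsin_volume_ul(protein_ug=5, protein_per_enzyme=20, stock_ng_per_ul=50)
print(f"Trypsin stock to add: {volume_ul:.1f} µL")
```

Under these assumptions, 5 µg of protein calls for 250 ng of trypsin, which corresponds to 5 µL of a 50 ng/µL stock.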

Protein quantification

Tryptic peptide samples were prepared for injection into an Ultimate3000 Nano/Capillary LC System (Thermo Scientific, UK) coupled to a Hybrid quadrupole Q-Tof impact II™ (Bruker Daltonics) equipped with a Nano-captive spray ion source. Briefly, 1 µL of peptide digest was enriched on a µ-Precolumn (300 µm i.d. × 5 mm, C18 PepMap 100, 5 µm, 100 Å; Thermo Scientific, UK) and separated on a 75-µm i.d. × 15 cm column packed with Acclaim PepMap RSLC C18, 2 µm, 100 Å, nanoViper (Thermo Scientific, UK). The C18 column was enclosed in a thermostatic oven set to 60 °C. Solvent A was 0.1% formic acid in water, and solvent B was 0.1% formic acid in 80% acetonitrile. A gradient of 5–55% solvent B was used to elute the peptides at a constant flow rate of 0.30 µL/min over 30 min. Electrospray ionization was carried out at 1.6 kV using CaptiveSpray. Nitrogen was used as the drying gas (flow rate approximately 50 L/h). Collision-induced dissociation (CID) product ion mass spectra were obtained using nitrogen as the collision gas. Mass spectra [30] and MS/MS spectra were obtained in the positive-ion mode at 2 Hz over the range (m/z) 150–2200. The collision energy was adjusted to 10 eV as a function of the m/z value. The LC–MS analysis of each sample was performed in triplicate. The protein spectral data obtained in this study have been deposited at ProteomeXchange: PXD035792.
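To make the elution conditions concrete, the solvent composition at any point of the run can be estimated as below. This sketch assumes a linear gradient, which the text does not state, and the function names are illustrative.

```python
def percent_b(t_min, start_b=5.0, end_b=55.0, duration_min=30.0):
    """Percentage of solvent B at time t, assuming a linear gradient."""
    if not 0 <= t_min <= duration_min:
        raise ValueError("time outside the gradient window")
    return start_b + (end_b - start_b) * t_min / duration_min


def acetonitrile_percent(t_min, acn_in_b=80.0):
    """Effective acetonitrile content, given that solvent B is 80% acetonitrile."""
    return percent_b(t_min) * acn_in_b / 100.0


def eluted_volume_ul(flow_ul_per_min=0.30, duration_min=30.0):
    """Total mobile-phase volume delivered over the gradient."""
    return flow_ul_per_min * duration_min


print(f"%B at 15 min: {percent_b(15):.1f}")
print(f"Acetonitrile at 15 min: {acetonitrile_percent(15):.1f}%")
print(f"Volume delivered: {eluted_volume_ul():.1f} µL")
```

With a linear ramp, the midpoint of the run corresponds to 30% solvent B, that is, 24% acetonitrile, and about 9 µL of mobile phase is delivered over the 30-min gradient.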

Bioinformatics and data analysis

The MaxQuant software package (version 2.0.3.0) was used to quantitate proteins in individual samples using the Andromeda search engine. This identifies peptide fragments by comparing MS/MS spectra to data deposited in the UniProt Manihot esculenta database [31]. Label-free quantitation with MaxQuant standard settings was performed, allowing a maximum of two missed cleavages and a mass tolerance of 0.6 Daltons during the main search. These searches were set up with trypsin as the digesting enzyme, carbamidomethylation of cysteine as a fixed modification, and oxidation of methionine and acetylation of the protein N-terminus as variable modifications. Only peptides containing a minimum of 7 amino acids and representing a single sequence motif were used in further analysis. Proteins were identified if at least two corresponding peptides were detected, with at least one of these being unique. Only proteins identified based on the above criteria were used for data analysis. The false discovery rate (FDR) was set at 1% and was estimated by using reversed search sequences. The maximal number of modifications for any given peptide was set to 5.
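The search-space and acceptance criteria listed above can be illustrated with a small in-silico digestion. This is a simplified sketch and not MaxQuant's implementation: the cleavage rule used here (after K or R, but not before P) is one common trypsin convention and is an assumption, the toy sequence is hypothetical, and all function names are illustrative.

```python
def cleave(protein):
    """Split a sequence after K or R, except when the next residue is P."""
    fragments, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        fragments.append(protein[start:])  # C-terminal remainder
    return fragments


def tryptic_peptides(protein, max_missed=2, min_len=7):
    """Enumerate peptides with up to max_missed missed cleavages."""
    fragments = cleave(protein)
    peptides = set()
    for i in range(len(fragments)):
        last = min(i + max_missed, len(fragments) - 1)
        for j in range(i, last + 1):
            peptide = "".join(fragments[i:j + 1])
            if len(peptide) >= min_len:  # minimum peptide length filter
                peptides.add(peptide)
    return peptides


def protein_accepted(n_peptides, n_unique):
    """Identification rule from the text: >= 2 peptides, >= 1 of them unique."""
    return n_peptides >= 2 and n_unique >= 1


# Hypothetical toy sequence, not a real cassava protein.
toy = "AAAAAAKCCCCCCRDDK"
for peptide in sorted(tryptic_peptides(toy)):
    print(peptide)
```

Allowing missed cleavages enlarges the set of candidate peptides, while the length filter removes short fragments such as the C-terminal tripeptide of the toy sequence.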

For FASTA searches, the proteins present in the Manihot esculenta proteome were downloaded from the UniProt database. Potential contaminants, present in the contaminant FASTA file that comes with the MaxQuant software, were automatically added to the search space. The MaxQuant ProteinGroups.txt file was loaded into the Perseus platform (version 1.6.6.0) [31], and potential outliers that did not correspond to any UPS1 protein were removed from the data set. Maximum intensities were log2 transformed and pairwise comparisons were carried out via t-tests. Missing values were imputed in Perseus using a constant value (zero). Data visualization and statistical analyses were conducted using the MultiExperiment Viewer (MeV) of the TM4 software suite [32]. Protein organization and biological function were investigated according to the Protein ANalysis THrough Evolutionary Relationships (PANTHER) protein classification [33], and Venn diagrams were used to visualize differences between protein lists derived from the various samples [34].
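The log2 transformation and pairwise comparison can be sketched for a single protein as follows. The intensities are hypothetical, and Welch's unequal-variance t statistic is used as an assumption because the text does not specify which t-test variant was applied in Perseus. The sketch also assumes strictly positive intensities, so missing values would have to be handled before the transformation.

```python
import math
from statistics import mean, variance


def log2_intensities(values):
    """Log2-transform strictly positive intensity values."""
    return [math.log2(v) for v in values]


def log2_fold_change(treated, control):
    """Difference of mean log2 intensities (treated minus control)."""
    return mean(treated) - mean(control)


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Hypothetical intensities for one protein, three replicates per group.
infected = log2_intensities([8, 16, 32])
healthy = log2_intensities([2, 4, 8])

lfc = log2_fold_change(infected, healthy)
t_stat = welch_t(infected, healthy)
print(f"log2 fold change = {lfc:.2f}, t = {t_stat:.2f}")
```

A p-value would additionally require the t distribution with the Welch–Satterthwaite degrees of freedom, which is omitted here to keep the sketch dependency-free.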

qPCR validation of differentially expressed genes

Total RNA was extracted from healthy and SLCMV-infected leaf tissues using QIAzol Lysis Reagent (QIAGEN, Hilden, Germany) according to the manufacturer’s protocol. A RevertAid First Strand cDNA Synthesis Kit (ThermoFisher Scientific, USA) was used for first-strand cDNA synthesis, primed by Oligo(dT)18, as recommended by the manufacturer. The qPCR reactions were performed using HOT FIREPol® EvaGreen® qPCR Mix Plus (Solis BioDyne, Tartu, Estonia). Reactions contained 10 ng cDNA template, 4 µL 5X HOT FIREPol® EvaGreen® qPCR Mix Plus, 0.2 µM primer mix, and nuclease-free H2O in a 20 µL final volume. Amplification was carried out on a CFX Connect Real-Time PCR System (Bio-Rad, USA) using the following cycle parameters: initial denaturation (95 °C for 15 min), followed by 45 cycles of denaturation (95 °C for 30 s), annealing (60 °C for 30 s), and extension (72 °C for 30 s). Fluorescence was recorded at the end of each cycle, and a melt curve analysis was carried out between 60 °C and 95 °C with a 10 s hold at every 1 °C increment. Each reaction was set up in triplicate. Proteins uniquely upregulated in SLCMV-infected cassava samples were analyzed by qPCR. Most of these were associated with gene and protein processing, plant defense, and stress responses. The primers used in these experiments are listed in Supplementary Table S1. UBQ10 was used as the reference gene to assess relative mRNA abundance [25]. Fold changes in gene expression were analyzed using the log2 (ΔCT) method [35].
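For orientation, relative expression against a reference gene is commonly computed from threshold cycle (Ct) values with the 2^(-ΔΔCt) approach. The sketch below illustrates that widely used calculation; it is an assumption that this corresponds to the method cited above, the Ct values are hypothetical, and an amplification efficiency of about 100% is assumed.

```python
def delta_delta_ct(ct_target_treated, ct_ref_treated,
                   ct_target_control, ct_ref_control):
    """ΔΔCt: reference-normalized Ct in treated minus that in control."""
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    return delta_treated - delta_control


def log2_fold_change(ddct):
    """log2 of the 2^(-ΔΔCt) fold change (assumes ~100% efficiency)."""
    return -ddct


# Hypothetical mean Ct values: target gene and UBQ10 reference,
# in infected (treated) and healthy (control) leaves.
ddct = delta_delta_ct(
    ct_target_treated=22.0, ct_ref_treated=18.0,
    ct_target_control=25.0, ct_ref_control=18.0,
)
fold_change = 2 ** log2_fold_change(ddct)
print(f"ΔΔCt = {ddct:.1f}, fold change = {fold_change:.1f}")
```

In this hypothetical case the target reaches threshold three cycles earlier in infected leaves after normalization, corresponding to an eight-fold increase in transcript abundance.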

Results

SLCMV symptoms and virus infection analysis

The characteristic symptoms of SLCMV infection, chlorotic mosaics and leaf deformation, were present in the infected plants after the emergence of the first leaf. PCR experiments using gene-specific primers confirmed SLCMV as the causative agent of the visible symptoms. The ~900-bp amplicon was only seen when DNA from symptomatic leaves was analyzed (Supplementary Figure S1), and the sequence of the amplified product showed 99% identity with the Sri Lankan cassava mosaic virus isolated from Thailand (Supplementary Figure S2).

Protein identification and quantification

Samples from both healthy and infected plants comprised three biological replicates to reduce random sampling effects and increase the validity of protein detections. The tandem mass spectra obtained matched the Manihot esculenta genome and protein annotation database. After normalizing spectral counts, the protein components of infected samples were compared with those of healthy leaves to identify differentially expressed molecules. Log2 change between paired values was calculated by affinity propagation to identify proteins showing significantly altered expression as a result of the infection. In total, 1,813 peptides were identified, 1,064 in the SLCMV-infected KU50 samples and 947 in healthy plants. A Venn diagram showing the relationship between the 479 and 408 proteins detected in SLCMV-infected and healthy cassava plants, respectively, is shown in Fig. 1. As shown in Fig. 1, 109 of these proteins were present in both the infected and healthy samples.

Fig. 1. Venn diagram showing the distribution and relationships of proteins in healthy and SLCMV-infected KU50 plants
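The region sizes of a two-set Venn diagram follow directly from the set totals and their overlap. The sketch below applies this arithmetic to the protein counts reported in the text (479 in infected plants, 408 in healthy plants, 109 shared); the function name is illustrative, and the derived values are simple consequences of those three numbers rather than figures stated by the authors.

```python
def venn_counts(n_a, n_b, n_shared):
    """Region sizes of a two-set Venn diagram from set sizes and overlap."""
    if n_shared > min(n_a, n_b):
        raise ValueError("overlap cannot exceed the smaller set")
    return {
        "a_only": n_a - n_shared,
        "b_only": n_b - n_shared,
        "shared": n_shared,
        "union": n_a + n_b - n_shared,
    }


# Counts reported in the text: infected = 479, healthy = 408, shared = 109.
regions = venn_counts(n_a=479, n_b=408, n_shared=109)
print(regions)
```

On these figures, 370 proteins would be exclusive to infected plants and 299 exclusive to healthy plants, for 778 distinct proteins overall.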

Gene ontology analysis

Gene ontology classification of the detected proteins showed that 36% of the overexpressed proteins in infected plants were membrane associated, 23.8% localized to the nucleus, and 14.6% to the cytoplasm. In contrast, the proteins upregulated in healthy plants localized to membranes (40.1%), the nucleus (16.9%), and the cytoplasm (19.5%). In addition, both samples contained proteins from other cellular components, such as the chloroplast, cytosol, Golgi apparatus, mitochondria, microtubules, endoplasmic reticulum, peroxisome, and vacuoles. The cellular localization of the detected molecules is shown in Fig. 2A.

Fig. 2. Functional classification of differentially expressed proteins in the leaves of healthy or infected cassava cv. KU50. A Proportion of proteins in different functional classes based on cellular localization. B Functional grouping of biological processes. C Proportion of functional groups based on molecular function

In both healthy and SLCMV-infected cassava plants, most of the identified proteins were involved in biosynthetic pathways and cellular or metabolic processes. The proportion of proteins associated with plant defense responses rose from 4.1% in healthy plants to 5.7% as a result of the infection, while the proportion of stress response proteins increased from 3.3% to 5.9%. Proteins related to transportation, signaling, photosynthesis, and respiration are shown in Fig. 2B, while those with a role in nucleotide binding, catalytic activity, transporter activity, structural role, and electron carrier activity are summarized in Fig. 2C. Overall, nucleotide binding proteins represented the most commonly identified protein category in both healthy and infected cells (60%). A heat map of differentially expressed proteins within nine biological functional groups is shown in Fig. 3, where a change in color from red to dark green corresponds to differences in expression abundance. In general, most upregulated proteins in infected KU50 samples were involved in respiration, plant defense, stress responses, metabolic processes, and biosynthetic pathways (Fig. 3).

Fig. 3. Heatmap of the leaves of cassava cv. KU50 during SLCMV infection. Heatmaps represent each functional group of differentially expressed proteins: a Respiration; b Stress response; c Plant defence response; d Metabolic processes; and e Biosynthetic processes

Proteins differentially expressed in response to SLCMV infection

The identified differentially expressed proteins were associated with biosynthesis, plant defense, and stress responses. Within these categories, a subset of replication, transcription, and translation-related proteins were exclusively expressed in SLCMV-infected samples. There were 59 proteins that were overexpressed more than 17-fold in infected samples. As shown in Table 1, 8 of these were involved in replication, 43 in transcription, and 8 in translation. Interestingly, 21 of the most strikingly upregulated proteins were transcription factors, regulating immune mechanisms, stress responses, and the secretion of plant hormones, while another 20 were connected with plant defense responses. Finally, 26 proteins were previously documented to play a role in biotic plant stress. To confirm the findings of the proteomics analysis, we selected 77 differentially expressed proteins for qPCR-based quantitation. Samples were normalized using the expression of the UBQ10 gene in the same cDNA sample. These experiments identified 53 mRNAs that were upregulated in SLCMV infected plants (Fig. 4).

Table 1 Selected differentially expressed proteins of SLCMV-infected cassava cv. KU50
Fig. 4. Validation of differential protein expression levels using RT-qPCR. Results represent the averages from three biological replicates and were normalized against the expression level of the endogenous UBQ10 gene

A total of 29 highly upregulated transcripts were related to biosynthetic processes, including the transcription factors NAC 6, NAC 22, NAC 35, NAC 54, NAC 70, WRKY 77, and WRKY 83. Furthermore, mRNA abundance and protein expression levels showed good correlation for these molecules. Among the genes involved in transcription, translation, and protein synthesis, DNA replication ATP-dependent helicase (A0A2C9VTN6) and replication termination factor 2 (A0A2C9UKD1) were also upregulated at the transcript level, showing good correlation with the proteomics data. However, 16S rRNA m5C967 methyltransferase (A0A2C9UDI4), elongation factor 1-alpha (EF-1-alpha, O49169), HTH myb-type domain-containing protein (A0A2C9WJL4), peptidylprolyl isomerase (A0A251J9E4), and PWWP domain-containing protein (A0A2C9V4D0) were not upregulated at the mRNA level, indicating discrepancies between proteomics and transcription data (Fig. 4). Furthermore, proteomics data showed that the WRKY 18 (A0A140H8M2), WRKY 24 (A0A140H8M8), and WRKY 29 (A0A140H8N3) transcription factors were downregulated in SLCMV infection.

qPCR experiments also confirmed changes in the abundance of mRNAs encoding plant defense molecules. The seven upregulated transcripts were cupin type-1 domain (A0A199UBY6), galectin domain (A0A2C9VQZ7), heavy metal-associated domain (HMA domain) (A0A2C9ULD8), leucine-rich repeat N-terminal domain (LRRNT_2 domain) (A0A2C9VHG8), occludin_ELL domain (A0A2C9W2D3), receptor-like serine/threonine-protein kinase (A0A2C9V7G1), and Toll/interleukin-1 receptor/resistance protein domain (TIR domain) (A0A2C9V5Q3). At the same time, AT-hook motif nuclear-localized protein, DCD domain, NB-ARC domain, and protein kinase domain were downregulated according to both qPCR and proteomics analysis (Fig. 4).

The expression of an additional 24 molecules was determined at the mRNA level. Fourteen of these, annexin (A0A2C9UX68), catalase (A0A2C9WMD1), HSF_DOMAIN domain (A0A2C9WN68), peptidase_M28 domain (A0A2C9V3F4), peroxidase (A0A2C9UEX5), PMR5N domain (A0A2C9VL88), Rab-GAP TBC domain (A0A251KQD0), reticulon-like protein (A0A2C9VPT1), S-(hydroxymethyl) glutathione dehydrogenase (A0A2C9UNE1), SPRY domain (A0A2C9U854), START domain (A0A2C9WKA1), thioredoxin domain (A0A2C9WFP1), and BEACH domain (A0A2C9UBJ8), showed elevated transcript abundance, in concordance with the LC–MS/MS analysis. However, transcript levels of 1-amino-cyclopropane-1-carboxylic acid oxidase (A0SVL8), alpha-hydroxynitrile lyase (O49893), BAG domain (A0A2C9UMY9), CCHC-type domain (A0A2C9UUL9), Clp R domain (A0A2C9VMF9), formate dehydrogenase (A0A2C9U930), GUB_WAK_binding domain (A0A2C9UKE3), LEA_2 domain (A0A2C9WIP6), S-acyltransferase (A0A2C9WR35), and SHSP domain (A0A2C9V533) were found to be downregulated by qPCR, seemingly contradicting the proteomics findings (Fig. 4).

Discussion

We explored the consequences of SLCMV infection in cassava plants at the level of the plant proteome. This approach enabled the documentation of changing protein expression levels in response to the infection and provided valuable insights into the molecular processes that occur in infected plants. Apart from their relevance to basic biological sciences, the findings may provide crucial information to aid the development of SLCMV-tolerant commercial cultivars. Developing such resistant plants is critical for disease control and has very high economic significance. This is the first documentation of global changes in the cassava proteome, comparing healthy and SLCMV-infected KU50 plants. Subsequent gene ontology analyses highlighted the most likely biological processes altered by this viral infection. During the analysis we paid special attention to proteins that have been implicated in geminivirus infections in other plants, particularly those molecules involved in gene synthesis, plant defense, and stress responses.

The expression of an extensive subset of differentially expressed proteins was also quantitated using qPCR, including a set of 77 proteins upregulated during SLCMV infection. Out of this fairly comprehensive set of molecules, qPCR confirmed a corresponding increase in mRNA abundance in approximately 70% of the cases. However, proteomics findings and qPCR gene expression data showed contradictory outcomes in the remaining 30%. While this discordance may seem striking, several mechanisms have been reported in the literature that may explain such discrepancies. Post-transcriptional regulatory processes, such as different local translation efficiency and codon bias [36], altered translation elongation rate [37], differences in the half-life of proteins [38], or variations in poly A tail length [39, 40] have all been reported to result in discordant proteomics/transcriptomics results.
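The approximate concordance figure can be checked against the counts given in the Results, where 53 of the 77 selected targets showed upregulated mRNA. The following sketch simply reproduces that proportion; the function name is illustrative.

```python
def percent(part, whole):
    """Share of `part` in `whole`, expressed as a percentage."""
    return 100.0 * part / whole


N_TESTED = 77      # proteins selected for qPCR validation
N_CONFIRMED = 53   # transcripts upregulated in infected plants

concordant = percent(N_CONFIRMED, N_TESTED)
discordant = percent(N_TESTED - N_CONFIRMED, N_TESTED)
print(f"Concordant: {concordant:.1f}%, discordant: {discordant:.1f}%")
```

This gives roughly 69% concordant and 31% discordant targets, consistent with the rounded values of 70% and 30% quoted above.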

According to gene ontology analysis, the majority of differentially expressed proteins in SLCMV-infected cassava cv. KU50 plants localized to the cell membrane, nucleus, cytoplasm, and chloroplast. Previous studies indicated that plant defense pathways involved transcriptional and post-transcriptional gene silencing, the ubiquitin-proteasomal degradation pathway, protein kinase signaling cascades, autophagy, and hypersensitive responses [41]. In the nucleus, geminiviruses use the enzymatic machinery of the host to convert their single-stranded DNA (ssDNA) genomes into double-stranded forms that interact with histones in the form of minichromosomes. This elicits RNA-directed DNA methylation (RdDM) to suppress the virus via transcriptional gene silencing (TGS) [42,43,44]. Beyond their role in photosynthesis, chloroplasts also play an important role in plant immunity through the regulation of reactive oxygen species (ROS) and salicylic acid (SA) signaling. Chloroplasts also modulate biosynthetic pathways via the synthesis of the phytohormones gibberellic acid (GA), abscisic acid (ABA), and jasmonic acid (JA). During the infection, chloroplasts appear to be one of the major targets of pathogens [25]. Ribosomal protein L10 (RPL10) plays a role in protecting cassava against geminivirus infections, with a MYB domain-containing TF activating RPL10 transcription in the nucleus. This mechanism inhibited gene translation by the host and had a profound inhibitory effect on geminiviral replication. The authors proposed that the MYB/RPL10 protein network contributed to the symptomatic recovery of the virus-resistant TIM3 cassava cultivar after exposure to South African cassava mosaic virus infection. However, while the abundance of MYB protein also showed an increase in response to SLCMV in our studies, we were unable to detect any RPL10. This may indicate that MYB-driven responses vary according to the strain of geminivirus causing the infection, or the cultivar being tested. Nonetheless, determining MYB and RPL10 expression could be a valuable tool in the breeding of CMD-resistant cultivars. The fact that RPL10 is encoded on chromosome 12 of cassava and that polymorphisms in this chromosome appear to correlate with CMD2 resistance [66, 67] lends further support to the potential importance of the MYB/RPL10 pathway.

Basic helix-loop-helix (bHLH) proteins are transcription factors representing one of the largest TF families [68]. In Arabidopsis, bHLHs can act both as transcriptional activators or repressors, influencing plant development and physiological processes, such as phytohormone [

Conclusions

Here we describe the comprehensive proteomics analysis of the leaves of SLCMV infected and healthy cassava KU50 cultivar plants. In the course of the work carried out, more than 1,813 proteins were detected, including 479 that were upregulated in infected plants. These SLCMV induced proteins could be classified into nine functional groups and exhibited a wide range of biological functions. Further studies of the differentially regulated molecules in the cell membrane, nucleus, and chloroplast will help the identification of key proteins involved in cassava-SLCMV infection, potentially aiding the development of resistant crops.