Background

The oomycetes form a diverse group of filamentous eukaryotic microorganisms, also known as water molds, which include saprophytes as well as pathogens of plants, insects, crustaceans, fish, vertebrate animals, and various microorganisms [1, 2]. In plants, pathogenic oomycetes cause devastating diseases in a wide range of species including agricultural crops. Foxtail millet (Setaria italica (L.) Beauv.), the second most important millet in terms of global yield [3], suffers from downy mildew disease caused by Sclerospora graminicola (Sacc.) Schroet. in regions including India, China, Japan, and Russia.

Twenty genera of downy mildews are known, of which eight are graminicolous downy mildews [4]. Among these, S. graminicola (Sacc.) Schroet. is an obligate biotrophic oomycete. The likely source of the S. graminicola primary inoculum is oospores remaining in the soil or diseased plant residues. Fourteen graminaceous species are established hosts of S. graminicola, with strict host specificity observed among the various isolates of the pathogen [5]. After pathogen invasion, systemically infected leaves generally show chlorosis along the veins. When the pathogen colonizes the branched inflorescences, known as panicles, the floral organs are often transformed into leafy structures, in a process termed phyllody [6]. Phyllody leads to the disease referred to as “witches’ broom”, “green ear disease”, or “crazy top”, and is caused in foxtail millet, pearl millet, maize, and finger millet by pathogens belonging to the three genera, Peronosclerospora, Sclerophthora, and Sclerospora [6, 7]. No induction of phyllody in dicots by downy mildews has been reported.

Whole-genome sequencing and transcriptome analyses have profoundly changed research into plant-microbe interactions in recent years [8], and draft genome sequences of oomycetes have been published for five downy mildew pathogens [9,10,11,12,1).

Table 1 Genome statistics of Sclerospora graminicola (Sg) and other previously sequenced oomycetes
Fig. 1

Phylogenetic relationship of oomycete genomes. The tree was generated based on the nucleotide sequences of orthologous genes predicted by CEGMA pipeline using the Maximum Likelihood method implemented in MEGA6.06-mac. Bootstrap values from 1000 replicates are indicated on the branches

Sg has a large and heterozygous genome

Analysis of the k-mer frequency using paired-end reads showed two peaks, possibly derived from heterozygous and homozygous DNA sequences (Fig. 2a). To estimate the ploidy level of the Sg genome, we analyzed the distribution of the biallelic SNP call rate (Fig. 2b). The SNP counts had a single mode around 0.5, suggesting that the genome was diploid. The number of heterozygous SNPs, with a call rate of between 0.4 and 0.6, was 226,400 (Fig. 2c). The total genome size, estimated from the k-mer frequency at the peak corresponding to the putative homozygous DNA, was approximately 360 Mbp.
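As an illustration of how a genome size can be derived from a k-mer histogram, the following minimal Python sketch divides the total number of k-mers by the depth of the homozygous peak. It is not the pipeline used in this study: the function name, the `min_frequency` filter for likely sequencing-error k-mers, and the toy histogram are assumptions introduced here for illustration only.

```python
def estimate_genome_size(kmer_histogram, homozygous_peak, min_frequency=1):
    """Estimate genome size as (total k-mers) / (depth of the homozygous peak).

    kmer_histogram maps a k-mer frequency to the number of distinct k-mers
    observed at that frequency. Frequencies below min_frequency are skipped,
    a common (assumed) way to discard k-mers created by sequencing errors.
    """
    if homozygous_peak <= 0:
        raise ValueError("homozygous_peak must be positive")
    total_kmers = sum(
        frequency * n_distinct
        for frequency, n_distinct in kmer_histogram.items()
        if frequency >= min_frequency
    )
    return total_kmers / homozygous_peak


if __name__ == "__main__":
    # Hypothetical toy histogram with peaks at 19 (heterozygous) and 37 (homozygous).
    toy_histogram = {1: 50, 19: 10, 37: 100}
    print(estimate_genome_size(toy_histogram, homozygous_peak=37, min_frequency=2))
```

With real data the histogram would come from a k-mer counter, and the peak position (37 in Fig. 2a) would be read from the distribution.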

Fig. 2

Sg has a large and diploid genome with high heterozygosity. a K-mer distribution and coverage of sequencing reads at K = 15. Peaks with single and double asterisks were estimated as k-mer species derived from heterozygous (k-mer frequency = 19) and homozygous (k-mer frequency = 37) sequences, respectively. b Ploidy analysis displaying the distribution of the SNP call rate. c Heterozygosity was evaluated by counting the SNPs based on the alignment of genome sequence reads

Sg has a highly repetitive genome

Gene prediction was carried out using Trinity/PASA, TopHat2/Cufflinks/PASA, MAKER2, and AAT, based on the RNA-seq data and in silico methods [20,21,22,23,24,25]. By combining multiple types of evidence using EvidenceModeler [22], we identified a total of 16,736 genes supported by RNA-seq data. Analysis of repeated sequences using RepeatModeler [26] and RepeatMasker [27] revealed that approximately 73% of the assembled genome was repetitive, with more than half composed of long terminal repeat (LTR) elements (Additional file 2: Table S2).

The Sg genome encodes proteins comparable to those of other downy mildews

To compare the Sg genome with those of other oomycetes, we performed clustering analyses of orthologs and paralogs from three downy mildew pathogens (DMs) (Sg, Plh, and Hyaloperonospora arabidopsidis; Hpa) and two Phytophthora species (Ph. infestans; Phi and Ph. sojae; Phs) based on the OMA orthology database [28]. There were 3548 and 2725 common orthologous groups in the DMs and in the five genomes (three DMs plus the two Phytophthora species), respectively (Additional file 3: Table S3). A total of 2055 groups were conserved in the Phytophthora species but not in the DMs, while only 128 groups were conserved among the DMs but not in Phytophthora. Some obligate biotrophs have lost the nitrogen and sulfate metabolic pathways [9, 10, 13]; an ortholog search revealed that Sg similarly lacked nitrate reductase, nitrite reductase, nitrate transporter, glutamine synthetase, and cysteine synthetase (Additional file 4: Table S4).

To gain insights into the unique features of the Sg genome, we compared the frequency of the protein domains encoded in the five oomycete genomes. In Sg, 11 domains were overrepresented (Fisher’s exact test, p < 0.05), compared with two in the DMs and/or Phytophthora species (Additional file 5). In particular, the Jacalin-like lectin domain was overrepresented among the putative secreted proteins. Although no domains were underrepresented in Sg alone, 85 domains were underrepresented in the three DMs in comparison with the Phytophthora species. Of these 85 domains, 20 were associated with cellular transporters and 11 were linked to plant cell wall degradation. Several protein families related to plant defense, such as elicitin and cellulose-binding elicitor lectin, were also less common in the DM genomes than in Phytophthora (Additional file 5).
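The overrepresentation test for a single domain can be sketched as a one-sided Fisher's exact test on a 2 × 2 table. The sketch below is a minimal Python illustration using only the standard library. The table layout, the one-sided alternative, and the function name are assumptions made here; the source states only that Fisher's exact test was used with p < 0.05.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test (alternative: enrichment in the first row).

    Assumed 2 x 2 layout:
                        with domain   without domain
        genome of interest     a            b
        comparison genome      c            d

    Returns the hypergeometric probability of observing a or more proteins
    with the domain in the genome of interest, given fixed margins.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)
    upper = min(row1, col1)
    tail = sum(comb(row1, x) * comb(row2, col1 - x) for x in range(a, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts, not taken from Additional file 5.
    p_value = fisher_exact_greater(30, 970, 5, 995)
    print(p_value < 0.05)
```

In practice one such test would be run per domain, and a multiple-testing correction may be appropriate; whether one was applied is not stated in the text.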

Sg expresses conserved effector-like protein genes during infection

A total of 1220 Sg proteins were classified as putative secreted proteins based on the presence of signal peptides, predicted by SignalP4.1 [29], and the absence of transmembrane domains. This total was greater than those of Plh and Hpa, but smaller than that of Phi (Table 1). The number of proteins related to pathogenicity in Sg was comparable to that in other DMs, except for the RXLR-like proteins, of which Sg had more than Plh but fewer than the Phytophthora species (Table 2).

Table 2 Summary of putative pathogenicity genes in Sclerospora graminicola and related oomycetes

To search for effector candidates involved in Sg infection, an RNA-seq analysis was performed using total RNA extracted from sporangia/zoospores (inocula) and infected leaves. The foxtail millet leaves were inoculated with a spray containing a mixture of sporangia and zoospores. Primary penetration hyphae appeared 16–18 h after inoculation, and haustoria were formed one day after inoculation. We analyzed the gene expression profiles at five time points: stage 1, SPO (sporangia and zoospores); stage 2, 16 hpi (hours post infection); and stages 3, 4, and 5, corresponding to 1, 2, and 3 dpi (days post infection), respectively. The distribution of the maximum transcripts per million (TPM) value of each gene across the five time points indicated that 54% of the genes had a maximum TPM lower than 20, 31% from 20 to 100, 14% from 100 to 1000, and 1.6% higher than 1000. A differentially expressed gene (DEG) analysis using edgeR [30] showed that the expression of 91 putative secreted protein genes changed significantly during infection. The maximum TPM value of every DEG was more than 20.

The 91 DEGs were classified into four clusters based on their expression patterns using Ward's method (Fig. 3, Additional file 6: Table S5). Representative genes of each cluster were validated by quantitative reverse transcription PCR (qRT-PCR) (Additional file 7: Fig. S1). Cluster I included genes expressed in sporangia or zoospores, but not during infection. The expression of genes belonging to cluster II increased in the late stages of infection, suggesting that they include components contributing to pathogen expansion into leaves and the absorption of nutrients from host cells. Genes belonging to clusters III and IV were induced during stage 2, when the primary penetration hyphae developed, after which the expression of genes in cluster III gradually returned to basal levels. To determine the gene families overrepresented in each cluster, an enrichment analysis of protein domains predicted by InterProScan was performed (Additional file 8). The CAP domain (CAP: cysteine-rich secretory proteins, antigen 5, and pathogenesis-related 1 protein superfamily) and the CUB domain, which is related to trypsin-like peptidases, were enriched in cluster I. The Jacalin-like lectin domain and the Necrosis inducing protein domain were significantly enriched in cluster III, indicating that these domains could function in the early stages of Sg infection.

Fig. 3

Transcriptome profile of Sclerospora graminicola infection. a Heat map showing the expression patterns of DEGs encoding putative secreted proteins. b Line plots of the expression patterns of each gene cluster. SPO: mixture of sporangia and zoospores; L16H: SPO-inoculated leaves 16 h after inoculation; L1D, L2D, and L3D: SPO-inoculated leaves at one, two, and three days after inoculation, respectively

Different clustering methods can give different results. We therefore performed additional clustering analyses using two methods: the logFC-Cosine method, which uses the cosine similarity of the vectors of log-fold-change (logFC) values (Additional file 9: Figure S2), and a model-based clustering method [31] (MBCluster; Additional file 10: Figure S3). With logFC-Cosine and MBCluster, cluster I was separated into two clusters and some genes of clusters III and IV were classified into the same cluster; however, most genes showed similar clustering patterns across the three methods (Additional file 6: Table S5). InterProScan domain enrichment analysis indicated that the Jacalin-like lectin domain and the Necrosis inducing protein domain were also enriched in cluster 4 of the logFC-Cosine method and cluster 2 of MBCluster, both of which contain genes induced in the early infection phase (Additional file 8).
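The similarity measure behind the logFC-Cosine method can be illustrated with a short Python sketch. This shows cosine similarity only, not the full clustering procedure. The function name, the handling of all-zero vectors, and the example logFC values are assumptions introduced here.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length logFC vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        # Assumed convention: a gene with no change is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Hypothetical logFC values of two genes at the four infection stages
    # relative to the SPO sample.
    gene_a = [3.1, 2.0, 1.2, 0.4]
    gene_b = [6.0, 4.1, 2.5, 0.9]
    print(cosine_similarity(gene_a, gene_b))
```

Because cosine similarity ignores vector length, two genes whose induction follows the same shape but differs in magnitude are treated as similar, which is one reason this method can group genes differently from Ward's method applied to expression levels.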

To reveal features of the Sg secretome, the putative secreted proteins of Sg and 11 other oomycetes (Plh, Hpa, Phi, Phs, Ph. ramorum, Ph. capsici, Ph. parasitica, Albugo candida, A. laibachii, Pythium ultimum, and Saprolegnia parasitica) were clustered using the TribeMCL protein family clustering algorithm [32]. In total, 13,328 proteins were clustered into 1252 families (each containing at least two sequences) and 1862 singletons. Of the 1252 families, 230 contained proteins from both Sg and other oomycetes, and 78 were Sg-specific. The Sg-specific families consisted of 39 RXLR-like families, 4 Jacalin-like domain-containing protein families, one leucine-rich repeat domain-containing family, one mitochondrial carrier domain-containing family, and 33 families of unknown proteins (Additional file 11). Among these Sg-specific families, the Jacalin-like domain-containing families included genes with high TPM levels, especially in stages 2 and 3 (Additional file 12: Fig. S4).

Jacalin-like lectin domain proteins

Jacalin-like lectin domain-containing proteins belong to a subgroup of lectins with binding specificity to mannose or galactose, and are involved in multiple biological processes. Jacalin-like proteins were overrepresented in the Sg genome (Additional file 5), and a phylogenetic analysis indicated that many were specific to Sg (Fig. 4a). Among the jacalin-like protein genes of Plh, Hpa, and Phi, the closest to the Sg-specific clade was PITG_22899. Intriguingly, most of the Sg jacalin-like proteins, including proteins with putative secretion signals and significant expression levels, belonged to the Sg-specific clade (Fig. 4a, red filled circles, Additional file 13). Effector genes are distributed in gene-sparse regions of the Phi genome [33, 34]. An analysis of intergenic distances suggested that the jacalin-like protein genes are also located in gene-sparse regions (Fig. 4c; Wilcoxon rank sum test, 5′-intergenic length, p-value = 0.03721; 3′-intergenic length, p-value = 0.01161); however, most of the jacalin-like protein genes were located near scaffold borders, so their intergenic distances could not be determined.

Fig. 4

Features of jacalin-like lectin domain-containing protein genes. a Phylogeny of the jacalin-like lectin domain-containing proteins of Sg, Plh, Hpa, and Phi. The tree was constructed using the Maximum Likelihood method implemented in MEGA6.06-mac, with 1000 bootstrap replicates. b Multiple sequence alignment showing the sequence similarity between PITG22899T0 and the jacalin-like lectin domains of the Sg proteins. c Distribution of intergenic region length of Sg genes. All predicted genes are represented by a heatmap and the jacalin-like protein genes are represented by white circles. d Relative expression of DEGs of jacalin-like protein during infection. Clusters III and IV are defined in Fig. 3

Nep1-like proteins (NLPs)

NLPs are a widespread effector family among filamentous and bacterial pathogens that show very different lifestyles [35]. Oomycetes have two types of NLPs: type 1 NLPs with a cation-binding pocket required for cytotoxicity, and type 1a NLPs with amino acid substitutions in their cation-binding pocket [35]. The Sg genome contained 24 NLP-encoding genes, 17 of which had an N-terminal secretion signal peptide (Additional file 14). One NLP, SG00816, was classified as a type 1 NLP with a TRAP repeat and the other 23 were type 1a NLPs.

Six of the 24 SgNLPs were DEGs (Additional file 14). The type 1 NLP, SG00816, was not significantly expressed at any stage (Additional file 14). Intriguingly, the differentially expressed NLP genes all fell in one clade of the Sg-specific expansion groups (see asterisk in Fig. 5). All six differentially expressed NLPs were classified into clusters III and IV (Additional file 14).

Fig. 5

Phylogenetic relationship of NLP genes in Sg, Hpa, and Phi. The tree was constructed using the Maximum Likelihood method implemented in MEGA6.06-mac, with 1000 bootstrap replicates. The asterisk indicates the Sg-specific expansion group

Crinklers (CRNs)

CRNs are cytoplasmic effectors originally identified in Phi as secreted proteins that have a conserved LFLAK motif within the N-terminal 50 amino acid residues [36]. We identified 45 CRN-like genes in Sg (Table 2). Only four of these had a signal peptide at the N-terminus. SgCRNs, including the four putative secreted CRN genes, were not significantly expressed during infection (Additional file 15).

RXLR-like proteins

The RXLR domain is a putative host-targeting motif [37] and is highly conserved among plant-pathogenic oomycetes. We predicted RXLR-like protein genes by searching for an RXLR(−EER) sequence following the N-terminal putative signal peptide. Proteins showing high similarity to known RXLR-like proteins were also included as RXLR-like protein candidates. A total of 355 RXLR-like proteins were found, among which 165 had the exact RXLR-EER motif and 60 had the RXLR motif, while 130 were predicted to be RXLR(−EER) variants (Fig. 6a). Some RXLR effectors contain a core α-helical fold known as the WY-fold [38]. We explored whether our identified RXLR-like proteins had the WY-fold using HMMER, and found a total of 38 proteins with at least one WY-fold (Additional file 16). In the gene expression profile and expression pattern clustering, RXLR-like protein genes were not enriched in any cluster; however, 22 of these genes were induced during infection (Additional files 8 and 15).

Fig. 6

Features of RXLR-like protein genes. a Distribution of the conserved sequence patterns of putative RXLR-like proteins. b Distribution of Sg genes according to the length of their 5′ and 3′ flanking intergenic regions. The density of genes in each positional bin is indicated by a heatmap. Genes encoding putative secreted proteins (white) and RXLR-like proteins (red) are represented by circles. c Orthologous groups of SgRXLR-like proteins within the putative secreted proteins of four oomycetes

Effector genes are distributed in gene-sparse regions of the Phi genome [33, 34]. In the Sg genome, secreted protein genes, in particular RXLR-like protein genes, were distributed in relatively gene-sparse regions compared with all of the predicted genes (Fig. 6b). A Wilcoxon rank sum test indicated that the distribution of intergenic lengths of the RXLR-like genes was significantly different from that of all predicted genes (5′-intergenic length, p-value = 9.244e-05; 3′-intergenic length, p-value = 1.225e-08). We searched for orthologs of SgRXLR-like proteins among the putative secreted proteins of five oomycetes (Sg, Plh, Hpa, Phi, and Phs) and compared them using the OMA orthology database [28]. There were 35 ortholog groups that contained SgRXLR-like proteins (Fig. 6c), with most Sg orthologs found in the Phi genome.

Discussion

S. graminicola (Sg) has a large and highly heterozygous genome

Our analysis of Illumina paired-end sequencing reads suggested that the genome size of Sg is approximately 360 Mbp. This is 1.3 times larger than the genome of Phytophthora mirabilis, the largest among the previously sequenced oomycete plant pathogen genomes [39]. Phylogenetic analyses indicated that Sg is closely related to Plh, which has a 100-Mbp genome (Table 1), suggesting that expansion of the Sg genome probably occurred after its divergence from Plh. A broad range of genome sizes among closely related oomycetes is also found in Phytophthora; the smallest genome among the deeply sequenced Phytophthora species is 65 Mbp (in Ph. ramorum), while the largest genome is 240 Mbp (in Ph. infestans) [33]. Genome expansion occurred in Ph. infestans with an increase in repetitive regions such as the Gypsy elements. We found that at least 73% of the Sg genome and 40% of the Plh genome comprised repeat regions. The number of protein-coding genes in the Sg genome was comparable to that in Plh, indicating that the larger genome size in Sg is not caused by an increased number of genes but by the expansion of the repetitive elements.

Proteins encoded by the Sg genome are mostly comparable to those of dicot downy mildews

A total of 2055 orthologous gene groups were conserved in the Phytophthora species but not the DMs. By contrast, the number of groups conserved among the DMs but not in Phytophthora was only 128. This suggests two possibilities: either the Phytophthora species are more phylogenetically closely related while the DMs are more diversified, or the obligate biotrophs have lost substantial numbers of genes in comparison with non-obligate microbes. Indeed, the DMs, including Sg, lack part of the nitrogen and sulfate metabolic pathways. When we compared the protein-coding domain frequency between the DM and Phytophthora genomes, we found fewer genes encoding transporters, cell wall degrading enzymes, and elicitin in the DMs than in Phytophthora. These results suggest that DMs have adapted to their hosts and developed their obligate biotroph lifestyles by losing components that might induce the host defense response.

Expression patterns of putative secreted protein genes

We performed expression profiling of putative secreted protein genes during infection and classified them into four clusters. Cluster I included genes expressed only in sporangia and zoospores, which likely have no direct influence on Sg infection of foxtail millet leaves. By contrast, the expression of genes belonging to clusters II, III, and IV increased during Sg infection in foxtail millet leaves. The expression of cluster II genes gradually increased with the development of internal hyphae, suggesting that these genes contribute to the haustorial development of Sg and might be involved in the induction of phyllody in the Sg-infected foxtail millet plants. The expression of genes belonging to clusters III and IV was induced in stage 2 of infection, during the development of the primary penetration hyphae, then subsequently returned gradually to basal levels. We hypothesize that Sg genes belonging to these clusters have roles in overcoming the host defense responses in foxtail millet, and that the effector candidate genes determining host specificity are included in clusters II, III, and IV.

Jacalin-like lectin domain proteins

We found that jacalin-like lectin domain-containing protein genes were specifically overrepresented in the Sg genome in comparison with Plh, Hpa, Phi, and Phs (Additional file 5). Additionally, clustering of the secretomes of Sg and 11 other oomycetes using TribeMCL showed that Sg has four Sg-specific families, which include 36 genes encoding jacalin-like domain proteins. PITG_22899, the closest gene to the Sg-specific clade, is induced in Phi during plant infection stages, and has been reported as an effector candidate by an in silico analysis (Fig. 4a) [34]. These findings imply that jacalin-like genes play a role in infection and have specifically diversified in the Sg genome. Our clustering analysis of the Sg gene expression patterns indicated that eight jacalin-like protein genes were DEGs (Additional file 13; clusters III and IV). Many jacalin-like genes other than the DEGs also showed high TPM levels (Additional file 13), implying that jacalin-like genes play roles during early infection.

If jacalin-like proteins are a novel class of effectors in Sg, it would be reasonable to expect the jacalin-like genes to be distributed in gene-sparse regions. While this did appear to be the case (Fig. 4c) [33, 34], the assembled scaffolds in this study were too short to determine intergenic distances for a large number of genes. The use of long sequencing reads to improve the assembly will be required to determine the intergenic distances of all genes, in particular the effector candidates located in gene-sparse regions.

Previous reports suggested that plant jacalin-like proteins play a role in the defense response; for example, a jacalin-related lectin-like gene in wheat positively regulates resistance to fungal pathogens […]

Crinkler (CRN) protein predictions

First, CRN pre-candidates were identified by their sequence similarity to known CRN proteins using BLASTp. The resulting 12 proteins with a LF/YLAK motif in their N-terminal 120 amino acids (aa) were used in a manual HMM search. The HMM was trained from the N-terminal 120 aa of these genes, and the pre-candidates were searched using HMMER v3.1 [58] with an e-value cut-off of 1e-3. The resultant proteins were identified as CRN-like proteins.

RXLR protein predictions

Candidate RXLR-like proteins were extracted from predicted secreted proteins using Perl regular expressions, HMM searches, and a BLASTp search. An initial set of proteins was searched using Perl regular expressions as described previously [33] and by HMM search using the HMM profile [59]. The following approaches and criteria were used to extract exact RXLR proteins: (1) signal peptides within residues 1–30 followed by an RXLR motif [33, 59]; (2) Regex: allowing for a signal peptide between residues 10–40, followed by the RXLR motif within the next 100 residues, followed by the EER motif, allowing D and K [33]; (3) HMM search using Win’s HMM profile.

To complement the above approach, the predicted secreted proteins were scanned using HMM and a BLASTp search to extract RXLR-like proteins: (4) an HMM was trained on 40 aa sequences including the RXLR-EER motif from the exact RXLR proteins, and putative secreted proteins were searched for using HMMER v3.1 [58] with an e-value cut-off of 1e-3. (5) Putative secreted proteins with sequence similarity to known RXLR proteins were searched using BLASTp with an e-value cut-off of 1e-10.

The results for approaches 1–5 above were merged and the non-overlapping set of proteins were defined as RXLR-like protein genes (Additional file 18).
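Criterion (2) above can be sketched as a regular expression. The study used Perl; the Python version below is a hedged illustration. The exact pattern is not given in the text, so the pattern here, in particular the window of up to 40 residues between RXLR and EER and the `[ED][ED][KR]` encoding of "EER, allowing D and K", is an assumption based on a commonly used formulation. The function name and test sequences are hypothetical.

```python
import re

# Assumed pattern: 10-40 residues for the signal peptide, then RXLR starting
# within roughly the next 100 residues, then an EER-like motif (E or D, E or D,
# R or K) within the following 40 residues.
RXLR_EER_PATTERN = re.compile(
    r"^[A-Z]{10,40}[A-Z]{1,96}R[A-Z]LR[A-Z]{1,40}[ED][ED][KR]"
)


def has_rxlr_eer(protein_sequence):
    """Return True if the sequence matches the assumed RXLR-EER pattern."""
    return RXLR_EER_PATTERN.match(protein_sequence.upper()) is not None


if __name__ == "__main__":
    # Hypothetical sequences built only to exercise the pattern.
    candidate = "M" + "A" * 19 + "G" * 10 + "RSLR" + "G" * 10 + "EER" + "G" * 20
    control = "M" + "A" * 19 + "G" * 60
    print(has_rxlr_eer(candidate), has_rxlr_eer(control))
```

This sketch does not itself predict a signal peptide; in the workflow described above, the input would already be restricted to proteins with a predicted signal peptide.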

WY-domain predictions

The WY-domains of predicted RXLR-like proteins were extracted using a Pfam search, MEME [60], PSIPRED [61], and HMM, as described previously [38]. First, conserved motifs annotated as RXLR by the Pfam search (Additional file 19) were searched using MEME with the following parameters: -protein -oc . -nostatus -time 18000 -maxsize 60000 -mod zoops -nmotifs 5 -minw 6 -maxw 50. The protein secondary structure was predicted using PSIPRED (http://bioinf.cs.ucl.ac.uk/psipred/). From the MEME results, motif 1 included repeating WLY sequences and spanned an α-helical fold (Additional file 20: Fig. S5). We used sequences including motif 1 for the manual HMM search as a WY-domain. After training the HMM, the RXLR-like proteins were searched using HMMER v3.1 [58] with an e-value cut-off of 0.05 (Additional file 18).

Expression profiling

Expression levels of predicted genes were determined using the TopHat2/Cufflinks pipeline [24, 25]. Differential expression was evaluated by the Fisher’s exact test using the edgeR package (version 3.18.1) [30]. TPM was calculated by the following formula: TPM = (FPKM / (sum of FPKM over all transcripts)) × 10^6. Clustering by Ward’s method was performed using R Commander [62]. Clustering by the logFC-Cosine method was performed using the cosine similarity of the vectors of logFC values calculated by edgeR. Clustering by the model-based clustering method was performed using the MBCluster.Seq package (version 1.0) [31]. Expression levels of putative pathogenicity genes are shown in Additional files 14, 15, 16, and 21.
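The FPKM-to-TPM conversion stated above can be written as a short Python sketch. The function name and the example values are assumptions for illustration; the gene identifiers are reused from the text only as labels and the FPKM values are invented.

```python
def fpkm_to_tpm(fpkm_by_transcript):
    """Convert FPKM values to TPM: TPM_i = FPKM_i / sum(FPKM) * 10^6."""
    total_fpkm = sum(fpkm_by_transcript.values())
    if total_fpkm <= 0:
        raise ValueError("sum of FPKM must be positive")
    return {
        transcript: fpkm / total_fpkm * 1e6
        for transcript, fpkm in fpkm_by_transcript.items()
    }


if __name__ == "__main__":
    # Invented FPKM values for one sample.
    sample_fpkm = {"SG00816": 1.0, "SG05345": 3.0}
    print(fpkm_to_tpm(sample_fpkm))
```

By construction the TPM values of one sample sum to 10^6, which makes them comparable across samples in a way raw FPKM values are not.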

qRT-PCR analysis

cDNA was synthesized using ReverTra Ace® (Toyobo, Osaka, Japan). The qRT-PCR was performed using a StepOne™ real-time PCR instrument (Applied Biosystems, Foster City, CA, USA) with 10 μL reaction mixtures containing 0.5 μL cDNA, 5 μL of KAPA SYBR FAST Universal 2X qPCR Master Mix (Kapa Biosystems, Wilmington, MA, USA), 0.3 μL of each gene-specific primer (0.1 mM), and 1.9 μL ddH2O under the following reaction conditions: 95 °C for 20 s, followed by 40 cycles of denaturation at 95 °C for 3 s, and annealing and extension at 60 °C for 30 s. Finally, melt curve analyses (from 60 to 95 °C) were included at the end to ensure the consistency of the amplified products. The comparative CT (ΔΔCT) method uses an endogenous control to determine the quantity of target in a sample relative to the quantity of target in a reference sample. The histone H2A gene (SG05345) was used as the internal control. The primer sequences are provided in Additional file 22.
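The comparative CT calculation can be illustrated with the following Python sketch. It uses the common 2^(−ΔΔCT) form, which assumes close to 100% amplification efficiency for both the target and the control gene; that assumption, the function name, and the example CT values are introduced here and are not stated in the text.

```python
def relative_expression(ct_target_sample, ct_control_sample,
                        ct_target_reference, ct_control_reference):
    """Relative quantity of a target by the comparative CT method.

    delta CT       = CT(target) - CT(endogenous control), per sample
    delta-delta CT = delta CT(sample) - delta CT(reference sample)
    result         = 2 ** (-delta-delta CT), assuming ~100% PCR efficiency
    """
    delta_ct_sample = ct_target_sample - ct_control_sample
    delta_ct_reference = ct_target_reference - ct_control_reference
    delta_delta_ct = delta_ct_sample - delta_ct_reference
    return 2.0 ** (-delta_delta_ct)


if __name__ == "__main__":
    # Invented CT values: target gene versus the histone H2A control,
    # in an infected-leaf sample and in the reference sample.
    print(relative_expression(22.0, 18.0, 25.0, 18.0))
```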

Ploidy analysis

The ploidy level was estimated as described previously [63]. Paired-end reads were mapped to the assembled genome using BWA. SNPs with at least 10 × coverage were counted using samtools v0.1.18.

Heterozygosity

To calculate heterozygosity, paired-end reads were mapped to the assembled genome using BWA. The SNPs were counted using samtools v0.1.18. SNPs with an allele frequency of between 0.4 and 0.6 were counted as heterozygous.
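The allele-frequency filter described above can be sketched in Python as follows. The input format (pairs of reference and alternative read counts per SNP), the function name, and the default minimum depth of 10 (borrowed from the ploidy analysis) are assumptions made for illustration; the actual counting was done with samtools.

```python
def count_heterozygous_snps(allele_counts, min_depth=10, low=0.4, high=0.6):
    """Count SNPs whose alternative-allele frequency lies in [low, high].

    allele_counts is an iterable of (reference_reads, alternative_reads)
    pairs, one per SNP. Sites with fewer than min_depth reads are skipped.
    """
    heterozygous = 0
    for reference_reads, alternative_reads in allele_counts:
        depth = reference_reads + alternative_reads
        if depth == 0 or depth < min_depth:
            continue
        frequency = alternative_reads / depth
        if low <= frequency <= high:
            heterozygous += 1
    return heterozygous


if __name__ == "__main__":
    # Invented read counts for four SNP sites.
    sites = [(10, 10), (19, 1), (3, 3), (11, 9)]
    print(count_heterozygous_snps(sites))
```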

Domain search for S. italica jacalin-like proteins

S. italica proteins were downloaded from the foxtail millet database of the Beijing Genome Initiative [64]. Jacalin-like domain-containing proteins were identified using InterProScan 5.15–54.0 [56] and the S. italica jacalin-like proteins were annotated using the HMMER web server [65].