Background

Phryma leptostachya L. is a perennial herb that is widely distributed in the Himalayas, temperate Asia, and eastern North America and has both medicinal and agricultural uses [1,2,3]. As a traditional natural insecticide with striking insecticidal activity, this plant has been used to repel mosquitoes and flies in East Asia [4, 5]. Previous investigations have shown that the main insecticidal active ingredients in P. leptostachya are furofuran lignans [1, 6, 7]. For example, haedoxan A (HA) exhibits high insecticidal effectiveness against a wide variety of pests, such as Culex pipiens pallens [7], Mythimna separata [4], Aedes albopictus, and Aedes aegypti [6, 8, 9]. (+)-Phrymarolins I and II ((+)-P-I and P-II) have the same furofuran skeleton as HA and show considerable synergistic activity with HA, pyrethrin, or carbamate pesticides [6, 10]. Consequently, haedoxans and phrymarolins are likely to serve as the main insecticidal ingredients in new botanical pesticides. However, because their contents in the plant are extremely low and their chemical synthesis is difficult [11,12,13,14], a better understanding of the biosynthetic pathways of furofuran lignans in P. leptostachya would provide a potential route to their application.

Coniferyl alcohol, one of the monolignols generated from the phenylpropanoid pathway, is dimerized to produce furofuran lignans [15, 16]. Then, a pair of methylenedioxy bridges is formed, followed by oxidation, methylation, and acetylation [17,18,19]. Coniferyl alcohol is therefore the monomeric building block of furofuran lignans and can significantly influence their composition and types. In groundbreaking research on the coupling of coniferyl alcohol, Davin et al. [20] found that in the presence of an oxidase (peroxidase or laccase) or a one-electron oxidant, coniferyl alcohol molecules could be stereoselectively coupled into (+)-pinoresinol by a dirigent protein (DIR).

The name DIR comes from the Latin word dirigere, which means to align or guide. The first DIR protein was discovered in Forsythia intermedia [20]. Ferns, gymnosperms, and angiosperms were subsequently found to contain this kind of protein [21,22,23]. DIR genes often occur as gene families; for example, 25, 49, 44, 45, 29, and 19 DIRs have been found in Arabidopsis thaliana, Oryza sativa, Linum usitatissimum, Medicago truncatula, Brassica rapa, and Isatis indigotica, respectively [21, 24,25,26,27,28]. According to Ralph et al. [21], six subfamilies of DIR proteins (DIR-a, DIR-b/d, DIR-c, DIR-e, DIR-f, and DIR-g) are recognized based on the lignan spatial structures they mediate and their evolutionary relationships. The DIR-a subfamily is thought to play a role in the production of pinoresinol, whereas the roles of the other subfamily members remain unknown. As a result, DIRs that do not belong to the DIR-a subfamily are referred to as DIR-like [20, 29].

By inhibiting microbe-derived degradative enzymes and forming a barrier against microbial pathogens, lignans play significant roles in plant pathogen defense. Therefore, by regulating the monolignol coupling associated with lignan biosynthesis, DIRs improve plant stress resistance [16, 23, 30]. Numerous biotic and abiotic stressors can activate DIR genes. For example, DIR genes can be induced by pathogen infection, including Fusarium solani in soybean [31], Colletotrichum gloeosporioides in Physcomitrella patens [32], Erysiphe necator in Vitis vinifera [33], and Verticillium dahliae in cotton [34]. Likewise, ScDIR in sugarcane [35], OsDIRs and ShDJ in rice [36, 37], BrDIRs in Brassica [28], BhDIR1 in Boea hygrometrica [38], and CsDIR16 in cucumber [39] have been shown to respond to abiotic stresses such as salt, drought, high/low temperature, pesticide residue, waterlogging, and H2O2. In addition, DIR genes can be modulated by hormone signals, such as salicylic acid (SA), ethylene (ETH), methyl jasmonate (MeJA), and abscisic acid (ABA) [40].

DIR genes participate in many physiological processes in plants, and exploring their functions helps to elucidate lignan biosynthesis and metabolic pathways. Because the DIR gene family in P. leptostachya has not been studied in detail, our work aims to broaden current knowledge of the functions of PlDIRs. Here, a transcriptome-wide analysis of the DIR family in P. leptostachya was performed, including sequence characterization, phylogenetics, motif analysis, and tertiary structure analysis. We also investigated the expression patterns of PlDIRs in different tissues and explored their responses to signaling molecules. Furthermore, the function of PlDIR1 as a (+)-pinoresinol-forming protein was revealed by analyzing the catalytic activity of its recombinant protein and the results of molecular docking. These findings will help to clarify the functions of PlDIRs and lay the groundwork for understanding the biosynthetic pathways of furofuran lignans and for metabolic engineering in P. leptostachya.

Results

Identification and sequence analysis of DIR genes in P. leptostachya

The members of the P. leptostachya DIR gene family were identified by screening the transcriptome sequencing data of P. leptostachya (accession no. PRJNA551634). After redundant, overlapping, incomplete, and repeated sequences were rejected, 15 DIR gene sequences with complete open reading frames (ORFs) were obtained and named PlDIR1-15. Their conserved DIR domains (PF03018) were analyzed with the Pfam (http://pfam.xfam.org/search) and SMART (http://smart.embl-heidelberg.de/) programs. The analysis results for these genes are shown in Table 1. The predicted ORFs of the 15 DIR genes ranged from 543 (PlDIR13) to 609 (PlDIR4) bp, with amino acid lengths of 181–203 aa. The molecular weight (MW) of the PlDIRs was between 19.78 and 22.16 kDa. The predicted isoelectric point (pI) values varied widely (4.43–10.13), and eight members had an alkaline pI (pI > 7.0). Furthermore, except for PlDIR9, 13, and 15, most of the PlDIRs had a signal peptide of 20–30 aa at the N-terminus.

Table 1 Sequence analysis of 15 DIRs in P. leptostachya
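As a quick consistency check on the ORF and amino acid lengths reported above, the relation between the two can be sketched in Python. The function name is hypothetical, and the assumption that the reported lengths count every codon of the ORF (including the stop codon) is ours; it is inferred only from the arithmetic 543/3 = 181 and 609/3 = 203, not stated in the text.

```python
def codons_in_orf(orf_bp: int) -> int:
    """Return the number of codons in an ORF of the given length (bp).

    Assumption (ours): the ORF length includes the stop codon, so the
    codon count is simply the length divided by three.
    """
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    return orf_bp // 3


shortest = codons_in_orf(543)  # shortest reported ORF (PlDIR13)
longest = codons_in_orf(609)   # longest reported ORF (PlDIR4)
print(shortest, longest)
```

If the stop codon is excluded from the protein length, each value would be one residue lower.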

Using the WoLF PSORT and CELLO subcellular localization software, PlDIRs were predicted to be located mainly in the chloroplast, the plasma membrane, and the extracellular space. A total of 10 PlDIRs (PlDIR1-4, 6–8, 10, 13, and 14) were located in the chloroplast. Among these, five PlDIRs (PlDIR2, 3, 4, 6, and 13) were also located in the plasma membrane, three (PlDIR1, 8, and 14) were also located in the extracellular space, and PlDIR10 was also located in mitochondria. Three PlDIRs (PlDIR5, 11, and 12) were distributed extracellularly, and two of them (PlDIR5 and 11) were also located in the plasma membrane. In addition, PlDIR9 and PlDIR15 were distributed in the cytoplasm and were also located in the nucleus and the plasma membrane, respectively (Table 1).

Phylogenetic analysis and classification of PlDIRs

To further group the PlDIRs and predict their potential functions from well-studied DIRs in other plants, a phylogenetic tree was constructed with the amino acid sequences of PlDIR1-15 and 97 previously characterized DIRs from A. thaliana, O. sativa, I. indigotica, and other selected plant species. A total of 112 DIR/DIR-like proteins were categorized into six well-conserved subfamilies: DIR-a, b/d, c, e, f, and g (Fig. 1A). PlDIR members were grouped into four subfamilies (DIR-a, b/d, e, and g). Because DIR-c is a monocot-specific subfamily, no proteins from our study clustered into this group. Ten PlDIRs (PlDIR3/4/5/6/7/8/9/11/12/14) were uniquely clustered into subfamily DIR-g. Two PlDIRs (PlDIR1/2) were clustered into DIR-a with two F. intermedia, eight Thuja plicata, six O. sativa, five A. thaliana, and four I. indigotica proteins. Another two PlDIRs (PlDIR10/15) were clustered into DIR-b/d with fourteen A. thaliana, seven I. indigotica, two Gossypium barbadense, and one O. sativa protein. PlDIR13 was clustered into DIR-e with eight I. indigotica, six A. thaliana, and two O. sativa proteins.

Fig. 1

Phylogenetic relationships of DIRs from P. leptostachya and other plant species. A. Phylogenetic tree of 15 DIRs from P. leptostachya and other DIRs. Different groups of DIRs are indicated by different colors. PlDIRs are written in red and labeled with a red star. B. Circoletto radial diagram with ribbons connecting the PlDIRs and DIR orthologs in DIR-a subfamily. The colors of the ribbons are relative to the best BLAST alignment score, with matches within 80% of the best match as red, within 60% as orange, and within 40% as green. Light grey (PlDIRs) and dark grey bands on the periphery of the diagram represent the protein sequences, with the start and end of the sequence shown as green and red blocks, respectively. Ribbons representing the best hits are outlined and placed on top of all other ribbons

Previous studies have demonstrated that the DIR-a group members AtDIR5/6, TpDIR5/8, and FiDIR1 are involved in the formation of (±)-pinoresinol [20, 41, 42]. PlDIR1 and PlDIR2 belong to the DIR-a subfamily. To predict the potential functions of these two proteins, Circoletto was used to identify and visualize the sequence similarities between them and other members of DIR-a. PlDIR1 and PlDIR2 exhibited the highest sequence identity with FiDIR1 and FiDIR2, suggesting that they play roles in lignan biosynthesis (Fig. 1B). In addition, a comparison of PlDIR protein sequences showed that protein similarity ranged from 18.6 to 93.6%, indicating functional diversity among PlDIRs. The sequence similarity between PlDIR1 and PlDIR2 in the DIR-a subfamily was exceptionally high, at 93.6%, whereas the DIR-e member PlDIR13 exhibited low sequence similarity with other PlDIR proteins (Additional file 1: Table S1).

Protein characterization and tertiary structures of PlDIRs

Twelve conserved motifs of PlDIR proteins were identified by MEME; the details are listed in Additional file 2: Table S2, and a schematic diagram was drawn to characterize the structural diversity of the DIR proteins (Fig. 2A). Each PlDIR protein contained 3–7 conserved motifs. The highly conserved motifs 1–3 were found in all subgroups and were present in all fifteen sequences. Motifs 4–6 were found in ten, nine, and eight proteins, respectively, excluding the members of the DIR-b/d and g subfamilies. The majority of PlDIR members belonging to the same subfamily shared certain conserved motifs, illustrating functional conservation within subfamilies as well as variety between distinct subfamilies. For example, motifs 7 and 8 were found only in the DIR-g subfamily, and motifs 9–12 were specifically present in the DIR-a subfamily. In addition, domain position analysis revealed that all of the conserved DIR domains were located close to the C-terminus of the corresponding proteins (Fig. 2B).

Fig. 2

Distribution of conserved motifs and domains of PlDIRs. A. Distribution of conserved motifs in PlDIRs. Twelve putative motifs are shown in different colored boxes. The sequence information for each motif is provided in Additional file 2: Table S2. B. The position of the conserved DIR domain in each PlDIR protein

In addition, to build the 3D structures of the 15 PlDIR proteins, the monomer of the pea (Pisum sativum) DIR protein PsDRR206 (C4REV.A), which is associated with (+)-pinoresinol and shares 25–59% sequence identity with the PlDIRs, was used as the template [43]. As Fig. 3 shows, after comparing and merging PlDIRs with PsDRR206, the 3D structures of PlDIR1 and 2 could be well superimposed on PsDRR206, indicating their similarity in structure and possibly in function.

Fig. 3

Predicted tertiary structures of PlDIR proteins. The prediction of PlDIRs was compared and merged with PsDRR206 (associated with (+)-pinoresinol)

Expression patterns of PlDIR genes in different tissues

Because the transcript abundance of a gene can reflect its function to a certain degree, the relative expression levels of the 15 PlDIRs were analyzed in root, stem, leaf, flower, and seed tissues by quantitative real-time reverse transcription-PCR (qRT-PCR). The results are presented as P. leptostachya cartoon heatmaps (Fig. 4), and the expression trends are clustered in Fig. 4B. Based on the heatmap analysis, most PlDIRs had a comparatively higher transcript abundance in roots, leaves, and stems than in seeds and flowers. Eight PlDIRs (PlDIR1/2/4/5/7/9/13/14) displayed the highest transcript abundance in root tissues, and a higher level of expression was observed in leaf tissue for five PlDIRs (PlDIR3/6/10/12/15). PlDIR8 and 11 showed specifically higher transcript abundance in the stems, and PlDIR7 was the only one that showed an accumulated expression level in seeds. However, all of these genes were hardly expressed in flowers.

Fig. 4

Expression patterns of PlDIR genes in various tissues (root, stem, leaf, flower, and seed). A. Diagram showing the different tissues of the P. leptostachya plant. B. The heatmap was drawn by TBtools using mean values. C. The expression patterns of PlDIR genes are presented by a cartoon heatmap. The data were normalized with the expression level of Pl5.8S rRNA in the root by the 2^−ΔΔCt method. Orange represents a high expression level and blue represents a low expression level

Expression responses of PlDIR genes to signaling molecules

Based on the tissues in which PlDIRs were more highly expressed, roots and leaves of P. leptostachya were selected to further analyze the response patterns of PlDIR genes to three stress-related signaling molecules (MeJA, SA, and ETH) at 0, 6, 12, and 24 h by qRT-PCR (Figs. 5, 6 and 7). The results showed that the relative expression levels of most PlDIRs were upregulated, but the response time and fold upregulation varied. For MeJA treatment, the response patterns of PlDIR1/2/9 were similar: their relative expression in leaves was higher than in roots, and it reached a maximum at 6 h in roots. PlDIR10, PlDIR11, and PlDIR14 had similar expression profiles; they were suppressed in roots and more sensitive to MeJA in leaves. In addition, eight of the PlDIRs showed more pronounced responses in roots than in leaves. Among them, PlDIR4 and PlDIR5 showed higher expression levels at 12 h in roots; PlDIR3/6/7/12/15 reached a maximum at 24 h, with expression up-regulated by more than 6-, 8-, 9-, 20-, and 10-fold, respectively (Fig. 5).

Fig. 5

Relative expression level of PlDIR genes under MeJA treatment. The data were normalized with the expression level of 0 h by the 2^−ΔΔCt method. Error bars represent the mean ± standard deviation (SD) of 3 biological replicates. Red in the heatmap represents a high expression level, and blue represents a low expression level

Fig. 6

Relative expression level of PlDIR genes under SA treatment. The data were normalized with the expression level of 0 h by the 2^−ΔΔCt method. Error bars represent the mean ± standard deviation (SD) of 3 biological replicates. Red in the heatmap represents a high expression level, and blue represents a low expression level

Fig. 7

Relative expression level of PlDIR genes under ETH treatment. The data were normalized with the expression level of 0 h by the 2^−ΔΔCt method. Error bars represent the mean ± standard deviation (SD) of 3 biological replicates. Red in the heatmap represents a high expression level, and blue represents a low expression level

After SA treatment, 12 of the 15 PlDIRs reached their maximum expression at 24 h in leaves; the exceptions were PlDIR3, PlDIR6, and PlDIR12, which reached their expression peaks at 24 h in roots. PlDIR4/5 responded strongly to SA in leaves compared with other genes, with upregulation of more than 20,000-fold. Moreover, the expression of PlDIR1/2/8/10/11/13/14/15 was suppressed in roots at all the tested time points, and PlDIR4/5/7/9 had similar expression profiles, increasing by more than sixfold in roots at 24 h (Fig. 6).

The ETH treatment induced the expression of PlDIR6/7/15 in roots much faster than in leaves, with peaks at 12 h; PlDIR6/7 in particular increased by more than 60-fold. PlDIR11 and PlDIR12 were then upregulated more than 12-fold at 24 h. In contrast, PlDIR1, PlDIR2, PlDIR10, and PlDIR14 were down-regulated at different time points in the roots. After ETH treatment, the expression of PlDIR1/2/4/5/8/9/13 in leaves was much higher than that in roots and, except for PlDIR13, reached a maximum at 6 h or 12 h (Fig. 7).

Amino acid sequence alignments of DIR-a subfamily

Phylogenetic analysis revealed that two PlDIRs (PlDIR1 and PlDIR2) were DIR-a subfamily members, and their amino acid sequences were aligned to determine whether any hypothetical functions could be inferred (Fig. 8). According to previous studies, the DIR-a subfamily members AtDIR5 and AtDIR6 from A. thaliana and LuDIR5 and LuDIR6 from L. usitatissimum guide E-coniferyl alcohol to form (-)-pinoresinol [41, 44]. However, in the presence of PsDRR206 from P. sativum, FiDIR1 from F. intermedia, TpDIR5 and TpDIR8 from T. plicata, or LuDIR1, the final product of E-coniferyl alcohol is the enantiomer (+)-pinoresinol [20, 42,43,44]. As shown in Fig. 8, an eight-stranded antiparallel β-barrel (black arrow) and two N-glycosylation sites at aa 74 and 144 (Asn; green circles) were present in all protein sequences. Strictly conserved residues (pink box) are shared by all characterized pinoresinol-forming DIRs [45, 46]. Five differentially conserved residues at aa 119, 137, 139, 141, and 154 were involved in forming (+)-pinoresinol or (-)-pinoresinol (red triangle). Furthermore, conserved residues at aa 79, 92, 96, 160, 167, 185, and 200 (blue triangle) are involved in forming (-)-pinoresinol, whereas the residue at aa 159 (purple triangle) is involved in forming (+)-pinoresinol. The key amino acid residues of PlDIR1 and 2 are highly consistent with those of the (+)-pinoresinol-forming DIR proteins, indicating that they function in the generation of (+)-pinoresinol rather than (-)-pinoresinol.

Fig. 8

The alignment of PlDIR1 and PlDIR2 protein sequences with other (+)- and (-)-DIRs. Only full-length sequences are shown. Strictly conserved residues are boxed in pink. Those that are conserved in (+)- and (-)-DIRs are highlighted by purple and blue triangles, respectively. Red triangles indicate residues that are differentially conserved in (+)- and (-)-DIRs. N-glycosylation sites are shown by green circles. Secondary structure elements are shown above the alignment

Heterologous expression and catalytic activities of recombinant PlDIR1 protein

After induction for 6–15 h at 16 °C with 0.1 mM IPTG, the recombinant PlDIR1 protein, with a MW of around 20.97 kDa, was maximally expressed in E. coli BL21 (DE3) at 12 and 15 h. The recombinant protein with an N-His6-tag was purified on a Ni–NTA affinity column and verified by SDS-PAGE and Western blotting (Fig. 9A, B and Additional file 3: Fig. S1). Western blot detection showed that PlDIR1 was specifically recognized by anti-His-tag antibodies. One single immunoreactive band was detected for the recombinant PlDIR1 protein, and no such band was found for the empty vector pET-29a(+) (Fig. 9B). Then, in vitro enzyme activity assays were conducted to determine the potential catalytic activity of the recombinant protein, and the reaction products were analyzed by LC–MS/MS. As seen in Fig. 9C, when substrate and laccase (Lac) protein were provided to recombinant PlDIR1, a peak at 6.78 min was observed (m/z 150.5–151.5), which was identical to the peak observed in chromatograms generated from standard (+)-pinoresinol (m/z 150.5–151.5). However, in reaction samples without PlDIR1 (containing E-coniferyl alcohol and Lac) or without the substrate E-coniferyl alcohol (containing PlDIR1 and Lac), no such peak was detected. Moreover, the ion fragments observed in the mass spectrum for the peak appearing at 6.78 min in the PlDIR1-containing assays (Fig. 9E) were consistent with the fragmentation of the (+)-pinoresinol standard (Fig. 9D). These results indicate that PlDIR1 could mediate the conversion of E-coniferyl alcohol to (+)-pinoresinol with the help of Lac.

Fig. 9

Analysis of PlDIR1 dirigent activity by LC–MS/MS. A. Expression of recombinant PlDIR1 protein in E. coli BL21 (DE3) was induced using 0.1 mM IPTG at 16 °C for 6–15 h and purified from the soluble fraction of the induced cells using resin with an affinity for the His-Tag. M: protein marker; Lane 1: the empty vector pET-29a(+) expressed in E. coli BL21 (DE3) and induced by IPTG for 6 h; Lanes 2–5: pET29a(+)-PlDIR1 induced by IPTG for 6, 9, 12, and 15 h; Lane 6: purified soluble PlDIR1 protein. B. Western Blot assay of the recombinant PlDIR1 protein. M: protein marker; Lane 1: the empty vector pET-29a(+) expressed in E. coli BL21 (DE3) and induced by IPTG for 6 h; Lane 2: purified soluble PlDIR1 protein. C. LC–MS/MS analysis of (+)-pinoresinol in the catalytic product of the recombinant PlDIR1 protein. Extracted ion chromatograms show the intense peak of standard (+)-pinoresinol or catalytic product of PlDIR1 at m/z = 150.5–151.5. D, E. Mass spectra of standard (+)-pinoresinol and catalytic product of PlDIR1. RT, retention time; AA, area

The correlation between hormone-induced lignan accumulations and PlDIR1 expression profile in P. leptostachya roots

The reaction mediated by PlDIR1 lies upstream in the P. leptostachya lignan biosynthesis pathway. To determine the relationship between PlDIR1 and other metabolites in this pathway, a correlation analysis using Pearson’s correlation coefficient (PCC) was performed to identify possible correlations between PlDIR1 expression and the investigated metabolites under hormone treatments at different time points. These data were visualized as a heat map. For this purpose, the accumulation of five key lignan compounds (leptostachyol acetate, LA; 6-demethoxy-leptostachyol acetate, 6-demethoxy-LA; P-I, P-II, and HA) in P. leptostachya roots was first analyzed by HPLC, since they are mainly stored in root tissue. After MeJA treatment, only HA showed a slight accumulation at 6 and 12 h; the contents of the remaining metabolites were reduced significantly compared to the control and reached the lowest levels at 24 h (Fig. 10A). Given that the correlation coefficients between PlDIR1 transcript levels and the accumulation of the five metabolites were 0.63, 0.40, 0.05, 0.15, and 0.64, respectively, PlDIR1 was correlated with LA and HA, but not or only minimally correlated with 6-demethoxy-LA, P-I, and P-II (Fig. 10B). A similar trend was found for lignan accumulation under SA treatment (Fig. 10C), but highly significant correlations (P < 0.01) were found between PlDIR1 expression profiles and the contents of 6-demethoxy-LA, P-I, and P-II, with correlation coefficients of 0.87, 0.87, and 1, respectively (Fig. 10D). Different results were obtained after ETH treatment: four metabolites (LA, 6-demethoxy-LA, P-I, and P-II) showed their highest abundance at 12 h to varying degrees, while HA content was not influenced (Fig. 10E). Moreover, the expression of PlDIR1 was not related to the metabolites induced by ETH, as revealed by the PCC analysis in Fig. 10F.

Fig. 10

The correlation analysis between hormone-induced lignan accumulations and PlDIR1 expression profile. A, C, E. Effect of MeJA, SA, ETH treatments on the accumulation of five key lignans in P. leptostachya roots, respectively. B, D, F. The Pearson’s correlation coefficient between lignan contents and PlDIR1 expression profile under MeJA, SA, ETH treatment, respectively. The number -1 to 1 indicates the correlation from low to high. Asterisks indicate the significant correlation (*P < 0.05, **P < 0.01, Student’s t-test); FW, fresh weight
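The Pearson correlation used in this analysis can be illustrated with a minimal Python sketch. The function name and the numerical values below are hypothetical placeholders chosen for illustration; they are not the measured expression levels or lignan contents from this study.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length (>= 2 points)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of deviations from the mean
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values at four time points (0, 6, 12, 24 h)
expression = [1.0, 2.0, 3.0, 4.0]   # relative transcript level
lignan = [2.0, 4.1, 5.9, 8.0]       # metabolite content
r = pearson_r(expression, lignan)
print(round(r, 3))
```

With only four time points per treatment, a coefficient of this kind is sensitive to single measurements, so it is best read alongside the reported significance levels.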

Docking analysis of substrate interactions

To examine the structure–function relationships underlying the (+)-pinoresinol formation activity of PlDIR1, a molecular docking analysis was performed to gain insight into the potential reaction mechanism. The homology model for PlDIR1 was generated based on the crystal structure reported for PsDRR206 [43]. Two pockets (A and B) of different sizes were found at the open end of the barrel of PlDIR1 (Fig. 11B), which are presumed to bind two substrate molecules. According to the docking studies for AtDIR6 and PsDRR206, the putative substrate for PlDIR1 is a reactive radical species, so it is difficult to obtain protein-substrate complexes directly. Accordingly, bisQM, the putative intermediate in pinoresinol formation that follows (CA·) radical coupling and precedes cyclization of the furan rings, was used as the ligand for the docking analysis (Fig. 11A). After the docking runs, the pose with the lowest energy and the greatest number of interactions was selected as the final result. Important amino acids are present in the active site of PlDIR1, as previously reported for the structure and shown in Fig. 11C. Asp-42, Leu-44, Asn-52, Thr-54, Tyr-103, Tyr-105, Gly-112, Ala-113, Trp-114, Leu-115, Leu-138, Asn-140, Lys-141, Arg-143, Thr-165, Ser-167, Phe-174, and Leu-176 were the PlDIR1 amino acid residues that interacted with bisQM, and four hydrogen bonds were formed between bisQM and Asp-42, Ala-113, Leu-138, and Arg-143 (Fig. 11D).

Fig. 11

Molecular docking model for PlDIR1 protein with the proposed reaction intermediate for (+)-pinoresinol. A. Putative biosynthesis mechanism to afford (+)-pinoresinol. B. Surface representation of PlDIR1 showing the pockets A and B of the active sites. C. View of the active site showing important residues within the binding pockets. D. Potential binding mode of bisQM (red molecule) in the active site of PlDIR1. Conformation and position were optimized by energy minimization after manual placement of the ligand. Possible hydrogen bonds are indicated with yellow dotted lines

PlDIRs co-expression analysis with genes involved in lignan biosynthesis

To deepen the understanding of the P. leptostachya lignan biosynthesis pathway, PlDIRs and 108 lignan synthesis-related genes chosen from the P. leptostachya transcriptome were subjected to a co-expression analysis, and the resulting network was visualized with Cytoscape software. The selected genes are listed in Additional file 4: Table S3, and a schematic biosynthetic pathway is proposed in Additional file 5: Fig. S2 to give insight into their positions and potential roles. As shown in Fig. 12, a total of 87 co-expressed genes showed a correlation coefficient greater than 0.7 with at least one other gene.

Fig. 12

Co-expression correlations of genes involved in lignan biosynthesis. Edges are drawn when the linear correlation coefficient is > 0.7. Red rectangles represent PlDIRs; green circles represent characterized CADs; CYP81Q38 and PlDIR1 that might be involved in sesamin biosynthesis are marked with red circles. PAL, phenylalanine ammonia-lyase; C4H, trans-cinnamate 4-monooxygenase; C3H, p-coumarate 3-hydroxylase; COMT, caffeic acid 3-O-methyltransferase; 4CL, 4-coumarate-CoA ligase; CCoAOMT, caffeoyl-CoA O-methyltransferase; CCR, cinnamoyl-CoA reductase; CAD, cinnamyl-alcohol dehydrogenase; DIR, dirigent protein

A strong correlation was found between PlDIRs and lignan synthesis genes. Peroxidase and laccase genes (POXs and LACs), which are potentially involved in monolignol oxidation, were co-expressed with PlDIRs as follows: PlDIR1 with POX3 and POX15, PlDIR2 with POX9 and LAC3, PlDIR4 with POX2/4/5/7/9/10 and LAC2/3, PlDIR5 with POX2/4/5/7/9/10/14 and LAC2/3, PlDIR7 with POX2/4/5/7/10/14 and LAC2, PlDIR8 with POX11/13, PlDIR9 with POX11 and LAC4, PlDIR11 with POX11, PlDIR13 with POX6, and PlDIR14 with POX15. Furthermore, according to the lignan biosynthesis pathway, the genes encoding trans-cinnamate 4-monooxygenase (C4Hs), p-coumarate 3-hydroxylase (C3Hs), caffeic acid 3-O-methyltransferase (COMTs), 4-coumarate-CoA ligase (4CLs), caffeoyl-CoA O-methyltransferase (CCoAOMTs), cinnamoyl-CoA reductase (CCRs), and cinnamyl alcohol dehydrogenase (CADs) lie upstream of PlDIRs, whereas PlCYP81Q38, a reported CYP450 gene that catalyzes (+)-sesamin biosynthesis from (+)-pinoresinol, lies downstream [18]. The results indicated that PlDIRs were co-expressed with genes in the lignan biosynthesis pathway that encode the enzymes for two or three consecutive reactions. For example, PlDIR1 was predicted to co-express with CCR3/4, CAD5, and PlCYP81Q38; PlDIR3/8/9/14/15 with CCRs and CADs; PlDIR4/5/7 with CCoAOMTs, CCR12, and CAD4; PlDIR10 with C3H1 and COMT2; and PlDIR13 with 4CLs, CCoAOMTs, CCRs, and CADs (Fig. 12).
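The edge rule used for the co-expression network (an edge is drawn when the linear correlation coefficient exceeds 0.7) can be sketched as follows. This is only an illustration under stated assumptions: the gene names and expression profiles are hypothetical, the function names are ours, and only positive correlations above the threshold are kept, which is our reading of the "> 0.7" criterion. The network in this study was visualized in Cytoscape, which this sketch does not reproduce.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length series."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coexpression_edges(profiles, threshold=0.7):
    """Return (gene_a, gene_b, r) for every gene pair with r > threshold."""
    edges = []
    for gene_a, gene_b in combinations(sorted(profiles), 2):
        r = pearson_r(profiles[gene_a], profiles[gene_b])
        if r > threshold:
            edges.append((gene_a, gene_b, round(r, 3)))
    return edges


# Hypothetical expression profiles across four samples
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, 2.1, 2.9, 4.2],   # rises with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],   # falls as geneA rises
}
edges = coexpression_edges(profiles)
print(edges)
```

An edge list of this form can be written to a tab-separated file and imported into a network viewer as a source, target, and weight table.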

Discussion

Our understanding of plant growth and development in many plant species has considerably advanced as a result of the characterization of DIR genes over the past few decades. Plant DIR proteins are involved in both abiotic and biotic stress responses. They were first discovered for regio- and stereo-selective coupling in the process of lignan biosynthesis. Large multigene families made up of DIRs have been found in terrestrial plants, and different plant species have varying numbers and types of DIRs. For example, 35 DIRs have been identified in spruce (Picea spp.) [21], 25 DIRs in Arabidopsis [21], 19 DIRs in I. indigotica [

Characterization analysis of PlDIR proteins

The putative signal peptide sequence of PlDIR proteins was predicted at the SignalP 4.1 server (http://www.cbs.dtu.dk/services/SignalP/), and the conserved motifs in the deduced PlDIR protein sequences were analyzed using the MEME tools (http://memesuite.org/) with default parameters [69]. Theoretical MW and pI were assessed through the ExPASy ProtParam website (http://www.expasy.org/tools/protparam.html). N-glycosylation sites (Asn) were searched online using the NetNGlyc 1.0 server (https://www.cbs.dtu.dk/services/NetNGlyc/). In addition, WoLF PSORT (https://wolfpsort.hgc.jp/) and CELLO v.2.5 (http://cello.life.nctu.edu.tw/) were used to predict the subcellular locations of PlDIRs [70].
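To make the notion of a candidate N-glycosylation site concrete, the following Python sketch scans a protein sequence for the canonical N-X-S/T sequon (X being any residue except proline). This is a plain motif scan written for illustration, with a hypothetical function name and a made-up toy sequence; it is not the NetNGlyc prediction method, which additionally scores the sequence context of each sequon.

```python
import re

# Lookahead pattern so that overlapping sequons are all reported:
# Asn, then any residue except Pro, then Ser or Thr.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein: str) -> list:
    """Return 1-based positions of Asn residues in N-X-S/T sequons."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


# Toy sequence (not a PlDIR sequence): N3 and N12/N13 sit in sequons,
# while N8 is followed by Pro and is therefore skipped.
toy_sequence = "MANGSKLNPTQNNTS"
print(find_sequons(toy_sequence))
```

A motif scan of this kind tends to over-report sites, since not every sequon is glycosylated in vivo.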

Phylogenetic analysis and multiple sequence alignment

To characterize the phylogenetic relationships between PlDIRs and DIR proteins from other plant species, MEGA 7.0 with the neighbor-joining method was used to construct a phylogenetic tree with default parameters [71]. In addition, two DIR-a proteins of PlDIRs were selected to analyze the sequence similarity with the proteins in the same subfamily by Circoletto, a web interface for comparing two sequence libraries via Circos [72]. P. leptostachya genes were used as the query against other DIR genes, and only the best match between the subject (PlDIRs) and query sequences was considered. Furthermore, under default settings, multiple sequence alignment was conducted with Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/) and illustrated with GeneDoc software [73].

Homology modeling and molecular docking analysis

The crystal structure of a DIR protein, PsDRR206 (C4REV.A), was used as a template to predict the theoretical model [43]. The initial homology models of PlDIR proteins were generated using the SWISS-MODEL workspace (https://www.swissmodel.expasy.org/) [74]. The interactions between PlDIR1 and the 8–8′ linked bis-quinone methide (bisQM) intermediate of pinoresinol formation were predicted using the Discovery Studio CDOCKER software (Accelrys). The molecular structure of 8–8′ linked bisQM was generated with Chem3D 19.0 and was prepared as the ligand with the operation “Apply Forcefield”. The 3D structure of PlDIR1 was prepared as the receptor protein with the operations Clean Protein, Hydrogen Add, and Apply Forcefield. The interaction or binding sites of PlDIR1 were defined according to previous studies [43, 45]. Receptor-ligand docking was then run with the CDOCKER protocol using the default parameters [75]. After molecular docking, the pose with the lowest CDOCKER Interaction Energy was selected as the most probable binding conformation, and the types of amino acid residues and hydrogen bonds involved in the receptor-ligand interaction were visualized. All 3D structures from homology modeling and docking were visualized and manipulated with PyMol [76].

Gene expression analysis

Total RNA was extracted from P. leptostachya tissues with the TRIzol™ Reagent (Invitrogen, USA). Then, cDNA was synthesized from 1 μg total RNA using a PrimeScript™ RT reagent Kit with gDNA Eraser (Takara, Japan). The qRT-PCR was performed using the TB Green Premix Ex Taq™ (Tli RNaseH Plus) (Takara, Japan) on a LightCycler 480 II system (Roche Diagnostics, Mannheim, Germany) with the following program: 95 °C for 30 s, followed by 40 cycles of 95 °C for 5 s and 60 °C for 20 s. The transcript levels of the 5.8S rRNA (GenBank Accession: DQ533822) were used as the internal control. All qRT-PCR primers were designed with Primer3 (https://primer3.ut.ee/) and are listed in Additional file 6: Table S4. Each reaction was performed with three biological replicates and three technical replicates. The comparative threshold cycle method (2^−ΔΔCt) was used to calculate relative expression levels.
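
The comparative threshold cycle calculation can be sketched as follows. This is a minimal sketch, assuming technical replicates are averaged before the ΔCt step and assuming one tissue serves as the calibrator; the function name and all Ct values are hypothetical and are not data from this study. The 2^−ΔΔCt method itself assumes near-100% amplification efficiency for both the target and the reference gene.

```python
from statistics import mean


def fold_change(target_sample, ref_sample, target_calibrator, ref_calibrator):
    """Relative expression by the 2^-ddCt method.

    Each argument is a list of Ct values from technical replicates.
    """
    # dCt: target gene Ct normalized to the reference gene (here, 5.8S rRNA).
    d_ct_sample = mean(target_sample) - mean(ref_sample)
    d_ct_calibrator = mean(target_calibrator) - mean(ref_calibrator)
    # ddCt: sample dCt relative to the calibrator dCt.
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values (three technical replicates each).
root_target = [21.9, 22.0, 22.1]
root_ref = [14.9, 15.0, 15.1]
leaf_target = [24.9, 25.0, 25.1]   # leaf used as the calibrator (assumption)
leaf_ref = [15.0, 15.0, 15.0]

fc = fold_change(root_target, root_ref, leaf_target, leaf_ref)
print(round(fc, 2))
```

With these placeholder values, ΔΔCt is about −3, corresponding to roughly eight-fold higher expression in the sample than in the calibrator.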

Expression and purification of recombinant PlDIR1 protein

The ORF encoding PlDIR1 without its signal peptide sequence was amplified using specific primers containing EcoR I and Hind III restriction sites in the forward and reverse primers, respectively (Additional file 6: Table S4). Then, the PCR products were inserted into the EcoR I/Hind III site of the His-tagged pET29a(+) vector using a ClonExpress II One Step Cloning Kit (Vazyme, China) to generate the pET29a(+)-PlDIR1 plasmid. The recombinant protein was expressed in Escherichia coli BL21 (DE3) and purified using a Ni–NTA affinity column (Qiagen, Germany). After desalting with PD-10 columns (GE, USA), the purified protein was concentrated with an Amicon® Ultra-4 centrifugal filter (Millipore, USA). The protein concentration was measured with a BCA protein assay kit (Epizyme, China) using bovine serum albumin (BSA) as the standard. The presence of the recombinant protein was confirmed by SDS-PAGE and western blot using anti-His antibodies (1:3000, CWBIO, Beijing, China) [77].

In vitro enzyme activity assays and LC–MS/MS analysis

Enzyme activity assays were performed following Davin et al.’s method with minor modifications [20]. The standard reaction mixture (250 µL total volume) consisted of 8 mU/mL laccase from Trametes versicolor (Yuanye Bio-Technology Co., Ltd, China), 2 mM E-coniferyl alcohol, and 60 µL recombinant protein in MES-NaOH buffer (40 mM, pH 6.0). Reaction mixtures lacking recombinant protein or E-coniferyl alcohol were used as blank controls. The mixtures were incubated at 30 °C for 3 h, extracted twice with ethyl acetate, evaporated to dryness under vacuum, and re-dissolved in 50% methanol. After filtering through a 0.22-μm organic membrane, the samples were subjected to LC–MS/MS analysis on a system comprising a Surveyor MS Pump Plus with Autosampler and an LTQ XL mass spectrometer (Thermo Scientific, USA) operated in negative ion mode. The mobile phase was 55% acetonitrile (containing 0.1% formic acid, v/v) and 45% water (containing 0.1% formic acid, v/v). Separation was performed on an Inertsil ODS-3 C18 column (250 mm × 3.0 mm; GL Sciences Inc, Japan) at a flow rate of 0.3 mL/min, a column temperature of 35 °C, and an injection volume of 5 µL. Characteristic m/z ions were 150.5 → 151.5 for (+)-pinoresinol.
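
For orientation, the absolute amounts in one standard reaction can be worked out from the stated values. The sketch below assumes that the stated concentrations are final concentrations in the 250 µL mixture, which the text does not state explicitly; the function and variable names are illustrative only.

```python
TOTAL_VOLUME_UL = 250.0      # total reaction volume (from the text)
PROTEIN_VOLUME_UL = 60.0     # recombinant protein added (from the text)


def amount_nmol(conc_mM, volume_uL):
    """Amount of solute in nmol; 1 mM equals 1 nmol per µL."""
    return conc_mM * volume_uL


def activity_mU(activity_mU_per_mL, volume_uL):
    """Total enzyme activity in mU for a given volume in µL."""
    return activity_mU_per_mL * volume_uL / 1000.0


# Assumption: 2 mM, 8 mU/mL, and 40 mM are final concentrations.
coniferyl_alcohol_nmol = amount_nmol(2.0, TOTAL_VOLUME_UL)
laccase_mU = activity_mU(8.0, TOTAL_VOLUME_UL)
mes_nmol = amount_nmol(40.0, TOTAL_VOLUME_UL)
protein_fraction = PROTEIN_VOLUME_UL / TOTAL_VOLUME_UL

print(coniferyl_alcohol_nmol, laccase_mU, mes_nmol, protein_fraction)
```

Under this assumption, each reaction contains 500 nmol of E-coniferyl alcohol and 2 mU of laccase, and the recombinant protein solution makes up 24% of the reaction volume.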

Lignan accumulation analysis

P. leptostachya root tissues (500 mg) were ground in liquid N2 and extracted with 5 mL of 80% methanol under sonication for 30 min. After centrifugation at 12,000 × g for 10 min, the supernatant was filtered through a 0.22-µm organic membrane filter and subjected to HPLC analysis on a Nexera HPLC LC-30A system (Shimadzu, Japan) using a Hypersil BDS C18 column (5 μm, 4.6 × 250 mm; Elite, China) at a column temperature of 35 °C. The mobile phase was methanol:water (70:30, v/v) at a flow rate of 0.8 mL/min for 15 min, with an injection volume of 10 μL. The UV absorbance was monitored at 280 nm. Metabolite identification and quantification were performed as previously reported [11]. The tests were run with three biological replicates, and the same samples were used for both qRT-PCR and metabolite analysis.

Gene co-expression analysis

A co-expression network was generated from the identified PlDIR genes together with genes selected from the P. leptostachya transcriptome that are putatively involved in lignan biosynthesis. The complete list of these genes is presented in Additional file 4: Table S3. Gene expression data were collected from the full-length transcriptome database of P. leptostachya root, leaf, and stem tissues (Accession: PRJNA551634). Then, a gene expression correlation matrix was created using pair-wise Pearson correlation coefficients (PCC). Cytoscape 2.8.3 software was used to display a co-expression network that included only gene pairs whose PCC values were significant at P < 0.05 and met a cut-off value of 0.95 [78].
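
The construction of the correlation-based edge list can be sketched as follows. This is a minimal illustration, not the pipeline used in this study: it assumes the cut-off is applied as PCC ≥ 0.95 (rather than to the absolute value), it omits the significance test at P < 0.05, and the gene names and expression values are hypothetical placeholders.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined for constant expression profiles
    return sxy / math.sqrt(sxx * syy)


def coexpression_edges(expression, cutoff=0.95):
    """Return (gene1, gene2, pcc) for every gene pair with pcc >= cutoff.

    The P-value filter described in the text is NOT applied here.
    """
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if not math.isnan(r) and r >= cutoff:
            edges.append((g1, g2, r))
    return edges


# Hypothetical expression values ordered as (root, leaf, stem).
expression = {
    "gene_a": [10.0, 2.0, 1.0],
    "gene_b": [20.0, 5.0, 2.0],
    "gene_c": [1.0, 8.0, 9.0],
}

for g1, g2, r in coexpression_edges(expression):
    print(g1, g2, round(r, 3))
```

The resulting edge list (two node columns plus a weight) is the kind of table that a network viewer can import. In this toy example only the gene_a and gene_b pair passes the cut-off, because gene_c is negatively correlated with both.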