Introduction

Terpenoids are a large group of structurally complex natural products widely distributed in nature, occurring predominantly as secondary metabolites of plants, animals, and microorganisms. Their biosynthesis is a highly intricate process. To date, over 90,000 terpenoids have been identified, primarily from plants, microorganisms, and marine organisms (Huang et al. 2022; Li et al. 2023b; Ma et al. 2023a). Terpenoids exhibit a wide array of physicochemical properties and biological activities, which makes them indispensable in industries such as medicine, cosmetics, food processing, and biofuels, and gives them substantial economic value and broad market prospects (Li et al. 2016). The introduction of 2-MB synthase (MBS) from PfO-1 and M. olivasterospora, as well as four genes (including MIBS) from S. griseus and S. coelicolor, into 2-methyl-GPP-producing E. coli generated different terpene products, including 2M2B, 1MC, 2-methylmyrcene, 2-methyllimonene, 2-methyl-β-fenchol, 2-methyl linalool, 2-methyl-α-terpineol, 2-methyl geraniol, and 2-methyl nerol (Kschowak et al. 2018).

Biosynthesis of geosmin

The biosynthetic pathway of geosmin has been elucidated through isotope-labeled precursor feeding experiments, identification of geosmin synthase (GS), and characterization of its byproducts. Geosmin biosynthesis begins with the GS-catalyzed cyclization of FPP into germacradienol. The subsequent step is a retro-Prins fragmentation, which leads to the loss of acetone. Further cyclization forms octalin, which subsequently undergoes deprotonation and electron rearrangement. The final step involves the trapping of a water molecule to form geosmin (Cane et al. 2006; Dickschat et al. 2005; Jiang et al. 2007; Jiang and Cane 2008; Nawrath et al. 2008).
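
The atom bookkeeping behind these steps can be checked with a short script. The sketch below is illustrative only: the molecular formulas used (germacradienol C15H26O, acetone C3H6O, the octalin intermediate C12H20, geosmin C12H22O) are standard textbook values rather than data taken from the cited studies, and the helper names parse_formula and is_balanced are hypothetical.

```python
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Turn a simple formula such as 'C15H26O' into element counts."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def is_balanced(reactants, products) -> bool:
    """Return True when both sides contain the same atoms."""
    left = sum((parse_formula(f) for f in reactants), Counter())
    right = sum((parse_formula(f) for f in products), Counter())
    return left == right


# Assumed formulas (not from the source text):
GERMACRADIENOL = "C15H26O"
OCTALIN = "C12H20"
ACETONE = "C3H6O"
GEOSMIN = "C12H22O"

# Retro-Prins fragmentation: the C15 alcohol splits into a C12 and a C3 unit.
print(is_balanced([GERMACRADIENOL], [OCTALIN, ACETONE]))
# Water capture: octalin plus water gives geosmin.
print(is_balanced([OCTALIN, "H2O"], [GEOSMIN]))
```

Both checks only confirm that the overall atom counts agree with the loss of a C3 fragment and the later uptake of water; they say nothing about the order or stereochemistry of the individual steps.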

In another report, 262 candidate TPSs of bacterial origin were expressed in an engineered strain of the bacterium Streptomyces avermitilis. Most of the enzymes proved to be sesquiterpene synthases, and the major products were sesquiterpenoids such as geosmin and epi-isozizaene. In addition, GS was found in the majority of actinomycetes (Yamada et al. 2015).

Protein analysis shows that GS is composed of two structural domains, an N-terminal and a C-terminal domain. The N-terminal domain possesses terpene cyclase activity and generates germacradienol from FPP, whereas the C-terminal domain catalyzes the conversion of germacradienol to geosmin (Cane and Watt 2003; Gust et al. 2003; Jiang et al. 2007). The conversion of FPP to geosmin is therefore catalyzed by a single enzyme, without the intervention of any other enzyme or the need for any redox cofactor (Jiang et al. 2006) (Fig. 1).

Biosynthesis of noncanonical terpenoids with C16 carbon skeleton

Biosynthesis of homoterpenes

During the biosynthesis of the homoterpene TMTT and its C11 analog DMNT (DMNT is described here because it is the C11 counterpart of TMTT), GGPP and FPP are converted by geranyllinalool synthase (GES) and nerolidol synthase (NES), respectively, into geranyllinalool and nerolidol. Isotope-labeled precursor experiments revealed that both geranyllinalool and nerolidol then undergo oxidative degradation to produce their respective homoterpenes. This step is carried out by a CYP450 enzyme, CYP82G1, found in Arabidopsis (Fig. 2): CYP82G1 catalyzes the degradation of geranyllinalool and nerolidol to generate TMTT and DMNT, respectively (Lee et al. 2010). In addition, two CYP450 genes, GhCYP82L1 and GhCYP82L2, were identified in Gossypium hirsutum, providing new insight into the biosynthesis of TMTT and DMNT. Heterologous expression in yeast and subsequent enzyme analyses demonstrated that both enzymes are involved in the biosynthesis of TMTT and DMNT and can catalyze the conversion of geranyllinalool to TMTT and of nerolidol to DMNT (Liu et al. 2018). DMNT-rich plants show significant potential in “push-pull” strategies for pest management, offering an effective means to regulate insect behavior. For instance, transgenic tobacco co-expressing GhCYP82Ls and GhTPS14 releases DMNT in response to cotton bollworm attack, and the DMNT-releasing transgenic tobacco shows a significant capability to attract the parasitoid wasp Microplitis mediator (Liu et al. 2021).
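
The two parallel routes can be summarized as a small lookup table. The sketch below is a bookkeeping aid under stated assumptions, not a model of the enzymology: the class name HomoterpeneRoute and the constant CARBONS_LOST are hypothetical, and the four-carbon loss is simply inferred from the carbon counts involved (C15 to C11 and C20 to C16).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HomoterpeneRoute:
    diphosphate: str          # prenyl diphosphate substrate
    diphosphate_carbons: int  # carbon count of that substrate
    synthase: str             # terpene synthase forming the tertiary alcohol
    alcohol: str              # intermediate that is oxidatively degraded
    product: str              # homoterpene released


# Assumption: oxidative degradation removes four carbons in both routes.
CARBONS_LOST = 4

ROUTES = (
    HomoterpeneRoute("FPP", 15, "NES", "nerolidol", "DMNT"),
    HomoterpeneRoute("GGPP", 20, "GES", "geranyllinalool", "TMTT"),
)


def product_carbons(route: HomoterpeneRoute) -> int:
    """Carbon count of the homoterpene formed along a route."""
    return route.diphosphate_carbons - CARBONS_LOST


for route in ROUTES:
    print(f"{route.diphosphate} -> {route.alcohol} -> "
          f"{route.product} (C{product_carbons(route)})")
```

Keeping substrate, synthase, intermediate, and product in one record makes it harder to pair an alcohol with the wrong homoterpene when the two routes are discussed side by side.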

Fig. 2

The biosynthesis of homoterpenes and other noncanonical terpenoids with C16 carbon skeletons. GGPP geranylgeranyl pyrophosphate, GGPPS GGPP synthase, GES geranyllinalool synthase, DMNT (E)-4,8-dimethyl-1,3,7-nonatriene, NES nerolidol synthase, TMTT (E,E)-4,8,12-trimethyltrideca-1,3,7,11-tetraene, PSPP presodorifen pyrophosphate, SpFPPMT FPP methyltransferase from S. plymuthica, SpSODS sodorifen synthase from S. plymuthica, PcFPPMT FPP methyltransferase from P. chlororaphis, VbFPPMT FPP methyltransferase from V. boronicumulans, Pcγ-PSPP-MT γ-PSPP methyltransferase from P. chlororaphis, Vbγ-PSPP-MT γ-PSPP methyltransferase from V. boronicumulans, α-PCPP α-prechlororaphen pyrophosphate, Pc-ChloS chlororaphen synthase from P. chlororaphis, Vb-ChloS chlororaphen synthase from V. boronicumulans, APP ancheryl diphosphate, SPP serratinyl diphosphate, WPP weylandtenyl diphosphate, PPP plymuthenyl diphosphate, TPP thorvaldsenyl diphosphate, KPP kimlarsenyl diphosphate, BPP blixenyl diphosphate, HPP hammershoyl diphosphate, JPP jacobsenyl diphosphate

The control of pests and diseases in rice is particularly important, as rice is a critical crop and a staple food for many people. The biosynthetic pathway of homoterpenes in rice has long been unknown, and the practical application of this indirect defense system is limited by the low quantities of homoterpenes emitted. However, recent in vitro biochemical characterization using a yeast expression system has shown that the OsCYP92C21 protein plays a pivotal role in the conversion of nerolidol and geranyllinalool to DMNT and TMTT, respectively. In addition, specific subcellular targeted expression, genetic transformation, and gene introgression have been reported to significantly increase the biosynthesis of DMNT and TMTT in rice, so that higher amounts of homoterpenes can be emitted even in the absence of inducing factors (Li et al. 2021).

Biosynthesis of sodorifen

Sodorifen, a novel 1,2,4,5,6,7,8-heptamethyl-3-methylene-bicyclo[3.2.1]oct-6-ene, is the major constituent of the new hydrocarbons released by the rhizobacterium Serratia odorifera (von Reuss et al. 2010). FPP undergoes a SAM-dependent methylation in the presence of SpFPPMT, an FPP methyltransferase from S. plymuthica, to generate presodorifen pyrophosphate (PSPP). During this process, the intermediate cyclohexyl carbocation does not undergo deprotonation as in typical terpene biosynthesis. Instead, it undergoes cyclization to generate an additional ring. This step is catalyzed by the rare cyclase activity of SpFPPMT. PSPP, now containing a pentamethylcyclopentenyl group, undergoes further cyclization in the presence of S. plymuthica sodorifen synthase (SpSODS) to generate sodorifen (von Reuss et al. 2018) (Fig. 2). Moreover, 38 biosynthetic gene clusters (BGCs) were identified by mining the S. plymuthica genome with antiSMASH. Using direct pathway cloning (DiPaC), the 4.6 kb candidate sodorifen BGC was successfully captured. Placing it under the tetracycline-inducible PtetO promoter and transforming the construct into E. coli resulted in the production of large amounts of sodorifen in E. coli (Duell et al. 2019).

Apart from the rhizobacterium S. plymuthica, FPPMT was also identified in the γ-proteobacterium P. chlororaphis O6 and the β-proteobacterium Variovorax boronicumulans PHE5-4. In addition, methylation of PSPP by PSPPMT has been confirmed, revealing a noncanonical biosynthetic pathway for the first natural brexane-type bishomosesquiterpene, chlororaphen (C17H28). In the presence of PcFPPMT or VbFPPMT, FPP is methylated to produce γ-presodorifen pyrophosphate (γ-PSPP, C16). Subsequent C-methylation of γ-PSPP by a second methyltransferase, Pcγ-PSPP-MT or Vbγ-PSPP-MT, produces α-prechlororaphen pyrophosphate (α-PCPP, C17). Finally, chlororaphen is generated by the chlororaphen synthases Pc-ChloS and Vb-ChloS (Magnus et al. 2023) (Fig. 2). Mining of bacterial genomic information led to the discovery of the sodorifen biosynthesis gene cluster. Subsequently, the TPSs from this gene cluster were introduced into an engineered yeast strain co-expressing SpFPPMT for fermentation. The study detected 47 different C16 terpenes among the products and resolved the structures of 13 different C16 noncanonical terpenes, highlighting the extensive structural diversity within this group of compounds (Duan et al. 2023).
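
The carbon ladder of the chlororaphen route (C15 to C16 to C17) can be traced with a few lines of code. This is a minimal sketch that counts skeleton carbons only; the function name trace and the step table are hypothetical, and the sketch assumes that each methyltransferase adds exactly one carbon while the final cyclization adds none.

```python
# (enzyme, product, carbons added) for each step named in the text.
# Assumption: one carbon per SAM-dependent methylation, none on cyclization.
STEPS = [
    ("PcFPPMT or VbFPPMT", "γ-PSPP", 1),
    ("Pcγ-PSPP-MT or Vbγ-PSPP-MT", "α-PCPP", 1),
    ("Pc-ChloS or Vb-ChloS", "chlororaphen", 0),
]


def trace(start_name, start_carbons, steps):
    """Return (compound, carbon count) pairs along the pathway."""
    trail = [(start_name, start_carbons)]
    carbons = start_carbons
    for _enzyme, product, added in steps:
        carbons += added
        trail.append((product, carbons))
    return trail


for compound, carbons in trace("FPP", 15, STEPS):
    print(f"{compound}: C{carbons}")
```

The final count of 17 carbons is consistent with the C17H28 formula given for chlororaphen above.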

Biosynthesis of other noncanonical terpenoids with C16 carbon skeleton

Key amino acid residues in SpFPPMT are essential for its function. Altering them, whether by stabilizing or destabilizing the carbocation intermediate or by steric interference, can abolish the formation of PSPP and lead instead to other C16 backbones. Mutations at the F58 site of SpFPPMT (F58V, F58L, F58M) generated diphosphate building blocks such as plymuthenyl diphosphate (PPP), thorvaldsenyl diphosphate (TPP), and weylandtenyl diphosphate (WPP) in S. cerevisiae. Variants at the L302 site of SpFPPMT generated diphosphate building blocks such as blixenyl diphosphate (BPP), kimlarsenyl diphosphate (KPP), and serratinyl diphosphate (SPP). In addition, diphosphate building blocks such as jacobsenyl diphosphate (JPP), hammershoyl diphosphate (HPP), and ancheryl diphosphate (APP) were identified in variants carrying two mutated residues, namely SpFPPMT (F58M-L302S), SpFPPMT (F58M-L302Q), and SpFPPMT (F58M-V273A) (Fig. 2). When further processed by CYP450 enzymes, these novel diphosphate building blocks ultimately led to the synthesis of 28 distinct C16 noncanonical terpenoids (Ignea et al. 2022).
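
Variant labels such as F58M-L302S follow the usual wild-type residue, position, mutant residue convention, which is easy to parse when many variants need to be catalogued. The sketch below is one possible way to do this; the names Substitution, parse_variant, and apply_variant are hypothetical, and the toy sequence in the usage line is invented and unrelated to the real SpFPPMT sequence.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    wild_type: str  # one-letter code of the original residue
    position: int   # 1-based residue number
    mutant: str     # one-letter code of the replacement residue


_TOKEN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_variant(label: str) -> list[Substitution]:
    """Parse a label such as 'F58M-L302S' into its substitutions."""
    substitutions = []
    for token in label.split("-"):
        match = _TOKEN.match(token.strip())
        if match is None:
            raise ValueError(f"unrecognized substitution: {token!r}")
        wild_type, position, mutant = match.groups()
        substitutions.append(Substitution(wild_type, int(position), mutant))
    return substitutions


def apply_variant(sequence: str, label: str) -> str:
    """Apply the substitutions to a sequence, checking each wild-type residue."""
    residues = list(sequence)
    for sub in parse_variant(label):
        index = sub.position - 1
        if not 0 <= index < len(residues):
            raise ValueError(f"position {sub.position} is outside the sequence")
        if residues[index] != sub.wild_type:
            raise ValueError(f"expected {sub.wild_type} at position {sub.position}")
        residues[index] = sub.mutant
    return "".join(residues)


print(parse_variant("F58M-L302S"))
print(apply_variant("AFL", "F2M-L3S"))  # toy sequence, for illustration only
```

Checking the wild-type residue before substituting catches off-by-one numbering errors, which are common when residue numbers are taken from different sequence records.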

In addition to catalyzing the formation of 6-methyl-GPP, the GPPMT BezA can be altered by targeted mutation to give the mutant BezA (W210A), which exhibits FPP C6 methyltransferase activity. This mutant enzyme methylates FPP at the C6 position, effectively producing 6-methyl farnesyl pyrophosphate (6-methyl-FPP) (Tsutsumi et al. 2022).

The extra methyl groups derived from propionyl-CoA in the LMVA pathway can be retained, culminating in the formation of C16, C17, and C18 noncanonical terpenoids. The introduction of MVA pathway genes from Bombyx mori, the FPP synthase (FPPS) genes CfFPPS1 and CfFPPS2 from Choristoneura fumiferana, and epi-isozizaene synthase into E. coli ultimately generated C16, C17, and C18 noncanonical terpenoids (Eiben et al. 2019).

Noncanonical triterpenoids synthesized from non-squalene substrates

It is generally accepted that triterpenoids are generated by triterpene synthases that cyclize squalene or 2,3-oxidosqualene as precursors (Abe 2007). In contrast, shorter-chain terpenoids such as the C10, C15, and C20 classes are generated by cyclization, hydroxylation, and oxidation starting from polyisoprenyl diphosphates (Degenhardt et al. 2009; Minami et al. 2018). In the canonical route, IPP and DMAPP are condensed by FPP synthase (FPS) to generate FPP. Two FPP molecules are then condensed by squalene synthase (SS) to give squalene, which squalene epoxidase (SE) converts to 2,3-oxidosqualene. These intermediates are subsequently cyclized by triterpene synthases (TrTSs) to form scaffolds such as lanosterol, which are then modified by numerous enzymes to form the desired triterpene compounds (Chen et al. 2023; Garcia-Bermudez et al. 2019). TPSs can be classified into two types, type I and type II, based on their conserved amino acid motifs and on how the substrate is activated during catalysis. Type II terpene synthases, such as squalene-hopene cyclase (SHC) and oxidosqualene cyclase (OSC), initiate catalysis by protonating the substrate through a conserved “DXDD” motif. Type I terpene synthases, such as methylisoborneol synthase and pentalenene synthase, instead bind Mg2+ via the “DDXX(D/E)” and “NSE/DTE” motifs and initiate catalysis by removing the pyrophosphate group of the substrate (Christianson 2017). Although type I terpene synthases are capable of synthesizing monoterpenes, sesquiterpenes, diterpenes, and sesterterpenes using prenyl pyrophosphates of different chain lengths as substrates, the synthesis of triterpenes using hexaprenyl pyrophosphate (HexPP, C30) as a substrate had not previously been reported. HexPP is generated by HexPP synthase (HexPPS), which adds three IPP molecules to FPP following the C15-C20-C25-C30 sequence. Notably, HexPP synthase cannot synthesize GPP or FPP from IPP and DMAPP (Ogura et al. 1997; Sasaki et al. 2011). Furthermore, research on HexPPS from Sulfolobus solfataricus has revealed that it can add IPP units to GGPP to generate HexPP. In addition, key amino acid residues determining the carbon chain length of the product have been identified by crystal structure analysis and site-directed mutagenesis (Sun et al. 2005).
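
The motif-based distinction between type I and type II terpene synthases lends itself to a simple pattern search. The sketch below is a crude heuristic offered for illustration only: it scans for the “DDXX(D/E)” and “DXDD” motifs as regular expressions, ignores the “NSE/DTE” motif, and uses invented toy sequences. The function name classify_tps is hypothetical, and real annotation would rely on alignments or profile models rather than a bare motif match.

```python
import re

# X in the published motifs stands for any residue, hence "." below.
TYPE_I_MOTIF = re.compile(r"DD..[DE]")  # "DDXX(D/E)", Mg2+-binding
TYPE_II_MOTIF = re.compile(r"D.DD")     # "DXDD", substrate protonation


def classify_tps(sequence: str) -> str:
    """Label a protein sequence by which signature motifs it contains."""
    seq = sequence.upper()
    has_type_i = TYPE_I_MOTIF.search(seq) is not None
    has_type_ii = TYPE_II_MOTIF.search(seq) is not None
    if has_type_i and has_type_ii:
        return "both motifs"
    if has_type_i:
        return "type I"
    if has_type_ii:
        return "type II"
    return "no motif found"


# Toy sequences, invented for this example.
print(classify_tps("MKTDDAYDLLG"))  # contains DDAYD
print(classify_tps("MKVDCDDTAG"))   # contains DCDD
print(classify_tps("MKTAAG"))
```

Short motifs like these occur by chance in many unrelated proteins, so a match is at best a first filter before structural or biochemical confirmation.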

In recent research, two bifunctional chimeric TPSs, TvTS and MpMS, were identified in the filamentous fungi Talaromyces verruculosus TS63-9 and Macrophomina phaseolina MS6, with which the synthesis of triterpenoids using HexPP as a substrate was realized for the first time. TvTS and MpMS can synthesize the novel triterpenoid skeleton compounds talaropentaene and macrophomene, using either IPP and DMAPP or HexPP directly as substrates. This breakthrough overturns the long-held assumption that triterpene skeletons can only be synthesized using squalene as a starting unit (Courdavault and Papon 2022). In vitro reaction and homologous activation experiments demonstrated that synthesizing triterpene skeletons is an inherent ability of TvTS and MpMS. In-depth investigation, including in vitro isotope feeding experiments, resolved the absolute configuration of the products and the cyclization mechanism. For talaropentaene, cyclization initiates with a C1-III-IV cyclization triggered by the removal of the HexPP pyrophosphate group. The ensuing 1,2-hydride shift and deprotonation culminate in the formation of talaropentaene. Macrophomene cyclization, by contrast, commences with the departure of the HexPP pyrophosphate group, which triggers cyclization between the C1 and C22 positions. Deprotonation at the C1 position then forms a three-membered ring and ultimately macrophomene. To further investigate the generality of this synthesis and to target TPSs precisely and efficiently, research efforts were expanded to a wider range of species. TPS 3D structures were predicted in batch with AlphaFold2 and then subjected to molecular docking. As a result, two additional triterpene synthase genes, Cgl13855 and PTTC074, were successfully predicted and obtained. Both demonstrated the capability to generate the new triterpene colleterpenol. These results illustrate the generalizability of type I chimeric TPSs in catalyzing the cyclization of HexPP to produce triterpene skeletons (Tao et al. 2022).
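
The chain-length arithmetic behind HexPP formation, including the C15-C20-C25-C30 sequence mentioned earlier, can be written out explicitly. The sketch below assumes only that each IPP addition extends the chain by five carbons; the function names elongate and ipp_units_needed are hypothetical.

```python
C5_UNIT = 5  # carbons contributed by one IPP (or by the DMAPP starter)


def elongate(start_carbons: int, ipp_units: int) -> list[int]:
    """Carbon counts after each successive IPP addition, starter included."""
    if start_carbons <= 0 or start_carbons % C5_UNIT or ipp_units < 0:
        raise ValueError("expected a positive C5 multiple and a non-negative unit count")
    return [start_carbons + C5_UNIT * i for i in range(ipp_units + 1)]


def ipp_units_needed(start_carbons: int, target_carbons: int) -> int:
    """Number of IPP additions required to reach the target chain length."""
    difference = target_carbons - start_carbons
    if difference < 0 or difference % C5_UNIT:
        raise ValueError("target is not reachable by whole C5 additions")
    return difference // C5_UNIT


# FPP (C15) extended by three IPP units, as described for HexPP synthase.
print(elongate(15, 3))
# Starting from DMAPP (C5), as a bifunctional enzyme fed IPP and DMAPP would.
print(ipp_units_needed(5, 30))
```

Starting from DMAPP, five IPP additions are needed to reach C30, compared with three when starting from FPP or two when starting from GGPP.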

Nortriterpenes in G. lucidum

Ganoderma triterpenoids (GTs) are one of the main groups of chemical components isolated from G. lucidum (Galappaththi et al. 2022). These triterpenoid compounds have complex and variable structures and can undergo carbon loss, ring opening, or rearrangement. Based on the number of carbon atoms in the skeleton, GTs are divided into three types, namely the C30, C27, and C24 types (Gong et al. 2019). Among them, three C27-type nortriterpenoids, lucidenic acids A, B, and C, were discovered first, and C24-type nortriterpenoids such as lucidones A, B, C, and H were identified subsequently (Galappaththi et al. 2022). In contrast to the C30-type triterpenoids in G. lucidum, these C27- and C24-type nortriterpenoids are derived from lanostane-type triterpenoids with degraded side chains. However, little is known about the biosynthesis of C27-type and C24-type nortriterpenoids from lanosterol. A previous report proposed a biosynthetic route for one C24-type nortriterpenoid. The precursors, ganoderic acid and its esterified derivatives, undergo oxidation to form intermediates (such as lucidone A and lucidone D), which are further converted into lucidone J and lucidone K through addition reactions. Lucidone K then gives rise to the C24-type nortriterpenoid lucidone I through elimination and addition reactions (Chen et al. 2017).

Conclusion and prospects

The distinctive structures and activities of noncanonical terpenoids have attracted extensive attention. The current approaches for the biosynthesis of noncanonical terpenoids primarily involve two strategies: (1) noncanonical terpenoids can be produced by harnessing specific methyltransferases that use IPP, GPP, and FPP as starting substrates, or by applying protein engineering to these methyltransferases so that they yield noncanonical terpenoids with varying chain lengths, such as C6, C7, C11, C12, and C16, which expands the diversity of terpene skeletons; and (2) triterpenoids can be synthesized without squalene as a substrate, a finding that enhances our understanding of terpene biosynthesis in biological systems. Analyzing the biosynthetic pathways of noncanonical terpenoids and studying the functions of the related enzymes provide access to a large number of key intermediates and help elucidate rare catalytic mechanisms. These discoveries serve as a robust foundation for the subsequent large-scale preparation of noncanonical terpenoids using synthetic biology and for the exploration of new noncanonical terpenoids.

Future research on noncanonical terpenoids will be devoted to the identification of novel carbon skeletons and catalytic mechanisms and to the discovery of their bioactivities, with a focus on the following prospects: (1) comprehensive analysis of the genomes of microorganisms and plants using high-throughput sequencing technology can uncover novel catalytic enzymes and biosynthetic pathways, facilitating the discovery of new types of noncanonical terpenoids; (2) refined and modular design of biosynthetic pathways for noncanonical terpenoids, together with in-depth analysis of natural synthetic pathways, can lead to more efficient and controllable synthetic routes for the precise synthesis of target products; (3) increased research on the biological activities and pharmacological effects of noncanonical terpenoids can help exploit their antimicrobial, antitumor, and antioxidant effects; and (4) integration of cutting-edge technologies, such as big data, artificial intelligence, and machine learning, can enable more accurate prediction and verification of the biosynthetic pathways of noncanonical terpenoids.