1 Introduction

Engineered microbial cell factories have been widely applied to produce various chemicals, such as natural products, biofuels, and bulk chemicals (Cho et al. 2022; Zhu et al. 2023). Metabolic engineering and synthetic biology enable the design of advanced cell factories, mostly by introducing heterologous or non-natural biosynthetic pathways into host strains. From the complete biosynthesis of opioids (Galanie et al. 2015) to the de novo biosynthesis of xanthohumol (Yang et al. 2024), yeasts have shown great potential in the biosynthesis of many high-value active compounds. In addition to model strains such as Escherichia coli and Saccharomyces cerevisiae, several microorganisms have been engineered as chassis cells adapted to different application environments, such as Zymomonas mobilis (Wang et al. 2018), Yarrowia lipolytica (Park and Ledesma-Amaro 2023), and Halomonas campaniensis (Ling et al. 2019), with the aid of powerful genome-editing tools. A series of strategies based on metabolic engineering and systems biology have been developed to improve the productivity of microbial cell factories, mainly by fine-tuning heterologous pathways (Chen et al. 2022; Ding and Liu 2023; Yan et al. 2023), eliminating rate-limiting enzymatic steps (Li et al. 2020), and engineering the host to block competing pathways (Ma et al. 2019).

Despite the great progress achieved by these strategies, engineering microbial cells to meet industrial requirements remains a challenge. In large-scale fermentation, microbial cells constantly face perturbations resulting from genetic and phenotypic instability, metabolic imbalance, and various harsh industrial conditions (including low pH, high temperature, and metabolite toxicity), all of which lead to poor strain performance. However, strain engineering in the laboratory often does not take into account the multiple disturbances encountered under industrial conditions. Microbial robustness refers to the ability of a microbe to maintain constant production performance (defined as titer, yield, and productivity) regardless of the various stochastic and predictable perturbations that occur in a scaled-up bioprocess (Mohedano et al. 2022; Olsson et al. 2022). Poor robustness limits industrial-scale microbial production.

The concept of microbial robustness goes beyond that of tolerance, even though the two terms have sometimes been used interchangeably in industrial microbial applications. Tolerance or resistance refers to the ability of cells to grow or survive when exposed to single or multiple perturbations. It is generally described only in terms of growth-related parameters (such as viability or specific growth rate). Robustness represents the ability of a strain to maintain stable production performance (e.g. titer, yield, and productivity) when growth conditions change. Higher tolerance does not guarantee a higher yield, whereas a strain with higher robustness must also have higher tolerance. Therefore, increasing strain robustness against unfavorable conditions has become one of the most important considerations in engineering microbial cell factories and extending them to practical applications.
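The distinction between the two terms can be made concrete with a small numerical sketch. The snippet below is only an illustration under stated assumptions: tolerance is taken as the stressed growth rate relative to a reference, and robustness is scored as one minus the coefficient of variation of titer across conditions. Both function names, the scoring choice, and all numbers are hypothetical and are not taken from the cited studies.

```python
from statistics import mean, pstdev


def relative_tolerance(growth_stressed, growth_reference):
    """Growth-based tolerance: stressed growth rate as a fraction of the reference."""
    return growth_stressed / growth_reference


def robustness_score(performance_by_condition):
    """Production-based robustness (assumed metric): 1 - coefficient of variation
    of a performance value such as titer across conditions.
    1.0 means identical performance in every condition."""
    values = list(performance_by_condition.values())
    mu = mean(values)
    if mu == 0:
        raise ValueError("mean performance is zero; score is undefined")
    return 1.0 - pstdev(values) / mu


# Hypothetical strains (illustrative numbers only, not measured data).
strain_a = {
    "growth": {"reference": 0.40, "low_pH": 0.36},                # 1/h
    "titer": {"reference": 10.0, "low_pH": 4.0, "high_T": 6.0},   # g/L
}
strain_b = {
    "growth": {"reference": 0.40, "low_pH": 0.36},
    "titer": {"reference": 8.0, "low_pH": 7.5, "high_T": 7.0},
}

for name, strain in (("A", strain_a), ("B", strain_b)):
    tol = relative_tolerance(strain["growth"]["low_pH"], strain["growth"]["reference"])
    rob = robustness_score(strain["titer"])
    print(f"strain {name}: tolerance={tol:.2f}, robustness={rob:.2f}")
```

In this toy case both strains tolerate low pH equally well in terms of growth, yet strain B holds its titer far more steadily across conditions, which is the sense in which equal tolerance does not imply equal production stability.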

In this review, we introduce the most proven strategies for engineering microbial robustness to achieve high titer and productivity (Fig. 1). In addition, the challenges and future perspectives of microbial host engineering for increased robustness are discussed.

Fig. 1 Strategies for engineering robust microbial cell factories

1.1 Transcription factor engineering

Transcription factors (TFs) are key proteins that fine-tune the expression of target genes by activating or repressing gene transcription in a variety of biological processes (He et al. 2023). Cells have evolved to optimize cellular function through the coordinated regulation of multiple enzymes and pathways by different transcription factors in response to different environmental conditions. Based on their regulatory scope, transcription factors can be divided into global and specific transcription factors (Yu and Gerstein 2006). Global transcription factors can initiate or repress the expression of genes involved in different physiological activities. The seven best-characterized global regulatory factors (CRP, IHF, FNR, ArcA, FIS, Lrp, and NarL) control over 50% of E. coli genes (Lin et al. 2013). In the pyramidal gene expression network of E. coli, the global regulatory factors at the top control the mid-level regulatory factors, which in turn regulate the low-level regulatory factors. Through this hierarchical regulation, the transcription and expression of target genes are systematically controlled across the genome-wide metabolic network. Transcription factors have therefore become feasible and efficient targets for improving strain robustness (Table 1).
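The reach of a regulator in such a pyramidal network can be illustrated with a simple graph traversal. The sketch below assumes a made-up three-tier network; every node name and edge is a placeholder and does not represent a real E. coli regulatory interaction. It only shows why perturbing a top-tier factor touches far more genes than perturbing a low-tier one.

```python
from collections import deque

# Hypothetical three-tier network: regulator -> direct targets.
# All names are placeholders, not real regulatory interactions.
REGULATES = {
    "global_TF": ["mid_TF1", "mid_TF2", "gene_a"],
    "mid_TF1": ["low_TF1", "gene_b"],
    "mid_TF2": ["low_TF2"],
    "low_TF1": ["gene_c", "gene_d"],
    "low_TF2": ["gene_d", "gene_e"],
}


def downstream(regulator, network):
    """Return every node reachable from `regulator` (direct and indirect targets)
    using a breadth-first search."""
    seen = set()
    queue = deque([regulator])
    while queue:
        node = queue.popleft()
        for target in network.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


for tf in ("global_TF", "mid_TF1", "low_TF1"):
    print(tf, "reaches", len(downstream(tf, REGULATES)), "nodes")
```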

Table 1 Strategies for transcriptional factor engineering

Global transcription machinery engineering (gTME), which introduces mutations into generic transcription-related proteins to trigger the reprogramming of gene networks and cellular metabolism, has proven to be a versatile approach to altering cell robustness. For example, engineering the housekeeping sigma factor σ70 improved the tolerance of E. coli to 60 g/L ethanol and high concentrations of SDS, while also giving a high yield of lycopene (Alper and Stephanopoulos 2007). The gTME strategy has also been applied to the more complex eukaryotic transcriptional machinery of S. cerevisiae to increase its resistance to high concentrations of glucose and ethanol. Two target proteins, Spt15 and Taf25, were selected for constructing error-prone PCR (ep-PCR) gene libraries, and the best resulting mutant, spt15-300, showed a significant growth improvement in the presence of 6% (v/v) ethanol and 100 g/L glucose (Alper et al. 2006). Further studies extended the gTME method to other organisms such as Lactobacillus plantarum, Rhodococcus ruber, and Z. mobilis to enhance their acid tolerance, acrylamide tolerance, and ethanol tolerance, respectively (Klein-Marcuschamer and Stephanopoulos 2008; Ma and Yu 2012; Tan et al. 2016a).

In addition to σ70, the cAMP receptor protein (CRP), which regulates more than 400 genes, has been successfully evolved to improve alcohol tolerance and acid tolerance, and to increase the biosynthetic capacity for compounds such as vanillin, naringenin, and caffeic acid (Basak et al. 2014; Geng and Jiang 2015; Zhang et al. 2023). Heterologous regulators have also been used: expression of the global regulator IrrE from Deinococcus radiodurans and its mutants increased tolerance to ethanol or butanol stress in E. coli by 10- to 100-fold (Chen et al. 2011). Subsequently, overexpression of the response regulator DR1558 from D. radiodurans increased the tolerance of engineered E. coli to osmotic stress at concentrations as high as 300 g/L glucose and 2 mol/L NaCl (Guo et al. 2023). However, an evolved tolerant strain may still show unexpectedly low production titer, rate, or yield.

1.5 Computation-assisted robustness design

The aforementioned experimental methods can, to a certain extent, tune the performance of microbial cells to resist harsh industrial conditions. However, traditional regulatory strategies generally require repeated design-build-test-learn cycles, which are time-consuming and laborious. More importantly, the intrinsic regulatory mechanisms are complex. For example, a transporter protein is not always specific for a single compound, and broad substrate specificity increases the uncertainty.

Genome-scale models (GEMs) have been developed as a computational systems biology approach to interpret and integrate multi-omics data. GEMs can be used to compute the metabolic and proteomic state of a microorganism. Many GEMs have been constructed for typical industrial microorganisms, such as E. coli (Mao et al. 2022), S. cerevisiae (Lu et al. 2021), and B. subtilis (Kocabaş et al. 2017). Because of biological complexity, such GEMs are generally integrated with different constraints to predict phenotype from genotype more accurately. For E. coli, three stress-specific GEMs, FoldME (Chen et al. 2017), OxidizeME (Yang et al. 2019), and AcidifyME (Du et al. 2019), have been constructed for various environmental pressures. FoldME, a thermal-stress-response model, describes in vivo protein folding as a competition between de novo spontaneous folding and chaperone-mediated (HSP70 or HSP60) folding pathways. OxidizeME, a ROS-stress-response model, computes the systems-level balance between ROS management and iron homeostasis, including demetallation/mismetallation of Fe(II) proteins, damage and repair of iron–sulfur clusters, and DNA damage. AcidifyME, an acid-stress-response model, establishes a quantitative framework that integrates characterized acid resistance mechanisms, including membrane lipid fatty acid composition, pH-dependent periplasmic or membrane protein activity and stability, and periplasmic chaperone protection. Such GEMs enable the rational and fast design of host robustness from a computational viewpoint.

With the help of mathematical models such as machine learning or deep learning, cell robustness may be adjusted quickly and accurately without taking into account the complex mechanism of action. Deep learning uses artificial neural networks (for example, convolutional neural networks (CNNs) and recurrent neural networks (RNNs)) as a framework for characterizing and learning from data sets (Sapoval et al. 2022). Machine learning uses algorithms such as Bayesian methods, support vector machines, and logistic regression to uncover hidden patterns in data and to obtain models from training data sets (Asnicar et al. 2023). By developing machine learning or deep learning models, any biological sequence, such as a DNA, RNA, or amino acid sequence, can be used as data input to solve many biological problems. For example, by combining machine learning with abundant proteomics and metabolomics data, pathway dynamics can be effectively predicted in an automated manner (Costello and Martin 2018). This approach outperforms classical kinetic models, which rely heavily on domain expertise, and guides bioengineering efforts with qualitative and quantitative predictive data. Additionally, introducing machine learning or deep learning into multi-scale GEMs can effectively improve model quality and prediction accuracy.
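As a minimal sketch of how a biological sequence can serve as model input, the snippet below encodes DNA as dinucleotide frequencies and fits a logistic regression by plain gradient descent. It uses only the Python standard library. The sequences, the class labels, and the function names are hypothetical and chosen purely for illustration; the toy labels simply separate GC-rich from AT-rich strings and carry no biological claim.

```python
import math
from itertools import product


def kmer_features(seq, k=2):
    """Encode a DNA sequence as normalized k-mer frequencies in a fixed order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = {km: 0 for km in kmers}
    windows = len(seq) - k + 1
    for i in range(windows):
        km = seq[i:i + k]
        if km in counts:
            counts[km] += 1
    total = max(windows, 1)
    return [counts[km] / total for km in kmers]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the logistic loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(seq, w, b, k=2):
    """Predicted probability of class 1 for a sequence."""
    x = kmer_features(seq, k)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical training set: label 1 for GC-rich, 0 for AT-rich toy sequences.
train_seqs = ["GCGCGGCC", "CCGGGCGC", "GGCCGCGG", "ATATTAAT", "TTAATATA", "AATTATAT"]
train_labels = [1, 1, 1, 0, 0, 0]

X = [kmer_features(s) for s in train_seqs]
w, b = train_logistic(X, train_labels)

for s in ("GCGGCCGC", "TATAATTA"):
    print(s, round(predict(s, w, b), 3))
```

Real applications would replace the toy labels with measured phenotypes and would typically use an established library and held-out validation; the point here is only the pipeline of sequence, numeric features, and trained model.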

2 Conclusion and future perspectives

A stable microbial cell is more economically feasible to scale up from laboratory testing to industrial biomanufacturing. In this review, we summarized the current strategies to improve host robustness, including three knowledge-guided engineering approaches (targeting transcription factors, membranes/transporters, and stress proteins) and adaptive laboratory evolution based on natural selection. In addition, artificial intelligence (e.g. deep learning and machine learning)-assisted pathway design shows great potential in the design of robust industrial hosts. The above strategies have effectively improved the robustness of microbial hosts and expanded their applications in biomanufacturing. However, there are still several challenges in engineering cell robustness.

First, the understanding of the mechanisms of toxicity and robustness is limited. Although transcription factor engineering allows the regulation of the entire metabolic network, the diversity of transcription factors makes it difficult to decide which factor to engineer. In most cases, a trial-and-error approach is used to screen for the most effective factors. It is therefore expected that rapid and easy-to-engineer methods will be developed for mining and modifying regulatory factors, thereby promoting the high-throughput and (semi-)rational construction of microbial cell factories. Meanwhile, cell metabolism can be manipulated by combining multiple transcription factors to control a variety of key proteins in response to different harsh conditions at the same time. For example, a method called MultIplex Navigation of Global Regulatory Networks (MINR) has been proposed to target multiple transcription factors simultaneously (Liu et al. 2019). Based on such experimental data, the distinct regulatory mechanism of each known transcription factor can be uncovered to build a model or database. In addition, the functions of most transporters are unknown. As with transcription factors, the identification and characterization of highly efficient transporters for specific compounds is also required.

Second, adaptive laboratory evolution (ALE) is an efficient tool for engineering microbial cells with specific phenotypes, but the isolation of the target mutant from a microbial population usually requires a high-throughput facility. For example, the DREM CELL platform allows for the screening of target strains at a picoliter scale (Meng et al. 2022). By providing a fluorescence output, a biosensor based on transcription factors or riboswitches can significantly increase the efficiency of a screening process (Li et al. 2023). In addition, biosensors with appropriate sensitivity and dynamic range can be used to dynamically regulate the biosynthesis of many compounds (Hossain et al. 2020). Robust biosensors may be built from existing ones, or modified versions of them, to facilitate the construction of robust cell factories.

Third, model microorganisms, such as E. coli and S. cerevisiae, are usually mesophilic and have limited ability to withstand harsh industrial stresses. For example, C1 biotechnology has made great progress in using CO2, methanol, formic acid, etc. to synthesize valuable compounds in model hosts (Bae et al. 2022; Zhan et al. 2023). To improve the host's robustness in coping with these substrates, the reconstruction of metabolic pathways to reduce the toxicity of substrates or intermediates is a necessary step. In another case, the non-model host Halomonas bluephagenesis produced an important platform chemical, 3-hydroxypropionic acid, at up to 154 g/L in the presence of 60 g/L NaCl (Jiang et al. 2021a), a salt concentration that is intolerable for model hosts. Recently, knowledge of genome-editing tools has increased, making it easier to work with non-model hosts (Liu et al. 2022). Some non-model hosts, such as thermophilic and acidophilic strains, may become an important direction for future cell factory construction (Thorwall et al. 2020), which can address the limitations of model microorganisms.

The rapid development of machine learning and deep learning has led to the emergence of many biological tools with various functions, such as DLKcat and UniKP for predicting the kinetic parameter kcat (Li et al. 2022; Yu et al. 2023). These intelligent approaches facilitate the analysis of big data generated by multi-omics sequencing, and help to optimize the GEMs for a particular host strain. Nevertheless, experimental data are still the basis for training artificial intelligence models. More experimental data can improve the reliability and availability of AI models. Future computational approaches could consider the comprehensive capacity of models towards different environmental factors (e.g. mining a regulatory factor that responds to multiple stresses). AI is expected to drive advances in biology, especially in the design of robust microbial cell factories.