Introduction

Microbial cell engineering for the improved performance of microbial platforms has emerged as a powerful tool for the production of metabolites. Major successes in this area include the improved production or synthesis of 2,3-butanediol, fumarate, L-proline, hyaluronic acid, and human O-linked glycoprotein, among others (Chen et al. 2020; Long et al. 2020; Meng et al. 2020; Natarajan et al. 2020; Wang et al. 2020). In this regard, a range of host organisms has been tested, among which the Escherichia coli bacterial expression system continues to be the preferred system for laboratory investigations and for the early-stage development of commercial applications. Indeed, more than 60% of recombinant proteins and nearly 30% of approved recombinant therapeutic proteins are produced by the E. coli expression system (Correa and Oppezzo 2015). The Escherichia coli ER2566 strain (NC_CP014268.2) was developed as a BL21 (DE3) derivative strain and has been widely used in recombinant protein expression (Fomenkov et al. 2017). Although E. coli is a well-established host that offers short culturing time, easy genetic manipulation, and low cost, some bottlenecks in the production process, such as heterologous protein aggregation, prevent its wider use (Chen 2012). For instance, many eukaryotic proteins fail to fold properly when expressed using the E. coli system and form insoluble aggregates (Sahdev et al. 2008). Thus, from a host perspective, microbial cell engineering is becoming an important consideration to facilitate the production of native-state recombinant proteins with high efficiency. Unlike typical metabolic pathways, recombinant protein expression is intricately linked to the cellular machinery, with multiple factors determining the flux through the pathway (Mahalik et al. 2014). Furthermore, the complex linkages between cellular physiology and heterologous expression make it difficult to overcome these bottlenecks in the successful production of proteins. Fortunately, knowledge from the increasing number of transcriptomic, genomic, and metabolomic studies has helped to provide a deeper understanding of the cellular factors responsible for exogenous protein expression, and this will benefit rational host cell engineering (Brunk et al. 2016; Tan et al. 2020).

Protein aggregation (inclusion bodies) is the most frequently encountered issue during the overexpression of exogenous proteins using the E. coli expression system (Ami et al. 2009). How to effectively avoid the formation of inclusion bodies and improve soluble expression yields remains a topic of discussion. Soluble expression can be affected by numerous factors, ranging from genetic to environmental: nucleotide sequence, protein size, the presence of post-translational modifications or cytoplasmic enzyme interactions, the vectors and promoters used, and external factors such as pH and temperature (Fink 1998; Idicula-Thomas and Balaji 2005; Peterson 2012). Indeed, induction at a lower temperature is known to increase protein yield in most cases and improve the biological activity of the product (Qing et al. 2004).

Based on the gene annotation of the transcripts, the aligned-read values were calculated to determine the gene expression levels. The expression values were normalized with DESeq2, which is based on the negative binomial distribution, as previously described (Love et al. 2014). Genes with an absolute fold change ≥ 2 (|log2(FoldChange)| ≥ 1) and p-adjusted ≤ 0.05 were designated as differentially expressed. Gene ontology enrichment analysis of differentially expressed genes (DEGs) was implemented with the clusterProfiler package, and GO terms with corrected p-values less than 0.01 were considered significantly enriched (Yu et al. 2012). The biological replicates were checked for any batch effects before the raw counts were generated using the Bioconductor Rsubread package (Liao et al. 2019).
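The DEG criteria stated above (fold change ≥ 2 in either direction and p-adjusted ≤ 0.05) can be illustrated with a minimal Python sketch. It is not the analysis code used in this study; the function name `classify_deg` and the example values are hypothetical, and the inputs are assumed to be a linear fold change and an adjusted p-value as reported by a tool such as DESeq2.

```python
import math

def classify_deg(fold_change, padj, fc_cutoff=2.0, padj_cutoff=0.05):
    """Label a gene 'up', 'down', or 'ns' (not significant).

    fold_change is a linear ratio (treatment / control); padj is the
    adjusted p-value. Both cutoffs default to the thresholds in the text.
    """
    # Genes without a usable statistic are treated as not significant.
    if padj is None or fold_change is None or fold_change <= 0:
        return "ns"
    if padj > padj_cutoff:
        return "ns"
    log2_fc = math.log2(fold_change)
    threshold = math.log2(fc_cutoff)  # fold change 2 -> |log2FC| of 1
    if log2_fc >= threshold:
        return "up"
    if log2_fc <= -threshold:
        return "down"
    return "ns"

# Hypothetical example values, not taken from the study's data.
examples = {"geneA": (4.0, 0.01), "geneB": (0.25, 0.001), "geneC": (1.5, 0.01)}
labels = {name: classify_deg(fc, p) for name, (fc, p) in examples.items()}
```

Working on the log2 scale keeps the cutoff symmetric, so a twofold decrease (fold change 0.5) is treated the same way as a twofold increase.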

Real-time PCR (qPCR)

To further validate the RNA-seq analysis, qPCR was conducted to determine gene expression patterns. Total RNA was isolated as described above and purified using the RNAprep Pure cell/Bacteria kit. One-step qPCR was performed using the primers and probes listed in Supplemental Table S1 in the following reaction volumes: 1 μl template, 0.5 μl F/R primers, 64 μl of 10 × buffer, 2 μl of 2.5 mM dNTP, 0.4 μl HS Taq, 0.2 μl TransScript II Reverse Transcriptase (TransGen Biotech, Beijing, China), 10.4 μl DEPC H2O, and 1 μl probe. The PCR program was run at 50 °C for 10 min and 95 °C for 10 min, followed by 45 cycles of 95 °C for 15 s and 55 °C for 50 s. The relative mRNA levels were evaluated using the comparative cycle threshold (2^−ΔΔCt) method, with 16S rRNA used as the internal reference (Brosius et al. 1978).
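For readers unfamiliar with the comparative cycle threshold method, the calculation can be sketched as follows. This is a minimal illustration that assumes roughly 100% amplification efficiency for both the target gene and the 16S rRNA reference; the function name and the Ct values in the example are hypothetical and are not data from this study.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Relative mRNA level by the comparative Ct (2^-ddCt) method.

    Each argument is a cycle-threshold value: target gene and reference
    gene (here, 16S rRNA) in the test sample and in the control sample.
    """
    # Normalize the target gene to the reference gene within each sample.
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    # Compare the normalized value of the sample with that of the control.
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)

# Hypothetical Ct values: the target crosses threshold two cycles earlier
# in the test sample, while the reference is unchanged.
fold = relative_expression(22.0, 10.0, 24.0, 10.0)
print(fold)  # 4.0
```

A value above 1 indicates higher expression in the test sample than in the control, and a value below 1 indicates lower expression.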

Strain construction

All ER2566 derivative strains used in this study are listed in Supplemental Table S2. E. coli DH5α (Takara, Dalian, China) was used for plasmid construction, and ER2566 cells harboring pKD46 were used as the host strain for gene knock-out by homologous recombination with the λ Red recombinase system (Datsenko and Wanner 2000). In brief, donor DNA carrying the selectable marker kan flanked by Flp recognition target (FRT) sites was integrated into the E. coli chromosome, and the kanamycin cassette was then removed using Flp recombinase expressed from the pCP20 plasmid (Doublet et al. 2008). After recombination, the correct recombinant colony was confirmed by sequencing.

ELISA quantitative detection

The HPV16-L1 monoclonal antibody 22E4 was coated onto the wells of 96-well microplates (200 ng/well), as described previously (Gu et al. S3). For the N37, Y37, and Y24 overexpression samples, only 30.86 to 44.51% of reads were mapped to the reference genome, which was significantly lower than that of the B37 samples. This reduction was due to the large amount of HPV16-L1 mRNA transcription, which was not related to the bacterial genome sequence.

A correlation analysis was performed to investigate reproducibility among the biological replicates and to determine the similarities between the different samples. Principal component analysis (PCA) based on gene expression abundance indicated high reliability of the biological triplicates (Fig. 2c). Next, we identified the genes that were differentially expressed in response to the different environments (Fig. 2d). In terms of leaky expression (N37 vs B37), 540 genes were significantly differentially expressed (|log2(FoldChange)| ≥ 1.0 and p-adj ≤ 0.05), with 268 upregulated genes and 272 downregulated genes. Growth at a lower temperature (Y24 vs Y37) resulted in 620 DEGs (|log2(FoldChange)| ≥ 1.0 and p-adj ≤ 0.05), with 261 upregulated genes and 359 downregulated genes. The DEGs were confirmed by real-time PCR (qPCR) analysis using the 2^−ΔΔCt method with 16S ribosomal RNA for normalization. Five selected DEGs were subjected to the confirmation analysis, and the fold change in expression of each gene was calculated. We found a high correlation between the qPCR results for the assayed genes and the corresponding RNA-Seq data, confirming the reliability of the RNA-Seq analysis (Supplemental Fig. S2).
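One common way to quantify agreement between qPCR and RNA-Seq is to correlate the log2 fold changes obtained by the two methods for the same genes. The sketch below uses the Pearson coefficient; this choice is an assumption, since the text does not state which correlation measure was used, and the function name and input values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical log2 fold changes for five genes (not the study's data).
rnaseq_log2fc = [2.1, -1.4, 3.0, -2.2, 1.1]
qpcr_log2fc = [1.8, -1.1, 2.6, -2.5, 0.9]
r = pearson_r(rnaseq_log2fc, qpcr_log2fc)
```

A coefficient close to 1 would indicate that the two methods agree on both the direction and the relative magnitude of the expression changes.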

Molecular responses to leaky expression

Leaky expression can be problematic for protein production using the E. coli expression system. To obtain a broad overview of the molecular responses activated under conditions of leaky expression (N37 vs B37), gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were undertaken using the clusterProfiler R package (Kanehisa and Goto 2000; Harris et al. 2004). All 540 DEGs identified for leaky conditions were assigned to the three GO categories (BP, biological process; CC, cellular component; and MF, molecular function) and clustered by whether the gene was upregulated or downregulated (Fig. 2d). Downregulated DEGs were significantly enriched in 103 terms, including “Ribosome assembly,” “Proton transmembrane transport,” “Membrane protein complex,” and “Disaccharide transport” (Supplemental Table S4a, Fig. 3a). By comparison, the upregulated DEGs were enriched in 82 terms: 68 terms related to biological processes (“Transcription,” “RNA biosynthetic process,” “Regulation of transcription,” “Fatty acid oxidation,” etc.), 11 terms associated with molecular function (“Oxidoreductase activity,” “Transcription regulator activity,” “DNA binding,” etc.), and 3 terms associated with cellular components (“Pilus,” “Integral component of cell outer membrane,” “Cell projection”) (Supplemental Table S4b, Fig. 3b).

Fig. 3

Gene ontology (GO) enrichment analysis of differentially expressed genes (DEGs) between the non-induced ER2566 strain expressing HPV16-L1 protein cultured at 37 °C (N37) and the blank ER2566 strain at the same temperature (B37). a Downregulated genes are enriched in a total of 103 GO terms covering all three categories (BP, biological process; CC, cellular component; and MF, molecular function); the 15 most significantly enriched BP terms, 10 CC terms, and 15 MF terms are presented. The y-axis shows the GO enrichment terms across the three categories, whereas the x-axis shows the gene numbers. Enrichment criteria: pvalueCutoff < 0.01 and qvalueCutoff < 0.01. b Upregulated genes are enriched in 82 GO terms covering the three categories; the 26 most significantly enriched BP terms, 3 CC terms, and 11 MF terms are presented

Next, DEGs derived from a comparison of the non-induced and blank cultures (N37 vs B37) were submitted to KEGG enrichment analysis (p < 0.05; Supplemental Table S5). The results identified a downregulation in pathways associated with “Ribosome assembly” and “Oxidative phosphorylation,” and an upregulation in pathways that focused predominantly on “Pentose and glucuronate interconversions,” “Fatty acid metabolism,” “Tryptophan metabolism,” and “Propanoate metabolism.” These findings suggest that leaky expression can accelerate host energy metabolism and alter multiple host regulatory factors, while decreasing ribosome synthesis, ribosome assembly, and transport pathways. This knowledge may help to establish a mechanism that describes the adaptive changes that occur under leaky expression stress.

Overall analysis of DEG responses to low-temperature induction

GO analysis was used to explore how low-temperature induction improves the soluble expression of the recombinant protein (Y24 vs Y37). The 359 downregulated DEGs were enriched in 6 GO terms: 5 terms belonged to biological processes (oxidation–reduction process, energy derivation by oxidation of organic compounds, cellular respiration, anaerobic respiration, and carbohydrate transmembrane transport), and 1 term to cellular components (Ni–Fe hydrogenase complex) (Fig. 4a). The 261 upregulated DEGs were enriched in 11 terms (Fig. 4b), with 7 terms related to biological processes (polysaccharide transport, polysaccharide localization, maltodextrin transport, movement of cell or subcellular component, cell motility, localization of cell, and maltose transport), and 4 terms related to cellular components (bacterial-type flagellum, bacterial-type flagellum hook, bacterial-type flagellum, and cell projection). There was no enrichment of terms related to molecular function for either up- or downregulated genes associated with temperature-regulated induction. Taking into account the BaseMean, fold changes, and p-adj values, genes that appeared across multiple GO terms (for example, hybC, hybA, napA, cydA, aceK, malK, malF, and flgH, among others) could potentially play important roles in promoting soluble expression of the HPV16-L1 capsid protein (Supplemental Table S6). In total, we concluded that 18 up- and 19 downregulated DEGs could be involved in host adaptation for improved protein production under low-temperature induction (Fig. 5).

Fig. 4

Gene ontology pathway analysis of differentially expressed genes (Y24 vs Y37). a, b The circles, respectively, indicate the correlations between the a downregulated and b upregulated DEGs and their gene ontology terms. A deeper color represents a greater fold change

Fig. 5

Volcano plot showing gene expression differences under induction temperatures of 24 °C and 37 °C. For each gene, the –log10(p-adj) was plotted against its log2(FoldChange). Genes with p-adj ≤ 0.05 and log2(FoldChange) ≤ -1 were designated as downregulated, and genes with p-adj ≤ 0.05 and log2(FoldChange) ≥ 1 were designated as upregulated. The key regulated genes are shown in red (upregulated) or blue (downregulated) and are also listed in Supplemental Table S6a and S6b, respectively

Genetic engineering of ER2566 strain

To investigate the involvement of these 18 up- and 19 downregulated DEGs, we constructed 36 single-gene knock-out strains using the λ-Red homologous recombination system (the exception being the downregulated gene cydB, whose deletion was lethal) and compared recombinant HPV16-L1 expression levels between the wild-type and DEG knock-out strains at 24 °C and 37 °C (Fig. 6a, b). Among the knock-out strains, deletion of three downregulated genes (cydA, mngR, udp) and one upregulated gene (bluF) altered HPV16-L1 protein expression and resulted in a 30 to 70% decrease in protein production. From the growth curves (OD600), we noted that knock-out of the cydA gene affected both bacterial growth and production of the target protein. However, there was no significant difference in cell density profiles between the wild-type strain and the other knock-out strains (ER2566-ΔmngR, Δudp, ΔbluF) (Fig. 6c), and we speculate that these genes could play key roles in regulating the solubility of the recombinant protein.

Fig. 6

Fold change in the expression of HPV16-L1 recombinant protein in knock-out ER2566 engineered strains in comparison to the wild-type (WT) strain ER2566. a Downregulated and b upregulated DEGs were singly knocked out to compare protein expression with the WT cells using the double antibody sandwich ELISA. The dashed lines represent a 20% increase or decrease in expression. Data are the mean and standard deviation calculated from two biological repetitions. c Growth curves of different E. coli recombinant strains. The OD values on the growth curve represent the average calculated from three biological repetitions. d The protein yield of different engineered strains in flasks (left) and fermentation (right). Recombinant protein was quantified by double antibody sandwich ELISA. These results show the average values from experiments independently repeated three times. Error bars represent the standard error. The asterisk corresponds to p < 0.05 relative to ER2566 by t-test. e The particle morphology of HPV16 L1 VLPs from the ER2566 strain and the engineered strain (ER2566-ΔflgL-ΔflgH-ΔflgK) was characterized by TEM and HPSEC, respectively. f Soluble fractions of different recombinant proteins in the E. coli ER2566 strain and ER2566-ΔflgL-ΔflgH-ΔflgK were determined by densitometry analysis of SDS-PAGE gels using ImageJ

Some of the knock-out strains for upregulated genes (ΔgntK, ΔflgH, ΔflgL, ΔflgK) showed a 20% increase in the production of the target protein at 24 °C but not at 37 °C. Further engineering of the host by combining multiple flagellar DEG deletions (ER2566-ΔflgH-ΔflgL-ΔflgK) improved the recombinant protein yield to 1.5-fold that of the wild-type strain, a statistically significant difference (p < 0.05) (Fig. 6d). To further compare recombinant protein expression between strains in a high-density fermentation process, we cultured the cells at 5-L fermentation scale; the ER2566 and BL21(DE3) strains harboring the HPV type 16 L1 protein expression plasmid (pTO-T7-H16) served as controls, and the engineered strain (ER2566-ΔflgH-ΔflgL-ΔflgK) served as the experimental group. We then evaluated the HPV16 L1 protein expression capability and found that the ER2566(pTO-T7-H16) and BL21(DE3)(pTO-T7-H16) strains had similar protein yields (~ 16 mg/g). As in the flask culture, the engineered strain (ER2566-ΔflgH-ΔflgL-ΔflgK) had a higher protein yield (~ 20.3 mg/g), about 1.3-fold that of the control groups (Fig. 6d). Additionally, negative-stain transmission electron microscopy (TEM) confirmed that the HPV L1 proteins expressed from both the ER2566 strain and the engineered strain were able to self-assemble into virus-like particles (VLPs) with a similar diameter of 50 nm (Fig. 6e). High-performance size exclusion chromatography (HPSEC) also confirmed a similar molecular weight for the VLPs obtained from the two expression hosts in terms of relative retention time (Fig. 6e). We also selected a series of recombinant proteins, namely HPV 69–51-26, HPV58, green fluorescent protein (GFP), and HEV P239 protein, to assess the soluble expression of the target protein. Soluble expression levels in the engineered strain ER2566-ΔflgL-ΔflgH-ΔflgK were quantified from SDS-PAGE gels using ImageJ. Based on this pixel-counting quantification of the soluble fractions, the soluble yields in the engineered strain increased up to ~ 1.3-fold in comparison to the WT strain for all proteins except HEV P239. This protein was expressed almost exclusively as inclusion bodies in our previous study (Li et al. 2015) and was likewise difficult to express in soluble form here; the engineering in strain ER2566-ΔflgL-ΔflgH-ΔflgK had no observable effect on its soluble expression (Fig. 6f).
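As a quick arithmetic check on the fermentation yields reported above, the fold difference is simply the ratio of the engineered-strain yield to the control yield. The helper name below is hypothetical; the two input values are the approximate yields given in the text.

```python
def fold_over_control(yield_engineered, yield_control):
    """Ratio of an engineered-strain yield to the control yield."""
    if yield_control <= 0:
        raise ValueError("control yield must be positive")
    return yield_engineered / yield_control

# Approximate fermentation yields reported in the text (mg/g).
ratio = fold_over_control(20.3, 16.0)
print(f"{ratio:.2f}-fold")  # 1.27-fold
```

The result, about 1.27, rounds to the ~1.3-fold figure quoted for the 5-L fermentation.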

Discussion

Omics-guided bacterial engineering for recombinant protein expression has been developed to improve strain performance (Choi et al. 2019). Despite massive research efforts and much recent progress (Choi and Lee 2013; d’Espaux et al. 2017; Pontrelli et al. 2018), only a few model strains have been applied in industrial production, mainly due to the lack of comprehensive knowledge of the relevant genome information and corresponding biosynthetic pathways. In this study, our workflow builds on a method from our previous publication, which demonstrated a universal pipeline for genome assembly and reannotation for various bacteria, including non-model engineering hosts without complete genome and annotation information; this greatly increases the plasticity of industrial strains. In addition, our fast and accurate reannotation pipeline could be extended to the reannotation of other bacterial genomes to provide a better understanding of gene function under external burden and to provide more clues for engineering bacteria for biotechnological applications. In the future, we will develop a web-based genome annotation and analysis platform to make the annotation and analysis of industrial strains accessible to more users.

Inclusion body formation is often a troubling issue during the overexpression of exogenous proteins using the E. coli expression system (Ami et al. 2009). Lower-temperature induction is known to increase protein yield in most cases and improve the biological activity of the product (Qing et al. 2004). In this study, comparative transcriptomics showed that overexpression of HPV16-L1 under low-temperature induction results in the upregulation of polysaccharide transport, maltodextrin transport, and cell motility, along with a downregulation of processes such as oxidation–reduction, cellular respiration, and anaerobic respiration, among others. The analysis suggests that these pathways could play a key role in regulating the balance between heterologous protein production and host metabolism.

In addition, we found that the knock-out of three motility-related DEGs (ER2566-ΔflgH-ΔflgL-ΔflgK) could improve the expression yield of various proteins of interest by up to 1.5-fold in comparison to the parental strain, in either shake-flask culture or high-density fermentation. Consistent with our results, some studies have shown that flagellar production is energy-intensive and that flg gene knock-out or mutant strains could provide more energy and substrate for the synthesis of recombinant proteins by preventing cell motility and flagellum assembly. Taken together, the knock-out of flg-related genes could also be applied to other E. coli strains to improve the expression level of recombinant proteins. Notably, a large proportion of the knock-out strains showed different responses at the two temperatures, suggestive of more complicated interactions under low-temperature stress. Improving the performance of expression systems will thus require a comprehensive strategy that includes “balancing” the yield and activity of the protein, as well as the product quality and associated metabolite toxicity. Meanwhile, some hub genes were identified for the first time as significantly influencing recombinant protein production: deletion of bluF, cydA, mngR, or udp led to a significant decrease in soluble recombinant protein production. Other studies indicated that the bluF gene product mainly binds to the BluR repressor and releases it from its bound DNA target, and may also serve as a thermometer (Hasegawa et al. 2006; Nakasone et al. 2010). Based on this knowledge, we hypothesize that bluF knock-out might decrease the soluble expression of HPV16 L1 because E. coli loses a thermometer with which to sense the low temperature. cydA encodes a component of the aerobic respiratory chain of E. coli that predominates when cells are grown at low aeration, and there was a significant difference in cell density profiles between wild-type and knock-out (ER2566-ΔcydA) cultures, which might affect cellular energy metabolism and thereby decrease the expression of soluble heterologous proteins (Borisov et al. 2011). mngR might be involved in the regulation of the citric acid cycle in response to fatty acids, a suggestion based on in vitro experiments, and might thereby act as a transcriptional regulator that influences the expression of recombinant protein (Sampaio et al. 2004). However, exactly how these genes regulate soluble expression needs to be further investigated in the future by a variety of approaches, such as over-expression of additional regulatory genes and multi-omics.

Based on the above findings, including the transcriptomic data and knock-out assays, we propose a possible model to illustrate the response of the ER2566 strain under low-temperature induction (Fig. 7). GO and KEGG analyses show that the genes downregulated upon induction at lower temperature might be associated with a slowdown of carbohydrate transmembrane transport and cellular respiration. In contrast, the maltose transporter-related genes are significantly upregulated. Indeed, the maltose transporter knock-out assays indicate a significant decrease in the expression of the soluble recombinant protein at low-temperature induction as compared with the wild-type ER2566 cells. We speculate that maltose transport and metabolism might have a role in maintaining energy metabolism in the host cell at lower temperatures and thereby benefit the soluble expression of recombinant proteins. In another primary mechanistic branch, genes related to cell motility, localization, and flagellum terms were significantly enriched under low-temperature induction; these findings are consistent with previous research showing that the loss of E. coli motility at 37 °C is recovered when the temperature is reduced to 24 °C (Noor et al. 2013). Some authors have proposed that flagellar biosynthesis and assembly are an energy-intensive process (Yoon et al. 2009). This mechanism was further supported by our knock-out assays, where deleting the DEGs associated with the bacterial-type flagellum (flgH, flgK, flgL) resulted in an increase in the production of the target protein. This increase in production was further enhanced to ~ 1.5-fold by the combined knock-out of the flgH, flgK, and flgL genes under 24 °C induction in shake-flask culture or high-density fermentation.

Fig. 7

Suggested model of the major overexpression response mechanism for recombinant protein expression under low-temperature induction. Arrows indicate primary processes or direct interactions; dotted arrows indicate secondary or indirect interactions. Green and red arrows, respectively, represent down- and upregulated pathways under low-temperature induction. In this model, the overexpression of the heterologous protein under low-temperature induction results in the upregulation of polysaccharide transport, maltodextrin transport, and cell motility, along with a downregulation of processes such as carbohydrate transport (e.g., glucose influx), cellular respiration, and anaerobic respiration, among others. Cells consume a large amount of energy for cell motility and flagellum assembly under low-temperature induction. Motility-related knock-out strains may have better energy optimization and substrate distribution, which, in turn, would promote the soluble expression of the recombinant protein. Meanwhile, some hub genes, such as bluF and cydA among others, were identified as significantly influencing recombinant protein production

In summary, we proposed a model to interpret the response of the ER2566 strain under low-temperature induction. The high yield induced by low temperature may derive from comprehensive interactions among cell motility and maltose transport, together with a reduction in pathways associated with carbohydrate transmembrane transport and cellular respiration. Meanwhile, independent DEGs may also influence host adaptations and metabolic networks. Overall, this study focused on the anabolic and stress-responsive hub genes involved in the adaptation of E. coli to recombinant protein overexpression at the transcriptome level and constructed a series of engineered strains with increased soluble yields of recombinant proteins, which lays a solid foundation for the engineering of bacterial strains for recombinant technological advances.