Abstract
The influence of ionic liquid (IL) characteristics, lignocellulosic biomass (LCB) properties, and process conditions on LCB pretreatment is not well understood. In this study, 129 experimental data points on the pretreatment of LCB (grass, agricultural, and forest residues) with imidazolium, triethylamine, and choline-amino acid ILs were compiled to develop machine learning (ML) models for cellulose, hemicellulose, lignin, and solid recovery. Following data imputation, a bilayer artificial neural network (ANN) and a random forest (RF) regression, the two most widely adopted ML models, were developed. After Bayesian hyperparameter (HP) optimisation, the full-featured ANN fitted the training data very well (R²: 0.936–0.994), but its cross-validation performance (R²CV) was weaker, between 0.547 and 0.761. For the HP-optimised RF models, R² varied between 0.824 and 0.939 for regression and between 0.383 and 0.831 in cross-validation. Temperature and pretreatment time were the most important predictors, except for hemicellulose recovery. Bayesian predictor selection combined with HP optimisation improved the R²CV range for the ANN (0.555–0.825) as well as for the RF models (0.474–0.824). Because the predictive performance of the models varied with the target response, a larger and more homogeneous dataset may be warranted. The predictive modelling framework for LCB pretreatment developed in this study can be extended to similar biochemical process systems.
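The gap between the training R² and the cross-validated R²CV reported above can be illustrated with a minimal sketch. It is not the authors' ANN or RF pipeline: it swaps in a one-predictor least-squares line, uses synthetic data (129 points with a hypothetical temperature predictor and recovery response), and assumes 5-fold cross-validation, a fold count the abstract does not state. The function names `fit_line`, `r_squared`, and `cross_validated_r2` are hypothetical.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def cross_validated_r2(xs, ys, k=5, seed=0):
    """Pool out-of-fold predictions from k-fold CV, then score them once."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds
    preds = [0.0] * len(xs)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        # Fit only on the training folds, predict the held-out fold.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = a + b * xs[i]
    return r_squared(ys, preds)


# Synthetic, purely illustrative data (not taken from the study).
rng = random.Random(42)
xs = [rng.uniform(60, 160) for _ in range(129)]    # hypothetical temperature
ys = [90 - 0.3 * x + rng.gauss(0, 5) for x in xs]  # hypothetical recovery (%)

a, b = fit_line(xs, ys)
r2_train = r_squared(ys, [a + b * x for x in xs])
r2_cv = cross_validated_r2(xs, ys)
print(f"R2 (training): {r2_train:.3f}  R2CV (5-fold): {r2_cv:.3f}")
```

Because every held-out point is predicted by a model that never saw it, R²CV cannot exceed the training R² for a least-squares fit. For flexible models such as an ANN or RF the gap is typically wider, which is why the abstract reports both figures.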
Data Availability
Data in this study are available on request to the corresponding author.
Acknowledgements
D. H. and B. M. thank Karunya Institute of Technology and Sciences, Coimbatore, for providing the support needed to perform the experiments and complete this research.
Funding
This work was financially supported by Karunya Institute of Technology and Sciences, Coimbatore.
Author information
Contributions
Biswanath Mahanty: conceptualisation, software, writing — original draft, reviewing and editing. Munmun Gharami: data curation, reviewing and editing. Dibyajyoti Haldar: conceptualisation, data curation, reviewing and editing.
Ethics declarations
Ethics Approval and Consent to Participate
The study does not involve any human participants, human data, or human tissue. Ethics approval is not applicable.
Consent for Publication
Not applicable.
Competing Interests
The authors declare no competing interests.
About this article
Cite this article
Mahanty, B., Gharami, M. & Haldar, D. Machine Learning Modelling for Predicting the Efficacy of Ionic Liquid-Aided Biomass Pretreatment. Bioenerg. Res. (2024). https://doi.org/10.1007/s12155-024-10747-2
DOI: https://doi.org/10.1007/s12155-024-10747-2