Machine Learning Modelling for Predicting the Efficacy of Ionic Liquid-Aided Biomass Pretreatment

BioEnergy Research

Abstract

The influence of ionic liquid (IL) characteristics, lignocellulosic biomass (LCB) properties, and process conditions on LCB pretreatment is not well understood. In this study, 129 experimental data points on the pretreatment of LCB (grass, agricultural, and forest residues) with imidazolium, triethylamine, and choline-amino acid ILs were compiled to develop machine learning (ML) models for cellulose, hemicellulose, lignin, and solid recovery. After missing data were imputed, the two most widely adopted ML models, a bilayer artificial neural network (ANN) and random forest (RF) regression, were developed. With all features and Bayesian hyperparameter (HP) optimisation, the ANN fitted the training data very well (R²: 0.936–0.994), but its cross-validated performance remained modest (R²CV: 0.547–0.761). For the HP-optimised RF models, R² ranged from 0.824 to 0.939 on the training data and R²CV from 0.383 to 0.831. Temperature and pretreatment time were the most important predictors for every response except hemicellulose recovery. Combining Bayesian predictor selection with HP optimisation raised the lower end of the R²CV range for both the ANN (0.555–0.825) and the RF models (0.474–0.824). Because predictive performance varied with the target response, a larger, more homogeneous dataset may be warranted. The predictive modelling framework for LCB pretreatment developed in this study can be extended to similar biochemical process systems.
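The abstract states that missing values were imputed before model training but does not name the method. As a minimal sketch, assuming k-nearest-neighbour (kNN) imputation (one common choice, not necessarily the authors'), the Python below fills each missing predictor with the mean of that column over the k most similar records. The `knn_impute` function and the records (temperature, time, solid loading) are hypothetical. Distances are computed on raw values here, so temperature dominates; in practice the predictors would be standardised first.

```python
import math

# Minimal sketch of k-nearest-neighbour (kNN) imputation.
# ASSUMPTION: kNN is one common choice; the study's imputation method is
# not stated in the abstract. Missing values are encoded as None.


def knn_impute(rows, k=2):
    """Return a copy of rows with each None replaced by the mean of that
    column over the k nearest donor rows. Distance is the RMS difference
    over features observed in both rows (raw, unscaled values)."""
    filled = [list(r) for r in rows]
    n_cols = len(rows[0])
    for i, row in enumerate(rows):
        for j in (c for c, v in enumerate(row) if v is None):
            donors = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue  # a donor must have column j observed
                shared = [c for c in range(n_cols)
                          if c != j and row[c] is not None and other[c] is not None]
                if not shared:
                    continue  # no common features to measure distance on
                d = math.sqrt(sum((row[c] - other[c]) ** 2 for c in shared) / len(shared))
                donors.append((d, other[j]))
            donors.sort()  # closest donors first
            nearest = donors[:k]
            if nearest:  # leave None in place if no donor exists
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


# Hypothetical records: [temperature (°C), time (h), solid loading (%)]
records = [
    [120.0, 2.0, 5.0],
    [130.0, 3.0, None],
    [125.0, 2.5, 6.0],
    [160.0, 6.0, 10.0],
]
imputed = knn_impute(records, k=2)
print(imputed[1])  # [130.0, 3.0, 5.5]
```

The two nearest donors of the incomplete record are the 125 °C and 120 °C runs, so the missing loading becomes the mean of 6.0 and 5.0.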
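The gap between training R² and cross-validated R²CV reported in the abstract is exactly what cross-validation is meant to expose. The plain-Python sketch below assumes that R²CV is computed from pooled out-of-fold predictions (the study may instead average per-fold R²) and swaps in a 1-nearest-neighbour regressor as a hypothetical stand-in for the ANN and RF models. Because that stand-in reproduces every training point exactly, its training R² is 1.0 while its 4-fold R²CV is lower. The data and function names are illustrative only.

```python
# Minimal sketch: training R² vs k-fold cross-validated R² (pure Python).
# ASSUMPTION: a 1-nearest-neighbour regressor stands in for the ANN/RF
# models; it memorises the training data, which exaggerates the gap.


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def knn_predict(X_train, y_train, x, k=1):
    """Average the targets of the k training points closest to x."""
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(xi, x)), yi)
        for xi, yi in zip(X_train, y_train)
    )
    return sum(yi for _, yi in ranked[:k]) / k


def cross_validated_r2(X, y, n_folds=4, k=1):
    """Pool out-of-fold predictions from n_folds folds, then compute R²."""
    n = len(y)
    preds = [0.0] * n
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        for i in range(fold, n, n_folds):  # held-out rows of this fold
            preds[i] = knn_predict(X_tr, y_tr, X[i], k)
    return r_squared(y, preds)


# Hypothetical data: [temperature (°C), time (h)] -> recovery (%)
X = [[100, 1], [110, 2], [120, 1], [130, 3], [140, 2], [150, 4], [160, 1], [170, 3]]
y = [55.0, 60.0, 58.0, 66.0, 64.0, 72.0, 63.0, 70.0]

train_r2 = r_squared(y, [knn_predict(X, y, x) for x in X])
cv_r2 = cross_validated_r2(X, y)
print(f"training R2 = {train_r2:.3f}, 4-fold CV R2 = {cv_r2:.3f}")
```

The cross-validation loop only needs a predict step trained on the retained folds, so the stand-in model could be replaced by any regressor without changing how R²CV is scored.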

Figs. 1–6

Data Availability

Data in this study are available on request to the corresponding author.

Acknowledgements

D. H. and B. M. acknowledge Karunya Institute of Technology and Sciences, Coimbatore, for providing all the essential support needed to perform the experiments and complete this research work.

Funding

This work is financially supported by Karunya Institute of Technology and Sciences, Coimbatore.

Author information

Contributions

Biswanath Mahanty: conceptualisation, software, writing — original draft, reviewing and editing. Munmun Gharami: data curation, reviewing and editing. Dibyajyoti Haldar: conceptualisation, data curation, reviewing and editing.

Corresponding authors

Correspondence to Biswanath Mahanty or Dibyajyoti Haldar.

Ethics declarations

Ethics Approval and Consent to Participate

The study does not involve any human participants, human data, or human tissue. Ethics approval is not applicable.

Consent for Publication

Not applicable.

Competing Interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Electronic supplementary material: Supplementary file 1 (DOCX, 163 KB).

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mahanty, B., Gharami, M. & Haldar, D. Machine Learning Modelling for Predicting the Efficacy of Ionic Liquid-Aided Biomass Pretreatment. Bioenerg. Res. (2024). https://doi.org/10.1007/s12155-024-10747-2

  • DOI: https://doi.org/10.1007/s12155-024-10747-2
