Interpretable machine learning for materials design

Abstract

Fueled by the widespread adoption of machine learning and the high-throughput screening of materials, the data-centric approach to materials design has asserted itself as a robust and powerful tool for the in silico prediction of materials properties. When training models to predict material properties, researchers often face a difficult choice between a model's interpretability and its performance. We study this trade-off by leveraging four state-of-the-art machine learning techniques, XGBoost, SISSO, Roost, and TPOT, to predict structural and electronic properties of perovskites and 2D materials. We then assess the future outlook for the continued integration of machine learning into materials discovery and identify the key problems that will continue to challenge researchers as the size of the literature's datasets and the complexity of models increase. Finally, we offer several possible solutions to these challenges, with a focus on retaining interpretability, and share our thoughts on magnifying the impact of machine learning on materials design.
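
To make the interpretability side of this trade-off concrete, the following is a minimal, illustrative sketch (in Python) of training a gradient-boosted model with XGBoost and reading off its global feature importances. The descriptors, target, and data below are synthetic placeholders, not the datasets or features used in this work.

    # Minimal, hypothetical sketch: train an XGBoost regressor on simple
    # compositional descriptors and inspect which features drive its predictions.
    # The descriptor names and target below are placeholders, not the paper's data.
    import numpy as np
    import xgboost as xgb
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n_samples = 500
    # Hypothetical descriptors: mean electronegativity, mean atomic radius, mean valence
    X = rng.random((n_samples, 3))
    # Hypothetical target: a band-gap-like quantity built from the descriptors plus noise
    y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + 0.1 * rng.standard_normal(n_samples)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = xgb.XGBRegressor(n_estimators=200, max_depth=3, learning_rate=0.1)
    model.fit(X_train, y_train)

    # Global feature importances give a coarse but human-readable view of the model
    feature_names = ["mean_electronegativity", "mean_radius", "mean_valence"]
    for name, score in zip(feature_names, model.feature_importances_):
        print(f"{name}: {score:.3f}")
    print("Test R^2:", model.score(X_test, y_test))

Descriptor-based approaches such as SISSO go a step further by returning closed-form analytical expressions, whereas composition-only deep models such as Roost trade that transparency for greater flexibility.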

Data availability 

Copies of the datasets used in this work can be found at Exabyte’s GitHub (https://github.com/Exabyte-io/Scientific-Projects/tree/Updates_29_09_22/DigitalEcosystem/raw_data) in the form of serialized Python objects (pkl files).
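
For readers who want to inspect these files locally after cloning the repository, a minimal sketch of deserializing one of the objects is shown below; the file name is a hypothetical placeholder, since the specific names inside raw_data are not listed here.

    # Minimal sketch: deserialize one of the .pkl files from the cloned repository.
    # "example_dataset.pkl" is a hypothetical placeholder file name; substitute an
    # actual file from the raw_data directory. Unpickling a pandas object requires
    # pandas to be installed in the environment.
    import pickle

    with open("DigitalEcosystem/raw_data/example_dataset.pkl", "rb") as handle:
        dataset = pickle.load(handle)

    print(type(dataset))  # e.g. a pandas DataFrame or another serialized Python object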

Code availability

Jupyter (Python) notebooks are available on Exabyte’s GitHub (https://github.com/Exabyte-io/Scientific-Projects/tree/Updates_29_09_22), which contains code to reproduce our results and figures.

Funding

This research was supported by the US Department of Energy (DoE) Small Business Innovation Research (SBIR) program (grant no. DE-SC0021514). Computational resources were provided by Exabyte Inc. T.P. and M.S. were funded by the NOMAD CoE (Novel Materials Discovery Center of Excellence, European Union's Horizon 2020 research and innovation program, grant agreement No. 951786), the project TEC1p (European Research Council, grant agreement No. 740233), and the project FAIRmat (FAIR Data Infrastructure for Condensed-Matter Physics and the Chemical Physics of Solids, German Research Foundation, project No. 460197019). T.P. would like to thank the Alexander von Humboldt Foundation for their support through the Alexander von Humboldt Postdoctoral Fellowship Program.

Author information

Contributions

JD and TB were responsible for performing the calculations and preparing the manuscript. TP performed and advised on the SISSO calculations. MS, RB, SB, and TB guided and conceptualized the project. All authors contributed to revising and writing the manuscript.

Corresponding author

Correspondence to Timur Bazhirov.

Ethics declarations

Conflicts of interest

James Dean carried out this work while employed by Exabyte Inc. Timur Bazhirov is currently employed by Exabyte Inc.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 1740.2 kb)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Dean, J., Scheffler, M., Purcell, T.A.R. et al. Interpretable machine learning for materials design. Journal of Materials Research 38, 4477–4496 (2023). https://doi.org/10.1557/s43578-023-01164-w
