Abstract
Fueled by the widespread adoption of machine learning and the high-throughput screening of materials, the data-centric approach to materials design has established itself as a robust and powerful tool for the in silico prediction of materials properties. When training models to predict materials properties, researchers often face a difficult choice between a model’s interpretability and its performance. We study this trade-off by applying four state-of-the-art machine learning techniques (XGBoost, SISSO, Roost, and TPOT) to the prediction of structural and electronic properties of perovskites and 2D materials. We then assess the outlook for the continued integration of machine learning into materials discovery and identify the key problems that will continue to challenge researchers as the size of datasets in the literature and the complexity of models increase. Finally, we offer several possible solutions to these challenges, with a focus on retaining interpretability, and share our thoughts on magnifying the impact of machine learning on materials design.
Graphical abstract
![](http://media.springernature.com/lw685/springer-static/image/art%3A10.1557%2Fs43578-023-01164-w/MediaObjects/43578_2023_1164_Figa_HTML.png)
Data availability
Copies of the datasets used in this work can be found at Exabyte’s GitHub (https://github.com/Exabyte-io/Scientific-Projects/tree/Updates_29_09_22/DigitalEcosystem/raw_data) in the form of serialized Python objects (pkl files).
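Because the datasets are distributed as pickled Python objects, a downloaded copy can be inspected with the standard library alone. The following is a minimal sketch, not the authors' loading code: the helper names `load_dataset` and `summarize` are hypothetical, and the type of object stored in each file is not stated here, so the summary makes no assumption about it. Unpickling executes code embedded in the file, so it should only be done with files from a trusted source, and any library that created the stored objects must be installed for loading to succeed.

```python
import pickle
from pathlib import Path


def load_dataset(path):
    """Load one serialized Python object from a local .pkl file.

    Hypothetical helper: only unpickle files obtained from a trusted source.
    """
    path = Path(path)
    if path.suffix != ".pkl":
        raise ValueError(f"expected a .pkl file, got {path.name!r}")
    with path.open("rb") as handle:
        return pickle.load(handle)


def summarize(obj):
    """Describe the loaded object without assuming what type it is."""
    size = len(obj) if hasattr(obj, "__len__") else None
    return {"type": type(obj).__name__, "size": size}
```

Calling `summarize(load_dataset(...))` on a local copy of one of the files reports the object's type and length, which is a reasonable first check before passing the data to any model.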
Code availability
Jupyter (Python) notebooks are available on Exabyte’s GitHub (https://github.com/Exabyte-io/Scientific-Projects/tree/Updates_29_09_22), which contains code to reproduce our results and figures.
Funding
This research was supported by the US Department of Energy (DoE) Small Business Innovation Research (SBIR) program (grant no. DE-SC0021514). Computational resources were provided by Exabyte Inc. T.P. and M.S. were funded by the NOMAD CoE (Novel Materials Discovery Center of Excellence, European Union’s Horizon 2020 research and innovation program, grant agreement no. 951786), the project TEC1p (European Research Council, grant agreement no. 740233), and the project FAIRmat (FAIR Data Infrastructure for Condensed-Matter Physics and the Chemical Physics of Solids, German Research Foundation, project no. 460197019). T.P. would like to thank the Alexander von Humboldt Foundation for their support through the Alexander von Humboldt Postdoctoral Fellowship Program.
Author information
Contributions
JD and TB were responsible for performing the calculations and preparing the manuscript. TP performed and advised on the SISSO calculations. MS, RB, SB, and TB guided and conceptualized the project. All authors contributed to writing and revising the manuscript.
Ethics declarations
Conflicts of interest
James Dean carried out this work while employed by Exabyte Inc. Timur Bazhirov is currently employed by Exabyte Inc.
About this article
Cite this article
Dean, J., Scheffler, M., Purcell, T.A.R. et al. Interpretable machine learning for materials design. Journal of Materials Research 38, 4477–4496 (2023). https://doi.org/10.1557/s43578-023-01164-w