Abstract
The electronic band structure and crystal structure are the two complementary identifiers of solid-state materials. Although convenient instruments and reconstruction algorithms have made large, empirical, crystal structure databases possible, extracting the quasiparticle dispersion (closely related to band structure) from photoemission band mapping data is currently limited by the available computational methods. To cope with the growing size and scale of photoemission data, here we develop a pipeline including probabilistic machine learning and the associated data processing, optimization and evaluation methods for band-structure reconstruction, leveraging theoretical calculations. The pipeline reconstructs all 14 valence bands of a semiconductor and shows excellent performance on benchmarks and other materials datasets. The reconstruction uncovers previously inaccessible momentum-space structural information on both global and local scales, while realizing a path towards integration with materials science databases. Our approach illustrates the potential of combining machine learning and domain knowledge for scalable feature extraction in multidimensional data.
Main
Modeling and characterization of the electronic band structure (BS) of a material play essential roles in materials design1 and device simulation2. The BS exists in momentum space, Ω(kx, ky, kz, E), and imprints the multidimensional and multivalued functional relations between the energy (E) and momenta (kx, ky, kz) of periodically confined electrons3. Photoemission band mapping4 (Fig. 1a) using momentum- and energy-resolved photoemission spectroscopy (PES), including angle-resolved PES (ARPES)5,6 and multidimensional PES7,8, measures the BS as an intensity-valued multivariate probability distribution directly in Ω. The proliferation of band-mapping datasets and their public availability, brought about by recent hardware upgrades7,8,9,10, has opened up possibilities for the comprehensive benchmarking of theories and experiments, which is especially challenging for multiband materials with complex band dispersions11,12,13.
The available methods for interpreting photoemission spectra fall into two categories. Physics-based methods require least-squares fitting of one-dimensional lineshapes, named energy or momentum distribution curves (EDCs or MDCs), to analytical models5,14,15. Although physics-informed data models guarantee high accuracy and interpretability, upscaling the pointwise fitting (or estimation) to large, densely sampled regions in momentum space (for example, including 10^4 or more momentum locations) presents challenges due to limited numerical stability and efficiency. Their use is therefore limited to selected momentum locations determined heuristically from physical knowledge of the materials and experimental settings. Image-processing-based methods apply data transformations to improve the visibility of dispersive features16,17,18,19. They are more computationally efficient and can operate on entire datasets, yet offer only visual enhancement of the underlying band dispersion. They do not allow reconstruction and are therefore insufficient for truly quantitative benchmarking or archiving.
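To make the pointwise, physics-based approach concrete, the following minimal Python sketch fits a single Lorentzian lineshape to each EDC of a synthetic band map, one momentum location at a time. Everything in it is an illustrative assumption rather than the pipeline described in this article: the parabolic `toy_band` dispersion, the fixed linewidth `WIDTH`, the noise-free data, and the helper names `lorentzian` and `fit_edc` are all hypothetical. A grid search over the peak centre, with the amplitude solved in closed form, stands in for a general nonlinear least-squares optimizer.

```python
def lorentzian(energy, center, width):
    """Unit-height Lorentzian lineshape with half-width `width`."""
    return width ** 2 / ((energy - center) ** 2 + width ** 2)


def fit_edc(energies, intensities, candidates, width):
    """Least-squares fit of A * lorentzian(E; E0, width) to one EDC.

    For every candidate centre E0 the optimal amplitude A has a closed
    form, so only E0 is searched. Returns (E0, A, residual sum of squares).
    """
    best = None
    for center in candidates:
        shape = [lorentzian(e, center, width) for e in energies]
        amp = (sum(i * s for i, s in zip(intensities, shape))
               / sum(s * s for s in shape))
        rss = sum((i - amp * s) ** 2 for i, s in zip(intensities, shape))
        if best is None or rss < best[2]:
            best = (center, amp, rss)
    return best


def toy_band(k):
    """Hypothetical single-band dispersion E(k), used only to make data."""
    return -1.0 - 0.5 * k ** 2


# Assumed sampling: 151 energy points and 11 momentum points.
energies = [-3.0 + 0.02 * n for n in range(151)]
momenta = [-1.0 + 0.2 * n for n in range(11)]
WIDTH = 0.1  # assumed known linewidth, kept fixed during the fit

# Synthetic, noise-free band map: one EDC per momentum location.
band_map = {
    k: [2.0 * lorentzian(e, toy_band(k), WIDTH) for e in energies]
    for k in momenta
}

# Pointwise fitting: every momentum location is an independent problem.
fits = {k: fit_edc(energies, edc, energies, WIDTH)
        for k, edc in band_map.items()}
fitted = {k: fit[0] for k, fit in fits.items()}

for k in momenta:
    print(f"k = {k:+.1f}  fitted E0 = {fitted[k]:+.2f}")
```

Each momentum location requires its own search, so the cost grows linearly with the number of momentum points and again with the number of bands and free lineshape parameters. With real, noisy, multiband data, each fit additionally needs good starting values, which is where the numerical stability concerns described above arise.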
Data availability
The electronic-structure calculations for WSe2 are available from the NOMAD repository (https://doi.org/10.17172/NOMAD/2020.03.28-1)75. The raw and processed photoemission datasets used in this work for WSe2 (https://doi.org/10.5281/zenodo.7314278)76, Bi2Te2Se (https://doi.org/10.5281/zenodo.7317667)77 and Au(111) (https://doi.org/10.5281/zenodo.7305241 including DFT calculation)78 are available on Zenodo. Source data are provided with this paper.
Code availability
The code developed for band structure reconstruction, including examples, is available on GitHub (https://github.com/mpes-kit/fuller)79.
Change history
16 January 2023
In the version of this article initially published online, refs. 76–79 were not presented as doi-based references, while the Editor recognition statement was missing. The changes have been made in the HTML and PDF versions of the article.
References
Isaacs, E. B. & Wolverton, C. Inverse band structure design via materials database screening: application to square planar thermoelectrics. Chem. Mater. 30, 1540–1546 (2018).
Marin, E. G., Perucchini, M., Marian, D., Iannaccone, G. & Fiori, G. Modeling of electron devices based on 2-D materials. IEEE Trans. Electron Devices 65, 4167–4179 (2018).
Bouckaert, L. P., Smoluchowski, R. & Wigner, E. Theory of Brillouin zones and symmetry properties of wave functions in crystals. Phys. Rev. 50, 58–67 (1936).
Chiang, T.-C. & Seitz, F. Photoemission spectroscopy in solids. Ann. Phys. 10, 61–74 (2001).
Damascelli, A., Hussain, Z. & Shen, Z.-X. Angle-resolved photoemission studies of the cuprate superconductors. Rev. Mod. Phys. 75, 473–541 (2003).
Zhang, H. et al. Angle-resolved photoemission spectroscopy. Nat. Rev. Methods Primers 2, 54 (2022).
Schönhense, G., Medjanik, K. & Elmers, H.-J. Space-, time- and spin-resolved photoemission. J. Electron Spectros. Relat. Phenomena 200, 94–118 (2015).
Medjanik, K. et al. Direct 3D mapping of the Fermi surface and Fermi velocity. Nat. Mater. 16, 615–621 (2017).
Puppin, M. et al. Time- and angle-resolved photoemission spectroscopy of solids in the extreme ultraviolet at 500-kHz repetition rate. Rev. Sci. Instrum. 90, 023104 (2019).
Gauthier, A. et al. Tuning time and energy resolution in time-resolved photoemission spectroscopy with nonlinear crystals. J. Appl. Phys. 128, 093101 (2020).
Riley, J. M. et al. Direct observation of spin-polarized bulk bands in an inversion-symmetric semiconductor. Nat. Phys. 10, 835–839 (2014).
Bahramy, M. S. et al. Ubiquitous formation of bulk Dirac cones and topological surface states from a single orbital manifold in transition-metal dichalcogenides. Nat. Mater. 17, 21–28 (2018).
Schröter, N. B. M. et al. Chiral topological semimetal with multifold band crossings and long Fermi arcs. Nat. Phys. 15, 759–765 (2019).
Valla, T. et al. Evidence for quantum critical behavior in the optimally doped cuprate Bi2Sr2CaCu2O8 + δ. Science 285, 2110–2113 (1999).
Levy, G., Nettke, W., Ludbrook, B. M., Veenstra, C. N. & Damascelli, A. Deconstruction of resolution effects in angle-resolved photoemission. Phys. Rev. B 90, 045150 (2014).
Zhang, P. et al. A precise method for visualizing dispersive features in image plots. Rev. Sci. Instrum. 82, 043712 (2011).
He, Y., Wang, Y. & Shen, Z.-X. Visualizing dispersive features in 2D image via minimum gradient method. Rev. Sci. Instrum. 88, 073903 (2017).
Peng, H. et al. Super resolution convolutional neural network for feature extraction in spectroscopic data. Rev. Sci. Instrum. 91, 033905 (2020).
Kim, Y. et al. Deep learning-based statistical noise reduction for multidimensional spectral data. Rev. Sci. Instrum. 92, 073901 (2021).
Moser, S. An experimentalist’s guide to the matrix element in angle resolved photoemission. J. Electron Spectros. Relat. Phenomena 214, 29–52 (2017).
Murphy, K. P. Machine Learning: A Probabilistic Perspective (MIT Press, 2012).
Ghahramani, Z. Probabilistic machine learning and artificial intelligence. Nature 521, 452–459 (2015).
Wang, C., Komodakis, N. & Paragios, N. Markov random field modeling, inference and learning in computer vision and image understanding: a survey. Comput. Vis. Image Underst. 117, 1610–1627 (2013).
Comer, M. & Simmons, J. The Markov random field in materials applications: a synoptic view for signal processing and materials readers. IEEE Signal Process. Mag. 39, 16–24 (2022).
Kaufmann, K. et al. Crystal symmetry determination in electron diffraction using machine learning. Science 367, 564–568 (2020).
Traving, M. et al. Electronic structure of WSe2: a combined photoemission and inverse photoemission study. Phys. Rev. B 55, 10392–10399 (1997).
Finteis, T. et al. Occupied and unoccupied electronic band structure of WSe2. Phys. Rev. B 55, 10400–10411 (1997).
Kormányos, A. et al. k⋅p theory for two-dimensional transition metal dichalcogenide semiconductors. 2D Mater. 2, 022001 (2015).
Stimper, V., Bauer, S., Ernstorfer, R., Scholkopf, B. & Xian, R. P. Multidimensional contrast limited adaptive histogram equalization. IEEE Access 7, 165437–165447 (2019).
Perdew, J. P. & Schmidt, K. Jacob’s ladder of density functional approximations for the exchange-correlation energy. In AIP Conference Proceedings (Eds Doren, V. V. et al.) 1–20 (AIP, 2001).
Golze, D., Dvorak, M. & Rinke, P. The GW Compendium: a practical guide to theoretical photoemission spectroscopy. Front. Chem. 7, 377 (2019).
Zacharias, M., Scheffler, M. & Carbogno, C. Fully anharmonic nonperturbative theory of vibronically renormalized electronic band structures. Phys. Rev. B 102, 045126 (2020).
Zhang, D. & Lu, G. Review of shape representation and description techniques. Pattern Recognit. 37, 1–19 (2004).
Khotanzad, A. & Hong, Y. Invariant image recognition by Zernike moments. IEEE Trans. Pattern Anal. Mach. Intell. 12, 489–497 (1990).
Mahajan, V. N. & Dai, G.-m. Orthonormal polynomials in wavefront analysis: analytical solution. J. Opt. Soc. Am. A 24, 2994–3016 (2007).
Himanen, L., Geurts, A., Foster, A. S. & Rinke, P. Data driven materials science: status, challenges and perspectives. Adv. Sci. 6, 1900808 (2019).
Horton, M. K., Dwaraknath, S. & Persson, K. A. Promises and perils of computational materials databases. Nat. Comput. Sci. 1, 3–5 (2021).
Kiureghian, A. D. & Ditlevsen, O. Aleatory or epistemic? Does it matter? Struct. Safety 31, 105–112 (2009).
Nocedal, J. & Wright, S. J. Numerical Optimization 2nd edn (Springer, 2006).
Xian, R. P., Ernstorfer, R. & Pelz, P. M. Scalable multicomponent spectral analysis for high-throughput data annotation. Preprint at https://arxiv.org/abs/2102.05604 (2021).
Smith, M. W. Roughness in the Earth Sciences. Earth Sci. Rev. 136, 202–225 (2014).
Guo, H. et al. Double resonance Raman modes in monolayer and few-layer MoTe2. Phys. Rev. B 91, 205415 (2015).
Heremans, J. P., Cava, R. J. & Samarth, N. Tetradymites as thermoelectrics and topological insulators. Nat. Rev. Mater. 2, 17049 (2017).
Ehrhardt, M. & Koprucki, T. (eds) Multi-band effective mass approximations. In Lecture Notes in Computational Science and Engineering Vol. 94 (Springer, 2014).
Scheffler, M. et al. FAIR data enabling new horizons for materials research. Nature 604, 635–642 (2022).
Kordyuk, A. A. et al. Bare electron dispersion from experiment: self-consistent self-energy analysis of photoemission data. Phys. Rev. B 71, 214513 (2005).
Noack, M. M. et al. Gaussian processes for autonomous data acquisition at large-scale synchrotron and neutron facilities. Nat. Rev. Phys. 3, 685–697 (2021).
Beaulieu, S. et al. Ultrafast dynamical Lifshitz transition. Sci. Adv. 7, eabd9275 (2021).
Curcio, D. et al. Accessing the spectral function in a current-carrying device. Phys. Rev. Lett. 125, 236403 (2020).
Wilson, N. R. et al. Determination of band offsets, hybridization, and exciton binding in 2D semiconductor heterostructures. Sci. Adv. 3, e1601832 (2017).
Ulstrup, S. et al. Nanoscale mapping of quasiparticle band alignment. Nat. Commun. 10, 3283 (2019).
Ewings, R. et al. Horace: software for the analysis of data from single crystal spectroscopy experiments at time-of-flight neutron instruments. Nucl. Instrum. Methods Phys. Res. A 834, 132–142 (2016).
Whittaker, C. E. et al. Exciton polaritons in a two-dimensional Lieb lattice with spin-orbit coupling. Phys. Rev. Lett. 120, 097401 (2018).
Frölich, A., Fischer, J., Wolff, C., Busch, K. & Wegener, M. Frequency-resolved reciprocal-space mapping of visible spontaneous emission from 3D photonic crystals. Adv. Opt. Mater. 2, 849–853 (2014).
Amenabar, I. et al. Hyperspectral infrared nanoimaging of organic samples based on Fourier transform infrared nanospectroscopy. Nat. Commun. 8, 14402 (2017).
von Rueden, L. et al. Informed machine learning—a taxonomy and survey of integrating prior knowledge into learning systems. IEEE Trans. Knowl. Data Eng. 35, 614–633 (2023).
Oelsner, A. et al. Microspectroscopy and imaging using a delay line detector in time-of-flight photoemission microscopy. Rev. Sci. Instrum. 72, 3968–3974 (2001).
Xian, R. P. et al. An open-source, end-to-end workflow for multidimensional photoemission spectroscopy. Sci. Data 7, 442 (2020).
SPECS GmbH. METIS 1000 Brochure (SPECS, 2019); https://www.specs-group.com/fileadmin/user_upload/products/brochures/SPECS_Brochure-METIS_RZ_web.pdf
Xian, R. P., Rettig, L. & Ernstorfer, R. Symmetry-guided nonrigid registration: the case for distortion correction in multidimensional photoemission spectroscopy. Ultramicroscopy 202, 133–139 (2019).
Kittler, J. & Föglein, J. Contextual classification of multispectral pixel data. Image Vision Comput. 2, 13–29 (1984).
Abadi, M. et al. TensorFlow: large-scale machine learning on heterogeneous distributed systems. Preprint at https://arxiv.org/abs/1603.04467 (2016).
Li, S. Markov Random Field Modeling in Image Analysis 3rd edn (Springer, 2009).
Stimper, V. & Xian, R. P. Fuller. https://github.com/mpes-kit/fuller
Hinuma, Y., Pizzi, G., Kumagai, Y., Oba, F. & Tanaka, I. Band structure diagram paths based on crystallography. Comput. Mater. Sci. 128, 140–184 (2017).
Ceperley, D. M. & Alder, B. J. Ground state of the electron gas by a stochastic method. Phys. Rev. Lett. 45, 566–569 (1980).
Perdew, J. P. & Wang, Y. Accurate and simple analytic representation of the electron-gas correlation energy. Phys. Rev. B 45, 13244–13249 (1992).
Perdew, J. P., Burke, K. & Ernzerhof, M. Generalized gradient approximation made simple. Phys. Rev. Lett. 77, 3865–3868 (1996).
Perdew, J. P. et al. Restoring the density-gradient expansion for exchange in solids and surfaces. Phys. Rev. Lett. 100, 136406 (2008).
Heyd, J., Scuseria, G. E. & Ernzerhof, M. Hybrid functionals based on a screened Coulomb potential. J. Chem. Phys. 118, 8207–8215 (2003).
Blum, V. et al. Ab initio molecular simulations with numeric atom-centered orbitals. Comput. Phys. Commun. 180, 2175–2196 (2009).
Huhn, W. P. & Blum, V. One-hundred-three compound band-structure benchmark of post-self-consistent spin-orbit coupling treatments in density functional theory. Phys. Rev. Mater. 1, 033803 (2017).
Wyant, J. C. & Creath, K. in Applied Optics and Optical Engineering Vol. XI (eds Shannon, R. R. & Wyant, J. C.) 1–53 (Academic Press, 1992).
Watkins, D. S. Fundamentals of Matrix Computations 3rd edn (Wiley, 2010).
Zacharias, M. & Carbogno, C. First-principles calculations for 2H-WSe2. NOMAD Repository https://nomad-lab.eu/prod/rae/gui/dataset/id/CS7f_obIQd6hE3-2JHfSuw (2020).
** and band reconstruction of 2H-WSe2. Zenodo https://doi.org/10.5281/zenodo.7314278 (2022).
Dendzik, M. et al. Excited-state photoemission band mapping data of the topological insulator Bi2Te2Se. Zenodo https://doi.org/10.5281/zenodo.7317667 (2022).
Dendzik, M. et al. Synchrotron bulk photoemission data from Au(111) and DFT calculations. Zenodo https://doi.org/10.5281/zenodo.7305241 (2022).
Xian, R. P. et al. Fuller: code and examples for the band structure reconstruction workflow. Zenodo https://doi.org/10.5281/zenodo.7325584 (2022).
Acknowledgements
We thank M. Scheffler for fruitful discussions and S. Schülke and G. Schnapka at Gemeinsames Netzwerkzentrum (GNZ) in Berlin and M. Rampp at Max Planck Computing and Data Facility (MPCDF) in Garching for support on the computing infrastructure. The work was partially supported by BiGmax, the Max Planck Society’s Research Network on Big-Data-Driven Materials-Science, the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grants 740233 and ERC-2015-CoG-682843), the German Research Foundation (DFG) through the Emmy Noether programme under grant no. RE 3977/1, the SFB/TRR 227 ‘Ultrafast Spin Dynamics’ (project-ID 328545488, projects A09 and B07) and the NOMAD pillar of the FAIR-DI e.V. association. We thank M. Bremholm for providing the Bi2Te2Se samples, and Ph. Hofmann and M. Bianchi for their support in obtaining Au(111) photoemission data. M.D. acknowledges support from the Göran Gustafssons Foundation. S. Beaulieu acknowledges financial support from the Banting Fellowship from the Natural Sciences and Engineering Research Council (NSERC) in Canada.
Funding
Open access funding provided by Max Planck Society.
Author information
Contributions
R.P.X. and R.E. conceived and coordinated the project. The photoemission band-mapping experiments were supervised by L.R., R.E. and M.W. S.D. and S. Beaulieu acquired the data on WSe2, and M.D. acquired the data on Bi2Te2Se and Au(111). M.Z., M.D. and C.C. performed the DFT BS calculations. R.P.X. and M.D. processed the raw data. R.P.X. devised the BS digitization, algorithm validation schemes and metrics, and performed computational benchmarking. V.S. designed and implemented the machine-learning algorithm under the supervision of S. Bauer and B.S., along with input from R.P.X. R.P.X. and V.S. co-wrote the first draft of the manuscript with contributions from M.Z. and M.D. All authors contributed to discussions and revision of the manuscript to its final version.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Computational Science thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editor: Jie Pan, in collaboration with the Nature Computational Science team.
Supplementary information
Supplementary Information
Supplementary Figs. 1–14, Tables 1–4, derivations of formulae, extended numerical validations and discussion.
Supplementary Video 1
Left side shows the position of the cut viewed from the first projected Brillouin zone of WSe2. Right side shows the corresponding 2D cut in (ky, E) coordinates from volumetric band mapping data, overlaid with the DFT calculation performed at the LDA level (LDA-DFT), used to initialize the reconstruction, and the resulting 14 reconstructed valence bands.
Supplementary Video 2
Left side shows the position of the cut viewed from the first projected Brillouin zone of WSe2. Right side shows the corresponding 2D cut in (kx, E) coordinates from volumetric band mapping data, overlaid with the DFT calculation performed at the LDA level (LDA-DFT), used to initialize the reconstruction, and the resulting 14 reconstructed valence bands.
Supplementary Video 3
The video explores the reconstructed valence bands from photoemission band mapping data on WSe2 using LDA-level DFT calculation as the initialization. It illustrates the generation of an exploded view of the bands from the original reconstruction, the bands viewed collectively from different angles and the individual view of each band.
Source data
Source Data Fig. 1
Numerical data contained in Fig. 1c–f.
Source Data Fig. 2
Numerical data contained in Fig. 2c,d.
Source Data Fig. 3
Numerical data contained in Fig. 3.
Source Data Fig. 4
Numerical data contained in Fig. 4.
Source Data Fig. 5
Numerical data contained in Fig. 5c,e.
Source Data Fig. 6
Numerical data contained in Fig. 6b–d,f,g.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
DOI: https://doi.org/10.1038/s43588-022-00382-2
- Springer Nature America, Inc.