Introduction

Macroscopic symmetry is one of the central concepts in modern condensed matter physics and materials science1,2,3,4,5,6. Formalized via point and space group theory, symmetry underpins areas such as structural analysis and serves as the basis for the descriptive formalism of quasiparticles and elementary excitations, phase transitions, and mesoscopic order-parameter-based descriptions, especially of crystalline solids. In macroscopic physics, symmetry concepts arrived with the advent of X-ray methods developed by Bragg, and for almost a century remained the primary and natural language of physics. Notably, the rapid propagation of laboratory X-ray diffractometers and large-scale X-ray scattering facilities provided ample experimental data across multiple material classes and served as a necessary counterpart for theoretical developments. Correspondingly, symmetry-based descriptors have emerged as a foundational element of condensed matter physics and materials science alike.

The natural counterpart of symmetry-based descriptors is the concept of physical building blocks. Thus, crystalline solids can be generally described via a combination of the unit cells with discrete translational lattice symmetries2,3,7. At the same time, systems such as Penrose structures8,9,10,11,12,13 possess well-defined building blocks but undefined translation symmetry. Finally, a broad range of materials lack translational symmetries, with examples ranging from structural glasses and polymers to ferroelectric and magnetic morphotropic systems14,15,16,17,18,19,20,21,22. Remarkably, the amenability of symmetry-based descriptors has led to much deeper insights into the structure and functionalities of materials with translational symmetries compared to (partially) disordered systems23,24,25.

The beginning of the 21st century has seen the emergence of real space imaging methods including scanning probe microscopy (SPM)26,27,28 and especially (scanning) transmission electron microscopy ((S)TEM)29,30,31. Following the introduction of the aberration corrector in the late ‘90s32 and the advent of commercial aberration-corrected microscopes, atomically resolved imaging is now mainstream. Notably, modern STEMs allow atomic columns to be imaged with ~pm-level precision33. This level of structural information allows insight into the chemical and physical functionalities of materials, including chemical reactivity, magnetic, and dielectric properties utilizing structure-property correlations developed by condensed matter physicists from macroscopic scattering data34,35,36,37,38,39,40. Over the last decade, several groups have extended these analyses to derive mesoscopic order parameter fields such as polarization41,42,43,44, strains and chemical strains45, and octahedra tilts46,47,48 directly from STEM and SPM data. Strain measurements have also been done in reciprocal space using nano-diffraction49, ultrafast CBED50, and a combination of 4D-STEM and machine learning51. In several cases, these data can be matched to the mesoscopic Ginzburg-Landau models, providing insight into the generative mesoscopic physics of the material52,53. Recently, a similar approach was proposed and implemented for theory-experiment matching via microscopic degrees of freedom54,55,56.

Yet, despite the wealth of information contained in atomically resolved imaging data, analyses to date were almost invariably based on the mathematical apparatus developed for macroscopic scattering data57,58,59,60,61. However, the nature of microscopic measurements is fundamentally different. For the case of an ideal single crystal containing a macroscopic number of structural units, the symmetry of the diffraction pattern represents that of the lattice, and the width of the peaks in Fourier space is determined by intrinsic factors such as the angular resolution of the measurement system, rather than disorder in the material. The presence of symmetry breaking distortions, such as the transition from a cubic to tetragonal state, is instantly detectable from peak splitting. For microscopic observations, only a small part of the object is visible and the positions of the atoms are known only within an uncertainty interval; this uncertainty can be comparable to the magnitude of the symmetry breaking feature of interest such as tetragonality or polarization. Thus, questions arise: What image size is sufficient to justify defining the symmetry from atomically resolved data? And what level of confidence can be assigned to that definition? Ideally, such an approach should be applicable not only for structural data, but also for more complex multi-dimensional data sets such as those available in scanning tunneling spectroscopy (STS)62 in scanning tunneling microscopy (STM), force-distance curve imaging63 in atomic force microscopy, or electron energy loss spectroscopy (EELS)64,65 and ptychographic imaging66,67,68 in scanning transmission electron microscopy (STEM).

Here we propose an approach for the analysis of spatially resolved data based on deep learning in a Bayesian setting. This analysis utilizes the synergy of three fundamental concepts: the (postulated) parsimony of the atomic-level descriptors corresponding to stable atomic configurations, the presence of distortions in the idealized descriptors (e.g., due to local strains or other forms of symmetry breaking), and the presence of possible discrete or continuous rotational symmetries. These concepts are implemented in a workflow combining feature selection (atom finding), a rotationally invariant variational autoencoder to determine symmetry invariant building blocks, and a conditional autoencoder to explore intra-class variability via relevant disentangled representations. This approach is demonstrated for 2D imaging data but can also be generalized for more complex multi-dimensional data sets.

Results and discussion

Why local symmetry is Bayesian

Here, we illustrate why a consistent definition of local symmetry properties necessitates the Bayesian framework. As an elementary, but easy to generalize, example, we consider the 1D diatomic chain formed by alternating atoms (1) and (2) with coordinates generated by the rule \(x_i^{(1)} = x_i^{(2)} + a\), \(x_{i + 1}^{(2)} = x_i^{(1)} + b\). The atomic coordinates are experimentally observed with some uncertainty stemming from observational noise, sampling, etc.; hence, the quantities available for observation are the atomic positions, \(x_j^{exp}\), which are the sum of the ideal positions, \(x_j^{(1,2)}\), and noise, \(\delta\). We assume that the atom types are not observed (e.g., they have similar contrast), i.e., atoms (1) and (2) are indistinguishable. Correspondingly, we aim to answer the question: what number of observations can distinguish the simple chain, a = b, from the diatomic chain, \(a \ne b\)? Note that this problem is equivalent to, e.g., distinguishing a square and tetragonal unit cell and can be generalized to more complex cases with the addition of several parameters.
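As an illustrative sketch (not the code used in this work), the chain model above can be simulated by generating ideal positions with alternating spacings a and b and adding Gaussian noise \(\delta\) to each position. The function names, the Gaussian form of the noise, and the numerical values are assumptions chosen for illustration only.

```python
import numpy as np

def simulate_chain(n_atoms, a, b, noise_sd, seed=0):
    """Noisy observed positions of a 1D chain with alternating spacings a, b."""
    rng = np.random.default_rng(seed)
    # Ideal spacings alternate a, b, a, b, ... between consecutive atoms.
    spacings = np.where(np.arange(n_atoms - 1) % 2 == 0, a, b)
    ideal = np.concatenate([[0.0], np.cumsum(spacings)])
    # Observed position = ideal position + observational noise (assumed Gaussian).
    return ideal + rng.normal(0.0, noise_sd, size=n_atoms)

def odd_even_bonds(x_obs):
    """Split consecutive bond lengths into the two interleaved sets."""
    bonds = np.diff(x_obs)
    return bonds[0::2], bonds[1::2]

# Atom types are not used anywhere: only the positions are "observed".
x_obs = simulate_chain(n_atoms=201, a=0.95, b=1.05, noise_sd=0.05)
odd, even = odd_even_bonds(x_obs)
print(f"mean odd bond = {odd.mean():.3f}, mean even bond = {even.mean():.3f}")
```

Note that noise added to positions produces bond-length errors that are correlated between neighboring bonds; the examples that follow instead draw the bond lengths directly from normal distributions.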

The classical answer to this question is given by frequency-based statistics. Here, an alternative hypothesis (i.e., single vs. double chain) is formed, the point estimates for the average lattice parameters and their dispersions are calculated, and the p-test can be used to determine the correctness of the hypothesis. However, this approach has several significant limitations: it does not consider any potential prior knowledge of the system, it implicitly relies on the relevant distributions being Gaussian, and it is sensitive to the choice of an ideal system. A detailed analysis of the relevant drawbacks is given by Kruschke69.

An alternative approach to these problems is via the Bayesian framework, based on the concept of prior and posterior probabilities linked as70,71:

$$p\left( {\theta _i|D} \right) = \frac{{p\left( {D|\theta _i} \right)p\left( {\theta _i} \right)}}{{p\left( D \right)}}$$
(1)

where D represents the data obtained during the experiment, \(p\left( {D|\theta _i} \right)\) represents the likelihood that this data can be generated by the model, i, with parameter, \(\theta _i\). The prior, \(p\left( {\theta _i} \right)\), reflects the prior knowledge about the model. The posterior, \(p\left( {\theta _i|D} \right)\), describes the new knowledge (i.e., updated model and model parameters) as a result of the observational data. Finally, \(p\left( D \right)\) is the denominator that defines the total space of possible outcomes.

As an example, a set of diatomic chains is generated with bond lengths derived from two normal distributions, N(µ = 0.5, σ² = 0.01) and N(µ = 1.5, σ² = 0.01), where µ is the mean and σ the standard deviation of the distribution. These two sets of bond lengths are treated independently and are referred to as odd and even bond lengths, respectively. The likelihoods for this case are also assumed to be normal distributions, N(µ = µ1, σ = σ1) and N(µ = µ2, σ = σ2). A total of four parameters, µ1 and σ1 for the odd bond lengths and µ2 and σ2 for the even bond lengths, exhaustively determine the parameter space. We refer to this analysis as case-1.

The key element of Bayesian inference is the concept of the prior, summarizing the known information on the system70,71,72. In experiments, the priors are typically formed semi-quantitatively based on general physical knowledge of the material (e.g., SrTiO3 is known to be cubic with lattice parameter 3.905 Å). For this model example, the prior distributions of all four parameters are formed based on the first 10 observations, Y10. The prior distribution for µ1 and µ2 is a Laplace distribution, L(µ = Y10, b = 0.2*Y10), whereas for σ1 and σ2 it is a uniform distribution, U(0, Y10). This method of prior selection removes any a priori bias about the sample and only uses data obtained from experimental images. However, the priors can also be obtained from known materials properties (assuming a perfect imaging system). The posterior distributions of the parameters are updated with each datapoint. Figure 1a shows a schematic of how the odd and even chain analyses can be extended to a more general 2D Bravais lattice. Figure 1b shows the final posterior distributions of the parameters involved, with the posterior distributions after an update with the first respective datapoints of each set shown by the solid lines. The means of both normal distributions are close to the real values and are far away from each other.
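The update described above can be sketched with a simple grid approximation of the joint posterior over (µ, σ) for one set of bond lengths. This is a minimal illustration, not the inference code used for Fig. 1: the grid ranges, the interpretation of Y10 as the mean of the first 10 observations, and the function name `grid_posterior` are assumptions.

```python
import numpy as np

def grid_posterior(data, n_prior=10, n_grid=200):
    """Grid approximation of p(mu, sigma | data) with a normal likelihood."""
    y10 = np.mean(data[:n_prior])      # assumption: Y10 = mean of first 10 points
    mu = np.linspace(0.5 * y10, 1.5 * y10, n_grid)
    sigma = np.linspace(y10 / n_grid, y10, n_grid)   # support of U(0, Y10), zero excluded
    M, S = np.meshgrid(mu, sigma, indexing="ij")
    # Log-prior: Laplace(Y10, 0.2*Y10) on mu; the uniform prior on sigma is a constant.
    log_post = -np.abs(M - y10) / (0.2 * y10)
    for y in data:                     # one Bayesian update per datapoint
        log_post += -np.log(S) - 0.5 * ((y - M) / S) ** 2
    post = np.exp(log_post - log_post.max())
    post /= post.sum()                 # normalization plays the role of p(D)
    return mu, sigma, post

rng = np.random.default_rng(1)
odd = rng.normal(0.5, 0.1, size=100)   # case-1 odd bonds: mean 0.5, variance 0.01
mu, sigma, post = grid_posterior(odd)
mu_mean = np.sum(post.sum(axis=1) * mu)         # posterior mean of mu
sigma_mean = np.sum(post.sum(axis=0) * sigma)   # posterior mean of sigma
print(f"posterior mean of mu = {mu_mean:.3f}, of sigma = {sigma_mean:.3f}")
```

Because the log-likelihood terms are simply accumulated, stopping the loop after any number of datapoints gives the intermediate posterior at that stage of the update.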

Fig. 1: Symmetry in Bayesian setting.

a Schematic illustration of the correspondence between even-odd chains and 2D lattice. b Final posterior distributions of parameters (µ1, µ2, σ1, and σ2) involved in analysis of case-1, posterior distributions after first update are shown by solid lines. c Final posterior distributions of parameters (µ1, µ2, σ1, and σ2) involved in case-2, posterior distributions after first update are shown by solid lines. d Posterior distribution of µ3 as a function of number of datapoints for case-2. e Highest density interval (HDI) of µ3 (blue) and region of practical equivalence (ROPE) (red).

For a non-trivial case, odd and even bond lengths are derived from the normal distributions, N(µ = 0.95, σ² = 0.01) and N(µ = 1.05, σ² = 0.01). Here, the difference in the means is on the order of the standard deviation. We refer to this analysis as case-2. Figure 1c shows the final posterior distributions of the parameters involved, with the posterior distributions after the first update shown by the solid lines. To answer the question of whether the set of bond lengths belongs to a simple chain or a diatomic chain, we construct the distribution of the difference in bond lengths with a likelihood, N(µ = µ3, σ = σ3). For a simple ideal lattice, this distribution should be centered at zero with zero standard deviation. Figure 1d shows the posterior distribution for µ3 as a function of the number of datapoints. We then construct a region of practical equivalence (ROPE), which is the interval around the hypothesized value within which the hypothesis is still considered true for practical purposes. A decision on the validity of the hypothesis can be made by comparing the highest density interval (HDI, 94% credible interval) and the ROPE. For illustration purposes, the ROPE is considered to be [−0.1, 0.1] in Fig. 1e and the HDI for µ3 is also shown. Decision rules for different overlaps of HDI and ROPE are discussed in, e.g., ref. 70.
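The HDI-versus-ROPE comparison can be sketched as follows, assuming that posterior samples of µ3 are available. The samples here are a synthetic stand-in rather than the output of the actual analysis, and the three-way decision rule is one common convention, stated as an assumption.

```python
import numpy as np

def hdi(samples, prob=0.94):
    """Narrowest interval containing a fraction `prob` of the samples (unimodal case)."""
    s = np.sort(np.asarray(samples))
    n = len(s)
    k = int(np.ceil(prob * n))               # number of samples inside the interval
    widths = s[k - 1:] - s[: n - k + 1]      # widths of all candidate intervals
    i = int(np.argmin(widths))
    return s[i], s[i + k - 1]

def rope_decision(interval, rope):
    """Compare an HDI with a ROPE; returns 'accept', 'reject', or 'undecided'."""
    lo, hi = interval
    if lo >= rope[0] and hi <= rope[1]:
        return "accept"       # HDI entirely inside the ROPE: practically a simple chain
    if hi < rope[0] or lo > rope[1]:
        return "reject"       # HDI entirely outside the ROPE: diatomic chain
    return "undecided"        # partial overlap: more data are needed

rng = np.random.default_rng(0)
mu3_samples = rng.normal(0.1, 0.014, size=5000)   # stand-in for posterior draws of mu3
interval = hdi(mu3_samples)
decision = rope_decision(interval, rope=(-0.1, 0.1))
print(interval, decision)
```

With the illustrative ROPE of [−0.1, 0.1] and a true difference of 0.1, the HDI straddles the ROPE boundary, which is why a wide ROPE cannot resolve this case regardless of how narrow the posterior becomes.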

This simple example illustrates that for microscopic observations, many fundamental parameters are defined only in the Bayesian sense as the posterior probability densities. They can be related to the macroscopic definitions through concepts such as practical equivalence. For large system sizes, the Bayesian estimates converge to the macroscopic model. We posit that descriptions of symmetry and structural properties built from the bottom up should be Bayesian in nature, updating the prior knowledge of the system with the experimental data.

Local crystallographic analysis

As a second concept, we discuss established approaches for the systematic analysis of atomic structures from experimental observations and the deep fundamental connections between the intrinsic symmetries present (or postulated) in the data and the neural network architectures. For example, the classical fully connected multilayer perceptron intrinsically assumes the presence of potential strong correlations between arbitrarily separated pixels of the input image, resulting in a well-understood limitation of these networks to only relatively low-dimensional features. Convolutional neural networks (CNNs) are introduced as a universal approach for equivariant data analysis where the features of interest can be present anywhere within the image plane. This network architecture implicitly assumes the presence of continuous translational symmetry, similar to the sliding window/transform approach73,74,75. While allowing derivation of mesoscopic information, even for atomically resolved data, this approach suffers from inevitable spatial averaging and ignores the existence of well-defined atomic units.

If the positions of the atomic species can be determined, the analysis can be performed based on the local atomic neighborhoods (local crystallography)76,77 or the full atomic connectivity graph. In these approaches, the full image is reduced to atomic coordinates and the subsequent analysis is based on the latter. It is important to note that in this case all remaining information in the image plane is ignored, i.e., the full data set is approximated by the point estimates of the atomic positions. Finally, the combined approach can be based on the analysis of sub-images centered on defined atomic positions78,79. In this case, the known atomic positions provide the reference points and the sub-images contain information on the structure and functionality around them.

For atomic and sub-image-based descriptors, the behavior referenced to the ideal behavior is of interest and is defined by high-symmetry positions or ideal lattice sites. If these are known, then behaviors such as symmetry-breaking distortions can be immediately quantified and explored. However, the very nature of experimental observations is such that this ground truth information is not available directly, necessitating suitable approximations. For example, an ideal lattice can be postulated and average parameters can be found using a suitable filtering method. However, this approach is sensitive to minute distortions of the image (e.g., due to drift) and image distortion correction is required. Similarly, variability in the observed images due to microscope configurations (mis-tilt, etc.) can provide observational biases.

These examples illustrate that deep analysis of the structure and symmetry from atomically resolved data sets necessitates simultaneous (a) identification of ideal building blocks and symmetry breaking distortions, while (b) allowing for general rotational invariance in the image plane and (c) accounting for discrete translational symmetry as implemented in the Bayesian setting. Ideally, such descriptors will be referenced to local features.

Bayesian local crystallography

Here, we aim to combine the local crystallography and Bayesian approaches. The general workflow for deep Bayesian local crystallographic analysis is shown in Fig. 2a. For the first step, the STEM image or a stack of images is fed into the deep fully convolutional neural network (DCNN) for semantic segmentation and atom finding80,81. Semantic segmentation refers to a process where each pixel in the raw experimental data is categorized as belonging to an atom (or to a particular type of atom) or to a “background” (vacuum). The atom finding procedure is then performed on the segmented data by finding the center of mass of each segmented blob (corresponding to an atom) with sub-pixel precision. The details of the DCNN configuration and implementation are available via the AtomAI repository on GitHub82. The DCNN-derived atomic positions are used to define the stack of sub-images that are centered on each atom and represent the image contrast in the vicinity of each atom.
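The atom-finding and sub-image steps can be sketched as below. This is not the AtomAI implementation: a synthetic probability map stands in for the DCNN output, a fixed threshold is assumed, and the helper names `find_atoms` and `extract_subimages` are hypothetical. Only `scipy.ndimage.label` and `scipy.ndimage.center_of_mass` are used for the blob analysis.

```python
import numpy as np
from scipy import ndimage

def find_atoms(prob_map, threshold=0.5):
    """Sub-pixel centers of mass (row, col) of connected blobs in a segmentation map."""
    mask = prob_map > threshold
    labels, n_blobs = ndimage.label(mask)
    coms = ndimage.center_of_mass(prob_map, labels, index=np.arange(1, n_blobs + 1))
    return np.array(coms).reshape(-1, 2)

def extract_subimages(image, coords, window=16):
    """Stack of window x window crops centered on each atom; edge atoms are dropped."""
    half = window // 2
    stack, kept = [], []
    for r, c in np.rint(coords).astype(int):
        if half <= r < image.shape[0] - half and half <= c < image.shape[1] - half:
            stack.append(image[r - half:r + half, c - half:c + half])
            kept.append((r, c))
    return np.array(stack), np.array(kept)

# Synthetic stand-in for a DCNN output: four Gaussian "atoms" on a 64 x 64 grid.
yy, xx = np.mgrid[0:64, 0:64]
true_centers = [(20.0, 20.0), (20.0, 44.0), (44.0, 20.3), (44.0, 44.0)]
image = sum(np.exp(-((yy - r) ** 2 + (xx - c) ** 2) / (2 * 2.0 ** 2))
            for r, c in true_centers)
coords = find_atoms(image)
stack, kept = extract_subimages(image, coords, window=16)
print(coords.shape, stack.shape)
```

The same `extract_subimages` call can be applied to the raw image, a smoothed image, or the segmentation output, which is how the choice of sub-image stack determines the information explored downstream.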

Fig. 2: Schematic of the Bayesian local crystallographic and (c)rVAE workflow.

a General workflow of deep Bayesian local crystallographic analysis. b Schematic of (conditional) rotational variational autoencoder, (c)rVAE, workflow. In (c)rVAE, the encoder layers can be either fully connected or convolutional, whereas the decoder layers are always fully-connected layers (in the current implementation, inclusion of convolutional layers breaks the rotational symmetry).

Note that this sub-image description is chosen since both the original STEM data and DCNN reconstructions contain information beyond atomic coordinates, such as column shapes and unresolved features, and this needs to be taken into account during analysis. It is important to note that the choice of sub-image stack (original image, smoothed image, or DCNN output) defines the type of information that will be explored. For example, DCNN outputs define the probability density that a certain image pixel belongs to a given atom class, which is optimal for exploration of chemical transformation pathways. At the same time, original image contrast may be optimal for exploration of physical phenomena. Finally, we note that an extremely important issue in this analysis is the correction of distortions for effects such as fly-back delays or general image instabilities, which can both alleviate unwanted artifacts and introduce new ones. Several examples of these will be discussed below. If necessary, these sub-images can be used to further refine the classes using standard methods such as principal component analysis (PCA) or Gaussian mixture modelling (GMM). GMM is a clustering technique that assumes that each cluster is a multivariate normal distribution characterized by a mean and a covariance matrix. The parameters (means and covariance matrices) of the clusters are estimated by maximum a posteriori estimation83. However, as mentioned above, these clustering methods will tend to separate the atoms into symmetry equivalent positions, leading to over-classification and poorly separable classes.

To avoid this problem, the subsequent step in the analysis is the rotationally invariant variational autoencoder (rVAE). In general, a VAE is a directed latent-variable probabilistic graphical model. It allows learning a stochastic mapping between an observed x-space (in this case, the space of the sub-images) with a complicated empirical distribution and a latent z-space whose distribution can be relatively simple84. Recently, it has been used by a subset of authors for exploring the (latent) order parameter from imaging data on dynamically evolving systems ranging from monolayer graphene85 to protein nanoparticles86. More specifically, the VAE consists of generative and inference models, which are Bayesian networks of the form \(p\left( {x|z} \right)p(z)\) and \(q\left( {z|x} \right)\), respectively. For the generative model, the latent variable, zi, is a “code” (hidden representation) from which it reconstructs xi. The potentially complex, non-linear dependency between xi and zi is parameterized by a (deep) neural network (NN) with weights θ, \(p_\theta \left( {x|z} \right)\), which takes “code” zi as an input. The inference model is used to approximate the posterior of the generative model, \(p_\theta \left( {z|x} \right)\), and represents a flexible family of variational distributions parameterized by a NN with weights ϕ, \(q_\phi \left( {z|x} \right)\). The NN-parameterized inference and generative models are frequently referred to as encoder and decoder, respectively. The point estimates for the parameters of the two networks (θ and ϕ) are jointly learned by maximizing the evidence lower bound (ELBO), consisting of the reconstruction loss term and the Kullback-Leibler (KL) divergence term, with mini-batch stochastic gradient descent (SGD). We note that a fully Bayesian treatment of the encoder and decoder weights is also possible, in which case the mini-batch SGD training procedure is substituted by a full-batch Hamiltonian Monte Carlo. However, as of now, aside from extremely high computational costs, the fully Bayesian neural networks show surprisingly poor performance on the data corrupted by noise87, which makes them potentially suboptimal for experimental data.

Here, we aim to learn a rotationally invariant code for our data. Unfortunately, standard neural network layers (fully connected and convolutional) do not respect rotational symmetry or invariance. One potential way to circumvent this problem is to use convolutional layers with modified, steerable filters88. Another approach, which is specific to the VAE setup, is to disentangle rotations from image content by making the generative model (decoder) explicitly dependent on the coordinates (Fig. 2b)89. In this case, we sample our ‘latent angle’ from a prescribed distribution (more details below) and use it to perform a 2D rotation of the coordinate grid, \(R\left( \gamma \right)\mathbf{x}_g\). The rotated grid is then passed to a decoder where it is concatenated with “standard” VAE latent variables. Overall, the generative process is defined as

$$p\left( z \right) = \mathcal{N}\left( {z|0,\;I} \right);\quad p\left( \gamma \right) = \mathcal{N}\left( {\gamma |0,\;s_\gamma ^2} \right)$$
(2)
$$p_\theta \left( {x|z,\;\gamma } \right) = Bern\left( {x|g\left( {z,\gamma } \right)} \right)$$
(3)

where \(p\left( z \right)\) is a standard normal prior for the continuous latent code, \(p\left( \gamma \right)\) is a normal prior for the latent angle \(\gamma\) with a “rotational prior” \(s_\gamma ^2\) set by a user, and \(Bern\left( {x|g\left( {z,\gamma } \right)} \right)\) is a parametrized Bernoulli likelihood function where \(g\) is a decoder NN with the coordinate transformation (followed by the concatenation) as an “input layer”. We note that while priors other than Normal, including von Mises and projected normal distributions90, can in principle be used for the latent angle, we did not find empirically any significant difference in the results for the dataset discussed in this paper. The encoder in our inference model outputs the approximate parameters of the posterior distribution,

$$q_\phi \left( {z,\;\gamma |x} \right) = \mathcal{N}\left( {z,\;\gamma |\mu _\phi \left( x \right),\;\sigma_\phi ^2\left( x \right)} \right)$$
(4)

where \(\mu _\phi \left( x \right)\) and \({\upsigma}_\phi ^2\left( x \right)\) correspond to the multi-head encoder NN. The loss objective (the negative ELBO) is computed as

$$\mathcal{L} = \mathcal{L}_{\mathrm{RE}} + D_{KL}\left( {q_\phi \left( {z|x} \right)\|p\left( z \right)} \right) + D_{KL}\left( {q_\phi \left( {\gamma |x} \right)\|p\left( \gamma \right)} \right),$$
(5)

where the first term is a reconstruction error (which, in case of Bernoulli likelihood, is equivalent to a binary cross-entropy loss), and the second and third terms are the KL divergences associated with image content and rotation angle, respectively.
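The loss in Eq. (5) can be written out numerically as a short sketch. It assumes diagonal Gaussian posteriors, so that both KL terms have the standard closed form, and it uses NumPy arrays in place of network outputs; the function names and the default value of the rotational prior \(s_\gamma\) are illustrative assumptions.

```python
import numpy as np

def kl_normal(mu_q, sd_q, sd_p=1.0):
    """KL( N(mu_q, sd_q^2) || N(0, sd_p^2) ), summed over the last axis."""
    return np.sum(np.log(sd_p / sd_q)
                  + (sd_q ** 2 + mu_q ** 2) / (2 * sd_p ** 2) - 0.5, axis=-1)

def rvae_loss(x, x_recon, z_mu, z_sd, g_mu, g_sd, s_gamma=0.1, eps=1e-7):
    """Negative ELBO: binary cross-entropy + KL for image content + KL for angle."""
    x_recon = np.clip(x_recon, eps, 1 - eps)      # avoid log(0)
    bce = -np.sum(x * np.log(x_recon) + (1 - x) * np.log(1 - x_recon), axis=(1, 2))
    kl_z = kl_normal(z_mu, z_sd)                  # prior N(0, I) on the latent code
    kl_g = kl_normal(g_mu, g_sd, sd_p=s_gamma)    # prior N(0, s_gamma^2) on the angle
    return np.mean(bce + kl_z + kl_g)             # average over the mini-batch

rng = np.random.default_rng(0)
x = (rng.random((4, 8, 8)) > 0.5).astype(float)   # batch of 4 binary 8 x 8 "sub-images"
z_mu, z_sd = np.zeros((4, 2)), np.ones((4, 2))
g_mu, g_sd = np.zeros((4, 1)), np.full((4, 1), 0.1)
loss_good = rvae_loss(x, 0.9 * x + 0.05, z_mu, z_sd, g_mu, g_sd)
loss_bad = rvae_loss(x, np.full_like(x, 0.5), z_mu, z_sd, g_mu, g_sd)
print(loss_good, loss_bad)
```

A small \(s_\gamma\) penalizes encoded angles far from zero, whereas a large value lets the model absorb more of the variability into rotations.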

Hence, for the rVAE, the latent space is configured to comprise the rotational angle and additional unstructured latent variables. We note that other (than rotation) affine transformations including lateral offsets and scale can be added as well. These VAE configurations are ideally suited for the analysis of the variability in the STEM sub-image stack since uncertainties in the atomic positions (if any) can be naturally accommodated through the offset latent variables and the continuous or discrete rotations are captured by the angle variable. The remaining latent variables can be used in a manner similar to classical variational autoencoders4. The latent space distribution exhibits extremely interesting behavior. Figure 5a shows the joint distribution of the latent variables, visualized both as individual points and with a superimposed kernel density estimate (KDE). Note that each point corresponds to the sub-image and describes the behavior of the local neighborhood of a single lattice atom. The representation as points and KDE allows the comparison between the total system behavior (including the distribution of outliers) and the corresponding densities (average behaviors) and is necessary given the large number of points (from \(\sim 10^4\) for single images to \(\sim 10^5\) for the stacks).
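The coordinate transformation that makes the decoder depend explicitly on rotation (and, optionally, lateral offset) can be sketched as follows. The grid range of [-1, 1], the helper names, and the way the latent code is tiled per pixel are assumptions for illustration; the actual decoder network is omitted.

```python
import numpy as np

def transformed_grid(size, gamma, shift=(0.0, 0.0)):
    """Pixel coordinate grid on [-1, 1]^2, rotated by gamma (radians) and shifted."""
    lin = np.linspace(-1.0, 1.0, size)
    xx, yy = np.meshgrid(lin, lin)
    x_g = np.stack([xx.ravel(), yy.ravel()], axis=1)        # shape (size*size, 2)
    rot = np.array([[np.cos(gamma), -np.sin(gamma)],
                    [np.sin(gamma),  np.cos(gamma)]])
    return x_g @ rot.T + np.asarray(shift)                  # R(gamma) x_g + offset

def decoder_input(size, gamma, z, shift=(0.0, 0.0)):
    """Concatenate every transformed pixel coordinate with the latent code z."""
    grid = transformed_grid(size, gamma, shift)
    z_tiled = np.tile(z, (grid.shape[0], 1))
    return np.concatenate([grid, z_tiled], axis=1)          # (size*size, 2 + len(z))

z = np.array([0.5, -1.0])                 # two unstructured latent variables
inp = decoder_input(size=8, gamma=0.3, z=z)
print(inp.shape)
```

Because the decoder predicts an intensity for each row of this array, the same latent code z reproduces the same local structure at any orientation, with the orientation carried entirely by the angle.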

Fig. 5: Application of conditional rVAE to the (LaxSr1-x)MnO3–NiO system.

Conditional rVAE analysis of the three GMM components from Fig. 4 corresponding to the a, d A-site (component 1), b, e B-site (component 2) columns of the LSMO phase and c, f NiO phase (component 3). The upper images are the raw images, and the lower images have the average of the 3×3 tableau subtracted.

To extend this analysis, we note that the rVAE often tends to disentangle dissimilar types of distortions within a system. For example, experiments with a large number of different STEM images (beyond those shown in this paper) illustrate that scan distortions often tend to be described by one latent variable (or one group of latent variables), whereas systematic changes in the local structure are described by the remaining latent variables. This property of VAEs is generally well known in computer science applications such as style networks; however, here we see that it applies to physical systems as well.

We further explore this separation of atomic units based on neighborhood behavior using disentangled representations. As observed in Fig. 3, the angle and latent variable 2 seem to offer the optimal 2D basis to separate the atomic units, with clear contrast and a lack of distortion behaviors. The corresponding distribution and KDE plots are shown in Fig. 4b, illustrating three clearly defined groups of points corresponding to the A-site and B-site cations in the LSMO phase and columns in the NiO phases, respectively. Note that the KDE peaks corresponding to the three atomic types that jointly comprise >90% of points are fairly narrow. At the same time, there are a large number of outliers showing the presence of atoms with the behaviors falling on the continuous lines between the three groups, forming the manifold of possible states in the system.

Finally, we can gain further insight into the spatial distributions and classes of behaviors via clustering in the latent space. Figure 4c shows the Gaussian mixture model (GMM) clustering of points in the latent space. Note that given the complex structure of the distribution, the choice of a proper covariance matrix for the GMM, or the exploration of different clustering methods, will highlight different aspects of system behavior and hence offer a powerful tool for the exploration of corresponding physics. Here, we show as an example the separation into three components. The spatial distribution of the label maps is shown in Fig. 4b and images corresponding to the centroids of the GMM classes are shown in Fig. 4e–g. Components 1 (blue) and 2 (green) correspond to the A and B sites in the perovskite, respectively, while component 3 (brown) corresponds to the NiO phase. To examine if additional components can provide more information, we repeat the analysis for five components in Fig. S5. The GMM analysis, shown in Fig. S5a, shows four well-defined clusters with one component more widely distributed. The spatial distribution of the components is shown in Fig. S5b while the individual components are shown in Fig. S5c–g. Once again, the first two components correspond to the A and B sites of the perovskite and the fourth component corresponds to the NiO phase. The third component, which is the distributed component in the GMM cluster plot, corresponds to a distorted NiO lattice and occurs at the edges of the NiO inclusions at the interface with the LSMO phase. The fifth component corresponds to a distorted A site in the LSMO. This is distributed throughout the perovskite lattice, with some horizontal stripes corresponding to the previously discussed scan distortions. The analysis is repeated on the DCNN segmented image shown in Fig. S6. In this case, components 1 and 3 correspond to the LSMO lattice and component 2 corresponds to the NiO phase. Components 1 and 3 are no longer easily identified as the A or B sites of the perovskite but appear to be related by rotation. The distribution of these components is essentially random throughout the LSMO lattice rather than showing the alternating pattern seen in Fig. 4b. This is most likely due to the loss of intensity and shape information due to semantic segmentation.
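Clustering of latent-space points with a full-covariance GMM can be sketched as below. This is a minimal expectation-maximization (maximum likelihood) illustration on synthetic 2D points, not the clustering code used for Fig. 4; the farthest-point initialization, the regularization constant, and the function name `fit_gmm` are assumptions.

```python
import numpy as np

def fit_gmm(X, k, n_iter=100, reg=1e-6):
    """Fit a k-component full-covariance Gaussian mixture to X (n, d) by EM."""
    n, d = X.shape
    # Deterministic farthest-point initialization of the means.
    means = [X[0]]
    for _ in range(k - 1):
        d2 = np.min([np.sum((X - m) ** 2, axis=1) for m in means], axis=0)
        means.append(X[np.argmax(d2)])
    means = np.array(means)
    covs = np.array([np.cov(X.T) + reg * np.eye(d)] * k)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities from log(weight * Gaussian density).
        log_r = np.empty((n, k))
        for j in range(k):
            diff = X - means[j]
            maha = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(covs[j]), diff)
            _, logdet = np.linalg.slogdet(covs[j])
            log_r[:, j] = np.log(weights[j]) - 0.5 * (maha + logdet + d * np.log(2 * np.pi))
        r = np.exp(log_r - log_r.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update weights, means, and covariance matrices.
        nk = r.sum(axis=0)
        weights = nk / n
        means = (r.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (r[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
    return weights, means, covs, r.argmax(axis=1)

rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])   # three synthetic "atom types"
X = np.vstack([c + 0.3 * rng.normal(size=(100, 2)) for c in centers])
weights, means, covs, labels = fit_gmm(X, k=3)
print(weights)
```

Mapping `labels` back onto the atomic coordinates from which the sub-images were cut yields the spatial label maps; restricting the covariance matrices (e.g., to diagonal form) is one way to highlight different aspects of the distribution.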

To get further insight into the materials structure, we explore the disentangled representations of the structural building blocks using the conditional rotationally invariant variational autoencoder (crVAE) approach. The schematic of the crVAE is shown in Fig. 2b. Here, the autoencoder approach is used on the concatenated image stack (or its reduced representation) and the class labels. At the decoding stage, the mini-batch with one-hot encoded labels is concatenated with the unstructured latent variables. This leads to the decoder probability distribution being conditioned on the continuous latent code z and discrete labels c, \(p_\theta \left( {x|z,c} \right)\). A typical example of the crVAE application is disentanglement of the styles in the MNIST data set93. Whereas a simple VAE will draw all the numbers and distribute them in the latent space, the crVAE will draw the selected number and the latent space representations will reflect writing styles, e.g., tilt, line width, etc. The key aspect of using the crVAE approach, as opposed to VAE analysis of individual classes, is that the disentangled styles will be common across the data set, reminiscent of hierarchical Bayesian models. If the labels are known only partially, the unknown discrete classes are sampled from a uniform categorical distribution and an additional classifier neural network is added, turning the model into a semi-supervised generative model. Recently, a subset of authors has shown that a semi-supervised rVAE model can be used for creating nanoparticle libraries from imaging data94. Here, we will limit ourselves to a scenario where all labels are known.
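The conditioning step for the fully labeled case can be sketched as a concatenation of one-hot class labels with the unstructured latent variables before decoding. The array shapes and helper names are assumptions for illustration, and the decoder itself is omitted.

```python
import numpy as np

def one_hot(labels, n_classes):
    """One-hot encode integer class labels into an array of shape (batch, n_classes)."""
    return np.eye(n_classes)[np.asarray(labels)]

def conditional_decoder_input(z, labels, n_classes):
    """Concatenate latent codes z (batch, d) with one-hot labels c (batch, n_classes)."""
    return np.concatenate([z, one_hot(labels, n_classes)], axis=1)

rng = np.random.default_rng(0)
z = rng.normal(size=(5, 2))              # mini-batch of 2D latent codes
labels = np.array([0, 2, 1, 1, 0])       # e.g., GMM component of each sub-image
dec_in = conditional_decoder_input(z, labels, n_classes=3)
print(dec_in.shape)
```

Since the class identity is supplied explicitly, the latent variables z only need to encode variability within a class, which is why the resulting styles are shared across classes.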

As an example of crVAE analysis, shown in Fig. 5 is the latent space representation for the three GMM components of the LSMO–NiO system from Fig. 4. Here, the latent space is subdivided into 3 × 3 regions and the corresponding images are reconstructed. Shown are the images per se and the images with the average subtracted. While direct physical interpretation of these disentangled representations is complex, we note the commonality in the character of changes in the vertical and lateral directions for all three components. The central position of each tableau shows the least variation from the mean value, with higher than average values seen to the left and lower than average values to the right.
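The construction of such a 3 × 3 tableau, together with its mean-subtracted version, can be sketched as follows. A toy analytic function stands in for the trained decoder (which in the conditional case would also receive the class label); the latent ranges and all function names are illustrative assumptions.

```python
import numpy as np

def toy_decode(z, size=16):
    """Hypothetical stand-in for a trained decoder: a Gaussian blob controlled by z."""
    lin = np.linspace(-1, 1, size)
    xx, yy = np.meshgrid(lin, lin)
    width = 0.3 + 0.05 * z[1]                      # z[1] changes the blob width
    return np.exp(-((xx - 0.1 * z[0]) ** 2 + yy ** 2) / (2 * width ** 2))

def latent_tableau(decode, z1_range=(-2, 2), z2_range=(-2, 2), n=3):
    """Decode an n x n grid of latent points; also return images minus the grid average."""
    z1 = np.linspace(z1_range[0], z1_range[1], n)
    z2 = np.linspace(z2_range[0], z2_range[1], n)
    tiles = np.array([[decode(np.array([a, b])) for a in z1] for b in z2])  # (n, n, H, W)
    return tiles, tiles - tiles.mean(axis=(0, 1), keepdims=True)

tiles, diff = latent_tableau(toy_decode)
print(tiles.shape, diff.shape)
```

Subtracting the tableau average removes the contrast common to all decoded images and leaves only the changes associated with moving through the latent space.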

Application of rVAE to a layered perovskite

We can extend this analysis to a system with a significantly more complex lattice, such as the Sr3Fe2O7 (SFO) layered perovskite. Sr3Fe2O7 is a mixed-valence Ruddlesden-Popper series compound with double perovskite structure that nominally features tetravalent iron. Charge disproportionation to Fe(III) and Fe(V) was observed by Mössbauer spectroscopy95,96. Spiral magnetic order was observed by neutron diffraction97 and provides a rare example of a magnetic cycloid arising from ferromagnetic nearest-neighbor exchange competing with antiferromagnetic next-nearest-neighbor exchange98. Further interest in this material arises from its high oxygen mobility99. The preparation of a near-stoichiometric compound requires high oxygen partial pressure100.

The rVAE analysis of SFO is shown in Fig. 6. The original STEM image, Fig. 6a, clearly illustrates the layered structure of SFO. In Fig. 6b, the sub-image representation of the latent variables shows a change of contrast from left to right and a change of structure towards the top. Of most interest is the encoded angle, Fig. 6c, which shows three distinct values, one down the center of the layers and alternating values on either side. The histogram of the encoded angle is shown in Fig. 6f, where three peaks are clearly present. The peaks are labeled with colored circles corresponding to those in Fig. 6c. The first latent variable in Fig. 6d exhibits a more complex periodic structure, consistent with the corresponding four-peaked histogram shown in Fig. 6g. The second latent variable exhibits a gradual change in intensity from left to right, corresponding to the sample thickness variation, which is also observed in the raw STEM image (Fig. 6a). The corresponding histogram has a flattened peak reflecting this gradual change. As observed for the two-phase system, these behaviors are now disentangled and can be explored separately. We observed a similar separation for other STEM images, where scan distortions, e.g., due to fly-back delays, were clearly concentrated in a single latent variable.

Fig. 6: rVAE analysis of Sr3Fe2O7 image.

a Original STEM image with a sub-image inset in the red box. The scale bar is 2 nm. b Sub-image representation in the 2D latent parameter space, c encoded angle, d latent variable Z1, and e latent variable Z2. f–h Histograms corresponding to c–e. The circles in f correspond to the points in c. Analysis is performed on raw images using a window size of 40 pixels. Insets indicate the intensity variation of each panel.
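The peak structure of an encoded-angle histogram such as that in Fig. 6f can be quantified with a simple local-maximum count. The sketch below uses synthetic angles, and the bin count, angular range, threshold, and cluster positions are all illustrative assumptions rather than values taken from the data.

```python
import numpy as np

def histogram_peaks(angles_deg, bins=36, value_range=(-90.0, 90.0), rel_height=0.2):
    """Return bin centers of local maxima that exceed rel_height * tallest bin."""
    counts, edges = np.histogram(angles_deg, bins=bins, range=value_range)
    centers = 0.5 * (edges[:-1] + edges[1:])
    threshold = rel_height * counts.max()
    peaks = [i for i in range(1, bins - 1)
             if counts[i] > counts[i - 1]
             and counts[i] >= counts[i + 1]
             and counts[i] >= threshold]
    return centers[np.array(peaks, dtype=int)]

# Synthetic stand-in for encoded angles: three narrow groups (assumed positions).
rng = np.random.default_rng(0)
true_angles = (-27.5, 2.5, 32.5)
angles = np.concatenate([rng.normal(a, 1.0, 300) for a in true_angles])
peaks = histogram_peaks(angles)
```

Experimental histograms are noisier than this synthetic case, so some smoothing or a more careful peak criterion would likely be needed before counting peaks.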

The choice of window size is crucial for extracting some of these features. The effect of using a smaller and a larger window size on the rVAE analysis is shown in Fig. S7. The 2D latent parameter space for a window of 32 pixels, shown in Fig. S7a, exhibits a gradual change in contrast from left to right. The encoded angle in Fig. S7c is essentially constant. Examination of the associated histogram (Fig. S7f) shows that the three peaks seen in Fig. 6f have collapsed to a single sharp peak. The latent variables show a gradual change in intensity from left to right, similar to that seen in Fig. 6e. The results are similar for a larger window size of 60 pixels. It should be noted that for intermediate window sizes the encoded angle histograms first lose the central peak before collapsing to a single peak. For completeness, this analysis was also performed on the DCNN-segmented images, and the results are shown in Figs. S8 and S9. To obtain the same three-peaked encoded angle histogram, a window of 34 pixels was used. The peaks at either extreme are separated by a full 180 degrees and are represented by the red and blue markers in Fig. S8c. They represent the atoms at the edges of the bands. The range for the raw image analysis was approximately half of this, perhaps because of the noise level of the original data. The first latent variable still reflects the banded structure, but the second is essentially random. The results for the DCNN-segmented data are extremely sensitive to window size. As seen in Fig. S9, varying the window size by 2 pixels in either direction reduces the three-peaked encoded angle histogram to either one or two peaks.
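To make the role of the window size concrete, the following sketch cuts square sub-images of a given size around a list of coordinates. The image, the coordinates, and the rule of discarding windows that cross the image edge are assumptions made for illustration; in practice the centers would come from a separate atom-finding step.

```python
import numpy as np

def extract_subimages(image, coords, window):
    """Cut window x window patches centered at (row, col) coordinates.

    Patches that would extend beyond the image are skipped. At least one
    coordinate must survive, otherwise np.stack raises an error.
    """
    half = window // 2
    patches, kept = [], []
    for r, c in np.round(coords).astype(int):
        inside_rows = r - half >= 0 and r + half <= image.shape[0]
        inside_cols = c - half >= 0 and c + half <= image.shape[1]
        if inside_rows and inside_cols:
            patches.append(image[r - half:r + half, c - half:c + half])
            kept.append((r, c))
    return np.stack(patches), np.array(kept)

image = np.arange(100 * 100, dtype=float).reshape(100, 100)  # placeholder image
coords = np.array([[50.0, 50.0], [5.0, 5.0], [20.0, 80.0]])  # hypothetical centers
patches, kept = extract_subimages(image, coords, window=40)
print(patches.shape)  # (2, 40, 40)
```

Changing `window` alters how many neighboring atomic columns each sub-image contains, which is one plausible reason the encoded-angle histograms depend so strongly on this parameter.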

For completeness, a clustering analysis was also performed on the SFO results; the results and discussion are included in the Supplementary Information (Figs. S10 and S11).

To summarize, we introduce a workflow for the bottom-up symmetry and structural analysis of atomically resolved STEM imaging data. For systems with known or ad hoc defined rotational variants, the combination of Gaussian mixture modeling and principal component analysis (GMM-PCA) allows separation of the relevant structural units and of the structural distortions for individual units. However, the GMM-PCA combination fails in the presence of multiple rotational variants, and especially of general rotations, since in this case a separate class will be assigned to each rotation of the same structural unit. The rVAE-crVAE approach proposed here allows one to generalize the classification-distortion analysis to general rotational symmetries. We illustrate that the capability of VAEs to produce disentangled representations can be used to separate structural units, relevant distortions, and, in certain cases, instrumental distortions, opening a pathway for systematic studies of symmetry-breaking distortions in a broad range of material systems.
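The reason a pixel-space classifier assigns rotated copies of one structural unit to different classes can be seen in a small numerical sketch. The motif below is an arbitrary asymmetric pattern, and the minimum over four 90° rotations is a simple stand-in for rotational invariance, not the rVAE itself, which learns a continuous rotation angle.

```python
import numpy as np

def pixel_distance(a, b):
    """Euclidean distance between two images in pixel space."""
    return float(np.linalg.norm(a - b))

def rotation_min_distance(a, b):
    """Smallest pixel distance over the four 90-degree rotations of b."""
    return min(pixel_distance(a, np.rot90(b, k)) for k in range(4))

motif = np.zeros((8, 8))
motif[1, 2:6] = 1.0  # arbitrary asymmetric L-shaped motif (illustrative only)
motif[2:5, 2] = 1.0

rotated = np.rot90(motif)  # same structural unit, rotated by 90 degrees
rng = np.random.default_rng(0)
noisy = motif + rng.normal(scale=0.05, size=motif.shape)  # same orientation, weak noise

d_rotated = pixel_distance(motif, rotated)
d_noisy = pixel_distance(motif, noisy)
d_invariant = rotation_min_distance(motif, rotated)
```

Here `d_rotated` is much larger than `d_noisy`, so a clustering method working on raw pixels would separate the rotated copy from the original, whereas `d_invariant` is zero once rotations are accounted for.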

While implemented here for the analysis of structural STEM images, we expect that a similar approach can be used for the analysis of symmetry breaking distortions in e.g., scanning tunneling microscopy (STM) images, and can be further extended to the analysis of multidimensional data sets such as tunneling spectroscopy in STM, EELS and 4D STEM in STEM, and so on. Furthermore, similar to other Bayesian methods, it will be of interest to explore physics-based prior distributions in the latent space, beyond the class labels used here. Overall, we believe that the combination of the capability to disentangle physical phenomena via latent space representations and parsimonious analysis makes the proposed workflow universal for multiple physical problems.

Methods

Thin film growth

The LSMO–NiO VAN and the single-phase LSMO and NiO films were grown on STO(001) single-crystal substrates by PLD using a KrF excimer laser (λ = 248 nm) with a fluence of 2 J/cm2 and a repetition rate of 5 Hz. All films were grown at 200 mTorr O2 and 700 °C. The films were post-annealed in 200 Torr of O2 at 700 °C to ensure full oxidation and then cooled to room temperature at a rate of 20 °C/min. For out-of-plane transport measurements, the films were grown on 0.5% Nb-doped STO(001) single-crystal substrates. The film composition was varied by using composite laser ablation targets with different compositions.

Sample preparation

A polycrystalline rod of Sr3Fe2O7-x, 6 mm in diameter and 50 mm in length, was prepared using powders synthesized by solid-state reaction of stoichiometric SrCO3 and Fe2O3 at 1100 °C. The single-crystalline material used here was grown in a high-pressure floating zone furnace with an O2 partial pressure of 148 bar. Refinement of neutron diffraction data obtained at the NOMAD instrument of the Spallation Neutron Source using GSAS-II101 revealed a single-phase material with an oxygen content of 6.8; see Supplementary Fig. S12 and Table S1.

STEM

The plan-view STEM samples of LSMO–NiO were prepared using ion milling after mechanical thinning and precision polishing. In brief, a thin film sample was first ground, and then dimpled and polished from the substrate side to a thickness of less than 20 μm. The sample was then transferred to an ion milling chamber for further substrate-side thinning. The ion beam energy and milling angle were adjusted towards lower values during the thinning process, which was stopped when an open hole appeared for STEM characterization. The Sr3Fe2O7 sample(s) were prepared by FIB lift-out followed by local low-energy Ar ion milling, down to 0.5 eV, in a Fischione NanoMill.

The STEM used for the characterization of both samples was a Nion UltraSTEM200 operated at 200 kV. The beam illumination half-angle was 30 mrad and the inner detector half-angle was 65 mrad. Electron energy-loss spectra were obtained with a collection half-angle of 48 mrad.