Introduction

Scanning transmission electron microscopy (STEM) investigations increasingly combine the measurement of multiple analytical signals as a function of probe position with post facto computational analysis [1]. In a scan, the number of local signal measurements is usually much greater than the number of significantly distinct microstructural elements, and this redundancy may be harnessed during analysis, for example by averaging signals over like regions to improve the signal-to-noise ratio. Unsupervised machine learning techniques automatically exploit data redundancy to find patterns with minimal prior constraints [2]. In analytical electron microscopy, such methods have been applied to learn representative signals corresponding to separate microstructural elements (e.g. crystal phases) and to unmix signals comprising contributions from multiple microstructural elements sampled along the beam path [3,4,5,6,7,8,9,10]. These studies have primarily applied linear matrix decompositions such as independent component analysis (ICA) and non-negative matrix factorisation (NMF).

Scanning precession electron diffraction (SPED) enables nanoscale investigation of local crystallography [11, 12] by recording electron diffraction patterns as the electron beam is scanned across the sample with a step size on the order of nanometres. The incorporation of double conical rocking of the beam, also known as precession [13], achieves integration through a reciprocal space volume for each reflection. Precession has been found to convey a number of advantages for the interpretation and analysis of the resultant diffraction patterns, in particular the suppression of intensity variation due to dynamical scattering [14,15,16]. The resultant four-dimensional dataset, comprising two real and two reciprocal dimensions (4D-SPED), can be analysed in numerous ways. For example, the intensity of a subset of pixels in each diffraction pattern can be integrated (or summed) as a function of probe position to form so-called virtual bright-field (VBF) or virtual dark-field (VDF) images [17, 18]. VBF/VDF analysis has been used to provide insight into local crystallographic variations such as phase [17], strain [19] and orientation [20]. In another approach, the collected diffraction patterns are compared against a library of precomputed templates, providing a visualisation of the microstructure and orientation information, a process known as template or pattern matching [11]. These analyses do not exploit the aforementioned redundancy present in the data and may require significant effort on the part of the researcher. Here, we explore the application of unsupervised machine learning methods to achieve dimensionality reduction and signal unmixing.
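As an illustration of VDF image formation, the sketch below sums the intensity inside a circular virtual aperture for every probe position of a 4D-SPED array; the array shape, aperture position and radius are hypothetical placeholders rather than values from this work.

```python
import numpy as np

# Hypothetical 4D-SPED dataset: (scan_y, scan_x, detector_y, detector_x)
data = np.random.poisson(1.0, size=(32, 32, 64, 64)).astype(float)

def virtual_dark_field(data, centre, radius):
    """Integrate intensity inside a circular virtual aperture at every probe position."""
    det_y, det_x = data.shape[-2:]
    yy, xx = np.mgrid[0:det_y, 0:det_x]
    aperture = (yy - centre[0])**2 + (xx - centre[1])**2 <= radius**2
    # Sum over the selected detector pixels for each (scan_y, scan_x) position
    return data[..., aperture].sum(axis=-1)

vdf_image = virtual_dark_field(data, centre=(40, 25), radius=2)  # shape (32, 32)
```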

Methods

Materials

GaAs (cubic, \(F{\bar{4}}3m\)) nanowires containing type I twin (\(\Sigma 3\)) [21] boundaries were taken as a model system for this work. The long axis of these nanowires is approximately parallel to the [111] crystallographic direction as a result of growth by molecular beam epitaxy [22] on (111). In cross section, these nanowires have an approximately hexagonal geometry with a vertex-to-vertex distance of 120–150 nm. Viewed near to the \([1{\overline{1}}0]\) zone axis, the twin boundary normal is approximately perpendicular to the incident beam direction.

SPED experiments

Scanning precession electron diffraction was performed on a Philips CM300 FEGTEM operating at 300 kV, with the scan and simultaneous double rocking of the electron beam controlled using a NanoMegas Digistar external scan generator. A convergent probe with a convergence semi-angle of \(\sim 1.5\hbox { mrad}\) was used, with precession angles of 0, 9 and 35 mrad, to perform scans with a step size of 10 nm using the ASTAR software package. The spatial resolution was thus dominated by the step size. PED patterns were recorded using a Stingray CCD camera imaging the fluorescent binocular viewing screen.

It is generally inappropriate to manipulate raw data before applying multivariate methods such as decomposition or clustering, since the results cannot be considered objective if subjective prior alterations have been made. In this work, the only data manipulation applied before machine learning was the alignment of the central beam of each diffraction pattern. Geometric distortions introduced by the angle between the camera and the viewing screen were corrected by applying an opposite distortion to the data after the application of the machine learning methods.
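A minimal sketch of the kind of central-beam alignment described above is given below, assuming the direct beam is the brightest feature of each pattern; the box size and interpolation order are arbitrary choices, not those used in this work.

```python
import numpy as np
from scipy.ndimage import center_of_mass, shift

def centre_direct_beam(pattern, box=10):
    """Shift one diffraction pattern so that its direct beam sits at the array centre."""
    ymax, xmax = np.unravel_index(np.argmax(pattern), pattern.shape)
    # Refine the beam position with a centre-of-mass estimate in a small box around the maximum
    sub = pattern[ymax - box:ymax + box, xmax - box:xmax + box]
    cy, cx = center_of_mass(sub)
    beam = (ymax - box + cy, xmax - box + cx)
    target = ((pattern.shape[0] - 1) / 2, (pattern.shape[1] - 1) / 2)
    return shift(pattern, (target[0] - beam[0], target[1] - beam[1]), order=1)
```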

Multislice simulations

A twinned bi-crystal model was constructed with the normal to the [111] twin boundary inclined at an angle of \(55 ^{\circ }\) to the incident beam direction so that the two crystals overlapped in projection. In this geometry, both crystals are oriented close to \(\langle 511\rangle\) zone axes with coherent matching of the \(\{0{\bar{6}}6\}\) and \(\{2{\bar{8}}2\}\) planes in these zones. Three precession angles were simulated using the TurboSlice package [23]: 0, 10 and 20 mrad, with 200 distinct azimuthal positions about the optic axis to ensure appropriate integration in the resultant simulated patterns [24]. The crystal model used in the simulation comprised 9 unique layers, each 0.404 nm thick; 15 repeats of this unit were used, giving a total thickness of 54.6 nm. The resulting \(512\times 512\)-pixel patterns with 16-bit dynamic range were convolved with a 4-pixel Gaussian kernel to approximate a point spread function.

Linear matrix decomposition

Latent linear models describe data by the linear combination of latent variables that are learned from the data rather than measured—more pragmatically, the repeated features in the data can be well approximated using a small number of basis vectors. With appropriate constraints, the basis vectors may be interpreted as physical signals. To achieve this, a data matrix, \(\mathbf X\), can be approximated as the matrix product of a matrix of basis vectors \(\mathbf W\) (components), and corresponding coefficients \(\mathbf Z\) (loadings). The error in the approximation, or reconstruction error, may be expressed as an objective function to be minimised in a least squares scheme:

$$\begin{aligned} \left| \left| \mathbf X -\mathbf WZ \right| \right| ^{2}_{\mathrm {F}} \end{aligned}$$
(1)

where \(||\mathbf A ||_{\mathrm {F}}\) is the Frobenius norm of matrix \(\mathbf A\). More complex objective functions, for example incorporating sparsity-promoting weighting factors [25], may be defined. We note that the decomposition is not necessarily performed by directly computing this error minimisation.
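For a concrete reading of Eq. (1), the best rank-L approximation in the Frobenius sense can be computed by truncated SVD; the sketch below uses a random matrix purely as a stand-in for a flattened SPED dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in data matrix: each column is one flattened diffraction pattern
X = rng.random((256, 1000))

L = 3
U, s, Vt = np.linalg.svd(X, full_matrices=False)
W = U[:, :L]                    # basis vectors (components), as columns
Z = np.diag(s[:L]) @ Vt[:L]     # corresponding coefficients (loadings)

# Squared Frobenius norm of the reconstruction error, as in Eq. (1)
reconstruction_error = np.linalg.norm(X - W @ Z, ord='fro') ** 2
```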

Three linear decompositions were used here: singular value decomposition (SVD) [2, 26], independent component analysis (ICA) [27], and non-negative matrix factorisation (NMF) [25, 28]. These decompositions were used as implemented in HyperSpy [29], which itself draws on the algorithms implemented in the open-source package scikit-learn [30].
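In HyperSpy these decompositions share a common interface; a hedged sketch of typical usage is shown below (the file name and number of components are placeholders, and argument names may differ between versions).

```python
import hyperspy.api as hs

# 4D-SPED signal: navigation axes are the scan, signal axes are the diffraction pattern
s = hs.load("sped_dataset.hspy")

# SVD first, to estimate how many components carry significant variance
s.decomposition(algorithm="SVD")
s.plot_explained_variance_ratio()

# NMF with a chosen number of components
# (ICA is typically accessed via s.blind_source_separation in HyperSpy)
s.decomposition(algorithm="NMF", output_dimension=3)
factors = s.get_decomposition_factors()    # component patterns
loadings = s.get_decomposition_loadings()  # corresponding loading maps
```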

The singular value decomposition is closely related to the better-known principal component analysis, in which the vectors comprising \(\mathbf W\) are orthonormal. The optimal rank-L solution is then obtained when \(\mathbf W\) is estimated by the eigenvectors (principal components) corresponding to the L largest eigenvalues of the empirical covariance matrix. The optimal low-dimensional representation of the data is given by \(\mathbf z _{i} = \mathbf W ^{T}\mathbf x _{i}\), which is an orthogonal projection of the data on to the corresponding subspace and maximises the statistical variance of the projected data. This optimal reconstruction may be obtained via truncated SVD of the data matrix; the factors for PCA and SVD are equivalent, though the loadings may differ by independent scaling factors.

To assess signal unmixing where crystals overlap, the nanowire was tilted away from the \([1{\overline{1}}0]\) zone axis by \(\sim 30 ^{\circ }\), such that two microstructural elements overlapped in projection. The overlap of the two crystals was assessed using virtual dark-field imaging, NMF loading maps, and fuzzy clustering membership maps (Fig. 4). The region in which the crystals overlap can be identified by all of these methods. The VDF result can be considered a reference and is obtained with minimal processing, but requires manual specification of appropriate diffracting conditions for image formation. The NMF and fuzzy clustering approaches are semi-automatic. There is good agreement between the VDF images and the NMF loading maps. The boundary appears slightly narrower in the clustering membership map. The NMF loading corresponding to the background component decreases along the profile, which may be related to the underlying carbon film, whilst the cluster membership for the background contains a spurious peak in the overlap region. Finally, the direct beam intensity is much lower in the NMF component patterns than in the true source signals. Our results indicate that either machine learning method is superior to conventional linear decomposition for the analysis of SPED datasets, but some unintuitive and potentially misleading features are present in the learning results.

Fig. 4

SPED data from a GaAs nanowire with a twin boundary at an oblique angle to the beam. a Virtual dark-field images formed, using a virtual aperture 4 pixels in diameter, from the circled diffraction spots. b NMF decomposition results. c Clustering results. For b and c the profiles are taken from the line scans indicated, and the blue profile represents the intensity of the background component. Pattern and loading scale bars are common to all subfigures and measure 1 Å\(^{-1}\) and 70 nm respectively

Discussion

Unsupervised learning methods (SVD, ICA, NMF, and fuzzy clustering) have been explored here in application to SPED data from materials where the region of interest comprises a finite number of significantly different microstructural elements, i.e. crystals of particular phase and/or orientation. In this case, NMF and clustering may yield a small number of component patterns and weighted average cluster centres that resemble physical electron diffraction patterns. These methods are, therefore, effective for both dimensionality reduction and signal unmixing, although we note that neither approach is well suited to situations where there are continuous changes in crystal structure. By contrast, SVD and ICA provide effective means of dimensionality reduction, but the components are not readily interpreted using methods analogous to conventional electron diffraction analysis, owing to the presence of many negative values. The SVD and ICA results do nevertheless tend to highlight physically important differences in the diffraction signal across the region of interest. The massive data reduction from many thousands of measured diffraction patterns to a handful of learned component patterns is very useful, as is the unmixing achieved. Artefacts were, however, identified in the learning results, particularly when the methods were applied to achieve signal unmixing, and these are explored further here.

To illustrate artefacts resulting from learning methods, model SPED datasets were constructed based on line scans across inclined boundaries in hypothetical bicrystals. The models (Figs. 5 and 6) were designed to highlight features of two-dimensional diffraction-like signals rather than to reflect the physics of diffraction. They were, therefore, constructed with the strength of the signal directly proportional to the thickness of the hypothetical crystal at each point, with no noise, and with Gaussian peak profiles.
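A sketch of such a construction is given below: two diffraction-pattern-like arrays of Gaussian peaks are mixed linearly according to complementary thickness profiles across an inclined boundary. All peak positions, widths and profile lengths are arbitrary choices, not those of the models in Figs. 5 and 6.

```python
import numpy as np

def gaussian_pattern(size, centres, sigma=2.0):
    """Sum of 2D Gaussian peaks at the given (row, column) centres."""
    yy, xx = np.mgrid[0:size, 0:size]
    pattern = np.zeros((size, size))
    for cy, cx in centres:
        pattern += np.exp(-((yy - cy)**2 + (xx - cx)**2) / (2 * sigma**2))
    return pattern

size = 72
# End-member patterns: two square arrays of peaks at positions that do not overlap
phase_a = gaussian_pattern(size, [(r, c) for r in range(12, 72, 24) for c in range(12, 72, 24)])
phase_b = gaussian_pattern(size, [(r, c) for r in range(24, 72, 24) for c in range(24, 72, 24)])

# Complementary thickness profiles across an inclined boundary (no noise)
n_probe = 100
t_a = np.clip(np.linspace(1.5, -0.5, n_probe), 0.0, 1.0)
t_b = 1.0 - t_a

# Model line scan: each pattern is a linear mixture of the two end members
scan = t_a[:, None, None] * phase_a + t_b[:, None, None] * phase_b   # shape (100, 72, 72)
```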

Fig. 5

Construction and decomposition of an idealised model SPED dataset comprising non-overlapping two-dimensional signals. a Schematic representation of the hypothetical bi-crystal. b Ground truth end-member patterns and relative thickness of the two crystals. c Factors and loadings obtained by 2-component NMF. d Cluster centre average patterns and membership maps obtained by fuzzy clustering

Fig. 6

Non-independent components. a Expected result for an artificial dataset with two ‘phases’ with overlapping peaks. b NMF decomposition. c Cluster results. d SVD loadings of the dataset, used for clustering. Each point corresponds to a diffraction pattern in the scan; several are indicated with the dotted lines. Contours indicate the value of membership to the two clusters; refer to the “Data clustering” subsection of “Methods”

The model SPED dataset shown in Fig. 5 comprises the linear summation of two square arrays of Gaussians (to emulate diffraction patterns) with no overlap between the two patterns. NMF decomposition exactly recovers the signal profile in this simple case. In contrast, the membership profile obtained by fuzzy clustering, which varies smoothly owing to the use of a Euclidean distance metric, does not match the source signal. The boundary region instead appears qualitatively narrower than the true boundary. Further, the membership value for each of the pure phases is slightly below 100% because the cluster centre is a weighted average position that will only correspond to the end member if there are many more measurements near to it than away from it. A related effect is that the membership value rises at the edge of the boundary region where mixed patterns are closer to the weighted centre than the end members. We conclude that clustering should be used only if the data comprises a significant amount of unmixed signal. In the extreme, cluster analysis cannot separate the contribution from a microstructural feature which has no pure signal in the scan, for example a fully embedded particle. These observations are consistent with the results reported in association with Fig. 4.
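The membership behaviour described above follows from the fuzzy c-means membership rule, in which membership decays with Euclidean distance to each weighted average cluster centre. A minimal sketch is given below, using the common choice of fuzziness exponent m = 2; the example points and centres are arbitrary illustrations.

```python
import numpy as np

def fuzzy_memberships(samples, centres, m=2.0):
    """Fuzzy c-means membership of each sample to each cluster centre."""
    # Euclidean distances, shape (n_samples, n_clusters)
    d = np.linalg.norm(samples[:, None, :] - centres[None, :, :], axis=-1)
    d = np.maximum(d, 1e-12)   # avoid division by zero for samples sitting on a centre
    # Standard fuzzy c-means membership: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)

# Pure end members at (0, 0) and (1, 1); one fully mixed measurement at (0.5, 0.5).
points = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]])
# Cluster centres pulled slightly towards the mixed data, as for weighted average centres
centres = np.array([[0.05, 0.05], [0.95, 0.95]])
print(fuzzy_memberships(points, centres))   # pure points sit just below 100% membership
```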

A common challenge for signal separation arises when the source signals contain coincident peaks from distinct microstructural elements, as would be the case in SPED data when crystallographic orientation relationships exist between crystals. A model SPED dataset corresponding to this case was constructed and decomposed using NMF and fuzzy clustering (Fig. 6). In this case, the NMF decomposition yields a factor containing all the common reflections and a factor containing the reflections unique to only one end member. Whilst this is interpretable, it is not physical, although it should be noted that this is an extreme example where there is no unique information in one of the source patterns. Nevertheless, it should be expected that the intensity of shared peaks is likely to be unreliable in the learned factors, and this was the case for the direct beam in the learned component patterns shown in Fig. 4. As a result, components learned through NMF should not be analysed quantitatively. The weighted average cluster centres resemble the true end members much more closely than the NMF components. The pure phases have a membership of around 99%, rather than 100%, due to the cluster centre being offset from the pure cluster by the mixed data, as shown in Fig. 6d. The observation that memberships extend across all the data (albeit sometimes with vanishingly small values) explains the rise in intensity of the background component in Fig. 4c in the interface region. Such interface regions do not evenly split their membership between their two true constituent clusters, meaning that some membership is attributed to the third cluster, causing a small increase in its membership locally. These issues may potentially be addressed using extensions to the algorithm developed by Rousseeuw et al. [41] or using alternative geometric decompositions such as vertex component analysis [42].
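The coincident-peak behaviour can be reproduced with a very small synthetic example: when one end member contains only peaks shared with the other, NMF tends to return a 'common' factor and a 'unique' factor rather than the true end members. The sketch below uses one-dimensional signals and scikit-learn's NMF; peak positions and widths are arbitrary.

```python
import numpy as np
from sklearn.decomposition import NMF

x = np.arange(200)

def peak(centre, width=3.0):
    return np.exp(-(x - centre)**2 / (2 * width**2))

# End member A has peaks at 50 and 150; end member B shares only the peak at 50
source_a = peak(50) + peak(150)
source_b = peak(50)

# Line scan of linear mixtures between the two end members
frac = np.linspace(0.0, 1.0, 50)
X = np.outer(1 - frac, source_a) + np.outer(frac, source_b)

model = NMF(n_components=2, init="nndsvda", max_iter=2000)
loadings = model.fit_transform(X)
factors = model.components_
# Typically one factor carries the shared peak and the other the unique peak,
# rather than the two true end members being recovered.
```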

Precession was found empirically to improve the machine learning decomposition, as discussed above (Fig. 2), so long as the precession angle is large enough. This was attributed primarily to integration through bending of the nanowire. Precession may also result in a more monotonic variation of diffracted intensity with thickness [15] as a result of integration through the Bragg condition. It was, therefore, suggested that precession may improve the approximation that signals from two overlapping crystals combine linearly. To explore this, a multislice simulation of a line scan across a bi-crystal was performed and decomposed using both NMF and fuzzy clustering (Fig. 7). Without precession, neither the NMF loadings nor the cluster memberships increase monotonically with thickness; instead they vary significantly in a manner reminiscent of the modulation of diffracted intensity with thickness due to dynamical scattering. Both the loading profile and the membership profile reach subsidiary minima when the corresponding component is just thicker than half the thickness of the simulation, which corresponds to a thickness of approximately 100 nm and is consistent with the \(2{\bar{2}}0\) extinction length for GaAs of 114 nm. This suggests that the decomposition of the diffraction patterns is dominated by a few strong reflections; the variation of the \(2{\bar{2}}0\) reflections with thickness overwhelms the other structural information encoded in the patterns. The removal of this effect is an essential function of applying precession: with 10 or 20 mrad precession this intensity modulation is suppressed and the loading or membership maps obtained show a monotonic increase across the inclined boundary. The cluster centres again show intensity corresponding to the opposite end member due to the weighted averaging. Precession is, therefore, beneficial for the application of unsupervised learning algorithms both in reducing signal variations arising from bending, which is a common artefact of specimen preparation, and in reducing the impact of dynamical effects on signal mixing.

Fig. 7

Unsupervised learning applied to SPED data simulated using dynamical multislice calculations. a Original data with a 20 mrad precession angle. b NMF decomposition, in which the loadings have been re-scaled as in Fig. 5. The factors show pseudo-subtractive features, typical of NMF. c Cluster analysis. The high proportion of data points from the boundary means there is information shared between the cluster centres. Without precession, neither method can reproduce the original data structure

Noise and background are both significant in determining the performance of unsupervised learning algorithms. Extensive exploration of these parameters is beyond the scope of this work, but we note that the direct electron detectors developed in recent years, which are likely to play a significant role in future SPED studies, have very different noise properties. Understanding the optimal noise performance for unsupervised learning may therefore become an important consideration. We also note that the pseudo-subtractive features evident in the NMF decomposition results of Fig. 3 may become more significant in this case, and the robustness of fuzzy clustering to such features may prove advantageous.

Conclusions

Unsupervised machine learning methods, particularly non-negative matrix factorisation and fuzzy clustering, have been demonstrated here to be capable of learning the significant microstructural features within SPED data. NMF may be considered a true linear unmixing, whereas fuzzy clustering, when applied to learn representative patterns, is essentially an automated way of performing a weighted averaging with the weighting learned from the data. The former can struggle to separate coincident signals (including signal shared with a background or noise), whereas the latter implicitly leaves some mixing when a large fraction of measurements are mixed. In both cases, precession electron diffraction patterns are more amenable to unsupervised learning than their static-beam equivalents. This is because rocking the beam integrates through the Bragg condition, causing diffracted beam intensities to vary more monotonically with thickness, and also integrates through the small orientation changes associated with out-of-plane bending. This work has, therefore, demonstrated that unsupervised machine learning methods, when applied to SPED data, are capable of reducing the data to the most salient structural features and of unmixing signals. The scope for machine learning to reveal nanoscale crystallography will expand rapidly in the coming years with the application of more advanced methods.