1 Introduction

In this article, we explore the growing interface between deep learning and topology. We examine deep learning methods that make use of topological information to understand the shape of data, as well as the use of deep learning in calculating topological signatures. We broadly refer to this intersection of fields as topological deep learning. The advancements in topological deep learning have been enabled by the development of topological data analysis (TDA) over the last two decades.

TDA is a relatively recent amalgam of theory and algorithms that aims to obtain a geometric and topological understanding of data from real-world applications. The approach to data employed in TDA fundamentally differs from that in statistical learning. Rather than finding summary statistics, estimators, fitting approximate distributions, clustering, or training neural nets, TDA instead seeks to understand the properties of the geometric object, often a manifold, on which the data resides. This reflects the common intuition that data tends to lie on, or close to, a lower-dimensional manifold that is embedded in high-dimensional feature space. In this article, we sometimes refer to this as the data manifold.

The main goal of TDA is to infer information about the global structure of the data manifold, such as its connectivity and the presence of multi-dimensional holes. In the pure mathematical setting, this information is characterized by persistent homology and the related concept of Betti numbers, which count the n-dimensional holes in a manifold. With a finite set of data points, the Betti numbers are unavailable, but TDA employs various substitutes such as persistence diagrams and barcodes. An important property of the topological information obtained is its invariance to continuous deformation and scaling. This property also lends itself to robustness against perturbation and noise. Another benefit is the versatility of TDA methods, owed mostly to the abstract origins of algebraic topology: the methods are applicable to a wide variety of data types and objects, including point cloud data in Euclidean spaces, categorical data, images, and functions. TDA is backed by explainable theory but lacks the learning ability and other practical aspects of deep neural networks; neural networks, in turn, suffer from the need for large training datasets and billions of tunable parameters. Because of these differences, integrating TDA with deep neural networks poses a number of challenges.

Despite much recent activity in co-opting topological approaches in deep learning, it remains unclear what the leading approach should be, mostly because of computational and theoretical concerns. The TDA methods discussed in this paper form but a small part of the ever-expanding interface between topological data analysis and machine learning. It is important to state that this survey does not provide an exhaustive background on TDA theory and literature; for that, we refer the reader to excellent studies such as Pun et al. (2016).

Fig. 4 Topological deep learning introduces TDA methods to deep models, leading to topological neural architectures that can potentially address deep learning limitations. This is done by plugging in topological components for (a) learning feature embeddings (Sect. 3.1), (b) enhancing the learned representations (Sect. 3.2), and/or (c) regularizing the model using a topological loss (Sect. 3.3). Beyond that, (d) TDA can be used post-training to reveal insights into trained models (interpretability) (Sect. 3.4)

3 Topological deep learning (TDL)

Topological representations that incorporate structural information hold great promise for topological deep learning models (Hofer et al. 2017). Combining these cues with deep learning approaches has inherent benefits in various applications. On the flip side, deep learning approaches can be useful in overcoming some common hurdles faced by TDA approaches in estimating robust topological features. The incorporation of topological concepts into deep learning has only recently been investigated and the following benefits have been observed:

  • Global features that would otherwise be inaccessible via traditional feature maps can be extracted from input data efficiently and robustly.

  • TDA is versatile and adaptable, meaning that we are not limited to specific problems and types of data (such as images, sensor measurements, time series, graphs, etc.).

  • TDA is noise-resistant across a number of problems, which include the classification of 3D surface meshes (Som et al. 2018; Reininghaus et al. 2015; Li et al. 2014), the recognition of 2D object shapes (Turner et al. 2014), the manifold of natural image patches (Carlsson et al. 2007), the analysis of activity patterns in the visual cortex (Singh et al. 2008), and clustering (Chazal et al. 2013).

  • TDA can be applied to arbitrary data structures without any preprocessing, provided the right filtrations are used.

  • A new trend is emerging that allows efficient backpropagation through persistent homology components. This has been a long-standing challenge in TDA (further discussed in Sect. 3.3), but topological layers are now becoming compatible with deep learning and end-to-end training schemes.

We reiterate that although the combined use of TDA (more specifically, persistent homology) and deep learning has demonstrated success, there are still theoretical and computational challenges in applying TDA to data. We discuss these issues at length in Sect. 4.2.

In the rest of this section, we investigate TDA for deep learning through lenses of different magnifications and perspectives, as shown in Fig. 4. In particular, we explore the use of persistent homology in several different ways. The discussion in Sects. 3.1–3.3 focuses on the on-training integration of TDA, that is, building topological neural architectures. However, a holistic view should also consider TDA’s contribution post-training (deep topological analytics). These analytics use TDA to study the ‘shape’ of a trained model. Thus, we review works that have studied deep model complexity and interpretability using TDA in Sect. 3.4.

3.1 Learning topological features embedding

In this section, we extend the discussion of fixed vectorization methods (Sect. 2.3) by introducing deep learnable vectorization (i.e. embedding). A key advantage here is the possibility of leveraging the deep model to simultaneously learn the vectorization of the data and the representation for the target task. For example, we may parameterize the vectorization of a persistence diagram \(\textrm{PD}\) to an embedding vector \(V \in \mathbb {R}^d\) by neural layers \(f_w\), where w denotes the trainable parameters. Guided by the task loss, we can efficiently learn the mapping \(f_w: \mathrm {PD_x} \rightarrow V_{x}\) and automatically answer the question of which family of vectorizations works best for the given task.

Handling PDs with neural networks is the focus of many deep topological embedding studies. Generally, deep vectorization layers for PDs should be continuous and permutation invariant with respect to the input; the latter requirement is motivated by the set nature of the persistence diagram. Hofer et al. (2017, 2019) introduced the first learnable deep vectorization of PDs. It adopts a permutation-invariant transformation by evaluating the PD’s points against Gaussian(s) whose mean and variance are learned during training. Since permutation invariance has been explored in other deep learning problems (e.g. Deep Sets (Zaheer et al. 2017) for point clouds), some PD vectorization techniques were borrowed from that line of work. For example, PersLay (Carrière et al. 2020) builds on Deep Sets for embedding extended PDs encoding graphs and uses it for graph classification. Recently, transformers have been used for PD embedding: the Persformer (Reinauer et al. 2021) architecture showed superior performance on synthetic and graph tasks while offering some interpretability features. Note that transformers without positional encoding can be made as expressive as Deep Sets; thus, the permutation invariance requirement can be maintained.

Zhou et al. (2022) proposed TopologyNet, a novel approach that directly fits the topological representations derived from input point cloud data. This method substantially reduces the computation time for generating topological representations compared with traditional pipelines, while maintaining a small approximation error in practical scenarios. The output of TopologyNet holds potential for various downstream tasks that require efficient topological representations. Experimental evaluations incorporated TopologyNet as a topological branch within an autoencoder framework; the results demonstrated that including the topological branch led to superior topology quality in the generated point clouds compared to an autoencoder lacking such a branch. Furthermore, the latent vectors generated by a topological autoencoder were used to train a latent generative adversarial network (GAN), enabling the generation of new point clouds from Gaussian noise. Evaluation indices indicated that including the topological autoencoder in the generative adversarial network improved the quality of the newly generated point clouds, surpassing a GAN lacking the topological autoencoder.

Beyond PDs, deep embedding has been explored for other topological signatures. For example, PLLay (Kim et al. 2020) provides a layer for embedding persistence landscapes. PLLay's claim of robustness to extreme topological distortion is backed by a tight stability bound that is independent of the input complexity.

Topological embedding transforms the topological input with a complex structure into a vector representation compatible with deep models. As discussed in this section, the process uses a custom topological input layer for embedding. In the next section, we explore topological components that enhance deep learning representation and usually have the flexibility to be plugged anywhere in the network.

Algorithm 1 Deep learnable topological embedding

Algorithm 1 represents the process of embedding persistence diagrams (PDs) into a vector space using deep neural network layers. The procedure DeepTopologicalEmbedding takes a persistence diagram as input, initializes an embedding vector and neural layers, and then maps each point in the PD to the embedding vector. The process is guided by a loss function to determine the best vectorization for the given task.
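
As a concrete illustration of this procedure, the following PyTorch sketch implements a Deep Sets-style permutation-invariant embedding of a persistence diagram. It is a minimal, assumption-laden example in the spirit of the layers discussed above (Zaheer et al. 2017; Carrière et al. 2020), not the implementation of any specific paper; the networks phi and rho and all dimensions are illustrative choices.

```python
# Minimal sketch of a permutation-invariant PD embedding (Deep Sets style).
import torch
import torch.nn as nn


class PDEmbedding(nn.Module):
    def __init__(self, hidden: int = 32, out_dim: int = 16):
        super().__init__()
        # phi is applied independently to every (birth, death) point.
        self.phi = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        # rho maps the pooled summary to the final embedding vector V.
        self.rho = nn.Linear(hidden, out_dim)

    def forward(self, pd: torch.Tensor) -> torch.Tensor:
        # pd: (n_points, 2) tensor of (birth, death) pairs; n_points may vary per diagram.
        pooled = self.phi(pd).sum(dim=0)  # symmetric pooling => permutation invariance
        return self.rho(pooled)           # embedding V in R^{out_dim}, trained via the task loss


# Usage: embed a toy diagram with three topological features.
pd = torch.tensor([[0.1, 0.9], [0.2, 0.4], [0.0, 1.2]])
v = PDEmbedding()(pd)  # shape: (16,)
```

Because the pooling operation is a sum, reordering the points of the diagram leaves the embedding unchanged, which is exactly the permutation-invariance requirement discussed above.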

3.2 Integration of topological representations

Representation learning is the process of learning features from data that can be used to improve the accuracy of the model. Deep learning excels in this regard thanks to its powerful feature learning, but having a good representation goes further than achieving good performance on a target task (Bengio et al. 2013). For example, TDA’s stability can make deep representation resilient to input perturbation (de Surrel et al. 2022). Below, we review two categories of deep topological representations.

Constrained representations One approach is to train deep neural networks to learn representations that preserve the persistent homology of the input data. Again, TDA’s versatility ensures the feasibility of this as the topological signature can be computed for both the input and the internal representation. For example, Topological Autoencoders (Moor et al. 2020) perform the alignment through a loss, minimizing the divergence between input and latent representation topologies (both captured by PDs).
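
To make the alignment idea more concrete, the following is a minimal sketch (not the authors' implementation) of a 0-dimensional variant of such a loss: for a mini-batch, the 0-dimensional persistence pairings of a Vietoris–Rips filtration correspond to the edges of a minimum spanning tree of the pairwise-distance matrix, and the loss asks the latent space to reproduce the distances selected by those pairings. The helper names are illustrative.

```python
# Sketch of a topology-preserving alignment loss in the spirit of
# Topological Autoencoders (Moor et al. 2020), restricted to dimension 0.
import torch
from scipy.sparse.csgraph import minimum_spanning_tree


def _mst_edges(dist: torch.Tensor):
    """Indices (i, j) of the MST edges of a dense distance matrix (0-dim pairings)."""
    mst = minimum_spanning_tree(dist.detach().cpu().numpy())
    rows, cols = mst.nonzero()
    return torch.as_tensor(rows, dtype=torch.long), torch.as_tensor(cols, dtype=torch.long)


def topo_alignment_loss(x: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    """Align the 0-dim topology of a batch x with that of its latent embedding z."""
    dx = torch.cdist(x, x)   # pairwise distances in input space
    dz = torch.cdist(z, z)   # pairwise distances in latent space
    ix, jx = _mst_edges(dx)  # pairings selected by the input topology
    iz, jz = _mst_edges(dz)  # pairings selected by the latent topology
    # The pairings act as fixed index sets, so the loss stays differentiable in z.
    return ((dx[ix, jx] - dz[ix, jx]) ** 2).mean() + ((dx[iz, jz] - dz[iz, jz]) ** 2).mean()
```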

Augmented representations Another approach for topological representation is augmenting the deep features with topological signatures. The Persistence Enhanced Graph Network (PEGN) (Zhao et al. 2020) developed a graph spatial convolution that builds on persistent homology. Normally, convolution filters adapt to local graph structures through the use of node degree information. In contrast, PEGN weights the message passing between nodes using neighborhood information captured by persistence images. Moreover, Graph Filtration Learning (GFL) (Hofer et al. 2020) adapts the readout operation (a graph pooling-like operation) in Graph Neural Networks (GNNs) to be topologically aware: persistence diagrams are computed from the graph node features and vectorized. Interestingly, the filtration function is learned end-to-end. The Topological Graph Layer (TOGL) (Horn et al. 2022) extends GFL's idea and learns multiple filtrations of a graph (rather than one) in an end-to-end manner.

Unlike the embedding layers (e.g. PersLay (Carrière et al. 2020)), which expect a pre-specified input type (e.g. PDs), the topological representation layers discussed in this section enjoy more flexibility regarding their input and placement in the network. This comes with the attached cost of requiring careful design choices and guarantees on the layer characteristics (e.g. consistency of gradients in Hofer et al. (2020)).

Algorithm 2 Topological representation integration in deep neural networks

The process of integrating topological representations into deep learning models is outlined in Algorithm 2. The exact method used (e.g. Topological Autoencoders, PEGN, GFL, TOGL) depends on the specific approach chosen.
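
As a hedged sketch of the augmented-representation idea (not the PEGN, GFL, or TOGL code), the snippet below concatenates a precomputed topological signature (e.g. a flattened persistence image) with learned features before the classification head; all layer sizes and names are illustrative.

```python
# Sketch: augmenting learned features with a topological signature.
import torch
import torch.nn as nn


class TopoAugmentedClassifier(nn.Module):
    def __init__(self, feat_dim: int, topo_dim: int, n_classes: int):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU())
        self.head = nn.Linear(64 + topo_dim, n_classes)

    def forward(self, x: torch.Tensor, topo_sig: torch.Tensor) -> torch.Tensor:
        h = self.backbone(x)  # learned (deep) features
        return self.head(torch.cat([h, topo_sig], dim=-1))  # features augmented with topology


# Usage with dummy tensors: 8 samples, 20 raw features, a 25-dim persistence image each.
logits = TopoAugmentedClassifier(20, 25, 3)(torch.randn(8, 20), torch.randn(8, 25))
```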

3.3 Topological loss

The most common approach for leveraging topology in deep learning is incorporating a topological penalty in the loss. The popularity of this approach stems from the fact that loss-based integration is straightforward and does not require changing the architecture or adding additional layers. The only caveat is that the loss should be differentiable and easy to compute. As noted previously, the capability of topological features to capture the complex structure of the data means that deep learning can learn robust representations guided by a topological loss. Thus, the representations are likely to be invariant to typical perturbations present in real-world datasets, such as noise and outliers. An example of this is a common persistence loss (Hu et al. 2019), which minimizes the difference between a predicted persistence diagram \(\textrm{PD}_X\) and the true diagram \(\textrm{PD}_Y\):

$$\begin{aligned} \mathcal {L}_{\text {topological}} = d(\textrm{PD}_X,\textrm{PD}_Y) \end{aligned}$$
(1)

This has been used either as a standalone loss or as a regularizer (i.e. augmenting another loss) (Hu et al. 2019) in applications such as semantic segmentation (Hu et al. 2019) or generative modeling (Wang et al. 2020).

As discussed in Sect. 3.1, PDs do not lend themselves to vector representations in Euclidean space. Moreover, the PD computation is not directly differentiable (a key requirement for using backpropagation). One strategy to resolve this is to leverage a divergence or metric that can handle PDs. The p-Wasserstein distance and the bottleneck distance are popular choices:

$$\begin{aligned} d_{p,q}(\textrm{PD}_X,\textrm{PD}_Y)&= \Big [ \inf _{\pi \in \Pi (\textrm{PD}_X, \textrm{PD}_Y) } \sum _{t \in \textrm{PD}_X} \Vert t - \pi (t)\Vert _{q}^{p} \Big ]^{\frac{1}{p}} \end{aligned}$$
(2)
$$\begin{aligned} d_{\infty }(\textrm{PD}_X,\textrm{PD}_Y)&= \inf _{\pi \in \Pi (\textrm{PD}_X, \textrm{PD}_Y) } \sup _{t \in \textrm{PD}_X} \Vert t - \pi (t)\Vert _{\infty } \end{aligned}$$
(3)

where \(t = (b_i, d_i) \in \mathbb {R}^2\) is a point in \(\mathrm {PD_X}\), \(\Pi (\textrm{PD}_X, \textrm{PD}_Y)\) denotes the set of bijections between \(\textrm{PD}_X\) and \(\textrm{PD}_Y\), and \(\Vert \cdot \Vert _q\) is the \(\ell _q\) norm. The bottleneck distance is the largest distance between corresponding points, minimized over all bijections that preserve the partial ordering of the points (i.e. we cannot match a point with a birth time greater than another point’s death time). This ensures that the topological features being matched are comparable.

The initial popularity of the bottleneck distance is perhaps fueled by a stability theorem (Cohen-Steiner et al. 2005) for PDs of continuous functions. According to this theorem, the bottleneck distance is controlled by the \(L_\infty \) distance, that is

$$\begin{aligned} d_{\infty }(\textrm{PD}_{f_1},\textrm{PD}_{f_2}) \le C \Vert f_1-f_2\Vert _{\infty } \end{aligned}$$
(4)

for some constant C. In effect, this means that the diagrams are stable with respect to small perturbations of the underlying data. A similar stability result exists for the p-Wasserstein distance. These results are the foundation of the stability guarantees given by recent deep learning works, such as the stability of the Heat Kernel Signature in graphs (Carrière et al. 2020) and the stability of mini-batch-based diagram distances in Topological Autoencoders (Moor et al. 2020).

Among the limitations of (2) and (3) is the high computational budget needed by these distances when the number of points is large. As the distance requires point-wise matching, the computational complexity is \(\mathcal {O}(n^3)\) for n points (Anirudh et al. 2016). Also, in many applications (Wang et al. 2020; Chen et al. 2019), we aim to learn a model \(f_w\) that aligns a predicted diagram \(\textrm{PD}_P\) with a target (i.e. ground truth) diagram \(\textrm{PD}_T\) by gradually moving \(\textrm{PD}_P\) points towards \(\textrm{PD}_T\). This is typically achieved by pushing w in the negative direction of \(\nabla _w \mathcal {L}_{\text {topological}}\) and, obviously, assumes that the loss is differentiable with respect to the diagram. While the Wasserstein distance satisfies this requirement in general, it can have some instability issues (Solomon et al. 2021). Below, we select a few representative papers using topological losses in various applications and show how they handle these issues.

In generative modeling, TopoGAN (Wang et al. 2020) uses a slightly modified 1-Wasserstein distance to align the diagrams of generated and real images in medical image applications. The loss ignores the death time and focuses only on the birth time of the diagram features. Framed in this way, the loss becomes similar to the Sliced Wasserstein distance (Peyré et al. 2019), which can be computed efficiently and is still differentiable. A similar loss was used by Hu et al. (2019) for segmentation, to encourage the deep model to produce output with a topology close to that of the ground truth. The cross-entropy loss is augmented with the 2-Wasserstein loss between persistence diagrams. To alleviate the computational burden, the method performs the calculation on a single small image patch (part of the image) at a time. In Clough et al. (2022), the authors rely on Betti numbers for semi-supervised image segmentation. A notable advantage here is that the output of a network trained on a small set of labeled images can still capture the actual Betti numbers correctly. This gives us the opportunity to initially train the model on a small labeled dataset guided by the Betti numbers loss \(\mathcal {L}_{\beta }\). The model is then fine-tuned using a large unlabeled dataset and guided by a loss that incorporates \(\mathcal {L}_{\beta }\). Since the estimation of Betti numbers is robust for unlabeled data, \(\mathcal {L}_{\beta }\) regularizes the second stage of training (fine-tuning). In classification, Chen et al. (2019) use a topological regularizer; to speed up the computation, it focuses on homological dimension zero, where persistence computations are particularly fast.
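
The birth-time-only loss admits a particularly compact sketch: once only the birth times are kept, the points live on the real line and the 1-Wasserstein distance reduces to comparing sorted values. The snippet below is an illustrative reading of that idea (not the papers' exact implementations) and assumes, for simplicity, that both diagrams contain the same number of points.

```python
# Sketch of a birth-time-only diagram loss (1-Wasserstein on the real line).
import torch


def birth_time_loss(births_pred: torch.Tensor, births_true: torch.Tensor) -> torch.Tensor:
    # In practice, diagrams of different sizes would be padded or matched to the diagonal.
    return (torch.sort(births_pred).values - torch.sort(births_true).values).abs().mean()
```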

Algorithm 3 Topological Loss for Deep Learning

Algorithm 3 outlines the computation of topological loss using either the p-Wasserstein distance or the bottleneck distance. The procedure TopologicalLoss takes two persistence diagrams \(\textrm{PD}_X\) and \(\textrm{PD}_Y\), and the parameters p and q, then computes the p-Wasserstein or bottleneck distance as the topological loss. This loss can be used in deep learning models to minimize the difference between predicted and true topological features.
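
For reference, the distances of Eqs. (2) and (3) are available off the shelf; the short example below uses GUDHI (assuming the gudhi package and, for the Wasserstein part, its optional POT dependency are installed). The resulting scalar can serve as the loss of Eq. (1) when gradients through the matching itself are not required.

```python
# Computing the Wasserstein and bottleneck distances between two small diagrams with GUDHI.
import numpy as np
import gudhi
from gudhi.wasserstein import wasserstein_distance

pd_x = np.array([[0.0, 1.0], [0.2, 0.5]])  # predicted diagram: (birth, death) pairs
pd_y = np.array([[0.0, 1.1], [0.3, 0.4]])  # target diagram

loss_wasserstein = wasserstein_distance(pd_x, pd_y, order=2, internal_p=2)  # Eq. (2)
loss_bottleneck = gudhi.bottleneck_distance(pd_x, pd_y)                     # Eq. (3)
```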

3.4 Deep topological analytics

The complementary value of TDA goes beyond on-training integration and constructing topological neural architectures. In fact, leveraging TDA methods post-training can be even more insightful and powerful. Currently, researchers use TDA to address deep learning transparency (Liu et al. 2020), study model complexity (Rieck et al. 2019; Carlsson and Gabrielsson 2020) and even track down answers to seemingly mysterious aspects of deep learning, e.g. why deep networks outperform shallow ones (Naitzat et al. 2020). These efforts are centered around analyzing deep models using TDA approaches; hence, we call this deep topological analytics. We explore two aspects of it below.

Quantifying structural complexity Watanabe and Yamana (2021) treat the neural network as a weighted graph G(V, E), where V and E denote the network neurons and the relevance scores (computed from weights), respectively. By computing persistence features (e.g. Betti numbers) across a filtration, we can gain insight into the network's complexity. For example, an increase in the Betti number (the occurrence of a cycle between a set of neurons) can reflect the complexity of the knowledge stored in a deep neural network. In Rieck et al. (2019), the authors follow the same line and further develop training optimization strategies (e.g. early stopping) informed by homological features.
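
As a hedged illustration of this kind of analysis (not the cited authors' code), the sketch below adds edges of the relevance-weighted graph in decreasing order of score and tracks \(\beta _0\), the number of connected components, with networkx; Betti numbers in higher dimensions would require a simplicial complex library such as GUDHI.

```python
# Sketch: beta_0 of a relevance-weighted network graph across a threshold filtration.
import networkx as nx


def betti0_across_filtration(weighted_edges, thresholds):
    """weighted_edges: iterable of (u, v, score); returns beta_0 at each threshold."""
    nodes = {u for u, _, _ in weighted_edges} | {v for _, v, _ in weighted_edges}
    betti0 = []
    for t in thresholds:
        g = nx.Graph()
        g.add_nodes_from(nodes)                                           # all neurons
        g.add_edges_from((u, v) for u, v, s in weighted_edges if s >= t)  # edges above threshold
        betti0.append(nx.number_connected_components(g))
    return betti0


# Usage: a toy 4-neuron relevance graph examined at three thresholds.
edges = [(0, 1, 0.9), (1, 2, 0.5), (2, 3, 0.2)]
print(betti0_across_filtration(edges, [0.8, 0.4, 0.1]))  # [3, 2, 1]
```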

Visual exploration of models Another use of TDA here is to provide post-hoc explanation and/or visual exploration of the internal functioning of deep models. For example, topological information provides insight into the overall structure of high-dimensional functions. The authors in Liu et al. (2020) use this to offer a scalable visual exploration tool for data-driven black-box models; this is an important research problem, and doing so in an intuitive way is a challenge. They also use topological splines to visualize the high-dimensional error landscape of the models. Similarly, TopoAct (Rathore et al. 2021) offers insightful information on the representations learned by neural networks and provides a visual exploration tool to study topological summaries of activation vectors. Works such as Polianskii (2018) shed light on how neural networks maintain the topological properties of the data when it is projected into a low-dimensional space.

DNN focused topology optimization The concepts of “Inverting Representations of Images” and “Physics-Informed Neural Networks” served as inspiration for the topology optimization via neural reparameterization framework (TONR) (Zhang et al. 2021), which aims to address a variety of topology optimization problems. In this approach, the density field is optimized by updating the DNN parameters, with the initial parameters chosen carefully. This leads to quicker training and suggests a promising direction for topology optimization.

4 Discussion

TDA is a steadily developing and promising area, with successes in a wide variety of applications. However, there are open questions in applying TDA with deep neural networks. In this section, we discuss various successes and applications of deep TDA, highlight several open challenges for future research on deep TDA in both practical and theoretical aspects, and paint a speculative picture by outlining what the future holds for persistent homology. We also note some open-source implementations available for researchers to get started.

4.1 Successes and applications

Deep TDA has demonstrated potential in a variety of challenging settings. The invariance of PH information to continuous deformation means that TDA applies well to settings where objects should have consistent shapes but may be transformed in some way. TDA also helps bridge the gap between structural information and prior knowledge. If we have prior knowledge of the topology of a class of objects, then PDs are an effective tool for classifying and comparing data against this class, even in the presence of noise or limited data. This robustness is well suited to deep learning.

A potential area of application for topological data analysis (TDA) combined with deep learning lies in multi-class segmentation tasks. In such tasks, it becomes feasible to delineate the topology of individual classes as well as the boundaries between classes. This extension can be viewed as an application of persistent homology (PH) to the issue examined by Clough et al. (2022) and Haft-Javaherian et al. (2020), where prior information was utilized to define the adjacencies among different brain regions.

TDA can produce good results on small datasets (Byrne et al. 2021; BenTaieb and Hamarneh 2016), and is especially useful for medical imaging applications, where cost and privacy concerns often limit data acquisition. Byrne et al. (2021) and BenTaieb and Hamarneh (2016) investigated the limitations of conventional deep learning training procedures when applied to small datasets. These studies reveal that such procedures rely heavily on pixel-wise loss functions, which restrict the optimization process in terms of extended or global features. They used persistent homology to construct topological loss functions that evaluate image segments against a known prior, resulting in a richer description of segmentation topology and better accuracy.

Since persistent homology describes global structure, developing topological loss functions could suppress small false positives or false negatives related to the topology of an object. For example, in segmentation tasks, techniques such as morphological operations or CRF-based post-processing are used to remove local errors, but they have no concept of global topology. The benefit of a PH-based loss is that the correct global topology can be propagated along with local label smoothness. TDA has been used in settings with limited or noisy data, such as power forecasting (Senekane et al. 2021), segmenting aerial photography (Mosinska et al. 2018) and astronomy (Murugan and Robertson 2022).

As deep learning models continue to grow in complexity and datasets continue to grow in size, scalability and efficiency become even more crucial. Future directions in TDA for deep learning involve the development of scalable algorithms and efficient computational frameworks capable of handling large-scale datasets. This would enable the application of topological data analysis to diverse domains and real-world problems.

Interpreting the decisions of deep learning models remains a challenging endeavor. TDA offers a unique perspective by providing interpretable representations of complex data. Future directions in this area will focus on developing methodologies to extract meaningful topological features and interpret their significance in the context of deep learning tasks. This will facilitate a better understanding of the decision-making process of deep neural networks and increase their trustworthiness.

Regularization plays a crucial role in preventing overfitting and improving the generalization ability of deep learning models. Future research will explore how TDA-based regularization techniques can be integrated into deep learning frameworks. This could involve incorporating topological penalties or constraints to encourage models to capture meaningful topological features, leading to improved model generalization and robustness.

Many real-world applications involve multimodal data, such as images, text, and sensor data. Combining TDA with deep learning techniques provides a promising avenue for analyzing and integrating information from multiple modalities. Future directions include the development of TDA methods that can handle multimodal data and exploit the interactions between different modalities to uncover complex relationships and structures.

Transfer learning has proven to be an effective strategy for leveraging knowledge gained from one task to improve performance on a related task. Integrating TDA into transfer learning frameworks can enable the transfer of topological knowledge between domains or datasets. This could facilitate the adaptation of deep learning models to new domains by preserving the underlying topological structure and transferring relevant information.

Moreover, deep learning may yet yield new kinds of topological representation other than PDs, with robustness to different data deformations. PH could have further applications in multi-class open-set problems (where data may have unknown classes). If the topology among classes is relatively consistent, then the object labels of unknown classes could be better predicted.

4.4 Implementations

There are a number of open-source implementations of TDA available to practitioners. Here, we present three libraries that have interfaces with deep learning architectures.

GUDHI is an open-source library that implements relevant geometric data structures and TDA algorithms, and it can be integrated into the TensorFlow framework. PersLay (Carrière et al. 2020) and RipsLayer are implementations using GUDHI that learn persistence representations from complexes and PDs. They can handle automatic differentiation and are readily integrated into deep learning architectures.
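
As a brief illustration (assuming the gudhi package is installed), the snippet below computes a persistence diagram from a point cloud via a Vietoris–Rips filtration; this is the kind of signature that layers such as PersLay or RipsLayer then consume inside a network.

```python
# Computing a persistence diagram of a point cloud with GUDHI.
import numpy as np
import gudhi

points = np.random.rand(100, 3)                               # toy point cloud in R^3
rips = gudhi.RipsComplex(points=points, max_edge_length=0.5)  # Vietoris-Rips filtration
st = rips.create_simplex_tree(max_dimension=2)                # simplices up to dimension 2
diagram = st.persistence()                                    # [(dim, (birth, death)), ...]
h1 = st.persistence_intervals_in_dimension(1)                 # 1-dimensional holes (loops)
```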

Giotto-deep is an open-source extension of the Giotto-TDA library. It aims to provide seamless integration between TDA and deep learning on top of PyTorch. The developers aim to provide several off-the-shelf architectures so that topology can be used both for pre-processing data (with a variety of available methods) and within neural networks. One such example is Persformer (Reinauer et al. 2021).

TopoModelX is a recent Python package that extends Graph Neural Networks (GNNs) for application in topological domains, demonstrating a substantial development in the field of topological deep learning. The implementation of topological neural networks in TopoModelX started as the ICML 2023 Topological Deep Learning Challenge (Papillon et al. 2023a), hosted by the second annual Topology and Geometry (TAG) in Machine Learning Workshop at ICML. Participants contributed by implementing existing topological neural network methods from the literature and applying them to train on a benchmark dataset. TopoModelX offers a robust framework and essential functionalities, enabling researchers to either implement new GNN-based TDL algorithms or apply existing methodologies from scholarly literature to their specific problems.

5 Conclusion

The recent growth in TDA and the established efficacy of deep learning have meant that the integration of these techniques has been inevitable. There is no universal paradigm for combining TDA and deep learning. This article surveyed numerous ways in which these frameworks have benefited each other. We began with an overview of the key TDA concepts. Following this, we reviewed TDA in deep learning from a variety of perspectives. We described numerous challenges and opportunities that remain in this field, as well as some observed successes.