Abstract
Next-generation mobility services require huge amounts of data with multiple attributes. Such data are stored as multi-dimensional arrays called tensors. A tensor network is an effective tool for representing a large composite tensor. As applications of the tensor-network formalism to tensor data processing, we present three research results from statistical physics: tree tensor networks, tensor ring decomposition, and MERA.
1 Introduction
The growth of new mobility services, such as automatic driving and ride-sharing, requires a huge amount of data, for example, sensor images and information about the movement of various objects. These data often have multiple attributes and are stored as a multi-dimensional array called a tensor. As the number of attributes (indexes) of a tensor increases, the size of the tensor grows exponentially. Therefore, the growth of future mobility services depends on the high-speed and high-quality use of tensor data of huge sizes.
A tensor network (TN) is an effective tool for representing a huge composite tensor and has been extensively developed in quantum information and statistical physics research [2, 12, 13]. For example, it has been used to represent a ground state in quantum many-body systems, where the number of tensor indexes equals the number of quantum objects, such as qubits [13]. Since a tensor network can effectively represent such high-dimensional data, we may also use it to efficiently process general tensor data. Recently, various applications of TNs to general tensor data processing have been proposed [1, 5, 7, 10, 14, 18, 24].
In this chapter, we introduce three research areas in statistical physics that use tensor-network formalism to process tensor data: tree tensor networks (TTNs), tensor ring decomposition, and MERA, an extended tree tensor network. After briefly reviewing tensor-network formalism in statistical physics in Sect. 5.2, we explain generative modeling using a TTN for a multi-dimensional probability distribution [1, 5]. In Sect. 5.3, we introduce a new optimization algorithm for the network structure of a TTN. In Sect. 5.4, we outline tensor ring decomposition [24] for the compression of tensor data and explain our new approach for removing redundant information in tensor ring decomposition. In Sect. 5.5, we introduce an extended tree tensor network called MERA [21], which represents the compression of quantum information. We consider the underlying mechanisms of the success of MERA through the MERA representation of the ground state of a one-dimensional quantum model. Finally, in Sect. 5.6, we summarize these tensor-network approaches for tensor data processing.
2 Tensor-Network Formalism
2.1 Tensor Contraction and Tensor Network
Statistical causality between random variables is defined by conditional probability. For example, consider five random variables, \(x_1, x_2, x_3, x_4,\) and \(x_5\), and assume that \(x_4\) and \(x_5\) statistically depend on \((x_1, x_2)\) and \((x_3,x_4)\), respectively. If all these random variables are discrete, and the support is finite, then the conditional probability \(p(x_5|x_1, x_2, x_3)\) is defined by two conditional probabilities:
where \(p_a(|)\) and \(p_b(|)\) are conditional probabilities. If we define the elements of a tensor as a conditional probability, that is,
then (5.1) can be rewritten as
The right-hand side of (5.2) is an example of a tensor contraction, which is a generalization of a matrix product. A tensor contraction is defined as the summation of the multiplication of two tensor elements, with some indexes common to both tensors: for example, \(x_4\) of A and B in (5.2). The composite tensor T is defined by the tensor contraction of A and B for the index \(x_4\) in (5.2).
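As a concrete illustration, the contraction in (5.2) can be sketched with NumPy's `einsum`; the common dimension `d` and the random conditional tables below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

d = 3  # hypothetical dimension of each discrete variable x1..x5

def random_conditional(shape_out, shape_cond):
    """Random conditional probability table p(out | cond)."""
    t = rng.random(shape_out + shape_cond)
    return t / t.sum(axis=tuple(range(len(shape_out))), keepdims=True)

# A[x4, x1, x2] = p_a(x4 | x1, x2),  B[x5, x3, x4] = p_b(x5 | x3, x4)
A = random_conditional((d,), (d, d))
B = random_conditional((d,), (d, d))

# Tensor contraction over the shared index x4, as in (5.2):
# T[x5, x1, x2, x3] = sum_{x4} B[x5, x3, x4] * A[x4, x1, x2]
T = np.einsum('ecd,dab->eabc', B, A)

# T is again a conditional probability p(x5 | x1, x2, x3):
# summing over x5 gives 1 for every (x1, x2, x3).
print(np.allclose(T.sum(axis=0), 1.0))  # → True
```

The `einsum` subscript string plays the role of the diagram: repeated letters (here `d`, for \(x_4\)) are the connected edges, and the remaining letters are the open edges of the composite tensor.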
Using tensor contractions, we can define a large tensor T as
Visually understanding the relationship of each tensor in tensor contractions using only (5.3) is difficult; a diagram notation is useful for visualizing this. Figure 5.1a and b are diagram notations for the left-hand and right-hand sides of (5.3), respectively. As shown in Fig. 5.1a, a geometrical object (circle) represents a tensor T, and the open edges represent the indexes of the tensor. In Fig. 5.1b, an edge between two tensors denotes a tensor contraction: the connected edge joins the two indexes (one from each tensor) that are summed over. In summary, a node and a connection between nodes in these diagrams represent a tensor and a tensor contraction, respectively. The diagram in Fig. 5.1b represents a composite tensor as a network of tensors; hence, we call this a tensor network.
Since a tensor network defined by a diagram is a composite tensor, we can define the class of composite tensors as a tensor network. Several types of network structures for tensor networks have already been proposed. For example, a tensor network with the one-dimensional structure shown in Fig. 5.1b is called a matrix product state (MPS). Historically, tensor networks have been used in the field of quantum information. If quantum amplitudes in a wave function are defined by a tensor, the tensor can be mapped to a quantum state. For example,
where \(\vert x_1x_2x_3\rangle \) is a basis state, that is, a tensor product of local bases in local Hilbert spaces. Thus, tensor networks can be regarded as quantum states. In addition, an MPS is called a tensor train (TT) in applied mathematics [14]. Generally, if all tensors have open edges and are connected to their neighboring tensors, the tensor network is called a tensor product state (TPS) [9]. The one-dimensional TPS is the MPS, and the two-dimensional TPS is called a projected entangled pair state (PEPS) [20]. A tensor network without a loop structure is called a tree, as shown in Fig. 5.2, and various tree tensor networks (TTNs) exist. Note that an MPS belongs to the TTN class.
2.2 Tensor Decomposition and Tensor Compression
The singular value decomposition (SVD) of an \(m \times n\) matrix A is \(U\Lambda V^\dagger \), where U and V are isometries, satisfying \(U^\dagger U = V^\dagger V = I\), and \(\Lambda \) is a diagonal matrix whose diagonal elements are the positive singular values. We can perform an SVD on any matrix. Let k be the number of retained singular values. If k is smaller than m and n, either because only k singular values are nonzero or because small singular values are removed, then the matrix A can be compressed: although the number of elements in A is \(m\times n\), the total number of independent elements in U, V, and \(\Lambda \) is \((m+n-k)k\). Therefore, if \(k < \min (m,n)\), the SVD is a compression of the original matrix. The SVD with the k largest singular values is the best approximation of A among rank-k matrices: \(\tilde{U} \tilde{\Lambda } \tilde{V}^\dagger = \text {arg}\min _{\tilde{A}: \text {rank-}k} |\tilde{A} - A|_F\), where \(|X|_F\) is the Frobenius norm, \(|X|_F \equiv \sqrt{\text {Tr}[X^\dagger X]}\).
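A minimal numerical sketch of this rank-k compression; the approximately low-rank test matrix is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(1)

m, n, k = 40, 30, 5

# A test matrix that is approximately rank k: an exact rank-k part
# plus small noise.
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
A += 1e-6 * rng.standard_normal((m, n))

U, s, Vh = np.linalg.svd(A, full_matrices=False)

# Keep only the k largest singular values: the best rank-k
# approximation in the Frobenius norm.
A_k = U[:, :k] * s[:k] @ Vh[:k, :]

# Compression: roughly (m + n + 1) * k stored numbers instead of m * n.
print(m * n, (m + n + 1) * k)
print(np.linalg.norm(A - A_k) / np.linalg.norm(A))  # small relative error
```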
Tensor decomposition is the transformation of a tensor into a composite tensor. Since composite tensors can be defined as a tensor network, a tensor network represents a tensor decomposition. For example, we can regard a tensor as a matrix by splitting the indexes of the tensor into two groups and combining the indexes in each group. Then, using SVD, we can decompose this matrix into a product of two matrices: \(T=U\Lambda V^\dagger = (U\sqrt{\Lambda })(\sqrt{\Lambda }V^\dagger )=AT'\). By splitting the combined indexes back into the original indexes, we can transform the original tensor into a tensor contraction of two tensors. Repeatedly applying this decomposition to the right-hand factor, \(T'\), we can transform any tensor into an MPS without truncation, as shown in Fig. 5.1b. Therefore, an MPS (TT) is a tensor decomposition that is applicable to any tensor. If the dimension of each tensor index is m and the number of indexes is n, then the number of elements in T is \(m^n\). However, the number of elements in the MPS is \(O(n \chi ^2 m)\), where \(\chi \), called a bond dimension, is the dimension of the indexes between neighboring tensors in the MPS. Therefore, an MPS with a fixed bond dimension yields substantial compression.
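The successive-SVD construction of an MPS (TT) can be sketched as follows; `tt_decompose` and `tt_contract` are hypothetical helper names, and the relative truncation threshold is a simple illustrative choice:

```python
import numpy as np

def tt_decompose(T, eps=0.0):
    """Decompose a tensor into an MPS (tensor train) by successive SVDs.
    Singular values <= eps * largest are truncated."""
    dims = T.shape
    cores = []
    chi = 1  # current left bond dimension
    M = T.reshape(1, -1)
    for d in dims[:-1]:
        M = M.reshape(chi * d, -1)
        U, s, Vh = np.linalg.svd(M, full_matrices=False)
        keep = s > eps * s[0]
        U, s, Vh = U[:, keep], s[keep], Vh[keep, :]
        cores.append(U.reshape(chi, d, -1))  # 3-leg core tensor
        chi = s.size
        M = s[:, None] * Vh  # pass the rest to the right
    cores.append(M.reshape(chi, dims[-1], 1))
    return cores

def tt_contract(cores):
    """Rebuild the full tensor from its MPS cores."""
    out = cores[0]
    for c in cores[1:]:
        out = np.tensordot(out, c, axes=([out.ndim - 1], [0]))
    return out.reshape(out.shape[1:-1])

rng = np.random.default_rng(2)
T = rng.standard_normal((3, 4, 5, 2))
cores = tt_decompose(T)
print(np.allclose(tt_contract(cores), T))  # → True (exact, no truncation)
```

With a positive `eps` (or a hard cap on `chi`), the same routine produces the controlled low-rank compression discussed above.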
If only the largest singular values are kept at each decomposition, the truncation error of the MPS can be controlled. However, determining the best tensor network with a fixed number of parameters to approximate a given tensor is difficult. Thus, we use a tensor network as a variational ansatz.
The computational cost of a tensor contraction is the product of all dimensions of the indexes involved in the contraction. The total cost of contracting a tensor network that contains many tensor contractions generally depends on the order in which the contractions are processed; however, determining the best order is an NP-hard problem. In addition, even for a compact tensor network such as a PEPS, the computational and memory costs increase exponentially with the number of tensors. Therefore, various approximation methods for tensor contractions have been proposed.
Using a tensor-network structure, we can efficiently calculate tensor contractions in some cases. For example, the computational cost of the inner product of two MPSs is only proportional to the length of the MPS. Therefore, compressing a huge tensor using a proper tensor-network decomposition can significantly reduce the computational cost of processing the tensor data.
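For example, the site-by-site contraction behind the linear-cost MPS inner product can be sketched as follows; the core shapes and helper names are illustrative assumptions:

```python
import numpy as np

def mps_inner(a_cores, b_cores):
    """Inner product <a|b> of two MPSs, contracted site by site.
    Cost is linear in the number of sites (for fixed bond dimension)."""
    E = np.ones((1, 1))  # E[i, j] connects the open right bonds
    for A, B in zip(a_cores, b_cores):
        # A, B have shape (left_bond, physical, right_bond)
        E = np.einsum('ab,apc,bpd->cd', E, np.conj(A), B)
    return E[0, 0]

rng = np.random.default_rng(3)

def random_mps(dims, chi):
    cores, left = [], 1
    for i, d in enumerate(dims):
        right = 1 if i == len(dims) - 1 else chi
        cores.append(rng.standard_normal((left, d, right)))
        left = right
    return cores

dims = (2, 2, 2, 2, 2)
a = random_mps(dims, chi=3)

# Sanity check: <a|a> equals the squared norm of the full tensor.
full = a[0]
for c in a[1:]:
    full = np.tensordot(full, c, axes=([full.ndim - 1], [0]))
print(np.isclose(mps_inner(a, a), np.sum(full ** 2)))  # → True
```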
3 Generative Model Using a Tree Tensor Network
3.1 Generative Modeling
Generative modeling finds a parametrized distribution \(p({\textbf {x}})\) to approximate a data distribution \(\pi ({\textbf {x}})\) [16]. Since the distance between these distributions can be defined by the Kullback–Leibler (KL) divergence,
we can optimize \(p({\textbf {x}})\) to minimize this KL divergence.
Suppose that a data sample is a vector in which each element takes a state from a finite set of states: \(\{{\textbf {x}}|({\textbf {x}})_i \in \{1, 2, \ldots , m\}\}\). A set of data samples, \({\mathcal M}=\{{\textbf {x}}_\mu \}_{\mu =1,\ldots ,M}\), defines an empirical distribution:
where M is the number of data samples and \(\delta ({\textbf {x}}, {\textbf {y}})\) is one if \({\textbf {x}}={\textbf {y}}\), otherwise it is zero. In practice, when the target distribution of generative modeling is the empirical data distribution, we minimize the negative log-likelihood (NLL) as the loss function during learning,
where \(S(\pi )\) is the entropy of the \(\pi \) distribution.
3.2 Tree Generative Model
Here, we consider a generative model based on a quantum state [1, 5]. Following the Born rule, we define \(p({\textbf {x}})\) as the square of the amplitude of a wave function:
where \(\psi ({\textbf {x}})\) is a wave function and Z is the normalization factor.
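A minimal sketch of the Born-rule model and the NLL of (5.7); a small dense wave-function tensor stands in for the tensor network, and the sample values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(4)

# A toy wave function psi(x1, x2, x3) over three binary variables.
psi = rng.standard_normal((2, 2, 2))

# Born rule (5.8): p(x) = psi(x)^2 / Z, with Z = sum_x psi(x)^2.
Z = np.sum(psi ** 2)
p = psi ** 2 / Z
print(np.isclose(p.sum(), 1.0))  # → True

# Negative log-likelihood for a few hypothetical data samples.
samples = [(0, 1, 0), (1, 1, 1), (0, 0, 0)]
nll = -np.mean([np.log(p[s]) for s in samples])
print(nll > 0)  # → True
```

In the actual tree generative model, `psi` is never stored densely; amplitudes and Z are evaluated through the TTN contractions described below.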
MPSs [5] and TTNs [1] have been proposed to define the wave function for generative modeling. Figure 5.1b shows the network structure of an MPS. Each tensor in the MPS has three edges, and these edges are sequentially connected. Figure 5.2 shows the network structure of a TTN. The number of indexes of a tensor in this TTN is equal to that in the MPS. The only difference is the topology of the network; all physical indexes in the TTN, \(x_i\), are connected, and the TTN network has no loop structure. Thus, an MPS is a specialized type of TTN. In the following, a generative model using a TTN is called a tree generative model.
3.3 Canonical Form of TTN
Exploiting the redundancy of inserting a pair of a matrix and its inverse on an edge in a tensor network, we can construct a useful canonical form of a TTN [17]. Since a TTN has no loop, we can split the network of a TTN into two trees by cutting an edge, as shown in Fig. 5.3a. If we regard the cut edge as the root edge of each tree, we can define the order of nodes from the terminal nodes (leaves) to the root and apply the following tensor transformations in this order. Each tensor has one edge toward the root node and two remaining edges. By combining these two edges into a single index, the tensor is transformed into a matrix: \(T_{ijk} = M_{i(jk)}\). The SVD of the matrix splits the matrix into an isometry and a matrix connected to the edge toward the root node (see Fig. 5.3b):
where U is an isometry as
The matrix \(M'\) is then absorbed by the next tensor (see Fig. 5.3c). This procedure is repeated from the leaf tensors to the root node. Then, two modified root tensors are obtained and combined into a tensor with four edges. Finally, the SVD of the top tensor obtains two isometries and a diagonal matrix that consists of the singular values (see Fig. 5.3d).
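The leaf-splitting step of this procedure can be sketched for a single 3-leg tensor (the dimensions are illustrative assumptions); because the factor U is an isometry, the norm of the tensor is carried entirely by the matrix passed toward the root:

```python
import numpy as np

rng = np.random.default_rng(5)

chi = 4
# A 3-leg leaf tensor T[j, k, i]: i is the edge toward the root,
# (j, k) are the remaining two edges.
T = rng.standard_normal((chi, chi, chi))

# Combine (j, k) into one index and split by SVD: T = U M',
# where U is an isometry and M' is absorbed by the parent tensor.
M = T.reshape(chi * chi, chi)
U, s, Vh = np.linalg.svd(M, full_matrices=False)
Mp = s[:, None] * Vh  # matrix passed toward the root

# Isometry property: U^dagger U = I.
print(np.allclose(U.T @ U, np.eye(U.shape[1])))  # → True
# The split is exact, and the norm of T equals the norm of M':
print(np.allclose((U @ Mp).reshape(chi, chi, chi), T))  # → True
print(np.isclose(np.linalg.norm(T), np.linalg.norm(Mp)))  # → True
```

Repeating this step toward the root, the isometries drop out of norm computations, which is why the normalization factor of the canonical form reduces to the singular values alone.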
Constructing the canonical form of a TTN: a cutting an edge of the TTN; b splitting a tensor into an isometry (triangle) and a matrix (circle) using the SVD of a leaf; c matrices from the SVD are absorbed into an upper tensor; and d the canonical form of the TTN (the rhombus represents a diagonal matrix that consists of singular values)
The canonical form of a TTN is useful as it enables a direct calculation of the normalization factor in (5.8). Since almost all tensors in the canonical form are isometries with the property defined in (5.10), the normalization factor directly depends on the singular values of the canonical form of the TTN, \(\{\lambda _i\}\):
To calculate the NLL, we need to evaluate the quantum amplitudes for the data samples, and the network structure of a TTN makes this evaluation efficient. The data vector \({\textbf {x}}_\alpha \) of a sample \(\alpha \) can be decomposed into a direct product of local vectors: \(({\textbf {x}}_\alpha )^{(1)} \otimes ({\textbf {x}}_\alpha )^{(2)} \otimes \cdots \). The set of local vectors of all samples at a site i is represented by the matrix \(V_{k\alpha }^{(i)}\), where k is an index of the data at site i. Thus, the total data set is represented by a TTN with the same network structure, built from delta tensors with the matrices \(V^{(i)}\) as leaves, as shown in Fig. 5.4a; the delta tensor is one if all indexes are the same, otherwise it is zero. Due to the tree structure, we can efficiently calculate the contraction of the TTN with a data TTN using recursive steps (see Fig. 5.4a–c).
Evaluation of quantum amplitude: a quantum amplitude \(\psi ({\textbf {x}}_\alpha )\) for a sample \(\alpha \); and b and c recursive calculation of \(\psi ({\textbf {x}}_\alpha )\), except for the part enclosed by the dotted line. The circles indicate data matrices at site i and the squares indicate a delta tensor, which is one if all indexes are the same, otherwise, it is zero
3.4 Learning Algorithm
Previous studies [1, 5] have used a learning algorithm that updates a single node (tensor) near the center of the canonical form with singular values, for example, the part enclosed by the dotted line in Fig. 5.4a. After combining the isometry and singular values in the enclosed part into a single 3-leg tensor (node), it can be updated using the gradient of the NLL. The SVD of the updated tensor into an isometry, singular values, and a unitary recovers the canonical form of the TTN. Since the center of the canonical form of a TTN can be moved to a neighboring position in the network using SVD [17], we sweep all nodes in the TTN with single-node updates.
3.5 Network Optimization
The network structure of a TTN is important for generative modeling. The performance of the balanced-tree generative model is better than that of the MPS in several scenarios [1]. Both the MPS and the balanced tree belong to the tree network class; the difference between them is the structure of their networks. Therefore, optimizing the network structure for given data is effective for generative modeling.
The DMRG algorithm [22, 23], a variational method for an MPS, is often used to find the ground state of a one-dimensional quantum model. For a finite system, the DMRG algorithm sweeps and optimizes the tensors in the canonical form of an MPS to improve the variational energy for a target Hamiltonian. A two-site DMRG algorithm simultaneously updates two neighboring tensors in an MPS: the two directly connected tensors are combined into a 4-leg tensor, this combined tensor is updated, and it is finally decomposed into two 3-leg tensors using SVD before the algorithm proceeds to the next pair. The corresponding algorithm for a tree generative model was proposed in [1]. In these studies [1, 22, 23], the network structure of the TTN does not change. However, the 4-leg tensor can be divided into two tensors in three possible ways, as shown in Fig. 5.5, and selecting a new division globally changes the network structure of the TTN.
We now propose a new algorithm that changes the network structure of a TTN for generative modeling. First, the two isometries and singular values in the center of the canonical form are combined into a 4-leg tensor. The 4-leg tensor is updated to improve the NLL. It can then be matricized in three ways, corresponding to the three possible divisions of its legs into two pairs. A better division is selected, and the SVD of the corresponding matrix produces a new canonical form. The center of the canonical form is moved, and the updates are repeated. Several strategies can be used to select a better division; further details can be found in [6].
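The three candidate divisions can be enumerated as matricizations of the 4-leg tensor. The sketch below scores each division by its truncation error at fixed bond dimension; this criterion is an illustrative assumption, as the actual selection strategies are discussed in [6]:

```python
import numpy as np

rng = np.random.default_rng(6)

d = 3
T = rng.standard_normal((d, d, d, d))  # 4-leg tensor, legs (0, 1, 2, 3)

# The three ways to split four legs into two pairs (Fig. 5.5).
divisions = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]

def truncation_error(T, left, chi):
    """SVD truncation error when keeping chi singular values
    for the matricization defined by the 'left' pair of legs."""
    perm = left + tuple(i for i in range(4) if i not in left)
    M = T.transpose(perm).reshape(d * d, d * d)
    s = np.linalg.svd(M, compute_uv=False)
    return np.sqrt(np.sum(s[chi:] ** 2))

chi = 3
errors = {div: truncation_error(T, div[0], chi) for div in divisions}
best = min(errors, key=errors.get)
print(best, errors[best])
```

Choosing `best` instead of the original pairing is exactly the step that lets the tree topology change during a sweep.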
Network structure of a tree generative model: a initial MPS structure for random patterns and b network structure after optimization by our proposed algorithm. Circles indicate probability variables and their color indicates their position in the line. Edges indicate the index of a tensor, which is a vertex with three legs
We tested the proposed algorithm on the empirical probability distribution of 10 random patterns of 256 binary variables on a line, starting from the MPS structure shown in Fig. 5.6a. We fixed the center part of the line to 0 and assigned random patterns to the left- and right-hand sides, as shown in the top and bottom rows of Fig. 5.6a. The variables on the left-hand side strongly correlate with those on the right. In this case, when we start with a randomly initialized MPS, we cannot obtain the minimum NLL if the network structure is fixed as an MPS. However, we can find a tree generative model with the minimum NLL by changing the structure of the network using our proposed algorithm. Figure 5.6b shows the network structure optimized by the proposed algorithm; this interesting structure emerged spontaneously. Since the probability variables of the left- and right-hand sides strongly correlate, they are embedded into a compact tree structure, shown in the upper right of Fig. 5.6b. In contrast, the lower part of Fig. 5.6b consists of the probability variables in the center, which are fixed to 0.
4 Tensor Ring Decomposition
4.1 Introduction to Tensor Ring Decompositions
As discussed in Sect. 5.2, we can consider a variety of tensor-network decompositions to represent a given tensor. When we perform an approximation with such a tensor-network decomposition, the efficiency of the data compression highly depends on the structure of the tensor network and the properties of the tensor data. For quantum many-body problems, the area law of entanglement entropy, which determines the scaling of the amount of correlation in the quantum state, plays an important role in selecting a better tensor network. However, the area law does not necessarily hold for general tensor data. Thus, it remains unclear which types of tensor networks, beyond the simplest form of MPS (TT), are suitable for decomposing general tensor data.
In this section, we consider tensor ring decomposition (TRD), which is a fundamental decomposition of multidimensional tensors [24]. In TRD, tensors are decomposed into a form of matrix product states with periodic boundary conditions (Fig. 5.7). For example, for the N-leg tensor \(T_{i_1,i_2,\dots ,i_N}\), the TRD is expressed as
where \(M^{(n)}_{j_n,j_{n+1}}[i_n]\) is a 3-leg tensor, \(i_n = 1,2, \ldots , d_n\), and \(j_n = 1,2, \ldots , D_n\). Note that in the last expression, \(M^{(n)}[i_n]\) is regarded as a matrix for a fixed \(i_n\). Hereafter, we denote a tensor defined by (5.12) as \(\textrm{tTr}\prod _{n=1}^N M^{(n)}\), where \(\textrm{tTr}\) indicates the trace of a tensor network. We can obtain a TRD of a given tensor, for example, by using successive SVDs; however, this yields an (open-boundary) MPS, i.e., the bond dimensions at the two boundaries, \(D_1\) and \(D_N\), become one. Such an open-boundary MPS solution is usually not optimal if we want to minimize the maximum or average bond dimension in the TRD, because information must flow through the entire system to represent correlations between the two edges, which increases the bond dimensions \(D_n\). Thus, an efficient algorithm is required to find the optimal TRD for given tensors.
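The trace form of (5.12) can be evaluated directly by multiplying the core matrices. The following sketch, with illustrative dimensions, checks the matrix-product evaluation against a full `einsum` contraction of the ring:

```python
import numpy as np

rng = np.random.default_rng(7)

N, d, D = 4, 2, 3  # sites, physical dim d_n, bond dim D_n (uniform here)

# Cores M^(n)[i_n] are D x D matrices for each physical value i_n.
cores = [rng.standard_normal((d, D, D)) for _ in range(N)]

def tr_element(cores, idx):
    """T[i_1, ..., i_N] = Tr( M^(1)[i_1] ... M^(N)[i_N] ), eq. (5.12)."""
    M = cores[0][idx[0]]
    for c, i in zip(cores[1:], idx[1:]):
        M = M @ c[i]
    return np.trace(M)

# Build the full tensor element by element and compare with einsum.
T = np.array([[[[tr_element(cores, (a, b, c_, e))
                 for e in range(d)] for c_ in range(d)]
               for b in range(d)] for a in range(d)])
T2 = np.einsum('aij,bjk,ckl,dli->abcd', *cores)
print(np.allclose(T, T2))  # → True
```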
We can also consider a related problem in which, instead of seeking an exact TRD, we approximate an N-leg tensor by a TRD with given bond dimensions \(D_n\). The alternating least squares (ALS) method [24] is often used to find such a numerical approximation. The aim of the ALS method is to find a TRD that minimizes the distance between the original tensor and the TRD representation, defined by the Frobenius norm:
The optimal solution is iteratively searched for by solving local linear problems for \(M^{(n)}\) defined by fixing \(M^{(m)}\) for \(m \ne n\). When a 3-leg tensor \(M^{(n)}\) is regarded as a vector,
the squared norm F can be written as
where \(\hat{N}\) and \({{\textbf {W}}}\) are defined as in Fig. 5.8.
Matrix \(\hat{N}\) and vector \({{\textbf {W}}}\) in (5.15) for \(n=1\)
Note that the matrix \(\hat{N}\) is positive-semidefinite by construction. Because F is a quadratic function of \({{\textbf {M}}}\), its extrema are given by solving the linear equations defined by
Thus, when the matrix \(\hat{N}\) is a regular matrix, we can obtain the optimal \({{\textbf {M}}}\):
However, in general, \(\hat{N}\) can have zero eigenvalues and its inverse matrix \(\hat{N}^{-1}\) may not exist. In this case, the linear problem becomes underdetermined. To solve this problem, the pseudo inverse (PI), \(\hat{N}^{+}\), is often used instead of the inverse. Alternatively, the conjugate gradient (CG) method can be used to obtain one of the solutions.
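A toy sketch of this degenerate local problem: a singular positive-semidefinite \(\hat{N}\) and the minimum-norm PI solution. The sizes and random matrices are illustrative assumptions, and the CG method would apply to the same linear system:

```python
import numpy as np

rng = np.random.default_rng(8)

# A singular positive-semidefinite N-hat, as arises with redundant
# loops, and a compatible right-hand side W.
n = 6
R = rng.standard_normal((n, 3))
Nhat = R @ R.T              # rank 3 < 6, so zero eigenvalues exist
M_true = rng.standard_normal(n)
W = Nhat @ M_true           # guarantees the linear system is consistent

# Pseudoinverse: the minimum-norm solution of Nhat M = W. Any vector in
# the null space of Nhat can be added without changing the residual,
# which is exactly the freedom discussed in the text.
M = np.linalg.pinv(Nhat) @ W

print(np.allclose(Nhat @ M, W))  # → True
```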
The ALS algorithm often gets trapped at a local minimum of F. In particular, if the initial estimate of \(M^{(n)}\) contains a “redundant loop”, removing such a contribution from the TRD is not easy; thus, it will not converge to the global minimum. We investigate such a situation in Sect. 5.4.2.
4.2 Redundant Loops
We now consider the numerically optimized TRD of T, starting from an initial estimate that includes a redundant loop. For a general TRD, \(\textrm{tTr}\prod _{n}M^{(n)}\), we define an ideal redundant loop by adding additional degrees of freedom in the virtual bond for all \(M^{(n)}\):
This is also shown in Fig. 5.9a. Clearly, the TRD represented by \(\tilde{M}^{(n)}\) is equivalent to that of \(M^{(n)}\) up to a constant \(\sum _{k}\prod _{n}\sigma _{k}^{(n)}\). Thus, no essential information is represented by the additional indices \(k_n\).
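One concrete reading of this construction, in which each core matrix is tensored (Kronecker product) with a diagonal matrix of loop weights, can be checked numerically; the dimensions are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(9)

N, d, D, L = 4, 2, 2, 3  # L: dimension of the redundant loop

cores = [rng.standard_normal((d, D, D)) for _ in range(N)]
sigmas = [rng.random(L) + 0.1 for _ in range(N)]  # positive loop weights

# Redundant cores: each matrix M^(n)[i] is replaced by
# M^(n)[i] (x) diag(sigma^(n)), enlarging the virtual bond D -> D * L.
red = [np.stack([np.kron(c[i], np.diag(s)) for i in range(d)])
       for c, s in zip(cores, sigmas)]

def tr_ring(cores, idx):
    M = cores[0][idx[0]]
    for c, i in zip(cores[1:], idx[1:]):
        M = M @ c[i]
    return np.trace(M)

# The redundant TRD equals the original one up to the constant
# sum_k prod_n sigma_k^(n), carrying no extra information.
idx = (0, 1, 1, 0)
const = np.sum(np.prod(sigmas, axis=0))
print(np.isclose(tr_ring(red, idx), const * tr_ring(cores, idx)))  # → True
```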
When a TRD has redundant loops, the matrix \(\hat{N}\) for \(n=1\) is represented as in Fig. 5.9b. The redundant loops in the upper and lower parts of this figure are disconnected, indicating that \(\hat{N}\) has zero eigenvalues.
Similarly, the vector \({{\textbf {W}}}\) for \(n=1\) is represented as in Fig. 5.9c. We can easily confirm that the solution obtained using the pseudo inverse of \(\hat{N}\), \({{\textbf {M}}} = \hat{N}^{+} {{\textbf {W}}}\), maintains the redundant loop. However, the redundant loop can disappear in a general solution of the linear equations once the eigenvectors that correspond to the zero eigenvalues are included. Thus, to remove redundant loops, we must properly determine the contribution of the zero-eigenvalue eigenvectors.
4.3 Entanglement Penalty Algorithm
We now discuss an idea to improve the optimization of the TRD, inspired by quantum many-body problems. We consider the corner double line (CDL) tensor, which often appears in statistical physics [4], as a model tensor. Based on the CDL structure, we introduce a modified cost function that can avoid the previously discussed local minima.
Let a tensor T be represented by an exact TRD:
Assume that each index \(i_n\) is represented by a set of two indices \((x_n,y_n)\) and that the 3-leg tensor \(C^{(n)}\) has a CDL structure:
T and \(C^{(n)}\) can be represented as in Fig. 5.10. More generally, we can consider unitary matrices that mix the indices \(x_n\) and \(y_n\). In this case, \(C^{(n)}\) is written as
where \(U^{(n)}\) is a unitary matrix.
As discussed in Sect. 5.4.2, when the ALS algorithm is used to find an optimal TRD of T, represented by (5.19), from an initial guess that includes redundant loops, it gets trapped at a local minimum. To avoid such local minima, we consider an additional term in the cost function.
One of the differences between the redundant loop and CDL structure is the entanglement, or correlation, between the original indices, \(i_n\), and virtual indices in the TRD. For redundant loops, the original and virtual indices have no connection, but in the CDL structure they do. When we regard a 3-leg tensor \(M_{i,j_1,j_2}\) as a matrix \(M_{i,(k_1,k_2)}\), the amount of such entanglement can be characterized by entanglement entropy, which is often used in quantum information [3] and is defined using the singular values of M:
where \(\tilde{s}_i = s_i/\sum _i s_i\) is the normalized singular value \(s_i\). When M is an \(N \times M\) matrix and \(r=\min (M,N)\), \(S_{\textrm{E}}\) satisfies \(0 \le S_{\textrm{E}} \le \log r\): \(S_{\textrm{E}}=0\) for \(\tilde{s}_1 = 1\) and \(\tilde{s}_{i} = 0\) \((i \ne 1)\), and \(S_{\textrm{E}} = \log r\) for \(\tilde{s}_i = 1/r\). Note that \(S_{\textrm{E}}=0\) corresponds to the redundant loop, whereas \(S_{\textrm{E}}\) takes larger finite values for the CDL. Thus, when we add a term (negatively) proportional to the entropy to the cost function, CDL-like solutions may be favored over the redundant loop. The new cost function is explicitly written as
where \(\epsilon \) is a positive constant. For a finite \(\epsilon \), the global minimum of \(F'\) yields a different TRD from that which minimizes F. Thus, in practice, we may begin the ALS algorithm with a sufficiently large \(\epsilon \) and adjust it toward \(\epsilon = 0\) with each iteration. Using this procedure, the TRD may escape the local minimum and redundant loop during the initial iterations; then, when \(\epsilon = 0\), the ALS algorithm is expected to converge to the global minimum of \(F'(\epsilon =0)=F\).
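The entanglement entropy (5.22) that enters this penalty can be sketched directly from the singular values; the two extreme cases below mirror the redundant-loop and CDL-like limits:

```python
import numpy as np

def entanglement_entropy(M):
    """S_E of a matrix from its normalized singular values (5.22)."""
    s = np.linalg.svd(M, compute_uv=False)
    s = s / s.sum()
    s = s[s > 0]  # 0 * log 0 = 0 by convention
    return -np.sum(s * np.log(s))

r = 4
# Rank-1 matrix: one singular value, S_E = 0 (the redundant-loop limit).
print(entanglement_entropy(np.outer(np.ones(r), np.ones(r))))
# Identity: r equal singular values, S_E = log r (the maximal,
# CDL-like limit).
print(np.isclose(entanglement_entropy(np.eye(r)), np.log(r)))  # → True
```

In the penalized cost function, this quantity would be evaluated for each 3-leg core viewed as a matrix and subtracted with weight \(\epsilon \).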
4.4 Numerical Experiments
We will now demonstrate how the entanglement entropy penalty works using numerical experiments. In these experiments, we performed ALS-like site-wise optimization with the cost functions F and \(F'(\epsilon )\) to find the optimal TRD. Each iteration of the ALS algorithm involves updating N local 3-leg tensors. For \(F'\), the local problem becomes nonlinear; we typically use the CG method to solve such a problem. For F, we can use either the PI or CG method.
As the simplest example, we show the typical optimization dynamics for the ideal CDL represented by (5.21). Figure 5.11a shows the convergence of the ALS algorithm, starting from random tensors, for a 4-leg tensor T consisting of a CDL in which each index \(i_n\) has a bond dimension of 16. For the entanglement cost function (5.23), we set \(\epsilon > 0\) for the first few steps and then \(\epsilon = 0\), i.e., \(F'(\epsilon = 0) = F\), to obtain the true global minimum. We used the PI and CG methods to minimize the standard ALS cost function (5.13) and the CG method for the entanglement cost function. The optimization with the standard cost function fails to converge to the global minimum with both the PI and CG methods, whereas the entanglement cost function (5.23) converges to the global minimum. When we consider initial tensors with the ideal redundant loops defined in (5.18), the standard ALS almost always gets trapped at a local minimum, as shown in Fig. 5.11b. However, the entanglement cost function avoids getting trapped at local minima and smoothly converges to the global minimum. Further details and results of these numerical experiments are presented elsewhere [11].
Optimization of TRD using the standard cost function with the PI and CG solvers versus the entanglement cost function for a random 4-leg tensor consisting of CDL tensors with a bond dimension of 16 for the outer indices: a the initial tensors are random dense tensors and b the initial tensors are random dense tensors with redundant loops with a bond dimension of 2, as in (5.18). For the entanglement cost function, \(\epsilon = 0.1\) for the first five iterations, which was minimized by the CG method; then, \(\epsilon = 0\), which was minimized by the PI. Each iteration optimizes N tensors. Here, \(N=4\)
These numerical experiments indicate that the entanglement cost function works very well for CDL-type tensors. However, in general, the target of TRD can be very different from the ideal CDL. Our other experiments using tensors constructed from the TRD of general tensors indicate that a naive entanglement cost function does not always escape local minima; it depends on the balance of the bond dimensions of the outer and virtual indices [11].
5 Exact MERA Network and Quantum Renormalization Group for Critical Spin Models
5.1 Introduction
In the previous sections of this chapter, we showed that tensor-network approaches are applicable to information-scientific problems. Thus, a deep understanding of these successes is highly desirable. In quantum physics, the approaches were originally proposed as efficient numerical methods for treating the ground and low-lying states of strongly interacting quantum many-body systems. In this section, we examine the underlying mechanisms of the success of a class of tensor networks called multiscale entanglement renormalization ansatz (MERA) [21].
Usually, two different types of tensor networks are considered depending on how close the target model is to quantum criticality, because the magnitude of nonlocal quantum correlation, or entanglement, determines the structure of the network. Away from criticality, the quantum state can be well represented by the PEPS. The well-known MPS is a one-dimensional realization of PEPS, and the matrix dimension characterizes the magnitude of the quantum entanglement. Furthermore, the algebraic Bethe ansatz, a mathematically exact method for one-dimensional solvable models, can also be transformed into an MPS. Thus, this recent approach also has a traditional root, and hence the underlying mathematics is clear. However, in quantum critical cases, the efficiency of PEPS is greatly reduced because the tensor dimension must be increased to precisely optimize the tensor-network wave function numerically. In this case, a better approach is to include an extra dimension, the so-called holographic dimension, in the network. This extra dimension corresponds to the flow of real-space renormalization and also plays a role in greatly reducing the tensor dimension. The corresponding network is the MERA network.
Surprisingly, the MERA concept stimulates string theorists because the structure of the network is quite similar to the spacetime concept that appears in gauge/gravity correspondence [8, 15, 19]. This correspondence is considered to be key to understanding the complementary relationship between quantum field theory and general relativity. Therefore, clarifying the mathematical structure of the MERA network is important interdisciplinary research beyond condensed matter and statistical physics.
The traditional methods of a renormalization group (RG) are based on the flow of interaction parameters in the Hamiltonian, which is defined by the repetition of the renormalization group transformation in real or momentum space. By transforming the Hamiltonian, we can determine how the system approaches the fixed point and what the dominant parameters are. However, recent tensor-network approaches are mainly based on the optimization of the variational ansatz, and their relationship with the traditional RG concept is not very clear. We aim to overcome this discrepancy by bridging the tensor network with the RG of the Hamiltonian.
5.2 Heisenberg Model and Quantum Entanglement
We start with the antiferromagnetic Heisenberg Hamiltonian in one spatial dimension:
Here, \({{\textbf {S}}}\) is a quantum spin operator, \({{\textbf {S}}}=\frac{1}{2}{\boldsymbol{\sigma }}\) for Pauli matrix \({\boldsymbol{\sigma }}\), J is the exchange coupling, N is the number of lattice sites, and we assume the periodic boundary condition \({{\textbf {S}}}_{N+1}={{\textbf {S}}}_{1}\). The ground state of this Hamiltonian is \(\left| \psi \right\rangle =\sum _{s_{1},...,s_{N}}\psi ^{s_{1}...s_{N}}\left| s_{1}...s_{N}\right\rangle \), where \(\left| s_{1}...s_{N}\right\rangle \) is the abbreviation of \(\left| s_{1}\right\rangle \otimes \cdots \otimes \left| s_{N}\right\rangle \).
As an example, we consider the 4-site case (\(N=4\)), for which we can obtain the exact eigenstates (this is a very pedagogical example to demonstrate the nature of the MERA network). The eigenvalues are \(E=-2J, -J, 0, 0, 0, J\). In particular, the ground state (\(S_{tot}^{z}=0\)) is given by
where the coefficient \(\sqrt{12}\) corresponds to a normalization factor of \(\left| \psi \right\rangle \). In addition to the first term on the right-hand side of the equation, which is a classical antiferromagnetic configuration, a second term exists due to quantum fluctuation. The second term represents domain excitation, \(\left| \uparrow \downarrow \right\rangle _{i,i+1}\otimes \left| \downarrow \uparrow \right\rangle _{i+2,i+3}\). This state can also be written as
where \(\left| s\right\rangle _{ij}=\left( \left| \uparrow \downarrow \right\rangle _{ij}-\left| \downarrow \uparrow \right\rangle _{ij}\right) /\sqrt{2}\). Thus, two singlets spatially fluctuate, and this state has a finite amount of quantum entanglement. To clarify this feature, we introduce a reduced density matrix by taking the partial trace of the density matrix \(\rho =\left| \psi \right\rangle \left\langle \psi \right| \) as \(\rho _{12}=tr_{34}\rho \). Then, the bipartite entanglement entropy is defined by
Because of the finite \(S_{12}\), simple treatments, such as mean-field theory, break down. Tensor-network methods can efficiently treat this entanglement. For a one-dimensional case, this model can be solved exactly, even for a general N. In this case, the wave function is represented by the Bethe ansatz. The algebraic version of this ansatz can be transformed into the matrix product state.
5.3 Construction of Exact MERA Network
We would like to represent \(\left| \psi \right\rangle \) by the hierarchical tensor network (MERA) in Fig. 5.12. The hierarchical network matches the quantum critical systems since it can represent the presence of various energy scales due to the real-space renormalization processes. This construction can greatly reduce the tensor dimension compared with the MPS representation. This is advantageous in terms of numerical simulation. Furthermore, the hierarchical network contains disentangling tensors. The presence of disentangling tensors is necessary to realize the success of the real-space RG in quantum systems. Because of quantum fluctuation, the spin correlation is essentially nonlocal, thus maintaining a good approximation using only local transformations is difficult. Before the RG, the disentangling tensors properly kill the nonlocal entanglement, and thus the real-space RG becomes successful. This is a key in MERA optimization. Note that such a network without the disentangling tensors is a tree tensor network. The tree tensor network has various applications, as presented earlier in this chapter, even though it may not match the quantum RG for quantum critical systems.
We now consider a method for explicitly constructing the tensor elements of the MERA network. We first introduce the following representation with a singlet and triplet on each of the two sites:
where \(\left| s\right\rangle =\left( \left| \uparrow \downarrow \right\rangle -\left| \downarrow \uparrow \right\rangle \right) /\sqrt{2}\), \(\left| t_{0}\right\rangle =\left( \left| \uparrow \downarrow \right\rangle -\left| \downarrow \uparrow \right\rangle \right) /\sqrt{2}\), \(\left| t_{+}\right\rangle =\left| \uparrow \uparrow \right\rangle \), and \(\left| t_{-}\right\rangle =\left| \downarrow \downarrow \right\rangle \). The purpose of this representation is that it considers the nonlocal bases and connects them to disentangling tensors (see Fig. 5.12).
We now introduce the disentangling transformation \(\left| s\right\rangle \rightarrow \left| 00\right\rangle \), \(\left| t_{+}\right\rangle \rightarrow \left| 01\right\rangle \), \(\left| t_{-}\right\rangle \rightarrow \left| 10\right\rangle \), and \(\left| t_{0}\right\rangle \rightarrow \left| 11\right\rangle \). This is a unitary transformation that locally reduces the amount of entanglement. This change is quite powerful for understanding the properties of the MERA network. Then, we have
The vector \(\left| aa\right\rangle \) (\(a=0,1\)) can be simply represented as \(\left| a\right\rangle \), which compresses the information. The MERA representation of the ground state corresponds to the decomposition of the coefficient \(\psi ^{s_{1}s_{2}s_{3}s_{4}}\) by a set of functional tensors:
Here we assume a spatially uniform network. Thus, the top tensor is defined by
and the isometry tensor is defined by
The disentangling tensor is defined by
The matrix elements of U originate from the combination of \(\left| s\right\rangle \) and \(\left| t_{0}\right\rangle \), which resembles the Haal wavelet transformation. Thus, we conclude that the quantum RG can also be regarded as an extension of the scale control techniques that have been developed in classical systems.
5.4 RG Flow
The effective two-site Hamiltonian after the quantum RG flow using one layer of the disentangling and isometric transformations is obtained by taking the partial expectation value:
The matrix representation of the effective Hamiltonian and vector representation of the top tensor are given by
respectively, the basis set is \(\left| 00\right\rangle \), \(\left| 01\right\rangle \), \(\left| 10\right\rangle \), and \(\left| 11\right\rangle \). The ground state energy is given by
Here, the eigenvalues of \(H^\textrm{eff}\) are \(E=-2J, 0, 0, J\).
We now calculate the bipartite entanglement entropy after the RG, which is defined by the entanglement entropy between the two effective sites (isometry tensors). The reduced density matrix is defined by
Here, the eigenvalues of \(\rho _{R}\) are \(\left( 3\pm 2\sqrt{2}\right) /6\). The entropy is then calculated as
Thus, we determine that \(S_{R}<S_{12}\). Hence, the disentangling procedure actually reduces the entanglement entropy, as discussed in the original paper by Vidal [21].
5.5 Concluding Remarks
In this section, we presented the analytic properties of the MERA network to better understand the nature of the quantum RG. Although general cases with a larger N are still difficult, the present toy model largely demonstrates how the RG occurs in quantum cases. The practical use of a nonlocal basis to decompose the wave function is key to constructing the tensor network.
6 Summary
In this chapter, we presented two applications of tensor networks for tensor data processing and a discussion of the underlying mechanism of the success of a tensor network related to the compression of quantum information (MERA). Regarding further applications, parameter compression of neural networks by tensor networks is also interesting [10]; however, we could not introduce this here due to space limitations. As discussed in this chapter, research on tensor data processing using tensor networks is promising, and its future development is necessary to support next-generation mobility technologies.
References
S. Cheng, L. Wang, T. **ang, P. Zhang, Tree tensor networks for generative modeling. Phys. Rev. B 99(15), 155131 (2019). https://doi.org/10.1103/physrevb.99.155131
A. Cichocki, Tensor networks for big data analytics and large-scale optimization problems (2014). ar**v:1407.3124
J. Eisert, M. Cramer, M.B. Plenio, Colloquium: Area laws for the entanglement entropy. Rev. Mod. Phys. 82, 277–306 (2010). https://doi.org/10.1103/RevModPhys.82.277
Z.-C. Gu, M. Levin, X.-G. Wen, Tensor-entanglement renormalization group approach as a unified method for symmetry breaking and topological phase transitions. Phys. Rev. B 78, 205116 (2008). https://doi.org/10.1103/PhysRevB.78.205116
Z.-Y. Han, J. Wang, H. Fan, L. Wang, P. Zhang, Unsupervised generative modeling using matrix product states. Phys. Rev. X 8(3), 031012 (2018). https://doi.org/10.1103/physrevx.8.031012
K. Harada, T. Okubo, N. Kawashima, Network optimization of tree generative models (in preparation)
T.G. Kolda, B.W. Bader, Tensor decompositions and applications. SIAM Rev. 51(3), 455–500 (2009). https://doi.org/10.1137/07070111X
C.H. Lee, X.-L. Qi, Exact holographic map** in free fermion systems. Phys. Rev. B 93, 035112 (2016). https://doi.org/10.1103/PhysRevB.93.035112
T. Nishino, Y. Hieida, K. Okunishi, N. Maeshima, Y. Akutsu, A. Gendiar, Two-dimensional tensor product variational formulation. Prog. Theor. Phys. 105(3), 409–417 (2001). https://doi.org/10.1143/ptp.105.409
A. Novikov, D. Podoprikhin, A. Osokin, D.P. Vetrov, Tensorizing neural networks. Adv. Neural Inf. Proc. Syst. 28, 442–450 (2015)
T. Okubo, N. Kawashima (in preparation)
K. Okunishi, T. Nishino, H. Ueda, Developments in the tensor network – from statistical mechanics to quantum entanglement. J. Phys. Soc. Jpn. 91(6), 062001 (2022). https://doi.org/10.7566/JPSJ.91.062001
R. Orús, A practical introduction to tensor networks: matrix product states and projected entangled pair states. Ann. Phys. 349, 117–158 (2014). https://doi.org/10.1016/j.aop.2014.06.013
I.V. Oseledets, Tensor-train decomposition. SIAM J. Sci. Comput. 33(5), 2295–2317 (2011). https://doi.org/10.1137/090752286
X.-L. Qi, Exact holographic map** and emergent space-time geometry (2013). ar**v:1309.6282
R. Salakhutdinov, Learning deep generative models. Annu. Rev. Stat. Appl. 2(1), 361–385 (2015). https://doi.org/10.1146/annurev-statistics-010814-020120
Y.Y. Shi, L.M. Duan, G. Vidal, Classical simulation of quantum many-body systems with a tree tensor network. Phys. Rev. A 74(2), 022320 (2006). https://doi.org/10.1103/physreva.74.022320
E. Stoudenmire, D.J. Schwab, Supervised learning with tensor networks. Adv. Neural Inf. Proc. Syst. 29, 4799–4807 (2016)
B. Swingle, Entanglement renormalization and holography. Phys. Rev. D 86, 065007 (2012). https://doi.org/10.1103/PhysRevD.86.065007
F. Verstraete, J.I. Cirac, Renormalization algorithms for quantum-many body systems in two and higher dimensions (2004). ar**v:cond-mat/0407066
G. Vidal, Entanglement renormalization. Phys. Rev. Lett. 99, 220405 (2007). https://doi.org/10.1103/PhysRevLett.99.220405
S.R. White, Density matrix formulation for quantum renormalization groups. Phys. Rev. Lett. 69, 2863–2866 (1992). https://doi.org/10.1103/PhysRevLett.69.2863
S.R. White, Density-matrix algorithms for quantum renormalization groups. Phys. Rev. B 48, 10345–10356 (1993). https://doi.org/10.1103/PhysRevB.48.10345
Q. Zhao, G. Zhou, S. **e, L. Zhang, A. Cichocki, Tensor ring decomposition (2016). ar**v:1606.05535
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2024 The Author(s)
About this chapter
Cite this chapter
Harada, K., Matsueda, H., Okubo, T. (2024). Application of Tensor Network Formalism for Processing Tensor Data. In: Ikeda, K., et al. Advanced Mathematical Science for Mobility Society. Springer, Singapore. https://doi.org/10.1007/978-981-99-9772-5_5
Download citation
DOI: https://doi.org/10.1007/978-981-99-9772-5_5
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-9771-8
Online ISBN: 978-981-99-9772-5
eBook Packages: Computer ScienceComputer Science (R0)