1 Introduction

Chronic Obstructive Pulmonary Disease (COPD) is a complex disorder arising from various pathological processes including emphysema and functional small airways disease (fSAD). The extent of emphysema and fSAD that make up overall disease burden can vary, which can affect lung physiology. Both disease processes can progress at different rates, complicating prognostication. Optimising the quantification of disease extent in COPD may improve the precision of disease staging and monitoring.

Analysis of lung disease from Computed Tomography (CT) has typically relied on global averages computed over the whole lung. Such metrics cannot capture the anatomical distribution of disease. Methods have been proposed to quantify the contribution of various emphysema subtypes [5] or the distribution of image features [2]. Harmouche et al. [5] built an emphysema manifold by analysis of classified emphysema subtypes. A Severity Index (S) derived from this space is complementary to the mean level of emphysema. In contrast, Bragman et al. [2] modelled local distributions of density and biomechanical features, exploiting them to investigate differences between subtypes of COPD whilst also classifying these subtypes.

2 Method

We present a new method to quantify the spread of parenchymal disease and measure its effect on lung deformation. It is based on locally quantifying tissue destruction and deformation to capture heterogeneity or homogeneity across the lung. The outcome is a distribution that quantifies various aspects of lung pathophysiology that can be modelled to test associations with various clinical hypotheses. The distributions can be exploited to quantify inter-patient differences in lung tissue pathology and deformation. A single model of tissue disease and deformation can be obtained by combining separate embeddings obtained from manifold learning with manifold fusion.

2.1 Lung Deformation and Tissue Classification

The deformation between paired breath-hold CT scans acquired at functional residual capacity (\(\mathcal {I}_{exp}\), \(\varOmega ^{*}\)) and total lung capacity (\(\mathcal {I}_{ins}\), \(\varOmega \)) can be obtained using nonrigid registration. The output is a transformation \(\varphi \) mapping each coordinate \(x \in \varOmega \) to \(x^{*} \in \varOmega ^{*}\). Local volume change is characterised by the Jacobian determinant J, calculated on a voxel-wise basis: \(J = \text {det}\left( \nabla _{x} \varphi \right) \).
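The voxel-wise Jacobian determinant can be sketched numerically as follows. This is a minimal illustration, not the registration pipeline itself: it assumes the transformation is already available as a dense array `phi` of mapped coordinates (if the registration returns a displacement field u, then phi = x + u), and the function name and finite-difference scheme are illustrative choices.

```python
import numpy as np

def jacobian_determinant(phi, spacing=(1.0, 1.0, 1.0)):
    """Voxel-wise det(grad phi) for a transformation sampled on a regular grid.

    phi     : array of shape (3, X, Y, Z); phi[i] is the i-th mapped coordinate (mm).
    spacing : voxel size in mm along each axis (assumed known from the image header).
    """
    # grads[i, j] approximates d(phi_i)/d(x_j) using finite differences.
    grads = np.stack(
        [np.stack(np.gradient(phi[i], *spacing), axis=0) for i in range(3)],
        axis=0,
    )
    # Move the 3x3 matrix axes to the end so det is evaluated per voxel.
    jac = np.moveaxis(grads, (0, 1), (-2, -1))
    return np.linalg.det(jac)
```

For an identity transformation the result is 1 everywhere; values below 1 indicate local contraction and values above 1 indicate local expansion.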

Parametric Response Mapping (PRM) [4] was used to classify voxels as emphysema (PRM\(_{emph}\)) and functional small airways disease (PRM\(_{fSAD}\)). For all voxels \(x_{i} \in \mathcal {I}_{ins}\), the tissue class \(z_{i}\) is based on Hounsfield Unit (HU) thresholds in \(\mathcal {I}_{ins}\) and \(\mathcal {I}_{exp}\). A voxel is classified as PRM\(_{emph}\) if \(\mathcal {I}_{ins}(x_{i}) \le -950\) and \(\mathcal {I}_{exp}(\varphi (x_{i})) \le -856\). A voxel is classified as PRM\(_{fSAD}\) if \(\mathcal {I}_{ins}(x_{i}) > -950\) and \(\mathcal {I}_{exp}(\varphi (x_{i})) \le -856\). Airways and vasculature are excluded by considering only voxels with an intensity between \(-1024\) HU and \(-500\) HU in both scans.
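The threshold rules above can be written compactly as a labelling function. The sketch below assumes the expiratory image has already been resampled into inspiratory space, so that `exp_hu_warped` holds \(\mathcal {I}_{exp}(\varphi (x_{i}))\); the label codes and function name are hypothetical.

```python
import numpy as np

OTHER, EMPH, FSAD = 0, 1, 2  # illustrative label codes, not from the paper

def classify_prm(ins_hu, exp_hu_warped, ins_thr=-950, exp_thr=-856,
                 hu_min=-1024, hu_max=-500):
    """Label each voxel as PRM_emph, PRM_fSAD or other from paired HU values."""
    ins_hu = np.asarray(ins_hu, dtype=float)
    exp_hu = np.asarray(exp_hu_warped, dtype=float)
    # Keep only voxels within the HU window in both scans
    # (removes airways and vasculature, as described in the text).
    valid = ((ins_hu >= hu_min) & (ins_hu <= hu_max) &
             (exp_hu >= hu_min) & (exp_hu <= hu_max))
    low_exp = exp_hu <= exp_thr
    labels = np.full(ins_hu.shape, OTHER, dtype=np.uint8)
    labels[valid & low_exp & (ins_hu <= ins_thr)] = EMPH
    labels[valid & low_exp & (ins_hu > ins_thr)] = FSAD
    return labels
```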

2.2 Local Disease and Deformation Distributions

We present the concept of local feature distributions (Fig. 1a and b). The aim is to quantify local abnormalities in lung physiology and pathology to define a signature unique to a patient's disease state. We introduce two models: (1) local disease distributions and (2) local deformation distributions. The disease distributions model the spread of emphysema and fSAD whilst the deformation distribution characterises local volume change across the lung. They are created by locally sampling regions of \(\mathcal {Z}\) and J in a Cartesian grid using local regions of interest \(\varOmega _{k}\) (ROI) where \(k=1, \cdots , K\) indexes the center voxel of the ROI. The size (\(r \times r \times r\)) of the ROI governs the scale of the sampling.

We modelled two properties of disease spread: (1) locally diffuse/dense disease and (2) global homogeneity/heterogeneity. For each ROI centered at \(z_{k}\) where \(z \in \varOmega _{k}\), we computed the fraction of \(\text {PRM}_{emph}\) and \(\text {PRM}_{fSAD}\) voxels, denoted \(v_{k}(emph)\) and \(v_{k}(fSAD)\). Dense disease occurred when \(v_{k}(\cdot ) \rightarrow 1\) whilst diffuse disease was present when \(v_{k}(\cdot ) \rightarrow 0\). The deviation of diffuse and dense regions in the lung defined the heterogeneity/homogeneity of disease spread.

A distribution \(f(v(\cdot ))\) for each feature was built by sampling K regions. The shape of the distribution is governed by the two disease properties (Fig. 1a). It provides information on the nature of local disease spread (diffuse or dense) and whether it is homogeneous or heterogeneous.
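One way to realise the sampling described above is a sliding cubic window over the label volume. The sketch below makes several assumptions that the text does not specify: ROI size and grid spacing are given in voxels (isotropic voxels assumed), the fraction is taken relative to the lung voxels inside each ROI, and ROIs containing no lung voxels are skipped. Function names are hypothetical.

```python
import numpy as np

def local_fractions(labels, target, mask, roi=20, step=5):
    """Fraction v_k of `target`-labelled voxels among lung voxels in each ROI."""
    fractions = []
    nx, ny, nz = labels.shape
    for x in range(0, nx - roi + 1, step):
        for y in range(0, ny - roi + 1, step):
            for z in range(0, nz - roi + 1, step):
                window = (slice(x, x + roi), slice(y, y + roi), slice(z, z + roi))
                n_lung = mask[window].sum()
                if n_lung == 0:
                    continue  # ROI lies entirely outside the lung (assumption)
                hits = np.logical_and(labels[window] == target, mask[window]).sum()
                fractions.append(hits / n_lung)
    return np.asarray(fractions)

def to_histogram(values, n_bins=60, value_range=(0.0, 1.0)):
    """Quantise sampled values into a histogram normalised to unit mass."""
    counts, _ = np.histogram(values, bins=n_bins, range=value_range)
    return counts / counts.sum()
```

Applying the same window to the Jacobian map and taking the mean instead of a fraction would give the deformation measurements \(\mu (J)_{k}\).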

Expansion of the lung is dependent on local biomechanical properties (emphysema) and airway resistance (functional small airways disease), which will affect lung deformation locally. To capture volume change on a local basis, the Jacobian map (J) was sampled by calculating the mean Jacobian (\(\mu (J)_{k}\)) for all \(\varOmega _{k}\). A distribution \(f(\mu (J))\) of these measurements was built to capture local volume change throughout the lung using the same process as above (Fig. 1b).

Fig. 1. Local disease and deformation distributions.

2.3 Manifold Learning of COPD Distributions

We hypothesised that the heterogeneity of COPD could be modelled by the local disease and deformation distributions. Manifold learning can be used to capture variability in the distributions and learn separate embeddings for emphysema, fSAD and lung deformation. Fusion of these embeddings can then be performed to create various models of COPD.

Distribution Distance. Inter-patient differences are computed using the Earth Mover's Distance (\(\mathcal {L}_{EMD}\)) [11]. It is a cross-bin distance metric, which measures the minimum amount of work needed to transform one distribution into another. The distributions are quantised into separate histograms \(h_{v(emph)}\), \(h_{v(fSAD)}\) and \(h_{J}\) using \(N_{b}\) bins. They are normalised to sum to 1 such that they have equal mass. A closed-form solution of the \(\mathcal {L}_{EMD}\) can be used for one-dimensional distributions with equal mass and bins [7]. It reduces to the \(\mathcal {L}_{1}\)-norm between the cumulative distributions (H) of two histograms \(h_{1,(\cdot )}\) and \(h_{2,(\cdot )}\): \(\mathcal {L}_{EMD}\left( h_{1,(\cdot )},h_{2,(\cdot )}\right) = \sum ^{N_{b}}_{n=1} |H_{n,1,(\cdot )}-H_{n,2,(\cdot )}|\).
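The closed-form distance is short enough to state directly in code. The sketch below assumes histograms defined on identical bins and expresses the distance in units of bin widths; the function names are illustrative.

```python
import numpy as np

def emd_1d(h1, h2):
    """Closed-form 1D Earth Mover's Distance: L1 norm between cumulative histograms."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    h1 = h1 / h1.sum()  # normalise to equal (unit) mass
    h2 = h2 / h2.sum()
    return np.abs(np.cumsum(h1) - np.cumsum(h2)).sum()

def pairwise_emd(histograms):
    """Pairwise distance matrix M for a (P, N_b) array of histograms."""
    h = np.asarray(histograms, dtype=float)
    h = h / h.sum(axis=1, keepdims=True)
    H = np.cumsum(h, axis=1)
    return np.abs(H[:, None, :] - H[None, :, :]).sum(axis=2)
```

Unlike a bin-by-bin comparison, moving all mass by two bins costs twice as much as moving it by one, which is the cross-bin behaviour described above.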

Manifold Learning and Fusion. Manifold learning is used to model emphysema, fSAD and Jacobian distributions. The aim is to capture variations in the distributions in a population of COPD patients. As emphysema and fSAD occur synchronously and both affect lung function, the manifold fusion framework of Aljabar et al. [1] is employed to create a single representation of these processes.

For P subjects, the PRM classified volumes are \(\mathcal {Z}_{1},\cdots ,\mathcal {Z}_{P}\) and their respective Jacobian determinant maps are \(J=J_{1},\cdots ,J_{P}\). The distributions are quantised using \(N_{b}\) bins into their respective histograms \(h_{p,v(emph)}\), \(h_{p,v(fSAD)}\) and \(h_{p,J}\). Pairwise measures in the population are obtained with the \(\mathcal {L}_{EMD}\) yielding the pairwise matrices \(\mathcal {M}^{emph}\), \(\mathcal {M}^{fSAD}\) and \(\mathcal {M}^{J}\). They can be visualised as connected graphs where each node represents a patient and the edge length is the \(\mathcal {L}_{EMD}\). Isomap [12] is applied to each matrix. A K-nearest neighbour search is first performed to create a sparse representation of \(\mathcal {M}^{(\cdot )}\) where edges are restricted to the K-nearest neighbourhood of each node. A full pairwise geodesic distance matrix \(D^{(\cdot )}\) is then estimated by analysis of the K-nearest graph of \(\mathcal {M}^{(\cdot )}\) using Dijkstra's shortest-path algorithm [3]. The low-dimensional embedding \(y^{(\cdot )}_{p},p=1,\cdots ,P\) is obtained by minimisation of

$$\begin{aligned} \min _{y^{(\cdot )}} \sum _{p,j} \left( D^{(\cdot )}_{p,j} - ||y^{(\cdot )}_{p}-y^{(\cdot )}_{j}|| \right) ^{2} \end{aligned} \tag{1}$$

using Multi-Dimensional Scaling. The coordinate embeddings for \(\mathcal {M}^{emph}\), \(\mathcal {M}^{fSAD}\) and \(\mathcal {M}^{J}\) are \(y^{e}\), \(y^{f}\) and \(y^{J}\) with dimensions \(d^{e}\), \(d^{f}\) and \(d^{J}\), which are selected as described in Sect. 3.1.
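The three Isomap stages (K-nearest graph, geodesic distances, scaling) can be sketched from a precomputed distance matrix as below. This is an illustrative implementation under stated assumptions, not the code used in the study: the neighbourhood graph is symmetrised, and the final step uses classical (eigendecomposition-based) MDS, which is the usual choice in Isomap but only approximates the stress objective of Eq. (1).

```python
import heapq
import numpy as np

def knn_graph(M, k):
    """Adjacency lists keeping only each node's k nearest neighbours (symmetrised)."""
    n = M.shape[0]
    adj = [dict() for _ in range(n)]
    for i in range(n):
        neighbours = [int(j) for j in np.argsort(M[i]) if j != i][:k]
        for j in neighbours:
            adj[i][j] = adj[j][i] = float(M[i, j])
    return adj

def dijkstra(adj, source):
    """Shortest-path distances from `source` to every node of the graph."""
    dist = [float("inf")] * len(adj)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def isomap(M, k, d):
    """Embed P items in d dimensions from a (P, P) distance matrix M."""
    adj = knn_graph(np.asarray(M, dtype=float), k)
    D = np.array([dijkstra(adj, s) for s in range(len(adj))])
    if not np.isfinite(D).all():
        raise ValueError("neighbourhood graph is disconnected; increase k")
    # Classical MDS: double-centre the squared geodesic distances.
    n = D.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * C @ (D ** 2) @ C
    w, V = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:d]
    return V[:, top] * np.sqrt(np.maximum(w[top], 0.0))
```

The embedding is defined only up to rotation, reflection and translation, so coordinates from separate runs should be compared through their pairwise distances.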

Fusion of the coordinates \(y^{(\cdot )}\) can be performed in any combination to investigate various processes. For simplicity, we consider all embeddings. The coordinates are uniformly scaled with the scale factors \(s^{e}\), \(s^{f}\) and \(s^{J}\) such that the first component of each embedding \(y^{(\cdot )}_{1}\) has unit variance. These are concatenated to yield \(Y = (s^{e}y^{e},s^{f}y^{f},s^{J}y^{J})\) with dimension \(d^{e} + d^{f} + d^{J}\). A distance matrix \(\mathcal {M}^{c}\) is obtained by calculating pairwise Euclidean distances of Y. Isomap is then applied to yield the combined coordinate embedding \(y^{c}\) with dimension \(d^{c}\).
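The scaling and concatenation step can be sketched as follows. The example assumes each embedding is a (P, d) array with subjects in the same row order and uses the population variance (`ddof=0`) for the scale factor; both are assumptions rather than details given in the text, and the function name is hypothetical.

```python
import numpy as np

def fuse_embeddings(*embeddings):
    """Scale each embedding uniformly, concatenate, and return (Y, M_c)."""
    scaled = []
    for y in embeddings:
        y = np.asarray(y, dtype=float)
        s = 1.0 / y[:, 0].std()  # one scale factor per embedding
        scaled.append(s * y)     # first component now has unit variance
    Y = np.hstack(scaled)
    diff = Y[:, None, :] - Y[None, :, :]
    M_c = np.sqrt((diff ** 2).sum(axis=2))  # pairwise Euclidean distances
    return Y, M_c
```

Because a single factor is applied to all components of an embedding, the relative importance of its components is preserved while the separate embeddings are brought to a comparable scale.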

3 Experiments

3.1 Data Processing

A total of 1,154 scans of COPD patients (GOLD \(\ge 1\)) were downloaded from COPDGene [10]. They were acquired on various scanners (GE Medical Systems, Siemens and Philips) with the following reconstruction algorithms: STANDARD (GE), AS+ B31f and B31f (Siemens), and 64 B (Philips). The Pulmonary Toolkit was used for lung segmentation. Breath-hold scans were registered with NiftyReg [9] with a modified version of the EMPIRE10 pipeline [8]. The transformation was a stationary velocity field parameterised by a cubic B-spline and the similarity measure was MIND [6]. The constraint term was the bending energy of the velocity field, weighted at 1\(\%\) for all stages of the pipeline. After manual inspection of the registrations, 743 patients were selected. Scans were rejected if there were major errors close to the fissures and the lung boundary.

The sampling size of the ROIs was \(r=20\) mm, consistent with the size of the secondary pulmonary lobule. Sampling was performed with a Cartesian grid of center voxels spaced every 5 mm. We chose a value of \(N_{b}=60\) as its effect on pairwise distances was minimal with increasing \(N_{b}\) when \(N_{b}>50\).

The dimensionality d of y and the parameter K for each embedding were determined by estimating the reconstruction quality of the lower-dimensional coordinates. The residual variance \(1 - \rho ^{2}_{\mathcal {M},y}\) between the distances in \(\mathcal {M}^{(\cdot )}\) and the pairwise distances of \(y^{(\cdot )}\) was considered. For each embedding step (\(y^{e}\), \(y^{f}\) and \(y^{J}\)), we determined the combination of K and d that minimised the residual variance. Grid-search parameters were set to \(d^{*} \in [1,5]\) and \(K^{*} \in [5,100]\). Final parameters were \(K=[50,30,45]\) and \(d=[5,5,4]\) for \(y^{e}\), \(y^{f}\) and \(y^{J}\). We considered a model of the disease distributions (\(y^{e}\), \(y^{f} \rightarrow y^{c_{1}}\)) and a model also including the deformation (\(y^{e}\), \(y^{f}\), \(y^{J} \rightarrow y^{c_{2}}\)). Parameters for both models were \(K_{c_{1}}=55\) and \(K_{c_{2}}=60\) with \(d_{c_{1}}=4\) and \(d_{c_{2}}=4\).
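The selection criterion can be sketched as a function that scores one candidate embedding; a grid search would evaluate it for each (K, d) pair and keep the minimum. The sketch follows the definition given above (correlation between the entries of \(\mathcal {M}^{(\cdot )}\) and the Euclidean distances of \(y^{(\cdot )}\)) and uses each unordered pair once; the function name is hypothetical.

```python
import numpy as np

def residual_variance(M, y):
    """1 - rho^2 between input distances M and distances in the embedding y."""
    M = np.asarray(M, dtype=float)
    y = np.asarray(y, dtype=float)
    iu = np.triu_indices(M.shape[0], k=1)  # each unordered pair once
    diff = y[:, None, :] - y[None, :, :]
    d_embed = np.sqrt((diff ** 2).sum(axis=2))
    rho = np.corrcoef(M[iu], d_embed[iu])[0, 1]
    return 1.0 - rho ** 2
```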

Table 1. Pearson correlation coefficient between the first three embedding coordinates and the distributions using the median (\(\varphi \)), median absolute deviation (\(\rho \)), skewness (\(\gamma _{1}\)), kurtosis (\(\gamma _{2}\)). [\({*}=p<0.05\), \({\dagger }=p<10^{-3}\)]

3.2 Associations with Disease Severity

Correlations between the embeddings and distribution moments were computed (Table 1). The first and second components of the embeddings had strong to moderate correlations with the distribution parameters, demonstrating that manifold learning of the distributions modelled the variation in the population.

We considered several models to predict COPD severity using FEV\(_{1}\%\)predicted and (Table 2). We considered three simple models (mean PRM\(_{emph}\), mean PRM\(_{fSAD}\) and mean Jacobian \(\mu (J)\)) and compared them to univariate and multivariate models of embedding coordinates (y). The univariate models (\(y^{(e,f)}_{1}\)) showed moderate improvement over the simple mean models. However, the combined models (\(y^{c_{1}}_{1}\) and \(y^{c_{2}}_{1}\)) improved model prediction. The multivariate models demonstrated best performance, with model 2 (\(y^{c_{2}}=y^{e} + y^{f} + y^{J}\)) performing best, even after adjusting for an increase in variables. It had a Bayesian Information Criterion (BIC) of 620 compared to 625 (\(y^{c_{1}}\)) and 633, 650 and 648 for PRM\(_{emph}\), PRM\(_{fSAD}\) and \(\mu (J)\) respectively. The increase in explanatory power was also seen when correlating the first component of the combined models (\(y^{c_{1,2}}_{1}\)) with FEV\(_{1}\%\)predicted. The first components of the combined models had Pearson coefficients of \(r=0.67,p<0.001\) and \(r=0.70,p<0.001\) respectively. Coefficients for the mean models were \(r=-0.63,p<0.001\), \(r=-0.50,p<0.001\) and \(r=0.52, p<0.001\) respectively. We also used manifold fusion to create a joint model between mean values of PRM\(_{emph}\) and PRM\(_{fSAD}\) and a second with PRM\(_{emph}\), PRM\(_{fSAD}\) and \(\mu (J)\). Pairwise mean differences were used to create \(\mathcal {M}^{(\cdot )}\). Correlation of the first component was \(r=0.60,p<0.001\) and \(r=-0.65,p<0.001\) respectively. This corroborated the utility of combining embeddings based on the local distributions (\(y^{c_{2}}_{1} \rightarrow r=0.70,p<0.001\)) (Fig. 2).
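For readers who wish to reproduce this kind of model comparison, the sketch below fits an ordinary least-squares model and computes a BIC. It assumes the common Gaussian-error form \(n \ln (\mathrm {RSS}/n) + k \ln n\); the text does not state which BIC definition or software was used, so absolute values may differ from those quoted by an additive constant. The function name is hypothetical.

```python
import numpy as np

def fit_and_bic(X, t):
    """Least-squares fit of t on X (with intercept) and the resulting BIC."""
    t = np.asarray(t, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(t), -1)
    X = np.column_stack([np.ones(len(t)), X])  # add intercept column
    beta, *_ = np.linalg.lstsq(X, t, rcond=None)
    rss = float(((t - X @ beta) ** 2).sum())
    n, k = X.shape
    bic = n * np.log(rss / n) + k * np.log(n)  # lower is better
    return beta, bic
```

Here `X` would hold either a mean measure or the embedding coordinates, and `t` the clinical measure; the \(k \ln n\) term penalises the additional coordinates of the multivariate models.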

Fig. 2. Projection of embeddings (a) \(y^{c_{1}}\) and (b) \(y^{c_{2}}\) with FEV\(_{1}\)%predicted overlaid.

Table 2. Regression of models versus various clinical measures of COPD severity. Model performance quoted as adjusted-\(r^{2}\). [\({\dagger }=p<10^{-3}\)]

3.3 Trajectories of Emphysema and fSAD Progression

It is likely that trajectories of disease progression in COPD vary depending on the dominant disease phenotype. We assessed whether we can model these in the tissue disease model (\(y^{c_{1}}\)). We parameterised \(y^{c_{1}}\) using the emphysema and fSAD distributions as covariates (l) with kernel regression: \(y^{c}(l(\cdot )) = \frac{1}{v} \sum _{i} K(l_{i}-l)y^{c}_{i}\) where K is a Gaussian kernel and v is a normalisation constant. The covariate was the \(\mathcal {L}_{EMD}\) between the distributions and an idealised healthy distribution (distribution peak at \(v=0\)). The outcome is two trajectories in the manifold space (Fig. 3a). The emphysema trajectory can be considered as the path taken when emphysema progression is dominant and vice-versa for fSAD. We classified patients based on these trajectories. A patient is seen to follow an emphysema progression trajectory if it is closest to \(y^{c}(l(emph))\). At the baseline, patients are classified as both emphysema and fSAD subtypes. When considering two sets of patients stratified by trajectory, the explanatory power of the embeddings improved in comparison to \(y^{c_{1}}\) (Table 2). The emphysema regression produced an adjusted-\(r^{2}\) of 0.52 and 0.63 when predicting FEV\(_{1}\%\)predicted and respectively whilst fSAD was 0.45 and 0.62.
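The kernel regression used to trace a trajectory can be sketched as a Nadaraya-Watson estimator. The bandwidth parameter and function names below are assumptions, since the text does not report the kernel width.

```python
import numpy as np

def kernel_regression(l_query, l_samples, y_samples, bandwidth):
    """Gaussian-kernel weighted mean of embedding coordinates at covariate l_query."""
    l_samples = np.asarray(l_samples, dtype=float)
    y_samples = np.asarray(y_samples, dtype=float)
    w = np.exp(-0.5 * ((l_samples - l_query) / bandwidth) ** 2)  # K(l_i - l)
    return (w[:, None] * y_samples).sum(axis=0) / w.sum()        # 1/v normalisation

def trajectory(l_grid, l_samples, y_samples, bandwidth):
    """Evaluate the regression along a grid of covariate values."""
    return np.array([kernel_regression(l, l_samples, y_samples, bandwidth)
                     for l in l_grid])
```

Running `trajectory` once with the emphysema covariate and once with the fSAD covariate gives two curves in the embedding space; a patient can then be assigned to whichever curve has the nearest point to their coordinates.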

Fig. 3. (a) Three-dimensional projection of \(y^{c_{1}}\) and (b) classified trajectories of \(y^{c_{1}}\).

4 Discussion and Conclusion

We have presented a method to parameterise distributions of various local features implicated in COPD progression. The disease distributions model local aspects of tissue destruction whilst modelling global properties of heterogeneity and homogeneity. The deformation distribution quantifies the local effect of disease on lung function. Patients exhibiting different mechanisms of tissue destruction can have identical global averages yet can display different disease distributions. These differences are likely to cause differences in local biomechanical properties, which are captured by the deformation distribution.

We have shown that models of the proposed distributions better predict COPD severity than conventional metrics (Table 2). We have shown that embeddings based on distribution dissimilarities have stronger correlations with FEV\(_{1}\%\)predicted than those learned from mean differences. Both these results suggest that the position of a patient in the manifold space of \(y^{c_{1}}\) or \(y^{c_{2}}\) is critical for assessing COPD. This was observed in the trajectory classification (Fig. 3). Determining the trajectory that a patient is following may help inform therapeutic decisions and improve our understanding of COPD progression.

Complexity of the modelling may be increased to model more specific information about lung pathophysiology. Separate manifolds can be produced on a lobar basis. This is likely to further increase the explanatory power of the models since inter-lobar disease metrics correlate with different aspects of physiology. The detection of regional differences in local deformation may add further important information regarding the pathophysiology of a patient.