Manifold Learning of COPD

Bragman, Felix J. S.; McClelland, Jamie R.; Jacob, Joseph; Hurst, John R.; Hawkes, David J.

doi:10.1007/978-3-319-66179-7_67

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10435)

Included in the following conference series:

International Conference on Medical Image Computing and Computer-Assisted Intervention

Abstract

Analysis of CT scans for studying Chronic Obstructive Pulmonary Disease (COPD) is generally limited to mean scores of disease extent. However, the evolution of local pulmonary damage may vary between patients with discordant effects on lung physiology. This limits the explanatory power of mean values in clinical studies. We present local disease and deformation distributions to address this limitation. The disease distribution aims to quantify two aspects of parenchymal damage: locally diffuse/dense disease and global homogeneity/heterogeneity. The deformation distribution links parenchymal damage to local volume change. These distributions are exploited to quantify inter-patient differences. We used manifold learning to model variations of these distributions in 743 patients from the COPDGene study. We applied manifold fusion to combine distinct aspects of COPD into a single model. We demonstrated the utility of the distributions by comparing associations between learned embeddings and measures of severity. We also illustrated the potential to identify trajectories of disease progression in a manifold space of COPD.

1 Introduction

Chronic Obstructive Pulmonary Disease (COPD) is a complex disorder arising from various pathological processes including emphysema and functional small airways disease (fSAD). The extent of emphysema and fSAD that make up overall disease burden can vary, which can affect lung physiology. Both disease processes can progress at different rates, complicating prognostication. Optimising the quantification of disease extent in COPD may improve the precision of disease staging and monitoring.

Analysis of lung disease from Computed Tomography (CT) has typically relied on global averages computed over the whole lung. Such metrics cannot capture the anatomical distribution of disease. Methods have been proposed to quantify the contribution of various emphysema subtypes [5] or the distribution of image features [2]. Harmouche et al. [5] built an emphysema manifold by analysis of classified emphysema subtypes. A Severity Index (S) was derived from this space that is complementary to the mean level of emphysema. In contrast, Bragman et al. [2] modelled local distributions of density and biomechanical features, exploiting them both to investigate differences between subtypes of COPD and to classify these subtypes.

2 Method

We present a new method to quantify the spread of parenchymal disease and measure its effect on lung deformation. It is based on locally quantifying tissue destruction and deformation to capture heterogeneity or homogeneity across the lung. The outcome is a distribution that quantifies various aspects of lung pathophysiology that can be modelled to test associations with various clinical hypotheses. The distributions can be exploited to quantify inter-patient differences in lung tissue pathology and deformation. A single model of tissue disease and deformation can be obtained by combining separate embeddings obtained from manifold learning with manifold fusion.

2.1 Lung Deformation and Tissue Classification

The deformation between paired breath-hold CT scans acquired at functional residual capacity ($\mathcal {I}_{exp}$, $\varOmega ^{*}$) and total lung capacity ($\mathcal {I}_{ins}$, $\varOmega $) can be obtained using nonrigid registration. The output is a transformation $\varphi $ mapping each coordinate $x \in \varOmega $ to $x^{*} \in \varOmega ^{*}$. Local volume change is characterised by the Jacobian determinant $J$, calculated on a voxel-wise basis: $J = \text {det}\left( \nabla _{x} \varphi \right) $.
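As an illustration of the voxel-wise Jacobian determinant, the following is a minimal sketch that assumes the transformation $\varphi $ has already been sampled on the voxel grid as an array of mapped coordinates. The function name, the array layout and the use of finite differences are assumptions made for this example; registration packages typically provide their own Jacobian computation.

```python
import numpy as np

def jacobian_determinant(phi, spacing=(1.0, 1.0, 1.0)):
    """Voxel-wise J = det(grad_x phi) for a transformation sampled on a grid.

    phi     : array of shape (3, X, Y, Z); phi[i] holds the i-th component
              of the mapped coordinate phi(x) for every voxel x (assumed layout).
    spacing : voxel size along each axis, in the same units as phi.
    """
    # grads[i][j] approximates d(phi_i)/d(x_j) by finite differences
    grads = [np.gradient(phi[i], *spacing) for i in range(3)]
    # Assemble an (X, Y, Z, 3, 3) field of Jacobian matrices
    jac = np.stack([np.stack(g, axis=-1) for g in grads], axis=-2)
    # Determinant of every 3x3 matrix in the field
    return np.linalg.det(jac)

# Usage: a uniform 10% stretch along every axis
axes = [np.arange(8.0)] * 3
grid = np.stack(np.meshgrid(*axes, indexing="ij"))
J = jacobian_determinant(1.1 * grid)
```

In this toy case every voxel has $J = 1.1^{3}$. Values below 1 would indicate local contraction under $\varphi $ and values above 1 local expansion.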

Parametric Response Mapping (PRM) [4] was used to classify voxels as emphysema (PRM$_{emph}$) and functional small airways disease (PRM$_{fSAD}$). For all voxels $x_{i} \in \mathcal {I}_{ins}$, the tissue class $z_{i}$ is based on Hounsfield Unit (HU) thresholds in $\mathcal {I}_{ins}$ and $\mathcal {I}_{exp}$. A voxel is classified as PRM$_{emph}$ if $\mathcal {I}_{ins}(x_{i}) \le -950$ and $\mathcal {I}_{exp}(\varphi (x_{i})) \le -856$. A voxel is classified as PRM$_{fSAD}$ if $\mathcal {I}_{ins}(x_{i}) > -950$ and $\mathcal {I}_{exp}(\varphi (x_{i})) \le -856$. The airways and vasculature are excluded by only considering voxels with an intensity between $-1024$ HU and $-500$ HU in both scans.
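The thresholding rules above can be sketched as follows. This is an illustrative example only: it assumes the expiratory scan has already been resampled onto the inspiratory grid so that both arrays are aligned voxel by voxel, and the integer label codes are hypothetical.

```python
import numpy as np

# Hypothetical label codes, chosen for this example only
OTHER, EMPH, FSAD, EXCLUDED = 0, 1, 2, -1

def prm_classify(ins_hu, exp_hu_warped):
    """Classify voxels from paired HU values.

    ins_hu        : I_ins(x) in HU
    exp_hu_warped : I_exp(phi(x)) in HU, resampled on the I_ins grid
    """
    ins_hu = np.asarray(ins_hu, dtype=float)
    exp_hu_warped = np.asarray(exp_hu_warped, dtype=float)
    labels = np.full(ins_hu.shape, OTHER, dtype=np.int8)

    low_exp = exp_hu_warped <= -856
    labels[(ins_hu <= -950) & low_exp] = EMPH   # PRM_emph
    labels[(ins_hu > -950) & low_exp] = FSAD    # PRM_fSAD

    # Keep only voxels within [-1024, -500] HU in both scans
    valid = ((ins_hu >= -1024) & (ins_hu <= -500) &
             (exp_hu_warped >= -1024) & (exp_hu_warped <= -500))
    labels[~valid] = EXCLUDED
    return labels

# Usage on four example voxels
labels = prm_classify([-980, -900, -900, -300], [-900, -900, -700, -200])
```

The four example voxels are labelled emphysema, fSAD, other and excluded, in that order.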

2.2 Local Disease and Deformation Distributions

We present the concept of local feature distributions (Fig. 1a and b). The aim is to quantify local abnormalities in lung physiology and pathology to define a signature unique to a patient's disease state. We introduce two models: (1) local disease distributions and (2) local deformation distributions. The disease distributions model the spread of emphysema and fSAD whilst the deformation distribution characterises local volume change across the lung. They are created by locally sampling regions of $\mathcal {Z}$ (the PRM-classified volume) and $J$ in a Cartesian grid using local regions of interest $\varOmega _{k}$ (ROI), where $k=1,\ldots ,K$ indexes the center voxel of the ROI. The size ($r \times r \times r$) of the ROI governs the scale of the sampling.

We modelled two properties of disease spread: (1) locally diffuse/dense disease and (2) global homogeneity/heterogeneity. For each ROI centered at $z_{k}$ where $z \in \varOmega _{k}$, we computed the fraction of $\text {PRM}_{emph}$ and $\text {PRM}_{fSAD}$ voxels, defined as $v_{k}(emph)$ and $v_{k}(fSAD)$. Dense disease occurred when $v_{k}(\cdot ) \rightarrow 1$ whilst diffuse disease was present when $v_{k}(\cdot ) \rightarrow 0$. The deviation of diffuse and dense regions in the lung defined the heterogeneity/homogeneity of disease spread.

A distribution $f(v(\cdot ))$ for each feature was built by sampling K regions. The shape of the distribution is governed by the two disease properties (Fig. 1a). It provides information on the nature of local disease spread (diffuse or dense) and whether it is homogeneous or heterogeneous.

Expansion of the lung is dependent on local biomechanical properties (emphysema) and airway resistance (functional small airways disease), which will affect lung deformation locally. To capture volume change on a local basis, the Jacobian map (J) was sampled by calculating the mean Jacobian ($\mu (J)_{k}$) for all $\varOmega _{k}$. A distribution $f(\mu (J))$ of these measurements was built to capture local volume change throughout the lung using the same process as above (Fig. 1b).
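The local sampling described in this subsection can be sketched with a single helper, since the fraction $v_{k}(\cdot )$ is the ROI mean of a binary class mask and $\mu (J)_{k}$ is the ROI mean of the Jacobian map. The sketch below makes several assumptions not stated in the text: ROI size and grid spacing are given in voxels rather than millimetres, only ROIs whose center voxel lies inside the lung mask are kept, and the mean is taken over lung voxels only.

```python
import numpy as np

def local_roi_means(feature, lung_mask, r, step):
    """Mean of `feature` over lung voxels in cubic ROIs of side ~r voxels,
    centred on a Cartesian grid with spacing `step` voxels."""
    half = r // 2
    X, Y, Z = feature.shape
    values = []
    for cx in range(0, X, step):
        for cy in range(0, Y, step):
            for cz in range(0, Z, step):
                if not lung_mask[cx, cy, cz]:
                    continue  # assumption: centre voxel must be in the lung
                roi = (slice(max(cx - half, 0), cx + half + 1),
                       slice(max(cy - half, 0), cy + half + 1),
                       slice(max(cz - half, 0), cz + half + 1))
                inside = lung_mask[roi]
                values.append(feature[roi][inside].mean())
    return np.array(values)

# Usage on a toy volume whose first half is fully "diseased"
disease = np.zeros((10, 10, 10))
disease[:5] = 1.0
mask = np.ones((10, 10, 10), dtype=bool)
v = local_roi_means(disease, mask, r=3, step=5)

# Quantise the K samples into a histogram that sums to 1
hist, _ = np.histogram(v, bins=10, range=(0.0, 1.0))
hist = hist / hist.sum()
```

Passing the Jacobian map as `feature` would give the samples $\mu (J)_{k}$ in the same way, with a histogram range chosen to suit the observed values.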

2.3 Manifold Learning of COPD Distributions

We hypothesised that the heterogeneity of COPD could be modelled by the local disease and deformation distributions. Manifold learning can be used to capture variability in the distributions and learn separate embeddings for emphysema, fSAD and lung deformation. Fusion of these embeddings can then be performed to create various models of COPD.

Distribution Distance. Inter-patient differences are computed using the Earth Mover's Distance ($\mathcal {L}_{EMD}$) [11]. It is a cross-bin distance metric that measures the minimum amount of work needed to transform one distribution into another. The distributions are quantised into separate histograms $h_{v(emph)}$, $h_{v(fSAD)}$ and $h_{J}$ using $N_{b}$ bins. They are normalised to sum to 1 such that they have equal mass. A closed-form solution of the $\mathcal {L}_{EMD}$ can be used for one-dimensional distributions with equal mass and bins [7]. It reduces to the $\mathcal {L}_{1}$-norm between the cumulative distributions (H) of two histograms $h_{1,(\cdot )}$ and $h_{2,(\cdot )}$: $\mathcal {L}_{EMD}\left( h_{1,(\cdot )},h_{2,(\cdot )}\right) = \sum _{n=1}^{N_{b}} |H_{n,1,(\cdot )}-H_{n,2,(\cdot )}|$.
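The closed-form distance is short enough to write out directly. The following sketch assumes both histograms are defined on identical bins; the function name is hypothetical and the result is expressed in units of bin widths.

```python
from itertools import accumulate

def emd_1d(h1, h2):
    """Closed-form EMD between two 1D histograms on identical bins:
    the L1 distance between their cumulative distributions."""
    if len(h1) != len(h2):
        raise ValueError("histograms must share the same bins")
    s1, s2 = float(sum(h1)), float(sum(h2))
    # Normalise each histogram to unit mass, then accumulate
    H1 = accumulate(x / s1 for x in h1)
    H2 = accumulate(x / s2 for x in h2)
    return sum(abs(a - b) for a, b in zip(H1, H2))

# Usage: all mass moved from the first bin to the last of three bins
d = emd_1d([1, 0, 0], [0, 0, 1])
```

Here `d` equals 2, reflecting that the whole mass is moved across two bins. Unlike a bin-by-bin comparison, the distance grows with how far the mass has to travel.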

Manifold Learning and Fusion. Manifold learning is used to model emphysema, fSAD and Jacobian distributions. The aim is to capture variations in the distributions in a population of COPD patients. As emphysema and fSAD occur synchronously and both affect lung function, the manifold fusion framework of Aljabar et al. [1] is employed to create a single representation of these processes.

For P subjects, the PRM classified volumes are $\mathcal {Z}_{1},\ldots ,\mathcal {Z}_{P}$ and their respective Jacobian determinant maps are $J_{1},\ldots ,J_{P}$. The distributions are quantised using $N_{b}$ bins into their respective histograms $h_{p,v(emph)}$, $h_{p,v(fSAD)}$ and $h_{p,J}$. Pairwise measures in the population are obtained with the $\mathcal {L}_{EMD}$, yielding the pairwise matrices $\mathcal {M}^{emph}$, $\mathcal {M}^{fSAD}$ and $\mathcal {M}^{J}$. They can be visualised as connected graphs where each node represents a patient and the edge length is the $\mathcal {L}_{EMD}$. Isomap [12] is applied to each matrix. A K-nearest neighbour search is first performed to create a sparse representation of $\mathcal {M}^{(\cdot )}$ where edges are restricted to the K-nearest neighbourhood of each node. A full pairwise geodesic distance matrix $D^{(\cdot )}$ is then estimated by analysis of the K-nearest graph of $\mathcal {M}^{(\cdot )}$ using Dijkstra's shortest-path algorithm [3]. The low-dimensional embedding $y^{(\cdot )}_{p},p=1,\ldots ,P$ is obtained by minimisation of

$$\min \sum _{p,j} \left( D^{(\cdot )}_{p,j} - \Vert y^{(\cdot )}_{p}-y^{(\cdot )}_{j}\Vert \right) ^{2} \tag{1}$$

using Multi-Dimensional Scaling. The coordinate embeddings for $\mathcal {M}^{emph}$, $\mathcal {M}^{fSAD}$ and $\mathcal {M}^{J}$ are $y^{e}$, $y^{f}$ and $y^{J}$, with respective dimensions $d^{e}$, $d^{f}$ and $d^{J}$ selected by minimising the residual variance (Section 3.1).
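The three Isomap steps (K-nearest neighbour graph, shortest-path geodesics, Multi-Dimensional Scaling) can be sketched from a precomputed distance matrix as below. This is a minimal sketch and not the implementation used in this work: function names are hypothetical, the neighbourhood graph is symmetrised, and classical (eigendecomposition-based) MDS is used as the solver, as in the original Isomap algorithm, which does not in general minimise Eq. (1) exactly.

```python
import heapq
import numpy as np

def knn_graph(M, k):
    """Adjacency dicts keeping each node's k nearest neighbours (symmetrised)."""
    n = M.shape[0]
    adj = [dict() for _ in range(n)]
    for p in range(n):
        neighbours = [int(j) for j in np.argsort(M[p]) if j != p][:k]
        for j in neighbours:
            adj[p][j] = adj[j][p] = float(M[p, j])
    return adj

def dijkstra(adj, source):
    """Shortest-path distances from `source` to every node."""
    dist = [float("inf")] * len(adj)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def isomap(M, k, d):
    """Embed a pairwise distance matrix M into d dimensions."""
    adj = knn_graph(M, k)
    D = np.array([dijkstra(adj, s) for s in range(len(adj))])
    if not np.isfinite(D).all():
        raise ValueError("k-NN graph is disconnected; increase k")
    # Classical MDS: double-centre the squared geodesic distances
    n = D.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * C @ (D ** 2) @ C
    evals, evecs = np.linalg.eigh(B)
    top = np.argsort(evals)[::-1][:d]
    return evecs[:, top] * np.sqrt(np.maximum(evals[top], 0.0))

# Usage: five "patients" whose distances lie on a line
x = np.arange(5.0)
M = np.abs(x[:, None] - x[None, :])
y = isomap(M, k=2, d=1)
```

In this toy case the one-dimensional embedding reproduces the input distances. For real data, an established implementation with a precomputed-distance option would usually be preferable to this sketch.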

Fusion of the coordinates $y^{(\cdot )}$ can be performed in any combination to investigate various processes. For simplicity, we consider all embeddings. The coordinates are uniformly scaled with the scale factors $s^{e}$, $s^{f}$ and $s^{J}$ such that the first component of each embedding $y^{(\cdot )}_{1}$ has a unit variance. These are concatenated to yield $Y = (s^{e}y^{e},s^{f}y^{f},s^{J}y^{J})$ with dimension $d^{e} + d^{f} + d^{J}$. A distance matrix $\mathcal {M}^{c}$ is obtained by calculating pairwise Euclidean distances of Y. Isomap is then applied to yield the combined coordinate embedding $y^{c}$ with dimension $d^{c}$.
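The scaling and concatenation step can be sketched as follows. The function name is hypothetical, and the use of the population standard deviation for the scale factor is an assumption; the text only requires that the first component of each scaled embedding has unit variance.

```python
import numpy as np

def fuse_embeddings(embeddings):
    """Scale each (P, d) embedding uniformly so that its first coordinate
    has unit variance, concatenate, and return (Y, M_c)."""
    scaled = []
    for y in embeddings:
        y = np.asarray(y, dtype=float)
        s = 1.0 / np.std(y[:, 0])   # one scale factor per embedding
        scaled.append(s * y)
    Y = np.hstack(scaled)           # dimension d_e + d_f + d_J
    # Pairwise Euclidean distances between the rows of Y
    diff = Y[:, None, :] - Y[None, :, :]
    M_c = np.sqrt((diff ** 2).sum(axis=-1))
    return Y, M_c

# Usage with two toy embeddings of three subjects
y_e = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
y_f = [[0.0], [10.0], [20.0]]
Y, M_c = fuse_embeddings([y_e, y_f])
```

Because each embedding is scaled by a single factor, its internal geometry is preserved while no single embedding dominates the concatenated distances purely because of its units. The matrix `M_c` would then be passed to Isomap to obtain $y^{c}$.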

3 Experiments

3.1 Data Processing

A total of 1,154 scans of COPD patients (GOLD $\ge 1$) were downloaded from COPDGene [10]. They were acquired on various scanners (GE Medical Systems, Siemens and Philips) with the following reconstruction algorithms: STANDARD (GE), AS+ B31f and B31f (Siemens), and 64 B (Philips). The Pulmonary Toolkit was used for lung segmentation. Breath-hold scans were registered with NiftyReg [9] using a modified version of the EMPIRE10 pipeline [8]. The transformation was a stationary velocity field parameterised by a cubic B-spline and the similarity measure was MIND [6]. The constraint term was the bending energy of the velocity field, weighted at 1$\%$ for all stages of the pipeline. After manual inspection of the registrations, 743 patients were selected. Scans were rejected if there were major registration errors close to the fissures and the lung boundary.

The sampling size of the ROIs was $r=20$ mm, consistent with the size of the secondary pulmonary lobule. Sampling was performed with a Cartesian grid of center voxels spaced every 5 mm. We chose $N_{b}=60$ because increasing $N_{b}$ beyond 50 had a minimal effect on the pairwise distances.

The dimensionality d of y and the parameter K for each embedding were determined by estimating the reconstruction quality of the lower-dimensional coordinates. The residual variance $1 - \rho ^{2}_{\mathcal {M},y}$ between the distances in $\mathcal {M}^{(\cdot )}$ and the pairwise distances of $y^{(\cdot )}$ was considered. For each embedding step ($y^{e}$, $y^{f}$ and $y^{J}$), we determined the combination of K and d that minimised the residual variance. Grid-search parameters were set to $d^{*} \in [1,5]$ and $K^{*} \in [5,100]$. Final parameters were $K=[50,30,45]$ and $d=[5,5,4]$ for $y^{e}$, $y^{f}$ and $y^{J}$. We considered a model of the disease distributions ($y^{e}$, $y^{f} \rightarrow y^{c_{1}}$) and a model also including the deformation ($y^{e}$, $y^{f}$, $y^{J} \rightarrow y^{c_{2}}$). Parameters for both models were $K_{c_{1}}=55$ and $K_{c_{2}}=60$ with $d_{c_{1}}=4$ and $d_{c_{2}}=4$.

Table 1. Pearson correlation coefficient between the first three embedding coordinates and the distributions using the median ($\varphi $), median absolute deviation ($\rho $), skewness ($\gamma _{1}$), kurtosis ($\gamma _{2}$). [${*}=p<0.05$, ${\dagger }=p<10^{-3}$]

3.2 Associations with Disease Severity

Correlations between the embeddings and distribution moments were computed (Table 1). The first and second components of the embeddings had strong to moderate correlations with the distribution parameters, demonstrating that manifold learning of the distributions modelled the variation in the population.

We considered several models to predict COPD severity using FEV$_{1}\%$predicted and (Table 2). We considered three simple models (mean PRM$_{emph}$, mean PRM$_{fSAD}$ and mean Jacobian $\mu (J)$) and compared them to univariate and multivariate models of embedding coordinates (y). The univariate models ($y^{(e,f)}_{1}$) showed moderate improvement over the simple mean models. However, the combined models ($y^{c_{1}}_{1}$ and $y^{c_{2}}_{1}$) improved model prediction. The multivariate models demonstrated best performance, with model 2 ($y^{c_{2}}=y^{e} + y^{f} + y^{J}$) performing best, even after adjusting for an increase in variables. It had a Bayesian Information Criterion (BIC) of 620 compared to 625 ($y^{c_{1}}$) and 633, 650 and 648 for PRM$_{emph}$, PRM$_{fSAD}$ and $\mu (J)$ respectively. The increase in explanatory power was also seen when correlating the first component of the combined models ($y^{c_{1,2}}_{1}$) with FEV$_{1}\%$predicted. The first components of the combined models had Pearson coefficients of $r=0.67,p<0.001$ and $r=0.70,p<0.001$ respectively. Coefficients for the mean models were $r=-0.63,p<0.001$, $r=-0.50,p<0.001$ and $r=0.52, p<0.001$ respectively. We also used manifold fusion to create a joint model between mean values of PRM$_{emph}$ and PRM$_{fSAD}$ and a second with PRM$_{emph}$, PRM$_{fSAD}$ and $\mu (J)$. Pairwise mean differences were used to create $\mathcal {M}^{(\cdot )}$. Correlation of the first component was $r=0.60,p<0.001$ and $r=-0.65,p<0.001$ respectively. This corroborated the utility of combining embeddings based on the local distributions ($y^{c_{2}}_{1} \rightarrow r=0.70,p<0.001$) (Fig. 2).

Table 2. Regression of models versus various clinical measures of COPD severity. Model performance quoted as adjusted-$r^{2}$. [${\dagger }=p<10^{-3}$]

3.3 Trajectories of Emphysema and fSAD Progression

It is likely that trajectories of disease progression in COPD vary depending on the dominant disease phenotype. We assessed whether we can model these in the tissue disease model ($y^{c_{1}}$). We parameterised $y^{c_{1}}$ using the emphysema and fSAD distributions as covariates (l) with kernel regression: $y^{c}(l(\cdot )) = \frac{1}{v} \sum _{i} K(l_{i}-l)y^{c}_{i}$ where K is a Gaussian kernel and v is a normalisation constant. The covariate was the $\mathcal {L}_{EMD}$ between the distributions and an idealised healthy distribution (distribution peak at $v=0$). The outcome is two trajectories in the manifold space (Fig. 3a). The emphysema trajectory can be considered as the path taken when emphysema progression is dominant and vice-versa for fSAD. We classified patients based on these trajectories. A patient is seen to follow an emphysema progression trajectory if it is closest to $y^{c}(l(emph))$. At the baseline, patients are classified as both emphysema and fSAD subtypes. When considering two sets of patients stratified by trajectory, the explanatory power of the embeddings improved in comparison to $y^{c_{1}}$ (Table 2). The emphysema regression produced an adjusted-$r^{2}$ of 0.52 and 0.63 when predicting FEV$_{1}\%$predicted and respectively whilst fSAD was 0.45 and 0.62.
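The kernel regression used to trace a trajectory can be sketched as a Nadaraya-Watson estimator. The function name and the bandwidth parameter are assumptions for this example; the text does not specify how the width of the Gaussian kernel was chosen.

```python
import numpy as np

def kernel_regression(l_query, l, y, bandwidth):
    """Estimate the embedding coordinates at covariate value l_query.

    l : (P,) covariate per patient, e.g. the EMD to an idealised
        healthy distribution
    y : (P, d) embedding coordinates y^c_i
    """
    l = np.asarray(l, dtype=float)
    y = np.asarray(y, dtype=float)
    # Gaussian kernel weights K(l_i - l)
    w = np.exp(-0.5 * ((l - l_query) / bandwidth) ** 2)
    # Weighted mean; w.sum() plays the role of the constant v
    return (w[:, None] * y).sum(axis=0) / w.sum()

# Usage: three patients, trajectory evaluated at five covariate values
l = [0.0, 1.0, 2.0]
y = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
trajectory = np.array([kernel_regression(q, l, y, bandwidth=0.5)
                       for q in np.linspace(0.0, 2.0, 5)])
```

Repeating this with the emphysema and fSAD covariates would give two curves in the manifold space, and a patient could then be assigned to whichever curve lies closest to their embedding coordinates.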

4 Discussion and Conclusion

We have presented a method to parameterise distributions of various local features implicated in COPD progression. The disease distributions model local aspects of tissue destruction whilst modelling global properties of heterogeneity and homogeneity. The deformation distribution quantifies the local effect of disease on lung function. Patients exhibiting different mechanisms of tissue destruction can have identical global averages yet can display different disease distributions. These differences are likely to cause differences in local biomechanical properties, which are captured by the deformation distribution.

We have shown that models of the proposed distributions better predict COPD severity than conventional metrics (Table 2). We have shown that embeddings based on distribution dissimilarities have stronger correlations with FEV$_{1}\%$predicted than those learned from mean differences. Both these results suggest that the position of a patient in the manifold space of $y^{c_{1}}$ or $y^{c_{2}}$ is critical for assessing COPD. This was observed in the trajectory classification (Fig. 3). Determining the trajectory that a patient is following may help inform therapeutic decisions and improve our understanding of COPD progression.

Complexity of the modelling may be increased to model more specific information about lung pathophysiology. Separate manifolds can be produced on a lobar basis. This is likely to further increase the explanatory power of the models since inter-lobar disease metrics correlate with different aspects of physiology. The detection of regional differences in local deformation may add further important information regarding the pathophysiology of a patient.

References

1. Aljabar, P., Wolz, R., Srinivasan, L., Counsell, S.J., Rutherford, M.A., Edwards, A.D., Hajnal, J.V., Rueckert, D.: A combined manifold learning analysis of shape and appearance to characterize neonatal brain development. IEEE Trans. Med. Imaging 30(12), 2072–2086 (2011)
2. Bragman, F.J.S., McClelland, J.R., Modat, M., Ourselin, S., Hurst, J.R., Hawkes, D.J.: Multi-scale analysis of imaging features and its use in the study of COPD exacerbation susceptible phenotypes. In: Golland, P., Hata, N., Barillot, C., Hornegger, J., Howe, R. (eds.) MICCAI 2014. LNCS, vol. 8675, pp. 417–424. Springer, Cham (2014). doi:10.1007/978-3-319-10443-0_53
3. Dijkstra, E.W.: A note on two problems in connexion with graphs. Numerische Mathematik 1(1), 269–271 (1959)
4. Galbán, C.J., Han, M.K., Boes, J.L., Chughtai, K.A., Charles, R., Johnson, T.D., Galbán, S., Rehemtulla, A., Kazerooni, E.A., Martinez, F.J., Ross, B.D.: CT-based biomarker provides unique signature for diagnosis of COPD phenotypes and disease progression. Nat. Med. 18(11), 1711–1715 (2012)
5. Harmouche, R., Ross, J.C., Diaz, A.A., Washko, G.R., Estepar, R.S.J.: A robust emphysema severity measure based on disease subtypes. Acad. Radiol. 23(4), 421–428 (2016)
6. Heinrich, M.P., Jenkinson, M., Bhushan, M., Matin, T., Gleeson, F.V., Brady, M., Schnabel, J.A.: MIND: modality independent neighbourhood descriptor for multi-modal deformable registration. Med. Image Anal. 16(7), 1423–1435 (2012)
7. Levina, E., Bickel, P.: The earth mover’s distance is the Mallows distance: some insights from statistics. Eighth IEEE Int. Conf. Comput. Vis. 2, 251–256 (2001)
8. Modat, M., McClelland, J., Ourselin, S.: Lung registration using the NiftyReg package. In: Medical Image Analysis for the Clinic: A Grand Challenge EMPIRE, vol. 10, pp. 33–42 (2010)
9. Modat, M., Ridgway, G.R., Taylor, Z.A., Lehmann, M., Barnes, J., Hawkes, D.J., Fox, N.C., Ourselin, S.: Fast free-form deformation using graphics processing units. Comput. Methods Programs Biomed. 98(3), 278–284 (2010)
10. Regan, E.A., Hokanson, J.E., Murphy, J.R., Make, B., Lynch, D.A., Beaty, T.H., Curran-Everett, D., Silverman, E.K., Crapo, J.D.: Genetic epidemiology of COPD (COPDGene) study design. COPD 7(1), 32–43 (2010)
11. Rubner, Y., Tomasi, C., Guibas, L.J.: The earth mover’s distance as a metric for image retrieval. Int. J. Comput. Vis. 40(2), 99–121 (2000)
12. Tenenbaum, J.B., de Silva, V., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Science 290(5500), 2319–2323 (2000)

Acknowledgements

This work was supported by the EPSRC under Grant EP/H046410/1 and EP/K502959/1, and the UCLH NIHR RCF Senior Investigator Award under Grant RCF107/DH/2014. It used data (phs000179.v3.p2) from the COPDGene study, supported by NIH Grant U01HL089856 and U01HL089897.

Author information

Authors and Affiliations

Centre for Medical Image Computing, University College London, London, UK
Felix J. S. Bragman, Jamie R. McClelland, Joseph Jacob & David J. Hawkes
UCL Respiratory, University College London, London, UK
John R. Hurst

Corresponding author

Correspondence to Felix J. S. Bragman.

Editor information

Editors and Affiliations

Université de Sherbrooke, Sherbrooke, QC, Canada
Maxime Descoteaux
DKFZ, Heidelberg, Germany
Lena Maier-Hein
Ulm University of Applied Sciences, Ulm, Germany
Alfred Franz
Université de Rennes 1, Rennes, France
Pierre Jannin
McGill University, Montreal, QC, Canada
D. Louis Collins
Université Laval, Québec, QC, Canada
Simon Duchesne

Cite this paper

Bragman, F.J.S., McClelland, J.R., Jacob, J., Hurst, J.R., Hawkes, D.J. (2017). Manifold Learning of COPD. In: Descoteaux, M., Maier-Hein, L., Franz, A., Jannin, P., Collins, D., Duchesne, S. (eds) Medical Image Computing and Computer Assisted Intervention − MICCAI 2017. Lecture Notes in Computer Science, vol 10435. Springer, Cham. https://doi.org/10.1007/978-3-319-66179-7_67

DOI: https://doi.org/10.1007/978-3-319-66179-7_67
Published: 04 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66178-0
Online ISBN: 978-3-319-66179-7
eBook Packages: Computer Science, Computer Science (R0)
