Abstract
Dynamic contrast-enhanced MRI (DCE-MRI) is an imaging protocol in which MRI scans are acquired repetitively throughout the injection of a contrast agent. The analysis of dynamic scans is widely used for the detection and quantification of blood-brain barrier (BBB) permeability. Extraction of the pharmacokinetic (PK) parameters from the DCE-MRI washout curves allows quantitative assessment of BBB functionality. Nevertheless, the curve fitting required for the analysis of DCE-MRI data is error-prone, as the dynamic scans are subject to non-white, spatially-dependent and anisotropic noise that does not fit standard noise models. The two existing approaches, curve smoothing and image de-noising, are both limited: the former produces smooth curves but cannot guarantee fidelity to the PK model, while the latter cannot accommodate the high variability of the noise statistics in time and space.
We present a novel framework based on Deep Neural Networks (DNNs) to address the DCE-MRI de-noising challenges. The key idea is an ensemble of expert DNNs, each trained for different noise characteristics and curve prototypes to solve an inverse problem on a specific subset of the input space. The most likely reconstruction is then chosen using a classifier DNN. As ground-truth (clean) signals for training are not available, a model for generating realistic training sets with complex nonlinear dynamics is presented. The proposed approach has been applied to DCE-MRI scans of stroke and brain tumor patients and is shown to compare favorably to state-of-the-art de-noising methods, without degrading the contrast of the original images.
Notes
- 1.
The appendix is available in the electronic version of the manuscript and at: https://drive.google.com/file/d/0B_vghaLYgXRKTnAwSU5oLUNDWmc/view?usp=sharing.
References
Abbott, N.J., Friedman, A.: Overview and introduction: the blood-brain barrier in health and disease. Epilepsia 53(s6), 1–6 (2012)
Bridle, J.S.: Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition. In: Soulié, F.F., Hérault, J. (eds.) Neurocomputing. NATO ASI Series, vol. 68, pp. 227–236. Springer, Heidelberg (1990)
Brix, G., Semmler, W., Port, R., Schad, L.R., Layer, G., Lorenz, W.J.: Pharmacokinetic parameters in CNS Gd-DTPA enhanced MR imaging. J. Comput. Assist. Tomogr. 15(4), 621–628 (1991)
Buades, A., Coll, B., Morel, J.M.: A non-local algorithm for image denoising. In: Computer Vision and Pattern Recognition, CVPR, vol. 2, pp. 60–65 (2005)
Dahl, G.E., Sainath, T.N., Hinton, G.E.: Improving deep neural networks for LVCSR using rectified linear units and dropout. In: ICASSP, pp. 8609–8613. IEEE (2013)
Gal, Y., et al.: Denoising of dynamic contrast-enhanced MR images using dynamic nonlocal means. IEEE Trans. Med. Imaging 29(2), 302–310 (2010)
Golkov, V., et al.: q-space deep learning for twelve-fold shorter and model-free diffusion MRI scans. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9349, pp. 37–44. Springer, Heidelberg (2015)
Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
Hinton, G., et al.: Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Sig. Process. Mag. 29(6), 82–97 (2012)
Hinton, G.E.: Training products of experts by minimizing contrastive divergence. Neural Comput. 14(8), 1771–1800 (2002)
Hinton, G.E., Osindero, S., Teh, Y.W.: A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2006)
Kimmel, R., Malladi, R., Sochen, N.: Images as embedded maps and minimal surfaces: movies, color, texture, and volumetric medical images. Int. J. Comput. Vis. 39(2), 111–129 (2000)
Martel, A.L.: A fast method of generating pharmacokinetic maps from dynamic contrast-enhanced images of the breast. In: Larsen, R., Nielsen, M., Sporring, J. (eds.) MICCAI 2006. LNCS, vol. 4191, pp. 101–108. Springer, Heidelberg (2006)
Murase, K.: Efficient method for calculating kinetic parameters using T1-weighted dynamic contrast-enhanced magnetic resonance imaging. Magn. Reson. Med. 51(4), 858–862 (2004)
Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning internal representations by error propagation. Technical report, DTIC Document (1985)
Schmid, V.J., et al.: A Bayesian hierarchical model for the analysis of a longitudinal dynamic contrast-enhanced MRI oncology study. Magn. Reson. Med. 61(1), 163–174 (2009)
Sourbron, S.P., Buckley, D.L.: Classic models for dynamic contrast-enhanced MRI. NMR Biomed. 26(8), 1004–1027 (2013)
Tofts, P.: Quantitative MRI of the Brain: Measuring Changes Caused by Disease. Wiley, Hoboken (2005)
Tofts, P.S.: Modeling tracer kinetics in dynamic Gd-DTPA MR imaging. J. Magn. Reson. Imaging 7(1), 91–101 (1997)
Tofts, P.S., et al.: Estimating kinetic parameters from dynamic contrast-enhanced T1-weighted MRI of a diffusable tracer: standardized quantities and symbols. J. Magn. Reson. Imaging 10(3), 223–232 (1999)
Veksler, R., Shelef, I., Friedman, A.: Blood-brain barrier imaging in human neuropathologies. Arch. Med. Res. 45(8), 646–652 (2014)
Vincent, P., et al.: Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11, 3371–3408 (2010)
Acknowledgments
This study was supported by the European Union’s Seventh Framework Program (FP7/2007–2013; grant agreement 602102, EPITARGET; A.F.), the Israel Science Foundation (A.F.) and the Binational Israel-USA Foundation (BSF; A.F.).
Appendices
A Artificial Neuron and Deep Neural Network
Neural Networks (NNs) are modeled as collections of computational units called artificial neurons (ANs) that are connected in an acyclic graph. An AN is a computational unit with multiple inputs, denoted by an augmented input vector \(\tilde{\mathbf {c}}_{o}=\left[ 1,\mathbf {c}_{o}^{T}\right] ^{T}\), and a single scalar output, \(y\in \mathbb {R}\), such that \(y=f\left( \mathbf {w}^{T}\tilde{\mathbf {c}}_{o}\right) \), where \(\mathbf {w}\) is the neuron's weight vector and \(f:\,\mathbb {R\rightarrow R}\) is called the activation function of the neuron.
Let \(\mathbf {w}_{i}^{l}=[w_{i,0}^{l},w_{i,1}^{l},\ldots ,w_{i,K_{l}}^{l}]^{T}\) denote the weights of the graph edges connecting the \(i\)-th AN in layer \(l+1\) to the \(K_{l}\) ANs in layer \(l\), \(l\in [0,L-1]\). Let us also denote by \(f_{l}(\cdot )\) the activation function of the ANs in layer \(l\). The output \(y_{i}^{l+1}\) of the \(i\)-th AN in layer \(l+1\) is calculated as follows:
$$y_{i}^{l+1}=f_{l+1}\left( (\mathbf {w}_{i}^{l})^{T}\mathbf {y}^{l}\right) ,$$
where \(\mathbf {y}^{l}=[1,y_{1}^{l},\ldots ,y_{K_{l}}^{l}]^{T}\). The DNN's input is the augmented vector \(\mathbf {y}^{0}=\tilde{\mathbf {c}}_{o}=\left[ 1,\mathbf {c}_{o}^{T}\right] ^{T}\).
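The per-layer computation above can be sketched in NumPy (a minimal illustration only, not the authors' implementation; the layer sizes and the tanh activation are arbitrary choices):

```python
import numpy as np

def layer_forward(W, y_prev, f):
    """Compute the outputs of layer l+1 from the augmented outputs y_prev of layer l.

    W      : (K_{l+1}, K_l + 1) weight matrix; column 0 holds the bias weights w_{i,0}.
    y_prev : (K_l + 1,) augmented vector [1, y_1, ..., y_{K_l}].
    f      : activation function, applied elementwise.
    """
    return f(W @ y_prev)

# Example: a 3-neuron layer fed by a 4-neuron layer (sizes are arbitrary).
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 5))                       # 4 inputs + 1 bias weight
y0 = np.concatenate(([1.0], rng.standard_normal(4)))  # augmented input vector
y1 = layer_forward(W, y0, np.tanh)
```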
B Expert DNN
We set \(L=10\) layers for each of our expert DNNs. The layers contain 100-175-150-120-80-30-80-120-150-175-100 neurons, respectively, where the input and output layers are of size \(l_{0}=l_{10}=100\). Pre-training is carried out in an aggregative manner: the weights \(\mathbf {W}_{0}^{l}\) of each layer \(l=1,\ldots ,5\) of a given DNN are pre-trained using a single RBM (denoted \(RBM^{l}\)) such that the hidden-unit realizations \(\mathbf {h}^{l-1}\) of \(RBM^{l-1}\) are used as the visible units \(\mathbf {v}^{l}\) of \(RBM^{l}\), i.e., \(\mathbf {h}^{l-1}=\mathbf {v}^{l}\) (see Fig. 8a). The weights of the DNN are initialized by setting \(\mathbf {W}^{l}=\mathbf {W}_{0}^{l}\) for the lower L / 2 layers (\(l=1,\ldots ,5\)) and \(\mathbf {W}^{l}=(\mathbf {W}_{0}^{L-l+1})^{T}\) for the upper layers (\(l=6,\ldots ,10\)) (see Fig. 8b). The final weights of all layers are then computed simultaneously using stochastic gradient descent (SGD), with a linear activation function in the output layer.
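The mirrored weight initialization described above can be sketched as follows (a schematic only; random matrices stand in for the RBM pre-trained weights \(\mathbf {W}_{0}^{l}\), and bias terms are omitted):

```python
import numpy as np

# Layer widths of the 10-layer expert DNN (input ... bottleneck ... output).
sizes = [100, 175, 150, 120, 80, 30, 80, 120, 150, 175, 100]
L = 10

rng = np.random.default_rng(1)
# Stand-ins for the RBM pre-trained weight matrices W0^l of the lower L/2 layers.
W0 = {l: rng.standard_normal((sizes[l], sizes[l - 1])) for l in range(1, 6)}

# Initialize the full DNN: the encoder half copies W0, the decoder half mirrors it.
W = {}
for l in range(1, 6):        # lower half: W^l = W0^l
    W[l] = W0[l]
for l in range(6, L + 1):    # upper half: W^l = (W0^{L-l+1})^T
    W[l] = W0[L - l + 1].T
```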
For the initialization of the input layer we assume that the \(\{v_{i}^{0}\}\) are random variables sampled from a normal distribution \(\mathcal {N}(a_{i},\sigma _{i})\), where \(a_{i}\) and \(\sigma _{i}\) are the mean and the standard deviation (respectively) associated with unit i, estimated from the training set. Therefore, it is trained as a Gaussian-Bernoulli RBM, with an energy function:
$$E(\mathbf {v},\mathbf {h})=\sum _{i}\frac{(v_{i}-a_{i})^{2}}{2\sigma _{i}^{2}}-\sum _{j}b_{j}h_{j}-\sum _{i,j}\frac{v_{i}}{\sigma _{i}}h_{j}w_{ij},$$
where \(b_{j}\) are the hidden-unit biases and \(w_{ij}\) are the visible-hidden weights.
The entire training set was scaled such that each entry of the input has zero mean and unit variance. The learning rate of the first \(RBM^{1}\) was set to 0.001 (0.01 for all the others), and pre-training proceeded for 300 epochs. In addition, we used more binary hidden units than the size of the input vector, since real-valued data carries more information than binary feature activations.
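The per-entry scaling of the training set can be written as (a minimal sketch; the matrix `X` of training curves, one curve per row, is a hypothetical stand-in):

```python
import numpy as np

def standardize(X):
    """Scale each entry (column) of the training matrix to zero mean and unit variance."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

rng = np.random.default_rng(2)
X = 5.0 + 3.0 * rng.standard_normal((1000, 100))  # 1000 curves, 100 time points
Xs, mu, sigma = standardize(X)
```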
C Classification DNN
The classification DNN contains an input layer, two hidden layers and an output layer; the layers have 120-180-180-24 neurons, respectively. The input of the DNN is defined as follows. Let \(g_{k}(\mathbf {c}_{o})\) be the hypothesis of the \(k\)-th DNN expert with respect to an observed input \(\mathbf {c}_{o}\), where \(k=1,\ldots ,24\). We define five measures that allow the evaluation of the experts' performances as follows: \(z_{1;k}=\left\| g_{k}(\mathbf {c}_{o})-\mathbf {c}_{o}\right\| _{1}\) and \(z_{2;k}=\left\| g_{k}(\mathbf {c}_{o})-\mathbf {c}_{o}\right\| _{2}\) are the \(L_{1}\) and \(L_{2}\) norms of the deviation from the original WoC; \(z_{3;k}=\frac{<g_{k}(\mathbf {c}_{o}),\mathbf {c}_{o}>}{\left\| g_{k}(\mathbf {c}_{o})\right\| _{2}\left\| \mathbf {c}_{o}\right\| _{2}}\) and \(z_{4;k}=\frac{cov(g_{k}(\mathbf {c}_{o}),\mathbf {c}_{o})}{\sqrt{var(g_{k}(\mathbf {c}_{o}))\,var(\mathbf {c}_{o})}}\) are the cosine similarity and the correlation between the reconstructed and input signals; and \(z_{5;k}=\left\| \nabla g_{k}(\mathbf {c}_{o})\right\| _{1}\) is the total variation of the hypothesis. The input feature vector is therefore the concatenation of the five measures over all 24 experts:
$$\mathbf {z}=[z_{1;1},\ldots ,z_{5;1},\ldots ,z_{1;24},\ldots ,z_{5;24}]^{T}\in \mathbb {R}^{120}.$$
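The five measures can be computed per expert as follows (a sketch assuming curves sampled on a uniform grid, with forward differences for the gradient; the example curves are synthetic stand-ins):

```python
import numpy as np

def expert_features(g, c):
    """Five performance measures comparing an expert's reconstruction g
    with the observed (noisy) curve c."""
    z1 = np.sum(np.abs(g - c))                                   # L1 deviation
    z2 = np.sqrt(np.sum((g - c) ** 2))                           # L2 deviation
    z3 = np.dot(g, c) / (np.linalg.norm(g) * np.linalg.norm(c))  # cosine similarity
    z4 = np.corrcoef(g, c)[0, 1]                                 # correlation
    z5 = np.sum(np.abs(np.diff(g)))                              # total variation
    return np.array([z1, z2, z3, z4, z5])

rng = np.random.default_rng(3)
c = np.linspace(0.0, 1.0, 100) + 0.05 * rng.standard_normal(100)  # noisy "WoC"
g = np.linspace(0.0, 1.0, 100)                                    # hypothetical reconstruction
z = expert_features(g, c)
```

Concatenating such a 5-vector over all 24 experts yields the 120-dimensional classifier input.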
A “softmax” activation function, which is commonly used for multi-class classification problems, was assigned to the neurons of the last layer. The softmax activation takes into account not only the entry value of a specific AN but also the entries to all the other ANs in that layer:
$$y_{i}=\frac{e^{\alpha _{i}}}{\sum _{j}e^{\alpha _{j}}},$$
where \(\alpha _{i}\) denotes the entry value at the \(i\)-th neuron.
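The softmax activation can be sketched as follows (a standard implementation; the max-subtraction is a common numerical-stability trick not mentioned in the text):

```python
import numpy as np

def softmax(alpha):
    """Softmax over the entry values alpha of the output-layer neurons.
    Subtracting the maximum leaves the result unchanged but avoids overflow."""
    e = np.exp(alpha - np.max(alpha))
    return e / e.sum()

y = softmax(np.array([2.0, 1.0, 0.1]))
```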
Seventy percent of each expert DNN’s training set was picked at random to create the training set of the classification DNN. For each training example, the feature vector \(\mathbf {z}\) was calculated and a label \(\mathbf {y}\) was assigned according to the origin of the example: a training example that originally belongs to the training set of the \(k\)-th expert is assigned the label \(\mathbf {y}=\mathbf {e}_{k}\), i.e., all coefficients are 0 except for the \(k\)-th, which is 1.
D The Beltrami Framework
In this section we briefly describe the Beltrami framework for de-noising grayscale videos and our extension adapting it to DCE-MRI scans. We consider a grayscale video to be a 3D Riemannian manifold embedded in a \(D=d+3\) dimensional space, where \(d=1\) for grayscale images. The embedding map \(Q:\,\Sigma \rightarrow M\) is given by:
$$Q(x,y,\tau )=\left( x,y,\tau ,I(x,y,\tau )\right) ,$$
where I is the image intensity map. Both \(\Sigma \) and M are Riemannian manifolds and hence are equipped with metrics G and H, respectively, which enable measurement of lengths over each manifold. We require the lengths as measured on each manifold to be the same, i.e.,
$$ds^{2}=\sum _{u,v}g_{uv}\,dx^{u}dx^{v}=\sum _{i,j}h_{ij}\,dQ^{i}dQ^{j},$$
where \(dI=I_{x}dx+I_{y}dy+I_{\tau }d\tau \), according to the chain rule. A natural choice for gray-level videos is a Euclidean space-feature manifold with the metric:
$$ds^{2}=dx^{2}+dy^{2}+d\tau ^{2}+\beta ^{2}dI^{2},$$
where \(\beta \) is the relative scale between the space coordinates and the intensity component. Using (2), the induced metric tensor \(G=\{g_{uv}\}\) is:
$$G=\begin{pmatrix}1+\beta ^{2}I_{x}^{2} &{} \beta ^{2}I_{x}I_{y} &{} \beta ^{2}I_{x}I_{\tau }\\ \beta ^{2}I_{x}I_{y} &{} 1+\beta ^{2}I_{y}^{2} &{} \beta ^{2}I_{y}I_{\tau }\\ \beta ^{2}I_{x}I_{\tau } &{} \beta ^{2}I_{y}I_{\tau } &{} 1+\beta ^{2}I_{\tau }^{2}\end{pmatrix}.$$
The Beltrami flow is obtained by minimizing the area of the image manifold:
$$S=\int \sqrt{g}\,dx\,dy\,d\tau ,$$
where \(g=\det \left( G\right) \). Using the methods of variational calculus with the resulting Euler-Lagrange relation, the minimization is given by:
$$\sqrt{g}\,I_{t}=div\left( \sqrt{g}\,G^{-1}\nabla I\right) .$$
Multiplying both sides by \(g^{-1/2}\) we get:
$$I_{t}=\frac{1}{\sqrt{g}}\,div\left( \sqrt{g}\,G^{-1}\nabla I\right) =\triangle _{g}I,$$
where \(\triangle _{g}\) is the Laplace-Beltrami operator. The discretized version of Eq. (10) allows us to traverse this scale space iteratively and yields a very effective technique for de-noising grayscale videos when the metric in (7) is used:
$$I^{n+1}=I^{n}+dt\,\frac{1}{\sqrt{g}}\,div(D),$$
where \(D=\sqrt{g}G^{-1}\nabla I\), \(div(D)=D_{x}+D_{y}+D_{\tau }\), and \(dt\propto \beta ^{-2}\). Note that the output depends on two hyper-parameters: the number of update iterations and the parameter \(\beta \).
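For a scalar intensity the Euclidean space-feature metric gives \(g=1+\beta ^{2}|\nabla I|^{2}\), and \(D=\sqrt{g}G^{-1}\nabla I\) simplifies to \(\nabla I/\sqrt{g}\). An explicit iteration of the flow can then be sketched in NumPy as follows (a toy illustration with unit grid spacing, not the authors' implementation):

```python
import numpy as np

def beltrami_step(I, beta, dt):
    """One explicit Beltrami-flow iteration on a 3D (x, y, tau) volume.
    Uses the simplification D = grad(I)/sqrt(g) with g = 1 + beta^2 |grad(I)|^2."""
    Ix, Iy, It = np.gradient(I)
    g = 1.0 + beta**2 * (Ix**2 + Iy**2 + It**2)
    sqrt_g = np.sqrt(g)
    Dx, Dy, Dt = Ix / sqrt_g, Iy / sqrt_g, It / sqrt_g
    div_D = (np.gradient(Dx, axis=0)
             + np.gradient(Dy, axis=1)
             + np.gradient(Dt, axis=2))
    return I + dt * div_D / sqrt_g

rng = np.random.default_rng(4)
clean = np.tile(np.linspace(0.0, 1.0, 32), (32, 32, 1))  # smooth toy "video"
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
denoised = noisy.copy()
for _ in range(20):
    denoised = beltrami_step(denoised, beta=1.0, dt=0.1)
```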
The above framework assumes similar physical measures of the x, y, and \(\tau \) coordinates. In reality, the space-domain coordinates x and y do not possess the same physical measure as the time-domain coordinate \(\tau \). Hence, we need to introduce another scaling factor, \(\gamma \), into the space-time-intensity metric:
$$ds^{2}=dx^{2}+dy^{2}+\gamma ^{2}d\tau ^{2}+\beta ^{2}dI^{2}.$$
The new induced metric tensor for the 3D image manifold is computed using the constraint in Eq. (5):
$$G=\begin{pmatrix}1+\beta ^{2}I_{x}^{2} &{} \beta ^{2}I_{x}I_{y} &{} \beta ^{2}I_{x}I_{\tau }\\ \beta ^{2}I_{x}I_{y} &{} 1+\beta ^{2}I_{y}^{2} &{} \beta ^{2}I_{y}I_{\tau }\\ \beta ^{2}I_{x}I_{\tau } &{} \beta ^{2}I_{y}I_{\tau } &{} \gamma ^{2}+\beta ^{2}I_{\tau }^{2}\end{pmatrix}.$$
In addition, we modified the numerical update step to fit the new scaling:
$$I^{n+1}=I^{n}+\frac{1}{\sqrt{g}}\left( dt_{1}\left( D_{x}+D_{y}\right) +dt_{2}\,D_{\tau }\right) ,$$
where \(dt_{1}\propto \beta ^{-2}\) and \(dt_{2}\propto \gamma ^{-2}\).
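Under the scaled metric \(ds^{2}=dx^{2}+dy^{2}+\gamma ^{2}d\tau ^{2}+\beta ^{2}dI^{2}\), the induced tensor at a voxel is a rank-one update of \(\mathrm{diag}(1,1,\gamma ^{2})\). A per-voxel sketch (our reading of the construction, not the authors' code):

```python
import numpy as np

def extended_metric(grad_I, beta, gamma):
    """Induced space-time-intensity metric G for the scaled embedding
    ds^2 = dx^2 + dy^2 + gamma^2 dtau^2 + beta^2 dI^2, at a single voxel.
    grad_I = (I_x, I_y, I_tau) is the intensity gradient there."""
    A = np.diag([1.0, 1.0, gamma**2])
    v = np.asarray(grad_I, dtype=float)
    return A + beta**2 * np.outer(v, v)

# Example at one voxel with arbitrary gradient and scale values.
G = extended_metric([0.5, -0.2, 1.0], beta=2.0, gamma=3.0)
```

By the matrix determinant lemma, \(\det G=\gamma ^{2}\left( 1+\beta ^{2}(I_{x}^{2}+I_{y}^{2}+I_{\tau }^{2}/\gamma ^{2})\right) \), which reduces to the isotropic case when \(\gamma =1\).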
E Results on Synthetic Data
The performance of our DNN-based de-noising method on synthetic data is evaluated using 10-fold cross-validation (10-CV). 200,000 noisy WoCs were generated using the Tofts model. The training data are randomly divided into ten groups (20,000 training examples per CV group), such that nine groups are used for training and the remaining group is used for testing. The experiment is performed independently for different signal-to-noise ratio (SNR) values. Fig. 9 demonstrates successful de-noising of a single representative WoC using our DNN-based method (red) and the moving-average (MA) method (green), along with the synthetic clean and noisy WoCs (blue and black, respectively). In Fig. 10 the mean MSE values and standard-deviation intervals are plotted for different SNR levels.
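The 10-CV partition can be sketched as follows (an illustration only; the random seed is arbitrary):

```python
import numpy as np

def ten_fold_splits(n, seed=0):
    """Partition n example indices into 10 disjoint CV groups at random."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, 10)

folds = ten_fold_splits(200_000)
# In each CV round, one fold (20,000 curves) is held out for testing and
# the remaining nine folds are used for training.
```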
F Run-time Comparison
We measured the run-time of the different algorithms on the 13 DCE-MRI scans. Table 1 presents the average run-time of each de-noising algorithm in minutes. The measured run-time excludes any pre-processing procedures and covers only the de-noising algorithms themselves. The algorithms were tested in MATLAB 2014b (64-bit) on an Intel(R) Core(TM) i7-4470, 3.4 GHz CPU with 16 GB RAM.
G Experimental Setup for Real Data
In the absence of ground-truth washout curves, in addition to visual assessment, we estimated the de-noising algorithms’ success using two measures: the fidelity of the output of the de-noising methods to the noisy data and to the PK model. Fig. 11 shows a block diagram of our performance-assessment method. Given a noisy DCE-MRI scan, we apply a de-noising algorithm. The fidelity of the cleaned curves to the noisy data is measured by the mean squared error (MSE) between the noisy curve and the de-noised curve. Then, we extract the PK parameters from the cleaned curves by applying the standard DCE-MRI curve-fitting algorithm. Next, we use the estimated PK parameters to generate synthetic washout curves according to the Tofts model. The MSE between the model-based synthetic curve and the de-noised curve measures the fidelity of the de-noised curve to the PK model.
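The two fidelity measures can be illustrated with a minimal Tofts-model sketch (the mono-exponential arterial input function and all parameter values below are hypothetical, for illustration only):

```python
import numpy as np

def tofts_curve(t, Ktrans, kep, cp):
    """Tissue concentration under the Tofts model:
    C_t(t) = Ktrans * integral_0^t cp(s) exp(-kep (t - s)) ds,
    evaluated with the trapezoidal rule on a uniform time grid."""
    dt = t[1] - t[0]
    Ct = np.zeros_like(t)
    for i in range(len(t)):
        integrand = cp[: i + 1] * np.exp(-kep * (t[i] - t[: i + 1]))
        if i > 0:  # the integral over a single point is zero
            Ct[i] = Ktrans * dt * (integrand[0] / 2
                                   + integrand[1:-1].sum()
                                   + integrand[-1] / 2)
    return Ct

t = np.linspace(0.0, 5.0, 100)            # time in minutes (hypothetical grid)
cp = 5.0 * np.exp(-1.0 * t)               # hypothetical mono-exponential AIF
clean = tofts_curve(t, Ktrans=0.2, kep=0.5, cp=cp)
rng = np.random.default_rng(5)
noisy = clean + 0.02 * rng.standard_normal(t.shape)

mse_to_noisy = np.mean((clean - noisy) ** 2)  # fidelity to the noisy data
# Fidelity to the PK model would compare a de-noised curve with the curve
# re-generated from its fitted parameters; here the "de-noised" curve is the
# clean model curve itself, so that second error would be zero.
```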
Copyright information
© 2016 Springer International Publishing AG
Cite this paper
Benou, A., Veksler, R., Friedman, A., Riklin Raviv, T. (2016). De-noising of Contrast-Enhanced MRI Sequences by an Ensemble of Expert Deep Neural Networks. In: Carneiro, G., et al. (eds.) Deep Learning and Data Labeling for Medical Applications. DLMIA 2016, LABELS 2016. Lecture Notes in Computer Science, vol 10008. Springer, Cham. https://doi.org/10.1007/978-3-319-46976-8_11
DOI: https://doi.org/10.1007/978-3-319-46976-8_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46975-1
Online ISBN: 978-3-319-46976-8
eBook Packages: Computer Science (R0)