1 Introduction

In stroke perfusion imaging [4], a series of 3D MR or CT images of the brain is acquired after injection of a contrast bolus. These images show the contrast agent – and hence the blood – flowing into and out of the brain. As such, each voxel contains a time series that shows the change in image intensity due to the contrast agent. This intensity change can be converted to the concentration of contrast agent. This imaging modality plays a crucial role in stroke diagnosis, as it allows perfusion parameters such as cerebral blood flow (CBF), blood volume (CBV) and arrival time to be measured in each voxel of the brain. These perfusion parameters make it possible to assess to what extent the brain is affected by the stroke, distinguishing the core of the infarct (dead tissue), the penumbra (tissue at risk) and the healthy tissue. The volumes of the core and penumbra are essential to decide on the treatment plan of an acute stroke patient [2].

The native images acquired in perfusion imaging, be it CT perfusion or MR perfusion, are not directly interpretable. Typically, a deconvolution analysis is performed, where the concentration time series measured in each cerebral voxel (the time concentration curve or TCC) is deconvolved with the so-called arterial input function (AIF), which is the concentration time series measured in one of the large feeding arteries of the brain. The deconvolved time series are no longer influenced by the contrast injection protocol or the cardiac status of the subject. They are impulse response functions (IRF), the signal that theoretically would be measured if the injection of contrast agent were a Dirac impulse directly into the feeding artery of the brain. It has been shown that, under reasonable assumptions, the maximum of this deconvolved time series is proportional to the cerebral blood flow (CBF). This parameter is strongly predictive of the health of the tissue and is currently used in clinical practice to identify the infarct core: voxels with a CBF less than 30% of that of the other side of the brain are considered to be core [1]. The time at which the deconvolved time series reaches its maximum is called Tmax, and increased values are indicative of tissue at risk. A threshold of 6 s is currently used clinically to determine the perfusion lesion (i.e. the combined core and penumbra) of the infarct [1].
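To make the role of these clinical thresholds concrete, the following is a minimal sketch of how they could be applied voxel-wise, assuming CBF and Tmax maps stored as NumPy arrays; the function and variable names are purely illustrative and not part of the present work.

```python
import numpy as np

def segment_perfusion_lesion(cbf, tmax, cbf_contralateral):
    """Illustrative voxel-wise application of the clinical thresholds.

    cbf, tmax          : perfusion parameter maps of the affected hemisphere
    cbf_contralateral  : reference CBF, e.g. from the mirrored healthy side
    """
    rcbf = cbf / np.maximum(cbf_contralateral, 1e-6)  # relative CBF
    lesion = tmax > 6.0            # perfusion lesion: Tmax > 6 s (core + penumbra)
    core = lesion & (rcbf < 0.3)   # infarct core: rCBF below 30% of the other side
    penumbra = lesion & ~core      # tissue at risk
    return core, penumbra
```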

Indeed, deconvolution plays a central role in perfusion analysis. However, deconvolution is a mathematically ill-posed problem, and given the relatively low signal-to-noise ratio of perfusion images, a successful implementation of perfusion analysis requires due attention to this problem. First, there is a need for proper preprocessing: motion correction, temporal and spatial smoothing, and possibly spatial downsampling. Second, the deconvolution is regularized, suppressing the high-frequency signal in the reconstructed impulse response function. In singular value decomposition (SVD) based deconvolution, this is typically done by regularizing the singular values, e.g. using Tikhonov regularization. Nevertheless, the deconvolution-based perfusion parameters remain noise sensitive, and research into improved algorithms [3] or even deconvolution-free summary parameters [8] remains ongoing.

Recently, several works have proposed to use machine learning techniques to estimate, based on the perfusion and treatment parameters, what the final infarct will look like [6, 7, 9, 12]. However, all these approaches first perform a deconvolution analysis – which suffers from the problems mentioned earlier – and then use the perfusion parameters as input features for the machine learning algorithms. One notable exception is [10], which uses both the perfusion parameters and the native measurements as input.

In this work, we show on simulated data that a neural network can learn to perform this deconvolution and achieve more accurate estimates of CBF and Tmax than a state-of-the-art deconvolution technique. Additionally, we show how a perfusion-specific data augmentation can be used to learn this deconvolution from a relatively small number of training samples. Knowing that a neural network is able to learn how to perform the deconvolution opens possibilities for new research, where the final infarct is predicted directly from the native images, bypassing the standard deconvolution as a preprocessing step.

2 Methods

We compare a state-of-the-art deconvolution approach with our proposed neural network-based approach.

2.1 Baseline: Tikhonov Regularized SVD-Based Deconvolution

As a baseline, we use Tikhonov-regularized SVD-based deconvolution [4] with the Volterra discretization scheme [11], to which we will simply refer as deconvolution. This method takes the AIF and the TCC and produces the IRF. From the IRF, we can infer both the CBF and the Tmax:

$$\begin{aligned} \text{CBF} &= \frac{1}{\rho} \max_t \text{IRF}(t) \end{aligned}$$
(1)
$$\begin{aligned} \text{Tmax} &= \mathop{\text{argmax}}_{t} \text{IRF}(t). \end{aligned}$$
(2)

The IRF has units of 1/s, and the CBF consequently has units of ml/g/s (ml of blood per g of brain tissue per s). When reporting CBF values, we follow the common practice of using ml/\(100\,g\)/min. Since the IRF is discretized, it is fitted with a quadratic spline to produce continuous estimates of Tmax.

The deconvolution has one hyperparameter, the relative regularization parameter \(\lambda _{rel}\), which sets the amount of filtering. For each experiment, the optimal regularization strength is found by evaluating the performance on the training set for a range of possible values (\(0.01 \cdot 2^{k}\), \(k = 0, 1, \ldots , 9\)).
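For concreteness, a minimal NumPy/SciPy sketch of this baseline is given below. It uses a simple lower-triangular discretization of the convolution rather than the Volterra scheme of [11], and the default values of lam_rel and rho are placeholders; it is meant as an illustration, not as the exact implementation used in the experiments.

```python
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline

def tikhonov_svd_deconvolution(aif, tcc, dt, lam_rel=0.2, rho=1.0):
    """Sketch of Tikhonov-regularized SVD-based deconvolution of one TCC."""
    n = len(aif)
    # Discretized convolution: TCC = dt * A @ IRF (lower-triangular Toeplitz matrix)
    A = dt * np.array([[aif[i - j] if i >= j else 0.0
                        for j in range(n)] for i in range(n)])
    U, s, Vt = np.linalg.svd(A)
    lam = lam_rel * s[0]                 # regularization relative to the largest singular value
    filt = s / (s ** 2 + lam ** 2)       # Tikhonov-filtered inverse singular values
    irf = Vt.T @ (filt * (U.T @ tcc))

    # CBF from the maximum of the IRF (Eq. 1); rho is the brain tissue density
    cbf = irf.max() / rho
    # Continuous Tmax via a quadratic spline through the discrete IRF (Eq. 2)
    t = np.arange(n) * dt
    spline = InterpolatedUnivariateSpline(t, irf, k=2)
    t_fine = np.linspace(0.0, t[-1], 10 * n)
    tmax = t_fine[np.argmax(spline(t_fine))]
    return cbf, tmax
```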

2.2 Proposed Neural Network

Through experimentation we found that even simple networks succeed at this task, and the following network is used for all experiments. Our network has two input layers, one for the AIF and one for the TCC. The two input layers are concatenated, followed by two fully connected layers of 30 neurons each, and end in a single output neuron. The fully connected layers have a PReLU activation and the output layer has no activation function. The output is a single value: depending on the experiment, either the CBF or the Tmax.
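A minimal sketch of such a network is given below, written in PyTorch; the framework is our choice for illustration and the class name is hypothetical. n_time denotes the number of time points per curve (19 in our experiments).

```python
import torch
import torch.nn as nn

class PerfusionNet(nn.Module):
    """Two concatenated curve inputs -> two hidden layers of 30 PReLU
    neurons -> one linear output (CBF or Tmax, depending on the experiment)."""
    def __init__(self, n_time=19):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(2 * n_time, 30), nn.PReLU(),
            nn.Linear(30, 30), nn.PReLU(),
            nn.Linear(30, 1),
        )

    def forward(self, aif, tcc):
        # aif, tcc: tensors of shape (batch, n_time)
        return self.body(torch.cat([aif, tcc], dim=-1)).squeeze(-1)
```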

Note that while it is possible to provide spatial context to the network – e.g. the TCC of the neighboring voxels, which would most certainly improve the performance – this is out of scope for this work and would make the comparison between the two approaches unfair.

The network is trained in a supervised fashion by feeding it examples of an AIF and a TCC with known CBF or Tmax and minimizing the absolute difference between prediction and ground truth. The optimization is performed by stochastic gradient descent with a learning rate of 0.01, Nesterov momentum of 0.9 and mini-batches of 2048 samples.
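A corresponding training loop could look as follows, reusing the PerfusionNet sketch above; train_loader is an assumed iterator that yields (AIF, TCC, target) mini-batches of 2048 samples.

```python
import torch

model = PerfusionNet(n_time=19)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, nesterov=True)
loss_fn = torch.nn.L1Loss()    # mean absolute difference

for aif, tcc, target in train_loader:   # assumed loader of 2048-sample batches
    optimizer.zero_grad()
    loss = loss_fn(model(aif, tcc), target)
    loss.backward()
    optimizer.step()
```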

2.3 Proposed Data Augmentation

To limit the amount of training data required to train this network, we propose a perfusion-specific data augmentation. The key insight is that the perfusion measurement is a linear time-invariant system. This means that, if the contrast injection were a bit later, both the AIF and the TCC would show the same delay. Similarly, if the injection were earlier, all curves would shift to the left. However, the IRF (and hence the Tmax and CBF) would remain the same in both cases. If the concentration of iodine or gadolinium in the contrast agent were a fraction higher or lower, the concentration curves would be scaled by the same fraction, but again the IRF would remain untouched.

Hence, we create additional samples from a single training sample by applying a random time shift (earlier or later) and a random scaling to both the AIF and the TCC. In our experiments, the shift is randomly chosen between −1 and 2 time points (since our measurements are discrete) and the scaling is drawn from a uniform distribution on [0.7, 1.3]. This augmentation increases the number of distinct training samples the network sees, reducing the required size of our actual training set.
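A minimal sketch of this augmentation is given below; zero-padding at the curve boundaries is our own assumption, justified by the curves starting and ending near baseline.

```python
import numpy as np

def shift_curve(curve, shift):
    """Shift a sampled curve by an integer number of time points.
    Positive shifts delay the curve; the boundaries are zero-padded."""
    if shift >= 0:
        return np.concatenate([np.zeros(shift), curve[:len(curve) - shift]])
    return np.concatenate([curve[-shift:], np.zeros(-shift)])

def augment_sample(aif, tcc, rng):
    """Apply the same random delay and amplitude scaling to AIF and TCC,
    leaving the underlying IRF (and hence CBF and Tmax) unchanged."""
    shift = int(rng.integers(-1, 3))   # shift of -1, 0, 1 or 2 time points
    scale = rng.uniform(0.7, 1.3)      # common amplitude scaling factor
    return scale * shift_curve(aif, shift), scale * shift_curve(tcc, shift)
```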

3 Experiments

To compare the deconvolution with the proposed method, we need to compare the predictions with the ground truth. However, ground truth measurements of the CBF or Tmax are not possible in vivo. Hence, we perform a series of experiments on simulated data.

3.1 Data

In the simulations, we model the arterial input function (AIF) and the impulse response function (IRF) as gamma variates:

$$\begin{aligned} \gamma(t; t_0, \alpha, \beta, A) = A \, (t-t_0)^{\alpha} \, e^{-\frac{t-t_0}{\beta}}. \end{aligned}$$
(3)

The different parameters are chosen randomly such that the resulting curves resemble the curves measured with CTP. The values \(t_0, \alpha , \beta \) are chosen from uniform distributions. For the AIF: \(t_0 \in [0,15]\), \(\alpha \in [1.5,3.5]\), \(\beta \in [1.5,3.5]\). For the IRF: \(t_0 \in [0,10]\), \(\alpha \in [0,0.5]\), \(\beta \in [2.5,4.5]\).

For the AIF, the value of A is chosen such that the AIF's maximum is uniformly distributed between 100 and 500 HU. For the IRF, the value of A is chosen such that the integral of the IRF (i.e. the CBV) is uniformly distributed between 0.1% and 6%. For the experiments on Tmax estimation, a different range is chosen (between 2% and 6%), since the deconvolution-based Tmax estimation performs poorly on weak signals. Keeping the full range would move the comparison in favor of the proposed method while not being relevant in clinical practice.

The TCC is obtained by convolving the simulated AIF and IRF. Finally, the AIF and TCC are sampled (19 samples over a time span of 40 s) and Gaussian noise is added (\(\sigma \) varies from 0.1 to 3.2, and is given later for each specific experiment). Figure 1 shows a randomly simulated sample and the resulting distribution of CBF, CBV, MTT and Tmax.
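The simulation procedure could be sketched as follows; the parameter ranges follow the description above, while the amplitude and noise handling is a simplified illustration (e.g. the curves are sampled before the discrete convolution and \(\rho = 1\) g/ml is assumed for the ground truth CBF).

```python
import numpy as np

def gamma_variate(t, t0, alpha, beta, A):
    """Gamma variate of Eq. (3); defined as zero before the onset time t0."""
    x = np.clip(t - t0, 0.0, None)
    return np.where(t > t0, A * x ** alpha * np.exp(-x / beta), 0.0)

def simulate_sample(rng, n_samples=19, t_span=40.0, sigma=1.0):
    """One simulated training sample: noisy AIF and TCC plus ground truth."""
    dt = t_span / (n_samples - 1)
    t = np.linspace(0.0, t_span, n_samples)

    aif = gamma_variate(t, rng.uniform(0, 15), rng.uniform(1.5, 3.5),
                        rng.uniform(1.5, 3.5), 1.0)
    aif *= rng.uniform(100, 500) / aif.max()      # peak between 100 and 500 HU

    irf = gamma_variate(t, rng.uniform(0, 10), rng.uniform(0, 0.5),
                        rng.uniform(2.5, 4.5), 1.0)
    cbv = rng.uniform(0.001, 0.06)                # CBV between 0.1% and 6%
    irf *= cbv / (irf.sum() * dt)                 # scale so the integral equals the CBV

    tcc = dt * np.convolve(aif, irf)[:n_samples]  # TCC = AIF convolved with IRF
    cbf, tmax = irf.max(), t[np.argmax(irf)]      # ground truth (rho = 1 g/ml)

    aif = aif + rng.normal(0.0, sigma, n_samples) # additive Gaussian noise
    tcc = tcc + rng.normal(0.0, sigma, n_samples)
    return aif, tcc, cbf, tmax
```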

Fig. 1. The simulated data.

3.2 SVD-Based Deconvolution Versus Proposed Neural Network

We aim to compare how well the deconvolution approach and the neural network are able to estimate the CBF and Tmax from the AIF and TCC. The performance of the Tmax estimation is measured using the mean absolute difference (MAD) between the true and estimated values. For the CBF, we also use the MAD, but only after scaling all the estimates with an optimal scaling factor (i.e. the one that minimizes the MAD). This is warranted since in clinical practice the relative CBF is used to predict the infarct core.
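A sketch of these metrics is given below; the bounded 1D search used to find the optimal scaling factor, and its search interval, are illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def mad(estimates, truth):
    """Mean absolute difference between estimated and true values."""
    return np.mean(np.abs(estimates - truth))

def scaled_mad(cbf_est, cbf_true):
    """MAD of the CBF after the single scaling factor that minimizes it."""
    res = minimize_scalar(lambda a: mad(a * cbf_est, cbf_true),
                          bounds=(0.1, 10.0), method='bounded')
    return res.fun
```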

For a range of noise values, we generate a training set (1M samples) and a test set (10k samples), where a sample consists of the AIF, the TCC and the ground truth perfusion parameter (CBF or Tmax). For each noise level, the optimal amount of Tikhonov regularization is determined on the training set and that value is used on the test set. The neural network is trained for one epoch on all training samples and subsequently predicts the test set. Figure 2a and c summarize the results and show an improvement for both measures at all noise levels. Figure 2b and d show the distribution of the predictions at a single noise level.

Fig. 2. Comparison between estimations produced by the neural network and the SVD-based deconvolution.

3.3 Data Augmentation

In the previous experiment, we trained on 1 million samples, each with a different AIF and hence each corresponding to a different acquisition. Such large training sets are not realistic, and in this experiment we explore how many samples are necessary and whether our proposed data augmentation can lower that number. Using a fixed noise level of \(\sigma =1\), we create training sets of various sizes by varying the number of AIFs (which corresponds to the number of acquisitions) and the number of TCCs per AIF. The proposed data augmentation is used to increase the number of samples by a factor of 10. We use the same performance metrics and training method as in the previous section, with the number of training epochs adapted such that each network is trained for the same number of iterations (corresponding to one epoch with 1M samples). Figure 3 summarizes the results, showing that fewer than 100 acquisitions are sufficient and that the proposed data augmentation can lead to large improvements, especially in situations with limited training data.

Fig. 3. Influence of the number of training samples and data augmentation on the performance of the neural network.

4 Discussion and Conclusion

We proposed a simple neural network and a data augmentation approach to predict perfusion parameters from native perfusion measurements (i.e. the AIF and TCC). A comparison on simulated data shows that the neural network provides better estimates of both CBF and Tmax than a state-of-the-art deconvolution method, over a wide range of noise levels. Using the proposed data augmentation, it is feasible to achieve these results with fewer than 100 datasets.

Earlier, Ho et al. [5] showed that a neural network can learn how to deconvolve. They trained and tested a CNN on MR perfusion datasets, using as ground truth the perfusion parameters obtained by an SVD-based deconvolution method. They showed that a neural network can produce a reasonable approximation of the various perfusion parameters. In this work, we go one step further and show that a neural network can outperform deconvolution methods on simulated CT perfusion data.

We see two main directions for future research. First, one could investigate how neural network based perfusion parameter estimation works on real data. As mentioned earlier, validation of such an approach is difficult since ground truth measurements are not available. However, it might be possible to produce convincing evidence by comparing different modalities from the same subject or by comparing results on high- and low-resolution versions of the same dataset. Second, one could revisit the earlier mentioned methods that attempt to estimate the final infarct from perfusion parameter maps. Especially for neural network based approaches, such as [9], it might be beneficial to provide the original time series to the network instead of the perfusion parameters. Having the network perform the deconvolution operation implicitly might lead to better estimates of the CBF and Tmax, or – even more promising – might lead the network to learn new perfusion parameters that are even more predictive.