Introduction

Patient motion is a long-standing problem in positron emission tomography (PET) imaging. Two types of movement, namely physiological movement (internal organ deformation, e.g., cardiac and respiratory motion) and non-physiological movement (random changes in body or head position), contribute to PET image degradation [1]. Motion becomes even more problematic when accurate quantification is required, for example when assessing treatment response using the absolute change in SUV or other kinetic parameters [2,3,4], and for the high-resolution scanners under development, which are vulnerable to even subtle movements. As the first commercial total-body scanner, uEXPLORER is capable of imaging the whole body simultaneously [5]. It has ultra-high sensitivity compared with conventional scanners and a resolution of 2.9 mm at the center of the axial field of view [6]. Dynamic imaging is essential for exploiting the full potential of such systems but is vulnerable to patient movement; for example, a patient may have trouble staying still for 60 min or more, which affects not only the visual quality but also the accuracy of quantification across the entire body [7, 8].

Attempts have been made to reduce the scan time in total-body PET imaging, which in turn reduces the likelihood of motion [9, 10]. Another way to reduce motion is patient immobilization and coaching, which involves communicating the importance of remaining still to the patient or providing training together with immobilization devices. However, a downside of these prospective measures is that they often fail to eliminate motion entirely. Therefore, investigating motion correction approaches remains important [11].

As stated earlier, physiological and non-physiological body motion are the two main types of patient motion. Correction of cardiac and respiratory motion in the thorax has been studied extensively [12,13,14,15,16,17]. Body motion, on the other hand, remains a challenging problem in PET even though it can be minimized by patient cooperation. Most studies addressing body motion have focused on static brain or cardiac PET imaging [18,19,20,21,22]. Others have attempted to correct abdominal movement in PET by using MRI information simultaneously acquired in a PET/MRI session [23,24,25,26]. In contrast, few studies have attempted to compensate for whole-body movement in PET imaging, especially for dynamic total-body imaging. One reason is that a conventional scanner with a multi-bed protocol cannot truly acquire whole-body data simultaneously. Motion tracking is an attractive option [27] that has proven effective for monitoring brain and respiratory movement, but it may struggle to track whole-body motion because of the complex nature of the movement. An integrated PET/MR solution is also not feasible, as it would require a total-body PET/MR scanner, which does not yet exist.

In this study, we proposed an image-based correction framework that compensates for body movement in dynamic total-body PET imaging. We focused on random body shifts and deformation across the entire body, not on periodic respiratory or cardiac movements. Little movement is assumed during the early stage of a scan for the subjects included in this study, so the data from the subsequent period can be aligned to this reference position. In the following sections, we first describe the implementation and then present the evaluation, followed by a discussion of the results, limitations and future work.

Materials and methods

Patient population and data acquisition

We studied 12 dynamic 18F-fluorodeoxyglucose (FDG) scans acquired on a uEXPLORER PET/CT (United Imaging Healthcare, Shanghai, China) from December 2020 to July 2021 at Henan Provincial People’s Hospital. Detailed information on each subject is presented in Table 1, including the exact location of the tumor lesion (if present). The study was approved by the local Ethics Committee, and written consent was obtained from each subject before the scan. The scanning workflow and data format were as follows. A CT scan was first performed for attenuation correction. Then, a 60-min list-mode acquisition was initiated together with a bolus intravenous injection of 18F-FDG at the ankle. List-mode data were binned into 66 frames (5 s × 24, 10 s × 6, 30 s × 6, 60 s × 6 and 120 s × 24) and reconstructed on the scanner workstation into a 192 × 192 × 673 matrix with a voxel size of 3.125 × 3.125 × 2.866 mm3 using the 3D ordered subset expectation–maximization algorithm (with TOF and PSF modeling, 3 iterations, 28 subsets, 2-mm Gaussian post-smoothing). Attenuation and scatter corrections were performed during the reconstruction with the CT-based attenuation maps.
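For readers who want to reproduce the frame timing, the short sketch below (Python/NumPy; not part of the study's software) builds the 66-frame binning schedule described above and checks that it spans the full 60-min acquisition.

```python
# Minimal sketch: the 66-frame binning scheme and its start/mid times.
import numpy as np

durations = np.repeat([5, 10, 30, 60, 120], [24, 6, 6, 6, 24]).astype(float)  # seconds
starts = np.concatenate(([0.0], np.cumsum(durations)[:-1]))
assert len(durations) == 66 and starts[-1] + durations[-1] == 3600.0           # 60-min scan
frame_mid_times = starts + durations / 2.0   # convenient later for kinetic modelling
```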

Table 1 Patient demographics

Inter-frame positional changes were visually identified for all scans. In the scans included in this study, the motion mostly occurred after 10 min. For each scan, the 10–60-min list-mode data were re-reconstructed into fifty 1-min frames using the same reconstruction parameters as above, allowing better temporal sampling of the reconstructed frames to facilitate motion correction. As a result, a total of 90 frames (5 s × 24, 10 s × 6, 30 s × 6 and 60 s × 54) constituted the new set of dynamic images. From this new set, the image-derived input function (IDIF) was extracted from the ascending aorta by drawing a 10-mm-diameter ROI on six consecutive slices of an image obtained by combining the early time frames (0–60 s), where the effects of motion and partial volume are less prominent than in the left ventricle [28]. The difference in uptake between whole blood and plasma was not accounted for in this study.
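As an illustration of the IDIF step, the sketch below (Python/NumPy) shows one way to average the activity inside a 10-mm-diameter ROI placed on six consecutive slices at a manually picked ascending-aorta centre. The array layout, variable names and masking details are our assumptions rather than the study's actual code.

```python
import numpy as np

def extract_idif(dynamic, center_zyx, voxel_mm=(2.866, 3.125, 3.125),
                 roi_diam_mm=10.0, n_slices=6):
    """dynamic: 4-D array (n_frames, z, y, x) of activity concentration (Bq/ml).
    center_zyx: (z, y, x) index of the ascending-aorta centre picked on the
    early summed image (0-60 s). Returns one IDIF value per frame."""
    zc, yc, xc = center_zyx
    ry = roi_diam_mm / 2.0 / voxel_mm[1]          # ROI radius in voxels (y)
    rx = roi_diam_mm / 2.0 / voxel_mm[2]          # ROI radius in voxels (x)
    yy, xx = np.mgrid[0:dynamic.shape[2], 0:dynamic.shape[3]]
    disk = ((yy - yc) / ry) ** 2 + ((xx - xc) / rx) ** 2 <= 1.0
    mask = np.zeros(dynamic.shape[1:], dtype=bool)
    mask[zc:zc + n_slices] = disk                 # same disk on six consecutive slices
    return dynamic[:, mask].mean(axis=1)          # mean ROI activity per frame
```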

Implementation of the proposed correction framework

The registration of dynamic frames was performed sequentially, in a manner similar to what has been proposed for dynamic brain PET [29, 30]. We assumed there was little movement during the early stage (0–10 min); hence, the associated frames remained unchanged. The last frame within the first 10 min (frame 40) was used as the reference image. Multiple pairwise registrations were initiated from frame 41 onwards, and the subsequent frames were aligned to the reference position by matching each to its previous frame. In this manner, the potential contrast difference between frames is less problematic for pairwise registration than it is in conventional image-based methods, so similarity metrics such as normalized cross-correlation can be employed.

The pairwise registration between neighboring frames was performed as follows. In total-body PET imaging, the human body may shift and deform randomly in time and space, resulting in non-uniform intensity changes across the whole body. This poses challenges in aligning two neighboring frames compared with applications such as brain PET. We therefore employed large deformation diffeomorphic metric matching (LDDMM), which is designed for large space–time deformations. LDDMM is a non-parametric approach based on principles from fluid mechanics [31, 32] and has been successfully applied to many medical imaging applications [33,34,35]. Here, the goal is to register the source image I to the target image J (the neighbor of I) by finding a time-varying velocity field \(v\) that minimizes the energy function:

$$E(v) = M(I, J, \phi_{1,0}) + w\int_{0}^{1} \left\| Lv_{t} \right\|^{2} \, dt$$
(1)

where \(M(I,J,\phi_{1,0})\) is the similarity measure, here normalized cross-correlation. The second term, a squared L2 norm, penalizes non-smooth velocity fields, and L is a differential operator. \(\phi\) parameterizes a family of diffeomorphisms that can be generated by integrating a smooth velocity field, \(v:\Omega \times t \to {\mathbb{R}}^{d}\), using an ordinary differential equation:

$$\frac{d\phi(x,t)}{dt} = v(\phi(x,t), t), \qquad \phi(x,0) = x$$
(2)
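For intuition, Eq. (2) can be integrated numerically: starting from the identity map, each grid point is pushed along the interpolated velocity field in small time steps. The toy sketch below (2-D, forward Euler; purely illustrative and not part of the registration software) makes this explicit.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def integrate_flow(v, n_steps=None):
    """Forward-Euler integration of d(phi)/dt = v(phi, t), phi(x, 0) = x.
    v: array (T, 2, H, W), a time-varying 2-D velocity field in voxels/unit time.
    Returns phi sampled on the voxel grid at t = 1."""
    T, _, H, W = v.shape
    n_steps = n_steps or T
    dt = 1.0 / n_steps
    # phi starts as the identity map (voxel coordinates of every grid point)
    phi = np.stack(np.meshgrid(np.arange(H), np.arange(W), indexing="ij")).astype(float)
    for s in range(n_steps):
        vt = v[min(int(s * dt * T), T - 1)]       # velocity field at the current time
        # interpolate the velocity at the current (moved) positions phi
        v_at_phi = np.stack([map_coordinates(vt[c], phi, order=1, mode="nearest")
                             for c in range(2)])
        phi = phi + dt * v_at_phi                 # Euler step along the flow
    return phi
```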

Exploiting the fact that the diffeomorphism \(\phi\) can be decomposed into two components \(\phi_{1}\) and \(\phi_{2}\), we constructed a symmetric alternative to Eq. (1); accordingly, the optimization problem can be expressed as

$$\{ v_{1}^{*}, v_{2}^{*} \} = \mathop{\arg\min}\limits_{v_{1}, v_{2}} \left\{ \int_{\Omega} M\left( I \circ \phi_{1}(x, 0.5),\, J \circ \phi_{2}(x, 0.5) \right) d\Omega + w\left( \int_{0}^{0.5} \left\| Lv_{1}(x,t) \right\|^{2} dt + \int_{0.5}^{1} \left\| Lv_{2}(x,t) \right\|^{2} dt \right) \right\}$$
(3)

The goal is to find \(v_{1}\) that minimizes the variational energy from t = 0 to 0.5, whereas \(v_{2}\) minimizes it from t = 1 to 0.5. Gradient-based iterative updates deform \(I\) and \(J\) along the geodesic diffeomorphisms \(\phi\) toward a midway fixed point, which motivates the name of the registration strategy, symmetric normalization (SyN) [32]. A CPU implementation of SyN is available in the ANTs package; given the large number of voxels in a total-body PET image, we implemented it on a GPU (NVIDIA RTX 2080 Ti).
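The pairwise step can be approximated with the open-source ANTs implementation; the snippet below uses the ANTsPy wrapper (package `antspyx`) with the SyN transform and a cross-correlation metric. The iteration schedule shown is an assumption, and this CPU sketch is only a stand-in for the study's GPU implementation.

```python
import ants

def register_pair(moving_path, fixed_path):
    """Pairwise SyN registration of one dynamic frame to its neighbour."""
    fixed = ants.image_read(fixed_path)    # frame n (closer to the reference)
    moving = ants.image_read(moving_path)  # frame n + 1
    reg = ants.registration(
        fixed=fixed,
        moving=moving,
        type_of_transform="SyNOnly",   # deformable SyN; use "SyN" to add an affine stage
        syn_metric="CC",               # local cross-correlation similarity
        reg_iterations=(40, 20, 10),   # multi-resolution iteration schedule (assumed)
    )
    return reg["warpedmovout"], reg["fwdtransforms"]
```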

Upon completion of all pairwise registrations, the motion field at each frame was obtained by concatenating the successive motion fields from all previous frames, and the motion-compensated frame was obtained by transforming the image with the derived motion field. For a given scan, this process eventually generates a new set of frames aligned to the reference position defined in the early scan. We refer to this correction framework as SyN-seq in the following. After correction, the 90 dynamic frames were summed back to 66 frames before the subsequent analysis.
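A minimal sketch of the SyN-seq loop, building on the `register_pair` helper above, is given below (ANTsPy; `frame_files` is a hypothetical list of the 90 reconstructed frame paths). Each late frame is registered to its predecessor, the accumulated transforms are applied to bring it to the frame-40 position, and the corrected 1-min frames can then be re-binned to the original 66-frame schedule.

```python
import ants

# frame_files: hypothetical list of 90 frame paths; frame_files[39] is frame 40
reference = ants.image_read(frame_files[39])
corrected = [ants.image_read(f) for f in frame_files[:40]]   # frames 1-40 kept as-is
chain = []                                                   # accumulated transform files
for i in range(40, 90):                                      # 0-based indices of frames 41-90
    moving = ants.image_read(frame_files[i])
    _, fwd = register_pair(frame_files[i], frame_files[i - 1])  # align to the previous frame
    # ANTs applies the last transform in the list first, so prepending the newest
    # transform composes frame i -> i-1 -> ... -> 40 in the right order.
    chain = fwd + chain
    corrected.append(ants.apply_transforms(fixed=reference, moving=moving,
                                           transformlist=chain))
# Finally, average pairs of corrected 1-min frames (duration-weighted, since the
# images are in Bq/ml) to restore the original 66-frame schedule for analysis.
```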

Comparison of different methods

The original uncorrected images were labeled NMC. The method that performed diffeomorphic registration to align each frame to a designated reference frame was labeled SyN-mid, with the reference selected as the mid-frame (frame 70). The registration parameters were similar to those used for SyN-seq, except that the similarity metric was normalized mutual information to account for the potentially large contrast differences among the dynamic frames.

The method that performed sequential affine instead of SyN registration was labeled Aff-seq. Affine registration was used to model and estimate the motion as a global transformation (translations, rotations and scaling). The optimization of the transformation parameters started at a coarse scale, whose result was then used to initialize the registration at the next finer scale. As for SyN-seq, normalized cross-correlation was used as the similarity metric. The Powell algorithm [36] was used for optimization with a fractional tolerance of 10^−4 and a maximum of 500 iterations. The coarse-to-fine registration (three levels in total) was repeated until the finest scale and the stopping criteria were reached.
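The sketch below outlines a coarse-to-fine affine registration of this kind in NumPy/SciPy, with nine parameters (translations, rotations, scaling), a negative NCC cost and SciPy's Powell optimizer (xtol = 1e-4, 500 iterations). It is a simplified stand-in, not the study's implementation; the pyramid levels, parameterization and interpolation choices are assumptions.

```python
import numpy as np
from scipy.ndimage import affine_transform, zoom
from scipy.optimize import minimize

def ncc(a, b):
    """Normalized cross-correlation between two arrays."""
    a, b = a - a.mean(), b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def make_matrix(p):
    """p = [t0, t1, t2, r0, r1, r2, s0, s1, s2] -> 3x3 matrix and translation."""
    c0, s0 = np.cos(p[3]), np.sin(p[3])
    c1, s1 = np.cos(p[4]), np.sin(p[4])
    c2, s2 = np.cos(p[5]), np.sin(p[5])
    R0 = np.array([[1, 0, 0], [0, c0, -s0], [0, s0, c0]])
    R1 = np.array([[c1, 0, s1], [0, 1, 0], [-s1, 0, c1]])
    R2 = np.array([[c2, -s2, 0], [s2, c2, 0], [0, 0, 1]])
    return R0 @ R1 @ R2 @ np.diag(p[6:9]), np.asarray(p[:3])

def cost(p, src, tgt):
    A, t = make_matrix(p)
    warped = affine_transform(src, A, offset=t, order=1)
    return -ncc(warped, tgt)                       # maximize similarity

def affine_register(src, tgt, levels=(4, 2, 1)):
    """src, tgt: 3-D arrays. Returns the 9 affine parameters (full resolution)."""
    p = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1], dtype=float)   # identity start
    for f in levels:                                          # coarse-to-fine pyramid
        s, t = zoom(src, 1.0 / f, order=1), zoom(tgt, 1.0 / f, order=1)
        p[:3] /= f                                            # translations scale with level
        res = minimize(cost, p, args=(s, t), method="Powell",
                       options={"xtol": 1e-4, "maxiter": 500})
        p = res.x
        p[:3] *= f                                            # back to full-resolution voxels
    return p
```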

Data analysis

All statistical analyses were performed using the Statistics and Machine Learning Toolbox in MATLAB R2018b. The conventional threshold of 0.05 was used as the level of significance. The proposed correction framework SyN-seq was compared with SyN-mid, Aff-seq and NMC. For all 12 subjects, visual and quantitative assessments were performed as described below. Motion-contaminated 10-min SUV images were created by combining the corresponding frames. For the five patients with solid tumors, the lesions were delineated using a semiautomatic region-growing method with a 50% cutoff of the maximum intensity, and their SUVmean was reported before and after correction. Mutual information and the Dice coefficient were computed between each transformed frame and the reference to quantify the degree of recovery. Sampled time–activity curves (TACs) of the cerebral cortex, kidney cortex, liver and thigh muscle (ROIs in Fig. 1) were plotted and compared. Parametric Ki and K1 images were computed for each image set using fast nonlinear estimation under the assumption of an irreversible two-tissue compartment model [37]. The IDIFs were measured separately for each set, as described earlier in the Methods section. The fitting residual (FR) was computed as an indication of the goodness of fit; it should be lower when the dynamic images are better aligned:
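The two alignment metrics can be computed directly from the image arrays; the sketch below (Python/NumPy) uses a simple intensity threshold to define the body masks for the Dice coefficient and a joint histogram for the mutual information. The threshold and bin count are assumptions, as the paper does not specify them.

```python
import numpy as np

def dice(frame, reference, threshold=0.1):
    """Dice overlap of body masks from intensity thresholding (fraction of the
    frame maximum; the masking rule is an assumption)."""
    a = frame > threshold * frame.max()
    b = reference > threshold * reference.max()
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def mutual_information(frame, reference, bins=64):
    """Histogram-based mutual information between a frame and the reference."""
    joint, _, _ = np.histogram2d(frame.ravel(), reference.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())
```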

$$FR = \sum\limits_{k \in F} \sum\limits_{j \in N} \left( I_{k}(x_{j}) - \hat{I}_{k}(x_{j}, p) \right)^{2}$$
(4)

where k and F are the frame index and the set of frames, respectively; j and N are the voxel index and the set of voxels in a frame, respectively; \(\hat{I}_{k}\) is the model-fitted frame; and p denotes the fitted parameters.
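For illustration, the sketch below fits the irreversible two-tissue compartment model to a single voxel TAC with SciPy's `curve_fit` and returns Ki = K1·k3/(k2 + k3) together with that voxel's squared-residual contribution to Eq. (4). It assumes a uniform time grid, ignores the blood-volume fraction, and uses arbitrary initial values and bounds; the study itself used a fast nonlinear estimation method [37].

```python
import numpy as np
from scipy.optimize import curve_fit

def two_tc_irreversible(t, K1, k2, k3, cp, dt):
    """Tissue TAC of the irreversible two-tissue compartment model on a uniform
    time grid t (spacing dt), given the plasma input cp on the same grid."""
    # C1(t) = K1 * exp(-(k2+k3) t) convolved with cp(t); C2(t) = k3 * integral of C1
    c1 = K1 * np.convolve(cp, np.exp(-(k2 + k3) * t))[: len(t)] * dt
    c2 = k3 * np.cumsum(c1) * dt
    return c1 + c2

def fit_voxel(tac, t, cp, dt):
    """Least-squares fit of one voxel TAC; returns the rate constants, Ki and the
    voxel's squared-residual contribution to the fitting residual of Eq. (4)."""
    model = lambda tt, K1, k2, k3: two_tc_irreversible(tt, K1, k2, k3, cp, dt)
    p, _ = curve_fit(model, t, tac, p0=(0.1, 0.1, 0.05), bounds=(0, [2.0, 2.0, 1.0]))
    K1, k2, k3 = p
    ki = K1 * k3 / (k2 + k3)
    fr = float(np.sum((tac - model(t, *p)) ** 2))
    return p, ki, fr
```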

Fig. 1

Example ROIs used to extract the time–activity curves for quantification

To determine the inter-subject variability, the coefficient of variation (CV) of Ki and K1 was computed for the cerebral cortex, kidney, liver and thigh muscle; a low value indicates low variation within the group:

$$\text{CV} = \text{SD} / \text{mean}$$
(5)

where the mean and SD were computed across all subjects. A small CV indicates low inter-subject variability in the group and thus a better recovery of the kinetic parameters from the patient-dependent effects of motion.

Finally, two experienced radiologists scored the quality of the SUV images and dynamic frames on a five-point scale. The raters were blinded to the clinical information of the patients. For an SUV image, the score was determined by the amount of artifacts and the image resolution. For the dynamic frames, displayed as a movie, a successful motion correction should remove the visible positional changes and was accordingly given a higher score.

Results

SyN-seq required an average processing time of 35 min per dynamic scan. In contrast, SyN-mid and Aff-seq required averages of 60 min and 8 min, respectively. Selected subtraction images (relative to reference frame 40) at frames 66, 60, 54, 48 and 41 are shown in Fig. 2. All frames showed less intensity difference after alignment than the uncorrected ones. The residual contrast difference is probably mostly due to tracer kinetics, as the distribution had not yet reached equilibrium (indicated by the subtraction image between frames 41 and 40). Note that the correction failed to compensate for the effect of arm movement due to truncation in the transaxial plane (arrow, Fig. 3).

Fig. 2

Patient 12: subtraction images at selected frames before (NMC) and after correction with SyN-seq. The corrected frames show less positional difference relative to the reference frame 40

Fig. 3

Patient 12: SyN-mid- and SyN-seq-corrected images and their associated motion fields (at frame 48). In the motion vector fields, red, green and blue represent the three directions of the vector, and the saturation represents the magnitude along each direction (as indicated by the color wheel)

In Fig. 4, SUV images from NMC, Aff-seq, SyN-seq and SyN-mid are displayed side by side for comparison. For most motion-contaminated scans, Aff-seq recovered the image quality to some extent, while SyN-seq and SyN-mid further removed the residual artifacts. The visual appearance of SyN-seq and SyN-mid was comparable, although there were some visual differences in the abdominal and other regions; this may partly be explained by the difference in motion fields (Fig. 3), which results in a positional difference for viewing. The average improvement in tumor SUVmean was 5.35 ± 4.92% (P = 0.047), with a maximum improvement of 12.89%. Detailed effects of the correction on tumor uptake are listed in Table 2.

Fig. 4

Effect of motion correction on SUV images for A patient 8 (50–60 min), B patient 9 (7–12 min) and C patient 12 (40–50 min). Red arrows indicate the motion-contaminated regions. The images in A and B suffered more from head and internal abdominal movement, while the one in C suffered from the irregular body shift mostly along the Superior-Inferior direction

Table 2 Effects of correction on tumor SUVmean for the subject with lesions

SyN-seq and SyN-mid achieved the best recovery in aligning the dynamic frames. Mutual information and Dice coefficient at all frames relative to the reference are shown in Fig. 5. The similarity between frames improved greatly after correction. The differences in mutual information and Dice coefficient before and after SyN-seq correction were significant (P = 0.003 and P = 0.011), as was the difference between SyN-seq and Aff-seq (P = 0.012). However, there was no significant quantitative difference between SyN-seq and SyN-mid. Sampled TACs from patients 4 and 12 are displayed in Fig. 6. Quantification in the cerebral cortex and kidney cortex was affected by the patients' movement in the late frames, whereas the liver and muscle were affected to a lesser extent, as spillover is less prominent in these uniform regions. The clear anomaly in the Aff-seq profile in Fig. 6B is due to a failure of the registration at frame 40, which caused all subsequent registrations to align to a wrong reference position. Parametric Ki images are shown in Fig. 7. Similar to the SUV images in Fig. 4, Aff-seq and SyN-mid recovered the image quality to some extent, while SyN-seq further improved it. The residual fitting errors of the kinetic modeling were 3.53 ± 0.46, 3.34 ± 0.41, 2.74 ± 0.49 and 2.88 ± 0.42 (× 10^6 Bq/ml) for NMC, Aff-seq, SyN-seq and SyN-mid, respectively. Quantification of K1 exhibited little improvement after correction (Fig. 8) because K1 mostly depends on the early phase of the scan, for which no correction was applied.

Fig. 5

Distribution plots of the mutual information A and Dice coefficient B for assessing the frame alignment. The lines represent the means, and the shaded areas represent the associated standard deviations

Fig. 6

TACs sampled at organs from A patient 4 and B patient 12. Quantification in the cerebral cortex and kidney cortex is more sensitive to movement, while the relatively uniform regions in the liver and thigh muscle are affected to a lesser extent

Fig. 7

Corresponding Ki images for A patient 8, B patient 9 and C patient 12 in Fig. 4. Red arrows indicate the motion-contaminated regions, which can be different from the ones in Fig. 4

Fig. 8

K1 images did not show visual differences before and after correction (corresponding to Fig. 7B). K1 images from other subjects led to similar conclusions

Table 3 shows that both SyN-seq and SyN-mid reduced the group CV of Ki in the cerebral cortex, liver, kidney cortex and thigh muscle (mean reductions of 28.9%, 5.3%, 11.5% and 1.8%, respectively), demonstrating that the correction successfully reduced the inter-subject variability. This reduction did not differ much between SyN-seq and SyN-mid. The average scores given by the two radiologists to the SUV images were 3.5, 3.6, 4.05 and 4.0 for NMC, Aff-seq, SyN-seq and SyN-mid, respectively; the average scores for the corresponding dynamic frames were 3.2, 3.6, 4.25 and 4.23, respectively. The subjective assessment was consistent with the quantitative results above.

Table 3 Summary of coefficient of variation (CV) and associated variances for kinetic parameters in the cerebral cortex, liver, kidney cortex and thigh muscle for 12 subjects

Discussion

In this paper, we proposed a framework to compensate for body shift and deformation in dynamic total-body PET imaging. The proposed method is an image-based approach that aligns densely reconstructed frames (1 min per frame). The results showed that the quality of SUV and Ki images improved after correction, with the degree of recovery varying across subjects. Tumor uptake also improved after correction in the subjects with lesions. The performance of SyN-seq was better than that of the other correction methods in terms of quantification, although SyN-seq outperformed SyN-mid only slightly in some subjects. These results demonstrate that not only can single-subject staging accuracy potentially be improved, but inter-subject group variability can also be reduced, which in turn reduces the number of subjects required in a cross-sectional study. Successful motion correction in dynamic total-body imaging will benefit applications that require simultaneous multi-organ imaging, such as imaging of the brain-gut axis [1, 38]. When computing Ki, we applied nonlinear estimation based on the irreversible two-tissue compartment model; similar results are expected for Ki images estimated using graphical Patlak analysis, which would also benefit from motion correction. Note that the visual comparison between SyN-mid and the other methods may not be entirely fair, as they align to different positions; hence, their views in Figs. 4 and 7 may differ slightly.

We focused on random body shifts and deformation across the entire body, not on periodic physiological movements such as respiratory motion. The difference between respiratory motion correction and total-body dynamic motion correction is threefold. Firstly, most respiration is periodic; hence, there is a limited number of motion fields, which can be estimated from a static scan by existing vendor software, whereas a total-body dynamic scan is relatively long and the correction has to estimate multiple random motion fields. Secondly, registration of short frames can be more challenging than respiratory motion correction, which operates on data with more counts; in addition, the large contrast difference between early and late frames poses challenges. Thirdly, respiratory motion fields can arguably be represented with a limited number of parameters, which can be estimated well by B-spline or optical-flow-based algorithms [16, 39, 40], whereas whole-body motion fields are more complex and require a richer representation such as a diffeomorphism. In fact, we tried replacing SyN with B-spline registration in our application, but it performed worse. Despite these differences, it would be of great interest to correct both physiological respiration and non-physiological body motion simultaneously, which may be enabled by uEXPLORER as frames can be reconstructed at a very high temporal resolution (down to 100 ms) [46]. However, the goal here was to propose an automatic image-driven method that can readily work on the reconstructed dynamic images.

This study has some limitations. Firstly, because the correction operates on the reconstructed images, attenuation correction in the presence of motion was not addressed; in the future, we will investigate adding accurate attenuation correction to the existing workflow once the vendor supports reconstruction with a user-defined AC map. Secondly, the potential degradation from intra-frame motion was ignored. Although most motion artifacts were successfully suppressed, residual artifacts could still be present within a 1-min frame. To deal with this, motion detection is required to determine the exact time at which the body starts to move and to separate the raw data in time accordingly; this would also reduce the required number of image registrations. Thirdly, as stated earlier, SyN-seq might be inaccurate when registration errors propagate through the sequential registration process. Although we did not observe such an effect in this study, caution should be taken, especially when registering low-quality frames. Lastly, we expect the method to also benefit the first 10 min of the scan, at least more than a conventional image-based method, but this remains to be investigated. Similarly, the varying contrast can be problematic for non-FDG tracers, whose kinetics and distribution could differ considerably from those of FDG and may prohibit accurate whole-body registration owing to insufficient common information between frames. Therefore, further validation of the proposed correction on non-FDG tracers is warranted.

Conclusion

In this work, we demonstrated an image-based motion correction method for dynamic total-body PET imaging. The proposed method reduces the effect of body shift and deformation on the image quality and quantification of both static and parametric images. The resulting quality improvement in total-body PET will benefit clinical applications that require simultaneous imaging at the whole-body level. We plan to evaluate the feasibility of applying the proposed correction framework to specific applications, e.g., assessing whole-body tumor burden with dynamic imaging.