Introduction

The incidence of papillary thyroid microcarcinoma (PTMC) has risen substantially worldwide over the past three decades [1, 2], following the widespread adoption of ultrasound (US) and other diagnostic imaging modalities [3]. Because most PTMCs are low risk with an excellent prognosis, their optimal management remains controversial [4].

To avoid overtreatment, active surveillance (AS) has been recommended as an alternative to immediate surgery for adult patients with biopsy-proven low-risk PTMC [3, 5], with favorable results [6,7,8,9,10]. Conventional two-dimensional ultrasound (2DUS) is the most widely used imaging modality in the routine follow-up of AS. Conversion surgery is recommended when new tumors, lymph node metastasis (LNM), or tumor enlargement are found [5]. Tumor enlargement was initially defined as an increase in maximum diameter of more than 3.0 mm [8, 11]. More recently, some studies have also defined enlargement as a 50% increase in tumor volume [12,13,14]. However, the quality of US evaluation is limited by observer variation, which poses a challenge for implementing this management strategy in real-world practice [14,15,16].

Understanding the inter-observer variation of US is necessary for accurate evaluation [17, 18]. A previous study reported that the inter-observer variation of the maximum diameter and volume of PTMC ranged from −26.6% to 24.5% and from −65.8% to 64.4%, respectively [19]. With technical advances, three-dimensional ultrasound (3DUS) can scan target organs or lesions with a single sweep of the US beam and provide images in multiple slices and planes; it has been applied in fetal growth assessment, tumor diagnosis and interventional therapy [20,21,22,23]. 3DUS has been reported to overcome the drawbacks of 2DUS, making US examination more objective and less observer dependent, especially in volumetry [24,25,26,27,28,29]. To the best of our knowledge, little is known about the inter-observer variation of 2DUS and 3DUS in the measurement of PTMC.

Therefore, the purpose of this prospective study was to investigate the inter-observer variations of 2DUS and 3DUS in the measurement of maximum diameter and volume for PTMC.

Methods

Study design

This prospective study was approved by the Institutional Review Board of our hospital (S2020-237-01). All enrolled patients fulfilled the following inclusion criteria: (1) solitary PTMC confirmed by fine-needle aspiration (FNA) or core-needle biopsy (CNB); (2) serum thyroid hormone and thyrotropin levels within normal ranges; (3) acceptance of two complete sets of evaluation, including 2DUS and 3DUS, by two observers. Exclusion criteria were: (1) benign results or follicular neoplasm on FNA or CNB; (2) a history of neck irradiation or treatment for thyroid disease; (3) a neck extension disorder that prevented the patient from tolerating two complete sets of US scans by two observers.

The sample size was calculated using PASS 15 software (NCSS LLC, Kaysville, UT, USA). The type I error was 0.05 and the power was 0.8, based on a two-sided test. A sample size of 50 subjects with two observers per subject was needed to detect an intraclass correlation coefficient (ICC) of 0.95 for the two modalities in the measurement of volume when the ICC under the null hypothesis was 0.9. Therefore, from January 2021 to March 2021, this study recruited 51 consecutive patients with solitary PTMC who underwent 2DUS and 3DUS evaluation.

Measurement

Two physicians (Observer A, with more than 10 years of experience in thyroid US; Observer B, with 5 years of experience in thyroid US) performed all the measurements using a SAMSUNG RS85A instrument (SAMSUNG) equipped with an internal 3DUS virtual organ computer-aided analysis (VOCAL) program. A 3–12 MHz linear array transducer (L3-12A) was used to acquire 2DUS images, and a 3–14 MHz volume transducer (LV3-14) was used to acquire 3DUS images. The thyroid parenchyma background status was defined as normal or Hashimoto thyroiditis (thyroid peroxidase antibody [TPOAb] > 60 IU/mL, with or without anti-thyroglobulin antibody [TgAb] > 60 IU/mL).

Prior to the study, the two observers underwent a training session consisting of 20 unenrolled cases to familiarize themselves with 3D scanning and manual outlining. To obtain objective measurements, the two observers followed a standardized measurement protocol (Fig. 1):

Fig. 1 Measurement flowchart. 2DUS: two-dimensional ultrasound; 3DUS: three-dimensional ultrasound; D: maximum diameter; 2DV: volume measured by two-dimensional ultrasound; 3DV: volume measured by three-dimensional ultrasound

(1) Patients were scanned consecutively by the two observers. Only one observer was present in the examination room at any time. For each patient, each observer performed a complete new set of scans for the measurement, consisting of 2DUS and 3DUS, without knowledge of the other observer's results.

(2) During the examination, 2DUS was performed first. For each tumor, the location, composition, echogenicity, shape, margin and echogenic foci were evaluated [30]. The anteroposterior and transverse diameters of the tumor were measured on the transverse US image with the largest dimensions, and the longitudinal diameter was measured on the longitudinal US image with the largest dimensions. The tumor was measured with the calipers placed outside any visible halo [31]. All measurements were made to the nearest 0.01 cm. After the three diameters were measured, the largest was defined as the maximum diameter. The 2DUS volume of the tumor was calculated using the ellipsoid formula: V = πabc/6, where V is the 2DUS volume, a is the longitudinal diameter, b and c are the anteroposterior and transverse diameters, and π is taken as 3.1415. The three diameters were measured twice by each observer to obtain the mean maximum diameter and mean 2DUS volume.

(3) When the 2DUS examination was finished, 3DUS mode was activated to view the largest longitudinal image of the tumor. For 3D data acquisition, the entire tumor was scanned in a single sweep. Each observer scanned the tumor twice, and the 3DUS images were stored on the hard disk of the system for further analysis. The images were reviewed and measured in the same after all examinations were completed. The VOCAL method was used to reconstruct and postprocess the 3DUS images to calculate the volume. With three orthogonal slices displayed simultaneously, the longitudinal US image plane was selected as the A plane. The contour type was set to manual and the angle of rotation to 30°. Six slice images were then obtained, on which the contour lines of the tumor were traced manually. Once outlining was finished, the 3DUS volume was obtained automatically. For each tumor, the volume was measured twice to obtain the mean 3DUS volume for each observer.

(4) After the two observers finished their measurements, a total of six measurements were obtained for each tumor. The mean volume for each measurement modality was calculated from the means of the two observers. The measurement time of each modality was also recorded. The measurement time of 2DUS was defined as the time from the start of the 2DUS evaluation to the calculation of the 2DUS volume. The measurement time of 3DUS was defined as the time from 3DUS mode activation to the 3DUS volume being obtained by the VOCAL method.
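The 2DUS volume calculation in step (2) can be sketched in a few lines. This is a minimal illustration rather than the software used in the study: the function names are hypothetical, `math.pi` is used instead of the rounded value 3.1415 (the difference is negligible at three decimal places), and the repeated measurement in the example is invented for demonstration. The first diameter triple is taken from the Fig. 2 caption.

```python
import math


def ellipsoid_volume_cm3(longitudinal_cm, anteroposterior_cm, transverse_cm):
    """Ellipsoid-formula volume V = pi * a * b * c / 6, with diameters in cm."""
    return math.pi * longitudinal_cm * anteroposterior_cm * transverse_cm / 6.0


def mean_2dus_measurement(repeats):
    """Average the maximum diameter and the volume over repeated measurements.

    `repeats` is a list of (longitudinal, anteroposterior, transverse)
    diameter triples in cm, one triple per repeated measurement.
    """
    max_diameters = [max(triple) for triple in repeats]
    volumes = [ellipsoid_volume_cm3(*triple) for triple in repeats]
    n = len(repeats)
    return sum(max_diameters) / n, sum(volumes) / n


# First triple: diameters from the Fig. 2 caption.
# Second triple: hypothetical repeat, identical here for simplicity.
mean_diameter, mean_volume = mean_2dus_measurement(
    [(0.68, 0.65, 0.65), (0.68, 0.65, 0.65)]
)
print(f"maximum diameter: {mean_diameter:.2f} cm")
print(f"2DUS volume: {mean_volume:.3f} cm3")
```

For the Fig. 2 diameters this gives a volume of about 0.150 cm3, in line with the value reported in that caption.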

Statistical analysis

Statistical analysis was performed using SPSS statistical software (Version 25.0) and GraphPad Prism (Version 8.0.0). A difference with P < 0.05 was considered statistically significant. Normally distributed continuous variables were expressed as mean ± standard deviation and compared using the paired-samples t-test. Categorical variables were presented as numbers with percentages.

The inter-observer reliability was assessed using the ICC with 95% confidence intervals (CIs), based on absolute agreement and a two-way random effects model. Reliability was classified as follows: excellent (ICC > 0.90), good (ICC = 0.75–0.90), moderate (ICC = 0.5–0.74), and poor (ICC < 0.50) [32]. The inter-observer agreement was assessed using Bland-Altman analysis. Agreement was expressed as a bias with 95% limits of agreement (LOA). The bias was the tendency for one observer or modality to underestimate or overestimate the measurement relative to the other [33]. The LOA was the range within which 95% of the differences between measurements by the two observers or modalities would lie [34], and expressed the absolute magnitude of the agreement between the two observers or modalities. The width of the LOA varied with the precision of the measurements: it was wider when measurements were imprecise and narrower when they were precise [35]. The Kolmogorov-Smirnov test was used to assess the normality of the distribution before Bland-Altman analysis, and the measurements were expressed as the ratio between the two observers or modalities. Conclusions on agreement should be based on the width of the LOA in comparison with a priori defined clinical criteria [35, 36]. According to the 2015 American Thyroid Association (ATA) Guidelines [3], volume changes of less than 50% should be considered measurement variation. Therefore, agreement for volume was considered acceptable in this study when the LOA fell within 0.5 to 1.5.
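The ratio-based Bland-Altman analysis and the acceptance criterion can be sketched as follows. This is an illustration under stated assumptions, not the SPSS or GraphPad Prism procedure used in the study: it assumes the bias is the mean of the per-tumor ratios (observer A divided by observer B) and that the 95% LOA are the bias ± 1.96 standard deviations of those ratios. The function names and the example volumes are hypothetical.

```python
import statistics


def bland_altman_ratio(values_a, values_b):
    """Return (bias, lower LOA, upper LOA) for paired measurements.

    Assumption: agreement is summarized on the ratio a / b, with the
    bias as the mean ratio and the 95% LOA as bias +/- 1.96 * SD.
    """
    ratios = [a / b for a, b in zip(values_a, values_b)]
    bias = statistics.mean(ratios)
    sd = statistics.stdev(ratios)  # sample standard deviation
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def within_clinical_criterion(lower, upper, min_ratio=0.5, max_ratio=1.5):
    """Check whether the LOA lie inside the a priori range of 0.5 to 1.5."""
    return lower >= min_ratio and upper <= max_ratio


# Hypothetical volumes (cm3) of five tumors measured by two observers.
observer_a = [0.150, 0.095, 0.210, 0.120, 0.180]
observer_b = [0.140, 0.100, 0.200, 0.130, 0.170]

bias, lower, upper = bland_altman_ratio(observer_a, observer_b)
print(f"bias {bias:.3f}, 95% LOA {lower:.3f} to {upper:.3f}")
print("acceptable:", within_clinical_criterion(lower, upper))
```

Working on ratios rather than absolute differences makes the LOA directly comparable with a percentage-based criterion such as the 50% volume change.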

Results

A total of 51 patients (46 females, 5 males) with solitary PTMC were included in this study (Table 1). The measurements of PTMC by the two observers are summarized in Table 2. The mean maximum diameter measured by the two observers was 0.78 ± 0.14 cm. The mean 2DUS volume was significantly larger than the mean 3DUS volume (0.175 ± 0.078 cm³ vs. 0.163 ± 0.074 cm³, P = 0.005). Representative cases are shown in Figs. 2 and 3.

Table 1 Clinical characteristics of patients with PTMC
Table 2 The measurements of PTMC by the two observers
Fig. 2 The 2DUS and 3DUS images of a 51-year-old female with PTMC. A, B. The longitudinal and transverse 2DUS images showed a solid tumor located in the right lobe of the thyroid. The tumor size was 0.68 cm × 0.65 cm × 0.65 cm and the 2DUS volume was 0.150 cm³. C. In the measurement of 3DUS volume, the longitudinal US image plane was selected as the A plane, and a total of six slice images were obtained to manually trace the contour lines of the tumor. D. After outlining, the volume was obtained automatically with three orthogonal slices displayed simultaneously, and the 3DUS volume was 0.128 cm³

Fig. 3 The 2DUS and 3DUS images of a 36-year-old male with PTMC. A, B. The longitudinal and transverse 2DUS images showed a solid tumor located in the left lobe of the thyroid. The tumor size was 0.61 cm × 0.57 cm × 0.52 cm and the 2DUS volume was 0.095 cm³. C. In the measurement of 3DUS volume, the longitudinal US image plane was selected as the A plane, and a total of six slice images were obtained to manually trace the contour lines of the tumor. D. After outlining, the volume was obtained automatically with three orthogonal slices displayed simultaneously, and the 3DUS volume was 0.090 cm³

The measurement time of maximum diameter was 54.7 ± 4.8 s. The measurement time of 3DUS volume was significantly longer than that of 2DUS volume (918.85 ± 9.98 s vs. 424.35 ± 9.88 s, P < 0.001). The intra-observer reliability and agreement for each observer are shown in Supplementary Table 1.

Inter-observer reliability

The inter-observer reliability was excellent for all PTMC measurements. The ICCs for maximum diameter, 2DUS volume and 3DUS volume were 0.922 (0.864–0.955), 0.928 (0.874–0.959), and 0.974 (0.955–0.985), respectively. The ICC for volume between the two modalities was 0.955 (0.909–0.976).
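For readers who want to see how an absolute-agreement ICC is obtained from paired observer data, the following sketch computes the two-way random effects, absolute-agreement, single-measures form, often written ICC(2,1). Whether the study reported single or average measures is not stated, so the single-measures choice here is an assumption; the function name and the example data are hypothetical, and confidence intervals are not computed.

```python
def icc_absolute_agreement(ratings):
    """Two-way random effects, absolute agreement, single-measures ICC.

    `ratings` is a list of rows, one row per subject (tumor) and one
    column per observer. Mean squares come from a two-way ANOVA.
    """
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of observers
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                # between-subject mean square
    ms_cols = ss_cols / (k - 1)                # between-observer mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical 3DUS volumes (cm3): one row per tumor, columns = observers A, B.
volumes = [
    [0.128, 0.131],
    [0.090, 0.088],
    [0.205, 0.198],
    [0.160, 0.166],
    [0.075, 0.079],
]
print(f"ICC: {icc_absolute_agreement(volumes):.3f}")
```

Because this form penalizes systematic offsets between observers as well as random scatter, it is stricter than a consistency-type ICC.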

Inter-observer agreement

The inter-observer agreement of PTMC measurements is summarized in Table 3. The Bland-Altman analysis showed that the bias and 95% LOA of maximum diameter were 0.9869 (0.7956–1.178). This means that in about 95% of cases, the maximum diameter measured by observer A was between 0.7956 and 1.178 times the maximum diameter measured by observer B. The same interpretation applies to all LOA reported hereinafter. The bias and 95% LOA of 2DUS volume and 3DUS volume were 1.008 (0.5802–1.435) and 1.011 (0.7576–1.265), respectively. The widths of the 95% LOA of maximum diameter, 2DUS volume and 3DUS volume were 0.3824, 0.8548 and 0.5074, respectively. For the agreement between volumes measured by 2DUS and 3DUS, the bias was 1.096, which was above one, and the 95% LOA was from 0.7322 to 1.459. The Bland-Altman plots of PTMC measurements are shown in Figs. 4 and 5.
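The reported LOA widths follow directly from the limits, and the ratio limits can also be read as percentage deviations. The short sketch below reproduces that arithmetic from the values given in this section; the helper names are hypothetical.

```python
# 95% LOA (ratio of observer A to observer B) as reported in this section.
reported_loa = {
    "maximum diameter": (0.7956, 1.178),
    "2DUS volume": (0.5802, 1.435),
    "3DUS volume": (0.7576, 1.265),
}


def loa_width(lower, upper):
    """Width of the limits of agreement on the ratio scale."""
    return upper - lower


def as_percent_deviation(lower, upper):
    """Express ratio limits as percentage deviations from perfect agreement."""
    return (lower - 1.0) * 100.0, (upper - 1.0) * 100.0


widths = {name: round(loa_width(*limits), 4) for name, limits in reported_loa.items()}

for name, limits in reported_loa.items():
    low_pct, high_pct = as_percent_deviation(*limits)
    print(f"{name}: width {widths[name]:.4f}, {low_pct:+.1f}% to {high_pct:+.1f}%")
# Widths: 0.3824, 0.8548 and 0.5074, matching the values reported above.
```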

Table 3 The inter-observer agreement of 2DUS and 3DUS in measuring PTMC
Fig. 4 Bland-Altman plots of PTMC measurements by the two observers. A. Maximum diameter; B. 2DUS volume; C. 3DUS volume. The x-axes show the average of the measurements by the two observers. The y-axes show the ratio between the two observers. Solid lines indicate the mean ratio (bias). Top and bottom dashed lines correspond to the upper and lower margins of the 95% limits of agreement (LOA)

Fig. 5 Bland-Altman plots of volume measured by 2DUS and 3DUS. The x-axes show the average of the measurements by the two modalities. The y-axes show the ratio between the two modalities. Solid lines indicate the mean ratio (bias). Top and bottom dashed lines correspond to the upper and lower margins of the 95% limits of agreement (LOA)

Discussion

A reliable measurement of PTMC is very important, because tumor size can be an indication for conversion surgery during AS [5]. This prospective study found that the inter-observer reliability was excellent for all PTMC measurements. The inter-observer agreement (bias and 95% LOA) of maximum diameter, 2DUS volume and 3DUS volume was 0.9869 (0.7956–1.178), 1.008 (0.5802–1.435), and 1.011 (0.7576–1.265), respectively. According to our results, for patients with PTMC, any ratio difference from 0.7956 to 1.178 in maximum diameter, from 0.5802 to 1.435 in 2DUS volume, or from 0.7576 to 1.265 in 3DUS volume could be considered measurement variation. Among all the measurements, maximum diameter had the narrowest 95% LOA, suggesting that it had the lowest degree of observer variation. Moreover, compared with 2DUS volume, 3DUS volume was significantly smaller and had a narrower 95% LOA, suggesting that volume measured by 3DUS had lower variability and higher repeatability than that obtained by 2DUS.

Because of the indolent nature and favorable outcomes of low-risk PTMC, AS has been recommended as an alternative to immediate surgery for these patients [3, 5]. During AS, the evaluation of tumor enlargement is particularly important, as it can affect treatment decision-making [5]. Tumor enlargement was defined as growth of more than 3 mm in maximum diameter or a volume increase greater than 50% [5]. Over 5 years of AS, the incidence of a volume increase greater than 50% was 24.8–47.5%, and that of growth of 3.0 mm or more was 12.1–22.4% [12, 14]. However, the observer dependence and measurement variation of US for PTMC have not been taken into account when determining meaningful changes in tumor size [17].
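As a small illustration of how the two enlargement definitions cited above operate on follow-up data, the sketch below flags a tumor as enlarged when either criterion is met. The function name, the strict "greater than" comparisons, and the example values are assumptions made for illustration; this is not a clinical decision rule and does not account for measurement variation.

```python
def is_enlarged(baseline_diameter_mm, followup_diameter_mm,
                baseline_volume_cm3, followup_volume_cm3):
    """Flag enlargement by either cited definition.

    Assumed thresholds: growth of more than 3 mm in maximum diameter,
    or a volume increase greater than 50% relative to baseline.
    """
    diameter_growth_mm = followup_diameter_mm - baseline_diameter_mm
    volume_ratio = followup_volume_cm3 / baseline_volume_cm3
    return diameter_growth_mm > 3.0 or volume_ratio > 1.5


# Hypothetical follow-up: diameter grows by 1 mm, volume by 60%.
print(is_enlarged(7.0, 8.0, 0.10, 0.16))
```

In this hypothetical case the diameter criterion is not met but the volume criterion is, which mirrors the observation that volume-based enlargement is reported more often than diameter-based enlargement.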

To the best of our knowledge, only one study has evaluated the inter-observer variation of 2DUS measurement of PTMC. Its results showed that the 95% LOA of maximum diameter was from −26.6% to 24.5%, and that of volume was from −65.8% to 64.4% [19]. This suggested that the inter-observer variation of maximum diameter was smaller than that of volume, which was consistent with previous studies of 2DUS measurement variation for well-defined nodules [15, 17, 18]. Similar results were found in this study. This is because the 2DUS volume is obtained by multiplying three diameters in the ellipsoid formula, which compounds the observer variation in each diameter; as a result, an increase in volume is more likely to be detected than a small increase in diameter [14, 15].

This study also evaluated the inter-observer variation of volume measured by 3DUS. The inter-observer reliability of 3DUS volume was excellent, and the 95% LOA was from 0.7576 to 1.265. This indicated that the measurement was little affected by the observer, and that 3DUS could be a reliable and reproducible method for volume measurement of PTMC. Moreover, the bias between volumes measured by 2DUS and 3DUS was above one. Compared with 2DUS volume, 3DUS volume was significantly smaller and had a narrower 95% LOA. This suggested that volume was overestimated by 2DUS, and that volume measured by 3DUS was more reliable than that obtained by 2DUS, which was consistent with previous studies [21, 25, 26]. These results can be explained by the different measurement methods of the two modalities. 2DUS used the ellipsoid formula to calculate the volume, which assumes that the object is an ellipsoid [15]. However, because PTMC usually has an ill-defined or irregular margin rather than a smooth one, this method could overestimate the volume when the three diameters are multiplied in the calculation. As a result, the volume of an irregularly shaped tumor measured by 2DUS could have high inter-observer variation [25, 26]. In contrast, these deficiencies could be avoided when volume was measured using 3DUS, which has significant potential for increasing the reproducibility of volume measurement [20, 28]. Because 3DUS could easily obtain multiple slice images encompassing the entire tumor, the border of the tumor could be sensitively detected and manually outlined to calculate the final volume, even when the target was small [21].

Although 3DUS could accurately reflect changes in tumor size and identify tumor enlargement to inform the timing of conversion surgery during AS, it has drawbacks that limit its clinical application in routine evaluation. First, border identification by 3DUS could be subject to error, because not all slices had imaging quality as high as that of 2DUS. Although this study showed that the 95% LOA of 3DUS volume was within the clinical criteria, the measurements should still be interpreted with caution. Second, 3DUS required additional manual outlining of the tumor border after scanning, which was labor-intensive and time-consuming. Third, the 95% LOA of 3DUS volume was still wider than that of maximum diameter. This suggested that maximum diameter was not only a practical and simple measure of tumor enlargement, but also highly reproducible. However, the growth pattern of PTMC is complicated. Some studies reported that PTMC grew rather rapidly in the initial stages of progression, and that in many cases the growth slowed or even stopped at certain time points [13, 37]. Therefore, a comprehensive evaluation of tumor size during AS is needed, and 3DUS could provide more reliable estimates of tumor volume change.

This study had limitations. First, this study only evaluated the inter-observer measurement variation of PTMC; the true volume of the tumor was not obtained. Second, only one US machine was used in this study. In routine clinical practice, it is almost impossible to measure the tumor with the same US machine at every follow-up. Further studies using different machines are needed to confirm the results. Third, because only 9 patients in this study had Hashimoto thyroiditis, its impact on the measurement variation was not evaluated. Fourth, this study only enrolled patients with solitary PTMC. Further study is needed to investigate whether the results can be applied to multifocal PTMC.

Conclusions

The inter-observer reliability of PTMC measurements by 2DUS and 3DUS was excellent. For patients with PTMC, any ratio difference from 0.7956 to 1.178 in maximum diameter, from 0.5802 to 1.435 in 2DUS volume, or from 0.7576 to 1.265 in 3DUS volume could be considered measurement variation. Maximum diameter had the lowest degree of observer variation and was the more practical and simple measurement of PTMC. Volume measured by 3DUS had lower variability and higher repeatability than that measured by 2DUS, which might help provide more reliable estimates of tumor size for PTMC.