Background

Thyroid nodules are common clinically (3–7% on palpation), radiologically (20–76% on ultrasonography), and histopathologically (> 50% in surgical specimens). Only 4–6% of thyroid nodules are malignant; however, excluding malignancy remains the main challenge in their assessment. Malignant nodules run in families; therefore, an accurate, affordable, and noninvasive diagnostic test is needed for screening and for effectively differentiating benign from malignant thyroid nodules [1, 2].

Ultrasound, while widely used for thyroid nodule assessment, is limited in its ability to distinguish benign from malignant nodules accurately. A definitive diagnosis of thyroid cancer requires further investigation, often through fine-needle aspiration cytology (FNAC), which is highly accurate (up to 97%). However, up to 16% of FNACs may require a repeat biopsy due to inconclusive results [3, 4].

The diagnostic accuracy of US elastography combined with gray-scale ultrasound is still controversial. Developed in 1991, elastography leverages the principle that softer tissues, such as benign lesions, deform more easily under external force than stiffer ones, such as malignancies [5]. US elastography, also called “electronic palpation,” gives an objective assessment of tissue stiffness. Two types of elastography are currently in clinical use: strain elastography and shear wave elastography. The final diagnosis usually depends on the combination of conventional US, elastography, and fine-needle aspiration cytology [6].

This study investigated how well US elastography can diagnose malignant thyroid nodules, compared to traditional ultrasound and fine-needle aspiration, using the postoperative pathology as the gold standard.

Methods

We prospectively included 104 patients. They were recruited from the Otolaryngology Department and the Diagnostic and Interventional Radiology Unit at Al-Kasr Al-Ainy Hospital, Cairo University, Egypt, between January 2021 and September 2022. The study was performed after approval by the Institutional Review Board of our hospital (IRB no. MD-15–2021) and followed the principles of the Helsinki Declaration of 1975.

This study included individuals of any age or gender with thyroid nodules and normal thyroid function who were eligible for any thyroidectomy procedure. We excluded recurrent cases, patients with a history of iodine radiation, those with diffuse thyroid lesions, those with purely cystic nodules, and those who were unfit for surgery or refused it.

Assessment of the patients included detailed history taking, vocal cord mobility assessment, head and neck examination, and the following investigations: routine preoperative investigations, thyroid profile, conventional ultrasonography (US), shear wave elastography (SWE), fine-needle aspiration cytology (FNAC), and CT neck in cases with retrosternal extension.

Conventional ultrasound was performed using B-mode (Fig. 1) and color Doppler techniques to assess the thyroid nodules. Each nodule’s characteristics, including number, dimensions, margin, shape, calcification, composition, and echogenicity, were analyzed and used to assign a risk score based on the ACR TI-RADS criteria. On color Doppler, nodule vascularity was classified into four categories: avascular, peripheral only, internal only, and combined peripheral and internal.

Fig. 1 B-mode US of the right thyroid lobe

SWE was then performed using a Toshiba Aplio 500 machine (Toshiba Medical Systems, Tochigi-ken, Japan) with a linear probe with a central frequency of 7.5 MHz (Fig. 2). All SWE examinations were performed by the same operator using the same US equipment, with the transducer held stable and pressure-free for 3 s to minimize compression artifacts and accurately measure the tissue stiffness of the target nodules.

Fig. 2 Elastography of the left thyroid lobe nodule

SWE mode was displayed over the B-mode image. The color scale ranged from blue for soft lesions to red for hard ones. Two regions of interest (ROIs) of equal size were chosen: the first in the stiffest area of the nodule and adjacent parenchyma, avoiding cystic components and calcifications where possible, and the second in normal thyroid tissue or on the sternocleidomastoid muscle. The average SWE values were measured and displayed in kilopascals (kPa), and the elastic ratio (ER) was then calculated by the software. The ER is the ratio of the mean stiffness of the lesion to that of the normal parenchyma. Three measurements were taken in each nodule, and the best of them was used in this study. When a nodule was too large to be scanned at once, measurements were taken from different regions. Both elastic mean (E-mean) and elastic ratio (ER) are semiquantitative elastic indices (EIs), meaning they provide relative rather than absolute measures of tissue stiffness.
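
The ER calculation described above can be sketched in a few lines of Python. This is only an illustration and not the scanner's software: the function name `elastic_ratio` and the stiffness readings are hypothetical, and the reference ROI is assumed to have a positive mean stiffness.

```python
def elastic_ratio(lesion_kpa: float, reference_kpa: float) -> float:
    """Ratio of mean lesion stiffness to mean reference-tissue stiffness (both in kPa)."""
    if reference_kpa <= 0:
        raise ValueError("reference stiffness must be positive")
    return lesion_kpa / reference_kpa


# Hypothetical readings for illustration; not patient data from this study.
er = elastic_ratio(lesion_kpa=60.0, reference_kpa=20.0)
print(f"ER = {er:.1f}")  # prints "ER = 3.0"
```

Because the ER is a ratio of two measurements taken in the same acquisition, it expresses lesion stiffness relative to the patient's own reference tissue rather than as an absolute value.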

FNAC was performed in nodules classified as TR3 (if size ≥ 2.5 cm), TR4 (if size ≥ 1.5 cm), and TR5 (if size ≥ 1 cm). The samples obtained were examined cytopathologically using the updated 2017 Bethesda system. Thyroid surgery was performed selectively based on the results of FNAC. This included cases with suspicious nodules (Bethesda IV and V), benign nodules causing bothersome symptoms (Bethesda II), and those with indeterminate cytology results (Bethesda III).
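
The size criteria for FNAC listed above amount to a simple lookup rule, sketched here in Python. The names `FNAC_MIN_SIZE_CM` and `fnac_indicated` are hypothetical, sizes are assumed to be in centimetres (the unit used by ACR TI-RADS), and TR1 and TR2 nodules are assumed not to be sampled.

```python
# Minimum nodule size for FNAC by TI-RADS level, as listed in the text.
# Assumption: sizes are in centimetres.
FNAC_MIN_SIZE_CM = {3: 2.5, 4: 1.5, 5: 1.0}


def fnac_indicated(tirads_level: int, size_cm: float) -> bool:
    """Return True if a nodule meets the size threshold for its TI-RADS level."""
    threshold = FNAC_MIN_SIZE_CM.get(tirads_level)
    # Levels without a listed threshold (TR1, TR2) are not sampled in this sketch.
    return threshold is not None and size_cm >= threshold


# Hypothetical nodules for illustration.
print(fnac_indicated(4, 1.8))  # TR4 nodule of 1.8 cm
print(fnac_indicated(3, 2.0))  # TR3 nodule of 2.0 cm
```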

The surgical specimens were sent for histopathological examination, and the postoperative pathology results were correlated with the preoperative investigations, including conventional US, SWE, and FNAC.

Statistical methods

The Statistical Package for the Social Sciences (SPSS), version 25 for Windows (SPSS Inc., Chicago, IL, USA), was used for data analysis. Numerical data were presented as median, range, mean, and standard deviation, while categorical data were presented as numbers and percentages. Numerical variables were compared using Student’s unpaired t-test or the Mann–Whitney U-test, while categorical ones were compared using the chi-square test or Fisher’s exact test. The diagnostic performance of each method was computed in terms of sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and overall accuracy. P-values less than 0.05 were considered significant.
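
For readers less familiar with these indices, the sketch below shows how they follow from a 2 × 2 table of test results against the reference standard. It is a generic illustration rather than the SPSS procedure used in the study; the function name `diagnostic_metrics` and the counts passed to it are hypothetical.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute diagnostic indices from a 2x2 table against the reference standard."""
    return {
        "sensitivity": tp / (tp + fn),   # malignant nodules correctly flagged
        "specificity": tn / (tn + fp),   # benign nodules correctly cleared
        "ppv": tp / (tp + fp),           # positive calls that are malignant
        "npv": tn / (tn + fn),           # negative calls that are benign
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts for illustration only; not taken from this study's tables.
metrics = diagnostic_metrics(tp=45, fp=20, fn=5, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```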

The criteria and cutoff values used for prediction of malignancy were solid content, hypo-echogenicity, microcalcifications, taller-than-wide shape, irregular margin, TI-RADS ≥ 3, internal vascularity, E-mean ≥ 34.3 kPa, ER ≥ 2.6, and Bethesda category ≥ 3. In addition, the tandem method was used to analyze SWE and TI-RADS in combination; with this method, the test is considered positive only when both tools are positive.

Results

Demographic and clinicopathologic characters

This study included 104 patients with thyroid nodules, 20 males (19%) and 84 females (81%). Their age range was 30–69 years with a mean of 46 years.

Seventy-six patients (74%) presented with solitary nodules. The mean nodule size was 27.8 ± 6.9 mm (SD) in malignant nodules, compared to 29 ± 5 mm (SD) in benign ones (p = 0.4).

Postoperative histopathology showed papillary thyroid carcinoma (n = 24), follicular carcinoma (n = 5), medullary carcinoma (n = 3), undifferentiated carcinoma (n = 1), follicular adenoma (n = 17), multinodular goiter (n = 42), and chronic lymphocytic thyroiditis (n = 12) (Fig. 3).

Fig. 3 Postoperative histopathology

Diagnostic performance of ultrasound (Table 1)

Table 1 Diagnostic performance of ultrasound

Ultrasound features such as composition, echogenicity, calcification, TI-RADS score, and vascularity differed significantly between benign and malignant thyroid nodules. Among individual ultrasound features, vascularity offered the highest specificity (100%) for discriminating between benign and malignant nodules, followed by calcification (99%), shape (94%), and echogenicity (91%). Composition demonstrated the highest sensitivity (91%) among all the parameters assessed. The overall accuracy of ultrasound in predicting malignancy based on the TI-RADS score was 84%.

Diagnostic performance of SWE (Tables 2 and 3)

Table 2 The results of SWE in benign and malignant nodules
Table 3 Diagnostic performance of SWE

Malignant nodules showed significantly higher E-mean and ER than benign ones (p < 0.001).

The average E-mean was 26.9 ± 19.5 kPa in benign nodules, compared to 69.5 ± 27 kPa in malignant nodules (p < 0.001).

The average SWE ratio (ER) was 1.9 ± 1.1 in benign nodules, compared to 5.5 ± 2.1 in malignant ones (p < 0.001). The SWE ratio showed a diagnostic sensitivity and specificity of 91% and 86%, respectively.

Diagnostic performance of SWE and US in combination (Tables 4 and 5)

Table 4 Diagnostic performance of SWE and ultrasound in combination (parallel method)
Table 5 Diagnostic performance of SWE and ultrasound in combination (tandem method)

In the parallel method, the result was considered positive if either or both methods (SWE or TI-RADS) were positive and negative only when both methods were negative.

In contrast, the tandem method combined the results of both methods into a single classification of the nodule, in which the result was considered positive only when both SWE and TI-RADS were positive.
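
The two combination rules can be written as simple logical operations, as in the Python sketch below. The cutoffs are those given in the statistical methods (ER ≥ 2.6, TI-RADS ≥ 3); using the ER as the SWE index for the combination is an assumption, and all function names and example nodules are hypothetical.

```python
ER_CUTOFF = 2.6      # ER cutoff from the statistical methods
TIRADS_CUTOFF = 3    # TI-RADS cutoff from the statistical methods


def swe_positive(elastic_ratio: float) -> bool:
    """SWE is positive when the ER reaches the cutoff (assumed SWE index)."""
    return elastic_ratio >= ER_CUTOFF


def tirads_positive(tirads_level: int) -> bool:
    """TI-RADS is positive when the level reaches the cutoff."""
    return tirads_level >= TIRADS_CUTOFF


def parallel_positive(elastic_ratio: float, tirads_level: int) -> bool:
    """Parallel rule: positive if either test is positive."""
    return swe_positive(elastic_ratio) or tirads_positive(tirads_level)


def tandem_positive(elastic_ratio: float, tirads_level: int) -> bool:
    """Tandem rule: positive only if both tests are positive."""
    return swe_positive(elastic_ratio) and tirads_positive(tirads_level)


# Hypothetical nodules (ER, TI-RADS level) for illustration.
for er, level in [(3.1, 2), (1.5, 2), (3.1, 4)]:
    print(er, level, parallel_positive(er, level), tandem_positive(er, level))
```

A nodule with discordant results, such as the first example, is positive under the parallel rule and negative under the tandem rule, which is why the parallel rule favors sensitivity and the tandem rule favors specificity.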

Diagnostic performance of FNAC (Fig. 4, Table 6)

Fig. 4 Distribution of nodules among Bethesda groups

Table 6 The malignancy rate in each Bethesda Group

On FNAC, 49 nodules (47%) were Category II (benign), while 48 nodules (46.2%) were Category V or VI (suspicious for malignancy or malignant), according to the Bethesda system. FNAC demonstrated excellent diagnostic performance for Bethesda categories V and VI, with malignancy rates of 97% and 100%, respectively.

Comparison of diagnostic performance of US, SWE, and FNAC (Table 7)

Table 7 Comparison of diagnostic performance of US, SWE, and FNAC

FNAC showed the highest specificity (99%), PPV (97%), and overall accuracy (96%) of all methods, and its NPV was similar to that of the combined US and SWE method.

The combination of US and SWE achieved the highest sensitivity (94%) of all methods but the lowest specificity (69%).

Discussion

Thyroid nodules are highly prevalent, so malignant nodules must be differentiated from benign ones. Because clinical examination is limited in this respect, ultrasonography and radioisotope scanning serve as initial modalities for stratifying nodules that warrant further histopathological assessment [7]. Among elastography techniques, shear wave elastography stands out for its accuracy in evaluating thyroid nodules. Its high sensitivity and specificity help decrease the number of patients undergoing unnecessary invasive procedures [8].

In this study, the most specific US features of malignancy were internal vascularity (100%), microcalcifications (99%), hypo-echogenicity (94%), and shape of lesion (91%). However, none of them showed high sensitivity (55%, 36%, 67%, and 12%, respectively). Conversely, the solid content of the nodule showed the highest sensitivity (91%) but very low specificity (30%).

This was concordant with a previous study that described internal vascularity and hypo-echogenicity as the most specific US features of malignancy (100% and 90.9%, respectively), although their sensitivity was low (54.5% and 66.6%, respectively) [9]. Other authors found microcalcification to be the strongest sonographic predictor of malignancy (85% sensitivity and 75.6% specificity). This discrepancy could be explained by differences in sample size between studies: they used a considerably larger group (313 patients), in which the proportion of malignant nodules was higher [10].

No single US criterion has high accuracy in distinguishing benign from malignant nodules. However, the sensitivity and specificity of the diagnosis can be improved by using combined criteria [10, 11]. TI-RADS has evolved over time to improve accuracy in identifying malignant thyroid nodules. Initially introduced with good performance (sensitivity and NPV of 88%), it was later refined, reaching a sensitivity of 95.7% and an NPV of 99.7%. Our study used the latest version and found an overall sensitivity of 85% and an NPV of 92% [12, 13].

Unfortunately, the increased sensitivity came at the expense of specificity, which was 83% in the current study, with a PPV of only 70%. Other studies likewise reported low specificity. This low specificity explains the high incidence of benign cytology (87–91%) found in most of the nodules assessed by FNAC on the basis of US suspicion only [14, 15]. Moreover, no unified ultrasonographic standard for the detection of thyroid cancer exists yet [10].

In this study, E-mean values were significantly higher in malignant nodules (p < 0.001), in line with a meta-analysis (84.3% sensitivity and 88.4% specificity) [16]. Another study also reported a higher E-mean in thyroid carcinoma, with a diagnostic specificity of 86.4%, comparable to the specificity of 83% for E-mean in our study [17].

Among the tested SWE EIs, the elasticity ratio (ER) was chosen as the index for distinguishing malignant from benign thyroid nodules because of its higher sensitivity and specificity. At a cutoff of 2.6, the ER provided a sensitivity, specificity, PPV, and NPV of 91%, 86%, 75%, and 95%, respectively. This was concordant with many other studies that showed high accuracy for the ER, which was significantly higher in malignant nodules [10, 18].
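
As a consistency check, the reported PPV and NPV of the ER can be recovered from its sensitivity and specificity using Bayes' theorem. The Python sketch below assumes a malignancy prevalence of 33/104, obtained by summing the malignant histopathology counts in the Results; the function name `predictive_values` is hypothetical.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Derive PPV and NPV from sensitivity, specificity, and disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# 33 of 104 nodules were malignant on postoperative histopathology (24 + 5 + 3 + 1).
prevalence = 33 / 104
ppv, npv = predictive_values(sensitivity=0.91, specificity=0.86, prevalence=prevalence)
print(f"PPV = {ppv:.0%}, NPV = {npv:.0%}")  # prints "PPV = 75%, NPV = 95%"
```

Because PPV and NPV depend on prevalence, the same cutoff would yield different predictive values in a population with a lower proportion of malignant nodules.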

The combination of elastography with the TI-RADS score has been studied and showed better sensitivity and NPV than SWE or TI-RADS alone in some studies [19, 20]. In the current study, higher values were also obtained (sensitivity and NPV increased to 94% and 96%, respectively).

Results varied among studies, with sensitivity ranging from 86% to 97% and specificity from 66% to 100% [21, 22].

Specificity differed greatly between studies. It was higher in retrospective studies than in prospective ones (average of 92% vs 79%, respectively), most probably because of selection bias [23].

In addition, although the combination of SWE and TI-RADS yielded higher sensitivity, as reported earlier, it decreased specificity in some studies [18, 20] and in the current study (from 86% for SWE alone to 69% for the combination).

In an attempt to achieve better specificity, we tested the tandem method, in which the test is considered positive only when both SWE and TI-RADS are positive. With this method, we achieved the highest values of specificity and PPV, 97% and 93%, respectively. Unfortunately, sensitivity decreased to its lowest value (79%). The same finding was reported previously [10].

Our results suggest a potential complementary relationship between SWE and TI-RADS. TI-RADS may compensate for limitations of SWE such as calcifications and carotid pulsations, while SWE may mitigate potential operator and observer variability associated with TI-RADS.

While this study provided insights into SWE parameters for papillary carcinomas and common benign nodules, its single-center design and limited representation of other malignancies, such as thyroid lymphomas and Hürthle cell carcinomas, restrict its applicability to a broader range of thyroid pathologies.

While inter/intra-observer variability could be a challenge in interpreting ultrasound evaluations, we reduced its impact in this study by consistently using the same ultrasound machine, settings, and experienced operator. This helped ensure standardization and minimize potential discrepancies.

Conclusion

Combining SWE and TI-RADS could enhance diagnostic accuracy in differentiating thyroid nodules by leveraging their complementary strengths. This noninvasive approach could help assess malignancy risk accurately and reduce the need for invasive procedures, although further research may be necessary.