Background

Thyroid nodules are more and more often detected given the increasing use of imaging (about 41% of the population by ultrasound), Mostly incidentally and asymptomatic and only 10% malignant [1, 2].

Ultrasound is the most precise and cost-effective in their evaluation, whose objective is to distinguish between those benign that may be kept under surveillance from those with malignant features that require additional approaches. Fine-needle aspiration (FNA) has revealed high sensitivity and specificity in that distinction, with a rate of non-diagnostic results of only 2–16%. However, its performance must be selective in order to avoid unnecessary surgeries, which are not risk-free in nodules with indeterminate cytology (5–20%), of which only 20% are malignant. Thus, FNA indications should be based on sonographic stratification of the risk of malignancy, in conjunction with the clinical presentation and patient’s risk factors [1, 3, 4].

To balance the benefit of detecting clinically significant cancers with the risk and cost of FNA and treatment of benign nodules or indolent cancers, classification systems have arisen to group nodules in categories with equal percentage of malignancy risk, based on sonographic features. An ideal stratification system must recommend the least number of FNA possible, identifying most neoplasms [5].

The initial propose of Thyroid Imaging Reporting and Data System (TI-RADS) was developed in 2009 by Horvath et al., and later others emerged, the American College of Radiology (ACR) Thyroid Imaging Reporting and Data System (ACR TI-RADS) and the European Thyroid Imaging and Reporting Data System (EU-TIRADS) being among the most used. Their intent was to simplify the report of sonographic findings [2, 3, 6, 7].

These systems share some characteristics, but they also have some differences (Fig. 1). In ACR TI-RADS the nodules receive a sum of points assigned to 5 sonographic features (composition, echogenicity, morphology, margins, and echogenic foci). Based on their final score, one of five risk category is assigned, from TR1 (benign) to TR5 (highly suspicious of malignancy). In EU-TIRADS a specific feature instantly classifies the nodule into one category, eliminating the need of point summation and consequently simplifying the classification process and turning it less time-consuming. It only takes the presence of non-oval morphology, irregular margins, microcalcifications or marked hypoechogenicity to consider a nodule of high risk of malignancy (category 5). Inherent to its greater simplicity is the non-consideration of some potentially relevant features, namely the presence of macrocalcifications [3, 6].

Fig. 1
figure 1

Differences between ACR TI-RADS and EU-TIRADS classification systems. In ACR TI-RADS, the risk category is based on a sum of points assigned to 5 sonographic features. In EU-TIRADS a specific feature instantly classifies the nodule into one category. Adapted from references 3 and 6. Adapted from references 3 and 6

The goal of this study was to compare the performance of EU-TIRADS and ACR TI-RADS in terms of avoidance of unnecessary FNA and diagnostic performance in thyroid nodules submitted to ultrasound and FNA.

Methods

A retrospective analysis of thyroid nodules submitted to ultrasound and FNA at a tertiary-care institution between January 1st of 2016 and July 31st of 2019 was performed. Demographic data (sex and age) of patients were registered and sonographic images of the nodules available in the Hospital’s archive were analysed, without previous knowledge of the histological results. Ultrasound and FNA were performed by four different radiologists with 3 to 20 years of experience in thyroid imaging.

To each nodule the sonographic features evaluated included its size (largest axis in millimetres), location (right lobe, left lobe, isthmus), composition (cystic/ almost completely cystic, spongiform, mixed or solid/almost completely solid), echogenicity (anechoic, very hypoechoic, hypoechoic, isoechoic or hyperechoic relatively to the adjacent thyroid parenchyma), morphology (taller-than-wide or not), contour (smooth, ill-defined, lobulated/irregular, extra-thyroidal extension), presence of echogenic foci (none, with comet-tail artifact, macrocalcifications, peripheral calcifications or punctate echogenic foci) and the presence or absence of hypoechoic halo. In nodules located both in a lobe and isthmus its predominant location was considered. In nodules with mixed composition, the echogenicity of its solid component was considered. Based on these features, each nodule was classified according to ACR TI-RADS and EU-TIRADS, by means of retrospective analysis of the sonographic images. Subsequently, cytological results were obtained, including its classification by Bethesda System [8]. In surgically excised nodules (within a maxim period of three months after sonographic evaluation), the histopathological result was obtained.

Nodules < 10 mm (without indication to FNA in neither classification system) and those with non-diagnostic ytological results were excluded. Lastly, differences between FNA recommendations of both systems were analysed.

Statistical analysis was performed with the 23rd version of the Statistical Package for Social Science (SPSS). Chi-square (χ2) and Fisher exact tests were used to correlate categorical variables, as indicated. It was considered statistically significant a p-value < 0.05.

This retrospective study was approved by our institution’s Ethics Committee and the requirement for informed consent from patients was waived.

Results

Study sample

During study’s period, a total of 701 thyroid nodules of 646 patients were submitted to ultrasound and FNA. Eleven nodules under 10 mm were excluded, as were 25 with non-diagnostic cytological results. The final sample included 665 nodules in 598 patients, 556 females (83.6%) and 109 males (16.4%), with an average (± Standard Deviation) age of 59.1 ± 15.6 years old.

Cytological result according to was atypia of undetermined significance/follicular lesion of undetermined significance (AUS/FLUS) in 63 (9.5%), follicular neoplasm or suspicious for a follicular neoplasm in 20 (3%), suspicious for malignancy in 3 (0.5%), malignant in 9 (1.4%), and benign in the remainder 570 (85.7%), mainly colloid or hyperplasic adenomatous nodules. The proportion of nodules within each diagnostic category of the Bethesda System for Reporting Thyroid Cytopathology is represented in Fig. 2. Descriptive analysis of sonographic features of thyroid nodules is detailed in Table 1.

Fig. 2
figure 2

Diagnostic Categories of the Bethesda System for Reporting Thyroid Cytopathology

Table 1 Descriptive analysis of sonographic features of thyroid nodules

Of the 665 nodules studied, 75 were surgically removed, including 6 Bethesda VI, 2 Bethesda V, 15 Bethesda IV, 20 Bethesda III, and 32 Bethesda II nodules. All specimens were evaluated by the same pathology team. Malignancy was reported in 23 of them, which included 6 Bethesda VI (3 classic and 3 follicular variant), 1 Bethesda V (classic variant papillary carcinoma), 5 Bethesda IV, 4 Bethesda III (follicular carcinoma) and 7 Bethesda II (incidental papillary tumours). Therefore, in the analysis, the malignant group includes the 16 surgically confirmed (excluding the incidental tumours) (Fig. 3).

Fig. 3
figure 3

Final study sample selection after applying exclusion criteria (size < 10 mm, non-diagnostic cytology). AUS/FLUS = atypia of undetermined significance/follicular lesion of undetermined significance

Presumed reasons for the surgical resection of nodules without malignant histology include negative compressive or aesthetic effects (their average size was 40 mm), or fear of malignancy.

Notably, among the nodules with non-diagnostic citology histology (n = 25) not included in the analysis, an incidental papillary carcinoma was found in the specimen.

Comparison between ACR and EU TI-RADS

A statistically significant association was found between equivalent risk categories of ACR TI-RADS and EU-TIRADS (p < 0.001). Categories obtained with both systems are represented in Fig. 4. Using EU-TIRADS the number of nodules classified as TI-RADS 5 was higher (n = 144) in comparison with ACR TI-RADS (n = 67).

Fig. 4
figure 4

Risk Categories of the thyroid nodules assigned according to ACR TI-RADS (purple) and EU-TIRADS (blue)

Regarding recommendations to perform FNA according to the nodules’ size, there would have been a reduction of 32% (n = 213) and 24.5% (n = 163) in FNA performed according to ACR TI-RADS and EU-TIRADS, respectively.

Both ACR and EU-TIRADS classification systems presented a statistically significant association with Bethesda classification levels < III/\(\ge\) III (p = 0.005 and p < 0.001, respectively, Table 2).

Table 2 Correlation between ACR and EU-TIRADS classification systems and Bethesda classification levels < III/\(\ge\) III

Excluding risk category EU-TIRADS 1 and combining categories ACR TI-RADS 1 and 2 in a category of benign or non-suspicious nodules, equivalent to the category EU-TIRADS 2, a different risk category was obtained in 174 nodules (26.1%), mainly with an upgrade of the EU-TIRADS category (Table 3, Fig. 5). Regarding the size of the nodules, that discrepancy would change the indication to perform FNA in 54 (8.1%). In 49 of those, FNA would only be indicated following EU-TIRADS recommendations, whereas in 5 it would only be indicated following ACR TI-RADS. In the remaining 491 nodules (73.8%), the level of risk was identical independently of the classification system applied.

Table 3 Sonographic features of nodules that obtained different risk categories according to ACR TI-RADS and EU-TIRADS
Fig. 5
figure 5

Representative images of nodules with different risk categories according to ACR TI-RADS and EU-TIRADS. A – nodule with mixed composition with isoechoic solid component (category ACR 2/EU 3); B – solid nodule markedly hypoechoic (category ACR 4/ EU 5); C – solid isoechoic nodule with punctate echogenic foci (arrow, category ACR 4/EU 5); D – solid hypoechoic nodule with taller-than-wide morphology (category ACR 4/ EU 5); E – solid hypoechoic nodule with posterior bulging of thyroid’s contour, indicative of extrathyroidal extension (category ACR 5/EU 4)

Only 4 of the cases with a discrepancy in the recommendation to perform FNA (3 only indicated following EU TI-RADS and 1 following ACR TIRADS) had Bethesda IV cytology, of which only 2 were surgically removed, with benign results (although an incidental papillary tumour was found). Five nodules with similar discrepancy (3 only indicated following EU TI-RADS and 2 following ACR TIRADS) had Bethesda III cytology, of which only one was removed, with benign results. In the remaining nodules, when the indication to perform FNA was different, the cytological results were Bethesda II, of which 7 were surgically removed. Interestingly, in 2 of those (with FNA only recommended following EU-TIRADS), incidental papillary microcarcinomas in surgical specimens.

Considering only the 75 nodules that were surgically removed (16 malignant and 59 benign), only 2 of the 18 nodules that wouldn’t have indication to perform FNA according to ACR TI-RADS proved to be malignant (11.1% false negatives) and 14 of the 57 nodules with indication (24.6% true positives). Following EU-TIRADS recommendations, 2 of the 13 nodules without indication to FNA were malignant (15.4% false negatives), as well as 14 of the 62 with indication (22.6% true positives). One papillary carcinoma (classic variant) wouldn’t be detected following both system’s recommendations (isoechoic solid nodule, with no other suspicious features).

Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and accuracy of both classification systems in the detection of malignant thyroid nodules are represented in Table 4. Both exhibit a good sensitivity and NPV but weak specificity and PPV. ACR TI-RADS turned out to be somewhat more specific (27.1% vs 18.6%) and accurate (40% vs 33.3%). Thus, ACR TI-RADS appears more efficient in the exclusion of malignancy, although the difference between the performance of both is small.

Table 4 Sensitivity, specificity, PPV, NPV and accuracy of ACR TI-RADS and EU-TIRADS, according to FNA recommendation

Discussion

Ultrasound is indicated in the initial evaluation and follow-up of thyroid nodules. Classification and malignancy risk stratification systems based on sonographic features of those nodules allow improvement in interobserver reproducibility in their description, simplification in communicating the results, and aiding in the decision whether to perform FNA or not, minimizing unnecessary punctions [3]. ACR TI-RADS incorporates all the sonographic features of the nodules, scored based on their malignant potential, making this system objective and detailed. EU-TIRADS is simpler, including only 4 sonographic features indicative of a high risk of malignancy.

A common limitation to both classification systems is the potential difficulty in interpreting some of the nodules’ features, as the identification of a spongiform composition and distinguishing between punctate echogenic foci that represent microcalcifications from comet-tail artifacts that reflect colloid crystals. However, the agreement in the final TI-RADS category is usually more consistent between readers than the agreement regarding individual features. To reduce the interobserver variability, radiologists’ meetings to discuss discrepant cases and obtaining second opinions in cases of uncertainty could be useful [3, 6, 9].

Regarding nodules’ size, although there are studies in which larger nodules were associated with a higher probability of malignancy, others revealed an inverse association with the size [10,11,12]. Thus, nodule size is not considered useful in distinguishing between benign and malignant nodules, and in both ACR TI-RADS and EU-TIRADS, it is a criterion to recommend FNA but not to stratify the risk of malignancy [3, 6, 13]. In fact, FNA of TR5 nodules with 5 to 9 mm in size may be appropriate under some circumstances, and the decision to perform FNA must be based not only on the nodule’s size but also on the clinical risk factors (for example, it doesn't make sense in inoperable patients, or those with a low life expectancy because of other comorbidities) and patient’s desire [3, 6].

Considering a nodule with mixed echogenicity, ACR suggests that it should be described as “predominantly” hiper-, iso-, or hypoechoic. In EU-TIRADS, it is the echogenicity of the solid component that distinguishes between low risk and intermediate-risk categories (it only takes any hypoechoic portion to be considered intermediate-risk, as it is only required a very hypoechoic portion to be considered high-risk). Of the nodules with a different ACR TI-RADS and EU-TIRADS classification in our study, in most cases (75.3%, n = 131) such discrepancy was due to the nodule’s echogenicity, with a higher risk category using EU-TIRADS. Among those, FNA was only recommended according to EU-TIRADS in 45, two of which proved had Bethesda IV cytology (not surgically removed for pathology confirmation). The remaining were benign, with 2 incidental papillary tumours being found in surgical specimens. Despite the consideration of the echogenicity alone increased the risk category in EU-TIRADS (thus reducing the dimensional cut-off to recommend FNA), the authors of this system make the reservation that this feature must be combined with the presence of other features that reduce or increase the risk of malignancy [3].

Both ACR and ETA consider microcalcifications a feature of high risk of malignancy. ACR TI-RADS assigns 3 points to the nodule and EU-TIRADS categorize as risk category 5, regardless of its other features. For this reason, the presence of such foci in 8 nodules of our study, considered moderately suspicious according to ACR TI-RADS, turned them of high-risk according to EU-TIRADS. Two of them would only have an indication to perform FNA according to EU-TIRADS, one of which had Bethesda IV cytology, whose removal revealed only an incidental papillary tumour.

The point assigned to the presence of macrocalcifications in ACR TI-RADS, contributed to the fact that 15 solid isoechoic nodules were considered moderately suspicious, whereas according to EU-TIRADS they were low-risk. Five of them would only have an indication to perform FNA according to ACR TI-RADS, one of them with Bethesda IV The distinction between irregular and lobular margins is a matter of discussion, representing the sonographic feature with greater interobserver variability [14]. Despite that, the presence of an irregular margin is a highly precise marker of malignancy. ACR TI-RADS assigns 2 points to the nodules, whereas EU-TIRADS puts it automatically in a high-risk category. For this reason, 4 of the analyzed nodules obtained an ACR TI-RADS 4 e EU-TIRADS 5 classification (all benign).

Both ACR and ETA consider the presence of extrathyroidal extension (bulging, protrusion or disruption of the capsular margin) a highly reliable sign of malignancy and unfavorable prognosis. Although it only assigns 4 points in the ACR TI-RADS, it is not included in the objective criteria of risk stratification in EU-TIRADS [3, 6]. For this reason, 9 nodules with capsular bulging in our study had a higher risk category with ACR TI-RADS. None of them was malignant.

Regarding nodular morphology, ACR considers that the taller-than-wide appearance is a poorly sensitive indicator of malignancy, but highly specific. It seems to be due to the compressibility of the nodule, reflecting a centrifugal growth, as opposed to the growth of benign nodules, parallel to the normal thyroid tissue plane [12, 15]. In ACR TI-RADS 3 points are assigned to taller-than-wide nodules, which are instantaneously considered high risk in EU-TIRADS. For this motive, 7 nodules in our study had a higher risk category with EU-TIRADS, without discrepancy in FNA indication.

In the present study, both systems allowed a significant reduction in the number of FNA performed (specially ACR TI-RADS), at the cost of non-diagnosing very few malignant nodules. The authors of a prospective study with 502 nodules that showed similar results suggest that the greater reduction of FNA with ACR TI-RADS reflects the greater dimensional cut-off to recommend it in the low-risk nodules (25 mm instead of 20 mm). Such value was chosen given the fact that follicular cancers < 2 cm rarely present with distant metastasis [6, 16]. In our study, that was the case of 17 nodules low-risk in both classifications, measuring between 20 and 25 mm, thus only having indication to perform FNA according to EU-TIRADS.

As previously mentioned, the main cause of discrepant categories, responsible for a greater category according to EU-TIRADS and consequently indication to perform FNA in a greater number of nodules using this system, was the fact that the hypoechogenicity of a mixed nodule confers it moderate risk, whereas according to ACR TI-RADS it would have been considered only low-risk. Another explanation for the discrepancy is the application of isolated sonographic features, namely very low echogenicity, presence of microcalcifications, and irregular margins to directly assign a high-risk category to a nodule in EU-TIRADS (resulting in a greater number of nodules with this category), whereas in ACR TI-RADS it is necessary a group of potentially suspicious features to consider a nodule of high-risk. It is important to state that, among nodules with indication to perform FNA only according to EU-TIRADS, only 3 revealed Bethesda IV cytology, of which only one was surgically removed, whose specimen revealed an incidental papillary tumour (as was the case of the one nodule with indication to perform FNA only according to ACR TIRADS, whose specimen revealed benignity). Notably, 2 incidental papillary microcarcinomas were found in surgical specimens of after benign cytology of nodules with indication to perform FNA only according to EU-TIRADS.

A systematic review and meta-analysis of 29 studies comparing the diagnostic performance in the detection of thyroid cancer of four US-based risk stratification systems in a total of 33 748 thyroid nodules reported a higher pooled sensitivity (with a larger difference considering only category 5 nodules) and slightly lower specificity for EU-TIRADS, compared with ACR TIRADS, although the overall diagnostic performance of all systems was comparable [17]. Our results show equally good sensitivity and NPV between ACR TIRADS and EU-TIRADS, although ACR TI-RADS was slightly more specific (27.1% vs 18.6%) and accurate (40% vs 33.3%). Still, the differences found between the performance of both were very small.

Limitations

The main limitations of this study include its retrospective nature, limiting the categorization of nodules to the sonographic images available, as well as the fact that the sonographic evaluation was performed by different operators and ultrasound machines, which could have influenced the image-based nodule characterization and classification and consequently, the results. Besides, the initial sample was comprised by a group of patients referred by doctors of different specialties to perform FNA and may therefore not constitute a representative sample of the general population. The number of pathology proven nodules was small, of which few were malignant, although with a relatively similar proportion of follicular and papillary carcinomas. These limitations are overcome with larger size controlled prospective studies.

Conclusions

Classification and malignancy risk stratification systems of thyroid nodules based on sonographic features allow the limitation of FNA performance to the really necessary cases, in a uniformized and simplified manner. Data analysed in this study revealed a statistically significant association between ACR TI-RADS and EU-TIRADS and both can be considered appropriate to stratify malignancy risk of thyroid nodules. ACR TI-RADS allowed a greater reduction of FNA performed, without significant loss in the detection of malignant nodules.