Background

Gastrointestinal stromal tumors (GISTs) are the most common sarcoma of the gastrointestinal tract, occurring mostly in the muscular wall of the stomach or small bowel, where it is felt to arise from the interstitial cells of Cajal or similar cells [1, 2]. The primary treatment for GIST is surgical excision, but a significant number of cases recur [3, 4]. Adjuvant imatinib, a tyrosine kinase inhibitor, is useful in select cases of GIST based on risk of recurrence [58]. At present, the risk of recurrence is determined based on tumor size, mitotic rate, tumor site, and tumor rupture [1, 5, 813], as for example in the Miettinen risk score [11], but more accurate predictors would be useful to better direct therapy.

While most GISTs have mutations in the KIT gene, mutations in the platelet derived growth factor receptor alpha (PDGFRA) gene are also common [1, 2, 5, 8, 14]. In a small percentage of GISTs, mutations in other genes such as BRAF, succinate dehydrogenase (SDH), or neurofibromatosis (NF) may occur [1, 5, 1520]. The type of KIT or PDGFRA mutation may affect the recurrence rate as well as response to imatinib [5, 8]. Despite the key role of activating mutations of KIT or PDGFRA, GIST biology is also dependent upon other genetic changes [1]. Cases of KIT-mutant GIST have been reported that present with coexisting downstream mutations [5, 8, 21, 22].

Gene expression patterns have been used to predict the development of metastases in soft tissue sarcoma [2326]. Differences in the gene expression profiles of GISTs with different KIT- or PDGFRA-mutant tumors have been reported [27, 28], and several recent studies have explored the use of gene expression patterns to predict recurrence rate of GIST [2935].

In previously published studies using various biochemical pathways, we derived gene expression profiles that identified two subgroups of aggressive fibromatosis (AF-gene set), ovarian carcinomas (OVCA-gene set), and clear cell renal cell carcinomas (RCC-gene set) [3639]. We previously used a gene set derived from these three studies to separate 73 high grade soft tissue sarcoma into 2 or 4 groups with different propensities of metastasis [25]. In an independent study, these gene sets were used to separate 309 high-grade soft tissue sarcoma into 2 or 4 groups with different propensity of metastasis [26].

In the present study, we utilized our three gene sets to examine a group of 60 GISTs using Agilent chip based expression profiling [33]. These gene sets successfully separated the GIST samples into subsets with different probabilities of developing disease recurrence, and may be useful to better predict who would benefit from adjuvant imatinib.

Methods

Samples

Sixty primary tumor samples were obtained from patients who had surgical resection of a GIST, and patients were followed without treatment until tumors recurred as previously described [33]. Frozen samples from resected primary GISTs untreated until tumor recurrence were selected from the European GIST database CONTICAGIST (http://www.conticagist.org). According to French law at the time of the study, experiments were performed in agreement with the Bioethics Law 2004 800 and the Ethics Charter from the National Institute of Cancer; all subjects signed a non-opposition statement for research use of their sample. Total RNA was extracted from each frozen tumor sample, and analyzed on Agilent Whole human 44K Genome Oligo Array (Agilent Technologies) as previously described [33]. Patient characteristics were previously described [33]. These data were kindly provided by Dr. F. Chibon, Institute Bergonie, Bordeaux, France.

Gene sets

Three different previously described gene sets with limited overlap were used: the AF-gene set, OVCA-gene set, and RCC-gene set. These gene sets consist of 161, 173, and 138 known genes respectively [3639]. The AF-gene set and RCC-gene set distinguished between two subgroups of AF samples and RCC samples, respectively. The OVCA-gene set distinguished borderline from invasive serous OVCA. These three gene sets were pooled resulting in a combined gene set.

Hierarchical clustering and fold-change analysis

The AF-, OVCA-, and RCC-gene sets were used individually or combined, to cluster the 60 primary GIST samples. For clustering, genes were median centered, normalized, and then clustered by complete hierarchical clustering using uncentered correlation with Eisen clustering software [40] and viewed using the TreeView software (http://www.rana.lbl.gov) [41].

Analysis of time to metastasis

For each data set, we used the Kaplan–Meier (K-M) method to calculate metastasis-free survival probabilities, and cumulative probabilities of metastasis (one minus survival probabilities) at critical time points (2, 4, 6, and 8 years). p values were calculated by using the log-rank test for comparing different groups. p values ≤0.05 were considered statistically significant. Analyses were performed in R version 3.0.1 [42].

Results

Analysis of GIST samples using the individual AF-, OVCA-, and RCC-gene sets

We analyzed 60 GIST samples with the individual AF-, RCC-, and OVCA-probe sets; patient characteristics have been previously reported [23]. Hierarchical clustering of the GIST samples using each individual gene set (Additional file 1: Figure S1) identified differences in time to metastasis when the GIST samples were analyzed as two groups defined by the first branch point of the clustering. For the AF-gene set, the probability of not developing metastases by 6 years in Group B was 0.54 while none of the patients in Group A recurred (Fig. 1a; Table 1A, p = 0.003). For the OVCA-gene set, the probability of not developing metastases at 6 years was 0.20 in Group B vs 0.97 in group A (Fig. 1b; Table 1B, p < 0.001). For the RCC-gene set, the probability of not developing metastases at 6 years was 0.46 for Group B and 0.90 for Group A (Fig. 1c; Table 1C, p = 0.003).

Fig. 1
figure 1

Kaplan–Meier analysis of the time to development of metastases of two groups (groups A and B) defined by the first break point of the hierarchical clustering of 60 GIST samples. GIST samples were analyzed using the individual AF-gene set (a), OVCA-gene set (b), RCC-gene set (c), and the combined gene set (d). The time to development of metastasis differed between groups A and B (a, p = 0.003; b, p < 0.001; c, p = 0.003; and d, p = 0.029). GIST samples were also analyzed by separating the samples into 3 well-separated clusters using the combined gene set (e, p < 0.001), and by identifying samples as “good” or “poor” prognosis by any 2 of the 3 gene sets as described in the text (f, p < 0.001)

Table 1 Probability (95 % CI) of no metastasis as a function of time

Analysis of GIST samples using the combined gene set

Hierarchical clustering of the GIST samples was also performed using the combined gene set (AF-gene set, OVCA-gene set, and RCC-gene set) (Fig. 1d). Kaplan–Meier analysis was performed using the two sample sets defined by the first branch point. The probability of not developing a metastasis by 6 years was 0.59 for Group B, while none recurred in Group A (Fig. 1d; Table 1D, p = 0.029). In Group B, clustering was evident between two subgroups of sufficient sample size to analyze independently (Fig. 2). The probability of not developing a recurrence by 6 years was 0.39 for Group B2, while none of the patients in Group A or Group B1 recurred (Fig. 1e; Table 1E, p < 0.001 for comparisons between each of the 3 sets). We also grouped the samples into 2 groups defined as “good prognosis” or “poor prognosis”. Samples were defined as “good” prognosis if they were in Group A in the clustering by at least 2 of the 3 gene sets (AF-, OVCA-, or RCC; n = 30). The probability of not developing a metastasis by 6 years was 0.40 for the “poor” prognosis group, while none of the patients in the “good” prognosis group recurred (Fig. 1f; Table 1F, p < 0.001).

Fig. 2
figure 2

Clustering of gene expression in the 60 GIST samples. The samples were clustered using the probes in the pooled gene set as described in the text. Groups A and B (= B1 + B2) are defined by the first branch point in the clustering. Groups B1 and B2 are defined by the second branch point in Group B

Effect of Miettinen risk score on probability of developing recurrence

As some prognostic criteria correlate with recurrence of GIST, we questioned whether this scoring method might be improved by combining it with our clustering patterns. When the GIST samples were analyzed according to Miettinen risk status [11], none of the 29 patients in the low or very low risk groups recurred, yet 16 of 31 patients who scored in the high- or intermediate-risk score by Miettinen risk also did not recur (Fig. 3a; Table 2A).

Fig. 3
figure 3

Kaplan–Meier analysis of the time to development of metastases. Kaplan–Meier analysis of the time to development of metastases for Miettinen risk groups of all GIST samples (a). Miettinen risk identified distinct risk groups (p < 0.001), panel a. When the 31 high- and intermediate-risk GIST samples from panel a were examined for their grouping using the individual AF-gene set (b), OVCA-gene set (c), RCC-gene set (d), and the combined gene set (e) in Fig. 1, the time to development of metastasis differed between groups A and B (b, p < 0.001; c, p = 0.003; d, p = 0.029; and e, p = 0.029)

Table 2 Probability (95 % CI) of no metastasis as a function of time

We went back to the hierarchical clustering performed with each of the gene sets in Fig. 1 to determine where these 31 patients with high- and intermediate-risk from the Miettinen score had been grouped, i.e. were they in Group A (good prognosis) or Group B (bad prognosis) (Fig. 1a–d). The probability of no metastasis for these 31 patients is shown in Fig. 3b–e and Table 2. Of interest is the finding that many of the 31 patients who were classified as high- or intermediate-risk by the Miettinen score were grouped as good prognosis using our gene sets and did not recur. The rate of recurrence in the “good risk” group was 0 % (0/8) for the AF-gene set, 14 % (2/14) for the OVCA-gene set, 18 % (3/17) for the RCC-gene set, and 0 % (0/5) for the pooled-gene set. Interestingly, among the “good” groups of high- and intermediate-risk samples defined by the AF-gene set, 0/3 high-risk and 0/5 intermediate-risk tumors recurred; 1/5 high- and 1/9 intermediate-risk tumors recurred in the “good” group identified by clustering with the OVCA-gene set, and 2/7 high- and 1/7 intermediate-risk tumors recurred when clustered with the RCC-gene set. Among the “bad” prognosis group defined by clustering with the AF-, OVCA-, and RCC- gene sets, 11/14 high- and 4/9 intermediate-risk, 10/12 high- and 3/5 intermediate-risk, and 9/10 high- and 3/7 intermediate-risk samples recurred, respectively.

Discussion

The biologic heterogeneity of GISTs, as with other soft tissue sarcomas, introduces complexities in deciding optimal treatment. This study used hierarchical clustering with gene sets derived from earlier studies of various biochemical pathways in aggressive fibromatosis, renal cell carcinoma, and ovarian carcinoma [3639, 43] to examine 60 GIST samples using Agilent chip expression profiling. The analyses separated the GIST samples into at least two groups with different probabilities of developing metastatic disease. Although the gene sets were derived using biochemical pathways, we did not observe simple differences in biochemical pathways between the groups; possibly with a larger sample set, more detailed biochemical differences will become evident. Our data suggest that appreciation of these GIST subsets with distinct clinical behavior could be used to stratify GIST patients in clinical trials and in patient management. Miettinen risk group classification also identified distinct risk groups in our 60 GIST cases. In particular, our analysis also identified subsets of Miettinen high- and intermediate-risk samples that different in the risk of metastasis. When the high- and intermediate-risk GIST samples were examined without the low- and very low-risk samples, the individual AF-gene set, OVCA-gene set, RCC-gene set, and the combined gene set were associated with the time to development of metastasis. This finding suggests that further characterization of recurrence risk among samples classified as high- or intermediate-risk is possible. Furthermore, these results validate the potential role of the use of these gene sets in predicting the behavior of heterogeneous tumor sets.

These gene sets have also been shown to separate sets of soft tissue sarcoma samples into groups with different metastatic behavior [25, 26]. A gene set of 67 genes involved in mitosis and control of chromosome integrity, termed the complexity index in sarcomas (CINSARC), also predicts metastasis outcome in non-translocation dependent soft tissue sarcomas [23]. Both the gene sets used here and the CINSARC [23, 33] gene set identified subsets of the GIST samples that differed in time to recurrence. These data support the potential use of these gene sets to predict biological behavior in GIST as well as other soft tissue sarcomas. Only 11 of the 67 genes in the CINSARC gene set were also present in our pooled gene set.

Other methods of examining genetic heterogeneity may also be helpful. A recent study found that chromosomal changes detected by comparative genomic hybridization (CGH) were predictive of GIST outcome [33]. This study, as well as a second study, also found that a “genomic index” calculated from the number of chromosomal alterations (segmental gains and losses), and number of chromosomes involved was a strong predictor of recurrence as well [33, 44]. Another study using array-based analysis of gene copy number separated 42 GISTs into 4 groups with different survival rates [35].

Conclusions

Gene expression profiles may provide a useful technique to better predict long-term outcomes after surgery in patients with GIST and other sarcomas. Such information could be used to restrict the use of adjuvant therapy and reduce heterogeneity among groups in clinical trials. Due to the limited sample size of our study, we examined the identification of only two subsets of the GIST sample set with different metastatic propensity. The ability to detect multiple subgroups is highly dependent on the number of samples and the distribution of samples among the various groups. With larger sample sets, it may be possible to further refine classification and identify clinically useful heterogeneity. In addition, although gene expression analysis may provide a useful indicator of long-term outcomes, it should be used in combination with standard prognostic factors in order to have maximum predictive value [8]. For example, in this study, further characterization of recurrence risk among samples classified as high-or intermediate-risk was possible. These results also validate the potential role of the use of these gene sets in predicting the behavior of heterogeneous tumor sets. Several different gene sets appear to separate the samples into 2 groups with different behavior.