Abstract
Objective
Patients with gastric atrophy and intestinal metaplasia (IM) were at risk for gastric cancer, necessitating an accurate risk assessment. We aimed to establish and validate a diagnostic approach for gastric biopsy specimens using deep learning and OLGA/OLGIM for individual gastric cancer risk classification.
Methods
In this study, we prospectively enrolled 545 patients suspected of atrophic gastritis during endoscopy from 13 tertiary hospitals between December 22, 2017, to September 25, 2020, with a total of 2725 whole-slide images (WSIs). Patients were randomly divided into a training set (n = 349), an internal validation set (n = 87), and an external validation set (n = 109). Sixty patients from the external validation set were randomly selected and divided into two groups for an observer study, one with the assistance of algorithm results and the other without. We proposed a semi-supervised deep learning algorithm to diagnose and grade IM and atrophy, and we compared it with the assessments of 10 pathologists. The model’s performance was evaluated based on the area under the curve (AUC), sensitivity, specificity, and weighted kappa value.
Results
The algorithm, named GasMIL, was established and demonstrated encouraging performance in diagnosing IM (AUC 0.884, 95% CI 0.862–0.902) and atrophy (AUC 0.877, 95% CI 0.855–0.897) in the external test set. In the observer study, GasMIL achieved an 80% sensitivity, 85% specificity, a weighted kappa value of 0.61, and an AUC of 0.953, surpassing the performance of all ten pathologists in diagnosing atrophy. Among the 10 pathologists, GasMIL’s AUC ranked second in OLGA (0.729, 95% CI 0.625–0.833) and fifth in OLGIM (0.792, 95% CI 0.688–0.896). With the assistance of GasMIL, pathologists demonstrated improved AUC (p = 0.013), sensitivity (p = 0.014), and weighted kappa (p = 0.016) in diagnosing IM, and improved specificity (p = 0.007) in diagnosing atrophy compared to pathologists working alone.
Conclusion
GasMIL shows the best overall performance in diagnosing IM and atrophy when compared to pathologists, significantly enhancing their diagnostic capabilities.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
Introduction
Gastric cancer (GC) is a major global health concern, ranking as the fifth most commonly diagnosed malignant tumor and the fourth leading cause of cancer-related deaths worldwide [1]. More than 95% of GC are adenocarcinomas [2], of which intestinal adenocarcinoma is the most common type [3].
Several pathologies of chronic atrophic gastritis (CAG), including atrophy, intestinal metaplasia (IM), and dysplasia, are important pathways for the development of intestinal-type adenocarcinoma from normal mucosa (also known as the “Correa cascade”) [4]. The GC risk in CAG patients gradually increases with the progression of the Correa cascade. Recent studies [5, 6] have shown that the annual incidence rates of gastric cancer for atrophy, intestinal metaplasia, and dysplasia are 0.1%, 0.12–0.25%, and 0.6–6%, respectively. The operative link for gastric intestinal metaplasia assessment (OLGIM) [7] and operative link for gastritis assessment (OLGA) [8] by integrating the IM/atrophy score and the topography [9] have been advocated by international guidelines [3, 10, 11] for risk stratification of individuals diagnosed with gastric precancerous conditions. However, accurately assessing pathological IM and atrophy and stratifying OLGIM/OLGA risk have been difficult for pathologists.
Since the Sydney system was updated in 1994 [12], pathologists have continuously pointed out that the diagnostic system can be challenging in clinical practice [5]. The diagnosis of atrophic gastritis needs grading the severity of gland loss, which is difficult to evaluate quantitatively with accuracy, resulting in poor precision [11]. To overcome this difficulty, pathologists have tried various methods, including holding meetings to unify conceptual terminology [13], cycling a group of pathological pictures for repeated training [14], and proposing some intuitive measurement methods [15], etc. However, these efforts have failed to effectively improve accuracy, or the methods were difficult to apply widely. So far, the consistency and accuracy of pathological histology in patients with CAG for diagnosis and GC risk stratification remain limited [3, 6, 10, 11].
Deep learning has shown potential in medical image analysis. Automatic recognition technology based on deep learning has also achieved outstanding results in the diagnosis of digital pathological images, such as breast cancer, lung cancer, colorectal cancer, and prostate cancer [16,17,18,19,20,21]. Under certain conditions, the diagnostic performance of these artificial intelligence models is not inferior to that of human experts. However, these studies often use fully supervised learning, requiring pathologists to manually label lesions for pixel-level training, which is easily affected by various factors. Deep learning based on weak supervision can automatically mine suspicious lesions only with accurate category information. It is suitable for situations that are highly affected by subjective factors or difficult to obtain manual annotations. It is expected to be applied to computational pathology to further improve the accuracy and consistency of diagnosis [22,23,24].
In this study, 2725 whole-slide images (WSIs) were collected continuously from 545 endoscopic suspected CAG patients in a multi-center trial to establish a deep neural network-based diagnostic model named GasMIL. The randomized observer study was conducted to verify the accuracy and consistency of the diagnoses made by pathologists assisted with GasMIL. We aimed to establish and validate a convolutional neural network algorithm to diagnose and risk stratification of individuals diagnosed with precancerous gastric mucosal changes.
Methods
Study design
The data sets of this study were obtained from a multicentre, prospective trial registered at https://clinicaltrials.gov/ (register number: NCT02955134). All authors had access to the study data, reviewed, and approved the final manuscript. From December 22, 2017, to September 25, 2020, 545 patients suspected of having atrophic gastritis were consecutively included during endoscopy at 13 tertiary hospitals. According to a 4:1 ratio, patients were randomly divided into a model construction set and a test set. The part used for model construction was further randomly divided into a training set and a validation set according to a ratio of 4:1 (Fig. 1).
The ethics committees of the 13 tertiary hospitals approved the trial protocol, and all participants signed informed consent forms. An independent data safety monitoring committee was responsible for monitoring the progress and safety of the trial.
Participants and biopsy assessment
The inclusion criteria were patients aged 40–65 years suspected of having chronic atrophic gastritis during endoscopy. The exclusion criteria included autoimmune gastritis, gastric or duodenal ulcers, upper gastrointestinal bleeding, high-grade intraepithelial neoplasia in the gastric mucosa, or suspected malignant transformation based on histological diagnosis.
For patients meeting the inclusion and exclusion criteria, specimens were taken from 5 sites in the stomach, including two from the lesser and the greater curvature of the antrum (both within 2–3 cm from the pylorus), one from the lesser curvature of the corpus (about 4 cm proximal to the angulus), one from the middle portion of the greater curvature of the corpus (approximately 8 cm from the cardia), and one from the incisura angularis. The tissues were sliced, scanned using MoticEasyScan Pro, and then uploaded to the online diagnostic system.
WSIs were reviewed by two pathologists, and the superficial slices were removed for the follow-up study. Three experienced pathologists independently graded diagnoses according to the New Sydney system [12] (see Supplementary Figure S1 for diagnostic criteria), and the final diagnosis result was obtained after discussing any inconsistencies.
Model development
We proposed a deep neural network named GasMIL (Fig. 2) to predict the degree of atrophy/IM of gastric tissue slice images. Based on the pathological slice images from five parts of each patient, a patient-level grading prediction was comprehensively obtained. We apply weakly supervised learning in our algorithmic framework, specifically multiple instance learning (MIL). Unlike traditional deep convolutional neural networks, MIL only requires coarse-grained labels for the pathological diagnosis of each image, avoiding the need for complicated manual annotations by doctors. Additionally, we constructed MIL features for pathological images of different resolutions and performed multi-scale aggregation. The basic principle of this design is that shallow low-level features (such as local edges and textures) and deep high-level features (such as severe disease appearance) contain useful information for grade prediction and extracting and integrating comprehensive multi-scale image features helps in making the best grade decision.
Self-supervised learning for learning patch embedding
We cropped each WSI into non-overlapping blocks at resolutions of 224 × 224 with a field of view of 0.5 µm per pixel (MPP) and 2.0 MPP, respectively. Before multi-instance learning, we pre-learn embedding for each cut patch using SimCLR [25] proposed by “Hinton’s team”. This simple framework for contrastive learning was employed to learn robust image representations without manual labeling. For each original patch, we applied several data augmentation operations (including random rotation, random color distortion, etc.) to generate sub-images and performed feature extraction through a resnet18-based encoder. We then constructed a contrastive loss to minimize the distance between these sub-images from the same original image in feature space. The output of the trained encoder was used for downstream MIL tasks.
Construction of multi-scale patch embedding
For both 0.5 MPP and 2.0 MPP magnifications, we constructed single-scale WSI classifiers. The patch embedding under 2.0 MPP magnification was spliced with the embedding corresponding to the physical 0.5 MPP magnification position to obtain a comprehensive patch embedding.
Using WSI classifier to select key patches
In the MIL hypothesis, when a WSI is marked as positive (label > 0), at least one patch is the target lesion area; if the mark is negative, all patch labels should be negative. Based on this assumption, we used the MLP network as a classifier to feed multi-scale patch embeddings. After completing the training of the patch-level classifier, we could obtain the probability of all patches in the current WSI being predicted as lesion areas and sort them to obtain the patches that should be the most focused on. We uniformly selected the top 20 patches with the highest ranking for each WSI to input into the downstream aggregator.
Apply the transformer to the aggregation of patches
The traditional MIL uses pooling algorithms to comprehensively evaluate the prediction probability of the top patches and obtain the prediction degree of WSI. However, these CV-based pooling algorithms ignore the correlation information between patches. Considering the potential model enhancements from these correlations, we introduced the transformer into the aggregation stage. The 100 most critical (most likely to be lesion area) patch vectors obtained from each WSI through the first step were sequentially passed to the transformer classifier to predict the entire rank probability of WSI.
Patient-level prediction model construction
For the WSI of five gastric parts in the same patient, we obtained five prediction grades through the WSI-level prediction model. According to the OLGA and OLGIM, we could then obtain the final patient-level prediction grade.
Observer study
Sixty patients from the test set were randomly selected for observational studies. All clinical information was concealed and randomly divided into two groups: one with the aid of GasMIL diagnostic results and the other without. The digital WSIs were distributed to 10 pathologists for diagnosis, and the diagnosis results were recorded. The AUC, sensitivity, specificity, and consistency of the two groups of slides diagnosed by 10 pathologists were obtained.
Statistical analysis
We employed a combination of statistical tests, including the T-test and Wilcoxon signed-rank test, to examine the impact of age at baseline and a combination of the Chi-square test, Fisher’s exact test, continuously corrected Chi-square test, and signed-rank test to analyze gender. The Wilcoxon signed-rank test was also used to compare baseline histological data.
Receiver operating characteristic (ROC) curves and area under the curve (AUC) were analyzed using the machine learning Python package sci-kit-learn to quantify diagnostic classifier performance, as well as accuracy, sensitivity, and specificity. The cutoff value of the ROC curve was set at 0.5. Cohen’s kappa coefficient was used to assess interobserver agreement between diagnostic models and human pathologists. Python and Pytorch were used to build WSI algorithms.
Results
Baseline characteristics
Between December 2017 and September 2020, up to 609 potentially eligible patients from 13 Chinese tertiary hospitals were enrolled in this study. Among them, 64 patients were excluded because the biopsy samples were superficial. Thus, 545 patients with 2725 WSIs were finally enrolled for analysis (Fig. 1).
After randomization of these patients, 349 patients with 1745 WSIs were assigned to the training set, 87 patients with 435 WSIs were assigned to the validation set, and the other 109 patients with 545 WSIs composed the test set. Their characteristics are summarized in Table 1. There were no significant differences in all baseline characteristics (p > 0.05) or the distribution of patients among OLGA and OLGIM stages (p > 0.05) between the training, validation, and test sets 2).
Performance of the model on the training, validation, and test set
In the diagnostic model we built, the data set has a total of 546 patients and was randomly divided. To avoid model prediction bias caused by differences in the data distribution of each cohort, we used stratified sampling to divide the data set according to the true atrophy grade of the patients. We split patients into a training cohort and an external testing cohort in a ratio of 4:1. In the training cohort, 20% of the patients were used for internal validation of the model.
We use AUC to measure the ability of GasMIL to predict pathological images of different grades. Table 2 shows the model’s AUC, sensitivity, and specificity in the training, internal validation, and independent test cohorts. At the WSI level, GasMIL achieves good diagnostic performance in the independent test cohort (AUCInflammation: 0.970; AUCActivity: 0.981; AUCIM: 0.884; AUCAtrophy: 0.877). The representative heat map images with varying atrophy/IM grades are shown in Fig. 3A and B. For a certain problem, we referred to OLGA and OLGIM to convert the WSI prediction grade for different stomachs of the same patient to the final prediction grade for that patient. At the patient level, the good predictive performance of the diagnostic model carried over (AUCOLGIM: 0.792; AUCOLGA: 0.729).
As demonstrated in the cross-tabulation of GasMIL results with the gold standard (see Figure S1-4), among the 150 WSIs, two initially categorized as normal/mild atrophy were misdiagnosed as severe atrophy, and two initially labeled as severe atrophy were misdiagnosed as mild atrophy. Moreover, one case initially classified as normal was misdiagnosed as severe IM, and four initially categorized as severe IM were misdiagnosed as normal/mild IM. Retrieve the original image, we found that severe cases were often misdiagnosed as normal/mild due to significant inflammation and structural distortion (Fig. 3C). Additionally, when pathology sections intersected blood vessels or were excessively thin, normal/mild conditions were frequently misdiagnosed as severe (Fig. 3D).
Results of the observer study compared with 10 pathologists
The diagnostic results of GasMIL were compared with those of 10 pathologists on 30 patients with 150 WSIs. At the slide level, the ROC of the GasMIL for inflammation, activity, IM, and atrophy were 0.970 (95% CI 0.947–0.986), 0.970 (95% CI 0.941–0.998), 0.949 (95% CI 0.918–0.972) and 0.953 (95% CI 0.927–0.976), respectively (Fig. 4A–D), and were higher than those of 10 pathologists. The diagnostic sensitivity, specificity, and weighted kappa value of the GasMIL for inflammation, activity, IM, and atrophy were also higher than those of most pathologists (supplementary table S5).
The ROC for the OLGIM model was 0.792, which was higher than 6 out of 10 pathologists. For OLGA, GasMIL’s ROC was 0.729, ranking second among the 10 pathologists (Fig. 4E, F). The diagnostic sensitivity, specificity, and weighted kappa value of the artificial intelligence diagnostic model for OLGA and OLGIM are also higher than those of most pathologists (supplementary table S5).
The results of GasMIL-assisted diagnosis compared with pathologist-independent diagnosis
We compared the results of pathologists with and without GasMIL assistance on two groups of 30 patients with 150 WSIs. For IM, pathologists have higher AUC, sensitivity, and weighted kappa (p = 0.013, 0.014, 0.016, respectively, Fig. 5A, B, D) with the assistance of GasMIL, but there was no statistical difference in specificity (p > 0.05, Fig. 5C). For atrophy, pathologists had higher specificity with the assistance of GasMIL (p = 0.007, Fig. 5G), but there was no statistical difference in AUC, sensitivity, and weighted kappa (p > 0.05, Fig. 5E, F, H).
At the patient level, whether pathologists were assisted with GasMIL or not, the AUC, sensitivity, specificity, and weighted kappa of OLGIM and OLGA showed no statistical difference (p > 0.05, Fig. 5I–P), but there was a trend of improvement.
Discussion
Recent studies have shown that deep learning-based algorithms were promising in classifying and grading pathological lesions in digitized H&E slides [26, 27]. Regarding the poor diagnostic accuracy and increasing diagnostic workload of endoscopic biopsy specimens call for a high-performance algorithm with high sensitivity and specificity [28].
In this study, we developed an algorithm named GasMIL to diagnose inflammation, activity, atrophy, and IM in gastric biopsy specimens and demonstrated superb performance better than all ten pathologists. Especially concerning atrophy, the concept is represented by the discrepancy between the expected glands and what is actually observed at the histologic exam [29], which can be subjective and pathologists are most likely to be inconsistent with [30, 31]. Accurately diagnosing atrophy was considered crucial for the prevention of gastric cancer, as a study found that 37.2% of patients who developed gastric cancer had been diagnosed with indefinite atrophy previously [32]. In the present study, the GasMIL showed an 80% sensitivity, 85% specificity, and 0.61 weighted kappa value on the observer study, which was higher than that of pathologists trained through four rounds of reading (kappa = 0.46) [14]. Therefore, at the slide level, GasMIL has the potential to serve as a tool for supervising pathologists to process the sheer number of samples in limited clinical working hours.
To estimate whether the GasMIL model can accurately stratify GC risk in CAG patients as well, OLGIM and OLGA were obtained by combining the results of 5 WSIs. In the observer study, GasMIL showed the second-highest diagnostic accuracy in OLGA and fifth in OLGIM among the ten pathologists, with AUCs of 0.72 and 0.79, respectively. In contrast to OLGA, OLGIM reports a high interobserver concordance, consistent with our observer study [7, 33]. However, OLGIM staging was considered less sensitive than OLGA staging for it downgrades high-risk patients to low-risk groups [34]. Obtaining both OLGA and OLGIM information on the same pathological slide is beneficial for the secondary prevention of GC [9, 35]. At the patient level, high overall accuracy for the GasMIL in GC risk stratification was observed, suggesting that it can assist clinicians in individual GC risk stratification.
In addition, we conducted an observer study to investigate whether GasMIL can help pathologists improve diagnostic accuracy. The results revealed that with the assistance of GasMIL, the accuracy of pathologists in diagnosing IM significantly improved, as did the diagnostic specificity of atrophy. However, its accuracy in diagnosing OLGIM and OLGA did not significantly improve. Since the sample size analysis for the observer study was based on slides [36], the sample size of 30 patients may be too small to detect a statistical difference between OLGIM and OLGA. However, an increasing trend can be observed in Fig. 4 for their higher median with the assistance of GasMIL, and a follow-up observer study with a larger sample size is needed to evaluate its role in helping diagnose OLGIM and OLGA.
To the best of our knowledge, this is the first study that aimed to establish and validate a convolutional neural network algorithm to diagnose and grade OLGA and OLGIM based on the updated Sydney protocol. The AGA [3] recommends that gastric biopsies according to the updated Sydney system should become standard in the diagnostic workup for dyspepsia and gastritis, a step that has been shown to increase the detection rate of H. pylori and IM [37]. After a standard biopsy, it is essential to ensure that the pathologist has histologic scoring of gastric biopsy (for OLGA/OLGIM staging), avoiding the possibility of a secondary staging exam to determine risk level. However, in the grading of the biopsy IM and atrophy, many pathologists worldwide do not report severity scoring routinely [38]. The possible reason is that the judgment of severity scoring is quite subjective based on the proportion, making it difficult for pathologists to give an accurate and consistent result [13, 39, 40]. Therefore, we built a classification model for atrophy and IM using artificial intelligence techniques that can help quantify severity scores. High-quality multi-center data with strict quality control of all image acquisitions and histological analysis for every individual were used in this study, and a reliable gold standard diagnosis was jointly made by three experienced pathologists. Furthermore, GasMIL was fully validated, including the test set validation, compared with ten pathologists, and GasMIL auxiliary diagnosis compared with pathologists alone to investigate the robustness and reliability of GasMIL. The results proved that applying GasMIL for the quantitative analysis of WSIs offered valuable benefits for diagnosing atrophy and IM as well as GC risk in patients with CAG. Once the GasMIL model is established, pathologists only need to review and confirm the results of the model classification in the daily workflow, which is extremely easy for clinical applications.
Some studies have applied Convolutional Neuronal Networks technology to diagnose gastritis on biopsy H&E images. Panagiotis et al. reported a digital pathology framework for gastric gland segmentation and classification that achieved object dice scores equal to 0.908 and 0.967, respectively, in a dataset consisting of 20 patients with 85 WSIs of normal, gastric atrophy, and IM [41]. Georg et al. reported a convolutional neuronal network-based algorithm to classify gastritis into autoimmune, bacterial, and chemical subtypes, achieving an overall accuracy of 84% in a data set of 135 patients [42]. However, their digital pathology framework and study design were fundamentally different from ours. We focus on the problem that pathologists are having difficulty reaching a consensus when making diagnoses according to the updated Sydney system, we established an algorithm that can precisely grade the severity of atrophy and IM and can calculate the GC risk accordingly which is also beneficial to determine follow-up intervals.
Our study has some limitations. First, our model is independently developed and verified based on different gastric problems. This approach will increase machine memory overhead and does not further consider the relationship between image representations of different gastric problems. Secondly, as a black box model, the interpretability of the deep learning model is still poor. We tried to observe the model’s attention to different areas in the form of a heat map, but it is still difficult to explain how the model learns. Thirdly, the sample size of the observer study was insufficient for detecting the statistical difference between GasMIL auxiliary diagnosis and pathologists alone to diagnose OLGIM and OLGA, a larger sample size study is needed.
In conclusion, GasMIL shows the best overall performance in diagnosing inflammation, activity, IM, and atrophy, ranking fifth in diagnosing OLGIM and second in OLGA compared to ten pathologists. GasMIL-assisted significantly improves the performance of pathologists in diagnosing IM and atrophy. All of this suggested a clinical application potential of GasMIL for accurate pathological grading and GC risk stratification in atrophic gastritis patients.
Data availability
All authors confirm that they have full access to all the data in the study and approved the final version of the article.
References
Sung H, Ferlay J, Siegel RL, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA C J Clin. 2021;71:209–49.
Ajani JA, Lee J, Sano T, et al. Gastric adenocarcinoma. Nat Rev Dis Prim. 2017;3:1–19.
Gupta S, Li D, El Serag HB, et al. AGA clinical practice guidelines on management of gastric intestinal metaplasia. Gastroenterology. 2020;158:693–702.
Correa P. A human model of gastric carcinogenesis. Can Res. 1988;48:3554–60.
Song H, Ekheden IG, Zheng Z, et al. Incidence of gastric cancer among patients with gastric precancerous lesions: observational cohort study in a low risk Western population. BMJ. 2015;351: h3867.
Piazuelo MB, Bravo LE, Mera RM, et al. The Colombian chemoprevention trial: 20-year follow-up of a cohort of patients with gastric precancerous lesions. Gastroenterology. 2021;160:1106-1117.e3.
Capelle LG, de Vries AC, Haringsma J, et al. The staging of gastritis with the OLGA system by using intestinal metaplasia as an accurate alternative for atrophic gastritis. Gastrointest Endosc. 2010;71:1150–8.
Rugge M, Genta RM. Staging gastritis: an international proposal. Gastroenterology. 2005;129:1807–8.
Rugge M, Genta RM, Fassan M, et al. OLGA gastritis staging for the prediction of gastric cancer risk: a long-term follow-up study of 7436 patients. Am J Gastroenterol. 2018;113:1621–8.
Banks M, Graham D, Jansen M, et al. British Society of Gastroenterology guidelines on the diagnosis and management of patients at risk of gastric adenocarcinoma. Gut. 2019;68:1545–75.
Pimentel-Nunes P, Libânio D, Marcos-Pinto R, et al. Management of epithelial precancerous conditions and lesions in the stomach (maps II): European Society of gastrointestinal endoscopy (ESGE), European Helicobacter and microbiota Study Group (EHMSG), European Society of pathology (ESP), and Sociedade Port. Endoscopy. 2019;51:365–88.
Dixon MF, Genta RM, Yardley JH, et al. Classification and grading of gastritis: the updated Sydney system. Am J Surg Pathol. 1996;20:1161–81.
Rugge M, Correa P, Dixon MF, et al. Gastric mucosal atrophy: interobserver consistency using new criteria for classification and grading. Aliment Pharmacol Ther. 2002;16:1249–59.
Kim SS, Kook M, Shin O, et al. Factors to improve the interobserver agreement for gastric atrophy and intestinal metaplasia: consensus of definition and criteria. Histopathology. 2018;72:838–45.
Al-Omari FA, Matalka II, Al-Jarrah MA, et al. An intelligent decision support system for quantitative assessment of gastric atrophy. J Clin Pathol. 2011;64:330–7.
Sun C, Li B, Wei G, et al. Deep learning with whole slide images can improve the prognostic risk stratification with stage III colorectal cancer. Comput Methods Programs Biomed. 2022;221: 106914.
Feng L, Liu Z, Li C, et al. Development and validation of a radiopathomics model to predict pathological complete response to neoadjuvant chemoradiotherapy in locally advanced rectal cancer: a multicentre observational study. Lancet Digit Health. 2022;4:e8–17.
Bejnordi BE, Veta M, Van Diest PJ, et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA. 2017;318:2199–210.
Bulten W, Pinckaers H, van Boven H, et al. Automated deep-learning system for Gleason grading of prostate cancer using biopsies: a diagnostic study. Lancet Oncol. 2020;21:233–41.
Coudray N, Ocampo PS, Sakellaropoulos T, et al. Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning. Nat Med. 2018;24:1559–67.
Madabhushi A, Feldman MD, Leo P. Deep-learning approaches for Gleason grading of prostate biopsies. Lancet Oncol. 2020;21:187–9.
Shi J-Y, Wang X, Ding G-Y, et al. Exploring prognostic indicators in the pathological images of hepatocellular carcinoma based on deep learning. Gut. 2021;70:951–61.
Lu MY, Williamson DFK, Chen TY, et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat Biom Eng. 2021;5:555–70.
Campanella G, Hanna MG, Geneslaw L, et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat Med. 2019;25:1301–9.
Chen T, Kornblith S, Norouzi M, et al. A simple framework for contrastive learning of visual representations. In: International conference on machine learning. PMLR; 2020. pp. 1597–1607
van der Laak J, Litjens G, Ciompi F. Deep learning in histopathology: the path to the clinic. Nat Med. 2021;27:775–84.
Ström P, Kartasalo K, Olsson H, et al. Artificial intelligence for diagnosis and grading of prostate cancer in biopsies: a population-based, diagnostic study. Lancet Oncol. 2020;21:222–32.
Park J, Jang BG, Kim YW, et al. A prospective validation and observer performance study of a deep learning algorithm for pathologic diagnosis of gastric tumors in endoscopic biopsies deep learning–assisted diagnosis in gastric biopsies. Clin Cancer Res. 2021;27:719–28.
Pennelli G, Grillo F, Galuppini F, et al. Gastritis: update on etiological features and histological practical approach. Pathologica. 2020;112:153.
Salazar BE, Perez-Cala T, Gomez-Villegas SI, et al. The OLGA-OLGIM staging and the interobserver agreement for gastritis and preneoplastic lesion screening: a cross-sectional study. Virchows Arch. 2022;480:759–69.
Leja M, Funka K, Janciauskas D, et al. Interobserver variation in assessment of gastric premalignant lesions: higher agreement for intestinal metaplasia than for atrophy. Eur J Gastroenterol Hepatol. 2013;25:694–9.
Kim HJ, Kim N, Yun CY, et al. The clinical meaning of the “indefinite for atrophy” lesions within gastric mucosa biopsy specimens in a region with a high prevalence of gastric cancer. Helicobacter. 2019;24:1–9.
Lee JWJ, Zhu F, Srivastava S, et al. Severity of gastric intestinal metaplasia predicts the risk of gastric cancer: a prospective multicentre cohort study (GCEP). Gut. 2022;71:854–63.
Rugge M, Fassan M, Pizzi M, et al. Operative link for gastritis assessment vs operative link on intestinal metaplasia assessment. World J Gastroenterol: WJG. 2011;17:4596.
Rugge M, Meggio A, Pravadelli C, et al. Gastritis staging in the endoscopic follow-up for the secondary prevention of gastric cancer: a 5-year prospective study of 1755 patients. Gut. 2019;68:11–7.
Lehmann EL, D’Abrera HJ. Nonparametrics: statistical methods based on ranks. San Francisco: Holden-Day; 1975.
Lash JG, Genta RM. Adherence to the Sydney System guidelines increases the detection of Helicobacter gastritis and intestinal metaplasia in 400 738 sets of gastric biopsies. Aliment Pharmacol Ther. 2013;38:424–31.
Huang RJ, Laszkowska M, In H, et al. Controlling gastric cancer in a world of heterogeneous risk. Gastroenterology. 2023;S0016–5085(23):00051–3.
Offerhaus GJA, Price AB, ten Kate FJW, et al. Observer agreement on the grading of gastric atrophy. Histopathology. 1999;34:320–5.
Rugge M, Cassaro M, Pennelli G, et al. Atrophic gastritis: pathology and endoscopy in the reversibility assessment. Gut. 2003;52:1387–8.
Barmpoutis P, Waddingham W, Yuan J, et al. A digital pathology workflow for the segmentation and classification of gastric glands: study of gastric atrophy and intestinal metaplasia cases. PLoS ONE. 2022;17: e0275232.
Steinbuss G, Kriegsmann K, Kriegsmann M. Identification of gastritis subtypes by convolutional neuronal networks on histological images of antrum and corpus biopsies. Int J Mol Sci. 2020;21:6652.
Funding
National Traditional Chinese Medicine Inheritance and Innovation Team Project (No. ZYYCXTD-C-202210). Science Research Program for TCM Industry (No. 201507001-09).
Author information
Authors and Affiliations
Contributions
SF: conceptualization; data curation; formal analysis; investigation; methodology; validation; visualization; writing—original draft; writing—review and editing. ZL: conceptualization; data curation; investigation; methodology; project administration; supervision; writing—original draft; writing—review and editing. QQ: conceptualization; investigation; software; visualization; writing—original draft; writing—review and editing. ZT: investigation; methodology; supervision; writing—review and editing. YY: data curation; methodology; resources; validation. ZK: data curation; methodology; validation. XD: data curation; methodology; validation. SX: investigation; methodology. YL: methodology; validation. YL: methodology; visualization. LG: investigation; validation. LT: investigation; visualization. XL: investigation; validation. GF: investigation; validation. YZ: investigation; data curation. PZ: conceptualization; data curation; supervision; validation. WZ: conceptualization; data curation; supervision; validation. XL: methodology; supervision; validation. JT: conceptualization; methodology; project administration; resources; supervision. WW: conceptualization; data curation; funding acquisition; project administration; resources; supervision; validation; writing—original draft.
Corresponding author
Ethics declarations
Conflict of interest
All authors declare no conflicts of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Fang, S., Liu, Z., Qiu, Q. et al. Diagnosing and grading gastric atrophy and intestinal metaplasia using semi-supervised deep learning on pathological images: development and validation study. Gastric Cancer 27, 343–354 (2024). https://doi.org/10.1007/s10120-023-01451-9
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10120-023-01451-9