Abstract
Objective
To develop a two-phase deep learning sorting algorithm that, after X-ray image acquisition, organizes large musculoskeletal image datasets according to their anatomical entity.
Methods
In total, 42,608 unstructured and pseudonymized radiographs were retrieved from the PACS of a musculoskeletal tumor center. In phase 1, imaging data were sorted into 1000 clusters by a self-supervised model. A human-in-the-loop radiologist assigned weak, semantic labels to all clusters, and clusters with the same label were merged. Three hundred thirty-two non-musculoskeletal clusters were discarded. In phase 2, the initial model was modified by “injecting” the identified labels into the self-supervised model to train a classifier. To obtain robust performance estimates, a data split and cross-validation were applied. The hold-out test set consisted of 50% external data. To gain insight into the model’s predictions, Grad-CAMs were calculated.
Results
The self-supervised clustering resulted in a high normalized mutual information of 0.930. The expert radiologist identified 28 musculoskeletal clusters. The modified model achieved a classification accuracy of 96.2% and 96.6% for validation and hold-out test data, respectively, when predicting the top class. When considering the top two predicted class labels, accuracies of 99.7% and 99.6% were accomplished. Grad-CAMs as well as the final cluster results underlined the robustness of the proposed method by showing that it focused on the same image regions a human would have considered when categorizing images.
Conclusion
For efficient dataset building, we propose an accurate deep learning sorting algorithm for classifying radiographs according to their anatomical entity in the assessment of musculoskeletal diseases.
Key Points
• Classification of large radiograph datasets according to their anatomical entity.
• Paramount importance of structuring vast amounts of retrospective data for modern deep learning applications.
• Optimization of the radiological workflow and increase in efficiency as well as decrease of time-consuming tasks for radiologists through deep learning.
Introduction
Musculoskeletal diseases present a daily and global challenge for today’s healthcare system, with far-reaching economic burdens to society and consequences for each affected individual. Ultimately, they result in pain and restriction of motion, interfering with the individual’s quality of life [1]. Although these conditions form a diverse group with respect to their pathophysiology, most are, at least in part, classified according to the anatomical entity in which they are located. Anatomy is therefore a crucial organizing principle for such diseases [2]. Accompanied by the ongoing improvement of medical imaging, with rapid changes and innovation [3], the recent dawn of artificial intelligence (AI) and artificial neural networks will potentially lead to an increase in radiology exams. This will be especially true for medical imaging, which presents a cornerstone of daily clinical routine and of the shared-care workflow for patients with musculoskeletal diseases. Along the journey of new AI technologies, a variety of medical imaging applications has emerged, the majority focusing on head-to-head comparisons of AI with humans [4]. However, modern imaging is more likely to involve human-in-the-loop setups, in which humans actively collaborate with AI systems and provide oversight. As image data grow, the complexity of the radiology imaging workflow and the corresponding amount of data create a need for tools to understand and use radiology data for gaining new knowledge and insights [5]. The complexity of medical imaging technologies makes it challenging for radiologists and neural networks to capture all details of each dataset, potentially reversing the hoped-for reduction of medical costs and time consumption in a combined human-AI workflow.
In order to avoid trade-offs when managing complex datasets in an active collaboration of humans with AI systems [6], pre-sorting algorithms based on the anatomical entity can help categorize large amounts of image data, resulting in more effective data analysis.
Multiple supervised, unsupervised, and self-supervised models have emerged over the past years [7,8,9,10,11] that could be utilized for sorting data. However, supervised learning requires a significant amount of annotated data and therefore demands a substantial amount of a domain expert’s time [12, 13]. In contrast, unsupervised and self-supervised models eliminate the need for time-consuming annotations, but they cluster data according to mathematical rules such as similarity measures and consequently do not necessarily group data according to specific needs. In this study, we focus on the utilization and modification of an established self-supervised model to categorize data effectively and efficiently according to specific requirements, while keeping the demand for human interaction at a minimum. DeepCluster [8] is a self-supervised approach for learning image representations: the model iteratively groups features using a standard k-means clustering algorithm and uses the resulting pseudo-labels as supervision to update the weights of the network.
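The DeepCluster alternation — cluster the network's features with k-means, then treat the cluster assignments as pseudo-labels that supervise the next training epoch — can be illustrated with a minimal sketch. The following pure-Python k-means is our own didactic reconstruction, not the authors' implementation: in the actual model the "points" are high-dimensional CNN feature vectors, and the returned assignments would drive a classification loss.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: in DeepCluster, the resulting assignments serve as
    pseudo-labels for the subsequent supervised training step."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # E-step: assign each feature vector to its nearest centroid (the pseudo-label)
        assign = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        # M-step: move each centroid to the mean of its assigned members
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assign, centroids
```

On two well-separated groups of feature vectors, the pseudo-labels recover the grouping, which is exactly what the network is then trained to predict.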
Therefore, in the present study, we propose a sorting algorithm for musculoskeletal X-ray radiography (SAM-X), a novel framework to support automatic classification and reasoning in the context of large image datasets.
Materials and methods
Dataset
The local institutional review and ethics board approved this retrospective study (N°48/20S). The study was performed in accordance with national and international guidelines. Informed consent was waived for this retrospective and anonymized study.
In total, 42,608 unstructured and pseudonymized radiographs with sarcoma-associated ICD codes were retrieved in DICOM format from the local Picture Archiving and Communication System (PACS) of a musculoskeletal tumor center (Klinikum rechts der Isar, Technical University of Munich). The image data were collected over the past 25 years and were heterogeneous in quality and resolution; approximately 20% were external images. Metadata such as DICOM header information or diagnoses were not yet validated and therefore not available.
All radiographs were obtained through standard radiography techniques according to the body part imaged and in accordance with the radiographic manual procedures of our institution. Based on the aforementioned sarcoma-associated ICD code selection, data were acquired at various points in time of the respective therapy. Due to the various radiographic appearances and ubiquitous locations of sarcomas, our dataset includes a variety of all body parts, including potential anatomic variations, prosthetic devices, and medical implants. Figure 1 demonstrates examples from the dataset.
Statistical analysis
The result of the initial clustering (phase 1) was assessed with the normalized mutual information (NMI) metric. NMI is a normalized variant of mutual information, a measure commonly used in information theory. Mutual information indicates the “amount of information” that can be extracted from one distribution with respect to a second one. The results of sorting the images in the classification task were measured with an accuracy score, calculated as the proportion of correctly assigned images with respect to all images from the hold-out test set. Additionally, the accuracy when considering the two most probable prediction labels was computed. Since meta-information is not yet available, no distribution analysis of sex, gender, diagnoses, etc. was conducted.
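For reference, NMI between a clustering and a reference labeling can be computed from the joint label counts. The sketch below uses arithmetic-mean normalization; the paper does not state which normalization (arithmetic, geometric, or max) was used, so this is an illustrative assumption rather than the authors' exact metric.

```python
import math
from collections import Counter

def nmi(labels_a, labels_b):
    """Normalized mutual information between two labelings of the same items,
    NMI = I(A;B) / mean(H(A), H(B)), using natural logarithms."""
    n = len(labels_a)
    pa = Counter(labels_a)                 # marginal counts of labeling A
    pb = Counter(labels_b)                 # marginal counts of labeling B
    pab = Counter(zip(labels_a, labels_b)) # joint counts
    h_a = -sum(c / n * math.log(c / n) for c in pa.values())   # entropy H(A)
    h_b = -sum(c / n * math.log(c / n) for c in pb.values())   # entropy H(B)
    mi = sum(c / n * math.log((c / n) / ((pa[a] / n) * (pb[b] / n)))
             for (a, b), c in pab.items())                     # mutual information
    if h_a == 0 or h_b == 0:
        return 1.0 if h_a == h_b else 0.0
    return mi / ((h_a + h_b) / 2)
```

An NMI of 1.0 means the clustering perfectly recovers the reference grouping (up to a relabeling of cluster ids); 0.0 means the two labelings are statistically independent.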
Model training
Model training and inference were conducted on a DGX Station A100 with four 80-GB graphics processing units (Nvidia Corporation, Santa Clara), 64 2.25-GHz cores, and 512-GB DDR4 system memory running on a Linux/Ubuntu 20.04 distribution (Canonical). Preprocessing and model implementation were performed in Python 3.9.6 (https://www.python.org/) using PyTorch 1.10.2 and CUDA toolkit 11.3 (https://pytorch.org/). The trained classification model will be available on GitHub (https://github.com/) upon publication.
Algorithm
A two-phase deep learning framework was developed consisting of a self-supervised model, human interaction through weak, semantic label assignment, and implementation of a supervised learning task for final training (Fig. 2).
In phase 1, the 42,608 unstructured and pseudonymized X-ray images from a musculoskeletal research storage were clustered into 1000 clusters by application of DeepCluster [8]. Subsequently, a senior radiologist identified several musculoskeletal labels by screening the results from phase 1. Each of the 1000 clusters was either assigned a class label or discarded if its images were not applicable to the task at hand (e.g., not a musculoskeletal X-ray, such as an upper gastrointestinal series/barium swallow or mammography). Clusters with the same class label were merged, so that “weak” (noisy) classes emerged. In phase 2, a convolutional neural network was trained on the emerged classes as weak, semantic labels: to accomplish an optimal training and update of the network’s weights, the created labels were “injected” into the same self-supervised model. The auxiliary labels from k-means for each training iteration became obsolete, since we provided the labels. The task thus became supervised, and the network’s weights were trained with respect to our classification requirements. To obtain robust performance estimates and avoid cross-contamination, the data were split into training, validation, and hold-out test sets with a split ratio of 6-2-2, and a 5-fold cross-validation was implemented. Half of the hold-out test set consisted of external imaging data to provide an independent and unbiased test set and increase the validity of the results.
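The step from phase 1 to phase 2 — mapping each image's cluster id to the radiologist's weak semantic label and discarding non-musculoskeletal clusters — can be sketched as a simple lookup. The function and identifiers below are hypothetical illustrations of this bookkeeping, not the published code.

```python
def merge_clusters(cluster_assignments, cluster_to_label):
    """Map each image's k-means cluster id to the radiologist's weak semantic
    label. Images whose cluster has no entry in cluster_to_label belong to a
    discarded (non-musculoskeletal) cluster and are dropped."""
    kept = {}
    for image_id, cluster_id in cluster_assignments.items():
        label = cluster_to_label.get(cluster_id)  # None => discarded cluster
        if label is not None:
            kept[image_id] = label  # this weak label supervises phase-2 training
    return kept
```

Merging happens implicitly: several cluster ids may map to the same label (e.g., multiple "knee" clusters), producing the 28 weak classes used to train the classifier.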
Plausibility
To add plausibility and additional insight into the AI model, Grad-CAMs were implemented in the final inference step [14]. Grad-CAMs utilize the gradient information flowing into the last convolutional layer of a deep learning network to understand specific neurons and their impact on decision-making. The result is a colored heat map, co-registered to the original input image, which indicates where the algorithm found relevant information for the task at hand. This technique was applied to gain a better understanding of where the algorithm detects relevant information. To increase expressiveness, the Grad-CAM results were averaged over the 5-fold cross-validation.
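The core Grad-CAM arithmetic can be shown on precomputed tensors: each channel weight is the global average of that channel's gradient map, and the heat map is the ReLU of the weighted sum of activation maps. This pure-Python sketch assumes the activations and gradients have already been extracted; real implementations operate on GPU tensors via autograd hooks.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heat map from the last conv layer's outputs.
    activations: list of K feature maps, each an H x W nested list
    gradients:   matching list of gradient maps of the target class score"""
    # Channel weights: global-average-pool each gradient map
    weights = [sum(sum(row) for row in g) / (len(g) * len(g[0])) for g in gradients]
    h, w = len(activations[0]), len(activations[0][0])
    # Weighted sum over channels, then ReLU to keep only positive evidence
    return [[max(0.0, sum(wk * a[i][j] for wk, a in zip(weights, activations)))
             for j in range(w)] for i in range(h)]
```

The resulting map is then upsampled to the input resolution and overlaid as the purple-to-red heat map described in the Results.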
Results
In phase 1, an NMI of 0.930 in clustering the entire dataset was reached. Subsequently, a senior radiologist identified the following 28 main musculoskeletal classes: abdomen, ankle, calcaneus, cervical spine, clavicula, elbow, femur, finger, foot, forearm, hand, hip, humerus, knee, lower leg, lumbar spine, paranasal sinus, patella, pelvis, ribs, sacrum, shoulder, skull, spine, thoracic spine, thorax, whole leg (standing), and wrist. In total, 13,175 non-musculoskeletal images from 332 clusters were discarded and a “musculoskeletal subset” of 29,433 images remained for further training. Table 1 shows the final classes with the number of images per class and percentage share with respect to the musculoskeletal subset.
In phase 2, a cross-validated classification accuracy of 96.2% for validation and 96.6% for hold-out test data was accomplished, when only considering the class with the highest prediction probability. When considering the top two predicted class labels, an accuracy of 99.7% and 99.6% for validation and testing was reached, respectively. Figure 3 shows examples of the final predictions of the knee cluster.
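The top-2 accuracy reported above is a standard top-k metric: a prediction counts as correct if the true label appears among the k classes with the highest predicted probability. A minimal sketch of this computation (our own illustration, not the study's evaluation code):

```python
def topk_accuracy(prob_rows, true_labels, k=2):
    """Fraction of samples whose true label is among the k highest-probability
    classes. prob_rows: one list of class probabilities per sample."""
    hits = 0
    for probs, truth in zip(prob_rows, true_labels):
        # Class indices sorted by descending predicted probability
        topk = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)[:k]
        hits += truth in topk
    return hits / len(true_labels)
```

With k=1 this reduces to ordinary accuracy; comparing k=1 and k=2 scores quantifies how often the correct class is the model's runner-up rather than its first choice.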
Furthermore, Fig. 4 displays the results of the cross-validated Grad-CAMs for the classes pelvis (1a, 1b) and shoulder (2a, 2b). Purple pixels indicate that the algorithm did not find any relevant information in these pixels, in contrast to red pixels, where the most relevant information was detected.
Discussion
In the present study, we demonstrate a sorting algorithm for musculoskeletal X-ray radiographs (SAM-X) to categorize images according to their anatomical entity. Based on a two-phase deep learning framework, 42,608 unstructured and pseudonymized radiographs were categorized into a total of 28 anatomical regions. Cross-validated accuracies of 96.2% for validation and 96.6% for hold-out test data indicated a high accuracy and excellent reliability. The aforementioned final predictions of the knee cluster demonstrate that even very different pathologies and appearances of knees were correctly identified and categorized. Further, the calculated Grad-CAMs display the algorithm’s focus for predicting a specific class and help to unravel the black-box nature of DL methodology. These results underline the plausibility and robustness of the predictions: the algorithm primarily focused on anatomical regions that are significant for the respective class and would also have been used by radiologists to determine the class.
Today, the emerging use of medical AI systems and neural networks presents an important, yet “still in the making,” opportunity for daily clinical routine and research. Radiology has always been at the front line of evolution in medical imaging: since the introduction of digital imaging systems [15] and teleradiology [16], computer-aided diagnosis tools [17], AI systems, and machine learning methods have emerged in the age of digitization. Ultimately, this has led to a significant increase in radiology examinations [17, 18]. Although previous research projects have largely analyzed AI and humans in head-to-head comparisons [4, 9], potentially promoting a lack of trust in these new technologies, future practice is more likely to follow the human-in-the-loop setup used in our study as well, in which humans actively collaborate with AI systems. Due to the complexity of clinical radiology examinations and the amount of data linked to each examination, daily radiological operation may become inefficient, requiring tools to improve workflow and productivity [19]. Moreover, with respect to imaging research, the most time-consuming part will become dataset building, potentially forming the bottleneck [7] between data collection and the creation of structured rather than unstructured data. Also, with respect to musculoskeletal disorders, radiologists, clinicians, and researchers face a diverse group of underlying pathologies which, given the aforementioned increasing number of radiology examinations and large image datasets, may benefit from pre-sorting algorithms to maintain order and effectiveness. Although the aforementioned technologies have emerged, to date radiography still plays an essential and fundamental role in diagnosing, differentiating, and assessing the onset as well as the progression of various musculoskeletal diseases [20,21,22,23,24].
Mainly to harmonize occupational exposures and to facilitate study comparison in meta-analyses for use in occupational healthcare, surveillance, or research [25], the principle of categorizing musculoskeletal disease has already been widely used for identifying potential etiological or work-related factors that may lead to the onset or worsening of musculoskeletal disorders [26, 27]. However, case categorization in research or daily clinical routine needs to consider feasibility and the availability of resources. Hence, Dionne et al [28] proposed minimal and optimal case definitions for categorizing musculoskeletal diseases depending on the research purpose, to promote balanced results. In contrast to the aforementioned studies, which focus on the preceding etiological aspect of musculoskeletal diseases, the sorting algorithm proposed in our study steps in after image acquisition, yet remains in line with the approach of Dionne et al: our pre-sorting algorithm based on the anatomical entity provides a minimal yet optimal tool for classifying radiographs in the assessment of musculoskeletal diseases. Since the expansion of radiological exams is likely to generate large volumes of information, and in order to establish a common hub for handling such large image datasets with potentially underlying musculoskeletal diseases, the data in our study were categorized according to their anatomical entity, since most musculoskeletal diseases are, at least in part, classified according to the anatomical entity in which they are located [2].
To the best of our knowledge, no framework for curating and categorizing medical radiographs by musculoskeletal characteristics has yet been proposed. However, related problems have been addressed. In 2005, Lehmann et al [29] proposed automatic categorization of medical images into 80 classes, e.g., by imaging modality and biological system, in the context of content-based image retrieval (CBIR). Uwimana et al [30] also demonstrated a content-based image retrieval model by establishing links between low-level image features and high-level features of text codes. Gál et al [31] proposed a CBIR model with a multidisciplinary approach that solves the classification problem by combining image features, metadata, textual, and referential information. More recently, Guo et al [32] presented an interactive algorithm for dermatological image quantification that combines computation, visualization, and expert interaction. The most comparable study was proposed by Kart et al [33]: DeepMCAT, an unsupervised clustering approach also based on DeepCluster [8], in which end-to-end training automatically categorizes large-scale cardiac MR images into 13 classes without any annotation. The main differences from the method presented by Kart et al are that we integrated weak annotation to train the CNN according to our requirements and that we categorize X-ray images into 28 classes. We hypothesize that our approach can generally achieve high accuracy due to the implementation of powerful state-of-the-art self-supervised methodology, while keeping the demand for human interaction at a minimum and remaining adaptable to other requirements with only minor adjustments.
We acknowledge that our study has several limitations. Firstly, the classifier assumed an input of images that relate to one of the (pre)defined radiographic classes. Images derived from a different imaging modality, such as ultrasound or cross-sectional imaging, and falsely fed into our SAM-X model would not be detected as such but forced into one of the classes. However, this is a common issue with AI classification models and needs to be addressed in the future. Secondly, weak supervision is a machine learning approach that uses imprecise (noisy) data for supervised learning, usually bypassing the time-consuming task of hand-labelling the whole dataset [34], for example by obtaining weak labels with clustering methods. Assuming that some images were “incorrectly” clustered even though the region of interest is present in the image (e.g., shoulder and clavicula, or femur and knee), it is reasonable to consider multiple labels for a single image. To address this issue, we calculated a second accuracy score considering the two predictions with the highest probability (as presented in the “Results” section). The respective scores for validation and testing reached 99.7% and 99.6% (compared to 96.2% and 96.6% for a single label). These numbers indicate that the model did not assign weak labels based on incidental image features such as background or artefacts, but did indeed label according to similar anatomical features.
In conclusion, to manage the increasing number of radiology examinations and the accompanying large image datasets, we propose a precise human-in-the-loop sorting algorithm for classifying radiographs in the assessment of musculoskeletal diseases according to the anatomical entity in which they are located. For dataset building, the algorithm promises to be an efficient and time-saving tool in the post-image-acquisition setting.
Abbreviations
- AI: Artificial intelligence
- CBIR: Content-based image retrieval
- DL: Deep learning
- NMI: Normalized mutual information
- PACS: Picture Archiving and Communication System
References
World Health Organization (2003) The burden of musculoskeletal conditions at the start of the new millennium: report of a WHO scientific group
WHO Classification of Tumours Editorial Board (2020) Soft tissue and bone tumours. International Agency for Research on Cancer
Aiello M, Cavaliere C, D'Albore A, Salvatore M (2019) The challenges of diagnostic imaging in the era of big data. J Clin Med 8(3):316
Liu X, Faes L, Kale AU et al (2019) A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. Lancet Digit Health 1(6):e271–e297
Kharat AT, Singhal S (2017) A peek into the future of radiology using big data applications. Indian J Radiol Imaging 27(2):241–248
Larson DB, Magnus DC, Lungren MP, Shah NH, Langlotz CP (2020) Ethics of using and sharing clinical imaging data for artificial intelligence: a proposed framework. Radiology 295(3):675–682
Rajpurkar P, Chen E, Banerjee O, Topol EJ (2022) AI in health and medicine. Nat Med 28(1):31–38
Caron M, Bojanowski P, Joulin A, Douze M (2018) Deep clustering for unsupervised learning of visual features. In: Proceedings of the European conference on computer vision (ECCV)
von Schacky CE, Wilhelm NJ, Schäfer VS et al (2021) Multitask deep learning for segmentation and classification of primary bone tumors on radiographs. Radiology 301(2):398–406
Nguyen X-B, Lee GS, Kim SH, Yang HJ (2020) Self-supervised learning based on spatial awareness for medical image analysis. IEEE Access 8:162973–162981
Ghesu FC, Georgescu B, Mansoor A et al (2022) Self-supervised Learning from 100 Million Medical Images. arXiv preprint arXiv:2201.01283
Montagnon E, Cerny M, Cadrin-Chênevert A et al (2020) Deep learning workflow in radiology: a primer. Insights Imaging 11(1):22
Willemink MJ, Koszek WA, Hardell C et al (2020) Preparing medical imaging data for machine learning. Radiology 295(1):4–15
Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D (2017) Grad-cam: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE international conference on computer vision
Alhajeri M, Aldosari H, Aldosari B (2017) Evaluating latest developments in PACS and their impact on radiology practices: A systematic literature review. Inform Med Unlocked 9:181–190
Mun SK, Turner JW (1999) Telemedicine: emerging e-medicine. Annu Rev Biomed Eng 1:589–610
Doi K (2007) Computer-aided diagnosis in medical imaging: historical review, current status and future potential. Comput Med Imaging Graph 31(4-5):198–211
McDonald RJ, Schwartz KM, Eckel LJ et al (2015) The effects of changes in utilization and technological advancements of cross-sectional imaging on radiologist workload. Acad Radiol 22(9):1191–1198
Schemmel A, Lee M, Hanley T et al (2016) Radiology workflow disruptors: a detailed analysis. J Am Coll Radiol 13(10):1210–1214
Al-Rakhami MS, Islam MM, Islam MZ, Asraf A, Sodhro AH, Ding W (2021) Diagnosis of Covid-19 from X-rays using combined CNN-RNN architecture with transfer learning. MedRxiv:2020.08.24.20181339
Chhikara P, Singh P, Gupta P, Bhatia T (2020) Deep convolutional neural network with transfer learning for detecting pneumonia on chest X-rays. In: Advances in bioinformatics, multimedia, and electronics circuits and signals. Springer, pp 155–168
Das NN, Kumar N, Kaur M, Kumar V, Singh D (2020) Automated deep transfer learning-based approach for detection of COVID-19 infection in chest X-rays. IRBM
Ory PA (2003) Radiography in the assessment of musculoskeletal conditions. Best Pract Res Clin Rheumatol 17(3):495–512
Taljanovic MS, Hunter TB, Fitzpatrick KA, Krupinski EA, Pope TL (2003) Musculoskeletal magnetic resonance imaging: importance of radiography. Skeletal Radiol 32(7):403–411
van der Molen HF, Visser S, Alfonso JH et al (2021) Diagnostic criteria for musculoskeletal disorders for use in occupational healthcare or research: a scoping review of consensus- and synthesised-based case definitions. BMC Musculoskelet Disord 22(1):169
Ma K, Zhuang ZG, Wang L et al (2019) The Chinese Association for the Study of Pain (CASP): consensus on the assessment and management of chronic nonspecific low back pain. Pain Res Manag 2019:8957847
Sluiter JK, Rest KM, Frings-Dresen MH (2001) Criteria document for evaluating the work-relatedness of upper-extremity musculoskeletal disorders. Scand J Work Environ Health 27(Suppl 1):1–102
Dionne CE, Dunn KM, Croft PR et al (2008) A consensus approach toward the standardization of back pain definitions for use in prevalence studies. Spine (Phila Pa 1976) 33(1):95–103
Lehmann TM, Guld MO, Deselaers T et al (2005) Automatic categorization of medical images for content-based retrieval and data mining. Comput Med Imaging Graph 29(2-3):143–155
Uwimana E, Ruiz ME (2008) Automatic classification of medical images for content based image retrieval systems (CBIR). In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting. SAGE Publications Sage CA, Los Angeles, CA
Gál V, Solt I, Gedeon T, Nachtegael M (2011) Multi-disciplinary modality classification for medical images. Magnetic Resonance Imaging 17:1–7
Guo X, Yu Q, Li R et al (2016) An expert-in-the-loop paradigm for learning medical image grouping. In: Pacific-Asia conference on knowledge discovery and data mining. Springer
Kart T, Bai W, Glocker B, Rueckert D (2021) DeepMCAT: large-scale deep clustering for medical image categorization. Springer International Publishing, Cham
Zhou Z-H (2018) A brief introduction to weakly supervised learning. Nat Sci Rev 5(1):44–53
Funding
Open Access funding enabled and organized by Projekt DEAL. The authors state that this work has not received any funding.
Ethics declarations
Guarantor
The scientific guarantor of this publication is Jan Neumann.
Conflict of interest
The authors of this manuscript declare no relationships with any companies whose products or services may be related to the subject matter of the article.
Statistics and biometry
No complex statistical methods were necessary for this paper.
Informed consent
Written informed consent was waived by the Institutional Review Board.
Ethical approval
Institutional Review Board approval was obtained (N°48/20S).
Methodology
• retrospective
• diagnostic or prognostic study/experimental
• performed at one institution
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Hinterwimmer, F., Consalvo, S., Wilhelm, N. et al. SAM-X: sorting algorithm for musculoskeletal x-ray radiography. Eur Radiol 33, 1537–1544 (2023). https://doi.org/10.1007/s00330-022-09184-6