Automated Three-Dimensional Imaging and Pfirrmann Classification of Intervertebral Disc Using a Graphical Neural Network in Sagittal Magnetic Resonance Imaging of the Lumbar Spine

Baur, David; Bieck, Richard; Berger, Johann; Schöfer, Patrick; Stelzner, Tim; Neumann, Juliane; Neumuth, Thomas; Heyde, Christoph-E.; Voelker, Anna

doi:10.1007/s10278-024-01251-2

Automated Three-Dimensional Imaging and Pfirrmann Classification of Intervertebral Disc Using a Graphical Neural Network in Sagittal Magnetic Resonance Imaging of the Lumbar Spine

Original Paper
Open access
Published: 12 September 2024

(2024)
Cite this article

Download PDF

You have full access to this open access article

Journal of Imaging Informatics in Medicine Aims and scope Submit manuscript

Automated Three-Dimensional Imaging and Pfirrmann Classification of Intervertebral Disc Using a Graphical Neural Network in Sagittal Magnetic Resonance Imaging of the Lumbar Spine

Download PDF

David Baur¹,
Richard Bieck²,
Johann Berger²,
Patrick Schöfer²,
Tim Stelzner²,
Juliane Neumann²,
Thomas Neumuth²,
Christoph-E. Heyde¹ &
…
Anna Voelker ORCID: orcid.org/0000-0002-2715-5438¹

14 Accesses
Explore all metrics

Abstract

This study aimed to develop a graph neural network (GNN) for automated three-dimensional (3D) magnetic resonance imaging (MRI) visualization and Pfirrmann grading of intervertebral discs (IVDs), and benchmark it against manual classifications. Lumbar IVD MRI data from 300 patients were retrospectively analyzed. Two clinicians assessed the manual segmentation and grading for inter-rater reliability using Cohen's kappa. The IVDs were then processed and classified using an automated convolutional neural network (CNN)–GNN pipeline, and their performance was evaluated using F1 scores. Manual Pfirrmann grading exhibited moderate agreement (κ = 0.455–0.565) among the clinicians, with higher exact match frequencies at lower lumbar levels. Single-grade discrepancies were prevalent except at L5/S1. Automated segmentation of IVDs using a pretrained U-Net model achieved an F1 score of 0.85, with a precision and recall of 0.83 and 0.88, respectively. Following 3D reconstruction of the automatically segmented IVD into a 3D point-cloud representation of the target intervertebral disc, the GNN model demonstrated moderate performance in Pfirrmann classification. The highest precision (0.81) and F1 score (0.71) were observed at L2/3, whereas the overall metrics indicated moderate performance (precision: 0.46, recall: 0.47, and F1 score: 0.46), with variability across spinal levels. The integration of CNN and GNN offers a new perspective for automating IVD analysis in MRI. Although the current performance highlights the need for further refinement, the moderate accuracy of the model, combined with its 3D visualization capabilities, establishes a promising foundation for more advanced grading systems.

ISSLS PRIZE IN BIOENGINEERING SCIENCE 2017: Automation of reading of radiological features from magnetic resonance images (MRIs) of the lumbar spine without human intervention is comparable with an expert radiologist

Article Open access 06 February 2017

Deep learning for automated, interpretable classification of lumbar spinal stenosis and facet arthropathy from axial MRI

Article 15 March 2023

External validation of the deep learning system “SpineNet” for grading radiological features of degeneration on MRIs of the lumbar spine

Article 14 July 2022

Discover the latest articles, news and stories from top researchers in related subjects.

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Introduction

Low back pain (LBP) is a prevalent condition often associated with degenerative lumbar spinal diseases. These degenerative changes primarily affect the intervertebral disc (IVD) region, involving complex alterations in both the nucleus pulposus and adjacent vertebral end plates [1]. The Pfirrmann classification system, introduced in 2001, has been widely used to standardize the description of these degenerative changes [2]. This system categorizes IVD degeneration into five progressive severity levels based on the disc's structural integrity, signal intensity variations, and alterations in disc height.

However, the Pfirrmann classification system has shown limitations in terms of reliability and consistency. Studies have reported moderate intra-observer agreement with kappa values ranging from 0.69 to 0.81, while inter-rater reliability metrics have been found to vary between 0.491 and 0.830 [3, 4]. These inconsistencies pose challenges for the seamless integration of this classification into routine clinical diagnostics.

In response to these limitations, researchers have developed alternative classification systems. For instance, Soydan et al. proposed a novel 8-grade system that aims to overcome some of the limitations of the Pfirrmann classification by providing a more comprehensive assessment of disc degeneration, including aspects such as disc homogeneity, nucleus-annulus separation, signal intensity, disc height, and disc border regularity [5].

A recent meta-analysis by Compte et al. critically evaluated the performance of various machine learning algorithms in identifying lumbar disc degeneration and associated pathologies from MRI, comparing them to radiologists. The study highlighted that while current models, particularly deep learning approaches, show promise, they often underperform in external validations, pointing to the need for further research in developing robust algorithms that can generalize across diverse datasets [6].

Recent advancements in artificial intelligence (AI) and machine learning (ML) have opened new ways to address these limitations. Several studies have explored the application of ML techniques in automating the assessment of disc degeneration. For instance, Niemeyer et al. developed a deep learning model for the classification of disc degeneration based on MRI data, achieving promising results with an accuracy of 97.3% [7]. Their convolutional neural network (CNN) approach demonstrated high reliability across different spinal levels and degenerative stages.

Graph Neural Networks (GNNs) have emerged as a promising tool for processing non-standardized and irregular data, making them particularly suitable for 3D medical image analysis. Zhang et al. reviewed the application of GNNs in image-guided disease diagnosis, highlighting their ability to effectively represent relationships between relevant regions of interest in medical images [8]. Further supporting this, Ahmedt-Aristizabal et al. systematically reviewed the use of GNNs in medical diagnosis and analysis, emphasizing the potential of GNNs to capture both local and global structural information, which is critical in medical image analysis [9].

Despite these advancements, several gaps and limitations persist in the current literature. Most studies focus on 2D image analysis, potentially overlooking important 3D structural information. The integration of CNN and GNN approaches for comprehensive IVD analysis has not been thoroughly investigated, and the potential of 3D point-cloud representations in enhancing the accuracy of IVD degeneration classification remains unexplored.

To address these gaps, our study aims to develop and evaluate a novel CNN-GNN pipeline for automated 3D imaging and Pfirrmann classification of IVDs. By leveraging both 2D and 3D information, we hypothesize that this approach could provide more accurate and consistent assessments of IVD degeneration, potentially overcoming the limitations of current classification methods.

Materials and Methods

This study hypothesized that incorporating three-dimensional (3D) geometric forms to differentiate degenerative IVD pathologies could more accurately distinguish between degeneration classes of IVDs. Transitioning from conventional convolutional neural networks (CNNs), such as Pfirrmann class prediction, to GNNs is a potentially promising solution. We decided to train a GNN and compare its grading capabilities for Pfirrmann classes with those of human researchers.

Study Subjects

As this study was designed retrospectively, approved by the local ethics committee (025/21-ek), and was performed in accordance with the Declaration of Helsinki, the need for informed consent was waived. Inclusion was limited to patients with LBR who underwent MRI diagnostics with sagittal T2-weighted magnetic resonance imaging (MRI) slices of the whole lumbar spine (L1-S1) at our university hospital in 2015–2021. The exclusion criteria were age < 18 years, previous lumbar spine surgery, pre-existing spine implants, detectable tumors, or fractures. In total, 300 patients with 6043 sagittal T2-weighted turbo spin-echo (TSE) MR scans of the lumbar spine were included.

MRI Imaging

Different MRI scanners were used in the timeframe of patient inclusion ((1.5 Tesla MRI Aera, 3.0 Tesla Trio, Siemens, Erlangen, Germany) or (3.0 Tesla MRI Ingenia, Philipps, Amsterdam, Netherlands)). PACS software Syngo Plaza (Siemens Medical Solutions, Germany) was used. All sagittal T2-weighted TSE sequences of the lower spine were included. There were no exclusions based on the imaging quality.

Generation of Ground Truth Labels and csv Files

We extracted sagittal T2-weighted MRI slices of the lumbar spine from all 300 patients. After pseudonymization of the patient data, MRI slices were saved in the DICOM format. For mask generation, two independent spine surgery specialists segmented the data using segmentation software (Materialize – Mimics Version 22.0.0.524 Löwen, Belgium). The MRI slices were segmented into labels for each lumbar vertebra (L1-S1) and lumbar IVDs (L1/2-L5/S1) within all sagittal slices. All midline sagittal MR images of the lumbar spine were graded according to the Pfirrmann classification. The results were saved in Microsoft Excel. Intra- and inter-rater reliabilities were calculated using Cohen's kappa coefficients [10]. All labels were saved as 3-channel RGB. jpg files with a resolution of 300 dots per inch. For training, a.csv file was created. It contained labels for vertebrae, IVDs, and Pfirrmann grading of all respective IVDs for direct implementation in a Python 3.6 environment to prepare the data for further processing.

Image Processing Pipeline and Deep Learning Task

We automated an image-processing pipeline that accepted patient-individual sagittal MRI slices as input and classified IVD information according to a previously applied Pfirrmann grading. The pipeline consists of three main stages: (1) CNN-based image segmentation using a U-Net architecture to segment IVDs and vertebral bodies from each MRI slice, (2) 3D surface and volume reconstruction, where segmented slices were combined to create 3D representations of each IVD using marching cubes algorithm and point cloud generation techniques, and (3) graph-based classification, where the 3D point cloud representations were converted into graphs and analyzed using a GNN to predict Pfirrmann grades.

Our dataset consisted of multiple MRI slices per patient. To account for the potential correlation between slices from the same patient, we implemented a hierarchical approach in our analysis. For the segmentation task, we used a patient-level cross-validation strategy, ensuring all slices from a single patient were either in the training set or the test set, but never split between the two. For the classification task, we aggregated predictions across all slices for each patient before making a final classification decision.

MRI Segmentation

For segmentation, a pre-trained U-Net model (feature size = 32), previously employed for the identification of muscle and fat tissue in axial images, was used [11, 12]. The model was fine-tuned on the segmentation labels generated for the axial MRI slices as in the previous studies. We initiated a ten-fold cross-validation study as part of the training process. The slices and labels were transformed and augmented to increase the training sample size and reduce overfitting. The segmentation performance was evaluated using the F1 score (Fig. 1a, b).

For the segmentation model, we employed a tenfold cross-validation approach. In each fold, 10% of the data was used as test data, while the remaining 90% was split into 85% training and 15% validation data. We used an Adam optimizer with a learning rate of 0.001. The loss function used was dice loss, which is particularly effective for segmentation tasks.

Segmentation was performed on all available sagittal slices for each patient, not just the midline. This comprehensive approach allowed us to construct a detailed 3D representation of each IVD, capturing its full geometry across multiple slices.

3D Surface and Volume Reconstruction

In the reconstruction step, the segmented MRI slices were processed into 3D point-cloud representations of the surface and volume of the target IVDs for further analysis (Fig. 2). This first required the reconstruction of voxel representations of segmented MRI slices based on known pixel spacing from the DICOM imaging parameters [13]. An MRI volume can be represented as a 3D array, where each voxel's position is defined by its row, column, and slice indices (x, y, z coordinates), and its intensity value represents the measured signal at that location. This representation allows us to treat the MRI volume as a discrete scalar field for subsequent processing steps.

To capture the 3D geometry of the IVDs more flexibly, we employed random sampling of 80% of the points within the mesh volume, intentionally deviating from the regular grid structure of the original MRI voxels. This method has been shown to improve the representation of complex shapes and boundaries by allowing for a more adaptive structure, which is beneficial for graph-based analysis in 3D medical imaging [14, 15]. The resulting surface meshes comprised vertices and edge connections and were further processed to remove artifacts and improve the overall mesh quality using feature-preserving smoothing and edge splitting. This resulted in minimal reduction in the ground-truth mesh volume. Subsequently, point sampling was performed using the volume of the consolidated surface meshes. We applied the following approaches: (a) native voxel-to-volume, (b) random, and (c) regularized ellipsoid sampling. Using (a), we translated the scalar field values from the DICOM slices into node intensities; (b) was based on rejection sampling from inside the mesh; and (c) was based on the maximum and minimum values of the disc magnitudes.

Normalization to [-1,1] was applied to the data, and feature calculations were performed. These calculations included the distance of the nodes to the disc center, analogous to the nucleus, and the distance of the nodes to specified hypernodes that represent the point of maximum distance in each direction [16]. Positional encoding with a Laplacian eigenvector for each node with a step size of 20 and positional encoding with a random walk step size of 20 were used.

A radius graph (0.1 radius) was used to generate edge indices between nodes to perform message passing, and face indices were employed as edge indices [17]. Two reference architectures were used for comparison. First, a standard GCN model performing convolutions over the node features was used as the baseline [18]. The second model used two subnetworks that employed GCN and x-convolution layers separately, combined with attention layers for consolidation [19]. This dual-graph model incorporated the advantages of both convolution operations over feature maps and the spatial operations of x-conv across point clouds.

Model Training and Optimization

To train and optimize the dual-graph model, we implemented various augmentation functions, including random rotation and random jitter; this was aimed to increase the diversity of the training dataset and to improve the model generalization [20]. The model was trained for 50 epochs using decoupled weight decay regularization with an AdamW optimizer [21].

To address the issue of class imbalance, we employed a weighted geometric mean of the focal loss and cross-entropy loss (CEL) as the objective function for model optimization [22]. The weights were adjusted to offset class imbalance and ensure adequate representation of all IVD classes during the training process.

Statistical Analysis

We evaluated the inter-rater reliability of the IVD classification among human experts (two spine surgeons) and a dual-graph model using Cohen's kappa coefficient [10]. Further, we computed Fleiss' kappa to assess reliability among all three raters [23]. The model performance metrics, including precision, recall, and F1 score were calculated to assess the effectiveness of the dual-graph model in classifying IVDs according to the Pfirrmann grading.

Results

A total of 300 patients were included in the study, yielding 6043 sagittal T2-weighted TSE MR scans of the lumbar spine. The descriptive statistics of the patients are presented in Table 1.

Table 1 Descriptive patients’ data

Full size table

Automated image segmentation using the U-Net model demonstrated high efficiency. For the predicted segmentation of the IVD, F1 score, precision, and recall of 0.85, 0.83, and 0.88, respectively were achieved. Similarly, high values were obtained for the predicted segmentation of the vertebral bodies, with F1 score, precision, and recall of 0.87, 0.87, and 0.89, respectively. These results reflect the model's performance at the slice level, while accounting for patient-level correlations through our hierarchical approach as described in the methods.

Analysis of the Pfirrmann classification revealed moderate agreement between the two researchers across all lumbar disc levels. The mean Cohen's kappa value for inter-rater reliability was 0.507. The Cohen's kappa values for each IVD were as follows: for L1/2, L2/3, L3/4, L4/5, L5/S1, κ = 0.455, 0.500, 0.519, 0.496, and 0.565, respectively. These kappa values indicated that the inter-rater agreement exceeded what would be expected by chance alone and fell within the "moderate agreement" category.

The highest agreement was observed at the L5/S1 level, suggesting that the researchers were more consistently aligned in their assessments of the lower lumbar IVDs. Conversely, the L1/2 level exhibited the lowest agreement, indicating greater variability in grading this particular disc level.

Further analysis of the grading outcomes revealed that most gradings were consistent between the two researchers, with exact matches ranging from 131 to 207 across spinal levels, and single-grade differences being more common than two or more grade differences.

The performance of the GNN model in classifying the same disc degeneration was assessed using precision, recall, and F1 score metrics. Detailed analyses are presented in Tables 2, 3, and 4. The model achieved the highest precision (0.81), recall (0.63), and F1 score (0.71) at L2/3. The mean precision, recall, and F1 scores of the model across all levels were 0.46, 0.47, and 0.46, respectively. The standard deviations for these metrics across levels indicated variability in the model's performance, in the range 0.058–0.259, 0.052–0.213, and 0.081–0.209 for precision, recall, and the F1 score, respectively.

Table 2 Precision graph model

Full size table

Table 3 Recall graph model

Full size table

Table 4 F1 score graph model

Full size table

Discussion

To the best of our knowledge, this study is the first to evaluate the use of a complex CNN–GNN data pipeline for the analysis of degenerative disc disease on lumbar MRI images.

Our results for inter-rater reliability according to the Pfirrmann grading system exhibited moderate inter-rater agreement, with Cohen’s kappa values in the range 0.455–0.565. This finding is partially consistent with those of previous studies. For instance, Niemeyer et al. demonstrated a similar level of inter-rater reliability in the human Pfirrmann classification of discs, with a kappa value of 0.59, aligning with our moderate agreement category [7]. However, Carrino et al. exhibited a slightly higher kappa value of 0.66, which fell within the substantial agreement range [3].

The lowest inter-rater reliability was observed in the L1/2 segment, whereas the highest reliability was observed in the L5/S1 segment. Regarding the L1/2 segment, the GNN showed a low performance at the segmental level. However, its performance in the L5/S1 segment was unsatisfactory. Teraguchi et al. conducted a cross-sectional study on the prevalence and distribution of disc degeneration and showed that in the lumbar spine, the L1/2 (30.85%) segment exhibited the least degenerative changes, whereas the L4/5 (72.45%) and L5/S1 (68.8%) segments exhibited the most severe signs of degeneration [24]. The poorer inter-rater reliability may be attributed to the minor changes observed on MRI with mild degenerative changes in the L1/2 segment that are difficult to differentiate by the human eye.

Our study's results can be further contextualized within the broader landscape of automated IVD analysis. Jamaludin et al. achieved high accuracy in disc localization and segmentation using a vertebral approach [25]. Castro-Mateos et al. focused on 2D segmentation and degeneration grading [26], achieving promising results but lacking the 3D perspective our method provides. Huang et al.'s Spine Explorer demonstrated the potential of deep learning for vertebrae and disc quantification, aligning with our goal of comprehensive spine analysis [27].

It's important to note that the Pfirrmann system poses certain challenges for deep learning models, as observed by Gao et al. [28]. Their push–pull regularization network addressed some of these challenges, achieving high accuracy in automated grading. Our GNN-based approach aims to tackle these difficulties by incorporating 3D structural information, potentially offering a more comprehensive analysis of disc degeneration.

A recent study by Soydan et al. used a similar U-Net segmentation approach followed by Inception V3 classification, achieving high accuracy (segmentation: 0.9597, classification: 0.9346) [29]. While our segmentation performance was slightly lower (F1 score: 0.85), both studies demonstrate the potential of CNN-based approaches in IVDD analysis.

GNNs sometimes struggle with overfitting the training data, which can reduce their performance on new, unseen data. To mitigate this, we implemented several strategies, including data augmentation techniques, a weighted loss function to address class imbalance, and decoupled weight decay regularization during training. These steps aim to improve the model's generalization capability and robustness to unseen data.

This study had several limitations. The training data for this study, despite encompassing a wide range of patient ages, had most patients averaging approximately 60 years. This age distribution could potentially lead to underrepresentation of lower Pfirrmann grades in the dataset. Moreover, as this was a single-center study, the findings were confined to different MRI scanners, substantially limiting the generalizability of the results.

Our current dataset and study design did not allow for analysis of potential gender or BMI effects on model performance. Previous research has shown that these factors can influence disc degeneration patterns [30]. Future studies should consider stratifying results by gender and BMI, and potentially incorporating these as input features to improve model accuracy and generalizability.

The potential for employing 3D models in IVD assessment is a promising avenue for future research but requires validation. Our findings support the integration of AI tools with human expertise for more accurate evaluation of spinal pathology. However, the limitations of this study, such as potential biases in image selection and inconsistencies in the GNN model, must be acknowledged. These factors may affect the generalizability of our results, underscoring the importance of further research using larger and more diverse datasets to establish the effectiveness of new classification systems, particularly those that utilize 3D evaluation techniques.

Conclusion

Our study introduces a novel CNN-GNN pipeline for 3D visualization and classification of lumbar disc degeneration. While the model’s performance varied across spinal levels, it demonstrated the potential of integrating 3D geometric information in automated IVD analysis. Despite current limitations, this approach offers a new perspective on capturing spatial relationships in disc degeneration assessment. Future research should focus on refining these techniques, utilizing larger datasets, and incorporating additional clinical factors to improve accuracy and generalizability. The continued development of such AI-based 3D imaging approaches could significantly enhance our understanding and evaluation of intervertebral disc degeneration.

Data Availability

The detailed datasets used in this study are available from the corresponding author upon request.

Abbreviations

MRI:: Magnetic resonance imaging
CNN:: Convolutional neural network
GNN:: Graph neural network
ML:: Machine learning
IVD:: Intervertebral disc
VB:: Vertebral body
AI:: Artificial intelligence
3D:: Three-dimensional
LBP:: Low back pain

References

Moore RJ. The vertebral endplate: disc degeneration, disc regeneration. Eur Spine J 2006; 15 Suppl 3(Suppl 3):S333–7.
Pfirrmann CW, Metzdorf A, Zanetti M, Hodler J, Boos N. Magnetic resonance classification of lumbar intervertebral disc degeneration. Spine 2001; 26(17):1873–8.
Article CAS PubMed Google Scholar
Carrino JA, Lurie JD, Tosteson ANA, Tosteson TD, Carragee EJ, Kaiser J et al. Lumbar spine: reliability of MR imaging findings. Radiology 2009; 250(1):161–70.
Article PubMed Google Scholar
Arana E, Royuela A, Kovacs FM, Estremera A, Sarasíbar H, Amengual G et al. Lumbar spine: agreement in the interpretation of 1.5-T MR images by using the Nordic Modic Consensus Group classification form. Radiology 2010; 254(3):809–17.
Article PubMed Google Scholar
Soydan Z, Bayramoglu E, Urut DU, Iplikcioglu AC, Sen C. Tracing the disc: The novel qualitative morphometric MRI based disc degeneration classification system. JOR Spine 2024; 7(1):e1321.
Article CAS PubMed PubMed Central Google Scholar
Compte R, Granville Smith I, Isaac A, Danckert N, McSweeney T, Liantis P et al. Are current machine learning applications comparable to radiologist classification of degenerate and herniated discs and Modic change? A systematic review and meta-analysis. European Spine Journal 2023; 32(11):3764–87.
Article PubMed Google Scholar
Niemeyer F, Galbusera F, Tao Y, Kienle A, Beer M, Wilke H-J. A Deep Learning Model for the Accurate and Reliable Classification of Disc Degeneration Based on MRI Data. Invest Radiol 2021; 56(2):78–85.
Article PubMed Google Scholar
Zhang L, Zhao Y, Che T, Li S, Wang X. Graph neural networks for image‐guided disease diagnosis: A review. iRADIOLOGY 2023; 1(2):151–66.
Article Google Scholar
Ahmedt-Aristizabal D, Armin MA, Denman S, Fookes C, Petersson L. Graph-Based Deep Learning for Medical Diagnosis and Analysis: Past, Present and Future. Sensors (Basel) 2021; 21(14).
Cohen J. A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement 1960; 20(1):37–46.
Article Google Scholar
Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In: Navab N, Hornegger J, Wells WM, Frangi AF, editors. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Cham: Springer International Publishing; 2015. p. 234–41 (Lecture Notes in Computer Science).
Baur D, Bieck R, Berger J, Neumann J, Henkelmann J, Neumuth T et al. Analysis of the paraspinal muscle morphology of the lumbar spine using a convolutional neural network (CNN). Eur Spine J 2022; 31(3):774–82.
Article PubMed Google Scholar
Pham DL, Xu C, Prince JL. Current methods in medical image segmentation. Annu Rev Biomed Eng 2000; 2:315–37.
Article CAS PubMed Google Scholar
Guo Y, Wang H, Hu Q, Liu H, Liu L, Bennamoun M. Deep Learning for 3D Point Clouds: A Survey. IEEE Trans Pattern Anal Mach Intell 2021; 43(12):4338–64.
Article PubMed Google Scholar
Xiao A, Huang J, Guan D, Zhang X, Lu S, Shao L. Unsupervised Point Cloud Representation Learning With Deep Neural Networks: A Survey; 2022 9. Available from: URL: http://arxiv.org/pdf/2202.13589.
Belkin M, Niyogi P. Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering. In: Advances in Neural Information Processing Systems. MIT Press; 2001 Available from: URL: https://proceedings.neurips.cc/paper_files/paper/2001/file/f106b7f99d2cb30c3db1c3cc0fde9ccb-Paper.pdf.
Bevilacqua B, Zhou Y, Ribeiro B. Size-Invariant Graph Representations for Graph Classification Extrapolations; 2021.
Kipf TN, Welling M. Semi-Supervised Classification with Graph Convolutional Networks; 2016.
Li J, Chen BM, Lee GH. SO-Net: Self-Organizing Network for Point Cloud Analysis. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE; 2018. p. 9397–406.
Shorten C, Khoshgoftaar TM. A survey on Image Data Augmentation for Deep Learning. J Big Data 2019; 6(1).
Loshchilov I, Hutter F. Decoupled Weight Decay Regularization; 2017.
Lin T-Y, Goyal P, Girshick R, He K, Dollár P. Focal Loss for Dense Object Detection; 2017.
Fleiss JL. Measuring nominal scale agreement among many raters. Psychological Bulletin 1971; 76(5):378–82.
Article Google Scholar
Teraguchi M, Yoshimura N, Hashizume H, Muraki S, Yamada H, Minamide A et al. Prevalence and distribution of intervertebral disc degeneration over the entire spine in a population-based cohort: the Wakayama Spine Study. Osteoarthritis Cartilage 2014; 22(1):104–10.
Article CAS PubMed Google Scholar
Jamaludin A, Lootus M, Kadir T, Zisserman A. Automatic Intervertebral Discs Localization and Segmentation: A Vertebral Approach. In: Vrtovec T, Yao J, Glocker B, Klinder T, Frangi A, Zheng G et al., editors. Computational Methods and Clinical Applications for Spine Imaging. Cham: Springer International Publishing; 2016. p. 97–103 (Lecture Notes in Computer Science).
Castro-Mateos I, Pozo JM, Lazary A, Frangi AF. 2D segmentation of intervertebral discs and its degree of degeneration from T2-weighted magnetic resonance images. In: Medical Imaging 2014: Computer-Aided Diagnosis. SPIE; 2014. p. 903517 (SPIE Proceedings).
Huang J, Shen H, Wu J, Hu X, Zhu Z, Lv X et al. Spine Explorer: a deep learning based fully automated program for efficient and reliable quantifications of the vertebrae and discs on sagittal lumbar spine MR images. Spine J 2020; 20(4):590–9.
Article PubMed Google Scholar
Gao F, Liu S, Zhang X, Wang X, Zhang J. Automated Grading of Lumbar Disc Degeneration Using a Push-Pull Regularization Network Based on MRI. J Magn Reson Imaging 2021; 53(3):799–806.
Article PubMed Google Scholar
Soydan Z, Bayramoglu E, Karasu R, Sayin I, Salturk S, Uvet H. An Automatized Deep Segmentation and Classification Model for Lumbar Disk Degeneration and Clarification of Its Impact on Clinical Decisions. Global Spine J 2023:21925682231200783.
Takatalo J, Karppinen J, Niinimäki J, Taimela S, Näyhä S, Järvelin M-R et al. Prevalence of degenerative imaging findings in lumbar magnetic resonance imaging among young adults. Spine 2009; 34(16):1716–21.
Article PubMed Google Scholar
Samartzis D, Karppinen J, Chan D, Luk KDK, Cheung KMC. The association of lumbar intervertebral disc degeneration on magnetic resonance imaging with body mass index in overweight and obese adults: a population-based study. Arthritis Rheum 2012; 64(5):1488–96.
Article PubMed Google Scholar

Download references

Funding

Open Access funding enabled and organized by Projekt DEAL. The study was partially funded by Deutsche Arthrose-Hilfe e.V. (Grant number PP513-A585).

Author information

Authors and Affiliations

Department for Orthopedics, Trauma and Plastic Surgery, University Hospital Leipzig, Liebigstraße 20, 04103, Leipzig, Germany
David Baur, Christoph-E. Heyde & Anna Voelker
Innovation Center Computer Assisted Surgery (ICCAS), Universität Leipzig, Leipzig, Germany
Richard Bieck, Johann Berger, Patrick Schöfer, Tim Stelzner, Juliane Neumann & Thomas Neumuth

Authors

David Baur
View author publications
You can also search for this author in PubMed Google Scholar
Richard Bieck
View author publications
You can also search for this author in PubMed Google Scholar
Johann Berger
View author publications
You can also search for this author in PubMed Google Scholar
Patrick Schöfer
View author publications
You can also search for this author in PubMed Google Scholar
Tim Stelzner
View author publications
You can also search for this author in PubMed Google Scholar
Juliane Neumann
View author publications
You can also search for this author in PubMed Google Scholar
Thomas Neumuth
View author publications
You can also search for this author in PubMed Google Scholar
Christoph-E. Heyde
View author publications
You can also search for this author in PubMed Google Scholar
Anna Voelker
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

DB, RB, and AV wrote the main manuscript; DB and AV performed the human Pfirrmann classification; RB, TS, PS and DB wrote the code; JN evaluated the measurements; and TN and CEH supervised manuscript writing.

Corresponding author

Correspondence to Anna Voelker.

Ethics declarations

Ethics Approval

This study protocol was approved by the ethics committee of the Medical Faculty at the University of Leipzig, Germany (Ethics Committee; 025/21-ek) and in accordance with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.

Consent to Participate

Informed consent was obtained from all individual participants included in the study.

Competing Interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

David Baur and Richard Bieck shared first authorship.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Baur, D., Bieck, R., Berger, J. et al. Automated Three-Dimensional Imaging and Pfirrmann Classification of Intervertebral Disc Using a Graphical Neural Network in Sagittal Magnetic Resonance Imaging of the Lumbar Spine. j Imaging. Inform. med. (2024). https://doi.org/10.1007/s10278-024-01251-2

Download citation

Received: 07 April 2024
Revised: 25 August 2024
Accepted: 27 August 2024
Published: 12 September 2024
DOI: https://doi.org/10.1007/s10278-024-01251-2

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Automated Three-Dimensional Imaging and Pfirrmann Classification of Intervertebral Disc Using a Graphical Neural Network in Sagittal Magnetic Resonance Imaging of the Lumbar Spine

Abstract

Similar content being viewed by others

ISSLS PRIZE IN BIOENGINEERING SCIENCE 2017: Automation of reading of radiological features from magnetic resonance images (MRIs) of the lumbar spine without human intervention is comparable with an expert radiologist

Deep learning for automated, interpretable classification of lumbar spinal stenosis and facet arthropathy from axial MRI

External validation of the deep learning system “SpineNet” for grading radiological features of degeneration on MRIs of the lumbar spine

Introduction