Abstract
Parkinson’s disease (PD) exhibits heterogeneity in terms of symptoms and prognosis, likely due to diverse neuroanatomical alterations. This study employs a contrastive deep learning approach to analyze Magnetic Resonance Imaging (MRI) data from 932 PD patients and 366 controls, aiming to disentangle PD-specific neuroanatomical alterations. The results reveal that these neuroanatomical alterations in PD are correlated with individual differences in dopamine transporter binding deficit, neurodegeneration biomarkers, and clinical severity and progression. The correlation with clinical severity is verified in an external cohort. Notably, certain proteins in the cerebrospinal fluid are strongly associated with PD-specific features, particularly those involved in the immune function. The most notable neuroanatomical alterations are observed in both subcortical and temporal regions. Our findings provide deeper insights into the patterns of brain atrophy in PD and potential underlying molecular mechanisms, paving the way for earlier patient stratification and the development of treatments to slow down neurodegeneration.
Introduction
Parkinson’s disease (PD) is the second most common neurodegenerative disease, with heterogeneous motor and cognitive symptoms and a highly variable prognosis, suggesting that underlying neuroanatomical alterations are also heterogeneous1,2. Understanding the heterogeneity in neuroanatomical alterations is crucial for facilitating early stratification of patients at risk of fast disease progression3,4,5. However, neuroanatomical alterations, such as gray matter volume loss, may not be specific to PD, as they are influenced by factors such as genetics, early life factors, and lifestyle that cause inter-individual neuroanatomical variations6,7,8. Differences in data acquisition sites and methodological artifacts may also contribute to variations in the neuroanatomical manifestations of different groups of patients.
Prior studies have primarily relied on linear models to explore neuroimaging differences between PD patients and controls at the group level, adjusting for covariates7,9,10,11. Nevertheless, the relationship between these covariates and brain imaging data might not always be linear but could involve intricate nonlinear interactions12,13. Additionally, these studies have primarily focused on group-level comparisons, offering a general understanding of PD-related differences but neglecting the disease-specific variations unique to each patient7,9,10,11. Capturing such patient-specific variations is crucial for achieving more precise diagnoses and tailored therapeutic interventions.
The contrastive variational autoencoder (CVAE) is a powerful deep-learning model that can disentangle disease-specific features, both linearly and nonlinearly, from variations shared by both case and control populations14. This approach has recently been employed to disentangle autism spectrum disorder-specific features from neuroanatomical variations shared with controls15. Similarly, in PD, the pathological accumulation of α-synuclein leads to the loss of dopaminergic neurons in the basal ganglia, a shared underlying mechanism that characterizes the disease16,17. Given its capacity to isolate these disease-specific features from the general neuroanatomical variations observed in both case and control subjects, the CVAE is a highly suitable model for PD research. Moreover, the CVAE yields a separate set of disease-specific features for each patient, which allows the identification of disease-specific variations that are unique to that patient.
The dopaminergic neuronal dysfunction in PD can be measured using dopamine transporter (DAT) imaging, which is a useful marker of disease severity in PD18. However, pathological α-synuclein not only targets the basal ganglia but also spreads throughout the brain connectome, inducing axonal dysfunction in distinct brain regions17,19. Neurofilament light chain protein (NfL) in serum is a sensitive measure of neuroaxonal damage, which is highly expressed in large-caliber myelinated axons20. Serum NfL levels increase proportionally to the degree of axonal dysfunction5, providing an additional useful biomarker for PD severity20,21.
In this study, we employed CVAE to disentangle features unique to PD from structural MRI data that encompass variations shared by both PD patients and controls (Supplementary Fig. 1). We validated the disentangled PD-specific features by demonstrating that PD-specific features were related to DAT binding deficit, biofluid markers for neurodegeneration, and clinical severity and progression. In addition, we identified cerebrospinal fluid (CSF) proteins that are associated with PD-specific features to understand which biological pathways are relevant to PD-specific neuroanatomical changes. Finally, we leveraged the characteristics of the CVAE to visualize PD-specific neuroanatomical changes.
Results
Demographic and clinical characteristics
In total, 1298 participants were included in this study, consisting of 932 patients with PD and 366 controls. The participants came from three datasets: the Parkinson’s Progression Markers Initiative (PPMI), the Second Affiliated Hospital of Zhejiang University (SAHZJU), and the Second Affiliated Hospital of Soochow University (SAHSU). Table 1 summarizes the baseline demographic and clinical characteristics of the participants. There were no significant differences in age among the three datasets of PD patients and controls (one-way ANOVA, p = 0.91). The proportion of females among the PD patients was highest in SAHSU, followed by SAHZJU, and lowest in PPMI (50.70%, 44.77%, and 38.76%, respectively). The median (IQR: interquartile range) years of education of PD patients and controls in PPMI was 16 (14–18) years. The participants in SAHZJU and SAHSU had lower educational levels, with medians (IQR) of 9 (5–11.5) and 9 (5.5–11.5) years, respectively. In addition, the mean Movement Disorder Society-Unified Parkinson Disease Rating Scale (MDS-UPDRS)-Part III score of the PD patients in SAHSU was higher than that in SAHZJU and PPMI (27.00 [18.50, 34.50], 20.00 [13.00, 32.00], 20.00 [14.00, 27.00], respectively). The participants in PPMI had higher Montreal Cognitive Assessment (MoCA) scores than those in SAHZJU (PD: 27.00 [26.00, 29.00] vs 22.00 [18.00, 26.00]; controls: 28.00 [27.00, 29.00] vs 25.00 [22.00, 27.00]).
The PD-specific features correlate with disease severity
We found that measures of PD severity were predominantly associated with PD-specific features rather than shared features, as indicated by Kendall rank correlation. These measures include putamen specific binding ratios (SBR) (PD-specific: τ = 0.03, t9 = 7.46, p < 0.001; shared: τ = −0.00, t9 = −0.84, p = 0.422), MDS-UPDRS-Part III (PD-specific: τ = 0.01, t9 = 5.57, p < 0.001; shared: τ = 0.00, t9 = 0.51, p = 0.624), and MoCA (PD-specific: τ = −0.01, t9 = −4.24, p = 0.002; shared: τ = 0.01, t9 = 1.07, p = 0.313). Similar results were obtained when analyzing fluid biomarkers, including serum NfL (PD-specific: τ = 0.03, t9 = 9.69, p < 0.001; shared: τ = 0.00, t9 = 0.61, p = 0.555), CSF α-Synuclein (PD-specific: τ = 0.02, t9 = 10.73, p < 0.001; shared: τ = 0.00, t9 = 0.72, p = 0.488), CSF Total Tau (T-Tau) (PD-specific: τ = 0.02, t9 = 5.99, p < 0.001; shared: τ = −0.00, t9 = −0.48, p = 0.642), CSF Phosphorylated Tau (P-Tau) (PD-specific: τ = 0.01, t9 = 5.24, p < 0.001; shared: τ = 0.00, t9 = 0.35, p = 0.732), and CSF Aβ42 (PD-specific: τ = 0.01, t9 = 7.39, p < 0.001; shared: τ = 0.00, t9 = 0.32, p = 0.758) (Fig. 1). The details of these tests can be found in Supplementary Table 1. The representational similarity analysis (RSA) conducted using the SAHZJU dataset yielded similar findings, as shown in Supplementary Fig. 2. Additionally, the PD-specific features were associated with non-clinical information (Fig. 1 and Supplementary Fig. 3). This suggests that non-clinical factors may have an impact on brain structural changes in PD, which will be discussed further (see “Discussion”).
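The statistics reported above (a Kendall τ followed by a t statistic with 9 degrees of freedom) are consistent with one τ per latent sampling and a one-sample t-test over ten samplings. The sketch below illustrates one plausible form of such an analysis on synthetic data. The function names, the Euclidean distance metric, and the simulated patients are assumptions made for illustration and are not taken from the study.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import kendalltau, ttest_1samp


def rsa_tau(features, measure):
    """Kendall tau between pairwise feature distances and pairwise score differences."""
    feature_dist = pdist(features, metric="euclidean")          # patient-by-patient latent distance
    measure_dist = pdist(measure[:, None], metric="euclidean")  # absolute difference in the score
    tau, _ = kendalltau(feature_dist, measure_dist)
    return tau


def rsa_group_test(feature_samples, measure):
    """One tau per latent sampling, then a one-sample t-test of the taus against zero."""
    taus = np.array([rsa_tau(f, measure) for f in feature_samples])
    t_stat, p_value = ttest_1samp(taus, popmean=0.0)
    return taus, t_stat, p_value


# Synthetic demonstration only: 60 "patients", 16 latent features, 10 samplings.
rng = np.random.default_rng(0)
n_patients, n_latent, n_samplings = 60, 16, 10
severity = rng.normal(size=n_patients)
base = rng.normal(size=(n_patients, n_latent))
base[:, 0] += 2.0 * severity  # make one latent dimension track severity
samples = [base + 0.1 * rng.normal(size=base.shape) for _ in range(n_samplings)]

taus, t_stat, p_value = rsa_group_test(samples, severity)
print(f"mean tau = {taus.mean():.3f}, t({n_samplings - 1}) = {t_stat:.2f}, p = {p_value:.3g}")
```

Because τ is computed over all patient pairs, small τ values can still yield large t statistics when they are stable across samplings.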
We conducted analyses to assess the independence of PD-specific features from shared features, both quantitatively and qualitatively (Supplementary Figs. 4 and 5). Our findings indicate a low overall similarity (r = 0.031–0.056) between PD-specific and shared features across 10 sampling results. After dimension reduction, the two sets of features separate into two distinct groups in the scatter plot. Additionally, we evaluated the sensitivity and specificity of CVAE in distinguishing PD-specific features from structural MRI data, and the results demonstrate its effectiveness in this task (Supplementary Figs. 4–6).
We further developed a non-contrastive variational autoencoder (VAE) to investigate the associations between neuroanatomy and PD symptoms, without necessitating the separation of PD-specific and shared variations. Our Kendall rank correlation analysis revealed that there were no significant correlations between the neuroanatomical features extracted by the VAE and any of the clinical or non-clinical characteristics (one-sample t-test, site: p = 0.15; age: p = 0.47; gender: p = 0.78; education: p = 0.94; putamen SBR: p = 0.14; serum NfL: p = 0.41; MDS-UPDRS-Part III: p = 0.22; MoCA: p = 0.30; CSF α-Synuclein: p = 0.73; CSF T-Tau: p = 0.63; CSF P-Tau: p = 0.35; CSF Aβ42: p = 0.41) (Fig. 1). This suggests that the neuroanatomical features derived from the VAE, encompassing both PD-specific and shared variations, may not capture the structural changes underlying PD symptoms.
In conclusion, by employing the CVAE, we successfully distinguished the neuroanatomical variations specific to PD from variations shared by both case and control populations. Notably, these PD-specific features exhibited differential associations with disease severity.
Generalization to an independent dataset
To test generalization, we applied the pre-trained VAE and CVAE models to an independent set of PD patients (the SAHSU dataset), using a zero-free-parameters strategy. This allowed us to disentangle PD-specific and shared features without retraining the models. As indicated by Kendall rank correlation, PD-specific features were more strongly correlated with MDS-UPDRS-Part III scores than shared features were (PD-specific: τ = 0.04, t9 = 7.81, p < 0.001; shared: τ = 0.00, t9 = 0.11, p = 0.913) (Fig. 7). These results indicate that the CVAE model is generalizable and can effectively disentangle shared and PD-specific features from neuroanatomical images of PD patients.
PD-specific features correlate with clinical progression
We estimated the slope of each clinical score over time using linear mixed-effects (LME)22 models and investigated its association with shared and PD-specific features. Our Kendall rank correlation analysis revealed that PD-specific features were significantly correlated with MDS-UPDRS-Part III slope (PD-specific: τ = 0.03, t9 = 7.43, p < 0.001; shared: τ = −0.00, t9 = −0.65, p = 0.532) and MoCA slope (PD-specific: τ = 0.01, t9 = 3.10, p = 0.013; shared: τ = 0.00, t9 = 0.03, p = 0.978), while shared features were not (Fig. 2b). These results support the notion that PD-specific features are closely related to motor and cognitive progression in PD.
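To make the notion of a per-patient progression slope concrete, the following sketch fits an ordinary least-squares line to each patient's visits. This is a deliberately simplified stand-in and not the LME model used in the study: an LME fit would additionally pool information across patients and shrink individual slopes toward the group mean. The function name and the toy visit schedule are hypothetical.

```python
import numpy as np


def per_patient_slopes(patient_ids, months, scores):
    """Least-squares slope of score versus time for each patient (not an LME fit)."""
    patient_ids = np.asarray(patient_ids)
    months = np.asarray(months, dtype=float)
    scores = np.asarray(scores, dtype=float)
    slopes = {}
    for pid in np.unique(patient_ids):
        mask = patient_ids == pid
        if np.unique(months[mask]).size < 2:
            continue  # a slope needs at least two distinct visit times
        slope, _intercept = np.polyfit(months[mask], scores[mask], deg=1)
        slopes[pid.item()] = float(slope)
    return slopes


# Toy data: three patients, four visits each, noise-free linear progression.
ids = np.repeat([0, 1, 2], 4)
months = np.tile([0.0, 12.0, 24.0, 36.0], 3)
true_slopes = {0: 0.10, 1: 0.30, 2: -0.05}
scores = np.array([20.0 + true_slopes[i] * m for i, m in zip(ids.tolist(), months)])

slopes = per_patient_slopes(ids, months, scores)
print(slopes)
```

The resulting slopes (points per month) are the kind of quantity that can then be related to the latent features by rank correlation.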
To further support the association between PD-specific features and clinical progression, we clustered PD patients into two subgroups based on their PD-specific features. Specifically, we reduced the 16-dimensional PD-specific features to 2 dimensions using principal component analysis and fed these into a K-means clustering algorithm (Fig. 2c). The result suggested that 2 clusters were optimal, comprising 62.3% (n = 302) and 37.7% (n = 183) of the 485 PD patients from the PPMI cohort. We then used LME models to compare the clinical progression of the two subgroups by testing the interaction of group and time. Interestingly, we found that patients in subgroup 1 had slower rates of progression in MDS-UPDRS-Part III (subgroup * month: standardized β = 0.08, 95% CI, 0.03–0.13, p = 0.01), but faster rates of progression in MoCA (subgroup * month: standardized β = 0.01, 95% CI, 0.00–0.03, p = 0.05) than those in subgroup 2 (Fig. 2d). We also compared the demographic and clinical characteristics of these two subgroups and found significant differences in CSF α-synuclein (two-sample t-test, p = 0.007) and gender (two-sample t-test, p = 0.0001) (Supplementary Table 2).
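The dimensionality reduction and clustering step can be sketched with scikit-learn as follows. The text does not state which criterion was used to decide that two clusters were optimal, so the silhouette score used here is an assumption, as are the function name, the candidate cluster counts, and the synthetic latent features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score


def cluster_latents(features, candidate_k=(2, 3, 4, 5), seed=0):
    """Project latent features to 2-D with PCA, then pick k by silhouette score."""
    embedded = PCA(n_components=2, random_state=seed).fit_transform(features)
    best = None
    for k in candidate_k:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(embedded)
        score = silhouette_score(embedded, labels)
        if best is None or score > best[0]:
            best = (score, k, labels)
    _, best_k, best_labels = best
    return embedded, best_k, best_labels


# Synthetic 16-dimensional "PD-specific features" drawn from two well-separated groups.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, size=(120, 16))
group_b = rng.normal(loc=4.0, size=(80, 16))
features = np.vstack([group_a, group_b])

embedded, best_k, labels = cluster_latents(features)
print(best_k, np.bincount(labels))
```

On real latent features the separation is unlikely to be this clean, so the chosen number of clusters should be checked against more than one criterion.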
PD-specific features correlate with CSF proteins
We identified 1773 CSF proteins that displayed significantly stronger associations with PD-specific features than with shared features (paired two-sample t-tests, FDR-corrected p < 0.05), as depicted in Fig. 3a. These proteins were mainly enriched in the immune system, encompassing processes related to T cell activation, positive regulation of cytokine production, regulation of peptidase activity, leukocyte proliferation, neutrophil degranulation, neutrophil activation, and lymphocyte proliferation (Fig. 3b). Furthermore, these proteins were also enriched in several molecular functions and cellular components, including collagen-containing extracellular matrix, vesicle lumen, cytokine receptor binding, and glycosaminoglycan binding (Supplementary Fig. 8). Using STRING protein–protein interaction network analysis, we found that these proteins are highly interconnected, with the top hub node being AKT1 (Fig. 3d). Additionally, we investigated whether these 1773 proteins were associated with putamen SBR, fluid biomarkers, clinical severity, and progression. Ultimately, we identified 12 proteins that were correlated with all these factors (Fig. 3c).
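The protein screen combines a paired test per protein with a false discovery rate correction. A minimal sketch under stated assumptions is given below: the per-sampling correlation values are simulated, the Benjamini-Hochberg procedure is assumed as the FDR method (the text only says "FDR-corrected"), and the comparison is made on signed rather than absolute correlations.

```python
import numpy as np
from scipy.stats import ttest_rel


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (step-up procedure)."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working from the largest p-value downwards.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted


# Simulated correlations: rows are latent samplings, columns are proteins.
rng = np.random.default_rng(0)
n_samplings, n_proteins = 10, 50
tau_shared = rng.normal(0.0, 0.005, size=(n_samplings, n_proteins))
tau_specific = rng.normal(0.0, 0.005, size=(n_samplings, n_proteins))
tau_specific[:, :5] += 0.05  # first five proteins truly favour PD-specific features

# Paired test per protein: PD-specific versus shared correlation across samplings.
t_stat, p_raw = ttest_rel(tau_specific, tau_shared, axis=0)
p_fdr = benjamini_hochberg(p_raw)
selected = np.flatnonzero((p_fdr < 0.05) & (t_stat > 0))
print("proteins passing FDR:", selected)
```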
Visualization of the PD-specific neuroanatomical alterations
To identify neuroanatomical alterations related to PD-specific features at the PD group level, we calculated the average Jacobian maps across all PD patients from PPMI and SAHZJU (Fig. 4a). Our findings revealed a widespread atrophy pattern, predominantly affecting the subcortical regions and extending to the temporal, frontal, and parietal lobes across the cerebral cortex (Fig. 4b). Furthermore, beyond the observed atrophy, we identified volumetric enlargement in select white matter areas, including periventricular regions, the right centrum semiovale, and within the cerebellar white matter (Fig. 4b).
To identify neuroanatomical variation loci within the PD population, we calculated the standard deviation of Jacobian maps across all PD patients from PPMI and SAHZJU (Fig. 4c). The results suggest that the frontal, temporal, and parietal lobes and the cerebellum exhibit high variance. Conversely, subcortical regions display lower variability. The detailed results for PPMI and SAHZJU alone are shown in Supplementary Figs. 9 and 10, respectively. Notably, analysis using the SAHSU dataset (Supplementary Fig. 11) showed brain patterns similar to those of PPMI and SAHZJU, further supporting the robustness of our findings across different datasets and populations. These findings suggest that subcortical atrophy is a common feature of PD patients, while neuroanatomical alterations in the cortex and cerebellum exhibit greater variance. The Supplementary Results and Discussion section contains a detailed description of the neuroanatomical alterations between the two subgroups.
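The group-level summaries described in this section reduce to a voxel-wise mean and a voxel-wise standard deviation over the stack of per-patient Jacobian maps. The sketch below shows this on a small synthetic stack. It assumes the convention that Jacobian values below 1 indicate local contraction, and the array sizes and region layout are invented for illustration.

```python
import numpy as np


def summarize_jacobian_maps(jacobian_maps):
    """Voxel-wise mean and sample standard deviation across patients (axis 0)."""
    maps = np.asarray(jacobian_maps, dtype=float)
    return maps.mean(axis=0), maps.std(axis=0, ddof=1)


# Synthetic stack: 20 patients, 8 x 8 x 8 voxels.
rng = np.random.default_rng(0)
maps = np.ones((20, 8, 8, 8))
maps[:, :4] = rng.normal(0.9, 0.01, size=(20, 4, 8, 8))  # consistent shrinkage, low variance
maps[:, 4:] = rng.normal(1.0, 0.10, size=(20, 4, 8, 8))  # no mean change, high variance

mean_map, sd_map = summarize_jacobian_maps(maps)
print("mean in region 1:", mean_map[:4].mean(), "SD in region 2:", sd_map[4:].mean())
```

A region can therefore show a clear group-level change with little between-patient variability, or the reverse, which is the distinction drawn between subcortical and cortical or cerebellar regions.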
Discussion
In this study, we used a deep-learning model, CVAE, to disentangle PD-specific features from structural MRI data that include variations shared by both PD patients and controls. The PD-specific features disentangled by the CVAE showed associations with well-established PD biomarkers, including DAT binding, α-Synuclein, and serum NfL, as well as clinical severity measures such as MDS-UPDRS-Part III and MoCA. Moreover, the CVAE was successfully generalized to an independent dataset without the need for additional training to disentangle PD-specific features. Furthermore, we also identified 1773 CSF proteins that exhibited stronger correlations with PD-specific features compared to shared features, with the most prominent being enriched in the immune system. Our visualization of the PD-specific features revealed widespread neuroanatomical changes in both cortical and subcortical regions.
The use of CVAE in this study provides an advantage in avoiding non-PD-related confounding variations, enabling the identification of features specific to PD. PD-specific features disentangled by CVAE were associated with deficits in DAT binding on SPECT imaging as well as levels of NfL in serum. DAT-SPECT imaging is a sensitive method for detecting dopamine neuronal dysfunction, which is a hallmark of PD18. The NfL is a biomarker of axonal damage and is also useful in tracking disease severity and progression in PD20. The associations of PD-specific features with these measures indicate that the PD-specific features reflect the underlying neurodegeneration in PD. Additionally, the PD-specific features were found to be associated not only with α-Synuclein but also with tau and Aβ42, suggesting a synergistic contribution of the three proteins to structural changes in PD. This is consistent with previous findings showing correlations between levels of CSF proteins and clinical progression and the facilitation of α-Synuclein aggregation and propagation by tau in PD23,24,25. PD-specific features were also found to be associated with age and sex, which is in line with previous studies demonstrating the effects of these factors on brain structural changes in PD26,27,28. Furthermore, the linkages between PD-specific features and study sites may stem from the varying distribution of patients with differing disease severities across different study sites, as evident from our analysis of the relationships among sites, PD biomarkers, and clinical scores (Supplementary Table 3). In conclusion, our use of CVAE has enabled the identification of PD-specific features that are associated with disease severity.
The current study identified 1773 CSF proteins that showed stronger correlations with PD-specific features compared to shared features. Consistent with the accumulating evidence supporting the involvement of immune function in PD29, we have discovered several immunologically pertinent GO terms, suggesting potential therapeutic avenues aimed at mitigating inflammatory processes that contribute to PD neurodegeneration29. Central to our protein interaction network analysis is AKT1, a genetic risk factor for PD30. AKT1's centrality within this network underscores its potential significance in PD pathogenesis. Animal studies have suggested that activating the Akt1-CREB pathway could potentially halt neurodegeneration in PD, providing a promising therapeutic target31. Furthermore, insulin-like growth factor 2 (IGF2) emerges as a target that can prevent dopaminergic neuronal loss and reduce intracellular α-Synuclein accumulation in PD models32. Additionally, our study highlights the therapeutic potential of targeting Synapsin III (SY3). Gene silencing of SY3 represents a potential therapeutic strategy to rescue α-Synuclein transgenic mice from PD-like phenotypes33, suggesting its role in mitigating neurodegeneration. Loss of ubiquitin carboxyl-terminal hydrolase L1 (UCHL1; also called PARK5) can reverse defects related to PD by suppressing glycolysis34. Deficiency of CD44, a non-kinase transmembrane glycoprotein, has been shown to repress neuroinflammation and rescue dopaminergic neurons in PD mouse models35. High mobility group box 1 (HMGB1), a DNA-binding protein that can be actively secreted by inflammatory cells and passively released by necrotic cells, may play a key role in PD pathogenesis36,37. Collectively, these proteins, which are closely associated with PD-specific features, represent potential therapeutic targets for the development of treatments aimed at mitigating or halting neurodegeneration in PD.
Our visualization of the PD-specific features at the group level is consistent with previous international multicentre analyses of brain structure, which have shown extensive cortical thinning, subcortical volume losses, and increased ventricle volume7,10,11,38. Interestingly, despite the observed cortical thinning and subcortical volume losses, we also found improved white matter structure in certain regions. This finding is in line with previous diffusion tensor imaging studies, which have demonstrated microstructural compensation in the prodromal/early stages of PD at the regional scale39. The compensation mechanisms might be masking or buffering against PD pathology40. Among all the improved white matter structures, the cerebellar white matter showed the most substantial improvement, which is consistent with increasing evidence suggesting that the cerebellum plays a compensatory role in PD27,40. Nevertheless, our study also identified brain regions, such as frontal areas, which have not been extensively described in previous research7,10,11,38.
By leveraging the CVAE model, our study quantified disease-specific variations in individual PD patients. Our findings reveal that the changes in subcortex, the disease epicentre of PD10,41,42, exhibit less variability within the patient group. Conversely, changes in brain regions like the frontal lobe and cerebellum exhibit greater differences among patients. This suggests that the rate or pattern of α-Synuclein spread varies among patients, with some patients experiencing earlier involvement of these regions than others43. The underlying pathogenesis of PD is the "prion-like" mechanism of α-Synuclein spreading through brain connections10,41,42. Given the frontal lobe's vital role in cognitive functions, its impairment may explain the observed variations in cognitive decline among PD patients38,44.
The results of this study should be interpreted within the context of several limitations. First, the imbalance in sample size between PD patients and controls may impact the model training, potentially compromising the ability to effectively disentangle features specific to PD. Second, there were differences in demographic characteristics between the PD and control groups, which may influence the disentangling of PD-specific features. It should be noted that any changes that are common in the control group will not be distinctly disentangled, which could also affect the results. To address these limitations, future studies could consider incorporating a larger and more diverse control dataset to improve the identification of more subtle PD-specific features. Finally, the CVAE model was applied to MRI data resampled to a size of 64 × 64 × 64. This may limit the model's ability to capture more subtle structural changes in the PD brain. Further studies with larger sample sizes and higher-resolution inputs may identify additional, more subtle neuroanatomical changes in the brains of PD patients. The CVAE model architecture is flexible and can be extended to other types of data, such as diffusion MRI and non-imaging data. Incorporating such data types is an important direction for future research and development.
In summary, this study used the CVAE model to disentangle PD-specific features from structural MRI data, revealing associations with well-established PD biomarkers, disease severity, and progression of both motor and cognitive symptoms. Additionally, we identified a set of CSF proteins enriched in immune function that are associated with PD-specific features. Furthermore, the CVAE model's ability to identify specific loci of neuroanatomical changes at the individual level presents an opportunity for patient stratification in precision treatments for PD.
Methods
Study participants
The PPMI is an observational, longitudinal, prospective multicentre study designed to identify PD biomarkers to accelerate disease-modifying therapeutic trials45. The PPMI study enrolled PD patients who met the following criteria: being 30 years of age or older; diagnosed within 2 years; having a baseline Hoehn and Yahr stage less than III; displaying dopaminergic deficit on imaging; untreated with dopamine replacement therapy; and having either at least two of resting tremor, bradykinesia, and rigidity, or a single symptom of asymmetric bradykinesia or asymmetric resting tremor. Controls were required to have no current or active clinically significant neurological disorder, no first-degree relative with PD, and normal DAT-SPECT imaging. A total of 646 participants from PPMI were included in the study, comprising 485 PD patients (with 188 females) and 161 healthy controls (with 56 females).
The SAHZJU dataset comprises 376 PD participants (with 167 females) and 205 control participants (with 78 females). The SAHSU dataset, which was used as a validation cohort, comprised 71 PD patients, with 36 females and 35 males. The patients with PD in the SAHZJU and SAHSU datasets were diagnosed by movement disorder specialists according to the United Kingdom Parkinson’s Disease Society Brain Bank criteria46. Controls in the SAHZJU and SAHSU datasets had no current or active clinically significant neurological disorder and no first-degree relative with PD.
To evaluate the severity of PD, we employed various techniques. First, the striatal binding ratio (SBR) was determined using SPECT measurements with the tracer [123I] FP-CIT. This ratio provides an indication of dopamine neuron terminal density45. Additionally, we used fluid biomarkers, including those from cerebrospinal fluid (CSF) and serum, as well as clinical scores. With regard to clinical symptoms, we particularly focused on motor and cognitive manifestations, as in previous studies10,42. To assess motor symptoms comprehensively, we utilized the MDS-UPDRS-Part III. For evaluating global cognitive function, we employed the Montreal Cognitive Assessment (MoCA).
The PPMI study gained approval from the local ethics committees at each participating site. Additionally, this study received approval from the medical ethics committee of the Second Affiliated Hospital of Zhejiang University School of Medicine and the Second Affiliated Hospital of Soochow University. Written informed consent was obtained from all participating patients. All ethical regulations relevant to human research participants were followed.
MRI data acquisition and processing
T1-weighted MRI data from the PPMI used scanners from three different manufacturers, including Siemens, GE Medical Systems, and Philips Medical Systems. Acquisition parameters were as follows: repetition time (TR) = 2300/1900 ms; echo time (TE) = 2.98/2.96/2.27/2.48/2.52 ms; inversion time (IR) = 900 ms; flip angle (FA) = 9°; matrix = 256 × 256; number of slices = 196 (sagittal); and 1 × 1 × 1 mm³ isotropic resolution. Details regarding acquisition parameters and protocols can be found on the PPMI website (https://www.ppmi-info.org/).
T1-weighted MRI data from the SAHZJU dataset were scanned on a GE Discovery MR750 3.0 T MRI scanner, using a fast spoiled gradient-recalled sequence: TR = 7.336 ms; TE = 3.036 ms; IR = 450 ms; FA = 11°; field of view (FOV) = 260 × 260 mm²; matrix = 256 × 256; slice thickness = 1.2 mm; number of slices = 196 (sagittal). The total acquisition time was 5 min and 53 s. Earplugs and foam pads were used to reduce noise and head motion.
T1-weighted MRI data from the SAHSU dataset were acquired using a Siemens Prisma MR750 3.0 T MRI scanner. A three-dimensional magnetization prepared rapid acquisition gradient echo (MPRAGE) sequence was employed with the following parameters: TR = 2300 ms; TE = 2.34 ms; FA = 8°; FOV = 256 × 256 mm²; matrix = 256 × 256; slice thickness = 1.0 mm; number of slices = 240 (sagittal); and 1 × 1 × 1 mm³ isotropic resolution. The total acquisition time was 3 min and 54 s.
Before processing, all MRI data underwent a thorough inspection to identify and exclude any instances of masking errors or motion artifacts. Consequently, we excluded 81 participants (48 females) from the PPMI dataset, resulting in a final study population of 646 participants, comprising 485 patients with PD (188 females) and 161 controls (56 females). Similarly, 26 PD participants and 17 controls from SAHZJU were excluded, narrowing down the subsequent analysis to 376 PD participants (167 females) and 205 control participants (78 females). For a comprehensive overview of the screening process and methods, please refer to Supplementary Fig. 12. The qualified T1-weighted MRIs were then preprocessed using the FreeSurfer software. The FreeSurfer preprocessing involves several steps, including motion correction, intensity normalization to compensate for inhomogeneity, and removal of non-brain tissues. Once the FreeSurfer preprocessing was complete, the MRI images were linearly and non-linearly aligned into Montreal Neurological Institute (MNI152-2009c) standard space using Advanced Normalization Tools (ANTs)47. For computational efficiency, and to strike a balance between memory usage and image detail, the input images were normalized and resampled to a size of 64 × 64 × 64 (Supplementary Fig. 1b), as in a previous study15.
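The final resampling step can be illustrated with SciPy. The sketch below is only an approximation of that step: the text does not specify the interpolation scheme or the form of intensity normalization, so trilinear interpolation and min-max scaling to [0, 1] are assumptions, and the input is a random array standing in for a registered brain volume.

```python
import numpy as np
from scipy.ndimage import zoom


def resample_and_normalize(volume, target_shape=(64, 64, 64)):
    """Resample a 3-D volume to target_shape and rescale intensities to [0, 1]."""
    volume = np.asarray(volume, dtype=np.float32)
    factors = [t / s for t, s in zip(target_shape, volume.shape)]
    resampled = zoom(volume, factors, order=1)  # order=1 is (tri)linear interpolation
    lo, hi = resampled.min(), resampled.max()
    if hi > lo:
        resampled = (resampled - lo) / (hi - lo)
    return resampled


# Random stand-in for a skull-stripped, MNI-registered T1-weighted image.
rng = np.random.default_rng(0)
volume = rng.random((96, 114, 96)).astype(np.float32)
small = resample_and_normalize(volume)
print(small.shape, float(small.min()), float(small.max()))
```

Downsampling of this kind cuts the number of input voxels substantially, which is the memory saving referred to above, at the cost of fine anatomical detail.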
Biofluid biomarker measurements
The PPMI study collected CSF and serum samples at baseline. The collection, handling, shipment, and storage of CSF and blood samples of PD patients were carried out in accordance with the PPMI biological manual45.
The Simoa singleplex NF-light assay was used to measure the serum NfL levels. The levels of CSF α-synuclein were measured using an enzyme-linked immunosorbent assay kit commercially available from BioLegend. Levels of Amyloid-β 1-42 (Aβ42), phosphorylated tau at threonine 181 position (P-tau), and total tau (T-tau) in CSF were measured using the multiplex Luminex xMAP platform with Fujirebio-Innogenetics INNO-BIA AlzBio3 immunoassay kit-based reagents.
The CSF proteomics data were measured using the SOMAscan platform, which measures each protein as relative fluorescence units of Slow Off-Rate Modified Aptamers (SOMAmers) that specifically bind to target proteins on slide-based hybridization microarrays48. To ensure data quality, several steps were taken: outlier samples, calibrators, buffer, and non-human SOMAmers were eliminated; the measured levels were then normalized using hybridization, plate scaling, median normalization within the plate, and calibration at SomaLogic's side. Subsequently, the data were log2-transformed and median-normalized between plates. To mitigate potential batch effects, the 'ComBat' method49 was utilized for batch correction at the plate level, ensuring the consistency and comparability of proteomic data across different batches.
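One plausible reading of the log2 transformation and between-plate median normalization is sketched below: each protein's plate median is aligned to its overall median on the log2 scale. The exact normalization used in the study may differ, the function name and simulated data are hypothetical, and the ComBat batch correction is not reproduced here.

```python
import numpy as np


def log2_median_normalize(rfu, plate_ids):
    """log2-transform, then align each plate's per-protein median to the global median."""
    log_values = np.log2(np.asarray(rfu, dtype=float))
    plate_ids = np.asarray(plate_ids)
    global_median = np.median(log_values, axis=0)
    normalized = np.empty_like(log_values)
    for plate in np.unique(plate_ids):
        rows = plate_ids == plate
        plate_median = np.median(log_values[rows], axis=0)
        normalized[rows] = log_values[rows] - plate_median + global_median
    return normalized, global_median


# Simulated relative fluorescence units: 40 samples x 6 proteins on two plates,
# with the second plate reading twice as high as the first.
rng = np.random.default_rng(0)
rfu = rng.lognormal(mean=8.0, sigma=0.3, size=(40, 6))
plates = np.repeat(["plate_1", "plate_2"], 20)
rfu[plates == "plate_2"] *= 2.0

normalized, global_median = log2_median_normalize(rfu, plates)
for plate in ("plate_1", "plate_2"):
    print(plate, np.round(np.median(normalized[plates == plate], axis=0), 3))
```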
The architecture of the CVAE model
The CVAE model is an unsupervised deep learning algorithm modified from the basic model14. Its primary goal is to identify signals in the target dataset (PD group) that are more enriched than those found in the background dataset (Control group). The model consists of a fully connected auto-encoder and auto-decoder. The shared encoder and the PD-specific encoder have identical structures and project input data onto the parameters of two distinct 16-dimensional latent distributions: one for shared features and one for PD-specific features. The decoder takes a 32-dimensional vector (obtained by concatenating shared and PD-specific features) as input and produces a reconstructed PD brain structural MRI as output (Supplementary Table 4). Synthetic control brains use a 32-dimensional latent representation obtained by concatenating the shared feature representation with a 16-dimensional zero vector in place of the disease-specific features. Notably, the CVAE model was trained solely on a binary distinction between PD and control participants, without any supervision from individual clinical or non-clinical information. Supplementary Table 4 and Fig. 1d display specific information regarding each encoder and decoder layer's input feature size and parameters.
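The way the two latent codes are combined can be shown with a toy NumPy example. This is not the trained network: the encoders and decoder below are random linear maps standing in for the real layers, and the input size is arbitrary. Only the 16-dimensional latent size, the 32-dimensional concatenation, and the zero vector for synthetic controls come from the description above.

```python
import numpy as np

LATENT_DIM = 16  # size of each latent branch, as stated in the text
rng = np.random.default_rng(0)


def make_linear_encoder(n_inputs):
    """Toy stand-in for a trained encoder: linear maps to a mean and a log-variance."""
    w_mu = rng.normal(scale=0.01, size=(n_inputs, LATENT_DIM))
    w_logvar = rng.normal(scale=0.01, size=(n_inputs, LATENT_DIM))

    def encode(x):
        return x @ w_mu, x @ w_logvar

    return encode


def sample_latent(mu, logvar):
    """Reparameterization: z = mu + sigma * eps, with eps drawn from N(0, I)."""
    return mu + np.exp(0.5 * logvar) * rng.normal(size=mu.shape)


n_voxels = 8 * 8 * 8  # arbitrary toy input size
shared_encoder = make_linear_encoder(n_voxels)
specific_encoder = make_linear_encoder(n_voxels)
w_dec = rng.normal(scale=0.01, size=(2 * LATENT_DIM, n_voxels))  # toy linear decoder

x_pd = rng.random(size=(4, n_voxels))  # four flattened "PD scans"
z_shared = sample_latent(*shared_encoder(x_pd))
z_specific = sample_latent(*specific_encoder(x_pd))

# Synthetic PD brain: decode [shared, PD-specific].
z_pd = np.concatenate([z_shared, z_specific], axis=1)
synthetic_pd = z_pd @ w_dec

# Synthetic control brain: decode [shared, zeros].
z_control = np.concatenate([z_shared, np.zeros_like(z_specific)], axis=1)
synthetic_control = z_control @ w_dec

print(z_pd.shape, synthetic_pd.shape, synthetic_control.shape)
```

With a linear decoder, the difference between the two reconstructions depends only on the PD-specific code; a nonlinear decoder, as in the actual model, can also capture interactions between shared and PD-specific features.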
In addition to the CVAE, we also trained a non-contrastive VAE model to investigate whether features extracted solely by VAE were associated with disease severity, without disentangling PD-specific and shared features (Supplementary Table 5 and Fig. 1c). To ensure equal dimensionality of the latent features in both models, the VAE model had the same total number of latent features as the CVAE, that is, twice the number of the CVAE's shared features. While both CVAE and VAE share a similar encoder-decoder structure, they differ in the number of filters used in their convolutional and deconvolutional layers. This difference arises from the distinct model designs: VAE has one encoder, whereas CVAE has two encoders. Additionally, the disentanglement of latent features is unique to the CVAE training strategy. We trained both models using the Adam50 optimizer. The training process was terminated when the reconstruction error, defined as the mean-squared error between the input and reconstruction batch, fell below a threshold of 0.005. The CVAE and VAE models were implemented with the PyTorch framework51.
Identification of the loci of neuroanatomical alterations related to PD-specific features
To identify the loci of neuroanatomical alterations related to PD-specific features for each patient, we employed a three-step procedure (Fig. 4). First, we reconstructed two brain images for each PD patient: a synthetic control brain, reconstructed using only the shared features that capture individual variations unrelated to the disease, and a synthetic PD brain that incorporated both the shared and PD-specific features. Supplementary Fig. 13 presents an illustrative example of the input, synthetic PD, and corresponding synthetic control brain images of a representative patient, displayed in transverse, sagittal, and coronal planes. The synthetic control brain closely resembled the PD patient's neuroanatomical structure but excluded any features identified by the CVAE as specific to PD. This was achieved by setting the values of the PD-specific features to zero before applying the CVAE decoder. Second, we estimated a nonrigid transformation that aligned each synthetic control brain with its corresponding synthetic PD brain, using the Python package ANTs47 to calculate the deformation fields. A linear transformation was first used to establish a rough correspondence between the two brains. A nonlinear stage based on inverse-consistent diffeomorphic image registration then fine-tuned the alignment, capturing subtle anatomical differences. This process generated a vector field revealing local volumetric changes between the two brains52. Third, the resulting Jacobian maps were normalized to standard MNI space, facilitating cross-subject comparisons and subsequent analysis.
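To make the link between a deformation field and a map of local volume change concrete, the sketch below computes a Jacobian determinant map from a displacement field with NumPy. It does not use ANTs and is not the pipeline used in the study; the function name `jacobian_determinant`, the array layout, and the voxel spacing argument are assumptions. Values above 1 indicate local expansion and values below 1 indicate local contraction.

```python
import numpy as np


def jacobian_determinant(displacement, spacing=(1.0, 1.0, 1.0)):
    """Jacobian determinant of the mapping x -> x + u(x).

    displacement: array of shape (3, X, Y, Z) holding the displacement u
                  along each axis (hypothetical layout for this sketch).
    spacing:      voxel size along each axis.
    """
    jac = np.empty(displacement.shape[1:] + (3, 3))
    for i in range(3):
        # grads[j] approximates d u_i / d x_j by finite differences
        grads = np.gradient(displacement[i], *spacing)
        for j in range(3):
            jac[..., i, j] = grads[j]
    jac += np.eye(3)  # add the identity: d(x + u)/dx = I + du/dx
    return np.linalg.det(jac)


# Toy check: a uniform 10% stretch along every axis.
grid = np.stack(np.meshgrid(np.arange(8.0), np.arange(8.0), np.arange(8.0), indexing="ij"))
stretch = 0.1 * grid
print(jacobian_determinant(stretch).mean())  # close to 1.1 ** 3
```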
Statistics and reproducibility
Demographic and clinical variables were reported as mean and interquartile range (IQR). All tests were two-sided with a significance level of P < 0.05.
Representational similarity analysis
To obtain dissimilarity matrices, we calculated pairwise dissimilarities between patients based on the features extracted by the VAE or CVAE and on each nonclinical and clinical measure. For categorical variables such as gender or scanning site, the distance between two subjects was set to 0 if their values matched and 1 if they did not. For measurements on a ratio scale, such as age and UPDRS III, the distance between two subjects was the absolute difference (that is, the Euclidean distance) between their measurements. To assess the relationships between these dissimilarity matrices, we used Kendall rank correlation (Kendall τ) for representational similarity analysis (RSA)53. This choice was made because Kendall rank correlation effectively measures the similarity between two sets of matrices, whereas Pearson and Spearman correlations are not suitable for this purpose.
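The construction of the dissimilarity matrices and their comparison can be sketched as follows. The helper names are hypothetical, the Euclidean distance for the latent features is an assumption (the text does not state the metric used for features), and `scipy.stats.kendalltau` computes the tau-b variant by default, which may differ from the variant used in the study. Each matrix is stored as its upper triangle in condensed form.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import kendalltau


def feature_rdm(features):
    """Condensed RDM from an (n_subjects, n_features) latent matrix.

    Euclidean distance is an assumption made for this sketch.
    """
    return pdist(np.asarray(features, dtype=float), metric="euclidean")


def scalar_rdm(values):
    """Absolute difference between subjects for a ratio-scale variable."""
    column = np.asarray(values, dtype=float).reshape(-1, 1)
    return pdist(column, metric="cityblock")


def categorical_rdm(labels):
    """0 if two subjects share a category, 1 otherwise."""
    labels = np.asarray(labels)
    i, j = np.triu_indices(len(labels), k=1)  # same pair order as pdist
    return (labels[i] != labels[j]).astype(float)


def rsa_correlation(rdm_a, rdm_b):
    """Kendall rank correlation between two condensed RDMs."""
    tau, _ = kendalltau(rdm_a, rdm_b)
    return tau


# Toy usage with made-up values for four subjects.
latent = [[0.0, 0.1], [0.2, 0.1], [1.0, 0.9], [1.1, 1.0]]
motor_score = [10, 12, 30, 33]
print(rsa_correlation(feature_rdm(latent), scalar_rdm(motor_score)))
```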
The study assessed the consistency and reliability of outcomes across various samples by examining correlations between the representational dissimilarity matrices (RDMs) based on features extracted by VAE or CVAE and RDMs for nonclinical and clinical variables. To determine whether the correlations between these two types of RDMs were significantly greater than zero, one-sample t-tests were used. Paired sample t-tests were also performed to explore whether the RDMs derived from participants' demographic and clinical variables exhibited stronger correlations with the RDMs derived from PD-specific features or those derived from shared features. RSA was conducted via Python packages, including StatsModels 0.8.0, SciPy 1.6.3, NumPy 1.16.6, and pandas 0.21.0.
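The two group-level tests can be illustrated with SciPy. The correlation values below are invented purely for illustration and are not results from the study; the variable names are hypothetical. Both tests are two-sided, in line with the statement under Statistics and reproducibility.

```python
from scipy.stats import ttest_1samp, ttest_rel

# Made-up Kendall tau values from five resampled subsets (illustration only).
tau_specific = [0.10, 0.12, 0.08, 0.11, 0.09]  # PD-specific RDM vs. clinical RDM
tau_shared = [0.02, 0.03, 0.01, 0.02, 0.02]    # shared RDM vs. clinical RDM

# Is the PD-specific correlation different from zero?
one_sample = ttest_1samp(tau_specific, popmean=0.0)

# Does the clinical RDM correlate more strongly with PD-specific than shared features?
paired = ttest_rel(tau_specific, tau_shared)

print(one_sample.statistic, one_sample.pvalue)
print(paired.statistic, paired.pvalue)
```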
Linear mixed-effects model analysis
To assess the temporal changes in clinical scores, we used linear mixed-effects (LME) models. Fixed effects included age, gender, race (classified as White or others), study site, years of education, subgroup, month, and the interaction between subgroup and month. To capture individual differences, we modeled participant-specific slopes and intercepts as random effects. Missing follow-up data were excluded from the analysis. The LME analysis was performed using the lme4 package in R54. Pairwise similarities between patients in the features extracted by the CVAE were compared with their pairwise similarities in MDS-UPDRS Part III and MoCA slopes using Kendall τ correlation. The clinical progression of the two subgroups was compared using LME models by testing the interaction between subgroup and time.
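One way to write the model described above is given below. The notation is an assumption introduced here for clarity: y_ij is the clinical score of participant i at visit j, c_i collects the covariates (age, gender, race, study site, years of education), and b_0i and b_1i are the participant-specific random intercept and slope.

```latex
\begin{aligned}
y_{ij} &= \beta_0 + \beta_1\,\mathrm{month}_{ij} + \beta_2\,\mathrm{subgroup}_i
        + \beta_3\,(\mathrm{subgroup}_i \times \mathrm{month}_{ij})
        + \boldsymbol{\gamma}^{\top}\mathbf{c}_i
        + b_{0i} + b_{1i}\,\mathrm{month}_{ij} + \varepsilon_{ij},\\
(b_{0i}, b_{1i})^{\top} &\sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2}).
\end{aligned}
```

Under this formulation, the coefficient β3 of the subgroup-by-month interaction captures the difference in progression rate between the two subgroups.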
Identification of proteins in CSF associated with PD-specific features
To evaluate the relationships between CSF protein levels and various features specific to PD, as well as those shared across populations, we conducted paired-sample t-tests. Initially, we calculated the Kendall rank correlation coefficients to assess the association between CSF proteins and both PD-specific and shared features. Subsequently, we performed paired-sample t-tests on the results of these Kendall correlation analyses, specifically comparing the CSF proteins' association with PD-specific features and their association with shared features. To ensure the reliability of our findings, we accounted for multiple hypothesis testing by applying the Benjamini–Hochberg method to control the false discovery rate (FDR) in our analysis of CSF proteins.
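The Benjamini–Hochberg adjustment applied to the per-protein p-values can be sketched in NumPy as follows. The function name `benjamini_hochberg` and the example p-values are illustrative assumptions, not values from the study.

```python
import numpy as np


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale the k-th smallest p-value by n / k.
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


# Made-up p-values for four proteins (illustration only).
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Proteins whose adjusted p-value falls below the chosen FDR level would then be retained for downstream enrichment analysis.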
To understand the biological implications and functional relevance of the set of proteins associated with PD-specific features, gene ontology (GO) enrichment analysis was conducted using the clusterProfiler55 R package, and protein–protein interaction information was assessed using the STRING database56. P-values were also adjusted using the Benjamini–Hochberg method.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The data used in this study, retrieved from the Parkinson's Progression Markers Initiative (PPMI) database, is publicly available on the PPMI website at https://www.ppmi-info.org/. Regarding the datasets from the Second Affiliated Hospital of Zhejiang University School of Medicine (SAHZJU) and the Second Affiliated Hospital of Soochow University (SAHSU), due to the inclusion of personal and sensitive information, these data are not publicly available. Access to these datasets is subject to controlled access policies to ensure compliance with ethical and legal standards, particularly concerning the protection of participant privacy. In accordance with the Chinese regulations outlined in the "Ethical Review Methods for Biomedical Research Involving Human Subjects," we have taken measures to protect the confidentiality of participant information and to comply with the informed consent agreements. Qualified researchers interested in accessing these datasets for non-commercial research purposes, within the limits of participants' consent, should submit a formal request to the co-authors: M.Z. (zhangminming@zju.edu.cn) for SAHZJU data and C.M. (drchengjiemao@163.com) for SAHSU data. Requests will be reviewed and processed within one month. Access to the data will be granted subject to institutional ethics committee approval and the signing of material transfer agreements and data use agreements that specify the conditions of data use, including restrictions on downstream data reuse. A copy or a link to the data use agreement will be provided upon request.
Code availability
Python scripts for all models and the analysis pipeline are available on GitHub (https://github.com/zlpsophina/CVAE_VAE) and archived on Zenodo (https://doi.org/10.5281/zenodo.12788950)57.
References
Horsager, J. et al. Brain-first versus body-first Parkinson’s disease: A multimodal imaging case-control study. Brain 143, 3077–3088 (2020).
Fereshtehnejad, S. M., Zeighami, Y., Dagher, A. & Postuma, R. B. Clinical criteria for subtyping Parkinson’s disease: Biomarkers and longitudinal progression. Brain 140, 1959–1976 (2017).
Oveisgharan, S. et al. Association of White Matter Hyperintensities With Pathology and Progression of Parkinsonism in Aging. JAMA Neurol. 78, 1494–1502 (2021).
Wang, L. et al. Association of Structural Measurements of Brain Reserve With Motor Progression in Patients With Parkinson Disease. Neurology 99, e977 (2022).
Wang, L. et al. Association of Cortical and Subcortical Microstructure with Clinical Progression and Fluid Biomarkers in Patients With Parkinson Disease. Neurology 101, e300–e310 (2023).
Mitchell, T. et al. Emerging Neuroimaging Biomarkers across Disease Stage in Parkinson Disease: A Review. JAMA Neurol. 78, 1262–1272 (2021).
Laansma, M. A. et al. International Multicenter Analysis of Brain Structure Across Clinical Stages of Parkinson’s Disease. Mov. Disord. 36, 2583–2594 (2021).
Loughnan, R. et al. Association of Genetic Variant Linked to Hemochromatosis With Brain Magnetic Resonance Imaging Measures of Iron and Movement Disorders. JAMA Neurol. 79, 919–928 (2022).
Zeighami, Y. et al. A clinical-anatomical signature of Parkinson’s disease identified with partial least squares and magnetic resonance imaging. Neuroimage 190, 69–78 (2019).
Zeighami, Y. et al. Network structure of brain atrophy in de novo Parkinson’s disease. Elife 4, e08440 (2015).
Mak, E. et al. Baseline and longitudinal grey matter changes in newly diagnosed Parkinson’s disease: ICICLE-PD study. Brain 138, 2974–2986 (2015).
Dohm-Hansen, S. et al. The ‘middle-aging’ brain. Trends Neurosci. 47, 259–272 (2024).
Bethlehem, R. A. I. et al. Brain charts for the human lifespan. Nature 604, 525–533 (2022).
Abid, A. & Zou, J. Y. Contrastive Variational Autoencoder Enhances Salient Features. CoRR abs/1902.04601 (2019).
Aglinskas, A., Hartshorne, J. K. & Anzellotti, S. Contrastive machine learning reveals the structure of neuroanatomical variation within autism. Science 376, 1070–1074 (2022).
Prots, I. et al. α-Synuclein oligomers induce early axonal dysfunction in human iPSC-based models of synucleinopathies. Proc. Natl Acad. Sci. USA 115, 7813–7818 (2018).
Henderson, M. X. et al. Spread of α-synuclein pathology through the brain connectome is modulated by selective vulnerability and predicted by network analysis. Nat. Neurosci. 22, 1248–1257 (2019).
Seibyl, J. P. et al. Decreased single‐photon emission computed tomographic [123I]β‐CIT striatal uptake correlates with symptom severity in Parkinson’s disease. Ann. Neurol. 38, 589–598 (1995).
Prots, I. et al. α-Synuclein oligomers induce early axonal dysfunction in human iPSC-based models of synucleinopathies. Proc. Natl Acad. Sci. USA 115, 7813–7818 (2018).
Lin, C. H. et al. Blood NfL: A biomarker for disease severity and progression in Parkinson disease. Neurology 93, e1104–e1111 (2019).
Khalil, M. et al. Neurofilaments as biomarkers in neurological disorders. Nat. Rev. Neurol. 14, 577–589 (2018).
Bates, D., Mächler, M., Bolker, B. & Walker, S. Fitting Linear Mixed-Effects Models Using lme4. J. Stat. Softw. 67, 1–48 (2015).
Irwin, D. J. et al. Evolution of Alzheimer’s Disease Cerebrospinal Fluid Biomarkers in Early Parkinson’s Disease. Ann. Neurol. 88, 574–587 (2020).
Kang, J. H. et al. Association of cerebrospinal fluid β-amyloid 1-42, t-tau, p-tau 181, and α-synuclein levels with clinical features of drug-naive patients with early Parkinson disease. JAMA Neurol. 70, 1277–1287 (2013).
Pan, L. et al. Tau accelerates α-synuclein aggregation and spreading in Parkinson’s disease. Brain 145, 3454–3471 (2022).
Tremblay, C. et al. Sex effects on brain structure in de novo Parkinson’s disease: a multimodal neuroimaging study. Brain 143, 3052–3066 (2020).
Xuan, M. et al. Different patterns of gray matter density in early- and middle-late-onset Parkinson’s disease: a voxel-based morphometry study. Brain Imaging Behav. 13, 172–179 (2019).
de la Fuente-Fernández, R. et al. Age-specific progression of nigrostriatal dysfunction in Parkinson’s disease. Ann. Neurol. 69, 803–810 (2011).
Tansey, M. G. et al. Inflammation and immune dysfunction in Parkinson disease. Nat. Rev. Immunol. 22, 657–673 (2022).
Ran, C. et al. Genetic studies of the protein kinase AKT1 in Parkinson’s disease. Neurosci. Lett. 501, 41–44 (2011).
Kim, H. et al. Activation of the Akt1-CREB pathway promotes RNF146 expression to inhibit PARP1-mediated neuronal death. Sci. Signal 13, eaax7119 (2020).
Arcos, J. et al. IGF2 prevents dopaminergic neuronal loss and decreases intracellular alpha-synuclein accumulation in Parkinson’s disease models. Cell Death Discov. 9, 438 (2023).
Faustini, G. et al. Synapsin III gene silencing redeems alpha-synuclein transgenic mice from Parkinson’s disease-like phenotype. Mol. Ther. 30, 1465–1483 (2022).
Ham, S. J. et al. Loss of UCHL1 rescues the defects related to Parkinson’s disease by suppressing glycolysis. Sci. Adv. 7, eabg4574 (2021).
Wang, Y. et al. CD44 deficiency represses neuroinflammation and rescues dopaminergic neurons in a mouse model of Parkinson’s disease. Pharm. Res. 177, 106133 (2022).
Tian, Y., Chen, R. & Su, Z. HMGB1 is a potential and challenging therapeutic target for Parkinson’s Disease. Cell Mol. Neurobiol. 43, 47–58 (2023).
Angelopoulou, E., Piperi, C. & Papavassiliou, A. G. High‐mobility group box 1 in Parkinson’s disease: from pathogenesis to therapeutic approaches. J. Neurochem. 146, 211–218 (2018).
Bai, X. et al. Microstructural but not macrostructural cortical degeneration occurs in Parkinson’s disease with mild cognitive impairment. NPJ Parkinsons Dis. 8, 151 (2022).
Sanjari Moghaddam, H., Dolatshahi, M., Mohebi, F. & Aarabi, M. H. Structural white matter alterations as compensatory mechanisms in Parkinson’s disease: A systematic review of diffusion tensor imaging studies. J. Neurosci. Res. 98, 1398–1416 (2020).
Blesa, J. et al. Compensatory mechanisms in Parkinson’s disease: Circuits adaptations and role in disease modification. Exp. Neurol. 298, 148–161 (2017).
Pandya, S. et al. Predictive model of spread of Parkinson’s pathology using network diffusion. Neuroimage 192, 178–194 (2019).
Yau, Y. et al. Network connectivity determines cortical thinning in early Parkinson’s disease progression. Nat. Commun. 9, 12 (2018).
Vo, A. et al. Network connectivity and local transcriptomic vulnerability underpin cortical atrophy progression in Parkinson’s disease. Neuroimage Clin. 40, 103523 (2023).
Aarsland, D. et al. Parkinson disease-associated cognitive impairment. Nat. Rev. Dis. Prim. 7, 1–21 (2021).
Marek, K. et al. The Parkinson’s progression markers initiative (PPMI) – establishing a PD biomarker cohort. Ann. Clin. Transl. Neurol. 5, 1460–1477 (2018).
Daniel, S. E. & Lees, A. J. Parkinson’s Disease Society Brain Bank, London: overview and research. J. Neural Transm. Suppl. 39, 165–172 (1993).
Avants, B. B., Tustison, N. & Song, G. Advanced normalization tools (ANTS). Insight J. 2, 1–35 (2009).
Gold, L. et al. Aptamer-based multiplexed proteomic technology for biomarker discovery. Nat. Preced. https://doi.org/10.1038/npre.2010.4538.1 (2010).
Johnson, W. E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8, 118–127 (2007).
Kingma, D. P. & Ba, J. Adam: A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (eds. Bengio, Y. & LeCun, Y.) (2015).
Paszke, A. et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. in Advances in Neural Information Processing Systems (eds. Wallach, H. et al.) 32 (Curran Associates, Inc, 2019).
Chung, M. K. et al. A Unified Statistical Approach to Deformation-Based Morphometry. Neuroimage 14, 595–606 (2001).
Kriegeskorte, N., Mur, M. & Bandettini, P. Representational similarity analysis - connecting the branches of systems neuroscience. Front. Syst. Neurosci. 2, 4 (2008).
Bates, D. Fitting linear mixed models in R. R. N. 5, 27–30 (2005).
Yu, G., Wang, L.-G., Han, Y. & He, Q.-Y. clusterProfiler: an R Package for Comparing Biological Themes Among Gene Clusters. OMICS 16, 284–287 (2012).
Szklarczyk, D. et al. The STRING database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res. 51, D638–D646 (2023).
LiPingZheng. zlpsophina/CVAE_VAE: v1.0.0. https://doi.org/10.5281/zenodo.12788950 (2024).
Acknowledgements
This study was funded by grants from the Natural Science Foundation of Shanghai (23ZR1406000), the National Key R&D Program of China (2019YFA0709502), Shanghai Municipal Science and Technology Major Project (2018SHZDZX01), the National Natural Science Foundation of China (82071997), Xiong An New Area Science and Technology Innovation Project (2023XAGG0073), Shanghai Rising-Star Program (21QA1408700) and 111 Project (B18015). Further, we would like to thank the Shanghai Center for Brain Science and Brain-Inspired Technology, ZHANGJIANG LAB, and the State Key Laboratory of Neurobiology and Frontiers Center for Brain Science of Ministry of Education for their support. PPMI, a public-private partnership, is funded by the Michael J. Fox Foundation for Parkinson’s Research and funding partners, including 4D Pharma, Abbvie, Acurex Therapeutics, Allergan, Amathus Therapeutics, ASAP, Avid Radiopharmaceuticals, Bial Biotech, Biogen, BioLegend, Bristol-Myers Squibb, Calico, Celgene, Dacapo Brain Science, Denali, The Edmond J. Safra Foundation, GE Healthcare, Genentech, GlaxoSmithKline, Golub Capital, Handl Therapeutics, Insitro, Janssen Neuroscience, Lilly, Lundbeck, Merck, Meso Scale Discovery, Neurocrine Biosciences, Pfizer, Piramal, Prevail, Roche, Sanofi Genzyme, Servier, Takeda, Teva, UCB, Verily, and Voyager Therapeutics. We want to thank all the participants in the study.
Author information
Contributions
J.F., L.W. and M.Z.: conception and design of work. L.Z.: model development and statistical analyses. L.Z., L.W. and C.Z.: manuscript writing. C.Z., J.Y., C.M., C.L., B.Z., M.Z., L.W. and J.F.: manuscript critical revision. C.Z., L.W., C.M., C.X., W.C., C.L., X.X., B.Z., M.Z. and J.F.: data interpretation. C.Z., C.M., P.H., X.G., T.G., J.W., Y.L., X.X., B.Z. and M.Z.: data collection. C.Z., L.W. and L.Z.: data processing. All the authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Communications Biology thanks Govinda Poudel, Jianxiao Wu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editors: Marta Vallejo, George Inglis and Luke R. Grinham. [A peer review file is available.]
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Zheng, L., Zhou, C., Mao, C. et al. Contrastive machine learning reveals Parkinson’s disease specific features associated with disease severity and progression. Commun Biol 7, 954 (2024). https://doi.org/10.1038/s42003-024-06648-x
DOI: https://doi.org/10.1038/s42003-024-06648-x