Introduction

Cognitive impairment (CI), one of the most disabling manifestations of Multiple Sclerosis (MS), affects approximately 45–60% of people with MS (PwMS)1,2 and can occur at all phases of the disease, including the early stage3. Such impairments can negatively impact everyday tasks of PwMS, disrupting their quality of life, overall wellbeing, and physical and social functioning4. Attention, information processing speed (IPS), learning and memory, language and executive functions seem to be the most affected cognitive domains1,5. Despite the high prevalence of CI in PwMS and its implications for clinical care, more than 30 years of research has typically focused on a dichotomous classification of impaired/unimpaired6. After the pioneering study by Leavitt et al.7, evidence has been collected in favour of a taxonomy that recognizes predominant subtypes, named cognitive phenotypes, advancing knowledge of the presence, rate, type, and number of CI6,8,9,10,11. These studies identified from three to five phenotypes, focusing on the main cognitive functions affected by MS and highlighting a spectrum of abilities ranging from intact to impaired, but approaches are quite variable in terms of the choice of neuropsychological instruments and methodology. In addition, like all symptoms of MS, CI is extremely variable in severity and progression. Some individuals may not decline or may decline slowly, while others may experience significant decline. Some changes may be relatively mild and easily compensated for, while others may impact functioning in key areas of daily life, including work, driving, or management of business affairs12. For this reason, monitoring cognitive phenotypes over time is pivotal to identify trajectories of cognitive decline in MS13.

In light of the latest evidence provided on the taxonomy of specific cognitive phenotypes in MS as a meaningful measure of cognitive status14, the aim of our study was to identify predominant cognitive phenotypes in PwMS considering both cognitive and mood information collected from patient-reported outcomes (PROs) and clinician-assessed outcomes (CAOs), and investigate whether observed cognitive profiles changed after 1 year. The identified phenotypes were then examined to establish whether they differ from each other according to demographic and clinical characteristics.

Results

Baseline characteristics of study population

From the initial sample of 283 PwMS, acquired retrospectively from the PROMOPRO-MS initiative15, 11 individuals were excluded due to missing data. The final cohort included 272 PwMS (female = 183; mean age = 61.6 ± 12.1 years) (see Table 1 for a summary of demographic and clinical characteristics and test scores of PwMS). In accordance with a prior study8, four cognitive phenotypes were identified, based on the combination of the seven dimensions of the Montreal Cognitive Assessment (MoCA), the Symbol Digit Modalities Test (SDMT), and the anxiety and depression subscales of the Hospital Anxiety and Depression Scale (HADS), using the k-means clustering method with k = 4; the division was optimised using 10 iterations with various centroid values. The algorithm found the same cognitive clusters when the Functional Independence Measure (FIM™) cognitive subtotal was added as a coordinate, indicating a confirmatory tendency.

Table 1 Demographic and clinical sample characteristics (N = 272).

Cognitive phenotypes identification

The hierarchy of features according to their respective weight during clusterization is indicated in Fig. 1, suggesting that subgroup identification was most strongly correlated with the domains of delayed recall memory and attention. Four cognitive phenotypes were identified: phenotype 1 (44.5%; N = 121) showed an almost intact profile, while phenotype 2 (22.8%; N = 62) revealed an attention impairment, with language difficulty and anxiety nearly below threshold. In addition to a delayed recall deficit, phenotype 3 (24.3%; N = 66) presented other functions significantly impaired compared to the previous subgroups, such as visuo-executive, attention, language, and IPS. Finally, phenotype 4 (8.4%; N = 23) had a multi-domain impairment profile, grouping together PwMS with important deficits in almost all the domains considered (visuo-executive, attention, language, delayed recall, orientation, IPS and anxiety). As clearly displayed in the radar plots of Fig. 2, the percentage of cognitively impaired people within each phenotype showed an incremental tendency: phenotype 1 included only 9.9% of participants with cognitive deficits, phenotype 2 had 51.6%, phenotype 3 grouped 83.3% with CI and, finally, all individuals in phenotype 4 (100%) had difficulties in various cognitive domains. However, the radar plots also illustrate that CI is a pervasive issue across all four subgroups, not limited to just one (i.e., phenotype 4).

Fig. 1

Gradient boosting decision tree. The histogram displays the relative importance that each of the ten coordinates gained in the clustering process. Abstraction and delayed recall were found to be the most important ones.

Fig. 2

Radar plots illustrating the cognitive performance of individuals with cognitive impairment (CI) across the four cognitive phenotypes. Each plot displays the mean values for selected cognitive domains within each phenotype. The range for each coordinate is rescaled to the interval [0, 1], where 0 indicates lower and 1 indicates better cognitive performance, except for the mood domain, where lower values represent fewer symptoms of anxiety and depression.

A more in-depth analysis of the demographic and clinical composition of the four phenotypes revealed that phenotype 1 had the highest level of education (13 years, SD = 3.4) together with the lowest level of disability as measured by the Expanded Disability Status Scale (EDSS)16 (5.2, SD = 1.8); the opposite pattern was observed for phenotype 4, which included people with the lowest educational level (8.3 years, SD = 2.8) and the highest degree of disability (EDSS = 6.4, SD = 1.2). Considering MS course, phenotype 1 contained 53% of people with relapsing–remitting MS (RRMS), phenotypes 2 and 3 were fairly comparable in terms of prevalence of individuals with secondary progressive MS (SPMS), respectively 54.8% and 47%, and phenotype 4 had a prevalence of SPMS (65.2%) and primary progressive MS (PPMS) (21.7%). Except for phenotype 3, which had an equal number of female and male participants, the phenotypes (1, 2 and 4) included more women than men. Age showed an increasing trend across subgroups, from the youngest individuals in phenotype 1 (58.4 years, SD = 11.0) to the oldest PwMS in phenotype 4 (67.7 years, SD = 11.9). While other demographic and clinical features tended to diverge clearly between phenotypes 1 and 4, the same trend was not observed for phenotypes 2 and 3. All the results are shown in Table 2.

Table 2 Summary of demographic and clinical characteristics by four phenotypes.

Cognitive phenotypes over 1-year follow-up

Considering changes over 1 year, Fig. 3 summarises the evolution of the cognitive phenotypes at the second time point. Phenotype 1 presented a small improvement in visuo-executive abilities (Mean scores at T0 = 4.30 (SD = 0.91); T1 = 4.52 (SD = 0.71); p = 0.0066; Wilcoxon effect size = − 0.25), and a worsening in abstraction (Mean scores at T0 = 2 (SD = 0); T1 = 1.89 (SD = 0.34); p = 0.0008; Wilcoxon effect size = 0.30) and delayed recall (Mean scores at T0 = 4.29 (SD = 0.76); T1 = 4.07 (SD = 1.07); p = 0.003; Wilcoxon effect size = 0.27). Phenotype 2 showed an improvement in abstraction (Mean scores at T0 = 0.79 (SD = 0.41); T1 = 1.26 (SD = 0.60); p < 0.0001; Wilcoxon effect size = − 0.60). Individuals in phenotype 3 performed better than at baseline in delayed recall (Mean scores at T0 = 1.48 (SD = 1.06); T1 = 2.03 (SD = 1.61); p = 0.003; Wilcoxon effect size = − 0.37), but worsened in abstraction (Mean scores at T0 = 1.84 (SD = 0.36); T1 = 1.56 (SD = 0.58); p = 0.0009; Wilcoxon effect size = − 0.36). Finally, phenotype 4 improved in language (Mean scores at T0 = 0.96 (SD = 0.56); T1 = 1.36 (SD = 0.79); p = 0.02; Wilcoxon effect size = − 0.48), with a significant worsening in anxiety (Mean scores at T0 = 8.43 (SD = 4.35); T1 = 6.91 (SD = 4.58); p = 0.04; Wilcoxon effect size = 0.43) (see Table 3 for all p values). Interestingly, the FIM cognition subscale did not significantly vary between the two time points for any phenotype, suggesting that no considerable changes were observed by clinicians (Phenotype 1 T0 = 34.44 (SD = 1.28); T1 = 34.27 (SD = 3.33); p = 0.194; phenotype 2 T0 = 34.16 (SD = 1.46); T1 = 33.52 (SD = 4.78); p = 0.703; phenotype 3 T0 = 33.73 (SD = 1.84); T1 = 33.36 (SD = 3.08); p = 0.615; phenotype 4 T0 = 29.13 (SD = 6.50); T1 = 28.43 (SD = 5.57); p = 0.314).

Fig. 3

The amount of cognitive change between time T0 (baseline) and T1 (1-year follow-up) for each cognitive phenotype. Statistically significant changes are indicated by coloured columns. Positive values indicate a worsening, while negative ones reflect an improvement in the cognitive function. To maintain this convention, anxiety and depression values have been converted accordingly. The medium effect threshold is indicated by red dashed lines.

Table 3 Comparison between baseline and 1-year follow-up for cognitive domains in each cognitive phenotype.

Discussion

The typical convention that dichotomizes CI as present vs. absent in MS lacks key nuances, such as identifying predominant profiles or specific areas of deficit and possible change over time. Our study provides new insights into cognitive phenotypes of PwMS and the stability of the impairment pattern after 1 year. In line with a previous paper by Podda et al.8, four cognitive phenotypes were identified in terms of type, rate, and number of impaired domains, providing a taxonomy of the major cognitive areas affected in PwMS.

Following an adapted characterization by Hancock et al.6, at the first time point, phenotype 1 (44.5%) showed a substantially preserved cognitive profile without any impaired domain; phenotype 2 (22.8%) had a mild cognitive impairment profile with emergent attention difficulty (single-domain impairment); phenotype 3 (24.3%) included people with impaired visuo-executive, attention, language, memory and IPS (bi-domain impairment); and finally, phenotype 4 (8.4%) grouped individuals with marked and widespread deficits in visuo-executive, attention, language, memory, orientation, IPS and mood disorders (multi-domain impairment). Results confirmed that such cognitive taxonomy tended to converge with a global worsening of health status based on demographic and clinical information. Individuals in phenotype 1 were younger, with a prevalence of RRMS, higher education, a lower EDSS and greater independence. Phenotypes 2 and 3 were quite similar in terms of age, educational level, disease course, EDSS and FIM, but phenotype 3 grouped the same number of males and females and individuals with the longest disease duration compared to the other phenotypes. Finally, PwMS in phenotype 4 were older, with progressive courses (both SPMS and PPMS), a lower educational level, the presence of an aid, as indicated by the EDSS score, and a lower degree of independence8. As suggested by Podda et al.8, this reflects a continuum from younger individuals with a lower disability level, an RRMS course and an almost intact cognitive profile (phenotype 1), to individuals with mild worsening in more than one cognitive domain, accompanied by a critical health status (phenotypes 2 and 3), to older individuals with a marked and widespread alteration of cognitive functioning combined with a more severe clinical profile (i.e., higher disability level and progressive forms) (phenotype 4).

This characterization is in line with other previous studies3,17,18,19,20. Participants with progressive MS exhibited moderately higher impairment in all cognitive domains than people with RRMS, with more prominent differences in attention, IPS, and verbal memory17,21. These differences between RRMS and SPMS may be explained by a longer duration of the disease, which is associated with a higher lesion load and a more severe disability20. As indicated by Amato et al.19, as MS progresses, the profile of cognitive deficits tends to involve other functions that initially were intact: after 10 years, a higher EDSS score and a progressive MS course, followed by increasing age, proved to be positively correlated with worse cognitive outcome. Thus, in participants with progressive MS, CI was more frequent, widespread, and severe than in RRMS, taking into account differences in age, sex, and disease duration3. This suggests that a higher frequency of CI in progressive forms could be linked not only to a longer duration of disease, but also to the clinical pattern of the progressive phenotype. Although information about pharmacological treatment was not collected in this study, it is reasonable to assume that medication could further contribute to the differences observed between these phenotypes. Most disease modifying therapies (DMTs) are indicated for RRMS, while only a few drugs have consistently shown efficacy in slowing disability progression in the subgroup of people with SPMS22. In people with RRMS, initiating treatment with a DMT early in the disease course is associated with better long-term outcomes than delaying treatment. Once a transition from RRMS to SPMS has occurred, however, DMTs become ineffective at preventing disability progression23. Recently, however, evidence suggested that Siponimod showed superiority over placebo in terms of disability progression in a representative population of people with SPMS, who had reached the non-relapsing stage of SPMS and had a high level of established disability24.

After 1 year, cognitive phenotypes did not substantially change in terms of type and number of impaired domains within each subgroup. However, some fluctuations occurred in the rate of impairment. Unexpected improvements in visuo-executive abilities for phenotype 1, abstraction for phenotype 2, delayed recall for phenotype 3 and language for phenotype 4 were observed, confirming that there is not a straightforward linear evolution of cognitive dysfunction. As a whole, results on the evolution of CI in MS have been controversial, since both cognitive preservation and progressive deterioration have been reported, and remarkable fluctuations have been noted during very brief follow-up periods19,25,26. Amato et al.27 found that, although individuals showed significant losses in verbal memory and abstract reasoning at baseline, these deficits remained substantially unchanged 4 years later, by which time the pattern of CI had expanded to include language as well. One possible speculation is that, even though performance deteriorated, practice effects served to mask the deterioration. Although we included in the analysis only PwMS who had undergone assessments on at least two occasions 1 year apart, we should not exclude that, before this temporal window, participants improved their performances at follow-up visits due to familiarity with tests. This is in line with Jonsson et al.28, who in a 4-year follow-up study of 64 newly diagnosed PwMS reported improvements on tests of attentional control, mental processing and visual spatial memory, and no changes on problem-solving or naming. The authors suggested that improved performance on cognitive measures could be largely attributable to practice effects that may conceal the true progression rate, and this should be considered in interpreting results. Another factor that may contribute to this discrepancy could be the great variation in the length of follow-up. One year might be too brief a temporal window to observe significant changes across cognitive phenotypes compared to studies with two or more years of follow-up19,25,26,29. As suggested by Katsari et al.26, only adequate intervals between initial assessment and follow-up of 10–20 years would allow important insights into the pattern of cognitive evolution in MS. However, Amato et al. highlight that in MS cognitive deterioration occurs more slowly and less consistently than in degenerative dementia such as Alzheimer’s disease, but it is present. Once CI arises, it is unlikely to remit to any significant extent, although it appears to remain stable19. Given that incipient cognitive decline seems to be the major risk factor for further deterioration in the short term, findings may suggest that, during a sufficiently long follow-up period, the likelihood that PwMS would display CI tends to increase, although at different rates.

Furthermore, our participants had a high educational level as a whole. Thus, since cognitive reserve has an impact as a prognostic factor, educational level could have protected our participants from the progression of cognitive decline over 1 year. As indicated by Sumowski et al.30, PwMS who had greater intellectual enrichment, conventionally measured as years of education, were protected against cognitive decline over 4.5 years.

Cognitive phenotypes have the potential to provide additional breadth to the reporting of CI for PwMS, which may be quite useful in clinical work to improve communication and understanding between healthcare providers, PwMS and their caregivers. The fact that the FIM cognition subscale did not significantly vary after 1 year suggests that clinicians missed slight and subclinical changes in terms of comprehension, expression, social interaction, problem solving and memory of PwMS. Despite the importance of identifying cognitive changes in MS, little is known about the accuracy of identifying CI based on clinical judgment. The role of the neurological evaluation in raising concerns about CI is of immediate clinical relevance. In a review by Fischer and colleagues31, it was reported that most neurologists providing care to PwMS underestimated the prevalence of CI in their clinic population. Romero et al.32 showed that neurologists’ predictions of CI based on a typical clinical visit were not significantly different from chance, suggesting that the routine clinical assessment lacks sensitivity in identifying CI. A possible reason for the clinical examination proving insensitive to CI in PwMS is that cognitive dysfunction in MS is more subtle than that seen in conditions such as Alzheimer’s or cerebrovascular disease. Recently, Jackson et al.33 confirmed that PwMS and their clinicians significantly differ in their perceptions of CI across multiple cognitive domains: clinician-perceived cognitive deficits were predicted by a combination of factors including the PwMS’ objective cognitive scores, physical disability, age, and depression.

As a whole, these results corroborate the use of cognitive phenotypes as a valuable tool to increase clinicians' knowledge of CI and its changes over time. Our study confirms the importance of a practical and quick screen for the identification of PwMS who might need close cognitive monitoring, which could help clinicians in planning effective treatments tailored to subgroups of cognitively homogeneous individuals8. In a recent study by Ziccardi et al.14, MS individuals characterized by a single-impairment cognitive phenotype benefited more from a home-based restorative cognitive rehabilitation program than PwMS with more affected cognitive domains (i.e. multi-domain impairment). Moreover, lower functioning in a specific cognitive test was associated with a greater improvement within the same test.

Given that CI often changes over time, independent of clinically identified exacerbations or changes in physical symptoms, our results corroborate that investigating cognitive phenotypes and their stability over time across the clinical course of MS would provide prompt and adequate information to decision-making regarding changes to treatment and symptom management.

This study has several important limitations. Potential effect of medication and treatments of PwMS should be considered when investigating cognitive phenotypes and their stability over time.

Moreover, the generalizability of the identified cognitive taxonomy to the broader MS population remains uncertain, as this study was retrospective and involved individuals who were outpatients at the AISM Rehabilitation Service of Genoa as part of the PROMOPRO-MS initiative15. Furthermore, the choice of specific tests may contribute to the prevalence, distribution, and type of these phenotypes. As established by previous studies6,9,10, although standardized cognitive batteries, such as the Rao Brief Repeatable Battery (BRB)34, are highly recommended for a more precise evaluation of cognitive functioning in MS, they require considerable time and resources, which might limit their integration into routine clinical practice.

Despite these challenges, the PROMOPRO-MS database, which collects data on various functional domains (e.g., manual ability, bladder and bowel problems, cognitive functions, mood, quality of life, to the grade of independence in daily life) within a relatively short timeframe, offers a practical approach to developing cognitive phenotypes in MS.

Integrating PROs and CAOs with other indices like digital biomarkers could enhance decision-making in clinical practice. Digital tools such as mobile health apps developed for the management of CI in MS35,36,37,38,39 offer potential that could favor the adoption of cognitive phenotypes into clinical practice. These tools could facilitate more frequent assessments, possibly conducted at home, reducing the high stress levels and demands placed on both clinicians and patients by repeating administration procedures over time35, and help integrate PROs on other significant MS-related dimensions like mood and fatigue13. The availability of longitudinal, multi-domain, big data could favor the application of revolutionary technology, such as a digital twin40, to MS cognitive phenotyping. This approach could potentially transform patient care by visualizing disease progression and informing tailored treatment strategies13. By making cognitive assessments more precise and integrating them with advanced analytics, a cognitive taxonomy could help to reduce healthcare costs, minimize trial-and-error treatment approaches, and ultimately improve patient adherence and satisfaction through personalized care. The use of integrated databases, such as PROMOPRO-MS, represents a fruitful and instrumental way to identify a taxonomy of cognitive profiles in partnership with PwMS. This approach not only increases the clinical meaningfulness of PROs and CAOs, detecting rapid changes due to the evolution of the pathology, but also supports a timely and cost-effective individualized treatment for PwMS15.

Methods

Participants

The dataset was acquired retrospectively from the PROMOPRO-MS initiative15, a large, multicenter, prospective study that involves the Italian MS Society Rehabilitation Services of Genoa, Padua and Vicenza. In PROMOPRO-MS, all PwMS have been assessed using clinical evaluations and self-administered questionnaires related to a variety of domains, including fatigue, cognitive and physical impairment and psychological well-being. The cohort included in this study was composed of PwMS who met the PROMOPRO-MS inclusion criteria, namely a definite diagnosis of MS, any disease form (RRMS, SPMS or PPMS) and an age above 18 years. Only participants who had undergone assessments on at least two occasions one year apart were included. Data collected between January 2014 and July 2022 were considered for study purposes.

Outcome measures

The features chosen from the PROMOPRO-MS database for the analysis are those that pertain to the cognitive and mood domains. Considering cognitive functioning, we selected the MoCA41 and SDMT42. MoCA is a cognitive screening test scored on a 30-point scale with higher values corresponding to a better cognitive status. Cut-offs of the seven MoCA dimensions were used to isolate specific impaired domains43: Visuospatial-Executive (< 4), Attention (< 6), Abstraction (< 1), Delayed Recall (< 4), Language (< 3), Naming (< 3), and Orientation (< 6). SDMT is a neuropsychological test used to assess IPS, with a score of ≤ 34 indicating the presence of IPS difficulties44. This cut-off facilitates the application of the SDMT in clinical and research settings and allows Italian clinicians to use this neuropsychological tool with increased confidence. MoCA and SDMT were chosen for their good psychometric properties42,45 and because they can be considered the best tools to quickly provide information about cognitive functioning in MS (about 10 and 5 min, respectively), thus reducing workload and preventing fatigue in participants.

Mood disorders were tested using HADS46, a self-assessed questionnaire consisting of 14 multiple-choice (0–3 Likert scale) items probing the presence of depression (HADS-d subscale) and anxiety (HADS-a subscale). A threshold score of 8 or above was found to be an accurate indicator for both depression and anxiety symptoms.
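As an illustration only, the cut-offs quoted above can be written as a small lookup that flags impaired domains for one participant. This is a minimal Python sketch, not the study's code (the analyses were run in MATLAB); the function name, the dictionary keys and the example scores are hypothetical.

```python
# Cut-offs quoted in the text: a MoCA dimension is flagged as impaired
# when the score is strictly below the listed value.
MOCA_CUTOFFS = {
    "visuospatial_executive": 4,
    "attention": 6,
    "abstraction": 1,
    "delayed_recall": 4,
    "language": 3,
    "naming": 3,
    "orientation": 6,
}
SDMT_CUTOFF = 34  # IPS difficulties when the SDMT score is <= 34
HADS_CUTOFF = 8   # anxiety or depression symptoms when the subscale is >= 8


def impaired_domains(moca, sdmt, hads_anxiety, hads_depression):
    """Return the sorted names of the domains flagged for one participant.

    `moca` is a dict with one score per MoCA dimension (hypothetical keys).
    """
    flagged = [name for name, cut in MOCA_CUTOFFS.items() if moca[name] < cut]
    if sdmt <= SDMT_CUTOFF:
        flagged.append("ips")
    if hads_anxiety >= HADS_CUTOFF:
        flagged.append("anxiety")
    if hads_depression >= HADS_CUTOFF:
        flagged.append("depression")
    return sorted(flagged)


# Invented scores for a single participant, for illustration only.
example_moca = {
    "visuospatial_executive": 5,
    "attention": 5,
    "abstraction": 2,
    "delayed_recall": 3,
    "language": 3,
    "naming": 3,
    "orientation": 6,
}
example_flags = impaired_domains(example_moca, sdmt=40,
                                 hads_anxiety=8, hads_depression=3)
```

In this invented example, attention and delayed recall fall below their cut-offs and the anxiety subscale reaches the threshold, so three domains are flagged.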

For each participant, demographic (i.e., gender, age, years of education) and clinical (i.e., disease duration, disease course, and disability level as measured by the EDSS16) information was collected. In addition, the FIM, a seven-step ordinal scale that comprises 18 items divided into motor and cognitive categories, was administered to assess and grade the functional status of a person based on the level of assistance he/she requires. FIM includes measures of independence for self-care, including sphincter control, transfers, locomotion, communication, and social cognition. Overall, the higher the scores, the higher the independence level. For study purposes, we included only the FIM cognition subscale47,48. All participants gave written informed consent prior to study entry in accordance with the revised Declaration of Helsinki49. The study was approved by the Regional Ethics Committee of Azienda Ospedaliera “San Martino”, Genoa, Italy (Number: 023REG2014).

Statistical analysis

Following a previous paper by Podda et al.8, ten coordinates, derived from the combination of the seven dimensions of MoCA, SDMT, and HADS for both depression and anxiety, were used to describe the participants. To bring all scales onto the same interval [0, 1], a min–max rescaling was applied. The classes into which PwMS were clustered were determined using the k-means algorithm50. As an additional analysis, the clustering process was performed again with the FIM cognitive subtotal added. The gradient-boosting decision tree approach was used to determine which features had been most important in the clusterization process51. We used 100 decision trees, considered as weak learners, and applied the bootstrap aggregation method to address the classification problem. Subsequently, a predominant cognitive functioning profile was delineated for every subgroup, considering the proportion of impaired individuals for each cognitive domain.
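The rescaling and clustering steps can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the MATLAB implementation used in the study: it runs Lloyd's k-means from a single set of caller-supplied starting centroids (the study used k = 4 with 10 runs from various centroids), and the two-feature toy data and all names are hypothetical.

```python
import math


def min_max_rescale(column):
    """Rescale a list of numbers to [0, 1]; a constant column maps to 0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm started from caller-supplied centroids."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Toy data: two invented features for four hypothetical participants.
cognition = min_max_rescale([28, 27, 15, 14])
speed = min_max_rescale([55, 50, 20, 25])
toy_points = list(zip(cognition, speed))
toy_labels, toy_centroids = kmeans(toy_points, [toy_points[0], toy_points[2]])
```

With real data, one would typically repeat the clustering from several starting points and keep the solution with the lowest within-cluster sum of squares, which is what the 10 runs described in the Results would correspond to.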

The demographic (i.e., age, sex, years of education) and clinical (i.e., EDSS, FIM cognitive subtotal, MS course, disease duration) parameters acquired at the initial time point were evaluated for each group; the Jarque–Bera test52 was used to verify the normality hypothesis for each class. Based on the results, differences between groups in clinical and demographic characteristics were compared by the χ2-test for categorical variables or the Kruskal–Wallis test53 for numerical ones.

Significance was evaluated with α = 0.05 and Bonferroni correction was then applied. In addition, pairwise comparisons between groups were assessed using Fisher's exact test for categorical variables and the Mann–Whitney U test54 for numerical ones.
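For illustration, a Bonferroni correction divides the nominal α by the number of comparisons. The text does not state how many comparisons the correction covered, so the sketch below assumes, purely as an example, the six possible pairwise comparisons among four groups; the function name is hypothetical.

```python
from itertools import combinations


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold under Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


# Assumption for illustration: all pairwise comparisons among four phenotypes.
phenotype_pairs = list(combinations([1, 2, 3, 4], 2))
adjusted_alpha = bonferroni_alpha(0.05, len(phenotype_pairs))
```

Under this assumption a pairwise p value would need to fall below 0.05 / 6 to be declared significant.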

A cumulative percentage of PwMS categorised as “cognitively impaired” was assessed for each subgroup, following an adapted characterization of Hancock et al.6 and considering the outcome measures of the study and consistency with previous works. Thus, individuals were grouped based on the total number of impairments (0 impairments = “preserved,” 1–2 impairments = “single domain,” 3–5 impairments = “bi-domain,” and 6 + impairments = “multi-domain”).
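The grouping rule above maps directly onto a small function. The sketch below is illustrative Python with a hypothetical function name; the labels and ranges are those given in the text.

```python
def impairment_category(n_impaired):
    """Map a count of impaired domains to the adapted Hancock et al. label."""
    if n_impaired < 0:
        raise ValueError("the number of impaired domains cannot be negative")
    if n_impaired == 0:
        return "preserved"
    if n_impaired <= 2:
        return "single domain"
    if n_impaired <= 5:
        return "bi-domain"
    return "multi-domain"


# Labels for every possible count from 0 to 7 impaired domains.
labels_by_count = [impairment_category(n) for n in range(8)]
```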

After the descriptive analysis of each phenotype, the first and second time points were used to examine the traits associated with these four phenotypes and to spot any potential significant changes. Since baseline and follow-up were observed in the same individuals, the difference in the mean level of impairment for each phenotype was determined using the non-parametric Wilcoxon signed rank test55 applied to each cognitive subgroup. The magnitude of differences was measured with the Wilcoxon effect size, employing the Wilcoxon Z value normalised by the population of each cluster56. The absolute value classifies the magnitude of the effect: the larger the value, the larger the effect; the sign relates to the difference evaluated by the test, that is (Xi1 − Xi2), i = 1…N, with N the number of PwMS in the cluster (i.e., negative values indicate improvement); the difference was calculated between the accuracies of each test for both time points. All the analyses were conducted with MATLAB (version 9.14.0 (R2023a) Update 4).
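The effect size described above can be sketched as r = Z / √N. The Python below is an illustration under stated assumptions, not the MATLAB routine used in the study: Z comes from the normal approximation of the signed rank statistic with a tie correction and no continuity correction, zero differences are dropped before ranking, and N is taken to be the number of participants in the cluster. The function name and toy scores are hypothetical.

```python
import math


def wilcoxon_effect_size(baseline, follow_up):
    """Return (z, r) for paired scores, with differences baseline - follow_up."""
    diffs = [a - b for a, b in zip(baseline, follow_up) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0
    # Rank the absolute differences; tied values share their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    # Signed rank statistic: sum of ranks of the positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(variance)
    # Assumption: N is the number of participants in the cluster.
    r = z / math.sqrt(len(baseline))
    return z, r


# Toy scores in which every participant scores higher at follow-up.
z_toy, r_toy = wilcoxon_effect_size([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

In this toy example every follow-up score exceeds its baseline, so all differences are negative and r is negative, matching the convention that negative values indicate improvement on scales where higher scores are better.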