Classification and deep-learning–based prediction of Alzheimer disease subtypes by using genomic data

Shigemizu, Daichi; Akiyama, Shintaro; Suganuma, Mutsumi; Furutani, Motoki; Yamakawa, Akiko; Nakano, Yukiko; Ozaki, Kouichi; Niida, Shumpei

doi:10.1038/s41398-023-02531-1

Article
Open access
Published: 29 June 2023

Volume 13, article number 232, (2023)

Translational Psychiatry

Daichi Shigemizu¹,² (ORCID: orcid.org/0000-0002-4412-0552),
Shintaro Akiyama¹,
Mutsumi Suganuma¹,
Motoki Furutani¹,³,
Akiko Yamakawa¹,
Yukiko Nakano³,
Kouichi Ozaki¹,²,³ (ORCID: orcid.org/0000-0003-0028-1570) &
…
Shumpei Niida⁴

Abstract

Late-onset Alzheimer’s disease (LOAD) is the most common multifactorial neurodegenerative disease among elderly people. LOAD is heterogeneous, and the symptoms vary among patients. Genome-wide association studies (GWAS) have identified genetic risk factors for LOAD but not for LOAD subtypes. Here, we examined the genetic architecture of LOAD based on Japanese GWAS data from 1947 patients and 2192 cognitively normal controls in a discovery cohort and 847 patients and 2298 controls in an independent validation cohort. Two distinct groups of LOAD patients were identified. One was characterized by major risk genes for developing LOAD (APOC1 and APOC1P1) and immune-related genes (RELB and CBLC). The other was characterized by genes associated with kidney disorders (AXDND1, FBP1, and MIR2278). Subsequent analysis of albumin and hemoglobin values from routine blood test results suggested that impaired kidney function could lead to LOAD pathogenesis. We developed a prediction model for LOAD subtypes using a deep neural network, which achieved an accuracy of 0.694 (2870/4137) in the discovery cohort and 0.687 (2162/3145) in the validation cohort. These findings provide new insights into the pathogenic mechanisms of LOAD.

Introduction

Alzheimer’s disease (AD) is the most common cause of dementia among elderly people [1, 2]. The majority of AD cases are sporadic late-onset AD (LOAD), diagnosed in people ≥65 years [3]. The diagnosis of LOAD is characterized by the accumulation of amyloid-beta (Aβ) plaques and tau neurofibrillary tangles in the neurodegenerative brain [4], but it is a heterogeneous disease and the symptoms vary among patients. Previous studies have reported some subtypes of LOAD [5, 6]. Bredesen reported three LOAD subtypes (inflammatory, non-inflammatory, and cortical) after a metabolic profiling analysis of patients with cognitive decline [5]. Byun et al. identified four LOAD subtypes with heterogeneous patterns of regional atrophy from MRI-measured volumes of the hippocampus and cortical regions (both impaired, hippocampal atrophy only, cortical atrophy only, and both spared) [7]. However, these findings were derived from small samples, limiting their interpretation.

Genome-wide association studies (GWAS) have been highly successful in identifying genetic risk factors associated with LOAD [8]. The strongest genetic risk factor for LOAD is the apolipoprotein E ε4 (APOE4) polymorphism [9]. Others, also identified from LOAD GWAS analyses, include bridging integrator 1 (BIN1) [10], clusterin (CLU) [10], and triggering receptor expressed on myeloid cells 2 (TREM2) [11]. Risk prediction models using these genetic risk factors have also been developed for LOAD using various computational approaches [12, 13], including machine learning techniques [14]. With a large sample size, determining genetic risk will likely not only contribute to our understanding of the underlying pathologic mechanisms of LOAD but also provide new insights into the classification of distinct LOAD subtypes. However, no previous studies have examined genetic risks for the determination of LOAD subtypes.

Here, we comprehensively investigated the genetic architecture of LOAD based on Japanese GWAS data from a large number of patients and controls and examined the presence of LOAD subtypes using the energy landscape [15]. Each subject was represented as a binary vector based on the association signals obtained from GWAS. The energy landscape was visualized with disconnectivity graphs. We revealed the presence of two distinct groups of LOAD patients. Subsequent analysis of representative association signals from each group provided evidence that one group was characterized by both major risk genes for developing LOAD and immune-related genes, and the other group was characterized by genes associated with kidney disorders. We further found that LOAD patients had significantly decreased levels of serum albumin and hemoglobin, which are used to assess kidney function. These results suggest that impaired kidney function could be involved in LOAD pathogenesis. Our findings will contribute to a better understanding of the mechanisms driving heterogeneity in LOAD and will provide novel insight into potential subtype-specific therapies as a step toward precision medicine.

Materials and methods

Ethics statements

This study was approved by the ethics committee of the National Center for Geriatrics and Gerontology (NCGG). The design and performance of the current study involving human subjects were clearly described in a research protocol. All participation was voluntary, and all participants completed informed consent in writing before registering with the NCGG Biobank.

Clinical samples

Genome-wide genotyping data from the blood samples of 7284 subjects and their associated clinical data were downloaded from the NCGG Biobank database. Of these, 4139 subjects (1947 AD subjects and 2192 cognitively normal [CN] subjects) were genotyped using the Affymetrix Japonica Array and formed the discovery cohort. The remaining 3145 subjects (847 AD subjects and 2298 CN subjects) were genotyped using the Infinium Asian Screening Array and formed the validation cohort. The AD subjects were diagnosed with probable or possible AD using the criteria of the National Institute on Aging and Alzheimer’s Association workgroups [16, 17]. The CN subjects had subjective cognitive complaints but normal cognition, defined as a score >23 on the Mini-Mental State Examination, a comprehensive neuropsychological test. All subjects were diagnosed on the basis of medical history, physical examination and diagnostic tests, neurological examination, neuropsychological tests, and brain imaging with magnetic resonance imaging or computerized tomography. Diagnoses were made by experts in dementia who are familiar with its diagnostic criteria, including neurologists, psychiatrists, geriatricians, or a neurosurgeon. All subjects were ≥60 years of age.

Quality control in the GWAS

Genotype imputation was conducted by using IMPUTE2 [18] with the 3.5k Japanese reference panel developed by the Tohoku Medical Megabank Organization (https://www.megabank.tohoku.ac.jp/english/) for the discovery cohort and the 5.6k in-house reference panel for the validation cohort. We used imputed variants with an INFO score ≥0.4. Quality control (QC) was performed on each dataset separately after imputation using PLINK software [19]. We first applied QC filters to the subjects: (1) sex inconsistencies (--check-sex); (2) inbreeding coefficient (--het 0.1); (3) genotype missingness (--missing 0.05); (4) PI_HAT >0.25, where PI_HAT is a statistic for the proportion of identity by descent (--genome); and (5) exclusion of outliers from the clusters of East Asian populations in a principal component analysis (PCA) that was conducted together with 1000 Genomes Phase 3 data. We next applied QC filters to the genetic markers (single-nucleotide polymorphisms [SNPs] and short insertions and deletions [Indels]): (1) genotyping efficiency or call rate (--geno 0.95), (2) minor allele frequency (--freq 0.001), and (3) Hardy–Weinberg equilibrium (--hwe 0.001). To generate a set of independent SNPs, we further performed linkage disequilibrium–based SNP pruning using PLINK, version 1.90b [20], with a window size of 50 SNPs, a step of 5 SNPs, and a pairwise r² threshold of 0.1 (--indep-pairwise 50 5 0.1).
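The marker-level filters can be illustrated with a minimal Python sketch. This is not the PLINK implementation. The function names, the coding of genotypes as 0/1/2 alternate-allele counts with `None` for a missing call, and the reading of the thresholds as a minimum call rate of 0.95 and a minimum minor allele frequency of 0.001 are assumptions made for illustration, and the Hardy–Weinberg test is omitted.

```python
from typing import Optional, Sequence


def call_rate(genotypes: Sequence[Optional[int]]) -> float:
    """Fraction of subjects with a non-missing genotype at one marker."""
    if not genotypes:
        return 0.0
    called = [g for g in genotypes if g is not None]
    return len(called) / len(genotypes)


def minor_allele_frequency(genotypes: Sequence[Optional[int]]) -> float:
    """Frequency of the rarer allele among called genotypes (0/1/2 coding)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def passes_marker_qc(genotypes: Sequence[Optional[int]],
                     min_call_rate: float = 0.95,
                     min_maf: float = 0.001) -> bool:
    """Apply the call-rate and MAF thresholds (assumed interpretation)."""
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf)


# Hypothetical markers genotyped in 40 subjects.
marker_ok = [0] * 30 + [1] * 8 + [2] + [None]   # one missing call
marker_sparse = [0] * 30 + [None] * 10          # low call rate
marker_monomorphic = [0] * 40                   # no minor allele observed
```

In this toy example only `marker_ok` passes both filters; `marker_sparse` fails the call-rate filter and `marker_monomorphic` fails the minor allele frequency filter.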

Disconnectivity graph

The autosomal variants that passed QC criteria as described above were assessed with a logistic regression model, adjusting for sex and age with PLINK software (--logistic) [19] in the following way:

$$\mathrm{logit}\left( P_i \right) = \beta_0 + \beta_1 \times \mathrm{age}_i + \beta_2 \times \mathrm{sex}_i + \beta_3 \times \mathrm{Variant}_i.$$
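As a small numerical illustration of this model, the following Python sketch converts the linear predictor into a probability and a per-allele odds ratio. The coefficient values, the 0/1 coding of sex, and the 0/1/2 coding of the variant are hypothetical assumptions for illustration; they are not estimates from this study.

```python
import math


def case_probability(beta, age, sex, dosage):
    """Return P_i from logit(P_i) = b0 + b1*age + b2*sex + b3*dosage."""
    b0, b_age, b_sex, b_variant = beta
    linear_predictor = b0 + b_age * age + b_sex * sex + b_variant * dosage
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def odds_ratio(b_variant):
    """Per-allele odds ratio implied by the variant coefficient."""
    return math.exp(b_variant)


# Hypothetical coefficients (intercept, age, sex, variant).
example_beta = (-6.0, 0.07, 0.2, 0.5)
p_noncarrier = case_probability(example_beta, age=80, sex=1, dosage=0)
p_carrier = case_probability(example_beta, age=80, sex=1, dosage=1)
```

With a positive variant coefficient, `p_carrier` exceeds `p_noncarrier`, and the ratio of their odds equals `odds_ratio(0.5)`.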

The coefficient and p-value of each variant were obtained from the discovery cohort. A PCA was performed using the variants with p < 0.01, weighted by their coefficients. The principal component (PC) scores were binarized (i.e., −1 or +1) by the mean of the eigenvector values. Each subject was thereby represented by a binary vector $V_k = (\sigma_1, \sigma_2, \cdots, \sigma_N)$, one of $2^N$ possible states, and the appearance probability $P(V_k)$ of each state was fitted with the pairwise maximum entropy model (i.e., a Boltzmann distribution). The energy values of the $2^N$ binary vectors were compared to identify local energy minima. The energy landscape [15] was visualized as a disconnectivity graph labeling the local energy minimum states, using the R packages akima and rgl (Supplementary Fig. 1).
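The energy comparison can be sketched in Python under the usual form of the pairwise maximum entropy model in energy landscape analysis [15], where $E(V) = -\sum_i h_i \sigma_i - \sum_{i<j} J_{ij} \sigma_i \sigma_j$ and $P(V) \propto \exp(-E(V))$. A state is treated as a local minimum when its energy is lower than that of every state reachable by flipping one element. The parameters `h` and `J` below are toy values, not fitted values from this study, and the function names are hypothetical.

```python
import itertools
import math


def energy(state, h, J):
    """Energy of one binary state under the pairwise maximum entropy model."""
    n = len(state)
    field = sum(h[i] * state[i] for i in range(n))
    pair = sum(J[i][j] * state[i] * state[j]
               for i in range(n) for j in range(i + 1, n))
    return -field - pair


def boltzmann_probabilities(h, J):
    """Appearance probability of each of the 2^N states."""
    states = list(itertools.product((-1, 1), repeat=len(h)))
    weights = [math.exp(-energy(s, h, J)) for s in states]
    partition = sum(weights)
    return {s: w / partition for s, w in zip(states, weights)}


def local_minima(h, J):
    """States whose energy is below that of all single-flip neighbours."""
    n = len(h)
    minima = []
    for s in itertools.product((-1, 1), repeat=n):
        e = energy(s, h, J)
        neighbours = [s[:i] + (-s[i],) + s[i + 1:] for i in range(n)]
        if all(e < energy(nb, h, J) for nb in neighbours):
            minima.append(s)
    return minima


# Toy model with N = 3, no bias terms, and uniform positive couplings.
toy_h = [0.0, 0.0, 0.0]
toy_J = [[0.0, 1.0, 1.0],
         [1.0, 0.0, 1.0],
         [1.0, 1.0, 0.0]]
toy_minima = local_minima(toy_h, toy_J)
toy_probs = boltzmann_probabilities(toy_h, toy_J)
```

In the toy model the two fully aligned states are the only local minima. Exhaustive enumeration is feasible here because the number of states grows as $2^N$ and N is small.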

RNA-sequencing data analysis

All RNA-sequencing (RNA-seq) data were downloaded from the NCGG Biobank database [21]. The quality of the read sequences (fastq files) was assessed by using FastQC (version 0.11.7). Low-quality reads (<Q20) and reads shorter than 50 bp after adapter trimming were excluded by using Cutadapt (version 1.16). The remaining clean sequenced reads were mapped to the human reference genome (GRCh37) with STAR (version 2.5.2b) [22]. Quantification, in transcripts per million, was performed with RSEM (version 1.3.0).

qRT-PCR validation of genes

cDNA was synthesized by using a PrimeScript II 1st Strand cDNA Synthesis Kit (Takara Bio, Shiga, Japan). Quantitative RT-PCR (qRT-PCR) analysis was performed by using TaqMan Fast Advanced Master Mix (Thermo Fisher Scientific, Waltham, MA) and TaqMan Probes (Thermo Fisher Scientific) on the QuantStudio 7 Flex Real-Time PCR System (Thermo Fisher Scientific), in accordance with the manufacturer’s instructions. Target genes and their corresponding TaqMan probes were RELB (Hs00232399_m1) and FBP1 (Hs00983323_m1). The qRT-PCR conditions were one cycle of 50 °C for 2 min and 95 °C for 20 s, followed by 40 cycles of 95 °C for 1 s and 60 °C for 20 s. Human beta-actin (ACTB, Hs99999903_m1) was preselected as a reference gene for the normalization of target gene expression levels. Gene expression levels from qRT-PCR were calculated relative to that of the reference gene ACTB by using the semiquantitative method [23]. Gene expression levels of RELB were obtained for 16 randomly selected subjects with heterozygous genotypes and 20 randomly selected subjects with homozygous genotypes. Those of FBP1 were obtained for eight randomly selected subjects with heterozygous genotypes and 20 randomly selected subjects with homozygous genotypes.

Construction of a LOAD subtype prediction model

All data were strictly separated into a discovery cohort and a validation cohort. A PCA was performed using variants with GWAS p < 0.01 in the discovery cohort, and the resulting PC scores were used to construct a disconnectivity graph. To estimate the PC scores (PC1 to PC10) from the variant data, prediction models were developed by using a deep neural network. The neural networks comprised six hidden layers of 512 neurons with randomized leaky rectified linear unit (RReLU) activation and 50% dropout. The networks were trained on the discovery cohort with a batch size of 32 for 100 to 1500 epochs, in increments of 100 epochs.

Estimated PC scores were binarized (i.e., −1 or +1) by the mean of eigenvector values used to construct the disconnectivity graph. Each subject was represented by a binary vector $V_k = (\sigma_1, \sigma_2, \cdots, \sigma_N)$, one of $2^N$ possible states, from which the subject's position on the disconnectivity graph was determined. Prediction models were assessed by the rate at which LOAD cases and CN subjects were predicted in the red and green spheres of the same groups on the disconnectivity graph. The LOAD subtype prediction models were evaluated on the independent validation cohort. These models were constructed using the open-source machine learning library PyTorch (version 1.7.0).
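The binarization step can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the per-component thresholds, the rule that a score equal to its threshold maps to +1, and the integer encoding of a state are all assumptions.

```python
def binarize(scores, thresholds):
    """Map PC scores to a tuple of -1/+1 values, one per component."""
    if len(scores) != len(thresholds):
        raise ValueError("scores and thresholds must have the same length")
    return tuple(1 if s >= t else -1 for s, t in zip(scores, thresholds))


def state_index(state):
    """Encode a -1/+1 state as an integer in [0, 2^N), first element first."""
    index = 0
    for sigma in state:
        index = (index << 1) | (1 if sigma == 1 else 0)
    return index


# Hypothetical PC scores for one subject with N = 7 components.
example_scores = [0.8, -1.2, 0.1, -0.4, 2.0, -0.3, 0.0]
example_thresholds = [0.0] * 7
example_state = binarize(example_scores, example_thresholds)
example_index = state_index(example_state)
```

With N = 7 there are 2⁷ = 128 possible states, so a subject's state can be looked up directly in a table of states and their assigned positions on the disconnectivity graph.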

Results

GWAS of the Japanese subjects

The discovery cohort consisted of 1947 LOAD cases and 2192 CN subjects. The average ages of the subjects from whom the LOAD and CN blood samples were obtained were 79.1 years (SD = 6.3 years) and 70.8 years (SD = 5.9 years), respectively, and the female-to-male ratios were 2.13:1 and 1.19:1, respectively. We examined association signals from the discovery cohort. A total of 9,252,465 genetic markers (SNPs and Indels) passed stringent QC filters for both genotypes and samples after genotype imputation. The QC-passed genetic markers were further pruned for linkage disequilibrium to obtain a set of independent markers. The resulting subset of 2,741,829 genetic markers was used for subsequent analysis.

The validation cohort consisted of 847 LOAD cases and 2298 CN subjects. The average ages of the subjects from whom the LOAD and CN samples were obtained were 79.4 years (SD = 6.4 years) and 75.4 years (SD = 4.3 years), respectively, and the female-to-male ratios were 1.91:1 and 1.27:1, respectively.

Disconnectivity graph construction

Of the 2,741,829 genetic markers from the discovery cohort, 22,128 showed association signals with GWAS p < 0.01. Using these association signals, we performed a PCA. According to the PC scores, each subject was assigned to one of 2⁷ binary vectors. Local energy minimum states were obtained from a comparison of energy values among the binary vectors, and the energy landscape was visualized with disconnectivity graphs labeling their local energy minimum states. In Fig. 1a, LOAD cases and CN subjects are represented as red and green spheres, respectively, whose sizes correspond to the difference in LOAD and CN frequencies (i.e., red spheres indicate that LOAD cases are more frequent than CN subjects, and green spheres indicate that CN subjects are more frequent than LOAD cases). Both LOAD cases and CN subjects formed clusters, and we revealed the presence of two distinct groups in LOAD and in CN (Fig. 1b).

Demographic data of the two LOAD groups

The discovery cohort was classified into two distinct groups: group 1 was composed of 918 AD cases and 895 CN subjects, and group 2 contained 1029 AD cases and 1297 CN subjects. The demographic data of the subjects in each group are shown in Table 1. No differences were observed between the two groups in the average age and female-to-male ratio, but statistically significant differences were found in the number of APOE4 alleles between the LOAD subjects in each group (Fisher’s exact test p = 3.70 × 10⁻¹²) and between the CN subjects in each group (Fisher’s exact test p = 1.06 × 10⁻⁸), with the proportion of subjects with APOE4 higher in group 1 than in group 2. When visualizing the energy landscape by APOE4 and APOE2 information, significant positive correlations between LOAD frequency and APOE4 frequency were observed, with a higher correlation coefficient in group 1 (Kendall’s τ = 0.48, p = 6.72 × 10⁻⁷) than in group 2 (Kendall’s τ = 0.41, p = 1.59 × 10⁻⁶) (Supplementary Fig. 2).

Table 1 Clinical characteristics of the discovery cohort.

Representative association signals of each group and their functional annotations

To identify distinctive association signals in each group, we focused on representative clusters where LOAD and CN frequencies differed by more than 5% (i.e., the red or green spheres in Fig. 1 were large). We observed six representative clusters (three red and three green spheres) in group 1, which contained 20,792 genetic markers from 848 subjects (391 LOAD cases and 457 CN subjects), and five representative clusters (two red and three green spheres) in group 2, which contained 19,929 genetic markers from 1047 subjects (305 LOAD cases and 742 CN subjects). Association signals were detected with logistic regression, adjusting for sex and age. In group 1, four genetic markers located on or near four genes (APOC1, APOC1P1, RELB, and CBLC) showed statistically significant associations at Bonferroni-corrected p < 0.05; in group 2, 12 genetic markers located on or near four genes (AXDND1, FBP1, AOPEP, and MIR2278) showed significant associations (Table 2). Group 1 was characterized by major risk genes for developing LOAD (APOC1 and APOC1P1) [24] and immune-related genes (RELB and CBLC) [25,26,27], whereas group 2 was characterized by genes associated with kidney disorders, such as nephrotic syndrome (AXDND1, FBP1, and MIR2278) [28, 29]. Our results are consistent with previous findings of associations between kidney disease and AD [30] and between the immune system and AD [31].
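To give a sense of scale for the Bonferroni correction, the sketch below computes the per-marker significance threshold that corresponds to a corrected p < 0.05. It assumes that the number of tests in each group equals the number of genetic markers reported for its representative clusters, which the text does not state explicitly; the function and variable names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Uncorrected p-value threshold equivalent to a corrected level alpha."""
    return alpha / n_tests


# Marker counts reported for the representative clusters of each group.
markers_tested = {"group 1": 20792, "group 2": 19929}
thresholds = {group: bonferroni_threshold(0.05, n)
              for group, n in markers_tested.items()}
```

Under this assumption, the uncorrected thresholds are roughly 2.4 × 10⁻⁶ for group 1 and 2.5 × 10⁻⁶ for group 2.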

Table 2 Association signals detected with logistic regression.

Closest gene expression of association signals using blood samples

We used blood samples to examine whether the association signals detected affect expression of their closest genes. Of the eight genes, the Human Protein Atlas (HPA) database (https://www.proteinatlas.org) indicated that two (RELB = 2.0 and FBP1 = 33.1) are expressed in blood cells, where expressed genes were defined as genes with normalized transcripts per million ≥2 in total human peripheral blood mononuclear cells. Although MIR2278 and APOC1P1 have not been registered in the HPA database, the HPA database showed that APOC1, AOPEP, RELB, and FBP1 are expressed in brain tissues (Fig. 2a).

Fig. 2 Validation of expression of the genes closest to the association signals using qRT-PCR.

Association signals of RELB and FBP1 (rs146190016 and rs550833079, respectively) were located within their genes. The whole-blood RNA-seq data are available from the NCGG Biobank database. We examined the effect of the association signals on the expression of these genes by using linear regression adjusting for age and sex. We downloaded data on RELB for 126 of the 848 subjects in group 1 and data on FBP1 for 151 of the 1047 subjects in group 2 from the NCGG Biobank database. Although rs550833079 was not associated with a significant expression change in FBP1 (p = 0.37), rs146190016 contributed to a statistically significant increase of RELB expression (p = 0.019, Table 2 and Fig. 2b). To validate the expression quantitative trait locus (eQTL) results for these genes, we used qRT-PCR data, which were consistent with the RNA-seq results for both genes (Fig. 2c).

Assessment of kidney function

Five measures used to assess kidney function (creatinine, cystatin C, eGFR, albumin [Alb], and hemoglobin [Hb]) were obtained from routine blood tests. These data were available for 387 subjects (342 LOAD cases and 45 CN subjects) from group 1 and 355 subjects (269 LOAD cases and 86 CN subjects) from group 2. Differences in the measures between LOAD and CN were assessed with the Wilcoxon rank-sum test. Albumin and hemoglobin reached significance at the false discovery rate (FDR) level (FDR_Alb = 0.012 in group 1; FDR_Alb = 1.30 × 10⁻⁶ and FDR_Hb = 4.53 × 10⁻⁶ in group 2; Fig. 3), whereas the remaining three measures showed no statistically significant difference between LOAD and CN in either group. These results support the hypothesis that impaired kidney function can lead to LOAD pathogenesis.
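The text reports FDR values but does not name the correction procedure. The following Python sketch assumes the Benjamini–Hochberg procedure, a common choice, and applies it to hypothetical p-values for five measures; neither the procedure nor the numbers are taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five measures in one group.
measures = ["creatinine", "cystatin C", "eGFR", "albumin", "hemoglobin"]
raw_p = [0.5, 0.8, 0.03, 0.001, 0.02]
fdr = dict(zip(measures, benjamini_hochberg(raw_p)))
```

Each adjusted value is the raw p-value scaled by the number of tests divided by its rank, with a running minimum applied so that the adjusted values never decrease as the raw p-values increase.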

As type 2 diabetes mellitus (T2D) is known to increase the risk for dementia, including AD [32], we examined the level of HbA1c measured in routine blood tests. These data were available for 388 subjects (343 LOAD cases and 45 CN subjects) from group 1 and 355 subjects (269 LOAD cases and 86 CN subjects) from group 2. The difference in levels between LOAD and CN was assessed with the Wilcoxon rank-sum test, and no statistically significant difference was found between LOAD and CN subjects in either group (P = 0.687 in group 1; P = 0.696 in group 2, Supplementary Fig. 3).

LOAD subtype prediction model

A disconnectivity graph was constructed based on PC scores obtained from the PCA using 22,128 variants with GWAS p < 0.01 in the discovery cohort. Of these, 17,743 variants were shared with the independent validation cohort (see Materials and methods). These genomic variants were used to predict the LOAD subtypes of patients. We applied a deep neural network with six hidden layers of 512 neurons, RReLU activation, 50% dropout, and a batch size of 32 to predict PC scores from the variant data of the discovery cohort (Fig. 4a). The predicted PC scores were binarized by the mean of eigenvector values used for the construction of the disconnectivity graph. Each subject was represented by a binary vector $V_k = (\sigma_1, \sigma_2, \cdots, \sigma_7)$, one of 2⁷ possible states, from which the location on the disconnectivity graph was determined. Prediction models were assessed by the rate at which LOAD cases and CN subjects were predicted in the red and green spheres of the same groups on the disconnectivity graph. We constructed neural network models with one to seven hidden layers and assessed them using the independent validation cohort (Supplementary Fig. 4). Our LOAD subtype prediction model (i.e., a deep neural network model comprising six hidden layers) achieved an accuracy of 0.694 (2870/4137) after 900 epochs in the discovery cohort (group 1, 1228/1812 = 0.678; group 2, 1642/2325 = 0.706) and of 0.687 (2162/3145) in the validation cohort (group 1, 1025/1508 = 0.680; group 2, 1137/1637 = 0.695, Fig. 4b).
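The reported accuracies can be checked from the per-group counts with a short Python sketch. The counts are those given in the text; the function and variable names are hypothetical.

```python
def pooled_accuracy(groups):
    """Return (correct, total, accuracy) pooled over all groups."""
    correct = sum(c for c, _ in groups.values())
    total = sum(t for _, t in groups.values())
    return correct, total, correct / total


# (correctly predicted subjects, evaluated subjects) per group.
discovery = {"group 1": (1228, 1812), "group 2": (1642, 2325)}
validation = {"group 1": (1025, 1508), "group 2": (1137, 1637)}

discovery_summary = pooled_accuracy(discovery)
validation_summary = pooled_accuracy(validation)
per_group_discovery = {g: round(c / t, 3) for g, (c, t) in discovery.items()}
per_group_validation = {g: round(c / t, 3) for g, (c, t) in validation.items()}
```

The pooled counts reproduce 2870/4137 and 2162/3145. The discovery denominators (1812 and 2325) are each one fewer than the group sizes reported earlier (1813 and 2326); the text does not state the reason.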

Discussion

Using the genotyping data from a case-control cohort of >4100 subjects, we examined genetic risks for the determination of LOAD subtypes and revealed the presence of two distinct groups of LOAD patients. Clinical characteristics of the groups showed a significant difference in the frequency of APOE4 alleles, although no differences were observed in the average age and the female-to-male ratio between groups. To understand the genetic difference between groups, we further investigated representative association signals in each group. Four representative genes were detected from each group: group 1, APOC1, APOC1P1, RELB, and CBLC; group 2, AXDND1, FBP1, AOPEP, and MIR2278.

The four genes in group 1 (APOC1, APOC1P1, RELB, and CBLC) are located near the APOE region, and APOC1 and APOC1P1 are in linkage disequilibrium with the APOE gene. However, previous studies have shown that they are APOE-independent risk factors for AD [24]. RELB is a member of the nuclear factor kappa B (NF-κB) family [25]. Activated NF-κB has been found in neurons and glial cells in the brains of AD patients [33]. CBLC (Cbl proto-oncogene C) is a member of the Cbl family of E3 ubiquitin ligases. Recent studies have shown that CBLC is likely to indirectly modulate the stimulatory function of NF-κB complexes [34]. Nho et al. reported that RELB and CBLC were significantly associated with LOAD-related cerebrospinal fluid Aβ1–42 with adjustment for APOE genotype [35]. These results suggest that the representative genes in group 1 are APOE-independent risk factors for LOAD and are likely to contribute to Aβ accumulation through direct or indirect NF-κB activation.

In contrast, three of the four representative genes in group 2 (AXDND1, FBP1, and MIR2278) are likely to be associated with kidney diseases. Several variants in AXDND1 have been registered as mutations involved in nephrotic syndrome in the GeneCards [28] and ClinVar [36] databases. FBP1 (fructose-bisphosphatase 1) is a gluconeogenesis regulatory enzyme that is suppressed in kidney tumors [37]. Although AOPEP has not been shown to be associated with kidney diseases, MIR2278 (microRNA 2278) can target STAT5A [38], which has been reported to contribute to the pathogenesis of kidney disease [39]. We further examined markers for kidney function measured in routine blood test results. Statistically significant differences between LOAD and CN were found in albumin and hemoglobin concentrations. These results suggest that impaired kidney function in group 2 LOAD patients could have led to AD pathogenesis. Consistent with this, Wu et al. reported that AD severity was significantly associated with serum albumin and hemoglobin [40].

In this study, we examined the genetic architecture of LOAD based on Japanese GWAS data and revealed the presence of two distinct groups of LOAD patients, which indicates the existence of LOAD subtypes. We also used a deep neural network to create a LOAD subtype prediction model that achieved accuracies of 0.694 in the discovery cohort and 0.687 in the validation cohort. The model predicts LOAD subtype and phenotype simultaneously (i.e., LOAD group 1, LOAD group 2, CN group 1, and CN group 2). We believe that our findings will contribute to the understanding of LOAD and provide new insights into its pathogenic mechanisms. Many clinical trials using anti-Aβ antibody drugs, even lecanemab, have not shown convincing results. This might be due to the existence of LOAD subtypes with different genetic backgrounds. Additional association studies between LOAD and CN using whole-genome sequencing data from large samples, including non-Japanese populations, should lead to further understanding of LOAD subtypes.

Data availability

All GWAS summary datasets used or analyzed in the current study are available from the corresponding author on reasonable request.

Code availability

We used the open-source programming languages R (version 3.4.3) and Ruby (version 2.4.0) and the machine learning library PyTorch (version 1.7.0) to analyze data and create plots. Code is available upon request from the corresponding author.

References

Hardy J, Selkoe DJ. The amyloid hypothesis of Alzheimer’s disease: progress and problems on the road to therapeutics. Science. 2002;297:353–6.
Prince M, Bryce R, Albanese E, Wimo A, Ribeiro W, Ferri CP. The global prevalence of dementia: a systematic review and metaanalysis. Alzheimers Dement. 2013;9:63–75. e62
Rabinovici GD. Late-onset Alzheimer disease. Continuum. 2019;25:14–33.
Huber CM, Yee C, May T, Dhanala A, Mitchell CS. Cognitive decline in preclinical Alzheimer’s disease: amyloid-beta versus tauopathy. J Alzheimers Dis. 2018;61:265–81.
Bredesen DE. Metabolic profiling distinguishes three subtypes of Alzheimer’s disease. Aging. 2015;7:595–600.
Ferreira D, Nordberg A, Westman E. Biological subtypes of Alzheimer disease: a systematic review and meta-analysis. Neurology. 2020;94:436–48.
Byun MS, Kim SE, Park J, Yi D, Choe YM, Sohn BK, et al. Heterogeneity of regional brain atrophy patterns associated with distinct progression rates in Alzheimer’s disease. PLoS ONE. 2015;10:e0142756.
Bellenguez C, Kucukali F, Jansen IE, Kleineidam L, Moreno-Grau S, Amin N, et al. New insights into the genetic etiology of Alzheimer’s disease and related dementias. Nat Genet. 2022;54:412–36.
Liu CC, Liu CC, Kanekiyo T, Xu H, Bu G. Apolipoprotein E and Alzheimer disease: risk, mechanisms and therapy. Nat Rev Neurol. 2013;9:106–18.
Santos LRD, Almeida JFF, Pimassoni LHS, Morelato RL, Paula F. The combined risk effect among BIN1, CLU, and APOE genes in Alzheimer’s disease. Genet Mol Biol. 2020;43:e20180320.
Olive C, Ibanez L, Farias FHG, Wang F, Budde JP, Norton JB, et al. Examination of the effect of rare variants in TREM2, ABI3, and PLCG2 in LOAD through multiple phenotypes. J Alzheimers Dis. 2020;77:1469–82.
Sims R, Hill M, Williams J. The multiplex model of the genetics of Alzheimer’s disease. Nat Neurosci. 2020;23:311–22.
Jia L, Li F, Wei C, Zhu M, Qu Q, Qin W, et al. Prediction of Alzheimer’s disease using multi-variants from a Chinese genome-wide association study. Brain. 2021;144:924–37.
De Velasco Oriol J, Vallejo EE, Estrada K, Tamez Peña JG, The Alzheimer’s Disease Neuroimaging Initiative. Benchmarking machine learning models for late-onset Alzheimer’s disease prediction from genomic data. BMC Bioinform. 2019;20:709.
Ezaki T, Watanabe T, Ohzeki M, Masuda N. Energy landscape analysis of neuroimaging data. Philos Trans A Math Phys Eng Sci. 2017;375:20160287.
McKhann GM, Knopman DS, Chertkow H, Hyman BT, Jack CR Jr, Kawas CH, et al. The diagnosis of dementia due to Alzheimer’s disease: recommendations from the National Institute on Aging-Alzheimer’s Association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimers Dement. 2011;7:263–9.
Albert MS, DeKosky ST, Dickson D, Dubois B, Feldman HH, Fox NC, et al. The diagnosis of mild cognitive impairment due to Alzheimer’s disease: recommendations from the National Institute on Aging-Alzheimer’s Association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimers Dement. 2011;7:270–9.
Howie B, Fuchsberger C, Stephens M, Marchini J, Abecasis GR. Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat Genet. 2012;44:955–9.
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–75.
Slifer SH. PLINK: key functions for data analysis. Curr Protoc Hum Genet. 2018;97:e59.
Shigemizu D, Mori T, Akiyama S, Higaki S, Watanabe H, Sakurai T, et al. Identification of potential blood biomarkers for early diagnosis of Alzheimer’s disease through RNA sequencing analysis. Alzheimers Res Ther. 2020;12:87.
Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21.
Marone M, Mozzetti S, De Ritis D, Pierelli L, Scambia G. Semiquantitative RT-PCR analysis to assess the expression levels of multiple transcripts from the same sample. Biol Proced Online. 2001;3:19–25.
Prendecki M, Florczak-Wyspianska J, Kowalska M, Ilkowski J, Grzelak T, Bialas K, et al. Biothiols and oxidative stress markers and polymorphisms of TOMM40 and APOC1 genes in Alzheimer’s disease patients. Oncotarget. 2018;9:35207–25.
Li J, Chen S, Chen W, Ye Q, Dou Y, Xiao Y, et al. Role of the NF-kappaB family member RelB in regulation of Foxp3(+) regulatory T cells in vivo. J Immunol. 2018;200:1325–34.
Qiao G, Zhao Y, Li Z, Tang PQ, Langdon WY, Yang T, et al. T cell activation threshold regulated by E3 ubiquitin ligase Cbl-b determines fate of inducible regulatory T cells. J Immunol. 2013;191:632–9.
Liu YC. Ubiquitin ligases and the immune response. Annu Rev Immunol. 2004;22:81–127.
Stelzer G, Rosen N, Plaschkes I, Zimmerman S, Twik M, Fishilevich S, et al. The GeneCards suite: from gene data mining to disease genome sequence analyses. Curr Protoc Bioinformatics. 2016;54:1.30.31–33.
Wang L, Man S, Bian Y. Bioinformatics analysis of biomarkers of aristolochic acid-induced early nephrotoxicity in embryonic stem cells. Exp Ther Med. 2021;21:508.
Zhang CY, He FF, Su H, Zhang C, Meng XF. Association between chronic kidney disease and Alzheimer’s disease: an update. Metab Brain Dis. 2020;35:883–94.
Jevtic S, Sengar AS, Salter MW, McLaurin J. The role of the immune system in Alzheimer disease: etiology and treatment. Ageing Res Rev. 2017;40:84–94.
Bangen KJ, Werhane ML, Weigand AJ, Edmonds EC, Delano-Wood L, Thomas KR, et al. Reduced regional cerebral blood flow relates to poorer cognition in older adults with type 2 diabetes. Front Aging Neurosci. 2018;10:270.
Kaltschmidt B, Uherek M, Volk B, Baeuerle PA, Kaltschmidt C. Transcription factor NF-kappaB is activated in primary neurons by amyloid beta peptides and in neurons surrounding early plaques from patients with Alzheimer disease. Proc Natl Acad Sci USA. 1997;94:2642–7.
Dong L, Li YZ, An HT, Wang YL, Chen SH, Qian YJ, et al. The E3 ubiquitin ligase c-Cbl inhibits microglia-mediated CNS inflammation by regulating PI3K/Akt/NF-kappaB pathway. CNS Neurosci Ther. 2016;22:661–9.
Nho K, Kim S, Horgusluoglu E, Risacher SL, Shen L, Kim D, et al. Association analysis of rare variants near the APOE region with CSF and neuroimaging biomarkers of Alzheimer’s disease. BMC Med Genomics. 2017;10:29.
Landrum MJ, Chitipiralla S, Brown GR, Chen C, Gu B, Hart J, et al. ClinVar: improvements to accessing data. Nucleic Acids Res. 2020;48:D835–D844.
Liu M, Pan Q, Xiao R, Yu Y, Lu W, Wang L. A cluster of metabolism-related genes predict prognosis and progression of clear cell renal cell carcinoma. Sci Rep. 2020;10:12949.
Kaymaz BT, Gunel NS, Ceyhan M, Cetintas VB, Ozel B, Yandim MK, et al. Revealing genome-wide mRNA and microRNA expression patterns in leukemic cells highlighted “hsa-miR-2278” as a tumor suppressor for regain of chemotherapeutic imatinib response due to targeting STAT5A. Tumour Biol. 2015;36:7915–27.
Fragiadaki M, Lannoy M, Themanns M, Maurer B, Leonhard WN, Peters DJ, et al. STAT5 drives abnormal proliferation in autosomal dominant polycystic kidney disease. Kidney Int. 2017;91:575–86.
Wu JJ, Weng SC, Liang CK, Lin CS, Lan TH, Lin SY, et al. Effects of kidney function, serum albumin and hemoglobin on dementia severity in the oldest old people with newly diagnosed Alzheimer’s disease in a residential aged care facility: a cross-sectional study. BMC Geriatr. 2020;20:391.

Acknowledgements

We thank the NCGG Biobank and the NCGG Integrated Database for Dementia Research (iDDR) for providing the study materials, clinical information, and technical support. This study was supported by grants from AMED (grant numbers JP18kk0205009 and JP18kk0205012 to SN, grant numbers JP21dk0207045 and JP22dk0207060 to SN and KO, grant numbers JP21de0107002 and JP21dk0207052 to SN and DS, and grant number JP21km0405501 to KO); Research Funding for Longevity Sciences from the NCGG (21-23 to KO and 21-24 to DS); JSPS KAKENHI (grant number JP21H02470 to DS); The Hori Science and Arts Foundation (to DS); The Chukyo Longevity Medical Research and Promotion Foundation (to DS); the Japan Foundation for Aging and Health (to SN); and a grant from the Japanese Ministry of Health, Labor, and Welfare for Research on Dementia (to KO). We also thank the ELSS editors for English corrections.

Author information

Authors and Affiliations

Medical Genome Center, Research Institute, National Center for Geriatrics and Gerontology, Obu, Aichi, 474-8511, Japan
Daichi Shigemizu, Shintaro Akiyama, Mutsumi Suganuma, Motoki Furutani, Akiko Yamakawa & Kouichi Ozaki
RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, 230-0045, Japan
Daichi Shigemizu & Kouichi Ozaki
Department of Cardiovascular Medicine, Hiroshima University Graduate School of Biomedical and Health Sciences, Hiroshima, 734-8553, Japan
Motoki Furutani, Yukiko Nakano & Kouichi Ozaki
Core Facility Administration, Research Institute, National Center for Geriatrics and Gerontology, Obu, Aichi, 474-8511, Japan
Shumpei Niida

Contributions

DS developed the method and performed the analyses; SA, AY, and MF provided technical assistance; MS performed the qRT-PCR validation of gene expression; DS wrote the manuscript; and DS, KO, YN, and SN organized this work. All authors contributed to and approved the final manuscript.

Corresponding author

Correspondence to Daichi Shigemizu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Figure S1

Figure S2

Figure S3

Figure S4

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Shigemizu, D., Akiyama, S., Suganuma, M. et al. Classification and deep-learning–based prediction of Alzheimer disease subtypes by using genomic data. Transl Psychiatry 13, 232 (2023). https://doi.org/10.1038/s41398-023-02531-1

Received: 02 March 2023
Revised: 16 June 2023
Accepted: 19 June 2023
Published: 29 June 2023
DOI: https://doi.org/10.1038/s41398-023-02531-1
Springer Nature Limited

This article is cited by

The HLA-DRB1*09:01-DQB1*03:03 haplotype is associated with the risk for late-onset Alzheimer’s disease in APOE ε4–negative Japanese adults
- Daichi Shigemizu
- Koya Fukunaga
- Kouichi Ozaki
npj Aging (2024)
Comprehensive Systematic Computation on Alzheimer's Disease Classification
- Prashant Upadhyay
- Pradeep Tomar
- Satya Prakash Yadav
Archives of Computational Methods in Engineering (2024)