Introduction

Acute myocardial infarction (AMI), the most severe form of cardiovascular disease [1, 2], is associated with millions of deaths annually around the world [3, 4]. Generally, the diagnosis of AMI is based on clinical symptoms, electrocardiogram findings, and changes in serum enzyme levels [5]. However, AMI is easily misdiagnosed for three reasons: nonclassic clinical symptoms [6, 7], atypical underappreciation [8], and an untimely serum peak. Because of these three problems, a previous study [9] reported that the missed diagnosis rate of AMI is higher than 0.9%. The diagnosis and treatment of AMI must be prompt; otherwise, irreversible consequences may follow. Therefore, exploring new markers of AMI to decrease missed diagnoses is essential and urgent.

Leukocytes play an important and varied role in the entire evolution of AMI. During the acute injury phase of AMI, leukocytes promote a severe inflammatory cascade response through the polarization of M1 macrophages [10]. During the repair phase of AMI, M2 macrophages in leukocytes suppress inflammation and mediate the repair of injured myocardium [11]. Furthermore, leukocyte alteration positively correlates with AMI severity and, inversely, with patient survival [12, 13].

RNAs are involved in the evolution of AMI. For example, miR-155 correlated positively with the concentration of inflammatory cytokines, such as IL-6 and TNF-α [14], in AMI. Neutrophil-derived S100A8/A9 amplifies granulopoiesis and cardiac injury in AMI mice [15]. Conversely, M2 macrophage-derived exosomes carry miR-1271-5p [16] to alleviate AMI-related cardiac injury. Thus, RNA in leukocytes plays different roles in the evolution of AMI, possibly depending on the leukocyte subtype. However, numerous studies have focused on integrating target interventions [12, 17] and leukocyte complications [17, 18], whereas few have focused on the diagnostic value of leukocyte RNA. Because leukocyte RNA is involved in the evolution of AMI, it might have diagnostic value for AMI patients, and this value might be related to particular leukocyte subtypes.

Machine learning (ML) helps humans learn patterns from complex data to predict future behavioural outcomes and trends, and it is widely used for variable filtering. Previous studies used a single ML algorithm or two integrated ML algorithms (e.g., support vector machine [18] or least absolute shrinkage and selection operator [19]) to optimize variables. Still, these approaches may have missed potential genes [20]. Compared with a single ML algorithm, the integrated ML (IML) approach [21,22,23] we developed is more advantageous in variable screening and model building. IML helps identify potential genes mistakenly deleted by a single ML algorithm and find more meaningful variables [21]. IML also integrates the advantages of the individual ML algorithms, and its predictive classification value is better [23]. Given its favourable filtering performance in transcriptomics, IML might be used to comprehensively explore the diagnostic value of the transcriptome in AMI patients.

In summary, we aim to explore the potential diagnostic value of the leukocyte transcriptome for identifying AMI patients. Because of its good variable screening and excellent predictive value, IML was used for the first time to mine diagnostic genes in AMI leukocytes from multiple microarrays. A single microarray dataset might carry inherent biases in capturing the entire transcriptomic landscape, so multiple microarrays were integrated after resolving batch effects to reduce bias and to validate each other. Clinical validation was added to confirm the result. Because the relationship between the transcriptome and leukocyte subtypes was unclear, the correlation between immune cells and the target genes was subsequently analysed. We expect this work to support exploration of the functional roles of the identified genes in AMI pathophysiology and of their potential as therapeutic targets.

Methods

Data acquisition

The raw data were obtained from the Gene Expression Omnibus (GEO, March 27, 2022). Because patients with stable coronary artery disease (SCAD) have symptoms similar to those of AMI patients, SCAD patients were set as the controls. Increased leukocyte counts also occur in other cardiovascular diseases (e.g., stroke [24, 25] and heart failure [26]) and may influence the results, so these diseases were excluded. Accordingly, the following criteria were set. (I) Inclusion criteria: (i) diagnosed as AMI on admission; (ii) transcriptome obtained from leukocytes in blood; (iii) initial data free and accessible; and (iv) control individuals healthy or diagnosed with SCAD. (II) Exclusion criteria: (i) other cardiovascular diseases suspected and (ii) blood taken more than one day after hospitalization.

Data processing

To ensure the reliability of the data, the ComBat function of the R package sva (version 3.46.0), with parametric adjustments, was applied during data integration to minimize batch effects. Considering the distribution ratios in previous literature (1.64:1 [27] to 5:1 [28]) and to minimize batch effects, the samples were distributed between the training and testing sets at a ratio of 5:2. GSE59867 formed the training set, whereas GSE60993, GSE62646, and GSE48060 formed the testing set. In brief, the training set was used to explore candidate diagnostic genes, and the testing set was used for validation. Based on the differentially expressed genes (DEGs), three functional enrichment analyses were performed: Kyoto Encyclopedia of Genes and Genomes Gene Set Enrichment Analysis (KEGG-GSEA), Gene Ontology (GO), and Disease Ontology (DO). The GO terms comprised three branches: molecular function (MF), biological process (BP), and cellular component (CC). Notably, the novel IML served two functions: building classification models and screening candidate variables. Finally, the candidate genes were verified in the testing set and in a clinical study, and an immune analysis of the candidate genes was performed. CIBERSORT was used for the immune correlation analysis, together with the corrplot R package (version 0.92). The primary code is available at https://github.com/Linzhang-BiuBiuBiu/ML-for-diagnosis-genes.

Searching for DEGs

Because the same gene may be represented by multiple sequences, the transcriptome data may contain several expression values for one gene. In such cases, the average expression of the gene was used, and limma (version 3.54.0) was employed to identify the DEGs. Two thresholds were established: an absolute log fold change (|logFC|) > 0.7 (previous studies used 0.5 [29] to 1 [23]) and a false discovery rate [30] < 0.05, adjusted according to the Benjamini and Hochberg method.
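The two DEG thresholds can be illustrated with a minimal Python sketch. It is not the authors' limma code: the function names, the toy gene names, and the input values are hypothetical, and the p-values are assumed to come from a prior differential expression test. The sketch only shows how the Benjamini and Hochberg adjustment and the |logFC| cut-off combine.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (FDR) in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, log_fc, p_values, lfc_cut=0.7, fdr_cut=0.05):
    """Keep genes with |logFC| > lfc_cut and adjusted p-value < fdr_cut."""
    fdr = benjamini_hochberg(p_values)
    return [g for g, lfc, q in zip(genes, log_fc, fdr)
            if abs(lfc) > lfc_cut and q < fdr_cut]


# Hypothetical toy input, not data from the study.
genes = ["geneA", "geneB", "geneC", "geneD"]
log_fc = [1.2, -0.9, 0.3, 0.8]
p_values = [0.001, 0.01, 0.02, 0.2]
degs = select_degs(genes, log_fc, p_values)
print(degs)
```

In this toy input, geneC fails the fold-change cut-off and geneD fails the FDR cut-off, so only geneA and geneB are retained.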

IML of six ML algorithms

The IML classification models comprised six ML algorithms: support vector machine (SVM), neural network (NN), random forest (RF), gradient boosting machine (GBM), decision tree (DT), and least absolute shrinkage and selection operator (LASSO). In brief, IML was used to identify candidate genes from the overall normalized weights. For each of the six ML algorithms, the parameter settings were optimized, the model was developed in the training set, and it was validated in the testing set. For stability, all ML algorithms were tenfold cross-validated. Accuracy was used to evaluate the predictive classification value of the six ML algorithms, with a higher accuracy indicating better classification.
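As an illustration of tenfold cross-validation scored by accuracy, the following Python sketch may help. It is an assumption-laden stand-in, not the authors' R workflow: the one-feature threshold classifier, the helper names, and the toy data are all hypothetical and merely take the place of any of the six algorithms.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_validated_accuracy(X, y, fit, predict, k=10, seed=0):
    """Mean held-out accuracy over k folds."""
    scores = []
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = [predict(model, X[i]) for i in held_out]
        scores.append(accuracy([y[i] for i in held_out], preds))
    return sum(scores) / len(scores)


# Hypothetical stand-in classifier: threshold halfway between the class means.
def fit_threshold(X, y):
    mean0 = sum(x for x, label in zip(X, y) if label == 0) / y.count(0)
    mean1 = sum(x for x, label in zip(X, y) if label == 1) / y.count(1)
    return (mean0 + mean1) / 2


def predict_threshold(threshold, x):
    return int(x > threshold)


# Toy, well-separated data (0 = control, 1 = AMI); not data from the study.
X = [float(v) for v in range(10)] + [float(v) for v in range(20, 30)]
y = [0] * 10 + [1] * 10
cv_acc = cross_validated_accuracy(X, y, fit_threshold, predict_threshold)
print(cv_acc)
```

Each sample is held out exactly once, so the averaged score reflects performance on data the model did not see during fitting.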

LASSO was performed with the glmnet (version 4.1-6) R package. cv.glmnet was used to optimize lambda. For the parameters, the range of "lambda" was set between 0 and 100 with "binomial" and "class". Based on the minimum lambda, glmnet was used to fit the LASSO model with alpha and the "binomial" method in the training set.

SVM was developed with the e1071 R package (version 1.7–12). tune.svm was adopted to optimize the parameter settings with the "linear" kernel and a cost between 1 and 20. Next, based on the optimized number of support vectors in the training set, the SVM model was finalized.

DT was processed with rpart (version 4.1.19) and rpart.plot (version 3.1.1). Based on the "class" method and a cp value of 0.001, the rpart function was adopted for the DT model.

RF was built with the R package randomForest (version 4.7-1.1). First, the tuneRF function was adopted to optimize the number of trees between 0 and 700 with a step size of one. The RF model was then developed with the number of trees that minimized the error rate.

NN was processed with neuralnet (version 1.44.2) with neuralnet function, five layers (an input, an output, and three hidden layers), err.fct of "sse", and the linear.

Compared with the other five ML algorithms, GBM required more steps and was more error-prone. The GBM was developed with h2o (version 3.38.0.1). First, the Java runtime environment, which h2o requires to run GBM, was installed. The memory setting in h2o.init is essential for running the model, and the memory for GBM was set to 8 GB. Because GBM in h2o requires the h2o data type, the as.h2o function was used to convert the data format. Next, h2o.gbm tuned the parameters and developed the model with the "Bernoulli" distribution, 200 trees, a learning rate of 0.001, and a sample rate of 90%.

Furthermore, from the weights that the six ML algorithms assigned to the DEGs, the normalized overall weight of IML was calculated for each gene as follows: overall weight = abs(RF)/abs(RFmax) + abs(SVM)/abs(SVMmax) + abs(LASSO)/abs(LASSOmax) + abs(NN)/abs(NNmax) + abs(GBM)/abs(GBMmax) + abs(DT)/abs(DTmax). For instance, suppose the weights of interleukin-6 in the six ML algorithms were 30, −22, 20, −2, 320, and −8, and the maximum absolute weights in the six ML algorithms were 60, 88, 80, 8, 640, and 16. The overall weight of interleukin-6 would then be |30|/60 + |−22|/88 + |20|/80 + |−2|/8 + |320|/640 + |−8|/16 = 2.25. Genes with normalized overall weights > 1 were taken as candidate genes and evaluated by the area under the curve (AUC).
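The overall-weight formula and the interleukin-6 example can be reproduced with a short Python sketch. It is an illustration rather than the authors' implementation: the function name, the second gene ("hypothetical_top_gene", constructed so that it holds the maximum absolute weight in every algorithm), and the assumption that the example weights follow the algorithm order of the formula are all ours.

```python
ALGORITHMS = ["RF", "SVM", "LASSO", "NN", "GBM", "DT"]


def overall_weights(table):
    """table maps gene -> {algorithm: raw weight}.

    Returns gene -> sum over algorithms of |weight| / max |weight|,
    where the maximum is taken across all genes for that algorithm.
    """
    max_abs = {a: max(abs(w[a]) for w in table.values()) for a in ALGORITHMS}
    return {
        gene: sum(abs(w[a]) / max_abs[a] for a in ALGORITHMS if max_abs[a] > 0)
        for gene, w in table.items()
    }


# Weights of the worked example, plus a hypothetical gene that carries the
# maximum absolute weights (60, 88, 80, 8, 640, 16) quoted in the text.
table = {
    "IL6": dict(zip(ALGORITHMS, [30, -22, 20, -2, 320, -8])),
    "hypothetical_top_gene": dict(zip(ALGORITHMS, [60, 88, -80, 8, 640, -16])),
}
scores = overall_weights(table)
candidates = [gene for gene, s in scores.items() if s > 1]
print(scores["IL6"])  # 2.25, matching the worked example
```

Because every term lies between 0 and 1, the overall weight of a gene ranges from 0 to 6, and the threshold of 1 can be reached by a gene that matters strongly in one algorithm or moderately in several.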

Clinical validation

The clinical study was performed according to the Declaration of Helsinki guidelines. All AMI and SCAD patients recruited from October 10, 2022, to December 31, 2022, provided individual written informed consent, and the Ethics Review Committee of Jinghai District Hospital approved the study. The study added no treatment cost and no additional intervention for the patients; the blood samples were the discarded remainder of the patients' routine blood tests taken on the same day. Patients without a routine blood test on that day were excluded. All AMI patients underwent coronary angiography, and blood samples were collected in anticoagulant tubes on admission. Density gradient centrifugation [31, 32] was performed for leukocyte isolation (mainly monocytes and lymphocytes). In brief, 8 mL of Ficoll solution was added to 8 mL of anticoagulated blood, and the upper plasma layer was discarded after centrifugation. The white cell layer at the interface was aspirated, added to 10 mL of saline, and centrifuged; the bottom layer contained the leukocytes (mainly monocytes and lymphocytes). RNA isolated from leukocytes was reverse-transcribed with reverse transcription kits (Takara, Shiga, Japan). Quantitative reverse transcription PCR was executed on an ABI7900HI (Thermo Fisher Scientific). According to previous literature, the relative content of the candidate genes was scaled to the reference gene (GAPDH [33]), and Table 1 lists the primer sequences.

Table 1 List of primers for real-time PCR analysis in GAPDH, AQP9, and SOCS3
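Scaling a candidate gene to a reference gene is commonly done with the 2^-ΔCt calculation. The text does not state which formula was used, so the Python sketch below is an assumption, and the Ct values in it are hypothetical rather than measurements from the study.

```python
def relative_expression(ct_target, ct_reference):
    """Relative content of a target gene scaled to a reference gene.

    Assumes the common 2^-(Ct_target - Ct_reference) calculation with
    roughly 100% amplification efficiency; the source does not specify this.
    """
    return 2.0 ** -(ct_target - ct_reference)


# Hypothetical Ct values (target gene vs. GAPDH) for one control and one patient.
control_rel = relative_expression(ct_target=24.0, ct_reference=20.0)
patient_rel = relative_expression(ct_target=22.0, ct_reference=20.0)
fold_change = patient_rel / control_rel
print(control_rel, patient_rel, fold_change)
```

A lower Ct for the target at the same GAPDH Ct means more template, so the hypothetical patient sample here has four times the relative content of the control sample.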

Results

Included datasets

A total of 4 datasets (Table 2) comprising 220 samples, namely, GSE59867, GSE60993, GSE62646, and GSE48060, were integrated for this study. The training set was GSE59867 (46 controls and 111 AMI patients), which gives a training-to-testing ratio of approximately 5:2. The testing set combined the other three datasets (28 controls and 35 AMI patients), namely, GSE60993, GSE62646, and GSE48060. The workflow of the analysis is presented in Fig. 1.

Table 2 Fundamental information in the 4 datasets
Fig. 1 The workflow of this study contains four parts: GEO datasets for training and testing sets, machine learning for classification and variable filtration, diagnostic value verification, and immune correlation

DEG identification

Thirty-nine DEGs were identified (Additional file 1: Table S1) in the training set from 17,049 RNAs. Compared to the control group (SCAD), 28 genes were upregulated (SOCS3, HP, ECRP, AQP9, FAM20A, CES1, STAB1, NRG1.1, NRG1, DYSF, RNASE1, RNASE2, ASGR2, CYP1B1, MERTK, FCGR1A.2, MIR21, FCGR1A.1, TCN2, VSIG4, PPARG, FCGR1A, SLED1, S100A9, FMN1.1, CD163, TMEM176A, and SERPINB2) and 11 genes were downregulated (KLRC3, KLRD1, KLRA1P, DTHD1, KLRC4, MYBL1, CLC, KLRC2, KLRC4-KLRK1, SNORD20, and SNORD45B) in AMI individuals (Fig. 2).

Fig. 2 Heatmap and volcano plot of 39 DEGs in the AMI and control groups. A In the heatmap, red indicates high expression and blue indicates low expression. B In the volcano plot, green indicates lower expression and red indicates higher expression

Functional analysis

Based on the above DEGs, 45 GSEA terms (Additional file 1: Table S2) were identified, and the top 5 are shown in Fig. 3A, B; 160 GO terms (Additional file 1: Table S3) were identified, and the top 5 are shown in Fig. 3C; and the top 15 of 57 DO terms (Additional file 1: Table S4) are shown in Fig. 3D. In GSEA-KEGG of AMI, the top 3 were Fc gamma R-mediated phagocytosis, Huntington disease, and Leishmania infection. In GO, the top 3 in BP were the stimulatory C-type lectin receptor signalling pathway, response to lectin, and cellular response to lectin. In DO terms, the top 3 were atherosclerosis, arteriosclerotic cardiovascular disease, and arteriosclerosis.

Fig. 3 Functional analysis of GSEA, GO, and DO terms. A The top 5 GSEA-KEGG pathways in controls. B The top 5 GSEA-KEGG pathways in AMI patients. C The top 5 GO terms in BP, CC, and MF. D The top 15 DO terms

IML of six ML algorithms

Six ML algorithms (Fig. 4) and their accuracies (Table 3) were assessed. Eight genes were identified in LASSO (Fig. 4A), and the accuracy in the training and testing sets was 70.70% (Table 3). In SVM, 13 genes were filtered (Fig. 4B), and the accuracies were 88.46% and 91.84%, respectively. The error rate of RF (Fig. 4C) decreased with an increasing number of trees and was minimized at 161 trees, where the accuracies in the two sets were 98.09% and 100%. In DT (Fig. 4D), an AQP9 expression value of 9.8 could discriminate the control and AMI groups, but the accuracies were unstable (94.27% and 75.52%). In GBM (Fig. 4E), a sixfold method was established to optimize the diagnostic genes, but the accuracies were likewise unstable (93.30% and 85.71%). In the NN (Fig. 4F), although three hidden layers were sufficient for discriminating the controls and AMI patients, the accuracies were only 83.74% and 71.43%. From these ML algorithms, the primary weights of the 39 DEGs were obtained (Additional file 1: Table S5). Interestingly, RF had the highest and most stable accuracy among all ML algorithms. The normalized overall weights (Table 4) were calculated to filter the candidate variables. Twenty-six genes (ASGR2, SOCS3, AQP9, PPARG, RNASE1, DYSF, S100A9, FCGR1A, VSIG4, STAB1, MYBL1, KLRD1, ECRP, TCN2, FAM20A, MERTK, HP, RNASE2, DTHD1, CLC, SNORD20, CD163, NRG1, SNORD45B, CYP1B1, and KLRC2) were identified with overall weights > 1 (Table 4).

Fig. 4 Six ML algorithms for classification with 39 DEGs. A LASSO for eight candidate genes; the error bars show the fluctuation range of the binomial deviance. B SVM for 13 candidate genes. C RF discriminated between the control and AMI groups; the red, black, and green lines represent the control (Con), out-of-bag (OOB), and AMI groups, respectively. D DT discriminated between the control and AMI groups. E A sixfold GBM submodel was constructed. The heatmap illustrates the importance of genes in each submodel, and the intensity of the colour corresponds to the importance of the gene in that submodel. F NN discriminated between the control and AMI groups. All 39 DEGs were involved in the NN model, but only ten are shown because of space limitations. A red edge indicates a positive correlation, meaning that the feature positively affects the classification result; a grey edge indicates a negative correlation. The thickness of the edge signifies the magnitude of the weight

Table 3 Accuracy of six MLs based on 39 DEGs in the training and test sets
Table 4 Overall weights of six classification models were constructed to optimize the candidate diagnostic genes

On the basis of overall normalized weights > 1, 26 candidate genes were filtered and then compared between the AMI and control groups in the training and testing sets. Among the 26 genes, 10 were excluded because they did not differ significantly in the testing set. Sixteen genes were significant in both sets (Fig. 5).

Fig. 5 The 16 DEGs also differed in the testing set

Diagnosis value of candidate genes

Sixteen candidate genes were included in the ROC analysis. The AUC values of SOCS3, AQP9, and ASGR2 were greater than 0.85 in both the training and testing sets. In particular, the AUC values of two genes, SOCS3 and AQP9, were greater than 0.9 (Fig. 6), indicating a potential diagnostic value in AMI.

Fig. 6 ROC curves for AQP9, SOCS3, and ASGR2 in the training and testing sets
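The AUC of a single gene can be read as the probability that a randomly chosen AMI sample has a higher expression value than a randomly chosen control, with ties counted as one half. The Python sketch below computes it that way; the function name and the expression values are hypothetical and do not come from the datasets.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as P(positive score > negative score) + 0.5 * P(tie)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical expression values of one gene; not data from the study.
ami = [10.2, 9.9, 10.8, 9.1]
control = [8.7, 9.3, 9.0, 9.9]
auc = roc_auc(ami, control)
print(auc)  # 13.5 winning pairs out of 16 -> 0.84375
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which is why values above 0.9 are treated here as indicating diagnostic potential.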

Correlation analysis

Immune correlation analysis was performed with the 220 samples (Fig. 7). The infiltration landscape (Fig. 7A) showed the distribution of 22 immune cell types in the control and AMI groups. The infiltration of nine immune cell types (T cells CD8, T cells CD4 naive, T cells regulatory (Tregs), NK cells resting, monocytes, macrophages M0, macrophages M2, eosinophils, and neutrophils) differed significantly between the control and AMI groups (Additional file 1: Fig. S1). Moreover, Spearman analysis of the correlations between the 22 immune cell types and the two diagnostic genes, AQP9 and SOCS3 (Fig. 7B, C), showed significant correlations with 9 immune cell types (monocytes, neutrophils, T cells CD8, NK cells resting, T cells CD4 naive, eosinophils, macrophages M2, dendritic cells activated, and B cells memory). More importantly, two immune cell types (monocytes and neutrophils) had higher correlation coefficients (Fig. 7B, C) than the other 7 immune cell types (Additional file 1: Figs. S2, S3). In particular, the correlation coefficients of monocytes (Fig. 7B, C) were the highest for both genes (0.56 for SOCS3 and 0.76 for AQP9).

Fig. 7 Immune correlation analysis of AQP9 and SOCS3 between the control and AMI groups. A The stacked column graph of the control and AMI groups. B The lollipop plot and scatterplot of the different immune cell types for SOCS3. C The lollipop plot and scatterplot of the different immune cell types for AQP9. * means P < 0.05, ** means P < 0.01, *** means P < 0.001
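Spearman correlation, as used for the gene and immune cell analysis, is the Pearson correlation of the ranks of the two variables. The Python sketch below shows the calculation with average ranks for ties; the function names and the toy values (gene expression and an estimated monocyte fraction) are hypothetical.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy values; not data from the study.
gene_expr = [8.1, 9.4, 9.0, 10.2, 11.5]
monocyte_fraction = [0.10, 0.18, 0.15, 0.22, 0.40]
rho = spearman(gene_expr, monocyte_fraction)
print(rho)
```

Because only ranks enter the calculation, the coefficient captures any monotonic relationship and is insensitive to the skewed scale of the estimated cell fractions.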

Clinical validation

Finally, 40 individuals (20 SCAD and 20 AMI patients) were recruited. Their general information is shown in Table 5. A total of 39 clinical characteristics were summarized, of which 13 differed significantly between the SCAD and AMI patients: WBC, NeP, MonP, Lym, GAT, D-dimer, CRP, SOCS3, AQP9, LDH, cTnT, CK-MB, and Albumin.

Table 5 The general characteristics of the 40 patients

The relative RNA levels (Fig. 8A) of AQP9 and SOCS3 both differed significantly between AMI patients and SCAD controls. SOCS3 levels differed by the number of stenotic coronary arteries (Fig. 8B): patients with three lesions showed significantly higher SOCS3 than those with two or one. In Fig. 8C, two AMI patients were excluded because of unclear Killip grading. Patients with Killip class III-IV had higher SOCS3 than those with class I-II (Fig. 8C). Although more stenotic coronary arteries were associated with higher levels of AQP9, the difference was less significant than for SOCS3 (Fig. 8B). In addition, AQP9 did not differ significantly between Killip classifications (Fig. 8C). Furthermore, the 9 significant clinical features were analysed with the Pearson correlation test (Additional file 1: Fig. S4). SOCS3 correlated positively with AQP9, and both genes correlated negatively with Albumin.

Fig. 8 Relative RNA levels of AQP9 and SOCS3 in AMI patients and SCAD controls. A The relative content of SOCS3 and AQP9 in AMI patients and SCAD controls. B The comparison of AQP9 and SOCS3 by the number of stenotic coronary arteries in AMI. C The comparison of AQP9 and SOCS3 across Killip classifications in AMI. * means P < 0.05, ** means P < 0.01, *** means P < 0.001, ns means not significant

Discussion

To our knowledge, our work is the first to filter AMI diagnostic genes based on the overall normalized weights of IML. Four microarrays with 220 samples were adopted for data analysis, and a further clinical study was performed to validate the results. Two genes, AQP9 and SOCS3, showed an AUC > 0.9 in both the training set and the testing set (Fig. 6). Both genes showed their highest correlation coefficient (Fig. 7) with monocytes. The clinical study verified the significant difference between AMI patients and SCAD controls, indicating a potential diagnostic value of AQP9 and SOCS3. Consistent with previous studies, we found that AQP9 has diagnostic value for AMI [34, 35], and we further explored the immune correlation of AQP9. Additionally, Zhu [36] identified SOCS3 as an immune-related gene in AMI, and we extended this finding by showing its diagnostic value. More importantly, this study is the first to reveal the correlation of AQP9 and SOCS3 RNA levels, especially SOCS3, with the number of stenotic coronary arteries and the Killip classification.

AQP9, a cell membrane protein, transports water down the concentration gradient. In AMI rats, silencing AQP9 can reverse ERK1/2, attenuate the inflammatory response and apoptosis of cardiomyocytes, and improve cardiac function [37]. This research indicated the crucial role of AQP9 in the pathogenesis of AMI. In human polymorphonuclear leukocytes, AQP9-related inflammation may result from NF-κB [38] and F-actin polymerization [39]. In our work, the AUC of AQP9 was > 0.9. Therefore, AQP9 might be a potential genetic marker for distinguishing AMI from SCAD.

SOCS3 is increased in AMI mice [29] and regulates the T-cell repertoire through STAT3/SOCS3 signalling [40]. More importantly, cardiac-specific silencing of SOCS3 triggers sustained STAT3 activation and decreases myocardial apoptosis [41]. SOCS3 is also the dominant negative modulator [42] of Th17 via STAT3 [43]. Apoptosis regulates the pathophysiological evolution of AMI [44]. In vitro, SOCS3 can trigger the apoptosis of mammary cells [45], and knocking out SOCS3 regulates the expression of apoptosis in 3T3-L1 preadipocytes [46]. These studies emphasized the immune regulation exerted by SOCS3 and its regulation of apoptosis through STAT3. In our work, the AUC of SOCS3 was > 0.9. Therefore, SOCS3 might be an effective genetic marker for diagnosing AMI.

Additionally, the CIBERSORT algorithm showed that the proportions of neutrophils and monocytes were higher in the AMI group than in the control group. The progression of AMI is correlated with immune disorder. For example, the white blood cell count correlates highly with in-hospital mortality after AMI [47]. Neutrophils are increased in peripheral blood, and researchers have emphasized that neutrophils-lymphocytes [48, 49] and monocytes/macrophages [50] are easily acquired prognostic factors for AMI. Macrophages are dominant in infarcted myocardium, especially over the first week of AMI [51]. However, NK cells have diminished cytotoxic function [52], and the targeted regulation of NK cells may play an important role in the treatment of AMI. At the beginning of AMI, inflammation worsens with increased neutrophils and monocytes [53], and it decreases over time with the reduced function of NK cells. Innate immunity is a vital regulatory factor in the inflammatory, proliferative, and maturation phases [3, 54, 55]. AMI leads to a worsening inflammatory process. Currently, novel therapeutic interventions targeting the immune system may regulate inflammation, which is conducive to resolving pathological conditions. In a previous clinical trial of 182 patients with NSTEMI (a subtype of AMI), IL-1 blockers decreased acute inflammation [56]. Another immune study showed that short-term blockade of S100A9 downregulates inflammation [57] in mice with permanent coronary ischemia. However, these immune interventions are still experimental and not yet in clinical use. In summary, regulating immune cells along the progression of AMI, and immune intervention in AMI, might be potential therapeutic strategies.

AQP9 expression was highest in human polymorphonuclear leukocytes [39] compared with the spleen and liver, suggesting a possible correlation between AQP9 and immunity or inflammation. AQP9 regulates water flow in leukocytes [58] and thereby cellular morphology and motility, which facilitates the migration of leukocytes to inflammatory sites. Similar to our result, Hawang [59] indicated a correlation between AQP9 and neutrophil granulocytes. Research [29, 60, 61] also emphasizes the correlation between SOCS3 and neutrophils in inflammation. In our research, both genes had a higher correlation with two immune cell types, neutrophils and monocytes. This immune cell correlation indicates that targeted gene therapy of immune cells may benefit the course of AMI and suggests the potential feasibility of using AQP9 and SOCS3 as therapeutic targets or predictors of treatment response.

ML algorithms are widely applied to various cardiovascular diseases for tasks such as variable optimization, classification, and regression. For variable filtration, numerous studies use only one or two ML algorithms (e.g., weighted gene coexpression network analysis [60], LASSO, and SVM). However, using only one or two ML algorithms might inadvertently discard potential genes. For example, AQP9 would be ignored if only DT were used, because the weight of AQP9 was zero in DT (Table 4). Similarly, although LASSO can detect candidate genes in large datasets, when highly correlated features exist it tends to select one of them and ignore the others, which leads to unstable results [61]. Different ML algorithms also suit different tasks: for pigmented skin lesions [62], SVM and NN displayed good classification value, whereas for the preoperative prediction of postsurgical mortality [63], GBM was preferred over DT, RF, and SVM. Various ML algorithms may assign different weights even to the same variable (Table 4). Therefore, the overall normalized weights of IML were used to filter genes. Surprisingly, IML identified two potential, unreported diagnostic genes in AMI. In our study, IML showed good value in both variable screening and model prediction.

Four limitations exist in this work, although the best efforts were taken to minimize them. First, verification in a small sample might introduce bias. Although testing sets and clinical validation were used to assess the stability of the diagnostic value, the bias of single-centre validation might remain, so multicentre collaborations or larger external datasets are crucial for further verification. Further confirmation through clinical trials and animal experiments is indispensable. Second, the ML algorithms have limitations (e.g., the black box phenomenon [64]), especially NN, which has numerous layers [65]. The set of operations an ML model performs in making a prediction is not interpretable, even if a human knows precisely what the model computes at each step, because the operations cannot be described in human-understandable semantics. Interpretability techniques for ML models, which enhance transparency and reliability, therefore continue to attract the attention of developers. Third, because of the limitations of our laboratory extraction process, the leukocytes acquired for clinical validation were predominantly lymphocytes and monocytes. Finally, only limited clinical features were obtained (e.g., age [66], ethnicity, and race [67]). Additional clinical features could enhance the predictive accuracy of the diagnostic model and provide a more comprehensive understanding of AMI. For example, various combinations of clinical variables [68] (e.g., sex, smoking status, and laboratory indicators) can be calibrated to analyse the relationship between the target variable and the outcome.

Conclusion

Based on the overall normalized weights of IML, this research merged four microarrays and uncovered the hidden diagnostic genes AQP9 and SOCS3 in leukocytes of AMI patients. AQP9 and SOCS3 are closely associated with monocytes and neutrophils, which might contribute to advancing AMI diagnosis and shed light on novel genetic markers, AMI pathogenesis, targeted therapies, and potential precision medicine. Although clinical validation reproduced the result, multicentre, large-sample trials with multiple clinical characteristics are still needed to confirm its clinical value.