Background

The popularity of artificial intelligence (AI) in healthcare has risen exponentially in recent years, attracting the attention of professionals and students alike [1, 2]. The emergence of large language models like ChatGPT has further expanded AI’s potential in medicine, offering new possibilities for clinical applications and medical training [3, 4]. AI has demonstrated expert-level performance in various medical domains, including breast cancer screening, chest radiograph interpretation, and prediction of treatment outcomes [5,6,7,8].

The increasing prevalence of AI in healthcare necessitates its incorporation into medical education. AI offers numerous potential benefits for medical training, such as enhancing understanding of complex concepts, providing personalized learning experiences, and simulating clinical scenarios [9,10,11,12]. Moreover, familiarizing medical students with AI tools and technologies prepares them for the realities of their future professional lives [13, 14]. However, the integration of AI also raises significant ethical challenges, including concerns about patient autonomy, beneficence, non-maleficence, and justice [9, 15, 16].

Existing literature has primarily focused on the technical aspects of AI in medicine or its potential applications in specific medical specialties [17]. Other studies have explored healthcare professionals' perceptions of AI, but these have been limited by small sample sizes and lack of geographic diversity [17]. This gap in the literature precludes a comprehensive understanding of how future healthcare professionals across different regions perceive and prepare for AI integration in their fields.

This multicenter study addresses this gap by investigating the perspectives of medical, dental, and veterinary students on AI in their education and future practice across multiple countries. Specifically, we examine: 1) students' technological literacy and AI knowledge, 2) the current state of AI in their curricula, 3) their preferences for AI education, and 4) their attitudes towards AI's role in their fields. By exploring regional differences on a large, international scale, this study offers a unique comparative overview of students' perceptions worldwide.

Methods

This multicenter cross-sectional study was conducted in accordance with the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement and received ethical approval from the Institutional Review Board at Charité – University Medicine Berlin (EA4/213/22), serving as the principal institution, in compliance with the Declaration of Helsinki and its later amendments [18, 19]. To ensure participant anonymity, the requirement for informed consent was waived.

Instrument development and design

Following the Association for Medical Education in Europe (AMEE) guide, this study aimed to develop an anonymous online survey to assess: 1) the technological literacy and knowledge of informatics and AI, 2) the current state of AI in their respective curricula and preferences for AI education, and 3) the perspectives towards AI in the medical profession among international medicine, dentistry, and veterinary medicine students [20]. To inform instrument development, a literature review of existing publications on the attitudes of medical students towards AI in medicine was independently performed by four reviewers (FB, LH, KKB, LCA) using the MEDLINE, Scopus, and Google Scholar databases in December 2022. Studies were selected for review based on the following criteria: 1) the publications were original research articles, 2) the scope aligned with our research objectives and targeted medical students, 3) the survey was conducted in English, 4) the items were publicly accessible, and 5) the measurement of perspectives towards AI was not restricted to a particular medical subfield. Following these criteria, five articles comprising a total of 96 items were identified as relevant to the research scope [21,22,23,24,25]. After a consensus-based discussion, items that did not match our research objectives or overlapped in content were excluded, resulting in 23 remaining items. These items were subsequently tailored to fit the context of medical education and the medical profession.

A review cycle was undertaken with a focus group of medical AI researchers and students, as well as an expert panel including physicians, medical faculty members and educators, AI researchers and developers, and biomedical statisticians (FB, LH, DT, MRM, KKB, LCA, AB, RC, GDV, AH, LJ, AL, PS, LX). The finalized survey consisted of 16 multiple-choice items, eight demographic queries, and one free-field comment section. These items were further refined based on content-based domain samples, and responses were standardized using a four- or five-point Likert scale where applicable.

The preliminary assessment was conducted through cognitive interviews with ten medical students at Charité – University Medicine Berlin to evaluate the scale's comprehensiveness and overall length. The feedback resulted in two rewordings and one item removal, finalizing the survey with 15 multiple-choice items and eight demographic queries supported by one free-field comment section. The final questionnaire items and response options can be viewed in Table 1.

Table 1 Questionnaire items and response options

Using REDCap (Research Electronic Data Capture) hosted at Charité – University Medicine Berlin, the English survey was subsequently disseminated through the medical student newsletter at Charité and deactivated after receiving responses from 50 medical students, who served as the pilot study group and were not included in the final participant pool [26, 27]. After psychometric validation, participating sites distributed the REDCap online survey among medical, dental, and veterinary students at their faculty. Due to the large number of Spanish-speaking sites, a separate Spanish online version of the survey was created using paired forward and backward translation with reconciliation by two bilingual medical professionals (LG, JSPO). Depending on their faculty location, participating sites distributed either the English or Spanish online survey via their faculty newsletters and courses using a QR code or the direct website link (non-probability convenience sampling). The survey was available for participation from April to October 2023.

Our data collection methodology was designed to mitigate several risks related to privacy, confidentiality, consent, transparency of recruitment, and minimization of harm, as highlighted before [28]. By using faculty newsletters and course distributions, we reduced the exposure of personal information on social media platforms, thereby maintaining a higher level of privacy. This method ensured that our participants' identities and responses were not publicly available or exposed to wider online networks. To further secure the data, the survey platform used was selected for its robust security features, including data encryption and secure storage. We explicitly informed participants about how their data would be used and protected, ensuring transparency and building trust.

Distributing the survey through official academic channels, such as faculty newsletters, implied a degree of formality and oversight, increasing the likelihood that participants were adequately informed of the study's intentions. By detailing the purpose of the study, the use of data and participants' rights on the first page of the survey, participants had to indicate their understanding and agreement by ticking an 'I agree' box before proceeding.

Using institutional channels for distribution provided a transparent and credible recruitment process that was likely to reach a relevant and engaged audience. We ensured that participants were aware that their participation was completely voluntary and that they could withdraw from the study at any time without penalty. We also provided contact details for participants to ask questions about the study, promoting openness and trust.

By avoiding the use of social media for recruitment, we eliminated the risk of participants' responses being exposed to their social networks, thereby protecting their privacy and reducing potential social risks. The content of the survey was carefully reviewed to ensure that no questions could cause distress or harm to participants. Participants were informed that they could skip any questions they felt uncomfortable answering, ensuring their well-being and autonomy throughout the survey process.

Inclusion and exclusion criteria

Inclusion criteria consisted of students at least 18 years of age, actively enrolled in a (human) medicine, dentistry, or veterinary medicine degree program, who responded to the survey during its open period and were proficient in either English or Spanish, depending on their faculty location. Participants had to confirm their enrollment in a relevant program and enter their age to verify that they were at least 18 years old. Only those meeting these criteria could proceed with the survey. Respondents who started the survey but did not answer any multiple-choice items were excluded from the analysis. Respondents with missing answers to individual items were excluded from the corresponding subanalyses.

Statistical analysis

Statistical analyses were performed with IBM SPSS Statistics (version 28.0.1.0) and R (version 4.2.1), using the "tidyverse", "rnaturalearth", and "sf" packages [29,30,31,32]. The Kolmogorov–Smirnov test was used to test for normality. Categorical and ordinal data were reported as frequencies with percentages. Medians and interquartile ranges (IQR) were reported for non-normally distributed continuous data. Variances were reported for items in Likert scale format. The response rate was derived from the overall student enrollment numbers at each faculty according to the faculty websites or the Times Higher Education World University Rankings 2024, as official data on enrolled medical, dentistry, or veterinary students were unavailable. In the pilot study group, item reliability was measured using Cronbach's α, with values above 0.7 interpreted as acceptable internal consistency. Exploratory factor analysis was used to examine the structure and subscales of the instrument, using an eigenvalue cutoff of 1 for factor extraction. Items with factor loadings of 0.4 or higher were retained. Data suitability for structural evaluation was assessed using the Kaiser–Meyer–Olkin measure and Bartlett's test of sphericity. For geographical subgroup analysis, respondents were categorized based on their faculty location (Global North versus Global South) according to the United Nations' Finance Center for South-South Cooperation [33]. Additionally, participants were grouped into continents based on the United Nations geoscheme [34]. Due to the substantial number of European participants, students in North/West and South/East Europe were analyzed separately. Further subgroup analyses based on gender, age, academic year, technological literacy, self-reported AI knowledge, and previous curricular AI events can be found in the appendix (see Supplementary Tables 1–7). The Mann–Whitney U-test was employed for subgroup comparisons of two independent, non-normally distributed samples. For continental comparison, the Kruskal–Wallis one-way analysis of variance and the Dunn–Bonferroni post hoc test were performed. To estimate effect size, we calculated r, with 0.5 indicating a large effect, 0.3 a medium effect, and 0.1 a small effect [35]. An asymptotic two-sided P-value below 0.05 was considered statistically significant.
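The effect-size calculation described above can be sketched in code: r is obtained by dividing the standardized Mann–Whitney statistic Z by the square root of the total sample size. The snippet below is an illustrative Python/SciPy sketch with hypothetical Likert data, not the study's actual analysis code (the analyses were run in SPSS and R), and it uses the normal approximation for Z without a tie correction.

```python
import math

import numpy as np
from scipy.stats import mannwhitneyu

def mann_whitney_effect_size(x, y):
    """Effect size r = |Z| / sqrt(N) for a two-sided Mann-Whitney U-test.

    Z is derived from U via the normal approximation (no tie correction).
    Conventional thresholds: r ~ 0.1 small, 0.3 medium, 0.5 large.
    """
    n1, n2 = len(x), len(y)
    u, p = mannwhitneyu(x, y, alternative="two-sided")
    mu = n1 * n2 / 2                                   # mean of U under H0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)  # SD of U under H0
    z = (u - mu) / sigma
    r = abs(z) / math.sqrt(n1 + n2)
    return r, p

# Hypothetical five-point Likert responses for two regional subgroups
rng = np.random.default_rng(0)
north = rng.integers(3, 6, size=200)  # responses skewed toward agreement
south = rng.integers(2, 5, size=200)  # responses skewed lower
r, p = mann_whitney_effect_size(north, south)
```

For identical samples r is zero; the further apart the two response distributions are shifted, the larger r becomes.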

Results

Pilot study

The median age of the pilot study group was 24 years (IQR: 21–26 years). Of the participants, 58% identified as female (n = 29), 38% as male (n = 19), and 4% (n = 2) did not report their gender. The median current academic year was 2 (IQR: 2–4) out of 6 total academic years. Internal consistency for our scale's dimensions ranged from acceptable to good, as indicated by Cronbach's α. The section on "Technological literacy and knowledge of informatics and AI" registered an α of 0.718, while the section "Current state of AI in the curriculum and preferences for AI education" scored an α of 0.726, both displaying acceptable internal consistency. A Cronbach's α value of 0.825 for the "Perspectives towards AI in the medical profession" section denoted good internal consistency. The Kaiser–Meyer–Olkin measure was 0.801, indicating adequate sampling for factor analysis. Bartlett's test of sphericity returned a P-value of less than 0.001, confirming that the inter-item correlations were sufficient for factor analysis. Factor analysis yielded a structure comprising 15 items across three dimensions, collectively explaining 54% of the total variance. Factor loadings for individual items ranged from 0.495 for "Which of these technical devices do you use at least once a week?" to 0.888 for "What is your general attitude toward the application of artificial intelligence (AI) in medicine?".
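For reference, the Cronbach's α values above follow the standard formula α = k/(k − 1) × (1 − sum of item variances / variance of total scores). The following is a minimal NumPy sketch with made-up pilot data (the study itself computed α within its SPSS/R pipeline); the simulated latent trait and noise parameters are purely illustrative assumptions.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, k_items) score matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance of row totals).
    Values above 0.7 are conventionally read as acceptable internal
    consistency, and values above 0.8 as good.
    """
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # per-item sample variance
    total_var = items.sum(axis=1).var(ddof=1)    # variance of summed scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Hypothetical pilot data: 50 respondents, 5 correlated five-point items
rng = np.random.default_rng(42)
trait = rng.normal(3.0, 1.0, size=(50, 1))  # shared latent attitude
noise = rng.normal(0.0, 0.7, size=(50, 5))  # item-specific noise
items = np.clip(np.rint(trait + noise), 1, 5)
alpha = cronbach_alpha(items)
```

Because the simulated items share a latent trait, α comes out high; perfectly identical items would give α = 1.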

Study cohort

Between 1 April and 1 October 2023, 4900 responses were recorded, of which 4345 (88.7%) were collected via the English survey and 555 (11.3%) via the Spanish survey version. Of these, 283 (5.8%) respondents reported degrees other than medicine, dentistry, or veterinary medicine or indicated that they had completed their studies, while 21 (0.4%) did not respond to any multiple-choice item or did not indicate their degree. The final study cohort comprised 4596 participants from 192 faculties and 48 countries, of whom 4313 (93.8%) were medical, 205 (4.5%) dentistry, and 78 (1.7%) veterinary medicine students. Of 5,575,307 enrolled students from all degrees at the 183 (95.3%) participating faculties at which the total enrollment number was publicly available, the survey achieved an average response rate of 0.2% (standard deviation: 0.4%). Most respondents studied in Southern/Eastern European (n = 1240, 27%) countries, followed by Northern/Western Europe (n = 1110, 24.2%), Asia (n = 944, 20.5%), South America (n = 555, 12.1%), North America (n = 515, 11.2%), Africa (n = 125, 2.7%), and Australia (n = 104, 2.3%). Please refer to Fig. 1 to view the distribution of participating institutions in relation to the number of participants on a world map. A detailed list of survey participants divided by country, faculty, city, degree, number of enrolled students, and response rate is provided in the appendix (see Supplementary Table 8). The median age of the study population was 22 years (IQR: 20–24 years). 56.6% of the participants were female (n = 2600) and 42.4% male (n = 1946), with a median academic year of 3 (IQR: 2–5). Full descriptive data, including items on technological literacy and preferences for AI teaching in the medical curriculum, are displayed in Table 2. Free-field comments of the survey participants are listed in the appendix (see Supplementary Table 9), with selected comments highlighted in Fig. 2.

Fig. 1
figure 1

The world map displays the geographical distribution of participating institutions (blue dots) in relation to the number of respondents per institution

Table 2 Descriptive data of the study population and results of the questions about tech-savviness and topic preferences for AI teaching
Fig. 2
figure 2

Diverse perspectives from medical students on the integration of artificial intelligence (AI) in healthcare education and practice. The selected quotes reflect a range of sentiments, from concerns about dehumanization and potential challenges in low-resource settings to viewing AI as a beneficial tool that complements rather than replaces the human touch in medicine

Collective perceptions towards artificial intelligence

Table 3 displays the survey results for Likert scale items. Students generally reported a rather or extremely positive attitude towards the application of AI in medicine (3091, 67.6%). The highest positive attitude towards AI in the medical profession was recorded for the item "How do you estimate the effect of artificial intelligence (AI) on the efficiency of healthcare processes in the next 10 years?", with 4042 respondents (88.4%) estimating a moderate or great improvement. Conversely, 3171 students (69.4%) rather or completely agreed with the item "The use of artificial intelligence (AI) in medicine will increasingly lead to legal and ethical conflicts.". Regarding AI education and knowledge, 3451 students (75.3%) reported no or little knowledge of AI, and 3474 (76.1%) rather or completely agreed that they would like to have more teaching on AI in medicine as part of their curricula. On the other hand, 3497 (76.3%) students responded that they did not have any curricular events on AI as part of their degree, as illustrated on the country level in Fig. 3. Response variance ranged from 0.279 for the item "How would you rate your general knowledge of artificial intelligence (AI)?" (measured on a four-point Likert scale) to 1.372 for "With my current knowledge, I feel sufficiently prepared to work with artificial intelligence (AI) in my future profession as a physician.". Notably, the items capturing the trade-offs in medical AI diagnostics revealed that most students preferred AI explainability (n = 3659, 80.2%) over higher accuracy (n = 902, 19.8%) and higher sensitivity (n = 2906, 63.9%) over higher specificity (n = 1118, 24.6%) or equal sensitivity/specificity (n = 524, 11.5%), as visualized in Fig. 4.

Table 3 Survey results of Likert scale format items on attitudes towards the medical degree, AI in the medical profession, AI education and knowledge
Fig. 3
figure 3

Pie charts illustrating student responses at the country level for the item "As part of my studies, there are curricular events on artificial intelligence (AI) in medicine.". A more filled, darker red chart indicates a higher proportion of students reporting no AI events, while a less filled, greener chart indicates fewer students reporting the absence of AI events. The missing portion of each chart displays the proportion of students who reported AI events, regardless of the duration. An all-white pie chart indicates that all students reported AI events in the medical curriculum. The absolute number of responses per country is shown above each chart. Analysis of the pie charts from countries with a representative sample of at least 50 respondents reveals that, among 28 nations, only four (Indonesia, Switzerland, Vietnam, and China) exhibited over 50% of students reporting the inclusion of AI events within their medical curriculum. Data from the USA displayed an equal proportion of students reporting the presence or absence of AI events in their curriculum (50% each). The remaining 23 countries, encompassing Germany, Portugal, Mexico, Brazil, Poland, UAE, Austria, Italy, India, Argentina, Macedonia, Canada, Slovenia, Ecuador, Australia, Azerbaijan, Japan, Spain, Chile, Moldova, South Africa, Nepal, and Nigeria, had a lower proportion of students reporting the integration of AI in the medical curriculum. Abbreviations: UAE, United Arab Emirates; USA, United States of America

Fig. 4
figure 4

Gantt diagrams depicting medical students' preferences in AI diagnostics. (a) AI explainability (n = 3659, 80.2%) versus higher accuracy (n = 902, 19.8%) and (b) higher sensitivity (n = 2906, 63.9%) versus higher specificity (n = 1118, 24.6%) or equal sensitivity and specificity (n = 524, 11.5%)

Regional comparisons

Please refer to Table 4 to view the results of the comparison of responses from the Global North and South for Likert scale format items. Perceptions between the Global North and South differed significantly for nine Likert scale format items. The highest effect size was observed for the item on AI increasing ethical and legal conflicts, with respondents from the Global North indicating a higher agreement (median: 4, IQR: 3–5) compared to those from the Global South (median: 4, IQR: 3–4; r = 0.185; P < 0.001). Notably, Global South students felt more prepared to use AI in their future practice (median: 3, IQR: 2–4) compared to their Global North counterparts (median: 2, IQR: 1–3; r = 0.162; P < 0.001) and reported longer AI-related curricular events (median: 1, IQR: 1–2; Global North: median: 1, IQR: 1–1; r = 0.090; P < 0.001). Conversely, Global North students rated their AI knowledge higher (median: 2, IQR: 2–3; Global South: median: 2, IQR: 2–2; r = 0.025; P < 0.001).

Table 4 Regional comparison of respondents from the Global North and South for Likert scale format items

For continental comparison, the Kruskal–Wallis one-way analysis of variance revealed significantly different Likert scale responses across all survey items (see Table 5). Subsequent Dunn–Bonferroni post hoc analysis displayed various significant differences in Likert scale responses for pairwise regional comparisons, while median and IQR remained largely consistent. Considering only medium to large effect sizes, the item "The use of artificial intelligence (AI) in medicine will increasingly lead to legal and ethical conflicts." yielded an r of 0.301 when comparing Northern/Western European (median: 4, IQR: 4–5) and South American participants (median: 4, IQR: 3–4; P < 0.001), and an r of 0.311 between South American and Australian participants (median: 4, IQR: 4–5; P < 0.001). Similarly, the statement "With my current knowledge, I feel sufficiently prepared to work with artificial intelligence (AI) in my future profession as a physician." displayed medium to large effect sizes in comparisons between North/West Europe (median: 2, IQR: 1–2) and Asia (median: 3, IQR: 2–4; r = 0.531; P < 0.001), South/East Europe (median: 2, IQR: 2–3) and Asia (r = 0.342; P < 0.001), and South America (median: 2, IQR: 2–3) and Asia (r = 0.398; P < 0.001).

Table 5 Regional comparison of Likert scale format items on the continental level

Discussion

Our multicenter study of 4596 medical, dental, and veterinary students from 192 faculties in 48 countries provides crucial insights into the global landscape of AI perception and education in healthcare curricula. The findings reveal a nuanced picture: while students generally express optimism about AI’s role in future healthcare practice, this is tempered by significant concerns and a striking lack of preparedness.

The educational basis of our study lies in addressing a critical gap in AI education within medical curricula, exploring how this deficiency varies across different regions, particularly between continents and the Global North and South. As AI rapidly advances and promises to reshape healthcare, the need for future physicians to be adequately prepared through comprehensive AI education becomes increasingly urgent. Our study goes beyond merely asserting the necessity of AI education by elucidating regional differences in perceptions and experiences related to AI among healthcare students.

Our findings extend previous research highlighting inadequacies in AI education in medical schools globally. Kolachalama and Garg [36] noted that AI is not widely taught in medical schools, with most curricula lacking substantial AI training modules. Chan and Zary [37] reinforced this, emphasizing the gap between recognizing AI’s potential benefits and actually integrating AI education into medical programs. Our study confirms these deficiencies on a larger, international scale, revealing that over three-quarters of students reported no AI-related events in their curriculum, despite strong interest in such education. Importantly, our research uncovers regional disparities in AI education and perception.

Students from the Global South were generally less likely to report having AI incorporated into their curricula compared to their counterparts in the Global North. This discrepancy underscores the need for tailored educational strategies that consider these regional differences to ensure equitable preparation for an AI-enhanced medical landscape. The observed differences in perceived preparedness for working with AI, particularly among Asian students, may reflect varying national AI policies, educational strategies, and macroeconomic factors [38, 39].

Depending on the study and item design, self-reported AI knowledge in the literature ranges from 2.8% of 2981 medical students in Turkey in 2022 who reported feeling informed about the use of AI in medicine to 51.8% of 900 medical students in Jordan in 2021 who indicated having read articles about AI or machine learning in the past two years [21, 40,41,42,43,44]. On the other hand, the reported prevalence of AI training in the medical curriculum ranges, for instance, from 9.2% in a 2020 survey of 484 medical students in the United Kingdom up to 24.4% in a 2022 study among 2981 medical students in Turkey, although variations in item designs and demographic contexts hinder a comprehensive longitudinal analysis [22, 40, 42, 43, 45]. In our study, fewer than 18% (n = 5) of countries with a sample size of 50 or more participants had at least half of their students reporting any duration of AI teaching, pointing to a persistent deficit in medical AI education across various demographic landscapes. Overall, the incorporation of AI into medical education on a broader national or international scale is limited, and the adoption of frameworks, certification programs, interdisciplinary collaborations, modules, and formal lectures still seems to be at an early stage [14, 46,47,48,49].

While our study design and varying sample sizes across regions complicate causal analysis, the fact that three of four countries with over 50% of students reporting AI training were in Asia suggests a potential link between educational exposure and perceived readiness.

Despite the overall positive outlook, our study reveals a pronounced concern among students about the ethical and legal challenges posed by AI integration in healthcare. This echoes findings from Mehta et al. and Civaner et al. [40, 50], highlighting the critical need for AI education to address not only technical skills but also ethical, legal, and societal implications.

In terms of educational preferences, most of the participants in our study indicated their interest in learning practical skills, followed by future perspectives and legal and ethical aspects of medical AI. This underscores the great potential of AI education to not only improve medical students' oversight, knowledge, and practical skills in using AI but also to educate about ethical, legal, and societal implications — topics that are also addressed in other AI education frameworks, such as the United Nations Educational, Scientific and Cultural Organization K-12 AI curricula report [51].

In our subgroup analysis of respondents across continents, two items displayed moderate to large effect sizes. First, participants from South America were less likely to agree that the use of medical AI will increase ethical and legal conflicts compared to participants from Northern/Western Europe and Australia. Yet, students' median responses in these regions were identical. Thus, the level of effect size primarily reflects outliers rather than a uniform regional disparity in opinion. Second, Asian students reported being better prepared to work with AI in their future careers. Although these differences in perceived preparedness could be driven by different national AI policies and educational strategies as well as macroeconomic factors, our study design and varying sample sizes across regions complicate a causal analysis [38, 39].

Finally, the strong preference for explainable AI systems over highly accurate but opaque ones underscores the growing emphasis on ‘Explainable AI’ in medicine, underlining the importance of transparency in fostering trust and acceptance among future healthcare professionals [52,53,54, 55].

This study has limitations. First, the uneven regional distribution of participants potentially biased results in favor of overrepresented regions. In addition, the online design and language availability in either English or Spanish, as well as the non-probability convenience sampling method, may have introduced selection bias by excluding students without internet access, students who were not proficient in either language, or students who did not wish to participate. Another potential source of selection bias could be that respondents with a specific interest in or experience with AI were more likely to participate in the survey. Furthermore, the calculated response rate appeared to be rather low due to the lack of data on the number of students enrolled in each medical discipline for most participating institutions. Consequently, we derived the response rate using the total student enrollment numbers, which significantly underestimated the true rate of participation among medical students as it assumes that all students within each faculty received an invitation to participate. Moreover, the presence of 20 institutions with fewer than 50 student respondents has skewed the response rate further downward.

Conclusions

In conclusion, our study, currently the largest survey of medical students’ perceptions towards AI in healthcare education and practice, reveals a broadly optimistic view of AI’s role in healthcare. It draws on insights from students with diverse geographical, sociodemographic, and cultural backgrounds, underlining the critical need for AI education in medical curricula around the world and identifying a universal challenge and opportunity: to adeptly prepare healthcare students for a future that integrates AI into healthcare practice.