Abstract
Student diversity in health professions education (HPE) can be affected by selection procedures. Little is known about how different selection tools impact student diversity across programs using different combinations of traditional and broadened selection criteria. The present multi-site study examined the selection chances of subgroups of applicants to HPE undergraduate programs with distinctive selection procedures, and their performance on the corresponding selection tools. The probability of selection of subgroups (based on gender, migration background, prior education, and parental education) of applicants (N = 1935) to the selection procedures of five Dutch HPE undergraduate programs was estimated using multilevel logistic regression. Multilevel linear regression was used to analyze performance on four tools: prior-education grade point average (pre-GPA), biomedical knowledge test, curriculum-sampling test, and curriculum vitae (CV). First-generation Western immigrants and applicants with a foreign education background were significantly less likely to be selected than applicants without a migration background and applicants with pre-university education. These effects did not vary across programs. More variability in effects was found between different selection tools. Compared to women, men performed significantly worse on CVs, while they had higher scores on biomedical knowledge tests. Applicants with a non-Western migration background scored lower on curriculum-sampling tests. First-generation Western immigrants had lower CV scores. First-generation university applicants had significantly lower pre-GPAs. Effects varied for applicants with different alternative forms of prior education. For curriculum-sampling tests and CVs, effects varied across programs. Our findings highlight the need for continuous evaluation, identifying best practices within existing tools, and applying alternative tools.
Introduction
Medical schools and other health professions education (HPE) schools are responsible for selecting qualified students, as well as generating student populations that reflect the diverse society they will serve in the future (General Medical Council, 2015). A diverse healthcare workforce, aside from issues of equity and fairness, is important to improve the cultural competency of healthcare providers, and to increase and equalize access to high-quality healthcare for different population groups (Cohen et al., 2002; Morgan et al., 2016). However, student diversity can be affected by the use of selection procedures for undergraduate HPE programs, as selection chances are unequally distributed across subgroups of applicants (Fielding et al., 2018; Mathers et al., 2016). Selection procedures can not only include a great variety of tools, defined either at a national level or by an individual school, but the same tool can also be implemented in different ways. This raises the question of whether different tools have differential effects on student diversity (Patterson et al., 2016), and whether some tools are more context-independent than others concerning their impact on student diversity. In the present multi-site study, we examined the selection chances of applicant subgroups and their performance on different selection tools in multiple contexts.
So far, literature has shown that selection procedures mainly negatively affect the selection chances of applicants with lower socio-economic status (SES) and from ethnic minorities (Fielding et al., 2018; Mathers et al., 2016; Mulder et al., 2022; Stegers-Jager et al., 2015; Steven et al., 2016). However, this effect may not always be straightforward, as performance differences between subgroups can depend on the combination of tools used in the procedures (Stegers-Jager, 2018). Traditionally, selection procedures in the United States and Europe mainly included prior education grade point average (pre-GPA) and cognitive tests, aimed at measuring intellectual ability. In the past decades, there has been a shift towards the inclusion of broadened selection criteria, which aim to add to the information derived from traditional tools, and often intend to evaluate personal qualities (Niessen & Meijer, 2017; Stegers-Jager, 2018). Examples include curriculum vitae (CV) and situational judgement tests (SJT). In this paper, we will refer to this distinction with the terms traditional and broadened criteria.
Prior research demonstrated performance discrepancies on traditional criteria, favoring higher SES and ethnic majority applicants (Girotti et al., 2020; Juster et al., 2019; Lievens et al., 2016; Stegers-Jager et al., 2015). Although broadened selection criteria were partly introduced to mitigate these adverse effects on student diversity, results so far are inconsistent (Stegers-Jager, 2018). For instance, Lievens et al. (2016) found that the inclusion of an SJT in the United Kingdom could increase the representation of lower SES applicants, but not of ethnic minority applicants. However, a similar study in the United States found that adding an SJT was advantageous for the representation of both lower SES and ethnic minority applicants (Juster et al., 2019). This implies that the effects of selection tools on diversity can be context-dependent, at least in the case of broadened criteria. The curriculum-sampling test is another tool assessing broadened criteria that is increasingly used in international contexts and has proved effective in predicting academic achievement (Niessen et al., 2018). Curriculum-sampling tests mimic representative parts of a subject from the academic program. Generally, applicants study literature or watch video lectures from small-scale versions of an introductory course, followed by an exam (Niessen et al., 2018). Curriculum-sampling tests are aimed at measuring a mixture of attributes such as knowledge, motivation, and time spent studying (Niessen & Meijer, 2017). Additionally, these tests intend to assess the applicants’ ‘fit’ with the program (e.g., the way of testing and studying). To our knowledge, subgroup performance differences on this specific tool have not yet been investigated.
Every country has its own laws and regulations for selection and admission, as well as a unique context regarding student diversity. Typical of the Netherlands is that, after years of lottery, programs are now responsible for designing their own selection procedures. Programs independently decide which tools they include (both self-developed and standardized), and how many they include (with a minimum of two). This results in a great variety of procedures and tools. Results from a national retrospective study indicate that since the abolishment of lottery, inequality in selection chances between subgroups of applicants has increased (Mulder et al., 2022). The authors found that women, ethnic majority applicants, and applicants with higher SES had a higher probability of admission compared to their peers. However, this study did not take into account the role of the extensive range of possible selection procedures and tools. One previous single-site study attempted to unravel this matter, and concluded that ethnic minority and lower SES applicants had lower scores on academic criteria, but not on non-academic criteria (Stegers-Jager et al., 2015). The researchers discovered that for the institution under consideration, men had higher selection chances compared to women, which was again only related to performance on academic criteria. This contradicts the findings of the aforementioned national study (Mulder et al., 2022), strengthening the hypothesis that the effects of selection on student diversity are context-dependent. An additional observation of the single-site study was that being a first-generation immigrant was correlated with poorer selection outcomes (Stegers-Jager et al., 2015), a variable that was not accounted for in the national cohort study. A final potentially relevant variable that was included in neither of the studies is prior education. A recent report indicated that applicants with prior foreign education had smaller selection chances compared to applicants from the ‘traditional’ pre-university track (Van Den Broek et al., 2018).
In short, it is not clear how different selection tools can affect student diversity across different contexts. The freedom of Dutch HPE programs to design their selection procedures creates the unique opportunity to compare the effects of selection on student diversity across different procedures with a variety of selection tools. The present prospective multi-site study aimed to evaluate the probability of selection into five undergraduate HPE programs for subgroups of applicants based on gender, migration background (as an indicator of ethnicity), parental education (as an indicator of SES), and prior education. Additionally, we examined performance differences on two traditional selection tools (pre-GPA, biomedical knowledge test), and two tools assessing broadened criteria (curriculum-sampling test, CV).
Method
Design and context
The present research concerns a prospective multi-site cohort study. We collected data from five university-level undergraduate HPE programs in the Netherlands, including three medical programs (labeled A, B, and C), one technical-medical (clinical technology) program (labeled D), and one pharmacy program (labeled E). The included programs were located in different parts of the Netherlands, both in urban and rural areas, and were all concerned about enhancing diversity in their selection processes.
Unique to the Netherlands is that the admission requirements of different types of undergraduate HPE programs are identical. To be eligible, applicants need to meet the same stringent requirements regarding subjects taken (e.g., physics, chemistry, and biology) and educational level. Consequently, the applicant pools are relatively homogeneous in terms of academic background; students who apply to a university-level undergraduate HPE program are already strongly preselected based on academic skills due to highly selective secondary education (Niessen & Meijer, 2016). When applicants apply to their program of choice, they apply to one specific institution. Each institution has a predetermined fixed number of spots. By law, institutions are required to include at least two selection criteria, but as previously mentioned, there are no additional requirements concerning, for instance, the content and quality of the tools. Consequently, great variety exists in the selection procedures that programs employ, both between and within different types of HPE programs at different institutions. We studied tools used by more than one program, to evaluate whether effects were similar or different across programs.
The selection procedures of the five programs are described in Table 1. The tools used by multiple programs were pre-GPA, biomedical knowledge test, curriculum-sampling test, and CV. Pre-GPA comprised applicants’ average school grades on required subjects, usually mathematics, physics, biology, and/or chemistry. Biomedical knowledge tests assessed applicants’ existing general knowledge about biomedical subjects, without requiring any preparation. Curriculum-sampling tests were (largely) based on preparatory materials in the form of a lecture and/or reading materials that applicants had received some weeks prior to the testing day. CVs consisted of an assessment of extracurricular activities, such as (voluntary) jobs, internships, or evidence of extraordinary cultural or athletic skills. One standardized tool was included, the biomedical knowledge subtest of the BioMedical Admissions Test (BMAT), which was administered by one program (D). All other selection tools in our sample were self-developed by the individual programs. Consequently, the specific application of the tools differed between the programs, e.g., the specific subjects included in pre-GPA and the types of questions in tests. All selection tools, except for the BMAT, were administered in Dutch. Programs were responsible for their own quality assurance, and we did not have access to psychometric information.
Participants and procedure
All applicants engaged in the selection procedures for entry in September 2020 (N = 3280) were invited to participate. For programs A, D, and E, applicants were invited during the on-site testing days. Programs B and C did not perform on-site testing due to COVID-19 pandemic measures, necessitating recruitment via e-mail during the selection procedure.
Applicants were requested to complete a demographics questionnaire. In this survey, applicants were asked to report their student number, gender, migration background, parental education and prior education. Data on performance on the selection procedures were derived from the related university student administration systems. Student number was used to connect the data from the demographics questionnaire with performance data.
Informed consent was obtained from all participants. Applicants were informed that participation was voluntary and would not influence their selection outcomes, and we made explicit that the researchers operated independently from the selection committees. Applicants did not receive incentives for participation in the study. All data were pseudonymized immediately after the demographics and performance data were combined. The Medical Ethical Review Committee of Erasmus MC declared the study exempt from ethical approval.
Variables
Predictors included gender, migration background, prior education, and parental education.
To acknowledge gender diversity, applicants had the option to choose between three categories: ‘man’, ‘woman’, and ‘other, namely [free text box]’.
Migration background was used as a proxy for ethnicity, recognizing that this does not completely capture the multidimensional character of ethnicity. Migration background was defined in alignment with Statistics Netherlands (CBS). Individuals have a migration background when at least one of their parents was born outside of the Netherlands. Based on the taxonomy of CBS, we distinguished between a Western and non-Western migration background. All European (excluding Turkey), North American, and Oceanian countries, Indonesia, and Japan were considered Western. Non-Western countries included all countries in Africa, Asia (excluding Indonesia and Japan), Latin America, and Turkey. Additionally, we distinguished between first-generation and second-generation immigrants. First-generation immigrants were born outside the Netherlands. In the Netherlands, the use of migration background and CBS taxonomy are considered the standard for operationalizing ethnicity, also in research in HPE (e.g., Mulder et al., 2022; Stegers-Jager et al., 2015).
In the Netherlands, the typical educational route to an HPE program is the pre-university track of secondary school with a health/science profile. However, applicants can apply to HPE programs from alternative forms of prior education. We distinguished between standard Dutch pre-university education, university, higher vocational education, all forms of foreign education, and other forms of prior education (e.g., entrance exams and adult education).
Finally, parental education was used as a proxy for SES, acknowledging that this is only one of the many indicators that can be used to operationalize SES. Parental education was determined by the educational level of applicants’ parents. Applicants were categorized as first-generation university applicants when none of their parents had attended higher education, i.e., university or higher vocational education. First-generation university applicants were a subgroup of interest, because previous research has demonstrated that their odds of being selected into medical school are lower (Mason et al., 2021; Stegers-Jager et al., 2015). Additionally, they face numerous obstacles when applying to medical school, including a lack of knowledge about the admission process and financial barriers (Romero et al., 2020).
Outcome measures
Five outcome measures were included. The first—binary—outcome measure indicated whether an applicant was selected (yes/no), determined by their ranking number. The other four outcome measures were continuous and reflected performance on the four tools: pre-GPA, curriculum-sampling test, biomedical knowledge test, and CV. For each tool, the responsible program calculated a raw score based on its own scoring method. Subsequently, each program transformed these raw scores into standardized Z-scores to enable comparisons of tools between tracks. These Z-scores were made available to the researchers and used for the analyses.
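The within-program standardization described above can be illustrated with a minimal sketch. The raw scores below are hypothetical, and the exact standardization formula (e.g., population vs. sample SD) was determined by each program; this sketch only shows the general Z-transformation.

```python
from statistics import mean, pstdev

def to_z_scores(raw_scores):
    """Standardize raw tool scores within one program so that
    performance can be compared across tools and tracks."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)  # population SD; a program may have used the sample SD instead
    return [(s - m) / sd for s in raw_scores]

# Hypothetical raw scores for one selection tool in one program
raw = [6.0, 7.5, 8.0, 5.5, 7.0]
z = to_z_scores(raw)
```

After this transformation, each tool's scores have mean 0 and SD 1 within a program, which is what makes the regression coefficients in the Results interpretable as differences in SD units.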
Statistical analyses
Multilevel logistic regression analysis was performed to calculate odds ratios (OR) for the effect of the different predictors on the probability of selection. An OR of > 1 indicates an increased likelihood of selection. Since the content of the selection procedure differed between programs, we used the program to which the applicants applied as a random intercept in this model. Program E was excluded from this analysis, given its high selection rate of 96%, in sharp contrast with the average selection rate of 47% across the other programs (Appendix 1: Table 6). This program's selection rate was so high because the number of applicants was small relative to the number of available spots.
To compare performance on different tools, we used Z-scores that were provided by the participating programs. We performed multilevel linear regression to assess performance differences between Z-scores on the four overlapping tools. Program C applied different scoring methods for two independent selection tracks with intake restriction (Table 1), resulting in Z-scores for the two different tracks. Therefore, we used the variable ‘track’ instead of ‘program’ as a random intercept for the analyses of the four tools. This random effect was included because, as mentioned earlier, the specific application of each tool differed across settings.
We applied likelihood ratio tests with the boundary correction to assess whether the inclusion of ‘program’ or ‘track’ as a random intercept explained significantly more variance compared to the model without the intercept.
Analyses were executed using the lme4 (version 1.1.26) and nlme (version 3.1.152) packages in R version 4.0.4. For all statistical analyses, assumptions were checked. We interpreted ORs of > 1.68 or < 0.60 as a small effect, ORs of > 3.47 or < 0.29 as a medium effect, and ORs of > 6.71 or < 0.15 as a large effect (Chen et al., 2010).
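The effect-size interpretation above can be expressed as a small helper function. This is a sketch based only on the thresholds quoted from Chen et al. (2010); the function name and the "negligible" label for ORs between the small-effect bounds are our own.

```python
def classify_or(odds_ratio):
    """Classify an odds ratio as a negligible, small, medium, or large effect,
    using the thresholds from Chen et al. (2010) cited in the text.
    ORs below 1 (reduced likelihood of selection) mirror the thresholds above 1."""
    if odds_ratio > 6.71 or odds_ratio < 0.15:
        return "large"
    if odds_ratio > 3.47 or odds_ratio < 0.29:
        return "medium"
    if odds_ratio > 1.68 or odds_ratio < 0.60:
        return "small"
    return "negligible"

# Example: an adjusted OR of 0.45 falls below 0.60 but above 0.29,
# so it is interpreted as a small effect
effect = classify_or(0.45)
```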
Results
Applicant characteristics
In total, 1935 applicants participated in the study (response rate 59%, range 34–81% for individual programs). With respect to gender, 30% of the respondents identified as men, and one applicant identified as ‘other’. This individual was excluded from the subgroup analyses, and therefore only the categories of men and women are described in the results. Furthermore, 38% had a migration background, 20% applied from alternative forms of prior education, and 25% were first-generation university applicants. In terms of gender and age, all samples were representative of the complete applicant pool. For two programs (C and E), participating applicants performed slightly better on the selection (in terms of ranking number) compared to non-participating applicants.
Since applicants were exposed to some overlapping tools, but also to some unique tools, and one program was excluded from the analyses on the probability of selection, the distribution of applicant characteristics differed between the multilevel analyses (Table 2). Noteworthy is that for the biomedical knowledge test, the proportion of applicants with a migration background was relatively low compared to the other programs (28% vs 36–43%). Additionally, for pre-GPA, the proportion of applicants from alternative forms of prior education was comparatively small (10% vs 19–23%). This is probably caused by the fact that pre-GPA is not always included as a selection tool for those applicants.
The individual programs differed in their distribution of the applicant characteristics of interest (Appendix 1: Tables 6, 7). The most notable difference is that compared to the other programs, Program D—the only rural program in the sample—had a lower representation of applicants with a migration background (13% vs 32–53%) and first-generation university applicants (15% vs 24–32%). Differences in demographic composition are not caused by differences in admission requirements, since these were all comparable across programs. However, it is possible that other institutional-related factors made certain programs more attractive to specific subgroups, including location and selection procedure (Wouters et al., 2017b).
Probability of selection
First-generation Western immigrants were significantly less likely to be selected compared to applicants without a migration background (23% vs 49%), corresponding to an adjusted OR of 0.45 (95% confidence interval [CI] [0.20, 0.99]; Table 3). Additionally, foreign-educated applicants had smaller selection odds than those from standard pre-university education (24% vs 49%, adjusted OR = 0.46, 95% CI [0.22, 0.94]). Both can be interpreted as small effects (i.e., OR < 0.60). The category ‘other forms of prior education’ demonstrated a medium (i.e., OR < 0.29), but non-significant (level 0.05) negative effect (18% vs 49%, adjusted OR = 0.28, 95% CI [0.08, 1.00]), which could be due to the small size of this group (N = 17). Gender and parental education were not significantly associated with the probability of selection. The random effect of program was not significant (SD = 0.00, 95% CI [0.00, 0.21], p = 0.50), indicating that subgroup differences in the probability of selection were similar across programs, given the fixed structure considered (i.e., the variables of gender, migration background, prior education and parental education).
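As a rough illustration of how an odds ratio relates to the raw selection percentages, the unadjusted OR implied by 23% vs 49% can be computed directly. Note that this crude value differs from the adjusted OR of 0.45 reported above, which additionally accounts for the other predictors and the random intercept; the sketch below is only meant to make the odds-ratio arithmetic concrete.

```python
def crude_odds_ratio(p_group, p_reference):
    """Unadjusted odds ratio implied by two raw selection proportions."""
    odds_group = p_group / (1 - p_group)
    odds_ref = p_reference / (1 - p_reference)
    return odds_group / odds_ref

# 23% of first-generation Western immigrants vs 49% of applicants
# without a migration background were selected (percentages from the text)
or_crude = crude_odds_ratio(0.23, 0.49)  # ≈ 0.31, more extreme than the adjusted OR
```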
Performance on traditional criteria
Pre-GPA
Pre-GPA was used by four programs (B, C, D, and E), of which one program used two independent selection tracks, resulting in five tracks in the analysis (B, C1, C2, D, and E). Compared to traditional applicants, first-generation university applicants had significantly lower pre-GPAs (B = − 0.17, 95% CI [− 0.30, − 0.03]; Tables 4, 5). As Z-scores were used for all criteria, the unstandardized Bs indicate the difference in SD. Thus, for example, pre-GPAs of first-generation university applicants were 0.17 SD lower than those of non-first-generation university applicants. Applicants with university-level education and with ‘other forms of prior education’ had significantly lower pre-GPAs compared to standard pre-university applicants (respectively, B = − 0.41, 95% CI [− 0.63, − 0.18]; B = − 0.76, 95% CI [− 1.42, − 0.11]), while pre-GPAs of applicants with foreign education were significantly higher (B = 1.13, 95% CI [0.54, 1.72]). Gender and migration background were not associated with pre-GPA. The random effect of track was not significant (SD = 0.005, 95% CI [0, 506553], p = 0.50), indicating that the performance differences found on pre-GPA were similar across tracks, given the fixed structure considered.
Biomedical knowledge test
Biomedical knowledge tests were used by two programs (A and D). Men and applicants who were studying at university level performed significantly better on biomedical knowledge tests compared to women and applicants from standard pre-university education (respectively, B = 0.21, 95% CI [0.06, 0.37]; B = 0.32, 95% CI [0.11, 0.52]; Tables 6, 7). Migration background and parental education were not associated with test scores, and the random effect of track was not significant (SD = 0.01, 95% CI [0, 12114.82], p = 0.46), indicating that subgroup differences in performance were similar across programs, given the fixed structure considered.
Performance on broadened criteria
Curriculum-sampling test
Three programs included curriculum-sampling tests (A, B, and E). Applicants with a non-Western migration background, both first-generation and second-generation, scored lower on curriculum-sampling tests compared to applicants without a migration background (respectively, B = − 0.43, 95% CI [− 0.67, − 0.20]; B = − 0.21, 95% CI [− 0.34, − 0.10]; Tables 6, 7). Applicants who were already studying at university level performed significantly better compared to standard pre-university applicants (B = 0.37, 95% CI [0.21, 0.53]), while applicants with foreign education had lower test scores (B = − 0.56, 95% CI [− 0.91, − 0.22]). Test scores were not influenced by gender or parental education. Given the fixed structure considered, the random effect of track was significant (SD = 0.09, 95% CI [0.03, 0.32], p = 0.01), implying that our overall findings differed between programs. Descriptive statistics of the individual programs employing curriculum-sampling tests (Appendix 1: Table 8) indicate that only for program E, applicants with a first-generation non-Western background had notably low mean Z-scores compared to those without a migration background (M = − 0.86 vs M = 0.29). This difference was smaller for program B (M = − 0.13 vs M = 0.18) and non-existent for program A (M = 0.02 vs M = − 0.01). Noteworthy is that in program E, relatively more applicants had a first-generation non-Western background than in the other two programs.
CV
Three tracks derived from two different programs included a CV (B, C1, and C2). CV scores were significantly lower for men compared to women (B = − 0.17, 95% CI [− 0.31, − 0.02]; Tables 6, 7), for first-generation Western immigrants compared to applicants without a migration background (B = − 0.43, 95% CI [− 0.85, − 0.00]), and for applicants with higher vocational education, foreign education, and ‘other forms of prior education’ compared to standard pre-university applicants (Bs between − 0.61 and − 0.81). Parental education was not associated with CV scores. There was a significant effect of track for CV (SD = 0.25, 95% CI [0.08, 0.76], p < 0.001), indicating that the aforementioned effects differed across tracks, given the fixed structure considered. Descriptive statistics (Appendix 1: Table 9) suggest that the gender-based performance gap was smaller for track B than for the other tracks (track B: M = 0.04 (men) vs M = 0.17 (women); track C1: M = − 0.29 vs M = − 0.06; track C2: M = − 0.11 vs M = 0.25). The overall result that first-generation Western immigrants had lower scores than applicants without a migration background was found for track B (M = − 0.75 vs M = 0.22) and track C2 (M = − 0.42 vs M = 0.09), but not for track C1 (M = − 0.03 vs M = − 0.14). Compared to those without a migration background, applicants with a second-generation non-Western background had lower CV scores in track B (M = − 0.28 vs M = 0.22), similar CV scores in track C1 (M = − 0.17 vs M = − 0.14), and higher CV scores in track C2 (M = 0.45 vs M = 0.09). For track B, larger differences were observed between different forms of prior education, but this is probably related to the fact that for program C, the tracks were distinguished based on prior education, resulting in a large concentration of standard pre-university education in track C1 and a large concentration of other forms of prior education in track C2.
Discussion
Unraveling the impact of distinctive selection procedures on student diversity in undergraduate HPE programs requires insight into how subgroups based on gender, migration background (as an indicator of ethnicity), parental education (as an indicator of SES), and prior education perform on the applied selection tools in different contexts. Our results demonstrated that selection chances of applicants with non-traditional backgrounds were generally smaller, but only significantly so for applicants with a first-generation Western migration background and applicants with foreign education. These findings did not differ between programs. However, when taking a closer look, we found larger differences in subgroup performance and more variability in effects. We conclude that the broadened criteria under research—curriculum-sampling tests and CVs—may reduce SES-related performance differences, but not disparities based on applicant ethnicity. Furthermore, subgroup performance differences were context-specific for broadened criteria, but not for traditional criteria.
Our first key finding that the implementation of broadened selection criteria instead of traditional criteria potentially reduces performance disparities based on SES, but may not mitigate an ethnicity-related performance gap, confirms the previous work of Lievens et al. (2016). With respect to the traditional criteria under research, we found that first-generation university applicants had lower pre-GPAs than applicants from traditional backgrounds, also confirming previous research (Griffin & Hu, 2015; Juster et al., 2019; Puddey et al., 2011). Nevertheless, pre-GPAs did not differ between ethnic majority and ethnic minority applicants, which may be explained by a great variety in pre-GPAs between different ethnic minority groups (Puddey et al., 2011). We did not identify significant SES-based and ethnicity-based performance differences on biomedical knowledge tests. However, the sample for this outcome measure was smaller and less diverse compared to the other tools, and international research on such tools persistently reveals such disparities (Girotti et al. 2020; Griffin & Hu, 2015; Juster et al., 2019; Lievens et al., 2016; Puddey et al., 2011). With respect to broadened criteria, our study is the first to investigate subgroup performance on curriculum-sampling tests and CVs in a multi-institutional setting. On both broadened criteria under research, we did not find performance differences based on SES, which resonates with previous research (Griffin & Hu, 2015; Juster et al., 2019; Lievens et al., 2016; Stegers-Jager et al., 2015). Nevertheless, we found that applicants with a migration background were disadvantaged on both tools, whereas previous studies reported mixed findings (Juster et al., 2019; Lievens et al., 2016; Stegers-Jager et al., 2015). 
A possible explanation for our findings regarding SES is that broadened criteria are less prone to coaching—which is generally more available to high SES applicants (Stemig et al., 2015)—due to their unstandardized and program-specific nature. Traditional criteria, on the other hand, are potentially more susceptible to coaching, as applicants can, for instance, purchase private tutoring to increase their pre-GPA. Simultaneously, the lack of standardization of broadened criteria could increase the risk of cultural bias. Cultural bias can, for instance, occur when certain questions are interpreted differently by members from ethnic minority groups, and may explain the lower scores of applicants with migration backgrounds (Kim & Zebelina, 2015). Language bias probably did not play a significant role in performance disparities based on migration background, because effects were not consistently observed amongst all first-generation immigrants. Additionally, results from previous research suggest that disparities also exist for immigrants from Dutch-speaking countries (Stegers-Jager et al., 2015).
A second key finding is that for the broadened selection criteria, subgroup performance differences were context-specific, whereas the traditional selection criteria had consistent effects across programs. This is in accordance with the current evidence for subgroup differences in performance on the two types of criteria: results from prior research regarding the use of broadened criteria are mixed (Juster et al., 2019; Lievens et al., 2016; Stegers-Jager, 2018; Stegers-Jager et al., 2015), while the outcomes from traditional criteria are rather consistent in disadvantaging ethnic minority and lower SES applicants (Griffin & Hu, 2015; Juster et al., 2019; Lievens et al., 2016; Puddey et al., 2011; Stegers-Jager et al., 2015). Additionally, the overall finding that men and applicants with a first-generation Western migration background had lower CV scores was not in line with a previous Dutch single-institution study (Stegers-Jager et al., 2015). Our study is the first to directly demonstrate that seemingly comparable tools can have differential effects on subgroup performance across different programs. Typically, broadened criteria allow for more variation and can be further adjusted to the specific program contents, which may be the cause of stronger context-dependent effects on subgroup performance for these tools. For instance, curriculum-sampling tests vary in their subject, preparatory materials, and preparation time. Additionally, previous research suggests that the complexity of the language (Lievens et al., 2016), and question format (Edwards & Arthur, 2007), may contribute to subgroup differences in test performance. Likewise, the scoring method and the type of extracurricular activities that are considered in CV scores may play a role, since healthcare experiences are considered to be unequally accessible to applicants from different backgrounds (Wouters, Croiset, Isik, et al., 2017).
A third key finding is that subgroup differences in performance on individual tools did not always have consequences for the probability of selection of those subgroups. We found that selection chances were significantly smaller only for applicants with a first-generation Western migration background and applicants with foreign education, two subgroups that had gone unnoticed in previous research. Combining tools with differential subgroup performance within and across procedures may have counter-balanced the overall effect, and the weightings of different tools may have played a role (Lievens et al., 2016; Stegers-Jager, 2018). Our findings are not fully supported by the results from a recent retrospective study that included applicants to all Dutch undergraduate HPE programs (Mulder et al., 2022). The authors found significantly lower selection probability for additional ethnic minority groups, men, and lower-SES groups, although the results were negligible in terms of statistical effect size (Chen et al., 2010). The discrepancy between findings may be explained by differences between target groups: the present study used prospective data from a subset of programs and included a more heterogeneous group of applicants, including older applicants and those with foreign education.
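The effect-size benchmark invoked above can be made concrete. A minimal illustrative sketch (not taken from either study) that labels the magnitude of an odds ratio using the benchmarks proposed by Chen et al. (2010)—who equate odds ratios of roughly 1.68, 3.47, and 6.71 with Cohen's d of 0.2, 0.5, and 0.8 under their assumptions—and converts an odds ratio to an approximate d via Chinn's logistic-distribution approximation:

```python
import math

# Chen, Cohen & Chen (2010) equate these odds ratios to Cohen's d of
# 0.2 (small), 0.5 (medium), and 0.8 (large), assuming a low base rate.
CHEN_THRESHOLDS = [(6.71, "large"), (3.47, "medium"), (1.68, "small")]

def odds_ratio_to_d(or_value: float) -> float:
    """Chinn's approximation: d = ln(OR) * sqrt(3) / pi."""
    return math.log(or_value) * math.sqrt(3) / math.pi

def classify_or(or_value: float) -> str:
    """Label the magnitude of an odds ratio (inverting ORs below 1)."""
    ratio = or_value if or_value >= 1 else 1 / or_value
    for threshold, label in CHEN_THRESHOLDS:
        if ratio >= threshold:
            return label
    return "negligible"

# A subgroup OR of 0.70 falls short of the 'small' benchmark (1/1.68):
print(classify_or(0.70))  # -> negligible
```

Under these benchmarks, statistically significant differences in selection probability can still be negligible in practical terms, which is the sense in which Mulder et al.'s (2022) results are described above.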
Strengths of our study include that we collected data from multiple programs and that we used a multilevel analytical approach, creating the opportunity to correct for and examine contextual differences. The typical Dutch admissions system, which allows schools to design their own selection procedure, allowed us to compare a variety of (applications of) tools. As a consequence, not all tools were used by all programs. Therefore, direct comparison across different outcome measures and examination of the correlation of performance between different tools were not possible. Another limitation is that although the present study is, to our knowledge, the first to include the selection procedures of a range of different types of undergraduate HPE programs, it was not possible to cover all specialties and institutions. This may have consequences for the generalizability of our findings. Furthermore, we included parental education as a relevant indicator of SES, since first-generation university applicants have been shown to face barriers during the transition into higher education (Stephens et al., 2014), but we may have overlooked other potentially relevant SES-related effects, such as parental income and profession (Girotti et al., 2020; Mulder et al., 2022; Steven et al., 2016). Likewise, migration background is a stable and objective indicator of ethnicity, but does not account for ethnic identity (Ross et al., 2020; Stronks et al., 2009). Another limitation is that sample sizes were small for certain subgroups, and those results should therefore be interpreted with caution. Finally, two selection procedures were partly affected by COVID-19 measures, potentially reducing the generalizability of our findings. Nevertheless, the effects on the probability of selection did not differ significantly between the four programs in that analysis, of which two were affected by COVID-19.
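The multilevel approach mentioned above can be illustrated schematically. In a hypothetical random-intercept logistic regression (not the study's actual model, coefficients, or data), each program receives its own intercept that absorbs contextual differences such as overall selectivity, while a subgroup coefficient shifts the log-odds of selection on top of it:

```python
import math

def selection_probability(beta0: float, program_intercept: float,
                          beta_subgroup: float, in_subgroup: bool) -> float:
    """Inverse logit of: overall intercept + program-specific deviation
    + subgroup effect (a simplified random-intercept logistic model)."""
    log_odds = beta0 + program_intercept + beta_subgroup * in_subgroup
    return 1 / (1 + math.exp(-log_odds))

# Illustrative values only: a negative subgroup coefficient lowers the
# odds of selection in every program, while the program intercepts
# capture context without altering the subgroup effect itself.
beta0, beta_subgroup = 0.4, -0.5
program_intercepts = {"A": 0.3, "B": -0.6}

for prog, u in program_intercepts.items():
    p_ref = selection_probability(beta0, u, beta_subgroup, False)
    p_sub = selection_probability(beta0, u, beta_subgroup, True)
    print(f"program {prog}: reference {p_ref:.2f}, subgroup {p_sub:.2f}")
```

Allowing the subgroup coefficient to vary across programs (a random slope) is what makes it possible to test whether a tool's subgroup effect is context-specific rather than uniform.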
The variety in subgroup differences between and within tools implies that future research should determine whether specific characteristics of tools play a moderating role in their effects on diversity. This could lead to the identification of best practices. Furthermore, based on our results we cannot draw conclusions with respect to the effect of different weightings of tools. Therefore, we endorse a previous suggestion to investigate the effects of different weightings of tools on student diversity (Stegers-Jager, 2018). Future studies should also examine whether selection tools differentially predict academic performance for different subgroups, to determine whether the performance disparities we found correspond with bias (i.e., underprediction or overprediction for certain subgroups). Finally, future research should identify the specific underlying characteristics and needs of subgroups of applicants with non-traditional backgrounds within the context of HPE selection, to provide better support during and, as suggested by others (Lievens, 2015; Wouters, 2020), also after selection. For instance, applicants with alternative forms of prior education may face difficulties managing expectations in HPE selection, which can differ strongly from their previous educational experiences (Katartzi & Hayward, 2020; Rienties & Tempelaar, 2013).
From a practical viewpoint, the context-specificity of subgroup differences in performance indicates that HPE programs need to establish continuous evaluation of the possible effects of their selection procedures on student diversity, rather than relying only on existing research in other contexts. Additionally, we encourage programs to conscientiously include and/or develop alternative tools that can reduce adverse impact and explicitly promote much-needed diversity, such as SJTs (Juster et al., 2019) and multiple mini-interviews (Griffin & Hu, 2015), while keeping in mind that effects can be context-specific. We acknowledge the desire to apply school-specific selection procedures, as selection procedures that align their contents with the curriculum can have high predictive value (Schreurs et al., 2020). At the same time, this creates a responsibility for programs to evaluate different aspects of the validity of their selection procedure, including adverse impact (Schreurs, 2020). Additionally, programs could consider validating their tools with diverse norming groups (Padilla & Borsato, 2008).
In conclusion, selection into undergraduate HPE programs can unintentionally impact student diversity, hindering equitable admission. Compared to traditional criteria, broadened criteria can reduce SES-related performance differences, but not disparities based on ethnicity. For broadened criteria, subgroup differences in performance also vary across contexts. We therefore call for continuous evaluation of the effects of selection on diversity, the identification of best practices within existing tools, the inclusion of tools with a positive or neutral impact on student diversity, and sufficient quality control.
References
Chen, H., Cohen, P., & Chen, S. (2010). How big is a big odds ratio? Interpreting the magnitudes of odds ratios in epidemiological studies. Communications in Statistics - Simulation and Computation, 39(4), 860–864. https://doi.org/10.1080/03610911003650383
Cohen, J. J., Gabriel, B. A., & Terrell, C. (2002). The case for diversity in the health care workforce. Health Affairs, 21(5), 90–102. https://doi.org/10.1377/hlthaff.21.5.90
Edwards, B. D., & Arthur, W. (2007). An examination of factors contributing to a reduction in subgroup differences on a constructed-response paper-and-pencil test of scholastic achievement. Journal of Applied Psychology, 92(3), 794–801. https://doi.org/10.1037/0021-9010.92.3.794
Fielding, S., Tiffin, P. A., Greatrix, R., Lee, A. J., Patterson, F., Nicholson, S., & Cleland, J. (2018). Do changing medical admissions practices in the UK impact on who is admitted? An interrupted time series analysis. British Medical Journal Open, 8(10), e023274. https://doi.org/10.1136/bmjopen-2018-023274
General Medical Council. (2015). Promoting excellence: standards for medical education and training, pp. 1–51.
Girotti, J. A., Chanatry, J. A., Clinchot, D. M., McClure, S. C., Swan Sein, A., Walker, I. W., & Searcy, C. A. (2020). Investigating group differences in examinees’ preparation for and performance on the new MCAT exam. Academic Medicine, 95(3), 365–374. https://doi.org/10.1097/acm.0000000000002940
Griffin, B., & Hu, W. (2015). The interaction of socio-economic status and gender in widening participation in medicine. Medical Education, 49(1), 103–113. https://doi.org/10.1111/medu.12480
Juster, F. R., Baum, R. C., Zou, C., Risucci, D., Ly, A., Reiter, H., Miller, D. D., & Dore, K. L. (2019). Addressing the diversity-validity dilemma using situational judgment tests. Academic Medicine, 94(8), 1197–1203. https://doi.org/10.1097/ACM.0000000000002769
Katartzi, E., & Hayward, G. (2020). Conceptualising transitions from vocational to higher education: Bringing together Bourdieu and Bernstein. British Journal of Sociology of Education, 41(3), 299–314. https://doi.org/10.1080/01425692.2019.1707065
Kim, K. H., & Zabelina, D. (2015). Cultural bias in assessment: Can creativity assessment help? International Journal of Critical Pedagogy, 6(2), 129–148.
Lievens, F. (2015). Diversity in medical school admission: Insights from personnel recruitment and selection. Medical Education, 49(1), 11–14. https://doi.org/10.1111/medu.12615
Lievens, F., Patterson, F., Corstjens, J., Martin, S., & Nicholson, S. (2016). Widening access in selection using situational judgement tests: Evidence from the UKCAT. Medical Education, 50(6), 624–636. https://doi.org/10.1111/medu.13060
Mason, H. R. C., Ata, A., Nguyen, M., Nakae, S., Chakraverty, D., Eggan, B., Martinez, S., & Jeffe, D. B. (2021). First-generation and continuing-generation college graduates’ application, acceptance, and matriculation to U.S. medical schools: A national cohort study. Medical Education Online. https://doi.org/10.1080/10872981.2021.2010291
Mathers, J., Sitch, A., & Parry, J. (2016). Population-based longitudinal analyses of offer likelihood in UK medical schools: 1996–2012. Medical Education, 50(6), 612–623. https://doi.org/10.1111/medu.12981
Morgan, H. K., Haggins, A., Lypson, M. L., & Ross, P. (2016). The importance of the premedical experience in diversifying the health care workforce. Academic Medicine, 91(11), 1488–1491. https://doi.org/10.1097/ACM.0000000000001404
Mulder, L., Wouters, A., Twisk, J. W. R., Koster, A. S., Akwiwu, E. U., Ravesloot, J. H., Croiset, G., & Kusurkar, R. A. (2022). Selection for health professions education leads to increased inequality of opportunity and decreased student diversity in The Netherlands, but lottery is no solution: A retrospective multi-cohort study. Medical Teacher. https://doi.org/10.1080/0142159X.2022.2041189
Niessen, A. S. M., & Meijer, R. R. (2016). Selection of medical students on the basis of nonacademic skills: Is it worth the trouble? Clinical Medicine, 16(4), 339–342. https://doi.org/10.7861/clinmedicine.16-4-339
Niessen, A. S. M., & Meijer, R. R. (2017). On the use of broadened admission criteria in higher education. Perspectives on Psychological Science, 12(3), 436–448. https://doi.org/10.1177/1745691616683050
Niessen, A. S. M., Meijer, R. R., & Tendeiro, J. N. (2018). Admission testing for higher education: A multi-cohort study on the validity of high-fidelity curriculum-sampling tests. PLoS ONE, 13(6), e0198746. https://doi.org/10.1371/journal.pone.0198746
Padilla, A. M., & Borsato, G. N. (2008). Issues in culturally appropriate psychoeducational assessment. In L. A. Suzuki & J. G. Ponterotto (Eds.), Handbook of multicultural assessment: Clinical, psychological, and educational applications (pp. 5–21). Jossey-Bass/Wiley.
Patterson, F., Knight, A., Dowell, J., Nicholson, S., Cousans, F., & Cleland, J. (2016). How effective are selection methods in medical education? A systematic review. Medical Education, 50(1), 36–60. https://doi.org/10.1111/medu.12817
Puddey, I. B., Mercer, A., Carr, S. E., & Louden, W. (2011). Potential influence of selection criteria on the demographic composition of students in an Australian medical school. BMC Medical Education, 11(1), 1–12. https://doi.org/10.1186/1472-6920-11-97
Rienties, B., & Tempelaar, D. (2013). The role of cultural dimensions of international and Dutch students on academic and social integration and academic performance in the Netherlands. International Journal of Intercultural Relations, 37(2), 188–201. https://doi.org/10.1016/j.ijintrel.2012.11.004
Romero, R., Miotto, K., Casillas, A., & Sanford, J. (2020). Understanding the experiences of first-generation medical students: Implications for a diverse physician workforce. Academic Psychiatry, 44(4), 467–470. https://doi.org/10.1007/s40596-020-01235-8
Ross, P. T., Hart-Johnson, T., Santen, S. A., & Zaidi, N. L. B. (2020). Considerations for using race and ethnicity as quantitative variables in medical education research. Perspectives on Medical Education, 9(5), 318–323. https://doi.org/10.1007/s40037-020-00602-3
Schreurs, S. (2020). Selection for medical school: The quest for validity. https://doi.org/10.26481/DIS.20200320SS
Schreurs, S., Cleutjens, K. B. J. M., Cleland, J., & oude Egbrink, M. G. A. (2020). Outcomes-based selection into medical school: Predicting excellence in multiple competencies during the clinical years. Academic Medicine, 95(9), 1411. https://doi.org/10.1097/ACM.0000000000003279
Stegers-Jager, K. M. (2018). Lessons learned from 15 years of non-grades-based selection for medical school. Medical Education, 52(1), 86–95. https://doi.org/10.1111/medu.13462
Stegers-Jager, K. M., Steyerberg, E. W., Lucieer, S. M., & Themmen, A. P. N. (2015). Ethnic and social disparities in performance on medical school selection criteria. Medical Education, 49(1), 124–133. https://doi.org/10.1111/medu.12536
Stemig, M. S., Sackett, P. R., & Lievens, F. (2015). Effects of organizationally endorsed coaching on performance and validity of situational judgment tests. International Journal of Selection and Assessment, 23(2), 174–181. https://doi.org/10.1111/ijsa.12105
Stephens, N. M., Hamedani, M. G., & Destin, M. (2014). Closing the social-class achievement gap: A difference-education intervention improves first-generation students’ academic performance and all students’ college transition. Psychological Science, 25(4), 943–953. https://doi.org/10.1177/0956797613518349
Steven, K., Dowell, J., Jackson, C., & Guthrie, B. (2016). Fair access to medicine? Retrospective analysis of UK medical schools application data 2009–2012 using three measures of socioeconomic status. BMC Medical Education, 16(1), 1–10. https://doi.org/10.1186/s12909-016-0536-1
Stronks, K., Kulu-Glasgow, I., & Agyemang, C. (2009). The utility of “country of birth” for the classification of ethnic groups in health research: The Dutch experience. Ethnicity & Health, 14(3), 255–269. https://doi.org/10.1080/13557850802509206
Van Den Broek, A., Mulder, J., De Korte, K., Bendig-Jacobs, J., & Van Essen, M. (2018). Selectie bij opleidingen met een numerus fixus & de toegankelijkheid van het hoger onderwijs [Selection in numerus fixus programs and the accessibility of higher education]. Study commissioned by the Dutch Ministry of Education, Culture and Science (OCW).
Wouters, A. (2020). Getting to know our non-traditional and rejected medical school applicants. Perspectives on Medical Education, 9(3), 132–134. https://doi.org/10.1007/s40037-020-00579-z
Wouters, A., Croiset, G., Isik, U., & Kusurkar, R. A. (2017a). Motivation of Dutch high school students from various backgrounds for applying to study medicine: A qualitative study. British Medical Journal Open, 7(5), e014779. https://doi.org/10.1136/bmjopen-2016-014779
Wouters, A., Croiset, G., Schripsema, N. R., Cohen-Schotanus, J., Spaai, G. W. G., Hulsman, R. L., & Kusurkar, R. A. (2017b). Students’ approaches to medical school choice: Relationship with students’ characteristics and motivation. International Journal of Medical Education, 8, 217–226. https://doi.org/10.5116/ijme.5921.5090
Acknowledgements
We would like to thank everyone from the participating institutions who helped us with data collection.
Funding
Funding was provided by Nationaal Regieorgaan Onderwijsonderzoek (Grant No. 40.5.18650.007).
Author information
Contributions
SF, KS, and AMW substantially contributed to the conception and design of the study. KS, RVG, MG, JR and AW were responsible for data collection. SF and PA analyzed the data. SF wrote the first draft of the article. All authors interpreted the data, revised it critically for intellectual content, approved the final manuscript for publication, and have agreed to be accountable for all aspects of the work in ensuring that questions related to its accuracy or integrity are appropriately investigated and resolved.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
The present study was carried out in accordance with the Declaration of Helsinki and was deemed exempt from full review after evaluation by the Medical Ethics Committee of Erasmus University Medical Center, Rotterdam.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Fikrat-Wevers, S., Stegers-Jager, K.M., Afonso, P.M. et al. Selection tools and student diversity in health professions education: a multi-site study. Adv in Health Sci Educ 28, 1027–1052 (2023). https://doi.org/10.1007/s10459-022-10204-9