Abstract
Interpretation of images and spatial relationships is essential in medicine, but the evidence base on how to assess these skills is sparse. Thirty medical students were randomized into two groups (A and B), and invited to “think aloud” while completing 14 histology MCQs. All students answered six identical MCQs, three with only text and three requiring image interpretation. Students then answered eight “matched” questions, where a text-only MCQ on version A was “matched” with an image-based MCQ on version B, or vice versa. Students’ verbalizations were coded with a realist, inductive approach and emerging codes were identified and integrated within overarching themes. High-performing students were more likely to self-generate an answer as compared to middle and lower performing students, who verbalized more option elimination. Images had no consistent influence on item statistics, and students’ self-identified visual-verbal preference (“learning style”) had no consistent influence on their results for text or image-based questions. Students’ verbalizations regarding images depended on whether interpretation of the adjacent image was necessary to answer the question or not. Specific comments about the image were present in 95% of student-item verbalizations (142 of 150) if interpreting the image was essential to answering the question, whereas few students referred to images if they were an unnecessary addition to the vignette. In conclusion, while assessing image interpretation is necessary for authenticity and constructive alignment, MCQs should be constructed to include only information and images relevant to answering the question, and avoid adding unnecessary information or images that may increase extraneous cognitive load.
Introduction
Visuospatial skills are an intrinsic element of medicine and medical sciences, and the disciplines of anatomy and histology are typically the first areas of the medical curriculum where students will experience the need to develop and display skills of identification and interpretation. How these skills are introduced, internalized, and ultimately assessed has direct relevance to educators, particularly as technological advances have led to image-based resources and assessments being more easily, and so increasingly, incorporated into curricula.
Much education has moved online in recent years, with many institutions now teaching histology by means of virtual microscopy or computer-based programs [1,2,3,4,5,6,7]. The use of images in teaching and learning is well described, where the dual-channels assumption of the multiple representation principle proposes that learners process information primarily through separate auditory-verbal and visual-pictorial channels [8,9,10,11]. Students will have differences in cognitive ability, learning styles, and preferences along the visualizer–verbalizer dimension, and the concept of teaching to learning styles or preferences is still pervasive in education [12,13,14]. However, there is a dearth of evidence to support teaching to individual learning styles or preferences [15, 16], with “no adequate evidence base to justify incorporating learning styles assessments into general educational practice” [17]. Instead, the evidence base demonstrates that students benefit from learning with a combination of images and verbal information, to balance incoming information between these two main channels [18, 19].
Information on the effect of images in assessments is more limited, perhaps in part due to the historical challenges in preparing and including images in unique examinations. Nowadays, digital photography, printing, and online assessments mean that image reproduction and inclusion has become a straightforward task [20,21,22,23,24]. While all assessment methods have different strengths and weaknesses, multiple-choice questions (MCQs) are extremely time-efficient, allowing broad sampling across the curriculum, and so remain a core component of most programs of assessment [25,26,27]. Medicine and the medical sciences require accurate identification of clinical signs, anatomical parts, and histological features, and while precise verbal descriptions could be included within clinical vignettes, doing so may make text overly grammatically convoluted or complex [25, 27,28,29]. This issue is even more relevant for institutions with substantial numbers of non-native English speakers, studying medicine in their second (or third) language, and so conscious consideration should be given to only include construct-relevant language, an inherent part of the technical vocabulary of medical sciences, while minimizing linguistic clutter and irrelevant grammatical complexity [10, 27, 30].
Many learning outcomes in histology also require that students identify structures by visual inspection and interpretation [31]. While principles of constructive alignment require that these learning outcomes be assessed using images, the evidence base on how to do so is sparse, with variable outcomes [23, 32]. The bespoke 70-plate booklet of illustrations used by Hunt et al. in the 1970s was undoubtedly of high quality, and improved the authenticity of the assessment, but its use negatively impacted candidates’ scores, as they repeatedly switched focus between this booklet and reading the questions on the examination paper [20]. This phenomenon is described as the spatial continuity effect and can be avoided by placing text and images adjacent to each other in either printed or online assessments [9, 33]. More recent studies with well-aligned text and images found no evidence of a consistent effect on item difficulty or discrimination [24, 34, 35].
So, where lies the balance within authentic assessment of an undergraduate histology curriculum? Inclusion of some images within assessments is now a simple matter, but does an accompanying image provide candidates with an additional advantage or cue when answering a question, or is it a distracting increase to cognitive load [36, 37]? This study aims to address this gap by investigating how images influence medical students’ reasoning in histology MCQs, specifically (1) the cognitive processes and critical thinking of students while answering single best answer MCQs in histology, (2) whether images influence the verbalized cognitive processes of participants, and (3) whether self-identified verbal and visual learners display different verbalizations or cognitive processes when answering text and image-based MCQs.
Materials and Methods
Ethics Approval, Student Recruitment, and Anonymization
Ethical approval for this study was received from the Research Ethics Committee of the Royal College of Surgeons in Ireland (reference RCSI-REC1132). Students in their first year of both the Direct (Undergraduate, DEM, 340 students) and Graduate Entry Medical (GEM, 80 students) programs were invited to participate by means of a forum post (with attached Participant Information Leaflet) and to contact the principal investigator by e-mail if they wished to volunteer. All students who did so were assigned a unique participant number for pseudoanonymization by a gatekeeper who had no role or responsibility in teaching or assessing medical students. Within RCSI’s School of Medicine, the histology course was taught by self-directed online tutorials, within the first year of the curriculum, integrated into the systems-based, multidisciplinary modules, and students were advised to study the “Endocrine System” online histology tutorial prior to their interview (which was part of their normal course content for the semester) [35, 38].
Preparation of Multiple-Choice Questions
Two examination papers with 14 multiple-choice questions were prepared by two content experts, with six identical anchor MCQs on each test (Table 1; Supplementary Information). Three anchor MCQs had textual vignettes only, and three items required interpretation of an adjacent image (“required image”). The remainder of MCQs on each test were matched, where one test had an MCQ with a textual vignette, and the matched MCQ on the other test included an image (Table 1; Supplementary Information). The image-based MCQ contained either (a) identical text along with an image containing information complementary but non-essential to answering the question (“redundant image”) or (b) a modified textual vignette, with removal of details critical to answering the question, and an image added to provide that required information or context (“required image”).
Interviews
Students then met individually with one of the interviewers, and were given an opportunity to ask questions before giving formal consent, then randomly assigned to either version A or version B of the test (Table 1; Supplementary Information). Students completed demographic questions regarding their educational level and linguistic abilities (native and known languages), and then the Verbal–Visual Learning Style Rating (VVLSR; 7-point Likert), to identify whether they self-identified as predominantly verbal or visual learners [12]. Students were given some guidance on verbalizing their thoughts (“think-aloud”) and asked to answer two practice questions, voicing their thoughts as they completed these questions. Students then completed their 14 MCQs, while continuing to verbalize their thoughts for recording and transcription.
Analyses
Quantitative Data
All demographic and test data were collated and tabulated in MS Excel, then imported to STATA 17.0 for statistical analysis (StataCorp., College Station, TX). A caveat must be stated that the primarily qualitative focus of this study, and the small number of participants (n = 30), limits the statistical power and thus the interpretation of quantitative statistical analyses. Differences were considered significant for values of p < 0.05 for all (parametric) statistical analyses performed in this study, with the mean and standard deviation used to summarize students’ scores. For analysis of the Verbal–Visual Learning Style Rating, the mean was again chosen as the measure of central tendency, and comparisons were performed by means of independent t-tests [39,40,41,42,43]. Item psychometrics were calculated in STATA, with item discrimination calculated by means of a point biserial correlation (pwcorr, a true Pearson product-moment correlation), with a higher positive correlation for an MCQ indicating that students who achieved a high score on the overall test also scored higher on that individual MCQ.
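The item statistics described above (item facility as the proportion of candidates answering correctly, and discrimination as a Pearson point-biserial correlation between item score and total test score) can be illustrated in code. This is a minimal sketch with toy data for readers unfamiliar with these psychometrics, not the study's STATA analysis:

```python
# Illustrative sketch only (toy data, not the study dataset):
# item facility and point-biserial discrimination for 0/1 item responses.
import math

def item_facility(item_scores):
    """Proportion of candidates answering the item correctly (0..1)."""
    return sum(item_scores) / len(item_scores)

def point_biserial(item_scores, total_scores):
    """Pearson product-moment correlation between a dichotomous item
    score and the candidate's total test score (as with Stata's pwcorr)."""
    n = len(item_scores)
    mean_i = sum(item_scores) / n
    mean_t = sum(total_scores) / n
    cov = sum((i - mean_i) * (t - mean_t)
              for i, t in zip(item_scores, total_scores)) / n
    sd_i = math.sqrt(sum((i - mean_i) ** 2 for i in item_scores) / n)
    sd_t = math.sqrt(sum((t - mean_t) ** 2 for t in total_scores) / n)
    return cov / (sd_i * sd_t)

# Toy data: 5 candidates x 3 items (1 = correct, 0 = incorrect)
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
]
totals = [sum(row) for row in responses]
item1 = [row[0] for row in responses]
print(item_facility(item1))                    # -> 0.8
print(round(point_biserial(item1, totals), 3))  # -> 0.772
```

A high positive point-biserial indicates that candidates who scored well overall also tended to answer that individual item correctly, matching the interpretation given above.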
Qualitative Analysis
Transcripts were imported into QSR International’s NVIVO 11 qualitative data analysis software (QSR International Pty. Ltd.), which was used for all further thematic analyses. While some codes and themes were anticipated from prior reading of the existing evidence base, all transcripts were initially read for familiarity, then coded with a realist, inductive approach, with anticipated and additional emerging codes identified and integrated within themes [44,45,46,47,48]. The process was iterative, with the thematic structure undergoing revisions throughout, with coding and analysis shared and discussed between co-authors throughout the process [49]. Following development of the final thematic framework, a final formal analysis of 20% of scripts was performed by an additional coder for comparison with this final schema and themes.
Results
Quantitative Analyses and Item Statistics
All students completed the interview well within the 30 min allotted, with a mean recording length of 14 min and 5.5 s. Cronbach’s alpha (scale reliability coefficient) for version A of the test was 0.68, while version B was 0.67. Comparing students’ scores on the test papers overall, there was no statistically significant difference observed between students who completed version A of the test (M = 10, SD = 1.77) as compared to those who completed version B (M = 10.5, SD = 2.03; t(28) = − 0.7663, p = 0.45). Item statistics for the six identical anchor MCQs showed no statistically significant difference in the item facility observed on version A of the test (M = 0.63, SD = 0.23) as compared to version B (M = 0.67, SD = 0.29; t(5) = − 0.8076, p = 0.46). Similarly, no significant differences were observed in point-biserial correlation (0.266 ± 0.28 vs 0.384 ± 0.19; t(5) = − 0.8611, p = 0.43) of these six anchor MCQs as answered by students completing either version A or B of the test.
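The scale reliability coefficients reported above were computed in STATA; for transparency, the standard Cronbach's alpha calculation can be sketched as follows (illustrative toy data, not the study dataset):

```python
# Illustrative sketch only: Cronbach's alpha for a 0/1 response matrix,
# using alpha = k/(k-1) * (1 - sum(item variances) / variance(totals)).

def _pvar(xs):
    """Population variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(responses):
    k = len(responses[0])                     # number of items
    items = list(zip(*responses))             # item-wise columns
    totals = [sum(row) for row in responses]  # per-candidate total scores
    return k / (k - 1) * (1 - sum(_pvar(i) for i in items) / _pvar(totals))

responses = [  # 5 candidates x 3 items, 1 = correct
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
]
print(round(cronbach_alpha(responses), 3))  # -> 0.794
```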
For the remainder of the MCQs on the paper, these matched image MCQs either had (a) identical text and an additional (redundant) image that was not essential to answering the question or (b) modification of the vignette and substitution of textual information with an image (Table 1; Supplementary Information). Comparing the text-only MCQs with their match that had an identical textual vignette and an additional “redundant” image attached, a slight and non-significant reduction in item facility (0.75 vs 0.67; p = 0.08) was observed, but with no demonstrable impact on point-biserial correlation (0.43 vs 0.42; p = 0.96). Comparing text-only MCQs with their match containing modified text and an image requiring interpretation showed no statistically significant difference in either item facility (0.88 vs 0.88; p = 1.00) or point-biserial correlation (0.25 vs 0.20; p = 0.84).
There was no difference observed in the students’ self-identified VVLSR when comparing students who completed version A of the test (M = 4.67, SD = 1.5) with those who completed version B (M = 4.47, SD = 1.85; t(28) = 0.3259, p = 0.75). There was no significant difference between those who identified as more verbal, more visual, or equal learners with regard to overall score, or subscores on anchor MCQs, on text-only MCQs, or on MCQs with images overall (Table 2). For scores on the MCQs with redundant images, self-identified visual learners received a lower score on these two MCQs (1.1 ± 0.72) than either verbal (1.5 ± 0.53) or equal learners (1.67 ± 0.52), but this did not reach statistical significance (F(2, 27) = 1.93; p = 0.165; Table 2).
Qualitative Exploration of Cognitive Processes
All 30 students each verbalized their responses to 14 questions, resulting in 420 student-item verbalizations. The verbalizations and cognitive processes observed are organized under three main themes within which sub-themes were developed (Table 3). The first theme concerned non-inferential description of the students’ vocalizations or observed behaviors, including sub-themes of reading the vignette fully (verbatim), linguistic slips or mispronunciations, admitting knowledge deficits, or returning to review or reread MCQs prior to completing the paper [48]. The second theme involved identification of reasoning or cognitive strategies that students used to answer the question, incorporating sub-themes including generating a correct answer from ready knowledge before reviewing options, using option elimination to select an answer, or selecting an incorrect option with no obvious verbalization or consideration of the correct option (premature closure) [34, 48]. Another sub-theme was whether students noticed and used the deliberate vertical cues that were inserted into the papers, whereby information in one MCQ aided in answering another MCQ on the paper. The third theme included all observed verbalizations and inferred cognitive processes specifically related to image identification or analysis: analytical, non-analytical, and image not mentioned [46, 47]. Analytical observations made specific reference to features such as scale, shape, or color to deduce the answer. Non-analytical observations gave no indication as to how (named) features were identified, and for many of the image-based MCQs, there were simply no verbalizations related to the image at all.
Student Performance
Most students read the vignette and question aloud, fully, and verbatim (403 of 420 student-item verbalizations; Tables 3, 4), but the list of options was seldom read in the same systematic manner (59 of 420 student-item verbalizations). High-performing students were significantly more likely to self-generate an immediate answer to MCQs, without any verbalization indicating that they had read the full option list (74 of 126 student-item verbalizations; 59%), than medium (94 of 210; 44.8%) or lower (26 of 84; 31%) performing students (F(2, 27) = 6.60, p = 0.0046; Tables 3, 4). Not all apparently self-generated answers were correct. There were 43 verbalizations where a student reached (incorrect) closure prematurely, generating an answer by selecting an incorrect option, without any verbal indication that the correct option had been read or considered (Table 3). Premature closure was observed more frequently in verbalizations from lower performing students (16 of 84 student-item verbalizations; 19.1%) as compared to medium (23 of 210; 11%) or high (4 of 126; 3.2%) performing students (F(2, 27) = 7.33, p = 0.0029; Table 4).
Unsurprisingly, lower performing students were more likely to verbalize about knowledge deficits, or being uncertain, than medium- or high-performing students (F(2, 27) = 5.31, p = 0.0114; Tables 3, 4). Lower performing students also appeared to have more difficulty or delay in answering MCQs (22 of 84 student-item verbalizations; 26.2%), compared to medium (30 of 210; 14.3%) or high (1 of 126; 0.8%) performing students (F(2, 27) = 7.05, p = 0.0034; Tables 3, 4). Lower performing students were also more likely to return and review MCQs for a second, or even a third time, before completing the test (F(2, 27) = 3.18, p = 0.0577; Tables 3, 4). This study was designed to include a small number of vertical cues on each paper, but few students appeared to notice these, as they were remarked upon in only eight verbalizations, three of which came from a single student (Tables 3, 4).
Image Interpretation
Students verbalized more observations when answering MCQs containing an image which was necessary or essential to answering the question, whereas redundant images were unlikely to be mentioned by students at all (X2(2) = 133.0720, p < 0.001; Fisher’s exact test, p < 0.001; Table 5). Students who self-identified as verbal learners were more likely to have a verbally analytical approach to answering MCQs with images, making specific comments about the scale, shape, features, or colors within the image (X2(4) = 17.8040, p = 0.001; Fisher’s exact test, p = 0.001; Table 6).
Right it’s not methylene blue because I don’t see any blue indications in the image. P01
So, I know that the predominant stain for an awful lot of the slides was the H&E one and what I’m looking at doesn’t look as pink as some of those. I’m going to scratch out A which is eosin and B which is haematoxylin. Definitely not methylene blue because they look kind of red. P11
Okay, I have to analyse this image because there is no colloid, so I don’t think it will be thyroid. No follicles evident. It does look like it has two lobes though so it could be pituitary. It’s probably pancreas, no I don’t think it would be pancreas. It has no follicles as well. I don’t think it would be adrenal either, just because split in two I’ll go with pituitary. P12
Em, okay, this looks like its pointing at a thing in between the big things, so I’m going to guess that it’s interstitial cell or a leydig cell, so testosterone is what I will choose. P21
Well it looks like there is two distinct stains, one is lighter than the other, em, and one’s bigger than the other. So it looks like an anterior and a posterior pituitary to me, so I’m going to say pituitary. P29
Visual learners were more likely to make non-analytical comments about the image, where students would mention the image, perhaps even naming a structure seen within it, but giving no verbal indication as to how they had identified, interpreted, or analyzed it.
I’m going to go with testosterone because I feel like they look like leydig cells. P01
so the arrow’s pointing the posterior pituitary which makes oxytocin and ADH, so the only answer is oxytocin. P13
Em, there’s a picture as well, em, so I suppose the picture is just to remind you P17
So I know that this is going to be the glomerulus, this is the glom, this is the fasculata and this is reticularis so that’s going to be your androgens. P18
…for this question I didn’t really use the image on the right since it wasn’t really useful to me, since it didn’t relate to the actual thought process. P30
Discussion
This study sought to explore (1) the cognitive processes and critical thinking of students while answering single best answer MCQs in histology, (2) whether images influence the verbalized cognitive processes of students, and (3) whether self-identified verbal and visual learners display different verbalizations or cognitive processes when answering text and image-based MCQs. The “think-aloud” method explores metacognition through the lens of viewing thinking as inner speech, where people externally vocalize their inner monologue, and is accepted as a valid research methodology to explore reasoning and problem-solving in many fields, including medicine [34, 47, 48, 50,51,52]. There are some criticisms, such as the potential for this ongoing verbalization to cause people to use limited cognitive resources on incidental processing, leaving less cognitive capacity for essential processing, or to potentially interrupt or influence the internal voice [9, 53, 54].
Another potential issue is that not all thinking or cognition is performed in an analytical manner, subject to being easily verbalized. Intuitive leaps, unconscious biases, subconscious pattern recognition—these subconscious thoughts will not be captured by verbalization of an inner monologue, although they may still heavily influence decision making, particularly when addressing complex questions or contexts [53, 55,56,57,58]. The finding that high-performing students were significantly more likely to self-generate an answer as compared to middle and lower performing students is consistent with observations in related studies of reasoning [46, 48, 59] and is the theoretical basis for the development and use of the very-short-answer question format [60]. Lower performers were not only more likely to verbalize about knowledge deficits, but also more likely to go back to check or change answers than other students, a finding also consistent with previous studies [48].
Image recognition and interpretation are key skills in many of the medical sciences [31, 34, 61,62,63,64,65]. Therefore, the principles of constructive alignment mean that visual interpretation and analysis should be an integral, albeit proportional, part of assessment strategy and design [32]. While few assessments may specifically assess these skills of visual interpretation and analysis, those that do are typically well received by students, who appreciate their authenticity in preparing them for clinical practice, sentiments mirrored by students in this study [11, 66, 67]. While much prior research has reported that inserting images into a single best answer MCQ has no influence on overall item psychometrics per se [35, 68,69,70,71], other studies have reported inconsistent effects and hypothesized that these effects are due to the qualities or characteristics of the image used [24]. Other studies have contrasting findings, reporting that students’ scores are higher when answering MCQs with images [67], or conversely that the inclusion of images reduces item facility (the inverse of item difficulty), lowering scores, potentially due to increasing extraneous cognitive load or spatial contiguity effects [18, 20].
Despite the small number of MCQs with redundant images in this study, the manner in which these images went mostly unmentioned in students’ verbalizations, along with the reduced item facility for these MCQs, strongly suggests that redundant images are a hindrance in assessments, not a help, and should not be included within MCQ vignettes. While no other comparable research has been done to date within medical assessment, the inclusion of “irrelevant, redundant or interacting sources of information” in arithmetic examinations is also suggested to slow down the speed at which students are able to process information, leading to increased testing time and item difficulty (the inverse of item facility) [37]. This coherence effect strongly suggests that excessive detail reduces the capacity available for essential information processing, and is thus potentially detrimental to students’ performance in assessments [9, 19, 72, 73].
However, this study also demonstrated that the inclusion of an image that was essential to correctly answering an MCQ did not appear to have any significant influence on item psychometrics or observed verbalizations as compared to text-alone vignettes. Thus, the use of images in MCQ vignettes written specifically to test candidates’ ability to identify or interpret required images not only poses no threat to validity, but is logically required according to the principles of constructive alignment [27, 32, 68,69,70,71]. Furthermore, students’ self-identified VVLSR had no discernable influence on their objective scores when answering verbal or visual MCQs. Verbal learners were significantly more analytical in their verbalizations when answering image-based MCQs than visual learners, but learners who self-identify as being verbal learners may de facto experience a more analytical inner verbal monologue than those who self-identify as visual learners [12, 16, 17, 58].
Additional factors, such as the quality of the images provided for candidates, along with their spatial (or temporal) relationship to the placement of the question text, do merit some conscious consideration when writing MCQs [9, 19, 20, 27, 33, 34, 74]. Where text and image are spatially separated on separate sheets or screens, some processing capacity will be diverted from image interpretation by the necessity to switch visual focus, looking back and forth between text and image [19, 20, 73]. Where adjacent placement is not possible, and the image and text must be separated, research on cognition suggests the text should precede the image to provide context, with the caveat that this has not yet been definitively researched in MCQ assessments [19, 33].
The hypothesis that the characteristics or complexity of the images used in MCQ vignettes will affect item statistics and metacognition has been recognized by numerous authors [11, 19, 24, 34, 75]. Sagoo et al. found that students scored significantly higher on questions with images (both anatomical and radiological) compared to questions without images [11]. Further analysis considering image subtypes demonstrated that “students performed significantly better on questions referring to bones than to soft tissues regardless of the image type [anatomical or radiological]” suggesting that visual interpretation of an isolated structure (a bone) is less complex than the synthesis of information required to interpret images of interrelated and intersecting soft tissues [11]. The simplicity or complexity of both verbal vignettes and images may be accounted for within assessment strategies or processes, for example mapping to cognitive taxonomies [23, 76,77,78]. Analysis of histological and cross-sectional images requires interpreting complex “categorical spatial relations,” whereby the relationships between objects must be judged [34, 79]. Students seem to struggle more with MCQs displaying cross-sectional illustrations, as compared to those which use simpler diagrams or line drawings [24], and to demonstrate different cognitive processes when answering MCQs nested within cross-sectional themes, with more reliance on option elimination, and less visualizing or verbal reasoning being described [34]. It is essential for students to consciously develop and improve these visual and spatial interpretive skills by training and practice, as opposed to passively noting what is pointed out to them, or only memorizing a limited set of exemplars, so that they can apply their knowledge when viewing unfamiliar or novel images, whether in assessments or in future independent practice [14, 19, 75, 79, 80]. 
For this reason, a number of novel images should perhaps be used within assessments if the aim is to truly test students’ abilities of image interpretation, quite apart from the argument that the use of familiar images may promote positive cueing [24, 34, 36], an item flaw that may also be present in purely textual vignettes [81].
Interpretation of images and spatial relationships is essential in many disciplines, and so including images within assessments aids authenticity and constructive alignment. While the effects of visual or multimedia learning have been explored in many contexts, guidelines for assessment are still sparse, but some basic principles can be considered. Firstly, is the image relevant and essential to answering the question? Redundant information, including images, may simply increase extraneous cognitive load to no benefit, potentially influencing student performance [19, 37, 72, 80]. Secondly, does the image show the relevant structure in isolation, such as an individual bone, or is it seen in relation to surrounding structures, as is the case with a histological cross section, or an abdominal CT scan? Interpreting spatial relations is certainly appropriate for many assessments, but increases the difficulty of the task or question. Thirdly, consider whether the candidate is presented with a familiar image, seen and studied during their learning activities, or an entirely unfamiliar one. Novel images can be true tests of a candidate’s ability to demonstrate their knowledge and skill at image interpretation, but are potentially more cognitively demanding than recognizing a familiar image, or at least one similar to previous images studied. The realism of the image could also be considered; is the image a simple diagram, or is it a photograph of an actual histological or anatomical specimen [82]? Finally, for formatting, the spatial contiguity principle states that images should be as close as possible to their corresponding text, so that they may be viewed simultaneously without a need to switch focus, as opposed to being on a separate page or screen [19, 33].
Limitations of the Study
This research study was designed as a qualitative think-aloud exploration of cognitive processes. While quantitative statistical analyses were performed and reported, the small number of participants (n = 30) limits the statistical power of these quantitative analyses. Furthermore, the cohort of students who participated were all volunteers, and so entirely self-selecting. While the study was conducted in a single institution, RCSI encompasses highly diverse student and staff bodies, and in this study of 30 students only 11 recorded their nationality as being within the EU, with students from Asia, North America, and the Middle East all represented. Only 12 students were monolingual English speakers, 15 identified as bilingual, with two students speaking three languages fluently and one individual fully confident in four. However, the authors hope that the findings of this study will stimulate further interest and provide some supporting evidence for future investigation in this field.
Conclusions
In summary, high-performing students were significantly more likely to self-generate an answer as compared to middle and lower performing students, who relied to a greater degree on option elimination. Adding images to MCQs did not have a consistent influence on item statistics, and the students’ self-identified visual-verbal preference (“learning style”) had no consistent bearing on their results for text or image-based questions. Students’ verbalizations regarding images were highly dependent on whether the image was necessary or unnecessary to answering the question. For MCQs where interpretation of the image was required, specific references to the image were noted for 95% of student-item verbalizations (142 of 150 maximum). In contrast, for MCQs where the image was redundant or unnecessary to answering the MCQ, reference to the image was recorded in only 17% of student-item verbalizations (10 of 60 maximum). This finding aligns with the principles of question writing, whereby MCQ vignettes should not be cluttered with unnecessary information that does not help with cueing or answering MCQs, and may instead be a detrimental distraction, adding to extraneous cognitive load.
Funding
Open Access funding provided by the IReL Consortium. This work was supported by an Education Research Grant from the Irish Network of Healthcare Educators (INHED).
Ethics declarations
Conflict of Interest
The authors declare no competing interests.
Supplementary Information
Interview sheets & MCQs.
Cite this article
Holland, J., McGarvey, A., Flood, M. et al. A Qualitative Exploration of Student Cognition When Answering Text-Only or Image-Based Histology Multiple-Choice Questions. Med.Sci.Educ. (2024). https://doi.org/10.1007/s40670-024-02104-x