Introduction

It is internationally recognised that special measures should be considered for children and vulnerable adults when they give evidence in court (International Criminal Court, 2023; Ministry of Justice, 2023). Live links to court allow a witness to give evidence outside the courtroom with the aim of reducing distress and improving the quality of evidence (Ministry of Justice, 2023). Remote forensic interviews via video or audio link are not currently recommended without compelling reasons, despite successful use of video-conferencing software in a range of other legal situations, and for suspects under certain conditions (International Criminal Court, 2023; PACE Code, 2022). During the COVID-19 pandemic, recommendations were made to address any digital exclusions which limit children’s access to support and services (Romanou & Belton, 2020). Despite some changes in remote interview guidance for adult witnesses during the COVID-19 pandemic, prohibitions still apply internationally for vulnerable witnesses, including all witnesses under the age of 18, due to concerns for child welfare and evidence reliability (Institute for International Criminal Investigations, 2021; NSSGII, 2020).

A small body of work examining the use of tele-forensic interviewing (tele-FI) for children has shown minimal differences in information quantity/accuracy, and verbal misinformation acceptance, when compared to face-to-face interviews (Dickinson et al., 2021; Doherty-Sneddon & McAuley, 2000; Hamilton et al., 2017). To our knowledge, no research has examined gestural misinformation acceptance in relation to tele-FI, despite face-to-face research showing robust evidence for the gestural misinformation effect (GME) in children (Johnstone et al., 2023; Kirk et al., 2015; Meyer et al., 2023). For example, research has shown that when children are asked if they remember what someone looked like, exposure to a leading gesture for a hat can lead them to believe the person was wearing a hat, even if this contradicts what they have seen (Johnstone et al., 2023). To address this gap in research, the present study aimed to evaluate the GME during tele-FI and compare this to face-to-face interviews.

Child Maltreatment

Based on 38 systematically reviewed past-violence surveys from 96 countries, Hillis et al. (2016) estimated that one in two children aged 2–17 experienced child abuse worldwide in 2016, with prevalence rates differing by gender and/or continent (Moody et al., 2018). During the COVID-19 pandemic, child abuse rates increased even further, mainly due to heightened family tension and a reduction in protective services globally (Bourgault et al., 2021; Romanou & Belton, 2020). Despite this, official reports of child maltreatment decreased internationally, likely due to reduced access to professionals such as social workers and teachers (Kourti et al., 2023). Social distancing measures placed extra strain on the police service, with gathering witness statements in line with Best Evidence Interview guidelines viewed as one of the biggest problems, particularly for vulnerable witnesses such as children (HMICFRS, 2021). Forces in England and Wales adapted by taking witness statements by phone, while temporary interview protocols were put in place to allow audio/video conferencing technology to be used for legal advice (HMICFRS, 2021).

Tele-Forensic Interviewing

Video-conferencing software is successfully used in a variety of legal and medical contexts such as child telepsychiatry (McGuinness & Ellington, 2011), tele-psychological services (Batastini et al., 2016), custody evaluations (Dale & Smith, 2021), tele-mental health (Gloff et al., 2015), and forensic evaluations (Luxton & Lexcen, 2018). Court live links, routinely used as part of special measures for children, allow a child to give evidence in court via live video-conferencing software. Findings indicate that this lessens a child’s distress (Landström & Granhag, 2010) and improves the quality of evidence given by increasing resistance to verbally leading or misleading questioning (Goodman et al., 1998). As an investigative tool, tele-FI offers many benefits over face-to-face interviews. There are many circumstances where face-to-face interviews are difficult to obtain, with access reduced during natural disasters or communicable disease outbreaks such as COVID-19, or when witnesses live in remote locations, are hospitalised, require interviewers with specialised training, or are unable to attend an interview due to mental health difficulties. In these cases, children must often wait longer than is typically advisable to give evidence, with delays increasing the chance that memory of the event will diminish (Read & Connolly, 2017).

Only three studies, to the best of our knowledge, have examined the use of tele-FI to elicit evidence from children (Dickinson et al., 2021; Doherty-Sneddon & McAuley, 2000; Hamilton et al., 2017). Results from Doherty-Sneddon and McAuley (2000) demonstrated that despite the loss of gestural information and visual cues associated with video-conferencing software, evidential quality was better using tele-FI compared to face-to-face interviews. These findings were supported by Hamilton et al. (2017), who showed that memory reports from 100 children aged 5–12 were just as accurate and informative via video-feed as face-to-face after a 1-2-day delay, with no difference in susceptibility to verbally misleading questions. Longer delays have shown similar results, with younger children (4-6-year-olds) and older children (7-8-year-olds) demonstrating comparable accuracy between face-to-face and tele-FI conditions, and no difference in verbal suggestibility, even after a two-week delay (Dickinson et al., 2021).

Gesture

Gesture has important adaptive functions associated with speech, including roles in the packaging and manipulation of spatio-motoric information to convey meaning and help conceptualise and organise speech (Kita et al., 2017), and as an aid to enhance comprehension, particularly in instances of ambiguous or challenging articulation (Dargue et al., 2019; Hostetter, 2011). Gestural information can also be suggestive and has been shown to corrupt eyewitness memory in typically developing children (Broaders & Goldin-Meadow, 2010; Johnstone et al., 2023; Kirk et al., 2015; Meyer et al., 2023), adults (Gurney et al., 2016), and in children with intellectual or developmental difficulties (Johnstone et al., 2024). Observing gestures not only enhances memory over speech alone through exploitation of the listener’s mirror-neuron system (Proverbio & Zani, 2022) and motor system (Ianì & Bucciarelli, 2017), but also acts as a source of information (Dargue et al., 2019; Pezdek & Roe, 1995) even when this information is misleading. Gesture has been shown both to share common neural pathways with speech for comprehension and production (Marstaller & Burianová, 2015; Yang et al., 2015) and to have a distinct route bypassing the language network that is more congruent with the processing of emotions and facial expressions (Jouravlev et al., 2019).

The GME challenges conventional assumptions about verbal suggestibility in certain populations, such as those with higher verbal ability, older age, or stronger memory trace strength (Kirk et al., 2015). Similarly, little is known regarding the GME and gender, with inconsistent findings in the verbal suggestibility literature to date (Bruck & Melnyk, 2004). Recent studies highlight a pervasive susceptibility to gestural misinformation regardless of age in children aged 5-8 years (Johnstone et al., 2023) and 6-13 years (Meyer et al., 2023), indicating a dissociation between the GME and typical developmental trends such as improved cognitive processing (Bruck & Melnyk, 2004), resistance to social pressure (Gudjonsson et al., 2016; Roebers et al., 2005), and stronger language/narrative skills (Perez et al., 2022). The GME demonstrates a strong misleading effect in the presence of highly salient gestures and remains influential even when gestures are subtle (Johnstone et al., 2023). The GME has also shown a stronger misleading ability when questions are focused on peripheral details (descriptions of feelings, other people, places, and temporal events) compared to central details (the core event or crime and the main characters or perpetrators) (Johnstone et al., 2023). These findings support research demonstrating enhanced recall and memory integration for central action events (Ibabe & Sporer, 2004; Sarwar et al., 2014), and imply that when a memory trace is weaker, the reliance on gesture for comprehension, or as a source of information, may be greater (Dargue et al., 2019; Pezdek & Roe, 1995).

Aims of this Study

The current study investigated the effect of gestures on children’s recall accuracy in tele-FI interviews, and compared the findings to previously published face-to-face interviews using the same methodology, video, and question sets (Johnstone et al., 2023). It examined whether tele-FI interviews could gather information of similar quantity and quality as face-to-face interviews and whether gesture saliency or question centrality influenced the misinformation effect. A variety of naturalistic gestures were used, with questions counterbalanced to ensure equal exposure to accurate, misleading, and no gesture conditions. Age was analysed continuously to maintain statistical power (Bainter et al., 2020).

Given the lack of research in this area, predictions were based on similar work conducted on live links (Goodman et al., 1998; Landström & Granhag, 2010) and on the tele-FI verbal suggestibility literature (Dickinson et al., 2021; Doherty-Sneddon & McAuley, 2000; Hamilton et al., 2017). As such, it was hypothesised that: (1) children would be significantly misled by gestural misinformation during tele-FI, with salient gestures being more misleading than subtle ones, and peripheral event questions being more susceptible to misleading information than central event questions. Differences between tele-FI and face-to-face conditions were predicted to show: (2) an increase in quantity and quality of information during tele-FI compared to face-to-face interviews, with greater suggestibility for the GME during tele-FI due to the need to move gestures higher up the body to be visible on screen, making them more noticeable as a source of information. Regarding age differences, it was expected that: (3) there would be an interaction between age and interview mode, with older children showing increased quantity and quality of information during tele-FI, and younger children showing a better quantity and quality of information in face-to-face conditions. Lastly, (4) the study explored the influence of gender on accuracy and the GME, with predictions uncertain regarding the direction or presence of any effect.

Method

Pre-Registration

The study was pre-registered on the Open Science Framework on 17th March 2023 under the registration link https://doi.org/10.17605/OSF.IO/4NJ7C. Pre-registration involved detailing the study design, methodology, and analysis plan prior to data collection. Any deviations from the pre-registered protocol are outlined below, along with justifications for these modifications.

Participants

A power analysis was performed using G*Power to determine the sample size required for a linear multiple regression. A power of 0.80, a significance level of α = 0.05, and an effect size of η2 = 0.23 (equivalent to f2 = 0.30) derived from previously published research (Johnstone et al., 2023) resulted in a suggested sample size of 36. Within misinformation research the smallest effect size of interest (SESOI) is most often accepted as any effect size leading to a p-value below 0.05, a raw mean difference extending up to 1 misinformation detail, or any reliable effect size at all (Riesthuis et al., 2022). The calculated effect size used in this study was based on prior research by the authors using the same methodology and misinformation paradigm, in which a mean of 1.12 misinformation details was found in children aged 5-8 years and was indicative of a small to medium effect size (Johnstone et al., 2023). In contrast to the pre-registration, 47 children were recruited from two mainstream schools in the UK, after data collection issues at the first school. Children ranged in age from 5 to 8 years old (M = 6 years 11 months, SD = 1.08, 25 boys, 22 girls). Comparisons to face-to-face conditions were based on previous work by the authors (N = 63, M = 7 years 2 months, SD = 0.99, 35 girls and 28 boys). Given the availability of participants and the fact that they all met the inclusion criteria, a larger sample was used so that all children who wanted to participate could do so. Children were recruited as a convenience sample from a mix of backgrounds. Ethical approval for this study was given by The University of Sheffield.
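The relationship between the two effect-size metrics used in the power analysis can be checked directly. The following is a minimal sketch, assuming the standard conversion f2 = η2 / (1 − η2), where f2 is Cohen's f-squared, the metric G*Power takes for linear multiple regression. The function name is illustrative and is not part of the original analysis.

```python
import math


def eta_squared_to_f_squared(eta_sq: float) -> float:
    """Convert eta-squared to Cohen's f-squared: f2 = eta2 / (1 - eta2)."""
    if not 0 <= eta_sq < 1:
        raise ValueError("eta-squared must lie in [0, 1)")
    return eta_sq / (1 - eta_sq)


f_sq = eta_squared_to_f_squared(0.23)  # effect size taken from the text
f = math.sqrt(f_sq)                    # Cohen's f is the square root of f2
print(f"f2 = {f_sq:.2f}, f = {f:.2f}")  # prints: f2 = 0.30, f = 0.55
```

Under this conversion, an η2 of 0.23 corresponds to an f2 of about 0.30, which falls between Cohen's conventional benchmarks for medium (0.15) and large (0.35) effects in regression.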

Materials

Children watched a five-minute video of a young girl taking a gymnastics examination. The video was the same as that used in Johnstone et al. (2023) and included scenes of a girl on a beam doing gymnastics. The video contained shots of other people in the room, and some scenes before and after the routine. There was little speech in the video and no questions were asked about anything that was said or heard. Children’s responses were video recorded using Google Meet as requested by the schools.

Interview questions, scripts, and video were derived from past research (Johnstone et al., 2023) to ensure a reliable comparison with previous face-to-face research. Each script included 12 questions with four accurate gesture conditions, four misleading gesture conditions and four no gesture questions. All children answered four questions from each condition to a total of 12 questions. All scripts were counterbalanced so that a question asked in script 1 (for example) with an accurate gesture, was asked with a misleading gesture in script 2, and no gesture in script 3 (Table 1). Of the 12 questions, six questions were based on central information, and six on peripheral information. Each script contained six questions including salient gestures and six questions with subtle gestures. This led to three questions in each category - central/salient (x3), central/subtle (x3), peripheral/salient (x3), and peripheral/subtle (x3) for each script.
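The counterbalancing described above can be illustrated with a short sketch. It assumes a simple rotation in which each question's gesture condition shifts by one position from script to script; the source does not specify the exact assignment used, so the rotation rule and all names below are hypothetical.

```python
from collections import Counter

CONDITIONS = ("accurate", "misleading", "none")
N_QUESTIONS = 12
N_SCRIPTS = 3


def build_scripts(n_questions=N_QUESTIONS, n_scripts=N_SCRIPTS):
    """Return one {question_number: condition} mapping per script.

    Assumption: conditions rotate by one position per script
    (a Latin-square style rotation). The original study may have
    used a different assignment that satisfies the same constraints.
    """
    scripts = []
    for s in range(n_scripts):
        script = {
            q + 1: CONDITIONS[(q + s) % len(CONDITIONS)]
            for q in range(n_questions)
        }
        scripts.append(script)
    return scripts


scripts = build_scripts()
for number, script in enumerate(scripts, start=1):
    # Each script should contain four questions per gesture condition.
    print(f"script {number}:", dict(Counter(script.values())))
```

With this rotation, question 1 is paired with an accurate gesture in script 1, a misleading gesture in script 2, and no gesture in script 3, and every script contains four questions per condition. Balancing conditions within the centrality and saliency categories would need an additional constraint that this sketch does not attempt.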

Table 1 An example question and the associated gestures for each question set

Design

The tele-FI experiment consisted of a within-participant design with a repeated measures ANOVA to examine correct, incorrect or “do not know/do not remember” (DKDR) answers for each condition (accurate/misleading/no gesture). Incorrect responses were examined further to assess whether answers were consistent (congruent with the gesture) or inconsistent (answer unrelated to the gesture) with the misleading gesture observed. For example, in response to the question, “What colour leotard was the girl wearing? (red)” with a misleading gesture indicating blue, a response of “blue” was deemed consistent, while “pink” was considered inconsistent. All inconsistent answers were plausible and relevant to the context.
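The response categories described above can be expressed as a small decision rule. This is a simplified sketch only: in the study, responses were classified by human coders, whereas the function below relies on exact string matching. The function name, the category labels, and the list of DKDR phrases are all hypothetical.

```python
# Hypothetical set of phrases treated as "do not know/do not remember".
DKDR_PHRASES = {"don't know", "do not know", "can't remember", "do not remember"}


def code_response(response, correct, suggested=None):
    """Classify one structured-interview answer.

    response  -- the child's answer
    correct   -- the answer shown in the video
    suggested -- the detail implied by a misleading gesture, if any
    """
    answer = response.strip().lower()
    if answer in DKDR_PHRASES:
        return "DKDR"
    if answer == correct.lower():
        return "correct"
    if suggested is None:
        # No misleading gesture was shown, so consistency does not apply.
        return "incorrect"
    if answer == suggested.lower():
        return "incorrect-consistent"
    return "incorrect-inconsistent"


# Leotard example from the text: correct answer red, misleading gesture blue.
print(code_response("blue", correct="red", suggested="blue"))
print(code_response("pink", correct="red", suggested="blue"))
```

The first call returns the gesture-consistent category and the second the inconsistent category, mirroring the leotard example.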

The misinformation paradigm was taken from prior work by the authors (Johnstone et al., 2023) and included 6 stages (Fig. 1). During the free recall assessment, children were asked to provide as much detail as possible about the video they had watched, and the items of information (IOIs) given about the video were assessed. General prompts were used to aid information retrieval, including questions such as “Can you remember anything else?”, “Could you describe what happened next?”, and “Can you tell me a bit about the people you saw in the video?”. Comparisons to face-to-face interviews used prior published research by Johnstone et al. (2023) using the same misinformation paradigm, video, and methodology. This resulted in a between-participants design to examine total words spoken, correct IOIs, incorrect IOIs, DKDR, accuracy and gesture consistent answers for face-to-face (n = 63) and tele-FI (n = 47) conditions. Dichotomous moderated regressions were conducted to assess the impact of age and mode of interview on total words spoken, correct IOIs, incorrect IOIs, DKDR, accuracy and gesture consistent answers, with mode of interview (face-to-face or tele-FI) as the moderator and age as the predictor variable.

Coding

Free recall interviews were coded by items of information (IOIs), which were defined as any information the child gave about the video. Following Johnstone et al. (2023), examples of IOIs included the storyline of the video (e.g., gymnastics/exam), information on individuals (e.g., gender/clothes/hair/age), events (e.g., she did a cartwheel/they hugged) and emotions (e.g., she was crying/they were nervous). IOIs were used to examine the accuracy of reports by assessing whether the information collected was correct or incorrect. Inter-rater reliability was determined for 40% of the sample, which was coded by an experienced researcher who was not otherwise involved in the study. There was good agreement between the two coders for the pre-interview free recall, Kappa = 0.76, p < 0.001, and for the post-interview free recall, Kappa = 0.89, p < 0.001. Structured interview responses were systematically classified into three categories: correct, incorrect, or ‘do not know/do not remember’ (DKDR). DKDR responses incorporated non-verbal gestures such as shoulder shrugging and head shaking to indicate ‘no’. Inter-rater reliability for the structured interview was assessed for 20% of the participants and demonstrated high agreement between raters (Kappa = 0.99, p < 0.001).
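For readers unfamiliar with the agreement statistic reported above, the following sketch shows how Cohen's kappa is computed for two raters assigning categorical codes. It assumes the unweighted form of kappa, and the example codes are invented for illustration; they are not the study's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical codes for six responses from two independent coders.
coder_1 = ["correct", "correct", "incorrect", "incorrect", "DKDR", "DKDR"]
coder_2 = ["correct", "correct", "incorrect", "DKDR", "DKDR", "DKDR"]
kappa = cohens_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.2f}")  # prints: kappa = 0.75
```

Here the coders agree on five of six items (0.83), chance agreement is 0.33, and kappa is therefore (0.83 − 0.33) / (1 − 0.33) = 0.75.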

Procedure

To replicate the tele-FI environment as closely as possible, two separate rooms of the school were each set up with a laptop and connected using the online video-conferencing software Google Meet. Google Meet was chosen by the schools as the safest platform for the pupils. Children were brought from their class by the experimenter and seated in one of the rooms in front of the laptop. After setting the video playing, the experimenter left the room under the pretence of having to speak to someone but said they would remain outside the room. When the video finished, the experimenter returned feigning disappointment that they had missed the video, switched the screen to Google Meet, then went to the second room, at which point they appeared on screen for the child.

In accordance with the principles outlined in the enhanced cognitive interview guidelines (Fisher & Geiselman, 2010; Geiselman & Fisher, 2014), a rapport-building phase was included before the interview began. The interviewer first checked that the child could hear and see them before perceived control was handed to the child using the phrase “I don’t know what happened in that video, but I have some questions here. Do you think you could help me by answering some questions?” Children were assured that if they did not know the answer to a question, this was okay, and that they should just say they didn’t know or couldn’t remember.

The interview phase included four stages - pre-interview free recall, structured interview questions, a distractor task, and post-interview free recall. This was in line with Johnstone et al. (2023) to ensure comparisons between face-to-face interviews and tele-FI would be reliable.

Fig. 1 The 6 stages of the gestural misinformation paradigm: Rapport, Video, Pre-interview free-recall, Structured Interview, Distractor task, and Post-interview free-recall

Following free recall, each child was randomly assigned to one of the three scripts and asked 12 structured questions across three conditions: no gesture (Condition 1), accurate gesture (Condition 2), and misleading gesture (Condition 3). In Condition 1, questions were asked without accompanying gestures and the experimenter kept their hands still but within the field of vision. In Condition 2, accurate gestures were used which were consistent with the video content. In Condition 3, misleading gestures were used showing inconsistent, but plausible, false information. All gestures were iconic, visually represented meaningful concepts, and were chosen to communicate semantic information to the participants (McNeill, 1992). In line with our previous findings demonstrating the influence of question centrality and gesture saliency on the GME (Johnstone et al., 2023), six questions were asked about the central character and/or action, and six were about peripheral events that took place before or after the main event or were about other people in the video (Andrews & Lamb, 2019). Gestures were divided evenly into salient or subtle gestures (Chu et al., 2014), with salient gestures being large arm or hand movements, and subtle gestures being small hand or finger movements. All gestures were easily visible within the screen’s field of vision.

After the structured interview, children engaged in a distractor task in which they talked with the experimenter about school and hobbies for three minutes. A second free recall test then took place and children were asked if they could remember anything else about the video they had watched. The same open-ended prompts were used to help elicit information. After this, the experimenter returned to the child and turned off the video recording, which was then automatically saved to Google Drive. The child was thanked for their help and asked whether they preferred talking face-to-face or via the video, before being given a small reward and a certificate and being taken back to their class.

Results

In the small number of cases where data deviated from a normal distribution, the appropriate non-parametric test was run alongside the parametric test to assess significance. Where both the non-parametric and parametric analyses agreed, the parametric analysis was reported. Given the number of comparisons being made, all results were adjusted using Bonferroni’s correction to give a conservative significance estimate, unless otherwise stated.
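The Bonferroni adjustment can be sketched briefly. This assumes the common implementation in which each unadjusted p-value is multiplied by the number of comparisons and capped at 1; the p-values below are hypothetical and are not taken from the study.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of comparisons, capping at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical unadjusted p-values for three pairwise comparisons.
raw = [0.0007, 0.033, 0.50]
adjusted = bonferroni_adjust(raw)
print([round(p, 4) for p in adjusted])  # prints: [0.0021, 0.099, 1.0]
```

The cap explains why an adjusted p-value can be reported as exactly 1.00: any unadjusted value at or above 1/m reaches the ceiling once multiplied by the number of comparisons m.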

The Effect of Gesture During Tele-FI

The mean numbers of questions that children answered correctly, incorrectly, or with a DKDR response via tele-FI in each condition are shown in Fig. 2.

Fig. 2 Mean number of correct, incorrect, and DKDR responses via tele-FI (out of a maximum score of 4) including 95% confidence intervals

Compared to the control of no gesture (M = 1.81), accurate gestures were more likely to elicit a correct answer (M = 2.21), and misleading gestures were less likely to produce a correct answer (M = 1.53). A repeated measures ANOVA was conducted to test the effect of gesture (1. Accurate, 2. Misleading or 3. Control) on correct responses. A significant main effect of gesture type for correct responses was found, F(2,92) = 6.82, p = 0.002, η2 = 0.13. Pairwise comparisons with Bonferroni adjustment revealed that accurate gestures were more likely to elicit a correct answer than misleading gestures (MD = 0.68, p = 0.002, Cohen’s d = 0.73). No difference was found when comparing the control to either accurate gestures (MD = 0.40, p = 0.1, Cohen’s d = 0.39) or misleading gestures (MD = 0.28, p = 0.42, Cohen’s d = 0.27).

For incorrect answers, misleading gestures were more likely to produce an incorrect response (M = 1.70) than either accurate gestures (M = 1.00) or the no gesture control (M = 1.17). A repeated measures ANOVA showed that the main effect of gesture (1. Accurate, 2. Misleading or 3. Control) on incorrect responses was significant, F(2,92) = 6.54, p = 0.002, η2 = 0.12, with comparisons showing that misleading gestures were more likely to elicit an incorrect answer than accurate gestures (MD = 0.70, p = 0.001, Cohen’s d = 0.80). No difference was found compared to the control for either accurate (MD = 0.17, p = 1.00, Cohen’s d = 0.18) or misleading conditions (MD = 0.53, p = 0.67, Cohen’s d = 0.52).

The no gesture control (M = 1.02) elicited numerically more DKDR responses than accurate gestures (M = 0.79) or misleading gestures (M = 0.77). However, a repeated measures ANOVA examining the effect of gesture (1. Accurate, 2. Misleading or 3. Control) on DKDR answers found no main effect, F(2,92) = 1.83, p = 0.166, η2 = 0.038.
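The effect sizes for the three repeated measures ANOVAs can be related to their F statistics. The sketch below assumes the reported η2 values are partial eta-squared, computed as (F × df_effect) / (F × df_effect + df_error). The text does not state which variant was used, so this is an inference from the reported figures rather than a description of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta-squared recovered from an F statistic and its degrees of freedom."""
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# F statistics reported in the text, each with df = (2, 92).
for label, f_value in [("correct", 6.82), ("incorrect", 6.54), ("DKDR", 1.83)]:
    print(label, round(partial_eta_squared(f_value, 2, 92), 3))
```

Under this assumption the three values come out at approximately 0.13, 0.12 and 0.038, in line with the figures reported for correct, incorrect and DKDR responses.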

Incorrect answers were more likely to be consistent with the misleading gesture seen during tele-FI (M = 1.23) than inconsistent (M = 0.45), and a t-test showed that this difference was significant, t(46) = 4.65, p < 0.001, Cohen’s d = 0.68.

Question Centrality and Gesture Saliency

Salient gestures were more likely to elicit the misleading suggested word than subtle gestures (MD = 0.60), t(46) = 5.51, p < 0.001, Cohen’s d = 0.80, CI [0.47, 1.13]. Questions about events peripheral to the main plot of the video were no more susceptible to misleading gestures than questions about central events (MD = 0.13), t(46) = -0.88, p = 0.382, Cohen’s d = -0.13, CI [-0.42, 0.16].
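The effect sizes for the paired comparisons appear consistent with d = t / √n, the standardised mean difference for paired data (often written d_z). The sketch below applies that formula to the t statistics reported for tele-FI (n = 47). This is an inference from the reported values, not a statement of how the effect sizes were originally calculated.

```python
import math


def cohens_dz(t_value, n):
    """Paired-samples effect size from a t statistic: d_z = t / sqrt(n)."""
    return t_value / math.sqrt(n)


N_TELE_FI = 47  # tele-FI sample size reported in the text

print(round(cohens_dz(4.65, N_TELE_FI), 2))   # consistent vs. inconsistent answers
print(round(cohens_dz(5.51, N_TELE_FI), 2))   # salient vs. subtle gestures
print(round(cohens_dz(-0.88, N_TELE_FI), 2))  # peripheral vs. central questions
```

The three calls give 0.68, 0.80 and -0.13 respectively, matching the Cohen's d values reported for these comparisons.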

The top 5 misleading gestures during tele-FI were salient gestures, with the most misleading gesture of thumbs up able to mislead participants in the tele-FI condition 73% of the time. These results are comparable to face-to-face conditions (Johnstone et al., 2023) in which the top 4 misleading gestures were classed as salient, with the thumbs up gesture misleading participants 52% of the time (Fig. 3). Comparison of tele-FI and face-to-face interviewing demonstrated that the top 6 gestures from the face-to-face condition were also the most misleading in the tele-FI condition, with the main difference being the increased ability of the feelings gesture (angry vs. sad) to mislead during tele-FI (63%) compared to face-to-face (19%).

Peripheral questions were responsible for more DKDR answers (n = 83) than central questions (n = 30) while not much difference was seen between DKDR responses for salient gestures (n = 60) or subtle gestures (n = 53). Compared to face-to-face interviews, results demonstrated that centrality and saliency produced a similar pattern of DKDR responses during tele-FI as observed previously (Johnstone et al., 2023).

Fig. 3 The percentage of children misled by the 12 possible gestures for face-to-face interviews (N = 63) and tele-FI (N = 47)

Face-to-Face Interviews vs. Tele-Forensic Interviewing

To assess whether tele-FI elicits the same quantity and quality of information as face-to-face interviews, we examined the total number of words spoken, total correct and incorrect information, DKDR answers, accuracy, and consistent answers for each condition (Table 2). This is a slight deviation from the pre-registration and was completed to reduce the chance of statistical error. Total correct and incorrect information was calculated using coded free recall IOIs and structured interview answers. A composite accuracy score (correct answers minus incorrect answers) was created to assess overall practical relevance.
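The composite scores described above can be sketched as follows. The example assumes that totals are simple sums of free recall IOIs and structured interview answers, and that accuracy is total correct minus total incorrect, as stated in the text. The class name, field names and example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChildScores:
    """Per-child counts; field names are illustrative only."""
    free_recall_correct: int     # correct IOIs across free recall
    free_recall_incorrect: int   # incorrect IOIs across free recall
    structured_correct: int      # correct structured interview answers
    structured_incorrect: int    # incorrect structured interview answers

    @property
    def total_correct(self) -> int:
        return self.free_recall_correct + self.structured_correct

    @property
    def total_incorrect(self) -> int:
        return self.free_recall_incorrect + self.structured_incorrect

    @property
    def accuracy(self) -> int:
        # Composite accuracy: correct information minus incorrect information.
        return self.total_correct - self.total_incorrect


child = ChildScores(free_recall_correct=14, free_recall_incorrect=2,
                    structured_correct=6, structured_incorrect=4)
print(child.total_correct, child.total_incorrect, child.accuracy)  # prints: 20 6 14
```

A difference score of this kind rewards children who report more correct detail while penalising errors, so two children with the same number of correct details can differ in accuracy if one also reports more incorrect information.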

Analysis showed a significant difference between interview modes in total words spoken (MD = 90.44), F(1,109) = 30.22, p < 0.001, correct IOIs (MD = 4.45), F(1,109) = 4.29, p = 0.041, and incorrect IOIs (MD = 1.41), F(1,109) = 7.67, p = 0.007. No difference was seen in DKDR responses (MD = 0.02), F(1,109) = 0.00, p = 0.964, accuracy (MD = 2.19), F(1,109) = 1.31, p = 0.255, or incorrect answers consistent with the misleading gesture (MD = 0.12), F(1,109) = 0.59, p = 0.445.

Table 2 Mean, standard deviations, effect size and 95% confidence intervals for the collected data for both face-to-face interviews and tele-FI

Interview Mode and Age

To determine whether the mode of interview affected children differently depending on age, a series of moderated regressions were run with age as the predictor variable, total words, DKDR, accuracy and answers consistent with misleading gesture as the outcome variables, and mode of interview as the moderator.
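The logic of a moderated regression with a dichotomous moderator can be illustrated with a small sketch. In a model containing age, interview mode, and their interaction, the simple slope of age within each mode equals the slope of a regression fitted to that group alone, and the interaction coefficient equals the difference between the two slopes. The code below computes only these point estimates on invented data; it does not reproduce standard errors, confidence intervals, or significance tests, and all names and values are hypothetical.

```python
from statistics import mean


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def simple_slopes(ages, outcomes, modes):
    """Return (face-to-face slope, tele-FI slope, interaction estimate).

    modes uses 0 for face-to-face and 1 for tele-FI.
    """
    slopes = {}
    for mode in (0, 1):
        x = [a for a, m in zip(ages, modes) if m == mode]
        y = [o for o, m in zip(outcomes, modes) if m == mode]
        slopes[mode] = ols_slope(x, y)
    # With a 0/1 moderator, the interaction term is the difference in slopes.
    return slopes[0], slopes[1], slopes[1] - slopes[0]


# Invented data: four children per interview mode.
ages = [5, 6, 7, 8, 5, 6, 7, 8]
modes = [0, 0, 0, 0, 1, 1, 1, 1]
words = [20, 22, 24, 26, 30, 35, 40, 45]
print(simple_slopes(ages, words, modes))  # prints: (2.0, 5.0, 3.0)
```

In this toy example the outcome rises by 2 units per year of age in the face-to-face group and by 5 units per year in the tele-FI group, so the interaction estimate is 3. Whether such a difference counts as moderation depends on its significance test, which this sketch omits.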

For the effect of age on total words spoken the model was found to be significant r2(106) = 0.29, p < 0.001. During face-to-face interviews (b = 13.32, 95% CI [-7.59, 34.23], t = 1.26, p = 0.209) a non-significant, positive relationship between total words and age was seen. In comparison tele-FI interviews showed a significant, positive relationship (b = 31.31, 95% CI [9.74, 52.88], t = 2.88, p = 0.005) between total words spoken and the age of the participant. The difference in total words spoken between face-to-face interviews and tele-FI interviews was significant (p < 0.001) with tele-FI having a greater number of words spoken than face-to-face interviews. The interview condition (face-to-face or tele-FI) did not moderate the effect of age upon total words spoken r2change (106) = 0.01, F = 1.41, p = 0.238.

For the effect of age on DKDR answers the model was found to be significant r2(106) = 0.11, p = 0.005. During both face-to-face interviews (b=-0.62, 95% CI [-1.14, -0.09], t=-2.33, p = 0.022) and tele-FI interviews (b=-0.79, 95% CI [-1.33, -0.24], t=-2.88, p = 0.005) a negative relationship between DKDR answers and age was found. The difference in DKDR answers between face-to-face interviews and tele-FI interviews was non-significant (p = 0.683). The interview condition (face-to-face or tele-FI) did not moderate the effect of age upon DKDR answers r2change (106) = 0.002, F = 0.19, p = 0.661.

For the effect of age on accuracy the model was found to be significant r2(106) = 0.18, p < 0.001. During both face-to-face interviews (b = 2.86, 95% CI [0.53, 5.18], t = 2.45, p = 0.017) and tele-FI interviews (b = 4.80, 95% CI [2.40, 7.20], t = 3.97, p < 0.001) a significant, positive relationship between accuracy and age was found. The difference in accuracy between face-to-face interviews and tele-FI interviews was non-significant (p = 0.073) with tele-FI showing a small improvement in accuracy compared to face-to-face interviews. The interview condition (face-to-face or tele-FI) did not moderate the effect of age upon accuracy r2change (106) = 0.01, F = 1.33, p = 0.251. A breakdown of the accuracy score showed a significant, moderate, positive correlation between age and correct information for both tele-FI (r(45) = 0.45, p = 0.002) and face-to-face conditions (r(61) = 0.32, p = 0.010). No correlation between age and incorrect information for tele-FI (r(45) = 0.14, p = 0.344) or face-to-face interviews (r(61) = 0.11, p = 0.39) was found. This indicates that older children provided information with a greater degree of accuracy overall, but this was mainly due to an increase in correct answers, rather than a decrease in incorrect answers.

The model for the effect of age on the ability to be misled by gesture was non-significant, r2(106) = 0.01, p = 0.865. During both face-to-face interviews (b = -0.01, 95% CI [-0.22, 0.20], t = -0.08, p = 0.935) and tele-FI interviews (b = 0.04, 95% CI [-0.18, 0.26], t = 0.39, p = 0.700) a non-significant relationship between consistent answers and age was found. The difference in consistent answers between face-to-face interviews and tele-FI interviews was non-significant (p = 0.433), with tele-FI showing a slight increase in consistent answers as children aged, and face-to-face interviews showing a decrease in consistent answers as children aged. The interview condition (face-to-face or tele-FI) did not moderate the effect of age upon consistent answers, r2change (106) = 0.00, F = 0.11, p = 0.739.

Gender

No effect of gender was seen on accuracy (MD = 2.37), F(1,45) = 0.591, p = 0.448, η2 = 0.013, or consistent answers (MD = 0.18), F(1,45) = 0.587, p = 0.446, η2 = 0.013, in tele-FI conditions. Analysis of gender in face-to-face conditions showed a significant effect of gender on accuracy, F(1,45) = 4.23, p = 0.044, η2 = 0.07, with girls being more accurate than boys (MD = 4.85), and on consistent answers, F(1,45) = 6.01, p = 0.017, η2 = 0.09, with boys being more easily misled than girls (MD = 0.51).

Exploratory Observations

Beyond the remit of pre-registration, interviewer observation noted differences in child preferences for talking on screen or face-to-face. Out of 47 participants who answered the question “Do you like speaking to me face-to-face or using the screen more?” asked after formal interviews, 25 said they preferred speaking via video call, 14 preferred face-to-face and 8 said either was fine. Of the 14 who preferred speaking face-to-face, 6 said it was because ‘it was easier to hear’, 4 said ‘they preferred it because they didn’t like being on their own’, and 4 didn’t know why. Of the 25 who preferred speaking via video call, 7 said it was ‘easier and more fun’, 3 said they preferred being ‘on their own’, 2 said face-to-face was ‘too loud’ for them, and 13 didn’t know why.
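For readers who prefer proportions, the counts above can be converted to percentages of the 47 respondents. The sketch below uses only the counts reported in this paragraph; the variable names and the shortened reason labels are our own paraphrases, introduced for illustration.

```python
# Counts as reported in the text (47 children answered the question).
preferences = {"video call": 25, "face-to-face": 14, "either": 8}

# Reasons given within each preference group (labels paraphrased).
reasons = {
    "face-to-face": {
        "easier to hear": 6,
        "did not like being alone": 4,
        "did not know": 4,
    },
    "video call": {
        "easier and more fun": 7,
        "liked being alone": 3,
        "face-to-face too loud": 2,
        "did not know": 13,
    },
}


def shares(counts):
    """Return each count as a percentage of the total, to one decimal place."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


preference_shares = shares(preferences)
for label, pct in preference_shares.items():
    print(f"{label}: {pct}%")

# Consistency check: reasons within each group sum to the group size.
for group, group_reasons in reasons.items():
    assert sum(group_reasons.values()) == preferences[group]
```

On these counts, roughly 53% of respondents preferred the video call, 30% preferred face-to-face, and 17% expressed no preference.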

Discussion

Findings from the current study add to the growing misinformation literature demonstrating the significant effect of gesture on children’s eyewitness testimony (Broaders & Goldin-Meadow, 2010; Johnstone et al., 2023; Kirk et al., 2015; Meyer et al., 2023). Consistent with our hypotheses, gestural information was found to influence children aged 5–8 years, showing a consistent pattern across both interview modes, with accurate gestures resulting in more correct answers, misleading gestures in more incorrect answers, and ‘no gesture’ in more DKDR responses. In line with our expectations, the GME significantly misled children during tele-FI, with salient gestures showing a strong misleading ability consistent with prior research (Johnstone et al., 2023). Contrary to our expectations, and to prior work by Doherty-Sneddon and McAuley (2000), although more correct and incorrect information was gathered via tele-FI than face-to-face interviews, accuracy and gesture-consistent answers remained constant across interview modes. No relationship between the GME and age was identified during tele-FI; however, older children did exhibit higher levels of accuracy and provided more comprehensive information, while younger children showed a greater tendency to give a DKDR response. Findings showed no effect of centrality, and no gender differences, during tele-FI, which contradicts previous work on central/peripheral event questions during interview (Ibabe & Sporer, 2004; Johnstone et al., 2023; Sarwar et al., 2014), and research showing greater accuracy and reduced suggestibility in girls (Johnstone et al., 2023).

The Gestural Misinformation Effect

Doherty-Sneddon and McAuley (2000) proposed that the reduction in visual field and the social/emotional distance created by tele-FI may diminish social conformity pressures and gesture comprehension, and so provide a degree of protection for eyewitness recall (Gudjonsson et al., 2016; Roebers et al., 2005). Findings from the current study do not support this, potentially due to the balance between those factors which protect against the GME (e.g., reduced social pressure), and those which enhance it (e.g., distorted speech/video) in the sample taken and settings tested (Dargue et al., 2019; Hostetter, 2011). Examining the type of questions and gestures revealed a strong misleading effect of salient gestures, and a similar pattern of results in tele-FI as in face-to-face interviews. This aligns with past research indicating heightened attention to salient gestures compared to subtle ones (Chu et al., 2014), with more expressive gestures likely serving as a more substantial source of information due to their enhanced visibility and neurocognitive engagement (Dargue et al., 2019; Ianì & Bucciarelli, 2017; Pezdek & Roe, 1995; Yang et al., 2015). Despite the suggestion that gestures may become more salient and engaging when hands are positioned higher on the body within the visual screen, no support was found for this. This provides reassurance that while the GME remains significant in tele-FI contexts, it does not appear to be more pronounced than in face-to-face conditions.

Gesture’s role in tele-FI was supported by an increase in DKDR responses when gestures were absent, or when questions were asked about peripheral events. Higher rates of DKDR responses were observed among younger children compared to older children, a trend more pronounced in tele-FI than in face-to-face interviews. Interviewer observation noted that more prompts were required in tele-FI to keep younger children within view, a task unnecessary in face-to-face interviews or with older children. This aligns with Doherty-Sneddon and McAuley’s (2000) findings that video interviewers exert more effort managing younger children, necessitating more non-task speech. These observations, alongside the DKDR findings, may explain why there appear to be contrasting trends in age-related suggestibility to the GME across different interview modes. It is possible that, when a correct answer was unavailable, disruptions and distractions during tele-FI reduced the availability of gestures as a source of information for the younger children and increased their use of DKDR answers, compared to older children.

Quality and Quantity of Information

Using tele-FI to elicit eyewitness testimony had a positive impact on evidence quantity in this study, with an increase in total words spoken by children when compared to face-to-face interviews. Overall, children spoke 4250 more words during tele-FI than during face-to-face interviews, strongly indicating that rapport with the child was not affected by the use of video-conferencing software, as had been suggested by the National Children’s Alliance (2020). Examination of age trends showed that this difference was mainly due to older children disclosing more information during tele-FI compared to younger children, supporting work by Dickinson et al. (2021), and indicating that younger children may find greater benefit from the personal interaction of face-to-face interviews. Results from previous studies are inconsistent, however, with variations in age effects and in the type/volume of information disclosed over tele-FI compared to face-to-face interviews (Dickinson et al., 2021; Doherty-Sneddon & McAuley, 2000; Hamilton et al., 2017). Methodological differences across studies may explain these inconsistencies.

While both incorrect and correct information increased during tele-FI, overall accuracy remained consistent across conditions. An age-related effect was observed, with older children exhibiting greater accuracy compared to younger children in both interview formats. This finding aligns with prior work by Meyer et al. (2023), indicating that cognitive and social developmental advancements enhance recall abilities as children mature (Bruck & Melnyk, 2004; Gudjonsson et al., 2016; Roebers et al., 2005). Despite better accuracy overall for older children, suggestibility remained constant across ages and interview conditions in this experiment, indicating an interesting relationship between the GME and accuracy that could warrant further exploration using a broader age range. The current study’s data may represent only a portion of what a real police interview would gather, as limited prompts were used to ensure replicability. Similarly, it is unclear whether children would be alone during tele-FI, or whether they would have a caregiver or social worker present, which can have both benefits and disadvantages.

Conclusion

Findings highlight the potential benefits of tele-FI and call for new guidance to allow the use of video-conferencing software to gather primary interview evidence. The use of tele-FI addresses the need to improve children’s access to legal and supportive services, with the aim of lowering distress, reducing interview wait times, and enhancing evidential quality in the same way as live links to court. Finally, the integration of tele-FI in the justice system would help ensure equitable access to protective services for vulnerable populations, marking a crucial step towards a more inclusive and robust legal system.