Introduction

Over the past few decades, the emphasis on Science, Technology, Engineering, and Mathematics (STEM) education has increased substantially (Kayan-Fadlelmula et al., 2022; Li et al., 2020). This surge in emphasis is not without reason: STEM fields stand at the forefront of innovation, playing a pivotal role in driving economic growth and addressing global challenges (Widya et al., 2019). STEM encompasses an array of disciplines, each intricate in its own way. Both cognitive and non-cognitive factors contribute to students’ STEM performance at school. Cognitive factors, including prior knowledge, spatial skills, abstract thinking, and logical reasoning, have been identified as significant predictors of success in STEM fields (Andersen, 2014; Berkowitz & Stern, 2018). Among non-cognitive factors, attitudes towards STEM subjects and fields have also been shown to play a significant role (Potvin & Hasni, 2014; Potvin et al., 2020). Additionally, demographic factors, such as socio-economic status (Rozgonjuk et al., 2023), and individual characteristics, such as gender (Wang et al., 2023), can significantly influence both academic performance and the development of STEM attitudes.

Although research exists on the interplay between academic achievement and attitudes in STEM, and on gender differences in the domain, specific aspects, such as the association between science test performance and motivational variables like anxiety, still warrant further investigation. Furthermore, science can be decomposed into more specific subjects, including physics, chemistry, biology, and geography. In the present work, we aim to investigate the associations between science achievement, test-related attitudes, and general and subject-specific science anxiety, as well as gender differences in these links.

Literature review

Historically, gender differences in STEM fields, particularly in science, have been a focal point of educational and psychological research (Li et al., 2020). The so-called gender-equality paradox refers to a phenomenon observed in many societies where, despite significant efforts to achieve gender equality across various fields, a notable discrepancy persists in the representation of men and women in certain professions, particularly those related to STEM (Stoet & Geary, 2018; Tandrayen-Ragoobur & Gokulsing, 2022). Not only do gender disparities exist in participation and representation in these fields, but studies have also indicated differences in performance, attitudes, and experiences (Voyer & Voyer, 2014). In Estonia, the proportion of female STEM graduates is generally higher than in other OECD countries; however, the pay gap between men and women remains one of the largest among the surveyed countries (OECD, 2024). Furthermore, some evidence suggests an association between a larger pay gap and a higher proportion of women in STEM-related fields, possibly reflecting a higher pay gap in STEM jobs (Treialt, 2021). Of relevance, the average age of Estonian teachers (including in science) is one of the highest among OECD countries (OECD, 2019b, p. 84), and it is a major factor in teaching motivation (Täht et al., 2023).

Several factors have been proposed to contribute to the development of the gender gap in STEM, including social and cultural influences and stereotypes (Cheryan et al., 2017; Master et al., 2021; Verdugo-Castro et al., 2022), differences in career interests and priorities (So et al., 2022), socio-economic status (Early et al., 2020; Rozgonjuk et al., 2023), differences in personality traits (Anni et al., 2023; Hofmann et al., 2023; McKinney et al., 2021), and societal expectations regarding gender roles (Hägglund & Leuze, 2021; Schmitt et al., 2008).

Academic self-efficacy and anxiety are two psychological concepts that are often intertwined when it comes to shaping the attitudes towards academic subjects (Burns et al., 2021; McKinney et al., 2021; Rozgonjuk et al., 2020). Self-efficacy is the belief in one’s own abilities (Bandura, 1997), whereas academic anxiety refers to feelings of fear, apprehension, or worry experienced in response to academic tasks, assessments, or performance expectations (Tobias & Weissbrod, 1980). Research has shown that self-efficacy correlates with better academic outcomes in general (Warren et al., 2020), as well as more specifically in science (Burns et al., 2021) and mathematics (Özcan & Eren Gümüş, 2019). Conversely, poorer academic performance in respective domains has been shown to be accompanied by higher anxiety in mathematics (Caviola et al., 2022; Guzmán et al., 2023; Namkung et al., 2019) and in science (Megreya et al., 2021).

According to the meta-analysis by Reilly et al. (2015), male students in the U.S. tend to have higher achievement in both mathematics and science than female students. On the other hand, recent research relying on international studies shows a more mixed picture: in general, while boys outperformed girls in mathematics, girls outperformed boys in science in most of the participating OECD countries in the PISA 2018 survey (OECD, 2019a, p. 142). These findings suggest that gender disparities in STEM achievement are context-dependent, highlighting the importance of considering cultural, educational, and socio-economic factors when addressing gender gaps in education.

In studies that do not find gender differences in academic outcomes, girls tend to score lower on mathematics self-efficacy (Zander et al., 2020) and higher on mathematics anxiety measures (Vos et al., 2023). In another study, Megreya et al. (2021) found that female students had higher science anxiety. Relatedly, Cotner et al. (2020) found that female Norwegian biology students reported higher testing anxiety than male biology students; interestingly, anxiety did not predict the achievement of male students. Moreover, in a recent systematic-narrative literature review, Balducci (2023) investigated the association between the gender gap in mathematics and science and country-level gender equality. Interestingly, while the mathematics gender gap was not linked to country-level gender equality, larger gender differences in mathematics attitudes and anxiety were reported in more gender-equal countries. When it comes to science, Balducci (2023) found that girls display a lower science self-concept than boys in the vast majority of cases, even though, as mentioned above, boys did not necessarily outperform girls in terms of ability.

These differences are not without consequences. For instance, Sakellariou and Fang (2021) argue that early development of self-efficacy in mathematics and science is predictive of propensity to study STEM subjects. The authors also showed that reducing the STEM gender gap is effective in girls with above-average self-efficacy. It could be argued, therefore, that by reducing STEM-related anxiety (which would, conversely, likely result in improving STEM self-efficacy), more girls could consider a STEM career path. Reducing anxiety and improving self-efficacy do not solely depend on one’s performance but can also be achieved through targeted interventions addressing anxiety sources, teaching methods, and learning-related activities (Zakariya, 2022).

Conceptual framework

In this study, we contextualize gender differences in science, extrapolated to STEM education in general, within the framework of Social Cognitive Theory (SCT; Bussey & Bandura, 1999). SCT posits that individual beliefs, environmental influences, and behaviors dynamically interact to shape academic outcomes. Building upon this theory, we propose that gender disparities in STEM fields result from a complex interplay of personal factors (such as self-efficacy and anxiety), socio-cultural influences (including stereotypes and societal expectations), and environmental factors (such as educational experiences and opportunities). In the present study, we focus on the variance in personal factors, notably science anxiety. This choice is informed by our understanding that individual attitudes and emotions can significantly impact academic performance and career choices within STEM disciplines.

In addition, although initially derived from the mathematics domain, the insights from Baloglu and Kocak (2006) can be plausibly extended to science. They suggest that achievement-related attitudes and motivational factors—in the context of the present study, science anxiety—can be influenced by situational, dispositional, and environmental factors. Situational elements pertain directly to the domain in question (i.e., testing in the present study). In contrast, dispositional and environmental factors revolve around the inclination to cultivate stable attitudes and behaviors towards subjects and the external contexts associated with the domain, respectively. Hence, science anxiety could be affected by previous testing experiences (situational), general tendencies to experience anxiety (dispositional), and societal expectations and stereotypes regarding science performance (environmental).

The framework by Baloglu and Kocak (2006) is relevant to our study in linking science anxiety to situational test-related variables to provide empirical evidence on these associations. Investigating other factors (i.e., dispositional and environmental) is not within the scope of the present study. It is important to note that although the science anxiety inventory items refer to anxiety experienced towards science beyond testing situations, we queried about these attitudes after the test was taken. Hence, it is possible that, on one hand, anxiety could impact test performance; on the other hand, test-related variables like test difficulty, test duration, and test appeal could influence self-reported science anxiety, which might, in turn, impact attitudes towards science.

Aims and hypotheses

While science represents a component of STEM, the examination of science achievement in relation to anxiety, and of gender differences in that link, has garnered less attention than comparable work in areas like mathematics. The main aim of the present study is to explore the associations between and gender differences in science test performance, test-taking time, science anxiety, and test-related attitudes. It could be argued that students who have lower science anxiety are more motivated to perform well on a test (i.e., they find the test important to them), and are also willing to exert more effort (Glynn et al., 2009). This, in turn, could lead to better performance. There is evidence in the test-taking literature supporting this logic: lower test anxiety is related to higher motivation and better academic performance, and students with higher test anxiety have an increased potential to engage in academic self-handicapping behavior. In addition, one may argue that higher motivation to perform well can also lead students to spend more time and apply greater diligence to their test-taking. In other words, more time spent on test-taking should correlate with higher motivation, and this could lead to better outcomes. However, based on the literature, one may also anticipate that girls report higher science-related anxiety in comparison to boys, despite similar academic outcomes. Given this reasoning, the following hypotheses are posed:

H1: Science anxiety measures are negatively correlated with science test performance. High levels of anxiety, particularly when specific to a subject like science, can hinder a student's ability to access and utilize accumulated knowledge (Mallow, 2006). Research has consistently shown that high anxiety levels can interfere with learning and impair test performance (Caviola et al., 2022). In addition, the link between academic performance and anxiety is relatively well-established in other STEM domains, like mathematics (Caviola et al., 2022; Guzmán et al., 2023; Namkung et al., 2019), but relatively less researched in the science domain.

H2: Motivational variables (test importance, effort) are positively correlated with test performance. Test-taking motivation and effort are closely connected. Students with higher motivation levels are more likely to exert effort in preparing for and taking the test, in turn leading to improved results (Alhadabi & Karpinski, 2020). The significance of motivation is evident in situations where a student possesses the requisite knowledge but may underperform due to a lack of motivation or perceived relevance of the test to their future goals (Dökme et al., 2022).

H3: Test-taking duration is positively correlated with test importance and effort, as well as test performance. While one might intuitively assume that faster completion times reflect higher levels of proficiency, empirical evidence suggests that rapid guessing (i.e., shorter test-taking time) correlates negatively with test scores (Rios et al., 2022). On the other hand, more time taken to solve assignments is generally linked to better results also in mathematics and science (Silm et al., 2020).

H4: There are no gender differences in science test performance, but girls score higher on science anxiety measures. This hypothesis is inspired by findings from STEM research, particularly in mathematics, where there are generally no gender differences in academic performance, but girls report higher levels of STEM-related anxiety (Foley et al., 2017; Guzmán et al., 2023).

H5: The correlation between test performance and anxiety is stronger in girls than in boys. In other words, anxiety plays a larger role in performance in girls. Previously, it has been shown that the link between mathematics anxiety and performance is stronger in girls than in boys (Devine et al., 2012; Dowker et al., 2016). Additionally, this hypothesis is inspired by findings in the literature showing that, despite having similar academic outcomes in STEM subjects, girls tend to opt for a STEM career less often (Schmader, 2023; Sevilla & Snodgrass Rangel, 2023). The findings could provide an additional explanation for that phenomenon: it could be that STEM is associated with more anxiety for girls than for boys, consequently guiding the former away from considering a STEM career.

Alongside these hypotheses, we explore the potential differences between general and subject-specific anxiety measures in relation to test performance and additional test-related variables (e.g., perceived test difficulty and appeal). While these analyses are not focal in the present study, we believe they may nevertheless be informative and useful.

The results of the present study supplement the STEM education field, as studies regarding gender differences in the interplay between academic performance-related variables and STEM-related anxiety are scarce. If the gender gap in science is related to anxiety, and other variables show correlations with performance and/or anxiety, the findings could inform academics and educational professionals about potential points for anxiety interventions, as outlined by Zakariya (2022).

Methodology

Sample and procedure

The target sample was 12th-grade students from general education schools and 3rd-year (partly 2nd-year) students from vocational educational institutions who had completed the national curriculum for the 4th level of education in physics, chemistry, biology, and geography. The goal was to include 20% (approximately N = 2500) of the students from the respective general student population (N = 10147). The data were collected in March 2023 from a representative sample of students with the aim of investigating science skills and science-related attitudes.

The survey was conducted in classrooms where each student solved the test and responded to the survey from an assigned computer. In total, the advised time for the procedure was 145 min, though this could be extended if necessary. 120 min were assigned solely for the science test administration, 15 min for the survey, and 10 min for a break between test sub-sections.

The students first took the science ability test. After that, the students filled out a three-part post-test survey. The first part focused on epistemic beliefs in science (not used in the present work), the second part was about science anxiety, and the third part regarded test-related attitudes. Importantly, in the part related to science anxiety, all students could respond to items regarding general science anxiety (please see the “Measures” section); subsequently, the students needed to select one subject to assess the related anxiety. Participating in the study was voluntary.

In total, 1907 Estonian 12th-graders took part in the study. After excluding the participants who had missing values, the sample comprised N = 1843 students. Furthermore, students whose test-taking duration deviated by more than three standard deviations from the mean, in either direction, were removed from the analysis. The effective sample comprised N = 1839 students, of whom 944 (51.3%) were female and 895 (48.7%) were male. In total, 1438 (78.2%) participants were from a general secondary school, while 401 (21.8%) were vocational secondary school students; 1466 (79.7%) students took the test in Estonian, and 373 (20.3%) took it in Russian. Although the effective sample size was smaller than the planned N = 2500, the sample was nevertheless representative of the student population of Estonia.
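The duration-based exclusion rule described above can be illustrated with a minimal Python sketch. It is not the authors' code (the reported analyses were run in R); the function name `exclude_duration_outliers`, the use of the sample standard deviation, and the example durations are all assumptions made for illustration only.

```python
from statistics import mean, stdev


def exclude_duration_outliers(durations, n_sd=3.0):
    """Keep only durations within n_sd sample SDs of the mean (both directions)."""
    m = mean(durations)
    s = stdev(durations)  # sample standard deviation (assumed; the paper does not specify)
    lower, upper = m - n_sd * s, m + n_sd * s
    return [d for d in durations if lower <= d <= upper]


# Hypothetical test-taking durations in minutes (NOT study data):
# twenty typical values plus one extreme value.
durations = [60 + i for i in range(20)] + [600]
kept = exclude_duration_outliers(durations)
```

In this toy example only the extreme value falls outside the three-standard-deviation band, so the twenty typical durations are retained.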

Measures

Science ability test. The interdisciplinary science ability test consisted of physics, chemistry, biology, and geography assignments. These context-based multi-part tasks focused on science content and principles in daily life and global phenomena. The test was developed in several iterations across 2018 to 2022, based on both expert feedback as well as psychometric analysis; the detailed test development procedure is described in Vaino et al. (2024).

In total, the test consisted of 37 items which assessed the following facets:

  • (a) Knowledge comprehension. Students were required to explain natural phenomena, identify cause-and-effect relationships, use scientific symbolism, perform calculations, and explain or construct scientific models;

  • (b) Inquiry skills. Students demonstrated mastery of various research skills, from formulating research problems, questions, or hypotheses to evaluating the quality of conducted experiments;

  • (c) Problem-solving. Students were tasked with resolving issues both with scientific content and those with a scientific basis and social relevance, making reasoned decisions;

  • (d) Communication abilities. Students composed short texts on scientific topics, sought information from various sources, and assessed the reliability of the information obtained.

The responses were scored according to the completeness of the solution (e.g., 0 = incorrect, 1 = partially correct, 2 = correct). Confirmatory factor analysis showed an acceptable model fit (Kline, 2015) for the unidimensional solution of the test, χ²(629) = 2268.29, p < 0.001, RMSEA = 0.036, CFI = 0.908, TLI = 0.903 (Rannikmäe et al., 2023). Cronbach’s α for the total test was 0.849.

Post-testing survey: general and subject-specific science anxiety. The science anxiety part of the procedure comprised general and subject-specific statements. These items were created by the survey development team, which included experts in education and psychology. The general science anxiety questionnaire included the following statements:

  1. Natural sciences lessons frighten me.

  2. In natural science lessons, I worry that everything is too complicated for me.

  3. When solving natural sciences homework, I worry that I do not understand it.

The agreement with these statements was rated on a scale from 1 = completely disagree to 4 = completely agree. The scores were summed to form a general science anxiety score. Cronbach’s alpha for this scale was α = 0.86.
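As a hedged illustration of the scoring just described, the following Python sketch sums item responses into a scale score and computes Cronbach's α from the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the sum score). The reported values were obtained in R; the function name `cronbach_alpha` and the response matrix below are hypothetical and are not study data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-student lists of item scores."""
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # transpose: one tuple per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(r) for r in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical responses of five students to three items rated 1-4 (NOT study data).
responses = [
    [1, 1, 2],
    [2, 2, 2],
    [3, 3, 4],
    [4, 4, 4],
    [2, 3, 3],
]
sum_scores = [sum(r) for r in responses]  # summed scale score per student
alpha = cronbach_alpha(responses)
```

The same two steps would apply to the five-item subject-specific scale; only the number of items per student changes.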

In addition, the students could select one of the subjects (options: physics, chemistry, geography, biology) to assess their anxiety related to that subject. The following statements were used:

  1. I often worry that it is difficult for me in [chemistry/physics/geography/biology] lessons.

  2. I get very tense when I have to learn [chemistry/physics/geography/biology] at home (do homeworks).

  3. I get nervous when I am working on solving [chemistry/physics/geography/biology]-related problems.

  4. I feel helpless when I am working on solving [chemistry/physics/geography/biology]-related problems.

  5. I worry that I will get bad grades in [chemistry/physics/geography/biology].

It is important to note that, as mentioned, only one subject could be chosen for evaluation. Hence, this scale was operationalized as “subject-specific” anxiety (as opposed to “general”). As with the general science anxiety inventory, the responses ranged from 1 = completely disagree to 4 = completely agree, and the scores of responses were summed. The internal reliability of the scale across the total sample was Cronbach’s α = 0.87.

Post-testing survey: science test-related attitudes. The students responded to seven items regarding test-taking motivation (1 = completely disagree to 5 = completely agree), which included statements regarding the importance of the test to the student (3 items; example item: “The test was important for me.”) as well as the effort put into test-taking (4 items; example item: “I put in a lot of effort throughout the test”). Cronbach’s α was 0.70 for test importance and 0.84 for effort. In addition, the students evaluated the difficulty (1 = very simple to 5 = very difficult) and the appeal (1 = very interesting to 5 = not at all interesting) of the test with one item each; the appeal item was reverse-coded for better interpretability (1 = not at all interesting to 5 = very interesting).
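The reverse-coding of the appeal item can be sketched as follows. This is an illustrative Python snippet, not the authors' procedure; the helper name `reverse_code` is hypothetical, and the only assumption is the 1 to 5 response range stated above.

```python
def reverse_code(score, low=1, high=5):
    """Mirror a rating on a low..high scale, so that low <-> high."""
    if not low <= score <= high:
        raise ValueError(f"score {score} is outside the {low}-{high} scale")
    return low + high - score


# Original coding: 1 = very interesting ... 5 = not at all interesting.
original = [1, 2, 3, 4, 5]
# Recoded: 1 = not at all interesting ... 5 = very interesting.
recoded = [reverse_code(x) for x in original]
```

After recoding, higher values indicate greater appeal, which matches the direction of the other test-related attitude scores.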

Analysis

The data analysis was conducted in R v4.3.0 (R Core Team, 2023). Descriptive statistics, internal reliability, and Pearson correlation analyses were conducted with psych v2.3.3 (Revelle, 2021) and RcmdrMisc v2.7-2 (Fox, 2022). Gender differences were analyzed with the independent-samples t test available in base R. The lsr package v0.5.2 (Navarro, 2015) was used to compute Cohen’s d as the effect size estimate for gender differences. We used common effect size benchmarks: d = 0.01, 0.20, 0.50, 0.80, 1.20, and 2.00 for “very small”, “small”, “medium”, “large”, “very large”, and “huge” effect sizes, respectively (Cohen, 1988; Sawilowsky, 2009). However, we also want to point out that Kraft (2020) has argued that, in an educational context, it may be more justified to interpret d < 0.05, 0.05 ≤ d < 0.20, and d ≥ 0.20 as “small”, “medium”, and “large” effect sizes, respectively. To test for differences between Pearson correlation coefficients for science test score and other variables, Fisher’s r-to-z transformation was employed, which converts correlations into approximately normally distributed z-scores for comparison (Silver & Dunlap, 1987).
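To make the two effect-related computations more concrete, the sketch below shows a pooled-standard-deviation Cohen's d and a Fisher r-to-z comparison of two correlations from independent groups. It is a minimal Python illustration under stated assumptions, not the authors' R code: the function names are hypothetical, the pooled-SD form of d is assumed rather than confirmed for the lsr call used in the study, and the correlations in the usage lines are invented. Only the group sizes (944 and 895) come from the sample description.

```python
from math import atanh, sqrt
from statistics import NormalDist, mean, variance


def cohens_d(x, y):
    """Cohen's d using the pooled sample standard deviation (assumed variant)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)


def compare_correlations(r1, n1, r2, n2):
    """Two-sided z test for the difference between two independent Pearson r values."""
    z1, z2 = atanh(r1), atanh(r2)            # Fisher r-to-z transformation
    se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))   # standard error of z1 - z2
    z = (z1 - z2) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p value
    return z, p


# Hypothetical scores for two small groups (NOT study data).
d = cohens_d([2, 4, 6], [1, 3, 5])

# Hypothetical correlations for girls and boys, with the study's group sizes.
z, p = compare_correlations(-0.10, 944, -0.15, 895)
```

The comparison assumes the two correlations come from non-overlapping groups, which holds when the coefficients are computed separately for female and male students.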

Results

In this section, we first present the descriptive statistics and correlation analysis results for the total sample. This is followed by the results on gender differences.

Descriptive statistics and correlations for the total sample

The descriptive statistics and correlations for the total sample are displayed in Table 1.

Table 1 Descriptive statistics and correlation coefficients

Table 1 shows that test performance is moderately and positively linked to test-taking duration: more time spent on solving the test is associated with better performance. Similarly, finding the test to be appealing and important for oneself as well as reporting exerting more effort were linked to better performance. On the other hand, test difficulty had a small significant negative correlation with test performance. When it comes to test-taking duration, students who spent more time on test assignments also rated the importance of the test higher, found the test to be more appealing, and reported putting more effort into solving the assignments. These correlations were medium-to-large in effect sizes.

Looking into anxiety measures, the general science anxiety score did not correlate statistically significantly with other variables. Interestingly, general science anxiety did not correlate with subject-specific anxiety. On the other hand, subject-specific anxiety yielded a small negative correlation with test performance, and students who rated the test as more difficult also reported higher subject-specific anxiety. The results also show that students who found the test to be more difficult had more anxiety in the physics and chemistry domains.

When it comes to other test performance-related attitudes, the results in Table 1 show that finding the test appealing, in addition to positively correlating with better results and higher test-taking time, had a positive link with test importance and effort, and a negative association with test difficulty.

Gender differences in test performance-related factors and anxiety

Gender differences in science test performance and performance-related attitudes are displayed in Table 2 and depicted in Fig. 1a, b. The figures display the results for the total sample, as well as for male and female students separately.

Table 2 Gender differences in test and anxiety-related variables
Fig. 1 (a) Error bar plots (95% CIs) for gender differences in science test performance, test-taking duration, and test-taking related attitudes. (b) Error bar plots (95% CIs) for group differences in general and subject-specific science anxiety

The results in Table 2 show that there were no gender differences in science test performance, nor was one group more motivated (i.e., assessing the test importance higher) than the other group to perform well on the test. In addition, girls and boys found the test similarly appealing, and both groups reported similar levels of effort in test performance.

The findings in Table 2, however, show that girls rated the test as more difficult than boys did. Female students also reported higher anxiety levels on all anxiety measures.

How do different factors correlate with test performance in each gender, and do these correlations differ between genders? The latter question addresses whether test-related variables or anxiety measures are more strongly linked to test performance in one group than in the other. To answer these questions, correlation coefficients were compared across the groups. The results are displayed in Table 3 and in Fig. 2a, b.

Table 3 Correlation differences in links between test performance and other performance-related factors across different groups
Fig. 2 (a) Error bar plots (95% CIs) for correlation differences between genders in science test performance, test-taking duration, and test-taking related attitudes. (b) Error bar plots (95% CIs) for correlation differences between genders in general and subject-specific science anxiety

The results in Table 3 show that test duration had the strongest (positive) correlation with test performance, i.e., more time spent on taking the test was correlated with better test results. As reported for the total sample in Table 1, test effort, importance, and appeal also had positive correlations with test performance in both boys’ and girls’ samples. Test score and difficulty were negatively correlated. There were no gender differences in how strongly test-related variables correlated with test scores, indicating similar strengths in these associations across genders.

We also investigated the differences in correlations between test scores and various anxiety measures. Apart from general science anxiety (GS), geography and biology anxiety, other anxiety variables exhibited a statistically significant, albeit small, negative correlation with test scores in both boys and girls. However, the magnitudes of these correlations were similar, as the differences in correlations were not statistically significant.

To account for other possible effects (school type and test language), we conducted separate analyses on a homogeneous group of secondary school students who took the test in Estonian. The results, displayed in Supplementary Materials 1, showed marginal differences. Therefore, we discuss the total sample results as the main findings.

Discussion

The main aim of the present study is to explore the associations between science test performance, test-taking time, science anxiety, test-related attitudes, and gender differences. We posed several hypotheses, and conducted additional analyses.

According to the first hypothesis (H1), we expected science test scores to be negatively associated with science anxiety across the total sample. Previous works have shown that, among STEM subjects, mathematics anxiety generally predicts poorer mathematics achievement (Caviola et al., 2022; Guzmán et al., 2023; Namkung et al., 2019). This hypothesis was partially supported by the data. General science anxiety did not correlate with test performance; rather, the achievement-relevant anxiety seems to be subject-specific, especially when it comes to chemistry. Although the correlation was small, it nevertheless shows that students who reported higher chemistry-related anxiety also scored lower on a science test that includes elements of other natural sciences. These findings also call for caution in treating science anxiety as a general construct, as this anxiety might be subject-specific.

We expected that higher test importance, test-taking effort, time spent on test, and performance would all be positively correlated (H2 and H3). An explanation for such expectations lies in the assumption that students who perceive the science test as important to them, are more likely to put more effort into taking the test. Behaviorally, this could manifest in more time spent on solving the assignments, for example, making sure that the task is understood and double-checking the responses. In turn, this should result in higher test scores. Both hypotheses were supported by the data. In fact, test-taking time had the highest correlation with test performance among all the tested variables, yielding a medium-to-high effect. These results are also in line with literature that has previously demonstrated that more time taken for solving tasks tends to result in better performance (Silm et al., 2020). All other correlations were also positive, in the small-to-medium effect size range. Interestingly, girls took more time to solve the tasks than boys, but there were no differences in test importance and effort. That said, the link between test score and test-taking duration did not differ across genders.

Finally, we expected not to see gender differences in test performance; however, girls were expected to score higher on science anxiety measures (H4). In addition, we expected that the correlation between science test performance and anxiety measures is stronger in girls than in boys (H5). The former hypothesis, too, found support from data. Although general science anxiety did not correlate with other measures, girls nevertheless scored higher, albeit with a small effect size. However, in subject-specific anxiety measures, the effect sizes of gender differences were medium-to-large. On the other hand, the latter hypothesis (H5) did not find support from the data, suggesting that the correlations between anxiety and performance do not differ statistically significantly between boys and girls. These results are interesting, as they seem to indicate that although girls report more anxiety, it does not seem to affect their results (or vice versa) more than those of boys. The findings also contradict those previously found in the mathematics domain (Devine et al., 2012; Dowker et al., 2016). Indeed, one could potentially hypothesize that the gender gap in STEM careers (Schmader, 2023; Sevilla & Snodgrass Rangel, 2023)—at least based on science—might not originate from ability or from anxiety influencing performance, but rather from the anxiety itself.

There could be several explanations for these findings, ranging from societal expectations (e.g., stereotypes) to individual characteristics. Societal stereotypes affect not only how girls’ skills in STEM are perceived, but also how this perception negatively impacts their interest in these subjects (Master & Meltzoff, 2020; Master et al., 2021; Shapiro & Williams, 2012). In addition, women may be perceived as lacking the qualities of scientists due to stereotypes (Carli et al., 2016). It might therefore not be surprising that female students may feel the pressure to perform in STEM subjects while also having a lower STEM self-concept (Leibham et al., 2013). More recently, the role of social media has also been outlined as potentially influencing math and science attitudes (Daniels & Robnett, 2021).

Personality research has demonstrated that women, on average, score higher on the neuroticism personality trait, which reflects experiencing more negative affect, including anxious tendencies (Hofmann et al., 2023; Mac Giolla & Kajonius, 2018). Relatedly, the neuroticism personality trait is not correlated with cognitive ability (Rozgonjuk et al., 2021a, 2021b). Hence, it may be that female students report more science anxiety than male students, despite their abilities not differing significantly. It has also been hypothesized that such anxious tendencies (at least in the STEM context) may drive a self-deprecating cycle in female students, where anxiety carries over to other testing situations (Pelch, 2018). This anxiety possibly affects performance and skews attitudes towards the study domain negatively. In this regard, it may be interesting to further examine the dynamics of anxiety and worrying tendencies in adolescents, and whether anxious tendencies in academic settings also change over time, similarly to more general personality characteristics (Mõttus & Rozgonjuk, 2021).

With regard to additional exploratory analyses, we generally found that the correlations between test performance and other variables did not statistically significantly differ between boys and girls, meaning that the correlations found in the total sample should be applicable to both groups. The students who scored higher on the test also reported more positive feelings towards the test, i.e., found the test more appealing, and, conversely, found the test less difficult than the students who had a poorer test score. This is in line with previous findings, demonstrating that higher test-taking motivation, effort, and liking the test are associated with better achievement (Alhadabi & Karpinski, 2020; Pekrun et al., 2014; Silm et al., 2013; Živković et al., 2023), whereas the perceived difficulty of the test is linked to poorer results (Mazana et al., 2018).

Interestingly, those who found the test appealing also took more time for solving the tasks. Although test difficulty and test appeal had a negative correlation, test difficulty did not predict test-taking time. These results suggest that students who took more time for solving the assignments did not seem to do so because they found the test very challenging; rather, consistent with the results reported above, they appear to have been motivated to give their best. The findings may further underscore the importance of positive affective factors in successful test performance.

Additional interesting insights concern anxiety’s negative correlation with test appeal and its positive link with test difficulty. These links were subject-specific and the effects were rather small, but they do seem to suggest that, in some cases, there might be an interplay between the perception of science assignments and anxiety. Granted that the present study did not investigate causal associations, one could further hypothesize that perceived test difficulty (or appeal) could be affected by the anxiety one has towards a science subject. Attitudes like finding the subject difficult have been associated with anxiety in mathematics before (Rozgonjuk et al., 2020). Building positive attitudes towards science and testing can help boost science self-efficacy which, in turn, is associated with better results in science (Lau & Ho, 2022).

Science as a subject differs from mathematics, as it encompasses more domains. The results of our study showed, though, that science attitudes (namely, anxiety) may also be anchored to a specific subject, rather than reflecting only a general attitude towards science.

Studies tend to report that boys outperform girls in mathematics (Reilly et al., 2015; Rozgonjuk et al., 2023), but there is also evidence that girls outperform boys in science (OECD, 2019a). Regardless, female students tend to have a poorer self-concept regarding these subjects, which could manifest in anxiety towards the subject (Balducci, 2023; Megreya et al., 2021). Our work is in line with the works regarding attitudes: girls do report higher science anxiety. On the other hand, our work also shows that this anxiety is not due to poorer achievement. Based on this, the findings suggest that STEM research might gain additional insights by also analyzing the different domains separately.

The main contribution of the present study was to provide empirical evidence for the ‘S’ in STEM research: science. Although science is an essential field in contemporary education and career opportunities, many STEM-focused papers, including those investigating gender differences, seem to focus either on the ‘M’ in STEM (mathematics) or on STEM in general. Our work revealed not only that there are interesting links and differences in the science domain, but also that science itself might be too general a construct. Instead, specific science subjects could explain the link between performance, anxiety, and perhaps also subsequent career considerations. The theoretical contribution lies in highlighting the subject-specific nature of science anxiety: it may be more nuanced than a single general construct implies. This knowledge can be used to refine theoretical models of (STEM-related) academic anxiety and performance. From a more practical perspective, knowing that girls have higher science anxiety but not necessarily lower ability could be used to form more positive attitudes in girls regarding science. For instance, although underestimated students may perform as well as others, they often have lower academic self-concept and expectations for success (Urhahne et al., 2011). This is also associated with poorer expectations from teachers (Urhahne et al., 2011). It could be argued that the societal expectations for gender roles (Hägglund & Leuze, 2021; Schmitt et al., 2008) may exacerbate the underestimation of the role of girls in science, potentially shaping the attitudes of girls towards science and STEM in general. Hence, developing programs or approaches to inform students and educators that girls are equally capable but more anxious in science than boys could help build more realistic self-concepts regarding science. This may, in turn, help alleviate the gender gap in STEM.

The main limitation of this study is the cross-sectional design. Although anxiety toward science and specific subjects likely precedes test performance, the potential impact of test performance on other evaluations cannot be ruled out. Another limitation is relying on single items in some of the exploratory analyses. However, items like test appeal and difficulty are relatively straightforward. In addition, studies have shown that single-item measures can yield results as reliable and valid as those of longer questionnaires (Diamantopoulos et al., 2012; Gardner et al., 1998; Rossiter, 2002). Even though test-taking duration might be an indication of test-taking effort (Silm et al., 2020), we acknowledge that we cannot definitively distinguish between rapid guessing, proficiency, diligence, and difficulty in answering. Although outside the scope of the present study, such a distinction could be made by investigating the link between response patterns and timestamps of response events. While the anxiety-related items used in this study mostly focused on how the students generally felt about typical science-related educational situations (e.g., regarding homework), it should also be noted that the potential test anxiety experienced by students could have influenced the post-test science anxiety evaluations. Future studies could consider implementing both pre-test and post-test assessments.
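As an illustration of how response-event timestamps could help separate rapid guessing from effortful responding, the following is a minimal sketch that derives per-item response times and flags very fast responses. It was not part of the present analyses; the function name, the data layout, and the 5-second threshold are hypothetical assumptions, and a real application would need an empirically justified threshold for each item.

```python
def flag_rapid_responses(event_times_s, threshold_s=5.0):
    """Flag items answered faster than a (hypothetical) time threshold.

    event_times_s: timestamps in seconds, starting with the test start
    and followed by the submission time of each item in order.
    Returns a list with one boolean per item (True = possible rapid guess).
    """
    # Time spent on each item is the gap between consecutive events.
    durations = [
        later - earlier
        for earlier, later in zip(event_times_s, event_times_s[1:])
    ]
    return [duration < threshold_s for duration in durations]


# Hypothetical timestamps for one student (test start, then four items).
flags = flag_rapid_responses([0.0, 2.0, 40.0, 43.0, 100.0])
share_rapid = sum(flags) / len(flags)
print(flags, share_rapid)
```

The share of flagged items could then be related to item correctness to check whether short test-taking times reflect guessing rather than proficiency.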

The role of technology in STEM education can also be investigated in light of the present results. For instance, the use of virtual reality (Yang et al., 2024), social media (Achilleos et al., 2019; He et al., 2016), as well as smartphones (Mella-Norambuena et al., 2021; Smith et al., 2023) has shown promise in STEM education with regard to improving students’ motivation. More research is needed to investigate gender differences in technology use in STEM settings. In addition, the use of digital technology in classroom settings should be done with caution (Aru & Rozgonjuk, 2022; Rozgonjuk, Täht, et al., 2021), as studies have shown that social media and smartphone use, as well as notifications received from these media and devices, are linked to more procrastination (Rozgonjuk et al., 2018a, 2018b), superficial learning styles (Rozgonjuk et al., 2019; Rozgonjuk, Saal, et al., 2018), and boredom proneness (Elhai et al., 2021; Wolniewicz et al., 2020). Hence, non-skillful implementation of technology assistance may not necessarily improve the attitudes towards science.

Finally, it would be interesting to compare the science-related variables of this study with other STEM components, such as engineering and mathematics, as well as in interdisciplinary contexts (Darmawansah et al., 2023; Gao et al., 2020).

Conclusions

In this study, we focused on the interplay between science achievement, test-related attitudes, and science anxiety, as well as gender differences in these links, in the S-domain of STEM: science. Although some of the findings were in line with previous results from the general STEM and mathematics education fields, there are also indications that “science” might be too broad a concept. Anxiety toward science might be subject-specific. Furthermore, the results showed that although there were no gender differences in terms of science test performance, girls reported consistently higher levels of both general and subject-specific science anxiety. However, the correlations between anxiety measures and performance were not stronger in girls than in boys. The results suggest that the gender gap in STEM might not necessarily stem from ability but rather from motivational variables, such as anxiety. However, additional research is needed to establish the source of science-related anxiety in girls and its potential impact on pursuing a STEM career.