Introduction

Problem posing has long been thought of as a vital intellectual activity in scientific investigation. As Einstein and Infeld (1938) pointed out, the formulation of an interesting problem is often more important than its solution. Research and practice on problem posing are relatively new compared to those on problem solving (Brown & Walter, 1993; Cai et al., 2015), but problem posing is attracting increased attention from both researchers and practitioners. The importance of problem posing in school mathematics is underpinned, for example, by a growing body of empirical evidence showing that problem posing has the potential to support students’ mathematical understanding, problem solving ability, and creativity (Bonotto & Santo, 2015; Cai & Hwang, 2002).

Given the important role of problem posing in the teaching and learning of mathematics, research and practice have aimed to develop ways to enhance students’ problem posing competence. Several studies have shown that, with appropriate instructional support, students and teachers are capable of posing interesting and important mathematical problems (e.g., Cai et al., 2015; Silver & Cai, 2005). While there is an increase in the number of interventions aimed at improving participants’ mathematical problem posing competence (Cai et al., 2020), these interventions have varied widely in their design and implementation, yielding mixed results. The variation suggests a lack of consensus on what constitutes an effective intervention to promote this important competence. In particular, questions like the following remain unanswered: What are the key components of effective interventions in enhancing mathematical problem posing competence? Are certain intervention components more important than others, and if so, for which participant groups and under what conditions? Indeed, no meta-analysis of past interventions in the area of mathematical problem posing exists to answer questions such as these, making it difficult for researchers and practitioners to understand, adopt, or strive toward best practice.

In this paper, we use the term intervention broadly to refer to a purposeful set of actions taken to improve a situation (Stylianides & Stylianides, 2013), in this case the mathematical problem posing competence of individuals at any level, from kindergarten to secondary school students as well as prospective or in-service teachers; these actions could be delivered in various settings (e.g., classrooms or laboratories), with systematic evidence collected to explore their effectiveness. Specifically, in this paper we take a step toward addressing the aforementioned research gap by reporting a synthesis of the different components that were incorporated in interventions aimed at enhancing participants’ mathematical problem posing competence, the findings of a meta-analysis of the treatment efficacy of these interventions, and the moderators’ impact on that efficacy. By doing so, we expect our findings to support researchers and practitioners in their future efforts to design and implement (more) effective interventions to improve students’ or teachers’ mathematical problem posing competence.

Background

Mathematical problem posing

There is no agreement on how mathematical problem posing is defined, though it is generally used to refer to “the process by which, on the basis of mathematical experience, students construct personal interpretations of concrete situations and formulate them as meaningful mathematical problems” (Stoyanova & Ellerton, 1996, p. 519). As a complex notion, mathematical problem posing has been described in different ways: as a logical process (Cai & Hwang, 2020; Cai & Rott, 2024; Stoyanova & Ellerton, 1996); as a product-oriented phenomenon (Silver, 1994); as a role-centered accomplishment shaped by the norms of particular communities (Klinshtern et al., 2015; Kontorovich, 2020); and as a cognitive activity, a research or instructional tool, or a learning goal (Cai & Leikin, 2020; Liljedahl & Cai, 2021).

Despite the varied manifestations embedded in definitions, there is significant overlap among them. They all view the process of problem posing as generating or revealing something new from a set of data, and it is considered to be a form of authentic mathematical inquiry (Bonotto & Santo, 2015). Problem posing is, in fact, of central importance to the discipline of mathematics and to mathematical thinking (Kilpatrick, 1987). The advancement of mathematics requires creative imagination as the result of raising new questions, creating new possibilities, and viewing old questions from new angles (Ellerton & Clarkson, 1996). Indeed, the identification and posing of good problems was recognized to be an important part of doing high-quality mathematics decades ago (Hadamard, 1945).

If a goal of education is to prepare students for the kinds of thinking they will need in the future, it seems reasonable that “the experience of discovering and creating one’s own mathematics problems ought to be part of every student’s education” (Kilpatrick, 1987, p. 123) rather than reserved for candidates for advanced degrees in mathematics. Partly based on realizations such as this one, in recent years, several curriculum frameworks around the world have supported the central role of problem posing in students’ mathematical education as a way of helping students learn how to think creatively and engage in mathematical inquiry (e.g., Chinese Ministry of Education, 2022; Ministry of Education of Italy, 2007; National Council of Teachers of Mathematics (NCTM), 2000; Toh et al., 2023).

In line with the growing recognition of the significance of problem posing in school mathematics, there has been a surge in research studies focused on exploring various aspects of this notion. This literature can be categorized into the following three strands (Cai & Leikin, 2020; Liljedahl & Cai, 2021): research on problem posing as a cognitive activity, which focuses on understanding the nature of problem posing itself and its relationship with other constructs; research on problem posing as a tool, which investigates how problem posing can serve to improve students’ or other participants’ learning of mathematics more generally; and research on problem posing as a goal, which focuses on how one develops the capacity for posing good problems. Theoretical arguments and empirical evidence supporting the importance of problem posing competence in school mathematics describe mathematical problem posing both as a valuable goal in itself and as a tool to accomplish broader mathematical goals through engaging in problem posing activities. For instance, mathematical problem posing can deepen mathematical understanding, advance mathematical problem solving skills, promote mathematical creativity, and foster positive attitudes toward mathematics (Cai et al., 2015; Rosli et al., 2014). As elucidated by Cai (2022), this approach of viewing problem posing as a tool emphasizes engaging participants in problem posing tasks and activities to help them achieve a wider range of cognitive and noncognitive learning goals while, at the same time, developing their problem posing competence as they engage in these tasks.

Despite widespread recognition of problem posing as an important intellectual competence in school mathematics and research that has shown that students and teachers are capable of posing worthwhile mathematical problems, participants often pose problems that are nonmathematical, irrelevant, unsolvable, or unclear, or that contain errors (Cai & Hwang, 2002; Joaquin, 2023; Silver, 1994; Silver & Cai, 1996; Zhang et al., 2022a). Several hypotheses have been offered for these difficulties. For example, Crespo and Sinclair (2008) hypothesized that the difficulties might relate to a lack of opportunity for participants to explore the problem situation adequately during the problem posing process. English (1997) proposed that participants might lack foundations in problem posing. Ellerton (2013) argued that the difficulties might arise from little or no opportunity for participants to be involved in problem posing. Indeed, most of the mathematical problems a learner encounters during their education have been posed and formulated by others – the teacher or the textbook author (Kilpatrick, 1987).

Several efforts to address these difficulties have been made. For example, some researchers attempted to provide participants with more opportunities for exploration of mathematical situations (Crespo, 2003; English, 1998), while others explored the characteristics of disciplinary practice in order to identify strategies to facilitate high-quality problem posing (Brown & Walter, 1993; Milinković, 2015). While the results of most such efforts generally suggest that it is feasible to improve participants’ mathematical problem posing competence, there is a wide variation in the design of the interventions, their research participants, and the instruments they used to measure the variables of interest including the outcomes. Thus, it is not clear what intervention designs are effective, with respect to what outcome measures, and for whom (Cai et al., 2015). A meta-analysis of these interventions and their effect on mathematical problem posing competence is sorely needed.

Interventions to enhance mathematical problem posing competence

Mathematical problem posing competence refers to the criterion behavior as well as the knowledge, cognitive skills, and affective-motivational dispositions that underlie that behavior while engaging in the activity of mathematical problem posing (Zhang et al., 2023). Conceptually, it is assumed to involve a multitude of cognitive and affective states that change throughout the duration of the problem posing activity and cannot all be directly observed but rather must be inferred from observed behavior (Blömeke et al., 2015). However, the development of participants’ mathematical problem posing competence has been documented to result in particular positive observed cognitive outcomes, such as higher quality and quantity of posed problems, and in more positive affective-motivational dispositions that underlie the cognitive outcomes (Bicer et al., 2020; Cai & Leikin, 2020; Zhang et al., 2023).

As discussed earlier, although there are several interventions that aimed to enhance students’ and teachers’ mathematical problem posing competence, these have not been systematically reviewed. There are a few reviews related to mathematical problem posing, but they align predominantly with the “problem posing as a tool” perspective as compared to the other perspectives we discussed earlier. For example, Rosli et al. (2014), Kul and Çelik (2020), and Wang et al. (2022) conducted meta-analyses on the effects of engaging in problem posing activities on students’ learning of mathematics. Other reviews examined the effects of engaging in problem posing activities on particular learning goals such as problem solving (Kopparla et al., 2019; Priest, 2009) and mathematical attitudes and achievement (Bevan et al., 2019). While these reviews occasionally ventured into a few studies examining the effect of engaging in problem posing activities on the development of problem posing itself, their primary focus remained on demonstrating the merit of problem posing as a tool with a broad-based impact on learning mathematics, as opposed to examining problem posing as a goal. Accordingly, the aforementioned reviews are informative but insufficient to reveal what might constitute effective interventions, including but not limited to methods of engaging participants in problem posing activities, to enhance mathematical problem posing competence. To the best of our knowledge, no attempt has been made to synthesize interventions that aimed to positively impact participants’ mathematical problem posing competence, that is, intervention studies treating problem posing as a goal. Hence, a systematic approach to reviewing this body of empirical research is needed to understand what intervention components might be important to include and what the potential moderators of treatment efficacy are.

Intervention components

According to a constructivism-oriented viewpoint, special attention should be paid to analyzing the core components of interventions in order to stimulate meaningful reflection (Danusso et al., 2010), including on questions like “what works?” To address this issue, Harden and Thomas (2005) described intervention development as “‘ideas’ for actions to affect outcome ‘X’,” and they suggested we think about questions like “how do people experience ‘X’?” or “what factors make it more/less likely that ‘X’ occurs?” Interventions (or “ideas” for actions) designed by educational researchers are invariably inspired by theories of learning, cognition, motivation, or development (Pressley et al., 2006). As far as intervention design for enhancing participants’ mathematical problem posing competence is concerned, the “ideas” for actions could emerge from an analysis of the reasons why participants have difficulties posing mathematical problems. For example, we discussed such reasons earlier, including participants lacking a foundation in problem posing or opportunities to engage in problem posing or to explore problem posing situations (Ellerton, 2013; English, 1997). Therefore, interventions may provide instructional practice or offer relevant resources in response to what participants are lacking. Also, the “ideas” for actions can be inspired by accumulated evidence on strategies used for generating mathematical problems, such as the “what-if-not” strategy (Brown & Walter, 1993), which involves participants listing the elements of a problem and then generating a new problem by asking “what if not element k?”

In conclusion, prior research provided the necessary theoretical foundation for the development of interventions, including possible instructional practices, resources, or strategies, enabling informed decisions about how to shape and organize the particular aspects of treatments (Pressley et al., 2006). In the present review, we adopted Bicer’s (2021) definition of “instructional practices in mathematics education” and Boller et al.’s (2014) typology of “educational quality improvement interventions” to identify the intervention components that were incorporated in interventions aiming to improve participants’ mathematical problem posing competence. These components included activity-based practice that participants were required to experience (e.g., problem posing activity), method-based assistance that helped participants to pose problems (e.g., problem posing strategies, technology), and environment-based support that guided interaction (e.g., peer discussion).

Potential moderators of treatment efficacy

Empirical studies have been conducted to examine the effects of interventions on participants’ mathematical problem posing competence. Given the wide range of intervention designs and implementations, it is not surprising that there is heterogeneity in effect sizes between studies. Knowledge about study features (i.e., moderators) that can explain the heterogeneity in effect sizes can be useful for researchers to evaluate the effectiveness of existing interventions and design new potentially effective interventions (Li et al., 2020a). Even though no meta-analysis has been conducted to examine the moderating effect of intervention designs on improving participants’ mathematical problem posing competence, in this review we followed previous meta-analyses with respect to mathematical learning (e.g., Myers et al., 2022; Niu et al., 2013) and grouped these moderators based on research design, sample characteristics, and intervention characteristics. In what follows, we discuss separately each group of moderators.

Regarding research design, we used Garzón et al.’s (2020) typology that grouped studies as between-participants design studies involving experimental and control treatments to measure the raw difference between treatments (pretest–posttest-control, posttest only with control) and within-participants design studies employing a single-group pretest and posttest design (single-group pretest–posttest). Within-participants designs, as highlighted by Cohen (1988) and Maxwell and Delaney (2004), benefit from increased statistical power due to minimized variability and reduced error variance, potentially leading to larger observed effect sizes. Furthermore, Niu et al. (2013) argued that within-participants designs are vulnerable to most threats to internal validity such as maturation, history, and testing, since they lack a control group, which might also contribute to larger observed treatment effect sizes compared to between-participants designs. In their review, Niu et al. (2013) empirically indicated that within-participants design studies (mean ES = 0.312, SE = 0.087) had significantly larger effect sizes than between-participants design studies (mean ES = 0.120, SE = 0.074), with a significance level of 0.10. Although this level of significance is less stringent than 0.05, it still suggests that within-participants designs tend to yield larger observed effects due to their inherent methodological characteristics. Hence, we hypothesize that studies adopting a within-participants design will yield a higher mean treatment effect size than studies using another design.

Regarding sample characteristics that could help determine for which group of participants interventions may be most useful, we considered sample level (K-12 students vs. prospective teachers vs. in-service teachers) and sample size (small group vs. medium group vs. large group). Although prior meta-analyses in mathematics have attempted to examine the moderating effect of sample characteristics, their findings have been largely inconsistent. For example, for sample level, Rosli et al. (2014) found that prospective teachers were strongly influenced by engaging in problem posing activities across all mathematical learning outcomes compared to grade 4–12 students. However, Wang et al. (2022) concluded that there was not enough evidence that sample level was a moderator of the effect of problem posing strategies on mathematical learning achievements. Silver (1994) suggested that students who have been exposed to traditional forms of mathematics teaching for a long time (e.g., students in higher grade levels) and were relatively successful in learning mathematics in this style of teaching were more likely to have a lower motivation level in posing problems compared to younger students. In addition, Voica and Pelczer (2009) found that in-service teachers’ pedagogical knowledge and classroom experience constrained their views of the problems they could pose. While we recognize that problem posing may hold varying significance for teachers (in-service or prospective) and students from a pedagogical standpoint, our understanding of the differences in treatment efficacy across sample levels, such as students versus prospective or in-service teachers, remains limited. Thus, we expect to see greater improvement among younger participants and seek to investigate whether the learners’ level notably affects the efficacy of the treatment. Regarding sample size, some intervention studies on learning achievement showed that effect sizes for different sample sizes differed significantly (Zheng et al., 2020), while others reported no significant moderation effect (Borde et al., 2017). Considering that mathematical problem posing is a relatively new activity compared to problem solving for many participants (Cai et al., 2015; Zhang et al., 2022a, 2022b), we hypothesize that assistance through smaller groups may contribute to more significant gains.

Regarding intervention characteristics that could help determine the conditions under which interventions are most effective (Myers et al., 2022), we considered intervention duration (short duration vs. medium duration vs. long duration), the number of intervention components (single component vs. multiple components), and the mode of intervention components (activity-based practice vs. method-based assistance vs. environment-based support vs. mixed). Regarding intervention duration, the results from prior research have been inconsistent. Wang et al. (2022) found that longer-duration interventions were associated with larger improvement in students’ mathematical dispositions compared to shorter-duration interventions, but several other studies found that delivering medium-duration interventions was the primary source of heterogeneity and influenced the effect size of learning achievement the most (Liu & Pásztor, 2022; Zheng et al., 2020). Therefore, intervention duration may have a significant impact on the improvement of participants’ mathematical problem posing competence, but the direction and strength of this impact may vary depending on other moderators, such as the target sample characteristics and the intervention delivery method. Regarding the number and mode of intervention components, intervention studies often have several components implemented across one or more settings by different intervention agents, with a general lack of consensus on what causes or contributes to specific outcomes associated with a particular intervention design (Sheridan et al., 2019). The number and the mode of core intervention components that contribute to positive outcomes in mathematical problem posing competence have not been empirically determined. Such information is necessary to direct the design and implementation of effective interventions (Damschroder & Hagedorn, 2011; Sheridan et al., 2019). Hence, we hypothesize that different intervention components are associated with different levels of effectiveness at improving participants’ mathematical problem posing competence.

The focus of this meta-analysis

To take a step toward understanding the impact of existing published interventions on participants’ mathematical problem posing competence, we conducted a meta-analysis of this body of research to address the following three research questions.

  • RQ1: What components were incorporated in published interventions for enhancing participants’ mathematical problem posing competence?

  • RQ2: What are the overall treatment effects of the published interventions on participants’ mathematical problem posing competence?

  • RQ3: What moderators (e.g., research design, sample characteristics, and intervention characteristics) influenced the effectiveness of published interventions on participants’ mathematical problem posing competence?

At the instructional design level, we aim to cast light on the components that researchers incorporated in interventions for improving mathematical problem posing competence so as to deepen understanding of the mechanisms by which this competence can be enhanced (RQ1). In addition, we are interested in the overall treatment efficacy of published interventions on participants’ mathematical problem posing competence (RQ2) and in the moderators’ effect on treatment efficacy (RQ3) so as to cast light on what works best and inform the future design of (more) effective interventions.

Methods

Literature search

We followed the standardized guidelines for systematic reviews of the Preferred Reporting Items for Systematic Reviews and Meta-analysis (PRISMA) (Page et al., 2021). In July 2021 we electronically searched the following databases, which we identified based on Depaepe et al.’s (2013) review of commonly used databases in mathematics education research: Web of Science, Educational Resources Information Center (ERIC), PsycINFO, and Springer. To make the search as comprehensive as possible, the query string with Boolean operators, partially adapted from the search terms used in Lo et al.’s (2017) and Wang et al.’s (2022) reviews, was set as follows: (math* OR algebra OR trigonometry OR geometry OR calculus OR statistics) AND (“problem posing” OR problem-posing). In an effort to ensure all eligible publications were identified, the first 15 pages of Google Scholar search results (10 publications per page, ordered by relevance) were cross-referenced with the compiled inclusion bibliography, using the same search terms.

Inclusion and exclusion criteria

The typology of research literature on mathematical problem posing (Cai & Leikin, 2020; Liljedahl & Cai, 2021) that we discussed earlier included the following three strands: research on problem posing as a cognitive activity, research on problem posing as a tool, and research on problem posing as a goal. We considered the last strand (problem posing as a goal, concerning how one develops the capacity for posing good problems) as the only one relevant to our review given our particular focus, and we formulated the exclusion criteria so as to filter out intervention studies belonging to the other two strands. To incorporate as many pertinent publications as possible and differentiate publications between problem posing as a goal, problem posing as a cognitive activity, and problem posing as a tool, we followed Stylianides et al. (2024) and examined the (main) focus of each publication. If a publication stated in its title or abstract that enhancing participants’ mathematical problem posing competence was one of the (or the only) primary research aims, we classified it in this review as research on problem posing as a goal. In particular, the inclusion and exclusion criteria were applied in stages as follows.

In the stage of title and abstract screening (stage 1), the literature had to (a) be peer-reviewed and published in journal articles or book chapters; (b) be printed in English; (c) be published between January 1990 and June 2021; and (d) include the term “mathematical problem posing” or “problem posing” (in the subject of mathematics) in the title and/or abstract/keywords. Publications that met the inclusion criteria underwent full review (stage 2), while those that did not were excluded. Additionally, publications for which a copy could not be obtained were also excluded.

In the stage of full review (stage 2), the publications that reported research on problem posing as a goal, as we explained previously, were included. The number of these publications was reduced using the following exclusion criteria: (a) duplicate publications from different databases, or book chapters if there were journal articles that reported the same data/analysis or findings (journal articles were typically more elaborate); and (b) publications whose full review revealed that problem posing as a goal was in fact not one of the main aims.

After obtaining the set of publications based on the previous criteria for more detailed review, more restrictive criteria were applied to further select eligible studies for the meta-analysis (stage 3): (a) publications that reported on at least one type of intervention component related to mathematical problem posing along with statistical evidence; (b) publications that measured participants’ mathematical problem posing and reported enough quantitative data such that the effect size could be calculated; (c) publications that considered the intervention implementation effects including the comparison of treatment and control groups or the comparison of a single pre- and post-group.

Publications identified and selected

In the stage of title and abstract screening (stage 1), two research assistants (master’s students majoring in mathematics education) classified 30 publications randomly selected from a total of 1412 publications. The inter-rater agreement for judging whether these 30 articles should be included for full review was 100%. They then independently reviewed the remaining 1382 publications. The sample was reduced to 509 publications, following title and abstract screening and the removal of 2 publications for which a copy could not be obtained, for full review (stage 2). This second stage resulted in the exclusion of 220 duplicate publications, 89 publications that viewed problem posing as a tool (e.g., Bicer et al., 2020; Darhim et al., 2021), and 161 publications that viewed problem posing as a cognitive activity or as an independent variable (e.g., Van Harpen & Presmeg, 2013). Finally, a total of 39 publications met the inclusion criteria and were retained for systematic review (stage 3). Among them, five publications presented only a suggested intervention without an experiment (Abramovich & Cho, 2015; Aydin & Monaghan, 2018; Contreras, 2007; Lavy & Bershadsky, 2003; Milinković, 2015), and eight others reported uncertain experimental design information, such as missing experiment details and intervention duration (Xia et al., 2007) or insufficient statistical data (Abu-Elwan, 2007; Bonotto, 2010; Courtney et al., 2014; Crespo, 2003; Kwon & Capraro, 2018; Öçal et al., 2020; Otun & Njoku, 2020); the remaining 26 publications employed at least one intervention component and reported statistical data for the calculation of effect sizes and thus were used for the meta-analysis. Figure 1 presents the search flow summary.

Fig. 1 Search flow summary

Data extraction and coding

Two trained research assistants extracted the following data (where available) from the 26 publications included in the meta-analysis: (a) publication information including the study details (i.e., DOI, author names, publication year, country of origin) and type of publication (journal article, book chapter); (b) intervention information including the research design based on Garzón et al.’s (2020) typology (i.e., pretest–posttest-control that evaluates participants before and after the treatment, posttest only with control that evaluates participants only after the treatment, and pretest–posttest that evaluates a single group of participants before and after the treatment), sample characteristics (i.e., age and grade relevant to sample level, sample size), and intervention characteristics (i.e., intervention duration, intervention components, the number of intervention components, and the mode of intervention components); and (c) measured outcome resulting from the intervention including the type of outcome (i.e., the quantity of posed problems, the quality of posed problems, and noncognitive aspects of problem posing) and statistical outcome (i.e., the effect size or some other relevant data reflecting the level of participants’ mathematical problem posing competence).

In particular, in terms of the data coding of the “intervention components” for RQ1, we considered the question “how did participants experience problem posing?” following the guidance for systematic reviews proposed by Harden and Thomas (2005). The information from the procedure/design sections of the methodology in each study that conducted an experiment was parsed into discrete categories of intervention components. If a study suggested intervention components without reporting on an experiment, we extracted the data from the description of the suggested intervention components and any evidence or examples that were provided as rationale for the suggestions. The constant comparative method (Strauss & Corbin, 2008) was used to identify specific components belonging to the particular categories of intervention components that we discussed earlier: activity-based practice, method-based assistance, and environment-based support. Based on these categories, we identified the mode of intervention components as single-based support (e.g., activity-based practice only), two-based support (e.g., activity-based practice combined with method-based assistance), or mixed-based support (all three categories of components combined).

Following the criteria for data extraction, all members of the research team initially examined a random selection of two articles from the 26 studies included in the meta-analysis. Results were discussed to ensure agreement and consistency in data extraction across research assistants. Two research assistants then audited the extracted data. The inter-rater reliability, calculated with Cohen’s kappa statistic across the coding points from the 26 reviewed studies, was 0.9 (Cohen, 1992). All disagreements were discussed until consensus was reached.
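
For readers unfamiliar with the statistic, the following minimal Python sketch illustrates how Cohen’s kappa can be computed for two coders labeling the same items; the component codes shown are hypothetical and are not data from our study.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    # Observed agreement: proportion of items given identical codes
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected agreement if the two raters coded independently
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical intervention-component codes for six coding points
a = ["PPA", "SPP", "PPA", "ILE", "WPP", "PPA"]
b = ["PPA", "SPP", "PPA", "ILE", "SPP", "PPA"]
print(round(cohens_kappa(a, b), 2))  # 0.75
```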

Effect sizes (ESs)

A meta-analysis integrating studies with different research designs allows us to accumulate a larger sample and provide a more complete overview of interventions on a particular topic, which avoids sample noise that can lead to an incorrect or inconclusive interpretation of the results (Lipsey & Wilson, 2001). In this review, where possible and according to Garzón et al.’s (2020) typology, we included between-participants designs (i.e., pretest–posttest-control and posttest only with control) to measure the raw difference between treatments (raw-score metric) and within-participants designs (single-group pretest–posttest) to evaluate the change difference after treatment (change-score metric). In order to balance the synthesis of the best quality evidence and describe the extent of change in mathematical problem posing competence attributable to the interventions, we opted for the raw-score metric as the common metric and transformed each change-score metric effect size (ES) into a raw-score metric ES, as recommended by Morris and DeShon (2002). The equation \(d_{\text{BP}} = d_{\text{WP}}\sqrt{2(1-\rho)}\) was used, where \(d_{\text{BP}}\) is the transformed ES for the raw-score metric, \(d_{\text{WP}}\) is the ES for the change-score metric, and \(\rho\) is the correlation between pretest and posttest scores.
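
As a minimal sketch of this transformation (the pre-post correlation \(\rho\) values below are hypothetical, chosen only to illustrate the behavior of the formula):

```python
import math

def wp_to_bp(d_wp, rho):
    """Convert a change-score (within-participants) effect size to the
    raw-score (between-participants) metric (Morris & DeShon, 2002)."""
    return d_wp * math.sqrt(2 * (1 - rho))

# With rho = 0.5 the two metrics coincide, since sqrt(2 * 0.5) = 1
print(wp_to_bp(0.8, 0.5))  # 0.8
# Higher pre-post correlations shrink the transformed raw-score ES
print(round(wp_to_bp(0.8, 0.7), 3))  # 0.62
```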

Specifically, the ESs from each comparison were calculated as Hedges’ g (Lipsey & Wilson, 2001). The calculation of the ES was performed on an individual basis because some studies employed more than one outcome measure of problem posing competence. We calculated the Glass’s \(\Delta\) effect size and standard error of each learning outcome in a study from the extracted data (i.e., the mean scores of the experimental group on pretest and posttest, t-statistic, chi-square, sample size, etc.) and the corresponding statistical formulas (Lipsey & Wilson, 2001). Then, we applied the Hedges’ g adjusted estimate to each ES index to correct for sampling bias (Lipsey & Wilson, 2001). Where a study provided several ESs with respect to one particular aspect of outcome, such as the quantity of posed problems, the quality of posed problems, or a noncognitive outcome, we averaged the ESs and standard deviations to calculate the overall ES (Bernard et al., 2004). Finally, we grouped similar research outcomes and tested the homogeneity of the effect size distribution using the Hedges’ g indices.
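
The following sketch illustrates the standard standardized-mean-difference computation with the small-sample correction (formulas as in Lipsey & Wilson, 2001); it is a generic illustration with hypothetical group summaries, not the exact pipeline used in this review:

```python
import math

def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference with Hedges' small-sample correction."""
    # Pooled standard deviation across treatment and control groups
    sd_pooled = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2)
                          / (n_t + n_c - 2))
    d = (mean_t - mean_c) / sd_pooled
    # Correction factor J ~= 1 - 3 / (4 * df - 1) for small samples
    j = 1 - 3 / (4 * (n_t + n_c - 2) - 1)
    return j * d

# Hypothetical posttest summaries: treatment vs. control
print(round(hedges_g(7.4, 6.1, 2.0, 2.2, 30, 30), 2))  # 0.61
```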

Risk of publication bias

Published research comprises only a proportion of all the research conducted. Unpublished research may differ significantly from published research due to the selectivity of what gets published (Song et al., 2013); in particular, studies with significant outcomes tend to get published more than those with nonsignificant outcomes (Stern & Simes, 1997). If only published papers are used for a meta-analysis, the results may be biased (Sutton, 2009), which is a major threat to meta-analytic validity. Therefore, to assess publication bias we examined the symmetry of the effect size distribution using Egger’s test (Balduzzi et al., 2019), trim and fill analysis, and a mixed-effects meta-regression test for funnel plot asymmetry.
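
As an illustration of the first of these checks, the sketch below implements Egger’s regression test from first principles: the standardized effect is regressed on precision, and an intercept far from zero signals funnel plot asymmetry. The data are hypothetical; in practice such tests are run with dedicated meta-analysis packages (e.g., the software described by Balduzzi et al., 2019).

```python
import numpy as np
from scipy import stats

def eggers_test(es, se):
    """Egger's test: regress ES/SE on 1/SE and test the intercept."""
    es, se = np.asarray(es, float), np.asarray(se, float)
    z, precision = es / se, 1.0 / se
    X = np.column_stack([np.ones_like(precision), precision])
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)  # OLS: z = b0 + b1/SE
    resid = z - X @ beta
    sigma2 = resid @ resid / (len(z) - 2)
    se_b0 = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[0, 0])
    t = beta[0] / se_b0
    p = 2 * stats.t.sf(abs(t), df=len(z) - 2)
    return beta[0], p  # intercept and its two-sided p-value

# Hypothetical study-level effect sizes and standard errors
g = [0.9, 0.7, 0.5, 1.1, 0.3, 0.6]
se = [0.30, 0.25, 0.15, 0.35, 0.10, 0.20]
print(eggers_test(g, se))
```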

Moderator analysis

Since there are multiple moderators, they may amplify or attenuate each other’s effect on treatment effectiveness. A stepwise meta-regression analysis was used to explain the sources of differences (i.e., heterogeneity) between studies and to explore moderators that impact treatment efficacy (Hedges, 1982). We included all the potential moderators in a stepwise meta-regression model to investigate whether particular moderators explained any of the heterogeneity of treatment effects between studies. Regression coefficients were estimated with the weighted least squares approach, with weights based on the random effects model to approximate inverse variance. We used a small-sample adjusted t-test to determine whether there was a relationship between moderators and effect sizes in the population, as well as an adjusted F-test to assess model fit (Tipton & Pustejovsky, 2015). In addition, outliers might inflate the residual heterogeneity and the mean estimated effect size (Viechtbauer & Cheung, 2010). To identify potential outliers, we followed Myers et al.’s (2022) method and considered a value an outlier if it exceeded the 75th percentile by more than 1.5 times the interquartile range. The resulting acceptable range was −1.033 to 2.588, so we removed the one effect size above this threshold: the study by Kalmpourtzis (2019), with an effect size of g = 3.32, was considered an outlier. A sensitivity analysis of the models with and without the outlier was performed to examine the robustness of our results (Harwell & Maeda, 2008).
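
A minimal sketch of this outlier rule (the effect sizes below are hypothetical apart from the flagged value of 3.32 reported above):

```python
import numpy as np

def iqr_upper_fence(effect_sizes, k=1.5):
    """Flag effect sizes above Q3 + k * IQR (Tukey's fence, as in
    Myers et al., 2022); returns the fence and the flagged values."""
    es = np.asarray(effect_sizes)
    q1, q3 = np.percentile(es, [25, 75])
    fence = q3 + k * (q3 - q1)
    return fence, es[es > fence]

fence, outliers = iqr_upper_fence([0.2, 0.5, 0.7, 0.9, 1.1, 3.32])
print(fence, outliers)  # 1.8 [3.32]
```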

Results

Description of selected studies

In Table 2 in the Appendix, we summarize all 26 studies that were included in the meta-analysis. Figure 2 presents the intervention duration and participant age of the 26 studies that reported experimental findings. These studies were published from 1996 (Silver et al., 1996) to 2021 (Ayvaz & Durmus, 2021; Cai & Hwang, 2021; Leavy & Hourigan, 2022), but most of them (19/26) were published after 2010. The studies were conducted in ten different countries/districts (Australia, China, Cyprus, Greece, Indonesia, Ireland, Israel, Japan, Turkey, USA) and included a range of participants: kindergarten, elementary school, and secondary school students (mostly ages 5 to 18, 13 studies), university students preparing to become elementary or secondary school teachers (mostly ages 18–22, 10 studies), and in-service teachers (mostly ages over 23, 3 studies). The duration of the interventions varied widely: less than 1 day (9 studies), between 1 day and 1 week (2 studies), between 1 week and 1 month (5 studies), and more than 1 month (10 studies).

Fig. 2 Outline of the studies included in the meta-analysis (n = 26). Note: All reviewed publications in this figure are included in the reference list

Intervention components incorporated in studies

To address RQ1, we used evidence from the 26 studies included in the meta-analysis and identified the following intervention components that the studies used for enhancing participants’ mathematical problem posing competence (see last column of Table 2 in Appendix). The intervention components were categorized as activity-based practice, method-based assistance, and environment-based support, and are summarized in Fig. 3. Next, we elaborate on each category of intervention component separately.

Fig. 3 Frequency of studies from those included in the meta-analysis (n = 26) that incorporated a particular category of intervention component

Regarding activity-based practice, the intervention components included “overview of what problem posing is” in 26.9% of the reviewed studies (WPP, n = 7), “discussion of what ‘good’ problems are” in 15.4% of the reviewed studies (WGP, n = 4), “engagement with problem posing activities” in 65.4% of the reviewed studies (PPA, n = 17), and “evaluation of posed problems” in 19.2% of the reviewed studies (EPP, n = 5). Establishing knowledge of what problem posing is (WPP) and value judgements of the products of the problem posing activity (WGP) are important and pervasive for participants’ subsequent engagement in problem posing (Cai et al., 2020). The most common component (PPA), included in about two-thirds of the reviewed studies, was to set out a variety of problem posing activities for participants to engage in. It directly provided experience and practice for participants in generating problems (Cai & Hwang, 2021; English, 1997). Evaluation of the problems posed by posers or presented by researchers (EPP) is an approach that enables researchers to gather evidence of how participants make judgements about their problem posing performance and the rationale behind problem selection and modification.

Regarding method-based assistance, reviewed studies in this category offered assistance in the form of the following intervention components: “comprehension of the problem posing situation” in 15.4% of the reviewed studies (CPPS, n = 4), “use of strategies involved in problem posing” in 19.2% of the reviewed studies (SPP, n = 5), “use of problem posing examples” in 15.4% of the reviewed studies (PPE, n = 4), and “use of technology in problem posing” in 19.2% of the reviewed studies (TPP, n = 5). CPPS helps participants gain familiarity with the situation of problem posing tasks so that they can push and pull at its constraints and become aware of its various characteristics, possible tensions, and so on (Hawkins, 2000). Studies in this category equipped participants with scaffolding for problem posing, such as strategies (e.g., what-if-not strategies, SPP) and examples (PPE), to reduce the entry barrier. In addition, the use of technology (TPP) in an intervention was often intended to help participants better engage in particular intervention components (e.g., SPP or PPE). Technology also contributes to better applying realistic or game scenarios to problem posing (Aydin & Monaghan, 2018).

Regarding environment-based support, 46.2% of the reviewed studies attempted to incorporate such support in the form of “creation of an interactive learning environment” (ILE, n = 12) in the interventions. Interactive support fosters participants’ feelings of safety and appreciation, together with an increased interest in within-solution problem posing and openness to trying new things (Schindler & Bakker, 2020). It could be embedded in any type of intervention component (e.g., Cai & Hwang, 2021; English, 1997).

The treatment effect on mathematical problem posing competence

To address RQ2, the forest plot in Fig. 4 presents the overall treatment effect on the clusters of measured outcomes, including the quantity and quality of the posed problems, and noncognitive outcomes. For the random-effects model, the mean effect across the 30 effect sizes from the 25 studies remaining after excluding the outlier was medium, positive, and significant (g = 0.72, 95% CI = [0.53, 0.90], p < 0.001) according to Hedges’ (1982) general benchmarks. Regarding each cluster of measured outcomes, the treatment effect on the quality of posed problems was larger than that on the quantity of posed problems, and smaller than that on the noncognitive outcomes: the effect sizes for these three clusters were 0.73, 0.60, and 0.79, respectively. However, we found no significant between-group variance across clusters of measured outcomes (p = 0.88 > 0.05), indicating that the interventions positively affected the quantity and quality of posed problems and noncognitive aspects of problem posing to a similar degree.

Fig. 4 Overall treatment effect on the clusters of measured outcomes. Note: We excluded the outlier (the study by Kalmpourtzis, 2019) because its unusually large effect size of g = 3.32 would likely bias the overall ES (Lipsey & Wilson, 2001)

For the model that included the outlier (31 effect sizes from all 26 studies), the test of heterogeneity showed a large and significant residual heterogeneity estimate across studies (QB(30) = 121.62, \(I^2\) = 75.3%). The sensitivity analysis showed that removing the outlier did not substantively alter the magnitude and direction of the point estimates generated by the model that included the outlier. Although excluding the outlier reduced the residual heterogeneity obtained using all data points by nearly 6.1%, the sensitivity analysis estimates still showed considerable heterogeneity in the effect sizes (QB(29) = 94.25, \(I^2\) = 69.2%). This shows that there was significant variation among the studies even after removing the influential data point.
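
For readers who wish to reproduce statistics of this kind, the sketch below computes a DerSimonian-Laird random-effects pooled estimate together with Cochran’s Q and \(I^2\); it is a generic illustration with hypothetical inputs, not the exact model fitted in this review.

```python
import numpy as np

def random_effects_pool(es, se):
    """DerSimonian-Laird random-effects pooling with Q and I^2."""
    es, se = np.asarray(es, float), np.asarray(se, float)
    w = 1.0 / se ** 2                          # fixed-effect weights
    mu_fe = np.sum(w * es) / np.sum(w)
    q = np.sum(w * (es - mu_fe) ** 2)          # Cochran's Q
    df = len(es) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)              # between-study variance
    w_re = 1.0 / (se ** 2 + tau2)              # random-effects weights
    mu_re = np.sum(w_re * es) / np.sum(w_re)
    se_re = np.sqrt(1.0 / np.sum(w_re))
    i2 = 100 * max(0.0, (q - df) / q)          # % heterogeneity
    ci = (mu_re - 1.96 * se_re, mu_re + 1.96 * se_re)
    return mu_re, ci, q, i2

# Hypothetical study-level effect sizes and standard errors
g = [0.4, 0.9, 0.6, 1.2, 0.3]
se = [0.20, 0.25, 0.15, 0.30, 0.10]
print(random_effects_pool(g, se))
```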

The moderators of treatment efficacy

The variance inflation factor (VIF) test indicated that the VIF values of several moderators were greater than 10, suggesting that multicollinearity existed between moderators. To address RQ3, we used a stepwise meta-regression model as our predictive model to test the influence of all moderators on treatment efficacy simultaneously. Using stepwise meta-regression, an initial feature selection step was performed to determine which moderators were suitable for inclusion in the final model. A criterion of p < 0.10 was set for inclusion in the model (Carrara et al., 2018). The model explained a statistically significant portion of the variance (F(11,18) = 3.474, p < 0.01, \(R^2\) = 0.68) and consisted of seven moderators: research design, sample level, sample size, number of intervention components, mode of intervention components, method-based assistance, and environment-based support.

The results of the meta-regression including these seven moderators are shown in Table 1. When controlling for the other moderators, the following results were obtained. Regarding research design, the overall effect size of the single-group pretest–posttest design was on average 58% higher than that of the pretest–posttest-control design (t = 2.085, p < 0.1). Regarding sample level, interventions delivered to K-12 students generated on average 89% and 96% higher effect sizes than those delivered to prospective teachers (t = −4.189, p < 0.001) and in-service teachers (t = −3.788, p < 0.01), respectively. Regarding sample size, interventions implemented with a small group of students (less than 25) produced on average 72% higher effect sizes than those implemented with a large group (t = −1.973, p < 0.1). In terms of the number and mode of intervention components, the results indicated that interventions employing more than one intervention component had on average 63% higher effect sizes than those employing a single intervention component (t = 2.202, p < 0.05). Similarly, the overall effect size of interventions applying single-based support (i.e., activity-based practice, method-based assistance, or environment-based support alone) was on average 64% and 186% higher than that of interventions applying two-based support (t = −2.174, p < 0.05) and mixed-based support (t = −3.289, p < 0.01), respectively. Regarding types of intervention components, we found that the effect sizes of interventions that incorporated method-based assistance or environment-based support were on average 84% or 83% higher than those of interventions without method-based assistance (t = 1.905, p < 0.1) or environment-based support (t = 2.154, p < 0.05), respectively.

Table 1 Stepwise meta-regression results depending on moderators

Publication bias

Results of Egger’s test showed that the coefficient for the modified effect standard deviation was significant for the models (p < 0.05), indicating that the effect size distribution was asymmetric (funnel plot shown in Fig. 5, left). Given that any factor associated with both study effect and study size could confound the true association and cause an asymmetric funnel (Peters et al., 2008), we applied trim and fill analysis under the random effects model, which imputed nine missing negative studies and reduced the point estimate of the overall ES to 0.511 (95% CI = [0.304, 0.715]; heterogeneity test Q(38) = 150.5, p < 0.05). Therefore, the true treatment effect is likely to remain of meaningful magnitude, and there was not enough evidence that the funnel plot asymmetry was caused by publication bias (Duval & Tweedie, 2000) (funnel plot shown in Fig. 5, middle). The mixed-effects meta-regression test for funnel plot asymmetry showed that the coefficient for the modified effect standard deviation was not significant for the models (z = 1.719, p > 0.05; Fig. 5, right). Accordingly, we concluded there was not enough evidence to suggest publication bias.

Fig. 5 Funnel plots illustrating the assessment of publication bias in the meta-analysis

Discussion

In this review we identified nine typical intervention components, under the broad categories of activity-based practice, method-based assistance, and environment-based support, that had been part of interventions published in the literature (RQ1). We also conducted a meta-analysis to examine the treatment efficacy on each cluster of measured outcomes with regard to mathematical problem posing competence. In addition, we conducted a meta-regression analysis to determine whether variability in the interventions’ effect sizes was associated with six kinds of moderators related to the research design, the sample, and the intervention characteristics. We next discuss our results in regard to the treatment effect (RQ2) and the moderators’ influence on treatment efficacy (RQ3).

The treatment effect on mathematical problem posing competence

Regarding RQ2, our results showed that the interventions had a medium, significant, and positive impact on participants’ mathematical problem posing competence (without outliers: g = 0.72, p < 0.001). The estimates without outliers suggest that, compared to the participants in the control groups, participants in the intervention groups demonstrated an improvement of about 0.72 SD units in their mathematical problem posing scores. The mean treatment estimates without outliers indicated that approximately 76% of the students in the intervention groups scored above the mean of their peers in the control groups (Lipsey et al., 2012).
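
The percentile interpretation follows from the standard normal CDF, assuming approximately normal score distributions with equal variance; a one-line check:

```python
from scipy.stats import norm

# Expected proportion of intervention-group participants scoring above
# the control-group mean when the standardized effect is g = 0.72
print(round(norm.cdf(0.72), 3))  # 0.764, i.e., roughly 76%
```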

In addition, we found that the treatment effect on noncognitive outcomes of problem posing was larger than the effect on the quality of posed problems, and that the latter was larger than the effect on the quantity of posed problems, although the between-group variance was not significant. Several research studies have found that noncognitive factors have the potential to improve cognitive skills (Frank, 2020; Holmlund & Silva, 2014). Thus, it would be important to further examine changes in participants’ cognitive skills in mathematical problem posing as we examine changes in their respective noncognitive skills. Furthermore, while the quantity of posed problems as an outcome measure could reflect posers’ fluency in problem posing, posing more problems does not necessarily represent enhanced problem posing competence, not least because participants can generate many problems by changing the values of the variables in the first posed problem (Zhang et al., 2022a).

The moderators’ influence on treatment efficacy

Regarding RQ3, considering moderator influence via stepwise meta-regression analysis, we found that seven moderators—research design, sample level, sample size, number and mode of intervention components, and the existence of particular types of intervention components—explained a statistically significant portion of the heterogeneity of treatment effects between studies. We reflect on the results according to the typology of moderators, including research design, sample characteristics, and intervention characteristics, and we do so separately for each type under the assumption that other moderators remain fixed.

In terms of research design, the overall effect size of the pre-post design was higher than that of the pre-post-control design. This result was consistent with our original hypothesis, namely, that studies adopting a single-group design would yield a higher mean treatment effect size than studies using experimental or quasi-experimental designs. The significance level was set at 0.10, consistent with Niu et al. (2013). As Lakens (2013) explained, the increased statistical power and sensitivity of within-participants designs allow for the detection of smaller, yet significant, treatment effects that might be missed in between-participants designs. However, the significance level being set at 0.10 suggests that, while our findings support the hypothesis, the evidence is not as robust as it could be. It is crucial to balance the increased power against the potential validity threats inherent in within-participants designs. Careful consideration of these trade-offs helps enhance the robustness and reliability of effect size estimates in intervention studies.

In terms of sample characteristics, the results indicated that the sample level significantly influenced treatment efficacy regardless of outliers. Specifically, the treatment efficacy of studies delivered to K-12 students was significantly higher than that of studies delivered to prospective teachers and in-service teachers. This result matched our hypothesis that lower grade level participants might benefit more from interventions targeting their mathematical problem posing competence. Higher grade level participants, who are more accustomed to conventional teacher-led instruction and are relatively successful in learning mathematics under this way of teaching, are more likely to possess low motivation in posing problems (Silver, 1994) and thus benefit less from the interventions. Furthermore, Cai and Hwang (2020) delved into the nuances of problem posing from a pedagogical standpoint, highlighting the difference between students and teachers. For teachers, problem posing extends beyond merely generating problems based on given problem situations or modifying existing problems, which are the areas on which students solely focus. Teachers might also consider activities like predicting problems students might pose, generating situations for students to pose problems, and posing problems for students to solve. This distinction underscores that the impact of a problem posing intervention could vary depending on the roles of the participants. As Voica and Pelczer (2009) noted, individuals without role-specific constraints, who focus purely on the mathematical aspects of problem posing, might perform better. To conclude, although we recognize that problem posing has different pedagogical implications for students and teachers, this distinction can shed some light on the significant variance in treatment efficacy driven by the sample level. It also motivates further investigation into the reasons behind this phenomenon and, in particular, into how interventions might uniquely resonate with participants with or without role-specific constraints.

In addition, we found that studies implemented with small groups of students (less than 25) produced relatively larger effects than those implemented with large groups (more than 50). This result is consistent with our original hypothesis about the role of group size and has received support in the literature from at least two perspectives. From a statistical perspective, effect sizes based on small samples were found to be larger than effect sizes based on larger samples even when the actual magnitudes of the intervention effects were identical (Lipsey et al., 2012). From an instructional perspective, significant gains made by low-performing students were attributed in part to the number of hours spent in, as well as the intensity of, the intervention (Torgesen, 2000), and this intensity was likely to be higher when the intervention was delivered to a smaller group. Relatedly, as mathematical problem posing is a relatively new activity compared to problem solving for many participants, participants are more likely to have a low performance at the beginning and thus require more assistance through small-group or individual instruction (Cirino et al., 2015; Powell et al., 2009).

In terms of intervention characteristics, we found that interventions that were designed with more than one intervention component, or that incorporated one particular type of intervention component, were associated with significantly larger effects than those conducted with a single intervention component or with mixed types of intervention components. These results suggest a complex role for the design multiplicity of intervention components, appearing to favor multiplicity within, but not across, types of intervention components. A commonly held view is that interventions with more than one type of component are more effective than single-type-component interventions (Squires et al., 2014), since there are multiple barriers at different levels to changing participants’ behaviors (Grimshaw et al., 2012). In the case of problem posing, participants may lack prior experience with problem posing activity, including strategies for how to pose problems. Accordingly, it is reasonable to expect that multifaceted interventions that target several of these barriers simultaneously using a mixture of types of intervention components (e.g., activity-based practice combined with method-based assistance) may be more effective at addressing the barriers to a behavior. Yet, despite the face validity of this point, our results support the opposite, and evidence as to whether multifaceted interventions are truly more effective remains uncertain (Squires et al., 2014). As the field explores this matter further, it is useful to note our finding that interventions that employed method-based assistance or environment-based support produced a higher effect size than interventions that applied no such type of assistance or support.

Finally, regarding another key moderator with respect to intervention characteristics, namely, intervention duration, there was not enough evidence that duration was a moderator since none of the coefficients of duration in the regression models were statistically significant. On one hand, interventions of relatively long duration might have offered participants additional opportunities to receive explicit modeling and practice to develop their skills, as well as opportunities for ongoing progress monitoring and feedback (Powell & Fuchs, 2015). On the other hand, interventions of relatively short duration might have been implemented with higher fidelity (Stylianides & Stylianides, 2013). The way intervention duration was calculated in this review is also worth consideration. Specifically, we used intervention duration to refer to the length of time over which an intervention was implemented or spread across, rather than the length of time that participants actually experienced the intervention components. This way of calculating duration might not reflect accurately the intensity of the intervention as it might include time periods when participants did not receive an intervention treatment. This, in turn, could result in a dilution of the overall intervention effect, making it more difficult to detect a significant relationship between duration and treatment efficacy.

Limitations and future meta-analyses

Despite our best efforts to identify relevant publications, we were unable to access several potentially relevant studies. We contacted authors to obtain these articles but received no response on some occasions. Also, we did not take account of publications presented at conferences due to concerns about inconsistent standards of peer review and the relatively short length of articles in conference proceedings, which may not allow authors to present their research designs and findings in sufficient detail. Furthermore, although our statistical analysis did not indicate publication bias, the tendency of journals to publish studies with significant effects, combined with the large proportion of such studies in our meta-analysis, suggests that publication bias may still have influenced our findings. Finally, although the meta-analysis considered several kinds of potential moderators, the moderator analysis showed considerable between-study heterogeneity, suggesting that other factors not accounted for in this analysis might affect the effect sizes. Future meta-analyses exploring additional potential moderators, such as the type of dependent measure (researcher-developed assessment or standardized measure) and study quality rating, are needed to deepen our understanding of the factors that impact an intervention’s effectiveness.

Conclusions

Although mathematical problem posing is a younger field of inquiry within mathematics education compared to its twin activity of mathematical problem solving, it has attracted increased research attention in recent years and, gradually, an important theoretical and research foundation has been established in relation to both (e.g., Cai et al., 2015; Silver, 2023; Toh et al., 2023). Our findings in this review of interventions to improve participants’ mathematical problem posing competence, including the mechanisms underlying the more or less effective interventions and the moderators’ influence on intervention efficacy, help deepen theoretical understanding of this competence and how to promote it (Bronfenbrenner, 1977; Snyder et al., 2019).

The findings provide researchers and practitioners with useful guidance about how to design and implement (more) effective interventions to enhance students’ and teachers’ mathematical problem posing competence and, through this, participants’ other important skills that are believed to be associated with problem posing competence, notably, mathematical problem solving and creativity skills (Bonotto & Santo, 2015; Cai & Hwang, 2002). As we explained previously, the effectiveness of the interventions we reviewed differed across intervention designs. In particular, the number and mode of intervention components, along with the existence of certain types of intervention components, emerged as significant factors influencing the treatment efficacy. Researchers and practitioners who design new interventions can select and tailor appropriate intervention component dosage or types in order to optimize the treatment efficacy, while considering their particular aims, contextual factors, and participant needs. However, it is important to recognize that intervention implementation is a dynamic, context-specific process. Each layer of a context, whether at the micro (individual), meso (team or organization), or macro (system) level, can affect the intervention’s effectiveness (Moullin et al., 2020). Thus, ongoing tailoring of intervention design, as well as formative and summative evaluations of factors at any of these levels, is necessary to comprehensively evaluate mechanisms of intervention success.

The demonstrated effectiveness and uncovered mechanisms of existing interventions in this study highlight the feasibility of integrating problem posing into real-world educational settings and mathematics curricula. Teachers can translate the identified intervention components into classroom practices in teaching mathematics both for and through problem posing (Silver, 2023). To support these practices, professional development programs to equip teachers with the necessary knowledge and skills are sorely needed, including how to integrate problem posing and problem solving activities in mathematics curricula and classrooms (Toh et al., 2023). Through the collaborative effort of researchers, practitioners, and policymakers, the theoretical and practical advancements in mathematical problem posing can be translated into tangible educational improvements.