1 Introduction

Self-regulated learning (SRL) is considered a complex set of recursive and goal-oriented learning processes (Panadero, 2017). Self-regulated learners set their learning goals and actively select, monitor and modify their learning strategies to accomplish these goals and succeed in different learning tasks (Zimmerman, 2013; Winne & Hadwin, 1998; Winne, 2022; Cleary et al., 2022). Self-regulated learners are thus in control of their learning processes and learning goals (Winne, 2018). As engagement in SRL processes has the potential to improve academic achievement and, more broadly, to support lifelong learning (Cleary & Chen, 2009; Klug et al., 2011; Recommendation, 2018; Theobald, 2021), it is critical for students to master SRL and become productive learners in different domains of knowledge.

To advance understanding of SRL and identify the relationships among the different learning processes involved, researchers have proposed several SRL theoretical frameworks, such as those of Winne and Hadwin (1998), Winne (2018), Zimmerman (2000) and Pintrich (2000). Although differences among these theoretical models are noticeable, the models broadly agree that SRL is a cyclic process that involves a repertoire of learning goals and learning strategies (Panadero, 2017). For example, according to Zimmerman (2000), self-regulated learners selectively use specific processes to work on learning tasks over three cyclical phases: forethought, performance and self-reflection. Winne and Hadwin’s (1998) theoretical model describes SRL as a dynamic set of skills where learning unfolds over five facets (conditions, operations, products, evaluations, and standards; COPES) and four phases (defining task requirements, setting goals and devising plans, enacting study tactics, and adapting future studying).

Even though researchers have made substantial progress over the past several decades towards a deeper understanding of, and more effective support for, the learning processes involved in SRL, development of SRL skills is still considered challenging for many students (Bjork et al., 2013). For example, students struggle to gather appropriate resources for a learning task (List & Du, 2021); set relevant, specific and attainable goals to guide their engagement with the task (McCardle et al., 2017); select appropriate learning strategies and effectively use them (Azevedo, 2018; List & Lin, 2023); and accurately monitor and evaluate their own progress (Zimmerman, 2002; Gutierrez de Blume, 2022; Lim et al., 2023). Students often need guidance to successfully enact these learning processes. Educational researchers and practitioners have proposed different types of external support for students as they develop SRL skills (Jivet et al., 2020, 2021; Perez-Alvarez et al., 2022). Broadly, SRL support has so far been provided in a more traditional way, e.g., via classroom-style coaching on goal setting (McCardle et al., 2017; Morisano et al., 2010; Alessandri et al., 2020) and metacognitive strategies (Cleary et al., 2022; Dignath & Veenman, 2021), and, more recently, using technology-enhanced learning platforms, e.g., computer-based scaffolding environments that support task orientation, strategy use and metacognitive monitoring (Baker et al., 2020; Azevedo et al., 2017; Azevedo & Aleven, 2013; Pérez et al., 2020; Jivet et al., 2020, 2021; Dever et al., 2023; Srivastava et al., 2022; Lim et al., 2023).

In recent years, researchers have become increasingly interested in using chatbots to address educational problems (Wollny et al., 2021; Li et al., 2023; Dai et al., 2023). One of the main reasons for this increased interest is that chatbots have the potential to scaffold or externally regulate learning processes in dynamically changing learning contexts like SRL (Azevedo & Hadwin, 2005), because chatbots use artificial intelligence and natural language processing to simulate and adapt to conversation with humans. Following the growing interest in educational chatbots, researchers have recently published several literature reviews on the topic (Winkler & Söllner, 2018; Pérez et al., 2020; Smutny & Schreiberova, 2020). These reviews have contributed significant knowledge to the field, providing valuable findings about the currently available educational chatbots across disciplines and the benefits of using chatbot technologies in education to, e.g., supplement teaching or recommend learning content to students. However, to our knowledge, researchers have yet to learn how the educational chatbots developed so far have supported the processes theorised in SRL. Such findings may add to current educational research and practice, given the documented benefits of SRL skills for academic performance and lifelong learning. To contribute new research knowledge to the fields of educational technology and learning sciences, we conducted the present systematic review of the literature, explicitly focusing on how educational chatbots have been used to support SRL processes and learning achievements. Our analysis was based on Winne and Hadwin’s (1998) theoretical framework, which defines facets and phases of SRL. Our findings may inform future research on the development and implementation of educational chatbots that provide more comprehensive SRL support to learners.

2 Background

2.1 SRL theoretical framework to guide this systematic review

Different theoretical frameworks have been proposed to date to define SRL processes and to understand the relationships among them and, in this way, help researchers to measure and support learners’ engagement in SRL. For an overview of major SRL theoretical frameworks, see Panadero (2017). To theoretically ground our systematic literature review, we utilized the SRL theoretical model proposed by Winne and Hadwin (1998). According to this framework, students’ SRL processes unfold over four general phases (task definition, goal setting and planning, enacting study tactics, and adaptation to future studying) and five facets (conditions, operations, products, evaluations and standards). We opted to use this framework because (1) it is one of the six most cited frameworks in the literature, signifying its robustness and widespread acceptance among researchers, and it is particularly welcomed in research involving computer-assisted learning (Panadero et al., 2016); (2) it provides a comprehensive account of cognitive, metacognitive and motivational processes that interweave in SRL, offering a holistic view of the learning process; and (3) the model is distinguished by its detailed depiction of how different phases interact with each other over time as learning unfolds, affording researchers and educators ways to design specific and time-sensitive SRL support for learners (Greene & Azevedo, 2007).

The first phase in Winne and Hadwin’s model of SRL is task definition where learners make inferences and develop perceptions about the features of the task, and survey available resources for studying. The next phase is goal setting and planning where learners set their learning goals, devise plans and determine learning strategies which will be used to accomplish goals for learning. In the following phase, students enact their learning strategies and oversee (i.e., metacognitively monitor) the effectiveness of those strategies in addressing the task. For example, learners might highlight key concepts and construct a vocabulary list during a reading task, and, if they deem this strategy to be ineffective, they may decide to modify (i.e., metacognitively control) it, e.g., engage in note-taking instead of highlighting. In the adaptation phase, learners reflect on their studying during the previous stages and make forward-reaching adaptations for similar tasks in the future, e.g., a learner may decide to include note-taking in a repository of preferable learning strategies for the upcoming reading comprehension tasks, as note-taking worked well for the learner in the present task. In this way, learners reach beyond the present task and change their cognitive conditions for future learning (Greene & Azevedo, 2007).

Learning activities that unfold over the four general phases of SRL can be characterised relative to five common dimensions, i.e., facets: conditions, operations, products, evaluations and standards (COPES). Conditions encompass different internal and external factors that affect how a learner will engage with a task. For example, internal conditions include the learner’s prior knowledge of a domain, knowledge of learning strategies, experience with a task, and motivation and interest in a task; whereas external conditions include available learning resources, task instructions, scoring rubrics and time constraints. Operations are the processes by which learners manipulate information at hand and, in that way, induce actual learning (Winne, 2022). Winne (2018) defined five fundamental operations including searching, monitoring, assembling, rehearsing and translating (SMART). As learners engage in operations, they create products of learning, e.g., a note, essay draft or program code. Self-regulated learners actively evaluate their learning products against standards, e.g., a scoring rubric or instructional objectives. Upon evaluating their learning products, self-regulated learners may engage in metacognitive control, i.e., they may decide to modify their learning goals and strategies, and revise the products (Greene & Azevedo, 2007; Raković et al., 2022a).

2.2 Educational chatbots

A chatbot is an interactive computer program enhanced by artificial intelligence (AI) and natural language processing (NLP) to simulate conversation with humans through text and voice. Since the development of the earliest chatbot, Eliza (Weizenbaum, 1966), various chatbots have evolved, providing interactive interfaces for users to engage with different services, resources, and data in a natural conversational style (McTear, 2020). Chatbots have also been used as tools to understand and model human behavior (McTear, 2020). The use of chatbots has seen a significant increase over the past several years (Zawacki-Richter et al., 2019), offering support to users in different contexts, e.g., customer services, online shopping and banking (Illescas-Manzano et al., 2021).

Owing to their capacity to interact dynamically and adaptively with users, educational chatbots have been considered a viable option to support learning in different settings (Smutny & Schreiberova, 2020), including SRL. For example, as metacognitive processes of monitoring and control are considered central in SRL (Winne, 2022), learners need to continuously engage those processes to succeed in a learning task. Many learners, however, struggle to sustain these metacognitive processes throughout a learning session (Azevedo & Aleven, 2013), which further prevents them from productively engaging in SRL and performing well in a task. An educational chatbot may provide external regulation to learners by performing a part of metacognitive monitoring instead of students having to conduct these processes by themselves (Molenaar, 2022), e.g., a bot may identify two learning strategies that a learner had used previously in the task and ask the learner to compare the effectiveness of these two strategies relative to task requirements. In this way, a chatbot may help the learner preserve cognitive resources for other aspects of the task, e.g., constructing deeper understanding of the concepts studied. Moreover, by providing SRL guidance to students, a chatbot may help learners increase their engagement across phases and facets of SRL, which may further benefit their development of SRL skills and boost their academic achievements.

Recent literature reviews (Winkler & Söllner, 2018; Pérez et al., 2020; Smutny & Schreiberova, 2020; Wollny et al., 2021) have reported that chatbots have been used for two main purposes in educational settings: (1) service support and (2) teaching support. Building on the success of chatbots in the area of customer service, chatbots have been used at many educational institutions to provide service support to students, e.g., support with enrolment, library and campus resources (Sweidan et al., 2021; Allison, 2012). For example, the interactive bot SIAAA-C (Sweidan et al., 2021) was designed to provide students with important campus resources, e.g., a campus map and notifications during COVID-19. On the other hand, teaching-oriented chatbots have been commonly used in formal education to supplement traditional teaching in different domains, e.g., languages, math and science. Harnessing their conversational features, those chatbots typically play the role of a human tutor and provide learners with content knowledge and practice questions. For instance, Wu et al. (2020) developed a multi-module chatbot that supported students studying mathematics and Chinese history, whereas Mageira et al. (2022) and Vázquez-Cano et al. (2021) created chatbots to help students learn English and Spanish, respectively, e.g., through prompting and recommending additional learning resources. The literature reviews published so far (Winkler & Söllner, 2018; Pérez et al., 2020; Smutny & Schreiberova, 2020; Wollny et al., 2021) identified different types of educational chatbots and the technologies used to implement those bots. These reviews have also revealed the potential of using chatbots to facilitate teaching and learning processes, to recommend learning content and to provide service support to students.
While several studies have investigated the use of chatbots for SRL, the extent to which different aspects of SRL have been supported by chatbots remains insufficiently understood. To address this gap, we conducted a systematic literature review to learn (1) how educational chatbots have provided support for learners’ SRL and (2) how that support has affected learners’ SRL skills and performance. This inquiry is critical for two reasons. First, by identifying and synthesizing the ways in which educational chatbots contribute to or hinder SRL, our study could offer valuable insights into the design of more effective educational technologies that are aligned with pedagogical goals. Second, understanding the impact of chatbots on learners’ SRL skills and performance can inform educators and policymakers about the potential benefits and limitations of integrating these technologies into learning environments. More formally, the following research questions guided our systematic review:

  • RQ1: How have educational chatbots been used to support students’ SRL processes relative to (i) phases and (ii) facets theorised in Winne and Hadwin (1998) and Winne (2018)?

  • RQ2: To what extent has the use of educational chatbots improved learners’ SRL processing and learning performance?

3 Methodology

We conducted a systematic review of the literature to answer our research questions. To ensure a thorough and transparent review process, we carried out this review using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework as a guideline (Page et al., 2021; Moher et al., 2009). The systematic literature review involved three major phases: (1) searching for relevant publications in multiple bibliographical databases, (2) selecting relevant publications following the PRISMA framework, and (3) extracting and analysing relevant information in the selected publications to answer the research questions.

3.1 Literature search

We utilized the SPIDER framework (Cooke et al., 2012) to define parameters for the literature search. The SPIDER framework proposes five general groups of search criteria, including sample, phenomenon of interest, design, evaluation and research type. As per our inclusion criteria (detailed in the next section) our Sample (S) involved students studying in formal educational settings at primary, secondary and tertiary levels. The Phenomena of our Interest (PI) were self-regulated learning and educational chatbots. We searched for research studies that have been Designed to empirically evaluate the effects of chatbots on SRL (D) and that have reported outcome measures based on these Evaluations (E). We included qualitative, quantitative and mixed-methods studies (R) in our search.

We used the following search query: (“chatbot” OR “educational chatbot” OR “conversational agent”) AND “self-regulated learning” AND “formal education” AND (“student” OR “learner”) AND “research article” to search the titles, abstracts and keywords of publications in bibliographical databases. We included studies published between 2012 and 2023, inclusive, as we deemed this time range sufficient to capture the state of the art in the emerging field of educational chatbots. We searched the following bibliographical databases: Scopus, Elsevier, ACM, IEEE Xplore, Web of Science, ERIC, PsycINFO, Wiley library, Google Scholar, ResearchGate and the library database at our university. The search was conducted in October 2023. At this stage, we retrieved 598 publications. After removing 72 duplicates, 526 publications remained in our dataset for further analysis.
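As a minimal sketch, the Boolean query above can be assembled from its term groups, and the de-duplication arithmetic checked, as follows. The helper names (`quote`, `or_group`, `build_query`) are illustrative assumptions and do not correspond to any database API; each database has its own query syntax, so the resulting string would likely need adapting per platform.

```python
# Illustrative sketch only: helper names are hypothetical.
# The term groups are copied from the query reported in the text.
TECHNOLOGY_TERMS = ["chatbot", "educational chatbot", "conversational agent"]
LEARNER_TERMS = ["student", "learner"]


def quote(term):
    """Wrap a search term in double quotes for exact-phrase matching."""
    return f'"{term}"'


def or_group(terms):
    """Join quoted terms with OR inside parentheses."""
    return "(" + " OR ".join(quote(t) for t in terms) + ")"


def build_query():
    """Combine the term groups with AND, in the order reported in the text."""
    parts = [
        or_group(TECHNOLOGY_TERMS),
        quote("self-regulated learning"),
        quote("formal education"),
        or_group(LEARNER_TERMS),
        quote("research article"),
    ]
    return " AND ".join(parts)


# Counts reported in the text: 598 retrieved, 72 duplicates removed.
retrieved, duplicates = 598, 72
remaining = retrieved - duplicates  # 526

print(build_query())
print(remaining)
```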

3.2 Abstract screening and full paper review

To identify relevant publications for our review, we performed two reviewing steps, following the PRISMA guidelines: (1) abstract screening and (2) full paper review. In other words, publications selected in the abstract screening step were reviewed in full for their relevance at the full paper review step. For these two reviewing steps, we followed our inclusion and exclusion criteria. Specifically, we included research studies that:

  1. Reported on the use of chatbots in formal educational settings

  2. Reported on the use of chatbots to support students to engage in SRL processing (e.g., goal setting, strategy use, and monitoring)

  3. Described characteristics of educational chatbots (e.g., chatbot architecture and types of utterances exchanged between student and bot)

  4. Reported on the effectiveness of educational chatbots in supporting SRL skills and/or learning outcomes

  5. Were published in peer-reviewed journals and conference proceedings in English between Jan 2012 and Oct 2023

We excluded:

  1. Publications that reported on using chatbots outside of formal educational settings (e.g., school administration and customer service)

  2. Publications from which it could not be clearly inferred what SRL processes have been supported by the chatbot (e.g., studies applying a third-party chatbot as a black box intervention or using a chatbot to conduct a quiz)

  3. Publications that did not provide a clear description of chatbot characteristics

  4. Publications that did not provide the evaluation of chatbot effectiveness

  5. Technical reports, conceptual and design papers

  6. Non-peer reviewed publications and publications without available full-text

At the screening step, two reviewers screened the titles and abstracts of the 526 publications that remained from the previous phase of this review. Each reviewer voted “Yes”, “Maybe” or “No” on each study, indicating whether the study should be included in the next stage of the review. The reviewers agreed on 456 papers (86.7%, Fleiss’ kappa = 0.734, p < 0.001). The remaining 70 conflicts were resolved through discussion between the reviewers. The main source of conflict was abstracts that did not explicitly state whether a chatbot evaluation was performed in the study. The reviewers agreed to keep such articles in the dataset and to fully assess them in the next stage. A total of 101 publications remained in the dataset after this stage.
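The agreement figures above can be illustrated with a short sketch. The percent agreement uses the counts reported in the text (456 of 526). The kappa function follows the standard Fleiss formula, but the votes passed to it below are hypothetical toy data, since the per-publication votes are not reported here; the function and variable names are likewise assumptions made for illustration.

```python
from collections import Counter

CATEGORIES = ["Yes", "Maybe", "No"]


def percent_agreement(agreed, total):
    """Share of items on which the reviewers gave the same vote, in percent."""
    return 100.0 * agreed / total


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of per-item vote lists.

    Every item must be rated by the same number of raters (two or more).
    Undefined (division by zero) if all votes fall into a single category.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    category_totals = Counter()
    mean_item_agreement = 0.0
    for votes in ratings:
        counts = Counter(votes)
        category_totals.update(counts)
        # Proportion of agreeing rater pairs for this item.
        mean_item_agreement += (
            sum(c * c for c in counts.values()) - n_raters
        ) / (n_raters * (n_raters - 1))
    mean_item_agreement /= n_items
    # Agreement expected by chance from the overall category proportions.
    chance_agreement = sum(
        (category_totals[c] / (n_items * n_raters)) ** 2 for c in categories
    )
    return (mean_item_agreement - chance_agreement) / (1 - chance_agreement)


# Reported counts: agreement on 456 of 526 screened publications.
print(round(percent_agreement(456, 526), 1))

# Hypothetical toy votes from two reviewers on four abstracts.
toy_votes = [["Yes", "Yes"], ["No", "No"], ["Yes", "No"], ["No", "No"]]
print(fleiss_kappa(toy_votes, CATEGORIES))
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it can be considerably lower than the percent agreement when one vote category dominates.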

At the full paper review step, the reviewers randomly selected 15 out of 101 publications (nearly 15%), reviewed them separately and voted on whether each paper should be included in the study, following the inclusion and exclusion criteria. The reviewers agreed on 12 out of 15 publications (80%, Fleiss’ kappa = 0.52, p = 0.04). The common disagreement between the reviewers at this stage was about whether the study provided a sufficiently clear description of the bot characteristics. This disagreement was resolved through discussion between the reviewers, and the decision was made to include in the final review only those publications that described the types of utterances exchanged between a student and a bot. The reviewers evenly split the remaining publications in the dataset (i.e., the remaining 86 publications were randomly divided between the two reviewers) and reviewed them separately. A total of 27 papers were extracted for the review. We summarized our review process in Fig. 1 (Page et al., 2021).

Fig. 1 PRISMA flow diagram

Fig. 2 Number of publications by year color-coded with chatbot architectures

3.3 Analysis of extracted publications

The first author of this review extracted data from each publication according to the following categories: general information (publication title, authors, year, sample size, level of education, domain of education and learning task), chatbot type, SRL facet (conditions, operations, products, evaluations, and standards), SRL phase (task definition, goal setting and planning, enactment, and adaptation), and reported effects (on SRL processes and learning achievements). To categorise publications into suitable SRL facets and phases, the first author closely followed the definitions of constructs provided in Winne and Hadwin (1998); see Section 2.1 for details. The analysis of SRL facets and phases in the selected publications was used to address RQ1, while the analysis of the reported effects of chatbots on SRL processes and learning achievements was used to address RQ2.
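One way to picture the extraction categories above is as a small record type that rejects facet or phase labels outside the Winne and Hadwin (1998) vocabulary. This is only a sketch under our own assumptions: the class name `ExtractionRecord`, its field names, and the example values are hypothetical and do not describe the instrument actually used in the review.

```python
from dataclasses import dataclass, field

# Controlled vocabularies taken from the facets and phases named in the text.
FACETS = {"conditions", "operations", "products", "evaluations", "standards"}
PHASES = {"task definition", "goal setting and planning", "enactment", "adaptation"}


@dataclass
class ExtractionRecord:
    """Hypothetical coding record for one reviewed publication."""

    title: str
    year: int
    chatbot_type: str
    facets: set = field(default_factory=set)
    phases: set = field(default_factory=set)
    reported_effects: str = ""

    def __post_init__(self):
        # Reject labels that are not part of the theoretical framework.
        unknown = (self.facets - FACETS) | (self.phases - PHASES)
        if unknown:
            raise ValueError(f"Unknown SRL labels: {sorted(unknown)}")


# Placeholder values, not data from any reviewed study.
example = ExtractionRecord(
    title="Hypothetical study",
    year=2021,
    chatbot_type="rule-based",
    facets={"operations", "products"},
    phases={"enactment"},
)
print(sorted(example.facets))
```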

4 Results

4.1 General information

We summarised the studies included in our systematic literature review in Fig. 2. Out of the 526 publications that we assessed in this review, 27 studies met the inclusion criteria. Over 92% of these studies were published from 2020 onward, i.e., six in 2020, 11 in 2021, two in 2022 and six in 2023, whereas only two studies were published before 2020. We observed that 13 studies utilized a natural language processing (NLP)-driven approach in their chatbot design to interpret and respond to user inputs in a conversational manner. On the other hand, 13 studies employed rule-based architectures in their chatbot design, i.e., following predefined pathways or rules to respond to specific commands or keywords, offering predictable and consistent interactions within a structured framework (Fig. 2). Additional architectures in the reviewed studies include an NLP-driven architecture with a contextual bandit algorithm (Cai et al., 2021) and a knowledge-based system accessing a vast domain-specific database to deliver accurate information (Chang et al., 2022b).

Further, the chatbots we reviewed provided SRL support to students in different domains of education, including language learning, math, science, computer programming, accounting and educational psychology, with language learning being slightly more prominent than the other domains (Fig. 3). Moreover, the chatbots included in this review have been mainly utilised in higher education, i.e., researchers provided chatbots to university students in 21 studies. Two studies were conducted in primary schools, three studies were conducted in secondary schools and one study involved a diverse population recruited from Amazon Mechanical Turk (Fig. 3).

Fig. 3 Domain of education supported by chatbot color-coded with participants’ level of education

4.2 RQ1: How have educational chatbots been used to support students’ SRL processes relative to (1) phases and (2) facets theorised in Winne and Hadwin (1998) and Winne (2018)?

Of the 27 articles included in this review, 15 reported on using chatbots to support student SRL processing in a single SRL phase, 11 articles reported on support across two phases and one article reported on support across three SRL phases. None of the reviewed studies appeared to utilise educational chatbots to provide comprehensive SRL support across all four phases of SRL defined in Winne and Hadwin (1998). More specifically, in 25 articles researchers used chatbots to facilitate SRL during the strategy enactment phase, i.e., the phase in which students select and use learning tactics and strategies. In these studies, chatbots were mainly utilised to guide students to enact learning tactics/strategies to accomplish a particular learning task, such as writing a thesis statement (Lin & Chang, 2020) or an essay (Neumann et al., 2021), learning a programming language (Ait et al., 2023; Tian et al., 2021) and developing a project report (Kumar, 2021). Six chatbots supported students at the task definition stage, e.g., “Make sure to re-read the question!” (Cai et al., 2021). Five chatbots supported students to set goals and devise plans for learning, e.g., by scaffolding students to specify their achievement goals (Hew et al., 2021, 2023) and by guiding goal setting with questions (Du et al., 2021; Al-Abdullatif et al., 2023). Four chatbots supported students to adapt to their future studying, e.g., by providing students with the opportunity to monitor their learning progress (Harati et al., 2021; Oliveira et al., 2021) (Fig. 4).

Fig. 4 Venn diagram showing the number of studies over SRL phases
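The phase counts reported above can be cross-checked with a few lines of arithmetic: the number of phase assignments implied by how many phases each study covered should equal the sum of the per-phase study counts. The sketch below uses only the figures reported in the text; the variable names are our own.

```python
# Number of studies by how many SRL phases they supported (from the text).
studies_by_phase_count = {1: 15, 2: 11, 3: 1, 4: 0}

# Number of studies supporting each phase (from the text).
studies_per_phase = {
    "task definition": 6,
    "goal setting and planning": 5,
    "enactment": 25,
    "adaptation": 4,
}

total_studies = sum(studies_by_phase_count.values())
# Each study contributes one assignment per phase it supports.
assignments_from_coverage = sum(k * n for k, n in studies_by_phase_count.items())
assignments_from_phases = sum(studies_per_phase.values())

print(total_studies)                # 27
print(assignments_from_coverage)    # 40
print(assignments_from_phases)      # 40
```

Both routes give 40 phase assignments across 27 studies, so the two sets of counts are mutually consistent.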

In all the studies we reviewed, authors reported on using chatbots to promote SRL processes at conditions, operations, and products, the three cognitive facets of SRL. For instance, researchers have used chatbots to promote students’ internal conditions for a task, including activation of domain knowledge (Cai et al., 2021; Neumann et al., 2021), task interest and motivation (Fryer et al., 2017, 2020; Yin et al., 2021), self-efficacy (Chang et al., 2022a), and outcome expectation (Hew et al., 2021). Researchers have also utilised chatbots to support students to leverage external conditions for a task. For example, chatbots recommended learning resources to students (Bailey et al., 2021; Chang et al., 2022b), and guided students to manage their studying time (Harati et al., 2021) and to understand task requirements (Du et al., 2021; Mellado-Silva et al., 2020; Chen et al., 2020). We also found that in 17 studies chatbots supported students to engage in cognitive operations of assembling. These include integrating and consolidating conceptual knowledge in math (Cai et al., 2021), English language learning (Fryer et al., 2017; Xia et al., 2023), physical sciences (Deveci Topal et al., 2021) and accounting (Mellado-Silva et al., 2020). Nine chatbots provided support for cognitive operations of translating. These include guiding students to transform knowledge from readings into a written narrative (Bailey et al., 2021) and to apply knowledge in a practical project (Kumar, 2021). Thirteen chatbots provided support for metacognitive operations of monitoring. These include guiding students to monitor for domain knowledge acquisition (Harati et al., 2021), for learning goals and responses to questions (Hew et al., 2021), for learning strategy employment (Song & Kim, 2021) and for progress and performance (Cai et al., 2021; Neumann et al., 2021; Oliveira et al., 2021; Zhang et al., 2023a, b). Five chatbots provided support for searching operations.
These include guiding students to search for course materials and learning content (Chang et al., 2022a, b; Oliveira et al., 2021), for specific learning strategies (Du et al., 2021) and for learning tools (Jones & Castellano, 2018). One chatbot included support for the cognitive operation of rehearsing by guiding students to formulate acquired knowledge in their own words (Jeon, 2021). Next, we found that the most common learning products that students created while studying with chatbots were answers to questions on tests/quizzes (Cai et al., 2021; Chang et al., 2022b; Jeon, 2021), and only a few chatbots have supported students to produce essays (Neumann et al., 2021), thesis statements (Lin & Chang, 2020), project reports (Kumar, 2021) and learning goals (Du et al., 2021).

Further, 16 chatbots in the corpus we reviewed appeared to provide support for learning processes theorised to occur at the evaluations facet of SRL. For instance, chatbots utilised in Cai et al. (2021), Oliveira et al. (2021), Zhang et al. (2023a) and Lin and Chang (2020) assisted students to engage in judgment of learning, whereas chatbots in Jones and Castellano (2018), Hew et al. (2021), Song and Kim (2021) and Yin et al. (2021) promoted student engagement in self-reflection. Last, 14 chatbots provided guidance to students to better comprehend task standards. Specifically, these chatbots provided students with initial explanations of task requirements and other task features (Bailey et al., 2021; Lin & Chang, 2020; Jones & Castellano, 2018; Chen et al., 2020), task-related tips (Tian et al., 2021), opportunities for progress checks relative to task topics (Harati et al., 2021), and questions for goal setting (Du et al., 2021; Hew et al., 2023). We provide a summary table of the SRL phases and facets supported by the educational chatbots included in this review in the appendix (Figs. 5, 6, 7, 8, and 9).

4.3 RQ2: To what extent has the use of educational chatbots improved students’ SRL processing and learning performance?

Among the publications reviewed, we found mixed effects of educational chatbots on students’ SRL processes and learning performance. In terms of promoting SRL processing, researchers have reported that students who studied with chatbots tended to: use more effective learning strategies (Bailey et al., 2021; Chang et al., 2022b; Mellado-Silva et al., 2020), increase their awareness of the importance of setting learning goals (Du et al., 2021; Hew et al., 2023), take control of their learning process and study pace (Yin et al., 2021; Tian et al., 2021), enhance their learning engagement and self-efficacy (Chang et al., 2022a; Hew et al., 2021; Oliveira et al., 2021), and transfer some of their SRL skills to a new learning activity (Jones & Castellano, 2018). Researchers have also found that students who studied with a chatbot did not sustain their interest in the task, which was attributed to the novelty effect (Fryer et al., 2017), and did not increase their SRL processing (Harati et al., 2021). Moreover, the use of a chatbot in one of the studies did not appear to statistically significantly boost student internal conditions that are critical for productive SRL, i.e., need for cognition, perception of learning, creativity, self-efficacy and motivational beliefs (Kumar, 2021). The systematic review also shows that chatbots were used to improve students’ learning performance in tasks spanning different subjects, including English as a second language (Bailey et al., 2021), obstetrics (Chang et al., 2022a), physical education (Chang et al., 2022b), science (Deveci Topal et al., 2021), accounting (Mellado-Silva et al., 2020), geography (Jones & Castellano, 2018) and educational psychology (Lin & Chang, 2020; Kumar, 2021).
We also note that the use of chatbots had limited effects on the learning performance of students working on a chemistry task (Harati et al., 2021), and statistically non-significant effects on the performance of students working on tasks in math (Cai et al., 2021), computer science (Oliveira et al., 2021) and geography (Jones & Castellano, 2018). We summarised descriptive and inferential statistics on SRL processes and learning performance across the reviewed studies in the appendix (Figs. 5, 6, 7, 8, and 9).

5 Discussion

Even though chatbots are not a new technology, our results indicate that the application of chatbots for promoting SRL has only recently attracted attention from educational researchers and practitioners, i.e., over 90% of the papers in the reviewed corpus were published from 2020 onward. Unlike some other educational technologies that have been widely researched as support for student SRL over the past decade, e.g., intelligent tutoring systems (Duffy & Azevedo, 2015; Dever et al., 2023; Taub et al., 2021) and computer-based scaffolding environments (Molenaar et al., 2012; Srivastava et al., 2022; Lim et al., 2023), the use of chatbots for this purpose has yet to be explored in depth.

The two most prominent chatbot architectures in the reviewed corpus were NLP-driven and rule-based chatbots. NLP-driven chatbots utilize NLP and machine learning methods to derive meaning from user input and understand user intents. Even though NLP-driven models often require extensive training before they can be applied, chatbots based on this architecture are typically more robust in interpreting unclear and grammatically incorrect student inputs. We found DialogFlow to be a commonly used NLP platform powering NLP-driven chatbots for SRL (Deveci Topal et al., 2021; Bailey et al., 2021). On the other hand, rule-based chatbots use a set of predefined rules, e.g., a tree-like decision flow, to map student input to an appropriate chatbot response. These rules are created by anticipating users’ input and are pre-scripted during bot design. Rule-based chatbots provide better control over bot behavior and may be a particularly applicable architecture for researchers aiming to explicitly map user inputs to SRL processes, e.g., “What should I do first?” can be mapped to goal setting, and, based on that, the chatbot may provide a series of prompts to guide a learner to set their goals. In the reviewed studies, researchers utilised chatbots to support learning in a very diverse set of educational domains (i.e., 15 different domains identified in our corpus), and this finding aligns with findings from previously conducted literature reviews (Winkler & Söllner, 2018; Pérez et al., 2020; Smutny & Schreiberova, 2020) that also reported that researchers tended to apply educational chatbots in diverse domains. The main reason for this cross-domain popularity of chatbots may be that this technology was designed to adapt to conversation with different users and on different topics.
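To make the rule-based mapping described above more concrete, the following minimal Python sketch shows how student utterances might be matched against pre-scripted rules and mapped to an SRL process and a corresponding prompt. The rule patterns, process labels, prompts and the `respond` function are hypothetical illustrations and are not taken from any of the reviewed chatbots.

```python
import re
from typing import List, Pattern, Tuple

# Hypothetical rules: (pattern, SRL process label, scripted prompt).
# A real rule set would be authored by educators for a specific task.
RULES: List[Tuple[Pattern[str], str, str]] = [
    (re.compile(r"\bwhat should i do first\b|\bwhere do i start\b"),
     "goal_setting",
     "Let's set a goal. What do you want to achieve in this session?"),
    (re.compile(r"\bhow (should|do) i (study|approach)\b"),
     "planning",
     "Which strategy do you plan to use, and why?"),
    (re.compile(r"\bam i (doing well|on track)\b"),
     "monitoring",
     "How confident are you that you understood the last section?"),
]

# Returned when no rule matches, i.e., the input was not anticipated at design time.
FALLBACK: Tuple[str, str] = ("unmapped", "Could you rephrase that?")


def respond(utterance: str) -> Tuple[str, str]:
    """Return (SRL process, prompt) for the first rule matching the utterance."""
    text = utterance.lower().strip()
    for pattern, process, prompt in RULES:
        if pattern.search(text):
            return process, prompt
    return FALLBACK


if __name__ == "__main__":
    for message in ["What should I do first?", "Am I on track?", "Tell me a joke"]:
        print(message, "->", respond(message))
```

The fallback branch illustrates the trade-off noted above: a rule-based bot behaves predictably, but any input that designers did not anticipate remains unmapped.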

Educational chatbots for self-regulated learning have mainly supported learners’ processes at one or two phases of the Winne and Hadwin model of SRL. Our findings suggest that the current design and implementation of educational chatbots lack the ability to aid the whole SRL cycle and thus to provide students with comprehensive SRL support addressing all four phases of SRL defined in Winne and Hadwin (1998) and Winne (2018). Almost all of the reviewed chatbots were designed to promote the enactment of learning tactics and strategies that educators deemed to be important for success in different learning tasks. For example, in the thesis statement writing task (Lin & Chang, 2020), educators may guide students to strategically engage in the following learning activities: “identify relevant passage” → “identify claims” → “compose thesis statement” → “evaluate your conceptual understanding” → “revise thesis statement”. The bot was designed to provide guidance to students on these activities, in any order they prefer. For this reason, utterances between learners and chatbots have often been mapped to specific learning tactics to reinforce the learning of students, taking into account required learning activities. In this way, chatbots have served as a potentially effective supplement to traditional classroom teaching, a trend also identified in the previous literature (Pérez et al., 2020). This finding ties in with another finding from our review showing that chatbots mainly supported operations of rehearsing, assembling and translating, i.e., cognitive operations that are typically contingent upon task conditions (Winne, 1995), such as integrating information from several readings in an essay or recalling a math formula in a quiz. 
Together, these findings may suggest that the design of SRL chatbots was primarily informed by the nature of specific learning tasks, e.g., persuasive writing, numerical conversion and software programming, and, as such, depended upon expected sequences of actions that learners should take to address those tasks.

To a lesser extent, chatbots supported students to metacognitively evaluate their immediate and past studying, and to adapt their studying accordingly. For instance, by using interactive and personalised feedback from chatbots, students were afforded the opportunity to engage in judgement of learning (Cai et al., 2021; Chang et al., 2022b; Lin & Chang, 2020; Oliveira et al., 2021), and to evaluate and adapt the learning strategies they used during the task. In two of the studies, engagement in metacognitive judgement of learning was reported to be associated with increased student engagement in critical thinking (Chang et al., 2022b) and writing performance (Lin & Chang, 2020), further confirming the potential of educational chatbots to support student metacognition, which is considered to be one of the central processes for productive SRL (Winne, 2018). Students’ internal conditions, such as motivation, self-efficacy, and interest in a task, are often measured using self-report questionnaires, interviews and self-reflection prompts administered before or after the learning session. Data that dynamically capture student internal conditions as they evolve during the session are rarely collected, making it hard for educational technologies to provide immediate support adaptive to learning conditions. Even though some chatbots we reviewed have demonstrated the ability to promote internal conditions, e.g., learning motivation (Yin et al., 2021), perception of learning (Neumann et al., 2021) and self-efficacy (Chang et al., 2022a), the capability of educational chatbots to provide responses sensitive to evolving internal conditions remains limited. We also note that many chatbots in the reviewed corpus supported students to search for, gather and access learning resources for their tasks. 
As chatbots have been traditionally used in dialogue systems for customer service and information acquisition (Serban et al., 2017; Winkler & Söllner, 2018), we speculate that the popularity of this feature in educational chatbots may have been naturally inherited from the field of customer service and adapted to support students as they gather learning content. Moreover, the recent explosion of advanced generative language models that generate sophisticated human-like responses and engage in natural language conversations, such as ChatGPT, has opened up new possibilities to transform educational chatbots from tools mainly used for information acquisition into powerful pedagogical tools that can revolutionize how students learn by offering personalized learning experiences and real-time guidance adapted to the student’s learning skills and knowledge of content.

We found that chatbots in the reviewed studies generally promoted an increase in productive SRL processes and in the learning performance of students across different domains, confirming the potential of this technology to support SRL. Non-significant effects were identified in a group of studies, and we attribute this finding to several possible reasons. Student motivation and engagement in learning sessions facilitated by a chatbot may have dropped, as many students may feel isolated in such a learning context and may prefer direct support from teachers instead (Zhang et al., 2020). This may further lead to challenges in sustaining students’ interest in a task, as indicated in one of the reviewed studies (Fryer et al., 2017). As the reviewed chatbots have mainly supported university students, it may be expected that many students in this population already possessed a preferred catalogue of learning strategies and that a one-time session with a chatbot may not be sufficient to help those students alter their approaches to learning. Another reason for non-significant effects may be that chatbots do not always provide satisfactory and accurate responses that clearly target particular learning processes (Deveci Topal et al., 2021).

6 Conclusion and implications for further studies

The findings of this systematic literature review indicate the increasing interest of researchers in using educational chatbots to support self-regulated learning. The reviewed studies predominantly employed NLP-driven and rule-based chatbot architectures. Both architectures have shown potential in promoting various processes in SRL, particularly in the enactment of learning strategies and cognitive operations such as assembling, translating, and monitoring. Despite these advancements, the review identifies significant gaps in the comprehensive support of SRL. None of the chatbots provided SRL support across all four phases of SRL, as proposed by Winne and Hadwin (1998). The support often involved guiding students through the steps within specific learning tasks rather than offering holistic support for student SRL processing. The effects of chatbots on students’ SRL processes and learning performance appeared to be mixed. While many studies reported improvements in the use of learning strategies, student engagement, and self-efficacy, others found limited or non-significant effects on learning performance.

Based on the findings from this systematic literature review, we propose the following areas of investigation towards advancing research on chatbots and SRL.

1. Create chatbots that provide comprehensive SRL support across all the phases

Our results suggest that, to date, there has been no chatbot designed to provide comprehensive support across all four phases of SRL defined in the Winne and Hadwin model. For instance, even though student engagement in goal setting, planning, and adaptation has been widely documented to benefit student learning experiences and performance (Alessandri et al., 2020; Raković et al., 2022a; Rakovic et al., 2022b), SRL processes at these stages have been rarely supported in the reviewed corpus, which may partially explain the small, non-significant and limited effects of several chatbots on student achievements in this review. Within the SRL framework, each phase builds upon the previous one, creating a cyclical process that allows students to continuously improve their learning strategies and accomplish their learning goals. Supporting studying in each phase of SRL may provide students with better control over their learning and may lead to greater academic success and increased motivation and confidence in one’s ability to learn.

2. Identify specific learning tasks in which chatbots can provide most effective support

While it is important to apply chatbots in different subject domains, it is equally important to identify specific tasks within those domains where chatbots can be most effective. By doing so, researchers and educators can ensure that chatbots are used in a targeted and effective manner, maximizing the impact of this technology on students’ learning experience. In this way, chatbots may help learners develop a catalogue of task-specific learning skills and transfer these skills to similar tasks in the future.

3. Evaluate the effectiveness of SRL chatbots in longitudinal studies

Most of the educational chatbots in this review have been evaluated in small-scale studies, e.g., a one-time intervention administered in one class. Such a lack of longitudinal data might impede researchers from gaining a deeper understanding of the long-term benefits of SRL chatbots. Therefore, it may benefit future research in educational technology and learning sciences if researchers conduct longitudinal studies, e.g., studies spanning one or several semesters, to examine the effects of chatbots on the development of students’ SRL skills over time.

4. Use chatbot to elicit students’ internal conditions

Student internal conditions including prior knowledge, motivation, interest, self-efficacy, achievement goals, utility value and outcome expectations can have a significant impact on how learners approach and engage with the learning process (Meece, 2023). However, there are very limited existing efforts in learning analytics focusing on understanding and eliciting internal conditions (Matcha et al., 2019). These constructs have been typically measured at the beginning of a learning task. Since SRL is a dynamic and cyclic process (Panadero, 2017), student internal conditions may often change during the learning session, also affecting other processes that learners enact. For example, use of effective learning strategies and accomplishment of some learning goals early in a learning session may increase students’ self-efficacy and motivation later in the session, compared to what learners reported at the outset of the session. Given the conversational and interactive nature of chatbots, researchers may consider using this technology as an instrument that dynamically captures changes in internal conditions and helps learners to reflect on their own learning process, e.g., by engaging in dialogues with learners, asking questions that gauge students’ understanding, their learning goals and confidence levels, analyzing the content and frequency of student interactions, providing feedback on their progress, and offering suggestions for improvement.

5. Record and analyse what students did, not only what they say they did

Digital trace-data, e.g., navigation logs, text annotations and keystrokes, that students generate in digital learning environments have been increasingly harnessed to unobtrusively measure SRL (Fan et al., 2022; Rakovic et al., 2022b; Lim et al., 2023). For instance, trace-data are often mapped to theorised SRL processes and dynamically analysed (e.g., by using process mining and natural language processing approaches) to obtain a more complete picture of student learning behaviors. To our knowledge, educational chatbots developed to date have mainly gathered information about student learning in two ways: (1) via self-reports, e.g., based on what students said to the bot they did, and (2) via student performance data, e.g., correct/incorrect answers on a test. Researchers, however, have identified several challenges related to those methods. For example, students’ self-reports may often be insufficiently accurate and biased towards student beliefs or social desirability (Winne, 2022), whereas performance data often cannot directly inform the intervention (Arizmendi et al., 2022). Researchers may consider introducing trace-data as an additional input to SRL chatbots to ensure bots more accurately monitor student SRL processes as they dynamically unfold and, based on that, provide students with more accurate and timely SRL support.
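As a simple illustration of how trace-data might be mapped to theorised SRL processes, the Python sketch below labels logged actions with process names and counts both the processes and the transitions between them, a basic precursor to process mining. The action names, the action-to-process mapping and the function names are hypothetical assumptions made for this example; an actual mapping would need to be theoretically grounded and validated for a given learning environment.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical mapping from logged action types to theorised SRL processes.
ACTION_TO_PROCESS: Dict[str, str] = {
    "open_task_instructions": "task_definition",
    "set_timer": "planning",
    "highlight_text": "rehearsing",
    "write_note": "assembling",
    "edit_essay": "translating",
    "check_rubric": "evaluation",
}

Event = Tuple[int, str]  # (timestamp in seconds, logged action)


def label_trace(events: List[Event]) -> List[Tuple[int, str]]:
    """Order events by time and replace each action with its SRL process label."""
    return [(t, ACTION_TO_PROCESS.get(action, "unlabelled"))
            for t, action in sorted(events)]


def process_counts(labelled: List[Tuple[int, str]]) -> Counter:
    """Count how often each SRL process was observed."""
    return Counter(process for _, process in labelled)


def transitions(labelled: List[Tuple[int, str]]) -> Counter:
    """Count transitions between consecutive SRL processes."""
    processes = [process for _, process in labelled]
    return Counter(zip(processes, processes[1:]))


if __name__ == "__main__":
    log = [(30, "highlight_text"), (5, "open_task_instructions"),
           (60, "write_note"), (90, "highlight_text"), (120, "scroll")]
    labelled = label_trace(log)
    print(process_counts(labelled))
    print(transitions(labelled))
```

A chatbot could, in principle, consult such counts during a session, for example by prompting a learner to evaluate their work when no evaluation events have been observed, although the reviewed studies did not report this design.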

6. Support students at all educational levels

The chatbots in the reviewed corpus have mainly supported university students. More research is needed to adapt chatbots to cater to the needs of students at other levels of education, e.g., primary and secondary. Students at different levels of education may have different learning needs and preferences (Ambrose et al., 2010). By conducting research and adapting chatbots to cater to the specific needs of students at different levels, we can ensure that the benefits of educational chatbots are accessible to students at all developmental stages, potentially creating more effective, engaging and inclusive learning environments for them. Also, university students are more experienced learners who have often formed certain learning habits. This may make it challenging for an educational chatbot to induce changes in their learning. An early intervention at a primary or secondary education level could help in preparing students to better self-regulate their learning at a tertiary education level and throughout life.

7. Improve the effectiveness and accuracy of chatbot responses by harnessing the potential of large language models and generative AI

In our review, a notable limitation of current educational chatbots is their often unsatisfactory responses (Deveci Topal et al., 2021). This indicates that current chatbots may have limited ability to interpret users’ intentions and provide adequate support, which may in turn hamper student engagement and motivation. Researchers may utilise rapidly emerging generative AI technologies, specifically large language models such as ChatGPT that can handle complex language problems, to enhance chatbots’ ability to understand learners’ intentions and provide appropriate responses in the context of SRL. In this way, the volume of productive interactions between students and SRL chatbots may increase, which may improve student learning experiences and interest in studying with a bot, marking a step forward in AI-driven education.