1 Introduction

Education stakeholders are currently working within an environment where vast quantities of data can be leveraged to gain a deeper understanding of the educational attainment of learners. A growing pool of data is generated through software with which students, teachers and administrators interact (Kassab et al., 2020), through apps, social networking and the collection of user behaviour on aggregators such as YouTube and Google (De Wit & Broucker, 2017). Moreover, thanks to the Internet of Everything phenomenon, stakeholders in the education domain have access to data in which people, processes, data and things connect to the internet and to each other (Langedijk et al., 2019). These data take on non-traditional formats, capturing language, location, movement, network, image and video information (Lazer et al., 2020). Such non-traditional data sets require cutting-edge analytical techniques in order to be effectively used for learning purposes and to be translated into succinct policy recommendations.

Learning analytics, as an interdisciplinary domain borrowing from statistics, computer science and education (Leitner et al., 2017), exploits this new data-rich landscape to improve the learning process and outcomes of current and future citizens (De Wit & Broucker, 2017). In education, learning analytics is set squarely within the new computational social sciences, which consist of the “development and application of computational methods to complex, typically large-scale human behavioral data” (Lazer et al., 2009). Learning analytics directs these advances towards the creation of actionable information in education. It applies data analytics to the field of education, and it attempts to propose ways to explore, analyse and visualize data from any relevant data source (Vanthienen & De Witte, 2017). An important role of learning analytics is the exploitation of the traces left by students on electronic learning platforms (Greller & Drachsler, 2012). As such, learning analytics allows teachers to maximize the cognitive and non-cognitive education outcomes of students (Long & Siemens, 2011). In an optimal learning environment, one would maximally leverage the potential of students to increase their welfare and performance not only during schooling but also afterwards, across civil society.

As the COVID-19 pandemic induced shifts towards online and home education, the opportunities for data analytics have increased, both in general and, in particular, for mitigating the crisis’ effects on learning outcomes (Maldonado & De Witte, 2021) and on the well-being of students (Iterbeke & De Witte, 2020). The online traces that students leave on electronic learning platforms allow teachers, schools and policy-makers to better target remedial teaching interventions towards the neediest students. The school closures also showed how unequally digital devices are spread among students, with significant groups of disadvantaged students lacking access to basic digital instruments such as stable broadband access and a computer. Similarly, the school closures revealed significant differences between countries in their readiness for online teaching and in the availability of high-quality digital instruction. Still, prompted by the unprecedented crisis, multiple countries made significant investments in educational ICT infrastructure (De Witte & Smet, 2021). If this coincides with improved training of teachers and school managers; an improved integration of educational, administrative and online data sources; and improved accessibility of hands-on software, we expect the domain of learning analytics to flourish further in the coming decades.

The following chapter aims to contribute to this accelerated use of learning analytics by illustrating its potential in multiple educational domains. We first discuss the increasing emergence of large and accessible data sets in education and the associated growth in expertise in educational data collection and analysis. This is sustained by real-time streamed data and increasingly autonomous administrative data sets. Section 16.2 compares the cost-effectiveness of learning analytics to that of costly and unreliable retrospective studies and surveys. Learning analytics may also contribute to improving the quality of currently dispensed education, for example, through fraud detection and student performance prediction. In Sect. 16.3, three tools of growing popularity and potential for learning analytics are presented: Bayesian Additive Regression Trees (BART), Social Network Analysis (SNA) and Natural Language Processing (NLP). These tools permit savvy users to make insightful predictions about student types, performance and the potential of reforms. The brief description of these techniques aims to familiarize practitioners and decision-makers with their potential. Finally, alongside recommendations, technical and non-technical challenges to the implementation and growth of learning analytics, and of empirically based education in general, are discussed. As the growing possibilities of learning analytics raise sensitive questions regarding data usage and linkages, we discuss the related ethical and legal concerns in the concluding section.

2 Potential for Educators and Citizens

2.1 Growing Opportunities for Data-Driven Policies in Education

“Students and teachers are leaving large amounts of digital footprints and traces in various educational apps and learning management platforms, and education administrators register various processes and outcomes in digital administrative systems” (Nouri et al., 2019). In this section, we discuss three trends that create growing opportunities for fostering creative data-driven policies in education: (1) the development of online teaching platforms, (2) software-oriented administrative data collection with links between heterogeneous data sets (Langedijk et al., 2019) and (3) the Internet of Things (Langedijk et al., 2019).

First, consider the online teaching platforms. A prime example is the massive open online course (i.e. MOOC, De Smedt et al., 2017). Institutional MOOC initiatives have been contributing to making high-quality educational material accessible to a wide range of students and to maintaining the prestige of the participating institutions (Dalipi et al., 2018). For adults, MOOC completion has also been associated with increased resilience to unemployment (Castaño-Muñoz & Rodrigues, 2021). From a learning analytics perspective, it is interesting to observe that all student activities can be tracked within the MOOC. This information has been studied to give empirical grounding to suggestions to reduce course dropout by fostering peer engagement on online forums, team assignments and peer evaluations (Dalipi et al., 2018). From a methodological perspective, some of the innovative methodologies exploiting MOOCs’ large data sets include K-means clustering, support vector machines and hidden Markov models.
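To make the clustering idea concrete, the following minimal sketch groups hypothetical MOOC learners by their activity profiles with K-means; all feature names and values are invented for illustration.

```python
# Minimal sketch: grouping hypothetical MOOC learners by engagement profiles
# with K-means clustering. All features and values are invented.
import numpy as np
from sklearn.cluster import KMeans

# Each row: [videos watched, forum posts, quiz attempts] for one learner
activity = np.array([
    [40, 12, 10],   # highly engaged learners
    [38, 15,  9],
    [42, 10, 11],
    [ 3,  0,  1],   # learners at risk of dropping out
    [ 5,  1,  0],
    [ 2,  0,  2],
])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(activity)
print(labels)  # the first three learners share one cluster, the last three the other
```

A teacher could use such cluster labels to target early outreach at the low-engagement group before dropout occurs.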

A second trend in data-driven policies in education arises from software-oriented administrative data collections. These refer to the digital warehousing of administrative data such that the data can be relatively easily linked with other data sets and easily transformed through, for example, the inclusion of a large quantity of new observations (e.g. student files) and the ad hoc addition of new variables of interest (Agasisti et al., 2017). Administrative data sets are built around procedures whose aims are not primarily to foster data-driven policies (Barra et al., 2017; Rettore & Trivellato, 2019). In that sense, they can provide rich information about students and other educational stakeholders while being quicker to gather and significantly cheaper than retrospective surveys (Figlio et al., 2016).

As a major advantage, software-oriented administrative data collections can be easily linked to other data sources, such as the wide array of information surveyed by local governments in their interactions with citizens. Through software integration, data regarding such diverse domains as public health and agriculture may be seamlessly captured. To conceptualize the diversity of potential data sources, Langedijk et al. (2019) describe those data as divided into thematic silos. Each silo represents an important civil concern, such as health or education, and within each silo, stakeholders can define sub-themes to which interesting data sets are attached. For example, in the case of education, some proposed sub-themes are standardized test results, textbook quality and teacher quality (Langedijk et al., 2019). Through the development of electronic networks, links can be established not only within silos, where policy-makers may, for instance, be interested in the relation between teacher quality and test scores, but also across silos, where improvements in learning outcomes can be associated with changes in the health of citizens (Langedijk et al., 2019). The analyses required to measure such associations can take advantage of the typically long-run collection of administrative data (Figlio et al., 2016). As an additional advantage of electronic networks, whereas data has traditionally been transmitted in batches, for example, to produce descriptive reports at set time intervals, electronic networks now permit event registration in real time (De Wit & Broucker, 2017; Mukala et al., 2015). The real-time extraction of data benefits teachers and students, who can rely, for example, on automated assignments and online dashboards to improve their learning experience and their learning outcomes (De Smedt et al., 2017).

A good example of data set linkage in education is provided by studies with population data that aim to explore education outcomes in specific subgroups. A recent study by Mazrekaj et al. (2020) made use of the rich micro-data sets made available to researchers by the Dutch Central Bureau of Statistics (CBS). These micro-data cover many themes of social life (e.g. financial, educational, health, environmental and professional silos) and, though access is limited for privacy reasons, are easy to link together with standard analytics software.

Third, consider the Internet of Things. The Internet of Things denotes the numerous physical devices with integrated internet connectivity (DeNardis, 2020). In educational settings, these devices are the computers, social networking services, mobile devices, cameras, sensors and software with which students, teachers and administrators interact (Kassab et al., 2020). They are used to monitor students’ attendance, classroom behaviour and interactions with online teaching services and laboratories. On online platforms, but also through mobile apps and logging platforms (e.g. library access, blogs, electronic learning environments), students’ and tutors’ behaviours and opinions can be monitored in real time and passed through automatic analytics platforms or saved to solve future policy issues (De Smedt et al., 2017; De Wit & Broucker, 2017). Similarly, RFID (radio-frequency identification) sensors track the locations and availability of educational appliances such as laboratory equipment and projectors. Students and tutors can communicate with each other regardless of location, and assessment feedback can be delivered instantaneously, resulting in higher-quality education.

2.2 Learning Analytics as a Toolset

The toolset of learning analytics can be used for several purposes. We first provide some examples of how it can contribute to improving the cost-effectiveness of education and next how it can foster education outcomes on cognitive and non-cognitive scales. Finally, we provide examples of how learning analytics can assist in educational quality management.

2.2.1 Improving Cost-Effectiveness of Education

The increasing public scrutiny and tighter budgets, which are an ever-present reality of the educational landscape, motivate a double goal for data-driven solutions. These must improve efficiency and performance with regard to learning outcomes while also proposing solutions that are competitive in terms of cost (Barra et al., 2017). There are two poles through which cost-effective learning analytics solutions can be proposed.

The first pole stands at the level of data collection. Administrative data sets suffer from high costs of data cleaning and collection. Indeed, although data extraction is usually native to recent administrative software (King, 2016), administrative data sets typically require ad hoc linkages and research designs (Agasisti et al., 2017). In the sense that their inclusion in data-driven decision-making is not their primary purpose, they constitute an opportunistic data source and thus may occasionally demand more resource investments than deliberate data collection procedures. Meanwhile, the omnipresent network of computing devices and the associated online educational platforms permit data extraction at every step of the learning process (De Smedt et al., 2017). As previously indicated, this type of unstructured data can be saved, but the real-time data stream can also be designed in such a way as to permit automatic analyses. This deliberate pipeline associating the collected data with useful analyses can ensure cost-effectiveness through economies of scale. It can also serve as a baseline for future improvements in summarizing data for students, teachers and stakeholders in general. In short, rich data sets and insightful analyses can be produced without requiring recurrent ad hoc organizational involvement. In that sense, the environment in which learning analytics is embedded permits professionals and stakeholders to benefit from opportunistic analyses and from insights that are delivered efficiently (Barra et al., 2017). For example, during the COVID-19 crisis, learning analytics was used to monitor how students were reached by online teaching.

The second pole through which cost-effectiveness can be achieved in the establishment of data-driven policy-making for education is that of data analytics. Up until now, technologically able and creative teams have been keeping pace with the expanding volume, variety and velocity of data by developing and applying advanced analytical methods (De Wit & Broucker, 2017; King, 2016). One such method is Data Envelopment Analysis (DEA). It permits the employment of administrative and learning data in order to directly fulfil goals related to cost minimization (Barra et al., 2017; De Witte & López-Torres, 2017; Mergoni & De Witte, 2021). The results of such analyses may be useful in promoting efficient investments in educational resources (see, e.g. the report by the European Commission Expert Group on Quality Investment in Education and Training). This brings to the forefront the seemingly paradoxical effect of additional spending: it can increase cost-effectiveness in the long run. Advances in the social sciences have already demonstrated the consequences of poor learning outcomes, chief among which are “lower incomes and economic growth, lower tax revenues, and higher costs of such public services as health, criminal justice, and public assistance” (Groot & van den Brink, 2017). Hence, learning outcomes deserve an important place in discussions around the cost-effectiveness of education (De Witte & Smet, 2021).
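To illustrate the intuition behind DEA, the sketch below implements a minimal input-oriented CCR model (constant returns to scale) with a standard linear programming solver; the three schools and their spending and score figures are hypothetical.

```python
# Minimal input-oriented DEA sketch (CCR model, constant returns to scale)
# solved as a linear program. The three schools and their figures are
# hypothetical; an efficiency of 1 marks the best practice frontier.
import numpy as np
from scipy.optimize import linprog

inputs  = np.array([[1.0], [2.0], [2.0]])   # spending for schools A, B, C
outputs = np.array([[1.0], [2.0], [1.0]])   # average test score

def dea_efficiency(o, X, Y):
    n, m = X.shape                          # n schools, m inputs
    s = Y.shape[1]                          # s outputs
    c = np.r_[1.0, np.zeros(n)]             # minimize theta over [theta, lambdas]
    # Input constraints: sum_j lambda_j * x_ij <= theta * x_io
    A_in = np.c_[-X[o].reshape(m, 1), X.T]
    # Output constraints: sum_j lambda_j * y_rj >= y_ro
    A_out = np.c_[np.zeros((s, 1)), -Y.T]
    res = linprog(c, A_ub=np.r_[A_in, A_out], b_ub=np.r_[np.zeros(m), -Y[o]],
                  bounds=[(0, None)] * (1 + n))
    return round(res.x[0], 3)

scores = [dea_efficiency(o, inputs, outputs) for o in range(3)]
print(scores)  # [1.0, 1.0, 0.5]: A and B are efficient; C could halve spending
```

The efficiency score of 0.5 for school C reads as follows: given its output, a best-practice peer would need only half of C's input, which is exactly the kind of actionable summary efficiency research aims at.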

2.2.2 Improving Learning Outcomes

In terms of directly improving educational quality, three ambitions can be distinguished for learning analytics: improving (non-)cognitive learning outcomes, reducing learning support frictions and achieving wide deployment and long-term maintenance of teaching tools (Viberg et al., 2018). These ambitions are now discussed.

First, learning outcomes can be interpreted as the academic performance of students, as measured by quizzes and examinations (Viberg et al., 2018). Learning outcomes can also be defined more broadly than such testable outcomes, for example, in relation to interpersonal skills and civic qualities. However widely defined, it is important that the set of criteria identifying educational success is well defined by stakeholders and that it is clearly communicated to, and open to the contributions of, citizens. In that way, educational policy discussions can be centred around transparent and recognized aims.

Although there is a rich literature evaluating learning analytics in higher education, the contributions of learning analytics tools to improving the (non-)cognitive learning outcomes of secondary school students have received relatively little attention in the empirical literature (Bruno et al., 2021). Nevertheless, clear improvements in writing and argumentative quality have been associated with the use of automatic text evaluation software (Lee et al., 2019; Palermo & Wilson, 2020). Such software uses Natural Language Processing (NLP) to analyse data extracted from online learning platforms. Automatic text evaluation has also shown promising results at higher education levels and with non-traditional adult students (Whitelock et al., 2015b). There is thus flexibility in terms of the type of students or teachers to whom learning analytics approaches apply.

Another interesting contribution of learning analytics to the outcomes of secondary school students has been in improving their computer programming abilities. This has been accomplished through another advanced data analysis technique, process mining, which helped teachers pair students based on behavioural traces captured during programming exercises (Berland et al., 2015).

Second, with respect to learning support frictions, there is often a lag between the assumptions behind the design of learning platforms and the observed behaviours of students (Nguyen et al., 2018). An example of this lag is that students tend to spend less time studying than recommended by their instructors. Less involved students also tend to spend less time preparing assignments (Nguyen et al., 2018). By delaying feedback, such a lag can negatively affect both students’ and teachers’ involvement in the learning process. Thanks to learning analytics tools, students can receive tailored feedback, rehearse exercises that are particularly difficult for them and receive stimulating examples that fit their interests (Iterbeke et al., 2020). This reduces the learning support frictions and consequently improves learning outcomes.

Yet, the lag between the desired learning outcomes and student behaviour cannot be corrected simply through the implementation of electronic platforms or through a gamification of the learning process. It is critical that the digital tools being implemented, and those implementing them, take students’ feedback into account. Many students are now used to accessing information without encountering many physical or social barriers. For those students, the interactivity and the practicality of digital learning tools are particularly important (Pardo et al., 2018; Selwyn, 2019). Other students may not have the same familiarity with online computing devices. For these, accessibility must be built into the tools.

Many authors warn of a transfer from magisterial education to learning platforms in which feedback and exercises may be too numerous, superficial or ill-adapted to students’ capabilities or learning ambitions (Lonn et al., 2015; Pardo et al., 2018; Topolovec, 2018). Hence, a hybrid approach to learning support is suggested wherein technologies, such as the automatic text analyses and process mining just touched upon, are combined with personalized feedback from teachers and tutors. Indeed, classroom teaching is often characterized by a lack of personalization and by biases in the dispensation of feedback and exercises. For example, low-performing students are over-represented among the receivers of teacher feedback. Additionally, given the same learning objectives, feedback may be administered differently to students of different genders and origins. Teachers may find learning analytics tools useful in helping their students attain the desired learning outcomes while fostering their personal learning ambitions and their self-confidence (Evans, 2013; Hattie & Timperley, 2007).

Third, learning analytics can provide additional value to students and teachers. In that sense, we observe several clear advantageous applications of learning analytics.

  • Learning analytics could contribute to non-cognitive skills; collaboration, for example, is an area where such skills play an important role. Identifying collaboration and the factors that incite it can improve learning outcomes and even help in preventing fraud. The implementation of analytics methods such as Social Network Analysis (SNA) in learning platforms may allow teachers to prevent or foster such collaborations (De Smedt et al., 2017). Simple indicators like the time of assignment submission can be treated as proxies for collaboration. We discuss SNA in more depth in Sect. 16.3.

  • Another computational approach, process mining, can exploit real-time behavioural data to summarize the interactions of students with a given course’s material. Students can then be distinguished based on their mode of involvement in the course (Mukala et al., 2015). Process mining allows teachers to learn how the teaching method translates into behavioural actions. These insights can be incorporated into the course design and used to detect inefficient behaviour, allowing fast and personalized interventions (De Smedt et al., 2017).

  • A related way to generate value from learning analytics is by implementing text analyses directly on the learning platforms. Natural Language Processing (NLP) is a text analysis method that has been shown to greatly improve the performance of students on assignments such as essay writing (Whitelock et al., 2015a). Generally, text analysis can provide automated feedback shared with the students and their teachers (De Smedt et al., 2017). Providing automated feedback offers another argument for the cost-effectiveness of learning analytics. By giving course providers the ability to score large student bodies, it allows teachers to put more focus on providing adapted support to their students (De Smedt et al., 2017). We discuss NLP in more depth in Sect. 16.3.

  • Not the least advantage of online learning is that it allows asynchronous and synchronous interactions and communications between the participants in a course (Broadbent & Poon, 2015). These interactions can be logged as unstructured data and incorporated into useful text, process and social network analyses.
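As a toy illustration of the automated-feedback idea mentioned above, the sketch below derives feedback messages from a few surface features of a text; production NLP systems rely on far richer linguistic models, and the thresholds and messages here are invented.

```python
# Toy sketch of automated writing feedback. Production NLP systems rely on far
# richer linguistic models; here a few surface features drive invented messages.
import re

def essay_feedback(text, min_words=50, max_sentence_len=25):
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    diversity = len(set(words)) / len(words)      # type-token ratio
    avg_len = len(words) / len(sentences)         # words per sentence
    feedback = []
    if len(words) < min_words:
        feedback.append("Develop your argument further: the essay is short.")
    if avg_len > max_sentence_len:
        feedback.append("Consider splitting long sentences.")
    if diversity < 0.5:
        feedback.append("Try to vary your vocabulary.")
    return feedback or ["No surface-level issues detected."]

print(essay_feedback("The test was hard. The test was hard. The test was hard."))
```

Even such simple checks can be delivered instantly to every student, freeing teacher time for the deeper, personalized feedback that the hybrid approach above calls for.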

2.2.3 Educational Quality Management

A key component of quality improvement in education is the creation of quality and performance indicators related to teachers and schools (Vanthienen & De Witte, 2017). Learning analytics’ contribution to educational quality improvement lies in providing data sources and computational methods and combining them in order to produce actionable summaries of teaching and schooling quality (Barra et al., 2017). Whereas, traditionally, data analyses have required ad hoc involvement and costly time investments from stakeholders, learning analytics can rely on computational power and dense networks of computational devices to automatically propose real-time reports to policy-makers. Below, contributions in terms of quality measurement and predictions are introduced.

2.2.4 Underlying Data for Quality Measurement

Through the exploitation of unstructured, streamed, behavioural data and pre-existing administrative data sets, analytical reports can be updated in real time to reflect the state of education at any desired level, from the individual student and classroom to the country as a whole. That information is commonly ordered in online dashboards (De Smedt et al., 2017). Analysts and programmers can even allow the user to customize the presented summary in real time, by applying filters on maps and subgroups of students, for example.

2.2.5 Efficiency Measurement

An aspect of the quality measurements provided by learning analytics is efficiency research, in which inputs and outputs are compared against a best practice frontier (see the earlier discussed Data Envelopment Analysis model). In this branch of the literature, schools are, for instance, compared based on their ability to maximize learning outcomes given a set of educational inputs (De Witte & López-Torres, 2017; e Silva & Camanho, 2017; Mergoni & De Witte, 2021). The outcome of such an analysis might be used for quality assessment purposes.

2.2.6 Predictions

When discussing the potential of learning analytics for educators and stakeholders, the ability to make predictions about learning outcomes is an unavoidable point of interest. In quantitative analyses, predictions are generated by translating latent patterns in historical data, structured or unstructured, into likely future outcomes (De Witte & Vanthienen, 2017).

Predictions can be produced using, for example, the Bayesian Additive Regression Trees (BART) model (see Sect. 16.3), as applied in Stoffi et al. (2021). There, linked administrative and PISA data available only in Flanders are used to distinguish a group of strongly under-performing Walloon students and to explain their situation. Typically, such a technique uses administrative data that is available for both groups in order to make a sensible generalization from one to the other.

Alternatively, process mining can be used to identify clusters of students and distinguish successful interaction patterns with a course’s material (Mukala et al., 2015). Similar applications can be imagined for Social Network Analysis (De Smedt et al., 2017), through the evaluation of collaborative behaviour, and Natural Language Processing. These techniques are usually perceived as descriptive, but their output may very well be included in a predictive framework by education professionals and researchers.
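The kind of event-log summary that process mining builds on can be sketched minimally as a directly-follows count over student clickstreams; the traces and activity names below are hypothetical.

```python
# Minimal process-mining-style sketch: build a directly-follows count from
# hypothetical clickstream logs to summarize how students move through a course.
from collections import Counter

# One event trace per student (activity names are invented)
traces = [
    ["video", "quiz", "forum", "quiz"],
    ["video", "quiz", "quiz"],
    ["forum", "video", "quiz"],
]

follows = Counter()
for trace in traces:
    for a, b in zip(trace, trace[1:]):
        follows[(a, b)] += 1

# The most frequent transition hints at the dominant study pattern
print(follows.most_common(1))  # [(('video', 'quiz'), 3)]
```

Full process mining tools go further, discovering whole process models and conformance statistics, but the directly-follows relation above is the basic building block they share.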

Learning analytics has initiated a shift from using purely predictive analytics as a means to identify student retention probabilities and grades towards the application of a wider set of methods (Viberg et al., 2018). In return, cutting-edge exploratory and descriptive methods can improve traditional predictive pipelines.

3 An Array of Policy-Driving Tools

It is one thing to comb over the numerous contributions and potential of learning analytics to data-informed decision-making; it is yet another to actually take the plunge and settle on tools for problem-solving in education. In what follows, a brief introduction to distinct methods from the field of computational social sciences is provided. In that way, the reader can get acquainted with the intuition of the methods and how they can be used to improve learning outcomes and quality measurement in education. To set the scene, we also illustrate how the approaches open up the range of innovative educational questions that can be answered through learning analytics.

3.1 Bayesian Additive Regression Trees

The Bayesian Additive Regression Trees (BART) algorithm stems from machine learning and probabilistic programming. It is a predictive and classification algorithm that simplifies complex prediction problems by relying on a set of sensible default parameter configurations. Earlier comparable algorithms such as the Gradient Boosting Machine (GBM) and the Random Forest (RF) require repeated tuning, which makes the quality of their predictions hinge on the analyst’s programming ability and on limited computational resources. By contrast, BART incorporates prior knowledge about educational science problems in order to produce competitive predictions and measures of uncertainty after a single estimation run (Dorie et al., 2019). This contributes to the accessibility of knowledge discovery and the credibility of policy statements in education.

As with the GBM and the RF, the essential building block of the BART algorithm is the decision or prediction tree. The prediction tree is a classic predictive method that, unlike traditional regression methods, does not assume linear associations between variables. It is robust to outlying variable values, such as those due to measurement error, and can accommodate large quantities of data and high-dimensional data sets.

Their accuracy and relative simplicity have made regression trees popular diagnostic and prediction tools in medicine and public health (Lemon et al., 2003; Podgorelec et al., 2002). In education, a recent application of regression trees has been to explore dropout motivations and predictors in tertiary education (Alfermann et al., 2021). The regression tree algorithm (i.e. CART or classification and regression trees, Breiman et al., 2017) performs variable selection automatically, so researchers are able to distinguish a few salient motivations, such as the perceived usefulness of the work, from a vast pool of possible predictors.

To predict quantities such as test scores or dropout risk, regression trees separate the observations into boxes associating a set of characteristics with an outcome. The trees are created in multiple steps. In each of these steps, all observations comprised in a box of characteristics are split into two new boxes. Each split is selected by the algorithm to maximize the accuracy of the desired predictions. The end result of this division of observations into smaller and smaller boxes is a set of branches through which each individual observation descends into a leaf. That leaf is the final box that assigns a single prediction value (e.g. a student’s well-being score) to the set of observations sharing its branch. Graphically, the end result is a binary decision tree where each split is illustrated by a programmatic if statement leading onto either the next binary split or a leaf.
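The box-splitting logic described above can be made tangible with a single shallow regression tree; the study-hours data below are invented, and the printed splits show the if/else structure of the fitted tree.

```python
# Sketch of a single regression tree predicting a (hypothetical) test score
# from weekly study hours, printed as nested if/else splits.
from sklearn.tree import DecisionTreeRegressor, export_text

hours  = [[1], [2], [3], [10], [11], [12]]   # weekly study hours per student
scores = [40, 42, 41, 80, 82, 81]            # corresponding test scores (toy)

tree = DecisionTreeRegressor(max_depth=1, random_state=0).fit(hours, scores)
print(export_text(tree, feature_names=["study_hours"]))  # the learned splits
print(tree.predict([[5]]))  # [41.] - 5 h of study lands in the low-score leaf
```

Each leaf returns the mean outcome of the observations that descended into it, which is exactly the "single prediction value per box" described above.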

The Bayesian Additive Regression Trees (BART) algorithm is the combination of many such small regression trees (Kapelner & Bleich, 2016). Each regression tree adds to the predictive performance of the algorithm by picking up on the mistakes and leftover information from the previously estimated trees. After hundreds or possibly thousands of such trees are estimated, complex and subtle associations can be detected in the data. This makes the BART algorithm particularly competitive in areas of learning analytics where a large quantity of data are collected and there is little existing theory as to how interesting variables may be related to the outcome of interest, be it some aspect of the well-being of students or their learning outcomes.

The specific characteristic of the BART algorithm is its underlying Bayesian probability model (Kapelner & Bleich, 2016). By using prior probabilistic knowledge to restrict estimation possibilities to realistic prediction scenarios, the algorithm can avoid detecting spurious associations between variables. Each data set, unless it constitutes a perfect survey of the entire population of interest, contains variable associations that are present purely due to chance. Such coincidental associations reduce the ability to predict true outcomes when they are included in predictive models. Thus, each regression tree estimated by the BART algorithm is kept relatively small. Because each tree tends to assign predictions to larger sets of observations (i.e. large boxes), the predictive ability of individual trees is poor. This is why analysts call them weak learners. However, by combining many such weak learners, a flexible, precise and accurate prediction function can be generated (Hill et al., 2020).
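The sum-of-weak-learners mechanics can be illustrated with a plain residual-fitting loop of shallow trees. Note that this is a non-Bayesian analogue on toy data: BART additionally places priors on tree structure and leaf values, which this sketch omits.

```python
# Residual-fitting loop of shallow trees, illustrating the sum-of-weak-learners
# idea behind BART on toy data. (BART itself additionally places Bayesian
# priors on tree structure and leaf values, which this sketch omits.)
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

pred = np.full_like(y, y.mean())              # start from the overall mean
initial_error = np.mean((y - pred) ** 2)
for _ in range(100):
    weak = DecisionTreeRegressor(max_depth=2, random_state=0)
    weak.fit(X, y - pred)                     # fit what is left unexplained
    pred += 0.1 * weak.predict(X)             # add a small correction

final_error = np.mean((y - pred) ** 2)
print(final_error < initial_error)  # True: many weak trees beat the mean
```

Each depth-2 tree on its own is a poor predictor of the sine curve, yet the sum of a hundred small corrections recovers it closely, which is the weak-learner logic the paragraph describes.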

The BART algorithm has already been presented earlier in this chapter as a flexible technique to detect and explain learning outcome inequalities (Stoffi et al., 2021). A refinement of the algorithm also permits the detection of heterogeneous policy effects on the learning outcomes of students. This is shown in Bargagli-Stoffi et al. (2019), where it is found that Flemish schools with a young and less experienced school director benefit most from a certain public funding policy. The large administrative data sets provided by educational institutions and governments are well suited to the application of rewarding but computationally demanding techniques such as BART (Bargagli-Stoffi et al., 2019).

3.2 Social Network Analysis

The aim of Social Network Analysis (SNA) is to study the relations between individuals or organizations belonging to the same social networks (Wasserman, Faust, et al., 1994). Relations between these actors are defined by nodes and ties. The nodes are points of observation, which can be students, schools, administrations and more. The ties indicate a relationship between nodes and can contain additional information about the intensity of various components of that relationship (e.g. the time spent collaborating, the type of communication; Grunspan et al., 2014). Specifically for education, SNA aims to describe the networks of students and staff and make that information actionable to stakeholders. Applications of SNA include the optimization of learning design, the reorganization of student groups and the identification of at-risk clusters of students (Cela et al., 2015). Through text analysis and other advanced analytics methods, SNA can handle unstructured data from school blogs, wikis, forums, etc. (Cela et al., 2015). We discuss five examples in more detail next and refer the interested reader to the review by Cela et al. (2015), which provides many other concrete applications of SNA in education.

As a first example, the recognized importance of peer effects, both within and outside the classroom, makes SNA a particularly useful tool in education (Agasisti et al., 2017; Cela et al., 2015; Iterbeke et al., 2020). Applications of SNA model peer effects indirectly as a component of unobserved school or classroom effects that influence the cognitive and non-cognitive skills of students (Cooc & Kim, 2017). As a second example, SNA has been applied to describe and explain a multiplicity of phenomena in schools. In a study of second- and third-grade primary school pupils from 41 schools in North Carolina, Cooc and Kim (2017) found that pupils with a low reading ability who associated with higher-ability peers for guidance significantly improved their reading scores over a summer. Third, other relevant applications of SNA have been in assessing the role of peers in the well-being, be it mental or physical, of students. Surveying 1458 Belgian teenagers, Wegge et al. (2014) showed that the authors of cyber-bullying were often also responsible for physically bullying a student. Additionally, it was observed that a majority of bullies were in the same class as the bullied students. Moreover, a map of bullying networks isolated some students as being perpetrators of the bullying of multiple students. In cases of intimidation and bullying, a clear advantage of SNA over the usual approaches is that the data does not depend on isolated denunciations from victims and peers. The analysis of Wegge et al. (2014) simultaneously identifies culprits and victims, suggesting a course of action that does not focus attention on an isolated victim of bullying. A fourth example application of SNA is in improving the managerial efficacy and the performance of employees within educational organizations. One way to do this is by identifying bottlenecks in the transmission of information through the mapping of social networks.
This can take two forms in the language of SNA: brokerage and structural holes (Everton, 2012). In a brokerage situation, a single agent or node controls the passing of information from one organizational sub-unit to the other. Meanwhile, structural holes identify absent ties between sub-units in the network. In a school, an important broker may be the principal’s secretary, whereas structural holes may be present if teachers or staff do not communicate well with one another (Hawe & Ghali, 2008). As a fifth illustration, the SNA method has been used to propose a typology of teachers based on the nature of their ties with students and to identify clusters of students more likely to plagiarise with one another (Chang et al., 2010; Merlo et al., 2010; Ryymin et al., 2008). The ability to cluster students based on the intensity of their collaborations in a course has also been distinguished as a way to prevent fraud. Detecting cooperation between students is one of the key applications of SNA in learning analytics (De Smedt et al., 2017).
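The brokerage notion can be made concrete with a few lines of networkx code. The school, the node names and the tie structure below are invented for illustration; betweenness centrality is one standard SNA metric for locating brokers, since a broker lies on most of the shortest paths between sub-units.

```python
# Toy school network: two internally well-connected sub-units (teachers
# and administrators) that only communicate through the secretary.
import networkx as nx

G = nx.Graph()
teachers = ["t1", "t2", "t3"]
admin = ["a1", "a2", "a3"]
G.add_edges_from([(u, v) for u in teachers for v in teachers if u < v])
G.add_edges_from([(u, v) for u in admin for v in admin if u < v])
G.add_edges_from([("secretary", "t1"), ("secretary", "a1")])

# The broker has the highest betweenness centrality: every shortest path
# between the two sub-units passes through them.
centrality = nx.betweenness_centrality(G)
broker = max(centrality, key=centrality.get)
print(broker)  # 'secretary'
```

Removing the broker node would disconnect the two sub-units entirely, which is exactly the kind of information-flow bottleneck the mapping is meant to expose.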

3.3 Natural Language Processing

Natural Language Processing (NLP) refers to the ability of computing machines to communicate in human languages (Smith et al., 2020). NLP applications can be achieved with relatively simple sets of rules or heuristics (e.g. word counts, word matching), that is, without applying cutting-edge machine learning techniques (Smith et al., 2020). When NLP does rely on machine learning techniques, it is better able to understand context and reveal hidden meanings in communications (e.g. irony) (Smith et al., 2020).

In education, the use of NLP has been shown to improve students’ learning outcomes (Whitelock et al., 2015a) and to promote student engagement. Moreover, NLP systems have the potential to provide one-on-one tutoring and personalized study material (Litman, 2016). The automatic grading of complex assignments is a valuable feature of NLP models in education. These may eventually become a cost-effective solution that facilitates the evaluation of deeper learning skills than those evaluated through answers to multiple-choice questions (Smith et al., 2020). By efficiently adjusting the evaluation of knowledge to the learning outcomes desired by stakeholders, NLP can contribute to educational performance. External and open data sets have allowed NLP solutions to achieve better accuracy in tasks such as grading. Such data sets can situate words within commonly invoked themes or contexts, for example, allowing the NLP model to make a more nuanced analysis of language data (Smith et al., 2020). Access to rich language data sets and algorithmic improvements may even allow NLP solutions to produce course assessment material automatically (Litman, 2016). However, an open issue with machine learning implementations of NLP is that the features used in grading by the computer may not provide useful feedback to the student or the teacher (e.g. by basing the grade on word counts) (Litman, 2016). Reasonable feedback may still require human input.
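To make the heuristic end of this spectrum concrete, the sketch below implements keyword-matching grading in plain Python. The rubric and the student answer are invented; real rule-based graders also handle synonyms, stemming and word order, and the feedback problem noted above applies in full: a bare keyword score explains very little to the learner.

```python
# Minimal rule-based grading heuristic: score an answer by the fraction
# of rubric keywords it contains.
def keyword_score(answer: str, rubric_keywords: set) -> float:
    """Return the fraction of rubric keywords present in the answer."""
    words = {w.strip(".,;:!?").lower() for w in answer.split()}
    return len(words & rubric_keywords) / len(rubric_keywords)

rubric = {"photosynthesis", "sunlight", "chlorophyll", "glucose"}
answer = "Plants use sunlight and chlorophyll to make glucose."
print(keyword_score(answer, rubric))  # 0.75
```

The heuristic is transparent but easily gamed: an answer that simply lists the four keywords scores 1.0, which is one reason the text warns that students may tailor solutions to the letter rather than the spirit of the assessment.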

4 Issues and Recommendations

Despite the outlined benefits and contributions of learning analytics, there are still some issues and limitations. A clear distinction can be made between the technical and non-technical issues of learning analytics (De Wit & Broucker, 2017). On the technical side, there are issues related to platform and analytics implementations, data warehousing, device networking, etc. With regard to the non-technical issues, there are concerns over public acceptance of and involvement in learning analytics, private and public regulations, human resources acquisition and the enthusiasm of stakeholders as to the technical potential of learning analytics. We summarize these challenges and propose a nuanced policy pathway to learning analytics implementation and promotion.

4.1 Non-technical Issues

Few learning analytics papers mention the ethical and legal issues linked to the applications of their recommendations (Viberg et al., 2018). Clearly, developments in learning analytics contribute to and benefit from the expansion of behavioural data collection. The spread and depth of data collection are generating new controversies around data privacy and security. These have an important place in public discourse and, if mishandled by stakeholders, could contribute to further limiting the potential of data availability and computational power in learning analytics and similar disciplines (Langedijk et al., 2019). Scientists are currently complaining about the restrictions put upon their research by rules and accountability procedures. Such rules curtail data-driven enterprises and may be detrimental to improvements in learning outcomes (Groot & van den Brink, 2017). To facilitate collaboration with decision-makers, it is important that the administrative procedures related to learning analytics be seen by researchers as contributing to a healthy professional environment (Groot & van den Brink, 2017).

Additionally, public accountability and policies promoting organizational transparency may be a proper counter-balance to privacy concerns among citizens (e Silva & Camanho, 2017). The transparency and accessibility of information, by making relevant educational data sets public, for example, can involve citizens in the knowledge discovery related to education and foster enthusiasm for data-driven inference in that domain (De Smedt et al., 2017). It is also important that the concerned parties, including civil society, are interested in applying data-driven decision-making (Agasisti et al., 2017). It can be difficult to convince leaders in education to shift to data-driven policies since, for them, “experience and gut-instinct have a stronger pull” (Long & Siemens, 2011).

Just as necessary as political commitment, the acquisition of a skilled workforce is another sizeable non-technical issue (Agasisti et al., 2017). The growth of data-driven decision-making has yielded an increase in the demand for highly educated workers while reducing the employment of unskilled workers (Groot & van den Brink, 2017). In other words, there is a gap between the growing availability of large, complex data sets and the pool of human resources necessary to clean and analyse those data (De Smedt et al., 2017). This invokes a problem shared across the computational social sciences: the double requirement of technical and analytical skills. Often, domain-specific knowledge is also an unavoidable component of useful policy insights (De Smedt et al., 2017). That multiplicity of professional requirements has led some authors to describe the desirable modern data analyst as a scholar-practitioner (Streitwieser & Ogden, 2016).

4.2 Technical Issues

Many technical problems must be tackled before data-driven educational policies become a gold standard. Generally, there is a need for additional research regarding the effects of online educational software and of digital data collection pipelines on student and teacher outcomes. Additionally, inequalities in access to online education and its usage are an ever-present challenge (Jacob et al., 2016; Robinson et al., 2015).

There is still relatively little evidence indicating that learning analytics improves the learning outcomes of students (Alpert et al., 2016; Bettinger et al., 2017; Jacob et al., 2016; Viberg et al., 2018). For example, less sophisticated correction algorithms may be exploited by students, who will tailor their solutions to obtain maximal scores without acquiring the desired knowledge (De Wit & Broucker, 2017). This is a question of adjustment between the spirit and the letter of the learning process.

Additionally, although the combination of administrative and streamed data is in many ways advantageous compared to survey data (Langedijk et al., 2019), the fast collection and analysis of data create issues of data accuracy. With real-time data analyses and reorientations of the learning process, accessible computing power becomes an issue.

Meanwhile, unequal access to online resources and devices plainly excludes a section of the student and teacher population from the reach of the digital tools of education. In part, this creates issues of under-representation in educational studies that increasingly rely on data obtained online (Robinson et al., 2015). It also creates a divide between those stakeholders who can make an informed choice between using and developing digital tools and face-to-face education, and those who cannot access digital tools or for whom digital education has a prohibitive cost (Bettinger et al., 2017; Di Pietro et al., 2020; Robinson et al., 2015).

Lack of access to digital or hybrid learning tools (i.e. a mix of face-to-face and digital education) may directly impede the learning and well-being of students. Indeed, students with access to online and hybrid education can access resources independently to enhance their educational pathway (Di Pietro et al., 2020). In a sense, a larger range of choices makes better educational outcomes attainable. For example, students at a school within a neighbourhood of low socio-economic standing may access a diverse network of students and teachers on electronic platforms (Jacob et al., 2016). In times of crisis, such as during the COVID-19 school lockdowns, ready access to online educational platforms also reduces the opportunity cost of education (Chakraborty et al., 2021; Di Pietro et al., 2020).

However, access is not a purely technical challenge. There are also noted gaps between populations in the usage made of educational platforms and internet resources more generally (Di Pietro et al., 2020; Jacob et al., 2016). Students participating in MOOCs, for example, are overwhelmingly highly educated professionals (Zafras et al., 2020). Online education may also leave more discretion to students. This discretion has proven to be a disadvantage to those who perform less well and are less motivated in face-to-face classes (Di Pietro et al., 2020).

4.3 Recommendations

Data-driven policies will require vast investments in information technology systems towards both data centres and highly skilled human resources. Therefore, additional data warehouses need to be built and maintained. Those require strong engineering capabilities (De Smedt et al., 2017). The integration of teaching and peer collaborations within computer systems promises to accelerate innovations in education. One can imagine that, in the future, administrative and real-time learning data will be updated and analysed in real time. The analyses will also benefit from combining data from other areas of interest such as health or finance. Additionally, the reach of analytics programs could be international, allowing for the shared integration and advancement of knowledge systems across countries (Langedijk et al., 2019).

Although data-driven policies and educational tools have large practical potential, it is important that an educational data strategy not be developed for its own sake. Contrary to what some big data enthusiasts have claimed, the data does not “speak for itself” in education (Anderson, 2008). The teachers, administrators and policy-makers who are working to better educate our children will still face complicated dilemmas appealing to their professional expertise, regardless of the level of integration of data analytics in education.

Furthermore, to ensure political willingness, it is critical that work teams and stakeholders profit from the collected and analysed data (De Smedt et al., 2017). This contributes to the transparency of data use. Finally, although the evidence is still quite thin regarding the benefits of learning analytics, it must be noted that only a small number of validated instruments are actually being used to measure the quality and transmission of knowledge through learning platforms (Jivet et al., 2018).

Despite this scarcity of evidence pertaining to education, the exploitation of data through learning analytics can be linked to the recognized advantages of big data in driving public policy. Namely, it can facilitate a differentiation of services, increased decisional transparency, needs identification and organizational efficiency (Broucker, 2016). Generally, the lack of available data backing a decision is an indication of a lack of information and, thus, sub-optimal decision-making (Broucker, 2016).

Policies can be better implemented through quick and vast access to information about students and other educational stakeholders. In other words, the needs of students and other educational stakeholders can be more efficiently satisfied with evidence obtained from data collection (e.g. lower cost, higher speed of implementation). Such evidence-based education is a rational response to the so-called fetishization of change that has been plaguing educational reforms (Furedi, 2010; Groot & van den Brink, 2017).

It follows that data analytics should not become a new object for the fetishization of change in educational reforms. Indeed, quantitative goals (e.g. the number of sensors in a classroom) should not be conflated with educational attainment (Long & Siemens, 2011; Mandl et al., 2008). Rather, data analytics should be developed and motivated as an approach that ensures opportunities to use data in order to sustain mutually agreeable educational objectives.

These objectives may pertain to the lifetime health, job satisfaction, time allocation and creativity of current students (Oreopoulos & Salvanes, 2011). In other words, learning analytics pipelines must be carefully implemented in order to ensure that they are a rational response to contemporary challenges in education.