As mentioned in the introduction, conducting experimental studies in learning technology and CCI research entails an iterative process of observation, rationalization, and validation (see Fig. 6.1). More detailed processes with additional steps, such as conducting a literature review, have been proposed (e.g., Ross & Morrison, 2013). Nevertheless, no matter how detailed a description we have, determining, conducting, and reporting the data analysis is fundamental. Although data analysis procedures vary widely in complexity, selection of the appropriate analysis is usually based on two aspects: the RQs/hypotheses and the type of data involved. To clarify the process, Fig. 6.1 shows the steps typically needed to determine and conduct the data analysis.

Fig. 6.1
A diagram of the step-by-step process for performing data analysis: first initiate, then check for anomalies, run the respective tests, and finally calculate effect sizes and other metrics for reporting.

Typical process for determining and conducting data analysis in learning technology and CCI studies

6.1 Data Collection

There are different ways in which researchers collect data. Whether the research design is qualitative, quantitative, or mixed, researchers need to collect data that will support the rationalization of the study (e.g., respond to the hypotheses or the RQs). In particular, in human-factors IT-related fields, we usually see different quantitative (e.g., log files/analytics, questionnaire data, sensor data) and/or qualitative (e.g., interviews, field notes) data collections taking place. Although it is possible to follow some of the principles described in this section with qualitative data as well, through different forms of data quantification (e.g., annotations, text mining, expert analysis), most of the practices described here concern quantitative data collections. In several cases those data collections are associated with specific measurements (which are, in turn, associated with the RQs): in some data collections the measurements are predefined (e.g., questionnaire data, some log files), in others they are post-computed (e.g., from sensor data), and in yet others there are no measurements at all (this is common in qualitative research studies). In this section, we present some example data collections that are relevant for a learning technology and CCI researcher.

Questionnaire Data (Also Known as Survey Data)

The use of questionnaires (also called surveys) has a long history in both HCI and learning technology research. The goal is to understand users'/learners' attitudes and perceptions toward an artifact and/or a procedure. Questionnaires also allow us to gather information about users' backgrounds (e.g., habits, technology use), demographics, and awareness. Questionnaires have been used for many years across different fields, such as social psychology, behavioral research, and marketing, and can be put into practice in pen-and-paper form or as part of the system (e.g., integrated questionnaires). Several standardized questionnaires have been developed to gather information about a system's perceived usability (e.g., the System Usability Scale (SUS) (Brooke, 1996) and the Computer System Usability Questionnaire (CSUQ) (Lewis, 1995)), users' perceived effort (e.g., the NASA Task Load Index (NASA-TLX) (Hart & Staveland, 1988)), and users' attitudes and perceptions (e.g., perceived usefulness and perceived ease of use (Davis, 1989)). Questionnaires are a direct means of measuring users' perceived experience, such as satisfaction, enjoyment, and ease of use, with many of these constructs having a high level of standardization in HCI research (e.g., satisfaction is part of ISO 9241). In the same vein, questionnaires are systematically used to assess the learning experience, and several questionnaire instruments have been developed and widely used in the past (e.g., to evaluate a learning system or different aspects of the learning design) (see Kay & Knaack, 2009; Henrie et al., 2015). Questionnaires are a commonly accepted measure of users' and learners' (perceived) experience, and despite some criticism (e.g., overuse of or overreliance on questionnaires), they will probably continue to be a valid approach for externalizing and quantifying users' perceived experience. Figure 6.2 shows two standard questionnaires for measuring a system's usability (left) and users' mental effort (right).

Fig. 6.2
The left panel depicts a list of ten questions for measuring system usability, with scores ranging from 1 through 5. The right panel depicts the interpretation of the NASA Task Load Index scores, from very low to very high.

Standard questionnaires for measuring a system's usability (SUS; Brooke, 1996) (left; image from Klug, 2017, licensed under CC BY-ND 3.0) and mental effort in a task (NASA-TLX; Hart & Staveland, 1988) (right; image from https://commons.wikimedia.org/wiki/File:NasaTLX.png)
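As an illustration of how such standardized instruments are scored, the sketch below computes a SUS score from ten responses on the 1–5 scale, following Brooke's (1996) standard scoring rule (odd-numbered, positively worded items contribute response − 1; even-numbered, negatively worded items contribute 5 − response; the sum is multiplied by 2.5). The example responses are hypothetical.

```python
def sus_score(responses):
    """Compute the System Usability Scale score (0-100) from ten 1-5 responses."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten responses on a 1-5 scale")
    contributions = [
        (r - 1) if i % 2 == 0 else (5 - r)  # 0-based index: even index = odd-numbered item
        for i, r in enumerate(responses)
    ]
    return sum(contributions) * 2.5

# Hypothetical responses from one participant
print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # 85.0
```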

In this book we are not going to have a deep discussion of the role of questionnaires in IT-related research; such a discussion can be found in Müller et al. (2014) and Groves et al. (2011). However, we will briefly discuss how questionnaires can help us collect useful data and what the most common measurements in learning technology and HCI research are. The most common conceptual constructs (measurements) are multi-item (multi-question), meaning that several similar questions are used to construct the measurement of the construct. In most cases they are measured using Likert scales (five- or seven-point scales are the most common), and the wording of the scales can be configured to match the question.Footnote 1 Although no strict requirements exist, in large-scale studies (usually survey studies) we see expectations of ten respondents per item (question); in experimental designs we see studies with fewer respondents per item. However, researchers need to be considerate of the “ecology” of the measurements (e.g., use a manageable number of questions that allows the user to understand, reflect, and respond). Beyond the care researchers need to take during the research design of a study, there are also procedures for assessing the convergent validity of the questionnaire measurements used in a study. For instance, Fornell and Larcker (1981) proposed the following three routines: composite reliability of each measurement, usually a Cronbach's α above 0.7; item/question reliability of the measure, where a factor loading of 0.7 or above for each question (with no cross-loadings) is usually a good indicator; and the Average Variance Extracted (AVE) of the measure, which is usually expected to equal or exceed 0.50. In the following table (Table 6.1) we provide some examples of measurements commonly used in learning technology and HCI. Those items are properly contextualized and provided as options to a general question such as “Please indicate how much you agree or disagree with the following statements based on your experience with [the artifact]:”. In place of [the artifact], researchers can use the artifact of interest (e.g., the XYZ mobile application, the avatar, the dashboard).

Table 6.1 Examples of measurements, their description, their questions and the respective reference
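To make the convergent-validity checks above more concrete, the following sketch shows how Cronbach's α and the AVE could be computed for a single multi-item construct. It is a minimal illustration: the three-item “perceived usefulness” responses and the standardized factor loadings are hypothetical, and in practice the loadings would come from a factor analysis or a structural equation model before being compared against the 0.70 and 0.50 thresholds mentioned above.

```python
import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a set of Likert items (columns) measuring one construct."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def average_variance_extracted(loadings: np.ndarray) -> float:
    """AVE = mean of the squared standardized factor loadings of a construct's items."""
    return float(np.mean(np.square(loadings)))

# Hypothetical 5-point Likert responses for a three-item "perceived usefulness" scale
responses = pd.DataFrame({
    "PU1": [4, 5, 3, 4, 5, 4, 2, 5],
    "PU2": [4, 4, 3, 5, 5, 4, 3, 5],
    "PU3": [5, 5, 2, 4, 4, 4, 3, 4],
})

alpha = cronbach_alpha(responses)                               # expect >= 0.70
ave = average_variance_extracted(np.array([0.82, 0.78, 0.75]))  # expect >= 0.50
print(f"Cronbach's alpha = {alpha:.2f}, AVE = {ave:.2f}")
```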

Analytics (Also Known as User Logs)

In the fourth chapter we discussed the user traces that are left behind when users interact with technologies, and the implications those traces have for learning technology and CCI research. Those traces produce a wide range of insights, including users' response time, response correctness, number of attempts to solve a problem, time spent interacting with learning resources, navigation across various learning resources, activity on the various communication functionalities (e.g., forums), and other learning trace data. Besides the ways systems can develop intelligence by leveraging these data, such data can also be used to enrich measurements when conducting experimental studies. As we discussed in the fourth chapter, tracking logs are powerful (you can see examples from edX MOOCs hereFootnote 2) and can help us infer useful measurements; see also services that host and provide access to learning interaction data, such as the Pittsburgh Science of Learning Center's DataShop (https://pslcdatashop.web.cmu.edu/). Although a perfect one-to-one relationship between “measurements” and “conceptual constructs” is practically impossible, very close relationships (i.e., analytics that capture the target construct to a great extent) do exist and are heavily used in CCI and learning technology research (e.g., learning performance defined as the scores of the user in the assessment tasks). This allows us to capture those useful measurements directly (e.g., via the log files). Although such measurements can be post-computed from the tracking logs of the technology and the respective database schema, it is also possible, and significantly more practical, to “architect” analytics when designing and developing the technology. By architecting the analytics, you can develop relational database schemas that organize the data with respect to your needs and meaningful measurements (e.g., see Pardos et al., 2016). Architecting analytics is also powerful when you have to work with learning ecosystems, where analytics across systems need to be captured and made sense of (Mangaroska et al., 2021). The use of analytics as measurements during experimentation is an interesting and complex topic. The goal of this book is not to go deep into this topic, but to provide some examples of commonly used analytics-based measurements in the context of learning technology and CCI (see Table 6.2). The selection of those measurements needs to take into consideration the context of the study and the technology, and to be relevant to the intended RQ.

Table 6.2 Examples of analytics-based measurements, their description, how they are usually computed and an example of their use from the literature
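To illustrate how analytics-based measurements can be post-computed from tracking logs, the sketch below derives two common ones, number of attempts with success rate and a rough time-on-task, from a toy clickstream table. The event names, columns, and values are hypothetical and are not tied to any particular platform's logging schema.

```python
import pandas as pd

# Hypothetical clickstream export: one row per logged event
events = pd.DataFrame({
    "user_id":   ["u1", "u1", "u1", "u2", "u2"],
    "event":     ["problem_attempt", "problem_attempt", "video_play",
                  "problem_attempt", "video_play"],
    "correct":   [0, 1, None, 1, None],
    "timestamp": pd.to_datetime([
        "2023-03-01 10:00", "2023-03-01 10:04", "2023-03-01 10:10",
        "2023-03-01 11:00", "2023-03-01 11:02"]),
})

# Number of attempts and success rate per learner
attempts = events[events["event"] == "problem_attempt"]
per_user = attempts.groupby("user_id")["correct"].agg(
    attempts="count", success_rate="mean")

# Rough time-on-task: span between a learner's first and last logged event (minutes)
time_on_task = events.groupby("user_id")["timestamp"].agg(
    lambda t: (t.max() - t.min()).total_seconds() / 60)

print(per_user)
print(time_on_task.rename("minutes_active"))
```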

Sensor-Based Analytics (Sensor Data)

Advances in sensors, social signal processing, and computational analyses have demonstrated the potential to help us understand user and learning processes that were previously either impossible to capture or “too complex” for traditional analytics. For example, psychomotor learning with physical objects requires high-frequency data, and the respective analyses can now happen within a reasonable time window (Sharma & Giannakos, 2020). Due to the need to combine different expertise (e.g., learning scientists, data scientists, computer scientists), the collection, analysis, and interpretation of sensor data in CCI and learning contexts have been a challenging endeavor. Nevertheless, over the last years the Multimodal Learning Analytics (MMLA) research community has managed to gather diverse research expertise (e.g., educational, computational, psychological) and has contributed rich measurements with respect to HCI and learning. A perfect one-to-one relationship between sensor-based measures and conceptual constructs does not exist (Giannakos et al., 2022); however, MMLA research is achieving acceptable levels of reliability and validity, allowing us to use measurements that provide useful insights (e.g., from eye activity, facial expressions, or users' motions and gestures). Table 6.3 depicts some examples of commonly used sensor-based measurements in the context of learning technology and CCI. Once again, the selection of those measurements needs to take into consideration the context of the study and the technology, and to be relevant to the intended RQ. Moreover, researchers also need to consider the level of intrusiveness (the extent to which a measurement is ecologically valid, e.g., does not interfere with the task or impose obtrusive conditions). In different sub-domains of learning technology and HCI, we see researchers coining measurements that align with the objectives of those sub-domains. For example, in the context of Computer-Supported Collaborative Learning (CSCL) research, we find researchers using a measurement called Joint Visual Attention (JVA) (i.e., the moments when more than one user looks at the same area) or “with-me-ness” (i.e., the moments when the learner is looking at the content delivered by the teacher, e.g., how much the learner follows the teacher). Although those measurements are not as general or widely used as the ones we identify in Table 6.3, they are very important for the challenges of this particular sub-domain (Sharma et al., 2014, 2017).

Table 6.3 Examples of sensor-based measurements, their description, how they are usually computed and an example of their use from the literature
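As a concrete example of a sensor-based measurement, the sketch below computes a simple JVA proportion from synchronized gaze samples of two collaborating users: the share of samples in which their gaze points fall within a fixed radius of each other. This is a minimal illustration; the coordinates, the 100-pixel radius, and the assumption of an already aligned timeline are simplifications of how JVA is operationalized in the literature.

```python
import numpy as np

def joint_visual_attention(gaze_a, gaze_b, radius_px=100):
    """Fraction of synchronized gaze samples in which two users look at
    (roughly) the same screen area, a simple proxy for JVA.

    gaze_a, gaze_b: sequences of (x, y) gaze coordinates recorded on a
    shared timeline (the values used below are illustrative).
    """
    gaze_a = np.asarray(gaze_a, dtype=float)
    gaze_b = np.asarray(gaze_b, dtype=float)
    distances = np.linalg.norm(gaze_a - gaze_b, axis=1)
    return float(np.mean(distances <= radius_px))

# Hypothetical gaze samples (in pixels) for two collaborating learners
a = [(200, 300), (210, 310), (600, 400), (610, 395)]
b = [(205, 295), (500, 100), (605, 410), (615, 400)]
print(f"JVA proportion: {joint_visual_attention(a, b):.2f}")  # 0.75
```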

Pictorial Self-Report Data

Traditional verbal questionnaires assume that respondents are able to fully grasp a question and think abstractly about their experience. However, several populations (e.g., children younger than 12) have not yet developed these skills, or are in conditions that do not allow them to respond to those instruments in a valid manner (e.g., a user who has dyslexia or is very tired from the main task); instead, their thinking processes are based on mental representations that relate to concrete events, objects, or experiences. This must be taken into account when adapting the measurement method to meet participants' needs. Following this line of reasoning and related work in child development and psychology (Harter & Pike, 1984), there is a number of instruments that use visual methods (or observations and qualitative, checklist-based measurements), which we know are more effective than verbal methods (Döring et al., 2010). Such visual analogs represent specific situations, behaviors, and people to whom a user can easily relate.

Such visual analogs are usually employed to collect data during the evaluation of an artifact (as well as during the lifetime of an application). We have seen pictorial questionnaires pop up while we are using an application or at the end of an activity (e.g., after we try a resource that has been recommended to us). Similarly to verbal questionnaires, pictorial questionnaires are used to quantify users' perceived experience, such as satisfaction, enjoyment, ease of use, and the like. Pictorial questionnaires usually do not follow the multi-item (multi-question) paradigm of the verbal ones (so their validity is not always assessed); however, it is easier to employ pictorial questionnaires “on the spot” and capture the temporal experience of the users. Moreover, due to their usually short reading time, it is also easy to employ them either at selected critical moments (e.g., when the user finishes a task) or at random moments during the activity, so that we can get repeated measurements (Fig. 6.3).

Fig. 6.3
An illustration depicts six smiley-style reactions, namely upvote, funny, love, surprised, angry, and sad. Three different example questions are given, and the user is asked to answer each of them.

Three examples of pictorial surveys used to evaluate users’ experience
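Because pictorial items are quick to answer, they lend themselves to this kind of repeated, in-the-moment sampling. The sketch below shows one way such prompts could be scheduled at random moments within a session; it is a hypothetical experience-sampling helper, and the session length, number of prompts, and minimum gap are illustrative parameters rather than a standard protocol.

```python
import random

def schedule_pictorial_prompts(session_minutes, n_prompts,
                               min_gap_minutes=5, max_tries=1000):
    """Draw random moments (minutes from session start) at which to show a
    short pictorial questionnaire, keeping a minimum gap between prompts."""
    for _ in range(max_tries):
        times = sorted(random.uniform(0, session_minutes) for _ in range(n_prompts))
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        if all(g >= min_gap_minutes for g in gaps):
            return [round(t, 1) for t in times]
    raise ValueError("Could not satisfy the minimum gap; relax the parameters")

# e.g., three prompts within a 45-minute session, at least 5 minutes apart
print(schedule_pictorial_prompts(session_minutes=45, n_prompts=3))
```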

Pictorial questionnaires are not meant to substitute for verbal questionnaires; the two types of self-reporting instruments have been designed to address different research needs. Verbal questionnaires can use the specificity of verbal communication to extract exact information, and the widely used measurements have been extensively validated and standardized. Pictorial questionnaires are used when “verbal communication becomes a challenge” and have the benefits of not increasing users' cognitive load and overall burden, and of reducing the time to complete. As with verbal questionnaires, pictorial questionnaires should be properly contextualized and are sometimes complemented with minimal text such as “what do you think about [the artifact]:”. In place of [the artifact], researchers can use the artifact of interest (e.g., the XYZ mobile application, the avatar, the dashboard). Nevertheless, pictorial questionnaires should be self-standing, even if the user cannot read the provided text; depending on the end-user, researchers sometimes need to use oral communication to explain what aspects the end-user is asked to rate with the visual analogs. As with verbal questionnaires, pictorial questionnaires can be used both in pen-and-paper and in digital form; however, some of the advantages of digitally administering pictorial questionnaires to assess software (e.g., temporality, low overall burden) might be lost or weakened on paper. Table 6.4 depicts some examples of commonly used pictorial questionnaire measurements in the context of learning technology and CCI.

Table 6.4 Examples of pictorial survey measurements, their description, how they look and the respective reference

6.2 Data Analysis

To make the process clearer and to provide additional resources, Table 6.5 summarizes the most common data analysis procedures used in learning technology and CCI research. Let us now think of a simple between-subjects design with one control group (e.g., no use of dashboard in the LMS) and one experimental group (e.g., a simple dashboard that provides students’ previous test scores), with students’ weekly test scores as the dependent variable. In this case, a t-test for independent samples is needed (provided that parametric assumptions are met) to test the hypothesis that introducing a simple dashboard affects students’ learning performance. Adding a second experimental group (i.e., a third treatment group) with a dashboard that not only provides but also visualizes students’ scores will require a different analysis. In that case, we will need a one-way analysis of variance (ANOVA) (provided that parametric assumptions are met) to compare the three means; if the results of the ANOVA are significant, we can conduct a follow-up Tukey or REGWQ post-hoc comparison of means to find the pairwise differences. Learning technology and CCI researchers do not have to be data analysts or statisticians, but it is important to provide clear RQs and hypotheses and to follow a few basic rules and guidelines during the data analysis. Clearly formulated RQs will also make it possible to work with data analysts or statisticians if more sophisticated analyses are required that go beyond the scope of this book.

Table 6.5 Comparison of data analysis procedures in learning technology and CCI research
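For the dashboard example above, a minimal analysis sketch could look as follows. The score distributions are simulated placeholders with arbitrary group means; with real data one would first check the parametric assumptions (e.g., normality, homogeneity of variance), as Fig. 6.1 suggests, before running these tests and reporting the effect sizes.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(42)

# Simulated weekly test scores (0-100) for the three hypothetical conditions
control    = rng.normal(70, 10, 30)   # no dashboard
simple     = rng.normal(74, 10, 30)   # dashboard showing previous scores
visualized = rng.normal(78, 10, 30)   # dashboard visualizing previous scores

# Two groups: independent-samples t-test plus Cohen's d as the effect size
t, p = stats.ttest_ind(control, simple)
d = (simple.mean() - control.mean()) / np.sqrt(
    (control.var(ddof=1) + simple.var(ddof=1)) / 2)
print(f"t-test: t = {t:.2f}, p = {p:.3f}, Cohen's d = {d:.2f}")

# Three groups: one-way ANOVA, followed by Tukey HSD post-hoc comparisons if significant
f, p_anova = stats.f_oneway(control, simple, visualized)
print(f"ANOVA: F = {f:.2f}, p = {p_anova:.3f}")
if p_anova < 0.05:
    scores = np.concatenate([control, simple, visualized])
    groups = ["control"] * 30 + ["simple"] * 30 + ["visualized"] * 30
    print(pairwise_tukeyhsd(scores, groups))
```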

Most of the studies in learning technology and CCI employ null hypothesis significance testing (NHST)Footnote 3 approaches and analyze data using the variance-based methods we present in Table 6.5. Despite the usefulness of variance-based approaches, we have seen an increasing need for new methods, as well as for combinations of different methods and approaches, that can reduce biases and help us obtain a more holistic understanding of the phenomenon. Examples of such methods that we see being increasingly used in HCI and learning technology/analytics are Bayesian methods (Robertson & Kaptein, 2016), fuzzy-set qualitative comparative analysis (fsQCA, or simpler versions of it such as QCA) (Pappas et al., 2019; Papamitsiou et al., 2018), process mining (Sharma et al., forthcoming), Hidden Markov Models (Sharma et al., 2020), and different machine learning methods (Kidziński et al., 2016).

As mentioned above, to support learning technology/CCI researchers, we provide a comprehensive how-to guide that allows them to choose between the various analyses by looking at the data types, the function of each analysis, working examples, the main conditions and assumptions, and resources for step-by-step implementation. Novice researchers should be aware that, in order to explore causal (cause-and-effect) relationships on the basis of experimental designs that compare outcomes associated with treatments, it is necessary to use tests that test for causal effects (e.g., t-tests or ANOVAs) rather than correlational tests (e.g., Pearson correlations).