Introduction

Statistics is a way to cope with variability in the world and offers strong tools, concepts and language to understand and describe such variability. The power of statistics lies in the abstraction from the individual values of the data collected to an aggregate view of how the data is distributed, both when the task involves describing the data and when the data is utilised for inference (e.g. Wild, 2006). When learning statistics, it is essential to develop a conceptual understanding of a data set, perceiving it not merely as a collection of individual values, but rather as a unified entity that can reveal important aspects of the variation in the data. According to Bakker & Gravemeijer (2004), a conceptual understanding of distribution is a prerequisite for developing the ability to choose appropriate statistical measures. Nevertheless, it is difficult for middle-grade students to expand their views on data from the measurement value of an object to understanding data distribution. According to Garfield et al. (2008), the tasks students encounter in textbooks do not support their understanding of the notion of distribution. Rather, the tasks ask students to ‘look at a histogram or stem plot and describe the shape, center, and spread …’ (Garfield et al., 2008, p. 186). Consequently, many students do not understand that the distribution of data is an entity. As noted by Bakker & Hoffmann (2005), ‘An essential characteristic of statistics is that it can predict properties of aggregates, but not of individual values. Therefore, if students cannot see a data set as a whole, they miss the essential point of doing statistics.’ (p. 334).

Digital tools can support students in developing an understanding of key concepts in statistics education, including the idea of distribution. Digital technology can promote students’ development of a conceptual understanding of such powerful statistical concepts, tools and language. In the field of statistics education research, the potential of digital tools has been widely explored. Biehler et al. (2013) presented an overview of statistical tools for education at all levels, from the primary to the tertiary level. Different tools serve different purposes and offer different affordances. Some tools have been developed for professional statisticians, while others have been developed for educational purposes, e.g. Minitool (Cobb et al., 1997), TinkerPlots (Konold & Miller, 2011) or CODAP (Finzer & Damelin, 2014).

What they have in common is what Biehler et al. (2013) called fast modes of transportation. According to this metaphor, statistical reasoning is characterised as movements between looking at data as individual values and looking at the patterns and trends in the data. Instead of time being spent on, for instance, calculating means manually or drawing graphs, the power of the digital tools frees up cognitive space to focus on the more fundamental ideas. This is equivalent to the idea of lever potential, which Winsløw (2003) and Dreyfus (1994) investigated in the context of computer algebra systems. Lever potential involves allowing students to work on a conceptual level while leaving lower-level operations to the computer. Few studies have explicitly investigated how digital tools support students’ development of statistical concepts. Bakker & Hoffmann (2005) followed the development of students’ diagrammatic reasoning while working with Minitool. Ben-Zvi & Ben-Arush (2014) studied students’ instrumentation of certain features in TinkerPlots.

This current study, however, draws on Vergnaud’s (1998, 2017) theory of conceptual fields. Vergnaud presented an extensive conceptualisation of what constitutes a mental scheme. This study follows the case of Frida, a 12–13-year-old Danish middle-school student, as she construes the concept of statistical distribution while interacting with the digital tool TinkerPlots. Specifically, the study involves an in-depth investigation of how the digital experiences shape her mental understanding of data distribution in terms of Vergnaud’s conceptual notions. The richness of this theoretical approach not only relates to the development of Frida’s understanding of the concept, but also to how, over time, the digital tool shapes her personal goals and anticipations while she works on statistics tasks. In particular, the analysis of her personal goals and anticipations helps to extend the understanding of how digital experiences shape and challenge students’ conceptual understandings.

The study seeks to answer the following research question:

  • How can interaction with the digital tool TinkerPlots shape a student’s conceptual understanding of data distribution, and how does the student’s tool-shaped scheme(s) challenge or support further conceptual development of this understanding as she progresses to new situations?

This research involves a case study built on transcripts from, in part, screencast recordings of Frida’s actions with TinkerPlots, while working on two different tasks, as well as a transcript from an artefact interview conducted after one of the teaching sequences. Vergnaud’s (1998) scheme notion from the theory of conceptual fields is the analytical frame for analysing the case of Frida. The findings and potential implications for statistical teaching practice in a digital learning environment are also discussed.

The following section introduces the potential of digital tools in statistics teaching. First, the idea of distribution is presented along with why this concept is important as an educational aim. Second, research on the potential of including digital tools in this process is discussed. Next, the method section clarifies the boundaries of the specific case of Frida. The analysis focuses on three excerpts, which extend over a time span of one year from three different teaching sequences. Finally, the findings of the analysis are discussed along with their educational implications.

The Role of Digital Tools in Students’ Conceptualisation of Data Distribution

The Complexity of Making Sense of Data Distribution

In the context of statistics, distribution is a meta-concept with several embedded statistical sub-concepts, such as variability, shape and centres (e.g. Burrill & Biehler, 2011). Wild (2006) explained the idea of distribution, from a statistician’s point of view. He emphasised that data is not interesting as a collection of individual values; rather, it is the patterns that can be seen when freed from distracting details:

When case labels are set aside individuals with identical values for the variables of interest become indistinguishable so that, without any loss of information, we can reduce the data to a set of distinct values and their corresponding frequencies, that is, to a frequency distribution. (p. 11)
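The reduction Wild describes can be illustrated with a minimal sketch. The ages below are hypothetical and chosen only for illustration; the variable names are likewise assumptions and do not come from the study.

```python
from collections import Counter

# Hypothetical ages; the case labels (whose age each value is) are already set aside.
ages = [28, 30, 26, 30, 31, 28, 30, 26]

# Reduce the data to its distinct values and their corresponding frequencies.
frequency_distribution = dict(sorted(Counter(ages).items()))
print(frequency_distribution)

# The unlabelled values can be rebuilt from the frequencies,
# which is the sense in which no information is lost.
rebuilt = sorted(value for value, n in frequency_distribution.items() for _ in range(n))
print(rebuilt == sorted(ages))
```

The sketch only shows the bookkeeping; seeing the resulting frequencies as one entity with a shape is the conceptual step discussed in this section.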

Statistical reasoning can be described as the shift from seeing data as individual values to seeing data as a distribution (Bakker & Gravemeijer, 2004). Bakker and Gravemeijer investigated seventh-grade students’ learning of several aspects of distribution, developing a three-layer model. The model describes statistical reasoning as movement between the layers. The top layer includes the distribution as an entity, while the middle layer contains terminology and concepts describing individual features of the distribution. This could be means, modes or medians, or it could be a way to describe spread (e.g. standard deviation). It could also be a way to describe the skewness and density of the data. The bottom layer contains data as individual values. Here, the student focuses on what the data represents, for example, considering what a single outlier actually represents in the given context. A professional statistician would begin by looking at the distribution of data as a conceptual entity and then move between the different layers of this model. However, according to Bakker and Gravemeijer, this movement should not be seen as the end goal for students.

The conceptualisation of statistical distribution and the development of smooth movement between the layers is non-trivial and, according to Bakker & Gravemeijer (2004), difficult for middle-school students. Indeed, it is complicated to connect the distribution as an entity to the different formal descriptors. For instance, Mokros & Russell (1995) raised concerns about the ‘unappreciated complexity’ of the mathematical mean. While it is trivial to calculate, it is complex and difficult for students to find meaning in it. Konold & Pollatsek (2002) highlighted a problem with students’ understanding of averages, noting that students often fail to see averages as representative of the whole distribution. To address this issue, they proposed viewing statistics as the study of a ‘noisy’ process, in which signals can be detected. Both Mokros and Russell, and Konold and Pollatsek, identified comparing distributions as a potential conceptual lever. When students describe similarities and differences between distributions, they utilise various methods to accomplish the comparison and, in doing so, they gain an understanding of the concept of distribution as a distinct entity.

The Potential of Digital Technologies in Statistics Education

The role of digital technologies in statistics teaching should involve ‘accessing, analyzing and interpreting large real data sets, automating calculations and processes, generating and modifying appropriate statistical graphics and models, performing simulations to illustrate abstract concepts and exploring “what happens if …” type questions’ (Chance et al., 2007, pp. 2–3). According to the literature, digital tools in statistics education hold great potential for good statistics learning environments (Ben-Zvi et al., 2018). There is a distinction between route-type and landscape-type educational software for statistics teaching. The educational tool TinkerPlots creates a landscape ‘in which students and teachers may freely explore data’ (Garfield & Ben-Zvi, 2004, p. 402). The tool is designed to support students in moving data from an unorganised form to both formal and informal structures (Biehler et al., 2013). Various informal and formal visualisations are embedded in the software (both in Minitool and TinkerPlots), which allows students to develop various sub-models in an emergent modelling process (Bakker & Gravemeijer, 2004; Ben-Zvi et al., 2018).

In the Connections Project, TinkerPlots gradually became a thinking tool for developing students’ statistical reasoning, aiding them in learning new ways to organise and represent data (Biehler et al., 2013). Ben-Zvi & Ben-Arush (2014) investigated students’ interaction with certain features of TinkerPlots. They used the theory of instrumental genesis (e.g. Guin & Trouche, 1998) to analyse the students’ instrumentation and instrumentalisation of the tool. This approach supported the students’ instrumental and cognitive development while they constructed meanings of key ideas through an exploratory data analysis process.

Bakker & Hoffmann (2005) analysed students’ learning about statistical distributions using Peirce’s framework of semiotics. They found that students’ diagrammatic reasoning led to the development of statistical concepts as objects in the discourse. In this framework, a diagram serves as a symbolic representation of relationships. Examples of such representational systems include Euclidean geometry, computer software diagrams or a student’s informal sketch for statistical reasoning. Diagrammatic reasoning involves three steps: constructing a diagram, experimenting with it and observing the results of the experiments, followed by reflection on those observations.

In the study of Bakker and Hoffmann, the students developed statistical objects into means of communication in their further reasoning. In particular, Bakker and Hoffmann focused on students’ development of the term ‘bump’, referring to the shape of the statistical distribution, hence the distribution as an aggregate. Their analyses focused on the possibilities for students to engage in what they called ‘hypostatic abstraction’, which refers to the process whereby certain characteristics of a set of objects become an object in itself. They gave an example from their own study, where students developed the concept of spread. First, the students referred to the spread by saying ‘the dots are spread out’. Subsequently, through the process of hypostatic abstraction, they referred to the same phenomenon as ‘the spread is large’. The term ‘spread’ had developed into an object in itself and become a means of communication (Bakker & Hoffmann, 2005, p. 342). Their conclusions led to three recommendations for teaching statistics. First, students should both make their own and learn conventional diagrams. Second, students should experiment with diagrams. Third, students’ reflections should be stimulated.

Students’ Statistical Distribution Schemes in a Digital Learning Environment

In this study, the analysis of Frida’s case draws on Vergnaud’s theory of conceptual fields and the notion of schemes. However, the notion of schemes also appears in the theory of instrumental genesis developed by Guin & Trouche (1998). The idea behind this theory is that the tool develops from being an artefact to an instrument for the student. Drijvers et al. (2013) described the process of instrumental genesis as involving three dualities: artefact–instrument, instrumentation–instrumentalisation and scheme–technique. In this study, the analysis will relate Frida’s techniques with the tool to the four aspects of Vergnaud’s notion of scheme, which will be elaborated in the next section. Whereas the study of Ben-Zvi & Ben-Arush (2014) focused on the instrumentation–instrumentalisation duality, this study focuses on the scheme–technique duality. The case of Frida follows the techniques she employs with the tool in parallel with the traces of her scheme of statistical distribution.

Vergnaud (2017) described conceptualisation as a way to reduce information and elaborate a sense of what is sufficient and necessary to manage a certain type of situation. The use of symbolism enables a person to represent the relevant information and support the organisation of activity when facing the given situation. The different approaches are then interiorised, so that they can be evoked when the person faces a similar situation. The different ways to organise activity guide perception, action and imagination. A person’s mediation actions, such as the use of words, sentences and gestures, are crucial to the development of schemes.

Vergnaud distinguished between two psychological functions of schemes. The first involves organising and generating activities for familiar situations. The second function relates to tackling new situations and extending the scope of the application of the scheme. According to him, a scheme can be defined in four ways: (1) a dynamic functional whole, (2) an invariant organisation of activity and behaviour for a certain class of situations, (3) a composition of four categories of components (see below), (4) a mapping of a multi-dimensional informational space onto a multi-dimensional space of action variables. The third definition is useful for analytical purposes and will be the frame for analysing the transcripts of the video observations of Frida’s work with TinkerPlots.

The four components of the scheme are as follows: (1) the student’s goals and anticipations, (2) rules of action, (3) operational invariants (theorems-in-action and concepts-in-action), (4) possibilities to make inferences (Vergnaud, 1998, 2017). The objectives of the scheme are what the student anticipates and wants to attain, the effects to be considered and the possible intermediate states that are reached (Vergnaud, 2017). The students’ goals are not to be confused with the goals of the tasks. While objectives are personal, in this study, they inform the process of students’ conceptualisation of data distribution. The operational invariants refer to the epistemic components of schemes. They often remain implicit and unconscious. However, Vergnaud noted an important distinction. Whereas a theorem-in-action is a proposition that can be true or false, concepts-in-action do not have this property; they are simply relevant or not.

According to Vergnaud (1998), schemes and situations are narrowly tied together. A scheme addresses a class of situations. Vergnaud (2017) presented the example of an algorithm that can apply to a whole class of situations that share certain characteristics. Vergnaud stated that, when students first construe a scheme for a newly introduced concept, the class of situations it addresses is small. Schemes therefore begin as what he called local organisations of activity, with a very narrow field of application.

Vergnaud (1998) defined different categories of schemes. One category that is relevant to the case of Frida is perceptive-gestural schemes. A perceptive-gestural scheme can be efficiently applied to a whole range of situations, and the scheme can generate a sequence of actions relevant in the situation. Concepts and theorems are involved in these schemes. One example is counting a set of objects as a perceptive-gestural scheme, where the concept of cardinal numbers constitutes the underlying concept. It is important to note that, here, the observable aspects of counting might be seen as pointing at each object or perhaps nodding when uttering each counting word. This should not be confused with the scheme as such. The scheme is an organisation of behaviour that, in a variety of counting circumstances, will help the counter reach the goal of deciding the cardinality of a set of objects, which happens when the counter comes to the last counted object and when the last uttered counting word attains the meaning of telling the cardinality of the whole counted set.

This study will focus on Frida’s perceptive-gestural scheme of data distribution. The analyses will show how Frida’s digital experiences with TinkerPlots strongly shape her perceptive-gestural scheme of data distribution, which will here be referred to as a tool-shaped scheme. The analyses will show how Frida turns a chain of different actions with the digital tool into a rapid and smooth gesture that supports and challenges her in reaching her goals. As with a counting scheme, the development of Frida’s perceptive-gestural scheme will be a co-ordination of certain hand moves with the computer mouse and associated screen moves. This involves certain ideas of what perceptions from the computer screen signal that the goal of the task has been reached so that an answer can be given.

Method

The Task and the Tool—The Context of the Case Study

Because schemes and situations are narrowly tied together, and a scheme addresses a class of situations (Vergnaud, 1998), the case study is a relevant method to access students’ schemes of statistical distributions. In establishing the case of Frida, this study follows her in several situations. Therefore, it is possible to observe how her statistical distribution scheme expands and evolves over time. The study follows Frida and her peers over a period of one year in three teaching modules of five lessons (varying between 45 and 90 min). The three teaching modules made up all statistics teaching in the period. Figure 1 presents an overview of the three teaching modules. The three selected excerpts for this study are marked in red in Fig. 1.

Fig. 1 Overview of the three teaching modules

In excerpt 1 from the first teaching module, the students compared the ages of their parents (gender comparison). This was their very first experience with TinkerPlots. Because the tool was new to the students, the idea was that understanding the data and the context should not involve any obstacles. However, it should be possible to activate relevant functions in the tool and answer a statistical question. In the second teaching module about six months later, the students were working with data they collected themselves with the purpose of entering into the public debate about indoor climate in Danish schools. The students had read newspaper articles about a general Danish problem of overly high levels of carbon dioxide (CO2) in Danish classrooms (Andersen, 2013; Kronborg & Jelved, 2022).

The teacher facilitated a data-driven investigation of the following question: Do we have a problem with the indoor climate in our classroom? The students prepared a presentation of their findings and invited the school principal to hear their arguments. The students created representations of the data in TinkerPlots that they found relevant for explaining and describing their findings for their presentation. Excerpt 2 stems from an artefact interview conducted just after the end of the second teaching module. Excerpt 3 stems from Frida and Mathias’ preparation for a written assignment, an argumentative text about classroom indoor climate.

In the curricular demands for this age group, descriptive statistics are emphasised in the mid-level of the curriculum, and statistical descriptors, such as means, medians, modes and ranges, are central to the curriculum in grades 4 to 6 (Ministry of Children and Education, 2019). The overarching aim of the activities was twofold: on the one hand, for students to experience how data could support them in gaining insight into a phenomenon and in communicating about a problem and, on the other hand, for them to develop a concept of distribution. The progression was supposed to move from students’ informal descriptions of the data to embedding the formal descriptors (e.g. means, medians, ranges, quartiles and boxplots) to increasingly make sense of the distribution of the data.

In each lesson, the teacher, almost without exception, initiated discussions in the classroom to recap key points from previous lessons. At the end of each session, the teacher initiated classroom conversations, where students shared their experiences from TinkerPlots investigations. Formal language was incorporated as needed, and the teacher would delve into the relevance and usefulness of the students’ findings. During TinkerPlots group activities, the teacher moved among students, providing support, discussing their work and addressing practical problems or comprehension questions. While the three selected excerpts showcasing Frida’s learning journey are crucial for her conceptual development, it is important to note that these situations are not isolated. Various types of engagement contribute to Frida’s overall development. In addition to the mentioned activities, Frida actively participated in classroom discussions, collaborated with peers and participated in writing tasks. Moreover, discussions may extend beyond the classroom, as students, including Frida, would engage with peers or even discuss aspects of their investigations with parents at home.

TinkerPlots was an important tool that supported students’ informal investigations. The first features they were introduced to were how to draw a plot, stack, drag and separate dots, draw lines, add attributes to the axes and use the measure tool to find the range. The tool also had black-box features that students could activate and deactivate, including means, medians and modes. In the third teaching module, the students were also introduced to boxplots. A boxplot, also known as a box-and-whisker plot, is a statistical representation of the distribution of a data set. It provides a visual summary of key characteristics such as the minimum, first quartile (Q1), median (Q2), third quartile (Q3) and maximum. In TinkerPlots, the students can add a boxplot to a dot plot diagram.
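The five values that a boxplot summarises can be computed as in the following sketch. The measurements are hypothetical, the function name is an assumption, and quartile conventions differ between tools, so the values need not match what TinkerPlots would display for the same data.

```python
import statistics

def five_number_summary(values):
    """Return the minimum, Q1, median (Q2), Q3 and maximum of a data set."""
    ordered = sorted(values)
    # 'inclusive' is one of several quartile conventions; it is an assumption here.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {"min": ordered[0], "q1": q1, "median": q2, "q3": q3, "max": ordered[-1]}

# Hypothetical measurements, for illustration only.
measurements = [600, 750, 800, 900, 1000, 1200, 1500]
print(five_number_summary(measurements))
```

The box spans Q1 to Q3, the line inside the box marks the median, and the whiskers in this simple form extend to the minimum and maximum.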

Because the idea of distribution is a meta-concept that includes several other embedded concepts, the task should be balanced between openness and explicitness. If the task asks students to compare two sets of data, they have various ways of doing so, some more relevant than others. Both formal and informal natural language and gestures, as well as exact values, interweave in a good and thorough description of the data distribution. On the one hand, if students are asked to calculate a mean, median or range, they might succeed in doing so. However, this might serve no purpose and hold no meaning for the students. On the other hand, if the task is too vague and asks questions of students that are too broad, there is a risk that the students will be confused. The teaching sequences in this study seek to balance this dilemma.

The Data of the Study and Selection of the Case of Frida

The data for the case of Frida is drawn from several sources. First, field notes from classroom observations helped to identify the case of Frida. The main source of data was transcribed video data from screencast recordings of students’ work on the computer while they worked on the tasks in TinkerPlots. The web camera simultaneously recorded the students to track conversations and facial expressions. A smaller group of students was selected for a semi-structured artefact interview (Kvale & Brinkmann, 2015). The analysis of Frida’s case builds on the transcripts of screencast recordings of her work on two different tasks. However, Frida was also selected for an interview after the second teaching sequence. Excerpt 2 stems from this interview.

Frida’s case was selected because of its epistemological potential. Several cases could have been selected from the teaching sequence to follow students’ learning trajectories and their interaction with TinkerPlots. However, according to Thomas (2011), ‘The essence of selection must rest in the dynamic of the relation between the subject and object.’ (p. 514). His subject is a practical, historical unit, and the object is an analytical or theoretical frame. In this study, the relationship between Frida’s use of the tool and the development of the scheme represented a key case, defined by its ability to exemplify the analytical object (Thomas, 2011). In this study, Frida’s case illuminates all parts of the scheme in the scheme–technique duality. Thomas proposed a case study approach that guides the case selection: (1) establishing the subject, (2) defining the purpose, (3) selecting the approach, (4) determining the process and establishing the boundaries of the case.

Frida as the subject (1) presents a relation to the theoretical frame, as we are able to observe her articulation of her goals, which is an important aspect of Vergnaud’s concept of schemes, and this could inform the analysis of the role of the tool in her establishment of a scheme of statistical distribution. As previously stated, Frida’s exploration of the concept of distribution extends beyond the three chosen excerpts. Nevertheless, these excerpts are relevant, because they show how Frida’s interaction with the tool influences the perceptive-gestural scheme. They also illustrate how the tool’s features assist and present challenges to conceptualising data distribution.

The purpose of the case (2) is to explore the relationship between scheme and technique in the learning of statistics. The case study explores how a digital tool (TinkerPlots) can create certain types of situations that affect the development of the students’ scheme. Therefore, theory testing establishes the approach (3) of the study. The purpose is not to expand the theory, but rather to explore an empirical manifestation of how a digital tool heavily shapes the students’ scheme. The case study process is, in terms of Thomas (2011), a diachronic process (4). The term ‘diachronic’ emphasises the commitment to tracking Frida’s development over time. The study closely examines how Frida’s understanding of data distribution evolves over the span of one year.

Analysis

Excerpt 1: Frida and Mathias compare the ages of their parents while they plot–stack–drag: They reject the ‘ugly’ representations and finally find the one that satisfies them.

In the following excerpt, Frida and Mathias work on the task of comparing the ages of the parents of the students in the class by gender. The task is formulated as follows:

figure a

The following excerpt from the screencast recording follows Frida and Mathias, who have a pile of 48 data cards with the age and gender of their parents when they had their first child, to the point where they have a stacked dot plot connected with a line. The point at which they have a stacked dot plot is where they are able to solve the task and to describe and compare the distributions of their parents’ ages.

Frida and Mathias use a plot–sort–stack–drag technique. First, they drag a plot down and the data from the cards randomly spread out as dots in the window (Fig. 2). Second, they sort the data by adding the attributes to the axes. Third, the students stack the data, sorting it into piles with the same case values. Fourth, the students drag the dots to the left or the right, until a representation satisfies them. They also adjust the size of the plot window. In the following excerpt, Frida and Mathias also activate the mean and connect the stacks with a line.
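The sort and stack steps, together with the black-box mean, can be sketched in a few lines. This is an illustration under stated assumptions and not a model of TinkerPlots itself: the data cards are hypothetical, the function name is invented, and the interactive drag step and the resizing of the window are not modelled.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical data cards (gender, age); not the students' actual data.
cards = [("mother", 26), ("father", 30), ("mother", 28), ("father", 30),
         ("father", 28), ("mother", 31), ("father", 33), ("mother", 26)]

# Sort: separate the cards by the 'gender' attribute.
ages_by_gender = defaultdict(list)
for gender, age in cards:
    ages_by_gender[gender].append(age)

def stacked_rows(ages):
    """Stack: pile equal values on top of each other, one text row per height."""
    counts = Counter(ages)
    lowest, highest = min(counts), max(counts)
    rows = []
    for height in range(max(counts.values()), 0, -1):
        rows.append("".join(" o " if counts.get(a, 0) >= height else "   "
                            for a in range(lowest, highest + 1)))
    # Axis row with the age values underneath the stacks.
    rows.append("".join(f"{a:^3}" for a in range(lowest, highest + 1)))
    return rows

for gender, ages in ages_by_gender.items():
    print(gender, "mean:", mean(ages))
    print("\n".join(stacked_rows(ages)))
```

For simplicity, each group is drawn on its own axis range; placing both groups on a shared axis, as the students do in the tool, would make the visual comparison more direct.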

Fig. 2 Data cards (to the left) and dot plot (to the right)

Frida reads the task aloud.

Frida: Okay, now we need to get this one down. This is gender.

Frida drags the attribute ‘gender’ (Køn in Danish) to the vertical axis and age to the horizontal axis (Fig. 3).

Frida: Well, I can’t see anything.

Frida enlarges the window. Mathias points at the screen on the four white dots (Fig. 3).

Fig. 3 The data are sorted by gender in vertical columns and by age in horizontal rows

Mathias: Like this, because this is Eva’s parents.

Mathias points at the white dots.

Frida: This is gender.

Frida drags the dots several times; at last, she stacks the dots.

Frida: Oh, this is really ugly. It is really annoying. Okay, we can use …, what is it called, ‘average’?

Frida activates the mean (the blue triangle) – a black-box technique.

Frida: Okay, okay.

Mathias: It is 28.5!

Frida and Mathias are smiling. Earlier, they had discussed their predictions about the age distribution, and they had expected the parents to be around 28 years old.

In the next part of the task, the students are asked to make handwritten sketches on their paper and describe and compare the two distributions. For this purpose, Frida chooses to separate the ages further and activates a ‘draw line’. There are four choices for drawing lines in the dropdown menu. Frida tries different ways (Figs. 6, 7 and 8).

Mathias: Wow. Shut up! (Mathias refers to Fig. 6)

Frida: But, there are two of these. This is not what we wanted, is it?

Mathias: What does it even mean?

Frida: I don’t understand that either.

Frida tries ‘connect equal values’ (Fig. 7).

Mathias: Is it the couples, or what?

Frida tries ‘connect stacks’ (Fig. 8).

Mathias: Like that!

Frida: Okay! Consider being thirteen and have a child … ouch, my privates.

Frida and Mathias draw the sketch on the paper. They move on to the next sub-task.

Mathias: It is not symmetrical. It is really not symmetrical. Or … a little bit there.

Frida: The mothers are very flat.

Mathias: The fathers have many peaks.

Frida: Yes. Well, there is one peak.

Mathias: There are two there. (Mathias refers to the two small peaks left of the highest peak.)

Frida: Yeah, but that is not really … The mothers are very flat.

Mathias: But, there is one peak?

Frida and Mathias answer questions B and C with the support of Fig. 8. Frida writes the following:

Question B:

‘The ‘mothers’ are very flat. There are not that many peaks. There is a very little peak at 26. The ‘fathers’ have a very high peak at 30 and a small peak at 28.’

Question C:

‘The fathers are approximately 1.5 years older than the mothers. The mothers are more spread out over the whole diagram, whereas the fathers peak very big at 30.’

Frida and Mathias use informal words when describing and comparing the distributions of the parents. However, they succeed in doing so, and they also draw a conclusion about the fathers being the oldest. Without using the formal word for ‘mean’, they use it in their comparison. It seems evident that the tool played an important role, and that Frida and Mathias tried several (for them) useless representations (Figs. 4, 5, 6 and 7) before they decided to use the diagram in Fig. 8 to support them in solving the task.

Fig. 4. The data is grouped into four-year age spans

Fig. 5. The data is stacked in two columns

Fig. 6. The data is separated horizontally, the mean is calculated and the ‘draw line’ feature is applied

Fig. 7. A line that connects all the dots is drawn

Fig. 8. A line is drawn to connect the stacks

Focusing on the interaction with the tool, it is evident that Frida has an objective. It is not precisely articulated what her objective is, but it is evident when she is not satisfied with the feedback from the tool (‘This is ugly’ or ‘This is not what we wanted, is it?’). Her objective is not convergent with the task, as the task does not ask for a specific representation. Nevertheless, the tool guides Frida’s goal for solving the task and she articulates the desire to be able to ‘see’. This establishes some kind of rule of action. The representation Frida creates with the tool must expose visual characteristics of the distribution. The distribution is thus a concept-in-action, understood as a collection of data that has a shape that exposes certain features of a phenomenon, in this example, the differences in the age of the parents. Actually, the representation from Fig. 5 could also have helped her solve the task, but Frida found it ugly for some reason. Nevertheless, in the end, it is possible to make an (informal) inference about differences between genders. Ultimately, Frida’s experiments with different representations of the distribution make it possible to solve the task and make a comparison of the two genders.

The flexibility of the tool and the ability to plot, stack and drag the dots play an important role in Frida’s development of the distribution scheme. As stated by Vergnaud (2017), schemes start as local organisations of activity and thus have a small field of application. In the case of Frida, the scheme is tool-shaped and relates to the action of manipulating the representation until she is satisfied.

Data distribution becomes a concept-in-action for Mathias and Frida. Distribution is a meta-concept with several embedded sub-concepts. While the task does not ask for specific sub-concepts, Frida and Mathias handle individual values of the data (Emily’s parents) as well as the mean as representative of the ages of the mothers and fathers, and they use natural language to describe the shape of the distribution.

Excerpt 2: Frida and Iris are explaining what they like about their favourite representation.

The next teaching sequence takes place six months later. This time, the theme is indoor climate. The students read newspaper articles about a general problem of overly high CO2 levels in Danish classrooms. The articles claim that these high levels disturb students’ ability to learn and cause headaches. In the articles, 1000 ppm is defined as the critical value that should not be exceeded. The students are to investigate whether they have this problem in their classroom and report their findings to the school principal. The teacher and the researcher have installed a CO2 meter that measures the CO2 levels during the day in ppm. The students measure other attributes as well, including the school subject, time of day (categories: morning, forenoon, lunchtime and afternoon) and mood/energy level.

Figures 9, 10 and 11 present data from the students’ study of the CO2 levels. The measurements vary from around 400 to 2600 ppm. The same data is the basis of the task in Excerpt 3, where Frida is making sense of the data. Excerpt 2, however, stems from an artefact interview with Frida and Iris that focused on why they prefer the representations they do and, more generally, what they value in the representations they create with TinkerPlots. The aim was to gain a deeper understanding of what guides Frida’s techniques when she seeks the representation that fulfils her goal. As the previous excerpt showed, the goal is not explicit. Nevertheless, Frida is still clear when she does not reach her goal.

Fig. 9. CO2 data into columns

Fig. 10. The CO2 data in ppm on the horizontal axis is completely separated

Fig. 11. The CO2 data is grouped into categories with a span of 300 ppm

The dragging part of the plot–stack–drag technique is where Frida makes her final decision regarding whether the representation is good. She drags the dots back and forth until she reaches her goal. Frida and Iris generated the three examples in Figs. 9, 10 and 11, which show different degrees of separation when the dots are dragged horizontally. They use these examples to show the interviewer their preferred representation and the reasons behind their choice. In Fig. 9, the dots are not separated at all. In Fig. 10, the dots are fully separated horizontally, while in Fig. 11, the dots are separated to a level where Frida and Iris are satisfied with the representation. In the following excerpt, the interviewer asks Frida and Iris to explain why they prefer this representation, with reference to Figs. 9, 10 and 11.

Iris: So, now I have separated them like totally, so I think it’s hard to see where this one is and where this one is.

Int.: What is it that you want to see? Now you’re saying, Iris, you have such a nice way of expressing it, you say it’s a bit difficult to see.

Iris: Well, it’s a bit tricky to see where it’s located without having to tap on it, then you can see that it’s there. But, it can be a bit harder to spot. It’s a bit easier to see here (Iris is referring to Fig. 11).

Frida: So, you can kind of make it more clear (Frida makes a curve with her hand).

Int.: Frida, you’re saying you can make it a bit clearer, and then you do this with your hand. What do you like about it? What does the hand movement mean?

Frida: It’s just very easy to figure out what’s happening or something like that. It’s very ... I don’t know. You can see it quite well. It’s easy to understand, unlike the other one, for example, if it’s all scattered.

Iris: Because the wave is very unclear (Iris is referring to Fig. 10).

This part of the artefact interview with Frida and her peer Iris clarifies her goal. In the first excerpt, it is evident that some representations did not fulfil Frida’s goal. In Excerpt 2, the interviewer asks Frida and Iris which representation they preferred and why. Once again, the representation should make them able to ‘see’. Iris explains that it is ‘difficult to see’ when the dots are dragged out the most (Fig. 10). Frida makes an important gesture when she explains what she likes about the representation in Fig. 11. She draws a wave in the air and explains that it is clearer this way. This shows that Frida (and Iris) are driven by a goal to create a representation with a smooth shape, which reveals some features.

The students have developed a goal that is very closely related to distribution as a concept-in-action. They seek to view a collection of data as a coherent unit, which can be represented in a way that exposes features to the students. The technique of plot–stack–drag becomes a rule of action (drag until satisfied with the representation), and implies a scheme of distribution according to which data can create a shape. At this point, plotting, stacking and dragging occur quickly, forming a seamless and cohesive movement. Consequently, Frida and Iris have accommodated these actions as part of a perceptive-gestural scheme. In this scheme, the swift hand movements with the computer mouse replicate the movements of the dots in the plot displayed on the screen.

Excerpt 3: Frida and Mathias investigate data about the indoor climate: ‘Why do they not stack properly?’.

The third excerpt occurs six months later. The students are working on new tasks, but the teacher also asks them to revisit the CO2 data and write a letter to the editor of the local newspaper to start a debate about the problem of the indoor climate. Frida is now working with Mathias again. However, Mathias has some technical issues with a damaged file and does not participate much in the dialogue. Frida continues to work on the task while attempting to help Mathias and sharing her work.

The task asks the students to investigate the data with the features they know from before: make a data card; plot the data; drag the data; draw attributes; drag lines and activate the mean, median and mode; and use the ruler to measure the spread. The teacher reminds the students of the common descriptors: means, medians, modes, ranges and minimum and maximum values. The teacher also introduces the students to the boxplot, shows them how to draw one and instructs them to include it in their representation of the data. In the following excerpt, Frida has made a plot, added attributes and dragged the dots into an acceptable representation. By now, this technique has become routine to Frida, and she executes the plot–stack–drag so quickly and smoothly that it has become a coherent gesture. Nevertheless, this time it does not satisfy Frida’s goals.

The CO2 data takes many different values. By contrast, the age data is categorical and more compatible with Frida’s plot–stack–drag technique. She is not satisfied with the representation, as her technique did not work. However, her conceptual understanding and more diverse ways of describing the data allow her to proceed in spite of the representation being ‘ugly’.

Frida reads the task aloud.

Frida: Okay, we can do that.

Frida drags a plot (Fig. 12).

Fig. 12. A table on the left and a dot plot on the right

Frida: Stack! (Fig. 13).

Fig. 13. The dots are stacked with a count on the vertical axis

Frida: What? Time of day. Count.

Frida drags the dots to the right two times (Figs. 14 and 15).

Fig. 14. The dots are dragged to the right once and grouped into two categories with a span of 2000 ppm

Fig. 15. The dots are dragged further to the right

Frida: Why don’t they stack properly? It is so annoying.

Frida drags a box plot down to the dot plot (Fig. 16).

Fig. 16. A box plot is added to the chart

Frida makes an ‘equal count division’ (Fig. 17).

Fig. 17. The ‘equal count division’ feature is activated and applied to the chart

Frida: Mean. Median.

Frida activates both the mean and median (Fig. 18).

Fig. 18. The mean and the median are applied, represented by the blue triangle and the red ‘t’ symbol, respectively

Frida: Well, I don’t quite understand for what we should use the median. It is just a different way to show the mean. It is unnecessary. It is like OK, it is there, but (…) it does not really matter. (…) Okay, ‘numeric value’, where is it?

Frida adds the numeric values (Fig. 19). Afterwards, Frida deactivates the median.

Fig. 19. Numeric values are added to the mean and median

Mathias has some technical difficulties with the files. Frida tries to help him and returns to their common task afterwards.

Frida: Okay, now you can see. Now, I will just take a picture of this. I can send it to you.

Frida makes a screenshot of the diagram and copies it into a Word document. She becomes unsure of the threshold limit value of 1000 ppm and checks it with her peers. Frida returns to the task description to determine the exact formulation. The task asks her to describe the data set with the descriptors she knows. (In the following, Frida’s writing is in underlined italics.)

Frida: As one can see, the mean of the CO2 measurements in September is 1152,73 (…). The first 50% are under 1000 and the last 50% are over 1000.

Frida: It can’t be right (…). Ah, now I know.

Frida stares at the screen for a moment before she continues to write.

Frida: The median, which is the place between the two 50%, is about 1000, but the mean is, on the other hand, higher.

Frida looks at the screen again.

Frida: And what does that mean?

The researcher (res.) asks Frida and Mathias if everything is all right.

Frida: What does it indicate if the mean is higher than the median?

Res.: Yes, what does it indicate?

Frida: That the numbers, like the higher ones. The highest numbers are way higher than ... I mean, they’re much higher than the lowest ones are low.

Res.: It can mean different things, but how do you calculate the mean?

Frida: It is …

Mathias: If you put everyone in a row and divide them.

Frida: Yes.

Res.: Okay, so you add up all the numbers and divide by the number of measurements.

Frida and Mathias: Yes.

Res.: But what we discussed was the day of the geography lesson when the level reached as high as twenty-six hundred or something – it was an unusually high level. How is it here? Is the mean higher than the median?

Frida: Yes. Actually pretty much. One hundred and fifty higher.

Res.: Okay, so the mean is higher than the median and it is because there are some values which are very …

Frida: Yes, which are very high.

Res.: Well, explain it to me.

Frida: The high values are high.

Mathias: The high are higher … eh, than the low are lower.

Frida: And they do not affect the median. Just that they are high, so …

Res.: Why? Why do they not affect the median but the mean?

Frida: Because with the median, you line them up, and then it does not matter how high the highest are.

Res.: Okay, so if the one that is twenty-six hundred something was one hundred million instead?

Frida: Then, the median would be the same, but the mean would be higher.

Frida continues writing.

Res.: So, what are you writing, Frida?

Frida: That … that the mean is higher than the median because some are very high. A few very high values affect the mean more than the median because the median doesn’t care about it … It doesn’t care about the value. It cares about the value compared to the others. Something like that.

Frida giggles slightly and puts her finger on her chin as if she is thinking.

Res.: You are on the right track, but explaining isn’t easy.

Frida starts writing again.

Frida: This is because there are some very high numbers. The high numbers have a greater impact on the mean than on the median. For example, the median would remain the same if the number just above 2600 were 1000000. The last 50% are therefore higher than the first 50% are low.

Frida: This is the best I can write right now.

Nevertheless, Frida continues writing even though the teacher has asked them to stop.

Frida: As the numbers increase, the different 25% have a larger and larger range of variation. It spreads out more as the numbers get higher.

As Excerpt 2 shows, Frida appreciates a representation that exposes the shape of the distribution. This is a valuable goal for Frida, allowing her to ‘see’ something, as she referred to it in Excerpts 1 and 2. The plot–stack–drag technique has become a rule of action. She executes the plot–stack–drag sequence seamlessly and cohesively, as if the three movements are interconnected and flow as one unified gesture. However, she is very explicit that the plot–stack–drag technique does not satisfy her goal this time (‘Why don’t they stack properly? It is so annoying.’). This might be due to the data, which acts differently than the categorical data on age in the first excerpt, but if Frida had been more familiar with the tool, she might have succeeded anyway. She could have changed the size of the dots and the shape and size of the plot window to create a representation of a smooth right-skewed distribution, with a long tail towards the high values. This was not the case, and Frida was annoyed with the representation, blaming it on the tool. Because her plot–stack–drag technique does not produce the smooth shape she is looking for, she has to set aside her tool-shaped distribution scheme this time and revise it.

Fortunately, Frida does not give up. Since the first excerpt, her scheme of distribution has expanded with several other theorems-in-action. Here, she makes sense of the mean, median and quartiles in her description of the distribution, although the shape does not seem to support her description this time. Instead, Frida combines black-box techniques with a theoretical explanation. At first, she alternately activates and deactivates the median. The median does not initially support her in reaching her objective, namely, making sense of the distribution, and she deems it superfluous. However, she returns to the median and asks the researcher to help her. This leads to a conversation about the definitions of mean and median.

It is clear from the dialogue that Frida already knows the definitions. However, the dialogue becomes a turning point. Frida starts by discarding the median when using the tool. She interprets it as a synonym for the same feature that the mean expresses. Frida makes sense of the theoretical definitions, and the mean, median and the difference between them turn into theorems-in-action shortly after as Frida writes her assignment. Her scheme of distribution now also consists of formal descriptors, such as range, mean, median and boxplot (sub-schemes), which are now concepts-in-action and theorems-in-action. Hence, she exhibits the capacity to explore other representations than the one created by her plot–stack–drag technique. The boxplot and its representation of median and quartiles become a new set of concepts-in-action that support Frida in achieving her goals and solving the task.
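Frida’s written argument about the mean, the median and the widening quarters can be checked numerically. The sketch below is illustrative only: the readings are hypothetical values chosen to resemble the range of the classroom measurements, not the students’ data, and it uses Python’s standard statistics module rather than TinkerPlots.

```python
from statistics import mean, median, quantiles

# Hypothetical CO2 readings in ppm (assumption: invented for illustration,
# not the students' measurements).
readings = [420, 610, 780, 900, 1000, 1050, 1300, 1700, 2600]

# Replace the highest reading with an extreme value, as in the dialogue.
extreme = sorted(readings)[:-1] + [1_000_000]

# The median depends only on the order of the values, so it stays put.
print(median(readings), median(extreme))

# The mean uses every value, so it is pulled towards the high readings.
print(mean(readings), mean(extreme))

# Width of each quarter of the data: min..Q1, Q1..Q2, Q2..Q3, Q3..max.
q1, q2, q3 = quantiles(readings, n=4)
cuts = [min(readings), q1, q2, q3, max(readings)]
widths = [hi - lo for lo, hi in zip(cuts, cuts[1:])]
print(widths)
```

With these values, the median is unchanged by the replacement while the mean rises, and the four quarters become wider towards the high end, which matches the pattern Frida describes in words.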

Discussion

As stated in the introduction, TinkerPlots creates a landscape in which students can freely explore data and which supports them in the organisation of data. TinkerPlots frees the students from tedious calculations (Garfield & Ben-Zvi, 2004). In the case of Frida, it is clear how she uses the features of the tool to support her in solving the task. In the first excerpt, Frida and Mathias try different representations, rapidly shifting between them, before deciding which one best supports solving the task. Frida’s desire to ‘see’ something in the representation when they compare gender differences might indicate that her idea of distribution consists not only of data as a collection of individual values but also of data as a whole, which can expose certain characteristics like shape. Frida and Iris’ handwave gesture underlines the idea of distribution as a whole. In Frida’s conceptualisation of the statistical distribution, her personal goals and anticipation might play a significant role. It is not only important to make sense of formal statistical measures, such as means, medians, modes and ranges. The tricky part is to judge what is worthy of attention, and therefore the personal goals of the students have an enhanced position when making sense of data distributions.

The flexibility of TinkerPlots provides the students with a variety of choices. The students can adjust the dot plot, choose which attributes are interesting, make adjustments until the representation is just right, activate or deactivate different features and measures of centre, activate the ruler and so on. In her first experience with the tool, Frida quickly develops a technique of dragging a dot plot, stacking the dots, selecting the attributes and then dragging the dots back and forth until she is satisfied with the representation. The representation that satisfies her is the one that enables her to ‘see something’. I refer to this gesture as her plot–stack–drag technique. The three actions—plot, stack and drag—become one coherent and rapidly performed gesture for Frida, eventually becoming a perceptual-gestural scheme. Even the simplest and most modest plot–stack–drag technique allows the student actively to evaluate the feedback from the tool and easily to adjust it if it does not satisfy the student’s objective. For Frida, this represents a relevant starting point when she moves from the individual values of the parents’ ages to construct a coherent description of gender differences.

In the first excerpt, Frida and Mathias use the mean as the only formal descriptor, but in the last excerpt, several descriptors are embedded in the description. One of the major challenges in developing a conceptual understanding of descriptive statistics and the idea of empirical distribution is seeing the descriptors as representative of the whole distribution (e.g. Konold & Pollatsek, 2002; Mokros & Russell, 1995). It seems that TinkerPlots supports Frida in developing a conceptual understanding of the descriptors as representatives of the whole distribution. In the first excerpt, Frida uses the mean along with a description of the shape. In the third excerpt, Frida uses several descriptors. In her description of the quartiles of the CO2 levels, she does not report the quartiles as numbers. Rather, she has noticed how they increasingly spread out from the first to the last quartile.

This description arises from Frida’s interaction with the tool, as she identifies the (most) satisfying representation and then notices characteristic features of the distribution. This might be connected to her personal goal of creating a representation that enables her to ‘see something’, a goal that is tightly connected to her plot–stack–drag technique and her experiences with the digital tool. It seems that the metaphor of digital tools in the learning of statistics as fast modes of transportation (Biehler et al., 2013) is appropriate and beneficial in the case of Frida. Students must not only conceptualise statistical measures, but also develop their judgement and anticipation. The process of learning to judge what is worthy of attention is non-trivial, and students must develop this competence in parallel with conceptual knowledge about statistical measures. It seems that the tool has supported Frida in this conceptual development.

However, in the third excerpt, it is evident that Frida’s plot–stack–drag technique was challenging for her as she attempted to make sense of the CO2 data. The plot–stack–drag technique did not act as she anticipated, which obviously annoyed her. Frida’s experience with TinkerPlots has established a certain kind of situation: in her first experience with data exploration, the plot–stack–drag technique supported her in solving the task and in drawing conclusions about the data distributions. As Vergnaud (1998) stated, situations and schemes are tightly connected. The situations that Frida already masters shape her scheme of distribution. The tool-shaped scheme is relevant for her until she encounters a situation in which the data act differently in the tool. This relates to what Vergnaud called the perceptive-gestural scheme, which can be efficient for a whole range of situations and can generate a sequence of actions relevant to the situation. Fortunately, by the time of the third excerpt, the teacher had introduced the students to other ways of describing and representing data, and Frida’s scheme had been expanded with the boxplot, which supported her in reaching her goal, which might have still been to ‘see something’.

When engaging with TinkerPlots in Excerpt 3, Frida initially dismissed the median as irrelevant to achieving her goals. However, she reconsidered this decision and explored the theoretical definition in a dialogue with a researcher. Although Frida and Mathias were familiar with the definitions of mean and median, their understanding lacked sufficient depth for the definitions to evolve into theorems-in-action and help them fulfil their objectives. The tool itself was insufficient in aiding Frida in comprehending the median. For the median, the mean and the difference between them to transition into theorems-in-action for Frida, a synchronisation between interacting with the tool and reflecting upon the theoretical definitions seemed crucial. This dynamic created a situation supporting Frida’s conceptualisation of statistical distributions and the sub-schemes associated with statistical descriptors.

Although digital tools provide good support to students in their exploration of data and conceptual development, it is important to be sensitive to which tool-shaped schemes the students develop along the way. In order to support the students’ development of statistical distribution schemes, it is important for the teacher to be sensitive to the duality between the techniques the students develop with the tool and how these techniques shape the scheme. In particular, there is something important to learn from the ways that students’ experiences with the digital tool shape their goals and anticipations. Frida is flexible and is quick either to develop a new sub-scheme or to adapt the boxplot into her distribution scheme. However, teachers should be aware of the limitations presented by fixed tool-shaped schemes, which may hinder students’ development of a rich and flexible understanding of statistical distributions. The conflict Frida experiences when the tool does not serve her goal reveals a great deal of information. If the teacher engaged in a dialogue with the student to explore the origin of the student’s personal goals, the teacher could understand important aspects of the student’s conceptualisation of statistical distribution. For instance, it is important for Frida to find a representation with a smooth shape that brings out features of the distribution.

Indeed, by engaging in a dialogue to explore students’ goals and anticipations, the teacher could contribute to the expansion of the students’ schemes. In this case, an expansion could have been supported in two ways if the teacher had been able to explore Frida’s dissatisfaction with the feedback from the tool. First, it could have led to an investigation of the digital opportunities, and the students could have learned new ways to handle the tool, such as adjusting the window, the size of the dots and the bin size. This might have led Frida to succeed in finding a satisfactory representation with her plot–stack–drag technique. This success, in turn, could have expanded the range of possible actions available to her, ultimately increasing her readiness to handle various situations in the data exploration process. Second, a dialogue with Frida and her peers about why the tool did not ‘stack properly’ could have elicited a reflection upon how to handle various kinds of data, that is, differences between categorical and continuous variables. Crucially, the personal goals and anticipations that students develop from their interactions with the digital tool present learning opportunities if they are brought into the light and explored through dialogue.
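Why dots ‘stack’ for some data and not for others, and how the bin size changes this, can be illustrated with a small count of dots per stack. This is a minimal sketch in plain Python under stated assumptions: the values and the helper `stack_heights` are hypothetical and invented for illustration, and the code does not reproduce how TinkerPlots works internally.

```python
from collections import Counter

# Hypothetical values (assumption: invented for illustration,
# not the students' data).
ages = [26, 28, 28, 30, 30, 30, 31, 28, 30, 26]
co2_ppm = [412, 655, 803, 871, 948, 1004, 1096, 1210, 1475, 2590]


def stack_heights(values, bin_width=None):
    """Count the dots in each stack.

    Without bin_width, only identical values share a stack.
    With bin_width, values are first grouped by the lower edge of their bin.
    """
    if bin_width is None:
        keys = values
    else:
        keys = [(v // bin_width) * bin_width for v in values]
    return dict(sorted(Counter(keys).items()))


print(stack_heights(ages))          # few distinct values: visible stacks
print(stack_heights(co2_ppm))       # all values distinct: no stacks form
print(stack_heights(co2_ppm, 300))  # 300 ppm bins: stacks reappear
```

In this sketch, the raw CO2 values only produce stacks of height one, whereas grouping them into bins of 300 ppm (the span shown in Fig. 11) brings the stacks back.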

It is important to note that the three excerpts are not the exclusive factors influencing Frida’s conceptual development. Since Frida cannot be isolated from other influences, the generalisability of the findings is limited. However, this case holds significance as a key case in the terms of Thomas (2011), serving as proof of existence. This case was selected due to the interesting dialectical relationship between the subject, Frida, and the object, the theory of conceptual fields. Frida was remarkable because of her explicitness regarding her satisfaction and dissatisfaction with the feedback from the digital tool, which gave access to interesting parts of her statistical distribution scheme, particularly how the digital experiences shaped her personal goals and anticipations. The results of this paper reveal an important dialectical relationship between the plot–stack–drag technique in TinkerPlots and how it formed Frida’s goals and anticipations. This finding enhances an important aspect of Vergnaud’s theory of conceptual fields. Specifically, it illustrates how digital experiences can shape perceptual-gestural schemes and the conceptualisation of statistical distribution. The time span of one year allowed the exploration of how the technique and personal goal of finding the right representation followed Frida from one situation to the next.

However, this study presents opportunities for further research. The study of Ben-Zvi & Ben-Arush (2014), as noted above, contributed different student profiles of instrumentation in exploratory data analysis with TinkerPlots. Bakker & Hoffmann (2005) explored the role of digital tools in students’ learning trajectories. In the case of Frida, the focus was on the dialectical relationship between the student’s goals and the technique developed with the digital tool. To deepen our understanding of the link between students’ personal goals and the techniques they employ with digital tools while learning statistical concepts, it would be worthwhile to follow the learning paths of other students, particularly to examine whether these paths vary across student profiles.

Conclusion

The research question of this study concerned how interaction with the digital tool TinkerPlots could shape students’ conceptual development of the notion of data distribution and how the students’ tool-shaped scheme challenges or supports their further conceptual development as they progress to new situations. The analysis of the case of Frida showed how her plot–stack–drag technique shaped her personal goal of finding a representation that enabled her to ‘see’. The analysis revealed that this personal goal supported her conceptual development of the idea of statistical distribution. However, the study also showed that the technique could lead to disappointment when it fails to produce a representation that fulfils this goal. This disappointment required Frida to reshape her distribution scheme. The case also showed that the tool alone was insufficient for concept development: theoretical reflection on the definitions of the underlying concepts was crucial.

Overall, the study highlights the importance of teachers’ awareness of the personal goals that students develop when interacting with TinkerPlots. It is crucial to bring possible conflicts to the fore in order to exploit the learning opportunities such conflicts might contain. The findings also suggest a direction for further research. Specifically, future research could explore the dialectical relationship between the techniques that students develop when using digital tools for data exploration and the personal goals and anticipations that these techniques form along the way.