Abstract
The goals of this chapter are to:
2.1 Introduction and Scope
2.1.1 Scope
The goals of this chapter are to:
- discuss the fundamentals of educational data management, including issues related to data cleaning methods, metadata, data curation and storage for preserving educational data, and
- introduce the key Ethical Principles that govern the use of educational data, especially in terms of privacy, security of data and informed consent, which should be addressed via transparent and well-defined ethical policies and codes of practice.
2.1.2 Chapter Learning Objectives
Learning Objectives | Learn2Analyse Educational data literacy Competence profile |
---|---|
Know and Understand the most common quality issues of raw educational data | 1.2 |
Understand data cleaning methods for educational datasets | 2.1 |
Understand the advantages of enhancing educational data through data description | 2.2 |
Understand the need for data curation in educational data management | 2.3 |
Be able to identify storage issues for preserving educational data | 2.4 |
Understand the importance of informed consent as a key Ethical Principle of Educational Data | 6.1 |
Understand the significance of educational data protection policies | 6.2 |
2.1.3 Introduction
This chapter will introduce the second key competence of educational data literacy, namely, Educational Data Management.
The first step in this imperative process is Data Cleaning. Since educational data comes from various sources, it could be really messy. It may come in diverse formats and it may contain various types of inaccuracies. Thus, it is essential to know the most common quality issues of raw educational data and understand the data cleaning methods for educational datasets.
In order to add value to the datasets, educators need to understand the advantages of enhancing educational data through data description by using Metadata, usually defined as “data about data”.
Data Curation is of great importance in educational data management, since it transforms raw data into consistent data that can then be analysed.
Moreover, to ensure continued and reliable long-term access, there are many important aspects we need to consider and manage in an effective digital preservation process for educational data.
Special focus should be placed on the key technical elements of digital preservation. The selected storage solution is of prime importance for digital preservation, since security and privacy issues are significant concerns.
Along with the emerging opportunities they offer, data-driven educational practice and assessment raise challenges such as ethical issues and implications, especially in terms of privacy, security of data and informed consent, which should be addressed via transparent and well-defined ethical policies and codes of practice.
Several frameworks, policies and guidelines have been developed to help institutions and educators to identify potential ethical issues and to apply clear ethical policies that govern the use of educational data.
New regulations, such as the General Data Protection Regulation (GDPR), have raised awareness of data ethics issues that can arise from data misuse.
Informed consent is declared by most international guidelines as one of the pivotal principles in Data Ethics. The way individuals are informed is crucial for the informed consent process. Educators should ensure that individuals fully realize the expected consequences of granting or withholding consent.
With regard to the collection of personal data about children, additional protection should be granted, since children are less aware of the risks and consequences of sharing data and of their rights.
As mentioned, in light of the rapid global development of Educational Data Analytics, new challenges to privacy and data protection have also emerged.
Do educational data analytics challenge the principles of data protection? Is privacy a show-stopper? How is privacy guaranteed, especially when minors or sensitive data are involved?
Education professionals need to pay extra attention to sensitive data (special category of personal data) since an organisation can only process this data under specific conditions (explicit consent may be needed).
Moreover, the protection of the rights and freedoms of natural persons with regard to the processing of personal data requires that appropriate technical and organisational measures are taken. In order to identify sensitive data, assess and respond to data risks and monitor implemented security processes, a Data Protection Impact Assessment (DPIA) may be required whenever processing is likely to result in a high risk to the rights and freedoms of individuals (IT Governance UK, 2016).
2.2 Adding Value to Educational Datasets (Educational Data Management)
2.2.1 Making Data Tidy (Data Cleaning)
We are surrounded by a sea of data. According to BrightBytes (2017), “The widespread availability of accurate and usable data has the potential to unlock a universe of information for educators.” We could add that, without the appropriate process of getting data ready to use (whether you call it wrangling, cleansing or simply cleaning), “data is simply a scatter of numbers”. You may also review the video “Data Wrangling for Faster, More Accurate Analysis” (in the useful video resources), which shows that “Data discovery is a critical step when working with complicated data”.
In this topic, we will continue studying the language of data. It is time for the second key area of data literacy vocabulary, Educational Data Management. The first step in this imperative process is Data Cleaning. Figure 2.1 depicts the framework of data cleaning as defined by Maletic and Marcus (2000) in Data Cleansing: Beyond Integrity Analysis.
As mentioned, educational data comes from various sources. There is data from online learning environments, data from state tests, demographic data, data from management information systems, from open educational resources and much more. It would be really useful if we could unify all these little pieces to reveal the big picture and realize the untapped potential.
All this data can be really messy. It may come in diverse formats and it may contain various types of inaccuracies, such as missing values, outliers and duplicate instances. To obtain an integrated and consistent database that is free from discrepancies, data clean-up is required.
As Romero et al. (2014) describe in A Survey on Pre-Processing Educational Data, the data cleaning task concerns the detection of erroneous or irrelevant data and how to discard it.
Let’s move on and find out the most common discrepancies in data, like:
- missing data,
- outliers,
- inconsistent data,
- duplicate instances,
and how to handle them (Fig. 2.2).
Missing values occur when no value is stored for the variable in the current observation (Little & Rubin, 2002).
When using an e-learning environment, it is very common for learners to study at their own pace, to follow their own learning path. They usually skip some activities and complete only a part of the tasks in the course. Sometimes they even drop out and never come back. Thus, missing data is very common when collecting educational data.
Romero et al. (2014) suggest several ways to handle missing data:
- Use a label, like “null” (unspecified) or “?” (missing).
- Use a substitute value, like the attribute mean or the mode.
- Use regression to determine the most probable value with which to fill the missing value.
- In some extreme cases, in order to clean the data and ensure its completeness, remove learners who have all or almost all of their values missing.
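The strategies above can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure used by Romero et al.: the scores, the function names `impute` and `drop_mostly_missing`, and the 0.8 threshold are all assumptions made for the example, and the regression-based option is left out for brevity.

```python
from statistics import mean, mode

# Hypothetical quiz scores for one activity; None marks a missing value.
scores = [72, None, 85, 72, None, 90]

def impute(values, strategy="mean"):
    """Return a copy of values with None replaced by the mean or mode of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else mode(observed)
    return [fill if v is None else v for v in values]

def drop_mostly_missing(rows, max_missing_ratio=0.8):
    """Keep only learners whose share of missing values does not exceed the (assumed) threshold."""
    return {
        learner: vals
        for learner, vals in rows.items()
        if sum(v is None for v in vals) / len(vals) <= max_missing_ratio
    }

# Option 1: keep the gap visible by labelling it.
labelled = ["?" if v is None else v for v in scores]
# Option 2: substitute the attribute mean or the mode.
mean_filled = impute(scores, "mean")
mode_filled = impute(scores, "mode")
```

Which option is appropriate depends on the analysis: labelling keeps the gap visible, while substitution changes the distribution of the attribute and should be logged.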
An outlier is an observation whose values deviate from the expected, being either much larger or much smaller than most other observations (Fig. 2.3). Outliers may be caused by typographical errors or errors in measurement. Remember when NASA lost a spacecraft due to a metric conversion mistake (Harish, 2019)?
In datasets, different scales of numerical values are often used to make them easier for humans to read. For example, in budget datasets, the units are often in the millions: 1,500,000 often becomes 1.5 m. However, smaller amounts like 400,000 are still written in full. As a result, 1.5 m looks like an outlier, whereas it actually reflects an inconsistency in data types and formats.
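A mixed-scale column like the one just described can be brought to a single unit before any outlier check is made. The sketch below is illustrative only: the function name `parse_amount` and the suffix convention ("k" for thousands, "m" for millions) are assumptions, and a real dataset may use other conventions.

```python
def parse_amount(text):
    """Convert strings such as '1.5 m' or '400,000' to a plain number of units."""
    cleaned = text.strip().lower().replace(",", "")
    multipliers = {"k": 1_000, "m": 1_000_000}  # assumed suffix convention
    if cleaned and cleaned[-1] in multipliers:
        # Strip the suffix and scale the remaining number.
        return float(cleaned[:-1]) * multipliers[cleaned[-1]]
    return float(cleaned)

amounts = [parse_amount(a) for a in ["1.5 m", "400,000", "250k"]]
```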
However, Romero et al. (2010) indicate that “outliers may be phenomena of interest in a dataset, it could be correct and represent real variability for the given attribute.”
In the context of educational data, outliers can often be true observations (Romero et al., 2014). For example, there are always exceptions among learners, who succeed with little effort or fail against all expectations. In another example, very high values are often recorded for time spent because the learner had not signed out before leaving the digital learning environment.
It is clear that not all outliers are errors. It depends on the aims of the analysis, whether these outliers should be eliminated or not, and requires knowledge of the context in which the data was produced and collected.
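Because the decision to keep or remove an outlier depends on context, it can help to flag candidate values first and review them by hand. The sketch below uses the common interquartile-range (IQR) rule, which is one possible technique and not one prescribed by the sources cited here; the function name, the factor `k=1.5` and the sample values are assumptions.

```python
from statistics import quantiles

def flag_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical time-spent values in minutes; 600 may be a forgotten sign-out.
time_spent = [30, 42, 35, 38, 40, 33, 600]
flagged = flag_outliers(time_spent)
```

The function only reports suspicious values. Whether they are errors or true observations remains a judgement for someone who knows how the data was produced.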
Inconsistent data contains values that are implausible or that follow conflicting conventions. For example, imagine negative values for the age of a person, or height data measured in metres in some records and in centimetres in others. Some incorrect data may also result from inconsistencies in naming conventions or data codes in use, or from inconsistent formats for input fields, such as a date (Chakrabarti et al., 2009). The most common error is the mixed use of American (MM/DD/YYYY) and European (DD/MM/YYYY) formats (see Date formats around the world).
People often try to save time when entering data by abbreviating terms. If these abbreviations are not consistent, they can cause errors in the dataset. Differences in capitalisation, spacing, and genders of adjectives can all cause errors. There can be numerous inconsistencies, and we have to deal with them deliberately. At the same time, it is always advisable to log the details of each cleaning step carefully for future reference.
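As a rough illustration of handling such inconsistencies, the sketch below converts dates from a known source format to ISO 8601 and maps known spelling variants to one label, while logging every change. The names `to_iso`, `COUNTRY_VARIANTS` and `clean_country` are hypothetical, and the sketch assumes the source format of each date is already known: a value such as 4/9/2008 cannot be disambiguated from the text alone.

```python
from datetime import datetime

US, EU = "%m/%d/%Y", "%d/%m/%Y"

def to_iso(date_text, source_format):
    """Parse a date written in a known source format and return it as YYYY-MM-DD."""
    return datetime.strptime(date_text, source_format).date().isoformat()

# Hypothetical lookup of known spelling variants; unknown labels pass through unchanged.
COUNTRY_VARIANTS = {"itally": "Italy", "greace": "Greece", "u.k.": "UK"}

change_log = []  # (original, corrected) pairs kept for future reference

def clean_country(label):
    """Collapse extra spaces, map known variants, and log any change made."""
    collapsed = " ".join(label.split())
    fixed = COUNTRY_VARIANTS.get(collapsed.lower(), collapsed)
    if fixed != label:
        change_log.append((label, fixed))
    return fixed

us_date = to_iso("4/9/2008", US)
eu_date = to_iso("24/10/2010", EU)
country = clean_country("Itally")
```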
Data deduplication is a process that reduces storage overhead by eliminating redundant copies of data and ensuring that storage media retain only unique instances of data. A duplicate record is one where the same piece of data has been entered more than once (Fig. 2.5). Duplicate records often occur when datasets have been combined or when it was not known that an entry already existed.
In educational organisations, data integration and correlation are essential activities related to data collection. Information obtained from multiple sources usually leads to duplicated data observations and inaccurate data. Duplicate elimination is therefore one of the most important steps in the data cleaning process. The procedure of detecting and eliminating duplicates from a particular data set is called deduplication.
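A minimal sketch of exact-match deduplication is shown below. It keeps the first occurrence of each record and returns the dropped repeats so that they can be reviewed. The function name `deduplicate` is an assumption, and the sample records are a few rows taken from the student performance table used in the questions of this section.

```python
def deduplicate(records):
    """Return (unique, dropped): exact repeats are removed, the first occurrence is kept."""
    seen = set()
    unique, dropped = [], []
    for record in records:
        key = tuple(record)
        if key in seen:
            dropped.append(record)
        else:
            seen.add(key)
            unique.append(record)
    return unique, dropped

records = [
    ("5/17/2006", "Martha", 79, 83, 88, "Italy"),
    ("12/12/2009", "Mary", 59, 55, 53, "USA"),
    ("4/17/2007", "Bill", 84, 78, 90, "UK"),
    ("5/17/2006", "Martha", 79, 83, 88, "Italy"),
    ("12/12/2009", "Mary", 59, 55, 53, "USA"),
]
unique, dropped = deduplicate(records)
```

Exact matching only catches identical rows. Records that differ by a typo or a date format need to be made consistent first, or matched with a looser rule.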
According to the Crowdflower Data Science Report 2016, data scientists spend the most time collecting and cleaning data (Fig. 2.6). Messy data is by far the most time-consuming aspect of the typical data scientist’s workflow.
The point with data is that it needs to be regularly maintained to ensure that it remains clean and crystal clear (Ronald van Loon, 2018).
Much of the data may be unstructured, noisy and in need of thorough cleansing and preparation before it is ready to yield working insights (Big Data expert Bernard Marr, 2017).
Questions and Teaching Materials
1. Now that Alice has collected the necessary parental consent for her intervention, the flipped classroom course is up and running.
After running the online course for three weeks, Alice tracks her students’ activity in the online learning environment. Thus, she also collects data related to students’ engagement, behaviour and performance in the LMS e.g. time spent in the platform, the videos her students watched, their progress in the online course, downloaded files, their online quiz scores, their participation in the forum as well as interaction among them.
Before proceeding further, Alice confirms that the collected data meets basic quality characteristics. She watches the video “Data Wrangling for Faster, More Accurate Analysis”. Thus, she examines and verifies the educational data against different quality measures. Inconsistencies in data, like missing pieces, errors, even differences in how the same value is expressed, produce inaccurate results.
- True
- False
Correct answer: True
2. Alice has collected educational data from various sources (data from online learning environments, data from state tests, demographic data, data from management information systems, from open educational resources and much more) and she wants to unify the datasets in order to reveal the big picture.
Alice soon realizes that the data, coming from various sources in diverse formats, is quite messy, containing missing values, outliers, and duplicate instances. To obtain a consistent database, free from any sort of discrepancies, data cleaning is required so as to detect erroneous or irrelevant data and discard it.
In the framework of data cleaning, as defined by Maletic and Marcus (2000) and presented in Fig. 2.1, the following three phases define a data cleansing process.
Help Alice to arrange the phases in the right order:
- A. Correct the uncovered errors
- B. Define and determine error types
- C. Search and identify error instances
Correct answer: B – C – A
3. Alice has collected data from the Learning Management System and she realizes that some users accessed her course just once (in error, or in order to see one specific resource or to do an activity) but never returned to the course later.
What would you suggest Alice do in order to handle the missing values?
- A. Use a label, like “null” (unspecified) or “?” (missing).
- B. Use a substitute value, like the attribute mean or the mode.
- C. Use regression to determine the most probable value with which to fill the missing value.
- D. Remove these learners from the dataset.
Correct answer: D
4. Alice has extracted the following dataset containing file download data from the school’s Learning Management System.
Student | File1.pdf | File2.pdf | File3.pdf | File4.pdf | File5.pdf | File6.pdf | File7.pdf | File8.pdf | File9.pdf |
---|---|---|---|---|---|---|---|---|---|
Student1 | 2 | 1 | 0 | 2 | 1 | 1 | 0 | 1 | 2 |
Student2 | 1 | 3 | 2 | 1 | 1 | 1 | 2 | 1 | 1 |
Student3 | 1 | 1 | 2 | 1 | 1 | 0 | 1 | 2 | 3 |
Student4 | 12 | 14 | 18 | 20 | 16 | 15 | 14 | 12 | 9 |
Student5 | 1 | 0 | 1 | 2 | 1 | 2 | 1 | 0 | 2 |
Student6 | 1 | 2 | 1 | 1 | 1 | 1 | 3 | 2 | 1 |
Student7 | 0 | 1 | 2 | 3 | 1 | 1 | 1 | 2 | 1 |
Student8 | 1 | 1 | 0 | 1 | 2 | 2 | 1 | 0 | 2 |
Student9 | 1 | 1 | 2 | 1 | 1 | 1 | 3 | 2 | 1 |
Student10 | 1 | 0 | 1 | 2 | 3 | 1 | 1 | 2 | 1 |
Student11 | 16 | 15 | 14 | 12 | 9 | 12 | 11 | 10 | 8 |
Student12 | 1 | 2 | 1 | 0 | 2 | 1 | 0 | 1 | 2 |
Student13 | 1 | 1 | 3 | 2 | 1 | 1 | 2 | 1 | 1 |
Student14 | 1 | 1 | 1 | 2 | 1 | 0 | 1 | 2 | 3 |
Student15 | 1 | 0 | 1 | 2 | 1 | 2 | 1 | 0 | 2 |
Student16 | 1 | 2 | 1 | 1 | 1 | 1 | 3 | 2 | 1 |
Student17 | 0 | 1 | 2 | 3 | 1 | 1 | 1 | 2 | 1 |
Student18 | 1 | 0 | 1 | 2 | 1 | 2 | 1 | 0 | 2 |
Student19 | 1 | 2 | 1 | 1 | 1 | 1 | 3 | 2 | 1 |
Student20 | 0 | 1 | 2 | 3 | 1 | 1 | 1 | 2 | 1 |
Student21 | 2 | 1 | 0 | 2 | 1 | 0 | 1 | 2 | 1 |
Student22 | 1 | 3 | 2 | 1 | 1 | 2 | 1 | 1 | 1 |
Student23 | 1 | 1 | 2 | 1 | 0 | 1 | 2 | 3 | 1 |
She can easily identify two outliers (Student4 and Student11). Help Alice to decide what to do with these outliers, in order to proceed with the data analysis. These outliers:
- A. are errors and should be eliminated in order to proceed.
- B. are true observations and should not be eliminated.
Correct answer: B
5. Alice participates in an International Conference on Teaching and Learning. Therefore, she must prepare a review of students’ performance from 6 different countries in three main subjects, namely Maths, English, and Science.
Students’ performance data from 6 different countries are collected in the following table.
No. | Date of Birth | Student | Maths | English | Science | Country |
---|---|---|---|---|---|---|
1 | 4/9/2008 | Richard | 95 | 68 | 96 | USA |
2 | 9/10/2007 | David | 65 | 78 | 70 | UK |
3 | 12/12/2009 | Mary | 59 | 55 | 53 | USA |
4 | 6/12/2010 | Ann | 97 | 99 | 98 | France |
5 | 8/13/2011 | Elen | 100 | 97 | 98 | Greece |
6 | 11/14/2010 | Catherine | 67 | 59 | 70 | UK |
7 | 9/14/2005 | James | 54 | 67 | 63 | USA |
8 | 5/17/2006 | Martha | 79 | 83 | 88 | Italy |
9 | 4/17/2007 | Bill | 84 | 78 | 90 | UK |
10 | 8/18/2007 | Phil | 45 | 78 | 55 | USA |
11 | 9/18/2008 | James | 75 | 83 | 88 | Itally |
12 | 10/19/2009 | Tom | 85 | 89 | 92 | Greece |
13 | 6/19/2010 | Joe | 9,4 | 9,7 | 9,1 | UK |
14 | 9/20/2029 | Jill | 49 | 60 | 53 | Canada |
15 | 5/17/2006 | Martha | 79 | 83 | 88 | Italy |
16 | 12/12/2009 | Mary | 59 | 55 | 53 | USA |
17 | 24/10/2010 | Tony | 96 | 79 | 100 | Italy |
18 | 8/24/2006 | Lisa | 79 | −75 | 69 | UK |
19 | 5/25/2004 | Robert | 97 | 83 | 90 | USA |
20 | 4/25/2029 | Michael | 100 | 89 | 55 | Italy |
21 | 25/6/2007 | Rose | 67 | 97 | 88 | Greace |
22 | 8/26/2008 | Sofia | 54 | 60 | 92 | UK |
23 | 9/26/2009 | Jim | 97 | 88 | 67 | Greece |
24 | 4/26/2006 | Betty | 60 | 92 | 54 | France |
Alice soon realises that the key to finding the inconsistencies is to create a filter. The filter will allow her to see all of the unique values in the column, making it easier to isolate the incorrect values. (Source: https://edu.gcfglobal.org/en/excel-tips/a-trick-for-finding-inconsistent-data/1/).
After carefully examining this table, help Alice to select the inconsistencies you have identified:
- A. negative values for students’ grades
- B. different data formats
- C. typos in dates
- D. differences in spaces
- E. different grades’ scale
- F. typos in country data
- G. differences in capitalisation
Correct answers: A, B, C, E, F. In our example, we can identify the following inconsistencies: in row 21 Greece is misspelled and in row 11 Italy is written with a double l; in row 18 there is a negative value for the grade in English; in row 13 the grades are on a different scale; in rows 14 and 20 the dates are out of range; and in rows 17 and 21 the dates are in a different format (DD/MM instead of MM/DD).
6. Alice participates in an International Conference on Teaching and Learning. Therefore, she must prepare a review of students’ performance from 6 different countries in three main subjects, namely Maths, English, and Science.
Students’ performance data from 6 different countries are collected in the following table.
No. | Date of Birth | Student | Maths | English | Science | Country |
---|---|---|---|---|---|---|
1 | 4/9/2008 | Richard | 95 | 68 | 96 | USA |
2 | 9/10/2007 | David | 65 | 78 | 70 | UK |
3 | 12/12/2009 | Mary | 59 | 55 | 53 | USA |
4 | 6/12/2010 | Ann | 97 | 99 | 98 | France |
5 | 8/13/2011 | Elen | 100 | 97 | 98 | Greece |
6 | 11/14/2010 | Catherine | 67 | 59 | 70 | UK |
7 | 9/14/2005 | James | 54 | 67 | 63 | USA |
8 | 5/17/2006 | Martha | 79 | 83 | 88 | Italy |
9 | 4/17/2007 | Bill | 84 | 78 | 90 | UK |
10 | 8/18/2007 | Phil | 45 | 78 | 55 | USA |
11 | 9/18/2008 | James | 75 | 83 | 88 | Italy |
12 | 10/19/2009 | Tom | 85 | 89 | 92 | Greece |
13 | 6/19/2010 | Joe | 94 | 97 | 91 | UK |
14 | 9/20/2009 | Jill | 49 | 60 | 53 | Canada |
15 | 5/17/2006 | Martha | 79 | 83 | 88 | Italy |
16 | 12/12/2009 | Mary | 59 | 55 | 53 | USA |
17 | 10/24/2010 | Tony | 96 | 79 | 100 | Italy |
18 | 8/24/2006 | Lisa | 79 | 75 | 69 | UK |
19 | 5/25/2004 | Robert | 97 | 83 | 90 | USA |
20 | 4/25/2009 | Michael | 100 | 89 | 55 | Italy |
21 | 6/25/2007 | Rose | 67 | 97 | 88 | Greece |
22 | 8/26/2008 | Sofia | 54 | 60 | 92 | UK |
23 | 9/26/2009 | Jim | 97 | 88 | 67 | Greece |
24 | 4/26/2006 | Betty | 60 | 92 | 54 | France |
After searching the web for answers, Alice finds out that she can identify duplicate rows in MS Excel by selecting Home > Conditional Formatting > Highlight Cells Rules > Duplicate Values.
Help Alice identify the duplicates. How many duplicates can you identify?
- A. None
- B. One pair of rows
- C. One triplet of rows
- D. Two pairs of rows
Correct answer: D
7. After reading the Crowdflower Data Science Report, Alice realises that mining data for patterns and refining algorithms are the two most time-consuming tasks of a data scientist’s workflow.
- True
- False
Correct answer: False.
8. ACTIVITY/PRACTICE QUESTION (Reflect on)
We encourage you to elaborate on your response about data cleaning in the following reflective task. You may reflect on:
1. Which factors contribute to inconsistencies in educational datasets generated from online courses?
2. How can we explain the existence of outliers in educational data?
2.2.2 Data to Describe Data (Metadata)
Metadata is usually defined as “data about data”. Johnson et al. (2018) provide the following definition of metadata: “It is information about a data set that is structured (often in machine-readable format) for purposes of search and retrieval. Metadata elements may include basic information (e.g., title, author, date created) and/or specific elements inherent to data sets (e.g., spatial coverage, time periods).”
However, in the context of education, metadata can more aptly be defined as tags used to describe educational assets.
Metadata helps to:
- organize,
- find, and
- understand data.
Metadata answers the following questions about data:
- Who created it?
- What is it?
- When was it created?
- How was it generated?
- Where was it created?
- How may it be used?
- Are there restrictions on it?
Practical examples of metadata are given by Kononow (2018) (Fig. 2.7): https://dataedo.com/kb/data-glossary/what-is-metadata
In Understanding Metadata, from the National Information Standards Organization, Riley (2017) distinguishes three types of metadata (see Fig. 2.8):
- Descriptive metadata
- Administrative metadata
- Structural metadata
Descriptive metadata can describe a learning asset or resource related to education — including learning standards, lessons, assessment items, books, etc. — for purposes such as identification, search and discovery. Descriptive metadata can be thought of as a keyword or tag on an asset that makes it easier to find. Examples include subject, grade level, and related skills and concepts.
Administrative metadata is used to manage a learning asset. Examples of this type of metadata include status, disposition, rights and licensing.
Structural metadata describes how data is organized or formatted and is often governed by a widely-adopted standard that ensures the data is accurately represented when exchanged and presented. Structural metadata enables content to be machine readable.
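To make the three types concrete, the sketch below describes one learning asset as a machine-readable record. Every field name and value is hypothetical and chosen only for illustration. It does not follow any particular metadata standard.

```python
import json

# Hypothetical learning asset; all field names and values are illustrative.
record = {
    "descriptive": {       # helps people find and identify the asset
        "title": "Fractions quiz",
        "subject": "Maths",
        "grade_level": "5",
        "keywords": ["fractions"],
    },
    "administrative": {    # helps manage the asset
        "status": "published",
        "licence": "CC BY-SA",
        "rights_holder": "Example School",
    },
    "structural": {        # describes how the asset is organised and formatted
        "format": "text/html",
        "parts": ["intro", "questions", "feedback"],
    },
}

def find_by_subject(records, subject):
    """Return the records whose descriptive metadata matches the subject."""
    return [r for r in records if r["descriptive"]["subject"] == subject]

# Serialising to JSON gives a form that other systems can read.
serialised = json.dumps(record, indent=2)
matches = find_by_subject([record], "Maths")
```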
Metadata are used for the purposes of:
- Discovery of information
- Identification of a resource
- Interoperability, i.e., exchange of content between systems
- Digital-object management, i.e., delivering the appropriate version
- Preservation, i.e., signalling when preservation actions should be undertaken
- Navigation within parts of items
Primary uses of the various metadata types are presented in Table 2.1 (adapted from Understanding Metadata, 2017).
The video from the National Archives of Australia “Meta… What? Metadata” (in the useful video resources) helps us understand the importance of metadata in order to describe, use, find and manage content and data.
The National Information Standards Organization describes data interoperability as “the effective exchange of content between systems. Interoperability relies on metadata describing that content so that the systems involved can effectively profile incoming material and match it to their internal structures.” You may also review the video “Learn More About Data Interoperability” (in the useful video resources).
Questions and Teaching Materials
1. Alice has heard of “metadata”, but she is not quite sure what it means or why she might need it. She downloaded this photo from pxhere.com, an online community sharing copyright-free images.
What information can Alice gather from the photo’s metadata? Match the questions in the first column with the values in the second column.
Question | Value |
---|---|
A. Who created the photo? | 1. Greater Flamingo |
 | 2. CC0 Public Domain |
 | 3. 12/1/2020 7:38 PM |
B. What is it? | 4. 7/11/2020 5:27 PM |
 | 5. Alice |
C. When was it created? | 6. Canon EOS 6D Mark II |
 | 7. 219 mm |
D. How was it generated? | 8. MARTIN TRNKA |
 | 9. sRGB |
E. What are the photo’s copyrights? | 10. ISO-200 |
 | 11. Digital Photo Professional |
Correct answer: A8 – B1 – C4 – D6 – E2
2. Open educational resources (OER) are freely accessible, openly licensed text, media, and other digital assets that are useful for teaching, learning, and assessing as well as for research purposes. The term OER describes publicly accessible materials and resources that any user may use, re-mix, improve and redistribute under some licenses.
OER repositories are repositories of open educational resources covering most educational disciplines. Open repositories are websites which house open books, textbooks, lectures, tutorials, quizzes and tests, case studies, assessment tools, images, syllabi, simulations, online courses and other resources of educational value.
Photodentro is the Greek National Learning Object Repository (LOR) for primary and secondary education. It hosts reusable learning objects (small, self-contained reusable units of learning). It is open to everyone: pupils, teachers, parents, as well as anybody else interested. The URL for accessing Photodentro LOR is http://photodentro.edu.gr/lor.
For the purpose of collecting learning material for the flipped classroom initiative, Alice has found the following Learning Object (LO) in the Photodentro OER repositories:
Alice is studying the Learning Object’s metadata page (http://photodentro.edu.gr/lor/r/8521/2705?locale=en) to find answers to the following questions:
1. What is the Subject Area of the LO?
- A. English Language > Literature – Art – Culture > Reading
- B. FOREIGN LANGUAGE
- C. B1-medium knowledge
- D. Lost in the Museum (mystery game)
Correct answer: A.
2. What are the Licence Terms of the LO?
- A. Creative Commons Attribution-NoDerivatives Greece 3.0
- B. Creative Commons Attribution-ShareAlike 3.0 International License
- C. Creative Commons Attribution-NonCommercial-ShareAlike Greece 3.0
- D. Creative Commons Attribution-NonCommercial-NoDerivatives Greece 3.0
Correct answer: C.
3. What is the Date of Publication?
- A. 02/09/2019
- B. 03/09/2019
- C. 7/12/2020
- D. 19/05/2013
Correct answer: D.
4. What is the File Size?
- A. 4.91 MB
- B. 12–15 MB
- C. 25 MB
- D. 8125 MB
Correct answer: A.
5. After watching the video “Meta… What? Metadata!” Alice realises one of the most common uses of metadata, which is to group content, making it more efficient to retrieve during a search.
- True
- False
Correct answer: True.
6. Alice watches the video from the League of Innovative Schools, “Learn More About Data Interoperability”, promoting the movement to advance data interoperability in public education.
In this video, data interoperability is defined as the seamless, safe and controlled exchange between applications, with clear standards for how to send and receive student information, privately and securely.
- True
- False
Correct answer: True
7. ACTIVITY/PRACTICE QUESTION (Reflect on)
We encourage you to elaborate on your response about metadata in the following reflective task. You may reflect on:
The advantages of enhancing educational data through data description.
2.2.3 The Significance of Data Curation
According to ICPSR (2018), “Through the curation process, data are organized, described, cleaned, enhanced, and preserved for public use, much like the work done on paintings or rare books to make the works accessible to the public now and in the future. Without curation, however, data can be difficult to find, use, and interpret” (Fig. 2.9).
Michael Stonebraker (2014) defines data curation as the process of turning independently created data sources (structured and semi-structured data) into unified data sets ready for analytics, using domain experts to guide the process. It involves:
- Identifying data sources of interest (whether from inside or outside the enterprise)
- Verifying the data (to ascertain its composition)
- Cleaning the incoming data (for example, 99,999 is not a legal zip code)
- Transforming the data (for example, from European date format to US date format)
- Integrating it with other data sources of interest (into a composite whole)
- Deduplicating the resulting composite data set.
Castanedo (2015), on the other hand, describes data curation as the process that involves data cleaning, schema definition/mapping, and entity matching to transform raw data into consistent data that can then be analysed. Schema definition/mapping is making associations among data attributes and features. Entity matching is finding data in different data sources that refer to the same entity. Entity matching is essential to remove duplicate records.
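Entity matching can be sketched as follows. This is a deliberately simple illustration and not Castanedo's method: it pairs records from two sources when a normalised name and date of birth agree. The function names, field names and sample records are all hypothetical, and real entity matching usually needs fuzzier comparison rules.

```python
def match_key(record):
    """Build a comparison key from a normalised name and the date of birth."""
    name = " ".join(record["name"].split()).casefold()
    return (name, record["dob"])

def match_entities(source_a, source_b):
    """Pair the records from two sources that share the same key."""
    index = {match_key(r): r for r in source_a}
    return [(index[match_key(r)], r) for r in source_b if match_key(r) in index]

# Hypothetical records from an LMS export and a student registry.
lms = [{"name": "Martha  Rossi", "dob": "2006-05-17", "quiz": 79}]
registry = [
    {"name": "martha rossi", "dob": "2006-05-17", "country": "Italy"},
    {"name": "Bill Jones", "dob": "2007-04-17", "country": "UK"},
]
pairs = match_entities(lms, registry)
```

Once records are paired, they can be merged into one entry per learner, which is how entity matching supports the removal of duplicate records.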
In this video, “ICPSR 101: What is Data Curation?” (in the useful video resources), ICPSR explains the intricacies of the work data processors do every day to find and fix issues in the data, ensuring their long-term availability and value to the research community.
Fig. 2.10, from the Digital Curation Centre (DCC), provides a graphical, high-level overview of the stages required for successful curation and preservation of data, from initial conceptualisation or receipt through the iterative curation cycle.
We can identify four full life cycle actions:
- Description and Representation
- Preservation Planning
- Community Watch and Participation
- Curate and Preserve
The outer cycle represents the sequential actions of the data curation process:
- Conceptualise
- Create or Receive
- Appraise and Select
- Ingest
- Preservation Action
- Store
- Access, Use and Reuse
- Transform
Digital curation is all about maintaining and adding value to a trusted body of digital information for future and current use; specifically, the active management and appraisal of data over the entire life cycle (Jisc, 2006).
You may also review the video “Data Curation @UCSB” (in the useful video resources) to see how the UCSB Library is considering a digital curation service to help preserve research data created across campus.
Now that we have completed the hard work of making our data tidy and meaningful, we will put in a little extra effort to preserve our valuable results.
Thus, we will discuss Digital Educational Data Preservation, which is considered a key task in the data curation process, to safeguard our unique educational data from being stolen, destroyed or simply lost.
Questions and Teaching Materials
1. Alice is studying the Data Curation Process to ensure that data is reliably retrievable for future reuse, and to determine what data is worth saving and for how long.
Help Alice match the following Data Curation processes to the appropriate Data Curation Phase.
Data curation process | Data curation phase |
---|---|
A. Cleaning | Phase 1: Organize |
B. Presenting | |
C. Annotating | |
D. Preserving | Phase 2: Enhance |
E. Collecting | |
F. Tagging | Phase 3: Reuse |
G. Deduplicating | |
H. Publishing |
Correct answer: A1-B3-C2-D3-E1-F2-G1-H3.
-
2.
Data Curation is not quite clear to Alice, so she watches the video from ICPSR (“ ICPSR 101: What is Data Curation? ”) explaining what data curation is all about. According to this video, the purpose of data curation is to ensure that people can find data now and in the future. This can be achieved by following the 5 steps of data curation.
Please help Alice to arrange the following steps in the right order:
-
A.
Find and fix issues with data
-
B.
Identify data in the scope of the archive
-
C.
Ensure that data will last forever (or at least for a very long time)
-
D.
Make data findable and usable
-
E.
Get data (convince the data owners to share it)
Correct answer: B-E-A-D-C
-
3.
Alice studies the Digital Curation Centre’s (DCC) Curation Lifecycle Model. According to this complex diagram, there are four full lifecycle actions and eight sequential actions of the data curation process.
Please help Alice to select only the full lifecycle data curation actions from the following list.
-
A.
Create or Receive
-
B.
Description and Representation
-
C.
Access, Use and Reuse
-
D.
Appraise and Select
-
E.
Preservation Planning
-
F.
Curate and Preserve
-
G.
Transform
-
H.
Community Watch and Participation
Correct answers: B, E, F, H
-
4.
The last step of the data curation cycle is to ensure that data will last forever (or at least for a very long time). Alice is anxious: how can digital records last “forever”? What if the technology becomes obsolete?
Thankfully, in the “Data Curation @UCSB” video that Alice just watched, Greg Janee, a Digital Library Research Specialist, claims that digital information is far more robust than paper.
Is Alice’s understanding correct?
-
Yes
-
No
-
Correct answer: No.
-
5.
ACTIVITY/PRACTICE QUESTION (Short answer)
Name some of the data curation actions described in this session.
-
6.
ACTIVITY/PRACTICE QUESTION (Reflect on)
We encourage you to elaborate on your response in the following reflective task. You may reflect on:
The significance of data curation in educational data management.
2.2.4 Storage Issues for Preserving Educational Data
As explained in the short Library of Congress video “Why Digital Preservation is Important for Everyone” (in the useful video resources), traditional information sources such as books, photos and sculptures can easily survive for years, decades or even centuries but digital items are fragile and require special care to keep them useable. Rapid technological changes also affect digital preservation. As new technologies appear, older ones become obsolete, making it difficult to access older content.
This video explores the complex nature of the problem, how digital content, unlike content on traditional media, depends on technology to make it available and requires active management to ensure its ongoing accessibility.
Preservation is no longer simply a concern for memory institutions in the long term but for everyone interested in using and accessing digital materials. The greater the importance of digital materials, the greater the need for their preservation: digital preservation protects investment, captures potential and transmits opportunities to future generations and our own. Digital materials – and the opportunities they create – are fragile (Digital Preservation Handbook, Digital Preservation Coalition, 2015).
Jisc (2006) defines Digital Preservation as “the series of actions and interventions required to ensure continued and reliable access to authentic digital objects for as long as they are deemed to be of value. This encompasses not just technical activities, but also all of the strategic and organisational considerations that relate to the survival and management of digital material”.
According to Principles and Good Practice for Preserving Data, “A sustainable preservation programme addresses organisational issues, technological concerns and funding questions” (Interuniversity Consortium for Political and Social Research (ICPSR), 2009). The simple questions to be answered are:
-
Organisational Issues: “What are the requirements and parameters for the organisation’s digital preservation programme?”
-
Technological Issues: “How will the organisation meet defined digital preservation requirements?”
-
Resources Issues: “What resources will be needed to develop and maintain the digital preservation programme?”
Figure 2.11 is based on Digital Preservation Handbook (Digital Preservation Coalition, 2015), and presents the most important aspects we need to consider and manage, so as to ensure an effective digital preservation process for our educational data.
Our main focus is not to drill down into the technical details of digital preservation, which are not part of an educator’s main role. It is nevertheless essential to gain an overview and understanding of them, so as to be able to collaborate effectively with the responsible technical team using a common language. Thus, next we will briefly discuss the issues involved in the effective digital preservation of educational data.
The first steps that need to be undertaken in order to begin to build or enhance the needed digital preservation activities are summarized in Fig. 2.12. You may review further detailed information in the Digital Preservation Handbook (Digital Preservation Coalition, 2015).
Special attention should be given to the following key technical elements of digital preservation, as specified in the USGS Guidelines (2014):
-
Storage & Geographic Location – Storage systems, locations, and multiple copies to prevent loss of data.
-
Data Integrity – Procedures to prevent, detect, and recover from unexpected or deliberate changes to data.
-
Information Security – Procedures to prevent human-caused corruption of data, deletion and unauthorized access.
-
Metadata – Documentation of the data to enable contextual understanding and long-term usability.
-
File Formats – File types, data structures, and naming conventions to aid long-term preservation and reuse.
-
Physical Media – Reduce obsolescence risks that can threaten the readability of physical media.
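The Data Integrity element above is often put into practice through checksums: a checksum is recorded for each file when it is stored, and recomputed later to detect changes. The following is a minimal Python sketch of that idea, not a procedure taken from the USGS Guidelines. The function names and the manifest layout (relative file path mapped to SHA-256 digest) are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Record a checksum for every file under `folder` (hypothetical layout)."""
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def verify_manifest(folder: Path, manifest: dict[str, str]) -> dict[str, str]:
    """Compare the current files with a stored manifest; report problems only."""
    problems = {}
    for name, expected in manifest.items():
        target = folder / name
        if not target.is_file():
            problems[name] = "missing"      # file was deleted or moved
        elif sha256_of(target) != expected:
            problems[name] = "changed"      # content no longer matches
    return problems
```

In such a setup, the manifest would be built once when the data is deposited, kept separately from the data itself, and checked on a regular schedule. An empty result from `verify_manifest` means that no recorded file is missing or altered.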
To assess an organization’s readiness, it is recommended that these components are checked against the National Digital Stewardship Alliance (NDSA) ‘Levels of Digital Preservation’ (Phillips et al., 2013):
-
Level 1 – protect your data
-
Level 2 – know your data
-
Level 3 – monitor your data
-
Level 4 – repair your data
Storage technology has changed dramatically over the last twenty years. Initially, the norm was storing data using discrete media items, such as CDs/DVDs and hard-disk drives. Today, it has become common practice to use IT storage systems for the increasingly large volumes of digital material that need to be preserved and to be easily and quickly retrievable (Digital Preservation Coalition, 2015).
At this point it is important to clarify the difference between backup and the digital preservation process. Backup refers to “short-term data recovery solutions following loss or corruption” (Jisc, 2006). Preservation storage systems “require a higher level of geographic redundancy, stronger disaster recovery, longer-term planning, and most importantly active monitoring of data integrity in order to detect unwanted changes such as file corruption or loss” (Digital Preservation Handbook).
The selected storage solution is of prime importance for digital preservation. When selecting the storage strategy there are several options we need to consider, such as Cost and Scalability, required Capacity, Security, Remote Access, Collaboration and Disaster Recovery. Legal provisions due to privacy or confidentiality may also influence our decision. Figure 2.13 summarizes the pros and cons of each of the two basic storage methods, on-premises servers (local infrastructure/data centres) and Cloud-based storage, as well as recommended actions to comply with the latest regulations (COMPARE THE CLOUD, 2018). You may also review the video “Public Cloud vs Private Cloud vs Hybrid Cloud” (in the useful video resources), which compares and contrasts public, private and hybrid clouds: the basic elements of each, the features and benefits that each delivers, and how each type meets specific business needs.
In their 2018 report, Data Management Life Cycle Final report, Miller and his colleagues recognise the demand for cost-effective storage technologies. “More and more organizations are considering outsourcing storage services or cloud storage options because the availability of cloud computing resources opens up possibilities for users to purchasing access to computing power and storage space as a service instead of maintaining it themselves. This way, providers are responsible for the performance, reliability, and scalability of the computing environment, while users can concentrate on data analysis and production”.
Nevertheless, security and privacy are significant concerns holding back use of the cloud, particularly for confidential, sensitive, or personally identifiable information. Let’s not forget the attack on Code Spaces, which led to data deletion and the eventual shutdown of the company.
The most common risks we need to consider include: downtime and service outages, since cloud computing systems are internet-based; vulnerability to external cyber-security attacks; compliance and legal issues, depending on the applicable regulation; lifetime costs that could end up being higher than expected; and limited control and flexibility, since the cloud infrastructure is owned, managed and monitored by the service provider.
Despite these concerns, the potential of cloud storage seems to outweigh the associated risks, which are expected to diminish over time. According to Gartner, “Through 2025, 99% of cloud security failures will be the customer’s fault” (Panetta, 2019) and “Organizations that do not have a high-level cloud computing strategy driven by their business strategy will significantly increase their risk of failure and wasted investment” (Cearley, 2017).
Whatever our choice, even a hybrid storage solution, we need to realize that storage technologies present several risks to long-term preservation of data. Moreover, “Many cases of content loss are not necessarily due to technical faults but can come from human error, lack of budget, or a failure to regularly monitor the integrity of the stored data” (Digital Preservation Coalition, 2015) (Fig. 2.14).
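Regular monitoring becomes more useful when several independent copies exist, because a damaged copy can be spotted by comparing it with the others. The sketch below illustrates one simple way to do this in Python, assuming that a checksum has already been computed for the same file at each storage location. The function name and the location labels are hypothetical, and the majority rule is an illustrative choice rather than a method prescribed by the Handbook.

```python
from collections import Counter


def divergent_copies(checksums: dict[str, str]) -> list[str]:
    """Return the storage locations whose checksum differs from the majority.

    `checksums` maps a location label to the checksum of one file as stored
    at that location. An empty list means that all copies agree.
    """
    if not checksums:
        return []
    counts = Counter(checksums.values())
    majority_value, majority_count = counts.most_common(1)[0]
    # If two different checksums are equally common, no copy can be trusted
    # automatically, so the decision is left to a person.
    if list(counts.values()).count(majority_count) > 1:
        raise ValueError("no majority among copies; manual review needed")
    return sorted(
        location
        for location, value in checksums.items()
        if value != majority_value
    )


# Hypothetical example: three copies, one of which has silently changed.
copies = {"school-server": "ab12", "cloud": "ab12", "offline-disk": "ff00"}
print(divergent_copies(copies))  # ['offline-disk']
```

With only two copies that disagree, the function raises an error instead of guessing, which is one reason why good practice guidance tends to recommend more than two copies.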
Let’s now take a closer look at security issues and particularly cybersecurity.
According to Digital Preservation Handbook, security issues relate to:
-
system security (e.g., protecting digital preservation and networked systems / services from exposure to external / internal threats),
-
collection security (e.g., protecting content from loss or change, the authorisation and audit of repository processes), and
-
the legal and regulatory aspects (e.g. personal or confidential information in the digital material, secure access, redaction).
When it comes to cybersecurity, protecting educational data requires both administrative and technological security measures, in order to prevent unauthorized parties from accessing it. In Fig. 2.15, you may review some of these countermeasures to create an effective defence against cyber-attacks.
In order to help schools protect against cyberthreats and develop effective security programs, there is also a really useful report on K-12 security risk methodology (Woody, 2004). It emphasizes that technology “is broadly used in the K-12 environment by many participants including administrators, teachers, parents, students, school board members, etc.” and that “while this enables a wide range of useful activities, the risk for inappropriate and illegal behaviour that violates privacy, regulations, and common courtesy is increasing exponentially”.
The thing that kept me awake at night (as NATO military commander) was cybersecurity. Cybersecurity proceeds from the highest levels of our national interest ... through our medical, our educational, to our personal finance (systems). (Admiral James Stavridis, Ret., former NATO Commander, in Cybersecurity and Digital Business Risk Management, 2020).
To this point we have provided an overview of the key issues of digital preservation and seen its importance for keeping our educational data usable over time. You may also review the video “How Toy Story 2 Almost Got Deleted: Stories From Pixar Animation: ENTV” (in the useful video resources), the (mostly) true story of how ‘Toy Story 2’ was almost deleted from Pixar Animation’s computers during the making of the film, and how the film was saved by one mom’s home computer!
Let us move forward to identify good practices and appropriate actions to collect the needed data, as well as to protect this data and safeguard its privacy, especially when it comes to sensitive educational data.
After all, “Data protection is all about protecting people – not just files and computer systems” (Moore Barlow, 2018).
Questions and Teaching Materials
-
1.
Following the discussion with the DPO about the school’s preservation strategy and policies, Alice starts wondering: “Is digital content so fragile, after all? Should I find out more about preservation issues to protect my course’s digital content?”
Alice accesses the video “Why Digital Preservation is Important for Everyone”.
She now understands that though traditional information sources can easily survive for years, decades and even centuries, digital items require special care to preserve them. More specifically, the digital items are fragile as they require special care to keep them usable, they are dependent as they depend on technology to make them available and require active management to ensure their ongoing accessibility.
Is this assumption True or False? Please select the right answer.
-
True
-
False
-
Correct answer: True
-
2.
Alice soon realises that she needs to seek “guidance on key issues and actions to consider when creating digital materials to ensure their longevity of active use and potential for long-term preservation” (Digital Preservation Handbook).
Please mark the correct key elements corresponding to each category of issues that Alice needs to address for digital preservation.
Organisational issues | Technological issues | Resources issues | |
---|---|---|---|
Integrity of Data over time | X | ||
Legal Compliance | X | ||
Budgets and Costs | X | ||
Balancing Security and Access | X | ||
Staffing and needed Skills | X | ||
Information Security | X | ||
Collaboration | X | ||
Facilities Required | X | ||
Metadata Standards | X | ||
Selection of Data to be Preserved | X | ||
Sustainable File Formats | X |
Correct answers: as marked with X above
-
3.
Alice is now investigating the key technical elements of digital preservation.
It’s a bit hard for her to deal with such technical issues. Are you ready to help her?
You may review the definitions of the key technical elements of digital preservation, presented on page 2 of the USGS Guidelines (2014).
Please match the appropriate definition (from the right column), to the respective technical element (in the left column).
1. Metadata | A. Basic recommendations to reduce obsolescence risks that can threaten the readability of physical media |
2. Physical Media | B. Storage systems, locations, and multiple copies to prevent loss of data |
3. Information Security | C. File types, data structures, and naming conventions to aid long-term preservation and reuse |
4. File Formats | D. Procedures to prevent human-caused corruption of data, deletion, and unauthorized access |
5. Storage & Geographic Location | E. Documentation of the data to enable contextual understanding and long-term usability |
Correct answers: 1-E, 2-A, 3-D, 4-C, 5-B
-
4.
Let’s go back to Alice. She gets informed by the responsible colleague about the hybrid storage solution used by the school. It’s a combination of local infrastructure/data centre and cloud-based storage. Moreover, as per her school guidelines for data storage good practice strategy, she needs to create multiple independent copies to stabilize her files. The copies are geographically separated in different locations, using different storage technologies and are actively monitored to ensure any problems are detected and corrected.
She wonders about the criteria that influenced the school’s decision making for the selected storage solution for digital preservation. Can you help her specify these selection criteria?
Please select the right answers.
-
A.
Collision
-
B.
Security
-
C.
Disaster Recovery
-
D.
Redundancy
-
E.
Cost
Correct answers: B, C, and E.
-
5.
Alice is now interested in learning more about cost-effective storage technologies and more specifically about storing data on the cloud. What is a cloud and why are there different types of clouds? She decides to watch again the video “Public Cloud vs Private Cloud vs Hybrid Cloud”.
Can you assist Alice in getting a deeper understanding of cloud-based storage?
Please select the right answer(s). You may select more than one answer.
-
A.
Clouds are smart, automated and adaptive
-
B.
Clouds are less efficient and cost-effective than traditional Data Centers.
-
C.
Public clouds are hosted by a cloud service provider and tenants pay for services they actually use.
-
D.
Private Clouds provide higher scalability and lower control.
-
E.
Hybrid clouds are a combination of both private and public clouds enabling the creation of new innovative apps with uncertain demand.
Correct answers: A, C, E
-
6.
After reading the article “Murder in the Amazon cloud” (Vadali, 2017), which presents the story of Code Spaces, where an attack led to data deletion and the eventual shutdown of the company, Alice is more concerned about storage security.
What tasks do the school and Alice herself need to carry out to keep the students’ data safe?
You may review again Fig. 2.15, as well as the Techniques for protecting information according to Digital Preservation Handbook.
Please select the right answer(s). You may select more than one answer.
-
A.
Strengthen software and operating systems.
-
B.
Do not abandon software when it becomes obsolete, you may need to reuse it.
-
C.
Use access controls to specify who is allowed to access digital material and the type of access that is permitted
-
D.
Train only the people whose security awareness is part of their duties.
-
E.
Build a short-term plan for security
-
F.
Use Encryption, a cryptographic technique which protects digital material by converting it into a scrambled form.
Correct answers: A, C, F
-
7.
Alice watches the video “How Toy Story 2 Almost Got Deleted: Stories From Pixar Animation: ENTV” and thinks “What an unbelievable story!”
She then starts laughing. The director could have avoided this “almost disaster” if he:
Please select the right answer.
-
A.
had not typed the command RM*
-
B.
had multiple independent copies of the digital material of the movie
-
C.
had used a combination of online and offline storage techniques for the copies of the digital material of the movie
-
D.
had kept the copies of the digital material of the movie geographically separated into different locations
-
E.
All the above.
Correct answer: E.
-
8.
ACTIVITY/PRACTICE QUESTION (Short answer)
Name some types of educational data that need long term preservation.
-
9.
ACTIVITY/PRACTICE QUESTION (Reflect on)
We encourage you to elaborate on your response in the following reflective tasks. You may reflect on:
-
1.
Storage issues for preserving educational data
-
2.
Good practices when preserving educational data
2.3 Educational Data Ethics
2.3.1 Informed Consent
The video “Introduction to data ethics” (in useful video resources) introduces the basic principles of data ethics.
As Pentland states when describing Big Data, “the ability to track, predict and even control the behaviour of individuals and groups of people is a classic example of Promethean fire: it can be used for good or ill” (Pentland, 2013).
New regulations, like the GDPR (General Data Protection Regulation) (Regulation (EU), 2016) that we will discuss later on, along with recent events such as the Cambridge Analytica and Facebook scandal, have raised awareness of data ethics issues that can arise from data misuse (Open Data Institute, 2018a).
The Open Data Institute (ODI) (Broad et al., 2017) defines Data Ethics as:
a branch of ethics that evaluates data practices with the potential to adversely impact on people and society – in data collection, sharing and use.
Several frameworks, policies and guidelines have been developed to address data ethics issues, including JISC’s code of practice (Shacklett, 2016), updated in 2018, the LACE (Learning Analytics Community Exchange) framework in 2016 and the ICDE (International Council for Open and Distance Education) Global guidelines (Slade & Tait, 2019). To help identify potential ethical issues associated with a data project or activity and the steps needed to act ethically, Open Data Institute has also designed the Data Ethics Canvas in 2018 (Open Data Institute, 2018b).
We will further discuss the basic common principles of these practices in Chap. 3.
As emphasized by Shacklock (2016), “Institutions should put in place clear ethical policies and codes of practices that govern the use of educational data. These policies should, at a minimum, address privacy, security of data and consent.”
Before proceeding further, the brief video “What is the GDPR?” (in useful video resources) provides an overview of the European Union data protection rules, also known as the EU General Data Protection Regulation (or GDPR), that have applied since 25 May 2018 to all entities who collect, store and process any personal data belonging to EU citizens and residents (even organisations that are not EU-based). The GDPR has strengthened the conditions for consent (GDPR.eu, 2019).
We will soon discuss this new regulation and how it should be applied by the various entities. First, let’s see what informed consent is all about.
Informed consent is declared by most international guidelines as one of the pivotal principles in Data Ethics and “is explicitly mentioned as a principle in article 7 of the International Covenant on Civil and Political Rights (1966), a United Nations Treaty” (European Commission, 2013).
According to Griffiths et al. (2016), “Informed consent refers to the requirement for an individual to give consent for the collection and analysis of the data which they generate”, while “Transparency refers to the degree to which users can observe the ways in which the data they generate is used”.
As per the European Commission’s report (2013) regarding Ethics for Researchers, “Informed consent consists of three components: adequate information, voluntariness and competence.”
Thus, prior to consenting, individuals should be clearly informed of the data collection goals, possible adverse impacts and the means available to them to refuse or withdraw consent, without consequences, at any time.
Moreover, individuals must be competent to understand the information and should be fully aware of the consequences of their consent. Greater attention is required for some special categories of people, such as children, vulnerable adults and people with certain cultural or traditional backgrounds.
At this point, it is important to understand the distinction between consent and informed consent. For informed consent, we need to ensure that individuals genuinely understand how we intend to use their data e.g., by running focus groups and/or publishing explanatory documents.
As per European Commission guidelines about GDPR, “when a company or organisation asks for consent to collect or reuse personal information, the data subjects have to make a clear action agreeing to this, for example by signing a consent form or selecting yes from a clear yes/no option on a webpage”…“It is not enough to simply opt out, for example by checking a box saying they don’t want to receive marketing emails. They have to opt in and agree to their personal data being stored and/or re-used for this purpose.”
European Commission emphasizes that informed consent means that before you consent, you must be given information about the processing of your personal data, including at least:
-
the identity of the organisation processing data;
-
the purposes for which the data is being processed;
-
the type of data that will be processed;
-
the possibility to withdraw consent;
-
where applicable, the fact that the data will be used solely for automated-based decision-making, including profiling;
-
information about whether the consent is related to an international transfer of your data, the possible risks of data transfers to countries outside the EU if those countries are not the subject of a Commission adequacy decision and there are no adequate safeguards.
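To make this checklist concrete, the items above can be treated as required fields of a consent notice, so that a draft form can be checked for gaps before it is sent out. The Python sketch below is only an illustration under that assumption: the class, the field names and the example values are hypothetical, and only the four items that always apply are included (the two “where applicable” items are left out for brevity). It is not a legal compliance check.

```python
from dataclasses import dataclass, fields


@dataclass
class ConsentNotice:
    """Information shown to a person before consent is requested.

    Field names are hypothetical; they mirror the items listed above.
    """
    controller_identity: str = ""    # who is processing the data
    purposes: str = ""               # why the data is being processed
    data_types: str = ""             # what type of data will be processed
    withdrawal_procedure: str = ""   # how consent can be withdrawn


def missing_items(notice: ConsentNotice) -> list[str]:
    """Return the names of required items that are empty or blank."""
    return [
        f.name for f in fields(notice)
        if not getattr(notice, f.name).strip()
    ]


# Hypothetical draft that forgets to explain how to withdraw consent.
draft = ConsentNotice(
    controller_identity="Example Secondary School",
    purposes="Survey on students' perceptions of technology",
    data_types="Questionnaire answers",
)
print(missing_items(draft))  # ['withdrawal_procedure']
```

A check of this kind only confirms that each item is present. Whether the wording is clear enough for the intended audience still has to be judged by a person.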
The way individuals are informed is crucial for the informed consent process. We should ensure that they fully realize the expected consequences of granting or withholding consent (Fig. 2.16).
With regards to the collection of personal data about children, additional protection should be granted since children are less aware of the risks and consequences of sharing data and of their rights.
In the U.S., the foundational federal law on student privacy, the Family Educational Rights and Privacy Act (FERPA), establishes student privacy rights by restricting with whom and under what circumstances schools may share students’ personally identifiable information. The Data Quality Campaign (DQC) has developed a tool that summarizes some of the main provisions of FERPA and can be used as a guide to help interested parties understand when they need to take a closer look at the law or consult an expert.
Under GDPR, any information addressed specifically to a child should be adapted to be easily accessible, using clear and plain language.
For most online services (e.g. social networking sites), the consent of the parent or guardian is required in order to process a child’s personal data on the grounds of consent, up to a certain age.
The age threshold for obtaining parental consent is established by each EU Member State and can be between 13 and 16 years, as indicated by the respective national Data Protection Authority.
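The age rule described above can be written as a small decision function. The Python sketch below assumes that the national threshold is supplied by the caller, since it differs between Member States and is not listed here. The function name is hypothetical, and the sketch covers only the age test for online services that rely on consent, not the other conditions of the GDPR.

```python
def needs_parental_consent(age: int, national_threshold: int) -> bool:
    """Return True if a parent or guardian must consent on the child's behalf.

    `national_threshold` is the age set by the relevant EU Member State,
    which must lie between 13 and 16 years inclusive.
    """
    if not 13 <= national_threshold <= 16:
        raise ValueError("national threshold must be between 13 and 16")
    if age < 0:
        raise ValueError("age cannot be negative")
    # Children below the threshold cannot give valid consent themselves.
    return age < national_threshold
```

For example, with a threshold of 16, a 15-year-old student would need parental consent while a 17-year-old would not.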
As per European Commission clarifications for the Rights for Citizens, “Companies have to make reasonable efforts, taking into consideration available technology, to check that the consent given is truly in line with the law. This may involve implementing age-verification measures such as asking a question that an average child would not be able to answer or requesting that the minor provides his parents’ email to enable written consent”.
Within the context of education, there are quite different approaches relating to the consent in collecting learners’ data, according to national guidelines (when available).
Figure 2.17 depicts the main principles and challenges that should be taken into consideration to comply with GDPR. As presented, a data-related activity can still be lawful, by complying with legal obligations such as the GDPR, even though the data may not be considered to be treated ethically. Sclater (2017) also argues that “consent is required for use of sensitive data and in order to take interventions directly with students on the basis of the analytics. This implies that if the data in question are not considered ‘sensitive’, and do not form the basis for any intervention, consent is not required (on the basis that this may be considered as of legitimate interest)”.
Moreover, as per the ICDE’s recent report (2019), many institutions seek consent, at the point of registration, to collect student data for additional purposes beyond institutional reporting and basic student support. As the report emphasizes, the “expectation that users should consent to uses of personal data unknown at the point of registration seems to be an unreasonable and unethical one.”
An alternative approach supported by most of the existing guidelines (Higher Education Commission, JISC’s code of practice, ICDE Global guidelines) might be to differentiate between the granting of initial consent for the collection of data and the obtaining of additional consent at the point where a specific personal intervention is proposed, or in the case where new data is incorporated into the institution’s system, or existing data is used in new ways.
As concluded in the ICDE report (2019), “national legislation will influence positions taken, but generally this principle (of consent) should be built around a minimum of informed consent (that is, transparency before registration).”
You may also review this video “Why develop a data science code of ethics?” (in useful video resources) where experts from the data science community explain why it’s important to have a code of ethics.
Questions and Teaching Materials
-
1.
After watching the video introducing Data Ethics Principles, “Introduction to Data Ethics”, Alice is really concerned. Companies are collecting so much data every day. According to the video, Google can track your searches on your individual devices, even if you are not logged in to your account, up to:
-
A.
7 days
-
B.
2 months
-
C.
6 months
-
D.
3 years
Correct answer: C
-
2.
Before using the flipped classroom initiative, Alice wants to study Grade 9 students’ perceptions of technology, using an online questionnaire she made with Google Forms.
Alice wants to prepare an informed parental consent form for her students (as they are under 15) in order to participate in the students’ perceptions of technology survey, but she is a bit confused with all this information.
Can you help Alice to have a better understanding?
-
A.
Prior to consenting, individuals should be clearly informed of how the data will be used
-
True
-
False
-
Correct answer: True
-
B.
When individuals give consent for the collection and analysis of the data which they generate, they cannot refuse or withdraw their consent
-
True
-
False
-
Correct answer: False
-
C.
The EU General Data Protection Regulation (GDPR) has applied since 25 May 2018, even to organisations that are not EU-based, as long as they collect, store and process any personal data belonging to EU citizens and residents.
-
True
-
False
-
Correct answer: True
-
3.
You give some advice to Alice in order to help her prepare the consent form for the students’ perceptions of technology study. Select all that apply.
A consent request must:
-
A.
Include contact details of the company processing the data
-
B.
Be anonymized
-
C.
Include information about the possibility of withdrawing consent
-
D.
Be freely given
-
E.
Be included in the terms and conditions
-
F.
Be presented in a formal language
-
G.
Specify the purpose of the data process
-
H.
Specify the type of data that will be processed
Correct answers: A, C, D, G, H
-
4.
Alice has a colleague, Betty, who has just come on board and wants to conduct an online survey with her 17-year-old students about their eating habits. Betty asks Alice if it is necessary to collect parental consent in order to process her students’ personal data.
Help Alice decide whether the consent of a parent or guardian is required in order to process the students’ personal data.
-
Yes
-
No
-
Correct answer: No.
-
5.
Alice’s secondary school relies upon the public task lawful basis under the GDPR (Article 6(1)(e)) to justify the processing of personal data, where processing is necessary for the performance of a task carried out in the public interest or in the exercise of official authority vested in the controller.
Is this lawful basis (public task basis) appropriate for Alice in order to take interventions directly with students on the basis of the participation data recorded within the Learning Management System?
Help Alice find the correct answer
-
Yes
-
No
-
Correct answer: No.
-
6.
In the video “Why develop a data science code of ethics?”, Paula Goldman, VP/Head of Omidyar Network’s Tech and Society Solutions Lab, claims that data and algorithms are neutral.
-
True
-
False
-
Correct answer: False.
-
7.
ACTIVITY/PRACTICE QUESTION (Reflect on)
We encourage you to elaborate on your response in the following reflective task. You may reflect on:
-
1.
What information must be given to individuals whose data is collected. You can search for additional information on the European Commission’s website.
-
2.
Using information from the European Commission website, create an infographic presenting the General Data Protection Regulation.
2.3.2 Sensitive Educational Data Protection
Balancing digital learning with privacy and security is essential to fostering a successful digital culture (iKeepSafe, 2017).
Privacy is a fundamental human right and a core value in the functioning of democratic societies. As already discussed in the previous topics, with the exponential progress in the field of information and communication technologies and in the light of rapid development of Educational Data Analytics on a global basis, new challenges to privacy and data protection have emerged.
The “Privacy Overview for K12 Teachers and Administrators” video (in useful video resources) provides us with an overview of the privacy issues that may arise and growing concerns about educational data privacy. Is educational data privacy over in the digital age?
In the Quantified Student infographic you can see what a day in the data-driven life of the most measured and monitored student in the history of education looks like.
“The data collection begins even before he steps into the school,” says Khaliah Barnes, director of the Student Privacy Project at the Electronic Privacy Information Center. “The issue is that this reveals specifically sensitive information,” says Barnes (Hill, 2014).
Moreover, as Jose Ferreira, CEO at Knewton (one of the biggest actors in the field of educational technology software), points out: “We literally know everything about what you know and how you learn best, everything.” Ferreira calls education “the world’s most data-mineable industry by far” (Hill, 2014).
Do educational data analytics challenge the principles of data protection? Is privacy a show-stopper? How is privacy guaranteed, especially when minors or sensitive data are involved?
The European position has been expressed in the European Commission’s report: “New Modes of Learning and Teaching in Higher Education” (European Commission, 2014). In recommendation 14, the Commission clearly stated: “Member States should ensure that legal frameworks allow higher education institutions to collect and analyse learning data. The full and informed consent of students must be a requirement and the data should only be used for educational purposes”, and in recommendation 15: “Online platforms should inform users about their privacy and data protection policy in a clear and understandable way. Individuals should always have the choice to anonymise their data.” This is a widely accepted framework mirrored in the laws of multiple nations and international organisations including many U.S. states (Drachsler & Greller, 2016).
Thus, it is essential that all educators understand how learners’ personal information is used and adequately protect learners’ data in order to strengthen the trust of all parties involved and encourage their participation in digital learning.
In the video by the Data Quality Campaign “Who Uses Student Data?” (in useful video resources), it is emphasized that most personal student information stays local. Districts, states, and the federal government all collect data about students for important purposes like informing instruction and providing information to the public. But the type of data collected, and who can access them, is different at each point.
As clearly stated in the Foundational Principles for Using and Safeguarding Students’ Personal Information, developed by a coalition of US national education organisations, “Everyone who uses student information has a responsibility to maintain the privacy and the security of students’ data, especially when these data are personally identifiable.”
The basic information security techniques, as specified by the Digital Preservation Handbook, include:
Encryption
-
Encryption is a cryptographic technique which protects digital material by converting it into a scrambled form. The use of a key is required to unscramble the data and convert it back to its original form.
Access Control
-
Access control enables an administrator to specify who is allowed to access digital material and the type of access that is permitted (for example read only, write).
Redaction
-
Redaction refers to the process of identifying and removing or replacing confidential or sensitive information, using anonymisation or pseudonymisation.
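The access control technique described above can be illustrated with a minimal sketch. The role names and the policy table are hypothetical and exist only for illustration; a real School Information System would rely on its own permission model.

```python
# Hypothetical policy: which actions each role may perform on student records.
ACCESS_POLICY = {
    "teacher": {"read"},                 # read only
    "school_admin": {"read", "write"},   # read and write
}


def is_allowed(role: str, action: str) -> bool:
    """Return True only if the role has been explicitly granted the action."""
    # Unknown roles get an empty set, so access is denied by default.
    return action in ACCESS_POLICY.get(role, set())


print(is_allowed("teacher", "read"))
print(is_allowed("teacher", "write"))
```

Denying access to any role that is not listed reflects the idea that the administrator specifies who is allowed in, rather than who is kept out.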
Now that we have a better understanding of the different types of data as categorized in terms of privacy, we will further review the levels of data as specified under GDPR.
Figure 2.18 presents the main categories of personal data as defined by GDPR.
We need to pay extra attention to sensitive data (special categories of personal data), since an organisation can only process this data under specific conditions (explicit consent may be needed). Even personal data, as clarified under GDPR, “should only be processed where it isn’t reasonably feasible to carry out the processing in another manner. Where possible, it is preferable to use anonymous data. Where personal data is needed, it should be adequate, relevant, and limited to what is necessary for the purpose (‘data minimisation’).”
Once data is truly anonymised, so that it no longer contains any identifying elements, the anonymisation is irreversible, and individuals are no longer identifiable, the data falls outside the scope of the GDPR and becomes easier to use.
Before anonymisation, we should consider the purposes for which the data is to be used. Anonymisation may devalue the data, so that it is no longer useful for specific purposes.
The ICO’s Code of Conduct on Anonymisation provides further guidance on anonymisation techniques (UCL, 2018). Unlike anonymisation, pseudonymisation replaces personally identifiable material with artificial identifiers. Pseudonymised personal data can still fall within the scope of the GDPR, depending on how difficult it is to attribute the pseudonym to a particular individual.
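As a minimal sketch of pseudonymisation, the example below replaces a student identifier with an artificial identifier derived from a keyed hash (HMAC-SHA256, from the Python standard library). The identifier format, the `stu_` prefix, and the key value are assumptions made for illustration and are not prescribed by GDPR or by the sources cited here.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be stored separately
# from the dataset and be accessible only to authorised staff.
SECRET_KEY = b"demo-key-keep-this-secret"


def pseudonymise(student_id: str, secret_key: bytes = SECRET_KEY) -> str:
    """Replace a real identifier with a stable artificial identifier."""
    digest = hmac.new(secret_key, student_id.encode("utf-8"), hashlib.sha256)
    # Keep a shortened hex digest so the pseudonym stays readable.
    return "stu_" + digest.hexdigest()[:16]


# The same student always maps to the same pseudonym, so records can
# still be linked across datasets without exposing the real identifier.
print(pseudonymise("S-1001") == pseudonymise("S-1001"))
```

Because anyone holding the key can recompute the mapping for a known identifier, data treated this way is pseudonymised rather than anonymised, which is why it can remain within the scope of the GDPR.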
Whether ‘de-identified’ or pseudonymised data is in use, there is a residual risk of re-identification. Anonymisation is often seen as the “easy way out” of data protection obligations. However, experts around the world are adamant that 100% anonymisation is not possible. Anonymised data can rather easily be de-anonymised when they are merged with other information sources (Drachsler & Greller, 2016).
Sweeney (2000) showed that it is possible to personally identify 87% of the U.S. population based on just three data points: five-digit ZIP code, gender and date of birth (Wes, 2018). Later, in 2006, AOL’s release of users’ search logs (Hansell, 2006) and the case of Searcher No. 4417749, as recorded in “A Face Is Exposed for AOL Searcher No. 4417749” by M. Barbaro and T. Zeller (2006) of The New York Times, became one of the first widely known cases of re-identification. The Netflix case followed in 2007 (Narayanan & Shmatikov, 2008), when researchers de-anonymised some of the Netflix data by matching ratings and timestamps with public information on the Internet Movie Database. As per Hill (2012), in 2012 the retail company Target, using behavioural advertising techniques, managed to identify a pregnant teen girl from her web searches and sent her relevant vouchers at home (D’Acquisto et al., 2015).
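The re-identification risk behind Sweeney’s observation can be checked on a dataset before release. The following sketch counts how many records are unique on a chosen combination of quasi-identifiers (ZIP code, gender and date of birth). The four records are invented toy data, and the function name is hypothetical.

```python
from collections import Counter


def unique_fraction(records, quasi_identifiers):
    """Fraction of records whose quasi-identifier combination is unique."""
    if not records:
        return 0.0
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)


# Toy data: names already removed, quasi-identifiers still present.
records = [
    {"zip": "02138", "gender": "F", "dob": "2008-03-14"},
    {"zip": "02138", "gender": "F", "dob": "2008-03-14"},
    {"zip": "02139", "gender": "M", "dob": "2008-07-02"},
    {"zip": "02140", "gender": "F", "dob": "2007-11-30"},
]

print(unique_fraction(records, ["zip", "gender", "dob"]))  # 0.5
```

Here two of the four records are unique on the three fields, so anyone who knows those three facts about a student could single out that student’s record even though no names are present.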
Thus, though de-identification techniques can reduce the risks to the data subjects concerned and help organisations to meet their data-protection obligations, we need to assess properly the adequacy of these methods so as to decide whether further steps to de-identify the data are necessary (UCL, 2018).
The GDPR introduces two new principles: data protection by design and data protection by default, whose definitions are presented in Fig. 2.19.
As specified in GDPR (Regulation (EU), 2016), the protection of the rights and freedoms of natural persons with regard to the processing of personal data requires that appropriate technical and organisational measures be taken which meet in particular the principles of data protection by design and data protection by default.
“Data protection by design minimises privacy risks and increases trust”, while “Data protection by default entails ensuring that your company always makes the most privacy friendly setting the default setting” (European Union, 2018).
An example of data protection by design is the use of pseudonymisation and encryption, while examples of data protection by default include “data minimisation” (only the data necessary should be processed), limited accessibility, and a short storage period.
Let’s now review further the privacy by design strategies and the storage privacy (Data protection by design), as well as the Storage Limitation (Data protection by default).
Figure 2.20 depicts eight Privacy By Design Strategies, as proposed by the European Union Agency for Network and Information Security (D’Acquisto et al., 2015). These strategies enable us to identify the data protection and privacy requirements early in the educational analytics value chain and subsequently to implement the necessary technical and organizational measures. One of the most significant privacy enhancing technologies that can be used for implementing such strategies is storage privacy.
Privacy challenges should be seen as opportunities that, if appropriately handled, can build trust in the big data ecosystem for the benefit of both users and the big data industry (D’Acquisto et al., 2015).
Danezis et al. (2014), in their report “Privacy and Data Protection by Design”, define Storage Privacy as “the ability to store data without anyone being able to read (let alone manipulate) them, except the party having stored the data (called here the data owner) and whoever the data owner authorises.”
As specified further in the report, “a major challenge to implement private storage is to prevent non-authorised parties from accessing the stored data. If the data owner stores data locally, then physical access control might help, but it is not sufficient if the computer equipment is connected to a network: a hacker might succeed in remotely accessing the stored data. If the data owner stores data in the cloud, then physical access control is not even feasible.”
A straightforward option for storage privacy is storing the data, either locally or in cloud storage, in encrypted form. One can use full disk encryption (FDE) or file system-level encryption (FSE). As clarified in the report, “encryption and decryption operations must be carried out locally, not by remote service, because both keys and data must remain in the power of the data owner if any storage privacy is to be achieved.” The report further specifies that outsourced data storage on remote clouds is practical and relatively safe as long as only the data owner, not the cloud service, holds the decryption keys, and that such storage may be distributed for added robustness to failures.
When it comes to data protection by default, storage limitation is one of the key conditions for processing personal data under GDPR. It answers a simple question: “For how long can data be kept and is it necessary to update it?” The Regulation’s answer is straightforward: “You must ensure that personal data is stored for no longer than necessary for the purposes for which it was collected”. There are 6 basic guidelines, specified clearly by GDPR, which you need to take into consideration when storing personal data (Fig. 2.21).
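A storage limitation policy can be supported by a periodic check for records that have outlived their retention period. The sketch below assumes a hypothetical retention period of 365 days and invented records; GDPR does not fix a number of days, so the period must come from the purpose for which the data was collected.

```python
from datetime import date, timedelta


def records_past_retention(records, retention_days, today):
    """Return the records collected before the retention cut-off date."""
    cutoff = today - timedelta(days=retention_days)
    return [r for r in records if r["collected_on"] < cutoff]


# Toy records; the field names are assumptions for illustration.
records = [
    {"id": "r1", "collected_on": date(2023, 1, 10)},
    {"id": "r2", "collected_on": date(2024, 5, 1)},
]

overdue = records_past_retention(records, retention_days=365, today=date(2024, 6, 1))
print([r["id"] for r in overdue])  # ['r1']
```

Records flagged in this way would then be reviewed and either deleted, anonymised, or retained with a documented justification.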
Before closing this chapter, it is essential to analyse individuals’ rights. The main reason for the introduction of GDPR is to allow European Union citizens to better control their personal data. More specifically, it is designed to:
-
Harmonize data privacy laws across Europe,
-
Protect and empower all EU citizens’ data privacy
-
Reshape the way organisations across the region approach data privacy.
GDPR applies to “all companies operating in the EU, wherever they are based” (European Commission, 2018). The GDPR introduces stronger rights for data subjects (Intersoft Consulting, 2018), and creates new obligations for data controllers (the person or body handling the personal data).
Figure 2.22 presents the rights that give individuals control over their personal data under GDPR. To exercise these rights, individuals should contact the company or organisation processing their personal data, also known as the controller. If the company/organisation has a Data Protection Officer (‘DPO’), they may address their request to the DPO. The company/organisation must respond to such requests without undue delay and at the latest within 1 month.
When the personal data, for which a company/organisation is responsible, is disclosed, either accidentally or unlawfully, to unauthorised recipients or is made temporarily unavailable or altered, a data breach occurs. In case a data breach occurs and the breach poses a risk to individual rights and freedoms, the company/organisation should notify its Data Protection Authority (DPA) within 72 hours after becoming aware of the breach. Depending on whether or not the data breach poses a high risk to those affected, a business may also be required to inform all individuals affected by the data breach (European Commission, 2018h).
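The 72-hour notification window mentioned above is simple arithmetic on the moment the organisation becomes aware of the breach. The following sketch uses the Python standard library; the function name and the example timestamp are hypothetical.

```python
from datetime import datetime, timedelta, timezone

NOTIFICATION_WINDOW = timedelta(hours=72)


def notification_deadline(aware_at: datetime) -> datetime:
    """Latest time to notify the DPA, counted from awareness of the breach."""
    return aware_at + NOTIFICATION_WINDOW


# Example: the school becomes aware of a breach on 1 March 2024, 09:00 UTC.
aware_at = datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc)
print(notification_deadline(aware_at).isoformat())  # 2024-03-04T09:00:00+00:00
```

The clock starts when the organisation becomes aware of the breach, not when the breach itself occurred, so the time of awareness should be recorded as part of the incident log.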
Whenever processing is likely to result in a high risk to the rights and freedoms of individuals, as specified by GDPR, a Data Protection Impact Assessment (DPIA) is required. A DPIA is required at least in the following cases:
-
a systematic and extensive evaluation of the personal aspects of an individual, including profiling;
-
processing of sensitive data on a large scale;
-
systematic monitoring of public areas on a large scale.
National Data Protection Authorities, in collaboration with the European Data Protection Board, may provide lists of cases where a DPIA would be required. As emphasized, “the DPIA should be conducted before the processing and should be considered as a living tool, not merely as a one-off exercise. Where there are residual risks that can’t be mitigated by the measures put in place, the DPA must be consulted prior to the start of the processing”.
Figure 2.23 provides the 3 Basic Steps to Identify and Protect Sensitive Data, as per Krueger (2017).
A DPIA should be conducted as early as possible in the project lifecycle, so that its findings and recommendations can be incorporated into the design of the processing operation (itgovernance).
You may also review the video “Protecting Student-Data Privacy: An Expert’s View” (see useful video resources) where Fordham University Law Professor Joel Reidenberg talks with Education Week Correspondent John Tulenko about student data and the best ways to keep it secure.
Questions and Teaching Materials
-
1.
Alice is a bit confused. Several state and federal laws require privacy protection for students and children. In the video she just watched, “ Privacy Overview for K12 Teachers and Administrators ”, what laws are mentioned concerning data privacy for children?
There is more than one correct answer. Help Alice select the right ones
-
A.
FERPA
-
B.
CIPA
-
C.
COPPA
-
D.
CAPTA
Correct answers: A, C
-
2.
From watching the “ Who Uses Student Data? ” video, Alice understands that teachers have access only to de-identified data (i.e. information about individual students but with identifying information removed).
Is Alice’s understanding correct?
Please select the correct answer:
-
Yes
-
No
-
Correct answer: No.
-
3.
For the purposes of research, Alice intends to release student data.
Alice asks the responsible DPO to inform her about the school’s policy and guidelines for protecting students’ data privacy, confidentiality, integrity and security. She becomes aware of personal and sensitive data handling and the use of anonymisation and pseudonymisation to remove personally identifiable information.
As student data might be released for the purposes of research, all names, postal codes and other identifiable data are removed. Completely removing fields that could be used in any way to identify a person is considered a strong form of
-
A.
data pseudonymisation
-
B.
data anonymisation
Please select the correct term to complete the sentence.
Correct answer: B
-
4.
Alice has concerns about her students’ records, and more specifically about medical reports related to students’ learning difficulties being accessed by unauthorized third persons. She contacts the responsible DPO and is informed about the appropriate technical and organisational measures taken by the school, so as to secure data protection by design and by default.
More specifically, the DPO explains to Alice that the School Information System (SIS) has a mechanism for comprehensively logging who consulted the medical reports and preventing unauthorized access to these sensitive data. Moreover, personal and sensitive data are pseudonymised and “data minimisation” (only the data necessary should be processed) is used.
Alice feels secure because the technical and organisational measures being taken meet in particular the principles of data protection by design and data protection by default.
Is Alice correct in feeling secure?
Please select the correct answer:
-
Yes
-
No
-
Correct answer: Yes
-
5.
Storage privacy is about preventing non-authorized parties from accessing the stored data. This can be achieved only when encryption and decryption operations are carried out locally, not by remote service, because both keys and data must remain in the power of the data owner.
Alice assumes that if any storage privacy is to be achieved, then data must be stored locally and cloud storage should be avoided.
Do you agree with Alice’s assumption?
Please select the correct answer:
-
Yes
-
No
-
Correct answer: No.
-
6.
Alice’s institution runs a recruitment office and for that purpose it collects CVs and keeps records of persons seeking employment. The office keeps recruitment application forms and interview notes (for unsuccessful candidates) for 5 years in case they are needed, without taking any measures to update the CVs.
Alice doubts that the storage period is proportionate to the purpose of finding employment and thinks that this is not compliant with GDPR. Do you agree with Alice?
You may review “For how long can data be kept and is it necessary to update it? | European Commission (europa.eu)”.
Please select the correct answer:
-
Yes
-
No
-
Correct answer: Yes.
-
7.
Alice is trying to understand the rights for data subjects described in GDPR. She reviews “Data protection and online privacy – Your Europe (europa.eu)” and “It’s your data – take control – Data protection in the EU (europa.eu)”.
Help Alice match the cases to the appropriate individual right.
Case | Individual Right |
---|---|
A. You’ve bought goods from an online retailer. You can ask the company to give you the personal data they hold about you, including: your name and contact details, credit card information and dates and types of purchases. | 1. Right to object |
B. You bought two tickets online to see your favorite band play live. Afterwards, you’re bombarded with adverts for concerts and events that you’re not interested in. You inform the online ticketing company that you don’t want to receive further advertising material. | 2. Right to rectification |
C. You apply for a new insurance policy but notice the company mistakenly records you as a smoker, increasing your life insurance payments. | 3. Right to be forgotten |
D. When you type your name into an online search engine, the results include links to an old newspaper article about a debt you paid long ago. | 4. Right of Access |
E. You apply for a loan with an online bank. You are asked to insert your data and the bank’s algorithm tells you whether the bank will grant you the loan and gives the suggested interest rate. | 5. Right to data portability |
F. You’ve found a cheaper electricity supplier. You ask your existing supplier to transmit your data directly to the new supplier, if it’s technically feasible or to return your data to you in a commonly-used and machine readable format so that it can be used on other systems. | 6. Rights related to automated decision making |
Correct answer: A4 – B1 – C2 – D3 – E6 – F5.
-
8.
The recruitment office at Alice’s institution decides to implement an innovative recruitment procedure which includes e-recruitment tools that automatically pre-select or exclude candidates without human intervention. Alice thinks that a Data Protection Impact Assessment (DPIA) is required.
Study the “Decision of the European Data Protection Supervisor of 16 July 2019 on DPIA Lists issued under Articles 39(4) and (5) of Regulation (EU)” and select the “Criteria for processing ‘likely to result in high risk’” that will trigger a DPIA in the case of the new recruitment procedure at Alice’s institution (select 3 criteria).
Which are the criteria for processing “likely to result in high risk”?
-
1.
Systematic and extensive evaluation of personal aspects or scoring, including profiling and predicting.
-
2.
Automated-decision making with legal or similar significant effect: processing that aims at taking decisions on data subjects
-
3.
Systematic monitoring: processing used to observe, monitor or control data subjects, especially in publicly accessible spaces. This may cover video-surveillance but also other monitoring, e.g. of staff internet use.
-
4.
Sensitive data or data of a highly personal nature: data revealing ethnic or racial origin, political opinions, religious or philosophical beliefs, trade-union membership, genetic data, biometric data for uniquely identifying a natural person, data concerning health or sex life or sexual orientation, criminal convictions or offences and related security measures or data of highly personal nature.
-
5.
Data processed on a large scale, whether based on number of people concerned and/or amount of data processed about each of them and/or permanence and/or geographical coverage
-
6.
Datasets matched or combined from different data processing operations performed for different purposes and/or by different data controllers in a way that would exceed the reasonable expectations of the data subject.
-
7.
Data concerning vulnerable data subjects: situations where an imbalance in the relationship between the position of the data subject and the controller can be identified.
-
8.
Innovative use or applying technological or organisational solutions that can involve novel forms of data collection and usage. Indeed, the personal and social consequences of the deployment of a new technology may be unknown.
-
9.
Preventing data subjects from exercising a right or using a service or a contract.
Correct answer: 1, 2, 8.
-
9.
According to Professor Joel Reidenberg, in the video “Protecting Student-Data Privacy: An Expert’s View”, the worst that could happen because of bad data practices is:
-
A.
Students being used as guinea pigs for the development of commercial products
-
B.
Educational harm to children, where they are being improperly labelled
-
C.
The development of programs that assess teachers’ performance
-
D.
The development of flexible mechanisms so parents can consent and opt-in to additional uses of data
Correct answer: B.
-
10.
ACTIVITY/PRACTICE QUESTION (Reflect on)
We encourage you to elaborate on your response in the following reflective task. You may reflect on:
-
1.
Privacy issues for preserving educational data
-
2.
Educational data protection
2.4 Concluding Self-Assessed Assignment
2.4.1 Introduction
Both Alice and you have come a long way in your understanding of the power of educational data as a key success factor for online and blended teaching and learning, as well as of the fundamentals of Educational Data Collection and Management, including issues related to ethics and privacy.
You are now ready to develop further your Educational Data Literacy Competences focusing on Educational Data Analysis, Comprehension and Interpretation.
In order to proceed, you are requested to complete a concluding self-assessed assignment. This self-assessed assignment is a real life scenario activity (based on the use case of our teacher Alice), using a rubric across three proficiency levels and an exemplary solution rating. When you have completed this assignment, you will assess it yourself, following the rubric which will list the criteria required and give guidelines for the assessment.
This self-assessed assignment procedure consists of 5 steps:
-
Step 1. Real life scenario
-
Step 2. Getting familiar with the assessment rubric
-
Step 3. Prepare your answer
-
Step 4. Review a sample solution
-
Step 5. Self-evaluate your answer
2.4.2 Step 1. Real Life Scenario
Alice is an enthusiastic English Language teacher who has just been appointed to an Experimental High School in Athens, Greece. She wants to use student data to gain insights and plan her teaching activities accordingly, so as to improve this year’s Grade 9 students’ academic performance.
Alice contacts Mr. Adams, appointed as the school’s Data Protection Officer (DPO), to secure all necessary approvals for the sources handled by her school or by the corresponding district. As soon as Alice signs the required data protection consent form, she gets permission and downloads the datasets from the several sources.
Alice also requests access to the LMS used by the school (a new teacher account is created by the LMS administrator). Before implementing her flipped classroom strategy, she contacts the school’s DPO again to discuss any legal and ethical issues she needs to pay attention to. As advised by the DPO, she accesses the LMS and, via the “User agreements page”, reviews the existing user agreements and confirms that signed informed consent has been given for all participating students (either parental consent on behalf of minors or directly by the students, as defined by the National Data Protection Authority).
Alice realizes that she must update the current consent form based on the new General Data Protection Regulation.
You need to help Alice to prepare a new consent form for the students participating in her flipped classroom model.
2.4.3 Step 2. Getting Familiar with the Assessment Rubric
Alice reviews the Initial Consent Form.
Please help Alice to evaluate this Initial Consent Form using the Rubric for assessing the Consent Form and to identify potential issues.
ACTIVITY/PRACTICE QUESTION (Reflect on)
We encourage you to elaborate on your response about the evaluation of the Initial Consent Form created by Alice, in the following reflective task. You may reflect on:
-
1.
Does this consent form comply with GDPR consent requirements?
-
2.
If not, what would you advise Alice to modify, so that this consent form is GDPR compliant and limits her school’s exposure to regulatory penalties?
2.4.3.1 Initial Consent Form
2.4.3.1.1 Introduction
Welcome to Athens Experimental High School (the “School” or “We”) Learning Management System (LMS). The School provides this LMS to you subject to the following Terms of Use and Privacy Policy (together, the “Terms”). When you use this LMS, you agree to abide by these Terms. If you do not agree to abide by these Terms, you may not use this LMS. Please read the Terms carefully.
The School reserves the right to make changes to this LMS and to modify the Terms at any time at its sole discretion. We encourage you to review the Terms frequently for modifications. By your use of this LMS, you agree to abide by any such modifications to the Terms, which are binding on you.
2.4.3.1.2 Privacy Policy
This Privacy Policy describes the School’s agreement with you regarding how we will handle certain information on the LMS. This Privacy Policy does not address information obtained from other sources such as submissions by mail, phone or other devices or from personal contact. By accessing the LMS and/or providing information to the School on the LMS, you consent to the collection, use and disclosure of certain information in accordance with this Privacy Policy.
2.4.3.1.2.1 Information Collected on Our LMS:
If you merely download material or browse through the LMS, our servers may automatically collect certain information from you which may include: (a) the name of the domain and host from which you access the Internet; (b) the browser software you use and your operating system; and (c) the Internet address of the website from which you linked to the LMS. The information we automatically collect may be used to improve the LMS to make it as useful as possible for our visitors; however, such information will not be tied to the personal information you choose to provide to us.
We do collect and keep personally identifiable information when you choose to voluntarily register to the LMS and submit such information. After your registration, we retain the information you submit for our records and to contact you from time to time. Please note that if we decide to change the manner in which we use or retain personal information, we may update this Privacy Policy, at our sole discretion.
2.4.3.1.2.2 Disclosure of Personal Information to Third Parties:
The School does not rent or sell personal information that you choose to provide to us nor does the School disclose credit card or other personal financial information to third parties other than as necessary to complete a credit card or other financial transaction or as required by law. The School does engage certain third parties to perform functions and provide services, including, without limitation, hosting and maintenance, customer relationship, database storage and management, payment transaction and direct marketing campaigns. We will share your personal information with these third parties, but only to the extent necessary to perform the functions and provide the services, and only pursuant to binding contractual obligations requiring such third parties to maintain the privacy and security of your data.
2.4.3.1.2.3 Receiving Promotional Materials:
We may send you information or materials such as newsletters, ebooks, whitepapers by e-mail or postal mail when you submit your address via the LMS. By your registration in the LMS, you are consenting to our sending you such information or materials.
If you do not want to receive promotional information or material, please send an email with your name, mailing address and email address to athens.expschool.online@gmail.com. When we receive your request, we may take reasonable steps to remove your name from such lists.
2.4.3.1.2.4 Cookies
A cookie is a small text file that a website can place on your computer’s hard drive for record-keeping or other administrative purposes. Our LMS may use cookies to help to personalise your experience on the LMS. Although most web browsers accept cookies automatically, usually you can modify your browser setting to decline cookies. If you decide to decline cookies, you may not be able to fully use the features of the LMS. Cookies may also be used at certain sites accessible through links on the LMS.
2.4.3.1.2.5 Links to Other Websites:
The School is not responsible for the practices or policies of the websites linked to or from the LMS, including without limitation their privacy practices or policies. If you elect to use a link that accesses another party’s website, you will be subject to that website’s practices and policies.
2.4.3.1.3 Terms of Use
2.4.3.1.3.1 For Informational Purposes Only
The School makes available the information on this Website for informational purposes only. You are solely responsible for the information you provide on this Website and for the information you use that you view on this Website. Information on this Website is not intended to be a replacement for direct consultation with the School; if you have questions or concerns, please contact the School directly.
2.4.3.1.3.2 Copyright and Trademark Information
The content included on this LMS, such as data, text, graphics, logos, images and software and its compilation is the property of the School and/or its content suppliers and is protected by copyright and trademark laws. In the event you upload any content including, without limitation, photographs or videos to this LMS, you (i) represent to the School and its affiliates that you have all rights necessary to upload the content; (ii) agree to indemnify the School and its affiliates for any third party infringement or other claims related thereto; and (iii) hereby license to the School and its affiliates a perpetual non-cancellable royalty-free license to use such uploaded content for any purposes in any media now existing or hereafter developed.
2.4.3.1.3.3 License for Your Use
For any period of time that you use this LMS and abide by these terms, the School grants to you a limited, revocable and nonexclusive license to access this LMS for your use but not to copy, download or modify it, or any portion of it, except with the express written consent of the School. This LMS or any portion of this LMS may not be reproduced, duplicated, copied, sold, visited or otherwise exploited without the express written consent of the School. You may not utilize framing to enclose any trademark, logo, content or other proprietary information contained on this LMS without the express written consent of the School. You may not use any meta tags or any other “hidden text” utilizing the School or its affiliates’ name or trademarks without the School’s express written consent.
You agree to use this LMS only for lawful purposes, and you acknowledge that your failure to do so may subject you to civil or criminal liability. You are responsible for ensuring that any materials you upload, post or submit to this LMS do not violate the copyright, trademark, trade secret or other personal or proprietary rights of any third party and you hereby agree to indemnify the School for any third party infringement or personal rights claims. You agree not to disrupt, modify, or interfere with this LMS or its associated software, hardware and servers in any way and you agree not to impede or interfere with others’ use of this LMS. You further agree not to alter or tamper with any information or materials on or associated with this LMS. Any unauthorized use or violation of these terms automatically terminates any permission or license granted by the School to access and use this LMS.
2.4.3.1.3.4 External Links
This LMS may provide links or references to third party websites or applications, including without limitation, third party websites or applications of advertisers or of providers of informational articles or other users. The School is not responsible for any information you choose to provide to those third party websites or applications; any information, products or services you acquire from those third party websites or applications, or any damages arising from your access to or use of those third party websites or applications.
Any links to third party websites and applications are provided as a convenience to the visitors of this LMS and any inclusion of any such links in this Website does not imply an endorsement or warranty of the third party websites or applications or their security, content, products, offerings or services. You are cautioned that any third party websites or applications are governed by their own terms of use and privacy policies, so when linking you should make sure to visit the appropriate pages of those third party websites or applications to determine what terms of use and privacy policies will apply to your use.
□ YES, I GIVE CONSENT FOR MY CHILD TO PARTICIPATE IN THE ONLINE COURSE AND AGREE TO THE CONSENT AS NOTED ABOVE.
□ NO, I DO NOT GIVE CONSENT FOR MY CHILD TO PARTICIPATE IN THE ONLINE COURSE AND AGREE TO THE CONSENT AS NOTED ABOVE.
Adapted from: https://www.whitbyschool.org/privacy-policy
2.4.3.2 Rubric for Assessing the Consent Form
Criteria | 1 Unacceptable | 3 Good/Solid | 5 Exemplary |
---|---|---|---|
Language | The consent request is presented neither in a clear, nor in a concise way, using language that is not easy to understand | The consent request is presented in a quite clear and concise way, using language that is quite easy to understand | The consent request is presented in a very clear and concise way, using language that is very easy to understand |
Explicit and Distinguishable | The consent request is not explicit or distinguishable from other pieces of information. | The consent request is quite distinguishable from other pieces of information but is not given via a positive act. | The consent request is clearly distinguishable from other pieces of information, given via an electronic tick-box that the individual has to explicitly check online |
Freely given consent | The individual does not have a free choice. | The individual has a free choice and it is quite clear how to refuse consent without being at a disadvantage. | The individual has a free choice and it is very clear how to refuse consent without being at a disadvantage. |
Possibility to withdraw the given consent | The consent form does not include the possibility to withdraw consent | The consent form includes the possibility to withdraw consent, but does not explain how to do it. | The consent form includes the possibility to withdraw consent and explains clearly how to do it. |
Rights of the data subject | The individuals are not informed about their rights as a data subject (GDPR Art.12 to 23) | Rights of the data subject (GDPR Art.12 to 23) are somehow stated but the modalities to exercise these rights are not clear. | Individuals are clearly informed about their rights as a data subject (GDPR Art.12 to 23) and they can effectively exercise these rights |
Identity of the organisation processing data | The consent form does not include the identity of the organisation processing data | The consent form includes quite clearly the identity of the organisation processing data | The consent form includes very clearly the identity of the organisation processing data |
Purposes for which the data is being processed | The consent form does not explain the purposes for which the data is being processed | The consent form explains quite clearly the purposes for which the data is being processed | The consent form explains very clearly the purposes for which the data is being processed |
Describes the type of data that will be processed | The consent form does not describe the type of data that will be processed | The consent form describes the type of data that will be processed | The consent form describes in detail the type of data that will be processed |
International transfer of data | The consent form does not include information about whether the consent is related to an international transfer of your data | The consent form includes quite clearly information about whether the consent is related to an international transfer of your data | The consent form includes clearly information about whether the consent is related to an international transfer of your data |
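The rubric can also be applied programmatically when several consent forms have to be compared. The following is a minimal sketch in Python, not part of the original rubric: the function name `summarise_ratings` and the idea of summing the ratings into a total are assumptions made for illustration, since the rubric itself does not prescribe how individual ratings should be aggregated. The criteria names and the levels 1, 3 and 5 are taken from the table above.

```python
# Levels and criteria copied from the rubric table above.
RUBRIC_LEVELS = {1: "Unacceptable", 3: "Good/Solid", 5: "Exemplary"}

CRITERIA = [
    "Language",
    "Explicit and Distinguishable",
    "Freely given consent",
    "Possibility to withdraw the given consent",
    "Rights of the data subject",
    "Identity of the organisation processing data",
    "Purposes for which the data is being processed",
    "Describes the type of data that will be processed",
    "International transfer of data",
]


def summarise_ratings(ratings):
    """Check that every criterion has a valid rating and summarise the result.

    Summing the ratings is an assumption made for this sketch; the rubric
    does not define an overall score.
    """
    missing = [c for c in CRITERIA if c not in ratings]
    if missing:
        raise ValueError(f"Missing ratings for: {missing}")
    unknown = [c for c in ratings if c not in CRITERIA]
    if unknown:
        raise ValueError(f"Unknown criteria: {unknown}")
    for criterion, score in ratings.items():
        if score not in RUBRIC_LEVELS:
            raise ValueError(f"Invalid score {score!r} for {criterion!r}")

    return {
        "total": sum(ratings[c] for c in CRITERIA),
        "maximum": 5 * len(CRITERIA),
        # Criteria rated below "Exemplary" are candidates for revision.
        "needs_work": [c for c in CRITERIA if ratings[c] < 5],
    }


# Example: a form rated Exemplary everywhere except for its language.
example = {criterion: 5 for criterion in CRITERIA}
example["Language"] = 3
print(summarise_ratings(example))
```

The list returned under `needs_work` is probably more useful than the total, because it points to the specific criteria on which a consent form should be revised.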
2.4.4 Step 3. Prepare Your Answer
Please assist Alice in preparing a consent form for the students participating in the online course for the flipped classroom initiative.
ACTIVITY/PRACTICE QUESTION (Reflect on)
We encourage you to elaborate on your response about the preparation of the consent form for Alice’s students participating in the online course for the flipped classroom initiative, in the following reflective task. You may reflect on:
1. How should the consent form be formulated so that Alice can obtain consent compliant with GDPR requirements?
2. What are the key features to create an effective opt-in consent form that works under GDPR?
2.4.5 Step 4. Review a Sample Solution
Please review a sample of an Exemplary solution that follows the criteria specified in the Rubric for assessing the Consent Form.
ACTIVITY/PRACTICE QUESTION (Reflect on)
We encourage you to elaborate on your response about the Exemplary solution that follows the criteria specified in the Rubric for assessing the Consent Form, in the following reflective task. You may reflect on:
1. Do you identify any GDPR requirements that you did not take into consideration when creating your consent form?
2.4.5.1 Exemplary Sample Solution
Consent Form to Register and Participate in the Online Course for the English Language Course of the ninth Grade of Athens Experimental High School.
In order to register and participate in the online course that will be offered for the English Language Course of the ninth Grade, you are invited to indicate your consent for the collection and processing of your personal data for the purposes of the online course, administered by Athens Experimental High School.
Athens Experimental High School (or “we”) uses a variety of resources to support student learning. Moodle™ software has been adopted as Athens Experimental High School’s Learning Management System (LMS). Moodle™ software is free and open source, and allows educators to create a private space online, filled with tools that easily create courses and various activities, all optimised for collaborative learning. In order to provide access to our students to the online course for the English Language Course of the ninth Grade on this platform/site, we need to collect and store personal information about them. You may also refer to https://moodle.com/privacy-notice/.
Please note:
1. The online course for the English Language Course of the ninth Grade will be carried out from 15/09/2021 to 15/06/2022.
2. Before you proceed to the registration to this online course, you will be asked to indicate your consent for the collection and processing of your personal data for the purposes of the course.
3. For the purposes of the GDPR: ‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); ‘profiling’ means any form of automated processing of personal data consisting of the use of personal data to evaluate certain personal aspects relating to a natural person, in particular to analyse or predict aspects concerning that natural person’s performance at work, economic situation, health, personal preferences, interests, reliability, behaviour, location or movements; ‘controller’ means the natural or legal person, public authority, agency or other body which, alone or jointly with others, determines the purposes and means of the processing of personal data; where the purposes and means of such processing are determined by Union or Member State law, the controller or the specific criteria for its nomination may be provided for by Union or Member State law.
4. The Data Controller for data processed under this Notice is:
Athens Experimental High School (VAT 021 27 76 45).
20 Makrygianni Road.
11676 Athens.
Greece.
Legal basis for processing the personal and sensitive data:
Personal Data:
In connection with this online course, the Athens Experimental High School’s collection and processing of the following Personal Data is lawful based on:
Article 6.1(a), GDPR, Consent.
Article 6.1(b), GDPR, Contract.
Article 6.1(c), GDPR, Legal Obligation.
Article 6.1(f), GDPR, Legitimate Interest:
□ Name, Surname, Email Address.
□ User activity and contribution data.
Sensitive Data:
In connection with this online course, the Athens Experimental High School’s collection and processing of the following Sensitive Data is lawful based on consent (Article 9.2(a), GDPR):
□ Gender.
Potential Benefits:
Participation in this online course enables data subjects (students) to collaborate effectively with their peers, and enables tutor(s) to collect data and to provide resources, timely feedback and differentiated learning opportunities efficiently.
Potential Risk or Discomforts:
We do not perceive any risk or discomfort in participating in the online course.
Storage of Data:
The installation of the Moodle™ software platform is hosted on a secure server at Athens Experimental High School’s premises. The collected data is also stored on this secure server for the time required by the purposes described in this notice, for a maximum of 5 years.
Data transfer outside the European Union:
We may share some of the data collected with services located outside the European Union, in particular through the aforementioned Moodle™ software services.
Right to Withdraw:
Your participation in this online course is voluntary. You are under no obligation to participate in this online course and you may withdraw consent at any time, without being at a disadvantage, by contacting the Athens Experimental High School Data Controller for this online course at athens.expschool.online@gmail.com.
Rights of Data Subject:
Whilst Athens Experimental High School is in possession of or processing your personal data, you, the data subject, have the following rights:
- Right of access – you have the right to request a copy of the information that we hold about you.
- Right of rectification – you have a right to correct data that we hold about you that is inaccurate or incomplete.
- Right to be forgotten – in certain circumstances you can ask for the data we hold about you to be erased from our records. The erasure of your information shall be subject to the Athens Experimental High School’s need to retain certain information pursuant to any other identified lawful basis.
- Right to restriction of processing – where certain conditions apply, you have a right to restrict the processing.
- Right of portability – you have the right to have the data we hold about you transferred to another organisation.
- Right to object – you have the right to object to certain types of processing such as direct marketing.
- Right to object to automated processing, including profiling – you also have the right not to be subject to the legal effects of automated processing or profiling.
- Right to judicial review – in the event that Athens Experimental High School refuses your request under rights of access, we will provide you with a reasonable explanation.
You can exercise these rights by contacting the Athens Experimental High School Data Controller for this online course at athens.expschool.online@gmail.com.
If the Athens Experimental High School’s use of your information is pursuant to your consent, you have the right to withdraw consent without affecting the lawfulness of the Athens Experimental High School’s use of the information prior to receipt of your request.
If you think your data protection rights have been breached, you have the right to lodge a complaint with the Athens Experimental High School Data Controller for this online course at athens.expschool.online@gmail.com and/or your national Data Protection Authority (DPA).
Data Subject Concerns and Reporting:
If you have any questions concerning the online course or experience any discomfort related to the online course, please contact the Athens Experimental High School Data Controller for this online course at athens.expschool.online@gmail.com.
Conflict of Interest:
We do not perceive any conflicts of interest in the development of this online course.
Compensation:
There is no compensation for data subjects in this online course.
Confidentiality:
The only people processing your data will be the tutor(s) involved in the Athens Experimental High School’s online course(s). The tutor(s) undertake to keep any information provided herein confidential, not to let it out of their possession and to report on the findings from the perspective of the entire participating group and not from the perspective of an individual. Please note that confidentiality cannot be guaranteed while data is in transit over the Internet.
Purposes for which the data is being collected and processed:
The data which is collected and processed via the online course in the Learning Management System (Moodle™) is used by the Athens Experimental High School to facilitate teaching and learning. For this purpose, online teaching resources are uploaded to the course, where the data subjects (students) enrol and study the lecture material at home. The material is in the form of videos, small activities with automatic feedback (online quizzes), and forum discussions. The data subjects (students) can undertake some additional homework online to further check their understanding and extend their learning. Through this online course and via the usage of LMS tools, the tutor(s) monitor the data subjects’ (students’) learning process, discover patterns, find indicators for success and indicators for poor marks or drop-out, and proceed with recommendations and revisions of the course’s online learning activities and educational resources, aiming to improve data subjects’ (students’) academic performance.
We ensure that the information we collect, process and use is appropriate for these purposes.
By indicating consent to participate in this online course you also indicate consent for the possible use of data for automated decision making, such as profiling, to identify data subjects’ (students’) progress against a range of indicators and activities identified to have an impact on data subjects’ (students’) success in the online course.
Consent to register and participate in the Online Course for the English Language Course of the ninth Grade of Athens Experimental High School.
Selecting “YES, I AGREE” below indicates that:
- You have read the above information;
- You voluntarily agree to participate in this online course;
- You understand the procedures described above;
- You give consent for the use of your Personal Data for the purposes outlined in this notice;
- You give consent for the use of your Sensitive Data for the purposes outlined in this notice;
- You are at least 15 years of age.
□ YES, I AGREE
□ NO, I DO NOT AGREE
For students who are under 15 years of age, consent from a parent or guardian is necessary.
□ YES, I GIVE CONSENT FOR MY CHILD TO PARTICIPATE IN THE ONLINE COURSE AND AGREE TO THE CONSENT AS NOTED ABOVE.
□ NO, I DO NOT GIVE CONSENT FOR MY CHILD TO PARTICIPATE IN THE ONLINE COURSE.
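The consent logic expressed in this sample form (an explicit opt-in, a minimum age of 15 with parental consent required below it, and the possibility to withdraw at any time) could be recorded in software along the following lines. This is only an illustrative sketch in Python: the class `ConsentRecord`, its field names and the constant `AGE_OF_CONSENT` are hypothetical and do not correspond to any Moodle™ feature or API. It also assumes that, for students under 15, the guardian's answer alone decides validity, which the form does not state explicitly.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Age threshold stated in the sample consent form above.
AGE_OF_CONSENT = 15


@dataclass
class ConsentRecord:
    """Hypothetical record of one student's answer to the consent form."""

    student_age: int
    student_agreed: bool              # student selected "YES, I AGREE"
    guardian_agreed: bool = False     # parent/guardian selected "YES, I GIVE CONSENT"
    withdrawn_on: Optional[date] = None

    def withdraw(self, when: date) -> None:
        """Register a withdrawal of consent.

        Processing that took place before this date remains lawful,
        as the form explains; only future processing is affected.
        """
        self.withdrawn_on = when

    def is_valid(self) -> bool:
        """Return True if the data may currently be processed on the basis of consent."""
        if self.withdrawn_on is not None:
            return False
        if self.student_age >= AGE_OF_CONSENT:
            return self.student_agreed
        # Assumption: below the threshold, the guardian's consent is decisive.
        return self.guardian_agreed


# Example usage
record = ConsentRecord(student_age=14, student_agreed=True)
print(record.is_valid())          # no guardian consent has been given yet
record.guardian_agreed = True
print(record.is_valid())
record.withdraw(date.today())
print(record.is_valid())
```

Note that consent is never assumed by default: both `guardian_agreed` and the absence of a withdrawal have to be established explicitly, mirroring the opt-in character of the form.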
2.4.6 Step 5. Self-Evaluate Your Answer
Now that you have seen the Exemplary sample solution, please rate your initial answer (evaluate the consent form you created), using the criteria in the Rubric for assessing the Consent Form.
Language
1. The consent request is presented neither in a clear, nor in a concise way, using language that is not easy to understand
2. The consent request is presented in a quite clear and concise way, using language that is quite easy to understand
3. The consent request is presented in a very clear and concise way, using language that is very easy to understand
Explicit and Distinguishable
1. The consent request is not explicit or distinguishable from other pieces of information.
2. The consent request is quite distinguishable from other pieces of information but is not given via a positive act.
3. The consent request is clearly distinguishable from other pieces of information, given via an electronic tick-box that the individual has to explicitly check online
Freely given consent
1. The individual does not have a free choice.
2. The individual has a free choice and it is quite clear how to refuse consent without being at a disadvantage.
3. The individual has a free choice and it is very clear how to refuse consent without being at a disadvantage.
Possibility to withdraw the given consent
1. The consent form does not include the possibility to withdraw consent
2. The consent form includes the possibility to withdraw consent, but does not explain how to do it.
3. The consent form includes the possibility to withdraw consent and explains clearly how to do it.
Rights of the data subject
1. The individuals are not informed about their rights as a data subject (GDPR Art.12 to 23)
2. Rights of the data subject (GDPR Art.12 to 23) are somehow stated but the modalities to exercise these rights are not clear.
3. Individuals are clearly informed about their rights as a data subject (GDPR Art.12 to 23) and they can effectively exercise these rights
Identity of the organisation processing data
1. The consent form does not include the identity of the organisation processing data
2. The consent form includes quite clearly the identity of the organisation processing data
3. The consent form includes very clearly the identity of the organisation processing data
Purposes for which the data is being processed
1. The consent form does not explain the purposes for which the data is being processed
2. The consent form explains quite clearly the purposes for which the data is being processed
3. The consent form explains very clearly the purposes for which the data is being processed
Describes the type of data that will be processed
1. The consent form does not describe the type of data that will be processed
2. The consent form describes the type of data that will be processed
3. The consent form describes in detail the type of data that will be processed
International transfer of data
1. The consent form does not include information about whether the consent is related to an international transfer of your data
2. The consent form includes quite clearly information about whether the consent is related to an international transfer of your data
3. The consent form includes clearly information about whether the consent is related to an international transfer of your data
References
Barbaro, M., & Zeller, T. (2006, August 9). A face is exposed for AOL searcher No. 4417749. The New York Times. Retrieved from https://www.nytimes.com/2006/08/09/technology/09aol.html?pagewanted=all&_r=0
Moore Barlow. (2018, June 26). GDPR and safeguarding in schools: What you need to know. Retrieved from https://www.moorebarlow.com/gdpr-and-safeguarding-in-schools-what-you-need-to-know/
BrightBytes. (2017, June 29). BrightBytes acquires trusted IPaaS provider authentica solutions. Retrieved from https://www.brightbytes.net/resources-archive/pressrelease
Broad, E., Smith, A., & Wells, P. (2017). Helping organisations navigate ethical concerns in their data practices. Open Data Institute. Retrieved from https://www.scribd.com/document/358778144/ODI-Ethical-Data-Handling-2017-09-13#download
Castanedo, F. (2015). Data preparation in the big data era: Best practices for data integration. O’Reilly Media. Retrieved from https://www.tamr.com/wp-content/uploads/2015/09/Data_Preparation_in_the_Big_Data_Era_Tamr.pdf
Cearley, D. (2017, June 23). Devise an effective cloud computing strategy by answering five key questions. Retrieved from https://www.gartner.com
Chakrabarti, S., Cox, E., Frank, E., Güting, R. A., Han, J., Jiang, X., Kamber, M., Lightstone, S. S., Nadeau, T. P., Neapolitan, R. E., Pyle, D., Refaat, M., Schneider, M., Teorey, T. J., & Witten, I. H. (2009). Data mining: Know it all. M. Kaufmann.
COMPARE THE CLOUD. (2018, February 21). 6 Pros and cons of cloud storage for business. Retrieved from https://www.comparethecloud.net
Cybersecurity and Digital Business Risk Management. (2020, December 29). Retrieved from https://www.gartner.com
D’Acquisto, G. D., Domingo-Ferrer, J., Kikiras, P., Torra, V., de Montjoye, Y. A., & Bourka, A. (2015). Privacy by design in big data: An overview of privacy enhancing technologies in the era of big data analytics (European Union Agency for Network and Information Security (ENISA)). ENISA. https://doi.org/10.2824/641480
Danezis, G., Domingo-Ferrer, J., Hansen, M., Hoepman, J.-H., Métayer, D. L., Tirtea, R., & Schiffner, S. (2014). Privacy and data protection by design – from policy to engineering (European Union Agency for Network and Information Security (ENISA)). ENISA. https://doi.org/10.2824/38623
Digital Preservation Coalition. (2015). Digital preservation handbook (2nd ed.). Retrieved from https://www.dpconline.org/handbook/digital-preservation/why-digital-preservation-matters
Drachsler, H., & Greller, W. (2016). Privacy and Analytics – it’s a DELICATE Issue. A Checklist for Trusted Learning Analytics. In LAK 16 Conference Proceedings of the Sixth International Conference on Learning Analytics & Knowledge Conference. ACM. https://doi.org/10.1145/2883851.2883893
European Commission. (2013). Ethics for researchers. In Facilitating research excellence in FP7. Publications Office of the European Union.
European Commission. (2014). Report to the European Commission on New modes of learning and teaching in higher education. High Level Group on the Modernisation of Higher Education. Luxembourg: Publications Office of the European Union. https://doi.org/10.2766/81897
European Commission. (2018a, January). National data protection authorities. Retrieved from https://edpb.europa.eu/about-edpb/board/members_en
European Commission. (2018b, January 4). 2018 reform of EU data protection rules. Retrieved from https://ec.europa.eu/info/law/law-topic/data-protection/reform_en
European Commission. (2018c, March 22). Data protection and online privacy. Retrieved from https://europa.eu/youreurope/citizens/consumers/internet-telecoms/data-protection-online-privacy/index_en.htm
European Commission. (2018d, August 01). Can personal data about children be collected? Retrieved from https://ec.europa.eu/info/law/law-topic/data-protection/reform/rights-citizens/how-my-personal-data-protected/can-personal-data-about-children-be-collected_en
European Commission. (2018e, August 01). For how long can data be kept and is it necessary to update it? Retrieved from https://ec.europa.eu/info/law/law-topic/data-protection/reform/rules-business-and-organisations/principles-gdpr/how-long-can-data-be-kept-and-it-necessary-update-it_en
European Commission. (2018f, August 01). How much data can be collected? Retrieved from https://ec.europa.eu/info/law/law-topic/data-protection/reform/rules-business-and-organisations/principles-gdpr/how-much-data-can-be-collected_en
European Commission. (2018g, August 01). Rights for citizens. Retrieved from https://ec.europa.eu/info/law/law-topic/data-protection/reform/rights-citizens_en
European Commission. (2018h). The GDPR: new opportunities, new obligations: What every business needs to know about the EU’s General Data Protection Regulation. https://ec.europa.eu/info/sites/default/files/data-protection-factsheet-sme-obligations_en.pdf
European Commission. (2018i, August 01). What are Data Protection Authorities (DPAs)? Retrieved from https://ec.europa.eu/info/law/law-topic/data-protection/reform/what-are-data-protection-authorities-dpas_en
European Commission. (2018j, August 01). What are my rights? Retrieved from https://ec.europa.eu/info/law/law-topic/data-protection/reform/rights-citizens/my-rights/what-are-my-rights_en
European Commission. (2018k, August 01). What are the responsibilities of a Data Protection Officer (DPO)? Retrieved from https://ec.europa.eu/info/law/law-topic/data-protection/reform/rules-business-and-organisations/obligations/data-protection-officers/what-are-responsibilities-data-protection-officer-dpo_en
European Commission. (2018l, August 01). What does data protection ‘by design’ and ‘by default’ mean? Retrieved from https://ec.europa.eu/info/law/law-topic/data-protection/
European Commission. (2018m, August 01). What is personal data? Retrieved from https://ec.europa.eu/info/law/law-topic/data-protection/reform/what-personal-data_en
European Commission. (2018n, August 01). What personal data is considered sensitive? Retrieved from https://ec.europa.eu/info/law/law-topic/data-protection/reform/rules-business-and-organisations/legal-grounds-processing-data/sensitive-data/what-personal-data-considered-sensitive_en
European Commission. (2018o, August 01). When is a Data Protection Impact Assessment (DPIA) required? Retrieved from https://ec.europa.eu/info/law/law-topic/data-protection/reform/rules-business-and-organisations/obligations/when-data-protection-impact-assessment-dpia-required_en
European Commission. (2018p, August 01). When is consent valid? Retrieved from https://ec.europa.eu/info/law/law-topic/data-protection/reform/rules-business-and-organisations/legal-grounds-processing-data/grounds-processing/when-consent-valid_en
European Union. (2018). It’s your data – take control. Publications Office of the European Union.
GDPR.EU. (2019, February 13). Does the GDPR apply to companies outside of the EU? Retrieved from https://gdpr.eu/companies-outside-of-europe/
Griffiths, D., Drachsler, H., Kickmeier-Rust, M., Steiner, C., Hoel, T., & Greller, W. (2016, January 6). Is privacy a show-stopper for learning analytics? A review of current issues and their solutions. Learning Analytics Review, ISSN: 2057–7494. LACE project.
Hansell, S. (2006, August 8). AOL removes search data on group of web users. The New York Times. Retrieved from https://www.nytimes.com/2006/08/08/business/media/08aol.html
Harish, A. (2019, March 21). When NASA lost a spacecraft due to a metric math mistake. [blog post]. Retrieved from https://www.simscale.com/blog/2017/12/nasa-mars-climate-orbiter-metric/
Higgins, S. (2008). The DCC curation lifecycle model. International Journal of Digital Curation, 3(1), 134–140. https://doi.org/10.2218/ijdc.v3i1.48
Hill, K. (2012, February 16). How Target figured out a teen girl was pregnant before her father did. Forbes. Retrieved from https://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-out-a-teen-girl-was-pregnant-before-her-father-did/?sh=df77ff666686
Hill, A. (2014, September 15). A day in the life of a data mined kid [Web log post]. Retrieved from https://www.marketplace.org
ICPSR. (2018). Data management & curation. Institute for Social Research at the University of Michigan. Retrieved from https://www.icpsr.umich.edu/icpsrweb/content/datamanagement/index.html
iKeepSafe. (2017). Data privacy in education: An iKeepSafe educator training course. Retrieved from https://ikeepsafe.org/wp-content/uploads/2018/02/Data-Privacy-in-Education-Full-Curriculum-2017-2.pdf
Intersoft Consulting. (2018, October 5). General Data Protection Regulation (GDPR): Chapter 3 Rights of the data subject. Retrieved from https://gdpr-info.eu/chapter-3/
Interuniversity Consortium for Political and Social Research (ICPSR). (2009). Principles and good practice for preserving data (International Household Survey Network, IHSN Working Paper No 003). Retrieved from http://www.ihsn.org/principles-and-good-practice-for-preserving-data
IT Governance UK. (2016, October 3). Data protection impact assessments under the GDPR. Retrieved from https://www.itgovernance.co.uk/privacy-impact-assessment-pia
Jisc. (2006). Digital preservation: Continued access to authentic digital assets. Retrieved from https://www.webarchive.org.uk/wayback/archive/20140613220103/http://www.jisc.ac.uk/media/documents/publications/digitalpreservationbp.pdf
Johnston, L. R., Carlson, J., Hudson-Vitale, C., Imker, H., Kozlowski, W., Olendorf, R., & Stewart, C. (2018). How Important is Data Curation? Gaps and Opportunities for Academic Libraries. Journal of Librarianship and Scholarly Communication, 6(1), eP2198. https://doi.org/10.7710/2162-3309.2198
Kononow, P. (2018, September 16). What is metadata (with examples). Dataedo. Retrieved from https://dataedo.com/kb/data-glossary/what-is-metadata
Krueger, M. (2017, October 13). 3 Steps to identify and protect sensitive data for the GDPR [Web log post]. Retrieved from https://www.pacificdataintegrators.com/insights/3-Steps-to-Identify-and-Protect-Sensitive-Data-for-GDPR
Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). John Wiley & Sons, Inc. https://doi.org/10.1002/9781119013563
Loon, R. (2018, February 6). Digital transformation requires a data-driven culture. Retrieved from https://www.thedigitaltransformationpeople.com/channels/people-and-change/digital-transformation-requires-data-driven-culture/
Maletic, J. I., & Marcus, A. (2000). Data cleansing: beyond integrity analysis. Fifth Conference on Information Quality (IQ 2000), 200–209.
Marr, B. (2017, November 15). Outside insight: Why external data is the fuel of tomorrow’s business success. Retrieved from https://www.forbes.com/sites/bernardmarr/2017/11/15/outside-insight-why-external-data-is-the-fuel-of-tomorrows-business-success/?sh=21c35a0a5e1d
Narayanan, A., & Shmatikov, V. (2008). Robust deanonymization of large datasets (how to break anonymity of the Netflix Prize dataset). The University of Texas at Austin.
Open Data Institute. (2018a, March 24). No one owns data: we need to strengthen our rights. Retrieved from https://theodi.org/article/no-one-owns-data-we-need-to-strengthen-our-rights/
Open Data Institute. (2018b, April 25). Updating the data ethics Canvas. Retrieved from https://theodi.org/article/updating-the-data-ethics-canvas/
Panetta, K. (2019, October 10). Is the cloud secure? Retrieved from https://www.gartner.com/
Pentland, A. S. (2013). The data-driven society. Scientific American, 309(4), 78–83.
Phillips, M., Bailey, J., Goethals, A., & Owens, T. (2013). The NDSA levels of digital preservation: An explanation and uses. National Digital Stewardship Alliance (NDSA). Retrieved from https://ndsa.org/documents/NDSA_Levels_Archiving_2013.pdf
Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). Official Journal L. 119. Retrieved from https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=celex:32016R0679
Riley, J. (2017). Understanding metadata: What is metadata, and what is it for? Baltimore.
Romero, C., Ventura, S., Pechenizkiy, M., & Baker, R. S. J. D. (2010). Handbook of Educational Data Mining. CRC Press.
Romero, C., Romero, J. R., & Ventura, S. (2014). A survey on pre-processing educational data. In Educational data mining studies in computational intelligence (pp. 29–64). Springer. https://doi.org/10.1007/978-3-319-02738-8_2
Sclater, N. (2017). Consent and the GDPR: what approaches are universities taking? Effective Learning Analytics. Retrieved from https://analytics.jiscinvolve.org/wp/2017/06/30/consent-and-the-gdpr-what-approaches-are-universities-taking/
Sclater, N., & Bailey, P. (2015). Code of practice for learning analytics. Jisc. Retrieved from https://www.jisc.ac.uk/guides/code-of-practice-for-learning-analytics
Shacklock, X. (2016). From bricks to clicks: The potential of data and analytics in higher education. Policy Connect – Higher Education Commission.
Slade, S., & Tait, A. (2019). Global guidelines: Ethics in Learning Analytics. International Council for Open and Distance Education.
Stonebraker, M. (2014, October 16). Three approaches to scalable data curation. Strata Hadoop World in New York 2014. Retrieved from https://www.tamr.com/blog/three-approaches-scalable-data-curation-stonebraker-stratahadoop/
Sweeney, L. (2000). Simple Demographics Often Identify People Uniquely (Data Privacy Working Paper 3). Carnegie Mellon University.
UCL. (2018, November 15). GDPR – Anonymisation & Pseudonymisation. Retrieved from https://www.ucl.ac.uk/data-protection/guidance-staff-students-and-researchers/practical-data-protection-guidance-notices/anonymisation-and
UK, Information Commissioner’s Office (ICO). (2012, November). Anonymisation: Managing data protection risk code of practice. Retrieved from https://ico.org.uk/media/1061/anonymisation-code.pdf
USGS. (2014). Guidelines for the preservation of digital scientific data. Retrieved from https://ndsa.org/documents/USGS_Guidelines_for_the_Preservation_of_Digital_Scientific_Data_Final.pdf
Vadali, S. (2017, December 28). Day 7: Data cleaning – All you need to know about it. Retrieved from https://becominghuman.ai/day-7-data-cleaning-all-that-you-need-to-know-about-it-23b05738abe7
Venezia, P. (2014, June 23). Murder in the Amazon cloud. Retrieved from https://www.infoworld.com/
Wes, M. (2018). Looking to comply with GDPR? Here’s a primer on anonymization and pseudonymization. Retrieved from https://iapp.org/news/a/looking-to-comply-with-gdpr-heres-a-primer-on-anonymization-and-pseudonymization/
Woody, C. (2004). Risk methodology K-12. Retrieved from https://studylib.net/doc/8097924/k-12-risk-methodology
Useful Video Resources
External Video: Data wrangling for faster, more accurate analysis [1:47].
External Video: Meta… what? metadata! [5:25].
External Video: Learn more about data interoperability [1:12].
External Video: ICPSR 101: What is data curation? [1:29].
External Video: Data curation @UCSB [2:29].
External Video: Why digital preservation is important for everyone [2:51].
External Video: Public cloud vs private cloud vs hybrid cloud [3:28].
External Video: How Toy Story 2 almost got deleted: Stories from Pixar Animation: ENTV [2:25].
External Video: Introduction to data ethics [3:23].
External Video: What is the GDPR? [1:11].
External Video: Why develop a data science code of ethics? [3:06].
External Video: Privacy overview for K12 teachers and Administrators [5:26].
External Video: Who uses student data? [2:30].
External Video: Protecting student-data privacy: An expert’s View [3:44].
Further Readings
Acaps. (2016). Technical brief data cleaning. Retrieved from Acaps: https://www.acaps.org/sites/acaps/files/resources/files/acaps_technical_brief_data_cleaning_april_2016_0.pdf
Beagrie, N., Charlesworth, A., & Miller, P. (2015). The national archives guidance on cloud storage and digital preservation (2nd ed.). Crown copyright licensed under the Open Government Licence v2.0.
Chapman, A. D. (2005a). Principles and methods of data cleaning – primary species and species occurrence data, version 1.0. Report for the Global Biodiversity Information Facility, Copenhagen.
Chapman, A. D. (2005b). Principles of data quality, version 1.0. Report for the global biodiversity information facility, Copenhagen.
Common Education Data Standards (CEDS). (2017). Retrieved from https://ceds.ed.gov/dataModel.aspx
CrowdFlower. (2016). Data Science Report (Rep.). Retrieved from CrowdFlower website https://visit.figure-eight.com/rs/416-ZBE-142/images/CrowdFlower_DataScienceReport_2016.pdf
Consortium for School Networking (CoSN). (2017). Cybersecurity. Retrieved from https://www.cosn.org/cybersecurity
Data Cleaning: In-Depth Guide [Web log post]. (2019, January 1). Retrieved from https://blog.aimultiple.com/data-cleaning/
Data Quality Campaign. (2013). Roadmap to safeguarding student data key focus areas for state education agencies. Retrieved from https://2pido73em67o3eytaq1cp8au-wpengine.netdna-ssl.com/wp-content/uploads/2016/03/DQC-roadmap-safeguarding-data-June24.pdf
Data Quality Campaign. (2018a). Education data legislation review: 2018 state activity. Retrieved from https://2pido73em67o3eytaq1cp8au-wpengine.netdna-ssl.com/wp-content/uploads/2018/09/2018-DQC-Legislative-Summary.pdf
Data Quality Campaign. (2018b). TIME TO ACT 2018: Using data to meet ESSA Goals. Retrieved from https://2pido73em67o3eytaq1cp8au-wpengine.netdna-ssl.com/wp-content/uploads/2018/12/DQC-Time-to-Act-2018.pdf
Data Science Primer. (2017, June 9). Chapter 3: Data cleaning steps and techniques. Retrieved from https://elitedatascience.com/data-cleaning
de Jonge, E., & van der Loo, M. (2013). An introduction to data cleaning with R. Statistics Netherlands.
DQC. (2015). A stoplight for student data use. Retrieved from https://2pido73em67o3eytaq1cp8au-wpengine.netdna-ssl.com/wp-content/uploads/2016/03/DQC-FERPA-Stoplight.pdf
Elgabry, O. (2019, February 28). The ultimate guide to data cleaning. When the data is spewing garbage. Retrieved from https://towardsdatascience.com/the-ultimate-guide-to-data-cleaning-3969843991d4
Freitas, A., & Curry, E. (2015). Big data curation. In J. Cavanillas, E. Curry, & W. Wahlster (Eds.), New horizons for a data-driven economy: A roadmap for usage and exploitation of big data in Europe (pp. 87–118). Springer. https://doi.org/10.1007/978-3-319-21569-3_6
Gartner. (2018). Plan your GDPR Journey. Retrieved from https://blogs.gartner.com/smarterwithgartner/files/2018/05/PR_435169_Beadle_Are_You_Ready_for_GDPR_Infographic_final.png
GDPR Report. (2017, November 07). Data masking: Anonymisation or pseudonymisation? Retrieved from https://gdpr.report/news/2017/11/07/data-masking-anonymisation-pseudonymisation/
Hollidge, R. (2018, February 21). Where are you storing your data and is your storage method GDPR compliant? Retrieved from https://www.instantonit.com
Hswe, P., & Musser, L. (n.d.). 2.1 What is metadata? Penn State University Libraries. Retrieved from https://www.e-education.psu.edu/dmpt/node/660
Johnston, L. R., et al. (2018). How important are data curation activities to researchers? Gaps and opportunities for academic libraries. Journal of Librarianship and Scholarly Communication, 6(General Issue), eP2198. https://doi.org/10.7710/2162-3309.2198
Ladd, M. (2018, April 5). What is metadata and how does it impact education? [Blog post].
Larkin, A. (2018, June 26). Disadvantages of cloud computing. Retrieved from https://cloudacademy.com
Mason, J. (2004). Context and metadata for learning, education, and training. In R. McGreal (Ed.), Online education using learning objects (pp. 168–182). RoutledgeFalmer.
Miller, K., Miller, M., Moran, M., & Dai, B. (2018). Data management life cycle: Final report. In PRC 17–84F. A&M Transportation Institute.
National Forum on Education Statistics. (2009). Forum guide to metadata: The meaning behind education data (NFES 2009–805). U.S. Department of Education. National Center for Education Statistics.
OpenRefine. (2010, July 11). Introduction to OpenRefine. Retrieved from http://openrefine.org/
Osborne, J. W. (2010). Data cleaning basics: Best practices in dealing with extreme scores. Newborn and Infant Nursing Reviews, 10(1), 37–43. https://doi.org/10.1053/j.nainr.2009.12.009
Osborne, J. W., & Overbay, A. (2004). The power of outliers (and why researchers should always check for them). Practical Assessment, Research, and Evaluation, 9(6).
Peugh, J. L., & Enders, C. K. (2004). Missing data in educational research: A review of reporting practices and suggestions for improvement. Review of Educational Research, 74(4), 525–556. https://doi.org/10.3102/00346543074004525
Richter, T., & Pawlowski, J. (2007). The need for standardization of context metadata for e-learning environments. e-ASEM Conference. Seoul.
Shacklett, M. (2016, August 12). Data curation takes the value of big data to a new level. Retrieved from https://www.techrepublic.com/article/data-curation-takes-the-value-of-big-data-to-a-new-level/
Student Data Principles. (2014). Retrieved from https://studentdataprinciples.org/
UK, Information Commissioner’s Office (ICO). (2017, October 20). Overview of the General Data Protection Regulation (GDPR). Retrieved from https://ico.org.uk/media/for-organisations/data-protection-reform/overview-of-the-gdpr-1-13.pdf
UK, Information Commissioner’s Office (ICO). (2019a, May 17). Guide to the General Data Protection Regulation (GDPR). Retrieved from https://ico.org.uk/for-organisations/guide-to-data-protection/guide-to-the-general-data-protection-regulation-gdpr/
UK, Information Commissioner’s Office (ICO). (2019b, April 30). Individual rights. Retrieved from https://ico.org.uk/for-organisations/guide-to-data-protection/guide-to-the-general-data-protection-regulation-gdpr/individual-rights/
Verbert, K., Manouselis, N., Drachsler, H., & Duval, E. (2012). Dataset-driven research to support learning and knowledge analytics. [Electronic version]. Educational Technology & Society, 15(3), 133–148.
Webber, M. (2018). The GDPR’s impact on the cloud service provider as a processor. Privacy & Data Protection Journal, 16(4).
Whitelegg, D. (2018, May 25). A developer’s guide to the GDPR. Understand how the GDPR impacts you. Retrieved from https://developer.ibm.com/articles/s-gdpr1/
Wickham, H. (2014). Tidy data. Journal of Statistical Software, 59(10). https://doi.org/10.18637/jss.v059.i10
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2023 The Author(s)
About this chapter
Cite this chapter
Mougiakou, S., Vinatsella, D., Sampson, D., Papamitsiou, Z., Giannakos, M., Ifenthaler, D. (2023). Adding Value and Ethical Principles to Educational Data. In: Educational Data Analytics for Teachers and School Leaders. Advances in Analytics for Learning and Teaching. Springer, Cham. https://doi.org/10.1007/978-3-031-15266-5_2
DOI: https://doi.org/10.1007/978-3-031-15266-5_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15265-8
Online ISBN: 978-3-031-15266-5
eBook Packages: Education, Education (R0)