Introduction

Data serves as a central element (Sadowski 2019) in the contemporary digital world. The late 1950s witnessed a revolution in computing, resulting in the ongoing computerisation and digitisation of diverse paper-based archives and records (Hu et al. 2021), which led to a heightened concentration of data and information. With the rising tide of data protection concerns (Romanou 2018), people have become increasingly aware of their data and privacy rights (Wagner 2024). In light of these concerns, lawmakers globally have drafted and refined their respective data protection laws to prevent or mitigate the structural social risks arising from the wrongful processing of data (Tamburri 2020) or from data breaches (Zhao and Cheng 2024). The stance taken on data protection is one source of numerous debates, suspicions, and fears. The extent to which the use of personal data is restricted and the degree to which it is protected are closely intertwined with the socioeconomic development of the digital society, as well as with human dignity and privacy rights.

The present study focuses on the stance of data protection laws in the United States, the European Union, and China. These three jurisdictions were chosen for their salient features and significant impacts on the global landscape of data protection. The EU took the lead in creating the “right to the protection of personal data” (Fuster and Gutwirth 2013). Established in the EU Charter of Fundamental Rights in 2000, this right not only plays a pivotal role in personal data protection in the EU but also marks a milestone for the protection of personal data around the world. Data protection emerged almost simultaneously in the U.S. and the EU during the 1970s (Bennett 1988). Owing to advancements in computer technology and large-scale Internet applications (Cheng and Liu 2024), the U.S. collects and stores almost the largest amount of data, which gives it rich practical experience in data processing and legal regulation. China, like other jurisdictions, enacted data protection laws later than the aforementioned blocs (Pernot-Leplay 2020). However, it is building its legal framework at a rapid pace (Cheng et al. 2023), showing its strong potential and burgeoning power. Given how representative the data protection legislation of these three jurisdictions is, a comparative investigation into their respective frameworks helps to gain a deep understanding of the stance spectrum of data protection.

The landscape of data protection laws is dynamically broadening, as exemplified by the data protection challenges (Li et al. 2023) posed by generative artificial intelligence such as ChatGPT (Cheng and Liu 2023). This trend has infused the discourse on data protection laws and privacy with vitality and complexity. In addition to the collision of diverse legal views, one particular challenge is that it calls for more interdisciplinary insights and knowledge paradigms (González et al. 2022) to enhance understanding of data protection. Whilst the discussion of data protection is visibly expanding in scope and depth, linguistic exploration of this theme remains largely uncharted. The relevant literature published so far comprises an institutional argumentative exploration of Swiss Federal Data Protection (Greco et al. 2016) and a study on data protection laws from the perspective of terminological variation and ideology by Pei and Cheng (2020). Compared to the latter, the present study extends its scope from the U.S. and the EU to three jurisdictions with the inclusion of China, and shifts the perspective to “stance” so as to expand the breadth and depth of linguistic exploration in the field of data protection laws.

The research questions of this study are:

(1) What are the divergences and convergences of stance expressions between data protection laws of the U.S., the EU, and China?

(2) What is the overall stance orientation of data protection laws across the three jurisdictions?

(3) What are the underlying rationales for the choices and representations of stance among the three jurisdictions?

This study aims to make contributions from both theoretical and practical perspectives. The theoretical significance of this study is twofold. First, it fills a gap in the study of data protection laws from a stance perspective, offering contributions to the literature on stance in legislative discourse. Second, it aims to put forward a specialised research model of stance which contributes to more informed and comprehensive stance identification and analysis in legal discourse. The practical significance of this study is likewise twofold. First, by uncovering the stance of data protection laws, it provides guidance for individuals and assists them in navigating and seeking improved protection of their personality and privacy rights. Second, it helps legislators convey their conceptual views, value balances, and consensus expectations in a more precise way through the presentation of stance features when revising or refining laws.

Stance and stance in law

Background review of stance

Generally, stance expressions play an important role in legislators’ articulation of legal values and strategic intentions within legislative texts. To investigate how legislators in different jurisdictions use stance as a tool to shape the identity of data protectors, thereby meeting the expectations of target audiences in legislative discourse, this study employs Hyland’s (2005, 2018) concept and framework of stance for analysis.

The conceptualisation of stance involves a set of umbrella terms, including attitude (Halliday 2004), appraisal (Martin 2000; Martin and White 2005), evaluation (Bednarek 2006; Hunston and Thompson 2000), interaction (Du Bois 2002), and posture (Grabe 1984). Among the current interpretations of the concept of stance, three are particularly influential and widely applied. Biber and Finegan (1989: 93) define stance as “the lexical and grammatical expression of attitudes, feelings, judgements, or commitment regarding the propositional content of a message”. Biber et al. divide stance devices into three categories: epistemic stance, attitudinal stance, and style-of-speaking stance (Biber et al. 2000; Conrad and Biber 2000). Another conceptualisation of stance underscores its interactive aspect (Du Bois 2007), portraying stance as a form of public social behaviour and putting forward the “stance triangle” as a structure for interactive discourse. Hyland (2005) posits that stance “expresses a textual ‘voice’ or community-recognised personality” and introduces the “stance and engagement framework” (Hyland and Jiang 2016), which is chosen as the main theoretical framework for the present study.

In terms of research fields, previous studies have shown that most stance-related research focuses on media discourse (Zhang 2018; Vis et al. 2012; Wu 2018; Myers 2014; Vertommen et al. 2012) and political discourse (Delandshere and Petrosky 2004; Chand 2011; Nir et al. 2014). This is attributed to the ease of identifying explicit stance expressions in these texts (Hyland and Jiang 2016). In recent years, stance research has flourished in academic writing, specifically branching into stance analysis of various academic genres such as abstracts (Hyland and Tse 2005; Pho 2008), undergraduate essays (Aull and Lancaster 2014), doctoral dissertations (Chan 2015), and research articles (Hyland 2012). It can be seen that stance research in legal discourse is far less abundant compared to other linguistic domains, which underscores the necessity for further studies.

Integration into the profession: stance in law

While research on stance as a theoretical or methodological framework in legal discourse may not be as extensive as in other discourse domains (Breeze 2011), it still covers various aspects within legal discourse. Evidence can be found in research on the construction of stances using the Noun in analysing strategic behaviour exhibited by Justices (Charles 2007), study on stance expressions in courtroom discourse (Chaemsaithong 2015, 2017), analysis of stances in court judgments (Cheng and Cheng 2014), study of the use of epistemic lexical verbs as hedging devices in different legal genres (Vass 2017), examination of stance adverbs in judicial opinions (Poole 2021), the investigation into stance markers in oral argument about marriage laws (Tracy 2011), corpus studies on the ideological stance of threatening language (Gales 2010), exploration of stance-taking on abortion jurisprudence (McKeown 2022) and evaluative analysis of the stance on reputational harm in defamation law (Izes 2023).

It can be seen that previous research on stance in law mainly focuses on the judicial domain, with comparatively fewer studies in the legislative field. As previously discussed, stance expressions and features are more readily detectable in discourse areas such as media, politics, and the judiciary as well. Compared to legislative discourse, this easily detectable presence of stance is notable in courtroom discourse and court judgments. According to Hyland and Jiang (2016), there is a close correlation between personal ideologies, value systems, and stance. Although the connection between legislation and individuals does not seem as close as in the judiciary, at a more abstract level, legislative texts do reflect deep-seated public ideologies and values.

Stance in legislative discourse involves a series of coherent rhetorical choices, which enables legislators or drafters to negotiate and balance various values associated with the law and legal subjects so as to meet the requirements of the rationality, validity, and fairness of legislation and the expectations of the general public toward the law. Based on the above literature review, it is evident that there is a need for ongoing academic focus on the stance in legislative discourse.

Inspired by the review of stance and stance in law, this study explores stance expressions in legislative discourse with a corpus-driven approach, especially the patterns of stance construction in data protection laws in the U.S., the EU, and China to identify their respective strategic expressions and stance orientation.

Methodology

A corpus-driven approach to stance analysis

The corpus, as a tool, is increasingly prevalent in quantitative research. Etymologically, the term corpus (plural corpora) traces its origins to the Latin word for body. Following this, Baker (2010: 6) defines the corpus as “a ‘body’ of language, or more specifically, a (usually) very large collection of naturally occurring language, stored as computer files”. Therefore, it could be said that the corpus is “a finite collection of machine readable text” (McEnery and Wilson 2001: 197). As a form of empirical research, corpus analysis facilitates the exploration of large-scale real-life examples using tools such as AntConc, Wmatrix, LancsBox, and others, rather than relying solely on a limited number of typical examples or introspection. Put another way, corpus analysis enables us to gain a more objective understanding of the theme and object under study. For instance, in legal studies, traditional doctrinal research relies on in-depth analysis of one typical case to derive insights into a specific field. In contrast, corpus analysis allows a large number of relevant court judgments to be extracted (Mattioli and McAuliffe 2021) to obtain more extensive, evidence-based or data-based results for further attribution. Tognini-Bonelli (2001) distinguishes between corpus-based and corpus-driven research. The former employs the corpus as a source of examples to validate researchers’ intuition, while the latter takes an inductive approach by utilising a series of real data and procedural designs. The corpus-driven approach is based on the view of corpus analysis as a method or methodology. Whether a study is corpus-driven or corpus-based, representativeness is a fundamental and crucial consideration (Biber 2006) for the selection of corpus data.

Stance, as a kind of typical linguistic marker (Aull and Lancaster 2014), needs to be annotated or identified effectively, especially in large-scale texts. The manual annotation or identification of stance markers is a time-consuming process, which makes corpus analysis the most convenient and, consequently, the most prevalent method for researchers in stance studies. Scholars (e.g., Crosthwaite and Jiang 2017; Aull et al. 2017; Rezaei et al. 2021) employ the corpus as a tool to investigate various stance categories or conduct horizontal and longitudinal comparative studies. The present study, focusing on the stance analysis in data protection laws of the U.S., the EU, and China, appropriately adopts the corpus-driven approach, which shows both sound rationale and practical feasibility.

Materials and corpora

The corpus-driven approach requires that the corpus data to be explored be well matched to the research questions (McEnery and Hardie 2012). This implies that corpus data should be selected to best address the proposed research questions. In response to the research questions posed in this study, three comparable sub-corpora were self-compiled, consisting of 12 legislative texts pertaining to data protection laws and comprising a total of 169,167 tokens. In this context, the term ‘token’ refers to the occurrence of a word form in a text or corpus. As for the concept of comparable corpora, Hunston (2002) takes sampling from different varieties of the same language as an exemplary standard. Four specific standards were considered in this study when retrieving the materials: comparable language, representativeness, pertinent search terms, and an authoritative database. The composition and size of the corpora are shown in Table 1.

Table 1 The detailed information of the three corpora.

Firstly, the United States Legislation Corpus (hereinafter referred to as USLC) comprises one federal and six state data protection legislative texts. Specifically, it includes the American Data Privacy and Protection Act (ADPPA), the California Privacy Rights Act (CPRA), the California Consumer Privacy Act (CCPA), the Colorado Privacy Act (CPA), the Connecticut Data Privacy Act (CTDPA), the Virginia Consumer Data Protection Act (VCDPA), and the Utah Consumer Privacy Act (UCPA). For an extended duration, the U.S. lacked a unified data protection law at the federal level. However, the enactment of the ADPPA on June 3, 2022, by the U.S. House of Representatives and Senate marked a groundbreaking moment in federal legislation, which establishes a robust national framework for personal data protection. In order to show a more comprehensive view of the U.S. data privacy laws, this study also includes the enacted laws of data privacy and protection in five states.

Secondly, the European Union Legislation Corpus (hereinafter referred to as EULC) includes two key legislative texts concerning personal data protection: the General Data Protection Regulation (GDPR) and the Data Protection Law Enforcement Directive (Directive). Enacted in 2016, GDPR is not only the world’s earliest but the most comprehensive data protection law, serving as a pivotal benchmark in global legislative efforts (Hoofnagle et al. 2019; De Hert and Papakonstantinou 2016). Since its implementation in 2018, GDPR has produced numerous administrative enforcement cases. The leading role of the EU in data protection legislation and enforcement is tied up with its long-standing tradition of respecting and safeguarding privacy rights as fundamental human rights.

Thirdly, the Chinese Legislation Corpus (hereinafter referred to as CLC) is extracted from the PKU-law database (English version) and comprises three highly relevant data protection laws, namely, the Personal Information Protection Law of the People’s Republic of China (PIPL), the Data Security Law of the People’s Republic of China (DSL), and the Cybersecurity Law of the People’s Republic of China (CL). The CL, enacted on November 7, 2016, lays the groundwork for China’s data protection legal framework (Cheng and Liu 2022; Wang et al. 2020), explicitly emphasising the protection of personal information. Passed on June 10, 2021, the DSL stands as a foundational legal instrument in China’s data security domain, with a primary focus on national security. Enacted on August 20, 2021, the PIPL is China’s first comprehensive legislation on personal information protection (Zhao and Feng 2021), which shares notable similarities with GDPR.

Method

The following corpus analysis was carried out in three steps. First, the data for the three corpora were converted to plain text files and then loaded into the corpus analysis tool LancsBox (Brezina et al. 2015). Next, the corpora were searched for stance features following the stance markers list of Hyland (2018). This list of stance devices is widely applicable and is not specific to any one discipline (Crosthwaite et al. 2017), which makes it suitable for examining the stance features of data protection laws in the present study. The full list of stance items can be found in the Appendix. After automated searching in the corpora, each stance item was manually examined and counted to ensure that it genuinely performed a stance function. For instance, in the present corpora, agree in “Pre-Dispute Arbitration Agree” does not function as a stance marker but rather as part of a specific legal term, and therefore requires manual exclusion. The statistical results of stance items were double-checked by the co-author and, where necessary, further consultation with peers was sought to ensure the reliability of the data analysis.
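The search-then-verify procedure in this first step can be sketched in a few lines of Python. This is only an illustrative approximation, not the study's actual workflow: the marker list below is a tiny hypothetical subset rather than Hyland's (2018) full list, the function name `count_stance_markers` is invented for the example, and the exclusion list merely stands in for the manual check that the authors carried out by hand.

```python
import re
from collections import Counter

# Hypothetical mini-list for illustration; the study uses Hyland's (2018) full list.
STANCE_MARKERS = {
    "hedges": ["may", "should", "possible", "likely"],
    "boosters": ["establish", "established"],
    "attitude markers": ["appropriate", "agree"],
}

# Hypothetical exclusion phrases standing in for the manual check:
# hits inside these phrases are treated as legal terms, not stance markers.
EXCLUDED_PHRASES = ["pre-dispute arbitration agree"]

def count_stance_markers(text, markers=STANCE_MARKERS, excluded=EXCLUDED_PHRASES):
    """Count whole-word marker hits per category, skipping excluded phrases."""
    lowered = text.lower()
    # Blank out excluded phrases so markers inside them are not counted.
    for phrase in excluded:
        lowered = lowered.replace(phrase, " ")
    counts = {}
    for category, items in markers.items():
        tally = Counter()
        for item in items:
            hits = re.findall(r"\b" + re.escape(item) + r"\b", lowered)
            if hits:
                tally[item] = len(hits)
        counts[category] = tally
    return counts

sample = ("The parties agree that a Pre-Dispute Arbitration Agree clause "
          "may apply and should be appropriate.")
counts = count_stance_markers(sample)
for category, tally in counts.items():
    print(category, dict(tally))
```

In this sketch the first agree is counted as an attitude marker while the one inside the legal term is skipped, mirroring the kind of exclusion described above. A fixed phrase list cannot replace human judgement about whether an item performs a stance function in context.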

Second, the “stance and engagement framework” (Hyland and Jiang 2016) was adopted to analyse different dimensions of stance. While this framework includes both the writer-oriented features of “stance” and the reader-oriented features of “engagement”, we examined only the former. The latter category, which includes reader pronouns, directives, questions, shared-knowledge devices, and asides, was observed to be rare in legislative texts. Hyland’s framework (Hyland 2005; Hyland and Jiang 2016) on stance comprises three main components: evidentiality, affect, and presence, along with four types of stance devices: attitude markers, hedges, boosters, and self-mentions, which we explain and analyse in detail below.

Third, based on the results of step 2, a concordance analysis was carried out to delve deeper into the findings obtained from the comparison of the three corpora. It involves extracting salient stance items and performing a concordance search to locate their contextual information, as outlined by Scott and Tribble (2006). According to Sinclair (1991), the meaning and function of a word in a specific discourse are revealed through its co-textual features. Given that the analysis of concordances is a qualitative form of analysis and the analysis of frequency data is a quantitative form, this study adopts a mixed-methods approach combining both qualitative and quantitative analysis.
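A concordance search of the kind described in this third step displays each hit of a node word together with its left and right co-text (a key-word-in-context, or KWIC, view). The following minimal sketch assumes a plain-text input and a character-based context window; the function name `concordance` and the `width` parameter are illustrative choices and do not reproduce the behaviour of any particular corpus tool.

```python
import re

def concordance(text, node, width=30):
    """Return KWIC lines as (left context, node word, right context) tuples."""
    lines = []
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append((left, match.group(0), right))
    return lines

# Sample built from two short excerpts quoted in the corpus examples.
sample = ("No individual or organisation may conduct any activity endangering "
          "cybersecurity. A covered entity may not infer consent from inaction.")

for left, node, right in concordance(sample, "may"):
    # Right-align the left context so the node words line up in a column.
    print(f"{left:>30} [{node}] {right}")
```

Aligning the node word in a single column makes recurrent patterns in the co-text, such as a following not, easy to spot when reading down the lines.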

Corpus findings and analysis

Cross-corpora comparison of stance markers

The present study makes a holistic comparison of stance markers across the three corpora. As shown in Table 2, the proportion of stance items is highest in EULC, at approximately 28‱, while CLC has the lowest proportion at around 14‱, with USLC falling in between. It can be seen that in terms of data protection legislation, the stance of the EU appears more distinct and prominent compared to that of the U.S. and that of China. The findings also align with the view proposed by Pei and Cheng (2020) regarding the stringent standards set by the EU in data protection. However, the lower proportion of stance items in CLC does not directly imply a lack of clarity in China’s stance on data protection or even lower standards for data protection. Instead, it might indicate an intentional legislative strategy, which will be further explored and explained in the discussion section.
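Because the three corpora differ in size, figures such as these only become comparable once raw counts are normalised. The sketch below assumes the conventional normalisation, raw occurrences divided by corpus size in tokens and scaled to 10,000; the function name and the input figures are hypothetical and are not taken from Table 2.

```python
def per_ten_thousand(raw_count: int, corpus_tokens: int) -> float:
    """Normalise a raw count to occurrences per 10,000 tokens."""
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    return raw_count / corpus_tokens * 10_000

# Hypothetical figures for illustration only (not from Table 2).
hypothetical_hits = 140
hypothetical_corpus_size = 50_000
rate = round(per_ten_thousand(hypothetical_hits, hypothetical_corpus_size), 2)
print(rate)
```

Under this assumption, 140 hits in a 50,000-token corpus correspond to a rate of 28 per 10,000 tokens, so a small corpus and a large one can be placed on the same scale.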

Table 2 The comparison of stance markers.

Figure 1 shows the distribution of stance devices across the three corpora. As we will present a comparative analysis of four distinct dimensions in detail below, here we will mainly focus on summarising their commonalities. The commonality analysis offers insights into the performance of various stance dimensions in data protection laws. It is evident that all three corpora exhibit the highest proportion of hedging devices, which indicates that hedging devices are most often used to express stance in data protection laws. Furthermore, it reflects the prudent and rational but intricate nature (Cheng and Cheng 2014) of data protection legislation and the discursive space constructed by legislators in this domain. Additionally, it is obvious that all three corpora show a remarkably low presence of self-mentions. The data reveals that both EULC and CLC demonstrate a complete absence of self-mention devices. This finding appears to contrast sharply with trends observed in stance studies of other genres. In certain contexts such as speeches or dialogues, self-mentions are relatively more frequent, serving to express the discoursal self (Hyland 2005) and foster interaction between the author and potential audiences. However, it comes as no surprise that self-mentions are less common in expository texts, particularly in legislative texts, as the lower self-presentation aims to underscore the objectivity of the text. Therefore, it can be inferred that not only in the legislative texts of data protection in this study, but in legislative texts in general, self-mentions display a low prevalence to maintain and safeguard the objectivity of legislation. We then turn to the use of these stance features by section.

Fig. 1: The distribution of stance devices in three corpora.

AM attitude markers, B boosters, H hedges, SM self-mentions.

Cross-corpus comparison by section

Evidentiality

According to Hyland (2005), evidentiality refers to “the writer’s expressed commitment to the reliability of the propositions he or she presents and their potential impact on the reader”. Within the legislative context, the assessments of certainty enable legislators to convey legal statements from a particular perspective. Evidentiality is manifested through hedges and boosters (Hyland and Jiang 2016). Both elements indicate legislators’ explicit intrusion into a text to convey a legislative stance, whether to ensure reliability or convey legal reasoning. Therefore, by assessing the weight legislators impose on the propositions, we can infer their commitment to legislative reliability and legal reasoning. Figure 2 reports the distribution frequency of boosters and hedges in data protection laws. Next, we will conduct a comparative corpus analysis of these two elements.

Fig. 2: The frequency of boosters and hedges used in data protection laws.

“Boosters” is represented by blue blocks and “Hedges” by orange blocks.

Hedges convey a prudent stance of the writer by indicating a lack of complete commitment to a proposition, which allows information to be presented as an opinion rather than a fact (Hyland 2005). According to Fig. 2, we can observe that hedges are most prevalent in EULC, reaching 105.77 per 10,000 tokens, significantly exceeding those in USLC and CLC. It suggests the EU places greater emphasis on legal reasoning and rigour regarding data protection.

Further examination of the wordings for hedges reveals differences and similarities among the three jurisdictions regarding permissions and obligations in personal data protection laws. Table 3 reveals that the most frequently occurring hedging devices across all three corpora are primarily “modality”, including epistemic modality like likely and possible, as well as deontic modality such as may and should. Modality has been investigated in legal discourse in different contexts of situation (e.g., Cheng and Cheng 2014; Cheng and Sin 2011; Li et al. 2016), where it is used to reflect the ways in which legislators or jurists construct legal facts and indicate legal possibilities, or to examine the distribution of legal permissions and obligations. Table 3 shows that should has the highest frequency in EULC, whereas may is the most frequent hedging device in USLC and CLC. Each concordance line containing should or may was examined in context to validate the aforementioned claim.

(1) [GDPR.txt] Results of such verification should be communicated to the person or entity referred to in point (h) and to the board of the controlling undertaking of a group of undertakings, or of the group of enterprises engaged in a joint economic activity, and should be available upon request to the competent supervisory authority.

(2) [directive.txt] The principles of, and rules on the protection of natural persons with regard to the processing of their personal data should, whatever their nationality or residence, respect their fundamental rights and freedoms, in particular their right to the protection of personal data.

(3) [ADPPA.txt] any time beyond the initial 2 times described in subparagraph (A), may allow the individual to exercise such right for a reasonable fee for each request.

(4) [ADPPA.txt] A covered entity may not infer that an individual has provided affirmative express consent to an act or practice from the inaction of the individual or the individual’s continued use of a service or product provided by the covered entity.

(5) [CL.txt] No individual or organisation may conduct any activity endangering cybersecurity,

The data in Table 3 initially appear to show that the EU personal data protection legal system places a strong emphasis on obligations, particularly those imposed on data processors, since should dominates in EULC whereas may dominates in USLC and CLC. However, further concordance analysis reveals that this preliminary observation is not entirely accurate. While may typically conveys permission, its meaning changes to obligation when used in conjunction with no or not (Li et al. 2016), as in examples (4) and (5). For instance, in the case of the U.S., Fig. 3 reports the occurrence of may not in USLC. Even so, obligations in the EU personal data protection laws still appear to be generally more prominent than those in the U.S. and China.
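The distinction between permissive may and obligation-bearing may not (or may governed by a preceding no) can be approximated with simple pattern matching. The following is a rough heuristic sketch, not the procedure used in the study: the function name `classify_may`, the labels, and the regular expressions are assumptions, and real concordance lines would still need to be read in context, since a no occurring earlier in the same clause for unrelated reasons would be misclassified.

```python
import re

# "may not": direct negation of the modal.
MAY_NOT = re.compile(r"\bmay\s+not\b", re.IGNORECASE)
# "No ... may": a negative element earlier in the same clause
# (no full stop or semicolon between "no" and "may").
NO_BEFORE_MAY = re.compile(r"\bno\b[^.;]*?\bmay\b", re.IGNORECASE)
MAY = re.compile(r"\bmay\b", re.IGNORECASE)

def classify_may(sentence: str) -> str:
    """Label a sentence by how 'may' is used: permission, obligation, or absent."""
    if not MAY.search(sentence):
        return "no may"
    if MAY_NOT.search(sentence) or NO_BEFORE_MAY.search(sentence):
        return "obligation"
    return "permission"

# Shortened versions of examples (3), (4), and (5).
examples = [
    "may allow the individual to exercise such right for a reasonable fee",
    "A covered entity may not infer that an individual has provided consent",
    "No individual or organisation may conduct any activity endangering cybersecurity",
]
labels = [classify_may(s) for s in examples]
for sentence, label in zip(examples, labels):
    print(f"{label:<11} {sentence}")
```

Counting the two labels separately shows why the raw frequency of may alone can understate how many obligations a text imposes.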

Table 3 The top 5 hedging devices ranked by frequency.
Fig. 3: The occurrences of may not in USLC.

In terms of epistemic modality, possible appears as one of the top five hedging devices in all three corpora. As can be seen from the data in Table 3, the frequency of possible in EULC is 6.69 per 10,000 tokens, significantly higher than the frequencies in USLC (1.94) and CLC (1.15). It underscores the prudent approach of the EU personal data protection laws in indicating legal possibilities, which allows for discretion in legal interpretation (Cheng and Cheng 2012).

Boosters, on the other hand, express conviction and assert claims (Hyland 1999). As shown in Fig. 2, the frequency of boosters is broadly similar across the three corpora, with slightly higher occurrences observed in USLC and CLC compared to EULC. Notably, this contrasts with the earlier analysis of hedges in Table 3. The relatively limited use of certainty-based stance markers in the EU personal data protection laws reflects a preference for objectivity and a reduced reliance on subjective beliefs (Hyland and Jiang 2016). This reaffirms the prudent and modest approach of the EU data protection legislation, as indicated in Fig. 2 and Table 3.

When we turn our attention to the wordings of boosters, a notable observation stands out: personal data protection legislation demonstrates a strong sense of legal constructiveness. This is closely related to the status of data protection as an emerging field, in which rules and principles need to be established and reinforced through stance markers. It can be seen from Table 4 that establish(ed) appears with high frequency across all three corpora, with values of 7.54 in USLC, 10.7 in EULC, and 10.89 in CLC. In USLC, establish is followed by meaningful enforcement, which indicates the emphasis that U.S. data protection laws place on the enforcement of legal rules. In EULC, the occurrence of established also reflects the requirement that other instruments be adapted to the principles and rules of the GDPR. The high frequency of establish in CLC aligns with the characteristic of strong legal constructiveness in Chinese legislation (Paler 2005). In Chinese data protection legislative texts, establish consistently co-occurs with improve, further reflecting this attribute. Based on the analysis of the boosters, it can be noted that legislative texts on data protection consolidate and emphasise the legal rules and principles constructed by stance markers such as establish(ed).

(6) [ADPPA.txt] To provide consumers with foundational data privacy rights, create strong oversight mechanisms, and establish meaningful enforcement.

(7) [directive.txt] Regulation (EC)No 45/2001 and other Union legal acts applicable to such processing of personal data should be adapted to the principles and rules established in Regulation (EU)2016/679.

(8) [CL.txt] establish and improve the cybersecurity guarantee system, and enhance the capability to protect cybersecurity.
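The co-occurrence of establish with items such as improve or meaningful enforcement can be checked by counting the words that fall within a small window around the node word. The sketch below is a simplified illustration under stated assumptions: the function name `collocates`, the three-word window, and the letter-only tokenisation are choices made for this example, and the study's own co-occurrence observations come from reading concordance lines rather than from this code.

```python
import re
from collections import Counter

def collocates(text, node, window=3):
    """Count words occurring within `window` tokens to either side of `node`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            start = max(0, i - window)
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts

# Example (8) from the CL text.
sample = ("establish and improve the cybersecurity guarantee system, "
          "and enhance the capability to protect cybersecurity.")
neighbours = collocates(sample, "establish")
print(neighbours.most_common())
```

With a window of three, improve is captured as a neighbour of establish while enhance, which lies further away, is not. Widening the window trades precision for coverage.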

Table 4 The top 5 boosting devices ranked by frequency.

Affect

Attitude markers, as one of the dimensions of stance, convey the writer’s affect, involving emotions, perspectives, and beliefs towards propositions (Hyland 2005). Examples include linguistic markers such as agree, interestingly, appropriate, and others that directly express the writer’s affective stance. Affect typically conveys the writer’s sentiments or evaluations, carrying a strong sense of subjectivity. Consequently, affective expressions are relatively rare in legislative texts. However, it is precisely because of this rarity that the attitude markers appearing in the corpus to be explored hold greater research value, as they denote a strong and distinct stance and value orientation (Yu 2023). Legislators seek to convey stance on specific legal topics through shared attitudes, values, and reactions in an implicit way. Attitude markers are typically categorised into negative and positive forms (Hyland and Jiang 2016), reflecting the attitudinal inclinations of the writer.

Figure 4 presents a comparison of the frequency of attitude markers across the three corpora. It is evident that attitudinal stance is more frequent in EULC, while appearing less frequently in USLC, at only 6.59 per 10,000 tokens. It suggests a concerted effort by the EU legislators to convey and underscore a strong sense of stance and value orientation in the realm of personal data protection. This point will be further elaborated in the discussion section in conjunction with the socio-legal context.

Fig. 4: The frequency of attitude markers used in data protection laws.

A comparison of the USLC, EULC, and CLC.

More valuable evidence comes to light when we look at the specific use of attitude markers. As depicted in Table 5, appropriate, as an attitude marker, appears relatively frequently in the three corpora. It peaks in EULC at 21.73 occurrences per 10,000 tokens, followed by 7.45 in CLC; although its frequency in USLC is lower, at 4.53, it still ranks first within this sub-corpus. Appropriate seems to convey a leaning-towards-negative meaning (Appel et al. 2016), which indicates a higher level of prudence compared to positive-oriented attitude markers. However, negative-oriented attitude markers can also convey a clear stance, that is, a prudent and impartial legislative attitude. For instance, expressions like “take reasonable and appropriate steps” in the CPRA text and “at appropriate time” in the CL text serve this function effectively.

Table 5: The top 3 attitude markers ranked by frequency.

The findings also unveil another function of negative-oriented attitude markers, that is, a neutralising and weakening effect that disguises overly strong and direct stance expressions by legislators. In the following examples, GDPR imposes requirements for technical and organisational measures on data processing activities for scientific, historical research, or statistical purposes. This requirement itself represents an obligation and a direct and strong expression of stance. However, the inclusion of appropriate in this expression seems to mitigate the compulsory nature of the requirement. Nevertheless, the demand in this statement remains the same even without appropriate. This illustrates that appropriate in this context serves as a legislative discursive strategy to attenuate the stance.

(9) [GDPR.txt] accordance with Article 89(1) subject to implementation of the appropriate technical and organisational measures required by this Regulation in order to safeguard the rights and freedoms of the data subject (‘storage limitation’);

(10) [GDPR.txt] processed in a manner that ensures appropriate security of the personal data, including protection against unauthorised or unlawful processing and against accidental loss, destruction or damage, using appropriate technical or organisational measures (‘integrity and confidentiality’).

(11) [CPRA.txt] grants the business rights to take reasonable and appropriate steps to help to ensure that the third party, service provider, or contractor uses the personal information transferred in a manner consistent with the business’s obligations under this title.

(12) [CL.txt] The standardisation administrative department of the State Council and other relevant departments of the State Council shall, according to their respective functions, organise the formulation of and revise at appropriate time national and industry standards relating to cybersecurity administration and the security of network products, services and operations.

Additionally, it is important to highlight certain attitude markers conveying positive meanings. Examples include important, which ranks first in CLC and second in EULC, and essential, which ranks second in USLC. Analysis of collocations and concordances reveals that important in CLC is always collocated with data. This underscores a significant feature of China’s data protection legislative system, which emphasises graded classification for data protection. The security and protection of important data are central to China’s data classification system. Balancing data security with developmental needs is a pressing task, highlighting the urgency for streamlined data management. Both the Cybersecurity Law and Data Security Law in China outline specific provisions for managing important data (Cheng et al. 2023), including cataloguing, designated personnel, and risk evaluation. These provisions underscore China’s prioritisation of “security”, “national interests”, and “public interests” within its legislative endeavours in data protection.

In EULC, important typically collocates with public interest. This reflects the EU’s exception principle in data processing, whereby data processing restrictions can be overridden based on important public interests of the member states. It also mirrors the EU’s emphasis on public interest in the domain of personal data protection. In USLC, essential usually co-occurs with goods and services, shedding light on a key aspect of U.S. data protection legislation. Prior to the enactment of comprehensive national data privacy laws, the most comprehensive and typical personal data protection law is the CCPA, which focuses on consumer privacy protection. The primary concern in the U.S. regarding personal data and privacy rights stemmed from the balance of interests between businesses and consumers in the market economy (Acquisti et al. 2016).

(13) [CL.txt] Taking measures such as data categorisation, and back-up and encryption of important data.

(14) [DSL.txt] The national data security work coordination mechanism shall coordinate the relevant departments in the formulation of a catalogue of important data to strengthen the protection of important data.

(15) [GDPR.txt] exercise or defence of legal claims or for the protection of the rights of another natural or legal person or for reasons of important public interest of the Union or of a Member State.

(16) [CPA.txt] “DECISIONS THAT PRODUCE LEGAL OR SIMILARLY SIGNIFICANT EFFECTS CONCERNING A CONSUMER” MEANS……OR ACCESS TO ESSENTIAL GOODS OR SERVICES.
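
The collocation patterns described above (important with data in CLC, important with public interest in EULC) can be illustrated with a simple count of the word immediately to the right of the node word. The following Python sketch is an assumption-laden illustration rather than the study's actual method: the function names `tokenize` and `right_collocates` are hypothetical, the tokeniser is a bare regex, and the input consists only of examples (13) to (15) quoted above, not the full corpora.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens from a simple regex tokeniser (assumption)."""
    return re.findall(r"[a-z]+", text.lower())

def right_collocates(texts, node):
    """Count the token that immediately follows each occurrence of `node`."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        # Pair each token with its successor; a node in final position has none.
        for tok, nxt in zip(tokens, tokens[1:]):
            if tok == node:
                counts[nxt] += 1
    return counts

# Examples (13)-(15) from this section, used here as toy input only.
examples = [
    "Taking measures such as data categorisation, and back-up and "
    "encryption of important data.",
    "The national data security work coordination mechanism shall coordinate "
    "the relevant departments in the formulation of a catalogue of important "
    "data to strengthen the protection of important data.",
    "exercise or defence of legal claims or for the protection of the rights "
    "of another natural or legal person or for reasons of important public "
    "interest of the Union or of a Member State.",
]

counts = right_collocates(examples, "important")
print(counts.most_common())
```

A dedicated corpus tool would normally use a wider window and an association measure rather than raw adjacent counts; the sketch only shows the kind of evidence on which a collocation claim rests.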

Presence

Presence typically refers to the author’s projection into the text (Hyland 2005), often achieved through self-mention devices such as first-person pronouns and possessive determiners. Self-mention devices commonly include “I, we, me, my, our, mine, us, the author, the author’s, the writer, and the writer’s” (Hyland 2018: 269–270). It can be inferred that this kind of stance-expressing device tends to appear less frequently in legal texts, especially legislative texts. This is because legislative texts typically strive to avoid subjective statements and expressions as much as possible. Furthermore, legislation does not represent the will or thoughts of individual legislators; rather, it serves as a legal instrument designed by the entire nation to achieve specific purposes. As illustrated in Fig. 5, self-mentions have very low prevalence across all three corpora, with their absence in EULC and CLC being particularly notable. However, it is worth exploring how self-mentions appear and function within USLC to express stance.

Fig. 5: The frequency of self-mention used in data protection laws. A comparison of the USLC, EULC, and CLC.

The corpus search results indicate that my appeared 5 times and our appeared 4 times in USLC. Further concordance analysis reveals that my in USLC does not specifically refer to legislators or legislatures but rather refers to data subjects in the form of headline-like quotations. While this usage facilitates interaction between the legislature and the audience targeted by legislative texts, it does not strictly function as a self-mention marker. In addition, our in USLC effectively performs a stance function by linking legislators with readers through phrases such as “our society”, “our personal information”, and “our desire”, enhancing the readers’ identification with legislative values.

(17) [CCPA.txt] Provide a clear and conspicuous link on the business’ Internet homepage, titled “Do Not Sell My Personal Information”, to an Internet Web page that enables a consumer, or a person authorised by the consumer, to opt out of the sale of the consumer’s personal information.

(18) [CPRA.txt] California is the world leader in many new technologies that have reshaped our society.

(19) [CCPA.txt] A series of congressional hearings highlighted that our personal information may be vulnerable to misuse when shared on the Internet. As a result, our desire for privacy controls and transparency in data practices is heightened.
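
The concordance analysis of self-mention devices can be pictured as a keyword-in-context (KWIC) search: each whole-word hit is shown with a fixed span of text on either side so that its referent can be judged by a human reader. The Python sketch below is a minimal illustration under stated assumptions; the function name `kwic`, the 30-character context width, and the use of example (19) as input are all choices made here, not details reported by the study.

```python
import re

def kwic(text, node, width=30):
    """Return (left, hit, right) context triples for each whole-word match of `node`."""
    pattern = re.compile(rf"\b{re.escape(node)}\b", flags=re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left, m.group(0), right))
    return lines

# Example (19) from this section, used as toy input only.
sample = (
    "A series of congressional hearings highlighted that our personal "
    "information may be vulnerable to misuse when shared on the Internet. "
    "As a result, our desire for privacy controls and transparency in data "
    "practices is heightened."
)

hits = kwic(sample, "our")
for left, hit, right in hits:
    # Right-align the left context so that the node words line up in a column.
    print(f"{left:>30} [{hit}] {right}")
```

Deciding whether a given hit genuinely functions as self-mention, as with my in example (17), remains a matter of manual reading; the search only retrieves candidates.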

However, it is noteworthy that the self-mention device our appears in the legislative text of the U.S. data protection laws not within the statutes themselves but rather in the declaration section of the legislature. Hence, it remains evident that self-mention devices are rarely present in formal legislative texts, in order to ensure the authority and objectivity of the law to the greatest extent.

Discussion

Stance practice in data protection laws

In the preceding section, we conducted a thorough analysis of the divergences and convergences in stance expression between the data protection laws of the U.S., the EU, and China, thus addressing RQ1. Building upon this foundation, we aim to answer RQ2 and RQ3 through in-depth discussion in this section, thereby exploring and refining the theoretical contributions of this comparative study.

Regarding RQ2, the data protection laws of the U.S., the EU, and China demonstrate two overarching stance orientations. One is the modesty of legislation. Across all three jurisdictions, legislative texts present a prevalence of prudent hedging devices and a scarcity of subjective self-mention devices. This stance orientation seems to be found in various genres of legislative texts. The current exploration of data protection laws reflects the prudence and objectivity that legislators have shown when navigating the complexities of legislating in an emerging domain. The other is the sufficient discursive space left for the data protection judiciary. This is dictated by the distinctive nature of data protection laws compared to other genres of law. In the digital world, various unprecedented legal cases concerning data continuously emerge, such as those related to data scraping (Liu and Chen 2021), with the result that judicial precedents often precede legislative action in data protection regulation. This characteristic is evident in both civil law and common law systems. Essentially, in domains heavily influenced by technology, such as data protection and intellectual property rights, statutory laws frequently fail to keep pace with societal advancements. Consequently, judicial decisions can provide normative guidance for social behaviour (Kordzadeh and Ghasemaghaei 2022; Pei and Cheng 2022). Given this dynamic, it is imperative for data protection legislation to afford sufficient discursive space for judicial discretion.

Regarding RQ3, the underlying reasons for the distinctions in stance practices among the U.S., the EU, and China stem from their respective legislative values and public ideologies. The voice of the text transmitted by stance expressions mirrors the different national voices in data governance. Specifically, the prevalence of hedges in EULC underscores the EU’s commitment to precision and reasoning in data protection legislation. This inclination can be traced back to the EU’s pioneering role in enacting personal data protection legislation on a global scale, without any existing framework to draw from. Consequently, the EU draws upon its rich historical traditions and legislative experiences concerning personal privacy protection (Li and Kit 2021), shaping the right to the protection of personal data as a fundamental right within its legislative framework. Additionally, the examination of boosters reveals a strong inclination of the EU legislature to convey legislative values. This aligns with the EU’s ambition to promulgate its values and principles of personal data protection globally. Despite its significance, GDPR still exhibits inherent limitations, which China has taken note of and addressed through legislative improvements. The relatively low proportion of stance items in CLC may suggest a strategic legislative approach. This is because China, in subsequent legislative efforts, recognised the challenges posed by the EU’s stringent standards of personal data protection, which hindered data flow and transactions within the EU. Therefore, China has embraced a comparatively neutral stance in its approach to data protection legislation, seeking to strike a balance between protecting individual data rights and facilitating the utilisation and flow of data (Cheng and Liu 2023). This stance is manifested in the use of covert stance markers aimed at achieving objectivity within the law. Finally, it can be inferred that the legislative stance in the U.S. places importance on national security while also emphasising the protection of personal data and privacy rights, alongside facilitating data utilisation to foster the growth of a data-driven economy.

Stance in law

This part takes a specific view of the stance in law. Based on the influential work of Hyland (2005) and the close examination of data protection laws, a more detailed and law-targeted research model of stance is proposed, as shown in Fig. 6. Through a comprehensive and detailed analysis of data protection legislative corpora, the judgments of the landmark Schrems case (Flórez Rojas 2016), which mirrors a principled stance on data protection (Zalnieriute 2022), and supplementary relevant judgments, this study sub-categorises each main type of stance into different sub-types depending on the semantic meanings of particular markers used. These sub-types of stance devices are particularly suitable for stance identification and analysis in legal texts.

Fig. 6: The model of stance in law (adapted from Hyland’s (2005) stance model).

Evidentiality in law is important because “fact is not found but reconstructed based on the admissible evidence” (Cheng and Cheng 2014), especially in court judgments (Wu and Cheng 2020). As a kind of writer’s commitment to the reliability of the propositions, evidentiality involves not only the degree of certainty but also the source of the propositional information (Aikhenvald 2004). Hence, on the one hand, the balance of boosters and hedges in the text indicates a modulatory strategy of the writer to convey a commitment to the text content; on the other hand, the division of mental and material evidentials is based on the source of the propositional information. Elucidating further, mental or mental-state evidentials refer to the evidence or information acquired through one’s personal opinion or beliefs. For example, I think is the most typical and subjective way to state one’s opinion. Material evidentials, contrary to mental ones, are used when the speaker or writer has not personally perceived the information but has acquired it from others or other things. Legally, mental evidentials, pertaining to the cognitive domain, are exempt from evidential burden as thoughts are inherently unverifiable, which reflects a subjective orientation of stance. Conversely, material evidentials often need linguistic evidence as support, which inclines towards a more objective stance. There are three main sub-types of material evidentials: inferred, reported and documented. Inferred evidentials refer to the reasoning process and denote its reliability in view of the quality of the available evidence (Aikhenvald 2004). Therefore, the use of an inferred evidential allows the writer to explicitly qualify their statement and convey to the audience the extent to which the proposition can be considered well-founded or tentative (Kwon 2012). 
Reported evidentials refer to statements that are based on information reported by others rather than direct observation or first-hand experience by the speaker or the writer (Clift 2006). The use of reported evidentials allows the recipient to properly evaluate the credibility of the statement based on the original source of information. Documented evidentials denote the information based on an official documented form, such as legislation, legal contracts or any other form of documentation that can be examined and referenced within the legal proceeding (Mithun 2020). In legal judgments, documented evidentials are generally given greater weight and credibility than reported evidentials, as they are considered more reliable and less susceptible to potential biases that can arise from the reporting process. Examples of sub-types of evidentiality extracted from the judgments are shown in Fig. 7.

Fig. 7: Examples of different types of "Evidentiality".

Affect in law, especially in judicial practice, has been widely explored in the area of law and psychology or sociology of law (e.g., Rossmanith 2015; Maroney 2011; Amaya and Del Mar 2020). For instance, the act of drafting a legal judgment is an exercise of attitudinal persuasion (Anleu and Mack 2021), either advocating for or against a particular stance. By expressing an affirmative or negative attitude, it aims to convince the parties involved, as well as other readers, of the validity and correctness of the court’s decision (Baum 2017; Barak 2009). Affirmative attitudes, including recognitory, encouraging, and commendatory attitudes, generally denote positive feelings towards legal actions, which highlights their validity, effectiveness, or compliance with legal standards. Negative attitudes, including questioning, cautionary, critical, and punitive attitudes, indicate negative feelings towards legal actions, which identify shortcomings, violations, or areas where the actions fail to meet legal standards. Therefore, investigating the affect in law is vital for a comprehensive understanding of the attitudinal factors that influence lawmakers and judges. It helps in identifying the motivations, orientations and values of the people who made these affective evaluations. Figure 8 shows examples of sub-types of affect extracted from the judgments.

Fig. 8: Examples of different types of "Affect".

“All writing carries information about the author” (Hyland 2018: 62), and legal texts are no exception. In legal texts or documents, writers make a conscious choice about the presence or absence of the author to adopt a particular stance and establish a contextually situated authorial identity. When engaging in the act of writing, writers inevitably project their own subjectivity and positionality within the work. Furthermore, the writer’s stance and orientation towards others will also be woven into the fabric of the text. Hence, presence in legal discourse can be divided into two types according to the distinction of writers and other actors: self-projection and other-projection. Self-projection indicates the personal projection of the writers as a powerful means of self-representation. It usually has two ways of expression, namely, self-mention and self-citation. Other-projection, on the other hand, indicates referencing actions or statements made by other parties involved in the legal process to show the writer’s stance. For instance, a judge might refer to a plaintiff’s claims to evaluate their credibility, or to a witness’s testimony to support or refute a point of law. Distinguishing between self-projection and other-projection helps to analyse how legal writers position themselves and others within legal discourse. Figure 9 shows examples of sub-types of presence extracted from the judgments and the corpora in this study.

Fig. 9: Examples of different types of "Presence".

On balance, this study makes three main contributions at the intersection of stance and legal discourse. Firstly, compared to other genres of law, stance expressions of data protection laws show a unique sense of legal constructiveness. In this context, legal constructiveness is reflected in the use of stance markers and other linguistic markers in order to achieve the establishment and reinforcement of rules and principles for emerging laws such as the data protection legislation being explored in our study. As for traditional or long-standing legislation, this type of legal constructiveness through stance expression may not be as pronounced but has developed a relatively stable form. The study of stance markers in marriage law (Tracy 2011) and defamation law (Izes 2023) serves as evidence that researchers usually view legislative stance as a consistent benchmark for studying stance in judicial opinions or oral arguments. The second contribution lies in the alignment of stance construction patterns in data protection legislation with broader legislative discourse. Contrary to the anticipated prevalence of overt stance markers in legal discourse, such as the extensive use of boosters in judicial dissents (Boginskaya 2022), the analysis reveals a higher frequency of hedging devices in legislative discourse on data protection. This indicates that, compared to explicit and overt stance expressions, the legislative discourse on data protection prioritises covert ones, such as negative-oriented attitude markers, with the aim of upholding prudence, authority and objectivity in legislation. Lastly, the present study formulates a specialised research model of stance in law based on Hyland’s foundational work on stance, which provides a more targeted way to understand how stance is conveyed in legal texts. 
The enriched sub-categories shed light on how legal professionals express the subjective, objective, positive, negative, explicit, implicit and other more nuanced orientations of stance in their writing.

Concluding remarks

To sum up, this study set out to explore the patterns of stance representation and construction in data protection laws, while also re-contextualising the underlying strategic ideological expressions of lawmakers within the socio-legal framework. In this study, we adopted a corpus-driven approach to investigate stance expressions in data protection legislation across the U.S., the EU, and China. Drawing upon Hyland’s framework on stance, we derived valuable insights into various dimensions of stance. At a micro-level, the findings of this study reveal both commonalities and distinctions of stance expression within data protection legislation across three jurisdictions, which provide a relatively comprehensive understanding of their stance construction patterns. From a macro-level perspective, these different stance choices also mirror distinct text-external factors like legal cultures (Wagner et al. 2020), values, and ideologies. Notably, the research has also shown that stance tends to be conveyed implicitly rather than through explicit grammatical devices in legislative discourse on data protection.

The insights gained from this study highlight the important role of stance in legal discourse. Stance, as a series of linguistic rhetorical choices and a type of legal value orientation, enables legislators to intrude into texts explicitly or implicitly to convey legal reasoning and ideologies, reconcile conflicting interests and demands among stakeholders, and ensure the objectivity and fairness of the law. Simultaneously, for the target audience of legislative texts, unravelling the stance embedded in the law facilitates a more precise grasp of legislative intent, thereby aiding in a more effective understanding and application of legal protections for their rights and interests.

Overall, this study serves as a stepping stone for future explorations in identifying stance in legislative discourse and pushing the boundaries of our knowledge in stance theory. Future work is encouraged to build upon our findings and explore the rich territories of stance representation within the data protection judicial domain, especially within court judgments or courtroom discourse.