1 Introduction

E-voting aims at two main security properties, namely vote privacy and verifiability; the latter is often split into several sub-properties:

  • cast-as-intended: the ballot cast by the voter contains their intended vote;

  • recorded-as-cast: the ballot recorded in the ballot box corresponds to the one cast by the voter;

  • tallied-as-recorded: the result corresponds to the ballots recorded in the ballot box;

  • eligibility: ballots recorded in the ballot box only come from legitimate voters.

In this paper, we focus on cast-as-intended, which aims at protecting against a malicious voting device that could try to modify the vote of a voter, e.g. when encrypting it. Specifically, we consider the recently proposed Belenios-CaI system [5], which builds upon Belenios [6], a system now in production for 8 years and with more than 700,000 voters in total. In Belenios-CaI, voters audit their ballot by checking a control value. Instead of the sole encryption \(\textsf{enc}(v)\) of the voter’s vote v, the ballot consists of three encryptions \(\textsf{enc}(v), \textsf{enc}(a), \textsf{enc}(b)\) and a zero-knowledge proof that \(v+a=b\) modulo c, where c is some constant greater than the number of voting options. The voting device displays the ballot and the values v, a, b and c. The voter should check that \(v+a=b\) modulo c and randomly challenge their voting device to open either \(\textsf{enc}(a)\) or \(\textsf{enc}(b)\). The voting device has to provide the corresponding randomness, and the correctness of the encryption is then discharged to auditors: the voter simply checks that either a or b appears next to their ballot on the public bulletin board, and the auditors check that the randomness corresponds to the audited encryption. An interesting feature of Belenios-CaI is that the voter does not need any second device nor additional secure channel to receive extra material. However, Belenios-CaI presents usability issues. First, voters need to perform arithmetic operations (modular addition). Moreover, the audit of the ballot involves several verifications that the voter must perform, which seems cumbersome.

Our contributions. We propose an encoding of addition to ease the voter’s journey. Namely, we consider addition modulo 2 only, and encode addition by asking voters to tell whether two symbols are identical or different, which is a much simpler task. We design and implement a voter interface that guides the voter to perform the checks for each voting choice. We also extend the current implementation of Belenios to support the new ballot format and verification. This yields the first implementation of Belenios-CaI. The code is open-source and covers all parts of the election (server and voting client).

In order to check the usability of Belenios-CaI with our approach, we conducted an experiment with 165 participants, who were randomly assigned either Belenios-CaI or the original Belenios. The goal was to study whether Belenios-CaI introduces only a reasonable complexity overhead w.r.t. Belenios, a system that is used on an everyday basis. Moreover, in Belenios-CaI, voters must choose to open either \(\textsf{enc}(a)\) or \(\textsf{enc}(b)\) “at random”, a difficult task for voters, who are not perfect random generators. Hence we study whether a bias in this randomness could affect verifiability as well as privacy. More precisely, we investigate three research questions.

Q1 - Usability:

Is Belenios-CaI usable?

We aim at understanding (i) whether voters still manage to vote with Belenios-CaI and (ii) how the additional step affects perceived usability w.r.t. Belenios.

Q2 - Secrecy:

Is the secrecy of the vote degraded by the control pattern?

Belenios-CaI exposes additional information on the public bulletin board in the form of a control pattern selected by the voter (the selection of a or b for each voting choice). Although there is by design no correlation between the voting choice and the control pattern of a ballot, it is possible that the voter’s selection of the control pattern is influenced by their choice of voting option. This would represent an important leakage, since it could compromise the secrecy of the vote.

Q3 - Verifiability:

Is the randomness provided by the voters sufficient to provide cast-as-intended verifiability?

We aim at determining whether the random selection performed by voters is sufficient to prevent an adversary controlling voting clients from manipulating an election with a very low probability of being detected.

Our experiment shows that Belenios-CaI remains reasonably usable compared to Belenios and provides sufficient guarantees w.r.t. cast-as-intended. Moreover, our statistical analysis did not detect any meaningful correlation between votes and control patterns, indicating that Belenios-CaI does not threaten vote privacy. It is important to note, however, that our experiment was conducted in a computer science research center, where most participants are researchers, PhD students or engineers. As future work, it would be necessary to conduct a wider experiment on the general population in order to determine whether the bias of our population affects the conclusions of the study.

Related work. Several cast-as-intended mechanisms have been proposed, as summarized in Table 1. A natural approach is to use a second device that checks that the voting device has correctly encrypted the voter’s intended vote. This is the approach followed by Estonia [9], where the voting device exports in a QR-code the randomness used for encrypting the vote. The Polyas system [15] refines this approach to provide better receipt-freeness. Benaloh’s challenge [3] follows the same idea, except that the audited ballot is never the one that is cast, again to mitigate vote-buying attacks. While Benaloh’s challenge does not explicitly require a second device, it needs an independent party that can check the correctness of the encryption. Another approach is to use return codes, as proposed in Switzerland [8, 18]. Voters receive a voting sheet where each voting choice is associated with a return code (specific to each voter). Once a voter casts a vote, their voting device displays a code that must match the one written on their voting sheet (for their voting choice). This approach relies on a secure postal channel to distribute voting sheets and on a heavy infrastructure (several independent online servers). In order to avoid a second device and keep the infrastructure simple, some systems (e.g. Select [11], Selene [17], Hyperion [16]) introduce a tracker that appears on the public bulletin board, next to the voter’s vote, allowing them to check that their vote has been counted. One important advantage of this approach is its simplicity w.r.t. voters: they see their vote. However, in case of a vote manipulation, voters can only detect it once the election is over and tallied, which makes dispute resolution even more complex.

Table 1. Comparison of cast-as-intended mechanisms

User studies have already been conducted on other cast-as-intended mechanisms. In particular, Marky et al. [14] show that Benaloh’s challenge is very difficult for voters to carry out. In a recent study, Hilt et al. [10] investigate how voters react to vote manipulations in systems that provide cast-as-intended through a second device, while Volkamer et al. [19] study whether code-voting and QR-codes may respectively increase security and usability. Marky et al. [13] study voters’ perception when physical printed audit trails are used in parallel with online voting. No user study has been conducted on Belenios-CaI yet, due to its recent design.

2 Context

2.1 Overview of Belenios-CaI

We provide a brief overview of Belenios-CaI, which was introduced in [5] as a variant of Belenios. We refer to this article for the precise description and security analysis. We assume here that the reader is familiar with Helios-like e-voting systems. As in Belenios, the actors are the voters, a voting server, some decryption trustees, a credential authority (a.k.a. registrar), and a public bulletin board, with external auditors. In Belenios-CaI, the roles of the decryption trustees and the credential authority are unchanged, so we will not discuss them further.

For a given question, there is a set of possible answers (the candidates), and each of them can be selected, possibly with a limit on selections. The ballot is a set of micro-ballots, one for each answer, that encodes the selected / not-selected choice of the voter. Each micro-ballot takes the following form:

$$ \textsf{bal}= (\textsf{enc}(v), \textsf{enc}(a), \textsf{enc}(b), \pi ),$$

where v has a value of 0 or 1, which encodes the choice of the voter, a is a random integer chosen by the voting device, b is such that \(b \equiv v + a \bmod 10\), and \(\pi \) is a zero-knowledge proof that the three plaintexts hidden in the three ciphertexts are indeed integers that satisfy these arithmetic properties.
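As an illustration, here is a minimal sketch of how a voting device could build such a micro-ballot, using exponential ElGamal over toy parameters. All names (enc, micro_ballot) and parameters are illustrative and not those of the actual Belenios implementation; the zero-knowledge proof \(\pi \) is omitted.

```python
# Toy sketch of a micro-ballot enc(v), enc(a), enc(b) with b = v + a mod 10.
# Parameters are illustrative only and NOT cryptographically secure.
import secrets

P = 2**127 - 1                      # toy prime modulus
G = 3                               # toy generator
SK = secrets.randbelow(P - 2) + 1   # trustees' secret key (toy example only)
PK = pow(G, SK, P)                  # election public key

def enc(m, r):
    """Exponential ElGamal: Enc(m; r) = (g^r, g^m * pk^r)."""
    return (pow(G, r, P), (pow(G, m, P) * pow(PK, r, P)) % P)

def micro_ballot(v):
    """v in {0, 1}: build the three ciphertexts and keep the audit material."""
    a = secrets.randbelow(10)
    b = (v + a) % 10
    r_v, r_a, r_b = (secrets.randbelow(P - 2) + 1 for _ in range(3))
    # The real ballot also carries the ZK proof pi of the relation; omitted here.
    return {"c_v": enc(v, r_v), "c_a": enc(a, r_a), "c_b": enc(b, r_b),
            "a": a, "b": b, "r_a": r_a, "r_b": r_b}
```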

Once the ballot is sent to the server by the voting device, the voter receives from the server (on a channel not controlled by the voting device) a tracking number that also serves as a commitment on the ballot. This number can no longer be changed, because the voter will look for it on the public bulletin board.

Then, the audit phase starts. The voting device shows the a and b values to the voter, who must check that the modular equality \(b \equiv v + a \bmod 10\) indeed holds. A key point here is that this operation must be done by the voter themself, and not by their voting device. Then, the voter picks one of the two values a and b at random, and the voting device sends to the server the randomness that was used to encrypt this value, so that the server can open the ciphertext. The server then publishes on the board the tracking number, the ballot, the revealed randomness, and the corresponding decrypted value a or b. The voter visits the board (possibly with another device), and checks that everything is as expected. In addition to their usual tasks, the auditors must check that the revealed randomness indeed opens the ciphertext to the value a or b published by the server on the board.
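Continuing the toy sketch above, the audit boils down to a deterministic re-encryption check: the auditor re-encrypts the revealed value with the revealed randomness and compares the result with the ciphertext on the board (enc and micro_ballot are the illustrative helpers defined previously, not Belenios functions).

```python
def audit_opening(published_ct, revealed_value, revealed_randomness):
    """True iff the published ciphertext really encrypts the claimed a or b."""
    return published_ct == enc(revealed_value, revealed_randomness)

mb = micro_ballot(v=1)
assert (mb["b"] - mb["a"]) % 10 == 1                  # the voter's check (here v = 1)
assert audit_opening(mb["c_a"], mb["a"], mb["r_a"])   # voter challenged enc(a)
```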

In a typical setting, there are several micro-ballots, and therefore one (a, b) pair for each possible answer. In [5], it is suggested to present them as in Fig. 1, with all the data corresponding to one micro-ballot put on a single line, thus forming a table. Using addition modulo any number at least 2 instead of 10 is possible and yields the same theoretical security.

Fig. 1. Original Belenios-CaI audit phase (picture from [5]). The voter checks an addition modulo 10 and randomly selects one of the a or b values for each line.

2.2 Terminology Used in the Present Study

In our paper, we will use slightly different names for the various elements in the audit phase of the voter’s journey:

  • The control values are the values a and b, such that the vote v verifies \(b \equiv v + a \bmod 10\). These were called audit codes in [5].

  • The mask is the random choice made by the voter of which control value will be revealed. This is just one bit per line (whether the blue box is on the left or on the right, in Fig. 1).

  • The control pattern is the combination of the mask and of the control values that must be revealed. In Fig. 1, this is the set of blue boxes, with their positions in the table, and the values inside them. The control pattern is the data that is visible on the public bulletin board at the end of the voter’s journey and that they should compare to what their device showed to them (see the sketch below).
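A minimal data sketch of these three notions, with illustrative names and values (the control values here satisfy the mod-10 relation of [5]):

```python
from dataclasses import dataclass

@dataclass
class Line:
    a: int   # left control value
    b: int   # right control value

def control_pattern(lines, mask):
    """mask[i] = 0 reveals a on line i, mask[i] = 1 reveals b."""
    return [("a", line.a) if bit == 0 else ("b", line.b)
            for line, bit in zip(lines, mask)]

# Two lines: v = 1 with 4 = 1 + 3 mod 10, and v = 0 with 7 = 0 + 7 mod 10.
print(control_pattern([Line(a=3, b=4), Line(a=7, b=7)], mask=[1, 0]))
# -> [('b', 4), ('a', 7)]
```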

3 User Interface Design

As a first contribution, we present the design of a user interface for Belenios-CaI.

3.1 Challenges

Non-trivial Process. Belenios-CaI requires the voter to perform a verification and a random selection for each voting item. The risk is that most voters would not actually verify what is asked but just validate whenever possible, to complete the process faster. This risk is exacerbated by the fact that voters perform this task online and anonymously [12]. Since this would compromise the cast-as-intended property, the system should enforce the verification.

Table 2. Possible values for \(v\), \(a\) and \(b\) modulo 2: on the left, the two cases where the voter did not select the answer, and on the right, the two cases where they did.

More generally, we aim at designing an interface that guides voters linearly through simple tasks.

Arithmetical Operations. Further, the system should be usable by anyone and we should not assume technical knowledge of the voter. In particular, the voter cannot be relied on to perform arithmetical operations. Since Belenios-CaI depends on the computation of a modular addition for each voting option, a workaround had to be found.

Voter Randomness. Another critical aspect of the protocol is the randomness provided by the voter during the selection of a mask. Although a slightly biased randomness is sufficient for verifiability, as explained in Sect. 4.3, it should not be influenced by the voter’s intention, in order to preserve vote secrecy.

3.2 Design Choices

Compute Instead of Verify. In order to bring the voter to do the verification required by Belenios-CaI, we ask them to perform the computation instead of just verifying it. This forces the voter to examine each line and select one of the answers. If they make a mistake, a warning indicates the issue.

Modulo 2 with Symbols. The modular addition can be made more intuitive with a simple trick: by choosing \(m = 2\), computing \(v + a \bmod m\) is equivalent to answering the question “Is \(v\) identical to \(a\)?” and mapping “yes” to \(b = 0\) and “no” to \(b = 1\).

In order to make the task clearer and the resulting pattern easier to visualize, we replaced the numbers with symbols. A first pair of symbols was needed to represent whether a voting option is chosen or not. We used checkboxes as illustrated in Table 2, which we denote as checked and crossed in the following. Since the value of \(a\) is compared with \(v\), their domains have to be the same. The symbols for \(b\) were selected to represent “yes” and “no”, denoted as thumb-up and thumb-down.
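The following small sketch enumerates the four cases of Table 2; the symbol names are illustrative labels, and the “yes”/“no” to b mapping is the one described above.

```python
VA_SYMBOLS = {0: "crossed", 1: "checked"}      # domain shared by v and a
B_SYMBOLS = {0: "thumb-up", 1: "thumb-down"}   # "yes" -> b = 0, "no" -> b = 1

def expected_b(v, a):
    # "Are the two symbols identical?" means "is v + a even?", i.e. b = (v + a) mod 2
    return (v + a) % 2

for v in (0, 1):
    for a in (0, 1):
        identical = VA_SYMBOLS[v] == VA_SYMBOLS[a]
        print(f"{VA_SYMBOLS[v]:7} {VA_SYMBOLS[a]:7} identical={identical} "
              f"-> b = {B_SYMBOLS[expected_b(v, a)]}")
```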

Hide Vote During Selection. Since voters have to select random control values, independently of their voting option, the voting option and choice are hidden during this step of the audit phase, as illustrated in Fig. 2.

3.3 Audit Flow

Steps. In order to make the flow as simple and linear as possible, the interface lets the voter perform the necessary operations in four steps, as depicted in Fig. 2. More details about this process can be found in Fig. 1 of the initial paper [5].

  1. (check) The relationship between \(v\), \(a\) and \(b\) is verified by answering the question “Are the two symbols identical?” for each line, i.e., for each voting item.

  2. (select) The voter randomly picks one of the control values, for each line.

  3. (save) The control pattern resulting from the selection is saved by the voter in the form of a PDF file.

  4. (ballot box) After casting the ballot, the voter verifies that the control pattern corresponding to their ballot is identical to the one in the save step.

Note that the first two steps include an action for each voting item, therefore the voting time increases significantly for an election with a larger number of candidates.

Fig. 2. Four steps of the audit phase.

Layout. Since the goal of the audit phase is to prevent a malicious voting client from manipulating the cast ballot by making any tampering visible, the interface should have a stable layout with components moving as little as possible. Thus, the grid structure with one row per voting item and three columns (for \(v\), \(a\) and \(b\)) remains unchanged across the three steps.

During the select step, the voter should pick one control value or the other independently of whether the line in question corresponds to an option voted for; therefore the first column, containing \(v\), is grayed out (see Fig. 2b). In the save step, the masking of the chosen voting option is kept to prevent a voter from taking a screenshot of their vote along with the control pattern, thus producing a receipt with their vote in clear. Nevertheless, the name of the voting item has to be displayed to prevent a malicious voting client from swapping two items in the interface without being detected.

3.4 Prototype

We implemented a fully functional and publicly available prototype of our design based on the existing Belenios project. In addition to implementing the user interface of the audit phase as outlined in this section, other adaptations were made to Belenios while developing the prototype.

  • The zero-knowledge proofs needed by Belenios-CaI were added to the ballot data structure. They ensure, e.g., that the relationship \(v + a \equiv b \bmod 2\) holds for the encryption of each voting item.

  • To improve the usability of the system, the instruction to copy and save the ballot tracking number manually was replaced by the download of a PDF document containing the tracking number. The control pattern can be downloaded in a similar way. The voter should check that the downloaded documents match what is displayed by the voting client.

  • The static ballot box page was improved to include a search function, only displaying the ballots whose tracking number contains the characters entered by the user (a trivial sketch is given below). The control pattern was also added alongside its ballot.
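A trivial sketch of this search behaviour, with illustrative names and tracking numbers:

```python
def filter_ballots(ballots, query):
    """ballots: dict mapping tracking numbers to ballot data."""
    return {t: b for t, b in ballots.items() if query.upper() in t.upper()}

print(list(filter_ballots({"A7F3-K2": "...", "B9C1-Q8": "..."}, "9c")))  # ['B9C1-Q8']
```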

4 Experimental Setting

4.1 Participants

We organized an experiment inviting the employees of our French computer science research center (Loria and Inria) to vote using Belenios. We discuss the potential biases and their effects in Sect. 6.

The median age of the 129 participants who completed the experiment and fully filled in the survey is 40 years; 32% of them identified as women and 67% as men. More details can be found in the extended version of this paper [7]. 11%, denoted as “admin staff”, do not work in the field of computer science, while the other participants are computer science researchers (permanent, postdoc or PhD students).

Participants had to vote using either our prototype Belenios-CaI (denoted by \(\textsf{cai}\)) or the original software Belenios-base (denoted by \(\textsf{base}\)). The assignment to a group was made by alternating \(\textsf{cai}\) and \(\textsf{base}\) when participants connected to the voting system (i.e. in a round-robin manner).

4.2 Methods

The experiment was conducted online and lasted for a week.

Participants Task. Participants were contacted by email. They were first invited to vote online for their favorite dessert among 5 choices from the local canteen. This voting question, asked with the consent of the pastry chef, satisfied both requirements of low impact (no stakes on integrity in this context) and of high motivation (to incentivize participation). The invitation message included a short description of the data collected for the experiment, together with a link to a web page with a more detailed data management policy. Participants were told that the experiment aimed at evaluating usability, but they were not told that two versions were running.

When connecting to the voting system, participants were asked for their personal credential (included in the invitation email). Then they selected their voting choice. After reviewing their choice, participants authenticated themselves using a short code received by email. Lastly, if assigned Belenios-CaI, they had to audit their encrypted ballot as described in Sect. 3.3. If assigned Belenios-base, they were directed immediately to the “success” page.

Then they were asked to answer a survey. This survey was mentioned in the invitation email and voters were reminded about the survey once they had voted.

Tools. In order to understand the behavior of the participants and identify obstacles in their journey, the voting system was modified to record the time spent on each step of the process, as well as the number of mistakes made in the process and the number of clicks on specific elements of the interface. It was also modified to assign either Belenios-CaI or Belenios-base in a round-robin manner.

Since the election organized for the experiment had no impact and its results were not used, this version sacrifices privacy to understand the behavior of voters. In particular, the private decryption key was owned by our voting server in order to later decrypt the individual ballots. This was needed to measure the correlation between audit patterns and the voting choice. To keep the experiment anonymous, participants were identified with their voting credential only, letting us link data from the voting system with their vote and the survey results without keeping track of their voter identifiers (email addresses).

Survey. Available in French or English and hosted on a local instance of LimeSurvey, the survey was split into three sections. The first one contained personal questions about the participant, the second the questions needed to obtain a System Usability Score (SUS) [4], and the third some further questions about the voting process. SUS takes the form of a 10-question survey on a 5-level Likert scale and results in a usability score on a scale of 0–100. We selected SUS for its simplicity and robustness [2].
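For reference, a minimal sketch of the standard SUS scoring [4]: odd-numbered items contribute (response - 1), even-numbered items contribute (5 - response), and the sum is scaled by 2.5.

```python
def sus_score(responses):
    """responses: the 10 answers, in question order, each in 1..5."""
    assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
    total = sum((r - 1) if i % 2 == 0 else (5 - r)   # even index = odd question number
                for i, r in enumerate(responses))
    return total * 2.5                               # score between 0 and 100

print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))     # most positive answers -> 100.0
```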

4.3 Data Collection and Analysis

Since the research questions address different aspects of the practicality of Belenios-CaI, we describe separately the data collected to address them.

Q1 Usability. Our goal is to measure if the addition of an audit phase in Belenios-CaI negatively impacts its usability compared to Belenios-base. We consider three main aspects to compare the usability of Belenios-CaI w.r.t. Belenios-base:

  • effectiveness: we first record the success rate of the \(\textsf{cai}\) and \(\textsf{base}\) groups, i.e. whether a voter manages to cast their ballot;

  • satisfaction: we compute the SUS scores of both groups resulting from the survey;

  • efficiency: we record the time spent on each voting step and the number of mis-clicks to understand where the voters struggled during the process. To focus on the voting interface, we call normalized voting time the time spent in the voting booth excluding the authentication step, where the voter has to enter a validation code sent by email.

In each case, we compare the measure (success rate, SUS score, or voting time) between \(\textsf{cai}\) and \(\textsf{base}\), to see if they differ significantly. For this, we compute the mean value and the standard error, and check whether the mean ± standard-error intervals overlap.
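As an illustration, a small sketch of this comparison on hypothetical data (the numbers below are not those of the experiment):

```python
import statistics as st

def mean_se(xs):
    """Mean and standard error of the mean."""
    return st.mean(xs), st.stdev(xs) / len(xs) ** 0.5

def intervals_overlap(xs, ys):
    (m1, s1), (m2, s2) = mean_se(xs), mean_se(ys)
    return not (m1 + s1 < m2 - s2 or m2 + s2 < m1 - s1)

base_times = [25, 30, 28, 35, 31]     # hypothetical normalized voting times (seconds)
cai_times = [100, 120, 95, 130, 110]
print(intervals_overlap(base_times, cai_times))   # False: the difference is notable
```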

Q2 Secrecy. In order to investigate whether the control pattern reveals some information about the vote (Q2), ballots were individually decrypted at the end of the experiment and paired with their corresponding control pattern.

We investigate three possible correlations; for each of them, we performed a Chi-squared independence test:

  • Do voters prefer to select the right control value (\(b\)) on items they voted for?

  • Do voters select \(a\) more often when its symbol is checked, on non-voted items?

  • Do voters select a positive symbol (thumb-up or checked) more often on items they voted for?

Note that in our interface, the symbols of control values for a voted item are always either both positive or both negative (see the right part of Table 2), thus we do not expect a revealing behavior on voted items.

Q3 Verifiability. The main desired property of the system is the verifiability of cast-as-intended: a ballot should contain the vote intended by the voter, and any manipulation attempt by an adversary controlling voting clients should be detected with high probability. Since Belenios-CaI relies on the randomness provided by the voters to ensure this property, we evaluate whether the data collected during our experiment supports this hypothesis. The set of observed control masks was the only data needed to perform this analysis.

If one mask were significantly more frequent than the others, e.g. if voters selected the control value on the right in 99% of cases, an adversary could modify a ballot without being detected with high probability, by adapting the control value on the left.

Strategy of an Adversary. To modify a voting item without being detected, an adversary has to adapt the control value which will not be selected by the voter during the select step. In the case of an election where voters choose one among k options, modifying a ballot only requires changing 2 voting items, the one chosen by the voter and the one chosen by the adversary. Therefore, the adversary needs to predict the left-or-right selection on these 2 lines.

Adversary Success Rate. Assuming that the adversary knows the distribution of control masks, they know in particular the distribution D of the sub-mask restricted to the 2 lines they need to modify. We call \(peak\) the most frequent sub-mask and \(p= D(peak)\) its frequency. Since the adversary succeeds only when the voter selects the predicted mask, their success rate when attempting to modify the same two lines of m ballots is \(p_{\textrm{adv}}= p^m\).

For example, let us consider an attacker attempting to manipulate the ballots of \(m = 10\) voters who are voting for the first candidate by creating ballots for the second candidate instead. Further, we assume that the sub-mask distribution for the first two voting items is \(D(0,0) = 0.1\), \(D(0,1) = 0.1\), \(D(1,0) = 0.7\), \(D(1,1) = 0.1\). In this case we have \(peak= 1,0\) and \(p= 0.7\), which yields \(p_{\textrm{adv}}= p^m = 0.7^{10} = 0.028\).
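The worked example above can be reproduced with a few lines (the distribution D is the hypothetical one given in the text):

```python
D = {(0, 0): 0.1, (0, 1): 0.1, (1, 0): 0.7, (1, 1): 0.1}  # sub-mask distribution

peak = max(D, key=D.get)       # most frequent sub-mask: (1, 0)
p = D[peak]                    # its frequency: 0.7
m = 10                         # number of ballots the adversary tries to modify
print(peak, round(p ** m, 3))  # (1, 0) 0.028
```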

We observe that the success rate of the adversary decreases exponentially with the number of ballots they attempt to manipulate, which indicates that Belenios-CaI is effective mostly in large elections, since in that case an adversary needs to modify proportionally more ballots to impact the result.

Assessing the Observed Distribution. Given the actual distribution of masks observed during our experiment, we compute the pair of voting items whose most frequent sub-mask has the highest frequency, which indicates the most important weakness in terms of security. We call \(p_{\textrm{obs}}\) the frequency of this observed peak in the mask distribution. To evaluate whether the experiment provides enough evidence that the frequency of \(peak\) is at most a chosen acceptable value \(p_{\textrm{acc}}\), we perform a statistical test where the null hypothesis \(H_0\) is defined as “\(p\ge p_{\textrm{acc}}\)”, in order to determine whether \(H_0\) can be rejected at a significance level \(\alpha = 0.01\). We consider the number of occurrences of the \(peak\) mask and model its distribution as a binomially distributed random variable \(X \sim B(n, p_{\textrm{acc}})\), where n is the size of the group in our experiment.
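A sketch of this test, assuming scipy is available; the counts below are illustrative, not the actual experimental values:

```python
from scipy.stats import binom

def reject_h0(k_observed, n, p_acc, alpha=0.01):
    """One-sided test of H0: p >= p_acc, based on the count of the peak mask."""
    p_value = binom.cdf(k_observed, n, p_acc)   # P(X <= k_observed) under H0
    return p_value, p_value < alpha

print(reject_h0(k_observed=30, n=80, p_acc=0.7))   # illustrative values only
```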

Fig. 3. Number of participants in the different groups. Some participants could not authenticate during the voting process. Among those who filled the survey, some used the credential given as an example, which prevented us from linking their survey answers to their vote. The \(\textsf{usability}\) group was used to address Q1, with \(\textsf{cai}\) denoting the participants assigned to Belenios-CaI and \(\textsf{base}\) those assigned to Belenios-base. The group \(\textsf{cai}_\textsf{sec}\) denotes all participants assigned to Belenios-CaI, even those who did not answer the survey; it is used to address the security-oriented questions Q2 and Q3.

5 Results

5.1 Participants’ Group

An invitation email was sent to 880 employees. It contained a link to the voting system, a personal voting credential, a link to the survey and information about data privacy in the experiment. Among them, 178 started voting and 165 could cast a ballot, as illustrated in Fig. 3. 138 participants completed the survey, but 6 of them did not enter their voting credentials when asked, which prevented us from linking their answer to their ballot. This leaves 129 participants who successfully voted and completed the survey.

Groups. One group of participants (\(\textsf{cai}\)) was assigned to the Belenios-CaI system while the other (\(\textsf{base}\)) used the original system without cast-as-intended. The \(\textsf{base}\) group contains 71 participants while \(\textsf{cai}\) contains only 58. The difference is due to a weakness of our round-robin assignment: when a person connected to the voting system, was assigned to \(\textsf{cai}\), but did not complete their vote, their participation was not exploitable, and the alternation resulted in two consecutive participants being assigned to \(\textsf{base}\).

To address the first research question Q1, we used the data of the participants in \(\textsf{base}\) and \(\textsf{cai}\), forming the \(\textsf{usability}\) group. In the case of Q2 and Q3, since these research questions address security properties of Belenios-CaI and are not related to the survey, we considered all participants who used the Belenios-CaI system, also including those who did not fill the survey. We designate this group as \(\textsf{cai}_\textsf{sec}\), consisting of 76 participants (see Fig. 3).

5.2 Q1 Usability

Success Rate. First, we observe that all of the 76 participants who were assigned to Belenios-CaI and managed to authenticate could successfully complete the audit phase and cast their ballot. This indicates that the usability of the audit step was sufficient to let every participant complete it. More details about the number of voters and their progress can be found in [7].

Fig. 4. On the left, comparison between SUS scores of \(\textsf{base}\) and \(\textsf{cai}\) groups. On the right, the score of each question displayed separately; for the “negative” questions 2, 4, 6, 8, 10, we displayed the reverted score, so that on this picture, higher is always better.

SUS Score. The SUS score differs between the \(\textsf{cai}\) and the \(\textsf{base}\) groups, with a higher score for \(\textsf{base}\) (mean value of 78.45, std. error of 1.61) than for \(\textsf{cai}\) (mean of 72.07, std. error of 1.82); the mean ± standard-error intervals do not overlap. Figure 4a illustrates this difference in a more qualitative way, showing the median and the quartiles in both cases. We observe that the median score of both groups can be classified as “good” according to [2]. Figure 4b displays the average score for each of the questions in the SUS survey for both groups, showing that the difference is distributed over the different aspects measured by SUS.

Fig. 5. Normalized voting time distribution of \(\textsf{base}\) and \(\textsf{cai}\) groups.

Voting Time. Similarly, the normalized voting time (see Sect. 4.3) of \(\textsf{cai}\) is higher than that of \(\textsf{base}\). The mean value for \(\textsf{cai}\) is 113.2 sec. (std. error of 8.16) and for \(\textsf{base}\) it is 29.9 sec. (std. error of 3.86). Figure 5 complements this data with the median values and the shape of the distributions in both cases.

The difference between the voting times of the two groups is explained entirely by the time spent on the audit phase (median: 71 s). Although this additional time is significant, the overall voting time remains under 2 min, which is still acceptable for completing a voting process.

5.3 Q2 Secrecy

The control patterns selected by the voters are published in the ballot box next to their ballot tracker. Thus, any dependence between a vote and its corresponding control pattern is a threat to vote secrecy. We investigate here whether the data collected during our experiment contains evidence of such a dependence.

Data. Each vote is composed of one item per possible voting option; each item is a triple \((v, a, b)\), where only one of \(a\)  and \(b\) is displayed on the board. This results in a total of 380 items (\(= 5 \cdot |\textsf{cai}_\textsf{sec}|\), for the 5 voting options of each ballot and a population \(|\textsf{cai}_\textsf{sec}| = 76\)). As detailed in Sect. 3, \(v\) and \(a\) are represented in the voter interface as checked or crossed (for 1 and 0, respectively), and \(b\) takes the shape of thumb-up or thumb-down. Note that we consider each item of the ballots separately, distinguishing between 76 voted items (corresponding to the chosen option) and 304 non-voted items.

Table 3. Results of Pearson’s \(\chi ^2\) independence tests, for three pairs of variables whose correlation could lead to a privacy leak. For each line, the number of possible values for each variable is 2, therefore the number of degrees of freedom is 1, and the reference value for \(\chi ^2\) at a significance level of 0.05 is 3.84. Any computed value below 3.84 means that the test does not reject independence.

Independence Tests. We performed the three hypothesis tests listed in Sect. 4.3. The results of Pearson’s \(\chi ^2\) tests are displayed in Table 3, for three pairs of variables that should be independent; otherwise, an attacker could deduce some information about the vote and break privacy. In all cases, the test does not reject independence.
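A sketch of such a test, assuming scipy; the 2×2 contingency table below uses hypothetical counts (76 voted and 304 non-voted items, as in our data, but with a made-up split of left/right selections), not the actual figures behind Table 3:

```python
from scipy.stats import chi2_contingency

table = [[40, 36],     # voted items:     left selected, right selected (hypothetical)
         [150, 154]]   # non-voted items: left selected, right selected (hypothetical)
chi2, p_value, dof, expected = chi2_contingency(table, correction=False)
# With dof = 1, the 5% reference value is 3.84; a chi2 below it does not reject
# independence between the vote and the control-value selection.
print(round(chi2, 2), round(p_value, 3), dof)
```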

5.4 Q3 - Verifiability

We start by aggregating the observed control masks, i.e., the left-right selection in the control patterns. The mask with the highest frequency is 1, 1, 1, 1, 1, present in 10 out of \(|\textsf{cai}_\textsf{sec}| = 76\) ballots, i.e., in 13.2% of cases. This observation suggests that the distribution is not too skewed and that cast-as-intended can be ensured. For completeness, the distribution of control masks observed in our experiment can be found in the extended version of this paper [7].

We follow the method proposed in Sect. 4.3 to assess the observed distribution and decide whether it gives an acceptable advantage to an attacker. Among the \(\binom{5}{2} = 10\) combinations, it turns out that the first two voting items, with \(peak= 0,0\), give the highest frequency, \(p_{\textrm{obs}}= \frac{31}{76} = 0.41\). This value is the result of our experiment with a sample size of \(|\textsf{cai}_\textsf{sec}| = 76\).

This 0.41 value is well below what we decided to be an acceptable value, \(p_{\textrm{acc}}=0.7\). For the sake of completeness, we computed the probability of observing a peak of at most 0.41 if the real distribution had a peak of 0.7. This probability is \(1.25 \cdot 10^{-7}\) (details are given in [7]) and confirms that, based on our observations, an adversary gains only an acceptable advantage regarding the verifiability property.

5.5 Feedback Given in the Free Text Questions

In the third part of the survey, participants were given the opportunity to provide comments on various aspects and we thank them for their suggestions.

Regarding security, many participants say they trust the system, but they also acknowledge that this is mostly because they know the authors of this experiment and trust their expertise. They do not understand how the various steps in the voter’s journey impact security. Some participants are more skeptical and seem reluctant about Internet voting in general.

Feedback about usability is mostly positive, in the sense that many participants consider that the number of steps is high but acceptable if justified by security. However, many also mention that not understanding the precise reasons for these steps generates some frustration. Among the difficulties mentioned by the participants, going back and forth between the mailbox and the browser comes up several times. Some also say that they managed to use the system but believe it might be difficult for others (the elderly, in particular). Some remarks were specifically related to the cast-as-intended functionality, and one participant explicitly mentioned hesitating when having to randomly choose the mask.

More generally, feedback from participants confirms that Belenios-CaI does not seem to hinder the voting system, and it offers some ideas for improving the user experience and acceptance of the online voting system.

6 Discussion

Considering the cost of ensuring cast-as-intended in perceived usability and in voting time, we observe that the additional security might not be worth this cost for small-scale elections, since a few modifications may still happen with realistic probability. However, for large elections, the protection against malicious voting clients can make the usability trade-off acceptable: it is unlikely that an attacker can modify sufficiently many votes without being detected.

Limitations. The recruitment of participants was performed in the research center where the authors work. Therefore, most participants hold at least a Master’s degree in Computer Science. They may have a different understanding of how to “randomly” select the control values compared to the general population. Furthermore, although participation was anonymous, the fact that many participants know the authors of the study might have introduced a bias in the responses to the survey. However, since we are interested in the comparison between Belenios and Belenios-CaI rather than in a single system, this possible bias should affect both systems equally and not the difference in perceived usability.

The main goal of the study was to determine whether Belenios-CaI is an acceptable evolution of Belenios. It does not evaluate whether the underlying cast-as-intended mechanism is more usable than others such as Benaloh’s challenge [3] or return codes [8, 18].

Future Work. The analysis regarding secrecy and verifiability performed in Sects. 5.3 and 5.4 only provides insight into settings similar to our experiment. To make stronger statements concerning the impact of voter-generated randomness on vote privacy, a study on a larger population sample, more representative of the general population, would be necessary.

Another interesting aspect would be the evaluation of verification efficiency, i.e., whether Belenios-CaI allows voters to detect a manipulation, and a comparison with other systems providing cast-as-intended verifiability.