Abstract
The reliability and usefulness of verification depend on the ability to represent appropriately the uncertainty. Most existing work on neural network verification relies on the hypothesis of either set-based or probabilistic information on the inputs. In this work, we rely on the framework of imprecise probabilities, specifically p-boxes, to propose a quantitative verification of ReLU neural networks, which can account for both probabilistic information and epistemic uncertainty on inputs. On classical benchmarks, including the ACAS Xu examples, we demonstrate that our approach improves the tradeoff between tightness and efficiency compared to related work on probabilistic network verification, while handling much more general classes of uncertainties on the inputs and providing fully guaranteed results.
You have full access to this open access chapter, Download conference paper PDF
Keywords
1 Introduction
Verifying that neural networks satisfy desirable properties has become crucial for ensuring the safety of learning-enabled autonomous systems. However, most existing approaches that provide guarantees on the satisfaction of a specification are designed for adversarial input uncertainties, and offer only qualitative assessments. Complete methods return whether or not the property is satisfied but are often not scalable, while sound methods return either that a property is satisfied or that the answer is unknown, due to over-approximation errors. For instance, the analyzers DeepZ [27], DeepPoly [28] and Verinet [13] propagate respectively zonotopes, polyhedra, and symbolic intervals through the layers of a neural network, to ensure that certain specifications are met. In addition, many analyzers have considered producing also robustness bounds of networks, as specifically done by CROWN [35], FCROWN [16] and CNN-Cert [5].
In contrast, quantitative verification has been little explored for neural networks, despite providing a better understanding of the system by refining information about property satisfaction. This is especially true for probabilistic verification. Some authors have considered estimating the statistics of the output of neural networks, given a multivariate probabilistic law for its inputs. This approach has been used in particular for assessing the robustness of neural networks [31, 36] and for probabilistically certifying their correctness under adversarial attacks [2, 3, 14, 20, 34]. But these estimates, using improved sampling methods, do not give guaranteed bounds. Even fewer articles have considered guaranteed probabilistic bounds. In [32], the authors describe the analyzer PROVEN, which provides probability certificates of neural network robustness when the input perturbation is given by a probabilistic distribution, based on abstractions. For networks with ReLU activation function, methods have been developed in [21, 22] to find the probability of the output or of the input-output relationships. In [8], the authors consider an ellipsoid input space with Gaussian random variables and compute confidence ellipsoids for the outputs of ReLU networks, using semi-definite programming. In [29], the authors consider truncated multivariate Gaussian distribution inputs and abstract them by probabilistic stars (ProbStars), a variation of the star set abstraction [1] recently introduced in the context of reachability analysis. They propagate them in a guaranteed manner in a network, and estimate the probability of violating a safety property on the output by computing each probstar’s probability. In a way, ProbStar is a hybrid method, relying on guaranteed set-based computations, but estimating the probabilities in a non guaranteed manner.
The works mentioned above rely on the hypothesis of either set-based or probabilistic information on the inputs. However, in real-world systems, precise models representative of the data are not always available. For instance, several probabilistic models may be plausible for describing of a problem, or a probabilistic model may be known but with uncertain parameters. Therefore, we need to consider both aleatory information and epistemic uncertainty. Imprecise probabilities [4, 30] offer a framework that unifies probabilistic and set-based information. This framework includes a wide variety of mathematical models, among which probability boxes (p-boxes in short) [9], which characterize an uncertain random variable by all probability distributions consistent with lower and upper bounds on its cumulative distribution function (CDF in short). A p-box can be seen as interval bounds on a probability distribution. An Interval-based discrete over-approximation of p-boxes, Interval Dempster-Shafer structure [25] (DSI in short) has been proposed. Algorithms for arithmetic operations on DSI can be derived [9, 33], which can be seen as a unification of standard interval analysis with traditional probability theory, allowing probability bound analysis on arithmetic expressions. It gives the same answer as interval analysis does when only range information is available. And it gives sound bounds on the distribution function, as a sound counterpart of a Monte Carlo simulation, when information is precise enough to fully specify input distributions and their dependencies. However, DSI arithmetic is expensive and suffers from the conservativeness of interval arithmetic, on which it relies. Probabilistic affine forms have been proposed as an alternative [6, 7], improving both precision and efficiency by combining affine forms or zonotopes and DSI structures.
In this work, we first extend for the analysis of ReLU neural networks, the Interval Dempster Shafer arithmetic in Sect. 3 and probabilistic affine arithmetic in Sect. 4. We then introduce in Sect. 5 a new abstraction, Zonotopic Dempster Shafer structures, which exhibits much better computational properties (complexity and tightness of the approximations). This new abstraction is directly related to the general notion of random sets [24] which generalize one-dimensional Dempster-Shafer structures such as the DSI. Finally, in Sect. 6 we demonstrate on benchmarks from the state of the art that our approach improves in terms of tradeoff between tightness and efficiency compared to the most closely related work [29], while being able to handle much more general classes of uncertainties on the inputs and providing fully guaranteed results.
2 Problem Statement
We consider an L-layer feedforward ReLU network with input \(x^0 \in \mathbb {R}^{h_0}\) and output \(y = f(x^0) = x^L \in \mathbb {R}^{h_L}\), with f being the composition of L layers, \(f = f^{L-1} \circ \ldots \circ f^0\). The k-th layer of the ReLU network is defined by \(f^k : \mathbb {R}^{h_k} \rightarrow \mathbb {R}^{h_{k+1}}\) of the form \( x^{k+1} =f^k(x^k)= \sigma (A^k x^k +b^k),\) where \(A^k \in \mathbb {R}^{h_{k+1} \times h_k}\) is the weight matrix, \(b^k \in \mathbb {R}^{h_{k+1}}\) is the bias, and \(\sigma (x^k_j) := \max (0,x^k_j)\) is the component-wise ReLU function, where \(x^k_j\) is the jth component of \(x^k \in \mathbb {R}^{h_k}\).
For a multi-dimensional distribution \(X \in \mathbb {R}^n\), we note \(P(X \le x) := \mathbb {P}(X_1 \le x_1 \wedge X_2 \le x_2 \ldots \wedge X_n \le x_n)\), and in what follows, we systematically use \(\le \) as the componentwise order. We are interested in the following two problems, extending, in particular to a larger class of inputs, the quantitative verification properties of [29]:
Problem 1
(Probability Bounds Analysis). Given a ReLU network f and a constrained probabilistic input set \(\mathcal{X}= \{ X \in \mathbb {R}^{h_0} \mid C X \le d \wedge \underline{F}(x) \le \mathbb {P}(X \le x) \le \overline{F}(x), \forall x \}\) where \(\underline{F}\) and \(\overline{F}\) are two cumulative distribution functions, compute a constrained probabilistic output set \(\mathcal{Y}\) guaranteed to contain \(\{f(X), X \in \mathcal{X} \}\).
Problem 2
(Quantitative Property Verification). Given a ReLU network f, a constrained probabilistic input set \(\mathcal{X}\) and a linear safety property \(Hy \le w\), bound the probability of the network output vector y satisfying this property.
We illustrate throughout the paper the approach on the toy example below. The details on the analyzes of this example, which results are stated in further sections, are provided in the Appendix of preprint [10].
Example 1
We consider the ReLU network defined by the matrices of weights and biases: \( A_1 = \begin{bmatrix} 1 & -1 \\ 1 & 1. \end{bmatrix}, b_1 = \begin{bmatrix} 0.0 \\ 0.0 \end{bmatrix}, A_2 = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}, b_2 = \begin{bmatrix} 0.0 \\ 0.0 \end{bmatrix}. \) We take only one ReLU layer and an affine output layer. After the ReLU layer, we note \(x^1 = \sigma (A_1 x^0 + b_1) = \sigma (x^0_1-x^0_2,x^0_1+x^0_2)\), and after the output layer \(x^2 = A_2 x^1 + b_2\).
The problem is to verify the network against the unsafe output set \(x^2_1 \le -2 \wedge x^2_2 \ge 2\) for an input \(x^0=(x^0_1,x^0_2) \in [-2,2] \times [-1,1]\). This writes \(H x^2 \le w\) with \(H = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, w = \begin{bmatrix} -2 & - 2 \end{bmatrix}\).
3 Analysis with Interval Dempster-Shafer Structures
Probability-Boxes and Interval Dempster-Shafer Arithmetic. We characterize a real-valued random variable X by its cumulative probability distribution function (CDF in short) \(F: \mathbb {R}\rightarrow [0,1]\) defined by \(F(x) = \mathbb {P}(X \le x)\). A p-box [9] is defined by a pair of CDF:
Definition 1
(P-box). Given two CDF \(\overline{F}, \underline{F}\), the p-box \([\underline{F},\overline{F}]\) represents the set of distribution functions F such that \(\underline{F}(x) \le F(x) \le \overline{F}(x)\) for all \(x \in \mathbb {R}\).
P-boxes can be combined in mathematical calculations, but analytical solutions are usually not available. Interval Dempster-Shafer structures [25] provide a simple way to soundly over-approximate the set of cdfs using a discrete representation, for which the arithmetic operations can be converted into a series of elementary interval calculations.
Definition 2
(Interval Dempster-Shafer Structure). An interval Dempster-Shafer structure (DSI in short) is a finite set of intervals, named focal elements, associated with a probability, written \(d=\bigl \{\langle \boldsymbol{x_1},w_1\rangle ,\langle \boldsymbol{x_2},w_2\rangle ,\dots ,\langle \boldsymbol{x_n},w_n\rangle \bigr \}\), where \(\boldsymbol{x_i}\) is an interval and \(w_i\in (0,1]\) is its probability, with \(\sum _{k=1}^nw_k=1\).
Proposition 1
(CDF of an Interval Dempster-Shafer Structure). A DSI \(d=\bigl \{\langle \boldsymbol{x_1},w_1\rangle ,\langle \boldsymbol{x_2},w_2\rangle ,\dots ,\langle \boldsymbol{x_n},w_n\rangle \bigr \}\) defines the discrete p-box \(\bigl [\underline{F_d},\overline{F_d}\bigr ]\) representing the sets of distributions such that \(\underline{F_d}(u) \le \mathbb {P}(X \le u) \le \overline{F_d}(u)\) with \(\underline{F_d}(u)=\sum _{\overline{\boldsymbol{x_i}}< u}w_i\) and \(\overline{F_d}(u)=\sum _{\underline{\boldsymbol{x_i}}\le u}w_i\), noting \(\underline{\boldsymbol{x_i}}\) (resp. \(\overline{\boldsymbol{x_i}}\)) the greatest lower bound (resp. least upper bound) of set \(\boldsymbol{x_i}\).
Conversely, discrete upper and lower approximations of distribution functions can be constructed, for instance using the inverse CDF as in [33]. Given a discretization size N, they define a DSI with N focal elements where all weights are equal to 1/N. The focal elements \(\boldsymbol{x_i}\) are defined evaluating the quantiles or inverse cdfs for uniformly spaced probability levels \(p_i = \frac{i-1}{N}\) for \(i = 1, \ldots , N + 1\), by \(\boldsymbol{x_i} = [\overline{F}^{-1}_d(p_i),\underline{F}^{-1}_d(p_{i+1})]\) where \(F^{-1}_d(p)=\inf \{x \mid F_d(x) \ge p \}\).
The arithmetic operators on DSI structures [9, 33] compute guaranteed enclosures of all possible distributions of an output variable if the input p-boxes enclose the input distributions. Let two random variables X and Y represented by DSI structures \(d_X=\{\langle \boldsymbol{x_i},w_i\rangle ,\,i\in [1,n]\}\) and \(d_Y=\{\langle \boldsymbol{y_j},w'_j\rangle ,\ j\in [1,m]\}\), and Z be the random variable such that \(Z=X+Y\) (the algorithms for other arithmetic operations are similar). In particular, they define algorithms for the extreme cases of unknown dependence and independence between X and Y.
Definition 3
(Probabilistic Dependence and Dependence Graph). Two random variables \(X_1\) and \(X_2\) are independent if and only if their CDF can be decomposed as \(F(x_1,x_2)= F_1(x_1)F_2(x_2)\). Otherwise, the random variables are called correlated. The probabilistic dependence graph G over a set of n variables \(X_1, \ldots , X_n\) is an undirected graph where the \(X_i\) are the vertices and there exists an edge \((X_i,X_j)\) in the graph iff variables \(X_i\) and \(X_j\) are correlated.
The addition of DSI independent variables is obtained as a discrete convolution of the two input distributions [9, 33]:
Definition 4
(Addition of Independent DSIs). If X and Y are independent random variables, then the DSI for \(Z = X \oplus Y\) is \(d_Z=\{\langle \boldsymbol{z_{i,j}},r_{i,j}\rangle ,\ i\in [1,n],\ j\in [1,m]\}\) such that: \( \forall i\in [1,n],\ j\in [1,m],\ \boldsymbol{z_{i,j}}=\boldsymbol{x_i} + \boldsymbol{y_j}\text { and }r_{i,j}=w_i\times w'_j . \)
The number of focal elements grows exponentially with the number of such operations. In order to keep the computation tractable, the number of focal elements is usually bounded, at the cost of some over-approximations.
Different algorithms have been proposed for the addition of DSIs with unknown dependence, relying on the Fréchet-Hoeffding copula bounds or on linear programming, most of them produce the same result [9, 23, 33].
DSI Analysis of Neural Networks. We now define a sound probability bounds analysis of ReLU neural networks.
Modelling the Network Inputs. Consider an \(h_0\)-dimensional uncertain input vector \(x^0=(x^0_1,\ldots ,x^0_{h_0})\), which can be represented as a vector \(d^0 = (d^0_1, \ldots ,d^0_{h_0})\) of \(h_0\) DSI, each with the same number n of focal elements for simplicity of presentation: where \(\boldsymbol{x^0_{i,j}}\in \mathbb {I}\mathbb {R}\) is an interval and \(w^0_{i,j}\in [0,1]\) is the associated probability, with \(\sum _{j=1}^n w^0_{i,j}=1\), for all \(i \in 1,\ldots ,h_0\). A dependence graph is assumed to be known between the components of the input vector. Affine Transform of a Vector of DSI Structures. Given a vector of random variables \(X=(X_1,\ldots ,X_k)\) represented as a vector \(d = (d_1, \ldots ,d_k)\) of DSI structures, and a dependence graph G, we define a DSI \(d^y = \sum _{j=1}^k a_j d_j + b\) which includes the result of \(Y = \sum _{j=1}^k a_j X_j + b\) on the DSI d by:
-
we note \(a_j d_j\) where \(a_j \in \mathbb {R}\) and \(d_j\) is a DSI \(\left\{ \langle \boldsymbol{x_{j,i}},w_{j,i}\rangle \mid i=1,\ldots ,n\right\} \), the result of the multiplication of a constant by a DSI: \(\left\{ \langle a_j \boldsymbol{x_{j,i}},w_{j,i}\rangle \mid i=1,\ldots ,n\right\} \),
-
we arbitrarily choose to compute the sum \(\sum _{j=1}^k a_j d_j\) as \(((a_1 d_1 + a_2 d_2) + a_3 d_3) + \ldots a_k d_k) \), applying for the j-th sum the right operators depending of the dependence between \(X_{j+1}\) and \(X_1\) to \(X_j\),
-
we note \(d + b\) where d is a DSI \(\left\{ \langle \boldsymbol{x_i},w_i\rangle \mid i=1,\ldots ,n\right\} \) and \(b \in \mathbb {R}\) the result of the addition of a constant to a DSI: \(\left\{ \langle b+ \boldsymbol{x_i},w_i\rangle \mid i=1,\ldots ,n\right\} \),
-
the dependence graph is updated by adding an edge between Y and all \(X_j\) such that \(a_j\) is non zero
Interpreting the action of the ReLU function \(Y=\max (0,X)\) means enforcing the constraints \(Y \ge X\) and \(Y \ge 0\). This means that Y is obtained by intersecting the focal elements of the representation of X with \([0,\infty )\):
Lemma 1
(ReLU of a DSI). Given a random variable X represented by the DSI \(d= \{\langle \boldsymbol{x_i},w_i\rangle ,\,i\in [1,n]\}\), then the CDF of \(Y=\sigma (X)=\max (0,X)\) is included in the DSI \( \{\langle \boldsymbol{y_i},w_i\rangle ,\,i\in [1,n]\} \text{ with } y_i = [\max (0,\underline{x_i}), \max (0,\overline{x_i})]. \)
This leads to Algorithm 1 for the analysis of an L-layer ReLU network with the notations of Sect. 2.
Output The output after propagation in the network consists in:
-
the vector of DSI \(d^L\) characterizing the network output (solving Problem 1)
-
interval bounds noted \(cdf(H d^L,w)\) on the probability \(\mathbb {P}(H x^L \le w)\) (solving Problem 2). Let \([\underline{P_m},\overline{P_m}]\), with m ranging over the rows of H and w, be the interval for the probability \(\mathbb {P}( \sum _{i=1}^{h_L} h_{mi} x^L_i \le w_m)\). It is obtained applying Proposition 1 to compute the CDF at w on each component of the vector \(H d^L\). We define \(cdf(H d^L,w) = [\min _m \underline{P_m}, \min _m \overline{P_m}].\)
The DSI computation encodes the marginal distribution of each component of a vector \(x^i\) as a DSI. The probability of a conjunction \(H x^L\) is thus computed considering each inequality independently and expressing that the probability of the conjunction is lower or equal than the probability of each term.
Analysis of the Toy Example. Consider Example 1. A classical interval analysis of the network from the input set \(x^0=(x^0_1,x^0_2) \in [-2,2] \times [-1,1]\) yields the output ranges \(x^2_1 \in [-3,3]\) and \(x^2_2 \in [0,6]\). As these have non empty intersection with the property \(x^2_1 \le -2 \wedge x^2_2 \ge 2\), this analysis does not allow to conclude.
Uniform Distribution on Inputs Abstracted by DSI with 2 Focal Elements. Let us now suppose that we additionally know that the 2 components of the input follow a uniform distribution. We first choose a discretization of the inputs by DSI with 2 focal elements, \(d^0_1 = \{ \langle [-2,0], 0.5 \rangle , \langle [0,2], 0.5 \rangle \}\) and \(d^0_2 = \{ \langle [-1,0], 0.5 \rangle , \langle [0,1], 0.5 \rangle \}\). Let us suppose the inputs independent, the first output after the first affine layer, \(d_{y_1} = d^0_1 - d^0_2 \), computed following Definition 4, is \( \{ \langle [-2,1] , 0.25\rangle , \langle [-3,0] , 0.25\rangle , \langle [0,3], 0.25 \rangle , \langle [-1,2], 0.25 \rangle \}. \) In order to limit the complexity of computation, the result of each operation on DSI can be reduced by a sound overapproximation with a fixed number of focal elements. This can be done by joining some focal elements and adding the corresponding weights. For instance here, when reducing to 2 focal elements by joining the first 2 and the last 2 focal elements, this results in \(d_{y_1} = \{ \langle [-3,1] , 0.5\rangle ; \langle [-1,3] , 0.5\rangle \}\). Then, applying to \(d_{y_1}\) the ReLU function by Lemma 1 produces \( d^1_1 = \{ \langle [0,1] , 0.5 \rangle , \langle [0,3], 0.5 \rangle \}\). The other output \(x^1_2\) of the first layer has the same DSI representation. After the output layer, the first output is \(d^2_1 = d^1_1 - d^1_2 = \{ \langle [-3,1], 0.5 \rangle , \langle [-1,3], 0.5 \rangle \}\). Here \(x^1_1\) and \(x^1_2\) can no longer be considered as independent as they both are correlated to \(x^0_1\) and \(x^0_2\), the subtraction of their DSI representation is computed accordingly. The second output is \(d^2_2= d^1_1 + d^1_2 = \{ \langle [0,4], 0.5 \rangle , \langle [0,6] , 0.5 \rangle \}\).
Take now the property \(x^2_1 \le -2 \wedge x^2_2 \ge 2\). Using Proposition 1, we deduce from \(d^2_1\) and \(d^2_2\) that \(\mathbb {P}(x^2_1 \le -2) \in [0, 0.5]\) and \(\mathbb {P}(x^2_2 \ge 2) \in [0.0, 1.0]\), from which \(\mathbb {P}(x^2_1 \le -2 \wedge x^2_2 \ge 2) \in [0, 0.5]\). Consider \(\mathbb {P}(x^2_1 \le -2) \) evaluated using \(d^2_1 = \{ \langle [-3,1], 0.5 \rangle , \langle [-1,3], 0.5 \rangle \}\). Its lower bound is obtained using Proposition 1 by \(\underline{P}(-2)=\sum _{\overline{\boldsymbol{x_i}}< -2}w_i = 0\), as the upper bounds of the 2 focal elements \([-3,1]\) and \([-1,3]\) are both greater than -2. The upper bound is \(\overline{P}(-2)=\sum _{\underline{\boldsymbol{x_i}}\le u}w_i = 0.5\), as the lower bound of \([-3,1]\) is lower than -2, which is not the case for \([-1,3]\).
Increasing the Number of Focal Elements. refines the over-approximation of the input distributions and the sets of CDF obtained for the outputs. For instance for 100 focal elements, in the case inputs can be considered as independent, we obtain \(\mathbb {P}(x^2_1 \le -2) \in [0, 0.07]\) and \(\mathbb {P}(x^2_2 \ge 2) \in [0.05, 0.52]\). In the case of inputs with unknown correlation, \(\mathbb {P}(x^2_1 \le -2) \in [0, 0.26]\) and \(\mathbb {P}(x^2_2 \ge 2) \in [0, 1]\). However, the supports of the sets of distribution remain unchanged and are equal to the ranges obtained through interval analysis. Indeed, the affine layers introduce some conservatism due to the wrapping effect of the intervals used as focal elements. Additionally, joint distribution are not naturally represented in the DSI framework, making it difficult to accuractely verify general properties.
4 Analysis with Probabilistic Zonotopes
Probabilistic affine forms [6, 7] are affine forms where the symbolic variables or noise symbols are constrained by DSI structures instead of being simply bounded in [−1,1]. This can be seen as a simple way to encode affine correlations between uncertain variables abstracted by p-boxes, or a quantitative version of affine forms. We first introduce the probabilistic affine forms, presented as probabilistic zonotopes, which represent vectors of probabilistic affine forms. Then we propose an analysis of neural networks relying on these probabilistic zonotopes.
Affine Forms, Zonotopes and Probabilistic Zonotopes. An affine form is a linear expression \(\alpha _0+\sum _{j=1}^p \alpha _j \varepsilon _j \) with real coefficients \( \alpha _j\) and symbolic variables \(\varepsilon _j \) called noise symbols which values range in [−1,1]. A zonotope is the geometric concretization of a vector of affine forms:
Definition 5
(Zonotope). An n-dimensional zonotope \(\mathcal Z\) with center \(c \in R^n\) and a vector \(\varGamma = \begin{bmatrix} g_1 \ldots g_p \end{bmatrix} \in \mathbb {R}^{n,p}\) of p generators \(g_j \in \mathbb {R}^n\) for \(j = 1,\ldots ,p\) is defined as \( \mathcal{Z} = \langle c, \varGamma \rangle = \{ c + \varGamma \varepsilon \mid \Vert {\varepsilon } \Vert _{\infty } \le 1 \}\).
We note \(\gamma _i(\mathcal{Z}) = c_i +\sum _{j=1}^p g_{ij} [-1,1] \) the range of its i-th component.
Zonotopes are closed under affine transformations:
Proposition 2
(Affine Transforms of a Zonotope). For \(A \in \mathbb {R}^{m,n} \) and \(b \in \mathbb {R}^{m}\) we define \( A \mathcal{Z} + b = \langle Ac+b, A \varGamma \rangle \) as the m-dimensional resulting zonotope.
Definition 6
(Probabilistic Zonotope). For \(\varepsilon \) a vector of random variables of \(\mathbb {R}^p\), a zonotope \(\mathcal{Z}= \langle c, \varGamma \rangle \) with \(c \in \mathbb {R}^n\) and \(\varGamma \in \mathbb {R}^{n,p}\) can be interpreted as a probabilistic zonotope noted \(p\mathcal{Z}(\varepsilon ) = \langle c, \varGamma ,\varepsilon \rangle \) representing the \(n-\)dimensional random variable \(Z =c + \varGamma \varepsilon \). Let \(d_{\varepsilon }\) be a \(p-\)dimensional vector of DSI structures with support in \([-1,1]^p\) and G a dependence graph on the components \({\varepsilon }_1, \ldots , .{\varepsilon }_p\). The marginal of each component of \(p\mathcal{Z}(d_{\varepsilon } )\) is the affine transform on DSI structures: \(c^i + \sum _{j=1}^p g_{ij} d_{\varepsilon _j} , \; i = 1, \ldots ,n\) computed as in Sect. 3.
Zonotopes represent affine relations that hold between uncertain quantities. In the case of probabilistic zonotopes, imprecise affine relations hold:
Example 2
Let \( {x_1}=1+\varepsilon _1-\varepsilon _2\), \(x_2=-\frac{1}{2}\varepsilon _1+\frac{1}{4} \varepsilon _2\), \(d_{\varepsilon _1}=\{\langle [-1,0],\frac{1}{2} \rangle , \langle [0,1],\frac{1}{2} \rangle \}\), \(d_{\varepsilon _2}=\{\langle [-\frac{1}{10},0],\frac{1}{2} \rangle , \langle [0,\frac{1}{10}],\frac{1}{2} \rangle \}\), Then \(x_1+2 {x_2}=1-\frac{1}{2}\varepsilon _2\), represented by the DSI \(d=1-\frac{1}{2} d_{\varepsilon _2}=\{\langle [\frac{19}{20},1],\frac{1}{2} \rangle , \langle [1,\frac{21}{20}], \frac{1}{2} \rangle \}\), by (a simple version of) the Affine Transform of a Vector of DSI Structures. Thus the lower probability that \(x_1+2x_2\le \frac{21}{20}\) is 1; and the upper probability that \(x_1+2x_2 < \frac{19}{20}\) is 0. But \(x_1+2x_2\le 1\) has upper probability \(\frac{1}{2}\) and lower probability 0 and is thus an imprecise relation.
Probabilistic Zonotopes for the Analysis of Neural Networks. We detail hereafter Algorithm 2 using probabilistic zonotopes.
Input and Initialization. The input of the algorithm is the same as in Sect. 3, the uncertain input is modelled as a vector \(d^0 = (d^0_1, \ldots ,d^0_{h_0})\) of \(h_0\) DSI. We can then define \(\boldsymbol{x}^0 \in \mathbb {I}\mathbb {R}^{h_0}\) the \(h_0\)-dimensional box obtained as the support of \(d^0\), computed for each DSI as the union of its focal elements with non-zero weight. Finally, we define \(p\mathcal{Z}^0(\varepsilon ) = \langle c^0, \varGamma ^0 ,d_{\varepsilon } \rangle \) in Line 1 of Algorithm 2 by:
-
\(\mathcal{Z}^0= \langle c^0, \varGamma ^0 \rangle ,\) is built from the box \(\boldsymbol{x}^0\),
-
\(d_{\varepsilon }\) is the vector of DSI obtained by rescaling \(d^0\) between -1 and 1.
Propagation in the Layers. The propagation in the affine layers can be expressed directly as affine transform on the zonotope by Proposition 2, and later interpreted as a probabilistic zonotope. Proposition 3 introduces the ReLU transformer proposed in [26], encoded in zonotope matrix form. The ReLU transform is applied componentwise (on each row) and a new noise symbol (and thus a new column in the generator matrix) is added whenever an over-approximation is needed, that is when the input is not either always positive or negative.
Proposition 3
(ReLU Transform of a Zonotope). Let \( \mathcal{Z} = \langle c, \varGamma \rangle \) with \(\varGamma = (g_{ij})_{i,j} \in \mathbb {R}^{n,p}\) be a zonotope, we note \([l_i,u_i]=\gamma _i(\mathcal{Z})\) the range of its i-th component. The result of applying componentwise the ReLU activation function is a zonotope \( \mathcal{Z'} = \langle c', \varGamma ' \rangle \) where \(c' \in \mathbb {R}^n\) and \(\varGamma ' \in \mathbb {R}^{n,p+n}\), with \(c'_i = \lambda _i c_i + \mu _i\) and
Output. The output zonotope after the L layers is \(\mathcal{Z}^{L}= \langle c^L, \varGamma ^L \rangle \) with \(c^L \in \mathbb {R}^{h_L}\) and \(\varGamma ^L \in \mathbb {R}^{h_L,\sum _{k=0}^L h_k}\). At line 5 of Algorithm 2, the probabilistic zonotope \(p\mathcal{Z}^{L}(d_{\varepsilon })\) is converted into a vector of DSI, following Definition 6. In this interpretation as a probabilistic zonotope, we must define the DSI structures corresponding to the \(\sum _{k=1}^L h_k\) new noise symbols introduced by the ReLU transformers. A sound although conservative interpretation is to take the interval \([-1,1]\) as DSI for them. This corresponds to considering that there is no available information about the distribution of the variable represented by these new noise symbols.
At line 6, the transform \(H \mathcal{Z}^{L}\), interpreted as a probabilistic zonotope, is converted in a vector of DSI and used to bound the probability \(\mathbb {P}(Hy \le w)\).
Analysis of the Toy Example. We consider again Example 1.
Deterministic Zonotopes Analysis. From the input sets \(x^0 \in [-2,2] \times [-1,1]\), the zonotopic interpretation is initialized with the affine forms \(x^0_1 = 2 \varepsilon _1\), \(x^0_2 = \varepsilon _2\) with \(\varepsilon _1, \varepsilon _2 \in [-1,1]\), encoded: \(\mathcal{Z}^0= \langle c^0, \varGamma ^0 \rangle \text{ with } c^0 = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \varGamma ^0 = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}. \) Using the affine transforms on zonotopes and Proposition 3 for the ReLU layer with \((\lambda ,\mu )=(0.5,0.75)\) for both neurons, we obtain after the second affine layer:
The first output \(x^2_1\) is bounded in a tighter interval than with interval propagation ([-3,3]), the second ouput \(x^2_2\) is incomparable to the interval computation ([0,6]).
Probabilistic Zonotopes Analysis. Let us now suppose that the inputs \(x^0_1\) and \(x^0_2\) follow a uniform law, which can be abstracted as in Sect. 3 with DSI structures \(d^0_1\) and \(d^0_2\). Algorithm 2 produces the same input zonotope and propagation through the network as above. Let us discretize the inputs with 2 focal elements. The rescaling of the DSI \(d^0_1\) and \(d^0_2\) between -1 and 1 yields \(d_{\varepsilon _1} = \{ \langle [-1,0], 0.5 \rangle , \langle [0,1], 0.5 \rangle \} \) and \(d_{\varepsilon _2} = \{ \langle [-1,0], 0.5 \rangle , \langle [0,1], 0.5 \rangle \}\).
The concretization of the final probabilistic zonotope \( p\mathcal{Z}^2(d_{\varepsilon })\) to a vector of DSI writes: \(d^2_1 = - d_{\varepsilon _2} + 0.75 d_{\varepsilon _3} - 0.75 d_{\varepsilon _4}\) and \(d^2_2 = 1.5 + 2 d_{\varepsilon _1} + 0.75 d_{\varepsilon _3} + 0.75 d_{\varepsilon _4}\), where \(d_{\varepsilon _3}\) and \(d_{\varepsilon _4}\) are the DSI corresponding to the noise symbols introduced in the analysis by the ReLU function, with unknown distribution in \([-1,1]\). We get \(d^2_1 = \{ \langle [-2.5,1.5], 0.5 \rangle , \langle [-1.5,2.5], 0.5 \rangle \}\) and \(d^2_2 = \{ \langle [-2.,3.], 0.5 \rangle , \langle [0.,5.], 0.5 \rangle \}\) and deduce \(\mathbb {P}(x^2_1 \le -2) \in [0, 0.5]\) and \(\mathbb {P}(x^2_2 \ge 2) \in [0, 1]\).
The supports of the DSI are equal to the range obtained by the classical zonotopic analysis, thus incomparable to the support of the DSI obtained by Algorithm 1. The results are more generally not strictly comparable to those of DSI computation. For instance here with 100 focal elements, we have \(\mathbb {P}(x^2_1 \le -2) \in [0, 0.26]\) and \(\mathbb {P}(x^2_2 \ge 2) \in [0, 0.76]\) both in the case of independent inputs \(x^0_1\) and \(x^0_2\) and unknown correlation, which is better than DSI in the case of unknown correlation, while DSI are better for independent inputs. The reason why the results do not depend here on the correlation between inputs is that \(d_{\varepsilon _1}\) does not appear in the expression of \(d^2_1\) and \(d_{\varepsilon _2}\) in the expression of \(d^2_2\), so that the information of correlation between inputs is not used.
5 Analysis with Zonotopic Dempster-Shafer Structures
In Sect. 4, a unique initial zonotope is built and propagated in the network. This propagation is exact through affine layers, but can be highly conservative for nonlinear operations such as the activation functions. In Algorithm 3, we suppose that the inputs are independent and perform the zonotopic propagation at a finer grain, on each tuple of focal elements of the inputs. This can be seen as using zonotopic focal elements to represent the input vector of a layer, instead of interval focal elements to represent each component of the input vector.
Input and Initialization. The input is the same as in Sects. 3 and 4: the uncertain input is modelled as a vector \(d^0 = (d^0_1, \ldots ,d^0_{h_0})\) of \(h_0\) DSI. Assuming the input components as independent, we perform the convolution of the distributions of the input components to build a DSZ abstraction of the input vector: we construct one zonotope per possible \(h_0\)-tuple of focal elements representing the input vector of DSI \(d^0\), with weight the product of the weights of each interval focal elements: we define for each \((i_1,i_2,\ldots ,i_{h_0}) \in [1,n]^{h_0}\) the zonotope \(\mathcal{Z}^0_{i_1 \ldots i_{h_0}} = \langle c^0_{i_1\ldots i_{h_0}}, \varGamma ^0_{i_1\ldots i_{h_0}} \rangle ,\) built from the box \(\boldsymbol{x}^0_{1 i_1} \times \boldsymbol{x}^0_{2 i_2}\times \ldots \times \boldsymbol{x}^0_{h_0 i_{h_0}}\) and define the input \(d_\mathcal{Z}^0\) as a Dempster Shafer structure with zonotopic focal elements (DSZ in short): \( d_\mathcal{Z}^0 =\bigl \{\langle \mathcal{Z}^0_{i_1 \ldots i_{h_0}} , w^0_{1,i_1} w^0_{2,i_2} \ldots w^0_{h_0,i_{h_0}} \rangle , \,(i_1,i_2,\ldots ,i_{h_0}) \in [1,n]^{h_0} \bigr \}. \)
The number of focal elements does not have to be identical for each component of the input vector \(d^0\), this choice was made here for simplicity of notation.
The propagation in the layers then consists in propagating each zonotope focal elements. Note that the number of focal elements remains constant through the propagation in the layers because all convolutions were computed at initialization, only the zonotopes size evolves with the layer dimensions.
Output. The DSZ \(d_\mathcal{Z}^L\) is projected on the output vector, defined for each \(i \in [1,h_L]\) by the DSI \( d^L_i = \bigl \{ \langle \gamma _i(\mathcal{Z}^0_{i_1 \ldots i_{h_0}}) , w^0_{1,i_1} w^0_{2,i_2} \ldots w^0_{h_0,i_{h_0}} , \,(i_1,i_2,\ldots ,i_{h_0}) \in [1,n]^{h_0} \rangle \bigr \} .\)
The property can be assessed by evaluating the set of joint cumulative distributions represented by the DSZ \(H d_\mathcal{Z}^L\), by generalizing the definition of Proposition 1 from interval to zonotopic focal elements:
Proposition 4
(CDF of a Zonotopic Dempster-Shafer Structure). Let X be a random variable in \(\mathbb {R}^n\) and \(d_\mathcal{Z}=\bigl \{ \langle \mathcal{Z}_1, w_1 \rangle ,\langle \mathcal{Z}_2, w_2 \rangle ,\dots ,\langle \mathcal{Z}_u, w_u \rangle \bigr \}\) be a DSZ with \(\mathcal{Z}_k = \langle c_k, \varGamma _k \rangle \) with \(c_k \in \mathbb {R}^n\) and \(\varGamma _k \in R^{n,p}\) and \(w_k \in ]0,1]\) and \(\sum _{k=1}^u w_u = 1\). The DSZ \(d_\mathcal{Z}\) defines a discrete p-box representing the sets of joint cumulative distribution functions such that for \(v \in \mathbb {R}^n\),
where \(\downarrow v\) is the set of points in \(\mathbb {R}^n\) less or equal than v in the componentwise ordering. Practically, we can use the ranges or projections of each component of \(\mathcal{Z}_k\) to get a conservative over-approximation of the p-box:
This proposition can be derived from the notion of cdf of a random set of [24]. The bounds obtained by Proposition 4 are always at least as good than by first converting the DSZ as a vector and then applying Proposition 1.
DSZ Analysis of the Toy Example. We consider again Example 1. with 2 focal elements for each input, we have \(d^0_1 = \{ \langle [-2,0], 0.5 \rangle , \langle [0,2], 0.5 \rangle \}\) and \(d^0_2 = \{ \langle [-1,0], 0.5 \rangle , \langle [0,1], 0.5 \rangle \}\). At Line 1 of Algorithm 3, \(d_\mathcal{Z}^0\) is a DSZ structure with 4 zonotopic focal elements, each with weight 0.25: \( \mathcal{Z}^0_{11}= \langle \begin{bmatrix} -1 \\ - 0.5 \end{bmatrix} , \begin{bmatrix} 1 & 0 \\ 0 & 0.5 \end{bmatrix} \rangle , \, \mathcal{Z}^0_{12}= \langle \begin{bmatrix} -1 \\ 0.5 \end{bmatrix} , \begin{bmatrix} 1 & 0 \\ 0 & 0.5 \end{bmatrix} \rangle , \, \mathcal{Z}^0_{21}= \langle \begin{bmatrix} 1 \\ -0.5 \end{bmatrix} , \begin{bmatrix} 1 & 0 \\ 0 & 0.5 \end{bmatrix} \rangle , \, \mathcal{Z}^0_{22}= \langle \begin{bmatrix} 1 \\ 0.5 \end{bmatrix} , \begin{bmatrix} 1 & 0 \\ 0 & 0.5 \end{bmatrix} \rangle \). After the output layer, the 4 zonotopic elements, each with weight 0.25, are: \( \mathcal{Z}^2_{11}= \langle \begin{bmatrix} \frac{1}{6} \\ \frac{1}{6} \end{bmatrix} , \begin{bmatrix} \frac{1}{3} & -\frac{1}{6} & \frac{1}{3} \\ \frac{1}{3} & -\frac{1}{6} & \frac{1}{3} \end{bmatrix} \rangle , \, \mathcal{Z}^2_{12}= \langle \begin{bmatrix} -\frac{1}{6} \\ \frac{1}{6} \end{bmatrix} , \begin{bmatrix} -\frac{1}{3} & -\frac{1}{6} & -\frac{1}{3} \\ \frac{1}{3} & \frac{1}{6} & \frac{1}{3} \end{bmatrix} \rangle , \, \mathcal{Z}^2_{21}= \langle \begin{bmatrix} \frac{5}{6} \\ \frac{13}{6} \end{bmatrix} , \begin{bmatrix} \frac{1}{3} & -\frac{5}{6} & -\frac{1}{3} \\ \frac{10}{6} & -\frac{1}{6} & \frac{1}{3} \end{bmatrix} \rangle , \,\) \(\mathcal{Z}^2_{22}= \langle \begin{bmatrix} -\frac{5}{6} \\ \frac{13}{6} \end{bmatrix} , \begin{bmatrix} - \frac{1}{3} & -\frac{5}{6} & \frac{1}{3} \\ \frac{10}{6} & \frac{1}{6} & \frac{1}{3} \end{bmatrix} \rangle \). From these and their projected ranges for \(x^2_1\) and \(x^2_2\), we deduce (see Appendix of preprint [10]) using Proposition 4 that \(\mathbb {P}(x^2_1 \le -2) \in [0,0.25]\) and \(\mathbb {P}(x^2_2 \ge 2) \in [0,0.5]\) and the conjunction \(\mathbb {P}(Hy \le w) \in [0.0, 0.25]\).
6 Evaluation
Implementation. We implemented our approachFootnote 1 using the Julia library ProbabilityBoundsAnalysis.jl [11] for Interval Dempster Shafer abstraction and arithmetic. In this library, the focal elements of a DSI structure all have same weight. The result is reduced after each arithmetic operation to keep a constant number of focal elements. We rely on this DSI implementation, but our DSZ implementation does not present the same restrictions. The focal elements are bounded, but a flag allows the user to specify that a distribution may have unbounded support, and this knowledge is used to produce a sound CDF estimation for unbounded distributions. In our current implementation, we do not use this possibility, but we believe that the work presented here can be extended to unbounded support. Timings for our analysis are on a MacBook Pro 2.3 GHz Intel Core i9 with 8 cores (the implementation is not parallel, although obviously parallelizable).
Comparing DSI, Probabilistic Zonotopes and DSZ on the Toy Example. We compare in Table 1 our 3 abstractions in the case of independent inputs, varying the number of focal elements and the input distributions: U(n) denotes a uniform distribution represented with n focal elements, and N(n) a truncated normal law in the same range with n focal elements.
On this example, the DSZ analysis is by far more precise, followed by the DSI and finally the probabilistic zonotopes. Refining the input discretization with more focal elements tightens the output of all analyzes, but only the DSZ converges to actually tight bounds. In particular, for DSI and probabilistic zonotopes, the support of the output distribution is unchanged when the input is refined. The computation times are, on this example, of the same order of magnitude for all three analyzes, slightly higher for DSZ, and lower for probabilistic zonotopes. The reason for the probabilistic zonotopes to have lower cost is that affine transforms are computed on the zonotopes, and the costly operations between DSI are delayed until the final representation as a DSI. It is not surprising that the DSZ have slightly higher cost, because of the exponential number of zonotopic focal elements. However, the computation is obviously parallelizable.
We can also note the strong impact of the input hypotheses on the results, advocating the need of such an approach which can account in a same framework and computation for large classes of inputs. For instance, changing the input distribution from a uniform to a Gaussian truncated to same support produces very different probability bounds. In Table 1, we supposed the inputs independent. For instance, the DSI analysis for 100 focal elements and a uniform law, produces for independent inputs \(\mathbb {P}(x^2_1 \le -2) \in [0, 0.07]\) and \(\mathbb {P}(x^2_2 \ge 2) \in [0.05, 0.52]\), while for inputs with unknown dependence, \(\mathbb {P}(x^2_1 \le -2) \in [0, 0.26]\) and \(\mathbb {P}(x^2_2 \ge 2) \in [0, 1]\).
For independent inputs, the DSZ is the best choice among our approaches. In the case of correlated inputs, it is hard to conclude from such a simple example between DSI and probabilistic zonotopes. In the context of discrete dynamical systems where probabilistic zonotopes were proposed [6, 7], they were much better than DSI both in terms of efficiency and accuracy. The context of neural networks is less favorable, but it is probable that probabilistic zonotopes can be more interesting than DSI for larger networks. However, our focus is to explore in the future the encoding of multivariate probabilistic distributions as input distributions, and lift this current restriction on the DSZ analysis.
Comparing DSZ to Probstar [29]. We now compare our approach to the results of the closely related approach [29] on their two benchmark examples. Similarly to [29], we consider the inputs as independent. ACAS Xu We consider the ACAS Xu networks benchmark, where the networks have 5 inputs and 5 outputs, with the same input configurations and properties (\(P_2: y_1 > y_2 \wedge y_1 > y_3 \wedge y_1 > y_4 \wedge y_1 > y_5\), \(P_3/P_4: y_1 < y_2 \wedge y_1 < y_3 \wedge y_1 < y_4 \wedge y_1 < y_5\)) as in [29]. The lower and upper bounds on the inputs, lb and ub, depend on the property, and are used in [29] to define probabilistic input sets by Gaussian distributions with mean \(m=(ub+lb)/2\) and standard deviation \((ub-m)/a\), where \(a=3\), truncated between lb and ub. In our work, after creation of the input DSI from the above Gaussian distribution, we truncate all focal elements so that the support of the DSI is restricted to the input range [lb, ub]. In [29], an argument is used to deduce bounds for the probability for non truncated distributions. We could use a similar argument here but we focus on the results for the truncated distributions, and compare our results to the interval \([US-Prob-LB,US-Prob-UB]\) with the notations of [29].
We choose for our approach an over-approximation of the input distributions using a different number of focal elements for each component of the vector input, based on the relative widths of the input intervals. We represent these as vectors of number of focal elements, taking [5, 80, 50, 6, 5] for Property 2, [5, 20, 1, 6, 5] for Properties 3 and 4. In Table 2, we compare our results with the Probstar approach with two parametrizations: \(p_f=0\) corresponds to an exact set-based propagation, while \(p_f=e^{-5}\) corresponds to the level of over-approximation in propagation most widely used in [29].
The running times for Probstar in Table 2 are those of [29], hence not computed on the same computer. We reproduced Property 2 on Net 1–6 with Probstars on our MacBook: for \(p=0\), it takes 3614 s on 8 cores, 5045 s on 4 cores, 12542 on 1 core, to be compared to the 1424 s in Table 2; for \(p=e^{-5}\), it takes 425 s on 8 cores, 489 on 4 cores, 1489 on 1 core, to be compared to the 206 s in Table 2) and to the 46 s with DSZ.
The tightness of the enclosures of the DSZ is comparable to Probstars with \(p=e^{-5}\), for an analysis being generally an order of magnitude faster. The results look consistent between the 2 analyzes for Property 2. Properties 3 and 4 (originally from [15]) are true on the whole input range, which can be proven by classical set-based analysis, and our approach accordingly produces a probability equal to 1. The approach of [29] produces more precise, ’exact’ results, when \(p=0\), than our approach. However, only the set-based propagation is exact, there is also a part of probabilistic estimation. For instance, when reproducing Property 2 on Net 1–6 with Probstars, we obtained for \(p=0\), the probabilities 1.56119e-05 with 8 cores, 6.76052e-06 with 4 cores, 7.22045e-06 with 1 core, to be compared to the ’exact’ 1.87224e-05 in the table. In contrast, our approach produces fully guaranteed bounds while allowing a much richer classes of inputs.
In Table 2, we manually chose the number of focal element per input component. Although we refrained from optimizing too much, choosing for instance the same discretization for different networks, this impairs the practicality of the approach. As a first answer, we implemented a simple loop to automatically refine the discretization starting from a very rough one, by some basic sensitivity analysis. For instance, for Property 2 and net-1-6, the total refinement process with as stopping criterion the width of the probability interval lower than 0.05 takes 112 s and leads to the number of focal elements [5, 81, 38, 5, 5] and a probability in [0, 0.0276], with bounds twice tighter than Probstars with \(p=e^{-5}\).
Rocket Lander. Let us now consider the rocket lander example of [29], with the same inputs and properties. The networks have here 9 inputs. Taking the vector of focal elements [7, 12, 10, 17, 9, 7, 1, 1, 2, 1, 1] produces the results of Table 3
Again, the timings for Probstars are those of [29]; we executed for instance on our MacBook the analysis of Property 1 on network 0 with 4 cores, the running times were 1351.2 s for \(p_f=1e{-5}\) and 12127 s for \(p_f=0\).
7 Conclusion
A central notion for dealing with multivariate probabilistic distributions is that of a copula [18], and in particular Sklar’s theorem which links multivariate cdf with the cdf of its marginals. Multiple authors have considered generalizing Sklar’s theorem to imprecise probabilities, [17, 19], with e.g. applications in [12] to the analysis of non-linear dynamical systems. In this work, we developed the case of multidimensional imprecise probabilites described by the independence copula. Future work includes the tractable treatment of other copulas in our framework. Finally, we focused here on ReLU-based networks, but the approach is by no means restricted to this activation function.
Notes
- 1.
available from https://doi.org/10.5281/zenodo.12519084.
References
Bak, S., Duggirala, P.S.: Simulation-equivalent reachability of large linear systems with inputs. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10426, pp. 401–420. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63387-9_20
Baluta, T., Chua, Z.L., Meel, K.S., Saxena, P.: Scalable quantitative verification for deep neural networks (2021)
Baluta, T., Shen, S., S., S., Meel, K.S., Saxena, P.: Quantitative verification of neural networks and its security applications. In: Computer and Communications Security (2019)
Beer, M., S.F., Kreinovich, V.: Imprecise probabilities in engineering analyses. Mech. Syst. Signal Process. 37(1), 4–29 (2013)
Boopathy, A., Weng, T.W., Chen, P.Y., Liu, S., Daniel, L.: CNN-cert: an efficient framework for certifying robustness of convolutional neural networks. In: AAAI (2019)
Bouissou, O., Goubault, E., Goubault-Larrecq, J., Putot, S.: A generalization of p-boxes to affine arithmetic. Computing 94(2–4), 189–201 (2012)
Bouissou, O., Goubault, E., Putot, S., Chakarov, A., Sankaranarayanan, S.: Uncertainty propagation using probabilistic affine forms and concentration of measure inequalities. In: Chechik, M., Raskin, J.-F. (eds.) TACAS 2016. LNCS, vol. 9636, pp. 225–243. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49674-9_13
Fazlyab, M., Morari, M., Pappas, G.J.: Probabilistic verification and reachability analysis of neural networks via semidefinite programming. In: 2019 IEEE 58th Conference on Decision and Control (CDC), pp. 2726–2731 (2019). https://doi.org/10.1109/CDC40024.2019.9029310
Ferson, S., Kreinovich, V., Ginzburg, L., Myers, D.: Constructing probability boxes and dempster-shafer structures. Tech. rep., Sandia National Laboratories, SAND2002-4015, Albuquerque, New Mexico (2003)
Goubault, E., Putot, S.: A Zonotopic Dempster-Shafer Approach to the Quantitative Verification of Neural Networks (2024). https://hal.science/hal-04546350. Working paper or preprint
Gray, A., Ferson, S., Patelli, E.: ProbabilityBoundsAnalysis.jl: arithmetic with sets of distributions. In: Proceedings of JuliaCon (2021)
Gray, A., Forets, M., Schilling, C., Ferson, S., Benet, L.: Verified propagation of imprecise probabilities in non-linear ODEs. Int. J. Approx. Reason. 164, 109044 (2024). https://doi.org/10.1016/j.ijar.2023.109044
Henriksen, P., Lomuscio, A.R.: Efficient neural network verification via adaptive refinement and adversarial search. In: Giacomo, G.D., et al. (eds.) ECAI 2020 - 24th European Conference on Artificial Intelligence, 2020 - Including 10th Conference on Prestigious Applications of Artificial Intelligence (PAIS 2020). Frontiers in Artificial Intelligence and Applications, vol. 325, pp. 2513–2520. IOS Press (2020)
Huang, C., Hu, Z., Huang, X., Pei, K.: Statistical certification of acceptable robustness for neural networks. In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds.) ICANN 2021. LNCS, vol. 12891, pp. 79–90. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-86362-3_7
Katz, G., Barrett, C., Dill, D.L., Julian, K., Kochenderfer, M.J.: Reluplex: an efficient SMT solver for verifying deep neural networks. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10426, pp. 97–117. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63387-9_5
Lyu, Z., Ko, C.Y., Kong, Z., Wong, N., Lin, D., Daniel, L.: Fastened crown: tightened neural network robustness certificates. Proc. AAAI Conf. Artif. Intell. 34(04), 5037–5044 (2020)
Montes, I., Miranda, E., Pelessoni, R., Vicig, P.: Sklar’s theorem in an imprecise setting. Fuzzy Sets and Systems 278, 48–66 (2015). https://doi.org/10.1016/j.fss.2014.10.007, https://www.sciencedirect.com/science/article/pii/S0165011414004539, special Issue on uncertainty and imprecision modelling in decision making (EUROFUSE 2013)
Nelsen, R.B.: An Introduction to Copulas, 2nd edn. Springer, New York (2006)
Omladič, M., Stopar, N.: A full scale sklar’s theorem in the imprecise setting. Fuzzy Sets and Systems 393, 113–125 (2020). https://doi.org/10.1016/j.fss.2020.02.001, https://www.sciencedirect.com/science/article/pii/S0165011420300348, copulas and Related Topics
Pautov, M., Tursynbek, N., Munkhoeva, M., Muravev, N., Petiushko, A., Oseledets, I.: Cc-cert: A probabilistic approach to certify general robustness of neural networks. Proceedings of the AAAI Conference on Artificial Intelligence 36, 7975–7983 (06 2022). https://doi.org/10.1609/aaai.v36i7.20768
Pilipovsky, J., Sivaramakrishnan, V., Oishi, M., Tsiotras, P.: Probabilistic verification of Relu neural networks via characteristic functions. In: Matni, N., Morari, M., Pappas, G.J. (eds.) Proceedings of The 5th Annual Learning for Dynamics and Control Conference. Proceedings of Machine Learning Research, vol. 211, pp. 966–979. PMLR (2023)
Păsăreanu, C., Converse, H., Filieri, A., Gopinath, D.: On the probabilistic analysis of neural networks. In: 2020 IEEE/ACM 15th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS), pp. 5–8 (2020). https://doi.org/10.1145/3387939.3391594
Regan, H., Ferson, S., Berleant, D.: Equivalence of methods for uncertainty propagation of real-valued random variables. Int. J. Approx. Reason. 36, 1–30 (2004). https://doi.org/10.1016/j.ijar.2003.07.013
Schmelzer, B.: Random sets, copulas and related sets of probability measures. Int. J. Approx. Reason. 160, 108952 (2023). https://doi.org/10.1016/j.ijar.2023.108952
Shafer, G.: A Mathematical Theory of Evidence. Princeton University Press (1976)
Singh, G., Gehr, T., Mirman, M., Püschel, M., Vechev, M.: Fast and effective robustness certification. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems. vol. 31. Curran Associates, Inc. (2018). https://proceedings.neurips.cc/paper_files/paper/2018/file/f2f446980d8e971ef3da97af089481c3-Paper.pdf
Singh, G., Gehr, T., Mirman, M., Püschel, M., Vechev, M.T.: Fast and effective robustness certification. In: Advances in Neural Information Processing Systems, NeurIPS, pp. 10825–10836 (2018)
Singh, G., Gehr, T., Püschel, M., Vechev, M.: An abstract domain for certifying neural networks. Proc. ACM Program. Lang. (POPL) (2019)
Tran, H.D., Choi, S., Okamoto, H., Hoxha, B., Fainekos, G., Prokhorov, D.: Quantitative verification for neural networks using probstars. In: Proceedings of the 26th ACM International Conference on Hybrid Systems: Computation and Control (HSCC 2023). Association for Computing Machinery, New York (2023). https://doi.org/10.1145/3575870.3587112
Walley, P.: Statistical Reasoning with Imprecise Probabilities. Chapman & Hall (1991)
Webb, S., Rainforth, T., Teh, Y.W., Kumar, M.P.: A statistical approach to assessing neural network robustness. ICLR. arXiv preprint arXiv:1811.07209 (2019)
Weng, L., et al.: PROVEN: Verifying robustness of neural networks with a probabilistic approach. In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 97, pp. 6727–6736. PMLR (2019). https://proceedings.mlr.press/v97/weng19a.html
Williamson, R.C., Downs, T.: Probabilistic arithmetic: numerical methods for calculating convolutions and dependency bounds. Journ. Approx. Reas. (1990)
Zhang, D., Ye, M., Gong, C., Zhu, Z., Liu, Q.: Black-box certification with randomized smoothing: a functional optimization based framework. In: Proceedings of the 34th International Conference on Neural Information Processing Systems (NIPS 2020). Curran Associates Inc., Red Hook (2020)
Zhang, H., Weng, T.W., Chen, P.Y., Hsieh, C.J., Daniel, L.: Efficient neural network robustness certification with general activation functions. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 31, pp. 4939–4948. Curran Associates, Inc. (2018). https://proceedings.neurips.cc/paper/2018/file/d04863f100d59b3eb688a11f95b0ae60-Paper.pdf
Zhang, T., Ruan, W., Fieldsend, J.E.: Proa: a probabilistic robustness assessment against functional perturbations. In: ECML PKDD 2022, Part III. LNCS, pp. 154–170. Springer, Heidelberg (2023). https://doi.org/10.1007/978-3-031-26409-2_10
Acknowledgement
This work was partially supported by the SAIF project, funded by the France 2030 government investment plan managed by the French National Research Agency, under the reference ANR-23-PEIA-0006, and by the 2021 project FARO, funded by Agence de l’Innovation de Défense AID through the Centre Interdisciplinaire d’Etudes pour la Défense et la Sécurité CIEDS.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Ethics declarations
Data Availability Statement
The Julia implementation of our approach and the examples to reproduce results of this paper are available from https://doi.org/10.5281/zenodo.12519084.
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2025 The Author(s)
About this paper
Cite this paper
Goubault, E., Putot, S. (2025). A Zonotopic Dempster-Shafer Approach to the Quantitative Verification of Neural Networks. In: Platzer, A., Rozier, K.Y., Pradella, M., Rossi, M. (eds) Formal Methods. FM 2024. Lecture Notes in Computer Science, vol 14933. Springer, Cham. https://doi.org/10.1007/978-3-031-71162-6_17
Download citation
DOI: https://doi.org/10.1007/978-3-031-71162-6_17
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-71161-9
Online ISBN: 978-3-031-71162-6
eBook Packages: Computer ScienceComputer Science (R0)