Abstract
Graph weighted models (GWMs) have recently been proposed as a natural generalization of weighted automata over strings, trees and 2-dimensional words to arbitrary families of labeled graphs (and hypergraphs). In this paper, we propose polynomial time algorithms for minimizing and deciding the equivalence of GWMs defined over the family of circular strings on a finite alphabet (GWM\(^\mathrm{c}\)s). The study of GWM\(^\mathrm{c}\)s is particularly relevant since circular strings can be seen as the simplest family of graphs with cycles. Despite the simplicity of this family and of the corresponding computational model, the minimization problem is considerably more challenging than in the case of weighted automata over strings and trees: while linear algebra tools are overall sufficient to tackle the minimization problem for classical weighted automata (defined over a field), the minimization of GWM\(^\mathrm{c}\)s involves fundamental notions from the theory of finite dimensional algebra. We posit that the properties of GWM\(^\mathrm{c}\)s unraveled in this paper willprove useful for the study of GWMs defined over richer families of graphs.
You have full access to this open access chapter, Download conference paper PDF
Similar content being viewed by others
1 Introduction
Functions defined over syntactical structures such as strings, trees and graphs are ubiquitous in computer science. Automata models allow one to succinctly represent such functions. In particular, weighted automata can efficiently model functions mapping structured objects to values in a semi-ring. Weighted automata have been defined to handle functions whose domain are e.g. strings [9, 26], trees [8, 16] and 2-dimensional words [11]. More recently, Bailly et al. [2] proposed a computational model for functions mapping labeled graphs (or hypergraphs) to values in a field (see also [22, Chap. 2]): Graph Weighted Models (GWMs). GWMs extend the notion of linear representation of a function defined over strings and trees to functions defined over graphs labeled by symbols in a ranked alphabet: loosely speaking, while string weighted automata can be defined by associating each symbol in a finite alphabet to a linear map and tree weighted automata by associating each symbol in a ranked alphabet to a multilinear map, GWMs are defined by associating each arity k symbol from a ranked alphabet to a kth order tensor. The computation of a GWM boils down to mapping each vertex in a graph to the tensor associated to its label and performing contractions directed by the edges of the input graph to obtain a value in the supporting field. When restricted to the families of strings, trees or 2-dimensional words, GWMs are expressively equivalent to the classical notions of weighted automata over these structures.
Weighted automata have recently received interest from the machine learning community due to their ability to represent functions defined over structured objects. Efficient (and often consistent) learning algorithms have been developed for such computational models defined over sequences [3, 6, 10, 19] and trees [1, 4, 14]. Motivated by the relevance of learning functions defined over richer families of labeled graphs, our long term objective is to design efficient learning algorithms for GWMs. This is however a challenging task. Given the close relationship between minimization and learning for classical weighted automata (see e.g. [7, 21, 27]), we take a first step in this direction by tackling the problem of minimizing GWMs defined over the simple family of circular strings.
Circular strings are strings whose last symbol is connected to the first. A circular string can be seen as a directed graph where each vertex is labeled by a symbol from a finite alphabet and is connected to his unique successor (i.e. a labeled graph composed of a unique cycle). Circular strings are relevant in biology (see e.g. [20] and references therein) and have been studied from a formal language perspective in the non-quantitative setting in [24]. The study of GWMs defined over such graphs is particularly relevant since circular strings are in some sense the simplest family of graphs with cycles (and cycles can be seen as the key obstacle for going from strings and trees to general graphs). Moreover, GWMs defined over the family of circular strings—which we henceforth denote by GWM\(^\mathrm{c}\)s to avoid confusions—take a simple form making them easily amenable to theoretical study: a GWM\(^\mathrm{c}\) is given by a set of matrices \(\mathbf {A}^\sigma \) for each symbol \(\sigma \) in a finite alphabet, and maps any circular string \(\sigma _1\sigma _2\cdots \sigma _k\) to the trace of the products of the matrices associated with the letters in the stringFootnote 1. Despite the simplicity of this computational model and its strong connection with string weighted automata, the minimization problem is considerably more challenging than in the case of string or tree weighted automata. More precisely, while the minimization problem can easily be handled using notions from linear algebra for e.g. real-valued string weighted automata (see e.g. [7]), we show in this paper that the minimization of GWM\(^\mathrm{c}\)s requires fundamental concepts from the theory of finite-dimensional algebras (such as the ones of radical and semi-simplicity).
Contributions. Throughout the paper, we only consider automata defined over a field of characteristic 0. After introducing notions on weighted automata, GWM\(^\mathrm{c}\)s and finite-dimensional algebras in Sect. 2, we first tackle the problem of deciding the equivalence of GWM\(^\mathrm{c}\)s in Sect. 3. The study of the equivalence problem is motivated by the simple observation that two minimal GWMs computing the same function are not necessarily related by a change of basis, which is in contrast with a classical result stating that two minimal string weighted automata are equivalent if and only if they are related by a change of basis. Building from this observation, we unravel the fundamental notion of semi-simple GWM\(^\mathrm{c}\) and we show that any function recognizable by a \(GWM^{c}\) can be computed by a semi-simple GWM\(^\mathrm{c}\) (Corollary 1) and that two semi-simple \(GWM^{c}s\) of equal dimensions computing the same function are necessarily related by a change of basis (Corollary 2). These two results naturally give rise to a polynomial time algorithm to decide whether two \(GWM^{c}s\) are equivalent. We then move on to the minimization problem in Sect. 4, where we give a polynomial time minimization algorithm for \(GWM^{c}s\) which fundamentally relies on the notion of semi-simple GWM\(^\mathrm{c}\) (Corollary 3). While the problem of minimizing a GWM defined over the simple family of circular strings is central to this paper, we see it as a test bed for developing the theory of general GWMs: beyond the minimization and equivalence algorithms we propose, we believe that one of our main contributions is to illustrate how the theory of GWMs will rely on advanced concepts from algebra theory and to unravel fundamental properties that will surely be central to the study of GWMs defined over more general families of graphs (such as the one of semi-simple GWM\(^\mathrm{c}\)).
1.1 Notations
For any integer n we let \([n] = \{1,2,\cdots , n\}\). We denote the set of integers by \(\mathbb {N}\) and the fields of real and rational numbers by \(\mathbb {R}\) and \(\mathbb {Q}\) respectively. Let \(\mathbb {F}\) be a field of characteristic 0, we denote by \(\mathcal {M}_{n}(\mathbb {F}) = \mathbb {F}^{n\times n}\) the set of all \(n\times n\) matrices over \(\mathbb {F}\). We use lower case bold letters for vectors (e.g. \(\mathbf {v} \in \mathbb {F}^{d_1}\)) and upper case bold letters for matrices (e.g. \(\mathbf {M}\in \mathbb {F}^{d_1 \times d_2}\)). We denote by \(\mathbf {I}_n\) the \(n\times n\) identity matrix (or simply \(\mathbf {I}\) if the dimension is clear from context). Given a matrix \(\mathbf {M}\in \mathbb {F}^{d_1 \times d_2}\), we denote its entries by \(\mathbf {M}_{i,j}\) and we use \(\mathrm {vec}(\mathbf {M}) \in \mathbb {F}^{d_1 d_2}\) to denote the column vector obtained by concatenating the columns of \(\mathbf {M}\). We use \(\ker (\mathbf {A})\) to denote the kernel (or null space) of a matrix \(\mathbf {A}\). Given two matrices \(\mathbf {A}\in \mathcal {M}_m(\mathbb {F})\) and \(\mathbf {B}\in \mathcal {M}_n(\mathbb {F})\) we denote their Kronecker product by \(\mathbf {A}\otimes \mathbf {B}\in \mathcal {M}_{mn}(\mathbb {F})\) and their direct sum by \(\mathbf {A}\oplus \mathbf {B}\in \mathcal {M}_{m+n}(\mathbb {F})\): \(\mathbf {A}\otimes \mathbf {B}\) is the block matrix with blocks \((\mathbf {A}_{i,j}\mathbf {B})_{i,j}\) and \(\mathbf {A}\oplus \mathbf {B}\) is the block diagonal matrix with \(\mathbf {A}\) in the upper diagonal block and \(\mathbf {B}\) in the lower one. We denote by \(\varSigma ^*\) the set of strings on a finite alphabet \(\varSigma \) and the empty string by \(\lambda \). We denote by \(\varSigma ^+\) the set of non-empty strings and by \(\varSigma ^k\) the set of all strings of length k.
2 Preliminaries
We first present notions on weighted automata, graph weighted models and finite dimensional algebras. The reader is referred to [9, 16, 25] for more details on weighted automata theory, to [2] and [22, Chap. 2] for an introduction to graph weighted models, and to [13, 17] for a thorough introduction to finite dimensional algebras.
2.1 Weighted Automata and GWMs over Circular Strings
Let \(\varSigma \) be a finite alphabet. A weighted finite automaton (WFA) over a field \(\mathbb {F}\) with n states is a tuple \(M=({\varvec{\alpha }}, \{\mathbf {M}^\sigma \}_{\sigma \in \varSigma },{\varvec{\omega }} )\) where \({\varvec{\alpha }},{\varvec{\omega }}\in \mathbb {F}^n\) are the initial and final weight vectors respectively, and \(\mathbf {M}^\sigma \in \mathcal {M}_n(\mathbb {F})\) is the transition matrix for each symbol \(\sigma \in \varSigma \). A WFA computes a function \(f_M:\varSigma ^*\rightarrow \mathbb {F}\) defined for each word \(x=x_1x_2\cdots x_k\in \varSigma ^*\) by
We will often use the shorthand notation \(\mathbf {M}^x= \mathbf {M}^{x_1}\mathbf {M}^{x_2}\cdots \mathbf {M}^{x_k}\) for any word \(x=x_1x_2\cdots x_k\in \varSigma ^*\). A WFA M with n states is minimal if its number of states is minimal, i.e. any WFA \(M'\) such that \(f_M=f_{M'}\) has at least n states. We say that a function \(f:\varSigma ^*\rightarrow \mathbb {R}\) is WFA-recognizable if there exists a WFA computing it.
Graph weighted models (GWMs) have been introduced as a computational model over arbitrary labeled graphs and hypergraphs in [2]. In this paper, we focus on the simple model of GWMs defined over the family of circular strings. A circular string is a string without a beginning or an end, one can think of it as a string closed onto itself (see Fig. 1).
A d-dimensional GWM A over circular strings (GWM\(^\mathrm{c}\)) on \(\varSigma \) is given by a set of matrices \(\{\mathbf {A}^\sigma \}_{\sigma \in \varSigma }\subset \mathcal {M}_d(\mathbb {F})\). It computes a function \(f_A:\varSigma ^+ \rightarrow \mathbb {F}\) definedFootnote 2 for each word \({x=x_1x_2\cdots x_k\in \varSigma ^+}\) by
By invariance of the trace under cyclic permutation, we have \(f_A(x_1 x_2\cdots x_k) = f_A(x_2 x_3\cdots x_k x_1) = f_A(x_3 x_4\cdots x_k x_1 x_2) = \cdots \). This is in accordance with the definition of a circular string: for any string \(x'\) obtained by cyclic permutation of the letters of a string x, both x and \(x'\) correspond to the same circular string. Similarly to WFAs, a GWM\(^\mathrm{c}\) is minimal if its dimension is minimal and a function \(f:\varSigma ^+\rightarrow \mathbb {F}\) is \(GWM^{c}{\text {-}}recognizable\) if it can be computed by a GWM\(^\mathrm{c}\).
It is immediate to see that there exist WFA-recognizable functions that are not GWM\(^\mathrm{c}\)-recognizable, this is the case of any WFA-recognizable function that is not invariant under cyclic permutation of letters in a wordFootnote 3. In contrast, one can easily show that any GWM\(^\mathrm{c}\)-recognizable function is WFA-recognizable. More precisely, we have the following result.
Proposition 1
For any d-dimensional GWM\(^\mathrm{c}\) \(A = \{\mathbf {A}^\sigma \}_{\sigma \in \varSigma } \) on \(\varSigma \), the WFA M with \(d^2\) states \(({\varvec{\alpha }}, \{\mathbf {M}^\sigma \}_{\sigma \in \varSigma },{\varvec{\omega }} )\) where \({\varvec{\alpha }}={\varvec{\omega }}=\mathrm {vec}(\mathbf {I}_d)\) and \(\mathbf {M}^\sigma = \mathbf {I}_d\otimes \mathbf {A}^\sigma \) for each \(\sigma \in \varSigma \), is such that \(f_M(x) = f_A(x)\) for all \(x \in \varSigma ^*\).
Proof
For any \(w=w_1\cdots w_n \in \varSigma ^*\) we have \(f_A(w) = \mathrm {Tr} (\mathbf {A}^w) = \sum _{i\in [d]} \mathbf {A}^w_{i,i} = \sum _{i\in [d]} \mathbf {e}_i^\top \mathbf {A}^w \mathbf {e}_i\) where \(\mathbf {e}_i\) is the i-th vector of the canonical basis of \(\mathbb {F}^d\). Since \({\varvec{\alpha }}={\varvec{\omega }}= (\mathbf {e}_1^\top , \cdots , \mathbf {e}_d^\top )^\top \) and \(\mathbf {M}^\sigma = \mathbf {I} \otimes \mathbf {A}^\sigma \) is the block-diagonal matrix with \(\mathbf {A}^\sigma \) repeated d times on the diagonal, one can check that \(f_M(w) ={\varvec{\alpha }}^\top \mathbf {M}^w {\varvec{\omega }}= \sum _{i\in [d]} \mathbf {e}_i^\top \mathbf {A}^w \mathbf {e}_i = f_A(w)\). \(\square \)
It follows from this proposition that the learning and equivalence problems for GWM\(^\mathrm{c}\)s could be handled by using the corresponding algorithms for WFAs. We will nonetheless study the equivalence problem in the next sectionFootnote 4 without falling back onto the theory of WFAs, which will allow us to unravel fundamental properties of GWMs that will be particularly relevant to further studies (moreover, the minimization problem obviously cannot be handled in such a way).
2.2 Finite-Dimensional Algebras
An algebra \(\mathcal {A}\) over a field \(\mathbb {F}\) (or \(\mathbb {F}\)-algebra) is a vector space over the field \(\mathbb {F}\) equipped with a bilinear operation (called multiplication or product). An algebra is associative if its product is associative and it is finite-dimensional if it is of finite dimension as a vector space over \(\mathbb {F}\). In this paper, we will only consider finite-dimensional associative algebras. A sub-algebra \(\mathcal {B}\) of an algebra \(\mathcal {A}\) is a linear subspace of \(\mathcal {A}\) which is closed under product (i.e. \(\mathcal {B}\) equipped with the operations of \(\mathcal {A}\) is an algebra itself).
A classical example of finite-dimensional algebra is the set \(\mathcal {L}(V)\) of linear operators on some finite-dimensional vector space V (where the product is composition). In this particular example, the algebra \(\mathcal {L}(V)\) is isomorphic to the full matrix algebra \(\mathcal {M}_d(\mathbb {F})\), where d is the dimension of V; we will mainly focus on matrix algebras in this paper, i.e. sub-algebras of the full matrix algebra \(\mathcal {M}_d(\mathbb {F})\) for some d (an example of such an algebra is the set of \(d\times d\) upper triangular matrices). In particular, we will often consider the algebra generated by a finite set of matrices \(\{\mathbf {A}^\sigma \}_{\sigma \in \varSigma }\subset \mathcal {M}_d(\mathbb {F})\) for some finite alphabet \(\varSigma \), that is the set of all finite linear combinations of matrices of the form \(\mathbf {A}^x=\mathbf {A}^{x_1}\mathbf {A}^{x_2}\cdots \mathbf {A}^{x_k}\) for \(x=x_1x_2\cdots x_k\in \varSigma ^*\). More formally, if we denote by \(\mathcal {A}\) this algebra, we have
Let \(\mathcal {A}\) be a finite-dimensional algebra over \(\mathbb {F}\). A sub-algebra \(\mathcal {X}\) of \(\mathcal {A}\) is called an ideal of \(\mathcal {A}\) if both \(xa\in \mathcal {X}\) and \(ax\in \mathcal {X}\) for any \(x\in \mathcal {X},\ a\in \mathcal {A}\) (i.e. \(\mathcal {X}\) is both left and right \(\mathcal {A}\)-invariant), which we will denote by \(\mathcal {A}\mathcal {X}=\mathcal {X}\mathcal {A}=\mathcal {A}\). A sub-algebra \(\mathcal {X}\) of \(\mathcal {A}\) is nilpotent if there exists some integer k such that \(\mathcal {X}^k= \{x_1x_2\cdots x_k :\ x_i\in \mathcal {X},\ i\in [k]\}=\{0\}\). The factor algebra \(\mathcal {A}/\mathcal {X}\) of an algebra \(\mathcal {A}\) by an ideal \(\mathcal {X}\) is the algebra consisting of all cosets \(a+\mathcal {X}\) for \(a\in \mathcal {A}\), in other words \(\mathcal {A}/\mathcal {X}\) is the quotient of \(\mathcal {A}\) by the equivalence relation (\(a\sim b\) if and only if \(a-b\in \mathcal {X}\)). The radicalFootnote 5 of \(\mathcal {A}\) is the maximal nilpotent ideal of \(\mathcal {A}\) and will be denoted by \(\mathrm {Rad}(\mathcal {A})\) (the existence of \(\mathrm {Rad}(\mathcal {A})\) follows from the fact that \(\mathcal {A}\) is of finite dimension). An algebra \(\mathcal {A}\) is semi-simple if its radical is \(\{0\}\).
Let us illustrate these definitions with a very simple example. Let \(\mathcal {G}\subset \mathcal {M}_2(\mathbb {R})\) be the algebra generated by the matrix \(\mathbf {G}= \left[ \begin{matrix}1&{}1\\ 0&{}1\end{matrix}\right] \). One can easily check that
\(\mathcal {G}= \left\{ \left[ \begin{matrix}\alpha &{}\beta \\ 0 &{} \alpha \end{matrix}\right] :\alpha ,\ \beta \in \mathbb {R}\right\} \) and is thus of dimension 2. Consequently, both
are sub-algebras of \(\mathcal {G}\). Moreover, \(\mathcal {G}_2\) is a nilpotent ideal and one can check that it is maximal, i.e. \(\mathrm {Rad}(\mathcal {G})=\mathcal {G}_2\) and hence \(\mathcal {G}\) is not semi-simple.
Intuitively, the radical of an algebra \(\mathcal {A}\) contains its bad elements (in the sense that these elements annihilate all simple \(\mathcal {A}\)-modules). In our previous example, this bad property translates into the fact that the non-zero elements of \(\mathcal {G}_2\) cannot be diagonalized. We will use two fundamental results from the theory of finite dimensional algebra. The first one is the Wedderburn-Malcev theorem which states that (under some conditions on the ground field \(\mathbb {F}\)) the elements of the radical can be filtered out from the algebra, i.e. one can find a sub-algebra of \(\mathcal {A}\) that is isomorphic to \(\mathcal {A}/\mathrm {Rad}(\mathcal {A})\) (see e.g. [17, Theorem 6.2.3]).
Theorem 1
(Wedderburn-Malcev Theorem). Let \(\mathcal {A}\) be a finite-dimensional algebra over a field of characteristic0. There exists a semi-simple subalgebra \(\tilde{\mathcal {A}}\) of \(\mathcal {A}\) which is isomorphic to \(\mathcal {A}/\mathrm {Rad}(\mathcal {A})\) and such that \(\mathcal {A}= \tilde{\mathcal {A}}\oplus \mathrm {Rad}(\mathcal {A})\) (direct sum of vector spaces).
Going back to the example of the algebra \(\mathcal {G}\) described above, we showed that it is not semi-simple, however one can easily check that \(\mathcal {G}/\mathrm {Rad}(\mathcal {G})\) is isomorphic to the algebra \(\mathcal {G}_1\) in Eq. (1) which is semi-simple, and furthermore that \(\mathcal {G}= \mathcal {G}_1\oplus \mathrm {Rad}(\mathcal {G})\).
The second fundamental result we will need is related to the notion of representation of an algebra. A representation of an \(\mathbb {F}\)-algebra \(\mathcal {A}\) is a homomorphism of \(\mathcal {A}\) into the algebra \(\mathcal {L}(V)\) of the linear operators on some vector space V (over \(\mathbb {F}\)). Two representations \(\rho :\mathcal {A}\rightarrow \mathcal {L}(V)\) and \(\tau :\mathcal {A}\rightarrow \mathcal {L}(W)\) are similar if there exists an isomorphism \(\phi :V\rightarrow W\) such that \(\rho (a) = \phi ^{-1}\tau (a)\phi \) for all \(a\in \mathcal {A}\). For semi-simple algebras, the notion of similar representations is fundamentally related to the trace operator, which will be particularly relevant to the present study. Formally, we have the following theorem (see e.g. [17, Corollary 2.6.3]).
Theorem 2
Let \(\rho \) and \(\tau \) be two representations of a semi-simple algebra \(\mathcal {A}\) over a field of characteristic 0. These representations are similar if and only if \(\mathrm {Tr} (\rho (a)) = \mathrm {Tr} (\tau (a))\) for all \(a\in \mathcal {A}\).
3 Semi-Simple GWMs and the Equivalence Problem
In this section, we study the equivalence problem: given two GWMs over circular strings, how can we decide whether they compute the same function? In light of Proposition 1, one could solve this problem by simply converting the two GWM\(^\mathrm{c}\)s into WFAs and checking whether these two WFAs compute the same function; indeed the equivalence problem for WFAs defined over a field is decidable in polynomial time [9]. Nonetheless, we will tackle this problem without relying on this proposition and, by doing so, we will unravel the notion of semi-simple \(GWM^{c}\) which will be relevant to the study of the minimization problem in the next section (and which should also be central to the study of GWMs defined over more general families of graphs).
3.1 Semi-Simplicity, Nilpotent Matrices and Traces
Let \(\mathcal {A}\) be a finite dimensional matrix algebra. Recall that the radical of \(\mathcal {A}\) is its maximal nilpotent ideal. A useful characterization of the elements of the radical relies on the notion of strongly nilpotent elements: \(\mathbf {A}\in \mathcal {A}\) is strongly nilpotent if \(\mathbf {A}\mathbf {X}\) is nilpotent for any \(\mathbf {X}\in \mathcal {A}\). It turns out that the radical of \(\mathcal {A}\) is exactly the set of its strongly nilpotent elements [17, Corollary 3.1.10]. Since the computation of a GWM\(^\mathrm{c}\) boils down to applying the trace operator, we will leverage this property to relate the notions of radical and semi-simplicity to simple properties of the elements of \(\mathcal {A}\) with respect to the trace operator. We start with a simple lemma relating nilpotency and trace.
Lemma 1
Let \(\mathbb {F}\) be a field of characteristic 0 and let \(\mathbf {A}\in \mathcal {M}_{d}(\mathbb {F})\). Then \(\mathbf {A}\) is nilpotent if and only if \(\mathrm {Tr} (\mathbf {A}^n) = 0\) for all \(n\ge 1\).
Proof
Let \(\mathbf {A}\) be a nilpotent matrix and let k be such that \(\mathbf {A}^k=0\). Suppose \(\mathbf {A}\mathbf {v}=\gamma \mathbf {v}\) for some \(\mathbf {v}\ne \mathbf {0}\) (where \(\gamma \) could belong to an algebraically closed field extension of \(\mathbb {F}\)). Then \(\mathbf {A}^k\mathbf {v}=\gamma ^k \mathbf {v}= 0\) hence \(\gamma = 0\) since \(\mathbb {F}\) is of characteristic 0, thus \(\mathbf {A}\) has only 0 eigenvalues and \(\mathrm {Tr} (\mathbf {A}^n) = 0\) for all \(n\ge 1\).
Conversely, suppose that \(\mathrm {Tr} (\mathbf {A}^n) = 0\) for all \(n\ge 1\). Then, we have \(\mathrm {Tr} (P(\mathbf {A})) = 0\) for any polynomial P with constant term 0. Suppose that \(\mathbf {A}\) has a non-zero eigenvalue \(\gamma \) and let \(m>0\) be its multiplicity. Choose a polynomial P such that \(P(\gamma ) = 1\), \(P(0)=0\) and \(P(\mu ) = 0\) for any eigenvalue \(\mu \) of \(\mathbf {A}\) distinct from \(\gamma \). We then have \(0 = \mathrm {Tr} (P(\mathbf {A})) = m\), a contradiction. Hence \(\mathbf {A}\) has only zero eigenvalues and is nilpotent. \(\square \)
One can use the previous lemma to show that an element \(\mathbf {A}\in \mathcal {A}\) is strongly nilpotent if and only if \(\mathrm {Tr} (\mathbf {A}\mathbf {X})=0\) for all \(\mathbf {X}\in \mathcal {A}\), which leads to the following useful characterization of the semi-simplicity of an algebra.
Proposition 2
Let \(\mathcal {A}\subset \mathcal {M}_d(\mathbb {F})\) be a matrix algebra. We have
Consequently, \(\mathcal {A}\) is semi-simple if and only if for all \(\mathbf {A}\in \mathcal {A}\) different from 0 there exists \(\mathbf {X}\in \mathcal {A}\) such that \(\mathrm {Tr} (\mathbf {A}\mathbf {X})\ne 0\).
Proof
We will show that \(\mathbf {A}\in \mathcal {A}\) is strongly nilpotent if and only if \(\mathrm {Tr} (\mathbf {A}\mathbf {X})=0\) for all \(\mathbf {X}\in \mathcal {A}\). The proposition will then directly follows from the fact that \(\mathrm {Rad}(\mathcal {A})\) is the set of strongly nilpotent elements of \(\mathcal {A}\) and from the fact that \(\mathcal {A}\) is semi-simple if and only if \(\mathrm {Rad}(\mathcal {A})=\{0\}\).
Let \(\mathbf {A}\in \mathcal {A}\) be such that \(\mathrm {Tr} (\mathbf {A}\mathbf {X})=0\) for all \(\mathbf {X}\in \mathcal {A}\). Since \(\mathbf {X}(\mathbf {A}\mathbf {X})^{n-1}\in \mathcal {A}\) for all \(n\ge 1\) and all \(\mathbf {X}\in \mathcal {A}\) we have \(\mathrm {Tr} ((\mathbf {A}\mathbf {X})^n) = 0\) for all \(n\ge 1\) and all \(\mathbf {X}\in \mathcal {A}\), hence \(\mathbf {A}\mathbf {X}\) is nilpotent for all \(\mathbf {X}\in \mathcal {A}\) by Lemma 1, i.e. \(\mathbf {A}\) is strongly nilpotent. Conversely, let \(\mathbf {A}\) be a strongly nilpotent element of \(\mathcal {A}\). By Lemma 1 we have \(\mathrm {Tr} ((\mathbf {A}\mathbf {X})^n)=0\) for all \(\mathbf {X}\in \mathcal {A}\) and all \(n\ge 1\), in particular \(\mathrm {Tr} (\mathbf {A}\mathbf {X})=0\). \(\square \)
3.2 Equivalence of GWMs
We now consider the problem of deciding whether two GWM\(^\mathrm{c}\)s are equivalent. Let us first briefly show how one can decide whether two real-valued WFAs compute the same function. One way to address this problem relies on the following result: two minimal real-valued WFAs computing the same function are related by a change of basis. Note that it is easy to check that WFAs are invariant under a change of basis of their weight vectors and transition matrices. The following proposition show that such a change of basis is actually the only way for two minimal WFAs to compute the same function [26] (see also [6, Corollary 4.2]).
Proposition 3
If two WFAs \(A=({\varvec{\alpha }},\{\mathbf {A}^\sigma \}_{\sigma \in \varSigma },{\varvec{\omega }})\) and \(\tilde{A}=(\tilde{{\varvec{\alpha }}}, \{\tilde{\mathbf {A}}^\sigma \}_{\sigma \in \varSigma }, \tilde{{\varvec{\omega }}})\) with d states taking their values in \(\mathbb {R}\) are minimal and compute the same function, i.e. \(f_A=f_{\tilde{A}}\), then there exists an invertible matrix \(\mathbf {P}\in \mathcal {M}_d(\mathbb {R})\) such that
Hence, to decide whether two WFAs compute the same function one can simply minimize them and check whether the weight vectors and transition matrices obtained after minimization are related by a change of basis (which can both be done in polynomial time). In contrast, one can easily find an example of two minimal \(GWM^{c}s\) whose matrices are not related by a change of basis. Consider the constant function \(f(x)=2\) for all \(x\in \varSigma ^+\). One can check that the two GWM\(^\mathrm{c}\)s G and \(\tilde{G}\) with 2 states defined by the matrices
respectively are minimal and compute f, however \(\mathbf {G}\) and \(\tilde{\mathbf {G}}\) are not similar.
Let us now introduce the notion of semi-simple\(GWM^{c}\). We say that a GWM\(^\mathrm{c}\) A defined by a set of matrices \(\{\mathbf {A}^\sigma \}_{\sigma \in \varSigma }\subset \mathcal {M}_d(\mathbb {F})\) is semi-simple if the algebra \(\mathcal {A}\) generated by the matrices \(\{\mathbf {A}^\sigma \}_{\sigma \in \varSigma }\) is semi-simple. It follows from the example presented in Sect. 2.2 that G is not semi-simple while \(\tilde{G}\) is a semi-simple \(GWM^{c}\) computing the \(GWM^{c}{\text {-}}recognizable function\) f. We will now show that this simple example can be generalized:any \(GWM^{c}{\text {-}}recognizable\) function can be computed by a semi-simple \(GWM^{c}\). This non-trivial result relies on the following theorem which is a direct consequence of the Wedderburn-Malcev theorem.
Theorem 3
Let \(\mathcal {A}\subset \mathcal {M}_d(\mathbb {F})\) be a matrix algebra over a field of characteristic 0. Then there exist a semi-simple sub-algebra \(\tilde{\mathcal {A}}\) of \(\mathcal {A}\) and a surjective homomorphism \(\pi :\mathcal {A}\rightarrow \tilde{\mathcal {A}}\) such that \(\mathrm {Tr} (\mathbf {A}) = \mathrm {Tr} (\pi (\mathbf {A}))\) for all \(\mathbf {A}\in \mathcal {A}\).
Proof
By Theorem 1 there exists a semi-simple sub-algebra \(\tilde{\mathcal {A}}\) of \(\mathcal {A}\) which is isomorphic to \(\mathcal {A}/\mathrm {Rad}(\mathcal {A})\) and such that \(\mathcal {A}= \tilde{\mathcal {A}} \oplus \mathrm {Rad}(\mathcal {A})\) (direct sum of vector spaces). Let \(\pi :\mathcal {A}\rightarrow \tilde{\mathcal {A}}\) be the projection associated with this direct sum. Then for any \(\mathbf {A}\in \mathcal {A}\) we have
Indeed, since \((1-\pi )(\mathbf {A})\in \mathrm {Rad}(\mathcal {A})\), it is nilpotent, hence its trace is zero. \(\square \)
Using the notations from Theorem 3, it follows that for any d-dimensional GWM\(^\mathrm{c}\) A given by a set of matrices \(\{\mathbf {A}^\sigma \}_{\sigma \in \varSigma }\subset \mathcal {M}_{d}(\mathbb {F})\) generating the algebra \(\mathcal {A}\), the d-dimensional \(\tilde{A}\) given by the matrices \(\{\tilde{\mathbf {A}}^\sigma = \pi (\mathbf {A}^\sigma )\}_{\sigma \in \varSigma }\) is a semi-simple GWM\(^\mathrm{c}\) computing the function \(f_A\), hence the following corollary.
Corollary 1
Any function that can be computed by a GWM\(^\mathrm{c}\) can be computed by a semi-simple GWM\(^\mathrm{c}\) of the same dimension.
Given a finite dimensional algebra \(\mathcal {A}\), one can compute the surjective homomorphism \(\pi \) from Theorem 3 in polynomial time when \(\mathbb {F}\) allows efficient arithmetic computations (e.g. \(\mathbb {F}=\mathbb {Q}\)) [12, 15]. The algorithm takes as input a basis \(a_1,\cdots ,a_n\) of \(\mathcal {A}\) (as a vector space) and the structure coefficients of the algebra (which are the scalars \(c^k_{i,j}\in \mathbb {F}\) satisfying \(a_ia_j=\sum _k c_{i,j}^ka_k\)). Since one can easily compute a basis and the structure coefficients of a matrix algebra \(\mathcal {A}\) given a set of generators \(\{\mathbf {A}^\sigma \}_{\sigma \in \varSigma }\) in polynomial time, it follows that any GWM\(^\mathrm{c}\) can be transformed in polynomial time into a semi-simple \(GWM^{c}\) (of the same dimension) computing the same function.
We now show that a result similar to Proposition 3 holds for semi-simple GWM\(^\mathrm{c}\)s: two semi-simple d-dimensional \(GWM^{c}s\) are equivalent if and only if they are related by a change of basis. This result relies on the following theorem.
Theorem 4
Let \(\varSigma \) be a finite alphabet and let \(\mathcal {A},\mathcal {B}\subset \mathcal {M}_d(\mathbb {F})\) be the algebras generated by the sets of matrices \(\{\mathbf {A}^\sigma \}_{\sigma \in \varSigma }\) and \(\{\mathbf {B}^\sigma \}_{\sigma \in \varSigma }\) respectively.
If \(\mathcal {A}\) and \(\mathcal {B}\) are semi-simple and \(\mathrm {Tr} (\mathbf {A}^w)=\mathrm {Tr} (\mathbf {B}^w)\) for all \(w\in \varSigma ^*\) then \(\mathcal {A}\) is isomorphic to \(\mathcal {B}\). Moreover, the mapping \(\tilde{\phi }:\mathcal {A}\rightarrow \mathcal {B}\) defined by extending the mapping
by linearity is well-defined and is an isomorphism.
Proof
The mapping \(\phi \) is by construction a trace-preserving surjective semi-group homomorphism. We first showFootnote 6 that \(\phi \) can be extended to a homomorphism \(\tilde{\phi }:\mathcal {A}\rightarrow \mathcal {B}\). By definition, any \(\mathbf {A}\in \mathcal {A}\) can be written as \(\mathbf {A}= \sum _{i=1}^n \alpha _i \mathbf {A}^{x_i}\) for some \(n\in \mathbb {N}\), \(\alpha _1,\cdots ,\alpha _n\in \mathbb {F}\), \(x_1,\cdots ,x_n\in \varSigma ^*\). We will show that the mapping
is well-defined. By construction of \(\tilde{\phi }\), it suffices to show that if \(\sum _{i=1}^n\alpha _i \mathbf {A}^{x_i} = 0\) for some \(\alpha _i\in \mathbb {F}\), \(x_i\in \varSigma ^*\), then \(\tilde{\phi }( \sum _{i=1}^n\alpha _i \mathbf {A}^{x_i} ) = 0\). Suppose \(\sum _{i=1}^n\alpha _i \mathbf {A}^{x_i} = 0\), then \(\sum _{i=1}^n\alpha _i \mathbf {A}^{x_i}\mathbf {A}^x = 0\) for any \(x\in \varSigma ^*\). By linearity of the trace and since \(\phi \) is a trace-preserving morphism, it follows that
for all \(x\in \varSigma ^*\). By linearity of the trace and since \(\phi \) is surjective, we thus have \(\mathrm {Tr} \left[ \tilde{\phi }\left( \sum _{i=1}^n\alpha _i \mathbf {A}^{x_i}\right) \mathbf {B}\right] = 0\) for any \(\mathbf {B}\in \mathcal {B}\), hence \(\tilde{\phi }\left( \sum _{i=1}^n\alpha _i \mathbf {A}^{x_i}\right) \) belongs to \(\mathrm {Rad}(\mathcal {B})\) by Proposition 2 and must be 0 since \(\mathcal {B}\) is semi-simple.
One can easily check that \(\tilde{\phi }\) is trace-preserving, is surjective and is a homomorphism. It remains to show that \(\tilde{\phi }\) is injective. Let \(\mathbf {A}\in \mathcal {A}\) be such that \(\tilde{\phi }(\mathbf {A}) = 0\). Since \(\tilde{\phi }\) is a homomorphism we have \(\tilde{\phi }(\mathbf {A}\mathbf {X})=0\) for any \(\mathbf {X}\in \mathcal {A}\), and thus \(0=\mathrm {Tr} (\tilde{\phi }(\mathbf {A}\mathbf {X})) = \mathrm {Tr} (\mathbf {A}\mathbf {X})\) for all \(\mathbf {X}\in \mathcal {A}\). Hence \(\mathbf {A}\in \mathrm {Rad}(\mathcal {A})\) by Proposition 2 and must be 0 since \(\mathcal {A}\) is semi-simple. \(\square \)
The previous theorem can be leveraged to show that if two semi-simple GWM\(^\mathrm{c}\)s of the same dimension compute the same function, then they are related by a change of basis (note that the converse of this statement is immediate since the trace is a basis independent operator). Let A and B be two d-dimensional semi-simple GWM\(^\mathrm{c}\)s computing the same function and let \(\mathcal {A},\mathcal {B}\subset \mathcal {M}_d\) be the algebras generated by their respective sets of matrices \(\{\mathbf {A}^\sigma \}_{\sigma \in \varSigma }\) and \(\{\mathbf {B}^\sigma \}_{\sigma \in \varSigma }\). First observe that the identity mapping \(\rho :\mathcal {A}\rightarrow \mathcal {L}(\mathbb {F}^d)\) defined by \(\rho (\mathbf {A})=\mathbf {A}\) for all \(\mathbf {A}\in \mathcal {A}\) is (trivially) a representation of the algebra \(\mathcal {A}\). Now, since A and B compute the same function and are semi-simple, we have \(\mathrm {Tr} (\mathbf {A}^w)=\mathrm {Tr} (\mathbf {B}^w)\) for all \(w\in \varSigma ^*\) and it follows from Theorem 4 that \(\mathcal {A}\) is isomorphic to \(\mathcal {B}\); let \(\tilde{\phi }:\mathcal {A}\rightarrow \mathcal {B}\) be the isomorphism defined in this theorem. Then, the mapping \(\tau :\mathcal {A}\rightarrow \mathcal {L}(\mathbb {F}^d)\) defined by \(\tau (\mathbf {A}) = \tilde{\phi }(\mathbf {A})\) for all \(\mathbf {A}\in \mathcal {A}\) is also a representation of \(\mathcal {A}\), and since \(\mathcal {A}\) is semi-simple it follows from Theorem 2 that \(\rho \) and \(\tau \) are similar. That is, there exists an invertible matrix \(\mathbf {P}\in \mathcal {M}_d(\mathbb {F})\) such that \(\rho (\mathbf {A}) = \mathbf {P}^{-1}\tau (\mathbf {A})\mathbf {P}\) for all \(\mathbf {A}\in \mathcal {A}\). In particular we have
for all \(\sigma \in \varSigma \), hence the following corollary.
Corollary 2
Two d-dimensional semi-simple GWM\(^\mathrm{c}\)s A and B compute the same function if and only if they are related by a change of basis, i.e. there exists an invertible matrix \(\mathbf {P}\in \mathcal {M}_d(\mathbb {F})\) such that \(\mathbf {A}^\sigma = \mathbf {P}^{-1}\mathbf {B}^\sigma \mathbf {P}\) for all \(\sigma \in \varSigma \).
In the case where \(\mathbb {F}\) allows for efficient arithmetic computations (e.g. \(\mathbb {F}=\mathbb {Q}\)), it follows that the equivalence of GWM\(^\mathrm{c}\)s can be decided in polynomial time. Indeed, given two GWM\(^\mathrm{c}\)s A and B of the same dimension defined by the matrices \(\{\mathbf {A}^\sigma \}_{\sigma \in \varSigma }\) and \(\{\mathbf {B}^\sigma \}_{\sigma \in \varSigma }\) respectively, one can first transform them into semi-simple GWM\(^\mathrm{c}\)s using Theorem 3 and the algorithm in [12, 15], and then check whether the resulting matrices are related by a change of basis. The case where the two GWM\(^\mathrm{c}\)s are not of the same dimension can be easily handled. Without loss of generality, suppose that A and B are semi-simple GWM\(^\mathrm{c}\)s of dimension d and \(d'\) respectively with \(d'<d\). One can construct a d-dimensional GWM\(^\mathrm{c}\) \(\tilde{B}\) computing the same function as B by considering the block-diagonal matrices \(\tilde{\mathbf {B}}^\sigma = \mathbf {B}^\sigma \oplus \mathbf {0}\) for each \(\sigma \in \varSigma \) (where \(\mathbf {0}\) is the \((d-d')\times (d-d')\) matrix with all entries equal to 0). It is easy to check that \(\tilde{B}\) is semi-simple if B is semi-simple, hence one can decide if A is equivalent to B by checking whether the matrices \(\mathbf {A}^\sigma \) and \(\tilde{\mathbf {B}}^\sigma \) are related by a change of basis.
4 Minimization of GWMs over Circular Strings
We now consider the minimization problem: given a GWM\(^\mathrm{c}\) A, can we find a minimal GWM\(^\mathrm{c}\) computing \(f_A\)? We will show that the answer is in the positive and that such a minimal GWM\(^\mathrm{c}\) can be computed in polynomial time. We start with a technical lemma that generalizes the classical result stating that for any \(d\times d\) matrix \(\mathbf {A}\), the kernel of \(\mathbf {A}^d\) is equal to the kernel of \(\mathbf {A}^{d+k}\) for any \(k\ge 0\).
Lemma 2
Let \(\{\mathbf {A}^\sigma \}_{\sigma \in \varSigma }\subset \mathcal {M}_d(\mathbb {F})\) be a finite set of matrices. Then for all \(k\ge 0\) we have
Proof
For any integer i, let \(E_i = \bigcap _{x\in \varSigma ^i}\mathrm {ker}(\mathbf {A}^x)\). We start by showing that if \(E_i = E_{i+1}\) for some i then \(E_{i+1} = E_{i+2}\). The inclusion \(E_{i+1} \subseteq E_{i+2}\) is immediate. Suppose \(E_i=E_{i+1}\) for some integer i. If \(\mathbf {v}\in E_{i+2}\) then \(\mathbf {A}^\sigma \mathbf {v}\in \mathrm {ker}(\mathbf {A}^x)\) for all \(x\in \varSigma ^{i+1}\) and all \(\sigma \in \varSigma \), i.e. \(\mathbf {A}^\sigma \mathbf {v}\in E_{i+1} = E_i\) for all \(\sigma \in \varSigma \), which implies \(\mathbf {A}^\sigma \mathbf {v}\in \mathrm {ker}(\mathbf {A}^y)\) for all \(y\in \varSigma ^i\) and all \(\sigma \in \varSigma \) from which \(\mathbf {v}\in E_{i+1}\) follows directly. To conclude, since each \(E_i\) is a linear subspace of \(\mathbb {F}^d\), \(E_i\subsetneq E_{i+1}\) implies \(\dim {E_i} < \dim {E_{i+1}}\), hence there must exist an i for which \(E_i=E_{i+1}\) and this i cannot be greater than d. \(\square \)
We show in the following theorem that the linear space \(E=\bigcap _{x\in \varSigma ^d}\mathrm {ker}(\mathbf {A}^x)\) is not relevant to the computation of a GWM\(^\mathrm{c}\) A with matrices \(\{\mathbf {A}^\sigma \}_{\sigma \in \varSigma }\), i.e. one can project each matrix \(\mathbf {A}^x\) onto the orthogonal complement of E without changing the function computed by A.
Theorem 5
Let A be a GWM\(^\mathrm{c}\) given by the set of matrices \(\{\mathbf {A}^\sigma \}_{\sigma \in \varSigma }\subset \mathcal {M}_d(\mathbb {F})\). Consider the linear space
and let \(\mathbf {\Pi }\in \mathbb {F}^{d\times d}\) be the matrix of the orthogonal projection onto E.
Then, the GWM\(^\mathrm{c}\) \(\hat{A}\) given by the matrices \(\hat{\mathbf {A}}^\sigma =\mathbf {A}^\sigma (\mathbf {I}-\mathbf {\Pi })\) for each \(\sigma \in \varSigma \) is such that \(f_A=f_{\hat{A}}\).
Proof
Let \(\mathcal {A}\) be the algebra generated by the matrices \(\{\mathbf {A}^\sigma \}_{\sigma \in \varSigma }\). Let us first observe that E is \(\mathcal {A}\)-invariant, which follows from Lemma 2. Indeed, if \(\mathbf {v}\in E\) and \(y\in \varSigma ^*\) we have \(\mathbf {A}^x\mathbf {A}^y\mathbf {v}= \mathbf {0}\) for any \(x\in \varSigma ^d\) (since \(|xy|\ge d\)), hence \(\mathbf {A}^y\mathbf {v}\in E\); the extension to an arbitrary element of \(\mathbf {A}\) is immediate by linearity. This implies that for any \(\mathbf {A}\in \mathcal {A}\), we have
Now, let \(k\ge 1\), let \(x=x_1x_2\cdots x_k\in \varSigma ^k\) and let \(\mathbf {P}_1 = \mathbf {\Pi }\) and \(\mathbf {P}_2 = \mathbf {I}-\mathbf {\Pi }\). We can decompose \(\mathbf {A}^x\) into
We will show that the traces of all the summands in this last expression, except for the first one, are equal to 0. First, using Eq. (2) we have \(\mathbf {A}^{x_1}\mathbf {\Pi }\mathbf {A}^{x_2}\mathbf {\Pi }\cdots \mathbf {A}^{x_k}\mathbf {\Pi }= \mathbf {A}^x\mathbf {\Pi }.\) Moreover, for any integer s such that \(sk\ge d\) we have \((\mathbf {A}^x\mathbf {\Pi })^s = \mathbf {A}^{x^s} \mathbf {\Pi }= \mathbf {0}\) by definition of E and by Lemma 2, thus \(\mathbf {A}^x\mathbf {\Pi }\) is nilpotent and its trace is 0 by Lemma 1. For the remaining terms, let \(j_1,\cdots ,j_k \in \{1,2\}\) not all equal. Let \(l\in [k]\) be an index such that \(j_l=2\) and \(j_{\overline{l+1}} = 1\) where \(\overline{l+1} = l+1\) if \(l<k\) and 1 otherwise. Using the invariance of the trace under cyclic permutations of a matrix product, we obtain
where we used Eq. 2 again for the last equality. To conclude, we have shown that \(\mathrm {Tr} (\mathbf {A}^x) = \mathrm {Tr} (\hat{\mathbf {A}}^x)\) for all \(x\in \varSigma ^*\), hence A and \(\hat{A}\) compute the same function on circular strings. \(\square \)
Moreover, we now show that the subspace E from the previous theorem can be used to obtain a characterization of the minimality of a GWM\(^\mathrm{c}\).
Theorem 6
Let A be a GWM\(^\mathrm{c}\) given by the set of matrices \(\{\mathbf {A}^\sigma \}_{\sigma \in \varSigma }\subset \mathcal {M}_d(\mathbb {F})\). Then, A is minimal if and only if the linear space
is trivial, i.e. \(E = \{\mathbf {0}\}\).
Proof
Suppose that E is not trivial and let \(\mathbf {\Pi }\) be the matrix of the orthogonal projection onto E. Then, the rank R of \(\mathbf {I}-\mathbf {\Pi }\) is strictly less than d and there exists an orthogonal matrix \(\mathbf {U}\in \mathbb {R}^{d\times R}\) such that \(\mathbf {I}-\mathbf {\Pi }= \mathbf {U}\mathbf {U}^\top \). It follows from the previous proposition that, for any non-empty word \(x=x_1\cdots x_k\), we have
Hence, the R-dimensional GWM\(^\mathrm{c}\) given by the matrices \(\hat{\mathbf {A}}^\sigma = \mathbf {U}^\top \mathbf {A}^{\sigma } \mathbf {U}\) computes the same function as A, showing that A is not minimal.
Suppose now that A is not minimal. Let B be a GWM\(^\mathrm{c}\) of dimension \(d' <d\), given by the matrices \(\{\mathbf {B}^\sigma \}_{\sigma \in \varSigma }\), such that \(f_B=f_A\). Let \(\mathcal {A}\) (resp. \(\mathcal {B}\)) be the algebra generated by the matrices \(\{\mathbf {A}^\sigma \}_{\sigma \in \varSigma }\) (resp. \(\{\mathbf {B}^\sigma \}_{\sigma \in \varSigma }\)). By Corollary 1, we can assume that both A and B are semi-simple GWM\(^\mathrm{c}\)s, i.e. that the algebras \(\mathcal {A}\) and \(\mathcal {B}\) are semi-simple. For each \(\sigma \in \varSigma \), let \(\hat{\mathbf {B}}^\sigma = \mathbf {B}^\sigma \oplus \mathbf {0}\in \mathbb {R}^{d\times d}\) be the block diagonal matrix having \(\mathbf {B}^\sigma \) in the upper diagonal block and 0’s elsewhere. Let \(\hat{\mathcal {B}}\) be the algebras generated by the matrices \(\{\hat{\mathbf {B}}^\sigma \}_{\sigma \in \varSigma } \subset \mathcal {M}_d(\mathbb {F})\). It is easy to check that the GWM\(^\mathrm{c}\) \(\hat{B}\) computes the same function as A and B and that the algebra \(\hat{\mathcal {B}}\) is semi-simple (it is indeed isomorphic to the semi-simple algebra \(\mathcal {B}\)). It then follows from Corollary 2 that there exists an invertible matrix \(\mathbf {P}\in \mathcal {M}_d(\mathbb {F})\) such that \(\mathbf {A}^\sigma = \mathbf {P}\hat{\mathbf {B}}^\sigma \mathbf {P}^{-1}\text { for all }\sigma \in \varSigma .\) Let \(\mathbf {e}_d\) be the dth vector of the canonical basis of \(\mathbb {F}^d\), by definition of \(\hat{\mathbf {B}}^\sigma \) we have \(\hat{\mathbf {B}}^\sigma \mathbf {e}_d = \mathbf {0}\) for any \(\sigma \in \varSigma \), and consequently \(\mathbf {A}^\sigma \mathbf {P}\mathbf {e}_d = \mathbf {0}\) for any symbol \(\sigma \), showing that \(\mathbf {P}\mathbf {e}_d\in E\) and \(E \ne \{\mathbf {0}\}\). \(\square \)
It follows from the two previous theorems that by restricting the linear operators \(\mathbf {A}^\sigma \) of a GWM\(^\mathrm{c}\) A to the subspace \(E^\bot \), one can obtain a minimal GWM\(^\mathrm{c}\) computing \(f_A\). We formally state this result in the following corollary.
Corollary 3
Let A be a GWM\(^\mathrm{c}\) given by the matrices \(\{\mathbf {A}^\sigma \}_{\sigma \in \varSigma }\subset \mathcal {M}_d(\mathbb {F})\) and let \(\mathbf {\Pi }\) be the matrix of the orthogonal projection onto the space \(E = \bigcap _{x\in \varSigma ^d} \ker (\mathbf {A}^x)\). For any orthogonal matrix \(\mathbf {U}\in \mathbb {F}^{d\times R}\) such that \(\mathbf {I}-\mathbf {\Pi }=\mathbf {U}\mathbf {U}^\top \) (where R is the dimension of \(E^\bot \)), the R-dimensional GWM\(^\mathrm{c}\) \(\hat{A}\) given by the matrices \(\hat{\mathbf {A}}^\sigma = \mathbf {U}^\top \mathbf {A}^\sigma \mathbf {U}\) is a minimal GWM\(^\mathrm{c}\) computing \(f_A\).
Proof
Using the invariance of the trace under cyclic permutations of a matrix product, it directly follows from Theorem 5 that \(f_{\hat{A}} = f_A\). Moreover, one can check that \(\hat{E} = \bigcap _{x\in \varSigma ^d} \ker (\hat{\mathbf {A}}^x) = \{\mathbf {0}\}\) by construction of the matrices \(\hat{\mathbf {A}}^\sigma \), hence \(\hat{A}\) is minimal by Theorem 6. \(\square \)
We showed that a GWM\(^\mathrm{c}\) can be minimized by restricting its matrices to the subspace \(E^\bot \). In order to do so, one needs to compute a basis of \(E = \bigcap _{x\in \varSigma ^d} \ker (\mathbf {A}^x)\). This can naively be done by first computing \(\ker (\mathbf {A}^x)\) for each \(x\in \varSigma ^d\) and then computing a basis for the intersection of these linear subspaces, however the complexity of this approach is exponential in the dimension d. We show in the following proposition that for semi-simple GWM\(^\mathrm{c}\)s, one simply needs to compute a basis of the space \(\bigcap _{\sigma \in \varSigma } \ker (\mathbf {A}^\sigma )\), which can be done in polynomial time (provided that the field \(\mathbb {F}\) admits efficient symbolic arithmetic, e.g. \(\mathbb {F}=\mathbb {Q}\)).
Proposition 4
Let \(\mathcal {A}\subset \mathcal {M}_d(\mathbb {F})\) be the finite dimensional algebra generated by the set of matrices \(\{\mathbf {A}^\sigma \}_{\sigma \in \varSigma }\). Then if \(\mathcal {A}\) is semi-simple we have
Proof
For any integer \(i\ge 1\), let \(E_i = \bigcap _{x\in \varSigma ^i}\mathrm {ker}(\mathbf {A}^x)\). Recall from the proof of Lemma 2 that \(E_i\subset E_{i+1}\) for all i and that \(E_i=E_{i+1}\) implies \(E_i = E_{i+k}\) for any integer \(k\ge 0\), hence it will be sufficient to show that \(E_1=E_2\). One can check that each \(E_i\) is \(\mathcal {A}\)-invariant, i.e. each \(E_i\) is an \(\mathcal {A}\)-module. Since \(\mathcal {A}\) is semi-simple, any \(\mathcal {A}\)-module is semi-simple [17, Theorem 2.6.2], which implies that if M is an \(\mathcal {A}\)-module, every submodule U of M has a complement [17, Proposition 2.2.1], i.e. there exists an \(\mathcal {A}\)-module V such that \(M=U\oplus V\). Now since \(E_1\) is a submodule of the \(\mathcal {A}\)-module \(E_2\), \(E_1\) has a complement U in \(E_2\), i.e. U is \(\mathcal {A}\)-invariant and \(E_2 = E_1\oplus U\). Let \(\mathbf {v}\in U\). We show \(\mathbf {v}=\mathbf {0}\). Since \(\mathbf {v}\in E_2\), we have \(\mathbf {A}^{\sigma _1}\mathbf {A}^{\sigma _2}\mathbf {v}= \mathbf {0}\) for all \(\sigma _1,\sigma _2\in \varSigma \), hence \(\mathbf {A}^\sigma \mathbf {v}\in E_1\) for all \(\sigma \in \varSigma \). Moreover, we have \(\mathbf {A}^\sigma \mathbf {v}\in U\) for all \(\sigma \in \varSigma \) since U is \(\mathcal {A}\)-invariant. It follows that \(\mathbf {A}^\sigma \mathbf {v}\in E_1\cap U=\{\mathbf {0}\}\) and \(\mathbf {A}^\sigma \mathbf {v}= \mathbf {0}\) for all \(\sigma \in \varSigma \), hence \(\mathbf {v}\in E_1\) and since \(\mathbf {v}\in U\) we have \(\mathbf {v}=\mathbf {0}\). To conclude, we have \(U = \{\mathbf {0}\}\), hence \(E_1 = E_2\). \(\square \)
Since a GWM\(^\mathrm{c}\) can be transformed into an equivalent semi-simple GWM\(^\mathrm{c}\) in polynomial time (see Corollary 1 and the following discussion), the minimization of a GWM\(^\mathrm{c}\) defined over circular strings can be achieved in polynomial time by first converting it to a semi-simple GWM\(^\mathrm{c}\) and then applying Corollary 3 with Proposition 4. The overall minimization algorithm is summarized in Algorithm 1.
5 Conclusion
We proposed polynomial time algorithms to handle both the minimization and the equivalence problems for GWMs defined over circular strings. By doing so, we unraveled fundamental notions from algebra theory that will be central to the study of GWMs. In particular, the notion of semi-simple GWM\(^\mathrm{c}\) was paramount to our analysis. Intuitively, semi-simplicity can be thought of as a weak form of minimality: components from the radical do not contribute to the final computation of a GWM\(^\mathrm{c}\) (semi-simplification thus corresponds to annihilating these irrelevant components from the algebra, i.e. from the GWM\(^\mathrm{c}\)’s dynamics).
The next step is of course to try to extend the results obtained in this paper to GWMs defined over more general families of graphs. One promising direction we are currently investigating relies on extending the central notion of semi-simple GWM\(^\mathrm{c}\) to GWMs defined over arbitrary families of labeled graphs: by opening any edge e in a graph G one obtains a graph \(G_e\) with two free ports (i.e. edges having one end that is not connected to any vertex) which would be mapped by a d-dimensional GWM A to a matrix \(\mathbf {A}^{G_e}\in \mathcal {M}_d(\mathbb {F})\) (indeed, a GWM naturally maps any graph with k free ports to a kth order tensor; see [22, Sect. 2.2.3] for more details). For circular strings, opening an edge corresponds to choosing a particular position in the circular string leading to an actual string \(x\in \varSigma ^*\) which is mapped to \(\mathbf {A}^x\) by the GWM. For arbitrary labeled graphs, we have \(f_A(G)=\mathrm {Tr} (\mathbf {A}^{G_e})\) similarly to the case of circular strings. One can then consider the algebra \(\mathcal {A}\) generated by the matrices \(\mathbf {A}^{G_e}\) for any graph G in some family of graphs and any edge e in G, and define a semi-simple GWM as a GWM for which this algebra \(\mathcal {A}\) is semi-simple (note that one exactly recovers the notion of semi-simple GWM introduced here in the special case of circular strings). Hence, the fundamental results from algebra theory we leveraged in this paper should be directly relevant to the study of general GWMs. Beyond minimization, we intend to study the problem of approximate minimization (such as the ones considered in [7, 23] for string and tree weighted automata) along with the closely related problem of learning GWMs defined over richer families of graphs than the one of circular strings.
Notes
- 1.
Note that this is a not a definition per se but rather a consequence of the definition of general GWMs (as introduced in [2, 22]): when restricted to the family of circular strings, a GWM is given by a set of matrices and its computation can be succinctly expressed using the trace operator (whereas a general GWM is given by a set of tensors and its computation relies on partial traces).
- 2.
Observe that we exclude the empty string from the domain of \(f_A\). This is on purpose since \(f_A(\lambda )\) would be the dimension of A (using the convention \(\mathbf {A}^\lambda =\mathbf {I}\)): given two GWM\(^\mathrm{c}\)s of different dimensions computing the same function on \(\varSigma ^+\), we want to consider them as equivalent even though they disagree on \(\lambda \).
- 3.
Note that this is not a necessary condition: the function f defined on \(\{a,b\}^*\) by \(f(x)=1\) if \(x=a\) and 0 otherwise is WFA-recognizable but not GWM\(^\mathrm{c}\)-recognizable.
- 4.
- 5.
Note that this definition is specific to the finite-dimensional case; for general rings, there exist distinct non-equivalent definitions of radicals, which all agree with the one given here in the case of finite-dimensional algebras.
- 6.
This part of the proof is adapted from the proof of Proposition 3.1 in [18].
References
Bailly, R., Carreras Pérez, X., Luque, F.M., Quattoni, A.J.: Unsupervised spectral learning of WCFG as low-rank matrix completion. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 624–635. ACL (2013)
Bailly, R., Denis, F., Rabusseau, G.: Recognizable series on hypergraphs. In: Proceedings of the 9th International Conference on Language and Automata Theory and Applications, pp. 639–651 (2015)
Bailly, R., Denis, F., Ralaivola, L.: Grammatical inference as a principal component analysis problem. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 33–40. ACM (2009)
Bailly, R., Habrard, A., Denis, F.: A spectral approach for probabilistic grammatical inference on trees. In: Hutter, M., Stephan, F., Vovk, V., Zeugmann, T. (eds.) ALT 2010. LNCS (LNAI), vol. 6331, pp. 74–88. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-16108-7_10
Bailly, R., Rabusseau, G., Denis, F.: Recognizable series on graphs and hypergraphs. J. Comput. Syst. Sci. (2017, in press)
Balle, B., Carreras, X., Luque, F.M., Quattoni, A.: Spectral learning of weighted automata. Mach. Learn. 96(1–2), 33–63 (2014)
Balle, B., Panangaden, P., Precup, D.: A canonical form for weighted automata and applications to approximate minimization. In: 30th Annual Symposium on Logic in Computer Science, pp. 701–712. IEEE (2015)
Berstel, J., Reutenauer, C.: Recognizable formal power series on trees. Theor. Comput. Sci. 18(2), 115–148 (1982)
Berstel, J., Reutenauer, C.: Rational Series and Their Languages. Springer, Heidelberg (1988)
Boots, B., Siddiqi, S.M., Gordon, G.J.: Closing the learning-planning loop with predictive state representations. Int. J. Robot. Res. 30(7), 954–966 (2011)
Bozapalidis, S., Grammatikopoulou, A.: Recognizable picture series. J. Automata Lang. Comb. 10(2/3), 159–183 (2005)
Bremner, M.R.: How to compute the Wedderburn decomposition of a finite-dimensional associative algebra. Groups Complex. Cryptol. 3(1), 47–66 (2011)
Bre\(\check{\rm {s}}\)ar, M.: Introduction to Noncommutative Algebra. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08693-4
Cohen, S.B., Stratos, K., Collins, M., Foster, D.P., Ungar, L.H.: Spectral learning of latent-variable PCFGs: algorithms and sample complexity. J. Mach. Learn. Res. 15(1), 2399–2449 (2014)
de Graaf, W.A., Ivanyos, G., Küronya, K., Rónyai, L.: Computing Levi decompositions in lie algebras. Appl. Algebra Eng. Commun. Comput. 8(4), 291–303 (1997)
Droste, M., Kuich, W., Vogler, H. (eds.): Handbook of Weighted Automata. Springer Science & Business Media, Heidelberg (2009). https://doi.org/10.1007/978-3-642-01492-5
Drozd, Y.A., Kirichenko, V.V.: Finite Dimensional Algebras. Springer Science & Business Media, Heidelberg (2012). https://doi.org/10.1007/978-3-642-76244-4
Hladnik, M., Omladic, M., Radjavi, H.: Trace-preserving homomorphisms of semigroups. J. Funct. Anal. 204(2), 269–292 (2003)
Hsu, D.J., Kakade, S.M., Zhang, T.: A spectral algorithm for learning Hidden Markov Models. In: Proceedings of the Conference on Learning Theory (2009)
Lee, T., Na, J.C., Park, H., Park, K., Sim, J.S.: Finding consensus and optimal alignment of circular strings. Theoret. Comput. Sci. 468, 92–101 (2013)
Marusic, I., Worrell, J.: Complexity of equivalence and learning for multiplicity tree automata. J. Mach. Learn. Res. 16, 2465–2500 (2015)
Rabusseau, G.: A Tensor Perspective on Weighted Automata, Low-Rank Regression and Algebraic Mixtures. PhD thesis, Aix-Marseille Université (2016)
Rabusseau, G., Balle, B., Cohen, S.: Low-rank approximation of weighted tree automata. In: Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, pp. 839–847 (2016)
Rittaud, B., Vivier, L.: Circular words and applications. In: WORDS, pp. 31–36 (2011)
Sakarovitch, J.: Elements of Automata Theory. Cambridge University Press, Cambridge (2009)
Tzeng, W.-G.: On the definition of a family of automata. Inf. Control 4(2–3), 245–270 (1961)
Tzeng, W.-G.: A polynomial-time algorithm for the equivalence of probabilistic automata. SIAM J. Comput. 21(2), 216–227 (1992)
Acknowledgements
The author acknowledges support of an IVADO postdoctoral fellowship and would like to thank the reviewers for their helpful comments as well as Philip Amortila, François Denis, Clara Lacroce, Prakash Panangaden and Joelle Pineau for fruitful discussions.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2018 The Author(s)
About this paper
Cite this paper
Rabusseau, G. (2018). Minimization of Graph Weighted Models over Circular Strings. In: Baier, C., Dal Lago, U. (eds) Foundations of Software Science and Computation Structures. FoSSaCS 2018. Lecture Notes in Computer Science(), vol 10803. Springer, Cham. https://doi.org/10.1007/978-3-319-89366-2_28
Download citation
DOI: https://doi.org/10.1007/978-3-319-89366-2_28
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-89365-5
Online ISBN: 978-3-319-89366-2
eBook Packages: Computer ScienceComputer Science (R0)