Abstract
Data in the form of continuous vector functions on a given interval are referred to as multivariate functional data. These data are treated as realizations of multivariate random processes. The paper is devoted to three statistical dimension reduction techniques for such data. For the first, principal component analysis, the authors present a review of a recent paper (Jacques and Preda, Comput Stat Data Anal 71:92–106, 2014). For the other two, canonical variables and discriminant coordinates, the authors extend existing work for univariate functional data to the multivariate case. These methods are presented, illustrated and discussed in the context of analyzing real data sets, and each technique is applied to a real data set.
1 Introduction
In recent years, methods for representing data by functions or curves have received much attention. Such data are known in the literature as functional data (Bongiorno et al. 2014; Ferraty and Vieu 2006; Horváth and Kokoszka 2012; Ramsay and Silverman 2005). Examples of functional data can be found in various application domains, such as medicine, economics, meteorology and many others. In previous papers on functional data analysis, objects are characterized by only one feature observed at many time points. Methods of functional data analysis are becoming increasingly popular, e.g. in cluster analysis (Jacques and Preda 2013; James and Sugar 2003; Peng and Müller 2008), classification (Chamroukhi et al. 2013; Delaigle and Hall 2012; Mosler and Mozharovskyi 2015; Rossi and Villa 2006) and regression (Ferraty et al. 2012; Goia and Vieu 2014; Kudraszow and Vieu 2013; Peng et al. 2015; Rachdi and Vieu 2006; Wang et al. 2015). Unfortunately, multivariate data methods cannot be used directly for functional data, because of the problem of dimensionality and the difficulty of ordering functional data. In many applications there is a need for statistical methods for objects characterized by many features observed at many time points (doubly multivariate data); such data are called multivariate functional data. Pioneering theoretical works were Besse (1979) and Besse and Ramsay (1986), where random variables take values in a general Hilbert space. Saporta (1981) presents an analysis of multivariate functional data from the point of view of factorial methods (principal components and canonical analysis). Berrenderoa et al. (2011), Jacques and Preda (2014) and Panaretos et al. (2010) discussed principal component analysis for multivariate functional data (MFPCA). In this paper we extend this construction to other projective dimension reduction techniques, namely discriminant coordinates and canonical correlation analysis for multivariate functional data.
Dimension reduction is a very active field of statistical research. We focus only on projective dimension reduction techniques (Burges 2009): principal component analysis (PCA), canonical correlation analysis (CCA) and Fisher discriminant analysis (DCA). Each of these procedures is a transformation that yields a linear projection of the data, originally in \(\mathbf {R}^p\), onto \(\mathbf {R}^k\), where \(k < p\). Along with reducing the dimension, the data are also projected in a different orientation. Altogether, this transformation presents the data in a manner that emphasizes the trends in them, facilitating interpretation.
The rest of this paper is organized as follows. We first review the transformation of discrete data to multivariate functional data (Sect. 2). Section 3 contains a review of principal component analysis for multivariate functional data. Sections 4 and 5 contain our extensions of existing work for univariate functional data to the multivariate case, for discriminant coordinates and CCA respectively. Section 6 contains the results of our experiments on a real data set. Conclusions are given in Sect. 7.
2 Transformation of discrete data to multivariate functional data
Let X(t) be a stochastic process with continuous parameter \(t\in I\). Moreover, assume that \(X\in L_2(I)\), where \(L_2(I)\) is a Hilbert space of square integrable functions on the interval I and that the process X(t) has the following representation:
where \(\{\varphi _b\}\) are orthonormal basis functions, and \(c_0,c_1,\ldots ,c_B\) are the random coefficients.
Many financial, meteorological and other data are recorded at discrete moments in time. Let \(x_j\) denote the observed value of the process X(t) at the jth time point \(t_j\), where I is a compact set such that \(t_j \in I\) for \(j=1,...,J\). Then our data consist of J pairs \((t_{j},x_{j})\). These discrete data can be smoothed by a continuous function x(t), \(t \in I\) (Ramsay and Silverman 2005).
Let \(\pmb {x}=(x_1,x_2,\ldots ,x_{J})'\), \(\pmb {c}=(c_0,c_1,\ldots ,c_B)'\) and \(\pmb {\Phi }(t)\) be a matrix of dimension \(J \times (B+1)\) containing the values \(\varphi _b(t_j)\), \(b=0,1,...,B\), \(j=1,2,...,J\). The coefficient vector \(\pmb {c}\) in (1) is estimated by the least squares method, that is, so as to minimize the function:
Differentiating \(S(\pmb {c})\) with respect to the vector \(\pmb {c}\), we obtain the least squares method estimator
Then
The degree of smoothness of the function x(t) depends on the value B (a small value of B causes more smoothing of the curves). The optimum value for B is selected using the Bayesian information criterion (BIC):
We decided to use this criterion because the Akaike information criterion (AIC) better measures predictive accuracy, while BIC better measures goodness of fit (Berk 2008; Shmueli 2010; Sober 2002).
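To make the smoothing step concrete, the least-squares fit and the BIC-based choice of B can be sketched in Python with NumPy. The function names are ours, and since the BIC formula itself is given only by reference here, we assume a common variant, \(J\ln(\mathrm{RSS}/J)+(B+1)\ln J\):

```python
import numpy as np

def fourier_basis(t, B, T):
    """Values of an orthonormal Fourier basis on [0, T] at the points t.

    Returns a len(t) x (B + 1) matrix Phi with columns
    1/sqrt(T), sqrt(2/T) sin(2*pi*k*t/T), sqrt(2/T) cos(2*pi*k*t/T), ...
    """
    t = np.asarray(t, dtype=float)
    cols = [np.full_like(t, 1.0 / np.sqrt(T))]
    k = 1
    while len(cols) < B + 1:
        w = 2.0 * np.pi * k / T
        cols.append(np.sqrt(2.0 / T) * np.sin(w * t))
        if len(cols) < B + 1:
            cols.append(np.sqrt(2.0 / T) * np.cos(w * t))
        k += 1
    return np.column_stack(cols)

def fit_coefficients(t, x, B, T):
    """Least-squares estimate of c: minimizes ||x - Phi c||^2."""
    Phi = fourier_basis(t, B, T)
    c_hat, *_ = np.linalg.lstsq(Phi, x, rcond=None)
    return c_hat, Phi

def bic(t, x, B, T):
    """Assumed BIC variant: J * log(RSS / J) + (B + 1) * log(J)."""
    c_hat, Phi = fit_coefficients(t, x, B, T)
    resid = x - Phi @ c_hat
    J = len(x)
    return J * np.log(float(resid @ resid) / J) + (B + 1) * np.log(J)
```

In line with the remark above, a smaller B gives smoother curves; in practice one would evaluate `bic` over a grid of candidate values of B and keep the minimizer.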
Let us assume that there are n independent pairs of values \((t_{ij},x_{ij})\), \(j=1,...,J\), \(i=1,...,n\). These discrete data are smoothed to continuous functions in the following form:
From among \(B_1,B_2,...,B_n\), one common value of B is chosen as the modal value of the numbers \(B_1,B_2,...,B_n\).
The set of functions \(\left\{ x_1(t),..., x_n(t):t\in I\right\} \) obtained in this way is called functional data (see Ramsay and Silverman 2005). Note that in some cases it could be interesting to discretize smooth functions. This is at the center of variable selection methods in functional data analysis (Aneiros and Vieu 2014).
So far we have been dealing with data characterized by one feature. Our considerations can be generalized to the case of \(p\ge 2\) features. Then our data consist of n independent vector functions \(\pmb {x}_i(t)=(x_{i1}(t),x_{i2}(t),...,x_{ip}(t))'\), \(t\in I\), \(i=1,...,n\). The data \(\left\{ \pmb {x}_1(t),..., \pmb {x}_n(t):t\in I\right\} \) will be called multivariate functional data. Multivariate functional data can conveniently be treated as realizations of a p-dimensional stochastic process \(\pmb {X}(t)=(X_1(t),X_2(t),...,X_p(t))'\) with continuous parameter \(t \in I\). We will further assume that \(\pmb {X}\in L_2^p(I)\), where \(L_2(I)\) is a Hilbert space of square integrable functions on the interval I, equipped with the following inner product:
We consider the case, where the dth component of process \(\pmb {X}(t)\) can be represented by a finite number of orthonormal basis functions \(\{\varphi _b\}\)
where \(c_{db}\) are random variables. Let
where \(\pmb {\varphi }_{B_d}(t)=(\varphi _{0}(t),...,\varphi _{B_d}(t))'\), \(d=1,...,p\). Then
3 Principal component analysis for multivariate functional data
The idea of principal component analysis is to reduce the dimensionality of a data set consisting of a large number of correlated variables, while retaining as much as possible of the variation present in the data set. This is achieved by transforming to a new set of variables, the principal components, which are uncorrelated, and which are ordered so that the first few retain most of the variation present in all of the original variables. Suppose that we observe a p-dimensional random vector \(\pmb {X}=(X_1,X_2,...,X_p)' \in \mathbb {R}^p\). In the first step we look for a linear combination \(U_1 = u_{11} X_1 + u_{12} X_2 + ... + u_{1p} X_p = \pmb {u}_1'\pmb {X}\) of the elements of \(\pmb {X}\) having maximum variance. The variable \(U_1\) is called the first principal component. Next, we look for a linear combination \(U_2=\pmb {u}_2'\pmb {X}\), uncorrelated with \(U_1\), having maximum variance, and so on, so that at the kth stage a linear combination \(U_k=\pmb {u}_k'\pmb {X}\), called the kth principal component, is found that has maximum variance subject to being uncorrelated with the first \(k-1\) principal components (Jolliffe 2002). The observations can be presented graphically as points on the plane \((U_1,U_2)\). The functional version of PCA (FPCA) offers a more informative way of looking at the variability structure of the variance-covariance operator for one-dimensional functional data (Górecki and Krzyśko 2012). In this section we present PCA for multivariate functional data (Jacques and Preda 2014).
Without loss of generality we will further assume, that \({{\mathrm{E}}}(\pmb {X})=\pmb {0}\). In principal component analysis in the multivariate functional case, we are interested in finding the inner product
having maximal variance for all \(\pmb {u}\in L_2^p(I)\) such, that \({<}\pmb {u},\pmb {u}{>}=1\). Let
where \({<}\pmb {u}_1,\pmb {u}_1{>}\,=1\). The inner product \(U_1\,={<}\pmb {u}_1,\pmb {X}{>}\) will be called the first principal component, and the vector function \(\pmb {u}_1\) will be called the first vector weight function. Subsequently we look for the second principal component \(U_2={<}\pmb {u}_2,\pmb {X}{>}\), which maximizes \({{\mathrm{Var}}}({<}\pmb {u},\pmb {X}{>})\), is such that \({<}\pmb {u}_2,\pmb {u}_2{>}=1\), and is not correlated with the first functional principal component \(U_1\), i.e. is subject to the restriction \({<}\pmb {u}_1,\pmb {u}_2{>}=0\).
In general, the kth functional principal component \(U_k={<}\pmb {u}_k,\pmb {X}{>}\) satisfies the conditions:
where
The expression \((\lambda _k,\pmb {u}_k(t))\) will be called the kth principal system of the process \(\pmb {X}(t)\).
In Sect. 2 we have shown that the process \(\pmb {X}(t)\) can be represented as \(\pmb {X}(t)=\pmb {\Phi }(t) \pmb {c}\), \(t \in I\). Now let us consider the principal components of the random vector \(\pmb {c}\). From \({{\mathrm{E}}}(\pmb {X})=\pmb {0}\) we have \({{\mathrm{E}}}(\pmb {c})=\pmb {0}\). Let us denote \({{\mathrm{Var}}}(\pmb {c})=\pmb {\Sigma }\). The kth principal component \(U^{*}_k={<}\pmb {\omega }_k, \pmb {c}{>}\) of this vector satisfies the conditions:
where \(\kappa _1 ,\kappa _2=1,...,k\), \(K=B_1+...+B_p\). The expression \((\gamma _k,\pmb {\omega }_k)\) will be called the kth principal system of vector \(\pmb {c}\).
Determining the kth principal system of the vector \(\pmb {c}\) is equivalent to finding the eigenvalues and corresponding eigenvectors of the covariance matrix \(\pmb {\Sigma }\) of that vector, standardized so that \(\pmb {\omega }'_{\kappa _1}\pmb {\omega }_{\kappa _2}=\delta _{{\kappa _1}{\kappa _2}}\).
Theorem 1
The kth principal system \((\lambda _k,\pmb {u}_k(t))\) of the stochastic process \(\pmb {X}(t)\) is related to the kth principal system \((\gamma _k,\pmb {\omega }_k)\) of the random vector \(\pmb {c}\) by the equations:
where \(k=1,...,s\) and \(s={{\mathrm{rank}}}(\pmb {\Sigma })\).
Proof
It may be assumed (Ramsay and Silverman 2005) that the vector weight function \(\pmb {u}(t)\) and the process \(\pmb {X}(t)\) are in the same space, i.e. the function \(\pmb {u}(t)\) can be written in the form:
where \(\pmb {\omega } \in \mathbb {R}^{K+p}\). Then
where
Let us consider the first functional principal component of process \(\pmb {X}(t)\):
where \({<}\pmb {u}_1,\pmb {u}_1{>}=1\). This is equivalent to saying that
where \(\pmb {\omega }'_1\pmb {\omega }_1=1\).
This is the definition of the first principal component of the random vector \(\pmb {c}\). On the other hand, if we begin with the first principal system of the random vector \(\pmb {c}\) defined by \((\gamma _1,\pmb {\omega }_1)\), we will obtain the first principal system for the process \(\pmb {X}(t)\) from the equations
We may extend these considerations to the second principal system and so on. \(\square \)
Principal component analysis for the random vector \(\pmb {c}\) is based on the matrix \(\pmb {\Sigma }\). In practice this matrix is unknown. We estimate it on the basis of n independent realizations \(\pmb {x}_1(t),\pmb {x}_2(t),...,\pmb {x}_n(t)\) of the random process \(\pmb {X}(t)\), of the form \(\pmb {x}_i(t)=\pmb {\Phi }(t) \hat{\pmb {c}}_i\), where the vectors \(\hat{\pmb {c}}_i\) are centered, \(i=1,2,...,n\).
Let \(\hat{\pmb {C}}=(\hat{\pmb {c}}_1,\hat{\pmb {c}}_2,...,\hat{\pmb {c}}_n)'.\) Then
Let \(\hat{\gamma }_1\ge \hat{\gamma }_2\ge ...\ge \hat{\gamma }_s\) be non-zero eigenvalues of matrix \(\hat{\pmb {\Sigma }}\), and \(\hat{\pmb {\omega }}_1, \hat{\pmb {\omega }}_2,..., \hat{\pmb {\omega }}_s\) the corresponding eigenvectors, where \(s={{\mathrm{rank}}}(\hat{\pmb {\Sigma }})\).
Moreover the kth principal system of the random process \(\pmb {X}(t)\) determined from a sample has the following form:
Hence the coefficients of the projection of the ith realization \(\pmb {x}_{i}(t)\) of process \(\pmb {X}(t)\) on the kth functional principal component are equal to:
for \(i=1,2,...,n\), \(k=1,2,...,s\). Finally the coefficients of the projection of the ith realization \(\pmb {x}_{i}(t)\) of the process \(\pmb {X}(t)\) on the plane of the first two functional principal components from the sample are equal to \((\hat{\pmb {\omega }}'_1 \hat{\pmb {c}}_i, \hat{\pmb {\omega }}'_2 \hat{\pmb {c}}_i)\), \(i=1,2,...,n.\)
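In matrix terms, the whole sample procedure of this section reduces to an eigendecomposition. A minimal sketch in Python with NumPy, assuming \(\hat{\pmb {\Sigma }}\) is the sample covariance \(\hat{\pmb {C}}'\hat{\pmb {C}}/n\) of the centered coefficient vectors (the function name is ours):

```python
import numpy as np

def mfpca_scores(C_hat, n_components=2):
    """PCA of the basis-coefficient vectors, a sketch of the paper's MFPCA.

    C_hat : (n, K) matrix whose rows are the centered coefficient
            vectors c_hat_i of the n smoothed curves.
    Returns (eigenvalues, eigenvectors, scores), where
    scores[i, k] = omega_k' c_hat_i is the projection of the i-th
    realization on the k-th functional principal component.
    """
    n = C_hat.shape[0]
    Sigma_hat = C_hat.T @ C_hat / n           # sample covariance of c
    gamma, Omega = np.linalg.eigh(Sigma_hat)  # ascending eigenvalues
    order = np.argsort(gamma)[::-1]           # reorder descending
    gamma, Omega = gamma[order], Omega[:, order]
    scores = C_hat @ Omega[:, :n_components]
    return gamma[:n_components], Omega[:, :n_components], scores
```

The kth column of `scores` contains \(\hat{\pmb {\omega }}'_k \hat{\pmb {c}}_i\) for \(i=1,...,n\); plotting the first two columns gives the projection onto the plane of the first two functional principal components.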
4 Discriminant coordinates for multivariate functional data
Now let us consider the case where the samples originate from L groups. We would often like to present them graphically, to see their configuration or to eliminate outlying observations. However, it may be difficult to produce such a presentation even if only three features are observed. A different method must therefore be sought for presenting multidimensional data originating from multiple groups. To make the task easier, in the first step every p-dimensional observation \(\pmb {X}=(X_1,X_2,...,X_p)' \in \mathbb {R}^p\) can be transformed into a one-dimensional observation \(U_1 = u_{11} X_1 + u_{12} X_2 + ... + u_{1p} X_p = \pmb {u}_1'\pmb {X}\), and the resulting one-dimensional observations can be presented graphically as points on a straight line. In the second step we can define a second linear combination \(U_2=\pmb {u}_2'\pmb {X}\), not correlated with the first, and present the observations graphically as points on the plane \((U_1,U_2)\). The space of discriminant coordinates is convenient for the use of various classification methods (methods of discriminant analysis). When \(L=2\) we obtain only one discriminant coordinate, coinciding with the well-known Fisher linear discriminant function (Fisher 1936). The functional case of discriminant coordinate analysis (FDCA) and its kernel variant (KFDCA) are also well known (Górecki et al. 2014). In this section we propose FDCA for multivariate functional data (MFDCA). Let \(\pmb {x}_{l1}(t),\pmb {x}_{l2}(t),...,\pmb {x}_{ln_l}(t)\) be \(n_l\) independent realizations of a p-dimensional stochastic process \(\pmb {X}(t)\) belonging to the lth class, where \(l=1,2,\ldots ,L\). Our purpose is to construct discriminant coordinates based on multivariate functional data, i.e. to construct
such that their between-class variance is maximal compared with the total variance, where \(\pmb {u} \in L_2^p(I)\). The vector function \(\pmb {u}(t)=(u_1(t),u_2(t),...,u_p(t))'\) is called the vector weight function.
More precisely, the first functional discriminant coordinate \(U_1=\,{<}\pmb {u}_1,\pmb {X}{>}\) is defined as
subject to the constraint
where \({{\mathrm{Var}}}_B({<}\pmb {u},\pmb {X}{>})\) and \({{\mathrm{Var}}}_T({<}\pmb {u},\pmb {X}{>})\) are respectively the between-class and total variance of discriminant coordinate \(U_1\). Condition (4) ensures the uniqueness of the first discriminant coordinate \(U_1\).
Similarly we can construct the kth functional discriminant coordinate
where the vector weight function \(\pmb {u}_k(t)\) is defined as
subject to the constraint
Moreover the kth discriminant coordinate \(U_k\) is not correlated with the first \(k-1\) discriminant coordinates. The expression \((\lambda _k,\pmb {u}_k(t))\) will be called the kth discriminant system of the process \(\pmb {X}(t)\).
Let us recall that the process \(\pmb {X}(t)\) can be represented as \(\pmb {X}(t)=\pmb {\Phi }(t) \pmb {c}, t \in I\). Now let us consider the discriminant coordinates of the random vector \(\pmb {c}\). The kth discriminant coordinate \(U^{*}_k=\,{<}\pmb {\omega }_k, \pmb {c}{>}\) of this vector satisfies the condition:
subject to the restriction
Additionally the kth discriminant coordinate \(U^{*}_k\) is not correlated with the first \(k-1\) discriminant coordinates, i.e.
The expression \((\gamma _k,\pmb {\omega }_k)\) will be called the kth discriminant system of the random vector \(\pmb {c}\).
Theorem 2
The kth discriminant system \((\lambda _k,\pmb {u}_k(t))\) of the stochastic process \(\pmb {X}(t)\) is related to the kth discriminant system \((\gamma _k,\pmb {\omega }_k)\) of the random vector \(\pmb {c}\) by the equations:
where \(k=1,...,s\), \(s=\min (K+p,L-1)\).
Proof
We assume, that the vector weight function \(\pmb {u}(t)\) and the process \(\pmb {X}(t)\) are in the same space, i.e. the function \(\pmb {u}(t)\) can be written in the form:
where \(\pmb {\omega } \in \mathbb {R}^{K+p}\). Then
Hence the between-class variance of the inner product \({<} \pmb {u}, \pmb {X}{>}\) is
and the total variance
where \({{\mathrm{Var}}}_B(\pmb {c})\) and \({{\mathrm{Var}}}_T(\pmb {c})\) are respectively the matrices of sum of squares and products of between-class and total variance.
For the first functional discriminant coordinate of the process \(\pmb {X}(t)\) we have:
where \(\pmb {\omega }_1'{{\mathrm{Var}}}_T(c)\pmb {\omega }_1=1\). This is equivalent to
where \(\pmb {\omega }'_1\pmb {\omega }_1=1\).
This meets the definition of the first discriminant coordinate of the random vector \(\pmb {c}\). On the other hand, if the first discriminant system \((\gamma _1,\pmb {\omega }_1)\) defines the first discriminant coordinate of the random vector \(\pmb {c}\), we will obtain the first discriminant system for the process \(\pmb {X}(t)\) from the equations
We may extend these considerations to the second discriminant system and so on.
\(\square \)
The matrices \({{\mathrm{Var}}}_B(\pmb {c})\) and \({{\mathrm{Var}}}_T(\pmb {c})\) are unknown and must be estimated based on the sample. Let \(\pmb {x}_{l1}(t), \pmb {x}_{l2}(t),...,\pmb {x}_{ln_l}(t)\) be a sample belonging to the lth class, where \(l=1,2,\ldots ,L\). The function \(\pmb {x}_{li}(t)\) has the form
where \(\hat{\pmb {c}}_{li}=\left( \hat{c}_{10}^{(li)},...,\hat{c}_{1B_1}^{(li)},...,\hat{c}_{p0}^{(li)},...,\hat{c}_{pB_p}^{(li)}\right) '\), \(i=1,2,...,n_l\), \(l=1,2,...,L\). Let
Then
where \(n=\sum _{l=1}^Ln_l\). Next we find non-zero eigenvalues \(\hat{\gamma }_1\ge \hat{\gamma }_2\ge ...\ge \hat{\gamma }_s\) and the corresponding eigenvectors \(\hat{\pmb {\omega }}_1, \hat{\pmb {\omega }}_2,..., \hat{\pmb {\omega }}_s\) of matrix \(\hat{\pmb {T}}^{-1}\hat{\pmb {B}}\), where \(s=\min (K+p,L-1)\). Furthermore the kth discriminant system of the random process \(\pmb {X}(t)\) has the following form:
Hence the coefficients of the projection of the ith realization \(\pmb {x}_{li}(t)\) of process \(\pmb {X}(t)\) belonging to the lth class on the kth functional discriminant component are equal to:
for \(i=1,2,...,n_l\), \(k=1,2,...,s\), \(l=1,2,...,L\).
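Computationally, MFDCA thus reduces to an eigenproblem for \(\hat{\pmb {T}}^{-1}\hat{\pmb {B}}\) built from the coefficient vectors. A sketch in Python with NumPy, under the usual definitions of the between-class and total sum-of-squares-and-products matrices (the function name is ours, not the authors' code):

```python
import numpy as np

def mfdca(C_hat, labels, n_coords=2):
    """Discriminant coordinates of the basis-coefficient vectors,
    a sketch of the paper's MFDCA.

    C_hat  : (n, K) matrix whose rows are the coefficient vectors.
    labels : length-n array of class labels (L classes).
    Returns (eigenvalues, weight vectors, scores), where the weight
    vectors are eigenvectors of T^{-1} B and
    scores[i, k] = omega_k' c_hat_i.
    """
    labels = np.asarray(labels)
    grand = C_hat.mean(axis=0)
    Xc = C_hat - grand
    T = Xc.T @ Xc                         # total SSP matrix
    B = np.zeros_like(T)                  # between-class SSP matrix
    for l in np.unique(labels):
        idx = labels == l
        d = C_hat[idx].mean(axis=0) - grand
        B += idx.sum() * np.outer(d, d)
    gamma, Omega = np.linalg.eig(np.linalg.solve(T, B))
    order = np.argsort(gamma.real)[::-1]  # sort eigenvalues descending
    gamma = gamma.real[order]
    Omega = Omega.real[:, order]
    scores = C_hat @ Omega[:, :n_coords]
    return gamma[:n_coords], Omega[:, :n_coords], scores
```

As in the text, at most \(\min (K+p,L-1)\) eigenvalues are non-zero, since the between-class matrix has rank at most \(L-1\).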
5 Canonical correlation analysis for multivariate functional data
Suppose now that we observe two random vectors \(\pmb {Y}=(Y_1,Y_2,...,Y_p)' \in \mathbb {R}^p\) and \(\pmb {X}=(X_1,X_2,...,X_q)' \in \mathbb {R}^q\) and look for the relationship between them. This is one of the main problems of canonical correlation analysis. We search for weight vectors \(\pmb {u} \in \mathbb {R}^p\) and \(\pmb {v} \in \mathbb {R}^q\) such that the linear combinations \(U_1 = u_{11} Y_1 + u_{12} Y_2 + ... + u_{1p} Y_p = \pmb {u}_1'\pmb {Y}\) and \(V_1 = v_{11} X_1 + v_{12} X_2 + ... + v_{1q} X_q = \pmb {v}_1'\pmb {X}\), called the first pair of canonical variables, are maximally correlated.
Canonical correlation analysis has been extended to the case of multivariate time series (Brillinger 2001), under the assumption of stationarity, and an extension of canonical correlation to functional data was proposed in Leurgans et al. (1993), where the need for regularization was pointed out. He et al. (2000) showed that random processes with a finite basis expansion have simple canonical structures, analogously to the case of random vectors. This motivates implementing regularization by projecting the random processes onto a finite number of basis functions. The idea of projecting processes onto a finite number of basis functions was discussed in He et al. (2004), where the projection is onto a prespecified orthonormal basis. In this section we consider canonical correlations for multivariate functional data. The proposed method is a generalization of the method presented in He et al. (2004). Another generalization is presented by Dubin and Müller (2005).
Let \(\pmb {Y}(t)\) and \(\pmb {X}(t)\) be stochastic processes. We will further assume that \(\pmb {Y}\in L_2^p(I_1)\), \(\pmb {X}\in L_2^q(I_2)\) and that each component \(Y_g(t)\) of the process \(\pmb {Y}(t)\) and \(X_h(t)\) of the process \(\pmb {X}(t)\) can be represented by a finite number of orthonormal basis functions \(\{\varphi _e\}\) and \(\{\varphi _f\}\) respectively:
Moreover let \({{\mathrm{E}}}(\pmb {Y})=\pmb {0}\), \({{\mathrm{E}}}(\pmb {X})=\pmb {0}\). This fact does not cause loss of generality, because functional canonical variables are calculated based on the covariance functions of processes \(\pmb {Y}(t)\) and \(\pmb {X}(t)\).
We introduce the following notation:
where \(\pmb {\varphi }_{E_1},...,\pmb {\varphi }_{E_p}\) and \(\pmb {\varphi }_{F_1},...,\pmb {\varphi }_{F_q}\) are orthonormal basis functions of space \(L_2(I_1)\) and \(L_2(I_2)\), respectively, and \(K_1 = E_1+E_2+...+E_p, K_2 = F_1+F_2+...+F_q\). Using the above matrix notation the processes \(\pmb {Y}(t)\) and \(\pmb {X}(t)\) can be represented as:
Functional canonical variables U and V for stochastic processes \(\pmb {Y}(t)\) and \(\pmb {X}(t)\) are defined as follows
where the vector functions \(\pmb {u}(t)\) and \(\pmb {v}(t)\) are called the vector weight functions. The weight functions \(\pmb {u}(t)\) and \(\pmb {v}(t)\) are chosen to maximize the coefficients
subject to the constraint that
The coefficient \(\rho \) is called the canonical correlation coefficient. However, simply carrying out this maximization does not produce a meaningful result: the correlation \(\rho \) achieved by the functions \(\pmb {u}(t)\) and \(\pmb {v}(t)\) is equal to 1, and the canonical weight functions \(\pmb {u}(t)\) and \(\pmb {v}(t)\) do not give any meaningful information about the data. This clearly demonstrates the need for a technique involving smoothing. A straightforward way of introducing smoothing is to modify the constraints (5) by adding roughness penalty terms, to give (Ramsay and Silverman 2005):
where the roughness function \({{\mathrm{PEN}}}_2\) is the integrated squared second derivative
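For an orthonormal Fourier basis the penalty has a closed form at the coefficient level: a sin or cos term of frequency \(w_k = 2\pi k/T\) satisfies \(\varphi ''=-w_k^2\varphi \), so \(\int \varphi _a''(t)\varphi _b''(t)\,dt=w_a^2 w_b^2\delta _{ab}\) and the penalty matrix is diagonal. A small Python sketch under that assumption (the function name is ours):

```python
import numpy as np

def fourier_penalty_matrix(B, T):
    """Roughness penalty matrix R with R[a, b] = int phi_a'' phi_b'' dt
    for an orthonormal Fourier basis on [0, T].  The constant term
    contributes 0, and each sin/cos term of frequency k contributes
    (2*pi*k/T)**4 on the diagonal; off-diagonal entries vanish.
    """
    diag = [0.0]                # second derivative of the constant is 0
    k = 1
    while len(diag) < B + 1:
        w = 2.0 * np.pi * k / T
        diag.append(w ** 4)     # sin term of frequency k
        if len(diag) < B + 1:
            diag.append(w ** 4) # cos term of frequency k
        k += 1
    return np.diag(diag)
```

With this matrix, \({{\mathrm{PEN}}}_2(u)=\pmb {c}'\pmb {R}\pmb {c}\) for \(u(t)=\sum _b c_b\varphi _b(t)\), so the penalized constraints can be imposed directly on the coefficient vectors.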
Assuming that the vector weight function \(\pmb {u}(t)\) and the process \(\pmb {Y}(t)\) are in the same space, i.e. that the function \(\pmb {u}(t)\) can be written in the form:
we have
where
Similarly assuming that
we can obtain
where
Now the first functional canonical correlation \(\rho _1\) and corresponding vector weight functions \(\pmb {u}_1(t)\) and \(\pmb {v}_1(t)\) are defined as
subject to the constraint that
In general, the kth functional canonical correlation \(\rho _k\) and the associated vector weight functions \(\pmb {u}_k(t)\) and \(\pmb {v}_k(t)\) are defined as follows:
where \(\pmb {u}_k(t)\) and \(\pmb {v}_k(t)\) are subject to the restrictions (6) and (7), and the kth pair of canonical variables \((U_k,V_k)\) is not correlated with the first \(k-1\) canonical variables, where
are canonical variables. We refer to this procedure as smoothed canonical correlation analysis. The expression \((\rho _k,\pmb {u}_k(t),\pmb {v}_k(t))\) will be called the kth canonical system of the pair of processes \(\pmb {Y}(t)\) and \(\pmb {X}(t)\).
Let
Let us consider the canonical variables \(U^{*}={<}\pmb {\omega }, \pmb {\alpha }{>}\) and \(V^{*}={<}\pmb {\nu }, \pmb {\beta }{>}\) of random vectors \(\pmb {\alpha }\) and \(\pmb {\beta }\) respectively. The kth canonical correlation \(\gamma _k\) and associated vector weights \(\pmb {\omega }_k\) and \(\pmb {\nu }_k\) are defined as
subject to the restriction
where \(\pmb {R}_1\) and \(\pmb {R}_2\) are given by (8) and (9) respectively, and the kth canonical variables \((U^{*}_k, V^{*}_k)\) are not correlated with the first \(k-1\) canonical variables. The expression \((\gamma _k,\pmb {\omega }_k, \pmb {\nu }_k)\) will be called the kth canonical system of the random vectors \(\pmb {\alpha }\) and \(\pmb {\beta }\).
Theorem 3
The kth canonical system \((\rho _k,\pmb {u_k}(t),\pmb {v_k}(t))\) of the pair of random processes \(\pmb {Y}(t)\) and \(\pmb {X}(t)\) is related to the kth canonical system \((\gamma _k,\pmb {\omega }_k,\pmb {\nu }_k)\) of the pair of the random vectors \(\pmb {\alpha }\) and \(\pmb {\beta }\) by the equations:
where \(1\le k \le \min (K_1+p,K_2+q)\), \(K_1=E_1+...+E_p\), \(K_2=F_1+...+F_q\).
Proof
Without loss of generality we may assume that the covariance matrices \(\pmb {\Sigma }_{11}\) and \(\pmb {\Sigma }_{22}\) are of full column rank. As in the proof of Theorem 1, it may be assumed that the vector weight function \(\pmb {u}(t)\) and the process \(\pmb {Y}(t)\) are in the same space, i.e. the function \(\pmb {u}(t)\) can be written in the form:
where \(\pmb {\omega } \in \mathbb {R}^{K_1+p}\). Then
Similarly for \(\pmb {v}\in L_2^q(I_2)\)
where \(\pmb {\nu } \in \mathbb {R}^{K_2+q}\). Hence
Let us consider the first canonical correlation between the processes \(\pmb {Y}(t)\) and \(\pmb {X}(t)\):
where \(\pmb {u}(t)\) and \(\pmb {v}(t)\) are subject to the restrictions (6) and (7). This is equivalent to saying that
subject to the restriction
where \(\pmb {R}_1\) and \(\pmb {R}_2\) are given by (8) and (9) respectively. This is the definition of the first canonical correlation between the random vectors \(\pmb {\alpha }\) and \(\pmb {\beta }\).
On the other hand, if we begin with the first canonical system \((\gamma _1,\pmb {\omega }_1,\pmb {\nu }_1)\) of the pair of random vectors \(\pmb {\alpha }\) and \(\pmb {\beta }\), we will obtain the first canonical system for the processes \(\pmb {Y}(t)\) and \(\pmb {X}(t)\) from the equation
We may extend these considerations to the second canonical system and so on. \(\square \)
Canonical correlation analysis for the random vectors \(\pmb {\alpha }\) and \(\pmb {\beta }\) is based on the matrices \(\pmb {\Sigma }_{11}, \pmb {\Sigma }_{22}\) and \(\pmb {\Sigma }_{12}\), which are unknown. We estimate them on the basis of n independent realizations \(\pmb {y}_1(t),\pmb {y}_2(t),...,\pmb {y}_n(t)\) of the form \(\pmb {y}_i(t)=\pmb {\Phi }_1(t) \hat{\pmb {\alpha }}_i\) of the random process \(\pmb {Y}(t)\), and \(\pmb {x}_1(t),\pmb {x}_2(t),...,\pmb {x}_n(t)\) of the form \(\pmb {x}_i(t)=\pmb {\Phi }_2(t) \hat{\pmb {\beta }}_i\) of the random process \(\pmb {X}(t)\), \(i=1,2,...,n\), where
Let
Finally the estimators of the matrices \(\pmb {\Sigma }_{11},\pmb {\Sigma }_{22}\) and \(\pmb {\Sigma }_{12}\) have the form:
Let \(\hat{\pmb {C}} = \hat{\pmb {\Sigma }}_{11}^{-1} \hat{\pmb {\Sigma }}_{12}\) and \(\hat{\pmb {D}} = \hat{\pmb {\Sigma }}_{22}^{-1} \hat{\pmb {\Sigma }}_{21}\), where \(\hat{\pmb {\Sigma }}_{21}=\hat{\pmb {\Sigma }}_{12}'\). The matrices \(\hat{\pmb {C}}\hat{\pmb {D}}\) and \(\hat{\pmb {D}}\hat{\pmb {C}}\) have the same nonzero eigenvalues \(\hat{\gamma }_k^2\), and their corresponding eigenvectors \(\hat{\pmb {\omega }}_k\) and \(\hat{\pmb {\nu }}_k\) are given by the equations:
\(1\le k \le \min (K_1+p,K_2+q)\).
Hence the coefficients of the projection of the ith realization \(\pmb {y}_i(t)\) of process \(\pmb {Y}(t)\) on the kth functional canonical variable are equal to
Analogously, the coefficients of the projection of the ith realization \(\pmb {x}_{i}(t)\) of the process \(\pmb {X}(t)\) on the kth functional canonical variable are equal to
where \(i=1,2,\ldots ,n\), \(k=1,\ldots ,\min (K_1+p,K_2+q)\).
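The sample computation of this section can be sketched in Python with NumPy: form \(\hat{\pmb {C}}=\hat{\pmb {\Sigma }}_{11}^{-1}\hat{\pmb {\Sigma }}_{12}\) and \(\hat{\pmb {D}}=\hat{\pmb {\Sigma }}_{22}^{-1}\hat{\pmb {\Sigma }}_{21}\), take the eigenvalues \(\hat{\gamma }_k^2\) of \(\hat{\pmb {C}}\hat{\pmb {D}}\), and project. The sketch omits the smoothing penalty, and the function and variable names are ours:

```python
import numpy as np

def functional_cca(Alpha, Beta, n_pairs=1):
    """Sample canonical correlations of two sets of coefficient vectors,
    a sketch of the multivariate functional CCA above.

    Alpha : (n, K1) rows are centered coefficient vectors alpha_hat_i.
    Beta  : (n, K2) rows are centered coefficient vectors beta_hat_i.
    Returns (canonical correlations, U scores, V scores).
    """
    n = Alpha.shape[0]
    S11 = Alpha.T @ Alpha / n
    S22 = Beta.T @ Beta / n
    S12 = Alpha.T @ Beta / n
    C = np.linalg.solve(S11, S12)         # Sigma11^{-1} Sigma12
    D = np.linalg.solve(S22, S12.T)       # Sigma22^{-1} Sigma21
    gamma2, Omega = np.linalg.eig(C @ D)  # eigenvalues are gamma_k^2
    order = np.argsort(gamma2.real)[::-1]
    gamma = np.sqrt(np.clip(gamma2.real[order], 0.0, None))
    Omega = Omega.real[:, order]
    U = Alpha @ Omega[:, :n_pairs]        # projections on the U_k
    Nu = D @ Omega[:, :n_pairs]           # nu_k proportional to D omega_k
    V = Beta @ Nu
    return gamma[:n_pairs], U, V
```

The empirical correlation between the kth columns of `U` and `V` equals \(\hat{\gamma }_k\), which is a convenient sanity check.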
6 Example
The following data (Fig. 1) come from the online database of the World Bank (http://data.worldbank.org/). For the analysis, fifty-four countries of the world were chosen (\(n=54\)). The data were recorded in the years 1972–2009 (\(J = 38\)). Each country belongs to one of four classes (\(L=4\)):
1. Low-income economies (GDP $1,025 or less), \(n_1=3\)
2. Lower-middle-income economies (GDP $1,026 to $4,035), \(n_2=19\)
3. Upper-middle-income economies (GDP $4,036 to $12,475), \(n_3=14\)
4. High-income economies (GDP $12,476 or more), \(n_4=18\)
and each country was characterized by four variables:
1. \(X_1\): GDP growth (annual %)—Annual percentage growth rate of GDP at market prices based on constant local currency. Aggregates are based on constant 2000 U.S. dollars. GDP is the sum of gross value added by all resident producers in the economy plus any product taxes and minus any subsidies not included in the value of the products. It is calculated without making deductions for depreciation of fabricated assets or for depletion and degradation of natural resources.
2. \(X_2\): Energy use (rate of growth in kg of oil equivalent per capita)—Energy use refers to use of primary energy before transformation to other end-use fuels, which is equal to indigenous production plus imports and stock changes, minus exports and fuels supplied to ships and aircraft engaged in international transport.
3. \(X_3\): CO\(_2\) emissions (rate of growth in kt)—Carbon dioxide emissions are those stemming from the burning of fossil fuels and the manufacture of cement. They include carbon dioxide produced during consumption of solid, liquid, and gas fuels and gas flaring.
4. \(X_4\): Population in urban agglomerations of more than 1 million (% of total population)—Population in urban agglomerations of more than one million is the percentage of a country’s population living in metropolitan areas that in 2000 had a population of more than one million people.
The data were transformed to functional data by the method described in Sect. 2. The calculations were performed using the Fourier basis system, which is a typical choice; others, such as splines, polynomials or wavelets, can also be used. The optimum values of B, selected using the BIC criterion, for \(X_1, X_2, X_3\) and \(X_4\) are 2, 2, 2 and 6 respectively. The time interval \([0,T]=[0,38]\) was divided into moments of time in the following way: \(t_1=0.5\) (1972), \(t_2=1.5\) (1973), ..., \(t_{38}=37.5\) (2009).
We used the R package fda (Ramsay et al. 2009) to create the Fourier basis system and to convert the raw data into functional objects. The other procedures were implemented by us (Table 1).
6.1 Multivariate functional principal component analysis (MFPCA)
The statistical objects in the functional principal component analysis are 54 countries (\(n=54\)) characterized by four (\(p=4\)) functional variables \(\pmb {x}_i(t)=(x_{i1}(t),x_{i2}(t),x_{i3}(t),x_{i4}(t))'\), \(t \in [0,38]\), \(i=1,2,...,54\). No account is taken of an object's membership of one of the four defined groups of countries. The vector functions \(\pmb {x}_1(t),\pmb {x}_2(t),...,\pmb {x}_{54}(t)\) have the form
where \(\pmb {\Phi }(t)\) is a matrix with the form of (3), and the vector \(\hat{\pmb {c}}_i\) has the form
where \(B_{1}=B_{2}=B_{3}=2, B_{4}=6, i=1,2,...,54\). In the first step, from the vectors \(\hat{\pmb {c}}_1,\hat{\pmb {c}}_2,...,\hat{\pmb {c}}_{54}\) we build the matrix \(\hat{\pmb {\Sigma }}\), and next we find its eigenvalues \(\hat{\gamma }_k\) and the corresponding vectors \(\hat{\pmb {\omega }}_k\). The ratios of the particular eigenvalues to the sum of all eigenvalues, expressed as percentages, are shown in Fig. 2. It can be seen from Fig. 2 that \(94.8\,\%\) of the total variation is accounted for by the first functional principal component. In the second step we form the vector weight functions
where \(k=1,...,16\), and the corresponding functional principal components in the form
The graphs of the four components of the vector weight functions for the first and second functional principal components appear in Fig. 3. The values of the coefficients of the vector weight functions corresponding to the first and second functional principal components are given in Table 2. At a given time point t, the greater the absolute value of a component of the vector weight function, the greater the contribution of the corresponding process to the structure of the given functional principal component. From Fig. 3 (left) it can be seen that the greatest contribution to the structure of the first functional principal component comes from process \(X_4(t)\), and this holds for all of the observation years considered. Figure 3 (right) shows that, on specified time intervals, the greatest contribution to the structure of the second functional principal component comes alternately from the processes \(X_2(t)\) and \(X_1(t)\). The total contribution of a particular original process \(X_i(t)\) to the structure of a given functional principal component is equal to the area under the modulus of the weight function corresponding to this process. These contributions for the four components of the vector process \(\pmb {X}(t)\), and the first and second functional principal components, are given in Table 2. The relative positions of the 54 countries in the system of the first two functional principal components are shown in Fig. 4. The system of the first two functional principal components retains \(96.3\,\%\) of the total variation. From Fig. 4 we see that the 54 countries form a relatively homogeneous group, with the exception of Singapore (SGP), Korea Rep. (KOR) and China (CHN).
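The MFPCA steps above (covariance matrix of the basis coefficients, its eigendecomposition, percentages of explained variation, and country scores) can be sketched numerically as follows; the coefficient matrix here is a random stand-in for the fitted vectors \(\hat{\pmb{c}}_i\), not the paper's data.

```python
import numpy as np

# Random stand-in for the matrix of fitted coefficient vectors c_i
# (54 countries, 2 + 2 + 2 + 6 = 12 basis coefficients each).
rng = np.random.default_rng(1)
C = rng.standard_normal((54, 12))

# Covariance matrix of the coefficients and its eigendecomposition.
Cc = C - C.mean(axis=0)
Sigma = Cc.T @ Cc / (C.shape[0] - 1)
eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]               # sort eigenvalues descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Percentage of total variation carried by each principal component.
explained = 100 * eigvals / eigvals.sum()

# Country scores on the first two functional principal components
# (analogous to the coordinates plotted in Fig. 4).
scores = Cc @ eigvecs[:, :2]
```

The eigenvectors \(\hat{\pmb{\omega}}_k\) are then combined with the basis matrix \(\pmb{\Phi}(t)\) to yield the vector weight functions.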
6.2 Multivariate functional discriminant coordinates (MFDCA)
In the construction of functional discriminant coordinates, by contrast with the construction of functional principal components, we take account additionally of the information concerning the division of the 54 countries into four disjoint groups (\(L=4\)). From the vectors \(\hat{\pmb {c}}_i\) we build the estimator \(\hat{\pmb {B}}\) of the matrix of between-class variation, and the estimator \(\hat{\pmb {T}}\) of the matrix of total variation, and then we find the non-zero eigenvalues \(\hat{\gamma }_k\) of the matrix \(\hat{\pmb {T}}^{-1}\hat{\pmb {B}}\) and the corresponding vectors \(\hat{\pmb {\omega }}_k, k=1,2,3\). The ratios of particular eigenvalues to the sum of all eigenvalues, expressed as percentages, are shown in Fig. 5. It can be seen from Fig. 5 that \(74.7\,\%\) of the total variation is accounted for by the first functional discriminant coordinate. In the second step we form the vector weight functions
where \(k=1,2,3\), and the corresponding functional discriminant coordinates in the form
The graphs of the four components of the vector weight function for the first and second functional discriminant coordinates appear in Fig. 6. The values of the coefficients of the vector weight functions corresponding to the first and second functional discriminant coordinates are given in Table 3. At a given time point t, the greater the absolute value of a component of the vector weight function, the greater the contribution of the corresponding process to the structure of the given functional discriminant coordinate. Figure 6 (left) shows that the greatest contribution to the structure of the first and second functional discriminant coordinates comes from process \(X_4(t)\), and this holds for all of the observation years considered. As in the case of the functional principal components, the total contribution of a particular original process \(X_i(t)\) to the structure of a particular functional discriminant coordinate can be estimated using the area under the modulus of the weight function corresponding to this process. These contributions for the four components of the vector process \(\pmb {X}(t)\) and the first and second functional discriminant coordinates are given in Table 3. The relative positions of the 54 countries in the system of the first two functional discriminant coordinates are shown in Fig. 7. The system of the first two functional discriminant coordinates retains \(93.9\,\%\) of the total variation. Compared with the projection onto the first two functional principal components, the division into four groups is more clearly visible here. The Democratic Republic of the Congo (COD) is clearly different from the other countries. In the group of developed countries, Finland (FIN) is clearly different from the other countries.
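The discriminant-coordinate computation (between-class and total scatter of the coefficient vectors, eigenproblem for \(\hat{\pmb{T}}^{-1}\hat{\pmb{B}}\)) can be sketched as follows; both the coefficient matrix and the group labels are random stand-ins for the real basis coefficients and country classification.

```python
import numpy as np

# Stand-ins for the fitted coefficient vectors c_i and the division of the
# 54 countries into L = 4 groups.
rng = np.random.default_rng(2)
C = rng.standard_normal((54, 12))
labels = np.arange(54) % 4

grand = C.mean(axis=0)
T = (C - grand).T @ (C - grand)            # total scatter matrix
B = np.zeros_like(T)                       # between-class scatter matrix
for g in range(4):
    Cg = C[labels == g]
    d = (Cg.mean(axis=0) - grand)[:, None]
    B += len(Cg) * (d @ d.T)

# Discriminant directions: eigenvectors of T^{-1} B; at most L - 1 = 3
# eigenvalues are nonzero.
eigvals, eigvecs = np.linalg.eig(np.linalg.solve(T, B))
order = np.argsort(eigvals.real)[::-1]
eigvals, eigvecs = eigvals.real[order], eigvecs.real[:, order]
coords = (C - grand) @ eigvecs[:, :2]      # first two discriminant coordinates
```

The at-most-\(L-1\) nonzero eigenvalues mirror the \(k=1,2,3\) coordinates found in the paper's example.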
6.3 Multivariate functional canonical analysis (MFCCA)
In the construction of functional canonical variables we do not take account of the division of the 54 countries into four groups, and we divide the four-dimensional stochastic process into two parts: \(\pmb {Y}(t)=(X_2(t),X_3(t))'\) and \(\pmb {X}(t)=(X_1(t),X_4(t))'\). In our case \(p=q=2\). We are interested in the relationship between the processes \(\pmb {Y}(t)\) and \(\pmb {X}(t)\). We build estimators of the matrices \(\pmb {\Sigma }_{11},\pmb {\Sigma }_{22}\) and \(\pmb {\Sigma }_{12}\), and we then find the non-zero eigenvalues \(\hat{\gamma }^2_k\) and corresponding vectors \(\hat{\pmb {\omega }}_k\) of the matrix \(\hat{\pmb {C}}\hat{\pmb {D}}\), and the eigenvalues \(\hat{\gamma }^2_k\) and corresponding vectors \(\hat{\pmb {\nu }}_k\) of the matrix \(\hat{\pmb {D}}\hat{\pmb {C}}\), where \(\hat{\pmb {C}} = \hat{\pmb {\Sigma }}_{11}^{-1} \hat{\pmb {\Sigma }}_{12}\) and \(\hat{\pmb {D}} = \hat{\pmb {\Sigma }}_{22}^{-1} \hat{\pmb {\Sigma }}_{21}, \hat{\pmb {\Sigma }}_{21}=\hat{\pmb {\Sigma }}_{12}', k=1,...,6\). The eigenvalues \(\hat{\gamma }_k\), called canonical correlations, are shown in Fig. 8. In the second step we form vector weight functions
corresponding to the processes \(\pmb {Y}(t)\) and \(\pmb {X}(t)\), where \(k=1,...,6\). Corresponding to these functions are the functional canonical variables in the form
corresponding to the processes \(\pmb {Y}(t)\) and \(\pmb {X}(t)\). The graphs of the two components of the vector weight function for the first and second functional canonical variables of the processes \(\pmb {Y}(t)\) and \(\pmb {X}(t)\) are shown in Figs. 9, 10. Table 4 contains the values of the coefficients of the vector weight functions, together with the total contribution from each process in the structure of the corresponding functional canonical variable. The relative positions of the 54 countries in the systems \((\hat{U}_1,\hat{V}_1)\) of functional canonical variables are shown in Fig. 11. The strong correlation (\(\rho _1=0.951\)) between the processes \(\pmb {X}(t)\) and \(\pmb {Y}(t)\) means that in the system of canonical variables \((\hat{U}_1,\hat{V}_1)\) points representing individual countries are almost on a straight line. In terms of the correlation between the processes \(\pmb {X}(t)\) and \(\pmb {Y}(t)\), 54 countries form a relatively homogeneous group with the exception of Singapore (SGP) and Saudi Arabia (SAU).
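The canonical-correlation computation (block covariance matrices, the matrices \(\hat{\pmb{C}}\hat{\pmb{D}}\) and \(\hat{\pmb{D}}\hat{\pmb{C}}\), and their eigenvalues) can be sketched as follows; the two coefficient matrices are random stand-ins for the basis coefficients of the blocks \(\pmb{Y}(t)=(X_2,X_3)'\) and \(\pmb{X}(t)=(X_1,X_4)'\).

```python
import numpy as np

# Stand-ins: Y block has 2 + 2 = 4 coefficients per country, X block has
# 2 + 6 = 8 (matching the B values selected in the paper's example).
rng = np.random.default_rng(3)
Y = rng.standard_normal((54, 4))
X = rng.standard_normal((54, 8))

Yc, Xc = Y - Y.mean(axis=0), X - X.mean(axis=0)
S11 = Yc.T @ Yc / 53                   # Sigma_11 estimate
S22 = Xc.T @ Xc / 53                   # Sigma_22 estimate
S12 = Yc.T @ Xc / 53                   # Sigma_12 estimate

C = np.linalg.solve(S11, S12)          # C = Sigma_11^{-1} Sigma_12
D = np.linalg.solve(S22, S12.T)        # D = Sigma_22^{-1} Sigma_21
gamma2, W = np.linalg.eig(C @ D)       # eigenvalues = squared canonical corr.
order = np.argsort(gamma2.real)[::-1]
rho = np.sqrt(np.clip(gamma2.real[order], 0.0, 1.0))  # canonical correlations
U1 = Yc @ W.real[:, order][:, 0]       # first canonical variable, Y block
```

The eigenvectors of \(\hat{\pmb{D}}\hat{\pmb{C}}\) give the weight vectors \(\hat{\pmb{\nu}}_k\) for the X block in the same way, and the pair \((\hat{U}_1,\hat{V}_1)\) yields the scatter of Fig. 11.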
7 Conclusions and future work
This paper introduces and analyzes new methods of constructing canonical variables and discriminant coordinates for multivariate functional data. In addition, we reviewed principal component analysis for such data (Jacques and Preda 2014). FDA is an important tool for exploratory data analysis. A primary advantage is the ability to assess continuous data without reducing the signal to discrete variables. By representing each curve as a function, it is possible to use functional analogues of classical methods. Functional methods (1) allow more complex dynamics than classical methods; (2) use nonparametric smoothing to reduce observational error; and (3) alleviate the inverse and multicollinearity problems caused by the “curse of dimensionality”.
The proposed methods were applied to geographic-economic multivariate time series. Our research has shown, on this example, that multivariate projective dimension reduction techniques give good results and provide an attractive, flexible way to analyse such data. Of course, the performance of the algorithms needs to be further evaluated on additional real and artificial data sets.
In a similar way, we would like to extend manifold dimension reduction techniques such as multidimensional scaling (Borg and Groenen 2005), isometric feature mapping (Tenenbaum et al. 2000) or maximum variance unfolding (Weiss 1999) from univariate functional data to the multivariate case. This is the direction of our future research.
References
Aneiros G, Vieu P (2014) Variable selection in infinite-dimensional problems. Stat Probab Lett 94:12–20
Berk RA (2008) Statistical learning from a regression perspective. Springer, New York
Berrenderoa JR, Justela A, Svarcb M (2011) Principal components for multivariate functional data. Comput Stat Data Anal 55:2619–2634
Besse P (1979) Étude descriptive d’un processus: approximation et interpolation. Ph.D. thesis, Université Paul Sabatier (Toulouse)
Besse P, Ramsay JO (1986) Principal components analysis of sampled functions. Psychometrika 51(2):285–311
Bongiorno EG, Goia A, Salinelli E (2014) Contributions in infinite-dimensional statistics and related topics. Societa Editrice Esculapio, Bologna
Borg I, Groenen P (2005) Modern multidimensional scaling: theory and applications. Springer, New York
Brillinger DR (2001) Time series: data analysis and theory. Society for Industrial and Applied Mathematics, Philadelphia
Burges CJC (2009) Dimension reduction: a guided tour. Found Trends Mach Learn 2(4):275–365
Chamroukhi F, Glotin H, Samé A (2013) Model-based functional mixture discriminant analysis with hidden process regression for curve classification. Neurocomputing 18:153–163
Delaigle A, Hall P (2012) Achieving near perfect classification for functional data. J R Stat Soc 74(2):267–286
Dubin JA, Müller HG (2005) Dynamical correlation for multivariate longitudinal data. J Am Stat Assoc 100(471):872–881
Ferraty F, Vieu P (2006) Nonparametric functional data analysis. Springer, New York
Ferraty F, Gonzalez-Manteiga W, Martinez-Calvo A, Vieu P (2012) Presmoothing in functional linear regression. Stat Sin 22:69–94
Fisher RA (1936) The use of multiple measurements in taxonomic problem. Ann Eugen 7:179–188
Goia A, Vieu P (2014) Some advances on semi-parametric functional data modelling. In: Contributions in infinite-dimensional statistics and related topics, Esculapio, Bologna
Górecki T, Krzyśko M (2012) Functional principal components analysis. In: Pociecha J, Decker R (eds) Data analysis methods and its applications. C.H. Beck, Munich, pp 71–87
Górecki T, Krzyśko M, Waszak Ł (2014) Functional discriminant coordinates. Commun Stat—Theory Methods 43(5):1013–1025
He G, Müller HG, Wang JL (2000) Extending correlation and regression from multivariate to functional data. Asymptotics in statistics and probability. VSP, Zeist, pp 197–210
He G, Müller HG, Wang JL (2004) Methods of canonical analysis for functional data. J Stat Plan Inference 122:141–159
Horváth L, Kokoszka P (2012) Inference for functional data with applications. Springer, New York
Jacques J, Preda C (2013) Funclust: a curves clustering method using functional random variables density approximation. Neurocomputing 112:164–171
Jacques J, Preda C (2014) Model-based clustering for multivariate functional data. Comput Stat Data Anal 71:92–106
James GM, Sugar CA (2003) Clustering for sparsely sampled functional data. J Am Stat Assoc 98(462):397–408
Jolliffe IT (2002) Principal component analysis, 2nd edn. Springer, New York
Kudraszow NL, Vieu P (2013) Uniform consistency of kNN regressors for functional variables. Stat Probab Lett 83(8):1863–1870
Leurgans SE, Moyeed RA, Silverman BW (1993) Canonical correlation analysis when the data are curves. J R Stat Soc 55(3):725–740
Mosler K, Mozharovskyi P (2015) Fast DD-classification of functional data. Stat Papers, doi:10.1007/s00362-015-0738-3
Panaretos VM, Kraus D, Maddocks JH (2010) Second-order comparison of Gaussian random functions and the geometry of DNA minicircles. J Am Stat Assoc 105(490):670–682
Peng J, Müller HG (2008) Distance-based clustering of sparsely observed stochastic processes, with applications to online auctions. Ann Appl Stat 2(3):1056–1077
Peng Q, Zhou J, Tang N (2015) Varying coefficient partially functional linear regression models. Stat Papers, doi:10.1007/s00362-015-0681-3
Rachdi M, Vieu P (2006) Nonparametric regression for functional data: automatic smoothing parameter selection. J Stat Plan Inference 137:2784–2801
Ramsay JO, Silverman BW (2005) Functional data analysis, 2nd edn. Springer, New York
Ramsay JO, Hooker G, Graves S (2009) Functional data analysis with R and MATLAB. Springer, New York
Rossi F, Villa N (2006) Support vector machine for functional data classification. Neurocomputing 69(7–9):730–742
Saporta G (1981) Méthodes exploratoires d’analyse de données temporelles. Ph.D. thesis, Université Pierre et Marie Curie (Paris)
Shmueli G (2010) To explain or to predict? Stat Sci 25(3):289–310
Sober E (2002) Instrumentalism, parsimony, and the Akaike framework. Philos Sci 69:112–123
Tenenbaum JB, de Silva V, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290:2319–2323
Wang G, Zhou J, Wu W, Chen M (2015) Robust functional sliced inverse regression. Stat Papers, doi:10.1007/s00362-015-0695-x
Weiss Y (1999) Segmentation using eigenvectors: a unifying view. In: Proceedings of the IEEE international conference on computer vision. Los Alamitos, pp 975–982
Acknowledgments
We would like to thank the editor and two anonymous reviewers for the very useful comments and suggestions which helped us improve the quality of our paper.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Górecki, T., Krzyśko, M., Waszak, Ł. et al. Selected statistical methods of data analysis for multivariate functional data. Stat Papers 59, 153–182 (2018). https://doi.org/10.1007/s00362-016-0757-8