Abstract
This paper is concerned with orthonormal systems in real intervals, given with zero Dirichlet boundary conditions. More specifically, our interest is in systems with a skew-symmetric differentiation matrix (this excludes orthonormal polynomials). We consider a simple construction of such systems and pursue its ramifications. In general, given any \(\text {C}^1(a,b)\) weight function such that \(w(a)=w(b)=0\), we can generate an orthonormal system with a skew-symmetric differentiation matrix. Except for the case \(a=-\infty \), \(b=+\infty \), only few powers of that matrix are bounded and we establish a connection between properties of the weight function and boundedness. In particular, we examine in detail two weight functions: the Laguerre weight function \(x^\alpha \textrm{e}^{-x}\) for \(x>0\) and \(\alpha >0\) and the ultraspherical weight function \((1-x^2)^\alpha \), \(x\in (-1,1)\), \(\alpha >0\), and establish their properties. Both weights share a most welcome feature of separability, which allows for fast computation. The quality of approximation is highly sensitive to the choice of \(\alpha \), and we discuss how to choose optimally this parameter, depending on the number of zero boundary conditions.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
1 Introduction and Motivation
1.1 Spectral Methods
This work is motivated by spectral methods for time-dependent partial differential equations (PDEs) of the form
where \({\mathcal {L}}\) is a well-posed linear operator, defining a strongly continuous semigroup, and \(\Omega \subseteq {\mathbb {R}}^d\), given with an initial condition for u(x, 0) and appropriate boundary conditions on \(\partial \Omega \). Standard examples are \({\mathcal {L}}=\Delta \) with \(f\equiv 0\) (the diffusion equation) or f a cubic polynomial in u with real zeros (the Fitzhugh–Nagumo equation) and \({\mathcal {L}}={\mathrm i}\Delta \) with either \(f=-{\mathrm i}V(x)\) (the linear Schrödinger equation) or \(f=-{\mathrm i}\lambda |u|^2u\) (the nonlinear Schrödinger equation with standard cubic nonlinearity).
In this paper, we are concerned by spectral methods applied in tandem with a splitting approach. As an example, we may commence by approximating locally the solution of (1.1) using the Strang splitting,
where \(u^n(x)\) is an approximation to \(u(x,n\Delta t)\). Here \(\textrm{e}^{t{\mathcal {L}}}v\) is a shorthand for a numerical solution at t of \(\partial u/\partial t={\mathcal {L}}u\), \(u(0)=v\), while \(\textrm{e}^{tf}v\) denotes a numerical solution of the ordinary differential equation (ODE) \(\,\text {d}u/\,\text {d}t=f(x,u)\), \(u(0)=v\). The splitting (1.2), which incurs a local error of \(\mathcal{O}\!\left( (\Delta t)^3\right) \), is but one example of operatorial splittings [1, 2, 20] and is intended here to illustrate a general point, namely that the solution of ‘complicated’ PDEs can be reduced to the solution of ‘simple’ PDEs and ODEs. Once done correctly, this procedure is consistent with eventual quality of the solution, concerning accuracy and stability alike.
Another benefit of (1.2) and of similar splittings is that it is consistent with conservation of the \(\text {L}_2\) energy. Many dispersive equations, e.g. Schrödinger (linear or nonlinear), Gross–Pitaevskii, Dirac, Klein–Gordon and Korteveg–De Vries, conserve the \(\text {L}_2\) norm of the solution. This often represents a highly significant physical feature, and it is vital to respect it under discretisation. (Note that conservation of \(\text {L}_2\) norm automatically implies numerical stability.) Because of the special form of (1.2) (and of similar splittings), the overall numerical scheme preserves \(\text {L}_2\) energy if both the discretisations of \(\textrm{e}^{ t{\mathcal {L}}}\) and \(\textrm{e}^{tf}\) do so. Insofar as \(\textrm{e}^{tf}\) is concerned, we can use the very extensive and robust existing theory [8], e.g. use a symplectic method (which automatically also preserves the \(\text {L}_2\) norm). It is more challenging to ensure that \(\Vert \textrm{e}^{t{\mathcal {L}}}v\Vert _2=\Vert v\Vert _2\) for every v, in other words that the discretisation of \(\textrm{e}^{t{\mathcal {L}}}\) is unitary.
1.2 The Differentiation Matrix
In this paper, we are concerned with spectral methods for time-dependent problems [3, 9, 27]. In a nutshell, we commence from a set \(\Phi =\{\varphi _n\}_{n\in {\mathbb {Z}}_+}\), where each \(\varphi _n\) is defined in \(\Omega \) and endowed with appropriate regularity. We assume that \(\Phi \) is orthonormal in the standard \(\text {L}_2\) inner product,
and complete in \(\text {L}_2(\Omega )\), and expand a solution in the basis \(\Phi \),
where the coefficients \(u_n(0)\)s are determined by expanding the initial condition \(u_0\), while the \(u_n(t)\)s, \(t>0\) are typically evolved by Galerkin conditions, which for (1.1) read
In a practical method, we truncate the expansion and the range of m, thereby obtaining a finite-dimensional linear system of ODEs.
Substantive advantage of spectral methods is that expansions in orthonormal bases typically converge very rapidly indeed: for example, orthogonal polynomials converge in a finite interval to analytic (in the interval and its neighbourhood) functions at an exponential rate. Therefore, the number of degrees of freedom, compared to the more usual finite difference or finite element methods, is substantially smaller. While this is not the entire truth—finite differences and finite elements produce sparse linear algebraic systems while spectral elements yield dense matrices which sometimes can be also ill conditioned and, moreover, expanding a function in an orthonormal basis can be potentially costly—spectral methods are often the approach of choice in numerical computations. In the specific context of time-dependent problems, however, naive spectral methods are unstable [9]. This motivates us to consider the major concept of a differentiation matrix.
In the sequel, we restrict our narrative to the univariate case, \(\Omega \subseteq {\mathbb {R}}\), for a number of reasons. Firstly, surprisingly, even the univariate case (as we hope to persuade the reader) is dramatically incomplete. Secondly, it lays the foundations to a multivariate case, whether by tensorial extension to parallelepipeds or by more advanced means which we intend to explore in a future paper.
The set \(\Phi \) being a basis of \(\text {L}_2(\Omega )\cap \text {C}^1(\Omega )\), any function therein can be expressed as a linear combination of the \(\varphi _n\)s, and this is particularly true with regard to the derivatives \(\varphi _m'\). This yields a linear map represented by the infinite-dimensional matrix \(\mathscr {D}\) such that
It is very simple to prove by integration by parts that the differentiation operator \(\varvec{D}=\partial /\partial x\) is skew Hermitian in the following three configurations of boundary conditions:
-
1.
The torus \(\Omega ={\mathbb {T}}\) (i.e. periodic boundary conditions);
-
2.
The Cauchy problem: \(\Omega ={\mathbb {R}}\); and
-
3.
Zero Dirichlet boundary conditions on the boundary of \(\Omega \subset {\mathbb {R}}\).
In that case \(\Vert \textrm{e}^{t\scriptstyle \varvec{D}}\Vert =1\) (unless otherwise stated, we assume in this paper the Euclidean norm.) and, \(\varvec{D}^2\) being Hermitian and negative semidefinite, \(\Vert \textrm{e}^{t\scriptstyle \varvec{D}^2}\Vert \le 1\). More generally, \(\varvec{D}^{2\ell +1}\) is skew Hermitian and \((-1)^{\ell -1} \varvec{D}^{2\ell }\) negative semidefinite for all \(\ell \in {\mathbb {Z}}_+\). Consequently, once \({\mathcal {L}}=\sum _{\ell =1}^M a_\ell \varvec{D}^{\ell }\), where \((-1)^{\ell -1} a_{2\ell }\ge 0\), it is trivial to prove that \(\langle {\mathcal {L}}u,u\rangle \le 0\) for every u in the underlying Hilbert space.
This feature is retained by a spectral method, provided that \(\mathscr {D}\) is skew Hermitian, as is its \((N+1)\times (N+1)\) section \(\mathscr {D}_N\). In the context of the PDE (1.1), it thus follows that, letting \({\mathcal {L}}_N=\sum _{\ell =1}^M a_\ell \mathscr {D}_N^{\,\ell }\), we have \(w^*{\mathcal {L}}_N w\le 0\) for all \(w\in {\mathbb {C}}^{M+1}\). A consequence is that the \(\text {L}_2\) energy is conserved for \({\mathcal {L}}=\varvec{D}\) (the Schrödinger case) and it dissipates for \({\mathcal {L}}=\varvec{D}^2\) (the diffusion equation case). In both cases, numerical stability comes in the wash.
1.3 Few Examples
This is the right moment to expand further on skew symmetry and differentiation matrices in a practical setting by means of few examples, restricting ourselves for simplicity to a single space dimension. Firstly, the reaction–diffusion equation
where f is a low-degree polynomial. In a finite-dimensional setting, the equation is replaced by \(\varvec{u}'=\mathscr {D}^*\mathscr {D}\, \varvec{u}+f(\varvec{u})\) and, provided \(\mathscr {D}\) is skew Hermitian, \(\mathscr {D}^*\mathscr {D}\) is negative semidefinite. Once we use, e.g. the Strang splitting (1.2), the diffusion component \(\mathscr {D}^{\,2}\) is assured to be dissipative. Another example is the convection–diffusion equation for incompressible flow (assuming, for simplicity, constant vector field),
where \(c\in {\mathbb {R}}\). Its semidiscretisation is \(\varvec{u}'=(\mathscr {D}^*\mathscr {D}+c\mathscr {D})\varvec{u}\) and, since for skew-Hermitian matrix \(\mathscr {D}\)
a negative-semidefinite matrix, stability is assured.
Yet another example (and the underlying motivation to the work that has led to this paper) is dispersive equations of the form
where \(f(x,u)=V(x)u\) for linear Schrödinger equation, \(f(x,u)=\lambda |u|^2u\) for standard nonlinear Schrödinger equation and \(f(x,u)=[V(x)+\lambda |u|^2]u\) for the Gross–Pitaevskii equation. All these equations conserve the \(\text {L}_2\) energy \(\Vert u\Vert \) (often known as ‘mass’ in this context), and this is a fundamental physical invariant. Once the equation is semi-discretised in the form \({\mathrm i}\varvec{u}'=-\mathscr {D}^*\mathscr {D}\,\varvec{u}+\varvec{F}(\varvec{u})\) and solved, e.g. with the Strang splitting (1.2), the mass is conserved because \({\mathrm i}\mathscr {D}^*\mathscr {D}\) is skew Hermitian.
Our last example is the KdV equation
Strang splitting (1.2) means that we solve separately \(\partial \varvec{u}/\partial t=-\mathscr {D}^3 \varvec{u}\) and scalar Burgers equations \(\partial u_m/\partial t=-u_m (\mathscr {D}\varvec{u})_m\). In each case, once \(\mathscr {D}\) is skew Hermitian, the solution is stable and \(\text {L}_2\) energy conserved.
An important take-away lesson of our four examples is that we typically work (for real \(\mathscr {D}\)) with a specific power (or powers) of the differentiation matrix, predicated by the space derivatives present in the differential equation. In particular, we require not just \(\mathscr {D}\) but its specific powers to be bounded.
1.4 An Orthonormal Basis
The obvious choice of an orthonormal system is a set of orthogonal polynomials—unless we use (possibly shifted) Legendre polynomials, this means replacing the \(\text {L}_2\) inner product by another, defined by the orthogonality weight function—but it is clear that this produces a lower triangular \(\mathscr {D}\). While there are nontrivial ways round it [22], there is strong motivation to consider alternative orthonormal systems.
The periodic case—without loss of generality, \(\Omega =[-\pi ,\pi ]\) with periodic boundary conditions—is obvious: we let \(\Phi =\{\textrm{e}^{{\mathrm i}nx}\}_{n\in {\mathbb {Z}}}\), the Fourier basis. An added bonus is fast expansion by means of a Fast Fourier Transform of any \(\text {L}_2[-\pi ,\pi ]\cap \text {C}_{\text {per}}[-\pi ,\pi ]\) function in the underlying basis. This is the paradigmatic case whereby a spectral method has few competitors.
The Cauchy case \(\Omega =(-\infty ,\infty )\) has been the subject of an extensive recent study [11,12,13,14]. In particular, all orthonormal systems \(\Phi \) such that \(\mathscr {D}\) is skew Hermitian and tridiagonal have been completely characterised. Specifically, they are in a one-to-one relationship with Borel measures, supported on the entire real line. Let \(\,\text {d}\mu (x)=w(x)\,\text {d}x\) be such a measure (w might be a generalised function) and \(\mathscr {P}=\{p_n\}_{n\in {\mathbb {Z}}_+}\) the underlying set of orthonormal polynomials. It is elementary that \(\mathscr {P}\) obeys a three-term recurrence relation
where \(\beta _{-1}=0\), \(\alpha _n\in {\mathbb {R}}\) and \(\beta _n>0\) for \(n\in {\mathbb {Z}}_+\). Inverse Fourier transforming \(\{w^{1/2} p_n\}_{n\in {\mathbb {Z}}_+}\) and multiplying the n term by \({\mathrm i}^n\), we obtain an orthonormal set \(\Psi \), dense in \(\text {L}_2({\mathbb {R}})\) and such that
In other words, \(\psi _n={\mathrm i}^n {\mathcal {F}}^{-1}(w^{1/2}p_n)\). Therefore, \(\mathscr {D}\) is skew Hermitian (skew symmetric if \(\alpha _n\equiv 0\), which is the case once w is an even function) and tridiagonal, while orthonormality follows from the Plancherel theorem. Tridiagonality is a valuable feature because it is easy to manipulate \(\mathscr {D}\) (e.g. multiply \(\mathscr {D}_N\) by a vector or approximate \(\exp (t\mathscr {D}_N)\)) and the powers of the infinite-dimensional matrix \(\mathscr {D}\) (approximating higher derivatives) remain bounded.
This leaves us with the third—and most difficult—case, namely zero Dirichlet boundary conditions.Footnote 1 This is the subject of this paper.
A natural inclination is to extend the Fourier transform-based theory from \((-\infty ,\infty )\) to, say, \((-1,1)\). This can be done in one of two obvious ways and, unfortunately, both fail. The first is to choose a measure \(\,\text {d}\mu \) supported by \((-1,1)\), but this leads again to \(\Phi \) supported on the entire real line, the only difference being that in this case the closure of \(\Phi \) is not \(\text {L}_2(-\infty ,\infty )\) but a Paley–Wiener space [11]. Another possibility is to abandon altogether the Fourier route and alternatively commence by specifying a \(\varphi _0\), subsequently determining the \(\varphi _n\)s for \(n\in {\mathbb {N}}\) and the matrix \(\mathscr {D}\) consistently with both orthogonality and tridiagonality. A forthcoming paper demonstrates how to do this algorithmically. However, given \(\varphi '=\mathscr {D}\,\varphi \), it follows by induction that \(\varphi ^{(s)}=\mathscr {D}^{\,s}\varphi \) for all \(s\in {\mathbb {Z}}_+\). Consistency with zero Dirichlet boundary conditions, though, requires \(\varphi _n(\pm 1)\equiv 0\), and this implies that \(\varphi _n^{(s)}(\pm 1)\equiv 0\) for all \(n,s\in {\mathbb {Z}}_+\). If \(\varphi _0\) is analytic in \((-1,1)\), this means that it necessarily must have an essential singularity at the endpoints. Intuitively, this is bad news, and this is confirmed by numerical experiments that indicate that the \(\varphi _n\)s develop boundary layers and wild oscillations near \(\pm 1\) and their approximation power is nil.
Both ideas above fall short and the current paper embarks on an altogether different approach, abandoning tridiagonality and the Fourier route altogether. Note that the existence of an essential singularity at the endpoints hinged on the fact that all powers of the infinite matrix \(\mathscr {D}\) are bounded. This is obvious once \(\mathscr {D}\) is tridiagonal (or, more generally, banded); hence, our main idea is to choose an orthonormal set \(\mathscr {D}\) such that \(\mathscr {D}^{\,s+1}\) blows up for some \(s\in {\mathbb {N}}\). At the same time, we wish to retain a major blessing of tridiagonality, namely that \(\mathscr {D}_N w\) can be computed in \(\mathcal{O}\!\left( N\right) \) operations for any \(w\in {\mathbb {C}}^{N+1}\) and, more generally, that \(\mathscr {D}_N\) is amenable for fast linear algebra.
1.5 Plan of this Paper
The main idea underlying this paper is exceedingly simple: given a measure \(\,\text {d}\mu =w\,\text {d}x\), where \(w\in {\mathbb {C}}^1(a,b)\), and an underlying set \(\mathscr {P}=\{p_n\}_{n\in {\mathbb {Z}}_+}\) of orthonormal polynomials, we set
It follows at once that \(\Phi \) is orthonormal with respect to \(\text {L}_2(a,b)\) and it is easy to determine conditions so that \(\varphi _n(a)=\varphi _n(b)=0\) for all \(n\in {\mathbb {Z}}_+\). It is not difficult to specify the conditions on w for skew symmetry of \(\mathscr {D}\). However, the narrative becomes more complicated once we seek a system such that \(\mathscr {D}^{\,k}\) is bounded for \(k=1,\ldots ,s\) and blows up for \(k=s+1\). Likewise, it is considerably more challenging to identify systems \(\Phi \) that allow for fast computation of \(\mathscr {D}_N w\) for \(w\in {\mathbb {C}}^{N+1}\).
In Sect. 2, we introduce the functions (1.4) in a more rigorous setting of Sobolev spaces and explore general properties of their differentiation matrices. Section 3 is devoted to two families of weight functions, namely the Laguerre family \(w_\alpha (x)=x^\alpha \textrm{e}^{-x}\chi _{(0,\infty )}(x)\) and the ultraspherical family \(w_\alpha (x)=(1-x^2)^\alpha \chi _{(-1,1)}(x)\). We prove that both families have a separable differentiation matrix. This feature (put to a good use in Sect. 4) is very special—indeed, there are good reasons to conjecture that these two families are the only weights with this feature. We present a detailed example of two orthogonal families, generalised Hermite weights \(w_\mu (x)=|x|^{2\mu }\textrm{e}^{-x^2}\) and Konoplev weights \(w_{\alpha ,\beta }(x)=|x|^{2\beta +1}(1-x^2)^\alpha \chi _{(-1,1)}(x)\), and prove that their differentiation matrices cannot be separable unless (for Konoplev weights) \(\beta =-\frac{1}{2}\) and the weight reduces to ultraspherical. Note, however, that separability is just one feature lending itself to fast numerical algebra and one cannot rule out that other weights might lead to differentiation matrices which, while non-separable, are amenable to fast computation.
Finally, in Sect. 4 we demonstrate how separability of the differentiation matrix can be utilised for fast multiplication of \(\mathscr {D}_N w\), \(w\in {\mathbb {R}}^{N+1}\), in \(\mathcal{O}\!\left( N\right) \) operations.
The original idea to consider functions of the form (1.4) in the specific case of Freud weights has been considered first by Luong [19], who demonstrated that in this specific case \(\mathscr {D}\) is a skew-symmetric, banded matrix with bandwidth seven. This was a serendipitious choice: in Sect. 2, we prove that the only weights that produce a banded matrix in the setting of (1.4) are generalised Freud weights!
2 W-Functions
2.1 The Definition and Some of Its Consequences
Let (a, b) be a non-empty real interval, \(-\infty \le a<b\le \infty \), and \(s\in {\mathbb {N}}\cup \{\infty \}\). We denote by the Sobolev space of \(\text {H}_2^s[a,b]\) functions f such that
(note that \(\text {C}^{s-1}[a,b]\subset \text {H}_2^s[a,b]\), therefore the derivatives are well defined) equipped with the inner product
A weight function \(w\in \text {L}_2(a,b)\cap \text {C}^1(a,b)\) is a positive function with all its moments
bounded. Given a weight functions, we can define (e.g. using a Gram–Schmidt process) a set of orthonormal polynomials \(\mathscr {P}=\{p_n\}_{n\in {\mathbb {Z}}_+}\) such that
Such a set is unique once we require, for example, that the coefficient of \(x^n\) in \(p_n\) is always positive. We say that \(\varphi _n\) is the nth W-function once
and let \(\Phi =\{\varphi _n\}_{n\in {\mathbb {Z}}_+}\). It follows at once from (2.1) that \(\Phi \) is an orthonormal set with respect to the standard \(\text {L}_2\) inner product.
Remark 1
The functions \(\varphi _n\) inherit some features of orthonormal polynomials; in particular, they obey the same three-term recurrence relation. However, the expansion coefficients of an arbitrary function are different:
Remark 2
An important difference between \(\mathscr {P}\) and \(\Phi \) is that while the \(p_n\) are polynomials, hence analytic functions, the W-functions carry over potential singularities of the weight function. For example, for the Chebyshev weight function \(w(x)=(1-x^2)^{-1/2}\chi _{(-1,1)}(x)\) the \(\varphi _n\)s have weak singularity at the endpoints \(\pm 1\), while their derivatives possess strong singularity there.
We let \(\mathscr {D}\) stand for the infinite-dimensional differentiation matrix
We say that w is of index \(s\in {\mathbb {N}}\cup \{\infty \}\) and denote this by \(\text {ind}\, w=s\) if \(\mathscr {D}^{\,k}\) is bounded for \(k=1,\ldots ,s\), while \(\mathscr {D}^{\,s+1}\) is unbounded.
Lemma 1
\(\mathscr {D}\) is skew symmetric if and only if \(w(a)=w(b)=0\).
Proof
Assume first that \(-\infty<a<b<\infty \) and note that \(\mathscr {D}\) is skew symmetric if and only if \(\mathscr {D}_{m,n}+\mathscr {D}_{n,m}=0\), \(m,n\in {\mathbb {Z}}_+\). Since
it follows from (2.2) and the orthonormality of \(\Phi \) that \(\mathscr {D}\) is skew symmetric if
for \(m,n\in {\mathbb {Z}}_+\). The latter is equivalent, for every \(m,n\in {\mathbb {Z}}_+\), to
All the zeros of orthogonal polynomials reside in (a, b). Therefore, they cannot vanish at the endpoints and \((-1)^k p_k(a)p_k(b)>0\), \(k\in {\mathbb {N}}\); hence, \(p_m(a)p_n(a)\) cannot equal \(p_m(b)p_n(b)\) for all \(m,n\in {\mathbb {N}}\). We deduce that \(\mathscr {D}\) is skew symmetric if and only if \(w(a)=w(b)=0\).
The proof is similar—in fact, somewhat simpler—once either \(b=\infty \) or \(a=-\infty \). If \((a,b)=(-\infty ,\infty )\), then \(\text {L}_2\) boundedness and continuity imply \(w(\pm \infty )=0\), \(\mathscr {D}\) is skew symmetric, and there is nothing to prove. \(\square \)
Consequently, we impose an additional condition on the weight function, namely that it vanishes at the endpoints. Note that this is automatically true once an endpoint is infinite.
A quintessential example of a family of W-functions is Hermite functions
where the \(\text {H}_n\)s are standard Hermite polynomials. Hermite functions are well known in mathematical physics, because they are eigenfunctions of the free Schrödinger operator. They can be derived from orthonormalised Hermite polynomials via the Fourier transform route, as mentioned in introduction and e.g. in Arieh Iserles and Marcus Webb [11]; hence, their differentiation matrix is tridiagonal. On the other hand, they are W-functions with \(w(x)=\textrm{e}^{-x^2}\). Tridiagonality implies that \(\text {ind}\,w=\infty \). More generally, \(\text {ind}\,w=\infty \) once \(\mathscr {D}\) is a banded matrix and it is interesting to characterise all weight functions with this feature.
Theorem 2
The differentiation matrix \(\mathscr {D}\) of a system of W-functions is banded if and only if \(w(x)=\textrm{e}^{-c(x)}\), \(x\in {\mathbb {R}}\), where c is an even-degree polynomial whose highest degree coefficient is strictly positive.
Proof
Letting \(m\le n-1\), orthogonality implies that
while for \(m\ge n+1\) skew symmetry yields
Assume that \(\mathscr {D}\) has bandwidth \(2L+1\), in other words that \(\mathscr {D}_{m,n}=0\) for \(|m-n|\ge L+1\). It follows for \(m\le n-1\) that
If in addition \(n\ge L\), expanding \(X_m\) in the basis \(\mathscr {P}\) it follows that \(X_m\) is a polynomial of degree \(m+L+1\). However,
for some \(x_0\). Since w is independent of m, necessarily \(p_m\) divides \(X_m\) and the remainder c is a polynomial independent of m. Therefore, without loss of generality \(w(x)=\textrm{e}^{-c(x)}\) where c is a polynomial of degree \(L+1\). w being integrable and \(w(a)=w(b)=0\), necessarily \(a=-\infty \), \(b=\infty \),Footnote 2L is odd and c is an even-degree polynomial with strictly positive leading-degree coefficient. \(\square \)
We have recovered precisely the W-functions associated with generalised Freud polynomials that have been originally considered in Luong [19]. However, such W-functions are of little interest within the context of this paper, since we seek weight functions of finite index.
This is the point to note the expression (2.3) for the elements of \(\mathscr {D}\) such that \(m\ge n+1\). (If \(m\le n-1\) we need to flip the sign.) We will make much use of it in the sequel.
2.2 The Boundedness of \(\mathscr {D}^{\,s}\)
We assume in this section that the weight w is strictly positive in (a, b), as smooth in [a, b] as needed in our construction, and set
Therefore,
As long as \(\mathscr {D}^{\,s}\) is bounded, we have
It is trivial to prove that
where each \(U_{r,j}\) is a linear combination of products of the form \(\prod _i w^{(\ell _i)}\) such that \(\ell _i\ge 1\) and \(\sum _i \ell _i=r\): for example,
In general, \(U_{r,r}=(-1)^{r-1} (2r)! {w'}^r/(4^rr!)\), \(r\in {\mathbb {N}}\).
Since \(w(x)>0\) in (a, b), the only possible source of singularity in (2.4) is that \(\varphi _m^{(s)}\) is non-integrable at an endpoint. Recalling that \(w(a)=w(b)=0\), integrability is lost exclusively when dividing by a power of w, and the larger the power, the more significant the singularity. In other words, \(\varphi _m^{(s)}\) is bounded for all \(m\in {\mathbb {Z}}_+\) only if the integral
is bounded for any polynomial p, and this is contingent on \({\tilde{w}}_s={w'}^s/w^{s-1}\) being a signed weight function, i.e. all its moments exist and \({\tilde{w}}_s\not \equiv 0\).Footnote 3 The following theorem is thereby true.
Theorem 3
A necessary condition for \(\text {ind}\,w\ge s\) is that \({\tilde{w}}_r\), \(r=2,\ldots ,s\), are signed weight functions.
Let \(-\infty<a<b<\infty \). Regularity and \(w(a)=w(b)=0\) imply that
Therefore, after elementary algebra,
Theorem 4
A necessary condition for \(\text {ind}\, w\ge s\) in a finite interval (a, b) is that \(\alpha ,\beta >s-1\). Likewise, once \(b=\infty \), we need \(\alpha >s-1\) and for \(a=-\infty \) the condition is \(\beta >s-1\).
Proof
Consistently with our assumptions, \(v\ne 0\) in [a, b]; therefore, the only source of singularity may come from \((x-a)^{\alpha -s}\) and \((b-x)^{\beta -s}\). We conclude that, for \(\mathscr {D}^{\,s}\) to be bounded, we need \(\alpha ,\beta >s-1\). The semi-infinite cases follow in an identical (and simpler!) manner. \(\square \)
In the special case \(s=2\), we can complement Theorem 3 with a sufficient condition.
Theorem 5
A necessary and sufficient condition for \(\text {ind}\,w\ge 2\) is that \({\tilde{w}}_2={w'}^2/w\) is a signed measure.
Proof
We compute \(\mathscr {D}^{\,2}\) directly. Using skew symmetry,
Recalling (2.3), let us consider the infinite sum
Therefore,
Let \(\mathscr {P}\) be orthonormal and complete in \(\text {L}_2((a,b),w\,\text {d}x)\) and \(f\in \text {L}_2((a,b),w\,\text {d}x)\). Then
Moreover, by the Parseval theorem,
Since
exchanging summation and integration we have
Let
be the Christoffel–Darboux kernel. It now follows from (2.7) that for every \(f\in \text {L}_2((a,b),w\,\text {d}x)\) it is true that
and we deduce that
In other words, K is a reproducing kernel.Footnote 4
We now return to (2.6), deducing that
This is bounded because \({\tilde{w}}_2\) is a signed measure and \(p_mp_n\) a polynomial, and we deduce that \(\text {ind}\,w\ge 2\). The necessity of \({\tilde{w}}_2\) being a signed measure is obvious from the argument that led to Theorem 4. \(\square \)
3 Separable Systems
In this section, we consider two families of weight functions that share a hugely beneficial attribute of separability, and we also provide two examples of weights that lack this feature.
We say that a weight function w is separable if there exist real sequences \({\mathfrak {a}}=\{{\mathfrak {a}}_n\}_{n\in {\mathbb {Z}}_+}\) and \({\mathfrak {b}}=\{{\mathfrak {b}}_n\}_{n\in {\mathbb {Z}}_+}\) such that
and it is symmetrically separable subject to the existence of real sequences \({\mathfrak {a}}=\{{\mathfrak {a}}_n\}_{n\in {\mathbb {Z}}_+}\) and \({\mathfrak {b}}=\{{\mathfrak {b}}_n\}_{n\in {\mathbb {Z}}_+}\) such that
It is demonstrated in Sect. 4 that separability or symmetric separability allows for very rapid computation of products of the form \(\mathscr {D}_N\varvec{v}\) for \(\varvec{v}\in {\mathbb {R}}^{N+1}\). This is intimately related to earlier results on fast computation of some structured algebraic systems in Refs. [6, 7].
In this section, we consider two families of measures, one separable and the other symmetrically separable: the Laguerre weight \(w(x)=x^\alpha \textrm{e}^{-x}\chi _{(0,\infty )}(x)\) and the ultraspherical weight \((1-x^2)^\alpha \chi _{(-1,1)}(x)\), respectively. In a way, they are the most obvious measures in intervals of the form \((0,\infty )\) and \((-1,1)\), respectively. Yet, interestingly, separability appears to be a very rare feature and we provide counterexamples further in this section.
In both Laguerre and ultraspherical cases, we are able to present comprehensive analysis, deriving the sequences \({\mathfrak {a}},{\mathfrak {b}}\) explicitly, determining \(\text {ind}\,w\) and (in Sect. 4) discussing the optimal choice of the parameter \(\alpha \). While both Laguerre and ultraspherical polynomials have been comprehensively studied, the separability and the formulæ (3.1) and (3.2) are, to the best of author’s knowledge, new.
3.1 The Laguerre Family
Laguerre polynomials are orthogonal with respect to the Laguerre weight,
[23, p. 206]. In our case, we consider just the case \(\alpha >0\), so that the weight function vanishes at the origin. We have
In Theorem 12, we determine that the Laguerre weight is separable—the proof requires a fair bit of algebraic computation and is relegated to Appendix A. The separability coefficients are given in (A.8), which we repeat here for clarity,
Note that
this will be important in the sequel.
Theorem 4 presents a necessary condition for \(\text {ind}\, w\ge s\) for \(s\ge 2\): for a Laguerre weight \(w=w_\alpha \) it translates to \(\alpha >s-1\). In the remainder of this subsection, we wish to prove that for the Laguerre weight function this condition is also sufficient.
The matrix \(\mathscr {D}^{\,s}\) is absolutely bounded for \(s\ge 0\) if
It is clear that absolute boundedness implies boundedness.
We assume that \(m\ge n+1\) and observe that everything depends on the interplay of the relative sizes of \(k_0=m,k_1,k_2,\ldots ,k_{s-1},k_s=n\) because, for example,
We can disregard the case \(k_j=k_{j+1}\) because then \(\mathscr {D}_{k_j,k_{j+1}}=0\) and the entire product vanishes; hence, we assume that always \(k_j\ne k_{j+1}\). We use the shorthand \(\searrow \) for \(k_j>k_{j+1}\) and \(\nearrow \) for \(k_j<k_{j+1}\). Note that once s is even then \(\mathscr {D}^{\,s}\) is symmetric and diagonal elements no longer vanish: in that case we need to consider also the case \(m=n\) but the proof is identical.
To illustrate our argument, for \(s=4\) we have eight options:
except that \(\nearrow ^3\) is impossible because \(k_0=m> n=k_2\).
We let \({\mathcal {Q}}_N\) stand for a generic polynomial of degree exactly N and note for further use the following technical result with a straightforward proof.
Proposition 6
The sum
converges as \(K\rightarrow \infty \) if and only if \(\alpha >N+1\). Here c is a constant, while “\(\sim \)” means that we are disregarding lower-order terms.
In general, the main idea is to write a sequence of \(\nearrow \)s and \(\searrow \)s in the form
where \(i_k,j_k\ge 0\) and \(\sum _{k=1}^t (i_k+j_k)=s\). We call \(\nearrow ^r\) a \(\nearrow \)-pre-chain of length r and \(\searrow ^r\) a \(\searrow \)-pre-chain of length r; in other words, we decompose each product in (3.5) into a sequence of alternating pre-chains.
Consider first an \(\nearrow \)-pre-chain of length \(r\ge 1\). Because of (3.4), it equals
We say that \({\mathfrak {a}}_{k_{\ell -1}}\) and \({\mathfrak {b}}_{k_{\ell +r-1}}\) are the head and the tail of the pre-chain, respectively.
Likewise, for an \(\searrow \)-pre-chain of length \(r\ge 1\) we have
Now \({\mathfrak {b}}_{k_{\ell -1}}\) and \({\mathfrak {a}}_{k_{\ell +r-1}}\) are the head and the tail of the pre-chain, respectively.
Except for \(\ell =0\) and \(\ell +r=s\), we join the tail of a pre-chain to the head of the succeeding pre-chain. The outcome is \(\nearrow \)-chains and \(\searrow \)-chains. Note thus that a chain has no head, while its tail is multiplied by the head of its successor pre-chain. (In this procedure, we lose the head of the leading pre-chain and the tail of the last pre-chain, but this makes no difference to the finiteness—or otherwise—of the sum)
An \(\nearrow \)-chain of length r is of the form
a finite sum. Hence, it cannot be a source for unboundedness of the sum (3.5). Matters are different, though, with an \(\searrow \)-chain of length r: straightforward algebra and Proposition 6 imply that
Therefore, boundedness takes place if \(\alpha -r>1\).
Since the length of any chain is at most \(s-1\) and \(\searrow ^{s-1}\) is impossible (recall, \(k_0>k_s\)), the maximal length of an \(\searrow \)-chain is \(s-2\). We thus deduce that \(\alpha >s-1\).
Theorem 7
\(\text {ind}\,w_\alpha \ge s\) for the Laguerre weight if and only if \(\alpha >s-1\).
Proof
The necessity is proved in Theorem 4, while sufficiency follows because absolute boundedness in (3.5) implies boundedness. \(\square \)
In Fig. 1, we display the absolute values of the entries of \(\mathscr {D}^{\,s}\) for different values of \(\alpha \) and s. The computation involves infinite matrices, hence infinite products which need be truncated in computation. Thus, we compute \(300\times 300\) matrices and their powers, while displaying just their \(100\times 100\) section, since this minimises the truncation effects. For \(s=1\), all differentiation matrices are bounded and of a moderate size; the sole difference is that as \(\alpha \) grows, the matrix becomes more ‘centred’ about the diagonal. However, already for \(s=2\) the difference is discernible. For \(\alpha =1\), we are right on the boundary of \(\alpha >s-1\) (on its wrong side!) and the size of \(\mathscr {D}^{\,2}\) grows rapidly: had we displayed a section of an \(M\times M\) matrix for \(M\gg 300\), the magnitude would have grown at a logarithmic rate, as indicated by the proof of absolute boundedness. Once \(\alpha >1\), the magnitude grows at a slower rate and would remain bounded for \(M\rightarrow \infty \). Finally, for \(s=3\) the cases \(\alpha =1\) and \(\alpha =2\) correspond to polynomial and logarithmic growth, respectively, and this is apparent in the figure. Finally, for \(\alpha =4\) the rate of growth slows down and it is persuasive that the magnitude remains bounded as \(M\rightarrow \infty \). Note that even in a ‘good’ \(\alpha \) regime the magnitude, while decaying along rows and columns, grows along diagonals. We refer to the discussion following Fig. 2 for an explanation of this behaviour, commenting here that this phenomenon follows from \({\mathfrak {a}}_m\sim m^{-\alpha /2}\) and \({\mathfrak {b}}_n\sim n^{\alpha /2}\).
It follows from Theorem 7 that once we approximate functions in , we need to choose \(\alpha >s-1\). However, there is much more to the choice of a good \(\alpha \) and we defer its discussion to Section 4. As it turns out, the quality of approximation is exceedingly sensitive to the right choice and numerical results indicate that there exists a ‘sweet spot’ that brings about substantially improved quality of approximation.
3.2 The Ultraspherical Family
The ultraspherical weightFootnote 5 a special case of the Jacobi weight, is \(w_\alpha (x)=(1-x^2)^\alpha \chi _{(-1,1)}(x)\), \(\alpha >1\)—in our case the requirement \(w_\alpha (\pm 1)=0\) restricts \(\alpha \) to the range \((0,\infty )\). We have
where the constant
orthonormalises an ultraspherical polynomial [23, p. 260]. Recalling the identity (2.3), we let
It is sufficient to derive the \(\mathscr {E}_{m,n}\)s explicitly and prove that \(\mathscr {E}\) is symmetrically separable. We recall that our interest is in odd values of \(m+n\) and assume without loss of generality that \(m\ge n+1\)
Let
noting that \(S_{m,n}^\alpha =0\) if \(m+n\) is odd. The three-term recurrence relation for orthonormal ultraspherical polynomials is
as can be easily confirmed from Rainville [23, p. 263]. Therefore,
Our next task is determining the explicit form of \(S_{m,n}^\alpha \) for even \(m+n\) and, without loss of generality, \(m\ge n\). This is accomplished in Appendix B and results in
We conclude from (3.7) that
and
is valid for all odd \(m+n\), \(m\ge n+1\)—once \(n\ge m+1\), we need to swap m and n and invert the sign. (Of course, \(\mathscr {D}_{m,n}=0\) once \(m+n\) is even.)
Our first conclusion is that the measure is symmetrically separable with
The next conclusion is that the rate of growth (or decay) is dramatically different along the rows and the columns of \(\mathscr {D}\). It follows from (3.8) and the standard Stirling formula [21, 5.11.3] that
Therefore, the elements of the differentiation matrix decay geometrically (at any rate, for \(\alpha >\frac{1}{2}\)) along rows (and, because of skew symmetry, columns) and increase geometrically along diagonals. Note that, forming powers of \(\mathscr {D}\), it is the decay along rows and columns that allows for boundedness. (Incidentally, it can be proved using special functions that \((\mathscr {D}^{\,2})_{0,0}=\alpha (2\alpha +1)/[4(\alpha -1)]\), driving home the fact, already known from Theorem 5, that \(\alpha >1\) is necessary and sufficient for boundedness. We leave the proof, which plays no further role in our narrative, as an exercise for the reader.)
Figure 2 displays the magnitude of the powers of differentiation matrix for ultraspherical weights and different values of \(\alpha \) and s, using the same rules of engagement as in Fig. 1. Two trends are discernible, both following from our discussion. Firstly, the decay along rows accelerates as \(\alpha \) grows and \(\mathscr {D}^{\,s}\) is more concentrated near the diagonal. Secondly, the terms along the diagonal (of course, with \(m+n\) of the right parity) grow the fastest. Their rate of growth is rapid (and grows with \(\alpha \)), but this need not be a problem, at any rate once we approximate sufficiently smooth functions. In that case \(\mathscr {D}\), its powers and possibly functions (e.g. \(\exp (h\mathscr {D})\)) act on the expansion coefficients of functions in the underlying basis \(\Phi \). Provided these functions are sufficiently smooth, it is plausible that these coefficients decay very rapidly and, for analytic functions, at an exponential rate. (We defer to Sect. 4 for more substantive discussion of convergence.) Thus, large terms along the diagonal will multiply small terms in a vector of expansion coefficients—something that might conceivably cause loss of accuracy for truly huge matrices but which is probably negligible in practice.
Similar to Laguerre weights, we now seek to prove that Theorem 4 provides also a sufficient condition for \(\text {ind}\,w_\alpha \ge s\) for ultraspherical weights, i.e. that \(\alpha >s-1\) implies that \(\mathscr {D}^{\,s}\) is bounded. Our method of proof is similar to that of Theorem 7, except that we need to account for a number of differences: firstly \(\mathscr {D}_{m,n}\) can be nonzero only when \(m+n\) is even, secondly, we have symmetric separability in place of separability, and thirdly, (3.4) is no longer true and needs to be replaced by
Letting again \(k_0=m\), \(k_s=n\), where \(m\ge n+1\) (or \(m\ge n\) once s is even), we need to replace (3.5) by
where the star means that we sum only over pairs \((k_{i-1},k_{i})\) such that \(k_{i-1}+k_{i}\) is odd. Note that Proposition 6 remains true for the ‘starred sum’ except that the constant (of which we care little) is different.
We again commence with \(\nearrow \)- and \(\searrow \)-pre-chains. Little changes for a \(\searrow \) chain, since the sum remains finite. The only possible challenge to boundedness may originate in a \(\nearrow \) chain. We analyse a \(\nearrow \)-pre-chain using (3.10),
and, using Proposition 6, transition seamlessly to a \(\nearrow \)-chain, while disregarding lower-order terms,
Consequently, \(\alpha >r\) is necessary and sufficient for convergence for each \(\nearrow \)-chain. Since the length of an \(\nearrow \)-chain is at most \(s-1\), we deduce, similar to Theorem 7, that
Theorem 8
\(\text {ind}\,w_\alpha \ge s\) for the ultraspherical weight if an only if \(\alpha >s-1\).
Both Theorems 7 and 8 present the same inequality. This is not surprising since, for both weights, \(\alpha \) measures the ‘strength’ of a zero at the endpoint(s).
We conclude this subsection with plots displaying the \({\ell }_2\) norms of truncated \(N\times N\) differentiation matrices for both Laguerre and ultraspherical cases for \(\alpha \in \{1,2,3,4\}\).
As can be seen from Fig. 3, the \(\ell _2\) norm of an \(N\times N\) principal minor of \(\mathscr {D}\) increases linearly with N for Laguerre, quadratically for the ultraspherical weight. This has obvious implications, inter alia to the implementation of Krylov subspace methods in the manipulation of differentiation matrices. Having said so, Krylov subspace methods may be problematic in this context [10]. In Sect. 4, we outline alternative approaches to numerical algebra of separable differentiation matrices.
3.3 Counterexamples: generalised Hermite and Konoplev weights
Ultraspherical and Laguerre weights are the obvious and most elementary choice in the intervals \((-1,1)\) and \((0,\infty )\), respectively, and they are both separable in the sense of this paper. This might lead to an impression that separability is ubiquitous: this would be highly misleading.
Lemma 9
Let
Separability implies that \(\iota _{m,n}=0\) for all \(m\ge n+2\), while symmetric separability implies that \({\check{\iota }}_{m,n}=0\) for all \(m+n\) odd, \(m\ge n+2\).
Proof
Follows at once from the definition of (symmetric) separability. \(\square \)
Note that neither (3.11) nor (3.12) are sufficient. Thus, a skew-symmetric \(\mathscr {D}\) such that \(\mathscr {D}_{2m+1,n}=0\) for all \(m\in {\mathbb {Z}}_+\) and \(n\le 2m-1\) satisfies (3.11), but in general is not separable. Likewise, a tridiagonal skew-symmetric matrix obeys (3.12) but is not symmetrically separable—this is the case with the differentiation matrix associated with the Hermite weight, for example. Trying weights at random and computing, say, \(\iota _{2,0}\) leads time and again to non-separable weights.
To explore further the (non)existence of separable weights, we examine two weights, generalisations of Hermite and ultraspherical weights, respectively, but endowed with an additional parameter: the generalised Hermite and Konoplev weights.
3.3.1 Generalised Hermite weights
Letting \(\mu >-\frac{1}{2}\), we examine the weight
[5, p. 156], originally considered by Szegő.Footnote 6 It can be easily deduced from Chihara [5, p. 156–7] that the underlying W-functions are
Generalised Hermite weights are of marginal importance to the work of this paper, and although their differentiation matrix can be derived explicitly,
where \((z)_n=z(z+1)\cdots (z+n-1)\) is the Pochhammer symbol, with skew-symmetric complement, we will not present here a formal (and lengthy) algebra. Instead, a reader might use a symbolic algebra package to compute the first few elements, enough to evaluate \(\iota _{2,0}\) and \({\check{\iota }}_{3,0}\) and check that they are both nonzero—in light of Lemma 9 this is sufficient to rule out separability and symmetric separability, respectively.
As a matter of fact, \(\mathscr {D}\) has an interesting shape: its \((2m+1)\)st columns (hence also the \((2n+1)\)st rows) are consistent with a tridiagonal matrix, more specifically with the differentiation matrix corresponding to the standard Hermite weight (i.e. with \(\mu =0\)). More specifically,
otherwise \(\iota _{m,n}=0\) for \(m\ge n+2\), while
In each case, the separability tests (3.11) and (3.12) fail only marginally—but fail nonetheless.
3.3.2 Konoplev weights
Letting \(\alpha ,\gamma >-1\), we set
The weight (3.14), which has been considered in [16, 17] and described in Chihara [5, p. 155], generalises ultraspherical weights by adding the possible weakly singular factor \(|x|^{2\gamma +1}\). Specifically, \(w_{\alpha ,\gamma }\in \text {C}^s(-1,1)\) if and only if
The underlying orthogonal polynomial system is
and the monic polynomials obey the three-term recurrence relation
where
Replacing Jacobi polynomials by their orthonormal counterparts and using a formula from Rainville [23, p. 260], easy algebra confirms that
therefore
The weights (3.14) are symmetric; thus, we examine the possibility of symmetric separability. A brute force computation yields
ruling out symmetric separability except for the case \(\gamma =-\frac{1}{2}\), which corresponds to the ultraspherical weight.
3.3.3 A Limiting Behaviour of the \(\iota _{m,n}\)s
While separability, hence \(\iota _{m,n}=0\) for \(m\ge n+2\), appears to be exceedingly rare, we claim that the latter holds more broadly in a much weaker, asymptotic form.
Let w be a weight in (a, b), \(w(a)=w(b)=0\), with the underlying orthonormal polynomials \(\{p_n\}_{n=0}^\infty \), where the coefficient of \(x^n\) in \(p_n\) is \(k_n>0\). Comparing the coefficients of \(x^{n+1}\) in the three-term recurrence relation (1.3), we deduce at once that \(k_{n+1}/k_n=\beta _n^{-1}\).
Theorem 10
Assuming that \(\beta _n\ge \beta ^*>0\) and \({\tilde{w}}(x)=[w'(x)]^2/w(x)\) is itself a weight function in (a, b), it is true that
Proof
Letting \(m\ge n+2\), (2.3) yields
We recall the Christoffel–Darboux formula,
where \(k_n>0\) is the coefficient of \(x^n\) in \(p_n\) [5, p. 153]. Therefore,
because \(\beta _n^{-1}\le {\beta ^*}^{-1}\). Letting \(n\rightarrow \infty \) in (3.16), we obtain
where K is tha Christoffel–Darbeaux kernel from the proof of Theorem 5. According to (2.8), it is a reproducing kernel and it follows at once that the double integral vanishes. \(\square \)
The condition \(\beta _n>\beta ^*\), \(n\in {\mathbb {Z}}_+\), is very weak: we already know that \(\beta _n>0\), all the condition says is that, in addition, the \(\beta _n\)s are bounded away from zero.
4 Computational Aspects
4.1 A Product of \(\mathscr {D}\) by a Vector
Practical implementation of the ideas of this paper requires manipulation of expressions involving a matrix \(\mathscr {D}\) which is either separable or symmetrically separable: formation of products of the form \(\mathscr {D}^{\,r}\varvec{f}\) for \(r\in {\mathbb {N}}\), the solution of algebraic linear systems of the form \(p(\mathscr {D})\varvec{y}=\varvec{x}\), where p is a polynomial and the computation of \(\textrm{e}^{h\mathscr {D}}\varvec{u}\). Tridiagonal differentiation matrices, of the form considered in [11], enjoy substantial advantage in this context. Yet, once a weight function is separable (or symmetrically separable), all these objectives can be attained using fast algorithms. The matrix \(\mathscr {D}\) is a special case of a semiseparable matrix [7, p. 50]: all its minors located either wholly above the diagonal or wholly beneath it are of rank 1. This allows for fast products and fast computation of linear systems [6, 7, 28] and, using the Cauchy–Dunford integral formula (also known as the Dunford–Taylor or Riesz–Fantappié formula, [4]), compute \(\textrm{e}^{h\mathscr {D}}\varvec{u}\).
In this subsection, we examine in detail the formation of products of the form \(\varvec{h}=\mathscr {D}\,\varvec{f}\), where \(\varvec{f}\) is a (real or complex) infinite-dimensional vector, while \(\mathscr {D}\) is either separable or symmetrically separable. While the main idea is not new—cf. for example, [6]—there is merit in presenting it for the convenience of the reader, as an elementary example of more advanced numerical algebra computations in [6, 7, 28]. As a matter of fact, our algorithm is somewhat more general, because it is based on infinite-dimensional computations.
Consider a separable weight function, e.g. a Laguerre weight. The starting point is an integer N, typically much larger than M, such that \(|f_m|\) is negligible (in practical terms, smaller than a user-provided error tolerance) for \(m>N\), and we wish to form
We commence by assuming that a weight is separable, whereby (3.1) yields
where
Then
Assuming that the \({\mathfrak {a}}_m\)s and \({\mathfrak {b}}_n\)s have been precomputed (and this needs to be done only once, no matter how many products are required), the calculation (4.1) takes just \(\approx N+4M\) flops—and by the same token, computing the first \(M+1\) entries of \(\mathscr {D}_N^{\,r}\!\varvec{f}_{\!N}\) takes \(\approx r(N+4M)\) flops.
Similar operations count applies to symmetrically separable weight, whereby the entries of \(\mathscr {D}\) obey (3.2). Assuming that both M and N are even, we have
Therefore
Set
hence
However,
and
Thus, again, we need just \(\approx 4M+N\) flops to compute the first \(M+1\) terms of \(\mathscr {D}_N\varvec{f}_{\!N}\).
4.2 The Speed of Convergence
The convergence of orthogonal polynomials to ‘nice’ (in particular, analytic) functions is well understood, and this can be leveraged to the case of W-functions. It is beneficial first, though, to present some computational results, to highlight the importance of choosing the right value of \(\alpha \) in the context of either Laguerre or ultraspherical weights, while comparing them to standard approximation by the underlying orthogonal polynomials.
It rapidly becomes apparent that we have a competition between different imperatives:
-
The number of zero boundary conditions: This determines the value of \(\alpha \) and, according to Theorems 7 and 8, we need \(\alpha >s-1\) in .
-
Regularity of approximating functions: While \(\mathscr {P}\) consists of polynomials, hence analytic functions, this is not the case with \(\Phi \), whether in the context of ultraspherical or Laguerre weights: it all depends on the value of \(\alpha \). If \(\alpha \) is an even integer, then the \(\varphi _n\)s are analytic; otherwise, analyticity fails at the endpoints.
-
The underlying function space: Much depends on how the error is measured. Among the many possibilities, we single out two: the norm for a suitable value of p (in particular, the \(\text {L}_2(a,b)\) norm) and the \(\text {L}_\infty [a,b]\) (and, more generally, \(\text {H}_\infty ^p[a,b]\)) norm. The choice of a norm depends on the underlying application.
Preliminary numerical experimentation reveals a remarkable state of affairs. In Fig. 4, we let \(\alpha \) be in \(\{1,2,3,4\}\). In this and all figures in this paper, we denote \(\alpha =1\) by a red, dotted line, \(\alpha =2\) by a magenta solid line, \(\alpha =3\) by a green dashed line and, finally, \(\alpha =4\) by a blue dash-dotted line. Because of the rapid decay of errors, we display them all in a logarithmic scale to base 10—in other words, the y-axis displays the number of decimal digits. Given a function f and recalling the expansion coefficients \({\hat{f}}_n^P\) and \({\hat{f}}^\Phi _n\) from Remark 1, corresponding to expansions in \(\mathscr {P}\) and \(\Phi \), respectively, we let
Thus, \(F^P_N-f\) and \(F^\Phi _N-f\) are the (pointwise) errors with respect to the polynomial and the W-function basis, respectively, and we need to measure them in an appropriate norm. We denote by \({}^d\!F_N^P\) the derivative expansion, i.e. with \(p_n\) and f replaced by \(p_n'\) and \(f'\), respectively, similarly for higher derivatives and for \(F^\Phi _N\).
4.2.1 Ultraspherical W-Functions
We commence from ultraspherical weights and consider
In Fig. 4, we display in logarithmic norm the \(\text {L}_\infty [-1,1]\) error for polynomial approximation to f and its first derivative (top row) and for W-functions for the ultraspherical weight.Footnote 7 Polynomial approximation—as can be expected from general theory and the analyticity of f—decays at an exponential speed and, for \(N=30\), we attain \(\approx 32\) significant digits. This is also the case with derivatives, with a very minor degradation in accuracy. The error for W-functions, though, is radically different. The errors for \(\alpha \in \{1,3,4\}\) decay very slowly, at a polynomial rate, and for \(N=30\) we recover just \(\approx 4\) significant digits, an unacceptably large error. On the other hand, the error for \(\alpha =2\) at \(N=30\) is \(\approx 3\times 10^{-39}\), significantly better than polynomial approximation!
The reason for this ‘miraculous’ behaviour for \(\alpha =2\) bears some attention. Little surprise perhaps that \(\alpha =1\) behaves poorly because it is at the wrong end of the boundedness condition for \(\mathscr {D}^{\,2}\). However, as a matter of fact, we do not consider second derivatives in this particular instance and \(\alpha \in \{3,4\}\) are just as bad. The reasons are as follows. For \(\alpha \in \{1,3\}\), the \(\varphi _n\)s have a weak singularity along the boundary, while \(\varphi _n'\) becomes singular there. For \(\alpha =4\), on the other hand, \(\varphi _n'(\pm 1)=0\) mean that \(\text {L}_\infty \) convergence of derivatives is impossible unless also the derivatives of f vanish at the endpoints. (This is the reason why \(\log _{10}\Vert {}^d\!F_N^\Phi -f'\Vert _\infty \) is displayed only for \(\alpha =2\).)
Not much changes if, instead of \(\text {L}_\infty \), we compute an error, except that in general \(\text {L}_2\)-like norms are more forgiving. In principle, neither singularities or excessive vanishing of derivatives at the endpoints need prevent convergence. Thus, in Fig. 5 we plot the \(\text {L}_2(-1,1)\) errors for example (4.2). The overall picture remains the same: polynomial approximation decays at exponential rate and we attain, regardless of the choice of \(\alpha \), about 34 significant digits for \(N=30\), while W-function approximation for \(\alpha \in \{1,3,4\}\) is very poor yet, for \(\alpha =2\), we again hit the ‘sweet spot’ and recover \(\approx 38\) significant digits. W-functions are vastly superior for \(\alpha =2\) but fail dismally otherwise.
To explore further the error committed by ultraspherical W-functions, we consider
the only difference in this (not very imaginative!) choice is that now \(f(\pm 1)=f'(\pm 1)=0\). We display the \(\text {L}_\infty \) error for \(f^{(i)}\), \(i=0,1,2\), in Fig. 6 for the W-functions. The error in polynomial approximation is roughly independent of \(\alpha \), and for \(N=30\) we attain \(\approx 24\) decimal digits for f, \(\approx 21\) for \(f'\) and \(\approx 19\) for \(f''\). By this stage, we should not be surprised that \(\alpha =1\) and \(\alpha =3\) do badly in approximating f because of the weak singularity at the endpoints and they fail altogether approximating derivatives. For \(\alpha \in \{2,4\}\), the endpoints are analytic and indeed the underlying functions do very well indeed, definitely better than polynomial approximation. \(\alpha =4\) is a winner, unsurprisingly because and this is matched by \(\Phi \). However, \(\alpha =2\) does quite well, worse by perhaps two decimal digits but still beating polynomial approximation. The reason is that too few zero Dirichlet boundary conditions do not prevent \(\text {L}_\infty \) convergence of an orthogonal sequence, although they might slow it up to a modest extent. On the other hand, excessive zero Dirichlet boundary conditions prevent \(\text {L}_\infty \) convergence at the endpoints. Thus, the interplay between the number of zero boundary conditions and the choice of \(\alpha \) is not symmetric! It is always better to err by choosing smaller \(\alpha \), as long as it is an even integer, consistent with the bound of Theorem 8.
4.2.2 Laguerre W-Functions
We are now concerned with the Laguerre weight and choose the model problem
Note that \(f(0)=0\), \(f'(0)\ne 0\).
An expansion in Laguerre (or any other) polynomials cannot be bounded in an infinite interval; hence, instead of plotting \(\log _{10}\Vert F_N^P-f\Vert _2\) for increasing values of N, we choose \(N=40\) and plot the pointwise error in the interval [0, 30]. This is evident on the left of Fig. 7: the error is just about fine for small \(x>0\), subsequently growing rapidly (as a matter of fact, exponentially). On the other hand, as can be seen on the right of that figure, the error of W-functions is uniformly bounded. For \(\alpha \in \{1,3,4\}\), it is fairly similar—and unacceptably large—while for \(\alpha =2\) we attain \(\approx 10\) decimal digits of accuracy, apparently uniformly in \([0,\infty )\). Yet again we have the ‘sweet spot’ for \(\alpha =2\). This state of affairs remains true for the first few derivatives, and the deterioration in accuracy using W-functions is very mild indeed.
Finally, we consider
Now \(f(0)=f'(0)=0\) and \(f''(0)\ne 0\). There is no need to display the \(\text {L}_\infty [0,\infty )\) error committed by Laguerre polynomials since, again, it is unbounded.
In Fig. 8, we employ the same colour and style scheme to plot the errors committed in [0, 30] for f, \(f'\) and \(f''\). Clearly, \(\alpha =2\) and \(\alpha =4\), the two values associated with analyticity at the origin, win insofar as approximating the function itself is concerned, although the margin is somewhat smaller than in our other examples. The approximation of the first and the second derivatives is more interesting: on the face of it, it is a dead heat between \(\alpha =2\) and \(\alpha =4\), but closer examination of the behaviour near the left endpoint unravels a crucial difference. For example, for \(\eta =10^{-10}\) we have (to four significant digits)
\(\alpha \) | 1 | 2 | 3 | 4 |
---|---|---|---|---|
\(|F_{60}^\Phi (\eta )-f(\eta )|\) | \(2.128_{-08}\) | \(9.631_{-15}\) | \(1.434_{-16}\) | \(7.778_{-24}\) |
\(|{}^d\!F_{60}^\Phi (\eta )-f'(\eta )|\) | \(1.064_{+02}\) | \(9.631_{-05}\) | \(2.151_{-06}\) | \(9.555_{-14}\) |
\(|{}^{dd}\!F^\Phi _{60}(\eta )-f''(\eta )|\) | \(5.319_{+11}\) | \(4.092_{-03}\) | \(1.075_{+05}\) | \(9.555_{-04}\) |
The conclusion is clear. Once the inequality of Theorem 7 is breached, the approximation blows up at the origin: this happens with \(\alpha =1\) and any derivative. The error for \(\alpha =3\) decays for \(N\gg 1\) for the function value and the first derivative, but it blows up for the second derivative, while for \(\alpha =2\) the progression to the correct boundary condition is considerably slower than for \(\alpha =4\). This is apparent from Fig. 9: \(\alpha =4\) wins, although by a small margin.
4.2.3 The Speed of Convergence Redux
As promised, we leverage standard theory on the convergence of orthogonal expansions of analytic functions to the setting of W-functions. For simplicity, we consider just the ultraspherical weight, supported in \([-1,1]\), but our argument readily extends to all other settings. Thus, suppose that the function f is analytic in a Bernstein ellipse enveloping \([-1,1]\) and \(f(\pm 1)=0\) and let \({\tilde{f}}(x)=f(x)/(1-x^2)\). Note that \({\tilde{f}}\) is also analytic in \([-1,1]\). Expanding f in powers of an ultraspherical W-function yields the coefficients
and, once \(\alpha =2\), we have standard expansion of \({\tilde{f}}\) in the ultraspherical polynomial basis \(\{\text {P}_n^{(2,2)}\}_{n\in {\mathbb {Z}}_+}\) and standard results for expansions in orthogonal polynomials (cf. for example [29]) apply. In particular,
and the rate of convergence in the \(\text {L}_2\) norm is at least exponential. The same is true for any even integer \(\alpha \ge 2\).
Similar reasoning applies to other weights and to higher-order zero boundary conditions.
Needless to say, choosing \(\alpha \in 2{\mathbb {N}}\) is essential, but the ultimate choice depends on the number of zero boundary conditions, in the spirit of the previous two subsections.
4.3 Outstanding Computational and Theoretical Challenges
This is the first paper to consider W-functions in an organised way, although of course Hermite functions have been used and investigated extensively and W-functions associated with Freud weights (and which are special because of Theorem 2) have been introduced in [19]. Needless to say, this work neither resolves all the mathematical and computational issues associated with W-functions nor claims to do so. While there are important theoretical questions, e.g. to characterise all separable or symmetrically separable weight functions, perhaps the most urgent issues are related to the applications of W-functions to spectral methods. This concerns issues in approximation theory (speed of convergence in different function classes), as well as purely computational questions. The speed of approximation points out an imperfect duality between W-functions and the functions \(\Psi =\{\psi _n\}_{n\in {\mathbb {Z}}_+}\) from Sect. 1. Recalling the \({\hat{f}}_n^\Phi \), the nth expansion coefficient in \(\mathscr {P}\) and letting \({\check{w}}(x)=w(x)\chi _{(a,b)}(x)\), the Plancherel theorem yields at once for every \(n\in {\mathbb {Z}}_+\)
and we recover an expansion in \(\Psi \) of the Fourier transform of f. This duality, though, is imperfect because, unless \((a,b)={\mathbb {R}}\), it is valid (insofar as \(\Psi \) is concerned) only in the Paley–Wiener space \({\mathcal {P}}_{(a,b)}({\mathbb {R}})\) rather than in \(\text {L}_2({\mathbb {R}})\) [11]. Moreover, comprehensive convergence theory for functions of the form \(\Psi \) is also lacking. Yet, even an imperfect duality might potentially lead to useful outcomes.
The final issue we wish to mention is fast computation of expansion coefficients in a W-function basis, similar perhaps to fast expansion algorithms in polynomial bases [22]. All this is a matter for future research.
Notes
It is elementary to reduce nonzero Dirichlet conditions to zero ones by reformulating the PDE for \(u^{\text {new}}=u-\rho \), where \(\rho \) is any sufficiently regular function that obeys the correct Dirichlet boundary conditions on the boundary.
In a finite interval, we would have had an essential singularity at an endpoint.
\({\tilde{w}}_1\) cannot be a ‘true’ weight function because \(w(a)=w(b)=0\); hence, \(w'\) changes sign in (a, b).
While this is probably known, the author failed to find this result in literature, even in the encyclopaedic review of the Christoffel–Darboux kernel in Simon [25], see also Refs. [15, 18]—the reason might well be that the emphasis is usually on general Borel measures, rather than on \(\,\text {d}\mu =w\,\text {d}x\) with \(w\in \text {C}^1(a,b)\). One way or the other, the proof is included for completeness.
Also known, subject to different scaling, as the Gegenbauer weight [23, p. 276].
Polynomial approximation, of course, leads to an unstable spectral method. Yet, its error and its comparison with the error committed by W-functions are of independent interest.
And it might well be already known.
References
Philipp Bader, Arieh Iserles, Karolina Kropielnicka, and Pranav Singh. Effective approximation for the semiclassical Schrödinger equation. Found. Comput. Math., 14(4):689–720, 2014.
Sergio Blanes and Vasile Gradinaru. High order efficient splittings for the semiclassical time-dependent Schrödinger equation. J. Comput. Phys., 405:109157, 13, 2020.
C. Canuto, M. Y. Hussaini, A. Quarteroni, and T. A. Zang. Spectral methods. Scientific Computation. Springer-Verlag, Berlin, 2006.
Diego Caratelli, Ernesto Palini, and Paolo Emilio Ricci. Finite dimensional applications of the Dunford-Taylor integral. Bull. TICMI, 25(1):63–75, 2021.
T. S. Chihara. An Introduction to Orthogonal Polynomials. Mathematics and its Applications, Vol. 13. Gordon and Breach Science Publishers, New York-London-Paris, 1978.
Y. Eidelman and I. Gohberg. Algorithms for inversion of diagonal plus semiseparable operator matrices. Integral Equations Operator Theory, 44(2):172–211, 2002.
Wolfgang Hackbusch. Hierarchical matrices: algorithms and analysis, volume 49 of Springer Series in Computational Mathematics. Springer, Heidelberg, 2015.
Ernst Hairer, Christian Lubich, and Gerhard Wanner. Geometric numerical integration, volume 31 of Springer Series in Computational Mathematics. Springer-Verlag, Berlin, second edition, 2006.
Jan S. Hesthaven, Sigal Gottlieb, and David Gottlieb. Spectral methods for time-dependent problems, volume 21 of Cambridge Monographs on Applied and Computational Mathematics. Cambridge University Press, Cambridge, 2007.
Marlis Hochbruck and Christian Lubich. On Krylov subspace approximations to the matrix exponential operator. SIAM J. Numer. Anal., 34(5):1911–1925, 1997.
Arieh Iserles and Marcus Webb. Orthogonal systems with a skew-symmetric differentiation matrix. Found. Comput. Math., 19(6):1191–1221, 2019.
Arieh Iserles and Marcus Webb. A family of orthogonal rational functions and other orthogonal systems with a skew-Hermitian differentiation matrix. J. Fourier Anal. Appl., 26(1):Paper No. 19, 2020.
Arieh Iserles and Marcus Webb. A differential analogue of Favard’s theorem. In From operator theory to orthogonal polynomials, combinatorics, and number theory—a volume in honor of Lance Littlejohn’s 70th birthday, volume 285 of Oper. Theory Adv. Appl., pages 239–263. Birkhäuser/Springer, Cham, 2021.
Arieh Iserles and Marcus Webb. Fast computation of orthogonal systems with a skew-symmetric differentiation matrix. Comm. Pure Appl. Math., 74(3):478–506, 2021.
Mourad E. H. Ismail. Classical and Quantum Orthogonal Polynomials in One Variable, volume 98 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge, 2005.
V. P. Konoplev. Polynomials orthogonal with respect to weight functions which are zero or infinite at isolated points of the interval of orthogonality. Dokl. Akad. Nauk SSSR, 141:781–784, 1961.
V. P. Konoplev. The asymptotic behaviour of orthogonal polynomials at one-sided singularities of weight functions (algebraic singularities). Dokl. Akad. Nauk SSSR, 160:997–1000, 1965.
Jean Bernard Lasserre, Edouard Pauwels, and Mihai Putinar. The Christoffel–Darboux Kernel for Data Analysis, volume 38 of Cambridge Monographs on Applied and Computational Mathematics. Cambridge University Press, Cambridge, 2022.
Karen Min Luong. Approximation of Wave Packets on the Real Line. PhD thesis, University of Cambridge, 2023.
Robert I. McLachlan and G. Reinout W. Quispel. Splitting methods. Acta Numer., 11:341–434, 2002.
Frank W. J. Olver, Daniel W. Lozier, Ronald F. Boisvert, and Charles W. Clark, editors. NIST Handbook of Mathematical Functions. U.S. Department of Commerce, National Institute of Standards and Technology, Washington, DC; Cambridge University Press, Cambridge, 2010.
Sheehan Olver, Richard Mikaël Slevinsky, and Alex Townsend. Fast algorithms using orthogonal polynomials. Acta Numer., 29:573–699, 2020.
Earl D. Rainville. Special Functions. The Macmillan Co., New York, 1960.
Barry Simon. Orthogonal polynomials on the unit circle. Part 1, volume 54 of American Mathematical Society Colloquium Publications. American Mathematical Society, Providence, RI, 2005.
Barry Simon. The Christoffel–Darboux kernel. In Perspectives in Partial Differential Equations, Harmonic Analysis and Applications, volume 79 of Proc. Sympos. Pure Math., pages 295–335. Amer. Math. Soc., Providence, RI, 2008.
Gábor Szegő. Orthogonal polynomials. American Mathematical Society Colloquium Publications, Vol. XXIII. American Mathematical Society, Providence, R.I., fourth edition, 1975.
Lloyd N. Trefethen. Spectral methods in MATLAB, volume 10 of Software, Environments, and Tools. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2000.
Ellen Van Camp, Nicola Mastronardi, and Marc Van Barel. Two fast algorithms for solving diagonal-plus-semiseparable linear systems. In Proceedings of the 10th International Congress on Computational and Applied Mathematics (ICCAM-2002), volume 164/165, pages 731–747, 2004.
J. L. Walsh. Interpolation and approximation by rational functions in the complex domain. American Mathematical Society Colloquium Publications, Vol. XX. American Mathematical Society, Providence, R.I., fourth edition, 1965.
Acknowledgements
The author wishes to thank both referees, whose comprehensive and well-informed comments have led to a much better paper.
Author information
Authors and Affiliations
Corresponding author
Additional information
Communicated by Endre Sűli.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Separability Coefficients for Laguerre Weights
Our starting point is the generating function
[23, p. 202] and we recall that
Set
Because of
and (2.3), it follows that
We set
Using (A.6) it follows that for \(|s|,|t|<1\),
(Cf. for example, [23] for the definition and basic facts§on confluent hypergeometric functions.) We now expand: all it takes is elementary (but long) algebra:
The following proposition can be trivially proved by induction.Footnote 8
Proposition 11
We thus deduce from the definition of Q that \(q_{n,n}=0\) and
and conclude that
for \(m\ge n+1\), with skew-symmetric completion for \(m\le n\).
We have just determined both separability and the explicit form of the sequences \({\mathfrak {a}}\) and \({\mathfrak {b}}\).
Theorem 12
The Laguerre weight is separable and
Symmetric Separability Coefficients for Ultraspherical Weights
We recall that
and we are concerned with \(m\ge n\) (\(S_{m,n}\) is symmetric) and even \(m+n\). We commence by dividing \(\text {P}_n^{(\alpha ,\alpha )}\) by \((1-x^2)\)—it follows from the Euclidean algorithm that
where \(\gamma _n\in {\mathbb {P}}_{n-1}\) and \(\delta _n(x)=\delta _{n,0}+\delta _{n,1} x\) is linear. Because of parity, if n is even then \(\delta _1=0\), while if it is odd, then \(\delta _0=0\). The description of \(\delta \) can be completed by considering \(x=1\),
therefore,
Since \(\deg \gamma \le n-1\le m-1\), it follows from orthogonality that
Letting
we thus have
We wish to prove that
To this end, it is helpful to rewrite (B.1) in the form
To prove that (B.1) is identical to (B.2) for \({\mathfrak {e}}_m\) we commence from the latter, noting that it is the same as
and use the Gamma duplication formula
[21, 5.5.5]. Letting \(z=m+\alpha +\frac{1}{2}\), we have
and obtain (B.1) following elementary manipulation. An identical procedure applies to \({\mathfrak {o}}_m\).
Replacing m by 2m in (3.6) results in the recursion
while replacing m by \(2m+1\) results in
Replacing \(x^2=1-(1-x^2)\) and using orthogonality,
for \(m\in {\mathbb {N}}\). Thus,
We compute directly
(this is consistent with (B.2) for \(m=0\)), whereby (B.2) follows from (B.3) and (B.4) by easy induction. We deduce that
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Iserles, A. Stable Spectral Methods for Time-Dependent Problems and the Preservation of Structure. Found Comput Math (2024). https://doi.org/10.1007/s10208-024-09647-w
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s10208-024-09647-w