Abstract
Motivated by a paper of Dragomir, we give new refinements of both the discrete and the integral Jensen inequality using the Jensen gap. As applications, we give refinements of various inequalities that can be derived from Jensen's inequality. Topics covered: norms, quasi-arithmetic means, Hölder's inequality and f-divergences in information theory.
1 Introduction
Let C be a convex subset of a real vector space V. A function \(f:C\rightarrow \mathbb {R}\) is said to be convex if whenever u, \(v\in C\) and \(\alpha \in [ 0,1] \) we have
$$\begin{aligned} f\left( \alpha u+\left( 1-\alpha \right) v\right) \le \alpha f\left( u\right) +\left( 1-\alpha \right) f\left( v\right) . \end{aligned}$$
The set of positive integers will be denoted by \(\mathbb {N}_{+}\).
Let the set I denote either \(\left\{ 1,\ldots ,n\right\} \) for some \(n\ge 1\) or \(\mathbb {N}_{+}\). We say that the numbers \(\left( p_{i}\right) _{i\in I}\) represent a discrete probability distribution if \(p_{i}\ge 0\) \(\left( i\in I\right) \) and \(\sum \nolimits _{i\in I}p_{i}=1\); the distribution is called positive if \(p_{i}>0\) \(\left( i\in I\right) \).
Jensen’s inequality is one of the most important inequalities regarding convex functions.
The following discrete and integral versions of Jensen’s inequality are well known (see [10]).
Theorem 1.1
(discrete Jensen inequality for finite sums) Let C be a convex subset of a real vector space V, and let \(f:C\rightarrow \mathbb {R}\) be a convex function. If \(p_{1},\ldots ,p_{n}\) represent a discrete probability distribution, and \(v_{1},\ldots ,v_{n}\in C\), then
$$\begin{aligned} f\left( \sum \limits _{i=1}^{n}p_{i}v_{i}\right) \le \sum \limits _{i=1}^{n}p_{i}f\left( v_{i}\right) . \end{aligned}$$(1.1)
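As a quick numerical sanity check (with made-up data, not taken from the paper), the discrete Jensen inequality of Theorem 1.1 can be verified for the convex function \(f(x)=x^{2}\):

```python
# Numerical sanity check of the discrete Jensen inequality (Theorem 1.1)
# for the convex function f(x) = x^2; weights and points are invented data.
f = lambda x: x * x
p = [0.2, 0.5, 0.3]          # a discrete probability distribution
v = [1.0, 4.0, 2.5]          # points of C = R

lhs = f(sum(pi * vi for pi, vi in zip(p, v)))   # f(sum_i p_i v_i)
rhs = sum(pi * f(vi) for pi, vi in zip(p, v))   # sum_i p_i f(v_i)
assert lhs <= rhs            # here 8.7025 <= 10.075
```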
Theorem 1.2
(integral Jensen inequality) Let \(\varphi \) be an integrable function on a probability space \(\left( X,\mathcal {A},\mu \right) \) taking values in an interval \(C\subset \mathbb {R}\). Then \( {\displaystyle \int \nolimits _{X}} \varphi d\mu \) lies in C. If f is a convex function on C such that \(f\circ \varphi \) is \(\mu \)-integrable, then
$$\begin{aligned} f\left( {\displaystyle \int \nolimits _{X}} \varphi d\mu \right) \le {\displaystyle \int \nolimits _{X}} f\circ \varphi d\mu . \end{aligned}$$(1.2)
The following refinement of the discrete Jensen inequality for finite sums can be found in [4]. Its special case \(n=2\) originates from the former paper [5].
Theorem 1.3
Let C be a convex subset of a real vector space V, and let \(f:C\rightarrow \mathbb {R}\) be a convex function. If \(p_{1},\ldots ,p_{n}\) represent a discrete probability distribution, \(q_{1},\ldots ,q_{n}\) represent a positive discrete probability distribution, and \(v_{1},\ldots ,v_{n}\in C\), then
$$\begin{aligned} f\left( \sum \limits _{i=1}^{n}p_{i}v_{i}\right) +\min _{1\le j\le n}\frac{p_{j}}{q_{j}}\left( \sum \limits _{i=1}^{n}q_{i}f\left( v_{i}\right) -f\left( \sum \limits _{i=1}^{n}q_{i}v_{i}\right) \right) \le \sum \limits _{i=1}^{n}p_{i}f\left( v_{i}\right) \end{aligned}$$(1.3)
and
$$\begin{aligned} \sum \limits _{i=1}^{n}p_{i}f\left( v_{i}\right) \le f\left( \sum \limits _{i=1}^{n}p_{i}v_{i}\right) +\max _{1\le j\le n}\frac{p_{j}}{q_{j}}\left( \sum \limits _{i=1}^{n}q_{i}f\left( v_{i}\right) -f\left( \sum \limits _{i=1}^{n}q_{i}v_{i}\right) \right) . \end{aligned}$$(1.4)
The previous result can also be found in paper [9], when C is an interval in \(\mathbb {R}\).
Inequality (1.3) is a refinement of the discrete Jensen inequality, obtained by using the so-called discrete Jensen gap
$$\begin{aligned} \sum \limits _{i=1}^{n}q_{i}f\left( v_{i}\right) -f\left( \sum \limits _{i=1}^{n}q_{i}v_{i}\right) \ge 0. \end{aligned}$$
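The refinement (1.3) can be illustrated numerically; the weights and points below are invented test data, and \(f(x)=x^{2}\) plays the role of the convex function:

```python
# Made-up data illustrating the refinement (1.3): the Jensen gap of the
# weights (q_i), scaled by min_i p_i/q_i, still fits under the Jensen gap
# of the weights (p_i).
f = lambda x: x * x
p = [0.1, 0.6, 0.3]
q = [0.25, 0.25, 0.5]
v = [0.0, 3.0, 1.0]

def jensen_gap(w):
    # sum_i w_i f(v_i) - f(sum_i w_i v_i), nonnegative for convex f
    mean = sum(wi * vi for wi, vi in zip(w, v))
    return sum(wi * f(vi) for wi, vi in zip(w, v)) - f(mean)

c = min(pi / qi for pi, qi in zip(p, q))   # = 0.4 for this data
assert c * jensen_gap(q) <= jensen_gap(p)
```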
Motivated by Theorem 1.3, we obtain new refinements of both the discrete and the integral Jensen inequality, also using the Jensen gap. If \(p_{i}>0\) \(\left( i=1,\ldots ,n\right) \), then it is easy to see that the two inequalities (1.3–1.4) are equivalent, so in this paper we concentrate only on refinements, similar to inequality (1.3), of either the discrete or the integral Jensen inequality. We first show that inequality (1.3) can be extended to the case where \(q_{1},\ldots ,q_{n} \) are only nonnegative, and we give its form for infinite sums. This result allows us to prove a new refinement of the integral Jensen inequality. Paper [4] deals only with discrete inequalities for finite sums, while paper [9] also contains versions of Theorem 1.3 for integrals, although not in general measure spaces, only on Borel sets of compact intervals. While the proofs of the discrete inequalities in [9] carry over essentially unchanged to the integral inequalities there, this is not the case for the integral inequality given here. As applications, we give refinements of various inequalities that can be derived from different types of Jensen's inequality. Topics covered: norms, quasi-arithmetic means, Hölder's inequality and f-divergences in information theory.
2 Preliminary results
We first give a version of the discrete Jensen inequality for series. The author has not found this form in the literature (although it is probably not new), so we prove it.
Theorem 2.1
(discrete Jensen inequality for series) Let C be a closed convex subset of a real normed space V, and let \(f:C\rightarrow \mathbb {R}\) be a convex function. If \(p_{1},p_{2},\ldots \) represent a discrete probability distribution, \(v_{1},v_{2},\ldots \in C\) are such that the series \(\sum \nolimits _{i=1} ^{\infty }p_{i}v_{i}\) and \(\sum \nolimits _{i=1}^{\infty }p_{i}f\left( v_{i}\right) \) are convergent, and f is lower semicontinuous, then
$$\begin{aligned} f\left( \sum \limits _{i=1}^{\infty }p_{i}v_{i}\right) \le \sum \limits _{i=1}^{\infty }p_{i}f\left( v_{i}\right) . \end{aligned}$$(2.1)
Proof
We can suppose that \(p_{1}>0\). By the discrete Jensen inequality for finite sums,
Since
\(\sum \nolimits _{i=1}^{\infty }p_{i}v_{i}\) is convergent and C is closed, \(\sum \nolimits _{i=1}^{\infty }p_{i}v_{i}\in C\). It now follows from (2.2) and the convergence of \(\sum \nolimits _{i=1}^{\infty }p_{i}f\left( v_{i}\right) \) that
and hence the result follows from the lower semicontinuity of f.
The proof is complete. \(\square \)
Next, we obtain an extension of Theorem 1.3 for infinite sums, and show that the two inequalities in Theorem 1.3 are equivalent in the sense that either one follows from the other.
Theorem 2.2
-
(a)
Let C be a closed convex subset of a real normed space V, and let \(f:C\rightarrow \mathbb {R}\) be a convex function. If \(p_{1},p_{2},\ldots \) represent a discrete probability distribution, \(q_{1},q_{2},\ldots \) represent a positive discrete probability distribution, \(v_{1},v_{2},\ldots \in C\) such that the series \(\sum \nolimits _{i=1}^{\infty }p_{i}v_{i}\), \(\sum \nolimits _{i=1} ^{\infty }q_{i}v_{i}\), \(\sum \nolimits _{i=1}^{\infty }p_{i}f\left( v_{i}\right) \ \)and \(\sum \nolimits _{i=1}^{\infty }q_{i}f\left( v_{i}\right) \) are convergent and f is lower semicontinuous, then
$$\begin{aligned} f\left( \sum \limits _{i=1}^{\infty }p_{i}v_{i}\right) +\inf _{i\ge 1} \frac{p_{i}}{q_{i}}\left( \sum \limits _{i=1}^{\infty }q_{i}f\left( v_{i}\right) -f\left( \sum \limits _{i=1}^{\infty }q_{i}v_{i}\right) \right) \le \sum \limits _{i=1}^{\infty }p_{i}f\left( v_{i}\right) . \end{aligned}$$(2.3) -
(b)
If \(p_{i}>0\) \(\left( i=1,\ldots ,n\right) \) in Theorem 1.3, then inequalities (1.3–1.4) are equivalent.
Proof
(a) Let
It is easy to check that
By using this and the discrete Jensen inequality for series, we obtain that
which gives the inequality.
(b) It follows obviously from Theorem 1.3 (by reversing the probability distributions) that
and hence
Since
we have that
The proof is complete. \(\square \)
Remark 2.3
-
(a)
If
$$\begin{aligned} \inf _{i\ge 1}\frac{p_{i}}{q_{i}}=0 \end{aligned}$$(for example, one of the numbers \(p_{1},p_{2}\ldots \) is 0), then (2.3) is just the discrete Jensen inequality, and therefore Theorem 2.2 (a) is really interesting in the case where
$$\begin{aligned} \inf _{i\ge 1}\frac{p_{i}}{q_{i}}>0. \end{aligned}$$(2.5)A similar statement applies to inequality (1.3).
-
(b)
If (2.5) is satisfied, and the series \(\sum \nolimits _{i=1}^{\infty }p_{i}v_{i}\) and \(\sum \nolimits _{i=1}^{\infty }p_{i}f\left( v_{i}\right) \) are absolutely convergent, then the series \(\sum \nolimits _{i=1}^{\infty }q_{i}v_{i}\) and \(\sum \nolimits _{i=1}^{\infty } q_{i}f\left( v_{i}\right) \) are also absolutely convergent. Therefore if V is a complete space, then they are also convergent.
-
(c)
Condition (2.5) by itself does not provide a finite upper bound on the expression \(\sum \nolimits _{i=1}^{\infty }p_{i}f\left( v_{i}\right) \), since it is possible that
$$\begin{aligned} \sup _{i\ge 1}\frac{p_{i}}{q_{i}}=\infty . \end{aligned}$$But if
$$\begin{aligned} 0<\inf _{i\ge 1}\frac{p_{i}}{q_{i}}\le \sup _{i\ge 1}\frac{p_{i}}{q_{i}} <\infty , \end{aligned}$$then
$$\begin{aligned} \sum \limits _{i=1}^{\infty }p_{i}f\left( v_{i}\right) \le f\left( \sum \limits _{i=1}^{\infty }p_{i}v_{i}\right) +\sup _{i\ge 1}\frac{p_{i}}{q_{i} }\left( \sum \limits _{i=1}^{\infty }q_{i}f\left( v_{i}\right) -f\left( \sum \limits _{i=1}^{\infty }q_{i}v_{i}\right) \right) \end{aligned}$$(2.6)holds too, and in this case inequalities (2.3) and (2.6) are also equivalent.
-
(d)
Inequality (2.3) also compares two different discrete Jensen gaps:
$$\begin{aligned} 0\le \inf _{i\ge 1}\frac{p_{i}}{q_{i}}\left( \sum \limits _{i=1}^{\infty } q_{i}f\left( v_{i}\right) -f\left( \sum \limits _{i=1}^{\infty }q_{i} v_{i}\right) \right) \le \sum \limits _{i=1}^{\infty }p_{i}f\left( v_{i}\right) -f\left( \sum \limits _{i=1}^{\infty }p_{i}v_{i}\right) . \end{aligned}$$
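The two constants in parts (c) and (d) of the remark can be illustrated with finite, made-up data: the q-Jensen gap scaled by \(\min _{i}p_{i}/q_{i}\) and by \(\max _{i}p_{i}/q_{i}\) brackets \(\sum _{i}p_{i}f\left( v_{i}\right) \) from below and above, as in (2.3) and (2.6):

```python
# Finite-data illustration of the bracketing in (2.3) and (2.6);
# distributions and points are invented, f(x) = x^2 is the convex function.
f = lambda x: x * x
p = [0.3, 0.4, 0.3]
q = [0.2, 0.5, 0.3]
v = [1.0, 2.0, 5.0]

def mean(w): return sum(wi * vi for wi, vi in zip(w, v))
def gap(w):  return sum(wi * f(vi) for wi, vi in zip(w, v)) - f(mean(w))

lo = min(pi / qi for pi, qi in zip(p, q))   # inf-type constant, 0.8 here
hi = max(pi / qi for pi, qi in zip(p, q))   # sup-type constant, 1.5 here
S = sum(pi * f(vi) for pi, vi in zip(p, v))
assert f(mean(p)) + lo * gap(q) <= S <= f(mean(p)) + hi * gap(q)
```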
3 Main results
The first result shows that neither the positivity of the probability distribution \(q_{1},\ldots ,q_{n}\) in Theorem 1.3 nor the positivity of the probability distribution \(q_{1},q_{2},\ldots \) in Theorem 2.2 (a) is an essential condition.
Theorem 3.1
-
(a)
Let J be a nonempty subset of \(\left\{ 1,\ldots ,n\right\} \). Let C be a convex subset of a real normed space V, and let \(f:C\rightarrow \mathbb {R}\) be a continuous convex function. If \(p_{1},\ldots ,p_{n}\) and \(\left( q_{j}\right) _{j\in J}\) represent positive discrete probability distributions, and \(v_{1},\ldots ,v_{n}\in C\), then
$$\begin{aligned} f\left( \sum \limits _{i=1}^{n}p_{i}v_{i}\right) +\min _{j\in J}\frac{p_{j} }{q_{j}}\left( \sum \limits _{j\in J}q_{j}f\left( v_{j}\right) -f\left( \sum \limits _{j\in J}q_{j}v_{j}\right) \right) \le \sum \limits _{i=1}^{n} p_{i}f\left( v_{i}\right) . \nonumber \\ \end{aligned}$$(3.1) -
(b)
Let J be either a nonempty finite subset of \(\mathbb {N}_{+}\) or an infinite subset of \(\mathbb {N}_{+}\). Let C be a closed convex subset of a real Banach space V, and let \(f:C\rightarrow \mathbb {R}\) be a continuous convex function. If \(p_{1},p_{2},\ldots \) and \(\left( q_{j}\right) _{j\in J}\) represent positive discrete probability distributions, and \(v_{1},v_{2},\ldots \in C\) such that the series \(\sum \nolimits _{i=1}^{\infty }p_{i}v_{i}\), \(\sum \nolimits _{i=1}^{\infty }p_{i}f\left( v_{i}\right) \ \)are absolutely convergent, and the series \(\sum \nolimits _{j\in J}q_{j}v_{j} \), \(\sum \nolimits _{j\in J}q_{j}f\left( v_{j}\right) \) are convergent, then
$$\begin{aligned} f\left( \sum \limits _{i=1}^{\infty }p_{i}v_{i}\right) +\inf _{j\in J} \frac{p_{j}}{q_{j}}\left( \sum \limits _{j\in J}q_{j}f\left( v_{j}\right) -f\left( \sum \limits _{j\in J}q_{j}v_{j}\right) \right) \le \sum \limits _{i=1}^{\infty }p_{i}f\left( v_{i}\right) .\nonumber \\ \end{aligned}$$(3.2) -
(c)
If \(V=\mathbb {R}\), so that C is an interval in \(\mathbb {R}\), then both (a) and (b) remain true without the continuity of f.
Proof
We first prove part (b).
(b) Let \(K:=\mathbb {N}_{+}\setminus J\), and let
If \(J=\mathbb {N}_{+}\), then Theorem 2.2 (a) can be applied, while if \(s=0\), then (3.2) is obvious, and hence we can suppose that \(J\ne \mathbb {N}_{+}\) (in this case \(c>0\)) and \(s>0\).
For every \(0<\varepsilon <\min \left( 1,\frac{c}{s}\right) \) define
Then \(\left( \widehat{q}_{i}\left( \varepsilon \right) \right) _{i=1}^{\infty }\) represents a positive discrete probability distribution for any possible \(\varepsilon \).
Since the series \(\sum \nolimits _{i=1}^{\infty }p_{i}v_{i}\) and \(\sum \nolimits _{i=1}^{\infty }p_{i}f\left( v_{i}\right) \) are absolutely convergent and the space is complete, the series
are also convergent (it is only really interesting if K is an infinite set). It now follows from the convergence of the series \(\sum \nolimits _{j\in J} q_{j}v_{j} \ \)and \(\sum \nolimits _{j\in J}q_{j}f\left( v_{j}\right) \) that
and
for all \(0<\varepsilon <\min \left( 1,\frac{c}{s}\right) \).
Based on the previous two statements, we can apply Theorem 2.2 (a) which gives that
Let
and let \(\delta >0\) be fixed.
Then there exist \(\overline{j}\in J\) and \(0<\overline{\varepsilon }<\min \left( 1,\frac{c}{s}\right) \) such that
Since \(0<\varepsilon <\frac{c}{s}\), for every \(i\in K\) we have
It follows from (3.6) and (3.7) that
which implies
Therefore (3.5) implies the result, by using (3.3), (3.4) and the continuity of f.
(a) The proof is a simplified version of the proof of part (b).
(c) Let the left-hand endpoint of C be \(a\ge -\infty \) and the right-hand endpoint be \(b\le \infty \). Assume \(a\in C\) and f is not continuous at a. Our previous considerations show that it can only be a problem if \(\sum \limits _{j\in K}q_{j}v_{j}=a\). But this is only possible if \(v_{j}=a\) \(\left( j\in K\right) \). In this case, however
If \(b\in C\) and f is not continuous at b, the proof goes similarly.
The proof is complete. \(\square \)
Remark 3.2
- (a)
-
(b)
Assume \(J=\left\{ i_{1},\ldots ,i_{k}\right\} \) where \(1\le i_{1}<\cdots <i_{k}\le n\). Then (3.1) can be written in the following form
$$\begin{aligned} f\left( \sum \limits _{i=1}^{n}p_{i}v_{i}\right) +\min _{1\le j\le k} \frac{p_{i_{j}}}{q_{i_{j}}}\left( \sum \limits _{j=1}^{k}q_{i_{j}}f\left( v_{i_{j}}\right) -f\left( \sum \limits _{j=1}^{k}q_{i_{j}}v_{i_{j}}\right) \right) \le \sum \limits _{i=1}^{n}p_{i}f\left( v_{i}\right) . \end{aligned}$$Inequality (3.2) can be rewritten in a similar form, and in this case K can be a finite or infinite set.
-
(c)
It is easy to check that
$$\begin{aligned} \min _{j\in J}\frac{p_{j}}{q_{j}}\le 1\text { and }\inf _{j\in J}\frac{p_{j} }{q_{j}}\le 1. \end{aligned}$$ -
(d)
Remark 2.3 (d) also applies here.
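A numerical sanity check of inequality (3.1) with a proper subset J; all data below are invented for the test:

```python
# Sanity check of inequality (3.1) with n = 4, J = {2, 4} (1-based indices)
# and f(x) = x^2; p, q, v are made-up test data.
f = lambda x: x * x
p = [0.1, 0.4, 0.2, 0.3]        # distribution on {1,...,4}
v = [0.0, 1.0, 2.0, 3.0]
J = [1, 3]                      # 0-based positions of the indices 2 and 4
q = {1: 0.6, 3: 0.4}            # positive distribution supported on J

gap_J = (sum(q[j] * f(v[j]) for j in J)
         - f(sum(q[j] * v[j] for j in J)))     # Jensen gap over J
c = min(p[j] / q[j] for j in J)                # min_{j in J} p_j / q_j
lhs = f(sum(pi * vi for pi, vi in zip(p, v))) + c * gap_J
rhs = sum(pi * f(vi) for pi, vi in zip(p, v))
assert lhs <= rhs
```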
We are now able to state and prove the analogue of Theorem 3.1 for the integral Jensen inequality.
Theorem 3.3
Let \(\left( X,\mathcal {A}\right) \) be a measurable space with probability measures \(\mu \) and \(\nu \) having the following property:
$$\begin{aligned} 0<s:=\inf _{\left\{ A\in \mathcal {A}\mid \nu \left( A\right)>0\right\} }\frac{\mu \left( A\right) }{\nu \left( A\right) }. \end{aligned}$$(3.8)
Assume \(\varphi :X\rightarrow \mathbb {R}\) is a measurable function taking values in an interval \(C\subset \mathbb {R}\) such that \(\varphi \) is \(\mu \)- and \(\nu \)-integrable. If f is a convex function on C such that \(f\circ \varphi \) is \(\mu \)- and \(\nu \)-integrable, then
$$\begin{aligned} f\left( {\displaystyle \int \limits _{X}} \varphi d\mu \right) +s\left( {\displaystyle \int \limits _{X}} \left( f\circ \varphi \right) d\nu -f\left( {\displaystyle \int \limits _{X}} \varphi d\nu \right) \right) \le {\displaystyle \int \limits _{X}} \left( f\circ \varphi \right) d\mu . \end{aligned}$$(3.9)
Proof
(i) Assume first that \(\varphi \) is a simple function on X, which means that it has only finitely many different values. If \(\left\{ c_{1},\ldots ,c_{k}\right\} \subset C\) is the set of distinct values of \(\varphi \), then the sets \(A_{i}:=\left\{ x\in X\mid \varphi \left( x\right) =c_{i}\right\} \) \(\left( i=1,\ldots ,k\right) \) are pairwise disjoint elements of \(\mathcal {A}\).
In this case inequality (3.9) can be written in the form
It follows from (3.8) that \(\mu \left( A\right) =0\) implies \(\nu \left( A\right) =0\), and hence we can suppose \(\mu \left( A_{i}\right) >0\) \(\left( i=1,\ldots ,k\right) \). According to this and
inequality (3.10) is an immediate consequence of Theorem 3.1 (a).
(ii) Now assume that f is continuous.
It is well known (see [6]) that there exists a sequence \(\left( \varphi _{n}\right) _{n=1}^{\infty }\) of \(\mathcal {A}\)-measurable simple functions defined on X such that \(\left| \varphi _{1}\right| \le \left| \varphi _{2}\right| \le \ldots \le \left| \varphi _{n}\right| \le \ldots \le \left| \varphi \right| \) and \(\varphi _{n}\left( x\right) \rightarrow \varphi \left( x\right) \) for each \(x\in X\). It follows that \(\varphi _{n}\left( x\right) \in C\) \(\left( x\in X\right) \) and \(\varphi _{n}\) is \(\mu \)- and \(\nu \)-integrable for all \(n\in \mathbb {N}_{+}\).
By part (i), we obtain that
The dominated convergence theorem implies that
By the continuity of f,
Since f is convex and continuous, it is either monotonic on C or it has a global minimum at an interior point \(t_{0}\) of C (see [10]). In the first case
while in the second case
It follows from our previous observations that we can again apply the dominated convergence theorem showing that
Now the result comes from inequality (3.11).
(iii) Finally, assume f is not continuous at least at one endpoint of the interval C.
Then it is not hard to see that there exists a decreasing sequence \(\left( f_{n}\right) _{n=1}^{\infty }\) of convex functions defined on C such that \(f_{n}\) is continuous and \(f_{n}\circ \varphi \) is \(\mu \)- and \(\nu \)-integrable for all \(n\in \mathbb {N}_{+}\), and also \(f_{n}\left( t\right) \rightarrow f\left( t\right) \) for each \(t\in C\). In this case the sequence \(\left( f_{n}\circ \varphi \right) _{n=1}^{\infty }\) is decreasing too, and \(f_{n}\left( \varphi \left( x\right) \right) \rightarrow f\left( \varphi \left( x\right) \right) \) for each \(x\in X\), and hence Beppo Levi’s theorem shows that
Now, we can apply part (ii).
The proof is complete. \(\square \)
Let \(\left( X,\mathcal {A}\right) \) be a measurable space. The unit mass at \(x\in X\) (the Dirac measure at x) is denoted by \(\varepsilon _{x}\). The set of all subsets of X is denoted by \(P\left( X\right) \).
Remark 3.4
-
(a)
As we have seen in the proof (or as follows directly from condition (3.8)), the measure \(\nu \) is continuous with respect to the measure \(\mu \). Then \(\nu \) has a Radon–Nikodym derivative (or density) \(q:X\rightarrow \mathbb {R}\) with respect to \(\mu \). By using this, the condition (3.8) and inequality (3.9) can be written in the following form:
$$\begin{aligned} 0<s:=\inf _{\left\{ A\in \mathcal {A}\mid \int \limits _{A}qd\mu >0\right\} } \frac{\mu \left( A\right) }{\int \limits _{A}qd\mu } \end{aligned}$$and
$$\begin{aligned} f\left( {\displaystyle \int \limits _{X}} \varphi d\mu \right) +s\left( {\displaystyle \int \limits _{X}} \left( f\circ \varphi \right) qd\mu -f\left( {\displaystyle \int \limits _{X}} \varphi qd\mu \right) \right) \le {\displaystyle \int \limits _{X}} \left( f\circ \varphi \right) d\mu . \end{aligned}$$ -
(b)
It is easy to see that \(s\le 1\).
-
(c)
Let \(C\subset \mathbb {R}\) be an interval, and let \(f:C\rightarrow \mathbb {R}\) be a convex function. Assume \(p_{1},\ldots ,p_{n}\) represent a positive discrete probability distribution, \(q_{1},\ldots ,q_{n}\) represent a discrete probability distribution, and \(v_{1},\ldots ,v_{n}\in C\). Consider the discrete measures
$$\begin{aligned} \mu :=\sum \limits _{i=1}^{n}p_{i}\varepsilon _{i}\text { and }\nu :=\sum \limits _{i=1}^{n}q_{i}\varepsilon _{i} \end{aligned}$$on the set of all subsets of \(\left\{ 1,\ldots ,n\right\} \). Then obviously
$$\begin{aligned} 0<s:=\min _{\left\{ A\subset \left\{ 1,\ldots ,n\right\} \mid \nu \left( A\right) >0\right\} }\frac{\mu \left( A\right) }{\nu \left( A\right) }, \end{aligned}$$and if \(\varphi \) is defined on \(\left\{ 1,\ldots ,n\right\} \) by \(\varphi \left( i\right) :=v_{i}\), then Theorem 3.3 gives that
$$\begin{aligned} f\left( \sum \limits _{i=1}^{n}p_{i}v_{i}\right) +s\left( \sum \limits _{i=1} ^{n}q_{i}f\left( v_{i}\right) -f\left( \sum \limits _{i=1}^{n}q_{i} v_{i}\right) \right) \le \sum \limits _{i=1}^{n}p_{i}f\left( v_{i}\right) . \end{aligned}$$This is not as sharp a result as Theorem 3.1 (a), since
$$\begin{aligned} s\le \min _{\left\{ j\in \left\{ 1,\ldots ,n\right\} \mid q_{j}>0\right\} }\frac{p_{j}}{q_{j}}, \end{aligned}$$yet it is quite natural, since \(\varphi \) is now a simple function, while in the general case \(\varphi \) must be approximated by a sequence of simple functions.
-
(d)
We can also see from (c) that the measure \(\mu \) is in general not continuous with respect to the measure \(\nu \).
-
(e)
Similarly to Remark 2.3 (d), a comparison of different integral Jensen gaps can be obtained here:
$$\begin{aligned} 0\le s\left( {\displaystyle \int \limits _{X}} \left( f\circ \varphi \right) d\nu -f\left( {\displaystyle \int \limits _{X}} \varphi d\nu \right) \right) \le {\displaystyle \int \limits _{X}} \left( f\circ \varphi \right) d\mu -f\left( {\displaystyle \int \limits _{X}} \varphi d\mu \right) . \end{aligned}$$
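The constant s of part (c) can be computed by brute force over all nonempty subsets, which also illustrates that \(s\le \min _{j}p_{j}/q_{j}\); the distributions and points below are made up:

```python
# Brute-force computation of s from Remark 3.4 (c) over all nonempty
# subsets of {1,2,3}, plus a check of the resulting refinement; the
# distributions and points are invented test data.
from itertools import combinations

f = lambda x: x * x
p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]
v = [1.0, 2.0, 4.0]
idx = range(3)

subsets = [A for r in range(1, 4) for A in combinations(idx, r)]
s = min(sum(p[i] for i in A) / sum(q[i] for i in A)
        for A in subsets if sum(q[i] for i in A) > 0)

assert s <= min(p[i] / q[i] for i in idx)      # s is the weaker constant
gap_q = (sum(qi * f(vi) for qi, vi in zip(q, v))
         - f(sum(qi * vi for qi, vi in zip(q, v))))
lhs = f(sum(pi * vi for pi, vi in zip(p, v))) + s * gap_q
assert lhs <= sum(pi * f(vi) for pi, vi in zip(p, v))
```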
It is worth noting the following version of the previous theorem.
Corollary 3.5
Let \(\left( X,\mathcal {A}\right) \) be a measurable space with a \(\sigma \)-finite measure \(\xi \) on \(\mathcal {A}\), and let p, \(q:X\rightarrow \mathbb {R}\) be positive and \(\xi \)-integrable functions such that
$$\begin{aligned} {\displaystyle \int \limits _{X}} pd\xi = {\displaystyle \int \limits _{X}} qd\xi =1 \end{aligned}$$(3.12)
and
$$\begin{aligned} 0<s:=\inf _{\left\{ A\in \mathcal {A}\mid \int \limits _{A}qd\xi >0\right\} }\frac{\int \limits _{A}pd\xi }{\int \limits _{A}qd\xi }. \end{aligned}$$(3.13)
Assume \(\varphi :X\rightarrow \mathbb {R}\) is a measurable function taking values in an interval \(C\subset \mathbb {R}\) for which \(\varphi p\) and \(\varphi q\) are \(\xi \)-integrable functions. If f is a convex function on C such that \(\left( f\circ \varphi \right) p\) and \(\left( f\circ \varphi \right) q\) are \(\xi \)-integrable, then
$$\begin{aligned} f\left( {\displaystyle \int \limits _{X}} \varphi pd\xi \right) +s\left( {\displaystyle \int \limits _{X}} \left( f\circ \varphi \right) qd\xi -f\left( {\displaystyle \int \limits _{X}} \varphi qd\xi \right) \right) \le {\displaystyle \int \limits _{X}} \left( f\circ \varphi \right) pd\xi . \end{aligned}$$(3.14)
Proof
Define the measures \(\mu \) and \(\nu \) on \(\mathcal {A}\) by
By (3.12), \(\mu \) and \(\nu \) are probability measures, and hence
It is also known from the theory of integration that
Theorem 3.3 can be applied.
The proof is complete. \(\square \)
Remark 3.6
-
(a)
The measures \(\mu \) and \(\nu \) are \(\xi \)-continuous and the functions p and q are the Radon–Nikodym derivatives of \(\mu \) and \(\nu \) with respect to \(\xi \), respectively.
-
(b)
If the measure \(\xi \) is not \(\sigma \)-finite, then there is no positive and integrable function on X.
-
(c)
If
$$\begin{aligned} 0<\inf _{X}\frac{p}{q} \end{aligned}$$(3.15)then obviously
$$\begin{aligned} \inf _{X}\frac{p}{q}\le s, \end{aligned}$$which implies that (3.13) holds too, but (3.15) does not follow from (3.13) in general. Although our result is weaker under the condition (3.15), it is also interesting because it is easier to check than condition (3.13).
4 Applications
The first application relates to norms.
Proposition 4.1
Let \(\left( V,\left\| \cdot \right\| \right) \) be a real Banach space, and let \(\alpha \ge 1\).
-
(a)
If \(p_{1},\ldots ,p_{n}\) and \(q_{i_{1}},\ldots ,q_{i_{k}}\) represent positive discrete probability distributions, where \(1\le i_{1}<i_{2}<\cdots <i_{k}\le n\), and \(v_{1},\ldots ,v_{n}\in V\), then
$$\begin{aligned} \left\| \sum \limits _{i=1}^{n}p_{i}v_{i}\right\| ^{\alpha }+\min _{1\le j\le k}\frac{p_{i_{j}}}{q_{i_{j}}}\left( \sum \limits _{j=1}^{k}q_{i_{j} }\left\| v_{i_{j}}\right\| ^{\alpha }-\left\| \sum \limits _{j=1} ^{k}q_{i_{j}}v_{i_{j}}\right\| ^{\alpha }\right) \le \sum \limits _{i=1} ^{n}p_{i}\left\| v_{i}\right\| ^{\alpha }. \end{aligned}$$ -
(b)
Assume \(p_{1},p_{2},\ldots \) represent a positive discrete probability distribution, and \(v_{1},v_{2},\ldots \in V\) such that the series \(\sum \nolimits _{i=1}^{\infty }p_{i}v_{i}\) is absolutely convergent.
- (b\(_{1}\)):
-
If \(q_{i_{1}},\ldots ,q_{i_{k}}\) represent a positive discrete probability distribution, where \(1\le i_{1}<i_{2}<\cdots <i_{k} \), then
$$\begin{aligned} \left\| \sum \limits _{i=1}^{\infty }p_{i}v_{i}\right\| ^{\alpha } +\min _{1\le j\le k}\frac{p_{i_{j}}}{q_{i_{j}}}\left( \sum \limits _{j=1} ^{k}q_{i_{j}}\left\| v_{i_{j}}\right\| ^{\alpha }-\left\| \sum \limits _{j=1}^{k}q_{i_{j}}v_{i_{j}}\right\| ^{\alpha }\right) \le \sum \limits _{i=1}^{\infty }p_{i}\left\| v_{i}\right\| ^{\alpha }. \end{aligned}$$ - (b\(_{2}\)):
-
If \(q_{i_{1}},q_{i_{2}},\ldots \)represent a positive discrete probability distribution, where \(1\le i_{1}<i_{2}<\cdots \), and the series \(\sum \limits _{j=1}^{\infty }q_{i_{j}}v_{i_{j}}\) is absolutely convergent, then
$$\begin{aligned} \left\| \sum \limits _{i=1}^{\infty }p_{i}v_{i}\right\| ^{\alpha } +\inf _{j\ge 1}\frac{p_{i_{j}}}{q_{i_{j}}}\left( \sum \limits _{j=1}^{\infty }q_{i_{j}}\left\| v_{i_{j}}\right\| ^{\alpha }-\left\| \sum \limits _{j=1}^{\infty }q_{i_{j}}v_{i_{j}}\right\| ^{\alpha }\right) \le \sum \limits _{i=1}^{\infty }p_{i}\left\| v_{i}\right\| ^{\alpha }. \end{aligned}$$
Proof
It is well known that the function \(f:V\rightarrow \mathbb {R}\), \(f\left( x\right) :=\left\| x\right\| ^{\alpha }\) is continuous and convex.
-
(a)
Theorem 3.1 (a) can be applied.
-
(b)
In this case the series \(\sum \nolimits _{i=1}^{\infty }p_{i}\left\| v_{i}\right\| ^{\alpha }\) and \(\sum \nolimits _{j=1}^{\infty }q_{i_{j}}\left\| v_{i_{j}}\right\| ^{\alpha }\) are convergent, and hence the result follows from Theorem 3.1 (b).
The proof is complete. \(\square \)
Remark 4.2
Paper [4] contains the special case of Proposition 4.1 (a), when \(k=n\).
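A sketch of Proposition 4.1 (a) in the Euclidean plane (\(\alpha =2\), \(k=n\)); the vectors and distributions are invented test data:

```python
# Check of Proposition 4.1 (a) in R^2 with the Euclidean norm, alpha = 2
# and k = n; p, q, v are made-up test data.
import math

alpha = 2.0
p = [0.5, 0.5]
q = [0.3, 0.7]
v = [(1.0, 0.0), (0.0, 1.0)]

def comb(w):                         # sum_i w_i v_i in R^2
    return tuple(sum(wi * x[j] for wi, x in zip(w, v)) for j in range(2))

norm_a = lambda x: math.hypot(*x) ** alpha
c = min(pi / qi for pi, qi in zip(p, q))          # min_i p_i / q_i
corr = sum(qi * norm_a(x) for qi, x in zip(q, v)) - norm_a(comb(q))
assert norm_a(comb(p)) + c * corr <= sum(pi * norm_a(x) for pi, x in zip(p, v))
```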
The second application concerns quasi-arithmetic means.
Let \(C\subset \mathbb {R}\) be an interval, and let \(g:C\rightarrow \mathbb {R}\) be a continuous and strictly monotone function.
If \(p_{1},\ldots ,p_{n}\) represent a discrete probability distribution and \(v_{1},\ldots ,v_{n}\in C\), then the weighted quasi-arithmetic mean is defined by
If \(\left( X,\mathcal {A},\mu \right) \) is a probability space, and \(\varphi :X\rightarrow C\) is a measurable function such that \(g\circ \varphi \) is \(\mu \)-integrable on X, then
is called the quasi-arithmetic mean (integral g-mean) of \(\varphi \). Of course (4.2) contains (4.1) as a special case.
Proposition 4.3
Let \(C\subset \mathbb {R}\) be an interval, and g be a continuous and strictly monotone function on C.
-
(a)
Let \(p_{1},\ldots ,p_{n}\) and \(q_{i_{1}},\ldots ,q_{i_{k}}\) represent positive discrete probability distributions, where \(1\le i_{1}<i_{2}<\cdots <i_{k}\le n\), and let \(v_{1},\ldots ,v_{n}\in C\). If either g is strictly increasing and concave or g is strictly decreasing and convex, then
$$\begin{aligned}{} & {} g^{-1}\left( \sum \limits _{i=1}^{n}p_{i}g\left( v_{i}\right) \right) +\min _{1\le j\le k}\frac{p_{i_{j}}}{q_{i_{j}}}\left( \sum \limits _{j=1} ^{k}q_{i_{j}}v_{i_{j}}-g^{-1}\left( \sum \limits _{j=1}^{k}q_{i_{j}}g\left( v_{i_{j}}\right) \right) \right) \nonumber \\{} & {} \quad \le \sum \limits _{i=1}^{n}p_{i} v_{i}. \end{aligned}$$(4.3)If either g is strictly increasing and convex or g is strictly decreasing and concave, then the reverse inequality is satisfied.
-
(b)
Let \(\left( X,\mathcal {A}\right) \) be a measurable space with probability measures \(\mu \) and \(\nu \) having the property (3.8). Assume \(\varphi :X\rightarrow \mathbb {R}\) is a measurable function taking values in C such that \(\varphi \) and \(g\circ \varphi \) are \(\mu \)- and \(\nu \)-integrable. If either g is strictly increasing and concave or g is strictly decreasing and convex, then
$$\begin{aligned} g^{-1}\left( {\displaystyle \int \limits _{X}} g\circ \varphi d\mu \right) +s\left( {\displaystyle \int \limits _{X}} \varphi d\nu -g^{-1}\left( {\displaystyle \int \limits _{X}} g\circ \varphi d\nu \right) \right) \le {\displaystyle \int \limits _{X}} \varphi d\mu . \end{aligned}$$(4.4)
If either g is strictly increasing and convex or g is strictly decreasing and concave, then the reverse inequality is satisfied.
Proof
Under the stated conditions the function \(g^{-1}\) is continuous and convex (concave in the reversed cases), and hence Theorem 3.1 (a) and Theorem 3.3 can be applied with \(f:=g^{-1}\), the points \(v_{i}\) replaced by \(g\left( v_{i}\right) \) and \(\varphi \) replaced by \(g\circ \varphi \), respectively.
The proof is complete. \(\square \)
Remark 4.4
Of course, part (a) of the previous result can also be formulated for infinite sums in a way analogous to Proposition 4.1 (b).
We consider two special cases of the previous result.
Example
-
(a)
Choose \(C=] 0,\infty [ \) and \(g=\ln \) in Proposition 4.3 (a). Then (4.3) gives that
$$\begin{aligned} \prod \limits _{i=1}^{n}v_{i}^{p_{i}}+\min _{1\le j\le k}\frac{p_{i_{j}} }{q_{i_{j}}}\left( \sum \limits _{j=1}^{k}q_{i_{j}}v_{i_{j}}-\prod \limits _{j=1}^{k}v_{i_{j}}^{q_{i_{j}}}\right) \le \sum \limits _{i=1}^{n} p_{i}v_{i}, \end{aligned}$$which contains weighted arithmetic means and weighted geometric means, and it refines the inequality between these means.
-
(b)
Choose \(C=] 0,\infty [ \) and \(g:] 0,\infty [ \rightarrow \mathbb {R}\), \(g\left( t\right) :=t^{\alpha }\) \(\left( \alpha \ne 0\right) \) in Proposition 4.3 (b).
The mean defined by the function
is called the \(\alpha \)th power mean (Hölder mean). It is usual to extend it for \(\alpha =0\) by
In this case (4.4) gives that for \(\alpha \in ] -\infty ,0[ \cup ] 0,1[ \)
while for \(\alpha \in [ 1,\infty [ \) the reverse inequality holds. For \(\alpha =0\) we have
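The refined arithmetic mean–geometric mean inequality in part (a) of the example can be checked numerically; the weights and values below are arbitrary test data:

```python
# Numerical check of the refined AM-GM inequality from part (a) of the
# example, with k = n; the two distributions and the values are invented.
import math

p = [0.5, 0.5]
q = [0.25, 0.75]              # a second positive distribution
v = [1.0, 4.0]

geo = lambda w: math.prod(vi ** wi for wi, vi in zip(w, v))
c = min(pi / qi for pi, qi in zip(p, q))                     # = 2/3 here
lhs = geo(p) + c * (sum(qi * vi for qi, vi in zip(q, v)) - geo(q))
rhs = sum(pi * vi for pi, vi in zip(p, v))
assert lhs <= rhs             # refinement sits between GM and AM
```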
In the following result we obtain refinements of Hölder’s inequality. If \(\left( X,\mathcal {A},\mu \right) \) is a measure space, then \(L^{\alpha }\left( \mu \right) \) \(\left( \alpha \ge 1\right) \) denotes the vector space of all complex-valued measurable functions u on X for which \(\left| u\right| ^{\alpha }\) is \(\mu \)-integrable.
Proposition 4.5
Assume \(\alpha \), \(\beta >1\) with \(\frac{1}{\alpha }+\frac{1}{\beta }=1\).
-
(a)
If \(p_{1},\ldots ,p_{n}\) and \(q_{i_{1}},\ldots ,q_{i_{k}}\) are positive numbers, where \(1\le i_{1}<i_{2}<\cdots <i_{k}\le n\), and \(u_{1},\ldots ,u_{n}\in \mathbb {C}\), \(v_{1},\ldots ,v_{n}\in \mathbb {C}\), then
$$\begin{aligned}{} & {} \sum \limits _{i=1}^{n}p_{i}\left| u_{i}\right| \left| v_{i}\right| \\{} & {} \qquad +\min _{1\le j\le k}\frac{p_{i_{j}}}{q_{i_{j}}}\left( \left( \sum \limits _{j=1}^{k}q_{i_{j}}\left| u_{i_{j}}\right| ^{\alpha }\right) ^{\frac{1}{\alpha }}\left( \sum \limits _{j=1}^{k}q_{i_{j}}\left| v_{i_{j} }\right| ^{\beta }\right) ^{\frac{1}{\beta }}-\sum \limits _{j=1}^{k} q_{i_{j}}\left| u_{i_{j}}\right| \left| v_{i_{j}}\right| \right) \\{} & {} \quad \le \left( \sum \limits _{i=1}^{n}p_{i}\left| u_{i}\right| ^{\alpha }\right) ^{\frac{1}{\alpha }}\left( \sum \limits _{i=1}^{n}p_{i}\left| v_{i}\right| ^{\beta }\right) ^{\frac{1}{\beta }}. \end{aligned}$$ -
(b)
Let \(\left( X,\mathcal {A}\right) \) be a measurable space with measures \(\xi \) and \(\eta \), and let u, \(v:X\rightarrow \mathbb {C}\) be measurable functions such that \(u\in L^{\alpha }\left( \xi \right) \cap L^{\alpha }\left( \eta \right) \), \(v\in L^{\beta }\left( \xi \right) \cap L^{\beta }\left( \eta \right) \). If
$$\begin{aligned} 0<s:=\inf _{\left\{ A\in \mathcal {A}\mid \int \limits _{A}\left| v\right| ^{\beta }d\eta >0\right\} }\frac{\int \limits _{A}\left| v\right| ^{\beta }d\xi }{\int \limits _{A}\left| v\right| ^{\beta }d\eta }, \end{aligned}$$(4.5)then
$$\begin{aligned}{} & {} \int \limits _{X}\left| u\right| \left| v\right| d\xi +s\left( \left( \int \limits _{X}\left| u\right| ^{\alpha }d\eta \right) ^{\frac{1}{\alpha }}\left( \int \limits _{X}\left| v\right| ^{\beta }d\eta \right) ^{\frac{1}{\beta }}- {\displaystyle \int \limits _{X}} \left| u\right| \left| v\right| d\eta \right) \end{aligned}$$(4.6)$$\begin{aligned}{} & {} \quad \le \left( \int \limits _{X}\left| u\right| ^{\alpha }d\xi \right) ^{\frac{1}{\alpha }}\left( \int \limits _{X}\left| v\right| ^{\beta } d\xi \right) ^{\frac{1}{\beta }}. \end{aligned}$$(4.7)
Proof
We prove only part (b); part (a) can be proved similarly, using Theorem 3.1 (a) instead of Theorem 3.3.
(b) It follows from (4.5) that \(\int \limits _{X}\left| v\right| ^{\beta }d\xi >0\) and \(\int \limits _{X}\left| v\right| ^{\beta }d\eta >0\).
Define the measure \(\mu \) on \(\mathcal {A}\) having density \(\left| v\right| ^{\beta }/\int \limits _{X}\left| v\right| ^{\beta }d\xi \) with respect to \(\xi \), that is
The measure \(\nu \) on \(\mathcal {A}\) is defined similarly by
Then \(\mu \) and \(\nu \) are probability measures on \(\mathcal {A}\), and \(u\in L^{\alpha }\left( \xi \right) \cap L^{\alpha }\left( \eta \right) \) shows that the function
is \(\mu \)- and \(\nu \)-integrable. The function \(f:[ 0,\infty [ \rightarrow \mathbb {R}\), \(f(t)=-t^{1/\alpha }\) is strictly convex, and since \(uv\in L^{1}\left( \xi \right) \cap L^{1}\left( \eta \right) \), we have that
is also \(\mu \)- and \(\nu \)-integrable. It can be seen that Theorem 3.3 can be applied to the introduced functions f and \(\varphi \), and measures \(\mu \) and \(\nu \), and therefore
By considering the relationship between \(\mu \)- and \(\xi \)- as well as \(\nu \)- and \(\eta \)-integrals, (4.8) can be rewritten in the form
from which we obtain (4.6)–(4.7) by a simple calculation.
The proof is complete. \(\square \)
Remark 4.6
-
(a)
Obviously, starting from Theorem 3.1 (b), we could make a statement analogous to (a) for infinite sums.
-
(b)
If \(v\ne 0\) \(\eta \)-a.e. on X, then
$$\begin{aligned} s=\inf _{\left\{ A\in \mathcal {A}\mid \eta \left( A\right) >0\right\} } \frac{\int \limits _{A}\left| v\right| ^{\beta }d\xi }{\int \limits _{A} \left| v\right| ^{\beta }d\eta }. \end{aligned}$$
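A discrete check of the refined Hölder inequality of Proposition 4.5 (a) with \(\alpha =\beta =2\) (the Cauchy–Schwarz case) and \(k=n\); all data below are invented:

```python
# Discrete check of the refined Hoelder inequality of Proposition 4.5 (a)
# with alpha = beta = 2 and k = n; weights and sequences are test data.
alpha = beta = 2.0
p = [1.0, 2.0, 1.0]           # positive weights (need not sum to 1)
q = [1.0, 1.0, 1.0]
u = [1.0, 2.0, 3.0]
v = [3.0, 1.0, 2.0]

def holder(w):
    # (sum w|u|^alpha)^(1/alpha) * (sum w|v|^beta)^(1/beta)
    return (sum(wi * abs(ui) ** alpha for wi, ui in zip(w, u)) ** (1 / alpha)
            * sum(wi * abs(vi) ** beta for wi, vi in zip(w, v)) ** (1 / beta))

c = min(pi / qi for pi, qi in zip(p, q))
corr = holder(q) - sum(qi * abs(ui) * abs(vi) for qi, ui, vi in zip(q, u, v))
lhs = sum(pi * abs(ui) * abs(vi) for pi, ui, vi in zip(p, u, v)) + c * corr
assert lhs <= holder(p)
```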
Finally, some applications to information theory are presented.
Throughout the rest of the paper probability measures P and Q are defined on a fixed measurable space \(\left( X,\mathcal {A}\right) \). It is also assumed that P and Q are continuous with respect to a \(\sigma \)-finite measure \(\xi \) on \(\mathcal {A}\). The Radon–Nikodym derivatives of P and Q with respect to \(\xi \) are denoted by p and q, respectively. These densities are \(\xi \)-almost everywhere uniquely determined.
Introduce the set of functions
and define for every \(f\in F\) the function
If \(f\in F\), then either f is monotonic or there exists a point \(t_{0} \in ] 0,\infty [ \) such that f is decreasing on \(] 0,t_{0}[ \) and increasing on \(] t_{0},\infty [ \). This implies that the limit
exists in \(] -\infty ,\infty ] \), and
extends f into a convex function on \([ 0,\infty [ \).
It is well known that for every \(f\in F\) the function \(f^{*}\) also belongs to F, and therefore
The important notion of f-divergences was introduced in [2, 3], and independently in [1].
Definition 4.7
For every \(f\in F\) we define the f-divergence of P and Q by
where the following conventions are used
Remark 4.8
(a) For every \(f\in F\) the perspective \(\hat{f}:] 0,\infty [ \times ] 0,\infty [ \rightarrow \mathbb {R}\) of f is defined by
$$\begin{aligned} \hat{f}\left( x,y\right) :=yf\left( \frac{x}{y}\right) . \end{aligned}$$
Then (see [11]) \(\hat{f}\) is also a convex function. It is proved in [12] that (4.9) is the unique rule leading to a convex and lower semicontinuous extension of \(\hat{f}\) to the set
$$\begin{aligned} \left\{ \left( x,y\right) \in \mathbb {R}^{2}\mid x,y\ge 0\right\} . \end{aligned}$$
(b) Since \(f^{*}\left( 0\right) \in ] -\infty ,\infty ] \), Lemma 2.8 in [7] shows that \(D_{f}\left( P,Q\right) \) exists in \(] -\infty ,\infty ] \) and
$$\begin{aligned} D_{f}\left( P,Q\right) = {\displaystyle \int \limits _{\left( q>0\right) }} f\left( \frac{p\left( \omega \right) }{q\left( \omega \right) }\right) dQ\left( \omega \right) +f^{*}\left( 0\right) P\left( q=0\right) . \end{aligned}$$(4.10)
It follows that if P is absolutely continuous with respect to Q, then
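As a concrete illustration of formula (4.10), the following sketch computes the total variation distance as an f-divergence. The generator \(f(t)=\left| t-1\right| /2\) is assumed to belong to F, and the value \(f^{*}(0)=1/2\) is obtained from the standard conjugate \(f^{*}(t)=tf(1/t)\); the densities are hypothetical, with q vanishing on part of the space so that the extension term \(f^{*}(0)P(q=0)\) is active.

```python
# Total variation distance as an f-divergence, via formula (4.10).
# Generator f(t) = |t - 1| / 2 (assumed to belong to F). With the standard
# conjugate f*(t) = t f(1/t) = |1 - t| / 2 one gets f*(0) = 1/2.
f = lambda t: abs(t - 1) / 2
f_star_at_0 = 0.5

# Hypothetical densities w.r.t. the counting measure xi on X = {0, 1, 2};
# q vanishes at the last point, so P(q = 0) = 0.2 > 0.
p = [0.5, 0.3, 0.2]
q = [0.6, 0.4, 0.0]

# (4.10): integrate f(p/q) against Q over {q > 0}, then add f*(0) * P(q = 0).
D = (sum(qi * f(pi / qi) for pi, qi in zip(p, q) if qi > 0)
     + f_star_at_0 * sum(pi for pi, qi in zip(p, q) if qi == 0))

# Sanity check: for this generator D_f(P, Q) is the total variation distance.
tv = sum(abs(pi - qi) for pi, qi in zip(p, q)) / 2
print(D, tv)
```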
The basic inequality (see [8])
is one of the key properties of f-divergences.
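The basic inequality \(f(1)\le D_{f}\left( P,Q\right) \) can be verified numerically for several classical generators at once. A minimal sketch with hypothetical discrete densities (each generator is assumed to lie in F, and each satisfies \(f(1)=0\)):

```python
import math

# Three classical convex generators with f(1) = 0 (assumed members of F).
generators = {
    "KL":   lambda t: t * math.log(t),   # Kullback-Leibler
    "chi2": lambda t: (t - 1) ** 2,      # chi-squared
    "TV":   lambda t: abs(t - 1) / 2,    # total variation
}

# Hypothetical discrete densities w.r.t. the counting measure (both positive).
p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]

# D_f(P, Q) = sum_i q_i f(p_i / q_i); the basic inequality says D_f >= f(1) = 0.
divs = {name: sum(qi * f(pi / qi) for pi, qi in zip(p, q))
        for name, f in generators.items()}
print(divs)
```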
In the next result we refine this inequality.
Proposition 4.9
Assume that the densities p and q are positive functions such that
and \(\frac{p}{q}\) is a P-integrable function. If \(f\in F\) such that \(f\circ \frac{p}{q}\) is P- and Q-integrable, then
Proof
The result follows easily from Theorem 3.3 by choosing \(\mu =Q\), \(\nu =P\), and \(\varphi =\frac{p}{q}\).
As we have mentioned in Remark 3.4 (a), the measure P is absolutely continuous with respect to Q, and hence, by Remark 4.8 (b),
The proof is complete. \(\square \)
Remark 4.10
(a) Since P is absolutely continuous with respect to Q, P has a Radon–Nikodym derivative \(u:X\rightarrow \mathbb {R}\) with respect to Q. By the uniqueness of the Radon–Nikodym derivative, \(p=uq\). Using this, (4.13) can be written in other forms:
$$\begin{aligned} f\left( 1\right) +s\left( {\displaystyle \int \limits _{X}} \left( f\circ \frac{p}{q}\right) pd\xi -f\left( {\displaystyle \int \limits _{X}} \frac{p^{2}}{q}d\xi \right) \right) \le D_{f}\left( P,Q\right) \end{aligned}$$
or
$$\begin{aligned} f\left( 1\right) +s\left( {\displaystyle \int \limits _{X}} \left( f\circ u\right) udQ-f\left( {\displaystyle \int \limits _{X}} u^{2}dQ\right) \right) \le D_{f}\left( P,Q\right) . \end{aligned}$$
(b) As we have mentioned in Remark 3.6 (c), for \(0<\inf _{X}\frac{q}{p}\) the condition (4.12) is also satisfied.
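The second displayed form of the refined bound can be tested numerically. The sketch below uses hypothetical densities and assumes, for illustration only, that the constant s may be taken as \(\inf _{X}\frac{q}{p}\) (an assumption suggested by Remark 4.10 (b); the paper's actual condition (4.12) is not reproduced here).

```python
import math

# Kullback-Leibler generator: f(t) = t log t, with f(1) = 0.
f = lambda t: t * math.log(t)

# Hypothetical discrete densities w.r.t. the counting measure; q is positive.
p = [0.7, 0.3]
q = [0.5, 0.5]
u = [pi / qi for pi, qi in zip(p, q)]   # u = dP/dQ, so p = u q

# Assumed reading of the constant: s = inf_X q/p (cf. Remark 4.10 (b)).
s = min(qi / pi for pi, qi in zip(p, q))

# D_f(P, Q) = int f(u) dQ.
D = sum(qi * f(ui) for qi, ui in zip(q, u))

# Jensen gap along u: int (f o u) u dQ - f(int u^2 dQ).
gap = (sum(qi * ui * f(ui) for qi, ui in zip(q, u))
       - f(sum(qi * ui ** 2 for qi, ui in zip(q, u))))

lower = f(1.0) + s * gap                # refined lower bound for D_f(P, Q)
print(lower, D)
```

Here `lower` is strictly positive, so the refinement is strictly sharper than the basic inequality \(f(1)\le D_{f}\left( P,Q\right) \) for these densities.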
Data Availability
Not applicable.
References
Ali, S.M., Silvey, S.D.: A general class of coefficients of divergence of one distribution from another. J. R. Stat. Soc. Ser. B 28, 131–140 (1966)
Csiszár, I.: Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten. Publ. Math. Inst. Hung. Acad. Sci. Ser. A 8, 84–108 (1963)
Csiszár, I.: Information-type measures of difference of probability distributions and indirect observations. Stud. Sci. Math. Hung. 2, 299–318 (1967)
Dragomir, S.S.: Bounds for the normalized Jensen functional. Bull. Aust. Math. Soc. 74(3), 471–478 (2006)
Dragomir, S.S., Pečarić, J., Persson, L.E.: Properties of some functionals related to Jensen’s inequality. Acta Math. Hung. 70, 129–143 (1996)
Hewitt, E., Stromberg, K.R.: Real and Abstract Analysis. Graduate Texts in Mathematics, vol. 25. Springer, Berlin (1965)
Horváth, L., Pečarić, D., Pečarić, J.: A refinement and an exact equality condition for the basic inequality of \(f\)-divergences. Filomat 32(12), 4263–4273 (2018)
Liese, F., Vajda, I.: On divergences and informations in statistics and information theory. IEEE Trans. Inf. Theory 52, 4394–4412 (2006)
Mitroi, F.: About the precision in Jensen–Steffensen inequality. Ann. Univ. Craiova 37(4), 73–84 (2010)
Niculescu, C., Persson, L.E.: Convex Functions and Their Applications. A Contemporary Approach. Springer, Berlin (2006)
Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton, NJ (1970)
Vajda, I.: Theory of Statistical Inference and Information. Kluwer, Boston, MA (1989)
Acknowledgements
Research supported by the Hungarian National Research, Development and Innovation Office Grant No. K139346.
Funding
Open access funding provided by University of Pannonia.
Author information
Contributions
László Horváth wrote the manuscript.
Ethics declarations
Conflict of interest
The author declares that he has no conflicts of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Horváth, L. Refinements of discrete and integral Jensen inequalities with Jensen’s gap. Aequat. Math. 98, 557–577 (2024). https://doi.org/10.1007/s00010-023-01006-4
Keywords
- Discrete and integral Jensen inequalities
- Refinement
- Quasi-arithmetic means
- Hölder’s inequality
- f-divergences