Abstract
Iterative ensemble smoothers (IES) are among the state-of-the-art approaches to solving history matching problems. From an optimization-theoretic point of view, these algorithms can be derived by solving certain stochastic nonlinear least-squares problems. In a broader picture, history matching is essentially an inverse problem, which is often ill-posed and may not possess a unique solution. To mitigate the ill-posedness, prior knowledge and domain experience are often incorporated, in the course of solving an inverse problem, as a regularization term in a suitable cost function within a respective optimization problem. Whereas inverse theory offers a rich class of inversion algorithms resulting from various choices of regularized cost functions, few ensemble data assimilation algorithms (IES included) are implemented in practice in a form that goes beyond nonlinear least squares. This work aims to narrow this gap. Specifically, we consider a class of more generalized cost functions, and establish a unified formula that can be used to construct a corresponding group of novel ensemble data assimilation algorithms, called generalized IES (GIES), in a principled and systematic way. For demonstration, we choose a subset (up to 30+) of the GIES algorithms derived from the unified formula, and apply them to two history matching problems. Experimental results indicate that many of the tested GIES algorithms outperform an original IES developed in a previous work, showcasing the potential benefit of designing new ensemble data assimilation algorithms through the proposed framework.
References
Kay, S.M.: Fundamentals of statistical signal processing. vol 1: Estimation theory. Prentice Hall PTR (1993)
Evensen, G.: Data assimilation: The ensemble Kalman filter. Springer Science & Business Media (2009)
Kalnay, E.: Atmospheric modeling, data assimilation and predictability. Cambridge University Press (2002)
Oliver, D.S., Reynolds, A.C., Liu, N.: Inverse theory for petroleum reservoir characterization and history matching. Cambridge University Press (2008)
Tarantola, A.: Inverse problem theory and methods for model parameter estimation. SIAM (2005)
Engl, H.W., Hanke, M., Neubauer, A.: Regularization of inverse problems. Springer (2000)
Chen, Y., Oliver, D.: Levenberg-Marquardt forms of the iterative ensemble smoother for efficient history matching and uncertainty quantification. Comput. Geosci. 17, 689–703 (2013)
Chen, Y., Oliver, D.S.: Ensemble randomized maximum likelihood method as an iterative ensemble smoother. Math. Geosci. 44, 1–26 (2012)
Emerick, A.A.: Deterministic ensemble smoother with multiple data assimilation as an alternative for history-matching seismic data. Comput. Geosci. 22(5), 1175–1186 (2018)
Emerick, A.A., Reynolds, A.C.: Ensemble smoother with multiple data assimilation. Comput. Geosci. 55, 3–15 (2013)
Evensen, G., Raanes, P.N., Stordal, A.S., Hove, J.: Efficient implementation of an iterative ensemble smoother for big-data assimilation and reservoir history matching. Front. Appl. Math. Stat. 5, 47 (2019)
Iglesias, M.A.: Iterative regularization for ensemble data assimilation in reservoir models. Comput. Geosci. 19, 177–212 (2015)
Luo, X., Stordal, A., Lorentzen, R., Nævdal, G.: Iterative ensemble smoother as an approximate solution to a regularized minimum-average-cost problem: theory and applications. SPE J. 20, 962–982 (2015). https://doi.org/10.2118/176023-PA, SPE-176023-PA
Ma, X., Bi, L.: A robust adaptive iterative ensemble smoother scheme for practical history matching applications. Comput. Geosci. 23(3), 415–442 (2019)
Stordal, A.S., Elsheikh, A.H.: Iterative ensemble smoothers in the annealed importance sampling framework. Adv. Water Resour. 86, 231–239 (2015)
Evensen, G.: Analysis of iterative ensemble smoothers for solving inverse problems. Comput. Geosci. 22 (2018). https://doi.org/10.1007/s10596-018-9731-y
Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenom. 60(1-4), 259–268 (1992)
Chen, X., Xu, F., Ye, Y.: Lower bound theory of nonzero entries in solutions of ℓ2-ℓp minimization. SIAM J. Sci. Comput. 32(5), 2832–2852 (2010)
Li, L., Jiang, S., Huang, Q.: Learning hierarchical semantic description via mixed-norm regularization for image understanding. IEEE Trans. Multimed. 14(5), 1401–1413 (2012)
Park, T., Casella, G.: The Bayesian Lasso. J. Am. Stat. Assoc. 103(482), 681–686 (2008)
Li, Q., Lin, N., et al.: The Bayesian elastic net. Bayesian Anal. 5(1), 151–170 (2010)
Luo, X., Bhakta, T., Jakobsen, M., Nævdal, G.: An ensemble 4D-seismic history-matching framework with sparse representation based on wavelet multiresolution analysis. SPE J. 22, 985–1010 (2017). https://doi.org/10.2118/180025-PA, SPE-180025-PA
Luo, X., Bhakta, T., Jakobsen, M., Nævdal, G.: Efficient big data assimilation through sparse representation: A 3D benchmark case study in petroleum engineering. PLOS ONE 13, e0198586 (2018)
Soares, R.V., Luo, X., Evensen, G., Bhakta, T.: 4D seismic history matching: Assessing the use of a dictionary learning based sparse representation method. J. Pet. Sci. Eng. 195, 107763 (2020)
Lorentzen, R., Flornes, K., Nævdal, G.: History matching channelized reservoirs using the ensemble Kalman filter. SPE J. 17, 137–151 (2012)
Canchumuni, S.W.A., Emerick, A.A., Pacheco, M.A.C.: History matching geological facies models based on ensemble smoother and deep generative models. J. Pet. Sci. Eng. 177, 941–958 (2019)
Jafarpour, B.: Wavelet reconstruction of geologic facies from nonlinear dynamic flow measurements. IEEE Trans. Geosci. Remote Sens. 49, 1520–1535 (2011)
Sarma, P., Chen, W.H., et al.: Generalization of the ensemble Kalman filter using kernels for non-Gaussian random fields. In: SPE Reservoir Simulation Symposium. Society of Petroleum Engineers, The Woodlands. SPE-119177-MS (2009)
Strebelle, S.: Conditional simulation of complex geological structures using multiple-point statistics. Math. Geol. 34, 1–21 (2002)
Cover, T.M., Thomas, J.A.: Elements of information theory. Wiley (2012)
Luo, X., Bhakta, T.: Automatic and adaptive localization for ensemble-based history matching. J. Pet. Sci. Eng. 184, 106559 (2020)
Magnus, J.R., Neudecker, H.: Matrix differential calculus with applications in statistics and econometrics. Wiley (2019)
Simon, D.: Optimal state estimation: Kalman, H-infinity, and nonlinear approaches. Wiley-Interscience (2006)
Acknowledgements
The author would like to thank two anonymous reviewers for their valuable and constructive suggestions. The author acknowledges financial support from the Research Council of Norway through the Petromaks-2 project DIGIRES (RCN no. 280473) and the industrial partners AkerBP, Wintershall DEA, Vår Energi, Petrobras, Equinor, Lundin and Neptune Energy, and would also like to thank Schlumberger for providing academic licenses to ECLIPSE©.
Funding
Open access funding provided by NORCE Norwegian Research Centre AS.
Appendices
Appendix A: Iterative ensemble smoothers derived from a class of generalized cost functions
In the sequel, we proceed to develop an approximate solution to the generalized MAC problem in Eqs. 9 and 10, in a way similar to that in [13]. To this end, we need to assume certain regularity conditions, namely, that the operators \(\mathcal {D}\), \(\mathcal {R}\), \(\mathcal {T}\), \(\mathcal {S}\) and g are (locally) differentiable up to the relevant orders in the derivations below.
We start by considering the first-order Taylor approximation

\[ \mathbf{f}\left(\mathbf{x}_{0} + \delta \mathbf{x}\right) \approx \mathbf{f}\left(\mathbf{x}_{0}\right) + \left(\boldsymbol{\nabla}_{f}\left(\mathbf{x}_{0}\right)\right)^{T} \delta \mathbf{x}, \]

where f is a vector-valued function, x0 is an input vector associated with a (relatively small) perturbation vector δx, and \(\boldsymbol {\nabla }_{f}\left (\mathbf {x}_{0}\right )\), as defined in Eq. 46, represents the gradient of f evaluated at the point x0. Note that throughout this work, we adopt the following convention: given the mx-dimensional vector x0 and the function \(\mathbf {f}: \mathbb {R}^{m_{x}} \rightarrow \mathbb {R}^{m_{f}}\), the gradient \(\boldsymbol {\nabla }_{f}\left (\mathbf {x}_{0}\right )\) is a matrix of dimension mx × mf.
The following formulas of matrix calculus [32],

\[ \frac{\partial\, \mathbf{f}\left(\mathbf{u}\left(\mathbf{x}\right)\right)}{\partial \mathbf{x}} = \frac{\partial \mathbf{u}\left(\mathbf{x}\right)}{\partial \mathbf{x}} \, \frac{\partial \mathbf{f}\left(\mathbf{u}\right)}{\partial \mathbf{u}}, \qquad \frac{\partial\, \mathbf{M} \mathbf{u}\left(\mathbf{x}\right)}{\partial \mathbf{x}} = \frac{\partial \mathbf{u}\left(\mathbf{x}\right)}{\partial \mathbf{x}} \, \mathbf{M}^{T}, \]

will be used in our derivation later. In Eq. 47, \(\mathbf {f}\left (\mathbf {u}\left (\mathbf {x} \right )\right )\) represents the composition of the two vector functions f and u, with the dummy input vector x. Equation 48 may be considered a special case of Eq. 47, in which \(\mathbf {f}\left (\mathbf {u}\left (\mathbf {x} \right )\right ) = \mathbf {M} \mathbf {u}\left (\mathbf {x} \right )\) for a constant matrix M, so that \(\partial \mathbf{M}\mathbf{u} / \partial \mathbf{u} = \mathbf{M}^{T}\).
In addition, the following matrix identity,

\[ \mathbf{C}_{m} \mathbf{H}^{T} \left(\mathbf{H} \mathbf{C}_{m} \mathbf{H}^{T} + \mathbf{C}_{d}\right)^{-1} = \left(\mathbf{C}_{m}^{-1} + \mathbf{H}^{T} \mathbf{C}_{d}^{-1} \mathbf{H}\right)^{-1} \mathbf{H}^{T} \mathbf{C}_{d}^{-1}, \]

will also be found useful in our derivation later. The left-hand side (LHS) of Eq. 49 would correspond to the Kalman gain matrix in the Kalman filter, if one treats Cm as the prior model error covariance matrix, H as a linear observation operator that maps a model vector onto the observation space, and Cd as the observation error covariance matrix. The right-hand side (RHS) of Eq. 49 is another way to represent the Kalman gain matrix, and is often used to formulate the information filter [33].
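The identity is straightforward to verify numerically. The following minimal sketch (with illustrative matrix names and dimensions) checks that the two Kalman-gain forms in Eq. 49 agree on randomly generated symmetric positive-definite covariances:

```python
# Minimal numerical check of the Kalman-gain identity (Eq. 49); all names
# and dimensions here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
m, d = 6, 4                                   # model and data dimensions
A = rng.standard_normal((m, m))
Cm = A @ A.T + m * np.eye(m)                  # SPD prior model error covariance
B = rng.standard_normal((d, d))
Cd = B @ B.T + d * np.eye(d)                  # SPD observation error covariance
H = rng.standard_normal((d, m))               # linear observation operator

# LHS: Kalman-filter form of the gain
lhs = Cm @ H.T @ np.linalg.inv(H @ Cm @ H.T + Cd)
# RHS: information-filter form of the gain
rhs = np.linalg.inv(np.linalg.inv(Cm) + H.T @ np.linalg.inv(Cd) @ H) @ H.T @ np.linalg.inv(Cd)

assert np.allclose(lhs, rhs)                  # the two forms coincide
```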
To solve the minimization problem in Eqs. 9–10, we aim to set
To this end, the following linearization strategy [13], based on the first-order Taylor approximation (45), is adopted:
where \(\mathbf {m}_{c}^{i}\) is a “common” point with respect to the (background) ensemble \({\mathscr{M}}^{i}\). The motivation for using this “common” point is to avoid evaluating the gradients of the operators \(\mathcal {T} \circ \mathbf {g}\) and \(\mathcal {S}\) at multiple points. There are various possible choices for \(\mathbf {m}_{c}^{i}\), e.g., the ensemble mean of \({\mathscr{M}}^{i}\), or the ensemble member closest to the ensemble mean [13]. In the current work, by default we let \(\mathbf {m}_{c}^{i}\) be the ensemble mean of \({\mathscr{M}}^{i}\), unless otherwise stated. We note that the choice of \(\mathbf {m}_{c}^{i}\) does not affect the derivations below.
Inserting (51) into the data mismatch term \(\mathcal {D}\) of Eq. 10, we have
Accordingly, we have
In Eq. 54, we use Eqs. 47 and 48 to derive the first line. In the second line, we then apply the first-order Taylor approximation (45), around the point \(\mathcal {T}\left (\mathbf {d}_{j}\right ) - \mathcal {T} \circ \mathbf {g}\left (\mathbf {m}_{c}^{i}\right )\), to the function \(\boldsymbol {\nabla }_{\mathcal {D}}\) in the first line. As a result, we obtain a second-order gradient (i.e., Hessian) \(\boldsymbol {\nabla }_{\boldsymbol {\nabla }_{\mathcal {D}}}\), which is denoted by \(\boldsymbol {\nabla }_{\mathcal {D}}^{2}\) in the third line for short.
Similarly, by inserting (52) into the regularization term \(\mathcal {R}\) of Eq. 10, we have
and the corresponding derivative is
Combining (10), (50), (54) and (56), and with some linear algebra, one has
Since in ensemble-based methods the model variables \(\mathbf {m}_{j}^{i+1}\) are updated from \(\mathbf {m}_{j}^{i}\), rather than from \(\mathbf {m}_{c}^{i}\), we rearrange (57) as follows:
After regrouping different terms, the RHS of Eq. 58 can be rewritten as \(\mathcal {E}+\mathcal {F}\), with
In Eq. 59, the second line is obtained by applying the Taylor approximation (45) (in the reverse order) to the function \(\boldsymbol {\nabla }_{\mathcal {D}}\), around the point \(\mathcal {T}\left (\mathbf {d}_{j}\right ) - \mathcal {T} \circ \mathbf {g}\left (\mathbf {m}_{c}^{i}\right )\). Likewise, the third line is derived by applying (45) again, now to the function \(\mathcal {T} \circ \mathbf {g}\), around the point \(\mathbf {m}_{c}^{i}\).
Likewise, we have
To derive the final result in Eq. 60, we make the following assumption: \(\boldsymbol {\nabla }_{\mathcal {R}}\left (\mathbf {0}\right ) = \mathbf {0}\). The rationale behind this assumption is that, in a conventional setting of regularized inverse-problem theory [6], the regularization term typically achieves its minimum value (and actually vanishes) at zero.
Combining (58)–(60), one obtains
As the Hessians \(\boldsymbol {\nabla }_{\mathcal {D}}^{2} \left [\mathcal {T}\left (\mathbf {d}_{j}\right ) - \mathcal {T} \circ \mathbf {g}\left (\mathbf {m}_{c}^{i}\right )\right ]\) and \(\boldsymbol {\nabla }_{\mathcal {R}}^{2} \left [\mathcal {S}\left (\mathbf {m}_{c}^{i}\right ) - \mathcal {S}\left (\mathbf {m}_{j}^{i}\right )\right ]\) are symmetric, we discard the transpose operator T hereafter. To proceed further, we need to apply the identity (49) to (61). To this end, we make the following assignments:
Then using Eq. 49 and with some algebra, one has the following update formula:
Before proceeding further, we have a few remarks. First, the assimilation algorithm presented in Eqs. 62–66 provides an approximate solution to the GMAC problem, without yet using ensemble approximations to the first- and second-order gradients therein. If one already has access to all required gradients, then they can be used directly in the assimilation algorithm, which may lead to improved history matching performance. On the other hand, if some of the gradients are unavailable or impractical to compute and/or store, then one may employ ensemble approximations to obtain certain (sometimes partially) derivative-free algorithms, as will be discussed later.
Second, in the original IES algorithm (3)–(7), the Kalman-gain-like matrix Ki in Eq. 4 is common to all ensemble members. In contrast, in the case with a generalized cost function, e.g., that in Eq. 10, the corresponding Kalman-gain-like matrix \(\mathbf {K}_{j}^{i}\) in Eq. 66 in general depends on both the individual ensemble member \(\mathbf {m}_{j}^{i}\) and the (potentially) perturbed data dj, as one can see from Eqs. 62–63. While the diversity of \(\mathbf {K}_{j}^{i}\) may be of theoretical interest, one practical implication is the increased memory consumption for storing these Kalman-gain-like matrices. This problem can be avoided if one lets both the distance metric \(\mathcal {D}\) and the regularization operator \(\mathcal {R}\) be quadratic functions, as in the problem formulation that leads to the original IES, cf. Eq. 2 or Eqs. 11 and 12; or partially mitigated if one adopts ensemble approximations to compute \(\mathbf {K}_{j}^{i}\), as will be shown below.
To obtain ensemble-based approximations, we start by computing the product \(\mathbf {C}_{m,j}^{i} \left (\mathbf {H}^{i}\right )^{T}\) involved in the computation of \(\mathbf {K}_{j}^{i}\). Let
where \(\mathbf {S}_{\mathcal {I}}\) is an (ensemble-induced) square-root matrix, as defined in Eqs. 6 and 8. Now, for a generic function f with suitable regularity conditions, we have
To derive the results in Eq. 68, we have used the fact that \(\bar {\mathbf {m}}^{i} = \mathbf {m}_{c}^{i}\), and applied the first-order Taylor approximation (45), in the reverse order, to obtain the result in the second line.
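In code, the ensemble approximation behind Eq. 68 amounts to replacing a gradient-ensemble product by the square-root matrix of the transformed ensemble. The sketch below assumes the common convention that an ensemble-induced square root collects the deviations from the ensemble mean, scaled by \(1/\sqrt{N_e - 1}\); the function name is illustrative:

```python
# A sketch of the ensemble approximation in Eq. 68, assuming square-root
# matrices are (deviations from the ensemble mean) / sqrt(Ne - 1).
import numpy as np

def ensemble_sqrt(F):
    """F: (m_f, Ne) array whose j-th column is f(m_j) for a generic,
    sufficiently smooth function f. Returns the square-root matrix S_f,
    which approximates (gradient of f at the ensemble mean)^T @ S_I."""
    Ne = F.shape[1]
    return (F - F.mean(axis=1, keepdims=True)) / np.sqrt(Ne - 1.0)
```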
With Eq. 68, we then have
If one does not have the analytic form of the Hessian \(\boldsymbol {\nabla }_{\mathcal {R}}^{2}\) of the regularization operator \(\mathcal {R}\), then an ensemble-based approximation strategy can be further deployed to compute \(\mathbf {M}_{\mathcal {R}}^{i}\left (\mathbf {m}_{j}^{i} \right )\), which will be presented separately later.
Now we consider the computation of the product \(\left (\mathbf {H}^{i}\right ) \mathbf {C}_{m,j}^{i} \left (\mathbf {H}^{i}\right )^{T}\). Following the preceding derivation, we have
On the other hand, we have
Again, if \(\boldsymbol {\nabla }_{\mathcal {D}}^{2}\) cannot be evaluated exactly, then an ensemble-based approximation can be adopted, which will be discussed later. Assembling the results from Eqs. 67–73 into Eq. 65, we derive the following ensemble-based model update formula:
In Eq. 74, if the analytic form of the gradient \(\boldsymbol {\nabla }_{\mathcal {D}}\) is not available, then an ensemble-based approximation strategy can be adopted for its computation, as will be discussed later. Equation 75 defines an effective Kalman-gain-like matrix \(\tilde {\mathbf {K}}_{j}^{i}\) involved in the model update formula (74). For the purpose of calculating \(\tilde {\mathbf {K}}_{j}^{i}\), the involved square-root matrices \(\mathbf {S}_{\mathcal {I}}^{i}\) and \(\mathbf {S}_{\mathcal {T} \circ \mathbf {g}}^{i}\) are common to all ensemble members. The differences among ensemble members reside in the matrices \(\mathbf {M}_{\mathcal {D}}^{i}\left (\mathbf {d}_{j} \right )\) and \(\mathbf {M}_{\mathcal {R}}^{i}\left (\mathbf {m}_{j}^{i} \right )\), both of dimension Ne × Ne. If \(\mathbf {M}_{\mathcal {D}}^{i}\left (\mathbf {d}_{j} \right )\) and \(\mathbf {M}_{\mathcal {R}}^{i}\left (\mathbf {m}_{j}^{i} \right )\) vary over all ensemble members, then their storage costs are of order \({N_{e}^{3}}\), which is normally affordable in practice, with the typical ensemble size Ne being of order \(10^{2}\).
Below we show that the original IES algorithm (3) is a special case of the more general formula in Eq. 74. To this end, let the distance metric \(\mathcal {D}\) and the regularization operator \(\mathcal {R}\) be the quadratic mappings defined in Eqs. 11 and 12. In addition, let both transform operators \(\mathcal {T}\) and \(\mathcal {S}\) be the identity, such that
where \(\mathbf {I}_{N_{e}}\) is the Ne × Ne identity matrix. As a result, the update formula (74) reduces to
According to Eq. 49, one has
Therefore, Eq. 77 is equivalent to
which is exactly the same as the update formula induced by Eqs. 3–7.
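For concreteness, the quadratic special case admits a compact ensemble implementation. The sketch below is a minimal, illustrative rendering of an IES-type update, assuming the familiar ensemble form with an iteration-dependent regularization factor γ; the function signature and names are assumptions, not the paper's exact notation:

```python
# A minimal, illustrative sketch of a quadratic-cost IES update; names and
# signature are assumptions, not the paper's exact notation.
import numpy as np

def ies_update(M, G, D, Cd, gamma=1.0):
    """One IES iteration for the quadratic special case.
    M : (m, Ne) model ensemble
    G : (d, Ne) predicted data, G[:, j] = g(M[:, j])
    D : (d, Ne) (perturbed) observations
    Cd: (d, d) observation error covariance"""
    Ne = M.shape[1]
    Sm = (M - M.mean(axis=1, keepdims=True)) / np.sqrt(Ne - 1.0)
    Sd = (G - G.mean(axis=1, keepdims=True)) / np.sqrt(Ne - 1.0)
    # Kalman-gain-like matrix, common to all ensemble members
    K = Sm @ Sd.T @ np.linalg.inv(Sd @ Sd.T + gamma * Cd)
    return M + K @ (D - G)
```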
Appendix B: Approximations to the products between gradient or Hessian and certain ensemble-induced square-root matrices
Here, we consider ensemble-based approximations to the term \(\left (\mathbf {S}_{\mathcal {T} \circ \mathbf {g}}^{i}\right )^{T} \boldsymbol {\nabla }_{\mathcal {D}} \left [ \mathcal {T}\left (\mathbf {d}_{j}\right ) - \mathcal {T} \circ \mathbf {g}\left (\mathbf {m}_{j}^{i}\right ) \right ]\) in Eq. 74, the term \(\mathbf {M}_{\mathcal {R}}^{i}\left (\mathbf {m}_{j}^{i} \right )\) defined in Eq. 70, and the term \(\mathbf {M}_{\mathcal {D}}^{i}\left (\mathbf {d}_{j} \right )\) defined in Eq. 73, when the analytic forms of \(\boldsymbol {\nabla }_{\mathcal {D}}\), \(\boldsymbol {\nabla }_{\mathcal {R}}^{2}\) and \(\boldsymbol {\nabla }_{\mathcal {D}}^{2}\) are not available.
Similar to Eq. 68, by applying the first-order Taylor approximation (45), one has
where \(\mathbf {1}_{N_{e}}\) is an Ne-dimensional vector whose elements are all equal to 1.
Next, consider the ensemble approximation to \(\mathbf {M}_{\mathcal {R}}^{i}\left (\mathbf {m}_{j}^{i} \right )\). First, let us evaluate
Then we have
The element of \(\mathbf {M}_{\mathcal {R}}^{i}\left (\mathbf {m}_{j}^{i} \right )\) on the k th row (\(k = 1, 2, \dotsb , N_{e}\)) and the ℓ th column (\(\ell = 1, 2, \dotsb , N_{e}\)), denoted by \(\left [\mathbf {M}_{\mathcal {R}}^{i}\left (\mathbf {m}_{j}^{i} \right )\right ]_{k,\ell }\), is then given by
Clearly, one has \(\left [\mathbf {M}_{\mathcal {R}}^{i}\left (\mathbf {m}_{j}^{i} \right )\right ]_{k,\ell } = \left [\mathbf {M}_{\mathcal {R}}^{i}\left (\mathbf {m}_{j}^{i} \right )\right ]_{\ell ,k}\), meaning that the ensemble approximation leads to a symmetric matrix.
In a similar way, the elements of the ensemble approximation to the term \(\mathbf {M}_{\mathcal {D}}^{i}\left (\mathbf {d}_{j} \right )\) can be obtained as follows:
Again, the ensemble approximation leads to a symmetric matrix, since \(\left [ \mathbf {M}_{\mathcal {D}}^{i}\left (\mathbf {d}_{j} \right )\right ]_{k,\ell } = \left [ \mathbf {M}_{\mathcal {D}}^{i}\left (\mathbf {d}_{j} \right )\right ]_{\ell ,k}\).
Appendix C: Analytic forms of gradient and Hessian with respect to the \({\ell _{p}^{q}}\) metric
In the sequel, we derive the analytic forms of the gradient and Hessian of the distance metric \(\Vert \mathbf {B} \mathbf {x} {\Vert _{p}^{q}}\), where x is an mx-dimensional vector and B an Nb × mx matrix. The product Bx is thus an Nb-dimensional vector, and \(\Vert \mathbf {B} \mathbf {x} {\Vert _{p}^{q}}\) raises the ℓp norm of Bx to the q th power. In short, we call \(\Vert \mathbf {B} \mathbf {x} {\Vert _{p}^{q}}\) an \({\ell _{p}^{q}}\) metric hereafter.
By definition, we have the \({\ell _{p}^{q}}\) metric

\[ \Vert \mathbf{B} \mathbf{x} \Vert_{p}^{q} = \left( \sum\limits_{e=1}^{N_{b}} \left\vert \left(\mathbf{B} \mathbf{x}\right)_{e} \right\vert^{p} \right)^{q/p}, \qquad \left(\mathbf{B} \mathbf{x}\right)_{e} = \sum\limits_{f=1}^{m_{x}} B_{e,f} \, x_{f}, \]

where \(\left (\mathbf {B} \mathbf {x}\right )_{e}\) stands for the e th element of the vector Bx, xf for the f th element of the vector x (\(f = 1, 2, \dotsb , m_{x}\)), and Be, f for the element of the matrix B on the e th row and the f th column.
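This definition translates directly into code. The following minimal sketch (function name illustrative) computes the \({\ell _{p}^{q}}\) metric:

```python
# Direct implementation of the l_p^q metric ||Bx||_p^q; names illustrative.
import numpy as np

def lpq_metric(B, x, p, q):
    """Raise the l_p norm of the vector Bx to the q-th power."""
    r = B @ x
    return np.sum(np.abs(r) ** p) ** (q / p)
```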
With some algebra, one has
Note that to derive the result in the third line of Eq. 87 from the second one, we have used the fact that d|x|/dx = sgn(x) = x/|x| (with sgn being the sign function), while ignoring the singularity of the derivative at x = 0. In the context of ensemble-based data assimilation, this way of handling the singularity at x = 0 may work well in general, since we typically deal with gradients and Hessians evaluated at local points away from 0, owing to the deviations of ensemble members from the ensemble mean. For example, see the formula in Eq. 16.
Based on Eq. 87, one has

\[ \boldsymbol{\nabla}_{\Vert \mathbf{B} \mathbf{x} \Vert_{p}^{q}}\left(\mathbf{x}\right) = q \, \Vert \mathbf{B} \mathbf{x} \Vert_{p}^{q-p} \; \mathbf{B}^{T} \left( \vert \mathbf{B} \mathbf{x} \vert^{\dot\land (p-1)} \odot \dot{\mathrm{sgn}}\left(\mathbf{B} \mathbf{x}\right) \right). \]

In Eq. 89, the operator \(\dot \land \) raises the elements of a vector to a certain power, e.g., for a (column) vector \(\mathbf {y} = [y_{1},y_{2},\dotsb ]^{T}\), one has \(\vert \mathbf {y} \vert ^{\dot \land p} = \left [|y_{1}|^{p},|y_{2}|^{p},\dotsb \right ]^{T}\). The operator ⊙ stands for the element-wise (i.e., Hadamard) product, such that for \(\mathbf {x} = [x_{1},x_{2},\dotsb ]^{T}\) and \(\mathbf {y} = [y_{1},y_{2},\dotsb ]^{T}\), one has \(\mathbf {x} \odot \mathbf {y} = \left [x_{1} y_{1}, x_{2} y_{2}, \dotsb \right ]^{T}\).
As special cases, if p = q = 2 or p = q = 1, then Eqs. 88 and 89 lead to

\[ \boldsymbol{\nabla}_{\Vert \mathbf{B} \mathbf{x} \Vert_{2}^{2}}\left(\mathbf{x}\right) = 2 \, \mathbf{B}^{T} \mathbf{B} \mathbf{x} \]

or

\[ \boldsymbol{\nabla}_{\Vert \mathbf{B} \mathbf{x} \Vert_{1}^{1}}\left(\mathbf{x}\right) = \mathbf{B}^{T} \dot{\mathrm{sgn}}\left(\mathbf{B} \mathbf{x}\right), \]

where \(\dot{\mathrm{sgn}}\) denotes the sign function applied element-wise to its input.
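As a sanity check, the gradient above can be compared against finite differences. The sketch below (with illustrative names, assuming the gradient form just given) evaluates the gradient at a generic point away from the singularity at Bx = 0:

```python
# Gradient of ||Bx||_p^q and a finite-difference check; names illustrative.
import numpy as np

def lpq_gradient(B, x, p, q):
    """q * ||Bx||_p^(q-p) * B^T ( |Bx|^(p-1) (element-wise) * sgn(Bx) )."""
    r = B @ x
    norm_p = np.sum(np.abs(r) ** p) ** (1.0 / p)
    return q * norm_p ** (q - p) * (B.T @ (np.abs(r) ** (p - 1) * np.sign(r)))

def lpq_metric(B, x, p, q):
    return np.sum(np.abs(B @ x) ** p) ** (q / p)

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 3))
x = rng.standard_normal(3)            # generic point, away from Bx = 0
p, q, eps = 1.5, 2.0, 1e-6
fd = np.array([(lpq_metric(B, x + eps * e, p, q)
                - lpq_metric(B, x - eps * e, p, q)) / (2 * eps)
               for e in np.eye(3)])   # central finite differences
assert np.allclose(fd, lpq_gradient(B, x, p, q), atol=1e-5)
```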
In a similar way, we can proceed to compute the Hessian of \(\Vert \mathbf {B} \mathbf {x} {\Vert _{p}^{q}}\). With some algebra, we obtain
As a result, we have
where \(\mathbf {1}_{m_{x}}\) stands for an mx-dimensional vector whose elements are all equal to 1. As is evident in Eq. 91, \(\boldsymbol {\nabla }^{2}_{\Vert \mathbf {B} \mathbf {x} {\Vert _{p}^{q}}}\left (\mathbf {x}\right ) \) is symmetric. Interestingly, if 0 < q = p < 1, one can see that \(\boldsymbol {\nabla }^{2}_{\Vert \mathbf {B} \mathbf {x} {\Vert _{p}^{q}}}\left (\mathbf {x}\right ) \) is negative semi-definite, reflecting the fact that the metric \(\Vert \mathbf {B} \mathbf {x} {\Vert _{p}^{q}}\) is non-convex in this case.
Again, if p = q = 2 or p = q = 1, then Eqs. 91 and 92 lead to

\[ \boldsymbol{\nabla}^{2}_{\Vert \mathbf{B} \mathbf{x} \Vert_{2}^{2}}\left(\mathbf{x}\right) = 2 \, \mathbf{B}^{T} \mathbf{B} \]

or

\[ \boldsymbol{\nabla}^{2}_{\Vert \mathbf{B} \mathbf{x} \Vert_{1}^{1}}\left(\mathbf{x}\right) = \mathbf{0}. \]

The latter result implies that the Hessian of \(\Vert \mathbf {B} \mathbf {x} {\Vert _{1}^{1}}\) vanishes almost everywhere (except at Bx = 0).
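The special cases can likewise be verified in code. The sketch below uses a two-term expression for the Hessian obtained by differentiating the gradient above (our own reconstruction, consistent with the special cases, not a formula quoted from the text), and checks it against both special cases:

```python
# Hessian of ||Bx||_p^q: a reconstruction obtained by differentiating the
# gradient q * ||Bx||_p^(q-p) * B^T(|Bx|^(p-1) * sgn(Bx)); names illustrative.
import numpy as np

def lpq_hessian(B, x, p, q):
    r = B @ x
    norm_p = np.sum(np.abs(r) ** p) ** (1.0 / p)
    v = B.T @ (np.abs(r) ** (p - 1) * np.sign(r))
    return (q * (q - p) * norm_p ** (q - 2 * p) * np.outer(v, v)
            + q * (p - 1) * norm_p ** (q - p)
              * (B.T * np.abs(r) ** (p - 2)) @ B)   # B^T diag(|Bx|^(p-2)) B

rng = np.random.default_rng(2)
B = rng.standard_normal((5, 3))
x = rng.standard_normal(3)                          # generic point, Bx != 0
assert np.allclose(lpq_hessian(B, x, 2.0, 2.0), 2 * B.T @ B)  # p = q = 2
assert np.allclose(lpq_hessian(B, x, 1.0, 1.0), 0.0)          # p = q = 1, a.e.
```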
Appendix D: Gradient and Hessian of a mixture of regularization operators
Here we aim to evaluate the gradient and Hessian of a mixture of Kmix regularization operators in the transformed model space, in the form of
where wk is a scalar, representing the weight associated with the k th regularization operator \(\mathcal {R}_{k}\), which acts on the transformed model variables \(\mathcal {S}_{k} \left (\mathbf {x}\right )\), obtained by applying the transform operator \(\mathcal {S}_{k}\) to the model variables x.
For ease of comprehension, in the sequel we proceed by induction. To start, consider the case of a single regularization operator \(\mathcal {R}\), such that the regularization term is simply of the form \(\mathcal {R}\left (\mathcal {S} \left (\mathbf {x}\right )\right )\). In this case, we have
To get the Hessian, we have
To derive the third line of Eq. 95, we ignore the Hessian of the transform operator \(\mathcal {S}\) with respect to x, in line with the first-order Taylor approximation strategy in Eq. 55.
Now let us consider the case of a mixture of two regularization operators, in terms of \(w_{1} \mathcal {R}_{1}\left (\mathcal {S}_{1} \left (\mathbf {x}\right )\right ) + w_{2} \mathcal {R}_{2}\left (\mathcal {S}_{2} \left (\mathbf {x}\right )\right )\). In this case, we define
As a result, we obtain
Therefore, we have the gradient of \(\mathcal {R}\) with respect to \(\mathcal {S}\), in terms of
In addition, we have
Hence the Hessian of \(\mathcal {R}\) with respect to \(\mathcal {S}\) is given by
To obtain (107), we have used the fact that \(\partial \boldsymbol {\nabla }_{\mathcal {R}_{i}}\left (\mathcal {S}_{i}\right ) / \partial \mathcal {S}_{j} = \mathbf {0}\), for i≠j, i, j ∈{1, 2}.
Similarly, for a mixture of Kmix regularization operators, i.e.,
one has its gradient and Hessian in the transformed model space given by
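The resulting weighted sums lend themselves to a simple implementation. The sketch below (with illustrative names; gradients are laid out in the usual vector/Jacobian convention rather than the transposed convention of Eq. 46) accumulates the gradient and Hessian of \({\sum }_{k} w_{k} \mathcal {R}_{k}\left (\mathcal {S}_{k} \left (\mathbf {x}\right )\right )\) with respect to x, ignoring the Hessians of the transform operators as in Eq. 95:

```python
# Gradient and Hessian of a mixture of regularization operators; each term
# supplies callables for S_k, its Jacobian J_k, and the gradient/Hessian of
# R_k in the transformed space. All names are illustrative.
import numpy as np

def mixture_grad_hess(x, terms):
    """terms: iterable of tuples (w, S, J, grad_R, hess_R)."""
    g = np.zeros(x.size)
    H = np.zeros((x.size, x.size))
    for w, S, J, grad_R, hess_R in terms:
        s, Jx = S(x), J(x)                 # transformed variables and Jacobian
        g += w * Jx.T @ grad_R(s)          # first-order chain rule
        H += w * Jx.T @ hess_R(s) @ Jx     # Hessian of S_k ignored (cf. Eq. 95)
    return g, H
```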
Appendix E: Additional results in a 2D five-spot example
E.1 The reservoir model
Here we present some additional numerical results obtained by applying a subset of the \({\ell _{p}^{q}}\)-GIES algorithms to a 2D five-spot example, in which the numerical reservoir model has dimensions of 50 × 50 gridblocks, with oil, water and gas phases. There are four producers (labeled P1 – P4) at the corners of the model, and one injector (labeled I1) in the center, as indicated in Fig. 11. The locations of these wells, in Cartesian coordinates (x, y), are as follows: P1 at (2, 2); P2 at (49, 2); P3 at (2, 49); P4 at (49, 49); and I1 at (25, 25).
The parameters to be estimated consist of permeability along the x direction (PERMX) and porosity (PORO) on all reservoir gridblocks. Initial ensembles for PERMX and PORO (with 100 ensemble members) are generated through sequential Gaussian simulation. Figure 11 shows the PERMX and PORO maps of the reference model, which is used to generate production data every 30 days over a total period of 1500 days. One of the producers, P4, is shut in during history matching; the production data therefore consist of well oil/water/gas production rates (WOPR/WWPR/WGPR) from the other three producers, and well bottom-hole pressures (WBHP) from wells P1 – P3 and I1, every 30 days. The number of production data used for history matching is 650 in total. The observed production data are contaminated by zero-mean Gaussian white noise. For WOPR/WWPR/WGPR data, the standard deviations (STDs) of the observation noise are 10% of the data magnitudes, or set to \(10^{-6}\) if the observed production data (e.g., WWPR) are equal to zero; for WBHP data, the STDs are 1 bar.
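For illustration, the noise specification above can be assembled as follows (a minimal sketch; array names and the ordering of the data vector are assumptions):

```python
# Observation-noise standard deviations: 10% of magnitude for the rate data,
# floored at 1e-6 where the observed rate is zero, and 1 bar for WBHP.
import numpy as np

def obs_noise_std(rates, wbhp):
    """rates: observed WOPR/WWPR/WGPR values; wbhp: observed WBHP values (bar)."""
    std_rates = np.where(rates == 0.0, 1.0e-6, 0.10 * np.abs(rates))
    std_wbhp = np.full(wbhp.shape, 1.0)     # 1 bar
    return np.concatenate([std_rates, std_wbhp])
```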
The \({\ell _{p}^{q}}\)-GIES algorithms adopted for this case study are constructed based on the following considerations. As Fig. 11 indicates, there are spatial patterns in the reservoir models that can be exploited by the algorithms, e.g., through calculations of the first-order variations of the PERMX and PORO maps, similar to the previous channelized reservoir characterization problem. On the other hand, histograms of PERMX and PORO in the current case study do not provide particularly useful information about spatial patterns. As such, we choose to construct the \({\ell _{p}^{q}}\)-GIES algorithms by solving the GMAC problem with the following cost function:
where \(\mathcal {S}_{V}\) is a transform operator similar to that in the channelized reservoir characterization problem, but applied to both PERMX and PORO maps; and the weights wi (i = 1, 2, 3) are determined in a way similar to that in Eq. 43, which leads to the scalar coefficients αi (i = 1, 2, 3) in Table 4.
Similar to the situation in the channelized reservoir characterization problem, we adopt 3-bit binary codes to refer to the \({\ell _{p}^{q}}\)-GIES algorithms derived from Eq. 111, which leads to 7 algorithms after excluding the one with the code 000 (i.e., no regularization). The characteristics of these 7 algorithms are the same as those summarized in Table 2, except that here only three individual regularization terms, namely, \(\left (\mathbf {m}_{j}^{i+1} - \mathbf {m}_{j}^{i} \right )^{T} \left (\mathbf {C}_{\mathbf {m}}^{i}\right )^{-1} \left (\mathbf {m}_{j}^{i+1} - \mathbf {m}_{j}^{i} \right )\), \(\Vert \mathcal {S}_{V} \left (\mathbf {m}_{j}^{i+1} \right )- \mathcal {S}_{V} \left (\mathbf {m}_{j}^{i}\right ) {\Vert _{2}^{2}}\) and \(\Vert \mathcal {S}_{V} \left (\mathbf {m}_{j}^{i+1} \right )- \mathcal {S}_{V} \left (\mathbf {m}_{j}^{i}\right ) {\Vert _{1}^{2}}\), are adopted. All these \({\ell _{p}^{q}}\)-GIES algorithms are equipped with the automatic and adaptive localization scheme RndShfl-GC of [31] during history matching, and are run for 10 iteration steps.
In this example, we use the root mean square error (RMSE) as a measure to cross-validate the history matching performance. Given an m-dimensional reference model mref and an ensemble \({\mathscr{M}}^{i} = \{ \mathbf {m}_{j}^{i} \}_{j=1}^{{N_{e}}}\) of reservoir models at the i th iteration step, we compute an ensemble \({\varOmega }({\mathscr{M}}^{i})\) of RMSE values as follows:

\[ {\varOmega}({\mathscr{M}}^{i}) = \left\{ \sqrt{ \frac{1}{m} \, \Vert \mathbf{m}_{j}^{i} - \mathbf{m}_{\text{ref}} \Vert_{2}^{2} } \right\}_{j=1}^{N_{e}}. \]
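In code (a minimal sketch with illustrative names):

```python
# RMSE of each ensemble member against the reference model.
import numpy as np

def rmse_ensemble(M, m_ref):
    """M: (m, Ne) ensemble of models; m_ref: (m,) reference model.
    Returns the Ne RMSE values, one per ensemble member."""
    return np.sqrt(np.mean((M - m_ref[:, None]) ** 2, axis=0))
```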
Table 4 reports the data mismatch and RMSE values (in the form of mean ± STD) with respect to the final ensembles of the 7 \({\ell _{p}^{q}}\)-GIES algorithms. The RMSE values are listed for both PERMX and PORO. For reference, the data mismatch, RMSE of PORO and RMSE of PERMX with respect to the initial ensemble are (1.7376 ± 4.3294) × \(10^{4}\), 0.0644 ± 0.0037 and 0.4367 ± 0.0263, respectively. For performance assessment, we adopt the average RMSE over PERMX and PORO (with equal weights). In this setting, the original IES (code 100) ranks 6th, meaning that several other \({\ell _{p}^{q}}\)-GIES algorithms outperform the original IES. As such, one can draw a conclusion similar to that in the previous channelized reservoir characterization problem: it is possible to design, through the GIES framework, new ensemble history matching algorithms that may perform better than the original IES algorithm in certain circumstances.
In the current case study, except for the \({\ell _{p}^{q}}\)-GIES algorithm with the code 001 (for convenience, hereafter we simply call it algorithm 001, and the same convention applies to the other algorithms), the algorithms result in very similar RMSE values for the estimated PORO maps. On the other hand, the differences among the RMSE values of PERMX are more substantial. In particular, although algorithm 001 has a higher mean data mismatch value, its RMSE values for both PORO and PERMX are the lowest among the tested algorithms.
To limit the length of the current work, we do not present the full numerical results. Instead, we focus on inspecting the impacts of a few \({\ell _{p}^{q}}\)-GIES algorithms on the model updates. To this end, Fig. 12 reports the PERMX and PORO maps of one initial reservoir model, and the corresponding maps of the final models obtained by algorithms 100 (i.e., the original IES), 001 and 010, respectively. In line with the results in Table 4, the PORO maps obtained by algorithms 100 and 010 are very similar. In comparison to the initial PORO map, a noticeable difference can be observed in the area around the coordinate (45, 15). On the other hand, the final PORO map obtained by algorithm 001 bears more structural differences from those of the other two algorithms. Meanwhile, compared to the initial PORO map, the final PORO map of algorithm 001 also exhibits some clear differences in the area around the coordinate (12, 7). A similar conclusion can be drawn for the PERMX maps, especially if one compares the PERMX maps in the area around the coordinate (18, 12). As such, in this particular case study, it appears that, owing to the use of the regularization term \(\Vert \mathcal {S}_{V} \left (\mathbf {m}_{j}^{i+1} \right )- \mathcal {S}_{V} \left (\mathbf {m}_{j}^{i}\right ) {\Vert _{1}^{2}}\) with the ℓ1 norm, algorithm 001 is able to produce flatter regions, in which the estimated parameters (PERMX and PORO) exhibit less spatial variation. This property is less noticeable in algorithms 100 and 010, in which the distance is measured by the ℓ2 norm instead.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Luo, X. Novel iterative ensemble smoothers derived from a class of generalized cost functions. Comput Geosci 25, 1159–1189 (2021). https://doi.org/10.1007/s10596-021-10046-1