Abstract
Iterative ensemble smoothers (IES) are among the state-of-the-art approaches to solving history matching problems. From an optimization-theoretic point of view, these algorithms can be derived by solving certain stochastic nonlinear least-squares problems. In a broader picture, history matching is essentially an inverse problem, which is often ill-posed and may not possess a unique solution. To mitigate the ill-posedness, prior knowledge and domain experience are often incorporated, in the course of solving an inverse problem, as a regularization term in a suitable cost function within a respective optimization problem. Whereas inverse theory offers a rich class of inversion algorithms resulting from various choices of regularized cost functions, few ensemble data assimilation algorithms (IES included) are implemented in practice in a form that goes beyond nonlinear least squares. This work aims to narrow this gap. Specifically, we consider a class of more generalized cost functions, and establish a unified formula that can be used to construct a corresponding group of novel ensemble data assimilation algorithms, called generalized IES (GIES), in a principled and systematic way. For demonstration, we choose a subset (up to 30+) of the GIES algorithms derived from the unified formula, and apply them to two history matching problems. Experimental results indicate that many of the tested GIES algorithms outperform an original IES developed in a previous work, showcasing the potential benefit of designing new ensemble data assimilation algorithms through the proposed framework.
References
Kay, S.M.: Fundamentals of statistical signal processing. vol 1: Estimation theory. Prentice Hall PTR (1993)
Evensen, G.: Data assimilation: The ensemble Kalman filter. Springer Science & Business Media (2009)
Kalnay, E.: Atmospheric modeling, data assimilation and predictability. Cambridge University Press (2002)
Oliver, D.S., Reynolds, A.C., Liu, N.: Inverse theory for petroleum reservoir characterization and history matching. Cambridge University Press (2008)
Tarantola, A.: Inverse problem theory and methods for model parameter estimation. SIAM (2005)
Engl, H.W., Hanke, M., Neubauer, A.: Regularization of inverse problems. Springer (2000)
Chen, Y., Oliver, D.: Levenberg-Marquardt forms of the iterative ensemble smoother for efficient history matching and uncertainty quantification. Comput. Geosci. 17, 689–703 (2013)
Chen, Y., Oliver, D.S.: Ensemble randomized maximum likelihood method as an iterative ensemble smoother. Math. Geosci. 44, 1–26 (2012)
Emerick, A.A.: Deterministic ensemble smoother with multiple data assimilation as an alternative for history-matching seismic data. Comput. Geosci. 22(5), 1175–1186 (2018)
Emerick, A.A., Reynolds, A.C.: Ensemble smoother with multiple data assimilation. Comput. Geosci. 55, 3–15 (2013)
Evensen, G., Raanes, P.N., Stordal, A.S., Hove, J.: Efficient implementation of an iterative ensemble smoother for big-data assimilation and reservoir history matching. Front. Appl. Math. Stat. 5, 47 (2019)
Iglesias, M.A.: Iterative regularization for ensemble data assimilation in reservoir models. Comput. Geosci. 19, 177–212 (2015)
Luo, X., Stordal, A., Lorentzen, R., Nævdal, G.: Iterative ensemble smoother as an approximate solution to a regularized minimum-average-cost problem: theory and applications. SPE J. 20, 962–982 (2015). https://doi.org/10.2118/176023-PA, SPE-176023-PA
Ma, X., Bi, L.: A robust adaptive iterative ensemble smoother scheme for practical history matching applications. Comput. Geosci. 23(3), 415–442 (2019)
Stordal, A.S., Elsheikh, A.H.: Iterative ensemble smoothers in the annealed importance sampling framework. Adv. Water Resour. 86, 231–239 (2015)
Evensen, G.: Analysis of iterative ensemble smoothers for solving inverse problems. Comput. Geosci. 22 (2018). https://doi.org/10.1007/s10596-018-9731-y
Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenom. 60(1-4), 259–268 (1992)
Chen, X., Xu, F., Ye, Y.: Lower bound theory of nonzero entries in solutions of ℓ2-ℓp minimization. SIAM J. Sci. Comput. 32(5), 2832–2852 (2010)
Li, L., Jiang, S., Huang, Q.: Learning hierarchical semantic description via mixed-norm regularization for image understanding. IEEE Trans. Multimed. 14(5), 1401–1413 (2012)
Park, T., Casella, G.: The Bayesian Lasso. J. Am. Stat. Assoc. 103(482), 681–686 (2008)
Li, Q., Lin, N., et al.: The Bayesian elastic net. Bayesian Anal. 5(1), 151–170 (2010)
Luo, X., Bhakta, T., Jakobsen, M., Nævdal, G.: An ensemble 4D-seismic history-matching framework with sparse representation based on wavelet multiresolution analysis. SPE J. 22, 985–1010 (2017). https://doi.org/10.2118/180025-PA, SPE-180025-PA
Luo, X., Bhakta, T., Jakobsen, M., Nævdal, G.: Efficient big data assimilation through sparse representation: A 3D benchmark case study in petroleum engineering. PLOS ONE 13, e0198586 (2018)
Soares, R.V., Luo, X., Evensen, G., Bhakta, T.: 4D seismic history matching: Assessing the use of a dictionary learning based sparse representation method. J. Pet. Sci. Eng. 195, 107763 (2020)
Lorentzen, R., Flornes, K., Nævdal, G.: History matching channelized reservoirs using the ensemble Kalman filter. SPE J. 17, 137–151 (2012)
Canchumuni, S.W.A., Emerick, A.A., Pacheco, M.A.C.: History matching geological facies models based on ensemble smoother and deep generative models. J. Pet. Sci. Eng. 177, 941–958 (2019)
Jafarpour, B.: Wavelet reconstruction of geologic facies from nonlinear dynamic flow measurements. IEEE Trans. Geosci. Remote Sens. 49, 1520–1535 (2011)
Sarma, P., Chen, W.H., et al.: Generalization of the ensemble Kalman filter using kernels for non-Gaussian random fields. In: SPE Reservoir Simulation Symposium. Society of Petroleum Engineers, The Woodlands. SPE-119177-MS (2009)
Strebelle, S.: Conditional simulation of complex geological structures using multiple-point statistics. Math. Geol. 34, 1–21 (2002)
Cover, T.M., Thomas, J.A.: Elements of information theory. Wiley (2012)
Luo, X., Bhakta, T.: Automatic and adaptive localization for ensemble-based history matching. J. Pet. Sci. Eng. 184, 106559 (2020)
Magnus, J.R., Neudecker, H.: Matrix differential calculus with applications in statistics and econometrics. Wiley (2019)
Simon, D.: Optimal state estimation: Kalman, H-infinity, and nonlinear approaches. Wiley-Interscience (2006)
Acknowledgements
The author would like to thank two anonymous reviewers for their valuable and constructive suggestions. The author acknowledges financial support from the Research Council of Norway through the Petromaks-2 project DIGIRES (RCN no. 280473) and the industrial partners AkerBP, Wintershall DEA, Vår Energi, Petrobras, Equinor, Lundin and Neptune Energy, and would also like to thank Schlumberger for providing academic licenses to ECLIPSE©.
Funding
Open access funding provided by NORCE Norwegian Research Centre AS.
Appendices
Appendix A: Iterative ensemble smoothers derived from a class of generalized cost functions
In the sequel, we proceed to develop an approximate solution to the generalized MAC problem in Eqs. 9 and 10, in a way similar to that in [13]. To this end, we need to assume certain regularity conditions, namely, that the operators \(\mathcal {D}\), \(\mathcal {R}\), \(\mathcal {T}\), \(\mathcal {S}\) and g are (locally) differentiable up to the relevant orders in the derivations below.
We start by considering the first-order Taylor approximation

\[ \mathbf{f}\left(\mathbf{x}_{0} + \delta \mathbf{x}\right) \approx \mathbf{f}\left(\mathbf{x}_{0}\right) + \left(\boldsymbol{\nabla}_{f}\left(\mathbf{x}_{0}\right)\right)^{T} \delta \mathbf{x}, \]

where f is a vector-valued function, x0 is an input vector associated with a (relatively small) perturbation vector δx, and \(\boldsymbol {\nabla }_{f}\left (\mathbf {x}_{0}\right )\), as defined in Eq. 46, represents the gradient of f evaluated at the point x0. Note that throughout this work, we adopt the following convention: given the mx-dimensional vector x0 and the function \(\mathbf {f}: \mathbb {R}^{m_{x}} \rightarrow \mathbb {R}^{m_{f}}\), the gradient \(\boldsymbol {\nabla }_{f}\left (\mathbf {x}_{0}\right )\) is a matrix of dimension mx × mf.
The following formulas of matrix calculus [32],

\[ \frac{\partial\, \mathbf{f}\left(\mathbf{u}\left(\mathbf{x}\right)\right)}{\partial \mathbf{x}} = \frac{\partial \mathbf{u}\left(\mathbf{x}\right)}{\partial \mathbf{x}} \, \frac{\partial \mathbf{f}\left(\mathbf{u}\right)}{\partial \mathbf{u}}, \qquad \frac{\partial\, \mathbf{M} \mathbf{u}\left(\mathbf{x}\right)}{\partial \mathbf{x}} = \frac{\partial \mathbf{u}\left(\mathbf{x}\right)}{\partial \mathbf{x}} \, \mathbf{M}^{T}, \]

will be used in our derivation later. In Eq. 47, \(\mathbf {f}\left (\mathbf {u}\left (\mathbf {x} \right )\right )\) represents the composition of the two vector functions f and u, with the dummy input vector x. Equation 48 may be considered a special case of Eq. 47, in which \(\mathbf {f}\left (\mathbf {u}\left (\mathbf {x} \right )\right ) = \mathbf {M} \mathbf {u}\left (\mathbf {x} \right )\) for a constant matrix M, so that \(\partial \mathbf{M}\mathbf{u} / \partial \mathbf{u} = \mathbf{M}^{T}\).
In addition, the following matrix identity,

\[ \mathbf{C}_{m} \mathbf{H}^{T} \left(\mathbf{H} \mathbf{C}_{m} \mathbf{H}^{T} + \mathbf{C}_{d}\right)^{-1} = \left(\mathbf{C}_{m}^{-1} + \mathbf{H}^{T} \mathbf{C}_{d}^{-1} \mathbf{H}\right)^{-1} \mathbf{H}^{T} \mathbf{C}_{d}^{-1}, \]

will also be found useful in our derivation later. The left-hand side (LHS) of Eq. 49 would correspond to the Kalman gain matrix in the Kalman filter, if one treats Cm as the prior model error covariance matrix, H as a linear observation operator that maps a model vector onto the observation space, and Cd as the observation error covariance matrix. The right-hand side (RHS) of Eq. 49 is another way to represent the Kalman gain matrix, and is often used to formulate the information filter [33].
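The identity is straightforward to verify numerically. The following minimal sketch (with illustrative matrix names and dimensions) checks that the two Kalman-gain forms in Eq. 49 agree on randomly generated symmetric positive-definite covariances:

```python
# Minimal numerical check of the Kalman-gain identity (Eq. 49); all names
# and dimensions here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
m, d = 6, 4                                   # model and data dimensions
A = rng.standard_normal((m, m))
Cm = A @ A.T + m * np.eye(m)                  # SPD prior model error covariance
B = rng.standard_normal((d, d))
Cd = B @ B.T + d * np.eye(d)                  # SPD observation error covariance
H = rng.standard_normal((d, m))               # linear observation operator

# LHS: Kalman-filter form of the gain
lhs = Cm @ H.T @ np.linalg.inv(H @ Cm @ H.T + Cd)
# RHS: information-filter form of the gain
rhs = np.linalg.inv(np.linalg.inv(Cm) + H.T @ np.linalg.inv(Cd) @ H) @ H.T @ np.linalg.inv(Cd)

assert np.allclose(lhs, rhs)                  # the two forms coincide
```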
To solve the minimization problem in Eqs. 9–10, we aim to set
To this end, the following linearization strategy [13], based on the first-order Taylor approximation (45), is adopted:
where \(\mathbf {m}_{c}^{i}\) is a “common” point with respect to the (background) ensemble \({\mathscr{M}}^{i}\). The motivation for using this “common” point is to avoid evaluating the gradients of the operators \(\mathcal {T} \circ \mathbf {g}\) and \(\mathcal {S}\) at multiple points. There are various possible choices for \(\mathbf {m}_{c}^{i}\), e.g., the ensemble mean of \({\mathscr{M}}^{i}\), or the ensemble member closest to the ensemble mean [13]. In the current work, by default we let \(\mathbf {m}_{c}^{i}\) be the ensemble mean of \({\mathscr{M}}^{i}\), unless otherwise stated. We note that the choice of \(\mathbf {m}_{c}^{i}\) does not affect the derivations below.
Inserting (51) into the data mismatch term \(\mathcal {D}\) of Eq. 10, we have
Accordingly, we have
In Eq. 54, we use Eqs. 47 and 48 to derive the first line. In the second line, we then apply the first-order Taylor approximation (45), around the point \(\mathcal {T}\left (\mathbf {d}_{j}\right ) - \mathcal {T} \circ \mathbf {g}\left (\mathbf {m}_{c}^{i}\right )\), to the function \(\boldsymbol {\nabla }_{\mathcal {D}}\) in the first line. As a result, we obtain a second-order gradient (i.e., Hessian) \(\boldsymbol {\nabla }_{\boldsymbol {\nabla }_{\mathcal {D}}}\), which is denoted by \(\boldsymbol {\nabla }_{\mathcal {D}}^{2}\) in the third line for short.
Similarly, by inserting (52) into the regularization term \(\mathcal {R}\) of Eq. 10, we have
and the corresponding derivative is
Combining (10), (50), (54) and (56), and with some linear algebra, one has
Since in ensemble-based methods the model variables \(\mathbf {m}_{j}^{i+1}\) are updated from \(\mathbf {m}_{j}^{i}\), rather than from \(\mathbf {m}_{c}^{i}\), we rearrange (57) as follows:
After regrouping different terms, the RHS of Eq. 58 can be rewritten as \(\mathcal {E}+\mathcal {F}\), with
In Eq. 59, the second line is obtained by applying the Taylor approximation (45) (in the reverse order) to the function \(\boldsymbol {\nabla }_{\mathcal {D}}\), around the point \(\mathcal {T}\left (\mathbf {d}_{j}\right ) - \mathcal {T} \circ \mathbf {g}\left (\mathbf {m}_{c}^{i}\right )\). Likewise, the third line is derived by applying (45) again, now to the function \(\mathcal {T} \circ \mathbf {g}\), around the point \(\mathbf {m}_{c}^{i}\).
Likewise, we have
To derive the final result in Eq. 60, we make the following assumption: \(\boldsymbol {\nabla }_{\mathcal {R}}\left (\mathbf {0}\right ) = \mathbf {0}\). The rationale behind this assumption is that, in a conventional setting of regularized inverse-problem theory [6], the regularization term typically achieves its minimum value (and actually vanishes) at zero.
Combining (58)–(60), one obtains
As the Hessians \(\boldsymbol {\nabla }_{\mathcal {D}}^{2} \left [\mathcal {T}\left (\mathbf {d}_{j}\right ) - \mathcal {T} \circ \mathbf {g}\left (\mathbf {m}_{c}^{i}\right )\right ]\) and \(\boldsymbol {\nabla }_{\mathcal {R}}^{2} \left [\mathcal {S}\left (\mathbf {m}_{c}^{i}\right ) - \mathcal {S}\left (\mathbf {m}_{j}^{i}\right )\right ]\) are symmetric, we discard the transpose operator T hereafter. To proceed further, we need to apply the identity (49) to (61). To this end, we make the following assignments:
Then using Eq. 49 and with some algebra, one has the following update formula:
Before proceeding further, we have a few remarks. First, the assimilation algorithm presented in Eqs. 62–66 provides an approximate solution to the GMAC problem, without yet using ensemble approximations to the first- and second-order gradients therein. If one already has access to all required gradients, then they can be used directly in the assimilation algorithm, which may lead to improved history matching performance. On the other hand, if some of the gradients are unavailable or impractical to compute and/or store, then one may employ ensemble approximations to obtain certain (sometimes partially) derivative-free algorithms, as will be discussed later.
Second, in the original IES algorithm (3)–(7), the Kalman-gain-like matrix Ki in Eq. 4 is common to all ensemble members. In contrast, in the case with a generalized cost function, e.g., that in Eq. 10, the corresponding Kalman-gain-like matrix \(\mathbf {K}_{j}^{i}\) in Eq. 66 in general depends on both the individual ensemble member \(\mathbf {m}_{j}^{i}\) and the (potentially) perturbed data dj, as one can see from Eqs. 62–63. While the diversity of \(\mathbf {K}_{j}^{i}\) may be of theoretical interest, one practical implication is the increased memory consumption for storing these Kalman-gain-like matrices. This problem can be avoided if one lets both the distance metric \(\mathcal {D}\) and the regularization operator \(\mathcal {R}\) be quadratic functions, as in the problem formulation that leads to the original IES, cf. Eq. 2 or Eqs. 11 and 12; or partially mitigated if one adopts ensemble approximations to compute \(\mathbf {K}_{j}^{i}\), as will be shown below.
To obtain ensemble-based approximations, we start by computing the product \(\mathbf {C}_{m,j}^{i} \left (\mathbf {H}^{i}\right )^{T}\) involved in the computation of \(\mathbf {K}_{j}^{i}\). Let
where \(\mathbf {S}_{\mathcal {I}}\) is an (ensemble-induced) square-root matrix, as defined in Eqs. 6 and 8. Now, for a generic function f with suitable regularity conditions, we have
To derive the results in Eq. 68, we have used the fact that \(\bar {\mathbf {m}}^{i} = \mathbf {m}_{c}^{i}\), and applied the first-order Taylor approximation (45), in the reverse order, to obtain the result in the second line.
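In code, the ensemble approximation behind Eq. 68 amounts to replacing a gradient-ensemble product by the square-root matrix of the transformed ensemble. The sketch below assumes the common convention that an ensemble-induced square root collects the deviations from the ensemble mean, scaled by \(1/\sqrt{N_e - 1}\); the function name is illustrative:

```python
# A sketch of the ensemble approximation in Eq. 68, assuming square-root
# matrices are (deviations from the ensemble mean) / sqrt(Ne - 1).
import numpy as np

def ensemble_sqrt(F):
    """F: (m_f, Ne) array whose j-th column is f(m_j) for a generic,
    sufficiently smooth function f. Returns the square-root matrix S_f,
    which approximates (gradient of f at the ensemble mean)^T @ S_I."""
    Ne = F.shape[1]
    return (F - F.mean(axis=1, keepdims=True)) / np.sqrt(Ne - 1.0)
```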
With Eq. 68, we then have
If one does not have the analytic form of the Hessian \(\boldsymbol {\nabla }_{\mathcal {R}}^{2}\) of the regularization operator \(\mathcal {R}\), then an ensemble-based approximation strategy can be further deployed to compute \(\mathbf {M}_{\mathcal {R}}^{i}\left (\mathbf {m}_{j}^{i} \right )\), which will be presented separately later.
Now we consider the computation of the product \(\left (\mathbf {H}^{i}\right ) \mathbf {C}_{m,j}^{i} \left (\mathbf {H}^{i}\right )^{T}\). Following the preceding derivation, we have
On the other hand, we have
Again, if \(\boldsymbol {\nabla }_{\mathcal {D}}^{2}\) cannot be evaluated exactly, then an ensemble-based approximation can be adopted, which will be discussed later. Assembling the results from Eqs. 67–73 into Eq. 65, we derive the following ensemble-based model update formula:
In Eq. 74, if the analytic form of the gradient \(\boldsymbol {\nabla }_{\mathcal {D}}\) is not available, then an ensemble-based approximation strategy can be adopted for its computation, as will be discussed later. Equation 75 defines an effective Kalman-gain-like matrix \(\tilde {\mathbf {K}}_{j}^{i}\) involved in the model update formula (74). For the purpose of calculating \(\tilde {\mathbf {K}}_{j}^{i}\), the involved square-root matrices \(\mathbf {S}_{\mathcal {I}}^{i}\) and \(\mathbf {S}_{\mathcal {T} \circ \mathbf {g}}^{i}\) are common to all ensemble members. The differences among ensemble members reside in the matrices \(\mathbf {M}_{\mathcal {D}}^{i}\left (\mathbf {d}_{j} \right )\) and \(\mathbf {M}_{\mathcal {R}}^{i}\left (\mathbf {m}_{j}^{i} \right )\), both of dimension Ne × Ne. If \(\mathbf {M}_{\mathcal {D}}^{i}\left (\mathbf {d}_{j} \right )\) and \(\mathbf {M}_{\mathcal {R}}^{i}\left (\mathbf {m}_{j}^{i} \right )\) vary over all ensemble members, then their storage costs are of order \({N_{e}^{3}}\), which is normally affordable in practice, with the typical ensemble size Ne being of order \(10^{2}\).
Below we show that the original IES algorithm (3) is a special case of the more general formula in Eq. 74. To this end, let the distance metric \(\mathcal {D}\) and the regularization operator \(\mathcal {R}\) be the quadratic mappings defined in Eqs. 11 and 12. In addition, let both transform operators \(\mathcal {T}\) and \(\mathcal {S}\) be the identity, such that
where \(\mathbf {I}_{N_{e}}\) is the Ne × Ne identity matrix. As a result, the update formula (74) reduces to
According to Eq. 49, one has
Therefore, Eq. 77 is equivalent to
which is exactly the same as the update formula induced by Eqs. 3–7.
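For concreteness, the quadratic special case admits a compact ensemble implementation. The sketch below is a minimal, illustrative rendering of an IES-type update, assuming the familiar ensemble form with an iteration-dependent regularization factor γ; the function signature and names are assumptions, not the paper's exact notation:

```python
# A minimal, illustrative sketch of a quadratic-cost IES update; names and
# signature are assumptions, not the paper's exact notation.
import numpy as np

def ies_update(M, G, D, Cd, gamma=1.0):
    """One IES iteration for the quadratic special case.
    M : (m, Ne) model ensemble
    G : (d, Ne) predicted data, G[:, j] = g(M[:, j])
    D : (d, Ne) (perturbed) observations
    Cd: (d, d) observation error covariance"""
    Ne = M.shape[1]
    Sm = (M - M.mean(axis=1, keepdims=True)) / np.sqrt(Ne - 1.0)
    Sd = (G - G.mean(axis=1, keepdims=True)) / np.sqrt(Ne - 1.0)
    # Kalman-gain-like matrix, common to all ensemble members
    K = Sm @ Sd.T @ np.linalg.inv(Sd @ Sd.T + gamma * Cd)
    return M + K @ (D - G)
```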
Appendix B: Approximations to the products between gradient or Hessian and certain ensemble-induced square-root matrices
Here, we consider ensemble-based approximations to the term \(\left (\mathbf {S}_{\mathcal {T} \circ \mathbf {g}}^{i}\right )^{T} \boldsymbol {\nabla }_{\mathcal {D}} \left [ \mathcal {T}\left (\mathbf {d}_{j}\right ) - \mathcal {T} \circ \mathbf {g}\left (\mathbf {m}_{j}^{i}\right ) \right ]\) in Eq. 74, the term \(\mathbf {M}_{\mathcal {R}}^{i}\left (\mathbf {m}_{j}^{i} \right )\) defined in Eq. 70, and the term \(\mathbf {M}_{\mathcal {D}}^{i}\left (\mathbf {d}_{j} \right )\) defined in Eq. 73, when the analytic forms of \(\boldsymbol {\nabla }_{\mathcal {D}}\), \(\boldsymbol {\nabla }_{\mathcal {R}}^{2}\) and \(\boldsymbol {\nabla }_{\mathcal {D}}^{2}\) are not available.
Similar to Eq. 68, by applying the first-order Taylor approximation (45), one has
where \(\mathbf {1}_{N_{e}}\) is an Ne-dimensional vector whose elements are all equal to 1.
Next, consider the ensemble approximation to \(\mathbf {M}_{\mathcal {R}}^{i}\left (\mathbf {m}_{j}^{i} \right )\). First, let us evaluate
Then we have
The element of \(\mathbf {M}_{\mathcal {R}}^{i}\left (\mathbf {m}_{j}^{i} \right )\) on the k th row (\(k = 1, 2, \dotsb , N_{e}\)) and the ℓ th column (\(\ell = 1, 2, \dotsb , N_{e}\)), denoted by \(\left [\mathbf {M}_{\mathcal {R}}^{i}\left (\mathbf {m}_{j}^{i} \right )\right ]_{k,\ell }\), is then given by
Clearly, one has \(\left [\mathbf {M}_{\mathcal {R}}^{i}\left (\mathbf {m}_{j}^{i} \right )\right ]_{k,\ell } = \left [\mathbf {M}_{\mathcal {R}}^{i}\left (\mathbf {m}_{j}^{i} \right )\right ]_{\ell ,k}\), meaning that the ensemble approximation leads to a symmetric matrix.
In a similar way, the elements of the ensemble approximation to the term \(\mathbf {M}_{\mathcal {D}}^{i}\left (\mathbf {d}_{j} \right )\) can be obtained as follows:
Again, the ensemble approximation leads to a symmetric matrix, since \(\left [ \mathbf {M}_{\mathcal {D}}^{i}\left (\mathbf {d}_{j} \right )\right ]_{k,\ell } = \left [ \mathbf {M}_{\mathcal {D}}^{i}\left (\mathbf {d}_{j} \right )\right ]_{\ell ,k}\).
Appendix C: Analytic forms of gradient and Hessian with respect to the \({\ell _{p}^{q}}\) metric
In the sequel, we derive the analytic forms of the gradient and Hessian of the distance metric \(\Vert \mathbf {B} \mathbf {x} {\Vert _{p}^{q}}\), where x is an mx-dimensional vector and B an Nb × mx matrix. The product Bx is thus an Nb-dimensional vector, and \(\Vert \mathbf {B} \mathbf {x} {\Vert _{p}^{q}}\) raises the ℓp norm of Bx to the q th power. In short, we call \(\Vert \mathbf {B} \mathbf {x} {\Vert _{p}^{q}}\) an \({\ell _{p}^{q}}\) metric hereafter.
By definition, we have the \({\ell _{p}^{q}}\) metric

\[ \Vert \mathbf{B} \mathbf{x} \Vert_{p}^{q} = \left( \sum\limits_{e=1}^{N_{b}} \left\vert \left(\mathbf{B} \mathbf{x}\right)_{e} \right\vert^{p} \right)^{q/p}, \qquad \left(\mathbf{B} \mathbf{x}\right)_{e} = \sum\limits_{f=1}^{m_{x}} B_{e,f} \, x_{f}, \]

where \(\left (\mathbf {B} \mathbf {x}\right )_{e}\) stands for the e th element of the vector Bx, xf for the f th element of the vector x (\(f = 1, 2, \dotsb , m_{x}\)), and Be, f for the element of the matrix B on the e th row and the f th column.
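This definition translates directly into code. The following minimal sketch (function name illustrative) computes the \({\ell _{p}^{q}}\) metric:

```python
# Direct implementation of the l_p^q metric ||Bx||_p^q; names illustrative.
import numpy as np

def lpq_metric(B, x, p, q):
    """Raise the l_p norm of the vector Bx to the q-th power."""
    r = B @ x
    return np.sum(np.abs(r) ** p) ** (q / p)
```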
With some algebra, one has
Note that to derive the result in the third line of Eq. 87 from the second one, we have used the fact that d|x|/dx = sgn(x) = x/|x| (with sgn being the sign function), while ignoring the singularity of the derivative at x = 0. In the context of ensemble-based data assimilation, this way of handling the singularity at x = 0 may work well in general, since we typically deal with gradients and Hessians evaluated at local points away from 0, owing to the deviations of ensemble members from the ensemble mean. For example, see the formula in Eq. 16.
Based on Eq. 87, one has

\[ \boldsymbol{\nabla}_{\Vert \mathbf{B} \mathbf{x} \Vert_{p}^{q}}\left(\mathbf{x}\right) = q \, \Vert \mathbf{B} \mathbf{x} \Vert_{p}^{q-p} \; \mathbf{B}^{T} \left( \vert \mathbf{B} \mathbf{x} \vert^{\dot\land (p-1)} \odot \dot{\mathrm{sgn}}\left(\mathbf{B} \mathbf{x}\right) \right). \]

In Eq. 89, the operator \(\dot \land \) raises the elements of a vector to a certain power, e.g., for a (column) vector \(\mathbf {y} = [y_{1},y_{2},\dotsb ]^{T}\), one has \(\vert \mathbf {y} \vert ^{\dot \land p} = \left [|y_{1}|^{p},|y_{2}|^{p},\dotsb \right ]^{T}\). The operator ⊙ stands for the element-wise (i.e., Hadamard) product, such that for \(\mathbf {x} = [x_{1},x_{2},\dotsb ]^{T}\) and \(\mathbf {y} = [y_{1},y_{2},\dotsb ]^{T}\), one has \(\mathbf {x} \odot \mathbf {y} = \left [x_{1} y_{1}, x_{2} y_{2}, \dotsb \right ]^{T}\).
As special cases, if p = q = 2 or p = q = 1, then Eqs. 88 and 89 lead to

\[ \boldsymbol{\nabla}_{\Vert \mathbf{B} \mathbf{x} \Vert_{2}^{2}}\left(\mathbf{x}\right) = 2 \, \mathbf{B}^{T} \mathbf{B} \mathbf{x} \]

or

\[ \boldsymbol{\nabla}_{\Vert \mathbf{B} \mathbf{x} \Vert_{1}^{1}}\left(\mathbf{x}\right) = \mathbf{B}^{T} \dot{\mathrm{sgn}}\left(\mathbf{B} \mathbf{x}\right), \]

where \(\dot{\mathrm{sgn}}\) denotes the sign function applied element-wise to its input.
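As a sanity check, the gradient above can be compared against finite differences. The sketch below (with illustrative names, assuming the gradient form just given) evaluates the gradient at a generic point away from the singularity at Bx = 0:

```python
# Gradient of ||Bx||_p^q and a finite-difference check; names illustrative.
import numpy as np

def lpq_gradient(B, x, p, q):
    """q * ||Bx||_p^(q-p) * B^T ( |Bx|^(p-1) (element-wise) * sgn(Bx) )."""
    r = B @ x
    norm_p = np.sum(np.abs(r) ** p) ** (1.0 / p)
    return q * norm_p ** (q - p) * (B.T @ (np.abs(r) ** (p - 1) * np.sign(r)))

def lpq_metric(B, x, p, q):
    return np.sum(np.abs(B @ x) ** p) ** (q / p)

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 3))
x = rng.standard_normal(3)            # generic point, away from Bx = 0
p, q, eps = 1.5, 2.0, 1e-6
fd = np.array([(lpq_metric(B, x + eps * e, p, q)
                - lpq_metric(B, x - eps * e, p, q)) / (2 * eps)
               for e in np.eye(3)])   # central finite differences
assert np.allclose(fd, lpq_gradient(B, x, p, q), atol=1e-5)
```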
In a similar way, we can proceed to compute the Hessian of \(\Vert \mathbf {B} \mathbf {x} {\Vert _{p}^{q}}\). With some algebra, we obtain
As a result, we have
where \(\mathbf {1}_{m_{x}}\) stands for an mx-dimensional vector whose elements are all equal to 1. As is evident in Eq. 91, \(\boldsymbol {\nabla }^{2}_{\Vert \mathbf {B} \mathbf {x} {\Vert _{p}^{q}}}\left (\mathbf {x}\right ) \) is symmetric. Interestingly, if 0 < q = p < 1, one can see that \(\boldsymbol {\nabla }^{2}_{\Vert \mathbf {B} \mathbf {x} {\Vert _{p}^{q}}}\left (\mathbf {x}\right ) \) is negative semi-definite, reflecting the fact that the metric \(\Vert \mathbf {B} \mathbf {x} {\Vert _{p}^{q}}\) is non-convex in this case.
Again, if p = q = 2 or p = q = 1, then Eqs. 91 and 92 lead to

\[ \boldsymbol{\nabla}^{2}_{\Vert \mathbf{B} \mathbf{x} \Vert_{2}^{2}}\left(\mathbf{x}\right) = 2 \, \mathbf{B}^{T} \mathbf{B} \]

or

\[ \boldsymbol{\nabla}^{2}_{\Vert \mathbf{B} \mathbf{x} \Vert_{1}^{1}}\left(\mathbf{x}\right) = \mathbf{0}. \]

The latter result implies that the Hessian of \(\Vert \mathbf {B} \mathbf {x} {\Vert _{1}^{1}}\) vanishes almost everywhere (except at Bx = 0).
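The special cases can likewise be verified in code. The sketch below uses a two-term expression for the Hessian obtained by differentiating the gradient above (our own reconstruction, consistent with the special cases, not a formula quoted from the text), and checks it against both special cases:

```python
# Hessian of ||Bx||_p^q: a reconstruction obtained by differentiating the
# gradient q * ||Bx||_p^(q-p) * B^T(|Bx|^(p-1) * sgn(Bx)); names illustrative.
import numpy as np

def lpq_hessian(B, x, p, q):
    r = B @ x
    norm_p = np.sum(np.abs(r) ** p) ** (1.0 / p)
    v = B.T @ (np.abs(r) ** (p - 1) * np.sign(r))
    return (q * (q - p) * norm_p ** (q - 2 * p) * np.outer(v, v)
            + q * (p - 1) * norm_p ** (q - p)
              * (B.T * np.abs(r) ** (p - 2)) @ B)   # B^T diag(|Bx|^(p-2)) B

rng = np.random.default_rng(2)
B = rng.standard_normal((5, 3))
x = rng.standard_normal(3)                          # generic point, Bx != 0
assert np.allclose(lpq_hessian(B, x, 2.0, 2.0), 2 * B.T @ B)  # p = q = 2
assert np.allclose(lpq_hessian(B, x, 1.0, 1.0), 0.0)          # p = q = 1, a.e.
```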
Appendix D: Gradient and Hessian of a mixture of regularization operators
Here we aim to evaluate the gradient and Hessian of a mixture of Kmix regularization operators in the transformed model space, in the form of
where wk is a scalar, representing the weight associated with the k th regularization operator \(\mathcal {R}_{k}\), which acts on the transformed model variables \(\mathcal {S}_{k} \left (\mathbf {x}\right )\), obtained by applying the transform operator \(\mathcal {S}_{k}\) to the model variables x.
For ease of comprehension, in the sequel we proceed by induction. To start, consider the case of a single regularization operator \(\mathcal {R}\), such that the regularization term is simply of the form \(\mathcal {R}\left (\mathcal {S} \left (\mathbf {x}\right )\right )\). In this case, we have
To get the Hessian, we have
To derive the third line of Eq. 95, we ignore the Hessian of the transform operator \(\mathcal {S}\) with respect to x, in line with the first-order Taylor approximation strategy in Eq. 55.
Now let us consider the case of a mixture of two regularization operators, in terms of \(w_{1} \mathcal {R}_{1}\left (\mathcal {S}_{1} \left (\mathbf {x}\right )\right ) + w_{2} \mathcal {R}_{2}\left (\mathcal {S}_{2} \left (\mathbf {x}\right )\right )\). In this case, we define
As a result, we obtain
Therefore, we have the gradient of \(\mathcal {R}\) with respect to \(\mathcal {S}\), in terms of
In addition, we have
Hence the Hessian of \(\mathcal {R}\) with respect to \(\mathcal {S}\) is given by
To obtain (107), we have used the fact that \(\partial \boldsymbol {\nabla }_{\mathcal {R}_{i}}\left (\mathcal {S}_{i}\right ) / \partial \mathcal {S}_{j} = \mathbf {0}\), for i≠j, i, j ∈{1, 2}.
Similarly, for a mixture of Kmix regularization operators, i.e.,
one has its gradient and Hessian in the transformed model space given by
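The resulting weighted sums lend themselves to a simple implementation. The sketch below (with illustrative names; gradients are laid out in the usual vector/Jacobian convention rather than the transposed convention of Eq. 46) accumulates the gradient and Hessian of \({\sum }_{k} w_{k} \mathcal {R}_{k}\left (\mathcal {S}_{k} \left (\mathbf {x}\right )\right )\) with respect to x, ignoring the Hessians of the transform operators as in Eq. 95:

```python
# Gradient and Hessian of a mixture of regularization operators; each term
# supplies callables for S_k, its Jacobian J_k, and the gradient/Hessian of
# R_k in the transformed space. All names are illustrative.
import numpy as np

def mixture_grad_hess(x, terms):
    """terms: iterable of tuples (w, S, J, grad_R, hess_R)."""
    g = np.zeros(x.size)
    H = np.zeros((x.size, x.size))
    for w, S, J, grad_R, hess_R in terms:
        s, Jx = S(x), J(x)                 # transformed variables and Jacobian
        g += w * Jx.T @ grad_R(s)          # first-order chain rule
        H += w * Jx.T @ hess_R(s) @ Jx     # Hessian of S_k ignored (cf. Eq. 95)
    return g, H
```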
Appendix E: Additional results in a 2D five-spot example
E.1 The reservoir model
Here we present some additional numerical results obtained by applying a subset of the \({\ell _{p}^{q}}\)-GIES algorithms to a 2D five-spot example, in which the numerical reservoir model has dimensions of 50 × 50 gridblocks, with oil, water and gas phases. There are four producers (labeled P1 – P4) at the corners of the model, and one injector (labeled I1) in the center, as indicated in Fig. 11. The locations of these wells, in Cartesian coordinates (x, y), are as follows: P1 at (2, 2); P2 at (49, 2); P3 at (2, 49); P4 at (49, 49); and I1 at (25, 25).
The parameters to be estimated consist of permeability along the x direction (PERMX) and porosity (PORO) on all reservoir gridblocks. Initial ensembles for PERMX and PORO (with 100 ensemble members) are generated through sequential Gaussian simulation. Figure 11 shows the PERMX and PORO maps of the reference model, which is used to generate production data every 30 days over a total period of 1500 days. One of the producers, P4, is shut in during history matching; the production data therefore consist of well oil/water/gas production rates (WOPR/WWPR/WGPR) from the other three producers, and well bottom-hole pressures (WBHP) from wells P1 – P3 and I1, every 30 days. The number of production data used for history matching is 650 in total. The observed production data are contaminated by zero-mean Gaussian white noise. For WOPR/WWPR/WGPR data, the standard deviations (STDs) of the observation noise are 10% of the data magnitudes, or set to \(10^{-6}\) if the observed production data (e.g., WWPR) are equal to zero; for WBHP data, the STDs are 1 bar.
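For illustration, the noise specification above can be assembled as follows (a minimal sketch; array names and the ordering of the data vector are assumptions):

```python
# Observation-noise standard deviations: 10% of magnitude for the rate data,
# floored at 1e-6 where the observed rate is zero, and 1 bar for WBHP.
import numpy as np

def obs_noise_std(rates, wbhp):
    """rates: observed WOPR/WWPR/WGPR values; wbhp: observed WBHP values (bar)."""
    std_rates = np.where(rates == 0.0, 1.0e-6, 0.10 * np.abs(rates))
    std_wbhp = np.full(wbhp.shape, 1.0)     # 1 bar
    return np.concatenate([std_rates, std_wbhp])
```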
The \({\ell _{p}^{q}}\)-GIES algorithms adopted for this case study are constructed based on the following considerations. As Fig. 11 indicates, there are spatial patterns in the reservoir models that can be exploited by the algorithms, e.g., through calculations of the first-order variations of the PERMX and PORO maps, similar to the previous channelized reservoir characterization problem. On the other hand, histograms of PERMX and PORO in the current case study do not provide particularly useful information about spatial patterns. As such, we choose to construct the \({\ell _{p}^{q}}\)-GIES algorithms by solving the GMAC problem with the following cost function:
where \(\mathcal {S}_{V}\) is a transform operator similar to that in the channelized reservoir characterization problem, but applied to both PERMX and PORO maps; and the weights wi (i = 1, 2, 3) are determined in a way similar to that in Eq. 43, which leads to the scalar coefficients αi (i = 1, 2, 3) in Table 4.
Similar to the situation in the channelized reservoir characterization problem, we adopt 3-bit binary codes to refer to the \({\ell _{p}^{q}}\)-GIES algorithms derived from Eq. 111, which leads to 7 algorithms after excluding the one with the code 000 (i.e., no regularization). The characteristics of these 7 algorithms are the same as those summarized in Table 2, except that here only three individual regularization terms, namely, \(\left (\mathbf {m}_{j}^{i+1} - \mathbf {m}_{j}^{i} \right )^{T} \left (\mathbf {C}_{\mathbf {m}}^{i}\right )^{-1} \left (\mathbf {m}_{j}^{i+1} - \mathbf {m}_{j}^{i} \right )\), \(\Vert \mathcal {S}_{V} \left (\mathbf {m}_{j}^{i+1} \right )- \mathcal {S}_{V} \left (\mathbf {m}_{j}^{i}\right ) {\Vert _{2}^{2}}\) and \(\Vert \mathcal {S}_{V} \left (\mathbf {m}_{j}^{i+1} \right )- \mathcal {S}_{V} \left (\mathbf {m}_{j}^{i}\right ) {\Vert _{1}^{2}}\), are adopted. All these \({\ell _{p}^{q}}\)-GIES algorithms are equipped with the automatic and adaptive localization scheme RndShfl-GC of [31] during history matching, and are run for 10 iteration steps.
In this example, we use the root mean square error (RMSE) as a measure to cross-validate the history matching performance. Given an m-dimensional reference model mref and an ensemble \({\mathscr{M}}^{i} = \{ \mathbf {m}_{j}^{i} \}_{j=1}^{{N_{e}}}\) of reservoir models at the i th iteration step, we compute an ensemble \({\varOmega }({\mathscr{M}}^{i})\) of RMSE values as follows:

\[ {\varOmega}({\mathscr{M}}^{i}) = \left\{ \sqrt{ \frac{1}{m} \, \Vert \mathbf{m}_{j}^{i} - \mathbf{m}_{\text{ref}} \Vert_{2}^{2} } \right\}_{j=1}^{N_{e}}. \]
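In code (a minimal sketch with illustrative names):

```python
# RMSE of each ensemble member against the reference model.
import numpy as np

def rmse_ensemble(M, m_ref):
    """M: (m, Ne) ensemble of models; m_ref: (m,) reference model.
    Returns the Ne RMSE values, one per ensemble member."""
    return np.sqrt(np.mean((M - m_ref[:, None]) ** 2, axis=0))
```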
Table 4 reports the data mismatch and RMSE values (in the form of mean ± STD) with respect to the final ensembles of the 7 \({\ell _{p}^{q}}\)-GIES algorithms. The RMSE values are listed for both PERMX and PORO. For reference, the data mismatch, RMSE of PORO and RMSE of PERMX with respect to the initial ensemble are (1.7376 ± 4.3294) × \(10^{4}\), 0.0644 ± 0.0037 and 0.4367 ± 0.0263, respectively. For performance assessment, we adopt the average RMSE over PERMX and PORO (with equal weights). In this setting, the original IES (code 100) ranks 6th, meaning that several other \({\ell _{p}^{q}}\)-GIES algorithms outperform the original IES. As such, one can draw a conclusion similar to that in the previous channelized reservoir characterization problem: it is possible to design, through the GIES framework, new ensemble history matching algorithms that may perform better than the original IES algorithm in certain circumstances.
In the current case study, except for the \({\ell _{p}^{q}}\)-GIES algorithm with the code 001 (for convenience, hereafter we simply call it algorithm 001, and the same convention applies to the other algorithms), the algorithms result in very similar RMSE values for the estimated PORO maps. On the other hand, the differences among the RMSE values of PERMX are more substantial. In particular, although algorithm 001 has a higher mean data mismatch value, its RMSE values for both PORO and PERMX are the lowest among the tested algorithms.
To limit the length of the current work, we do not present the full numerical results. Instead, we focus on inspecting the impacts of a few \({\ell _{p}^{q}}\)-GIES algorithms on the model updates. To this end, Fig. 12 reports the PERMX and PORO maps of one initial reservoir model, and the corresponding maps of the final models obtained by algorithms 100 (i.e., the original IES), 001 and 010, respectively. In line with the results in Table 4, the PORO maps obtained by algorithms 100 and 010 are very similar. In comparison to the initial PORO map, a noticeable difference can be observed in the area around the coordinate (45, 15). On the other hand, the final PORO map obtained by algorithm 001 bears more structural differences from those of the other two algorithms. Meanwhile, compared to the initial PORO map, the final PORO map of algorithm 001 also exhibits some clear differences in the area around the coordinate (12, 7). A similar conclusion can be drawn for the PERMX maps, especially if one compares the PERMX maps in the area around the coordinate (18, 12). As such, in this particular case study, it appears that, owing to the use of the regularization term \(\Vert \mathcal {S}_{V} \left (\mathbf {m}_{j}^{i+1} \right )- \mathcal {S}_{V} \left (\mathbf {m}_{j}^{i}\right ) {\Vert _{1}^{2}}\) with the ℓ1 norm, algorithm 001 is able to produce flatter regions, in which the estimated parameters (PERMX and PORO) exhibit less spatial variation. This property is less noticeable in algorithms 100 and 010, in which the distance is measured by the ℓ2 norm instead.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Luo, X. Novel iterative ensemble smoothers derived from a class of generalized cost functions. Comput Geosci 25, 1159–1189 (2021). https://doi.org/10.1007/s10596-021-10046-1