1 Introduction

Sea surface temperature (SST) is a critical parameter in the global climate system [1, 2] and plays a vital role in many marine processes, including ocean circulation, evaporation, and the exchange of heat and moisture between the ocean and atmosphere [3, 4, 5].

In recent years, marine heat waves, during which the SST greatly exceeds the locally expected average values, have attracted particular attention [6, 7]. Extreme SST can cause coral bleaching [8, 9], with cascading effects on the entire ecosystem. Localized events also affect the amount of atmospheric moisture available, impacting precipitation patterns and the likelihood of drought or flooding in certain regions [10]. Better uncovering the factors contributing to these extreme events is therefore of great importance to help predict and mitigate their impacts.

The SST dynamics result from many processes that interact across a continuum of spatio-temporal scales. A first-order approximation of such a system was initially introduced by Hasselmann [11, 12], who pioneered a two-scale stochastic decomposition to represent the interactions between slow and fast variables. In this study, we focus on SSTA data collected in the Mediterranean Sea and examine the potential of machine learning techniques to derive relevant dynamical models. The focus is on the seasonal modulation of the SSTA, and we wish to unveil the factors influencing the temporal variability of SSTA extremes. The proposed analysis builds on Hasselmann’s assumption that the variability of the SSTA can be decomposed into slow and fast components. The slow variables mostly follow the seasonal cycle, while the fast variables are linked to rapid processes, e.g. the wind variability. We thus approximate the probability density function of the SSTA data using a stochastic differential equation in which the drift function represents the seasonal cycle and the diffusion function represents the envelope of the fast SSTA response.

The paper is organized as follows. We start by introducing the general underlying state space model of the SST anomaly. Rather than directly presenting the stochastic model, we first assume that an underlying deterministic ordinary differential equation (ODE) can represent the non-periodic variability of the SSTA. Considering a phase space reconstruction setting, we use the neural embedding of dynamical systems (NbedDyn) framework [13, 14] for this task. We then discuss the limitations of such a representation, and present the stochastic model. We conclude by summarizing our findings and potential future directions.

2 Method

Let us assume the following state-space model

$$\displaystyle \begin{aligned} \dot{\mathbf{z}}_{t} &= {f}(\mathbf{{z}}_{t}){} \end{aligned} $$
(1)
$$\displaystyle \begin{aligned} \mathbf{{x}}_{t} &= \mathcal{H}(\mathbf{{z}}_{t}){} \end{aligned} $$
(2)

where \(t \in [0, +\infty)\) is time. The variables \(\mathbf {{z}}_t \in \mathbb {R}^{s}\) and \(\mathbf {{x}}_t \in \mathbb {R}^{n}\) represent the state variables and the SST anomaly observations, respectively. f and \(\mathcal {H}\) are the dynamical and observation operators. The impact of noise on the dynamics and observation models is omitted for simplicity of the presentation.

2.1 Deterministic Model Hypothesis

The NbedDyn Framework

Assuming that \({\mathbf {z}}_{t}\) is asymptotic to a limit-set \(L \subset \mathbb {R}^s\) and that the observation model is not an embedding [15], the NbedDyn model allows one to jointly derive a geometric reconstruction of the unseen phase space from partial observations and a corresponding dynamical model. For any given operator \(\mathcal {H}\) of a deterministic dynamical system, Takens’ theorem [16] guarantees that such an augmented space exists. However, instead of using a delay embedding, NbedDyn defines a \(d_E\)-dimensional augmented space with states \({\mathbf {u}}_t \in \mathbb {R}^{d_E}\) as follows:

$$\displaystyle \begin{aligned} {\mathbf{u}}_t^T = [{\mathbf{x}}_{t}^T, {\mathbf{y}}_{t}^T] {} \end{aligned} $$
(3)

where \({\mathbf {y}}_{t} \in \mathbb {R}^{d_E-n}\) are the latent states and T denotes the matrix transpose. The latent states account for the unobserved components of the true state \({\mathbf {z}}_t\).

The augmented state \({\mathbf {u}}_t\) is assumed to satisfy the following state space model:

$$\displaystyle \begin{aligned} \dot{\mathbf{u}}_t &=f_{\theta_1}({\mathbf{u}}_{\mathbf{t}}) {} \end{aligned} $$
(4)
$$\displaystyle \begin{aligned} {\mathbf{x}}_t &=\mathbf{G}{\mathbf{u}}_{t} {} \end{aligned} $$
(5)

where \(\mathbf {G}\) is the projection matrix that extracts the observed component \({\mathbf {x}}_t\) from the augmented state \({\mathbf {u}}_t\). The dynamical operator \(f_{\theta _1}\) belongs to a given family of neural network operators parameterized by a parameter vector \({\theta _1}\). In this work, we follow [14] and use a linear quadratic parameterization of \(f_{\theta _1}\). This particular parameterization allows us to guarantee boundedness of the ODE (4) using the Schlegel boundedness theorem [17]. A linear quadratic ODE model can be written as follows:

$$\displaystyle \begin{aligned} \dot{\mathbf{u}}_{t} =f_{\theta_1}({\mathbf{u}}_{\mathbf{t}}) = \mathbf{c} + \mathbf{L}{\mathbf{u}}_t + [{\mathbf{u}}^T_t{\mathbf{Q}}^{(1)}{\mathbf{u}}_t, \ldots, {\mathbf{u}}^T_t{\mathbf{Q}}^{({d_E})}{\mathbf{u}}_t]^T {} \end{aligned} $$
(6)

where \(\mathbf {c} \in \mathbb {R}^{d_E}\), \( \mathbf {L} \in \mathbb {R}^{{d_E} \times {d_E}}\) and \({\mathbf {Q}}^{(i)} = [q_{i,j,k}]^{d_E}_{j,k=1},i = 1, \ldots , {d_E}\). The above approximate model is shifted according to \(\bar {\mathbf {u}}_t = {\mathbf {u}}_t-\mathbf {m}\) with \(\mathbf {m} \in \mathbb {R}^{d_E}\). The approximate dynamical equation of the shifted state can be written as:

$$\displaystyle \begin{aligned} \dot{\bar{\mathbf{u}}}_{t} = \mathbf{d} + \mathbf{A}\bar{\mathbf{u}}_t + [\bar{\mathbf{u}}^T_t{\mathbf{Q}}^{(1)}\bar{\mathbf{u}}_t, \ldots, \bar{\mathbf{u}}^T_t{\mathbf{Q}}^{({d_E})}\bar{\mathbf{u}}_t]^T {} \end{aligned} $$
(7)

with

$$\displaystyle \begin{aligned} \mathbf{d} = \mathbf{c} + \mathbf{L}\mathbf{m} + [{\mathbf{m}}^T{\mathbf{Q}}^{(1)}{\mathbf{m}}, \ldots, {\mathbf{m}}^T{\mathbf{Q}}^{({d_E})}{\mathbf{m}}]^T {} \end{aligned} $$
(8)

and

$$\displaystyle \begin{aligned} \mathbf{A} = \bigg( a_{ij} \bigg)= \bigg( l_{ij} + \sum^{d_E}_{k = 1}(q_{i,j,k}+q_{i,k,j})m_k \bigg) {} \end{aligned} $$
(9)
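For concreteness, a minimal PyTorch sketch of the linear quadratic vector field of Eq. (6) could read as follows. This is an illustration only: class and variable names are ours, and the boundedness constraints introduced below are not enforced at this stage.

```python
import torch

class LinearQuadraticODE(torch.nn.Module):
    """Linear quadratic vector field of Eq. (6): c + L u + quadratic terms."""
    def __init__(self, d_E: int):
        super().__init__()
        self.c = torch.nn.Parameter(torch.zeros(d_E))
        self.L = torch.nn.Parameter(0.01 * torch.randn(d_E, d_E))
        # One d_E x d_E matrix Q^(i) per component of the output.
        self.Q = torch.nn.Parameter(0.01 * torch.randn(d_E, d_E, d_E))

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        # u has shape (batch, d_E)
        linear = self.c + u @ self.L.T
        # quadratic[b, i] = u_b^T Q^(i) u_b
        quadratic = torch.einsum('bj,ijk,bk->bi', u, self.Q, u)
        return linear + quadratic
```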

Given an observation time series \(\{{\mathbf {x}}_{t_0},\ldots ,{\mathbf {x}}_{t_N}\}\) of size \(N+1\), training amounts to jointly learning the model parameters \(\theta _1 = \{\mathbf {c}, \mathbf {L}, {\mathbf {Q}}^{(1)}, \ldots, {\mathbf {Q}}^{(d_E)}, \mathbf {m}\}\) and the latent states \({\mathbf {y}}_t\) according to the following constrained optimization problem

$$\displaystyle \begin{gathered} \begin{aligned} \hat{\theta}_1, \{\hat{\mathbf{y}}_{t_i}\}_{i = 0}^{i = {N-1}} = \displaystyle \arg \min_{\theta_1,\{{\mathbf{y}}_{t_i}\}} & \displaystyle \sum_{i=1}^{N} \|{\mathbf{x}}_{t_i} - \mathbf{G} \varPhi_{\theta_1,t_i} \left({\mathbf{u}}_{t_{i-1}} \right ) \|{}^2 \\ &+ \lambda_1 \|{\mathbf{u}}_{t_i} - \varPhi_{\theta_1,t_i}({\mathbf{u}}_{t_{i-1}})\|{}^2 \\ &+ \lambda_2 \mathcal{C}_1 \\ &+ \lambda_3 \mathcal{C}_2 \end{aligned} {} \end{gathered} $$
(10)

where \(\varPhi _{\theta _1,t}({\mathbf {u}}_{t-1}) = {\mathbf {u}}_{t-1} + \int _{t-1}^{t}f_{\theta _1} ({\mathbf {u}}_{w})dw\) is the flow of the ODE (6) (in our work, this flow is approximated using a fourth-order Runge-Kutta scheme), \(\mathcal {C}_1 = \sum _{i,j,k=1}^{d_E} \|q_{i,j,k}+q_{i,k,j}+q_{j,i,k}+q_{j,k,i}+q_{k,i,j}+q_{k,j,i}\|{ }^2\) and \(\mathcal {C}_2 = \sum _{i=1}^{d_E} \mathrm {Max}(\alpha _i,0)/(\mathrm {Max}(\alpha _i,0)+1)\), where \(\alpha _i, i =1, \ldots , {d_E}\) are the eigenvalues of the matrix \({\mathbf {A}}_s = \frac {1}{2}(\mathbf {A} + {\mathbf {A}}^T)\). The variables \(\lambda _{1,2,3}\) are constant weighting parameters. The first constraint \(\mathcal {C}_1\) stems from the energy-preserving condition of the quadratic non-linearity. It forces the contributions of the quadratic terms of \(f_{\theta _1}\) to the fluctuation energy to sum up to zero. The second constraint, \(\mathcal {C}_2\), ensures that the eigenvalues of \({\mathbf {A}}_s\) are negative. Satisfying these constraints guarantees that the model \(f_{\theta _1}\) is bounded through the existence of a monotonically attracting trapping region that includes the limit-set revealed by the minimization of the forecasting loss. Similarly to the Takens delay embedding technique, the sequence:

$$\displaystyle \begin{aligned} R_{t_0,t_N} = \{\hat{\mathbf{u}}_{t_i}^T = [{\mathbf{x}}_{t_i}^T, \hat{\mathbf{y}}_{t_i}^T] \text{ with } t_i = t_0, \dots, t_N \} {} \end{aligned} $$
(11)

represents a geometric reconstruction of the phase space. In addition to this reconstruction, the NbedDyn model can be used to forecast new observations by determining an initial condition of the unobserved component \({\mathbf {y}}_t\) and performing a numerical integration of the ODE model (6). We infer the initial condition using a minimization of an objective function similar to (10), but only with respect to the latent states \({\mathbf {y}}_{t}\). This minimization can be seen as a variational data assimilation problem, with partial observations of the state-space variables and known dynamical and observation models [18].
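As an illustration of the training criterion (10), the following sketch implements the flow map \(\varPhi _{\theta _1,t}\) with a fourth-order Runge-Kutta step and the two data-fidelity terms of (10) for scalar observations. The boundedness penalties \(\mathcal {C}_{1,2}\) are omitted for brevity, and all names are illustrative rather than those of the original implementation. Minimizing this loss with respect to both the parameters of f and the latent states y corresponds to the joint optimization described above; the same loss, with \(\theta _1\) frozen, is minimized over an initial latent state for forecasting.

```python
import torch

def rk4_step(f, u, dt):
    """One fourth-order Runge-Kutta step approximating the flow of the ODE (6)."""
    k1 = f(u)
    k2 = f(u + 0.5 * dt * k1)
    k3 = f(u + 0.5 * dt * k2)
    k4 = f(u + dt * k3)
    return u + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def forecasting_loss(f, x, y, dt, lambda_1):
    """Data-fidelity terms of Eq. (10) for scalar observations (n = 1).

    x: observations, shape (N + 1, 1); y: latent states, shape (N + 1, d_E - 1),
    declared as a torch.nn.Parameter so that they are optimized jointly with f.
    """
    u = torch.cat([x, y], dim=-1)        # augmented states u_t = [x_t, y_t]
    u_pred = rk4_step(f, u[:-1], dt)     # Phi_{theta_1, t_i}(u_{t_{i-1}})
    obs_term = ((x[1:] - u_pred[:, :1]) ** 2).sum()   # ||x - G Phi(u)||^2
    state_term = ((u[1:] - u_pred) ** 2).sum()        # ||u - Phi(u)||^2
    return obs_term + lambda_1 * state_term
```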

Related Works

Related state-of-the-art techniques mainly rely on the reconstruction of a phase space using delay embedding [16]. This includes traditional parametric and non-parametric modeling techniques [19, 20] as well as recurrent neural networks (RNNs). The latter family includes both simple RNN parameterizations of dynamical systems and latent space inference techniques built on an approximation of a posterior distribution that requires the parameterization of a delay embedding [21, 22, 23].

The interest of the NbedDyn framework, in contrast to delay-embedding-based approaches, is that it relies neither on a delay embedding nor on an explicit inference model (i.e., the reconstruction of the latent states given the observed time series). As such, our scheme only involves the selection of the class of ODEs of interest. This reduces the complexity of the overall scheme to that of the ODE representation and guarantees the consistency of the reconstructed latent states w.r.t. the learnt ODE.

2.2 Stochastic Model Hypothesis: The Stochastic NbedDyn

When using phase space reconstruction techniques, one should keep in mind the assumptions that this theory is built on. For any embedding to work, we are assuming that the dynamical model in (1) exists and can be represented by an ordinary differential equation [15]. For several realistic applications, this ODE may not exist or can have an extremely large dimension. In geoscience, for instance, the dimension of the state space variable can reach \(s \approx O(10^9)\). In these situations, reconstructing such a high-dimensional phase space becomes significantly more challenging. In practice, the model returned by any embedding technique can be complemented by an appropriate closure. The form of this closure term can be deterministic, using for example the framework of [24], or stochastic, through an appropriate calibration of a noise forcing.

When considering SST anomaly data, an unpredictable, high-frequency residual remains after calibration of the neural embedding model. Based on Hasselmann’s idea, we assume that this residual component represents the effect of fast-scale processes, e.g. passages of atmospheric and oceanic eddies. To first order, it can be represented as a modulated white noise: indeed, this residual, shown in Fig. 3, exhibits correlations with the slow-scale SST anomaly data.

To model stochastic SST anomalies, the deterministic NbedDyn model described above is first optimized, and the ODE (6) is then complemented with a stochastic forcing as follows:

$$\displaystyle \begin{aligned} \left \{ \begin{array}{ccl} \dot{\mathbf{u}}_{t} &=& {f}_{\theta_{1}}({\mathbf{u}}_{t}) + {g}_{\theta_{2}}({\mathbf{u}}_{t})\boldsymbol{\xi}_t\\ {\mathbf{x}}_{t} &=& \mathbf{G}{\mathbf{u}}_{t} {} \end{array}\right. \end{aligned} $$
(12)

where \(\boldsymbol {\xi }_t\) is a white noise process. We derive the parameters of the model (12) as follows. Given an observation time series \(\{{\mathbf {x}}_{t_0},\ldots ,{\mathbf {x}}_{t_N}\}\) of size \(N+1\), and similarly to the deterministic case, we optimize the diffusion parameters \(\theta _2\) to minimize the forecasting error of the observations. In addition to the diffusion parameters, we also reconstruct a noise realization \({\boldsymbol {\xi }}^{rec}\) that generates the observed process under (12). Overall, the optimization problem can be written as follows:

$$\displaystyle \begin{aligned} \begin{array}{c} \hat{\theta}_2, \{\hat{\boldsymbol{\xi}}_{t_i}^{rec}\}_{i = 0}^{i = {N-1}} = \arg \displaystyle \min_{\theta_2,\boldsymbol{\xi}^{rec}} \sum_{i=1}^{N} \left\|{\mathbf{x}}_{t_i} - \mathbf{G} \varPhi_{\theta,t_i} \left({\mathbf{u}}_{t_{i-1}},\boldsymbol{\xi}^{rec}_{t_{i-1}}\right) \right\|{}^2 \\~\\ \mbox{Subject to } \left \{ \begin{array}{lcl} {\mathbf{u}}_{t_i}& =& \varPhi_{\theta,t_i}({\mathbf{u}}_{t_{i-1}},\boldsymbol{\xi}^{rec})\\ \mathbf{G}{\mathbf{u}}_{t_i}& =& {\mathbf{x}}_{t_i}\\ {\mathbf{R}}_{\boldsymbol{\xi}^{rec} \boldsymbol{\xi}^{rec}}(\tau)& = & 0 \text{ for all } \tau \neq 0 \\ \end{array}\right. \end{array} {} \end{aligned} $$
(13)

where \(\{\hat {\boldsymbol {\xi }}_{t_i}^{rec}\}_{i = 0}^{i = {N-1}}\) is the noise realization that minimizes the objective function in (13) and \(\varPhi _{\theta ,t}\):

$$\displaystyle \begin{aligned} \varPhi_{\theta,t}({\mathbf{u}}_{t-1},\boldsymbol{\xi}^{rec}) = {\mathbf{u}}_{t-1} + \int_{t-1}^{t}f_\theta({\mathbf{u}}_{w})dw+ \int_{t-1}^{t}g_\theta({\mathbf{u}}_{w})\boldsymbol{\xi}^{rec}_w dw\end{aligned}$$

is the solution of the stochastic model. This solution is approximated in this work using an Euler-Maruyama scheme, so that the discretized model converges to an Itô SDE.
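A minimal sketch of such an Euler-Maruyama step is given below. For simplicity, it assumes a diagonal diffusion so that \(g_{\theta _2}({\mathbf {u}}_t)\) acts elementwise on the noise; function and variable names are ours and do not constrain the actual form of \(g_{\theta _2}\).

```python
import torch

def euler_maruyama_step(f, g, u, dt, xi=None):
    """One Euler-Maruyama step of the SDE (12): du = f(u) dt + g(u) dW.

    If xi is None, a Gaussian white-noise increment is sampled (xi^samp);
    otherwise a reconstructed noise value xi^rec is supplied, as in Eq. (13).
    A diagonal diffusion is assumed, so g(u) multiplies xi elementwise.
    """
    if xi is None:
        xi = torch.randn_like(u)
    return u + f(u) * dt + g(u) * xi * dt ** 0.5
```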

In practice, we use the following regularized optimization problem:

$$\displaystyle \begin{gathered} \begin{aligned} \hat{\theta}_2,\hat{\boldsymbol{\xi}}^{rec} = \arg \displaystyle \min_{{\theta}_2,\boldsymbol{\xi}^{rec}} &\sum_{i=1}^{N} \left\|{\mathbf{x}}_{t_i} - \mathbf{G} \varPhi_{\theta,t_i} \left({\mathbf{u}}_{t_{i-1}}, \boldsymbol{\xi}^{rec}\right) \right\|{}^2 \\ &+ \lambda_4 \mathcal{C}_3 \\ &+ \lambda_5 \mathcal{C}_4 \\ &+ \lambda_6 \mathcal{C}_5 \end{aligned} {} \end{gathered} $$
(14)

with \(\mathcal {C}_3 = \|{\mathbf {R}}_{\boldsymbol {\xi }^{rec} \boldsymbol {\xi }^{rec}}(\tau ) \|{ }^2\), \(\mathcal {C}_4 = \mathbf {Var}(\varPhi _{\theta ,t} ({\mathbf {u}}_{t-1}, \boldsymbol {\xi }^{samp}))\), \(\mathcal {C}_5 = \| \varPhi _{\theta ,t} ({\mathbf {u}}_{t-1}, \boldsymbol {\xi }^{rec}) - \mathbf {E}[\varPhi _{\theta ,t} ({\mathbf {u}}_{t-1}, \boldsymbol {\xi }^{samp})]\|{ }^2\), where \(\boldsymbol {\xi }^{samp}\) is a sampled Gaussian white noise. The variables \(\lambda _{4,5,6}\) are constant weighting parameters. The first constraint \(\mathcal {C}_3\) makes the reconstructed noise path white. The second and third constraints, \(\mathcal {C}_{4,5}\), ensure that the SDE generalizes to sampled white noises. Specifically, \(\mathcal {C}_{5}\) brings the ensemble of trajectories generated from sampled white noise close to the trajectory generated from the reconstructed noise, and \(\mathcal {C}_{4}\) reduces the spread of the ensemble around the trajectory simulated from the reconstructed noise.
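The whiteness constraint \(\mathcal {C}_3\) can be implemented, for instance, as a penalty on the empirical autocorrelation of the reconstructed noise path at non-zero lags. The sketch below truncates the penalty to a maximum lag, which is an implementation choice of ours rather than a detail of the original method.

```python
import torch

def whiteness_penalty(xi_rec, max_lag=10):
    """Penalty of type C_3: squared empirical autocorrelation R(tau) of the
    reconstructed noise at lags tau = 1..max_lag, pushed towards zero.

    xi_rec: reconstructed noise path, shape (N, d_E).
    """
    xi = xi_rec - xi_rec.mean(dim=0, keepdim=True)
    var = (xi ** 2).mean(dim=0) + 1e-8            # lag-0 autocovariance
    penalty = xi.new_zeros(())
    for tau in range(1, max_lag + 1):
        r_tau = (xi[:-tau] * xi[tau:]).mean(dim=0) / var
        penalty = penalty + (r_tau ** 2).sum()
    return penalty
```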

After this optimization, we can couple the optimization problems (10) and (14) and jointly calibrate all the model parameters \(\theta _1, \theta _2, {\mathbf {y}}_t, {\boldsymbol {\xi }}^{rec}\). This fine-tuning step is not essential, but it allows the drift and diffusion parameters of the model to adapt to each other.
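In terms of implementation, this fine-tuning can be as simple as exposing all free variables of the two problems to a single optimizer, as in the hypothetical snippet below, which reuses the names of the previous sketches.

```python
import torch

# Hypothetical joint fine-tuning: the drift f, the diffusion g, the latent
# states y and the reconstructed noise xi_rec (all defined in the sketches
# above) are updated together under the sum of the losses (10) and (14).
params = list(f.parameters()) + list(g.parameters()) + [y, xi_rec]
optimizer = torch.optim.Adam(params, lr=1e-4)  # small rate: a refinement step
```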

3 Numerical Experiments

3.1 Data

The Sea Surface Temperature Anomaly (SSTA) data correspond to a location in the Mediterranean Sea, in the Ligurian Sea at \(8.6^\circ \mathrm {E},43.8^\circ \mathrm {N}\). The anomalies are computed based on a yearly average of the annual 99th percentile of the SST reanalysis [25, 26]. The time series consists of daily SST anomaly measurements from 1987 to 2019; the data from 1987 to 2014 are used for training. Figure 3e illustrates the time series, which includes a seasonal cycle and non-periodic high-temperature extremes in summer.
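The exact preprocessing pipeline of the reanalysis products [25, 26] is not detailed here; one plausible reading of the anomaly computation described above is sketched below, where the baseline subtracted from the daily SST is the average over years of each year's 99th percentile (function and variable names are ours).

```python
import pandas as pd

def ssta_from_daily_sst(sst: pd.Series) -> pd.Series:
    """sst: daily SST at the study location, indexed by a DatetimeIndex.

    The baseline is the yearly average of the annual 99th percentile, so the
    resulting anomaly keeps the seasonal cycle, as in Fig. 3e.
    """
    annual_p99 = sst.groupby(sst.index.year).quantile(0.99)
    baseline = annual_p99.mean()
    return sst - baseline
```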

3.2 Analysis of the Deterministic Model

In this first experiment, we investigate whether the deterministic neural embedding model is able to capture the non-periodic variability of the SSTA extremes. For this purpose, we test models with embedding dimensions ranging from 1 to 10.

Analysis of the Embeddings

The choice of the dimension \(d_E\) is linked to the number of independent variables that can be used to model the dynamics using, in our context, a bounded autonomous linear quadratic ODE. We start by studying the direct impact of \(d_E\) on the performance of the NbedDyn model. Figure 1 shows the impact of \(d_E\) on the training error between the observations and the model simulation. Other criteria could be used (please refer to [13, 14] for a more in-depth analysis of this parameter on other case studies), but overall, the training error provides a direct measure of the effectiveness of the embedding dimension in the training phase. The first evaluation of the training error reported in Fig. 1 corresponds to \(d_E\) equal to the dimension of the measurements, i.e. \(d_E = 1\). In this case, no latent states \({\mathbf {y}}_t\) are used and the embedding is \({\mathbf {u}}_t = {\mathbf {x}}_t\). In such situations, the ODE model cannot perfectly fit the data. Furthermore, at this particular value of \(d_E\), the models are more likely to display poor asymptotic behavior. As the dimension increases, the training error decreases, which confirms the better modeling abilities of the NbedDyn model. In the following, we study the models with \(d_E = 3, 6\) and 10.
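The sweep behind Fig. 1 can be reproduced, in spirit, with a loop of the following form, reusing the sketches of Sect. 2.1; the observations `x`, the time step `dt` and the number of epochs are assumed given, and the optimizer settings are illustrative.

```python
import torch

errors = {}
for d_E in range(1, 11):
    f = LinearQuadraticODE(d_E)                               # sketch of Eq. (6)
    y = torch.nn.Parameter(torch.zeros(x.shape[0], d_E - 1))  # latent states
    opt = torch.optim.Adam(list(f.parameters()) + [y], lr=1e-3)
    for _ in range(n_epochs):
        opt.zero_grad()
        loss = forecasting_loss(f, x, y, dt, lambda_1=1.0)
        loss.backward()
        opt.step()
    errors[d_E] = loss.item()    # training error at convergence (cf. Fig. 1)
```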

Fig. 1
figure 1

Mean training error at convergence. We report the mean training error at convergence of the deterministic NbedDyn model for different dimensions \(d_E\) of the embedding. This error is averaged over the training time series, and we highlight here both the mean and standard deviations

Asymptotic Properties of the Models

We evaluate the asymptotic behavior of the deterministic models for \(d_E = 3, 6\) and 10. For this purpose, we run the NbedDyn models for a period of 27 years. The resulting simulation is visualized with respect to the reconstructed phase space (11) of the training data in Fig. 2. Overall, the models are only able to reproduce the seasonal cycle of the SST anomaly data. Other experiments (not shown here) suggest that even a further increase of the embedding dimension does not allow the model to capture the non-periodic behavior of the SST anomaly extremes.

Fig. 2
figure 2

Asymptotic solution of the deterministic models. We visualize the simulation of the deterministic models with respect to the reconstructed phase space. The models with \(d_E = 3, 6\) and 10 are given in figure (a), (b) and (c) respectively. For \(d_E > 3\), we project the simulation and the reconstructed phase space into \( \mathbb {R}^{3}\)

Analysis of the Training Residuals

To further investigate the asymptotic behavior of the deterministic models, we visualize in Fig. 3 the training residual \( \{ {\mathbf {x}}_{t_i} - \mathbf {G} \varPhi _{\theta ,t_i} ({\mathbf {u}}_{t_{i-1}}) \text{ with } t_i = t_0, \dots , t_N \}\). When the dimension of the embedding increases above \(d_E = 1\), a qualitative and quantitative change in the residual error is present. This is because an ODE in at least two dimensions is needed to capture the oscillations of the seasonal cycle of the SSTA. However, when the dimension of the embedding increases above 2, no clear qualitative or quantitative change is present. Furthermore, the residual has a much higher frequency content than the training SSTA data, which suggests that the errors are due to missing high-frequency scales that cannot be captured by the standard deterministic model.
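The residual and its frequency content can be diagnosed with a few lines, reusing `rk4_step` from Sect. 2.1; a residual spectrum much flatter than the red spectrum of the SSTA itself supports the modulated-white-noise interpretation (all names are ours).

```python
import torch

def training_residual(f, x, y, dt):
    """One-step residuals x_{t_i} - G Phi_{theta,t_i}(u_{t_{i-1}})."""
    u = torch.cat([x, y], dim=-1)
    u_pred = rk4_step(f, u[:-1], dt)
    return (x[1:] - u_pred[:, :1]).squeeze(-1)

res = training_residual(f, x, y, dt)
spectrum = torch.fft.rfft(res - res.mean()).abs() ** 2  # high-frequency check
```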

Fig. 3
figure 3

Training residual and corresponding training data. We visualize the training residual for \(d_E = 1\) in (a), \(d_E = 3\) in (b), \(d_E = 6\) in (c) and \(d_E = 10\) in (d). Corresponding training data are given in (e)

Based on these considerations, and motivated by Hasselmann’s work on stochastic climate models applied to SST anomaly data, we propose the stochastic NbedDyn model. In this framework, the SST residual of Fig. 3 is modeled as a stochastic forcing.

3.3 Analysis of the Stochastic Model

We focus our analysis on the model with \(d_E = 6\). We add a stochastic forcing to the neural embedding model (the parameters of the diffusion function \(g_{\theta _2}\) are optimized according to Appendix 1). Figure 4 shows the reconstructed phase space under this new model, as well as a model simulation of 27 years. Compared to the simulations of the deterministic model in Fig. 2, the stochastic model is able to cover the whole reconstructed phase space, including the regions with high-temperature extremes. This shows that including the high-frequency forcing is crucial for the model to capture the non-periodic behavior of the extremes.
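A free-running simulation of the stochastic model can be sketched as below, where the ensemble size, horizon and initial condition `u0` are illustrative and `euler_maruyama_step` is the sketch of Sect. 2.2.

```python
import torch

n_members, n_steps, dt = 50, 27 * 365, 1.0  # illustrative ensemble size, horizon
u = u0.repeat(n_members, 1)                 # u0: initial condition, shape (1, d_E)
xs = []
for _ in range(n_steps):
    u = euler_maruyama_step(f, g, u, dt)    # Gaussian increments sampled inside
    xs.append(u[:, :1])                     # observed component x_t = G u_t
ensemble = torch.stack(xs).squeeze(-1)      # daily SSTA, shape (n_steps, n_members)
```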

Fig. 4
figure 4

Simulation of the stochastic model. We visualize a stochastic model simulation and compare it to the reconstructed phase space. Both the model simulation and reconstructions are projected into \( \mathbb {R}^{3}\)

These observations are further illustrated in the simulation example given in Fig. 5. The stochastic model is able to produce an ensemble of SST anomaly trajectories that reproduce the non-periodic variability of the extremes. Furthermore, the trajectories generated from a sampled white noise match those issued from the reconstructed noise, which validates the proposed training procedure.

Fig. 5
figure 5

Ensemble simulation of the stochastic model. We visualize an ensemble simulation of the stochastic model, both on the training (a) and test (b) periods. The simulation on the training period is carried out to compare the trajectories computed from a sampled noise to the one issued from the reconstructed noise \( \boldsymbol {\xi }^{rec}\)

We also compare the marginal PDF of the stochastic model to the one computed from the data in Fig. 6. The PDF of the model is computed over a simulation of 109 years. Overall, the model correctly reproduces the high SST anomalies (in summer), including the non-periodic extremes that form the tail of the distribution. The negative SST anomalies (in winter) are not approximated as well as the summer ones. This is because the model flattens the PDF in winter by generating trajectories with more spread (as highlighted in the ensemble prediction experiment in Fig. 5). We did not investigate this problem within the present study; however, the PDF could be sharpened in winter by forcing the diffusion of the model to be closer to zero during this season.
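The marginal PDFs of Fig. 6 amount to comparing normalized histograms of the long simulation and of the observations over common bins, e.g. as follows (the bin range and count are illustrative, and `simulated_ssta` and `observed_ssta` denote the flattened simulation and the observed series).

```python
import numpy as np

bins = np.linspace(-4.0, 4.0, 81)  # illustrative anomaly range, in deg C
pdf_model, _ = np.histogram(simulated_ssta, bins=bins, density=True)
pdf_data, _ = np.histogram(observed_ssta, bins=bins, density=True)
```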

Fig. 6
figure 6

PDF of the data and the stochastic model. We compare the marginal PDF of the stochastic model with the one of the data

4 Conclusion

In this work, we examined the potential of machine learning techniques to derive relevant dynamical models of sea surface temperature anomaly data in the Mediterranean Sea. We focused on the seasonal modulation of SSTA extremes and used a neural embedding model to reconstruct the phase space of SSTA data. We then added a stochastic forcing term to account for the missing high frequency variability. Our results contribute to the understanding of the factors influencing SSTA extremes and the development of more accurate prediction models. In particular, the analysis highlights the importance of including these fast high-frequency scales in the modeling of SSTA data.

One potential avenue for future work is to investigate the white noise hypothesis in comparison to other types of stochastic models, such as those based on colored noise or fractional Brownian motion. Furthermore, it would be interesting to apply the methodology to other regions and compare the results, to evaluate the local impacts of the fast scales on the slower ones. Using this model as an emulator and studying its predictive skill with respect to standard ocean data assimilation systems is also a promising perspective.

Finally, from a methodological point of view, this work highlights the importance of complementing the models returned by an embedding methodology. Specifically, as discussed in Sect. 2.2, in complex applications such as those in the geosciences, the dimension of the underlying state variables is likely to be huge, and defining ways of complementing reduced-order models through appropriate closure terms is mandatory in order to capture the variability of the data. Analysing the residual of the model fitting procedure is a natural way to define and optimize these closure terms.