1 Introduction

Sea surface temperature (SST) is a critical parameter in the global climate system [1, 2] and plays a vital role in many marine processes, including ocean circulation, evaporation, and the exchange of heat and moisture between the ocean and atmosphere [3, 4, 5].

In recent years, marine heat waves, during which the SST greatly exceeds the locally expected average values, have attracted particular attention [6, 7]. Extreme SST can cause coral bleaching [8, 9], with cascading effects on the entire ecosystem. Localized events also affect the amount of atmospheric moisture available, impacting precipitation patterns and the likelihood of drought or flooding in certain regions [10]. Better uncovering the factors contributing to these extreme events is therefore of great importance to help predict and mitigate their impacts.

The SST dynamics result from many processes that interact across a continuum of spatio-temporal scales. A first-order approximation of such a system was initially introduced by Hasselmann [11, 12], who pioneered a two-scale stochastic decomposition to represent the interactions between slow and fast variables. In this study, we focus on SSTA data collected in the Mediterranean Sea and examine the potential of machine learning techniques to derive relevant dynamical models. The focus is on the seasonal modulation of the SSTA, and we wish to unveil the factors influencing the temporal variability of SSTA extremes. The proposed analysis builds on Hasselmann’s assumption that the variability of the SSTA can be decomposed into slow and fast components. The slow variables mostly follow the seasonal cycle, while the fast variables are linked to rapid processes, e.g. the wind variability. We thus approximate the probability density function of the SSTA data using a stochastic differential equation in which the drift function represents the seasonal cycle and the diffusion function represents the envelope of the fast SSTA response.

The paper is organized as follows. We start by introducing the general underlying state space model of the SST anomaly. Rather than directly presenting the stochastic model, we first assume that an underlying deterministic ordinary differential equation (ODE) can represent the non-periodic variability of the SSTA. Considering a phase space reconstruction setting, we use the neural embedding of dynamical systems (NbedDyn) framework [13, 14] for this task. We then discuss the limitations of such a representation, and present the stochastic model. We conclude by summarizing our findings and potential future directions.

2 Method

Let us assume the following state-space model

$$\displaystyle \begin{aligned} \dot{\mathbf{z}}_{t} &= {f}(\mathbf{{z}}_{t}){} \end{aligned} $$
(1)
$$\displaystyle \begin{aligned} \mathbf{{x}}_{t} &= \mathcal{H}(\mathbf{{z}}_{t}){} \end{aligned} $$
(2)

where \(t \in [0, +\infty)\) is time. The variables \(\mathbf {{z}}_t \in \mathbb {R}^{s}\) and \(\mathbf {{x}}_t \in \mathbb {R}^{n}\) represent the state variables and the SST anomaly observations, respectively. f and \(\mathcal {H}\) are the dynamical and observation operators. The impact of noise on the dynamics and observation models is omitted for simplicity of the presentation.

2.1 Deterministic Model Hypothesis

The NbedDyn Framework

Assuming that \({\mathbf {z}}_{t}\) is asymptotic to a limit-set \(L \subset \mathbb {R}^s\) and that the observation model is not an embedding [15], the NbedDyn model allows one to jointly derive a geometric reconstruction of the unseen phase space from partial observations and a corresponding dynamical model. For any given operator \(\mathcal {H}\) of a deterministic dynamical system, Takens’ theorem [16] guarantees that such an augmented space exists. However, instead of using a delay embedding, NbedDyn defines a \(d_E\)-dimensional augmented space with states \({\mathbf {u}}_t \in \mathbb {R}^{d_E}\) as follows:

$$\displaystyle \begin{aligned} {\mathbf{u}}_t^T = [{\mathbf{x}}_{t}^T, {\mathbf{y}}_{t}^T] {} \end{aligned} $$
(3)

where \({\mathbf {y}}_{t} \in \mathbb {R}^{d_E-n}\) are the latent states and T denotes the matrix transpose. The latent states account for the unobserved components of the true state \({\mathbf {z}}_t\).

The augmented state \({\mathbf {u}}_t\) is assumed to satisfy the following state space model:

$$\displaystyle \begin{aligned} \dot{\mathbf{u}}_t &=f_{\theta_1}({\mathbf{u}}_{\mathbf{t}}) {} \end{aligned} $$
(4)
$$\displaystyle \begin{aligned} {\mathbf{x}}_t &=\mathbf{G}{\mathbf{u}}_{t} {} \end{aligned} $$
(5)

where \(\mathbf {G}\) is the projection matrix that extracts the observed component \({\mathbf {x}}_t\) from the augmented state \({\mathbf {u}}_t\). The dynamical operator \(f_{\theta _1}\) belongs to a given family of neural network operators parameterized by a parameter vector \({\theta _1}\). In this work, we follow [14] and use a linear quadratic parameterization of \(f_{\theta _1}\). This particular parameterization allows us to guarantee boundedness of the ODE (4) using the Schlegel boundedness theorem [17]. A linear quadratic ODE model can be written as follows:

$$\displaystyle \begin{aligned} \dot{\mathbf{u}}_{t} =f_{\theta_1}({\mathbf{u}}_{\mathbf{t}}) = \mathbf{c} + \mathbf{L}{\mathbf{u}}_t + [{\mathbf{u}}^T_t{\mathbf{Q}}^{(1)}{\mathbf{u}}_t, \ldots, {\mathbf{u}}^T_t{\mathbf{Q}}^{({d_E})}{\mathbf{u}}_t]^T {} \end{aligned} $$
(6)

where \(\mathbf {c} \in \mathbb {R}^{d_E}\), \( \mathbf {L} \in \mathbb {R}^{{d_E} \times {d_E}}\) and \({\mathbf {Q}}^{(i)} = [q_{i,j,k}]^{d_E}_{j,k=1},i = 1, \ldots , {d_E}\). The above approximate model is shifted according to \(\bar {\mathbf {u}}_t = {\mathbf {u}}_t-\mathbf {m}\) with \(\mathbf {m} \in \mathbb {R}^{d_E}\). The approximate dynamical equation of the shifted state can be written as:

$$\displaystyle \begin{aligned} \dot{\bar{\mathbf{u}}}_{t} = \mathbf{d} + \mathbf{A}\bar{\mathbf{u}}_t + [\bar{\mathbf{u}}^T_t{\mathbf{Q}}^{(1)}\bar{\mathbf{u}}_t, \ldots, \bar{\mathbf{u}}^T_t{\mathbf{Q}}^{({d_E})}\bar{\mathbf{u}}_t]^T {} \end{aligned} $$
(7)

with

$$\displaystyle \begin{aligned} \mathbf{d} = \mathbf{c} + \mathbf{L}\mathbf{m} + [{\mathbf{m}}^T{\mathbf{Q}}^{(1)}{\mathbf{m}}, \ldots, {\mathbf{m}}^T{\mathbf{Q}}^{({d_E})}{\mathbf{m}}]^T {} \end{aligned} $$
(8)

and

$$\displaystyle \begin{aligned} \mathbf{A} = \bigg( a_{ij} \bigg)= \bigg( l_{ij} + \sum^{d_E}_{k = 1}(q_{i,j,k}+q_{i,k,j})m_k \bigg) {} \end{aligned} $$
(9)
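For concreteness, a minimal PyTorch sketch of the linear quadratic vector field of Eq. (6) could read as follows. This is an illustration only: class and variable names are ours, and the boundedness constraints introduced below are not enforced at this stage.

```python
import torch

class LinearQuadraticODE(torch.nn.Module):
    """Linear quadratic vector field of Eq. (6): c + L u + quadratic terms."""
    def __init__(self, d_E: int):
        super().__init__()
        self.c = torch.nn.Parameter(torch.zeros(d_E))
        self.L = torch.nn.Parameter(0.01 * torch.randn(d_E, d_E))
        # One d_E x d_E matrix Q^(i) per component of the output.
        self.Q = torch.nn.Parameter(0.01 * torch.randn(d_E, d_E, d_E))

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        # u has shape (batch, d_E)
        linear = self.c + u @ self.L.T
        # quadratic[b, i] = u_b^T Q^(i) u_b
        quadratic = torch.einsum('bj,ijk,bk->bi', u, self.Q, u)
        return linear + quadratic
```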

Given an observation time series \(\{{\mathbf {x}}_{t_0},\ldots ,{\mathbf {x}}_{t_N}\}\) of size \(N+1\), training amounts to jointly learning the model parameters \(\theta _1 = \{\mathbf {c}, \mathbf {L}, {\mathbf {Q}}^{(1)}, \ldots, {\mathbf {Q}}^{(d_E)}, \mathbf {m}\}\) and the latent states \({\mathbf {y}}_t\) according to the following constrained optimization problem

$$\displaystyle \begin{gathered} \begin{aligned} \hat{\theta}_1, \{\hat{\mathbf{y}}_{t_i}\}_{i = 0}^{i = {N-1}} = \displaystyle \arg \min_{\theta_1,\{{\mathbf{y}}_{t_i}\}} & \displaystyle \sum_{i=1}^{N} \|{\mathbf{x}}_{t_i} - \mathbf{G} \varPhi_{\theta_1,t_i} \left({\mathbf{u}}_{t_{i-1}} \right ) \|{}^2 \\ &+ \lambda_1 \|{\mathbf{u}}_{t_i} - \varPhi_{\theta_1,t_i}({\mathbf{u}}_{t_{i-1}})\|{}^2 \\ &+ \lambda_2 \mathcal{C}_1 \\ &+ \lambda_3 \mathcal{C}_2 \end{aligned} {} \end{gathered} $$
(10)

where \(\varPhi _{\theta _1,t}({\mathbf {u}}_{t-1}) = {\mathbf {u}}_{t-1} + \int _{t-1}^{t}f_{\theta _1} ({\mathbf {u}}_{w})dw\) is the flow of the ODE (6) (in our work, this flow is approximated using a fourth-order Runge-Kutta scheme), \(\mathcal {C}_1 = \sum _{i,j,k=1}^{d_E} \|q_{i,j,k}+q_{i,k,j}+q_{j,i,k}+q_{j,k,i}+q_{k,i,j}+q_{k,j,i}\|{ }^2\) and \(\mathcal {C}_2 = \sum _{i=1}^{d_E} \mathrm {Max}(\alpha _i,0)/(\mathrm {Max}(\alpha _i,0)+1)\), where \(\alpha _i, i =1, \ldots , {d_E}\) are the eigenvalues of the matrix \({\mathbf {A}}_s = \frac {1}{2}(\mathbf {A} + {\mathbf {A}}^T)\). The variables \(\lambda _{1,2,3}\) are constant weighting parameters. The first constraint \(\mathcal {C}_1\) stems from the energy-preserving condition of the quadratic non-linearity. It forces the contributions of the quadratic terms of \(f_{\theta _1}\) to the fluctuation energy to sum up to zero. The second constraint, \(\mathcal {C}_2\), ensures that the eigenvalues of \({\mathbf {A}}_s\) are negative. Satisfying these constraints guarantees that the model \(f_{\theta _1}\) is bounded through the existence of a monotonically attracting trapping region that includes the limit-set revealed by the minimization of the forecasting loss. Similarly to the Takens delay embedding technique, the sequence:

$$\displaystyle \begin{aligned} R_{t_0,t_N} = \{\hat{\mathbf{u}}_{t_i}^T = [{\mathbf{x}}_{t_i}^T, \hat{\mathbf{y}}_{t_i}^T] \text{ with } t_i = t_0, \dots, t_N \} {} \end{aligned} $$
(11)

represents a geometric reconstruction of the phase space. In addition to this reconstruction, the NbedDyn model can be used to forecast new observations by determining an initial condition of the unobserved component \({\mathbf {y}}_t\) and performing a numerical integration of the ODE model (6). We infer the initial condition using a minimization of an objective function similar to (10), but only with respect to the latent states \({\mathbf {y}}_{t}\). This minimization can be seen as a variational data assimilation problem, with partial observations of the state-space variables and known dynamical and observation models [18].
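As an illustration of the training criterion (10), the following sketch implements the flow map \(\varPhi _{\theta _1,t}\) with a fourth-order Runge-Kutta step and the two data-fidelity terms of (10) for scalar observations. The boundedness penalties \(\mathcal {C}_{1,2}\) are omitted for brevity, and all names are illustrative rather than those of the original implementation. Minimizing this loss with respect to both the parameters of f and the latent states y corresponds to the joint optimization described above; the same loss, with \(\theta _1\) frozen, is minimized over an initial latent state for forecasting.

```python
import torch

def rk4_step(f, u, dt):
    """One fourth-order Runge-Kutta step approximating the flow of the ODE (6)."""
    k1 = f(u)
    k2 = f(u + 0.5 * dt * k1)
    k3 = f(u + 0.5 * dt * k2)
    k4 = f(u + dt * k3)
    return u + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def forecasting_loss(f, x, y, dt, lambda_1):
    """Data-fidelity terms of Eq. (10) for scalar observations (n = 1).

    x: observations, shape (N + 1, 1); y: latent states, shape (N + 1, d_E - 1),
    declared as a torch.nn.Parameter so that they are optimized jointly with f.
    """
    u = torch.cat([x, y], dim=-1)        # augmented states u_t = [x_t, y_t]
    u_pred = rk4_step(f, u[:-1], dt)     # Phi_{theta_1, t_i}(u_{t_{i-1}})
    obs_term = ((x[1:] - u_pred[:, :1]) ** 2).sum()   # ||x - G Phi(u)||^2
    state_term = ((u[1:] - u_pred) ** 2).sum()        # ||u - Phi(u)||^2
    return obs_term + lambda_1 * state_term
```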

Related Works

Related state-of-the-art techniques mainly rely on the reconstruction of a phase space using delay embedding [16]. This includes traditional parametric and non-parametric modeling techniques [19, 20] as well as recurrent neural networks (RNNs). The latter family includes both simple RNN parameterizations of dynamical systems and latent space inference techniques built on an approximation of a posterior distribution that requires the parameterization of a delay embedding [21, 22, 23].

The interest of the NbedDyn framework, in contrast to delay-embedding-based approaches, is that it relies neither on a delay embedding nor on an explicit inference model (i.e., the reconstruction of the latent states given the observed time series). As such, our scheme only involves the selection of the class of ODEs of interest. This reduces the complexity of the overall scheme to that of the ODE representation and guarantees the consistency of the reconstructed latent states w.r.t. the learnt ODE.

2.2 Stochastic Model Hypothesis: The Stochastic NbedDyn

When using phase space reconstruction techniques, one should keep in mind the assumptions that this theory is built on. For any embedding to work, we are assuming that the dynamical model in (1) exists and can be represented by an ordinary differential equation [15]. For several realistic applications, this ODE may not exist or can have an extremely large dimension. In geoscience, for instance, the dimension of the state space variable can reach \(s \approx O(10^9)\). In these situations, reconstructing such a high-dimensional phase space becomes significantly more challenging. In practice, the model returned by any embedding technique can be complemented by an appropriate closure. The form of this closure term can be deterministic, using for example the framework of [24], or stochastic, through an appropriate calibration of a noise forcing.

When considering SST anomaly data, an unpredictable, high-frequency residual remains after calibration of the neural embedding model. Based on Hasselmann’s idea, we assume that this residual component represents the effect of fast-scale processes, e.g. passages of atmospheric and oceanic eddies. To first order, it can be represented as a modulated white noise: indeed, this residual, shown in Fig. 3, exhibits correlations with the slow-scale SST anomaly data.

To model stochastic SST anomalies, the deterministic NbedDyn model described above is first optimized, and the ODE (6) is then complemented with a stochastic forcing as follows:

$$\displaystyle \begin{aligned} \left \{ \begin{array}{ccl} \dot{\mathbf{u}}_{t} &=& {f}_{\theta_{1}}({\mathbf{u}}_{t}) + {g}_{\theta_{2}}({\mathbf{u}}_{t})\boldsymbol{\xi}_t\\ {\mathbf{x}}_{t} &=& \mathbf{G}{\mathbf{u}}_{t} {} \end{array}\right. \end{aligned} $$
(12)

where \(\boldsymbol {\xi }_t\) is a white noise process. We derive the parameters of the model (12) as follows. Given an observation time series \(\{{\mathbf {x}}_{t_0},\ldots ,{\mathbf {x}}_{t_N}\}\) of size \(N+1\), and similarly to the deterministic case, we optimize the diffusion parameters \(\theta _2\) to minimize the forecasting error of the observations. In addition to the diffusion parameters, we also reconstruct a noise realization \({\boldsymbol {\xi }}^{rec}\) that generates the observed process under (12). Overall, the optimization problem can be written as follows:

$$\displaystyle \begin{aligned} \begin{array}{c} \hat{\theta}_2, \{\hat{\boldsymbol{\xi}}_{t_i}^{rec}\}_{i = 0}^{i = {N-1}} = \arg \displaystyle \min_{\theta_2,\boldsymbol{\xi}^{rec}} \sum_{i=1}^{N} \left\|{\mathbf{x}}_{t_i} - \mathbf{G} \varPhi_{\theta,t_i} \left({\mathbf{u}}_{t_{i-1}},\boldsymbol{\xi}^{rec}_{t_{i-1}}\right) \right\|{}^2 \\~\\ \mbox{Subject to } \left \{ \begin{array}{lcl} {\mathbf{u}}_{t_i}& =& \varPhi_{\theta,t_i}({\mathbf{u}}_{t_{i-1}},\boldsymbol{\xi}^{rec})\\ \mathbf{G}{\mathbf{u}}_{t_i}& =& {\mathbf{x}}_{t_i}\\ {\mathbf{R}}_{\boldsymbol{\xi}^{rec} \boldsymbol{\xi}^{rec}}(\tau)& = & 0 \text{ for all } \tau \neq 0 \\ \end{array}\right. \end{array} {} \end{aligned} $$
(13)

where \(\{\hat {\boldsymbol {\xi }}_{t_i}^{rec}\}_{i = 0}^{i = {N-1}}\) is the noise realization that minimizes the objective function in (13) and \(\varPhi _{\theta ,t}\):

$$\displaystyle \begin{aligned} \varPhi_{\theta,t}({\mathbf{u}}_{t-1},\boldsymbol{\xi}^{rec}) = {\mathbf{u}}_{t-1} + \int_{t-1}^{t}f_\theta({\mathbf{u}}_{w})dw+ \int_{t-1}^{t}g_\theta({\mathbf{u}}_{w})\boldsymbol{\xi}^{rec}_w dw\end{aligned}$$

is the solution of the stochastic model. This solution is approximated in this work using an Euler-Maruyama scheme, so that the discretized model converges to an Itô SDE.
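A minimal sketch of such an Euler-Maruyama step is given below. For simplicity, it assumes a diagonal diffusion so that \(g_{\theta _2}({\mathbf {u}}_t)\) acts elementwise on the noise; function and variable names are ours and do not constrain the actual form of \(g_{\theta _2}\).

```python
import torch

def euler_maruyama_step(f, g, u, dt, xi=None):
    """One Euler-Maruyama step of the SDE (12): du = f(u) dt + g(u) dW.

    If xi is None, a Gaussian white-noise increment is sampled (xi^samp);
    otherwise a reconstructed noise value xi^rec is supplied, as in Eq. (13).
    A diagonal diffusion is assumed, so g(u) multiplies xi elementwise.
    """
    if xi is None:
        xi = torch.randn_like(u)
    return u + f(u) * dt + g(u) * xi * dt ** 0.5
```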

In practice, we use the following regularized optimization problem:

$$\displaystyle \begin{gathered} \begin{aligned} \hat{\theta}_2,\hat{\boldsymbol{\xi}}^{rec} = \arg \displaystyle \min_{{\theta}_2,\boldsymbol{\xi}^{rec}} &\sum_{i=1}^{N} \left\|{\mathbf{x}}_{t_i} - \mathbf{G} \varPhi_{\theta,t_i} \left({\mathbf{u}}_{t_{i-1}}, \boldsymbol{\xi}^{rec}\right) \right\|{}^2 \\ &+ \lambda_4 \mathcal{C}_3 \\ &+ \lambda_5 \mathcal{C}_4 \\ &+ \lambda_6 \mathcal{C}_5 \end{aligned} {} \end{gathered} $$
(14)

with \(\mathcal {C}_3 = \|{\mathbf {R}}_{\boldsymbol {\xi }^{rec} \boldsymbol {\xi }^{rec}}(\tau ) \|{ }^2\), \(\mathcal {C}_4 = \mathbf {Var}(\varPhi _{\theta ,t} ({\mathbf {u}}_{t-1}, \boldsymbol {\xi }^{samp}))\), \(\mathcal {C}_5 = \| \varPhi _{\theta ,t} ({\mathbf {u}}_{t-1}, \boldsymbol {\xi }^{rec}) - \mathbf {E}[\varPhi _{\theta ,t} ({\mathbf {u}}_{t-1}, \boldsymbol {\xi }^{samp})]\|{ }^2\), where \(\boldsymbol {\xi }^{samp}\) is a sampled Gaussian white noise. The variables \(\lambda _{4,5,6}\) are constant weighting parameters. The first constraint \(\mathcal {C}_3\) makes the reconstructed noise path white. The second and third constraints, \(\mathcal {C}_{4,5}\), ensure that the SDE generalizes to sampled white noises. Specifically, \(\mathcal {C}_{5}\) brings the ensemble of trajectories generated from sampled white noise close to the trajectory generated from the reconstructed noise, and \(\mathcal {C}_{4}\) reduces the spread of the ensemble around the trajectory simulated from the reconstructed noise.
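The whiteness constraint \(\mathcal {C}_3\) can be implemented, for instance, as a penalty on the empirical autocorrelation of the reconstructed noise path at non-zero lags. The sketch below truncates the penalty to a maximum lag, which is an implementation choice of ours rather than a detail of the original method.

```python
import torch

def whiteness_penalty(xi_rec, max_lag=10):
    """Penalty of type C_3: squared empirical autocorrelation R(tau) of the
    reconstructed noise at lags tau = 1..max_lag, pushed towards zero.

    xi_rec: reconstructed noise path, shape (N, d_E).
    """
    xi = xi_rec - xi_rec.mean(dim=0, keepdim=True)
    var = (xi ** 2).mean(dim=0) + 1e-8            # lag-0 autocovariance
    penalty = xi.new_zeros(())
    for tau in range(1, max_lag + 1):
        r_tau = (xi[:-tau] * xi[tau:]).mean(dim=0) / var
        penalty = penalty + (r_tau ** 2).sum()
    return penalty
```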

After this optimization, we can couple the optimization problems (10) and (14) and jointly calibrate all the model parameters \(\theta _1, \theta _2, {\mathbf {y}}_t, {\boldsymbol {\xi }}^{rec}\). This fine-tuning step is not essential, but it allows the drift and diffusion parameters of the model to adapt to each other.
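In terms of implementation, this fine-tuning can be as simple as exposing all free variables of the two problems to a single optimizer, as in the hypothetical snippet below, which reuses the names of the previous sketches.

```python
import torch

# Hypothetical joint fine-tuning: the drift f, the diffusion g, the latent
# states y and the reconstructed noise xi_rec (all defined in the sketches
# above) are updated together under the sum of the losses (10) and (14).
params = list(f.parameters()) + list(g.parameters()) + [y, xi_rec]
optimizer = torch.optim.Adam(params, lr=1e-4)  # small rate: a refinement step
```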

3 Numerical Experiments

3.1 Data

The Sea Surface Temperature Anomaly (SSTA) data correspond to a location in the Mediterranean Sea, in the Ligurian Sea at \(8.6^\circ \mathrm {E},43.8^\circ \mathrm {N}\). The anomalies are computed based on a yearly average of the annual 99th percentile of the SST reanalysis [25, 26]. The time series consists of daily SST anomaly measurements from 1987 to 2019; the data from 1987 to 2014 are used for training. Figure 3e illustrates the time series, which includes a seasonal cycle and non-periodic high-temperature extremes in summer.
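The exact preprocessing pipeline of the reanalysis products [25, 26] is not detailed here; one plausible reading of the anomaly computation described above is sketched below, where the baseline subtracted from the daily SST is the average over years of each year's 99th percentile (function and variable names are ours).

```python
import pandas as pd

def ssta_from_daily_sst(sst: pd.Series) -> pd.Series:
    """sst: daily SST at the study location, indexed by a DatetimeIndex.

    The baseline is the yearly average of the annual 99th percentile, so the
    resulting anomaly keeps the seasonal cycle, as in Fig. 3e.
    """
    annual_p99 = sst.groupby(sst.index.year).quantile(0.99)
    baseline = annual_p99.mean()
    return sst - baseline
```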

3.2 Analysis of the Deterministic Model

In this first experiment, we investigate whether the deterministic neural embedding model is able to capture the non-periodic variability of the SSTA extremes. For this purpose, we test models with embedding dimensions ranging from 1 to 10.

Analysis of the Embeddings

The choice of the dimension \(d_E\) is linked to the number of independent variables that can be used to model the dynamics using, in our context, a bounded autonomous linear quadratic ODE. We start by studying the direct impact of \(d_E\) on the performance of the NbedDyn model. Figure 1 shows the impact of \(d_E\) on the training error between the observations and the model simulation. Other criteria could be used (please refer to [13, 14] for a more in-depth analysis of this parameter on other case studies), but overall, the training error provides a direct measure of the effectiveness of the embedding dimension in the training phase. The first evaluation of the training error reported in Fig. 1 corresponds to \(d_E\) equal to the dimension of the measurements, i.e. \(d_E = 1\). In this case, no latent states \({\mathbf {y}}_t\) are used and the embedding is \({\mathbf {u}}_t = {\mathbf {x}}_t\). In such situations, the ODE model cannot perfectly fit the data. Furthermore, at this particular value of \(d_E\), the models are more likely to display poor asymptotic behavior. As the dimension increases, the training error decreases, which confirms the better modeling abilities of the NbedDyn model. In the following, we study the models with \(d_E = 3, 6\) and 10.
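The sweep behind Fig. 1 can be reproduced, in spirit, with a loop of the following form, reusing the sketches of Sect. 2.1; the observations `x`, the time step `dt` and the number of epochs are assumed given, and the optimizer settings are illustrative.

```python
import torch

errors = {}
for d_E in range(1, 11):
    f = LinearQuadraticODE(d_E)                               # sketch of Eq. (6)
    y = torch.nn.Parameter(torch.zeros(x.shape[0], d_E - 1))  # latent states
    opt = torch.optim.Adam(list(f.parameters()) + [y], lr=1e-3)
    for _ in range(n_epochs):
        opt.zero_grad()
        loss = forecasting_loss(f, x, y, dt, lambda_1=1.0)
        loss.backward()
        opt.step()
    errors[d_E] = loss.item()    # training error at convergence (cf. Fig. 1)
```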

Fig. 1
figure 1

Mean training error at convergence. We report the mean training error at convergence of the deterministic NbedDyn model for different dimensions \(d_E\) of the embedding. This error is averaged over the training time series, and we highlight here both the mean and standard deviations

Asymptotic Properties of the Models

We evaluate the asymptotic behavior of the deterministic models for \(d_E = 3, 6\) and 10. For this purpose, we run the NbedDyn models for a period of 27 years. The resulting simulation is visualized with respect to the reconstructed phase space (11) of the training data in Fig. 2. Overall, the models are only able to reproduce the seasonal cycle of the SST anomaly data. Other experiments (not shown here) suggest that even a further increase of the embedding dimension does not allow the model to capture the non-periodic behavior of the SST anomaly extremes.

Fig. 2
figure 2

Asymptotic solution of the deterministic models. We visualize the simulation of the deterministic models with respect to the reconstructed phase space. The models with \(d_E = 3, 6\) and 10 are given in figure (a), (b) and (c) respectively. For \(d_E > 3\), we project the simulation and the reconstructed phase space into \( \mathbb {R}^{3}\)

Analysis of the Training Residuals

To further investigate the asymptotic behavior of the deterministic models, we visualize in Fig. 3 the training residual \( \{ {\mathbf {x}}_{t_i} - \mathbf {G} \varPhi _{\theta ,t_i} ({\mathbf {u}}_{t_{i-1}}) \text{ with } t_i = t_0, \dots , t_N \}\). When the dimension of the embedding increases above \(d_E = 1\), a qualitative and quantitative change in the residual error is present. This is because an ODE in at least two dimensions is needed to capture the oscillations of the seasonal cycle of the SSTA. However, when the dimension of the embedding increases above 2, no clear qualitative or quantitative change is present. Furthermore, the residual has a much higher frequency content than the training SSTA data, which suggests that the errors are due to missing high-frequency scales that cannot be captured by the standard deterministic model.
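The residual and its frequency content can be diagnosed with a few lines, reusing `rk4_step` from Sect. 2.1; a residual spectrum much flatter than the red spectrum of the SSTA itself supports the modulated-white-noise interpretation (all names are ours).

```python
import torch

def training_residual(f, x, y, dt):
    """One-step residuals x_{t_i} - G Phi_{theta,t_i}(u_{t_{i-1}})."""
    u = torch.cat([x, y], dim=-1)
    u_pred = rk4_step(f, u[:-1], dt)
    return (x[1:] - u_pred[:, :1]).squeeze(-1)

res = training_residual(f, x, y, dt)
spectrum = torch.fft.rfft(res - res.mean()).abs() ** 2  # high-frequency check
```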

Fig. 3
figure 3

Training residual and corresponding training data. We visualize the training residual for \(d_E = 1\) in (a), \(d_E = 3\) in (b), \(d_E = 6\) in (c) and \(d_E = 10\) in (d). Corresponding training data are given in (e)

Based on these considerations, and motivated by Hasselmann’s work on stochastic climate models applied to SST anomaly data, we propose the stochastic NbedDyn model. In this framework, the SST residual of Fig. 3 is modeled as a stochastic forcing.

3.3 Analysis of the Stochastic Model

We focus our analysis on the model with \(d_E = 6\). We add a stochastic forcing to the neural embedding model (the parameters of the diffusion function \(g_{\theta _2}\) are optimized according to Appendix 1). Figure 4 shows the reconstructed phase space under this new model, as well as a model simulation of 27 years. Compared to the simulations of the deterministic model in Fig. 2, the stochastic model is able to cover the whole reconstructed phase space, including the regions with high-temperature extremes. This shows that including the high-frequency forcing is crucial for the model to capture the non-periodic behavior of the extremes.
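A free-running simulation of the stochastic model can be sketched as below, where the ensemble size, horizon and initial condition `u0` are illustrative and `euler_maruyama_step` is the sketch of Sect. 2.2.

```python
import torch

n_members, n_steps, dt = 50, 27 * 365, 1.0  # illustrative ensemble size, horizon
u = u0.repeat(n_members, 1)                 # u0: initial condition, shape (1, d_E)
xs = []
for _ in range(n_steps):
    u = euler_maruyama_step(f, g, u, dt)    # Gaussian increments sampled inside
    xs.append(u[:, :1])                     # observed component x_t = G u_t
ensemble = torch.stack(xs).squeeze(-1)      # daily SSTA, shape (n_steps, n_members)
```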

Fig. 4
figure 4

Simulation of the stochastic model. We visualize a stochastic model simulation and compare it to the reconstructed phase space. Both the model simulation and reconstructions are projected into \( \mathbb {R}^{3}\)

These observations are further illustrated in the simulation example given in Fig. 5. The stochastic model is able to produce an ensemble of SST anomaly trajectories that reproduce the non-periodic variability of the extremes. Furthermore, the trajectories generated from a sampled white noise match those issued from the reconstructed noise, which validates the proposed training procedure.

Fig. 5
figure 5

Ensemble simulation of the stochastic model. We visualize an ensemble simulation of the stochastic model, both on the training (a) and test (b) periods. The simulation on the training period is carried out to compare the trajectories computed from a sampled noise to the one issued from the reconstructed noise \( \boldsymbol {\xi }^{rec}\)

We also compare the marginal PDF of the stochastic model to the one computed from the data in Fig. 6. The PDF of the model is computed over a simulation of 109 years. Overall, the model correctly reproduces the high SST anomalies (in summer), including the non-periodic extremes that form the tail of the distribution. The negative SST anomalies (in winter) are not approximated as well as the summer ones. This is because the model flattens the PDF in winter by generating trajectories with more spread (as highlighted in the ensemble prediction experiment in Fig. 5). We did not investigate this problem within the present study; however, the PDF could be sharpened in winter by forcing the diffusion of the model to be closer to zero during this season.
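The marginal PDFs of Fig. 6 amount to comparing normalized histograms of the long simulation and of the observations over common bins, e.g. as follows (the bin range and count are illustrative, and `simulated_ssta` and `observed_ssta` denote the flattened simulation and the observed series).

```python
import numpy as np

bins = np.linspace(-4.0, 4.0, 81)  # illustrative anomaly range, in deg C
pdf_model, _ = np.histogram(simulated_ssta, bins=bins, density=True)
pdf_data, _ = np.histogram(observed_ssta, bins=bins, density=True)
```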

Fig. 6
figure 6

PDF of the data and the stochastic model. We compare the marginal PDF of the stochastic model with the one of the data

4 Conclusion

In this work, we examined the potential of machine learning techniques to derive relevant dynamical models of sea surface temperature anomaly data in the Mediterranean Sea. We focused on the seasonal modulation of SSTA extremes and used a neural embedding model to reconstruct the phase space of SSTA data. We then added a stochastic forcing term to account for the missing high frequency variability. Our results contribute to the understanding of the factors influencing SSTA extremes and the development of more accurate prediction models. In particular, the analysis highlights the importance of including these fast high-frequency scales in the modeling of SSTA data.

One potential avenue for future work is to investigate the white noise hypothesis in comparison to other types of stochastic models, such as those based on colored noise or fractional Brownian motion. Furthermore, it would be interesting to apply the methodology to other regions and compare the results, to evaluate the local impacts of the fast scales on the slower ones. Using this model as an emulator and studying its predictive skill with respect to standard ocean data assimilation systems is also a promising perspective.

Finally, from a methodological point of view, this work highlights the importance of complementing the models returned by an embedding methodology. Specifically, as discussed in Sect. 2.2, in complex applications such as those in the geosciences, the dimension of the underlying state variables is likely to be huge, and defining ways of complementing reduced-order models through appropriate closure terms is mandatory in order to capture the variability of the data. Analysing the residual of the model fitting procedure is a natural way to define and optimize these closure terms.