Abstract
Much of what we have described in the preceding chapters provides the basic tools necessary to build physiological state-space estimators. In this chapter, we will briefly review some additional concepts in state-space estimation, a non-traditional method of estimation, and some supplementary models. These may help serve as pointers if extensions are to be built to the models already described.
9.1 State-Space Model with a Time-Varying Process Noise Variance Based on a GARCH(p, q) Framework
Thus far, we have not considered time-varying model parameters. In reality, the human body is not static. Instead, it undergoes changes over time (e.g., due to disease conditions or adaptation to new environments). In this section, we will consider a state equation of the form
$$\displaystyle \begin{aligned} x_{k} = x_{k - 1} + \varepsilon_{k}, \end{aligned} $$(9.1)
where \(\varepsilon _{k} \sim \mathcal {N}(0, \sigma ^{2}_{\varepsilon , k})\). Note that the process noise variance now depends on the time index k. Here we will use concepts from the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) framework to model \(\varepsilon _{k}\). In a general GARCH(p, q) framework, we take
$$\displaystyle \begin{aligned} \varepsilon_{k} = \nu_{k}h_{k}, \end{aligned} $$(9.2)
where \(\nu _{k} \sim \mathcal {N}(0, 1)\) and
$$\displaystyle \begin{aligned} h^{2}_{k} = \alpha_{0} + \sum_{i = 1}^{q}\alpha_{i}\varepsilon^{2}_{k - i} + \sum_{j = 1}^{p}\beta_{j}h^{2}_{k - j}, \end{aligned} $$(9.3)
where the \(\alpha _{i}\)’s and \(\beta _{j}\)’s are coefficients to be determined. Now, conditioned on having observed all the sensor readings up to time index \((k - 1)\), \(h_{k}\) is known, and we have
$$\displaystyle \begin{aligned} \mathbb{E}[\varepsilon_{k}] = h_{k}\mathbb{E}[\nu_{k}] = 0 \end{aligned} $$(9.4)
and
$$\displaystyle \begin{aligned} \text{Var}[\varepsilon_{k}] = h^{2}_{k}\mathbb{E}[\nu^{2}_{k}] = h^{2}_{k}. \end{aligned} $$(9.5)
As is evident from (9.5), the variance of \(\varepsilon _{k}\) depends on k. If a GARCH(p, q) model is used for the process noise term in the random walk, the predict equations in the state estimation step change to
$$\displaystyle \begin{aligned} x_{k|k - 1} &= x_{k - 1|k - 1} \end{aligned} $$(9.6)
$$\displaystyle \begin{aligned} \sigma^{2}_{k|k - 1} &= \sigma^{2}_{k - 1|k - 1} + h^{2}_{k}. \end{aligned} $$(9.7)
The update equations in the state estimation step remain unchanged. Note also that the calculation of \(\sigma ^{2}_{k|k - 1}\) requires the previous process noise terms. In general, these will have to be calculated based on successive differences between the \(x_{k}\) and \(x_{k - 1}\) estimates.
Moreover, we would also have \((p + q + 1)\) additional GARCH terms (\(\alpha _{0}\) along with the \(\alpha _{i}\)’s and \(\beta _{j}\)’s) to determine at the parameter estimation step. These terms would have to be chosen to maximize the log-likelihood
$$\displaystyle \begin{aligned} Q = -\frac{1}{2}\sum_{k = 1}^{K}\Bigg[\log\big(h^{2}_{k}\big) + \frac{\mathbb{E}\big[(x_{k} - x_{k - 1})^{2}\big]}{h^{2}_{k}}\Bigg] + \text{constant}. \end{aligned} $$
The maximization of Q with respect to the GARCH terms is rather complicated. Choosing a GARCH(1, 1) model for \(\varepsilon _{k}\) simplifies the computations somewhat. Additionally, note the recursive form contained within Q. For each value of k, we have terms of the form \(h^{2}_{k - j}\) which contain within them further \(h^{2}\) terms. In general, computing Q is challenging unless further simplifying assumptions are made.
When \(x_{k}\) evolves with time following \(x_{k} = x_{k - 1} + \varepsilon _{k}\), where \(\varepsilon _{k}\) is modeled using a GARCH(p, q) framework, the predict equations in the state estimation step are
$$\displaystyle \begin{aligned} x_{k|k - 1} &= x_{k - 1|k - 1} \end{aligned} $$
$$\displaystyle \begin{aligned} \sigma^{2}_{k|k - 1} &= \sigma^{2}_{k - 1|k - 1} + h^{2}_{k}. \end{aligned} $$
The parameter estimation step updates for the \((p + q + 1)\) GARCH terms are chosen to maximize the log-likelihood Q.
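As a concrete illustration, one predict step with a GARCH(1, 1) process noise can be sketched in Python as follows (the function and variable names are our own; the previous residual \(\varepsilon _{k - 1}\) would in practice come from successive differences of the state estimates, as noted above):

```python
import numpy as np

def garch11_predict(x_prev, var_prev, eps_prev, h2_prev, alpha0, alpha1, beta1):
    """One predict step of a random walk whose process noise is GARCH(1, 1).

    The conditional process noise variance follows
        h2_k = alpha0 + alpha1 * eps_{k-1}^2 + beta1 * h2_{k-1};
    the state prediction itself is unchanged from the plain random walk.
    """
    h2 = alpha0 + alpha1 * eps_prev**2 + beta1 * h2_prev   # conditional variance h_k^2
    x_pred = x_prev                                        # x_{k|k-1} = x_{k-1|k-1}
    var_pred = var_prev + h2                               # sigma^2_{k|k-1}
    return x_pred, var_pred, h2
```

For the first few time steps, the recursion can be initialized with the unconditional variance \(\alpha _{0}/(1 - \alpha _{1} - \beta _{1})\) of a stationary GARCH(1, 1) process.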
9.2 Deriving the Parameter Estimation Step Equations for Terms Related to a Binary Observation
Thus far, we have only considered cases where the probability of binary event occurrence \(p_{k}\) is of the form
$$\displaystyle \begin{aligned} p_{k} = \frac{1}{1 + e^{-(\beta_{0} + x_{k})}}. \end{aligned} $$
We have also thus far only estimated \(\beta _{0}\) empirically (e.g., based on the average probability of point process event occurrence). Occasionally, however, we will find it helpful to model \(p_{k}\) as
$$\displaystyle \begin{aligned} p_{k} = \frac{1}{1 + e^{-(\beta_{0} + \beta_{1}x_{k})}} \end{aligned} $$
and determine \(\beta _{0}\) and \(\beta _{1}\) at the parameter estimation step. If we wish to do so, we will need to consider the probability term that needs to be maximized at this step. Based on (3.27), this probability term is
$$\displaystyle \begin{aligned} \prod_{k = 1}^{K}p_{k}^{n_{k}}(1 - p_{k})^{1 - n_{k}}. \end{aligned} $$
This yields the expected log-likelihood
$$\displaystyle \begin{aligned} Q = \sum_{k = 1}^{K}\mathbb{E}\big[n_{k}\log(p_{k}) + (1 - n_{k})\log(1 - p_{k})\big] = \sum_{k = 1}^{K}\mathbb{E}\big[n_{k}(\beta_{0} + \beta_{1}x_{k}) - \log\big(1 + e^{\beta_{0} + \beta_{1}x_{k}}\big)\big]. \end{aligned} $$
As in the case of determining the parameter updates for the terms in a CIF, this expected value is also somewhat complicated. Again, the trick is to perform a Taylor expansion around the mean \(\mathbb {E}[x_{k}] = x_{k|K}\) for each of the individual log terms. After performing this expansion, we end up with terms like \(\mathbb {E}[x_{k} - x_{k|K}]\) and \(\mathbb {E}[(x_{k} - x_{k|K})^{2}]\) which greatly simplify our calculations.
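The following Python sketch (our own illustration, with arbitrary values for \(\beta _{0}\), \(\beta _{1}\), \(x_{k|K}\), and \(\sigma ^{2}_{k|K}\)) checks the quality of such a second-order expansion by comparing it against a Monte Carlo estimate of \(\mathbb {E}[\log (1 + e^{\beta _{0} + \beta _{1}x_{k}})]\) for a Gaussian \(x_{k}\):

```python
import numpy as np

rng = np.random.default_rng(0)
b0, b1 = -1.0, 0.8          # illustrative coefficient values
m, v = 0.3, 0.2             # x_{k|K} and sigma^2_{k|K}

# Monte Carlo estimate of E[log(1 + exp(b0 + b1*x))] for x ~ N(m, v)
x = rng.normal(m, np.sqrt(v), size=1_000_000)
mc = np.mean(np.log1p(np.exp(b0 + b1 * x)))

# Second-order Taylor expansion around m: the first-order term vanishes
# because E[x - m] = 0, and E[(x - m)^2] = v.
u = b0 + b1 * m
p = 1.0 / (1.0 + np.exp(-u))                  # sigmoid evaluated at the mean
approx = np.log1p(np.exp(u)) + 0.5 * b1**2 * p * (1 - p) * v

print(mc, approx)
```

The two values agree to within Monte Carlo error, since the odd-order correction terms vanish for a Gaussian \(x_{k}\).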
Let us begin by performing a Taylor expansion of the log term around \(x_{k|K}\) [6]:
$$\displaystyle \begin{aligned} \log\big(1 + e^{\beta_{0} + \beta_{1}x_{k}}\big) \approx \log\big(1 + e^{\beta_{0} + \beta_{1}x_{k|K}}\big) + \beta_{1}p_{k|K}(x_{k} - x_{k|K}) + \frac{\beta_{1}^{2}}{2}p_{k|K}(1 - p_{k|K})(x_{k} - x_{k|K})^{2}, \end{aligned} $$
where \(p_{k|K} = \big(1 + e^{-(\beta _{0} + \beta _{1}x_{k|K})}\big)^{-1}\). Note the terms \((x_{k} - x_{k|K})\) and \((x_{k} - x_{k|K})^{2}\) in the expansion. Taking the expected value on both sides, and using \(\mathbb {E}[x_{k} - x_{k|K}] = 0\) and \(\mathbb {E}[(x_{k} - x_{k|K})^{2}] = \sigma ^{2}_{k|K}\),
$$\displaystyle \begin{aligned} \mathbb{E}\big[\log\big(1 + e^{\beta_{0} + \beta_{1}x_{k}}\big)\big] \approx \log\big(1 + e^{\beta_{0} + \beta_{1}x_{k|K}}\big) + \frac{\beta_{1}^{2}}{2}p_{k|K}(1 - p_{k|K})\sigma^{2}_{k|K}. \end{aligned} $$
Therefore,
$$\displaystyle \begin{aligned} Q \approx \sum_{k = 1}^{K}\Bigg[n_{k}(\beta_{0} + \beta_{1}x_{k|K}) - \log\big(1 + e^{\beta_{0} + \beta_{1}x_{k|K}}\big) - \frac{\beta_{1}^{2}}{2}p_{k|K}(1 - p_{k|K})\sigma^{2}_{k|K}\Bigg]. \end{aligned} $$
Taking the partial derivative of Q with respect to \(\beta _{0}\), we have
$$\displaystyle \begin{aligned} \frac{\partial Q}{\partial \beta_{0}} \approx \sum_{k = 1}^{K}\Bigg[n_{k} - p_{k|K} - \frac{\beta_{1}^{2}}{2}\sigma^{2}_{k|K}p_{k|K}(1 - p_{k|K})(1 - 2p_{k|K})\Bigg]. \end{aligned} $$
And similarly for \(\beta _{1}\), we have
$$\displaystyle \begin{aligned} \frac{\partial Q}{\partial \beta_{1}} \approx \sum_{k = 1}^{K}\Bigg[(n_{k} - p_{k|K})x_{k|K} - \beta_{1}p_{k|K}(1 - p_{k|K})\sigma^{2}_{k|K} - \frac{\beta_{1}^{2}}{2}\sigma^{2}_{k|K}x_{k|K}p_{k|K}(1 - p_{k|K})(1 - 2p_{k|K})\Bigg]. \end{aligned} $$
By setting
$$\displaystyle \begin{aligned} \frac{\partial Q}{\partial \beta_{0}} = 0 \quad \text{and} \quad \frac{\partial Q}{\partial \beta_{1}} = 0, \end{aligned} $$
we obtain two simultaneous equations with which to solve for \(\beta _{0}\) and \(\beta _{1}\). Note also that the use of \(\beta _{0}\) and \(\beta _{1}\) in \(p_{k}\) causes changes to the filter update equations for \(x_{k|k}\) and \(\sigma ^{2}_{k|k}\).
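A sketch of how the two simultaneous equations can be solved numerically is given below (Python, our own illustration). For brevity, the variance-correction terms arising from the Taylor expansion are dropped, which reduces the equations to logistic-regression score equations evaluated at the smoothed means \(x_{k|K}\):

```python
import numpy as np

def solve_betas(n, x, iters=100, tol=1e-10):
    """Solve dQ/d(beta0) = 0 and dQ/d(beta1) = 0 by Newton's method.

    n : binary observations n_k; x : smoothed state means x_{k|K}.
    The variance-correction terms are dropped here for brevity, leaving
    the score equations of a logistic regression.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))
        g = np.array([np.sum(n - p), np.sum(x * (n - p))])        # gradient of Q
        w = p * (1.0 - p)
        H = -np.array([[np.sum(w),      np.sum(w * x)],
                       [np.sum(w * x),  np.sum(w * x**2)]])       # Hessian of Q
        step = np.linalg.solve(H, g)
        b0, b1 = b0 - step[0], b1 - step[1]
        if np.linalg.norm(g) < tol:
            break
    return b0, b1

# Check on simulated data with known beta_0 = -0.5 and beta_1 = 1.2
rng = np.random.default_rng(1)
x = rng.normal(size=5000)                      # stand-in for x_{k|K}
n = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-0.5 + 1.2 * x))))
b0_hat, b1_hat = solve_betas(n, x)
```

Because Q (without the correction terms) is concave in \((\beta _{0}, \beta _{1})\), the Newton iteration converges reliably from a zero initialization.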
The parameter estimation step updates for \(\beta _{0}\) and \(\beta _{1}\) when we observe a binary variable \(n_{k}\) are obtained by solving
$$\displaystyle \begin{aligned} \frac{\partial Q}{\partial \beta_{0}} = 0 \quad \text{and} \quad \frac{\partial Q}{\partial \beta_{1}} = 0. \end{aligned} $$
9.3 Extending Estimation to a Vector-Valued State
We have also thus far only considered cases where a single state \(x_{k}\) gives rise to different observations. In a number of applications, we will encounter the need to estimate a vector-valued state \({\mathbf {x}}_{k}\). For instance, we may need to estimate the position of a small animal on a 2D plane from neural spiking observations or may need to estimate different aspects of emotion from physiological signal features. We have a multi-dimensional \({\mathbf {x}}_{k}\) in each of these cases.
Let us first consider the predict equations in the state estimation step. Assume that we have a state \({\mathbf {x}}_{k}\) that varies with time following
$$\displaystyle \begin{aligned} {\mathbf{x}}_{k} = A{\mathbf{x}}_{k - 1} + B{\mathbf{u}}_{k} + {\mathbf{e}}_{k}, \end{aligned} $$
where A and B are matrices, \({\mathbf {u}}_{k}\) is an external input, and \({\mathbf {e}}_{k} \sim \mathcal {N}(\mathbf {0}, \varSigma )\) is the process noise. The basic statistical results related to mean and variance in (2.1)–(2.6) simply generalize to the vector case. Thus, the predict equations in the state estimation step become
$$\displaystyle \begin{aligned} {\mathbf{x}}_{k|k - 1} &= A{\mathbf{x}}_{k - 1|k - 1} + B{\mathbf{u}}_{k} \end{aligned} $$
$$\displaystyle \begin{aligned} \varSigma_{k|k - 1} &= A\varSigma_{k - 1|k - 1}A^{\intercal} + \varSigma, \end{aligned} $$
where the covariance (uncertainty) \(\varSigma \) of the process noise is now a matrix.
Recall also how we derived the update equations in the state estimation step. We calculated the terms that appeared in the posterior \(p(x_{k}|y_{1:k})\) and made a Gaussian approximation to it in order to derive the mean and variance updates \(x_{k|k}\) and \(\sigma ^{2}_{k|k}\). In all of the scalar cases, the log posterior density had the form
$$\displaystyle \begin{aligned} q_{s} = f(x_{k}) - \frac{(x_{k} - x_{k|k - 1})^{2}}{2\sigma^{2}_{k|k - 1}} + \text{constant}, \end{aligned} $$
where \(f(x_{k})\) was some function of \(x_{k}\). This function could take on different forms depending on whether binary, continuous, or spiking-type observations (or different combinations of them) were present. In each of the cases, the mean and variance updates were derived based on the first and second derivatives of \(q_{s}\).
There are two different ways for calculating the update step equations in the vector case.
- The first is the traditional approach outlined in [10]. Here, the result that holds for the 1D case is simply extended to the vector case. Regardless of the types of observations (features) that are present in the state-space model, the log posterior is of the form
  $$\displaystyle \begin{aligned} q_{v} &= f({\mathbf{x}}_{k}) - \frac{1}{2}({\mathbf{x}}_{k} - {\mathbf{x}}_{k|k - 1})^{\intercal}\varSigma^{-1}_{k|k - 1}({\mathbf{x}}_{k} - {\mathbf{x}}_{k|k - 1}) + \text{constant}. \end{aligned} $$(9.35)
  The manner in which the updates \({\mathbf {x}}_{k|k}\) and \(\varSigma _{k|k}\) are calculated, however, is quite similar. We simply take the first vector derivative of \(q_{v}\) and solve for where it is \(\mathbf {0}\) to obtain \({\mathbf {x}}_{k|k}\). We next take the Hessian of \(q_{v}\) comprising all the second derivatives and take its negative inverse to obtain \(\varSigma _{k|k}\).
- The second approach is slightly different [115]. Note that, based on making a Gaussian approximation to the log posterior, we can write
  $$\displaystyle \begin{aligned} - \frac{1}{2}({\mathbf{x}}_{k} - {\mathbf{x}}_{k|k})^{\intercal}\varSigma^{-1}_{k|k}({\mathbf{x}}_{k} - {\mathbf{x}}_{k|k}) &= f({\mathbf{x}}_{k}) - \frac{1}{2}({\mathbf{x}}_{k} - {\mathbf{x}}_{k|k - 1})^{\intercal}\varSigma^{-1}_{k|k - 1}({\mathbf{x}}_{k} - {\mathbf{x}}_{k|k - 1}) + \text{constant}. \end{aligned} $$(9.36)
  Let us take the first vector derivative with respect to \({\mathbf {x}}_{k}\) on both sides. This yields
  $$\displaystyle \begin{aligned} - \varSigma^{-1}_{k|k}({\mathbf{x}}_{k} - {\mathbf{x}}_{k|k}) &= \frac{\partial f({\mathbf{x}}_{k})}{\partial {\mathbf{x}}_{k}} - \varSigma^{-1}_{k|k - 1}({\mathbf{x}}_{k} - {\mathbf{x}}_{k|k - 1}). \end{aligned} $$(9.37)
  Let us now evaluate this expression at \({\mathbf {x}}_{k} = {\mathbf {x}}_{k|k - 1}\). Do you see that if we substitute \({\mathbf {x}}_{k} = {\mathbf {x}}_{k|k - 1}\) in the above expression, the second term on the right simply goes away? Therefore, we end up with
  $$\displaystyle \begin{aligned} - \varSigma^{-1}_{k|k}({\mathbf{x}}_{k|k - 1} - {\mathbf{x}}_{k|k}) &= \frac{\partial f({\mathbf{x}}_{k})}{\partial {\mathbf{x}}_{k}}\Bigg\rvert_{{\mathbf{x}}_{k|k - 1}} \end{aligned} $$(9.38)
  $$\displaystyle \begin{aligned} \implies {\mathbf{x}}_{k|k} &= {\mathbf{x}}_{k|k - 1} + \varSigma_{k|k}\frac{\partial f({\mathbf{x}}_{k})}{\partial {\mathbf{x}}_{k}}\Bigg\rvert_{{\mathbf{x}}_{k|k - 1}}. \end{aligned} $$(9.39)
  This yields the mean state update for \({\mathbf {x}}_{k|k}\). How do we derive the covariance matrix \(\varSigma _{k|k}\)? We simply take the vector derivative of (9.37) again. Note that in this case, \(\frac {\partial ^{2}}{\partial {\mathbf {x}}_{k}^{2}}\) denotes the matrix of all the second derivative terms. Thus, we obtain
  $$\displaystyle \begin{aligned} \varSigma^{-1}_{k|k} &= -\frac{\partial^{2}f({\mathbf{x}}_{k})}{\partial {\mathbf{x}}_{k}^{2}} + \varSigma^{-1}_{k|k - 1} \end{aligned} $$(9.40)
  $$\displaystyle \begin{aligned} \implies \varSigma_{k|k} &= \Bigg[-\frac{\partial^{2}f({\mathbf{x}}_{k})}{\partial {\mathbf{x}}_{k}^{2}} + \varSigma^{-1}_{k|k - 1} \Bigg]^{-1}. \end{aligned} $$(9.41)
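As a sanity check on the second approach, consider a linear-Gaussian observation \({\mathbf {y}}_{k} = C{\mathbf {x}}_{k} + {\mathbf {w}}_{k}\) with \({\mathbf {w}}_{k} \sim \mathcal {N}(\mathbf {0}, R)\), so that \(f({\mathbf {x}}_{k})\) is a Gaussian log-likelihood. In this case, (9.39)–(9.41) should reproduce the standard Kalman filter update, which the Python sketch below (our own illustration, with arbitrary numbers) verifies numerically:

```python
import numpy as np

rng = np.random.default_rng(2)
d, m = 3, 2                                   # state and observation dimensions
C = rng.normal(size=(m, d))                   # observation matrix
R = np.diag([0.5, 0.8])                       # observation noise covariance
x_pred = rng.normal(size=d)                   # x_{k|k-1}
S_pred = np.eye(d) * 0.4                      # Sigma_{k|k-1}
y = rng.normal(size=m)                        # observation y_k

# Update via (9.39)-(9.41): here f(x) = -0.5 (y - Cx)^T R^{-1} (y - Cx) + const,
# so df/dx evaluated at x_pred is C^T R^{-1} (y - C x_pred), and the second
# derivative matrix is the constant -C^T R^{-1} C.
Ri = np.linalg.inv(R)
S_upd = np.linalg.inv(C.T @ Ri @ C + np.linalg.inv(S_pred))     # (9.41)
x_upd = x_pred + S_upd @ (C.T @ Ri @ (y - C @ x_pred))          # (9.39)

# Standard Kalman update for comparison
K = S_pred @ C.T @ np.linalg.inv(C @ S_pred @ C.T + R)
x_kal = x_pred + K @ (y - C @ x_pred)
S_kal = (np.eye(d) - K @ C) @ S_pred

assert np.allclose(x_upd, x_kal)
assert np.allclose(S_upd, S_kal)
```

The two forms are algebraically equivalent via the matrix inversion lemma, so the assertions pass to numerical precision.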
9.4 The Use of Machine Learning Methods for State Estimation
Machine learning approaches can also be used for state estimation (e.g., [116, 117]). In these methods, neural networks or other techniques are utilized to learn a particular state-space model and infer the unobserved state(s) from a dataset. In this section, we will briefly describe how the neural network approach in [116] is used for estimation. In [116], Krishnan et al. considered the general Gaussian state-space model
$$\displaystyle \begin{aligned} x_{k} &\sim \mathcal{N}\big(f_{\mu}(x_{k - 1}), f_{\varSigma}(x_{k - 1})\big) \end{aligned} $$
$$\displaystyle \begin{aligned} y_{k} &\sim \varPi\big(f_{y}(x_{k})\big), \end{aligned} $$
where \(y_{k}\) represents the observations, \(f_{\mu }(\cdot )\) and \(f_{\varSigma }(\cdot )\) are state transition mean and covariance functions, and \(\varPi \) is an observation distribution parameterized by \(f_{y}(\cdot )\). Both the state equation and the output equation are learned using two separate neural networks (for simplicity, we group both of them together under the title “state-space neural network”—SSNN). A separate recurrent neural network (RNN) is used to estimate \(x_{k}\). Taking \(\psi \) and \(\phi \) to denote the parameters of the state-space model and the RNN, respectively, the networks are trained by maximizing the variational lower bound
$$\displaystyle \begin{aligned} \mathbb{E}_{q_{\phi}(x_{1:K}|y_{1:K})}\big[\log p_{\psi}(y_{1:K}|x_{1:K})\big] - \text{KL}\big(q_{\phi}(x_{1:K}|y_{1:K})\,\big\|\,p_{\psi}(x_{1:K})\big), \end{aligned} $$
where \(p_{\psi }(\cdot )\) and \(q_{\phi }(\cdot )\) denote density functions [116]. The actual training is performed within the algorithm as a minimization of the negative of this objective, which we label \(Q_{ML}\). Analogous to the state-space EM algorithms we have seen so far, in this neural network approach, the SSNN replaces the explicit state-space model, the RNN replaces the Bayesian filter, and the weights of the neural networks replace the model parameters. The objective, however, is still to estimate \(x_{k}\) from observations such as \(n_{k}\), \(r_{k}\), and \(s_{k}\). Since neural networks are used to learn the state-space model, more complicated state transitions and input-output relationships are permitted. One of the drawbacks, however, is that a certain degree of interpretability is lost.
Similarities also exist between the terms in \(Q_{ML}\) and the log-likelihood terms we have seen thus far. For instance, when a binary variable \(n_{k}\) is present among the observations \(y_{k}\), \(Q_{ML}\) contains the summation
$$\displaystyle \begin{aligned} \sum_{k = 1}^{K}\mathbb{E}\big[n_{k}\log\big(f_{n}(x_{k})\big) + (1 - n_{k})\log\big(1 - f_{n}(x_{k})\big)\big]. \end{aligned} $$
Take a moment to look back at how (3.15) and (3.26) fit in with this summation. In this case, however, \(f_{n}(\cdot )\) is learned by the SSNN (in our other approaches, we explicitly modeled the relationship between \(x_{k}\) and \(p_{k}\) using a sigmoid). Similarly, if a continuous-valued variable \(s_{k}\) is present in \(y_{k}\), there is the summation
$$\displaystyle \begin{aligned} \sum_{k = 1}^{K}\mathbb{E}\Bigg[-\frac{1}{2}\log\big(2\pi f_{\sigma^{2}_{s}}(x_{k})\big) - \frac{\big(s_{k} - f_{\mu_{s}}(x_{k})\big)^{2}}{2f_{\sigma^{2}_{s}}(x_{k})}\Bigg], \end{aligned} $$
where \(f_{\mu _{s}}(\cdot )\) and \(f_{\sigma ^{2}_{s}}(\cdot )\) represent mean and variance functions learned by the SSNN. Again, recall that we had a very similar term at the parameter estimation step for a continuous variable \(s_{k}\).
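To make the correspondence concrete, the short Python sketch below (our own illustration) evaluates binary and continuous log-likelihood terms of the kind that appear in \(Q_{ML}\), with simple closed-form stand-ins in place of the SSNN-learned functions \(f_{n}(\cdot )\), \(f_{\mu _{s}}(\cdot )\), and \(f_{\sigma ^{2}_{s}}(\cdot )\):

```python
import numpy as np

rng = np.random.default_rng(3)
K = 200
x = rng.normal(size=K)                          # illustrative state trajectory
n = rng.binomial(1, 0.3, size=K)                # binary observations n_k
s = 0.5 * x + rng.normal(scale=0.2, size=K)     # continuous observations s_k

# Stand-ins for the functions an SSNN would learn (illustrative choices):
f_n = lambda x: 1.0 / (1.0 + np.exp(-x))        # event probability
f_mu = lambda x: 0.5 * x                        # mean of s_k
f_var = lambda x: np.full_like(x, 0.04)         # variance of s_k

# Binary-observation log-likelihood term
q_bin = np.sum(n * np.log(f_n(x)) + (1 - n) * np.log(1 - f_n(x)))

# Continuous-observation (Gaussian) log-likelihood term
v = f_var(x)
q_cont = np.sum(-0.5 * np.log(2 * np.pi * v) - (s - f_mu(x))**2 / (2 * v))

print(q_bin, q_cont)
```

In the actual algorithm, these sums are computed under the variational posterior and differentiated through the networks; here they simply show the shape of the terms.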
One of the primary advantages of the neural network approach in [116] is that we no longer need to derive all the EM algorithm equations when new observations are added. This is a notable drawback with the traditional EM approach. Moreover, we can also modify the objective function to
where \(l_{k}\) is an external influence and \(0 \leq \rho \leq 1\). This provides the option to perform state estimation while permitting an external influence (e.g., domain knowledge or subject-provided labels) to affect \(x_{k}\).
9.5 Additional MATLAB Code Examples
In this section, we briefly describe the two state-space models in [118] and [30] for which MATLAB code examples are provided. The equation derivations for these two models require no significant new concepts. The first model incorporates one binary observation from skin conductance and one spiking-type observation from EKG. The second incorporates one binary observation and two continuous observations; it is almost identical to the model with the same observations described in an earlier chapter but includes a circadian rhythm term as \(I_{k}\) in the state equation. The derivation of the state and parameter estimation equations is similar to what we have seen before.
9.5.1 State-Space Model with One Binary and One Spiking-Type Observation
The MATLAB code example for the state-space model with one binary and one spiking-type observation is provided in the “one_bin_one_spk” folder. The model is described in [118] and attempts to estimate sympathetic arousal from binary-valued SCRs and EKG R-peaks (the RR-intervals are modeled using an HDIG-based CIF). The results are shown in Fig. 9.1. The data come from the study described in [119], where subjects had to perform office work-like tasks under different conditions. In the first condition, the subjects were permitted to take as much time as they liked; the other two conditions involved e-mail interruptions and time constraints. Based on the results reported in [118], task uncertainty (i.e., how new the task was) appeared to generate the highest sympathetic arousal responses for the subject considered.
9.5.2 State-Space Model with One Binary and Two Continuous Observations with a Circadian Input in the State Equation
Cortisol is known to exhibit circadian variation [120, 121]. Typically, cortisol concentrations in the blood begin to rise in the early morning during the late stages of sleep. Peak values are reached shortly after awakening. Later in the day, cortisol levels tend to drop toward bedtime and usually reach their lowest values in the middle of the night [122, 123]. In [30], a circadian \(I_{k}\) term was assumed to drive \(x_{k}\) so that it evolved with time following
$$\displaystyle \begin{aligned} x_{k} = \rho x_{k - 1} + I_{k} + \varepsilon_{k}, \end{aligned} $$
where
$$\displaystyle \begin{aligned} I_{k} = a_{1}\sin\bigg(\frac{2\pi k}{T}\bigg) + b_{1}\cos\bigg(\frac{2\pi k}{T}\bigg) + a_{2}\sin\bigg(\frac{4\pi k}{T}\bigg) + b_{2}\cos\bigg(\frac{4\pi k}{T}\bigg), \end{aligned} $$
\(\varepsilon _{k} \sim \mathcal {N}(0, \sigma ^{2}_{\varepsilon })\), and T is the number of time indices in a 24-h period.
The model also considered the upper and lower envelopes of the blood cortisol concentrations as the two continuous variables \(r_{k}\) and \(s_{k}\). The pulsatile secretions formed the binary variable \(n_{k}\). The inclusion of each continuous variable necessitates the determination of three model parameters (two governing the linear fit and the third being the sensor noise variance). In addition, the state-space model in [30] also estimated \(\beta _{0}\) and \(\beta _{1}\) in \(p_{k}\). There are also six more parameters in the state equation: \(\rho \), \(a_{1}\), \(a_{2}\), \(b_{1}\), \(b_{2}\), and \(\sigma ^{2}_{\varepsilon }\). To ease computational complexity, the EM algorithm in [30] treated the four parameters related to the circadian rhythm (\(a_{1}\), \(a_{2}\), \(b_{1}\), and \(b_{2}\)) somewhat differently: while all the parameters were updated at the parameter estimation step, \(a_{1}\), \(a_{2}\), \(b_{1}\), and \(b_{2}\) were excluded from the convergence criteria. The results are shown in Fig. 9.2. Here, the data were simulated for a hypothetical patient suffering from a type of hypercortisolism (Cushing’s disease) based on the parameters in [124]. Cushing’s disease involves excess cortisol secretion into the bloodstream and may be caused by tumors or prolonged drug use [125]. It is associated with a range of physical and psychological symptoms, including insomnia and fatigue [126, 127, 128]. The resulting cortisol-related energy state estimates do not exhibit the usual circadian-like patterns seen in a healthy subject. This may partially account for why Cushing’s patients experience daytime bouts of fatigue and nighttime sleeping difficulties.
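A circadian-driven state of this kind can be simulated with a short Python sketch (our own illustration; the two-harmonic form of \(I_{k}\), the parameter values, and the one-minute sampling rate are assumptions for demonstration, not the settings used in [30]):

```python
import numpy as np

def simulate_circadian_state(rho, a1, b1, a2, b2, noise_sd, T=1440, days=2, seed=0):
    """Simulate x_k = rho * x_{k-1} + I_k + eps_k with a circadian input I_k.

    A two-harmonic sinusoidal form for I_k is assumed here for illustration;
    T is the number of samples per 24-h period (e.g., 1440 one-minute bins).
    """
    rng = np.random.default_rng(seed)
    k = np.arange(days * T)
    w = 2 * np.pi * k / T
    I = a1 * np.sin(w) + b1 * np.cos(w) + a2 * np.sin(2 * w) + b2 * np.cos(2 * w)
    x = np.zeros(days * T)
    for i in range(1, days * T):
        x[i] = rho * x[i - 1] + I[i] + rng.normal(scale=noise_sd)
    return x

x = simulate_circadian_state(rho=0.95, a1=0.1, b1=0.05, a2=0.02, b2=0.01,
                             noise_sd=0.01)
```

With \(|\rho | < 1\), the state remains bounded and inherits a 24-h rhythm from \(I_{k}\); for a hypercortisolism simulation, the harmonic coefficients would instead be chosen to flatten or distort this rhythm.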
References
T. P. Coleman, M. Yanike, W. A. Suzuki, and E. N. Brown, “A mixed-filter algorithm for dynamically tracking learning from multiple behavioral and neurophysiological measures,” The Dynamic Brain: An Exploration of Neuronal Variability and its Functional Significance, pp. 3–28, 2011.
E. N. Brown, L. M. Frank, D. Tang, M. C. Quirk, and M. A. Wilson, “A statistical paradigm for neural spike train decoding applied to position prediction from ensemble firing patterns of rat hippocampal place cells,” Journal of Neuroscience, vol. 18, no. 18, pp. 7411–7425, 1998.
D. S. Wickramasuriya and R. T. Faghih, “A Bayesian filtering approach for tracking arousal from binary and continuous skin conductance features,” IEEE Transactions on Biomedical Engineering, vol. 67, no. 6, pp. 1749–1760, 2020.
D. S. Wickramasuriya and R. T. Faghih, “A cortisol-based energy decoder for investigation of fatigue in hypercortisolism,” in 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), July 2019, pp. 11–14.
M. M. Shanechi, J. J. Chemali, M. Liberman, K. Solt, and E. N. Brown, “A brain-machine interface for control of medically-induced coma,” PLoS Computational Biology, vol. 9, no. 10, p. e1003284, 2013.
R. G. Krishnan, U. Shalit, and D. Sontag, “Structured inference networks for nonlinear state space models,” in 31st AAAI Conf. Artificial Intelligence, 2017.
X. Zheng, M. Zaheer, A. Ahmed, Y. Wang, E. P. Xing, and A. J. Smola, “State space LSTM models with particle MCMC inference,” arXiv preprint arXiv:1711.11179, 2017.
D. S. Wickramasuriya and R. T. Faghih, “A novel filter for tracking real-world cognitive stress using multi-time-scale point process observations,” in 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), July 2019, pp. 599–602.
S. Koldijk, M. Sappelli, S. Verberne, M. A. Neerincx, and W. Kraaij, “The SWELL knowledge work dataset for stress and user modeling research,” in 16th International Conference on Multimodal Interaction. ACM, 2014, pp. 291–298.
E. N. Brown, P. M. Meehan, and A. P. Dempster, “A stochastic differential equation model of diurnal cortisol patterns,” American Journal of Physiology-Endocrinology and Metabolism, vol. 280, no. 3, pp. E450–E461, 2001.
I. Vargas, A. N. Vgontzas, J. L. Abelson, R. T. Faghih, K. H. Morales, and M. L. Perlis, “Altered ultradian cortisol rhythmicity as a potential neurobiologic substrate for chronic insomnia,” Sleep Medicine Reviews, vol. 41, pp. 234–243, 2018.
D. M. Arble, G. Copinschi, M. H. Vitaterna, E. Van Cauter, and F. W. Turek, “Chapter 12 - Circadian rhythms in neuroendocrine systems,” in Handbook of Neuroendocrinology, G. Fink, D. W. Pfaff, and J. E. Levine, Eds. San Diego: Academic Press, 2012, pp. 271–305. [Online]. Available: http://www.sciencedirect.com/science/article/pii/B9780123750976100125
F. Suay and A. Salvador, “Chapter 3 - Cortisol,” in Psychoneuroendocrinology of Sport and Exercise: Foundations, Markers, Trends, F. Ehrlenspiel and K. Strahler, Eds. Routledge, 2012, pp. 43–60.
M. A. Lee, N. Bakh, G. Bisker, E. N. Brown, and M. S. Strano, “A pharmacokinetic model of a tissue implantable cortisol sensor,” Advanced Healthcare Materials, vol. 5, no. 23, pp. 3004–3015, 2016.
H. Raff and T. Carroll, “Cushing’s syndrome: From physiological principles to diagnosis and clinical care,” The Journal of Physiology, vol. 593, no. 3, pp. 493–506, 2015.
M. N. Starkman and D. E. Schteingart, “Neuropsychiatric manifestations of patients with Cushing’s syndrome: relationship to cortisol and adrenocorticotropic hormone levels,” Archives of Internal Medicine, vol. 141, no. 2, pp. 215–219, 1981.
R. A. Feelders, S. Pulgar, A. Kempel, and A. Pereira, “The burden of Cushing’s disease: Clinical and health-related quality of life aspects,” European Journal of Endocrinology, vol. 167, no. 3, pp. 311–326, 2012.
A. Lacroix, R. A. Feelders, C. A. Stratakis, and L. K. Nieman, “Cushing’s syndrome,” The Lancet, vol. 386, no. 9996, pp. 913–927, 2015. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0140673614613751
Wickramasuriya, D.S., Faghih, R.T. (2024). Additional Models and Derivations. In: Bayesian Filter Design for Computational Medicine. Springer, Cham. https://doi.org/10.1007/978-3-031-47104-9_9