A parameter estimation method for multivariate binned Hawkes processes

Shlomovich, Leigh; Cohen, Edward A. K.; Adams, Niall

doi:10.1007/s11222-022-10121-2

A parameter estimation method for multivariate binned Hawkes processes

Open access
Published: 19 October 2022

Volume 32, article number 98, (2022)
Cite this article

Download PDF

You have full access to this open access article

Statistics and Computing Aims and scope Submit manuscript

A parameter estimation method for multivariate binned Hawkes processes

Download PDF

Leigh Shlomovich¹,
Edward A. K. Cohen¹ &
Niall Adams¹

2178 Accesses
1 Citation
1 Altmetric
Explore all metrics

Abstract

It is often assumed that events cannot occur simultaneously when modelling data with point processes. This raises a problem as real-world data often contains synchronous observations due to aggregation or rounding, resulting from limitations on recording capabilities and the expense of storing high volumes of precise data. In order to gain a better understanding of the relationships between processes, we consider modelling the aggregated event data using multivariate Hawkes processes, which offer a description of mutually-exciting behaviour and have found wide applications in areas including seismology and finance. Here we generalise existing methodology on parameter estimation of univariate aggregated Hawkes processes to the multivariate case using a Monte Carlo expectation–maximization (MC-EM) algorithm and through a simulation study illustrate that alternative approaches to this problem can be severely biased, with the multivariate MC-EM method outperforming them in terms of MSE in all considered cases.

Inference for ETAS models with non-Poissonian mainshock arrival times

Article Open access 13 December 2018

Spatio-Temporal Hawkes Point Processes: A Review

Article Open access 19 September 2024

Modelling Overdispersion with Integer-Valued Moving Average Processes

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Modern data acquisition systems in many applications collect vast amounts of data that can be characterised as time series. An area of particular interest is cyber-security, where network data are recorded and questions arise about the correlation structure and ‘excitational’ effects in and between the resulting time series. We aim to uncover trends within and between such time stamped event data, which we model using point processes. For this, we are particularly interested in the multivariate Hawkes process, which provides a model for ‘mutually exciting’ events. Introduced in Hawkes (1971a, 1971b) primarily to model the occurrence of seismic activity, multivariate Hawkes processes have found wide application to many disciplines due to their ability to model cross-excitation. In the case of financial data, multivariate Hawkes processes have been used to model the joint dynamics of trades and mid-price changes of the NYSE (Bowsher 2007) and an overview of literature surrounding the application of these processes to finance is given in Bacry et al. (2015). Additionally Hawkes processes have been considered within cyber-security for modelling of computer network traffic (Mark et al. 2019; Price-Williams and Heard 2019) and social media activity, for example in Kobayashi and Lambiotte (2016) where propagation of ‘Twitter cascades’ is considered. As discussed in Shlomovich et al. (2022) practical limitations on storing or recording high resolution data results in an abundance of binned data, that is streams of counts of occurrences per time bin. We can use the counting process representation of a P-variate point process to model this binned data. Let $N^{(p)}(t)$ be the pth continuous time count process ($p= 1, \ldots , P$) denoting the number of events until time $t \in {\mathbb {R}}$ where $N^{(p)}(0) = 0$, $N^{(p)}(t) = N^{(p)}((0,t])$ for $t > 0$ and $-N^{(p)}((t,0])$ for $t < 0$. We further denote

$$\begin{aligned} N^{(p)}_{t} = N^{(p)}(\varDelta (t+1)) - N^{(p)}(\varDelta t), \end{aligned}$$

to be the binned (aggregated) process. Then the discrete-time vector process $\varvec{N}_t$ is given by

$$\begin{aligned} \varvec{N}_t = \left[ N^{(1)}_{t}, \ldots , N^{(P)}_{t}\right] , \, t = 1, \ldots , K, \end{aligned}$$

where $K = T/\varDelta $ and $\varDelta $ is the bin width. Here we develop and extend the methodology detailed in Shlomovich et al. (2022) to handle parameter estimation of multivariate binned Hawkes processes. We do this by considering the superposition of the multivariate count process. This is an elegant method which manages to replicate both the marginal and the cross-correlation structure of the true process. Results from a simulation study, comparing the extended MC-EM algorithm to both the INAR(p) and binned log-likelihood approximations are presented in Sect. 3. This method is shown to produce estimates with lower MSE than currently available alternatives. We additionally derive the Hessian of a multivariate Hawkes process with exponential kernel, required for optimization, and present this in “Appendix B” along with the gradient derived in Ozaki (1979).

1.1 Multivariate Hawkes processes

Formally, the P-dimensional Hawkes process $\varvec{N}(t) = \left[ N^{(1)}(t), \ldots , N^{(P)}(t)\right] $ is a class of stochastic process such that independently for each $p \in \{1, \ldots P\}$,

$$\begin{aligned} {\text {Pr}}\{\mathrm{{d}}N^{(p)}(t)=1 \mid {\varvec{N}}(s) \, (s \le t)\}&= \lambda _{(p)}^{*}(t) \mathrm{{d}}t+o(\mathrm{{d}}t), \\ {\text {Pr}}\{\mathrm{{d}}N^{(p)}(t)>1 \mid {\varvec{N}}(s) \, (s \le t)\}&= o(\mathrm{{d}}t), \end{aligned}$$

where $\mathrm {d} N^{(p)}(t)=N^{(p)}(t+\mathrm{{d}}t)-N^{(p)}(t)$ (Hawkes 1971a). It is characterized via its conditional intensity function (CIF) $\lambda _{(p)}^{*}(t),$ defined as

$$\begin{aligned} \lambda _{(p)}^{*}(t)=\nu _p + \sum _{m=1}^{P} \int _{-\infty }^{t} g_{pm}(t-u) \mathrm{{d}}N^{(m)}(u), \end{aligned}$$

(1)

where $\varvec{\nu }>0$ is a P-dimensional vector called the background intensity and $\varvec{g}(u)$ is the non-negative excitation kernel such that $\varvec{g}(u) = 0$ for $u < 0$ and given by a $P \times P$ matrix of functions. In this way, the intensity at an arbitrary time-point is dependent on the history of the multivariate process allowing for both self and mutually exciting behavior. In simple cases, if $g_{ij}(u)=0$ for all u and $i \ne j$, then this is non-cross exciting behavior. That is the cross-covariances are equal to zero and whilst the processes may be self exciting, they are independent and thus not mutually exciting (Hawkes 1971a). Further, in the bivariate case if $g_{21}(u)=0$ for all u but $g_{12} (u) \ne 0$ for all u then $N^{(1)}(t)$ does not affect the likelihood of events in $N^{(2)}(t)$, but the converse does not hold. In this case we have one way interaction between the two processes involved.

The CIF from (1) can be written for an exponential kernel as

$$\begin{aligned} \lambda _{(p)}^{*}(t)=\nu _{p}+\sum _{m=1}^{P} \sum _{j=1}^{N^{m}(t)} \alpha _{ p m} \exp \left( -\beta _{p m}\left( t-t^{m}_{j}\right) \right) , \end{aligned}$$

where $\varvec{\nu }$, $\varvec{\alpha } = \left( \alpha _{pm} \right) $, $\varvec{\beta } = \left( \beta _{pm} \right) $, $p,m = 1, \ldots P$ are known as the baseline, excitation and decay parameters, respectively. Assuming stationarity of the multivariate Hawkes process we have that the vector of stationary densities is

$$\begin{aligned} \varvec{\lambda } = E \{ \varvec{ \lambda }^* \} \end{aligned}$$

so that

$$\begin{aligned} \varvec{\lambda }&=\left( \varvec{I}_{P}-\int _{0}^{\infty } \varvec{g}(u) \mathrm{{d}}u\right) ^{-1} \varvec{\nu } =\left( \varvec{I}_{P}-\varvec{G}(0)\right) ^{-1} \varvec{\nu }, \end{aligned}$$

where ${\varvec{G}}(\omega )$ is the Fourier transform of the excitation kernel $\varvec{g}(\cdot ),$ given by

$$\begin{aligned} \varvec{G}(\omega )=\int _{-\infty }^{\infty } e^{-i \omega \tau } \varvec{g}(\tau ) \mathrm {d} \tau . \end{aligned}$$

Note that $\varvec{G}(0)$ is commonly referred to as the branching ratio, typically denoted by $\varvec{\gamma }$, with the condition for stationarity being that the spectral radius $\rho (\varvec{\gamma })<1$ (Hawkes 1971b).

2 Multivariate MCEM for binned Hawkes processes

In a continuous time framework, maximum likelihood estimation (MLE) can be used to estimate the model parameters from a set of exact multivariate events on the interval [0, T] (Ozaki 1979) denoted

$$\begin{aligned} \varvec{{\mathcal {T}}} = \Big \{ {\mathcal {T}}_{(p)} \Big \}_{p=1, \ldots P} = \Big \{ t^p_1, \ldots t^p_{N^{(p)}_T} \Big \}_{p=1, \ldots P} \in [0, T], \end{aligned}$$

where T is the maximum observation or simulation time, $t^p_l$ is the lth event in process p, and $N^{(p)}(T)$ is the total number of events in process p. When we observe an aggregation of these latent continuous times to a count process of events per time bin, we lose information and in particular the likelihood of our observed event times given a parameter set can no longer be computed exactly due to reliance on the underlying time-stamps. In particular, the Hawkes process which is defined by its CIF, depends on the history of the process, which if ‘blurred’ by binning or rounding requires correct handling in order to obtain meaningful parameter estimates. In the univariate setting, a method for parameter estimation based on the Whittle likelihood is presented in Cheysson and Lang (2022) and a modified Monte Carlo Expectation Maximisation (MC-EM) approach is presented in Shlomovich et al. (2022), termed the Binned Hawkes Expectation Maximisation (BH-EM) algorithm; which is extended here for multivariate data.

2.1 The Monte Carlo EM algorithm

The EM algorithm (Dempster et al. 1977) augments observed data by a latent quantity (Wei and Tanner 1990) to iteratively compute the maximizer of a likelihood. Here, the observed data are the multivariate event counts per unit time $ \varvec{N}_t = [N^{(1)}_{t}, \ldots , N^{(P)}_{t}]. $ The latent data, $\varvec{{\mathcal {T}}}$ are the unobserved, true event times. These are marked time-stamps which are not observed due to practical restrictions. We denote the parameter set to be $\varvec{\varTheta }=\{\varvec{\nu }, \varvec{\alpha }, \varvec{\beta }\}$, where $\varvec{\nu }$ is $P \times 1$, and $\varvec{\alpha }$ and $\varvec{\beta }$ are $P \times P$. The following two steps are as detailed in Shlomovich et al. (2022):

1.
In the E (Expectation) step, we compute
$$\begin{aligned} Q_{i+1}(&\varvec{\varTheta }, \varvec{\varTheta }^i) = \nonumber \\&\int _{{\mathbb {T}}} \log (p (\varvec{\varTheta } \mid {\varvec{N}}, \varvec{{\mathcal {T}}})) p ( \varvec{{\mathcal {T}}} \mid {\varvec{N}}, \varvec{\varTheta }^i) \mathrm{{d}}\varvec{{\mathcal {T}}}, \end{aligned}$$
(2)
where ${\mathbb {T}}$ denotes the sample space for $\varvec{{\mathcal {T}}}$.
2.
In the M (Maximization) step, maximize the conditional expectation in (2) to obtain the updated parameter estimate, $\varvec{\varTheta }^{i+1}$.

Monte Carlo methods can be used to numerically compute (2) if it is intractable, forming an algorithm known as MCEM (Wei and Tanner 1990). As we cannot sample the time-stamps directly we use importance sampling to simulate proposals $\varvec{{\mathcal {T}}}^*$ for $\varvec{{\mathcal {T}}}$ from a feasible alternative distribution, denoted $q(\varvec{{\mathcal {T}}} \mid {\varvec{N}}, \varvec{\varTheta }^i)$.

Unique to the multivariate formulation is the need to retain the latent covariance structure between the P processes. To this end, we sample the times of the univariate superposition of the P-variate process. We then split the superposed simulated time-stamps to P processes to create proposals matching the multivariate counts. We refer to such viable proposals as consistent. This approach is detailed further in Sect. 2.2. Once sampled, each proposal is weighted according to the probability it came from the desired distribution. That is, given a set of M samples $\varvec{{\mathcal {T}}}^{*(1)},\ldots ,\varvec{{\mathcal {T}}}^{*(M)}$, we assign weights

$$\begin{aligned} w_k = \frac{p\big (\varvec{{\mathcal {T}}}^{*(k)} \mid {\varvec{N}}, \varvec{\varTheta }^i\big )}{q\big (\varvec{{\mathcal {T}}}^{*(k)} \mid {\varvec{N}}, \varvec{\varTheta }^i\big )}, \end{aligned}$$

(3)

and approximate (2) with

$$\begin{aligned} Q_{i+1}\big (\varvec{\varTheta }, \varvec{\varTheta }^{i}\big ) = \frac{\sum _{k=1}^M w_k \log \big (p \big (\varvec{\varTheta } \mid {\varvec{N}}, \varvec{{\mathcal {T}}}^{*(k)}\big )\big )}{ \sum _{k=1}^M w_k }. \end{aligned}$$

(4)

It is shown in Shlomovich et al. (2022) that if only proposing consistent event times

$$\begin{aligned} p \left( \varvec{{\mathcal {T}}}^{*(k)} \mid {\varvec{N}, \varvec{\varTheta }^i} \right) \propto p \left( \varvec{{\mathcal {T}}}^{*(k)} \mid \varvec{\varTheta }^i \right) , \end{aligned}$$

where $\log \left( p(\varvec{{\mathcal {T}}}^{*(k)}\mid \varvec{\varTheta }) \right) $ is given in Daley and Vere-Jones (2003) by

$$\begin{aligned} \log {\mathcal {L}}&(\varvec{\varTheta } ; \varvec{{\mathcal {T}}}) = \sum _{p=1}^{P} \log {\mathcal {L}}^p(\varvec{\varTheta } ; \varvec{{\mathcal {T}}}), \\&= \sum _{p=1}^P \left[ \sum _{j=1}^{N^{(p)}(T)} \log \lambda _{(p)}^{*}\left( t^{(p)}_{j}\right) -\int _{0}^{T} \lambda _{(p)}^{*}(u) \mathrm {d} u. \right] . \end{aligned}$$

2.2 Multivariate sampling via the superposition process

Generating consistent time-stamps for the latent multivariate process such that the true covariance structure is sufficiently captured is an important problem. We propose the following method for this, which allows us to extend the work in Shlomovich et al. (2022) for multivariate count data. Consider $\varvec{N}_t=[N_{1,t}, \ldots , N_{P,t}]$ being a binned P-variate Hawkes process with exponential kernel. In order to generate possible multivariate, continuous-time proposal, $\varvec{{\mathcal {T}}}^{*}$, of the underlying event times, $\varvec{{\mathcal {T}}}$, we first consider the superposed latent time-stamps,

$$\begin{aligned} \tilde{{\mathcal {T}}} = \mathrm{{sort}} \left\{ \cup _p {\mathcal {T}}_p \right\} , \end{aligned}$$

where ${\mathcal {T}}_p$ represents the set of latent times for the pth process, for $p = 1, \ldots , P$, and thus the superposed binned count process

$$\begin{aligned} {\tilde{N}}_t=N_{1,t} + \cdots + N_{P,t}. \end{aligned}$$

We map the multivariate parameter set $\varvec{\varTheta }=\{\varvec{\nu }, \varvec{\alpha }, \varvec{\beta }\}$ to a corresponding set ${\tilde{\varTheta }} = \{{\tilde{\nu }}, {\tilde{\alpha }}, {\tilde{\beta }}\} $ which approximates the univariate superposed process.

It is given that the intensity of the superposed count process ${\tilde{N}}$ can be decomposed as

$$\begin{aligned} {\tilde{\lambda }}^*(t) = \sum _{p=1}^{P} \lambda ^*_{(p)}(t), \end{aligned}$$

(5)

where ${\tilde{\lambda }}^*(t)$ is the CIF for the superposed process. Using (5) and $\lambda ^*_p(t)$ defined in Eq. (1) we have that

$$\begin{aligned} {\tilde{\lambda }}^*(t)&= \sum _{p=1}^{P} \nu _p \nonumber \\&\quad + \sum _{p=1}^{P} \sum _{m=1}^{P} \sum _{t_{m,j} < t} \alpha _{pm} \exp \left\{ -\beta _{pm}\left( t-t_{m,j}\right) \right\} . \end{aligned}$$

(6)

We approximate the CIF of the superposed process with that of a univariate Hawkes process with exponential kernel and parameters ${\tilde{\varTheta }} = \{{\tilde{\nu }}, {\tilde{\alpha }}, {\tilde{\beta }}\} $. Doing so enables the cross and self excitation effects to be sufficiently captured by the internal correlation structure of the approximated superposed process, which is significantly simpler to generate proposals for. Specifically,

$$\begin{aligned} {\tilde{\lambda }}^*(t)&\approx {\tilde{\lambda }}^*_{a}(t), \nonumber \\&= {\tilde{\nu }} + \sum _{t_j < t} {\tilde{\alpha }} \exp \left\{ -{\tilde{\beta }} \left( t-t_{j}\right) \right\} , \end{aligned}$$

(7)

where ${\tilde{\lambda }}^*_{a}(t)$ is the approximated CIF. By equating terms in Eqs. (6) and (7), it is clear that the baseline for the superposed process is simple to handle, and can be defined as ${\tilde{\nu }} = \sum _{p=1}^P \nu _p$.

The choice of ${\tilde{\alpha }}$ and ${\tilde{\beta }}$ requires more care as, from the preceding two Equations, we approximate the contribution to the CIF from excitational effects as

$$\begin{aligned} \sum _{p=1}^{P} \sum _{m=1}^{P} \sum _{t_{m,j}<t}&\alpha _{pm} \exp \left\{ -\beta _{pm}\left( t-t_{m,j}\right) \right\} \nonumber \\&\approx \sum _{t_j < t}^{{\tilde{N}}(t)} {\tilde{\alpha }} \exp \left\{ -{\tilde{\beta }} \left( t-t_{j}\right) \right\} . \end{aligned}$$

(8)

To aid the choice of appropriate ${\tilde{\alpha }}$ and ${\tilde{\beta }}$ we utilise the stationarity intensity of a univariate Hawkes process. We have that

$$\begin{aligned} {\tilde{\lambda }}_{a} = \frac{{\tilde{\nu }}}{1 - {\tilde{\gamma }}}, \end{aligned}$$

where ${\tilde{\lambda }}_{a}$ is the stationary intensity for ${\tilde{\lambda }}_a^*(t)$ and ${\tilde{\gamma }}$ is the branching ratio for ${\tilde{\lambda }}_a^*(t)$, equal to ${\tilde{\alpha }}/{\tilde{\beta }}$ due to the exponential form of the kernel. As the intensity of the superposed process can be decomposed, as in (5), the stationary intensities can similarly be decomposed. Therefore we have that

$$\begin{aligned} {\tilde{\lambda }}_{a} \approx {\tilde{\lambda }}&= \sum _{p=1}^P \lambda _p, \nonumber \\&= \frac{1}{K \varDelta } \sum _{p=1}^P E\{ N_p (T)\}, \end{aligned}$$

where $K \varDelta = T $. Noting that ${\tilde{\nu }} = \sum _{p=1}^P \nu _p$, we thus have

$$\begin{aligned} {\tilde{\gamma }} \approx 1 - \frac{K \varDelta \sum _{p=1}^P \nu _p}{ \sum _{p=1}^P E\{ N_p(T)\} }. \end{aligned}$$

(9)

The relationship in Eq. (9) aids the choice of ${\tilde{\alpha }}$ and ${\tilde{\beta }}$ such that we retain a consistent stationary intensity under the approximation of the CIF with the exponential kernel given in Eq. (7). Further, the form in Eq. (9) is simple to compute as $E\{N_p (T)\}$ is well approximated by the mean number of events in each process.

We find that a suitable choice for ${\tilde{\beta }}$ is

$$\begin{aligned} {\tilde{\beta }}&=\text {weighted} \, \text {mean}\left\{ \beta _{pm}\right\} _{p,m = 1, \ldots , P}, \nonumber \\&= \frac{\sum _{p=1}^P \sum _{m=1}^P \omega _{m} \beta _{pm} }{\sum _{p=1}^P \sum _{m=1}^P \omega _{m}}, \end{aligned}$$

(10)

where $\omega _{m}$ is the proportion of total events which are observed in process m. This form is reached by considering Eq. (8). The decay rate across events can approximately be given by the mean of the multivariate decays, weighted according to the proportion of events in each process. The benefit of weighting the mean can be seen by reasoning about the case when the processes have significantly different numbers of events. Combining (10) with the relationship for ${\tilde{\gamma }}$ in Eq. (9) we have

$$\begin{aligned} {\tilde{\alpha }}= \text {weighted} \, \text {mean}\left\{ \beta _{p m}\right\} \left( 1 - \frac{K \varDelta \sum _{p=1}^P \nu _p}{ \sum _{p=1}^P E\{ N_p (T) \} } \right) . \end{aligned}$$

(11)

It is important to note that as this reparameterisation does not rely on any latent continuous time-points, but rather just the observed counts and current multivariate parameter estimates, it is efficient to implement in practice.

Empiricial studies show this reparameterization accurately recovers the true CIF of the superposed process. In this way we only simulate a univariate process, albeit a superposed version, and denote the simulated times as $\tilde{{\mathcal {T}}}^{*}$. In order to generate a realisation of the multivariate Hawkes process, $\varvec{{\mathcal {T}}}^{*}$, with cross-covariances, we can then uniformly sample the observed number of events in each bin for each process from the $\tilde{{\mathcal {T}}}^{*}$. In other words, we simulate a consistent set of continuous times for the superposed process matching the observed counts for ${\tilde{N}}(t)$, and then uniformly assign points within each bin to each of the P processes. To reduce variance of the estimates, the allocation of events to the P processes can be conducted ${\tilde{m}}$ times to generate multiple possible multivariate versions of the proposed realisation of ${\tilde{N}}$. If ${\tilde{m}}>1$, the sample which maximises the log-likelihood is selected, otherwise the single multivariate proposal is taken and the MLE is used to estimate the parameters of the multivariate process from the continuous-time proposed realisation. We present results for ${\tilde{m}}=10$ in Sect. 3.

This method provides us with an efficient way of artificially injecting cross-correlation into the consistent proposals whilst also retaining the marginal properties. In order to speed up the maximisation of the likelihood, we require the gradient and Hessian, given in “Appendix B”. The full algorithm for this multivariate approach is derived and presented in “Appendix C”.

2.3 Sampling method

The question remains of how to best sample the latent times. The sequential simulation method detailed in Shlomovich et al. (2022) is applicable in the multivariate extension due to the reparameterization step meaning we need only sample times for the univariate superposition process. In this case, the normalised weight of the kth Monte Carlo sample ($k=1, \ldots , M$) is given by

$$\begin{aligned} w_{k}&= \frac{p\left( \varvec{{\mathcal {T}}}^{*(k)} \mid \varvec{N}, \varvec{\varTheta }^{i}\right) }{q\left( \varvec{{\mathcal {T}}}^{*(k)} \mid \varvec{N}, \varvec{\varTheta }^{i} \right) } \nonumber \\&\equiv \exp \left\{ \log \left[ p\left( \varvec{{\mathcal {T}}}^{*(k)} \mid \varvec{N}, \varvec{\varTheta }^{i}\right) \right] \right. \nonumber \\&\quad \quad \quad \left. - \log \left[ q\left( \varvec{{\mathcal {T}}}^{*(k)} \mid \varvec{N}, \varvec{\varTheta }^{i} \right) \right] \right\} \end{aligned}$$

(12)

where M is the number of Monte Carlo samples. Further we note that

$$\begin{aligned} \log \left( q\left( \varvec{{\mathcal {T}}}^{*(k)} \mid \varvec{N}, \varvec{\varTheta }^{i}\right) \right)&= \log \left( q\left( \tilde{{\mathcal {T}}}^{*(k)} \mid {\tilde{N}}, {\tilde{\varTheta }}^{i}\right) \right) \\&\quad + \log \left( \mathrm{{Pr}} \left( \varvec{{\mathcal {T}}}^{*(k)} \mid \tilde{{\mathcal {T}}}^{*(k)} , {\varvec{N}} \right) \right) , \end{aligned}$$

where $ \log \left( q\left( \tilde{{\mathcal {T}}}^{*(k)} \mid {\tilde{N}}, {\tilde{\varTheta }}^{i}\right) \right) $ is the log-likelihood of the sequentially sampled superposed times given the superposed counts and reparameterized univariate estimates, and $ \mathrm{{Pr}} \left( \varvec{{\mathcal {T}}}^{*(k)} \mid \tilde{{\mathcal {T}}}^{*(k)} , {\varvec{N}} \right) $ denotes the probability of the random division of superposed time-points into a P-variate count process matching the observed counts.

Large differences between $p\left( \varvec{{\mathcal {T}}}^{*(k)} \mid \varvec{N}\right) $ and $q(\varvec{{\mathcal {T}}}^{*(k)} \mid \varvec{N})$ can result in the weights being close to zero. Therefore we rescale the exponent term to avoid arithmetic underflow when computing the weights. We have

(13)

Thus we can use Eq. (13) in place of Eq. (12).

3 Simulation study

We conduct a simulation study to compare the performance of the multivariate MC-EM algorithm to the INAR(p) method introduced in Kirchner (2016). We also compare the results to an approximation which ignores inter-bin excitation, referred to as the binned log-likelihood method. This approach represents the CIF as a piecewise constant function within each bin, equivalently assuming that $N_j^{(p)} \sim \mathrm{{Poisson}} \{ \varDelta \lambda ^*_{m} \left( [j-1] \varDelta \right) \}$, where $N_j^{(p)}$ are the counts in the jth bin of the pth process (Mark et al. 2019).

Given parameters $\varvec{\nu },$ a $2 \times 1$ matrix, and $\varvec{\alpha }$ and $\varvec{\beta }$ both being $2 \times 2$ matrices, along with some maximum simulation time T, we can simulate realizations of a Hawkes process. The generated events $\varvec{{\mathcal {T}}}$ represent the underlying process and aggregating these to a chosen binning $\varDelta $ allows us to simulate the count data $\{\varvec{N}_t, \, t=1, \ldots , K \}$. We then apply each of the multivariate MC-EM methods, INAR(p) and binned log-likelihood approximation.

Boxplots for each of the ten estimated parameters used for characterising the bivariate Hawkes process are given in Fig. 1. The parameters used for simulation are

$$\begin{aligned} \varvec{\nu }= \left[ {\begin{array}{c} 0.3 \\ 0.3 \\ \end{array} } \right] , \; \varvec{\alpha }= \left[ {\begin{array}{cc} 0.7 &{} 0.9 \\ 0.6 &{} 1.0 \\ \end{array} } \right] , \; \varvec{\beta }= \left[ {\begin{array}{cc} 1.5 &{} 2.0 \\ 2.0 &{} 3.5 \\ \end{array} } \right] , \end{aligned}$$

with $ \varDelta = 1$ and $T = 2000$. The parameters have been chosen as a stationary case with ample cross-excitation and non-symmetric self-excitation. The mean parameter estimates from repeated simulations is presented on the vertical axis. The INAR(p) approximation method can yield highly variable results, which is to be expected as the method is primarily intended for selecting a parametric kernel from continuous time data. The usual implementation of the INAR(p) involves selecting $\varDelta $ such that there is approximately one event per bin, however for our application this is not possible and so the choice of $\varDelta $ here is chosen to better reflect real data. A log-scale has been used in all four graphs relating to each of $\varvec{\alpha }$ and $\varvec{\beta }$. Both the binned log-likelihood, and particularly the INAR(p) method produced large outliers, resulting in very large MSE relative to the MC-EM approach. Tables 1 and 2 present summary statistics for the outlier-trimmed data. Specifically we remove the top and bottom 5% of parameter estimates for each of the three methods and report the relative bias, mean and standard deviation for each of the parameters and methods. We see that even after removing outliers and negative values from the INAR(p) parameter estimates, the variability is much higher than that of the MC-EM. The absolute value of the bias of both the INAR(p) and binned log-likelihood approaches is also larger than that of the MC-EM, in some cases significantly so. Overall, the proposed MCEM approach has much improved estimation performance than the other methods. Table 3 additionally presents the ratio between the RMSE using $T=2000$ and $T=1000$ for each of the estimation approaches, using 300 Hawkes process realizations for each method. By fixing $\varDelta $, this allows the effect of doubling the number of bins K to be explored. As with the MLE, for the proposed MCEM method, it is expected that the RMSE reduces by approximately a factor of $1/\sqrt{2}$ when doubling K. We find that the mean RMSE ratio across the 10 parameters is 0.762 for the MCEM method, and 0.725 for the MLE acting on the continuous time points. However both the INAR(p) and binned log-likelihood methods have a significantly higher mean ratio, of 0.875 and 0.990 respectively.

Table 1 Relative bias for the data trimmed, removing the top and bottom 5% of values for each of the three considered methods

Full size table

Table 2 Mean and standard deviation values for the data trimmed to remove the top and bottom 5% of values each, to handle outliers

Full size table

Table 3 Ratio of RMSE for K = 2000 to K = 1000 for each of the methods considered, and the MLE

Full size table

Evaluation of the methods is additionally explored via goodness of fit. This is an important aspect which allows us to check the validity of the estimates given the data. Often, the random change theorem, given in Daley and Vere-Jones (2003), is used for considering goodness of fit by transforming time-points using the compensator function

$$\begin{aligned} \varLambda (t^p_k) = \int _{0}^{t^p_k} \lambda _{(p)}^{*}(u) \mathrm{{d}}u, \end{aligned}$$

where ${t^p_k}$ is the $k\mathrm{{th}}$ event in process p. In practice, $\lambda _{(p)}^{*}(u)$ is estimated using parameter estimates $\hat{\varvec{\varTheta }}$, and goodness of fit is conducted by considering the distribution of the transformed times, defined as ${\mathcal {T}}^{\dagger } = \{t^{\dagger }_1, t^{\dagger }_2, \ldots \} = \{\varLambda (t_1), \varLambda (t_2), \ldots \}$. By the random time change theorem, ${\mathcal {T}}^{\dagger }$ is a realization of a unit rate Poisson process if and only if ${\mathcal {T}}$ is a realization from the point process defined by $\varLambda (\cdot )$.

Figure 2 shows QQ-plots relating to count processes from the simulation study. The continuous times from a single realization of a Hawkes process generated in the simulation study are transformed using each of the parameter estimates from the three approaches considered. The QQ-plots then compare the distribution of the interarrival times between each of the transformed times and a unit rate exponential distribution. We see that the estimates produced by the MC-EM algorithm are indeed a viable parameter set for this realization under an exponential Hawkes model, with the transformed time-points being distributed very close to the theoretical distribution. We also note that as the binned log-likelihood method ignores the effect of inter-bin excitation, it is expected that as the average number of counts in a bin increases, the fit from using this method will worsen. Whilst the QQ-plots given in Fig. 2 demonstrate the viability of the estimated parameters for a specific realization, in order to illustrate goodness of fit across all realizations in the simulation study, Fig. 3 also presents the distribution of Kolmogorov–Smirnov (KS) test statistics. Here, the KS test is used to test the equality of the distribution of the transformed interarrival times to an Exp(1) distribution for each of the three methods considered. This is done for all realizations used in the simulation study, and the distribution of the resulting KS test statistics is presented in Fig. 3. The MLE is also used to produce parameter estimates from the true, continuous time stamps, with the resulting KS test statistics also given in Fig. 3 for easier interpretation. It is clear that the MC-EM approach is far more comparable to the MLE results than the binned log-likelihood or INAR(p) methods.

Note, variance estimation has not been discussed here, but parametric bootstrapping would allow for approximations of this to be obtained (Efron 1987).

4 Case study

Network flow data, referred to as NetFlow, assembles records exported by routers and describes communications between devices connected to an enterprise network. Monitoring and analysing NetFlow data has been successful at detecting a range of malicious network behavior (Turcotte et al. 2018). Here we detect both self-exciting effects in the communication between a pair of network devices, termed here as an ‘edge’, and mutually-exciting activity between such edges in the Los Alamos National Lab (LANL) enterprise network. By modelling the activity in this way, insight into the correlation structure of communications in the network can be obtained and monitored. We consider the methods outlined in Sect. 3 for parameter estimation of aggregated Hawkes processes as the NetFlow data is recorded at a 1 s resolution resulting in multiple events occurring simultaneously.

Figure 4 presents a selected pair of edges in the LANL network with possibe mutually-exciting Hawkes behavior in the communications over a duration of 45 min, with a total of 171 events. We refer to the counts in the top plot as process 1 and the counts in the bottom plot as process 2. The window selected is chosen as a period of more frequent events surrounded by no activity.

Parameter estimates are found using the MC-EM, INAR(p) and binned log-likelihood methods and performance is considered using goodness of fit. This allows us to assess whether the estimates represent viable parameters for modelling the observed data as a mutually-exciting Hawkes process. To do this, we use the time rescaling theorem discussed in Sect. 3. Figure 5 shows QQ-plots for the observed times, uniformly redistributed within their respective bins so as to obtain continuous times for the purpose of assessing goodness of fit (Gerhard and Gerstner 2010). From Fig. 5, the parameter estimates obtained via the MC-EM algorithm are likely viable parameters. The MC-EM parameter estimates are

$$\begin{aligned} \varvec{\nu }= \left[ {\begin{array}{c} 0.01 \\ 0.01 \\ \end{array} } \right] , \; \varvec{\alpha }= \left[ {\begin{array}{cc} 0.49 &{} 0.00 \\ 0.22 &{} 0.28 \\ \end{array} } \right] , \; \varvec{\beta }= \left[ {\begin{array}{cc} 1.46 &{} 0.53 \\ 0.80 &{} 1.01 \\ \end{array} } \right] . \end{aligned}$$

The branching ratio, defined in Sect. 1.1 is therefore

$$\begin{aligned} \varvec{\gamma }= \varvec{\alpha } \oslash \varvec{\beta } = \left[ {\begin{array}{cc} 0.34 &{} 0.00 \\ 0.27 &{} 0.28 \\ \end{array} } \right] , \end{aligned}$$

where $\oslash $ denotes element-wise division and $\gamma _{ij}$ is the average number of events in process j directly triggered by each event in process i.

These results indicate that there is mutually-exciting behavior between these processes in one direction such that process 2 does not affect process 1 but process 1 does affect process 2, where process 1 is presented in blue in Fig. 4 and process 2 in red. There parameter estimates suggest that for each event in process 1, an average of 0.27 events will be triggered in process 2. The baseline parameters given by $\varvec{\nu }$ indicate the rate of the events is approximately one event every 100 s. Both processes are modelled with self-excitation, with each event in processes 1 and 2 triggering 0.34 and 0.28 events in their respective processes. Figure 6 shows the estimated excitation kernel components. From this we can see nature of the exciting effects, where $g_{ij}(x)$ illustrates the excitational effect which process j has on process i.

5 Conclusion

We have presented a novel method for generalising an MCEM algorithm to handle multivariate aggregated data. By reparameterizing our multivariate model in terms of the superposed process we can inject the necessary cross-covariance structure required for generating valid proposals. We also present closed form expressions for the gradient and Hessian of the log likelihood for increased computational efficiency. We further conducted a simulation study to compare this approach to the INAR(p) approximation detailed in Kirchner (2016) and a multivariate extension of the binned log likelihood method from Mark et al. (2019) and Shlomovich et al. (2022). As in the univariate case, the MCEM method out-performed both alternatives in the presented parameter set and moreover the bin width, $\varDelta $ can vary provided the interval bounds are known. The multivariate extension can be applied for other Hawkes kernels.

References

Bacry, E., Mastromatteo, I., Muzy, J.F.: Hawkes processes in finance. Mark Microstruct Liq 1(1), 1550005 (2015)
Article Google Scholar
Bowsher, C.G.: Modelling security market events in continuous time: Intensity based, multivariate point process models. J. Econom. 141(2), 876–912 (2007)
Article MathSciNet MATH Google Scholar
Cheysson, F., Lang, G.: Spectral estimation of Hawkes processes from count data. arXiv:2003.04314 [math, stat] (2022)
Daley, D.J., Vere-Jones, D.: An Introduction to the Theory of Point Processes, 2nd edn. Springer, New York (2003)
MATH Google Scholar
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 39, 1–38 (1977)
MathSciNet MATH Google Scholar
Efron, B.: Better bootstrap confidence intervals. J. Am. Stat. Assoc. 82(397), 171–185 (1987)
Article MathSciNet MATH Google Scholar
Gerhard, F., Gerstner, W.: Rescaling, thinning or complementing? On goodness-of-fit procedures for point process models and generalized linear models. In: Advances in Neural Information Processing Systems (NIPS). Curran Associates, Inc. (2010)
Hawkes, A.G.: 1Spectra of some self-exciting and mutually exciting point processes. Biometrika 58(1), 83–90 (1971a)
Hawkes, A.G.: 2Point spectra of some mutually exciting point processes. J. R. Stat. Soc. Ser. B (Methodol.) 33(3), 438–443 (1971b)
Kirchner, M.: Hawkes and INAR($\infty $) processes. Stoch. Process. Appl. 126(8), 2494–2525 (2016)
Article MathSciNet MATH Google Scholar
Kobayashi, R., Lambiotte, R.: TiDeH: time-dependent Hawkes process for predicting retweet dynamics. In: Tenth International AAAI Conference on Web and Social Media (2016)
Mark, B., Raskutti, G., Willett, R.: Network estimation from point process data. IEEE Trans. Inf. Theory 65(5), 2953–2975 (2019)
Article MathSciNet MATH Google Scholar
Ozaki, T.: Maximum likelihood estimation of Hawkes’ self-exciting point processes. Ann. Inst. Stat. Math. 31(1), 145–155 (1979)
Article MathSciNet MATH Google Scholar
Price-Williams, M., Heard, N.A.: Nonparametric self-exciting models for computer network traffic. Stat. Comput. 30, 1–12 (2019)
MathSciNet MATH Google Scholar
Shlomovich, L., Cohen, E.A.K., Adams, N., Patel, L.: Parameter estimation of binned hawkes processes. J. Comput. Graph. Stat. pp 1–27 (2022)
Turcotte, M.J.M., Kent, A.D., Hash, C.: Unified host and network data set. In: Data Science for Cyber-Security, Security Science and Technology, pp. 1–22. World Scientific, Singapore (2018)
Google Scholar
Wei, G.C.G., Tanner, M.A.: A Monte Carlo implementation of the EM algorithm and the Poor Man’s Data Augmentation algorithms. J. Am. Stat. Assoc. 85(411), 699–704 (1990)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Department of Mathematics, Imperial College London, South Kensington Campus, London, SW7 2AZ, UK
Leigh Shlomovich, Edward A. K. Cohen & Niall Adams

Authors

Leigh Shlomovich
View author publications
You can also search for this author in PubMed Google Scholar
Edward A. K. Cohen
View author publications
You can also search for this author in PubMed Google Scholar
Niall Adams
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Leigh Shlomovich.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Leigh Shlomovich is funded by an Industrial Strategy Engineering and Physical Sciences Research Council Scholarship.

Appendices

Appendix A: Algorithmic details

In the simulation study presented here, the number of Monte Carlo samples $M=10$, and the number of splits ${\tilde{m}} = 10$. The L2 norm of the difference between successive parameter estimates is used for determining convergence. For the case study presented, the MC-EM algorithm takes 8.64 s, INAR(p) takes 0.11 s and the binned log likelihood takes 0.14 s. The simulation study presented has additionally been timed on a 2020 MacBookPro 32GB RAM, 2.3 GHz Intel Core i7. The MC-EM algorithm takes an average of $3.76*10^3$ s per realization, INAR(p) takes 0.527 s per realization, where this includes the optimal selection of p and exponential fit. The binned log-likelihood 0.46 s per realization.

Appendix B: Gradient and Hessian of multivariate continuous time Hawkes processes

The likelihood function of the multivariate Hawkes process is given by

$$\begin{aligned} \log {\mathcal {L}}\left( \varvec{\varTheta } ; \varvec{{\mathcal {T}}}\right) =\sum _{m=1}^{P} \log {\mathcal {L}}^{m}\left( \varvec{\varTheta } ; \varvec{{\mathcal {T}}}\right) \end{aligned}$$

where

$$\begin{aligned} \begin{aligned} \log {\mathcal {L}}&^{m}\left( \varvec{\varTheta } ; \varvec{{\mathcal {T}}}\right) = -\nu _{m} T \\&-\sum _{n=1}^{P} \frac{\alpha _{m n}}{\beta _{m n}} \sum _{k: t_{k}^{n}<T}\left[ 1-\exp \left( -\beta _{m n}\left( T-t_{k}^{n}\right) \right) \right] \\&+\sum _{k: t^m_{k}<T} \log \left( \nu _{m}+\sum _{n=1}^{P} \alpha _{m n} R_{m n}(k)\right) , \end{aligned} \end{aligned}$$

where $R_{m n}(k)$ is defined as

$$\begin{aligned} R_{m n}(k)=\sum _{i: t_{i}^{n}<t_{k}^{m}} \exp \left( -\beta _{m n}\left( t_{k}^{m}-t_{i}^{n}\right) \right) , \quad k \ge 2, \end{aligned}$$

and $R_{m n}(1) = 0$. This can recursively be defined as

$$\begin{aligned} R_{m n}(k)&=\exp ^{-\beta _{m n}\left( t_{k}^{m}-t_{k-1}^{m}\right) } R_{m n}(k-1) \\&\quad +\sum _{i: t_{k-1}^{m}<t_{i}^{n}<t_{k}^{m}} \exp \left( -\beta _{m n}\left( t_{k}^{m}-t_{i}^{n}\right) \right) . \end{aligned}$$

Therefore, the gradient can be expressed by the following.

$$\begin{aligned} \frac{\partial L^{m}}{\partial \nu _{m}}=-T+\sum _{k: t_{k}^{m}<T} \frac{1}{\nu _{m}+\sum _{j=1}^{P} \alpha _{m j} R_{m j}(k)}, \end{aligned}$$

$$\begin{aligned} \frac{\partial L^{m}}{\partial \alpha _{m n}}&=-\frac{1}{\beta _{m n}} \sum _{k: t_{k}^{n}<T}\left[ 1-\exp \left( -\beta _{m n}\left( T-t_{k}^{n}\right) \right) \right] \\&\quad +\sum _{k: t_{k}^{m}<T} \frac{R_{m n}(k)}{\nu _{m}+\sum _{j=1}^{P} \alpha _{m j} R_{m j}(k)}, \\ \frac{\partial L^{m}}{\partial \beta _{m n}}=&\frac{\alpha _{mn}}{\beta _{m n}^{2}} \sum _{k: t_{k}^{n}<T}\left[ 1-\exp \left( -\beta _{m n}\left( T-t_{k}^{n}\right) \right) \right] \\&\quad -\frac{\alpha _{m n}}{\beta _{m n}} \sum _{k: t_{k}^{n}<T}\left[ \left( T-t_{k}^{n}\right) \exp \left( -\beta _{m n}\left( T-t_{k}^{n}\right) \right) \right] \\&\quad -\sum _{k: t_{k}^{m}<T} \frac{\alpha _{m n} R_{m n}^{\prime }(k)}{\nu _{m}+\sum _{j=1}^{P} \alpha _{m j} R_{m j}(k)}, \end{aligned}$$

where

$$\begin{aligned} R_{m n}^{\prime }(k)=\sum _{i: t^n_{i}<t^{m}_{k}}\left( t_{k}^{m}-t_{i}^{n}\right) \exp \left( -\beta _{m n}\left( t_{k}^{m}-t_{i}^{n}\right) \right) , \end{aligned}$$

$k \ge 2$ and $R_{m n}^{\prime }(1) = 0$.

We can also compute the Hessian for the continuous multivariate Hawkes likelihood. There are 15 categories to consider, given here.

$$\begin{aligned}&\frac{\partial ^2 {\mathcal {L}}^m}{\partial \nu _m^2} = - \sum _{k: t_k^m< T} \frac{1}{\left( \nu _{m}+\sum _{j=1}^{P} \alpha _{m j} R_{m j}(k)\right) ^{2}}, \\&\frac{\partial ^2 {\mathcal {L}}^m}{\partial \nu _m \partial \nu _n} = 0, \quad m \ne n, \\&\frac{\partial ^2 {\mathcal {L}}^m}{\partial \alpha _{mn}^2} =-\sum _{k: t_{k}^{m}<T}\left[ \frac{R_{mn}(k)}{\nu _{m}+\sum _{j=1}^{P} \alpha _{m j} R_{m j}(k)}\right] ^{2}, \\&\frac{\partial ^2 {\mathcal {L}}^m}{\partial \alpha _{mn} \partial \alpha _{mn'}} = -\sum _{k: t_{k}^{m}<T} \frac{R_{m n}(k) R_{m n'}(k)}{\left( \nu _m+\sum _{j=1}^{P} \alpha _{m j} R_{mj}(k)\right) ^{2}},\\&n' \ne n, \\&\frac{\partial ^2 {\mathcal {L}}^m}{\partial \alpha _{mn} \partial \alpha _{m'n'}} = 0, \quad m' \ne m, \quad n,n' \in \{1, \ldots , P\}, \\&\frac{\partial ^2 {\mathcal {L}}^m}{\partial \alpha _{mn} \partial \nu _{m}} = -\sum _{k: t_{k}^{m}<T} \frac{R_{mn}(k)}{\left( \nu _{m}+\sum _{j=1}^{P} \alpha _{m j} R_{m j}(k) \right) ^2}, \\&\frac{\partial ^2 {\mathcal {L}}^m}{\partial \alpha _{mn} \partial \nu _{m'}} = 0, \quad m' \ne m, \end{aligned}$$

$$\begin{aligned}&\frac{\partial ^2 {\mathcal {L}}^m}{\partial \beta _{mn} \partial \nu _{m}} = \sum _{k: t_{k}^{m}<T} \frac{\alpha _{m n} R_{m n}^{\prime }(k)}{\left( \nu _{m}+\sum _{j=1}^{P} \alpha _{mj} R_{m j}(k)\right) ^{2}}, \\&\frac{\partial ^2 {\mathcal {L}}^m}{\partial \beta _{mn} \partial \nu _{m'}} = 0, \quad m' \ne m, \\&\frac{\partial ^2 {\mathcal {L}}^m}{\partial \beta _{mn} \partial \alpha _{mn}} \\&= -\frac{1}{\beta _{m n}} \sum _{k:t_{k}^{n}<T}\left[ \left( T-t_{k}^{n}\right) \exp \left( -\beta _{m n} \left( T-t_k^n\right) \right) \right] \\&\quad +\frac{1}{\beta _{m n}^{2}} \sum _{k: t_{k}^{n}<T}^{T}\left[ 1-\exp \left( -\beta _{m n}\left( T-t_{k}^{n}\right) \right) \right] \\&\quad -\sum _{k: t^m_{k}<T} \frac{R_{m n}^{\prime }(k)}{\nu _{m}+\sum _{j=1}^{P} \alpha _{m j} R_{m j}(k)} \\&\quad + \sum _{k: t^m_{k}<T} \frac{\alpha _{m n} R_{mn}^{\prime }(k) R_{m n}(k)}{\left( \nu _{m}+\sum _{j=1}^{P} \alpha _{m j} R_{m j}(k)\right) ^{2}}, \\&\frac{\partial ^2 {\mathcal {L}}^m}{\partial \beta _{mn} \partial \alpha _{mn'}} = \sum _{k: t_{k}^{m}<T} \frac{\alpha _{m n} R_{m n}^{\prime }(k) R_{m n'}(k)}{\left( \nu _{m}+\sum _{j=1}^{P} \alpha _{m j} R_{m j}(k)\right) ^{2}} \\&\frac{\partial ^2 {\mathcal {L}}^m}{\partial \beta _{mn} \partial \alpha _{m'n'}} = 0, \quad m' \ne m, \quad n,n' \in \{1, \ldots , P\}, \\&\frac{\partial ^2 {\mathcal {L}}^m}{\partial \beta ^2_{mn}} = -\frac{2 \alpha _{mn}}{\beta _{mn}^3} \sum _{k: t_{k}^{n}<T} \left[ 1-\exp \left( -\beta _{mn}\left( T-t_{k}^{n}\right) \right) \right] \\&\quad +\frac{2 \alpha _{mn}}{\beta _{mn}^{2}} \sum _{k: t_k^n<T} \left[ \left( T-t_{k}^{n}\right) \exp \left( -\beta _{mn}\left( T-t_{k}^{n}\right) \right) \right] \\&\quad +\frac{\alpha _{m n}}{\beta _{m n}} \sum _{k: t_{k}^n<T}\left[ \left( T-t_{k}^{n}\right) ^{2} \exp \left( -\beta _{mn}\left( T-t_{k}^{n}\right) \right) \right] \\&\quad +\sum _{k: t^m_{k}<T}\Biggl [\frac{\alpha _{m n} R_{mn}^{\prime \prime }(k)}{\nu _{m}+\sum _{j=1}^{P} \alpha _{m j} R_{m j}(k)}\\&\quad -\left( \frac{\alpha _{m n} R_{m n}^{\prime }(k)}{\nu _{m}+\sum _{j=1}^{P} \alpha _{m j} R_{m j}(k)}\right) ^{2}\Biggr ], \end{aligned}$$

where

$$\begin{aligned} R_{m n}^{\prime \prime }(k)=\sum _{i: t_{i}^{n}<t_{k}^{m}}\left( t_{k}^{m}-t_{i}^{n}\right) ^{2} \exp \left( -\beta _{m n}\left( t_{k}^{m}-t_{i}^{n}\right) \right) , \end{aligned}$$

$k \ge 2$ and $R_{mn}^{\prime \prime }(1) = 0.$

$$\begin{aligned} \frac{\partial ^2 {\mathcal {L}}^m}{\partial \beta _{mn} \partial \beta _{mn'}}&= -\sum _{k: t_{k}^{m}<T} \frac{\alpha _{m n} R_{m n}^{\prime }(k) \alpha _{m n'} R_{m n'}^{\prime }(k)}{\left( \nu _{m}+\sum _{j=1}^{P} \alpha _{mj}R_{mj}(k) \right) ^2},\\&n' \ne n, \\&\frac{\partial ^2 {\mathcal {L}}^m}{\partial \beta _{mn} \partial \beta _{m'n'}} {=} 0, \quad m' {\ne } m, \quad n,n' \in \{1, \ldots , P\}. \end{aligned}$$

This covers all cases required for the full Hessian matrix.

Appendix C: Multivariate MC-EM algorithm

Here we provide an algorithm for the parameter estimation of multivariate aggregated Hawkes processes via the MC-EM procedure. Code is available in MATLAB at https://github.com/lshlomovich/MCEM_Multivariate_Hawkes.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Shlomovich, L., Cohen, E.A.K. & Adams, N. A parameter estimation method for multivariate binned Hawkes processes. Stat Comput 32, 98 (2022). https://doi.org/10.1007/s11222-022-10121-2

Download citation

Received: 02 September 2021
Accepted: 13 June 2022
Published: 19 October 2022
DOI: https://doi.org/10.1007/s11222-022-10121-2

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

A parameter estimation method for multivariate binned Hawkes processes

Abstract

Similar content being viewed by others

Inference for ETAS models with non-Poissonian mainshock arrival times

Spatio-Temporal Hawkes Point Processes: A Review

Modelling Overdispersion with Integer-Valued Moving Average Processes