Hawkes process
The Hawkes process is a self-exciting temporal point process in probability theory and statistics, designed to model sequences of events where the occurrence of one event temporarily increases the probability of subsequent events, capturing phenomena like clustering or contagion.[1] Introduced by Alan G. Hawkes in 1971, it is defined by a conditional intensity function \lambda(t) that combines a constant background rate \mu with the influence of past events \{T_i < t\} through an excitation kernel \phi, typically expressed as \lambda(t) = \mu + \sum_{T_i < t} \phi(t - T_i), where \phi is often an exponential decay, \phi(u) = \alpha e^{-\beta u}, representing a fading influence.[1][2] This formulation allows the process to exhibit both immigration (background events) and branching (self-induced events), with the branching-ratio condition \int_0^\infty \phi(u) \, du < 1 ensuring stationarity and preventing infinite cascades.[3]

Originally developed to analyze point processes with spectral properties, the Hawkes process gained widespread adoption in the 1980s through applications in seismology, particularly via the Epidemic-Type Aftershock Sequence (ETAS) model for earthquake aftershocks, where past quakes trigger future ones.[2] Over the decades, it has been extended to multivariate settings to handle multiple interacting event types, enabling the modeling of mutual excitations across dimensions such as social networks or financial markets.[3] Inference for Hawkes processes typically relies on maximum likelihood estimation for parametric forms or nonparametric kernel methods, with recent advances incorporating deep learning for flexible neural variants and reinforcement learning for control and optimization in dynamic systems.[3]

Key applications span diverse fields, including finance, where it models high-frequency trading and limit order book dynamics by capturing transaction clustering; neuroscience, for spike train analysis in neural activity; social media, to 
describe retweet cascades and information diffusion; and epidemiology, such as modeling malaria outbreaks or disease spread with self-reinforcing patterns.[3][4] These uses highlight the process's versatility in capturing temporal dependencies and triggering relationships in event data, though challenges remain in handling high-dimensional data and ensuring computational efficiency for real-time inference.[2]

Background
Point processes
A point process is a stochastic model that describes the occurrence of random events in continuous time, typically represented by a sequence of event times \{T_i\}_{i=1}^\infty where 0 < T_1 < T_2 < \cdots and T_i \to \infty as i \to \infty.[5] These models are fundamental in fields such as statistics, probability, and stochastic processes for analyzing phenomena like arrivals in queues, earthquakes, or neuronal spikes, where events happen irregularly over time.[5]

Point processes can be distinguished by their structure and assumptions about event occurrences. Simple point processes, exemplified by the Poisson process, feature events that occur independently with no simultaneous occurrences at the same time.[6] In contrast, more complex variants include renewal processes, where inter-event times are independent and identically distributed but follow an arbitrary positive distribution, allowing for greater flexibility in modeling dependencies on prior intervals without full history dependence.[7]

Key properties of point processes include the counting measure N(t), defined as the number of events occurring in the interval [0, t], which is a non-decreasing step function jumping by one at each event time.[5] Inter-event times, denoted X_i = T_i - T_{i-1} (with T_0 = 0), capture the waiting periods between consecutive events and are central to characterizing the process's temporal structure.[8] The compensator function \Lambda(t), a non-decreasing predictable process, provides the expected cumulative number of events up to time t, ensuring that N(t) - \Lambda(t) behaves as a martingale under suitable conditions.[5]

A canonical example is the homogeneous Poisson process, which has a constant rate parameter \lambda > 0, meaning the expected number of events in any interval of length t is \lambda t, and events occur independently with exponentially distributed inter-event times of mean 1/\lambda.[6] This process serves as a baseline for understanding more advanced 
models, such as self-exciting point processes that build upon these foundations.[5]

Self-exciting mechanisms
Self-excitation in point processes describes a mechanism where the occurrence of an event at a given time temporarily elevates the conditional intensity rate for subsequent events within the same process, promoting temporal clustering and bursty behavior. This dynamic contrasts with homogeneous Poisson processes, where event rates remain constant and independent, by incorporating feedback from past events that amplifies future occurrences. Such mechanisms are essential for modeling systems exhibiting contagion or reinforcement effects, where isolated events are rare and sequences dominate.[9]

The conceptual foundations of self-excitation draw from diverse fields, with early inspirations in epidemiology modeling the propagation of contagious diseases, where each infection raises the risk of new cases through direct transmission. In this context, self-exciting dynamics capture how initial outbreaks can cascade into epidemics, reflecting the inherent "contagiousness" of events. Similarly, in neuroscience, self-excitation underlies models of neural firing patterns, where a single action potential can depolarize the neuron further, leading to bursts of spikes that represent synchronized activity in neuronal populations. These ideas predate formal point process frameworks but highlight the intuitive role of event-triggered reinforcement in biological systems.[10][11][12]

A prominent empirical precursor to self-exciting models appears in seismology through Omori's law, formulated in 1894, which quantifies the decaying rate of earthquake aftershocks following a mainshock as inversely proportional to the time elapsed, indicating how one seismic event triggers a sequence of dependent followers. This law illustrates self-excitation intuitively: a major earthquake destabilizes fault zones, increasing the likelihood of nearby tremors that, in turn, may induce more, forming clustered sequences rather than random occurrences. 
Modern self-exciting point processes, such as the Hawkes process, build on this by providing a probabilistic structure for simulating such triggering cascades.[13][14] Self-exciting mechanisms differ from mutually exciting processes, which involve cross-influence between distinct event types or subprocesses, such as one subprocess's events raising another's rate. In self-excitation the reinforcement is endogenous to the process, emphasizing internal contagion, whereas mutual excitation captures interdependencies across processes, as outlined in foundational work distinguishing the two.[9]

Mathematical formulation
Intensity function
The intensity function of the Hawkes process, denoted \lambda(t), represents the instantaneous expected rate of event occurrences at time t, conditional on the history H_t of events observed strictly before t.[15] This function takes the general form \lambda(t) = \mu + \sum_{t_i < t} \alpha \phi(t - t_i), where \mu > 0 denotes the constant baseline intensity, \alpha > 0 is the magnitude of excitation induced by each past event, and \phi is a positive kernel function supported on (0, \infty).[16] This expression captures the self-exciting nature of the process, initially proposed by Hawkes for exponential kernels and extended to general decaying kernels via a cluster process representation.[17][16]

The baseline component \mu models the exogenous rate of "immigrant" events that occur independently of prior activity, while the summation term accounts for the endogenous excitation, with each historical event at time t_i contributing an additional intensity \alpha \phi(t - t_i) that reflects branching from past occurrences.[16] The kernel \phi(u) is typically a monotonically decreasing function ensuring finite temporal influence, such as the exponential kernel \phi(u) = \beta e^{-\beta u} for u > 0 and \beta > 0, which provides a memory parameter controlling the decay rate of excitation.[17] This decay property guarantees that the cumulative effect of past events remains bounded, supporting the process's interpretability as a conditional rate.[16]

Univariate case
The univariate Hawkes process models a single type of event where occurrences can trigger future events of the same type, simplifying analysis by focusing on self-excitation without inter-type interactions. This specialization builds on the general intensity function by restricting it to one dimension, making it suitable for introductory studies of clustering behavior in temporal data. The conditional intensity function for the univariate case with an exponential excitation kernel is

\lambda(t) = \mu + \sum_{t_i < t} \alpha \, e^{-\beta (t - t_i)},

where the sum runs over all previous event times t_i in the history of the process. Here, \mu > 0 represents the constant background intensity that drives exogenous events, \alpha > 0 denotes the magnitude of the instantaneous jump in intensity immediately after an event, and \beta > 0 governs the exponential decay rate, determining how quickly the influence of past events diminishes. These parameters allow the model to capture both baseline activity and the contagious nature of events, with the exponential form enabling closed-form expressions for many properties.

A key condition for stationarity of the univariate Hawkes process is that the branching ratio n = \frac{\alpha}{\beta} < 1, which quantifies the expected number of offspring events triggered by each occurrence; values of n approaching 1 lead to increased clustering and higher variability in event rates, while n \geq 1 causes the process to diverge. This ratio arises from the integral of the kernel function, \int_0^\infty \alpha e^{-\beta s} \, ds = \frac{\alpha}{\beta}, and ensures the total expected intensity remains finite over time. 
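As a concrete sketch of these definitions, the conditional intensity and branching ratio can be computed directly from a list of past event times. The parameter values below (\mu = 0.5, \alpha = 0.8, \beta = 1.2) are arbitrary illustrative choices, not values from the literature:

```python
import math

def hawkes_intensity(t, events, mu=0.5, alpha=0.8, beta=1.2):
    """Conditional intensity lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i))
    for a univariate Hawkes process with exponential kernel.
    Parameter values are arbitrary illustrative choices."""
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events if ti < t)

def branching_ratio(alpha=0.8, beta=1.2):
    """Expected number of direct offspring per event:
    integral of alpha * exp(-beta * s) over (0, inf) = alpha / beta."""
    return alpha / beta

events = [1.0, 2.5, 2.7]              # past event times
lam = hawkes_intensity(3.0, events)   # elevated rate just after the cluster near t = 2.5-2.7
n = branching_ratio()                 # 0.8 / 1.2 < 1, so the process is stationary
```

With no past events the intensity reduces to the baseline \mu, and each event raises the rate by \alpha before it decays at rate \beta, so the computed value at t = 3.0 exceeds the baseline.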
Simulation of the univariate Hawkes process can be performed efficiently using Ogata's modified thinning algorithm, which proposes candidate event times from a homogeneous Poisson process with a dominating rate and accepts them probabilistically according to the conditional intensity. This approach is particularly convenient for the exponential kernel, because the intensity is non-increasing between events, so the intensity evaluated just after the current time is a valid upper bound until the next event occurs. The algorithm proceeds in the following steps:

- Initialize the current time t = 0 and an empty event history; the initial intensity is \lambda(0) = \mu.
- While t < T (the simulation horizon):
  - Compute the upper bound \bar{\lambda} = \mu + \alpha \sum_{t_i \leq t} e^{-\beta (t - t_i)}, the intensity immediately after t, which dominates \lambda(s) for all s > t until the next event.
  - Generate a candidate waiting time \Delta \sim \operatorname{Exp}(\bar{\lambda}) and set the candidate time t' = t + \Delta.
  - If t' > T, stop.
  - Evaluate the exact intensity \lambda(t') and generate a uniform random variable u \sim U(0, 1).
  - If u \leq \lambda(t') / \bar{\lambda}, accept t' as an event time and add it to the history; otherwise, reject it.
  - In either case, advance t = t'.
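The steps above can be sketched in Python. This is a minimal illustration, not a production implementation; the parameter values are arbitrary, and the upper bound relies on the exponential kernel decaying between events:

```python
import math
import random

def simulate_hawkes(mu, alpha, beta, T, seed=0):
    """Ogata's modified thinning algorithm for a univariate Hawkes process
    with exponential kernel alpha * exp(-beta * u).  Returns the accepted
    event times in [0, T].  Assumes alpha / beta < 1 for stationarity."""
    rng = random.Random(seed)
    events = []
    t = 0.0
    while True:
        # Upper bound: the intensity immediately after t dominates lambda(s)
        # for all s > t until the next event, since the kernel only decays.
        lam_bar = mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events)
        # Candidate waiting time from a homogeneous Poisson process at rate lam_bar.
        t += rng.expovariate(lam_bar)
        if t > T:
            break
        # Exact intensity at the candidate time.
        lam_t = mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events)
        # Accept the candidate with probability lam_t / lam_bar; either way,
        # the clock has already advanced to the candidate time.
        if rng.random() <= lam_t / lam_bar:
            events.append(t)
    return events

times = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.2, T=50.0)
```

This sketch recomputes the excitation sum from scratch at each step, which costs O(n) per candidate; for the exponential kernel the sum admits a recursive O(1) update between successive times, one reason this kernel is popular in practice.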