Window-based change point detection¶
Description¶
Window-based change point detection is used to perform fast signal segmentation and is implemented in
ruptures.detection.Window
.
The algorithm uses two windows which slide along the data stream.
The statistical properties of the signals within each window are compared with a discrepancy
measure.
For a given cost function \(c(\cdot)\) (see Cost functions), a discrepancy measure is derived
\(d(\cdot,\cdot)\) as follows:
where \(\{y_t\}_t\) is the input signal and \(u<v<w\) are indexes. The discrepancy is the cost gain of splitting the sub-signal \(y_{u..w}\) at the index \(v\). If the sliding windows \(u..v\) and \(v..w\) both fall into a segment, their statistical properties are similar and the discrepancy between the first window and the second window is low. If the sliding windows fall into two dissimilar segments, the discrepancy is significantly higher, suggesting that the boundary between windows is a change point. The discrepancy curve is the curve, defined for all indexes \(t\) between \(w/2\) and \(n-w/2\) (\(n\) is the number of samples),
where \(w\) is the window length. A sequential peak search is performed on the discrepancy curve in order to detect change points.
The benefits of window-based segmentation includes low complexity (of the order of \(\mathcal{O}(n w)\), where \(n\) is the number of samples), the fact that it can extend any single change point detection method to detect multiple changes points and that it can work whether the number of regimes is known beforehand or not.
See also
Usage¶
Start with the usual imports and create a signal.
import numpy as np
import matplotlib.pylab as plt
import ruptures as rpt
# creation of data
n, dim = 500, 3 # number of samples, dimension
n_bkps, sigma = 3, 5 # number of change points, noise standart deviation
signal, bkps = rpt.pw_constant(n, dim, n_bkps, noise_std=sigma)
To perform a binary segmentation of a signal, initialize a ruptures.detection.Window
instance.
# change point detection
model = "l2" # "l1", "rbf", "linear", "normal", "ar"
algo = rpt.Window(width=40, model=model).fit(signal)
my_bkps = algo.predict(n_bkps=3)
# show results
rpt.show.display(signal, bkps, my_bkps, figsize=(10, 6))
plt.show()
The window length (in number of samples) is modified through the argument 'width'
.
Usual methods assume that the window length is smaller than the smallest regime length.
In the situation in which the number of change points is unknown, one can specify a penalty using
the 'pen'
parameter or a threshold on the residual norm using 'epsilon'
.
my_bkps = algo.predict(pen=np.log(n)*dim*sigma**2)
# or
my_bkps = algo.predict(epsilon=3*n*sigma**2)
See also
Change point detection: a general formulation for more information about stopping rules of sequential algorithms.
For faster predictions, one can modify the 'jump'
parameter during initialization.
The higher it is, the faster the prediction is achieved (at the expense of precision).
algo = rpt.Window(model=model, jump=10).fit(signal)
Code explanation¶
-
class
ruptures.detection.
Window
(width=100, model='l2', custom_cost=None, min_size=2, jump=5, params=None)[source]¶ Window sliding method.
-
__init__
(width=100, model='l2', custom_cost=None, min_size=2, jump=5, params=None)[source]¶ Instanciate with window length.
- Parameters
width (int, optional) – window length. Defaults to 100 samples.
model (str, optional) – segment model, [“l1”, “l2”, “rbf”]. Not used if
is not None. ('custom_cost') –
custom_cost (BaseCost, optional) – custom cost function. Defaults to None.
min_size (int, optional) – minimum segment length.
jump (int, optional) – subsample (one every jump points).
params (dict, optional) – a dictionary of parameters for the cost instance.
- Returns
self
-
fit
(signal)[source]¶ Compute params to segment signal.
- Parameters
signal (array) – signal to segment. Shape (n_samples, n_features) or (n_samples,).
- Returns
self
-
fit_predict
(signal, n_bkps=None, pen=None, epsilon=None)[source]¶ Helper method to call fit and predict once.
-
predict
(n_bkps=None, pen=None, epsilon=None)[source]¶ Return the optimal breakpoints.
Must be called after the fit method. The breakpoints are associated with the signal passed to fit(). The stopping rule depends on the parameter passed to the function.
- Parameters
n_bkps (int) – number of breakpoints to find before stopping.
penalty (float) – penalty value (>0)
penalty – penalty value
- Returns
sorted list of breakpoints
- Return type
list
-