Binary segmentation¶
Description¶
Binary change point detection is used to perform fast signal segmentation and is implemented in
ruptures.detection.BinSeg
.
It is a sequential approach: first, one change point is detected in the complete input signal, then
series is split around this change point, then the operation is repeated on the two resulting
sub-signals. See for instance [BSBai97] and [BSFry14] for a theoretical and
algorithmic analysis of ruptures.detection.BinSeg
.
The benefits of binary segmentation includes low complexity (of the order of
\(\mathcal{O}(n\log n)\), where \(n\) is the number of samples), the fact that it can extend
any single change point detection method to detect multiple changes points and that it can work
whether the number of regimes is known beforehand or not.
Usage¶
Start with the usual imports and create a signal.
import numpy as np
import matplotlib.pylab as plt
import ruptures as rpt
# creation of data
n = 500 # number of samples
n_bkps, sigma = 3, 5 # number of change points, noise standart deviation
signal, bkps = rpt.pw_constant(n, dim, n_bkps, noise_std=sigma)
To perform a binary segmentation of a signal, initialize a ruptures.detection.BinSeg
instance.
# change point detection
model = "l2" # "l1", "rbf", "linear", "normal", "ar"
algo = rpt.Binseg(model=model).fit(signal)
my_bkps = algo.predict(n_bkps=3)
# show results
rpt.show.display(signal, bkps, my_bkps, figsize=(10, 6))
plt.show()
In the situation in which the number of change points is unknown, one can specify a penalty using
the 'pen'
parameter or a threshold on the residual norm using 'epsilon'
.
my_bkps = algo.predict(pen=np.log(n)*dim*sigma**2)
# or
my_bkps = algo.predict(epsilon=3*n*sigma**2)
See also
Change point detection: a general formulation for more information about stopping rules of sequential algorithms.
For faster predictions, one can modify the 'jump'
parameter during initialization.
The higher it is, the faster the prediction is achieved (at the expense of precision).
algo = rpt.Binseg(model=model, jump=10).fit(signal)
Code explanation¶
-
class
ruptures.detection.
Binseg
(model='l2', custom_cost=None, min_size=2, jump=5, params=None)[source]¶ Binary segmentation.
-
__init__
(model='l2', custom_cost=None, min_size=2, jump=5, params=None)[source]¶ Initialize a Binseg instance.
- Parameters
model (str, optional) – segment model, [“l1”, “l2”, “rbf”,…]. Not used if
'custom_cost'
is not None.custom_cost (BaseCost, optional) – custom cost function. Defaults to None.
min_size (int, optional) – minimum segment length. Defaults to 2 samples.
jump (int, optional) – subsample (one every jump points). Defaults to 5 samples.
params (dict, optional) – a dictionary of parameters for the cost instance.
- Returns
self
-
fit
(signal)[source]¶ Compute params to segment signal.
- Parameters
signal (array) – signal to segment. Shape (n_samples, n_features) or (n_samples,).
- Returns
self
-
fit_predict
(signal, n_bkps=None, pen=None, epsilon=None)[source]¶ Fit to the signal and return the optimal breakpoints.
Helper method to call fit and predict once
- Parameters
signal (array) – signal. Shape (n_samples, n_features) or (n_samples,).
n_bkps (int) – number of breakpoints.
penalty (float) – penalty value (>0)
epsilon (float) – reconstruction budget (>0)
- Returns
sorted list of breakpoints
- Return type
list
-
predict
(n_bkps=None, pen=None, epsilon=None)[source]¶ Return the optimal breakpoints.
Must be called after the fit method. The breakpoints are associated with the signal passed to fit(). The stopping rule depends on the parameter passed to the function.
- Parameters
n_bkps (int) – number of breakpoints to find before stopping.
penalty (float) – penalty value (>0)
epsilon (float) – reconstruction budget (>0)
- Returns
sorted list of breakpoints
- Return type
list
-
References
- BSBai97
J. Bai. Estimating multiple breaks one at a time. Econometric Theory, 13(3):315–352, 1997.
- BSFry14
P. Fryzlewicz. Wild binary segmentation for multiple change-point detection. The Annals of Statistics, 42(6):2243–2281, 2014. doi:10.1214/14-AOS1245.