Bottom-up segmentation¶

Description¶

Bottom-up change point detection is used to perform fast signal segmentation and is implemented in ruptures.detection.BottomUp. It is a sequential approach. Contrary to binary segmentation, which is a greedy procedure, bottom-up segmentation is generous: it starts with many change points and successively deletes the less significant ones. First, the signal is divided in many sub-signals along a regular grid. Then contiguous segments are successively merged according to a measure of how similar they are. See for instance [BUKCHP01] or [BUFry07] for an algorithmic analysis of ruptures.detection.BottomUp. The benefits of bottom-up segmentation includes low complexity (of the order of \(\mathcal{O}(n\log n)\), where \(n\) is the number of samples), the fact that it can extend any single change point detection method to detect multiple changes points and that it can work whether the number of regimes is known beforehand or not.

Schematic view of the bottom-up segmentation algorithm.¶

Usage¶

Start with the usual imports and create a signal.

import numpy as np
import matplotlib.pylab as plt
import ruptures as rpt
# creation of data
n, dim = 500, 3  # number of samples, dimension
n_bkps, sigma = 3, 5  # number of change points, noise standart deviation
signal, bkps = rpt.pw_constant(n, dim, n_bkps, noise_std=sigma)

To perform a bottom-up segmentation of a signal, initialize a ruptures.detection.BottomUp instance.

# change point detection
model = "l2"  # "l1", "rbf", "linear", "normal", "ar"
algo = rpt.BottomUp(model=model).fit(signal)
my_bkps = algo.predict(n_bkps=3)

# show results
rpt.show.display(signal, bkps, my_bkps, figsize=(10, 6))
plt.show()

In the situation in which the number of change points is unknown, one can specify a penalty using the 'pen' parameter or a threshold on the residual norm using 'epsilon'.

my_bkps = algo.predict(pen=np.log(n)*dim*sigma**2)
# or
my_bkps = algo.predict(epsilon=3*n*sigma**2)

Code explanation¶

class ruptures.detection.BottomUp(model='l2', custom_cost=None, min_size=2, jump=5, params=None)[source]¶

Bottom-up segmentation.

__init__(model='l2', custom_cost=None, min_size=2, jump=5, params=None)[source]¶

Initialize a BottomUp instance.

Parameters

model (str, optional) – segment model, [“l1”, “l2”, “rbf”]. Not used if 'custom_cost' is not None.
custom_cost (BaseCost, optional) – custom cost function. Defaults to None.
min_size (int, optional) – minimum segment length. Defaults to 2 samples.
jump (int, optional) – subsample (one every jump points). Defaults to 5 samples.
params (dict, optional) – a dictionary of parameters for the cost instance.

Returns

self

fit(signal)[source]¶

Compute params to segment signal.

Parameters: signal (array) – signal to segment. Shape (n_samples, n_features) or (n_samples,).
Returns: self

fit_predict(signal, n_bkps=None, pen=None, epsilon=None)[source]¶

Fit to the signal and return the optimal breakpoints.

Helper method to call fit and predict once

Parameters

signal (array) – signal. Shape (n_samples, n_features) or (n_samples,).
n_bkps (int) – number of breakpoints.
penalty (float) – penalty value (>0)
epsilon (float) – reconstruction budget (>0)

Returns

sorted list of breakpoints

Return type

list

predict(n_bkps=None, pen=None, epsilon=None)[source]¶

Return the optimal breakpoints.

Must be called after the fit method. The breakpoints are associated with the signal passed to fit(). The stopping rule depends on the parameter passed to the function.

Parameters

n_bkps (int) – number of breakpoints to find before stopping.
penalty (float) – penalty value (>0)
epsilon (float) – reconstruction budget (>0)

Returns

sorted list of breakpoints

Return type

list

References

BUFry07: Piotr Fryzlewicz. Unbalanced Haar Technique for Nonparametric Function Estimation. Journal of the American Statistical Association, 102(480):1318–1327, 2007. doi:10.1198/016214507000000860.
BUKCHP01: E. Keogh, S. Chu, D. Hart, and M. Pazzani. An online algorithm for segmenting time series. In Proceedings of the IEEE International Conference on Data Mining (ICDM), 289–296. 2001.