Exact segmentation: dynamic programming

Description

The method is implemented in ruptures.detection.Dynp.

Roughly speaking, it computes the cost of all subsequences of a given signal. The number of computed costs is of the order \(\mathcal{O}(Kn^2)\), where \(K\) is the number of change points and \(n\) the number of samples. This has to be multiplied by the computational cost of computing the approximation error on one sub-sequence. Consequently, piecewise constant models are significantly faster than linear or autoregressive models.

Computational cost is drastically reduced when considering only a subsample of possible change points. When calling ruptures.detection.Dynp.__init__(), the minimum distance between change points can be set through the keyword 'min_size'; through the parameter 'jump', only change point indexes multiple of a particular value are considered.

Usage

import numpy as np
import matplotlib.pylab as plt
import ruptures as rpt

# creation of data
n, dim = 500, 3
n_bkps, sigma = 3, 5
signal, bkps = rpt.pw_constant(n, dim, n_bkps, noise_std=sigma)

# change point detection
model = "l1"  # "l2", "rbf"
algo = rpt.Dynp(model=model, min_size=3, jump=5).fit(signal)
my_bkps = algo.predict(n_bkps=3)

# show results
rpt.show.display(signal, bkps, my_bkps, figsize=(10, 6))
plt.show()

Code explanation

class ruptures.detection.Dynp(model='l2', custom_cost=None, min_size=2, jump=5, params=None)[source]

Find optimal change points using dynamic programming.

Given a segment model, it computes the best partition for which the sum of errors is minimum.

__init__(model='l2', custom_cost=None, min_size=2, jump=5, params=None)[source]

Creates a Dynp instance.

Parameters
  • model (str, optional) – segment model, [“l1”, “l2”, “rbf”]. Not used if 'custom_cost' is not None.

  • custom_cost (BaseCost, optional) – custom cost function. Defaults to None.

  • min_size (int, optional) – minimum segment length.

  • jump (int, optional) – subsample (one every jump points).

  • params (dict, optional) – a dictionary of parameters for the cost instance.

Returns

self

fit(signal)[source]

Create the cache associated with the signal.

Dynamic programming is a recurrence; intermediate results are cached to speed up computations. This method sets up the cache.

Parameters

signal (array) – signal. Shape (n_samples, n_features) or (n_samples,).

Returns

self

fit_predict(signal, n_bkps)[source]

Fit to the signal and return the optimal breakpoints.

Helper method to call fit and predict once

Parameters
  • signal (array) – signal. Shape (n_samples, n_features) or (n_samples,).

  • n_bkps (int) – number of breakpoints.

Returns

sorted list of breakpoints

Return type

list

predict(n_bkps)[source]

Return the optimal breakpoints.

Must be called after the fit method. The breakpoints are associated with the signal passed to fit().

Parameters

n_bkps (int) – number of breakpoints.

Returns

sorted list of breakpoints

Return type

list