Kernelized mean change

Description

Given a positive semi-definite kernel \(k(\cdot, \cdot) : \mathbb{R}^d\times \mathbb{R}^d \mapsto \mathbb{R}\) and its associated feature map \(\Phi:\mathbb{R}^d \mapsto \mathcal{H}\) (where \(\mathcal{H}\) is an appropriate Hilbert space), this cost function detects changes in the mean of the embedded signal \(\{\Phi(y_t)\}_t\) [KERACH12][KERGBR+12]. Formally, for a signal \(\{y_t\}_t\) on an interval \(I\),

\[c(y_{I}) = \sum_{t\in I} \|\Phi(y_t) - \bar{\mu}\|_{\mathcal{H}}^2\]

where \(\bar{\mu}\) is the empirical mean of the embedded sub-signal \(\{\Phi(y_t)\}_{t\in I}\). Here the kernel is the radial basis function (rbf):

\[k(x, y) = \exp(-\gamma \|x-y\|^2)\]

where \(\|\cdot\|\) is the Euclidean norm and \(\gamma>0\) is the so-called bandwidth parameter and is determined according to median heuristics (i.e. equal to the inverse of median of all pairwise distances).

Usage

Start with the usual imports and create a signal.

import numpy as np
import matplotlib.pylab as plt
import ruptures as rpt
# creation of data
n, dim = 500, 3  # number of samples, dimension
n_bkps, sigma = 3, 5  # number of change points, noise standart deviation
signal, bkps = rpt.pw_constant(n, dim, n_bkps, noise_std=sigma)

Then create a CostRbf instance and print the cost of the sub-signal signal[50:150].

c = rpt.costs.CostRbf().fit(signal)
print(c.error(50, 150))

You can also compute the sum of costs for a given list of change points.

print(c.sum_of_costs(bkps))
print(c.sum_of_costs([10, 100, 200, 250, n]))

In order to use this cost class in a change point detection algorithm (inheriting from BaseEstimator), either pass a CostRbf instance (through the argument 'custom_cost') or set model="rbf".

c = rpt.costs.CostRbf(); algo = rpt.Dynp(custom_cost=c)
# is equivalent to
algo = rpt.Dynp(model="rbf")

Code explanation

class ruptures.costs.CostRbf[source]

Kernel cost function (rbf kernel).

__init__()[source]

Initialize self. See help(type(self)) for accurate signature.

error(start, end)[source]

Return the approximation cost on the segment [start:end].

Parameters
  • start (int) – start of the segment

  • end (int) – end of the segment

Returns

segment cost

Return type

float

Raises

NotEnoughPoints – when the segment is too short (less than 'min_size' samples).

fit(signal)[source]

Sets parameters of the instance.

Parameters

signal (array) – signal. Shape (n_samples,) or (n_samples, n_features)

Returns

self

References

KERACH12

S. Arlot, A. Celisse, and Z. Harchaoui. Kernel change-point detection. arXiv preprint arXiv:1202.3878, 1(0000):1–26, 2012.

KERGBR+12

A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola. A kernel two-sample test. Journal of Machine Learning Research, 13(1):723–773, 2012.