Mahalanobis-type metric

Description

Given a positive semi-definite matrix \(M\in\mathbb{R}^{d\times d}\), this cost function detects changes in the mean of the embedded signal defined by the pseudo-metric

\[\|x-y\|_M^2 = (x-y)^t M (x-y)\]

Formally, for a signal \(\{y_t\}_t\) on an interval \(I\), the cost function is equal to

\[c(y_{I}) = \sum_{t\in I} \|y_t - \bar{\mu}\|_{M}^2\]

where \(\bar{\mu}\) is the empirical mean of the sub-signal \(\{y_t\}_{t\in I}\). The matrix \(M\) can for instance be the result of a similarity learning algorithm [MLXJR03] or the inverse of the empirical covariance matrix (yielding the Mahalanobis distance).
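The cost formula above can be sketched directly in NumPy; the random sub-signal and the choice of \(M\) as the inverse empirical covariance are illustrative assumptions, not part of the library API.

```python
import numpy as np

rng = np.random.default_rng(0)
sub = rng.normal(size=(100, 3))  # an illustrative sub-signal {y_t}, t in I

# choose M as the inverse of the empirical covariance (Mahalanobis distance)
M = np.linalg.inv(np.cov(sub, rowvar=False))

mu = sub.mean(axis=0)  # empirical mean of the sub-signal
diff = sub - mu
# c(y_I) = sum_t (y_t - mu)^T M (y_t - mu)
cost = np.einsum("ij,jk,ik->", diff, M, diff)
```

The `einsum` call contracts the per-sample quadratic forms in one pass, which is equivalent to summing `d @ M @ d` over the rows of `diff`.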

Usage

Start with the usual imports and create a signal.

import numpy as np
import matplotlib.pyplot as plt
import ruptures as rpt
# creation of data
n, dim = 500, 3  # number of samples, dimension
n_bkps, sigma = 3, 5  # number of change points, noise standard deviation
signal, bkps = rpt.pw_constant(n, dim, n_bkps, noise_std=sigma)

Then create a CostMl instance and print the cost of the sub-signal signal[50:150].

M = np.eye(dim)
c = rpt.costs.CostMl(metric=M).fit(signal)
print(c.error(50, 150))

You can also compute the sum of costs for a given list of change points.

print(c.sum_of_costs(bkps))
print(c.sum_of_costs([10, 100, 200, 250, n]))

In order to use this cost class in a change point detection algorithm (inheriting from BaseEstimator), either pass a CostMl instance (through the argument 'custom_cost') or set model="mahalanobis".

c = rpt.costs.CostMl(metric=M)
algo = rpt.Dynp(custom_cost=c)
# is equivalent to
algo = rpt.Dynp(model="mahalanobis", params={"metric": M})

Code explanation

class ruptures.costs.CostMl(metric=None)[source]

Mahalanobis-type cost function.

__init__(metric=None)[source]

Create a new instance.

Parameters

metric (ndarray, optional) – PSD matrix that defines a Mahalanobis-type pseudo-distance. Shape (n_features, n_features). If None, the inverse of the empirical covariance matrix is computed when fit is called (yielding the Mahalanobis distance).

Returns

self

error(start, end)[source]

Return the approximation cost on the segment [start:end].

Parameters
  • start (int) – start of the segment

  • end (int) – end of the segment

Returns

segment cost

Return type

float

Raises

NotEnoughPoints – when the segment is too short (less than 'min_size' samples).

fit(signal)[source]

Sets parameters of the instance.

Parameters

signal (array) – signal. Shape (n_samples,) or (n_samples, n_features)

Returns

self

References

MLXJR03

E. P. Xing, M. I. Jordan, and S. J. Russell. Distance metric learning, with application to clustering with side-information. In Advances in Neural Information Processing Systems 15 (NIPS 2002), 521–528. 2003.