Detecting change points¶
Detecting change points with fastchange is simple: given a 1-dimensional signal, we need to choose a segmentation method, a cost function, and a penalty. Which method we use depends on how many change points there could be, whether we need exact results, and the structure of the time series. Below we go over some common cases and how to solve them with fastchange.
Detecting a single change point¶
The simplest case is detecting a single change point in a series. To do this, we will use the AmocSeg
(At Most One Change segmentation) class, which gives us an exact estimate of the most likely change point, if there is one. We first create a synthetic signal from a random normal distribution with a single change point at index 100. Since we know the signal follows a normal distribution, we will use the NormalMeanVarCost
cost function to identify when the mean or variance has changed, subject to a complexity penalty (mbic_penalty
) to limit how aggressively the model declares changes.
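To make these pieces concrete, here is a rough sketch of what an AMOC search does conceptually: compare the cost of every possible split against the cost of leaving the series unsplit, plus a penalty. Note this is illustrative only, not fastchange's actual implementation; the `seg_cost` function and the `3 * log(n)` penalty below are simplified stand-ins for NormalMeanVarCost and mbic_penalty.

```python
import numpy as np

def seg_cost(x):
    """Illustrative normal mean/variance cost: n * log(segment variance)."""
    return len(x) * np.log(max(x.var(), 1e-8))

def amoc(x, penalty):
    """At Most One Change, brute force: accept the best single split
    only if it beats the cost of the unsplit series plus a penalty."""
    n = len(x)
    best_tau, best_cost = None, seg_cost(x)
    for tau in range(2, n - 1):
        c = seg_cost(x[:tau]) + seg_cost(x[tau:]) + penalty
        if c < best_cost:
            best_tau, best_cost = tau, c
    return best_tau  # None means "no change point found"

rng = np.random.default_rng(0)
x = np.hstack([rng.normal(0, 1, 100), rng.normal(5, 2, 100)])
print(amoc(x, penalty=3 * np.log(len(x))))  # close to the true change at 100
```

The penalty is what keeps the model honest: without it, splitting almost always lowers the cost slightly, so some change would be declared even in pure noise.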
# Importing packages
import numpy as np
import matplotlib.pyplot as plt
from fastchange.seg.amoc import AmocSeg
from fastchange.costs.normal import NormalMeanVarCost
from fastchange.penalties import mbic_penalty
# Creating synthetic data
data = np.hstack([
    np.random.normal(0, 1, (100,)),
    np.random.normal(5, 2, (100,))
])
# Plotting our synthetic data
plt.figure(figsize=(12, 8))
plt.plot(data)
plt.show()
Now that we have our synthetic signal, we declare our model, fit it on the signal, and get the array of estimated change point locations. Note that at most one index will be returned, since we are using AMOC.
# Running AMOC changepoint
model = AmocSeg(NormalMeanVarCost(), mbic_penalty).fit(data)
cpts = model.predict()
We can overlay the estimated change point on the original signal to see where the signal has changed, and from the plot below we can see the model does a good job of locating the shift in mean and variance.
# Plotting our synthetic data with detected change points
plt.figure(figsize=(12, 8))
plt.plot(data)
for i in cpts:
    plt.axvline(i, color='red', linestyle='--')
plt.show()
Detecting multiple change points¶
The more common case is multiple change point detection, where there is an unknown number of changes in a signal. Similar to the section above, we create an artificial signal with three clear change points:
# Creating synthetic data
data = np.hstack([
    np.random.normal(0, 1, (500,)),
    np.random.normal(10, 2, (500,)),
    np.random.normal(8, 1, (500,)),
    np.random.normal(-1, 1, (500,)),
])
# Plotting our synthetic data
plt.figure(figsize=(12, 8))
plt.plot(data)
plt.show()
Exact multiple change point detection is considerably more expensive than the single-change case (a naive dynamic program is quadratic in the series length), so we need to decide whether approximate change point locations are acceptable in exchange for speed.
BinSeg: Fast + Approximate¶
If we are okay with inexact change point estimates, Binary Segmentation is a common approach with log-linear (O(n log n)) complexity. Binary segmentation applies AMOC recursively to each resulting subsignal until no more change points can be found or we reach a user-specified maximum number of change points. Identifying multiple changes in fastchange works the same as the single-change case; we just swap the segmentation method:
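The recursion described above can be sketched in plain NumPy. As before, this is an illustration of the idea rather than fastchange's implementation; `seg_cost` and the `3 * log(n)` penalty are simplified stand-ins for the library's cost and penalty objects.

```python
import numpy as np

def seg_cost(x):
    """Illustrative normal mean/variance cost: n * log(segment variance)."""
    return len(x) * np.log(max(x.var(), 1e-8))

def binseg(x, penalty, min_len=10, offset=0):
    """Binary segmentation sketch: find the best single split; if it
    beats the no-split cost plus a penalty, recurse on both halves."""
    n = len(x)
    if n < 2 * min_len:
        return []
    best_tau, best_cost = None, seg_cost(x)
    for tau in range(min_len, n - min_len + 1):
        c = seg_cost(x[:tau]) + seg_cost(x[tau:]) + penalty
        if c < best_cost:
            best_tau, best_cost = tau, c
    if best_tau is None:
        return []  # no split improves on the penalized cost
    return (binseg(x[:best_tau], penalty, min_len, offset)
            + [offset + best_tau]
            + binseg(x[best_tau:], penalty, min_len, offset + best_tau))

rng = np.random.default_rng(0)
x = np.hstack([rng.normal(0, 1, 500), rng.normal(10, 2, 500),
               rng.normal(8, 1, 500), rng.normal(-1, 1, 500)])
print(binseg(x, penalty=3 * np.log(len(x))))  # roughly [500, 1000, 1500]
```

Because each split is chosen greedily and never revisited, the estimates can be slightly off when changes interact, which is the price paid for the speed.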
# Importing binary segmentation
from fastchange.seg.binseg import BinSeg
# Running BinSeg changepoint
model = BinSeg(NormalMeanVarCost(), mbic_penalty, min_len=10).fit(data)
cpts = model.predict()
# Plotting our synthetic data with detected change points
plt.figure(figsize=(12, 8))
plt.plot(data)
for i in cpts:
    plt.axvline(i, color='red', linestyle='--')
plt.show()
From the plot above we can see Binary Segmentation does a good job of identifying all the change points in our signal. We also time it so we can compare it to the exact method we discuss in the next section.
%timeit BinSeg(NormalMeanVarCost(), mbic_penalty, min_len=10).fit(data).predict()
1.76 ms ± 111 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Pelt: Somewhat slower + Exact¶
In cases where we need to know the exact location of change points, Pelt (Pruned Exact Linear Time) is a common segmentation method that can find all changes in a signal in time linear in the length of the series, under mild assumptions about how the change points are spread.
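Conceptually, PELT solves the dynamic program F(t) = min over s of F(s) + cost(x[s:t]) + penalty, and stays fast by pruning split candidates that can never again be optimal. The sketch below is a simplified illustration of that recursion, not fastchange's implementation; `seg_cost` and the `3 * log(n)` penalty are stand-ins for the library's cost and penalty objects.

```python
import numpy as np

def seg_cost(x):
    """Illustrative normal mean/variance cost: n * log(segment variance)."""
    return len(x) * np.log(max(x.var(), 1e-8))

def pelt(x, penalty, min_len=10):
    """Sketch of PELT: dynamic programming over last-segment starts,
    pruning candidates that can no longer begin an optimal segment."""
    n = len(x)
    F = np.full(n + 1, np.inf)
    F[0] = -penalty
    last = np.zeros(n + 1, dtype=int)
    candidates = [0]
    for t in range(min_len, n + 1):
        costs = {s: F[s] + seg_cost(x[s:t]) + penalty
                 for s in candidates if t - s >= min_len}
        s_star = min(costs, key=costs.get)
        F[t], last[t] = costs[s_star], s_star
        # Pruning step: keep only starts that might still be optimal later
        candidates = [s for s in candidates
                      if s not in costs or costs[s] - penalty <= F[t]]
        candidates.append(t)
    # Recover change points by backtracking through the optimal splits
    cpts, t = [], n
    while last[t] > 0:
        cpts.append(last[t])
        t = last[t]
    return sorted(cpts)

rng = np.random.default_rng(0)
x = np.hstack([rng.normal(0, 1, 100), rng.normal(10, 2, 100),
               rng.normal(8, 1, 100), rng.normal(-1, 1, 100)])
print(pelt(x, penalty=3 * np.log(len(x))))
```

Unlike binary segmentation, every split is chosen jointly with all the others, which is what makes the result exact with respect to the penalized cost.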
# Importing PELT segmentation
from fastchange.seg.pelt import PeltSeg
# Running PELT changepoint
model = PeltSeg(NormalMeanVarCost(), mbic_penalty, min_len=10).fit(data)
cpts = model.predict()
# Plotting our synthetic data with detected change points
plt.figure(figsize=(12, 8))
plt.plot(data)
for i in cpts:
    plt.axvline(i, color='red', linestyle='--')
plt.show()
%timeit PeltSeg(NormalMeanVarCost(), mbic_penalty, min_len=10).fit(data)
1.92 ms ± 171 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
It looks like Pelt did just as well as Binary Segmentation at finding all three of the changes, at the expense of some runtime, as shown above. The difference wasn't large in this case, but as the series gets longer and the number of change points grows, this gap will widen.