---
title: "Efficient Change Point detection with COMBSS"
subtitle: "Investigation on the Change in Mean"
output: pdf_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
  
## Stating the Problem  
  
Let $y_{1:p}$ be realizations of random variables $Y_{1:p}$. Let $\mu_0,\mu_1,...,\mu_k\in\mathbb{R}$ with $Y_{1:\tau_1}\sim_{iid}\mathcal{N}(\mu_0,\sigma^2)$, $Y_{\tau_1+1:\tau_2}\sim_{iid}\mathcal{N}(\mu_1,\sigma^2)$, $...$, $Y_{\tau_k+1:p}\sim_{iid}\mathcal{N}(\mu_k,\sigma^2)$.   
The problem is to reliably identify the parameters $k$, $\tau_{1:k}$ and $\mu_0,\mu_1,...,\mu_k$, ideally with a quantification of uncertainty or an evaluation criterion of some sort.  
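
As a running illustration, the following chunk simulates data from this model; the values of $p$, $\tau_{1:k}$, $\mu_{0:k}$ and $\sigma$ are arbitrary choices, not part of the problem statement.

```{r simulate-data}
# Simulate the change-in-mean model: k = 2 change points at tau = (70, 140),
# segment means mu = (0, 2, -1), common sd sigma = 1 (all values illustrative).
set.seed(1)
p     <- 200
tau   <- c(70, 140)
mu    <- c(0, 2, -1)
sigma <- 1
segment <- findInterval(seq_len(p), tau + 1) + 1  # segment index per observation
y <- rnorm(p, mean = mu[segment], sd = sigma)
plot(y, type = "l", main = "Simulated change in mean")
abline(v = tau, lty = 2)  # true change point locations
```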
  
## Existing Frameworks  
 
**Introduction to theoretical approaches:**  
  
Niu, Y. S., Hao, N., & Zhang, H. (2016). Multiple change-point detection: a selective overview. *Statistical Science*, 611-623.  
  
Approaches vary from hypothesis testing to formulating the problem in a regression framework. However, developing sufficiently effective algorithms to apply the theory is where matters become considerably more complicated.  
  
**Introduction to Algorithm Evaluation:**  
  
van den Burg, G. J., & Williams, C. K. (2020). An evaluation of change point detection algorithms. *arXiv preprint arXiv:2003.06222.*
  
$\implies$ A variety of approaches exist, based mainly on sequential analysis (online detection) or maximum-likelihood estimation.

Example:  
  
(1) CUSUM (cumulative sum control chart) with  
  
$S_0=0$ and $S_{n+1}=\max(0,S_n+y_{n+1}-w_n)$ for samples $y_{1:p}$ and assigned weights $w_n$; changes in the positive direction are detected by the choice of $w_n$.  
  
Page, E. S. (1957). On problems in which a change in a parameter occurs at an unknown point. *Biometrika*, 44(1/2), 248-252.
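
A minimal sketch of this recursion with a constant weight $w_n\equiv w$ and a detection threshold $h$; both constants are illustrative choices, not prescribed by Page (1957).

```{r cusum-sketch}
# One-sided CUSUM: S_0 = 0, S_{n+1} = max(0, S_n + y_{n+1} - w).
# Returns the first index at which S_n exceeds the threshold h (NA if never).
cusum_detect <- function(y, w, h) {
  S <- 0
  for (n in seq_along(y)) {
    S <- max(0, S + y[n] - w)
    if (S > h) return(n)
  }
  NA
}
cusum_detect(y, w = 1, h = 5)  # applied to the simulated series above
```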

(2) An extension of the idea by Page in  
  
Basseville, M., & Nikiforov, I. V. (1993). *Detection of abrupt changes: theory and application* (Vol. 104). Englewood Cliffs: Prentice Hall.  
  
## Univariate Mean Change Point Detection as a Regression Problem  
    
Adapted from Niu, Hao & Zhang (2016):  
  
Let $\beta_0=\mathbb{E}[Y_1]$ and $\beta_j=\mathbb{E}[Y_{j+1}]-\mathbb{E}[Y_{j}]$ for $j=1,...,p-1$.  
  
Then $Y_i$ for $i=1,...,p$ is given by 
$$Y_i=\sum^{i-1}_{j=0}\beta_j + \epsilon_i$$  
with $\epsilon_i\sim_{iid}\mathcal{N}(0,\sigma^2)$.  
  
In matrix form this can be rewritten as  
$$Y=X\beta+\epsilon$$ 
with $X\in\mathbb{R}^{p\times p}$ a lower triangular matrix with all entries equal to $1$.  
  
The closed form solution for the given univariate change point problem is then given by $\beta_0=\mu_0$, $\beta_{\tau_1}=\mu_1-\mu_0$, $...$, $\beta_{\tau_k}=\mu_k-\mu_{k-1}$ and $\beta_j=0$ otherwise.  
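
Continuing the simulated example, the chunk below constructs $X$ and the closed-form $\beta$ and checks that $X\beta$ reproduces the piecewise-constant mean; note that $\beta_0$ becomes `beta[1]` under R's 1-based indexing.

```{r design-matrix}
# Lower triangular design matrix of ones and the sparse closed-form beta:
# beta_0 = mu_0, beta_{tau_i} = mu_i - mu_{i-1}, all other entries zero.
X <- matrix(0, p, p)
X[lower.tri(X, diag = TRUE)] <- 1
beta <- numeric(p)
beta[1]       <- mu[1]       # beta_0
beta[tau + 1] <- diff(mu)    # jumps at the change points
all.equal(as.vector(X %*% beta), mu[segment])  # TRUE: X beta equals E[Y]
```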
  
However, as the parameters are unknown, a regularization approach is applied instead in order to remove all columns of the design matrix $X$ which do not correspond to a change point:  
  
$$||Y-X_s\hat{\beta}_s||+f(\lambda,s)$$  
  
with $s=(s_0, s_1, ..., s_{p-1})\in\{0,1\}^p$, $s_0=1$ and $s_{\tau_i}=1$ for $i=1,..,k$, $X_s=X\cdot diag(s)$, $\hat{\beta}_s$ the MLE under $X_s$, $||\cdot||$ some norm and $f(\lambda,s)$ some penalty.  
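
As a sketch, this objective can be evaluated for a fixed candidate support $s$ using the $l_2$-norm and the $l_0$-style penalty $f(\lambda,s)=\lambda\sum_i s_i$; the value of $\lambda$ and the candidate supports are illustrative.

```{r penalized-objective}
# Residual sum of squares of the least-squares fit on the selected columns,
# plus the sparsity penalty.
objective <- function(y, X, s, lambda) {
  Xs  <- X[, s == 1, drop = FALSE]  # X_s: keep only the selected columns
  fit <- lm.fit(Xs, y)              # MLE (least squares) beta_hat_s
  sum(fit$residuals^2) + lambda * sum(s)
}
s_true  <- as.integer(seq_len(p) %in% c(1, tau + 1))  # true support
s_wrong <- as.integer(seq_len(p) %in% c(1, 50, 120))  # misplaced support
c(objective(y, X, s_true, lambda = 10), objective(y, X, s_wrong, lambda = 10))
```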
  
Previous work has been done on this approach:  
  
Utilizing LASSO regularization with an $l_1$-type penalty for variable selection:  
  
Harchaoui, Z., & Lévy-Leduc, C. (2010). Multiple change-point estimation with a total variation penalty. *Journal of the American Statistical Association*, 105(492), 1480-1493.  
  
And fused LASSO pattern recovery as a generalization of the problem:  
  
Qian, J., & Jia, J. (2016). On stepwise pattern recovery of the fused lasso. *Computational Statistics & Data Analysis*, 94, 221-237.

  
---  
  
Essentially, the approach boils down to a set of questions (in no specific order):  
  
(1) Which form of regularization performs best depending on the type of change point problem?  
  
(2) Is an application of such an approach computationally feasible?  
  
(3) What is the relationship between the (unknown) parameters $\sigma^2$ and $k$ and a penalty parameter (like $\lambda$)?  
  
While (3) depends on (1) in a way, (1) itself heavily depends on the exact formulation of the change point problem. For the most general case stated here, the $l_2$-norm is the obvious and likely the best choice. However, if we lose the normality assumption, a rank-based norm will be more appropriate. If we extend to the multivariate case, we want to keep the $l_2$-norm but might want to utilize generalized distances $d_i$ instead of $Y_i$.  
  
Additionally, $Y$ should be decomposed to remove trend, cycles and seasonality before being inspected for change points. Also, while not required, $Y$ can be centered according to $Y_1$ in order to reduce $s$ by one entry, with $s_0=0$.  
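
A sketch of this pre-processing, assuming a seasonal series: `stl()` from base R separates trend and seasonality from the remainder, and subtracting $y_1$ performs the centering. The toy seasonal component and the frequency of 12 are arbitrary choices for illustration.

```{r preprocess}
# Illustrative pre-processing: decompose a (toy) seasonal version of y and
# keep only the remainder, then center at the first observation so s_0 = 0.
# Note: the loess trend in stl() can absorb abrupt level shifts, so the
# decomposition step should be applied with care.
y_season <- ts(y + sin(2 * pi * seq_len(p) / 12), frequency = 12)
dec      <- stl(y_season, s.window = "periodic")
y_clean  <- as.numeric(dec$time.series[, "remainder"])
y_c      <- y - y[1]  # centering according to Y_1
```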
  
### Obtaining the vector s with COMBSS  
  
The algorithm for best subset selection introduced with COMBSS promises fast results for $s$. In a way, we answer (2) first by utilizing the strength of COMBSS, then look into (1), and from that derive (3).  
  
***Idea:***  
  
Define a continuous cost function $c(\lambda,t)=c_\lambda(t)$ by adapting the norm to the given type of change point problem (univariate/multivariate, known/unknown variance, normal/non-normal) and adding a penalty based on the sparsity of $t\in [0,1]^p$, the non-binary version of $s$.
  
*Example:* (multivariate normal case with known variance)  
Let $d$ be the vector of generalized distances of $y$ and $X$ the design matrix as described above. 
  
$$c_\lambda(t)=\frac1p||d-X_t\hat{\beta}_t||^2_2+\lambda\sum_{i=0}^{p-1} t_i$$
Minimize $c_\lambda(t)$ in $t$ for fixed $\lambda$.
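
A sketch of evaluating this cost in the univariate case (taking $d=y$), with $\hat{\beta}_t$ replaced by a plain least-squares fit on $X\cdot diag(t)$; COMBSS itself uses a more careful estimator and minimizes over $t$ with gradient-based methods, so this chunk only illustrates the sparsity trade-off at fixed $t$.

```{r combss-cost}
# Evaluate c_lambda(t) = ||y - X_t beta_hat_t||_2^2 / p + lambda * sum(t).
# lm.fit handles the rank deficiency that occurs when some t_i = 0.
c_lambda <- function(t, y, X, lambda) {
  Xt  <- X %*% diag(t)  # X_t: columns of X scaled by t in [0,1]^p
  fit <- lm.fit(Xt, y)  # least-squares surrogate for beta_hat_t
  sum(fit$residuals^2) / p + lambda * sum(t)
}
t_true <- as.numeric(seq_len(p) %in% c(1, tau + 1))  # binary t at the true support
c(c_lambda(t_true, y, X, lambda = 1),    # small residual + small penalty
  c_lambda(rep(1, p), y, X, lambda = 1)) # zero residual but full penalty lambda * p
```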
  
Open question: (3), and accordingly the selection of $\lambda$.  
---  

Nice to have: $X$ always has full rank.

To Do: Cases and Simulation. Choice of $\beta$ for non-mean problems (rank and variance changes).