---
title: "Efficient Change Point detection with COMBSS"
subtitle: "Investigation on the Change in Mean"
output: pdf_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
  
## Stating the Problem  
  
Let $y_{1:p}$ be realizations of random variables $Y_{1:p}$. Let $\mu_0,\mu_1,...,\mu_k\in\mathbb{R}$ with $Y_{1:\tau_1}\sim_{iid}\mathcal{N}(\mu_0,\sigma^2)$, $Y_{\tau_1+1:\tau_2}\sim_{iid}\mathcal{N}(\mu_1,\sigma^2)$, $...$, $Y_{\tau_k+1:p}\sim_{iid}\mathcal{N}(\mu_k,\sigma^2)$.   
The problem is to reliably identify the parameters $k$, $\tau_{1:k}$ and $\mu_0,\mu_1,...,\mu_k$, ideally with a quantification of uncertainty or an evaluation criterion of some sort.  
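
As a running illustration, the following chunk simulates data from this model; the values of $p$, $\tau_{1:k}$, $\mu_{0:k}$ and $\sigma$ are arbitrary choices, not part of the problem statement.

```{r simulate-data}
# Simulate the change-in-mean model: k = 2 change points at tau = (70, 140),
# segment means mu = (0, 2, -1), common sd sigma = 1 (all values illustrative).
set.seed(1)
p     <- 200
tau   <- c(70, 140)
mu    <- c(0, 2, -1)
sigma <- 1
segment <- findInterval(seq_len(p), tau + 1) + 1  # segment index per observation
y <- rnorm(p, mean = mu[segment], sd = sigma)
plot(y, type = "l", main = "Simulated change in mean")
abline(v = tau, lty = 2)  # true change point locations
```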
  
## Existing Frameworks  
 
**Introduction to theoretical approaches:**  
  
Niu, Y. S., Hao, N., & Zhang, H. (2016). Multiple change-point detection: a selective overview. *Statistical Science*, 611-623.  
  
Approaches vary from hypothesis testing to formulating the problem in a regression framework. However, developing sufficiently effective algorithms to apply the theory is where matters become considerably more complicated.  
  
**Introduction to Algorithm Evaluation:**  
  
van den Burg, G. J., & Williams, C. K. (2020). An evaluation of change point detection algorithms. *arXiv preprint arXiv:2003.06222.*
  
$\implies$ A variety of approaches exist, based mainly on sequential analysis (online detection) or maximum-likelihood estimation.

Example:  
  
(1) CUSUM (cumulative sum control chart) with  
  
$S_0=0$ and $S_{n+1}=\max(0,S_n+y_{n+1}-w_n)$ for samples $y_{1:p}$ and assigned weights $w_n$; changes in the positive direction are detected by the choice of $w_n$.  
  
Page, E. S. (1957). On problems in which a change in a parameter occurs at an unknown point. *Biometrika*, 44(1/2), 248-252.
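
A minimal sketch of this recursion with a constant weight $w_n\equiv w$ and a detection threshold $h$; both constants are illustrative choices, not prescribed by Page (1957).

```{r cusum-sketch}
# One-sided CUSUM: S_0 = 0, S_{n+1} = max(0, S_n + y_{n+1} - w).
# Returns the first index at which S_n exceeds the threshold h (NA if never).
cusum_detect <- function(y, w, h) {
  S <- 0
  for (n in seq_along(y)) {
    S <- max(0, S + y[n] - w)
    if (S > h) return(n)
  }
  NA
}
cusum_detect(y, w = 1, h = 5)  # applied to the simulated series above
```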

(2) An extension of the idea by Page in  
  
Basseville, M., & Nikiforov, I. V. (1993). *Detection of abrupt changes: theory and application* (Vol. 104). Englewood Cliffs: Prentice Hall.  
  
## Univariate Mean Change Point Detection as a Regression Problem  
    
Adapted from Niu, Hao & Zhang (2016):  
  
Let $\beta_0=\mathbb{E}[Y_1]$ and $\beta_j=\mathbb{E}[Y_{j+1}]-\mathbb{E}[Y_{j}]$ for $j=1,...,p-1$.  
  
Then $Y_i$ for $i=1,...,p$ is given by 
$$Y_i=\sum^{i-1}_{j=0}\beta_j + \epsilon_i$$  
with $\epsilon_i\sim_{iid}\mathcal{N}(0,\sigma^2)$.  
  
In matrix form this can be rewritten as  
$$Y=X\beta+\epsilon$$ 
with $X\in\mathbb{R}^{p\times p}$ a lower triangular matrix with all entries equal to $1$.  
  
The closed form solution for the given univariate change point problem is then given by $\beta_0=\mu_0$, $\beta_{\tau_1}=\mu_1-\mu_0$, $...$, $\beta_{\tau_k}=\mu_k-\mu_{k-1}$ and $\beta_j=0$ otherwise.  
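
Continuing the simulated example, the chunk below constructs $X$ and the closed-form $\beta$ and checks that $X\beta$ reproduces the piecewise-constant mean; note that $\beta_0$ becomes `beta[1]` under R's 1-based indexing.

```{r design-matrix}
# Lower triangular design matrix of ones and the sparse closed-form beta:
# beta_0 = mu_0, beta_{tau_i} = mu_i - mu_{i-1}, all other entries zero.
X <- matrix(0, p, p)
X[lower.tri(X, diag = TRUE)] <- 1
beta <- numeric(p)
beta[1]       <- mu[1]       # beta_0
beta[tau + 1] <- diff(mu)    # jumps at the change points
all.equal(as.vector(X %*% beta), mu[segment])  # TRUE: X beta equals E[Y]
```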
  
However, as the parameters are unknown, a regularization approach is applied instead in order to remove all columns of the design matrix $X$ which do not correspond to a change point:  
  
$$||Y-X_s\hat{\beta}_s||+f(\lambda,s)$$  
  
with $s=(s_0, s_1, ..., s_{p-1})\in\{0,1\}^p$, $s_0=1$ and $s_{\tau_i}=1$ for $i=1,..,k$, $X_s=X\cdot diag(s)$, $\hat{\beta}_s$ the MLE under $X_s$, $||\cdot||$ some norm and $f(\lambda,s)$ some penalty.  
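
As a sketch, this objective can be evaluated for a fixed candidate support $s$ using the $l_2$-norm and the $l_0$-style penalty $f(\lambda,s)=\lambda\sum_i s_i$; the value of $\lambda$ and the candidate supports are illustrative.

```{r penalized-objective}
# Residual sum of squares of the least-squares fit on the selected columns,
# plus the sparsity penalty.
objective <- function(y, X, s, lambda) {
  Xs  <- X[, s == 1, drop = FALSE]  # X_s: keep only the selected columns
  fit <- lm.fit(Xs, y)              # MLE (least squares) beta_hat_s
  sum(fit$residuals^2) + lambda * sum(s)
}
s_true  <- as.integer(seq_len(p) %in% c(1, tau + 1))  # true support
s_wrong <- as.integer(seq_len(p) %in% c(1, 50, 120))  # misplaced support
c(objective(y, X, s_true, lambda = 10), objective(y, X, s_wrong, lambda = 10))
```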
  
Previous work has been done on this approach:  
  
Utilizing LASSO regularization with an $l_1$-type penalty for variable selection:  
  
Harchaoui, Z., & Lévy-Leduc, C. (2010). Multiple change-point estimation with a total variation penalty. *Journal of the American Statistical Association*, 105(492), 1480-1493.  
  
And fused LASSO pattern recovery as a generalization of the problem:  
  
Qian, J., & Jia, J. (2016). On stepwise pattern recovery of the fused lasso. *Computational Statistics & Data Analysis*, 94, 221-237.

  
---  
  
Essentially, the approach boils down to a set of questions (in no specific order):  
  
(1) Which form of regularization performs best depending on the type of change point problem?  
  
(2) Is an application of such an approach computationally feasible?  
  
(3) What is the relationship between the (unknown) parameters $\sigma^2$ and $k$ and a penalty parameter (like $\lambda$)?  
  
While (3) depends on (1) in a way, (1) itself heavily depends on the exact formulation of the change point problem. For the most general case stated here, the $l_2$-norm is the obvious and likely the best choice. However, if we lose the normality assumption, a rank-based norm will be more appropriate. If we extend to the multivariate case, we want to keep the $l_2$-norm but might want to utilize generalized distances $d_i$ instead of $Y_i$.  
  
Additionally, $Y$ should be decomposed to remove trend, cycles and seasonality before being inspected for change points. Also, while not required, $Y$ can be centered according to $Y_1$ in order to reduce $s$ by one entry, with $s_0=0$.  
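
A sketch of this pre-processing, assuming a seasonal series: `stl()` from base R separates trend and seasonality from the remainder, and subtracting $y_1$ performs the centering. The toy seasonal component and the frequency of 12 are arbitrary choices for illustration.

```{r preprocess}
# Illustrative pre-processing: decompose a (toy) seasonal version of y and
# keep only the remainder, then center at the first observation so s_0 = 0.
# Note: the loess trend in stl() can absorb abrupt level shifts, so the
# decomposition step should be applied with care.
y_season <- ts(y + sin(2 * pi * seq_len(p) / 12), frequency = 12)
dec      <- stl(y_season, s.window = "periodic")
y_clean  <- as.numeric(dec$time.series[, "remainder"])
y_c      <- y - y[1]  # centering according to Y_1
```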
  
### Obtaining the vector s with COMBSS  
  
The algorithm for best subset selection introduced with COMBSS promises fast results for $s$. In a way, we answer (2) first by utilizing the strength of COMBSS, then look into (1), and from that derive (3).  
  
***Idea:***  
  
Define a continuous cost function $c(\lambda,t)=c_\lambda(t)$ by adapting the norm to the given type of change point problem (univariate/multivariate, known/unknown variance, normal/non-normal) and adding a penalty based on the sparsity of $t\in [0,1]^p$, the non-binary version of $s$.
  
*Example:* (multivariate normal case with known variance)  
Let $d$ be the vector of generalized distances of $y$ and $X$ the design matrix as described above. 
  
$$c_\lambda(t)=\frac1p||d-X_t\hat{\beta}_t||^2_2+\lambda\sum_{i=0}^{p-1} t_i$$
Minimize $c_\lambda(t)$ in $t$ for fixed $\lambda$.
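
A sketch of evaluating this cost in the univariate case (taking $d=y$), with $\hat{\beta}_t$ replaced by a plain least-squares fit on $X\cdot diag(t)$; COMBSS itself uses a more careful estimator and minimizes over $t$ with gradient-based methods, so this chunk only illustrates the sparsity trade-off at fixed $t$.

```{r combss-cost}
# Evaluate c_lambda(t) = ||y - X_t beta_hat_t||_2^2 / p + lambda * sum(t).
# lm.fit handles the rank deficiency that occurs when some t_i = 0.
c_lambda <- function(t, y, X, lambda) {
  Xt  <- X %*% diag(t)  # X_t: columns of X scaled by t in [0,1]^p
  fit <- lm.fit(Xt, y)  # least-squares surrogate for beta_hat_t
  sum(fit$residuals^2) / p + lambda * sum(t)
}
t_true <- as.numeric(seq_len(p) %in% c(1, tau + 1))  # binary t at the true support
c(c_lambda(t_true, y, X, lambda = 1),    # small residual + small penalty
  c_lambda(rep(1, p), y, X, lambda = 1)) # zero residual but full penalty lambda * p
```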
  
Open question: (3), and accordingly the selection of $\lambda$.  
---  

Nice to have: $X$ always has full rank.

To Do: Cases and Simulation. Choice of $\beta$ for non-mean problems (rank and variance changes).