ELEE 5810 Estimation Theory with Applications


Syllabus
Basic Kalman Filter (Due 04/18/2022)
Joshua Cormier, Harjas Dadiyal, Isis Moussan, Geanna Perera
Upload to Blackboard by 19:15, 04/18/2022


Accept Reject Sampling (Due 04/11/2022)
Joshua Cormier, Harjas Dadiyal, Isis Moussan, Geanna Perera
Upload to Blackboard by 19:15, 04/11/2022



Accept Reject method

Maps and Controls (Due 04/11/2022)
Joshua Cormier, Harjas Dadiyal, Isis Moussan, Geanna Perera
Upload to Blackboard by 19:15, 04/11/2022

Here is an example code for State update.
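A minimal Matlab sketch of one possible state update, assuming a linear model \(\mathbf{x}(k+1)=\mathbf{A}\mathbf{x}(k)+\mathbf{B}u(k)+\mathbf{w}(k)\); the matrices, noise level, and control input below are illustrative placeholders, not necessarily the values used in class.

% Minimal sketch of a linear state update (illustrative values only).
A = [1 1; 0 1];                % assumed state transition matrix (position/velocity)
B = [0.5; 1];                  % assumed control input matrix
q = 0.01;                      % assumed process noise variance
x = [0; 0];                    % initial state
u = 1;                         % constant control input
K = 50;                        % number of steps
X = zeros(2, K);               % storage for the state trajectory
for k = 1:K
    w = sqrt(q)*randn(2,1);    % process noise sample ~ N(0, q*I)
    x = A*x + B*u + w;         % state update
    X(:,k) = x;
end
plot(0:K-1, X(1,:)); xlabel('k'); ylabel('x_1(k)');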





Test 1
Joshua Cormier, Harjas Dadiyal, Isis Moussan, Geanna Perera
Upload your solution to the Blackboard DropBox by 19:15, April 4, 2022


Oja's rule appears very similar to LMS rule. $$ \mathbf{w}(n+1)=\mathbf{w}(n)+\eta\, \underbrace{\mathbf{w}^T(n) \mathbf{x}(n)}_{y(n)} \mathbf{x}(n) $$ with \(\eta>0\) a small but positive constant, results in a positive feedback, so \(\mathbf{w}\) will tend to grow without bounds, hence the need to normalize, \( \frac{\mathbf{w}(n)}{||\mathbf{w}(n)||} \rightarrow \mathbf{w}(n) \) $$ \begin{align} \mathbf{w}(n+1) &= \frac{\mathbf{w}(n)+\eta\, y(n) \mathbf{x}(n)}{\left(\left(\mathbf{w}(n)+\eta\, y(n)\mathbf{x}(n) \right)^T\left(\mathbf{w}(n)+\eta\, y(n)\mathbf{x}(n) \right) \right)^{\frac{1}{2}}} \\ &= \frac{\mathbf{w}(n)+\eta\, y(n) \mathbf{x}(n)}{\left(\left(\mathbf{w}^T(n)+\eta\, y(n)\mathbf{x}^T(n) \right)\left(\mathbf{w}(n)+\eta\, y(n)\mathbf{x}(n) \right) \right)^{\frac{1}{2}}} \\ &= \frac{\mathbf{w}(n)+\eta\, y(n) \mathbf{x}(n)}{\left( \mathbf{w}^T(n)\mathbf{w}(n)+2\eta\, y^2(n)+\eta^2\, y^2(n)\mathbf{x}^T(n)\mathbf{x}(n) \right)^{\frac{1}{2}}} \\ \end{align} $$ This guarantees \(||\mathbf{w}(n+1)||=||\mathbf{w}(n)||=\sqrt{\mathbf{w}^T\mathbf{w}}=1\), so \(\mathbf{w}^T\mathbf{w} = 1\). Next, since \(\eta>0\) is small constant, linearize about \(\eta=0\) by using first two terms of Taylor series expansions, that is $$ \begin{align} \mathbf{w}(n+1) &= \mathbf{w}(n+1)\Biggr|_{\eta =0}+\eta \cdot\frac{\partial \mathbf{w}(n+1)}{\partial \eta}\Biggr|_{\eta = 0}+O(\eta^2) \end{align} $$ Find \(\frac{\partial \mathbf{w}(n+1)}{\partial \eta}\) first. $$ \begin{align} \frac{\partial \mathbf{w}(n+1)}{\partial \eta} &= \frac{\partial }{\partial \eta}\left[ \frac{\mathbf{w}(n)+\eta\, y(n) \mathbf{x}(n)}{\left( \mathbf{w}^T(n)\mathbf{w}(n)+2\eta\, y^2(n)+\eta^2\, y^2(n)\mathbf{x}^T(n)\mathbf{x}(n) \right)^{\frac{1}{2}}} \right] \\ &= \frac{\left( \mathbf{w}^T(n)\mathbf{w}(n)+2\eta\, y^2(n)+\eta^2\, y^2(n)\mathbf{x}^T(n)\mathbf{x}(n) \right)^{\frac{1}{2}} \cdot \left( y(n)\mathbf{x}(n) \right) -\left(\mathbf{w}(n)+\eta\, y(n)\mathbf{x}(n)\right)\cdot \frac{1}{2}\left( \mathbf{w}^T(n)\mathbf{w}(n)+2\eta\, y^2(n)+\eta^2\, y^2(n)\mathbf{x}^T(n)\mathbf{x}(n) \right)^{-\frac{1}{2}} \cdot \left( 2 y^2(n)+2\eta\, y^2(n)\mathbf{x}^T(n)\mathbf{x}(n) \right) }{\mathbf{w}^T(n)\mathbf{w}(n)+2\eta\, y^2(n)+\eta^2\, y^2(n)\mathbf{x}^T(n)\mathbf{x}(n)} \\ \frac{\partial \mathbf{w}(n+1)}{\partial \eta}\Biggr|_{\eta=0} &= \frac{\left( \mathbf{w}^T(n)\mathbf{w}(n) \right)^{\frac{1}{2}} \cdot \left( y(n)\mathbf{x}(n) \right) -\left(\mathbf{w}(n) \right)\cdot \left( \mathbf{w}^T(n)\mathbf{w}(n) \right)^{-\frac{1}{2}} \cdot \left( y^2(n)\right) }{\mathbf{w}^T(n)\mathbf{w}(n)} \\ \frac{\partial \mathbf{w}(n+1)}{\partial \eta}\Biggr|_{\eta=0,\, \mathbf{w}^T\mathbf{w}=1} &= y(n)\left[ \mathbf{x}(n)-y(n)\mathbf{w}(n) \right] \end{align} $$ Finally $$ \begin{align} \mathbf{w}(n+1) &= \mathbf{w}(n+1)\Biggr|_{\eta =0}+\eta \cdot\frac{\partial \mathbf{w}(n+1)}{\partial \eta}\Biggr|_{\eta = 0}+O(\eta^2) \\ &\approx \mathbf{w}(n)+\eta\cdot y(n)\left[ \mathbf{x}(n)-y(n)\mathbf{w}(n) \right] \,\,\, \checkmark \,\,\, \text{ Erkki Oja's rule} \end{align} $$
  1. Unlike LMS or RLS, there is no external reference signal or a desired response \(d(n)\), hence bombastically referred to as "unsupervised learning".
  2. The weight vector \(\mathbf{w}\) converges to the eigenvector of \(\mathbf{R}=\mathbb{E}\{\mathbf{x}(n) \mathbf{x}^T(n)\}\) corresponding to the largest eigenvalue. \(\boxed{\mathbf{w}=\lim\limits_{n\rightarrow \infty} \mathbf{w}(n)}\) is aka the "principal component".
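A minimal Matlab sketch of Oja's rule as derived above, run on synthetic two-dimensional data; the matrix \(\mathbf{R}\), the step size \(\eta\), and the iteration count are illustrative choices. The final \(\mathbf{w}\) should match the principal eigenvector of \(\mathbf{R}\) up to sign.

% Sketch: Oja's rule on synthetic 2-D data (illustrative parameters).
rng(0);
R   = [3 1; 1 2];              % assumed true autocorrelation matrix
L   = chol(R, 'lower');        % so that x(n) = L*randn has E[x x^T] = R
eta = 0.01;                    % small positive step size
w   = randn(2,1);  w = w/norm(w);
for n = 1:5000
    x = L*randn(2,1);          % zero-mean data with autocorrelation R
    y = w'*x;                  % y(n) = w^T(n) x(n)
    w = w + eta*y*(x - y*w);   % Oja's rule update
end
[V, D] = eig(R);               % principal eigenvector for comparison
[~, i] = max(diag(D));
disp([w, V(:,i)])              % columns should agree up to sign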

LMS and RLS assignment (due March 07 2022)

The purpose of this assignment is to get familiar with the following:
  1. Details of implementation of the Least Mean Square (LMS) rule used in system identification. $$ \mathbf{w}_{n+1}=\mathbf{w}_n+\mu e(n)\mathbf{x}_n $$
  2. The effect of step size \(\mu\) along with eigenvalue spread of \(\mathbf{R}\).
  3. Idea of Stochastic Gradient Descent (SGD).
  4. Mean Square Error (MSE) surface, that is, a positive definite quadratic form: $$ \begin{align} \mathbb{E}\left[\left(d(n)-\mathbf{w}^T\mathbf{x}\right)^2\right] &= \mathbb{E}\left[d^2(n) \right]-2\mathbf{w}^T\mathbb{E}\left[ d(n)\mathbf{x}\right]+\mathbf{w}^T\mathbb{E}\left[\mathbf{xx}^T \right]\mathbf{w} \\ &= \sigma^2_d(n) -2\mathbf{w}^T\mathbf{p}+\mathbf{w}^T\mathbf{Rw} \end{align} $$
  5. Autocorrelation matrix \(\mathbf{R}=\mathbb{E}[\mathbf{xx}^T]\) and the cross-correlation vector \(\mathbf{p}=\mathbb{E}[d(n)\mathbf{x}]\) and their time-averaged estimates.
  6. Details of implementation of the conventional Recursive Least Square (RLS) adaptive filtering algorithm.
    INITIALIZE at \(n=0\)
    $$ \begin{align} \mathbf{w}(0) &= \mathbf{x}(0)=\mathbf{0} \\ \mathbf{P}(0) &= \delta \mathbf{I}, \,\,\, \delta \gg 1 \end{align} $$ REPEAT for \(n=1\) to \(n=\) enough
    1. Fetch \(d(n), \, \mathbf{x}(n)\)
    2. Calculate $$ \begin{align} e(n|n-1)&=d(n)-\mathbf{x}^T(n)\mathbf{w}(n-1) & \text{calculate prediction error}\\ \mu(n) &= \mathbf{x}^T(n)\mathbf{P}(n-1)\mathbf{x}(n) & \text{positive definite quadratic form} \\ \mathbf{g}(n) &= \frac{\mathbf{P}(n-1)\mathbf{x}(n)}{\lambda +\mu(n)} & \text{gain} \\ \mathbf{w}(n) &= \mathbf{w}(n-1)+\mathbf{g}(n)e(n|n-1) & \text{weight update} \\ \mathbf{P}(n) &= \frac{1}{\lambda}\left[ \mathbf{P}(n-1)-\mathbf{g}(n)\mathbf{x}^T(n)\mathbf{P}(n-1) \right] & \text{update matrix inverse} \end{align} $$
    Return to 1.
  7. Tracking of time varying linear system with LMS and RLS.

  • Upload to Blackboard only one zip folder with correctly named files.
  • Written work should be in pdf (preferably \(\LaTeX \)), no microsoft word files please.
  • Do not email this assignment.
  • It would be awesome if you could collaborate with others in the class.
  • I will answer questions related to this assignment till noon, Friday, March 15, 2022. No late questions over the weekend.

LMS/RLS Assignment (due 03/15/2022)
Joshua Cormier, Harjas Dadiyal, Isis Moussan, Geanna Perera
Upload to Blackboard by 03/07/2022


The goal here is to iteratively compute the solution \(\boxed{\mathbf{w}^*=\mathbf{R}^{-1}\mathbf{p}}\) to $$\min_{\mathbf{w}} \,\,\, \epsilon(\mathbf{w})=\sigma^2_d(n)-2\mathbf{w}^T\mathbf{p}+\mathbf{w}^T\mathbf{R}\mathbf{w} $$ However, instead of stochastic gradient descent, which uses instantaneous estimates, we
  1. Estimate \( \mathbf{R}\) from the data \(\mathbf{x}(n)\) directly using sample autocorrelation matrix $$ \begin{align} \Phi(n) &= \sum_{i=1}^n \lambda^{n-i} \mathbf{x}(i)\mathbf{x}^T(i) & 0<\lambda \leq 1 \text{ forgetting factor} \\ &= \lambda \underbrace{\left[ \sum_{i=1}^{n-1}\lambda^{n-1-i}\mathbf{x}(i)\mathbf{x}^T(i) \right]}_{\Phi(n-1)}+\mathbf{x}(n)\mathbf{x}^T(n) \end{align} $$ And analogously $$ \mathbf{p}(n)=\sum_{i=1}^n \lambda^{n-i}d(i)\mathbf{x}(i) $$
  2. Use matrix inversion lemma $$(A-BD^{-1}C)^{-1} = A^{-1}+A^{-1}B(D-CA^{-1}B)^{-1}CA^{-1}$$ to approximate \( \Phi^{-1} \) iteratively, by making the following substitutions:
    1. \(A=\lambda \Phi(n-1)\)
    2. \( B=\mathbf{x}(n) \)
    3. \( C=\mathbf{x}^T(n) \)
    4. \( D=-1 \)
  3. to obtain the recursion $$ \begin{align} \Phi^{-1}(n) &= \left( \lambda \Phi(n-1)+\mathbf{x}(n)\mathbf{x}^T(n) \right)^{-1} \\ &= \lambda^{-1}\Phi^{-1}(n-1)+ \lambda^{-2}\Phi^{-1}(n-1)\mathbf{x}(n)\left[-1- \lambda^{-1}\mathbf{x}^T(n)\Phi^{-1}(n-1)\mathbf{x}(n)\right]^{-1} \mathbf{x}^T(n) \Phi^{-1}(n-1) \\ &= \lambda^{-1}\Phi^{-1}(n-1)- \lambda^{-2}\Phi^{-1}(n-1)\mathbf{x}(n)\left[1+ \lambda^{-1}\mathbf{x}^T(n)\Phi^{-1}(n-1)\mathbf{x}(n)\right]^{-1} \mathbf{x}^T(n) \Phi^{-1}(n-1) \\ &= \lambda^{-1}\Phi^{-1}(n-1)- \frac{\lambda^{-2}\Phi^{-1}(n-1)\mathbf{x}(n)\mathbf{x}^T(n)\Phi^{-1}(n-1)}{1+ \lambda^{-1}\mathbf{x}^T(n)\Phi^{-1}(n-1)\mathbf{x}(n)} & \text{ recursive matrix inversion in }O(n^2) \text{ steps}\\ &= \frac{1}{\lambda}\left[ \Phi^{-1}(n-1)- \frac{{\color{blue}{\Phi^{-1}(n-1)\mathbf{x}(n)}}\mathbf{x}^T(n)\Phi^{-1}(n-1)}{{\color{blue}{\lambda+ \mathbf{x}^T(n)\Phi^{-1}(n-1)\mathbf{x}(n)}}} \right] & {\color{red}{\text{main recursion}}} \end{align} $$
Next, let's name some terms to simplify further work. $$ \begin{align} \mu(n) &= \mathbf{x}^T(n)\Phi^{-1}(n-1)\mathbf{x}(n) & \text{quadratic form, positive definite, scalar} \\ \mathbf{P}(n) &= \Phi^{-1}(n) & \\ {\color{blue}{\mathbf{g}(n)}} &= {\color{blue}{\frac{\mathbf{P}(n-1)\mathbf{x}(n)}{\lambda +\mu(n)}}} & N\times 1 {\color{blue}{\text{ gain vector}}} \\ \mathbf{P}(n) &= \frac{1}\lambda\left[ \mathbf{P}(n-1)-\mathbf{g}(n)\mathbf{x}^T(n)\mathbf{P}(n-1)\right] & {\color{red}{\text{main recursion}}} \\ {\color{blue}{\lambda\mathbf{g}(n) +\mathbf{g}(n)\underbrace{\mathbf{x}^T(n)\mathbf{P}(n-1)\mathbf{x}(n)}_{\mu(n)}}} &= {\color{blue}{\mathbf{P}(n-1)\mathbf{x}(n)}} & {\color{blue}{\text{rewriting gain vector}}}\\ {\color{blue}{\lambda\mathbf{g}(n)}} &= {\color{blue}{\mathbf{P}(n-1)\mathbf{x}(n)-\mathbf{g}(n)\mathbf{x}^T(n)\mathbf{P}(n-1)\mathbf{x}(n)}} \\ {\color{blue}{\mathbf{g}(n)}} &= {\color{blue}{\underbrace{\frac{1}\lambda\left[ \mathbf{P}(n-1)-\mathbf{g}(n)\mathbf{x}^T(n)\mathbf{P}(n-1)\right]}_{\mathbf{P}(n)} \mathbf{x}(n)}} \\ \mathbf{g}(n) &= \mathbf{P}(n)\mathbf{x}(n)=\Phi^{-1}(n)\mathbf{x}(n) & \text{important for further analysis} \end{align} $$ Recall from above the cross-correlation vector $$ \mathbf{p}(n)=\sum_{i=1}^n \lambda^{n-i}d(i)\mathbf{x}(i)=\lambda\mathbf{p}(n-1)+d(n)\mathbf{x}(n) $$ and substitute into \(\boxed{\mathbf{w}(n)=\mathbf{R}^{-1} \mathbf{p}(n) }\) to find a recursive expression for \(\mathbf{w}(n)\), $$ \begin{align} \mathbf{w}(n) &= \frac{1}{\lambda}\left[ \mathbf{P}(n-1)-\mathbf{g}(n)\mathbf{x}^T(n)\mathbf{P}(n-1) \right]\cdot \left[ \lambda \mathbf{p}(n-1)+d(n)\mathbf{x}(n) \right] \\ &= \mathbf{P}(n-1)\mathbf{p}(n-1)-\mathbf{g}(n)\mathbf{x}^T(n)\mathbf{P}(n-1)\mathbf{p}(n-1) + \frac{1}{\lambda} \left[ \mathbf{P}(n-1)\mathbf{x}(n)-\mathbf{g}(n)\mathbf{x}^T(n)\mathbf{P}(n-1)\mathbf{x}(n) \right]d(n) \end{align} $$ now use \( \mathbf{p}(n)=\sum_{i=1}^n \lambda^{n-i}d(i)\mathbf{x}(i) \) to obtain $$ \begin{align} \mathbf{w}(n) &= \mathbf{w}(n-1)-\mathbf{g}(n)\mathbf{x}^T(n)\mathbf{w}(n-1)+\frac{1}{\lambda}\left[ \lambda \mathbf{g}(n)+\mu(n) \mathbf{g}(n)-\mathbf{g}(n)\mu(n) \right] d(n) \\ &= \mathbf{w}(n-1) + \mathbf{g}(n)\underbrace{\left[ d(n)-\mathbf{x}^T(n)\mathbf{w}(n-1) \right]}_{e(n|n-1)} \\ &= \mathbf{w}(n-1) + \mathbf{g}(n)e(n|n-1)\,\, \checkmark \end{align} $$
RLS algorithm summary:
INITIALIZE at \(n=0\)
$$ \begin{align} \mathbf{w}(0) &= \mathbf{x}(0)=\mathbf{0} \\ \mathbf{P}(0) &= \delta \mathbf{I}, \,\,\, \delta \gg 1 \end{align} $$ REPEAT for \(n=1\) to \(n=\)enough
  1. Fetch \(d(n), \, \mathbf{x}(n)\)
  2. Calculate prediction error $$ e(n|n-1)=d(n)-\mathbf{x}^T(n)\mathbf{w}(n-1)$$
  3. Calculate \(\mu(n)\) and the gain $$ \begin{align} \mu(n) &= \mathbf{x}^T(n)\mathbf{P}(n-1)\mathbf{x}(n) \\ \mathbf{g}(n) &= \frac{\mathbf{P}(n-1)\mathbf{x}(n)}{\lambda +\mu(n)} \end{align} $$
  4. Update the weights $$ \mathbf{w}(n)= \mathbf{w}(n-1)+\mathbf{g}(n)e(n|n-1)$$
  5. Update the matrix inverse $$ \mathbf{P}(n)=\frac{1}{\lambda}\left[ \mathbf{P}(n-1)-\mathbf{g}(n)\mathbf{x}^T(n)\mathbf{P}(n-1) \right] $$
Return to 1.
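A minimal Matlab sketch of the loop above, applied to system identification; the "true" weight vector, noise level, forgetting factor, and \(\delta\) are illustrative choices.

% Sketch of the RLS summary above for identifying an unknown FIR system.
rng(0);
N      = 5;                        % number of adaptive weights
lambda = 0.99;                     % forgetting factor
delta  = 1e4;                      % delta >> 1 for P(0) = delta*I
w_true = [0.5; -0.4; 0.3; -0.2; 0.1];   % assumed unknown system
w    = zeros(N,1);                 % w(0) = 0
P    = delta*eye(N);               % P(0) = delta*I
xbuf = zeros(N,1);                 % tapped delay line holding x(n)..x(n-N+1)
for n = 1:2000
    xbuf = [randn; xbuf(1:N-1)];   % fetch new x(n), shift the delay line
    d  = w_true'*xbuf + 0.01*randn;      % desired response d(n), slightly noisy
    e  = d - xbuf'*w;                    % prediction error e(n|n-1)
    mu = xbuf'*P*xbuf;                   % quadratic form mu(n)
    g  = (P*xbuf)/(lambda + mu);         % gain vector g(n)
    w  = w + g*e;                        % weight update
    P  = (P - g*(xbuf'*P))/lambda;       % update of the matrix inverse P(n)
end
disp([w, w_true])                        % estimated vs. true weights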

123 RLS and LMS steps for comparison.

RLS Circuit
Wiring block diagram for RLS with 5 coefficients.

The goal is to iteratively minimize the Mean Squared Error in the following scenario (to be concrete)
System Identification
Note that here the \(\mathbb{E}\)xpectation is with respect to \(\mathbf{x}(n)\), while the minimization is with respect to \(\mathbf{w}\). $$ \begin{align} \epsilon(\mathbf{w}) = \mathbb{E}\left[ e^2(n) \right] &= \mathbb{E}\left[\left(d(n)-\mathbf{w}^T\mathbf{x}(n)\right)^2\right]\\ &= \mathbb{E}\left[ d^2(n)-2\mathbf{w}^Td(n)\mathbf{x}(n)+\mathbf{w}^T\mathbf{x}(n)\mathbf{x}^T(n)\mathbf{w} \right] \\ &= \mathbb{E}\left[ d^2(n) \right]-2\mathbf{w}^T\mathbb{E}\left[ d(n)\mathbf{x}(n) \right] + \mathbf{w}^T \mathbb{E}\left[ \mathbf{x}(n)\mathbf{x}^T(n) \right] \mathbf{w} \\ &= \underbrace{\sigma^2_d(n)}_{\text{constant}}-\underbrace{2\mathbf{w}^T\mathbf{p}}_{\text{linear in }\mathbf{w}}+\underbrace{\mathbf{w}^T\mathbf{R}\mathbf{w}}_{\text{quadratic in }\mathbf{w}} \end{align} $$ The three terms appearing in this objective function are:
  1. \( \mathbb{E}\left[ d^2(n) \right]=\sigma^2_d(n)\) is the power of the desired response.
  2. \(\mathbf{p}=\mathbb{E}\left[ d(n)\mathbf{x}(n) \right]\) is the \(N\times 1\) cross-correlation vector between the scalar \(d(n)\) and the vector \(\mathbf{x}(n)\). $$ \mathbf{p}=\left[\begin{array}{c} \mathbb{E}\left[d(n)x(n)\right] \\ \mathbb{E}\left[d(n)x(n-1)\right] \\ \vdots \\ \mathbb{E}\left[d(n)x(n-N+1)\right] \end{array}\right] =\left[\begin{array}{c}\phi_{xd}(0) \\ \phi_{xd}(1) \\ \vdots \\ \phi_{xd}(N-1)\end{array}\right] $$
  3. \(\mathbf{R}=\mathbb{E}\left[ \mathbf{x}(n)\mathbf{x}^T(n) \right]\) is an \(N\times N\) auto-covariance matrix of the process \(\mathbf{x}(n)\). $$ \begin{align} \mathbf{R}=\mathbb{E}\left[ \mathbf{x}(n)\mathbf{x}^T(n) \right] &= \mathbb{E} \left[\begin{array}{c}x(n) \\ x(n-1) \\ \vdots \\ x(n-N+1)\end{array}\right] \left[\begin{array}{cccc}x(n) & x(n-1) & \cdots & x(n-N+1)\end{array}\right] \\ &= \mathbb{E}\left[\begin{array}{cccc} x(n)x(n) & x(n)x(n-1) & \cdots & x(n)x(n-N+1) \\ x(n-1)x(n) & x(n-1)x(n-1) & \cdots & x(n-1)x(n-N+1) \\ \vdots & \vdots & \ddots & \vdots \\ x(n-N+1)x(n) & x(n-N+1)x(n-1) & \cdots & x(n-N+1)x(n-N+1) \end{array}\right]\\ &= \left[\begin{array}{cccc} \phi_{xx}(0) & \phi_{xx}(1) & \cdots & \phi_{xx}(N-1) \\ \phi_{xx}(1) & \phi_{xx}(0) & \cdots & \phi_{xx}(N-2) \\ \vdots & \vdots & \ddots & \vdots \\ \phi_{xx}(N-1) & \phi_{xx}(N-2) & \cdots & \phi_{xx}(0) \end{array}\right] \end{align} $$
  4. Properties of \(\mathbf{R}\) used in the analysis of LMS algorithm
    1. \(\mathbf{R}\) is symmetric, that is \(\boxed{\mathbf{R}^T=\mathbf{R}}\). Apply \( (\mathbf{AB})^T=\mathbf{B}^T\mathbf{A}^T\) to \(\mathbf{R}=\mathbb{E}\left[ \mathbf{x}(n)\mathbf{x}^T(n) \right]\).
    2. All eigenvalues of \(\mathbf{R}\) are real. $$ \begin{align} \lambda_i\mathbf{v}_i &= \mathbf{Rv}_i & \text{conjugate transpose both sides} \\ \lambda_i^*\mathbf{v}_i^H &= \mathbf{v}_i^H\mathbf{R} & \text{right multiply both sides by } \mathbf{v}_i \\ \lambda_i^*\mathbf{v}_i^H \mathbf{v}_i &= \mathbf{v}_i^H\mathbf{R}\mathbf{v}_i & \\ \lambda_i^*\mathbf{v}_i^H \mathbf{v}_i &= \lambda_i \mathbf{v}_i^H\mathbf{v}_i & \\ \lambda_i^*||\mathbf{v}_i||^2 &= \lambda_i||\mathbf{v}_i||^2 & \text{note that } ||\mathbf{v}_i||\neq 0 & \\ \lambda_i^* &= \lambda_i & \text{ hence \(\lambda_i\) is real}\checkmark \end{align} $$
    3. Eigenvectors \(\left\{ \mathbf{v}_i,\mathbf{v}_j\right\}_{i\neq j}\) of \(\mathbf{R}\) corresponding to distinct eigenvalues \(\lambda_i\neq \lambda_j\), are orthogonal, \( \mathbf{v}_i \bot \mathbf{v}_j \), or \(\mathbf{v}_i^T \mathbf{v}_j=0\). $$ \begin{align} \lambda_i\mathbf{v}_i &= \mathbf{Rv}_i & \text{transpose both sides} \\ \lambda_i\mathbf{v}_i^T &= \mathbf{v}_i^T\mathbf{R} & \text{right multiply both sides by } \mathbf{v}_j \\ \lambda_i\mathbf{v}_i^T\mathbf{v}_j &= \mathbf{v}_i^T\mathbf{R}\mathbf{v}_j & \\ \lambda_i\mathbf{v}_i^T\mathbf{v}_j &= \lambda_j\mathbf{v}_i^T\mathbf{v}_j & \\ (\lambda_i-\lambda_j)\mathbf{v}_i^T\mathbf{v}_j &= 0 & \text{but \(\lambda_i\neq\lambda_j\), hence } \mathbf{v}_i \bot \mathbf{v}_j \end{align} $$
    4. All eigenvalues of \(\mathbf{R}\) are nonnegative, which is equivalent to \( \boxed{\mathbf{u}^T\mathbf{R}\mathbf{u}\geq 0, \,\, \forall \mathbf{u}\in\mathbb{R}^N}\) $$ \begin{align} \mathbf{u}^T\mathbf{R}\mathbf{u} &= \mathbf{u}^T\mathbb{E}\left[ \mathbf{x}(n)\mathbf{x}^T(n)\right]\mathbf{u} \\ &= \mathbb{E}\left[ \mathbf{u}^T\mathbf{x}(n)\mathbf{x}^T(n)\mathbf{u} \right] \\ &= \mathbb{E}\left[ \left(\mathbf{x}^T(n)\mathbf{u}\right)^2 \right] \geq 0 \end{align} $$ In practice \(\mathbf{x}^T(n)\mathbf{u}\neq 0\) with nonzero probability for every \(\mathbf{u}\neq\mathbf{0}\), so the inequality is strict and \(\mathbf{R}\) is positive definite.
    5. Diagonalization \(\mathbf{R}=\mathbf{M\Lambda M}^T\) $$ \begin{align} \lambda_1 \mathbf{v}_1 &= \mathbf{Rv}_1 \\ \left[ \lambda_1\mathbf{v}_1 \lambda_2\mathbf{v}_2 \right] &= \mathbf{R} \left[ \mathbf{v}_1 \mathbf{v}_2 \right]\\ \underbrace{\left[\begin{array}{cccc}| & | & & | \\ \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_N \\ | & | & & |\end{array}\right]}_{\mathbf{M}} \underbrace{\left[\begin{array}{cccc}\lambda_1 & 0 & \cdots & 0 \\0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\0 & 0 & \cdots & \lambda_N\end{array}\right]}_{\mathbf{\Lambda}} &= \mathbf{R}\underbrace{\left[\begin{array}{cccc}| & | & & | \\ \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_N \\ | & | & & |\end{array}\right]}_{\mathbf{M}} \\ \mathbf{M\Lambda} &= \mathbf{RM} \\ \mathbf{\Lambda} &= \mathbf{M}^{-1}\mathbf{RM} \hspace{6mm} \text{Diagonalization of } \mathbf{R}\\ \mathbf{\Lambda} &= \mathbf{M}^T\mathbf{RM} \hspace{4mm} \boxed{\mathbf{M}^T=\mathbf{M}^{-1} \text{ because } \mathbf{v}_j^T\mathbf{v}_i=0 \text{ for } i\neq j} \\ \mathbf{M\Lambda M}^T &= \mathbf{R} \hspace{19mm} \text{rewritten as stated in the following property} \end{align} $$
    6. Spectral representation $$ \begin{align} \mathbf{R} &= \mathbf{M\Lambda M}^T \\ &= \underbrace{\left[\begin{array}{cccc}| & | & & | \\ \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_N \\ | & | & & |\end{array}\right]}_{\mathbf{M}} \underbrace{\left[\begin{array}{cccc}\lambda_1 & 0 & \cdots & 0 \\0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\0 & 0 & \cdots & \lambda_N\end{array}\right]}_{\mathbf{\Lambda}} \underbrace{\left[\begin{array}{ccc}\frac{\phantom{xyz}}{\phantom{xyz}} & \mathbf{v}_1^T & \frac{\phantom{xyz}}{\phantom{xyz}} \\ \frac{\phantom{xyz}}{\phantom{xyz}} & \mathbf{v}_2^T & \frac{\phantom{xyz}}{\phantom{xyz}} \\ & \vdots & \\ \frac{\phantom{xyz}}{\phantom{xyz}} & \mathbf{v}_N^T & \frac{\phantom{xyz}}{\phantom{xyz}}\end{array}\right]}_{\mathbf{M}^T} \\ &= \lambda_1\mathbf{v}_1\mathbf{v}_1^T+\lambda_2\mathbf{v}_2\mathbf{v}_2^T+\cdots \lambda_N\mathbf{v}_N\mathbf{v}_N^T \hspace{8mm} \text{ sum of rank 1 projections } \checkmark \\ &= \sum_{i=1}^N \lambda_i\mathbf{v}_i\mathbf{v}_i^T \end{align} $$
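A short Matlab sketch that numerically checks the diagonalization and the spectral representation on a small symmetric Toeplitz matrix; the \(\phi_{xx}\) values are illustrative.

% Sketch: check R = M*Lambda*M' and the sum of rank-1 projections.
R = toeplitz([1.0 0.6 0.3]);     % assumed phi_xx(0), phi_xx(1), phi_xx(2)
[M, Lam] = eig(R);               % columns of M are the eigenvectors
disp(norm(M*Lam*M' - R))         % diagonalization: should be ~0
disp(norm(M'*M - eye(3)))        % M^T = M^{-1}: should be ~0
S = zeros(3);
for i = 1:3
    S = S + Lam(i,i)*M(:,i)*M(:,i)';   % sum of rank-1 projections
end
disp(norm(S - R))                % spectral representation: should be ~0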

In applications, the true values of \(\phi_{xx}(m)\) and \(\phi_{xd}(m)\) are unknown, and must be estimated from the available data.
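A minimal Matlab sketch of such time-averaged estimates in a hypothetical system-identification setup; the generating filter b_true and the record length are illustrative. The estimates \(\hat{\mathbf{R}}\) and \(\hat{\mathbf{p}}\) then give \(\mathbf{w}^*\approx \hat{\mathbf{R}}^{-1}\hat{\mathbf{p}}\).

% Sketch: time-averaged estimates of R and p from a finite data record.
rng(0);
N = 3;                               % number of lags / weights
K = 10000;                           % record length
x = randn(K,1);                      % white input data
b_true = [0.8; -0.5; 0.2];           % assumed unknown system
d = filter(b_true, 1, x) + 0.01*randn(K,1);   % desired response d(n)
R_hat = zeros(N);  p_hat = zeros(N,1);
for n = N:K
    xn = x(n:-1:n-N+1);              % [x(n); x(n-1); x(n-2)]
    R_hat = R_hat + xn*xn';          % accumulate x(n) x^T(n)
    p_hat = p_hat + d(n)*xn;         % accumulate d(n) x(n)
end
R_hat = R_hat/(K-N+1);               % time-averaged estimate of E[x x^T]
p_hat = p_hat/(K-N+1);               % time-averaged estimate of E[d(n) x]
disp([R_hat\p_hat, b_true])          % w* = R^{-1} p should be close to b_true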

Quadratic Forms: on left \(\lambda\)'s fixed, \(\{\mathbf{v}_1,\mathbf{v}_2\}\) change, on right vice versa.

System Identification

To find the optimal weights \(\mathbf{w}^*\) such that MSE is minimized, that is \(\epsilon(\mathbf{w}^*)=\epsilon_{\text{min}}\), need to solve $$ \nabla \epsilon(\mathbf{w})= \left[\begin{array}{c} \frac{\partial \epsilon(\mathbf{w})}{\partial w_1} \\ \frac{\partial \epsilon(\mathbf{w})}{\partial w_2} \\ \vdots \\ \frac{\partial \epsilon(\mathbf{w})}{\partial w_N} \end{array}\right]= \left[\begin{array}{c} 0 \\ 0 \\ \vdots \\ 0 \end{array}\right] $$ Consider the \(N=2\) case, with \(\mathbf{w} = \left[ \begin{array}{c}w_1\\w_2 \end{array} \right]\) $$ \begin{align} \epsilon(\mathbf{w}) &= \sigma^2_d(n)-2\mathbf{w}^T\mathbf{p}+\mathbf{w}^T\mathbf{R}\mathbf{w} & \text{objective to minimize}\\ &= \sigma^2_d(n)-2 \left[\begin{array}{cc}w_1 & w_2\end{array}\right] \left[\begin{array}{c}\phi_{xd}(0) \\ \phi_{xd}(1)\end{array}\right] +\left[\begin{array}{cc}w_1 & w_2\end{array}\right] \left[\begin{array}{cc}\phi_{xx}(0) & \phi_{xx}(1) \\ \phi_{xx}(1) & \phi_{xx}(0)\end{array}\right] \left[\begin{array}{c}w_1 \\ w_2\end{array}\right] & \\ &= \sigma^2_d(n)-2w_1\phi_{xd}(0)-2w_2\phi_{xd}(1)+ w_1^2\phi_{xx}(0)+2w_1w_2\phi_{xx}(1)+w_2^2\phi_{xx}(0) & \\ \frac{\partial \epsilon(\mathbf{w})}{\partial w_1} &= -2\phi_{xd}(0)+2w_1\phi_{xx}(0)+2w_2\phi_{xx}(1)=0 & \text{ set first partials to }0\\ \frac{\partial \epsilon(\mathbf{w})}{\partial w_2} &= -2\phi_{xd}(1)+2w_1\phi_{xx}(1)+2w_2\phi_{xx}(0)=0 & \text{and rearrange}\\ \phi_{xd}(0) &= w_1\phi_{xx}(0)+w_2\phi_{xx}(1) & {\color{red}{\boxed{\nabla \epsilon(\mathbf{w})= -2\mathbf{p}+2\mathbf{Rw}}}}\\ \phi_{xd}(1) &= w_1\phi_{xx}(1)+w_2\phi_{xx}(0) & \text{in matrix form} \\ \left[\begin{array}{c}\phi_{xd}(0) \\ \phi_{xd}(1)\end{array}\right] &= \left[\begin{array}{cc}\phi_{xx}(0) & \phi_{xx}(1) \\ \phi_{xx}(1) & \phi_{xx}(0)\end{array}\right] \left[\begin{array}{c}w_1^* \\w_2^*\end{array}\right] & \text{denote optimal weights} \\ \mathbf{p} &= \mathbf{Rw}^* & \\ \mathbf{w}^* &= \mathbf{R}^{-1}\mathbf{p} & \text{provided } \det(\mathbf{R})=\prod_{i=1}^N\lambda_i\neq 0 \end{align} $$ From the last line, the optimal weights \(\mathbf{w}^*\) exist iff \(\lambda_i\neq 0\) for \(1\leq i \leq N\). It was shown above that \(\lambda_i\geq 0\) for \(1\leq i \leq N\), so \(\mathbf{w}^*\) exists precisely when \(\lambda_i> 0\) for \(1\leq i \leq N\), that is when \(\mathbf{R}\succ 0\), aka positive-definite.
To find \(\epsilon_{\text{min}}\), substitute $$ \begin{align} \epsilon_{\text{min}} = \epsilon(\mathbf{w}^*) &= \sigma^2_d(n)-2\left(\mathbf{R}^{-1}\mathbf{p}\right)^T\mathbf{p}+\left(\mathbf{R}^{-1}\mathbf{p} \right)^T\mathbf{R}\mathbf{R}^{-1}\mathbf{p} &\\ &= \sigma^2_d(n)-2\mathbf{p}^T\mathbf{R}^{-1}\mathbf{p}+\mathbf{p}^T \mathbf{R}^{-1}\mathbf{R}\mathbf{R}^{-1}\mathbf{p} & \boxed{\left(\mathbf{R}^{-1}\right)^T=\left(\mathbf{R}^T\right)^{-1}} \text{ one line exercise.} \\ &= \sigma^2_d(n)- 2\mathbf{p}^T\mathbf{w}^*+\mathbf{p}^T\mathbf{w}^* & \boxed{\mathbf{I}=\left(\mathbf{R}^{-1}\mathbf{R} \right)^T = \mathbf{R}^T\left( \mathbf{R}^{-1}\right)^T =\mathbf{R}\left(\mathbf{R}^{-1}\right)^T}\\ &= \sigma^2_d(n)- \mathbf{p}^T\mathbf{w}^* & \text{the smallest MSE we can hope for.} \end{align} $$
The solution \(\boxed{\mathbf{w}^*=\mathbf{R}^{-1}\mathbf{p}}\) is of "analytical" importance. Applications require iterative solutions. One approach is to apply deterministic gradient descent, that is: $$ \underbrace{\mathbf{w}(n+1)}_{\text{next }\mathbf{w}}=\underbrace{\mathbf{w}(n)}_{\text{current }\mathbf{w}}\underbrace{-\mu \nabla \epsilon(\mathbf{w}(n))}_{\text{update}} $$ Here \( \mu >0\) is a scaling factor, aka stepsize.

Deterministic Gradient Descent, choose \(\mu\) wisely, or else.
To choose \(\mu\) we need to recenter MSE surface above the origin of the weight space and to align the minor and major axes with the weight axes. This is accomplished in two steps.
  1. Translate the coordinate axes using $$\mathbf{v}=\mathbf{w}-\mathbf{w}^*$$ so \(\mathbf{w}=\mathbf{v}+\mathbf{R}^{-1}\mathbf{p}\) $$ \begin{align} \epsilon(\mathbf{w}) &= \sigma^2_d(n)-2\mathbf{w}^T\mathbf{p}+\mathbf{w}^T\mathbf{R}\mathbf{w} \\ \epsilon(\mathbf{v}) &= \sigma^2_d(n)-2\left( \mathbf{v}+\mathbf{R}^{-1}\mathbf{p} \right)^T\mathbf{p}+\left( \mathbf{v}+\mathbf{R}^{-1}\mathbf{p} \right)^T\mathbf{R}\left( \mathbf{v}+\mathbf{R}^{-1}\mathbf{p} \right) \\ &= \sigma^2_d(n)-2\left( \mathbf{v}^T+\mathbf{p}^T\mathbf{R}^{-1} \right)\mathbf{p}+\left( \mathbf{v}^T+\mathbf{p}^T\mathbf{R}^{-1} \right)\mathbf{R}\left( \mathbf{v}+\mathbf{R}^{-1}\mathbf{p} \right) \\ &= \sigma^2_d(n)-2\mathbf{v}^T\mathbf{p}-2\mathbf{p}^T\mathbf{R}^{-1}\mathbf{p}+\left( \mathbf{v}^T \mathbf{R}+\mathbf{p}^T\right)\left( \mathbf{v}+\mathbf{R}^{-1}\mathbf{p} \right) \\ &= \sigma^2_d(n)-2\mathbf{v}^T\mathbf{p}-2\mathbf{p}^T\mathbf{w}^*+\mathbf{v}^T\mathbf{R}\mathbf{v}+\mathbf{v}^T\mathbf{p}+\mathbf{p}^T\mathbf{v}+\mathbf{p}^T\mathbf{R}^{-1}\mathbf{p} \\ &= \sigma^2_d(n)-\mathbf{p}^T\mathbf{w}^* +\mathbf{v}^T\mathbf{R}\mathbf{v} \\ &= \epsilon_{\text{min}}+\mathbf{v}^T\mathbf{R}\mathbf{v} \end{align} $$
  2. Rotate the \((v_1,v_2)\)-coordinates. $$ \begin{align} \epsilon(\mathbf{v}) &= \epsilon_{\text{min}}+\mathbf{v}^T\mathbf{R}\mathbf{v} \\ &= \epsilon_{\text{min}}+\mathbf{v}^T\mathbf{M\Lambda M}^T\mathbf{v} \\ &= \epsilon_{\text{min}}+\left(\mathbf{M}^T\mathbf{v}\right)^T\mathbf{\Lambda} \left(\mathbf{M}^T\mathbf{v}\right) \\ \epsilon(\mathbf{v}') &= \epsilon_{\text{min}}+\mathbf{v}'^T\mathbf{\Lambda}\mathbf{v}' \end{align} $$ Hence \( \mathbf{v}'=\mathbf{M}^T\mathbf{v}\) is the rotated coordinate system \((v'_1,v'_2)\).
MSE and its Contours
Next, let's take a look at gradient descent. $$ \begin{align} \mathbf{w}(1) &= \mathbf{w}(0)-\mu \nabla \epsilon(\mathbf{w}(0)) & \text{initial condition} \\ \mathbf{w}(n+1) &= \mathbf{w}(n)-\mu \nabla \epsilon(\mathbf{w}(n)) & \text{recursive form} \\ \mathbf{w}(n+1) &= \mathbf{w}(n) + \alpha \underbrace{\left[ \mathbf{p}-\mathbf{R}\mathbf{w}(n) \right]}_{-\frac{1}{2}\nabla \epsilon(\mathbf{w}(n))} & \alpha = 2\mu \tag{main \(\mathbf{w}\) update}\\ \mathbf{w}(n+1) &= \left[ \mathbf{I} -\alpha\mathbf{R} \right]\mathbf{w}(n)+\alpha \mathbf{p} & \text{recursive form} \\ \mathbf{w}(1) &= \left[ \mathbf{I} -\alpha\mathbf{R} \right]\mathbf{w}(0)+\alpha \mathbf{p} & \text{initial condition} \\ \mathbf{w}(n) &= \left[\mathbf{I}-\alpha \mathbf{R} \right]^n \mathbf{w}(0)+\alpha \mathbf{p}\sum_{j=0}^{n-1}\left[ \mathbf{I}-\alpha \mathbf{R} \right]^j & \text{iterative solution} \end{align} $$ Need to use the last equation to show convergence and arrive at conditions on the step size \(\mu\) (recall \(\alpha=2\mu\)), so need to show $$ \begin{align} \lim_{n\rightarrow \infty} \mathbf{w}(n) &= \mathbf{w}^* & \text{which is equivalent to}\\ \lim_{n\rightarrow \infty} \mathbf{v}(n) &= \mathbf{0} & \text{which is the same as}\\ \lim_{n\rightarrow \infty} \mathbf{v}'(n) &= \mathbf{0} \end{align} $$ Now we use the translation followed by rotation of coordinates as follows: $$ \begin{align} \mathbf{w}(n+1)-{\color{red}{\mathbf{w}^*}} &= \mathbf{w}(n)-{\color{red}{\mathbf{w}^*}} + \alpha \left[ \mathbf{p}-\mathbf{R}\mathbf{w}(n) \right] & \text{recall that } \mathbf{v}=\mathbf{w}-\mathbf{w}^* \tag{main \(\mathbf{w}\) update}\\ \mathbf{v}(n+1) &= \mathbf{v}(n)+ \alpha \left[ \mathbf{p}-\mathbf{R}\mathbf{w}(n) \right] & \text{add and subtract }\alpha \mathbf{Rw}^* \text{ on the right} \tag{main \(\mathbf{v}\) update}\\ \mathbf{v}(n+1) &= \mathbf{v}(n)- \alpha \mathbf{R}\underbrace{\left[\mathbf{w}(n)-{\color{red}{\mathbf{w}^*}} \right]}_{\mathbf{v}(n)}+\underbrace{\alpha\mathbf{p}-{\color{red}{\alpha\mathbf{Rw}^*}}}_{=\mathbf{0}} & \text{recall that } \mathbf{p}=\mathbf{Rw}^*\\ \mathbf{v}(n+1) &= \left[\mathbf{I}-\alpha\mathbf{R} \right]\mathbf{v}(n) & \text{recursive form} \\ \mathbf{v}(n) &= \left[\mathbf{I}-\alpha\mathbf{R} \right]^n\mathbf{v}(0) & \text{non recursive, but }\left[\mathbf{I}-\alpha\mathbf{R} \right] \text{ is not diagonal} \\ \mathbf{v}(n+1) &= \left[\mathbf{MIM}^T-\alpha\mathbf{M\Lambda M}^T \right]\mathbf{v}(n) & \text{since } \mathbf{R}=\mathbf{M\Lambda M}^T\\ \mathbf{v}(n+1) &= \mathbf{M}\left[\mathbf{I}-\alpha\mathbf{\Lambda} \right]\mathbf{M}^T\mathbf{v}(n) & \\ \mathbf{M}^T\mathbf{v}(n+1) &= \left[\mathbf{I}-\alpha\mathbf{\Lambda} \right]\mathbf{M}^T\mathbf{v}(n) & \mathbf{M}^T=\mathbf{M}^{-1} \\ \mathbf{v}'(n+1) &= \left[\mathbf{I}-\alpha\mathbf{\Lambda} \right]\mathbf{v}'(n) & \mathbf{v}'=\mathbf{M}^T\mathbf{v} \\ \mathbf{v}'(n) &= \left[\mathbf{I}-\alpha\mathbf{\Lambda} \right]^n \mathbf{v}'(0) & \text{expand in components} \\ \left[\begin{array}{c}v'_1(n) \\ v'_2(n) \\ \vdots \\ v'_N(n)\end{array}\right] &= \left[\begin{array}{cccc}(1-\alpha \lambda_1)^n & 0 & \cdots & 0 \\ 0 & (1-\alpha \lambda_2)^n & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & (1-\alpha \lambda_N)^n\end{array}\right] \left[\begin{array}{c}v'_1(0) \\ v'_2(0) \\ \vdots \\ v'_N(0)\end{array}\right] & \text{consider } \boxed{v'_i(n)=(1-\alpha \lambda_i)^n v'_i(0)}, \, 1\leq i \leq N \\ \end{align} $$ As \(n\rightarrow \infty\), \((1-\alpha \lambda_i)^n\) will converge (geometrically) to zero iff \(|1-\alpha \lambda_i|<1\), \(1\leq i \leq N\), hence
$$ \begin{align} -1 &< 1-\alpha \lambda_i<1 \\ -2 &< -\alpha \lambda_i<0 \\ 0 &< 2\mu \lambda_i < 2 \\ 0 &< \mu < \frac{1}{\lambda_i} & \text{ for } 1\leq i \leq N, \text{so choose }\\ 0 &< \mu < \frac{1}{\lambda_\text{max}} & \text{to be on safe side use } \sum_{i=1}^N \lambda_i=\text{Tr}(\mathbf{R}) \\ 0 &< \mu < \frac{1}{\text{Tr}(\mathbf{R})} \leq \frac{1}{\lambda_\text{max}} & \end{align} $$ \(\text{Tr}(\mathbf{R})=\sum\limits_{i=0}^{N-1}\mathbb{E}\left[x^2(n-i)\right]\) is the total tap-input power; the expectation is unknown in practice, so as a heuristic we replace it by the instantaneous tap-input power \(\sum\limits_{i=0}^{N-1}x^2(n-i)\) (or a time average of it), giving $$ 0<\mu<\frac{1}{\text{tap-input power}} $$
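A minimal Matlab sketch of deterministic gradient descent with the step size chosen inside the bound above; \(\mathbf{R}\) and \(\mathbf{p}\) are illustrative, and the iterate should approach \(\mathbf{w}^*=\mathbf{R}^{-1}\mathbf{p}\).

% Sketch: deterministic gradient descent on the MSE surface.
R  = [2.0 0.8; 0.8 1.0];             % assumed autocorrelation matrix
p  = [1.0; 0.5];                     % assumed cross-correlation vector
mu = 0.9/trace(R);                   % step size inside 0 < mu < 1/Tr(R)
w  = [0; 0];                         % initial weights
for n = 1:200
    grad = -2*p + 2*R*w;             % exact gradient of the MSE
    w = w - mu*grad;                 % w(n+1) = w(n) - mu*grad
end
disp([w, R\p])                       % should be close to w* = R^{-1} p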
The LMS algorithm is derived in two steps as follows. First recall the definition of the MSE gradient $$ \begin{align} \nabla \epsilon(\mathbf{w}(n)) &= \frac{\partial}{\partial \mathbf{w}}\left[ \mathbb{E} \left\{ e^2(n) \right\} \right] \\ &= \mathbb{E}\left\{ \frac{\partial}{\partial \mathbf{w}} \left[ e^2(n)\right] \right\} \\ &= 2\mathbb{E}\left\{ e(n)\frac{\partial}{\partial \mathbf{w}} \left[ e(n)\right] \right\} & \text{but } \boxed{e(n)=d(n)-\mathbf{w}^T(n)\mathbf{x}(n)} \text{ so} \\ \frac{\partial }{\partial \mathbf{w}}e(n) &= -\mathbf{x}(n) \\ \nabla \epsilon(\mathbf{w}(n)) &= -2\mathbb{E}\left\{ e(n)\mathbf{x}(n) \right\} & \text{ hence} \\ \mathbf{w}(n+1) &= \mathbf{w}(n)+\alpha \mathbb{E}\left\{ e(n)\mathbf{x}(n) \right\} & \mathbb{E}\text{xpectation gives exact gradient} \end{align} $$ The exact gradient is unavailable in on-line applications , so we make a noisy approximate estimate $$ \mathbb{E}\left\{ e(n)\mathbf{x}(n) \right\} \approx e(n)\mathbf{x}(n) $$ We end up with a Stochastic Gradient Descent $$ \boxed{\underbrace{\mathbf{w}(n+1)=\mathbf{w}(n)+\alpha e(n)\mathbf{x}(n)}_{\text{LMS rule}}, \,\, 0<\alpha<\frac{2}{\text{tap-input power}}} $$
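A minimal Matlab sketch of the LMS rule in a system-identification setting; the unknown system, noise level, and step size are illustrative, with the tap-input power here roughly \(N\), so \(\alpha\) is kept well below \(2/N\).

% Sketch of the LMS rule identifying an unknown FIR system.
rng(0);
N = 5;
w_true = [0.5; -0.4; 0.3; -0.2; 0.1];    % assumed unknown system
w = zeros(N,1);                          % initial weights
xbuf = zeros(N,1);                       % tapped delay line
alpha = 0.05;                            % step size, well below 2/(tap-input power)
for n = 1:20000
    xbuf = [randn; xbuf(1:N-1)];         % shift in the new input sample
    d = w_true'*xbuf + 0.01*randn;       % desired response d(n)
    e = d - w'*xbuf;                     % error e(n)
    w = w + alpha*e*xbuf;                % LMS (stochastic gradient) update
end
disp([w, w_true])                        % estimated vs. true weights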
30 runs of LMS

All of the analysis above applies to the red (expected) path. Proving that a single run (one of the black paths) converges to the optimal weights \(\mathbf{w}^*\) is another matter, and is beyond the scope of this class.

30 runs of LMS, and their average in red.

Simulink LMS
Wiring block diagram for LMS with 5 coefficients.



Before motivating the concept of convolution and impulse response, let's revisit polynomial multiplication.
Given polynomials \(A(z)=a_0+a_1z+a_2z^2+a_3z^3+\cdots + a_N z^N\) and \(B(z)=b_0+b_1z+b_2z^2+b_3z^3+\cdots + b_M z^M\), let's first express their product \(C(z)=A(z)B(z)=c_0+c_1z+c_2z^2+c_3z^3+\cdots +c_{N+M}z^{N+M}\) in terms of the coefficients \(\{a_i\}_{i=0}^N\) and \(\{b_i\}_{i=0}^M\) $$ \begin{align} C(z) &= \left( a_0+a_1z+a_2z^2+a_3z^3+\cdots + a_N z^N \right)\left( b_0+b_1z+b_2z^2+b_3z^3+\cdots + b_M z^M \right) \\ & =a_0b_0+a_0b_1z+a_0b_2z^2+a_0b_3z^3+ \cdots +a_0b_Mz^M \\ & \phantom{=a_0b_0}\,\,+ a_1b_0z+a_1b_1z^2+a_1b_2z^3+a_1b_3z^4+ \cdots +a_1b_Mz^{M+1} \\ & \phantom{=a_0b_0+a_1b_0z}\,\, +a_2b_0z^2+a_2b_1z^3+a_2b_2z^4+a_2b_3z^5+ \cdots +a_2b_Mz^{M+2} \\ & \phantom{=a_0b_0+a_1b_0z+a_2b_0z^2}\,\, +a_3b_0z^3+a_3b_1z^4+a_3b_2z^5+a_3b_3z^6+ \cdots +a_3b_Mz^{M+3} \\ & \phantom{=a_0b_0+a_1b_0z+a_2b_0z^2+a}\ddots \,\,\,\,\,\, + \phantom{a3}\vdots \phantom{a3z}\, + \phantom{a3}\vdots \phantom{a3z} + \phantom{a3}\vdots \phantom{a3z} +\cdots + \phantom{a3z}\vdots\,\,\,\,\,\,\,\,\, \ddots \\ & \phantom{=a_0b_0+a_1b_0z+a_2b_0z^2+a_3b_0z^3}\,\, +\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots +a_Nb_Mz^{N+M} \\ &= \underbrace{a_0b_0}_{c_0}+\underbrace{(a_0b_1+a_1b_0)}_{c_1}z+\underbrace{(a_0b_2+a_1b_1+a_2b_0)}_{c_2}z^2+\underbrace{(a_0b_3+a_1b_2+a_2b_1+a_3b_0)}_{c_3}z^3+\cdots + c_{N+M}z^{N+M} \end{align} $$ Notice the pattern here: \(\boxed{c_k=\sum\limits_{i=0}^ka_ib_{k-i}}\) (with the convention \(a_i=0\) for \(i>N\) and \(b_j=0\) for \(j>M\)), and hence $$ \begin{align} C(z) &= \left( a_0+a_1z+a_2z^2+a_3z^3+\cdots + a_N z^N \right)\left( b_0+b_1z+b_2z^2+b_3z^3+\cdots + b_M z^M \right) \\ &= \left(\sum_{i=0}^N a_iz^i\right)\left(\sum_{j=0}^M b_jz^j\right) \\ &= \sum_{k=0}^{N+M}\left( \sum_{i=0}^ka_ib_{k-i} \right)z^k \end{align} $$ Using a somewhat pompous and bombastic terminology, we say that the coefficient sequence \(\langle c_0, c_1, c_2, \cdots , c_{N+M} \rangle\) is the convolution of the sequences \(\langle a_0, a_1, a_2, \cdots , a_N \rangle\) and \(\langle b_0, b_1, b_2, \cdots , b_M \rangle\).
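A quick Matlab check of this pattern, using the built-in conv on illustrative coefficient sequences.

% Polynomial multiplication as convolution of coefficient sequences.
a = [1 2 3];              % A(z) = 1 + 2z + 3z^2   (illustrative)
b = [4 5];                % B(z) = 4 + 5z
c = conv(a, b);           % coefficients of C(z) = A(z)B(z)
disp(c)                   % [4 13 22 15] = c_0, c_1, c_2, c_3
% e.g. c_2 = a_0*b_2 + a_1*b_1 + a_2*b_0 = 0 + 2*5 + 3*4 = 22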


Multipath echo is modeled using convolution.

In the DSP literature, \( z^{-1} \) is used instead of \( z \). Hence, instead of the correspondence $$ \langle a_0, a_1, a_2, \cdots , a_N \rangle \,\,\, \iff \,\,\, a_0+a_1z+a_2z^2+a_3z^3+\cdots +a_Nz^N = \sum_{n=0}^Na_nz^n $$ we have a somewhat awkward but ultimately convenient (for frequency response and stability analysis) correspondence $$ \langle a_0, a_1, a_2, \cdots , a_N \rangle \,\,\, \iff \,\,\, a_0+a_1z^{-1}+a_2z^{-2}+a_3z^{-3}+\cdots +a_Nz^{-N} = \sum_{n=0}^Na_nz^{-n} $$ Three comments are in order:
  • Notice the effect of multiplication by \(z^{-1}\) $$ \begin{align} z^{-1}\left(\sum_{n=0}^Na_nz^{-n}\right) &=\sum_{n=0}^N a_n z^{-1}z^{-n} \\ & = z^{-1} \left(a_0+a_1z^{-1}+a_2z^{-2}+a_3z^{-3}+\cdots +a_Nz^{-N}\right) \\ & = 0+a_0z^{-1}+a_1z^{-2}+a_2z^{-3}+a_3z^{-4}+\cdots +a_Nz^{-(N+1)} \\ & \iff \,\,\,\langle 0, a_0, a_1, a_2, \cdots , a_N \rangle \end{align} $$ This models a time delay, or time shift.
  • Recall Power Series \(f(x)=\sum_{n=0}^\infty a_nx^n\) from Calc 2, for example
    • \( \cos x = 1-\frac{1}{2!}x^2+\frac{1}{4!}x^4-\frac{1}{6!}x^6+\frac{1}{8!}x^8-\cdots = \sum_{n=0}^\infty \frac{(-1)^n}{(2n)!}\left( x^2 \right)^n\) converges for \(-\infty < x< \infty\), and establishes the correspondence: $$ \left\langle 1, -\frac{1}{2!}, +\frac{1}{4!}, -\frac{1}{6!}, \cdots , \frac{(-1)^n}{(2n)!}, \cdots \right\rangle \,\,\, \iff \,\,\, \sum_{n=0}^\infty \frac{(-1)^n}{(2n)!}\left( x^2 \right)^n = \cos x $$
    • \( \exp(x) = 1+x+\frac{1}{2!}x^2+\frac{1}{3!}x^3+\frac{1}{4!}x^4+\frac{1}{5!}x^5+\cdots = \sum_{n=0}^\infty \frac{1}{n!} x^n \) converges for \(-\infty < x< \infty\), and establishes the correspondence: $$ \left\langle 1, 1, \frac{1}{2!}, \frac{1}{3!}, \frac{1}{4!}, \cdots , \frac{1}{n!}, \cdots \right\rangle \,\,\, \iff \,\,\, \sum_{n=0}^\infty \frac{1}{n!}x^n = \exp(x) $$
    • \( \frac{1}{1-x} = 1+x+x^2+x^3+x^4+x^5+\cdots = \sum_{n=0}^\infty x^n \) converges for \(-1 < x< +1\), and establishes a correspondence: $$ \left\langle 1, 1, 1, 1, \cdots , 1, \cdots \right\rangle \,\,\, \iff \,\,\, \sum_{n=0}^\infty x^n = \frac{1}{1-x} $$
    • Replace \(x\) with \(z^{-1}\) to get $$ f\left( z^{-1} \right)=\sum_{n=0}^\infty a_nz^{-n} $$ and call this "the \(z\)-transform of \(\langle a_0, a_1, a_2, \cdots \rangle\)"
  • Set \(z^{-n}=e^{-i\omega n}\), that is, evaluate on the unit circle \(z=e^{i\omega}\), to obtain the discrete-time Fourier transform from the \(z\)-transform.
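A minimal Matlab sketch that evaluates \(\sum_n a_n z^{-n}\) on the unit circle \(z=e^{i\omega}\) for an illustrative coefficient sequence; the magnitude can be compared against Matlab's freqz(a, 1).

% Sketch: frequency response by evaluating the z-transform at z = e^{i*omega}.
a = [1 0.5 0.25];                        % <a_0, a_1, a_2>  (illustrative)
omega = linspace(0, pi, 512);            % digital frequencies
Aw = zeros(size(omega));
for n = 0:numel(a)-1
    Aw = Aw + a(n+1)*exp(-1i*omega*n);   % add a_n z^{-n} at z = e^{i*omega}
end
plot(omega, abs(Aw));                    % cf. abs(freqz(a, 1, 512))
xlabel('\omega (rad/sample)'); ylabel('|A(e^{i\omega})|');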

Consider the Weighted Moving Average filter

Tapped Delay Line, aka FIR filter

Notice that the output \(y[n]\) of the FIR filter (aka Tapped Delay Line shown above) is given by the convolution of the input sequence \(\langle x[0], x[1], x[2], \cdots , x[k] \rangle\) with the tap coefficients (aka weights) \(\langle w[0], w[1], w[2], \cdots , w[M] \rangle\). That is $$ y[n]=\sum_{i=0}^n x[i]w[n-i] $$
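A short Matlab check, with illustrative taps and input, that the tapped-delay-line output agrees with the built-in convolution.

% FIR output as convolution of the input with the tap weights.
w = [1 0.5 0.25];                      % tap weights w[0], w[1], w[2]
x = [1 2 3 4 5];                       % input x[0..4]
y_conv   = conv(x, w);                 % full convolution, length 7
y_filter = filter(w, 1, x);            % FIR filtering, same length as x
disp(y_conv(1:numel(x)) - y_filter)    % identical: all zeros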
An IIR filter has a feedforward FIR (moving average) part and a feedback, or autoregressive, part. [1] $$ y[n]=-\sum_{k=1}^N a_ky[n-k]+\sum_{k=0}^M b_k x[n-k] $$
Type 1 canonic direct form for IIR filter
Your first computer assignment is to write a Matlab function that accepts:
  • The sequence (or the vector) \(\mathbf{b}=[b_0, b_1, b_2, \cdots , b_M]^T\) of feedforward coefficients.
  • The sequence (or the vector) \(\mathbf{a}=[a_1, a_2, \cdots , a_N]^T\) of feedback coefficients.
  • The filter input sequence (or vector) \(\mathbf{x}=\langle x[0], x[1], x[2], \cdots , x[Q]\rangle \).
and returns the filter output sequence \(\langle y[0], y[1], y[2], \cdots , y[P]\rangle \) for some large enough \(P\) to show that the filter is working properly.
Recall that this is not much more than three steps inside a for loop
  • Update the state vector by shifting the entries in the tapped delay line.
  • Compute the inner product between the appropriate part of state vector and the coefficients \(\mathbf{b}\).
  • Compute the inner product between the appropriate part of state vector and the coefficients \(\mathbf{a}\).
  • Store and add the results to correct locations (variables).

Test your code as discussed in class, as follows:
  1. First:
    • Set \(\mathbf{a}=[0, 0, 0, \cdots , 0]^T\), and \(\mathbf{b}=[1, -2, 3, \cdots , 5]^T\) or something easy to remember like that.
    • Set \(x[n]=\delta[n]\).
    • Since there is no feedback, the filter will be only a moving average.
    • The output \(y[n]\) should be a finite impulse response, that is the sequence \( [b_0, b_1, b_2, \cdots , b_M]^T \).
  2. Second:
    • Set \(\mathbf{a}=\left[\frac{1}{2}, 0, 0, \cdots , 0\right]^T\), and \(\mathbf{b}=[1, 0, 0, \cdots , 0]^T\).
    • Set \(x[n]=\delta[n]\).
    • This is a pure autoregressive linear filter.
    • The output \(y[n]\) should be an infinite impulse response, that is the sequence \(\left\langle 1, -\frac{1}{2}, \frac{1}{4}, \cdots , \left(-\frac{1}{2} \right)^P, \cdots \right\rangle \).
    • Also try \(\mathbf{a}=\left[0, \frac{1}{2}, 0, \cdots , 0\right]^T\), and the rest as above.
  3. Finally:
    • Set \(\mathbf{a}\) to small numbers (to avoid instability), and \(\mathbf{b}\) to any numbers in the range \( -1.5 \leq b_i \leq 1.5 \).
    • Set \(x[n]\) to Gaussian noise \(\mathcal{N}(0,1)\).
    • This is a general IIR filter.
    • Plot both the input \(x[n]\) and the output \(y[n]\) pairs in all cases above for comparison.
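As an additional check beyond the three tests above, the output of your function can be compared against Matlab's built-in filter, which uses the convention that the feedback vector starts with a leading 1; myIIR below is a hypothetical name for your function, and the coefficients are illustrative.

% Hypothetical check of your function (here called myIIR) against filter.
b = [1 -2 3 0 5];                  % feedforward coefficients b_0..b_M
a = [-0.5 0.25];                   % feedback coefficients a_1..a_N (stable choice)
x = [1 zeros(1, 30)];              % impulse input x[n] = delta[n]
y_ref = filter(b, [1 a], x);       % reference output with the same convention
% y_my = myIIR(b, a, x);           % your function (uncomment to compare)
% disp(max(abs(y_my - y_ref)))
stem(0:numel(y_ref)-1, y_ref); xlabel('n'); ylabel('y[n]');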

[1] The notation \(a_k\) or \(a[k]\) is used interchangeably and inconsistently throughout the DSP literature.



Range estimation

Seismology

System Identification

Inverse Modeling