# Introduction

This tutorial is about linear transforms, a basic building block of many models, including deep learning models.

# Short recap and syncing repositories

Before you proceed, remember to activate your virtual environment so you can use the software you installed last week and run the notebooks in interactive mode, not through the github.com website.

## Virtual environments

To activate your virtual environment:
   * If you were in the Tuesday/Wednesday group, type `activate_mlp` or `source ~/mlpractical/venv/bin/activate`
   * If you were in the Monday group:
      + if you chose the **comfy** way, type `workon mlpractical`
      + if you chose the **generic** way, `source` your virtual environment by specifying the path to its activate script. You need to locate it yourself: there were no general recommendations regarding directory structure, and people have installed it in different places, usually somewhere in their home directories. If you cannot easily find it, use something like `find . -iname activate`.

## On Synchronising repositories

I started writing this, but I do not think giving students a choice is a good way to progress. The most painless way would be to ask them to stash their changes (with some meaningful message) and work on the clean, updated repository. This way one can always (temporarily) recover the work when needed, but everyone starts the next lab smoothly. We do not want to have to help anyone resolve conflicts...

Enter the git mlp repository you set up last week (i.e. ~/mlpractical/repo-mlp). Depending on how you want to proceed, you can either:
  1. Override some of the changes you have made (in the notebooks and/or in the code, if you happened to modify parts that were updated by us) with the code we have provided for this lab
  2. Try to merge your code with ours (for example, if you want to use the `MetOfficeDataProvider` you have written)
  
Our recommendation is that you at least keep your progress in the notebooks (so you can look up details when needed).
  
```
git pull
```

## Default Synchronising Strategy

Need to think/discuss this.

# Linear and Affine Transforms

The basis of all linear models is the so-called affine transform: a linear map of the input points (for example a rotation or scaling) followed by a shift (translation). Denote by $\mathbf x$ some input vector; the affine transform is then defined as follows:

![Making Predictions](res/singleLayerNetWts-1.png)

$
\begin{equation}
  \mathbf y=\mathbf W \mathbf x + \mathbf b
\end{equation}
$

<b>Note:</b> the bias term can be incorporated as an additional column in the weight matrix, though in this tutorial we will use a separate variable for this purpose.
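The note about folding the bias into the weight matrix can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the names `W_aug` and `x_aug` and the array sizes are illustrative choices, not part of the lab code.

```python
import numpy

rng = numpy.random.RandomState(0)  # fixed seed so the check is repeatable
x = rng.uniform(-1, 1, (4,))       # input vector with 4 elements
W = rng.uniform(-1, 1, (5, 4))     # weight matrix mapping 4 inputs to 5 outputs
b = rng.uniform(-1, 1, (5,))       # one bias per output

# Affine transform with a separate bias vector.
y_separate = numpy.dot(W, x) + b

# Same transform with the bias folded into the weights:
# append b as an extra column of W and a constant 1 to x.
W_aug = numpy.hstack([W, b[:, None]])  # shape (5, 5)
x_aug = numpy.append(x, 1.0)           # shape (5,)
y_folded = numpy.dot(W_aug, x_aug)

# Both formulations give the same output vector.
print(numpy.allclose(y_separate, y_folded))
```

Keeping the bias separate, as this tutorial does, leaves the input vector untouched at the cost of one extra addition.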

The $i$th element of vector $\mathbf y$ is hence computed as:

$
\begin{equation}
   y_i=\mathbf w_i \mathbf x + b_i
\end{equation}
$

where $\mathbf w_i$ is the $i$th row of $\mathbf W$. Written out element by element:

$
\begin{equation}
   y_i=\sum_j w_{ij}x_j + b_i
\end{equation}
$
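To make the element-wise form concrete, here is a small sketch that computes each output with explicit loops and compares the result with `numpy.dot`. The function name `affine_elementwise` and the example values are hypothetical and chosen only for illustration; row `i` of `W` holds the weights for output `i`.

```python
import numpy

def affine_elementwise(W, x, b):
    """Compute y = W x + b one output element at a time (illustrative, slow)."""
    num_outputs, num_inputs = W.shape
    y = numpy.zeros(num_outputs)
    for i in range(num_outputs):
        total = b[i]                 # start from the bias of output i
        for j in range(num_inputs):
            total += W[i, j] * x[j]  # add the contribution of input j
        y[i] = total
    return y

W = numpy.array([[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]])
x = numpy.array([2.0, 4.0])
b = numpy.array([0.5, 0.0, -1.0])

# The loop version and the vectorised version give the values 10.5, -4 and 7.
print(affine_elementwise(W, x, b))
print(numpy.dot(W, x) + b)
```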

???



In [1]:
import numpy
# Draw a random input vector and weight matrix, then apply the linear transform
x = numpy.random.uniform(-1, 1, (4,))
W = numpy.random.uniform(-1, 1, (5, 4))
y = numpy.dot(W, x)
print(y)

[ 0.06875593 -0.69616488  0.08823301  0.34533413 -0.22129962]


In [4]:
def my_dot(x, W, b):
    # Shapes assumed from the output buffer below: x is (N, d), W is (d, K), b is (K,)
    y = numpy.zeros((x.shape[0], W.shape[1]))
    raise NotImplementedError('Write me!')
    return y

[[ 0.63711     0.11566944  0.74416104]
 [-0.01335825  0.46206922 -0.1109265 ]
 [-0.37523063 -0.06755371  0.04352121]
 [ 0.25885831 -0.53660826 -0.40905639]]


In [22]:

# Call my_dot repeatedly (assumes x, W and b are defined with compatible shapes)
for itr in range(100):
    my_dot(x, W, b)
    


[ 0  1  2  3  4  5  6  7  8  9 10]


# Iterative learning of linear models

We will learn the model with gradient descent (batch gradient descent for now).


## Running example

![Making Predictions](res/singleLayerNetPredict.png)
 

  * Input vector $\mathbf{x} = (x_1, x_2, \ldots, x_d)^T $
  * Output vector $\mathbf{y} = (y_1, \ldots, y_K)^T $
  * Weight matrix $\mathbf{W}$: $w_{ki}$ is the weight from input $x_i$ to output $y_k$
  * Bias $w_{k0}$ is the bias for output $k$
  * Target vector $\mathbf{t} = (t_1, \ldots, t_K)^T $


$
  y_k = \sum_{i=1}^d w_{ki} x_i + w_{k0}
$

If we define $x_0=1$ we can simplify the above to

$
  y_k = \sum_{i=0}^d w_{ki} x_i \quad ; \quad \mathbf{y} = \mathbf{Wx}
$

The error function is the sum-of-squares error over the $N$ training examples:

$
E = \frac{1}{2} \sum_{n=1}^N ||\mathbf{y}^n - \mathbf{t}^n||^2 =  \sum_{n=1}^N E^n
$

$
  E^n = \frac{1}{2} ||\mathbf{y}^n - \mathbf{t}^n||^2 = \frac{1}{2} \sum_{k=1}^K (y_k^n - t_k^n)^2
$

The goal of training is to set $\mathbf{W}$ to minimise $E$ given the training set.
  
$
 E^n = \frac{1}{2} \sum_{k=1}^K (y^n_k - t^n_k)^2 
    = \frac{1}{2} \sum_{k=1}^K \left( \sum_{i=0}^d w_{ki} x^n_i - t^n_k \right)^2
$

$
    \frac{\partial E^n}{\partial w_{rs}} = (y^n_r - t^n_r)x_s^n =  \delta^n_r x_s^n \quad ; \quad
    \delta^n_r = y^n_r - t^n_r
$

$
    \frac{\partial E}{\partial w_{rs}} = \sum_{n=1}^N \frac{\partial E^n}{\partial w_{rs}} = \sum_{n=1}^N \delta^n_r x_s^n
$
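One way to gain confidence in the gradient formula is to compare it against a finite-difference estimate. The sketch below assumes NumPy and stores the data as rows of `X` (with a leading column of ones playing the role of $x_0=1$) and targets as rows of `T`. The function names `error` and `error_gradient` and the toy sizes are illustrative assumptions, not part of the lab code.

```python
import numpy

def error(W, X, T):
    """Sum-of-squares error E = 1/2 * sum_n ||W x^n - t^n||^2."""
    Y = numpy.dot(X, W.T)            # row n holds the outputs y^n
    return 0.5 * numpy.sum((Y - T) ** 2)

def error_gradient(W, X, T):
    """Analytic gradient: dE/dw_rs = sum_n delta^n_r * x^n_s."""
    deltas = numpy.dot(X, W.T) - T   # shape (N, K), delta^n_r = y^n_r - t^n_r
    return numpy.dot(deltas.T, X)    # shape (K, d + 1)

rng = numpy.random.RandomState(0)
N, d, K = 6, 3, 2
X = numpy.hstack([numpy.ones((N, 1)), rng.uniform(-1, 1, (N, d))])  # x_0 = 1
T = rng.uniform(-1, 1, (N, K))
W = rng.uniform(-0.1, 0.1, (K, d + 1))

analytic = error_gradient(W, X, T)

# Central finite differences, perturbing one weight at a time.
numeric = numpy.zeros_like(W)
eps = 1e-6
for r in range(K):
    for s in range(d + 1):
        W_plus = W.copy()
        W_plus[r, s] += eps
        W_minus = W.copy()
        W_minus[r, s] -= eps
        numeric[r, s] = (error(W_plus, X, T) - error(W_minus, X, T)) / (2 * eps)

print(numpy.allclose(analytic, numeric, atol=1e-6))
```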


\begin{algorithmic}[1]
      \Procedure{gradientDescentTraining}{$\mathbf{X}, \mathbf{T},
        \mathbf{W}$}
        \State initialize $\mathbf{W}$ to small random numbers
%        \State randomize order of training examples in $\mathbf{X}$
        \While{not converged}
           \State for all $k,i$: $\Delta w_{ki} \gets 0$
           \For{$n \gets 1,N$}
            \For{$k \gets 1,K$}
              \State $y_k^n \gets \sum_{i=0}^d w_{ki} x_i^n$
              \State $\delta_k^n \gets y_k^n - t_k^n$
              \For{$i \gets 0,d$}
                \State $\Delta w_{ki} \gets \Delta w_{ki} + \delta_k^n \cdot x_i^n$
              \EndFor
            \EndFor
           \EndFor
           \State for all $k,i$: $w_{ki} \gets w_{ki} - \eta \cdot \Delta w_{ki}$
        \EndWhile
       \EndProcedure
\end{algorithmic}
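The procedure above could be sketched in NumPy roughly as follows. This is an illustrative version under stated assumptions, not the lab's reference implementation. The three nested loops are replaced by matrix products, the "not converged" test is simplified to a fixed number of iterations, and the function name, `learning_rate` (the $\eta$ above), `num_iterations` and the toy data are all hypothetical choices.

```python
import numpy

def gradient_descent_training(X, T, learning_rate=0.01, num_iterations=2000, seed=0):
    """Batch gradient descent for a linear model.

    X: (N, d + 1) inputs, first column all ones (x_0 = 1, so w_k0 is the bias).
    T: (N, K) targets. Returns W of shape (K, d + 1).
    """
    rng = numpy.random.RandomState(seed)
    # Initialise W to small random numbers.
    W = rng.uniform(-0.01, 0.01, (T.shape[1], X.shape[1]))
    for _ in range(num_iterations):     # stands in for "while not converged"
        deltas = numpy.dot(X, W.T) - T  # delta^n_k = y^n_k - t^n_k
        delta_W = numpy.dot(deltas.T, X)  # sum over n of delta^n_k * x^n_i
        W -= learning_rate * delta_W
    return W

# Toy problem: targets generated by a known linear model, without noise.
rng = numpy.random.RandomState(1)
N, d, K = 50, 3, 2
X = numpy.hstack([numpy.ones((N, 1)), rng.uniform(-1, 1, (N, d))])
W_true = rng.uniform(-1, 1, (K, d + 1))
T = numpy.dot(X, W_true.T)

W_learned = gradient_descent_training(X, T)
# Largest absolute difference between learned and true weights.
print(numpy.max(numpy.abs(W_learned - W_true)))
```

On this noise-free toy problem the learned weights should end up very close to `W_true`. With real data the learning rate usually needs tuning, and a convergence test on the change in $E$ would replace the fixed iteration count.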

# Exercises

# Fun Stuff

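So what can one do with linear transforms, and what are their properties?

Exercise: show that the linear transform is invertible (assuming $\mathbf W$ is square and non-singular). In other words, solve the equation

$\mathbf y=\mathbf W \mathbf x + \mathbf b$: given $\mathbf y$ (the transformed image), find the $\mathbf x$ that is the same as the original one.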
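A numerical check of an answer to this exercise might look like the sketch below. It assumes a square weight matrix that is made strictly diagonally dominant (and hence non-singular) by adding a multiple of the identity, and it uses `numpy.linalg.solve` rather than forming the inverse explicitly. The sizes and variable names are illustrative only.

```python
import numpy

rng = numpy.random.RandomState(0)
x = rng.uniform(-1, 1, (4,))
# Adding 4 * I makes W strictly diagonally dominant, hence invertible.
W = rng.uniform(-1, 1, (4, 4)) + 4 * numpy.eye(4)
b = rng.uniform(-1, 1, (4,))

# Forward transform.
y = numpy.dot(W, x) + b

# Inverse transform: solve W x = y - b for x.
x_recovered = numpy.linalg.solve(W, y - b)

print(numpy.allclose(x, x_recovered))
```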