wording in lab 01

This commit is contained in:
Steve Renals 2015-10-05 10:26:58 +01:00
parent d2f20c074d
commit 56aabeaa51

View File

@ -6,23 +6,23 @@
"source": [
"# Introduction\n",
"\n",
"This tutorial is about linear transforms - a basic building block of many, including deep learning, models.\n",
"This tutorial is about linear transforms - a basic building block of neural networks, including deep learning models.\n",
"\n",
"# Virtual environments and syncing repositories\n",
"\n",
"Before you proceed onwards, remember to activate you virtual environments so you can use the software you installed last week as well as run the notebooks in interactive mode, no through github.com website.\n",
"Before you proceed onwards, remember to activate you virtual environments so you can use the software you installed last week as well as run the notebooks in interactive mode, not through the github.com website.\n",
"\n",
"## Virtual environments\n",
"\n",
"To activate virtual environment:\n",
" * If you were on Tuesday/Wednesday group type `activate_mlp` or `source ~/mlpractical/venv/bin/activate`\n",
" * If you were on Monday group:\n",
" + and if you have chosen **comfy** way type: workon mlpractival\n",
" + and if you have chosen **generic** way, `source` your virutal environment using `source` and specyfing the path to the activate script (you need to localise it yourself, there were not any general recommendations w.r.t dir structure and people have installed it in different places, usually somewhere in the home directories. If you cannot easily find it by yourself, use something like: `find . -iname activate` ):\n",
"To activate the virtual environment:\n",
" * If you were in last week's Tuesday or Wednesday group type `activate_mlp` or `source ~/mlpractical/venv/bin/activate`\n",
" * If you were in the Monday group:\n",
" + and if you have chosen the **comfy** way type: `workon mlpractical`\n",
" + and if you have chosen the **generic** way, `source` your virutal environment using `source` and specyfing the path to the activate script (you need to localise it yourself, there were not any general recommendations w.r.t dir structure and people have installed it in different places, usually somewhere in the home directories. If you cannot easily find it by yourself, use something like: `find . -iname activate` ):\n",
"\n",
"## On Synchronising repositories\n",
"\n",
"Enter your git mlp repository you set up last week (i.e. ~/mlpractical/repo-mlp) and once you synced the repository (in one of the two below ways), start the notebook session by typing:\n",
"Enter the git mlp repository you set up last week (i.e. `~/mlpractical/repo-mlp`) and once you sync the repository (in one of the two below ways), start the notebook session by typing:\n",
"\n",
"```\n",
"ipython notebook\n",
@ -32,10 +32,10 @@
"\n",
"To avoid potential conflicts between the changes you have made since last week and our additions, we recommend `stash` your changes and `pull` the new code from the mlpractical repository by typing:\n",
"\n",
"1. `git stash save \"my 1st lab work\"`\n",
"1. `git stash save \"Lab1 work\"`\n",
"2. `git pull`\n",
"\n",
"Then, once you need you can always (temporaily) restore a desired state of the repository.\n",
"Then, if you need to, you can always (temporaily) restore a desired state of the repository.\n",
"\n",
"### For advanced github users\n",
"\n",
@ -52,19 +52,19 @@
"***\n",
"### Note on storing matrices in computer memory\n",
"\n",
"Consider you want to store the following array in memory: $\\left[ \\begin{array}{ccc}\n",
"Suppose you want to store the following matrix in memory: $\\left[ \\begin{array}{ccc}\n",
"1 & 2 & 3 \\\\\n",
"4 & 5 & 6 \\\\\n",
"7 & 8 & 9 \\end{array} \\right]$ \n",
"\n",
"In computer memory the above matrix would be organised as a vector in either (assume you allocate the memory at once for the whole matrix):\n",
"If you allocate the memory at once for the whole matrix, then the above matrix would be organised as a vector in one of two possible forms:\n",
"\n",
"* Row-wise layout where the order would look like: $\\left [ \\begin{array}{ccccccccc}\n",
"1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\end{array} \\right ]$\n",
"* Column-wise layout where the order would look like: $\\left [ \\begin{array}{ccccccccc}\n",
"1 & 4 & 7 & 2 & 5 & 8 & 3 & 6 & 9 \\end{array} \\right ]$\n",
"\n",
"Although `numpy` can easily handle both formats (possibly with some computational overhead), in our code we will stick with modern (and default) `c`-like approach and use row-wise format (contrary to Fortran that used column-wise approach). \n",
"Although `numpy` can easily handle both formats (possibly with some computational overhead), in our code we will stick with the more modern (and default) `C`-like approach and use the row-wise format (in contrast to Fortran that used a column-wise approach). \n",
"\n",
"This means, that in this tutorial:\n",
"* vectors are kept row-wise $\\mathbf{x} = (x_1, x_1, \\ldots, x_D) $ (rather than $\\mathbf{x} = (x_1, x_1, \\ldots, x_D)^T$)\n",
@ -73,18 +73,18 @@
"x_{21} & x_{22} & \\ldots & x_{2D} \\\\\n",
"x_{31} & x_{32} & \\ldots & x_{3D} \\\\ \\end{array} \\right]$ and each row (i.e. $\\left[ \\begin{array}{cccc} x_{11} & x_{12} & \\ldots & x_{1D} \\end{array} \\right]$) represents a single data-point (like one MNIST image or one window of observations)\n",
"\n",
"In lecture slides you will find the equations following the conventional mathematical column-wise approach, but you can easily map them one way or the other using using matrix transpose.\n",
"In lecture slides you will find the equations following the conventional mathematical approach, using column vectors, but you can easily map between column-major and row-major organisations using a matrix transpose.\n",
"\n",
"***\n",
"\n",
"## Linear and Affine Transforms\n",
"\n",
"The basis of all linear models is so called affine transform, that is a transform that implements some linear transformation and translation of input features. The transforms we are going to use are parameterised by:\n",
"The basis of all linear models is the so called affine transform, which is a transform that implements a linear transformation and translation of the input features. The transforms we are going to use are parameterised by:\n",
"\n",
" * Weight matrix $\\mathbf{W} \\in \\mathbb{R}^{D\\times K}$: where element $w_{ik}$ is the weight from input $x_i$ to output $y_k$\n",
" * Bias vector $\\mathbf{b}\\in R^{K}$ : where element $b_{k}$ is the bias for output $k$\n",
" * A weight matrix $\\mathbf{W} \\in \\mathbb{R}^{D\\times K}$: where element $w_{ik}$ is the weight from input $x_i$ to output $y_k$\n",
" * A bias vector $\\mathbf{b}\\in R^{K}$ : where element $b_{k}$ is the bias for output $k$\n",
"\n",
"Note, the bias is simply some additve term, and can be easily incorporated into an additional row in weight matrix and an additinal input in the inputs which is set to $1.0$ (as in the below picture taken from the lecture slides). However, here (and in the code) we will keep them separate.\n",
"Note, the bias is simply some additive term, and can be easily incorporated into an additional row in weight matrix and an additional input in the inputs which is set to $1.0$ (as in the below picture taken from the lecture slides). However, here (and in the code) we will keep them separate.\n",
"\n",
"![Making Predictions](res/singleLayerNetWts-1.png)\n",
"\n",
@ -119,6 +119,8 @@
"\\end{equation}\n",
"$\n",
"\n",
"This is equivalent to slides 12/13 in lecture 1, except we are using row vectors.\n",
"\n",
"When $\\mathbf{x}$ is a mini-batch (contains $B$ data-points of dimension $D$ each), i.e. $\\left[ \\begin{array}{cccc}\n",
"x_{11} & x_{12} & \\ldots & x_{1D} \\\\\n",
"x_{21} & x_{22} & \\ldots & x_{2D} \\\\\n",
@ -131,7 +133,7 @@
"\\end{equation}\n",
"$\n",
"\n",
"where both $\\mathbf{X}\\in\\mathbb{R}^{B\\times D}$ and $\\mathbf{Y}\\in\\mathbb{R}^{B\\times K}$ are matrices, and $\\mathbf{b}$ needs to be <a href=\"http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html\">broadcased</a> $B$ times (numpy will do this by default). However, we will not make an explicit distinction between a special case for $B=1$ and $B>1$ and simply use equation (3) instead, although $\\mathbf{x}$ and hence $\\mathbf{y}$ could be matrices. From implementation point of view, it does not matter.\n",
"where both $\\mathbf{X}\\in\\mathbb{R}^{B\\times D}$ and $\\mathbf{Y}\\in\\mathbb{R}^{B\\times K}$ are matrices, and $\\mathbf{b}$ needs to be <a href=\"http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html\">broadcasted</a> $B$ times (numpy will do this by default). However, we will not make an explicit distinction between a special case for $B=1$ and $B>1$ and simply use equation (3) instead, although $\\mathbf{x}$ and hence $\\mathbf{y}$ could be matrices. From an implementation point of view, it does not matter.\n",
"\n",
"The desired functionality for matrix multiplication in numpy is provided by <a href=\"http://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html\">numpy.dot</a> function. If you haven't use it so far, get familiar with it as we will use it extensively."
]
@ -140,9 +142,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Note on random number generators (general note)\n",
"### A general note on random number generators\n",
"\n",
"It is generally a good practice (for machine learning applications **not** cryptography!) to seed a pseudo-random number generator once at the beginning of the experiment, and use it later through the code where necesarry. This allows to avoid hard to reproduce scenariors where a particular action happens only for a particular sequence of numbers (which you cannot reproduce easily due to unknown random seeds sequence on the way!). As such, within this course we are going use a single random generator object. For instance, the one similar to the below:"
"It is generally a good practice (for machine learning applications **not** for cryptography!) to seed a pseudo-random number generator once at the beginning of the experiment, and use it later through the code where necesarry. This makes it easier to reproduce results since random initialisations can be replicated. As such, within this course we are going use a single random generator object, similar to the below:"
]
},
{
@ -166,9 +168,9 @@
"source": [
"## Exercise 1 \n",
"\n",
"Using numpy.dot, implement **forward** propagation through the linear transform defined by equations (3) and (4) for $B=1$ and $B>1$. As data ($\\mathbf{x}$) use `MNISTDataProvider` from previous laboratories. For case when $B=1$ write a function to compute the 1st output ($y_1$) using equations (1) and (2). Check if the output is the same as the corresponding one obtained with numpy. \n",
"Using `numpy.dot`, implement **forward** propagation through the linear transform defined by equations (3) and (4) for $B=1$ and $B>1$. As data ($\\mathbf{x}$) use `MNISTDataProvider` introduced last week. For the case when $B=1$, write a function to compute the 1st output ($y_1$) using equations (1) and (2). Check if the output is the same as the corresponding one obtained with numpy. \n",
"\n",
"Tip: To generate random data you can use `random_generator.uniform(-0.1, 0.1, (D, 10))` from the preceeding cell."
"Tip: To generate random data you can use `random_generator.uniform(-0.1, 0.1, (D, 10))` from above."
]
},
{
@ -235,7 +237,7 @@
"source": [
"## Exercise 2\n",
"\n",
"Modify (if necessary) examples from Exercise 1 to perform **backward** propagation, that is, given $\\mathbf{y}$ (obtained in previous step) and weight matrix $\\mathbf{W}$, project $\\mathbf{y}$ onto the input space $\\mathbf{x}$ (ignore or set to zero the biases towards $\\mathbf{x}$ in backward pass). Mathematically, we are interested in the following transformation: $\\mathbf{z}=\\mathbf{y}\\mathbf{W}^T$"
"Modify the examples from Exercise 1 to perform **backward** propagation, that is, given $\\mathbf{y}$ (obtained in the previous step) and weight matrix $\\mathbf{W}$, project $\\mathbf{y}$ onto the input space $\\mathbf{x}$ (ignore or set to zero the biases towards $\\mathbf{x}$ in backward pass). Mathematically, we are interested in the following transformation: $\\mathbf{z}=\\mathbf{y}\\mathbf{W}^T$"
]
},
{
@ -475,7 +477,7 @@
" * Scalar bias $b$ for the only output in our model \n",
" * Scalar target $t$ for the only output in out model\n",
" \n",
"First, ensure you can make use of data provider (note, for training data has been normalised to zero mean and unit variance, hence different effective range than one can find in file):"
"First, ensure you can make use of the data provider (note, for training data has been normalised to zero mean and unit variance, hence different effective range than one can find in file):"
]
},
{
@ -502,7 +504,7 @@
"source": [
"## Exercise 4\n",
"\n",
"The below code implements a very simple variant of stochastic gradient descent for the weather regression example. Your task is to implement 5 functions in the next cell and then run two next cells that 1) build sgd functions and 2) run the actual training."
"The below code implements a very simple variant of stochastic gradient descent for the rainfall prediction example. Your task is to implement 5 functions in the next cell and then run two next cells that 1) build sgd functions and 2) run the actual training."
]
},
{
@ -631,10 +633,10 @@
"source": [
"## Exercise 5\n",
"\n",
"Modify the above regression problem so the model makes binary classification whether the the weather is going to be one of those \\{rainy, sunny} (look at slide 12 of the 2nd lecture)\n",
"Modify the above prediction (regression) problem so the model makes a binary classification whether the the weather is going to be one of those \\{rainy, not-rainy} (look at slide 12 of the 2nd lecture)\n",
"\n",
"Tip: You need to introduce the following changes:\n",
"1. Modify `MetOfficeDataProvider` (for example, inherit from MetOfficeDataProvider to create a new class MetOfficeDataProviderBin) and modify `next()` function so it returns as `targets` either 0 (sunny - if the the amount of rain [before mean/variance normalisation] is equal to 0 or 1 (rainy -- otherwise).\n",
"1. Modify `MetOfficeDataProvider` (for example, inherit from MetOfficeDataProvider to create a new class MetOfficeDataProviderBin) and modify `next()` function so it returns as `targets` either 0 (not-rainy - if the the amount of rain [before mean/variance normalisation] is equal to 0) or 1 (rainy -- otherwise).\n",
"2. Modify the functions from previous exercise so the fprop implements `sigmoid` on top of affine transform.\n",
"3. Modify cost function to binary cross-entropy\n",
"4. Make sure you compute the gradients correctly (as you have changed both the output and the cost)\n"