Renaming notebooks to start from 01 and removing old courseworks and solutions.

This commit is contained in:
Matt Graham 2016-09-21 03:04:30 +01:00
parent c844ff2027
commit 27ee34cbba
11 changed files with 0 additions and 4863 deletions

File diff suppressed because one or more lines are too long


@@ -1,896 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction\n",
"\n",
"This tutorial is about linear transforms - a basic building block of neural networks, including deep learning models.\n",
"\n",
"# Virtual environments and syncing repositories\n",
"\n",
"Before you proceed onwards, remember to activate you virtual environments so you can use the software you installed last week as well as run the notebooks in interactive mode, not through the github.com website.\n",
"\n",
"## Virtual environments\n",
"\n",
"To activate the virtual environment:\n",
" * If you were in last week's Tuesday or Wednesday group type `activate_mlp` or `source ~/mlpractical/venv/bin/activate`\n",
" * If you were in the Monday group:\n",
" + and if you have chosen the **comfy** way type: `workon mlpractical`\n",
" + and if you have chosen the **generic** way, `source` your virutal environment using `source` and specyfing the path to the activate script (you need to localise it yourself, there were not any general recommendations w.r.t dir structure and people have installed it in different places, usually somewhere in the home directories. If you cannot easily find it by yourself, use something like: `find . -iname activate` ):\n",
"\n",
"## On Synchronising repositories\n",
"\n",
"Enter the git mlp repository you set up last week (i.e. `~/mlpractical/repo-mlp`) and once you sync the repository (in one of the two below ways, or look at our short Git FAQ <a href=\"https://github.com/CSTR-Edinburgh/mlpractical/blob/master/gitFAQ.md\">here</a>), start the notebook session by typing:\n",
"\n",
"```\n",
"ipython notebook\n",
"```\n",
"\n",
"### Default way\n",
"\n",
"To avoid potential conflicts between the changes you have made since last week and our additions, we recommend `stash` your changes and `pull` the new code from the mlpractical repository by typing:\n",
"\n",
"1. `git stash save \"Lab1 work\"`\n",
"2. `git pull`\n",
"\n",
"Then, if you need to, you can always (temporaily) restore a desired state of the repository (look <a href=\"https://github.com/CSTR-Edinburgh/mlpractical/blob/master/gitFAQ.md\">here</a>).\n",
"\n",
"**Otherwise** you may also create a branch for each lab separately (again, look <a href=\"https://github.com/CSTR-Edinburgh/mlpractical/blob/master/gitFAQ.md\">here</a> and git tutorials we linked there), this will allow you to keep `master` branch clean, and pull changes into it every week from the central repository. At the same time branching gives you much more flexibility with changes you introduce to the code as potential conflicts will not occur until you try to make an explicit merge.\n",
"\n",
"### For advanced github users\n",
"\n",
"It is OK if you want to keep your changes and merge the new code with whatever you already have, but you need to know what you are doing and how to resolve conflicts.\n",
" \n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Single Layer Models\n",
"\n",
"***\n",
"### Note on storing matrices in computer memory\n",
"\n",
"Consider you want to store the following array in memory: $\\left[ \\begin{array}{ccc}\n",
"1 & 2 & 3 \\\\\n",
"4 & 5 & 6 \\\\\n",
"7 & 8 & 9 \\end{array} \\right]$ \n",
"\n",
"In computer memory the above matrix would be organised as a vector in either (assume you allocate the memory at once for the whole matrix):\n",
"\n",
"* Row-wise layout where the order would look like: $\\left [ \\begin{array}{ccccccccc}\n",
"1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\end{array} \\right ]$\n",
"* Column-wise layout where the order would look like: $\\left [ \\begin{array}{ccccccccc}\n",
"1 & 4 & 7 & 2 & 5 & 8 & 3 & 6 & 9 \\end{array} \\right ]$\n",
"\n",
"Although `numpy` can easily handle both formats (possibly with some computational overhead), in our code we will stick with modern (and default) `c`-like approach and use row-wise format (contrary to Fortran that used column-wise approach). \n",
"\n",
"This means, that in this tutorial:\n",
"* vectors are kept row-wise $\\mathbf{x} = (x_1, x_1, \\ldots, x_D) $ (rather than $\\mathbf{x} = (x_1, x_1, \\ldots, x_D)^T$)\n",
"* similarly, in case of matrices we will stick to: $\\left[ \\begin{array}{cccc}\n",
"x_{11} & x_{12} & \\ldots & x_{1D} \\\\\n",
"x_{21} & x_{22} & \\ldots & x_{2D} \\\\\n",
"x_{31} & x_{32} & \\ldots & x_{3D} \\\\ \\end{array} \\right]$ and each row (i.e. $\\left[ \\begin{array}{cccc} x_{11} & x_{12} & \\ldots & x_{1D} \\end{array} \\right]$) represents a single data-point (like one MNIST image or one window of observations)\n",
"\n",
"In lecture slides you will find the equations following the conventional mathematical column-wise approach, but you can easily map them one way or the other using using matrix transpose.\n",
"\n",
"***\n",
"\n",
"## Linear and Affine Transforms\n",
"\n",
"The basis of all linear models is so called affine transform, that is a transform that implements some linear transformation and translation of input features. The transforms we are going to use are parameterised by:\n",
"\n",
" * Weight matrix $\\mathbf{W} \\in \\mathbb{R}^{D\\times K}$: where element $w_{ik}$ is the weight from input $x_i$ to output $y_k$\n",
" * Bias vector $\\mathbf{b}\\in R^{K}$ : where element $b_{k}$ is the bias for output $k$\n",
"\n",
"Note, the bias is simply some additve term, and can be easily incorporated into an additional row in weight matrix and an additinal input in the inputs which is set to $1.0$ (as in the below picture taken from the lecture slides). However, here (and in the code) we will keep them separate.\n",
"\n",
"![Making Predictions](res/singleLayerNetWts-1.png)\n",
"\n",
"For instance, for the above example of 5-dimensional input vector by $\\mathbf{x} = (x_1, x_2, x_3, x_4, x_5)$, weight matrix $\\mathbf{W}=\\left[ \\begin{array}{ccc}\n",
"w_{11} & w_{12} & w_{13} \\\\\n",
"w_{21} & w_{22} & w_{23} \\\\\n",
"w_{31} & w_{32} & w_{33} \\\\\n",
"w_{41} & w_{42} & w_{43} \\\\\n",
"w_{51} & w_{52} & w_{53} \\\\ \\end{array} \\right]$, bias vector $\\mathbf{b} = (b_1, b_2, b_3)$ and outputs $\\mathbf{y} = (y_1, y_2, y_3)$, one can write the transformation as follows:\n",
"\n",
"(for the $i$-th output)\n",
"\n",
"(1) $\n",
"\\begin{equation}\n",
" y_i = b_i + \\sum_j x_jw_{ji}\n",
"\\end{equation}\n",
"$\n",
"\n",
"or the equivalent vector form (where $\\mathbf w_i$ is the $i$-th column of $\\mathbf W$, but note, when we **slice** the $i$th column we will get a **vector** $\\mathbf w_i = (w_{1i}, w_{2i}, w_{3i}, w_{4i}, w_{5i})$, hence the transpose for $\\mathbf w_i$ in the below equation):\n",
"\n",
"(2) $\n",
"\\begin{equation}\n",
" y_i = b_i + \\mathbf x \\mathbf w_i^T\n",
"\\end{equation}\n",
"$\n",
"\n",
"The same operation can be also written in matrix form, to compute all the outputs $\\mathbf{y}$ at the same time:\n",
"\n",
"(3) $\n",
"\\begin{equation}\n",
" \\mathbf y=\\mathbf x\\mathbf W + \\mathbf b\n",
"\\end{equation}\n",
"$\n",
"\n",
"This is equivalent to slides 12/13 in lecture 1, except we are using row vectors.\n",
"\n",
"When $\\mathbf{x}$ is a mini-batch (contains $B$ data-points of dimension $D$ each), i.e. $\\left[ \\begin{array}{cccc}\n",
"x_{11} & x_{12} & \\ldots & x_{1D} \\\\\n",
"x_{21} & x_{22} & \\ldots & x_{2D} \\\\\n",
"\\cdots \\\\\n",
"x_{B1} & x_{B2} & \\ldots & x_{BD} \\\\ \\end{array} \\right]$ equation (3) effectively becomes to be\n",
"\n",
"(4) $\n",
"\\begin{equation}\n",
" \\mathbf Y=\\mathbf X\\mathbf W + \\mathbf b\n",
"\\end{equation}\n",
"$\n",
"\n",
"where $\\mathbf{W} \\in \\mathbb{R}^{D\\times K}$ and both $\\mathbf{X}\\in\\mathbb{R}^{B\\times D}$ and $\\mathbf{Y}\\in\\mathbb{R}^{B\\times K}$ are matrices, and $\\mathbf{b}\\in\\mathbb{R}^{1\\times K}$ needs to be <a href=\"http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html\">broadcasted</a> $B$ times (numpy will do this by default). However, we will not make an explicit distinction between a special case for $B=1$ and $B>1$ and simply use equation (3) instead, although $\\mathbf{x}$ and hence $\\mathbf{y}$ could be matrices. From an implementation point of view, it does not matter.\n",
"\n",
"The desired functionality for matrix multiplication in numpy is provided by <a href=\"http://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html\">numpy.dot</a> function. If you haven't use it so far, get familiar with it as we will use it extensively."
]
},
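{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick (optional) illustration of the above, the cell below shows the row-wise and column-wise layouts using numpy's `order` argument, and evaluates the affine transform of equation (3) with `numpy.dot` for some made-up $\mathbf{x}$, $\mathbf{W}$ and $\mathbf{b}$ (the values and sizes are arbitrary):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy\n",
"\n",
"# the 3x3 matrix from the note above; numpy stores it in row-major (C) order by default\n",
"A = numpy.arange(1, 10).reshape(3, 3)\n",
"print(A.ravel(order='C'))   # row-wise layout:    [1 2 3 4 5 6 7 8 9]\n",
"print(A.ravel(order='F'))   # column-wise layout: [1 4 7 2 5 8 3 6 9]\n",
"\n",
"# affine transform y = xW + b of equation (3), with D=5 inputs and K=3 outputs\n",
"x = numpy.array([0.5, -1.0, 2.0, 0.0, 1.5])   # a single row-wise data-point (made-up values)\n",
"W = 0.1 * numpy.ones((5, 3))                  # D x K weight matrix (made-up values)\n",
"b = numpy.array([0.0, 0.1, -0.1])             # K-dimensional bias vector\n",
"print(numpy.dot(x, W) + b)"
]
},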
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### A general note on random number generators\n",
"\n",
"It is generally a good practice (for machine learning applications **not** for cryptography!) to seed a pseudo-random number generator once at the beginning of the experiment, and use it later through the code where necesarry. This makes it easier to reproduce results since random initialisations can be replicated. As such, within this course we are going use a single random generator object, similar to the below:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy\n",
"import sys\n",
"\n",
"#initialise the random generator to be used later\n",
"seed=[2015, 10, 1]\n",
"random_generator = numpy.random.RandomState(seed)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 1 \n",
"\n",
"Using `numpy.dot`, implement **forward** propagation through the linear transform defined by equations (3) and (4) for $B=1$ and $B>1$ i.e. use parameters $\\mathbf{W}$ and $\\mathbf{b}$ with data $\\mathbf{X}$ to determine $\\mathbf{Y}$. Use `MNISTDataProvider` (introduced last week) to generate $\\mathbf{X}$. We are going to write a function for each equation:\n",
"1. `y1_equation_1`: Return the value of the $1^{st}$ dimension of $\\mathbf{y}$ (the output of the first output node) given a single training data point $\\mathbf{x}$ using a sum\n",
"1. `y1_equation_2`: Repeat above using vector multiplication (use `numpy.dot()`)\n",
"1. `y_equation_3`: Return the value of $\\mathbf{y}$ (the whole output layer) given a single training data point $\\mathbf{x}$\n",
"1. `Y_equation_4`: Return the value of $\\mathbf{Y}$ given $\\mathbf{X}$\n",
"\n",
"We have initialised $\\mathbf{b}$ to zeros and randomly generated $\\mathbf{W}$ for you. The constants introduced above are:\n",
"* The number of data points $B = 3$\n",
"* The dimensionality of the input $D = 10$\n",
"* The dimensionality of the output $K = 10$"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from mlp.dataset import MNISTDataProvider\n",
"\n",
"mnist_dp = MNISTDataProvider(dset='valid', batch_size=3, max_num_batches=1, randomize=False)\n",
"B = 3\n",
"D = 784\n",
"K = 10\n",
"irange = 0.1\n",
"W = random_generator.uniform(-irange, irange, (D, K)) \n",
"b = numpy.zeros((10,))\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"y1e1 0.55861474982\n",
"y1e2 0.55861474982\n",
"ye3 [ 0.55861475 0.79450077 0.17439693 0.00265688 0.66272539 -0.09985686\n",
" 0.56468591 0.58105588 -0.18613727 0.08151257]\n",
"Ye4 [[ 0.55861475 0.79450077 0.17439693 0.00265688 0.66272539 -0.09985686\n",
" 0.56468591 0.58105588 -0.18613727 0.08151257]\n",
" [-0.43965864 0.59573972 -0.22691119 0.26767124 -0.31343979 0.07224664\n",
" -0.19616183 0.0851733 -0.24088286 -0.19305162]\n",
" [-0.20176359 0.42394166 -1.03984446 0.15492101 0.15694745 -0.53741022\n",
" 0.05887668 -0.21124527 -0.07870156 -0.00506471]]\n"
]
}
],
"source": [
"mnist_dp.reset()\n",
"\n",
"#implement following functions, then run the cell\n",
"def y1_equation_1(x, W, b):\n",
" k = 0\n",
" s = 0\n",
" for j in xrange(len(x)):\n",
" s += x[j] * W[j,k]\n",
" return b[k] + s\n",
" \n",
"def y1_equation_2(x, W, b):\n",
" k = 0\n",
" return numpy.dot(x, W[:,k]) + b[k]\n",
"\n",
"def y_equation_3(x, W, b):\n",
" return numpy.dot(x, W) + b\n",
"\n",
"def y_equation_4(x, W, b):\n",
" return numpy.dot(x, W) + b\n",
"\n",
"for X, t in mnist_dp:\n",
" n = 0\n",
" y1e1 = y1_equation_1(X[n], W, b)\n",
" y1e2 = y1_equation_2(X[n], W, b)\n",
" ye3 = y_equation_3(X[n], W, b)\n",
" Ye4 = y_equation_4(X, W, b)\n",
"\n",
"print 'y1e1', y1e1\n",
"print 'y1e2', y1e2\n",
"print 'ye3', ye3\n",
"print 'Ye4', Ye4"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"## Exercise 2\n",
"\n",
"Modify (if necessary) examples from Exercise 1 to perform **backward** propagation, that is, given $\\mathbf{y}$ (obtained in previous step) and weight matrix $\\mathbf{W}$, project $\\mathbf{y}$ onto the input space $\\mathbf{x}$ (ignore or set to zero the biases towards $\\mathbf{x}$ in backward pass). Mathematically, we are interested in the following transformation: $\\mathbf{z}=\\mathbf{y}\\mathbf{W}^T$"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[-0.00683757 -0.13638553 0.00203203 ..., 0.02690207 -0.07364245\n",
" 0.04403087]\n",
" [-0.00447621 -0.06409652 0.01211384 ..., 0.0402248 -0.04490571\n",
" -0.05013801]\n",
" [ 0.03981022 -0.13705957 0.05882239 ..., 0.04491902 -0.08644539\n",
" -0.07106441]]\n"
]
}
],
"source": [
"y = y_equation_3(x, W, b)\n",
"z = numpy.dot(y, W.T)\n",
"\n",
"print z\n",
"assert z.shape == x.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***\n",
"## Exercise 3 (optional)\n",
"\n",
"In case you do not fully understand how matrix-vector and/or matrix-matrix products work, consider implementing `my_dot_mat_mat` function (you have been given `my_dot_vec_mat` code to look at as an example) which takes as the input the following arguments:\n",
"\n",
"* D-dimensional input vector $\\mathbf{x} = (x_1, x_2, \\ldots, x_D) $.\n",
"* Weight matrix $\\mathbf{W}\\in\\mathbb{R}^{D\\times K}$:\n",
"\n",
"and returns:\n",
"\n",
"* K-dimensional output vector $\\mathbf{y} = (y_1, \\ldots, y_K) $\n",
"\n",
"Your job is to write a variant that works in a mini-batch mode where both $\\mathbf{x}\\in\\mathbb{R}^{B\\times D}$ and $\\mathbf{y}\\in\\mathbb{R}^{B\\times K}$ are matrices in which each rows contain one of $B$ data-points from mini-batch (rather than $\\mathbf{x}\\in\\mathbb{R}^{D}$ and $\\mathbf{y}\\in\\mathbb{R}^{K}$)."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def my_dot_vec_mat(x, W):\n",
" J = x.shape[0]\n",
" K = W.shape[1]\n",
" assert (J == W.shape[0]), (\n",
" \"Number of columns of x expected to \"\n",
" \" to be equal to the number of rows in \"\n",
" \"W, bot got shapes %s, %s\" % (x.shape, W.shape)\n",
" )\n",
" y = numpy.zeros((K,))\n",
" for k in xrange(0, K):\n",
" for j in xrange(0, J):\n",
" y[k] += x[j] * W[j,k]\n",
" \n",
" return y"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Well done!\n"
]
}
],
"source": [
"irange = 0.1 #+-range from which we draw the random numbers\n",
"\n",
"x = random_generator.uniform(-irange, irange, (5,)) \n",
"W = random_generator.uniform(-irange, irange, (5,3)) \n",
"\n",
"y_my = my_dot_vec_mat(x, W)\n",
"y_np = numpy.dot(x, W)\n",
"\n",
"same = numpy.allclose(y_my, y_np)\n",
"\n",
"if same:\n",
" print 'Well done!'\n",
"else:\n",
" print 'Matrices are different:'\n",
" print 'y_my is: ', y_my\n",
" print 'y_np is: ', y_np"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def my_dot_mat_mat(x, W):\n",
" I = x.shape[0]\n",
" J = x.shape[1]\n",
" K = W.shape[1]\n",
" assert (J == W.shape[0]), (\n",
" \"Number of columns in of x expected to \"\n",
" \" to be the same as rows in W, got\"\n",
" )\n",
" #allocate the output container\n",
" y = numpy.zeros((I, K))\n",
" \n",
" #implement here matrix-matrix inner product here\n",
" for i in xrange(0, I):\n",
" for k in xrange(0, K):\n",
" for j in xrange(0, J):\n",
" y[i, k] += x[i, j] * W[j,k]\n",
" \n",
" return y"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Test whether you get comparable numbers to what numpy is producing:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Well done!\n"
]
}
],
"source": [
"irange = 0.1 #+-range from which we draw the random numbers\n",
"\n",
"x = random_generator.uniform(-irange, irange, (2,5)) \n",
"W = random_generator.uniform(-irange, irange, (5,3)) \n",
"\n",
"y_my = my_dot_mat_mat(x, W)\n",
"y_np = numpy.dot(x, W)\n",
"\n",
"same = numpy.allclose(y_my, y_np)\n",
"\n",
"if same:\n",
" print 'Well done!'\n",
"else:\n",
" print 'Matrices are different:'\n",
" print 'y_my is: ', y_my\n",
" print 'y_np is: ', y_np"
]
},
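{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an optional aside, the explicit loops in `my_dot_mat_mat` can also be replaced by a single call to `numpy.einsum`; the sketch below checks such a variant against `numpy.dot` (it reuses `random_generator` and `irange` defined above):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def my_dot_mat_mat_einsum(x, W):\n",
"    # y[i, k] = sum_j x[i, j] * W[j, k], i.e. summation over the shared index j\n",
"    return numpy.einsum('ij,jk->ik', x, W)\n",
"\n",
"x = random_generator.uniform(-irange, irange, (2, 5))\n",
"W = random_generator.uniform(-irange, irange, (5, 3))\n",
"\n",
"print(numpy.allclose(my_dot_mat_mat_einsum(x, W), numpy.dot(x, W)))"
]
},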
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we benchmark each approach (we do it in separate cells, as timeit currently can measure whole cell execuiton only)."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#generate bit bigger matrices, to better evaluate timings\n",
"x = random_generator.uniform(-irange, irange, (10, 1000))\n",
"W = random_generator.uniform(-irange, irange, (1000, 100))"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"my_dot timings:\n",
"10 loops, best of 3: 726 ms per loop\n"
]
}
],
"source": [
"print 'my_dot timings:'\n",
"%timeit -n10 my_dot_mat_mat(x, W)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"numpy.dot timings:\n",
"10 loops, best of 3: 1.17 ms per loop\n"
]
}
],
"source": [
"print 'numpy.dot timings:'\n",
"%timeit -n10 numpy.dot(x, W)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Optional section ends here**\n",
"***"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Iterative learning of linear models\n",
"\n",
"We will learn the model with stochastic gradient descent on N data-points using mean square error (MSE) loss function, which is defined as follows:\n",
"\n",
"(5) $\n",
"E = \\frac{1}{2} \\sum_{n=1}^N ||\\mathbf{y}^n - \\mathbf{t}^n||^2 = \\sum_{n=1}^N E^n \\\\\n",
" E^n = \\frac{1}{2} ||\\mathbf{y}^n - \\mathbf{t}^n||^2\n",
"$\n",
"\n",
"(6) $ E^n = \\frac{1}{2} \\sum_{k=1}^K (y_k^n - t_k^n)^2 $\n",
" \n",
"Hence, the gradient w.r.t (with respect to) the $r$ output y of the model is defined as, so called delta function, $\\delta_r$: \n",
"\n",
"(8) $\\frac{\\partial{E^n}}{\\partial{y_{r}}} = (y^n_r - t^n_r) = \\delta^n_r \\quad ; \\quad\n",
" \\delta^n_r = y^n_r - t^n_r \\\\\n",
" \\frac{\\partial{E}}{\\partial{y_{r}}} = \\sum_{n=1}^N \\frac{\\partial{E^n}}{\\partial{y_{r}}} = \\sum_{n=1}^N \\delta^n_r\n",
"$\n",
"\n",
"Similarly, using the above $\\delta^n_r$ one can express the gradient of the weight $w_{sr}$ (from the s-th input to the r-th output) for linear model and MSE cost as follows:\n",
"\n",
"(9) $\n",
" \\frac{\\partial{E^n}}{\\partial{w_{sr}}} = (y^n_r - t^n_r)x_s^n = \\delta^n_r x_s^n \\quad\\\\\n",
" \\frac{\\partial{E}}{\\partial{w_{sr}}} = \\sum_{n=1}^N \\frac{\\partial{E^n}}{\\partial{w_{rs}}} = \\sum_{n=1}^N \\delta^n_r x_s^n\n",
"$\n",
"\n",
"and the gradient for bias parameter at the $r$-th output is:\n",
"\n",
"(10) $\n",
" \\frac{\\partial{E}}{\\partial{b_{r}}} = \\sum_{n=1}^N \\frac{\\partial{E^n}}{\\partial{b_{r}}} = \\sum_{n=1}^N \\delta^n_r\n",
"$"
]
},
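{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make equations (8)-(10) concrete, the (optional) cell below computes $\delta$, $\frac{\partial{E}}{\partial{\mathbf{W}}}$ and $\frac{\partial{E}}{\partial{\mathbf{b}}}$ for a tiny linear model with made-up numbers ($B=2$, $D=3$, $K=2$):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# toy example of equations (8)-(10): B=2 data-points, D=3 inputs, K=2 outputs (made-up numbers)\n",
"x = numpy.array([[1.0, 2.0, 3.0],\n",
"                 [0.5, -1.0, 0.0]])\n",
"t = numpy.array([[1.0, 0.0],\n",
"                 [0.0, 1.0]])\n",
"W = 0.1 * numpy.ones((3, 2))\n",
"b = numpy.zeros((2,))\n",
"\n",
"y = numpy.dot(x, W) + b            # forward pass, equation (3)\n",
"deltas = y - t                     # equation (8): delta^n_r = y^n_r - t^n_r\n",
"grad_W = numpy.dot(x.T, deltas)    # equation (9): dE/dw_sr = sum_n delta^n_r x^n_s\n",
"grad_b = deltas.sum(axis=0)        # equation (10): dE/db_r = sum_n delta^n_r\n",
"\n",
"print(deltas)\n",
"print(grad_W)\n",
"print(grad_b)"
]
},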
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"![Making Predictions](res/singleLayerNetPredict.png)\n",
" \n",
" * Input vector $\\mathbf{x} = (x_1, x_2, \\ldots, x_D) $\n",
" * Output scalar $y_1$\n",
" * Weight matrix $\\mathbf{W}$: $w_{ik}$ is the weight from input $x_i$ to output $y_k$. Note, here this is really a vector since a single scalar output, y_1.\n",
" * Scalar bias $b$ for the only output in our model \n",
" * Scalar target $t$ for the only output in out model\n",
" \n",
"First, ensure you can make use of data provider (note, for training data has been normalised to zero mean and unit variance, hence different effective range than one can find in file):"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Observations: [[-0.12 -0.13 -0.13 -0.13 -0.13 -0.13 -0.13 -0.13 -0.13 -0.13]\n",
" [-0.11 -0.1 0.09 -0.06 -0.09 -0. 0.28 -0.12 -0.12 -0.08]\n",
" [-0.13 0.05 -0.13 -0.01 -0.11 -0.13 -0.13 -0.13 -0.13 -0.13]\n",
" [ 0.2 0.12 0.25 0.16 0.03 -0. 0.15 0.08 -0.08 -0.11]\n",
" [-0.13 -0.12 -0.13 -0.13 -0.13 -0.13 -0.13 -0.13 -0.13 -0.13]\n",
" [-0.1 0.51 1.52 0.14 -0.02 0.77 0.11 0.79 -0.02 0.08]\n",
" [ 0.24 0.15 -0.01 0.08 -0.1 0.45 -0.12 -0.1 -0.13 0.48]\n",
" [ 0.13 -0.06 -0.07 -0.11 -0.11 -0.11 -0.13 -0.11 -0.02 -0.12]\n",
" [-0.06 0.28 -0.13 0.06 0.09 0.09 0.01 -0.07 0.14 -0.11]\n",
" [-0.13 -0.13 -0.1 -0.06 -0.13 -0.13 -0.13 -0.13 -0.13 -0.13]]\n",
"To predict: [[-0.12]\n",
" [-0.12]\n",
" [-0.13]\n",
" [-0.1 ]\n",
" [-0.13]\n",
" [-0.08]\n",
" [ 0.24]\n",
" [-0.13]\n",
" [-0.02]\n",
" [-0.13]]\n",
"Observations: [[-0.09 -0.13 -0.13 -0.03 -0.05 -0.11 -0.13 -0.13 -0.13 -0.13]\n",
" [-0.03 0.32 0.28 0.09 -0.04 0.19 0.31 -0.13 0.37 0.34]\n",
" [ 0.12 0.13 0.06 -0.1 -0.1 0.94 0.24 0.12 0.28 -0.04]\n",
" [ 0.26 0.17 -0.04 -0.13 -0.12 -0.09 -0.12 -0.13 -0.1 -0.13]\n",
" [-0.1 -0.1 -0.01 -0.03 -0.07 0.05 -0.03 -0.12 -0.05 -0.13]\n",
" [-0.13 -0.13 -0.13 -0.13 -0.13 -0.13 0.1 -0.13 -0.13 -0.13]\n",
" [-0.01 -0.1 -0.13 -0.13 -0.12 -0.13 -0.13 -0.13 -0.13 -0.11]\n",
" [-0.11 -0.06 -0.11 0.02 -0.03 -0.02 -0.05 -0.11 -0.13 -0.13]\n",
" [-0.01 0.25 -0.08 0.04 -0.1 -0.12 0.06 -0.1 0.08 -0.06]\n",
" [-0.09 -0.09 -0.09 -0.13 -0.11 -0.12 -0. -0.02 0.19 -0.11]]\n",
"To predict: [[-0.13]\n",
" [-0.11]\n",
" [-0.09]\n",
" [-0.08]\n",
" [ 0.19]\n",
" [-0.13]\n",
" [-0.13]\n",
" [-0.03]\n",
" [-0.13]\n",
" [-0.11]]\n",
"Observations: [[-0.08 -0.11 -0.11 0.32 0.05 -0.11 -0.13 0.07 0.08 0.63]\n",
" [-0.07 -0.1 -0.09 -0.08 0.26 -0.05 -0.1 -0. 0.36 -0.12]\n",
" [-0.03 -0.1 0.19 -0.02 0.35 0.38 -0.1 0.44 -0.02 0.21]\n",
" [-0.12 -0. -0.02 0.19 -0.11 -0.11 -0.13 -0.11 -0.02 -0.13]\n",
" [ 0.09 0.1 -0.03 -0.05 0. -0.12 -0.12 -0.13 -0.13 -0.13]\n",
" [ 0.21 0.05 -0.12 -0.05 -0.08 -0.1 -0.13 -0.13 -0.13 -0.13]\n",
" [-0.04 -0.11 0.19 0.16 -0.01 -0.07 -0. -0.06 -0.03 0.16]\n",
" [ 0.09 0.05 0.51 0.34 0.16 0.51 0.56 0.21 -0.06 -0. ]\n",
" [-0.13 -0.13 -0.13 -0.13 -0.13 -0.13 -0.13 -0.13 -0.09 0.49]\n",
" [-0.06 -0.11 -0.13 0.06 -0.01 -0.12 0.54 0.2 -0.1 -0.11]]\n",
"To predict: [[ 0.1 ]\n",
" [ 0.09]\n",
" [ 0.16]\n",
" [-0.13]\n",
" [-0.13]\n",
" [ 0.04]\n",
" [-0.1 ]\n",
" [ 0.05]\n",
" [-0.1 ]\n",
" [-0.11]]\n"
]
}
],
"source": [
"from mlp.dataset import MetOfficeDataProvider\n",
"\n",
"modp = MetOfficeDataProvider(10, batch_size=10, max_num_batches=3, randomize=True)\n",
"\n",
"%precision 2\n",
"for x, t in modp:\n",
" print 'Observations: ', x\n",
" print 'To predict: ', t"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 4\n",
"\n",
"The below code implements a very simple variant of stochastic gradient descent for the weather regression example. Your task is to implement 5 functions in the next cell and then run two next cells that 1) build sgd functions and 2) run the actual training."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"\n",
"#When implementing those, take into account the mini-batch case, for which one is\n",
"#expected to sum the errors for each example\n",
"\n",
"def fprop(x, W, b):\n",
" #code implementing eq. (3)\n",
" return numpy.dot(x, W) + b\n",
"\n",
"def cost(y, t):\n",
" #Mean Square Error cost, equation (5)\n",
" return numpy.mean(0.5*numpy.sum((y - t)**2, axis=1))\n",
"\n",
"def cost_grad(y, t):\n",
" #Gradient of the cost w.r.t y equation (8)\n",
" return y - t\n",
"\n",
"def cost_wrt_W(cost_grad, x):\n",
" #Gradient of the cost w.r.t W, equation (9)\n",
" return numpy.dot(x.T, cost_grad)\n",
" \n",
"def cost_wrt_b(cost_grad):\n",
" #Gradient of the cost w.r.t to b, equation (10)\n",
" return numpy.sum(cost_grad, axis = 0)\n"
]
},
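{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an optional sanity check, the cell below compares the analytic gradient returned by `cost_wrt_W` above with a numerical gradient obtained by finite differences of the cost, on a small random problem. Note that `cost` averages over the mini-batch while equation (9) (and hence `cost_wrt_W`) sums over it, so the two differ by a factor of the batch size; the same check can be repeated for `cost_wrt_b`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# numerical check of cost_wrt_W against finite differences of cost(fprop(...))\n",
"eps = 1e-6\n",
"xc = random_generator.uniform(-1, 1, (4, 3))      # small random batch, B=4, D=3\n",
"tc = random_generator.uniform(-1, 1, (4, 2))      # random targets, K=2\n",
"Wc = random_generator.uniform(-0.1, 0.1, (3, 2))\n",
"bc = numpy.zeros((2,))\n",
"\n",
"deltas = cost_grad(fprop(xc, Wc, bc), tc)\n",
"grad_W = cost_wrt_W(deltas, xc)                   # analytic gradient, summed over the batch\n",
"\n",
"num_grad_W = numpy.zeros_like(Wc)\n",
"for s in range(Wc.shape[0]):\n",
"    for r in range(Wc.shape[1]):\n",
"        W_plus, W_minus = Wc.copy(), Wc.copy()\n",
"        W_plus[s, r] += eps\n",
"        W_minus[s, r] -= eps\n",
"        num_grad_W[s, r] = (cost(fprop(xc, W_plus, bc), tc) -\n",
"                            cost(fprop(xc, W_minus, bc), tc)) / (2 * eps)\n",
"\n",
"# cost() is a mean over the batch, cost_wrt_W() a sum, hence the factor B = xc.shape[0]\n",
"print(numpy.allclose(grad_W, num_grad_W * xc.shape[0]))"
]
},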
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"\n",
"def sgd_epoch(data_provider, W, b, learning_rate):\n",
" mse_stats = []\n",
" \n",
" #get the minibatch of data\n",
" for x, t in data_provider:\n",
"\n",
" #1. get the estimate of y\n",
" y = fprop(x, W, b)\n",
"\n",
" #2. compute the loss function\n",
" tmp = cost(y, t)\n",
" mse_stats.append(tmp)\n",
" \n",
" #3. compute the grad of the cost w.r.t the output layer activation y\n",
" #i.e. how the cost changes when output y changes\n",
" cost_grad_deltas = cost_grad(y, t)\n",
"\n",
" #4. compute the gradients w.r.t model's parameters\n",
" grad_W = cost_wrt_W(cost_grad_deltas, x)\n",
" grad_b = cost_wrt_b(cost_grad_deltas)\n",
"\n",
" #4. Update the model, we update with the mean gradient\n",
" # over the minibatch, rather than sum of particular gradients\n",
" # in a minibatch, to do so we scale the learning rate by batch_size\n",
" batch_size = x.shape[0]\n",
" effect_learn_rate = learning_rate / batch_size\n",
"\n",
" W = W - effect_learn_rate * grad_W\n",
" b = b - effect_learn_rate * grad_b\n",
" \n",
" return W, b, numpy.mean(mse_stats)\n",
"\n",
"def sgd(data_provider, W, b, learning_rate=0.1, max_epochs=10):\n",
" \n",
" for epoch in xrange(0, max_epochs):\n",
" #reset the data provider\n",
" data_provider.reset()\n",
" \n",
" #train for one epoch\n",
" W, b, mean_cost = \\\n",
" sgd_epoch(data_provider, W, b, learning_rate)\n",
" \n",
" print \"MSE training cost after %d-th epoch is %f\" % (epoch + 1, mean_cost)\n",
" \n",
" return W, b\n",
" \n",
" "
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"MSE training cost after 1-th epoch is 0.017213\n",
"MSE training cost after 2-th epoch is 0.016103\n",
"MSE training cost after 3-th epoch is 0.015705\n",
"MSE training cost after 4-th epoch is 0.015437\n",
"MSE training cost after 5-th epoch is 0.015255\n",
"MSE training cost after 6-th epoch is 0.015128\n",
"MSE training cost after 7-th epoch is 0.015041\n",
"MSE training cost after 8-th epoch is 0.014981\n",
"MSE training cost after 9-th epoch is 0.014936\n",
"MSE training cost after 10-th epoch is 0.014903\n",
"MSE training cost after 11-th epoch is 0.014879\n",
"MSE training cost after 12-th epoch is 0.014862\n",
"MSE training cost after 13-th epoch is 0.014849\n",
"MSE training cost after 14-th epoch is 0.014839\n",
"MSE training cost after 15-th epoch is 0.014830\n",
"MSE training cost after 16-th epoch is 0.014825\n",
"MSE training cost after 17-th epoch is 0.014820\n",
"MSE training cost after 18-th epoch is 0.014813\n",
"MSE training cost after 19-th epoch is 0.014813\n",
"MSE training cost after 20-th epoch is 0.014810\n",
"MSE training cost after 21-th epoch is 0.014808\n",
"MSE training cost after 22-th epoch is 0.014805\n",
"MSE training cost after 23-th epoch is 0.014806\n",
"MSE training cost after 24-th epoch is 0.014804\n",
"MSE training cost after 25-th epoch is 0.014796\n",
"MSE training cost after 26-th epoch is 0.014798\n",
"MSE training cost after 27-th epoch is 0.014801\n",
"MSE training cost after 28-th epoch is 0.014802\n",
"MSE training cost after 29-th epoch is 0.014801\n",
"MSE training cost after 30-th epoch is 0.014799\n",
"MSE training cost after 31-th epoch is 0.014799\n",
"MSE training cost after 32-th epoch is 0.014793\n",
"MSE training cost after 33-th epoch is 0.014800\n",
"MSE training cost after 34-th epoch is 0.014796\n",
"MSE training cost after 35-th epoch is 0.014799\n",
"MSE training cost after 36-th epoch is 0.014800\n",
"MSE training cost after 37-th epoch is 0.014798\n",
"MSE training cost after 38-th epoch is 0.014799\n",
"MSE training cost after 39-th epoch is 0.014799\n",
"MSE training cost after 40-th epoch is 0.014794\n"
]
},
{
"data": {
"text/plain": [
"(array([[ 0.01],\n",
" [ 0.03],\n",
" [ 0.03],\n",
" [ 0.04],\n",
" [ 0.06],\n",
" [ 0.07],\n",
" [ 0.26]]), array([-0.]))"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"\n",
"#some hyper-parameters\n",
"window_size = 7\n",
"irange = 0.1\n",
"learning_rate = 0.001\n",
"max_epochs=40\n",
"\n",
"# note, while developing you can set max_num_batches to some positive number to limit\n",
"# the number of training data-points (you will get feedback faster)\n",
"mdp = MetOfficeDataProvider(window_size, batch_size=10, max_num_batches=-100, randomize=True)\n",
"\n",
"#initialise the parameters\n",
"W = random_generator.uniform(-irange, irange, (window_size, 1))\n",
"b = random_generator.uniform(-irange, irange, (1, ))\n",
"\n",
"#train the model\n",
"sgd(mdp, W, b, learning_rate=learning_rate, max_epochs=max_epochs)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"## Exercise 5\n",
"\n",
"Modify the above regression problem so the model makes binary classification whether the the weather is going to be one of those \\{rainy, sunny} (look at slide 12 of the 2nd lecture)\n",
"\n",
"Tip: You need to introduce the following changes:\n",
"1. Modify `MetOfficeDataProvider` (for example, inherit from MetOfficeDataProvider to create a new class MetOfficeDataProviderBin) and modify `next()` function so it returns as `targets` either 0 (sunny - if the the amount of rain [before mean/variance normalisation] is equal to 0 or 1 (rainy -- otherwise).\n",
"2. Modify the functions from previous exercise so the fprop implements `sigmoid` on top of affine transform.\n",
"3. Modify cost function to binary cross-entropy\n",
"4. Make sure you compute the gradients correctly (as you have changed both the output and the cost)\n"
]
},
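{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell below gives only a rough sketch of how the changes described in the tips might look; it is **not** a complete solution. The function names (`fprop_bin`, `cost_bce`) are illustrative, and the `MetOfficeDataProviderBin` class from tip 1 is assumed to exist and to return 0/1 targets."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Sketch only: sigmoid output plus binary cross-entropy, following the tips above.\n",
"# MetOfficeDataProviderBin (tip 1) is assumed to exist and to return 0/1 targets.\n",
"\n",
"def sigmoid(a):\n",
"    return 1.0 / (1.0 + numpy.exp(-a))\n",
"\n",
"def fprop_bin(x, W, b):\n",
"    # tip 2: affine transform followed by a sigmoid\n",
"    return sigmoid(numpy.dot(x, W) + b)\n",
"\n",
"def cost_bce(y, t):\n",
"    # tip 3: binary cross-entropy, averaged over the mini-batch (for monitoring)\n",
"    return -numpy.mean(t * numpy.log(y) + (1 - t) * numpy.log(1 - y))\n",
"\n",
"def cost_bce_grad(y, t):\n",
"    # tip 4: for the per-example cross-entropy the sigmoid derivative cancels, giving\n",
"    # delta = y - t w.r.t. the pre-sigmoid activation, so cost_wrt_W and cost_wrt_b\n",
"    # defined earlier can be reused unchanged\n",
"    return y - t"
]
},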
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#sorry, this one will be added later..."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 0
}


@@ -1,341 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction\n",
"\n",
"This tutorial is an introduction to the first coursework about multi-layer networks (also known as Multi-Layer Perceptrons - MLPs - or Deep Neural Networks - DNNs). Here, we will show how to build a single layer linear model (similar to the one from the previous lab) for MNIST digit classification using the provided code-base. \n",
"\n",
"The principal purpose of this introduction is to get you familiar with how to connect the code blocks (and what operations each of them implements) in order to set up an experiment that includes 1) building the model structure 2) optimising the model's parameters (weights) and 3) evaluating the model on test data. \n",
"\n",
"## For those affected by notebook kernel issues\n",
"\n",
"In case you are still having issues with running notebook kernels, have a look at [this note](https://github.com/CSTR-Edinburgh/mlpractical/blob/master/kernel_issue_fix.md) on the GitHub.\n",
"\n",
"## Virtual environments\n",
"\n",
"Before you proceed onwards, remember to activate your virtual environment:\n",
" * If you were in last week's Tuesday or Wednesday group type `activate_mlp` or `source ~/mlpractical/venv/bin/activate`\n",
" * If you were in the Monday group:\n",
" + and if you have chosen the **comfy** way type: `workon mlpractical`\n",
" + and if you have chosen the **generic** way, `source` your virutal environment using `source` and specyfing the path to the activate script (you need to localise it yourself, there were not any general recommendations w.r.t dir structure and people have installed it in different places, usually somewhere in the home directories. If you cannot easily find it by yourself, use something like: `find . -iname activate` ):\n",
"\n",
"## Syncing the git repository\n",
"\n",
"Look <a href=\"https://github.com/CSTR-Edinburgh/mlpractical/blob/master/gitFAQ.md\">here</a> for more details. But in short, we recommend to create a separate branch for the coursework, as follows:\n",
"\n",
"1. Enter the mlpractical directory `cd ~/mlpractical/repo-mlp`\n",
"2. List the branches and check which is currently active by typing: `git checkout`\n",
"3. If you are not in `master` branch, switch to it by typing: \n",
"```\n",
"git checkout master\n",
" ```\n",
"4. Then update the repository (note, assuming master does not have any conflicts), if there are some, have a look <a href=\"https://github.com/CSTR-Edinburgh/mlpractical/blob/master/gitFAQ.md\">here</a>\n",
"```\n",
"git pull\n",
"```\n",
"5. And now, create the new branch & swith to it by typing:\n",
"```\n",
"git checkout -b coursework1\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Multi Layer Models\n",
"\n",
"Today, we shall build models which can have an arbitrary number of hidden layers. Please have a look at the diagram below, and the corresponding computations (which have an *exact* matrix form as expected by numpy, and row-wise orientation; note that $\\circ$ denotes an element-wise product). In the diagram, we briefly describe how each comptation relates to the code we have provided.\n",
"\n",
"![Making Predictions](res/code_scheme.svg)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. Structuring the model\n",
" * The model (for now) is allowed to have a sequence of layers, mapping inputs $\\mathbf{x}$ to outputs $\\mathbf{y}$. \n",
" * This operation is implemented as a special type of a layer in `mlp.layers.MLP` class. It keeps a sequence of other layers (of various typyes like Linear, Sigmoid, Softmax, etc.) as well as the internal state of a model for a mini-batch, that is, the intermediate data produced in *forward* and *backward* passes.\n",
"2. Forward computation\n",
" * `mlp.layers.MLP` provides an `fprop()` method that iterates over defined layers propagates $\\mathbf{x}$ to $\\mathbf{y}$. \n",
" * Each layer (look at `mlp.layers.Linear` attached below) also implements an `fprop()` method, which performs an atomic, for the given layer, operation. Most often, for the $i$-th layer, we want to obtain a linear transform $\\mathbf a^i$ of the inputs, and apply some non-linear transfer function $f^i(\\mathbf a^i)$ to produce the output $\\mathbf h^i$. Note, in general each layer may implement different activation functions $f^i()$, however for now we will use only `sigmoid` and `softmax`\n",
"3. Backward computation\n",
" * Similarly, `mlp.layers.MLP` also implements a `bprop()` function, to back-propagate the errors from the top to the bottom layer. This class also keeps the back-propagated statistics ($\\delta$) to be used later when computing the gradients with respect to the parameters.\n",
" * This functionality is also re-implemented by particular layers (again, have a look at the `bprop` function of `mlp.layers.Linear`). `bprop()` returns both $\\delta$ (needed to update the parameters) but also back-progapates the gradient down to the inputs. Also note, that depending on whether the layer is the top or not (i.e. if it deals directly with the cost function or not) some simplifications may apply ( as with cross-entropy and softmax). That's why when implementing a new type of layer that may be used as an output layer one also need to specify the implementation of `bprop_cost()`.\n",
"4. Learning the model\n",
" * The actual evaluation of the cost as well as the *forward* and *backward* passes may be found in the `train_epoch()` method of `mlp.optimisers.SGDOptimiser`\n",
" * This function also calls the `pgrads()` method on each layer, that given activations and deltas, returns the list of the gradients of the cost with respect to the model parameters, i.e. $\\frac{\\partial{\\mathbf{E}}}{\\partial{\\mathbf{W^i}}}$ and $\\frac{\\partial{\\mathbf{E}}}{\\partial{\\mathbf{b}^i}}$ at the above diagram (look at an example implementation in `mlp.layers.Linear`)"
]
},
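{
"cell_type": "markdown",
"metadata": {},
"source": [
"The toy container below is only a simplified illustration of how the `fprop()`, `bprop()` and `pgrads()` calls described above chain together; it is **not** the actual `mlp.layers.MLP` implementation, which additionally handles the cost, `bprop_cost()`, parameter access, and so on."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Illustrative toy container -- not the real mlp.layers.MLP implementation\n",
"class TinyMLP(object):\n",
"\n",
"    def __init__(self, layers):\n",
"        self.layers = layers       # e.g. [Linear(...), Sigmoid(...), Softmax(...)]\n",
"        self.activations = []      # intermediate state kept for the backward pass\n",
"\n",
"    def fprop(self, x):\n",
"        # propagate x through the layers, keeping each layer's output\n",
"        self.activations = [x]\n",
"        for layer in self.layers:\n",
"            self.activations.append(layer.fprop(self.activations[-1]))\n",
"        return self.activations[-1]\n",
"\n",
"    def bprop_and_pgrads(self, igrads):\n",
"        # walk back from the top layer, collecting the parameter gradients\n",
"        grads = []\n",
"        for i in reversed(range(len(self.layers))):\n",
"            deltas, igrads = self.layers[i].bprop(self.activations[i + 1], igrads)\n",
"            grads.append(self.layers[i].pgrads(self.activations[i], deltas))\n",
"        return list(reversed(grads))"
]
},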
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# %load -s Linear mlp/layers.py\n",
"# DO NOT RUN THIS CELL (AS YOU WILL GET ERRORS), IT WAS JUST LOADED TO VISUALISE ABOVE COMMENTS\n",
"class Linear(Layer):\n",
"\n",
" def __init__(self, idim, odim,\n",
" rng=None,\n",
" irange=0.1):\n",
"\n",
" super(Linear, self).__init__(rng=rng)\n",
"\n",
" self.idim = idim\n",
" self.odim = odim\n",
"\n",
" self.W = self.rng.uniform(\n",
" -irange, irange,\n",
" (self.idim, self.odim))\n",
"\n",
" self.b = numpy.zeros((self.odim,), dtype=numpy.float32)\n",
"\n",
" def fprop(self, inputs):\n",
" \"\"\"\n",
" Implements a forward propagation through the i-th layer, that is\n",
" some form of:\n",
" a^i = xW^i + b^i\n",
" h^i = f^i(a^i)\n",
" with f^i, W^i, b^i denoting a non-linearity, weight matrix and\n",
" biases of this (i-th) layer, respectively and x denoting inputs.\n",
"\n",
" :param inputs: matrix of features (x) or the output of the previous layer h^{i-1}\n",
" :return: h^i, matrix of transformed by layer features\n",
" \"\"\"\n",
" a = numpy.dot(inputs, self.W) + self.b\n",
" # here f() is an identity function, so just return a linear transformation\n",
" return a\n",
"\n",
" def bprop(self, h, igrads):\n",
" \"\"\"\n",
" Implements a backward propagation through the layer, that is, given\n",
" h^i denotes the output of the layer and x^i the input, we compute:\n",
" dh^i/dx^i which by chain rule is dh^i/da^i da^i/dx^i\n",
" x^i could be either features (x) or the output of the lower layer h^{i-1}\n",
" :param h: it's an activation produced in forward pass\n",
" :param igrads, error signal (or gradient) flowing to the layer, note,\n",
" this in general case does not corresponds to 'deltas' used to update\n",
" the layer's parameters, to get deltas ones need to multiply it with\n",
" the dh^i/da^i derivative\n",
" :return: a tuple (deltas, ograds) where:\n",
" deltas = igrads * dh^i/da^i\n",
" ograds = deltas \\times da^i/dx^i\n",
" \"\"\"\n",
"\n",
" # since df^i/da^i = 1 (f is assumed identity function),\n",
" # deltas are in fact the same as igrads\n",
" ograds = numpy.dot(igrads, self.W.T)\n",
" return igrads, ograds\n",
"\n",
" def bprop_cost(self, h, igrads, cost):\n",
" \"\"\"\n",
" Implements a backward propagation in case the layer directly\n",
" deals with the optimised cost (i.e. the top layer)\n",
" By default, method should implement a bprop for default cost, that is\n",
" the one that is natural to the layer's output, i.e.:\n",
" here we implement linear -> mse scenario\n",
" :param h: it's an activation produced in forward pass\n",
" :param igrads, error signal (or gradient) flowing to the layer, note,\n",
" this in general case does not corresponds to 'deltas' used to update\n",
" the layer's parameters, to get deltas ones need to multiply it with\n",
" the dh^i/da^i derivative\n",
" :param cost, mlp.costs.Cost instance defining the used cost\n",
" :return: a tuple (deltas, ograds) where:\n",
" deltas = igrads * dh^i/da^i\n",
" ograds = deltas \\times da^i/dx^i\n",
" \"\"\"\n",
"\n",
" if cost is None or cost.get_name() == 'mse':\n",
" # for linear layer and mean square error cost,\n",
" # cost back-prop is the same as standard back-prop\n",
" return self.bprop(h, igrads)\n",
" else:\n",
" raise NotImplementedError('Linear.bprop_cost method not implemented '\n",
" 'for the %s cost' % cost.get_name())\n",
"\n",
" def pgrads(self, inputs, deltas):\n",
" \"\"\"\n",
" Return gradients w.r.t parameters\n",
"\n",
" :param inputs, input to the i-th layer\n",
" :param deltas, deltas computed in bprop stage up to -ith layer\n",
" :return list of grads w.r.t parameters dE/dW and dE/db in *exactly*\n",
" the same order as the params are returned by get_params()\n",
"\n",
" Note: deltas here contain the whole chain rule leading\n",
" from the cost up to the the i-th layer, i.e.\n",
" dE/dy^L dy^L/da^L da^L/dh^{L-1} dh^{L-1}/da^{L-1} ... dh^{i}/da^{i}\n",
" and here we are just asking about\n",
" 1) da^i/dW^i and 2) da^i/db^i\n",
" since W and b are only layer's parameters\n",
" \"\"\"\n",
"\n",
" grad_W = numpy.dot(inputs.T, deltas)\n",
" grad_b = numpy.sum(deltas, axis=0)\n",
"\n",
" return [grad_W, grad_b]\n",
"\n",
" def get_params(self):\n",
" return [self.W, self.b]\n",
"\n",
" def set_params(self, params):\n",
" #we do not make checks here, but the order on the list\n",
" #is assumed to be exactly the same as get_params() returns\n",
" self.W = params[0]\n",
" self.b = params[1]\n",
"\n",
" def get_name(self):\n",
" return 'linear'\n"
]
},
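{
"cell_type": "markdown",
"metadata": {},
"source": [
"The copy of `Linear` above is not runnable on its own, but the class can be imported from `mlp.layers` and tried out on some random data, as in the small sketch below (the shapes are arbitrary):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy\n",
"from mlp.layers import Linear\n",
"\n",
"rng_demo = numpy.random.RandomState([2015, 10, 10])\n",
"layer = Linear(idim=4, odim=3, rng=rng_demo, irange=0.1)\n",
"\n",
"x_demo = rng_demo.uniform(-1, 1, (2, 4))             # mini-batch of 2 data-points\n",
"h = layer.fprop(x_demo)                              # forward pass: h = xW + b\n",
"deltas, ograds = layer.bprop(h, numpy.ones_like(h))  # backward pass with dummy igrads\n",
"grad_W, grad_b = layer.pgrads(x_demo, deltas)        # gradients w.r.t. W and b\n",
"\n",
"print(h.shape)\n",
"print(ograds.shape)\n",
"print(grad_W.shape)\n",
"print(grad_b.shape)"
]
},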
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example 1: Experiment with linear models and MNIST\n",
"\n",
"The below snippet demonstrates how to use the code we have provided for the coursework 1. Get familiar with it, as from now on we will use till the end of the course, including the 2nd coursework.\n",
"\n",
"It should be straightforward to extend the following code to more complex models, like stack more layers, change the cost, the optimiser, learning rate schedules, etc.. But **ask** in case something is not clear.\n",
"\n",
"In this particular example, we use the following components:\n",
" * One layer mapping data-points ($\\mathbf x$) straight to 10 digits classes represented as 10 (linear) outputs ($\\mathbf y$). This operation is implemented as a linear layer in `mlp.layers.Linear`. Get familiar with this class (read the comments, etc.) as it is going to be a building block for the coursework.\n",
" * One can stack as many different layers as required through the container `mlp.layers.MLP`\n",
" * As an objective here we use the Mean Square Error cost defined in `mlp.costs.MSECost`\n",
" * Our *Stochastic Gradient Descent* optimiser can be found in `mlp.optimisers.SGDOptimiser`. Its parent `mlp.optimisers.Optimiser` implements validation functionality (and an interface in case one need to implement a different optimiser)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy\n",
"import logging\n",
"\n",
"logger = logging.getLogger()\n",
"logger.setLevel(logging.INFO)\n",
"\n",
"from mlp.layers import MLP, Linear #import required layer types\n",
"from mlp.optimisers import SGDOptimiser #import the optimiser\n",
"from mlp.dataset import MNISTDataProvider #import data provider\n",
"from mlp.costs import MSECost #import the cost we want to use for optimisation\n",
"from mlp.schedulers import LearningRateFixed\n",
"\n",
"rng = numpy.random.RandomState([2015,10,10])\n",
"\n",
"# define the model structure, here just one linear layer\n",
"# and mean square error cost\n",
"cost = MSECost()\n",
"model = MLP(cost=cost)\n",
"model.add_layer(Linear(idim=784, odim=10, rng=rng))\n",
"#one can stack more layers here\n",
"\n",
"# define the optimiser, here stochasitc gradient descent\n",
"# with fixed learning rate and max_epochs as stopping criterion\n",
"lr_scheduler = LearningRateFixed(learning_rate=0.01, max_epochs=20)\n",
"optimiser = SGDOptimiser(lr_scheduler=lr_scheduler)\n",
"\n",
"logger.info('Initialising data providers...')\n",
"train_dp = MNISTDataProvider(dset='train', batch_size=100, max_num_batches=-10, randomize=True)\n",
"valid_dp = MNISTDataProvider(dset='valid', batch_size=100, max_num_batches=-10, randomize=False)\n",
"\n",
"logger.info('Training started...')\n",
"optimiser.train(model, train_dp, valid_dp)\n",
"\n",
"logger.info('Testing the model on test set:')\n",
"test_dp = MNISTDataProvider(dset='eval', batch_size=100, max_num_batches=-10, randomize=False)\n",
"cost, accuracy = optimiser.validate(model, test_dp)\n",
"logger.info('MNIST test set accuracy is %.2f %% (cost is %.3f)'%(accuracy*100., cost))\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise\n",
"\n",
"Modify the above code by adding an intemediate linear layer of size 200 hidden units between input and output layers."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy\n",
"import logging\n",
"\n",
"logger = logging.getLogger()\n",
"logger.setLevel(logging.INFO)\n",
"\n",
"from mlp.layers import MLP, Linear #import required layer types\n",
"from mlp.optimisers import SGDOptimiser #import the optimiser\n",
"from mlp.dataset import MNISTDataProvider #import data provider\n",
"from mlp.costs import MSECost #import the cost we want to use for optimisation\n",
"from mlp.schedulers import LearningRateFixed\n",
"\n",
"rng = numpy.random.RandomState([2015,10,10])\n",
"\n",
"# define the model structure, here just one linear layer\n",
"# and mean square error cost\n",
"cost = MSECost()\n",
"model = MLP(cost=cost)\n",
"model.add_layer(Linear(idim=784, odim=200, rng=rng))\n",
"model.add_layer(Linear(idim=200, odim=10, rng=rng))\n",
"\n",
"# define the optimiser, here stochasitc gradient descent\n",
"# with fixed learning rate and max_epochs as stopping criterion\n",
"lr_scheduler = LearningRateFixed(learning_rate=0.01, max_epochs=20)\n",
"optimiser = SGDOptimiser(lr_scheduler=lr_scheduler)\n",
"\n",
"logger.info('Initialising data providers...')\n",
"train_dp = MNISTDataProvider(dset='train', batch_size=100, max_num_batches=-10, randomize=True)\n",
"valid_dp = MNISTDataProvider(dset='valid', batch_size=100, max_num_batches=-10, randomize=False)\n",
"\n",
"logger.info('Training started...')\n",
"optimiser.train(model, train_dp, valid_dp)\n",
"\n",
"logger.info('Testing the model on test set:')\n",
"test_dp = MNISTDataProvider(dset='eval', batch_size=100, max_num_batches=-10, randomize=False)\n",
"cost, accuracy = optimiser.validate(model, test_dp)\n",
"logger.info('MNIST test set accuracy is %.2f %% (cost is %.3f)'%(accuracy*100., cost))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.9"
}
},
"nbformat": 4,
"nbformat_minor": 0
}


@@ -1,336 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Coursework #1\n",
"\n",
"## Introduction\n",
"\n",
"This coursework is concerned with building multi-layer networks to address the MNIST digit classification problem. It builds on the previous labs, in particular [02_MNIST_SLN.ipynb](02_MNIST_SLN.ipynb) in which single layer networks were trained for MNIST digit classification. The course will involve extending that code to use Sigmoid and Softmax layers, combining these into multi-layer networks, and carrying out a number of MNIST digit classification experiments, to investigate the effect of learning rate, the number of hidden units, and the number of hidden layers.\n",
"\n",
"The coursework is divided into 4 tasks:\n",
"* **Task 1**: *Implementing a sigmoid layer* - 15 marks. \n",
"This task involves extending the `Linear` class in file `mlp/layers.py` to `Sigmoid`, with code for forward prop, backprop computation of the gradient, and weight update.\n",
"* **Task 2**: *Implementing a softmax layer* - 15 marks. \n",
"This task involves extending the `Linear` class in file `mlp/layers.py` to `Softmax`, with code for forward prop, backprop computation of the gradient, and weight update.\n",
"* **Task 3**: *Constructing a multi-layer network* - 40 marks. \n",
"This task involves putting together a Sigmoid and a Softmax layer to create a multi-layer network, with one hidden layer (100 units) and one output layer, that is trained to classify MNIST digits. This task will include reporting classification results, exploring the effect of learning rates, and plotting Hinton Diagrams for the hidden units and output units.\n",
"* **Task 4**: *Experiments with different architectures* - 30 marks. \n",
"This task involves further MNIST classification experiments, primarily looking at the effect of using different numbers of hidden layers.\n",
"The coursework will be marked out of 100, and will contribute 30% of the total mark in the MLP course.\n",
"\n",
"## Previous Tutorials\n",
"\n",
"Before starting this coursework make sure that you have completed the first three labs:\n",
"\n",
"* [00_Introduction.ipynb](00_Introduction.ipynb) - setting up your environment; *Solutions*: [00_Introduction_solution.ipynb](00_Introduction_solution.ipynb)\n",
"* [01_Linear_Models.ipynb](01_Linear_Models.ipynb) - training single layer networks; *Solutions*: [01_Linear_Models_solution.ipynb](01_Linear_Models_solution.ipynb)\n",
"* [02_MNIST_SLN.ipynb](02_MNIST_SLN.ipynb) - training a single layer network for MNIST digit classification\n",
"\n",
"To ensure that your virtual environment is correct, please see [this note](https://github.com/CSTR-Edinburgh/mlpractical/blob/master/kernel_issue_fix.md) on the GitHub.\n",
"## Submission\n",
"**Submission Deadline: Thursday 29 October, 16:00** \n",
"\n",
"Submit the coursework as an ipython notebook file, using the `submit` command in the terminal on a DICE machine. If your file is `03_MLP_Coursework1.ipynb` then you would enter:\n",
"\n",
"`submit mlp 1 03_MLP_Coursework1.ipynb` \n",
"\n",
"where `mlp 1` indicates this is the first coursework of MLP.\n",
"\n",
"After submitting, you should receive an email of acknowledgment from the system confirming that your submission has been received successfully. Keep the email as evidence of your coursework submission.\n",
"\n",
"**Please make sure you submit a single `ipynb` file (and nothing else)!**\n",
"\n",
"**Submission Deadline: Thursday 29 October, 16:00** \n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Getting Started\n",
"Please enter your exam number and the date in the next code cell."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#MLP Coursework 1\n",
"#Exam number: <ENTER EXAM NUMBER>\n",
"#Date: <ENTER DATE>\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Please run the next code cell, which imports `numpy` and seeds the random number generator. Please **do not** modify the random number generator seed!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy\n",
"\n",
"#Seed a random number generator running the below cell, but do **not** modify the seed.\n",
"rng = numpy.random.RandomState([2015,10,10])\n",
"rng_state = rng.get_state()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Task 1 - Sigmoid Layer (15%)\n",
"\n",
"In this task you need to create a class `Sigmoid` which encapsulates a layer of sigmoid units. You should do this by extending the `mlp.layers.Linear` class (in file `mlp/layers.py`), which implements a a layer of linear units (i.e. weighted sum plus bias). The `Sigmoid` class extends this by applying the sigmoid transfer function to the weighted sum in the forward propagation, and applying the derivative of the sigmoid in the gradient descent back propagation and computing the gradients with respect to layer's parameters. Do **not** copy the implementation provided in `Linear` class but rather, **reuse** it through inheritance.\n",
"\n",
"When you have implemented `Sigmoid` (in the `mlp.layers` module), then please test it by running the below code cell.\n"
]
},
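{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a reminder (this is not the required class implementation), the sigmoid transfer function and its derivative, which you will need for the forward and backward passes respectively, are\n",
"\n",
"$\sigma(a) = \frac{1}{1 + e^{-a}}, \qquad \frac{\mathrm{d}\sigma}{\mathrm{d}a} = \sigma(a)\,(1 - \sigma(a))$"
]
},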
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from mlp.layers import Sigmoid\n",
"\n",
"a = numpy.asarray([-20.1, 52.4, 0, 0.05, 0.05, 49])\n",
"b = numpy.asarray([-20.1, 52.4, 0, 0.05, 0.05, 49, 20, 20])\n",
"\n",
"rng.set_state(rng_state)\n",
"sigm = Sigmoid(idim=a.shape[0], odim=b.shape[0], rng=rng)\n",
"\n",
"fp = sigm.fprop(a)\n",
"deltas, ograds = sigm.bprop(h=fp, igrads=b)\n",
"\n",
"print fp.sum()\n",
"print deltas.sum()\n",
"print ograds.sum()\n",
"%precision 3\n",
"print fp\n",
"print deltas\n",
"print ograds\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***\n",
"To include the `Sigmoid` code in the notebook please run the below code cell. (The `%load` notebook command is used to load the source of the `Sigmoid` class from `mlp/layers.py`.)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%load -s Sigmoid mlp/layers.py"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Task 2 - Softmax (15%)\n",
"\n",
"In this task you need to create a class `Softmax` which encapsulates a layer of softmax units. As in the previous task, you should do this by extending the `mlp.layers.Linear` class (in file `mlp/layers.py`).\n",
"\n",
"When you have implemented `Softmax` (in the `mlp.layers` module), then please test it by running the below code cell.\n"
]
},
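{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a reminder (this is not the required class implementation), the softmax transfer function is\n",
"\n",
"$y_k = \frac{\exp(a_k)}{\sum_{j=1}^{K} \exp(a_j)},$\n",
"\n",
"and in practice it is usually computed after subtracting $\max_j a_j$ from each $a_k$ for numerical stability (which leaves the result unchanged)."
]
},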
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from mlp.layers import Softmax\n",
"\n",
"a = numpy.asarray([-20.1, 52.4, 0, 0.05, 0.05, 49])\n",
"b = numpy.asarray([0, 0, 0, 0, 0, 0, 0, 1])\n",
"\n",
"rng.set_state(rng_state)\n",
"softmax = Softmax(idim=a.shape[0], odim=b.shape[0], rng=rng)\n",
"\n",
"fp = softmax.fprop(a)\n",
"deltas, ograds = softmax.bprop_cost(h=None, igrads=fp-b, cost=None)\n",
"\n",
"print fp.sum()\n",
"print deltas.sum()\n",
"print ograds.sum()\n",
"%precision 3\n",
"print fp\n",
"print deltas\n",
"print ograds\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***\n",
"To include the `Softmax` code in the notebook please run the below code cell. (The notebook `%load` command is used to load the source of the `Softmax` class from `mlp/layers.py`.)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"%load -s Softmax mlp/layers.py"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Task 3 - Multi-layer network for MNIST classification (40%)\n",
"\n",
"**(a)** (20%) Building on the single layer linear network for MNIST classification used in lab [02_MNIST_SLN.ipynb](02_MNIST_SLN.ipynb), and using the `Sigmoid` and `Softmax` classes that you implemented in tasks 1 and 2, construct and learn a model that classifies MNIST images and:\n",
" * Has one hidden layer with a sigmoid transfer function and 100 units\n",
" * Uses a softmax output layer to discriminate between the 10 digit classes (use the `mlp.costs.CECost()` cost)\n",
"\n",
"Your code should print the final values of the error function and the classification accuracy for train, validation, and test sets (please keep also the log information printed by default by the optimiser). Limit the number of training epochs to 30. You can, of course, split your code across as many cells as you think is necessary."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# include here the complete code that constructs the model, performs training,\n",
"# and prints the error and accuracy for train/valid/test"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**(b)** (10%) Investigate the impact of different learning rates $\\eta \\in \\{0.5, 0.2, 0.1, 0.05, 0.01, 0.005\\}$ on the convergence of the network training as well as the final accuracy:\n",
" * Plot (on a single graph) the error rate curves for each learning rate as a function of training epochs for training set\n",
" * Plot (on another single graph) the error rate curves as a function of training epochs for validation set\n",
" * Include a table of the corresponding error rates for test set\n",
"\n",
"The notebook command `%matplotlib inline` ensures that your graphs will be added to the notebook, rather than opened as additional windows."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**(c)** (10%) Plot the following graphs:\n",
" * Display the 784-element weight vector of each of the 100 hidden units as 10x10 grid plot of 28x28 images, in order to visualise what features of the input they are encoding. To do this, take the weight vector of each hidden unit, reshape to 28x28, and plot using the `imshow` function).\n",
" * Plot a Hinton Diagram of the output layer weight matrix for digits 0 and 1"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"## Task 4 - Experiments with 1-5 hidden layers (30%)\n",
"\n",
"In this task use the learning rate which resulted in the best accuracy in your experiments in Task 3 (b). Perform the following experiments:\n",
"\n",
" * Train a similar model to Task 3, with one hidden layer, but with 800 hidden units. \n",
" * Train 4 additional models with 2, 3, 4 and 5 hidden layers. Set the number of hidden units for each model, such that all the models have similar number of trainable weights ($\\pm$2%). For simplicity, for a given model, keep the number of units in each hidden layer the same.\n",
" * Plot value of the error function for training and validation sets as a function of training epochs for each model\n",
" * Plot the test set classification accuracy as a function of the number of hidden layers\n"
]
},
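{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible way to choose per-layer hidden unit counts so that the deeper models roughly match the number of trainable parameters of the 784-800-10 reference model is sketched below. This is a sketch under stated assumptions only: it counts both weight matrices and biases and simply searches over candidate widths - you may prefer to size the models differently.\n",
"\n",
"```python\n",
"def num_params(nhid, num_layers, idim=784, odim=10):\n",
"    # weights and biases of an idim - nhid - ... - nhid - odim network\n",
"    n = idim * nhid + nhid                        # first hidden layer\n",
"    n += (num_layers - 1) * (nhid * nhid + nhid)  # remaining hidden layers\n",
"    n += nhid * odim + odim                       # softmax output layer\n",
"    return n\n",
"\n",
"target = num_params(800, 1)  # reference: one hidden layer with 800 units\n",
"\n",
"for num_layers in range(2, 6):\n",
"    # hidden layer width whose parameter count is closest to the target\n",
"    best = min(range(100, 1000), key=lambda h: abs(num_params(h, num_layers) - target))\n",
"    print num_layers, best, num_params(best, num_layers)\n",
"```"
]
},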
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"This is the end of coursework 1.\n",
"\n",
"Please remember to save your notebook, and submit your notebook following the instructions at the top. Please make sure that you have executed all the code cells when you submit the notebook.\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.9"
}
},
"nbformat": 4,
"nbformat_minor": 0
}

File diff suppressed because one or more lines are too long

View File

@@ -1,708 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction\n",
"\n",
"This tutorial focuses on implementation of alternatives to sigmoid transfer functions for hidden units. (*Transfer functions* are also called *activation functions* or *nonlinearities*.) First, we will work with hyperboilc tangent (tanh) and then unbounded (or partially unbounded) piecewise linear functions: Rectifying Linear Units (ReLU) and Maxout.\n",
"\n",
"\n",
"## Virtual environments\n",
"\n",
"Before you proceed onwards, remember to activate your virtual environment by typing `activate_mlp` or `source ~/mlpractical/venv/bin/activate` (or if you did the original install the \"comfy way\" type: `workon mlpractical`).\n",
"\n",
"\n",
"## Syncing the git repository\n",
"\n",
"Look <a href=\"https://github.com/CSTR-Edinburgh/mlpractical/blob/master/gitFAQ.md\">here</a> for more details. But in short, we recommend to create a separate branch for this lab, as follows:\n",
"\n",
"1. Enter the mlpractical directory `cd ~/mlpractical/repo-mlp`\n",
"2. List the branches and check which are currently active by typing: `git branch`\n",
"3. If you have followed our recommendations, you should be in the `lab4` branch, please commit your local changed to the repo index by typing:\n",
"```\n",
"git commit -am \"finished lab4\"\n",
"```\n",
"4. Now you can switch to `master` branch by typing: \n",
"```\n",
"git checkout master\n",
" ```\n",
"5. To update the repository (note, assuming master does not have any conflicts), if there are some, have a look <a href=\"https://github.com/CSTR-Edinburgh/mlpractical/blob/master/gitFAQ.md\">here</a>\n",
"```\n",
"git pull\n",
"```\n",
"6. And now, create the new branch & switch to it by typing:\n",
"```\n",
"git checkout -b lab5\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Overview of alternative transfer functions\n",
"\n",
"Now, we briefly summarise some other possible choices for hidden layer transfer functions.\n",
"\n",
"## Tanh\n",
"\n",
"Given a linear activation $a_{i}$ tanh implements the following operation:\n",
"\n",
"(1) $h_i(a_i) = \\mbox{tanh}(a_i) = \\frac{\\exp(a_i) - \\exp(-a_i)}{\\exp(a_i) + \\exp(-a_i)}$\n",
"\n",
"Hence, the derivative of $h_i$ with respect to $a_i$ is:\n",
"\n",
"(2) $\\begin{align}\n",
"\\frac{\\partial h_i}{\\partial a_i} &= 1 - h^2_i\n",
"\\end{align}\n",
"$\n",
"\n",
"\n",
"## ReLU\n",
"\n",
"Given a linear activation $a_{i}$ relu implements the following operation:\n",
"\n",
"(3) $h_i(a_i) = \\max(0, a_i)$\n",
"\n",
"Hence, the gradient is :\n",
"\n",
"(4) $\\begin{align}\n",
"\\frac{\\partial h_i}{\\partial a_i} &=\n",
"\\begin{cases}\n",
" 1 & \\quad \\text{if } a_i > 0 \\\\\n",
" 0 & \\quad \\text{if } a_i \\leq 0 \\\\\n",
"\\end{cases}\n",
"\\end{align}\n",
"$\n",
"\n",
"ReLU implements a form of data-driven sparsity, that is, on average the activations are sparse (many of them are 0) but the general sparsity pattern will depend on particular data-point. This is different from sparsity obtained in model's parameters one can obtain with $L1$ regularisation as the latter affect all data-points in the same way.\n",
"\n",
"## Maxout\n",
"\n",
"Maxout is an example of data-driven type of non-linearity in which the transfer function can be learned from data. That is, the model can build a non-linear transfer function from piecewise linear components. These linear components, depending on the number of linear regions used in the pooling operator (given by parameter $K$), can approximate arbitrary functions, such as ReLU, abs, etc.\n",
"\n",
"Given some subset (group, pool) of $K$ linear activations $a_{j}, a_{j+1}, \\ldots, a_{j+K}$ at the $l$-th layer, maxout implements the following operation:\n",
"\n",
"(5) $h_i(a_j, a_{j+1}, \\ldots, a_{j+K}) = \\max(a_j, a_{j+1}, \\ldots, a_{j+K})$\n",
"\n",
"Hence, the gradient of $h_i$ w.r.t to the pooling region $a_{j}, a_{j+1}, \\ldots, a_{j+K}$ is :\n",
"\n",
"(6) $\\begin{align}\n",
"\\frac{\\partial h_i}{\\partial (a_j, a_{j+1}, \\ldots, a_{j+K})} &=\n",
"\\begin{cases}\n",
" 1 & \\quad \\text{for the max activation} \\\\\n",
" 0 & \\quad \\text{otherwise} \\\\\n",
"\\end{cases}\n",
"\\end{align}\n",
"$\n",
"\n",
"Implementation tips are given in Exercise 3.\n",
"\n",
"# On weight initialisation\n",
"\n",
"Activation functions directly affect the \"network dynamics\", that is, the magnitudes of the statistics each layer is producing. For example, *slashing* non-linearities like sigmoid or tanh bring the linear activations to a certain bounded range. ReLU, on the contrary, has an unbounded positive side. This directly affects all statistics collected in forward and backward passes as well as the gradients w.r.t paramters - hence also the pace at which the model learns. That is why learning rate is usually required to be tuned for given the characterictics of the non-linearities used. \n",
"\n",
"Another important hyperparameter is the initial range used to initialise the weight matrices. We have largely ignored it so far (although if you did further experiments in coursework 1, you may have found setting it had an effect on training deeper networks with 4 or 5 hidden layers). However, for sigmoidal non-linearities (sigmoid, tanh) the initialisation range is an important hyperparameter and a considerable amount of research has been put into determining what is the best strategy for choosing it. In fact, one of the early triggers of the recent resurgence of deep learning was pre-training - techniques for initialising weights in an unsupervised manner so that one can effectively train deeper models in supervised fashion later. \n",
"\n",
"## Sigmoidal transfer functions\n",
"\n",
"Y. LeCun in [Efficient Backprop](http://link.springer.com/chapter/10.1007%2F3-540-49430-8_2) recommends the following setting of the initial range $r$ for sigmoidal units (assuming that the data has been normalised to zero mean, unit variance): \n",
"\n",
"(7) $ r = \\frac{1}{\\sqrt{N_{IN}}} $\n",
"\n",
"where $N_{IN}$ is the number of inputs to the given layer and the weights are then sampled from the (usually uniform) distribution $U(-r,r)$. The motivation is to keep the initial forward-pass signal in the linear region of the sigmoid non-linearity so that the gradients are large enough for training to proceed (note that the sigmoidal non-linearities saturate when activations are either very positive or very negative, leading to very small gradients and hence poor learning dynamics).\n",
"\n",
"The initialisation used in (7) however leads to different magnitudes of activations/gradients at different layers (due to multiplicative nature of the computations) and more recently, [Glorot et. al](http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf) proposed the so-called *normalised initialisation*, which ensures the variance of the forward signal (activations) is approximately the same in each layer. The same applies to the gradients obtained in backward pass. \n",
"\n",
"The $r$ in the *normalised initialisation* for $\\mbox{tanh}$ non-linearity is then:\n",
"\n",
"(8) $ r = \\frac{\\sqrt{6}}{\\sqrt{N_{IN}+N_{OUT}}} $\n",
"\n",
"For the sigmoid (logistic) non-linearity, to get similiar characteristics, one should scale $r$ in (8) by 4, that is:\n",
"\n",
"(9) $ r = \\frac{4\\sqrt{6}}{\\sqrt{N_{IN}+N_{OUT}}} $\n",
"\n",
"## Piece-wise linear transfer functions (ReLU, Maxout)\n",
"\n",
"For unbounded transfer functions initialisation is not as crucial as for sigmoidal ones. This is due to the fact that their gradients do not diminish (they are acutally more likely to explode) and they do not saturate (ReLU saturates at 0, but not on the positive slope, where gradient is 1 everywhere). (In practice ReLU is sometimes \"clipped\" with a maximum value, typically 20).\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercise 1: Implement the tanh transfer function\n",
"\n",
"Your implementation should follow the code conventions used to build other layer types (for example, Sigmoid and Softmax). Test your solution by training a one-hidden-layer model with 100 hidden units, similiar to the one used in Task 3a in the coursework. \n",
"\n",
"Tune the learning rate and compare the initial ranges in equations (7) and (8). Note that there might not be much difference for one-hidden-layer model, but you can easily notice a substantial gain from using (8) (or (9) for logistic sigmoid activation) for deeper models, for example, the 5 hidden-layer network from the first coursework.\n",
"\n",
"Implementation tip: Use numpy.tanh() to compute the non-linearity. Use the irange argument when creating the given layer type to provide the initial sampling range."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:root:Initialising data providers...\n"
]
}
],
"source": [
"import numpy\n",
"import logging\n",
"from mlp.dataset import MNISTDataProvider\n",
"\n",
"logger = logging.getLogger()\n",
"logger.setLevel(logging.INFO)\n",
"\n",
"# Note, you were asked to do run the experiments on all data and smaller models. \n",
"# Here I am running the exercises on 1000 training data-points only (similar to regularisation notebook)\n",
"logger.info('Initialising data providers...')\n",
"train_dp = MNISTDataProvider(dset='train', batch_size=10, max_num_batches=100, randomize=True)\n",
"valid_dp = MNISTDataProvider(dset='valid', batch_size=10000, max_num_batches=-10, randomize=False)\n",
"test_dp = MNISTDataProvider(dset='eval', batch_size=10000, max_num_batches=-10, randomize=False)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:root:Training started...\n",
"INFO:mlp.optimisers:Epoch 0: Training cost (ce) for initial model is 2.319. Accuracy is 10.50%\n",
"INFO:mlp.optimisers:Epoch 0: Validation cost (ce) for initial model is 2.315. Accuracy is 11.33%\n",
"INFO:mlp.optimisers:Epoch 1: Training cost (ce) is 1.048. Accuracy is 66.30%\n",
"INFO:mlp.optimisers:Epoch 1: Validation cost (ce) is 0.571. Accuracy is 82.72%\n",
"INFO:mlp.optimisers:Epoch 1: Took 2 seconds. Training speed 764 pps. Validation speed 12988 pps.\n",
"INFO:mlp.optimisers:Epoch 2: Training cost (ce) is 0.485. Accuracy is 84.40%\n",
"INFO:mlp.optimisers:Epoch 2: Validation cost (ce) is 0.455. Accuracy is 86.58%\n",
"INFO:mlp.optimisers:Epoch 2: Took 2 seconds. Training speed 720 pps. Validation speed 12988 pps.\n",
"INFO:mlp.optimisers:Epoch 3: Training cost (ce) is 0.362. Accuracy is 87.70%\n",
"INFO:mlp.optimisers:Epoch 3: Validation cost (ce) is 0.435. Accuracy is 86.90%\n",
"INFO:mlp.optimisers:Epoch 3: Took 2 seconds. Training speed 788 pps. Validation speed 12659 pps.\n",
"INFO:mlp.optimisers:Epoch 4: Training cost (ce) is 0.251. Accuracy is 92.10%\n",
"INFO:mlp.optimisers:Epoch 4: Validation cost (ce) is 0.417. Accuracy is 88.09%\n",
"INFO:mlp.optimisers:Epoch 4: Took 2 seconds. Training speed 788 pps. Validation speed 13159 pps.\n",
"INFO:mlp.optimisers:Epoch 5: Training cost (ce) is 0.175. Accuracy is 95.40%\n",
"INFO:mlp.optimisers:Epoch 5: Validation cost (ce) is 0.405. Accuracy is 88.16%\n",
"INFO:mlp.optimisers:Epoch 5: Took 2 seconds. Training speed 776 pps. Validation speed 12988 pps.\n",
"INFO:mlp.optimisers:Epoch 6: Training cost (ce) is 0.121. Accuracy is 96.40%\n",
"INFO:mlp.optimisers:Epoch 6: Validation cost (ce) is 0.458. Accuracy is 87.24%\n",
"INFO:mlp.optimisers:Epoch 6: Took 2 seconds. Training speed 690 pps. Validation speed 12659 pps.\n",
"INFO:mlp.optimisers:Epoch 7: Training cost (ce) is 0.091. Accuracy is 97.90%\n",
"INFO:mlp.optimisers:Epoch 7: Validation cost (ce) is 0.418. Accuracy is 88.37%\n",
"INFO:mlp.optimisers:Epoch 7: Took 2 seconds. Training speed 841 pps. Validation speed 12659 pps.\n",
"INFO:mlp.optimisers:Epoch 8: Training cost (ce) is 0.065. Accuracy is 98.70%\n",
"INFO:mlp.optimisers:Epoch 8: Validation cost (ce) is 0.400. Accuracy is 89.44%\n",
"INFO:mlp.optimisers:Epoch 8: Took 2 seconds. Training speed 794 pps. Validation speed 12501 pps.\n",
"INFO:mlp.optimisers:Epoch 9: Training cost (ce) is 0.043. Accuracy is 99.30%\n",
"INFO:mlp.optimisers:Epoch 9: Validation cost (ce) is 0.406. Accuracy is 89.35%\n",
"INFO:mlp.optimisers:Epoch 9: Took 2 seconds. Training speed 747 pps. Validation speed 12822 pps.\n",
"INFO:mlp.optimisers:Epoch 10: Training cost (ce) is 0.029. Accuracy is 99.50%\n",
"INFO:mlp.optimisers:Epoch 10: Validation cost (ce) is 0.410. Accuracy is 89.69%\n",
"INFO:mlp.optimisers:Epoch 10: Took 2 seconds. Training speed 953 pps. Validation speed 12822 pps.\n",
"INFO:mlp.optimisers:Epoch 11: Training cost (ce) is 0.023. Accuracy is 99.80%\n",
"INFO:mlp.optimisers:Epoch 11: Validation cost (ce) is 0.424. Accuracy is 89.41%\n",
"INFO:mlp.optimisers:Epoch 11: Took 2 seconds. Training speed 953 pps. Validation speed 13159 pps.\n",
"INFO:mlp.optimisers:Epoch 12: Training cost (ce) is 0.018. Accuracy is 99.80%\n",
"INFO:mlp.optimisers:Epoch 12: Validation cost (ce) is 0.429. Accuracy is 89.50%\n",
"INFO:mlp.optimisers:Epoch 12: Took 2 seconds. Training speed 870 pps. Validation speed 12988 pps.\n",
"INFO:mlp.optimisers:Epoch 13: Training cost (ce) is 0.015. Accuracy is 99.90%\n",
"INFO:mlp.optimisers:Epoch 13: Validation cost (ce) is 0.428. Accuracy is 89.58%\n",
"INFO:mlp.optimisers:Epoch 13: Took 2 seconds. Training speed 878 pps. Validation speed 12822 pps.\n",
"INFO:mlp.optimisers:Epoch 14: Training cost (ce) is 0.012. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 14: Validation cost (ce) is 0.436. Accuracy is 89.41%\n",
"INFO:mlp.optimisers:Epoch 14: Took 2 seconds. Training speed 894 pps. Validation speed 12501 pps.\n",
"INFO:mlp.optimisers:Epoch 15: Training cost (ce) is 0.010. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 15: Validation cost (ce) is 0.433. Accuracy is 89.64%\n",
"INFO:mlp.optimisers:Epoch 15: Took 2 seconds. Training speed 834 pps. Validation speed 12659 pps.\n",
"INFO:mlp.optimisers:Epoch 16: Training cost (ce) is 0.009. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 16: Validation cost (ce) is 0.439. Accuracy is 89.63%\n",
"INFO:mlp.optimisers:Epoch 16: Took 2 seconds. Training speed 820 pps. Validation speed 12988 pps.\n",
"INFO:mlp.optimisers:Epoch 17: Training cost (ce) is 0.008. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 17: Validation cost (ce) is 0.443. Accuracy is 89.78%\n",
"INFO:mlp.optimisers:Epoch 17: Took 2 seconds. Training speed 902 pps. Validation speed 12501 pps.\n",
"INFO:mlp.optimisers:Epoch 18: Training cost (ce) is 0.008. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 18: Validation cost (ce) is 0.446. Accuracy is 89.72%\n",
"INFO:mlp.optimisers:Epoch 18: Took 2 seconds. Training speed 870 pps. Validation speed 12659 pps.\n",
"INFO:mlp.optimisers:Epoch 19: Training cost (ce) is 0.007. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 19: Validation cost (ce) is 0.445. Accuracy is 89.83%\n",
"INFO:mlp.optimisers:Epoch 19: Took 2 seconds. Training speed 918 pps. Validation speed 12822 pps.\n",
"INFO:mlp.optimisers:Epoch 20: Training cost (ce) is 0.007. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 20: Validation cost (ce) is 0.451. Accuracy is 89.75%\n",
"INFO:mlp.optimisers:Epoch 20: Took 2 seconds. Training speed 834 pps. Validation speed 12659 pps.\n",
"INFO:mlp.optimisers:Epoch 21: Training cost (ce) is 0.006. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 21: Validation cost (ce) is 0.454. Accuracy is 89.80%\n",
"INFO:mlp.optimisers:Epoch 21: Took 2 seconds. Training speed 902 pps. Validation speed 12659 pps.\n",
"INFO:mlp.optimisers:Epoch 22: Training cost (ce) is 0.006. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 22: Validation cost (ce) is 0.456. Accuracy is 89.77%\n",
"INFO:mlp.optimisers:Epoch 22: Took 2 seconds. Training speed 863 pps. Validation speed 12501 pps.\n",
"INFO:mlp.optimisers:Epoch 23: Training cost (ce) is 0.005. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 23: Validation cost (ce) is 0.458. Accuracy is 89.84%\n",
"INFO:mlp.optimisers:Epoch 23: Took 2 seconds. Training speed 820 pps. Validation speed 12822 pps.\n",
"INFO:mlp.optimisers:Epoch 24: Training cost (ce) is 0.005. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 24: Validation cost (ce) is 0.460. Accuracy is 89.80%\n",
"INFO:mlp.optimisers:Epoch 24: Took 2 seconds. Training speed 856 pps. Validation speed 12988 pps.\n",
"INFO:mlp.optimisers:Epoch 25: Training cost (ce) is 0.005. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 25: Validation cost (ce) is 0.461. Accuracy is 89.86%\n",
"INFO:mlp.optimisers:Epoch 25: Took 2 seconds. Training speed 902 pps. Validation speed 12659 pps.\n",
"INFO:mlp.optimisers:Epoch 26: Training cost (ce) is 0.004. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 26: Validation cost (ce) is 0.467. Accuracy is 89.86%\n",
"INFO:mlp.optimisers:Epoch 26: Took 2 seconds. Training speed 910 pps. Validation speed 12659 pps.\n",
"INFO:mlp.optimisers:Epoch 27: Training cost (ce) is 0.004. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 27: Validation cost (ce) is 0.466. Accuracy is 89.81%\n",
"INFO:mlp.optimisers:Epoch 27: Took 2 seconds. Training speed 827 pps. Validation speed 12501 pps.\n",
"INFO:mlp.optimisers:Epoch 28: Training cost (ce) is 0.004. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 28: Validation cost (ce) is 0.468. Accuracy is 89.84%\n",
"INFO:mlp.optimisers:Epoch 28: Took 2 seconds. Training speed 894 pps. Validation speed 12501 pps.\n",
"INFO:mlp.optimisers:Epoch 29: Training cost (ce) is 0.004. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 29: Validation cost (ce) is 0.471. Accuracy is 89.83%\n",
"INFO:mlp.optimisers:Epoch 29: Took 2 seconds. Training speed 902 pps. Validation speed 12659 pps.\n",
"INFO:mlp.optimisers:Epoch 30: Training cost (ce) is 0.004. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 30: Validation cost (ce) is 0.473. Accuracy is 89.81%\n",
"INFO:mlp.optimisers:Epoch 30: Took 2 seconds. Training speed 918 pps. Validation speed 11495 pps.\n",
"INFO:root:Testing the model on test set:\n",
"INFO:root:MNIST test set accuracy is 89.33 %, cost (ce) is 0.480\n"
]
}
],
"source": [
"\n",
"from mlp.layers import MLP, Tanh, Softmax #import required layer types\n",
"from mlp.optimisers import SGDOptimiser #import the optimiser\n",
"from mlp.costs import CECost #import the cost we want to use for optimisation\n",
"from mlp.schedulers import LearningRateFixed\n",
"\n",
"rng = numpy.random.RandomState([2015,10,10])\n",
"\n",
"#some hyper-parameters\n",
"nhid = 100\n",
"learning_rate = 0.2\n",
"max_epochs = 30\n",
"cost = CECost()\n",
" \n",
"stats = []\n",
"for layer in xrange(1, 2):\n",
"\n",
" train_dp.reset()\n",
" valid_dp.reset()\n",
" test_dp.reset()\n",
" \n",
" #define the model\n",
" model = MLP(cost=cost)\n",
" model.add_layer(Tanh(idim=784, odim=nhid, irange=1./numpy.sqrt(784), rng=rng))\n",
" for i in xrange(1, layer):\n",
" logger.info(\"Stacking hidden layer (%s)\" % str(i+1))\n",
" model.add_layer(Tanh(idim=nhid, odim=nhid, irange=0.2, rng=rng))\n",
" model.add_layer(Softmax(idim=nhid, odim=10, rng=rng))\n",
"\n",
" # define the optimiser, here stochasitc gradient descent\n",
" # with fixed learning rate and max_epochs\n",
" lr_scheduler = LearningRateFixed(learning_rate=learning_rate, max_epochs=max_epochs)\n",
" optimiser = SGDOptimiser(lr_scheduler=lr_scheduler)\n",
"\n",
" logger.info('Training started...')\n",
" tr_stats, valid_stats = optimiser.train(model, train_dp, valid_dp)\n",
"\n",
" logger.info('Testing the model on test set:')\n",
" tst_cost, tst_accuracy = optimiser.validate(model, test_dp)\n",
" logger.info('MNIST test set accuracy is %.2f %%, cost (%s) is %.3f'%(tst_accuracy*100., cost.get_name(), tst_cost))\n",
" \n",
" stats.append((tr_stats, valid_stats, (tst_cost, tst_accuracy)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercise 2: Implement ReLU\n",
"\n",
"Again, your implementation should follow the conventions used to build Linear, Sigmoid and Softmax layers. As in exercise 1, test your solution by training a one-hidden-layer model with 100 hidden units, similiar to the one used in Task 3a in the coursework. Tune the learning rate (start with the initial one set to 0.1) with the initial weight range set to 0.05."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:root:Training started...\n",
"INFO:mlp.optimisers:Epoch 0: Training cost (ce) for initial model is 2.317. Accuracy is 15.20%\n",
"INFO:mlp.optimisers:Epoch 0: Validation cost (ce) for initial model is 2.317. Accuracy is 13.98%\n",
"INFO:mlp.optimisers:Epoch 1: Training cost (ce) is 1.400. Accuracy is 57.10%\n",
"INFO:mlp.optimisers:Epoch 1: Validation cost (ce) is 0.666. Accuracy is 84.13%\n",
"INFO:mlp.optimisers:Epoch 1: Took 0 seconds. Training speed 6600 pps. Validation speed 53922 pps.\n",
"INFO:mlp.optimisers:Epoch 2: Training cost (ce) is 0.607. Accuracy is 82.20%\n",
"INFO:mlp.optimisers:Epoch 2: Validation cost (ce) is 0.486. Accuracy is 85.70%\n",
"INFO:mlp.optimisers:Epoch 2: Took 0 seconds. Training speed 3536 pps. Validation speed 80001 pps.\n",
"INFO:mlp.optimisers:Epoch 3: Training cost (ce) is 0.413. Accuracy is 88.20%\n",
"INFO:mlp.optimisers:Epoch 3: Validation cost (ce) is 0.427. Accuracy is 87.59%\n",
"INFO:mlp.optimisers:Epoch 3: Took 0 seconds. Training speed 6251 pps. Validation speed 72222 pps.\n",
"INFO:mlp.optimisers:Epoch 4: Training cost (ce) is 0.313. Accuracy is 90.30%\n",
"INFO:mlp.optimisers:Epoch 4: Validation cost (ce) is 0.399. Accuracy is 88.12%\n",
"INFO:mlp.optimisers:Epoch 4: Took 0 seconds. Training speed 5038 pps. Validation speed 65450 pps.\n",
"INFO:mlp.optimisers:Epoch 5: Training cost (ce) is 0.233. Accuracy is 93.90%\n",
"INFO:mlp.optimisers:Epoch 5: Validation cost (ce) is 0.370. Accuracy is 89.09%\n",
"INFO:mlp.optimisers:Epoch 5: Took 0 seconds. Training speed 5425 pps. Validation speed 69818 pps.\n",
"INFO:mlp.optimisers:Epoch 6: Training cost (ce) is 0.189. Accuracy is 94.90%\n",
"INFO:mlp.optimisers:Epoch 6: Validation cost (ce) is 0.394. Accuracy is 88.26%\n",
"INFO:mlp.optimisers:Epoch 6: Took 0 seconds. Training speed 5226 pps. Validation speed 73042 pps.\n",
"INFO:mlp.optimisers:Epoch 7: Training cost (ce) is 0.141. Accuracy is 96.60%\n",
"INFO:mlp.optimisers:Epoch 7: Validation cost (ce) is 0.386. Accuracy is 88.72%\n",
"INFO:mlp.optimisers:Epoch 7: Took 0 seconds. Training speed 3155 pps. Validation speed 58826 pps.\n",
"INFO:mlp.optimisers:Epoch 8: Training cost (ce) is 0.105. Accuracy is 98.50%\n",
"INFO:mlp.optimisers:Epoch 8: Validation cost (ce) is 0.375. Accuracy is 89.52%\n",
"INFO:mlp.optimisers:Epoch 8: Took 0 seconds. Training speed 5681 pps. Validation speed 72784 pps.\n",
"INFO:mlp.optimisers:Epoch 9: Training cost (ce) is 0.084. Accuracy is 98.80%\n",
"INFO:mlp.optimisers:Epoch 9: Validation cost (ce) is 0.385. Accuracy is 89.54%\n",
"INFO:mlp.optimisers:Epoch 9: Took 0 seconds. Training speed 6656 pps. Validation speed 77418 pps.\n",
"INFO:mlp.optimisers:Epoch 10: Training cost (ce) is 0.070. Accuracy is 98.70%\n",
"INFO:mlp.optimisers:Epoch 10: Validation cost (ce) is 0.385. Accuracy is 89.74%\n",
"INFO:mlp.optimisers:Epoch 10: Took 0 seconds. Training speed 7067 pps. Validation speed 76853 pps.\n",
"INFO:mlp.optimisers:Epoch 11: Training cost (ce) is 0.051. Accuracy is 99.50%\n",
"INFO:mlp.optimisers:Epoch 11: Validation cost (ce) is 0.411. Accuracy is 88.81%\n",
"INFO:mlp.optimisers:Epoch 11: Took 0 seconds. Training speed 5863 pps. Validation speed 71343 pps.\n",
"INFO:mlp.optimisers:Epoch 12: Training cost (ce) is 0.044. Accuracy is 99.50%\n",
"INFO:mlp.optimisers:Epoch 12: Validation cost (ce) is 0.398. Accuracy is 89.34%\n",
"INFO:mlp.optimisers:Epoch 12: Took 0 seconds. Training speed 5974 pps. Validation speed 64065 pps.\n",
"INFO:mlp.optimisers:Epoch 13: Training cost (ce) is 0.035. Accuracy is 99.50%\n",
"INFO:mlp.optimisers:Epoch 13: Validation cost (ce) is 0.400. Accuracy is 89.37%\n",
"INFO:mlp.optimisers:Epoch 13: Took 0 seconds. Training speed 6211 pps. Validation speed 82847 pps.\n",
"INFO:mlp.optimisers:Epoch 14: Training cost (ce) is 0.027. Accuracy is 99.70%\n",
"INFO:mlp.optimisers:Epoch 14: Validation cost (ce) is 0.411. Accuracy is 89.30%\n",
"INFO:mlp.optimisers:Epoch 14: Took 0 seconds. Training speed 5834 pps. Validation speed 68986 pps.\n",
"INFO:mlp.optimisers:Epoch 15: Training cost (ce) is 0.023. Accuracy is 99.80%\n",
"INFO:mlp.optimisers:Epoch 15: Validation cost (ce) is 0.398. Accuracy is 89.99%\n",
"INFO:mlp.optimisers:Epoch 15: Took 0 seconds. Training speed 5601 pps. Validation speed 73777 pps.\n",
"INFO:mlp.optimisers:Epoch 16: Training cost (ce) is 0.020. Accuracy is 99.80%\n",
"INFO:mlp.optimisers:Epoch 16: Validation cost (ce) is 0.413. Accuracy is 89.64%\n",
"INFO:mlp.optimisers:Epoch 16: Took 0 seconds. Training speed 6732 pps. Validation speed 62236 pps.\n",
"INFO:mlp.optimisers:Epoch 17: Training cost (ce) is 0.017. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 17: Validation cost (ce) is 0.411. Accuracy is 89.74%\n",
"INFO:mlp.optimisers:Epoch 17: Took 0 seconds. Training speed 5339 pps. Validation speed 70669 pps.\n",
"INFO:mlp.optimisers:Epoch 18: Training cost (ce) is 0.016. Accuracy is 99.90%\n",
"INFO:mlp.optimisers:Epoch 18: Validation cost (ce) is 0.414. Accuracy is 89.78%\n",
"INFO:mlp.optimisers:Epoch 18: Took 0 seconds. Training speed 6595 pps. Validation speed 72214 pps.\n",
"INFO:mlp.optimisers:Epoch 19: Training cost (ce) is 0.014. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 19: Validation cost (ce) is 0.418. Accuracy is 89.79%\n",
"INFO:mlp.optimisers:Epoch 19: Took 0 seconds. Training speed 5860 pps. Validation speed 64213 pps.\n",
"INFO:mlp.optimisers:Epoch 20: Training cost (ce) is 0.012. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 20: Validation cost (ce) is 0.418. Accuracy is 90.04%\n",
"INFO:mlp.optimisers:Epoch 20: Took 0 seconds. Training speed 5726 pps. Validation speed 66591 pps.\n",
"INFO:mlp.optimisers:Epoch 21: Training cost (ce) is 0.011. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 21: Validation cost (ce) is 0.422. Accuracy is 89.87%\n",
"INFO:mlp.optimisers:Epoch 21: Took 0 seconds. Training speed 4780 pps. Validation speed 62159 pps.\n",
"INFO:mlp.optimisers:Epoch 22: Training cost (ce) is 0.011. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 22: Validation cost (ce) is 0.425. Accuracy is 89.98%\n",
"INFO:mlp.optimisers:Epoch 22: Took 0 seconds. Training speed 4839 pps. Validation speed 79642 pps.\n",
"INFO:mlp.optimisers:Epoch 23: Training cost (ce) is 0.010. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 23: Validation cost (ce) is 0.425. Accuracy is 90.09%\n",
"INFO:mlp.optimisers:Epoch 23: Took 0 seconds. Training speed 6742 pps. Validation speed 82643 pps.\n",
"INFO:mlp.optimisers:Epoch 24: Training cost (ce) is 0.009. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 24: Validation cost (ce) is 0.429. Accuracy is 90.00%\n",
"INFO:mlp.optimisers:Epoch 24: Took 0 seconds. Training speed 6590 pps. Validation speed 80128 pps.\n",
"INFO:mlp.optimisers:Epoch 25: Training cost (ce) is 0.008. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 25: Validation cost (ce) is 0.432. Accuracy is 90.04%\n",
"INFO:mlp.optimisers:Epoch 25: Took 0 seconds. Training speed 4635 pps. Validation speed 69917 pps.\n",
"INFO:mlp.optimisers:Epoch 26: Training cost (ce) is 0.008. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 26: Validation cost (ce) is 0.435. Accuracy is 89.99%\n",
"INFO:mlp.optimisers:Epoch 26: Took 0 seconds. Training speed 6685 pps. Validation speed 79693 pps.\n",
"INFO:mlp.optimisers:Epoch 27: Training cost (ce) is 0.007. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 27: Validation cost (ce) is 0.437. Accuracy is 90.04%\n",
"INFO:mlp.optimisers:Epoch 27: Took 0 seconds. Training speed 6992 pps. Validation speed 76700 pps.\n",
"INFO:mlp.optimisers:Epoch 28: Training cost (ce) is 0.007. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 28: Validation cost (ce) is 0.438. Accuracy is 90.07%\n",
"INFO:mlp.optimisers:Epoch 28: Took 0 seconds. Training speed 5097 pps. Validation speed 57735 pps.\n",
"INFO:mlp.optimisers:Epoch 29: Training cost (ce) is 0.007. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 29: Validation cost (ce) is 0.440. Accuracy is 90.04%\n",
"INFO:mlp.optimisers:Epoch 29: Took 0 seconds. Training speed 5390 pps. Validation speed 77974 pps.\n",
"INFO:mlp.optimisers:Epoch 30: Training cost (ce) is 0.006. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 30: Validation cost (ce) is 0.444. Accuracy is 90.08%\n",
"INFO:mlp.optimisers:Epoch 30: Took 0 seconds. Training speed 5589 pps. Validation speed 60936 pps.\n",
"INFO:root:Testing the model on test set:\n",
"INFO:root:MNIST test set accuracy is 89.13 %, cost (ce) is 0.444\n"
]
}
],
"source": [
"\n",
"from mlp.layers import MLP, Relu, Softmax \n",
"from mlp.optimisers import SGDOptimiser \n",
"from mlp.costs import CECost \n",
"from mlp.schedulers import LearningRateFixed\n",
"\n",
"rng = numpy.random.RandomState([2015,10,10])\n",
"\n",
"#some hyper-parameters\n",
"nhid = 100\n",
"learning_rate = 0.1\n",
"max_epochs = 30\n",
"cost = CECost()\n",
" \n",
"stats = []\n",
"for layer in xrange(1, 2):\n",
"\n",
" train_dp.reset()\n",
" valid_dp.reset()\n",
" test_dp.reset()\n",
" \n",
" #define the model\n",
" model = MLP(cost=cost)\n",
" model.add_layer(Relu(idim=784, odim=nhid, irange=0.05, rng=rng))\n",
" for i in xrange(1, layer):\n",
" logger.info(\"Stacking hidden layer (%s)\" % str(i+1))\n",
" model.add_layer(Relu(idim=nhid, odim=nhid, irange=0.2, rng=rng))\n",
" model.add_layer(Softmax(idim=nhid, odim=10, rng=rng))\n",
"\n",
" # define the optimiser, here stochasitc gradient descent\n",
" # with fixed learning rate and max_epochs\n",
" lr_scheduler = LearningRateFixed(learning_rate=learning_rate, max_epochs=max_epochs)\n",
" optimiser = SGDOptimiser(lr_scheduler=lr_scheduler)\n",
"\n",
" logger.info('Training started...')\n",
" tr_stats, valid_stats = optimiser.train(model, train_dp, valid_dp)\n",
"\n",
" logger.info('Testing the model on test set:')\n",
" tst_cost, tst_accuracy = optimiser.validate(model, test_dp)\n",
" logger.info('MNIST test set accuracy is %.2f %%, cost (%s) is %.3f'%(tst_accuracy*100., cost.get_name(), tst_cost))\n",
" \n",
" stats.append((tr_stats, valid_stats, (tst_cost, tst_accuracy)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercise 3: Implement Maxout\n",
"\n",
"As with the previous two exercises, your implementation should follow the conventions used to build the Linear, Sigmoid and Softmax layers. For now implement only non-overlapping pools (i.e. the pool in which all activations $a_{j}, a_{j+1}, \\ldots, a_{j+K}$ belong to only one pool). As before, test your solution by training a one-hidden-layer model with 100 hidden units, similiar to the one used in Task 3a in the coursework. Use the same optimisation hyper-parameters (learning rate, initial weights range) as you used for ReLU models. Tune the pool size $K$ (but keep the number of total parameters fixed).\n",
"\n",
"Note: The Max operator reduces dimensionality, hence for example, to get 100 hidden maxout units with pooling size set to $K=2$ the size of linear part needs to be set to $100K$ (assuming non-overlapping pools). This affects how you compute the total number of weights in the model.\n",
"\n",
"Implementation tips: To back-propagate through the maxout layer, one needs to keep track of which linear activation $a_{j}, a_{j+1}, \\ldots, a_{j+K}$ was the maximum in each pool. The convenient way to do so is by storing the indices of the maximum units in the fprop function and then in the backprop stage pass the gradient only through those (i.e. for example, one can build an auxiliary matrix where each element is either 1 (if unit was maximum, and passed forward through the max operator for a given data-point) or 0 otherwise. Then in the backward pass it suffices to upsample the maxout *igrads* signal to the linear layer dimension and element-wise multiply by the aforemenioned auxiliary matrix.\n",
"\n",
"*Optional:* Implement the generic pooling mechanism by introducing an additional *stride* hyper-parameter $0<S\\leq K$. It specifies how many units you move to build the next pool. For instance, for non-overlapping pooling with $S=K=3$ one would build the first two maxout units as: $h_1=\\max(a_1,a_2,a_3)$ and $h_2=\\max(a_4,a_5,a_6)$. However, after setting $S=1$ the pools should share some subset of linear activations: $h_1=\\max(a_1,a_2,a_3)$ and $h_2=\\max(a_2,a_3,a_4)$."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:root:Training started...\n",
"INFO:mlp.optimisers:Epoch 0: Training cost (ce) for initial model is 2.320. Accuracy is 7.90%\n",
"INFO:mlp.optimisers:Epoch 0: Validation cost (ce) for initial model is 2.321. Accuracy is 7.57%\n",
"INFO:mlp.optimisers:Epoch 1: Training cost (ce) is 1.223. Accuracy is 65.00%\n",
"INFO:mlp.optimisers:Epoch 1: Validation cost (ce) is 0.686. Accuracy is 79.01%\n",
"INFO:mlp.optimisers:Epoch 1: Took 5 seconds. Training speed 986 pps. Validation speed 2320 pps.\n",
"INFO:mlp.optimisers:Epoch 2: Training cost (ce) is 0.505. Accuracy is 84.60%\n",
"INFO:mlp.optimisers:Epoch 2: Validation cost (ce) is 0.413. Accuracy is 88.88%\n",
"INFO:mlp.optimisers:Epoch 2: Took 5 seconds. Training speed 855 pps. Validation speed 2414 pps.\n",
"INFO:mlp.optimisers:Epoch 3: Training cost (ce) is 0.337. Accuracy is 91.20%\n",
"INFO:mlp.optimisers:Epoch 3: Validation cost (ce) is 0.371. Accuracy is 89.27%\n",
"INFO:mlp.optimisers:Epoch 3: Took 5 seconds. Training speed 971 pps. Validation speed 2360 pps.\n",
"INFO:mlp.optimisers:Epoch 4: Training cost (ce) is 0.241. Accuracy is 93.30%\n",
"INFO:mlp.optimisers:Epoch 4: Validation cost (ce) is 0.351. Accuracy is 89.72%\n",
"INFO:mlp.optimisers:Epoch 4: Took 5 seconds. Training speed 957 pps. Validation speed 2369 pps.\n",
"INFO:mlp.optimisers:Epoch 5: Training cost (ce) is 0.172. Accuracy is 96.50%\n",
"INFO:mlp.optimisers:Epoch 5: Validation cost (ce) is 0.349. Accuracy is 89.90%\n",
"INFO:mlp.optimisers:Epoch 5: Took 5 seconds. Training speed 812 pps. Validation speed 2421 pps.\n",
"INFO:mlp.optimisers:Epoch 6: Training cost (ce) is 0.127. Accuracy is 97.60%\n",
"INFO:mlp.optimisers:Epoch 6: Validation cost (ce) is 0.344. Accuracy is 90.20%\n",
"INFO:mlp.optimisers:Epoch 6: Took 5 seconds. Training speed 949 pps. Validation speed 2420 pps.\n",
"INFO:mlp.optimisers:Epoch 7: Training cost (ce) is 0.091. Accuracy is 99.10%\n",
"INFO:mlp.optimisers:Epoch 7: Validation cost (ce) is 0.363. Accuracy is 89.79%\n",
"INFO:mlp.optimisers:Epoch 7: Took 5 seconds. Training speed 964 pps. Validation speed 2393 pps.\n",
"INFO:mlp.optimisers:Epoch 8: Training cost (ce) is 0.064. Accuracy is 99.10%\n",
"INFO:mlp.optimisers:Epoch 8: Validation cost (ce) is 0.339. Accuracy is 90.63%\n",
"INFO:mlp.optimisers:Epoch 8: Took 6 seconds. Training speed 943 pps. Validation speed 2223 pps.\n",
"INFO:mlp.optimisers:Epoch 9: Training cost (ce) is 0.050. Accuracy is 99.60%\n",
"INFO:mlp.optimisers:Epoch 9: Validation cost (ce) is 0.339. Accuracy is 90.72%\n",
"INFO:mlp.optimisers:Epoch 9: Took 6 seconds. Training speed 786 pps. Validation speed 2193 pps.\n",
"INFO:mlp.optimisers:Epoch 10: Training cost (ce) is 0.035. Accuracy is 99.80%\n",
"INFO:mlp.optimisers:Epoch 10: Validation cost (ce) is 0.336. Accuracy is 90.90%\n",
"INFO:mlp.optimisers:Epoch 10: Took 6 seconds. Training speed 760 pps. Validation speed 1982 pps.\n",
"INFO:mlp.optimisers:Epoch 11: Training cost (ce) is 0.029. Accuracy is 99.80%\n",
"INFO:mlp.optimisers:Epoch 11: Validation cost (ce) is 0.344. Accuracy is 90.68%\n",
"INFO:mlp.optimisers:Epoch 11: Took 5 seconds. Training speed 795 pps. Validation speed 2392 pps.\n",
"INFO:mlp.optimisers:Epoch 12: Training cost (ce) is 0.023. Accuracy is 99.90%\n",
"INFO:mlp.optimisers:Epoch 12: Validation cost (ce) is 0.344. Accuracy is 90.78%\n",
"INFO:mlp.optimisers:Epoch 12: Took 5 seconds. Training speed 937 pps. Validation speed 2494 pps.\n",
"INFO:mlp.optimisers:Epoch 13: Training cost (ce) is 0.019. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 13: Validation cost (ce) is 0.349. Accuracy is 90.83%\n",
"INFO:mlp.optimisers:Epoch 13: Took 5 seconds. Training speed 994 pps. Validation speed 2351 pps.\n",
"INFO:mlp.optimisers:Epoch 14: Training cost (ce) is 0.016. Accuracy is 99.90%\n",
"INFO:mlp.optimisers:Epoch 14: Validation cost (ce) is 0.352. Accuracy is 90.77%\n",
"INFO:mlp.optimisers:Epoch 14: Took 6 seconds. Training speed 975 pps. Validation speed 2207 pps.\n",
"INFO:mlp.optimisers:Epoch 15: Training cost (ce) is 0.014. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 15: Validation cost (ce) is 0.354. Accuracy is 90.84%\n",
"INFO:mlp.optimisers:Epoch 15: Took 5 seconds. Training speed 951 pps. Validation speed 2410 pps.\n",
"INFO:mlp.optimisers:Epoch 16: Training cost (ce) is 0.013. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 16: Validation cost (ce) is 0.356. Accuracy is 90.92%\n",
"INFO:mlp.optimisers:Epoch 16: Took 5 seconds. Training speed 968 pps. Validation speed 2358 pps.\n",
"INFO:mlp.optimisers:Epoch 17: Training cost (ce) is 0.011. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 17: Validation cost (ce) is 0.358. Accuracy is 90.80%\n",
"INFO:mlp.optimisers:Epoch 17: Took 5 seconds. Training speed 928 pps. Validation speed 2357 pps.\n",
"INFO:mlp.optimisers:Epoch 18: Training cost (ce) is 0.010. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 18: Validation cost (ce) is 0.362. Accuracy is 90.88%\n",
"INFO:mlp.optimisers:Epoch 18: Took 6 seconds. Training speed 832 pps. Validation speed 2138 pps.\n",
"INFO:mlp.optimisers:Epoch 19: Training cost (ce) is 0.009. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 19: Validation cost (ce) is 0.364. Accuracy is 90.89%\n",
"INFO:mlp.optimisers:Epoch 19: Took 7 seconds. Training speed 828 pps. Validation speed 1831 pps.\n",
"INFO:mlp.optimisers:Epoch 20: Training cost (ce) is 0.009. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 20: Validation cost (ce) is 0.366. Accuracy is 90.94%\n",
"INFO:mlp.optimisers:Epoch 20: Took 6 seconds. Training speed 773 pps. Validation speed 2109 pps.\n",
"INFO:mlp.optimisers:Epoch 21: Training cost (ce) is 0.008. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 21: Validation cost (ce) is 0.367. Accuracy is 90.95%\n",
"INFO:mlp.optimisers:Epoch 21: Took 6 seconds. Training speed 787 pps. Validation speed 2118 pps.\n",
"INFO:mlp.optimisers:Epoch 22: Training cost (ce) is 0.007. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 22: Validation cost (ce) is 0.366. Accuracy is 91.02%\n",
"INFO:mlp.optimisers:Epoch 22: Took 7 seconds. Training speed 718 pps. Validation speed 1954 pps.\n",
"INFO:mlp.optimisers:Epoch 23: Training cost (ce) is 0.007. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 23: Validation cost (ce) is 0.369. Accuracy is 90.99%\n",
"INFO:mlp.optimisers:Epoch 23: Took 6 seconds. Training speed 819 pps. Validation speed 2160 pps.\n",
"INFO:mlp.optimisers:Epoch 24: Training cost (ce) is 0.007. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 24: Validation cost (ce) is 0.372. Accuracy is 91.00%\n",
"INFO:mlp.optimisers:Epoch 24: Took 6 seconds. Training speed 824 pps. Validation speed 2177 pps.\n",
"INFO:mlp.optimisers:Epoch 25: Training cost (ce) is 0.006. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 25: Validation cost (ce) is 0.374. Accuracy is 90.93%\n",
"INFO:mlp.optimisers:Epoch 25: Took 6 seconds. Training speed 819 pps. Validation speed 2080 pps.\n",
"INFO:mlp.optimisers:Epoch 26: Training cost (ce) is 0.006. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 26: Validation cost (ce) is 0.373. Accuracy is 91.03%\n",
"INFO:mlp.optimisers:Epoch 26: Took 6 seconds. Training speed 788 pps. Validation speed 2131 pps.\n",
"INFO:mlp.optimisers:Epoch 27: Training cost (ce) is 0.005. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 27: Validation cost (ce) is 0.376. Accuracy is 90.96%\n",
"INFO:mlp.optimisers:Epoch 27: Took 6 seconds. Training speed 803 pps. Validation speed 2085 pps.\n",
"INFO:mlp.optimisers:Epoch 28: Training cost (ce) is 0.005. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 28: Validation cost (ce) is 0.377. Accuracy is 90.95%\n",
"INFO:mlp.optimisers:Epoch 28: Took 6 seconds. Training speed 778 pps. Validation speed 2054 pps.\n",
"INFO:mlp.optimisers:Epoch 29: Training cost (ce) is 0.005. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 29: Validation cost (ce) is 0.378. Accuracy is 90.98%\n",
"INFO:mlp.optimisers:Epoch 29: Took 6 seconds. Training speed 748 pps. Validation speed 2053 pps.\n",
"INFO:mlp.optimisers:Epoch 30: Training cost (ce) is 0.005. Accuracy is 100.00%\n",
"INFO:mlp.optimisers:Epoch 30: Validation cost (ce) is 0.379. Accuracy is 90.99%\n",
"INFO:mlp.optimisers:Epoch 30: Took 7 seconds. Training speed 732 pps. Validation speed 1832 pps.\n",
"INFO:root:Testing the model on test set:\n",
"INFO:root:MNIST test set accuracy is 90.39 %, cost (ce) is 0.384\n"
]
}
],
"source": [
"\n",
"from mlp.layers import MLP, Maxout, Softmax \n",
"from mlp.optimisers import SGDOptimiser\n",
"from mlp.costs import CECost \n",
"from mlp.schedulers import LearningRateFixed\n",
"\n",
"#some hyper-parameters\n",
"nhid = 100\n",
"learning_rate = 0.1\n",
"k = 2 #maxout pool size (stride is assumed k)\n",
"max_epochs = 30\n",
"cost = CECost()\n",
" \n",
"stats = []\n",
"for layer in xrange(1, 2):\n",
"\n",
" train_dp.reset()\n",
" valid_dp.reset()\n",
" test_dp.reset()\n",
" \n",
" #define the model\n",
" model = MLP(cost=cost)\n",
" model.add_layer(Maxout(idim=784, odim=nhid, k=k, irange=0.05, rng=rng))\n",
" for i in xrange(1, layer):\n",
" logger.info(\"Stacking hidden layer (%s)\" % str(i+1))\n",
" model.add_layer(Maxout(idim=nhid, odim=nhid, k=k, irange=0.2, rng=rng))\n",
" model.add_layer(Softmax(idim=nhid, odim=10, rng=rng))\n",
"\n",
" # define the optimiser, here stochasitc gradient descent\n",
" # with fixed learning rate and max_epochs\n",
" lr_scheduler = LearningRateFixed(learning_rate=learning_rate, max_epochs=max_epochs)\n",
" optimiser = SGDOptimiser(lr_scheduler=lr_scheduler)\n",
"\n",
" logger.info('Training started...')\n",
" tr_stats, valid_stats = optimiser.train(model, train_dp, valid_dp)\n",
"\n",
" logger.info('Testing the model on test set:')\n",
" tst_cost, tst_accuracy = optimiser.validate(model, test_dp)\n",
" logger.info('MNIST test set accuracy is %.2f %%, cost (%s) is %.3f'%(tst_accuracy*100., cost.get_name(), tst_cost))\n",
" \n",
" stats.append((tr_stats, valid_stats, (tst_cost, tst_accuracy)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercise 4: Train all the above models with dropout\n",
"\n",
"Try all of the above non-linearities with dropout training. Use the dropout hyper-parameters $\\{p_{inp}, p_{hid}\\}$ that worked best for sigmoid models from the previous lab.\n",
"\n",
"Note: the code for dropout you were asked to implement last week has not been given as a solution for this week - as a result you need to move/merge the required dropout parts from your previous *lab4* branch (or implement it if you haven't already done so). \n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#This one is a simple merge of above experiments with last exercise in previous tutorial."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 0
}

View File

@@ -1,542 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction to Coursework #2\n",
"\n",
"This notebook contains some extended versions of hints and some code examples that are suppose to make it easier to proceed with certain tasks in Coursework #2.\n",
"\n",
"## Virtual environments\n",
"\n",
"Before you proceed onwards, remember to activate your virtual environment by typing `activate_mlp` or `source ~/mlpractical/venv/bin/activate` (or if you did the original install the \"comfy way\" type: `workon mlpractical`).\n",
"\n",
"## Syncing the git repository\n",
"\n",
"Look <a href=\"https://github.com/CSTR-Edinburgh/mlpractical/blob/master/gitFAQ.md\">here</a> for more details. But in short, we recommend to create a separate branch for the coursework, as follows:\n",
"\n",
"1. Enter the mlpractical directory `cd ~/mlpractical/repo-mlp`\n",
"2. List the branches and check which are currently active by typing: `git branch`\n",
"3. If you have followed our recommendations, you should be in the `lab5` branch, please commit your local changes to the repo index by typing:\n",
"```\n",
"git commit -am \"finished lab5\"\n",
"```\n",
"4. Now you can switch to `master` branch by typing: \n",
"```\n",
"git checkout master\n",
" ```\n",
"5. To update the repository (note, assuming master does not have any conflicts), if there are some, have a look <a href=\"https://github.com/CSTR-Edinburgh/mlpractical/blob/master/gitFAQ.md\">here</a>\n",
"```\n",
"git pull\n",
"```\n",
"6. And now, create the new branch & switch to it by typing:\n",
"```\n",
"git checkout -b coursework2\n",
"```\n",
"\n",
"# Store the intermediate results (check-pointing and pickling)\n",
"\n",
"Once you have finished a task it is a good idea to check-point your current notebook's status (logs, plots and whatever else has been stored in the notebook). By doing this, you can always revert to this state later when necessary. You can do this by going to menus `File->Save and Checkpoint` and `File->Revert to Checkpoint`.\n",
"\n",
"Another good practice would be to save models and the statistics you generate to disk. You can easily do this in python by using *cPickle*, as in the following example."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```python \n",
"import cPickle\n",
"\n",
"#...\n",
"(train_stats, valid_stats) = optimiser.train(model, train_dp, valid_dp)\n",
"test_stats = optimiser.validate(model, test_dp)\n",
"\n",
"#this one saves the model, you can save like this any object \n",
"#in python, like tuples, dictionaries, lists, etc.\n",
"with open('task1_mdl.pkl','wb') as f:\n",
" cPickle.dump(model, f)\n",
" \n",
"#then, to load you can type \n",
"with open('task1_mdl.pkl','r') as f:\n",
" model2 = cPickle.load(f)\n",
"\n",
"#and you can use it again (if needed) without retraining\n",
"test_stats2 = optimiser.validate(model2, test_dp)\n",
" \n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Notes on numpy and tensors\n",
"\n",
"This is a remainder on some numpy conventions you may find useful (especially in the second part of coursework #2, which involves the implementation of convolution and pooling layers).\n",
"\n",
"Links to numpy indexing:\n",
"* [Numpy (advanced) indexing](http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html)\n",
"* [More on indexing of multi-dimensional arrays](http://docs.scipy.org/doc/numpy/user/basics.indexing.html)\n",
"\n",
"Below we list some (potentially) useful functions - you are not expected to need them all - we just outline some (non-obvious) functionality that you may find useful. Search the numpy documentation to get precise information about them. \n",
"\n",
"* `numpy.sum` - note that the axis arguments allow to specify a sequence of axes, hence, the reduction (here sum) can be performed along arbitrary dimensions.\n",
"* `numpy.amax` - the same as with sum\n",
"* `numpy.transpose` - can specify which axes you want to get transposed in a tensor\n",
"* `numpy.argmax` - gives you the argument (index) of the maximum value in a tensor\n",
"* `numpy.flatten` - collapses the n-dimensional tensor into vector (copy)\n",
"* `numpy.ravel` - collapses the n-dimensional tensor into vector (creates a view)\n",
"* `numpy.reshape` - allows to reshape a tensor into another (valid from data perspective) tensor (matrix, vector) with a different shape (but the same number of total elements)\n",
"* `numpy.rot90(m, k)` - rotate matrix `m` by 90 degrees `k` times (counter-clockwise)\n",
"* `numpy.newaxis` - add an axis with dimension 1 (handy for keeping tensor shapes compatible with expected broadcasting)\n",
"* `numpy.rollaxis` - roll an axis in a tensor\n",
"* `slice` - allows to specify a range (can be used when indexing numpy arrays)\n",
"* `ellipsis` - allows to pick an arbitrary number of dimensions (inferred)\n",
"* `max_and_argmax` - `(mlp.layers)` - an auxiliary function we have provided to get both max and argmax of a tensor across an arbitrary axes, possibly in the format preserving tensor's original shape (this is not trivial to do using numpy *out-of-the-shelf* functionality).\n",
"\n",
"The below cells contain some simple examples showing the basics of tensor manipulation in numpy (go through them if you haven't used numpy in this context before)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%load_ext cython\n",
"\n",
"from mlp.dataset import MNISTDataProvider\n",
"import numpy\n",
"\n",
"rng = numpy.random.RandomState([2015, 11, 11])\n",
"\n",
"# we create MNISTDataProvider with 'conv_reshape' argument set to True, \n",
"# this will give us a 4D tensor of shape\n",
"# (batch_size, num_input_channels, width, height)\n",
"# compared to standard fully connected one\n",
"# (batch_size, width*height)\n",
"batch_size = 2\n",
"mdp = MNISTDataProvider('valid', batch_size=batch_size, max_num_batches=1, randomize=False, conv_reshape=True)\n",
"x, t = mdp.next()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%pylab\n",
"%matplotlib inline\n",
"# this will print (2, 1, 28, 28) which is\n",
"# (batch_size, num_input_channels, width, height)\n",
"print x.shape\n",
"# and this the flattened version used to date (2, 784)\n",
"print x.reshape(batch_size, -1).shape\n",
"\n",
"# you can pick the first image in the usual way you index numpy arrays\n",
"img1 = x[0]\n",
"# or in full notation which is equivalent (numpy by default treats single \n",
"# indexes as selecting everything for this dimension)\n",
"img2 = x[0, :, :, :]\n",
"#this will be true\n",
"print numpy.allclose(img1, img2)\n",
"\n",
"# Notice, slicing like this is collapsing the leading dimension \n",
"# (as we picked only 0-th element), so the below will give you\n",
"# (1, 28, 28)\n",
"print img1.shape\n",
"print img2.shape\n",
"\n",
"# to keep this dimension, one can use numpy.newaxis, which will add one dimension of\n",
"# size 1, as a result you get a tensor (1, 1, 28, 28) \n",
"# one image (as expected), but with preserved ndims of the source 4D tensor \n",
"# this can be handy as it can simplify assignments to the original tensor\n",
"img1=x[0, numpy.newaxis]\n",
"print img1.shape\n",
"\n",
"#Let assume you want to get a sum of pixel intensities in each image in a mini-batch, \n",
"#you can of course iterate over images and compute sum for each separately, but you can also:\n",
"\n",
"#to get the sum of pixel for each image separately, you could write:\n",
"# (which means, sum along axis 2 and 3 together)\n",
"print numpy.sum(x, axis=(2,3))\n",
"\n",
"#notice, that any of other calls would do what we want as:\n",
"#will print the total sum for all images\n",
"print numpy.sum(x)\n",
"\n",
"#will print the sum along the last axes (the columns of the images)\n",
"print numpy.sum(x, axis=-1)\n",
"\n",
"# finally, let us swap the 10x20 rectangle of pixels between images\n",
"# in numpy you can of course perform an arbitrary slicing and assignment\n",
"# of sub-matrices\n",
"\n",
"slice_x = slice(10, 20)\n",
"slice_y = slice(5, 25)\n",
"\n",
"patch1 = x[0, :, slice_x, slice_y]\n",
"patch2 = x[1, :, slice_x, slice_y]\n",
"print patch1.shape, patch1.shape\n",
"\n",
"xc = numpy.array(x) #this will make a copy of x\n",
"xc[0, :, slice_x, slice_y] = patch2\n",
"xc[1, :, slice_x, slice_y] = patch1\n",
"\n",
"fig, ax = plt.subplots(2,2)\n",
"ax[0, 0].imshow(x[0,0], cmap=cm.Greys_r)\n",
"ax[0, 1].imshow(x[1,0], cmap=cm.Greys_r)\n",
"ax[1, 0].imshow(xc[0,0], cmap=cm.Greys_r)\n",
"ax[1, 1].imshow(xc[1,0], cmap=cm.Greys_r)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Verifying the gradients\n",
"\n",
"One can numerically compute the gradient using the [finite differences](https://en.wikipedia.org/wiki/Finite_difference) method, that is, perturb the input arguments by some small value and then measure how this affects the function change:\n",
"\n",
"$\n",
"f(x) = \\frac{f(x+\\epsilon) - f(x)}{\\epsilon}\n",
"$\n",
"\n",
"Because $\\epsilon$ is usually very small (1e-4 or smaller) it is recommended (due to finite precision of numerical machines) to use the centred variant (which is implemented in mlp.utils):\n",
"\n",
"$\n",
"f(x) = \\frac{f(x+\\epsilon) - f(x-\\epsilon)}{2\\epsilon}\n",
"$\n",
"\n",
"The numerical gradient gives a good intuition if something is wrong. But take care, since one can easily find ill-conditioned cases where this test might fail - either due to numerical precision when gradients get really small, or other because of issues like discontinuities in transfer functions (ReLU, Maxout) where perturbing the inputs might cause the piecwise component to cross \"the border\". For instance, for ReLU assume $f(x) < 0$ by a some small margin in argument $x$ and the gradient is correctly set to 0. However, the finite difference quotient rule with some $\\epsilon$ such $f(x+\\epsilon) > 0$ will give a non-zero numerical gradient. Anyway, this method remains very useful in verifying whether the implemented forward and backward pasees are mutually correct.\n",
"\n",
"Below, you can find some examples on how one can use it, first for an arbitrary function and then short snippet on how to check the gradient backpropagated through layer."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%autoreload\n",
"\n",
"import numpy\n",
"from mlp.utils import verify_gradient\n",
"\n",
"rng = numpy.random.RandomState([2015, 11, 11])\n",
"\n",
"#simple example\n",
"def f1_correct_grad(x, **kwargs):\n",
" fval = x**2\n",
" fgrad = 2*x\n",
" return numpy.sum(fval), fgrad\n",
"\n",
"def f1_incorrect_grad(x, **kwargs):\n",
" fval = x**2\n",
" fgrad = x\n",
" return numpy.sum(fval), fgrad\n",
"\n",
"x = rng.uniform(-5, 5, (10,))\n",
"\n",
"#this one should be OK\n",
"print verify_gradient(f=f1_correct_grad, x=x)\n",
"# this one should raise an exception, as the computed gradient is wrong\n",
"print verify_gradient(f=f1_incorrect_grad, x=x)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also check the backprop implementation in the layer. Notice, it **does not** necessarily check whether your layer implementation is correct but rather if the gradient computation is correct, given the forward pass computation. If you get the forward pass wrong, and somehow got the gradients right w.r.t what the forward pass is computing, the below check will not capture it (obviously). Contrary to normal scenraio where 32 floating point precision is sufficient, when checking gradients please make sure 64bit precision is used (or tune the tolerance)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%autoreload\n",
"\n",
"from mlp.layers import Sigmoid\n",
"from mlp.utils import verify_layer_gradient\n",
"\n",
"# keep it small, however, one can test it on some subset of MNIST\n",
"idim = 10\n",
"odim = 5\n",
"\n",
"x = rng.uniform(-2, 2, (20, idim))\n",
"s = Sigmoid(idim=idim, odim=odim, rng=rng)\n",
"verify_layer_gradient(s, x)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Speeding up the code\n",
"\n",
"Convolution can be accelerated in many ways, one of them is the use of *Cython* to write crucial bits in python (the one that involve heavy loop usage). You can speed up your code by:\n",
"\n",
"* Using numpy as much as possible (which will use highly optimised looping, and possibly a form of BLAS-implemented paralleism where possible)\n",
"* Applying standard tricks to convolution (they boil down to more efficent use of BLAS routines (above) by loop unrolling - fewer operations on larger matrices, rather than more on smaller)\n",
"* Speeding up the code by compiling pythonic c-functions (cython)\n",
"\n",
"\n",
"## Using Cython for the crucial bottleneck pieces\n",
"\n",
"Cython will compile them to C and the code should be comparable in terms of efficiency to numpy using similar operations in numpy. Of course, one can only rely on numpy. Slicing numpy across many dimensions gets much more complicated than working with vectors and matrices and we do understand that this can be confusing. Hence, we allow the basic implementation (with any penalty or preference from our side) to be based on embedded loops (which is perhaps much easier to comprehend and debug).\n",
"\n",
"Below we give some example cython code for the matrix-matrix dot function from the second tutorial so that you can see the basic differences and compare the obtained speeds. They give you all the necessary patterns needed to implement naive (reasonably efficient) convolution. If you use native python, rather than Cython, then naive looping will be *very* slow.\n",
"\n",
"Some tutorials:\n",
" * [Cython, language basics](http://docs.cython.org/src/userguide/language_basics.html#language-basics)\n",
" * [Cython, basic tutorial](http://docs.cython.org/src/tutorial/cython_tutorial.html)\n",
" * [Cython in ipython notebooks](http://docs.cython.org/src/quickstart/build.html)\n",
" * [A tutorial on how to optimise the cython code](http://docs.cython.org/src/tutorial/numpy.html) (includes a working example which is actually simple convolution code, do not use it `as is`)\n",
" \n",
"\n",
"Before you proceed, check that you have installed `cython` (it should be installed with scipy). If the below imports do not work, then - staying in the activated virtual environment - type:\n",
" \n",
" ```\n",
" pip install cython\n",
" ```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#native pythonic implementation (as in the second tutorial, rather slow...)\n",
"def my_dot_mat_mat(x, W):\n",
" I = x.shape[0]\n",
" J = x.shape[1]\n",
" K = W.shape[1]\n",
" assert (J == W.shape[0]), (\n",
" \"Number of columns in x expected to \"\n",
" \" to be the same as rows in W, got\"\n",
" \" %i != %i\" % (J, W.shape[0])\n",
" )\n",
" #allocate the output container\n",
" y = numpy.zeros((I, K))\n",
" \n",
" #implement matrix-matrix inner product here\n",
" for i in xrange(0, I):\n",
" for k in xrange(0, K):\n",
" for j in xrange(0, J):\n",
" y[i, k] += x[i, j] * W[j,k]\n",
" \n",
" return y"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%cython\n",
"# this shows an example on how to build cython accelerated\n",
"# function than one can later use a standard python function\n",
"\n",
"#notice, you need to specify all the imports again (so they are\n",
"#available in the compiled verions of the code, when executed)\n",
"import numpy\n",
"cimport numpy\n",
"\n",
"def my_dot_mat_mat_cython(numpy.ndarray x, numpy.ndarray W):\n",
" cdef int I = x.shape[0]\n",
" cdef int J = x.shape[1]\n",
" cdef int K = W.shape[1]\n",
" \n",
" assert (J == W.shape[0]), (\n",
" \"Number of columns in x expected to \"\n",
" \" to be the same as rows in W, got\"\n",
" \" %i != %i\" % (J, W.shape[0])\n",
" )\n",
" \n",
" #allocate the output container and other variables\n",
" cdef numpy.ndarray y = numpy.zeros((I, K))\n",
" cdef int i, k, j\n",
" \n",
" #implement matrix-matrix inner product here\n",
" for i in xrange(0, I):\n",
" for k in xrange(0, K):\n",
" for j in xrange(0, J):\n",
" y[i, k] += x[i, j] * W[j,k]\n",
" \n",
" return y"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%cython\n",
"# this shows an example on how to build cython accelerated\n",
"# function than one can later call as usual from python\n",
"\n",
"#the optimisation relies on making an explicit c-buffers for data, \n",
"#hence they can be accesses much quicker, notice, this is where \n",
"#the real gain in speed comes from really\n",
"\n",
"#you need to specify all the imports again (so they are\n",
"#available in the compiled verions of the code)\n",
"import numpy\n",
"cimport numpy\n",
"\n",
"#we specify the types our function is supposed to\n",
"#be compiled with, float32 is sufficient for neural nets\n",
"#you can use defalut 64bit precision if you want\n",
"DTYPE = numpy.float32\n",
"ctypedef numpy.float32_t DTYPE_t\n",
"\n",
"\n",
"def my_dot_mat_mat_cython_optimised(numpy.ndarray[DTYPE_t, ndim=2] x, \n",
" numpy.ndarray[DTYPE_t, ndim=2] W):\n",
" \n",
" cdef int I = x.shape[0]\n",
" cdef int J = x.shape[1]\n",
" cdef int K = W.shape[1]\n",
" \n",
" assert (J == W.shape[0]), (\n",
" \"Number of columns in x expected to \"\n",
" \" to be the same as rows in W, got\"\n",
" \" %i != %i\" % (J, W.shape[0])\n",
" )\n",
" \n",
" #allocate the output container and other variables, notice, when allocating\n",
" #y we specify its type both for the buffer but also in numpy.zeros() function\n",
" cdef numpy.ndarray[DTYPE_t, ndim=2] y = numpy.zeros((I, K), dtype=DTYPE)\n",
" cdef int i, k, j\n",
" \n",
" #implement matrix-matrix inner product here\n",
" for i in xrange(0, I):\n",
" for k in xrange(0, K):\n",
" for j in xrange(0, J):\n",
" y[i, k] += x[i, j] * W[j,k]\n",
" \n",
" return y"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can optimise the code further as in the [linked](http://docs.cython.org/src/tutorial/numpy.html) tutorial. However, the above example seems to be a reasonable compromise for developing the code - it gives reasonably accelerated code, with all the checks one may expect to be existent under development (checking bounds of indices, wheter types of variables match, tracking overflows etc.). Look [here](http://docs.cython.org/src/reference/compilation.html) for more optimisation decorators that one can use to speed things up.\n",
"\n",
"Below we do some benchmarks on each of the above functions. Notice the huge speed-up in going from non-optimised cython code to an optimised one (on my machine, 643ms -> 6.35ms - this is 2 orders of magnitude!). It is still around two times slower than the BLAS accelerated numpy.dot routine (the non-cached result is around 3.3ms). But our method just benchmarks the dot product, an operation that has been optimised incredibly well in numerical libraries. Of course, we **do not** want you to use this code for dot products and you should rely on functions provided by numpy (whenever reasonably possible). The above code was just given as an example how to produce much more efficient code with very small effort. In many scenarios (convolution is an example) the code is more complex than a single dot product and some looping is necessary anyway, especially when dealing with multi-dimensional tensors where atomic operations using direct loop-based indexing may be much easier to comprehend (and debug) than a direct multi-dimensional manipulation of numpy tensors."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#generate bit bigger matrices, to better evaluate timings\n",
"\n",
"# note, we explicitly cast them to float32 as it was the type cython\n",
"# functions were compiled with, float32 is more than sufficient precision\n",
"# for neural networks\n",
"x = rng.uniform(-1, 1, (10, 1000)).astype(numpy.float32)\n",
"W = rng.uniform(-0.3, 0.2, (1000, 100)).astype(numpy.float32)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print 'native pyton my_dot timings:'\n",
"%timeit -n10 my_dot_mat_mat(x, W)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print 'cython my_dot timings:'\n",
"%timeit -n10 my_dot_mat_mat_cython(x, W)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print 'optimised cython my_dot timings:'\n",
"%timeit -n10 my_dot_mat_mat_cython_optimised(x, W)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print 'numpy.dot timings (with BLAS):'\n",
"%timeit -n10 numpy.dot(x, W)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 0
}


@ -1,362 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Please don't edit this cell!**\n",
"\n",
"# Marks and Feedback\n",
"\n",
"**Total Marks:** XX/100\n",
"\n",
"**Overall comments:**\n",
"\n",
"\n",
"## Part 1. Investigations into Neural Networks (35 marks)\n",
"\n",
"* **Task 1**: *Experiments with learning rate schedules* - XX/5\n",
" * learning rate schedulers implemented\n",
" * experiments carried out\n",
" * further comments\n",
"\n",
"\n",
"* **Task 2**: *Experiments with regularisation* - XX/5\n",
" * L1 experiments\n",
" * L2 experiments\n",
" * dropout experiments\n",
" * annealed dropout implmented\n",
" * further experiments carried out\n",
" * further comments\n",
" \n",
"\n",
"* **Task 3**: *Experiments with pretraining* - XX/15\n",
" * autoencoder pretraining implemented\n",
" * denoising autoencoder pretraining implemented\n",
" * CE layer-by-layer pretraining implemented\n",
" * experiments\n",
" * further comments\n",
"\n",
"\n",
"* **Task 4**: *Experiments with data augmentation* - XX/5\n",
" * training data augmneted using noise, rotation, ...\n",
" * any further augmnetations\n",
" * experiments \n",
" * further comments\n",
"\n",
"\n",
"* **Task 5**: *State of the art* - XX/5\n",
" * motivation for systems constructed\n",
" * experiments\n",
" * accuracy of best system\n",
" * further comments\n",
"\n",
"\n",
"\n",
"## Part 2. Convolutional Neural Networks (55 marks)\n",
"\n",
"* **Task 6**: *Implement convolutional layer* - XX/20\n",
" * linear conv layer\n",
" * sigmoid conv layer\n",
" * relu conv layer\n",
" * any checks for correctness\n",
" * loop-based or vectorised implementations\n",
" * timing comparisons\n",
"\n",
"\n",
"* **Task 7**: *Implement maxpooling layer* - XX/10\n",
" * implementation of non-overlapping pooling\n",
" * generic implementation\n",
" * any checks for correctness\n",
"\n",
"\n",
"* **Task 8**: *Experiments with convolutional networks* - XX/25\n",
" * 1 conv layer (1 fmap)\n",
" * 1 conv layer (5 fmaps)\n",
" * 2 conv layers\n",
" * further experiments\n",
"\n",
"\n",
"\n",
"## Presentation (10 marks)\n",
"\n",
"* ** Marks:** XX/10\n",
" * Concise description of each system constructed\n",
" * Experiment design and motivations for different systems\n",
" * Presentation of results - graphs, tables, diagrams\n",
" * Conclusions\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Coursework #2\n",
"\n",
"## Introduction\n",
"\n",
"\n",
"## Previous Tutorials\n",
"\n",
"Before starting this coursework make sure that you have completed the following labs:\n",
"\n",
"* [04_Regularisation.ipynb](https://github.com/CSTR-Edinburgh/mlpractical/blob/master/04_Regularisation.ipynb) - regularising the model\n",
"* [05_Transfer_functions.ipynb](https://github.com/CSTR-Edinburgh/mlpractical/blob/master/05_Transfer_functions.ipynb) - building and training different activation functions\n",
"* [06_MLP_Coursework2_Introduction.ipynb](https://github.com/CSTR-Edinburgh/mlpractical/blob/master/06_MLP_Coursework2_Introduction.ipynb) - Notes on numpy and tensors\n",
"\n",
"\n",
"## Submission\n",
"**Submission Deadline: Thursday 14 January 2016, 16:00** \n",
"\n",
"Submit the coursework as an ipython notebook file, using the `submit` command in the terminal on a DICE machine. If your file is `06_MLP_Coursework1.ipynb` then you would enter:\n",
"\n",
"`submit mlp 2 06_MLP_Coursework1.ipynb` \n",
"\n",
"where `mlp 2` indicates this is the second coursework of MLP.\n",
"\n",
"After submitting, you should receive an email of acknowledgment from the system confirming that your submission has been received successfully. Keep the email as evidence of your coursework submission.\n",
"\n",
"**Please make sure you submit a single `ipynb` file (and nothing else)!**\n",
"\n",
"**Submission Deadline: Thursday 14 January 2016, 16:00** \n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Getting Started\n",
"Please enter your student number and the date in the next code cell."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#MLP Coursework 2\n",
"#Student number: <ENTER STUDENT NUMBER>\n",
"#Date: <ENTER DATE>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Part 1. Investigations into Neural Networks (35 marks)\n",
"\n",
"In this part you are may choose exactly what you implement. However, you are expected to express your motivations, observations, and findings in a clear and cohesive way. Try to make it clear why you decided to do certain things. Use graphs and/or tables of results to show trends and other characteristics you think are important. \n",
"\n",
"For example, in Task 1 you could experiment with different schedulers in order to compare their convergence properties. In Task 2 you could look into (and visualise) what happens to weights when applying L1 and/or L2 regularisation when training. For instance, you could create sorted histograms of weight magnitudes in in each layer, etc..\n",
"\n",
"**Before submission, please collapse all the log entries into smaller boxes (by clicking on the bar on the left hand side)**\n",
"\n",
"### Task 1 - Experiments with learning rate schedules (5 marks)\n",
"\n",
"Investigate the effect of learning rate schedules on training and accuracy. Implement at least one additional learning rate scheduler mentioned in the lectures. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#load the corresponding code here, and also attach scripts that run the experiments ()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Task 2 - Experiments with regularisers (5 marks)\n",
"\n",
"Investigate the effect of different regularisation approaches (L1, L2, dropout). Implement the annealing dropout scheduler (mentioned in lecture 5). Do some further investigations and experiments with model structures (and regularisers) of your choice. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Task 3 - Experiments with pretraining (15 marks)\n",
"\n",
"Implement pretraining of multi-layer networks with autoencoders, denoising autoencoders, and using layer-by-layer cross-entropy training. \n",
"\n",
"Implementation tip: You could add the corresponding methods to `optimiser`, namely, `pretrain()` and `pretrain_epoch()`, for autoencoders. Simiilarly, `pretrain_discriminative()` and `pretrain_epoch_discriminative()` for cross-entropy layer-by-layer pretraining. Of course, you can modify any other necessary pieces, but include all the modified fragments below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Task 4 - Experiments with data augmentation (5 marks)\n",
"\n",
"Using the standard MNIST training data, generate some augmented training examples (for example, using noise or rotation). Perform experiments on using this expanded training data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Task 5 - State of the art (5 marks)\n",
"\n",
"Using any techniques you have learnt so far (combining any number of them), build and train the best model you can (no other constraints)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"# Part 2. Convolutional Neural Networks (55 marks)\n",
"\n",
"In this part of the coursework, you are required to implement deep convolutional networks. This includes code for forward prop, back prop, and weight updates for convolutional and max-pooling layers, and should support the stacking of convolutional + pooling layers. You should implement all the parts relating to the convolutional layer in the mlp/conv.py module; if you decide to implement some routines in cython, keep them in mlp/conv.pyx). Attach both files in this notebook.\n",
"\n",
"Implementation tips: Look at [lecture 7](http://www.inf.ed.ac.uk/teaching/courses/mlp/2015/mlp07-cnn.pdf) and [lecture 8](http://www.inf.ed.ac.uk/teaching/courses/mlp/2015/mlp08-cnn2.pdf), and the introductory tutorial, [06_MLP_Coursework2_Introduction.ipynb](https://github.com/CSTR-Edinburgh/mlpractical/blob/master/06_MLP_Coursework2_Introduction.ipynb)\n",
"\n",
"### Task 6 - Implement convolutional layer (20 marks)\n",
"\n",
"Implement linear convolutional layer, and then extend to sigmoid and ReLU transfer functions (do it in a similar way to fully-connected layers). Include all relevant code. It is recommended that you first implement in the naive way with nested loops (python and/or cython); optionally you may then implement in a vectorised way in numpy. Include logs for each way you implement the convolutional layer, as timings for different implementations are of interest. Include all relevant code."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"### Task 7 - Implement max-pooling layer (10 marks)\n",
"\n",
"Implement a max-pooling layer. Non-overlapping pooling (which was assumed in the lecture presentation) is required. You may also implement a more generic solution with striding as well. Include all relevant code."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Task 8 - Experiments with convolutional networks (25 marks)\n",
"\n",
"Construct convolutional networks with a softmax output layer and a single fully connected hidden layer. Your first experiments should use one convolutional+pooling layer. As a default use convolutional kernels of dimension 5x5 (stride 1) and pooling regions of 2x2 (stride 2, hence non-overlapping).\n",
"\n",
"* Implement and test a convolutional network with 1 feature map\n",
"* Implement and test a convolutional network with 5 feature maps\n",
"\n",
"Explore convolutional networks with two convolutional layers, by implementing, training, and evaluating a network with two convolutional+maxpooling layers with 5 feature maps in the first convolutional layer, and 10 feature maps in the second convolutional layer.\n",
"\n",
"Carry out further experiments to optimise the convolutional network architecture (you could explore kernel sizes and strides, number of feature maps, sizes and strides of pooling operator, etc. - it is up to you)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"**This is the end of coursework 2.**\n",
"\n",
"Please remember to save your notebook, and submit your notebook following the instructions at the top. Please make sure that you have executed all the code cells when you submit the notebook.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 0
}