lab and coursework
parent 2b1700180e
commit 28ccdd9c8b
292
02_MNIST_SLN.ipynb
Normal file
@@ -0,0 +1,292 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Introduction\n",
|
||||
"\n",
|
||||
"This tutorial is an introduction to the coursework about multi-layer preceptron (MLP) models, or Deep Neural Networks (DNNs). Here, we will show how to build a single layer linear model (similar to the one from the previous lab) for MNIST classification using the provided code-base. \n",
|
||||
"\n",
|
||||
"The principal purpose of this introduction is to get you familiar with how to connect the provided blocks (and what operations each of them implements) to set up an experiment, including 1) build the model structure 2) optimise the model's parameters and 3) evaluate the model on test data. \n",
|
||||
"\n",
|
||||
"## For those affected by notebook kernel issues\n",
|
||||
"\n",
|
||||
"In case you are still having issues with running notebook kernels, have a look at [this note](https://github.com/CSTR-Edinburgh/mlpractical/blob/master/kernel_issue_fix.md) on the GitHub.\n",
|
||||
"\n",
|
||||
"## Virtual environments\n",
|
||||
"\n",
|
||||
"Before you proceed onwards, remember to activate your virtual environment:\n",
|
||||
" * If you were in last week's Tuesday or Wednesday group type `activate_mlp` or `source ~/mlpractical/venv/bin/activate`\n",
|
||||
" * If you were in the Monday group:\n",
|
||||
" + and if you have chosen the **comfy** way type: `workon mlpractical`\n",
|
||||
" + and if you have chosen the **generic** way, `source` your virutal environment using `source` and specyfing the path to the activate script (you need to localise it yourself, there were not any general recommendations w.r.t dir structure and people have installed it in different places, usually somewhere in the home directories. If you cannot easily find it by yourself, use something like: `find . -iname activate` ):\n",
|
||||
"\n",
|
||||
"## Syncing repos\n",
|
||||
"\n",
|
||||
"Look <a href=\"https://github.com/CSTR-Edinburgh/mlpractical/blob/master/gitFAQ.md\">here</a> for more details. But in short, we recommend to create a separate branch for the coursework, as follows:\n",
|
||||
"\n",
|
||||
"1. Enter the mlpractical directory `cd ~/mlpractical/repo-mlp`\n",
|
||||
"2. List the branches and check which is currently active by typing: `git checkout`\n",
|
||||
"3. Change the\n",
|
||||
" "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Multi Layer Models\n",
|
||||
"\n",
|
||||
"Today, we are going to build the models with an arbitrary number of hidden layers, please have a look at the below diagram and the corresponding computations (which have an *exact* matrix form as expected by numpy and row-wise orientation, $\\circ$ denotes an element-wise product). Below the diagram, we briefly describe how each comptation relates to the code we have provided.\n",
|
||||
"\n",
|
||||
"![Making Predictions](res/code_scheme.svg)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"1. Structuring the model\n",
|
||||
" * The model (for now) is allowed to have a sequence of layers, mapping inputs $\\mathbf{x}$ to outputs $\\mathbf{y}$. \n",
|
||||
" * This operation is implemented as a special type of a layer in `mlp.layers.MLP` class. It keeps a sequence of other layers (of various typyes like Linear, Sigmoid, Softmax, etc.) as well as the internal state of a model for a mini-batch, that is, the intermediate data produced in *forward* and *backward* passes.\n",
|
||||
"2. Forward computation\n",
|
||||
" * `mlp.layers.MLP` provides a `fprop()` method that iterates over defined layers propagates $\\mathbf{x}$ to $\\mathbf{y}$. \n",
|
||||
" * Each layer (look at `mlp.layers.Linear` attached below) also implements `fprop()` method, which performs an atomic, for the given layer, operation. Most often, for the $i$-th layer, we want to obtain a linear transform $\\mathbf a^i$ of the inputs, and apply some non-linear transfer function $f^i(\\mathbf a^i)$ to produce the output $\\mathbf h^i$. Note, in general each layer may implement different activation functions $f^i()$, however for now we will use only `sigmoid` and `softmax`\n",
|
||||
"3. Backward computation\n",
|
||||
" * Similarly, `mlp.layers.MLP` also implements `bprop()` function, to back-propagate the errors from the top to the bottom layer. This class also keeps the back-propagated stats ($\\delta$) to be used later when computing the gradients w.r.t the parameters.\n",
|
||||
" * This functionality is also re-implemented by particular layers (again, have a look at `bprop` function of `mlp.layers.Linear`). `bprop()` is suppsed to return both $\\delta$ (needed to update the parameters) but also back-progapate the gradient down to the inputs. Also note, that depending on whether the layer is the top or not (deals directly with the cost or not) some simplifications may apply (i.e. as with cross-entropy and softmax). That's why when implementing a new type of layer that may be used as an output layer one also need to specify the implementation of `bprop_cost()`.\n",
|
||||
"4. Learning the model\n",
|
||||
" * The actual evaluation of the cost as well as the *forward* and *backward* passes one may find `train_epoch()` method of `mlp.optimisers.SGDOptimiser`\n",
|
||||
" * This function also calls the `pgrads()` method on each layer, that given activations and deltas, is supposed to return the list of the gradients of the cost w.r.t the model parameters, i.e. $\\frac{\\partial{\\mathbf{E}}}{\\partial{\\mathbf{W^i}}}$ and $\\frac{\\partial{\\mathbf{E}}}{\\partial{\\mathbf{b}^i}}$ at the above diagram (look at an example implementation in `mlp.layers.Linear`)"
|
||||
]
|
||||
},
|
||||
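{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before looking at the provided `Linear` source below, the next cell gives a minimal, illustrative sketch of how a container of layers *could* chain the `fprop()` and `bprop()` calls described above. It is **not** the provided `mlp.layers.MLP` implementation (which additionally handles the cost, parameter updates, etc.); the `ToyContainer` name and its details are made up purely for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Illustrative sketch only -- NOT the provided mlp.layers.MLP implementation.\n",
"# It shows how fprop()/bprop() calls could be chained through a list of layers.\n",
"class ToyContainer(object):\n",
"\n",
"    def __init__(self):\n",
"        self.layers = []\n",
"        self.activations = []  # cached per-layer outputs, needed by bprop\n",
"\n",
"    def add_layer(self, layer):\n",
"        self.layers.append(layer)\n",
"\n",
"    def fprop(self, x):\n",
"        # propagate x forward through every layer, caching intermediate outputs\n",
"        self.activations = [x]\n",
"        for layer in self.layers:\n",
"            self.activations.append(layer.fprop(self.activations[-1]))\n",
"        return self.activations[-1]\n",
"\n",
"    def bprop(self, igrads):\n",
"        # walk the layers in reverse, handing each the gradient from above\n",
"        deltas_per_layer = []\n",
"        for i in reversed(range(len(self.layers))):\n",
"            deltas, igrads = self.layers[i].bprop(self.activations[i + 1], igrads)\n",
"            deltas_per_layer.insert(0, deltas)\n",
"        return deltas_per_layer\n"
]
},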
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# %load -s Linear mlp/layers.py\n",
|
||||
"class Linear(Layer):\n",
|
||||
"\n",
|
||||
" def __init__(self, idim, odim,\n",
|
||||
" rng=None,\n",
|
||||
" irange=0.1):\n",
|
||||
"\n",
|
||||
" super(Linear, self).__init__(rng=rng)\n",
|
||||
"\n",
|
||||
" self.idim = idim\n",
|
||||
" self.odim = odim\n",
|
||||
"\n",
|
||||
" self.W = self.rng.uniform(\n",
|
||||
" -irange, irange,\n",
|
||||
" (self.idim, self.odim))\n",
|
||||
"\n",
|
||||
" self.b = numpy.zeros((self.odim,), dtype=numpy.float32)\n",
|
||||
"\n",
|
||||
" def fprop(self, inputs):\n",
|
||||
" \"\"\"\n",
|
||||
" Implements a forward propagation through the i-th layer, that is\n",
|
||||
" some form of:\n",
|
||||
" a^i = xW^i + b^i\n",
|
||||
" h^i = f^i(a^i)\n",
|
||||
" with f^i, W^i, b^i denoting a non-linearity, weight matrix and\n",
|
||||
" biases of this (i-th) layer, respectively and x denoting inputs.\n",
|
||||
"\n",
|
||||
" :param inputs: matrix of features (x) or the output of the previous layer h^{i-1}\n",
|
||||
" :return: h^i, matrix of transformed by layer features\n",
|
||||
" \"\"\"\n",
|
||||
" a = numpy.dot(inputs, self.W) + self.b\n",
|
||||
" # here f() is an identity function, so just return a linear transformation\n",
|
||||
" return a\n",
|
||||
"\n",
|
||||
" def bprop(self, h, igrads):\n",
|
||||
" \"\"\"\n",
|
||||
" Implements a backward propagation through the layer, that is, given\n",
|
||||
" h^i denotes the output of the layer and x^i the input, we compute:\n",
|
||||
" dh^i/dx^i which by chain rule is dh^i/da^i da^i/dx^i\n",
|
||||
" x^i could be either features (x) or the output of the lower layer h^{i-1}\n",
|
||||
" :param h: it's an activation produced in forward pass\n",
|
||||
" :param igrads, error signal (or gradient) flowing to the layer, note,\n",
|
||||
" this in general case does not corresponds to 'deltas' used to update\n",
|
||||
" the layer's parameters, to get deltas ones need to multiply it with\n",
|
||||
" the dh^i/da^i derivative\n",
|
||||
" :return: a tuple (deltas, ograds) where:\n",
|
||||
" deltas = igrads * dh^i/da^i\n",
|
||||
" ograds = deltas \\times da^i/dx^i\n",
|
||||
" \"\"\"\n",
|
||||
"\n",
|
||||
" # since df^i/da^i = 1 (f is assumed identity function),\n",
|
||||
" # deltas are in fact the same as igrads\n",
|
||||
" ograds = numpy.dot(igrads, self.W.T)\n",
|
||||
" return igrads, ograds\n",
|
||||
"\n",
|
||||
" def bprop_cost(self, h, igrads, cost):\n",
|
||||
" \"\"\"\n",
|
||||
" Implements a backward propagation in case the layer directly\n",
|
||||
" deals with the optimised cost (i.e. the top layer)\n",
|
||||
" By default, method should implement a bprop for default cost, that is\n",
|
||||
" the one that is natural to the layer's output, i.e.:\n",
|
||||
" here we implement linear -> mse scenario\n",
|
||||
" :param h: it's an activation produced in forward pass\n",
|
||||
" :param igrads, error signal (or gradient) flowing to the layer, note,\n",
|
||||
" this in general case does not corresponds to 'deltas' used to update\n",
|
||||
" the layer's parameters, to get deltas ones need to multiply it with\n",
|
||||
" the dh^i/da^i derivative\n",
|
||||
" :param cost, mlp.costs.Cost instance defining the used cost\n",
|
||||
" :return: a tuple (deltas, ograds) where:\n",
|
||||
" deltas = igrads * dh^i/da^i\n",
|
||||
" ograds = deltas \\times da^i/dx^i\n",
|
||||
" \"\"\"\n",
|
||||
"\n",
|
||||
" if cost is None or cost.get_name() == 'mse':\n",
|
||||
" # for linear layer and mean square error cost,\n",
|
||||
" # cost back-prop is the same as standard back-prop\n",
|
||||
" return self.bprop(h, igrads)\n",
|
||||
" else:\n",
|
||||
" raise NotImplementedError('Linear.bprop_cost method not implemented '\n",
|
||||
" 'for the %s cost' % cost.get_name())\n",
|
||||
"\n",
|
||||
" def pgrads(self, inputs, deltas):\n",
|
||||
" \"\"\"\n",
|
||||
" Return gradients w.r.t parameters\n",
|
||||
"\n",
|
||||
" :param inputs, input to the i-th layer\n",
|
||||
" :param deltas, deltas computed in bprop stage up to -ith layer\n",
|
||||
" :return list of grads w.r.t parameters dE/dW and dE/db in *exactly*\n",
|
||||
" the same order as the params are returned by get_params()\n",
|
||||
"\n",
|
||||
" Note: deltas here contain the whole chain rule leading\n",
|
||||
" from the cost up to the the i-th layer, i.e.\n",
|
||||
" dE/dy^L dy^L/da^L da^L/dh^{L-1} dh^{L-1}/da^{L-1} ... dh^{i}/da^{i}\n",
|
||||
" and here we are just asking about\n",
|
||||
" 1) da^i/dW^i and 2) da^i/db^i\n",
|
||||
" since W and b are only layer's parameters\n",
|
||||
" \"\"\"\n",
|
||||
"\n",
|
||||
" grad_W = numpy.dot(inputs.T, deltas)\n",
|
||||
" grad_b = numpy.sum(deltas, axis=0)\n",
|
||||
"\n",
|
||||
" return [grad_W, grad_b]\n",
|
||||
"\n",
|
||||
" def get_params(self):\n",
|
||||
" return [self.W, self.b]\n",
|
||||
"\n",
|
||||
" def set_params(self, params):\n",
|
||||
" #we do not make checks here, but the order on the list\n",
|
||||
" #is assumed to be exactly the same as get_params() returns\n",
|
||||
" self.W = params[0]\n",
|
||||
" self.b = params[1]\n",
|
||||
"\n",
|
||||
" def get_name(self):\n",
|
||||
" return 'linear'\n"
|
||||
]
|
||||
},
|
||||
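{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check of the `Linear` class listed above, the cell below builds a tiny layer and pushes a random mini-batch through `fprop()`, `bprop()` and `pgrads()`. The dimensions, batch size and all-ones error signal are arbitrary choices made just for this illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Quick, illustrative sanity check of the Linear layer listed above.\n",
"# The dimensions, batch size and fake error signal are arbitrary.\n",
"import numpy\n",
"from mlp.layers import Linear\n",
"\n",
"rng = numpy.random.RandomState([2015, 10, 10])\n",
"layer = Linear(idim=4, odim=3, rng=rng)\n",
"\n",
"x = rng.uniform(-1., 1., (5, 4))   # a mini-batch of 5 four-dimensional inputs\n",
"h = layer.fprop(x)                 # forward pass, shape (5, 3)\n",
"\n",
"igrads = numpy.ones_like(h)        # pretend error signal coming from above\n",
"deltas, ograds = layer.bprop(h, igrads)\n",
"grad_W, grad_b = layer.pgrads(x, deltas)\n",
"\n",
"print h.shape, ograds.shape        # expect: (5, 3) (5, 4)\n",
"print grad_W.shape, grad_b.shape   # expect: (4, 3) (3,)\n"
]
},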
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Example 1: Experiment with linear models and MNIST\n",
|
||||
"\n",
|
||||
"The below snippet demonstrates how to use the code we have provided for the coursework 1. Get familiar with it, as from now on we will use till the end of the course, including the 2nd coursework.\n",
|
||||
"\n",
|
||||
"It should be straightforward to extend the following code to more complex models, like stack more layers, change the cost, the optimiser, learning rate schedules, etc.. But **ask** in case something is not clear.\n",
|
||||
"\n",
|
||||
"In this particular example, we use the following components:\n",
|
||||
" * One layer mapping data-points ($\\mathbf x$) straight to 10 digits classes represented as 10 (linear) outputs ($\\mathbf y$). This operation is implemented as a linear layer in `mlp.layers.Linear`. Get familiar with this class (read the comments, etc.) as it is going to be a building block for the coursework.\n",
|
||||
" * One can stack as many different layers as required through the container `mlp.layers.MLP`\n",
|
||||
" * As an objective here we use the Mean Square Error cost defined in `mlp.costs.MSECost`\n",
|
||||
" * Our *Stochastic Gradient Descent* optimiser can be found in `mlp.optimisers.SGDOptimiser`. Its parent `mlp.optimisers.Optimiser` implements validation functionality (and an interface in case one need to implement a different optimiser)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy\n",
|
||||
"import logging\n",
|
||||
"\n",
|
||||
"logger = logging.getLogger()\n",
|
||||
"logger.setLevel(logging.INFO)\n",
|
||||
"\n",
|
||||
"from mlp.layers import MLP, Linear #import required layer types\n",
|
||||
"from mlp.optimisers import SGDOptimiser #import the optimiser\n",
|
||||
"from mlp.dataset import MNISTDataProvider #import data provider\n",
|
||||
"from mlp.costs import MSECost #import the cost we want to use for optimisation\n",
|
||||
"from mlp.schedulers import LearningRateFixed\n",
|
||||
"\n",
|
||||
"rng = numpy.random.RandomState([2015,10,10])\n",
|
||||
"\n",
|
||||
"# define the model structure, here just one linear layer\n",
|
||||
"# and mean square error cost\n",
|
||||
"cost = MSECost()\n",
|
||||
"model = MLP(cost=cost)\n",
|
||||
"model.add_layer(Linear(idim=784, odim=10, rng=rng))\n",
|
||||
"#one can stack more layers here\n",
|
||||
"\n",
|
||||
"# define the optimiser, here stochasitc gradient descent\n",
|
||||
"# with fixed learning rate and max_epochs as stopping criterion\n",
|
||||
"lr_scheduler = LearningRateFixed(learning_rate=0.01, max_epochs=20)\n",
|
||||
"optimiser = SGDOptimiser(lr_scheduler=lr_scheduler)\n",
|
||||
"\n",
|
||||
"logger.info('Initialising data providers...')\n",
|
||||
"train_dp = MNISTDataProvider(dset='train', batch_size=100, max_num_batches=-10, randomize=True)\n",
|
||||
"valid_dp = MNISTDataProvider(dset='valid', batch_size=100, max_num_batches=-10, randomize=False)\n",
|
||||
"\n",
|
||||
"logger.info('Training started...')\n",
|
||||
"optimiser.train(model, train_dp, valid_dp)\n",
|
||||
"\n",
|
||||
"logger.info('Testing the model on test set:')\n",
|
||||
"test_dp = MNISTDataProvider(dset='eval', batch_size=100, max_num_batches=-10, randomize=False)\n",
|
||||
"cost, accuracy = optimiser.validate(model, test_dp)\n",
|
||||
"logger.info('MNIST test set accuracy is %.2f %% (cost is %.3f)'%(accuracy*100., cost))\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Exercise\n",
|
||||
"\n",
|
||||
"Modify the above code by adding an intemediate linear layer of size 200 hidden units between input and output layers."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 2",
|
||||
"language": "python",
|
||||
"name": "python2"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 2
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython2",
|
||||
"version": "2.7.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
}
|
328
03_MLP_Coursework1.ipynb
Normal file
@@ -0,0 +1,328 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Coursework #1\n",
|
||||
"\n",
|
||||
"## Introduction\n",
|
||||
"\n",
|
||||
"This coursework is concerned with building multi-layer networks to address the MNIST digit classification problem. It builds on the previous labs, in particular [02_MNIST_SLN.ipynb](02_MNIST_SLN.ipynb) in which single layer networks were trained for MNIST digit classification. The course will involve extending that code to use Sigmoid and Softmax layers, combining these into multi-layer networks, and carrying out a number of MNIST digit classification experiments, to investigate the effect of learning rate, the number of hidden units, and the number of hidden layers.\n",
|
||||
"\n",
|
||||
"The coursework is divided into 4 tasks:\n",
|
||||
"* **Task 1**: *Implementing a sigmoid layer* - 15 marks. \n",
|
||||
"This task involves extending the `Linear` class in file `mlp/layers.py` to `Sigmoid`, with code for forward prop, backprop computation of the gradient, and weight update.\n",
|
||||
"* **Task 2**: *Implementing a softmax layer* - 15 marks. \n",
|
||||
"This task involves extending the `Linear` class in file `mlp/layers.py` to `Softmax`, with code for forward prop, backprop computation of the gradient, and weight update.\n",
|
||||
"* **Task 3**: *Constructing a multi-layer network* - 40 marks. \n",
|
||||
"This task involves putting together a Sigmoid and a Softmax layer to create a multi-layer network, with one hidden layer (100 units) and one output layer, that is trained to classify MNIST digits. This task will include reporting classification results, exploring the effect of learning rates, and plotting Hinton Diagrams for the hidden units and output units.\n",
|
||||
"* **Task 4**: *Experiments with different architectures* - 30 marks. \n",
|
||||
"This task involves further MNIST classification experiments, primarily looking at the effect of using different numbers of hidden layers.\n",
|
||||
"The coursework will be marked out of 100, and will contribute 30% of the total mark in the MLP course.\n",
|
||||
"\n",
|
||||
"## Previous Tutorials\n",
|
||||
"\n",
|
||||
"Before starting this coursework make sure that you have completed the first three labs:\n",
|
||||
"\n",
|
||||
"* [00_Introduction.ipynb](00_Introduction.ipynb) - setting up your environment; *Solutions*: [00_Introduction_solution.ipynb](00_Introduction_solution.ipynb)\n",
|
||||
"* [01_Linear_models.ipynb](01_Linear_models.ipynb) - training single layer networks; *Solutions*: [01_Linear_models_solution.ipynb](01_Linear_models_solution.ipynb)\n",
|
||||
"* [02_MNIST_SLN.ipynb](02_MNIST_SLN.ipynb) - training a single layer network for MNIST digit classification\n",
|
||||
"\n",
|
||||
"To ensure that your virtual environment is correct, please see [this note](https://github.com/CSTR-Edinburgh/mlpractical/blob/master/kernel_issue_fix.md) on the GitHub.\n",
|
||||
"## Submission\n",
|
||||
"**Submission Deadline: Thursday 29 October, 16:00** \n",
|
||||
"\n",
|
||||
"Submit the coursework as an ipython notebook file, using the `submit` command in the terminal on a DICE machine. If your file is `03_MLP_Coursework1.ipynb` then you would enter:\n",
|
||||
"\n",
|
||||
"`submit mlp 1 03_MLP_Coursework1.ipynb` \n",
|
||||
"\n",
|
||||
"where `mlp 1` indicates this is the first coursework of MLP.\n",
|
||||
"\n",
|
||||
"After submitting, you should receive an email of acknowledgment from the system confirming that your submission has been received successfully. Keep the email as evidence of your coursework submission.\n",
|
||||
"\n",
|
||||
"**Please make sure you submit a single `ipynb` file (and nothing else)!**\n",
|
||||
"\n",
|
||||
"**Submission Deadline: Thursday 29 October, 16:00** \n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Getting Started\n",
|
||||
"Please enter your exam number and the date in the next code cell."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#MLP Coursework 1\n",
|
||||
"#Exam number: <ENTER EXAM NUMBER>\n",
|
||||
"#Date: <ENTER DATE>\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Please run the next code cell, which imports `numpy` and seeds the random number generator. Please **do not** modify the random number generator seed!"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy\n",
|
||||
"\n",
|
||||
"#Seed a random number generator running the below cell, but do **not** modify the seed.\n",
|
||||
"rng = numpy.random.RandomState([2015,10,10])\n",
|
||||
"rng_state = rng.get_state()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Task 1 - Sigmoid Layer (15%)\n",
|
||||
"\n",
|
||||
"In this task you need to create a class `Sigmoid` which encapsulates a layer of `sigmoid` units. You should do this by extending the `mlp.layers.Linear` class (in file `mlp/layers.py`), which implements a a layer of linear units (i.e. weighted sum plus bias). The `Sigmoid` class extends this by applying the `sigmoid` transfer function to the weighted sum in the forward propagation, and applying the derivative of the `sigmoid` in the gradient descent back propagation and computing the gradients with respect to layer's parameters. Do **not** copy the implementation provided in `Linear` class but rather, **reuse** it through inheritance.\n",
|
||||
"\n",
|
||||
"When you have implemented `Sigmoid` (in the `mlp.layers` module), then please test it by running the below code cell.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from mlp.layers import Sigmoid\n",
|
||||
"\n",
|
||||
"a = numpy.asarray([-20.1, 52.4, 0, 0.05, 0.05, 49])\n",
|
||||
"b = numpy.asarray([-20.1, 52.4, 0, 0.05, 0.05, 49, 20, 20])\n",
|
||||
"\n",
|
||||
"rng.set_state(rng_state)\n",
|
||||
"sigm = Sigmoid(idim=a.shape[0], odim=b.shape[0], rng=rng)\n",
|
||||
"\n",
|
||||
"fp = sigm.fprop(a)\n",
|
||||
"deltas, ograds = sigm.bprop(h=fp, igrads=b)\n",
|
||||
"\n",
|
||||
"print fp.sum()\n",
|
||||
"print deltas.sum()\n",
|
||||
"print ograds.sum()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"***\n",
|
||||
"To include the `Sigmoid` code in the notebook please run the below code cell. (The `%load` notebook command is used to load the source of the `Sigmoid` class from `mlp/layers.py`.)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%load -s Sigmoid mlp/layers.py\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Task 2 - Softmax (15%)\n",
|
||||
"\n",
|
||||
"In this task you need to create a class `Softmax` which encapsulates a layer of `softmax` units. As in the previous task, you should do this by extending the `mlp.layers.Linear` class (in file `mlp/layers.py`).\n",
|
||||
"\n",
|
||||
"When you have implemented `Softmax` (in the `mlp.layers` module), then please test it by running the below code cell.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from mlp.layers import Softmax\n",
|
||||
"\n",
|
||||
"a = numpy.asarray([-20.1, 52.4, 0, 0.05, 0.05, 49])\n",
|
||||
"b = numpy.asarray([0, 0, 0, 0, 0, 0, 0, 1])\n",
|
||||
"\n",
|
||||
"rng.set_state(rng_state)\n",
|
||||
"softmax = Softmax(idim=a.shape[0], odim=b.shape[0], rng=rng)\n",
|
||||
"\n",
|
||||
"fp = softmax.fprop(a)\n",
|
||||
"deltas, ograds = softmax.bprop_cost(h=None, igrads=fp-b, cost=None)\n",
|
||||
"\n",
|
||||
"print fp.sum()\n",
|
||||
"print deltas.sum()\n",
|
||||
"print ograds.sum()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"***\n",
|
||||
"To include the `Softmax` code in the notebook please run the below code cell. (The notebook `%load` command is used to load the source of the `Softmax` class from `mlp/layers.py`.)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%load -s Softmax mlp/layers.py"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Task 3 - Multi-layer network for MNIST classification (40%)\n",
|
||||
"\n",
|
||||
"**(a)** (20%) Building on the single layer linear network for MNIST classification used in lab [02_MNIST_SLN.ipynb](02_MNIST_SLN.ipynb), and using the `Sigmoid` and `Softmax` classes that you implemented in tasks 1 and 2, construct and learn a model that classifies MNIST images and:\n",
|
||||
" * Has one hidden layer with a `sigmoid` transfer function and 100 units\n",
|
||||
" * Uses a `softmax` output layer to discriminate between the 10 digit classes (use the `mlp.costs.CECost()` cost)\n",
|
||||
"\n",
|
||||
"Your code should print the final values of the error function and the classification accuracy for train, validation, and test sets (please keep also the log information printed by default by the optimiser). Limit the number of training epochs to 30. You can, of course, split the solution at as many cells as you think is necessary."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# include here the complete code that constructs the model, performs training,\n",
|
||||
"# and prints the error and accuracy for train/valid/test"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**(b)** (10%) Investigate the impact of different learning rates $\\eta \\in \\{0.5, 0.2, 0.1, 0.05, 0.01, 0.005\\}$ on the convergence of the network training as well as the final accuracy:\n",
|
||||
" * Plot (on a single graph) the error rate curves for each learning rate as a function of training epochs for training set\n",
|
||||
" * Plot (on another single graph) the error rate curves as a function of training epochs for validation set\n",
|
||||
" * Include a table of the corresponding error rates for test set\n",
|
||||
"\n",
|
||||
"The notebook command `%matplotlib inline` ensures that your graphs will be added to the notebook, rather than opened as additional windows."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%matplotlib inline"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**(c)** (10%) Plot the following graphs:\n",
|
||||
" * Display the 784-element weight vector of each of the 100 hidden units as 10x10 grid plot of 28x28 images, in order to visualise what features of the input they are encoding. To do this, take the weight vector of each hidden unit, reshape to 28x28, and plot using the `imshow` function).\n",
|
||||
" * Plot a Hinton Diagram of the output layer weight matrix for digits 0 and 1"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%matplotlib inline"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"source": [
|
||||
"## Task 4 - Experiments with 1-5 hidden layers (30%)\n",
|
||||
"\n",
|
||||
"In this task use the learning rate which resulted in the best accuracy in your experiments in Task 3 (b). Perform the following experiments:\n",
|
||||
"\n",
|
||||
" * Train a similar model to Task 3, with one hidden layer, but with 800 hidden units. \n",
|
||||
" * Train 4 additional models with 2, 3, 4 and 5 hidden layers. Set the number of hidden units for each model, such that all the models have similar number of trainable weights ($\\pm$2%). For simplicity, for a given model, keep the number of units in each hidden layer the same.\n",
|
||||
" * Plot value of the error function for training and validation sets as a function of training epochs for each model\n",
|
||||
" * Plot the test set classification accuracy as a function of the number of hidden layers\n"
|
||||
]
|
||||
},
|
||||
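{
"cell_type": "markdown",
"metadata": {},
"source": [
"The next cell is an optional, illustrative helper (not part of the required submission): it counts the trainable parameters (weights plus biases) of a fully-connected architecture, which may help when matching the $\\pm$2% constraint above. The function name `count_params` and the example layer-size lists are made up for this sketch."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Optional helper sketch (not required by the coursework): count the trainable\n",
"# parameters (weights + biases) of a fully-connected architecture.\n",
"def count_params(layer_dims):\n",
"    # layer_dims is a list like [784, 800, 10]: inputs, hidden units..., outputs\n",
"    total = 0\n",
"    for n_in, n_out in zip(layer_dims[:-1], layer_dims[1:]):\n",
"        total += n_in * n_out + n_out   # weight matrix plus bias vector\n",
"    return total\n",
"\n",
"reference = count_params([784, 800, 10])   # the 1-hidden-layer, 800-unit model\n",
"print reference\n",
"# For deeper models, pick the hidden width h such that, e.g.,\n",
"# count_params([784, h, h, 10]) stays within 2% of the reference value.\n"
]
},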
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"source": [
|
||||
"This is the end of coursework 1.\n",
|
||||
"\n",
|
||||
"Please remember to save your notebook, and submit your notebook following the instructions at the top. Please make sure that you have executed all the code cells when you submit the notebook.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 2",
|
||||
"language": "python",
|
||||
"name": "python2"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 2
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython2",
|
||||
"version": "2.7.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
}
|