{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Coursework #1\n",
"\n",
"## Introduction\n",
"\n",
"This coursework is concerned with building multi-layer networks to address the MNIST digit classification problem. It builds on the previous labs, in particular [02_MNIST_SLN.ipynb](02_MNIST_SLN.ipynb) in which single layer networks were trained for MNIST digit classification. The course will involve extending that code to use Sigmoid and Softmax layers, combining these into multi-layer networks, and carrying out a number of MNIST digit classification experiments, to investigate the effect of learning rate, the number of hidden units, and the number of hidden layers.\n",
"\n",
"The coursework is divided into 4 tasks:\n",
"* **Task 1**: *Implementing a sigmoid layer* - 15 marks. \n",
"This task involves extending the `Linear` class in file `mlp/layers.py` to `Sigmoid`, with code for forward prop, backprop computation of the gradient, and weight update.\n",
"* **Task 2**: *Implementing a softmax layer* - 15 marks. \n",
"This task involves extending the `Linear` class in file `mlp/layers.py` to `Softmax`, with code for forward prop, backprop computation of the gradient, and weight update.\n",
"* **Task 3**: *Constructing a multi-layer network* - 40 marks. \n",
"This task involves putting together a Sigmoid and a Softmax layer to create a multi-layer network, with one hidden layer (100 units) and one output layer, that is trained to classify MNIST digits. This task will include reporting classification results, exploring the effect of learning rates, and plotting Hinton Diagrams for the hidden units and output units.\n",
"* **Task 4**: *Experiments with different architectures* - 30 marks. \n",
"This task involves further MNIST classification experiments, primarily looking at the effect of using different numbers of hidden layers.\n",
"The coursework will be marked out of 100, and will contribute 30% of the total mark in the MLP course.\n",
"\n",
"## Previous Tutorials\n",
"\n",
"Before starting this coursework make sure that you have completed the first three labs:\n",
"\n",
"* [00_Introduction.ipynb](00_Introduction.ipynb) - setting up your environment; *Solutions*: [00_Introduction_solution.ipynb](00_Introduction_solution.ipynb)\n",
"* [01_Linear_Models.ipynb](01_Linear_Models.ipynb) - training single layer networks; *Solutions*: [01_Linear_Models_solution.ipynb](01_Linear_Models_solution.ipynb)\n",
"* [02_MNIST_SLN.ipynb](02_MNIST_SLN.ipynb) - training a single layer network for MNIST digit classification\n",
"\n",
"To ensure that your virtual environment is correct, please see [this note](https://github.com/CSTR-Edinburgh/mlpractical/blob/master/kernel_issue_fix.md) on the GitHub.\n",
"## Submission\n",
"**Submission Deadline: Thursday 29 October, 16:00** \n",
"\n",
"Submit the coursework as an ipython notebook file, using the `submit` command in the terminal on a DICE machine. If your file is `03_MLP_Coursework1.ipynb` then you would enter:\n",
"\n",
"`submit mlp 1 03_MLP_Coursework1.ipynb` \n",
"\n",
"where `mlp 1` indicates this is the first coursework of MLP.\n",
"\n",
"After submitting, you should receive an email of acknowledgment from the system confirming that your submission has been received successfully. Keep the email as evidence of your coursework submission.\n",
"\n",
"**Please make sure you submit a single `ipynb` file (and nothing else)!**\n",
"\n",
"**Submission Deadline: Thursday 29 October, 16:00** \n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Getting Started\n",
"Please enter your exam number and the date in the next code cell."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#MLP Coursework 1\n",
"#Exam number: <ENTER EXAM NUMBER>\n",
"#Date: <ENTER DATE>\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Please run the next code cell, which imports `numpy` and seeds the random number generator. Please **do not** modify the random number generator seed!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy\n",
"\n",
"#Seed a random number generator running the below cell, but do **not** modify the seed.\n",
"rng = numpy.random.RandomState([2015,10,10])\n",
"rng_state = rng.get_state()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Task 1 - Sigmoid Layer (15%)\n",
"\n",
"In this task you need to create a class `Sigmoid` which encapsulates a layer of sigmoid units. You should do this by extending the `mlp.layers.Linear` class (in file `mlp/layers.py`), which implements a a layer of linear units (i.e. weighted sum plus bias). The `Sigmoid` class extends this by applying the sigmoid transfer function to the weighted sum in the forward propagation, and applying the derivative of the sigmoid in the gradient descent back propagation and computing the gradients with respect to layer's parameters. Do **not** copy the implementation provided in `Linear` class but rather, **reuse** it through inheritance.\n",
"\n",
"When you have implemented `Sigmoid` (in the `mlp.layers` module), then please test it by running the below code cell.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from mlp.layers import Sigmoid\n",
"\n",
"a = numpy.asarray([-20.1, 52.4, 0, 0.05, 0.05, 49])\n",
"b = numpy.asarray([-20.1, 52.4, 0, 0.05, 0.05, 49, 20, 20])\n",
"\n",
"rng.set_state(rng_state)\n",
"sigm = Sigmoid(idim=a.shape[0], odim=b.shape[0], rng=rng)\n",
"\n",
"fp = sigm.fprop(a)\n",
"deltas, ograds = sigm.bprop(h=fp, igrads=b)\n",
"\n",
"print fp.sum()\n",
"print deltas.sum()\n",
"print ograds.sum()\n",
"%precision 3\n",
"print fp\n",
"print deltas\n",
"print ograds\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***\n",
"To include the `Sigmoid` code in the notebook please run the below code cell. (The `%load` notebook command is used to load the source of the `Sigmoid` class from `mlp/layers.py`.)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%load -s Sigmoid mlp/layers.py"
]
},
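{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the next code cell contains a *minimal sketch* of one way `Sigmoid` could reuse `Linear` through inheritance. It is not a model solution: it assumes that `Linear.fprop` returns the affine pre-activation $Wx + b$ and that `Linear.bprop` back-propagates given deltas through the weight matrix. Check these assumptions against the actual `Linear` interface in your copy of `mlp/layers.py`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# A minimal sketch, NOT a model solution. Assumes Linear.fprop returns the\n",
"# affine pre-activation and Linear.bprop back-propagates deltas through W.\n",
"import numpy\n",
"from mlp.layers import Linear\n",
"\n",
"class Sigmoid(Linear):\n",
"    def fprop(self, inputs):\n",
"        # reuse the inherited affine transform, then apply the logistic function\n",
"        a = super(Sigmoid, self).fprop(inputs)\n",
"        return 1.0 / (1.0 + numpy.exp(-a))\n",
"\n",
"    def bprop(self, h, igrads):\n",
"        # derivative of the logistic in terms of the activations: h * (1 - h)\n",
"        deltas = igrads * h * (1.0 - h)\n",
"        # let Linear propagate the deltas to the layer below\n",
"        _, ograds = super(Sigmoid, self).bprop(h=None, igrads=deltas)\n",
"        return deltas, ograds"
]
},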
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Task 2 - Softmax (15%)\n",
"\n",
"In this task you need to create a class `Softmax` which encapsulates a layer of softmax units. As in the previous task, you should do this by extending the `mlp.layers.Linear` class (in file `mlp/layers.py`).\n",
"\n",
"When you have implemented `Softmax` (in the `mlp.layers` module), then please test it by running the below code cell.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from mlp.layers import Softmax\n",
"\n",
"a = numpy.asarray([-20.1, 52.4, 0, 0.05, 0.05, 49])\n",
"b = numpy.asarray([0, 0, 0, 0, 0, 0, 0, 1])\n",
"\n",
"rng.set_state(rng_state)\n",
"softmax = Softmax(idim=a.shape[0], odim=b.shape[0], rng=rng)\n",
"\n",
"fp = softmax.fprop(a)\n",
"deltas, ograds = softmax.bprop_cost(h=None, igrads=fp-b, cost=None)\n",
"\n",
"print fp.sum()\n",
"print deltas.sum()\n",
"print ograds.sum()\n",
"%precision 3\n",
"print fp\n",
"print deltas\n",
"print ograds\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***\n",
"To include the `Softmax` code in the notebook please run the below code cell. (The notebook `%load` command is used to load the source of the `Softmax` class from `mlp/layers.py`.)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"%load -s Softmax mlp/layers.py"
]
},
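{
"cell_type": "markdown",
"metadata": {},
"source": [
"Again for reference, a minimal sketch of `Softmax` under the same assumptions about the `Linear` interface. Note that for the usual softmax + cross-entropy pairing, the deltas arriving at `bprop_cost` are already $y - t$ (as in the test cell above, which passes `igrads=fp-b`), so no further elementwise derivative is needed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# A minimal sketch, NOT a model solution. Same Linear assumptions as above.\n",
"import numpy\n",
"from mlp.layers import Linear\n",
"\n",
"class Softmax(Linear):\n",
"    def fprop(self, inputs):\n",
"        a = super(Softmax, self).fprop(inputs)\n",
"        # subtract the max for numerical stability; softmax is shift-invariant\n",
"        e = numpy.exp(a - numpy.max(a, axis=-1, keepdims=True))\n",
"        return e / numpy.sum(e, axis=-1, keepdims=True)\n",
"\n",
"    def bprop_cost(self, h, igrads, cost):\n",
"        # for cross-entropy with softmax outputs, igrads is already y - t,\n",
"        # so the deltas pass straight through to the affine part\n",
"        deltas = igrads\n",
"        _, ograds = super(Softmax, self).bprop(h=None, igrads=deltas)\n",
"        return deltas, ograds"
]
},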
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Task 3 - Multi-layer network for MNIST classification (40%)\n",
"\n",
"**(a)** (20%) Building on the single layer linear network for MNIST classification used in lab [02_MNIST_SLN.ipynb](02_MNIST_SLN.ipynb), and using the `Sigmoid` and `Softmax` classes that you implemented in tasks 1 and 2, construct and learn a model that classifies MNIST images and:\n",
" * Has one hidden layer with a sigmoid transfer function and 100 units\n",
" * Uses a softmax output layer to discriminate between the 10 digit classes (use the `mlp.costs.CECost()` cost)\n",
"\n",
"Your code should print the final values of the error function and the classification accuracy for train, validation, and test sets (please keep also the log information printed by default by the optimiser). Limit the number of training epochs to 30. You can, of course, split your code across as many cells as you think is necessary."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# include here the complete code that constructs the model, performs training,\n",
"# and prints the error and accuracy for train/valid/test"
]
},
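{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible outline is sketched below. The data provider, optimiser, and scheduler names (`MNISTDataProvider`, `SGDOptimiser`, `LearningRateFixed`) and their arguments are assumed to match those used in the earlier lab notebooks; if your version of the `mlp` package differs, adjust the sketch accordingly."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# A sketch only: class names and arguments are assumed to match the earlier\n",
"# labs (MNISTDataProvider, MLP, SGDOptimiser, LearningRateFixed) -- adjust\n",
"# them if your version of the mlp package differs.\n",
"from mlp.layers import MLP, Sigmoid, Softmax\n",
"from mlp.costs import CECost\n",
"from mlp.dataset import MNISTDataProvider\n",
"from mlp.optimisers import SGDOptimiser\n",
"from mlp.schedulers import LearningRateFixed\n",
"\n",
"rng.set_state(rng_state)\n",
"\n",
"# data providers for the three MNIST splits\n",
"train_dp = MNISTDataProvider(dset='train', batch_size=100, randomize=True)\n",
"valid_dp = MNISTDataProvider(dset='valid', batch_size=100, randomize=False)\n",
"test_dp = MNISTDataProvider(dset='eval', batch_size=100, randomize=False)\n",
"\n",
"# 784 -> 100 sigmoid hidden layer, then a 100 -> 10 softmax output layer\n",
"model = MLP(cost=CECost())\n",
"model.add_layer(Sigmoid(idim=784, odim=100, rng=rng))\n",
"model.add_layer(Softmax(idim=100, odim=10, rng=rng))\n",
"\n",
"# at most 30 training epochs, as the task requires\n",
"lr_scheduler = LearningRateFixed(learning_rate=0.5, max_epochs=30)\n",
"optimiser = SGDOptimiser(lr_scheduler=lr_scheduler)\n",
"optimiser.train(model, train_dp, valid_dp)\n",
"\n",
"# assumed to return a (cost, accuracy) pair for a given data set\n",
"cost, accuracy = optimiser.validate(model, test_dp)\n",
"print('Test set: error %.3f, accuracy %.2f%%' % (cost, accuracy * 100.0))"
]
},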
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**(b)** (10%) Investigate the impact of different learning rates $\\eta \\in \\{0.5, 0.2, 0.1, 0.05, 0.01, 0.005\\}$ on the convergence of the network training as well as the final accuracy:\n",
" * Plot (on a single graph) the error rate curves for each learning rate as a function of training epochs for training set\n",
" * Plot (on another single graph) the error rate curves as a function of training epochs for validation set\n",
" * Include a table of the corresponding error rates for test set\n",
"\n",
"The notebook command `%matplotlib inline` ensures that your graphs will be added to the notebook, rather than opened as additional windows."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
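{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch of one way to run the sweep is given below. It reuses the data providers and classes from the Task 3 (a) sketch, and assumes that `optimiser.train` returns per-epoch `(cost, accuracy)` statistics for the training and validation sets; adjust the unpacking to whatever your optimiser actually returns."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Sketch of the learning-rate sweep. Assumes optimiser.train returns\n",
"# per-epoch (cost, accuracy) statistics for the training and validation\n",
"# sets, and reuses the data providers from the Task 3 (a) sketch; the data\n",
"# providers may also need resetting between runs in your implementation.\n",
"import matplotlib.pyplot as plt\n",
"\n",
"learning_rates = [0.5, 0.2, 0.1, 0.05, 0.01, 0.005]\n",
"train_curves, valid_curves, test_errors = {}, {}, {}\n",
"\n",
"for lr in learning_rates:\n",
"    rng.set_state(rng_state)  # identical initialisation for every run\n",
"    model = MLP(cost=CECost())\n",
"    model.add_layer(Sigmoid(idim=784, odim=100, rng=rng))\n",
"    model.add_layer(Softmax(idim=100, odim=10, rng=rng))\n",
"    scheduler = LearningRateFixed(learning_rate=lr, max_epochs=30)\n",
"    optimiser = SGDOptimiser(lr_scheduler=scheduler)\n",
"    tr_stats, va_stats = optimiser.train(model, train_dp, valid_dp)\n",
"    # error rate = 1 - accuracy\n",
"    train_curves[lr] = [1.0 - acc for (_, acc) in tr_stats]\n",
"    valid_curves[lr] = [1.0 - acc for (_, acc) in va_stats]\n",
"    test_errors[lr] = 1.0 - optimiser.validate(model, test_dp)[1]\n",
"\n",
"for name, curves in [('training', train_curves), ('validation', valid_curves)]:\n",
"    plt.figure()\n",
"    for lr in learning_rates:\n",
"        plt.plot(curves[lr], label='eta = %g' % lr)\n",
"    plt.xlabel('epoch')\n",
"    plt.ylabel('error rate')\n",
"    plt.title(name + ' set')\n",
"    plt.legend()\n",
"\n",
"# raw numbers for the test-set table\n",
"for lr in learning_rates:\n",
"    print('eta = %g: test error rate %.4f' % (lr, test_errors[lr]))"
]
},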
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**(c)** (10%) Plot the following graphs:\n",
" * Display the 784-element weight vector of each of the 100 hidden units as 10x10 grid plot of 28x28 images, in order to visualise what features of the input they are encoding. To do this, take the weight vector of each hidden unit, reshape to 28x28, and plot using the `imshow` function).\n",
" * Plot a Hinton Diagram of the output layer weight matrix for digits 0 and 1"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
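{
"cell_type": "markdown",
"metadata": {},
"source": [
"The sketch below shows one way to produce both plots. It assumes the weights are stored as `model.layers[i].W` with shape `(idim, odim)`, i.e. one column per unit; check how the weights are actually laid out in your `Linear` class and transpose if necessary. The `hinton` helper follows the standard matplotlib Hinton-diagram recipe."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Sketch only: assumes weights are stored as model.layers[i].W with shape\n",
"# (idim, odim), one column per unit -- check your Linear implementation.\n",
"import numpy\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# 10x10 grid of the 100 hidden units' incoming weights as 28x28 images\n",
"W_hidden = model.layers[0].W  # assumed shape (784, 100)\n",
"fig, axes = plt.subplots(10, 10, figsize=(10, 10))\n",
"for i, ax in enumerate(axes.flat):\n",
"    ax.imshow(W_hidden[:, i].reshape(28, 28), cmap='gray')\n",
"    ax.set_xticks([])\n",
"    ax.set_yticks([])\n",
"\n",
"def hinton(matrix, max_weight=None):\n",
"    # standard matplotlib Hinton diagram: square area ~ |weight|,\n",
"    # white squares for positive weights, black for negative\n",
"    ax = plt.gca()\n",
"    if max_weight is None:\n",
"        max_weight = 2 ** numpy.ceil(numpy.log2(numpy.abs(matrix).max()))\n",
"    ax.patch.set_facecolor('gray')\n",
"    ax.set_aspect('equal', 'box')\n",
"    for (x, y), w in numpy.ndenumerate(matrix):\n",
"        colour = 'white' if w > 0 else 'black'\n",
"        size = numpy.sqrt(numpy.abs(w) / max_weight)\n",
"        ax.add_patch(plt.Rectangle([x - size / 2, y - size / 2], size, size,\n",
"                                   facecolor=colour, edgecolor=colour))\n",
"    ax.autoscale_view()\n",
"    ax.invert_yaxis()\n",
"\n",
"# Hinton diagram of the output-layer weights for digits 0 and 1\n",
"W_out = model.layers[1].W  # assumed shape (100, 10)\n",
"plt.figure(figsize=(12, 3))\n",
"hinton(W_out[:, [0, 1]])\n",
"plt.title('output-layer weights for digits 0 and 1')"
]
},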
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"## Task 4 - Experiments with 1-5 hidden layers (30%)\n",
"\n",
"In this task use the learning rate which resulted in the best accuracy in your experiments in Task 3 (b). Perform the following experiments:\n",
"\n",
" * Train a similar model to Task 3, with one hidden layer, but with 800 hidden units. \n",
" * Train 4 additional models with 2, 3, 4 and 5 hidden layers. Set the number of hidden units for each model, such that all the models have similar number of trainable weights ($\\pm$2%). For simplicity, for a given model, keep the number of units in each hidden layer the same.\n",
" * Plot value of the error function for training and validation sets as a function of training epochs for each model\n",
" * Plot the test set classification accuracy as a function of the number of hidden layers\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
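{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before training, it helps to work out the per-layer width for each depth. For a network with $L$ hidden layers of $H$ units each, the parameter count is $784H + H$ for the first layer (weights plus biases), $H^2 + H$ for each of the $L-1$ remaining hidden layers, and $10H + 10$ for the output layer. The sketch below searches numerically for the $H$ that brings each model closest to the 800-unit single-hidden-layer reference, which you can then check against the required 2% tolerance."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Sketch: choose the per-layer width H for each depth so that the total\n",
"# number of trainable parameters matches the 1x800 model as closely as\n",
"# possible (the task allows a 2% deviation either way).\n",
"def num_params(n_hidden_layers, h, n_in=784, n_out=10):\n",
"    # weights + biases: input->hidden, (L-1) hidden->hidden, hidden->output\n",
"    total = n_in * h + h\n",
"    total += (n_hidden_layers - 1) * (h * h + h)\n",
"    total += h * n_out + n_out\n",
"    return total\n",
"\n",
"target = num_params(1, 800)\n",
"print('target (1 hidden layer, 800 units): %d parameters' % target)\n",
"\n",
"for n_layers in range(2, 6):\n",
"    # brute-force search for the width closest to the target count\n",
"    best_h = min(range(1, 1201),\n",
"                 key=lambda h: abs(num_params(n_layers, h) - target))\n",
"    p = num_params(n_layers, best_h)\n",
"    print('%d hidden layers: %d units each -> %d params (%+.2f%%)'\n",
"          % (n_layers, best_h, p, 100.0 * (p - target) / target))"
]
},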
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"This is the end of coursework 1.\n",
"\n",
"Please remember to save your notebook, and submit your notebook following the instructions at the top. Please make sure that you have executed all the code cells when you submit the notebook.\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.9"
}
},
"nbformat": 4,
"nbformat_minor": 0
}