Merge pull request #40 from matt-graham/mlp2016-7/master

MLP 2016-7 updates
This commit is contained in:
Matt Graham 2016-09-21 02:55:42 +01:00 committed by GitHub
commit c844ff2027
38 changed files with 1898 additions and 1809 deletions

View File

@ -1,400 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction\n",
"\n",
"This notebook shows how to set-up a working python envirnoment for the Machine Learning Practical course.\n",
"\n",
"\n",
"# Setting up the software\n",
"\n",
"Within this course we are going to work with python (using some auxiliary libraries like numpy and scipy). Depending on the infrastracture and working environment (e.g. DICE), root permission may not be not available so the packages cannot be installed in default locations. A convenient python configuration, which allows us to install and update third party libraries easily using package manager, are so called virtual environments. They can be also used to work (and test) the code with different versions of software.\n",
"\n",
"## Instructions for Windows\n",
"\n",
"The fastest way to get working setup on Windows is to install Anaconda (http://www.continuum.io) package. It's a python environment with precompiled versions of the most popular scientific python libraries. It also works on MacOS, but numpy is not linked without a fee to a numerical library, hence for MacOS we recommend the following procedure.\n",
"\n",
"## Instructions for MacOS\n",
"\n",
" * Install macports following instructions at https://www.macports.org/install.php\n",
" * Install the relevant python packages in macports\n",
"\n",
" ```\n",
" sudo port install py27-scipy +openblas\n",
" sudo port install py27-ipython +notebook\n",
" sudo port install py27-notebook\n",
" sudo port install py27-matplotlib\n",
" sudo port select --set python python27\n",
" sudo port select --set ipython2 py27-ipython\n",
" sudo port select --set ipython py27-ipython\n",
" ```\n",
"\n",
"Make sure that your `$PATH` has `/opt/local/bin` before `/usr/bin` so you pick up the version of python you just installed.\n",
"\n",
"## Instructions for DICE:\n",
"\n",
"### Directory structure and getting things organised\n",
"\n",
"To get things somehow standarized between people, and make life of everyone easier, we propse to organise your DICE setup in the following directory structure:\n",
"\n",
" * `~/mlpractical/` -- for a general course repository\n",
" * `~/mlpractical/repos-3rd` -- for stuff you download, build and install (numpy, OpenBlas, virtualenv)\n",
" * `~/mlpractical/repo-mlp` -- this is the actual course repository you clone from our website (do not create a dir for it yet!)\n",
" * `~/mlpractical/venv` -- this is where the virtual environment will make its directory (do not create a dir for it yet!)\n",
"\n",
"Create now repos-3rd directory (option -p in the below command will automatically create (non-existing) **p**arent directories (mlpractical):\n",
"\n",
" * `mkdir -p ~/mlpractical/repos-3rd`\n",
"\n",
"And now, let us set an MLP_WDIR environmental variable (MLP Working DIRectory) that will keep an absolute path of working dir pointing to `~/mlpractial`, **add the below line** to your `~/.bashrc` file (if it does not exists, create one using a text editor! e.g. by running `gedit ~/.bashrc`):\n",
"\n",
"```\n",
"export MLP_WDIR=~/mlpractical\n",
"```\n",
"\n",
"Now re-source `~/.bashrc` by typing (so the env variables get updated!): `source ~/.bashrc`\n",
"\n",
"Enter the `repos-3rd` directory by typing: `cd ~/mlpractical/repos-3rd` (or ```cd $MLP_WDIR/repos-3rd``` if you want)\n",
"\n",
"### Configuring virtual environment\n",
"\n",
"Make sure you are in `repos-3rd` directory and that MLP_WDIR variable has been exported (you may type export in the terminal and examine the list of availabe variables in the current session), then type:\n",
"\n",
" * `git clone https://github.com/pypa/virtualenv`\n",
" * Enter the cloned repository and type ```./virtualenv.py --python /usr/bin/python2.7 --no-site-packages $MLP_WDIR/venv```\n",
" * Activate the environment by typing `source ~/mlpractical/venv/bin/activate` (to leave the virtual environment one may type `deactivate`)\n",
" * Environments need to be activated every time one starts a new session, so we will now create a handy alias for this in the `~/.bashrc` script by adding the below line (note, the MLP_WDIR export needs to precede it):\n",
" \n",
" ```alias activate_mlp=\"source $MLP_WDIR/venv/bin/activate\"```\n",
" \n",
"Then every time you open new session and want to activate the right virtual environment, simply type `activate_mlp` instead `source ~/mlpractical/venv/bin/activate`. Note, you need to re-soure the .bashrc in order alias to be visible in the current session.\n",
"\n",
"### Installing remaining packages\n",
"\n",
"Then, before you follow next, install/upgrade the following packages:\n",
"\n",
"```\n",
"pip install --upgrade pip\n",
"pip install setuptools\n",
"pip install setuptools --upgrade\n",
"pip install ipython\n",
"pip install notebook\n",
"```\n",
"\n",
"### Installing numpy\n",
"\n",
"Note, having virtual environment properly installed one may then run `pip install numpy` to use pip to install numpy, though this will most likely lead to the suboptimal configuration where numpy is linked to ATLAS numerical library, which on DICE is compiled in multi-threaded mode. This means whenever numpy use BLAS accelerated computations (using ATLAS), it will use **all** the available cores at the given machine. This happens because ATLAS can be compiled to either run computations in single *or* multi threaded modes. However, contrary to some other backends, the latter does not allow to use an arbitrary number of threads (specified by the user prior to computation). This is highly suboptimal, as the potential speed-up resulting from paralleism depends on many factors like the communication overhead between threads, the size of the problem, etc. Using all cores for our exercises is not-necessary.\n",
"\n",
"For which reason, we are going to compile our own version of BLAS package, called *OpenBlas*. It allows to specify the number of threads manually by setting an environmental variable OMP_NUM_THREADS=N, where N is a desired number of parallel threads (please use 1 by default). You can set an environment variable in the current shell by running\n",
"\n",
"```\n",
"export OMP_NUM_THREADS=1\n",
"```\n",
"\n",
"(note the lack of spaces around the equals sign and use of `export` to define an environment variable which will be available in sub-shells rather than just a variable local to the current shell).\n",
"\n",
"#### OpenBlas\n",
"\n",
"Enter again repos-3rd directory and copy into terminal the following commands (one at the time):\n",
"\n",
"```\n",
"cd ~/mlpractical/repos-3rd\n",
"OBDir=$MLP_WDIR/repos-3rd/OpenBLAS\n",
"git clone git://github.com/xianyi/OpenBLAS\n",
"cd OpenBLAS\n",
"make\n",
"make PREFIX=$OBDir install\n",
"```\n",
"\n",
"Once OpenBLAS is finished compiling we need to ensure the compiled shared library files in the `lib` subdirectory are available to the shared library loader. This can be done by appending the absolute path to the `lib` subdirectory to the `LD_LIBRARY_PATH` environment variable. To ensure this changes persist we will change the bash start up file `~/.bashrc` by opening it in a text editor (e.g. by running `gedit ~/.bashrc`) and adding the following line\n",
"\n",
"```\n",
"export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$MLP_WDIR/repos-3rd/OpenBLAS/lib\n",
"```\n",
"\n",
"Note, we again are using MLP_WDIR here, so the above line needs to be placed after you set MLP_WDIR.\n",
"\n",
"After you have edited `.bashrc` run\n",
"\n",
"```\n",
"source ~/.bashrc\n",
"activate_mlp # This is the alias you set up in the bashrc\n",
"#source ~/mlpractical/venv/bin/activate\n",
"```\n",
"\n",
"to rerun the bash start up script make sure the new environment variable is available in the current shell and then reactivate the virtual environment.\n",
"\n",
"#### Numpy\n",
"\n",
"To install `numpy` linked against the OpenBLAS libraries we just compiled, first run the following commands (one at a time)\n",
"\n",
"```\n",
"cd ~/mlpractical/repos-3rd/\n",
"wget http://downloads.sourceforge.net/project/numpy/NumPy/1.9.2/numpy-1.9.2.zip\n",
"unzip numpy-1.9.2.zip\n",
"cd numpy-1.9.2\n",
"echo \"[openblas]\" >> site.cfg\n",
"echo \"library_dirs = $OBDir/lib\" >> site.cfg\n",
"echo \"include_dirs = $OBDir/include\" >> site.cfg\n",
"python setup.py build --fcompiler=gnu95\n",
"```\n",
"\n",
"Assuming the virtual environment is activated, the below command will install numpy in a desired space (`~/mlpractical/venv/...`):\n",
"\n",
"```\n",
"python setup.py install\n",
"```\n",
"\n",
"Now use pip to install remaining packages: `scipy`, `matplotlib`, `argparse`, and `nose` by executing:\n",
"\n",
"```\n",
"pip install scipy matplotlib argparse nose\n",
"```\n",
"\n",
"### Getting the mlpractical repository\n",
"\n",
"Clone the course repository from the github, by navigating to `~/mlpractical` directory and typing:\n",
"\n",
"```\n",
"cd $MLP_WDIR\n",
"git clone https://github.com/CSTR-Edinburgh/mlpractical.git repo-mlp\n",
"```\n",
"\n",
"When download is ready, enter the repo-mlp directory and start the actual interactive notebook session by typing:\n",
"\n",
"```\n",
"cd repo-mlp\n",
"ipython notebook\n",
"```\n",
"\n",
"This should start a ipython server which opens a new browser window listing files in `repo-mlp` directory, including `00_Introduction.ipynb.`. Open it and run (from the browser interface) the following examples and exercies."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [],
"source": [
"%clear\n",
"import numpy\n",
"# show_config() prints the configuration of numpy numerical backend \n",
"# you should be able to see linkage to OpenBlas or some other library\n",
"# in case those are empty, it means something went wrong and \n",
"# numpy will use a default (slow) pythonic implementation for algebra\n",
"numpy.show_config()\n",
"#numpy.test()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Also, below we check whether and how much speedup one may expect by using different number of cores:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import os\n",
"import multiprocessing\n",
"import timeit\n",
"\n",
"num_cores = multiprocessing.cpu_count()\n",
"N = 1000\n",
"x = numpy.random.random((N,N))\n",
"\n",
"for i in xrange(0, num_cores):\n",
" # first, set the number of threads OpenBLAS\n",
" # should use, the below line is equivalent\n",
" # to typing export OMP_NUM_THREADS=i+1 in bash shell\n",
" print 'Running matrix-matrix product on %i core(s)' % i\n",
" os.environ['OMP_NUM_THREADS'] = str(i+1)\n",
" %%timeit numpy.dot(x,x.T)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Test whether you can plot and display the figures using pyplot"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Remove the below line if not running this code in an ipython notebook\n",
"# It's a special command allowing the notebook to display plots inline\n",
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"\n",
"x = numpy.linspace(0.0, 2*numpy.pi, 100)\n",
"y1 = numpy.sin(x)\n",
"y2 = numpy.cos(x)\n",
"\n",
"plt.plot(x, y1, lw=2, label=r'$\\sin(x)$')\n",
"plt.plot(x, y2, lw=2, label=r'$\\cos(x)$')\n",
"plt.xlabel('x')\n",
"plt.ylabel('y')\n",
"plt.legend()\n",
"plt.xlim(0.0, 2*numpy.pi)\n",
"plt.grid()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercises\n",
"\n",
"Today exercises are meant to get you familiar with ipython notebooks (if you haven't used them so far), data organisation and how to access it. Next week onwars, we will follow with the material covered in lectures.\n",
"\n",
"## Data providers\n",
"\n",
"Open (in the browser) `mlp.dataset` module (go to `Home` tab and navigate to mlp package, then click on the link `dataset.py`). Have a look thourgh the code and comments, then follow to exercises.\n",
"\n",
"<b>General note:</b> you can load the mlp code into your favourite python IDE but it is totally OK if you work (modify & save) the code directly in the browser by opening/modyfing the necessary modules in the tabs.\n",
"\n",
"### Exercise 1 \n",
"\n",
"Using MNISTDataProvider, write a code that iterates over the first 5 minibatches of size 100 data-points. Print MNIST digits in 10x10 images grid plot. Images are returned from the provider as tuples of numpy arrays `(features, targets)`. The `features` matrix has shape BxD while the `targets` vector is of size B, where B is the size of a mini-batch and D is dimensionality of the features. By deafult, each data-point (image) is stored in a 784 dimensional vector of pixel intensities normalised to [0,1] range from an inital integer values [0-255]. However, the original spatial domain is two dimensional, so before plotting you need to convert it into 2D matrix (MNIST images have the same number of pixels for height and width).\n",
"\n",
"Tip: Useful functions for this exercise are: imshow, subplot, gridspec"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import matplotlib.gridspec as gridspec\n",
"import matplotlib.cm as cm\n",
"from mlp.dataset import MNISTDataProvider\n",
"\n",
"def show_mnist_image(img):\n",
" fig = plt.figure()\n",
" gs = gridspec.GridSpec(1, 1)\n",
" ax1 = fig.add_subplot(gs[0,0])\n",
" ax1.imshow(img, cmap=cm.Greys_r)\n",
" plt.show()\n",
"\n",
"def show_mnist_images(batch):\n",
" raise NotImplementedError('Write me!')\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# An example for a single MNIST image\n",
"mnist_dp = MNISTDataProvider(dset='valid', batch_size=1, max_num_examples=2, randomize=False)\n",
"\n",
"for batch in mnist_dp:\n",
" features, targets = batch\n",
" show_mnist_image(features.reshape(28, 28))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exercise 2\n",
"\n",
"`MNISTDataProvider` as `targets` currently returns a vector of integers, each element in this vector represents an id of the category `features` data-point represent. Later in the course we are going to need 1-of-K representation of targets, for instance, given the minibatch of size 3 and the corresponding targets vector $[2, 2, 0]$ (and assuming there are only 3 different classes to discriminate between), one needs to convert it into matrix $\\left[ \\begin{array}{ccc}\n",
"0 & 0 & 1 \\\\\n",
"0 & 0 & 1 \\\\\n",
"1 & 0 & 0 \\end{array} \\right]$. \n",
"\n",
"Implement `__to_one_of_k` method of `MNISTDataProvider` class. Then modify (uncomment) an appropriate line in its `next` method, so the raw targets get converted to `1 of K` coding. Test the code in the cell below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#implement here Exercise 1"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"### Exercise 3\n",
"\n",
"Write your own data provider `MetOfficeDataProvider` that wraps the weather data for south Scotland (could be obtained from: http://www.metoffice.gov.uk/hadobs/hadukp/data/daily/HadSSP_daily_qc.txt). The file was also downloaded and stored in `data` directory for your convenience. The provider should return a tuple `(x,t)` of the estimates over an arbitrary time windows (i.e. last N-1 days) for `x` and the N-th day as the one which model should be able to predict, `t`. For now, skip missing data-points (denoted by -99.9) and simply use the next correct value. Make sure the provider works for arbitrary `batch_size` settings, including the case where single mini-batch is equal to all datapoints in the dataset. Test the dataset in the cell below.\n",
"\n",
"Tip: To follow with this exercise, copy MNISTDataProvider in dataset.py, rename it to `MetOfficeDataProvider` and reimplement necesarry parts (including the arguments you pass to the constructor)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.9"
}
},
"nbformat": 4,
"nbformat_minor": 0
}

View File

@ -1,79 +1,14 @@
# mlpractical
## Machine Learning Practical (INFR11119)
# Machine Learning Practical
**Note:** At this point, you can go straight to the 00_Introduction notebook, which contains more information.
This repository contains the code for the University of Edinburgh [School of Informatics](http://www.inf.ed.ac.uk) course [Machine Learning Practical](http://www.inf.ed.ac.uk/teaching/courses/mlp/).
To run the notebooks (and later the code you are going to write within this course)
you are expected to have installed the following packages:
<ul>
<li>python 2.7+</li>
<li>numpy (anything above 1.6, 1.9+ recommended, optimally compiled with some BLAS library [MKL, OpenBLAS, ATLAS, etc.])</li>
<li>scipy (optional, but may be useful to do some tests)</li>
<li>matplotlib (for plotting)</li>
<li>ipython (v3.0+, 4.0 recommended)</li>
<li>notebook (notebooks are in version 4.0)</li>
</ul>
You can install them straight away on your personal computer;
there is also a notebook tutorial (00_Introduction) on how to
do this (particularly on DICE), and on what configuration you
are expected to have installed. For now, it suffices if you
get the software working on your personal computer so you can
start an ipython notebook server and open the initial introductory
tutorial (which will be made publicly available next Monday).
### Installing the software on personal computers
#### On Windows:
Download and install the Anaconda package
(https://store.continuum.io/cshop/anaconda/)
#### On Mac (use macports):
<ul>
<li>Install macports following instructions at https://www.macports.org/install.php</li>
<li>Install the relevant python packages in macports
<ul>
<li> sudo port install py27-scipy +openblas </li>
<li> sudo port install py27-ipython +notebook </li>
<li> sudo port install py27-notebook </li>
<li> sudo port install py27-matplotlib </li>
<li> sudo port select --set python python27 </li>
<li> sudo port select --set ipython2 py27-ipython </li>
<li> sudo port select --set ipython py27-ipython </li>
</ul>
</ul>
Also, make sure that your $PATH has /opt/local/bin before /usr/bin
so you pick up the version of python you just installed.
#### On DICE (we will do this during the first lab)
### Getting the mlpractical repository
We assume ~/mlpractical is the target workspace you want to use during
this course (where ~ denotes your home path, e.g. /home/user1).
To start, open the terminal and clone the github mlpractical
repository to your local disk:
git clone https://github.com/CSTR-Edinburgh/mlpractical.git
(Note: you can do it from your git account if you have one, as the
above just clones the repo as an anonymous user, though it does not
matter at this point, as you are not required to submit pull requests, but you are **welcome** to do so if you think some aspects of the notebooks can be improved!)
Navigate to the checked-out directory (cd ~/mlpractical) and type:
ipython notebook
This should start an ipython notebook server and open the browser with a page
listing files/subdirs in the current directory.
To update the repository (for example, on Monday),
enter ~/mlpractical and type git pull.
This assignment-based course is focused on the implementation and evaluation of machine learning systems. Students who do this course will have experience in the design, implementation, training, and evaluation of machine learning systems.
The code in this repository is split into:
* a Python package `mlp`, a [NumPy](http://www.numpy.org/) based neural network package designed specifically for the course that students will implement parts of and extend during the course labs and assignments,
* a series of [Jupyter](http://jupyter.org/) notebooks in the `notebooks` directory containing explanatory material and coding exercises to be completed during the course labs.
## Getting set up
Detailed instructions for setting up a development environment for the course are given in [this file](environment-set-up.md). Students doing the course will spend part of the first lab getting their own environment set up.

301
environment-set-up.md Normal file
View File

@ -0,0 +1,301 @@
# Environment set up
In this course we will be using [Python 2.7](https://www.python.org/) for all the labs and coursework assignments. In particular we will be making heavy use of the numerical computing libraries [NumPy](http://www.numpy.org/) and [SciPy](http://www.scipy.org/), and the interactive notebook application [Jupyter](http://jupyter.org/).
A common headache in software projects is ensuring the correct versions of all dependencies are available on the current development system. Often you may be working on several distinct projects simultaneously each with its own potentially conflicting dependencies on external libraries. Additionally you may be working across multiple different machines (for example a personal laptop and University computers) with possibly different operating systems. Further, as is the case in Informatics on DICE, you may not have root-level access to a system you are working on and so not be able to install software at a system-wide level and system updates may cause library versions to be changed to incompatible versions.
One way of overcoming these issues is to use project-specific *virtual environments*. In this context a virtual environment is an isolated development environment where the external dependencies of a project can be installed and managed independent of the system-wide library versions (and those of the environments of other projects).
There are several virtual environment solutions available in the Python eco-system, including the native [pyvenv](https://docs.python.org/3/library/venv.html) in Python 3 and the popular [virtualenv](https://virtualenv.pypa.io/en/stable/). Also related is [pip](https://pip.pypa.io/en/stable/) a Python package manager natively included in Python 2.7.9 and above.
Here we will instead use the environment capabilities of the [Conda](http://conda.pydata.org/docs/) package management system. Unlike pip and virtualenv/pyvenv, Conda is not limited to managing Python packages but is a language and platform agnostic package manager. Both NumPy and SciPy have many non-Python external dependencies and their performance is very dependent on correctly linking to optimised linear algebra libraries.
Conda can handle installation of the Python libraries we will be using and all their external dependencies, in particular allowing easy installation of [optimised numerical computing libraries](https://docs.continuum.io/mkl-optimizations/). Further Conda can easily be installed on Linux, OSX and Windows systems meaning if you wish to set up an environment on a personal machine as well this should be easy to do whatever your operating system of choice is.
There are several options available for installing Conda on a system. Here we will use the Python 2.7 version of [Miniconda](http://conda.pydata.org/miniconda.html), which installs just Conda and its dependencies. An alternative is to install the [Anaconda Python distribution](https://docs.continuum.io/anaconda/), which installs Conda and a large selection of popular Python packages. As we will require only a small subset of these packages we will use the more barebones Miniconda to avoid eating into your DICE disk quota too much, however if installing on a personal machine you may wish to consider Anaconda if you want to explore other Python packages.
## Installing Miniconda
We provide instructions here for getting an environment with all the required dependencies running on computers running the School of Informatics [DICE desktop](http://computing.help.inf.ed.ac.uk/dice-platform). The same instructions should be usable on other Linux distributions such as Ubuntu and Linux Mint with minimal adjustments.
For those wishing to install on a personal Windows or OSX machine, the initial instructions for setting up Conda will differ slightly - you should instead select the relevant installer for your system from [here](http://conda.pydata.org/miniconda.html) and follow the corresponding installation instructions from [here](http://conda.pydata.org/docs/install/quick.html). After Conda is installed the [remaining instructions](#create-conda-env) should be the same across different systems.
---
Open a bash terminal (`Applications > Terminal` on DICE).
We first need to download the latest 64-bit Python 2.7 Miniconda install script:
```
wget https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh
```
This uses `wget`, a command-line tool for downloading files.
Now run the install script:
```
bash Miniconda2-latest-Linux-x86_64.sh
```
You will first be asked to review the software license agreement. Assuming you choose to agree, you will then be asked to choose an install location for Miniconda. The default is to install in the root of your home directory `~/miniconda2`. We recommend going with this default unless you have a particular reason to do otherwise.
You will then be asked whether to prepend the Miniconda binaries directory to the `PATH` system environment variable definition in `.bashrc`. As the DICE bash start-up mechanism differs from the standard set up ([details here](http://computing.help.inf.ed.ac.uk/dice-bash)), on DICE you should respond `no` here as we will set up the addition to `PATH` manually in the next step. On other Linux distributions you may choose to accept the default.
On DICE, append the Miniconda binaries directory to `PATH` manually in `~/.benv` using
```
echo "export PATH=\""\$PATH":$HOME/miniconda2/bin\"" >> ~/.benv
```
For those to whom this appears a bit opaque and who want to know what is going on, see here <sup id="a1">[1](#f1)</sup>.
We now need to `source` the updated `~/.benv` so that the `PATH` variable in the current terminal session is updated:
```
source .benv
```
Alternatively we could have just closed the current terminal and started a new one. All future terminal sessions should have the updated `PATH` loaded by default.
## <span id="create-conda-env">Creating the Conda environment</span>
You should now have a working Conda installation. If you run
```
conda --help
```
from a terminal you should see the Conda help page displayed. If you get a `No command 'conda' found` error you should check you have set up your `PATH` variable correctly (you can get a demonstrator to help you do this).
Assuming Conda is working, we will now create our Conda environment:
```
conda create -n mlp python=2.7
```
This bootstraps a new Conda environment named `mlp` with a minimal Python 2.7 install. You will be presented with a 'package plan' listing the packages to be installed and asked whether to proceed: type `y` then enter.
We will now *activate* our created environment:
```
source activate mlp
```
or on Windows only
```
activate mlp
```
When an environment is activated its name will be prepended to the prompt, which should now look something like `(mlp) [machine-name]:~$` on DICE.
**You need to run this `source activate mlp` command every time you wish to activate the `mlp` environment in a terminal (for example at the beginning of each lab)**. When the environment is activated, the environment will be searched first when running commands so that e.g. `python` will launch the Python interpreter installed locally in the `mlp` environment rather than a system-wide version.
If you wish to deactivate an environment loaded in the current terminal e.g. to launch the system Python interpreter, you can run `source deactivate` (just `deactivate` on Windows).
We will now install the dependencies for the course into the new environment:
```
conda install numpy scipy matplotlib jupyter
```
Again you will be given a list of the packages to be installed and asked to confirm whether to proceed. Enter `y` then wait for the packages to install (this should take around five minutes). In addition to Jupyter, NumPy and SciPy, which we have already mentioned, we are also installing [matplotlib](http://matplotlib.org/), a plotting and visualisation library.
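To make sure the packages installed correctly, a minimal check (run in a Python interpreter started with the `mlp` environment active) is to import them and print their versions:
```
# Quick sanity check: these imports should all succeed in the mlp environment,
# and the printed versions should match what conda just installed.
import numpy
import scipy
import matplotlib

print(numpy.__version__, scipy.__version__, matplotlib.__version__)
```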
Once the installation is finished, to recover some disk space we can clear the package tarballs Conda just downloaded:
```
conda clean -t
```
These tarballs are usually cached to allow quicker installation into additional environments; however, we will only be using a single environment here, so there is no need to keep them on disk.
## Getting the course code and a short introduction to Git
The next step in getting our environment set up will be to download the course code. This is available in a Git repository on Github:
https://github.com/CSTR-Edinburgh/mlpractical
[Git](https://git-scm.com/) is a distributed version control system and [Github](https://github.com) a popular site for hosting Git repositories. We will be using Git to distribute the code for all the labs and assignments. We will explain all the necessary `git` commands as we go, though those new to Git may find [this concise guide by Roger Dudler](http://rogerdudler.github.io/git-guide/) or [this slightly longer one from Atlassian](https://www.atlassian.com/git/tutorials/) useful.
Git is installed by default on DICE desktops. If you are running a system which does not have Git installed, you can use Conda to install it in your environment using:
```
conda install git
```
We will now go over the process of [cloning](https://www.atlassian.com/git/tutorials/setting-up-a-repository/git-clone) a local copy of the `mlpractical` repository.
---
**Confident Git users only:**
For those who have their own Github account (you can set up a free account easily [here](https://github.com/join)) and are confident Git users, you may wish to consider instead [creating a private fork](http://stackoverflow.com/a/30352360) of the `CSTR-Edinburgh/mlpractical` repository on Github. This is not required for the course; however, it will allow you to push your local commits to Github, making it easier, for example, to sync your work between DICE computers and a personal machine.
**Note we do not recommend creating a public fork using the default forking mechanism on Github as this will make any commits you push to the fork publicly available which creates a risk of plagiarism.**
If you are already familiar with Git you may wish to skip over the explanatory sections below.
---
If necessary, change the current directory in the terminal to the one you wish to clone the code to. By default we will assume you are cloning to your home directory (run `cd ~` to get to this directory) however if you have an existing system for organising your workspace feel free to keep to that. To clone the `mlpractical` code to the current directory run
```
git clone https://github.com/CSTR-Edinburgh/mlpractical.git
```
This will create a new `mlpractical` subdirectory with a local copy of the repository in it. Enter the directory and list all its contents, including hidden files, by running:
```
cd mlpractical
ls -a # Windows equivalent: dir /a
```
For the most part this will look much like any other directory, with there being the following three non-hidden sub-directories:
* `data`: Data files used in the labs and assignments.
* `mlp`: The custom Python package we will use in this course.
* `notebooks`: The Jupyter notebook files for each lab and coursework.
Additionally there exists a hidden `.git` subdirectory (on Unix systems, files and directories whose names begin with a period '.' are hidden by default). This directory contains the repository history database and various configuration files and references. Unless you are sure you know what you are doing you generally should not edit any of the files in this directory directly. Generally most configuration options can be enacted more safely using a `git config` command. For instance to globally set the user name and email used in commits run:
```
git config --global user.name "[your name]"
git config --global user.email "[matric-number]@sms.ed.ac.uk"
```
From the root `mlpractical` directory if you now run:
`git status`
a status message containing information about your local clone of the repository should be displayed.
Providing you have not made any changes yet, all that will be displayed is the name of the current *branch* (we will explain what a branch is to those new to Git in a little while), a message that the branch is up to date with the remote repository and that there is nothing to commit in the working directory.
The two key concepts you will need to know about Git for this course are *commits* and *branches*.
A *commit* in Git is a snapshot of the state of the project. The snapshots are recorded in the repository history and allow us to track changes to the code over time and rollback changes if necessary. In Git there is a three stage process to creating a new commit.
1. The relevant edits are made to files in the working directory and any new files created.
2. The files with changes to be committed (including any new files) are added to the *staging area* by running:
```
git add [file1] [file2] ...
```
3. Finally the *staged changes* are used to create a new commit by running
```
git commit -m "A commit message describing the changes."
```
This writes the staged changes as a new commit in the repository history. We can see a log of the details of previous commits by running:
```
git log
```
Although it is not a requirement of the course for you to make regular commits of your work, we strongly recommend you do as it is a good habit to get into.
The other key Git concept you will need to know about is *branches*. A branch in Git represents an independent line of development of a project. When a repository is first created it will contain a single branch, named `master` by default. Commits to this branch form a linear series of snapshots of the project.
A new branch is created from a commit on an existing branch. Any commits made to this new branch then evolve as an independent and parallel line of changes - that is commits to the new branch will not affect the old branch and vice versa.
A typical Git workflow in a software development setting would be to create a new branch whenever making changes to a project, for example to fix a bug or implement a new feature. These changes are then isolated from the main code base allowing regular commits without worrying about making unstable changes to the main code base. Key to this workflow is the ability to *merge* commits from a branch into another branch, e.g. when it is decided a new feature is sufficiently developed to be added to the main code base. Although merging branches is a key aspect of using Git in many projects, dealing with merge conflicts when two branches both make changes to the same parts of files can be a somewhat tricky process, so here we will generally try to avoid the need for merges.
We will therefore use branches here in a slightly non-standard way. The code for each week's lab and for each of the assignments will be maintained in a separate branch. This will allow us to stage the release of the notebooks and code for each lab and assignment while allowing you to commit the changes you make to the code each week without having to merge those changes when new code is released. Similarly this structure will allow us to release updated notebooks from previous labs with proposed solutions without overwriting your own work.
To list the branches present in the local repository, run:
```
git branch
```
This will display a list of branches with a * next to the current branch. To switch to a different existing branch in the local repository run
```
git checkout [branch-name]
```
This will change the code in the working directory to the current state of the checked out branch. Any files added to the staging area and committed will then create a new commit on this branch.
To checkout a new branch from the remote `CSTR-Edinburgh/mlpractical` repository on Github, you can use:
```
git checkout -b [local-name-for-branch] origin/[remote-branch]
```
This will create and check out a new local branch *tracking* the remote branch. We will use this command at the beginning of each lab to check out the code for that week and create a local branch for you to commit your work to.
## Installing the `mlp` Python package
We noted above the presence of an `mlp` subdirectory in your local repository. This contains the custom Python package implementing the NumPy-based neural network framework we will be using in this course.
In order to make the modules in this package available in your environment we need to install it. A [setuptools](https://setuptools.readthedocs.io/en/latest/) `setup.py` script is provided in the root of the `mlpractical` directory for this purpose.
The standard way to install a Python package using a `setup.py` script is to run `python setup.py install`. This creates a copy of the package in the `site-packages` directory of the currently active Python environment.
As we will be updating the code in the `mlp` package during the course of the labs, this would require you to re-run `python setup.py install` every time a change is made to the package. We therefore recommend you instead install the package in development mode by running:
```
python setup.py develop
```
Instead of copying the package, this will create a symbolic link to the copy in the local repository. This means any changes made will be immediately available without the need to reinstall the package.
Note that after a Python module has been loaded into an interpreter instance for the first time, using for example:
```
import mlp
```
running the `import` statement any further times will have no effect even if the underlying module code has been changed. To reload an already imported module we instead need to use the [`reload`](https://docs.python.org/2.7/library/functions.html#reload) function.
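For example, a minimal sketch of this behaviour in an interactive Python 2 session:
```
import mlp     # first import: the module code is executed and cached
# ... edit a file in the mlp package ...
import mlp     # no effect: the cached module object is reused
reload(mlp)    # re-executes the top-level module code so the changes become visible
```
Note that `reload` only re-executes the module it is given; submodules of `mlp` that were imported separately need to be reloaded individually.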
## Adding a data directory variable to the environment
We observed previously the presence of a `data` subdirectory in the local repository. This directory holds the data files that will be used in the course. To enable the data loaders in the `mlp` package to locate these data files we need to set a `MLP_DATA_DIR` environment variable pointing to this directory.
Assuming you used the recommended Miniconda install location and cloned the `mlpractical` repository to your home directory, this variable can be automatically defined when activating the environment by running the following commands (on non-Windows systems):
```
cd ~/miniconda2/envs/mlp
mkdir -p ./etc/conda/activate.d
mkdir -p ./etc/conda/deactivate.d
echo -e '#!/bin/sh\n' >> ./etc/conda/activate.d/env_vars.sh
echo "export MLP_DATA_DIR=$HOME/mlpractical/data" >> ./etc/conda/activate.d/env_vars.sh
echo -e '#!/bin/sh\n' >> ./etc/conda/deactivate.d/env_vars.sh
echo 'unset MLP_DATA_DIR' >> ./etc/conda/deactivate.d/env_vars.sh
```
And on Windows systems (replacing the `[]` placeholders with the relevant paths):
```
cd [path-to-conda-root]\envs\mlp
mkdir .\etc\conda\activate.d
mkdir .\etc\conda\deactivate.d
@echo "set MLP_DATA_DIR=[path-to-local-repository]\data" >> .\etc\conda\activate.d\env_vars.bat
@echo "set MLP_DATA_DIR=" >> .\etc\conda\deactivate.d\env_vars.bat
```
After running these commands, deactivate then reactivate your environment to define the variable in the current session.
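To check the variable is now visible to Python, a minimal sketch (run in a Python interpreter with the `mlp` environment active) is:
```
import os

# Should print the absolute path of your local mlpractical/data directory;
# a KeyError here means the activate.d script did not run or contains the wrong path.
print(os.environ['MLP_DATA_DIR'])
```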
## Loading the first lab notebook
Your environment is now all set up so you can move on to the introductory exercises in the first lab notebook.
One of the dependencies you installed in your environment earlier was Jupyter. Jupyter notebooks allow combining formatted text with runnable code cells and visualisation of the code output in an intuitive web application interface. Although originally specific to Python (under the previous moniker IPython notebooks), the notebook interface has now been abstracted, making it available to a wide range of languages.
There will be a Jupyter notebook available for each lab and assignment in this course, with a combination of explanatory sections for you to read through which will complement the material covered in lectures, as well as a series of practical coding exercises to be written and run in the notebook interface. The first lab notebook will cover some of the basics of the notebook interface.
To open a notebook, you first need to launch a Jupyter notebook server instance. From within the `mlpractical` directory containing your local copy of the repository (and with the `mlp` environment activated) run:
```
jupyter notebook
```
This will start a notebook server instance in the current terminal (with a series of status messages being streamed to the terminal output) and launch a browser window which will load the notebook application interface.
By default the notebook interface will show a list of the files in the directory the notebook server was launched from when first loaded. If you click on the `notebooks` directory in this file list, a list of files in this directory should then be displayed. Click the `00_Introduction.ipynb` entry to load the first notebook.
---
<b id="f1">[1]</b> The `echo` command causes the following text to be streamed to an output (standard terminal output by default). Here we use the append redirection operator `>>` to redirect the `echo` output to a file `~/.benv`, with it being appended to the end of the current file. The text actually added is `export PATH="$PATH:[your-home-directory]/miniconda/bin"` with the `\"` being used to escape the quote characters. The `export` command defines system-wide environment variables (more rigorously those inherited by child shells) with `PATH` being the environment variable defining where `bash` searches for executables as a colon-seperated list of directories. Here we add the Miniconda binary directory to the end of the current `PATH` definition. [](#a1)

View File

@ -1,63 +0,0 @@
## How do I get started?
Setting up your virtual environment and getting the initial MLP materials is explained in the first part of the first lab, in `00_Introduction.ipynb`
## How do I update to this week's coursework?
To avoid potential conflicts between the changes you have made since last week and our additions, we recommend stashing your changes and pulling the new code from the mlpractical repository by typing:
```
git stash save "Lab1 work"
git pull
```
Then, if you need to, you can always (temporarily) restore a desired state of the repository.
At any point you can use `git stash list` to print a list of all the current stashes. For example if you have one stash created as above the output would be:
```
stash@{0}: On master: Lab 1 work
```
The `stash@{0}` indicates the position of the stash in the stash stack (last in, first out so newer stashes will have lower indices) with 0 meaning this is the entry at the top of stack (and the only entry here). After that is an indication of which branch the stash was made from (here it was made from the default `master` branch), and finally the description you gave the stash when creating it.
To restore changes saved in a stash you have two options.
The easiest option is to create a new *branch* in your local repository from the stash. A *branch* can be thought of as a parallel working copy of the repository, which is 'branched' off from a particular commit version of another branch, often as here the main 'master' branch.
First you need to make sure any changes made to the current branch are either committed or stashed using `git stash save` as above. You can check if you have any unstaged changes since the last commit by running `git status` - if there are any they will be listed under a `Changes not staged for commit` section of the output.
Once all changes are committed / stashed, you can then create a new branch from your previous stash. First run `git stash list` as above and take note of the stash index `i` in the `stash@{i}` indicator corresponding to the stash you wish to restore.
Then run
```
git stash branch name_of_branch stash@{i}
```
where `name_of_branch` is some name to give the branch (e.g. `lab1`). This will take your stashed changes and apply them to the commit they were derived from in a new branch. If you run `git branch` you should now see something like
```
* lab1
master
```
where the asterisk indicates the currently active branch. Your current working copy should now be in an identical state to the point when you made the stash. You can now continue working from this branch and changes you make and commit will be on this branch alone.
If you later want to return to the master branch (for example to pull some changes from github at the start of a new lab) you can do this by running
```
git checkout master
```
Again you need to make sure there are not any uncommitted changes on your current branch before you do this. Similarly you can use `git checkout branch_name` to check out any branch.
The alternative to creating a separate branch to restore a stash to is to use `git stash apply` or `git stash pop` to merge the stashed changes in to your current branch working copy (apply keeps the stash after applying it while pop removes it from the stash stack). If the stashed changes involve updates to files you have also edited in your current working copy you will most probably end up with merge conflicts you will need to resolve. If you are comfortable doing this feel free to use this approach, however this is not something we will be able to help you with.
## If I find an error in something in the practical, can I push a change to correct it?
Yes, by making a fork and then a pull request. But equally you can make a comment on nb.mit.edu or send an email. It is probably not worth bothering with the git way unless you already understand what to do.
## What is a good tutorial on git?
I like [this concise one from Roger Dudler](http://rogerdudler.github.io/git-guide/) and a [slightly longer one from Atlassian](https://www.atlassian.com/git/tutorials/). There are many others!

View File

@ -1,47 +0,0 @@
# How to fix notebook's "kernel issues" on DICE
Some people in MLP have been affected by a recent update to the `numpy` and numerical
libraries on DICE on 3 October. The problem affects you if you get a message stating that the kernel was restarted when you run code involving `numpy`.
If you have experienced these issues you have either:
1. ended up using the default `atlas` libraries with `numpy` (which have been updated in the meantime)
2. or re-compiled `numpy` with the new DICE `OpenBLAS` that is available, but the `LD_LIBRARY_PATH` that you set during the first lab last week gave priority to loading the `OpenBLAS` libraries compiled last time - which could introduce some unexpected behaviour at runtime.
## The Fix
Follow the below steps **before** you activate the old virtual environment (or deactivate it if it is activated). The fix basically involves rebuilding the virtual environment, but the whole process is now much simpler due to the fact that `OpenBLAS` is now the default numerical library on DICE.
1. Comment out (or remove) the `export LD_LIBRARY_PATH=...` line in your ~/.bashrc file. Then type
```
unset LD_LIBRARY_PATH
```
in the terminal. To make sure this variable is not set, type `export` and check visually in the printed list of variables.
2. Go to `~/mlpractical/repos-3rd/virtualenv` and install the new virtual environment (`venv2`) by typing:
```
./virtualenv.py --python /usr/bin/python2.7 --no-site-packages $MLP_WDIR/venv2
```
3. Activate your new virtual environment by typing:
```
source $MLP_WDIR/venv2/bin/activate
```
and install the usual packages required by MLP using pip:
```
pip install pip --upgrade
pip install numpy
pip install ipython
pip install notebook
pip install matplotlib
```
4. Change directory to `~/mlpractical/repo-mlp` and check that `numpy` is linked to the DICE-standard `OpenBLAS` (and works) by starting ipython notebook:
```
ipython notebook
```
then run the first two interactive examples from `00_Introduction.ipynb`. If they run, you can simply modify the `activate_mlp` alias in `~/.bashrc` to point to `venv2` instead of `venv` (a quick check from the interpreter is sketched after this list).
5. You can also remove both the old `venv` and the other no-longer-required directories that contain the `numpy` and `OpenBLAS` sources in the `~/mlpractical/repos-3rd` directory.
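As the quick check mentioned in step 4, a minimal sketch you can also run directly in an interpreter (with `venv2` active) to confirm which numerical libraries `numpy` is linked against:
```
import numpy

# Prints the BLAS/LAPACK build configuration; the library directories listed
# should point at the system OpenBLAS, not at the old build under
# ~/mlpractical/repos-3rd/OpenBLAS.
numpy.show_config()
```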

View File

@ -0,0 +1,6 @@
# -*- coding: utf-8 -*-
"""Machine Learning Practical package."""
__authors__ = ['Pawel Swietojanski', 'Steve Renals', 'Matt Graham']
DEFAULT_SEED = 123456 # Default random number generator seed if none provided.

View File

@ -1,126 +0,0 @@
# Machine Learning Practical (INFR11119),
# Pawel Swietojanski, University of Edinburgh
import numpy
import logging
from mlp.layers import Layer
logger = logging.getLogger(__name__)
"""
You have been given a very initial skeleton below. Feel free to build on top of it and/or
modify it according to your needs. Just notice, you can factor the convolution code out of
the layer code, and just pass (possibly) different conv implementations for each of the stages
in the model where you are expected to apply the convolutional operator. This will allow you to
keep the layer implementation independent of the conv operator implementation, and you can easily
swap it later, for example, for a more efficient implementation if you come up with one, etc.
"""
def my1_conv2d(image, kernels, strides=(1, 1)):
"""
Implements a 2d valid convolution of kernels with the image
Note: filter means the same as kernel, and convolution (correlation) of those with the input space
produces feature maps (sometimes also referred to as receptive fields). Also note that
feature maps are synonyms here to channels, and as such num_inp_channels == num_inp_feat_maps
:param image: 4D tensor of sizes (batch_size, num_input_channels, img_shape_x, img_shape_y)
:param kernels: 4D tensor of kernels of size (num_inp_feat_maps, num_out_feat_maps, kernel_shape_x, kernel_shape_y)
:param strides: a tuple (stride_x, stride_y), specifying the shift of the kernels in x and y dimensions
:return: 4D tensor of size (batch_size, num_out_feature_maps, feature_map_shape_x, feature_map_shape_y)
"""
raise NotImplementedError('Write me!')
class ConvLinear(Layer):
def __init__(self,
num_inp_feat_maps,
num_out_feat_maps,
image_shape=(28, 28),
kernel_shape=(5, 5),
stride=(1, 1),
irange=0.2,
rng=None,
conv_fwd=my1_conv2d,
conv_bck=my1_conv2d,
conv_grad=my1_conv2d):
"""
:param num_inp_feat_maps: int, a number of input feature maps (channels)
:param num_out_feat_maps: int, a number of output feature maps (channels)
:param image_shape: tuple, a shape of the image
:param kernel_shape: tuple, a shape of the kernel
:param stride: tuple, shift of kernels in both dimensions
:param irange: float, initial range of the parameters
:param rng: RandomState object, random number generator
:param conv_fwd: handle to a convolution function used in fwd-prop
:param conv_bck: handle to a convolution function used in backward-prop
:param conv_grad: handle to a convolution function used in pgrads
:return:
"""
super(ConvLinear, self).__init__(rng=rng)
raise NotImplementedError()
def fprop(self, inputs):
raise NotImplementedError()
def bprop(self, h, igrads):
raise NotImplementedError()
def bprop_cost(self, h, igrads, cost):
raise NotImplementedError('ConvLinear.bprop_cost method not implemented')
def pgrads(self, inputs, deltas, l1_weight=0, l2_weight=0):
raise NotImplementedError()
def get_params(self):
raise NotImplementedError()
def set_params(self, params):
raise NotImplementedError()
def get_name(self):
return 'convlinear'
#you can derive here particular non-linear implementations:
#class ConvSigmoid(ConvLinear):
#...
class ConvMaxPool2D(Layer):
def __init__(self,
num_feat_maps,
conv_shape,
pool_shape=(2, 2),
pool_stride=(2, 2)):
"""
:param conv_shape: tuple, a shape of the lower convolutional feature maps output
:param pool_shape: tuple, a shape of pooling operator
:param pool_stride: tuple, a strides for pooling operator
:return:
"""
super(ConvMaxPool2D, self).__init__(rng=None)
raise NotImplementedError()
def fprop(self, inputs):
raise NotImplementedError()
def bprop(self, h, igrads):
raise NotImplementedError()
def get_params(self):
return []
def pgrads(self, inputs, deltas, **kwargs):
return []
def set_params(self, params):
pass
def get_name(self):
return 'convmaxpool2d'

View File

@ -1,64 +1,173 @@
# Machine Learning Practical (INFR11119),
# Pawel Swietojanski, University of Edinburgh
# -*- coding: utf-8 -*-
"""Model costs.
This module defines cost functions, with the aim of model training being to
minimise the cost function given a set of inputs and target outputs. The cost
functions typically measure some concept of distance between the model outputs
and target outputs.
"""
import numpy as np
import numpy
class MeanSquaredErrorCost(object):
"""Mean squared error cost."""
def __call__(self, outputs, targets):
"""Calculates cost function given a batch of outputs and targets.
class Cost(object):
"""
Defines an interface for the cost object
"""
def cost(self, y, t, **kwargs):
Args:
outputs: Array of model outputs of shape (batch_size, output_dim).
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Scalar cost function value.
"""
Implements a cost for monitoring purposes
:param y: matrix -- an output of the model
:param t: matrix -- an expected output the model should produce
:param kwargs: -- some optional parameters required by the cost
:return: the scalar value representing the cost given y and t
return 0.5 * np.mean(np.sum((outputs - targets)**2, axis=1))
def grad(self, outputs, targets):
"""Calculates gradient of cost function with respect to outputs.
Args:
outputs: Array of model outputs of shape (batch_size, output_dim).
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Gradient of cost function with respect to outputs.
"""
raise NotImplementedError()
return outputs - targets
def grad(self, y, t, **kwargs):
def __repr__(self):
return 'MeanSquaredErrorCost'
class BinaryCrossEntropyCost(object):
"""Binary cross entropy cost."""
def __call__(self, outputs, targets):
"""Calculates cost function given a batch of outputs and targets.
Args:
outputs: Array of model outputs of shape (batch_size, output_dim).
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Scalar cost function value.
"""
Implements a gradient of the cost w.r.t y
:param y: matrix -- an output of the model
:param t: matrix -- an expected output the model should produce
:param kwargs: -- some optional parameters required by the cost
:return: matrix - the gradient of the cost w.r.t y
return -np.mean(
targets * np.log(outputs) + (1. - targets) * np.log(1. - outputs))
def grad(self, outputs, targets):
"""Calculates gradient of cost function with respect to outputs.
Args:
outputs: Array of model outputs of shape (batch_size, output_dim).
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Gradient of cost function with respect to outputs.
"""
raise NotImplementedError()
return (1. - targets) / (1. - outputs) - (targets / outputs)
def get_name(self):
return 'cost'
def __repr__(self):
return 'BinaryCrossEntropyCost'
class MSECost(Cost):
def cost(self, y, t, **kwargs):
se = 0.5*numpy.sum((y - t)**2, axis=1)
return numpy.mean(se)
class BinaryCrossEntropySigmoidCost(object):
"""Binary cross entropy cost with logistic sigmoid applied to outputs."""
def grad(self, y, t, **kwargs):
return y - t
def __call__(self, outputs, targets):
"""Calculates cost function given a batch of outputs and targets.
def get_name(self):
return 'mse'
Args:
outputs: Array of model outputs of shape (batch_size, output_dim).
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Scalar cost function value.
"""
probs = 1. / (1. + np.exp(-outputs))
return -np.mean(
targets * np.log(probs) + (1. - targets) * np.log(1. - probs))
def grad(self, outputs, targets):
"""Calculates gradient of cost function with respect to outputs.
Args:
outputs: Array of model outputs of shape (batch_size, output_dim).
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Gradient of cost function with respect to outputs.
"""
probs = 1. / (1. + np.exp(-outputs))
return probs - targets
def __repr__(self):
return 'BinaryCrossEntropySigmoidCost'
class CECost(Cost):
"""
Cross Entropy (Negative log-likelihood) cost for multiple classes
"""
def cost(self, y, t, **kwargs):
#assumes t is 1-of-K coded and y is a softmax
#transformed estimate at the output layer
nll = t * numpy.log(y)
return -numpy.mean(numpy.sum(nll, axis=1))
class CrossEntropyCost(object):
"""Multi-class cross entropy cost."""
def grad(self, y, t, **kwargs):
#assumes t is 1-of-K coded and y is a softmax
#transformed estimate at the output layer
return y - t
def __call__(self, outputs, targets):
"""Calculates cost function given a batch of outputs and targets.
def get_name(self):
return 'ce'
Args:
outputs: Array of model outputs of shape (batch_size, output_dim).
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Scalar cost function value.
"""
return -np.mean(np.sum(targets * np.log(outputs), axis=1))
def grad(self, outputs, targets):
"""Calculates gradient of cost function with respect to outputs.
Args:
outputs: Array of model outputs of shape (batch_size, output_dim).
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Gradient of cost function with respect to outputs.
"""
return -targets / outputs
def __repr__(self):
return 'CrossEntropyCost'
class CrossEntropySoftmaxCost(object):
"""Multi-class cross entropy cost with Softmax applied to outputs."""
def __call__(self, outputs, targets):
"""Calculates cost function given a batch of outputs and targets.
Args:
outputs: Array of model outputs of shape (batch_size, output_dim).
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Scalar cost function value.
"""
probs = np.exp(outputs)
probs /= probs.sum(-1)[:, None]
return -np.mean(np.sum(targets * np.log(probs), axis=1))
def grad(self, outputs, targets):
"""Calculates gradient of cost function with respect to outputs.
Args:
outputs: Array of model outputs of shape (batch_size, output_dim).
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Gradient of cost function with respect to outputs.
"""
probs = np.exp(outputs)
probs /= probs.sum(-1)[:, None]
return probs - targets
def __repr__(self):
return 'CrossEntropySoftmaxCost'
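As a quick sanity check of the new cost classes above (a stand-alone sketch, not part of the commit, with made-up batch values), `CrossEntropySoftmaxCost` should agree with `CrossEntropyCost` applied to explicitly softmax-transformed outputs:

import numpy as np
from mlp.costs import CrossEntropyCost, CrossEntropySoftmaxCost

rng = np.random.RandomState(123)
outputs = rng.normal(size=(5, 3))            # pre-softmax model outputs
targets = np.eye(3)[rng.randint(0, 3, 5)]    # 1-of-K coded targets

# softmax applied explicitly, exactly as CrossEntropySoftmaxCost does internally
probs = np.exp(outputs)
probs /= probs.sum(-1)[:, None]

# both the cost values and the gradients should match
print(np.allclose(CrossEntropySoftmaxCost()(outputs, targets),
                  CrossEntropyCost()(probs, targets)))            # True
print(np.allclose(CrossEntropySoftmaxCost().grad(outputs, targets),
                  probs - targets))                               # True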

206
mlp/data_providers.py Normal file
View File

@ -0,0 +1,206 @@
# -*- coding: utf-8 -*-
"""Data providers.
This module provides classes for loading datasets and iterating over batches of
data points.
"""
import cPickle
import gzip
import numpy as np
import os
from mlp import DEFAULT_SEED
class DataProvider(object):
"""Generic data provider."""
def __init__(self, inputs, targets, batch_size, max_num_batches=-1,
shuffle_order=True, rng=None):
"""Create a new data provider object.
Args:
inputs (ndarray): Array of data input features of shape
(num_data, input_dim).
targets (ndarray): Array of data output targets of shape
(num_data, output_dim) or (num_data,) if output_dim == 1.
batch_size (int): Number of data points to include in each batch.
max_num_batches (int): Maximum number of batches to iterate over
in an epoch. If `max_num_batches * batch_size > num_data` then
only as many batches as the data can be split into will be
used. If set to -1 all of the data will be used.
shuffle_order (bool): Whether to randomly permute the order of
the data before each epoch.
rng (RandomState): A seeded random number generator.
"""
self.inputs = inputs
self.targets = targets
self.batch_size = batch_size
assert max_num_batches != 0 and not max_num_batches < -1, (
'max_num_batches should be -1 or > 0')
self.max_num_batches = max_num_batches
# maximum possible number of batches is equal to number of whole times
# batch_size divides in to the number of data points which can be
# found using integer division
possible_num_batches = self.inputs.shape[0] // batch_size
if self.max_num_batches == -1:
self.num_batches = possible_num_batches
else:
self.num_batches = min(self.max_num_batches, possible_num_batches)
self.shuffle_order = shuffle_order
if rng is None:
rng = np.random.RandomState(DEFAULT_SEED)
self.rng = rng
self.reset()
def __iter__(self):
"""Implements Python iterator interface.
This should return an object implementing a `next` method which steps
through a sequence returning one element at a time and raising
`StopIteration` when at the end of the sequence. Here the object
returned is the DataProvider itself.
"""
return self
def reset(self):
"""Resets the provider to the initial state to use in a new epoch."""
self._curr_batch = 0
if self.shuffle_order:
self.shuffle()
def shuffle(self):
"""Randomly shuffles order of data."""
new_order = self.rng.permutation(self.inputs.shape[0])
self.inputs = self.inputs[new_order]
self.targets = self.targets[new_order]
def next(self):
"""Returns next data batch or raises `StopIteration` if at end."""
if self._curr_batch + 1 > self.num_batches:
# no more batches in current iteration through data set so reset
# the dataset for another pass and indicate iteration is at end
self.reset()
raise StopIteration()
# create an index slice corresponding to current batch number
batch_slice = slice(self._curr_batch * self.batch_size,
(self._curr_batch + 1) * self.batch_size)
inputs_batch = self.inputs[batch_slice]
targets_batch = self.targets[batch_slice]
self._curr_batch += 1
return inputs_batch, targets_batch
class MNISTDataProvider(DataProvider):
"""Data provider for MNIST handwritten digit images."""
def __init__(self, which_set='train', batch_size=100, max_num_batches=-1,
shuffle_order=True, rng=None):
"""Create a new MNIST data provider object.
Args:
which_set: One of 'train', 'valid' or 'eval'. Determines which
portion of the MNIST data this object should provide.
batch_size (int): Number of data points to include in each batch.
max_num_batches (int): Maximum number of batches to iterate over
in an epoch. If `max_num_batches * batch_size > num_data` then
only as many batches as the data can be split into will be
used. If set to -1 all of the data will be used.
shuffle_order (bool): Whether to randomly permute the order of
the data before each epoch.
rng (RandomState): A seeded random number generator.
"""
# check a valid which_set was provided
assert which_set in ['train', 'valid', 'eval'], (
'Expected which_set to be either train, valid or eval. '
'Got {0}'.format(which_set)
)
self.which_set = which_set
self.num_classes = 10
# construct path to data using os.path.join to ensure the correct path
# separator for the current platform / OS is used
# MLP_DATA_DIR environment variable should point to the data directory
data_path = os.path.join(
os.environ['MLP_DATA_DIR'], 'mnist_{0}.pkl.gz'.format(which_set))
assert os.path.isfile(data_path), (
'Data file does not exist at expected path: ' + data_path
)
# use a context-manager to ensure the files are properly closed after
# we are finished with them
with gzip.open(data_path) as f:
inputs, targets = cPickle.load(f)
# pass the loaded data to the parent class __init__
super(MNISTDataProvider, self).__init__(
inputs, targets, batch_size, max_num_batches, shuffle_order, rng)
def next(self):
"""Returns next data batch or raises `StopIteration` if at end."""
inputs_batch, targets_batch = super(MNISTDataProvider, self).next()
return inputs_batch, self.to_one_of_k(targets_batch)
def to_one_of_k(self, int_targets):
"""Converts integer coded class target to 1 of K coded targets.
Args:
int_targets (ndarray): Array of integer coded class targets (i.e.
where an integer from 0 to `num_classes` - 1 is used to
indicate which is the correct class). This should be of shape
(num_data,).
Returns:
Array of 1 of K coded targets i.e. an array of shape
(num_data, num_classes) where for each row all elements are equal
to zero except for the column corresponding to the correct class
which is equal to one.
"""
one_of_k_targets = np.zeros((int_targets.shape[0], self.num_classes))
one_of_k_targets[range(int_targets.shape[0]), int_targets] = 1
return one_of_k_targets
class MetOfficeDataProvider(DataProvider):
"""South Scotland Met Office weather data provider."""
def __init__(self, window_size, batch_size=10, max_num_batches=-1,
shuffle_order=True, rng=None):
"""Create a new Met Offfice data provider object.
Args:
window_size (int): Size of windows to split weather time series
data into. The constructed input features will be the first
`window_size - 1` entries in each window and the target outputs
the last entry in each window.
batch_size (int): Number of data points to include in each batch.
max_num_batches (int): Maximum number of batches to iterate over
in an epoch. If `max_num_batches * batch_size > num_data` then
only as many batches as the data can be split into will be
used. If set to -1 all of the data will be used.
shuffle_order (bool): Whether to randomly permute the order of
the data before each epoch.
rng (RandomState): A seeded random number generator.
"""
data_path = os.path.join(
os.environ['MLP_DATA_DIR'], 'HadSSP_daily_qc.txt')
assert os.path.isfile(data_path), (
'Data file does not exist at expected path: ' + data_path
)
raw = np.loadtxt(data_path, skiprows=3, usecols=range(2, 32))
assert window_size > 1, 'window_size must be at least 2.'
self.window_size = window_size
#filter out all missing datapoints and flatten to a vector
filtered = raw[raw >= 0].flatten()
#normalise data to zero mean, unit standard deviation
mean = np.mean(filtered)
std = np.std(filtered)
normalised = (filtered - mean) / std
# create a view on to array corresponding to a rolling window
shape = (normalised.shape[-1] - self.window_size + 1, self.window_size)
strides = normalised.strides + (normalised.strides[-1],)
windowed = np.lib.stride_tricks.as_strided(
normalised, shape=shape, strides=strides)
# inputs are first (window_size - 1) entries in windows
inputs = windowed[:, :-1]
# targets are last entry in windows
targets = windowed[:, -1]
super(MetOfficeDataProvider, self).__init__(
inputs, targets, batch_size, max_num_batches, shuffle_order, rng)
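A minimal usage sketch for the providers above (not part of the commit): it assumes the MLP_DATA_DIR environment variable points at a directory containing mnist_train.pkl.gz, that the inputs are the standard flattened 28x28 images, and that it runs under Python 2 like the rest of the framework.

import numpy as np
from mlp.data_providers import MNISTDataProvider

train_data = MNISTDataProvider('train', batch_size=100, max_num_batches=5)

for inputs_batch, targets_batch in train_data:
    # inputs are flattened image vectors, targets are 1-of-K coded
    print(inputs_batch.shape)    # e.g. (100, 784)
    print(targets_batch.shape)   # (100, 10)
    assert np.allclose(targets_batch.sum(axis=1), 1.)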

View File

@ -1,351 +0,0 @@
# Machine Learning Practical (INFR11119),
# Pawel Swietojanski, University of Edinburgh
import cPickle
import gzip
import numpy
import os
import logging
logger = logging.getLogger(__name__)
class DataProvider(object):
"""
Data provider defines an interface for our
generic data-independent readers.
"""
def __init__(self, batch_size, randomize=True, rng=None):
"""
:param batch_size: int, specifies the number
of elements returned at each step
:param randomize: bool, shuffles examples prior
to iteration, so they are presented in random
order for stochastic gradient descent training
:return:
"""
self.batch_size = batch_size
self.randomize = randomize
self._curr_idx = 0
self.rng = rng
if self.rng is None:
seed=[2015, 10, 1]
self.rng = numpy.random.RandomState(seed)
def reset(self):
"""
Resets the provider to the initial state to
use in another epoch
:return: None
"""
self._curr_idx = 0
def __randomize(self):
"""
Data-specific implementation of shuffling mechanism
:return:
"""
raise NotImplementedError()
def __iter__(self):
"""
This method says an object is iterable.
"""
return self
def next(self):
"""
Data-specific iteration mechanism. Called each step
(i.e. each iteration in a loop)
        until StopIteration() exception is raised.
:return:
"""
raise NotImplementedError()
def num_examples(self):
"""
Returns a number of data-points in dataset
"""
return NotImplementedError()
class MNISTDataProvider(DataProvider):
"""
The class iterates over MNIST digits dataset, in possibly
random order.
"""
def __init__(self, dset,
batch_size=10,
max_num_batches=-1,
max_num_examples=-1,
randomize=True,
rng=None,
conv_reshape=False):
super(MNISTDataProvider, self).\
__init__(batch_size, randomize, rng)
assert dset in ['train', 'valid', 'eval'], (
"Expected dset to be either 'train', "
"'valid' or 'eval' got %s" % dset
)
assert max_num_batches != 0, (
"max_num_batches should be != 0"
)
if max_num_batches > 0 and max_num_examples > 0:
logger.warning("You have specified both 'max_num_batches' and " \
"a deprecead 'max_num_examples' arguments. We will " \
"use the former over the latter.")
dset_path = './data/mnist_%s.pkl.gz' % dset
assert os.path.isfile(dset_path), (
"File %s was expected to exist!." % dset_path
)
with gzip.open(dset_path) as f:
x, t = cPickle.load(f)
self._max_num_batches = max_num_batches
#max_num_examples arg was provided for backward compatibility
#but it maps us to the max_num_batches anyway
if max_num_examples > 0 and max_num_batches < 0:
self._max_num_batches = max_num_examples / self.batch_size
self.x = x
self.t = t
self.num_classes = 10
self.conv_reshape = conv_reshape
self._rand_idx = None
if self.randomize:
self._rand_idx = self.__randomize()
def reset(self):
super(MNISTDataProvider, self).reset()
if self.randomize:
self._rand_idx = self.__randomize()
def __randomize(self):
assert isinstance(self.x, numpy.ndarray)
if self._rand_idx is not None and self._max_num_batches > 0:
return self.rng.permutation(self._rand_idx)
else:
#the max_to_present secures that random examples
#are returned from the same pool each time (in case
#the total num of examples was limited by max_num_batches)
max_to_present = self.batch_size*self._max_num_batches \
if self._max_num_batches > 0 else self.x.shape[0]
return self.rng.permutation(numpy.arange(0, self.x.shape[0]))[0:max_to_present]
def next(self):
has_enough = (self._curr_idx + self.batch_size) <= self.x.shape[0]
presented_max = (0 < self._max_num_batches <= (self._curr_idx / self.batch_size))
if not has_enough or presented_max:
raise StopIteration()
if self._rand_idx is not None:
range_idx = \
self._rand_idx[self._curr_idx:self._curr_idx + self.batch_size]
else:
range_idx = \
numpy.arange(self._curr_idx, self._curr_idx + self.batch_size)
rval_x = self.x[range_idx]
rval_t = self.t[range_idx]
self._curr_idx += self.batch_size
if self.conv_reshape:
rval_x = rval_x.reshape(self.batch_size, 1, 28, 28)
return rval_x, self.__to_one_of_k(rval_t)
def num_examples(self):
return self.x.shape[0]
def num_examples_presented(self):
return self._curr_idx + 1
def __to_one_of_k(self, y):
rval = numpy.zeros((y.shape[0], self.num_classes), dtype=numpy.float32)
for i in xrange(y.shape[0]):
rval[i, y[i]] = 1
return rval
class MetOfficeDataProvider(DataProvider):
"""
The class iterates over South Scotland Weather, in possibly
random order.
"""
def __init__(self, window_size,
batch_size=10,
max_num_batches=-1,
max_num_examples=-1,
randomize=True):
super(MetOfficeDataProvider, self).\
__init__(batch_size, randomize)
dset_path = './data/HadSSP_daily_qc.txt'
assert os.path.isfile(dset_path), (
"File %s was expected to exist!." % dset_path
)
if max_num_batches > 0 and max_num_examples > 0:
logger.warning("You have specified both 'max_num_batches' and " \
"a deprecead 'max_num_examples' arguments. We will " \
"use the former over the latter.")
raw = numpy.loadtxt(dset_path, skiprows=3, usecols=range(2, 32))
self.window_size = window_size
self._max_num_batches = max_num_batches
#max_num_examples arg was provided for backward compatibility
#but it maps us to the max_num_batches anyway
if max_num_examples > 0 and max_num_batches < 0:
self._max_num_batches = max_num_examples / self.batch_size
#filter out all missing datapoints and
#flatten a matrix to a vector, so we will get
#a time preserving representation of measurments
#with self.x[0] being the first day and self.x[-1] the last
self.x = raw[raw >= 0].flatten()
#normalise data to zero mean, unit variance
mean = numpy.mean(self.x)
var = numpy.var(self.x)
assert var >= 0.01, (
"Variance too small %f " % var
)
self.x = (self.x-mean)/var
self._rand_idx = None
if self.randomize:
self._rand_idx = self.__randomize()
def reset(self):
super(MetOfficeDataProvider, self).reset()
if self.randomize:
self._rand_idx = self.__randomize()
def __randomize(self):
assert isinstance(self.x, numpy.ndarray)
# we generate random indexes starting from window_size, i.e. 10th absolute element
# in the self.x vector, as we later during mini-batch preparation slice
# the self.x container backwards, i.e. given we want to get a training
        # data-point for the 11th day, we look at the 10 preceding days.
# Note, we cannot do this, for example, for the 5th day as
# we do not have enough observations to make an input (10 days) to the model
return numpy.random.permutation(numpy.arange(self.window_size, self.x.shape[0]))
def next(self):
has_enough = (self.window_size + self._curr_idx + self.batch_size) <= self.x.shape[0]
presented_max = (0 < self._max_num_batches <= (self._curr_idx / self.batch_size))
if not has_enough or presented_max:
raise StopIteration()
if self._rand_idx is not None:
range_idx = \
self._rand_idx[self._curr_idx:self._curr_idx + self.batch_size]
else:
range_idx = \
numpy.arange(self.window_size + self._curr_idx,
self.window_size + self._curr_idx + self.batch_size)
#build slicing matrix of size minibatch, which will contain batch_size
#rows, each keeping indexes that selects windows_size+1 [for (x,t)] elements
#from data vector (self.x) that itself stays always sorted w.r.t time
range_slices = numpy.zeros((self.batch_size, self.window_size + 1), dtype=numpy.int32)
for i in xrange(0, self.batch_size):
range_slices[i, :] = \
numpy.arange(range_idx[i],
range_idx[i] - self.window_size - 1,
-1,
dtype=numpy.int32)[::-1]
#here we use advanced indexing to select slices from observation vector
#last column of rval_x makes our targets t (as we splice window_size + 1
tmp_x = self.x[range_slices]
rval_x = tmp_x[:,:-1]
rval_t = tmp_x[:,-1].reshape(self.batch_size, -1)
self._curr_idx += self.batch_size
return rval_x, rval_t
class FuncDataProvider(DataProvider):
"""
Function gets as an argument a list of functions defining the means
of a normal distribution to sample from.
"""
def __init__(self,
fn_list=[lambda x: x ** 2, lambda x: numpy.sin(x)],
std_list=[0.1, 0.1],
x_from = 0.0,
x_to = 1.0,
points_per_fn=200,
batch_size=10,
randomize=True):
"""
"""
super(FuncDataProvider, self).__init__(batch_size, randomize)
def sample_points(y, std):
ys = numpy.zeros_like(y)
for i in xrange(y.shape[0]):
ys[i] = numpy.random.normal(y[i], std)
return ys
x = numpy.linspace(x_from, x_to, points_per_fn, dtype=numpy.float32)
means = [fn(x) for fn in fn_list]
y = [sample_points(mean, std) for mean, std in zip(means, std_list)]
self.x_orig = x
self.y_class = y
self.x = numpy.concatenate([x for ys in y])
self.y = numpy.concatenate([ys for ys in y])
if self.randomize:
self._rand_idx = self.__randomize()
else:
self._rand_idx = None
def __randomize(self):
assert isinstance(self.x, numpy.ndarray)
return numpy.random.permutation(numpy.arange(0, self.x.shape[0]))
def __iter__(self):
return self
def next(self):
if (self._curr_idx + self.batch_size) >= self.x.shape[0]:
raise StopIteration()
if self._rand_idx is not None:
range_idx = self._rand_idx[self._curr_idx:self._curr_idx + self.batch_size]
else:
range_idx = numpy.arange(self._curr_idx, self._curr_idx + self.batch_size)
x = self.x[range_idx]
y = self.y[range_idx]
self._curr_idx += self.batch_size
return x, y
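For comparison with the explicit slicing-matrix construction above, the replacement MetOfficeDataProvider builds its input/target windows from a strided view of the series. A stand-alone sketch of that construction on a made-up series (not part of the commit):

import numpy as np

series = np.arange(10.)   # toy stand-in for the normalised rainfall series
window_size = 4

# strided view: row i is series[i:i + window_size], with no data copied
shape = (series.shape[-1] - window_size + 1, window_size)
strides = series.strides + (series.strides[-1],)
windowed = np.lib.stride_tricks.as_strided(series, shape=shape, strides=strides)

inputs = windowed[:, :-1]    # first window_size - 1 entries of each window
targets = windowed[:, -1]    # last entry of each window

# same result as an explicit loop over windows
expected = np.array([series[i:i + window_size] for i in range(shape[0])])
print(np.allclose(windowed, expected))   # True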

65
mlp/initialisers.py Normal file
View File

@ -0,0 +1,65 @@
# -*- coding: utf-8 -*-
"""Parameter initialisers.
This module defines classes to initialise the parameters in a layer.
"""
import numpy as np
from mlp import DEFAULT_SEED
class ConstantInit(object):
"""Constant parameter initialiser."""
def __init__(self, value):
"""Construct a constant parameter initialiser.
Args:
value: Value to initialise parameter to.
"""
self.value = value
def __call__(self, shape):
return np.ones(shape=shape) * self.value
class UniformInit(object):
"""Random uniform parameter initialiser."""
def __init__(self, low, high, rng=None):
"""Construct a random uniform parameter initialiser.
Args:
low: Lower bound of interval to sample from.
high: Upper bound of interval to sample from.
rng (RandomState): Seeded random number generator.
"""
self.low = low
self.high = high
if rng is None:
rng = np.random.RandomState(DEFAULT_SEED)
self.rng = rng
def __call__(self, shape):
return self.rng.uniform(low=self.low, high=self.high, size=shape)
class NormalInit(object):
"""Random normal parameter initialiser."""
def __init__(self, mean, std, rng=None):
"""Construct a random uniform parameter initialiser.
Args:
mean: Mean of distribution to sample from.
std: Standard deviation of distribution to sample from.
rng (RandomState): Seeded random number generator.
"""
self.mean = mean
self.std = std
if rng is None:
rng = np.random.RandomState(DEFAULT_SEED)
self.rng = rng
def __call__(self, shape):
return self.rng.normal(loc=self.mean, scale=self.std, size=shape)
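A short usage sketch for the initialisers above (not part of the commit); the shapes are arbitrary:

import numpy as np
from mlp.initialisers import ConstantInit, UniformInit

weights_init = UniformInit(-0.1, 0.1, rng=np.random.RandomState(42))
biases_init = ConstantInit(0.)

W = weights_init((3, 5))   # 3x5 array drawn uniformly from [-0.1, 0.1)
b = biases_init(3)         # length-3 array of zeros

assert W.shape == (3, 5) and W.min() >= -0.1 and W.max() < 0.1
assert np.all(b == 0.)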

View File

@ -1,561 +1,325 @@
# -*- coding: utf-8 -*-
"""Layer definitions.

This module defines classes which encapsulate a single layer.

These layers map input activations to output activations with the `fprop`
method and map gradients with respect to outputs to gradients with respect to
their inputs with the `bprop` method.

Some layers will have learnable parameters and so will additionally define
methods for getting and setting parameters and calculating gradients with
respect to the layer parameters.
"""

# Machine Learning Practical (INFR11119),
# Pawel Swietojanski, University of Edinburgh

import numpy
import logging
from mlp.costs import Cost

logger = logging.getLogger(__name__)
def max_and_argmax(x, axes=None, keepdims_max=False, keepdims_argmax=False):
"""
Return both max and argmax for the given multi-dimensional array, possibly
preserve the original shapes
:param x: input tensor
:param axes: tuple of ints denoting axes across which
one should perform reduction
:param keepdims_max: boolean, if true, shape of x is preserved in result
:param keepdims_argmax:, boolean, if true, shape of x is preserved in result
:return: max (number) and argmax (indices) of max element along certain axes
in multi-dimensional tensor
"""
if axes is None:
rval_argmax = numpy.argmax(x)
if keepdims_argmax:
rval_argmax = numpy.unravel_index(rval_argmax, x.shape)
else:
if isinstance(axes, int):
axes = (axes,)
axes = tuple(axes)
keep_axes = numpy.array([i for i in range(x.ndim) if i not in axes])
transposed_x = numpy.transpose(x, numpy.concatenate((keep_axes, axes)))
reshaped_x = transposed_x.reshape(transposed_x.shape[:len(keep_axes)] + (-1,))
rval_argmax = numpy.asarray(numpy.argmax(reshaped_x, axis=-1), dtype=numpy.int64)
# rval_max_arg keeps the arg index referencing to the axis along which reduction was performed (axis=-1)
# when keepdims_argmax is True we need to map it back to the original shape of tensor x
# print 'rval maxaarg', rval_argmax.ndim, rval_argmax.shape, rval_argmax
if keepdims_argmax:
dim = tuple([x.shape[a] for a in axes])
rval_argmax = numpy.array([idx + numpy.unravel_index(val, dim)
for idx, val in numpy.ndenumerate(rval_argmax)])
# convert to numpy indexing convention (row indices first, then columns)
rval_argmax = zip(*rval_argmax)
if keepdims_max is False and keepdims_argmax is True:
# this could potentially save O(N) steps by not traversing array once more
# to get max value, haven't benchmark it though
rval_max = x[rval_argmax]
else:
rval_max = numpy.asarray(numpy.amax(x, axis=axes, keepdims=keepdims_max))
return rval_max, rval_argmax
class MLP(object):
"""
This is a container for an arbitrary sequence of other transforms
On top of this, the class also keeps the state of the model, i.e.
the result of forward (activations) and backward (deltas) passes
through the model (for a mini-batch), which is required to compute
the gradients for the parameters
"""
def __init__(self, cost, rng=None):
assert isinstance(cost, Cost), (
"Cost needs to be of type mlp.costs.Cost, got %s" % type(cost)
)
self.layers = [] #the actual list of network layers
self.activations = [] #keeps forward-pass activations (h from equations)
# for a given minibatch (or features at 0th index)
self.deltas = [] #keeps back-propagated error signals (deltas from equations)
# for a given minibatch and each layer
self.cost = cost
if rng is None:
self.rng = numpy.random.RandomState([2015,11,11])
else:
self.rng = rng
def fprop(self, x):
"""
:param inputs: mini-batch of data-points x
:return: y (top layer activation) which is an estimate of y given x
"""
if len(self.activations) != len(self.layers) + 1:
self.activations = [None]*(len(self.layers) + 1)
self.activations[0] = x
for i in xrange(0, len(self.layers)):
self.activations[i+1] = self.layers[i].fprop(self.activations[i])
return self.activations[-1]
def fprop_dropout(self, x, dp_scheduler):
"""
:param inputs: mini-batch of data-points x
:param dp_scheduler: dropout scheduler
:return: y (top layer activation) which is an estimate of y given x
"""
if len(self.activations) != len(self.layers) + 1:
self.activations = [None]*(len(self.layers) + 1)
p_inp, p_hid = dp_scheduler.get_rate()
d_inp = 1
p_inp_scaler, p_hid_scaler = 1.0/p_inp, 1.0/p_hid
if p_inp < 1:
d_inp = self.rng.binomial(1, p_inp, size=x.shape)
self.activations[0] = p_inp_scaler*d_inp*x #it's OK to scale the inputs by p_inp_scaler here
self.activations[1] = self.layers[0].fprop(self.activations[0])
for i in xrange(1, len(self.layers)):
d_hid = 1
if p_hid < 1:
d_hid = self.rng.binomial(1, p_hid, size=self.activations[i].shape)
self.activations[i] *= d_hid #but not the hidden activations, since the non-linearity grad *may* explicitly depend on them
self.activations[i+1] = self.layers[i].fprop(p_hid_scaler*self.activations[i])
return self.activations[-1]
def bprop(self, cost_grad, dp_scheduler=None):
"""
:param cost_grad: matrix -- grad of the cost w.r.t y
:return: None, the deltas are kept in the model
"""
# allocate the list of deltas for each layer
# note, we do not use all of those fields but
# want to keep it aligned 1:1 with activations,
# which will simplify indexing later on when
# computing grads w.r.t parameters
if len(self.deltas) != len(self.activations):
self.deltas = [None]*len(self.activations)
# treat the top layer in special way, as it deals with the
# cost, which may lead to some simplifications
top_layer_idx = len(self.layers)
self.deltas[top_layer_idx], ograds = self.layers[top_layer_idx - 1].\
bprop_cost(self.activations[top_layer_idx], cost_grad, self.cost)
p_hid_scaler = 1.0
if dp_scheduler is not None:
p_inp, p_hid = dp_scheduler.get_rate()
p_hid_scaler /= p_hid
# then back-prop through remaining layers
for i in xrange(top_layer_idx - 1, 0, -1):
self.deltas[i], ograds = self.layers[i - 1].\
bprop(self.activations[i], ograds*p_hid_scaler)
def add_layer(self, layer):
self.layers.append(layer)
def set_layers(self, layers):
self.layers = layers
def get_name(self):
return 'mlp'
import numpy as np
import mlp.initialisers as init
class Layer(object):
"""Abstract class defining the interface for a layer."""
def fprop(self, inputs):
"""Forward propagates activations through the layer transformation.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
Returns:
outputs: Array of layer outputs of shape (batch_size, output_dim).
"""
raise NotImplementedError()
def bprop(self, inputs, outputs, grads_wrt_outputs):
"""Back propagates gradients through a layer.
Given gradients with respect to the outputs of the layer calculates the
gradients with respect to the layer inputs.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
outputs: Array of layer outputs calculated in forward pass of
shape (batch_size, output_dim).
grads_wrt_outputs: Array of gradients with respect to the layer
outputs of shape (batch_size, output_dim).
Returns:
Array of gradients with respect to the layer inputs of shape
(batch_size, input_dim).
"""
raise NotImplementedError()
class LayerWithParameters(Layer):
"""Abstract class defining the interface for a layer with parameters."""
def grads_wrt_params(self, inputs, grads_wrt_outputs):
"""Calculates gradients with respect to layer parameters.
Args:
inputs: Array of inputs to layer of shape (batch_size, input_dim).
grads_wrt_to_outputs: Array of gradients with respect to the layer
outputs of shape (batch_size, output_dim).
Returns:
List of arrays of gradients with respect to the layer parameters
with parameter gradients appearing in same order in tuple as
returned from `get_params` method.
"""
raise NotImplementedError()
def params_cost(self):
"""Returns the parameter dependent cost term for this layer.
If no parameter-dependent cost terms are set this returns zero.
"""
raise NotImplementedError()
@property
def params(self):
"""Returns a list of parameters of layer.
Returns:
List of current parameter values. This list should be in the
corresponding order to the `values` argument to `set_params`.
"""
raise NotImplementedError()
@params.setter
def params(self, values):
"""Sets layer parameters from a list of values.
Args:
values: List of values to set parameters to. This list should be
in the corresponding order to what is returned by `get_params`.
"""
raise NotImplementedError()
class Layer(object):
    """
    Abstract class defining an interface for
    other transforms.
    """
    def __init__(self, rng=None):
        if rng is None:
            seed=[2015, 10, 1]
            self.rng = numpy.random.RandomState(seed)
        else:
            self.rng = rng

    def fprop(self, inputs):
        """
        Implements a forward propagation through the i-th layer, that is
        some form of:
           a^i = xW^i + b^i
           h^i = f^i(a^i)
        with f^i, W^i, b^i denoting a non-linearity, weight matrix and
        biases at the i-th layer, respectively and x denoting inputs.

        :param inputs: matrix of features (x) or the output of the previous layer h^{i-1}
        :return: h^i, matrix of transformed by layer features
        """
        raise NotImplementedError()

    def bprop(self, h, igrads):
        """
        Implements a backward propagation through the layer, that is, given
        h^i denotes the output of the layer and x^i the input, we compute:
        dh^i/dx^i which by chain rule is dh^i/da^i da^i/dx^i
        x^i could be either features (x) or the output of the lower layer h^{i-1}
        :param h: it's an activation produced in forward pass
        :param igrads, error signal (or gradient) flowing to the layer, note,
               this in general case does not corresponds to 'deltas' used to update
               the layer's parameters, to get deltas ones need to multiply it with
               the dh^i/da^i derivative
        :return: a tuple (deltas, ograds) where:
               deltas = igrads * dh^i/da^i
               ograds = deltas \times da^i/dx^i
        """
        raise NotImplementedError()

    def bprop_cost(self, h, igrads, cost=None):
        """
        Implements a backward propagation in case the layer directly
        deals with the optimised cost (i.e. the top layer)
        By default, method should implement a back-prop for default cost, that is
        the one that is natural to the layer's output, i.e.:
        linear -> mse, softmax -> cross-entropy, sigmoid -> binary cross-entropy
        :param h: it's an activation produced in forward pass
        :param igrads, error signal (or gradient) flowing to the layer, note,
               this in general case does not corresponds to 'deltas' used to update
               the layer's parameters, to get deltas ones need to multiply it with
               the dh^i/da^i derivative
        :return: a tuple (deltas, ograds) where:
               deltas = igrads * dh^i/da^i
               ograds = deltas \times da^i/dx^i
        """
        raise NotImplementedError()

    def pgrads(self, inputs, deltas, **kwargs):
        """
        Return gradients w.r.t parameters
        """
        raise NotImplementedError()

    def get_params(self):
        raise NotImplementedError()

    def set_params(self):
        raise NotImplementedError()

    def get_name(self):
        return 'abstract_layer'


class AffineLayer(LayerWithParameters):
    """Layer implementing an affine transformation of its inputs.

    This layer is parameterised by a weight matrix and bias vector.
    """

    def __init__(self, input_dim, output_dim,
                 weights_initialiser=init.UniformInit(-0.1, 0.1),
                 biases_initialiser=init.ConstantInit(0.),
                 weights_cost=None, biases_cost=None):
        """Initialises a parameterised affine layer.

        Args:
            input_dim (int): Dimension of inputs to the layer.
            output_dim (int): Dimension of the layer outputs.
            weights_initialiser: Initialiser for the weight parameters.
            biases_initialiser: Initialiser for the bias parameters.
            weights_cost: Weights-dependent cost term.
            biases_cost: Biases-dependent cost term.
        """
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.weights = weights_initialiser((self.output_dim, self.input_dim))
        self.biases = biases_initialiser(self.output_dim)
        self.weights_cost = weights_cost
        self.biases_cost = biases_cost

    def fprop(self, inputs):
        """Forward propagates activations through the layer transformation.

        For inputs `x`, outputs `y`, weights `W` and biases `b` the layer
        corresponds to `y = W.dot(x) + b`.

        Args:
            inputs: Array of layer inputs of shape (batch_size, input_dim).

        Returns:
            outputs: Array of layer outputs of shape (batch_size, output_dim).
        """
        return self.weights.dot(inputs.T).T + self.biases

    def bprop(self, inputs, outputs, grads_wrt_outputs):
        """Back propagates gradients through a layer.

        Given gradients with respect to the outputs of the layer calculates the
        gradients with respect to the layer inputs.

        Args:
            inputs: Array of layer inputs of shape (batch_size, input_dim).
            outputs: Array of layer outputs calculated in forward pass of
                shape (batch_size, output_dim).
            grads_wrt_outputs: Array of gradients with respect to the layer
                outputs of shape (batch_size, output_dim).

        Returns:
            Array of gradients with respect to the layer inputs of shape
            (batch_size, input_dim).
        """
        return grads_wrt_outputs.dot(self.weights)

    def grads_wrt_params(self, inputs, grads_wrt_outputs):
        """Calculates gradients with respect to layer parameters.

        Args:
            inputs: array of inputs to layer of shape (batch_size, input_dim)
            grads_wrt_to_outputs: array of gradients with respect to the layer
                outputs of shape (batch_size, output_dim)

        Returns:
            list of arrays of gradients with respect to the layer parameters
            `[grads_wrt_weights, grads_wrt_biases]`.
        """
        grads_wrt_weights = np.dot(grads_wrt_outputs.T, inputs)
        grads_wrt_biases = np.sum(grads_wrt_outputs, axis=0)
        if self.weights_cost is not None:
            grads_wrt_weights += self.weights_cost.grad(self.weights)
        if self.biases_cost is not None:
            grads_wrt_biases += self.biases_cost.grads(self.biases)
        return [grads_wrt_weights, grads_wrt_biases]

    def params_cost(self):
        """Returns the parameter dependent cost term for this layer.

        If no parameter-dependent cost terms are set this returns zero.
        """
        params_cost = 0
        if self.weights_cost is not None:
            params_cost += self.weights_cost(self.weights)
        if self.biases_cost is not None:
            params_cost += self.biases_cost(self.biases)
        return params_cost

    @property
    def params(self):
        """A list of layer parameter values: `[weights, biases]`."""
        return [self.weights, self.biases]

    @params.setter
    def params(self, values):
        self.weights = values[0]
        self.biases = values[1]

    def __repr__(self):
        return 'AffineLayer(input_dim={0}, output_dim={1})'.format(
            self.input_dim, self.output_dim)
class SigmoidLayer(Layer):
    """Layer implementing an element-wise logistic sigmoid transformation."""

    def fprop(self, inputs):
        """Forward propagates activations through the layer transformation.

        For inputs `x` and outputs `y` this corresponds to
        `y = 1 / (1 + exp(-x))`.

        Args:
            inputs: Array of layer inputs of shape (batch_size, input_dim).

        Returns:
            outputs: Array of layer outputs of shape (batch_size, output_dim).
        """
        return 1. / (1. + np.exp(-inputs))

    def bprop(self, inputs, outputs, grads_wrt_outputs):
        """Back propagates gradients through a layer.

        Given gradients with respect to the outputs of the layer calculates the
        gradients with respect to the layer inputs.

        Args:
            inputs: Array of layer inputs of shape (batch_size, input_dim).
            outputs: Array of layer outputs calculated in forward pass of
                shape (batch_size, output_dim).
            grads_wrt_outputs: Array of gradients with respect to the layer
                outputs of shape (batch_size, output_dim).

        Returns:
            Array of gradients with respect to the layer inputs of shape
            (batch_size, input_dim).
        """
        return grads_wrt_outputs * outputs * (1. - outputs)

    def __repr__(self):
        return 'SigmoidLayer'


class Linear(Layer):

    def __init__(self, idim, odim,
                 rng=None,
                 irange=0.1):

        super(Linear, self).__init__(rng=rng)

        self.idim = idim
        self.odim = odim

        self.W = self.rng.uniform(
            -irange, irange,
            (self.idim, self.odim))

        self.b = numpy.zeros((self.odim,), dtype=numpy.float32)

    def fprop(self, inputs):
        """
        Implements a forward propagation through the i-th layer, that is
        some form of:
           a^i = xW^i + b^i
           h^i = f^i(a^i)
        with f^i, W^i, b^i denoting a non-linearity, weight matrix and
        biases of this (i-th) layer, respectively and x denoting inputs.

        :param inputs: matrix of features (x) or the output of the previous layer h^{i-1}
        :return: h^i, matrix of transformed by layer features
        """
        #input comes from 4D convolutional tensor, reshape to expected shape
        if inputs.ndim == 4:
            inputs = inputs.reshape(inputs.shape[0], -1)
        a = numpy.dot(inputs, self.W) + self.b
        # here f() is an identity function, so just return a linear transformation
        return a

    def bprop(self, h, igrads):
        """
        Implements a backward propagation through the layer, that is, given
        h^i denotes the output of the layer and x^i the input, we compute:
        dh^i/dx^i which by chain rule is dh^i/da^i da^i/dx^i
        x^i could be either features (x) or the output of the lower layer h^{i-1}
        :param h: it's an activation produced in forward pass
        :param igrads, error signal (or gradient) flowing to the layer, note,
               this in general case does not corresponds to 'deltas' used to update
               the layer's parameters, to get deltas ones need to multiply it with
               the dh^i/da^i derivative
        :return: a tuple (deltas, ograds) where:
               deltas = igrads * dh^i/da^i
               ograds = deltas \times da^i/dx^i
        """
        # since df^i/da^i = 1 (f is assumed identity function),
        # deltas are in fact the same as igrads
        ograds = numpy.dot(igrads, self.W.T)
        return igrads, ograds

    def bprop_cost(self, h, igrads, cost):
        """
        Implements a backward propagation in case the layer directly
        deals with the optimised cost (i.e. the top layer)
        By default, method should implement a bprop for default cost, that is
        the one that is natural to the layer's output, i.e.:
        here we implement linear -> mse scenario
        :param h: it's an activation produced in forward pass
        :param igrads, error signal (or gradient) flowing to the layer, note,
               this in general case does not corresponds to 'deltas' used to update
               the layer's parameters, to get deltas ones need to multiply it with
               the dh^i/da^i derivative
        :param cost, mlp.costs.Cost instance defining the used cost
        :return: a tuple (deltas, ograds) where:
               deltas = igrads * dh^i/da^i
               ograds = deltas \times da^i/dx^i
        """
        if cost is None or cost.get_name() == 'mse':
            # for linear layer and mean square error cost,
            # cost back-prop is the same as standard back-prop
            return self.bprop(h, igrads)
        else:
            raise NotImplementedError('Linear.bprop_cost method not implemented '
                                      'for the %s cost' % cost.get_name())

    def pgrads(self, inputs, deltas, l1_weight=0, l2_weight=0):
        """
        Return gradients w.r.t parameters

        :param inputs, input to the i-th layer
        :param deltas, deltas computed in bprop stage up to -ith layer
        :param kwargs, key-value optional arguments
        :return list of grads w.r.t parameters dE/dW and dE/db in *exactly*
                the same order as the params are returned by get_params()

        Note: deltas here contain the whole chain rule leading
        from the cost up to the the i-th layer, i.e.
        dE/dy^L dy^L/da^L da^L/dh^{L-1} dh^{L-1}/da^{L-1} ... dh^{i}/da^{i}
        and here we are just asking about
          1) da^i/dW^i and 2) da^i/db^i
        since W and b are only layer's parameters
        """
        #input comes from 4D convolutional tensor, reshape to expected shape
        if inputs.ndim == 4:
            inputs = inputs.reshape(inputs.shape[0], -1)
        #you could basically use different scalers for biases
        #and weights, but it is not implemented here like this
        l2_W_penalty, l2_b_penalty = 0, 0
        if l2_weight > 0:
            l2_W_penalty = l2_weight*self.W
            l2_b_penalty = l2_weight*self.b
        l1_W_penalty, l1_b_penalty = 0, 0
        if l1_weight > 0:
            l1_W_penalty = l1_weight*numpy.sign(self.W)
            l1_b_penalty = l1_weight*numpy.sign(self.b)
        grad_W = numpy.dot(inputs.T, deltas) + l2_W_penalty + l1_W_penalty
        grad_b = numpy.sum(deltas, axis=0) + l2_b_penalty + l1_b_penalty
        return [grad_W, grad_b]

    def get_params(self):
        return [self.W, self.b]

    def set_params(self, params):
        #we do not make checks here, but the order on the list
        #is assumed to be exactly the same as get_params() returns
        self.W = params[0]
        self.b = params[1]

    def get_name(self):
        return 'linear'
class Sigmoid(Linear):
def __init__(self, idim, odim,
rng=None,
irange=0.1):
super(Sigmoid, self).__init__(idim, odim, rng, irange)
def fprop(self, inputs):
#get the linear activations
a = super(Sigmoid, self).fprop(inputs)
#stabilise the exp() computation in case some values in
#'a' get very negative. We limit both tails, however only
#negative values may lead to numerical issues -- exp(-a)
#clip() function does the following operation faster:
# a[a < -30.] = -30,
# a[a > 30.] = 30.
numpy.clip(a, -30.0, 30.0, out=a)
h = 1.0/(1 + numpy.exp(-a))
return h
def bprop(self, h, igrads):
dsigm = h * (1.0 - h)
deltas = igrads * dsigm
___, ograds = super(Sigmoid, self).bprop(h=None, igrads=deltas)
return deltas, ograds
def bprop_cost(self, h, igrads, cost):
if cost is None or cost.get_name() == 'bce':
return super(Sigmoid, self).bprop(h=h, igrads=igrads)
else:
raise NotImplementedError('Sigmoid.bprop_cost method not implemented '
'for the %s cost' % cost.get_name())
def get_name(self):
return 'sigmoid'
class Softmax(Linear):

    def __init__(self,idim, odim,
                 rng=None,
                 irange=0.1):

        super(Softmax, self).__init__(idim,
                                      odim,
                                      rng=rng,
                                      irange=irange)

    def fprop(self, inputs):
        # compute the linear outputs
        a = super(Softmax, self).fprop(inputs)
        # apply numerical stabilisation by subtracting max
        # from each row (not required for the coursework)
        # then compute exponent
        assert a.ndim in [1, 2], (
            "Expected the linear activation in Softmax layer to be either "
            "vector or matrix, got %ith dimensional tensor" % a.ndim
        )
        axis = a.ndim - 1
        exp_a = numpy.exp(a - numpy.max(a, axis=axis, keepdims=True))
        # finally, normalise by the sum within each example
        y = exp_a/numpy.sum(exp_a, axis=axis, keepdims=True)
        return y

    def bprop(self, h, igrads):
        raise NotImplementedError('Softmax.bprop not implemented for hidden layer.')

    def bprop_cost(self, h, igrads, cost):
        if cost is None or cost.get_name() == 'ce':
            return super(Softmax, self).bprop(h=h, igrads=igrads)
        else:
            raise NotImplementedError('Softmax.bprop_cost method not implemented '
                                      'for %s cost' % cost.get_name())

    def get_name(self):
        return 'softmax'


class ReluLayer(Layer):
    """Layer implementing an element-wise rectified linear transformation."""

    def fprop(self, inputs):
        """Forward propagates activations through the layer transformation.

        For inputs `x` and outputs `y` this corresponds to `y = max(0, x)`.

        Args:
            inputs: Array of layer inputs of shape (batch_size, input_dim).

        Returns:
            outputs: Array of layer outputs of shape (batch_size, output_dim).
        """
        return np.maximum(inputs, 0.)

    def bprop(self, inputs, outputs, grads_wrt_outputs):
        """Back propagates gradients through a layer.

        Given gradients with respect to the outputs of the layer calculates the
        gradients with respect to the layer inputs.

        Args:
            inputs: Array of layer inputs of shape (batch_size, input_dim).
            outputs: Array of layer outputs calculated in forward pass of
                shape (batch_size, output_dim).
            grads_wrt_outputs: Array of gradients with respect to the layer
                outputs of shape (batch_size, output_dim).

        Returns:
            Array of gradients with respect to the layer inputs of shape
            (batch_size, input_dim).
        """
        return (outputs > 0) * grads_wrt_outputs

    def __repr__(self):
        return 'ReluLayer'
class TanhLayer(Layer):
    """Layer implementing an element-wise hyperbolic tangent transformation."""

    def fprop(self, inputs):
        """Forward propagates activations through the layer transformation.

        For inputs `x` and outputs `y` this corresponds to `y = tanh(x)`.

        Args:
            inputs: Array of layer inputs of shape (batch_size, input_dim).

        Returns:
            outputs: Array of layer outputs of shape (batch_size, output_dim).
        """
        return np.tanh(inputs)

    def bprop(self, inputs, outputs, grads_wrt_outputs):
        """Back propagates gradients through a layer.

        Given gradients with respect to the outputs of the layer calculates the
        gradients with respect to the layer inputs.

        Args:
            inputs: Array of layer inputs of shape (batch_size, input_dim).
            outputs: Array of layer outputs calculated in forward pass of
                shape (batch_size, output_dim).
            grads_wrt_outputs: Array of gradients with respect to the layer
                outputs of shape (batch_size, output_dim).

        Returns:
            Array of gradients with respect to the layer inputs of shape
            (batch_size, input_dim).
        """
        return (1. - outputs**2) * grads_wrt_outputs

    def __repr__(self):
        return 'TanhLayer'


class Relu(Linear):
    def __init__(self, idim, odim,
                 rng=None,
                 irange=0.1):

        super(Relu, self).__init__(idim, odim, rng, irange)

    def fprop(self, inputs):
        #get the linear activations
        a = super(Relu, self).fprop(inputs)
        h = numpy.clip(a, 0, 20.0)
        #h = numpy.maximum(a, 0)
        return h

    def bprop(self, h, igrads):
        deltas = (h > 0)*igrads
        ___, ograds = super(Relu, self).bprop(h=None, igrads=deltas)
        return deltas, ograds

    def bprop_cost(self, h, igrads, cost):
        raise NotImplementedError('Relu.bprop_cost method not implemented '
                                  'for the %s cost' % cost.get_name())

    def get_name(self):
        return 'relu'


class Tanh(Linear):
    def __init__(self, idim, odim,
                 rng=None,
                 irange=0.1):

        super(Tanh, self).__init__(idim, odim, rng, irange)

    def fprop(self, inputs):
        #get the linear activations
        a = super(Tanh, self).fprop(inputs)
        numpy.clip(a, -30.0, 30.0, out=a)
        h = numpy.tanh(a)
        return h

    def bprop(self, h, igrads):
        deltas = (1.0 - h**2) * igrads
        ___, ograds = super(Tanh, self).bprop(h=None, igrads=deltas)
        return deltas, ograds

    def bprop_cost(self, h, igrads, cost):
        raise NotImplementedError('Tanh.bprop_cost method not implemented '
                                  'for the %s cost' % cost.get_name())

    def get_name(self):
        return 'tanh'


class Maxout(Linear):
    def __init__(self, idim, odim, k,
                 rng=None,
                 irange=0.05):

        super(Maxout, self).__init__(idim, odim*k, rng, irange)

        self.max_odim = odim
        self.k = k

    def fprop(self, inputs):
        #get the linear activations
        a = super(Maxout, self).fprop(inputs)
        ar = a.reshape(a.shape[0], self.max_odim, self.k)
        h, h_argmax = max_and_argmax(ar, axes=2, keepdims_max=True, keepdims_argmax=True)
        self.h_argmax = h_argmax
        return h[:, :, 0] #get rid of the last reduced dimension (of size 1)

    def bprop(self, h, igrads):
        #hack for dropout backprop (ignore dropped neurons). Note, this is not
        #entirely correct when h fires at 0 exactly (but is not dropped, in which case
        #derivative should be 1). However, this is rather unlikely to happen (that h fires as 0)
        #and probably can be ignored for now. Otherwise, one would have to keep the dropped unit
        #indexes and zero grads according to them.
        igrads = (h != 0)*igrads
        #convert into the shape where upsampling is easier
        igrads_up = igrads.reshape(igrads.shape[0], self.max_odim, 1)
        #upsample to the linear dimension (but reshaped to (batch_size, maxed_num (1), pool_size)
        igrads_up = numpy.tile(igrads_up, (1, 1, self.k))
        #generate mask matrix and set to 1 maxed elements
        mask = numpy.zeros_like(igrads_up)
        mask[self.h_argmax] = 1.0
        #do bprop through max operator and then reshape into 2D
        deltas = (igrads_up * mask).reshape(igrads_up.shape[0], -1)
        #and then do bprop through linear part
        ___, ograds = super(Maxout, self).bprop(h=None, igrads=deltas)
        return deltas, ograds

    def bprop_cost(self, h, igrads, cost):
        raise NotImplementedError('Maxout.bprop_cost method not implemented '
                                  'for the %s cost' % cost.get_name())

    def get_name(self):
        return 'maxout'
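The element-wise layers above can be checked against finite differences. A stand-alone sketch (not part of the commit) for SigmoidLayer, using the fact that the gradient of sum(grads_wrt_outputs * fprop(inputs)) with respect to inputs should equal bprop(inputs, outputs, grads_wrt_outputs):

import numpy as np
from mlp.layers import SigmoidLayer

layer = SigmoidLayer()
rng = np.random.RandomState(0)
inputs = rng.normal(size=(4, 6))
grads_wrt_outputs = rng.normal(size=(4, 6))

outputs = layer.fprop(inputs)
analytic = layer.bprop(inputs, outputs, grads_wrt_outputs)

# central finite differences of f(x) = sum(grads_wrt_outputs * fprop(x))
eps = 1e-6
numeric = np.zeros_like(inputs)
for idx in np.ndindex(*inputs.shape):
    shift = np.zeros_like(inputs)
    shift[idx] = eps
    f_plus = np.sum(grads_wrt_outputs * layer.fprop(inputs + shift))
    f_minus = np.sum(grads_wrt_outputs * layer.fprop(inputs - shift))
    numeric[idx] = (f_plus - f_minus) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))   # expected: True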

161
mlp/learning_rules.py Normal file
View File

@ -0,0 +1,161 @@
# -*- coding: utf-8 -*-
"""Learning rules.
This module contains classes implementing gradient based learning rules.
"""
import numpy as np
class GradientDescentLearningRule(object):
"""Simple (stochastic) gradient descent learning rule.
    For a scalar loss function `L(p[0], p[1] ... )` of some set of potentially
multidimensional parameters this attempts to find a local minimum of the
loss function by applying updates to each parameter of the form
p[i] := p[i] - learning_rate * dL/dp[i]
With `learning_rate` a positive scaling parameter.
The loss function used in successive applications of these updates may be a
stochastic estimator of the true loss function (e.g. when the loss with
respect to only a subset of data-points is calculated) in which case this
will correspond to a stochastic gradient descent learning rule.
"""
def __init__(self, learning_rate=1e-3):
"""Creates a new learning rule object.
Args:
            learning_rate: A positive scalar to scale gradient updates to the
parameters by. This needs to be carefully set - if too large
the learning dynamic will be unstable and may diverge, while
if set too small learning will proceed very slowly.
"""
assert learning_rate > 0., 'learning_rate should be positive.'
self.learning_rate = learning_rate
def initialise(self, params):
"""Initialises the state of the learning rule for a set or parameters.
This must be called before `update_params` is first called.
Args:
params: A list of the parameters to be optimised. Note these will
be updated *in-place* to avoid reallocating arrays on each
update.
"""
self.params = params
def reset(self):
"""Resets any additional state variables to their intial values.
For this learning rule there are no additional state variables so we
do nothing here.
"""
pass
def update_params(self, grads_wrt_params):
"""Applies a single gradient descent update to all parameters.
All parameter updates are performed using in-place operations and so
nothing is returned.
Args:
grads_wrt_params: A list of gradients of the scalar loss function
with respect to each of the parameters passed to `initialise`
previously, with this list expected to be in the same order.
"""
for param, grad in zip(self.params, grads_wrt_params):
param -= self.learning_rate * grad
class MomentumLearningRule(GradientDescentLearningRule):
"""Gradient descent with momentum learning rule.
This extends the basic gradient learning rule by introducing extra
momentum state variables for each parameter. These can help the learning
    dynamic overcome shallow local minima and speed convergence when
making multiple successive steps in a similar direction in parameter space.
For parameter p[i] and corresponding momentum m[i] the updates for a
scalar loss function `L` are of the form
m[i] := mom_coeff * m[i] - learning_rate * dL/dp[i]
p[i] := p[i] + m[i]
with `learning_rate` a positive scaling parameter for the gradient updates
and `mom_coeff` a value in [0, 1] that determines how much 'friction' there
    is in the system and so how quickly previous momentum contributions decay.
"""
def __init__(self, learning_rate=1e-3, mom_coeff=0.9):
"""Creates a new learning rule object.
Args:
            learning_rate: A positive scalar to scale gradient updates to the
parameters by. This needs to be carefully set - if too large
the learning dynamic will be unstable and may diverge, while
if set too small learning will proceed very slowly.
mom_coeff: A scalar in the range [0, 1] inclusive. This determines
the contribution of the previous momentum value to the value
after each update. If equal to 0 the momentum is set to exactly
the negative scaled gradient each update and so this rule
collapses to standard gradient descent. If equal to 1 the
momentum will just be decremented by the scaled gradient at
each update. This is equivalent to simulating the dynamic in
a frictionless system. Due to energy conservation the loss
of 'potential energy' as the dynamics moves down the loss
function surface will lead to an increasingly large 'kinetic
energy' and so speed, meaning the updates will become
increasingly large, potentially unstably so. Typically a value
less than but close to 1 will avoid these issues and cause the
                dynamic to converge to a local minimum where the gradients are
by definition zero.
"""
super(MomentumLearningRule, self).__init__(learning_rate)
assert mom_coeff >= 0. and mom_coeff <= 1., (
'mom_coeff should be in the range [0, 1].'
)
self.mom_coeff = mom_coeff
def initialise(self, params):
"""Initialises the state of the learning rule for a set or parameters.
This must be called before `update_params` is first called.
Args:
params: A list of the parameters to be optimised. Note these will
be updated *in-place* to avoid reallocating arrays on each
update.
"""
super(MomentumLearningRule, self).initialise(params)
self.moms = []
for param in self.params:
self.moms.append(np.zeros_like(param))
def reset(self):
"""Resets any additional state variables to their intial values.
For this learning rule this corresponds to zeroing all the momenta.
"""
        for mom in self.moms:
            mom *= 0.
def update_params(self, grads_wrt_params):
"""Applies a single update to all parameters.
All parameter updates are performed using in-place operations and so
nothing is returned.
Args:
grads_wrt_params: A list of gradients of the scalar loss function
with respect to each of the parameters passed to `initialise`
previously, with this list expected to be in the same order.
"""
for param, mom, grad in zip(self.params, self.moms, grads_wrt_params):
mom *= self.mom_coeff
mom -= self.learning_rate * grad
param += mom
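A minimal sketch of the in-place update interface above (not part of the commit), minimising the toy loss L(w) = 0.5 * sum(w**2), whose gradient is simply w:

import numpy as np
from mlp.learning_rules import GradientDescentLearningRule

w = np.array([1., -2., 3.])
rule = GradientDescentLearningRule(learning_rate=0.1)
rule.initialise([w])   # parameters are subsequently updated in place

for _ in range(100):
    grads_wrt_params = [w.copy()]   # dL/dw = w for this toy loss
    rule.update_params(grads_wrt_params)

print(np.abs(w).max() < 1e-3)   # w has been shrunk towards the minimum at zero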

145
mlp/models.py Normal file
View File

@ -0,0 +1,145 @@
# -*- coding: utf-8 -*-
"""Model definitions.
This module implements objects encapsulating learnable models of input-output
relationships. The model objects implement methods for forward propagating
the inputs through the transformation(s) defined by the model to produce
outputs (and intermediate states) and for calculating gradients of scalar
functions of the outputs with respect to the model parameters.
"""
from mlp.layers import LayerWithParameters
class SingleLayerModel(object):
"""A model consisting of a single transformation layer."""
def __init__(self, layer):
"""Create a new single layer model instance.
Args:
layer: The layer object defining the model architecture.
"""
self.layer = layer
@property
def params(self):
"""A list of all of the parameters of the model."""
return self.layer.params
def fprop(self, inputs):
"""Calculate the model outputs corresponding to a batch of inputs.
Args:
inputs: Batch of inputs to the model.
Returns:
List which is a concatenation of the model inputs and model
outputs, this being done for consistency of the interface with
multi-layer models for which `fprop` returns a list of
            activations through all intermediate layers of the model, including
            the inputs and outputs.
"""
activations = [inputs, self.layer.fprop(inputs)]
return activations
def grads_wrt_params(self, activations, grads_wrt_outputs):
"""Calculates gradients with respect to the model parameters.
Args:
activations: List of all activations from forward pass through
model using `fprop`.
grads_wrt_outputs: Gradient with respect to the model outputs of
the scalar function parameter gradients are being calculated
for.
Returns:
List of gradients of the scalar function with respect to all model
parameters.
"""
return self.layer.grads_wrt_params(activations[0], grads_wrt_outputs)
def params_cost(self):
"""Calculates the parameter dependent cost term of the model."""
return self.layer.params_cost()
def __repr__(self):
        return 'SingleLayerModel(' + str(self.layer) + ')'
class MultipleLayerModel(object):
"""A model consisting of multiple layers applied sequentially."""
def __init__(self, layers):
"""Create a new multiple layer model instance.
Args:
            layers: List of the layer objects defining the model in the
order they should be applied from inputs to outputs.
"""
self.layers = layers
@property
def params(self):
"""A list of all of the parameters of the model."""
params = []
for layer in self.layers:
if isinstance(layer, LayerWithParameters):
params += layer.params
return params
def fprop(self, inputs):
"""Forward propagates a batch of inputs through the model.
Args:
inputs: Batch of inputs to the model.
Returns:
List of the activations at the output of all layers of the model
plus the inputs (to the first layer) as the first element. The
last element of the list corresponds to the model outputs.
"""
activations = [inputs]
for i, layer in enumerate(self.layers):
activations.append(self.layers[i].fprop(activations[i]))
return activations
def grads_wrt_params(self, activations, grads_wrt_outputs):
"""Calculates gradients with respect to the model parameters.
Args:
activations: List of all activations from forward pass through
model using `fprop`.
grads_wrt_outputs: Gradient with respect to the model outputs of
the scalar function parameter gradients are being calculated
for.
Returns:
List of gradients of the scalar function with respect to all model
parameters.
"""
grads_wrt_params = []
for i, layer in enumerate(self.layers[::-1]):
inputs = activations[-i - 2]
outputs = activations[-i - 1]
grads_wrt_inputs = layer.bprop(inputs, outputs, grads_wrt_outputs)
if isinstance(layer, LayerWithParameters):
grads_wrt_params += layer.grads_wrt_params(
inputs, grads_wrt_outputs)[::-1]
grads_wrt_outputs = grads_wrt_inputs
return grads_wrt_params[::-1]
def params_cost(self):
"""Calculates the parameter dependent cost term of the model."""
params_cost = 0.
for layer in self.layers:
if isinstance(layer, LayerWithParameters):
params_cost += layer.params_cost()
return params_cost
def __repr__(self):
return (
'MultiLayerModel(\n ' +
'\n '.join([str(layer) for layer in self.layers]) +
'\n)'
)
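
As a rough illustration of how this interface fits together, the sketch below builds a two layer model from a toy one-parameter layer and computes parameter gradients for a dummy batch. It is a minimal sketch only: it assumes the classes above are importable from `mlp.models` as in the course framework, and the `ScaleLayer` used here is a made-up stand-in rather than one of the course's real layer classes.

```
import numpy as np

from mlp.layers import LayerWithParameters
from mlp.models import MultipleLayerModel


class ScaleLayer(LayerWithParameters):
    """Toy layer computing outputs = scale * inputs, with one scalar parameter."""

    def __init__(self, scale=1.):
        self.scale = scale

    def fprop(self, inputs):
        return self.scale * inputs

    def bprop(self, inputs, outputs, grads_wrt_outputs):
        # gradient of the scalar function with respect to the layer inputs
        return self.scale * grads_wrt_outputs

    def grads_wrt_params(self, inputs, grads_wrt_outputs):
        # gradient with respect to the single scale parameter
        return [np.sum(grads_wrt_outputs * inputs)]

    def params_cost(self):
        return 0.

    @property
    def params(self):
        return [self.scale]


model = MultipleLayerModel([ScaleLayer(2.), ScaleLayer(0.5)])
inputs = np.ones((4, 3))  # dummy batch of 4 three-dimensional inputs
activations = model.fprop(inputs)  # [inputs, first layer outputs, model outputs]
grads_wrt_outputs = np.ones_like(activations[-1])  # stand-in for dE/d(outputs)
print(model.grads_wrt_params(activations, grads_wrt_outputs))
# one gradient entry per parameterised layer, in the same order as `model.params`
```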

View File

@ -1,214 +1,134 @@
# Machine Learning Practical (INFR11119),
# Pawel Swietojanski, University of Edinburgh
# -*- coding: utf-8 -*-
"""Model optimisers.
This module contains objects implementing (batched) stochastic gradient descent
based optimisation of models.
"""
import numpy
import time
import logging
from mlp.layers import MLP
from mlp.dataset import DataProvider
from mlp.schedulers import LearningRateScheduler
from collections import OrderedDict
import numpy as np
logger = logging.getLogger(__name__)
class Optimiser(object):
def train_epoch(self, model, train_iter):
raise NotImplementedError()
"""Basic model optimiser."""
def train(self, model, train_iter, valid_iter=None):
raise NotImplementedError()
def __init__(self, model, cost, learning_rule, train_dataset,
valid_dataset=None, data_monitors=None):
"""Create a new optimiser instance.
def validate(self, model, valid_iterator, l1_weight=0, l2_weight=0):
assert isinstance(model, MLP), (
"Expected model to be a subclass of 'mlp.layers.MLP'"
" class but got %s " % type(model)
)
assert isinstance(valid_iterator, DataProvider), (
"Expected iterator to be a subclass of 'mlp.dataset.DataProvider'"
" class but got %s " % type(valid_iterator)
)
acc_list, nll_list = [], []
for x, t in valid_iterator:
y = model.fprop(x)
nll_list.append(model.cost.cost(y, t))
acc_list.append(numpy.mean(self.classification_accuracy(y, t)))
acc = numpy.mean(acc_list)
nll = numpy.mean(nll_list)
prior_costs = Optimiser.compute_prior_costs(model, l1_weight, l2_weight)
return nll + sum(prior_costs), acc
@staticmethod
def classification_accuracy(y, t):
Args:
model: The model to optimise.
cost: The scalar cost function to minimise.
learning_rule: Gradient based learning rule to use to minimise
cost.
train_dataset: Data provider for training set data batches.
valid_dataset: Data provider for validation set data batches.
data_monitors: Dictionary of functions evaluated on targets and
model outputs (averaged across both full training and
validation data sets) to monitor during training in addition
to the cost. Keys should correspond to a string label for
the statistic being evaluated.
"""
Returns classification accuracy given the estimate y and targets t
:param y: matrix -- estimate produced by the model in fprop
:param t: matrix -- target 1-of-K coded
:return: vector of y.shape[0] size with binary values set to 0
            if the example was misclassified or 1 otherwise
self.model = model
self.cost = cost
self.learning_rule = learning_rule
self.learning_rule.initialise(self.model.params)
self.train_dataset = train_dataset
self.valid_dataset = valid_dataset
self.data_monitors = OrderedDict([('cost', cost)])
if data_monitors is not None:
self.data_monitors.update(data_monitors)
def do_training_epoch(self):
"""Do a single training epoch.
This iterates through all batches in training dataset, for each
calculating the gradient of the estimated loss given the batch with
respect to all the model parameters and then updates the model
parameters according to the learning rule.
"""
y_idx = numpy.argmax(y, axis=1)
t_idx = numpy.argmax(t, axis=1)
rval = numpy.equal(y_idx, t_idx)
return rval
for inputs_batch, targets_batch in self.train_dataset:
activations = self.model.fprop(inputs_batch)
grads_wrt_outputs = self.cost.grad(activations[-1], targets_batch)
grads_wrt_params = self.model.grads_wrt_params(
activations, grads_wrt_outputs)
self.learning_rule.update_params(grads_wrt_params)
@staticmethod
def compute_prior_costs(model, l1_weight, l2_weight):
def eval_monitors(self, dataset, label):
"""Evaluates the monitors for the given dataset.
Args:
dataset: Dataset to perform evaluation with.
label: Tag to add to end of monitor keys to identify dataset.
Returns:
OrderedDict of monitor values evaluated on dataset.
"""
Computes the cost contributions coming from parameter-dependent only
regularisation penalties
data_mon_vals = OrderedDict([(key + label, 0.) for key
in self.data_monitors.keys()])
for inputs_batch, targets_batch in dataset:
activations = self.model.fprop(inputs_batch)
for key, data_monitor in self.data_monitors.items():
data_mon_vals[key + label] += data_monitor(
activations[-1], targets_batch)
for key, data_monitor in self.data_monitors.items():
data_mon_vals[key + label] /= dataset.num_batches
return data_mon_vals
def get_epoch_stats(self):
"""Computes training statistics for an epoch.
Returns:
An OrderedDict with keys corresponding to the statistic labels and
values corresponding to the value of the statistic.
"""
assert isinstance(model, MLP), (
"Expected model to be a subclass of 'mlp.layers.MLP'"
" class but got %s " % type(model)
)
epoch_stats = OrderedDict()
epoch_stats.update(self.eval_monitors(self.train_dataset, '(train)'))
if self.valid_dataset is not None:
epoch_stats.update(self.eval_monitors(
self.valid_dataset, '(valid)'))
epoch_stats['cost(param)'] = self.model.params_cost()
return epoch_stats
l1_cost, l2_cost = 0, 0
for i in xrange(0, len(model.layers)):
params = model.layers[i].get_params()
for param in params:
if l2_weight > 0:
l2_cost += 0.5 * l2_weight * numpy.sum(param**2)
if l1_weight > 0:
l1_cost += l1_weight * numpy.sum(numpy.abs(param))
def log_stats(self, epoch, epoch_time, stats):
"""Outputs stats for a training epoch to a logger.
return l1_cost, l2_cost
Args:
epoch (int): Epoch counter.
epoch_time: Time taken in seconds for the epoch to complete.
stats: Monitored stats for the epoch.
"""
logger.info('Epoch {0}: {1:.1f}s to complete\n {2}'.format(
epoch, epoch_time,
', '.join(['{0}={1:.2e}'.format(k, v) for (k, v) in stats.items()])
))
def train(self, num_epochs, stats_interval=5):
"""Trains a model for a set number of epochs.
class SGDOptimiser(Optimiser):
def __init__(self, lr_scheduler,
dp_scheduler=None,
l1_weight=0.0,
l2_weight=0.0):
Args:
            num_epochs: Number of epochs (complete passes through training
dataset) to train for.
stats_interval: Training statistics will be recorded and logged
every `stats_interval` epochs.
super(SGDOptimiser, self).__init__()
assert isinstance(lr_scheduler, LearningRateScheduler), (
"Expected lr_scheduler to be a subclass of 'mlp.schedulers.LearningRateScheduler'"
" class but got %s " % type(lr_scheduler)
)
self.lr_scheduler = lr_scheduler
self.dp_scheduler = dp_scheduler
self.l1_weight = l1_weight
self.l2_weight = l2_weight
def train_epoch(self, model, train_iterator, learning_rate):
assert isinstance(model, MLP), (
"Expected model to be a subclass of 'mlp.layers.MLP'"
" class but got %s " % type(model)
)
assert isinstance(train_iterator, DataProvider), (
"Expected iterator to be a subclass of 'mlp.dataset.DataProvider'"
" class but got %s " % type(train_iterator)
)
acc_list, nll_list = [], []
for x, t in train_iterator:
# get the prediction
if self.dp_scheduler is not None:
y = model.fprop_dropout(x, self.dp_scheduler)
else:
y = model.fprop(x)
# compute the cost and grad of the cost w.r.t y
cost = model.cost.cost(y, t)
cost_grad = model.cost.grad(y, t)
# do backward pass through the model
model.bprop(cost_grad, self.dp_scheduler)
#update the model, here we iterate over layers
#and then over each parameter in the layer
effective_learning_rate = learning_rate / x.shape[0]
for i in xrange(0, len(model.layers)):
params = model.layers[i].get_params()
grads = model.layers[i].pgrads(inputs=model.activations[i],
deltas=model.deltas[i + 1],
l1_weight=self.l1_weight,
l2_weight=self.l2_weight)
uparams = []
for param, grad in zip(params, grads):
param = param - effective_learning_rate * grad
uparams.append(param)
model.layers[i].set_params(uparams)
nll_list.append(cost)
acc_list.append(numpy.mean(self.classification_accuracy(y, t)))
#compute the prior penalties contribution (parameter dependent only)
prior_costs = Optimiser.compute_prior_costs(model, self.l1_weight, self.l2_weight)
training_cost = numpy.mean(nll_list) + sum(prior_costs)
return training_cost, numpy.mean(acc_list)
def train(self, model, train_iterator, valid_iterator=None):
converged = False
cost_name = model.cost.get_name()
tr_stats, valid_stats = [], []
# do the initial validation
train_iterator.reset()
tr_nll, tr_acc = self.validate(model, train_iterator, self.l1_weight, self.l2_weight)
logger.info('Epoch %i: Training cost (%s) for initial model is %.3f. Accuracy is %.2f%%'
% (self.lr_scheduler.epoch, cost_name, tr_nll, tr_acc * 100.))
tr_stats.append((tr_nll, tr_acc))
if valid_iterator is not None:
valid_iterator.reset()
valid_nll, valid_acc = self.validate(model, valid_iterator, self.l1_weight, self.l2_weight)
logger.info('Epoch %i: Validation cost (%s) for initial model is %.3f. Accuracy is %.2f%%'
% (self.lr_scheduler.epoch, cost_name, valid_nll, valid_acc * 100.))
valid_stats.append((valid_nll, valid_acc))
while not converged:
train_iterator.reset()
tstart = time.clock()
tr_nll, tr_acc = self.train_epoch(model=model,
train_iterator=train_iterator,
learning_rate=self.lr_scheduler.get_rate())
tstop = time.clock()
tr_stats.append((tr_nll, tr_acc))
logger.info('Epoch %i: Training cost (%s) is %.3f. Accuracy is %.2f%%'
% (self.lr_scheduler.epoch + 1, cost_name, tr_nll, tr_acc * 100.))
vstart = time.clock()
if valid_iterator is not None:
valid_iterator.reset()
valid_nll, valid_acc = self.validate(model, valid_iterator,
self.l1_weight, self.l2_weight)
logger.info('Epoch %i: Validation cost (%s) is %.3f. Accuracy is %.2f%%'
% (self.lr_scheduler.epoch + 1, cost_name, valid_nll, valid_acc * 100.))
self.lr_scheduler.get_next_rate(valid_acc)
valid_stats.append((valid_nll, valid_acc))
else:
self.lr_scheduler.get_next_rate(None)
vstop = time.clock()
train_speed = train_iterator.num_examples_presented() / (tstop - tstart)
valid_speed = valid_iterator.num_examples_presented() / (vstop - vstart)
tot_time = vstop - tstart
#pps = presentations per second
logger.info("Epoch %i: Took %.0f seconds. Training speed %.0f pps. "
"Validation speed %.0f pps."
% (self.lr_scheduler.epoch, tot_time, train_speed, valid_speed))
# we stop training when learning rate, as returned by lr scheduler, is 0
# this is implementation dependent and depending on lr schedule could happen,
# for example, when max_epochs has been reached or if the progress between
# two consecutive epochs is too small, etc.
converged = (self.lr_scheduler.get_rate() == 0)
return tr_stats, valid_stats
Returns:
Tuple with first value being an array of training run statistics
and the second being a dict mapping the labels for the statistics
recorded to their column index in the array.
"""
run_stats = []
for epoch in range(1, num_epochs + 1):
start_time = time.clock()
self.do_training_epoch()
epoch_time = time.clock() - start_time
if epoch % stats_interval == 0:
stats = self.get_epoch_stats()
self.log_stats(epoch, epoch_time, stats)
run_stats.append(stats.values())
return np.array(run_stats), {k: i for i, k in enumerate(stats.keys())}

View File

@ -0,0 +1,451 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "b167e6e2-05e0-4a4b-a6cc-47cab1c728b4"
}
},
"source": [
"# Introduction\n",
"\n",
"## Getting started with Jupyter notebooks\n",
"\n",
"The majority of your work in this course will be done using Jupyter notebooks so we will here introduce some of the basics of the notebook system. If you are already comfortable using notebooks or just would rather get on with some coding feel free to [skip straight to the exercises below](#Exercises).\n",
"\n",
"*Note: Jupyter notebooks are also known as IPython notebooks. The Jupyter system now supports languages other than Python [hence the name was changed to make it more language agnostic](https://ipython.org/#jupyter-and-the-future-of-ipython) however IPython notebook is still commonly used.*\n",
"\n",
"### Jupyter basics: the server, dashboard and kernels\n",
"\n",
"In launching this notebook you will have already come across two of the other key components of the Jupyter system - the notebook *server* and *dashboard* interface.\n",
"\n",
"We began by starting a notebook server instance in the terminal by running\n",
"\n",
"```\n",
"jupyter notebook\n",
"```\n",
"\n",
"This will have begun printing a series of log messages to terminal output similar to\n",
"\n",
"```\n",
"$ jupyter notebook\n",
"[I 08:58:24.417 NotebookApp] Serving notebooks from local directory: ~/mlpractical\n",
"[I 08:58:24.417 NotebookApp] 0 active kernels\n",
"[I 08:58:24.417 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/\n",
"```\n",
"\n",
"The last message included here indicates the URL the application is being served at. The default behaviour of the `jupyter notebook` command is to open a tab in a web browser pointing to this address after the server has started up. The server can be launched without opening a browser window by running `jupyter notebook --no-browser`. This can be useful for example when running a notebook server on a remote machine over SSH. Descriptions of various other command options can be found by displaying the command help page using\n",
"\n",
"```\n",
"juptyer notebook --help\n",
"```\n",
"\n",
"While the notebook server is running it will continue printing log messages to terminal it was started from. Unless you detach the process from the terminal session you will need to keep the session open to keep the notebook server alive. If you want to close down a running server instance from the terminal you can use `Ctrl+C` - this will bring up a confirmation message asking you to confirm you wish to shut the server down. You can either enter `y` or skip the confirmation by hitting `Ctrl+C` again.\n",
"\n",
"When the notebook application first opens in your browser you are taken to the notebook *dashboard*. This will appear something like this\n",
"\n",
"<img src='res/jupyter-dashboard.png' />\n",
"\n",
"The dashboard above is showing the `Files` tab, a list of files in the directory the notebook server was launched from. We can navigate in to a sub-directory by clicking on a directory name and back up to the parent directory by clicking the `..` link. An important point to note is that the top-most level that you will be able to navigate to is the directory you run the server from. This is a security feature and generally you should try to limit the access the server has by launching it in the highest level directory which gives you access to all the files you need to work with.\n",
"\n",
"As well as allowing you to launch existing notebooks, the `Files` tab of the dashboard also allows new notebooks to be created using the `New` drop-down on the right. It can also perform basic file-management tasks such as renaming and deleting files (select a file by checking the box alongside it to bring up a context menu toolbar).\n",
"\n",
"In addition to opening notebook files, we can also edit text files such as `.py` source files, directly in the browser by opening them from the dashboard. The in-built text-editor is less-featured than a full IDE but is useful for quick edits of source files and previewing data files.\n",
"\n",
"The `Running` tab of the dashboard gives a list of the currently running notebook instances. This can be useful to keep track of which notebooks are still running and to shutdown (or reopen) old notebook processes when the corresponding tab has been closed. In Jupyter parlance, the Python process associated with a notebook is often termed a *kernel* although this is also sometimes used to refer to a slightly different [concept](http://jupyter.readthedocs.io/en/latest/projects/kernels.html).\n",
"\n",
"### The notebook interface\n",
"\n",
"The top of your notebook window should appear something like this:\n",
"\n",
"<img src='res/jupyter-notebook-interface.png' />\n",
"\n",
"The name of the current notebook is displayed at the top of the page and can be edited by clicking on the text of the name. Displayed alongside this is an indication of the last manual *checkpoint* of the notebook file. On-going changes are auto-saved at regular intervals; the check-point mechanism is mainly meant as a way to recover an earlier version of a notebook after making unwanted changes. Note the default system only currently supports storing a single previous checkpoint despite the `Revert to checkpoint` dropdown under the `File` menu perhaps suggesting otherwise.\n",
"\n",
"As well as having options to save and revert to checkpoints, the `File` menu also allows new notebooks to be created in same directory as the current notebook, a copy of the current notebook to be made and the ability to export the current notebook to various formats.\n",
"\n",
"The `Edit` menu contains standard clipboard functions as well as options for reorganising notebook *cells*. Cells are the basic units of notebooks, and can contain formatted text like the one you are reading at the moment or runnable code as we will see below. The `Edit` and `Insert` drop down menus offer various options for moving cells around the notebook, merging and splitting cells and inserting new ones, while the `Cell` menu allow running of code cells and changing cell types.\n",
"\n",
"The `Kernel` menu offers some useful commands for managing the Python process (kernel) running in the notebook. In particular it provides options for interrupting a busy kernel (useful for example if you realise you have set a slow code cell running with incorrect parameters) and to restart the current kernel. This will cause all variables currently defined in the workspace to be lost but may be necessary to get the kernel back to a consistent state after polluting the namespace with lots of global variables or when trying to run code from an updated module and `reload` is failing to work. \n",
"\n",
"To the far right of the menu toolbar is a kernel status indicator. When a dark filled circle is shown this means the kernel is currently busy and any further code cell run commands will be queued to happen after the currently running cell has completed. An open status circle indicates the kernel is currently idle.\n",
"\n",
"The final row of the top notebook interface is the notebook toolbar which contains shortcut buttons to some common commands such as clipboard actions and cell / kernel management. If you are interested in learning more about the notebook user interface you may wish to run through the `User Interface Tour` under the `Help` menu drop down.\n",
"\n",
"### Markdown cells: easy text formatting\n",
"\n",
"This entire introduction has been written in what is termed a *Markdown* cell of a notebook. [Markdown](https://en.wikipedia.org/wiki/Markdown) is a lightweight markup language intended to be readable in plain-text. As you may wish to use Markdown cells to keep your own formatted notes in notebooks, a small sampling of the formatting syntax available is below (escaped mark-up on top and corresponding rendered output below that); there are many much more extensive syntax guides - for example [this cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet).\n",
"\n",
"---\n",
"\n",
"```\n",
"## Level 2 heading\n",
"### Level 3 heading\n",
"\n",
"*Italicised* and **bold** text.\n",
"\n",
" * bulleted\n",
" * lists\n",
" \n",
"and\n",
"\n",
" 1. enumerated\n",
" 2. lists\n",
"\n",
"Inline maths $y = mx + c$ using [MathJax](https://www.mathjax.org/) as well as display style\n",
"\n",
"$$ ax^2 + bx + c = 0 \\qquad \\Rightarrow \\qquad x = \\frac{-b \\pm \\sqrt{b^2 - 4ac}}{2a} $$\n",
"```\n",
"---\n",
"\n",
"## Level 2 heading\n",
"### Level 3 heading\n",
"\n",
"*Italicised* and **bold** text.\n",
"\n",
" * bulleted\n",
" * lists\n",
" \n",
"and\n",
"\n",
" 1. enumerated\n",
" 2. lists\n",
"\n",
"Inline maths $y = mx + c$ using [MathJax]() as well as display maths\n",
"\n",
"$$ ax^2 + bx + c = 0 \\qquad \\Rightarrow \\qquad x = \\frac{-b \\pm \\sqrt{b^2 - 4ac}}{2a} $$\n",
"\n",
"---\n",
"\n",
"We can also directly use HTML tags in Markdown cells to embed rich content such as images and videos.\n",
"\n",
"---\n",
"```\n",
"<img src=\"http://placehold.it/350x150\" />\n",
"```\n",
"---\n",
"\n",
"<img src=\"http://placehold.it/350x150\" />\n",
"\n",
"---\n",
"\n",
" \n",
"### Code cells: in browser code execution\n",
"\n",
"Up to now we have not seen any runnable code. An example of a executable code cell is below. To run it first click on the cell so that it is highlighted, then either click the <i class=\"fa-step-forward fa\"></i> button on the notebook toolbar, go to `Cell > Run Cells` or use the keyboard shortcut `Ctrl+Enter`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [],
"source": [
"from __future__ import print_function\n",
"import sys\n",
"\n",
"print('Hello world!')\n",
"print('Alarming hello!', file=sys.stderr)\n",
"print('Hello again!')\n",
"'And again!'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This example shows the three main components of a code cell.\n",
"\n",
"The most obvious is the input area. This (unsuprisingly) is used to enter the code to be run which will be automatically syntax highlighted.\n",
"\n",
"To the immediate left of the input area is the execution indicator / counter. Before a code cell is first run this will display `In [ ]:`. After the cell is run this is updated to `In [n]:` where `n` is a number corresponding to the current execution counter which is incremented whenever any code cell in the notebook is run. This can therefore be used to keep track of the relative order in which cells were last run. There is no fundamental requirement to run cells in the order they are organised in the notebook, though things will usually be more readable if you keep things in roughly in order!\n",
"\n",
"Immediately below the input area is the output area. This shows any output produced by the code in the cell. This is dealt with a little bit confusingly in the current Jupyter version. At the top any output to [`stdout`](https://en.wikipedia.org/wiki/Standard_streams#Standard_output_.28stdout.29) is displayed. Immediately below that output to [`stderr`](https://en.wikipedia.org/wiki/Standard_streams#Standard_error_.28stderr.29) is displayed. All of the output to `stdout` is displayed together even if there has been output to `stderr` between as shown by the suprising ordering in the output here. \n",
"\n",
"The final part of the output area is the *display* area. By default this will just display the returned output of the last Python statement as would usually be the case in a (I)Python interpreter run in a terminal. What is displayed for a particular object is by default determined by its special `__repr__` method e.g. for a string it is just the quote enclosed value of the string itself.\n",
"\n",
"### Useful keyboard shortcuts\n",
"\n",
"There are a wealth of keyboard shortcuts available in the notebook interface. For an exhaustive list see the `Keyboard Shortcuts` option under the `Help` menu. We will cover a few of those we find most useful below.\n",
"\n",
"Shortcuts come in two flavours: those applicable in *command mode*, active when no cell is currently being edited and indicated by a blue highlight around the current cell; those applicable in *edit mode* when the content of a cell is being edited, indicated by a green current cell highlight.\n",
"\n",
"In edit mode of a code cell, two of the more generically useful keyboard shortcuts are offered by the `Tab` key.\n",
"\n",
" * Pressing `Tab` a single time while editing code will bring up suggested completions of what you have typed so far. This is done in a scope aware manner so for example typing `a` + `[Tab]` in a code cell will come up with a list of objects beginning with `a` in the current global namespace, while typing `np.a` + `[Tab]` (assuming `import numpy as np` has been run already) will bring up a list of objects in the root NumPy namespace beginning with `a`.\n",
" * Pressing `Shift+Tab` once immediately after opening parenthesis of a function or method will cause a tool-tip to appear with the function signature (including argument names and defaults) and its docstring. Pressing `Shift+Tab` twice in succession will cause an expanded version of the same tooltip to appear, useful for longer docstrings. Pressing `Shift+Tab` four times in succession will cause the information to be instead displayed in a pager docked to bottom of the notebook interface which stays attached even when making further edits to the code cell and so can be useful for keeping documentation visible when editing e.g. to help remember the name of arguments to a function and their purposes.\n",
"\n",
"A series of useful shortcuts available in both command and edit mode are `[modifier]+Enter` where `[modifier]` is one of `Ctrl` (run selected cell), `Shift` (run selected cell and select next) or `Alt` (run selected cell and insert a new cell after).\n",
"\n",
"A useful command mode shortcut to know about is the ability to toggle line numbers on and off for a cell by pressing `L` which can be useful when trying to diagnose stack traces printed when an exception is raised or when referring someone else to a section of code.\n",
" \n",
"### Magics\n",
"\n",
"There are a range of *magic* commands in IPython notebooks, than provide helpful tools outside of the usual Python syntax. A full list of the inbuilt magic commands is given [here](http://ipython.readthedocs.io/en/stable/interactive/magics.html), however three that are particularly useful for this course:\n",
"\n",
" * [`%%timeit`](http://ipython.readthedocs.io/en/stable/interactive/magics.html?highlight=matplotlib#magic-timeit) Put at the beginning of a cell to time its execution and print the resulting timing statistics.\n",
" * [`%precision`](http://ipython.readthedocs.io/en/stable/interactive/magics.html?highlight=matplotlib#magic-precision) Set the precision for pretty printing of floating point values and NumPy arrays.\n",
" * [`%debug`](http://ipython.readthedocs.io/en/stable/interactive/magics.html?highlight=matplotlib#magic-debug) Activates the interactive debugger in a cell. Run after an exception has been occured to help diagnose the issue.\n",
" \n",
"### Plotting with `matplotlib`\n",
"\n",
"When setting up your environment one of the dependencies we asked you to install was `matplotlib`. This is an extensive plotting and data visualisation library which is tightly integrated with NumPy and Jupyter notebooks.\n",
"\n",
"When using `matplotlib` in a notebook you should first run the [magic command](http://ipython.readthedocs.io/en/stable/interactive/magics.html?highlight=matplotlib)\n",
"\n",
"```\n",
"%matplotlib inline\n",
"```\n",
"\n",
"This will cause all plots to be automatically displayed as images in the output area of the cell they are created in. Below we give a toy example of plotting two sinusoids using `matplotlib` to show case some of the basic plot options. To see the output produced select the cell and then run it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"nbpresent": {
"id": "2bced39d-ae3a-4603-ac94-fbb6a6283a96"
}
},
"outputs": [],
"source": [
"# use the matplotlib magic to specify to display plots inline in the notebook\n",
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"\n",
"# generate a pair of sinusoids\n",
"x = np.linspace(0., 2. * np.pi, 100)\n",
"y1 = np.sin(x)\n",
"y2 = np.cos(x)\n",
"\n",
"# produce a new figure object with a defined (width, height) in inches\n",
"fig = plt.figure(figsize=(8, 4))\n",
"# add a single axis to the figure\n",
"ax = fig.add_subplot(111)\n",
"# plot the two sinusoidal traces on the axis, adjusting the line width\n",
"# and adding LaTeX legend labels\n",
"ax.plot(x, y1, linewidth=2, label=r'$\\sin(x)$')\n",
"ax.plot(x, y2, linewidth=2, label=r'$\\cos(x)$')\n",
"# set the axis labels\n",
"ax.set_xlabel('$x$', fontsize=16)\n",
"ax.set_ylabel('$y$', fontsize=16)\n",
"# force the legend to be displayed\n",
"ax.legend()\n",
"# adjust the limits of the horizontal axis\n",
"ax.set_xlim(0., 2. * np.pi)\n",
"# make a grid be displayed in the axis background\n",
"ax.grid('on')"
]
},
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "533c10f0-95ba-4684-a72d-fd52cef0d007"
}
},
"source": [
"# Exercises\n",
"\n",
"Today's exercises are meant to allow you to get some initial familiarisation with the `mlp` package and how data is provided to the learning functions. Next week onwards, we will follow with the material covered in lectures.\n",
"\n",
"## Data providers\n",
"\n",
"Open (in the browser) the [`mlp.data_providers`](../../edit/mlp/data_providers.py) module. Have a look through the code and comments, then follow to the exercises.\n",
"\n",
"### Exercise 1 \n",
"\n",
"The `MNISTDataProvider` iterates over input images and target classes (digit IDs) from the [MNIST database of handwritten digit images](http://yann.lecun.com/exdb/mnist/), a common supervised learning benchmark task. Using the data provider and `matplotlib` we can for example iterate over the first couple of images in the dataset and display them using the following code:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"nbpresent": {
"id": "978c1095-a9ce-4626-a113-e0be5fe51ecb"
}
},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import matplotlib.gridspec as gridspec\n",
"import matplotlib.cm as cm\n",
"import mlp.data_providers as data_providers\n",
"\n",
"def show_single_image(img, fig_size=(2, 2)):\n",
" fig = plt.figure(figsize=fig_size)\n",
" ax = fig.add_subplot(111)\n",
" ax.imshow(img, cmap=cm.Greys_r)\n",
" ax.axis('off')\n",
" plt.show()\n",
" return fig, ax\n",
"\n",
"# An example for a single MNIST image\n",
"mnist_dp = data_providers.MNISTDataProvider(\n",
" which_set='valid', batch_size=1, max_num_batches=2, shuffle_order=True)\n",
"\n",
"for inputs, target in mnist_dp:\n",
" show_single_image(inputs.reshape(28, 28))\n",
" print('Image target: {0}'.format(target))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Generally we will want to deal with batches of multiple images i.e. `batch_size > 1`. As a first task:\n",
"\n",
" * Using MNISTDataProvider, write code that iterates over the first 5 minibatches of size 100 data-points. \n",
" * Display each batch of MNIST digits in a $10\\times10$ grid of images. \n",
" \n",
"**Notes**:\n",
"\n",
" * Images are returned from the provider as tuples of numpy arrays `(inputs, targets)`. The `inputs` matrix has shape `(batch_size, input_dim)` while the `targets` array is of shape `(batch_size,)`, where `batch_size` is the number of data points in a single batch and `input_dim` is dimensionality of the input features. \n",
" * Each input data-point (image) is stored as a 784 dimensional vector of pixel intensities normalised to $[0, 1]$ from inital integer values in $[0, 255]$. However, the original spatial domain is two dimensional, so before plotting you will need to reshape the one dimensional input arrays in to two dimensional arrays 2D (MNIST images have the same height and width dimensions)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# write your code here for iterating over five batches of 100 data points each and displaying as 10x10 grids\n",
"\n",
"def show_batch_of_images(img_batch):\n",
" raise NotImplementedError('Write me!')"
]
},
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "d2d525de-5d5b-41d5-b2fb-a83874dba986"
}
},
"source": [
"### Exercise 2\n",
"\n",
"`MNISTDataProvider` as `targets` currently returns a vector of integers, each element in this vector represents an the integer ID of the class the corresponding data-point represents. \n",
"\n",
"For training of neural networks a 1-of-K representation of multi-class targets is more useful. Instead of representing class identity by an integer ID, for each data point a vector of length equal to the number of classes is created, will all elements zero except for the element corresponding to the class ID. \n",
"\n",
"For instance, given a batch of 5 integer targets `[2, 2, 0, 1, 0]` and assuming there are 3 different classes \n",
"the corresponding 1-of_K encoded targets would be\n",
"```\n",
"[[0, 0, 1],\n",
" [0, 0, 1],\n",
" [1, 0, 0],\n",
" [0, 1, 0],\n",
" [1, 0, 0]]\n",
"```\n",
"\n",
" * Implement the `to_one_of_k` method of `MNISTDataProvider` class. \n",
" * Uncomment and modify an appropriate line in the `next` method, so the raw targets are converted to 1-of-K coding. \n",
" * Test your code by running the the cell below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"mnist_dp = data_providers.MNISTDataProvider(\n",
" which_set='valid', batch_size=5, max_num_batches=5, shuffle_order=False)\n",
"\n",
"for inputs, targets in mnist_dp:\n",
" assert np.all(targets.sum(-1) == 1.)\n",
" assert np.all(targets >= 0.)\n",
" assert np.all(targets <= 1.)\n",
" print(targets)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true,
"nbpresent": {
"id": "471093b7-4b94-4295-823a-5285c79d3119"
}
},
"source": [
"### Exercise 3\n",
"\n",
"Write your own data provider `MetOfficeDataProvider` that wraps [weather data for south Scotland](http://www.metoffice.gov.uk/hadobs/hadukp/data/daily/HadSSP_daily_qc.txt). A previous version of this data has been stored in `data` directory for your convenience.\n",
"\n",
"The data is organised in the text file as a table, with the first two columns indexing the year and month of the readings and the following 31 columns giving daily precipitation values for the corresponding month. As not all months have 31 days some of entries correspond to non-existing days. These values are indicated by a non-physical value of `-99.9`.\n",
"\n",
" * You should read all of the data from the file (`np.loadtxt` may be useful for this) and then filter out the `-99.9` values and collapse the table to one-dimensional array corresponding to a sequence of daily measurements for the whole period data is available for.\n",
" * A common initial preprocessing step in machine learning tasks is to normalise data so that it has zero mean and a standard deviation of one. Normalise the data sequence so that its overall mean is zero and standard deviation one.\n",
" * Each data point in the data provider should correspond to a window of length specified in the `__init__` method as `window_size` of this contiguous data sequence, with the model inputs being the first `window_size - 1` elements of the window and the target output being the last element of the window. For example if the original data sequence was `[1, 2, 3, 4, 5, 6]` and `window_size=3` then `input, target` pairs iterated over by the data provider should be\n",
" ```\n",
" [1, 2], 3\n",
" [4, 5], 6\n",
" ```\n",
" * **Extension**: Have the data provider instead overlapping windows of the sequence so that more training data instances are produced. For example for the sequence `[1, 2, 3, 4, 5, 6]` the corresponding `input, target` pairs would be\n",
"\n",
"```\n",
"[1, 2], 3\n",
"[2, 3], 4\n",
"[3, 4], 5\n",
"[4, 5], 6\n",
"```\n",
" * Test your code by running the cell below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"nbpresent": {
"id": "c8553a56-9f25-4198-8a1a-d7e9572b4382"
}
},
"outputs": [],
"source": [
"for window_size in [2, 5, 10]:\n",
" met_dp = data_providers.MetOfficeDataProvider(\n",
" window_size=window_size, batch_size=5, max_num_batches=5, shuffle_order=False)\n",
" for inputs, targets in met_dp:\n",
" assert inputs.shape == (5, window_size - 1)\n",
" assert targets.shape == (5, )"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.12"
}
},
"nbformat": 4,
"nbformat_minor": 0
}

View File

Binary image file modified (before: 200 KiB, after: 200 KiB).

New binary image file added (21 KiB); binary file not shown.

New binary image file added (29 KiB); binary file not shown.

View File

Binary image file modified (before: 69 KiB, after: 69 KiB).

View File

Binary image file modified (before: 62 KiB, after: 62 KiB).

View File

Binary image file modified (before: 61 KiB, after: 61 KiB).

View File

Binary image file modified (before: 73 KiB, after: 73 KiB).

13 setup.py Normal file
View File

@ -0,0 +1,13 @@
""" Setup script for mlp package. """
from setuptools import setup
setup(
name = "mlp",
author = "Pawel Swietojanski, Steve Renals and Matt Graham",
description = ("Neural network framework for University of Edinburgh "
"School of Informatics Machine Learning Practical course."),
url = "https://github.com/CSTR-Edinburgh/mlpractical",
packages=['mlp']
)
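
With this script in place, one typical way to make the `mlp` package importable from anywhere is an editable install, for example by running `pip install -e .` from the repository root inside the course Python environment (a suggested usage, not necessarily the procedure prescribed by the course setup notes).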