mlpractical/06_MLP_Coursework2_Introduction.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Introduction to Coursework #2\n",
    "\n",
    "This notebook contains some extended versions of hints and some code examples that are suppose to make it easier to proceed with certain tasks in the Coursework #2.\n",
    "\n",
    "## Virtual environments\n",
    "\n",
    "Before you proceed onwards, remember to activate your virtual environment by typing `activate_mlp` or `source ~/mlpractical/venv/bin/activate` (or if you did the original install the \"comfy way\" type: `workon mlpractical`).\n",
    "\n",
    "## Syncing the git repository\n",
    "\n",
    "Look <a href=\"https://github.com/CSTR-Edinburgh/mlpractical/blob/master/gitFAQ.md\">here</a> for more details. But in short, we recommend to create a separate branch for the coursework, as follows:\n",
    "\n",
    "1. Enter the mlpractical directory `cd ~/mlpractical/repo-mlp`\n",
    "2. List the branches and check which are currently active by typing: `git branch`\n",
    "3. If you have followed our recommendations, you should be in the `lab5` branch, please commit your local changes to the repo index by typing:\n",
    "```\n",
    "git commit -am \"finished lab5\"\n",
    "```\n",
    "4. Now you can switch to `master` branch by typing: \n",
    "```\n",
    "git checkout master\n",
    " ```\n",
    "5. To update the repository (note, assuming master does not have any conflicts), if there are some, have a look <a href=\"https://github.com/CSTR-Edinburgh/mlpractical/blob/master/gitFAQ.md\">here</a>\n",
    "```\n",
    "git pull\n",
    "```\n",
    "6. And now, create the new branch & switch to it by typing:\n",
    "```\n",
    "git checkout -b coursework2\n",
    "```\n",
    "\n",
    "# Store the intermediate results (check-pointing and pickling)\n",
    "\n",
    "Once you have finished a certain task it is a good idea to check-point your current notebook's status (logs, plots and whatever else has been stored in the notebook). By doing this, you can always revert to this state later when necessary (without rerunning experimens). You can do this by going to menus `File->Save and Checkpoint` and `File->Revert to Checkpoint`.\n",
    "\n",
    "The other good practice would be to dump to disk models and produced statistics. You can easily do it in python by using pickles, as in the following example."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```python \n",
    "import cPickle\n",
    "\n",
    "#...\n",
    "(train_stats, valid_stats) = optimiser.train(model, train_dp, valid_dp)\n",
    "test_stats = optimiser.validate(model, test_dp)\n",
    "\n",
    "#this one saves the model, you can save like this any object \n",
    "#in python, like tuples, dictionaries, lists, etc.\n",
    "with open('task1_mdl.pkl','wb') as f:\n",
    "    cPickle.dump(model, f)\n",
    "    \n",
    "#then, to load you can type \n",
    "with open('task1_mdl.pkl','r') as f:\n",
    "    model2 = cPickle.load(f)\n",
    "\n",
    "#and you can use it again (if needed) without retraining\n",
    "test_stats2 = optimiser.validate(model2, test_dp)\n",
    "    \n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Notes on numpy and tensors\n",
    "\n",
    "This is a remainder on some numpy conventions you may find useful (especially with 2nd part of the coursework involving implementation of convolution and pooling layers).\n",
    "\n",
    "Links to numpy indexing:\n",
    "* [Numpy (advanced) indexing](http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html)\n",
    "* [More on indexing of multi-dimensional arrays](http://docs.scipy.org/doc/numpy/user/basics.indexing.html)\n",
    "\n",
    "Below we list some functions, special flags, etc. of (potentially) interesting functions - you are not expected to use them all - we just outline the (non-obvious) functionality you may find useful. Search numpy documentation to get an exact information about them. \n",
    "\n",
    "* `numpy.sum` - just notice axis arguments allows to specify a sequence of axes, hence, the reduction (here sum) can be performed along arbitrary dimensions.\n",
    "* `numpy.amax` - the same as with sum\n",
    "* `numpy.transpose` - can specify which axes you want to get transposed in a tensor\n",
    "* `numpy.argmax` - gives you the argument (index) of the maximum value in a tensor\n",
    "* `numpy.flatten` - collapses the n-dimensional tensor into vector (copy)\n",
    "* `numpy.ravel` - collapses the n-dimensional tensor into vector (creates a view)\n",
    "* `numpy.reshape` - allows to reshape tensor into another (valid from data perspective) tensor (matrix, vector) with different shape (but the same number of total elements)\n",
    "* `numpy.rot90` - rotate a matrix by 90 (or multiply of 90) degrees counter-clockwise\n",
    "* `numpy.newaxis` - adds an axis with dimension 1 (handy for keeping tensor shapes compatible with expected broadcasting)\n",
    "* `numpy.rollaxis` - allows to shuffle certain axis in a tensor\n",
    "* `slice` - allows to specify a range (can be used when indexing numpy arrays)\n",
    "* `ellipsis` - allows to pick an arbitrary number of dimensions (inferred)\n",
    "* `max_and_argmax` - `(mlp.layers)` - an auxiliary function we have provided to get both max and argmax of a tensor across an arbitrary axes, possibly in the format preserving tensor's original shape (this is not trivial to do using numpy *out-of-the-shelf* functionality).\n",
    "\n",
    "Below cells contain some simple examples showing basics behind tensor manipulation in numpy (go through them if you haven't used numpy in this context before)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "%load_ext autoreload\n",
    "%load_ext cython\n",
    "\n",
    "from mlp.dataset import MNISTDataProvider\n",
    "import numpy\n",
    "\n",
    "rng = numpy.random.RandomState([2015, 11, 11])\n",
    "\n",
    "# we create MNISTDataProvider with 'conv_reshape' argument set to True, \n",
    "# this will give us a 4D tensor of shape\n",
    "# (batch_size, num_input_channels, width, height)\n",
    "# compared to standard fully connected one\n",
    "# (batch_size, width*height)\n",
    "batch_size = 2\n",
    "mdp = MNISTDataProvider('valid', batch_size=batch_size, max_num_batches=1, randomize=False, conv_reshape=True)\n",
    "x, t = mdp.next()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "%pylab\n",
    "%matplotlib inline\n",
    "# this will print (2, 1, 28, 28) which is\n",
    "# (batch_size, num_input_channels, width, height)\n",
    "print x.shape\n",
    "# and this the flattened version used to date (2, 784)\n",
    "print x.reshape(batch_size, -1).shape\n",
    "\n",
    "# you can pick the first image in the usual way you index numpy arrays\n",
    "img1 = x[0]\n",
    "# or in full notation which is equivalent (numpy by default treats single \n",
    "# indexes as selecting everything for this dimension)\n",
    "img2 = x[0, :, :, :]\n",
    "#this will be true\n",
    "print numpy.allclose(img1, img2)\n",
    "\n",
    "# Notice, slicing like this is collapsing the leading dimension \n",
    "# (as we picked only 0-th element), so the below will give you\n",
    "# (1, 28, 28)\n",
    "print img1.shape\n",
    "print img2.shape\n",
    "\n",
    "# to keep this dimension, one can use numpy.newaxis, which will add one dimension of\n",
    "# size 1, as a result you get a tensor (1, 1, 28, 28) \n",
    "# one image (as expected), but with preserved ndims of the source 4D tensor \n",
    "# this can be handy as it can simplify assignments to the original tensor\n",
    "img1=x[0, numpy.newaxis]\n",
    "print img1.shape\n",
    "\n",
    "#Let assume you want to get a sum of pixel intensities in each image in a mini-batch, \n",
    "#you can of course iterate over images and compute sum for each separately, but you can also:\n",
    "\n",
    "#to get the sum of pixel for each image separately, you could write:\n",
    "# (which means, sum along axis 2 and 3 together)\n",
    "print numpy.sum(x, axis=(2,3))\n",
    "\n",
    "#notice, that any of other calls would do what we want as:\n",
    "#will print the total sum for all images\n",
    "print numpy.sum(x)\n",
    "\n",
    "#will print the sum along the last axes (the columns of the images)\n",
    "print numpy.sum(x, axis=-1)\n",
    "\n",
    "# finally, let us swap the 10x20 rectangle of pixels between images\n",
    "# in numpy you can of course perform an arbitrary slicing and assignment\n",
    "# of sub-matrices\n",
    "\n",
    "slice_x = slice(10, 20)\n",
    "slice_y = slice(5, 25)\n",
    "\n",
    "patch1 = x[0, :, slice_x, slice_y]\n",
    "patch2 = x[1, :, slice_x, slice_y]\n",
    "print patch1.shape, patch1.shape\n",
    "\n",
    "xc = numpy.array(x) #this will make a copy of x\n",
    "xc[0, :, slice_x, slice_y] = patch2\n",
    "xc[1, :, slice_x, slice_y] = patch1\n",
    "\n",
    "fig, ax = plt.subplots(2,2)\n",
    "ax[0, 0].imshow(x[0,0], cmap=cm.Greys_r)\n",
    "ax[0, 1].imshow(x[1,0], cmap=cm.Greys_r)\n",
    "ax[1, 0].imshow(xc[0,0], cmap=cm.Greys_r)\n",
    "ax[1, 1].imshow(xc[1,0], cmap=cm.Greys_r)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Verifying the gradients\n",
    "\n",
    "One can numerically compute the gradient using [finite differences](https://en.wikipedia.org/wiki/Finite_difference) method, that is, perturb the input arguments by some small value and then measure how this affects the function change:\n",
    "\n",
    "$\n",
    "f(x) = \\frac{f(x+\\epsilon) - f(x)}{\\epsilon}\n",
    "$\n",
    "\n",
    "Because $\\epsilon$ is usually very small (1e-4 or smaller) it is recommended (due to finite precision of numerical machines) to use the centred variant (which was implemented in `mlp.utils`):\n",
    "\n",
    "$\n",
    "f(x) = \\frac{f(x+\\epsilon) - f(x-\\epsilon)}{2\\epsilon}\n",
    "$\n",
    "\n",
    "The numerical gradient gives the good intution when something is wrong. But it should be treated with a bit of salt -  as one can easily find ill-conditioned cases where this test might fail - either due to numerical precision when gradients get really small, or other issues like discontinuities in transfer functions (ReLU, Maxout) where perturbing the inputs might cause the piecwise component to swap \"the border\". For instance, for ReLU assume $f(x) < 0$ by a some small margin in argument $x$ and the gradient is correctly set to 0. However, the finite difference quotient rule with some $\\epsilon$ such $f(x+\\epsilon) > 0$  will give non-zero numerical gradient. Anyway, this method remains a very useful in verifying whether the implemented forward and backward pasees are mutually correct.\n",
    "\n",
    "Below, you can find some examples on how one can use it, first for an arbitrary function and then short snippet on how to check the gradient backpropagated through layer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "%autoreload\n",
    "\n",
    "import numpy\n",
    "from mlp.utils import verify_gradient\n",
    "\n",
    "rng = numpy.random.RandomState([2015, 11, 11])\n",
    "\n",
    "#simple example\n",
    "def f1_correct_grad(x, **kwargs):\n",
    "    fval = x**2\n",
    "    fgrad = 2*x\n",
    "    return numpy.sum(fval), fgrad\n",
    "\n",
    "def f1_incorrect_grad(x, **kwargs):\n",
    "    fval = x**2\n",
    "    fgrad = x\n",
    "    return numpy.sum(fval), fgrad\n",
    "\n",
    "x = rng.uniform(-5, 5, (10,))\n",
    "\n",
    "#this one should be OK\n",
    "print verify_gradient(f=f1_correct_grad, x=x)\n",
    "# this one should raise an exception, as the computed gradient is wrong\n",
    "print verify_gradient(f=f1_incorrect_grad, x=x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also check the backprop implementation in the layer. Notice, it **does not** necessairly check whether your layer implementation is correct but rather if the gradient computation is correct, given forward pass computation. If you get the forward pass wrong, and somehow get gradients right w.r.t what forward pass is computing, the below check will not capture it (obviously). Contrary to normal scenraio where 32 floating point precision is sufficient, when checking gradients please make sure 64bit precision is used (or tune the tolerance)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "%autoreload\n",
    "\n",
    "from mlp.layers import Sigmoid\n",
    "from mlp.utils import verify_layer_gradient\n",
    "\n",
    "# keep it small, however, one can test it on some subset of MNIST\n",
    "idim = 10\n",
    "odim = 5\n",
    "\n",
    "x = rng.uniform(-2, 2, (20, idim))\n",
    "s = Sigmoid(idim=idim, odim=odim, rng=rng)\n",
    "verify_layer_gradient(s, x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Speeding up the code\n",
    "\n",
    "Convolution can be accelerated in many ways, one of them is the use of cython to write crucial bits in python (the one that involve heavy loop usage). \n",
    "\n",
    "* Use numpy as much as possible (which will use highly optimised looping, and possibly a form of BLAS implemented paralleism where possible)\n",
    "* Applying standard tricks to convolution (they boil down to more efficent use of BLAS routines (above) by loop unrolling - fewer operations on larger matrices, than more on smaller)\n",
    "* Speeding up the code by compiling pythonic c-functions (cython)\n",
    "\n",
    "\n",
    "## Using Cython for the crucial bottleneck pieces\n",
    "\n",
    "Cython will compile them to C and the code should be comparable in terms of efficiency to numpy using similar operations in numpy. Of course, one can only rely on numpy. Slicing numpy across many dimensions gets much more complicated than working than working with vectors and matrices and we do undersand those can be confusing for some people. Hence, we allow the basic implementation of convolutiona and/or pooling (with any penalty or preference from our side) to be loop only based (which is perhaps much easier to comprehend and debug).\n",
    "\n",
    "Below we give an example cython code for matrix-matrix dot function from the second tutorial so you can see the basic differences and compare obtained speeds. They give you all the necessary pattern needed to implement naive (reasonably efficient) convolution. Naive looping in (native) python is gonna be *very* slow.\n",
    "\n",
    "Some tutorials:\n",
    " * [Cython, language basics](http://docs.cython.org/src/userguide/language_basics.html#language-basics)\n",
    " * [Cython, basic tutorial](http://docs.cython.org/src/tutorial/cython_tutorial.html)\n",
    " * [Cython in ipython notebooks](http://docs.cython.org/src/quickstart/build.html)\n",
    " * [A tutorial on how to optimise the cython code](http://docs.cython.org/src/tutorial/numpy.html) (a working example is actually a simple convolution code, do not use it `as is`)\n",
    " \n",
    "\n",
    "Before you proceed, in case you do not have installed `cython` (it should be installed with scipy). But in case the below imports do not work, staying in the activated virtual environment type:\n",
    " \n",
    " ```\n",
    " pip install cython\n",
    " ```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#native pythonic implementation (as in the second tutorial, rather slow...)\n",
    "def my_dot_mat_mat(x, W):\n",
    "    I = x.shape[0]\n",
    "    J = x.shape[1]\n",
    "    K = W.shape[1]\n",
    "    assert (J == W.shape[0]), (\n",
    "        \"Number of columns in x expected to \"\n",
    "        \" to be the same as rows in W, got\"\n",
    "        \" %i != %i\" % (J, W.shape[0])\n",
    "    )\n",
    "    #allocate the output container\n",
    "    y = numpy.zeros((I, K))\n",
    "    \n",
    "    #implement matrix-matrix inner product here\n",
    "    for i in xrange(0, I):\n",
    "        for k in xrange(0, K):\n",
    "            for j in xrange(0, J):\n",
    "                y[i, k] += x[i, j] * W[j,k]\n",
    "                \n",
    "    return y"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "%%cython\n",
    "# this shows an example on how to build cython accelerated\n",
    "# function than one can later use a standard python function\n",
    "\n",
    "#notice, you need to specify all the imports again (so they are\n",
    "#available in the compiled verions of the code, when executed)\n",
    "import numpy\n",
    "cimport numpy\n",
    "\n",
    "def my_dot_mat_mat_cython(numpy.ndarray x, numpy.ndarray W):\n",
    "    cdef int I = x.shape[0]\n",
    "    cdef int J = x.shape[1]\n",
    "    cdef int K = W.shape[1]\n",
    "    \n",
    "    assert (J == W.shape[0]), (\n",
    "        \"Number of columns in x expected to \"\n",
    "        \" to be the same as rows in W, got\"\n",
    "        \" %i != %i\" % (J, W.shape[0])\n",
    "    )\n",
    "    \n",
    "    #allocate the output container and other variables\n",
    "    cdef numpy.ndarray y = numpy.zeros((I, K))\n",
    "    cdef int i, k, j\n",
    "    \n",
    "    #implement matrix-matrix inner product here\n",
    "    for i in xrange(0, I):\n",
    "        for k in xrange(0, K):\n",
    "            for j in xrange(0, J):\n",
    "                y[i, k] += x[i, j] * W[j,k]\n",
    "                \n",
    "    return y"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "%%cython\n",
    "# this shows an example on how to build cython accelerated\n",
    "# function than one can later call as usual from python\n",
    "\n",
    "#the optimisation relies on making an explicit c-buffers for data, \n",
    "#hence they can be accesses much quicker, notice, this is where \n",
    "#the real gain in speed comes from really\n",
    "\n",
    "#you need to specify all the imports again (so they are\n",
    "#available in the compiled verions of the code)\n",
    "import numpy\n",
    "cimport numpy\n",
    "\n",
    "#we specify the types our function is supposed to\n",
    "#be compiled with, float32 is sufficient for neural nets\n",
    "#you can use defalut 64bit precision if you want\n",
    "DTYPE = numpy.float32\n",
    "ctypedef numpy.float32_t DTYPE_t\n",
    "\n",
    "\n",
    "def my_dot_mat_mat_cython_optimised(numpy.ndarray[DTYPE_t, ndim=2] x, \n",
    "                                    numpy.ndarray[DTYPE_t, ndim=2] W):\n",
    "    \n",
    "    cdef int I = x.shape[0]\n",
    "    cdef int J = x.shape[1]\n",
    "    cdef int K = W.shape[1]\n",
    "    \n",
    "    assert (J == W.shape[0]), (\n",
    "        \"Number of columns in x expected to \"\n",
    "        \" to be the same as rows in W, got\"\n",
    "        \" %i != %i\" % (J, W.shape[0])\n",
    "    )\n",
    "    \n",
    "    #allocate the output container and other variables, notice, when allocating\n",
    "    #y we specify its type both for the buffer but also in numpy.zeros() function\n",
    "    cdef numpy.ndarray[DTYPE_t, ndim=2] y = numpy.zeros((I, K), dtype=DTYPE)\n",
    "    cdef int i, k, j\n",
    "    \n",
    "    #implement matrix-matrix inner product here\n",
    "    for i in xrange(0, I):\n",
    "        for k in xrange(0, K):\n",
    "            for j in xrange(0, J):\n",
    "                y[i, k] += x[i, j] * W[j,k]\n",
    "                \n",
    "    return y"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can optimise the code further as in the [linked](http://docs.cython.org/src/tutorial/numpy.html) tutorial. However, the above example seems to be a reasonable compromise for developing the code - it gives a reasonably accelerated code, with all the security checks one may expect to be existent under development (checking bounds of indices, wheter types of variables match, tracking overflows etc.). Look [here](http://docs.cython.org/src/reference/compilation.html) for more optimisation decorators one can use to speed things up.\n",
    "\n",
    "Below we do some benchmarks on each of the above functions. Notice huge speed-up from going from non-optimised cython code to optimised one (on my machine, 643ms -> 6.35ms - this is 2 orders!). It's still around two times slower than BLAS accelerated numpy.dot routine (non-cached result is around 3.3ms). But our method just benchmarks the dot product, operation that has been optimised incredibly well in numerical libraries. Of course, we **do not** want you to use this code for dot products and you should rely on functions provided by numpy (whenever reasonably possible). The above code was just given as an example how to produce much more efficient code with very small programming effort. In many scenarios (convolution is an example) the code is more complex than a single dot product and some looping is necessary anyway, especially when dealing with multi-dimensional tensors where atom operations using direct loop-based indexing may be much easier to comprehend (and debug) than a direct multi-dimensional manipulation of numpy tensors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#generate bit bigger matrices, to better evaluate timings\n",
    "\n",
    "# note, we explicitly cast them to float32 as it was the type cython\n",
    "# functions were compiled with, float32 is more than sufficient precision\n",
    "# for neural networks\n",
    "x = rng.uniform(-1, 1, (10, 1000)).astype(numpy.float32)\n",
    "W = rng.uniform(-0.3, 0.2, (1000, 100)).astype(numpy.float32)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "print 'native pyton my_dot timings:'\n",
    "%timeit -n10 my_dot_mat_mat(x, W)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "print 'cython my_dot timings:'\n",
    "%timeit -n10 my_dot_mat_mat_cython(x, W)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "print 'optimised cython my_dot timings:'\n",
    "%timeit -n10 my_dot_mat_mat_cython_optimised(x, W)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "print 'numpy.dot timings (with BLAS):'\n",
    "%timeit -n10 numpy.dot(x, W)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}