"This notebook contains some extended versions of hints and some code examples that are suppose to make it easier to proceed with certain tasks in the Coursework #2.\n",
"Before you proceed onwards, remember to activate your virtual environment by typing `activate_mlp` or `source ~/mlpractical/venv/bin/activate` (or if you did the original install the \"comfy way\" type: `workon mlpractical`).\n",
"Look <a href=\"https://github.com/CSTR-Edinburgh/mlpractical/blob/master/gitFAQ.md\">here</a> for more details. But in short, we recommend to create a separate branch for the coursework, as follows:\n",
"1. Enter the mlpractical directory `cd ~/mlpractical/repo-mlp`\n",
"2. List the branches and check which are currently active by typing: `git branch`\n",
"3. If you have followed our recommendations, you should be in the `lab5` branch, please commit your local changes to the repo index by typing:\n",
"```\n",
"git commit -am \"finished lab5\"\n",
"```\n",
"4. Now you can switch to `master` branch by typing: \n",
"```\n",
"git checkout master\n",
" ```\n",
"5. To update the repository (note, assuming master does not have any conflicts), if there are some, have a look <a href=\"https://github.com/CSTR-Edinburgh/mlpractical/blob/master/gitFAQ.md\">here</a>\n",
"```\n",
"git pull\n",
"```\n",
"6. And now, create the new branch & switch to it by typing:\n",
"Once you have finished a certain task it is a good idea to check-point your current notebook's status (logs, plots and whatever else has been stored in the notebook). By doing this, you can always revert to this state later when necessary (without rerunning experimens). You can do this by going to menus `File->Save and Checkpoint` and `File->Revert to Checkpoint`.\n",
"The other good practice would be to dump to disk models and produced statistics. You can easily do it in python by using pickles, as in the following example."
"This is a remainder on some numpy conventions you may find useful (especially with 2nd part of the coursework involving implementation of convolution and pooling layers).\n",
"* [More on indexing of multi-dimensional arrays](http://docs.scipy.org/doc/numpy/user/basics.indexing.html)\n",
"\n",
"Below we list some functions, special flags, etc. of (potentially) interesting functions - you are not expected to use them all - we just outline the (non-obvious) functionality you may find useful. Search numpy documentation to get an exact information about them. \n",
"\n",
"* `numpy.sum` - just notice axis arguments allows to specify a sequence of axes, hence, the reduction (here sum) can be performed along arbitrary dimensions.\n",
"* `numpy.amax` - the same as with sum\n",
"* `numpy.transpose` - can specify which axes you want to get transposed in a tensor\n",
"* `numpy.argmax` - gives you the argument (index) of the maximum value in a tensor\n",
"* `numpy.reshape` - allows to reshape tensor into another (valid from data perspective) tensor (matrix, vector) with different shape (but the same number of total elements)\n",
"* `numpy.rot90` - rotate a matrix by 90 (or multiply of 90) degrees counter-clockwise\n",
"* `numpy.newaxis` - adds an axis with dimension 1 (handy for keeping tensor shapes compatible with expected broadcasting)\n",
"* `numpy.rollaxis` - allows to shuffle certain axis in a tensor\n",
"* `slice` - allows to specify a range (can be used when indexing numpy arrays)\n",
"* `ellipsis` - allows to pick an arbitrary number of dimensions (inferred)\n",
"* `max_and_argmax` - `(mlp.layers)` - an auxiliary function we have provided to get both max and argmax of a tensor across an arbitrary axes, possibly in the format preserving tensor's original shape (this is not trivial to do using numpy *out-of-the-shelf* functionality).\n",
"Below cells contain some simple examples showing basics behind tensor manipulation in numpy (go through them if you haven't used numpy in this context before)."
"# and this the flattened version used to date (2, 784)\n",
"print x.reshape(batch_size, -1).shape\n",
"\n",
"# you can pick the first image in the usual way you index numpy arrays\n",
"img1 = x[0]\n",
"# or in full notation which is equivalent (numpy by default treats single \n",
"# indexes as selecting everything for this dimension)\n",
"img2 = x[0, :, :, :]\n",
"#this will be true\n",
"print numpy.allclose(img1, img2)\n",
"\n",
"# Notice, slicing like this is collapsing the leading dimension \n",
"# (as we picked only 0-th element), so the below will give you\n",
"# (1, 28, 28)\n",
"print img1.shape\n",
"print img2.shape\n",
"\n",
"# to keep this dimension, one can use numpy.newaxis, which will add one dimension of\n",
"# size 1, as a result you get a tensor (1, 1, 28, 28) \n",
"# one image (as expected), but with preserved ndims of the source 4D tensor \n",
"# this can be handy as it can simplify assignments to the original tensor\n",
"img1=x[0, numpy.newaxis]\n",
"print img1.shape\n",
"\n",
"#Let assume you want to get a sum of pixel intensities in each image in a mini-batch, \n",
"#you can of course iterate over images and compute sum for each separately, but you can also:\n",
"\n",
"#to get the sum of pixel for each image separately, you could write:\n",
"# (which means, sum along axis 2 and 3 together)\n",
"print numpy.sum(x, axis=(2,3))\n",
"\n",
"#notice, that any of other calls would do what we want as:\n",
"#will print the total sum for all images\n",
"print numpy.sum(x)\n",
"\n",
"#will print the sum along the last axes (the columns of the images)\n",
"print numpy.sum(x, axis=-1)\n",
"\n",
"# finally, let us swap the 10x20 rectangle of pixels between images\n",
"# in numpy you can of course perform an arbitrary slicing and assignment\n",
"# of sub-matrices\n",
"\n",
"slice_x = slice(10, 20)\n",
"slice_y = slice(5, 25)\n",
"\n",
"patch1 = x[0, :, slice_x, slice_y]\n",
"patch2 = x[1, :, slice_x, slice_y]\n",
"print patch1.shape, patch1.shape\n",
"\n",
"xc = numpy.array(x) #this will make a copy of x\n",
"xc[0, :, slice_x, slice_y] = patch2\n",
"xc[1, :, slice_x, slice_y] = patch1\n",
"\n",
"fig, ax = plt.subplots(2,2)\n",
"ax[0, 0].imshow(x[0,0], cmap=cm.Greys_r)\n",
"ax[0, 1].imshow(x[1,0], cmap=cm.Greys_r)\n",
"ax[1, 0].imshow(xc[0,0], cmap=cm.Greys_r)\n",
"ax[1, 1].imshow(xc[1,0], cmap=cm.Greys_r)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Verifying the gradients\n",
"\n",
"One can numerically compute the gradient using [finite differences](https://en.wikipedia.org/wiki/Finite_difference) method, that is, perturb the input arguments by some small value and then measure how this affects the function change:\n",
"Because $\\epsilon$ is usually very small (1e-4 or smaller) it is recommended (due to finite precision of numerical machines) to use the centred variant (which was implemented in `mlp.utils`):\n",
"The numerical gradient gives the good intution when something is wrong. But it should be treated with a bit of salt - as one can easily find ill-conditioned cases where this test might fail - either due to numerical precision when gradients get really small, or other issues like discontinuities in transfer functions (ReLU, Maxout) where perturbing the inputs might cause the piecwise component to swap \"the border\". For instance, for ReLU assume $f(x) < 0$ by a some small margin in argument $x$ and the gradient is correctly set to 0. However, the finite difference quotient rule with some $\\epsilon$ such $f(x+\\epsilon) > 0$ will give non-zero numerical gradient. Anyway, this method remains a very useful in verifying whether the implemented forward and backward pasees are mutually correct.\n",
"\n",
"Below, you can find some examples on how one can use it, first for an arbitrary function and then short snippet on how to check the gradient backpropagated through layer."
"You can also check the backprop implementation in the layer. Notice, it **does not** necessairly check whether your layer implementation is correct but rather if the gradient computation is correct, given forward pass computation. If you get the forward pass wrong, and somehow get gradients right w.r.t what forward pass is computing, the below check will not capture it (obviously). Contrary to normal scenraio where 32 floating point precision is sufficient, when checking gradients please make sure 64bit precision is used (or tune the tolerance)."
"# keep it small, however, one can test it on some subset of MNIST\n",
"idim = 10\n",
"odim = 5\n",
"\n",
"x = rng.uniform(-2, 2, (20, idim))\n",
"s = Sigmoid(idim=idim, odim=odim, rng=rng)\n",
"verify_layer_gradient(s, x)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Speeding up the code\n",
"\n",
"Convolution can be accelerated in many ways, one of them is the use of cython to write crucial bits in python (the one that involve heavy loop usage). \n",
"\n",
"* Use numpy as much as possible (which will use highly optimised looping, and possibly a form of BLAS implemented paralleism where possible)\n",
"* Applying standard tricks to convolution (they boil down to more efficent use of BLAS routines (above) by loop unrolling - fewer operations on larger matrices, than more on smaller)\n",
"* Speeding up the code by compiling pythonic c-functions (cython)\n",
"\n",
"\n",
"## Using Cython for the crucial bottleneck pieces\n",
"Cython will compile them to C and the code should be comparable in terms of efficiency to numpy using similar operations in numpy. Of course, one can only rely on numpy. Slicing numpy across many dimensions gets much more complicated than working than working with vectors and matrices and we do undersand those can be confusing for some people. Hence, we allow the basic implementation of convolutiona and/or pooling (with any penalty or preference from our side) to be loop only based (which is perhaps much easier to comprehend and debug).\n",
"Below we give an example cython code for matrix-matrix dot function from the second tutorial so you can see the basic differences and compare obtained speeds. They give you all the necessary pattern needed to implement naive (reasonably efficient) convolution. Naive looping in (native) python is gonna be *very* slow.\n",
"\n",
"Some tutorials:\n",
" * [Cython, language basics](http://docs.cython.org/src/userguide/language_basics.html#language-basics)\n",
" * [A tutorial on how to optimise the cython code](http://docs.cython.org/src/tutorial/numpy.html) (a working example is actually a simple convolution code, do not use it `as is`)\n",
"Before you proceed, in case you do not have installed `cython` (it should be installed with scipy). But in case the below imports do not work, staying in the activated virtual environment type:\n",
" \n",
" ```\n",
" pip install cython\n",
" ```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#native pythonic implementation (as in the second tutorial, rather slow...)\n",
"def my_dot_mat_mat(x, W):\n",
" I = x.shape[0]\n",
" J = x.shape[1]\n",
" K = W.shape[1]\n",
" assert (J == W.shape[0]), (\n",
" \"Number of columns in x expected to \"\n",
" \" to be the same as rows in W, got\"\n",
" \" %i != %i\" % (J, W.shape[0])\n",
" )\n",
" #allocate the output container\n",
" y = numpy.zeros((I, K))\n",
" \n",
" #implement matrix-matrix inner product here\n",
" for i in xrange(0, I):\n",
" for k in xrange(0, K):\n",
" for j in xrange(0, J):\n",
" y[i, k] += x[i, j] * W[j,k]\n",
" \n",
" return y"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%cython\n",
"# this shows an example on how to build cython accelerated\n",
"# function than one can later use a standard python function\n",
"\n",
"#notice, you need to specify all the imports again (so they are\n",
"#available in the compiled verions of the code, when executed)\n",
" #allocate the output container and other variables, notice, when allocating\n",
" #y we specify its type both for the buffer but also in numpy.zeros() function\n",
" cdef numpy.ndarray[DTYPE_t, ndim=2] y = numpy.zeros((I, K), dtype=DTYPE)\n",
" cdef int i, k, j\n",
" \n",
" #implement matrix-matrix inner product here\n",
" for i in xrange(0, I):\n",
" for k in xrange(0, K):\n",
" for j in xrange(0, J):\n",
" y[i, k] += x[i, j] * W[j,k]\n",
" \n",
" return y"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can optimise the code further as in the [linked](http://docs.cython.org/src/tutorial/numpy.html) tutorial. However, the above example seems to be a reasonable compromise for developing the code - it gives a reasonably accelerated code, with all the security checks one may expect to be existent under development (checking bounds of indices, wheter types of variables match, tracking overflows etc.). Look [here](http://docs.cython.org/src/reference/compilation.html) for more optimisation decorators one can use to speed things up.\n",
"Below we do some benchmarks on each of the above functions. Notice huge speed-up from going from non-optimised cython code to optimised one (on my machine, 643ms -> 6.35ms - this is 2 orders!). It's still around two times slower than BLAS accelerated numpy.dot routine (non-cached result is around 3.3ms). But our method just benchmarks the dot product, operation that has been optimised incredibly well in numerical libraries. Of course, we **do not** want you to use this code for dot products and you should rely on functions provided by numpy (whenever reasonably possible). The above code was just given as an example how to produce much more efficient code with very small programming effort. In many scenarios (convolution is an example) the code is more complex than a single dot product and some looping is necessary anyway, especially when dealing with multi-dimensional tensors where atom operations using direct loop-based indexing may be much easier to comprehend (and debug) than a direct multi-dimensional manipulation of numpy tensors."