"This notebook contains some extended versions of hints and some code examples that are suppose to make it easier to proceed with certain tasks in Coursework #2.\n",
"Before you proceed onwards, remember to activate your virtual environment by typing `activate_mlp` or `source ~/mlpractical/venv/bin/activate` (or if you did the original install the \"comfy way\" type: `workon mlpractical`).\n",
"Look <a href=\"https://github.com/CSTR-Edinburgh/mlpractical/blob/master/gitFAQ.md\">here</a> for more details. But in short, we recommend to create a separate branch for the coursework, as follows:\n",
"1. Enter the mlpractical directory `cd ~/mlpractical/repo-mlp`\n",
"2. List the branches and check which are currently active by typing: `git branch`\n",
"3. If you have followed our recommendations, you should be in the `lab5` branch, please commit your local changes to the repo index by typing:\n",
"```\n",
"git commit -am \"finished lab5\"\n",
"```\n",
"4. Now you can switch to `master` branch by typing: \n",
"```\n",
"git checkout master\n",
" ```\n",
"5. To update the repository (note, assuming master does not have any conflicts), if there are some, have a look <a href=\"https://github.com/CSTR-Edinburgh/mlpractical/blob/master/gitFAQ.md\">here</a>\n",
"```\n",
"git pull\n",
"```\n",
"6. And now, create the new branch & switch to it by typing:\n",
"Once you have finished a task it is a good idea to check-point your current notebook's status (logs, plots and whatever else has been stored in the notebook). By doing this, you can always revert to this state later when necessary. You can do this by going to menus `File->Save and Checkpoint` and `File->Revert to Checkpoint`.\n",
"Another good practice would be to save models and the statistics you generate to disk. You can easily do this in python by using *cPickle*, as in the following example."
"This is a remainder on some numpy conventions you may find useful (especially in the second part of coursework #2, which involves the implementation of convolution and pooling layers).\n",
"Below we list some (potentially) useful functions - you are not expected to need them all - we just outline some (non-obvious) functionality that you may find useful. Search the numpy documentation to get precise information about them. \n",
"* `numpy.sum` - note that the axis arguments allow to specify a sequence of axes, hence, the reduction (here sum) can be performed along arbitrary dimensions.\n",
"* `numpy.reshape` - allows to reshape a tensor into another (valid from data perspective) tensor (matrix, vector) with a different shape (but the same number of total elements)\n",
"* `numpy.rot90(m, k)` - rotate matrix `m` by 90 degrees `k` times (counter-clockwise)\n",
"* `numpy.newaxis` - add an axis with dimension 1 (handy for keeping tensor shapes compatible with expected broadcasting)\n",
"* `numpy.rollaxis` - roll an axis in a tensor\n",
"* `max_and_argmax` - `(mlp.layers)` - an auxiliary function we have provided to get both max and argmax of a tensor across an arbitrary axes, possibly in the format preserving tensor's original shape (this is not trivial to do using numpy *out-of-the-shelf* functionality).\n",
"The below cells contain some simple examples showing the basics of tensor manipulation in numpy (go through them if you haven't used numpy in this context before)."
"One can numerically compute the gradient using the [finite differences](https://en.wikipedia.org/wiki/Finite_difference) method, that is, perturb the input arguments by some small value and then measure how this affects the function change:\n",
"Because $\\epsilon$ is usually very small (1e-4 or smaller) it is recommended (due to finite precision of numerical machines) to use the centred variant (which is implemented in mlp.utils):\n",
"The numerical gradient gives a good intuition if something is wrong. But take care, since one can easily find ill-conditioned cases where this test might fail - either due to numerical precision when gradients get really small, or other because of issues like discontinuities in transfer functions (ReLU, Maxout) where perturbing the inputs might cause the piecwise component to cross \"the border\". For instance, for ReLU assume $f(x) < 0$ by a some small margin in argument $x$ and the gradient is correctly set to 0. However, the finite difference quotient rule with some $\\epsilon$ such $f(x+\\epsilon) > 0$ will give a non-zero numerical gradient. Anyway, this method remains very useful in verifying whether the implemented forward and backward pasees are mutually correct.\n",
"Below, you can find some examples on how one can use it, first for an arbitrary function and then short snippet on how to check the gradient backpropagated through layer."
"You can also check the backprop implementation in the layer. Notice, it **does not** necessarily check whether your layer implementation is correct but rather if the gradient computation is correct, given the forward pass computation. If you get the forward pass wrong, and somehow got the gradients right w.r.t what the forward pass is computing, the below check will not capture it (obviously). Contrary to normal scenraio where 32 floating point precision is sufficient, when checking gradients please make sure 64bit precision is used (or tune the tolerance)."
"Convolution can be accelerated in many ways, one of them is the use of *Cython* to write crucial bits in python (the one that involve heavy loop usage). You can speed up your code by:\n",
"* Using numpy as much as possible (which will use highly optimised looping, and possibly a form of BLAS-implemented paralleism where possible)\n",
"* Applying standard tricks to convolution (they boil down to more efficent use of BLAS routines (above) by loop unrolling - fewer operations on larger matrices, rather than more on smaller)\n",
"Cython will compile them to C and the code should be comparable in terms of efficiency to numpy using similar operations in numpy. Of course, one can only rely on numpy. Slicing numpy across many dimensions gets much more complicated than working with vectors and matrices and we do understand that this can be confusing. Hence, we allow the basic implementation (with any penalty or preference from our side) to be based on embedded loops (which is perhaps much easier to comprehend and debug).\n",
"Below we give some example cython code for the matrix-matrix dot function from the second tutorial so that you can see the basic differences and compare the obtained speeds. They give you all the necessary patterns needed to implement naive (reasonably efficient) convolution. If you use native python, rather than Cython, then naive looping will be *very* slow.\n",
" * [A tutorial on how to optimise the cython code](http://docs.cython.org/src/tutorial/numpy.html) (includes a working example which is actually simple convolution code, do not use it `as is`)\n",
"Before you proceed, check that you have installed `cython` (it should be installed with scipy). If the below imports do not work, then - staying in the activated virtual environment - type:\n",
"You can optimise the code further as in the [linked](http://docs.cython.org/src/tutorial/numpy.html) tutorial. However, the above example seems to be a reasonable compromise for developing the code - it gives reasonably accelerated code, with all the checks one may expect to be existent under development (checking bounds of indices, wheter types of variables match, tracking overflows etc.). Look [here](http://docs.cython.org/src/reference/compilation.html) for more optimisation decorators that one can use to speed things up.\n",
"Below we do some benchmarks on each of the above functions. Notice the huge speed-up in going from non-optimised cython code to an optimised one (on my machine, 643ms -> 6.35ms - this is 2 orders of magnitude!). It is still around two times slower than the BLAS accelerated numpy.dot routine (the non-cached result is around 3.3ms). But our method just benchmarks the dot product, an operation that has been optimised incredibly well in numerical libraries. Of course, we **do not** want you to use this code for dot products and you should rely on functions provided by numpy (whenever reasonably possible). The above code was just given as an example how to produce much more efficient code with very small effort. In many scenarios (convolution is an example) the code is more complex than a single dot product and some looping is necessary anyway, especially when dealing with multi-dimensional tensors where atomic operations using direct loop-based indexing may be much easier to comprehend (and debug) than a direct multi-dimensional manipulation of numpy tensors."