minor cosmetic editing

This commit is contained in:
Steve Renals 2015-11-12 14:52:56 +00:00
parent 4a8c01046e
commit 597aabdfa9

View File

@ -6,13 +6,13 @@
"source": [
"# Introduction to Coursework #2\n",
"\n",
"This notebook contains some extended versions of hints and some code examples that are suppose to make it easier to proceed with certain tasks in the Coursework #2.\n",
"This notebook contains some extended versions of hints and some code examples that are suppose to make it easier to proceed with certain tasks in Coursework #2.\n",
"\n",
"# Store the intermediate results (check-pointing and pickling)\n",
"\n",
"Once you have finished a certain task it is a good idea to check-point your current notebook's status (logs, plots and whatever else has been stored in the notebook). By doing this, you can always revert to this state later when necessary. You can do this by going to menus `File->Save and Checkpoint` and `File->Revert to Checkpoint`.\n",
"Once you have finished a task it is a good idea to check-point your current notebook's status (logs, plots and whatever else has been stored in the notebook). By doing this, you can always revert to this state later when necessary. You can do this by going to menus `File->Save and Checkpoint` and `File->Revert to Checkpoint`.\n",
"\n",
"The other good practice would be to dump to disk models and produced statistics. You can easily do it in python by using pickles, as in the following example."
"Another good practice would be to save models and the statistics you generate to disk. You can easily do this in python by using *cPickle*, as in the following example."
]
},
{
@ -47,27 +47,27 @@
"source": [
"# Notes on numpy and tensors\n",
"\n",
"This is a remainder on some numpy conventions you may find useful (especially with 2nd part of the coursework involving implementation of convolution and pooling layers).\n",
"This is a remainder on some numpy conventions you may find useful (especially in the second part of coursework #2, which involves the implementation of convolution and pooling layers).\n",
"\n",
"Links to numpy indexing:\n",
"* [Numpy (advanced) indexing](http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html)\n",
"* [More on indexing of multi-dimensional arrays](http://docs.scipy.org/doc/numpy/user/basics.indexing.html)\n",
"\n",
"Below we list some functions, special flags, etc. of (potentially) interesting functions - you are not expected to use them all - we just outline the (non-obvious) functionality you may find useful. Search numpy documentation to get an exact information about them. \n",
"Below we list some (potentially) useful functions - you are not expected to need them all - we just outline some (non-obvious) functionality that you may find useful. Search the numpy documentation to get precise information about them. \n",
"\n",
"* `numpy.sum` - just notice axis arguments allows to specify a sequence of axes, hence, the reduction (here sum) can be performed along arbitrary dimensions.\n",
"* `numpy.sum` - note that the axis arguments allow to specify a sequence of axes, hence, the reduction (here sum) can be performed along arbitrary dimensions.\n",
"* `numpy.amax` - the same as with sum\n",
"* `numpy.transpose` - can specify which axes you want to get transposed in a tensor\n",
"* `numpy.argmax` - gives you the argument (index) of the maximum value in a tensor\n",
"* `numpy.flatten` - collapses the n-dimensional tensor into vector\n",
"* `numpy.reshape` - allows to reshape tensor into another (valid from data perspective) tensor (matrix, vector) with different shape (but the same number of total elements)\n",
"* `numpy.rot90` - rotate a matrix by 90 (or multiply of 90) degrees counter-clockwise\n",
"* `numpy.newaxis` - adds an axis with dimension 1 (handy for keeping tensor shapes compatible with expected broadcasting)\n",
"* `numpy.rollaxis` - allows to shuffle certain axis in a tensor\n",
"* `numpy.reshape` - allows to reshape a tensor into another (valid from data perspective) tensor (matrix, vector) with a different shape (but the same number of total elements)\n",
"* `numpy.rot90(m, k)` - rotate matrix `m` by 90 degrees `k` times (counter-clockwise)\n",
"* `numpy.newaxis` - add an axis with dimension 1 (handy for keeping tensor shapes compatible with expected broadcasting)\n",
"* `numpy.rollaxis` - roll an axis in a tensor\n",
"* `slice` - allows to specify a range (can be used when indexing numpy arrays)\n",
"* `ellipsis` - allows to pick an arbitrary number of dimensions (inferred)\n",
"\n",
"Below cells contain some simple examples showing basics behind tensor manipulation in numpy (go through them if you haven't used numpy in this context before)."
"The below cells contain some simple examples showing the basics of tensor manipulation in numpy (go through them if you haven't used numpy in this context before)."
]
},
{
@ -175,19 +175,19 @@
"source": [
"# Verifying the gradients\n",
"\n",
"One can numerically compute the gradient using [finite differences](https://en.wikipedia.org/wiki/Finite_difference) method, that is, perturb the input arguments by some small value and then measure how this affects the function change:\n",
"One can numerically compute the gradient using the [finite differences](https://en.wikipedia.org/wiki/Finite_difference) method, that is, perturb the input arguments by some small value and then measure how this affects the function change:\n",
"\n",
"$\n",
"f(x) = \\frac{f(x+\\epsilon) - f(x)}{\\epsilon}\n",
"$\n",
"\n",
"Because $\\epsilon$ is usually very small (1e-4 or smaller) it is recommended (due to finite precision of numerical machines) to use the centred variant (which was implemented in mlp.utils):\n",
"Because $\\epsilon$ is usually very small (1e-4 or smaller) it is recommended (due to finite precision of numerical machines) to use the centred variant (which is implemented in mlp.utils):\n",
"\n",
"$\n",
"f(x) = \\frac{f(x+\\epsilon) - f(x-\\epsilon)}{2\\epsilon}\n",
"$\n",
"\n",
"The numerical gradient gives the good intution when something is wrong. But it should be treated with a bit of salt - as one can easily find ill-conditioned cases where this test might fail - either due to numerical precision when gradients get really small, or other issues like discontinuities in transfer functions (ReLU, Maxout) where perturbing the inputs might cause the piecwise component to swap \"the border\". For instance, for ReLU assume $f(x) < 0$ by a some small margin in argument $x$ and the gradient is correctly set to 0. However, the finite difference quotient rule with some $\\epsilon$ such $f(x+\\epsilon) > 0$ will give non-zero numerical gradient. Anyway, this method remains a very useful in verifying whether the implemented forward and backward pasees are mutually correct.\n",
"The numerical gradient gives a good intuition if something is wrong. But take care, since one can easily find ill-conditioned cases where this test might fail - either due to numerical precision when gradients get really small, or other because of issues like discontinuities in transfer functions (ReLU, Maxout) where perturbing the inputs might cause the piecwise component to cross \"the border\". For instance, for ReLU assume $f(x) < 0$ by a some small margin in argument $x$ and the gradient is correctly set to 0. However, the finite difference quotient rule with some $\\epsilon$ such $f(x+\\epsilon) > 0$ will give a non-zero numerical gradient. Anyway, this method remains very useful in verifying whether the implemented forward and backward pasees are mutually correct.\n",
"\n",
"Below, you can find some examples on how one can use it, first for an arbitrary function and then short snippet on how to check the gradient backpropagated through layer."
]
@ -230,7 +230,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also check the backprop implementation in the layer. Notice, it **does not** necessairly check whether your layer implementation is correct but rather if the gradient computation is correct, given forward pass computation. If you get the forward pass wrong, and somehow get gradients right w.r.t what forward pass is computing, the below check will not capture it (obviously). "
"You can also check the backprop implementation in the layer. Notice, it **does not** necessarily check whether your layer implementation is correct but rather if the gradient computation is correct, given the forward pass computation. If you get the forward pass wrong, and somehow got the gradients right w.r.t what the forward pass is computing, the below check will not capture it (obviously). "
]
},
{
@ -261,27 +261,27 @@
"source": [
"# Speeding up the code\n",
"\n",
"Convolution can be accelerated in many ways, one of them is the use of cython to write crucial bits in python (the one that involve heavy loop usage). \n",
"Convolution can be accelerated in many ways, one of them is the use of *Cython* to write crucial bits in python (the one that involve heavy loop usage). You can speed up your code by:\n",
"\n",
"* Use numpy as much as possible (which will use highly optimised looping, and possibly a form of BLAS implemented paralleism where possible)\n",
"* Applying standard tricks to convolution (they boil down to more efficent use of BLAS routines (above) by loop unrolling - fewer operations on larger matrices, than more on smaller)\n",
"* Using numpy as much as possible (which will use highly optimised looping, and possibly a form of BLAS-implemented paralleism where possible)\n",
"* Applying standard tricks to convolution (they boil down to more efficent use of BLAS routines (above) by loop unrolling - fewer operations on larger matrices, rather than more on smaller)\n",
"* Speeding up the code by compiling pythonic c-functions (cython)\n",
"\n",
"\n",
"## Using Cython for the crucial bottleneck pieces\n",
"\n",
"Cython will compile them to C and the code should be comparable in terms of efficiency to numpy using similar operations in numpy. Of course, one can only rely on numpy. Slicing numpy across many dimensions gets much more complicated than working than working with vectors and matrices and we do undersand those can be confusing for some people. Hence, we allow the basic implementation (with any penalty or preference from our side) to be loop only based (which is perhaps much easier to comprehend and debug).\n",
"Cython will compile them to C and the code should be comparable in terms of efficiency to numpy using similar operations in numpy. Of course, one can only rely on numpy. Slicing numpy across many dimensions gets much more complicated than working than working with vectors and matrices and we do understand that this can be confusing. Hence, we allow the basic implementation (with any penalty or preference from our side) to be based on embedded loops (which is perhaps much easier to comprehend and debug).\n",
"\n",
"Below we give an example cython code for matrix-matrix dot function from the second tutorial so you can see the basic differences and compare obtained speeds. They give you all the necessary pattern needed to implement naive (reasonably efficient) convolution. Naive looping in (native) python is gonna be *very* slow.\n",
"Below we give some example cython code for the matrix-matrix dot function from the second tutorial so that you can see the basic differences and compare the obtained speeds. They give you all the necessary patterns needed to implement naive (reasonably efficient) convolution. If you use native python, rather than Cython, then naive looping will be *very* slow.\n",
"\n",
"Some tutorials:\n",
" * [Cython, language basics](http://docs.cython.org/src/userguide/language_basics.html#language-basics)\n",
" * [Cython, basic tutorial](http://docs.cython.org/src/tutorial/cython_tutorial.html)\n",
" * [Cython in ipython notebooks](http://docs.cython.org/src/quickstart/build.html)\n",
" * [A tutorial on how to optimise the cython code](http://docs.cython.org/src/tutorial/numpy.html) (a working example is actually a simple convolution code)\n",
" * [A tutorial on how to optimise the cython code](http://docs.cython.org/src/tutorial/numpy.html) (includes a working example which is actually simple convolution code)\n",
" \n",
"\n",
"Before you proceed, in case you do not have installed `cython` (it should be installed with scipy). But in case the below imports do not work, staying in the activated virtual environment type:\n",
"Before you proceed, check that you have installed `cython` (it should be installed with scipy). If the below imports do not work, then - staying in the activated virtual environment - type:\n",
" \n",
" ```\n",
" pip install cython\n",
@ -418,9 +418,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"You can optimise the code further as in the [linked](http://docs.cython.org/src/tutorial/numpy.html) tutorial. However, the above example seems to be a reasonable compromise for developing the code - it gives a reasonably accelerated code, with all the security checks one may expect to be existent under development (checking bounds of indices, wheter types of variables match, tracking overflows etc.). Look [here](http://docs.cython.org/src/reference/compilation.html) for more optimisation decorators one can use to speed things up.\n",
"You can optimise the code further as in the [linked](http://docs.cython.org/src/tutorial/numpy.html) tutorial. However, the above example seems to be a reasonable compromise for developing the code - it gives reasonably accelerated code, with all the checks one may expect to be existent under development (checking bounds of indices, wheter types of variables match, tracking overflows etc.). Look [here](http://docs.cython.org/src/reference/compilation.html) for more optimisation decorators that one can use to speed things up.\n",
"\n",
"Below we do some benchmarks on each of the above functions. Notice huge speed-up from going from non-optimised cython code to optimised one (on my machine, 643ms -> 6.35ms - this is 2 orders!). It's still around two times slower than BLAS accelerated numpy.dot routine (non-cached result is around 3.3ms). But our method just benchmarks the dot product, operation that has been optimised incredibly well in numerical libraries. Of course, we **do not** want you to use this code for dot products and you should rely on functions provided by numpy (whenever reasonably possible). The above code was just given as an example how to produce much more efficient code with very small effort. In many scenarios (convolution is an example) the code is more complex than a single dot product and some looping is necessary anyway, especially when dealing with multi-dimensional tensors where atom operations using direct loop-based indexing may be much easier to comprehend (and debug) than a direct multi-dimensional manipulation of numpy tensors."
"Below we do some benchmarks on each of the above functions. Notice the huge speed-up in going from non-optimised cython code to an optimised one (on my machine, 643ms -> 6.35ms - this is 2 orders of magnitude!). It is still around two times slower than the BLAS accelerated numpy.dot routine (the non-cached result is around 3.3ms). But our method just benchmarks the dot product, an operation that has been optimised incredibly well in numerical libraries. Of course, we **do not** want you to use this code for dot products and you should rely on functions provided by numpy (whenever reasonably possible). The above code was just given as an example how to produce much more efficient code with very small effort. In many scenarios (convolution is an example) the code is more complex than a single dot product and some looping is necessary anyway, especially when dealing with multi-dimensional tensors where atomic operations using direct loop-based indexing may be much easier to comprehend (and debug) than a direct multi-dimensional manipulation of numpy tensors."
]
},
{
@ -505,7 +505,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.9"
"version": "2.7.10"
}
},
"nbformat": 4,