From 0ac8d0b1e02e42b8a02e086c3cef9bb3d1878330 Mon Sep 17 00:00:00 2001
From: pswietojanski
+ "```\n",
+ "wget http://downloads.sourceforge.net/project/numpy/NumPy/1.9.2/numpy-1.9.2.zip\n",
+ "unzip numpy-1.9.2.zip\n",
+ "cd numpy-1.9.2\n",
+ "echo \"[openblas]\" >> site.cfg\n",
+ "echo \"library_dirs = /path/to/OpenBLAS/lib\" >> site.cfg\n",
+ "echo \"include_dirs = /path/to/OpenBLAS/include\" >> site.cfg\n",
+ "```\n",
+ "\n",
+ "`python setup.py build --fcompiler=gnu95`\n",
+ "\n",
+ "Assuming the virtual environment is activated, the command below installs numpy into the environment's own directory (~/.virtualenvs/mlpractical/...):\n",
+ "\n",
+ "`python setup.py install`\n",
+ "\n",
+ "\n",
+ "### Installing remaining packages and running tests\n",
+ "\n",
+ "Use pip to install the remaining packages (`scipy`, `matplotlib`, `argparse` and `nose`) and check whether they pass their tests. An example for numpy is given below."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false,
+ "scrolled": true
+ },
+ "outputs": [],
+ "source": [
+ "%clear\n",
+ "import numpy\n",
+ "# show_config() prints the libraries numpy's numerical backend was built against;\n",
+ "# you should see linkage to OpenBLAS (or another optimised BLAS/LAPACK library).\n",
+ "# If those entries are empty or marked NOT AVAILABLE, something went wrong and\n",
+ "# numpy will fall back to its own unoptimised (slow) linear algebra routines\n",
+ "numpy.show_config()\n",
+ "# uncomment the line below to run numpy's own test suite (requires nose)\n",
+ "#numpy.test()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Below we check whether, and by how much, a matrix-matrix product speeds up as OpenBLAS is allowed to use more cores:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
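+ "import os\n",
+ "import sys\n",
+ "import subprocess\n",
+ "import multiprocessing\n",
+ "\n",
+ "num_cores = multiprocessing.cpu_count()\n",
+ "N = 1000\n",
+ "setup = 'import numpy; x = numpy.random.random((%i, %i))' % (N, N)\n",
+ "\n",
+ "# OpenBLAS reads OMP_NUM_THREADS only once, when it is loaded, so changing\n",
+ "# os.environ inside this (already running) kernel would have no effect.\n",
+ "# Instead, each setting is timed in a fresh Python process, which is\n",
+ "# equivalent to typing export OMP_NUM_THREADS=i in a bash shell first.\n",
+ "for i in xrange(1, num_cores + 1):\n",
+ "    print 'Running matrix-matrix product on %i core(s)' % i\n",
+ "    env = dict(os.environ, OMP_NUM_THREADS=str(i))\n",
+ "    output = subprocess.check_output(\n",
+ "        [sys.executable, '-m', 'timeit', '-s', setup, 'numpy.dot(x, x.T)'],\n",
+ "        env=env)\n",
+ "    print output.strip()"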
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Test whether you can plot and display figures using `pyplot`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "%matplotlib inline\n",
+ "import matplotlib.pyplot as plt\n",
+ "\n",
+ "x = numpy.linspace(0.0, 2*numpy.pi, 100)\n",
+ "y1 = numpy.sin(x)\n",
+ "y2 = numpy.cos(x)\n",
+ "\n",
+ "plt.plot(x, y1, lw=2, label=r'$\\sin(x)$')\n",
+ "plt.plot(x, y2, lw=2, label=r'$\\cos(x)$')\n",
+ "plt.xlabel('x')\n",
+ "plt.ylabel('y')\n",
+ "plt.legend()\n",
+ "plt.xlim(0.0, 2*numpy.pi)\n",
+ "plt.grid()\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Exercises\n",
+ "\n",
+ "Today's exercises are meant to familiarise you with IPython notebooks (if you have not used them before), with how the data is organised, and with how to access it. From next week onwards, the exercises will follow the material covered in lectures.\n",
+ "\n",
+ "## Data providers\n",
+ "\n",
+ "Open the `mlp.dataset` module in the browser (go to the `Home` tab, navigate to the `mlp` package and click on `dataset.py`). Read through the code and comments, then proceed to the exercises.\n",
+ "\n",
+ "General note: you can load the `mlp` code into your favourite Python IDE, but it is also fine to modify and save the code directly in the browser by opening the necessary modules in separate tabs.\n",
+ "\n",
+ "### Exercise 1 \n",
+ "\n",
+ "Using `MNISTDataProvider`, write code that iterates over the first 5 mini-batches of 100 data points each, and plot the MNIST digits of each mini-batch as a 10x10 grid of images. The provider returns each mini-batch as a tuple of numpy arrays `(features, targets)`. The `features` matrix has shape BxD and the `targets` vector has size B, where B is the mini-batch size and D is the dimensionality of the features. By default, each data point (image) is stored as a 784-dimensional vector of pixel intensities, normalised to the [0, 1] range from the original integer values in [0, 255]. The images are two dimensional, however, so before plotting you need to reshape each vector into a 2D matrix (MNIST images have the same number of pixels in height and width, i.e. 28x28).\n",
+ "\n",
+ "Tip: useful functions for this exercise are `imshow`, `subplot` and `gridspec`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "import matplotlib.pyplot as plt\n",
+ "import matplotlib.gridspec as gridspec\n",
+ "import matplotlib.cm as cm\n",
+ "from mlp.dataset import MNISTDataProvider\n",
+ "\n",
+ "def show_mnist_image(img):\n",
+ " # display a single 2D image in greyscale\n",
+ " fig = plt.figure()\n",
+ " gs = gridspec.GridSpec(1, 1)\n",
+ " ax1 = fig.add_subplot(gs[0,0])\n",
+ " ax1.imshow(img, cmap=cm.Greys_r)\n",
+ " plt.show()\n",
+ "\n",
+ "def show_mnist_images(batch):\n",
+ " # to implement: display all images of a mini-batch in a grid\n",
+ " raise NotImplementedError('Write me!')\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "# Example: display the first two validation images, one image per mini-batch\n",
+ "mnist_dp = MNISTDataProvider(dset='valid', batch_size=1, max_num_examples=2, randomize=False)\n",
+ "\n",
+ "for batch in mnist_dp:\n",
+ " features, targets = batch\n",
+ " show_mnist_image(features.reshape(28, 28))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "# implement Exercise 1 here"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Exercise 2\n",
+ "\n",
+ "`MNISTDataProvider` currently returns `targets` as a vector of integers, where each element is the id of the class to which the corresponding `features` data point belongs. Later in the course we will need a 1-of-K representation of the targets. For instance, given a mini-batch of size 3 with the targets vector $[2, 2, 0]$ (and assuming there are only 3 classes to discriminate between), one needs to convert it into the matrix $\\left[ \\begin{array}{ccc}\n",
+ "0 & 0 & 1 \\\\\n",
+ "0 & 0 & 1 \\\\\n",
+ "1 & 0 & 0 \\end{array} \\right]$. \n",
+ "\n",
+ "Implement the `__to_one_of_k` method of the `MNISTDataProvider` class. Then uncomment the appropriate line in its `next` method so that the raw targets are converted to 1-of-K coding. Test the code in the cell below."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": true
+ },
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": true
+ },
+ "source": [
+ "### Exercise 3\n",
+ "\n",
+ "Write your own data provider, `MetOfficeDataProvider`, that wraps the daily weather data for South Scotland (available from: http://www.metoffice.gov.uk/hadobs/hadukp/data/daily/HadSSP_daily_qc.txt). For your convenience, the file has also been downloaded to the `data` directory. The provider should return tuples `(x, t)`, where `x` contains the values over a time window of arbitrary length N-1 (i.e. the last N-1 days) and `t` is the value on the N-th day, which the model should learn to predict. For now, skip missing data points (denoted by -99.9) and simply use the next valid value. Make sure the provider works for arbitrary `batch_size` settings, including the case where a single mini-batch contains all data points in the dataset. Test the provider in the cell below."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": true
+ },
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 2",
+ "language": "python",
+ "name": "python2"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 2
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython2",
+ "version": "2.7.9"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}