diff --git a/01_Linear_Models_solution.ipynb b/01_Linear_Models_solution.ipynb
index 961fd21..b5f7a4a 100644
--- a/01_Linear_Models_solution.ipynb
+++ b/01_Linear_Models_solution.ipynb
@@ -55,19 +55,19 @@
 "***\n",
 "### Note on storing matrices in computer memory\n",
 "\n",
- "Suppose you want to store the following matrix in memory: $\left[ \begin{array}{ccc}\n",
+ "Consider storing the following matrix in memory: $\left[ \begin{array}{ccc}\n",
 "1 & 2 & 3 \\\n",
 "4 & 5 & 6 \\\n",
 "7 & 8 & 9 \end{array} \right]$ \n",
 "\n",
- "If you allocate the memory at once for the whole matrix, then the above matrix would be organised as a vector in one of two possible forms:\n",
+ "In computer memory the above matrix would be organised as a vector in one of two layouts (assuming you allocate the memory for the whole matrix at once):\n",
 "\n",
 "* Row-wise layout where the order would look like: $\left [ \begin{array}{ccccccccc}\n",
 "1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \end{array} \right ]$\n",
 "* Column-wise layout where the order would look like: $\left [ \begin{array}{ccccccccc}\n",
 "1 & 4 & 7 & 2 & 5 & 8 & 3 & 6 & 9 \end{array} \right ]$\n",
 "\n",
- "Although `numpy` can easily handle both formats (possibly with some computational overhead), in our code we will stick with the more modern (and default) `C`-like approach and use the row-wise format (in contrast to Fortran that used a column-wise approach). \n",
+ "Although `numpy` can easily handle both formats (possibly with some computational overhead), in our code we will stick with the modern (and default) `C`-like approach and use the row-wise format (in contrast to Fortran, which used a column-wise approach). \n",
 "\n",
 "This means that in this tutorial:\n",
 "* vectors are kept row-wise $\mathbf{x} = (x_1, x_2, \ldots, x_D) $ (rather than $\mathbf{x} = (x_1, x_2, \ldots, x_D)^T$)\n",
@@ -76,18 +76,18 @@
 "x_{21} & x_{22} & \ldots & x_{2D} \\\n",
 "x_{31} & x_{32} & \ldots & x_{3D} \\ \end{array} \right]$ and each row (i.e. $\left[ \begin{array}{cccc} x_{11} & x_{12} & \ldots & x_{1D} \end{array} \right]$) represents a single data-point (like one MNIST image or one window of observations)\n",
 "\n",
- "In lecture slides you will find the equations following the conventional mathematical approach, using column vectors, but you can easily map between column-major and row-major organisations using a matrix transpose.\n",
+ "In lecture slides you will find the equations following the conventional mathematical column-wise approach, but you can easily map between the two conventions using a matrix transpose.\n",
 "\n",
 "***\n",
 "\n",
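For example, the two layouts can be inspected directly with `numpy` — a minimal sketch, not part of the notebook itself, using Python 2 `print` statements to match the notebook's code:

```python
import numpy

# the 3x3 matrix from the note above
A = numpy.arange(1, 10).reshape(3, 3)

print A.ravel(order='C')       # row-wise (C) order:          [1 2 3 4 5 6 7 8 9]
print A.ravel(order='F')       # column-wise (Fortran) order: [1 4 7 2 5 8 3 6 9]
print A.flags['C_CONTIGUOUS']  # True -- numpy allocates row-wise by default
```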
 "## Linear and Affine Transforms\n",
 "\n",
- "The basis of all linear models is the so called affine transform, which is a transform that implements a linear transformation and translation of the input features. The transforms we are going to use are parameterised by:\n",
+ "The basis of all linear models is the so-called affine transform: a transform that implements a linear transformation and translation of the input features. The transforms we are going to use are parameterised by:\n",
 "\n",
- " * A weight matrix $\mathbf{W} \in \mathbb{R}^{D\times K}$: where element $w_{ik}$ is the weight from input $x_i$ to output $y_k$\n",
- " * A bias vector $\mathbf{b}\in R^{K}$ : where element $b_{k}$ is the bias for output $k$\n",
+ " * Weight matrix $\mathbf{W} \in \mathbb{R}^{D\times K}$: where element $w_{ik}$ is the weight from input $x_i$ to output $y_k$\n",
+ " * Bias vector $\mathbf{b} \in \mathbb{R}^{K}$: where element $b_{k}$ is the bias for output $k$\n",
 "\n",
- "Note, the bias is simply some additive term, and can be easily incorporated into an additional row in weight matrix and an additional input in the inputs which is set to $1.0$ (as in the below picture taken from the lecture slides). However, here (and in the code) we will keep them separate.\n",
+ "Note, the bias is simply some additive term, and can be easily incorporated into an additional row in the weight matrix and an additional input which is set to $1.0$ (as in the below picture taken from the lecture slides). However, here (and in the code) we will keep them separate.\n",
 "\n",
 "![Making Predictions](res/singleLayerNetWts-1.png)\n",
 "\n",
@@ -152,13 +152,14 @@
 },
 {
 "cell_type": "code",
- "execution_count": null,
+ "execution_count": 2,
 "metadata": {
 "collapsed": false
 },
 "outputs": [],
 "source": [
 "import numpy\n",
+ "import sys\n",
 "\n",
 "#initialise the random generator to be used later\n",
 "seed=[2015, 10, 1]\n",
@@ -185,7 +186,7 @@
 },
 {
 "cell_type": "code",
- "execution_count": null,
+ "execution_count": 3,
 "metadata": {
 "collapsed": false
 },
@@ -204,7 +205,7 @@
 },
 {
 "cell_type": "code",
- "execution_count": 9,
+ "execution_count": 4,
 "metadata": {
 "collapsed": false
 },
@@ -273,11 +274,24 @@
 },
 {
 "cell_type": "code",
- "execution_count": null,
+ "execution_count": 4,
 "metadata": {
 "collapsed": false
 },
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[[-0.00683757 -0.13638553  0.00203203 ...,  0.02690207 -0.07364245\n",
+ "   0.04403087]\n",
+ " [-0.00447621 -0.06409652  0.01211384 ...,  0.0402248  -0.04490571\n",
+ "  -0.05013801]\n",
+ " [ 0.03981022 -0.13705957  0.05882239 ...,  0.04491902 -0.08644539\n",
+ "  -0.07106441]]\n"
+ ]
+ }
+ ],
 "source": [
 "y = y_equation_3(x, W, b)\n",
 "z = numpy.dot(y, W.T)\n",
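The body of `y_equation_3` is elided by the hunk above; given the row-wise convention from the note on memory layout, it presumably computes the affine transform $\mathbf{y} = \mathbf{x}\mathbf{W} + \mathbf{b}$ of equation (3). A minimal sketch under that assumption:

```python
import numpy

def y_equation_3(x, W, b):
    # row-wise affine transform: x is (B, D), W is (D, K), b is (K,);
    # b is broadcast across the B rows (data-points) of x
    return numpy.dot(x, W) + b
```

The next line, `z = numpy.dot(y, W.T)`, then maps the $K$-dimensional outputs back into the $D$-dimensional input space, hence the transpose.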
"collapsed": false }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "my_dot timings:\n", + "10 loops, best of 3: 726 ms per loop\n" + ] + } + ], "source": [ "print 'my_dot timings:'\n", "%timeit -n10 my_dot_mat_mat(x, W)" @@ -450,11 +489,20 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 11, "metadata": { "collapsed": false }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "numpy.dot timings:\n", + "10 loops, best of 3: 1.17 ms per loop\n" + ] + } + ], "source": [ "print 'numpy.dot timings:'\n", "%timeit -n10 numpy.dot(x, W)" @@ -522,11 +570,78 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 15, "metadata": { "collapsed": false }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Observations: [[-0.12 -0.13 -0.13 -0.13 -0.13 -0.13 -0.13 -0.13 -0.13 -0.13]\n", + " [-0.11 -0.1 0.09 -0.06 -0.09 -0. 0.28 -0.12 -0.12 -0.08]\n", + " [-0.13 0.05 -0.13 -0.01 -0.11 -0.13 -0.13 -0.13 -0.13 -0.13]\n", + " [ 0.2 0.12 0.25 0.16 0.03 -0. 0.15 0.08 -0.08 -0.11]\n", + " [-0.13 -0.12 -0.13 -0.13 -0.13 -0.13 -0.13 -0.13 -0.13 -0.13]\n", + " [-0.1 0.51 1.52 0.14 -0.02 0.77 0.11 0.79 -0.02 0.08]\n", + " [ 0.24 0.15 -0.01 0.08 -0.1 0.45 -0.12 -0.1 -0.13 0.48]\n", + " [ 0.13 -0.06 -0.07 -0.11 -0.11 -0.11 -0.13 -0.11 -0.02 -0.12]\n", + " [-0.06 0.28 -0.13 0.06 0.09 0.09 0.01 -0.07 0.14 -0.11]\n", + " [-0.13 -0.13 -0.1 -0.06 -0.13 -0.13 -0.13 -0.13 -0.13 -0.13]]\n", + "To predict: [[-0.12]\n", + " [-0.12]\n", + " [-0.13]\n", + " [-0.1 ]\n", + " [-0.13]\n", + " [-0.08]\n", + " [ 0.24]\n", + " [-0.13]\n", + " [-0.02]\n", + " [-0.13]]\n", + "Observations: [[-0.09 -0.13 -0.13 -0.03 -0.05 -0.11 -0.13 -0.13 -0.13 -0.13]\n", + " [-0.03 0.32 0.28 0.09 -0.04 0.19 0.31 -0.13 0.37 0.34]\n", + " [ 0.12 0.13 0.06 -0.1 -0.1 0.94 0.24 0.12 0.28 -0.04]\n", + " [ 0.26 0.17 -0.04 -0.13 -0.12 -0.09 -0.12 -0.13 -0.1 -0.13]\n", + " [-0.1 -0.1 -0.01 -0.03 -0.07 0.05 -0.03 -0.12 -0.05 -0.13]\n", + " [-0.13 -0.13 -0.13 -0.13 -0.13 -0.13 0.1 -0.13 -0.13 -0.13]\n", + " [-0.01 -0.1 -0.13 -0.13 -0.12 -0.13 -0.13 -0.13 -0.13 -0.11]\n", + " [-0.11 -0.06 -0.11 0.02 -0.03 -0.02 -0.05 -0.11 -0.13 -0.13]\n", + " [-0.01 0.25 -0.08 0.04 -0.1 -0.12 0.06 -0.1 0.08 -0.06]\n", + " [-0.09 -0.09 -0.09 -0.13 -0.11 -0.12 -0. -0.02 0.19 -0.11]]\n", + "To predict: [[-0.13]\n", + " [-0.11]\n", + " [-0.09]\n", + " [-0.08]\n", + " [ 0.19]\n", + " [-0.13]\n", + " [-0.13]\n", + " [-0.03]\n", + " [-0.13]\n", + " [-0.11]]\n", + "Observations: [[-0.08 -0.11 -0.11 0.32 0.05 -0.11 -0.13 0.07 0.08 0.63]\n", + " [-0.07 -0.1 -0.09 -0.08 0.26 -0.05 -0.1 -0. 0.36 -0.12]\n", + " [-0.03 -0.1 0.19 -0.02 0.35 0.38 -0.1 0.44 -0.02 0.21]\n", + " [-0.12 -0. -0.02 0.19 -0.11 -0.11 -0.13 -0.11 -0.02 -0.13]\n", + " [ 0.09 0.1 -0.03 -0.05 0. -0.12 -0.12 -0.13 -0.13 -0.13]\n", + " [ 0.21 0.05 -0.12 -0.05 -0.08 -0.1 -0.13 -0.13 -0.13 -0.13]\n", + " [-0.04 -0.11 0.19 0.16 -0.01 -0.07 -0. -0.06 -0.03 0.16]\n", + " [ 0.09 0.05 0.51 0.34 0.16 0.51 0.56 0.21 -0.06 -0. 
]\n", + " [-0.13 -0.13 -0.13 -0.13 -0.13 -0.13 -0.13 -0.13 -0.09 0.49]\n", + " [-0.06 -0.11 -0.13 0.06 -0.01 -0.12 0.54 0.2 -0.1 -0.11]]\n", + "To predict: [[ 0.1 ]\n", + " [ 0.09]\n", + " [ 0.16]\n", + " [-0.13]\n", + " [-0.13]\n", + " [ 0.04]\n", + " [-0.1 ]\n", + " [ 0.05]\n", + " [-0.1 ]\n", + " [-0.11]]\n" + ] + } + ], "source": [ "from mlp.dataset import MetOfficeDataProvider\n", "\n", @@ -549,7 +664,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 16, "metadata": { "collapsed": true }, @@ -582,7 +697,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 17, "metadata": { "collapsed": true }, @@ -640,11 +755,74 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 18, "metadata": { "collapsed": false }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "MSE training cost after 1-th epoch is 0.017213\n", + "MSE training cost after 2-th epoch is 0.016103\n", + "MSE training cost after 3-th epoch is 0.015705\n", + "MSE training cost after 4-th epoch is 0.015437\n", + "MSE training cost after 5-th epoch is 0.015255\n", + "MSE training cost after 6-th epoch is 0.015128\n", + "MSE training cost after 7-th epoch is 0.015041\n", + "MSE training cost after 8-th epoch is 0.014981\n", + "MSE training cost after 9-th epoch is 0.014936\n", + "MSE training cost after 10-th epoch is 0.014903\n", + "MSE training cost after 11-th epoch is 0.014879\n", + "MSE training cost after 12-th epoch is 0.014862\n", + "MSE training cost after 13-th epoch is 0.014849\n", + "MSE training cost after 14-th epoch is 0.014839\n", + "MSE training cost after 15-th epoch is 0.014830\n", + "MSE training cost after 16-th epoch is 0.014825\n", + "MSE training cost after 17-th epoch is 0.014820\n", + "MSE training cost after 18-th epoch is 0.014813\n", + "MSE training cost after 19-th epoch is 0.014813\n", + "MSE training cost after 20-th epoch is 0.014810\n", + "MSE training cost after 21-th epoch is 0.014808\n", + "MSE training cost after 22-th epoch is 0.014805\n", + "MSE training cost after 23-th epoch is 0.014806\n", + "MSE training cost after 24-th epoch is 0.014804\n", + "MSE training cost after 25-th epoch is 0.014796\n", + "MSE training cost after 26-th epoch is 0.014798\n", + "MSE training cost after 27-th epoch is 0.014801\n", + "MSE training cost after 28-th epoch is 0.014802\n", + "MSE training cost after 29-th epoch is 0.014801\n", + "MSE training cost after 30-th epoch is 0.014799\n", + "MSE training cost after 31-th epoch is 0.014799\n", + "MSE training cost after 32-th epoch is 0.014793\n", + "MSE training cost after 33-th epoch is 0.014800\n", + "MSE training cost after 34-th epoch is 0.014796\n", + "MSE training cost after 35-th epoch is 0.014799\n", + "MSE training cost after 36-th epoch is 0.014800\n", + "MSE training cost after 37-th epoch is 0.014798\n", + "MSE training cost after 38-th epoch is 0.014799\n", + "MSE training cost after 39-th epoch is 0.014799\n", + "MSE training cost after 40-th epoch is 0.014794\n" + ] + }, + { + "data": { + "text/plain": [ + "(array([[ 0.01],\n", + " [ 0.03],\n", + " [ 0.03],\n", + " [ 0.04],\n", + " [ 0.06],\n", + " [ 0.07],\n", + " [ 0.26]]), array([-0.]))" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "\n", "#some hyper-parameters\n",