re-commit (01_Linear_Models and 01_Linear_Models_solution text do not match!!!)

This commit is contained in:
kungfujam 2015-10-16 18:11:49 +01:00
parent 40156f3778
commit 80b025e525


@ -55,19 +55,19 @@
"***\n",
"### Note on storing matrices in computer memory\n",
"\n",
"Suppose you want to store the following matrix in memory: $\\left[ \\begin{array}{ccc}\n",
    "Suppose you want to store the following array in memory: $\\left[ \\begin{array}{ccc}\n",
"1 & 2 & 3 \\\\\n",
"4 & 5 & 6 \\\\\n",
"7 & 8 & 9 \\end{array} \\right]$ \n",
"\n",
"If you allocate the memory at once for the whole matrix, then the above matrix would be organised as a vector in one of two possible forms:\n",
    "In computer memory the above matrix would be organised as a vector in one of two layouts (assuming you allocate the memory for the whole matrix at once):\n",
"\n",
"* Row-wise layout where the order would look like: $\\left [ \\begin{array}{ccccccccc}\n",
"1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\end{array} \\right ]$\n",
"* Column-wise layout where the order would look like: $\\left [ \\begin{array}{ccccccccc}\n",
"1 & 4 & 7 & 2 & 5 & 8 & 3 & 6 & 9 \\end{array} \\right ]$\n",
"\n",
"Although `numpy` can easily handle both formats (possibly with some computational overhead), in our code we will stick with the more modern (and default) `C`-like approach and use the row-wise format (in contrast to Fortran that used a column-wise approach). \n",
    "Although `numpy` can easily handle both formats (possibly with some computational overhead), in our code we will stick with the modern (and default) `C`-like approach and use the row-wise format (in contrast to Fortran, which used a column-wise approach). \n",
"\n",
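The two layouts can be checked directly in `numpy` with the `order` argument (a quick illustrative sketch, not part of the notebook itself; `order='C'` is the row-wise default, `order='F'` the Fortran column-wise layout):

```python
import numpy as np

# the 3x3 matrix from the note above
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# row-wise (C-order, numpy's default) memory layout
row_wise = A.flatten(order='C')

# column-wise (Fortran-order) memory layout
col_wise = A.flatten(order='F')

print(row_wise)  # [1 2 3 4 5 6 7 8 9]
print(col_wise)  # [1 4 7 2 5 8 3 6 9]
```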
    "This means that in this tutorial:\n",
    "* vectors are kept row-wise $\\mathbf{x} = (x_1, x_2, \\ldots, x_D) $ (rather than $\\mathbf{x} = (x_1, x_2, \\ldots, x_D)^T$)\n",
@ -76,18 +76,18 @@
"x_{21} & x_{22} & \\ldots & x_{2D} \\\\\n",
"x_{31} & x_{32} & \\ldots & x_{3D} \\\\ \\end{array} \\right]$ and each row (i.e. $\\left[ \\begin{array}{cccc} x_{11} & x_{12} & \\ldots & x_{1D} \\end{array} \\right]$) represents a single data-point (like one MNIST image or one window of observations)\n",
"\n",
"In lecture slides you will find the equations following the conventional mathematical approach, using column vectors, but you can easily map between column-major and row-major organisations using a matrix transpose.\n",
    "In lecture slides you will find the equations following the conventional mathematical column-wise approach, but you can easily map them one way or the other using a matrix transpose.\n",
"\n",
"***\n",
"\n",
"## Linear and Affine Transforms\n",
"\n",
    "The basis of all linear models is the so-called affine transform, which is a transform that implements a linear transformation and translation of the input features. The transforms we are going to use are parameterised by:\n",
    "The basis of all linear models is the so-called affine transform, that is, a transform that implements a linear transformation and translation of the input features. The transforms we are going to use are parameterised by:\n",
"\n",
" * A weight matrix $\\mathbf{W} \\in \\mathbb{R}^{D\\times K}$: where element $w_{ik}$ is the weight from input $x_i$ to output $y_k$\n",
    " * A bias vector $\\mathbf{b}\\in \\mathbb{R}^{K}$: where element $b_{k}$ is the bias for output $k$\n",
" * Weight matrix $\\mathbf{W} \\in \\mathbb{R}^{D\\times K}$: where element $w_{ik}$ is the weight from input $x_i$ to output $y_k$\n",
    " * Bias vector $\\mathbf{b}\\in \\mathbb{R}^{K}$: where element $b_{k}$ is the bias for output $k$\n",
"\n",
    "Note, the bias is simply some additive term, and can be easily incorporated into an additional row in the weight matrix and an additional input in the inputs which is set to $1.0$ (as in the below picture taken from the lecture slides). However, here (and in the code) we will keep them separate.\n",
    "Note, the bias is simply some additive term, and can be easily incorporated into an additional row in the weight matrix and an additional input in the inputs which is set to $1.0$ (as in the below picture taken from the lecture slides). However, here (and in the code) we will keep them separate.\n",
"\n",
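The equivalence between a separate bias vector and a bias folded into an augmented weight matrix can be sketched as follows (shapes and variable names here are illustrative, not the notebook's code):

```python
import numpy as np

rng = np.random.RandomState(0)
D, K, B = 4, 3, 2                       # input dim, output dim, batch size
x = rng.randn(B, D)
W = rng.randn(D, K)
b = rng.randn(K)

# affine transform with an explicit bias vector
y = np.dot(x, W) + b

# same result with the bias as an extra row of the weight matrix
# and a constant 1.0 appended to every input vector
W_aug = np.vstack([W, b])               # shape (D+1, K)
x_aug = np.hstack([x, np.ones((B, 1))]) # shape (B, D+1)
y_aug = np.dot(x_aug, W_aug)

assert np.allclose(y, y_aug)
```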
"![Making Predictions](res/singleLayerNetWts-1.png)\n",
"\n",
@ -152,13 +152,14 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy\n",
"import sys\n",
"\n",
"#initialise the random generator to be used later\n",
"seed=[2015, 10, 1]\n",
@ -185,7 +186,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 3,
"metadata": {
"collapsed": false
},
@ -204,7 +205,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 4,
"metadata": {
"collapsed": false
},
@ -273,11 +274,24 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[-0.00683757 -0.13638553 0.00203203 ..., 0.02690207 -0.07364245\n",
" 0.04403087]\n",
" [-0.00447621 -0.06409652 0.01211384 ..., 0.0402248 -0.04490571\n",
" -0.05013801]\n",
" [ 0.03981022 -0.13705957 0.05882239 ..., 0.04491902 -0.08644539\n",
" -0.07106441]]\n"
]
}
],
"source": [
"y = y_equation_3(x, W, b)\n",
"z = numpy.dot(y, W.T)\n",
@ -307,7 +321,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 5,
"metadata": {
"collapsed": false
},
@ -331,11 +345,19 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Well done!\n"
]
}
],
"source": [
"irange = 0.1 #+-range from which we draw the random numbers\n",
"\n",
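A common way to use such an `irange` hyper-parameter is to draw the initial weights uniformly from $[-\text{irange}, \text{irange}]$, with biases started at zero. A minimal sketch under that assumption (not necessarily the notebook's exact initialisation code):

```python
import numpy as np

irange = 0.1                            # +-range from which we draw the random numbers
rng = np.random.RandomState(2015)

D, K = 784, 10                          # e.g. flattened MNIST inputs to 10 outputs
W = rng.uniform(-irange, irange, (D, K))
b = np.zeros(K)                         # biases are typically initialised to zero

assert W.shape == (D, K)
assert W.min() >= -irange and W.max() <= irange
```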
@ -357,7 +379,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 7,
"metadata": {
"collapsed": true
},
@ -392,11 +414,19 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Well done!\n"
]
}
],
"source": [
"irange = 0.1 #+-range from which we draw the random numbers\n",
"\n",
@ -425,7 +455,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 9,
"metadata": {
"collapsed": true
},
@ -438,11 +468,20 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"my_dot timings:\n",
"10 loops, best of 3: 726 ms per loop\n"
]
}
],
"source": [
"print 'my_dot timings:'\n",
"%timeit -n10 my_dot_mat_mat(x, W)"
@ -450,11 +489,20 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"numpy.dot timings:\n",
"10 loops, best of 3: 1.17 ms per loop\n"
]
}
],
"source": [
"print 'numpy.dot timings:'\n",
"%timeit -n10 numpy.dot(x, W)"
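The `my_dot_mat_mat` being timed above is defined earlier in the notebook; a naive loop-based matrix product along the following lines illustrates why the vectorised `numpy.dot` is roughly three orders of magnitude faster, as the timings show (this is an illustrative reimplementation, not the notebook's function):

```python
import numpy as np

def naive_dot(A, B):
    """Triple-loop matrix product -- for illustration only."""
    I, J = A.shape
    J2, K = B.shape
    assert J == J2, "inner dimensions must match"
    C = np.zeros((I, K))
    for i in range(I):
        for k in range(K):
            s = 0.0
            for j in range(J):
                s += A[i, j] * B[j, k]
            C[i, k] = s
    return C

rng = np.random.RandomState(0)
A = rng.randn(8, 5)
B = rng.randn(5, 3)
C = naive_dot(A, B)
assert np.allclose(C, np.dot(A, B))
```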
@ -522,11 +570,78 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Observations: [[-0.12 -0.13 -0.13 -0.13 -0.13 -0.13 -0.13 -0.13 -0.13 -0.13]\n",
" [-0.11 -0.1 0.09 -0.06 -0.09 -0. 0.28 -0.12 -0.12 -0.08]\n",
" [-0.13 0.05 -0.13 -0.01 -0.11 -0.13 -0.13 -0.13 -0.13 -0.13]\n",
" [ 0.2 0.12 0.25 0.16 0.03 -0. 0.15 0.08 -0.08 -0.11]\n",
" [-0.13 -0.12 -0.13 -0.13 -0.13 -0.13 -0.13 -0.13 -0.13 -0.13]\n",
" [-0.1 0.51 1.52 0.14 -0.02 0.77 0.11 0.79 -0.02 0.08]\n",
" [ 0.24 0.15 -0.01 0.08 -0.1 0.45 -0.12 -0.1 -0.13 0.48]\n",
" [ 0.13 -0.06 -0.07 -0.11 -0.11 -0.11 -0.13 -0.11 -0.02 -0.12]\n",
" [-0.06 0.28 -0.13 0.06 0.09 0.09 0.01 -0.07 0.14 -0.11]\n",
" [-0.13 -0.13 -0.1 -0.06 -0.13 -0.13 -0.13 -0.13 -0.13 -0.13]]\n",
"To predict: [[-0.12]\n",
" [-0.12]\n",
" [-0.13]\n",
" [-0.1 ]\n",
" [-0.13]\n",
" [-0.08]\n",
" [ 0.24]\n",
" [-0.13]\n",
" [-0.02]\n",
" [-0.13]]\n",
"Observations: [[-0.09 -0.13 -0.13 -0.03 -0.05 -0.11 -0.13 -0.13 -0.13 -0.13]\n",
" [-0.03 0.32 0.28 0.09 -0.04 0.19 0.31 -0.13 0.37 0.34]\n",
" [ 0.12 0.13 0.06 -0.1 -0.1 0.94 0.24 0.12 0.28 -0.04]\n",
" [ 0.26 0.17 -0.04 -0.13 -0.12 -0.09 -0.12 -0.13 -0.1 -0.13]\n",
" [-0.1 -0.1 -0.01 -0.03 -0.07 0.05 -0.03 -0.12 -0.05 -0.13]\n",
" [-0.13 -0.13 -0.13 -0.13 -0.13 -0.13 0.1 -0.13 -0.13 -0.13]\n",
" [-0.01 -0.1 -0.13 -0.13 -0.12 -0.13 -0.13 -0.13 -0.13 -0.11]\n",
" [-0.11 -0.06 -0.11 0.02 -0.03 -0.02 -0.05 -0.11 -0.13 -0.13]\n",
" [-0.01 0.25 -0.08 0.04 -0.1 -0.12 0.06 -0.1 0.08 -0.06]\n",
" [-0.09 -0.09 -0.09 -0.13 -0.11 -0.12 -0. -0.02 0.19 -0.11]]\n",
"To predict: [[-0.13]\n",
" [-0.11]\n",
" [-0.09]\n",
" [-0.08]\n",
" [ 0.19]\n",
" [-0.13]\n",
" [-0.13]\n",
" [-0.03]\n",
" [-0.13]\n",
" [-0.11]]\n",
"Observations: [[-0.08 -0.11 -0.11 0.32 0.05 -0.11 -0.13 0.07 0.08 0.63]\n",
" [-0.07 -0.1 -0.09 -0.08 0.26 -0.05 -0.1 -0. 0.36 -0.12]\n",
" [-0.03 -0.1 0.19 -0.02 0.35 0.38 -0.1 0.44 -0.02 0.21]\n",
" [-0.12 -0. -0.02 0.19 -0.11 -0.11 -0.13 -0.11 -0.02 -0.13]\n",
" [ 0.09 0.1 -0.03 -0.05 0. -0.12 -0.12 -0.13 -0.13 -0.13]\n",
" [ 0.21 0.05 -0.12 -0.05 -0.08 -0.1 -0.13 -0.13 -0.13 -0.13]\n",
" [-0.04 -0.11 0.19 0.16 -0.01 -0.07 -0. -0.06 -0.03 0.16]\n",
" [ 0.09 0.05 0.51 0.34 0.16 0.51 0.56 0.21 -0.06 -0. ]\n",
" [-0.13 -0.13 -0.13 -0.13 -0.13 -0.13 -0.13 -0.13 -0.09 0.49]\n",
" [-0.06 -0.11 -0.13 0.06 -0.01 -0.12 0.54 0.2 -0.1 -0.11]]\n",
"To predict: [[ 0.1 ]\n",
" [ 0.09]\n",
" [ 0.16]\n",
" [-0.13]\n",
" [-0.13]\n",
" [ 0.04]\n",
" [-0.1 ]\n",
" [ 0.05]\n",
" [-0.1 ]\n",
" [-0.11]]\n"
]
}
],
"source": [
"from mlp.dataset import MetOfficeDataProvider\n",
"\n",
@ -549,7 +664,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 16,
"metadata": {
"collapsed": true
},
@ -582,7 +697,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 17,
"metadata": {
"collapsed": true
},
@ -640,11 +755,74 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 18,
"metadata": {
"collapsed": false
},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"MSE training cost after 1-th epoch is 0.017213\n",
"MSE training cost after 2-th epoch is 0.016103\n",
"MSE training cost after 3-th epoch is 0.015705\n",
"MSE training cost after 4-th epoch is 0.015437\n",
"MSE training cost after 5-th epoch is 0.015255\n",
"MSE training cost after 6-th epoch is 0.015128\n",
"MSE training cost after 7-th epoch is 0.015041\n",
"MSE training cost after 8-th epoch is 0.014981\n",
"MSE training cost after 9-th epoch is 0.014936\n",
"MSE training cost after 10-th epoch is 0.014903\n",
"MSE training cost after 11-th epoch is 0.014879\n",
"MSE training cost after 12-th epoch is 0.014862\n",
"MSE training cost after 13-th epoch is 0.014849\n",
"MSE training cost after 14-th epoch is 0.014839\n",
"MSE training cost after 15-th epoch is 0.014830\n",
"MSE training cost after 16-th epoch is 0.014825\n",
"MSE training cost after 17-th epoch is 0.014820\n",
"MSE training cost after 18-th epoch is 0.014813\n",
"MSE training cost after 19-th epoch is 0.014813\n",
"MSE training cost after 20-th epoch is 0.014810\n",
"MSE training cost after 21-th epoch is 0.014808\n",
"MSE training cost after 22-th epoch is 0.014805\n",
"MSE training cost after 23-th epoch is 0.014806\n",
"MSE training cost after 24-th epoch is 0.014804\n",
"MSE training cost after 25-th epoch is 0.014796\n",
"MSE training cost after 26-th epoch is 0.014798\n",
"MSE training cost after 27-th epoch is 0.014801\n",
"MSE training cost after 28-th epoch is 0.014802\n",
"MSE training cost after 29-th epoch is 0.014801\n",
"MSE training cost after 30-th epoch is 0.014799\n",
"MSE training cost after 31-th epoch is 0.014799\n",
"MSE training cost after 32-th epoch is 0.014793\n",
"MSE training cost after 33-th epoch is 0.014800\n",
"MSE training cost after 34-th epoch is 0.014796\n",
"MSE training cost after 35-th epoch is 0.014799\n",
"MSE training cost after 36-th epoch is 0.014800\n",
"MSE training cost after 37-th epoch is 0.014798\n",
"MSE training cost after 38-th epoch is 0.014799\n",
"MSE training cost after 39-th epoch is 0.014799\n",
"MSE training cost after 40-th epoch is 0.014794\n"
]
},
{
"data": {
"text/plain": [
"(array([[ 0.01],\n",
" [ 0.03],\n",
" [ 0.03],\n",
" [ 0.04],\n",
" [ 0.06],\n",
" [ 0.07],\n",
" [ 0.26]]), array([-0.]))"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"\n",
"#some hyper-parameters\n",
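The decreasing MSE trace in the output above comes from gradient-descent training of a linear model. A self-contained sketch of that kind of loop, on synthetic data rather than the `MetOfficeDataProvider` windows (all names and hyper-parameters here are illustrative):

```python
import numpy as np

rng = np.random.RandomState(2015)
N, D = 200, 7                         # data-points, window size
true_w = rng.randn(D, 1)
x = rng.randn(N, D)
t = np.dot(x, true_w) + 0.1 * rng.randn(N, 1)  # noisy linear targets

W = np.zeros((D, 1))                  # start from zero weights
b = np.zeros(1)
lr, n_epochs = 0.05, 40

initial_mse = np.mean(t ** 2)         # cost at the zero-weight start
for epoch in range(n_epochs):
    y = np.dot(x, W) + b              # forward pass: affine transform
    err = y - t
    grad_W = np.dot(x.T, err) / N     # gradient of the (halved) MSE w.r.t. W
    grad_b = np.mean(err, axis=0)
    W -= lr * grad_W                  # gradient-descent updates
    b -= lr * grad_b

final_mse = np.mean((np.dot(x, W) + b - t) ** 2)
assert final_mse < initial_mse        # cost decreases, as in the trace above
```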