pswietojanski 2015-10-05 09:20:14 +01:00
parent e5ffdfeb60
commit 1d4587bd42


@@ -432,7 +432,7 @@
"source": [
"# Iterative learning of linear models\n",
"\n",
-"We will learn the model with stochastic gradient descent using mean square error (MSE) loss function, which is defined as follows:\n",
+"We will learn the model with stochastic gradient descent on N data-points using mean square error (MSE) loss function, which is defined as follows:\n",
"\n",
"(5) $\n",
"E = \\frac{1}{2} \\sum_{n=1}^N ||\\mathbf{y}^n - \\mathbf{t}^n||^2 = \\sum_{n=1}^N E^n \\\\\n",
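The cost in eq. (5) can be sketched in NumPy to show that the total cost E over N data-points is the sum of the per-example costs E^n. This is a minimal illustration, not the notebook's own `cost` function: the helper names `mse_cost` and `mse_cost_per_point` are hypothetical, and the sketch assumes `y` and `t` hold one data-point per row.

```python
import numpy as np

def mse_cost(y, t):
    # Total cost E = 1/2 * sum_n ||y^n - t^n||^2, one data-point per row
    return 0.5 * np.sum((y - t) ** 2)

def mse_cost_per_point(y, t):
    # Per-example costs E^n = 1/2 * ||y^n - t^n||^2, shape (N,)
    return 0.5 * np.sum((y - t) ** 2, axis=1)

# N = 3 data-points with 2 outputs each (illustrative values)
y = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]])
t = np.array([[0.0, 2.0], [0.0, 0.0], [1.0, -1.0]])

E_n = mse_cost_per_point(y, t)  # per-point costs: 0.5, 0.5, 2.0
E = mse_cost(y, t)              # total cost: 3.0, equal to E_n.sum()
```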
@@ -444,7 +444,8 @@
"Hence, the gradient w.r.t (with respect to) the $r$ output y of the model is defined as, so called delta function, $\\delta_r$: \n",
"\n",
"(8) $\\frac{\\partial{E^n}}{\\partial{y_{r}}} = (y^n_r - t^n_r) = \\delta^n_r \\quad ; \\quad\n",
-" \\delta^n_r = y^n_r - t^n_r \n",
+" \\delta^n_r = y^n_r - t^n_r \\\\\n",
+" \\frac{\\partial{E}}{\\partial{y_{r}}} = \\sum_{n=1}^N \\frac{\\partial{E^n}}{\\partial{y_{r}}} = \\sum_{n=1}^N \\delta^n_r\n",
"$\n",
"\n",
"Similarly, using the above $\\delta^n_r$ one can express the gradient of the weight $w_{sr}$ (from the s-th input to the r-th output) for linear model and MSE cost as follows:\n",
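The deltas of eq. (8), the summed output gradient, and the weight gradient that follows from them can be checked numerically against finite differences. This is a minimal NumPy sketch under stated assumptions: the model is taken to be `y = x @ W + b` with `W` of shape (inputs, outputs), so `W[s, r]` plays the role of w_sr; the weight-gradient form sum_n delta^n_r * x^n_s is derived here from that assumed model rather than quoted from the notebook; and all function names are hypothetical.

```python
import numpy as np

def deltas(y, t):
    # delta^n_r = y^n_r - t^n_r, one row per data-point n
    return y - t

def grad_wrt_outputs(y, t):
    # dE/dy_r = sum_n delta^n_r (sum over the N rows)
    return deltas(y, t).sum(axis=0)

def grad_wrt_weights(x, y, t):
    # dE/dw_sr = sum_n delta^n_r * x^n_s, ASSUMING y = x @ W + b, W of shape (S, R)
    return x.T @ deltas(y, t)

# Small random problem: N=4 data-points, S=3 inputs, R=2 outputs
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 2))
b = rng.normal(size=2)
t = rng.normal(size=(4, 2))

def total_cost(W):
    # E = 1/2 * sum_n ||y^n - t^n||^2 for the assumed linear model
    return 0.5 * np.sum((x @ W + b - t) ** 2)

analytic = grad_wrt_weights(x, x @ W + b, t)

# Central finite differences, perturbing one weight at a time
eps = 1e-6
numeric = np.zeros_like(W)
for s in range(W.shape[0]):
    for r in range(W.shape[1]):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[s, r] += eps
        W_minus[s, r] -= eps
        numeric[s, r] = (total_cost(W_plus) - total_cost(W_minus)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))  # True
```

Because E is quadratic in W, the central difference matches the analytic gradient up to floating-point round-off, which makes this a useful sanity check when filling in the gradient code.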
@@ -518,7 +519,6 @@
"\n",
"def fprop(x, W, b):\n",
" #code implementing eq. (3)\n",
-" #return: y\n",
" raise NotImplementedError('Write me!')\n",
"\n",
"def cost(y, t):\n",
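The `fprop` stub above refers to eq. (3), which is not part of this diff. The sketch below is one possible body, ASSUMING eq. (3) is the affine map y = xW + b with one data-point per row of `x`; if the notebook uses a different layout (for example y = Wx + b on column vectors), the transposes change accordingly.

```python
import numpy as np

def fprop(x, W, b):
    # ASSUMPTION: eq. (3) is y = x W + b, with x of shape (N, S),
    # W of shape (S, R) and b of shape (R,)
    return np.dot(x, W) + b  # b is broadcast across the N rows

x = np.array([[1.0, 2.0], [0.0, -1.0]])
W = np.array([[1.0, 0.0, 2.0], [0.5, 1.0, -1.0]])
b = np.array([0.1, 0.2, 0.3])
y = fprop(x, W, b)  # shape (2, 3): [[2.1, 2.2, 0.3], [-0.4, -0.8, 1.3]]
```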
@@ -571,8 +571,8 @@
" #4. Update the model, we update with the mean gradient\n",
" # over the minibatch, rather than sum of particular gradients\n",
" # in a minibatch, to do so we scale the learning rate by batch_size\n",
-" mb_size = x.shape[0]\n",
-" effect_learn_rate = learning_rate / mb_size\n",
+" batch_size = x.shape[0]\n",
+" effect_learn_rate = learning_rate / batch_size\n",
"\n",
" W = W - effect_learn_rate * grad_W\n",
" b = b - effect_learn_rate * grad_b\n",
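The update in step 4 divides the learning rate by `batch_size`, which is equivalent to stepping along the mean of the per-example gradients when `grad_W` and `grad_b` are sums over the minibatch. A minimal NumPy sketch of that equivalence, assuming the linear model y = x @ W + b with MSE cost (so the per-example weight gradient is the outer product of x^n and delta^n); `sgd_update` is a hypothetical helper, not the notebook's function.

```python
import numpy as np

def sgd_update(W, b, grad_W, grad_b, learning_rate, batch_size):
    # grad_W, grad_b are SUMS over the minibatch; scaling the learning
    # rate by 1/batch_size turns the step into one along the mean gradient
    effect_learn_rate = learning_rate / batch_size
    W = W - effect_learn_rate * grad_W
    b = b - effect_learn_rate * grad_b
    return W, b

rng = np.random.default_rng(1)
batch_size, n_in, n_out = 8, 3, 2
x = rng.normal(size=(batch_size, n_in))
t = rng.normal(size=(batch_size, n_out))
W = np.zeros((n_in, n_out))
b = np.zeros(n_out)
learning_rate = 0.1

delta = (x @ W + b) - t      # deltas for every example in the minibatch
grad_W = x.T @ delta         # sum of per-example weight gradients
grad_b = delta.sum(axis=0)   # sum of per-example bias gradients

W_new, b_new = sgd_update(W, b, grad_W, grad_b, learning_rate, batch_size)

# The same step written with an explicit mean of per-example gradients
mean_grad_W = np.mean([np.outer(x[n], delta[n]) for n in range(batch_size)], axis=0)
print(np.allclose(W_new, W - learning_rate * mean_grad_W))  # True
```

Using the mean rather than the sum keeps the effective step size roughly independent of the minibatch size, so the same `learning_rate` remains usable when `batch_size` changes.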