Merge pull request #12 from pswietojanski/master

2nd lab
This commit is contained in:
Pawel Swietojanski 2015-10-05 09:20:55 +01:00
commit c1e1d61781


@@ -432,7 +432,7 @@
"source": [
"# Iterative learning of linear models\n",
"\n",
-"We will learn the model with stochastic gradient descent using mean square error (MSE) loss function, which is defined as follows:\n",
+"We will learn the model with stochastic gradient descent on N data-points using mean square error (MSE) loss function, which is defined as follows:\n",
"\n",
"(5) $\n",
"E = \\frac{1}{2} \\sum_{n=1}^N ||\\mathbf{y}^n - \\mathbf{t}^n||^2 = \\sum_{n=1}^N E^n \\\\\n",
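For readers following the diff, the MSE cost of eq. (5) can be sketched in NumPy as below; the function and variable names are illustrative, not taken from the lab's notebook:

```python
import numpy as np

def mse_cost(y, t):
    """MSE over N data-points, eq. (5): E = 1/2 * sum_n ||y^n - t^n||^2."""
    # y, t: arrays of shape (N, num_outputs)
    return 0.5 * np.sum((y - t) ** 2)

y = np.array([[1.0, 2.0], [3.0, 4.0]])  # model outputs
t = np.array([[1.0, 1.0], [2.0, 2.0]])  # targets
print(mse_cost(y, t))  # 0.5 * (0 + 1 + 1 + 4) = 3.0
```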
@@ -444,7 +444,8 @@
"Hence, the gradient w.r.t (with respect to) the $r$ output y of the model is defined as, so called delta function, $\\delta_r$: \n",
"\n",
"(8) $\\frac{\\partial{E^n}}{\\partial{y_{r}}} = (y^n_r - t^n_r) = \\delta^n_r \\quad ; \\quad\n",
-" \\delta^n_r = y^n_r - t^n_r \n",
+" \\delta^n_r = y^n_r - t^n_r \\\\\n",
+" \\frac{\\partial{E}}{\\partial{y_{r}}} = \\sum_{n=1}^N \\frac{\\partial{E^n}}{\\partial{y_{r}}} = \\sum_{n=1}^N \\delta^n_r\n",
"$\n",
"\n",
"Similarly, using the above $\\delta^n_r$ one can express the gradient of the weight $w_{sr}$ (from the s-th input to the r-th output) for linear model and MSE cost as follows:\n",
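The delta-based weight gradient described here, $\partial{E^n}/\partial{w_{sr}} = \delta^n_r x^n_s$ summed over the batch, can be sketched as a vectorised NumPy snippet (names are illustrative, not the notebook's):

```python
import numpy as np

def mse_grads(x, y, t):
    """Gradients of the MSE cost via the deltas of eq. (8).

    x: (N, num_inputs); y, t: (N, num_outputs), for the linear model.
    """
    deltas = y - t                  # delta^n_r = y^n_r - t^n_r, eq. (8)
    grad_W = np.dot(x.T, deltas)    # sum_n x^n_s * delta^n_r, per weight w_sr
    grad_b = np.sum(deltas, axis=0) # bias gradient: sum of deltas over the batch
    return grad_W, grad_b
```

`np.dot(x.T, deltas)` performs the sum over data-points implicitly, which is why the later update step only needs to rescale the learning rate to get the mean gradient.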
@@ -518,7 +519,6 @@
"\n",
"def fprop(x, W, b):\n",
" #code implementing eq. (3)\n",
-" #return: y\n",
" raise NotImplementedError('Write me!')\n",
"\n",
"def cost(y, t):\n",
@@ -571,8 +571,8 @@
" #4. Update the model, we update with the mean gradient\n",
" # over the minibatch, rather than sum of particular gradients\n",
" # in a minibatch, to do so we scale the learning rate by batch_size\n",
-" mb_size = x.shape[0]\n",
-" effect_learn_rate = learning_rate / mb_size\n",
+" batch_size = x.shape[0]\n",
+" effect_learn_rate = learning_rate / batch_size\n",
"\n",
" W = W - effect_learn_rate * grad_W\n",
" b = b - effect_learn_rate * grad_b\n",
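Putting the changed hunk in context, one full minibatch update (mean gradient via a batch-size-scaled learning rate) could look like the sketch below; this is an illustrative reconstruction, not the lab's reference implementation, and `sgd_step` is a hypothetical helper name:

```python
import numpy as np

def sgd_step(x, t, W, b, learning_rate=0.1):
    """One minibatch SGD step for the linear model with MSE cost.

    Scales the learning rate by the batch size so the update uses the
    mean gradient over the minibatch rather than the summed gradient.
    """
    y = np.dot(x, W) + b             # linear forward pass, eq. (3)
    deltas = y - t                   # eq. (8)
    grad_W = np.dot(x.T, deltas)     # summed gradients over the minibatch
    grad_b = np.sum(deltas, axis=0)
    batch_size = x.shape[0]
    effect_learn_rate = learning_rate / batch_size
    W = W - effect_learn_rate * grad_W
    b = b - effect_learn_rate * grad_b
    return W, b
```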