diff --git a/01_Linear_Models.ipynb b/01_Linear_Models.ipynb
index 7fc5611..7eea371 100644
--- a/01_Linear_Models.ipynb
+++ b/01_Linear_Models.ipynb
@@ -432,7 +432,7 @@
    "source": [
     "# Iterative learning of linear models\n",
     "\n",
-    "We will learn the model with stochastic gradient descent using mean square error (MSE) loss function, which is defined as follows:\n",
+    "We will learn the model with stochastic gradient descent on N data-points using mean square error (MSE) loss function, which is defined as follows:\n",
     "\n",
     "(5) $\n",
     "E = \\frac{1}{2} \\sum_{n=1}^N ||\\mathbf{y}^n - \\mathbf{t}^n||^2 = \\sum_{n=1}^N E^n \\\\\n",
@@ -444,7 +444,8 @@
     "Hence, the gradient w.r.t (with respect to) the $r$ output y of the model is defined as, so called delta function, $\\delta_r$: \n",
     "\n",
     "(8) $\\frac{\\partial{E^n}}{\\partial{y_{r}}} = (y^n_r - t^n_r) = \\delta^n_r \\quad ; \\quad\n",
-    " \\delta^n_r = y^n_r - t^n_r \n",
+    " \\delta^n_r = y^n_r - t^n_r \\\\\n",
+    " \\frac{\\partial{E}}{\\partial{y_{r}}} = \\sum_{n=1}^N \\frac{\\partial{E^n}}{\\partial{y_{r}}} = \\sum_{n=1}^N \\delta^n_r\n",
     "$\n",
     "\n",
     "Similarly, using the above $\\delta^n_r$ one can express the gradient of the weight $w_{sr}$ (from the s-th input to the r-th output) for linear model and MSE cost as follows:\n",
@@ -518,7 +519,6 @@
     "\n",
     "def fprop(x, W, b):\n",
     "    #code implementing eq. (3)\n",
-    "    #return: y\n",
     "    raise NotImplementedError('Write me!')\n",
     "\n",
     "def cost(y, t):\n",
@@ -571,8 +571,8 @@
     "    #4. Update the model, we update with the mean gradient\n",
     "    # over the minibatch, rather than sum of particular gradients\n",
     "    # in a minibatch, to do so we scale the learning rate by batch_size\n",
-    "    mb_size = x.shape[0]\n",
-    "    effect_learn_rate = learning_rate / mb_size\n",
+    "    batch_size = x.shape[0]\n",
+    "    effect_learn_rate = learning_rate / batch_size\n",
     "\n",
     "    W = W - effect_learn_rate * grad_W\n",
     "    b = b - effect_learn_rate * grad_b\n",
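
For reference, a minimal NumPy sketch of how the pieces touched above fit together for a batch of N data points: eq. (3) taken as the affine map y = xW + b, the MSE cost of eq. (5), the per-point deltas of eq. (8), and the weight and bias gradients built from them. The names fprop and cost match the notebook's skeleton, but the bodies below (and the helper names deltas and grads) are illustrative assumptions only; the notebook deliberately leaves the real implementations as NotImplementedError exercises.

    import numpy as np

    def fprop(x, W, b):
        # eq. (3), assumed here to be the affine map y = xW + b;
        # x is (N, num_inputs), W is (num_inputs, num_outputs), b is (num_outputs,)
        return np.dot(x, W) + b

    def cost(y, t):
        # eq. (5): 0.5 * sum over data points and outputs of (y - t)^2
        return 0.5 * np.sum((y - t) ** 2)

    def deltas(y, t):
        # eq. (8): delta^n_r = y^n_r - t^n_r; summing over n (axis 0)
        # gives dE/dy_r, the quantity added to eq. (8) in this change
        return y - t

    def grads(x, y, t):
        # dE/dw_sr = sum over n of x^n_s * delta^n_r (the weight-gradient
        # formula stated after eq. (8)); the bias gradient is the deltas
        # summed over the batch
        d = deltas(y, t)
        return np.dot(x.T, d), d.sum(axis=0)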
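
Similarly, a sketch of the update step in the last hunk, pulled out into a hypothetical sgd_update helper so the effect of the mb_size -> batch_size rename is easy to see. It assumes grad_W and grad_b already hold gradients summed over the minibatch; dividing the learning rate by batch_size then applies the mean gradient rather than the sum, as the comment in the notebook describes.

    def sgd_update(W, b, grad_W, grad_b, x, learning_rate):
        # x is the minibatch of inputs, shape (batch_size, num_inputs)
        batch_size = x.shape[0]
        # scale the learning rate so the summed gradients act as a mean:
        # the step size then stays comparable when batch_size changes
        effect_learn_rate = learning_rate / batch_size
        W = W - effect_learn_rate * grad_W
        b = b - effect_learn_rate * grad_b
        return W, b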