diff --git a/01_Linear_Models.ipynb b/01_Linear_Models.ipynb
index 7fc5611..7eea371 100644
--- a/01_Linear_Models.ipynb
+++ b/01_Linear_Models.ipynb
@@ -432,7 +432,7 @@
    "source": [
     "# Iterative learning of linear models\n",
     "\n",
-    "We will learn the model with stochastic gradient descent using mean square error (MSE) loss function, which is defined as follows:\n",
+    "We will learn the model with stochastic gradient descent on N data-points using mean square error (MSE) loss function, which is defined as follows:\n",
     "\n",
     "(5) $\n",
     "E = \\frac{1}{2} \\sum_{n=1}^N ||\\mathbf{y}^n - \\mathbf{t}^n||^2 = \\sum_{n=1}^N E^n \\\\\n",
@@ -444,7 +444,8 @@
     "Hence, the gradient w.r.t (with respect to) the $r$ output y of the model is defined as, so called delta function, $\\delta_r$: \n",
     "\n",
     "(8) $\\frac{\\partial{E^n}}{\\partial{y_{r}}} = (y^n_r - t^n_r) = \\delta^n_r \\quad ; \\quad\n",
-    " \\delta^n_r = y^n_r - t^n_r \n",
+    " \\delta^n_r = y^n_r - t^n_r \\\\\n",
+    " \\frac{\\partial{E}}{\\partial{y_{r}}} = \\sum_{n=1}^N \\frac{\\partial{E^n}}{\\partial{y_{r}}} = \\sum_{n=1}^N \\delta^n_r\n",
     "$\n",
     "\n",
     "Similarly, using the above $\\delta^n_r$ one can express the gradient of the weight $w_{sr}$ (from the s-th input to the r-th output) for linear model and MSE cost as follows:\n",
@@ -518,7 +519,6 @@
     "\n",
     "def fprop(x, W, b):\n",
     "    #code implementing eq. (3)\n",
-    "    #return: y\n",
     "    raise NotImplementedError('Write me!')\n",
     "\n",
     "def cost(y, t):\n",
@@ -571,8 +571,8 @@
     "    #4. Update the model, we update with the mean gradient\n",
     "    # over the minibatch, rather than sum of particular gradients\n",
     "    # in a minibatch, to do so we scale the learning rate by batch_size\n",
-    "    mb_size = x.shape[0]\n",
-    "    effect_learn_rate = learning_rate / mb_size\n",
+    "    batch_size = x.shape[0]\n",
+    "    effect_learn_rate = learning_rate / batch_size\n",
     "\n",
     "    W = W - effect_learn_rate * grad_W\n",
     "    b = b - effect_learn_rate * grad_b\n",
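
For reference, a minimal NumPy sketch of how the pieces touched above fit together for a batch of N data points: eq. (3) taken as the affine map y = xW + b, the MSE cost of eq. (5), the per-point deltas of eq. (8), and the weight and bias gradients built from them. The names fprop and cost match the notebook's skeleton, but the bodies below (and the helper names deltas and grads) are illustrative assumptions only; the notebook deliberately leaves the real implementations as NotImplementedError exercises.

    import numpy as np

    def fprop(x, W, b):
        # eq. (3), assumed here to be the affine map y = xW + b;
        # x is (N, num_inputs), W is (num_inputs, num_outputs), b is (num_outputs,)
        return np.dot(x, W) + b

    def cost(y, t):
        # eq. (5): 0.5 * sum over data points and outputs of (y - t)^2
        return 0.5 * np.sum((y - t) ** 2)

    def deltas(y, t):
        # eq. (8): delta^n_r = y^n_r - t^n_r; summing over n (axis 0)
        # gives dE/dy_r, the quantity added to eq. (8) in this change
        return y - t

    def grads(x, y, t):
        # dE/dw_sr = sum over n of x^n_s * delta^n_r (the weight-gradient
        # formula stated after eq. (8)); the bias gradient is the deltas
        # summed over the batch
        d = deltas(y, t)
        return np.dot(x.T, d), d.sum(axis=0)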
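
Similarly, a sketch of the update step in the last hunk, pulled out into a hypothetical sgd_update helper so the effect of the mb_size -> batch_size rename is easy to see. It assumes grad_W and grad_b already hold gradients summed over the minibatch; dividing the learning rate by batch_size then applies the mean gradient rather than the sum, as the comment in the notebook describes.

    def sgd_update(W, b, grad_W, grad_b, x, learning_rate):
        # x is the minibatch of inputs, shape (batch_size, num_inputs)
        batch_size = x.shape[0]
        # scale the learning rate so the summed gradients act as a mean:
        # the step size then stays comparable when batch_size changes
        effect_learn_rate = learning_rate / batch_size
        W = W - effect_learn_rate * grad_W
        b = b - effect_learn_rate * grad_b
        return W, b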