From 73f894493a793848f68b28610f521eb24cfc7a26 Mon Sep 17 00:00:00 2001
From: pswietojanski
Date: Mon, 9 Nov 2015 10:22:04 +0000
Subject: [PATCH] minor modifications to lab 5

---
 05_Transfer_functions.ipynb | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/05_Transfer_functions.ipynb b/05_Transfer_functions.ipynb
index 99b4201..98c4030 100644
--- a/05_Transfer_functions.ipynb
+++ b/05_Transfer_functions.ipynb
@@ -48,6 +48,8 @@
     "\n",
     "## Tanh\n",
     "\n",
+    "Given a linear activation $a_{i}$, tanh implements the following operation:\n",
+    "\n",
     "(1) $h_i(a_i) = \mbox{tanh}(a_i) = \frac{\exp(a_i) - \exp(-a_i)}{\exp(a_i) + \exp(-a_i)}$\n",
     "\n",
     "Hence, the derivative of $h_i$ with respect to $a_i$ is:\n",
@@ -69,8 +71,8 @@
     "(4) $\begin{align}\n",
     "\frac{\partial h_i}{\partial a_i} &=\n",
     "\begin{cases}\n",
-    " 1 & \quad \text{if } a_i \geq 0 \\\n",
-    " 0 & \quad \text{if } a_i < 0 \\\n",
+    " 1 & \quad \text{if } a_i > 0 \\\n",
+    " 0 & \quad \text{if } a_i \leq 0 \\\n",
     "\end{cases}\n",
     "\end{align}\n",
     "$\n",
@@ -173,11 +175,13 @@
    "source": [
     "# Exercise 3: Implement Maxout\n",
     "\n",
-    "As with the previous two exercises, your implementation should follow the conventions used to build the Linear, Sigmoid and Softmax layers. As before, test your solution by training a one-hidden-layer model with 100 hidden units, similiar to the one used in Task 3a in the coursework. Use the same optimisation hyper-parameters (learning rate, initial weights range) as you used for ReLU models. Tune the pool size $K$ (but keep the number of total parameters fixed).\n",
+    "As with the previous two exercises, your implementation should follow the conventions used to build the Linear, Sigmoid and Softmax layers. For now, implement only non-overlapping pools (i.e. each of the activations $a_{j}, a_{j+1}, \ldots, a_{j+K}$ belongs to exactly one pool). As before, test your solution by training a one-hidden-layer model with 100 hidden units, similar to the one used in Task 3a in the coursework. Use the same optimisation hyper-parameters (learning rate, initial weights range) as you used for ReLU models. Tune the pool size $K$ (but keep the total number of parameters fixed).\n",
     "\n",
     "Note: The Max operator reduces dimensionality, hence for example, to get 100 hidden maxout units with pooling size set to $K=2$ the size of linear part needs to be set to $100K$ (assuming non-overlapping pools). This affects how you compute the total number of weights in the model.\n",
     "\n",
-    "Implementation tips: To back-propagate through the maxout layer, one needs to keep track of which linear activation $a_{j}, a_{j+1}, \ldots, a_{j+K}$ was the maximum in each pool. The convenient way to do so is by storing the indices of the maximum units in the fprop function and then in the backprop stage pass the gradient only through those (i.e. for example, one can build an auxiliary matrix where each element is either 1 (if unit was maximum, and passed forward through the max operator for a given data-point) or 0 otherwise. Then in the backward pass it suffices to upsample the maxout *igrads* signal to the linear layer dimension and element-wise multiply by the aforemenioned auxiliary matrix."
+    "Implementation tips: To back-propagate through the maxout layer, one needs to keep track of which of the linear activations $a_{j}, a_{j+1}, \ldots, a_{j+K}$ was the maximum in each pool. A convenient way to do so is to store the indices of the maximum units in the fprop function and then, in the backprop stage, pass the gradient only through those units. For example, one can build an auxiliary matrix in which each element is 1 if the corresponding unit was the maximum (and was therefore passed forward through the max operator for a given data-point) and 0 otherwise. In the backward pass it then suffices to upsample the maxout *igrads* signal to the linear layer dimension and element-wise multiply it by this auxiliary matrix.\n",
+    "\n",