Minor textual modifications

Steve Renals 2015-10-12 09:55:34 +01:00
parent bebc595a80
commit 63c90f815e


@@ -61,14 +61,14 @@
" * The model (for now) is allowed to have a sequence of layers, mapping inputs $\\mathbf{x}$ to outputs $\\mathbf{y}$. \n",
" * This operation is implemented as a special type of a layer in `mlp.layers.MLP` class. It keeps a sequence of other layers (of various typyes like Linear, Sigmoid, Softmax, etc.) as well as the internal state of a model for a mini-batch, that is, the intermediate data produced in *forward* and *backward* passes.\n",
"2. Forward computation\n",
" * `mlp.layers.MLP` provides a `fprop()` method that iterates over defined layers propagates $\\mathbf{x}$ to $\\mathbf{y}$. \n",
" * Each layer (look at `mlp.layers.Linear` attached below) also implements `fprop()` method, which performs an atomic, for the given layer, operation. Most often, for the $i$-th layer, we want to obtain a linear transform $\\mathbf a^i$ of the inputs, and apply some non-linear transfer function $f^i(\\mathbf a^i)$ to produce the output $\\mathbf h^i$. Note, in general each layer may implement different activation functions $f^i()$, however for now we will use only `sigmoid` and `softmax`\n",
" * `mlp.layers.MLP` provides an `fprop()` method that iterates over defined layers propagates $\\mathbf{x}$ to $\\mathbf{y}$. \n",
" * Each layer (look at `mlp.layers.Linear` attached below) also implements an `fprop()` method, which performs an atomic, for the given layer, operation. Most often, for the $i$-th layer, we want to obtain a linear transform $\\mathbf a^i$ of the inputs, and apply some non-linear transfer function $f^i(\\mathbf a^i)$ to produce the output $\\mathbf h^i$. Note, in general each layer may implement different activation functions $f^i()$, however for now we will use only `sigmoid` and `softmax`\n",
"3. Backward computation\n",
" * Similarly, `mlp.layers.MLP` also implements `bprop()` function, to back-propagate the errors from the top to the bottom layer. This class also keeps the back-propagated stats ($\\delta$) to be used later when computing the gradients w.r.t the parameters.\n",
" * This functionality is also re-implemented by particular layers (again, have a look at `bprop` function of `mlp.layers.Linear`). `bprop()` is suppsed to return both $\\delta$ (needed to update the parameters) but also back-progapate the gradient down to the inputs. Also note, that depending on whether the layer is the top or not (deals directly with the cost or not) some simplifications may apply (i.e. as with cross-entropy and softmax). That's why when implementing a new type of layer that may be used as an output layer one also need to specify the implementation of `bprop_cost()`.\n",
" * Similarly, `mlp.layers.MLP` also implements a `bprop()` function, to back-propagate the errors from the top to the bottom layer. This class also keeps the back-propagated statistics ($\\delta$) to be used later when computing the gradients with respect to the parameters.\n",
" * This functionality is also re-implemented by particular layers (again, have a look at the `bprop` function of `mlp.layers.Linear`). `bprop()` returns both $\\delta$ (needed to update the parameters) but also back-progapates the gradient down to the inputs. Also note, that depending on whether the layer is the top or not (i.e. if it deals directly with the cost function or not) some simplifications may apply ( as with cross-entropy and softmax). That's why when implementing a new type of layer that may be used as an output layer one also need to specify the implementation of `bprop_cost()`.\n",
"4. Learning the model\n",
" * The actual evaluation of the cost as well as the *forward* and *backward* passes one may find `train_epoch()` method of `mlp.optimisers.SGDOptimiser`\n",
" * This function also calls the `pgrads()` method on each layer, that given activations and deltas, is supposed to return the list of the gradients of the cost w.r.t the model parameters, i.e. $\\frac{\\partial{\\mathbf{E}}}{\\partial{\\mathbf{W^i}}}$ and $\\frac{\\partial{\\mathbf{E}}}{\\partial{\\mathbf{b}^i}}$ at the above diagram (look at an example implementation in `mlp.layers.Linear`)"
" * The actual evaluation of the cost as well as the *forward* and *backward* passes may be found in the `train_epoch()` method of `mlp.optimisers.SGDOptimiser`\n",
" * This function also calls the `pgrads()` method on each layer, that given activations and deltas, returns the list of the gradients of the cost with respect to the model parameters, i.e. $\\frac{\\partial{\\mathbf{E}}}{\\partial{\\mathbf{W^i}}}$ and $\\frac{\\partial{\\mathbf{E}}}{\\partial{\\mathbf{b}^i}}$ at the above diagram (look at an example implementation in `mlp.layers.Linear`)"
]
},
{
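The notebook cell changed above describes the `fprop()` / `bprop()` / `pgrads()` contract and the role of `train_epoch()` without showing them side by side, so here is a minimal, self-contained sketch of that flow. It is an illustration only: the simplified `Linear` class, the `sgd_step()` helper, the squared-error error signal and the learning rate below are hypothetical stand-ins, not the actual `mlp.layers` / `mlp.optimisers` code (which also handles transfer functions, `bprop_cost()` for the output layer, and proper cost classes).

```python
import numpy as np


class Linear(object):
    """Simplified affine layer illustrating the fprop/bprop/pgrads contract
    (a hypothetical stand-in, not the course's mlp.layers.Linear)."""

    def __init__(self, idim, odim, rng=None):
        rng = np.random.RandomState(0) if rng is None else rng
        self.W = 0.1 * rng.randn(idim, odim)
        self.b = np.zeros(odim)

    def fprop(self, inputs):
        # forward pass: affine transform of a mini-batch (rows are examples)
        return np.dot(inputs, self.W) + self.b

    def bprop(self, h, igrads):
        # backward pass: igrads are the deltas arriving from the layer above;
        # a purely linear layer passes them through unchanged and
        # back-propagates the gradient to its own inputs through W^T
        deltas = igrads
        ograds = np.dot(deltas, self.W.T)
        return deltas, ograds

    def pgrads(self, inputs, deltas):
        # gradients of the cost w.r.t. the parameters: dE/dW and dE/db
        return [np.dot(inputs.T, deltas), np.sum(deltas, axis=0)]


def sgd_step(layers, x, t, learning_rate=0.1):
    """One forward/backward/update cycle, roughly what an optimiser has to do
    per mini-batch (simplified: squared-error signal, Linear layers only)."""
    # forward pass, keeping every intermediate activation for pgrads()
    activations = [x]
    for layer in layers:
        activations.append(layer.fprop(activations[-1]))

    # error signal at the output; the real framework delegates this to the
    # top layer's bprop_cost() so that simplifications such as
    # softmax + cross-entropy can be exploited
    igrads = activations[-1] - t

    # backward pass, collecting the per-layer deltas
    all_deltas = []
    for layer in reversed(layers):
        deltas, igrads = layer.bprop(None, igrads)
        all_deltas.insert(0, deltas)

    # parameter update from the gradients returned by pgrads()
    for layer, inputs, deltas in zip(layers, activations[:-1], all_deltas):
        grad_W, grad_b = layer.pgrads(inputs, deltas)
        layer.W -= learning_rate * grad_W
        layer.b -= learning_rate * grad_b


# tiny usage example on random data
rng = np.random.RandomState(42)
model = [Linear(4, 8, rng), Linear(8, 3, rng)]
x, t = rng.randn(10, 4), rng.randn(10, 3)
sgd_step(model, x, t)
```

Running `sgd_step` repeatedly over mini-batches is, in essence, what `train_epoch()` does for one pass over the training set; the real optimiser additionally applies each layer's transfer function during `fprop` and tracks the cost and accuracy as it goes.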