mlpractical/04_Regularisation_solution.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Introduction\n",
    "\n",
    "This tutorial focuses on implementation of three reqularisaion techniques, two of them are norm based approaches which are added to optimised objective and the third technique, called *droput*, is a form of noise injection by random corruption of information carried by hidden units during training.\n",
    "\n",
    "\n",
    "## Virtual environments\n",
    "\n",
    "Before you proceed onwards, remember to activate your virtual environment:\n",
    "   * If you were in last week's Tuesday or Wednesday group type `activate_mlp` or `source ~/mlpractical/venv/bin/activate`\n",
    "   * If you were in the Monday group:\n",
    "      + and if you have chosen the **comfy** way type: `workon mlpractical`\n",
    "      + and if you have chosen the **generic** way, `source` your virutal environment using `source` and specyfing the path to the activate script (you need to localise it yourself, there were not any general recommendations w.r.t dir structure and people have installed it in different places, usually somewhere in the home directories. If you cannot easily find it by yourself, use something like: `find . -iname activate` ):\n",
    "\n",
    "## Syncing the git repository\n",
    "\n",
    "Look <a href=\"https://github.com/CSTR-Edinburgh/mlpractical/blob/master/gitFAQ.md\">here</a> for more details. But in short, we recommend to create a separate branch for this lab, as follows:\n",
    "\n",
    "1. Enter the mlpractical directory `cd ~/mlpractical/repo-mlp`\n",
    "2. List the branches and check which is currently active by typing: `git branch`\n",
    "3. If you have followed our recommendations, you should be in the `coursework1` branch, please commit your local changed to the repo index by typing:\n",
    "```\n",
    "git commit -am \"finished coursework\"\n",
    "```\n",
    "4. Now you can switch to `master` branch by typing: \n",
    "```\n",
    "git checkout master\n",
    " ```\n",
    "5. To update the repository (note, assuming master does not have any conflicts), if there are some, have a look <a href=\"https://github.com/CSTR-Edinburgh/mlpractical/blob/master/gitFAQ.md\">here</a>\n",
    "```\n",
    "git pull\n",
    "```\n",
    "6. And now, create the new branch & swith to it by typing:\n",
    "```\n",
    "git checkout -b lab4\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Regularisation\n",
    "\n",
    "Regularisation add some *complexity term* to the cost function. It's purpose is to put some prior on the model's parameters. The most common prior is perhaps the one which assumes smoother solutions (the one which are not able to fit training data too well) are better as they are more likely to better generalise to unseen data. \n",
    "\n",
    "A way to incorporate such prior in the model is to add some term that penalise certain configurations of the parameters -- either from growing too large ($L_2$) or the one that prefers solution that could be modelled with less parameters ($L_1$), hence encouraging some parameters to become 0. One can, of course, combine many such priors when optimising the model, however, in the lab we shall use $L_1$ and/or $L_2$ priors.\n",
    "\n",
    "They can be easily incorporated into the training objective by adding some additive terms, as follows:\n",
    "\n",
    "(1) $\n",
    " \\begin{align*}\n",
    "        E^n &= \\underbrace{E^n_{\\text{train}}}_{\\text{data term}} + \n",
    "    \\underbrace{\\beta_{L_1} E^n_{L_1}}_{\\text{prior term}} + \\underbrace{\\beta_{L_2} E^n_{L_2}}_{\\text{prior term}}\n",
    "\\end{align*}\n",
    "$\n",
    "\n",
    "where $ E^n_{\\text{train}} = - \\sum_{k=1}^K t^n_k \\ln y^n_k $,  $\\beta_{L_1}$ and $\\beta_{L_2}$ some non-negative constants specified a priori (hyper-parameters) and $E^n_{L_1}$ and $E^n_{L_2}$ norm metric specifying certain properties of parameters:\n",
    "\n",
    "(2) $\n",
    " \\begin{align*}\n",
    " E^n_{L_p}(\\mathbf{W}) = \\left ( \\sum_{i,j \\in \\mathbf{W}} |w_{i,j}|^p \\right )^{\\frac{1}{p}}\n",
    "\\end{align*}\n",
    "$\n",
    "\n",
    "where $p$ denotes the norm-order (for regularisation either 1 or 2). (TODO: explain here why we usualy skip square root for p=2)\n",
    "\n",
    "## $L_{p=2}$ (Weight Decay)\n",
    "\n",
    "(3) $\n",
    " \\begin{align*}\n",
    "        E^n &= \\underbrace{E^n_{\\text{train}}}_{\\text{data term}} + \n",
    "    \\underbrace{\\beta E^n_{L_2}}_{\\text{prior term}} = E^n_{\\text{train}} + \\beta_{L_2} \\frac{1}{2}|w_i|^2\n",
    "\\end{align*}\n",
    "$\n",
    "\n",
    "(4) $\n",
    "\\begin{align*}\\frac{\\partial E^n}{\\partial w_i} &= \\frac{\\partial (E^n_{\\text{train}} + \\beta_{L_2} E_{L_2}) }{\\partial w_i} \n",
    "  = \\left( \\frac{\\partial E^n_{\\text{train}}}{\\partial w_i}  + \\beta_{L_2} \\frac{\\partial\n",
    "      E_{L_2}}{\\partial w_i} \\right) \n",
    "  = \\left( \\frac{\\partial E^n_{\\text{train}}}{\\partial w_i}  + \\beta_{L_2} w_i \\right)\n",
    "\\end{align*}\n",
    "$\n",
    "\n",
    "(5) $\n",
    "\\begin{align*}\n",
    "  \\Delta w_i &= -\\eta \\left( \\frac{\\partial E^n_{\\text{train}}}{\\partial w_i}  + \\beta_{L_2} w_i \\right) \n",
    "\\end{align*}\n",
    "$\n",
    "\n",
    "where $\\eta$ is learning rate.\n",
    "\n",
    "## $L_{p=1}$ (Sparsity)\n",
    "\n",
    "(6) $\n",
    " \\begin{align*}\n",
    "        E^n &= \\underbrace{E^n_{\\text{train}}}_{\\text{data term}} + \n",
    "    \\underbrace{\\beta E^n_{L_1}}_{\\text{prior term}} \n",
    "        = E^n_{\\text{train}} + \\beta_{L_1} |w_i|\n",
    "\\end{align*}\n",
    "$\n",
    "\n",
    "(7) $\\begin{align*}\n",
    "  \\frac{\\partial E^n}{\\partial w_i} =  \\frac{\\partial E^n_{\\text{train}}}{\\partial w_i}  + \\beta_{L_1} \\frac{\\partial E_{L_1}}{\\partial w_i}  =  \\frac{\\partial E^n_{\\text{train}}}{\\partial w_i}  + \\beta_{L_1}  \\mbox{sgn}(w_i)\n",
    "\\end{align*}\n",
    "$\n",
    "\n",
    "(8) $\\begin{align*}\n",
    "  \\Delta w_i &= -\\eta \\left( \\frac{\\partial E^n_{\\text{train}}}{\\partial w_i}  + \\beta_{L_1} \\mbox{sgn}(w_i) \\right) \n",
    "\\end{align*}$\n",
    "\n",
    "Where $\\mbox{sgn}(w_i)$ is the sign of $w_i$: $\\mbox{sgn}(w_i) = 1$ if $w_i>0$ and $\\mbox{sgn}(w_i) = -1$ if $w_i<0$\n",
    "\n",
    "One can also apply those penalty terms for biases, however, this is usually not necessary as biases have secondary impact on smoothnes of the given solution.\n",
    "\n",
    "## Dropout\n",
    "\n",
    "Dropout, for a given layer's output $\\mathbf{h}^i \\in \\mathbb{R}^{BxH^l}$ (where $B$ is batch size and $H^l$ is the $l$-th layer output dimensionality) implements the following transformation:\n",
    "\n",
    "(9) $\\mathbf{\\hat h}^l = \\mathbf{d}^l\\circ\\mathbf{h}^l$\n",
    "\n",
    "where $\\circ$ denotes an elementwise product and $\\mathbf{d}^l \\in \\{0,1\\}^{BxH^i}$ is a matrix in which $d^l_{ij}$ element is sampled from the Bernoulli distribution:\n",
    "\n",
    "(10) $d^l_{ij} \\sim \\mbox{Bernoulli}(p^l_d)$\n",
    "\n",
    "with $0<p^l_d<1$ denoting the probability the given unit is kept unchanged (dropping probability is thus $1-p^l_d$). We ignore here edge scenarios where $p^l_d=1$ and there is no dropout applied (and the training would be exactly the same as in standard SGD) and $p^l_d=0$ where all units would have been dropped, hence the model would not learn anything.\n",
    "\n",
    "The probability $p^l_d$ is a hyperparameter (like learning rate) meaning it needs to be provided before training and also very often tuned for the given task. As the notation suggest, it can be specified separately for each layer, including scenario where $l=0$ when some random input features (pixels in the image for MNIST) are being also ommitted.\n",
    "\n",
    "### Keeping the $l$-th layer output $\\mathbf{\\hat h}^l$ (input to the upper layer) appropiately scaled at test-time\n",
    "\n",
    "The other issue one needs to take into account is the mismatch that arises between training and test (runtime) stages when dropout is applied. It is due to the fact that droput is not applied when testing hence the average input to the unit in upper layer is going to be bigger when compared to training stage (where some inputs are set to 0), in average $1/p^l_d$ times bigger. \n",
    "\n",
    "So to account for this mismatch one could either:\n",
    "\n",
    "1. When training is finished scale the final weight matrices $\\mathbf{W}^l, l=1,\\ldots,L$ by $p^{l-1}_d$ (remember, $p^{0}_d$ is the probability related to the input features)\n",
    "2. Scale the activations in equation (9) during training, that is, for each mini-batch multiply $\\mathbf{\\hat h}^l$ by $1/p^l_d$ to compensate for dropped units and then at run-time use the model as usual, **without** scaling. Make sure the $1/p^l_d$ scaler is taken into account for both forward and backward passes.\n",
    "\n",
    "\n",
    "Our recommendation is option 2 as it will make some things easier from implementation perspective. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:root:Initialising data providers...\n"
     ]
    }
   ],
   "source": [
    "import numpy\n",
    "import logging\n",
    "from mlp.dataset import MNISTDataProvider\n",
    "\n",
    "logger = logging.getLogger()\n",
    "logger.setLevel(logging.INFO)\n",
    "logger.info('Initialising data providers...')\n",
    "\n",
    "train_dp = MNISTDataProvider(dset='train', batch_size=10, max_num_batches=100, randomize=True)\n",
    "valid_dp = MNISTDataProvider(dset='valid', batch_size=10000, max_num_batches=-10, randomize=False)\n",
    "test_dp = MNISTDataProvider(dset='eval', batch_size=10000, max_num_batches=-10, randomize=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:root:Training started...\n",
      "INFO:mlp.optimisers:Epoch 0: Training cost (ce) for initial model is 2.624. Accuracy is 8.60%\n",
      "INFO:mlp.optimisers:Epoch 0: Validation cost (ce) for initial model is 2.554. Accuracy is 9.84%\n",
      "INFO:mlp.optimisers:Epoch 1: Training cost (ce) is 2.631. Accuracy is 59.70%\n",
      "INFO:mlp.optimisers:Epoch 1: Validation cost (ce) is 0.679. Accuracy is 77.66%\n",
      "INFO:mlp.optimisers:Epoch 1: Took 9 seconds. Training speed 295 pps. Validation speed 1648 pps.\n",
      "INFO:mlp.optimisers:Epoch 2: Training cost (ce) is 0.497. Accuracy is 83.10%\n",
      "INFO:mlp.optimisers:Epoch 2: Validation cost (ce) is 0.449. Accuracy is 86.47%\n",
      "INFO:mlp.optimisers:Epoch 2: Took 10 seconds. Training speed 281 pps. Validation speed 1656 pps.\n",
      "INFO:mlp.optimisers:Epoch 3: Training cost (ce) is 0.339. Accuracy is 90.20%\n",
      "INFO:mlp.optimisers:Epoch 3: Validation cost (ce) is 0.390. Accuracy is 88.50%\n",
      "INFO:mlp.optimisers:Epoch 3: Took 9 seconds. Training speed 302 pps. Validation speed 1624 pps.\n",
      "INFO:mlp.optimisers:Epoch 4: Training cost (ce) is 0.237. Accuracy is 93.00%\n",
      "INFO:mlp.optimisers:Epoch 4: Validation cost (ce) is 0.400. Accuracy is 88.40%\n",
      "INFO:mlp.optimisers:Epoch 4: Took 9 seconds. Training speed 314 pps. Validation speed 1621 pps.\n",
      "INFO:mlp.optimisers:Epoch 5: Training cost (ce) is 0.182. Accuracy is 95.20%\n",
      "INFO:mlp.optimisers:Epoch 5: Validation cost (ce) is 0.427. Accuracy is 88.01%\n",
      "INFO:mlp.optimisers:Epoch 5: Took 10 seconds. Training speed 311 pps. Validation speed 1590 pps.\n",
      "INFO:mlp.optimisers:Epoch 6: Training cost (ce) is 0.139. Accuracy is 96.10%\n",
      "INFO:mlp.optimisers:Epoch 6: Validation cost (ce) is 0.409. Accuracy is 88.26%\n",
      "INFO:mlp.optimisers:Epoch 6: Took 9 seconds. Training speed 333 pps. Validation speed 1598 pps.\n",
      "INFO:mlp.optimisers:Epoch 7: Training cost (ce) is 0.095. Accuracy is 98.30%\n",
      "INFO:mlp.optimisers:Epoch 7: Validation cost (ce) is 0.370. Accuracy is 89.90%\n",
      "INFO:mlp.optimisers:Epoch 7: Took 9 seconds. Training speed 334 pps. Validation speed 1568 pps.\n",
      "INFO:mlp.optimisers:Epoch 8: Training cost (ce) is 0.069. Accuracy is 99.00%\n",
      "INFO:mlp.optimisers:Epoch 8: Validation cost (ce) is 0.407. Accuracy is 89.28%\n",
      "INFO:mlp.optimisers:Epoch 8: Took 9 seconds. Training speed 315 pps. Validation speed 1610 pps.\n",
      "INFO:mlp.optimisers:Epoch 9: Training cost (ce) is 0.053. Accuracy is 99.50%\n",
      "INFO:mlp.optimisers:Epoch 9: Validation cost (ce) is 0.382. Accuracy is 90.10%\n",
      "INFO:mlp.optimisers:Epoch 9: Took 10 seconds. Training speed 316 pps. Validation speed 1439 pps.\n",
      "INFO:mlp.optimisers:Epoch 10: Training cost (ce) is 0.039. Accuracy is 99.50%\n",
      "INFO:mlp.optimisers:Epoch 10: Validation cost (ce) is 0.402. Accuracy is 89.77%\n",
      "INFO:mlp.optimisers:Epoch 10: Took 10 seconds. Training speed 326 pps. Validation speed 1456 pps.\n",
      "INFO:mlp.optimisers:Epoch 11: Training cost (ce) is 0.031. Accuracy is 99.70%\n",
      "INFO:mlp.optimisers:Epoch 11: Validation cost (ce) is 0.424. Accuracy is 89.40%\n",
      "INFO:mlp.optimisers:Epoch 11: Took 9 seconds. Training speed 314 pps. Validation speed 1610 pps.\n",
      "INFO:mlp.optimisers:Epoch 12: Training cost (ce) is 0.024. Accuracy is 99.70%\n",
      "INFO:mlp.optimisers:Epoch 12: Validation cost (ce) is 0.448. Accuracy is 88.64%\n",
      "INFO:mlp.optimisers:Epoch 12: Took 9 seconds. Training speed 305 pps. Validation speed 1712 pps.\n",
      "INFO:mlp.optimisers:Epoch 13: Training cost (ce) is 0.020. Accuracy is 99.90%\n",
      "INFO:mlp.optimisers:Epoch 13: Validation cost (ce) is 0.406. Accuracy is 90.36%\n",
      "INFO:mlp.optimisers:Epoch 13: Took 9 seconds. Training speed 317 pps. Validation speed 1664 pps.\n",
      "INFO:mlp.optimisers:Epoch 14: Training cost (ce) is 0.016. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 14: Validation cost (ce) is 0.420. Accuracy is 89.97%\n",
      "INFO:mlp.optimisers:Epoch 14: Took 9 seconds. Training speed 327 pps. Validation speed 1572 pps.\n",
      "INFO:mlp.optimisers:Epoch 15: Training cost (ce) is 0.014. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 15: Validation cost (ce) is 0.423. Accuracy is 89.82%\n",
      "INFO:mlp.optimisers:Epoch 15: Took 9 seconds. Training speed 330 pps. Validation speed 1710 pps.\n",
      "INFO:mlp.optimisers:Epoch 16: Training cost (ce) is 0.012. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 16: Validation cost (ce) is 0.427. Accuracy is 90.01%\n",
      "INFO:mlp.optimisers:Epoch 16: Took 9 seconds. Training speed 324 pps. Validation speed 1637 pps.\n",
      "INFO:mlp.optimisers:Epoch 17: Training cost (ce) is 0.011. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 17: Validation cost (ce) is 0.426. Accuracy is 90.13%\n",
      "INFO:mlp.optimisers:Epoch 17: Took 9 seconds. Training speed 313 pps. Validation speed 1656 pps.\n",
      "INFO:mlp.optimisers:Epoch 18: Training cost (ce) is 0.010. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 18: Validation cost (ce) is 0.426. Accuracy is 90.16%\n",
      "INFO:mlp.optimisers:Epoch 18: Took 9 seconds. Training speed 306 pps. Validation speed 1672 pps.\n",
      "INFO:mlp.optimisers:Epoch 19: Training cost (ce) is 0.009. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 19: Validation cost (ce) is 0.428. Accuracy is 90.27%\n",
      "INFO:mlp.optimisers:Epoch 19: Took 9 seconds. Training speed 307 pps. Validation speed 1678 pps.\n",
      "INFO:mlp.optimisers:Epoch 20: Training cost (ce) is 0.008. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 20: Validation cost (ce) is 0.435. Accuracy is 90.06%\n",
      "INFO:mlp.optimisers:Epoch 20: Took 9 seconds. Training speed 310 pps. Validation speed 1621 pps.\n",
      "INFO:mlp.optimisers:Epoch 21: Training cost (ce) is 0.008. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 21: Validation cost (ce) is 0.429. Accuracy is 90.35%\n",
      "INFO:mlp.optimisers:Epoch 21: Took 9 seconds. Training speed 309 pps. Validation speed 1608 pps.\n",
      "INFO:mlp.optimisers:Epoch 22: Training cost (ce) is 0.007. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 22: Validation cost (ce) is 0.440. Accuracy is 90.05%\n",
      "INFO:mlp.optimisers:Epoch 22: Took 9 seconds. Training speed 307 pps. Validation speed 1670 pps.\n",
      "INFO:mlp.optimisers:Epoch 23: Training cost (ce) is 0.007. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 23: Validation cost (ce) is 0.442. Accuracy is 90.04%\n",
      "INFO:mlp.optimisers:Epoch 23: Took 9 seconds. Training speed 308 pps. Validation speed 1650 pps.\n",
      "INFO:mlp.optimisers:Epoch 24: Training cost (ce) is 0.006. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 24: Validation cost (ce) is 0.443. Accuracy is 90.14%\n",
      "INFO:mlp.optimisers:Epoch 24: Took 9 seconds. Training speed 330 pps. Validation speed 1642 pps.\n",
      "INFO:mlp.optimisers:Epoch 25: Training cost (ce) is 0.006. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 25: Validation cost (ce) is 0.441. Accuracy is 90.26%\n",
      "INFO:mlp.optimisers:Epoch 25: Took 9 seconds. Training speed 322 pps. Validation speed 1659 pps.\n",
      "INFO:mlp.optimisers:Epoch 26: Training cost (ce) is 0.006. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 26: Validation cost (ce) is 0.442. Accuracy is 90.30%\n",
      "INFO:mlp.optimisers:Epoch 26: Took 9 seconds. Training speed 314 pps. Validation speed 1590 pps.\n",
      "INFO:mlp.optimisers:Epoch 27: Training cost (ce) is 0.005. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 27: Validation cost (ce) is 0.445. Accuracy is 90.28%\n",
      "INFO:mlp.optimisers:Epoch 27: Took 9 seconds. Training speed 304 pps. Validation speed 1684 pps.\n",
      "INFO:mlp.optimisers:Epoch 28: Training cost (ce) is 0.005. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 28: Validation cost (ce) is 0.450. Accuracy is 90.21%\n",
      "INFO:mlp.optimisers:Epoch 28: Took 9 seconds. Training speed 309 pps. Validation speed 1629 pps.\n",
      "INFO:mlp.optimisers:Epoch 29: Training cost (ce) is 0.005. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 29: Validation cost (ce) is 0.454. Accuracy is 90.21%\n",
      "INFO:mlp.optimisers:Epoch 29: Took 9 seconds. Training speed 312 pps. Validation speed 1664 pps.\n",
      "INFO:mlp.optimisers:Epoch 30: Training cost (ce) is 0.005. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 30: Validation cost (ce) is 0.455. Accuracy is 90.17%\n",
      "INFO:mlp.optimisers:Epoch 30: Took 10 seconds. Training speed 302 pps. Validation speed 1590 pps.\n",
      "INFO:root:Testing the model on test set:\n",
      "INFO:root:MNIST test set accuracy is 89.24 %, cost (ce) is 0.464\n"
     ]
    }
   ],
   "source": [
    "#Baseline experiment\n",
    "\n",
    "from mlp.layers import MLP, Linear, Sigmoid, Softmax #import required layer types\n",
    "from mlp.optimisers import SGDOptimiser #import the optimiser\n",
    "\n",
    "from mlp.costs import CECost #import the cost we want to use for optimisation\n",
    "from mlp.schedulers import LearningRateFixed\n",
    "\n",
    "logger = logging.getLogger()\n",
    "logger.setLevel(logging.INFO)\n",
    "rng = numpy.random.RandomState([2015,10,10])\n",
    "\n",
    "#some hyper-parameters\n",
    "nhid = 800\n",
    "learning_rate = 0.5\n",
    "max_epochs = 30\n",
    "cost = CECost()\n",
    "    \n",
    "stats = []\n",
    "for layer in xrange(1, 2):\n",
    "\n",
    "    train_dp.reset()\n",
    "    valid_dp.reset()\n",
    "    test_dp.reset()\n",
    "    \n",
    "    #define the model\n",
    "    model = MLP(cost=cost)\n",
    "    model.add_layer(Sigmoid(idim=784, odim=nhid, irange=0.2, rng=rng))\n",
    "    for i in xrange(1, layer):\n",
    "        logger.info(\"Stacking hidden layer (%s)\" % str(i+1))\n",
    "        model.add_layer(Sigmoid(idim=nhid, odim=nhid, irange=0.2, rng=rng))\n",
    "    model.add_layer(Softmax(idim=nhid, odim=10, rng=rng))\n",
    "\n",
    "    # define the optimiser, here stochasitc gradient descent\n",
    "    # with fixed learning rate and max_epochs\n",
    "    lr_scheduler = LearningRateFixed(learning_rate=learning_rate, max_epochs=max_epochs)\n",
    "    optimiser = SGDOptimiser(lr_scheduler=lr_scheduler)\n",
    "\n",
    "    logger.info('Training started...')\n",
    "    tr_stats, valid_stats = optimiser.train(model, train_dp, valid_dp)\n",
    "\n",
    "    logger.info('Testing the model on test set:')\n",
    "    tst_cost, tst_accuracy = optimiser.validate(model, test_dp)\n",
    "    logger.info('MNIST test set accuracy is %.2f %%, cost (%s) is %.3f'%(tst_accuracy*100., cost.get_name(), tst_cost))\n",
    "    \n",
    "    stats.append((tr_stats, valid_stats, (tst_cost, tst_accuracy)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Exercise 1: Implement L1 based regularisation\n",
    "\n",
    "Implement L1 regularisation penalty (just for weight matrices, ignore biases). Test your solution on one hidden layer model similar to the one from coursework's Task 4 (800 hidden units) but limit training data to 10 000 (random) data-points (keep validation and test sets the same). First build and train not-regularised model as a basline. Then train regularised model starting with $\\beta_{L1}$ set to 0.001 and do some grid search for better values. Plot validation accuracies as a function of epochs for each model (each $\\beta_{L1}$ you tried).\n",
    "\n",
    "Implementation tips:\n",
    "* Have a look at the constructor of mlp.optimiser.SGDOptimiser class, it has been modified to take more optimisation-related arguments.\n",
    "* The best place to implement regularisation terms is `pgrads` method of mlp.layers.Layer (sub)-classes (look at equations (5) and (8) to see why). Some modifications are also required in `train_epoch`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:root:Training started...\n",
      "INFO:mlp.optimisers:Epoch 0: Training cost (ce) for initial model is 8.934. Accuracy is 8.60%\n",
      "INFO:mlp.optimisers:Epoch 0: Validation cost (ce) for initial model is 8.863. Accuracy is 9.84%\n",
      "INFO:mlp.optimisers:Epoch 1: Training cost (ce) is 9.299. Accuracy is 58.20%\n",
      "INFO:mlp.optimisers:Epoch 1: Validation cost (ce) is 6.952. Accuracy is 81.23%\n",
      "INFO:mlp.optimisers:Epoch 1: Took 12 seconds. Training speed 261 pps. Validation speed 1285 pps.\n",
      "INFO:mlp.optimisers:Epoch 2: Training cost (ce) is 6.838. Accuracy is 84.10%\n",
      "INFO:mlp.optimisers:Epoch 2: Validation cost (ce) is 6.776. Accuracy is 86.86%\n",
      "INFO:mlp.optimisers:Epoch 2: Took 10 seconds. Training speed 248 pps. Validation speed 1546 pps.\n",
      "INFO:mlp.optimisers:Epoch 3: Training cost (ce) is 6.661. Accuracy is 89.60%\n",
      "INFO:mlp.optimisers:Epoch 3: Validation cost (ce) is 6.741. Accuracy is 87.56%\n",
      "INFO:mlp.optimisers:Epoch 3: Took 11 seconds. Training speed 257 pps. Validation speed 1499 pps.\n",
      "INFO:mlp.optimisers:Epoch 4: Training cost (ce) is 6.541. Accuracy is 92.60%\n",
      "INFO:mlp.optimisers:Epoch 4: Validation cost (ce) is 6.729. Accuracy is 87.59%\n",
      "INFO:mlp.optimisers:Epoch 4: Took 10 seconds. Training speed 268 pps. Validation speed 1495 pps.\n",
      "INFO:mlp.optimisers:Epoch 5: Training cost (ce) is 6.458. Accuracy is 95.30%\n",
      "INFO:mlp.optimisers:Epoch 5: Validation cost (ce) is 6.853. Accuracy is 82.51%\n",
      "INFO:mlp.optimisers:Epoch 5: Took 12 seconds. Training speed 267 pps. Validation speed 1220 pps.\n",
      "INFO:mlp.optimisers:Epoch 6: Training cost (ce) is 6.400. Accuracy is 96.80%\n",
      "INFO:mlp.optimisers:Epoch 6: Validation cost (ce) is 6.636. Accuracy is 89.53%\n",
      "INFO:mlp.optimisers:Epoch 6: Took 12 seconds. Training speed 258 pps. Validation speed 1235 pps.\n",
      "INFO:mlp.optimisers:Epoch 7: Training cost (ce) is 6.333. Accuracy is 98.40%\n",
      "INFO:mlp.optimisers:Epoch 7: Validation cost (ce) is 6.622. Accuracy is 89.54%\n",
      "INFO:mlp.optimisers:Epoch 7: Took 12 seconds. Training speed 253 pps. Validation speed 1214 pps.\n",
      "INFO:mlp.optimisers:Epoch 8: Training cost (ce) is 6.290. Accuracy is 98.70%\n",
      "INFO:mlp.optimisers:Epoch 8: Validation cost (ce) is 6.616. Accuracy is 89.02%\n",
      "INFO:mlp.optimisers:Epoch 8: Took 14 seconds. Training speed 253 pps. Validation speed 1023 pps.\n",
      "INFO:mlp.optimisers:Epoch 9: Training cost (ce) is 6.246. Accuracy is 99.20%\n",
      "INFO:mlp.optimisers:Epoch 9: Validation cost (ce) is 6.576. Accuracy is 89.49%\n",
      "INFO:mlp.optimisers:Epoch 9: Took 12 seconds. Training speed 259 pps. Validation speed 1299 pps.\n",
      "INFO:mlp.optimisers:Epoch 10: Training cost (ce) is 6.206. Accuracy is 99.40%\n",
      "INFO:mlp.optimisers:Epoch 10: Validation cost (ce) is 6.554. Accuracy is 89.61%\n",
      "INFO:mlp.optimisers:Epoch 10: Took 11 seconds. Training speed 270 pps. Validation speed 1307 pps.\n",
      "INFO:mlp.optimisers:Epoch 11: Training cost (ce) is 6.172. Accuracy is 99.50%\n",
      "INFO:mlp.optimisers:Epoch 11: Validation cost (ce) is 6.533. Accuracy is 89.83%\n",
      "INFO:mlp.optimisers:Epoch 11: Took 12 seconds. Training speed 252 pps. Validation speed 1205 pps.\n",
      "INFO:mlp.optimisers:Epoch 12: Training cost (ce) is 6.136. Accuracy is 99.40%\n",
      "INFO:mlp.optimisers:Epoch 12: Validation cost (ce) is 6.517. Accuracy is 89.65%\n",
      "INFO:mlp.optimisers:Epoch 12: Took 12 seconds. Training speed 255 pps. Validation speed 1292 pps.\n",
      "INFO:mlp.optimisers:Epoch 13: Training cost (ce) is 6.105. Accuracy is 99.60%\n",
      "INFO:mlp.optimisers:Epoch 13: Validation cost (ce) is 6.484. Accuracy is 89.90%\n",
      "INFO:mlp.optimisers:Epoch 13: Took 12 seconds. Training speed 257 pps. Validation speed 1290 pps.\n",
      "INFO:mlp.optimisers:Epoch 14: Training cost (ce) is 6.074. Accuracy is 99.80%\n",
      "INFO:mlp.optimisers:Epoch 14: Validation cost (ce) is 6.457. Accuracy is 89.87%\n",
      "INFO:mlp.optimisers:Epoch 14: Took 11 seconds. Training speed 260 pps. Validation speed 1337 pps.\n",
      "INFO:mlp.optimisers:Epoch 15: Training cost (ce) is 6.041. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 15: Validation cost (ce) is 6.439. Accuracy is 89.76%\n",
      "INFO:mlp.optimisers:Epoch 15: Took 11 seconds. Training speed 263 pps. Validation speed 1311 pps.\n",
      "INFO:mlp.optimisers:Epoch 16: Training cost (ce) is 6.011. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 16: Validation cost (ce) is 6.411. Accuracy is 89.89%\n",
      "INFO:mlp.optimisers:Epoch 16: Took 12 seconds. Training speed 261 pps. Validation speed 1263 pps.\n",
      "INFO:mlp.optimisers:Epoch 17: Training cost (ce) is 5.981. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 17: Validation cost (ce) is 6.385. Accuracy is 89.94%\n",
      "INFO:mlp.optimisers:Epoch 17: Took 12 seconds. Training speed 258 pps. Validation speed 1276 pps.\n",
      "INFO:mlp.optimisers:Epoch 18: Training cost (ce) is 5.952. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 18: Validation cost (ce) is 6.360. Accuracy is 89.98%\n",
      "INFO:mlp.optimisers:Epoch 18: Took 12 seconds. Training speed 255 pps. Validation speed 1306 pps.\n",
      "INFO:mlp.optimisers:Epoch 19: Training cost (ce) is 5.922. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 19: Validation cost (ce) is 6.335. Accuracy is 89.86%\n",
      "INFO:mlp.optimisers:Epoch 19: Took 10 seconds. Training speed 259 pps. Validation speed 1536 pps.\n",
      "INFO:mlp.optimisers:Epoch 20: Training cost (ce) is 5.893. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 20: Validation cost (ce) is 6.312. Accuracy is 89.92%\n",
      "INFO:mlp.optimisers:Epoch 20: Took 12 seconds. Training speed 255 pps. Validation speed 1255 pps.\n",
      "INFO:mlp.optimisers:Epoch 21: Training cost (ce) is 5.864. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 21: Validation cost (ce) is 6.283. Accuracy is 89.84%\n",
      "INFO:mlp.optimisers:Epoch 21: Took 10 seconds. Training speed 279 pps. Validation speed 1543 pps.\n",
      "INFO:mlp.optimisers:Epoch 22: Training cost (ce) is 5.835. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 22: Validation cost (ce) is 6.258. Accuracy is 89.92%\n",
      "INFO:mlp.optimisers:Epoch 22: Took 11 seconds. Training speed 254 pps. Validation speed 1335 pps.\n",
      "INFO:mlp.optimisers:Epoch 23: Training cost (ce) is 5.806. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 23: Validation cost (ce) is 6.232. Accuracy is 89.97%\n",
      "INFO:mlp.optimisers:Epoch 23: Took 11 seconds. Training speed 256 pps. Validation speed 1378 pps.\n",
      "INFO:mlp.optimisers:Epoch 24: Training cost (ce) is 5.777. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 24: Validation cost (ce) is 6.202. Accuracy is 90.04%\n",
      "INFO:mlp.optimisers:Epoch 24: Took 13 seconds. Training speed 255 pps. Validation speed 1133 pps.\n",
      "INFO:mlp.optimisers:Epoch 25: Training cost (ce) is 5.748. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 25: Validation cost (ce) is 6.178. Accuracy is 89.96%\n",
      "INFO:mlp.optimisers:Epoch 25: Took 12 seconds. Training speed 253 pps. Validation speed 1233 pps.\n",
      "INFO:mlp.optimisers:Epoch 26: Training cost (ce) is 5.720. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 26: Validation cost (ce) is 6.157. Accuracy is 89.90%\n",
      "INFO:mlp.optimisers:Epoch 26: Took 12 seconds. Training speed 259 pps. Validation speed 1260 pps.\n",
      "INFO:mlp.optimisers:Epoch 27: Training cost (ce) is 5.691. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 27: Validation cost (ce) is 6.125. Accuracy is 89.90%\n",
      "INFO:mlp.optimisers:Epoch 27: Took 11 seconds. Training speed 283 pps. Validation speed 1357 pps.\n",
      "INFO:mlp.optimisers:Epoch 28: Training cost (ce) is 5.662. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 28: Validation cost (ce) is 6.099. Accuracy is 89.90%\n",
      "INFO:mlp.optimisers:Epoch 28: Took 12 seconds. Training speed 257 pps. Validation speed 1247 pps.\n",
      "INFO:mlp.optimisers:Epoch 29: Training cost (ce) is 5.634. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 29: Validation cost (ce) is 6.067. Accuracy is 90.03%\n",
      "INFO:mlp.optimisers:Epoch 29: Took 12 seconds. Training speed 257 pps. Validation speed 1256 pps.\n",
      "INFO:mlp.optimisers:Epoch 30: Training cost (ce) is 5.605. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 30: Validation cost (ce) is 6.043. Accuracy is 89.94%\n",
      "INFO:mlp.optimisers:Epoch 30: Took 12 seconds. Training speed 256 pps. Validation speed 1285 pps.\n",
      "INFO:root:Testing the model on test set:\n",
      "INFO:root:MNIST test set accuracy is 89.30 %, cost (ce) is 0.448\n"
     ]
    }
   ],
   "source": [
    "%autoreload\n",
    "\n",
    "import numpy\n",
    "import logging\n",
    "\n",
    "from mlp.layers import MLP, Linear, Sigmoid, Softmax #import required layer types\n",
    "from mlp.optimisers import SGDOptimiser #import the optimiser\n",
    "from mlp.dataset import MNISTDataProvider #import data provider\n",
    "from mlp.costs import CECost #import the cost we want to use for optimisation\n",
    "from mlp.schedulers import LearningRateFixed\n",
    "\n",
    "rng = numpy.random.RandomState([2015,10,10])\n",
    "\n",
    "#some hyper-parameters\n",
    "nhid = 800\n",
    "learning_rate = 0.5\n",
    "max_epochs = 30\n",
    "l1_weight = 0.0001\n",
    "l2_weight = 0.0\n",
    "cost = CECost()\n",
    "    \n",
    "stats = []\n",
    "layer = 1\n",
    "for i in xrange(1, 2):\n",
    "\n",
    "    train_dp.reset()\n",
    "    valid_dp.reset()\n",
    "    test_dp.reset()\n",
    "    \n",
    "    #define the model\n",
    "    model = MLP(cost=cost)\n",
    "    model.add_layer(Sigmoid(idim=784, odim=nhid, irange=0.2, rng=rng))\n",
    "    for i in xrange(1, layer):\n",
    "        logger.info(\"Stacking hidden layer (%s)\" % str(i+1))\n",
    "        model.add_layer(Sigmoid(idim=nhid, odim=nhid, irange=0.2, rng=rng))\n",
    "    model.add_layer(Softmax(idim=nhid, odim=10, rng=rng))\n",
    "\n",
    "    # define the optimiser, here stochasitc gradient descent\n",
    "    # with fixed learning rate and max_epochs\n",
    "    lr_scheduler = LearningRateFixed(learning_rate=learning_rate, max_epochs=max_epochs)\n",
    "    optimiser = SGDOptimiser(lr_scheduler=lr_scheduler, \n",
    "                             dp_scheduler=None,\n",
    "                             l1_weight=l1_weight, \n",
    "                             l2_weight=l2_weight)\n",
    "\n",
    "    logger.info('Training started...')\n",
    "    tr_stats, valid_stats = optimiser.train(model, train_dp, valid_dp)\n",
    "\n",
    "    logger.info('Testing the model on test set:')\n",
    "    tst_cost, tst_accuracy = optimiser.validate(model, test_dp)\n",
    "    logger.info('MNIST test set accuracy is %.2f %%, cost (%s) is %.3f'%(tst_accuracy*100., cost.get_name(), tst_cost))\n",
    "    \n",
    "    stats.append((tr_stats, valid_stats, (tst_cost, tst_accuracy)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Exercise 2:  Implement L2 based regularisation\n",
    "\n",
    "Implement L2 regularisation method. Follow similar steps as in Exercise 1. Start with $\\beta_{L2}$ set to 0.001 and do some grid search for better values. Plot validation accuracies as a function of epochs for each model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:root:Training started...\n",
      "INFO:mlp.optimisers:Epoch 0: Training cost (ce) for initial model is 2.666. Accuracy is 8.60%\n",
      "INFO:mlp.optimisers:Epoch 0: Validation cost (ce) for initial model is 2.595. Accuracy is 9.84%\n",
      "INFO:mlp.optimisers:Epoch 1: Training cost (ce) is 2.862. Accuracy is 58.70%\n",
      "INFO:mlp.optimisers:Epoch 1: Validation cost (ce) is 0.772. Accuracy is 75.41%\n",
      "INFO:mlp.optimisers:Epoch 1: Took 12 seconds. Training speed 255 pps. Validation speed 1302 pps.\n",
      "INFO:mlp.optimisers:Epoch 2: Training cost (ce) is 0.544. Accuracy is 83.30%\n",
      "INFO:mlp.optimisers:Epoch 2: Validation cost (ce) is 0.547. Accuracy is 83.96%\n",
      "INFO:mlp.optimisers:Epoch 2: Took 11 seconds. Training speed 252 pps. Validation speed 1397 pps.\n",
      "INFO:mlp.optimisers:Epoch 3: Training cost (ce) is 0.361. Accuracy is 90.60%\n",
      "INFO:mlp.optimisers:Epoch 3: Validation cost (ce) is 0.538. Accuracy is 84.64%\n",
      "INFO:mlp.optimisers:Epoch 3: Took 9 seconds. Training speed 328 pps. Validation speed 1565 pps.\n",
      "INFO:mlp.optimisers:Epoch 4: Training cost (ce) is 0.270. Accuracy is 92.70%\n",
      "INFO:mlp.optimisers:Epoch 4: Validation cost (ce) is 0.442. Accuracy is 88.85%\n",
      "INFO:mlp.optimisers:Epoch 4: Took 12 seconds. Training speed 250 pps. Validation speed 1299 pps.\n",
      "INFO:mlp.optimisers:Epoch 5: Training cost (ce) is 0.218. Accuracy is 94.60%\n",
      "INFO:mlp.optimisers:Epoch 5: Validation cost (ce) is 0.444. Accuracy is 88.78%\n",
      "INFO:mlp.optimisers:Epoch 5: Took 10 seconds. Training speed 294 pps. Validation speed 1543 pps.\n",
      "INFO:mlp.optimisers:Epoch 6: Training cost (ce) is 0.179. Accuracy is 96.60%\n",
      "INFO:mlp.optimisers:Epoch 6: Validation cost (ce) is 0.462. Accuracy is 88.42%\n",
      "INFO:mlp.optimisers:Epoch 6: Took 10 seconds. Training speed 307 pps. Validation speed 1543 pps.\n",
      "INFO:mlp.optimisers:Epoch 7: Training cost (ce) is 0.135. Accuracy is 98.40%\n",
      "INFO:mlp.optimisers:Epoch 7: Validation cost (ce) is 0.418. Accuracy is 89.87%\n",
      "INFO:mlp.optimisers:Epoch 7: Took 10 seconds. Training speed 297 pps. Validation speed 1548 pps.\n",
      "INFO:mlp.optimisers:Epoch 8: Training cost (ce) is 0.109. Accuracy is 99.30%\n",
      "INFO:mlp.optimisers:Epoch 8: Validation cost (ce) is 0.434. Accuracy is 89.65%\n",
      "INFO:mlp.optimisers:Epoch 8: Took 12 seconds. Training speed 292 pps. Validation speed 1217 pps.\n",
      "INFO:mlp.optimisers:Epoch 9: Training cost (ce) is 0.089. Accuracy is 99.30%\n",
      "INFO:mlp.optimisers:Epoch 9: Validation cost (ce) is 0.451. Accuracy is 89.26%\n",
      "INFO:mlp.optimisers:Epoch 9: Took 11 seconds. Training speed 311 pps. Validation speed 1348 pps.\n",
      "INFO:mlp.optimisers:Epoch 10: Training cost (ce) is 0.088. Accuracy is 99.50%\n",
      "INFO:mlp.optimisers:Epoch 10: Validation cost (ce) is 0.459. Accuracy is 89.22%\n",
      "INFO:mlp.optimisers:Epoch 10: Took 9 seconds. Training speed 325 pps. Validation speed 1600 pps.\n",
      "INFO:mlp.optimisers:Epoch 11: Training cost (ce) is 0.074. Accuracy is 99.80%\n",
      "INFO:mlp.optimisers:Epoch 11: Validation cost (ce) is 0.443. Accuracy is 90.03%\n",
      "INFO:mlp.optimisers:Epoch 11: Took 11 seconds. Training speed 274 pps. Validation speed 1325 pps.\n",
      "INFO:mlp.optimisers:Epoch 12: Training cost (ce) is 0.070. Accuracy is 99.80%\n",
      "INFO:mlp.optimisers:Epoch 12: Validation cost (ce) is 0.452. Accuracy is 90.10%\n",
      "INFO:mlp.optimisers:Epoch 12: Took 12 seconds. Training speed 288 pps. Validation speed 1133 pps.\n",
      "INFO:mlp.optimisers:Epoch 13: Training cost (ce) is 0.064. Accuracy is 99.70%\n",
      "INFO:mlp.optimisers:Epoch 13: Validation cost (ce) is 0.485. Accuracy is 89.01%\n",
      "INFO:mlp.optimisers:Epoch 13: Took 11 seconds. Training speed 315 pps. Validation speed 1218 pps.\n",
      "INFO:mlp.optimisers:Epoch 14: Training cost (ce) is 0.060. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 14: Validation cost (ce) is 0.457. Accuracy is 90.14%\n",
      "INFO:mlp.optimisers:Epoch 14: Took 12 seconds. Training speed 282 pps. Validation speed 1245 pps.\n",
      "INFO:mlp.optimisers:Epoch 15: Training cost (ce) is 0.059. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 15: Validation cost (ce) is 0.459. Accuracy is 90.29%\n",
      "INFO:mlp.optimisers:Epoch 15: Took 11 seconds. Training speed 280 pps. Validation speed 1268 pps.\n",
      "INFO:mlp.optimisers:Epoch 16: Training cost (ce) is 0.057. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 16: Validation cost (ce) is 0.465. Accuracy is 90.00%\n",
      "INFO:mlp.optimisers:Epoch 16: Took 10 seconds. Training speed 328 pps. Validation speed 1368 pps.\n",
      "INFO:mlp.optimisers:Epoch 17: Training cost (ce) is 0.056. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 17: Validation cost (ce) is 0.467. Accuracy is 90.14%\n",
      "INFO:mlp.optimisers:Epoch 17: Took 11 seconds. Training speed 273 pps. Validation speed 1287 pps.\n",
      "INFO:mlp.optimisers:Epoch 18: Training cost (ce) is 0.055. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 18: Validation cost (ce) is 0.475. Accuracy is 90.15%\n",
      "INFO:mlp.optimisers:Epoch 18: Took 11 seconds. Training speed 279 pps. Validation speed 1279 pps.\n",
      "INFO:mlp.optimisers:Epoch 19: Training cost (ce) is 0.054. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 19: Validation cost (ce) is 0.473. Accuracy is 90.06%\n",
      "INFO:mlp.optimisers:Epoch 19: Took 12 seconds. Training speed 284 pps. Validation speed 1188 pps.\n",
      "INFO:mlp.optimisers:Epoch 20: Training cost (ce) is 0.053. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 20: Validation cost (ce) is 0.478. Accuracy is 90.06%\n",
      "INFO:mlp.optimisers:Epoch 20: Took 13 seconds. Training speed 250 pps. Validation speed 1172 pps.\n",
      "INFO:mlp.optimisers:Epoch 21: Training cost (ce) is 0.053. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 21: Validation cost (ce) is 0.484. Accuracy is 90.05%\n",
      "INFO:mlp.optimisers:Epoch 21: Took 11 seconds. Training speed 255 pps. Validation speed 1325 pps.\n",
      "INFO:mlp.optimisers:Epoch 22: Training cost (ce) is 0.052. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 22: Validation cost (ce) is 0.485. Accuracy is 90.21%\n",
      "INFO:mlp.optimisers:Epoch 22: Took 10 seconds. Training speed 328 pps. Validation speed 1353 pps.\n",
      "INFO:mlp.optimisers:Epoch 23: Training cost (ce) is 0.052. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 23: Validation cost (ce) is 0.484. Accuracy is 90.08%\n",
      "INFO:mlp.optimisers:Epoch 23: Took 11 seconds. Training speed 266 pps. Validation speed 1332 pps.\n",
      "INFO:mlp.optimisers:Epoch 24: Training cost (ce) is 0.051. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 24: Validation cost (ce) is 0.485. Accuracy is 90.17%\n",
      "INFO:mlp.optimisers:Epoch 24: Took 10 seconds. Training speed 281 pps. Validation speed 1520 pps.\n",
      "INFO:mlp.optimisers:Epoch 25: Training cost (ce) is 0.051. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 25: Validation cost (ce) is 0.486. Accuracy is 90.26%\n",
      "INFO:mlp.optimisers:Epoch 25: Took 11 seconds. Training speed 259 pps. Validation speed 1321 pps.\n",
      "INFO:mlp.optimisers:Epoch 26: Training cost (ce) is 0.051. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 26: Validation cost (ce) is 0.487. Accuracy is 90.23%\n",
      "INFO:mlp.optimisers:Epoch 26: Took 11 seconds. Training speed 283 pps. Validation speed 1344 pps.\n",
      "INFO:mlp.optimisers:Epoch 27: Training cost (ce) is 0.051. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 27: Validation cost (ce) is 0.493. Accuracy is 90.25%\n",
      "INFO:mlp.optimisers:Epoch 27: Took 11 seconds. Training speed 270 pps. Validation speed 1357 pps.\n",
      "INFO:mlp.optimisers:Epoch 28: Training cost (ce) is 0.050. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 28: Validation cost (ce) is 0.493. Accuracy is 90.27%\n",
      "INFO:mlp.optimisers:Epoch 28: Took 11 seconds. Training speed 283 pps. Validation speed 1302 pps.\n",
      "INFO:mlp.optimisers:Epoch 29: Training cost (ce) is 0.050. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 29: Validation cost (ce) is 0.499. Accuracy is 90.22%\n",
      "INFO:mlp.optimisers:Epoch 29: Took 10 seconds. Training speed 327 pps. Validation speed 1395 pps.\n",
      "INFO:mlp.optimisers:Epoch 30: Training cost (ce) is 0.050. Accuracy is 100.00%\n",
      "INFO:mlp.optimisers:Epoch 30: Validation cost (ce) is 0.497. Accuracy is 90.22%\n",
      "INFO:mlp.optimisers:Epoch 30: Took 12 seconds. Training speed 268 pps. Validation speed 1142 pps.\n",
      "INFO:root:Testing the model on test set:\n",
      "INFO:root:MNIST test set accuracy is 89.30 %, cost (ce) is 0.459\n"
     ]
    }
   ],
   "source": [
    "%autoreload\n",
    "\n",
    "import numpy\n",
    "import logging\n",
    "\n",
    "from mlp.layers import MLP, Linear, Sigmoid, Softmax #import required layer types\n",
    "from mlp.optimisers import SGDOptimiser #import the optimiser\n",
    "from mlp.dataset import MNISTDataProvider #import data provider\n",
    "from mlp.costs import CECost #import the cost we want to use for optimisation\n",
    "from mlp.schedulers import LearningRateFixed\n",
    "\n",
    "logger = logging.getLogger()\n",
    "logger.setLevel(logging.INFO)\n",
    "rng = numpy.random.RandomState([2015,10,10])\n",
    "\n",
    "#some hyper-parameters\n",
    "nhid = 800\n",
    "learning_rate = 0.5\n",
    "max_epochs = 30\n",
    "l1_weight = 0\n",
    "l2_weight = 0.00001\n",
    "cost = CECost()\n",
    "    \n",
    "stats = []\n",
    "layer = 1\n",
    "for i in xrange(1, 2):\n",
    "\n",
    "    train_dp.reset()\n",
    "    valid_dp.reset()\n",
    "    test_dp.reset()\n",
    "    \n",
    "    #define the model\n",
    "    model = MLP(cost=cost)\n",
    "    model.add_layer(Sigmoid(idim=784, odim=nhid, irange=0.2, rng=rng))\n",
    "    for i in xrange(1, layer):\n",
    "        logger.info(\"Stacking hidden layer (%s)\" % str(i+1))\n",
    "        model.add_layer(Sigmoid(idim=nhid, odim=nhid, irange=0.2, rng=rng))\n",
    "    model.add_layer(Softmax(idim=nhid, odim=10, rng=rng))\n",
    "\n",
    "    # define the optimiser, here stochasitc gradient descent\n",
    "    # with fixed learning rate and max_epochs\n",
    "    lr_scheduler = LearningRateFixed(learning_rate=learning_rate, max_epochs=max_epochs)\n",
    "    optimiser = SGDOptimiser(lr_scheduler=lr_scheduler, \n",
    "                             dp_scheduler=None,\n",
    "                             l1_weight=l1_weight, \n",
    "                             l2_weight=l2_weight)\n",
    "\n",
    "    logger.info('Training started...')\n",
    "    tr_stats, valid_stats = optimiser.train(model, train_dp, valid_dp)\n",
    "\n",
    "    logger.info('Testing the model on test set:')\n",
    "    tst_cost, tst_accuracy = optimiser.validate(model, test_dp)\n",
    "    logger.info('MNIST test set accuracy is %.2f %%, cost (%s) is %.3f'%(tst_accuracy*100., cost.get_name(), tst_cost))\n",
    "    \n",
    "    stats.append((tr_stats, valid_stats, (tst_cost, tst_accuracy)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Exercise 3:\n",
    "    \n",
    "Droput applied to input features (turning on/off some random pixels) may be also viewed as a form of data augmentation -- as we effectively create images that differ in some way from training one but also model is tasked to properly classify imperfect data-points.\n",
    "\n",
    "Your task in this exercise is to pick a random digit from MNIST dataset (use MNISTDataProvider) and corrupt it pixel-wise with different levels of probabilities $p_{d} \\in \\{0.9, 0.7, 0.5, 0.2, 0.1\\}$ (reminder, dropout probability is $1-p_d$) that is, for each pixel $x_{i,j}$ in image $\\mathbf{X} \\in \\mathbb{R}^{W\\times H}$:\n",
    "\n",
    "$\\begin{align}\n",
    "d_{i,j} & \\sim\\ \\mbox{Bernoulli}(p_{d}) \\\\\n",
    "x_{i,j} &=\n",
    "\\begin{cases}\n",
    "     0     & \\quad \\text{if } d_{i,j} = 0\\\\\n",
    "     x_{i,j}       & \\quad \\text{if } d_{i,j} = 1\\\\\n",
    "\\end{cases}\n",
    "\\end{align}\n",
    "$\n",
    "\n",
    "Plot the solution as a 2x3 grid of images for each $p_d$ scenario, at position (0, 0) plot an original (uncorrupted) image.\n",
    "\n",
    "Tip: You may use numpy.random.binomial function to draw samples from Bernoulli distribution."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Using matplotlib backend: TkAgg\n",
      "Populating the interactive namespace from numpy and matplotlib\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<matplotlib.image.AxesImage at 0x7f0472955e10>"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAWwAAAD7CAYAAABOi672AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzsvVlsXVt63/lbZ54nTudwkkhRw9WV7q0BdW9VKh2X3V1A\nxX4M4CCNvMQxAiQx2kY/JHA/xHY/Bmi3O4Ef/GAHSMqdFNoBjAAu23GcOCk7VXWdKlXpStekJE46\nhzw88zwPux/EtbQPNVEcxHM21w/YIEWRR0v8n/Xfa3/rW98nDMNAo9FoNOOP7aIHoNFoNJrjoQ1b\no9FoJgRt2BqNRjMhaMPWaDSaCUEbtkaj0UwI2rA1Go1mQjixYQshviGE+FQI8ZkQ4p+e5aA0F4vW\n1ppoXScfcZI8bCGEG1gH/jqQAb4L/APDMO6d7fA07xqtrTXRulqDk66wPwYeGoaxZxhGH/gW8DNn\nNyzNBaK1tSZaVwvgOOHPLQJJ059TwNfM3yCE0EcoxwjDMMQxv1VrO0FoXa3Jq3Q96QpbC2tdtLbW\nROtqAU5q2ClgyfTnJUbv3prJRWtrTbSuFuCkhv2XwB0hxIIQwgn8LPCHZzcszQWitbUmWlcLcKIY\ntmEYbSHEPwT+mGem/28Mw/jhmY5McyFoba2J1tUanCit71gvrDcwxoq32Jx6I1rb8UHrak3OetNR\no9FoNO8YbdgajUYzIWjD1mg0mglBG7ZGo9FMCNqwNRqNZkLQhq3RaDQTgjZsjUajmRC0YWs0Gs2E\noA1bo9FoJoSTllcFQAixA1SBAdAzDOOjsxiU5mLRuloTrevkcyrD5lnJxq8ZhlE8i8FoxgatqzXR\nuk44ZxESObNaBpqxQutqTbSuE8xpDdsA/kQIcV8I8QtnMSDNWKB1tSZa1wnntCGRLxuGkRVCzAB/\nJIRYNwzjP53FwDQXitbVmmhdJ5xTrbANw8gefswBvwd86SwGpblYtK7WROs6+ZzYsIUQPiGE7/Bz\nP/AN4OFZDUxzMWhdrYnW1RqcJiQyB/z+YdFzH/DvDMP4D2czrPHDbrfjcDhwOBwjnzscDmw2Gzab\nDSGE+ijEs70dIQTmJhHdbpdOp0On06Hb7TIYDBgOhwwGA86rmcRbcql0PYrT6cTj8eDxeHC5XEpb\nqSug9DXr1e/3la6dTofBYKAureu7RwgxMkePzt/X6QoozbrdLs1mk2azSavVurD/j0R3nDkmXq8X\nv99PIBAgEAjg9/vV5fF4cDqdOJ3Ol05ywzDUVSwWyeVy5HI5SqUSrVaLdrtNq9ViOBye2/h1Z5Lj\nEYlEiMfjzM3NMT09PaLr0YkuNQWo1WpK11wuN6Jrv98/t/FqXV+O3W5X89Q8ZwOBAD6fT2nqdDqV\nrna7fcSw5XxNJpM8ffqUZPLd9Sx+la6n3XS8NDidTkKhEFNTUyNXLBYjGAzi9XrxeDx4vd6Ru7k0\n7OFwiGEY7O7usrW1xebmJkIIqtUqhmHQ6XTO1bA1x8Pv9zM/P8+NGzdYWVnB6/Wqy+l0qhWa3W5X\nk3o4HJLP59nc3OTJkycYhkG1WgWerdDO07A1L8dms+H1eonFYkxPT4/M2Wg0qjSVT1Jyzkpdh8Mh\nw+GQVCrFvXv3aDQa79SwX4U27GPicrkIBoPMzMywuLhIIpFgYWGB+fl5YrGYuosHAgG1KnM6ndhs\nNiX+cDjk4cOH+Hw+er0ejUZDmbW8s2suFmnYt2/f5sMPP1SaBoNB3G73iLZmXZPJJMFgkOFwOGLW\njUbjgv9HlxObzYbP5yMWi6m5KudrIpF4o64yVLm+vk6z2WR3d/ei/0vAJTRsuTqSd1S32z1yuVwu\n9VE+JtntdsLhMNPT08zMzLxwx5YrbK/Xi9vtHlmFybiYNO5gMMjs7CxXr16l1+vhcrnodruUSiV6\nvd5F/3omFiHEiLZSx5dd8nukJvl8nkKhQKFQwOFw4PV6CYVCRKNRfD6fWonJFbbct5D/rjSHqakp\nlpaWaDabeDweBoMB1WqVZrN5wb8d6+DxePD5fPh8PhwOh4ovN5tNgsEgU1NTTE9PEw6HgWf6FItF\nfD4fs7OzeDyekRW2nOvm2LZEzn9znPuiuZSGLWNXPp+PUCj00svv96vvczqd+P1+gsGguiubY2Jy\nMsvvN8fDzELbbDb8fj+zs7N0Oh0Mw6DX61EqlbDb7Rf4W5l85CaTy+VST0Mv0zUcDqsVlbxZbmxs\nsL6+TrVaxW63K8OOxWIjukqjNm9SSTweD1NTUywvLyOEUCvtdDp9Ub8SS+L1epUpezwe8vk8+Xye\ndrtNJBJhbW2NW7duMTc3x97eHnt7e6RSKbxeL91uF7fbTTQaVZqa9ybMG8lS26NJBBfNpTVsj8dD\nKBRiZmbmpVc0GlXZAjLOdXTyvm7X+WimiPzo9/uZmZlRq7xyuUwqldKGfUqkYbvdbrxeL5FIhNnZ\n2Rd0nZ2dVRvFHo+HVquFy+WiWq2ytbWFw+HA4/EQDAaJRCIjuppX1Uc/ejweYrEYAD6fj2q1yv7+\nPk6n82J+IRZF3hiXlpbw+/3YbDba7TbFYpFwOMza2hoff/wxV65c4ZNPPlFPTj6fb8SwXzZfJUez\nRsbFrMGihm1Or3M4HMpoXS4XPp9PZXfIST07O8vc3Jya4LOzs8RiMbWJ6PF4RnaQ5QpKbiQOBgP6\n/b6KfZkfzY+GRVwuF36/H8Mw1GOc2+0eeRTTvBzzBBJCjNxEj944zSGKxcXFEZ39fr/StdVqkc1m\nSafTJJNJ4vG4emSWIZN+v89gMHgh7GKe7A6HA5/Pp74vHA6rDWjN63mTruaU2YWFBa5evcrVq1fx\ner10Oh1KpZLSXaZlynkeDAbVHBsMBiqbR/qCOXz5Ml3l03UsFmNubm4kLXcwGLzz35XlDPto/qWM\na8lLmrUMbUQiEcLhsPooH5t9Pt/IGwae52YOBgN6vR7dblcJ2G63abfbdDodFWMzxz6dTudIxki/\n31dGYE4P07was7ZOp5NYLKZ0dTqdlEoldQGEQiEWFhZYXV0lHA4TDodVCEs+CrtcLhKJBHfu3EEI\nQSgUYnFxEbvdTqlUUrq2Wi1lylJb80aV1FW+N+QNXOv6Zl6naygUGnmynZqaUjdfmXaXSqWw2WyU\ny2WePHmCy+UilUpRrVaJxWJ89atfVXtSGxsbpNNpNddDoZBK9ZOXNHKn04nX6yWRSHDr1i3a7TaZ\nTIZsNksmk7mQDWXLGTaMhj1isRhXrlxRd2VzNod5pSU/mkMg5jsuMLKi7nQ6arOj0WhQrVapVqvU\n63UikQjRaJRoNMpwOMTj8ag3pXlF3uv11G605s3I36HH48HtdqvN25WVFVwuF0+ePKHb7ZJOp5X5\nSsOWP+PxeEZWVdKwhRBMT09js9kIBoM4HA6KxSLVapVarUa1WsXlcildI5EIHo8HYCQVzHwj1roe\nj5fpurKywtWrV4nH4yOplea86kajQSqVIhgMYrPZqFQqPHnyhGq1SjweJ5FIMD8/z507d2g0GqTT\naTY2NiiVSiQSCfU9MtUvGo1iGIbyALmfEY/HuXXrFi6Xi0ePHiGEoFKpjKdhCyF+B/gZIGsYxt3D\nr8WAb/Hs9FQa+NuGYZTPc6DH5WgsMxaLcfXqVe7evcvdu3fVyjoYDOLxeF4be5avB6jJJ1dR7XZb\nGXWpVFKxsmKxSDwep9vtvhCakT//son9rldik6YroB553W43Pp+Pubk5rl+/zt27d3G73fR6PdLp\nNO12GxhdYQMv1dZut5NIJJiZmeG9996j2WxSr9ep1+sUi0Wlaz6fx+v10m63MQxD6Wm323G73WOz\nwraCrvF4nLW1NT744ANWVlbUfA0EAiMni3O5HI8fPyYQCKj9ILkXEYvF+Kmf+inu3LnDX/trf429\nvT0ODg7Y2NjgRz/6Eaurq1y7do3V1VUWFxeVrnLPQd7M5Qrb5XIxMzODzWajWq3y9OnTC/ldHWeF\n/a+Afwn8a9PXfg34A8M
      "text/plain": [
       "<matplotlib.figure.Figure at 0x7f0475571290>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "%pylab\n",
    "%matplotlib inline\n",
    "\n",
    "train_dp.reset()\n",
    "x, t = train_dp.next()\n",
    "img = x[0].reshape(28,28)\n",
    "pds = [0.9, 0.7, 0.5, 0.2, 0.1]\n",
    "imgs = [None] * (len(pds)+1)\n",
    "imgs[0] = img\n",
    "\n",
    "for i, pd in enumerate(pds):\n",
    "    d = rng.binomial(1, pd, img.shape)\n",
    "    imgs[i + 1] = d*img\n",
    "\n",
    "fig, ax = plt.subplots(2,3)\n",
    "ax[0, 0].imshow(imgs[0], cmap=cm.Greys_r)\n",
    "ax[0, 1].imshow(imgs[1], cmap=cm.Greys_r)\n",
    "ax[0, 2].imshow(imgs[2], cmap=cm.Greys_r)\n",
    "ax[1, 0].imshow(imgs[3], cmap=cm.Greys_r)\n",
    "ax[1, 1].imshow(imgs[4], cmap=cm.Greys_r)\n",
    "ax[1, 2].imshow(imgs[5], cmap=cm.Greys_r)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Exercise 4: Implement Dropout \n",
    "\n",
    "Implement dropout regularisation technique. Then for the same initial configuration as in Exercise 1. investigate effectivness of different dropout rates applied to input features and/or hidden layers. Start with $p_{inp}=0.5$ and $p_{hid}=0.5$ and do some search for better settings.\n",
    "\n",
    "Implementation tips:\n",
    "* Add a function `fprop_dropout` to `mlp.layers.MLP` class which (on top of `inputs` argument) takes also dropout-related argument(s) and perform dropout forward propagation through the model.\n",
    "* One also would have to introduce required modificastions to `mlp.optimisers.SGDOptimiser.train_epoch()` function.\n",
    "* Design and implemnt dropout scheduler in a similar way to how learning rates are handled (that is, allowing for some implementation dependent schedule which is kept independent of implementation in `mlp.optimisers.SGDOptimiser.train()`). \n",
    "   +  For this exercise implement only fixed dropout scheduler - `DropoutFixed`, but implementation should allow to easily add other schedules in the future. \n",
    "   +  Dropout scheduler of any type should return a tuple of two numbers $(p_{inp},\\; p_{hid})$, the first one is dropout factor for input features (data-points), and the latter dropout factor for hidden layers (assumed the same for all hidden layers)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:root:Training started...\n",
      "INFO:mlp.optimisers:Epoch 0: Training cost (ce) for initial model is 2.624. Accuracy is 8.60%\n",
      "INFO:mlp.optimisers:Epoch 0: Validation cost (ce) for initial model is 2.554. Accuracy is 9.84%\n",
      "INFO:mlp.optimisers:Epoch 1: Training cost (ce) is 3.828. Accuracy is 50.90%\n",
      "INFO:mlp.optimisers:Epoch 1: Validation cost (ce) is 0.716. Accuracy is 76.93%\n",
      "INFO:mlp.optimisers:Epoch 1: Took 9 seconds. Training speed 295 pps. Validation speed 1692 pps.\n",
      "INFO:mlp.optimisers:Epoch 2: Training cost (ce) is 1.132. Accuracy is 66.90%\n",
      "INFO:mlp.optimisers:Epoch 2: Validation cost (ce) is 0.753. Accuracy is 78.82%\n",
      "INFO:mlp.optimisers:Epoch 2: Took 10 seconds. Training speed 289 pps. Validation speed 1653 pps.\n",
      "INFO:mlp.optimisers:Epoch 3: Training cost (ce) is 1.043. Accuracy is 71.90%\n",
      "INFO:mlp.optimisers:Epoch 3: Validation cost (ce) is 0.501. Accuracy is 85.69%\n",
      "INFO:mlp.optimisers:Epoch 3: Took 10 seconds. Training speed 280 pps. Validation speed 1681 pps.\n",
      "INFO:mlp.optimisers:Epoch 4: Training cost (ce) is 0.810. Accuracy is 78.50%\n",
      "INFO:mlp.optimisers:Epoch 4: Validation cost (ce) is 0.481. Accuracy is 85.92%\n",
      "INFO:mlp.optimisers:Epoch 4: Took 9 seconds. Training speed 308 pps. Validation speed 1675 pps.\n",
      "INFO:mlp.optimisers:Epoch 5: Training cost (ce) is 0.769. Accuracy is 79.40%\n",
      "INFO:mlp.optimisers:Epoch 5: Validation cost (ce) is 0.588. Accuracy is 84.25%\n",
      "INFO:mlp.optimisers:Epoch 5: Took 9 seconds. Training speed 320 pps. Validation speed 1733 pps.\n",
      "INFO:mlp.optimisers:Epoch 6: Training cost (ce) is 0.792. Accuracy is 78.60%\n",
      "INFO:mlp.optimisers:Epoch 6: Validation cost (ce) is 0.434. Accuracy is 88.49%\n",
      "INFO:mlp.optimisers:Epoch 6: Took 9 seconds. Training speed 334 pps. Validation speed 1692 pps.\n",
      "INFO:mlp.optimisers:Epoch 7: Training cost (ce) is 0.675. Accuracy is 82.00%\n",
      "INFO:mlp.optimisers:Epoch 7: Validation cost (ce) is 0.514. Accuracy is 86.76%\n",
      "INFO:mlp.optimisers:Epoch 7: Took 9 seconds. Training speed 284 pps. Validation speed 1704 pps.\n",
      "INFO:mlp.optimisers:Epoch 8: Training cost (ce) is 0.808. Accuracy is 79.90%\n",
      "INFO:mlp.optimisers:Epoch 8: Validation cost (ce) is 0.620. Accuracy is 84.22%\n",
      "INFO:mlp.optimisers:Epoch 8: Took 9 seconds. Training speed 317 pps. Validation speed 1684 pps.\n",
      "INFO:mlp.optimisers:Epoch 9: Training cost (ce) is 0.810. Accuracy is 79.90%\n",
      "INFO:mlp.optimisers:Epoch 9: Validation cost (ce) is 0.645. Accuracy is 84.91%\n",
      "INFO:mlp.optimisers:Epoch 9: Took 9 seconds. Training speed 304 pps. Validation speed 1675 pps.\n",
      "INFO:mlp.optimisers:Epoch 10: Training cost (ce) is 0.631. Accuracy is 83.20%\n",
      "INFO:mlp.optimisers:Epoch 10: Validation cost (ce) is 0.493. Accuracy is 88.80%\n",
      "INFO:mlp.optimisers:Epoch 10: Took 10 seconds. Training speed 286 pps. Validation speed 1656 pps.\n",
      "INFO:mlp.optimisers:Epoch 11: Training cost (ce) is 0.676. Accuracy is 83.90%\n",
      "INFO:mlp.optimisers:Epoch 11: Validation cost (ce) is 0.598. Accuracy is 86.78%\n",
      "INFO:mlp.optimisers:Epoch 11: Took 9 seconds. Training speed 296 pps. Validation speed 1710 pps.\n",
      "INFO:mlp.optimisers:Epoch 12: Training cost (ce) is 0.659. Accuracy is 83.80%\n",
      "INFO:mlp.optimisers:Epoch 12: Validation cost (ce) is 0.432. Accuracy is 90.21%\n",
      "INFO:mlp.optimisers:Epoch 12: Took 9 seconds. Training speed 309 pps. Validation speed 1675 pps.\n",
      "INFO:mlp.optimisers:Epoch 13: Training cost (ce) is 0.525. Accuracy is 86.80%\n",
      "INFO:mlp.optimisers:Epoch 13: Validation cost (ce) is 0.490. Accuracy is 89.64%\n",
      "INFO:mlp.optimisers:Epoch 13: Took 9 seconds. Training speed 320 pps. Validation speed 1681 pps.\n",
      "INFO:mlp.optimisers:Epoch 14: Training cost (ce) is 0.488. Accuracy is 88.50%\n",
      "INFO:mlp.optimisers:Epoch 14: Validation cost (ce) is 0.606. Accuracy is 86.87%\n",
      "INFO:mlp.optimisers:Epoch 14: Took 9 seconds. Training speed 305 pps. Validation speed 1678 pps.\n",
      "INFO:mlp.optimisers:Epoch 15: Training cost (ce) is 0.441. Accuracy is 88.30%\n",
      "INFO:mlp.optimisers:Epoch 15: Validation cost (ce) is 0.570. Accuracy is 89.27%\n",
      "INFO:mlp.optimisers:Epoch 15: Took 9 seconds. Training speed 331 pps. Validation speed 1736 pps.\n",
      "INFO:mlp.optimisers:Epoch 16: Training cost (ce) is 0.478. Accuracy is 87.80%\n",
      "INFO:mlp.optimisers:Epoch 16: Validation cost (ce) is 0.488. Accuracy is 90.52%\n",
      "INFO:mlp.optimisers:Epoch 16: Took 9 seconds. Training speed 337 pps. Validation speed 1710 pps.\n",
      "INFO:mlp.optimisers:Epoch 17: Training cost (ce) is 0.461. Accuracy is 89.60%\n",
      "INFO:mlp.optimisers:Epoch 17: Validation cost (ce) is 0.736. Accuracy is 86.67%\n",
      "INFO:mlp.optimisers:Epoch 17: Took 9 seconds. Training speed 294 pps. Validation speed 1678 pps.\n",
      "INFO:mlp.optimisers:Epoch 18: Training cost (ce) is 0.440. Accuracy is 88.90%\n",
      "INFO:mlp.optimisers:Epoch 18: Validation cost (ce) is 0.618. Accuracy is 88.99%\n",
      "INFO:mlp.optimisers:Epoch 18: Took 10 seconds. Training speed 279 pps. Validation speed 1659 pps.\n",
      "INFO:mlp.optimisers:Epoch 19: Training cost (ce) is 0.599. Accuracy is 87.40%\n",
      "INFO:mlp.optimisers:Epoch 19: Validation cost (ce) is 0.487. Accuracy is 91.03%\n",
      "INFO:mlp.optimisers:Epoch 19: Took 10 seconds. Training speed 281 pps. Validation speed 1678 pps.\n",
      "INFO:mlp.optimisers:Epoch 20: Training cost (ce) is 0.447. Accuracy is 90.10%\n",
      "INFO:mlp.optimisers:Epoch 20: Validation cost (ce) is 0.574. Accuracy is 89.52%\n",
      "INFO:mlp.optimisers:Epoch 20: Took 10 seconds. Training speed 282 pps. Validation speed 1675 pps.\n",
      "INFO:mlp.optimisers:Epoch 21: Training cost (ce) is 0.579. Accuracy is 87.80%\n",
      "INFO:mlp.optimisers:Epoch 21: Validation cost (ce) is 0.550. Accuracy is 90.48%\n",
      "INFO:mlp.optimisers:Epoch 21: Took 9 seconds. Training speed 302 pps. Validation speed 1678 pps.\n",
      "INFO:mlp.optimisers:Epoch 22: Training cost (ce) is 0.461. Accuracy is 89.70%\n",
      "INFO:mlp.optimisers:Epoch 22: Validation cost (ce) is 0.597. Accuracy is 90.02%\n",
      "INFO:mlp.optimisers:Epoch 22: Took 9 seconds. Training speed 303 pps. Validation speed 1684 pps.\n",
      "INFO:mlp.optimisers:Epoch 23: Training cost (ce) is 0.519. Accuracy is 89.50%\n",
      "INFO:mlp.optimisers:Epoch 23: Validation cost (ce) is 0.645. Accuracy is 90.35%\n",
      "INFO:mlp.optimisers:Epoch 23: Took 10 seconds. Training speed 277 pps. Validation speed 1670 pps.\n",
      "INFO:mlp.optimisers:Epoch 24: Training cost (ce) is 0.439. Accuracy is 90.80%\n",
      "INFO:mlp.optimisers:Epoch 24: Validation cost (ce) is 0.634. Accuracy is 90.04%\n",
      "INFO:mlp.optimisers:Epoch 24: Took 10 seconds. Training speed 268 pps. Validation speed 1687 pps.\n",
      "INFO:mlp.optimisers:Epoch 25: Training cost (ce) is 0.365. Accuracy is 91.50%\n",
      "INFO:mlp.optimisers:Epoch 25: Validation cost (ce) is 0.564. Accuracy is 91.55%\n",
      "INFO:mlp.optimisers:Epoch 25: Took 9 seconds. Training speed 309 pps. Validation speed 1733 pps.\n",
      "INFO:mlp.optimisers:Epoch 26: Training cost (ce) is 0.391. Accuracy is 91.60%\n",
      "INFO:mlp.optimisers:Epoch 26: Validation cost (ce) is 0.660. Accuracy is 90.20%\n",
      "INFO:mlp.optimisers:Epoch 26: Took 9 seconds. Training speed 329 pps. Validation speed 1678 pps.\n",
      "INFO:mlp.optimisers:Epoch 27: Training cost (ce) is 0.412. Accuracy is 91.40%\n",
      "INFO:mlp.optimisers:Epoch 27: Validation cost (ce) is 0.614. Accuracy is 90.82%\n",
      "INFO:mlp.optimisers:Epoch 27: Took 9 seconds. Training speed 281 pps. Validation speed 1698 pps.\n",
      "INFO:mlp.optimisers:Epoch 28: Training cost (ce) is 0.470. Accuracy is 90.40%\n",
      "INFO:mlp.optimisers:Epoch 28: Validation cost (ce) is 0.593. Accuracy is 91.29%\n",
      "INFO:mlp.optimisers:Epoch 28: Took 10 seconds. Training speed 278 pps. Validation speed 1684 pps.\n",
      "INFO:mlp.optimisers:Epoch 29: Training cost (ce) is 0.443. Accuracy is 90.90%\n",
      "INFO:mlp.optimisers:Epoch 29: Validation cost (ce) is 0.623. Accuracy is 90.79%\n",
      "INFO:mlp.optimisers:Epoch 29: Took 9 seconds. Training speed 309 pps. Validation speed 1661 pps.\n",
      "INFO:mlp.optimisers:Epoch 30: Training cost (ce) is 0.359. Accuracy is 92.20%\n",
      "INFO:mlp.optimisers:Epoch 30: Validation cost (ce) is 0.614. Accuracy is 90.74%\n",
      "INFO:mlp.optimisers:Epoch 30: Took 9 seconds. Training speed 312 pps. Validation speed 1659 pps.\n",
      "INFO:mlp.optimisers:Epoch 31: Training cost (ce) is 0.345. Accuracy is 92.30%\n",
      "INFO:mlp.optimisers:Epoch 31: Validation cost (ce) is 0.698. Accuracy is 90.71%\n",
      "INFO:mlp.optimisers:Epoch 31: Took 9 seconds. Training speed 292 pps. Validation speed 1684 pps.\n",
      "INFO:mlp.optimisers:Epoch 32: Training cost (ce) is 0.443. Accuracy is 91.20%\n",
      "INFO:mlp.optimisers:Epoch 32: Validation cost (ce) is 0.590. Accuracy is 91.87%\n",
      "INFO:mlp.optimisers:Epoch 32: Took 9 seconds. Training speed 291 pps. Validation speed 1678 pps.\n",
      "INFO:mlp.optimisers:Epoch 33: Training cost (ce) is 0.557. Accuracy is 91.00%\n",
      "INFO:mlp.optimisers:Epoch 33: Validation cost (ce) is 0.624. Accuracy is 91.39%\n",
      "INFO:mlp.optimisers:Epoch 33: Took 10 seconds. Training speed 277 pps. Validation speed 1687 pps.\n",
      "INFO:mlp.optimisers:Epoch 34: Training cost (ce) is 0.451. Accuracy is 91.30%\n",
      "INFO:mlp.optimisers:Epoch 34: Validation cost (ce) is 0.687. Accuracy is 91.12%\n",
      "INFO:mlp.optimisers:Epoch 34: Took 9 seconds. Training speed 320 pps. Validation speed 1684 pps.\n",
      "INFO:mlp.optimisers:Epoch 35: Training cost (ce) is 0.456. Accuracy is 91.40%\n",
      "INFO:mlp.optimisers:Epoch 35: Validation cost (ce) is 0.723. Accuracy is 91.09%\n",
      "INFO:mlp.optimisers:Epoch 35: Took 9 seconds. Training speed 336 pps. Validation speed 1721 pps.\n",
      "INFO:mlp.optimisers:Epoch 36: Training cost (ce) is 0.379. Accuracy is 92.80%\n",
      "INFO:mlp.optimisers:Epoch 36: Validation cost (ce) is 0.753. Accuracy is 90.54%\n",
      "INFO:mlp.optimisers:Epoch 36: Took 9 seconds. Training speed 320 pps. Validation speed 1710 pps.\n",
      "INFO:mlp.optimisers:Epoch 37: Training cost (ce) is 0.387. Accuracy is 93.10%\n",
      "INFO:mlp.optimisers:Epoch 37: Validation cost (ce) is 0.721. Accuracy is 91.16%\n",
      "INFO:mlp.optimisers:Epoch 37: Took 9 seconds. Training speed 306 pps. Validation speed 1692 pps.\n",
      "INFO:mlp.optimisers:Epoch 38: Training cost (ce) is 0.489. Accuracy is 91.60%\n",
      "INFO:mlp.optimisers:Epoch 38: Validation cost (ce) is 0.818. Accuracy is 89.82%\n",
      "INFO:mlp.optimisers:Epoch 38: Took 9 seconds. Training speed 301 pps. Validation speed 1707 pps.\n",
      "INFO:mlp.optimisers:Epoch 39: Training cost (ce) is 0.510. Accuracy is 91.70%\n",
      "INFO:mlp.optimisers:Epoch 39: Validation cost (ce) is 0.690. Accuracy is 91.15%\n",
      "INFO:mlp.optimisers:Epoch 39: Took 9 seconds. Training speed 296 pps. Validation speed 1712 pps.\n",
      "INFO:mlp.optimisers:Epoch 40: Training cost (ce) is 0.560. Accuracy is 91.80%\n",
      "INFO:mlp.optimisers:Epoch 40: Validation cost (ce) is 0.729. Accuracy is 91.11%\n",
      "INFO:mlp.optimisers:Epoch 40: Took 9 seconds. Training speed 302 pps. Validation speed 1695 pps.\n",
      "INFO:mlp.optimisers:Epoch 41: Training cost (ce) is 0.484. Accuracy is 91.50%\n",
      "INFO:mlp.optimisers:Epoch 41: Validation cost (ce) is 0.629. Accuracy is 92.18%\n",
      "INFO:mlp.optimisers:Epoch 41: Took 9 seconds. Training speed 325 pps. Validation speed 1689 pps.\n",
      "INFO:mlp.optimisers:Epoch 42: Training cost (ce) is 0.327. Accuracy is 93.40%\n",
      "INFO:mlp.optimisers:Epoch 42: Validation cost (ce) is 0.723. Accuracy is 91.48%\n",
      "INFO:mlp.optimisers:Epoch 42: Took 9 seconds. Training speed 300 pps. Validation speed 1661 pps.\n",
      "INFO:mlp.optimisers:Epoch 43: Training cost (ce) is 0.358. Accuracy is 93.50%\n",
      "INFO:mlp.optimisers:Epoch 43: Validation cost (ce) is 0.665. Accuracy is 91.98%\n",
      "INFO:mlp.optimisers:Epoch 43: Took 9 seconds. Training speed 291 pps. Validation speed 1707 pps.\n",
      "INFO:mlp.optimisers:Epoch 44: Training cost (ce) is 0.441. Accuracy is 92.80%\n",
      "INFO:mlp.optimisers:Epoch 44: Validation cost (ce) is 0.846. Accuracy is 90.96%\n",
      "INFO:mlp.optimisers:Epoch 44: Took 9 seconds. Training speed 325 pps. Validation speed 1718 pps.\n",
      "INFO:mlp.optimisers:Epoch 45: Training cost (ce) is 0.526. Accuracy is 91.10%\n",
      "INFO:mlp.optimisers:Epoch 45: Validation cost (ce) is 0.674. Accuracy is 92.17%\n",
      "INFO:mlp.optimisers:Epoch 45: Took 9 seconds. Training speed 317 pps. Validation speed 1710 pps.\n",
      "INFO:mlp.optimisers:Epoch 46: Training cost (ce) is 0.407. Accuracy is 91.90%\n",
      "INFO:mlp.optimisers:Epoch 46: Validation cost (ce) is 0.819. Accuracy is 90.26%\n",
      "INFO:mlp.optimisers:Epoch 46: Took 9 seconds. Training speed 308 pps. Validation speed 1698 pps.\n",
      "INFO:mlp.optimisers:Epoch 47: Training cost (ce) is 0.482. Accuracy is 92.60%\n",
      "INFO:mlp.optimisers:Epoch 47: Validation cost (ce) is 0.752. Accuracy is 91.34%\n",
      "INFO:mlp.optimisers:Epoch 47: Took 9 seconds. Training speed 286 pps. Validation speed 1687 pps.\n",
      "INFO:mlp.optimisers:Epoch 48: Training cost (ce) is 0.405. Accuracy is 92.90%\n",
      "INFO:mlp.optimisers:Epoch 48: Validation cost (ce) is 0.787. Accuracy is 91.25%\n",
      "INFO:mlp.optimisers:Epoch 48: Took 10 seconds. Training speed 279 pps. Validation speed 1672 pps.\n",
      "INFO:mlp.optimisers:Epoch 49: Training cost (ce) is 0.597. Accuracy is 91.70%\n",
      "INFO:mlp.optimisers:Epoch 49: Validation cost (ce) is 0.794. Accuracy is 91.60%\n",
      "INFO:mlp.optimisers:Epoch 49: Took 9 seconds. Training speed 285 pps. Validation speed 1698 pps.\n",
      "INFO:mlp.optimisers:Epoch 50: Training cost (ce) is 0.472. Accuracy is 93.30%\n",
      "INFO:mlp.optimisers:Epoch 50: Validation cost (ce) is 0.918. Accuracy is 90.65%\n",
      "INFO:mlp.optimisers:Epoch 50: Took 9 seconds. Training speed 303 pps. Validation speed 1672 pps.\n",
      "INFO:root:Testing the model on test set:\n",
      "INFO:root:MNIST test set accuracy is 90.79 %, cost (ce) is 0.898\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The autoreload extension is already loaded. To reload it, use:\n",
      "  %reload_ext autoreload\n"
     ]
    }
   ],
   "source": [
    "%autoreload\n",
    "\n",
    "import numpy\n",
    "import logging\n",
    "\n",
    "from mlp.layers import MLP, Linear, Sigmoid, Softmax #import required layer types\n",
    "from mlp.optimisers import SGDOptimiser #import the optimiser\n",
    "from mlp.dataset import MNISTDataProvider #import data provider\n",
    "from mlp.costs import CECost #import the cost we want to use for optimisation\n",
    "from mlp.schedulers import LearningRateFixed, DropoutFixed\n",
    "\n",
    "logger = logging.getLogger()\n",
    "logger.setLevel(logging.INFO)\n",
    "rng = numpy.random.RandomState([2015,10,10])\n",
    "\n",
    "#some hyper-parameters\n",
    "nhid = 800\n",
    "learning_rate = 0.5\n",
    "max_epochs = 50\n",
    "l1_weight = 0.0\n",
    "l2_weight = 0.0\n",
    "cost = CECost()\n",
    "    \n",
    "stats = []\n",
    "layer = 1\n",
    "for i in xrange(1, 2):\n",
    "\n",
    "    train_dp.reset()\n",
    "    valid_dp.reset()\n",
    "    test_dp.reset()\n",
    "    \n",
    "    #define the model\n",
    "    model = MLP(cost=cost)\n",
    "    model.add_layer(Sigmoid(idim=784, odim=nhid, irange=0.2, rng=rng))\n",
    "    for i in xrange(1, layer):\n",
    "        logger.info(\"Stacking hidden layer (%s)\" % str(i+1))\n",
    "        model.add_layer(Sigmoid(idim=nhid, odim=nhid, irange=0.2, rng=rng))\n",
    "    model.add_layer(Softmax(idim=nhid, odim=10, rng=rng))\n",
    "\n",
    "    # define the optimiser, here stochasitc gradient descent\n",
    "    # with fixed learning rate and max_epochs\n",
    "    lr_scheduler = LearningRateFixed(learning_rate=learning_rate, max_epochs=max_epochs)\n",
    "    dp_scheduler = DropoutFixed(0.5, 0.5)\n",
    "    optimiser = SGDOptimiser(lr_scheduler=lr_scheduler, \n",
    "                             dp_scheduler=dp_scheduler,\n",
    "                             l1_weight=l1_weight, \n",
    "                             l2_weight=l2_weight)\n",
    "\n",
    "    logger.info('Training started...')\n",
    "    tr_stats, valid_stats = optimiser.train(model, train_dp, valid_dp)\n",
    "\n",
    "    logger.info('Testing the model on test set:')\n",
    "    tst_cost, tst_accuracy = optimiser.validate(model, test_dp)\n",
    "    logger.info('MNIST test set accuracy is %.2f %%, cost (%s) is %.3f'%(tst_accuracy*100., cost.get_name(), tst_cost))\n",
    "    \n",
    "    stats.append((tr_stats, valid_stats, (tst_cost, tst_accuracy)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}