Compare commits

...

22 Commits

Author SHA1 Message Date
Anton Lydike
46ca7c6dfd final changes 2024-11-22 09:26:24 +00:00
Anton Lydike
c29681b4ba changes 2024-11-19 17:04:58 +00:00
Anton Lydike
ae0e14b5fb add BN+RC layer 2024-11-19 10:38:54 +00:00
Anton Lydike
7861133463 don't plot bias layers 2024-11-19 10:10:02 +00:00
Anton Lydike
94d3a1d484 add runner for batch normalized version 2024-11-19 09:47:18 +00:00
Anton Lydike
cb5c6f4e19 formatting and BN 2024-11-19 09:42:31 +00:00
Anton Lydike
92fccb8eb2 add a bunch of extra files 2024-11-18 20:40:20 +00:00
Anton Lydike
05e53aacaf fix experiment_builder.py 2024-11-18 13:30:36 +00:00
tpmmthomas
58613aee35 Update cw2 debug 2024-11-11 22:41:17 +08:00
tpmmthomas
26364ec94e update cw2 2024-11-11 22:33:32 +08:00
Visual Computing (VICO) Group
98e232af70 Add missing files 2024-11-11 13:00:28 +00:00
Visual Computing (VICO) Group
a404c62b6f Rm cw1 figures 2024-11-11 11:46:48 +00:00
Visual Computing (VICO) Group
45a2df1b11 Update 2024-11-11 11:34:32 +00:00
Visual Computing (VICO) Group
be1f124dff Update 2024-11-11 09:57:57 +00:00
Visual Computing (VICO) Group
9b9a7d50fa Add missing data files 2024-10-14 11:01:45 +01:00
Visual Computing (VICO) Group
5d52a22448 Add missing files 2024-10-14 10:51:43 +01:00
Hakan Bilen
4657cca862 Update README.md 2024-10-14 10:10:17 +01:00
Hakan Bilen
6a17a30da1 Update README.md 2024-10-14 10:08:48 +01:00
Visual Computing (VICO) Group
2fda722e3d Minor update 2024-10-14 10:03:02 +01:00
Visual Computing (VICO) Group
6883eb77c2 Add cw1 2024-10-14 09:56:47 +01:00
tpmmthomas
207595b4a1 update lab 4 2024-10-10 21:52:23 +08:00
tpmmthomas
9f1f3ccd04 Update lab 3 2024-10-03 21:53:33 +08:00
91 changed files with 9924 additions and 1933 deletions

.gitignore vendored
View File

@@ -1,5 +1,6 @@
#dropbox stuff
*.dropbox*
.idea/*
# Byte-compiled / optimized / DLL files
__pycache__/
@@ -25,6 +26,7 @@ var/
*.egg-info/
.installed.cfg
*.egg
*.tar.gz
# PyInstaller
# Usually these files are written by a python script from a template
@@ -59,5 +61,29 @@ docs/_build/
# PyBuilder
target/
# Notebook stuff
# Pycharm
.idea/*
#Notebook stuff
notebooks/.ipynb_checkpoints/
#Google Cloud stuff
/google-cloud-sdk
.ipynb_checkpoints/
data/cifar-100-python/
data/MNIST/
solutions/
report/mlp-cw1-template.aux
report/mlp-cw1-template.out
report/mlp-cw1-template.pdf
report/mlp-cw1-template.synctex.gz
.DS_Store
report/mlp-cw2-template.aux
report/mlp-cw2-template.out
report/mlp-cw2-template.pdf
report/mlp-cw2-template.synctex.gz
report/mlp-cw2-template.bbl
report/mlp-cw2-template.blg
venv
saved_models

README.md
View File

@@ -6,19 +6,10 @@ This assignment-based course is focused on the implementation and evaluation of
The code in this repository is split into:
* a Python package `mlp`, a [NumPy](http://www.numpy.org/) based neural network package designed specifically for the course that students will implement parts of and extend during the course labs and assignments,
* a series of [Jupyter](http://jupyter.org/) notebooks in the `notebooks` directory containing explanatory material and coding exercises to be completed during the course labs.
## Remote working
If you are working remotely, follow this [guide](notes/remote-working-guide.md).
## Getting set up
Detailed instructions for setting up a development environment for the course are given in [this file](notes/environment-set-up.md). Students doing the course will spend part of the first lab getting their own environment set up.
## Exercises
If you are first time users of jupyter notebook, check out `notebooks/00_notebook.ipynb` to understand its features.
To get started with the exercises, go to the `notebooks` directory. For lab 1, work with the notebook starting with the prefix `01`, and so on.
## Coursework 2
This branch contains the Python code and LaTeX files of the second coursework. The code follows the same structure as the labs, in particular the mlp package, and a specific notebook is provided to help you run experiments.
* Detailed instructions are given in MLP2024_25_CW2_Spec.pdf (see Learn, Assessment, CW2).
* The [report directory](https://github.com/VICO-UoE/mlpractical/tree/mlp2024-25/coursework2/report) contains the LaTeX files that you will use to create your report.
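
Following the notes above, a minimal data-loading sketch (an editor's addition, not part of the coursework spec). It assumes the `mlp` package is importable and that the `MLP_DATA_DIR` environment variable points at the directory holding the `emnist-*.npz` files added in this changeset:

```python
# Minimal sketch: fetch one batch from the EMNIST provider added in this changeset.
import os
import numpy as np

os.environ.setdefault('MLP_DATA_DIR', 'data')  # adjust to your local data directory

from mlp.data_providers import EMNISTDataProvider

train_data = EMNISTDataProvider('train', batch_size=100,
                                rng=np.random.RandomState(123456))
inputs_batch, targets_batch = next(iter(train_data))
print(inputs_batch.shape)   # (100, 28, 28, 1) with flatten=False (the default)
print(targets_batch.shape)  # (100, 47) one-of-K coded targets
```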

View File

@@ -0,0 +1,102 @@
train_acc,train_loss,val_acc,val_loss
0.010694736842105264,4.827323,0.024800000000000003,4.5659676
0.03562105263157895,4.3888855,0.0604,4.136276
0.0757684210526316,3.998175,0.09480000000000001,3.8678854
0.10734736842105265,3.784943,0.12159999999999999,3.6687074
0.13741052631578948,3.6023798,0.15439999999999998,3.4829779
0.16888421052631578,3.4196754,0.1864,3.3093607
0.1941263157894737,3.2674048,0.20720000000000002,3.2223148
0.21861052631578948,3.139925,0.22880000000000003,3.1171055
0.24134736842105264,3.0145736,0.24760000000000001,3.0554724
0.26399999999999996,2.9004965,0.2552,2.9390912
0.27898947368421056,2.815607,0.2764,2.9205213
0.29532631578947366,2.7256868,0.2968,2.7410471
0.31138947368421044,2.6567938,0.3016,2.7083752
0.3236842105263158,2.595405,0.322,2.665904
0.33486315789473686,2.5434496,0.3176,2.688214
0.3462526315789474,2.5021079,0.33159999999999995,2.648656
0.35381052631578946,2.4609485,0.342,2.5658453
0.36157894736842106,2.4152951,0.34119999999999995,2.5403407
0.36774736842105266,2.382958,0.3332,2.6936982
0.37753684210526317,2.3510027,0.36160000000000003,2.4663532
0.38597894736842114,2.319616,0.3608,2.4559999
0.3912421052631579,2.294115,0.3732,2.3644555
0.39840000000000003,2.2598042,0.3716,2.4516551
0.4036,2.2318766,0.37439999999999996,2.4189563
0.4105263157894737,2.2035582,0.3772,2.3899698
0.41501052631578944,2.1830406,0.3876,2.3215945
0.4193263157894737,2.158597,0.37800000000000006,2.3831298
0.4211578947368421,2.148888,0.38160000000000005,2.3436418
0.4260842105263159,2.1250536,0.39840000000000003,2.3471045
0.4313684210526315,2.107519,0.4044,2.2744477
0.4370526315789474,2.0837262,0.398,2.245617
0.439642105263158,2.0691078,0.41200000000000003,2.216309
0.4440842105263158,2.046351,0.4096,2.2329648
0.44696842105263157,2.0330904,0.4104,2.1841388
0.4518105263157895,2.0200553,0.4244,2.1780539
0.45298947368421055,2.0069249,0.42719999999999997,2.1625984
0.4602105263157895,1.9896894,0.4204,2.2195568
0.46023157894736844,1.9788533,0.4244,2.1803434
0.46101052631578954,1.9693571,0.4128,2.1858895
0.46774736842105263,1.9547894,0.4204,2.1908271
0.4671157894736842,1.9390026,0.4244,2.1841395
0.4698105263157895,1.924038,0.424,2.1843896
0.4738736842105264,1.9161719,0.43,2.154806
0.47541052631578945,1.9033127,0.4463999999999999,2.1130056
0.48,1.8961077,0.44439999999999996,2.113019
0.48456842105263154,1.8838875,0.43079999999999996,2.1191697
0.4857263157894737,1.8711865,0.44920000000000004,2.1213412
0.4887578947368421,1.8590263,0.44799999999999995,2.1077166
0.49035789473684216,1.8479114,0.4428,2.0737479
0.4908421052631579,1.845268,0.4436,2.07655
0.4939368421052632,1.8336699,0.4548,2.0769904
0.49924210526315793,1.8237538,0.4548,2.061769
0.49677894736842104,1.8111013,0.44240000000000007,2.0676718
0.5008842105263157,1.8031327,0.4548,2.0859065
0.5,1.8026625,0.458,2.0704215
0.5030736842105263,1.792004,0.4596,2.1113508
0.505578947368421,1.7810374,0.45679999999999993,2.0382714
0.5090315789473684,1.7691813,0.4444000000000001,2.0911386
0.512042105263158,1.7633294,0.4616,2.0458508
0.5142736842105263,1.7549652,0.4464,2.0786576
0.5128421052631579,1.7518128,0.4656,2.026332
0.518042105263158,1.7420768,0.46,2.0141299
0.5182315789473684,1.7321203,0.45960000000000006,2.0226884
0.5192842105263158,1.7264535,0.46279999999999993,2.0182638
0.5217894736842105,1.7245325,0.46399999999999997,2.0110855
0.5229684210526316,1.7184331,0.46679999999999994,2.0191038
0.5227578947368421,1.7116771,0.4604,2.0334535
0.5245894736842105,1.7009526,0.4692,2.0072439
0.5262315789473684,1.6991171,0.4700000000000001,2.0296187
0.5278526315789474,1.6958193,0.4708,1.9912667
0.527157894736842,1.6907407,0.4736,2.006095
0.5299578947368421,1.6808176,0.4715999999999999,2.012164
0.5313052631578947,1.676356,0.47239999999999993,1.9955354
0.5338315789473685,1.6731659,0.47839999999999994,2.005768
0.5336000000000001,1.662152,0.4672,2.015392
0.5354736842105263,1.6638054,0.4692,1.9890119
0.5397894736842105,1.6575475,0.4768,2.0090258
0.5386526315789474,1.6595734,0.4824,1.9728817
0.5376631578947368,1.6536722,0.4816,1.9769167
0.5384842105263159,1.6495628,0.47600000000000003,1.9980135
0.5380842105263157,1.6488388,0.478,1.9884782
0.5393473684210528,1.6408547,0.48,1.9772192
0.5415157894736843,1.632917,0.4828,1.9732709
0.5394947368421052,1.6340653,0.4776,1.9623082
0.5429052631578948,1.6340532,0.47759999999999997,1.9812362
0.5452421052631579,1.6246406,0.48119999999999996,1.9846246
0.5436210526315789,1.6288266,0.4864,1.9822198
0.5437684210526316,1.6240481,0.48279999999999995,1.9768158
0.546357894736842,1.6208181,0.4804,1.9625885
0.5485052631578946,1.6164333,0.47839999999999994,1.9738724
0.5466736842105263,1.6169226,0.47800000000000004,1.9842362
0.547621052631579,1.6159856,0.4828,1.9709526
0.5480421052631579,1.6175526,0.48560000000000003,1.967775
0.5468421052631579,1.6149833,0.48119999999999996,1.9626708
0.5493894736842105,1.6063902,0.4835999999999999,1.96621
0.5490736842105263,1.6096952,0.48120000000000007,1.9742922
0.5514736842105264,1.6084315,0.4867999999999999,1.9604725
0.5489263157894737,1.6069487,0.4831999999999999,1.9733659
0.5494947368421053,1.6030664,0.49079999999999996,1.9693874
0.5516842105263158,1.6043342,0.486,1.9647765
0.552442105263158,1.6039867,0.48480000000000006,1.9649359
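
The file above is a plain CSV of per-epoch metrics. A hedged sketch of plotting it follows; the path `result_outputs/summary.csv` is hypothetical, and only the four column names are taken from the file itself:

```python
# Sketch: plot the train/validation curves from a metrics CSV like the one above.
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical path; replace with wherever the experiment wrote its CSV.
metrics = np.genfromtxt('result_outputs/summary.csv', delimiter=',', names=True)
epochs = np.arange(1, len(metrics) + 1)

fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
ax_loss.plot(epochs, metrics['train_loss'], label='train')
ax_loss.plot(epochs, metrics['val_loss'], label='valid')
ax_loss.set_xlabel('epoch')
ax_loss.set_ylabel('loss')
ax_loss.legend()
ax_acc.plot(epochs, metrics['train_acc'], label='train')
ax_acc.plot(epochs, metrics['val_acc'], label='valid')
ax_acc.set_xlabel('epoch')
ax_acc.set_ylabel('accuracy')
ax_acc.legend()
fig.tight_layout()
plt.show()
```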

View File

@@ -0,0 +1,2 @@
test_acc,test_loss
0.49950000000000006,1.9105633

View File

@@ -0,0 +1,101 @@
train_acc,train_loss,val_acc,val_loss
0.009263157894736843,4.8649125,0.0104,4.630689
0.009810526315789474,4.6264124,0.009600000000000001,4.618983
0.009705263157894738,4.621914,0.011200000000000002,4.6184525
0.008989473684210525,4.619472,0.0064,4.6164784
0.009747368421052633,4.6168556,0.0076,4.6138463
0.00951578947368421,4.6156826,0.0108,4.6139345
0.009789473684210525,4.614809,0.008400000000000001,4.6116896
0.009936842105263159,4.613147,0.0104,4.6148276
0.009810526315789474,4.612325,0.0076,4.6123877
0.009094736842105263,4.6117926,0.007200000000000001,4.6149993
0.008421052631578947,4.611283,0.011600000000000001,4.6114736
0.009010526315789472,4.6105323,0.009600000000000001,4.607559
0.009894736842105263,4.6103206,0.008400000000000001,4.6086206
0.00934736842105263,4.6095214,0.011200000000000002,4.6091933
0.009473684210526316,4.6095295,0.008,4.6095695
0.010252631578947369,4.609189,0.0104,4.610459
0.009536842105263158,4.6087623,0.0092,4.6091356
0.00848421052631579,4.6086617,0.009600000000000001,4.609126
0.008421052631578947,4.6083455,0.011200000000000002,4.6088147
0.009410526315789473,4.608145,0.0068000000000000005,4.608519
0.009263157894736843,4.6078997,0.0092,4.6085033
0.009389473684210526,4.607453,0.01,4.6083508
0.008989473684210528,4.6075597,0.008400000000000001,4.6073136
0.009326315789473686,4.607266,0.008,4.6069093
0.01,4.607154,0.0076,4.6069508
0.008778947368421053,4.607089,0.011200000000000002,4.60659
0.009326315789473684,4.606807,0.0068,4.6072598
0.009031578947368422,4.6068263,0.011200000000000002,4.607257
0.008842105263157896,4.6066294,0.008,4.606883
0.008968421052631579,4.606647,0.006400000000000001,4.607275
0.008947368421052631,4.6065364,0.0092,4.606976
0.008842105263157896,4.6064167,0.0076,4.607016
0.008799999999999999,4.606425,0.0096,4.607184
0.009326315789473686,4.606305,0.0072,4.6068683
0.00905263157894737,4.606274,0.0072,4.606982
0.00934736842105263,4.6062336,0.007200000000000001,4.607209
0.009221052631578948,4.606221,0.0076,4.607369
0.009557894736842105,4.60607,0.0076,4.6074376
0.009073684210526317,4.6061006,0.0072,4.607068
0.009242105263157895,4.606005,0.0064,4.6067224
0.009957894736842107,4.605986,0.0072,4.6068263
0.009052631578947368,4.605935,0.0072,4.6067867
0.008694736842105264,4.6059127,0.0064,4.6070905
0.009536842105263158,4.605874,0.006400000000000001,4.606976
0.009663157894736842,4.605872,0.0072,4.6068897
0.008821052631578948,4.6057997,0.0064,4.607028
0.009768421052631579,4.605778,0.0072,4.6069264
0.0092,4.6057644,0.007200000000000001,4.607018
0.008926315789473685,4.6057386,0.0072,4.60698
0.008989473684210525,4.6057277,0.0064,4.6070237
0.009242105263157895,4.6057053,0.0064,4.6069183
0.009094736842105263,4.605692,0.006400000000000001,4.6068764
0.009473684210526316,4.60566,0.0064,4.606909
0.009494736842105262,4.605613,0.0064,4.606978
0.009747368421052631,4.6056285,0.0064,4.606753
0.009789473684210527,4.605578,0.006400000000000001,4.6068797
0.009199999999999998,4.6055675,0.0064,4.606888
0.009073684210526317,4.6055593,0.0064,4.606874
0.008821052631578948,4.6055293,0.006400000000000001,4.606851
0.009326315789473684,4.6055255,0.0064,4.606871
0.009557894736842105,4.6055083,0.006400000000000001,4.606851
0.009600000000000001,4.605491,0.0064,4.6068635
0.00856842105263158,4.605466,0.0064,4.606862
0.009894736842105263,4.605463,0.006400000000000001,4.6068873
0.009494736842105262,4.605441,0.0064,4.6068926
0.008673684210526314,4.6054277,0.0064,4.6068554
0.009221052631578948,4.6054296,0.0063999999999999994,4.6068907
0.008989473684210528,4.605404,0.0064,4.6068807
0.00928421052631579,4.6053905,0.006400000000000001,4.6068707
0.0092,4.6053743,0.0064,4.606894
0.008989473684210525,4.605368,0.0064,4.606845
0.009515789473684212,4.605355,0.0064,4.6068635
0.009073684210526317,4.605352,0.0064,4.6068773
0.009642105263157895,4.6053243,0.0064,4.606883
0.009747368421052633,4.6053176,0.0064,4.6069
0.009873684210526316,4.6053023,0.0064,4.6068873
0.009536842105263156,4.605297,0.0064,4.6068654
0.009515789473684212,4.6052866,0.0064,4.6068883
0.009978947368421053,4.605265,0.006400000000000001,4.606894
0.009957894736842107,4.605259,0.0064,4.6068826
0.009410526315789475,4.6052504,0.0064,4.6068697
0.01002105263157895,4.6052403,0.006400000000000001,4.6068807
0.01002105263157895,4.6052313,0.0064,4.606872
0.00951578947368421,4.605224,0.0064,4.6068883
0.009852631578947368,4.605219,0.006400000000000001,4.606871
0.009894736842105265,4.605209,0.0064,4.606871
0.00922105263157895,4.605204,0.0064,4.6068654
0.010042105263157896,4.605193,0.0064,4.6068764
0.009978947368421053,4.6051874,0.006400000000000001,4.6068697
0.009747368421052633,4.605183,0.0064,4.6068673
0.010189473684210526,4.605178,0.0064,4.606873
0.009789473684210527,4.605173,0.0064,4.6068773
0.009936842105263159,4.605169,0.0064,4.606874
0.010042105263157894,4.605166,0.0064,4.606877
0.009494736842105262,4.6051593,0.0064,4.606874
0.009536842105263158,4.6051593,0.0063999999999999994,4.606874
0.010021052631578946,4.6051564,0.006400000000000001,4.6068716
0.009747368421052631,4.605154,0.0064,4.6068726
0.009642105263157895,4.605153,0.0064,4.606872
0.009305263157894737,4.6051517,0.0064,4.6068726

View File

@@ -0,0 +1,2 @@
test_acc,test_loss
0.01,4.608619

Binary file not shown.

Binary file not shown.

BIN
data/emnist-test.npz Normal file

Binary file not shown.

BIN
data/emnist-train.npz Normal file

Binary file not shown.

BIN
data/emnist-valid.npz Normal file

Binary file not shown.

Binary file not shown.

Binary file not shown.

mlp/__init__.py
View File

@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
"""Machine Learning Practical package."""
__authors__ = ['Pawel Swietojanski', 'Steve Renals', 'Matt Graham']
__authors__ = ['Pawel Swietojanski', 'Steve Renals', 'Matt Graham', 'Antreas Antoniou']
DEFAULT_SEED = 123456 # Default random number generator seed if none provided.

mlp/data_providers.py
View File

@@ -7,8 +7,17 @@ data points.
import pickle
import gzip
import sys
import numpy as np
import os
from PIL import Image
from torch.utils import data
from torch.utils.data import Dataset
from torchvision import transforms
from torchvision.datasets.utils import download_url, check_integrity
from mlp import DEFAULT_SEED
@@ -35,23 +44,54 @@ class DataProvider(object):
"""
self.inputs = inputs
self.targets = targets
self.batch_size = batch_size
assert max_num_batches != 0 and not max_num_batches < -1, (
'max_num_batches should be -1 or > 0')
self.max_num_batches = max_num_batches
if batch_size < 1:
raise ValueError('batch_size must be >= 1')
self._batch_size = batch_size
if max_num_batches == 0 or max_num_batches < -1:
raise ValueError('max_num_batches must be -1 or > 0')
self._max_num_batches = max_num_batches
self._update_num_batches()
self.shuffle_order = shuffle_order
self._current_order = np.arange(inputs.shape[0])
if rng is None:
rng = np.random.RandomState(DEFAULT_SEED)
self.rng = rng
self.new_epoch()
@property
def batch_size(self):
"""Number of data points to include in each batch."""
return self._batch_size
@batch_size.setter
def batch_size(self, value):
if value < 1:
raise ValueError('batch_size must be >= 1')
self._batch_size = value
self._update_num_batches()
@property
def max_num_batches(self):
"""Maximum number of batches to iterate over in an epoch."""
return self._max_num_batches
@max_num_batches.setter
def max_num_batches(self, value):
if value == 0 or value < -1:
raise ValueError('max_num_batches must be -1 or > 0')
self._max_num_batches = value
self._update_num_batches()
def _update_num_batches(self):
"""Updates number of batches to iterate over."""
# maximum possible number of batches is equal to number of whole times
# batch_size divides in to the number of data points which can be
# found using integer division
possible_num_batches = self.inputs.shape[0] // batch_size
possible_num_batches = self.inputs.shape[0] // self.batch_size
if self.max_num_batches == -1:
self.num_batches = possible_num_batches
else:
self.num_batches = min(self.max_num_batches, possible_num_batches)
self.shuffle_order = shuffle_order
if rng is None:
rng = np.random.RandomState(DEFAULT_SEED)
self.rng = rng
self.reset()
def __iter__(self):
"""Implements Python iterator interface.
@@ -63,27 +103,36 @@ class DataProvider(object):
"""
return self
def reset(self):
"""Resets the provider to the initial state to use in a new epoch."""
def new_epoch(self):
"""Starts a new epoch (pass through data), possibly shuffling first."""
self._curr_batch = 0
if self.shuffle_order:
self.shuffle()
def shuffle(self):
"""Randomly shuffles order of data."""
new_order = self.rng.permutation(self.inputs.shape[0])
self.inputs = self.inputs[new_order]
self.targets = self.targets[new_order]
def __next__(self):
return self.next()
def reset(self):
"""Resets the provider to the initial state."""
inv_perm = np.argsort(self._current_order)
self._current_order = self._current_order[inv_perm]
self.inputs = self.inputs[inv_perm]
self.targets = self.targets[inv_perm]
self.new_epoch()
def shuffle(self):
"""Randomly shuffles order of data."""
perm = self.rng.permutation(self.inputs.shape[0])
self._current_order = self._current_order[perm]
self.inputs = self.inputs[perm]
self.targets = self.targets[perm]
def next(self):
"""Returns next data batch or raises `StopIteration` if at end."""
if self._curr_batch + 1 > self.num_batches:
# no more batches in current iteration through data set so reset
# the dataset for another pass and indicate iteration is at end
self.reset()
# no more batches in current iteration through data set so start
# new epoch ready for another pass and indicate iteration is at end
self.new_epoch()
raise StopIteration()
# create an index slice corresponding to current batch number
batch_slice = slice(self._curr_batch * self.batch_size,
@@ -93,7 +142,6 @@ class DataProvider(object):
self._curr_batch += 1
return inputs_batch, targets_batch
class MNISTDataProvider(DataProvider):
"""Data provider for MNIST handwritten digit images."""
@@ -114,7 +162,7 @@ class MNISTDataProvider(DataProvider):
rng (RandomState): A seeded random number generator.
"""
# check a valid which_set was provided
assert which_set in ['train', 'valid', 'eval'], (
assert which_set in ['train', 'valid', 'test'], (
'Expected which_set to be either train, valid or test. '
'Got {0}'.format(which_set)
)
@@ -160,6 +208,78 @@ class MNISTDataProvider(DataProvider):
one_of_k_targets[range(int_targets.shape[0]), int_targets] = 1
return one_of_k_targets
class EMNISTDataProvider(DataProvider):
"""Data provider for EMNIST handwritten digit images."""
def __init__(self, which_set='train', batch_size=100, max_num_batches=-1,
shuffle_order=True, rng=None, flatten=False):
"""Create a new EMNIST data provider object.
Args:
which_set: One of 'train', 'valid' or 'test'. Determines which
portion of the EMNIST data this object should provide.
batch_size (int): Number of data points to include in each batch.
max_num_batches (int): Maximum number of batches to iterate over
in an epoch. If `max_num_batches * batch_size > num_data` then
only as many batches as the data can be split into will be
used. If set to -1 all of the data will be used.
shuffle_order (bool): Whether to randomly permute the order of
the data before each epoch.
rng (RandomState): A seeded random number generator.
"""
# check a valid which_set was provided
assert which_set in ['train', 'valid', 'test'], (
'Expected which_set to be either train, valid or test. '
'Got {0}'.format(which_set)
)
self.which_set = which_set
self.num_classes = 47
# construct path to data using os.path.join to ensure the correct path
# separator for the current platform / OS is used
# MLP_DATA_DIR environment variable should point to the data directory
data_path = os.path.join(
os.environ['MLP_DATA_DIR'], 'emnist-{0}.npz'.format(which_set))
assert os.path.isfile(data_path), (
'Data file does not exist at expected path: ' + data_path
)
# load data from compressed numpy file
loaded = np.load(data_path)
print(loaded.keys())
inputs, targets = loaded['inputs'], loaded['targets']
inputs = inputs.astype(np.float32)
targets = targets.astype(int)  # np.int was removed in recent NumPy releases
if flatten:
inputs = np.reshape(inputs, newshape=(-1, 28*28))
else:
inputs = np.reshape(inputs, newshape=(-1, 28, 28, 1))
inputs = inputs / 255.0
# pass the loaded data to the parent class __init__
super(EMNISTDataProvider, self).__init__(
inputs, targets, batch_size, max_num_batches, shuffle_order, rng)
def next(self):
"""Returns next data batch or raises `StopIteration` if at end."""
inputs_batch, targets_batch = super(EMNISTDataProvider, self).next()
return inputs_batch, self.to_one_of_k(targets_batch)
def to_one_of_k(self, int_targets):
"""Converts integer coded class target to 1 of K coded targets.
Args:
int_targets (ndarray): Array of integer coded class targets (i.e.
where an integer from 0 to `num_classes` - 1 is used to
indicate which is the correct class). This should be of shape
(num_data,).
Returns:
Array of 1 of K coded targets i.e. an array of shape
(num_data, num_classes) where for each row all elements are equal
to zero except for the column corresponding to the correct class
which is equal to one.
"""
one_of_k_targets = np.zeros((int_targets.shape[0], self.num_classes))
one_of_k_targets[range(int_targets.shape[0]), int_targets] = 1
return one_of_k_targets
class MetOfficeDataProvider(DataProvider):
"""South Scotland Met Office weather data provider."""
@@ -253,3 +373,374 @@ class CCPPDataProvider(DataProvider):
targets = loaded[which_set + '_targets']
super(CCPPDataProvider, self).__init__(
inputs, targets, batch_size, max_num_batches, shuffle_order, rng)
class EMNISTPytorchDataProvider(Dataset):
def __init__(self, which_set='train', batch_size=100, max_num_batches=-1,
shuffle_order=True, rng=None, flatten=False, transforms=None):
self.numpy_data_provider = EMNISTDataProvider(which_set=which_set, batch_size=batch_size, max_num_batches=max_num_batches,
shuffle_order=shuffle_order, rng=rng, flatten=flatten)
self.transforms = transforms
def __getitem__(self, item):
x = self.numpy_data_provider.inputs[item]
for augmentation in self.transforms:
x = augmentation(x)
return x, int(self.numpy_data_provider.targets[item])
def __len__(self):
return len(self.numpy_data_provider.targets)
class AugmentedMNISTDataProvider(MNISTDataProvider):
"""Data provider for MNIST dataset which randomly transforms images."""
def __init__(self, which_set='train', batch_size=100, max_num_batches=-1,
shuffle_order=True, rng=None, transformer=None):
"""Create a new augmented MNIST data provider object.
Args:
which_set: One of 'train', 'valid' or 'test'. Determines which
portion of the MNIST data this object should provide.
batch_size (int): Number of data points to include in each batch.
max_num_batches (int): Maximum number of batches to iterate over
in an epoch. If `max_num_batches * batch_size > num_data` then
only as many batches as the data can be split into will be
used. If set to -1 all of the data will be used.
shuffle_order (bool): Whether to randomly permute the order of
the data before each epoch.
rng (RandomState): A seeded random number generator.
transformer: Function which takes an `inputs` array of shape
(batch_size, input_dim) corresponding to a batch of input
images and a `rng` random number generator object (i.e. a
call signature `transformer(inputs, rng)`) and applies a
potentially random set of transformations to some / all of the
input images as each new batch is returned when iterating over
the data provider.
"""
super(AugmentedMNISTDataProvider, self).__init__(
which_set, batch_size, max_num_batches, shuffle_order, rng)
self.transformer = transformer
def next(self):
"""Returns next data batch or raises `StopIteration` if at end."""
inputs_batch, targets_batch = super(
AugmentedMNISTDataProvider, self).next()
transformed_inputs_batch = self.transformer(inputs_batch, self.rng)
return transformed_inputs_batch, targets_batch
class Omniglot(data.Dataset):
"""Omniglot handwritten characters dataset.
Args:
root (string): Root directory of dataset where the ``omniglot_dataset``
directory exists or will be saved to if download is set to True.
set_name (string): One of 'train', 'val' or 'test'. Determines which
split of the data this object provides.
transform (callable, optional): A function/transform that takes in a PIL image
and returns a transformed version, e.g. ``transforms.RandomCrop``
target_transform (callable, optional): A function/transform that takes in the
target and transforms it.
download (bool, optional): If true, downloads the dataset from the internet and
puts it in root directory. If dataset is already downloaded, it is not
downloaded again.
"""
def collect_data_paths(self, root):
data_dict = dict()
print(root)
for subdir, dir, files in os.walk(root):
for file in files:
if file.endswith('.png'):
filepath = os.path.join(subdir, file)
class_label = '_'.join(subdir.split("/")[-2:])
if class_label in data_dict:
data_dict[class_label].append(filepath)
else:
data_dict[class_label] = [filepath]
return data_dict
def __init__(self, root, set_name,
transform=None, target_transform=None,
download=False):
self.root = os.path.expanduser(root)
self.root = os.path.abspath(os.path.join(self.root, 'omniglot_dataset'))
self.transform = transform
self.target_transform = target_transform
self.set_name = set_name # training set or test set
self.data_dict = self.collect_data_paths(root=self.root)
x = []
label_to_idx = {label: idx for idx, label in enumerate(self.data_dict.keys())}
y = []
for key, value in self.data_dict.items():
x.extend(value)
y.extend(len(value) * [label_to_idx[key]])
y = np.array(y)
rng = np.random.RandomState(seed=0)
idx = np.arange(len(x))
rng.shuffle(idx)
x = [x[current_idx] for current_idx in idx]
y = y[idx]
train_sample_idx = rng.choice(a=[i for i in range(len(x))], size=int(len(x) * 0.80), replace=False)
evaluation_sample_idx = [i for i in range(len(x)) if i not in train_sample_idx]
# split the held-out indices into validation and test portions
validation_sample_idx = rng.choice(a=evaluation_sample_idx, size=int(len(evaluation_sample_idx) * 0.40), replace=False)
test_sample_idx = [i for i in evaluation_sample_idx if i not in validation_sample_idx]
if self.set_name=='train':
self.data = [item for idx, item in enumerate(x) if idx in train_sample_idx]
self.labels = y[train_sample_idx]
elif self.set_name=='val':
self.data = [item for idx, item in enumerate(x) if idx in validation_sample_idx]
self.labels = y[validation_sample_idx]
else:
self.data = [item for idx, item in enumerate(x) if idx in test_sample_idx]
self.labels = y[test_sample_idx]
def __getitem__(self, index):
"""
Args:
index (int): Index
Returns:
tuple: (image, target) where target is index of the target class.
"""
img, target = self.data[index], self.labels[index]
img = Image.open(img)
if self.transform is not None:
img = self.transform(img)
if self.target_transform is not None:
target = self.target_transform(target)
return img, target
def __len__(self):
return len(self.data)
def __repr__(self):
fmt_str = 'Dataset ' + self.__class__.__name__ + '\n'
fmt_str += ' Number of datapoints: {}\n'.format(self.__len__())
tmp = self.set_name
fmt_str += ' Split: {}\n'.format(tmp)
fmt_str += ' Root Location: {}\n'.format(self.root)
tmp = ' Transforms (if any): '
fmt_str += '{0}{1}\n'.format(tmp, self.transform.__repr__().replace('\n', '\n' + ' ' * len(tmp)))
tmp = ' Target Transforms (if any): '
fmt_str += '{0}{1}'.format(tmp, self.target_transform.__repr__().replace('\n', '\n' + ' ' * len(tmp)))
return fmt_str
class CIFAR10(data.Dataset):
"""`CIFAR10 <https://www.cs.toronto.edu/~kriz/cifar.html>`_ Dataset.
Args:
root (string): Root directory of dataset where directory
``cifar-10-batches-py`` exists or will be saved to if download is set to True.
set_name (string): One of 'train', 'val' or 'test'. Determines which
split of the data this object provides.
transform (callable, optional): A function/transform that takes in a PIL image
and returns a transformed version, e.g. ``transforms.RandomCrop``
target_transform (callable, optional): A function/transform that takes in the
target and transforms it.
download (bool, optional): If true, downloads the dataset from the internet and
puts it in root directory. If dataset is already downloaded, it is not
downloaded again.
"""
base_folder = 'cifar-10-batches-py'
url = "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
filename = "cifar-10-python.tar.gz"
tgz_md5 = 'c58f30108f718f92721af3b95e74349a'
train_list = [
['data_batch_1', 'c99cafc152244af753f735de768cd75f'],
['data_batch_2', 'd4bba439e000b95fd0a9bffe97cbabec'],
['data_batch_3', '54ebc095f3ab1f0389bbae665268c751'],
['data_batch_4', '634d18415352ddfa80567beed471001a'],
['data_batch_5', '482c414d41f54cd18b22e5b47cb7c3cb'],
]
test_list = [
['test_batch', '40351d587109b95175f43aff81a1287e'],
]
def __init__(self, root, set_name,
transform=None, target_transform=None,
download=False):
self.root = os.path.expanduser(root)
self.transform = transform
self.target_transform = target_transform
self.set_name = set_name # training set or test set
if download:
self.download()
if not self._check_integrity():
raise RuntimeError('Dataset not found or corrupted.' +
' You can use download=True to download it')
# now load the pickled numpy arrays
rng = np.random.RandomState(seed=0)
train_sample_idx = rng.choice(a=[i for i in range(50000)], size=47500, replace=False)
val_sample_idx = [i for i in range(50000) if i not in train_sample_idx]
if self.set_name=='train':
self.data = []
self.labels = []
for fentry in self.train_list:
f = fentry[0]
file = os.path.join(self.root, self.base_folder, f)
fo = open(file, 'rb')
if sys.version_info[0] == 2:
entry = pickle.load(fo)
else:
entry = pickle.load(fo, encoding='latin1')
self.data.append(entry['data'])
if 'labels' in entry:
self.labels += entry['labels']
else:
self.labels += entry['fine_labels']
fo.close()
self.data = np.concatenate(self.data)
self.data = self.data.reshape((50000, 3, 32, 32))
self.data = self.data.transpose((0, 2, 3, 1)) # convert to HWC
self.data = self.data[train_sample_idx]
self.labels = np.array(self.labels)[train_sample_idx]
print(set_name, self.data.shape)
print(set_name, self.labels.shape)
elif self.set_name=='val':
self.data = []
self.labels = []
for fentry in self.train_list:
f = fentry[0]
file = os.path.join(self.root, self.base_folder, f)
fo = open(file, 'rb')
if sys.version_info[0] == 2:
entry = pickle.load(fo)
else:
entry = pickle.load(fo, encoding='latin1')
self.data.append(entry['data'])
if 'labels' in entry:
self.labels += entry['labels']
else:
self.labels += entry['fine_labels']
fo.close()
self.data = np.concatenate(self.data)
self.data = self.data.reshape((50000, 3, 32, 32))
self.data = self.data.transpose((0, 2, 3, 1)) # convert to HWC
self.data = self.data[val_sample_idx]
self.labels = np.array(self.labels)[val_sample_idx]
print(set_name, self.data.shape)
print(set_name, self.labels.shape)
else:
f = self.test_list[0][0]
file = os.path.join(self.root, self.base_folder, f)
fo = open(file, 'rb')
if sys.version_info[0] == 2:
entry = pickle.load(fo)
else:
entry = pickle.load(fo, encoding='latin1')
self.data = entry['data']
if 'labels' in entry:
self.labels = entry['labels']
else:
self.labels = entry['fine_labels']
fo.close()
self.data = self.data.reshape((10000, 3, 32, 32))
self.data = self.data.transpose((0, 2, 3, 1)) # convert to HWC
self.labels = np.array(self.labels)
print(set_name, self.data.shape)
print(set_name, self.labels.shape)
def __getitem__(self, index):
"""
Args:
index (int): Index
Returns:
tuple: (image, target) where target is index of the target class.
"""
img, target = self.data[index], self.labels[index]
# doing this so that it is consistent with all other datasets
# to return a PIL Image
img = Image.fromarray(img)
if self.transform is not None:
img = self.transform(img)
if self.target_transform is not None:
target = self.target_transform(target)
return img, target
def __len__(self):
return len(self.data)
def _check_integrity(self):
root = self.root
for fentry in (self.train_list + self.test_list):
filename, md5 = fentry[0], fentry[1]
fpath = os.path.join(root, self.base_folder, filename)
if not check_integrity(fpath, md5):
return False
return True
def download(self):
import tarfile
if self._check_integrity():
print('Files already downloaded and verified')
return
root = self.root
download_url(self.url, root, self.filename, self.tgz_md5)
# extract file
cwd = os.getcwd()
tar = tarfile.open(os.path.join(root, self.filename), "r:gz")
os.chdir(root)
tar.extractall()
tar.close()
os.chdir(cwd)
def __repr__(self):
fmt_str = 'Dataset ' + self.__class__.__name__ + '\n'
fmt_str += ' Number of datapoints: {}\n'.format(self.__len__())
tmp = self.set_name
fmt_str += ' Split: {}\n'.format(tmp)
fmt_str += ' Root Location: {}\n'.format(self.root)
tmp = ' Transforms (if any): '
fmt_str += '{0}{1}\n'.format(tmp, self.transform.__repr__().replace('\n', '\n' + ' ' * len(tmp)))
tmp = ' Target Transforms (if any): '
fmt_str += '{0}{1}'.format(tmp, self.target_transform.__repr__().replace('\n', '\n' + ' ' * len(tmp)))
return fmt_str
class CIFAR100(CIFAR10):
"""`CIFAR100 <https://www.cs.toronto.edu/~kriz/cifar.html>`_ Dataset.
This is a subclass of the `CIFAR10` Dataset.
"""
base_folder = 'cifar-100-python'
url = "https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz"
filename = "cifar-100-python.tar.gz"
tgz_md5 = 'eb9058c3a382ffc7106e4002c42a8d85'
train_list = [
['train', '16019d7e3df5f24257cddd939b257f8d'],
]
test_list = [
['test', 'f0ef6b0ae62326f3e7ffdfab6717acfc'],
]

mlp/errors.py
View File

@@ -23,10 +23,9 @@ class SumOfSquaredDiffsError(object):
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Scalar error function value.
Scalar cost function value.
"""
#TODO write your code here
raise NotImplementedError()
return 0.5 * np.mean(np.sum((outputs - targets)**2, axis=1))
def grad(self, outputs, targets):
"""Calculates gradient of error function with respect to outputs.
@@ -36,11 +35,142 @@ class SumOfSquaredDiffsError(object):
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Gradient of error function with respect to outputs. This should be
an array of shape (batch_size, output_dim).
Gradient of error function with respect to outputs.
"""
#TODO write your code here
raise NotImplementedError()
return (outputs - targets) / outputs.shape[0]
def __repr__(self):
return 'SumOfSquaredDiffsError'
return 'MeanSquaredErrorCost'
class BinaryCrossEntropyError(object):
"""Binary cross entropy error."""
def __call__(self, outputs, targets):
"""Calculates error function given a batch of outputs and targets.
Args:
outputs: Array of model outputs of shape (batch_size, output_dim).
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Scalar error function value.
"""
return -np.mean(
targets * np.log(outputs) + (1. - targets) * np.log(1. - outputs))
def grad(self, outputs, targets):
"""Calculates gradient of error function with respect to outputs.
Args:
outputs: Array of model outputs of shape (batch_size, output_dim).
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Gradient of error function with respect to outputs.
"""
return ((1. - targets) / (1. - outputs) -
(targets / outputs)) / outputs.shape[0]
def __repr__(self):
return 'BinaryCrossEntropyError'
class BinaryCrossEntropySigmoidError(object):
"""Binary cross entropy error with logistic sigmoid applied to outputs."""
def __call__(self, outputs, targets):
"""Calculates error function given a batch of outputs and targets.
Args:
outputs: Array of model outputs of shape (batch_size, output_dim).
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Scalar error function value.
"""
probs = 1. / (1. + np.exp(-outputs))
return -np.mean(
targets * np.log(probs) + (1. - targets) * np.log(1. - probs))
def grad(self, outputs, targets):
"""Calculates gradient of error function with respect to outputs.
Args:
outputs: Array of model outputs of shape (batch_size, output_dim).
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Gradient of error function with respect to outputs.
"""
probs = 1. / (1. + np.exp(-outputs))
return (probs - targets) / outputs.shape[0]
def __repr__(self):
return 'BinaryCrossEntropySigmoidError'
class CrossEntropyError(object):
"""Multi-class cross entropy error."""
def __call__(self, outputs, targets):
"""Calculates error function given a batch of outputs and targets.
Args:
outputs: Array of model outputs of shape (batch_size, output_dim).
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Scalar error function value.
"""
return -np.mean(np.sum(targets * np.log(outputs), axis=1))
def grad(self, outputs, targets):
"""Calculates gradient of error function with respect to outputs.
Args:
outputs: Array of model outputs of shape (batch_size, output_dim).
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Gradient of error function with respect to outputs.
"""
return -(targets / outputs) / outputs.shape[0]
def __repr__(self):
return 'CrossEntropyError'
class CrossEntropySoftmaxError(object):
"""Multi-class cross entropy error with Softmax applied to outputs."""
def __call__(self, outputs, targets):
"""Calculates error function given a batch of outputs and targets.
Args:
outputs: Array of model outputs of shape (batch_size, output_dim).
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Scalar error function value.
"""
normOutputs = outputs - outputs.max(-1)[:, None]
logProb = normOutputs - np.log(np.sum(np.exp(normOutputs), axis=-1)[:, None])
return -np.mean(np.sum(targets * logProb, axis=1))
def grad(self, outputs, targets):
"""Calculates gradient of error function with respect to outputs.
Args:
outputs: Array of model outputs of shape (batch_size, output_dim).
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Gradient of error function with respect to outputs.
"""
probs = np.exp(outputs - outputs.max(-1)[:, None])
probs /= probs.sum(-1)[:, None]
return (probs - targets) / outputs.shape[0]
def __repr__(self):
return 'CrossEntropySoftmaxError'
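
A small numerical check (editor's sketch, assuming the module is importable as `mlp.errors`) that `CrossEntropySoftmaxError` matches applying a softmax explicitly and then using `CrossEntropyError`, and that its gradient is `(softmax(outputs) - targets) / batch_size`:

```python
# Sketch: sanity-check CrossEntropySoftmaxError against an explicit softmax.
import numpy as np
from mlp.errors import CrossEntropyError, CrossEntropySoftmaxError

rng = np.random.RandomState(0)
outputs = rng.normal(size=(4, 5))                # unnormalised logits
targets = np.eye(5)[rng.randint(0, 5, size=4)]   # one-of-K coded targets

probs = np.exp(outputs - outputs.max(-1, keepdims=True))
probs /= probs.sum(-1, keepdims=True)

print(np.allclose(CrossEntropySoftmaxError()(outputs, targets),
                  CrossEntropyError()(probs, targets)))            # True
print(np.allclose(CrossEntropySoftmaxError().grad(outputs, targets),
                  (probs - targets) / outputs.shape[0]))           # True
```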

mlp/initialisers.py
View File

@@ -63,3 +63,81 @@ class NormalInit(object):
def __call__(self, shape):
return self.rng.normal(loc=self.mean, scale=self.std, size=shape)
class GlorotUniformInit(object):
"""Glorot and Bengio (2010) random uniform weights initialiser.
Initialises a two-dimensional parameter array using the 'normalized
initialisation' scheme suggested in [1] which attempts to maintain a
roughly constant variance in the activations and backpropagated gradients
of a multi-layer model consisting of interleaved affine and logistic
sigmoidal transformation layers.
Weights are sampled from a zero-mean uniform distribution with standard
deviation `sqrt(2 / (input_dim + output_dim))` where `input_dim` and
`output_dim` are the input and output dimensions of the weight matrix
respectively.
References:
[1]: Understanding the difficulty of training deep feedforward neural
networks, Glorot and Bengio (2010)
"""
def __init__(self, gain=1., rng=None):
"""Construct a normalised initialisation random initialiser object.
Args:
gain: Multiplicative factor to scale initialised weights by.
Recommended value is 1 for affine layers followed by
logistic sigmoid layers (or another affine layer).
rng (RandomState): Seeded random number generator.
"""
self.gain = gain
if rng is None:
rng = np.random.RandomState(DEFAULT_SEED)
self.rng = rng
def __call__(self, shape):
assert len(shape) == 2, (
'Initialiser should only be used for two dimensional arrays.')
std = self.gain * (2. / (shape[0] + shape[1]))**0.5
half_width = 3.**0.5 * std
return self.rng.uniform(low=-half_width, high=half_width, size=shape)
class GlorotNormalInit(object):
"""Glorot and Bengio (2010) random normal weights initialiser.
Initialises a two-dimensional parameter array using the 'normalized
initialisation' scheme suggested in [1] which attempts to maintain a
roughly constant variance in the activations and backpropagated gradients
of a multi-layer model consisting of interleaved affine and logistic
sigmoidal transformation layers.
Weights are sampled from a zero-mean normal distribution with standard
deviation `sqrt(2 / (input_dim + output_dim))` where `input_dim` and
`output_dim` are the input and output dimensions of the weight matrix
respectively.
References:
[1]: Understanding the difficulty of training deep feedforward neural
networks, Glorot and Bengio (2010)
"""
def __init__(self, gain=1., rng=None):
"""Construct a normalised initialisation random initialiser object.
Args:
gain: Multiplicative factor to scale initialised weights by.
Recommended value is 1 for affine layers followed by
logistic sigmoid layers (or another affine layer).
rng (RandomState): Seeded random number generator.
"""
self.gain = gain
if rng is None:
rng = np.random.RandomState(DEFAULT_SEED)
self.rng = rng
def __call__(self, shape):
std = self.gain * (2. / (shape[0] + shape[1]))**0.5
return self.rng.normal(loc=0., scale=std, size=shape)
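
A quick empirical check (editor's sketch, assuming the module is importable as `mlp.initialisers`) that `GlorotUniformInit` samples have standard deviation close to `sqrt(2 / (input_dim + output_dim))`:

```python
# Sketch: empirical standard deviation of Glorot-uniform initialised weights.
import numpy as np
from mlp.initialisers import GlorotUniformInit

init = GlorotUniformInit(gain=1., rng=np.random.RandomState(123456))
weights = init((200, 300))           # (output_dim, input_dim) ordering as in AffineLayer
expected_std = (2. / (200 + 300)) ** 0.5
print(weights.std(), expected_std)   # the two should be close (~0.063)
```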

mlp/layers.py
View File

@@ -14,6 +14,7 @@ respect to the layer parameters.
import numpy as np
import mlp.initialisers as init
from mlp import DEFAULT_SEED
class Layer(object):
@@ -68,12 +69,154 @@ class LayerWithParameters(Layer):
"""
raise NotImplementedError()
def params_penalty(self):
"""Returns the parameter dependent penalty term for this layer.
If no parameter-dependent penalty terms are set this returns zero.
"""
raise NotImplementedError()
@property
def params(self):
"""Returns a list of parameters of layer.
Returns:
List of current parameter values.
List of current parameter values. This list should be in the
corresponding order to the `values` argument to `set_params`.
"""
raise NotImplementedError()
@params.setter
def params(self, values):
"""Sets layer parameters from a list of values.
Args:
values: List of values to set parameters to. This list should be
in the corresponding order to what is returned by `get_params`.
"""
raise NotImplementedError()
class StochasticLayerWithParameters(Layer):
"""Specialised layer which uses a stochastic forward propagation."""
def __init__(self, rng=None):
"""Constructs a new StochasticLayer object.
Args:
rng (RandomState): Seeded random number generator object.
"""
if rng is None:
rng = np.random.RandomState(DEFAULT_SEED)
self.rng = rng
def fprop(self, inputs, stochastic=True):
"""Forward propagates activations through the layer transformation.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
stochastic: Flag allowing different deterministic
forward-propagation mode in addition to default stochastic
forward-propagation e.g. for use at test time. If False
a deterministic forward-propagation transformation
corresponding to the expected output of the stochastic
forward-propagation is applied.
Returns:
outputs: Array of layer outputs of shape (batch_size, output_dim).
"""
raise NotImplementedError()
def grads_wrt_params(self, inputs, grads_wrt_outputs):
"""Calculates gradients with respect to layer parameters.
Args:
inputs: Array of inputs to layer of shape (batch_size, input_dim).
grads_wrt_outputs: Array of gradients with respect to the layer
outputs of shape (batch_size, output_dim).
Returns:
List of arrays of gradients with respect to the layer parameters
with parameter gradients appearing in same order in tuple as
returned from `get_params` method.
"""
raise NotImplementedError()
def params_penalty(self):
"""Returns the parameter dependent penalty term for this layer.
If no parameter-dependent penalty terms are set this returns zero.
"""
raise NotImplementedError()
@property
def params(self):
"""Returns a list of parameters of layer.
Returns:
List of current parameter values. This list should be in the
corresponding order to the `values` argument to `set_params`.
"""
raise NotImplementedError()
@params.setter
def params(self, values):
"""Sets layer parameters from a list of values.
Args:
values: List of values to set parameters to. This list should be
in the corresponding order to what is returned by `get_params`.
"""
raise NotImplementedError()
class StochasticLayer(Layer):
"""Specialised layer which uses a stochastic forward propagation."""
def __init__(self, rng=None):
"""Constructs a new StochasticLayer object.
Args:
rng (RandomState): Seeded random number generator object.
"""
if rng is None:
rng = np.random.RandomState(DEFAULT_SEED)
self.rng = rng
def fprop(self, inputs, stochastic=True):
"""Forward propagates activations through the layer transformation.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
stochastic: Flag allowing different deterministic
forward-propagation mode in addition to default stochastic
forward-propagation e.g. for use at test time. If False
a deterministic forward-propagation transformation
corresponding to the expected output of the stochastic
forward-propagation is applied.
Returns:
outputs: Array of layer outputs of shape (batch_size, output_dim).
"""
raise NotImplementedError()
def bprop(self, inputs, outputs, grads_wrt_outputs):
"""Back propagates gradients through a layer.
Given gradients with respect to the outputs of the layer calculates the
gradients with respect to the layer inputs. This should correspond to
default stochastic forward-propagation.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
outputs: Array of layer outputs calculated in forward pass of
shape (batch_size, output_dim).
grads_wrt_outputs: Array of gradients with respect to the layer
outputs of shape (batch_size, output_dim).
Returns:
Array of gradients with respect to the layer inputs of shape
(batch_size, input_dim).
"""
raise NotImplementedError()
@ -87,7 +230,7 @@ class AffineLayer(LayerWithParameters):
def __init__(self, input_dim, output_dim,
weights_initialiser=init.UniformInit(-0.1, 0.1),
biases_initialiser=init.ConstantInit(0.),
weights_cost=None, biases_cost=None):
weights_penalty=None, biases_penalty=None):
"""Initialises a parameterised affine layer.
Args:
@ -95,11 +238,17 @@ class AffineLayer(LayerWithParameters):
output_dim (int): Dimension of the layer outputs.
weights_initialiser: Initialiser for the weight parameters.
biases_initialiser: Initialiser for the bias parameters.
weights_penalty: Weights-dependent penalty term (regulariser) or
None if no regularisation is to be applied to the weights.
biases_penalty: Biases-dependent penalty term (regulariser) or
None if no regularisation is to be applied to the biases.
"""
self.input_dim = input_dim
self.output_dim = output_dim
self.weights = weights_initialiser((self.output_dim, self.input_dim))
self.biases = biases_initialiser(self.output_dim)
self.weights_penalty = weights_penalty
self.biases_penalty = biases_penalty
def fprop(self, inputs):
"""Forward propagates activations through the layer transformation.
@ -113,8 +262,26 @@ class AffineLayer(LayerWithParameters):
Returns:
outputs: Array of layer outputs of shape (batch_size, output_dim).
"""
#TODO write your code here
raise NotImplementedError()
return self.weights.dot(inputs.T).T + self.biases
def bprop(self, inputs, outputs, grads_wrt_outputs):
"""Back propagates gradients through a layer.
Given gradients with respect to the outputs of the layer calculates the
gradients with respect to the layer inputs.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
outputs: Array of layer outputs calculated in forward pass of
shape (batch_size, output_dim).
grads_wrt_outputs: Array of gradients with respect to the layer
outputs of shape (batch_size, output_dim).
Returns:
Array of gradients with respect to the layer inputs of shape
(batch_size, input_dim).
"""
return grads_wrt_outputs.dot(self.weights)
def grads_wrt_params(self, inputs, grads_wrt_outputs):
"""Calculates gradients with respect to layer parameters.
@ -128,14 +295,530 @@ class AffineLayer(LayerWithParameters):
list of arrays of gradients with respect to the layer parameters
`[grads_wrt_weights, grads_wrt_biases]`.
"""
#TODO write your code here
raise NotImplementedError()
grads_wrt_weights = np.dot(grads_wrt_outputs.T, inputs)
grads_wrt_biases = np.sum(grads_wrt_outputs, axis=0)
if self.weights_penalty is not None:
grads_wrt_weights += self.weights_penalty.grad(parameter=self.weights)
if self.biases_penalty is not None:
grads_wrt_biases += self.biases_penalty.grad(parameter=self.biases)
return [grads_wrt_weights, grads_wrt_biases]
def params_penalty(self):
"""Returns the parameter dependent penalty term for this layer.
If no parameter-dependent penalty terms are set this returns zero.
"""
params_penalty = 0
if self.weights_penalty is not None:
params_penalty += self.weights_penalty(self.weights)
if self.biases_penalty is not None:
params_penalty += self.biases_penalty(self.biases)
return params_penalty
@property
def params(self):
"""A list of layer parameter values: `[weights, biases]`."""
return [self.weights, self.biases]
@params.setter
def params(self, values):
self.weights = values[0]
self.biases = values[1]
def __repr__(self):
return 'AffineLayer(input_dim={0}, output_dim={1})'.format(
self.input_dim, self.output_dim)
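# A short usage sketch of AffineLayer (assuming, as elsewhere in the package,
# that it is importable from mlp.layers): forward propagate a random batch,
# then back propagate a dummy gradient of the same shape as the outputs.
import numpy as np
from mlp.layers import AffineLayer

rng = np.random.RandomState(27092016)
layer = AffineLayer(input_dim=4, output_dim=3)
inputs = rng.normal(size=(10, 4))              # batch of 10 four-dim inputs
outputs = layer.fprop(inputs)                  # shape (10, 3)
grads_wrt_outputs = np.ones_like(outputs)      # placeholder upstream grads
grads_wrt_inputs = layer.bprop(inputs, outputs, grads_wrt_outputs)
grads_wrt_weights, grads_wrt_biases = layer.grads_wrt_params(
    inputs, grads_wrt_outputs)
print(outputs.shape, grads_wrt_inputs.shape,
      grads_wrt_weights.shape, grads_wrt_biases.shape)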
class SigmoidLayer(Layer):
"""Layer implementing an element-wise logistic sigmoid transformation."""
def fprop(self, inputs):
"""Forward propagates activations through the layer transformation.
For inputs `x` and outputs `y` this corresponds to
`y = 1 / (1 + exp(-x))`.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
Returns:
outputs: Array of layer outputs of shape (batch_size, output_dim).
"""
return 1. / (1. + np.exp(-inputs))
def bprop(self, inputs, outputs, grads_wrt_outputs):
"""Back propagates gradients through a layer.
Given gradients with respect to the outputs of the layer calculates the
gradients with respect to the layer inputs.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
outputs: Array of layer outputs calculated in forward pass of
shape (batch_size, output_dim).
grads_wrt_outputs: Array of gradients with respect to the layer
outputs of shape (batch_size, output_dim).
Returns:
Array of gradients with respect to the layer inputs of shape
(batch_size, input_dim).
"""
return grads_wrt_outputs * outputs * (1. - outputs)
def __repr__(self):
return 'SigmoidLayer'
class ConvolutionalLayer(LayerWithParameters):
"""Layer implementing a 2D convolution-based transformation of its inputs.
The layer is parameterised by a set of 2D convolutional kernels, a four
dimensional array of shape
(num_output_channels, num_input_channels, kernel_height, kernel_width)
and a bias vector, a one dimensional array of shape
(num_output_channels,)
i.e. one shared bias per output channel.
Assuming no padding is applied to the inputs so that outputs are only
calculated for positions where the kernel filters fully overlap with the
inputs, and that unit strides are used the outputs will have spatial extent
output_height = input_height - kernel_height + 1
output_width = input_width - kernel_width + 1
"""
def __init__(self, num_input_channels, num_output_channels,
input_height, input_width,
kernel_height, kernel_width,
kernels_init=init.UniformInit(-0.01, 0.01),
biases_init=init.ConstantInit(0.),
kernels_penalty=None, biases_penalty=None):
"""Initialises a parameterised convolutional layer.
Args:
num_input_channels (int): Number of channels in inputs to
layer (this may be number of colour channels in the input
images if used as the first layer in a model, or the
number of output channels, a.k.a. feature maps, from a
previous convolutional layer).
num_output_channels (int): Number of channels in outputs
from the layer, a.k.a. number of feature maps.
input_height (int): Size of first input dimension of each 2D
channel of inputs.
input_width (int): Size of second input dimension of each 2D
channel of inputs.
kernel_height (int): Size of first dimension of each 2D channel of
kernels.
kernel_width (int): Size of second dimension of each 2D channel of
kernels.
kernels_init: Initialiser for the kernel parameters.
biases_init: Initialiser for the bias parameters.
kernels_penalty: Kernel-dependent penalty term (regulariser) or
None if no regularisation is to be applied to the kernels.
biases_penalty: Biases-dependent penalty term (regulariser) or
None if no regularisation is to be applied to the biases.
"""
self.num_input_channels = num_input_channels
self.num_output_channels = num_output_channels
self.input_height = input_height
self.input_width = input_width
self.kernel_height = kernel_height
self.kernel_width = kernel_width
self.kernels_init = kernels_init
self.biases_init = biases_init
self.kernels_shape = (
num_output_channels, num_input_channels, kernel_height, kernel_width
)
self.inputs_shape = (
None, num_input_channels, input_height, input_width
)
self.kernels = self.kernels_init(self.kernels_shape)
self.biases = self.biases_init(num_output_channels)
self.kernels_penalty = kernels_penalty
self.biases_penalty = biases_penalty
self.cache = None
def fprop(self, inputs):
"""Forward propagates activations through the layer transformation.
For inputs `x`, outputs `y`, kernels `K` and biases `b` the layer
corresponds to `y = conv2d(x, K) + b`.
Args:
inputs: Array of layer inputs of shape (batch_size, num_input_channels, image_height, image_width).
Returns:
outputs: Array of layer outputs of shape (batch_size, num_output_channels, output_height, output_width).
"""
raise NotImplementedError
def bprop(self, inputs, outputs, grads_wrt_outputs):
"""Back propagates gradients through a layer.
Given gradients with respect to the outputs of the layer calculates the
gradients with respect to the layer inputs.
Args:
inputs: Array of layer inputs of shape
(batch_size, num_input_channels, input_height, input_width).
outputs: Array of layer outputs calculated in forward pass of
shape
(batch_size, num_output_channels, output_height, output_width).
grads_wrt_outputs: Array of gradients with respect to the layer
outputs of shape
(batch_size, num_output_channels, output_height, output_width).
Returns:
Array of gradients with respect to the layer inputs of shape
(batch_size, num_input_channels, input_height, input_width).
"""
# Pad the grads_wrt_outputs
raise NotImplementedError
def grads_wrt_params(self, inputs, grads_wrt_outputs):
"""Calculates gradients with respect to layer parameters.
Args:
inputs: array of inputs to layer of shape (batch_size, input_dim)
grads_wrt_outputs: array of gradients with respect to the layer
outputs of shape
(batch_size, num_output_channels, output_height, output_width).
Returns:
list of arrays of gradients with respect to the layer parameters
`[grads_wrt_kernels, grads_wrt_biases]`.
"""
# Get inputs_col from previous fprop
raise NotImplementedError
def params_penalty(self):
"""Returns the parameter dependent penalty term for this layer.
If no parameter-dependent penalty terms are set this returns zero.
"""
params_penalty = 0
if self.kernels_penalty is not None:
params_penalty += self.kernels_penalty(self.kernels)
if self.biases_penalty is not None:
params_penalty += self.biases_penalty(self.biases)
return params_penalty
@property
def params(self):
"""A list of layer parameter values: `[kernels, biases]`."""
return [self.kernels, self.biases]
@params.setter
def params(self, values):
self.kernels = values[0]
self.biases = values[1]
def __repr__(self):
return (
'ConvolutionalLayer(\n'
' num_input_channels={0}, num_output_channels={1},\n'
' input_height={2}, input_width={3},\n'
' kernel_height={4}, kernel_width={5}\n'
')'
.format(self.num_input_channels, self.num_output_channels,
self.input_height, self.input_width, self.kernel_height,
self.kernel_width)
)
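# The fprop / bprop / grads_wrt_params methods above are deliberately left
# unimplemented (they form part of the coursework), so this sketch only
# constructs a layer and checks the parameter shapes and the output spatial
# extent implied by the docstring formulae (assuming the class is exposed
# from mlp.layers):
from mlp.layers import ConvolutionalLayer

conv = ConvolutionalLayer(
    num_input_channels=1, num_output_channels=8,
    input_height=28, input_width=28,
    kernel_height=5, kernel_width=5)
print(conv.kernels.shape)  # (8, 1, 5, 5)
print(conv.biases.shape)   # (8,)
# For unit strides and no padding the output spatial extent is
# (input_height - kernel_height + 1, input_width - kernel_width + 1):
print(28 - 5 + 1, 28 - 5 + 1)  # 24 24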
class ReluLayer(Layer):
"""Layer implementing an element-wise rectified linear transformation."""
def fprop(self, inputs):
"""Forward propagates activations through the layer transformation.
For inputs `x` and outputs `y` this corresponds to `y = max(0, x)`.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
Returns:
outputs: Array of layer outputs of shape (batch_size, output_dim).
"""
return np.maximum(inputs, 0.)
def bprop(self, inputs, outputs, grads_wrt_outputs):
"""Back propagates gradients through a layer.
Given gradients with respect to the outputs of the layer calculates the
gradients with respect to the layer inputs.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
outputs: Array of layer outputs calculated in forward pass of
shape (batch_size, output_dim).
grads_wrt_outputs: Array of gradients with respect to the layer
outputs of shape (batch_size, output_dim).
Returns:
Array of gradients with respect to the layer inputs of shape
(batch_size, input_dim).
"""
return (outputs > 0) * grads_wrt_outputs
def __repr__(self):
return 'ReluLayer'
class TanhLayer(Layer):
"""Layer implementing an element-wise hyperbolic tangent transformation."""
def fprop(self, inputs):
"""Forward propagates activations through the layer transformation.
For inputs `x` and outputs `y` this corresponds to `y = tanh(x)`.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
Returns:
outputs: Array of layer outputs of shape (batch_size, output_dim).
"""
return np.tanh(inputs)
def bprop(self, inputs, outputs, grads_wrt_outputs):
"""Back propagates gradients through a layer.
Given gradients with respect to the outputs of the layer calculates the
gradients with respect to the layer inputs.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
outputs: Array of layer outputs calculated in forward pass of
shape (batch_size, output_dim).
grads_wrt_outputs: Array of gradients with respect to the layer
outputs of shape (batch_size, output_dim).
Returns:
Array of gradients with respect to the layer inputs of shape
(batch_size, input_dim).
"""
return (1. - outputs ** 2) * grads_wrt_outputs
def __repr__(self):
return 'TanhLayer'
class SoftmaxLayer(Layer):
"""Layer implementing a softmax transformation."""
def fprop(self, inputs):
"""Forward propagates activations through the layer transformation.
For inputs `x` and outputs `y` this corresponds to
`y = exp(x) / sum(exp(x))`.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
Returns:
outputs: Array of layer outputs of shape (batch_size, output_dim).
"""
# subtract max inside exponential to improve numerical stability -
# when we divide through by sum this term cancels
exp_inputs = np.exp(inputs - inputs.max(-1)[:, None])
return exp_inputs / exp_inputs.sum(-1)[:, None]
def bprop(self, inputs, outputs, grads_wrt_outputs):
"""Back propagates gradients through a layer.
Given gradients with respect to the outputs of the layer calculates the
gradients with respect to the layer inputs.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
outputs: Array of layer outputs calculated in forward pass of
shape (batch_size, output_dim).
grads_wrt_outputs: Array of gradients with respect to the layer
outputs of shape (batch_size, output_dim).
Returns:
Array of gradients with respect to the layer inputs of shape
(batch_size, input_dim).
"""
return (outputs * (grads_wrt_outputs -
(grads_wrt_outputs * outputs).sum(-1)[:, None]))
def __repr__(self):
return 'SoftmaxLayer'
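# A small numerical sketch checking the bprop expression above against the
# explicit softmax Jacobian J[i, j] = y[i] * ((i == j) - y[j]) for a single
# input vector:
import numpy as np
from mlp.layers import SoftmaxLayer

rng = np.random.RandomState(0)
layer = SoftmaxLayer()
inputs = rng.normal(size=(1, 5))
outputs = layer.fprop(inputs)
grads_wrt_outputs = rng.normal(size=(1, 5))
y = outputs[0]
jacobian = np.diag(y) - np.outer(y, y)
explicit = grads_wrt_outputs[0].dot(jacobian)
via_bprop = layer.bprop(inputs, outputs, grads_wrt_outputs)[0]
print(np.allclose(explicit, via_bprop))  # expected: True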
class RadialBasisFunctionLayer(Layer):
"""Layer implementing projection to a grid of radial basis functions."""
def __init__(self, grid_dim, intervals=[[0., 1.]]):
"""Creates a radial basis function layer object.
Args:
grid_dim: Integer specifying how many basis function to use in
grid across input space per dimension (so total number of
basis functions will be grid_dim**input_dim)
intervals: List of intervals (two element lists or tuples)
specifying extents of axis-aligned region in input-space to
tile basis functions in grid across. For example for a 2D input
space spanning [0, 1] x [0, 1] use intervals=[[0, 1], [0, 1]].
"""
self.grid_dim = grid_dim  # stored so __repr__ below can report it
num_basis = grid_dim ** len(intervals)
self.centres = np.array(np.meshgrid(*[
np.linspace(low, high, grid_dim) for (low, high) in intervals])
).reshape((len(intervals), -1))
self.scales = np.array([
[(high - low) * 1. / grid_dim] for (low, high) in intervals])
def fprop(self, inputs):
"""Forward propagates activations through the layer transformation.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
Returns:
outputs: Array of layer outputs of shape (batch_size, output_dim).
"""
return np.exp(-(inputs[..., None] - self.centres[None, ...]) ** 2 /
self.scales ** 2).reshape((inputs.shape[0], -1))
def bprop(self, inputs, outputs, grads_wrt_outputs):
"""Back propagates gradients through a layer.
Given gradients with respect to the outputs of the layer calculates the
gradients with respect to the layer inputs.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
outputs: Array of layer outputs calculated in forward pass of
shape (batch_size, output_dim).
grads_wrt_outputs: Array of gradients with respect to the layer
outputs of shape (batch_size, output_dim).
Returns:
Array of gradients with respect to the layer inputs of shape
(batch_size, input_dim).
"""
num_basis = self.centres.shape[1]
return -2 * (
((inputs[..., None] - self.centres[None, ...]) / self.scales ** 2) *
grads_wrt_outputs.reshape((inputs.shape[0], -1, num_basis))
).sum(-1)
def __repr__(self):
return 'RadialBasisFunctionLayer(grid_dim={0})'.format(self.grid_dim)
class DropoutLayer(StochasticLayer):
"""Layer which stochastically drops input dimensions in its output."""
def __init__(self, rng=None, incl_prob=0.5, share_across_batch=True):
"""Construct a new dropout layer.
Args:
rng (RandomState): Seeded random number generator.
incl_prob: Scalar value in (0, 1] specifying the probability of
each input dimension being included in the output.
share_across_batch: Whether to use same dropout mask across
all inputs in a batch or use per input masks.
"""
super(DropoutLayer, self).__init__(rng)
assert incl_prob > 0. and incl_prob <= 1.
self.incl_prob = incl_prob
self.share_across_batch = share_across_batch
def fprop(self, inputs, stochastic=True):
"""Forward propagates activations through the layer transformation.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
stochastic: Flag allowing different deterministic
forward-propagation mode in addition to default stochastic
forward-propagation e.g. for use at test time. If False
a deterministic forward-propagation transformation
corresponding to the expected output of the stochastic
forward-propagation is applied.
Returns:
outputs: Array of layer outputs of shape (batch_size, output_dim).
"""
if stochastic:
mask_shape = (1,) + inputs.shape[1:] if self.share_across_batch else inputs.shape
self._mask = (self.rng.uniform(size=mask_shape) < self.incl_prob)
return inputs * self._mask
else:
return inputs * self.incl_prob
def bprop(self, inputs, outputs, grads_wrt_outputs):
"""Back propagates gradients through a layer.
Given gradients with respect to the outputs of the layer calculates the
gradients with respect to the layer inputs. This should correspond to
default stochastic forward-propagation.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
outputs: Array of layer outputs calculated in forward pass of
shape (batch_size, output_dim).
grads_wrt_outputs: Array of gradients with respect to the layer
outputs of shape (batch_size, output_dim).
Returns:
Array of gradients with respect to the layer inputs of shape
(batch_size, input_dim).
"""
return grads_wrt_outputs * self._mask
def __repr__(self):
return 'DropoutLayer(incl_prob={0:.1f})'.format(self.incl_prob)
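# A usage sketch of DropoutLayer contrasting the stochastic forward pass
# (random binary inclusion mask) with the deterministic one used at
# evaluation time (inputs scaled by the inclusion probability):
import numpy as np
from mlp.layers import DropoutLayer

rng = np.random.RandomState(31102016)
layer = DropoutLayer(rng=rng, incl_prob=0.5)
inputs = np.ones((4, 6))
print(layer.fprop(inputs, stochastic=True))   # entries randomly zeroed
print(layer.fprop(inputs, stochastic=False))  # every entry equal to 0.5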
class ReshapeLayer(Layer):
"""Layer which reshapes dimensions of inputs."""
def __init__(self, output_shape=None):
"""Create a new reshape layer object.
Args:
output_shape: Tuple specifying shape each input in batch should
be reshaped to in outputs. This **excludes** the batch size
so the shape of the final output array will be
(batch_size, ) + output_shape
Similarly to numpy.reshape, one shape dimension can be -1. In
this case, the value is inferred from the size of the input
array and remaining dimensions. The shape specified must be
compatible with the input array shape - i.e. the total number
of values in the array cannot be changed. If set to `None` the
output shape will be set to
(batch_size, -1)
which will flatten all the inputs to vectors.
"""
self.output_shape = (-1,) if output_shape is None else output_shape
def fprop(self, inputs):
"""Forward propagates activations through the layer transformation.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
Returns:
outputs: Array of layer outputs of shape (batch_size, output_dim).
"""
return inputs.reshape((inputs.shape[0],) + self.output_shape)
def bprop(self, inputs, outputs, grads_wrt_outputs):
"""Back propagates gradients through a layer.
Given gradients with respect to the outputs of the layer calculates the
gradients with respect to the layer inputs.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
outputs: Array of layer outputs calculated in forward pass of
shape (batch_size, output_dim).
grads_wrt_outputs: Array of gradients with respect to the layer
outputs of shape (batch_size, output_dim).
Returns:
Array of gradients with respect to the layer inputs of shape
(batch_size, input_dim).
"""
return grads_wrt_outputs.reshape(inputs.shape)
def __repr__(self):
return 'ReshapeLayer(output_shape={0})'.format(self.output_shape)
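# A usage sketch of ReshapeLayer, e.g. flattening image-shaped inputs before
# an affine layer and mapping back again:
import numpy as np
from mlp.layers import ReshapeLayer

inputs = np.zeros((50, 1, 28, 28))  # batch of 50 single-channel images
flatten = ReshapeLayer()            # default output_shape flattens each input
flat = flatten.fprop(inputs)
print(flat.shape)                   # (50, 784)
unflatten = ReshapeLayer(output_shape=(1, 28, 28))
print(unflatten.fprop(flat).shape)  # (50, 1, 28, 28)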


@ -160,3 +160,229 @@ class MomentumLearningRule(GradientDescentLearningRule):
mom *= self.mom_coeff
mom -= self.learning_rate * grad
param += mom
class AdamLearningRule(GradientDescentLearningRule):
"""Adaptive moments (Adam) learning rule.
First-order gradient-descent based learning rule which uses adaptive
estimates of first and second moments of the parameter gradients to
calculate the parameter updates.
References:
[1]: Adam: a method for stochastic optimisation
Kingma and Ba, 2015
"""
def __init__(self, learning_rate=1e-3, beta_1=0.9, beta_2=0.999,
epsilon=1e-8):
"""Creates a new learning rule object.
Args:
learning_rate: A positive scalar to scale gradient updates to the
parameters by. This needs to be carefully set - if too large
the learning dynamic will be unstable and may diverge, while
if set too small learning will proceed very slowly.
beta_1: Exponential decay rate for gradient first moment estimates.
This should be a scalar value in [0, 1]. The running gradient
first moment estimate is calculated using
`m_1 = beta_1 * m_1_prev + (1 - beta_1) * g`
where `m_1_prev` is the previous estimate and `g` the current
parameter gradients.
beta_2: Exponential decay rate for gradient second moment
estimates. This should be a scalar value in [0, 1]. The running
gradient second moment estimate is calculated using
`m_2 = beta_2 * m_2_prev + (1 - beta_2) * g**2`
where `m_2_prev` is the previous estimate and `g` the current
parameter gradients.
epsilon: 'Softening' parameter to stop updates diverging when
second moment estimates are close to zero. Should be set to
a small positive value.
"""
super(AdamLearningRule, self).__init__(learning_rate)
assert beta_1 >= 0. and beta_1 <= 1., 'beta_1 should be in [0, 1].'
assert beta_2 >= 0. and beta_2 <= 1., 'beta_2 should be in [0, 1].'
assert epsilon > 0., 'epsilon should be > 0.'
self.beta_1 = beta_1
self.beta_2 = beta_2
self.epsilon = epsilon
def initialise(self, params):
"""Initialises the state of the learning rule for a set of parameters.
This must be called before `update_params` is first called.
Args:
params: A list of the parameters to be optimised. Note these will
be updated *in-place* to avoid reallocating arrays on each
update.
"""
super(AdamLearningRule, self).initialise(params)
self.moms_1 = []
for param in self.params:
self.moms_1.append(np.zeros_like(param))
self.moms_2 = []
for param in self.params:
self.moms_2.append(np.zeros_like(param))
self.step_count = 0
def reset(self):
"""Resets any additional state variables to their initial values.
For this learning rule this corresponds to zeroing the estimates of
the first and second moments of the gradients.
"""
for mom_1, mom_2 in zip(self.moms_1, self.moms_2):
mom_1 *= 0.
mom_2 *= 0.
self.step_count = 0
def update_params(self, grads_wrt_params):
"""Applies a single update to all parameters.
All parameter updates are performed using in-place operations and so
nothing is returned.
Args:
grads_wrt_params: A list of gradients of the scalar loss function
with respect to each of the parameters passed to `initialise`
previously, with this list expected to be in the same order.
"""
for param, mom_1, mom_2, grad in zip(
self.params, self.moms_1, self.moms_2, grads_wrt_params):
mom_1 *= self.beta_1
mom_1 += (1. - self.beta_1) * grad
mom_2 *= self.beta_2
mom_2 += (1. - self.beta_2) * grad ** 2
alpha_t = (
self.learning_rate *
(1. - self.beta_2 ** (self.step_count + 1)) ** 0.5 /
(1. - self.beta_1 ** (self.step_count + 1))
)
param -= alpha_t * mom_1 / (mom_2 ** 0.5 + self.epsilon)
self.step_count += 1
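# A minimal sketch of driving AdamLearningRule by hand on a single quadratic
# objective 0.5 * param ** 2 (the module path mlp.learning_rules is an
# assumption based on the rest of the package):
import numpy as np
from mlp.learning_rules import AdamLearningRule

param = np.array([5.0])
learning_rule = AdamLearningRule(learning_rate=0.1)
learning_rule.initialise([param])      # parameters are updated in-place
for step in range(200):
    grads_wrt_params = [param.copy()]  # gradient of 0.5 * param**2 is param
    learning_rule.update_params(grads_wrt_params)
print(param)                           # should end up close to zero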
class AdaGradLearningRule(GradientDescentLearningRule):
"""Adaptive gradients (AdaGrad) learning rule.
First-order gradient-descent based learning rule which normalises gradient
updates by a running sum of the past squared gradients.
References:
[1]: Adaptive Subgradient Methods for Online Learning and Stochastic
Optimization. Duchi, Hazan and Singer, 2011
"""
def __init__(self, learning_rate=1e-2, epsilon=1e-8):
"""Creates a new learning rule object.
Args:
learning_rate: A positive scalar to scale gradient updates to the
parameters by. This needs to be carefully set - if too large
the learning dynamic will be unstable and may diverge, while
if set too small learning will proceed very slowly.
epsilon: 'Softening' parameter to stop updates diverging when
sums of squared gradients are close to zero. Should be set to
a small positive value.
"""
super(AdaGradLearningRule, self).__init__(learning_rate)
assert epsilon > 0., 'epsilon should be > 0.'
self.epsilon = epsilon
def initialise(self, params):
"""Initialises the state of the learning rule for a set of parameters.
This must be called before `update_params` is first called.
Args:
params: A list of the parameters to be optimised. Note these will
be updated *in-place* to avoid reallocating arrays on each
update.
"""
super(AdaGradLearningRule, self).initialise(params)
self.sum_sq_grads = []
for param in self.params:
self.sum_sq_grads.append(np.zeros_like(param))
def reset(self):
"""Resets any additional state variables to their initial values.
For this learning rule this corresponds to zeroing all the sum of
squared gradient states.
"""
for sum_sq_grad in self.sum_sq_grads:
sum_sq_grad *= 0.
def update_params(self, grads_wrt_params):
"""Applies a single update to all parameters.
All parameter updates are performed using in-place operations and so
nothing is returned.
Args:
grads_wrt_params: A list of gradients of the scalar loss function
with respect to each of the parameters passed to `initialise`
previously, with this list expected to be in the same order.
"""
for param, sum_sq_grad, grad in zip(
self.params, self.sum_sq_grads, grads_wrt_params):
sum_sq_grad += grad ** 2
param -= (self.learning_rate * grad /
(sum_sq_grad + self.epsilon) ** 0.5)
class RMSPropLearningRule(GradientDescentLearningRule):
"""Root mean squared gradient normalised learning rule (RMSProp).
First-order gradient-descent based learning rule which normalises gradient
updates by an exponentially smoothed estimate of the gradient second
moments.
References:
[1]: Neural Networks for Machine Learning: Lecture 6a slides
University of Toronto, Computer Science Course CSC321
http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
"""
def __init__(self, learning_rate=1e-3, beta=0.9, epsilon=1e-8):
"""Creates a new learning rule object.
Args:
learning_rate: A positive scalar to scale gradient updates to the
parameters by. This needs to be carefully set - if too large
the learning dynamic will be unstable and may diverge, while
if set too small learning will proceed very slowly.
beta: Exponential decay rate for gradient second moment
estimates. This should be a scalar value in [0, 1]. The running
gradient second moment estimate is calculated using
`m_2 = beta * m_2_prev + (1 - beta) * g**2`
where `m_2_prev` is the previous estimate and `g` the current
parameter gradients.
epsilon: 'Softening' parameter to stop updates diverging when
gradient second moment estimates are close to zero. Should be
set to a small positive value.
"""
super(RMSPropLearningRule, self).__init__(learning_rate)
assert beta >= 0. and beta <= 1., 'beta should be in [0, 1].'
assert epsilon > 0., 'epsilon should be > 0.'
self.beta = beta
self.epsilon = epsilon
def initialise(self, params):
"""Initialises the state of the learning rule for a set of parameters.
This must be called before `update_params` is first called.
Args:
params: A list of the parameters to be optimised. Note these will
be updated *in-place* to avoid reallocating arrays on each
update.
"""
super(RMSPropLearningRule, self).initialise(params)
self.moms_2 = []
for param in self.params:
self.moms_2.append(np.zeros_like(param))
def reset(self):
"""Resets any additional state variables to their initial values.
For this learning rule this corresponds to zeroing all gradient
second moment estimates.
"""
for mom_2 in self.moms_2:
mom_2 *= 0.
def update_params(self, grads_wrt_params):
"""Applies a single update to all parameters.
All parameter updates are performed using in-place operations and so
nothing is returned.
Args:
grads_wrt_params: A list of gradients of the scalar loss function
with respect to each of the parameters passed to `initialise`
previously, with this list expected to be in the same order.
"""
for param, mom_2, grad in zip(
self.params, self.moms_2, grads_wrt_params):
mom_2 *= self.beta
mom_2 += (1. - self.beta) * grad ** 2
param -= (self.learning_rate * grad /
(mom_2 + self.epsilon) ** 0.5)


@ -8,7 +8,7 @@ outputs (and intermediate states) and for calculating gradients of scalar
functions of the outputs with respect to the model parameters.
"""
from mlp.layers import LayerWithParameters
from mlp.layers import LayerWithParameters, StochasticLayer, StochasticLayerWithParameters
class SingleLayerModel(object):
@ -27,7 +27,7 @@ class SingleLayerModel(object):
"""A list of all of the parameters of the model."""
return self.layer.params
def fprop(self, inputs):
def fprop(self, inputs, evaluation=False):
"""Calculate the model outputs corresponding to a batch of inputs.
Args:
@ -59,9 +59,87 @@ class SingleLayerModel(object):
"""
return self.layer.grads_wrt_params(activations[0], grads_wrt_outputs)
def params_cost(self):
"""Calculates the parameter dependent cost term of the model."""
return self.layer.params_cost()
def __repr__(self):
return 'SingleLayerModel(' + str(self.layer) + ')'
class MultipleLayerModel(object):
"""A model consisting of multiple layers applied sequentially."""
def __init__(self, layers):
"""Create a new multiple layer model instance.
Args:
layers: List of the layer objects defining the model in the
order they should be applied from inputs to outputs.
"""
self.layers = layers
@property
def params(self):
"""A list of all of the parameters of the model."""
params = []
for layer in self.layers:
if isinstance(layer, LayerWithParameters) or isinstance(layer, StochasticLayerWithParameters):
params += layer.params
return params
def fprop(self, inputs, evaluation=False):
"""Forward propagates a batch of inputs through the model.
Args:
inputs: Batch of inputs to the model.
Returns:
List of the activations at the output of all layers of the model
plus the inputs (to the first layer) as the first element. The
last element of the list corresponds to the model outputs.
"""
activations = [inputs]
for i, layer in enumerate(self.layers):
if evaluation:
if issubclass(type(self.layers[i]), StochasticLayer) or issubclass(type(self.layers[i]),
StochasticLayerWithParameters):
current_activations = self.layers[i].fprop(activations[i], stochastic=False)
else:
current_activations = self.layers[i].fprop(activations[i])
else:
if issubclass(type(self.layers[i]), StochasticLayer) or issubclass(type(self.layers[i]),
StochasticLayerWithParameters):
current_activations = self.layers[i].fprop(activations[i], stochastic=True)
else:
current_activations = self.layers[i].fprop(activations[i])
activations.append(current_activations)
return activations
def grads_wrt_params(self, activations, grads_wrt_outputs):
"""Calculates gradients with respect to the model parameters.
Args:
activations: List of all activations from forward pass through
model using `fprop`.
grads_wrt_outputs: Gradient with respect to the model outputs of
the scalar function parameter gradients are being calculated
for.
Returns:
List of gradients of the scalar function with respect to all model
parameters.
"""
grads_wrt_params = []
for i, layer in enumerate(self.layers[::-1]):
inputs = activations[-i - 2]
outputs = activations[-i - 1]
grads_wrt_inputs = layer.bprop(inputs, outputs, grads_wrt_outputs)
if isinstance(layer, LayerWithParameters) or isinstance(layer, StochasticLayerWithParameters):
grads_wrt_params += layer.grads_wrt_params(
inputs, grads_wrt_outputs)[::-1]
grads_wrt_outputs = grads_wrt_inputs
return grads_wrt_params[::-1]
def __repr__(self):
return 'SingleLayerModel(' + str(layer) + ')'
return (
'MultiLayerModel(\n ' +
'\n '.join([str(layer) for layer in self.layers]) +
'\n)'
)
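# A sketch composing the layers defined earlier into a MultipleLayerModel
# (module names as used elsewhere in the package are assumed):
import numpy as np
from mlp.layers import AffineLayer, ReluLayer, SoftmaxLayer
from mlp.models import MultipleLayerModel

rng = np.random.RandomState(12102018)
model = MultipleLayerModel([
    AffineLayer(input_dim=784, output_dim=100),
    ReluLayer(),
    AffineLayer(input_dim=100, output_dim=10),
    SoftmaxLayer(),
])
inputs = rng.normal(size=(32, 784))
activations = model.fprop(inputs)      # inputs plus one activation per layer
outputs = activations[-1]              # shape (32, 10), rows sum to one
grads_wrt_outputs = np.ones_like(outputs) / outputs.shape[0]
grads_wrt_params = model.grads_wrt_params(activations, grads_wrt_outputs)
print(len(activations), outputs.shape, len(grads_wrt_params))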


@ -9,7 +9,7 @@ import time
import logging
from collections import OrderedDict
import numpy as np
import tqdm
logger = logging.getLogger(__name__)
@ -18,7 +18,7 @@ class Optimiser(object):
"""Basic model optimiser."""
def __init__(self, model, error, learning_rule, train_dataset,
valid_dataset=None, data_monitors=None):
valid_dataset=None, data_monitors=None, notebook=False):
"""Create a new optimiser instance.
Args:
@ -43,6 +43,11 @@ class Optimiser(object):
self.data_monitors = OrderedDict([('error', error)])
if data_monitors is not None:
self.data_monitors.update(data_monitors)
self.notebook = notebook
if notebook:
self.tqdm_progress = tqdm.tqdm_notebook
else:
self.tqdm_progress = tqdm.tqdm
def do_training_epoch(self):
"""Do a single training epoch.
@ -52,12 +57,15 @@ class Optimiser(object):
respect to all the model parameters and then updates the model
parameters according to the learning rule.
"""
with self.tqdm_progress(total=self.train_dataset.num_batches) as train_progress_bar:
train_progress_bar.set_description("Epoch Progress")
for inputs_batch, targets_batch in self.train_dataset:
activations = self.model.fprop(inputs_batch)
grads_wrt_outputs = self.error.grad(activations[-1], targets_batch)
grads_wrt_params = self.model.grads_wrt_params(
activations, grads_wrt_outputs)
self.learning_rule.update_params(grads_wrt_params)
train_progress_bar.update(1)
def eval_monitors(self, dataset, label):
"""Evaluates the monitors for the given dataset.
@ -72,7 +80,7 @@ class Optimiser(object):
data_mon_vals = OrderedDict([(key + label, 0.) for key
in self.data_monitors.keys()])
for inputs_batch, targets_batch in dataset:
activations = self.model.fprop(inputs_batch)
activations = self.model.fprop(inputs_batch, evaluation=True)
for key, data_monitor in self.data_monitors.items():
data_mon_vals[key + label] += data_monitor(
activations[-1], targets_batch)
@ -121,14 +129,20 @@ class Optimiser(object):
and the second being a dict mapping the labels for the statistics
recorded to their column index in the array.
"""
start_train_time = time.time()
run_stats = [list(self.get_epoch_stats().values())]
with self.tqdm_progress(total=num_epochs) as progress_bar:
progress_bar.set_description("Experiment Progress")
for epoch in range(1, num_epochs + 1):
start_time = time.process_time()
start_time = time.time()
self.do_training_epoch()
epoch_time = time.process_time() - start_time
epoch_time = time.time() - start_time
if epoch % stats_interval == 0:
stats = self.get_epoch_stats()
self.log_stats(epoch, epoch_time, stats)
run_stats.append(list(stats.values()))
return np.array(run_stats), {k: i for i, k in enumerate(stats.keys())}
progress_bar.update(1)
finish_train_time = time.time()
total_train_time = finish_train_time - start_train_time
return np.array(run_stats), {k: i for i, k in enumerate(stats.keys())}, total_train_time
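# A heavily hedged usage sketch of the optimiser. The error function and
# data provider classes named below (mlp.errors.CrossEntropySoftmaxError,
# mlp.data_providers.MNISTDataProvider) and the (num_epochs, stats_interval)
# signature of `train` are assumptions based on the rest of the package and
# are not shown in this diff; running it also requires the MNIST data files
# supplied with the course.
import numpy as np
from mlp.data_providers import MNISTDataProvider
from mlp.errors import CrossEntropySoftmaxError
from mlp.layers import AffineLayer, ReluLayer
from mlp.learning_rules import AdamLearningRule
from mlp.models import MultipleLayerModel
from mlp.optimisers import Optimiser

rng = np.random.RandomState(11102019)
train_data = MNISTDataProvider('train', batch_size=100, rng=rng)
valid_data = MNISTDataProvider('valid', batch_size=100, rng=rng)
model = MultipleLayerModel([
    AffineLayer(784, 100), ReluLayer(), AffineLayer(100, 10)])
error = CrossEntropySoftmaxError()
learning_rule = AdamLearningRule()
optimiser = Optimiser(model, error, learning_rule, train_data, valid_data,
                      notebook=False)
stats, keys, run_time = optimiser.train(num_epochs=5, stats_interval=1)
print(run_time, stats[-1, keys['error(train)']])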

mlp/penalties.py Normal file

@ -0,0 +1,90 @@
import numpy as np
seed = 22102017
rng = np.random.RandomState(seed)
class L1Penalty(object):
"""L1 parameter penalty.
Term to add to the objective function penalising parameters
based on their L1 norm.
"""
def __init__(self, coefficient):
"""Create a new L1 penalty object.
Args:
coefficient: Positive constant to scale penalty term by.
"""
assert coefficient > 0., 'Penalty coefficient must be positive.'
self.coefficient = coefficient
def __call__(self, parameter):
"""Calculate L1 penalty value for a parameter.
Args:
parameter: Array corresponding to a model parameter.
Returns:
Value of penalty term.
"""
return self.coefficient * abs(parameter).sum()
def grad(self, parameter):
"""Calculate the penalty gradient with respect to the parameter.
Args:
parameter: Array corresponding to a model parameter.
Returns:
Value of penalty gradient with respect to parameter. This
should be an array of the same shape as the parameter.
"""
return self.coefficient * np.sign(parameter)
def __repr__(self):
return 'L1Penalty({0})'.format(self.coefficient)
class L2Penalty(object):
"""L2 parameter penalty.
Term to add to the objective function penalising parameters
based on their L2 norm.
"""
def __init__(self, coefficient):
"""Create a new L2 penalty object.
Args:
coefficient: Positive constant to scale penalty term by.
"""
assert coefficient > 0., 'Penalty coefficient must be positive.'
self.coefficient = coefficient
def __call__(self, parameter):
"""Calculate L2 penalty value for a parameter.
Args:
parameter: Array corresponding to a model parameter.
Returns:
Value of penalty term.
"""
return 0.5 * self.coefficient * (parameter ** 2).sum()
def grad(self, parameter):
"""Calculate the penalty gradient with respect to the parameter.
Args:
parameter: Array corresponding to a model parameter.
Returns:
Value of penalty gradient with respect to parameter. This
should be an array of the same shape as the parameter.
"""
return self.coefficient * parameter
def __repr__(self):
return 'L2Penalty({0})'.format(self.coefficient)
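# A usage sketch of the penalty objects: attach an L2 penalty to the weights
# of an AffineLayer (which accepts a weights_penalty argument, see mlp.layers
# above) and inspect its contribution to the cost and the gradients.
import numpy as np
from mlp.layers import AffineLayer
from mlp.penalties import L2Penalty

rng = np.random.RandomState(5112018)
layer = AffineLayer(input_dim=4, output_dim=3,
                    weights_penalty=L2Penalty(1e-2))
inputs = rng.normal(size=(8, 4))
grads_wrt_outputs = rng.normal(size=(8, 3))
grads_wrt_weights, grads_wrt_biases = layer.grads_wrt_params(
    inputs, grads_wrt_outputs)    # includes the penalty gradient term
print(layer.params_penalty())      # equals 0.5 * 1e-2 * (weights ** 2).sum()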

mlp/schedulers.py Normal file

@ -0,0 +1,34 @@
# -*- coding: utf-8 -*-
"""Training schedulers.
This module contains classes implementing schedulers which control the
evolution of learning rule hyperparameters (such as learning rate) over a
training run.
"""
import numpy as np
class ConstantLearningRateScheduler(object):
"""Example of scheduler interface which sets a constant learning rate."""
def __init__(self, learning_rate):
"""Construct a new constant learning rate scheduler object.
Args:
learning_rate: Learning rate to use in learning rule.
"""
self.learning_rate = learning_rate
def update_learning_rule(self, learning_rule, epoch_number):
"""Update the hyperparameters of the learning rule.
Run at the beginning of each epoch.
Args:
learning_rule: Learning rule object being used in training run,
any scheduled hyperparameters to be altered should be
attributes of this object.
epoch_number: Integer index of training epoch about to be run.
"""
learning_rule.learning_rate = self.learning_rate
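# A sketch of another scheduler following the same interface (illustrative
# only, not part of the module): exponentially decay the learning rate with
# the epoch number.
class ExponentialDecayLearningRateScheduler(object):
    """Scheduler which exponentially decays the learning rate over epochs."""

    def __init__(self, init_learning_rate, decay_rate):
        """Args:
            init_learning_rate: Learning rate to use in the first epoch.
            decay_rate: Multiplicative decay factor in (0, 1] applied once
                per epoch.
        """
        assert 0. < decay_rate <= 1., 'decay_rate should be in (0, 1].'
        self.init_learning_rate = init_learning_rate
        self.decay_rate = decay_rate

    def update_learning_rule(self, learning_rule, epoch_number):
        """Set the learning rate for the epoch about to be run."""
        learning_rule.learning_rate = (
            self.init_learning_rate * self.decay_rate ** epoch_number)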


@ -1,242 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction\n",
"\n",
"## Getting started with Jupyter notebooks\n",
"\n",
"The majority of your work in this course will be done using Jupyter notebooks so we will here introduce some of the basics of the notebook system. If you are already comfortable using notebooks or just would rather get on with some coding feel free to [skip straight to the exercises below](#Exercises).\n",
"\n",
"*Note: Jupyter notebooks are also known as IPython notebooks. The Jupyter system now supports languages other than Python [hence the name was changed to make it more language agnostic](https://ipython.org/#jupyter-and-the-future-of-ipython) however IPython notebook is still commonly used.*\n",
"\n",
"### Jupyter basics: the server, dashboard and kernels\n",
"\n",
"In launching this notebook you will have already come across two of the other key components of the Jupyter system - the notebook *server* and *dashboard* interface.\n",
"\n",
"We began by starting a notebook server instance in the terminal by running\n",
"\n",
"```\n",
"jupyter notebook\n",
"```\n",
"\n",
"This will have begun printing a series of log messages to terminal output similar to\n",
"\n",
"```\n",
"$ jupyter notebook\n",
"[I 08:58:24.417 NotebookApp] Serving notebooks from local directory: ~/mlpractical\n",
"[I 08:58:24.417 NotebookApp] 0 active kernels\n",
"[I 08:58:24.417 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/\n",
"```\n",
"\n",
"The last message included here indicates the URL the application is being served at. The default behaviour of the `jupyter notebook` command is to open a tab in a web browser pointing to this address after the server has started up. The server can be launched without opening a browser window by running `jupyter notebook --no-browser`. This can be useful for example when running a notebook server on a remote machine over SSH. Descriptions of various other command options can be found by displaying the command help page using\n",
"\n",
"```\n",
"jupyter notebook --help\n",
"```\n",
"\n",
"While the notebook server is running it will continue printing log messages to terminal it was started from. Unless you detach the process from the terminal session you will need to keep the session open to keep the notebook server alive. If you want to close down a running server instance from the terminal you can use `Ctrl+C` - this will bring up a confirmation message asking you to confirm you wish to shut the server down. You can either enter `y` or skip the confirmation by hitting `Ctrl+C` again.\n",
"\n",
"When the notebook application first opens in your browser you are taken to the notebook *dashboard*. This will appear something like this\n",
"\n",
"<img src='res/jupyter-dashboard.png' />\n",
"\n",
"The dashboard above is showing the `Files` tab, a list of files in the directory the notebook server was launched from. We can navigate in to a sub-directory by clicking on a directory name and back up to the parent directory by clicking the `..` link. An important point to note is that the top-most level that you will be able to navigate to is the directory you run the server from. This is a security feature and generally you should try to limit the access the server has by launching it in the highest level directory which gives you access to all the files you need to work with.\n",
"\n",
"As well as allowing you to launch existing notebooks, the `Files` tab of the dashboard also allows new notebooks to be created using the `New` drop-down on the right. It can also perform basic file-management tasks such as renaming and deleting files (select a file by checking the box alongside it to bring up a context menu toolbar).\n",
"\n",
"In addition to opening notebook files, we can also edit text files such as `.py` source files, directly in the browser by opening them from the dashboard. The in-built text-editor is less-featured than a full IDE but is useful for quick edits of source files and previewing data files.\n",
"\n",
"The `Running` tab of the dashboard gives a list of the currently running notebook instances. This can be useful to keep track of which notebooks are still running and to shutdown (or reopen) old notebook processes when the corresponding tab has been closed.\n",
"\n",
"### The notebook interface\n",
"\n",
"The top of your notebook window should appear something like this:\n",
"\n",
"<img src='res/jupyter-notebook-interface.png' />\n",
"\n",
"The name of the current notebook is displayed at the top of the page and can be edited by clicking on the text of the name. Displayed alongside this is an indication of the last manual *checkpoint* of the notebook file. On-going changes are auto-saved at regular intervals; the check-point mechanism is mainly meant as a way to recover an earlier version of a notebook after making unwanted changes. Note the default system only currently supports storing a single previous checkpoint despite the `Revert to checkpoint` dropdown under the `File` menu perhaps suggesting otherwise.\n",
"\n",
"As well as having options to save and revert to checkpoints, the `File` menu also allows new notebooks to be created in same directory as the current notebook, a copy of the current notebook to be made and the ability to export the current notebook to various formats.\n",
"\n",
"The `Edit` menu contains standard clipboard functions as well as options for reorganising notebook *cells*. Cells are the basic units of notebooks, and can contain formatted text like the one you are reading at the moment or runnable code as we will see below. The `Edit` and `Insert` drop down menus offer various options for moving cells around the notebook, merging and splitting cells and inserting new ones, while the `Cell` menu allow running of code cells and changing cell types.\n",
"\n",
"The `Kernel` menu offers some useful commands for managing the Python process (kernel) running in the notebook. In particular it provides options for interrupting a busy kernel (useful for example if you realise you have set a slow code cell running with incorrect parameters) and to restart the current kernel. This will cause all variables currently defined in the workspace to be lost but may be necessary to get the kernel back to a consistent state after polluting the namespace with lots of global variables or when trying to run code from an updated module and `reload` is failing to work. \n",
"\n",
"To the far right of the menu toolbar is a kernel status indicator. When a dark filled circle is shown this means the kernel is currently busy and any further code cell run commands will be queued to happen after the currently running cell has completed. An open status circle indicates the kernel is currently idle.\n",
"\n",
"The final row of the top notebook interface is the notebook toolbar which contains shortcut buttons to some common commands such as clipboard actions and cell / kernel management. If you are interested in learning more about the notebook user interface you may wish to run through the `User Interface Tour` under the `Help` menu drop down.\n",
"\n",
"### Markdown cells: easy text formatting\n",
"\n",
"This entire introduction has been written in what is termed a *Markdown* cell of a notebook. [Markdown](https://en.wikipedia.org/wiki/Markdown) is a lightweight markup language intended to be readable in plain-text. As you may wish to use Markdown cells to keep your own formatted notes in notebooks, a small sampling of the formatting syntax available is below (escaped mark-up on top and corresponding rendered output below that); there are many much more extensive syntax guides - for example [this cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet).\n",
"\n",
"---\n",
"\n",
"```\n",
"## Level 2 heading\n",
"### Level 3 heading\n",
"\n",
"*Italicised* and **bold** text.\n",
"\n",
" * bulleted\n",
" * lists\n",
" \n",
"and\n",
"\n",
" 1. enumerated\n",
" 2. lists\n",
"\n",
"Inline maths $y = mx + c$ using [MathJax](https://www.mathjax.org/) as well as display style\n",
"\n",
"$$ ax^2 + bx + c = 0 \\qquad \\Rightarrow \\qquad x = \\frac{-b \\pm \\sqrt{b^2 - 4ac}}{2a} $$\n",
"```\n",
"---\n",
"\n",
"## Level 2 heading\n",
"### Level 3 heading\n",
"\n",
"*Italicised* and **bold** text.\n",
"\n",
" * bulleted\n",
" * lists\n",
" \n",
"and\n",
"\n",
" 1. enumerated\n",
" 2. lists\n",
"\n",
"Inline maths $y = mx + c$ using [MathJax]() as well as display maths\n",
"\n",
"$$ ax^2 + bx + c = 0 \\qquad \\Rightarrow \\qquad x = \\frac{-b \\pm \\sqrt{b^2 - 4ac}}{2a} $$\n",
"\n",
"---\n",
"\n",
"We can also directly use HTML tags in Markdown cells to embed rich content such as images and videos.\n",
"\n",
"---\n",
"```\n",
"<img src=\"http://placehold.it/350x150\" />\n",
"```\n",
"---\n",
"\n",
"<img src=\"http://placehold.it/350x150\" />\n",
"\n",
"---\n",
"\n",
" \n",
"### Code cells: in browser code execution\n",
"\n",
"Up to now we have not seen any runnable code. An example of a executable code cell is below. To run it first click on the cell so that it is highlighted, then either click the <i class=\"fa-step-forward fa\"></i> button on the notebook toolbar, go to `Cell > Run Cells` or use the keyboard shortcut `Ctrl+Enter`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from __future__ import print_function\n",
"import sys\n",
"\n",
"print('Hello world!')\n",
"print('Alarming hello!', file=sys.stderr)\n",
"print('Hello again!')\n",
"'And again!'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This example shows the three main components of a code cell.\n",
"\n",
"The most obvious is the input area. This (unsuprisingly) is used to enter the code to be run which will be automatically syntax highlighted.\n",
"\n",
"To the immediate left of the input area is the execution indicator / counter. Before a code cell is first run this will display `In [ ]:`. After the cell is run this is updated to `In [n]:` where `n` is a number corresponding to the current execution counter which is incremented whenever any code cell in the notebook is run. This can therefore be used to keep track of the relative order in which cells were last run. There is no fundamental requirement to run cells in the order they are organised in the notebook, though things will usually be more readable if you keep things in roughly in order!\n",
"\n",
"Immediately below the input area is the output area. This shows any output produced by the code in the cell. This is dealt with a little bit confusingly in the current Jupyter version. At the top any output to [`stdout`](https://en.wikipedia.org/wiki/Standard_streams#Standard_output_.28stdout.29) is displayed. Immediately below that output to [`stderr`](https://en.wikipedia.org/wiki/Standard_streams#Standard_error_.28stderr.29) is displayed. All of the output to `stdout` is displayed together even if there has been output to `stderr` between as shown by the suprising ordering in the output here. \n",
"\n",
"The final part of the output area is the *display* area. By default this will just display the returned output of the last Python statement as would usually be the case in a (I)Python interpreter run in a terminal. What is displayed for a particular object is by default determined by its special `__repr__` method e.g. for a string it is just the quote enclosed value of the string itself.\n",
"\n",
"### Useful keyboard shortcuts\n",
"\n",
"There are a wealth of keyboard shortcuts available in the notebook interface. For an exhaustive list see the `Keyboard Shortcuts` option under the `Help` menu. We will cover a few of those we find most useful below.\n",
"\n",
"Shortcuts come in two flavours: those applicable in *command mode*, active when no cell is currently being edited and indicated by a blue highlight around the current cell; those applicable in *edit mode* when the content of a cell is being edited, indicated by a green current cell highlight.\n",
"\n",
"In edit mode of a code cell, two of the more generically useful keyboard shortcuts are offered by the `Tab` key.\n",
"\n",
" * Pressing `Tab` a single time while editing code will bring up suggested completions of what you have typed so far. This is done in a scope aware manner so for example typing `a` + `[Tab]` in a code cell will come up with a list of objects beginning with `a` in the current global namespace, while typing `np.a` + `[Tab]` (assuming `import numpy as np` has been run already) will bring up a list of objects in the root NumPy namespace beginning with `a`.\n",
" * Pressing `Shift+Tab` once immediately after opening parenthesis of a function or method will cause a tool-tip to appear with the function signature (including argument names and defaults) and its docstring. Pressing `Shift+Tab` twice in succession will cause an expanded version of the same tooltip to appear, useful for longer docstrings. Pressing `Shift+Tab` four times in succession will cause the information to be instead displayed in a pager docked to bottom of the notebook interface which stays attached even when making further edits to the code cell and so can be useful for keeping documentation visible when editing e.g. to help remember the name of arguments to a function and their purposes.\n",
"\n",
"A series of useful shortcuts available in both command and edit mode are `[modifier]+Enter` where `[modifier]` is one of `Ctrl` (run selected cell), `Shift` (run selected cell and select next) or `Alt` (run selected cell and insert a new cell after).\n",
"\n",
"A useful command mode shortcut to know about is the ability to toggle line numbers on and off for a cell by pressing `L` which can be useful when trying to diagnose stack traces printed when an exception is raised or when referring someone else to a section of code.\n",
" \n",
"### Magics\n",
"\n",
"There are a range of *magic* commands in IPython notebooks, than provide helpful tools outside of the usual Python syntax. A full list of the inbuilt magic commands is given [here](http://ipython.readthedocs.io/en/stable/interactive/magics.html), however three that are particularly useful for this course:\n",
"\n",
" * [`%%timeit`](http://ipython.readthedocs.io/en/stable/interactive/magics.html?highlight=matplotlib#magic-timeit) Put at the beginning of a cell to time its execution and print the resulting timing statistics.\n",
" * [`%precision`](http://ipython.readthedocs.io/en/stable/interactive/magics.html?highlight=matplotlib#magic-precision) Set the precision for pretty printing of floating point values and NumPy arrays.\n",
" * [`%debug`](http://ipython.readthedocs.io/en/stable/interactive/magics.html?highlight=matplotlib#magic-debug) Activates the interactive debugger in a cell. Run after an exception has been occured to help diagnose the issue.\n",
" \n",
"### Plotting with `matplotlib`\n",
"\n",
"When setting up your environment one of the dependencies we asked you to install was `matplotlib`. This is an extensive plotting and data visualisation library which is tightly integrated with NumPy and Jupyter notebooks.\n",
"\n",
"When using `matplotlib` in a notebook you should first run the [magic command](http://ipython.readthedocs.io/en/stable/interactive/magics.html?highlight=matplotlib)\n",
"\n",
"```\n",
"%matplotlib inline\n",
"```\n",
"\n",
"This will cause all plots to be automatically displayed as images in the output area of the cell they are created in. Below we give a toy example of plotting two sinusoids using `matplotlib` to show case some of the basic plot options. To see the output produced select the cell and then run it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"\n",
"# generate a pair of sinusoids\n",
"x = np.linspace(0., 2. * np.pi, 100)\n",
"y1 = np.sin(x)\n",
"y2 = np.cos(x)\n",
"\n",
"# produce a new figure object with a defined (width, height) in inches\n",
"fig = plt.figure(figsize=(8, 4))\n",
"# add a single axis to the figure\n",
"ax = fig.add_subplot(111)\n",
"# plot the two sinusoidal traces on the axis, adjusting the line width\n",
"# and adding LaTeX legend labels\n",
"ax.plot(x, y1, linewidth=2, label=r'$\\sin(x)$')\n",
"ax.plot(x, y2, linewidth=2, label=r'$\\cos(x)$')\n",
"# set the axis labels\n",
"ax.set_xlabel('$x$', fontsize=16)\n",
"ax.set_ylabel('$y$', fontsize=16)\n",
"# force the legend to be displayed\n",
"ax.legend()\n",
"# adjust the limits of the horizontal axis\n",
"ax.set_xlim(0., 2. * np.pi)\n",
"# make a grid be displayed in the axis background\n",
"ax.grid(True)"
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long


@ -16,8 +16,10 @@ Conda can handle installation of the Python libraries we will be using and all t
There are several options available for installing Conda on a system. Here we will use the Python 3 version of [Miniconda](http://conda.pydata.org/miniconda.html), which installs just Conda and its dependencies. An alternative is to install the [Anaconda Python distribution](https://docs.continuum.io/anaconda/), which installs Conda and a large selection of popular Python packages. As we will require only a small subset of these packages we will use the more barebones Miniconda to avoid eating into your DICE disk quota too much, however if installing on a personal machine you may wish to consider Anaconda if you want to explore other Python packages.
## 2. Installing Miniconda
We provide instructions here for getting an environment with all the required dependencies running on computers running
the School of Informatics [DICE desktop](http://computing.help.inf.ed.ac.uk/dice-platform). The same instructions
should be usable on other Linux distributions such as Ubuntu and Linux Mint with minimal adjustments.
@ -32,7 +34,7 @@ If you are using ssh connection to the student server, move to the next step. If
We first need to download the latest 64-bit Python 3 Miniconda install script:
```bash
```
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
```
@ -40,7 +42,7 @@ This uses `wget` a command-line tool for downloading files.
Now run the install script:
```bash
```
bash Miniconda3-latest-Linux-x86_64.sh
```
@ -54,14 +56,14 @@ definition in `.bashrc`. As the DICE bash start-up mechanism differs from the st
On DICE, append the Miniconda binaries directory to `PATH` manually in `~/.benv` using
```bash
```
echo ". /afs/inf.ed.ac.uk/user/${USER:0:3}/$USER/miniconda3/etc/profile.d/conda.sh" >> ~/.bashrc
echo ". /afs/inf.ed.ac.uk/user/${USER:0:3}/$USER/miniconda3/etc/profile.d/conda.sh" >> ~/.benv
```
To avoid any errors later, check both the `~/.bashrc` and `~/.benv` files for the correct file path by running:
```bash
```
vim ~/.bashrc and vim ~/.benv
```
@ -69,43 +71,43 @@ For those who this appears a bit opaque to and want to know what is going on see
We now need to `source` the updated `~/.benv` so that the `PATH` variable in the current terminal session is updated:
```bash
```
source ~/.benv
```
From the next time you log in all future terminal sessions should have conda readily available via:
```bash
```
conda activate
```
## 3. Creating the Conda environment
You should now have a working Conda installation. If you run
```bash
```
conda --help
```
From a terminal you should see the Conda help page displayed. If you get a `No command 'conda' found` error you should check you have set up your `PATH` variable correctly (you can get a demonstrator to help you do this).
from a terminal you should see the Conda help page displayed. If you get a `No command 'conda' found` error you should check you have set up your `PATH` variable correctly (you can get a demonstrator to help you do this).
Assuming Conda is working, we will now create our Conda environment:
```bash
conda create -n mlp python=3.12.5 -y
```
conda create -n mlp python=3
```
This bootstraps a new Conda environment named `mlp` with a minimal Python 3 install.
This bootstraps a new Conda environment named `mlp` with a minimal Python 3 install. You will be presented with a 'package plan' listing the packages to be installed and asked whether to proceed: type `y` then enter.
We will now *activate* our created environment:
```bash
```
conda activate mlp
```
or on Windows only
```bash
```
activate mlp
```
@ -117,41 +119,38 @@ If you wish to deactivate an environment loaded in the current terminal e.g. to
We will now install the dependencies for the course into the new environment:
```bash
conda install numpy scipy matplotlib jupyter -y
```
conda install numpy scipy matplotlib jupyter
```
Wait for the packages to install (this should take around five minutes). In addition to Jupyter, NumPy and SciPy which we have already mentioned, we are also installing [matplotlib](http://matplotlib.org/) a plotting and visualisation library.
Again you will be given a list of the packages to be installed and asked to confirm whether to proceed. Enter `y` then wait for the packages to install (this should take around five minutes). In addition to Jupyter, NumPy and SciPy which we have already mentioned, we are also installing [matplotlib](http://matplotlib.org/) a plotting and visualisation library.
Install PyTorch. The command below installs the CPU-only version of PyTorch. If you have access to a CUDA-enabled GPU and wish to install the GPU version of PyTorch instead, replace `cpuonly -c pytorch` with your CUDA version reference, e.g. for CUDA 11.7 use `pytorch-cuda=11.7 -c pytorch -c nvidia` in the command below. For more information see [here](https://pytorch.org/get-started/locally/).
```bash
conda install pytorch torchvision torchaudio cpuonly -c pytorch -y
```
conda install pytorch torchvision torchaudio cpuonly -c pytorch
```
Once the installation is finished, to recover some disk space we can clear the package tarballs Conda just downloaded:
```bash
conda clean -t -y
```
conda clean -t
```
These tarballs are usually cached to allow quicker installation into additional environments; however, we will only be using a single environment here, so there is no need to keep them on disk.
***ANLP and IAML students only:***
To have normal access to your ANLP and IAML environments please do the following:
1. ```nano .condarc```
2. Add the following lines in the file:
```yml
```
envs_dirs:
- /group/teaching/conda/envs
- /group/teaching/conda/envs
pkgs_dirs:
- /group/teaching/conda/pkgs
- ~/miniconda3/pkgs
- /group/teaching/conda/pkgs
- ~/miniconda3/pkgs
```
3. Exit by using control + x and then choosing 'yes' at the exit prompt.
## 4. Getting the course code and a short introduction to Git
@ -168,7 +167,7 @@ https://github.com/VICO-UoE/mlpractical
Git is installed by default on DICE desktops. If you are running a system which does not have Git installed, you can use Conda to install it in your environment using:
```bash
```
conda install git
```
@ -189,30 +188,32 @@ If you are already familiar with Git you may wish to skip over the explanatory s
By default we will assume here that you are cloning to your home directory; however, if you have an existing system for organising your workspace, feel free to keep to that. **If you clone the repository to a path other than `~/mlpractical`, however, you will need to adjust all references to `~/mlpractical` in the commands below accordingly.**
To clone the `mlpractical` repository to the home directory run
```bash
```
git clone https://github.com/VICO-UoE/mlpractical.git ~/mlpractical
```
This will create a new `mlpractical` subdirectory with a local copy of the repository in it. Enter the directory and list all its contents, including hidden files, by running:
```bash
```
cd ~/mlpractical
ls -a # Windows equivalent: dir /a
```
For the most part this will look much like any other directory, containing the following three non-hidden sub-directories:
* `data`: Data files used in the labs and assignments.
* `mlp`: The custom Python package we will use in this course.
* `notebooks`: The Jupyter notebook files for each lab and coursework.
* `data`: Data files used in the labs and assignments.
* `mlp`: The custom Python package we will use in this course.
* `notebooks`: The Jupyter notebook files for each lab and coursework.
Additionally there exists a hidden `.git` subdirectory (on Unix systems by default files and directories prepended with a period '.' are hidden). This directory contains the repository history database and various configuration files and references. Unless you are sure you know what you are doing you generally should not edit any of the files in this directory directly. Generally most configuration options can be enacted more safely using a `git config` command.
For instance to globally set the user name and email used in commits you can run:
```bash
```
git config --global user.name "[your name]"
git config --global user.email "[matric-number]@sms.ed.ac.uk"
```
@ -235,19 +236,19 @@ A *commit* in Git is a snapshot of the state of the project. The snapshots are r
2. The files with changes to be committed (including any new files) are added to the *staging area* by running:
```bash
```
git add file1 file2 ...
```
3. Finally the *staged changes* are used to create a new commit by running
```bash
```
git commit -m "A commit message describing the changes."
```
This writes the staged changes as a new commit in the repository history. We can see a log of the details of previous commits by running:
```bash
```
git log
```
@ -259,17 +260,17 @@ A new branch is created from a commit on an existing branch. Any commits made to
A typical Git workflow in a software development setting would be to create a new branch whenever making changes to a project, for example to fix a bug or implement a new feature. These changes are then isolated from the main code base allowing regular commits without worrying about making unstable changes to the main code base. Key to this workflow is the ability to *merge* commits from a branch into another branch, e.g. when it is decided a new feature is sufficiently developed to be added to the main code base. Although merging branches is a key aspect of using Git in many projects, dealing with merge conflicts when two branches both make changes to the same parts of files can be a somewhat tricky process, so we will generally try to avoid the need for merges here.
We will therefore use branches here in a slightly non-standard way. The code for each week's lab and for each of the assignments will be maintained in a separate branch. This will allow us to stage the release of the notebooks and code for each lab and assignment while allowing you to commit the changes you make to the code each week without having to merge those changes when new code is released. Similarly this structure will allow us to release updated notebooks from previous labs with proposed solutions without overwriting your own work.
<p id='branching-explanation'>We will therefore use branches here in a slightly non-standard way. The code for each week's lab and for each of the assignments will be maintained in a separate branch. This will allow us to stage the release of the notebooks and code for each lab and assignment while allowing you to commit the changes you make to the code each week without having to merge those changes when new code is released. Similarly this structure will allow us to release updated notebooks from previous labs with proposed solutions without overwriting your own work.</p>
To list the branches present in the local repository, run:
```bash
```
git branch
```
This will display a list of branches with a `*` next to the current branch. To switch to a different existing branch in the local repository run
```bash
```
git checkout branch-name
```
@ -277,8 +278,8 @@ This will change the code in the working directory to the current state of the c
You should make sure you are on the first lab branch now by running:
```bash
git checkout mlp2024-25/lab1
```
git checkout mlp2023-24/lab1
```
## 6. Installing the `mlp` Python package
@ -291,7 +292,7 @@ The standard way to install a Python package using a `setup.py` script is to run
As we will be updating the code in the `mlp` package during the course of the labs this would require you to re-run `python setup.py install` every time a change is made to the package. Instead therefore you should install the package in development mode by running:
```bash
```
python setup.py develop
```
@ -303,20 +304,20 @@ Instead of copying the package, this will instead create a symbolic link to the
Note that after the first time a Python module is loaded into an interpreter instance, using for example:
```python
```
import mlp
```
Running the `import` statement any further times will have no effect even if the underlying module code has been changed. To reload an already imported module we instead need to use the [`importlib.reload`](https://docs.python.org/3/library/importlib.html#importlib.reload) function, e.g.
```python
```
import importlib
importlib.reload(mlp)
```
**Note: To be clear as this has caused some confusion in previous labs the above `import ...` / `reload(...)` statements should NOT be run directly in a bash terminal. They are examples Python statements - you could run them in a terminal by first loading a Python interpreter using:**
```bash
```
python
```
@ -330,7 +331,7 @@ We observed previously the presence of a `data` subdirectory in the local reposi
Assuming you used the recommended Miniconda install location and cloned the `mlpractical` repository to your home directory, this variable can be automatically defined when activating the environment by running the following commands (on non-Windows systems):
```bash
```
cd ~/miniconda3/envs/mlp
mkdir -p ./etc/conda/activate.d
mkdir -p ./etc/conda/deactivate.d
@ -343,12 +344,12 @@ export MLP_DATA_DIR=$HOME/mlpractical/data
And on Windows systems (replacing the `[]` placeholders with the relevant paths):
```bash
```
cd [path-to-conda-root]\envs\mlp
mkdir .\etc\conda\activate.d
mkdir .\etc\conda\deactivate.d
echo "set MLP_DATA_DIR=[path-to-local-repository]\data" >> .\etc\conda\activate.d\env_vars.bat
echo "set MLP_DATA_DIR=" >> .\etc\conda\deactivate.d\env_vars.bat
@echo "set MLP_DATA_DIR=[path-to-local-repository]\data" >> .\etc\conda\activate.d\env_vars.bat
@echo "set MLP_DATA_DIR=" >> .\etc\conda\deactivate.d\env_vars.bat
set MLP_DATA_DIR=[path-to-local-repository]\data
```
@ -362,7 +363,7 @@ There will be a Jupyter notebook available for each lab and assignment in this c
To open a notebook, you first need to launch a Jupyter notebook server instance. From within the `mlpractical` directory containing your local copy of the repository (and with the `mlp` environment activated) run:
```bash
```
jupyter notebook
```
@ -378,13 +379,13 @@ Below are instructions for setting up the environment without additional explana
Start a new bash terminal. Download the latest 64-bit Python 3 Miniconda install script:
```bash
```
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
```
Run the install script:
```bash
```
bash Miniconda3-latest-Linux-x86_64.sh
```
@ -393,70 +394,69 @@ Review the software license agreement and choose whether to accept. Assuming you
You will then be asked whether to prepend the Miniconda binaries directory to the `PATH` system environment variable definition in `.bashrc`. You should respond `no` here as we will set up the addition to `PATH` manually in the next step.
Append the Miniconda binaries directory to `PATH` manually in `~/.benv`:
```bash
```
echo ". /afs/inf.ed.ac.uk/user/${USER:0:3}/$USER/miniconda3/etc/profile.d/conda.sh" >> ~/.bashrc
echo ". /afs/inf.ed.ac.uk/user/${USER:0:3}/$USER/miniconda3/etc/profile.d/conda.sh" >> ~/.benv
```
`source` the updated `~/.benv`:
```bash
```
source ~/.benv
```
Create a new `mlp` Conda environment:
```bash
conda create -n mlp python=3.12.5 -y
```
conda create -n mlp python=3
```
Activate our created environment:
```bash
```
conda activate mlp
```
Install the dependencies for the course into the new environment:
```bash
conda install numpy scipy matplotlib jupyter -y
```
conda install numpy scipy matplotlib jupyter
```
Install PyTorch. The command below installs the CPU-only version of PyTorch. If you have access to a CUDA-enabled GPU and wish to install the GPU version of PyTorch instead, replace `cpuonly -c pytorch` with your CUDA version reference, e.g. for CUDA 11.7 use `pytorch-cuda=11.7 -c pytorch -c nvidia` in the command below. For more information see [here](https://pytorch.org/get-started/locally/).
```bash
conda install pytorch torchvision torchaudio cpuonly -c pytorch -y
```
conda install pytorch torchvision torchaudio cpuonly -c pytorch
```
Clear the package tarballs Conda just downloaded:
```bash
```
conda clean -t
```
Clone the course repository to your home directory:
```bash
```
git clone https://github.com/VICO-UoE/mlpractical.git ~/mlpractical
```
Make sure we are on the first lab branch
```bash
```
cd ~/mlpractical
git checkout mlp2024-25/lab1
git checkout mlp2023-24/lab1
```
Install the `mlp` package in the environment in develop mode
```bash
```
python ~/mlpractical/setup.py develop
```
Add an `MLP_DATA_DIR` variable to the environment
```bash
```
cd ~/miniconda3/envs/mlp
mkdir -p ./etc/conda/activate.d
mkdir -p ./etc/conda/deactivate.d
@ -469,13 +469,14 @@ export MLP_DATA_DIR=$HOME/mlpractical/data
The environment is now set up. Load the notebook server from the `mlpractical` directory
```bash
```
cd ~/mlpractical
jupyter notebook
```
and then open the first lab notebook from the `notebooks` directory.
---
<b id="f1">[1]</b> The `echo` command causes the following text to be streamed to an output (standard terminal output by default). Here we use the append redirection operator `>>` to redirect the `echo` output to a file `~/.benv`, with it being appended to the end of the current file. The text actually added is `export PATH="$PATH:[your-home-directory]/miniconda/bin"` with the `\"` being used to escape the quote characters. The `export` command defines system-wide environment variables (more rigorously those inherited by child shells) with `PATH` being the environment variable defining where `bash` searches for executables as a colon-seperated list of directories. Here we add the Miniconda binary directory to the end of the current `PATH` definition. [](#a1)

Binary files changed (contents not shown), including the new figure `notes/figures/boot_disk.png` and several other figure images added under `notes/figures/`.

@ -0,0 +1,125 @@
# PyTorch Experiment Framework
## What does this framework do?
The PyTorch experiment framework located in ```mlp/pytorch_mlp_framework``` includes tooling for building an array of deep neural networks,
including fully connected and convolutional networks. In addition, it includes tooling for running experiments,
handling and storing metrics, saving model weights, and checkpointing (allowing continuation from a previously saved point), as
well as keeping track of the best validation model, which is then used at the end to produce test set evaluation metrics.
## Why do we need it?
It serves two main purposes. The first is to allow you an easy, worry-free transition into using PyTorch for experiments
in your coursework. The second is to teach you good coding practices for building and running deep learning experiments
using PyTorch. The framework comes fully loaded with tooling that can keep track of relevant metrics, save models, resume from previously saved states and
even automatically choose the best validation model for test set evaluation. We include documentation and comments on almost
every line of code in the framework, to help you maximize your learning. The code style itself can be used for
learning good programming practices: structuring your code in a modular, readable and computationally efficient manner that minimizes the chance of user error.
## Installation
The first thing you have to do is activate your conda MLP environment.
### GPU version on Google Compute Engine
For usage on Google Cloud, the disk image we provide comes pre-loaded with all the packages you need to run the PyTorch
experiment framework, including PyTorch itself. When you created an instance and set up your environment, everything you need for this framework was installed, so there is no need for you to install PyTorch separately.
### CPU version on DICE (or other local machine)
If you do not have your MLP conda environment installed on your current machine, please follow the instructions in the [MLP environment installation guide](notes/environment-set-up.md). It includes an explanation of how to install a CPU version of PyTorch, or a GPU version if you have a GPU available on your local machine.
Once PyTorch is installed in your MLP conda environment, you can start using the framework. The framework has been built to allow you to control your experiment hyperparameters directly from the command line, using command-line argument parsing.
## Using the framework
You can get a list of all available hyperparameters and arguments by using:
```
python pytorch_mlp_framework/train_evaluate_image_classification_system.py -h
```
The `-h` at the end is short for `--help`, which presents a list of all possible arguments, each next to a description of what it modifies in the setup.
Once you execute that command, you should be able to see the following list:
```
Welcome to the MLP course's PyTorch training and inference helper script
optional arguments:
-h, --help show this help message and exit
--batch_size [BATCH_SIZE]
Batch_size for experiment
--continue_from_epoch [CONTINUE_FROM_EPOCH]
Which epoch to continue from.
If -2, continues from where it left off
If -1, starts from scratch
if >=0, continues from given epoch
--seed [SEED] Seed to use for random number generator for experiment
--image_num_channels [IMAGE_NUM_CHANNELS]
The channel dimensionality of our image-data
--image_height [IMAGE_HEIGHT]
Height of image data
--image_width [IMAGE_WIDTH]
Width of image data
--num_stages [NUM_STAGES]
Number of convolutional stages in the network. A stage
is considered a sequence of convolutional layers where
the input volume remains the same in the spatial
dimension and is always terminated by a dimensionality
reduction stage
--num_blocks_per_stage [NUM_BLOCKS_PER_STAGE]
Number of convolutional blocks in each stage, not
including the reduction stage. A convolutional block
is made up of two convolutional layers activated using
the leaky-relu non-linearity
--num_filters [NUM_FILTERS]
Number of convolutional filters per convolutional
layer in the network (excluding dimensionality
reduction layers)
--num_epochs [NUM_EPOCHS]
The experiment's epoch budget
--num_classes [NUM_CLASSES]
Number of classes in the dataset
--experiment_name [EXPERIMENT_NAME]
Experiment name - to be used for building the
experiment folder
--use_gpu [USE_GPU] A flag indicating whether we will use GPU acceleration
or not
--weight_decay_coefficient [WEIGHT_DECAY_COEFFICIENT]
Weight decay to use for Adam
--block_type BLOCK_TYPE
Type of convolutional blocks to use in our network
(This argument will be useful in running experiments
to debug your network)
```
For example, to run a simple experiment using a 7-layer convolutional network on the CPU you can run:
```
python pytorch_mlp_framework/train_evaluate_image_classification_system.py --batch_size 100 --seed 0 --num_filters 32 --num_stages 3 --num_blocks_per_stage 0 --experiment_name VGG_07 --num_classes 100 --block_type 'conv_block' --weight_decay_coefficient 0.00000 --use_gpu False
```
Your experiment should begin running.
Your experiment's statistics and model weights are saved in a directory named after the experiment (here `VGG_07/`), under
`VGG_07/result_outputs` and `VGG_07/saved_models` respectively.
To run on a GPU on Google Compute Engine the command would be:
```
python pytorch_mlp_framework/train_evaluate_image_classification_system.py --batch_size 100 --seed 0 --num_filters 32 --num_stages 3 --num_blocks_per_stage 0 --experiment_name VGG_07 --num_classes 100 --block_type 'conv_block' --weight_decay_coefficient 0.00000 --use_gpu True
```
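If a run is interrupted, it can be resumed from the latest saved checkpoint by passing `--continue_from_epoch -2` (as described in the argument list above; `-1` starts from scratch and a value `>=0` continues from that epoch). As an illustrative sketch, reusing the experiment name from the example above:
```
python pytorch_mlp_framework/train_evaluate_image_classification_system.py --batch_size 100 --seed 0 --num_filters 32 --num_stages 3 --num_blocks_per_stage 0 --experiment_name VGG_07 --num_classes 100 --block_type 'conv_block' --weight_decay_coefficient 0.00000 --use_gpu True --continue_from_epoch -2
```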
We have also provided the exact scripts we used to run the experiments of VGG07 and VGG37 as shown in the coursework spec inside the files:
- run_vgg_08_default.sh
- run_vgg_38_default.sh
**However, remember, if you want to reuse those scripts for your own investigations, change the experiment name and seed.
If you do not change the name, the old folders will be overwritten.**
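For reference, a minimal sketch of what such a run script might contain (the provided scripts may differ in detail; the experiment name and seed below are placeholders that you should change for your own runs):
```
#!/bin/bash
# Hypothetical run script sketch; change --experiment_name and --seed before reusing.
python pytorch_mlp_framework/train_evaluate_image_classification_system.py --batch_size 100 --seed 1 --num_filters 32 --num_stages 3 --num_blocks_per_stage 0 --experiment_name VGG_08_my_run --num_classes 100 --block_type 'conv_block' --weight_decay_coefficient 0.00000 --use_gpu True
```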
## So, where can I ask more questions and find more information on PyTorch and what it can do?
Your first course of action should be to search the web and then to refer to the PyTorch [documentation](https://pytorch.org/docs/stable/index.html),
[tutorials](https://pytorch.org/tutorials/) and [GitHub](https://github.com/pytorch/pytorch) sites.
If you still can't get an answer to your question then, as always, post on Piazza and/or come to the lab sessions.


@ -0,0 +1,133 @@
import argparse
def str2bool(v):
if v.lower() in ("yes", "true", "t", "y", "1"):
return True
elif v.lower() in ("no", "false", "f", "n", "0"):
return False
else:
raise argparse.ArgumentTypeError("Boolean value expected.")
def get_args():
"""
    Returns the arguments extracted from the command line via argparse.
    :return: An argparse.Namespace containing the parsed arguments
"""
parser = argparse.ArgumentParser(
description="Welcome to the MLP course's Pytorch training and inference helper script"
)
parser.add_argument(
"--batch_size",
nargs="?",
type=int,
default=100,
help="Batch_size for experiment",
)
parser.add_argument(
"--continue_from_epoch",
nargs="?",
type=int,
default=-1,
help="Epoch you want to continue training from while restarting an experiment",
)
parser.add_argument(
"--seed",
nargs="?",
type=int,
default=7112018,
help="Seed to use for random number generator for experiment",
)
parser.add_argument(
"--image_num_channels",
nargs="?",
type=int,
default=3,
help="The channel dimensionality of our image-data",
)
parser.add_argument(
"--learning-rate",
nargs="?",
type=float,
default=1e-3,
help="The learning rate (default 1e-3)",
)
parser.add_argument(
"--image_height", nargs="?", type=int, default=32, help="Height of image data"
)
parser.add_argument(
"--image_width", nargs="?", type=int, default=32, help="Width of image data"
)
parser.add_argument(
"--num_stages",
nargs="?",
type=int,
default=3,
help="Number of convolutional stages in the network. A stage is considered a sequence of "
"convolutional layers where the input volume remains the same in the spacial dimension and"
" is always terminated by a dimensionality reduction stage",
)
parser.add_argument(
"--num_blocks_per_stage",
nargs="?",
type=int,
default=5,
help="Number of convolutional blocks in each stage, not including the reduction stage."
" A convolutional block is made up of two convolutional layers activated using the "
" leaky-relu non-linearity",
)
parser.add_argument(
"--num_filters",
nargs="?",
type=int,
default=16,
help="Number of convolutional filters per convolutional layer in the network (excluding "
"dimensionality reduction layers)",
)
parser.add_argument(
"--num_epochs",
nargs="?",
type=int,
default=100,
help="Total number of epochs for model training",
)
parser.add_argument(
"--num_classes",
nargs="?",
type=int,
default=100,
help="Number of classes in the dataset",
)
parser.add_argument(
"--experiment_name",
nargs="?",
type=str,
default="exp_1",
help="Experiment name - to be used for building the experiment folder",
)
parser.add_argument(
"--use_gpu",
nargs="?",
type=str2bool,
default=True,
help="A flag indicating whether we will use GPU acceleration or not",
)
parser.add_argument(
"--weight_decay_coefficient",
nargs="?",
type=float,
default=0,
help="Weight decay to use for Adam",
)
parser.add_argument(
"--block_type",
type=str,
default="conv_block",
help="Type of convolutional blocks to use in our network"
"(This argument will be useful in running experiments to debug your network)",
)
args = parser.parse_args()
print(args)
return args


@ -0,0 +1,462 @@
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import tqdm
import os
import numpy as np
import time
from pytorch_mlp_framework.storage_utils import save_statistics
from matplotlib import pyplot as plt
import matplotlib
matplotlib.rcParams.update({"font.size": 8})
class ExperimentBuilder(nn.Module):
def __init__(
self,
network_model,
experiment_name,
num_epochs,
train_data,
val_data,
test_data,
weight_decay_coefficient,
learning_rate,
use_gpu,
continue_from_epoch=-1,
):
"""
Initializes an ExperimentBuilder object. Such an object takes care of running training and evaluation of a deep net
on a given dataset. It also takes care of saving per epoch models and automatically inferring the best val model
to be used for evaluating the test set metrics.
:param network_model: A pytorch nn.Module which implements a network architecture.
        :param experiment_name: The name of the experiment. This is used mainly for keeping track of the experiment and creating a directory structure that will be used to save logs, model parameters and other outputs.
:param num_epochs: Total number of epochs to run the experiment
:param train_data: An object of the DataProvider type. Contains the training set.
:param val_data: An object of the DataProvider type. Contains the val set.
:param test_data: An object of the DataProvider type. Contains the test set.
:param weight_decay_coefficient: A float indicating the weight decay to use with the adam optimizer.
:param use_gpu: A boolean indicating whether to use a GPU or not.
        :param continue_from_epoch: An int indicating whether we'll start from scratch (-1) or whether we'll reload a previously saved model of epoch 'continue_from_epoch' and continue training from there.
"""
super(ExperimentBuilder, self).__init__()
self.experiment_name = experiment_name
self.model = network_model
if torch.cuda.device_count() >= 1 and use_gpu:
self.device = torch.device("cuda")
self.model.to(self.device) # sends the model from the cpu to the gpu
print("Use GPU", self.device)
else:
print("use CPU")
self.device = torch.device("cpu") # sets the device to be CPU
print(self.device)
print("here")
self.model.reset_parameters() # re-initialize network parameters
self.train_data = train_data
self.val_data = val_data
self.test_data = test_data
print("System learnable parameters")
num_conv_layers = 0
num_linear_layers = 0
total_num_parameters = 0
for name, value in self.named_parameters():
print(name, value.shape)
if all(item in name for item in ["conv", "weight"]):
num_conv_layers += 1
if all(item in name for item in ["linear", "weight"]):
num_linear_layers += 1
total_num_parameters += np.prod(value.shape)
print("Total number of parameters", total_num_parameters)
print("Total number of conv layers", num_conv_layers)
print("Total number of linear layers", num_linear_layers)
print(f"Learning rate: {learning_rate}")
self.optimizer = optim.Adam(
self.parameters(),
amsgrad=False,
weight_decay=weight_decay_coefficient,
lr=learning_rate,
)
self.learning_rate_scheduler = optim.lr_scheduler.CosineAnnealingLR(
self.optimizer, T_max=num_epochs, eta_min=0.00002
)
# Generate the directory names
self.experiment_folder = os.path.abspath(experiment_name)
self.experiment_logs = os.path.abspath(
os.path.join(self.experiment_folder, "result_outputs")
)
self.experiment_saved_models = os.path.abspath(
os.path.join(self.experiment_folder, "saved_models")
)
# Set best models to be at 0 since we are just starting
self.best_val_model_idx = 0
self.best_val_model_acc = 0.0
if not os.path.exists(
self.experiment_folder
): # If experiment directory does not exist
os.mkdir(self.experiment_folder) # create the experiment directory
os.mkdir(self.experiment_logs) # create the experiment log directory
os.mkdir(
self.experiment_saved_models
) # create the experiment saved models directory
self.num_epochs = num_epochs
self.criterion = nn.CrossEntropyLoss().to(
self.device
) # send the loss computation to the GPU
if (
continue_from_epoch == -2
): # if continue from epoch is -2 then continue from latest saved model
self.state, self.best_val_model_idx, self.best_val_model_acc = (
self.load_model(
model_save_dir=self.experiment_saved_models,
model_save_name="train_model",
model_idx="latest",
)
) # reload existing model from epoch and return best val model index
# and the best val acc of that model
self.starting_epoch = int(self.state["model_epoch"])
elif continue_from_epoch > -1: # if continue from epoch is greater than -1 then
self.state, self.best_val_model_idx, self.best_val_model_acc = (
self.load_model(
model_save_dir=self.experiment_saved_models,
model_save_name="train_model",
model_idx=continue_from_epoch,
)
) # reload existing model from epoch and return best val model index
# and the best val acc of that model
self.starting_epoch = continue_from_epoch
else:
self.state = dict()
self.starting_epoch = 0
def get_num_parameters(self):
total_num_params = 0
for param in self.parameters():
total_num_params += np.prod(param.shape)
return total_num_params
def plot_func_def(self, all_grads, layers):
"""
Plot function definition to plot the average gradient with respect to the number of layers in the given model
:param all_grads: Gradients wrt weights for each layer in the model.
:param layers: Layer names corresponding to the model parameters
:return: plot for gradient flow
"""
plt.plot(all_grads, alpha=0.3, color="b")
plt.hlines(0, 0, len(all_grads) + 1, linewidth=1, color="k")
plt.xticks(range(0, len(all_grads), 1), layers, rotation="vertical")
plt.xlim(xmin=0, xmax=len(all_grads))
plt.xlabel("Layers")
plt.ylabel("Average Gradient")
plt.title("Gradient flow")
plt.grid(True)
plt.tight_layout()
return plt
def plot_grad_flow(self, named_parameters):
"""
The function is being called in Line 298 of this file.
Receives the parameters of the model being trained. Returns plot of gradient flow for the given model parameters.
"""
all_grads = []
layers = []
"""
Complete the code in the block below to collect absolute mean of the gradients for each layer in all_grads with the layer names in layers.
"""
for name, param in named_parameters:
if "bias" in name:
continue
# Check if the parameter requires gradient and has a gradient
if param.requires_grad and param.grad is not None:
try:
_, a, _, b, _ = name.split(".", 4)
except:
b, a = name.split(".", 1)
layers.append(f"{a}_{b}")
# Collect the mean of the absolute gradients
all_grads.append(param.grad.abs().mean().item())
plt = self.plot_func_def(all_grads, layers)
return plt
def run_train_iter(self, x, y):
self.train() # sets model to training mode (in case batch normalization or other methods have different procedures for training and evaluation)
x, y = x.float().to(device=self.device), y.long().to(
device=self.device
) # send data to device as torch tensors
out = self.model.forward(x) # forward the data in the model
loss = F.cross_entropy(input=out, target=y) # compute loss
self.optimizer.zero_grad() # set all weight grads from previous training iters to 0
loss.backward() # backpropagate to compute gradients for current iter loss
self.optimizer.step() # update network parameters
self.learning_rate_scheduler.step() # update learning rate scheduler
_, predicted = torch.max(out.data, 1) # get argmax of predictions
accuracy = np.mean(list(predicted.eq(y.data).cpu())) # compute accuracy
return loss.cpu().data.numpy(), accuracy
def run_evaluation_iter(self, x, y):
"""
        Receives the inputs and targets for the model and runs an evaluation iteration. Returns loss and accuracy metrics.
:param x: The inputs to the model. A numpy array of shape batch_size, channels, height, width
:param y: The targets for the model. A numpy array of shape batch_size, num_classes
:return: the loss and accuracy for this batch
"""
self.eval() # sets the system to validation mode
x, y = x.float().to(device=self.device), y.long().to(
device=self.device
) # convert data to pytorch tensors and send to the computation device
out = self.model.forward(x) # forward the data in the model
loss = F.cross_entropy(input=out, target=y) # compute loss
_, predicted = torch.max(out.data, 1) # get argmax of predictions
accuracy = np.mean(list(predicted.eq(y.data).cpu())) # compute accuracy
return loss.cpu().data.numpy(), accuracy
def save_model(
self,
model_save_dir,
model_save_name,
model_idx,
best_validation_model_idx,
best_validation_model_acc,
):
"""
Save the network parameter state and current best val epoch idx and best val accuracy.
:param model_save_name: Name to use to save model without the epoch index
:param model_idx: The index to save the model with.
:param best_validation_model_idx: The index of the best validation model to be stored for future use.
:param best_validation_model_acc: The best validation accuracy to be stored for use at test time.
:param model_save_dir: The directory to store the state at.
:param state: The dictionary containing the system state.
"""
self.state["network"] = (
self.state_dict()
) # save network parameter and other variables.
self.state["best_val_model_idx"] = (
best_validation_model_idx # save current best val idx
)
self.state["best_val_model_acc"] = (
best_validation_model_acc # save current best val acc
)
torch.save(
self.state,
f=os.path.join(
model_save_dir, "{}_{}".format(model_save_name, str(model_idx))
),
) # save state at prespecified filepath
def load_model(self, model_save_dir, model_save_name, model_idx):
"""
Load the network parameter state and the best val model idx and best val acc to be compared with the future val accuracies, in order to choose the best val model
:param model_save_dir: The directory to store the state at.
:param model_save_name: Name to use to save model without the epoch index
:param model_idx: The index to save the model with.
:return: best val idx and best val model acc, also it loads the network state into the system state without returning it
"""
state = torch.load(
f=os.path.join(
model_save_dir, "{}_{}".format(model_save_name, str(model_idx))
)
)
self.load_state_dict(state_dict=state["network"])
return state, state["best_val_model_idx"], state["best_val_model_acc"]
def run_experiment(self):
"""
        Runs experiment train and evaluation iterations, saving the model, the best validation model index and the best validation accuracy after each epoch
:return: The summary current_epoch_losses from starting epoch to total_epochs.
"""
total_losses = {
"train_acc": [],
"train_loss": [],
"val_acc": [],
"val_loss": [],
} # initialize a dict to keep the per-epoch metrics
for i, epoch_idx in enumerate(range(self.starting_epoch, self.num_epochs)):
epoch_start_time = time.time()
current_epoch_losses = {
"train_acc": [],
"train_loss": [],
"val_acc": [],
"val_loss": [],
}
self.current_epoch = epoch_idx
with tqdm.tqdm(
total=len(self.train_data)
) as pbar_train: # create a progress bar for training
for idx, (x, y) in enumerate(self.train_data): # get data batches
loss, accuracy = self.run_train_iter(
x=x, y=y
) # take a training iter step
current_epoch_losses["train_loss"].append(
loss
) # add current iter loss to the train loss list
current_epoch_losses["train_acc"].append(
accuracy
) # add current iter acc to the train acc list
pbar_train.update(1)
pbar_train.set_description(
"loss: {:.4f}, accuracy: {:.4f}".format(loss, accuracy)
)
with tqdm.tqdm(
total=len(self.val_data)
) as pbar_val: # create a progress bar for validation
for x, y in self.val_data: # get data batches
loss, accuracy = self.run_evaluation_iter(
x=x, y=y
) # run a validation iter
current_epoch_losses["val_loss"].append(
loss
) # add current iter loss to val loss list.
current_epoch_losses["val_acc"].append(
accuracy
) # add current iter acc to val acc lst.
pbar_val.update(1) # add 1 step to the progress bar
pbar_val.set_description(
"loss: {:.4f}, accuracy: {:.4f}".format(loss, accuracy)
)
val_mean_accuracy = np.mean(current_epoch_losses["val_acc"])
if (
val_mean_accuracy > self.best_val_model_acc
): # if current epoch's mean val acc is greater than the saved best val acc then
self.best_val_model_acc = val_mean_accuracy # set the best val model acc to be current epoch's val accuracy
self.best_val_model_idx = epoch_idx # set the experiment-wise best val idx to be the current epoch's idx
for key, value in current_epoch_losses.items():
total_losses[key].append(
np.mean(value)
) # get mean of all metrics of current epoch metrics dict, to get them ready for storage and output on the terminal.
save_statistics(
experiment_log_dir=self.experiment_logs,
filename="summary.csv",
stats_dict=total_losses,
current_epoch=i,
continue_from_mode=(
True if (self.starting_epoch != 0 or i > 0) else False
),
) # save statistics to stats file.
# load_statistics(experiment_log_dir=self.experiment_logs, filename='summary.csv') # How to load a csv file if you need to
out_string = "_".join(
[
"{}_{:.4f}".format(key, np.mean(value))
for key, value in current_epoch_losses.items()
]
)
# create a string to use to report our epoch metrics
epoch_elapsed_time = (
time.time() - epoch_start_time
) # calculate time taken for epoch
epoch_elapsed_time = "{:.4f}".format(epoch_elapsed_time)
print(
"Epoch {}:".format(epoch_idx),
out_string,
"epoch time",
epoch_elapsed_time,
"seconds",
)
self.state["model_epoch"] = epoch_idx
self.save_model(
model_save_dir=self.experiment_saved_models,
# save model and best val idx and best val acc, using the model dir, model name and model idx
model_save_name="train_model",
model_idx=epoch_idx,
best_validation_model_idx=self.best_val_model_idx,
best_validation_model_acc=self.best_val_model_acc,
)
self.save_model(
model_save_dir=self.experiment_saved_models,
# save model and best val idx and best val acc, using the model dir, model name and model idx
model_save_name="train_model",
model_idx="latest",
best_validation_model_idx=self.best_val_model_idx,
best_validation_model_acc=self.best_val_model_acc,
)
################################################################
##### Plot Gradient Flow at each Epoch during Training ######
print("Generating Gradient Flow Plot at epoch {}".format(epoch_idx))
plt = self.plot_grad_flow(self.model.named_parameters())
if not os.path.exists(
os.path.join(self.experiment_saved_models, "gradient_flow_plots")
):
os.mkdir(
os.path.join(self.experiment_saved_models, "gradient_flow_plots")
)
# plt.legend(loc="best")
plt.savefig(
os.path.join(
self.experiment_saved_models,
"gradient_flow_plots",
"epoch{}.pdf".format(str(epoch_idx)),
)
)
################################################################
print("Generating test set evaluation metrics")
self.load_model(
model_save_dir=self.experiment_saved_models,
model_idx=self.best_val_model_idx,
# load best validation model
model_save_name="train_model",
)
current_epoch_losses = {
"test_acc": [],
"test_loss": [],
} # initialize a statistics dict
with tqdm.tqdm(total=len(self.test_data)) as pbar_test: # ini a progress bar
for x, y in self.test_data: # sample batch
loss, accuracy = self.run_evaluation_iter(
x=x, y=y
) # compute loss and accuracy by running an evaluation step
current_epoch_losses["test_loss"].append(loss) # save test loss
current_epoch_losses["test_acc"].append(accuracy) # save test accuracy
pbar_test.update(1) # update progress bar status
pbar_test.set_description(
"loss: {:.4f}, accuracy: {:.4f}".format(loss, accuracy)
) # update progress bar string output
test_losses = {
key: [np.mean(value)] for key, value in current_epoch_losses.items()
} # save test set metrics in dict format
save_statistics(
experiment_log_dir=self.experiment_logs,
filename="test_summary.csv",
# save test set metrics on disk in .csv format
stats_dict=test_losses,
current_epoch=0,
continue_from_mode=False,
)
return total_losses, test_losses


@ -0,0 +1,640 @@
import torch
import torch.nn as nn
import torch.nn.functional as F
class FCCNetwork(nn.Module):
def __init__(
self, input_shape, num_output_classes, num_filters, num_layers, use_bias=False
):
"""
Initializes a fully connected network similar to the ones implemented previously in the MLP package.
:param input_shape: The shape of the inputs going in to the network.
:param num_output_classes: The number of outputs the network should have (for classification those would be the number of classes)
:param num_filters: Number of filters used in every fcc layer.
:param num_layers: Number of fcc layers (excluding dim reduction stages)
:param use_bias: Whether our fcc layers will use a bias.
"""
super(FCCNetwork, self).__init__()
# set up class attributes useful in building the network and inference
self.input_shape = input_shape
self.num_filters = num_filters
self.num_output_classes = num_output_classes
self.use_bias = use_bias
self.num_layers = num_layers
# initialize a module dict, which is effectively a dictionary that can collect layers and integrate them into pytorch
self.layer_dict = nn.ModuleDict()
# build the network
self.build_module()
def build_module(self):
print("Building basic block of FCCNetwork using input shape", self.input_shape)
x = torch.zeros((self.input_shape))
out = x
out = out.view(out.shape[0], -1)
# flatten inputs to shape (b, -1) where -1 is the dim resulting from multiplying the
# shapes of all dimensions after the 0th dim
for i in range(self.num_layers):
self.layer_dict["fcc_{}".format(i)] = nn.Linear(
in_features=out.shape[1], # initialize a fcc layer
out_features=self.num_filters,
bias=self.use_bias,
)
out = self.layer_dict["fcc_{}".format(i)](
out
) # apply ith fcc layer to the previous layers outputs
out = F.relu(out) # apply a ReLU on the outputs
self.logits_linear_layer = nn.Linear(
in_features=out.shape[1], # initialize the prediction output linear layer
out_features=self.num_output_classes,
bias=self.use_bias,
)
out = self.logits_linear_layer(
out
) # apply the layer to the previous layer's outputs
print("Block is built, output volume is", out.shape)
return out
def forward(self, x):
"""
Forward prop data through the network and return the preds
        :param x: Input batch x, of shape (batch_size, ...); samples can have any dimensionality.
:return: preds of shape (b, num_classes)
"""
out = x
out = out.view(out.shape[0], -1)
# flatten inputs to shape (b, -1) where -1 is the dim resulting from multiplying the
# shapes of all dimensions after the 0th dim
for i in range(self.num_layers):
out = self.layer_dict["fcc_{}".format(i)](
out
) # apply ith fcc layer to the previous layers outputs
out = F.relu(out) # apply a ReLU on the outputs
out = self.logits_linear_layer(
out
) # apply the layer to the previous layer's outputs
return out
def reset_parameters(self):
"""
        Re-initializes the network's parameters
"""
for item in self.layer_dict.children():
item.reset_parameters()
self.logits_linear_layer.reset_parameters()
class EmptyBlock(nn.Module):
def __init__(
self,
input_shape=None,
num_filters=None,
kernel_size=None,
padding=None,
bias=None,
dilation=None,
reduction_factor=None,
):
super(EmptyBlock, self).__init__()
self.num_filters = num_filters
self.kernel_size = kernel_size
self.input_shape = input_shape
self.padding = padding
self.bias = bias
self.dilation = dilation
self.build_module()
def build_module(self):
self.layer_dict = nn.ModuleDict()
x = torch.zeros(self.input_shape)
self.layer_dict["Identity"] = nn.Identity()
def forward(self, x):
out = x
out = self.layer_dict["Identity"].forward(out)
return out
class EntryConvolutionalBlock(nn.Module):
def __init__(self, input_shape, num_filters, kernel_size, padding, bias, dilation):
super(EntryConvolutionalBlock, self).__init__()
self.num_filters = num_filters
self.kernel_size = kernel_size
self.input_shape = input_shape
self.padding = padding
self.bias = bias
self.dilation = dilation
self.build_module()
def build_module(self):
self.layer_dict = nn.ModuleDict()
x = torch.zeros(self.input_shape)
out = x
self.layer_dict["conv_0"] = nn.Conv2d(
in_channels=out.shape[1],
out_channels=self.num_filters,
bias=self.bias,
kernel_size=self.kernel_size,
dilation=self.dilation,
padding=self.padding,
stride=1,
)
out = self.layer_dict["conv_0"].forward(out)
self.layer_dict["bn_0"] = nn.BatchNorm2d(num_features=out.shape[1])
out = F.leaky_relu(self.layer_dict["bn_0"].forward(out))
print(out.shape)
def forward(self, x):
out = x
out = self.layer_dict["conv_0"].forward(out)
out = F.leaky_relu(self.layer_dict["bn_0"].forward(out))
return out
class ConvolutionalProcessingBlock(nn.Module):
def __init__(self, input_shape, num_filters, kernel_size, padding, bias, dilation):
super(ConvolutionalProcessingBlock, self).__init__()
self.num_filters = num_filters
self.kernel_size = kernel_size
self.input_shape = input_shape
self.padding = padding
self.bias = bias
self.dilation = dilation
self.build_module()
def build_module(self):
self.layer_dict = nn.ModuleDict()
x = torch.zeros(self.input_shape)
out = x
self.layer_dict["conv_0"] = nn.Conv2d(
in_channels=out.shape[1],
out_channels=self.num_filters,
bias=self.bias,
kernel_size=self.kernel_size,
dilation=self.dilation,
padding=self.padding,
stride=1,
)
out = self.layer_dict["conv_0"].forward(out)
out = F.leaky_relu(out)
self.layer_dict["conv_1"] = nn.Conv2d(
in_channels=out.shape[1],
out_channels=self.num_filters,
bias=self.bias,
kernel_size=self.kernel_size,
dilation=self.dilation,
padding=self.padding,
stride=1,
)
out = self.layer_dict["conv_1"].forward(out)
out = F.leaky_relu(out)
print(out.shape)
def forward(self, x):
out = x
out = self.layer_dict["conv_0"].forward(out)
out = F.leaky_relu(out)
out = self.layer_dict["conv_1"].forward(out)
out = F.leaky_relu(out)
return out
class ConvolutionalDimensionalityReductionBlock(nn.Module):
def __init__(
self,
input_shape,
num_filters,
kernel_size,
padding,
bias,
dilation,
reduction_factor,
):
super(ConvolutionalDimensionalityReductionBlock, self).__init__()
self.num_filters = num_filters
self.kernel_size = kernel_size
self.input_shape = input_shape
self.padding = padding
self.bias = bias
self.dilation = dilation
self.reduction_factor = reduction_factor
self.build_module()
def build_module(self):
self.layer_dict = nn.ModuleDict()
x = torch.zeros(self.input_shape)
out = x
self.layer_dict["conv_0"] = nn.Conv2d(
in_channels=out.shape[1],
out_channels=self.num_filters,
bias=self.bias,
kernel_size=self.kernel_size,
dilation=self.dilation,
padding=self.padding,
stride=1,
)
out = self.layer_dict["conv_0"].forward(out)
out = F.leaky_relu(out)
out = F.avg_pool2d(out, self.reduction_factor)
self.layer_dict["conv_1"] = nn.Conv2d(
in_channels=out.shape[1],
out_channels=self.num_filters,
bias=self.bias,
kernel_size=self.kernel_size,
dilation=self.dilation,
padding=self.padding,
stride=1,
)
out = self.layer_dict["conv_1"].forward(out)
out = F.leaky_relu(out)
print(out.shape)
def forward(self, x):
out = x
out = self.layer_dict["conv_0"].forward(out)
out = F.leaky_relu(out)
out = F.avg_pool2d(out, self.reduction_factor)
out = self.layer_dict["conv_1"].forward(out)
out = F.leaky_relu(out)
return out
class ConvolutionalNetwork(nn.Module):
def __init__(
self,
input_shape,
num_output_classes,
num_filters,
num_blocks_per_stage,
num_stages,
use_bias=False,
processing_block_type=ConvolutionalProcessingBlock,
dimensionality_reduction_block_type=ConvolutionalDimensionalityReductionBlock,
):
"""
Initializes a convolutional network module
:param input_shape: The shape of the tensor to be passed into this network
:param num_output_classes: Number of output classes
:param num_filters: Number of filters per convolutional layer
:param num_blocks_per_stage: Number of blocks per "stage". Each block is composed of 2 convolutional layers.
:param num_stages: Number of stages in a network. A stage is defined as a sequence of layers within which the
        data dimensionality remains constant in the spatial axis (h, w) and can change in the channel axis. After each stage
there exists a dimensionality reduction stage, composed of two convolutional layers and an avg pooling layer.
:param use_bias: Whether to use biases in our convolutional layers
:param processing_block_type: Type of processing block to use within our stages
:param dimensionality_reduction_block_type: Type of dimensionality reduction block to use after each stage in our network
"""
super(ConvolutionalNetwork, self).__init__()
# set up class attributes useful in building the network and inference
self.input_shape = input_shape
self.num_filters = num_filters
self.num_output_classes = num_output_classes
self.use_bias = use_bias
self.num_blocks_per_stage = num_blocks_per_stage
self.num_stages = num_stages
self.processing_block_type = processing_block_type
self.dimensionality_reduction_block_type = dimensionality_reduction_block_type
# build the network
self.build_module()
def build_module(self):
"""
Builds network whilst automatically inferring shapes of layers.
"""
self.layer_dict = nn.ModuleDict()
# initialize a module dict, which is effectively a dictionary that can collect layers and integrate them into pytorch
print(
"Building basic block of ConvolutionalNetwork using input shape",
self.input_shape,
)
x = torch.zeros(
(self.input_shape)
) # create dummy inputs to be used to infer shapes of layers
out = x
self.layer_dict["input_conv"] = EntryConvolutionalBlock(
input_shape=out.shape,
num_filters=self.num_filters,
kernel_size=3,
padding=1,
bias=self.use_bias,
dilation=1,
)
out = self.layer_dict["input_conv"].forward(out)
# torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)
for i in range(self.num_stages): # for number of layers times
for j in range(self.num_blocks_per_stage):
self.layer_dict["block_{}_{}".format(i, j)] = (
self.processing_block_type(
input_shape=out.shape,
num_filters=self.num_filters,
bias=self.use_bias,
kernel_size=3,
dilation=1,
padding=1,
)
)
out = self.layer_dict["block_{}_{}".format(i, j)].forward(out)
self.layer_dict["reduction_block_{}".format(i)] = (
self.dimensionality_reduction_block_type(
input_shape=out.shape,
num_filters=self.num_filters,
bias=True,
kernel_size=3,
dilation=1,
padding=1,
reduction_factor=2,
)
)
out = self.layer_dict["reduction_block_{}".format(i)].forward(out)
out = F.avg_pool2d(out, out.shape[-1])
print("shape before final linear layer", out.shape)
out = out.view(out.shape[0], -1)
self.logit_linear_layer = nn.Linear(
in_features=out.shape[1], # add a linear layer
out_features=self.num_output_classes,
bias=True,
)
out = self.logit_linear_layer(out) # apply linear layer on flattened inputs
print("Block is built, output volume is", out.shape)
return out
def forward(self, x):
"""
        Forward propagates the network given an input batch
:param x: Inputs x (b, c, h, w)
:return: preds (b, num_classes)
"""
out = x
out = self.layer_dict["input_conv"].forward(out)
for i in range(self.num_stages): # for number of layers times
for j in range(self.num_blocks_per_stage):
out = self.layer_dict["block_{}_{}".format(i, j)].forward(out)
out = self.layer_dict["reduction_block_{}".format(i)].forward(out)
out = F.avg_pool2d(out, out.shape[-1])
out = out.view(
out.shape[0], -1
) # flatten outputs from (b, c, h, w) to (b, c*h*w)
out = self.logit_linear_layer(
out
) # pass through a linear layer to get logits/preds
return out
def reset_parameters(self):
"""
Re-initialize the network parameters.
"""
for item in self.layer_dict.children():
try:
item.reset_parameters()
except:
pass
self.logit_linear_layer.reset_parameters()
# My Implementation:
class ConvolutionalProcessingBlockBN(nn.Module):
def __init__(self, input_shape, num_filters, kernel_size, padding, bias, dilation):
super().__init__()
self.num_filters = num_filters
self.kernel_size = kernel_size
self.input_shape = input_shape
self.padding = padding
self.bias = bias
self.dilation = dilation
self.build_module()
def build_module(self):
self.layer_dict = nn.ModuleDict()
x = torch.zeros(self.input_shape)
out = x
# First convolutional layer with Batch Normalization
self.layer_dict["conv_0"] = nn.Conv2d(
in_channels=out.shape[1],
out_channels=self.num_filters,
bias=self.bias,
kernel_size=self.kernel_size,
dilation=self.dilation,
padding=self.padding,
stride=1,
)
self.layer_dict["bn_0"] = nn.BatchNorm2d(self.num_filters)
out = F.leaky_relu(self.layer_dict["bn_0"](self.layer_dict["conv_0"](out)))
# Second convolutional layer with Batch Normalization
self.layer_dict["conv_1"] = nn.Conv2d(
in_channels=out.shape[1],
out_channels=self.num_filters,
bias=self.bias,
kernel_size=self.kernel_size,
dilation=self.dilation,
padding=self.padding,
stride=1,
)
self.layer_dict["bn_1"] = nn.BatchNorm2d(self.num_filters)
out = F.leaky_relu(self.layer_dict["bn_1"](self.layer_dict["conv_1"](out)))
print(out.shape)
def forward(self, x):
out = x
# Apply first conv layer + BN + ReLU
out = F.leaky_relu(self.layer_dict["bn_0"](self.layer_dict["conv_0"](out)))
# Apply second conv layer + BN + ReLU
out = F.leaky_relu(self.layer_dict["bn_1"](self.layer_dict["conv_1"](out)))
return out
class ConvolutionalDimensionalityReductionBlockBN(nn.Module):
def __init__(
self,
input_shape,
num_filters,
kernel_size,
padding,
bias,
dilation,
reduction_factor,
):
super().__init__()
self.num_filters = num_filters
self.kernel_size = kernel_size
self.input_shape = input_shape
self.padding = padding
self.bias = bias
self.dilation = dilation
self.reduction_factor = reduction_factor
self.build_module()
def build_module(self):
self.layer_dict = nn.ModuleDict()
x = torch.zeros(self.input_shape)
out = x
# First convolutional layer with Batch Normalization
self.layer_dict["conv_0"] = nn.Conv2d(
in_channels=out.shape[1],
out_channels=self.num_filters,
bias=self.bias,
kernel_size=self.kernel_size,
dilation=self.dilation,
padding=self.padding,
stride=1,
)
self.layer_dict["bn_0"] = nn.BatchNorm2d(self.num_filters)
out = F.leaky_relu(self.layer_dict["bn_0"](self.layer_dict["conv_0"](out)))
# Dimensionality reduction through average pooling
out = F.avg_pool2d(out, self.reduction_factor)
# Second convolutional layer with Batch Normalization
self.layer_dict["conv_1"] = nn.Conv2d(
in_channels=out.shape[1],
out_channels=self.num_filters,
bias=self.bias,
kernel_size=self.kernel_size,
dilation=self.dilation,
padding=self.padding,
stride=1,
)
self.layer_dict["bn_1"] = nn.BatchNorm2d(self.num_filters)
out = F.leaky_relu(self.layer_dict["bn_1"](self.layer_dict["conv_1"](out)))
print(out.shape)
def forward(self, x):
out = x
# First conv layer + BN + Leaky ReLU
out = F.leaky_relu(self.layer_dict["bn_0"](self.layer_dict["conv_0"](out)))
# Dimensionality reduction through average pooling
out = F.avg_pool2d(out, self.reduction_factor)
# Second conv layer + BN + Leaky ReLU
out = F.leaky_relu(self.layer_dict["bn_1"](self.layer_dict["conv_1"](out)))
return out
class ConvolutionalProcessingBlockBNRC(nn.Module):
def __init__(self, input_shape, num_filters, kernel_size, padding, bias, dilation):
super().__init__()
self.num_filters = num_filters
self.kernel_size = kernel_size
self.input_shape = input_shape
self.padding = padding
self.bias = bias
self.dilation = dilation
self.build_module()
def build_module(self):
self.layer_dict = nn.ModuleDict()
x = torch.zeros(self.input_shape)
out = x
# First convolutional layer with BN
self.layer_dict["conv_0"] = nn.Conv2d(
in_channels=out.shape[1],
out_channels=self.num_filters,
bias=self.bias,
kernel_size=self.kernel_size,
dilation=self.dilation,
padding=self.padding,
stride=1,
)
self.layer_dict["bn_0"] = nn.BatchNorm2d(self.num_filters)
out = self.layer_dict["conv_0"].forward(out)
out = self.layer_dict["bn_0"].forward(out)
out = F.leaky_relu(out)
# Second convolutional layer with BN
self.layer_dict["conv_1"] = nn.Conv2d(
in_channels=out.shape[1],
out_channels=self.num_filters,
bias=self.bias,
kernel_size=self.kernel_size,
dilation=self.dilation,
padding=self.padding,
stride=1,
)
self.layer_dict["bn_1"] = nn.BatchNorm2d(self.num_filters)
out = self.layer_dict["conv_1"].forward(out)
out = self.layer_dict["bn_1"].forward(out)
out = F.leaky_relu(out)
# Print final output shape for debugging
print(out.shape)
def forward(self, x):
residual = x # Save input for residual connection
out = x
# First conv layer + BN + Leaky ReLU
out = F.leaky_relu(self.layer_dict["bn_0"](self.layer_dict["conv_0"](out)))
# Second conv layer + BN + Leaky ReLU
out = F.leaky_relu(self.layer_dict["bn_1"](self.layer_dict["conv_1"](out)))
# Add the residual connection; the element-wise sum requires matching shapes
assert residual.shape == out.shape, "residual and block output must have the same shape"
out = out + residual
return out
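A quick sanity check for the two new blocks above (an illustrative sketch with assumed shapes, meant to run in this same module so torch and the classes are in scope): the BN+RC processing block must be fed an input whose channel count already equals num_filters, because the residual sum requires matching shapes, while the BN reduction block halves the spatial resolution.
block = ConvolutionalProcessingBlockBNRC(
    input_shape=(1, 16, 32, 32), num_filters=16, kernel_size=3,
    padding=1, bias=False, dilation=1,
)
print(block(torch.randn(1, 16, 32, 32)).shape)  # torch.Size([1, 16, 32, 32]): shape preserved, residual valid
reduction = ConvolutionalDimensionalityReductionBlockBN(
    input_shape=(1, 16, 32, 32), num_filters=16, kernel_size=3,
    padding=1, bias=False, dilation=1, reduction_factor=2,
)
print(reduction(torch.randn(1, 16, 32, 32)).shape)  # torch.Size([1, 16, 16, 16]): spatial dims halved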

View File

@ -0,0 +1,77 @@
import pickle
import os
import csv
def save_to_stats_pkl_file(experiment_log_filepath, filename, stats_dict):
summary_filename = os.path.join(experiment_log_filepath, filename)
with open("{}.pkl".format(summary_filename), "wb") as file_writer:
pickle.dump(stats_dict, file_writer)
def load_from_stats_pkl_file(experiment_log_filepath, filename):
summary_filename = os.path.join(experiment_log_filepath, filename)
with open("{}.pkl".format(summary_filename), "rb") as file_reader:
stats = pickle.load(file_reader)
return stats
def save_statistics(
experiment_log_dir,
filename,
stats_dict,
current_epoch,
continue_from_mode=False,
save_full_dict=False,
):
"""
Saves the statistics in stats_dict into a csv file, using the keys as the header entries and the values as the
columns under each header.
:param experiment_log_dir: the log folder dir filepath
:param filename: the name of the csv file
:param stats_dict: the stats dict containing the data to be saved
:param current_epoch: the number of epochs since commencement of the current training session (i.e. if the experiment continued from epoch 100 and this is epoch 105, then pass the relative distance of 5)
:param continue_from_mode: whether to append to an existing csv file instead of creating a new one
:param save_full_dict: whether to save the full dict as is, overriding any previous entries (useful if we want to overwrite a file)
:return: The filepath to the summary file
"""
summary_filename = os.path.join(experiment_log_dir, filename)
mode = "a" if continue_from_mode else "w"
with open(summary_filename, mode) as f:
writer = csv.writer(f)
if not continue_from_mode:
writer.writerow(list(stats_dict.keys()))
if save_full_dict:
total_rows = len(list(stats_dict.values())[0])
for idx in range(total_rows):
row_to_add = [value[idx] for value in list(stats_dict.values())]
writer.writerow(row_to_add)
else:
row_to_add = [value[current_epoch] for value in list(stats_dict.values())]
writer.writerow(row_to_add)
return summary_filename
def load_statistics(experiment_log_dir, filename):
"""
Loads a statistics csv file into a dictionary
:param experiment_log_dir: the log folder dir filepath
:param filename: the name of the csv file to load
:return: A dictionary containing the stats in the csv file. Header entries are converted into keys and columns of a
particular header are converted into values of a key in a list format.
"""
summary_filename = os.path.join(experiment_log_dir, filename)
with open(summary_filename, "r+") as f:
lines = f.readlines()
keys = lines[0].rstrip("\n").split(",")
stats = {key: [] for key in keys}
for line in lines[1:]:
values = line.rstrip("\n").split(",")
for idx, value in enumerate(values):
stats[keys[idx]].append(value)
return stats
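A brief usage sketch for these helpers (hypothetical folder name and values; the log directory is assumed to already exist): save_statistics writes one csv row per epoch, appending when continue_from_mode is True, and load_statistics reads the file back as lists of strings keyed by column name.
stats = {"train_acc": [0.10, 0.25], "train_loss": [2.30, 1.90]}
save_statistics("example_logs", "summary.csv", stats, current_epoch=0, continue_from_mode=False)
save_statistics("example_logs", "summary.csv", stats, current_epoch=1, continue_from_mode=True)
loaded = load_statistics("example_logs", "summary.csv")  # e.g. {"train_acc": ["0.1", "0.25"], ...}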

View File

@ -0,0 +1,87 @@
import unittest
import torch
from model_architectures import (
ConvolutionalProcessingBlockBN,
ConvolutionalDimensionalityReductionBlockBN,
ConvolutionalProcessingBlockBNRC,
)
class TestBatchNormalizationBlocks(unittest.TestCase):
def setUp(self):
# Common parameters
self.input_shape = (1, 3, 32, 32) # Batch size 1, 3 channels, 32x32 input
self.num_filters = 16
self.kernel_size = 3
self.padding = 1
self.bias = False
self.dilation = 1
self.reduction_factor = 2
def test_convolutional_processing_block(self):
# Create a ConvolutionalProcessingBlockBN instance
block = ConvolutionalProcessingBlockBN(
input_shape=self.input_shape,
num_filters=self.num_filters,
kernel_size=self.kernel_size,
padding=self.padding,
bias=self.bias,
dilation=self.dilation,
)
# Generate a random tensor matching the input shape
input_tensor = torch.randn(self.input_shape)
# Forward pass
try:
output = block(input_tensor)
self.assertIsNotNone(output, "Output should not be None.")
except Exception as e:
self.fail(f"ConvolutionalProcessingBlock raised an error: {e}")
def test_convolutional_processing_block_with_rc(self):
# Create a ConvolutionalProcessingBlockBNRC instance.
# The residual connection requires the block to preserve its input shape, so the
# input must already have num_filters channels (unlike the non-residual blocks).
rc_input_shape = (1, self.num_filters, 32, 32)
block = ConvolutionalProcessingBlockBNRC(
input_shape=rc_input_shape,
num_filters=self.num_filters,
kernel_size=self.kernel_size,
padding=self.padding,
bias=self.bias,
dilation=self.dilation,
)
# Generate a random tensor matching the input shape
input_tensor = torch.randn(rc_input_shape)
# Forward pass
try:
output = block(input_tensor)
self.assertIsNotNone(output, "Output should not be None.")
except Exception as e:
self.fail(f"ConvolutionalProcessingBlock raised an error: {e}")
def test_convolutional_dimensionality_reduction_block(self):
# Create a ConvolutionalDimensionalityReductionBlockBN instance
block = ConvolutionalDimensionalityReductionBlockBN(
input_shape=self.input_shape,
num_filters=self.num_filters,
kernel_size=self.kernel_size,
padding=self.padding,
bias=self.bias,
dilation=self.dilation,
reduction_factor=self.reduction_factor,
)
# Generate a random tensor matching the input shape
input_tensor = torch.randn(self.input_shape)
# Forward pass
try:
output = block(input_tensor)
self.assertIsNotNone(output, "Output should not be None.")
except Exception as e:
self.fail(f"ConvolutionalDimensionalityReductionBlock raised an error: {e}")
if __name__ == "__main__":
unittest.main()
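Because of the unittest.main() entry point, this file can be run directly with the Python interpreter. It can also be driven programmatically, as in the sketch below (the module name test_blocks is an assumption; the actual file name is not shown in this diff).
import unittest
from test_blocks import TestBatchNormalizationBlocks  # hypothetical module name

suite = unittest.TestLoader().loadTestsFromTestCase(TestBatchNormalizationBlocks)
unittest.TextTestRunner(verbosity=2).run(suite)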

View File

@ -0,0 +1,102 @@
import numpy as np
import torch
from torch.utils.data import DataLoader
from torchvision import transforms
import mlp.data_providers as data_providers
from pytorch_mlp_framework.arg_extractor import get_args
from pytorch_mlp_framework.experiment_builder import ExperimentBuilder
from pytorch_mlp_framework.model_architectures import *
import os
# os.environ["CUDA_VISIBLE_DEVICES"]="0"
args = get_args() # get arguments from command line
rng = np.random.RandomState(seed=args.seed) # set the seeds for the experiment
torch.manual_seed(seed=args.seed) # sets pytorch's seed
# set up data augmentation transforms for training and testing
transform_train = transforms.Compose(
[
transforms.RandomCrop(32, padding=4),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
]
)
transform_test = transforms.Compose(
[
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
]
)
train_data = data_providers.CIFAR100(
root="data", set_name="train", transform=transform_train, download=True
)  # training split, with augmentation transforms
val_data = data_providers.CIFAR100(
root="data", set_name="val", transform=transform_test, download=True
)  # validation split, evaluation transforms only
test_data = data_providers.CIFAR100(
root="data", set_name="test", transform=transform_test, download=True
)  # test split, evaluation transforms only
train_data_loader = DataLoader(
train_data, batch_size=args.batch_size, shuffle=True, num_workers=2
)
val_data_loader = DataLoader(
val_data, batch_size=args.batch_size, shuffle=True, num_workers=2
)
test_data_loader = DataLoader(
test_data, batch_size=args.batch_size, shuffle=True, num_workers=2
)
if args.block_type == "conv_block":
processing_block_type = ConvolutionalProcessingBlock
dim_reduction_block_type = ConvolutionalDimensionalityReductionBlock
elif args.block_type == "empty_block":
processing_block_type = EmptyBlock
dim_reduction_block_type = EmptyBlock
elif args.block_type == "conv_bn":
processing_block_type = ConvolutionalProcessingBlockBN
dim_reduction_block_type = ConvolutionalDimensionalityReductionBlockBN
elif args.block_type == "conv_bn_rc":
processing_block_type = ConvolutionalProcessingBlockBNRC
dim_reduction_block_type = ConvolutionalDimensionalityReductionBlockBN
else:
raise ValueError("Unknown block_type: {}".format(args.block_type))
custom_conv_net = (
ConvolutionalNetwork( # initialize our network object, in this case a ConvNet
input_shape=(
args.batch_size,
args.image_num_channels,
args.image_height,
args.image_width,
),
num_output_classes=args.num_classes,
num_filters=args.num_filters,
use_bias=False,
num_blocks_per_stage=args.num_blocks_per_stage,
num_stages=args.num_stages,
processing_block_type=processing_block_type,
dimensionality_reduction_block_type=dim_reduction_block_type,
)
)
conv_experiment = ExperimentBuilder(
network_model=custom_conv_net,
experiment_name=args.experiment_name,
num_epochs=args.num_epochs,
weight_decay_coefficient=args.weight_decay_coefficient,
learning_rate=args.learning_rate,
use_gpu=args.use_gpu,
continue_from_epoch=args.continue_from_epoch,
train_data=train_data_loader,
val_data=val_data_loader,
test_data=test_data_loader,
) # build an experiment object
experiment_metrics, test_metrics = (
conv_experiment.run_experiment()
) # run experiment and return experiment metrics
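For reference, a sketch of constructing the same network directly with the new batch-normalised blocks, bypassing the command-line dispatch above (the concrete sizes here are assumed example values, not settings taken from this diff):
net = ConvolutionalNetwork(
    input_shape=(100, 3, 32, 32),
    num_output_classes=100,
    num_filters=32,
    use_bias=False,
    num_blocks_per_stage=5,
    num_stages=3,
    processing_block_type=ConvolutionalProcessingBlockBNRC,
    dimensionality_reduction_block_type=ConvolutionalDimensionalityReductionBlockBN,
)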

4
report/.gitignore vendored Normal file
View File

@ -0,0 +1,4 @@
*.fls
*.fdb_latexmk
s2759177/
*.zip

1
report/README.txt Normal file
View File

@ -0,0 +1 @@
Most reasonable LaTeX distributions should have no problem building the document from what is in the provided LaTeX source directory. However, certain LaTeX distributions are missing certain files, and they are included in this directory. If you get an error message when you build the LaTeX document saying one of these files is missing, then move the relevant file into your LaTeX source directory.

View File

@ -0,0 +1,101 @@
train_acc,train_loss,val_acc,val_loss
0.027410526315789472,4.440032,0.0368,4.238186
0.0440842105263158,4.1909122,0.0644,4.1239405
0.05604210526315791,4.0817885,0.0368,4.495799
0.0685263157894737,3.984858,0.0964,3.8527937
0.08345263157894738,3.8947835,0.09080000000000002,3.8306112
0.09391578947368423,3.8246264,0.10399999999999998,3.7504945
0.10189473684210527,3.760145,0.1124,3.6439042
0.11197894736842108,3.704831,0.0992,3.962508
0.12534736842105265,3.6408415,0.1404,3.516474
0.1385894736842105,3.5672796,0.1444,3.5242612
0.14873684210526317,3.5145628,0.12960000000000002,3.5745378
0.16103157894736844,3.4476008,0.1852,3.3353982
0.16846315789473681,3.399858,0.15600000000000003,3.453797
0.1760210526315789,3.3611393,0.1464,3.5799885
0.18625263157894736,3.3005812,0.196,3.201007
0.19233684210526317,3.26565,0.17439999999999997,3.397586
0.19625263157894737,3.2346153,0.212,3.169959
0.20717894736842105,3.174345,0.2132,3.0981174
0.2136,3.1425776,0.2036,3.2191591
0.2217684210526316,3.094137,0.236,3.0018876
0.23069473684210529,3.0539455,0.20440000000000003,3.1800296
0.23395789473684211,3.0338168,0.22599999999999998,3.0360818
0.24463157894736842,2.9761615,0.2588,2.8876188
0.25311578947368424,2.931479,0.2,3.242481
0.25795789473684216,2.900163,0.28320000000000006,2.830947
0.26789473684210524,2.8484874,0.2768,2.8190458
0.2709263157894737,2.833472,0.2352,3.0098538
0.2816421052631579,2.7842317,0.29560000000000003,2.7288156
0.28764210526315787,2.745757,0.2648,2.8955112
0.2930315789473684,2.7276495,0.27680000000000005,2.8336413
0.3001263157894737,2.6826382,0.316,2.6245823
0.3068421052631579,2.658441,0.27,2.9279957
0.30909473684210526,2.638565,0.31160000000000004,2.637653
0.3213263157894737,2.5939283,0.31799999999999995,2.627816
0.3211157894736843,2.579544,0.25079999999999997,2.9502957
0.3259999999999999,2.5540712,0.3332,2.569941
0.3336421052631579,2.5239582,0.278,2.7676308
0.3371368421052632,2.5109046,0.2916,2.725589
0.34404210526315787,2.4714804,0.34120000000000006,2.4782379
0.3500631578947368,2.4545348,0.30600000000000005,2.6625924
0.34976842105263156,2.4408882,0.342,2.5351026
0.3586315789473684,2.4116046,0.3452,2.450749
0.3568421052631579,2.4133172,0.3288,2.5647113
0.3630947368421052,2.3772728,0.36519999999999997,2.388074
0.37069473684210524,2.3505116,0.324,2.5489926
0.37132631578947367,2.352426,0.33680000000000004,2.5370462
0.37606315789473677,2.319005,0.3712,2.3507965
0.3800210526315789,2.3045664,0.33,2.6327293
0.38185263157894733,2.2965574,0.3764,2.364877
0.38785263157894734,2.269467,0.37799999999999995,2.330837
0.3889684210526316,2.26941,0.3559999999999999,2.513778
0.3951789473684211,2.2413251,0.3888,2.2839465
0.3944421052631579,2.2319226,0.35919999999999996,2.4310353
0.4,2.220305,0.3732,2.348543
0.4051157894736842,2.1891508,0.39440000000000003,2.2730627
0.40581052631578945,2.1873925,0.33399999999999996,2.5648093
0.4067789473684211,2.1817088,0.4044,2.2244952
0.41555789473684207,2.1543047,0.39759999999999995,2.220972
0.4170526315789474,2.14905,0.33399999999999996,2.6612198
0.41762105263157895,2.1321266,0.3932,2.2343464
0.42341052631578946,2.1131704,0.37800000000000006,2.327929
0.4212842105263158,2.112597,0.376,2.3302126
0.4295157894736842,2.0925663,0.4100000000000001,2.175698
0.4299368421052632,2.0846903,0.3772,2.3750577
0.43134736842105265,2.075184,0.4044,2.1888158
0.43829473684210524,2.045202,0.41239999999999993,2.1673117
0.43534736842105265,2.0590534,0.37440000000000007,2.3269994
0.4417684210526316,2.0356588,0.42,2.1668334
0.4442736842105263,2.028207,0.41239999999999993,2.2346516
0.44581052631578943,2.021492,0.40519999999999995,2.2030904
0.44884210526315793,2.0058675,0.4296,2.0948715
0.45071578947368424,1.993417,0.39,2.2856123
0.45130526315789476,1.9970801,0.43599999999999994,2.110219
0.45686315789473686,1.9651922,0.4244,2.1253593
0.4557263157894737,1.9701725,0.3704,2.4576838
0.4609684210526315,1.956996,0.4412,2.0626938
0.4639789473684211,1.9407912,0.398,2.3076272
0.46311578947368426,1.9410807,0.4056,2.2181008
0.4686736842105263,1.918824,0.45080000000000003,2.030652
0.4650315789473684,1.924879,0.3948,2.2926931
0.46964210526315786,1.9188553,0.43599999999999994,2.107239
0.47357894736842104,1.8991861,0.43119999999999997,2.067097
0.47212631578947367,1.8987728,0.41359999999999997,2.1667569
0.4773263157894737,1.8892545,0.46,2.0283196
0.4802526315789474,1.8736148,0.41960000000000003,2.1698954
0.47406315789473685,1.8849738,0.43399999999999994,2.1001608
0.48627368421052636,1.8492608,0.45520000000000005,1.9936249
0.48589473684210527,1.8534511,0.38439999999999996,2.354954
0.48667368421052637,1.8421199,0.44120000000000004,2.0467849
0.4902736842105263,1.8265136,0.45519999999999994,2.0044358
0.4879789473684211,1.838593,0.3984,2.3019247
0.49204210526315795,1.8199797,0.4656,1.9858631
0.4945894736842105,1.805858,0.436,2.1293921
0.4939578947368421,1.8174701,0.4388,2.0611947
0.4961684210526316,1.7953233,0.4612,1.9728945
0.49610526315789477,1.7908033,0.42440000000000005,2.1648548
0.4996,1.7908286,0.4664,1.9897026
0.5070105263157895,1.7658812,0.452,2.0411723
0.5027368421052631,1.7692825,0.4136000000000001,2.280331
0.5062315789473685,1.7649119,0.4768,1.9493303

View File

@ -0,0 +1,2 @@
test_acc,test_loss
0.46970000000000006,1.9579598

View File

@ -0,0 +1,101 @@
train_acc,train_loss,val_acc,val_loss
0.04040000000000001,4.2986817,0.07600000000000001,3.9793916
0.07663157894736841,3.948711,0.09840000000000002,3.8271046
0.1072842105263158,3.7670445,0.0908,3.8834984
0.14671578947368422,3.544252,0.1784,3.3180876
0.18690526315789474,3.3382895,0.1672,3.4958847
0.2185684210526316,3.1613564,0.23240000000000002,3.0646808
0.2584,2.9509778,0.2904,2.7620668
0.2886736842105263,2.7674758,0.2504,3.083242
0.3186736842105263,2.6191177,0.34600000000000003,2.5320892
0.3488421052631579,2.4735146,0.3556,2.463249
0.36701052631578945,2.3815694,0.32480000000000003,2.6590502
0.39258947368421054,2.2661598,0.41200000000000003,2.215237
0.40985263157894736,2.1811035,0.3644,2.4625826
0.42557894736842106,2.1193688,0.3896,2.2802749
0.4452,2.0338347,0.45080000000000003,2.0216491
0.45298947368421055,1.9886738,0.3768,2.4903286
0.4690105263157895,1.9385177,0.46519999999999995,1.9589043
0.48627368421052636,1.8654134,0.46199999999999997,1.9572229
0.4910947368421053,1.836772,0.3947999999999999,2.371203
0.5033052631578947,1.7882212,0.4864,1.8270072
0.515578947368421,1.7451773,0.418,2.2281988
0.5166526315789474,1.7310464,0.4744,1.9468222
0.532,1.6639497,0.5176,1.7627875
0.534821052631579,1.6504371,0.426,2.2908173
0.5399578947368422,1.6263881,0.5092,1.7892419
0.5538105263157893,1.5786182,0.5184,1.7781507
0.5530526315789474,1.5743873,0.45480000000000004,2.052206
0.5610526315789474,1.5367776,0.5404000000000001,1.6886607
0.5709263157894736,1.508275,0.5072000000000001,1.8317349
0.5693894736842106,1.5026951,0.49760000000000004,1.9268813
0.5827368421052632,1.4614111,0.5484,1.6791071
0.583557894736842,1.4580216,0.4744,2.084504
0.5856842105263159,1.4402864,0.5468,1.6674811
0.5958105263157895,1.4054152,0.5468,1.7081916
0.5964631578947368,1.4043275,0.4988,1.8901508
0.6044631578947368,1.3692447,0.548,1.6456038
0.6065473684210526,1.3562685,0.5448,1.7725601
0.6055578947368421,1.3638091,0.52,1.803752
0.6169684210526316,1.3224502,0.5688,1.6048553
0.6184421052631579,1.3228824,0.4772,2.0309162
0.6193894736842105,1.312684,0.5496,1.6357917
0.6287368421052631,1.2758818,0.5552,1.7120187
0.6270105263157894,1.2829372,0.4872000000000001,1.9630791
0.6313473684210527,1.2609128,0.5632,1.6049384
0.6374736842105263,1.2429903,0.5516,1.7101723
0.6342947368421055,1.2540665,0.5272,1.8112053
0.642778947368421,1.2098345,0.5692,1.5996393
0.6447368421052632,1.217454,0.5056,2.087292
0.6437052631578949,1.2123955,0.5660000000000001,1.6426488
0.6533263157894735,1.1804259,0.5672,1.6429158
0.6521052631578947,1.1856273,0.5316000000000001,1.8833923
0.658021052631579,1.1663536,0.5652,1.6239171
0.6622947368421054,1.1522906,0.5376000000000001,1.8352613
0.6543789473684212,1.1700194,0.5539999999999999,1.7920883
0.6664,1.1246897,0.5828,1.5657492
0.6645473684210526,1.1307288,0.5296,1.8285477
0.6647157894736843,1.1294464,0.5852,1.59438
0.6713473684210526,1.1020554,0.5647999999999999,1.6256377
0.6691368421052631,1.1129124,0.5224,1.9497899
0.6737684210526315,1.0941163,0.5708,1.5900868
0.6765473684210527,1.0844595,0.55,1.7522817
0.6762947368421053,1.0832069,0.5428000000000001,1.8020345
0.6799789473684209,1.0637755,0.5864,1.5690281
0.6808421052631578,1.066873,0.5168,1.9964217
0.6843157894736842,1.0618489,0.5720000000000001,1.6391727
0.6866736842105262,1.0432214,0.5731999999999999,1.6571078
0.6877684210526315,1.0442319,0.5192,2.0341485
0.6890105263157895,1.0338738,0.5836,1.5887364
0.693642105263158,1.0206536,0.5456,1.8537303
0.6905894736842106,1.0271776,0.5548000000000001,1.8022745
0.6981263157894737,1.001102,0.5852,1.5923084
0.6986105263157896,1.0052379,0.512,2.011443
0.698042105263158,0.9990784,0.5744,1.638558
0.7031578947368421,0.977477,0.5816,1.5790274
0.7013473684210526,0.98766434,0.5448000000000001,1.8414693
0.7069684210526315,0.9691622,0.59,1.5866013
0.7061894736842105,0.9620083,0.55,1.7695292
0.7050526315789474,0.9689725,0.5408,1.8329593
0.7101052631578948,0.95279986,0.5852,1.5835829
0.7122315789473684,0.9483001,0.5224,1.9749893
0.7115157894736842,0.94911486,0.5808,1.6965445
0.7166315789473684,0.9338312,0.5788,1.6249495
0.7120631578947368,0.9428737,0.5224,1.9721117
0.7197263157894737,0.92057914,0.5960000000000001,1.6235417
0.7258315789473684,0.9071854,0.528,2.0651033
0.7186947368421053,0.922529,0.5628,1.7508049
0.7257684210526316,0.9007169,0.5980000000000001,1.5797865
0.7254105263157896,0.89657074,0.5472,1.8673587
0.7229263157894736,0.90324384,0.5771999999999999,1.6998875
0.7308842105263157,0.8757633,0.5856,1.6750972
0.7254947368421052,0.8956531,0.5479999999999999,1.9809356
0.7302105263157894,0.8803156,0.5960000000000001,1.6343199
0.7353473684210525,0.8630421,0.56,1.9686066
0.732021052631579,0.8823739,0.5632,1.8139118
0.7324631578947367,0.8676047,0.5952000000000001,1.6235788
0.7366526315789473,0.85581774,0.5392,1.9346147
0.7340210526315789,0.8636227,0.5868,1.6743768
0.7416631578947368,0.84529686,0.5836,1.6691054
0.734757894736842,0.85352796,0.516,2.227477
0.7435368421052632,0.83374214,0.582,1.697568

View File

@ -0,0 +1,2 @@
test_acc,test_loss
0.6018000000000001,1.5933747

View File

@ -0,0 +1,101 @@
train_acc,train_loss,val_acc,val_loss
0.009600000000000001,4.609349,0.0104,4.6072426
0.009326315789473684,4.6068563,0.0092,4.606588
0.009747368421052631,4.6062207,0.0084,4.606326
0.009621052631578947,4.6059957,0.0076,4.6067405
0.009873684210526314,4.605887,0.0076,4.6068487
0.009136842105263157,4.605854,0.008,4.6074386
0.009536842105263158,4.605795,0.007200000000000001,4.6064863
0.009578947368421051,4.6057415,0.006400000000000001,4.6065035
0.009410526315789473,4.6058245,0.0076,4.606772
0.009094736842105263,4.6057224,0.007600000000000001,4.6064925
0.00911578947368421,4.605707,0.007200000000000001,4.6067533
0.009852631578947368,4.605685,0.007200000000000001,4.6068745
0.01031578947368421,4.6056952,0.0072,4.6067533
0.009789473684210527,4.6057863,0.0072,4.6070247
0.01031578947368421,4.6056023,0.0064,4.607134
0.010189473684210526,4.605698,0.0064,4.606934
0.009957894736842107,4.605643,0.006400000000000001,4.6068535
0.009452631578947369,4.605595,0.0064,4.6070676
0.009368421052631578,4.6057224,0.008,4.6070356
0.010210526315789474,4.6056094,0.009600000000000001,4.6070833
0.009557894736842105,4.6056895,0.0076,4.6069493
0.009600000000000001,4.605709,0.008400000000000001,4.60693
0.00985263157894737,4.6055284,0.0084,4.6068263
0.009200000000000002,4.60564,0.0076,4.6071053
0.009031578947368422,4.6056323,0.008400000000000001,4.606731
0.009663157894736842,4.60559,0.0068,4.6069546
0.008484210526315789,4.605676,0.009600000000000001,4.6063976
0.0096,4.605595,0.011200000000000002,4.6067076
0.00951578947368421,4.605619,0.0096,4.6068506
0.009242105263157895,4.6056657,0.0072,4.6067576
0.009326315789473684,4.6055913,0.012,4.6070724
0.01023157894736842,4.605646,0.012000000000000002,4.6066885
0.009494736842105262,4.605563,0.0072,4.6067305
0.009810526315789474,4.6055746,0.007200000000000001,4.6067824
0.010147368421052632,4.605596,0.0072,4.607214
0.009536842105263156,4.6055007,0.007200000000000001,4.607186
0.009452631578947369,4.605547,0.0072,4.607297
0.009578947368421055,4.6055694,0.0072,4.607313
0.009410526315789475,4.6055374,0.0072,4.60726
0.00985263157894737,4.605587,0.0072,4.6072307
0.009389473684210526,4.605559,0.0072,4.607227
0.009852631578947368,4.6055884,0.008,4.6070976
0.008968421052631579,4.6055803,0.008,4.607156
0.009536842105263158,4.605502,0.0076,4.6073594
0.009410526315789473,4.6055517,0.008,4.607176
0.01,4.6055126,0.006400000000000001,4.606937
0.009915789473684213,4.6055126,0.008,4.607185
0.009305263157894737,4.605594,0.0064,4.606834
0.009326315789473684,4.6054907,0.008,4.6070714
0.009094736842105263,4.6055007,0.0076,4.6068645
0.009052631578947368,4.6055903,0.008400000000000001,4.606755
0.010294736842105263,4.605449,0.008,4.6068816
0.009578947368421055,4.6054883,0.0064,4.6067166
0.009452631578947369,4.60552,0.01,4.6066008
0.008821052631578948,4.6054573,0.009600000000000001,4.6065955
0.008968421052631579,4.605544,0.008,4.6063676
0.010147368421052632,4.605516,0.0064,4.6068606
0.009600000000000001,4.6054597,0.0096,4.6072354
0.01008421052631579,4.605526,0.0076,4.6074166
0.010126315789473685,4.6054554,0.0076,4.6074657
0.009705263157894736,4.6054635,0.0088,4.607237
0.009726315789473684,4.605516,0.007200000000000001,4.606978
0.009894736842105262,4.6054883,0.0072,4.607135
0.009663157894736842,4.605501,0.007200000000000001,4.607015
0.00976842105263158,4.605536,0.008,4.6073785
0.009473684210526316,4.6055303,0.009600000000000001,4.6070166
0.009347368421052632,4.6054993,0.0076,4.607084
0.009178947368421054,4.6054535,0.0084,4.6070604
0.008842105263157892,4.605507,0.0076,4.6069884
0.009726315789473684,4.6055107,0.007599999999999999,4.6069903
0.009536842105263156,4.6054244,0.0084,4.6070695
0.009452631578947369,4.605474,0.0072,4.607035
0.009621052631578949,4.605444,0.0076,4.6071277
0.010084210526315791,4.6054263,0.0076,4.6071534
0.009326315789473686,4.605477,0.0088,4.607115
0.009010526315789472,4.60548,0.0076,4.6072206
0.010042105263157897,4.605475,0.0076,4.607185
0.00976842105263158,4.6054463,0.008400000000000001,4.6071196
0.01,4.605421,0.008,4.6069384
0.009536842105263156,4.605482,0.008,4.607035
0.009915789473684213,4.6054354,0.008,4.6071534
0.010042105263157894,4.6054177,0.007200000000000001,4.607074
0.009242105263157895,4.605473,0.0072,4.606825
0.009726315789473684,4.6054006,0.0072,4.606701
0.009684210526315788,4.6054583,0.0104,4.606925
0.009642105263157895,4.6054606,0.0104,4.6068645
0.00936842105263158,4.605405,0.0076,4.606976
0.009263157894736843,4.605455,0.0076,4.606981
0.00905263157894737,4.6054463,0.0092,4.6070757
0.009915789473684213,4.605465,0.0068000000000000005,4.607151
0.009389473684210526,4.605481,0.008400000000000001,4.606995
0.009789473684210527,4.605436,0.0068000000000000005,4.6071105
0.010273684210526315,4.605466,0.007200000000000001,4.606909
0.009789473684210527,4.605443,0.0072,4.6066866
0.009957894736842107,4.6053886,0.0076,4.606541
0.010168421052631578,4.605481,0.006400000000000001,4.606732
0.009242105263157894,4.605444,0.006400000000000001,4.606939
0.009621052631578949,4.6054454,0.008,4.606915
0.00976842105263158,4.60547,0.0076,4.6068935
0.009873684210526316,4.6055245,0.0064,4.6072345

View File

@ -0,0 +1,2 @@
test_acc,test_loss
0.01,4.6053004

View File

@ -0,0 +1 @@
Most reasonable LaTeX distributions should have no problem building the document from what is in the provided LaTeX source directory. However, certain LaTeX distributions are missing certain files, and they are included in this directory. If you get an error message when you build the LaTeX document saying one of these files is missing, then move the relevant file into your LaTeX source directory.

View File

@ -0,0 +1,79 @@
% ALGORITHM STYLE -- Released 8 April 1996
% for LaTeX-2e
% Copyright -- 1994 Peter Williams
% E-mail Peter.Williams@dsto.defence.gov.au
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{algorithm}
\typeout{Document Style `algorithm' - floating environment}
\RequirePackage{float}
\RequirePackage{ifthen}
\newcommand{\ALG@within}{nothing}
\newboolean{ALG@within}
\setboolean{ALG@within}{false}
\newcommand{\ALG@floatstyle}{ruled}
\newcommand{\ALG@name}{Algorithm}
\newcommand{\listalgorithmname}{List of \ALG@name s}
% Declare Options
% first appearance
\DeclareOption{plain}{
\renewcommand{\ALG@floatstyle}{plain}
}
\DeclareOption{ruled}{
\renewcommand{\ALG@floatstyle}{ruled}
}
\DeclareOption{boxed}{
\renewcommand{\ALG@floatstyle}{boxed}
}
% then numbering convention
\DeclareOption{part}{
\renewcommand{\ALG@within}{part}
\setboolean{ALG@within}{true}
}
\DeclareOption{chapter}{
\renewcommand{\ALG@within}{chapter}
\setboolean{ALG@within}{true}
}
\DeclareOption{section}{
\renewcommand{\ALG@within}{section}
\setboolean{ALG@within}{true}
}
\DeclareOption{subsection}{
\renewcommand{\ALG@within}{subsection}
\setboolean{ALG@within}{true}
}
\DeclareOption{subsubsection}{
\renewcommand{\ALG@within}{subsubsection}
\setboolean{ALG@within}{true}
}
\DeclareOption{nothing}{
\renewcommand{\ALG@within}{nothing}
\setboolean{ALG@within}{true}
}
\DeclareOption*{\edef\ALG@name{\CurrentOption}}
% ALGORITHM
%
\ProcessOptions
\floatstyle{\ALG@floatstyle}
\ifthenelse{\boolean{ALG@within}}{
\ifthenelse{\equal{\ALG@within}{part}}
{\newfloat{algorithm}{htbp}{loa}[part]}{}
\ifthenelse{\equal{\ALG@within}{chapter}}
{\newfloat{algorithm}{htbp}{loa}[chapter]}{}
\ifthenelse{\equal{\ALG@within}{section}}
{\newfloat{algorithm}{htbp}{loa}[section]}{}
\ifthenelse{\equal{\ALG@within}{subsection}}
{\newfloat{algorithm}{htbp}{loa}[subsection]}{}
\ifthenelse{\equal{\ALG@within}{subsubsection}}
{\newfloat{algorithm}{htbp}{loa}[subsubsection]}{}
\ifthenelse{\equal{\ALG@within}{nothing}}
{\newfloat{algorithm}{htbp}{loa}}{}
}{
\newfloat{algorithm}{htbp}{loa}
}
\floatname{algorithm}{\ALG@name}
\newcommand{\listofalgorithms}{\listof{algorithm}{\listalgorithmname}}

View File

@ -0,0 +1,201 @@
% ALGORITHMIC STYLE -- Released 8 APRIL 1996
% for LaTeX version 2e
% Copyright -- 1994 Peter Williams
% E-mail PeterWilliams@dsto.defence.gov.au
%
% Modified by Alex Smola (08/2000)
% E-mail Alex.Smola@anu.edu.au
%
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{algorithmic}
\typeout{Document Style `algorithmic' - environment}
%
\RequirePackage{ifthen}
\RequirePackage{calc}
\newboolean{ALC@noend}
\setboolean{ALC@noend}{false}
\newcounter{ALC@line}
\newcounter{ALC@rem}
\newlength{\ALC@tlm}
%
\DeclareOption{noend}{\setboolean{ALC@noend}{true}}
%
\ProcessOptions
%
% ALGORITHMIC
\newcommand{\algorithmicrequire}{\textbf{Require:}}
\newcommand{\algorithmicensure}{\textbf{Ensure:}}
\newcommand{\algorithmiccomment}[1]{\{#1\}}
\newcommand{\algorithmicend}{\textbf{end}}
\newcommand{\algorithmicif}{\textbf{if}}
\newcommand{\algorithmicthen}{\textbf{then}}
\newcommand{\algorithmicelse}{\textbf{else}}
\newcommand{\algorithmicelsif}{\algorithmicelse\ \algorithmicif}
\newcommand{\algorithmicendif}{\algorithmicend\ \algorithmicif}
\newcommand{\algorithmicfor}{\textbf{for}}
\newcommand{\algorithmicforall}{\textbf{for all}}
\newcommand{\algorithmicdo}{\textbf{do}}
\newcommand{\algorithmicendfor}{\algorithmicend\ \algorithmicfor}
\newcommand{\algorithmicwhile}{\textbf{while}}
\newcommand{\algorithmicendwhile}{\algorithmicend\ \algorithmicwhile}
\newcommand{\algorithmicloop}{\textbf{loop}}
\newcommand{\algorithmicendloop}{\algorithmicend\ \algorithmicloop}
\newcommand{\algorithmicrepeat}{\textbf{repeat}}
\newcommand{\algorithmicuntil}{\textbf{until}}
%changed by alex smola
\newcommand{\algorithmicinput}{\textbf{input}}
\newcommand{\algorithmicoutput}{\textbf{output}}
\newcommand{\algorithmicset}{\textbf{set}}
\newcommand{\algorithmictrue}{\textbf{true}}
\newcommand{\algorithmicfalse}{\textbf{false}}
\newcommand{\algorithmicand}{\textbf{and\ }}
\newcommand{\algorithmicor}{\textbf{or\ }}
\newcommand{\algorithmicfunction}{\textbf{function}}
\newcommand{\algorithmicendfunction}{\algorithmicend\ \algorithmicfunction}
\newcommand{\algorithmicmain}{\textbf{main}}
\newcommand{\algorithmicendmain}{\algorithmicend\ \algorithmicmain}
%end changed by alex smola
\def\ALC@item[#1]{%
\if@noparitem \@donoparitem
\else \if@inlabel \indent \par \fi
\ifhmode \unskip\unskip \par \fi
\if@newlist \if@nobreak \@nbitem \else
\addpenalty\@beginparpenalty
\addvspace\@topsep \addvspace{-\parskip}\fi
\else \addpenalty\@itempenalty \addvspace\itemsep
\fi
\global\@inlabeltrue
\fi
\everypar{\global\@minipagefalse\global\@newlistfalse
\if@inlabel\global\@inlabelfalse \hskip -\parindent \box\@labels
\penalty\z@ \fi
\everypar{}}\global\@nobreakfalse
\if@noitemarg \@noitemargfalse \if@nmbrlist \refstepcounter{\@listctr}\fi \fi
\sbox\@tempboxa{\makelabel{#1}}%
\global\setbox\@labels
\hbox{\unhbox\@labels \hskip \itemindent
\hskip -\labelwidth \hskip -\ALC@tlm
\ifdim \wd\@tempboxa >\labelwidth
\box\@tempboxa
\else \hbox to\labelwidth {\unhbox\@tempboxa}\fi
\hskip \ALC@tlm}\ignorespaces}
%
\newenvironment{algorithmic}[1][0]{
\let\@item\ALC@item
\newcommand{\ALC@lno}{%
\ifthenelse{\equal{\arabic{ALC@rem}}{0}}
{{\footnotesize \arabic{ALC@line}:}}{}%
}
\let\@listii\@listi
\let\@listiii\@listi
\let\@listiv\@listi
\let\@listv\@listi
\let\@listvi\@listi
\let\@listvii\@listi
\newenvironment{ALC@g}{
\begin{list}{\ALC@lno}{ \itemsep\z@ \itemindent\z@
\listparindent\z@ \rightmargin\z@
\topsep\z@ \partopsep\z@ \parskip\z@\parsep\z@
\leftmargin 1em
\addtolength{\ALC@tlm}{\leftmargin}
}
}
{\end{list}}
\newcommand{\ALC@it}{\addtocounter{ALC@line}{1}\addtocounter{ALC@rem}{1}\ifthenelse{\equal{\arabic{ALC@rem}}{#1}}{\setcounter{ALC@rem}{0}}{}\item}
\newcommand{\ALC@com}[1]{\ifthenelse{\equal{##1}{default}}%
{}{\ \algorithmiccomment{##1}}}
\newcommand{\REQUIRE}{\item[\algorithmicrequire]}
\newcommand{\ENSURE}{\item[\algorithmicensure]}
\newcommand{\STATE}{\ALC@it}
\newcommand{\COMMENT}[1]{\algorithmiccomment{##1}}
%changes by alex smola
\newcommand{\INPUT}{\item[\algorithmicinput]}
\newcommand{\OUTPUT}{\item[\algorithmicoutput]}
\newcommand{\SET}{\item[\algorithmicset]}
% \newcommand{\TRUE}{\algorithmictrue}
% \newcommand{\FALSE}{\algorithmicfalse}
\newcommand{\AND}{\algorithmicand}
\newcommand{\OR}{\algorithmicor}
\newenvironment{ALC@func}{\begin{ALC@g}}{\end{ALC@g}}
\newenvironment{ALC@main}{\begin{ALC@g}}{\end{ALC@g}}
%end changes by alex smola
\newenvironment{ALC@if}{\begin{ALC@g}}{\end{ALC@g}}
\newenvironment{ALC@for}{\begin{ALC@g}}{\end{ALC@g}}
\newenvironment{ALC@whl}{\begin{ALC@g}}{\end{ALC@g}}
\newenvironment{ALC@loop}{\begin{ALC@g}}{\end{ALC@g}}
\newenvironment{ALC@rpt}{\begin{ALC@g}}{\end{ALC@g}}
\renewcommand{\\}{\@centercr}
\newcommand{\IF}[2][default]{\ALC@it\algorithmicif\ ##2\ \algorithmicthen%
\ALC@com{##1}\begin{ALC@if}}
\newcommand{\SHORTIF}[2]{\ALC@it\algorithmicif\ ##1\
\algorithmicthen\ {##2}}
\newcommand{\ELSE}[1][default]{\end{ALC@if}\ALC@it\algorithmicelse%
\ALC@com{##1}\begin{ALC@if}}
\newcommand{\ELSIF}[2][default]%
{\end{ALC@if}\ALC@it\algorithmicelsif\ ##2\ \algorithmicthen%
\ALC@com{##1}\begin{ALC@if}}
\newcommand{\FOR}[2][default]{\ALC@it\algorithmicfor\ ##2\ \algorithmicdo%
\ALC@com{##1}\begin{ALC@for}}
\newcommand{\FORALL}[2][default]{\ALC@it\algorithmicforall\ ##2\ %
\algorithmicdo%
\ALC@com{##1}\begin{ALC@for}}
\newcommand{\SHORTFORALL}[2]{\ALC@it\algorithmicforall\ ##1\ %
\algorithmicdo\ {##2}}
\newcommand{\WHILE}[2][default]{\ALC@it\algorithmicwhile\ ##2\ %
\algorithmicdo%
\ALC@com{##1}\begin{ALC@whl}}
\newcommand{\LOOP}[1][default]{\ALC@it\algorithmicloop%
\ALC@com{##1}\begin{ALC@loop}}
%changed by alex smola
\newcommand{\FUNCTION}[2][default]{\ALC@it\algorithmicfunction\ ##2\ %
\ALC@com{##1}\begin{ALC@func}}
\newcommand{\MAIN}[2][default]{\ALC@it\algorithmicmain\ ##2\ %
\ALC@com{##1}\begin{ALC@main}}
%end changed by alex smola
\newcommand{\REPEAT}[1][default]{\ALC@it\algorithmicrepeat%
\ALC@com{##1}\begin{ALC@rpt}}
\newcommand{\UNTIL}[1]{\end{ALC@rpt}\ALC@it\algorithmicuntil\ ##1}
\ifthenelse{\boolean{ALC@noend}}{
\newcommand{\ENDIF}{\end{ALC@if}}
\newcommand{\ENDFOR}{\end{ALC@for}}
\newcommand{\ENDWHILE}{\end{ALC@whl}}
\newcommand{\ENDLOOP}{\end{ALC@loop}}
\newcommand{\ENDFUNCTION}{\end{ALC@func}}
\newcommand{\ENDMAIN}{\end{ALC@main}}
}{
\newcommand{\ENDIF}{\end{ALC@if}\ALC@it\algorithmicendif}
\newcommand{\ENDFOR}{\end{ALC@for}\ALC@it\algorithmicendfor}
\newcommand{\ENDWHILE}{\end{ALC@whl}\ALC@it\algorithmicendwhile}
\newcommand{\ENDLOOP}{\end{ALC@loop}\ALC@it\algorithmicendloop}
\newcommand{\ENDFUNCTION}{\end{ALC@func}\ALC@it\algorithmicendfunction}
\newcommand{\ENDMAIN}{\end{ALC@main}\ALC@it\algorithmicendmain}
}
\renewcommand{\@toodeep}{}
\begin{list}{\ALC@lno}{\setcounter{ALC@line}{0}\setcounter{ALC@rem}{0}%
\itemsep\z@ \itemindent\z@ \listparindent\z@%
\partopsep\z@ \parskip\z@ \parsep\z@%
\labelsep 0.5em \topsep 0.2em%
\ifthenelse{\equal{#1}{0}}
{\labelwidth 0.5em }
{\labelwidth 1.2em }
\leftmargin\labelwidth \addtolength{\leftmargin}{\labelsep}
\ALC@tlm\labelsep
}
}
{\end{list}}

View File

@ -0,0 +1,485 @@
% fancyhdr.sty version 3.2
% Fancy headers and footers for LaTeX.
% Piet van Oostrum,
% Dept of Computer and Information Sciences, University of Utrecht,
% Padualaan 14, P.O. Box 80.089, 3508 TB Utrecht, The Netherlands
% Telephone: +31 30 2532180. Email: piet@cs.uu.nl
% ========================================================================
% LICENCE:
% This file may be distributed under the terms of the LaTeX Project Public
% License, as described in lppl.txt in the base LaTeX distribution.
% Either version 1 or, at your option, any later version.
% ========================================================================
% MODIFICATION HISTORY:
% Sep 16, 1994
% version 1.4: Correction for use with \reversemargin
% Sep 29, 1994:
% version 1.5: Added the \iftopfloat, \ifbotfloat and \iffloatpage commands
% Oct 4, 1994:
% version 1.6: Reset single spacing in headers/footers for use with
% setspace.sty or doublespace.sty
% Oct 4, 1994:
% version 1.7: changed \let\@mkboth\markboth to
% \def\@mkboth{\protect\markboth} to make it more robust
% Dec 5, 1994:
% version 1.8: corrections for amsbook/amsart: define \@chapapp and (more
% importantly) use the \chapter/sectionmark definitions from ps@headings if
% they exist (which should be true for all standard classes).
% May 31, 1995:
% version 1.9: The proposed \renewcommand{\headrulewidth}{\iffloatpage...
% construction in the doc did not work properly with the fancyplain style.
% June 1, 1995:
% version 1.91: The definition of \@mkboth wasn't restored on subsequent
% \pagestyle{fancy}'s.
% June 1, 1995:
% version 1.92: The sequence \pagestyle{fancyplain} \pagestyle{plain}
% \pagestyle{fancy} would erroneously select the plain version.
% June 1, 1995:
% version 1.93: \fancypagestyle command added.
% Dec 11, 1995:
% version 1.94: suggested by Conrad Hughes <chughes@maths.tcd.ie>
% CJCH, Dec 11, 1995: added \footruleskip to allow control over footrule
% position (old hardcoded value of .3\normalbaselineskip is far too high
% when used with very small footer fonts).
% Jan 31, 1996:
% version 1.95: call \@normalsize in the reset code if that is defined,
% otherwise \normalsize.
% this is to solve a problem with ucthesis.cls, as this doesn't
% define \@currsize. Unfortunately for latex209 calling \normalsize doesn't
% work as this is optimized to do very little, so there \@normalsize should
% be called. Hopefully this code works for all versions of LaTeX known to
% mankind.
% April 25, 1996:
% version 1.96: initialize \headwidth to a magic (negative) value to catch
% most common cases that people change it before calling \pagestyle{fancy}.
% Note it can't be initialized when reading in this file, because
% \textwidth could be changed afterwards. This is quite probable.
% We also switch to \MakeUppercase rather than \uppercase and introduce a
% \nouppercase command for use in headers. and footers.
% May 3, 1996:
% version 1.97: Two changes:
% 1. Undo the change in version 1.8 (using the pagestyle{headings} defaults
% for the chapter and section marks. The current version of amsbook and
% amsart classes don't seem to need them anymore. Moreover the standard
% latex classes don't use \markboth if twoside isn't selected, and this is
% confusing as \leftmark doesn't work as expected.
% 2. include a call to \ps@empty in ps@@fancy. This is to solve a problem
% in the amsbook and amsart classes, that make global changes to \topskip,
% which are reset in \ps@empty. Hopefully this doesn't break other things.
% May 7, 1996:
% version 1.98:
% Added % after the line \def\nouppercase
% May 7, 1996:
% version 1.99: This is the alpha version of fancyhdr 2.0
% Introduced the new commands \fancyhead, \fancyfoot, and \fancyhf.
% Changed \headrulewidth, \footrulewidth, \footruleskip to
% macros rather than length parameters, In this way they can be
% conditionalized and they don't consume length registers. There is no need
% to have them as length registers unless you want to do calculations with
% them, which is unlikely. Note that this may make some uses of them
% incompatible (i.e. if you have a file that uses \setlength or \xxxx=)
% May 10, 1996:
% version 1.99a:
% Added a few more % signs
% May 10, 1996:
% version 1.99b:
% Changed the syntax of \f@nfor to be resistent to catcode changes of :=
% Removed the [1] from the defs of \lhead etc. because the parameter is
% consumed by the \@[xy]lhead etc. macros.
% June 24, 1997:
% version 1.99c:
% corrected \nouppercase to also include the protected form of \MakeUppercase
% \global added to manipulation of \headwidth.
% \iffootnote command added.
% Some comments added about \@fancyhead and \@fancyfoot.
% Aug 24, 1998
% version 1.99d
% Changed the default \ps@empty to \ps@@empty in order to allow
% \fancypagestyle{empty} redefinition.
% Oct 11, 2000
% version 2.0
% Added LPPL license clause.
%
% A check for \headheight is added. An errormessage is given (once) if the
% header is too large. Empty headers don't generate the error even if
% \headheight is very small or even 0pt.
% Warning added for the use of 'E' option when twoside option is not used.
% In this case the 'E' fields will never be used.
%
% Mar 10, 2002
% version 2.1beta
% New command: \fancyhfoffset[place]{length}
% defines offsets to be applied to the header/footer to let it stick into
% the margins (if length > 0).
% place is like in fancyhead, except that only E,O,L,R can be used.
% This replaces the old calculation based on \headwidth and the marginpar
% area.
% \headwidth will be dynamically calculated in the headers/footers when
% this is used.
%
% Mar 26, 2002
% version 2.1beta2
% \fancyhfoffset now also takes h,f as possible letters in the argument to
% allow the header and footer widths to be different.
% New commands \fancyheadoffset and \fancyfootoffset added comparable to
% \fancyhead and \fancyfoot.
% Errormessages and warnings have been made more informative.
%
% Dec 9, 2002
% version 2.1
% The defaults for \footrulewidth, \plainheadrulewidth and
% \plainfootrulewidth are changed from \z@skip to 0pt. In this way when
% someone inadvertently uses \setlength to change any of these, the value
% of \z@skip will not be changed, rather an errormessage will be given.
% March 3, 2004
% Release of version 3.0
% Oct 7, 2004
% version 3.1
% Added '\endlinechar=13' to \fancy@reset to prevent problems with
% includegraphics in header when verbatiminput is active.
% March 22, 2005
% version 3.2
% reset \everypar (the real one) in \fancy@reset because spanish.ldf does
% strange things with \everypar between << and >>.
\def\ifancy@mpty#1{\def\temp@a{#1}\ifx\temp@a\@empty}
\def\fancy@def#1#2{\ifancy@mpty{#2}\fancy@gbl\def#1{\leavevmode}\else
\fancy@gbl\def#1{#2\strut}\fi}
\let\fancy@gbl\global
\def\@fancyerrmsg#1{%
\ifx\PackageError\undefined
\errmessage{#1}\else
\PackageError{Fancyhdr}{#1}{}\fi}
\def\@fancywarning#1{%
\ifx\PackageWarning\undefined
\errmessage{#1}\else
\PackageWarning{Fancyhdr}{#1}{}\fi}
% Usage: \@forc \var{charstring}{command to be executed for each char}
% This is similar to LaTeX's \@tfor, but expands the charstring.
\def\@forc#1#2#3{\expandafter\f@rc\expandafter#1\expandafter{#2}{#3}}
\def\f@rc#1#2#3{\def\temp@ty{#2}\ifx\@empty\temp@ty\else
\f@@rc#1#2\f@@rc{#3}\fi}
\def\f@@rc#1#2#3\f@@rc#4{\def#1{#2}#4\f@rc#1{#3}{#4}}
% Usage: \f@nfor\name:=list\do{body}
% Like LaTeX's \@for but an empty list is treated as a list with an empty
% element
\newcommand{\f@nfor}[3]{\edef\@fortmp{#2}%
\expandafter\@forloop#2,\@nil,\@nil\@@#1{#3}}
% Usage: \def@ult \cs{defaults}{argument}
% sets \cs to the characters from defaults appearing in argument
% or defaults if it would be empty. All characters are lowercased.
\newcommand\def@ult[3]{%
\edef\temp@a{\lowercase{\edef\noexpand\temp@a{#3}}}\temp@a
\def#1{}%
\@forc\tmpf@ra{#2}%
{\expandafter\if@in\tmpf@ra\temp@a{\edef#1{#1\tmpf@ra}}{}}%
\ifx\@empty#1\def#1{#2}\fi}
%
% \if@in <char><set><truecase><falsecase>
%
\newcommand{\if@in}[4]{%
\edef\temp@a{#2}\def\temp@b##1#1##2\temp@b{\def\temp@b{##1}}%
\expandafter\temp@b#2#1\temp@b\ifx\temp@a\temp@b #4\else #3\fi}
\newcommand{\fancyhead}{\@ifnextchar[{\f@ncyhf\fancyhead h}%
{\f@ncyhf\fancyhead h[]}}
\newcommand{\fancyfoot}{\@ifnextchar[{\f@ncyhf\fancyfoot f}%
{\f@ncyhf\fancyfoot f[]}}
\newcommand{\fancyhf}{\@ifnextchar[{\f@ncyhf\fancyhf{}}%
{\f@ncyhf\fancyhf{}[]}}
% New commands for offsets added
\newcommand{\fancyheadoffset}{\@ifnextchar[{\f@ncyhfoffs\fancyheadoffset h}%
{\f@ncyhfoffs\fancyheadoffset h[]}}
\newcommand{\fancyfootoffset}{\@ifnextchar[{\f@ncyhfoffs\fancyfootoffset f}%
{\f@ncyhfoffs\fancyfootoffset f[]}}
\newcommand{\fancyhfoffset}{\@ifnextchar[{\f@ncyhfoffs\fancyhfoffset{}}%
{\f@ncyhfoffs\fancyhfoffset{}[]}}
% The header and footer fields are stored in command sequences with
% names of the form: \f@ncy<x><y><z> with <x> for [eo], <y> from [lcr]
% and <z> from [hf].
\def\f@ncyhf#1#2[#3]#4{%
\def\temp@c{}%
\@forc\tmpf@ra{#3}%
{\expandafter\if@in\tmpf@ra{eolcrhf,EOLCRHF}%
{}{\edef\temp@c{\temp@c\tmpf@ra}}}%
\ifx\@empty\temp@c\else
\@fancyerrmsg{Illegal char `\temp@c' in \string#1 argument:
[#3]}%
\fi
\f@nfor\temp@c{#3}%
{\def@ult\f@@@eo{eo}\temp@c
\if@twoside\else
\if\f@@@eo e\@fancywarning
{\string#1's `E' option without twoside option is useless}\fi\fi
\def@ult\f@@@lcr{lcr}\temp@c
\def@ult\f@@@hf{hf}{#2\temp@c}%
\@forc\f@@eo\f@@@eo
{\@forc\f@@lcr\f@@@lcr
{\@forc\f@@hf\f@@@hf
{\expandafter\fancy@def\csname
f@ncy\f@@eo\f@@lcr\f@@hf\endcsname
{#4}}}}}}
\def\f@ncyhfoffs#1#2[#3]#4{%
\def\temp@c{}%
\@forc\tmpf@ra{#3}%
{\expandafter\if@in\tmpf@ra{eolrhf,EOLRHF}%
{}{\edef\temp@c{\temp@c\tmpf@ra}}}%
\ifx\@empty\temp@c\else
\@fancyerrmsg{Illegal char `\temp@c' in \string#1 argument:
[#3]}%
\fi
\f@nfor\temp@c{#3}%
{\def@ult\f@@@eo{eo}\temp@c
\if@twoside\else
\if\f@@@eo e\@fancywarning
{\string#1's `E' option without twoside option is useless}\fi\fi
\def@ult\f@@@lcr{lr}\temp@c
\def@ult\f@@@hf{hf}{#2\temp@c}%
\@forc\f@@eo\f@@@eo
{\@forc\f@@lcr\f@@@lcr
{\@forc\f@@hf\f@@@hf
{\expandafter\setlength\csname
f@ncyO@\f@@eo\f@@lcr\f@@hf\endcsname
{#4}}}}}%
\fancy@setoffs}
% Fancyheadings version 1 commands. These are more or less deprecated,
% but they continue to work.
\newcommand{\lhead}{\@ifnextchar[{\@xlhead}{\@ylhead}}
\def\@xlhead[#1]#2{\fancy@def\f@ncyelh{#1}\fancy@def\f@ncyolh{#2}}
\def\@ylhead#1{\fancy@def\f@ncyelh{#1}\fancy@def\f@ncyolh{#1}}
\newcommand{\chead}{\@ifnextchar[{\@xchead}{\@ychead}}
\def\@xchead[#1]#2{\fancy@def\f@ncyech{#1}\fancy@def\f@ncyoch{#2}}
\def\@ychead#1{\fancy@def\f@ncyech{#1}\fancy@def\f@ncyoch{#1}}
\newcommand{\rhead}{\@ifnextchar[{\@xrhead}{\@yrhead}}
\def\@xrhead[#1]#2{\fancy@def\f@ncyerh{#1}\fancy@def\f@ncyorh{#2}}
\def\@yrhead#1{\fancy@def\f@ncyerh{#1}\fancy@def\f@ncyorh{#1}}
\newcommand{\lfoot}{\@ifnextchar[{\@xlfoot}{\@ylfoot}}
\def\@xlfoot[#1]#2{\fancy@def\f@ncyelf{#1}\fancy@def\f@ncyolf{#2}}
\def\@ylfoot#1{\fancy@def\f@ncyelf{#1}\fancy@def\f@ncyolf{#1}}
\newcommand{\cfoot}{\@ifnextchar[{\@xcfoot}{\@ycfoot}}
\def\@xcfoot[#1]#2{\fancy@def\f@ncyecf{#1}\fancy@def\f@ncyocf{#2}}
\def\@ycfoot#1{\fancy@def\f@ncyecf{#1}\fancy@def\f@ncyocf{#1}}
\newcommand{\rfoot}{\@ifnextchar[{\@xrfoot}{\@yrfoot}}
\def\@xrfoot[#1]#2{\fancy@def\f@ncyerf{#1}\fancy@def\f@ncyorf{#2}}
\def\@yrfoot#1{\fancy@def\f@ncyerf{#1}\fancy@def\f@ncyorf{#1}}
\newlength{\fancy@headwidth}
\let\headwidth\fancy@headwidth
\newlength{\f@ncyO@elh}
\newlength{\f@ncyO@erh}
\newlength{\f@ncyO@olh}
\newlength{\f@ncyO@orh}
\newlength{\f@ncyO@elf}
\newlength{\f@ncyO@erf}
\newlength{\f@ncyO@olf}
\newlength{\f@ncyO@orf}
\newcommand{\headrulewidth}{0.4pt}
\newcommand{\footrulewidth}{0pt}
\newcommand{\footruleskip}{.3\normalbaselineskip}
% Fancyplain stuff shouldn't be used anymore (rather
% \fancypagestyle{plain} should be used), but it must be present for
% compatibility reasons.
\newcommand{\plainheadrulewidth}{0pt}
\newcommand{\plainfootrulewidth}{0pt}
\newif\if@fancyplain \@fancyplainfalse
\def\fancyplain#1#2{\if@fancyplain#1\else#2\fi}
\headwidth=-123456789sp %magic constant
% Command to reset various things in the headers:
% a.o. single spacing (taken from setspace.sty)
% and the catcode of ^^M (so that epsf files in the header work if a
% verbatim crosses a page boundary)
% It also defines a \nouppercase command that disables \uppercase and
% \Makeuppercase. It can only be used in the headers and footers.
\let\fnch@everypar\everypar% save real \everypar because of spanish.ldf
\def\fancy@reset{\fnch@everypar{}\restorecr\endlinechar=13
\def\baselinestretch{1}%
\def\nouppercase##1{{\let\uppercase\relax\let\MakeUppercase\relax
\expandafter\let\csname MakeUppercase \endcsname\relax##1}}%
\ifx\undefined\@newbaseline% NFSS not present; 2.09 or 2e
\ifx\@normalsize\undefined \normalsize % for ucthesis.cls
\else \@normalsize \fi
\else% NFSS (2.09) present
\@newbaseline%
\fi}
% Initialization of the head and foot text.
% The default values still contain \fancyplain for compatibility.
\fancyhf{} % clear all
% lefthead empty on ``plain'' pages, \rightmark on even, \leftmark on odd pages
% evenhead empty on ``plain'' pages, \leftmark on even, \rightmark on odd pages
\if@twoside
\fancyhead[el,or]{\fancyplain{}{\sl\rightmark}}
\fancyhead[er,ol]{\fancyplain{}{\sl\leftmark}}
\else
\fancyhead[l]{\fancyplain{}{\sl\rightmark}}
\fancyhead[r]{\fancyplain{}{\sl\leftmark}}
\fi
\fancyfoot[c]{\rm\thepage} % page number
% Use box 0 as a temp box and dimen 0 as temp dimen.
% This can be done, because this code will always
% be used inside another box, and therefore the changes are local.
\def\@fancyvbox#1#2{\setbox0\vbox{#2}\ifdim\ht0>#1\@fancywarning
{\string#1 is too small (\the#1): ^^J Make it at least \the\ht0.^^J
We now make it that large for the rest of the document.^^J
This may cause the page layout to be inconsistent, however\@gobble}%
\dimen0=#1\global\setlength{#1}{\ht0}\ht0=\dimen0\fi
\box0}
% Put together a header or footer given the left, center and
% right text, fillers at left and right and a rule.
% The \lap commands put the text into an hbox of zero size,
% so overlapping text does not generate an errormessage.
% These macros have 5 parameters:
% 1. LEFTSIDE BEARING % This determines at which side the header will stick
% out. When \fancyhfoffset is used this calculates \headwidth, otherwise
% it is \hss or \relax (after expansion).
% 2. \f@ncyolh, \f@ncyelh, \f@ncyolf or \f@ncyelf. This is the left component.
% 3. \f@ncyoch, \f@ncyech, \f@ncyocf or \f@ncyecf. This is the middle comp.
% 4. \f@ncyorh, \f@ncyerh, \f@ncyorf or \f@ncyerf. This is the right component.
% 5. RIGHTSIDE BEARING. This is always \relax or \hss (after expansion).
\def\@fancyhead#1#2#3#4#5{#1\hbox to\headwidth{\fancy@reset
\@fancyvbox\headheight{\hbox
{\rlap{\parbox[b]{\headwidth}{\raggedright#2}}\hfill
\parbox[b]{\headwidth}{\centering#3}\hfill
\llap{\parbox[b]{\headwidth}{\raggedleft#4}}}\headrule}}#5}
\def\@fancyfoot#1#2#3#4#5{#1\hbox to\headwidth{\fancy@reset
\@fancyvbox\footskip{\footrule
\hbox{\rlap{\parbox[t]{\headwidth}{\raggedright#2}}\hfill
\parbox[t]{\headwidth}{\centering#3}\hfill
\llap{\parbox[t]{\headwidth}{\raggedleft#4}}}}}#5}
\def\headrule{{\if@fancyplain\let\headrulewidth\plainheadrulewidth\fi
\hrule\@height\headrulewidth\@width\headwidth \vskip-\headrulewidth}}
\def\footrule{{\if@fancyplain\let\footrulewidth\plainfootrulewidth\fi
\vskip-\footruleskip\vskip-\footrulewidth
\hrule\@width\headwidth\@height\footrulewidth\vskip\footruleskip}}
\def\ps@fancy{%
\@ifundefined{@chapapp}{\let\@chapapp\chaptername}{}%for amsbook
%
% Define \MakeUppercase for old LaTeXen.
% Note: we used \def rather than \let, so that \let\uppercase\relax (from
% the version 1 documentation) will still work.
%
\@ifundefined{MakeUppercase}{\def\MakeUppercase{\uppercase}}{}%
\@ifundefined{chapter}{\def\sectionmark##1{\markboth
{\MakeUppercase{\ifnum \c@secnumdepth>\z@
\thesection\hskip 1em\relax \fi ##1}}{}}%
\def\subsectionmark##1{\markright {\ifnum \c@secnumdepth >\@ne
\thesubsection\hskip 1em\relax \fi ##1}}}%
{\def\chaptermark##1{\markboth {\MakeUppercase{\ifnum \c@secnumdepth>\m@ne
\@chapapp\ \thechapter. \ \fi ##1}}{}}%
\def\sectionmark##1{\markright{\MakeUppercase{\ifnum \c@secnumdepth >\z@
\thesection. \ \fi ##1}}}}%
%\csname ps@headings\endcsname % use \ps@headings defaults if they exist
\ps@@fancy
\gdef\ps@fancy{\@fancyplainfalse\ps@@fancy}%
% Initialize \headwidth if the user didn't
%
\ifdim\headwidth<0sp
%
% This catches the case that \headwidth hasn't been initialized and the
% case that the user added something to \headwidth in the expectation that
% it was initialized to \textwidth. We compensate this now. This loses if
% the user intended to multiply it by a factor. But that case is more
% likely done by saying something like \headwidth=1.2\textwidth.
% The doc says you have to change \headwidth after the first call to
% \pagestyle{fancy}. This code is just to catch the most common cases were
% that requirement is violated.
%
\global\advance\headwidth123456789sp\global\advance\headwidth\textwidth
\fi}
\def\ps@fancyplain{\ps@fancy \let\ps@plain\ps@plain@fancy}
\def\ps@plain@fancy{\@fancyplaintrue\ps@@fancy}
\let\ps@@empty\ps@empty
\def\ps@@fancy{%
\ps@@empty % This is for amsbook/amsart, which do strange things with \topskip
\def\@mkboth{\protect\markboth}%
\def\@oddhead{\@fancyhead\fancy@Oolh\f@ncyolh\f@ncyoch\f@ncyorh\fancy@Oorh}%
\def\@oddfoot{\@fancyfoot\fancy@Oolf\f@ncyolf\f@ncyocf\f@ncyorf\fancy@Oorf}%
\def\@evenhead{\@fancyhead\fancy@Oelh\f@ncyelh\f@ncyech\f@ncyerh\fancy@Oerh}%
\def\@evenfoot{\@fancyfoot\fancy@Oelf\f@ncyelf\f@ncyecf\f@ncyerf\fancy@Oerf}%
}
% Default definitions for compatibility mode:
% These cause the header/footer to take the defined \headwidth as width
% And to shift in the direction of the marginpar area
\def\fancy@Oolh{\if@reversemargin\hss\else\relax\fi}
\def\fancy@Oorh{\if@reversemargin\relax\else\hss\fi}
\let\fancy@Oelh\fancy@Oorh
\let\fancy@Oerh\fancy@Oolh
\let\fancy@Oolf\fancy@Oolh
\let\fancy@Oorf\fancy@Oorh
\let\fancy@Oelf\fancy@Oelh
\let\fancy@Oerf\fancy@Oerh
% New definitions for the use of \fancyhfoffset
% These calculate the \headwidth from \textwidth and the specified offsets.
\def\fancy@offsolh{\headwidth=\textwidth\advance\headwidth\f@ncyO@olh
\advance\headwidth\f@ncyO@orh\hskip-\f@ncyO@olh}
\def\fancy@offselh{\headwidth=\textwidth\advance\headwidth\f@ncyO@elh
\advance\headwidth\f@ncyO@erh\hskip-\f@ncyO@elh}
\def\fancy@offsolf{\headwidth=\textwidth\advance\headwidth\f@ncyO@olf
\advance\headwidth\f@ncyO@orf\hskip-\f@ncyO@olf}
\def\fancy@offself{\headwidth=\textwidth\advance\headwidth\f@ncyO@elf
\advance\headwidth\f@ncyO@erf\hskip-\f@ncyO@elf}
\def\fancy@setoffs{%
% Just in case \let\headwidth\textwidth was used
\fancy@gbl\let\headwidth\fancy@headwidth
\fancy@gbl\let\fancy@Oolh\fancy@offsolh
\fancy@gbl\let\fancy@Oelh\fancy@offselh
\fancy@gbl\let\fancy@Oorh\hss
\fancy@gbl\let\fancy@Oerh\hss
\fancy@gbl\let\fancy@Oolf\fancy@offsolf
\fancy@gbl\let\fancy@Oelf\fancy@offself
\fancy@gbl\let\fancy@Oorf\hss
\fancy@gbl\let\fancy@Oerf\hss}
\newif\iffootnote
\let\latex@makecol\@makecol
\def\@makecol{\ifvoid\footins\footnotetrue\else\footnotefalse\fi
\let\topfloat\@toplist\let\botfloat\@botlist\latex@makecol}
\def\iftopfloat#1#2{\ifx\topfloat\empty #2\else #1\fi}
\def\ifbotfloat#1#2{\ifx\botfloat\empty #2\else #1\fi}
\def\iffloatpage#1#2{\if@fcolmade #1\else #2\fi}
\newcommand{\fancypagestyle}[2]{%
\@namedef{ps@#1}{\let\fancy@gbl\relax#2\relax\ps@fancy}}

File diff suppressed because it is too large

Binary files not shown.

1441
report/icml2017.bst Normal file

File diff suppressed because it is too large

176
report/mlp-cw2-questions.tex Normal file
View File

@ -0,0 +1,176 @@
%% REPLACE sXXXXXXX with your student number
\def\studentNumber{s2759177}
%% START of YOUR ANSWERS
%% Add answers to the questions below, by replacing the text inside the brackets {} for \youranswer{ "Text to be replaced with your answer." }.
%
% Do not delete the commands for adding figures and tables. Instead fill in the missing values with your experiment results, and replace the images with your own respective figures.
%
% You can generally delete the placeholder text, such as for example the text "Question Figure 3 - Replace the images ..."
%
% There are 5 TEXT QUESTIONS. Replace the text inside the brackets of the command \youranswer with your answer to the question.
%
% There are also 3 "questions" to replace some placeholder FIGURES with your own, and 1 "question" asking you to fill in the missing entries in the TABLE provided.
%
% NOTE! that questions are ordered by the order of appearance of their answers in the text, and not necessarily by the order you should tackle them. You should attempt to fill in the TABLE and FIGURES before discussing the results presented there.
%
% NOTE! If for some reason you do not manage to produce results for some FIGURES and the TABLE, then you can get partial marks by discussing your expectations of the results in the relevant TEXT QUESTIONS. The TABLE specifically has enough information in it already for you to draw meaningful conclusions.
%
% Please refer to the coursework specification for more details.
%% - - - - - - - - - - - - TEXT QUESTIONS - - - - - - - - - - - -
%% Question 1:
% Use Figures 1, 2, and 3 to identify the Vanishing Gradient Problem (which of these model suffers from it, and what are the consequences depicted?).
% The average length for an answer to this question is approximately 1/5 of the columns in a 2-column page
\newcommand{\questionOne} {
\youranswer{
We can observe the 8-layer network learning (even though it does not achieve high accuracy), whereas the 38-layer network fails to learn, as its gradients vanish almost entirely in the earlier layers. This is evident in Figure 3, where the gradients in VGG38 are close to zero for all but the last few layers, preventing effective weight updates during backpropagation. Consequently, the deeper network is unable to extract meaningful features or minimize its loss, leading to stagnation in both training and validation performance.
We conclude that VGG08 trains nominally, while VGG38 suffers from the vanishing gradient problem: its gradients diminish to near zero in the early layers, impeding effective weight updates and nullifying the advantage of its greater depth, as reflected in its stagnant loss and accuracy throughout training. VGG08, in contrast, maintains a healthy gradient flow across layers, allowing effective weight updates and enabling the network to learn features, reduce its loss, and improve its accuracy despite its smaller depth.
}
}
%% Question 2:
% Consider these results (including Figure 1 from \cite{he2016deep}). Discuss the relation between network capacity and overfitting, and whether, and how, this is reflected on these results. What other factors may have lead to this difference in performance?
% The average length for an answer to this question is approximately 1/5 of the columns in a 2-column page
\newcommand{\questionTwo} {
\youranswer{Our results thus corroborate that increasing network depth can lead to higher training and testing errors, as seen in the comparison between VGG08 and VGG38. While deeper networks, like VGG38, have a larger capacity to learn complex features, they may struggle to generalize effectively, resulting in overfitting and poor performance on unseen data. This is consistent with the behaviour observed in Figure 1 from \cite{he2016deep}, where the 56-layer network exhibits higher training error and, consequently, higher test error compared to the 20-layer network.
Our results suggest that the increased capacity of VGG38 does not translate into better generalization, likely due to the vanishing gradient problem, which hinders learning in deeper networks. Other factors, such as inadequate regularization or insufficient data augmentation, could also contribute to the observed performance difference, leading to overfitting in deeper architectures.}
}
%% Question 3:
% In this coursework, we didn't incorporate residual connections to the downsampling layers. Explain and justify what would need to be changed in order to add residual connections to the downsampling layers. Give and explain 2 ways of incorporating these changes and discuss pros and cons of each.
\newcommand{\questionThree} {
\youranswer{
Our work does not incorporate residual connections across the downsampling layers, as this creates a dimensional mismatch between the input and output feature maps due to the reduction in spatial dimensions. To add residual connections, one approach is to use a convolutional layer with a kernel size of $1\times 1$, stride, and padding matched to the downsampling operation to transform the input to the same shape as the output. Another approach would be to use average pooling or max pooling directly on the residual connection to downsample the input feature map, matching its spatial dimensions to the output, followed by a linear transformation to align the channel dimensions.
The difference between these two methods is that the first approach using a $1\times 1$ convolution provides more flexibility by learning the transformation, which can enhance model expressiveness but increases computational cost, whereas the second approach with pooling is computationally cheaper and simpler but may lose fine-grained information due to the fixed, non-learnable nature of pooling operations.
}
}
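For illustration, a minimal PyTorch-style sketch of the first approach described in the answer above (a strided 1x1 projection on the shortcut of a downsampling block). This is not the coursework's actual code; the class and layer names are assumptions. The pooling-based alternative would replace the projection with, e.g., average pooling followed by a channel-aligning transformation.

import torch.nn as nn

class DownsamplingResidualBlock(nn.Module):
    """Illustrative block: the main path halves the spatial size, and a strided
    1x1 convolution (plus BN) projects the shortcut to the matching shape."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.main = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.LeakyReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
        )
        # Learnable projection shortcut: flexible, but adds parameters and compute.
        self.shortcut = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=2),
            nn.BatchNorm2d(out_channels),
        )
        self.act = nn.LeakyReLU(inplace=True)

    def forward(self, x):
        return self.act(self.main(x) + self.shortcut(x))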
%% Question 4:
% Question 4 - Present and discuss the experiment results (all of the results and not just the ones you had to fill in) in Table 1 and Figures 4 and 5 (you may use any of the other Figures if you think they are relevant to your analysis). You will have to determine what data are relevant to the discussion, and what information can be extracted from it. Also, discuss what further experiments you would have ran on any combination of VGG08, VGG38, BN, RC in order to
% \begin{itemize}
% \item Improve performance of the model trained (explain why you expect your suggested experiments will help with this).
% \item Learn more about the behaviour of BN and RC (explain what you are trying to learn and how).
% \end{itemize}
%
% The average length for an answer to this question is approximately 1 of the columns in a 2-column page
\newcommand{\questionFour} {
\youranswer{
Our results demonstrate the effectiveness of batch normalization~\cite{ioffe2015batch} and residual connections~\cite{he2016deep}, enabling effective training of deep convolutional networks, as shown by the significant improvement in training and validation performance for VGG38 when incorporating these techniques. Table~\ref{tab:CIFAR_results} highlights that adding BN alone (VGG38 BN) reduces both training and validation losses compared to the baseline VGG38, with validation accuracy increasing from near-zero to $47.68\%$ at a learning rate (LR) of $1\mathrm{e}{-3}$. Adding RC further enhances performance, as seen in VGG38 RC achieving $52.32\%$ validation accuracy under the same conditions. The combination of BN and RC (VGG38 BN + RC) yields the best results, achieving $53.76\%$ validation accuracy with LR $1\mathrm{e}{-3}$. BN+RC appears to benefit greatly from a higher learning rate, improving further to $58.20\%$ at an LR of $1\mathrm{e}{-2}$. BN alone, however, deteriorates at the higher learning rate, as evidenced by lower validation accuracy, emphasizing the stabilizing role of RC. \autoref{fig:training_curves_bestModel} confirms the synergy of BN and RC, with the VGG38 BN + RC model reaching $74\%$ training accuracy and plateauing near $60\%$ validation accuracy. \autoref{fig:avg_grad_flow_bestModel} illustrates stable gradient flow, with BN mitigating vanishing gradients and RC maintaining gradient propagation through deeper layers, particularly in the later stages of the network.
While this work did not evaluate residual connections on downsampling layers, a thorough evaluation of both methods put forth earlier would be required to complete the picture, highlighting how exactly residual connections in downsampling layers affect gradient flow, feature learning, and overall performance. Such an evaluation would clarify whether the additional computational cost of using $1\times 1$ convolutions for matching dimensions is justified by improved accuracy or if the simpler pooling-based approach suffices, particularly for tasks where computational efficiency is crucial.
}
}
%% Question 5:
% Briefly draw your conclusions based on the results from the previous sections (what are the take-away messages?) and conclude your report with a recommendation for future work.
%
% Good recommendations for future work also draw on the broader literature (the papers already referenced are good starting points). Great recommendations for future work are not just incremental (an example of an incremental suggestion would be: ``we could also train with different learning rates'') but instead also identify meaningful questions or, in other words, questions with answers that might be somewhat more generally applicable.
%
% For example, \citep{huang2017densely} end with \begin{quote}``Because of their compact internal representations and reduced feature redundancy, DenseNets may be good feature extractors for various computer vision tasks that build on convolutional features, e.g., [4,5].''\end{quote}
%
% while \cite{bengio1993problem} state in their conclusions that \begin{quote}``There remains theoretical questions to be considered, such as whether the problem with simple gradient descent discussed in this paper would be observed with chaotic attractors that are not hyperbolic.''\\\end{quote}
%
% The length of this question description is indicative of the average length of a conclusion section
\newcommand{\questionFive} {
\youranswer{
The results presented showcase a clear solution to the vanishing gradient problem. With batch normalization and residual connections, we are able to train much deeper neural networks effectively, as evidenced by the improved performance of VGG38 with these modifications. The combination of BN and RC not only stabilizes gradient flow but also enhances both training and validation accuracy, particularly when paired with an appropriate learning rate. These findings reinforce the utility of architectural innovations like those proposed in \cite{he2016deep} and \cite{ioffe2015batch}, which have become foundational in modern deep learning.
While these methods appear to enable training of deeper neural networks, the critical question of how these architectural enhancements generalize across different datasets and tasks remains open. Future work could investigate the effectiveness of BN and RC in scenarios involving large-scale datasets, such as ImageNet, or in domains like natural language processing and generative models, where deep architectures also face optimization challenges. Additionally, exploring the interplay between residual connections and emerging techniques like attention mechanisms \citep{vaswani2017attention} might uncover further synergies. Beyond this, understanding the theoretical underpinnings of how residual connections influence optimization landscapes and gradient flow could yield insights applicable to designing novel architectures.}
}
%% - - - - - - - - - - - - FIGURES - - - - - - - - - - - -
%% Question Figure 3:
\newcommand{\questionFigureThree} {
% Question Figure 3 - Replace this image with a figure depicting the average gradient across layers, for the VGG38 model.
%\textit{(The provided figure is correct, and can be used in your analysis. It is partially obscured so you can get credit for producing your own copy).}
\youranswer{
\begin{figure}[t]
\centering
\includegraphics[width=\linewidth]{figures/gradplot_38.pdf}
\caption{Gradient Flow on VGG38}
\label{fig:avg_grad_flow_38}
\end{figure}
}
}
%% Question Figure 4:
% Question Figure 4 - Replace this image with a figure depicting the training curves for the model with the best performance \textit{across experiments you have available (you don't need to run the experiments for the models we already give you results for)}. Edit the caption so that it clearly identifies the model and what is depicted.
\newcommand{\questionFigureFour} {
\youranswer{
\begin{figure}[t]
\begin{subfigure}{\linewidth}
\centering
\includegraphics[width=\linewidth]{figures/VGG38_BN_RC_loss_performance.pdf}
\caption{Cross entropy error per epoch}
\label{fig:vgg38_loss_curves}
\end{subfigure}
\begin{subfigure}{\linewidth}
\centering
\includegraphics[width=\linewidth]{figures/VGG38_BN_RC_accuracy_performance.pdf}
\caption{Classification accuracy per epoch}
\label{fig:vgg38_acc_curves}
\end{subfigure}
\caption{Training curves for the 38 layer CNN with batch normalization and residual connections, trained with LR of $0.01$}
\label{fig:training_curves_bestModel}
\end{figure}
}
}
%% Question Figure 5:
% Question Figure 5 - Replace this image with a figure depicting the average gradient across layers, for the model with the best performance \textit{across experiments you have available (you don't need to run the experiments for the models we already give you results for)}. Edit the caption so that it clearly identifies the model and what is depicted.
\newcommand{\questionFigureFive} {
\youranswer{
\begin{figure}[t]
\centering
\includegraphics[width=\linewidth]{figures/gradplot_38_bn_rc.pdf}
\caption{Gradient Flow for the 38 layer CNN with batch normalization and residual connections, trained with LR of $0.01$}
\label{fig:avg_grad_flow_bestModel}
\end{figure}
}
}
%% - - - - - - - - - - - - TABLES - - - - - - - - - - - -
%% Question Table 1:
% Question Table 1 - Fill in Table 1 with the results from your experiments on
% \begin{enumerate}
% \item \textit{VGG38 BN (LR 1e-3)}, and
% \item \textit{VGG38 BN + RC (LR 1e-2)}.
% \end{enumerate}
\newcommand{\questionTableOne} {
\youranswer{
%
\begin{table*}[t]
\centering
\begin{tabular}{lr|ccccc}
\toprule
Model & LR & \# Params & Train loss & Train acc & Val loss & Val acc \\
\midrule
VGG08 & 1e-3 & 60 K & 1.74 & 51.59 & 1.95 & 46.84 \\
VGG38 & 1e-3 & 336 K & 4.61 & 00.01 & 4.61 & 00.01 \\
VGG38 BN & 1e-3 & 339 K & 1.76 & 50.62 & 1.95 & 47.68 \\
VGG38 RC & 1e-3 & 336 K & 1.33 & 61.52 & 1.84 & 52.32 \\
VGG38 BN + RC & 1e-3 & 339 K & 1.26 & 62.99 & 1.73 & 53.76 \\
VGG38 BN & 1e-2 & 339 K & 1.70 & 52.28 & 1.99 & 46.72 \\
VGG38 BN + RC & 1e-2 & 339 K & 0.83 & 74.35 & 1.70 & 58.20 \\
\bottomrule
\end{tabular}
\caption{Experiment results (number of model parameters, Training and Validation loss and accuracy) for different combinations of VGG08, VGG38, Batch Normalisation (BN), and Residual Connections (RC), LR is learning rate.}
\label{tab:CIFAR_results}
\end{table*}
}
}
%% END of YOUR ANSWERS

314
report/mlp-cw2-template.tex Normal file
View File

@ -0,0 +1,314 @@
%% Template for MLP Coursework 2 / 13 November 2023
%% Based on LaTeX template for ICML 2017 - example_paper.tex at
%% https://2017.icml.cc/Conferences/2017/StyleAuthorInstructions
\documentclass{article}
\input{mlp2022_includes}
\definecolor{red}{rgb}{0.95,0.4,0.4}
\definecolor{blue}{rgb}{0.4,0.4,0.95}
\definecolor{orange}{rgb}{1, 0.65, 0}
\newcommand{\youranswer}[1]{{\color{red} \bf[#1]}} %your answer:
%% START of YOUR ANSWERS
\input{mlp-cw2-questions}
%% END of YOUR ANSWERS
%% Do not change anything in this file. Add your answers to mlp-cw1-questions.tex
\begin{document}
\twocolumn[
\mlptitle{MLP Coursework 2}
\centerline{\studentNumber}
\vskip 7mm
]
\begin{abstract}
Deep neural networks have become the state-of-the-art
in many standard computer vision problems thanks to their powerful
representations and availability of large labeled datasets.
While very deep networks allow for learning more levels of abstraction in their layers from the data, training these models successfully is a challenging task due to problematic gradient flow through the layers, known as the vanishing/exploding gradient problem.
In this report, we first analyze this problem in VGG models with 8 and 38 hidden layers on the CIFAR100 image dataset, by monitoring the gradient flow during training.
We explore known solutions to this problem including batch normalization or residual connections, and explain their theory and implementation details.
Our experiments show that batch normalization and residual connections effectively address the aforementioned problem and hence enable a deeper model to outperform shallower ones in the same experimental setup.
\end{abstract}
\section{Introduction}
\label{sec:intro}
Despite the remarkable progress of modern convolutional neural networks (CNNs) in image classification problems~\cite{simonyan2014very, he2016deep}, training very deep networks is a challenging procedure.
One of the major problems is the Vanishing Gradient Problem (VGP), a phenomenon where the gradients of the error function with respect to network weights shrink to zero, as they backpropagate to earlier layers, hence preventing effective weight updates.
This phenomenon is prevalent and has been extensively studied in various deep neural networks including feedforward networks~\cite{glorot2010understanding}, RNNs~\cite{bengio1993problem}, and CNNs~\cite{he2016deep}.
Multiple solutions have been proposed to mitigate this problem by using weight initialization strategies~\cite{glorot2010understanding},
activation functions~\cite{glorot2010understanding}, input normalization~\cite{bishop1995neural},
batch normalization~\cite{ioffe2015batch}, and shortcut connections \cite{he2016deep, huang2017densely}.
This report focuses on diagnosing the VGP occurring in the VGG38 model\footnote{VGG stands for the Visual Geometry Group in the University of Oxford.} and addressing it by implementing two standard solutions.
In particular, we first study a ``broken'' network in terms of its gradient flow, i.e. the L1 norm of the gradients with respect to its weights at each layer, and contrast it with that of the healthy, shallower VGG08 to pinpoint the problem.
Next, we review two standard solutions for this problem, batch normalization (BN)~\cite{ioffe2015batch} and residual connections (RC)~\cite{he2016deep} in detail and discuss how they can address the gradient problem.
We first incorporate batch normalization (denoted as VGG38+BN), residual connections (denoted as VGG38+RC), and their combination (denoted as VGG38+BN+RC) to the given VGG38 architecture.
We train the resulting three configurations, together with the VGG08 and VGG38 baselines, on the CIFAR100 (pronounced `see far 100') dataset and present the results.
The results show that, though the separate use of BN and RC does mitigate the vanishing/exploding gradient problem, thereby enabling effective training of the VGG38 model, the best results are obtained by combining both BN and RC.
%
\section{Identifying training problems of a deep CNN}
\label{sec:task1}
\begin{figure}[t]
\begin{subfigure}{\linewidth}
\centering
\includegraphics[width=\linewidth]{figures/loss_plot.pdf}
\caption{Cross entropy error per epoch}
\label{fig:loss_curves}
\end{subfigure}
\begin{subfigure}{\linewidth}
\centering
\includegraphics[width=\linewidth]{figures/accuracy_plot.pdf}
\caption{Classification accuracy per epoch}
\label{fig:acc_curves}
\end{subfigure}
\caption{Training curves for VGG08 and VGG38 in terms of (a) cross-entropy error and (b) classification accuracy}
\label{fig:curves}
\end{figure}
\begin{figure}[t]
\centering
\includegraphics[width=\linewidth]{figures/grad_flow_vgg08.pdf}
\caption{Gradient flow on VGG08}
\label{fig:grad_flow_08}
\end{figure}
\questionFigureThree
Concretely, training deep neural networks typically involves three steps: forward
pass, backward pass (or backpropagation algorithm~\cite{rumelhart1986learning}) and weight update.
The first step involves passing the input $\bx^{(0)}$ to the network and producing
the network prediction and also the error value.
In detail, each layer takes in the output of the previous layer and applies
a non-linear transformation:
\begin{equation}
\label{eq.fprop}
\bx^{(l)} = f^{(l)}(\bx^{(l-1)}; W^{(l)})
\end{equation}
where $(l)$ denotes the $l$-th layer in $L$ layer deep network,
$f^{(l)}(\cdot,W^{(l)})$ is a non-linear transformation for layer $l$, and $W^{(l)}$ are the weights of layer $l$.
For instance, $f^{(l)}$ is typically a convolution operation followed by an activation function in convolutional neural networks.
The second step involves the backpropagation algorithm, where we calculate the gradient of an error function $E$ (\textit{e.g.} cross-entropy) for each layer's weight as follows:
\begin{equation}
\label{eq.bprop}
\frac{\partial E}{\partial W^{(l)}} = \frac{\partial E}{\partial \bx^{(L)}} \frac{\partial \bx^{(L)}}{\partial \bx^{(L-1)}} \dots \frac{\partial \bx^{(l+1)}}{\partial \bx^{(l)}}\frac{\partial \bx^{(l)}}{\partial W^{(l)}}.
\end{equation}
This step includes consecutive tensor multiplications between multiple
partial derivative terms.
The final step involves updating model weights by using the computed
$\frac{\partial E}{\partial W^{(l)}}$ with an update rule.
The exact update rule depends on the optimizer.
A notorious problem for training deep neural networks is the vanishing/exploding gradient
problem~\cite{bengio1993problem} that typically occurs in the backpropagation step when some of the partial gradient terms in Eq.~\ref{eq.bprop} take values larger or smaller than 1.
In this case, due to the multiple consecutive multiplications, the gradients \textit{w.r.t.} weights can get exponentially very small (close to 0) or very large (close to infinity) and
prevent effective learning of network weights.
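As a purely illustrative example (the numbers below are not taken from our experiments), suppose every Jacobian term in Eq.~\ref{eq.bprop} scaled the backpropagated gradient by a factor of roughly $0.5$; after $30$ layers the gradient magnitude would then shrink by a factor of about
\begin{equation}
0.5^{30} \approx 9.3\times 10^{-10},
\end{equation}
which is far too small to drive meaningful weight updates, whereas a per-layer factor of $2$ would instead amplify it by $2^{30}\approx 10^{9}$.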
%
Figures~\ref{fig:grad_flow_08} and \ref{fig:grad_flow_38} depict the gradient flows through VGG architectures \cite{simonyan2014very} with 8 and 38 layers respectively, trained and evaluated for a total of 100 epochs on the CIFAR100 dataset. \questionOne.
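These gradient-flow figures plot, for every weight layer, the average magnitude of the gradient. As a rough sketch of how such a quantity can be logged in a PyTorch-style training loop (function and variable names here are illustrative assumptions, not the coursework's actual code):

import torch

def average_gradient_per_layer(model: torch.nn.Module) -> dict:
    """Mean absolute gradient per weight tensor; call after loss.backward()."""
    grads = {}
    for name, param in model.named_parameters():
        # Skip bias terms and parameters that did not receive a gradient.
        if param.grad is not None and "bias" not in name:
            grads[name] = param.grad.abs().mean().item()
    return grads

The per-batch values would then be averaged over training and plotted per layer, which is what the gradient-flow figures depict.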
\section{Background Literature}
\label{sec:lit_rev}
In this section we will highlight some of the most influential
papers that have been central to overcoming the VGP in
deep CNNs.
\paragraph{Batch Normalization}\cite{ioffe2015batch}
BN seeks to solve the problem of
internal covariate shift (ICS), i.e. the change in the distribution of each layer's
inputs during training as the parameters of the previous layers change.
The authors argue that without batch normalization, the distribution of
each layer's inputs can vary significantly due to the stochastic nature of randomly sampling mini-batches from the
training set.
Layers in the network must hence continuously adapt to these high-variance distributions, which hinders the rate of convergence of gradient-based optimizers.
This optimization problem is exacerbated further with network depth due
to the updating of parameters at layer $l$ being dependent on
the previous $l-1$ layers.
It is hence beneficial to embed the normalization of
training data into the network architecture after work from
LeCun \emph{et al.} showed that training converges faster with
this addition \cite{lecun2012efficient}. Through standardizing
the inputs to each layer, we take a step towards achieving
the fixed distributions of inputs that remove the ill effects
of ICS. Ioffe and Szegedy demonstrate the effectiveness of
their technique through training an ensemble of BN
networks which achieve an accuracy on the ImageNet classification
task exceeding that of humans in 14 times fewer
training steps than the state-of-the-art of the time.
It should be noted, however, that the exact reason for BN's effectiveness is still not completely understood, and it remains
an open research question~\cite{santurkar2018does}.
\paragraph{Residual networks (ResNet)}\cite{he2016deep} A well-known way of mitigating the VGP is proposed by He~\emph{et al.} in \cite{he2016deep}. In their paper, the authors depict the error curves of a 20-layer and a 56-layer network to motivate their method. Both the training and testing error of the 56-layer network are significantly higher than those of the shallower one.
\questionTwo.
Residual networks, colloquially
known as ResNets, aim to alleviate VGP through the
incorporation of skip connections that bypass the linear
transformations into the network architecture.
The authors argue that this new mapping is significantly easier
to optimize since if an identity mapping were optimal, the
network could comfortably learn to push the residual to
zero rather than attempting to fit an identity mapping via
a stack of nonlinear layers.
They bolster their argument
by successfully training ResNets with depths exceeding
1000 layers on the CIFAR10 dataset.
Prior to their work, training even a 100-layer network was considered
a great challenge within the deep learning community.
The addition of skip connections solves the VGP by
enabling information to flow more freely throughout the
network architecture, without adding either extra
parameters or computational complexity.
\section{Solution overview}
\subsection{Batch normalization}
BN has been a standard component in the state-of-the-art
convolutional neural networks \cite{he2016deep,huang2017densely}.
% As mentioned in Section~\ref{sec:lit_rev},
Concretely, BN is a
layer transformation that is performed to whiten the activations
originating from each layer.
As computing full dataset statistics at each training iteration
would be computationally expensive, BN computes batch statistics
to approximate them.
Given a minibatch of $B$ training samples and their feature maps
$X = (\bx^1, \bx^2,\ldots , \bx^B)$ at an arbitrary layer where $X \in \mathbb{R}^{B\times H \times W \times C}$, $H, W$ are the height, width of the feature map and $C$ is the number of channels, the batch normalization first computes the following statistics:
\begin{align}
\label{eq.bnstats}
\mu_c &= \frac{1}{BWH} \sum_{n=1}^{B}\sum_{i,j=1}^{H,W} \bx_{cij}^{n}\\
\sigma^2_c &= \frac{1}{BWH}
\sum_{n=1}^{B}\sum_{i,j=1}^{H,W} (\bx_{cij}^{n} - \mu_{c})^2
\end{align} where $c$ indexes the channels and $i$, $j$ index the spatial ($y$, $x$) coordinates of the feature maps, and $\bm{\mu}$ and $\bm{\sigma}^2$ are the per-channel mean and variance of the batch.
BN applies the following operation on each feature map in batch B for every $c,i,j$:
\begin{equation}
\label{eq.bnop}
\text{BN}(\bx_{cij}) = \frac{\bx_{cij} - \mu_{c}}{\sqrt{\sigma^2_{c} + \epsilon}} \cdot \gamma_{c} + \beta_{c}
\end{equation} where $\gamma \in \mathbb{R}^C$ and $\beta\in \mathbb{R}^C$ are learnable parameters and $\epsilon$ is a small constant introduced to ensure numerical stability.
At inference time, using batch statistics is a poor choice as it introduces noise in the evaluation and might not even be well defined. Therefore, $\bm{\mu}$ and $\bm{\sigma}$ are replaced by running averages of the mean and variance computed during training, which is a better approximation of the full dataset statistics.
Recent work
has shown that BatchNorm has a more fundamental
benefit of smoothing the optimization landscape during
training \cite{santurkar2018does} thus enhancing the predictive
power of gradients as our guide to the global minimum.
Furthermore, a smoother optimization landscape should
additionally enable the use of a wider range of learning
rates and initialization schemes which is congruent with the
findings of Ioffe and Szegedy in the original BatchNorm
paper~\cite{ioffe2015batch}.
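To make the operation concrete, here is a minimal NumPy sketch of the training-time computation described above, using the batch-first, channel-last layout of the notation above. It is a simplified illustration, not the framework's implementation, and it omits the running averages used at inference.

import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """x: (B, H, W, C) feature maps; gamma, beta: per-channel parameters of shape (C,)."""
    mu = x.mean(axis=(0, 1, 2))            # per-channel mean over batch and spatial dims
    var = x.var(axis=(0, 1, 2))            # per-channel variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # standardize each channel
    return gamma * x_hat + beta            # learnable scale and shift

At inference time the batch statistics mu and var would be replaced by the running averages accumulated during training.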
\subsection{Residual connections}
Residual connections are another approach used in the state-of-the-art Residual Networks~\cite{he2016deep} to tackle the vanishing gradient problem.
Introduced by He~\emph{et al.}~\cite{he2016deep}, a residual block consists of a
convolutional layer (or group of convolutional layers) ``short-circuited'' with an identity mapping.
More precisely, given a mapping $F^{(b)}$ that denotes the transformation of the block $b$ (multiple consecutive layers), $F^{(b)}$ is applied to its input
feature map $\bx^{(b-1)}$ as $\bx^{(b)} = \bx^{(b-1)} + {F}(\bx^{(b-1)})$.
Intuitively, stacking residual blocks creates an architecture in which the input of each block
is given two paths: passing through the convolutions or skipping to the next block. A residual network can therefore be seen as an ensemble model averaging every sub-network
created by choosing one of the two paths. The skip connections allow gradients to flow
easily into early layers, since
\begin{equation}
\frac{\partial \bx^{(b)}}{\partial \bx^{(b-1)}} = \mathbbm{1} + \frac{\partial{F}(\bx^{(b-1)})}{\partial \bx^{(b-1)}}
\label{eq.grad_skip}
\end{equation} where $\bx^{(b-1)} \in \mathbb{R}^{C \times H \times W }$ and $\mathbbm{1}$ is a $\mathbb{R}^{C \times H \times W}$-dimensional tensor with entries 1 where $C$, $H$ and $W$ denote the number of feature maps, its height and width respectively.
Importantly, the $\mathbbm{1}$ term prevents the gradient from vanishing entirely.
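A tiny, self-contained PyTorch check of this property (purely illustrative): even when the residual branch contributes no gradient at all, the identity path still delivers a gradient of one to the block input.

import torch

x = torch.randn(5, requires_grad=True)
y = x + 0.0 * torch.tanh(x)  # a toy x + F(x) whose F has zero derivative
y.sum().backward()
print(x.grad)                # tensor of ones: the identity path keeps gradients flowing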
\section{Experiment Setup}
\questionFigureFour
\questionFigureFive
\questionTableOne
We conduct our experiment on the CIFAR100 dataset \cite{krizhevsky2009learning},
which consists of 60,000 32x32 colour images from 100 different classes. The number of samples per class is balanced, and the
samples are split into training, validation, and test set while
maintaining balanced class proportions. In total, there are 47,500; 2,500; and 10,000 instances in the training, validation,
and test set, respectively. Moreover, we apply data augmentation strategies (cropping, horizontal flipping) to improve the generalization of the model.
With the goal of understanding whether BN or skip connections
help fighting vanishing gradients, we first test these
methods independently, before combining them in an attempt
to fully exploit the depth of the VGG38 model.
All experiments are conducted using the Adam optimizer with the default
learning rate ($1\mathrm{e}{-3}$, unless otherwise specified), cosine annealing, and a batch size of 100
for 100 epochs.
Additionally, training images are augmented with random
cropping and horizontal flipping.
Note that we do not use data augmentation at test time.
These hyperparameters along with the augmentation strategy are used
to produce the results shown in Fig.~\ref{fig:curves}.
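A rough PyTorch-style sketch of such a setup follows; the crop padding, dataset path, and placeholder model are assumptions for illustration, and the coursework's own data pipeline (including its train/validation split) differs in detail.

import torch
import torchvision
import torchvision.transforms as transforms

train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),      # random cropping (padding value assumed)
    transforms.RandomHorizontalFlip(),         # horizontal flipping
    transforms.ToTensor(),
])
test_transform = transforms.ToTensor()         # no augmentation at test time

train_set = torchvision.datasets.CIFAR100("data", train=True, download=True,
                                           transform=train_transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=100, shuffle=True)

model = torch.nn.Sequential(torch.nn.Flatten(),
                            torch.nn.Linear(3 * 32 * 32, 100))  # placeholder for a VGG variant
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)  # 100 epochs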
When used, BN is applied
after each convolutional layer, before the Leaky
ReLU non-linearity.
Similarly, the skip connections are applied from
before the convolution layer to before the final activation function
of the block as per Fig.~2 of \cite{he2016deep}.
Note that adding residual connections between the feature maps before and after downsampling requires special treatment, as there is a dimension mismatch between them.
Therefore in the coursework, we do not use residual connections in the down-sampling blocks. However, please note that batch normalization should still be implemented for these blocks.
\subsection{Residual Connections to Downsampling Layers}
\label{subsec:rescimp}
\questionThree.
\section{Results and Discussion}
\label{sec:disc}
\questionFour.
\section{Conclusion}
\label{sec:concl}
\questionFive.
\bibliography{refs}
\end{document}

720
report/mlp2022.sty Normal file
View File

@ -0,0 +1,720 @@
% File: mlp2017.sty (LaTeX style file for ICML-2017, version of 2017-05-31)
% Modified by Daniel Roy 2017: changed byline to use footnotes for affiliations, and removed emails
% This file contains the LaTeX formatting parameters for a two-column
% conference proceedings that is 8.5 inches wide by 11 inches high.
%
% Modified by Percy Liang 12/2/2013: changed the year, location from the previous template for ICML 2014
% Modified by Fei Sha 9/2/2013: changed the year, location form the previous template for ICML 2013
%
% Modified by Fei Sha 4/24/2013: (1) remove the extra whitespace after the first author's email address (in %the camera-ready version) (2) change the Proceeding ... of ICML 2010 to 2014 so PDF's metadata will show up % correctly
%
% Modified by Sanjoy Dasgupta, 2013: changed years, location
%
% Modified by Francesco Figari, 2012: changed years, location
%
% Modified by Christoph Sawade and Tobias Scheffer, 2011: added line
% numbers, changed years
%
% Modified by Hal Daume III, 2010: changed years, added hyperlinks
%
% Modified by Kiri Wagstaff, 2009: changed years
%
% Modified by Sam Roweis, 2008: changed years
%
% Modified by Ricardo Silva, 2007: update of the ifpdf verification
%
% Modified by Prasad Tadepalli and Andrew Moore, merely changing years.
%
% Modified by Kristian Kersting, 2005, based on Jennifer Dy's 2004 version
% - running title. If the original title is too long or is breaking a line,
% use \mlptitlerunning{...} in the preamble to supply a shorter form.
% Added fancyhdr package to get a running head.
% - Updated to store the page size because pdflatex does compile the
% page size into the pdf.
%
% Hacked by Terran Lane, 2003:
% - Updated to use LaTeX2e style file conventions (ProvidesPackage,
% etc.)
% - Added an ``appearing in'' block at the base of the first column
% (thus keeping the ``appearing in'' note out of the bottom margin
% where the printer should strip in the page numbers).
% - Added a package option [accepted] that selects between the ``Under
% review'' notice (default, when no option is specified) and the
% ``Appearing in'' notice (for use when the paper has been accepted
% and will appear).
%
% Originally created as: ml2k.sty (LaTeX style file for ICML-2000)
% by P. Langley (12/23/99)
%%%%%%%%%%%%%%%%%%%%
%% This version of the style file supports both a ``review'' version
%% and a ``final/accepted'' version. The difference is only in the
%% text that appears in the note at the bottom of the first column of
%% the first page. The default behavior is to print a note to the
%% effect that the paper is under review and don't distribute it. The
%% final/accepted version prints an ``Appearing in'' note. To get the
%% latter behavior, in the calling file change the ``usepackage'' line
%% from:
%% \usepackage{icml2017}
%% to
%% \usepackage[accepted]{icml2017}
%%%%%%%%%%%%%%%%%%%%
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{mlp2022}[2021/10/16 MLP Coursework Style File]
% Use fancyhdr package
\RequirePackage{fancyhdr}
\RequirePackage{color}
\RequirePackage{algorithm}
\RequirePackage{algorithmic}
\RequirePackage{natbib}
\RequirePackage{eso-pic} % used by \AddToShipoutPicture
\RequirePackage{forloop}
%%%%%%%% Options
%\DeclareOption{accepted}{%
% \renewcommand{\Notice@String}{\ICML@appearing}
\gdef\isaccepted{1}
%}
\DeclareOption{nohyperref}{%
\gdef\nohyperref{1}
}
\ifdefined\nohyperref\else\ifdefined\hypersetup
\definecolor{mydarkblue}{rgb}{0,0.08,0.45}
\hypersetup{ %
pdftitle={},
pdfauthor={},
pdfsubject={MLP Coursework 2021-22},
pdfkeywords={},
pdfborder=0 0 0,
pdfpagemode=UseNone,
colorlinks=true,
linkcolor=mydarkblue,
citecolor=mydarkblue,
filecolor=mydarkblue,
urlcolor=mydarkblue,
pdfview=FitH}
\ifdefined\isaccepted \else
\hypersetup{pdfauthor={Anonymous Submission}}
\fi
\fi\fi
%%%%%%%%%%%%%%%%%%%%
% This string is printed at the bottom of the page for the
% final/accepted version of the ``appearing in'' note. Modify it to
% change that text.
%%%%%%%%%%%%%%%%%%%%
\newcommand{\ICML@appearing}{\textit{MLP Coursework 1 2021--22}}
%%%%%%%%%%%%%%%%%%%%
% This string is printed at the bottom of the page for the draft/under
% review version of the ``appearing in'' note. Modify it to change
% that text.
%%%%%%%%%%%%%%%%%%%%
\newcommand{\Notice@String}{MLP Coursework 1 2021--22}
% Cause the declared options to actually be parsed and activated
\ProcessOptions\relax
% Uncomment the following for debugging. It will cause LaTeX to dump
% the version of the ``appearing in'' string that will actually appear
% in the document.
%\typeout{>> Notice string='\Notice@String'}
% Change citation commands to be more like old ICML styles
\newcommand{\yrcite}[1]{\citeyearpar{#1}}
\renewcommand{\cite}[1]{\citep{#1}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% to ensure the letter format is used. pdflatex does compile the
% page size into the pdf. This is done using \pdfpagewidth and
% \pdfpageheight. As Latex does not know this directives, we first
% check whether pdflatex or latex is used.
%
% Kristian Kersting 2005
%
% in order to account for the more recent use of pdfetex as the default
% compiler, I have changed the pdf verification.
%
% Ricardo Silva 2007
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\paperwidth=210mm
\paperheight=297mm
% old PDFLaTex verification, circa 2005
%
%\newif\ifpdf\ifx\pdfoutput\undefined
% \pdffalse % we are not running PDFLaTeX
%\else
% \pdfoutput=1 % we are running PDFLaTeX
% \pdftrue
%\fi
\newif\ifpdf %adapted from ifpdf.sty
\ifx\pdfoutput\undefined
\else
\ifx\pdfoutput\relax
\else
\ifcase\pdfoutput
\else
\pdftrue
\fi
\fi
\fi
\ifpdf
% \pdfpagewidth=\paperwidth
% \pdfpageheight=\paperheight
\setlength{\pdfpagewidth}{210mm}
\setlength{\pdfpageheight}{297mm}
\fi
% Physical page layout
\evensidemargin -5.5mm
\oddsidemargin -5.5mm
\setlength\textheight{248mm}
\setlength\textwidth{170mm}
\setlength\columnsep{6.5mm}
\setlength\headheight{10pt}
\setlength\headsep{10pt}
\addtolength{\topmargin}{-20pt}
%\setlength\headheight{1em}
%\setlength\headsep{1em}
\addtolength{\topmargin}{-6mm}
%\addtolength{\topmargin}{-2em}
%% The following is adapted from code in the acmconf.sty conference
%% style file. The constants in it are somewhat magical, and appear
%% to work well with the two-column format on US letter paper that
%% ICML uses, but will break if you change that layout, or if you use
%% a longer block of text for the copyright notice string. Fiddle with
%% them if necessary to get the block to fit/look right.
%%
%% -- Terran Lane, 2003
%%
%% The following comments are included verbatim from acmconf.sty:
%%
%%% This section (written by KBT) handles the 1" box in the lower left
%%% corner of the left column of the first page by creating a picture,
%%% and inserting the predefined string at the bottom (with a negative
%%% displacement to offset the space allocated for a non-existent
%%% caption).
%%%
\def\ftype@copyrightbox{8}
\def\@copyrightspace{
% Create a float object positioned at the bottom of the column. Note
% that because of the mystical nature of floats, this has to be called
% before the first column is populated with text (e.g., from the title
% or abstract blocks). Otherwise, the text will force the float to
% the next column. -- TDRL.
\@float{copyrightbox}[b]
\begin{center}
\setlength{\unitlength}{1pc}
\begin{picture}(20,1.5)
% Create a line separating the main text from the note block.
% 4.818pc==0.8in.
\put(0,2.5){\line(1,0){4.818}}
% Insert the text string itself. Note that the string has to be
% enclosed in a parbox -- the \put call needs a box object to
% position. Without the parbox, the text gets splattered across the
% bottom of the page semi-randomly. The 19.75pc distance seems to be
% the width of the column, though I can't find an appropriate distance
% variable to substitute here. -- TDRL.
\put(0,0){\parbox[b]{19.75pc}{\small \Notice@String}}
\end{picture}
\end{center}
\end@float}
% Note: A few Latex versions need the next line instead of the former.
% \addtolength{\topmargin}{0.3in}
% \setlength\footheight{0pt}
\setlength\footskip{0pt}
%\pagestyle{empty}
\flushbottom \twocolumn
\sloppy
% Clear out the addcontentsline command
\def\addcontentsline#1#2#3{}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% commands for formatting paper title, author names, and addresses.
%%start%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%% title as running head -- Kristian Kersting 2005 %%%%%%%%%%%%%
%\makeatletter
%\newtoks\mytoksa
%\newtoks\mytoksb
%\newcommand\addtomylist[2]{%
% \mytoksa\expandafter{#1}%
% \mytoksb{#2}%
% \edef#1{\the\mytoksa\the\mytoksb}%
%}
%\makeatother
% box to check the size of the running head
\newbox\titrun
% general page style
\pagestyle{fancy}
\fancyhf{}
\fancyhead{}
\fancyfoot{}
% set the width of the head rule to 1 point
\renewcommand{\headrulewidth}{1pt}
% definition to set the head as running head in the preamble
\def\mlptitlerunning#1{\gdef\@mlptitlerunning{#1}}
% main definition adapting \mlptitle from 2004
\long\def\mlptitle#1{%
%check whether @mlptitlerunning exists
% if not \mlptitle is used as running head
\ifx\undefined\@mlptitlerunning%
\gdef\@mlptitlerunning{#1}
\fi
%add it to pdf information
\ifdefined\nohyperref\else\ifdefined\hypersetup
\hypersetup{pdftitle={#1}}
\fi\fi
%get the dimension of the running title
\global\setbox\titrun=\vbox{\small\bf\@mlptitlerunning}
% error flag
\gdef\@runningtitleerror{0}
% running title too long
\ifdim\wd\titrun>\textwidth%
{\gdef\@runningtitleerror{1}}%
% running title breaks a line
\else\ifdim\ht\titrun>6.25pt
{\gdef\@runningtitleerror{2}}%
\fi
\fi
% if there is something wrong with the running title
\ifnum\@runningtitleerror>0
\typeout{}%
\typeout{}%
\typeout{*******************************************************}%
\typeout{Title exceeds size limitations for running head.}%
\typeout{Please supply a shorter form for the running head}
\typeout{with \string\mlptitlerunning{...}\space prior to \string\begin{document}}%
\typeout{*******************************************************}%
\typeout{}%
\typeout{}%
% set default running title
\chead{\small\bf Title Suppressed Due to Excessive Size}%
\else
% 'everything' fine, set provided running title
\chead{\small\bf\@mlptitlerunning}%
\fi
% no running title on the first page of the paper
\thispagestyle{empty}
%%%%%%%%%%%%%%%%%%%% Kristian Kersting %%%%%%%%%%%%%%%%%%%%%%%%%
%end%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
{\center\baselineskip 18pt
\toptitlebar{\Large\bf #1}\bottomtitlebar}
}
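% Usage sketch (illustrative, not part of the original style): if the full
% title is too wide or too tall for the running head, a shorter form can be
% supplied in the preamble before \begin{document}, e.g.
%   \mlptitlerunning{MLP Coursework 2 (s1234567)}
% and the full title is then set in the body with
%   \mlptitle{A Much Longer Full Title That Would Overflow the Running Head}
% (the student number above is a placeholder).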
\gdef\icmlfullauthorlist{}
\newcommand\addstringtofullauthorlist{\g@addto@macro\icmlfullauthorlist}
\newcommand\addtofullauthorlist[1]{%
\ifdefined\icmlanyauthors%
\addstringtofullauthorlist{, #1}%
\else%
\addstringtofullauthorlist{#1}%
\gdef\icmlanyauthors{1}%
\fi%
\ifdefined\nohyperref\else\ifdefined\hypersetup%
\hypersetup{pdfauthor=\icmlfullauthorlist}%
\fi\fi}
\def\toptitlebar{\hrule height1pt \vskip .25in}
\def\bottomtitlebar{\vskip .22in \hrule height1pt \vskip .3in}
\newenvironment{icmlauthorlist}{%
\setlength\topsep{0pt}
\setlength\parskip{0pt}
\begin{center}
}{%
\end{center}
}
\newcounter{@affiliationcounter}
\newcommand{\@pa}[1]{%
% ``#1''
\ifcsname the@affil#1\endcsname
% do nothing
\else
\ifcsname @icmlsymbol#1\endcsname
% nothing
\else
\stepcounter{@affiliationcounter}%
\newcounter{@affil#1}%
\setcounter{@affil#1}{\value{@affiliationcounter}}%
\fi
\fi%
\ifcsname @icmlsymbol#1\endcsname
\textsuperscript{\csname @icmlsymbol#1\endcsname\,}%
\else
%\expandafter\footnotemark[\arabic{@affil#1}\,]%
\textsuperscript{\arabic{@affil#1}\,}%
\fi
}
%\newcommand{\icmlauthor}[2]{%
%\addtofullauthorlist{#1}%
%#1\@for\theaffil:=#2\do{\pa{\theaffil}}%
%}
\newcommand{\icmlauthor}[2]{%
\ifdefined\isaccepted
\mbox{\bf #1}\,\@for\theaffil:=#2\do{\@pa{\theaffil}} \addtofullauthorlist{#1}%
\else
\ifdefined\@icmlfirsttime
\else
\gdef\@icmlfirsttime{1}
\mbox{\bf Anonymous Authors}\@pa{@anon} \addtofullauthorlist{Anonymous Authors}
\fi
\fi
}
\newcommand{\icmlsetsymbol}[2]{%
\expandafter\gdef\csname @icmlsymbol#1\endcsname{#2}
}
\newcommand{\icmlaffiliation}[2]{%
\ifdefined\isaccepted
\ifcsname the@affil#1\endcsname
\expandafter\gdef\csname @affilname\csname the@affil#1\endcsname\endcsname{#2}%
\else
{\bf AUTHORERR: Error in use of \textbackslash{}icmlaffiliation command. Label ``#1'' not mentioned in some \textbackslash{}icmlauthor\{author name\}\{labels here\} command beforehand. }
\typeout{}%
\typeout{}%
\typeout{*******************************************************}%
\typeout{Affiliation label undefined. }%
\typeout{Make sure \string\icmlaffiliation\space follows }
\typeout{all of \string\icmlauthor\space commands}%
\typeout{*******************************************************}%
\typeout{}%
\typeout{}%
\fi
\else % \isaccepted
% can be called multiple times... it's idempotent
\expandafter\gdef\csname @affilname1\endcsname{Anonymous Institution, Anonymous City, Anonymous Region, Anonymous Country}
\fi
}
\newcommand{\icmlcorrespondingauthor}[2]{
\ifdefined\isaccepted
\ifdefined\icmlcorrespondingauthor@text
\g@addto@macro\icmlcorrespondingauthor@text{, #1 \textless{}#2\textgreater{}}
\else
\gdef\icmlcorrespondingauthor@text{#1 \textless{}#2\textgreater{}}
\fi
\else
\gdef\icmlcorrespondingauthor@text{Anonymous Author \textless{}anon.email@domain.com\textgreater{}}
\fi
}
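% Usage sketch (illustrative): author, affiliation and corresponding-author
% declarations are typically given together; the label `edi' and the names
% below are placeholders, not predefined values.
%   \begin{icmlauthorlist}
%     \icmlauthor{A. Student}{edi}
%   \end{icmlauthorlist}
%   \icmlaffiliation{edi}{School of Informatics, University of Edinburgh}
%   \icmlcorrespondingauthor{A. Student}{a.student@example.com}
% While \isaccepted is undefined (anonymous review), the author list prints
% as ``Anonymous Authors'' and the affiliation as an anonymous institution.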
\newcommand{\icmlEqualContribution}{\textsuperscript{*}Equal contribution }
\newcounter{@affilnum}
\newcommand{\printAffiliationsAndNotice}[1]{%
\stepcounter{@affiliationcounter}%
{\let\thefootnote\relax\footnotetext{\hspace*{-\footnotesep}#1%
\forloop{@affilnum}{1}{\value{@affilnum} < \value{@affiliationcounter}}{
\textsuperscript{\arabic{@affilnum}}\ifcsname @affilname\the@affilnum\endcsname%
\csname @affilname\the@affilnum\endcsname%
\else
{\bf AUTHORERR: Missing \textbackslash{}icmlaffiliation.}
\fi
}.
\ifdefined\icmlcorrespondingauthor@text
Correspondence to: \icmlcorrespondingauthor@text.
\else
{\bf AUTHORERR: Missing \textbackslash{}icmlcorrespondingauthor.}
\fi
\ \\
\Notice@String
}
}
}
%\makeatother
\long\def\icmladdress#1{%
{\bf The \textbackslash{}icmladdress command is no longer used. See the example\_paper .tex and PDF for usage of \textbackslash{}icmlauthor and \textbackslash{}icmlaffiliation.}
}
%% keywords as first class citizens
\def\icmlkeywords#1{%
% \ifdefined\isaccepted \else
% \par {\bf Keywords:} #1%
% \fi
% \ifdefined\nohyperref\else\ifdefined\hypersetup
% \hypersetup{pdfkeywords={#1}}
% \fi\fi
% \ifdefined\isaccepted \else
% \par {\bf Keywords:} #1%
% \fi
\ifdefined\nohyperref\else\ifdefined\hypersetup
\hypersetup{pdfkeywords={#1}}
\fi\fi
}
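% Usage sketch (illustrative): with the printed keyword block commented out
% above, \icmlkeywords only fills the PDF metadata via hyperref, e.g.
%   \icmlkeywords{Machine Learning, MLP Coursework}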
% modification to natbib citations
\setcitestyle{authoryear,round,citesep={;},aysep={,},yysep={;}}
% Redefinition of the abstract environment.
\renewenvironment{abstract}
{%
% Insert the ``appearing in'' copyright notice.
%\@copyrightspace
\centerline{\large\bf Abstract}
\vspace{-0.12in}\begin{quote}}
{\par\end{quote}\vskip 0.12in}
% numbered section headings with different treatment of numbers
\def\@startsection#1#2#3#4#5#6{\if@noskipsec \leavevmode \fi
\par \@tempskipa #4\relax
\@afterindenttrue
% Altered the following line to indent a section's first paragraph.
% \ifdim \@tempskipa <\z@ \@tempskipa -\@tempskipa \@afterindentfalse\fi
\ifdim \@tempskipa <\z@ \@tempskipa -\@tempskipa \fi
\if@nobreak \everypar{}\else
\addpenalty{\@secpenalty}\addvspace{\@tempskipa}\fi \@ifstar
{\@ssect{#3}{#4}{#5}{#6}}{\@dblarg{\@sict{#1}{#2}{#3}{#4}{#5}{#6}}}}
\def\@sict#1#2#3#4#5#6[#7]#8{\ifnum #2>\c@secnumdepth
\def\@svsec{}\else
\refstepcounter{#1}\edef\@svsec{\csname the#1\endcsname}\fi
\@tempskipa #5\relax
\ifdim \@tempskipa>\z@
\begingroup #6\relax
\@hangfrom{\hskip #3\relax\@svsec.~}{\interlinepenalty \@M #8\par}
\endgroup
\csname #1mark\endcsname{#7}\addcontentsline
{toc}{#1}{\ifnum #2>\c@secnumdepth \else
\protect\numberline{\csname the#1\endcsname}\fi
#7}\else
\def\@svsechd{#6\hskip #3\@svsec #8\csname #1mark\endcsname
{#7}\addcontentsline
{toc}{#1}{\ifnum #2>\c@secnumdepth \else
\protect\numberline{\csname the#1\endcsname}\fi
#7}}\fi
\@xsect{#5}}
\def\@sect#1#2#3#4#5#6[#7]#8{\ifnum #2>\c@secnumdepth
\def\@svsec{}\else
\refstepcounter{#1}\edef\@svsec{\csname the#1\endcsname\hskip 0.4em }\fi
\@tempskipa #5\relax
\ifdim \@tempskipa>\z@
\begingroup #6\relax
\@hangfrom{\hskip #3\relax\@svsec}{\interlinepenalty \@M #8\par}
\endgroup
\csname #1mark\endcsname{#7}\addcontentsline
{toc}{#1}{\ifnum #2>\c@secnumdepth \else
\protect\numberline{\csname the#1\endcsname}\fi
#7}\else
\def\@svsechd{#6\hskip #3\@svsec #8\csname #1mark\endcsname
{#7}\addcontentsline
{toc}{#1}{\ifnum #2>\c@secnumdepth \else
\protect\numberline{\csname the#1\endcsname}\fi
#7}}\fi
\@xsect{#5}}
% section headings with less space above and below them
\def\thesection {\arabic{section}}
\def\thesubsection {\thesection.\arabic{subsection}}
\def\section{\@startsection{section}{1}{\z@}{-0.12in}{0.02in}
{\large\bf\raggedright}}
\def\subsection{\@startsection{subsection}{2}{\z@}{-0.10in}{0.01in}
{\normalsize\bf\raggedright}}
\def\subsubsection{\@startsection{subsubsection}{3}{\z@}{-0.08in}{0.01in}
{\normalsize\sc\raggedright}}
\def\paragraph{\@startsection{paragraph}{4}{\z@}{1.5ex plus
0.5ex minus .2ex}{-1em}{\normalsize\bf}}
\def\subparagraph{\@startsection{subparagraph}{5}{\z@}{1.5ex plus
0.5ex minus .2ex}{-1em}{\normalsize\bf}}
% Footnotes
\footnotesep 6.65pt %
\skip\footins 9pt
\def\footnoterule{\kern-3pt \hrule width 0.8in \kern 2.6pt }
\setcounter{footnote}{0}
% Lists and paragraphs
\parindent 0pt
\topsep 4pt plus 1pt minus 2pt
\partopsep 1pt plus 0.5pt minus 0.5pt
\itemsep 2pt plus 1pt minus 0.5pt
\parsep 2pt plus 1pt minus 0.5pt
\parskip 6pt
\leftmargin 2em \leftmargini\leftmargin \leftmarginii 2em
\leftmarginiii 1.5em \leftmarginiv 1.0em \leftmarginv .5em
\leftmarginvi .5em
\labelwidth\leftmargini\advance\labelwidth-\labelsep \labelsep 5pt
\def\@listi{\leftmargin\leftmargini}
\def\@listii{\leftmargin\leftmarginii
\labelwidth\leftmarginii\advance\labelwidth-\labelsep
\topsep 2pt plus 1pt minus 0.5pt
\parsep 1pt plus 0.5pt minus 0.5pt
\itemsep \parsep}
\def\@listiii{\leftmargin\leftmarginiii
\labelwidth\leftmarginiii\advance\labelwidth-\labelsep
\topsep 1pt plus 0.5pt minus 0.5pt
\parsep \z@ \partopsep 0.5pt plus 0pt minus 0.5pt
\itemsep \topsep}
\def\@listiv{\leftmargin\leftmarginiv
\labelwidth\leftmarginiv\advance\labelwidth-\labelsep}
\def\@listv{\leftmargin\leftmarginv
\labelwidth\leftmarginv\advance\labelwidth-\labelsep}
\def\@listvi{\leftmargin\leftmarginvi
\labelwidth\leftmarginvi\advance\labelwidth-\labelsep}
\abovedisplayskip 7pt plus2pt minus5pt%
\belowdisplayskip \abovedisplayskip
\abovedisplayshortskip 0pt plus3pt%
\belowdisplayshortskip 4pt plus3pt minus3pt%
% Less leading in most fonts (due to the narrow columns)
% The choices were between 1-pt and 1.5-pt leading
\def\@normalsize{\@setsize\normalsize{11pt}\xpt\@xpt}
\def\small{\@setsize\small{10pt}\ixpt\@ixpt}
\def\footnotesize{\@setsize\footnotesize{10pt}\ixpt\@ixpt}
\def\scriptsize{\@setsize\scriptsize{8pt}\viipt\@viipt}
\def\tiny{\@setsize\tiny{7pt}\vipt\@vipt}
\def\large{\@setsize\large{14pt}\xiipt\@xiipt}
\def\Large{\@setsize\Large{16pt}\xivpt\@xivpt}
\def\LARGE{\@setsize\LARGE{20pt}\xviipt\@xviipt}
\def\huge{\@setsize\huge{23pt}\xxpt\@xxpt}
\def\Huge{\@setsize\Huge{28pt}\xxvpt\@xxvpt}
% Revised formatting for figure captions and table titles.
\newsavebox\newcaptionbox\newdimen\newcaptionboxwid
\long\def\@makecaption#1#2{
\vskip 10pt
\baselineskip 11pt
\setbox\@tempboxa\hbox{#1. #2}
\ifdim \wd\@tempboxa >\hsize
\sbox{\newcaptionbox}{\small\sl #1.~}
\newcaptionboxwid=\wd\newcaptionbox
\usebox\newcaptionbox {\footnotesize #2}
% \usebox\newcaptionbox {\small #2}
\else
\centerline{{\small\sl #1.} {\small #2}}
\fi}
\def\fnum@figure{Figure \thefigure}
\def\fnum@table{Table \thetable}
% Strut macros for skipping spaces above and below text in tables.
\def\abovestrut#1{\rule[0in]{0in}{#1}\ignorespaces}
\def\belowstrut#1{\rule[-#1]{0in}{#1}\ignorespaces}
\def\abovespace{\abovestrut{0.20in}}
\def\aroundspace{\abovestrut{0.20in}\belowstrut{0.10in}}
\def\belowspace{\belowstrut{0.10in}}
% Various personal itemization commands.
\def\texitem#1{\par\noindent\hangindent 12pt
\hbox to 12pt {\hss #1 ~}\ignorespaces}
\def\icmlitem{\texitem{$\bullet$}}
% To comment out multiple lines of text.
\long\def\comment#1{}
%% Line counter (not in final version). Adapted from NIPS style file by Christoph Sawade
% Vertical Ruler
% This code is, largely, from the CVPR 2010 conference style file
% ----- define vruler
\makeatletter
\newbox\icmlrulerbox
\newcount\icmlrulercount
\newdimen\icmlruleroffset
\newdimen\cv@lineheight
\newdimen\cv@boxheight
\newbox\cv@tmpbox
\newcount\cv@refno
\newcount\cv@tot
% NUMBER with left flushed zeros \fillzeros[<WIDTH>]<NUMBER>
\newcount\cv@tmpc@ \newcount\cv@tmpc
\def\fillzeros[#1]#2{\cv@tmpc@=#2\relax\ifnum\cv@tmpc@<0\cv@tmpc@=-\cv@tmpc@\fi
\cv@tmpc=1 %
\loop\ifnum\cv@tmpc@<10 \else \divide\cv@tmpc@ by 10 \advance\cv@tmpc by 1 \fi
\ifnum\cv@tmpc@=10\relax\cv@tmpc@=11\relax\fi \ifnum\cv@tmpc@>10 \repeat
\ifnum#2<0\advance\cv@tmpc1\relax-\fi
\loop\ifnum\cv@tmpc<#1\relax0\advance\cv@tmpc1\relax\fi \ifnum\cv@tmpc<#1 \repeat
\cv@tmpc@=#2\relax\ifnum\cv@tmpc@<0\cv@tmpc@=-\cv@tmpc@\fi \relax\the\cv@tmpc@}%
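% Example (illustrative): \fillzeros[4]{42} typesets ``0042''; the ruler
% below uses this to pad its line numbers to a fixed number of digits.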
% \makevruler[<SCALE>][<INITIAL_COUNT>][<STEP>][<DIGITS>][<HEIGHT>]
\def\makevruler[#1][#2][#3][#4][#5]{
\begingroup\offinterlineskip
\textheight=#5\vbadness=10000\vfuzz=120ex\overfullrule=0pt%
\global\setbox\icmlrulerbox=\vbox to \textheight{%
{
\parskip=0pt\hfuzz=150em\cv@boxheight=\textheight
\cv@lineheight=#1\global\icmlrulercount=#2%
\cv@tot\cv@boxheight\divide\cv@tot\cv@lineheight\advance\cv@tot2%
\cv@refno1\vskip-\cv@lineheight\vskip1ex%
\loop\setbox\cv@tmpbox=\hbox to0cm{ % side margin
\hfil {\hfil\fillzeros[#4]\icmlrulercount}
}%
\ht\cv@tmpbox\cv@lineheight\dp\cv@tmpbox0pt\box\cv@tmpbox\break
\advance\cv@refno1\global\advance\icmlrulercount#3\relax
\ifnum\cv@refno<\cv@tot\repeat
}
}
\endgroup
}%
\makeatother
% ----- end of vruler
% \makevruler[<SCALE>][<INITIAL_COUNT>][<STEP>][<DIGITS>][<HEIGHT>]
\def\icmlruler#1{\makevruler[12pt][#1][1][3][\textheight]\usebox{\icmlrulerbox}}
\AddToShipoutPicture{%
\icmlruleroffset=\textheight
\advance\icmlruleroffset by 5.2pt % top margin
\color[rgb]{.7,.7,.7}
\ifdefined\isaccepted \else
\AtTextUpperLeft{%
\put(\LenToUnit{-35pt},\LenToUnit{-\icmlruleroffset}){%left ruler
\icmlruler{\icmlrulercount}}
% \put(\LenToUnit{1.04\textwidth},\LenToUnit{-\icmlruleroffset}){%right ruler
% \icmlruler{\icmlrulercount}}
}
\fi
}
\endinput


@@ -0,0 +1,50 @@
\usepackage[T1]{fontenc}
\usepackage{amssymb,amsmath}
\usepackage{txfonts}
\usepackage{microtype}
% For figures
\usepackage{graphicx}
\usepackage{subcaption}
% For citations
\usepackage{natbib}
% For algorithms
\usepackage{algorithm}
\usepackage{algorithmic}
% the hyperref package is used to produce hyperlinks in the
% resulting PDF. If this breaks your system, please comment out the
% following usepackage line and replace \usepackage{mlp2022} with
% \usepackage[nohyperref]{mlp2022} below.
\usepackage{hyperref}
\usepackage{url}
\urlstyle{same}
\usepackage{color}
\usepackage{booktabs} % To thicken table lines
\usepackage{multirow} % Multirow cells in table
% Packages hyperref and algorithmic misbehave sometimes. We can fix
% this with the following command.
\newcommand{\theHalgorithm}{\arabic{algorithm}}
% Set up MLP coursework style (based on ICML style)
\usepackage{mlp2022}
\mlptitlerunning{MLP Coursework 2 (\studentNumber)}
\bibliographystyle{icml2017}
\usepackage{bm,bbm}
\usepackage{soul}
\DeclareMathOperator{\softmax}{softmax}
\DeclareMathOperator{\sigmoid}{sigmoid}
\DeclareMathOperator{\sgn}{sgn}
\DeclareMathOperator{\relu}{relu}
\DeclareMathOperator{\lrelu}{lrelu}
\DeclareMathOperator{\elu}{elu}
\DeclareMathOperator{\selu}{selu}
\DeclareMathOperator{\maxout}{maxout}
\newcommand{\bx}{\bm{x}}
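% Usage sketch (illustrative): the operators declared above print in upright
% roman inside math mode, e.g. $\softmax(\bx)$ or $\relu(\bx)$, and \bx is a
% shorthand for the bold vector \bm{x}.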

184
report/refs.bib Normal file

@@ -0,0 +1,184 @@
@inproceedings{goodfellow2013maxout,
title={Maxout networks},
author={Goodfellow, Ian and Warde-Farley, David and Mirza, Mehdi and Courville, Aaron and Bengio, Yoshua},
booktitle={International conference on machine learning},
pages={1319--1327},
year={2013},
organization={PMLR}
}
@article{srivastava2014dropout,
title={Dropout: a simple way to prevent neural networks from overfitting},
author={Srivastava, Nitish and Hinton, Geoffrey and Krizhevsky, Alex and Sutskever, Ilya and Salakhutdinov, Ruslan},
journal={The journal of machine learning research},
volume={15},
number={1},
pages={1929--1958},
year={2014},
publisher={JMLR. org}
}
@book{Goodfellow-et-al-2016,
title={Deep Learning},
author={Ian Goodfellow and Yoshua Bengio and Aaron Courville},
publisher={MIT Press},
note={\url{http://www.deeplearningbook.org}},
year={2016}
}
@inproceedings{ng2004feature,
title={Feature selection, L1 vs. L2 regularization, and rotational invariance},
author={Ng, Andrew Y},
booktitle={Proceedings of the twenty-first international conference on Machine learning},
pages={78},
year={2004}
}
@article{simonyan2014very,
title={Very deep convolutional networks for large-scale image recognition},
author={Simonyan, Karen and Zisserman, Andrew},
journal={arXiv preprint arXiv:1409.1556},
year={2014}
}
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
@inproceedings{glorot2010understanding,
title={Understanding the difficulty of training deep feedforward neural networks},
author={Glorot, Xavier and Bengio, Yoshua},
booktitle={Proceedings of the thirteenth international conference on artificial intelligence and statistics},
pages={249--256},
year={2010},
organization={JMLR Workshop and Conference Proceedings}
}
@inproceedings{bengio1993problem,
title={The problem of learning long-term dependencies in recurrent networks},
author={Bengio, Yoshua and Frasconi, Paolo and Simard, Patrice},
booktitle={IEEE international conference on neural networks},
pages={1183--1188},
year={1993},
organization={IEEE}
}
@inproceedings{ide2017improvement,
title={Improvement of learning for CNN with ReLU activation by sparse regularization},
author={Ide, Hidenori and Kurita, Takio},
booktitle={2017 International Joint Conference on Neural Networks (IJCNN)},
pages={2684--2691},
year={2017},
organization={IEEE}
}
@inproceedings{ioffe2015batch,
title={Batch normalization: Accelerating deep network training by reducing internal covariate shift},
author={Ioffe, Sergey and Szegedy, Christian},
booktitle={International conference on machine learning},
pages={448--456},
year={2015},
organization={PMLR}
}
@inproceedings{huang2017densely,
title={Densely connected convolutional networks},
author={Huang, Gao and Liu, Zhuang and Van Der Maaten, Laurens and Weinberger, Kilian Q},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={4700--4708},
year={2017}
}
@article{rumelhart1986learning,
title={Learning representations by back-propagating errors},
author={Rumelhart, David E and Hinton, Geoffrey E and Williams, Ronald J},
journal={nature},
volume={323},
number={6088},
pages={533--536},
year={1986},
publisher={Nature Publishing Group}
}
@inproceedings{du2019gradient,
title={Gradient descent finds global minima of deep neural networks},
author={Du, Simon and Lee, Jason and Li, Haochuan and Wang, Liwei and Zhai, Xiyu},
booktitle={International Conference on Machine Learning},
pages={1675--1685},
year={2019},
organization={PMLR}
}
@inproceedings{pascanu2013difficulty,
title={On the difficulty of training recurrent neural networks},
author={Pascanu, Razvan and Mikolov, Tomas and Bengio, Yoshua},
booktitle={International conference on machine learning},
pages={1310--1318},
year={2013},
organization={PMLR}
}
@article{li2017visualizing,
title={Visualizing the loss landscape of neural nets},
author={Li, Hao and Xu, Zheng and Taylor, Gavin and Studer, Christoph and Goldstein, Tom},
journal={arXiv preprint arXiv:1712.09913},
year={2017}
}
@inproceedings{santurkar2018does,
title={How does batch normalization help optimization?},
author={Santurkar, Shibani and Tsipras, Dimitris and Ilyas, Andrew and M{\k{a}}dry, Aleksander},
booktitle={Proceedings of the 32nd international conference on neural information processing systems},
pages={2488--2498},
year={2018}
}
@article{krizhevsky2009learning,
title={Learning multiple layers of features from tiny images},
author={Krizhevsky, Alex and Hinton, Geoffrey and others},
journal={},
year={2009},
publisher={Citeseer}
}
@incollection{lecun2012efficient,
title={Efficient backprop},
author={LeCun, Yann A and Bottou, L{\'e}on and Orr, Genevieve B and M{\"u}ller, Klaus-Robert},
booktitle={Neural networks: Tricks of the trade},
pages={9--48},
year={2012},
publisher={Springer}
}
@book{bishop1995neural,
title={Neural networks for pattern recognition},
author={Bishop, Christopher M and others},
year={1995},
publisher={Oxford university press}
}
@article{vaswani2017attention,
author = {Ashish Vaswani and
Noam Shazeer and
Niki Parmar and
Jakob Uszkoreit and
Llion Jones and
Aidan N. Gomez and
Lukasz Kaiser and
Illia Polosukhin},
title = {Attention Is All You Need},
journal = {CoRR},
volume = {abs/1706.03762},
year = {2017},
url = {http://arxiv.org/abs/1706.03762},
eprinttype = {arXiv},
eprint = {1706.03762},
timestamp = {Sat, 23 Jan 2021 01:20:40 +0100},
biburl = {https://dblp.org/rec/journals/corr/VaswaniSPUJGKP17.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}

1
run_vgg_08_default.sh Normal file

@@ -0,0 +1 @@
python pytorch_mlp_framework/train_evaluate_image_classification_system.py --batch_size 100 --seed 0 --num_filters 32 --num_stages 3 --num_blocks_per_stage 0 --experiment_name VGG_08_experiment --use_gpu True --num_classes 100 --block_type 'conv_block' --continue_from_epoch -1

1
run_vgg_38_bn.sh Normal file

@@ -0,0 +1 @@
python pytorch_mlp_framework/train_evaluate_image_classification_system.py --batch_size 100 --seed 0 --num_filters 32 --num_stages 3 --num_blocks_per_stage 5 --experiment_name VGG38_BN --use_gpu True --num_classes 100 --block_type 'conv_bn' --continue_from_epoch -1

1
run_vgg_38_bn_rc.sh Normal file

@@ -0,0 +1 @@
python pytorch_mlp_framework/train_evaluate_image_classification_system.py --batch_size 100 --seed 0 --num_filters 32 --num_stages 3 --num_blocks_per_stage 5 --experiment_name VGG38_BN_RC --use_gpu True --num_classes 100 --block_type 'conv_bn_rc' --continue_from_epoch -1 --learning_rate 0.01

1
run_vgg_38_default.sh Normal file

@@ -0,0 +1 @@
python pytorch_mlp_framework/train_evaluate_image_classification_system.py --batch_size 100 --seed 0 --num_filters 32 --num_stages 3 --num_blocks_per_stage 5 --experiment_name VGG_38_experiment --use_gpu True --num_classes 100 --block_type 'conv_block' --continue_from_epoch -1


@@ -10,4 +10,3 @@ setup(
url = "https://github.com/VICO-UoE/mlpractical",
packages=['mlp']
)