Commit b8e3e10f13 (parent d14e05706f): Init

README.md (60 lines changed)
@@ -5,31 +5,53 @@ This repository contains the code for the University of Edinburgh [School of Inf
This assignment-based course is focused on the implementation and evaluation of machine learning systems. Students who take this course will gain experience in the design, implementation, training, and evaluation of machine learning systems.

The code in this repository is split into:

1. notebooks:
    1. Introduction_to_tensorflow: Introduces students to the basics of tensorflow and its lower-level operations.
    2. Introduction_to_tf_mlp_repo: Introduces students to the high-level functionality of this repo and how one could run an experiment. The code is full of comments and documentation, so you should spend time reading and understanding the code by running simple experiments and changing pieces of code to see the impact on the system.
2. utils:
    1. network_summary: Provides utilities for getting network summaries, such as the number of parameters and the names of layers.
    2. parser_utils: Used to parse arguments passed to the training scripts.
    3. storage: Responsible for storing network statistics.
3. data_providers.py: Provides the data providers for training, validation and testing.
4. network_architectures.py: Defines the network architectures. We provide VGGNet as an example.
5. network_builder.py: Builds the tensorflow computation graph; in more detail, it builds the losses, tensorflow summaries and training operations.
6. network_trainer.py: Runs an experiment composed of training, validation and testing. It is set up to take command-line arguments, so one can easily write multiple bash scripts with different hyperparameters and run experiments very quickly with minimal code changes (see the example invocation below).
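
An experiment can be launched with a single command. The invocation below uses the same flags as the provided `gpu_cluster_tutorial_training_script.sh`; the values are just a starting point and can be changed freely:

```
python network_trainer.py --batch_size 128 --epochs 200 --experiment_prefix vgg-net-emnist-sample-exp \
    --dropout_rate 0.4 --batch_norm_use True --strided_dim_reduction True --seed 25012018
```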

The repository also contains:

* a Python package `mlp`, a [NumPy](http://www.numpy.org/) based neural network package designed specifically for the course, parts of which students implement and extend during the course labs and assignments,
* a series of [Jupyter](http://jupyter.org/) notebooks in the `notebooks` directory containing explanatory material and coding exercises to be completed during the course labs.
## Getting set up

Detailed instructions for setting up a development environment for the course are given in [this file](notes/environment-set-up.md). Students doing the course will spend part of the first lab getting their own environment set up.
## Frequent Issues/Solutions

Don't forget that from your `mlpractical` folder, once you have set up the basic environment, you should first check the state of your local branch and then install the requirements for the tf_mlp repo:

```
git status                  # check whether there are any changes in your local branch. If there are, you need to do:
git add "path/to/file"
git commit -m "some message"
pip install -r requirements.txt
```

for CPU tensorflow, or

```
pip install -r requirements_gpu.txt
```

for GPU tensorflow.

If you install the wrong version of tensorflow, simply run

```
pip uninstall $tensorflow_to_uninstall
```

replacing `$tensorflow_to_uninstall` with the tensorflow package you want to remove, and then install the correct one using `pip install` as normally done.

## Additional Packages

For the tf_mlp code you are required to install either the tensorflow-1.4.1 package for CPU users or the tensorflow_gpu-1.4.1 package for GPU users. Both can easily be installed via pip. For CPU users:

```
pip install tensorflow
```

Only if this is OK can you run

```
git checkout mlp2017-8/lab[n]
```

Related to the "MLP module not found" error: another thing to check is that you have your MLP_DATA_DIR path correctly set. You can check this by typing

```
echo $MLP_DATA_DIR
```

in the command line. If this is not set, you need to follow the instructions in the environment set-up notes to get going.
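
As a quick fix, it can also be set for the current shell (adjust the path to wherever your data directory actually lives; the value below matches the one exported in `environment_variables.sh`):

```
export MLP_DATA_DIR=$HOME/mlpractical/data
```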

Finally, please make sure you have run

```
python setup.py develop
```

For GPU users, the tensorflow_gpu package is installed with:

```
pip install tensorflow_gpu
```
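
To double-check which tensorflow build actually ended up in your environment, you can optionally print the installed version (for this course it should report 1.4.1):

```
python -c "import tensorflow as tf; print(tf.__version__)"
```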
cifar100_network_trainer.py (new file, 182 lines)
@@ -0,0 +1,182 @@
|
||||
import argparse
|
||||
import numpy as np
|
||||
import tensorflow as tf
|
||||
import tqdm
|
||||
from data_providers import CIFAR100DataProvider
|
||||
from network_builder import ClassifierNetworkGraph
|
||||
from utils.parser_utils import ParserClass
|
||||
from utils.storage import build_experiment_folder, save_statistics
|
||||
|
||||
tf.reset_default_graph() # resets any previous graphs to clear memory
|
||||
parser = argparse.ArgumentParser(description='Welcome to CNN experiments script') # generates an argument parser
|
||||
parser_extractor = ParserClass(parser=parser) # creates a parser class to process the parsed input
|
||||
|
||||
batch_size, seed, epochs, logs_path, continue_from_epoch, tensorboard_enable, batch_norm, \
|
||||
strided_dim_reduction, experiment_prefix, dropout_rate_value = parser_extractor.get_argument_variables()
|
||||
# returns a list of objects that contain
|
||||
# our parsed input
|
||||
|
||||
experiment_name = "experiment_{}_batch_size_{}_bn_{}_mp_{}".format(experiment_prefix,
|
||||
batch_size, batch_norm,
|
||||
strided_dim_reduction)
|
||||
# generate experiment name
|
||||
|
||||
rng = np.random.RandomState(seed=seed) # set seed
|
||||
|
||||
train_data = CIFAR100DataProvider(which_set="train", batch_size=batch_size, rng=rng)
|
||||
val_data = CIFAR100DataProvider(which_set="valid", batch_size=batch_size, rng=rng)
|
||||
test_data = CIFAR100DataProvider(which_set="test", batch_size=batch_size, rng=rng)
|
||||
# setup our data providers
|
||||
|
||||
print("Running {}".format(experiment_name))
|
||||
print("Starting from epoch {}".format(continue_from_epoch))
|
||||
|
||||
saved_models_filepath, logs_filepath = build_experiment_folder(experiment_name, logs_path) # generate experiment dir
|
||||
|
||||
# Placeholder setup
|
||||
data_inputs = tf.placeholder(tf.float32, [batch_size, train_data.inputs.shape[1], train_data.inputs.shape[2],
|
||||
train_data.inputs.shape[3]], 'data-inputs')
|
||||
data_targets = tf.placeholder(tf.int32, [batch_size], 'data-targets')
|
||||
|
||||
training_phase = tf.placeholder(tf.bool, name='training-flag')
|
||||
rotate_data = tf.placeholder(tf.bool, name='rotate-flag')
|
||||
dropout_rate = tf.placeholder(tf.float32, name='dropout-prob')
|
||||
|
||||
classifier_network = ClassifierNetworkGraph(input_x=data_inputs, target_placeholder=data_targets,
|
||||
dropout_rate=dropout_rate, batch_size=batch_size,
|
||||
num_channels=train_data.inputs.shape[3], n_classes=train_data.num_classes,  # shape[3] is the channel axis of the NHWC inputs
|
||||
is_training=training_phase, augment_rotate_flag=rotate_data,
|
||||
strided_dim_reduction=strided_dim_reduction,
|
||||
use_batch_normalization=batch_norm) # initialize our computational graph
|
||||
|
||||
if continue_from_epoch == -1: # if this is a new experiment and not continuation of a previous one then generate a new
|
||||
# statistics file
|
||||
save_statistics(logs_filepath, "result_summary_statistics", ["epoch", "train_c_loss", "train_c_accuracy",
|
||||
"val_c_loss", "val_c_accuracy",
|
||||
"test_c_loss", "test_c_accuracy"], create=True)
|
||||
|
||||
start_epoch = continue_from_epoch if continue_from_epoch != -1 else 0 # if new experiment start from 0 otherwise
|
||||
# continue where left off
|
||||
|
||||
summary_op, losses_ops, c_error_opt_op = classifier_network.init_train() # get graph operations (ops)
|
||||
|
||||
total_train_batches = train_data.num_batches
|
||||
total_val_batches = val_data.num_batches
|
||||
total_test_batches = test_data.num_batches
|
||||
|
||||
best_epoch = 0
|
||||
|
||||
if tensorboard_enable:
|
||||
print("saved tensorboard file at", logs_filepath)
|
||||
writer = tf.summary.FileWriter(logs_filepath, graph=tf.get_default_graph())
|
||||
|
||||
init = tf.global_variables_initializer() # initialization op for the graph
|
||||
|
||||
with tf.Session() as sess:
|
||||
sess.run(init) # actually running the initialization op
|
||||
train_saver = tf.train.Saver() # saver object that will save our graph so we can reload it later for continuation of
|
||||
val_saver = tf.train.Saver()
|
||||
# training or inference
|
||||
|
||||
# continue_from_epoch = -1  # hard-coding -1 here would override the parsed argument and always skip the checkpoint restore below
|
||||
|
||||
if continue_from_epoch != -1:
|
||||
train_saver.restore(sess, "{}/{}_{}.ckpt".format(saved_models_filepath, experiment_name,
|
||||
continue_from_epoch)) # restore previous graph to continue operations
|
||||
|
||||
best_val_accuracy = 0.
|
||||
with tqdm.tqdm(total=epochs) as epoch_pbar:
|
||||
for e in range(start_epoch, epochs):
|
||||
total_c_loss = 0.
|
||||
total_accuracy = 0.
|
||||
with tqdm.tqdm(total=total_train_batches) as pbar_train:
|
||||
for batch_idx, (x_batch, y_batch) in enumerate(train_data):
|
||||
iter_id = e * total_train_batches + batch_idx
|
||||
_, c_loss_value, acc = sess.run(
|
||||
[c_error_opt_op, losses_ops["crossentropy_losses"], losses_ops["accuracy"]],
|
||||
feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
|
||||
data_targets: y_batch, training_phase: True, rotate_data: False})
|
||||
# Here we execute the c_error_opt_op which trains the network and also the ops that compute the
|
||||
# loss and accuracy, we save those in _, c_loss_value and acc respectively.
|
||||
total_c_loss += c_loss_value # add loss of current iter to sum
|
||||
total_accuracy += acc # add acc of current iter to sum
|
||||
|
||||
iter_out = "iter_num: {}, train_loss: {}, train_accuracy: {}".format(iter_id,
|
||||
total_c_loss / (batch_idx + 1),
|
||||
total_accuracy / (
|
||||
batch_idx + 1)) # show
|
||||
# iter statistics using running averages of previous iter within this epoch
|
||||
pbar_train.set_description(iter_out)
|
||||
pbar_train.update(1)
|
||||
if tensorboard_enable and batch_idx % 25 == 0: # save tensorboard summary every 25 iterations
|
||||
_summary = sess.run(
|
||||
summary_op,
|
||||
feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
|
||||
data_targets: y_batch, training_phase: True, rotate_data: False})
|
||||
writer.add_summary(_summary, global_step=iter_id)
|
||||
|
||||
total_c_loss /= total_train_batches  # compute mean of loss
|
||||
total_accuracy /= total_train_batches # compute mean of accuracy
|
||||
|
||||
save_path = train_saver.save(sess, "{}/{}_{}.ckpt".format(saved_models_filepath, experiment_name, e))
|
||||
# save graph and weights
|
||||
print("Saved current model at", save_path)
|
||||
|
||||
total_val_c_loss = 0.
|
||||
total_val_accuracy = 0. # run validation stage, note how training_phase placeholder is set to False
|
||||
# and that we do not run the c_error_opt_op which runs gradient descent, but instead only call the loss ops
|
||||
# to collect losses on the validation set
|
||||
with tqdm.tqdm(total=total_val_batches) as pbar_val:
|
||||
for batch_idx, (x_batch, y_batch) in enumerate(val_data):
|
||||
c_loss_value, acc = sess.run(
|
||||
[losses_ops["crossentropy_losses"], losses_ops["accuracy"]],
|
||||
feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
|
||||
data_targets: y_batch, training_phase: False, rotate_data: False})
|
||||
total_val_c_loss += c_loss_value
|
||||
total_val_accuracy += acc
|
||||
iter_out = "val_loss: {}, val_accuracy: {}".format(total_val_c_loss / (batch_idx + 1),
|
||||
total_val_accuracy / (batch_idx + 1))
|
||||
pbar_val.set_description(iter_out)
|
||||
pbar_val.update(1)
|
||||
|
||||
total_val_c_loss /= total_val_batches
|
||||
total_val_accuracy /= total_val_batches
|
||||
|
||||
if best_val_accuracy < total_val_accuracy: # check if val acc better than the previous best and if
|
||||
# so save current as best and save the model as the best validation model to be used on the test set
|
||||
# after the final epoch
|
||||
best_val_accuracy = total_val_accuracy
|
||||
best_epoch = e
|
||||
save_path = val_saver.save(sess, "{}/best_validation_{}_{}.ckpt".format(saved_models_filepath, experiment_name, e))
|
||||
print("Saved best validation score model at", save_path)
|
||||
|
||||
epoch_pbar.update(1)
|
||||
# save statistics of this epoch, train and val without test set performance
|
||||
save_statistics(logs_filepath, "result_summary_statistics",
|
||||
[e, total_c_loss, total_accuracy, total_val_c_loss, total_val_accuracy,
|
||||
-1, -1])
|
||||
|
||||
val_saver.restore(sess, "{}/best_validation_{}_{}.ckpt".format(saved_models_filepath, experiment_name, best_epoch))
|
||||
# restore model with best performance on validation set
|
||||
total_test_c_loss = 0.
|
||||
total_test_accuracy = 0.
|
||||
# compute test loss and accuracy and save
|
||||
with tqdm.tqdm(total=total_test_batches) as pbar_test:
|
||||
for batch_id, (x_batch, y_batch) in enumerate(test_data):
|
||||
c_loss_value, acc = sess.run(
|
||||
[losses_ops["crossentropy_losses"], losses_ops["accuracy"]],
|
||||
feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
|
||||
data_targets: y_batch, training_phase: False, rotate_data: False})
|
||||
total_test_c_loss += c_loss_value
|
||||
total_test_accuracy += acc
|
||||
iter_out = "test_loss: {}, test_accuracy: {}".format(total_test_c_loss / (batch_idx + 1),
|
||||
acc / (batch_idx + 1))
|
||||
pbar_test.set_description(iter_out)
|
||||
pbar_test.update(1)
|
||||
|
||||
total_test_c_loss /= total_test_batches
|
||||
total_test_accuracy /= total_test_batches
|
||||
|
||||
save_statistics(logs_filepath, "result_summary_statistics",
|
||||
["test set performance", -1, -1, -1, -1,
|
||||
total_test_c_loss, total_test_accuracy])
|
cifar10_network_trainer.py (new file, 182 lines)
@@ -0,0 +1,182 @@
|
||||
import argparse
|
||||
import numpy as np
|
||||
import tensorflow as tf
|
||||
import tqdm
|
||||
from data_providers import CIFAR10DataProvider
|
||||
from network_builder import ClassifierNetworkGraph
|
||||
from utils.parser_utils import ParserClass
|
||||
from utils.storage import build_experiment_folder, save_statistics
|
||||
|
||||
tf.reset_default_graph() # resets any previous graphs to clear memory
|
||||
parser = argparse.ArgumentParser(description='Welcome to CNN experiments script') # generates an argument parser
|
||||
parser_extractor = ParserClass(parser=parser) # creates a parser class to process the parsed input
|
||||
|
||||
batch_size, seed, epochs, logs_path, continue_from_epoch, tensorboard_enable, batch_norm, \
|
||||
strided_dim_reduction, experiment_prefix, dropout_rate_value = parser_extractor.get_argument_variables()
|
||||
# returns a list of objects that contain
|
||||
# our parsed input
|
||||
|
||||
experiment_name = "experiment_{}_batch_size_{}_bn_{}_mp_{}".format(experiment_prefix,
|
||||
batch_size, batch_norm,
|
||||
strided_dim_reduction)
|
||||
# generate experiment name
|
||||
|
||||
rng = np.random.RandomState(seed=seed) # set seed
|
||||
|
||||
train_data = CIFAR10DataProvider(which_set="train", batch_size=batch_size, rng=rng)
|
||||
val_data = CIFAR10DataProvider(which_set="valid", batch_size=batch_size, rng=rng)
|
||||
test_data = CIFAR10DataProvider(which_set="test", batch_size=batch_size, rng=rng)
|
||||
# setup our data providers
|
||||
|
||||
print("Running {}".format(experiment_name))
|
||||
print("Starting from epoch {}".format(continue_from_epoch))
|
||||
|
||||
saved_models_filepath, logs_filepath = build_experiment_folder(experiment_name, logs_path) # generate experiment dir
|
||||
|
||||
# Placeholder setup
|
||||
data_inputs = tf.placeholder(tf.float32, [batch_size, train_data.inputs.shape[1], train_data.inputs.shape[2],
|
||||
train_data.inputs.shape[3]], 'data-inputs')
|
||||
data_targets = tf.placeholder(tf.int32, [batch_size], 'data-targets')
|
||||
|
||||
training_phase = tf.placeholder(tf.bool, name='training-flag')
|
||||
rotate_data = tf.placeholder(tf.bool, name='rotate-flag')
|
||||
dropout_rate = tf.placeholder(tf.float32, name='dropout-prob')
|
||||
|
||||
classifier_network = ClassifierNetworkGraph(input_x=data_inputs, target_placeholder=data_targets,
|
||||
dropout_rate=dropout_rate, batch_size=batch_size,
|
||||
num_channels=train_data.inputs.shape[3], n_classes=train_data.num_classes,  # shape[3] is the channel axis of the NHWC inputs
|
||||
is_training=training_phase, augment_rotate_flag=rotate_data,
|
||||
strided_dim_reduction=strided_dim_reduction,
|
||||
use_batch_normalization=batch_norm) # initialize our computational graph
|
||||
|
||||
if continue_from_epoch == -1: # if this is a new experiment and not continuation of a previous one then generate a new
|
||||
# statistics file
|
||||
save_statistics(logs_filepath, "result_summary_statistics", ["epoch", "train_c_loss", "train_c_accuracy",
|
||||
"val_c_loss", "val_c_accuracy",
|
||||
"test_c_loss", "test_c_accuracy"], create=True)
|
||||
|
||||
start_epoch = continue_from_epoch if continue_from_epoch != -1 else 0 # if new experiment start from 0 otherwise
|
||||
# continue where left off
|
||||
|
||||
summary_op, losses_ops, c_error_opt_op = classifier_network.init_train() # get graph operations (ops)
|
||||
|
||||
total_train_batches = train_data.num_batches
|
||||
total_val_batches = val_data.num_batches
|
||||
total_test_batches = test_data.num_batches
|
||||
|
||||
best_epoch = 0
|
||||
|
||||
if tensorboard_enable:
|
||||
print("saved tensorboard file at", logs_filepath)
|
||||
writer = tf.summary.FileWriter(logs_filepath, graph=tf.get_default_graph())
|
||||
|
||||
init = tf.global_variables_initializer() # initialization op for the graph
|
||||
|
||||
with tf.Session() as sess:
|
||||
sess.run(init) # actually running the initialization op
|
||||
train_saver = tf.train.Saver() # saver object that will save our graph so we can reload it later for continuation of
|
||||
val_saver = tf.train.Saver()
|
||||
# training or inference
|
||||
|
||||
# continue_from_epoch = -1  # hard-coding -1 here would override the parsed argument and always skip the checkpoint restore below
|
||||
|
||||
if continue_from_epoch != -1:
|
||||
train_saver.restore(sess, "{}/{}_{}.ckpt".format(saved_models_filepath, experiment_name,
|
||||
continue_from_epoch)) # restore previous graph to continue operations
|
||||
|
||||
best_val_accuracy = 0.
|
||||
with tqdm.tqdm(total=epochs) as epoch_pbar:
|
||||
for e in range(start_epoch, epochs):
|
||||
total_c_loss = 0.
|
||||
total_accuracy = 0.
|
||||
with tqdm.tqdm(total=total_train_batches) as pbar_train:
|
||||
for batch_idx, (x_batch, y_batch) in enumerate(train_data):
|
||||
iter_id = e * total_train_batches + batch_idx
|
||||
_, c_loss_value, acc = sess.run(
|
||||
[c_error_opt_op, losses_ops["crossentropy_losses"], losses_ops["accuracy"]],
|
||||
feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
|
||||
data_targets: y_batch, training_phase: True, rotate_data: False})
|
||||
# Here we execute the c_error_opt_op which trains the network and also the ops that compute the
|
||||
# loss and accuracy, we save those in _, c_loss_value and acc respectively.
|
||||
total_c_loss += c_loss_value # add loss of current iter to sum
|
||||
total_accuracy += acc # add acc of current iter to sum
|
||||
|
||||
iter_out = "iter_num: {}, train_loss: {}, train_accuracy: {}".format(iter_id,
|
||||
total_c_loss / (batch_idx + 1),
|
||||
total_accuracy / (
|
||||
batch_idx + 1)) # show
|
||||
# iter statistics using running averages of previous iter within this epoch
|
||||
pbar_train.set_description(iter_out)
|
||||
pbar_train.update(1)
|
||||
if tensorboard_enable and batch_idx % 25 == 0: # save tensorboard summary every 25 iterations
|
||||
_summary = sess.run(
|
||||
summary_op,
|
||||
feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
|
||||
data_targets: y_batch, training_phase: True, rotate_data: False})
|
||||
writer.add_summary(_summary, global_step=iter_id)
|
||||
|
||||
total_c_loss /= total_train_batches  # compute mean of loss
|
||||
total_accuracy /= total_train_batches # compute mean of accuracy
|
||||
|
||||
save_path = train_saver.save(sess, "{}/{}_{}.ckpt".format(saved_models_filepath, experiment_name, e))
|
||||
# save graph and weights
|
||||
print("Saved current model at", save_path)
|
||||
|
||||
total_val_c_loss = 0.
|
||||
total_val_accuracy = 0. # run validation stage, note how training_phase placeholder is set to False
|
||||
# and that we do not run the c_error_opt_op which runs gradient descent, but instead only call the loss ops
|
||||
# to collect losses on the validation set
|
||||
with tqdm.tqdm(total=total_val_batches) as pbar_val:
|
||||
for batch_idx, (x_batch, y_batch) in enumerate(val_data):
|
||||
c_loss_value, acc = sess.run(
|
||||
[losses_ops["crossentropy_losses"], losses_ops["accuracy"]],
|
||||
feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
|
||||
data_targets: y_batch, training_phase: False, rotate_data: False})
|
||||
total_val_c_loss += c_loss_value
|
||||
total_val_accuracy += acc
|
||||
iter_out = "val_loss: {}, val_accuracy: {}".format(total_val_c_loss / (batch_idx + 1),
|
||||
total_val_accuracy / (batch_idx + 1))
|
||||
pbar_val.set_description(iter_out)
|
||||
pbar_val.update(1)
|
||||
|
||||
total_val_c_loss /= total_val_batches
|
||||
total_val_accuracy /= total_val_batches
|
||||
|
||||
if best_val_accuracy < total_val_accuracy: # check if val acc better than the previous best and if
|
||||
# so save current as best and save the model as the best validation model to be used on the test set
|
||||
# after the final epoch
|
||||
best_val_accuracy = total_val_accuracy
|
||||
best_epoch = e
|
||||
save_path = val_saver.save(sess, "{}/best_validation_{}_{}.ckpt".format(saved_models_filepath, experiment_name, e))
|
||||
print("Saved best validation score model at", save_path)
|
||||
|
||||
epoch_pbar.update(1)
|
||||
# save statistics of this epoch, train and val without test set performance
|
||||
save_statistics(logs_filepath, "result_summary_statistics",
|
||||
[e, total_c_loss, total_accuracy, total_val_c_loss, total_val_accuracy,
|
||||
-1, -1])
|
||||
|
||||
val_saver.restore(sess, "{}/best_validation_{}_{}.ckpt".format(saved_models_filepath, experiment_name, best_epoch))
|
||||
# restore model with best performance on validation set
|
||||
total_test_c_loss = 0.
|
||||
total_test_accuracy = 0.
|
||||
# compute test loss and accuracy and save
|
||||
with tqdm.tqdm(total=total_test_batches) as pbar_test:
|
||||
for batch_id, (x_batch, y_batch) in enumerate(test_data):
|
||||
c_loss_value, acc = sess.run(
|
||||
[losses_ops["crossentropy_losses"], losses_ops["accuracy"]],
|
||||
feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
|
||||
data_targets: y_batch, training_phase: False, rotate_data: False})
|
||||
total_test_c_loss += c_loss_value
|
||||
total_test_accuracy += acc
|
||||
iter_out = "test_loss: {}, test_accuracy: {}".format(total_test_c_loss / (batch_idx + 1),
|
||||
acc / (batch_idx + 1))
|
||||
pbar_test.set_description(iter_out)
|
||||
pbar_test.update(1)
|
||||
|
||||
total_test_c_loss /= total_test_batches
|
||||
total_test_accuracy /= total_test_batches
|
||||
|
||||
save_statistics(logs_filepath, "result_summary_statistics",
|
||||
["test set performance", -1, -1, -1, -1,
|
||||
total_test_c_loss, total_test_accuracy])
|
(File diff suppressed because it is too large; binary file not shown.)
data_packer.py (new file, 61 lines)
@@ -0,0 +1,61 @@
|
||||
import numpy as np
|
||||
import os
|
||||
|
||||
np.random.seed(5112017)
|
||||
data_path = "data/cifar-10-batches-py"
|
||||
def unpickle(file):
|
||||
import pickle
|
||||
with open(file, 'rb') as fo:
|
||||
dict = pickle.load(fo, encoding='bytes')
|
||||
return dict
|
||||
|
||||
train_data = []
|
||||
train_labels = []
|
||||
test_data = []
|
||||
test_labels = []
|
||||
|
||||
for subdir, dir, files in os.walk(data_path):
|
||||
for file in files:
|
||||
if not("html" in file) and not("meta" in file) and not(".txt"in file):
|
||||
filepath = os.path.join(subdir, file)
|
||||
print(filepath)
|
||||
data_batch = unpickle(filepath)
|
||||
print(filepath, data_batch.keys())
|
||||
if "test" not in file:
|
||||
train_data.extend(data_batch[b'data'])
|
||||
train_labels.extend(data_batch[b'labels'])
|
||||
else:
|
||||
test_data.extend(data_batch[b'data'])
|
||||
test_labels.extend(data_batch[b'labels'])
|
||||
|
||||
x_train = np.array(train_data)
|
||||
y_train = np.array(train_labels)
|
||||
|
||||
x_test = np.array(test_data)
|
||||
y_test = np.array(test_labels)
|
||||
|
||||
ids = np.arange(x_train.shape[0])
|
||||
np.random.shuffle(ids)
|
||||
|
||||
x_train = x_train[ids]
|
||||
y_train = y_train[ids]
|
||||
|
||||
val_start_index = int(0.85 * x_train.shape[0])
|
||||
print(val_start_index)
|
||||
|
||||
x_val = x_train[val_start_index:]
|
||||
y_val = y_train[val_start_index:]
|
||||
|
||||
x_train = x_train[:val_start_index]
|
||||
y_train = y_train[:val_start_index]
|
||||
|
||||
|
||||
# train_pack = np.array({"inputs": x_train, "targets": y_train})
|
||||
# validation_pack = np.array({"inputs": x_val, "targets": y_val})
|
||||
# testing_pack = np.array({"inputs": x_test, "targets": y_test})
|
||||
|
||||
np.savez("data/cifar10-train", inputs=x_train, targets=y_train)
|
||||
np.savez("data/cifar10-valid", inputs=x_val, targets=y_val)
|
||||
np.savez("data/cifar10-test", inputs=x_test, targets=y_test)
|
||||
print(x_train.shape, y_train.shape, x_val.shape)
|
||||
|
data_packer_2.py (new file, 59 lines)
@@ -0,0 +1,59 @@
|
||||
import numpy as np
|
||||
import os
|
||||
|
||||
np.random.seed(5112017)
|
||||
data_path = "/home/antreas/mlpractical_2016-2017/mlpractical/data"
|
||||
def unpickle(file):
|
||||
import pickle
|
||||
with open(file, 'rb') as fo:
|
||||
dict = pickle.load(fo, encoding='bytes')
|
||||
return dict
|
||||
|
||||
train_data = []
|
||||
train_labels = []
|
||||
|
||||
for subdir, dir, files in os.walk(data_path):
|
||||
for file in files:
|
||||
if not("html" in file) and not("meta" in file) and not(".txt"in file) and ("msd-25" in file):
|
||||
filepath = os.path.join(subdir, file)
|
||||
print(filepath)
|
||||
data_batch = np.load(filepath)
|
||||
print(filepath, data_batch.keys())
|
||||
if "test" not in file and "var" not in file:
|
||||
train_data.extend(data_batch['inputs'])
|
||||
train_labels.extend(data_batch['targets'])
|
||||
|
||||
x_train = np.array(train_data)
|
||||
y_train = np.array(train_labels)
|
||||
|
||||
|
||||
|
||||
ids = np.arange(x_train.shape[0])
|
||||
np.random.shuffle(ids)
|
||||
|
||||
x_train = x_train[ids]
|
||||
y_train = y_train[ids]
|
||||
|
||||
val_start_index = int(0.75 * x_train.shape[0])
|
||||
test_start_index = int(0.85 * x_train.shape[0])
|
||||
print(val_start_index)
|
||||
|
||||
x_val = x_train[val_start_index:]
|
||||
y_val = y_train[val_start_index:]
|
||||
|
||||
x_test = x_train[test_start_index:]
|
||||
y_test = y_train[test_start_index:]
|
||||
|
||||
x_train = x_train[:val_start_index]
|
||||
y_train = y_train[:val_start_index]
|
||||
|
||||
|
||||
# train_pack = np.array({"inputs": x_train, "targets": y_train})
|
||||
# validation_pack = np.array({"inputs": x_val, "targets": y_val})
|
||||
# testing_pack = np.array({"inputs": x_test, "targets": y_test})
|
||||
|
||||
np.savez("data/msd25-train", inputs=x_train, targets=y_train)
|
||||
np.savez("data/msd25-valid", inputs=x_val, targets=y_val)
|
||||
np.savez("data/msd25-test", inputs=x_test, targets=y_test)
|
||||
print(x_train.shape, y_train.shape, x_val.shape)
|
||||
|
@ -5,11 +5,10 @@ This module provides classes for loading datasets and iterating over batches of
|
||||
data points.
|
||||
"""
|
||||
|
||||
import pickle
|
||||
import gzip
|
||||
import numpy as np
|
||||
import os
|
||||
from mlp import DEFAULT_SEED
|
||||
|
||||
import numpy as np
|
||||
DEFAULT_SEED = 22012018
|
||||
|
||||
|
||||
class DataProvider(object):
|
||||
@ -203,7 +202,7 @@ class EMNISTDataProvider(DataProvider):
|
||||
"""Data provider for EMNIST handwritten digit images."""
|
||||
|
||||
def __init__(self, which_set='train', batch_size=100, max_num_batches=-1,
|
||||
shuffle_order=True, rng=None):
|
||||
shuffle_order=True, rng=None, flatten=False, one_hot=False):
|
||||
"""Create a new EMNIST data provider object.
|
||||
|
||||
Args:
|
||||
@ -223,6 +222,7 @@ class EMNISTDataProvider(DataProvider):
|
||||
'Expected which_set to be either train, valid or eval. '
|
||||
'Got {0}'.format(which_set)
|
||||
)
|
||||
self.one_hot = one_hot
|
||||
self.which_set = which_set
|
||||
self.num_classes = 47
|
||||
# construct path to data using os.path.join to ensure the correct path
|
||||
@ -235,11 +235,15 @@ class EMNISTDataProvider(DataProvider):
|
||||
)
|
||||
# load data from compressed numpy file
|
||||
loaded = np.load(data_path)
|
||||
print(loaded.keys())
|
||||
|
||||
inputs, targets = loaded['inputs'], loaded['targets']
|
||||
inputs = inputs.astype(np.float32)
|
||||
if flatten:
|
||||
inputs = np.reshape(inputs, newshape=(-1, 28*28))
|
||||
else:
|
||||
inputs = np.expand_dims(inputs, axis=3)
|
||||
inputs = inputs / 255.0
|
||||
|
||||
# pass the loaded data to the parent class __init__
|
||||
super(EMNISTDataProvider, self).__init__(
|
||||
inputs, targets, batch_size, max_num_batches, shuffle_order, rng)
|
||||
@ -247,7 +251,10 @@ class EMNISTDataProvider(DataProvider):
|
||||
def next(self):
|
||||
"""Returns next data batch or raises `StopIteration` if at end."""
|
||||
inputs_batch, targets_batch = super(EMNISTDataProvider, self).next()
|
||||
if self.one_hot:
|
||||
return inputs_batch, self.to_one_of_k(targets_batch)
|
||||
else:
|
||||
return inputs_batch, targets_batch
|
||||
|
||||
def to_one_of_k(self, int_targets):
|
||||
"""Converts integer coded class target to 1 of K coded targets.
|
||||
@ -268,6 +275,336 @@ class EMNISTDataProvider(DataProvider):
|
||||
one_of_k_targets[range(int_targets.shape[0]), int_targets] = 1
|
||||
return one_of_k_targets
|
||||
|
||||
class CIFAR10DataProvider(DataProvider):
|
||||
"""Data provider for CIFAR-10 object images."""
|
||||
|
||||
def __init__(self, which_set='train', batch_size=100, max_num_batches=-1,
|
||||
shuffle_order=True, rng=None, flatten=False, one_hot=False):
|
||||
"""Create a new EMNIST data provider object.
|
||||
|
||||
Args:
|
||||
which_set: One of 'train', 'valid' or 'test'. Determines which
portion of the CIFAR-10 data this object should provide.
|
||||
batch_size (int): Number of data points to include in each batch.
|
||||
max_num_batches (int): Maximum number of batches to iterate over
|
||||
in an epoch. If `max_num_batches * batch_size > num_data` then
|
||||
only as many batches as the data can be split into will be
|
||||
used. If set to -1 all of the data will be used.
|
||||
shuffle_order (bool): Whether to randomly permute the order of
|
||||
the data before each epoch.
|
||||
rng (RandomState): A seeded random number generator.
|
||||
"""
|
||||
# check a valid which_set was provided
|
||||
assert which_set in ['train', 'valid', 'test'], (
|
||||
'Expected which_set to be either train, valid or test. '
|
||||
'Got {0}'.format(which_set)
|
||||
)
|
||||
self.one_hot = one_hot
|
||||
self.which_set = which_set
|
||||
self.num_classes = 10
|
||||
# construct path to data using os.path.join to ensure the correct path
|
||||
# separator for the current platform / OS is used
|
||||
# MLP_DATA_DIR environment variable should point to the data directory
|
||||
data_path = os.path.join(
|
||||
os.environ['MLP_DATA_DIR'], 'cifar10-{0}.npz'.format(which_set))
|
||||
assert os.path.isfile(data_path), (
|
||||
'Data file does not exist at expected path: ' + data_path
|
||||
)
|
||||
# load data from compressed numpy file
|
||||
loaded = np.load(data_path)
|
||||
|
||||
inputs, targets = loaded['inputs'], loaded['targets']
|
||||
inputs = inputs.astype(np.float32)
|
||||
if flatten:
|
||||
inputs = np.reshape(inputs, newshape=(-1, 32*32*3))
|
||||
else:
|
||||
inputs = np.reshape(inputs, newshape=(-1, 3, 32, 32))
|
||||
inputs = np.transpose(inputs, axes=(0, 2, 3, 1))
|
||||
|
||||
inputs = inputs / 255.0
|
||||
# label map gives strings corresponding to integer label targets
|
||||
|
||||
|
||||
# pass the loaded data to the parent class __init__
|
||||
super(CIFAR10DataProvider, self).__init__(
|
||||
inputs, targets, batch_size, max_num_batches, shuffle_order, rng)
|
||||
|
||||
def next(self):
|
||||
"""Returns next data batch or raises `StopIteration` if at end."""
|
||||
inputs_batch, targets_batch = super(CIFAR10DataProvider, self).next()
|
||||
if self.one_hot:
|
||||
return inputs_batch, self.to_one_of_k(targets_batch)
|
||||
else:
|
||||
return inputs_batch, targets_batch
|
||||
|
||||
def to_one_of_k(self, int_targets):
|
||||
"""Converts integer coded class target to 1 of K coded targets.
|
||||
|
||||
Args:
|
||||
int_targets (ndarray): Array of integer coded class targets (i.e.
|
||||
where an integer from 0 to `num_classes` - 1 is used to
|
||||
indicate which is the correct class). This should be of shape
|
||||
(num_data,).
|
||||
|
||||
Returns:
|
||||
Array of 1 of K coded targets i.e. an array of shape
|
||||
(num_data, num_classes) where for each row all elements are equal
|
||||
to zero except for the column corresponding to the correct class
|
||||
which is equal to one.
|
||||
"""
|
||||
one_of_k_targets = np.zeros((int_targets.shape[0], self.num_classes))
|
||||
one_of_k_targets[range(int_targets.shape[0]), int_targets] = 1
|
||||
return one_of_k_targets
|
||||
|
||||
|
||||
|
||||
class CIFAR100DataProvider(DataProvider):
|
||||
"""Data provider for CIFAR-100 object images."""
|
||||
|
||||
def __init__(self, which_set='train', batch_size=100, max_num_batches=-1,
|
||||
shuffle_order=True, rng=None, flatten=False, one_hot=False):
|
||||
"""Create a new EMNIST data provider object.
|
||||
|
||||
Args:
|
||||
which_set: One of 'train', 'valid' or 'test'. Determines which
portion of the CIFAR-100 data this object should provide.
|
||||
batch_size (int): Number of data points to include in each batch.
|
||||
max_num_batches (int): Maximum number of batches to iterate over
|
||||
in an epoch. If `max_num_batches * batch_size > num_data` then
|
||||
only as many batches as the data can be split into will be
|
||||
used. If set to -1 all of the data will be used.
|
||||
shuffle_order (bool): Whether to randomly permute the order of
|
||||
the data before each epoch.
|
||||
rng (RandomState): A seeded random number generator.
|
||||
"""
|
||||
# check a valid which_set was provided
|
||||
assert which_set in ['train', 'valid', 'test'], (
|
||||
'Expected which_set to be either train, valid or test. '
|
||||
'Got {0}'.format(which_set)
|
||||
)
|
||||
self.one_hot = one_hot
|
||||
self.which_set = which_set
|
||||
self.num_classes = 100
|
||||
# construct path to data using os.path.join to ensure the correct path
|
||||
# separator for the current platform / OS is used
|
||||
# MLP_DATA_DIR environment variable should point to the data directory
|
||||
data_path = os.path.join(
|
||||
os.environ['MLP_DATA_DIR'], 'cifar100-{0}.npz'.format(which_set))
|
||||
assert os.path.isfile(data_path), (
|
||||
'Data file does not exist at expected path: ' + data_path
|
||||
)
|
||||
# load data from compressed numpy file
|
||||
loaded = np.load(data_path)
|
||||
|
||||
inputs, targets = loaded['inputs'], loaded['targets']
|
||||
inputs = inputs.astype(np.float32)
|
||||
if flatten:
|
||||
inputs = np.reshape(inputs, newshape=(-1, 32*32*3))
|
||||
else:
|
||||
inputs = np.reshape(inputs, newshape=(-1, 3, 32, 32))
|
||||
inputs = np.transpose(inputs, axes=(0, 2, 3, 1))
|
||||
inputs = inputs / 255.0
|
||||
|
||||
# pass the loaded data to the parent class __init__
|
||||
super(CIFAR100DataProvider, self).__init__(
|
||||
inputs, targets, batch_size, max_num_batches, shuffle_order, rng)
|
||||
|
||||
def next(self):
|
||||
"""Returns next data batch or raises `StopIteration` if at end."""
|
||||
inputs_batch, targets_batch = super(CIFAR100DataProvider, self).next()
|
||||
if self.one_hot:
|
||||
return inputs_batch, self.to_one_of_k(targets_batch)
|
||||
else:
|
||||
return inputs_batch, targets_batch
|
||||
|
||||
def to_one_of_k(self, int_targets):
|
||||
"""Converts integer coded class target to 1 of K coded targets.
|
||||
|
||||
Args:
|
||||
int_targets (ndarray): Array of integer coded class targets (i.e.
|
||||
where an integer from 0 to `num_classes` - 1 is used to
|
||||
indicate which is the correct class). This should be of shape
|
||||
(num_data,).
|
||||
|
||||
Returns:
|
||||
Array of 1 of K coded targets i.e. an array of shape
|
||||
(num_data, num_classes) where for each row all elements are equal
|
||||
to zero except for the column corresponding to the correct class
|
||||
which is equal to one.
|
||||
"""
|
||||
one_of_k_targets = np.zeros((int_targets.shape[0], self.num_classes))
|
||||
one_of_k_targets[range(int_targets.shape[0]), int_targets] = 1
|
||||
return one_of_k_targets
|
||||
|
||||
|
||||
class MSD10GenreDataProvider(DataProvider):
|
||||
"""Data provider for Million Song Dataset 10-genre classification task."""
|
||||
|
||||
def __init__(self, which_set='train', batch_size=100, max_num_batches=-1,
|
||||
shuffle_order=True, rng=None, one_hot=False, flatten=True):
|
||||
"""Create a new EMNIST data provider object.
|
||||
|
||||
Args:
|
||||
which_set: One of 'train', 'valid' or 'test'. Determines which
portion of the MSD 10-genre data this object should provide.
|
||||
batch_size (int): Number of data points to include in each batch.
|
||||
max_num_batches (int): Maximum number of batches to iterate over
|
||||
in an epoch. If `max_num_batches * batch_size > num_data` then
|
||||
only as many batches as the data can be split into will be
|
||||
used. If set to -1 all of the data will be used.
|
||||
shuffle_order (bool): Whether to randomly permute the order of
|
||||
the data before each epoch.
|
||||
rng (RandomState): A seeded random number generator.
|
||||
"""
|
||||
# check a valid which_set was provided
|
||||
assert which_set in ['train', 'valid', 'test'], (
|
||||
'Expected which_set to be either train, valid or test. '
|
||||
'Got {0}'.format(which_set)
|
||||
)
|
||||
self.one_hot = one_hot
|
||||
self.which_set = which_set
|
||||
self.num_classes = 10
|
||||
# construct path to data using os.path.join to ensure the correct path
|
||||
# separator for the current platform / OS is used
|
||||
# MLP_DATA_DIR environment variable should point to the data directory
|
||||
if which_set != "test":
|
||||
data_path = os.path.join(
|
||||
os.environ['MLP_DATA_DIR'], 'msd10-{0}.npz'.format(which_set))
|
||||
assert os.path.isfile(data_path), (
|
||||
'Data file does not exist at expected path: ' + data_path
|
||||
)
|
||||
# load data from compressed numpy file
|
||||
loaded = np.load(data_path)
|
||||
|
||||
inputs, target = loaded['inputs'], loaded['targets']
|
||||
else:
|
||||
input_data_path = os.path.join(
|
||||
os.environ['MLP_DATA_DIR'], 'msd-10-genre-test-inputs.npz')
|
||||
assert os.path.isfile(input_data_path), (
|
||||
'Data file does not exist at expected path: ' + input_data_path
|
||||
)
|
||||
target_data_path = os.path.join(
|
||||
os.environ['MLP_DATA_DIR'], 'msd-10-genre-test-targets.npz')
|
||||
assert os.path.isfile(input_data_path), (
|
||||
'Data file does not exist at expected path: ' + input_data_path
|
||||
)
|
||||
# load data from compressed numpy file
|
||||
inputs = np.load(input_data_path)['inputs']
|
||||
target = np.load(target_data_path)['targets']
|
||||
if flatten:
|
||||
inputs = inputs.reshape((-1, 120*25))
|
||||
#inputs, targets = loaded['inputs'], loaded['targets']
|
||||
|
||||
|
||||
# label map gives strings corresponding to integer label targets
|
||||
|
||||
# pass the loaded data to the parent class __init__
|
||||
super(MSD10GenreDataProvider, self).__init__(
|
||||
inputs, target, batch_size, max_num_batches, shuffle_order, rng)
|
||||
|
||||
def next(self):
|
||||
"""Returns next data batch or raises `StopIteration` if at end."""
|
||||
inputs_batch, targets_batch = super(MSD10GenreDataProvider, self).next()
|
||||
if self.one_hot:
|
||||
return inputs_batch, self.to_one_of_k(targets_batch)
|
||||
else:
|
||||
return inputs_batch, targets_batch
|
||||
|
||||
def to_one_of_k(self, int_targets):
|
||||
"""Converts integer coded class target to 1 of K coded targets.
|
||||
|
||||
Args:
|
||||
int_targets (ndarray): Array of integer coded class targets (i.e.
|
||||
where an integer from 0 to `num_classes` - 1 is used to
|
||||
indicate which is the correct class). This should be of shape
|
||||
(num_data,).
|
||||
|
||||
Returns:
|
||||
Array of 1 of K coded targets i.e. an array of shape
|
||||
(num_data, num_classes) where for each row all elements are equal
|
||||
to zero except for the column corresponding to the correct class
|
||||
which is equal to one.
|
||||
"""
|
||||
one_of_k_targets = np.zeros((int_targets.shape[0], self.num_classes))
|
||||
one_of_k_targets[range(int_targets.shape[0]), int_targets] = 1
|
||||
return one_of_k_targets
|
||||
|
||||
class MSD25GenreDataProvider(DataProvider):
|
||||
"""Data provider for Million Song Dataset 25-genre classification task."""
|
||||
|
||||
def __init__(self, which_set='train', batch_size=100, max_num_batches=-1,
|
||||
shuffle_order=True, rng=None, one_hot=False, flatten=True):
|
||||
"""Create a new EMNIST data provider object.
|
||||
|
||||
Args:
|
||||
which_set: One of 'train', 'valid' or 'test'. Determines which
portion of the MSD 25-genre data this object should provide.
|
||||
batch_size (int): Number of data points to include in each batch.
|
||||
max_num_batches (int): Maximum number of batches to iterate over
|
||||
in an epoch. If `max_num_batches * batch_size > num_data` then
|
||||
only as many batches as the data can be split into will be
|
||||
used. If set to -1 all of the data will be used.
|
||||
shuffle_order (bool): Whether to randomly permute the order of
|
||||
the data before each epoch.
|
||||
rng (RandomState): A seeded random number generator.
|
||||
"""
|
||||
# check a valid which_set was provided
|
||||
assert which_set in ['train', 'valid', 'test'], (
|
||||
'Expected which_set to be either train, valid or test. '
|
||||
'Got {0}'.format(which_set)
|
||||
)
|
||||
self.one_hot = one_hot
|
||||
self.which_set = which_set
|
||||
self.num_classes = 25
|
||||
# construct path to data using os.path.join to ensure the correct path
|
||||
# separator for the current platform / OS is used
|
||||
# MLP_DATA_DIR environment variable should point to the data directory
|
||||
|
||||
data_path = os.path.join(
|
||||
os.environ['MLP_DATA_DIR'], 'msd25-{0}.npz'.format(which_set))
|
||||
assert os.path.isfile(data_path), (
|
||||
'Data file does not exist at expected path: ' + data_path
|
||||
)
|
||||
# load data from compressed numpy file
|
||||
loaded = np.load(data_path)
|
||||
|
||||
inputs, target = loaded['inputs'], loaded['targets']
|
||||
|
||||
if flatten:
|
||||
inputs = inputs.reshape((-1, 120*25))
|
||||
#inputs, target
|
||||
# pass the loaded data to the parent class __init__
|
||||
super(MSD25GenreDataProvider, self).__init__(
|
||||
inputs, target, batch_size, max_num_batches, shuffle_order, rng)
|
||||
|
||||
def next(self):
|
||||
"""Returns next data batch or raises `StopIteration` if at end."""
|
||||
inputs_batch, targets_batch = super(MSD25GenreDataProvider, self).next()
|
||||
if self.one_hot:
|
||||
return inputs_batch, self.to_one_of_k(targets_batch)
|
||||
else:
|
||||
return inputs_batch, targets_batch
|
||||
|
||||
def to_one_of_k(self, int_targets):
|
||||
"""Converts integer coded class target to 1 of K coded targets.
|
||||
|
||||
Args:
|
||||
int_targets (ndarray): Array of integer coded class targets (i.e.
|
||||
where an integer from 0 to `num_classes` - 1 is used to
|
||||
indicate which is the correct class). This should be of shape
|
||||
(num_data,).
|
||||
|
||||
Returns:
|
||||
Array of 1 of K coded targets i.e. an array of shape
|
||||
(num_data, num_classes) where for each row all elements are equal
|
||||
to zero except for the column corresponding to the correct class
|
||||
which is equal to one.
|
||||
"""
|
||||
one_of_k_targets = np.zeros((int_targets.shape[0], self.num_classes))
|
||||
one_of_k_targets[range(int_targets.shape[0]), int_targets] = 1
|
||||
return one_of_k_targets
|
||||
|
||||
|
||||
|
||||
class MetOfficeDataProvider(DataProvider):
|
||||
"""South Scotland Met Office weather data provider."""
|
emnist_network_trainer.py (new file, 182 lines)
@@ -0,0 +1,182 @@
|
||||
import argparse
|
||||
import numpy as np
|
||||
import tensorflow as tf
|
||||
import tqdm
|
||||
from data_providers import EMNISTDataProvider
|
||||
from network_builder import ClassifierNetworkGraph
|
||||
from utils.parser_utils import ParserClass
|
||||
from utils.storage import build_experiment_folder, save_statistics
|
||||
|
||||
tf.reset_default_graph() # resets any previous graphs to clear memory
|
||||
parser = argparse.ArgumentParser(description='Welcome to CNN experiments script') # generates an argument parser
|
||||
parser_extractor = ParserClass(parser=parser) # creates a parser class to process the parsed input
|
||||
|
||||
batch_size, seed, epochs, logs_path, continue_from_epoch, tensorboard_enable, batch_norm, \
|
||||
strided_dim_reduction, experiment_prefix, dropout_rate_value = parser_extractor.get_argument_variables()
|
||||
# returns a list of objects that contain
|
||||
# our parsed input
|
||||
|
||||
experiment_name = "experiment_{}_batch_size_{}_bn_{}_mp_{}".format(experiment_prefix,
|
||||
batch_size, batch_norm,
|
||||
strided_dim_reduction)
|
||||
# generate experiment name
|
||||
|
||||
rng = np.random.RandomState(seed=seed) # set seed
|
||||
|
||||
train_data = EMNISTDataProvider(which_set="train", batch_size=batch_size, rng=rng)
|
||||
val_data = EMNISTDataProvider(which_set="valid", batch_size=batch_size, rng=rng)
|
||||
test_data = EMNISTDataProvider(which_set="test", batch_size=batch_size, rng=rng)
|
||||
# setup our data providers
|
||||
|
||||
print("Running {}".format(experiment_name))
|
||||
print("Starting from epoch {}".format(continue_from_epoch))
|
||||
|
||||
saved_models_filepath, logs_filepath = build_experiment_folder(experiment_name, logs_path) # generate experiment dir
|
||||
|
||||
# Placeholder setup
|
||||
data_inputs = tf.placeholder(tf.float32, [batch_size, train_data.inputs.shape[1], train_data.inputs.shape[2],
|
||||
train_data.inputs.shape[3]], 'data-inputs')
|
||||
data_targets = tf.placeholder(tf.int32, [batch_size], 'data-targets')
|
||||
|
||||
training_phase = tf.placeholder(tf.bool, name='training-flag')
|
||||
rotate_data = tf.placeholder(tf.bool, name='rotate-flag')
|
||||
dropout_rate = tf.placeholder(tf.float32, name='dropout-prob')
|
||||
|
||||
classifier_network = ClassifierNetworkGraph(input_x=data_inputs, target_placeholder=data_targets,
|
||||
dropout_rate=dropout_rate, batch_size=batch_size,
|
||||
num_channels=train_data.inputs.shape[3], n_classes=train_data.num_classes,  # shape[3] is the channel axis of the NHWC inputs
|
||||
is_training=training_phase, augment_rotate_flag=rotate_data,
|
||||
strided_dim_reduction=strided_dim_reduction,
|
||||
use_batch_normalization=batch_norm) # initialize our computational graph
|
||||
|
||||
if continue_from_epoch == -1: # if this is a new experiment and not continuation of a previous one then generate a new
|
||||
# statistics file
|
||||
save_statistics(logs_filepath, "result_summary_statistics", ["epoch", "train_c_loss", "train_c_accuracy",
|
||||
"val_c_loss", "val_c_accuracy",
|
||||
"test_c_loss", "test_c_accuracy"], create=True)
|
||||
|
||||
start_epoch = continue_from_epoch if continue_from_epoch != -1 else 0 # if new experiment start from 0 otherwise
|
||||
# continue where left off
|
||||
|
||||
summary_op, losses_ops, c_error_opt_op = classifier_network.init_train() # get graph operations (ops)
|
||||
|
||||
total_train_batches = train_data.num_batches
|
||||
total_val_batches = val_data.num_batches
|
||||
total_test_batches = test_data.num_batches
|
||||
|
||||
best_epoch = 0
|
||||
|
||||
if tensorboard_enable:
|
||||
print("saved tensorboard file at", logs_filepath)
|
||||
writer = tf.summary.FileWriter(logs_filepath, graph=tf.get_default_graph())
|
||||
|
||||
init = tf.global_variables_initializer() # initialization op for the graph
|
||||
|
||||
with tf.Session() as sess:
|
||||
sess.run(init) # actually running the initialization op
|
||||
train_saver = tf.train.Saver() # saver object that will save our graph so we can reload it later for continuation of
|
||||
val_saver = tf.train.Saver()
|
||||
# training or inference
|
||||
|
||||
# continue_from_epoch = -1  # hard-coding -1 here would override the parsed argument and always skip the checkpoint restore below
|
||||
|
||||
if continue_from_epoch != -1:
|
||||
train_saver.restore(sess, "{}/{}_{}.ckpt".format(saved_models_filepath, experiment_name,
|
||||
continue_from_epoch)) # restore previous graph to continue operations
|
||||
|
||||
best_val_accuracy = 0.
|
||||
with tqdm.tqdm(total=epochs) as epoch_pbar:
|
||||
for e in range(start_epoch, epochs):
|
||||
total_c_loss = 0.
|
||||
total_accuracy = 0.
|
||||
with tqdm.tqdm(total=total_train_batches) as pbar_train:
|
||||
for batch_idx, (x_batch, y_batch) in enumerate(train_data):
|
||||
iter_id = e * total_train_batches + batch_idx
|
||||
_, c_loss_value, acc = sess.run(
|
||||
[c_error_opt_op, losses_ops["crossentropy_losses"], losses_ops["accuracy"]],
|
||||
feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
|
||||
data_targets: y_batch, training_phase: True, rotate_data: False})
|
||||
# Here we execute the c_error_opt_op which trains the network and also the ops that compute the
|
||||
# loss and accuracy, we save those in _, c_loss_value and acc respectively.
|
||||
total_c_loss += c_loss_value # add loss of current iter to sum
|
||||
total_accuracy += acc # add acc of current iter to sum
|
||||
|
||||
iter_out = "iter_num: {}, train_loss: {}, train_accuracy: {}".format(iter_id,
|
||||
total_c_loss / (batch_idx + 1),
|
||||
total_accuracy / (
|
||||
batch_idx + 1)) # show
|
||||
# iter statistics using running averages of previous iter within this epoch
|
||||
pbar_train.set_description(iter_out)
|
||||
pbar_train.update(1)
|
||||
if tensorboard_enable and batch_idx % 25 == 0: # save tensorboard summary every 25 iterations
|
||||
_summary = sess.run(
|
||||
summary_op,
|
||||
feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
|
||||
data_targets: y_batch, training_phase: True, rotate_data: False})
|
||||
writer.add_summary(_summary, global_step=iter_id)
|
||||
|
||||
total_c_loss /= total_train_batches  # compute mean of loss
|
||||
total_accuracy /= total_train_batches # compute mean of accuracy
|
||||
|
||||
save_path = train_saver.save(sess, "{}/{}_{}.ckpt".format(saved_models_filepath, experiment_name, e))
|
||||
# save graph and weights
|
||||
print("Saved current model at", save_path)
|
||||
|
||||
total_val_c_loss = 0.
|
||||
total_val_accuracy = 0. # run validation stage, note how training_phase placeholder is set to False
|
||||
# and that we do not run the c_error_opt_op which runs gradient descent, but instead only call the loss ops
|
||||
# to collect losses on the validation set
|
||||
with tqdm.tqdm(total=total_val_batches) as pbar_val:
|
||||
for batch_idx, (x_batch, y_batch) in enumerate(val_data):
|
||||
c_loss_value, acc = sess.run(
|
||||
[losses_ops["crossentropy_losses"], losses_ops["accuracy"]],
|
||||
feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
|
||||
data_targets: y_batch, training_phase: False, rotate_data: False})
|
||||
total_val_c_loss += c_loss_value
|
||||
total_val_accuracy += acc
|
||||
iter_out = "val_loss: {}, val_accuracy: {}".format(total_val_c_loss / (batch_idx + 1),
|
||||
total_val_accuracy / (batch_idx + 1))
|
||||
pbar_val.set_description(iter_out)
|
||||
pbar_val.update(1)
|
||||
|
||||
total_val_c_loss /= total_val_batches
|
||||
total_val_accuracy /= total_val_batches
|
||||
|
||||
if best_val_accuracy < total_val_accuracy: # check if val acc better than the previous best and if
|
||||
# so save current as best and save the model as the best validation model to be used on the test set
|
||||
# after the final epoch
|
||||
best_val_accuracy = total_val_accuracy
|
||||
best_epoch = e
|
||||
save_path = val_saver.save(sess, "{}/best_validation_{}_{}.ckpt".format(saved_models_filepath, experiment_name, e))
|
||||
print("Saved best validation score model at", save_path)
|
||||
|
||||
epoch_pbar.update(1)
|
||||
# save statistics of this epoch, train and val without test set performance
|
||||
save_statistics(logs_filepath, "result_summary_statistics",
|
||||
[e, total_c_loss, total_accuracy, total_val_c_loss, total_val_accuracy,
|
||||
-1, -1])
|
||||
|
||||
val_saver.restore(sess, "{}/best_validation_{}_{}.ckpt".format(saved_models_filepath, experiment_name, best_epoch))
|
||||
# restore model with best performance on validation set
|
||||
total_test_c_loss = 0.
|
||||
total_test_accuracy = 0.
|
||||
# compute test loss and accuracy and save
|
||||
with tqdm.tqdm(total=total_test_batches) as pbar_test:
|
||||
for batch_id, (x_batch, y_batch) in enumerate(test_data):
|
||||
c_loss_value, acc = sess.run(
|
||||
[losses_ops["crossentropy_losses"], losses_ops["accuracy"]],
|
||||
feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
|
||||
data_targets: y_batch, training_phase: False, rotate_data: False})
|
||||
total_test_c_loss += c_loss_value
|
||||
total_test_accuracy += acc
|
||||
iter_out = "test_loss: {}, test_accuracy: {}".format(total_test_c_loss / (batch_idx + 1),
|
||||
acc / (batch_idx + 1))
|
||||
pbar_test.set_description(iter_out)
|
||||
pbar_test.update(1)
|
||||
|
||||
total_test_c_loss /= total_test_batches
|
||||
total_test_accuracy /= total_test_batches
|
||||
|
||||
save_statistics(logs_filepath, "result_summary_statistics",
|
||||
["test set performance", -1, -1, -1, -1,
|
||||
total_test_c_loss, total_test_accuracy])
|
environment_variables.sh (new file, 8 lines)
@@ -0,0 +1,8 @@
|
||||
cd ~/anaconda2/envs/mlp
|
||||
mkdir -p ./etc/conda/activate.d
|
||||
mkdir -p ./etc/conda/deactivate.d
|
||||
echo -e '#!/bin/sh\n' >> ./etc/conda/activate.d/env_vars.sh
|
||||
echo "export MLP_DATA_DIR=$HOME/mlp_term_2/mlpractical/data" >> ./etc/conda/activate.d/env_vars.sh
|
||||
echo -e '#!/bin/sh\n' >> ./etc/conda/deactivate.d/env_vars.sh
|
||||
echo 'unset MLP_DATA_DIR' >> ./etc/conda/deactivate.d/env_vars.sh
|
||||
export MLP_DATA_DIR=$HOME/mlpractical/data
|
gpu_cluster_tutorial_training_script.sh (new file, 33 lines)
@@ -0,0 +1,33 @@
|
||||
#!/bin/sh
|
||||
#SBATCH -N 1 # nodes requested
|
||||
#SBATCH -n 1 # tasks requested
|
||||
#SBATCH --gres=gpu:1
|
||||
#SBATCH --mem=16000 # memory in Mb
|
||||
#SBATCH -o sample_experiment_outfile # send stdout to sample_experiment_outfile
|
||||
#SBATCH -e sample_experiment_errfile # send stderr to sample_experiment_errfile
|
||||
#SBATCH -t 2:00:00  # time requested in hour:minute:seconds
|
||||
export CUDA_HOME=/opt/cuda-8.0.44
|
||||
|
||||
export CUDNN_HOME=/opt/cuDNN-6.0_8.0
|
||||
|
||||
export STUDENT_ID=sxxxxxx
|
||||
|
||||
export LD_LIBRARY_PATH=${CUDNN_HOME}/lib64:${CUDA_HOME}/lib64:$LD_LIBRARY_PATH
|
||||
|
||||
export LIBRARY_PATH=${CUDNN_HOME}/lib64:$LIBRARY_PATH
|
||||
|
||||
export CPATH=${CUDNN_HOME}/include:$CPATH
|
||||
|
||||
export PATH=${CUDA_HOME}/bin:${PATH}
|
||||
|
||||
export PYTHON_PATH=$PATH
|
||||
|
||||
mkdir -p /disk/scratch/${STUDENT_ID}
|
||||
|
||||
export TMPDIR=/disk/scratch/${STUDENT_ID}/
|
||||
export TMP=/disk/scratch/${STUDENT_ID}/
|
||||
# Activate the relevant virtual environment:
|
||||
|
||||
source /home/${STUDENT_ID}/miniconda3/bin/activate mlp
|
||||
|
||||
python network_trainer.py --batch_size 128 --epochs 200 --experiment_prefix vgg-net-emnist-sample-exp --dropout_rate 0.4 --batch_norm_use True --strided_dim_reduction True --seed 25012018
|
mlp/__init__.py (6 lines removed)
@@ -1,6 +0,0 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
"""Machine Learning Practical package."""
|
||||
|
||||
__authors__ = ['Pawel Swietojanski', 'Steve Renals', 'Matt Graham']
|
||||
|
||||
DEFAULT_SEED = 123456 # Default random number generator seed if none provided.
|
mlp/errors.py (176 lines removed)
@@ -1,176 +0,0 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
"""Error functions.
|
||||
|
||||
This module defines error functions, with the aim of model training being to
|
||||
minimise the error function given a set of inputs and target outputs.
|
||||
|
||||
The error functions will typically measure some concept of distance between the
|
||||
model outputs and target outputs, averaged over all data points in the data set
|
||||
or batch.
|
||||
"""
|
||||
|
||||
import numpy as np
|
||||
|
||||
|
||||
class SumOfSquaredDiffsError(object):
|
||||
"""Sum of squared differences (squared Euclidean distance) error."""
|
||||
|
||||
def __call__(self, outputs, targets):
|
||||
"""Calculates error function given a batch of outputs and targets.
|
||||
|
||||
Args:
|
||||
outputs: Array of model outputs of shape (batch_size, output_dim).
|
||||
targets: Array of target outputs of shape (batch_size, output_dim).
|
||||
|
||||
Returns:
|
||||
Scalar cost function value.
|
||||
"""
|
||||
return 0.5 * np.mean(np.sum((outputs - targets)**2, axis=1))
|
||||
|
||||
def grad(self, outputs, targets):
|
||||
"""Calculates gradient of error function with respect to outputs.
|
||||
|
||||
Args:
|
||||
outputs: Array of model outputs of shape (batch_size, output_dim).
|
||||
targets: Array of target outputs of shape (batch_size, output_dim).
|
||||
|
||||
Returns:
|
||||
Gradient of error function with respect to outputs.
|
||||
"""
|
||||
return (outputs - targets) / outputs.shape[0]
|
||||
|
||||
def __repr__(self):
|
||||
return 'MeanSquaredErrorCost'
|
||||
|
||||
|
||||
class BinaryCrossEntropyError(object):
|
||||
"""Binary cross entropy error."""
|
||||
|
||||
def __call__(self, outputs, targets):
|
||||
"""Calculates error function given a batch of outputs and targets.
|
||||
|
||||
Args:
|
||||
outputs: Array of model outputs of shape (batch_size, output_dim).
|
||||
targets: Array of target outputs of shape (batch_size, output_dim).
|
||||
|
||||
Returns:
|
||||
Scalar error function value.
|
||||
"""
|
||||
return -np.mean(
|
||||
targets * np.log(outputs) + (1. - targets) * np.log(1. - outputs))
|
||||
|
||||
def grad(self, outputs, targets):
|
||||
"""Calculates gradient of error function with respect to outputs.
|
||||
|
||||
Args:
|
||||
outputs: Array of model outputs of shape (batch_size, output_dim).
|
||||
targets: Array of target outputs of shape (batch_size, output_dim).
|
||||
|
||||
Returns:
|
||||
Gradient of error function with respect to outputs.
|
||||
"""
|
||||
return ((1. - targets) / (1. - outputs) -
|
||||
(targets / outputs)) / outputs.shape[0]
|
||||
|
||||
def __repr__(self):
|
||||
return 'BinaryCrossEntropyError'
|
||||
|
||||
|
||||
class BinaryCrossEntropySigmoidError(object):
|
||||
"""Binary cross entropy error with logistic sigmoid applied to outputs."""
|
||||
|
||||
def __call__(self, outputs, targets):
|
||||
"""Calculates error function given a batch of outputs and targets.
|
||||
|
||||
Args:
|
||||
outputs: Array of model outputs of shape (batch_size, output_dim).
|
||||
targets: Array of target outputs of shape (batch_size, output_dim).
|
||||
|
||||
Returns:
|
||||
Scalar error function value.
|
||||
"""
|
||||
probs = 1. / (1. + np.exp(-outputs))
|
||||
return -np.mean(
|
||||
targets * np.log(probs) + (1. - targets) * np.log(1. - probs))
|
||||
|
||||
def grad(self, outputs, targets):
|
||||
"""Calculates gradient of error function with respect to outputs.
|
||||
|
||||
Args:
|
||||
outputs: Array of model outputs of shape (batch_size, output_dim).
|
||||
targets: Array of target outputs of shape (batch_size, output_dim).
|
||||
|
||||
Returns:
|
||||
Gradient of error function with respect to outputs.
|
||||
"""
|
||||
probs = 1. / (1. + np.exp(-outputs))
|
||||
return (probs - targets) / outputs.shape[0]
|
||||
|
||||
def __repr__(self):
|
||||
return 'BinaryCrossEntropySigmoidError'
|
||||
|
||||
|
||||
class CrossEntropyError(object):
|
||||
"""Multi-class cross entropy error."""
|
||||
|
||||
def __call__(self, outputs, targets):
|
||||
"""Calculates error function given a batch of outputs and targets.
|
||||
|
||||
Args:
|
||||
outputs: Array of model outputs of shape (batch_size, output_dim).
|
||||
targets: Array of target outputs of shape (batch_size, output_dim).
|
||||
|
||||
Returns:
|
||||
Scalar error function value.
|
||||
"""
|
||||
return -np.mean(np.sum(targets * np.log(outputs), axis=1))
|
||||
|
||||
def grad(self, outputs, targets):
|
||||
"""Calculates gradient of error function with respect to outputs.
|
||||
|
||||
Args:
|
||||
outputs: Array of model outputs of shape (batch_size, output_dim).
|
||||
targets: Array of target outputs of shape (batch_size, output_dim).
|
||||
|
||||
Returns:
|
||||
Gradient of error function with respect to outputs.
|
||||
"""
|
||||
return -(targets / outputs) / outputs.shape[0]
|
||||
|
||||
def __repr__(self):
|
||||
return 'CrossEntropyError'
|
||||
|
||||
|
||||
class CrossEntropySoftmaxError(object):
|
||||
"""Multi-class cross entropy error with Softmax applied to outputs."""
|
||||
|
||||
def __call__(self, outputs, targets):
|
||||
"""Calculates error function given a batch of outputs and targets.
|
||||
|
||||
Args:
|
||||
outputs: Array of model outputs of shape (batch_size, output_dim).
|
||||
targets: Array of target outputs of shape (batch_size, output_dim).
|
||||
|
||||
Returns:
|
||||
Scalar error function value.
|
||||
"""
|
||||
normOutputs = outputs - outputs.max(-1)[:, None]
|
||||
logProb = normOutputs - np.log(np.sum(np.exp(normOutputs), axis=-1)[:, None])
|
||||
return -np.mean(np.sum(targets * logProb, axis=1))
|
||||
|
||||
def grad(self, outputs, targets):
|
||||
"""Calculates gradient of error function with respect to outputs.
|
||||
|
||||
Args:
|
||||
outputs: Array of model outputs of shape (batch_size, output_dim).
|
||||
targets: Array of target outputs of shape (batch_size, output_dim).
|
||||
|
||||
Returns:
|
||||
Gradient of error function with respect to outputs.
|
||||
"""
|
||||
probs = np.exp(outputs - outputs.max(-1)[:, None])
|
||||
probs /= probs.sum(-1)[:, None]
|
||||
return (probs - targets) / outputs.shape[0]
|
||||
|
||||
def __repr__(self):
|
||||
return 'CrossEntropySoftmaxError'
|
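For reference, the removed error classes share a small `__call__`/`grad` interface. A minimal sketch of how they were used, with toy arrays and assuming the old `mlp.errors` module is still importable:
```
import numpy as np
from mlp.errors import CrossEntropySoftmaxError  # the removed module, shown for illustration

error = CrossEntropySoftmaxError()
outputs = np.array([[2.0, 0.5, -1.0]])  # unnormalised logits, shape (batch_size, output_dim)
targets = np.array([[1.0, 0.0, 0.0]])   # one-hot targets of the same shape
print(error(outputs, targets))          # scalar error value
print(error.grad(outputs, targets))     # gradient w.r.t. outputs, shape (1, 3)
```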
@ -1,143 +0,0 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
"""Parameter initialisers.
|
||||
|
||||
This module defines classes to initialise the parameters in a layer.
|
||||
"""
|
||||
|
||||
import numpy as np
|
||||
from mlp import DEFAULT_SEED
|
||||
|
||||
|
||||
class ConstantInit(object):
|
||||
"""Constant parameter initialiser."""
|
||||
|
||||
def __init__(self, value):
|
||||
"""Construct a constant parameter initialiser.
|
||||
|
||||
Args:
|
||||
value: Value to initialise parameter to.
|
||||
"""
|
||||
self.value = value
|
||||
|
||||
def __call__(self, shape):
|
||||
return np.ones(shape=shape) * self.value
|
||||
|
||||
|
||||
class UniformInit(object):
|
||||
"""Random uniform parameter initialiser."""
|
||||
|
||||
def __init__(self, low, high, rng=None):
|
||||
"""Construct a random uniform parameter initialiser.
|
||||
|
||||
Args:
|
||||
low: Lower bound of interval to sample from.
|
||||
high: Upper bound of interval to sample from.
|
||||
rng (RandomState): Seeded random number generator.
|
||||
"""
|
||||
self.low = low
|
||||
self.high = high
|
||||
if rng is None:
|
||||
rng = np.random.RandomState(DEFAULT_SEED)
|
||||
self.rng = rng
|
||||
|
||||
def __call__(self, shape):
|
||||
return self.rng.uniform(low=self.low, high=self.high, size=shape)
|
||||
|
||||
|
||||
class NormalInit(object):
|
||||
"""Random normal parameter initialiser."""
|
||||
|
||||
def __init__(self, mean, std, rng=None):
|
||||
"""Construct a random uniform parameter initialiser.
|
||||
|
||||
Args:
|
||||
mean: Mean of distribution to sample from.
|
||||
std: Standard deviation of distribution to sample from.
|
||||
rng (RandomState): Seeded random number generator.
|
||||
"""
|
||||
self.mean = mean
|
||||
self.std = std
|
||||
if rng is None:
|
||||
rng = np.random.RandomState(DEFAULT_SEED)
|
||||
self.rng = rng
|
||||
|
||||
def __call__(self, shape):
|
||||
return self.rng.normal(loc=self.mean, scale=self.std, size=shape)
|
||||
|
||||
class GlorotUniformInit(object):
|
||||
"""Glorot and Bengio (2010) random uniform weights initialiser.
|
||||
|
||||
Initialises a two-dimensional parameter array using the 'normalized
|
||||
initialisation' scheme suggested in [1] which attempts to maintain a
|
||||
roughly constant variance in the activations and backpropagated gradients
|
||||
of a multi-layer model consisting of interleaved affine and logistic
|
||||
sigmoidal transformation layers.
|
||||
|
||||
Weights are sampled from a zero-mean uniform distribution with standard
|
||||
deviation `sqrt(2 / (input_dim + output_dim))` where `input_dim` and
|
||||
`output_dim` are the input and output dimensions of the weight matrix
|
||||
respectively.
|
||||
|
||||
References:
|
||||
[1]: Understanding the difficulty of training deep feedforward neural
|
||||
networks, Glorot and Bengio (2010)
|
||||
"""
|
||||
|
||||
def __init__(self, gain=1., rng=None):
|
||||
"""Construct a normalised initilisation random initialiser object.
|
||||
|
||||
Args:
|
||||
gain: Multiplicative factor to scale initialised weights by.
|
||||
Recommended value is 1 for affine layers followed by
|
||||
logistic sigmoid layers (or another affine layer).
|
||||
rng (RandomState): Seeded random number generator.
|
||||
"""
|
||||
self.gain = gain
|
||||
if rng is None:
|
||||
rng = np.random.RandomState(DEFAULT_SEED)
|
||||
self.rng = rng
|
||||
|
||||
def __call__(self, shape):
|
||||
assert len(shape) == 2, (
|
||||
'Initialiser should only be used for two dimensional arrays.')
|
||||
std = self.gain * (2. / (shape[0] + shape[1]))**0.5
|
||||
half_width = 3.**0.5 * std
|
||||
return self.rng.uniform(low=-half_width, high=half_width, size=shape)
|
||||
|
||||
|
||||
class GlorotNormalInit(object):
|
||||
"""Glorot and Bengio (2010) random normal weights initialiser.
|
||||
|
||||
Initialises a two-dimensional parameter array using the 'normalized
|
||||
initialisation' scheme suggested in [1] which attempts to maintain a
|
||||
roughly constant variance in the activations and backpropagated gradients
|
||||
of a multi-layer model consisting of interleaved affine and logistic
|
||||
sigmoidal transformation layers.
|
||||
|
||||
Weights are sampled from a zero-mean normal distribution with standard
|
||||
deviation `sqrt(2 / (input_dim + output_dim))` where `input_dim` and
|
||||
`output_dim` are the input and output dimensions of the weight matrix
|
||||
respectively.
|
||||
|
||||
References:
|
||||
[1]: Understanding the difficulty of training deep feedforward neural
|
||||
networks, Glorot and Bengio (2010)
|
||||
"""
|
||||
|
||||
def __init__(self, gain=1., rng=None):
|
||||
"""Construct a normalised initilisation random initialiser object.
|
||||
|
||||
Args:
|
||||
gain: Multiplicative factor to scale initialised weights by.
|
||||
Recommended value is 1 for affine layers followed by
|
||||
logistic sigmoid layers (or another affine layer).
|
||||
rng (RandomState): Seeded random number generator.
|
||||
"""
|
||||
self.gain = gain
|
||||
if rng is None:
|
||||
rng = np.random.RandomState(DEFAULT_SEED)
|
||||
self.rng = rng
|
||||
|
||||
def __call__(self, shape):
|
||||
std = self.gain * (2. / (shape[0] + shape[1]))**0.5
|
||||
return self.rng.normal(loc=0., scale=std, size=shape)
|
1002
mlp/layers.py
File diff suppressed because it is too large
@ -1,162 +0,0 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
"""Learning rules.
|
||||
|
||||
This module contains classes implementing gradient based learning rules.
|
||||
"""
|
||||
|
||||
import numpy as np
|
||||
|
||||
|
||||
class GradientDescentLearningRule(object):
|
||||
"""Simple (stochastic) gradient descent learning rule.
|
||||
|
||||
For a scalar error function `E(p[0], p[1], ...)` of some set of
|
||||
potentially multidimensional parameters, this attempts to find a local
|
||||
minimum of the loss function by applying updates to each parameter of the
|
||||
form
|
||||
|
||||
p[i] := p[i] - learning_rate * dE/dp[i]
|
||||
|
||||
With `learning_rate` a positive scaling parameter.
|
||||
|
||||
The error function used in successive applications of these updates may be
|
||||
a stochastic estimator of the true error function (e.g. when the error with
|
||||
respect to only a subset of data-points is calculated) in which case this
|
||||
will correspond to a stochastic gradient descent learning rule.
|
||||
"""
|
||||
|
||||
def __init__(self, learning_rate=1e-3):
|
||||
"""Creates a new learning rule object.
|
||||
|
||||
Args:
|
||||
learning_rate: A positive scalar to scale gradient updates to the
|
||||
parameters by. This needs to be carefully set - if too large
|
||||
the learning dynamic will be unstable and may diverge, while
|
||||
if set too small learning will proceed very slowly.
|
||||
|
||||
"""
|
||||
assert learning_rate > 0., 'learning_rate should be positive.'
|
||||
self.learning_rate = learning_rate
|
||||
|
||||
def initialise(self, params):
|
||||
"""Initialises the state of the learning rule for a set or parameters.
|
||||
|
||||
This must be called before `update_params` is first called.
|
||||
|
||||
Args:
|
||||
params: A list of the parameters to be optimised. Note these will
|
||||
be updated *in-place* to avoid reallocating arrays on each
|
||||
update.
|
||||
"""
|
||||
self.params = params
|
||||
|
||||
def reset(self):
|
||||
"""Resets any additional state variables to their intial values.
|
||||
|
||||
For this learning rule there are no additional state variables so we
|
||||
do nothing here.
|
||||
"""
|
||||
pass
|
||||
|
||||
def update_params(self, grads_wrt_params):
|
||||
"""Applies a single gradient descent update to all parameters.
|
||||
|
||||
All parameter updates are performed using in-place operations and so
|
||||
nothing is returned.
|
||||
|
||||
Args:
|
||||
grads_wrt_params: A list of gradients of the scalar loss function
|
||||
with respect to each of the parameters passed to `initialise`
|
||||
previously, with this list expected to be in the same order.
|
||||
"""
|
||||
for param, grad in zip(self.params, grads_wrt_params):
|
||||
param -= self.learning_rate * grad
|
||||
|
||||
|
||||
class MomentumLearningRule(GradientDescentLearningRule):
|
||||
"""Gradient descent with momentum learning rule.
|
||||
|
||||
This extends the basic gradient learning rule by introducing extra
|
||||
momentum state variables for each parameter. These can help the learning
|
||||
dynamic overcome shallow local minima and speed convergence when
|
||||
making multiple successive steps in a similar direction in parameter space.
|
||||
|
||||
For parameter p[i] and corresponding momentum m[i] the updates for a
|
||||
scalar loss function `L` are of the form
|
||||
|
||||
m[i] := mom_coeff * m[i] - learning_rate * dL/dp[i]
|
||||
p[i] := p[i] + m[i]
|
||||
|
||||
with `learning_rate` a positive scaling parameter for the gradient updates
|
||||
and `mom_coeff` a value in [0, 1] that determines how much 'friction' there
|
||||
is in the system and so how quickly previous momentum contributions decay.
|
||||
"""
|
||||
|
||||
def __init__(self, learning_rate=1e-3, mom_coeff=0.9):
|
||||
"""Creates a new learning rule object.
|
||||
|
||||
Args:
|
||||
learning_rate: A positive scalar to scale gradient updates to the
|
||||
parameters by. This needs to be carefully set - if too large
|
||||
the learning dynamic will be unstable and may diverge, while
|
||||
if set too small learning will proceed very slowly.
|
||||
mom_coeff: A scalar in the range [0, 1] inclusive. This determines
|
||||
the contribution of the previous momentum value to the value
|
||||
after each update. If equal to 0 the momentum is set to exactly
|
||||
the negative scaled gradient each update and so this rule
|
||||
collapses to standard gradient descent. If equal to 1 the
|
||||
momentum will just be decremented by the scaled gradient at
|
||||
each update. This is equivalent to simulating the dynamic in
|
||||
a frictionless system. Due to energy conservation the loss
|
||||
of 'potential energy' as the dynamics moves down the loss
|
||||
function surface will lead to an increasingly large 'kinetic
|
||||
energy' and so speed, meaning the updates will become
|
||||
increasingly large, potentially unstably so. Typically a value
|
||||
less than but close to 1 will avoid these issues and cause the
|
||||
dynamic to converge to a local minimum where the gradients are
|
||||
by definition zero.
|
||||
"""
|
||||
super(MomentumLearningRule, self).__init__(learning_rate)
|
||||
assert mom_coeff >= 0. and mom_coeff <= 1., (
|
||||
'mom_coeff should be in the range [0, 1].'
|
||||
)
|
||||
self.mom_coeff = mom_coeff
|
||||
|
||||
def initialise(self, params):
|
||||
"""Initialises the state of the learning rule for a set or parameters.
|
||||
|
||||
This must be called before `update_params` is first called.
|
||||
|
||||
Args:
|
||||
params: A list of the parameters to be optimised. Note these will
|
||||
be updated *in-place* to avoid reallocating arrays on each
|
||||
update.
|
||||
"""
|
||||
super(MomentumLearningRule, self).initialise(params)
|
||||
self.moms = []
|
||||
for param in self.params:
|
||||
self.moms.append(np.zeros_like(param))
|
||||
|
||||
def reset(self):
|
||||
"""Resets any additional state variables to their intial values.
|
||||
|
||||
For this learning rule this corresponds to zeroing all the momenta.
|
||||
"""
|
||||
for mom in self.moms:
|
||||
mom *= 0.
|
||||
|
||||
def update_params(self, grads_wrt_params):
|
||||
"""Applies a single update to all parameters.
|
||||
|
||||
All parameter updates are performed using in-place operations and so
|
||||
nothing is returned.
|
||||
|
||||
Args:
|
||||
grads_wrt_params: A list of gradients of the scalar loss function
|
||||
with respect to each of the parameters passed to `initialise`
|
||||
previously, with this list expected to be in the same order.
|
||||
"""
|
||||
for param, mom, grad in zip(self.params, self.moms, grads_wrt_params):
|
||||
mom *= self.mom_coeff
|
||||
mom -= self.learning_rate * grad
|
||||
param += mom
|
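For reference, a minimal sketch of the removed learning-rule interface with toy parameter arrays (the `mlp.learning_rules` module path is assumed):
```
import numpy as np
from mlp.learning_rules import MomentumLearningRule  # module path assumed

params = [np.zeros((2, 3)), np.zeros(3)]          # toy parameter arrays, updated in place
grads_wrt_params = [np.ones((2, 3)), np.ones(3)]  # toy gradients in the same order

rule = MomentumLearningRule(learning_rate=1e-3, mom_coeff=0.9)
rule.initialise(params)               # must be called before the first update
rule.update_params(grads_wrt_params)  # applies one in-place momentum update
```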
145
mlp/models.py
@ -1,145 +0,0 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
"""Model definitions.
|
||||
|
||||
This module implements objects encapsulating learnable models of input-output
|
||||
relationships. The model objects implement methods for forward propagating
|
||||
the inputs through the transformation(s) defined by the model to produce
|
||||
outputs (and intermediate states) and for calculating gradients of scalar
|
||||
functions of the outputs with respect to the model parameters.
|
||||
"""
|
||||
|
||||
from mlp.layers import LayerWithParameters, StochasticLayer, StochasticLayerWithParameters
|
||||
|
||||
|
||||
class SingleLayerModel(object):
|
||||
"""A model consisting of a single transformation layer."""
|
||||
|
||||
def __init__(self, layer):
|
||||
"""Create a new single layer model instance.
|
||||
|
||||
Args:
|
||||
layer: The layer object defining the model architecture.
|
||||
"""
|
||||
self.layer = layer
|
||||
|
||||
@property
|
||||
def params(self):
|
||||
"""A list of all of the parameters of the model."""
|
||||
return self.layer.params
|
||||
|
||||
def fprop(self, inputs):
|
||||
"""Calculate the model outputs corresponding to a batch of inputs.
|
||||
|
||||
Args:
|
||||
inputs: Batch of inputs to the model.
|
||||
|
||||
Returns:
|
||||
List which is a concatenation of the model inputs and model
|
||||
outputs, this being done for consistency of the interface with
|
||||
multi-layer models for which `fprop` returns a list of
|
||||
activations through all intermediate layers of the model and including
|
||||
the inputs and outputs.
|
||||
"""
|
||||
activations = [inputs, self.layer.fprop(inputs)]
|
||||
return activations
|
||||
|
||||
def grads_wrt_params(self, activations, grads_wrt_outputs):
|
||||
"""Calculates gradients with respect to the model parameters.
|
||||
|
||||
Args:
|
||||
activations: List of all activations from forward pass through
|
||||
model using `fprop`.
|
||||
grads_wrt_outputs: Gradient with respect to the model outputs of
|
||||
the scalar function parameter gradients are being calculated
|
||||
for.
|
||||
|
||||
Returns:
|
||||
List of gradients of the scalar function with respect to all model
|
||||
parameters.
|
||||
"""
|
||||
return self.layer.grads_wrt_params(activations[0], grads_wrt_outputs)
|
||||
|
||||
def __repr__(self):
|
||||
return 'SingleLayerModel(' + str(self.layer) + ')'
|
||||
|
||||
|
||||
class MultipleLayerModel(object):
|
||||
"""A model consisting of multiple layers applied sequentially."""
|
||||
|
||||
def __init__(self, layers):
|
||||
"""Create a new multiple layer model instance.
|
||||
|
||||
Args:
|
||||
layers: List of the layer objects defining the model in the
|
||||
order they should be applied from inputs to outputs.
|
||||
"""
|
||||
self.layers = layers
|
||||
|
||||
@property
|
||||
def params(self):
|
||||
"""A list of all of the parameters of the model."""
|
||||
params = []
|
||||
for layer in self.layers:
|
||||
if isinstance(layer, LayerWithParameters) or isinstance(layer, StochasticLayerWithParameters):
|
||||
params += layer.params
|
||||
return params
|
||||
|
||||
def fprop(self, inputs, evaluation=False):
|
||||
"""Forward propagates a batch of inputs through the model.
|
||||
|
||||
Args:
|
||||
inputs: Batch of inputs to the model.
|
||||
|
||||
Returns:
|
||||
List of the activations at the output of all layers of the model
|
||||
plus the inputs (to the first layer) as the first element. The
|
||||
last element of the list corresponds to the model outputs.
|
||||
"""
|
||||
activations = [inputs]
|
||||
for i, layer in enumerate(self.layers):
|
||||
if evaluation:
|
||||
if issubclass(type(self.layers[i]), StochasticLayer) or issubclass(type(self.layers[i]),
|
||||
StochasticLayerWithParameters):
|
||||
current_activations = self.layers[i].fprop(activations[i], stochastic=False)
|
||||
else:
|
||||
current_activations = self.layers[i].fprop(activations[i])
|
||||
else:
|
||||
if issubclass(type(self.layers[i]), StochasticLayer) or issubclass(type(self.layers[i]),
|
||||
StochasticLayerWithParameters):
|
||||
current_activations = self.layers[i].fprop(activations[i], stochastic=True)
|
||||
else:
|
||||
current_activations = self.layers[i].fprop(activations[i])
|
||||
activations.append(current_activations)
|
||||
return activations
|
||||
|
||||
def grads_wrt_params(self, activations, grads_wrt_outputs):
|
||||
"""Calculates gradients with respect to the model parameters.
|
||||
|
||||
Args:
|
||||
activations: List of all activations from forward pass through
|
||||
model using `fprop`.
|
||||
grads_wrt_outputs: Gradient with respect to the model outputs of
|
||||
the scalar function parameter gradients are being calculated
|
||||
for.
|
||||
|
||||
Returns:
|
||||
List of gradients of the scalar function with respect to all model
|
||||
parameters.
|
||||
"""
|
||||
grads_wrt_params = []
|
||||
for i, layer in enumerate(self.layers[::-1]):
|
||||
inputs = activations[-i - 2]
|
||||
outputs = activations[-i - 1]
|
||||
grads_wrt_inputs = layer.bprop(inputs, outputs, grads_wrt_outputs)
|
||||
if isinstance(layer, LayerWithParameters) or isinstance(layer, StochasticLayerWithParameters):
|
||||
grads_wrt_params += layer.grads_wrt_params(
|
||||
inputs, grads_wrt_outputs)[::-1]
|
||||
grads_wrt_outputs = grads_wrt_inputs
|
||||
return grads_wrt_params[::-1]
|
||||
|
||||
def __repr__(self):
|
||||
return (
|
||||
'MultiLayerModel(\n ' +
|
||||
'\n '.join([str(layer) for layer in self.layers]) +
|
||||
'\n)'
|
||||
)
|
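For reference, the removed model classes only require layers exposing `fprop` (and `bprop` for the multi-layer case). A self-contained sketch with a parameter-free stand-in layer; the real layer classes live in `mlp/layers.py`, whose diff is suppressed above:
```
import numpy as np
from mlp.models import MultipleLayerModel  # the removed module, shown for illustration

class IdentityLayer(object):
    """Parameter-free stand-in layer used only to illustrate the interface."""
    def fprop(self, inputs):
        return inputs
    def bprop(self, inputs, outputs, grads_wrt_outputs):
        return grads_wrt_outputs

model = MultipleLayerModel([IdentityLayer(), IdentityLayer()])
inputs = np.ones((4, 3))
activations = model.fprop(inputs)  # [inputs, layer 1 output, layer 2 output]
grads = model.grads_wrt_params(activations, np.zeros((4, 3)))  # [] as no layer has parameters
```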
@ -1,148 +0,0 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
"""Model optimisers.
|
||||
|
||||
This module contains objects implementing (batched) stochastic gradient descent
|
||||
based optimisation of models.
|
||||
"""
|
||||
|
||||
import time
|
||||
import logging
|
||||
from collections import OrderedDict
|
||||
import numpy as np
|
||||
import tqdm
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class Optimiser(object):
|
||||
"""Basic model optimiser."""
|
||||
|
||||
def __init__(self, model, error, learning_rule, train_dataset,
|
||||
valid_dataset=None, data_monitors=None, notebook=False):
|
||||
"""Create a new optimiser instance.
|
||||
|
||||
Args:
|
||||
model: The model to optimise.
|
||||
error: The scalar error function to minimise.
|
||||
learning_rule: Gradient based learning rule to use to minimise
|
||||
error.
|
||||
train_dataset: Data provider for training set data batches.
|
||||
valid_dataset: Data provider for validation set data batches.
|
||||
data_monitors: Dictionary of functions evaluated on targets and
|
||||
model outputs (averaged across both full training and
|
||||
validation data sets) to monitor during training in addition
|
||||
to the error. Keys should correspond to a string label for
|
||||
the statistic being evaluated.
notebook: Whether to use a Jupyter-notebook-friendly tqdm progress bar.
|
||||
"""
|
||||
self.model = model
|
||||
self.error = error
|
||||
self.learning_rule = learning_rule
|
||||
self.learning_rule.initialise(self.model.params)
|
||||
self.train_dataset = train_dataset
|
||||
self.valid_dataset = valid_dataset
|
||||
self.data_monitors = OrderedDict([('error', error)])
|
||||
if data_monitors is not None:
|
||||
self.data_monitors.update(data_monitors)
|
||||
self.notebook = notebook
|
||||
if notebook:
|
||||
self.tqdm_progress = tqdm.tqdm_notebook
|
||||
else:
|
||||
self.tqdm_progress = tqdm.tqdm
|
||||
|
||||
def do_training_epoch(self):
|
||||
"""Do a single training epoch.
|
||||
|
||||
This iterates through all batches in the training dataset, for each
|
||||
calculating the gradient of the estimated error given the batch with
|
||||
respect to all the model parameters and then updates the model
|
||||
parameters according to the learning rule.
|
||||
"""
|
||||
with self.tqdm_progress(total=self.train_dataset.num_batches) as train_progress_bar:
|
||||
train_progress_bar.set_description("Epoch Progress")
|
||||
for inputs_batch, targets_batch in self.train_dataset:
|
||||
activations = self.model.fprop(inputs_batch)
|
||||
grads_wrt_outputs = self.error.grad(activations[-1], targets_batch)
|
||||
grads_wrt_params = self.model.grads_wrt_params(
|
||||
activations, grads_wrt_outputs)
|
||||
self.learning_rule.update_params(grads_wrt_params)
|
||||
train_progress_bar.update(1)
|
||||
|
||||
def eval_monitors(self, dataset, label):
|
||||
"""Evaluates the monitors for the given dataset.
|
||||
|
||||
Args:
|
||||
dataset: Dataset to perform evaluation with.
|
||||
label: Tag to add to end of monitor keys to identify dataset.
|
||||
|
||||
Returns:
|
||||
OrderedDict of monitor values evaluated on dataset.
|
||||
"""
|
||||
data_mon_vals = OrderedDict([(key + label, 0.) for key
|
||||
in self.data_monitors.keys()])
|
||||
for inputs_batch, targets_batch in dataset:
|
||||
activations = self.model.fprop(inputs_batch, evaluation=True)
|
||||
for key, data_monitor in self.data_monitors.items():
|
||||
data_mon_vals[key + label] += data_monitor(
|
||||
activations[-1], targets_batch)
|
||||
for key, data_monitor in self.data_monitors.items():
|
||||
data_mon_vals[key + label] /= dataset.num_batches
|
||||
return data_mon_vals
|
||||
|
||||
def get_epoch_stats(self):
|
||||
"""Computes training statistics for an epoch.
|
||||
|
||||
Returns:
|
||||
An OrderedDict with keys corresponding to the statistic labels and
|
||||
values corresponding to the value of the statistic.
|
||||
"""
|
||||
epoch_stats = OrderedDict()
|
||||
epoch_stats.update(self.eval_monitors(self.train_dataset, '(train)'))
|
||||
if self.valid_dataset is not None:
|
||||
epoch_stats.update(self.eval_monitors(
|
||||
self.valid_dataset, '(valid)'))
|
||||
return epoch_stats
|
||||
|
||||
def log_stats(self, epoch, epoch_time, stats):
|
||||
"""Outputs stats for a training epoch to a logger.
|
||||
|
||||
Args:
|
||||
epoch (int): Epoch counter.
|
||||
epoch_time: Time taken in seconds for the epoch to complete.
|
||||
stats: Monitored stats for the epoch.
|
||||
"""
|
||||
logger.info('Epoch {0}: {1:.1f}s to complete\n {2}'.format(
|
||||
epoch, epoch_time,
|
||||
', '.join(['{0}={1:.2e}'.format(k, v) for (k, v) in stats.items()])
|
||||
))
|
||||
|
||||
def train(self, num_epochs, stats_interval=5):
|
||||
"""Trains a model for a set number of epochs.
|
||||
|
||||
Args:
|
||||
num_epochs: Number of epochs (complete passes through training
|
||||
dataset) to train for.
|
||||
stats_interval: Training statistics will be recorded and logged
|
||||
every `stats_interval` epochs.
|
||||
|
||||
Returns:
|
||||
Tuple with first value being an array of training run statistics
|
||||
and the second being a dict mapping the labels for the statistics
|
||||
recorded to their column index in the array, and the third being the total training time in seconds.
|
||||
"""
|
||||
start_train_time = time.time()
|
||||
run_stats = [list(self.get_epoch_stats().values())]
|
||||
with self.tqdm_progress(total=num_epochs) as progress_bar:
|
||||
progress_bar.set_description("Experiment Progress")
|
||||
for epoch in range(1, num_epochs + 1):
|
||||
start_time = time.time()
|
||||
self.do_training_epoch()
|
||||
epoch_time = time.time() - start_time
|
||||
if epoch % stats_interval == 0:
|
||||
stats = self.get_epoch_stats()
|
||||
self.log_stats(epoch, epoch_time, stats)
|
||||
run_stats.append(list(stats.values()))
|
||||
progress_bar.update(1)
|
||||
finish_train_time = time.time()
|
||||
total_train_time = finish_train_time - start_train_time
|
||||
return np.array(run_stats), {k: i for i, k in enumerate(stats.keys())}, total_train_time
|
||||
|
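For reference, a self-contained sketch of driving the removed `Optimiser` end to end, reusing the toy layer idea from the `mlp/models.py` sketch and a stand-in data provider (the real providers live in the course's data provider module; the `mlp.optimisers` module path is assumed):
```
import numpy as np
from mlp.errors import SumOfSquaredDiffsError
from mlp.learning_rules import GradientDescentLearningRule
from mlp.models import MultipleLayerModel
from mlp.optimisers import Optimiser  # module path assumed

class IdentityLayer(object):
    """Parameter-free stand-in layer, as in the sketch after mlp/models.py."""
    def fprop(self, inputs):
        return inputs
    def bprop(self, inputs, outputs, grads_wrt_outputs):
        return grads_wrt_outputs

class ToyDataProvider(object):
    """Stand-in provider yielding (inputs, targets) batches with a num_batches attribute."""
    num_batches = 5
    def __iter__(self):
        rng = np.random.RandomState(0)
        for _ in range(self.num_batches):
            inputs = rng.normal(size=(10, 3))
            yield inputs, inputs  # identity targets for this toy problem

optimiser = Optimiser(model=MultipleLayerModel([IdentityLayer()]),
                      error=SumOfSquaredDiffsError(),
                      learning_rule=GradientDescentLearningRule(learning_rate=1e-3),
                      train_dataset=ToyDataProvider())
stats, keys, run_time = optimiser.train(num_epochs=10, stats_interval=5)
print(stats[:, keys['error(train)']])  # logged training error per recorded epoch
```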
@ -1,34 +0,0 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
"""Training schedulers.
|
||||
|
||||
This module contains classes implementing schedulers which control the
|
||||
evolution of learning rule hyperparameters (such as learning rate) over a
|
||||
training run.
|
||||
"""
|
||||
|
||||
import numpy as np
|
||||
|
||||
|
||||
class ConstantLearningRateScheduler(object):
|
||||
"""Example of scheduler interface which sets a constant learning rate."""
|
||||
|
||||
def __init__(self, learning_rate):
|
||||
"""Construct a new constant learning rate scheduler object.
|
||||
|
||||
Args:
|
||||
learning_rate: Learning rate to use in learning rule.
|
||||
"""
|
||||
self.learning_rate = learning_rate
|
||||
|
||||
def update_learning_rule(self, learning_rule, epoch_number):
|
||||
"""Update the hyperparameters of the learning rule.
|
||||
|
||||
Run at the beginning of each epoch.
|
||||
|
||||
Args:
|
||||
learning_rule: Learning rule object being used in training run,
|
||||
any scheduled hyperparameters to be altered should be
|
||||
attributes of this object.
|
||||
epoch_number: Integer index of training epoch about to be run.
|
||||
"""
|
||||
learning_rule.learning_rate = self.learning_rate
|
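For reference, a minimal sketch of the removed scheduler interface: a training loop calls `update_learning_rule` before each epoch (the `mlp.schedulers` module path is assumed):
```
from mlp.learning_rules import GradientDescentLearningRule
from mlp.schedulers import ConstantLearningRateScheduler  # module path assumed

learning_rule = GradientDescentLearningRule(learning_rate=1e-2)
scheduler = ConstantLearningRateScheduler(learning_rate=1e-3)

for epoch_number in range(1, 6):
    scheduler.update_learning_rule(learning_rule, epoch_number)
    # ... run one training epoch using learning_rule ...

print(learning_rule.learning_rate)  # 0.001, held constant by this scheduler
```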
182
msd10_network_trainer.py
Normal file
@ -0,0 +1,182 @@
|
||||
import argparse
|
||||
import numpy as np
|
||||
import tensorflow as tf
|
||||
import tqdm
|
||||
from data_providers import MSD10GenreDataProvider
|
||||
from network_builder import ClassifierNetworkGraph
|
||||
from utils.parser_utils import ParserClass
|
||||
from utils.storage import build_experiment_folder, save_statistics
|
||||
|
||||
tf.reset_default_graph() # resets any previous graphs to clear memory
|
||||
parser = argparse.ArgumentParser(description='Welcome to CNN experiments script') # generates an argument parser
|
||||
parser_extractor = ParserClass(parser=parser) # creates a parser class to process the parsed input
|
||||
|
||||
batch_size, seed, epochs, logs_path, continue_from_epoch, tensorboard_enable, batch_norm, \
|
||||
strided_dim_reduction, experiment_prefix, dropout_rate_value = parser_extractor.get_argument_variables()
|
||||
# returns a list of objects that contain
|
||||
# our parsed input
|
||||
|
||||
experiment_name = "experiment_{}_batch_size_{}_bn_{}_mp_{}".format(experiment_prefix,
|
||||
batch_size, batch_norm,
|
||||
strided_dim_reduction)
|
||||
# generate experiment name
|
||||
|
||||
rng = np.random.RandomState(seed=seed) # set seed
|
||||
|
||||
train_data = MSD10GenreDataProvider(which_set="train", batch_size=batch_size, rng=rng)
|
||||
val_data = MSD10GenreDataProvider(which_set="valid", batch_size=batch_size, rng=rng)
|
||||
test_data = MSD10GenreDataProvider(which_set="test", batch_size=batch_size, rng=rng)
|
||||
# setup our data providers
|
||||
|
||||
print("Running {}".format(experiment_name))
|
||||
print("Starting from epoch {}".format(continue_from_epoch))
|
||||
|
||||
saved_models_filepath, logs_filepath = build_experiment_folder(experiment_name, logs_path) # generate experiment dir
|
||||
|
||||
# Placeholder setup
|
||||
data_inputs = tf.placeholder(tf.float32, [batch_size, train_data.inputs.shape[1]], 'data-inputs')
|
||||
data_targets = tf.placeholder(tf.int32, [batch_size], 'data-targets')
|
||||
|
||||
training_phase = tf.placeholder(tf.bool, name='training-flag')
|
||||
rotate_data = tf.placeholder(tf.bool, name='rotate-flag')
|
||||
dropout_rate = tf.placeholder(tf.float32, name='dropout-prob')
|
||||
|
||||
classifier_network = ClassifierNetworkGraph(network_name='FCCClassifier',
|
||||
input_x=data_inputs, target_placeholder=data_targets,
|
||||
dropout_rate=dropout_rate, batch_size=batch_size,
|
||||
num_channels=1, n_classes=train_data.num_classes,
|
||||
is_training=training_phase, augment_rotate_flag=rotate_data,
|
||||
strided_dim_reduction=strided_dim_reduction,
|
||||
use_batch_normalization=batch_norm) # initialize our computational graph
|
||||
|
||||
if continue_from_epoch == -1: # if this is a new experiment and not continuation of a previous one then generate a new
|
||||
# statistics file
|
||||
save_statistics(logs_filepath, "result_summary_statistics", ["epoch", "train_c_loss", "train_c_accuracy",
|
||||
"val_c_loss", "val_c_accuracy",
|
||||
"test_c_loss", "test_c_accuracy"], create=True)
|
||||
|
||||
start_epoch = continue_from_epoch if continue_from_epoch != -1 else 0 # if new experiment start from 0 otherwise
|
||||
# continue where left off
|
||||
|
||||
summary_op, losses_ops, c_error_opt_op = classifier_network.init_train() # get graph operations (ops)
|
||||
|
||||
total_train_batches = train_data.num_batches
|
||||
total_val_batches = val_data.num_batches
|
||||
total_test_batches = test_data.num_batches
|
||||
|
||||
best_epoch = 0
|
||||
|
||||
if tensorboard_enable:
|
||||
print("saved tensorboard file at", logs_filepath)
|
||||
writer = tf.summary.FileWriter(logs_filepath, graph=tf.get_default_graph())
|
||||
|
||||
init = tf.global_variables_initializer() # initialization op for the graph
|
||||
|
||||
with tf.Session() as sess:
|
||||
sess.run(init) # actually running the initialization op
|
||||
train_saver = tf.train.Saver() # saver object that will save our graph so we can reload it later for continuation of
|
||||
val_saver = tf.train.Saver()
|
||||
# training or inference
|
||||
|
||||
continue_from_epoch = -1
|
||||
|
||||
if continue_from_epoch != -1:
|
||||
train_saver.restore(sess, "{}/{}_{}.ckpt".format(saved_models_filepath, experiment_name,
|
||||
continue_from_epoch)) # restore previous graph to continue operations
|
||||
|
||||
best_val_accuracy = 0.
|
||||
with tqdm.tqdm(total=epochs) as epoch_pbar:
|
||||
for e in range(start_epoch, epochs):
|
||||
total_c_loss = 0.
|
||||
total_accuracy = 0.
|
||||
with tqdm.tqdm(total=total_train_batches) as pbar_train:
|
||||
for batch_idx, (x_batch, y_batch) in enumerate(train_data):
|
||||
iter_id = e * total_train_batches + batch_idx
|
||||
_, c_loss_value, acc = sess.run(
|
||||
[c_error_opt_op, losses_ops["crossentropy_losses"], losses_ops["accuracy"]],
|
||||
feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
|
||||
data_targets: y_batch, training_phase: True, rotate_data: False})
|
||||
# Here we execute the c_error_opt_op which trains the network and also the ops that compute the
|
||||
# loss and accuracy, we save those in _, c_loss_value and acc respectively.
|
||||
total_c_loss += c_loss_value # add loss of current iter to sum
|
||||
total_accuracy += acc # add acc of current iter to sum
|
||||
|
||||
iter_out = "iter_num: {}, train_loss: {}, train_accuracy: {}".format(iter_id,
|
||||
total_c_loss / (batch_idx + 1),
|
||||
total_accuracy / (
|
||||
batch_idx + 1)) # show
|
||||
# iter statistics using running averages of previous iter within this epoch
|
||||
pbar_train.set_description(iter_out)
|
||||
pbar_train.update(1)
|
||||
if tensorboard_enable and batch_idx % 25 == 0: # save tensorboard summary every 25 iterations
|
||||
_summary = sess.run(
|
||||
summary_op,
|
||||
feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
|
||||
data_targets: y_batch, training_phase: True, rotate_data: False})
|
||||
writer.add_summary(_summary, global_step=iter_id)
|
||||
|
||||
total_c_loss /= total_train_batches # compute mean of loss
|
||||
total_accuracy /= total_train_batches # compute mean of accuracy
|
||||
|
||||
save_path = train_saver.save(sess, "{}/{}_{}.ckpt".format(saved_models_filepath, experiment_name, e))
|
||||
# save graph and weights
|
||||
print("Saved current model at", save_path)
|
||||
|
||||
total_val_c_loss = 0.
|
||||
total_val_accuracy = 0. # run validation stage, note how training_phase placeholder is set to False
|
||||
# and that we do not run the c_error_opt_op which runs gradient descent, but instead only call the loss ops
|
||||
# to collect losses on the validation set
|
||||
with tqdm.tqdm(total=total_val_batches) as pbar_val:
|
||||
for batch_idx, (x_batch, y_batch) in enumerate(val_data):
|
||||
c_loss_value, acc = sess.run(
|
||||
[losses_ops["crossentropy_losses"], losses_ops["accuracy"]],
|
||||
feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
|
||||
data_targets: y_batch, training_phase: False, rotate_data: False})
|
||||
total_val_c_loss += c_loss_value
|
||||
total_val_accuracy += acc
|
||||
iter_out = "val_loss: {}, val_accuracy: {}".format(total_val_c_loss / (batch_idx + 1),
|
||||
total_val_accuracy / (batch_idx + 1))
|
||||
pbar_val.set_description(iter_out)
|
||||
pbar_val.update(1)
|
||||
|
||||
total_val_c_loss /= total_val_batches
|
||||
total_val_accuracy /= total_val_batches
|
||||
|
||||
if best_val_accuracy < total_val_accuracy: # check if val acc better than the previous best and if
|
||||
# so save current as best and save the model as the best validation model to be used on the test set
|
||||
# after the final epoch
|
||||
best_val_accuracy = total_val_accuracy
|
||||
best_epoch = e
|
||||
save_path = val_saver.save(sess, "{}/best_validation_{}_{}.ckpt".format(saved_models_filepath, experiment_name, e))
|
||||
print("Saved best validation score model at", save_path)
|
||||
|
||||
epoch_pbar.update(1)
|
||||
# save statistics of this epoch, train and val without test set performance
|
||||
save_statistics(logs_filepath, "result_summary_statistics",
|
||||
[e, total_c_loss, total_accuracy, total_val_c_loss, total_val_accuracy,
|
||||
-1, -1])
|
||||
|
||||
val_saver.restore(sess, "{}/best_validation_{}_{}.ckpt".format(saved_models_filepath, experiment_name, best_epoch))
|
||||
# restore model with best performance on validation set
|
||||
total_test_c_loss = 0.
|
||||
total_test_accuracy = 0.
|
||||
# compute test loss and accuracy and save
|
||||
with tqdm.tqdm(total=total_test_batches) as pbar_test:
|
||||
for batch_id, (x_batch, y_batch) in enumerate(test_data):
|
||||
c_loss_value, acc = sess.run(
|
||||
[losses_ops["crossentropy_losses"], losses_ops["accuracy"]],
|
||||
feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
|
||||
data_targets: y_batch, training_phase: False, rotate_data: False})
|
||||
total_test_c_loss += c_loss_value
|
||||
total_test_accuracy += acc
|
||||
iter_out = "test_loss: {}, test_accuracy: {}".format(total_test_c_loss / (batch_idx + 1),
|
||||
total_test_accuracy / (batch_id + 1))
|
||||
pbar_test.set_description(iter_out)
|
||||
pbar_test.update(1)
|
||||
|
||||
total_test_c_loss /= total_test_batches
|
||||
total_test_accuracy /= total_test_batches
|
||||
|
||||
save_statistics(logs_filepath, "result_summary_statistics",
|
||||
["test set performance", -1, -1, -1, -1,
|
||||
total_test_c_loss, total_test_accuracy])
|
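The per-epoch and test-set rows written by `save_statistics` above can be inspected afterwards. A sketch assuming it produces a CSV file named `result_summary_statistics.csv` inside the experiment's logs folder (both the file name and the folder layout are assumptions):
```
import csv
import os

logs_filepath = "logs/experiment_msd10_batch_size_128_bn_True_mp_True"  # illustrative path
with open(os.path.join(logs_filepath, "result_summary_statistics.csv")) as f:
    for row in csv.DictReader(f):
        # columns follow the header written above: epoch, train/val/test loss and accuracy
        print(row["epoch"], row["val_c_accuracy"], row["test_c_accuracy"])
```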
182
msd25_network_trainer.py
Normal file
@ -0,0 +1,182 @@
|
||||
import argparse
|
||||
import numpy as np
|
||||
import tensorflow as tf
|
||||
import tqdm
|
||||
from data_providers import MSD10GenreDataProvider
|
||||
from network_builder import ClassifierNetworkGraph
|
||||
from utils.parser_utils import ParserClass
|
||||
from utils.storage import build_experiment_folder, save_statistics
|
||||
|
||||
tf.reset_default_graph() # resets any previous graphs to clear memory
|
||||
parser = argparse.ArgumentParser(description='Welcome to CNN experiments script') # generates an argument parser
|
||||
parser_extractor = ParserClass(parser=parser) # creates a parser class to process the parsed input
|
||||
|
||||
batch_size, seed, epochs, logs_path, continue_from_epoch, tensorboard_enable, batch_norm, \
|
||||
strided_dim_reduction, experiment_prefix, dropout_rate_value = parser_extractor.get_argument_variables()
|
||||
# returns a list of objects that contain
|
||||
# our parsed input
|
||||
|
||||
experiment_name = "experiment_{}_batch_size_{}_bn_{}_mp_{}".format(experiment_prefix,
|
||||
batch_size, batch_norm,
|
||||
strided_dim_reduction)
|
||||
# generate experiment name
|
||||
|
||||
rng = np.random.RandomState(seed=seed) # set seed
|
||||
|
||||
train_data = MSD10GenreDataProvider(which_set="train", batch_size=batch_size, rng=rng)
|
||||
val_data = MSD10GenreDataProvider(which_set="valid", batch_size=batch_size, rng=rng)
|
||||
test_data = MSD10GenreDataProvider(which_set="test", batch_size=batch_size, rng=rng)
|
||||
# setup our data providers
|
||||
|
||||
print("Running {}".format(experiment_name))
|
||||
print("Starting from epoch {}".format(continue_from_epoch))
|
||||
|
||||
saved_models_filepath, logs_filepath = build_experiment_folder(experiment_name, logs_path) # generate experiment dir
|
||||
|
||||
# Placeholder setup
|
||||
data_inputs = tf.placeholder(tf.float32, [batch_size, train_data.inputs.shape[1]], 'data-inputs')
|
||||
data_targets = tf.placeholder(tf.int32, [batch_size], 'data-targets')
|
||||
|
||||
training_phase = tf.placeholder(tf.bool, name='training-flag')
|
||||
rotate_data = tf.placeholder(tf.bool, name='rotate-flag')
|
||||
dropout_rate = tf.placeholder(tf.float32, name='dropout-prob')
|
||||
|
||||
classifier_network = ClassifierNetworkGraph(network_name='FCCClassifier',
|
||||
input_x=data_inputs, target_placeholder=data_targets,
|
||||
dropout_rate=dropout_rate, batch_size=batch_size,
|
||||
num_channels=1, n_classes=train_data.num_classes,
|
||||
is_training=training_phase, augment_rotate_flag=rotate_data,
|
||||
strided_dim_reduction=strided_dim_reduction,
|
||||
use_batch_normalization=batch_norm) # initialize our computational graph
|
||||
|
||||
if continue_from_epoch == -1: # if this is a new experiment and not continuation of a previous one then generate a new
|
||||
# statistics file
|
||||
save_statistics(logs_filepath, "result_summary_statistics", ["epoch", "train_c_loss", "train_c_accuracy",
|
||||
"val_c_loss", "val_c_accuracy",
|
||||
"test_c_loss", "test_c_accuracy"], create=True)
|
||||
|
||||
start_epoch = continue_from_epoch if continue_from_epoch != -1 else 0 # if new experiment start from 0 otherwise
|
||||
# continue where left off
|
||||
|
||||
summary_op, losses_ops, c_error_opt_op = classifier_network.init_train() # get graph operations (ops)
|
||||
|
||||
total_train_batches = train_data.num_batches
|
||||
total_val_batches = val_data.num_batches
|
||||
total_test_batches = test_data.num_batches
|
||||
|
||||
best_epoch = 0
|
||||
|
||||
if tensorboard_enable:
|
||||
print("saved tensorboard file at", logs_filepath)
|
||||
writer = tf.summary.FileWriter(logs_filepath, graph=tf.get_default_graph())
|
||||
|
||||
init = tf.global_variables_initializer() # initialization op for the graph
|
||||
|
||||
with tf.Session() as sess:
|
||||
sess.run(init) # actually running the initialization op
|
||||
train_saver = tf.train.Saver() # saver object that will save our graph so we can reload it later for continuation of
|
||||
val_saver = tf.train.Saver()
|
||||
# training or inference
|
||||
|
||||
continue_from_epoch = -1
|
||||
|
||||
if continue_from_epoch != -1:
|
||||
train_saver.restore(sess, "{}/{}_{}.ckpt".format(saved_models_filepath, experiment_name,
|
||||
continue_from_epoch)) # restore previous graph to continue operations
|
||||
|
||||
best_val_accuracy = 0.
|
||||
with tqdm.tqdm(total=epochs) as epoch_pbar:
|
||||
for e in range(start_epoch, epochs):
|
||||
total_c_loss = 0.
|
||||
total_accuracy = 0.
|
||||
with tqdm.tqdm(total=total_train_batches) as pbar_train:
|
||||
for batch_idx, (x_batch, y_batch) in enumerate(train_data):
|
||||
iter_id = e * total_train_batches + batch_idx
|
||||
_, c_loss_value, acc = sess.run(
|
||||
[c_error_opt_op, losses_ops["crossentropy_losses"], losses_ops["accuracy"]],
|
||||
feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
|
||||
data_targets: y_batch, training_phase: True, rotate_data: False})
|
||||
# Here we execute the c_error_opt_op which trains the network and also the ops that compute the
|
||||
# loss and accuracy, we save those in _, c_loss_value and acc respectively.
|
||||
total_c_loss += c_loss_value # add loss of current iter to sum
|
||||
total_accuracy += acc # add acc of current iter to sum
|
||||
|
||||
iter_out = "iter_num: {}, train_loss: {}, train_accuracy: {}".format(iter_id,
|
||||
total_c_loss / (batch_idx + 1),
|
||||
total_accuracy / (
|
||||
batch_idx + 1)) # show
|
||||
# iter statistics using running averages of previous iter within this epoch
|
||||
pbar_train.set_description(iter_out)
|
||||
pbar_train.update(1)
|
||||
if tensorboard_enable and batch_idx % 25 == 0: # save tensorboard summary every 25 iterations
|
||||
_summary = sess.run(
|
||||
summary_op,
|
||||
feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
|
||||
data_targets: y_batch, training_phase: True, rotate_data: False})
|
||||
writer.add_summary(_summary, global_step=iter_id)
|
||||
|
||||
total_c_loss /= total_train_batches # compute mean of loss
|
||||
total_accuracy /= total_train_batches # compute mean of accuracy
|
||||
|
||||
save_path = train_saver.save(sess, "{}/{}_{}.ckpt".format(saved_models_filepath, experiment_name, e))
|
||||
# save graph and weights
|
||||
print("Saved current model at", save_path)
|
||||
|
||||
total_val_c_loss = 0.
|
||||
total_val_accuracy = 0. # run validation stage, note how training_phase placeholder is set to False
|
||||
# and that we do not run the c_error_opt_op which runs gradient descent, but instead only call the loss ops
|
||||
# to collect losses on the validation set
|
||||
with tqdm.tqdm(total=total_val_batches) as pbar_val:
|
||||
for batch_idx, (x_batch, y_batch) in enumerate(val_data):
|
||||
c_loss_value, acc = sess.run(
|
||||
[losses_ops["crossentropy_losses"], losses_ops["accuracy"]],
|
||||
feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
|
||||
data_targets: y_batch, training_phase: False, rotate_data: False})
|
||||
total_val_c_loss += c_loss_value
|
||||
total_val_accuracy += acc
|
||||
iter_out = "val_loss: {}, val_accuracy: {}".format(total_val_c_loss / (batch_idx + 1),
|
||||
total_val_accuracy / (batch_idx + 1))
|
||||
pbar_val.set_description(iter_out)
|
||||
pbar_val.update(1)
|
||||
|
||||
total_val_c_loss /= total_val_batches
|
||||
total_val_accuracy /= total_val_batches
|
||||
|
||||
if best_val_accuracy < total_val_accuracy: # check if val acc better than the previous best and if
|
||||
# so save current as best and save the model as the best validation model to be used on the test set
|
||||
# after the final epoch
|
||||
best_val_accuracy = total_val_accuracy
|
||||
best_epoch = e
|
||||
save_path = val_saver.save(sess, "{}/best_validation_{}_{}.ckpt".format(saved_models_filepath, experiment_name, e))
|
||||
print("Saved best validation score model at", save_path)
|
||||
|
||||
epoch_pbar.update(1)
|
||||
# save statistics of this epoch, train and val without test set performance
|
||||
save_statistics(logs_filepath, "result_summary_statistics",
|
||||
[e, total_c_loss, total_accuracy, total_val_c_loss, total_val_accuracy,
|
||||
-1, -1])
|
||||
|
||||
val_saver.restore(sess, "{}/best_validation_{}_{}.ckpt".format(saved_models_filepath, experiment_name, best_epoch))
|
||||
# restore model with best performance on validation set
|
||||
total_test_c_loss = 0.
|
||||
total_test_accuracy = 0.
|
||||
# compute test loss and accuracy and save
|
||||
with tqdm.tqdm(total=total_test_batches) as pbar_test:
|
||||
for batch_id, (x_batch, y_batch) in enumerate(test_data):
|
||||
c_loss_value, acc = sess.run(
|
||||
[losses_ops["crossentropy_losses"], losses_ops["accuracy"]],
|
||||
feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
|
||||
data_targets: y_batch, training_phase: False, rotate_data: False})
|
||||
total_test_c_loss += c_loss_value
|
||||
total_test_accuracy += acc
|
||||
iter_out = "test_loss: {}, test_accuracy: {}".format(total_test_c_loss / (batch_idx + 1),
|
||||
total_test_accuracy / (batch_id + 1))
|
||||
pbar_test.set_description(iter_out)
|
||||
pbar_test.update(1)
|
||||
|
||||
total_test_c_loss /= total_test_batches
|
||||
total_test_accuracy /= total_test_batches
|
||||
|
||||
save_statistics(logs_filepath, "result_summary_statistics",
|
||||
["test set performance", -1, -1, -1, -1,
|
||||
total_test_c_loss, total_test_accuracy])
|
162
network_architectures.py
Normal file
@ -0,0 +1,162 @@
|
||||
import tensorflow as tf
|
||||
from tensorflow.contrib.layers import batch_norm
|
||||
from tensorflow.python.ops.nn_ops import leaky_relu
|
||||
|
||||
from utils.network_summary import count_parameters
|
||||
|
||||
|
||||
class VGGClassifier:
|
||||
def __init__(self, batch_size, layer_stage_sizes, name, num_classes, num_channels=1, batch_norm_use=False,
|
||||
inner_layer_depth=2, strided_dim_reduction=True):
|
||||
|
||||
"""
|
||||
Initializes a VGG Classifier architecture
|
||||
:param batch_size: The size of the data batch
|
||||
:param layer_stage_sizes: A list containing the filters for each layer stage, where layer stage is a series of
|
||||
convolutional layers with stride=1 and no max pooling followed by a dimensionality reducing stage which is
|
||||
either a convolution with stride=1 followed by max pooling or a convolution with stride=2
|
||||
(i.e. strided convolution). So if we pass a list [64, 128, 256] it means that if we have inner_layer_depth=2
|
||||
then stage 0 will have 2 layers with stride=1 and filter size=64 and another dimensionality reducing convolution
|
||||
with either stride=1 and max pooling or stride=2 to dimensionality reduce. Similarly for the other stages.
|
||||
:param name: Name of the network
|
||||
:param num_classes: Number of classes we will need to classify
|
||||
:param num_channels: Number of channels of our image data.
|
||||
:param batch_norm_use: Whether to use batch norm between layers or not.
|
||||
:param inner_layer_depth: The amount of extra layers on top of the dimensionality reducing stage to have per
|
||||
layer stage.
|
||||
:param strided_dim_reduction: Whether to use strided convolutions instead of max pooling.
|
||||
"""
|
||||
self.reuse = False
|
||||
self.batch_size = batch_size
|
||||
self.num_channels = num_channels
|
||||
self.layer_stage_sizes = layer_stage_sizes
|
||||
self.name = name
|
||||
self.num_classes = num_classes
|
||||
self.batch_norm_use = batch_norm_use
|
||||
self.inner_layer_depth = inner_layer_depth
|
||||
self.strided_dim_reduction = strided_dim_reduction
|
||||
self.build_completed = False
|
||||
|
||||
def __call__(self, image_input, training=False, dropout_rate=0.0):
|
||||
"""
|
||||
Runs the CNN producing the predictions and the gradients.
|
||||
:param image_input: Image input to produce embeddings for. e.g. for EMNIST [batch_size, 28, 28, 1]
|
||||
:param training: A flag indicating training or evaluation
|
||||
:param dropout_rate: A tf placeholder of type tf.float32 indicating the amount of dropout applied
|
||||
:return: Embeddings of size [batch_size, self.num_classes]
|
||||
"""
|
||||
|
||||
with tf.variable_scope(self.name, reuse=self.reuse):
|
||||
layer_features = []
|
||||
with tf.variable_scope('VGGNet'):
|
||||
outputs = image_input
|
||||
for i in range(len(self.layer_stage_sizes)):
|
||||
with tf.variable_scope('conv_stage_{}'.format(i)):
|
||||
for j in range(self.inner_layer_depth):
|
||||
with tf.variable_scope('conv_{}_{}'.format(i, j)):
|
||||
if (j == self.inner_layer_depth-1) and self.strided_dim_reduction:
|
||||
stride = 2
|
||||
else:
|
||||
stride = 1
|
||||
outputs = tf.layers.conv2d(outputs, self.layer_stage_sizes[i], [3, 3],
|
||||
strides=(stride, stride),
|
||||
padding='SAME', activation=None)
|
||||
outputs = leaky_relu(outputs, name="leaky_relu{}".format(i))
|
||||
layer_features.append(outputs)
|
||||
if self.batch_norm_use:
|
||||
outputs = batch_norm(outputs, decay=0.99, scale=True,
|
||||
center=True, is_training=training, renorm=False)
|
||||
if not self.strided_dim_reduction:
|
||||
outputs = tf.layers.max_pooling2d(outputs, pool_size=(2, 2), strides=2)
|
||||
|
||||
outputs = tf.layers.dropout(outputs, rate=dropout_rate, training=training)
|
||||
# apply dropout only at dimensionality
|
||||
# reducing steps, i.e. the last layer in
|
||||
# every group
|
||||
|
||||
c_conv_encoder = outputs
|
||||
c_conv_encoder = tf.contrib.layers.flatten(c_conv_encoder)
|
||||
c_conv_encoder = tf.layers.dense(c_conv_encoder, units=self.num_classes)
|
||||
|
||||
self.reuse = True
|
||||
self.variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope=self.name)
|
||||
|
||||
if not self.build_completed:
|
||||
self.build_completed = True
|
||||
count_parameters(self.variables, "VGGNet")
|
||||
|
||||
return c_conv_encoder, layer_features
|
||||
|
||||
|
||||
class FCCLayerClassifier:
|
||||
def __init__(self, batch_size, layer_stage_sizes, name, num_classes, num_channels=1, batch_norm_use=False,
|
||||
inner_layer_depth=2, strided_dim_reduction=True):
|
||||
|
||||
"""
|
||||
Initializes a fully connected (FCC) Classifier architecture
|
||||
:param batch_size: The size of the data batch
|
||||
:param layer_stage_sizes: A list containing the filters for each layer stage, where layer stage is a series of
|
||||
convolutional layers with stride=1 and no max pooling followed by a dimensionality reducing stage which is
|
||||
either a convolution with stride=1 followed by max pooling or a convolution with stride=2
|
||||
(i.e. strided convolution). So if we pass a list [64, 128, 256] it means that if we have inner_layer_depth=2
|
||||
then stage 0 will have 2 layers with stride=1 and filter size=64 and another dimensionality reducing convolution
|
||||
with either stride=1 and max pooling or stride=2 to dimensionality reduce. Similarly for the other stages.
|
||||
:param name: Name of the network
|
||||
:param num_classes: Number of classes we will need to classify
|
||||
:param num_channels: Number of channels of our image data.
|
||||
:param batch_norm_use: Whether to use batch norm between layers or not.
|
||||
:param inner_layer_depth: The amount of extra layers on top of the dimensionality reducing stage to have per
|
||||
layer stage.
|
||||
:param strided_dim_reduction: Whether to use strided convolutions instead of max pooling.
|
||||
"""
|
||||
self.reuse = False
|
||||
self.batch_size = batch_size
|
||||
self.num_channels = num_channels
|
||||
self.layer_stage_sizes = layer_stage_sizes
|
||||
self.name = name
|
||||
self.num_classes = num_classes
|
||||
self.batch_norm_use = batch_norm_use
|
||||
self.inner_layer_depth = inner_layer_depth
|
||||
self.strided_dim_reduction = strided_dim_reduction
|
||||
self.build_completed = False
|
||||
|
||||
def __call__(self, image_input, training=False, dropout_rate=0.0):
|
||||
"""
|
||||
Runs the CNN producing the predictions and the gradients.
|
||||
:param image_input: Image input to produce embeddings for. e.g. for EMNIST [batch_size, 28, 28, 1]
|
||||
:param training: A flag indicating training or evaluation
|
||||
:param dropout_rate: A tf placeholder of type tf.float32 indicating the amount of dropout applied
|
||||
:return: Embeddings of size [batch_size, self.num_classes]
|
||||
"""
|
||||
|
||||
with tf.variable_scope(self.name, reuse=self.reuse):
|
||||
layer_features = []
|
||||
with tf.variable_scope('FCCLayerNet'):
|
||||
outputs = image_input
|
||||
for i in range(len(self.layer_stage_sizes)):
|
||||
with tf.variable_scope('conv_stage_{}'.format(i)):
|
||||
for j in range(self.inner_layer_depth):
|
||||
with tf.variable_scope('conv_{}_{}'.format(i, j)):
|
||||
outputs = tf.layers.dense(outputs, units=self.layer_stage_sizes[i])
|
||||
outputs = leaky_relu(outputs, name="leaky_relu{}".format(i))
|
||||
layer_features.append(outputs)
|
||||
if self.batch_norm_use:
|
||||
outputs = batch_norm(outputs, decay=0.99, scale=True,
|
||||
center=True, is_training=training, renorm=False)
|
||||
outputs = tf.layers.dropout(outputs, rate=dropout_rate, training=training)
|
||||
# apply dropout once per layer stage,
# i.e. after the last layer in every stage
|
||||
|
||||
c_conv_encoder = outputs
|
||||
c_conv_encoder = tf.contrib.layers.flatten(c_conv_encoder)
|
||||
c_conv_encoder = tf.layers.dense(c_conv_encoder, units=self.num_classes)
|
||||
|
||||
self.reuse = True
|
||||
self.variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope=self.name)
|
||||
|
||||
if not self.build_completed:
|
||||
self.build_completed = True
|
||||
count_parameters(self.variables, "FCCLayerNet")
|
||||
|
||||
return c_conv_encoder, layer_features
|
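A minimal, hypothetical usage sketch for the classifier classes above (not part of the committed files); the EMNIST-style input shape and the 47 classes are assumptions made for illustration only.
```python
import tensorflow as tf

# Placeholders feeding the network; shapes are assumed, not taken from the repo.
inputs = tf.placeholder(tf.float32, [50, 28, 28, 1], name='inputs')
dropout_rate = tf.placeholder(tf.float32, name='dropout_rate')
training = tf.placeholder(tf.bool, name='training')

# Build the classifier and obtain logits plus per-layer features.
net = FCCLayerClassifier(batch_size=50, layer_stage_sizes=[64, 128, 256],
                         name='fcc_classifier', num_classes=47,
                         batch_norm_use=False, inner_layer_depth=2,
                         strided_dim_reduction=False)
logits, layer_features = net(image_input=inputs, training=training,
                             dropout_rate=dropout_rate)  # logits: [50, 47]
```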
180
network_builder.py
Normal file
180
network_builder.py
Normal file
@ -0,0 +1,180 @@
|
||||
import tensorflow as tf
|
||||
|
||||
from network_architectures import VGGClassifier, FCCLayerClassifier
|
||||
|
||||
|
||||
class ClassifierNetworkGraph:
|
||||
def __init__(self, input_x, target_placeholder, dropout_rate,
|
||||
batch_size=100, num_channels=1, n_classes=100, is_training=True, augment_rotate_flag=True,
|
||||
tensorboard_use=False, use_batch_normalization=False, strided_dim_reduction=True,
|
||||
network_name='VGG_classifier'):
|
||||
|
||||
"""
|
||||
Initializes a Classifier Network Graph that can build models, train, compute losses and save summary statistics
|
||||
and images
|
||||
:param input_x: A placeholder that will feed the input images, usually of size [batch_size, height, width,
|
||||
channels]
|
||||
:param target_placeholder: A target placeholder of size [batch_size,]. The classes should be in index form
|
||||
i.e. not one hot encoding, that will be done automatically by tf
|
||||
:param dropout_rate: A placeholder of size [None] that holds a single float that defines the amount of dropout
|
||||
to apply to the network, e.g. a value of 0.1 drops 10% of the units
|
||||
:param batch_size: The batch size
|
||||
:param num_channels: Number of channels
|
||||
:param n_classes: Number of classes we will be classifying
|
||||
:param is_training: A placeholder that will indicate whether we are training or not
|
||||
:param augment_rotate_flag: A placeholder indicating whether to apply rotations augmentations to our input data
|
||||
:param tensorboard_use: Whether to use tensorboard in this experiment
|
||||
:param use_batch_normalization: Whether to use batch normalization between layers
|
||||
:param strided_dim_reduction: Whether to use strided dim reduction instead of max pooling
:param network_name: Which architecture to build, either "VGG_classifier" or "FCCClassifier"
"""
|
||||
self.batch_size = batch_size
|
||||
if network_name == "VGG_classifier":
|
||||
self.c = VGGClassifier(self.batch_size, name="classifier_neural_network",
|
||||
batch_norm_use=use_batch_normalization, num_channels=num_channels,
|
||||
num_classes=n_classes, layer_stage_sizes=[64, 128, 256],
|
||||
strided_dim_reduction=strided_dim_reduction)
|
||||
elif network_name == "FCCClassifier":
|
||||
self.c = FCCLayerClassifier(self.batch_size, name="classifier_neural_network",
|
||||
batch_norm_use=use_batch_normalization, num_channels=num_channels,
|
||||
num_classes=n_classes, layer_stage_sizes=[64, 128, 256],
|
||||
strided_dim_reduction=strided_dim_reduction)
|
||||
|
||||
self.input_x = input_x
|
||||
self.dropout_rate = dropout_rate
|
||||
self.targets = target_placeholder
|
||||
|
||||
self.training_phase = is_training
|
||||
self.n_classes = n_classes
|
||||
self.iterations_trained = 0
|
||||
|
||||
self.augment_rotate = augment_rotate_flag
|
||||
self.is_tensorboard = tensorboard_use
|
||||
self.strided_dim_reduction = strided_dim_reduction
|
||||
self.use_batch_normalization = use_batch_normalization
|
||||
|
||||
def loss(self):
|
||||
"""build models, calculates losses, saves summary statistcs and images.
|
||||
Returns:
|
||||
dict of losses.
|
||||
"""
|
||||
with tf.name_scope("losses"):
|
||||
image_inputs = self.data_augment_batch(self.input_x) # conditionally apply augmentations
|
||||
true_outputs = self.targets
|
||||
# produce predictions and get layer features to save for visual inspection
|
||||
preds, layer_features = self.c(image_input=image_inputs, training=self.training_phase,
|
||||
dropout_rate=self.dropout_rate)
|
||||
# compute loss and accuracy
|
||||
correct_prediction = tf.equal(tf.argmax(preds, 1), tf.cast(true_outputs, tf.int64))
|
||||
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
|
||||
crossentropy_loss = tf.reduce_mean(
|
||||
tf.nn.sparse_softmax_cross_entropy_with_logits(labels=true_outputs, logits=preds))
|
||||
|
||||
# add loss and accuracy to collections
|
||||
tf.add_to_collection('crossentropy_losses', crossentropy_loss)
|
||||
tf.add_to_collection('accuracy', accuracy)
|
||||
|
||||
# save summaries for the losses, accuracy and image summaries for input images, augmented images
|
||||
# and the layer features
|
||||
if len(self.input_x.get_shape().as_list()) == 4:
|
||||
self.save_features(name="VGG_features", features=layer_features)
|
||||
tf.summary.image('image', [tf.concat(tf.unstack(self.input_x, axis=0), axis=0)])
|
||||
tf.summary.image('augmented_image', [tf.concat(tf.unstack(image_inputs, axis=0), axis=0)])
|
||||
tf.summary.scalar('crossentropy_losses', crossentropy_loss)
|
||||
tf.summary.scalar('accuracy', accuracy)
|
||||
|
||||
return {"crossentropy_losses": tf.add_n(tf.get_collection('crossentropy_losses'),
|
||||
name='total_classification_loss'),
|
||||
"accuracy": tf.add_n(tf.get_collection('accuracy'), name='total_accuracy')}
|
||||
|
||||
def save_features(self, name, features, num_rows_in_grid=4):
|
||||
"""
|
||||
Saves layer features in a grid to be used in tensorboard
|
||||
:param name: Features name
|
||||
:param features: A list of feature tensors
|
||||
"""
|
||||
for i in range(len(features)):
|
||||
shape_in = features[i].get_shape().as_list()
|
||||
channels = shape_in[3]
|
||||
y_channels = num_rows_in_grid
|
||||
x_channels = int(channels / y_channels)
|
||||
|
||||
activations_features = tf.reshape(features[i], shape=(shape_in[0], shape_in[1], shape_in[2],
|
||||
y_channels, x_channels))
|
||||
|
||||
activations_features = tf.unstack(activations_features, axis=4)
|
||||
activations_features = tf.concat(activations_features, axis=2)
|
||||
activations_features = tf.unstack(activations_features, axis=3)
|
||||
activations_features = tf.concat(activations_features, axis=1)
|
||||
activations_features = tf.expand_dims(activations_features, axis=3)
|
||||
tf.summary.image('{}_{}'.format(name, i), activations_features)
|
||||
|
||||
def rotate_image(self, image):
|
||||
"""
|
||||
Rotates a single image
|
||||
:param image: An image to rotate
|
||||
:return: A rotated or a non rotated image depending on the result of the flip
|
||||
"""
|
||||
no_rotation_flip = tf.unstack(
|
||||
tf.random_uniform([1], minval=1, maxval=100, dtype=tf.int32, seed=None,
|
||||
name=None)) # get a random integer between 1 and 99 (maxval is exclusive)
|
||||
flip_boolean = tf.less_equal(no_rotation_flip[0], 50)
|
||||
# if that number is less than or equal to 50 then set to true
|
||||
random_variable = tf.unstack(tf.random_uniform([1], minval=1, maxval=3, dtype=tf.int32, seed=None, name=None))
|
||||
# get a random integer in {1, 2} (maxval is exclusive) for how many 90 degree rotations to apply,
# i.e. k=1 means 90 degrees, k=2 means 180 degrees
|
||||
image = tf.cond(flip_boolean, lambda: tf.image.rot90(image, k=random_variable[0]),
|
||||
lambda: image) # if flip_boolean is true then rotate, if not then do not rotate
|
||||
return image
|
||||
|
||||
def rotate_batch(self, batch_images):
|
||||
"""
|
||||
Rotate a batch of images
|
||||
:param batch_images: A batch of images
|
||||
:return: A rotated batch of images (some images will not be rotated if their rotation flip ends up False)
|
||||
"""
|
||||
shapes = map(int, list(batch_images.get_shape()))
|
||||
if len(list(batch_images.get_shape())) < 4:
|
||||
return batch_images
|
||||
batch_size, x, y, c = shapes
|
||||
with tf.name_scope('augment'):
|
||||
batch_images_unpacked = tf.unstack(batch_images)
|
||||
new_images = []
|
||||
for image in batch_images_unpacked:
|
||||
new_images.append(self.rotate_image(image))
|
||||
new_images = tf.stack(new_images)
|
||||
new_images = tf.reshape(new_images, (batch_size, x, y, c))
|
||||
return new_images
|
||||
|
||||
def data_augment_batch(self, batch_images):
|
||||
"""
|
||||
Augments data with a variety of augmentations, in the current state only does rotations.
|
||||
:param batch_images: A batch of images to augment
|
||||
:return: Augmented data
|
||||
"""
|
||||
batch_images = tf.cond(self.augment_rotate, lambda: self.rotate_batch(batch_images), lambda: batch_images)
|
||||
return batch_images
|
||||
|
||||
def train(self, losses, learning_rate=1e-3, beta1=0.9):
|
||||
"""
|
||||
Args:
|
||||
losses: dict of loss tensors, as returned by loss().
|
||||
Returns:
|
||||
train op.
|
||||
"""
|
||||
c_opt = tf.train.AdamOptimizer(beta1=beta1, learning_rate=learning_rate)
|
||||
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS) # Needed for correct batch norm usage
|
||||
with tf.control_dependencies(update_ops):
|
||||
c_error_opt_op = c_opt.minimize(losses["crossentropy_losses"], var_list=self.c.variables,
|
||||
colocate_gradients_with_ops=True)
|
||||
|
||||
return c_error_opt_op
|
||||
|
||||
def init_train(self):
|
||||
"""
|
||||
Builds graph ops and returns them
|
||||
:return: Summary, losses and training ops
|
||||
"""
|
||||
losses_ops = self.loss()
|
||||
c_error_opt_op = self.train(losses_ops)
|
||||
summary_op = tf.summary.merge_all()
|
||||
return summary_op, losses_ops, c_error_opt_op
|
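A hedged sketch of how `ClassifierNetworkGraph` and the ops returned by `init_train` might be wired into a session; the placeholder names, shapes and the dummy batch are illustrative assumptions, not taken from `network_trainer.py`.
```python
import numpy as np
import tensorflow as tf

# Placeholders matching the constructor's expectations (index labels, not one-hot).
input_x = tf.placeholder(tf.float32, [100, 28, 28, 1], name='input_x')
targets = tf.placeholder(tf.int32, [100], name='targets')
dropout_rate = tf.placeholder(tf.float32, name='dropout_rate')
is_training = tf.placeholder(tf.bool, name='is_training')
augment = tf.placeholder(tf.bool, name='augment_rotate')

graph_builder = ClassifierNetworkGraph(input_x=input_x, target_placeholder=targets,
                                       dropout_rate=dropout_rate, batch_size=100,
                                       num_channels=1, n_classes=47,
                                       is_training=is_training,
                                       augment_rotate_flag=augment,
                                       network_name='VGG_classifier')
summary_op, losses_ops, train_op = graph_builder.init_train()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # one illustrative update on a dummy batch
    feed = {input_x: np.zeros((100, 28, 28, 1), np.float32),
            targets: np.zeros((100,), np.int32),
            dropout_rate: 0.5, is_training: True, augment: False}
    _, losses = sess.run([train_op, losses_ops], feed_dict=feed)
```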
@ -0,0 +1,557 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Introduction to TensorFlow\n",
|
||||
"\n",
|
||||
"## Computation graphs\n",
|
||||
"\n",
|
||||
"In the first semester we used the NumPy-based `mlp` Python package to illustrate the concepts involved in automatically propagating gradients through multiple-layer neural network models. We also looked at how to use these calculated derivatives to do gradient-descent based training of models in supervised learning tasks such as classification and regression.\n",
|
||||
"\n",
|
||||
"A key theme in the first semester's work was the idea of defining models in a modular fashion. There we considered models composed of a sequence of *layer* modules, the output of each of which fed into the input of the next in the sequence and each applying a transformation to map inputs to outputs. By defining a standard interface to layer objects with each defining a `fprop` method to *forward propagate* inputs to outputs, and a `bprop` method to *back propagate* gradients with respect to the output of the layer to gradients with respect to the input of the layer, the layer modules could be composed together arbitarily and activations and gradients forward and back propagated through the whole stack respectively.\n",
|
||||
"\n",
|
||||
"<div style='margin: auto; text-align: center; padding-top: 1em;'>\n",
|
||||
" <img style='margin-bottom: 1em;' src='res/pipeline-graph.png' width='30%' />\n",
|
||||
" <i>'Pipeline' model composed of sequence of single input, single output layer modules</i>\n",
|
||||
"</div>\n",
|
||||
"\n",
|
||||
"By construction a layer was defined as an object with a single array input and single array output. This is a natural fit for the architectures of standard feedforward networks which can be thought of a single pipeline of transformations from user provided input data to predicted outputs as illustrated in the figure above. \n",
|
||||
"\n",
|
||||
"<div style='margin: auto; text-align: center; padding-top: 1em;'>\n",
|
||||
" <img style='display: inline-block; padding-right: 2em; margin-bottom: 1em;' src='res/rnn-graph.png' width='30%' />\n",
|
||||
" <img style='display: inline-block; padding-left: 2em; margin-bottom: 1em;' src='res/skip-connection-graph.png' width='30%' /> <br />\n",
|
||||
" <i>Models which fit less well into pipeline structure: left, a sequence-to-sequence recurrent network; right, a feed forward network with skip connections.</i>\n",
|
||||
"</div>\n",
|
||||
"\n",
|
||||
"Towards the end of last semester however we encountered several models which do not fit so well in to this pipeline-like structure. For instance (unrolled) recurrent neural networks tend to have inputs feeding in to and outputs feeding out from multiple points along a deep feedforward model corresponding to the updates of the hidden recurrent state, as illustrated in the left panel in the figure above. It is not trivial to see how to map this structure to our layer based pipeline. Similarly models with skip connections between layers as illustrated in the right panel of the above figure also do not fit particularly well in to a pipeline structure.\n",
|
||||
"\n",
|
||||
"Ideally we would like to be able to compose modular components in more general structures than the pipeline structure we have being using so far. In particular it turns out to be useful to be able to deal with models which have structures defined by arbitrary [*directed acyclic graphs*](https://en.wikipedia.org/wiki/Directed_acyclic_graph) (DAGs), that is graphs connected by directed edges and without any directed cycles. Both the recurrent network and skip-connections examples can be naturally expressed as DAGs as well many other model structures.\n",
|
||||
"\n",
|
||||
"When working with these more general graphical structures, rather than considering a graph made up of layer modules, it often more useful to consider lower level mathematical operations or *ops* that make up the computation as the fundamental building block. A DAG composed of ops is often termed a *computation graph*. THis terminolgy was covered briefly in [lecture 6](http://www.inf.ed.ac.uk/teaching/courses/mlp/2017-18/mlp06-enc.pdf), and also in the [MLPR course](http://www.inf.ed.ac.uk/teaching/courses/mlpr/2016/notes/w5a_backprop.html). The backpropagation rules we used to propagate gradients through a stack of layer modules can be naturally generalised to apply to computation graphs, with this method of applying the chain rule to automatically propagate gradients backwards through a general computation graph also sometimes termed [*reverse-mode automatic differentiation*](https://en.wikipedia.org/wiki/Automatic_differentiation#Reverse_accumulation).\n",
|
||||
"\n",
|
||||
"<div style='margin: auto; text-align: center; padding-top: 1em;'>\n",
|
||||
" <img style='margin-bottom: 1em;' src='res/affine-transform-graph.png' width='40%' />\n",
|
||||
" <i>Computation / data flow graph for an affine transformation $\\boldsymbol{y} = \\mathbf{W}\\boldsymbol{x} + \\boldsymbol{b}$</i>\n",
|
||||
"</div>\n",
|
||||
"\n",
|
||||
"The figure above shows a very simple computation graph corresponding to the mathematical expression $\\boldsymbol{y} = \\mathbf{W}\\boldsymbol{x} + \\boldsymbol{b}$, i.e. the affine transformation we encountered last semester. Here the nodes of the graph are operations and the edges the vector or matrix values passed between operations. The opposite convention with nodes as values and edges as operations is also sometimes used. Note that just like there was ambiguity about what to define as a layer (as discussed previously at beginning of the [third lab notebook](03_Multiple_layer_models.ipynb), there are a range of choices for the level of abstraction to use in the op nodes in a computational graph. For instance, we could also have chosen to express the above computational graph with a single `AffineTransform` op node with three inputs (one matrix, two vector) and one vector output. Equally we might choose to express the `MatMul` op in terms of the underlying individual scalar addition and multiplication operations. What to consider an operation is therefore somewhat a matter of choice and what is convenient in a particular setting.\n",
|
||||
"\n",
|
||||
"## TensorFlow\n",
|
||||
"\n",
|
||||
"To allow us to work with models defined by more general computation graphs and to avoid the need to write `fprop` and `bprop` methods for each new model component we want to try out, this semester we will be using the open-source computation graph framework [TensorFlow](https://www.tensorflow.org/), originally developed by the Google Brain team:\n",
|
||||
"\n",
|
||||
"> TensorFlow™ is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs \n",
|
||||
"in a desktop, server, or mobile device with a single API.\n",
|
||||
"\n",
|
||||
"TensorFlow allows complex computation graphs (also known as data flow graphs in TensorFlow parlance) to be defined via a Python interface, with efficient C++ implementations for running the corresponding operations on different devices. TensorFlow also includes tools for automatic gradient computation and a large and growing suite of pre-define operations useful for gradient-based training of machine learning models.\n",
|
||||
"\n",
|
||||
"In this notebook we will introduce some of the basic elements of constructing, training and evaluating models with TensorFlow. This will use similar material to some of the [official TensorFlow tutorials](https://www.tensorflow.org/tutorials/) but with an additional emphasis of making links to the material covered in this course last semester. For those who have not used a computational graph framework such as TensorFlow or Theano before you may find the [basic usage tutorial](https://www.tensorflow.org/get_started/basic_usage) useful to go through.\n",
|
||||
"\n",
|
||||
"### Installing TensorFlow\n",
|
||||
"\n",
|
||||
"To install TensorFlow, open a terminal, activate your Conda `mlp` environment using\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"source activate mlp\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"and then run\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"pip install tensorflow # for CPU users\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"pip install tensorflow_gpu # for GPU users\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"This should locally install the stable release version of TensorFlow (currently 1.4.1) in your Conda environment. After installing TensorFlow you may need to restart the kernel in the notebook to allow it to be imported."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Exercise 1: EMNIST softmax regression\n",
|
||||
"\n",
|
||||
"As a first example we will train a simple softmax regression model to classify handwritten digit images from the EMNIST data set encountered last semester (for those fed up of working with EMNIST - don't worry you will soon be moving on to other datasets!). This is equivalent to the model implemented in the first exercise of the third lab notebook. We will walk through constructing an equivalent model in TensorFlow and explain new TensorFlow model concepts as we use them. You should run each cell as you progress through the exercise.\n",
|
||||
"\n",
|
||||
"Similarly to the common convention of importing NumPy under the shortform alias `np` it is common to import the Python TensorFlow top-level module under the alias `tf`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"/home/antreas/anaconda2/envs/mlp/lib/python3.6/importlib/_bootstrap.py:205: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6\n",
|
||||
" return f(*args, **kwds)\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import tensorflow as tf\n",
|
||||
"import sys\n",
|
||||
"sys.path.append(\"..\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We begin by defining [*placeholder*](https://www.tensorflow.org/api_docs/python/io_ops/placeholders) objects for the data inputs and targets arrays. These are nodes in the computation graph to which we will later *feed* in external data, such as batches of training set inputs and targets. This abstraction allows us to reuse the same computation graph for different data inputs - we can think of placeholders as acting equivalently to the arguments of a function. It is actually possible to feed data into any node in a TensorFlow graph however the advantage of using a placeholder is that is *must* always have a value fed into it (an exception will be raised if a value isn't provided) and no arbitrary alternative values needs to be entered.\n",
|
||||
"\n",
|
||||
"The `tf.placeholder` function has three arguments:\n",
|
||||
"\n",
|
||||
" * `dtype` : The [TensorFlow datatype](https://www.tensorflow.org/api_docs/python/framework/tensor_types) for the tensor e.g. `tf.float32` for single-precision floating point values.\n",
|
||||
" * `shape` (optional) : An iterable defining the shape (size of each dimension) of the tensor e.g. `shape=(5, 2)` would indicate a 2D tensor (matrix) with first dimension of size 5 and second dimension of size 2. An entry of `None` in the shape definition corresponds to the corresponding dimension size being left unspecified, so for example `shape=(None, 28, 28)` would allow any 3D inputs with final two dimensions of size 28 to be inputted.\n",
|
||||
" * `name` (optional): String argument defining a name for the tensor which can be useful when visualising a computation graph and for debugging purposes.\n",
|
||||
" \n",
|
||||
"As we will generally be working with batches of datapoints, both the `inputs` and `targets` will be 2D tensors with the first dimension corresponding to the batch size (set as `None` here to allow it to specified later) and the second dimension corresponding to the size of each input or output vector. As in the previous semester's work we will use a 1-of-K encoding for the class targets so for EMNIST each output corresponds to a vector of length 47 (number of digit/letter classes)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"inputs = tf.placeholder(tf.float32, [None, 784], 'inputs')\n",
|
||||
"targets = tf.placeholder(tf.float32, [None, 47], 'targets')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We now define [*variable*](https://www.tensorflow.org/api_docs/python/state_ops/variables) objects for the model parameters. Variables are stateful tensors in the computation graph - they have to be explicitly initialised and their internal values can be updated as part of the operations in a graph e.g. gradient updates to model parameter during training. They can also be saved to disk and pre-saved values restored in to a graph at a later time.\n",
|
||||
"\n",
|
||||
"The `tf.Variable` constructor takes an `initial_value` as its first argument; this should be a TensorFlow tensor which specifies the initial value to assign to the variable, often a constant tensor such as all zeros, or random samples from a distribution."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"weights = tf.Variable(tf.zeros([784, 47]))\n",
|
||||
"biases = tf.Variable(tf.zeros([47]))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We now build the computation graph corresponding to producing the predicted outputs of the model (log unnormalised class probabilities) given the data inputs and model parameters. We use the TensorFlow [`matmul`](https://www.tensorflow.org/api_docs/python/math_ops/matrix_math_functions#matmul) op to compute the matrix-matrix product between the 2D array of input vectors and the weight matrix parameter variable. TensorFlow [overloads all of the common arithmetic operators](http://stackoverflow.com/a/35095052) for tensor objects so `x + y` where at least one of `x` or `y` is a tensor instance (both `tf.placeholder` and `tf.Variable` return (sub-classes) of `tf.Tensor`) corresponds to the TensorFlow elementwise addition op `tf.add`. Further elementwise binary arithmetic operators like addition follow NumPy style [broadcasting](https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html), so in the expression below the `+ biases` sub-expression will correspond to creating an operation in the computation graph which adds the bias vector to each of the rows of the 2D tensor output of the `matmul` op."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"outputs = tf.matmul(inputs, weights) + biases"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"While we could have defined `outputs` as the softmax of the expression above to produce normalised class probabilities as the outputs of the model, as discussed last semester when using a softmax output combined with a cross-entropy error function it usually desirable from a numerical stability and efficiency perspective to wrap the softmax computation in to the error computation (as done in the `CrossEntropySoftmaxError` class in our `mlp` framework). \n",
|
||||
"\n",
|
||||
"In TensorFlow this can be achieved with the `softmax_cross_entropy_with_logits` op which is part of the `tf.nn` submodule which contains a number of ops specifically for neural network type models. This op takes as its first input log unnormalised class probabilities (sometimes termed logits) and as second input the class label targets which should be of the same dimension as the first input. By default the last dimension of the input tensors is assumed to correspond to the class dimension - this can be altered via an optional `dim` argument.\n",
|
||||
"\n",
|
||||
"The output of the `softmax_cross_entropy_with_logits` op here is a 1D tensor with a cross-entropy error value for each data point in the batch. We wish to minimise the mean cross-entropy error across the full dataset and will use the mean of the error on the batch as a stochastic estimator of this value. In TensorFlow ops which *reduce* a tensor along a dimension(s), for example by taking a sum, mean, or product, are prefixed with `reduce`, with the default behaviour being to perform the reduction across all dimensions of the input tensor and return a scalar output. Therefore the second line below will take the per data point cross-entropy errors and produce a single mean value across the whole batch."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"per_datapoint_errors = tf.nn.softmax_cross_entropy_with_logits(logits=outputs, labels=targets)\n",
|
||||
"error = tf.reduce_mean(per_datapoint_errors)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Although for the purposes of training we will use the cross-entropy error as this is differentiable, for evaluation we will also be interested in the classification accuracy i.e. what proportion of all of the predicted classes correspond to the true target label. We can calculate this in TensorFlow similarly to how we used NumPy to do this previously - we use the TensorFlow `tf.argmax` op to find the index of along the class dimension corresponding to the maximum predicted class probability and check if this is equal to the index along the class dimension of the 1-of-$k$ encoded target labels. Analagously to the error computation above, this computes per-datapoint values which we then need to average across with a `reduce_mean` op to produce the classification accuracy for a batch."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"per_datapoint_pred_is_correct = tf.equal(tf.argmax(outputs, 1), tf.argmax(targets, 1))\n",
|
||||
"accuracy = tf.reduce_mean(tf.cast(per_datapoint_pred_is_correct, tf.float32))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As mentioned previously TensorFlow is able to automatically calculate gradients of scalar computation graph outputs with respect to tensors in the computation graph. We can explicitly construct a new sub-graph corresponding to the gradient of a scalar with respect to one or more tensors in the graph using the [`tf.gradients`](https://www.tensorflow.org/api_docs/python/train/gradient_computation) function. \n",
|
||||
"\n",
|
||||
"TensorFlow also however includes a number of higher-level `Optimizer` classes in the `tf.train` module that internally deal with constructing graphs corresponding to the gradients of some scalar loss with respect to one or more `Variable` tensors in the graph (usually corresponding to model parameters) and then using these gradients to update the variables (roughly equivalent to the `LearningRule` classes in the `mlp` framework). The most basic `Optimizer` instance is the `GradientDescentOptimizer` which simply adds operations corresponding to basic (stochastic) gradient descent to the graph (i.e. no momentum, adaptive learning rates etc.). The `__init__` constructor method for this class takes one argument `learning_rate` corresponding to the gradient descent learning rate / step size encountered previously.\n",
|
||||
"\n",
|
||||
"Usually we are not interested in the `Optimizer` object other than in adding operations in the graph corresponding to the optimisation steps. This can be achieved using the `minimize` method of the object which takes as first argument the tensor object corresponding to the scalar loss / error to be minimized. A further optional keyword argument `var_list` can be used to specify a list of variables to compute the gradients of the loss with respect to and update; by default this is set to `None` which indicates to use all trainable variables in the current graph. The `minimize` method returns an operation corresponding to applying the gradient updates to the variables - we need to store a reference to this to allow us to run these operations later. Note we do not need to store a reference to the optimizer as we have no further need of this object hence commonly the steps of constructing the `Optimizer` and calling `minimize` are commonly all applied in a single line as below."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"train_step = tf.train.GradientDescentOptimizer(learning_rate=0.5).minimize(error)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We have now constructed a computation graph which can compute predicted outputs, use these to calculate an error value (and accuracy) and use the gradients of the error with respect to the model parameter variables to update their values with a gradient descent step.\n",
|
||||
"\n",
|
||||
"Although we have defined our computation graph, we have not yet initialised any tensor data in memory - all of the tensor variables defined above are just symbolic representations of parts of the computation graph. We can think of the computation graph as a whole as being similar to a function - it defines a sequence of operations but does not directly run those operations on data itself.\n",
|
||||
"\n",
|
||||
"To run the operations in (part of) a TensorFlow graph we need to create a [`Session`](https://www.tensorflow.org/api_docs/python/client/session_management) object:\n",
|
||||
"\n",
|
||||
"> A `Session` object encapsulates the environment in which `Operation` objects are executed, and `Tensor` objects are evaluated.\n",
|
||||
"\n",
|
||||
"A session object can be constructed using either `tf.Session()` or `tf.InteractiveSession()`. The only difference in the latter is that it installs itself as the default session on construction. This can be useful in interactive contexts such as shells or the notebook interface in which an alternative to running a graph operation using the session `run` method (see below) is to call the `eval` method of an operation e.g. `op.eval()`; generally a session in which the op runs needs to be passed to `eval`; however if an interactive session is used, then this is set as a default to use in `eval` calls."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"sess = tf.InteractiveSession()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The key property of a session object is its `run` method. This takes an operation (or list of operations) in a defined graph as an argument and runs the parts of the computation graph necessary to evaluate the output(s) (if any) of the operation(s), and additionally performs any updates to variables states defined by the graph (e.g. gradient updates of parameters). The output values if any of the operation(s) are returned by the `run` call.\n",
|
||||
"\n",
|
||||
"A standard operation which needs to be called before any other operations on a graph which includes variable nodes is a variable *initializer* operation. This, as the name suggests, initialises the values of the variables in the session to the values defined by the `initial_value` argument when adding the variables to the graph. For instance for the graph we have defined here this will initialise the `weights` variable value in the session to a 2D array of zeros of shape `(784, 10)` and the `biases` variable to a 1D array of shape `(10,)`.\n",
|
||||
"\n",
|
||||
"We can access initializer ops for each variable individually using the `initializer` property of the variables in question and then individually run these, however a common pattern is to use the `tf.global_variables_initializer()` function to create a single initializer op which will initialise all globally defined variables in the default graph and then run this as done below."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"init_op = tf.global_variables_initializer()\n",
|
||||
"sess.run(init_op)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We are now almost ready to begin training our defined model, however as a final step we need to create objects for accessing batches of EMNIST input and target data. In the tutorial code provided in `tf.examples.tutorials.mnist` there is an `input_data` sub-module which provides a `read_data_sets` function for downloading the MNIST data and constructing an object for iterating over MNIST data. However in the `mlp` package we already have the MNIST and EMNIST data provider classes that we used extensively last semester, and corresponding local copies of the MNIST and EMNIST data, so we will use that here as it provides all the necessary functionality."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import data_providers as data_providers\n",
|
||||
"train_data = data_providers.EMNISTDataProvider('train', batch_size=50, flatten=True, one_hot=True)\n",
|
||||
"valid_data = data_providers.EMNISTDataProvider('valid', batch_size=50, flatten=True, one_hot=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We are now all set to train our model. As when training models last semester, the training procedure will involve two nested loops - an outer loop corresponding to multiple full-passes through the dataset or *epochs* and an inner loop iterating over individual batches in the training data.\n",
|
||||
"\n",
|
||||
"The `init_op` we ran with `sess.run` previously did not depend on the placeholders `inputs` and `target` in our graph, so we simply ran it with `sess.run(init_op)`. The `train_step` operation corresponding to the gradient based updates of the `weights` and `biases` parameter variables does however depend on the `inputs` and `targets` placeholders and so we need to specify values to *feed* into these placeholders; as we wish the gradient updates to be calculated using the gradients with respect to a batch of inputs and targets, the values that we feed in are the input and target batches. This is specified using the keyword `feed_dict` argument to the session `run` method. As the name suggests this should be a Python dictionary (`dict`) with keys corresponding to references to the tensors in the graph to feed values in to and values the corresponding array values to feed in (typically NumPy `ndarray` instances) - here we have `feed_dict = {inputs: input_batch, targets: target_batch}`.\n",
|
||||
"\n",
|
||||
"Another difference in our use of the session `run` method below is that we call it with a list of two operations - `[train_step, error]` rather than just a single operation. This allows the output (and variable updates) of multiple operations in a graph to be evaluated together - here we both run the `train_step` operation to update the parameter values and evaluate the `error` operation to return the mean error on the batch. Although we could split this into two separate session `run` calls, as the operations calculating the batch error will need to be evaluated when running the `train_step` operation (as this is the value gradients are calculated with respect to) this would involve redoing some of the computation and so be less efficient than combining them in a single `run` call.\n",
|
||||
"\n",
|
||||
"As we are running two different operations, the `run` method returns two values here. The `train_step` operation has no outputs and so the first return value is `None` - in the code below we assign this to `_`, this being a common convention in Python code for assigning return values we are not interested in using. The second return value is the average error across the batch which we assign to `batch_error` and use to keep a running average of the dataset error across the epochs."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"End of epoch 1: running error average = 1.40\n",
|
||||
"End of epoch 2: running error average = 1.25\n",
|
||||
"End of epoch 3: running error average = 1.22\n",
|
||||
"End of epoch 4: running error average = 1.20\n",
|
||||
"End of epoch 5: running error average = 1.19\n",
|
||||
"End of epoch 6: running error average = 1.18\n",
|
||||
"End of epoch 7: running error average = 1.18\n",
|
||||
"End of epoch 8: running error average = 1.17\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"num_epoch = 20\n",
|
||||
"for e in range(num_epoch):\n",
|
||||
" running_error = 0.\n",
|
||||
" for input_batch, target_batch in train_data:\n",
|
||||
" _, batch_error = sess.run(\n",
|
||||
" [train_step, error], \n",
|
||||
" feed_dict={inputs: input_batch, targets: target_batch})\n",
|
||||
" running_error += batch_error\n",
|
||||
" running_error /= train_data.num_batches\n",
|
||||
" print('End of epoch {0}: running error average = {1:.2f}'.format(e + 1, running_error))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"To check your understanding of using sessions objects to evaluate parts of a graph and feeding values in to a graph, complete the definition of the function in the cell below. This should iterate across all batches in a provided data provider and calculate the error and classification accuracy for each, accumulating the average error and accuracy values across the whole dataset and returning these as a tuple."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 23,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def get_error_and_accuracy(data):\n",
|
||||
" \"\"\"Calculate average error and classification accuracy across a dataset.\n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" data: Data provider which iterates over input-target batches in dataset.\n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" Tuple with first element scalar value corresponding to average error\n",
|
||||
" across all batches in dataset and second value corresponding to\n",
|
||||
" average classification accuracy across all batches in dataset.\n",
|
||||
" \"\"\"\n",
|
||||
" err = 0\n",
|
||||
" acc = 0\n",
|
||||
" for input_batch, target_batch in data:\n",
|
||||
" err += sess.run(error, feed_dict={inputs: input_batch, targets: target_batch})\n",
|
||||
" acc += sess.run(accuracy, feed_dict={inputs: input_batch, targets: target_batch})\n",
|
||||
" err /= data.num_batches\n",
|
||||
" acc /= data.num_batches\n",
|
||||
" return err, acc"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Test your implementation by running the cell below - this should print the error and accuracy of the trained model on the validation and training datasets if implemented correctly."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 24,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Train data: Error=1.14 Accuracy=0.69\n",
|
||||
"Valid data: Error=1.29 Accuracy=0.66\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print('Train data: Error={0:.2f} Accuracy={1:.2f}'\n",
|
||||
" .format(*get_error_and_accuracy(train_data)))\n",
|
||||
"print('Valid data: Error={0:.2f} Accuracy={1:.2f}'\n",
|
||||
" .format(*get_error_and_accuracy(valid_data)))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Exercise 2: Explicit graphs, name scopes, summaries and TensorBoard\n",
|
||||
"\n",
|
||||
"In the exercise above we introduced most of the basic concepts needed for constructing graphs in TensorFlow and running graph operations. In an attempt to avoid introducing too many new terms and syntax at once however we skipped over some of the non-essential elements of creating and running models in TensorFlow, in particular some of the provided functionality for organising and structuring the computation graphs created and for monitoring the progress of training runs.\n",
|
||||
"\n",
|
||||
"Now that you are hopefully more familiar with the basics of TensorFlow we will introduce some of these features as they are likely to provide useful when you are building and training more complex models in the rest of this semester.\n",
|
||||
"\n",
|
||||
"Although we started off by motivating TensorFlow as a framework which builds computation graphs, in the code above we never explicitly referenced a graph object. This is because TensorFlow always registers a default graph at start up and all operations are added to this graph by default. The default graph can be accessed using `tf.get_default_graph()`. For example running the code in the cell below will assign a reference to the default graph to `default_graph` and print the total number of operations in the current graph definition."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 25,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Number of operations in graph: 198\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"default_graph = tf.get_default_graph()\n",
|
||||
"print('Number of operations in graph: {0}'\n",
|
||||
" .format(len(default_graph.get_operations())))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can also explicitly create a new graph object using `tf.Graph()`. This may be useful if we wish to build up several independent computation graphs."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 26,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"graph = tf.Graph()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"To add operations to a constructed graph object, we use the `graph.as_default()` [context manager](http://book.pythontips.com/en/latest/context_managers.html). Context managers are used with the `with` statement in Python - `with context_manager:` opens a block in Python in which a special `__enter__` method of the `context_manager` object is called before the code in the block is run and a further special `__exit__` method is run after the block code has finished execution. This can be used to for example manage allocation of resources (e.g. file handles) but also to locally change some 'context' in the code - in the example here, `graph.as_default()` is a context manager which changes the default graph within the following block to be `graph` before returning to the previous default graph once the block code is finished running. Context managers are used extensively in TensorFlow so it is worth being familiar with how they work.\n",
|
||||
"\n",
|
||||
"Another common context manager usage in TensorFlow is to define *name scopes*. As we encountered earlier, individual operations in a TensorFlow graph can be assigned names. As we will see later this is useful for making graphs interpretable when we use the tools provided in TensorFlow for visualising them. As computation graphs can become very big (even the quite simple graph we created in the first exercise has around 100 operations in it) even with interpretable names attached to the graph operations it can still be difficult to understand and debug what is happening in a graph. Therefore rather than simply allowing a single-level naming scheme to be applied to the individual operations in the graph, TensorFlow supports hierachical naming of sub-graphs. This allows sets of related operations to be grouped together under a common name, and thus allows both higher and lower level structure in a graph to be easily identified.\n",
|
||||
"\n",
|
||||
"This hierarchical naming is performed by using the name scope context manager `tf.name_scope('name')`. Starting a block `with tf.name_scope('name'):`, will cause all the of the operations added to a graph within that block to be grouped under the name specified in the `tf.name_scope` call. Name scope blocks can be nested to allow finer-grained sub-groupings of operations. Name scopes can be used to group operations at various levels e.g. operations corresponding to inference/prediction versus training, grouping operations which correspond to the classical definition of a neural network layer etc.\n",
|
||||
"\n",
|
||||
"The code in the cell below uses both a `graph.as_default()` context manager and name scopes to create a second copy of the computation graph corresponding to softmax regression that we constructed in the previous exercise."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 28,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"with graph.as_default():\n",
|
||||
" with tf.name_scope('data'):\n",
|
||||
" inputs = tf.placeholder(tf.float32, [None, 784], name='inputs')\n",
|
||||
" targets = tf.placeholder(tf.float32, [None, 47], name='targets')\n",
|
||||
" with tf.name_scope('parameters'):\n",
|
||||
" weights = tf.Variable(tf.zeros([784, 47]), name='weights')\n",
|
||||
" biases = tf.Variable(tf.zeros([47]), name='biases')\n",
|
||||
" with tf.name_scope('model'):\n",
|
||||
" outputs = tf.matmul(inputs, weights) + biases\n",
|
||||
" with tf.name_scope('error'):\n",
|
||||
" error = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=outputs, labels=targets))\n",
|
||||
" with tf.name_scope('train'):\n",
|
||||
" train_step = tf.train.GradientDescentOptimizer(0.5).minimize(error)\n",
|
||||
" with tf.name_scope('accuracy'):\n",
|
||||
" accuracy = tf.reduce_mean(tf.cast(\n",
|
||||
" tf.equal(tf.argmax(outputs, 1), tf.argmax(targets, 1)), tf.float32))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As hinted earlier TensorFlow comes with tools for visualising computation graphs. In particular [TensorBoard](https://www.tensorflow.org/how_tos/summaries_and_tensorboard/) is an interactive web application for amongst other things visualising TensorFlow computation graphs (we will explore some of its other functionality in the latter part of the exercise). Typically TensorBoard in launched from a terminal and a browser used to connect to the resulting locally running TensorBoard server instance. However for the purposes of graph visualisation it is also possible to embed a remotely-served TensorBoard graph visualisation interface in a Jupyter notebook using the helper function below (a slight variant of the recipe in [this notebook](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/deepdream/deepdream.ipynb)).\n",
|
||||
"\n",
|
||||
"<span style='color: red; font-weight: bold;'>Note: The code below seems to not work for some people when accessing the notebook in Firefox. You can either try loading the notebook in an alternative browser, or just skip this section for now and explore the graph visualisation tool when launching TensorBoard below.</span>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 29,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from IPython.display import display, HTML\n",
|
||||
"import datetime\n",
|
||||
"\n",
|
||||
"def show_graph(graph_def, frame_size=(900, 600)):\n",
|
||||
" \"\"\"Visualize TensorFlow graph.\"\"\"\n",
|
||||
" if hasattr(graph_def, 'as_graph_def'):\n",
|
||||
" graph_def = graph_def.as_graph_def()\n",
|
||||
" timestamp = datetime.datetime.now().strftime(\"%Y-%m-%d_%H-%M-%S\")\n",
|
||||
" code = \"\"\"\n",
|
||||
" <script>\n",
|
||||
" function load() {{\n",
|
||||
" document.getElementById(\"{id}\").pbtxt = {data};\n",
|
||||
" }}\n",
|
||||
" </script>\n",
|
||||
" <link rel=\"import\" href=\"https://tensorboard.appspot.com/tf-graph-basic.build.html\" onload=load()>\n",
|
||||
" <div style=\"height:{height}px\">\n",
|
||||
" <tf-graph-basic id=\"{id}\"></tf-graph-basic>\n",
|
||||
" </div>\n",
|
||||
" \"\"\".format(height=frame_size[1], data=repr(str(graph_def)), id='graph'+timestamp)\n",
|
||||
" iframe = \"\"\"\n",
|
||||
" <iframe seamless style=\"width:{width}px;height:{height}px;border:0\" srcdoc=\"{src}\"></iframe>\n",
|
||||
" \"\"\".format(width=frame_size[0], height=frame_size[1] + 20, src=code.replace('\"', '"'))\n",
|
||||
" display(HTML(iframe))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Run the cell below to display a visualisation of the graph we just defined. Notice that by default all operations within a particular defined name scope are grouped under a single node; this allows the top-level structure of the graph and how data flows between the various components to be easily visualised. We can also expand these nodes however to interrogate the operations within them - simply double-click on one of the nodes to do this (double-clicking on the expanded node will cause it to collapse again). If you expand the `model` node you should see a graph closely mirroring the affine transform example given as a motivation above."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 30,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"show_graph(graph)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"anaconda-cloud": {},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 1
|
||||
}
|
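Aside: the notebook above mentions that gradient sub-graphs can also be constructed explicitly with `tf.gradients` but does not demonstrate it. A self-contained sketch of that route (shapes assumed from the notebook's softmax regression example; this is not part of the notebook itself) could look like:
```python
import tensorflow as tf

# Rebuild the small softmax regression graph, then take an explicit SGD step
# using tf.gradients instead of a tf.train.Optimizer.
inputs = tf.placeholder(tf.float32, [None, 784])
targets = tf.placeholder(tf.float32, [None, 47])
weights = tf.Variable(tf.zeros([784, 47]))
biases = tf.Variable(tf.zeros([47]))
error = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=tf.matmul(inputs, weights) + biases, labels=targets))

grad_w, grad_b = tf.gradients(error, [weights, biases])  # reverse-mode autodiff
learning_rate = 0.5
manual_train_step = tf.group(                             # one hand-rolled SGD update
    tf.assign_sub(weights, learning_rate * grad_w),
    tf.assign_sub(biases, learning_rate * grad_b))
```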
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
@ -1,152 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"from mlp.layers import BatchNormalizationLayer\n",
|
||||
"test_inputs = np.array([[-1.38066782, -0.94725498, -3.05585424, 2.28644454, 0.85520889,\n",
|
||||
" 0.10575624, 0.23618609, 0.84723205, 1.06569909, -2.21704034],\n",
|
||||
" [ 0.11060968, -0.0747448 , 0.56809029, 2.45926149, -2.28677816,\n",
|
||||
" -0.9964566 , 2.7356007 , 1.98002308, -0.39032315, 1.46515481]])\n",
|
||||
"test_grads_wrt_outputs = np.array([[-0.43857052, 1.00380109, -1.18425494, 0.00486091, 0.21470207,\n",
|
||||
" -0.12179054, -0.11508482, 0.738482 , -1.17249238, 0.69188295],\n",
|
||||
" [ 1.07802015, 0.69901145, 0.81603688, -1.76743026, -1.24418692,\n",
|
||||
" -0.65729963, -0.50834305, -0.49016145, 1.63749743, -0.71123104]])\n",
|
||||
"\n",
|
||||
"#produce BatchNorm fprop and bprop\n",
|
||||
"activation_layer = BatchNormalizationLayer(input_dim=10)\n",
|
||||
"\n",
|
||||
"beta = np.array(10*[0.3])\n",
|
||||
"gamma = np.array(10*[0.5])\n",
|
||||
"\n",
|
||||
"activation_layer.params = [gamma, beta]\n",
|
||||
"BN_fprop = activation_layer.fprop(test_inputs)\n",
|
||||
"BN_bprop = activation_layer.bprop(\n",
|
||||
" test_inputs, BN_fprop, test_grads_wrt_outputs)\n",
|
||||
"BN_grads_wrt_params = activation_layer.grads_wrt_params(\n",
|
||||
" test_inputs, test_grads_wrt_outputs)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"true_fprop_outputs = np.array([[-0.1999955 , -0.19998686, -0.19999924, -0.1996655 , 0.79999899,\n",
|
||||
" 0.79999177, -0.1999984 , -0.19999221, 0.79999528, -0.19999926],\n",
|
||||
" [ 0.7999955 , 0.79998686, 0.79999924, 0.7996655 , -0.19999899,\n",
|
||||
" -0.19999177, 0.7999984 , 0.79999221, -0.19999528, 0.79999926]])\n",
|
||||
"assert BN_fprop.shape == true_fprop_outputs.shape, (\n",
|
||||
" 'Layer bprop returns incorrect shaped array. '\n",
|
||||
" 'Correct shape is \\n\\n{0}\\n\\n but returned shape is \\n\\n{1}.'\n",
|
||||
" .format(true_fprop_outputs.shape, BN_fprop.shape)\n",
|
||||
")\n",
|
||||
"assert np.allclose(np.round(BN_fprop, decimals=2), np.round(true_fprop_outputs, decimals=2)), (\n",
|
||||
"'Layer bprop does not return correct values. '\n",
|
||||
"'Correct output is \\n\\n{0}\\n\\n but returned output is \\n\\n{1}\\n\\n difference is \\n\\n{2}'\n",
|
||||
".format(true_fprop_outputs, BN_fprop, BN_fprop-true_fprop_outputs)\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"print(\"Batch Normalization F-prop test passed\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"true_bprop_outputs = np.array([[ -9.14558020e-06, 9.17665617e-06, -8.40575535e-07,\n",
|
||||
" 6.85384297e-03, 9.40668131e-07, 7.99795574e-06,\n",
|
||||
" 5.03719464e-07, 1.69038704e-05, -1.82061629e-05,\n",
|
||||
" 5.62083224e-07],\n",
|
||||
" [ 9.14558020e-06, -9.17665617e-06, 8.40575535e-07,\n",
|
||||
" -6.85384297e-03, -9.40668131e-07, -7.99795574e-06,\n",
|
||||
" -5.03719464e-07, -1.69038704e-05, 1.82061629e-05,\n",
|
||||
" -5.62083224e-07]])\n",
|
||||
"assert BN_bprop.shape == true_bprop_outputs.shape, (\n",
|
||||
" 'Layer bprop returns incorrect shaped array. '\n",
|
||||
" 'Correct shape is \\n\\n{0}\\n\\n but returned shape is \\n\\n{1}.'\n",
|
||||
" .format(true_bprop_outputs.shape, BN_bprop.shape)\n",
|
||||
")\n",
|
||||
"assert np.allclose(np.round(BN_bprop, decimals=2), np.round(true_bprop_outputs, decimals=2)), (\n",
|
||||
"'Layer bprop does not return correct values. '\n",
|
||||
"'Correct output is \\n\\n{0}\\n\\n but returned output is \\n\\n{1}\\n\\n difference is \\n\\n{2}'\n",
|
||||
".format(true_bprop_outputs, BN_bprop, BN_bprop-true_bprop_outputs)\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"print(\"Batch Normalization B-prop test passed\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"grads_wrt_gamma, grads_wrt_beta = BN_grads_wrt_params\n",
|
||||
"true_grads_wrt_gamma = np.array(([ 1.51657703, -0.30478163, 2.00028878, -1.77110552, 1.45888603,\n",
|
||||
" 0.53550028, -0.39325697, -1.2286243 , -2.8099633 , -1.40311192]))\n",
|
||||
"true_grads_wrt_beta = np.array([ 0.63944963, 1.70281254, -0.36821806, -1.76256935, -1.02948485,\n",
|
||||
" -0.77909018, -0.62342786, 0.24832055, 0.46500505, -0.01934809])\n",
|
||||
"\n",
|
||||
"assert grads_wrt_gamma.shape == true_grads_wrt_gamma.shape, (\n",
|
||||
" 'Layer bprop returns incorrect shaped array. '\n",
|
||||
" 'Correct shape is \\n\\n{0}\\n\\n but returned shape is \\n\\n{1}.'\n",
|
||||
" .format(true_grads_wrt_gamma.shape, grads_wrt_gamma.shape)\n",
|
||||
")\n",
|
||||
"assert np.allclose(np.round(grads_wrt_gamma, decimals=2), np.round(true_grads_wrt_gamma, decimals=2)), (\n",
|
||||
"'Layer bprop does not return correct values. '\n",
|
||||
"'Correct output is \\n\\n{0}\\n\\n but returned output is \\n\\n{1}\\n\\n difference is \\n\\n{2}'\n",
|
||||
".format(true_grads_wrt_gamma, grads_wrt_gamma, grads_wrt_gamma-true_grads_wrt_gamma)\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"assert grads_wrt_beta.shape == true_grads_wrt_beta.shape, (\n",
|
||||
" 'Layer bprop returns incorrect shaped array. '\n",
|
||||
" 'Correct shape is \\n\\n{0}\\n\\n but returned shape is \\n\\n{1}.'\n",
|
||||
" .format(true_grads_wrt_beta.shape, grads_wrt_beta.shape)\n",
|
||||
")\n",
|
||||
"assert np.allclose(np.round(grads_wrt_beta, decimals=2), np.round(true_grads_wrt_beta, decimals=2)), (\n",
|
||||
"'Layer bprop does not return correct values. '\n",
|
||||
"'Correct output is \\n\\n{0}\\n\\n but returned output is \\n\\n{1}\\n\\n difference is \\n\\n{2}'\n",
|
||||
".format(true_grads_wrt_beta, grads_wrt_beta, grads_wrt_beta-true_grads_wrt_beta)\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"print(\"Batch Normalization grads wrt to params test passed\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 1
|
||||
}
|
@ -1,307 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Below a skeleton class and associated test functions for the `fprop`, `bprop` and `grads_wrt_params` methods of the ConvolutionalLayer class are included.\n",
|
||||
"\n",
|
||||
"The test functions assume that in your implementation of `fprop` for the convolutional layer, outputs are calculated only for 'valid' overlaps of the kernel filters with the input - i.e. without any padding.\n",
|
||||
"\n",
|
||||
"It is also assumed that if convolutions with non-unit strides are implemented the default behaviour is to take unit-strides, with the test cases only correct for unit strides in both directions."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The three test functions are defined in the cell below. All the functions take as first argument the *class* corresponding to the convolutional layer implementation to be tested (**not** an instance of the class). It is assumed the class being tested has an `__init__` method with at least all of the arguments defined in the skeleton definition above. A boolean second argument to each function can be used to specify if the layer implements a cross-correlation or convolution based operation (see note in [seventh lecture slides](http://www.inf.ed.ac.uk/teaching/courses/mlp/2016/mlp07-cnn.pdf))."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"\n",
|
||||
"def test_conv_layer_fprop(layer_class, do_cross_correlation=False):\n",
|
||||
" \"\"\"Tests `fprop` method of a convolutional layer.\n",
|
||||
" \n",
|
||||
" Checks the outputs of `fprop` method for a fixed input against known\n",
|
||||
" reference values for the outputs and raises an AssertionError if\n",
|
||||
" the outputted values are not consistent with the reference values. If\n",
|
||||
" tests are all passed returns True.\n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" layer_class: Convolutional layer implementation following the \n",
|
||||
" interface defined in the provided skeleton class.\n",
|
||||
" do_cross_correlation: Whether the layer implements an operation\n",
|
||||
" corresponding to cross-correlation (True) i.e kernels are\n",
|
||||
" not flipped before sliding over inputs, or convolution\n",
|
||||
" (False) with filters being flipped.\n",
|
||||
"\n",
|
||||
" Raises:\n",
|
||||
" AssertionError: Raised if output of `layer.fprop` is inconsistent \n",
|
||||
" with reference values either in shape or values.\n",
|
||||
" \"\"\"\n",
|
||||
" inputs = np.arange(96).reshape((2, 3, 4, 4))\n",
|
||||
" kernels = np.arange(-12, 12).reshape((2, 3, 2, 2))\n",
|
||||
" if do_cross_correlation:\n",
|
||||
" kernels = kernels[:, :, ::-1, ::-1]\n",
|
||||
" biases = np.arange(2)\n",
|
||||
" true_output = np.array(\n",
|
||||
" [[[[ -958., -1036., -1114.],\n",
|
||||
" [-1270., -1348., -1426.],\n",
|
||||
" [-1582., -1660., -1738.]],\n",
|
||||
" [[ 1707., 1773., 1839.],\n",
|
||||
" [ 1971., 2037., 2103.],\n",
|
||||
" [ 2235., 2301., 2367.]]],\n",
|
||||
" [[[-4702., -4780., -4858.],\n",
|
||||
" [-5014., -5092., -5170.],\n",
|
||||
" [-5326., -5404., -5482.]],\n",
|
||||
" [[ 4875., 4941., 5007.],\n",
|
||||
" [ 5139., 5205., 5271.],\n",
|
||||
" [ 5403., 5469., 5535.]]]]\n",
|
||||
" )\n",
|
||||
" \n",
|
||||
" layer = layer_class(\n",
|
||||
" num_input_channels=kernels.shape[1], \n",
|
||||
" num_output_channels=kernels.shape[0], \n",
|
||||
" input_dim_1=inputs.shape[2], \n",
|
||||
" input_dim_2=inputs.shape[3],\n",
|
||||
" kernel_dim_1=kernels.shape[2],\n",
|
||||
" kernel_dim_2=kernels.shape[3]\n",
|
||||
" )\n",
|
||||
" layer.params = [kernels, biases]\n",
|
||||
" layer_output = layer.fprop(inputs)\n",
|
||||
" \n",
|
||||
" assert layer_output.shape == true_output.shape, (\n",
|
||||
" 'Layer fprop gives incorrect shaped output. '\n",
|
||||
" 'Correct shape is \\n\\n{0}\\n\\n but returned shape is \\n\\n{1}.'\n",
|
||||
" .format(true_output.shape, layer_output.shape)\n",
|
||||
" )\n",
|
||||
" assert np.allclose(layer_output, true_output), (\n",
|
||||
" 'Layer fprop does not give correct output. '\n",
|
||||
" 'Correct output is \\n\\n{0}\\n\\n but returned output is \\n\\n{1}\\n\\n difference is \\n\\n{2}.'\n",
|
||||
" .format(true_output, layer_output, true_output-layer_output)\n",
|
||||
" )\n",
|
||||
" return True\n",
|
||||
"\n",
|
||||
"def test_conv_layer_bprop(layer_class, do_cross_correlation=False):\n",
|
||||
" \"\"\"Tests `bprop` method of a convolutional layer.\n",
|
||||
" \n",
|
||||
" Checks the outputs of `bprop` method for a fixed input against known\n",
|
||||
" reference values for the gradients with respect to inputs and raises \n",
|
||||
" an AssertionError if the returned values are not consistent with the\n",
|
||||
" reference values. If tests are all passed returns True.\n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" layer_class: Convolutional layer implementation following the \n",
|
||||
" interface defined in the provided skeleton class.\n",
|
||||
" do_cross_correlation: Whether the layer implements an operation\n",
|
||||
" corresponding to cross-correlation (True) i.e kernels are\n",
|
||||
" not flipped before sliding over inputs, or convolution\n",
|
||||
" (False) with filters being flipped.\n",
|
||||
"\n",
|
||||
" Raises:\n",
|
||||
" AssertionError: Raised if output of `layer.bprop` is inconsistent \n",
|
||||
" with reference values either in shape or values.\n",
|
||||
" \"\"\"\n",
|
||||
" inputs = np.arange(96).reshape((2, 3, 4, 4))\n",
|
||||
" kernels = np.arange(-12, 12).reshape((2, 3, 2, 2))\n",
|
||||
" if do_cross_correlation:\n",
|
||||
" kernels = kernels[:, :, ::-1, ::-1]\n",
|
||||
" biases = np.arange(2)\n",
|
||||
" grads_wrt_outputs = np.arange(-20, 16).reshape((2, 2, 3, 3))\n",
|
||||
" outputs = np.array(\n",
|
||||
" [[[[ -958., -1036., -1114.],\n",
|
||||
" [-1270., -1348., -1426.],\n",
|
||||
" [-1582., -1660., -1738.]],\n",
|
||||
" [[ 1707., 1773., 1839.],\n",
|
||||
" [ 1971., 2037., 2103.],\n",
|
||||
" [ 2235., 2301., 2367.]]],\n",
|
||||
" [[[-4702., -4780., -4858.],\n",
|
||||
" [-5014., -5092., -5170.],\n",
|
||||
" [-5326., -5404., -5482.]],\n",
|
||||
" [[ 4875., 4941., 5007.],\n",
|
||||
" [ 5139., 5205., 5271.],\n",
|
||||
" [ 5403., 5469., 5535.]]]]\n",
|
||||
" )\n",
|
||||
" true_grads_wrt_inputs = np.array(\n",
|
||||
" [[[[ 147., 319., 305., 162.],\n",
|
||||
" [ 338., 716., 680., 354.],\n",
|
||||
" [ 290., 608., 572., 294.],\n",
|
||||
" [ 149., 307., 285., 144.]],\n",
|
||||
" [[ 23., 79., 81., 54.],\n",
|
||||
" [ 114., 284., 280., 162.],\n",
|
||||
" [ 114., 272., 268., 150.],\n",
|
||||
" [ 73., 163., 157., 84.]],\n",
|
||||
" [[-101., -161., -143., -54.],\n",
|
||||
" [-110., -148., -120., -30.],\n",
|
||||
" [ -62., -64., -36., 6.],\n",
|
||||
" [ -3., 19., 29., 24.]]],\n",
|
||||
" [[[ 39., 67., 53., 18.],\n",
|
||||
" [ 50., 68., 32., -6.],\n",
|
||||
" [ 2., -40., -76., -66.],\n",
|
||||
" [ -31., -89., -111., -72.]],\n",
|
||||
" [[ 59., 115., 117., 54.],\n",
|
||||
" [ 114., 212., 208., 90.],\n",
|
||||
" [ 114., 200., 196., 78.],\n",
|
||||
" [ 37., 55., 49., 12.]],\n",
|
||||
" [[ 79., 163., 181., 90.],\n",
|
||||
" [ 178., 356., 384., 186.],\n",
|
||||
" [ 226., 440., 468., 222.],\n",
|
||||
" [ 105., 199., 209., 96.]]]])\n",
|
||||
" layer = layer_class(\n",
|
||||
" num_input_channels=kernels.shape[1], \n",
|
||||
" num_output_channels=kernels.shape[0], \n",
|
||||
" input_dim_1=inputs.shape[2], \n",
|
||||
" input_dim_2=inputs.shape[3],\n",
|
||||
" kernel_dim_1=kernels.shape[2],\n",
|
||||
" kernel_dim_2=kernels.shape[3]\n",
|
||||
" )\n",
|
||||
" layer.params = [kernels, biases]\n",
|
||||
" layer_grads_wrt_inputs = layer.bprop(inputs, outputs, grads_wrt_outputs)\n",
|
||||
" assert layer_grads_wrt_inputs.shape == true_grads_wrt_inputs.shape, (\n",
|
||||
" 'Layer bprop returns incorrect shaped array. '\n",
|
||||
" 'Correct shape is \\n\\n{0}\\n\\n but returned shape is \\n\\n{1}.'\n",
|
||||
" .format(true_grads_wrt_inputs.shape, layer_grads_wrt_inputs.shape)\n",
|
||||
" )\n",
|
||||
" assert np.allclose(layer_grads_wrt_inputs, true_grads_wrt_inputs), (\n",
|
||||
" 'Layer bprop does not return correct values. '\n",
|
||||
" 'Correct output is \\n\\n{0}\\n\\n but returned output is \\n\\n{1}\\n\\n difference is \\n\\n{2}'\n",
|
||||
" .format(true_grads_wrt_inputs, layer_grads_wrt_inputs, layer_grads_wrt_inputs-true_grads_wrt_inputs)\n",
|
||||
" )\n",
|
||||
" return True\n",
|
||||
"\n",
|
||||
"def test_conv_layer_grad_wrt_params(\n",
|
||||
" layer_class, do_cross_correlation=False):\n",
|
||||
" \"\"\"Tests `grad_wrt_params` method of a convolutional layer.\n",
|
||||
" \n",
|
||||
" Checks the outputs of `grad_wrt_params` method for fixed inputs \n",
|
||||
" against known reference values for the gradients with respect to \n",
|
||||
" kernels and biases, and raises an AssertionError if the returned\n",
|
||||
" values are not consistent with the reference values. If tests\n",
|
||||
" are all passed returns True.\n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" layer_class: Convolutional layer implementation following the \n",
|
||||
" interface defined in the provided skeleton class.\n",
|
||||
" do_cross_correlation: Whether the layer implements an operation\n",
|
||||
" corresponding to cross-correlation (True) i.e kernels are\n",
|
||||
" not flipped before sliding over inputs, or convolution\n",
|
||||
" (False) with filters being flipped.\n",
|
||||
"\n",
|
||||
" Raises:\n",
|
||||
" AssertionError: Raised if output of `layer.bprop` is inconsistent \n",
|
||||
" with reference values either in shape or values.\n",
|
||||
" \"\"\"\n",
|
||||
" inputs = np.arange(96).reshape((2, 3, 4, 4))\n",
|
||||
" kernels = np.arange(-12, 12).reshape((2, 3, 2, 2))\n",
|
||||
" biases = np.arange(2)\n",
|
||||
" grads_wrt_outputs = np.arange(-20, 16).reshape((2, 2, 3, 3))\n",
|
||||
" true_kernel_grads = np.array(\n",
|
||||
" [[[[ -240., -114.],\n",
|
||||
" [ 264., 390.]],\n",
|
||||
" [[-2256., -2130.],\n",
|
||||
" [-1752., -1626.]],\n",
|
||||
" [[-4272., -4146.],\n",
|
||||
" [-3768., -3642.]]],\n",
|
||||
" [[[ 5268., 5232.],\n",
|
||||
" [ 5124., 5088.]],\n",
|
||||
" [[ 5844., 5808.],\n",
|
||||
" [ 5700., 5664.]],\n",
|
||||
" [[ 6420., 6384.],\n",
|
||||
" [ 6276., 6240.]]]])\n",
|
||||
" if do_cross_correlation:\n",
|
||||
" kernels = kernels[:, :, ::-1, ::-1]\n",
|
||||
" true_kernel_grads = true_kernel_grads[:, :, ::-1, ::-1]\n",
|
||||
" true_bias_grads = np.array([-126., 36.])\n",
|
||||
" layer = layer_class(\n",
|
||||
" num_input_channels=kernels.shape[1], \n",
|
||||
" num_output_channels=kernels.shape[0], \n",
|
||||
" input_dim_1=inputs.shape[2], \n",
|
||||
" input_dim_2=inputs.shape[3],\n",
|
||||
" kernel_dim_1=kernels.shape[2],\n",
|
||||
" kernel_dim_2=kernels.shape[3]\n",
|
||||
" )\n",
|
||||
" layer.params = [kernels, biases]\n",
|
||||
" layer_kernel_grads, layer_bias_grads = (\n",
|
||||
" layer.grads_wrt_params(inputs, grads_wrt_outputs))\n",
|
||||
" assert layer_kernel_grads.shape == true_kernel_grads.shape, (\n",
|
||||
" 'grads_wrt_params gives incorrect shaped kernel gradients output. '\n",
|
||||
" 'Correct shape is \\n\\n{0}\\n\\n but returned shape is \\n\\n{1}.'\n",
|
||||
" .format(true_kernel_grads.shape, layer_kernel_grads.shape)\n",
|
||||
" )\n",
|
||||
" assert np.allclose(layer_kernel_grads, true_kernel_grads), (\n",
|
||||
" 'grads_wrt_params does not give correct kernel gradients output. '\n",
|
||||
" 'Correct output is \\n\\n{0}\\n\\n but returned output is \\n\\n{1}.'\n",
|
||||
" .format(true_kernel_grads, layer_kernel_grads)\n",
|
||||
" )\n",
|
||||
" assert layer_bias_grads.shape == true_bias_grads.shape, (\n",
|
||||
" 'grads_wrt_params gives incorrect shaped bias gradients output. '\n",
|
||||
" 'Correct shape is \\n\\n{0}\\n\\n but returned shape is \\n\\n{1}.'\n",
|
||||
" .format(true_bias_grads.shape, layer_bias_grads.shape)\n",
|
||||
" )\n",
|
||||
" assert np.allclose(layer_bias_grads, true_bias_grads), (\n",
|
||||
" 'grads_wrt_params does not give correct bias gradients output. '\n",
|
||||
" 'Correct output is \\n\\n{0}\\n\\n but returned output is \\n\\n{1}.'\n",
|
||||
" .format(true_bias_grads, layer_bias_grads)\n",
|
||||
" )\n",
|
||||
" return True"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"An example of using the test functions if given in the cell below. This assumes you implement a convolution (rather than cross-correlation) operation. If the implementation is correct "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from mlp.layers import ConvolutionalLayer\n",
|
||||
"fprop_correct = test_conv_layer_fprop(ConvolutionalLayer, False)\n",
|
||||
"bprop_correct = test_conv_layer_bprop(ConvolutionalLayer, False)\n",
|
||||
"grads_wrt_param_correct = test_conv_layer_grad_wrt_params(ConvolutionalLayer, False)\n",
|
||||
"if fprop_correct and grads_wrt_param_correct and bprop_correct:\n",
|
||||
" print('All tests passed.')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"anaconda-cloud": {},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 1
|
||||
}
|
@ -1,147 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Coursework 2\n",
|
||||
"\n",
|
||||
"This notebook is intended to be used as a starting point for your experiments. The instructions can be found in the instructions file located under spec/coursework2.pdf. The methods provided here are just helper functions. If you want more complex graphs such as side by side comparisons of different experiments you should learn more about matplotlib and implement them. Before each experiment remember to re-initialize neural network weights and reset the data providers so you get a properly initialized experiment. For each experiment try to keep most hyperparameters the same except the one under investigation so you can understand what the effects of each are."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"%matplotlib inline\n",
|
||||
"plt.style.use('ggplot')\n",
|
||||
"\n",
|
||||
"def train_model_and_plot_stats(\n",
|
||||
" model, error, learning_rule, train_data, valid_data, num_epochs, stats_interval, notebook=True):\n",
|
||||
" \n",
|
||||
" # As well as monitoring the error over training also monitor classification\n",
|
||||
" # accuracy i.e. proportion of most-probable predicted classes being equal to targets\n",
|
||||
" data_monitors={'acc': lambda y, t: (y.argmax(-1) == t.argmax(-1)).mean()}\n",
|
||||
"\n",
|
||||
" # Use the created objects to initialise a new Optimiser instance.\n",
|
||||
" optimiser = Optimiser(\n",
|
||||
" model, error, learning_rule, train_data, valid_data, data_monitors, notebook=notebook)\n",
|
||||
"\n",
|
||||
" # Run the optimiser for 5 epochs (full passes through the training set)\n",
|
||||
" # printing statistics every epoch.\n",
|
||||
" stats, keys, run_time = optimiser.train(num_epochs=num_epochs, stats_interval=stats_interval)\n",
|
||||
"\n",
|
||||
" # Plot the change in the validation and training set error over training.\n",
|
||||
" fig_1 = plt.figure(figsize=(8, 4))\n",
|
||||
" ax_1 = fig_1.add_subplot(111)\n",
|
||||
" for k in ['error(train)', 'error(valid)']:\n",
|
||||
" ax_1.plot(np.arange(1, stats.shape[0]) * stats_interval, \n",
|
||||
" stats[1:, keys[k]], label=k)\n",
|
||||
" ax_1.legend(loc=0)\n",
|
||||
" ax_1.set_xlabel('Epoch number')\n",
|
||||
"\n",
|
||||
" # Plot the change in the validation and training set accuracy over training.\n",
|
||||
" fig_2 = plt.figure(figsize=(8, 4))\n",
|
||||
" ax_2 = fig_2.add_subplot(111)\n",
|
||||
" for k in ['acc(train)', 'acc(valid)']:\n",
|
||||
" ax_2.plot(np.arange(1, stats.shape[0]) * stats_interval, \n",
|
||||
" stats[1:, keys[k]], label=k)\n",
|
||||
" ax_2.legend(loc=0)\n",
|
||||
" ax_2.set_xlabel('Epoch number')\n",
|
||||
" \n",
|
||||
" return stats, keys, run_time, fig_1, ax_1, fig_2, ax_2"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# The below code will set up the data providers, random number\n",
|
||||
"# generator and logger objects needed for training runs. As\n",
|
||||
"# loading the data from file take a little while you generally\n",
|
||||
"# will probably not want to reload the data providers on\n",
|
||||
"# every training run. If you wish to reset their state you\n",
|
||||
"# should instead use the .reset() method of the data providers.\n",
|
||||
"import numpy as np\n",
|
||||
"import logging\n",
|
||||
"from mlp.data_providers import MNISTDataProvider, EMNISTDataProvider\n",
|
||||
"\n",
|
||||
"# Seed a random number generator\n",
|
||||
"seed = 10102016 \n",
|
||||
"rng = np.random.RandomState(seed)\n",
|
||||
"batch_size = 100\n",
|
||||
"# Set up a logger object to print info about the training run to stdout\n",
|
||||
"logger = logging.getLogger()\n",
|
||||
"logger.setLevel(logging.INFO)\n",
|
||||
"logger.handlers = [logging.StreamHandler()]\n",
|
||||
"\n",
|
||||
"# Create data provider objects for the MNIST data set\n",
|
||||
"train_data = EMNISTDataProvider('train', batch_size=batch_size, rng=rng)\n",
|
||||
"valid_data = EMNISTDataProvider('valid', batch_size=batch_size, rng=rng)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# The model set up code below is provided as a starting point.\n",
|
||||
"# You will probably want to add further code cells for the\n",
|
||||
"# different experiments you run.\n",
|
||||
"\n",
|
||||
"from mlp.layers import AffineLayer, SoftmaxLayer, SigmoidLayer, ReluLayer, LeakyReluLayer, ELULayer, SELULayer\n",
|
||||
"from mlp.errors import CrossEntropySoftmaxError\n",
|
||||
"from mlp.models import MultipleLayerModel\n",
|
||||
"from mlp.initialisers import ConstantInit, GlorotUniformInit\n",
|
||||
"from mlp.learning_rules import GradientDescentLearningRule\n",
|
||||
"from mlp.optimisers import Optimiser\n",
|
||||
"\n",
|
||||
"#setup hyperparameters\n",
|
||||
"learning_rate = 0.1\n",
|
||||
"num_epochs = 100\n",
|
||||
"stats_interval = 1\n",
|
||||
"input_dim, output_dim, hidden_dim = 784, 47, 100\n",
|
||||
"\n",
|
||||
"weights_init = GlorotUniformInit(rng=rng)\n",
|
||||
"biases_init = ConstantInit(0.)\n",
|
||||
"model = MultipleLayerModel([\n",
|
||||
" AffineLayer(input_dim, hidden_dim, weights_init, biases_init), \n",
|
||||
" ReluLayer(),\n",
|
||||
" AffineLayer(hidden_dim, hidden_dim, weights_init, biases_init), \n",
|
||||
" ReluLayer(),\n",
|
||||
" AffineLayer(hidden_dim, output_dim, weights_init, biases_init)\n",
|
||||
"])\n",
|
||||
"\n",
|
||||
"error = CrossEntropySoftmaxError()\n",
|
||||
"# Use a basic gradient descent learning rule\n",
|
||||
"learning_rule = GradientDescentLearningRule(learning_rate=learning_rate)\n",
|
||||
"\n",
|
||||
"#Remember to use notebook=False when you write a script to be run in a terminal\n",
|
||||
"_ = train_model_and_plot_stats(\n",
|
||||
" model, error, learning_rule, train_data, valid_data, num_epochs, stats_interval, notebook=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 1
|
||||
}
|
756
notebooks/Introduction_to_tensorflow.ipynb
Normal file
756
notebooks/Introduction_to_tensorflow.ipynb
Normal file
File diff suppressed because one or more lines are too long
99
notebooks/Introduction_to_tf_mlp_repo.ipynb
Normal file
99
notebooks/Introduction_to_tf_mlp_repo.ipynb
Normal file
@ -0,0 +1,99 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Tensorflow Experimentation Setup"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"source": [
|
||||
"In the previous tutorial we introduced the tensorflow framework and some very basic functionality that it can provide. In this tutorial we will present a far more readable and research oriented tensorflow based code-base that allows one to quickly build new model architectures and research experiments in tensorflow. The proposed code-structure has been tested in real research and has proven a very readable and easily modifiable setup. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The tf_mlp package contains all necessary modules with which one can easily train and evaluate a classifier. The following packages can be found:\n",
|
||||
"\n",
|
||||
" 1. utils: \n",
|
||||
" 1. network_summary: Provides utilities with which one can get network summaries, such as the number of parameters and names of layers.\n",
|
||||
" 2. parser_utils which are used to parse arguments passed to the training scripts.\n",
|
||||
" 3. storage, which is responsible for storing network statistics.\n",
|
||||
" 2. data_providers.py : Provides the data providers for training, validation and testing.\n",
|
||||
" 3. network_architectures.py: Defines the network architectures. We provide VGGNet as an example.\n",
|
||||
" 4. network_builder.py: Builds the tensorflow computation graph. In more detail, it builds the losses, tensorflow summaries and training operations.\n",
|
||||
" 5. network_trainer.py: Runs an experiment, composed of training, validation and testing. It is setup to use arguments such that one can easily write multiple bash scripts with different hyperparameters and run experiments very quickly with minimal code changes.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"To run an experiment just run:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"python network_trainer.py --batch_size 128 --epochs 100 --experiment_prefix VGG_EMNIST --tensorboard_use True --batch_norm_use True --strided_dim_reduction True --seed 16122017\n",
|
||||
"```\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The arguments after network_trainer.py can be changed to suit your experimental needs. For more arguments and exploring how to add new arguments of your own please view parser_utils.py under utils and network_trainer.py as they provide all the functionality that is necessary to add arguments."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Finally remember to make sure your code is not just efficient but readable. Research code has a very bad reputation, so let's try to improve readability one research line code at a time"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"To run tensorboard just point tensorboard to the correct logs repository as follows:\n",
|
||||
" ```tensorboard --port 60xx --logdir /path/to/logs```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
""
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3.0
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
}
|
BIN
notebooks/res/affine-transform-graph.png
Normal file
BIN
notebooks/res/affine-transform-graph.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 13 KiB |
2002
notebooks/res/computational-graphs.svg
Normal file
2002
notebooks/res/computational-graphs.svg
Normal file
File diff suppressed because it is too large
Load Diff
After Width: | Height: | Size: 91 KiB |
BIN
notebooks/res/pipeline-graph.png
Normal file
BIN
notebooks/res/pipeline-graph.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 18 KiB |
BIN
notebooks/res/rnn-graph.png
Normal file
BIN
notebooks/res/rnn-graph.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 21 KiB |
BIN
notebooks/res/skip-connection-graph.png
Normal file
BIN
notebooks/res/skip-connection-graph.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 11 KiB |
286
notes/cifar10_100_and_million_song_datasets.md
Normal file
286
notes/cifar10_100_and_million_song_datasets.md
Normal file
@ -0,0 +1,286 @@
|
||||
## Datasets Available on AFS
|
||||
|
||||
For your convenience we provide data providers for the CIFAR-10/100 and Million Song datasets. Below you can find
|
||||
information on the datasets and the AFS paths where one can find them.
|
||||
|
||||
## CIFAR-10 and CIFAR-100 datasets
|
||||
|
||||
[CIFAR-10 and CIFAR-100](https://www.cs.toronto.edu/~kriz/cifar.html) are a pair of image classification datasets collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. They are labelled subsets of the much larger 80 million tiny images dataset. They are a common benchmark task for image classification - a list of current accuracy benchmarks for both data sets is maintained by Rodrigo Benenson [here](http://rodrigob.github.io/are_we_there_yet/build/).
|
||||
|
||||
As the name suggests, CIFAR-10 has images in 10 classes:
|
||||
|
||||
airplane
|
||||
automobile
|
||||
bird
|
||||
cat
|
||||
deer
|
||||
dog
|
||||
frog
|
||||
horse
|
||||
ship
|
||||
truck
|
||||
|
||||
with 6000 images per class for an overall dataset size of 60000. Each image has three (RGB) colour channels and pixel dimension 32×32, corresponding to a total dimension per input image of 3×32×32=3072. For each colour channel the input values have been normalised to the range [0, 1].
|
||||
|
||||
CIFAR-100 has images of identical dimensions to CIFAR-10 but rather than 10 classes they are instead split across 100 fine-grained classes (and 20 coarser 'superclasses' comprising multiple finer classes):
|
||||
|
||||
<table style='border: none;'>
|
||||
<tbody><tr style='font-weight: bold;'>
|
||||
<td>Superclass</td>
|
||||
<td>Classes</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>aquatic mammals</td>
|
||||
<td>beaver, dolphin, otter, seal, whale</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>fish</td>
|
||||
<td>aquarium fish, flatfish, ray, shark, trout</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>flowers</td>
|
||||
<td>orchids, poppies, roses, sunflowers, tulips</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>food containers</td>
|
||||
<td>bottles, bowls, cans, cups, plates</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>fruit and vegetables</td>
|
||||
<td>apples, mushrooms, oranges, pears, sweet peppers</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>household electrical devices</td>
|
||||
<td>clock, computer keyboard, lamp, telephone, television</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>household furniture</td>
|
||||
<td>bed, chair, couch, table, wardrobe</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>insects</td>
|
||||
<td>bee, beetle, butterfly, caterpillar, cockroach</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>large carnivores</td>
|
||||
<td>bear, leopard, lion, tiger, wolf</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>large man-made outdoor things</td>
|
||||
<td>bridge, castle, house, road, skyscraper</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>large natural outdoor scenes</td>
|
||||
<td>cloud, forest, mountain, plain, sea</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>large omnivores and herbivores</td>
|
||||
<td>camel, cattle, chimpanzee, elephant, kangaroo</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>medium-sized mammals</td>
|
||||
<td>fox, porcupine, possum, raccoon, skunk</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>non-insect invertebrates</td>
|
||||
<td>crab, lobster, snail, spider, worm</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>people</td>
|
||||
<td>baby, boy, girl, man, woman</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>reptiles</td>
|
||||
<td>crocodile, dinosaur, lizard, snake, turtle</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>small mammals</td>
|
||||
<td>hamster, mouse, rabbit, shrew, squirrel</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>trees</td>
|
||||
<td>maple, oak, palm, pine, willow</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>vehicles 1</td>
|
||||
<td>bicycle, bus, motorcycle, pickup truck, train</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>vehicles 2</td>
|
||||
<td>lawn-mower, rocket, streetcar, tank, tractor</td>
|
||||
</tr>
|
||||
</tbody></table>
|
||||
|
||||
Each class has 600 examples in it, giving an overall dataset size of 60000 i.e. the same as CIFAR-10.
|
||||
|
||||
Both CIFAR-10 and CIFAR-100 have standard splits into 50000 training examples and 10000 test examples. For CIFAR-100, as there is an optional Kaggle competition (see below) scored on predictions on the test set, we have used a non-standard assignment of examples to the test and training sets and have only provided the inputs (and not target labels) for the 10000 examples chosen for the test set.
|
||||
|
||||
For CIFAR-10 the 10000 test set examples have labels provided: to avoid any accidental over-fitting to the test set **you should only use these for the final evaluation of your model(s)**. If you repeatedly evaluate models on the test set during model development it is easy to end up indirectly fitting to the test labels - for those who have not already read it see this [excellent cautionary note from the MLPR notes by Iain Murray](http://www.inf.ed.ac.uk/teaching/courses/mlpr/2016/notes/w2a_train_test_val.html#warning-dont-fool-yourself-or-make-a-fool-of-yourself).
|
||||
|
||||
For both CIFAR-10 and CIFAR-100, the remaining 50000 non-test examples have been split in to a 40000 example training dataset and a 10000 example validation dataset, each with target labels provided. If you wish to use a more complex cross-fold validation scheme you may want to combine these two portions of the dataset and define your own functions for separating out a validation set.
|
||||
|
||||
Data provider classes for both CIFAR-10 and CIFAR-100 are available in the `mlp.data_providers` module. Both have similar behaviour to the `MNISTDataProvider` used extensively last semester. A `which_set` argument can be used to specify whether to return a data provider for the training dataset (`which_set='train'`) or validation dataset (`which_set='valid'`).
|
||||
|
||||
The CIFAR-100 data provider also takes an optional `use_coarse_targets` argument in its constructor. By default this is set to `False` and the targets returned by the data provider correspond to 1-of-K encoded binary vectors for the 100 fine-grained object classes. If `use_coarse_targets=True` then the data provider will instead return 1-of-K encoded binary vector targets for the 20 coarse-grained superclasses associated with each input.
|
||||
|
||||
Both data provider classes provide a `label_map` attribute which is a list of strings which are the class labels corresponding to the integer targets (i.e. prior to conversion to a 1-of-K encoded binary vector).
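As a rough usage sketch (the provider class names `CIFAR10DataProvider` and `CIFAR100DataProvider`, and the `batch_size` / `rng` constructor arguments, are assumed here by analogy with `MNISTDataProvider` - check `mlp/data_providers.py` for the exact interface):

```python
import numpy as np
# Class names assumed by analogy with MNISTDataProvider -- see mlp/data_providers.py.
from mlp.data_providers import CIFAR10DataProvider, CIFAR100DataProvider

rng = np.random.RandomState(18102017)
train_data = CIFAR10DataProvider('train', batch_size=100, rng=rng)
valid_data = CIFAR10DataProvider('valid', batch_size=100, rng=rng)

# CIFAR-100 with the 20 coarse superclass targets rather than the 100 fine classes.
coarse_train_data = CIFAR100DataProvider('train', batch_size=100, rng=rng,
                                         use_coarse_targets=True)

for inputs, targets in train_data:
    # inputs: (100, 3072) batch of flattened images, targets: 1-of-K encoded labels
    print(inputs.shape, targets.shape)
    print(train_data.label_map[targets[0].argmax()])  # human-readable class name
    break
```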
|
||||
|
||||
### Accessing the CIFAR-10 and CIFAR-100 data
|
||||
|
||||
Before using the data provider objects you will need to make sure the data files are accessible to the `mlp` package, i.e. that they exist under the directory specified by the `MLP_DATA_DIR` path.
|
||||
|
||||
The data is available as compressed NumPy `.npz` files in the AFS directory `/afs/inf.ed.ac.uk/group/teaching/mlp/data/2017-18/`.
|
||||
|
||||
If you are working on DICE one option is to redefine your `MLP_DATA_DIR` to directly point to the shared AFS data directory by editing the `env_vars.sh` start-up file for your environment. This avoids using up your DICE quota on storing the data files in your homespace, but may involve slower initial loading of the data when initialising the data providers if many people are trying to access the same files at once. The environment variable can be redefined by running
|
||||
|
||||
```
|
||||
gedit ~/miniconda3/envs/mlp/etc/conda/activate.d/env_vars.sh
|
||||
```
|
||||
|
||||
in a terminal window (assuming you installed `miniconda3` to your home directory), and changing the line
|
||||
|
||||
```
|
||||
export MLP_DATA_DIR=$HOME/mlpractical/data
|
||||
```
|
||||
|
||||
to
|
||||
|
||||
```
|
||||
export MLP_DATA_DIR="/afs/inf.ed.ac.uk/group/teaching/mlp/data/2017-18/"
|
||||
```
|
||||
|
||||
and then saving and closing the editor. You will need to reload the `mlp` environment using `source activate mlp` and restart the Jupyter notebook server in the reloaded environment for the new environment variable definition to be available.
|
||||
|
||||
For those working on DICE who have sufficient quota remaining, or those using their own machine, an alternative option is to copy the data files into your local `mlp/data` directory (or wherever your `MLP_DATA_DIR` environment variable currently points to, if different).
|
||||
|
||||
|
||||
Assuming your local `mlpractical` repository is in your home directory you should be able to copy the required files on DICE by running
|
||||
|
||||
```
|
||||
cp /afs/inf.ed.ac.uk/group/teaching/mlp/data/2017-18/cifar*.npz ~/mlpractical/data
|
||||
```
|
||||
|
||||
On a non-DICE machine, you will need to either [set up local access to AFS](http://computing.help.inf.ed.ac.uk/informatics-filesystem), use a remote file transfer client like `scp` or you can alternatively download the files using the iFile web interface [here](https://ifile.inf.ed.ac.uk/?path=%2Fafs%2Finf.ed.ac.uk%2Fgroup%2Fteaching%2Fmlp%2Fdata&goChange=Go) (requires DICE credentials).
|
||||
|
||||
As some of the files are quite large you may wish to copy only those you are using currently (e.g. only the files for one of the two tasks) to your local filespace to avoid filling up your quota. The `cifar-100-test-inputs.npz` file will only be needed by those intending to enter the associated optional Kaggle competition.
|
||||
|
||||
## Genre classification with the Million Song Dataset
|
||||
|
||||
The [Million Song Dataset](http://labrosa.ee.columbia.edu/millionsong/) is a
|
||||
|
||||
> freely-available collection of audio features and metadata for a million contemporary popular music tracks
|
||||
|
||||
originally collected and compiled by Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere.
|
||||
|
||||
The dataset is intended to encourage development of algorithms in the field of [music information retrieval](https://en.wikipedia.org/wiki/Music_information_retrieval). The [data for each track](http://labrosa.ee.columbia.edu/millionsong/pages/example-track-description) includes both textual features such as artist and album names, numerical descriptors such as duration and various audio features derived using a music analysis platform provided by [The Echo Nest](https://en.wikipedia.org/wiki/The_Echo_Nest) (since acquired by Spotify). Of the various audio features and segmentations included in the full dataset, the most detailed information is included at a 'segment' level: each segment corresponds to an automatically identified 'quasi-stable music event' - roughly contiguous sections of the audio with similar perceptual quality. The number of segments per track is variable and each segment can itself be of variable length - typically they seem to be around 0.2 - 0.4 seconds but can be as long as 10 seconds or more.
|
||||
|
||||
For each segment of the track various extracted audio features are available - a 12 dimensional vector of [chroma features](https://en.wikipedia.org/wiki/Chroma_feature), a 12 dimensional vector of ['MFCC-like'](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum) timbre features and various measures of the loudness of the segment, including loudness at the segment start and maximum loudness. In the version of the data we provide, we include a 25 dimensional vector for each included segment, consisting of the 12 timbre features, 12 chroma features and loudness at start of segment concatenated in that order. To allow easier integration into standard feedforward models, the basic version of the data we provide includes features only for a fixed length crop of the central 120 segments of each track (tracks with fewer than 120 segments are therefore not included). This gives an overall input dimension per track of 120×25=3000. Each of the 3000 input dimensions has been preprocessed by subtracting the per-dimension mean across the training data and dividing by the per-dimension standard deviation across the training data.
|
||||
|
||||
We provide data providers for the fixed length crops versions of the input features, with the inputs being returned in batches of 3000 dimensional vectors (these can be reshaped to (120, 25) to get the per-segment features). To allow for more complex variable-length sequence modelling with for example recurrent neural networks, we also provide a variable length version of the data. This is only provided as compressed NumPy (`.npz`) data files rather than data provider objects - you will need to write your own data provider if you wish to use this version of the data. As the inputs are of variable number of segments they have been ['bucketed'](https://www.tensorflow.org/tutorials/seq2seq/#bucketing_and_padding) into groups of similar maximum length, with the following binning scheme used:
|
||||
|
||||
120 - 250 segments
|
||||
251 - 500 segments
|
||||
501 - 650 segments
|
||||
651 - 800 segments
|
||||
801 - 950 segments
|
||||
951 - 1200 segments
|
||||
1201 - 2000 segments
|
||||
2000 - 4000 segments
|
||||
|
||||
For each bucket the NumPy data files include inputs and targets arrays with second dimension equal to the maximum segment size in the bucket (e.g. 250 for the first bucket) and first dimension equal to the number of tracks whose number of segments falls in that bucket. These are named `inputs_{n}` and `targets_{n}` in the data file where `{n}` is the maximal number of segments in the bucket e.g. `inputs_250` and `targets_250` for the first bucket. For tracks with fewer segments than the maximum size in the bucket, the features for the track have been padded with `NaN` values. For tracks with more segments than the maximum bucket size of 4000, only the first 4000 segments have been included.
|
||||
|
||||
To allow you to match tracks between the fixed length and variable length datasets, the data files also include an array for each bucket giving the indices of the corresponding track in the fixed length input arrays. For example the array `indices_250` will be an array of the same size as the first dimension of `inputs_250` and `targets_250` with the first element of `indices_250` giving the index into the `inputs` and `targets` array of the fixed length data corresponding to first element of `inputs_250` and `targets_250`.
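As a rough illustration of working with one bucket (the archive name below is a placeholder - substitute whichever variable length `.npz` file you copied from the AFS directory):

```python
import numpy as np

# Placeholder file name -- use the actual variable length archive you copied.
data = np.load('msd-10-genre-train-var-length.npz')
print(data.files)  # e.g. ['inputs_250', 'targets_250', 'indices_250', ...]

inputs_250 = data['inputs_250']    # tracks with at most 250 segments, NaN-padded
targets_250 = data['targets_250']
indices_250 = data['indices_250']  # positions of these tracks in the fixed length arrays

# Count the non-padded (real) feature values per track, using the NaN padding as a mask.
valid_counts = (~np.isnan(inputs_250.reshape(inputs_250.shape[0], -1))).sum(axis=1)
print(valid_counts[:5])
```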
|
||||
|
||||
The Million Song Dataset in its original form does not provide any genre labels; however, various external groups have proposed genre labels for portions of the data by cross-referencing the track IDs against external music tagging databases. Analogously to the provision of both simpler and more complex classification tasks for the CIFAR-10 / CIFAR-100 datasets, we provide two classification task datasets derived from the Million Song Dataset - one with 10 coarser level genre classes, and another with 25 finer-grained genre / style classifications.
|
||||
|
||||
The 10-genre classification task uses the [*CD2C tagtraum genre annotations*](http://www.tagtraum.com/msd_genre_datasets.html) derived from multiple source databases (beaTunes genre dataset, Last.fm dataset, Top-MAGD dataset), with the *CD2C* variant using only non-ambiguous annotations (i.e. not including tracks with multiple genre labels). Of the 15 genre labels provided in the CD2C annotations, 5 (World, Latin, Punk, Folk and New Age) were not included due to having fewer than 5000 examples available. This left 10 remaining genre classes:
|
||||
|
||||
Rap
|
||||
Rock
|
||||
RnB
|
||||
Electronic
|
||||
Metal
|
||||
Blues
|
||||
Pop
|
||||
Jazz
|
||||
Country
|
||||
Reggae
|
||||
|
||||
For each of these 10 classes, 5000 labelled examples have been collected for training / validation (i.e. 50000 examples in total) and a further 1000 examples per class for testing, with the exception of the `Blues` class for which only 991 testing examples are provided due to there being insufficient labelled tracks of the minimum required length (i.e. a total of 9991 test examples).
|
||||
|
||||
The 9991 test set examples have labels provided: however, to avoid any accidental over-fitting to the test set **you should only use these for the final evaluation of your model(s)**. If you repeatedly evaluate models on the test set during model development it is easy to end up indirectly fitting to the test labels - for those who have not already read it see this [excellent cautionary note in the MLPR notes by Iain Murray](http://www.inf.ed.ac.uk/teaching/courses/mlpr/2016/notes/w2a_train_test_val.html#warning-dont-fool-yourself-or-make-a-fool-of-yourself).
|
||||
|
||||
|
||||
The 25-genre classification task uses the [*MSD Allmusic Style Dataset*](http://www.ifs.tuwien.ac.at/mir/msd/MASD.html) labels derived from the [AllMusic.com](http://www.allmusic.com/) database by [Alexander Schindler, Rudolf Mayer and Andreas Rauber of Vienna University of Technology](http://www.ifs.tuwien.ac.at/~schindler/pubs/ISMIR2012.pdf). The 25 genre / style labels used are:
|
||||
|
||||
Big Band
|
||||
Blues Contemporary
|
||||
Country Traditional
|
||||
Dance
|
||||
Electronica
|
||||
Experimental
|
||||
Folk International
|
||||
Gospel
|
||||
Grunge Emo
|
||||
Hip Hop Rap
|
||||
Jazz Classic
|
||||
Metal Alternative
|
||||
Metal Death
|
||||
Metal Heavy
|
||||
Pop Contemporary
|
||||
Pop Indie
|
||||
Pop Latin
|
||||
Punk
|
||||
Reggae
|
||||
RnB Soul
|
||||
Rock Alternative
|
||||
Rock College
|
||||
Rock Contemporary
|
||||
Rock Hard
|
||||
Rock Neo Psychedelia
|
||||
|
||||
For each of these 25 classes, 2000 labelled examples have been collected for training / validation (i.e. 50000 examples in total). A further 400 examples per class have been collected for testing (i.e. 10000 examples in total), for which you are provided inputs but not targets. The optional Kaggle competition being run for this dataset (see email) is scored based on the 25-genre class label predictions on these unlabelled test inputs.
|
||||
|
||||
The tracks used for the 25-genre classification task only partially overlap with those used for the 10-genre classification task and we do not provide any mapping between the two.
|
||||
|
||||
For each of the two tasks, the 50000 examples collected for training have been pre-split in to a 40000 example training dataset and a 10000 example validation dataset. If you wish to use a more complex cross-fold validation scheme you may want to combine these two portions of the dataset and define your own functions / classes for separating out a validation set.
|
||||
|
||||
Data provider classes for the fixed length input data for both the 10 and 25 genre classification tasks are provided in the `mlp.data_providers` module as `MSD10GenreDataProvider` and `MSD25GenreDataProvider`. Both have similar behaviour to the `MNISTDataProvider` used extensively last semester. A `which_set` argument can be used to specify whether to return a data provider for the training dataset (`which_set='train'`) or validation dataset (`which_set='valid'`). Both data provider classes provide a `label_map` attribute, a list of strings giving the class labels corresponding to the integer targets (i.e. prior to conversion to a 1-of-K encoded binary vector).
|
||||
|
||||
The test dataset files for the 10 genre classification task are provided as two separate NumPy data files `msd-10-genre-test-inputs.npz` and `msd-10-genre-test-targets.npz`. These can be loaded using [`np.load`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.load.html) function. The inputs are stored as a $10000\times3000$ array under the key `inputs` in the file `msd-10-genre-test-inputs.npz` and the targets in a 10000 element array of integer labels under the key `targets` in `msd-10-genre-test-targets.npz`. A corresponding `msd-25-genre-test-inputs.npz` file is provided for the 25 genre task inputs.
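For example, assuming the `MLP_DATA_DIR` environment variable has been set as described in the next section, the 10-genre test arrays can be loaded as follows:

```python
import os
import numpy as np

data_dir = os.environ['MLP_DATA_DIR']
test_inputs = np.load(os.path.join(data_dir, 'msd-10-genre-test-inputs.npz'))['inputs']
test_targets = np.load(os.path.join(data_dir, 'msd-10-genre-test-targets.npz'))['targets']

print(test_inputs.shape)   # (10000, 3000)
print(test_targets.shape)  # (10000,) integer genre labels

# Recover the per-segment view of the features: (tracks, segments, features).
per_segment_inputs = test_inputs.reshape(-1, 120, 25)
print(per_segment_inputs.shape)  # (10000, 120, 25)
```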
|
||||
|
||||
### Accessing the Million Song Dataset data
|
||||
|
||||
Before using the data provider objects you will need to make sure the data files are accessible to the `mlp` package, i.e. that they exist under the directory specified by the `MLP_DATA_DIR` path.
|
||||
|
||||
The fixed length input data and associated targets are available as compressed NumPy `.npz` files in the AFS directory ``/afs/inf.ed.ac.uk/group/teaching/mlp/data/2017-18/``.
|
||||
|
||||
If you are working on DICE one option is to redefine your `MLP_DATA_DIR` to directly point to the shared AFS data directory by editing the `env_vars.sh` start-up file for your environment. This avoids using up your DICE quota on storing the data files in your homespace, but may involve slower initial loading of the data when initialising the data providers if many people are trying to access the same files at once. The environment variable can be redefined by running
|
||||
|
||||
```
|
||||
gedit ~/miniconda3/envs/mlp/etc/conda/activate.d/env_vars.sh
|
||||
```
|
||||
|
||||
in a terminal window (assuming you installed `miniconda3` to your home directory), and changing the line
|
||||
|
||||
```
|
||||
export MLP_DATA_DIR=$HOME/mlpractical/data
|
||||
```
|
||||
|
||||
to
|
||||
|
||||
```
|
||||
export MLP_DATA_DIR="/afs/inf.ed.ac.uk/group/teaching/mlp/data/2017-18/"
|
||||
```
|
||||
|
||||
and then saving and closing the editor. You will need to reload the `mlp` environment using `source activate mlp` and restart the Jupyter notebook server in the reloaded environment for the new environment variable definition to be available.
|
||||
|
||||
Assuming your local `mlpractical` repository is in your home directory you should be able to copy the required files on DICE by running
|
||||
|
||||
```
|
||||
cp /afs/inf.ed.ac.uk/group/teaching/mlp/data/2017-18/msd*.npz ~/mlpractical/data
|
||||
```
|
||||
|
||||
On a non-DICE machine, you will need to either [set up local access to AFS](http://computing.help.inf.ed.ac.uk/informatics-filesystem), use a remote file transfer client like `scp` or you can alternatively download the files using the iFile web interface [here](https://ifile.inf.ed.ac.uk/?path=%2Fafs%2Finf.ed.ac.uk%2Fgroup%2Fteaching%2Fmlp%2Fdata&goChange=Go) (requires DICE credentials).
|
||||
|
||||
As some of the files are quite large you may wish to copy only those you are using currently (e.g. only the files for one of the two tasks) to your local filespace to avoid filling up your quota. The `msd-25-genre-test-inputs.npz` file will only be needed by those intending to enter the associated optional Kaggle competition.
|
166
notes/gpu-cluster-quick-start.md
Normal file
166
notes/gpu-cluster-quick-start.md
Normal file
@ -0,0 +1,166 @@
|
||||
# GPU Cluster Quick-Start Guide
|
||||
|
||||
This guide is intended to introduce students to the basics of using the mlp1/mlp2 GPU clusters. It is not intended to be
|
||||
an exhaustive guide that goes deep into micro-details of the Slurm ecosystem. For an exhaustive guide please visit
|
||||
[the Slurm Documentation page.](https://slurm.schedmd.com/)
|
||||
|
||||
## What is the GPU Cluster?
|
||||
It is a cluster of rack-mounted servers, each equipped with 8 NVIDIA GTX 1060 6GB GPUs. Initially there are 9 servers (72 GPUs) available for use; during February this should grow to 25 servers (200 GPUs). The system is managed using the open source cluster management software named
|
||||
[Slurm](https://slurm.schedmd.com/overview.html). Slurm has various advantages over the competition, including full
|
||||
support of GPU resource scheduling.
|
||||
|
||||
## Why do I need it?
|
||||
Most deep learning experiments require a large amount of compute, as you noticed in term 1. Using a GPU can
accelerate experiments by around 30-50x, making experiments that would otherwise take far too long feasible. As a
simple example, consider an experiment that takes a month to run - that would be infeasible to actually do research
with. If the same experiment takes only a day, you can iterate over methodologies, tune hyperparameters and overall
try far more things. This simple example captures one of the main reasons behind the GPU hype that surrounds
machine learning research today.
|
||||
|
||||
## Getting Started
|
||||
|
||||
### Accessing the Cluster:
|
||||
1. If you are not on a DICE machine, then ssh into your dice home using ```ssh sxxxxxx@student.ssh.inf.ed.ac.uk```
|
||||
2. Then ssh into either mlp1 or mlp2 which are the headnodes of the GPU cluster - it does not matter which you use. To do that
|
||||
run ```ssh mlp1``` or ```ssh mlp2```.
|
||||
3. You are now logged into the gpu cluster. If this is your first time logging in you'll need to build your environment. This is because your home directory on the GPU cluster is separate to your usual AFS home directory on DICE.
|
||||
|
||||
### Installing requirements:
|
||||
1. Start by downloading the miniconda3 installation file using
|
||||
```wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh```.
|
||||
2. Now run the installation using ```bash Miniconda3-latest-Linux-x86_64.sh```. At the first prompt reply yes.
|
||||
```
|
||||
Do you accept the license terms? [yes|no]
|
||||
[no] >>> yes
|
||||
```
|
||||
3. At the second prompt simply press enter.
|
||||
```
|
||||
Miniconda3 will now be installed into this location:
|
||||
/home/sxxxxxxx/miniconda3
|
||||
|
||||
- Press ENTER to confirm the location
|
||||
- Press CTRL-C to abort the installation
|
||||
- Or specify a different location below
|
||||
```
|
||||
4. Now you need to activate your environment by first running:
|
||||
```source .bashrc```.
|
||||
This reloads .bashrc which includes the new miniconda path.
|
||||
5. Run ```source activate``` to load miniconda root.
|
||||
6. Now run ```conda create -n mlp python=3``` this will create the mlp environment. At the prompt choose y.
|
||||
7. Now run ```source activate mlp```.
|
||||
8. Install git using ```conda install git```. Then configure git using:
|
||||
```
|
||||
git config --global user.name "[your name]"
|
||||
git config --global user.email "[matric-number]@sms.ed.ac.uk"
|
||||
```
|
||||
9. Now clone the mlpractical repo using ```git clone https://github.com/CSTR-Edinburgh/mlpractical.git```.
|
||||
10. Checkout the mlp_tf_tutorial branch using ```git checkout mlp2017-8/mlp_tf_tutorial```.
|
||||
11. ```cd mlpractical``` and then install the required packages using ```pip install -r requirements_gpu.txt```.
|
||||
12. Once this is done you will need to set up the MLP_DATA path using the following block of commands:
```bash
cd ~/miniconda3/envs/mlp
mkdir -p ./etc/conda/activate.d
mkdir -p ./etc/conda/deactivate.d
echo -e '#!/bin/sh\n' >> ./etc/conda/activate.d/env_vars.sh
echo "export MLP_DATA_DIR=$HOME/mlpractical/data" >> ./etc/conda/activate.d/env_vars.sh
echo -e '#!/bin/sh\n' >> ./etc/conda/deactivate.d/env_vars.sh
echo 'unset MLP_DATA_DIR' >> ./etc/conda/deactivate.d/env_vars.sh
export MLP_DATA_DIR=$HOME/mlpractical/data
```
13. That is all of the required installation. Proceed to the next section, which outlines how to use the Slurm cluster
management software. Please remember to clean up your setup files using ```conda clean -t```.
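
The commands above register small hook scripts that export ```MLP_DATA_DIR``` whenever the mlp environment is
activated and unset it again when it is deactivated. As a quick sanity check (a minimal sketch, assuming the mlp
environment and the data directory exist as set up above):

```bash
# Activate the mlp environment so the new activate.d hook runs,
# then confirm the variable points at the data directory.
source activate mlp
echo $MLP_DATA_DIR            # expect: /home/sxxxxxxx/mlpractical/data
# Deactivating should unset it again via the deactivate.d hook.
source deactivate
echo ${MLP_DATA_DIR:-unset}   # expect: unset
```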

### Using Slurm

Slurm provides commands to submit, delete, view and explore current jobs, nodes and resources, among other things.
To submit a job use ```sbatch script.sh```; Slurm will automatically find an available node satisfying the requested
resources and restrictions and pass the job to it. Here script.sh is the bash script containing the job that we want
to run. Since we will be using the NVIDIA CUDA and cuDNN libraries, we have provided a sample script which should be
used for your job submissions. The script is explained in detail below:

```bash
#!/bin/sh
#SBATCH -N 1          # nodes requested
#SBATCH -n 1          # tasks requested
#SBATCH --gres=gpu:1  # use 1 GPU
#SBATCH --mem=16000   # memory in Mb
#SBATCH -o outfile    # send stdout to outfile
#SBATCH -e errfile    # send stderr to errfile
#SBATCH -t 0:01:00    # time requested in hour:minute:seconds

# Setup CUDA and CUDNN related paths
export CUDA_HOME=/opt/cuda-8.0.44

export CUDNN_HOME=/opt/cuDNN-6.0_8.0

export STUDENT_ID=sxxxxxx

export LD_LIBRARY_PATH=${CUDNN_HOME}/lib64:${CUDA_HOME}/lib64:$LD_LIBRARY_PATH

export LIBRARY_PATH=${CUDNN_HOME}/lib64:$LIBRARY_PATH

export CPATH=${CUDNN_HOME}/include:$CPATH

export PATH=${CUDA_HOME}/bin:${PATH}

export PYTHON_PATH=$PATH

# Setup a folder in the very fast scratch disk which can be used for storing experiment objects and any other files
# that may require storage during execution.
mkdir -p /disk/scratch/${STUDENT_ID}

export TMPDIR=/disk/scratch/${STUDENT_ID}/
export TMP=/disk/scratch/${STUDENT_ID}/

# Activate the relevant virtual environment:
source /home/${STUDENT_ID}/miniconda3/bin/activate mlp

# Run the python script that will train our network
python network_trainer.py --batch_size 128 --epochs 200 --experiment_prefix vgg-net-emnist-sample-exp --dropout_rate 0.4 --batch_norm_use True --strided_dim_reduction True --seed 25012018
```
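
A convenient pattern for running several experiments is to submit one job per hyperparameter setting. A minimal
sketch, assuming you change the hard-coded python command at the end of the sample script to
```python network_trainer.py "$@"``` so that any arguments given to ```sbatch``` after the script name are forwarded
to the trainer:

```bash
# Submit one job per dropout rate; everything after the script name is
# passed to the batch script as positional arguments ("$@" inside it).
for rate in 0.2 0.4 0.6; do
    sbatch gpu_cluster_tutorial_training_script.sh \
        --batch_size 128 --epochs 200 \
        --experiment_prefix vgg-net-emnist-dropout-${rate} \
        --dropout_rate ${rate} --batch_norm_use True \
        --strided_dim_reduction True --seed 25012018
done
```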

To actually run this use ```sbatch gpu_cluster_tutorial_training_script.sh```. When you do this, the job will be
submitted and you will be given a job id.
```bash
[burly]sxxxxxxx: sbatch gpu_cluster_tutorial_training_script.sh
Submitted batch job 147
```
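
The sample script sends stdout to ```outfile``` and stderr to ```errfile``` (the ```-o``` and ```-e``` options above),
written in the directory you submitted the job from, so a simple way to follow a running job is to tail those files.
A minimal sketch, assuming job 147 from the example above:

```bash
# Follow the training log as the job writes it; Ctrl-C stops tailing,
# not the job itself.
tail -f outfile
# If something looks wrong, check the error stream as well.
tail -n 50 errfile
```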

To view a list of all running jobs use ```squeue``` for a minimal presentation or ```smap``` for a more involved one.
To view node information use ```sinfo```.
```bash
squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               143 interacti     bash    iainr  R       8:00      1 landonia05
               147 interacti gpu_clus sxxxxxxx  R       1:05      1 landonia02
```
If you want to stop/delete a job use ```scancel job_id```, where job_id is the id of the job.

Furthermore, if you want to test some of your code interactively to prototype your solution before submitting it to
a node, you can use ```srun -p interactive --gres=gpu:2 --pty python my_code_exp.py```.
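
If you prefer to prototype from a shell on a GPU node rather than launching a single script, the same ```srun```
options can be used to open an interactive session. A minimal sketch, assuming the interactive partition shown above
and that a single GPU is enough for prototyping:

```bash
# Open an interactive bash session on a node in the interactive
# partition with one GPU allocated; exiting the shell releases it.
srun -p interactive --gres=gpu:1 --pty bash
# Inside the session the usual environment setup still applies:
source /home/sxxxxxxx/miniconda3/bin/activate mlp
```
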
## Slurm Cheatsheet

For a nice list of the most commonly used Slurm commands please visit [this cheatsheet](https://bitsanddragons.wordpress.com/2017/04/12/slurm-user-cheatsheet/).

## Syncing or copying data over to DICE

At some point you will need to copy your data to DICE so you can analyse it, produce charts, write reports, store it
for future use, and so on. There are a couple of ways to do that:
1. If you want to fetch your files while you are on DICE, simply run ```scp mlp1:/home/<username>/output output```,
where username is your student id and output is the file you want to copy. Use ```scp -r``` for folders. You might
also want to sync only new or changed files, which you can do with rsync:
```rsync -ua --progress mlp1:/home/<username>/project_dir target_dir```. The option -u skips files that are already
up to date on the receiving side, -a (archive mode) recurses into folders and preserves permissions and timestamps,
and --progress shows what is being sent and how fast. rsync is also useful in the other direction, when you write
code locally and want to push it to the cluster: because it only transfers changed files it saves both compute time
and human time, since you won't have to spend time figuring out what to send (see the sketch after this list).
2. If you want to send your files to DICE while you are on mlp1 or mlp2, first run ```renc```, give your password and
press enter. Then run:
```
cp ~/output /afs/inf.ed.ac.uk/u/s/<studentUUN>
```
This should directly copy the files to AFS. Alternatively, you can use rsync as shown before.
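
A minimal sketch of the push direction, assuming your working copy lives in ```~/mlpractical``` on DICE and under the
same name in your cluster home directory:

```bash
# Push the local working copy to the cluster, transferring only files
# that have changed; the trailing slashes sync directory contents.
rsync -ua --progress ~/mlpractical/ mlp1:/home/<username>/mlpractical/
```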

## Additional Help

If you require additional help, as usual please post on Piazza or ask at the tech support helpdesk.

@ -1,79 +0,0 @@
|
||||
% ALGORITHM STYLE -- Released 8 April 1996
|
||||
% for LaTeX-2e
|
||||
% Copyright -- 1994 Peter Williams
|
||||
% E-mail Peter.Williams@dsto.defence.gov.au
|
||||
\NeedsTeXFormat{LaTeX2e}
|
||||
\ProvidesPackage{algorithm}
|
||||
\typeout{Document Style `algorithm' - floating environment}
|
||||
|
||||
\RequirePackage{float}
|
||||
\RequirePackage{ifthen}
|
||||
\newcommand{\ALG@within}{nothing}
|
||||
\newboolean{ALG@within}
|
||||
\setboolean{ALG@within}{false}
|
||||
\newcommand{\ALG@floatstyle}{ruled}
|
||||
\newcommand{\ALG@name}{Algorithm}
|
||||
\newcommand{\listalgorithmname}{List of \ALG@name s}
|
||||
|
||||
% Declare Options
|
||||
% first appearance
|
||||
\DeclareOption{plain}{
|
||||
\renewcommand{\ALG@floatstyle}{plain}
|
||||
}
|
||||
\DeclareOption{ruled}{
|
||||
\renewcommand{\ALG@floatstyle}{ruled}
|
||||
}
|
||||
\DeclareOption{boxed}{
|
||||
\renewcommand{\ALG@floatstyle}{boxed}
|
||||
}
|
||||
% then numbering convention
|
||||
\DeclareOption{part}{
|
||||
\renewcommand{\ALG@within}{part}
|
||||
\setboolean{ALG@within}{true}
|
||||
}
|
||||
\DeclareOption{chapter}{
|
||||
\renewcommand{\ALG@within}{chapter}
|
||||
\setboolean{ALG@within}{true}
|
||||
}
|
||||
\DeclareOption{section}{
|
||||
\renewcommand{\ALG@within}{section}
|
||||
\setboolean{ALG@within}{true}
|
||||
}
|
||||
\DeclareOption{subsection}{
|
||||
\renewcommand{\ALG@within}{subsection}
|
||||
\setboolean{ALG@within}{true}
|
||||
}
|
||||
\DeclareOption{subsubsection}{
|
||||
\renewcommand{\ALG@within}{subsubsection}
|
||||
\setboolean{ALG@within}{true}
|
||||
}
|
||||
\DeclareOption{nothing}{
|
||||
\renewcommand{\ALG@within}{nothing}
|
||||
\setboolean{ALG@within}{true}
|
||||
}
|
||||
\DeclareOption*{\edef\ALG@name{\CurrentOption}}
|
||||
|
||||
% ALGORITHM
|
||||
%
|
||||
\ProcessOptions
|
||||
\floatstyle{\ALG@floatstyle}
|
||||
\ifthenelse{\boolean{ALG@within}}{
|
||||
\ifthenelse{\equal{\ALG@within}{part}}
|
||||
{\newfloat{algorithm}{htbp}{loa}[part]}{}
|
||||
\ifthenelse{\equal{\ALG@within}{chapter}}
|
||||
{\newfloat{algorithm}{htbp}{loa}[chapter]}{}
|
||||
\ifthenelse{\equal{\ALG@within}{section}}
|
||||
{\newfloat{algorithm}{htbp}{loa}[section]}{}
|
||||
\ifthenelse{\equal{\ALG@within}{subsection}}
|
||||
{\newfloat{algorithm}{htbp}{loa}[subsection]}{}
|
||||
\ifthenelse{\equal{\ALG@within}{subsubsection}}
|
||||
{\newfloat{algorithm}{htbp}{loa}[subsubsection]}{}
|
||||
\ifthenelse{\equal{\ALG@within}{nothing}}
|
||||
{\newfloat{algorithm}{htbp}{loa}}{}
|
||||
}{
|
||||
\newfloat{algorithm}{htbp}{loa}
|
||||
}
|
||||
\floatname{algorithm}{\ALG@name}
|
||||
|
||||
\newcommand{\listofalgorithms}{\listof{algorithm}{\listalgorithmname}}
|
||||
|
@ -1,201 +0,0 @@
|
||||
% ALGORITHMIC STYLE -- Released 8 APRIL 1996
|
||||
% for LaTeX version 2e
|
||||
% Copyright -- 1994 Peter Williams
|
||||
% E-mail PeterWilliams@dsto.defence.gov.au
|
||||
%
|
||||
% Modified by Alex Smola (08/2000)
|
||||
% E-mail Alex.Smola@anu.edu.au
|
||||
%
|
||||
\NeedsTeXFormat{LaTeX2e}
|
||||
\ProvidesPackage{algorithmic}
|
||||
\typeout{Document Style `algorithmic' - environment}
|
||||
%
|
||||
\RequirePackage{ifthen}
|
||||
\RequirePackage{calc}
|
||||
\newboolean{ALC@noend}
|
||||
\setboolean{ALC@noend}{false}
|
||||
\newcounter{ALC@line}
|
||||
\newcounter{ALC@rem}
|
||||
\newlength{\ALC@tlm}
|
||||
%
|
||||
\DeclareOption{noend}{\setboolean{ALC@noend}{true}}
|
||||
%
|
||||
\ProcessOptions
|
||||
%
|
||||
% ALGORITHMIC
|
||||
\newcommand{\algorithmicrequire}{\textbf{Require:}}
|
||||
\newcommand{\algorithmicensure}{\textbf{Ensure:}}
|
||||
\newcommand{\algorithmiccomment}[1]{\{#1\}}
|
||||
\newcommand{\algorithmicend}{\textbf{end}}
|
||||
\newcommand{\algorithmicif}{\textbf{if}}
|
||||
\newcommand{\algorithmicthen}{\textbf{then}}
|
||||
\newcommand{\algorithmicelse}{\textbf{else}}
|
||||
\newcommand{\algorithmicelsif}{\algorithmicelse\ \algorithmicif}
|
||||
\newcommand{\algorithmicendif}{\algorithmicend\ \algorithmicif}
|
||||
\newcommand{\algorithmicfor}{\textbf{for}}
|
||||
\newcommand{\algorithmicforall}{\textbf{for all}}
|
||||
\newcommand{\algorithmicdo}{\textbf{do}}
|
||||
\newcommand{\algorithmicendfor}{\algorithmicend\ \algorithmicfor}
|
||||
\newcommand{\algorithmicwhile}{\textbf{while}}
|
||||
\newcommand{\algorithmicendwhile}{\algorithmicend\ \algorithmicwhile}
|
||||
\newcommand{\algorithmicloop}{\textbf{loop}}
|
||||
\newcommand{\algorithmicendloop}{\algorithmicend\ \algorithmicloop}
|
||||
\newcommand{\algorithmicrepeat}{\textbf{repeat}}
|
||||
\newcommand{\algorithmicuntil}{\textbf{until}}
|
||||
|
||||
%changed by alex smola
|
||||
\newcommand{\algorithmicinput}{\textbf{input}}
|
||||
\newcommand{\algorithmicoutput}{\textbf{output}}
|
||||
\newcommand{\algorithmicset}{\textbf{set}}
|
||||
\newcommand{\algorithmictrue}{\textbf{true}}
|
||||
\newcommand{\algorithmicfalse}{\textbf{false}}
|
||||
\newcommand{\algorithmicand}{\textbf{and\ }}
|
||||
\newcommand{\algorithmicor}{\textbf{or\ }}
|
||||
\newcommand{\algorithmicfunction}{\textbf{function}}
|
||||
\newcommand{\algorithmicendfunction}{\algorithmicend\ \algorithmicfunction}
|
||||
\newcommand{\algorithmicmain}{\textbf{main}}
|
||||
\newcommand{\algorithmicendmain}{\algorithmicend\ \algorithmicmain}
|
||||
%end changed by alex smola
|
||||
|
||||
\def\ALC@item[#1]{%
|
||||
\if@noparitem \@donoparitem
|
||||
\else \if@inlabel \indent \par \fi
|
||||
\ifhmode \unskip\unskip \par \fi
|
||||
\if@newlist \if@nobreak \@nbitem \else
|
||||
\addpenalty\@beginparpenalty
|
||||
\addvspace\@topsep \addvspace{-\parskip}\fi
|
||||
\else \addpenalty\@itempenalty \addvspace\itemsep
|
||||
\fi
|
||||
\global\@inlabeltrue
|
||||
\fi
|
||||
\everypar{\global\@minipagefalse\global\@newlistfalse
|
||||
\if@inlabel\global\@inlabelfalse \hskip -\parindent \box\@labels
|
||||
\penalty\z@ \fi
|
||||
\everypar{}}\global\@nobreakfalse
|
||||
\if@noitemarg \@noitemargfalse \if@nmbrlist \refstepcounter{\@listctr}\fi \fi
|
||||
\sbox\@tempboxa{\makelabel{#1}}%
|
||||
\global\setbox\@labels
|
||||
\hbox{\unhbox\@labels \hskip \itemindent
|
||||
\hskip -\labelwidth \hskip -\ALC@tlm
|
||||
\ifdim \wd\@tempboxa >\labelwidth
|
||||
\box\@tempboxa
|
||||
\else \hbox to\labelwidth {\unhbox\@tempboxa}\fi
|
||||
\hskip \ALC@tlm}\ignorespaces}
|
||||
%
|
||||
\newenvironment{algorithmic}[1][0]{
|
||||
\let\@item\ALC@item
|
||||
\newcommand{\ALC@lno}{%
|
||||
\ifthenelse{\equal{\arabic{ALC@rem}}{0}}
|
||||
{{\footnotesize \arabic{ALC@line}:}}{}%
|
||||
}
|
||||
\let\@listii\@listi
|
||||
\let\@listiii\@listi
|
||||
\let\@listiv\@listi
|
||||
\let\@listv\@listi
|
||||
\let\@listvi\@listi
|
||||
\let\@listvii\@listi
|
||||
\newenvironment{ALC@g}{
|
||||
\begin{list}{\ALC@lno}{ \itemsep\z@ \itemindent\z@
|
||||
\listparindent\z@ \rightmargin\z@
|
||||
\topsep\z@ \partopsep\z@ \parskip\z@\parsep\z@
|
||||
\leftmargin 1em
|
||||
\addtolength{\ALC@tlm}{\leftmargin}
|
||||
}
|
||||
}
|
||||
{\end{list}}
|
||||
\newcommand{\ALC@it}{\addtocounter{ALC@line}{1}\addtocounter{ALC@rem}{1}\ifthenelse{\equal{\arabic{ALC@rem}}{#1}}{\setcounter{ALC@rem}{0}}{}\item}
|
||||
\newcommand{\ALC@com}[1]{\ifthenelse{\equal{##1}{default}}%
|
||||
{}{\ \algorithmiccomment{##1}}}
|
||||
\newcommand{\REQUIRE}{\item[\algorithmicrequire]}
|
||||
\newcommand{\ENSURE}{\item[\algorithmicensure]}
|
||||
\newcommand{\STATE}{\ALC@it}
|
||||
\newcommand{\COMMENT}[1]{\algorithmiccomment{##1}}
|
||||
%changes by alex smola
|
||||
\newcommand{\INPUT}{\item[\algorithmicinput]}
|
||||
\newcommand{\OUTPUT}{\item[\algorithmicoutput]}
|
||||
\newcommand{\SET}{\item[\algorithmicset]}
|
||||
% \newcommand{\TRUE}{\algorithmictrue}
|
||||
% \newcommand{\FALSE}{\algorithmicfalse}
|
||||
\newcommand{\AND}{\algorithmicand}
|
||||
\newcommand{\OR}{\algorithmicor}
|
||||
\newenvironment{ALC@func}{\begin{ALC@g}}{\end{ALC@g}}
|
||||
\newenvironment{ALC@main}{\begin{ALC@g}}{\end{ALC@g}}
|
||||
%end changes by alex smola
|
||||
\newenvironment{ALC@if}{\begin{ALC@g}}{\end{ALC@g}}
|
||||
\newenvironment{ALC@for}{\begin{ALC@g}}{\end{ALC@g}}
|
||||
\newenvironment{ALC@whl}{\begin{ALC@g}}{\end{ALC@g}}
|
||||
\newenvironment{ALC@loop}{\begin{ALC@g}}{\end{ALC@g}}
|
||||
\newenvironment{ALC@rpt}{\begin{ALC@g}}{\end{ALC@g}}
|
||||
\renewcommand{\\}{\@centercr}
|
||||
\newcommand{\IF}[2][default]{\ALC@it\algorithmicif\ ##2\ \algorithmicthen%
|
||||
\ALC@com{##1}\begin{ALC@if}}
|
||||
\newcommand{\SHORTIF}[2]{\ALC@it\algorithmicif\ ##1\
|
||||
\algorithmicthen\ {##2}}
|
||||
\newcommand{\ELSE}[1][default]{\end{ALC@if}\ALC@it\algorithmicelse%
|
||||
\ALC@com{##1}\begin{ALC@if}}
|
||||
\newcommand{\ELSIF}[2][default]%
|
||||
{\end{ALC@if}\ALC@it\algorithmicelsif\ ##2\ \algorithmicthen%
|
||||
\ALC@com{##1}\begin{ALC@if}}
|
||||
\newcommand{\FOR}[2][default]{\ALC@it\algorithmicfor\ ##2\ \algorithmicdo%
|
||||
\ALC@com{##1}\begin{ALC@for}}
|
||||
\newcommand{\FORALL}[2][default]{\ALC@it\algorithmicforall\ ##2\ %
|
||||
\algorithmicdo%
|
||||
\ALC@com{##1}\begin{ALC@for}}
|
||||
\newcommand{\SHORTFORALL}[2]{\ALC@it\algorithmicforall\ ##1\ %
|
||||
\algorithmicdo\ {##2}}
|
||||
\newcommand{\WHILE}[2][default]{\ALC@it\algorithmicwhile\ ##2\ %
|
||||
\algorithmicdo%
|
||||
\ALC@com{##1}\begin{ALC@whl}}
|
||||
\newcommand{\LOOP}[1][default]{\ALC@it\algorithmicloop%
|
||||
\ALC@com{##1}\begin{ALC@loop}}
|
||||
%changed by alex smola
|
||||
\newcommand{\FUNCTION}[2][default]{\ALC@it\algorithmicfunction\ ##2\ %
|
||||
\ALC@com{##1}\begin{ALC@func}}
|
||||
\newcommand{\MAIN}[2][default]{\ALC@it\algorithmicmain\ ##2\ %
|
||||
\ALC@com{##1}\begin{ALC@main}}
|
||||
%end changed by alex smola
|
||||
\newcommand{\REPEAT}[1][default]{\ALC@it\algorithmicrepeat%
|
||||
\ALC@com{##1}\begin{ALC@rpt}}
|
||||
\newcommand{\UNTIL}[1]{\end{ALC@rpt}\ALC@it\algorithmicuntil\ ##1}
|
||||
\ifthenelse{\boolean{ALC@noend}}{
|
||||
\newcommand{\ENDIF}{\end{ALC@if}}
|
||||
\newcommand{\ENDFOR}{\end{ALC@for}}
|
||||
\newcommand{\ENDWHILE}{\end{ALC@whl}}
|
||||
\newcommand{\ENDLOOP}{\end{ALC@loop}}
|
||||
\newcommand{\ENDFUNCTION}{\end{ALC@func}}
|
||||
\newcommand{\ENDMAIN}{\end{ALC@main}}
|
||||
}{
|
||||
\newcommand{\ENDIF}{\end{ALC@if}\ALC@it\algorithmicendif}
|
||||
\newcommand{\ENDFOR}{\end{ALC@for}\ALC@it\algorithmicendfor}
|
||||
\newcommand{\ENDWHILE}{\end{ALC@whl}\ALC@it\algorithmicendwhile}
|
||||
\newcommand{\ENDLOOP}{\end{ALC@loop}\ALC@it\algorithmicendloop}
|
||||
\newcommand{\ENDFUNCTION}{\end{ALC@func}\ALC@it\algorithmicendfunction}
|
||||
\newcommand{\ENDMAIN}{\end{ALC@main}\ALC@it\algorithmicendmain}
|
||||
}
|
||||
\renewcommand{\@toodeep}{}
|
||||
\begin{list}{\ALC@lno}{\setcounter{ALC@line}{0}\setcounter{ALC@rem}{0}%
|
||||
\itemsep\z@ \itemindent\z@ \listparindent\z@%
|
||||
\partopsep\z@ \parskip\z@ \parsep\z@%
|
||||
\labelsep 0.5em \topsep 0.2em%
|
||||
\ifthenelse{\equal{#1}{0}}
|
||||
{\labelwidth 0.5em }
|
||||
{\labelwidth 1.2em }
|
||||
\leftmargin\labelwidth \addtolength{\leftmargin}{\labelsep}
|
||||
\ALC@tlm\labelsep
|
||||
}
|
||||
}
|
||||
{\end{list}}
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
@ -1,75 +0,0 @@
|
||||
@inproceedings{langley00,
|
||||
author = {P. Langley},
|
||||
title = {Crafting Papers on Machine Learning},
|
||||
year = {2000},
|
||||
pages = {1207--1216},
|
||||
editor = {Pat Langley},
|
||||
booktitle = {Proceedings of the 17th International Conference
|
||||
on Machine Learning (ICML 2000)},
|
||||
address = {Stanford, CA},
|
||||
publisher = {Morgan Kaufmann}
|
||||
}
|
||||
|
||||
@TechReport{mitchell80,
|
||||
author = "T. M. Mitchell",
|
||||
title = "The Need for Biases in Learning Generalizations",
|
||||
institution = "Computer Science Department, Rutgers University",
|
||||
year = "1980",
|
||||
address = "New Brunswick, MA",
|
||||
}
|
||||
|
||||
@phdthesis{kearns89,
|
||||
author = {M. J. Kearns},
|
||||
title = {Computational Complexity of Machine Learning},
|
||||
school = {Department of Computer Science, Harvard University},
|
||||
year = {1989}
|
||||
}
|
||||
|
||||
@Book{MachineLearningI,
|
||||
editor = "R. S. Michalski and J. G. Carbonell and T.
|
||||
M. Mitchell",
|
||||
title = "Machine Learning: An Artificial Intelligence
|
||||
Approach, Vol. I",
|
||||
publisher = "Tioga",
|
||||
year = "1983",
|
||||
address = "Palo Alto, CA"
|
||||
}
|
||||
|
||||
@Book{DudaHart2nd,
|
||||
author = "R. O. Duda and P. E. Hart and D. G. Stork",
|
||||
title = "Pattern Classification",
|
||||
publisher = "John Wiley and Sons",
|
||||
edition = "2nd",
|
||||
year = "2000"
|
||||
}
|
||||
|
||||
@misc{anonymous,
|
||||
title= {Suppressed for Anonymity},
|
||||
author= {Author, N. N.},
|
||||
year= {2011},
|
||||
}
|
||||
|
||||
@InCollection{Newell81,
|
||||
author = "A. Newell and P. S. Rosenbloom",
|
||||
title = "Mechanisms of Skill Acquisition and the Law of
|
||||
Practice",
|
||||
booktitle = "Cognitive Skills and Their Acquisition",
|
||||
pages = "1--51",
|
||||
publisher = "Lawrence Erlbaum Associates, Inc.",
|
||||
year = "1981",
|
||||
editor = "J. R. Anderson",
|
||||
chapter = "1",
|
||||
address = "Hillsdale, NJ"
|
||||
}
|
||||
|
||||
|
||||
@Article{Samuel59,
|
||||
author = "A. L. Samuel",
|
||||
title = "Some Studies in Machine Learning Using the Game of
|
||||
Checkers",
|
||||
journal = "IBM Journal of Research and Development",
|
||||
year = "1959",
|
||||
volume = "3",
|
||||
number = "3",
|
||||
pages = "211--229"
|
||||
}
|
@ -1,485 +0,0 @@
|
||||
% fancyhdr.sty version 3.2
|
||||
% Fancy headers and footers for LaTeX.
|
||||
% Piet van Oostrum,
|
||||
% Dept of Computer and Information Sciences, University of Utrecht,
|
||||
% Padualaan 14, P.O. Box 80.089, 3508 TB Utrecht, The Netherlands
|
||||
% Telephone: +31 30 2532180. Email: piet@cs.uu.nl
|
||||
% ========================================================================
|
||||
% LICENCE:
|
||||
% This file may be distributed under the terms of the LaTeX Project Public
|
||||
% License, as described in lppl.txt in the base LaTeX distribution.
|
||||
% Either version 1 or, at your option, any later version.
|
||||
% ========================================================================
|
||||
% MODIFICATION HISTORY:
|
||||
% Sep 16, 1994
|
||||
% version 1.4: Correction for use with \reversemargin
|
||||
% Sep 29, 1994:
|
||||
% version 1.5: Added the \iftopfloat, \ifbotfloat and \iffloatpage commands
|
||||
% Oct 4, 1994:
|
||||
% version 1.6: Reset single spacing in headers/footers for use with
|
||||
% setspace.sty or doublespace.sty
|
||||
% Oct 4, 1994:
|
||||
% version 1.7: changed \let\@mkboth\markboth to
|
||||
% \def\@mkboth{\protect\markboth} to make it more robust
|
||||
% Dec 5, 1994:
|
||||
% version 1.8: corrections for amsbook/amsart: define \@chapapp and (more
|
||||
% importantly) use the \chapter/sectionmark definitions from ps@headings if
|
||||
% they exist (which should be true for all standard classes).
|
||||
% May 31, 1995:
|
||||
% version 1.9: The proposed \renewcommand{\headrulewidth}{\iffloatpage...
|
||||
% construction in the doc did not work properly with the fancyplain style.
|
||||
% June 1, 1995:
|
||||
% version 1.91: The definition of \@mkboth wasn't restored on subsequent
|
||||
% \pagestyle{fancy}'s.
|
||||
% June 1, 1995:
|
||||
% version 1.92: The sequence \pagestyle{fancyplain} \pagestyle{plain}
|
||||
% \pagestyle{fancy} would erroneously select the plain version.
|
||||
% June 1, 1995:
|
||||
% version 1.93: \fancypagestyle command added.
|
||||
% Dec 11, 1995:
|
||||
% version 1.94: suggested by Conrad Hughes <chughes@maths.tcd.ie>
|
||||
% CJCH, Dec 11, 1995: added \footruleskip to allow control over footrule
|
||||
% position (old hardcoded value of .3\normalbaselineskip is far too high
|
||||
% when used with very small footer fonts).
|
||||
% Jan 31, 1996:
|
||||
% version 1.95: call \@normalsize in the reset code if that is defined,
|
||||
% otherwise \normalsize.
|
||||
% this is to solve a problem with ucthesis.cls, as this doesn't
|
||||
% define \@currsize. Unfortunately for latex209 calling \normalsize doesn't
|
||||
% work as this is optimized to do very little, so there \@normalsize should
|
||||
% be called. Hopefully this code works for all versions of LaTeX known to
|
||||
% mankind.
|
||||
% April 25, 1996:
|
||||
% version 1.96: initialize \headwidth to a magic (negative) value to catch
|
||||
% most common cases that people change it before calling \pagestyle{fancy}.
|
||||
% Note it can't be initialized when reading in this file, because
|
||||
% \textwidth could be changed afterwards. This is quite probable.
|
||||
% We also switch to \MakeUppercase rather than \uppercase and introduce a
|
||||
% \nouppercase command for use in headers. and footers.
|
||||
% May 3, 1996:
|
||||
% version 1.97: Two changes:
|
||||
% 1. Undo the change in version 1.8 (using the pagestyle{headings} defaults
|
||||
% for the chapter and section marks. The current version of amsbook and
|
||||
% amsart classes don't seem to need them anymore. Moreover the standard
|
||||
% latex classes don't use \markboth if twoside isn't selected, and this is
|
||||
% confusing as \leftmark doesn't work as expected.
|
||||
% 2. include a call to \ps@empty in ps@@fancy. This is to solve a problem
|
||||
% in the amsbook and amsart classes, that make global changes to \topskip,
|
||||
% which are reset in \ps@empty. Hopefully this doesn't break other things.
|
||||
% May 7, 1996:
|
||||
% version 1.98:
|
||||
% Added % after the line \def\nouppercase
|
||||
% May 7, 1996:
|
||||
% version 1.99: This is the alpha version of fancyhdr 2.0
|
||||
% Introduced the new commands \fancyhead, \fancyfoot, and \fancyhf.
|
||||
% Changed \headrulewidth, \footrulewidth, \footruleskip to
|
||||
% macros rather than length parameters, In this way they can be
|
||||
% conditionalized and they don't consume length registers. There is no need
|
||||
% to have them as length registers unless you want to do calculations with
|
||||
% them, which is unlikely. Note that this may make some uses of them
|
||||
% incompatible (i.e. if you have a file that uses \setlength or \xxxx=)
|
||||
% May 10, 1996:
|
||||
% version 1.99a:
|
||||
% Added a few more % signs
|
||||
% May 10, 1996:
|
||||
% version 1.99b:
|
||||
% Changed the syntax of \f@nfor to be resistent to catcode changes of :=
|
||||
% Removed the [1] from the defs of \lhead etc. because the parameter is
|
||||
% consumed by the \@[xy]lhead etc. macros.
|
||||
% June 24, 1997:
|
||||
% version 1.99c:
|
||||
% corrected \nouppercase to also include the protected form of \MakeUppercase
|
||||
% \global added to manipulation of \headwidth.
|
||||
% \iffootnote command added.
|
||||
% Some comments added about \@fancyhead and \@fancyfoot.
|
||||
% Aug 24, 1998
|
||||
% version 1.99d
|
||||
% Changed the default \ps@empty to \ps@@empty in order to allow
|
||||
% \fancypagestyle{empty} redefinition.
|
||||
% Oct 11, 2000
|
||||
% version 2.0
|
||||
% Added LPPL license clause.
|
||||
%
|
||||
% A check for \headheight is added. An errormessage is given (once) if the
|
||||
% header is too large. Empty headers don't generate the error even if
|
||||
% \headheight is very small or even 0pt.
|
||||
% Warning added for the use of 'E' option when twoside option is not used.
|
||||
% In this case the 'E' fields will never be used.
|
||||
%
|
||||
% Mar 10, 2002
|
||||
% version 2.1beta
|
||||
% New command: \fancyhfoffset[place]{length}
|
||||
% defines offsets to be applied to the header/footer to let it stick into
|
||||
% the margins (if length > 0).
|
||||
% place is like in fancyhead, except that only E,O,L,R can be used.
|
||||
% This replaces the old calculation based on \headwidth and the marginpar
|
||||
% area.
|
||||
% \headwidth will be dynamically calculated in the headers/footers when
|
||||
% this is used.
|
||||
%
|
||||
% Mar 26, 2002
|
||||
% version 2.1beta2
|
||||
% \fancyhfoffset now also takes h,f as possible letters in the argument to
|
||||
% allow the header and footer widths to be different.
|
||||
% New commands \fancyheadoffset and \fancyfootoffset added comparable to
|
||||
% \fancyhead and \fancyfoot.
|
||||
% Errormessages and warnings have been made more informative.
|
||||
%
|
||||
% Dec 9, 2002
|
||||
% version 2.1
|
||||
% The defaults for \footrulewidth, \plainheadrulewidth and
|
||||
% \plainfootrulewidth are changed from \z@skip to 0pt. In this way when
|
||||
% someone inadvertantly uses \setlength to change any of these, the value
|
||||
% of \z@skip will not be changed, rather an errormessage will be given.
|
||||
|
||||
% March 3, 2004
|
||||
% Release of version 3.0
|
||||
|
||||
% Oct 7, 2004
|
||||
% version 3.1
|
||||
% Added '\endlinechar=13' to \fancy@reset to prevent problems with
|
||||
% includegraphics in header when verbatiminput is active.
|
||||
|
||||
% March 22, 2005
|
||||
% version 3.2
|
||||
% reset \everypar (the real one) in \fancy@reset because spanish.ldf does
|
||||
% strange things with \everypar between << and >>.
|
||||
|
||||
\def\ifancy@mpty#1{\def\temp@a{#1}\ifx\temp@a\@empty}
|
||||
|
||||
\def\fancy@def#1#2{\ifancy@mpty{#2}\fancy@gbl\def#1{\leavevmode}\else
|
||||
\fancy@gbl\def#1{#2\strut}\fi}
|
||||
|
||||
\let\fancy@gbl\global
|
||||
|
||||
\def\@fancyerrmsg#1{%
|
||||
\ifx\PackageError\undefined
|
||||
\errmessage{#1}\else
|
||||
\PackageError{Fancyhdr}{#1}{}\fi}
|
||||
\def\@fancywarning#1{%
|
||||
\ifx\PackageWarning\undefined
|
||||
\errmessage{#1}\else
|
||||
\PackageWarning{Fancyhdr}{#1}{}\fi}
|
||||
|
||||
% Usage: \@forc \var{charstring}{command to be executed for each char}
|
||||
% This is similar to LaTeX's \@tfor, but expands the charstring.
|
||||
|
||||
\def\@forc#1#2#3{\expandafter\f@rc\expandafter#1\expandafter{#2}{#3}}
|
||||
\def\f@rc#1#2#3{\def\temp@ty{#2}\ifx\@empty\temp@ty\else
|
||||
\f@@rc#1#2\f@@rc{#3}\fi}
|
||||
\def\f@@rc#1#2#3\f@@rc#4{\def#1{#2}#4\f@rc#1{#3}{#4}}
|
||||
|
||||
% Usage: \f@nfor\name:=list\do{body}
|
||||
% Like LaTeX's \@for but an empty list is treated as a list with an empty
|
||||
% element
|
||||
|
||||
\newcommand{\f@nfor}[3]{\edef\@fortmp{#2}%
|
||||
\expandafter\@forloop#2,\@nil,\@nil\@@#1{#3}}
|
||||
|
||||
% Usage: \def@ult \cs{defaults}{argument}
|
||||
% sets \cs to the characters from defaults appearing in argument
|
||||
% or defaults if it would be empty. All characters are lowercased.
|
||||
|
||||
\newcommand\def@ult[3]{%
|
||||
\edef\temp@a{\lowercase{\edef\noexpand\temp@a{#3}}}\temp@a
|
||||
\def#1{}%
|
||||
\@forc\tmpf@ra{#2}%
|
||||
{\expandafter\if@in\tmpf@ra\temp@a{\edef#1{#1\tmpf@ra}}{}}%
|
||||
\ifx\@empty#1\def#1{#2}\fi}
|
||||
%
|
||||
% \if@in <char><set><truecase><falsecase>
|
||||
%
|
||||
\newcommand{\if@in}[4]{%
|
||||
\edef\temp@a{#2}\def\temp@b##1#1##2\temp@b{\def\temp@b{##1}}%
|
||||
\expandafter\temp@b#2#1\temp@b\ifx\temp@a\temp@b #4\else #3\fi}
|
||||
|
||||
\newcommand{\fancyhead}{\@ifnextchar[{\f@ncyhf\fancyhead h}%
|
||||
{\f@ncyhf\fancyhead h[]}}
|
||||
\newcommand{\fancyfoot}{\@ifnextchar[{\f@ncyhf\fancyfoot f}%
|
||||
{\f@ncyhf\fancyfoot f[]}}
|
||||
\newcommand{\fancyhf}{\@ifnextchar[{\f@ncyhf\fancyhf{}}%
|
||||
{\f@ncyhf\fancyhf{}[]}}
|
||||
|
||||
% New commands for offsets added
|
||||
|
||||
\newcommand{\fancyheadoffset}{\@ifnextchar[{\f@ncyhfoffs\fancyheadoffset h}%
|
||||
{\f@ncyhfoffs\fancyheadoffset h[]}}
|
||||
\newcommand{\fancyfootoffset}{\@ifnextchar[{\f@ncyhfoffs\fancyfootoffset f}%
|
||||
{\f@ncyhfoffs\fancyfootoffset f[]}}
|
||||
\newcommand{\fancyhfoffset}{\@ifnextchar[{\f@ncyhfoffs\fancyhfoffset{}}%
|
||||
{\f@ncyhfoffs\fancyhfoffset{}[]}}
|
||||
|
||||
% The header and footer fields are stored in command sequences with
|
||||
% names of the form: \f@ncy<x><y><z> with <x> for [eo], <y> from [lcr]
|
||||
% and <z> from [hf].
|
||||
|
||||
\def\f@ncyhf#1#2[#3]#4{%
|
||||
\def\temp@c{}%
|
||||
\@forc\tmpf@ra{#3}%
|
||||
{\expandafter\if@in\tmpf@ra{eolcrhf,EOLCRHF}%
|
||||
{}{\edef\temp@c{\temp@c\tmpf@ra}}}%
|
||||
\ifx\@empty\temp@c\else
|
||||
\@fancyerrmsg{Illegal char `\temp@c' in \string#1 argument:
|
||||
[#3]}%
|
||||
\fi
|
||||
\f@nfor\temp@c{#3}%
|
||||
{\def@ult\f@@@eo{eo}\temp@c
|
||||
\if@twoside\else
|
||||
\if\f@@@eo e\@fancywarning
|
||||
{\string#1's `E' option without twoside option is useless}\fi\fi
|
||||
\def@ult\f@@@lcr{lcr}\temp@c
|
||||
\def@ult\f@@@hf{hf}{#2\temp@c}%
|
||||
\@forc\f@@eo\f@@@eo
|
||||
{\@forc\f@@lcr\f@@@lcr
|
||||
{\@forc\f@@hf\f@@@hf
|
||||
{\expandafter\fancy@def\csname
|
||||
f@ncy\f@@eo\f@@lcr\f@@hf\endcsname
|
||||
{#4}}}}}}
|
||||
|
||||
\def\f@ncyhfoffs#1#2[#3]#4{%
|
||||
\def\temp@c{}%
|
||||
\@forc\tmpf@ra{#3}%
|
||||
{\expandafter\if@in\tmpf@ra{eolrhf,EOLRHF}%
|
||||
{}{\edef\temp@c{\temp@c\tmpf@ra}}}%
|
||||
\ifx\@empty\temp@c\else
|
||||
\@fancyerrmsg{Illegal char `\temp@c' in \string#1 argument:
|
||||
[#3]}%
|
||||
\fi
|
||||
\f@nfor\temp@c{#3}%
|
||||
{\def@ult\f@@@eo{eo}\temp@c
|
||||
\if@twoside\else
|
||||
\if\f@@@eo e\@fancywarning
|
||||
{\string#1's `E' option without twoside option is useless}\fi\fi
|
||||
\def@ult\f@@@lcr{lr}\temp@c
|
||||
\def@ult\f@@@hf{hf}{#2\temp@c}%
|
||||
\@forc\f@@eo\f@@@eo
|
||||
{\@forc\f@@lcr\f@@@lcr
|
||||
{\@forc\f@@hf\f@@@hf
|
||||
{\expandafter\setlength\csname
|
||||
f@ncyO@\f@@eo\f@@lcr\f@@hf\endcsname
|
||||
{#4}}}}}%
|
||||
\fancy@setoffs}
|
||||
|
||||
% Fancyheadings version 1 commands. These are more or less deprecated,
|
||||
% but they continue to work.
|
||||
|
||||
\newcommand{\lhead}{\@ifnextchar[{\@xlhead}{\@ylhead}}
|
||||
\def\@xlhead[#1]#2{\fancy@def\f@ncyelh{#1}\fancy@def\f@ncyolh{#2}}
|
||||
\def\@ylhead#1{\fancy@def\f@ncyelh{#1}\fancy@def\f@ncyolh{#1}}
|
||||
|
||||
\newcommand{\chead}{\@ifnextchar[{\@xchead}{\@ychead}}
|
||||
\def\@xchead[#1]#2{\fancy@def\f@ncyech{#1}\fancy@def\f@ncyoch{#2}}
|
||||
\def\@ychead#1{\fancy@def\f@ncyech{#1}\fancy@def\f@ncyoch{#1}}
|
||||
|
||||
\newcommand{\rhead}{\@ifnextchar[{\@xrhead}{\@yrhead}}
|
||||
\def\@xrhead[#1]#2{\fancy@def\f@ncyerh{#1}\fancy@def\f@ncyorh{#2}}
|
||||
\def\@yrhead#1{\fancy@def\f@ncyerh{#1}\fancy@def\f@ncyorh{#1}}
|
||||
|
||||
\newcommand{\lfoot}{\@ifnextchar[{\@xlfoot}{\@ylfoot}}
|
||||
\def\@xlfoot[#1]#2{\fancy@def\f@ncyelf{#1}\fancy@def\f@ncyolf{#2}}
|
||||
\def\@ylfoot#1{\fancy@def\f@ncyelf{#1}\fancy@def\f@ncyolf{#1}}
|
||||
|
||||
\newcommand{\cfoot}{\@ifnextchar[{\@xcfoot}{\@ycfoot}}
|
||||
\def\@xcfoot[#1]#2{\fancy@def\f@ncyecf{#1}\fancy@def\f@ncyocf{#2}}
|
||||
\def\@ycfoot#1{\fancy@def\f@ncyecf{#1}\fancy@def\f@ncyocf{#1}}
|
||||
|
||||
\newcommand{\rfoot}{\@ifnextchar[{\@xrfoot}{\@yrfoot}}
|
||||
\def\@xrfoot[#1]#2{\fancy@def\f@ncyerf{#1}\fancy@def\f@ncyorf{#2}}
|
||||
\def\@yrfoot#1{\fancy@def\f@ncyerf{#1}\fancy@def\f@ncyorf{#1}}
|
||||
|
||||
\newlength{\fancy@headwidth}
|
||||
\let\headwidth\fancy@headwidth
|
||||
\newlength{\f@ncyO@elh}
|
||||
\newlength{\f@ncyO@erh}
|
||||
\newlength{\f@ncyO@olh}
|
||||
\newlength{\f@ncyO@orh}
|
||||
\newlength{\f@ncyO@elf}
|
||||
\newlength{\f@ncyO@erf}
|
||||
\newlength{\f@ncyO@olf}
|
||||
\newlength{\f@ncyO@orf}
|
||||
\newcommand{\headrulewidth}{0.4pt}
|
||||
\newcommand{\footrulewidth}{0pt}
|
||||
\newcommand{\footruleskip}{.3\normalbaselineskip}
|
||||
|
||||
% Fancyplain stuff shouldn't be used anymore (rather
|
||||
% \fancypagestyle{plain} should be used), but it must be present for
|
||||
% compatibility reasons.
|
||||
|
||||
\newcommand{\plainheadrulewidth}{0pt}
|
||||
\newcommand{\plainfootrulewidth}{0pt}
|
||||
\newif\if@fancyplain \@fancyplainfalse
|
||||
\def\fancyplain#1#2{\if@fancyplain#1\else#2\fi}
|
||||
|
||||
\headwidth=-123456789sp %magic constant
|
||||
|
||||
% Command to reset various things in the headers:
|
||||
% a.o. single spacing (taken from setspace.sty)
|
||||
% and the catcode of ^^M (so that epsf files in the header work if a
|
||||
% verbatim crosses a page boundary)
|
||||
% It also defines a \nouppercase command that disables \uppercase and
|
||||
% \Makeuppercase. It can only be used in the headers and footers.
|
||||
\let\fnch@everypar\everypar% save real \everypar because of spanish.ldf
|
||||
\def\fancy@reset{\fnch@everypar{}\restorecr\endlinechar=13
|
||||
\def\baselinestretch{1}%
|
||||
\def\nouppercase##1{{\let\uppercase\relax\let\MakeUppercase\relax
|
||||
\expandafter\let\csname MakeUppercase \endcsname\relax##1}}%
|
||||
\ifx\undefined\@newbaseline% NFSS not present; 2.09 or 2e
|
||||
\ifx\@normalsize\undefined \normalsize % for ucthesis.cls
|
||||
\else \@normalsize \fi
|
||||
\else% NFSS (2.09) present
|
||||
\@newbaseline%
|
||||
\fi}
|
||||
|
||||
% Initialization of the head and foot text.
|
||||
|
||||
% The default values still contain \fancyplain for compatibility.
|
||||
\fancyhf{} % clear all
|
||||
% lefthead empty on ``plain'' pages, \rightmark on even, \leftmark on odd pages
|
||||
% evenhead empty on ``plain'' pages, \leftmark on even, \rightmark on odd pages
|
||||
\if@twoside
|
||||
\fancyhead[el,or]{\fancyplain{}{\sl\rightmark}}
|
||||
\fancyhead[er,ol]{\fancyplain{}{\sl\leftmark}}
|
||||
\else
|
||||
\fancyhead[l]{\fancyplain{}{\sl\rightmark}}
|
||||
\fancyhead[r]{\fancyplain{}{\sl\leftmark}}
|
||||
\fi
|
||||
\fancyfoot[c]{\rm\thepage} % page number
|
||||
|
||||
% Use box 0 as a temp box and dimen 0 as temp dimen.
|
||||
% This can be done, because this code will always
|
||||
% be used inside another box, and therefore the changes are local.
|
||||
|
||||
\def\@fancyvbox#1#2{\setbox0\vbox{#2}\ifdim\ht0>#1\@fancywarning
|
||||
{\string#1 is too small (\the#1): ^^J Make it at least \the\ht0.^^J
|
||||
We now make it that large for the rest of the document.^^J
|
||||
This may cause the page layout to be inconsistent, however\@gobble}%
|
||||
\dimen0=#1\global\setlength{#1}{\ht0}\ht0=\dimen0\fi
|
||||
\box0}
|
||||
|
||||
% Put together a header or footer given the left, center and
|
||||
% right text, fillers at left and right and a rule.
|
||||
% The \lap commands put the text into an hbox of zero size,
|
||||
% so overlapping text does not generate an errormessage.
|
||||
% These macros have 5 parameters:
|
||||
% 1. LEFTSIDE BEARING % This determines at which side the header will stick
|
||||
% out. When \fancyhfoffset is used this calculates \headwidth, otherwise
|
||||
% it is \hss or \relax (after expansion).
|
||||
% 2. \f@ncyolh, \f@ncyelh, \f@ncyolf or \f@ncyelf. This is the left component.
|
||||
% 3. \f@ncyoch, \f@ncyech, \f@ncyocf or \f@ncyecf. This is the middle comp.
|
||||
% 4. \f@ncyorh, \f@ncyerh, \f@ncyorf or \f@ncyerf. This is the right component.
|
||||
% 5. RIGHTSIDE BEARING. This is always \relax or \hss (after expansion).
|
||||
|
||||
\def\@fancyhead#1#2#3#4#5{#1\hbox to\headwidth{\fancy@reset
|
||||
\@fancyvbox\headheight{\hbox
|
||||
{\rlap{\parbox[b]{\headwidth}{\raggedright#2}}\hfill
|
||||
\parbox[b]{\headwidth}{\centering#3}\hfill
|
||||
\llap{\parbox[b]{\headwidth}{\raggedleft#4}}}\headrule}}#5}
|
||||
|
||||
\def\@fancyfoot#1#2#3#4#5{#1\hbox to\headwidth{\fancy@reset
|
||||
\@fancyvbox\footskip{\footrule
|
||||
\hbox{\rlap{\parbox[t]{\headwidth}{\raggedright#2}}\hfill
|
||||
\parbox[t]{\headwidth}{\centering#3}\hfill
|
||||
\llap{\parbox[t]{\headwidth}{\raggedleft#4}}}}}#5}
|
||||
|
||||
\def\headrule{{\if@fancyplain\let\headrulewidth\plainheadrulewidth\fi
|
||||
\hrule\@height\headrulewidth\@width\headwidth \vskip-\headrulewidth}}
|
||||
|
||||
\def\footrule{{\if@fancyplain\let\footrulewidth\plainfootrulewidth\fi
|
||||
\vskip-\footruleskip\vskip-\footrulewidth
|
||||
\hrule\@width\headwidth\@height\footrulewidth\vskip\footruleskip}}
|
||||
|
||||
\def\ps@fancy{%
|
||||
\@ifundefined{@chapapp}{\let\@chapapp\chaptername}{}%for amsbook
|
||||
%
|
||||
% Define \MakeUppercase for old LaTeXen.
|
||||
% Note: we used \def rather than \let, so that \let\uppercase\relax (from
|
||||
% the version 1 documentation) will still work.
|
||||
%
|
||||
\@ifundefined{MakeUppercase}{\def\MakeUppercase{\uppercase}}{}%
|
||||
\@ifundefined{chapter}{\def\sectionmark##1{\markboth
|
||||
{\MakeUppercase{\ifnum \c@secnumdepth>\z@
|
||||
\thesection\hskip 1em\relax \fi ##1}}{}}%
|
||||
\def\subsectionmark##1{\markright {\ifnum \c@secnumdepth >\@ne
|
||||
\thesubsection\hskip 1em\relax \fi ##1}}}%
|
||||
{\def\chaptermark##1{\markboth {\MakeUppercase{\ifnum \c@secnumdepth>\m@ne
|
||||
\@chapapp\ \thechapter. \ \fi ##1}}{}}%
|
||||
\def\sectionmark##1{\markright{\MakeUppercase{\ifnum \c@secnumdepth >\z@
|
||||
\thesection. \ \fi ##1}}}}%
|
||||
%\csname ps@headings\endcsname % use \ps@headings defaults if they exist
|
||||
\ps@@fancy
|
||||
\gdef\ps@fancy{\@fancyplainfalse\ps@@fancy}%
|
||||
% Initialize \headwidth if the user didn't
|
||||
%
|
||||
\ifdim\headwidth<0sp
|
||||
%
|
||||
% This catches the case that \headwidth hasn't been initialized and the
|
||||
% case that the user added something to \headwidth in the expectation that
|
||||
% it was initialized to \textwidth. We compensate this now. This loses if
|
||||
% the user intended to multiply it by a factor. But that case is more
|
||||
% likely done by saying something like \headwidth=1.2\textwidth.
|
||||
% The doc says you have to change \headwidth after the first call to
|
||||
% \pagestyle{fancy}. This code is just to catch the most common cases were
|
||||
% that requirement is violated.
|
||||
%
|
||||
\global\advance\headwidth123456789sp\global\advance\headwidth\textwidth
|
||||
\fi}
|
||||
\def\ps@fancyplain{\ps@fancy \let\ps@plain\ps@plain@fancy}
|
||||
\def\ps@plain@fancy{\@fancyplaintrue\ps@@fancy}
|
||||
\let\ps@@empty\ps@empty
|
||||
\def\ps@@fancy{%
|
||||
\ps@@empty % This is for amsbook/amsart, which do strange things with \topskip
|
||||
\def\@mkboth{\protect\markboth}%
|
||||
\def\@oddhead{\@fancyhead\fancy@Oolh\f@ncyolh\f@ncyoch\f@ncyorh\fancy@Oorh}%
|
||||
\def\@oddfoot{\@fancyfoot\fancy@Oolf\f@ncyolf\f@ncyocf\f@ncyorf\fancy@Oorf}%
|
||||
\def\@evenhead{\@fancyhead\fancy@Oelh\f@ncyelh\f@ncyech\f@ncyerh\fancy@Oerh}%
|
||||
\def\@evenfoot{\@fancyfoot\fancy@Oelf\f@ncyelf\f@ncyecf\f@ncyerf\fancy@Oerf}%
|
||||
}
|
||||
% Default definitions for compatibility mode:
|
||||
% These cause the header/footer to take the defined \headwidth as width
|
||||
% And to shift in the direction of the marginpar area
|
||||
|
||||
\def\fancy@Oolh{\if@reversemargin\hss\else\relax\fi}
|
||||
\def\fancy@Oorh{\if@reversemargin\relax\else\hss\fi}
|
||||
\let\fancy@Oelh\fancy@Oorh
|
||||
\let\fancy@Oerh\fancy@Oolh
|
||||
|
||||
\let\fancy@Oolf\fancy@Oolh
|
||||
\let\fancy@Oorf\fancy@Oorh
|
||||
\let\fancy@Oelf\fancy@Oelh
|
||||
\let\fancy@Oerf\fancy@Oerh
|
||||
|
||||
% New definitions for the use of \fancyhfoffset
|
||||
% These calculate the \headwidth from \textwidth and the specified offsets.
|
||||
|
||||
\def\fancy@offsolh{\headwidth=\textwidth\advance\headwidth\f@ncyO@olh
|
||||
\advance\headwidth\f@ncyO@orh\hskip-\f@ncyO@olh}
|
||||
\def\fancy@offselh{\headwidth=\textwidth\advance\headwidth\f@ncyO@elh
|
||||
\advance\headwidth\f@ncyO@erh\hskip-\f@ncyO@elh}
|
||||
|
||||
\def\fancy@offsolf{\headwidth=\textwidth\advance\headwidth\f@ncyO@olf
|
||||
\advance\headwidth\f@ncyO@orf\hskip-\f@ncyO@olf}
|
||||
\def\fancy@offself{\headwidth=\textwidth\advance\headwidth\f@ncyO@elf
|
||||
\advance\headwidth\f@ncyO@erf\hskip-\f@ncyO@elf}
|
||||
|
||||
\def\fancy@setoffs{%
|
||||
% Just in case \let\headwidth\textwidth was used
|
||||
\fancy@gbl\let\headwidth\fancy@headwidth
|
||||
\fancy@gbl\let\fancy@Oolh\fancy@offsolh
|
||||
\fancy@gbl\let\fancy@Oelh\fancy@offselh
|
||||
\fancy@gbl\let\fancy@Oorh\hss
|
||||
\fancy@gbl\let\fancy@Oerh\hss
|
||||
\fancy@gbl\let\fancy@Oolf\fancy@offsolf
|
||||
\fancy@gbl\let\fancy@Oelf\fancy@offself
|
||||
\fancy@gbl\let\fancy@Oorf\hss
|
||||
\fancy@gbl\let\fancy@Oerf\hss}
|
||||
|
||||
\newif\iffootnote
|
||||
\let\latex@makecol\@makecol
|
||||
\def\@makecol{\ifvoid\footins\footnotetrue\else\footnotefalse\fi
|
||||
\let\topfloat\@toplist\let\botfloat\@botlist\latex@makecol}
|
||||
\def\iftopfloat#1#2{\ifx\topfloat\empty #2\else #1\fi}
|
||||
\def\ifbotfloat#1#2{\ifx\botfloat\empty #2\else #1\fi}
|
||||
\def\iffloatpage#1#2{\if@fcolmade #1\else #2\fi}
|
||||
|
||||
\newcommand{\fancypagestyle}[2]{%
|
||||
\@namedef{ps@#1}{\let\fancy@gbl\relax#2\relax\ps@fancy}}
|
1441
report/icml2017.bst
File diff suppressed because it is too large
Binary file not shown.
Binary file not shown.
@ -1,207 +0,0 @@
|
||||
%% Template for MLP Coursework 1 / 16 October 2017
|
||||
|
||||
%% Based on LaTeX template for ICML 2017 - example_paper.tex at
|
||||
%% https://2017.icml.cc/Conferences/2017/StyleAuthorInstructions
|
||||
|
||||
\documentclass{article}
|
||||
|
||||
\usepackage[T1]{fontenc}
|
||||
\usepackage{amssymb,amsmath}
|
||||
\usepackage{txfonts}
|
||||
\usepackage{microtype}
|
||||
|
||||
% For figures
|
||||
\usepackage{graphicx}
|
||||
\usepackage{subfigure}
|
||||
|
||||
% For citations
|
||||
\usepackage{natbib}
|
||||
|
||||
% For algorithms
|
||||
\usepackage{algorithm}
|
||||
\usepackage{algorithmic}
|
||||
|
||||
% the hyperref package is used to produce hyperlinks in the
|
||||
% resulting PDF. If this breaks your system, please comment out the
|
||||
% following usepackage line and replace \usepackage{mlp2017} with
|
||||
% \usepackage[nohyperref]{mlp2017} below.
|
||||
\usepackage{hyperref}
|
||||
\usepackage{url}
|
||||
\urlstyle{same}
|
||||
|
||||
% Packages hyperref and algorithmic misbehave sometimes. We can fix
|
||||
% this with the following command.
|
||||
\newcommand{\theHalgorithm}{\arabic{algorithm}}
|
||||
|
||||
|
||||
% Set up MLP coursework style (based on ICML style)
|
||||
\usepackage{mlp2017}
|
||||
\mlptitlerunning{MLP Coursework 1 (\studentNumber)}
|
||||
\bibliographystyle{icml2017}
|
||||
|
||||
|
||||
\DeclareMathOperator{\softmax}{softmax}
|
||||
\DeclareMathOperator{\sigmoid}{sigmoid}
|
||||
\DeclareMathOperator{\sgn}{sgn}
|
||||
\DeclareMathOperator{\relu}{relu}
|
||||
\DeclareMathOperator{\lrelu}{lrelu}
|
||||
\DeclareMathOperator{\elu}{elu}
|
||||
\DeclareMathOperator{\selu}{selu}
|
||||
\DeclareMathOperator{\maxout}{maxout}
|
||||
|
||||
%% You probably do not need to change anything above this comment
|
||||
|
||||
%% REPLACE this with your student number
|
||||
\def\studentNumber{s1754321}
|
||||
|
||||
\begin{document}
|
||||
|
||||
\twocolumn[
|
||||
\mlptitle{MLP Coursework 1: Activation Functions}
|
||||
|
||||
\centerline{\studentNumber}
|
||||
|
||||
\vskip 7mm
|
||||
]
|
||||
|
||||
\begin{abstract}
|
||||
The abstract should be 100--200 words long, providing a concise summary of the contents of your report.
|
||||
\end{abstract}
|
||||
|
||||
\section{Introduction}
|
||||
\label{sec:intro}
|
||||
This document provides a template for the MLP coursework 1 report. In particular, it structures the document into five sections (plus an abstract and the references) -- you should keep to this structure for your report. If you want to use subsections within a section that is fine, but please do not use any deeper structuring. In this template the text in each section will include an outline of what you should include in each section, along with some practical LaTeX examples (for example figures, tables, algorithms). Your document should be no longer than \textbf{six pages}, with an additional page allowed for references.
|
||||
|
||||
The introduction should place your work in context, giving the overall motivation for the work, and clearly outlining the research questions you have explored -- in this case comparison of the behaviour of the different activation functions, experimental investigation of the impact of the depth of the network with respect to accuracy, and experimental investigation of different approaches to weight initialisation. This section should also include a concise description of the MNIST task and data -- be precise: for example state the size of the training and validation sets.
|
||||
|
||||
|
||||
\section{Activation functions}
|
||||
\label{sec:actfn}
|
||||
This section should cover the theoretical methodology -- in this case you should present the four activation functions: ReLU, Leaky ReLU, ELU, and SELU. I didn't do it in this document, but the first time you use an acronym you should say what it stands for, for example Rectified Linear Unit (ReLU). You should use equations to concisely describe each activation function. For example, ReLU:
|
||||
\begin{equation}
|
||||
\relu(x) = \max(0, x) ,
|
||||
\end{equation}
|
||||
which has the gradient:
|
||||
\begin{equation}
|
||||
\frac{d}{dx} \relu(x) =
|
||||
\begin{cases}
|
||||
0 & \quad \text{if } x \leq 0 \\
|
||||
1 & \quad \text{if } x > 0 .
|
||||
\end{cases}
|
||||
\end{equation}
|
||||
The \LaTeX for the derivatives is slightly more complicated. We provided definitions near the top of the file (the part before \verb+\begin{document}+) for \verb+\relu+, \verb+\lrelu+, \verb+\elu+, and \verb+\selu+. There is no need to discuss the unit tests for these activation functions in this report.
|
||||
|
||||
It is probably not needed in this report, but if you would like to include an algorithm in your report, please use the \verb+algorithm+ and \verb+algorithmic+ environments to format pseudocode (for instance, Algorithm~\ref{alg:example}). These require the corresponding style files, \verb+algorithm.sty+ and \verb+algorithmic.sty+ which are supplied with this package.
|
||||
|
||||
\begin{algorithm}[ht]
|
||||
\begin{algorithmic}
|
||||
\STATE {\bfseries Input:} data $x_i$, size $m$
|
||||
\REPEAT
|
||||
\STATE Initialize $noChange = true$.
|
||||
\FOR{$i=1$ {\bfseries to} $m-1$}
|
||||
\IF{$x_i > x_{i+1}$}
|
||||
\STATE Swap $x_i$ and $x_{i+1}$
|
||||
\STATE $noChange = false$
|
||||
\ENDIF
|
||||
\ENDFOR
|
||||
\UNTIL{$noChange$ is $true$}
|
||||
\end{algorithmic}
|
||||
\caption{Bubble Sort}
|
||||
\label{alg:example}
|
||||
\end{algorithm}
|
||||
|
||||
\section{Experimental comparison of activation functions}
|
||||
\label{sec:actexpts}
|
||||
In this section you should present the results and discussion of your experiments comparing networks using the different activation functions on the MNIST task. As explained in the coursework document, you should use 2 hidden layers with 100 hidden units per layer for these experiments. You can compare the learning curves (error vs epoch) for training and/or validation, and the validation set accuracies.
|
||||
|
||||
Your experimental sections should include graphs (for instance, figure~\ref{fig:sample-graph}) and/or tables (for instance, table~\ref{tab:sample-table})\footnote{These examples were taken from the ICML template paper.}, using the \verb+figure+ and \verb+table+ environments, in which you use \verb+\includegraphics+ to include an image (pdf, png, or jpg formats). Please export graphs as
|
||||
\href{https://en.wikipedia.org/wiki/Vector_graphics}{vector graphics}
|
||||
rather than \href{https://en.wikipedia.org/wiki/Raster_graphics}{raster
|
||||
files} as this will make sure all detail in the plot is visible.
|
||||
Matplotlib supports saving high quality figures in a wide range of
|
||||
common image formats using the
|
||||
\href{http://matplotlib.org/api/pyplot_api.html\#matplotlib.pyplot.savefig}{\texttt{savefig}}
|
||||
function. \textbf{You should use \texttt{savefig} rather than copying
|
||||
the screen-resolution raster images outputted in the notebook.} An
|
||||
example of using \texttt{savefig} to save a figure as a PDF file (which
|
||||
can be included as graphics in a \LaTeX document) is given in the coursework document.
|
||||
|
||||
If you need a figure or table to stretch across two columns use the \verb+figure*+ or \verb+table*+ environment instead of the \verb+figure+ or \verb+table+ environment. Use the \verb+subfigure+ environment if you want to include multiple graphics in a single figure.
|
||||
|
||||
\begin{figure}[tb]
|
||||
\vskip 5mm
|
||||
\begin{center}
|
||||
\centerline{\includegraphics[width=\columnwidth]{icml_numpapers}}
|
||||
\caption{Historical locations and number of accepted papers for International
|
||||
Machine Learning Conferences (ICML 1993 -- ICML 2008) and
|
||||
International Workshops on Machine Learning (ML 1988 -- ML
|
||||
1992). At the time this figure was produced, the number of
|
||||
accepted papers for ICML 2008 was unknown and instead estimated.}
|
||||
\label{fig:sample-graph}
|
||||
\end{center}
|
||||
\vskip -5mm
|
||||
\end{figure}
|
||||
|
||||
\begin{table}[tb]
|
||||
\vskip 3mm
|
||||
\begin{center}
|
||||
\begin{small}
|
||||
\begin{sc}
|
||||
\begin{tabular}{lcccr}
|
||||
\hline
|
||||
\abovespace\belowspace
|
||||
Data set & Naive & Flexible & Better? \\
|
||||
\hline
|
||||
\abovespace
|
||||
Breast & 95.9$\pm$ 0.2& 96.7$\pm$ 0.2& $\surd$ \\
|
||||
Cleveland & 83.3$\pm$ 0.6& 80.0$\pm$ 0.6& $\times$\\
|
||||
Glass2 & 61.9$\pm$ 1.4& 83.8$\pm$ 0.7& $\surd$ \\
|
||||
Credit & 74.8$\pm$ 0.5& 78.3$\pm$ 0.6& \\
|
||||
Horse & 73.3$\pm$ 0.9& 69.7$\pm$ 1.0& $\times$\\
|
||||
Meta & 67.1$\pm$ 0.6& 76.5$\pm$ 0.5& $\surd$ \\
|
||||
Pima & 75.1$\pm$ 0.6& 73.9$\pm$ 0.5& \\
|
||||
\belowspace
|
||||
Vehicle & 44.9$\pm$ 0.6& 61.5$\pm$ 0.4& $\surd$ \\
|
||||
\hline
|
||||
\end{tabular}
|
||||
\end{sc}
|
||||
\end{small}
|
||||
\caption{Classification accuracies for naive Bayes and flexible
|
||||
Bayes on various data sets.}
|
||||
\label{tab:sample-table}
|
||||
\end{center}
|
||||
\vskip -3mm
|
||||
\end{table}
|
||||
|
||||
\section{Deep neural network experiments}
|
||||
\label{sec:dnnexpts}
|
||||
This section should report on your experiments on deeper networks for MNIST. The two sets of experiments are to explore the impact of the depth of the network (number of hidden layers), and a comparison of different approaches to weight initialisation.
|
||||
|
||||
In this section, and in the previous section, you should present your experimental results clearly and concisely, followed by an interpretation and discussion of results. You need to present your results in a way that makes it easy for a reader to understand what they mean. You should facilitate comparisons either using graphs with multiple curves or (if appropriate, e.g. for accuracies) a results table. You need to avoid having too many figures, poorly labelled graphs, and graphs which should be comparable but which use different axis scales. A good presentation will enable the reader to compare trends in the same graph -- each graph should summarise the results relating to a particular research (sub)question.
|
||||
|
||||
Your discussion should interpret the results, both in terms of summarising the outcomes of a particular experiment, and attempting to relate to the underlying models. A good report would have some analysis, resulting in an understanding of why particular results are observed, perhaps with reference to the literature. Use bibtex to organise your references -- in this case the references are in the file \verb+example-refs.bib+. Here is a an example reference \citep{langley00}.
|
||||
|
||||
|
||||
|
||||
|
||||
\section{Conclusions}
|
||||
\label{sec:concl}
|
||||
You should draw conclusions from the experiments, related to the research questions outlined in the introduction (section~\ref{sec:intro}). You should state the conclusions clearly and concisely. It is good if the conclusion from one experiment influenced what you did in later experiments -- your aim is to learn from your experiments. Extra credit if you relate your findings to what has been reported in the literature.
|
||||
|
||||
A good conclusions section would also include a further work discussion, building on work done so far, and referencing the literature where appropriate.
|
||||
|
||||
\bibliography{example-refs}
|
||||
|
||||
\end{document}
|
||||
|
||||
|
||||
% This document was modified from the file originally made available by
|
||||
% Pat Langley and Andrea Danyluk for ICML-2K. This version was
|
||||
% created by Lise Getoor and Tobias Scheffer, it was slightly modified
|
||||
% from the 2010 version by Thorsten Joachims & Johannes Fuernkranz,
|
||||
% slightly modified from the 2009 version by Kiri Wagstaff and
|
||||
% Sam Roweis's 2008 version, which is slightly modified from
|
||||
% Prasad Tadepalli's 2007 version which is a lightly
|
||||
% changed version of the previous year's version by Andrew Moore,
|
||||
% which was in turn edited from those of Kristian Kersting and
|
||||
% Codrina Lauth. Alex Smola contributed to the algorithmic style files.
|
Binary file not shown.
@ -1,195 +0,0 @@
|
||||
%% Template for MLP Coursework 2 / 6 November 2017
|
||||
|
||||
%% Based on LaTeX template for ICML 2017 - example_paper.tex at
|
||||
%% https://2017.icml.cc/Conferences/2017/StyleAuthorInstructions
|
||||
|
||||
\documentclass{article}
|
||||
|
||||
\usepackage[T1]{fontenc}
|
||||
\usepackage{amssymb,amsmath}
|
||||
\usepackage{txfonts}
|
||||
\usepackage{microtype}
|
||||
|
||||
% For figures
|
||||
\usepackage{graphicx}
|
||||
\usepackage{subfigure}
|
||||
|
||||
% For citations
|
||||
\usepackage{natbib}
|
||||
|
||||
% For algorithms
|
||||
\usepackage{algorithm}
|
||||
\usepackage{algorithmic}
|
||||
|
||||
% the hyperref package is used to produce hyperlinks in the
|
||||
% resulting PDF. If this breaks your system, please comment out the
|
||||
% following usepackage line and replace \usepackage{mlp2017} with
|
||||
% \usepackage[nohyperref]{mlp2017} below.
|
||||
\usepackage{hyperref}
|
||||
\usepackage{url}
|
||||
\urlstyle{same}
|
||||
|
||||
% Packages hyperref and algorithmic misbehave sometimes. We can fix
|
||||
% this with the following command.
|
||||
\newcommand{\theHalgorithm}{\arabic{algorithm}}
|
||||
|
||||
|
||||
% Set up MLP coursework style (based on ICML style)
|
||||
\usepackage{mlp2017}
|
||||
\mlptitlerunning{MLP Coursework 2 (\studentNumber)}
|
||||
\bibliographystyle{icml2017}
|
||||
|
||||
|
||||
\DeclareMathOperator{\softmax}{softmax}
|
||||
\DeclareMathOperator{\sigmoid}{sigmoid}
|
||||
\DeclareMathOperator{\sgn}{sgn}
|
||||
\DeclareMathOperator{\relu}{relu}
|
||||
\DeclareMathOperator{\lrelu}{lrelu}
|
||||
\DeclareMathOperator{\elu}{elu}
|
||||
\DeclareMathOperator{\selu}{selu}
|
||||
\DeclareMathOperator{\maxout}{maxout}
|
||||
|
||||
%% You probably do not need to change anything above this comment
|
||||
|
||||
%% REPLACE this with your student number
|
||||
\def\studentNumber{sXXXXXXX}
|
||||
|
||||
\begin{document}
|
||||
|
||||
\twocolumn[
|
||||
\mlptitle{MLP Coursework 2: Learning rules, BatchNorm, and ConvNets}
|
||||
|
||||
\centerline{\studentNumber}
|
||||
|
||||
\vskip 7mm
|
||||
]
|
||||
|
||||
\begin{abstract}
|
||||
The abstract should be 100--200 words long, providing a concise summary of the contents of your report.
|
||||
\end{abstract}
|
||||
|
||||
\section{Introduction}
|
||||
\label{sec:intro}
|
||||
This document provides a template for the MLP coursework 2 report. This template structures the report into sections, which you are recommended to use, but can change if you wish. If you want to use subsections within a section that is fine, but please do not use any deeper structuring. In this template the text in each section will include an outline of what you should include in each section, along with some practical LaTeX examples (for example figures, tables, algorithms). Your document should be no longer than \textbf{seven pages}, with an additional page allowed for references.
|
||||
|
||||
The introduction should place your work in context, giving the overall motivation for the work, and clearly outlining the research questions you have explored. This section should also include a concise description of the Balanced EMNIST task and data -- be precise: for example state the size of the training, validation, and test sets.
|
||||
|
||||
\section{Baseline systems}
|
||||
In this section you should report your baseline experiments for EMNIST. There is no need for theoretical explanations of material covered in the course, but if you go beyond what was covered please explain what you did, with references to relevant paper(s) if appropriate. In this section you should aim to cover both the ``what'' and the ``why'': \emph{what} you did, giving sufficient information (hyperparameter settings, etc.) so that someone else (e.g. another student on the course) could reproduce your results; and \emph{why} you performed the experiments you are reporting -- what you are aiming to discover, and what the motivation is for the particular experiments you undertook. You should also provide some discussion and interpretation of your results.
|
||||
|
||||
As before, your experimental sections should include graphs (for instance, figure~\ref{fig:sample-graph}) and/or tables (for instance, table~\ref{tab:sample-table})\footnote{These examples were taken from the ICML template paper.}, using the \verb+figure+ and \verb+table+ environments, in which you use \verb+\includegraphics+ to include an image (pdf, png, or jpg formats). Please export graphs as
|
||||
\href{https://en.wikipedia.org/wiki/Vector_graphics}{vector graphics}
|
||||
rather than \href{https://en.wikipedia.org/wiki/Raster_graphics}{raster
|
||||
files} as this will make sure all detail in the plot is visible.
|
||||
Matplotlib supports saving high quality figures in a wide range of
|
||||
common image formats using the
|
||||
\href{http://matplotlib.org/api/pyplot_api.html\#matplotlib.pyplot.savefig}{\texttt{savefig}}
|
||||
function. \textbf{You should use \texttt{savefig} rather than copying
|
||||
the screen-resolution raster images outputted in the notebook.} An
|
||||
example of using \texttt{savefig} to save a figure as a PDF file (which
|
||||
can be included as graphics in a \LaTeX\ document) is given in the coursework 1 document.
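For illustration, a minimal Matplotlib sketch along the following lines (the data, variable names, and the output filename \verb+training_error.pdf+ are placeholder choices, not part of the provided course code) saves a plot as a vector-graphics PDF ready for \verb+\includegraphics+:
\begin{verbatim}
import numpy as np
import matplotlib.pyplot as plt

errors = np.exp(-0.05 * np.arange(100))  # placeholder curve; use your own statistics
fig, ax = plt.subplots()
ax.plot(errors, label='error(train)')
ax.set_xlabel('Epoch')
ax.set_ylabel('Error')
ax.legend()
# Save as a vector-graphics PDF rather than the screen-resolution
# raster image shown in the notebook
fig.savefig('training_error.pdf', bbox_inches='tight')
\end{verbatim}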
|
||||
|
||||
If you need a figure or table to stretch across two columns use the \verb+figure*+ or \verb+table*+ environment instead of the \verb+figure+ or \verb+table+ environment. Use the \verb+subfigure+ environment if you want to include multiple graphics in a single figure.
|
||||
|
||||
\begin{figure}[tb]
|
||||
\vskip 5mm
|
||||
\begin{center}
|
||||
\centerline{\includegraphics[width=\columnwidth]{icml_numpapers}}
|
||||
\caption{Historical locations and number of accepted papers for International
|
||||
Machine Learning Conferences (ICML 1993 -- ICML 2008) and
|
||||
International Workshops on Machine Learning (ML 1988 -- ML
|
||||
1992). At the time this figure was produced, the number of
|
||||
accepted papers for ICML 2008 was unknown and instead estimated.}
|
||||
\label{fig:sample-graph}
|
||||
\end{center}
|
||||
\vskip -5mm
|
||||
\end{figure}
|
||||
|
||||
\begin{table}[tb]
|
||||
\vskip 3mm
|
||||
\begin{center}
|
||||
\begin{small}
|
||||
\begin{sc}
|
||||
\begin{tabular}{lcccr}
|
||||
\hline
|
||||
\abovespace\belowspace
|
||||
Data set & Naive & Flexible & Better? \\
|
||||
\hline
|
||||
\abovespace
|
||||
Breast & 95.9$\pm$ 0.2& 96.7$\pm$ 0.2& $\surd$ \\
|
||||
Cleveland & 83.3$\pm$ 0.6& 80.0$\pm$ 0.6& $\times$\\
|
||||
Glass2 & 61.9$\pm$ 1.4& 83.8$\pm$ 0.7& $\surd$ \\
|
||||
Credit & 74.8$\pm$ 0.5& 78.3$\pm$ 0.6& \\
|
||||
Horse & 73.3$\pm$ 0.9& 69.7$\pm$ 1.0& $\times$\\
|
||||
Meta & 67.1$\pm$ 0.6& 76.5$\pm$ 0.5& $\surd$ \\
|
||||
Pima & 75.1$\pm$ 0.6& 73.9$\pm$ 0.5& \\
|
||||
\belowspace
|
||||
Vehicle & 44.9$\pm$ 0.6& 61.5$\pm$ 0.4& $\surd$ \\
|
||||
\hline
|
||||
\end{tabular}
|
||||
\end{sc}
|
||||
\end{small}
|
||||
\caption{Classification accuracies for naive Bayes and flexible
|
||||
Bayes on various data sets.}
|
||||
\label{tab:sample-table}
|
||||
\end{center}
|
||||
\vskip -3mm
|
||||
\end{table}
|
||||
|
||||
\section{Learning rules}
|
||||
In this section you should compare RMSProp and Adam with gradient descent, introducing these learning rules either as equations or as algorithmic pseudocode. If you present the different approaches as algorithms, you can use the \verb+algorithm+ and \verb+algorithmic+ environments to format pseudocode (for instance, Algorithm~\ref{alg:example}). These require the corresponding style files, \verb+algorithm.sty+ and \verb+algorithmic.sty+ which are supplied with this package.
|
||||
|
||||
\begin{algorithm}[ht]
|
||||
\begin{algorithmic}
|
||||
\STATE {\bfseries Input:} data $x_i$, size $m$
|
||||
\REPEAT
|
||||
\STATE Initialize $noChange = true$.
|
||||
\FOR{$i=1$ {\bfseries to} $m-1$}
|
||||
\IF{$x_i > x_{i+1}$}
|
||||
\STATE Swap $x_i$ and $x_{i+1}$
|
||||
\STATE $noChange = false$
|
||||
\ENDIF
|
||||
\ENDFOR
|
||||
\UNTIL{$noChange$ is $true$}
|
||||
\end{algorithmic}
|
||||
\caption{Bubble Sort}
|
||||
\label{alg:example}
|
||||
\end{algorithm}
|
||||
|
||||
You should, in your own words, explain what the different learning rules do, and how they differ. You should then present your experiments and results, comparing and contrasting stochastic gradient descent, RMSProp, and Adam. As before concentrate on the ``what'' (remember give enough information so someone can reproduce your experiments), the ``why'' (why did you choose the experiments that you performed -- you may have been motivated by your earlier results, by the literature, or by a specific research question), and the interpretation of your results.
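For example, one common way to typeset the Adam update for parameters $\theta$ with gradient $g_t$ at step $t$ is the following sketch (using the usual hyperparameter names $\beta_1$, $\beta_2$, $\epsilon$ and learning rate $\eta$; check it against the lecture slides or the original paper, and state the values you used):
\begin{align*}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t\\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2\\
\hat{m}_t &= \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t}\\
\theta_t &= \theta_{t-1} - \eta\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{align*}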
|
||||
|
||||
In every section, you should present your results in a way that makes it easy for a reader to understand what they mean. You should facilitate comparisons either using graphs with multiple curves or (if appropriate, e.g. for accuracies) a results table. You need to avoid having too many figures, poorly labelled graphs, and graphs which should be comparable but which use different axis scales. A good presentation will enable the reader to compare trends in the same graph -- each graph should summarise the results relating to a particular research (sub)question.
|
||||
|
||||
Your discussion should interpret the results, both in terms of summarising the outcomes of a particular experiment, and attempting to relate to the underlying models. A good report would have some analysis, resulting in an understanding of why particular results are observed, perhaps with reference to the literature. Use bibtex to organise your references -- in this case the references are in the file \verb+example-refs.bib+. Here is a an example reference \citep{langley00}.
|
||||
|
||||
\section{Batch normalisation}
|
||||
In this section you should present batch normalisation, supported using equations or algorithmic pseudocode. Following this, present your experiments, again remembering to include the ``what'', the ``why'', and the interpretation of results.
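As a starting point, the standard batch normalisation transform of a single feature over a mini-batch $\{x_1,\dots,x_m\}$, with learned scale and shift $\gamma$ and $\beta$, can be sketched as follows (this is the usual formulation, not prescribed notation):
\begin{align*}
\mu_B &= \frac{1}{m}\sum_{i=1}^{m} x_i, &
\sigma_B^2 &= \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2,\\
\hat{x}_i &= \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, &
y_i &= \gamma \hat{x}_i + \beta .
\end{align*}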
|
||||
|
||||
\section{Convolutional networks}
|
||||
In this section you should present your experiments with convolutional networks. Explain the idea of convolutional layers and pooling layers, and briefly explain how you did the implementation. There is no need to include chunks of code. You should report the experiments you have undertaken, again remembering to include \emph{what} experiments you performed (include details of hyperparameters, etc.), \emph{why} you performed them (what was the motivation for the experiments, what research questions are you exploring), and the interpretation and discussion of your results.
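For instance, the feed-forward computation of a convolutional layer with kernels $w$, biases $b$, input feature maps $x$ and output feature maps $y$ can be summarised, written here in cross-correlation form (flipping the kernel indices gives the convolution form, so match whichever convention your implementation uses), as
\begin{equation*}
y_{o,i,j} = b_o + \sum_{c}\sum_{m}\sum_{n} w_{o,c,m,n}\, x_{c,\,i+m,\,j+n},
\end{equation*}
where $o$ indexes output channels, $c$ input channels, $(i,j)$ output positions, and $(m,n)$ kernel positions.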
|
||||
|
||||
\section{Test results}
|
||||
The results reported in the previous sections should be on the validation set. You should finally report results on the EMNIST test set using what you judge to be the best deep neural network (without convolutional layers) and the best convolutional network. Again focus on what the experiments were (be precise), why you chose to do them (in particular, how did you choose the architectures/settings to use with the test set), and a discussion/interpretation of the results.
|
||||
|
||||
|
||||
\section{Conclusions}
|
||||
\label{sec:concl}
|
||||
You should draw conclusions from the experiments, related to the research questions outlined in the introduction (section~\ref{sec:intro}). You should state the conclusions clearly and concisely. It is good if the conclusion from one experiment influenced what you did in later experiments -- your aim is to learn from your experiments. Extra credit if you relate your findings to what has been reported in the literature.
|
||||
|
||||
A good conclusions section would also include a further work discussion, building on work done so far, and referencing the literature where appropriate.
|
||||
|
||||
\bibliography{example-refs}
|
||||
|
||||
\end{document}
|
||||
|
||||
|
||||
% This document was modified from the file originally made available by
|
||||
% Pat Langley and Andrea Danyluk for ICML-2K. This version was
|
||||
% created by Lise Getoor and Tobias Scheffer, it was slightly modified
|
||||
% from the 2010 version by Thorsten Joachims & Johannes Fuernkranz,
|
||||
% slightly modified from the 2009 version by Kiri Wagstaff and
|
||||
% Sam Roweis's 2008 version, which is slightly modified from
|
||||
% Prasad Tadepalli's 2007 version which is a lightly
|
||||
% changed version of the previous year's version by Andrew Moore,
|
||||
% which was in turn edited from those of Kristian Kersting and
|
||||
% Codrina Lauth. Alex Smola contributed to the algorithmic style files.
|
@ -1,720 +0,0 @@
|
||||
% File: mlp2017.sty (LaTeX style file for ICML-2017, version of 2017-05-31)
|
||||
|
||||
% Modified by Daniel Roy 2017: changed byline to use footnotes for affiliations, and removed emails
|
||||
|
||||
% This file contains the LaTeX formatting parameters for a two-column
|
||||
% conference proceedings that is 8.5 inches wide by 11 inches high.
|
||||
%
|
||||
% Modified by Percy Liang 12/2/2013: changed the year, location from the previous template for ICML 2014
|
||||
|
||||
% Modified by Fei Sha 9/2/2013: changed the year, location form the previous template for ICML 2013
|
||||
%
|
||||
% Modified by Fei Sha 4/24/2013: (1) remove the extra whitespace after the first author's email address (in the camera-ready version) (2) change the Proceeding ... of ICML 2010 to 2014 so the PDF's metadata will show up correctly
|
||||
%
|
||||
% Modified by Sanjoy Dasgupta, 2013: changed years, location
|
||||
%
|
||||
% Modified by Francesco Figari, 2012: changed years, location
|
||||
%
|
||||
% Modified by Christoph Sawade and Tobias Scheffer, 2011: added line
|
||||
% numbers, changed years
|
||||
%
|
||||
% Modified by Hal Daume III, 2010: changed years, added hyperlinks
|
||||
%
|
||||
% Modified by Kiri Wagstaff, 2009: changed years
|
||||
%
|
||||
% Modified by Sam Roweis, 2008: changed years
|
||||
%
|
||||
% Modified by Ricardo Silva, 2007: update of the ifpdf verification
|
||||
%
|
||||
% Modified by Prasad Tadepalli and Andrew Moore, merely changing years.
|
||||
%
|
||||
% Modified by Kristian Kersting, 2005, based on Jennifer Dy's 2004 version
|
||||
% - running title. If the original title is too long or is breaking a line,
|
||||
% use \mlptitlerunning{...} in the preamble to supply a shorter form.
|
||||
% Added fancyhdr package to get a running head.
|
||||
% - Updated to store the page size because pdflatex does compile the
|
||||
% page size into the pdf.
|
||||
%
|
||||
% Hacked by Terran Lane, 2003:
|
||||
% - Updated to use LaTeX2e style file conventions (ProvidesPackage,
|
||||
% etc.)
|
||||
% - Added an ``appearing in'' block at the base of the first column
|
||||
% (thus keeping the ``appearing in'' note out of the bottom margin
|
||||
% where the printer should strip in the page numbers).
|
||||
% - Added a package option [accepted] that selects between the ``Under
|
||||
% review'' notice (default, when no option is specified) and the
|
||||
% ``Appearing in'' notice (for use when the paper has been accepted
|
||||
% and will appear).
|
||||
%
|
||||
% Originally created as: ml2k.sty (LaTeX style file for ICML-2000)
|
||||
% by P. Langley (12/23/99)
|
||||
|
||||
%%%%%%%%%%%%%%%%%%%%
|
||||
%% This version of the style file supports both a ``review'' version
|
||||
%% and a ``final/accepted'' version. The difference is only in the
|
||||
%% text that appears in the note at the bottom of the first column of
|
||||
%% the first page. The default behavior is to print a note to the
|
||||
%% effect that the paper is under review and don't distribute it. The
|
||||
%% final/accepted version prints an ``Appearing in'' note. To get the
|
||||
%% latter behavior, in the calling file change the ``usepackage'' line
|
||||
%% from:
|
||||
%% \usepackage{icml2017}
|
||||
%% to
|
||||
%% \usepackage[accepted]{icml2017}
|
||||
%%%%%%%%%%%%%%%%%%%%
|
||||
|
||||
\NeedsTeXFormat{LaTeX2e}
|
||||
\ProvidesPackage{mlp2017}[2017/01/01 MLP Coursework Style File]
|
||||
|
||||
% Use fancyhdr package
|
||||
\RequirePackage{fancyhdr}
|
||||
\RequirePackage{color}
|
||||
\RequirePackage{algorithm}
|
||||
\RequirePackage{algorithmic}
|
||||
\RequirePackage{natbib}
|
||||
\RequirePackage{eso-pic} % used by \AddToShipoutPicture
|
||||
\RequirePackage{forloop}
|
||||
|
||||
%%%%%%%% Options
|
||||
%\DeclareOption{accepted}{%
|
||||
% \renewcommand{\Notice@String}{\ICML@appearing}
|
||||
\gdef\isaccepted{1}
|
||||
%}
|
||||
\DeclareOption{nohyperref}{%
|
||||
\gdef\nohyperref{1}
|
||||
}
|
||||
|
||||
\ifdefined\nohyperref\else\ifdefined\hypersetup
|
||||
\definecolor{mydarkblue}{rgb}{0,0.08,0.45}
|
||||
\hypersetup{ %
|
||||
pdftitle={},
|
||||
pdfauthor={},
|
||||
pdfsubject={MLP Coursework 2017-18},
|
||||
pdfkeywords={},
|
||||
pdfborder=0 0 0,
|
||||
pdfpagemode=UseNone,
|
||||
colorlinks=true,
|
||||
linkcolor=mydarkblue,
|
||||
citecolor=mydarkblue,
|
||||
filecolor=mydarkblue,
|
||||
urlcolor=mydarkblue,
|
||||
pdfview=FitH}
|
||||
|
||||
\ifdefined\isaccepted \else
|
||||
\hypersetup{pdfauthor={Anonymous Submission}}
|
||||
\fi
|
||||
\fi\fi
|
||||
|
||||
%%%%%%%%%%%%%%%%%%%%
|
||||
% This string is printed at the bottom of the page for the
|
||||
% final/accepted version of the ``appearing in'' note. Modify it to
|
||||
% change that text.
|
||||
%%%%%%%%%%%%%%%%%%%%
|
||||
\newcommand{\ICML@appearing}{\textit{MLP Coursework 1 2017-18}}
|
||||
|
||||
%%%%%%%%%%%%%%%%%%%%
|
||||
% This string is printed at the bottom of the page for the draft/under
|
||||
% review version of the ``appearing in'' note. Modify it to change
|
||||
% that text.
|
||||
%%%%%%%%%%%%%%%%%%%%
|
||||
\newcommand{\Notice@String}{MLP Coursework 1 2017-18}
|
||||
|
||||
% Cause the declared options to actually be parsed and activated
|
||||
\ProcessOptions\relax
|
||||
|
||||
% Uncomment the following for debugging. It will cause LaTeX to dump
|
||||
% the version of the ``appearing in'' string that will actually appear
|
||||
% in the document.
|
||||
%\typeout{>> Notice string='\Notice@String'}
|
||||
|
||||
% Change citation commands to be more like old ICML styles
|
||||
\newcommand{\yrcite}[1]{\citeyearpar{#1}}
|
||||
\renewcommand{\cite}[1]{\citep{#1}}
|
||||
|
||||
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
% to ensure the letter format is used. pdflatex does compile the
|
||||
% page size into the pdf. This is done using \pdfpagewidth and
|
||||
% \pdfpageheight. As LaTeX does not know these directives, we first
|
||||
% check whether pdflatex or latex is used.
|
||||
%
|
||||
% Kristian Kersting 2005
|
||||
%
|
||||
% in order to account for the more recent use of pdfetex as the default
|
||||
% compiler, I have changed the pdf verification.
|
||||
%
|
||||
% Ricardo Silva 2007
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
|
||||
\paperwidth=210mm
|
||||
\paperheight=297mm
|
||||
|
||||
% old PDFLaTex verification, circa 2005
|
||||
%
|
||||
%\newif\ifpdf\ifx\pdfoutput\undefined
|
||||
% \pdffalse % we are not running PDFLaTeX
|
||||
%\else
|
||||
% \pdfoutput=1 % we are running PDFLaTeX
|
||||
% \pdftrue
|
||||
%\fi
|
||||
|
||||
\newif\ifpdf %adapted from ifpdf.sty
|
||||
\ifx\pdfoutput\undefined
|
||||
\else
|
||||
\ifx\pdfoutput\relax
|
||||
\else
|
||||
\ifcase\pdfoutput
|
||||
\else
|
||||
\pdftrue
|
||||
\fi
|
||||
\fi
|
||||
\fi
|
||||
|
||||
\ifpdf
|
||||
% \pdfpagewidth=\paperwidth
|
||||
% \pdfpageheight=\paperheight
|
||||
\setlength{\pdfpagewidth}{210mm}
|
||||
\setlength{\pdfpageheight}{297mm}
|
||||
\fi
|
||||
|
||||
% Physical page layout
|
||||
|
||||
\evensidemargin -5.5mm
|
||||
\oddsidemargin -5.5mm
|
||||
\setlength\textheight{248mm}
|
||||
\setlength\textwidth{170mm}
|
||||
\setlength\columnsep{6.5mm}
|
||||
\setlength\headheight{10pt}
|
||||
\setlength\headsep{10pt}
|
||||
\addtolength{\topmargin}{-20pt}
|
||||
|
||||
%\setlength\headheight{1em}
|
||||
%\setlength\headsep{1em}
|
||||
\addtolength{\topmargin}{-6mm}
|
||||
|
||||
%\addtolength{\topmargin}{-2em}
|
||||
|
||||
%% The following is adapted from code in the acmconf.sty conference
|
||||
%% style file. The constants in it are somewhat magical, and appear
|
||||
%% to work well with the two-column format on US letter paper that
|
||||
%% ICML uses, but will break if you change that layout, or if you use
|
||||
%% a longer block of text for the copyright notice string. Fiddle with
|
||||
%% them if necessary to get the block to fit/look right.
|
||||
%%
|
||||
%% -- Terran Lane, 2003
|
||||
%%
|
||||
%% The following comments are included verbatim from acmconf.sty:
|
||||
%%
|
||||
%%% This section (written by KBT) handles the 1" box in the lower left
|
||||
%%% corner of the left column of the first page by creating a picture,
|
||||
%%% and inserting the predefined string at the bottom (with a negative
|
||||
%%% displacement to offset the space allocated for a non-existent
|
||||
%%% caption).
|
||||
%%%
|
||||
\def\ftype@copyrightbox{8}
|
||||
\def\@copyrightspace{
|
||||
% Create a float object positioned at the bottom of the column. Note
|
||||
% that because of the mystical nature of floats, this has to be called
|
||||
% before the first column is populated with text (e.g., from the title
|
||||
% or abstract blocks). Otherwise, the text will force the float to
|
||||
% the next column. -- TDRL.
|
||||
\@float{copyrightbox}[b]
|
||||
\begin{center}
|
||||
\setlength{\unitlength}{1pc}
|
||||
\begin{picture}(20,1.5)
|
||||
% Create a line separating the main text from the note block.
|
||||
% 4.818pc==0.8in.
|
||||
\put(0,2.5){\line(1,0){4.818}}
|
||||
% Insert the text string itself. Note that the string has to be
|
||||
% enclosed in a parbox -- the \put call needs a box object to
|
||||
% position. Without the parbox, the text gets splattered across the
|
||||
% bottom of the page semi-randomly. The 19.75pc distance seems to be
|
||||
% the width of the column, though I can't find an appropriate distance
|
||||
% variable to substitute here. -- TDRL.
|
||||
\put(0,0){\parbox[b]{19.75pc}{\small \Notice@String}}
|
||||
\end{picture}
|
||||
\end{center}
|
||||
\end@float}
|
||||
|
||||
% Note: A few Latex versions need the next line instead of the former.
|
||||
% \addtolength{\topmargin}{0.3in}
|
||||
% \setlength\footheight{0pt}
|
||||
\setlength\footskip{0pt}
|
||||
%\pagestyle{empty}
|
||||
\flushbottom \twocolumn
|
||||
\sloppy
|
||||
|
||||
% Clear out the addcontentsline command
|
||||
\def\addcontentsline#1#2#3{}
|
||||
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
%%% commands for formatting paper title, author names, and addresses.
|
||||
|
||||
%%start%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
%%%%%% title as running head -- Kristian Kersting 2005 %%%%%%%%%%%%%
|
||||
|
||||
|
||||
%\makeatletter
|
||||
%\newtoks\mytoksa
|
||||
%\newtoks\mytoksb
|
||||
%\newcommand\addtomylist[2]{%
|
||||
% \mytoksa\expandafter{#1}%
|
||||
% \mytoksb{#2}%
|
||||
% \edef#1{\the\mytoksa\the\mytoksb}%
|
||||
%}
|
||||
%\makeatother
|
||||
|
||||
% box to check the size of the running head
|
||||
\newbox\titrun
|
||||
|
||||
% general page style
|
||||
\pagestyle{fancy}
|
||||
\fancyhf{}
|
||||
\fancyhead{}
|
||||
\fancyfoot{}
|
||||
% set the width of the head rule to 1 point
|
||||
\renewcommand{\headrulewidth}{1pt}
|
||||
|
||||
% definition to set the head as running head in the preamble
|
||||
\def\mlptitlerunning#1{\gdef\@mlptitlerunning{#1}}
|
||||
|
||||
% main definition adapting \mlptitle from 2004
|
||||
\long\def\mlptitle#1{%
|
||||
|
||||
%check whether @mlptitlerunning exists
|
||||
% if not \mlptitle is used as running head
|
||||
\ifx\undefined\@mlptitlerunning%
|
||||
\gdef\@mlptitlerunning{#1}
|
||||
\fi
|
||||
|
||||
%add it to pdf information
|
||||
\ifdefined\nohyperref\else\ifdefined\hypersetup
|
||||
\hypersetup{pdftitle={#1}}
|
||||
\fi\fi
|
||||
|
||||
%get the dimension of the running title
|
||||
\global\setbox\titrun=\vbox{\small\bf\@mlptitlerunning}
|
||||
|
||||
% error flag
|
||||
\gdef\@runningtitleerror{0}
|
||||
|
||||
% running title too long
|
||||
\ifdim\wd\titrun>\textwidth%
|
||||
{\gdef\@runningtitleerror{1}}%
|
||||
% running title breaks a line
|
||||
\else\ifdim\ht\titrun>6.25pt
|
||||
{\gdef\@runningtitleerror{2}}%
|
||||
\fi
|
||||
\fi
|
||||
|
||||
% if there is something wrong with the running title
|
||||
\ifnum\@runningtitleerror>0
|
||||
\typeout{}%
|
||||
\typeout{}%
|
||||
\typeout{*******************************************************}%
|
||||
\typeout{Title exceeds size limitations for running head.}%
|
||||
\typeout{Please supply a shorter form for the running head}
|
||||
\typeout{with \string\mlptitlerunning{...}\space prior to \string\begin{document}}%
|
||||
\typeout{*******************************************************}%
|
||||
\typeout{}%
|
||||
\typeout{}%
|
||||
% set default running title
|
||||
\chead{\small\bf Title Suppressed Due to Excessive Size}%
|
||||
\else
|
||||
% 'everything' fine, set provided running title
|
||||
\chead{\small\bf\@mlptitlerunning}%
|
||||
\fi
|
||||
|
||||
% no running title on the first page of the paper
|
||||
\thispagestyle{empty}
|
||||
|
||||
%%%%%%%%%%%%%%%%%%%% Kristian Kersting %%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
%end%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
|
||||
{\center\baselineskip 18pt
|
||||
\toptitlebar{\Large\bf #1}\bottomtitlebar}
|
||||
}
|
||||
|
||||
|
||||
\gdef\icmlfullauthorlist{}
|
||||
\newcommand\addstringtofullauthorlist{\g@addto@macro\icmlfullauthorlist}
|
||||
\newcommand\addtofullauthorlist[1]{%
|
||||
\ifdefined\icmlanyauthors%
|
||||
\addstringtofullauthorlist{, #1}%
|
||||
\else%
|
||||
\addstringtofullauthorlist{#1}%
|
||||
\gdef\icmlanyauthors{1}%
|
||||
\fi%
|
||||
\ifdefined\nohyperref\else\ifdefined\hypersetup%
|
||||
\hypersetup{pdfauthor=\icmlfullauthorlist}%
|
||||
\fi\fi}
|
||||
|
||||
|
||||
\def\toptitlebar{\hrule height1pt \vskip .25in}
|
||||
\def\bottomtitlebar{\vskip .22in \hrule height1pt \vskip .3in}
|
||||
|
||||
\newenvironment{icmlauthorlist}{%
|
||||
\setlength\topsep{0pt}
|
||||
\setlength\parskip{0pt}
|
||||
\begin{center}
|
||||
}{%
|
||||
\end{center}
|
||||
}
|
||||
|
||||
\newcounter{@affiliationcounter}
|
||||
\newcommand{\@pa}[1]{%
|
||||
% ``#1''
|
||||
\ifcsname the@affil#1\endcsname
|
||||
% do nothing
|
||||
\else
|
||||
\ifcsname @icmlsymbol#1\endcsname
|
||||
% nothing
|
||||
\else
|
||||
\stepcounter{@affiliationcounter}%
|
||||
\newcounter{@affil#1}%
|
||||
\setcounter{@affil#1}{\value{@affiliationcounter}}%
|
||||
\fi
|
||||
\fi%
|
||||
\ifcsname @icmlsymbol#1\endcsname
|
||||
\textsuperscript{\csname @icmlsymbol#1\endcsname\,}%
|
||||
\else
|
||||
%\expandafter\footnotemark[\arabic{@affil#1}\,]%
|
||||
\textsuperscript{\arabic{@affil#1}\,}%
|
||||
\fi
|
||||
}
|
||||
|
||||
%\newcommand{\icmlauthor}[2]{%
|
||||
%\addtofullauthorlist{#1}%
|
||||
%#1\@for\theaffil:=#2\do{\pa{\theaffil}}%
|
||||
%}
|
||||
\newcommand{\icmlauthor}[2]{%
|
||||
\ifdefined\isaccepted
|
||||
\mbox{\bf #1}\,\@for\theaffil:=#2\do{\@pa{\theaffil}} \addtofullauthorlist{#1}%
|
||||
\else
|
||||
\ifdefined\@icmlfirsttime
|
||||
\else
|
||||
\gdef\@icmlfirsttime{1}
|
||||
\mbox{\bf Anonymous Authors}\@pa{@anon} \addtofullauthorlist{Anonymous Authors}
|
||||
\fi
|
||||
\fi
|
||||
}
|
||||
|
||||
\newcommand{\icmlsetsymbol}[2]{%
|
||||
\expandafter\gdef\csname @icmlsymbol#1\endcsname{#2}
|
||||
}
|
||||
|
||||
|
||||
\newcommand{\icmlaffiliation}[2]{%
|
||||
\ifdefined\isaccepted
|
||||
\ifcsname the@affil#1\endcsname
|
||||
\expandafter\gdef\csname @affilname\csname the@affil#1\endcsname\endcsname{#2}%
|
||||
\else
|
||||
{\bf AUTHORERR: Error in use of \textbackslash{}icmlaffiliation command. Label ``#1'' not mentioned in some \textbackslash{}icmlauthor\{author name\}\{labels here\} command beforehand. }
|
||||
\typeout{}%
|
||||
\typeout{}%
|
||||
\typeout{*******************************************************}%
|
||||
\typeout{Affiliation label undefined. }%
|
||||
\typeout{Make sure \string\icmlaffiliation\space follows }
|
||||
\typeout{all of \string\icmlauthor\space commands}%
|
||||
\typeout{*******************************************************}%
|
||||
\typeout{}%
|
||||
\typeout{}%
|
||||
\fi
|
||||
\else % \isaccepted
|
||||
% can be called multiple times... it's idempotent
|
||||
\expandafter\gdef\csname @affilname1\endcsname{Anonymous Institution, Anonymous City, Anonymous Region, Anonymous Country}
|
||||
\fi
|
||||
}
|
||||
|
||||
\newcommand{\icmlcorrespondingauthor}[2]{
|
||||
\ifdefined\isaccepted
|
||||
\ifdefined\icmlcorrespondingauthor@text
|
||||
\g@addto@macro\icmlcorrespondingauthor@text{, #1 \textless{}#2\textgreater{}}
|
||||
\else
|
||||
\gdef\icmlcorrespondingauthor@text{#1 \textless{}#2\textgreater{}}
|
||||
\fi
|
||||
\else
|
||||
\gdef\icmlcorrespondingauthor@text{Anonymous Author \textless{}anon.email@domain.com\textgreater{}}
|
||||
\fi
|
||||
}
|
||||
|
||||
\newcommand{\icmlEqualContribution}{\textsuperscript{*}Equal contribution }
|
||||
|
||||
\newcounter{@affilnum}
|
||||
\newcommand{\printAffiliationsAndNotice}[1]{%
|
||||
\stepcounter{@affiliationcounter}%
|
||||
{\let\thefootnote\relax\footnotetext{\hspace*{-\footnotesep}#1%
|
||||
\forloop{@affilnum}{1}{\value{@affilnum} < \value{@affiliationcounter}}{
|
||||
\textsuperscript{\arabic{@affilnum}}\ifcsname @affilname\the@affilnum\endcsname%
|
||||
\csname @affilname\the@affilnum\endcsname%
|
||||
\else
|
||||
{\bf AUTHORERR: Missing \textbackslash{}icmlaffiliation.}
|
||||
\fi
|
||||
}.
|
||||
\ifdefined\icmlcorrespondingauthor@text
|
||||
Correspondence to: \icmlcorrespondingauthor@text.
|
||||
\else
|
||||
{\bf AUTHORERR: Missing \textbackslash{}icmlcorrespondingauthor.}
|
||||
\fi
|
||||
|
||||
\ \\
|
||||
\Notice@String
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
%\makeatother
|
||||
|
||||
\long\def\icmladdress#1{%
|
||||
{\bf The \textbackslash{}icmladdress command is no longer used. See the example\_paper PDF and .tex for usage of \textbackslash{}icmlauthor and \textbackslash{}icmlaffiliation.}
|
||||
}
|
||||
|
||||
%% keywords as first class citizens
|
||||
\def\icmlkeywords#1{%
|
||||
% \ifdefined\isaccepted \else
|
||||
% \par {\bf Keywords:} #1%
|
||||
% \fi
|
||||
% \ifdefined\nohyperref\else\ifdefined\hypersetup
|
||||
% \hypersetup{pdfkeywords={#1}}
|
||||
% \fi\fi
|
||||
% \ifdefined\isaccepted \else
|
||||
% \par {\bf Keywords:} #1%
|
||||
% \fi
|
||||
\ifdefined\nohyperref\else\ifdefined\hypersetup
|
||||
\hypersetup{pdfkeywords={#1}}
|
||||
\fi\fi
|
||||
}
|
||||
|
||||
% modification to natbib citations
|
||||
\setcitestyle{authoryear,round,citesep={;},aysep={,},yysep={;}}
|
||||
|
||||
% Redefinition of the abstract environment.
|
||||
\renewenvironment{abstract}
|
||||
{%
|
||||
% Insert the ``appearing in'' copyright notice.
|
||||
%\@copyrightspace
|
||||
\centerline{\large\bf Abstract}
|
||||
\vspace{-0.12in}\begin{quote}}
|
||||
{\par\end{quote}\vskip 0.12in}
|
||||
|
||||
% numbered section headings with different treatment of numbers
|
||||
|
||||
\def\@startsection#1#2#3#4#5#6{\if@noskipsec \leavevmode \fi
|
||||
\par \@tempskipa #4\relax
|
||||
\@afterindenttrue
|
||||
% Altered the following line to indent a section's first paragraph.
|
||||
% \ifdim \@tempskipa <\z@ \@tempskipa -\@tempskipa \@afterindentfalse\fi
|
||||
\ifdim \@tempskipa <\z@ \@tempskipa -\@tempskipa \fi
|
||||
\if@nobreak \everypar{}\else
|
||||
\addpenalty{\@secpenalty}\addvspace{\@tempskipa}\fi \@ifstar
|
||||
{\@ssect{#3}{#4}{#5}{#6}}{\@dblarg{\@sict{#1}{#2}{#3}{#4}{#5}{#6}}}}
|
||||
|
||||
\def\@sict#1#2#3#4#5#6[#7]#8{\ifnum #2>\c@secnumdepth
|
||||
\def\@svsec{}\else
|
||||
\refstepcounter{#1}\edef\@svsec{\csname the#1\endcsname}\fi
|
||||
\@tempskipa #5\relax
|
||||
\ifdim \@tempskipa>\z@
|
||||
\begingroup #6\relax
|
||||
\@hangfrom{\hskip #3\relax\@svsec.~}{\interlinepenalty \@M #8\par}
|
||||
\endgroup
|
||||
\csname #1mark\endcsname{#7}\addcontentsline
|
||||
{toc}{#1}{\ifnum #2>\c@secnumdepth \else
|
||||
\protect\numberline{\csname the#1\endcsname}\fi
|
||||
#7}\else
|
||||
\def\@svsechd{#6\hskip #3\@svsec #8\csname #1mark\endcsname
|
||||
{#7}\addcontentsline
|
||||
{toc}{#1}{\ifnum #2>\c@secnumdepth \else
|
||||
\protect\numberline{\csname the#1\endcsname}\fi
|
||||
#7}}\fi
|
||||
\@xsect{#5}}
|
||||
|
||||
\def\@sect#1#2#3#4#5#6[#7]#8{\ifnum #2>\c@secnumdepth
|
||||
\def\@svsec{}\else
|
||||
\refstepcounter{#1}\edef\@svsec{\csname the#1\endcsname\hskip 0.4em }\fi
|
||||
\@tempskipa #5\relax
|
||||
\ifdim \@tempskipa>\z@
|
||||
\begingroup #6\relax
|
||||
\@hangfrom{\hskip #3\relax\@svsec}{\interlinepenalty \@M #8\par}
|
||||
\endgroup
|
||||
\csname #1mark\endcsname{#7}\addcontentsline
|
||||
{toc}{#1}{\ifnum #2>\c@secnumdepth \else
|
||||
\protect\numberline{\csname the#1\endcsname}\fi
|
||||
#7}\else
|
||||
\def\@svsechd{#6\hskip #3\@svsec #8\csname #1mark\endcsname
|
||||
{#7}\addcontentsline
|
||||
{toc}{#1}{\ifnum #2>\c@secnumdepth \else
|
||||
\protect\numberline{\csname the#1\endcsname}\fi
|
||||
#7}}\fi
|
||||
\@xsect{#5}}
|
||||
|
||||
% section headings with less space above and below them
|
||||
\def\thesection {\arabic{section}}
|
||||
\def\thesubsection {\thesection.\arabic{subsection}}
|
||||
\def\section{\@startsection{section}{1}{\z@}{-0.12in}{0.02in}
|
||||
{\large\bf\raggedright}}
|
||||
\def\subsection{\@startsection{subsection}{2}{\z@}{-0.10in}{0.01in}
|
||||
{\normalsize\bf\raggedright}}
|
||||
\def\subsubsection{\@startsection{subsubsection}{3}{\z@}{-0.08in}{0.01in}
|
||||
{\normalsize\sc\raggedright}}
|
||||
\def\paragraph{\@startsection{paragraph}{4}{\z@}{1.5ex plus
|
||||
0.5ex minus .2ex}{-1em}{\normalsize\bf}}
|
||||
\def\subparagraph{\@startsection{subparagraph}{5}{\z@}{1.5ex plus
|
||||
0.5ex minus .2ex}{-1em}{\normalsize\bf}}
|
||||
|
||||
% Footnotes
|
||||
\footnotesep 6.65pt %
|
||||
\skip\footins 9pt
|
||||
\def\footnoterule{\kern-3pt \hrule width 0.8in \kern 2.6pt }
|
||||
\setcounter{footnote}{0}
|
||||
|
||||
% Lists and paragraphs
|
||||
\parindent 0pt
|
||||
\topsep 4pt plus 1pt minus 2pt
|
||||
\partopsep 1pt plus 0.5pt minus 0.5pt
|
||||
\itemsep 2pt plus 1pt minus 0.5pt
|
||||
\parsep 2pt plus 1pt minus 0.5pt
|
||||
\parskip 6pt
|
||||
|
||||
\leftmargin 2em \leftmargini\leftmargin \leftmarginii 2em
|
||||
\leftmarginiii 1.5em \leftmarginiv 1.0em \leftmarginv .5em
|
||||
\leftmarginvi .5em
|
||||
\labelwidth\leftmargini\advance\labelwidth-\labelsep \labelsep 5pt
|
||||
|
||||
\def\@listi{\leftmargin\leftmargini}
|
||||
\def\@listii{\leftmargin\leftmarginii
|
||||
\labelwidth\leftmarginii\advance\labelwidth-\labelsep
|
||||
\topsep 2pt plus 1pt minus 0.5pt
|
||||
\parsep 1pt plus 0.5pt minus 0.5pt
|
||||
\itemsep \parsep}
|
||||
\def\@listiii{\leftmargin\leftmarginiii
|
||||
\labelwidth\leftmarginiii\advance\labelwidth-\labelsep
|
||||
\topsep 1pt plus 0.5pt minus 0.5pt
|
||||
\parsep \z@ \partopsep 0.5pt plus 0pt minus 0.5pt
|
||||
\itemsep \topsep}
|
||||
\def\@listiv{\leftmargin\leftmarginiv
|
||||
\labelwidth\leftmarginiv\advance\labelwidth-\labelsep}
|
||||
\def\@listv{\leftmargin\leftmarginv
|
||||
\labelwidth\leftmarginv\advance\labelwidth-\labelsep}
|
||||
\def\@listvi{\leftmargin\leftmarginvi
|
||||
\labelwidth\leftmarginvi\advance\labelwidth-\labelsep}
|
||||
|
||||
\abovedisplayskip 7pt plus2pt minus5pt%
|
||||
\belowdisplayskip \abovedisplayskip
|
||||
\abovedisplayshortskip 0pt plus3pt%
|
||||
\belowdisplayshortskip 4pt plus3pt minus3pt%
|
||||
|
||||
% Less leading in most fonts (due to the narrow columns)
|
||||
% The choices were between 1-pt and 1.5-pt leading
|
||||
\def\@normalsize{\@setsize\normalsize{11pt}\xpt\@xpt}
|
||||
\def\small{\@setsize\small{10pt}\ixpt\@ixpt}
|
||||
\def\footnotesize{\@setsize\footnotesize{10pt}\ixpt\@ixpt}
|
||||
\def\scriptsize{\@setsize\scriptsize{8pt}\viipt\@viipt}
|
||||
\def\tiny{\@setsize\tiny{7pt}\vipt\@vipt}
|
||||
\def\large{\@setsize\large{14pt}\xiipt\@xiipt}
|
||||
\def\Large{\@setsize\Large{16pt}\xivpt\@xivpt}
|
||||
\def\LARGE{\@setsize\LARGE{20pt}\xviipt\@xviipt}
|
||||
\def\huge{\@setsize\huge{23pt}\xxpt\@xxpt}
|
||||
\def\Huge{\@setsize\Huge{28pt}\xxvpt\@xxvpt}
|
||||
|
||||
% Revised formatting for figure captions and table titles.
|
||||
\newsavebox\newcaptionbox\newdimen\newcaptionboxwid
|
||||
|
||||
\long\def\@makecaption#1#2{
|
||||
\vskip 10pt
|
||||
\baselineskip 11pt
|
||||
\setbox\@tempboxa\hbox{#1. #2}
|
||||
\ifdim \wd\@tempboxa >\hsize
|
||||
\sbox{\newcaptionbox}{\small\sl #1.~}
|
||||
\newcaptionboxwid=\wd\newcaptionbox
|
||||
\usebox\newcaptionbox {\footnotesize #2}
|
||||
% \usebox\newcaptionbox {\small #2}
|
||||
\else
|
||||
\centerline{{\small\sl #1.} {\small #2}}
|
||||
\fi}
|
||||
|
||||
\def\fnum@figure{Figure \thefigure}
|
||||
\def\fnum@table{Table \thetable}
|
||||
|
||||
% Strut macros for skipping spaces above and below text in tables.
|
||||
\def\abovestrut#1{\rule[0in]{0in}{#1}\ignorespaces}
|
||||
\def\belowstrut#1{\rule[-#1]{0in}{#1}\ignorespaces}
|
||||
|
||||
\def\abovespace{\abovestrut{0.20in}}
|
||||
\def\aroundspace{\abovestrut{0.20in}\belowstrut{0.10in}}
|
||||
\def\belowspace{\belowstrut{0.10in}}
|
||||
|
||||
% Various personal itemization commands.
|
||||
\def\texitem#1{\par\noindent\hangindent 12pt
|
||||
\hbox to 12pt {\hss #1 ~}\ignorespaces}
|
||||
\def\icmlitem{\texitem{$\bullet$}}
|
||||
|
||||
% To comment out multiple lines of text.
|
||||
\long\def\comment#1{}
|
||||
|
||||
|
||||
|
||||
|
||||
%% Line counter (not in final version). Adapted from NIPS style file by Christoph Sawade
|
||||
|
||||
% Vertical Ruler
|
||||
% This code is, largely, from the CVPR 2010 conference style file
|
||||
% ----- define vruler
|
||||
\makeatletter
|
||||
\newbox\icmlrulerbox
|
||||
\newcount\icmlrulercount
|
||||
\newdimen\icmlruleroffset
|
||||
\newdimen\cv@lineheight
|
||||
\newdimen\cv@boxheight
|
||||
\newbox\cv@tmpbox
|
||||
\newcount\cv@refno
|
||||
\newcount\cv@tot
|
||||
% NUMBER with left flushed zeros \fillzeros[<WIDTH>]<NUMBER>
|
||||
\newcount\cv@tmpc@ \newcount\cv@tmpc
|
||||
\def\fillzeros[#1]#2{\cv@tmpc@=#2\relax\ifnum\cv@tmpc@<0\cv@tmpc@=-\cv@tmpc@\fi
|
||||
\cv@tmpc=1 %
|
||||
\loop\ifnum\cv@tmpc@<10 \else \divide\cv@tmpc@ by 10 \advance\cv@tmpc by 1 \fi
|
||||
\ifnum\cv@tmpc@=10\relax\cv@tmpc@=11\relax\fi \ifnum\cv@tmpc@>10 \repeat
|
||||
\ifnum#2<0\advance\cv@tmpc1\relax-\fi
|
||||
\loop\ifnum\cv@tmpc<#1\relax0\advance\cv@tmpc1\relax\fi \ifnum\cv@tmpc<#1 \repeat
|
||||
\cv@tmpc@=#2\relax\ifnum\cv@tmpc@<0\cv@tmpc@=-\cv@tmpc@\fi \relax\the\cv@tmpc@}%
|
||||
% \makevruler[<SCALE>][<INITIAL_COUNT>][<STEP>][<DIGITS>][<HEIGHT>]
|
||||
\def\makevruler[#1][#2][#3][#4][#5]{
|
||||
\begingroup\offinterlineskip
|
||||
\textheight=#5\vbadness=10000\vfuzz=120ex\overfullrule=0pt%
|
||||
\global\setbox\icmlrulerbox=\vbox to \textheight{%
|
||||
{
|
||||
\parskip=0pt\hfuzz=150em\cv@boxheight=\textheight
|
||||
\cv@lineheight=#1\global\icmlrulercount=#2%
|
||||
\cv@tot\cv@boxheight\divide\cv@tot\cv@lineheight\advance\cv@tot2%
|
||||
\cv@refno1\vskip-\cv@lineheight\vskip1ex%
|
||||
\loop\setbox\cv@tmpbox=\hbox to0cm{ % side margin
|
||||
\hfil {\hfil\fillzeros[#4]\icmlrulercount}
|
||||
}%
|
||||
\ht\cv@tmpbox\cv@lineheight\dp\cv@tmpbox0pt\box\cv@tmpbox\break
|
||||
\advance\cv@refno1\global\advance\icmlrulercount#3\relax
|
||||
\ifnum\cv@refno<\cv@tot\repeat
|
||||
}
|
||||
}
|
||||
\endgroup
|
||||
}%
|
||||
\makeatother
|
||||
% ----- end of vruler
|
||||
|
||||
|
||||
% \makevruler[<SCALE>][<INITIAL_COUNT>][<STEP>][<DIGITS>][<HEIGHT>]
|
||||
\def\icmlruler#1{\makevruler[12pt][#1][1][3][\textheight]\usebox{\icmlrulerbox}}
|
||||
\AddToShipoutPicture{%
|
||||
\icmlruleroffset=\textheight
|
||||
\advance\icmlruleroffset by 5.2pt % top margin
|
||||
\color[rgb]{.7,.7,.7}
|
||||
\ifdefined\isaccepted \else
|
||||
\AtTextUpperLeft{%
|
||||
\put(\LenToUnit{-35pt},\LenToUnit{-\icmlruleroffset}){%left ruler
|
||||
\icmlruler{\icmlrulercount}}
|
||||
% \put(\LenToUnit{1.04\textwidth},\LenToUnit{-\icmlruleroffset}){%right ruler
|
||||
% \icmlruler{\icmlrulercount}}
|
||||
}
|
||||
\fi
|
||||
}
|
||||
\endinput
|
1246
report/natbib.sty
1246
report/natbib.sty
File diff suppressed because it is too large
3
requirements.txt
Normal file
3
requirements.txt
Normal file
@ -0,0 +1,3 @@
|
||||
tensorflow==1.4.1
|
||||
tqdm==4.11.2
|
||||
numpy==1.13.1
|
3
requirements_gpu.txt
Normal file
3
requirements_gpu.txt
Normal file
@ -0,0 +1,3 @@
|
||||
tensorflow_gpu==1.4.1
|
||||
tqdm==4.11.2
|
||||
numpy==1.13.1
|
@ -1,43 +0,0 @@
|
||||
import numpy as np
|
||||
from mlp.layers import BatchNormalizationLayer
|
||||
import argparse
|
||||
|
||||
parser = argparse.ArgumentParser(description='Welcome to GAN-Shot-Learning script')
|
||||
|
||||
parser.add_argument('--student_id', nargs="?", type=str, help='Your student id in the format "sxxxxxxx"')
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
student_id = args.student_id
|
||||
|
||||
def generate_inputs(student_id):
|
||||
student_number = student_id
|
||||
tests = np.arange(96).reshape((2, 3, 4, 4))
|
||||
tests[:, 0, :, :] = float(student_number[1:3]) / 10 - 5
|
||||
tests[:, :, 1, :] = float(student_number[3:5]) / 10 - 5
|
||||
tests[:, 2, :, :] = float(student_number[5:7]) / 10 - 5
|
||||
tests[0, 1, :, :] = float(student_number[7]) / 10 - 5
|
||||
return tests
|
||||
|
||||
test_inputs = generate_inputs(student_id)
|
||||
test_inputs = np.reshape(test_inputs, newshape=(2, -1))
|
||||
test_grads_wrt_outputs = np.arange(-48, 48).reshape((2, -1))
|
||||
|
||||
#produce BatchNorm Layer fprop and bprop
|
||||
activation_layer = BatchNormalizationLayer(input_dim=48)
|
||||
|
||||
beta = np.array(48*[0.3])
|
||||
gamma = np.array(48*[0.8])
|
||||
|
||||
activation_layer.params = [gamma, beta]
|
||||
BN_fprop = activation_layer.fprop(test_inputs)
|
||||
BN_bprop = activation_layer.bprop(
|
||||
test_inputs, BN_fprop, test_grads_wrt_outputs)
|
||||
BN_grads_wrt_params = activation_layer.grads_wrt_params(
|
||||
test_inputs, test_grads_wrt_outputs)
|
||||
|
||||
test_output = "BatchNormalization:\nFprop: {}\nBprop: {}\nGrads_wrt_params: {}\n"\
|
||||
.format(BN_fprop, BN_bprop, BN_grads_wrt_params)
|
||||
|
||||
with open("{}_batchnorm_test_file.txt".format(student_id), "w+") as out_file:
|
||||
out_file.write(test_output)
|
@ -1,59 +0,0 @@
|
||||
import numpy as np
|
||||
from mlp.layers import ConvolutionalLayer
|
||||
import argparse
|
||||
|
||||
parser = argparse.ArgumentParser(description='Welcome to GAN-Shot-Learning script')
|
||||
|
||||
parser.add_argument('--student_id', nargs="?", type=str, help='Your student id in the format "sxxxxxxx"')
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
student_id = args.student_id
|
||||
|
||||
def generate_inputs(student_id):
|
||||
student_number = student_id
|
||||
tests = np.arange(96).reshape((2, 3, 4, 4))
|
||||
tests[:, 0, :, :] = float(student_number[1:3]) / 10 - 5
|
||||
tests[:, :, 1, :] = float(student_number[3:5]) / 10 - 5
|
||||
tests[:, 2, :, :] = float(student_number[5:7]) / 10 - 5
|
||||
tests[0, 1, :, :] = float(student_number[7]) / 10 - 5
|
||||
return tests
|
||||
|
||||
test_inputs = generate_inputs(student_id)
|
||||
test_grads_wrt_outputs = np.arange(-20, 16).reshape((2, 2, 3, 3))
|
||||
inputs = np.arange(96).reshape((2, 3, 4, 4))
|
||||
kernels = np.arange(-12, 12).reshape((2, 3, 2, 2))
|
||||
biases = np.arange(2)
|
||||
|
||||
#produce ConvolutionalLayer fprop, bprop and grads_wrt_params
|
||||
activation_layer = ConvolutionalLayer(num_input_channels=3, num_output_channels=2, input_dim_1=4, input_dim_2=4,
|
||||
kernel_dim_1=2, kernel_dim_2=2)
|
||||
activation_layer.params = [kernels, biases]
|
||||
conv_fprop = activation_layer.fprop(test_inputs)
|
||||
conv_bprop = activation_layer.bprop(
|
||||
test_inputs, conv_fprop, test_grads_wrt_outputs)
|
||||
conv_grads_wrt_params = activation_layer.grads_wrt_params(test_inputs,
|
||||
test_grads_wrt_outputs)
|
||||
test_output = "ConvolutionalLayer:\nFprop: {}\nBprop: {}\n" \
|
||||
"Grads_wrt_params: {}\n".format(conv_fprop,
|
||||
conv_bprop,
|
||||
conv_grads_wrt_params)
|
||||
|
||||
cross_correlation_kernels = kernels[:, :, ::-1, ::-1]
|
||||
activation_layer = ConvolutionalLayer(num_input_channels=3, num_output_channels=2, input_dim_1=4, input_dim_2=4,
|
||||
kernel_dim_1=2, kernel_dim_2=2)
|
||||
activation_layer.params = [cross_correlation_kernels, biases]
|
||||
conv_fprop = activation_layer.fprop(test_inputs)
|
||||
conv_bprop = activation_layer.bprop(
|
||||
test_inputs, conv_fprop, test_grads_wrt_outputs)
|
||||
conv_grads_wrt_params = activation_layer.grads_wrt_params(test_inputs,
|
||||
test_grads_wrt_outputs)
|
||||
|
||||
test_cross_correlation_output = "Cross_Correlation_ConvolutionalLayer:\nFprop: {}\nBprop: {}\n" \
|
||||
"Grads_wrt_params: {}\n".format(conv_fprop,
|
||||
conv_bprop,
|
||||
conv_grads_wrt_params)
|
||||
|
||||
test_output = test_output + "\n" + test_cross_correlation_output
|
||||
with open("{}_conv_test_file.txt".format(student_id), "w+") as out_file:
|
||||
out_file.write(test_output)
|
@ -1,73 +0,0 @@
|
||||
#!/bin/bash
|
||||
# Configure Jupyter notebook server to use password authentication
|
||||
# Make sure Conda environment is active as will assume it is later
|
||||
[ -z "$CONDA_PREFIX" ] && echo "Need to have Conda environment activated." && exit 1
|
||||
if [ "$#" -gt 2 ]; then
|
||||
echo "Usage: bash secure-notebook-server.sh [jupyter-path] [open-ssl-config-path]"
|
||||
exit 1
|
||||
fi
|
||||
# If specified read Jupyter directory from passed argument
|
||||
JUPYTER_DIR=${1:-"$HOME/.jupyter"}
|
||||
# If specified read OpenSSL config file path from passed argument
|
||||
# This is needed due to bug in how Conda handles config path
|
||||
export OPENSSL_CONF=${2:-"$CONDA_PREFIX/ssl/openssl.cnf"}
|
||||
SEPARATOR="=================================================================\n"
|
||||
# Create default config file if one does not already exist
|
||||
if [ ! -f "$JUPYTER_DIR/jupyter_notebook_config.py" ]; then
|
||||
echo "No existing notebook configuration file found, creating new one ..."
|
||||
printf $SEPARATOR
|
||||
jupyter notebook --generate-config
|
||||
printf $SEPARATOR
|
||||
echo "... notebook configuration file created."
|
||||
fi
|
||||
# Get user to enter notebook server password
|
||||
echo "Getting notebook server password hash. Enter password when prompted ..."
|
||||
printf $SEPARATOR
|
||||
HASH=$(python -c "from notebook.auth import passwd; print(passwd());")
|
||||
printf $SEPARATOR
|
||||
echo "... got password hash."
|
||||
# Generate self-signed OpenSSL certificate and key file
|
||||
echo "Creating certificate file ..."
|
||||
printf $SEPARATOR
|
||||
CERT_INFO="/C=UK/ST=Scotland/L=Edinburgh/O=University of Edinburgh/OU=School of Informatics/CN=$USER/emailAddress=$USER@sms.ed.ac.uk"
|
||||
openssl req \
|
||||
-x509 -nodes -days 365 \
|
||||
-subj "/C=UK/ST=Scotland/L=Edinburgh/O=University of Edinburgh/OU=School of Informatics/CN=$USER/emailAddress=$USER@sms.ed.ac.uk" \
|
||||
-newkey rsa:1024 -keyout "$JUPYTER_DIR/key.key" \
|
||||
-out "$JUPYTER_DIR/cert.pem"
|
||||
printf $SEPARATOR
|
||||
echo "... certificate created."
|
||||
# Setting permissions on key file
|
||||
chmod 600 "$JUPYTER_DIR/key.key"
|
||||
# Add password hash and certificate + key file paths to config file
|
||||
echo "Setting up configuration file..."
|
||||
printf $SEPARATOR
|
||||
echo " adding password hash"
|
||||
SRC_PSW="^#\?c\.NotebookApp\.password[ ]*=[ ]*u['"'"'"]\(sha1:[a-fA-F0-9]\+\)\?['"'"'"]"
|
||||
DST_PSW="c.NotebookApp.password = u'$HASH'"
|
||||
grep -q "c.NotebookApp.password" $JUPYTER_DIR/jupyter_notebook_config.py
|
||||
if [ ! $? -eq 0 ]; then
|
||||
echo "$DST_PSW" >> $JUPYTER_DIR/jupyter_notebook_config.py
|
||||
else
|
||||
sed -i "s/$SRC_PSW/$DST_PSW/" $JUPYTER_DIR/jupyter_notebook_config.py
|
||||
fi
|
||||
echo " adding certificate file path"
|
||||
SRC_CRT="^#\?c\.NotebookApp\.certfile[ ]*=[ ]*u['"'"'"]\([^'"'"'"]+\)\?['"'"'"]"
|
||||
DST_CRT="c.NotebookApp.certfile = u'$JUPYTER_DIR/cert.pem'"
|
||||
grep -q "c.NotebookApp.certfile" $JUPYTER_DIR/jupyter_notebook_config.py
|
||||
if [ ! $? -eq 0 ]; then
|
||||
echo "$DST_CRT" >> $JUPYTER_DIR/jupyter_notebook_config.py
|
||||
else
|
||||
sed -i "s|$SRC_CRT|$DST_CRT|" $JUPYTER_DIR/jupyter_notebook_config.py
|
||||
fi
|
||||
echo " adding key file path"
|
||||
SRC_KEY="^#\?c\.NotebookApp\.keyfile[ ]*=[ ]*u['"'"'"]\([^'"'"'"]+\)\?['"'"'"]"
|
||||
DST_KEY="c.NotebookApp.keyfile = u'$JUPYTER_DIR/key.key'"
|
||||
grep -q "c.NotebookApp.keyfile" $JUPYTER_DIR/jupyter_notebook_config.py
|
||||
if [ ! $? -eq 0 ]; then
|
||||
echo "$DST_KEY" >> $JUPYTER_DIR/jupyter_notebook_config.py
|
||||
else
|
||||
sed -i "s|$SRC_KEY|$DST_KEY|" $JUPYTER_DIR/jupyter_notebook_config.py
|
||||
fi
|
||||
printf $SEPARATOR
|
||||
echo "... finished setting up configuration file."
|
13
setup.py
13
setup.py
@ -1,13 +0,0 @@
|
||||
""" Setup script for mlp package. """
|
||||
|
||||
from setuptools import setup
|
||||
|
||||
setup(
|
||||
name = "mlp",
|
||||
author = "Pawel Swietojanski, Steve Renals, Matt Graham and Antreas Antoniou",
|
||||
description = ("Neural network framework for University of Edinburgh "
|
||||
"School of Informatics Machine Learning Practical course."),
|
||||
url = "https://github.com/CSTR-Edinburgh/mlpractical",
|
||||
packages=['mlp']
|
||||
)
|
||||
|
Binary file not shown.
@ -1,493 +0,0 @@
|
||||
\documentclass[11pt,]{article}
|
||||
\usepackage[T1]{fontenc}
|
||||
\usepackage{amssymb,amsmath}
|
||||
\usepackage{txfonts}
|
||||
\usepackage{microtype}
|
||||
\usepackage{amssymb,amsmath}
|
||||
\usepackage{graphicx}
|
||||
\usepackage{subfigure}
|
||||
\usepackage{natbib}
|
||||
\usepackage{paralist}
|
||||
\usepackage{hyperref}
|
||||
\usepackage{url}
|
||||
\urlstyle{same}
|
||||
\usepackage{color}
|
||||
\usepackage{fancyvrb}
|
||||
\newcommand{\VerbBar}{|}
|
||||
\newcommand{\VERB}{\Verb[commandchars=\\\{\}]}
|
||||
\DefineVerbatimEnvironment{Highlighting}{Verbatim}{commandchars=\\\{\}}
|
||||
% Add ',fontsize=\small' for more characters per line
|
||||
\newenvironment{Shaded}{}{}
|
||||
\newcommand{\KeywordTok}[1]{\textcolor[rgb]{0.00,0.44,0.13}{\textbf{{#1}}}}
|
||||
\newcommand{\DataTypeTok}[1]{\textcolor[rgb]{0.56,0.13,0.00}{{#1}}}
|
||||
\newcommand{\DecValTok}[1]{\textcolor[rgb]{0.25,0.63,0.44}{{#1}}}
|
||||
\newcommand{\BaseNTok}[1]{\textcolor[rgb]{0.25,0.63,0.44}{{#1}}}
|
||||
\newcommand{\FloatTok}[1]{\textcolor[rgb]{0.25,0.63,0.44}{{#1}}}
|
||||
\newcommand{\CharTok}[1]{\textcolor[rgb]{0.25,0.44,0.63}{{#1}}}
|
||||
\newcommand{\StringTok}[1]{\textcolor[rgb]{0.25,0.44,0.63}{{#1}}}
|
||||
\newcommand{\CommentTok}[1]{\textcolor[rgb]{0.38,0.63,0.69}{\textit{{#1}}}}
|
||||
\newcommand{\OtherTok}[1]{\textcolor[rgb]{0.00,0.44,0.13}{{#1}}}
|
||||
\newcommand{\AlertTok}[1]{\textcolor[rgb]{1.00,0.00,0.00}{\textbf{{#1}}}}
|
||||
\newcommand{\FunctionTok}[1]{\textcolor[rgb]{0.02,0.16,0.49}{{#1}}}
|
||||
\newcommand{\RegionMarkerTok}[1]{{#1}}
|
||||
\newcommand{\ErrorTok}[1]{\textcolor[rgb]{1.00,0.00,0.00}{\textbf{{#1}}}}
|
||||
\newcommand{\NormalTok}[1]{{#1}}
|
||||
|
||||
\hypersetup{breaklinks=true,
|
||||
pdfauthor={},
|
||||
pdftitle={},
|
||||
colorlinks=true,
|
||||
citecolor=blue,
|
||||
urlcolor=blue,
|
||||
linkcolor=magenta,
|
||||
pdfborder={0 0 0}}
|
||||
|
||||
\setlength{\parindent}{0pt}
|
||||
\setlength{\parskip}{6pt plus 2pt minus 1pt}
|
||||
\setlength{\emergencystretch}{3em} % prevent overfull lines
|
||||
\setcounter{secnumdepth}{0}
|
||||
|
||||
\usepackage[a4paper,body={170mm,250mm},top=25mm,left=25mm]{geometry}
|
||||
\usepackage[sf,bf,small]{titlesec}
|
||||
\usepackage{fancyhdr}
|
||||
|
||||
\pagestyle{fancy}
|
||||
\lhead{\sffamily MLP Coursework 1}
|
||||
\rhead{\sffamily Due: 30 October 2017}
|
||||
\cfoot{\sffamily \thepage}
|
||||
|
||||
\author{}
|
||||
\date{}
|
||||
|
||||
\DeclareMathOperator{\softmax}{softmax}
|
||||
\DeclareMathOperator{\sigmoid}{sigmoid}
|
||||
\DeclareMathOperator{\sgn}{sgn}
|
||||
\DeclareMathOperator{\relu}{relu}
|
||||
\DeclareMathOperator{\lrelu}{lrelu}
|
||||
\DeclareMathOperator{\elu}{elu}
|
||||
\DeclareMathOperator{\selu}{selu}
|
||||
\DeclareMathOperator{\maxout}{maxout}
|
||||
|
||||
\begin{document}
|
||||
|
||||
\section{Machine Learning Practical: Coursework
|
||||
1}
|
||||
\label{sec:machine-learning-practical-coursework-1}
|
||||
|
||||
\textbf{Release date: Monday 16th October 2017}\\
|
||||
\textbf{Due date: 16:00 Monday 30th October 2017}
|
||||
|
||||
\subsection{Introduction}
|
||||
\label{sec:introduction}
|
||||
This coursework is concerned with training multi-layer networks to
|
||||
address the MNIST digit classification problem. It builds on the
|
||||
material covered in the first three lab notebooks and the first four
|
||||
lectures. \textbf{You should complete the first three lab
|
||||
notebooks before starting the coursework.} The aim of the coursework is
|
||||
to investigate variants of the ReLU activation function for hidden units
|
||||
in multi-layer networks, with respect to the validation set accuracies
|
||||
achieved by the trained models.
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
\subsection{Code}
|
||||
\label{sec:code}
|
||||
|
||||
You should run all of the experiments for the coursework inside the
|
||||
Conda environment you set up in the first lab. The code for the coursework is available on the course
|
||||
\href{https://github.com/CSTR-Edinburgh/mlpractical/}{Github repository}
|
||||
on a branch \texttt{mlp2017-8/coursework1}. To create a local working
|
||||
copy of this branch in your local repository you need to do the
|
||||
following.
|
||||
|
||||
\begin{enumerate}
|
||||
\def\labelenumi{\arabic{enumi}.}
|
||||
\itemsep1pt\parskip0pt\parsep0pt
|
||||
\item
|
||||
Make sure all modified files on the branch you are currently on have been
|
||||
committed
|
||||
(\href{https://github.com/CSTR-Edinburgh/mlpractical/blob/mlp2017-8/master/notes/getting-started-in-a-lab.md}{see
|
||||
details here} if you are unsure how to do this).
|
||||
\item
|
||||
Fetch changes to the upstream \texttt{origin} repository by running\\
|
||||
\texttt{git fetch origin}
|
||||
\item
|
||||
Checkout a new local branch from the fetched branch using\\
|
||||
\texttt{git checkout -b coursework1 origin/mlp2017-8/coursework1}
|
||||
\end{enumerate}
|
||||
|
||||
You will now have a new branch in your local repository with all the
|
||||
code necessary for the coursework in it. In the \texttt{notebooks}
|
||||
directory there is a notebook \texttt{Coursework\_1.ipynb} which is
|
||||
intended as a starting point for structuring the code for your
|
||||
experiments. You will probably want to add additional code cells to this
|
||||
as you go along and run new experiments (e.g.~doing each new training
|
||||
run in a new cell). You may also wish to use Markdown cells to keep
|
||||
notes on the results of experiments.
|
||||
|
||||
There will also be a \verb+report+ directory which contains the LaTeX template and style files for the report. You should copy all these files into the directory which will contain your report.
|
||||
|
||||
|
||||
\subsection{Standard network
|
||||
architecture}
|
||||
\label{sec:standard-network-architecture}
|
||||
|
||||
To make the results of your experiments more easily comparable, you
|
||||
should try to keep as many of the free choices in the specification of
|
||||
the model and learning problem the same across different experiments. If
|
||||
you vary only a small number of aspects of the problem at a time this
|
||||
will make it easier to interpret the effect of those changes.
|
||||
|
||||
In these experiments you should use a multi-layer network with two hidden layers
|
||||
(corresponding to three affine transformations) and a softmax output layer. The initial baseline
|
||||
should use a sigmoid activation function for the hidden layers; other experiments will explore
|
||||
different nonlinear activation functions. The hidden layers should each contain 100 hidden units.
|
||||
The baseline network can thus be defined with the following code (which should be familiar to you from Lab 3):
|
||||
|
||||
\begin{Shaded}
|
||||
\begin{Highlighting}[]
|
||||
\CharTok{import} \NormalTok{numpy }\CharTok{as} \NormalTok{np}
|
||||
\CharTok{from} \NormalTok{mlp.layers }\CharTok{import} \NormalTok{AffineLayer, SoftmaxLayer, SigmoidLayer}
|
||||
\CharTok{from} \NormalTok{mlp.errors }\CharTok{import} \NormalTok{CrossEntropySoftmaxError}
|
||||
\CharTok{from} \NormalTok{mlp.models }\CharTok{import} \NormalTok{MultipleLayerModel}
|
||||
\CharTok{from} \NormalTok{mlp.initialisers }\CharTok{import} \NormalTok{ConstantInit, GlorotUniformInit}
|
||||
|
||||
\NormalTok{seed = }\DecValTok{10102016}
|
||||
\NormalTok{rng = np.random.RandomState(seed)}
|
||||
|
||||
\NormalTok{input_dim, output_dim, hidden_dim = }\DecValTok{784}\NormalTok{, }\DecValTok{10}\NormalTok{, }\DecValTok{100}
|
||||
|
||||
\NormalTok{weights_init = GlorotUniformInit(rng=rng)}
|
||||
\NormalTok{biases_init = ConstantInit(}\DecValTok{0}\NormalTok{.)}
|
||||
|
||||
\NormalTok{model = MultipleLayerModel([}
|
||||
\NormalTok{AffineLayer(input_dim, hidden_dim, weights_init, biases_init),}
|
||||
\NormalTok{SigmoidLayer(),}
|
||||
\NormalTok{AffineLayer(hidden_dim, hidden_dim, weights_init, biases_init),}
|
||||
\NormalTok{SigmoidLayer(),}
|
||||
\NormalTok{AffineLayer(hidden_dim, output_dim, weights_init, biases_init)}
|
||||
\NormalTok{])}
|
||||
|
||||
\NormalTok{error = CrossEntropySoftmaxError()}
|
||||
\end{Highlighting}
|
||||
\end{Shaded}
|
||||
|
||||
Here we are using the Glorot initialisation scheme, discussed in lecture 4. In part 2B of this coursework you will explore the effect of different initialisation schemes.
|
||||
|
||||
The above code creates a network using sigmoid hidden layers; you should modify it to also create a network using ReLU activation functions (see Lab 3). These two networks will form your baseline systems.
|
||||
|
||||
As well as standardising the network architecture, you should also fix
|
||||
the hyperparameters of the training procedure not being investigated to
|
||||
be the same across different runs. In particular for all experiments you
|
||||
should use a \textbf{batch size of 50 and train for a total of 100
|
||||
epochs} for all reported runs. You may of course use a smaller number of
|
||||
epochs for initial pilot runs.
|
||||
|
||||
\subsection{Part 1: Implementing Activation Functions}
|
||||
\label{sec:actfns}
|
||||
|
||||
In the first part of the assignment you will implement three further
|
||||
activation functions, each of which is related to ReLU \citep{nair2010rectified}: Leaky ReLU, ELU (Exponential Linear Unit), and SELU (Scaled Exponential Linear Unit). Each of these units defines an activation function for which $f(x) = x$ when $x>0$, as for ReLU, but avoids having a zero gradient when $x<0$.
|
||||
|
||||
\textbf{Leaky ReLU} ($\lrelu(x)$) \citep{maas2013rectifier} has the following form:
|
||||
\begin{equation}
|
||||
\lrelu(x) =
|
||||
\begin{cases}
|
||||
\alpha x & \quad \text{if } x \leq 0 \\
|
||||
x & \quad \text{if } x > 0 \\
|
||||
\end{cases}
|
||||
\end{equation}
|
||||
Where $\alpha$ is a constant; typically $\alpha=0.01$, and you can use this value in this coursework. Note that $\alpha$ can be taken to be a parameter which is learned by back-propagation along with the weights and biases -- this is called Parametric ReLU (PReLU).
|
||||
|
||||
\textbf{ELU} ($\elu(x)$) \citep{clevert2015fast} has the following form:
|
||||
\begin{equation}
|
||||
\elu(x) =
|
||||
\begin{cases}
|
||||
\alpha (\exp(x) - 1) & \quad \text{if } x \leq 0 \\
|
||||
x & \quad \text{if } x > 0 \\
|
||||
\end{cases}
|
||||
\end{equation}
|
||||
Again $\alpha$ can be taken as a constant or a tunable parameter. Typically $\alpha=1$, which results in a smooth function, and you can use this value in this coursework.
|
||||
|
||||
\textbf{SELU} ($\selu(x)$) \citep{klambauer2017self} has the following form:
|
||||
\begin{equation}
|
||||
\selu(x) =
|
||||
\lambda \begin{cases}
|
||||
\alpha (\exp(x) - 1) & \quad \text{if } x \leq 0 \\
|
||||
x & \quad \text{if } x > 0 \\
|
||||
\end{cases}
|
||||
\end{equation}
|
||||
In the case of SELU, there is a theoretical argument for optimal values of the two parameters: $\alpha \approx 1.6733$ and $\lambda \approx 1.0507$, and you can use these values in this coursework.
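To make the required interface concrete, here is a minimal NumPy sketch of the first of these layers. The class name follows the coursework, but the base class and the exact \verb+fprop+/\verb+bprop+ signatures are assumptions based on the layers used above; your implementation should follow the interface of the existing classes in \verb+mlp.layers+.

\begin{verbatim}
import numpy as np

class LeakyReluLayer(object):
    """Element-wise leaky ReLU transformation (sketch only)."""

    def __init__(self, alpha=0.01):
        self.alpha = alpha

    def fprop(self, inputs):
        # f(x) = x for x > 0 and alpha * x otherwise
        return np.where(inputs > 0., inputs, self.alpha * inputs)

    def bprop(self, inputs, outputs, grads_wrt_outputs):
        # df/dx = 1 for x > 0 and alpha otherwise
        return np.where(inputs > 0., 1., self.alpha) * grads_wrt_outputs
\end{verbatim}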
|
||||
|
||||
\begin{enumerate}
|
||||
\item Implement each of these activation functions as classes \verb+LeakyReluLayer+, \verb+EluLayer+ and \verb+SeluLayer+. You need to implement \verb+fprop+ and \verb+bprop+ methods for each class.
\item Verify the correctness of your implementation using the supplied unit tests in \verb+Activation_Tests.ipynb+
\item Automatically create a test file \verb+sXXXXXXX_test_file.txt+, by running the provided program \verb+generate_inputs.py+ which uses your code for \verb+LeakyReluLayer+, \verb+EluLayer+ and \verb+SeluLayer+ to run your \verb+fprop+ and \verb+bprop+ methods for each layer on a unique test vector generated using your student ID number.
|
||||
\end{enumerate}
|
||||
|
||||
For Part 1 of the coursework you need to submit the test file \verb+sXXXXXXX_test_file.txt+ (where sXXXXXXX is replaced with your student number) created in step 3 above.
|
||||
|
||||
\subsection{Part 2: MNIST Experiments}
|
||||
\label{sec:expts}
|
||||
In Part 2 of the coursework you will experiment with \verb+LeakyReluLayer+, \verb+EluLayer+ and \verb+SeluLayer+ in multi-layer networks trained on MNIST.
|
||||
|
||||
\subsubsection{2A: Comparing activation functions}
|
||||
In this sub-part you should compare the behaviour of Leaky ReLU, ELU, and SELU activation functions on the MNIST task. Carry out all experiments using 2 hidden layers, with 100 units per hidden layer. You should compare the results with baseline systems of the same architecture using sigmoid units and using ReLU units.
|
||||
|
||||
\subsubsection{2B: Deep neural network experiments}
|
||||
In this subpart you will explore the behaviour of deeper networks. Based on the results of Part 2A, choose one activation function, and compare networks with 2--8 hidden layers, using 100 hidden units per hidden layer.
|
||||
|
||||
Also compare the effect of different initialisation strategies, as discussed in lecture 4. First look at the effect of weight initialisation based on
|
||||
\begin{compactitem}
|
||||
\item Fan-in: $w_i \sim U\left(-\sqrt{3/n_{in}}, \sqrt{3/n_{in}}\right)$
\item Fan-out: $w_i \sim U\left(-\sqrt{3/n_{out}}, \sqrt{3/n_{out}}\right)$
|
||||
\item Fan-in and Fan-out: $w_i \sim U \left(-\sqrt{6/(n_{in}+n_{out})}, \sqrt{6/(n_{in}+n_{out})}\right)$
|
||||
\end{compactitem}
|
||||
where $U$ is the uniform distribution. The first of these corresponds to constraining the estimated variance of a unit to be independent of the number of incoming connections ($n_{in}$); the second to constraining the estimated variance of a unit's gradient to be independent of the number of outgoing connections ($n_{out}$); the third corresponds to Glorot and Bengio's combined initialisation.
|
||||
|
||||
Additionally you could also explore the effect of drawing from a Gaussian distribution compared with a uniform distribution. In particular you might like to explore initialising a SELU layer drawing from a Gaussian with mean 0 and variance $1/n_{out}$ as recommended by \cite{klambauer2017self}.
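As an illustration of the schemes above, the uniform limits could be computed and sampled as in the following sketch; the function name and signature are hypothetical rather than part of the \verb+mlp.initialisers+ module.

\begin{verbatim}
import numpy as np

def uniform_init(n_in, n_out, rng, mode='fan_in_and_out'):
    """Sample an (n_in, n_out) weight matrix from U(-limit, limit)."""
    if mode == 'fan_in':
        limit = (3. / n_in) ** 0.5
    elif mode == 'fan_out':
        limit = (3. / n_out) ** 0.5
    elif mode == 'fan_in_and_out':   # Glorot and Bengio's combined scheme
        limit = (6. / (n_in + n_out)) ** 0.5
    else:
        raise ValueError('Unknown mode: {0}'.format(mode))
    return rng.uniform(low=-limit, high=limit, size=(n_in, n_out))

# Example: fan-in initialisation of the first hidden layer's weights
rng = np.random.RandomState(10102016)
weights = uniform_init(784, 100, rng, mode='fan_in')
\end{verbatim}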
|
||||
|
||||
For Part 2 of the coursework you need to write and submit a report, using the template provided, in the directory \verb+report+. Please read the template document \verb+mlp-cw1-template.pdf+ very carefully, as it provides advice and instructions on writing your report. You can use the LaTeX source file \verb+mlp-cw1-template.tex+ as a template for your report (see below, in the section 'Report').
|
||||
|
||||
It is highly recommended that you use LaTeX for your report. If you have not used LaTeX previously, now is a good time to learn how to use it!
|
||||
|
||||
\subsection{Backing up your work}
|
||||
\label{sec:backing-up-your-work}
|
||||
|
||||
It is \textbf{strongly recommended} you use some method for backing up
|
||||
your work. Those working in their AFS homespace on DICE will have their
|
||||
work automatically backed up as part of the
|
||||
\href{http://computing.help.inf.ed.ac.uk/backups-and-mirrors}{routine
|
||||
backup} of all user homespaces. If you are working on a personal
|
||||
computer you should have your own backup method in place (e.g.~saving
|
||||
additional copies to an external drive, syncing to a cloud service or
|
||||
pushing commits to your local Git repository to a private repository on
|
||||
Github). \textbf{Loss of work through failure to back up
|
||||
\href{http://tinyurl.com/edinflate}{does not constitute a good reason for
|
||||
late submission}}.
|
||||
|
||||
You may \emph{additionally} wish to keep your coursework under version
|
||||
control in your local Git repository on the \texttt{coursework1} branch.
|
||||
% This does not need to be limited to the coursework notebook and
|
||||
% \texttt{mlp} Python modules - you can also add your report document to
|
||||
% the repository.
|
||||
|
||||
If you make regular commits of your work on the coursework this will
|
||||
allow you to better keep track of the changes you have made and if
|
||||
necessary revert to previous versions of files and/or restore
|
||||
accidentally deleted work. This is not however required and you should
|
||||
note that keeping your work under version control is a distinct issue
|
||||
from backing up to guard against hard drive failure. If you are working
|
||||
on a personal computer you should still keep an additional back up of
|
||||
your work as described above.
|
||||
|
||||
\subsection{Report}
|
||||
\label{sec:report}
|
||||
|
||||
Part two of your coursework submission, worth 70 marks, will be a report. The directory
\verb+coursework1/report+ contains a template for your report (\verb+mlp-cw1-template.tex+); the generated pdf file (\verb+mlp-cw1-template.pdf+) is also provided, and you should read this file carefully as it contains information about the required structure and experimentation. The template is written in LaTeX, and we strongly recommend that you write your own report using LaTeX, using the supplied document style \verb+mlp2017+ (as in the template).
|
||||
|
||||
You should copy the files in the \verb+report+ directory to the directory containing the LaTeX file of your report, as \verb+pdflatex+ will need to access these files when building the pdf document from the LaTeX source file.
|
||||
|
||||
Your report should be in a 2-column format, based on the document format used for the ICML conference. The report should be a \textbf{maximum of 6 pages long}, with a further page for references. We will not read or assess any parts of the report beyond the allowed 6+1 pages.
|
||||
|
||||
Ideally, all figures should be included in your report file as
|
||||
\href{https://en.wikipedia.org/wiki/Vector_graphics}{vector graphics}
|
||||
rather than \href{https://en.wikipedia.org/wiki/Raster_graphics}{raster
|
||||
files} as this will make sure all detail in the plot is visible.
|
||||
Matplotlib supports saving high quality figures in a wide range of
|
||||
common image formats using the
|
||||
\href{http://matplotlib.org/api/pyplot_api.html\#matplotlib.pyplot.savefig}{\texttt{savefig}}
|
||||
function. \textbf{You should use \texttt{savefig} rather than copying
|
||||
the screen-resolution raster images outputted in the notebook.} An
|
||||
example of using \texttt{savefig} to save a figure as a PDF file (which
|
||||
can be included as graphics in
|
||||
\href{https://en.wikibooks.org/wiki/LaTeX/Importing_Graphics}{LaTeX}
|
||||
compiled with \texttt{pdflatex} and in Apple Pages and
|
||||
\href{https://support.office.com/en-us/article/Add-a-PDF-to-your-Office-file-74819342-8f00-4ab4-bcbe-0f3df15ab0dc}{Microsoft
|
||||
Word} documents) is given below.
|
||||
|
||||
\begin{Shaded}
|
||||
\begin{Highlighting}[]
|
||||
\CharTok{import} \NormalTok{matplotlib.pyplot }\CharTok{as} \NormalTok{plt}
|
||||
\CharTok{import} \NormalTok{numpy }\CharTok{as} \NormalTok{np}
|
||||
\CommentTok{# Generate some example data to plot}
|
||||
\NormalTok{x = np.linspace(}\DecValTok{0}\NormalTok{., }\DecValTok{1}\NormalTok{., }\DecValTok{100}\NormalTok{)}
|
||||
\NormalTok{y1 = np.sin(}\DecValTok{2}\NormalTok{. * np.pi * x)}
|
||||
\NormalTok{y2 = np.cos(}\DecValTok{2}\NormalTok{. * np.pi * x)}
|
||||
\NormalTok{fig_size = (}\DecValTok{6}\NormalTok{, }\DecValTok{3}\NormalTok{) }\CommentTok{# Set figure size in inches (width, height)}
|
||||
\NormalTok{fig = plt.figure(figsize=fig_size) }\CommentTok{# Create a new figure object}
|
||||
\NormalTok{ax = fig.add_subplot(}\DecValTok{1}\NormalTok{, }\DecValTok{1}\NormalTok{, }\DecValTok{1}\NormalTok{) }\CommentTok{# Add a single axes to the figure}
|
||||
\CommentTok{# Plot lines giving each a label for the legend and setting line width to 2}
|
||||
\NormalTok{ax.plot(x, y1, linewidth=}\DecValTok{2}\NormalTok{, label=}\StringTok{'$y = \textbackslash{}sin(2\textbackslash{}pi x)$'}\NormalTok{)}
|
||||
\NormalTok{ax.plot(x, y2, linewidth=}\DecValTok{2}\NormalTok{, label=}\StringTok{'$y = \textbackslash{}cos(2\textbackslash{}pi x)$'}\NormalTok{)}
|
||||
\CommentTok{# Set the axes labels. Can use LaTeX in labels within $...$ delimiters.}
|
||||
\NormalTok{ax.set_xlabel(}\StringTok{'$x$'}\NormalTok{, fontsize=}\DecValTok{12}\NormalTok{)}
|
||||
\NormalTok{ax.set_ylabel(}\StringTok{'$y$'}\NormalTok{, fontsize=}\DecValTok{12}\NormalTok{)}
|
||||
\NormalTok{ax.grid(}\StringTok{'on'}\NormalTok{) }\CommentTok{# Turn axes grid on}
|
||||
\NormalTok{ax.legend(loc=}\StringTok{'best'}\NormalTok{, fontsize=}\DecValTok{11}\NormalTok{) }\CommentTok{# Add a legend}
|
||||
\NormalTok{fig.tight_layout() }\CommentTok{# This minimises whitespace around the axes.}
|
||||
\NormalTok{fig.savefig(}\StringTok{'file-name.pdf'}\NormalTok{) }\CommentTok{# Save figure to current directory in PDF format}
|
||||
\end{Highlighting}
|
||||
\end{Shaded}
|
||||
|
||||
(If you are using Libre/OpenOffice you should instead save Scalable Vector Graphics (SVG) plots using \texttt{fig.savefig('file-name.svg')}. If the document editor you are using for the report does not support including either PDF or SVG graphics you can instead output high-resolution raster images using \texttt{fig.savefig('file-name.png', dpi=200)}; however, note that these files will generally be larger than either SVG or PDF formatted graphics.)
|
||||
|
||||
However to emphasise again: \textbf{It is highly recommended that you use LaTeX.}
|
||||
|
||||
If you make use of any books, articles, web pages or other resources
|
||||
you should appropriately cite these in your report. You do not need to
|
||||
cite material from the course lecture slides or lab notebooks.
|
||||
|
||||
To create a pdf file \verb+mlp-cw1-template.pdf+ from a LaTeX source file (\verb+mlp-cw1-template.tex+), you can run the following in a terminal:
|
||||
\begin{verbatim}
|
||||
pdflatex mlp-cw1-template
|
||||
bibtex mlp-cw1-template
|
||||
pdflatex mlp-cw1-template
|
||||
pdflatex mlp-cw1-template
|
||||
\end{verbatim}
|
||||
(Yes, you have to run pdflatex multiple times, in order for latex to construct the internal document references.)
|
||||
|
||||
An alternative, simpler approach uses the \verb+latexmk+ program:
|
||||
\begin{verbatim}
|
||||
latexmk -pdf mlp-cw1-template
|
||||
\end{verbatim}
|
||||
|
||||
It is worth learning how to use LaTeX effectively, as it is particularly powerful for mathematical and academic writing. There are many tutorials on the web.
|
||||
|
||||
|
||||
\subsection{Mechanics}
|
||||
\label{sec:mechanics}
|
||||
|
||||
\textbf{Marks:}
|
||||
This assignment will be assessed out of 100 marks and
|
||||
forms 10\% of your final grade for the course.
|
||||
|
||||
\textbf{Academic conduct:}
|
||||
Assessed work is subject to University
|
||||
regulations on academic
|
||||
conduct:\\\url{http://web.inf.ed.ac.uk/infweb/admin/policies/academic-misconduct}
|
||||
|
||||
\textbf{Submission:}
|
||||
You can submit more than once up until the submission deadline. All
|
||||
submissions are timestamped automatically. Identically named files
|
||||
will overwrite earlier submitted versions, so we will mark the latest
|
||||
submission that comes in before the deadline.
|
||||
|
||||
If you submit anything before the deadline, you may not resubmit
|
||||
afterward. (This policy allows us to begin marking submissions
|
||||
immediately after the deadline, without having to worry that some may
|
||||
need to be re-marked).
|
||||
|
||||
If you do not submit anything before the deadline, you may submit {\em
|
||||
exactly once} after the deadline, and a late penalty will be applied
|
||||
to this submission unless you have received an approved extension.
|
||||
Please be aware that late submissions may receive lower priority for
|
||||
marking, and marks may not be returned within the same timeframe as
|
||||
for on-time submissions.
|
||||
|
||||
{\em Warning:} Unfortunately the \verb+submit+ command will technically
|
||||
allow you to submit late even if you submitted before the deadline
|
||||
(i.e.\ it does not enforce the above policy). Don't do this! We will
|
||||
mark the version that we retrieve just after the deadline, and (even
|
||||
worse) you may still be penalized for submitting late because the
|
||||
timestamp will update.
|
||||
|
||||
For additional information about late penalties and extension
|
||||
requests, see the School web page below. Do {\bf not} email any course
|
||||
staff directly about extension requests; you must follow the
|
||||
instructions on the web page.
|
||||
|
||||
\url{http://web.inf.ed.ac.uk/infweb/student-services/ito/admin/coursework-projects/late-coursework-extension-requests}
|
||||
|
||||
\textbf{Late submission penalty:}
|
||||
Following the University guidelines,
|
||||
late coursework submitted without an authorised extension will be
|
||||
recorded as late and the following penalties will apply: 5
|
||||
percentage points will be deducted for every calendar day or part
|
||||
thereof it is late, up to a maximum of 7 calendar days. After this
|
||||
time a mark of zero will be recorded.
|
||||
|
||||
\subsection{Submission}
|
||||
\label{sec:submission}
|
||||
|
||||
Your coursework submission should be done electronically using the
|
||||
\href{http://computing.help.inf.ed.ac.uk/submit}{\texttt{submit}}
|
||||
command available on DICE machines.
|
||||
|
||||
Your submission should include
|
||||
|
||||
\begin{itemize}
|
||||
\itemsep1pt\parskip0pt\parsep0pt
|
||||
\item
|
||||
the unit test file generated in part 1, \verb+sXXXXXXX_test_file.txt+, where your student number replaces \verb+sXXXXXXX+
|
||||
\item
|
||||
your completed report as a PDF file, using the provided template
|
||||
\item
|
||||
the notebook (\verb+.ipynb+) file you used to run the experiments
|
||||
\item
|
||||
and your local version of the \texttt{mlp} code including any changes
|
||||
you made to the modules (\texttt{.py} files).
|
||||
\end{itemize}
|
||||
|
||||
You should copy all of the files to a single directory, \verb+coursework1+, e.g.
|
||||
|
||||
\begin{verbatim}
|
||||
mkdir coursework1
|
||||
cp notebooks/Coursework_1.ipynb mlp/*.py coursework1
|
||||
cp reports/coursework1.pdf reports/sXXXXXXX_test_file.txt coursework1
|
||||
\end{verbatim}
|
||||
|
||||
|
||||
and then submit this directory using
|
||||
|
||||
\begin{verbatim}
|
||||
submit mlp cw1 coursework1
|
||||
\end{verbatim}
|
||||
|
||||
The \texttt{submit} command will prompt you with the details of the
|
||||
submission including the name of the files / directories you are
|
||||
submitting and the name of the course and exercise you are submitting
|
||||
for and ask you to check if these details are correct. You should check
|
||||
these carefully and reply \texttt{y} to submit if you are sure the files
|
||||
are correct and \texttt{n} otherwise.
|
||||
|
||||
You can amend an existing submission by rerunning the \texttt{submit}
|
||||
command any time up to the deadline. It is therefore a good idea
|
||||
(particularly if this is your first time using the DICE submit
|
||||
mechanism) to do an initial run of the \texttt{submit} command early on
|
||||
and then rerun the command if you make any further updates to your
|
||||
submission rather than leaving it to the last minute.
|
||||
|
||||
|
||||
\subsection{Marking Scheme}
|
||||
\label{sec:marking-scheme}
|
||||
|
||||
\begin{itemize}
|
||||
\item
|
||||
Part 1, Activation function implementation (30 marks). Based on your submitted test file.
|
||||
\item
|
||||
Part 2, Report (70 marks). The following aspects will contribute to the mark for your report:
|
||||
\begin{itemize}
|
||||
\item Abstract -- how clear is it? does it cover what is reported in the document?
\item Introduction -- do you clearly outline and motivate the paper, and describe the research questions investigated?
|
||||
\item Description of activation functions -- is it clear and correct?
|
||||
\item Experiments -- did you carry out the experiments correctly? are the results clearly presented and described?
|
||||
\item Interpretation and discussion of results
|
||||
\item Conclusions
|
||||
\item Presentation and clarity of report
|
||||
\end{itemize}
|
||||
\end{itemize}
|
||||
|
||||
\bibliographystyle{plainnat}
|
||||
\bibliography{cw1-references}
|
||||
\end{document}
|
Binary file not shown.
@ -1,408 +0,0 @@
|
||||
\documentclass[11pt,]{article}
|
||||
\usepackage[T1]{fontenc}
|
||||
\usepackage{amssymb,amsmath}
|
||||
\usepackage{txfonts}
|
||||
\usepackage{microtype}
|
||||
\usepackage{amssymb,amsmath}
|
||||
\usepackage{graphicx}
|
||||
\usepackage{subfigure}
|
||||
\usepackage{natbib}
|
||||
\usepackage{paralist}
|
||||
\usepackage{hyperref}
|
||||
\usepackage{url}
|
||||
\urlstyle{same}
|
||||
\usepackage{color}
|
||||
\usepackage{fancyvrb}
|
||||
\newcommand{\VerbBar}{|}
|
||||
\newcommand{\VERB}{\Verb[commandchars=\\\{\}]}
|
||||
\DefineVerbatimEnvironment{Highlighting}{Verbatim}{commandchars=\\\{\}}
|
||||
% Add ',fontsize=\small' for more characters per line
|
||||
\newenvironment{Shaded}{}{}
|
||||
\newcommand{\KeywordTok}[1]{\textcolor[rgb]{0.00,0.44,0.13}{\textbf{{#1}}}}
|
||||
\newcommand{\DataTypeTok}[1]{\textcolor[rgb]{0.56,0.13,0.00}{{#1}}}
|
||||
\newcommand{\DecValTok}[1]{\textcolor[rgb]{0.25,0.63,0.44}{{#1}}}
|
||||
\newcommand{\BaseNTok}[1]{\textcolor[rgb]{0.25,0.63,0.44}{{#1}}}
|
||||
\newcommand{\FloatTok}[1]{\textcolor[rgb]{0.25,0.63,0.44}{{#1}}}
|
||||
\newcommand{\CharTok}[1]{\textcolor[rgb]{0.25,0.44,0.63}{{#1}}}
|
||||
\newcommand{\StringTok}[1]{\textcolor[rgb]{0.25,0.44,0.63}{{#1}}}
|
||||
\newcommand{\CommentTok}[1]{\textcolor[rgb]{0.38,0.63,0.69}{\textit{{#1}}}}
|
||||
\newcommand{\OtherTok}[1]{\textcolor[rgb]{0.00,0.44,0.13}{{#1}}}
|
||||
\newcommand{\AlertTok}[1]{\textcolor[rgb]{1.00,0.00,0.00}{\textbf{{#1}}}}
|
||||
\newcommand{\FunctionTok}[1]{\textcolor[rgb]{0.02,0.16,0.49}{{#1}}}
|
||||
\newcommand{\RegionMarkerTok}[1]{{#1}}
|
||||
\newcommand{\ErrorTok}[1]{\textcolor[rgb]{1.00,0.00,0.00}{\textbf{{#1}}}}
|
||||
\newcommand{\NormalTok}[1]{{#1}}
|
||||
|
||||
\hypersetup{breaklinks=true,
|
||||
pdfauthor={},
|
||||
pdftitle={},
|
||||
colorlinks=true,
|
||||
citecolor=blue,
|
||||
urlcolor=blue,
|
||||
linkcolor=magenta,
|
||||
pdfborder={0 0 0}}
|
||||
|
||||
\setlength{\parindent}{0pt}
|
||||
\setlength{\parskip}{6pt plus 2pt minus 1pt}
|
||||
\setlength{\emergencystretch}{3em} % prevent overfull lines
|
||||
\setcounter{secnumdepth}{1}
|
||||
|
||||
\usepackage[a4paper,body={170mm,250mm},top=25mm,left=25mm]{geometry}
|
||||
\usepackage[sf,bf,small]{titlesec}
|
||||
\usepackage{fancyhdr}
|
||||
|
||||
\pagestyle{fancy}
|
||||
\lhead{\sffamily MLP Coursework 2}
|
||||
\rhead{\sffamily Due: 28 November 2017}
|
||||
\cfoot{\sffamily \thepage}
|
||||
|
||||
\author{}
|
||||
\date{}
|
||||
|
||||
\DeclareMathOperator{\softmax}{softmax}
|
||||
\DeclareMathOperator{\sigmoid}{sigmoid}
|
||||
\DeclareMathOperator{\sgn}{sgn}
|
||||
\DeclareMathOperator{\relu}{relu}
|
||||
\DeclareMathOperator{\lrelu}{lrelu}
|
||||
\DeclareMathOperator{\elu}{elu}
|
||||
\DeclareMathOperator{\selu}{selu}
|
||||
\DeclareMathOperator{\maxout}{maxout}
|
||||
|
||||
\begin{document}
|
||||
|
||||
\begin{center}
|
||||
\textsf{\textbf{\Large Machine Learning Practical: Coursework 2}}
|
||||
|
||||
\bigskip
|
||||
\textbf{Release date: Monday 6th November 2017}
|
||||
|
||||
\textbf{Due date: 16:00 Tuesday 28th November 2017}
|
||||
\end{center}
|
||||
|
||||
\section{Introduction}
|
||||
\label{sec:introduction}
|
||||
% This coursework is concerned with training multi-layer networks to
|
||||
% address the MNIST digit classification problem. It builds on the
|
||||
% material covered in the first three lab notebooks and the first four
|
||||
% lectures. \textbf{You should complete the first three lab
|
||||
% notebooks before starting the coursework.} The aim of the coursework is
|
||||
% to investigate variants of the ReLU activation function for hidden units
|
||||
% in multi-layer networks, with respect to the validation set accuracies
|
||||
% achieved by the trained models.
|
||||
|
||||
The aim of this coursework is to further explore the classification of images of handwritten digits using neural networks. We'll be using an extended version of the MNIST database, the EMNIST Balanced dataset, described in Section~\ref{sec:emnist}. Part A of the coursework will consist of building baseline deep neural networks for the EMNIST classification task, implementation and experimentation of the Adam and RMSProp learning rules, and implementation and experimentation of Batch Normalisation. Part B will concern implementation and experimentation of convolutional networks. As with the previous coursework, you will need to hand in test files generated from your code, and a report.
|
||||
|
||||
\section{Dataset}
|
||||
\label{sec:emnist}
|
||||
In this coursework we shall use the EMNIST (Extended MNIST) Balanced dataset, \url{https://www.nist.gov/itl/iad/image-group/emnist-dataset} \citep{cohen2017emnist}. EMNIST extends MNIST by including images of handwritten letters (upper and lower case) as well as handwritten digits. Both EMNIST and MNIST are extracted from the same underlying dataset, referred to as NIST Special Database 19. Both use the same conversion process resulting in centred images of dimension 28$\times$28.
|
||||
|
||||
Although there are 62 potential classes for EMNIST (10 digits, 26 lower case letters, and 26 upper case letters) we shall use a reduced label set of 47 different labels. This is because of confusions which arise when trying to discriminate upper-case and lower-case versions of the same letter, following the data conversion process. In the 47 label set, upper- and lower-case labels are merged for the following letters: C, I, J, K, L, M, O, P, S, U, V, W, X, Y and Z.
|
||||
|
||||
The training set for Balanced EMNIST has about twice the number of examples as MNIST, thus you should expect the run-time of your experiments to be about twice as long. The expected accuracy rates are lower for EMNIST than for MNIST (as EMNIST has more classes, and more confusable examples), and differences in accuracy between different systems should be larger. See \citet{cohen2017emnist} for some baseline results on EMNIST, as well as a description of the dataset.
|
||||
|
||||
You don't need to download the EMNIST database from the NIST website; it will be part of the \verb+coursework_2+ branch of the \verb+mlpractical+ Github repository, discussed in Section~\ref{sec:code} below.
|
||||
|
||||
|
||||
|
||||
|
||||
\section{Code}
|
||||
\label{sec:code}
|
||||
|
||||
You should run all of the experiments for the coursework inside the
|
||||
Conda environment you set up in the first labs. The code for the coursework
|
||||
is available on the course
|
||||
\href{https://github.com/CSTR-Edinburgh/mlpractical/}{Github repository}
|
||||
on a branch \verb+mlp2017-8/coursework_2+. To create a local working
|
||||
copy of this branch in your local repository you need to do the
|
||||
following.
|
||||
|
||||
\begin{enumerate}
|
||||
\def\labelenumi{\arabic{enumi}.}
|
||||
\itemsep1pt\parskip0pt\parsep0pt
|
||||
\item
|
||||
Make sure all modified files on the branch you are currently on have been
committed
|
||||
(\href{https://github.com/CSTR-Edinburgh/mlpractical/blob/mlp2017-8/master/notes/getting-started-in-a-lab.md}{see
|
||||
details here} if you are unsure how to do this).
|
||||
\item
|
||||
Fetch changes to the upstream \texttt{origin} repository by running\\
|
||||
\texttt{git fetch origin}
|
||||
\item
|
||||
Checkout a new local branch from the fetched branch using\\
|
||||
\verb+git checkout -b coursework_2 origin/mlp2017-8/coursework_2+
|
||||
\end{enumerate}
|
||||
|
||||
You will now have a new branch in your local repository with all the
|
||||
code necessary for the coursework in it.
|
||||
|
||||
This branch includes the following additions to your setup:
|
||||
|
||||
\begin{itemize}
|
||||
\itemsep1pt\parskip0pt\parsep0pt
|
||||
\item
|
||||
A notebook \verb+BatchNormalizationLayer_tests.ipynb+ which includes
|
||||
test functions to check the implementations of the BatchNorm layer
|
||||
\texttt{fprop}, \texttt{bprop} and \texttt{grads\_wrt\_params}
|
||||
methods. The BatchNormalizationLayer skeleton code can be found in mlp.layers.
|
||||
The tests use the mlp.layers implementation so be sure to reload your notebook
|
||||
when you update your mlp.layers code.
|
||||
\item
|
||||
A notebook \verb+ConvolutionalLayer_tests.ipynb+ which includes
|
||||
test functions to check the implementations of the Convolutional layer
|
||||
\texttt{fprop}, \texttt{bprop} and \texttt{grads\_wrt\_params}
|
||||
methods. The ConvolutionalLayer skeleton code can be found in mlp.layers.
|
||||
The tests use the mlp.layers implementation so be sure to reload your notebook
|
||||
when you update your mlp.layers code.
|
||||
\item
|
||||
A new \texttt{ReshapeLayer} class in the \verb+mlp.layers+ module.
|
||||
When included in a multiple layer model, this allows the output of
|
||||
the previous layer to be reshaped before being forward propagated to
|
||||
the next layer.
|
||||
\item
|
||||
A new \texttt{EMNISTDataProvider} class in the \verb+mlp.data_providers+ module.
|
||||
This class is a small change to the \texttt{MNISTDataProvider} class, linking to the Balanced EMNIST data, and setting the number of classes to 47.
|
||||
\item
|
||||
Training, validation, and test sets for the \texttt{EMNIST Balanced} dataset that
|
||||
you will use in this coursework
|
||||
\end{itemize}
|
||||
|
||||
|
||||
% In the \texttt{notebooks}
|
||||
% directory there is a notebook \verb+Coursework_1.ipynb+ which is
|
||||
% intended as a starting point for structuring the code for your
|
||||
% experiments. You will probably want to add additional code cells to this
|
||||
% as you go along and run new experiments (e.g.~doing each new training
|
||||
% run in a new cell). You may also wish to use Markdown cells to keep
|
||||
% notes on the results of experiments.
|
||||
|
||||
There will also be a \verb+coursework_2/report+ directory which contains the LaTeX template and style files for the report. You should copy all these files into the directory which will contain your report.
|
||||
|
||||
|
||||
\section{Tasks}
|
||||
|
||||
\subsection*{Part A: Deep Neural Networks}
|
||||
In part A of the coursework you will focus on using deep neural networks on EMNIST, and you should implement the Adam and RMSProp learning rules, and Batch Normalisation.
|
||||
\begin{enumerate}
|
||||
\item Perform baseline experiments using DNNs trained on EMNIST. Obviously there are a lot of things that could be explored, including hidden unit activation functions, network architectures, training hyperparameters, and the use of regularisation and dropout. You cannot explore everything, and it is best to carefully investigate a few things in depth.
|
||||
\item Implement the RMSProp \citep{tieleman2012rmsprop} and Adam \citep{kingma2015adam} learning rules, by defining new classes inheriting from \texttt{GradientDescendLearningRule} in the \texttt{mlp/learning\_rules.py} module. The \texttt{MomentumLearningRule} class is an example of how to define a learning rule which uses an additional state variable to calculate the updates to the parameters. (A sketch of an RMSProp-style update is given after this list.)
|
||||
\item Perform experiments to compare stochastic gradient descent, RMSProp, and Adam for deep neural network training on EMNIST, building on your earlier baseline experiments.
|
||||
\item Implement batch normalisation \citep{ioffe2015batch} as a class \verb+BatchNormalizationLayer+. You need to implement \texttt{fprop}, \texttt{bprop} and \texttt{grads\_wrt\_params} methods for this class.
|
||||
\item Verify the correctness of your implementation using the supplied unit tests in\\\verb+BatchNormalizationLayer_tests.ipynb+.
|
||||
\item Automatically create a test file \verb+sXXXXXXX_batchnorm_test.txt+, by running the provided program \verb+generate_batchnorm_test.py+ which uses your \verb+BatchNormalizationLayer+ class methods on a unique test vector generated using your student ID number.
|
||||
\item Perform experiments on EMNIST to investigate the impact of using batch normalisation in deep neural networks, building on your earlier experiments.
|
||||
\end{enumerate}
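As a pointer for item 2 above, the following is a minimal sketch of an RMSProp-style learning rule, following the update described by \citet{tieleman2012rmsprop}. The method names and the way parameters are stored are assumptions modelled on the description above; you should adapt the sketch to the actual \texttt{GradientDescendLearningRule} interface in \texttt{mlp/learning\_rules.py}.

\begin{verbatim}
import numpy as np

class RMSPropLearningRule(object):
    """Sketch of an RMSProp update (adapt to GradientDescendLearningRule)."""

    def __init__(self, learning_rate=1e-3, beta=0.9, epsilon=1e-8):
        self.learning_rate = learning_rate
        self.beta = beta        # decay rate of the squared-gradient moving average
        self.epsilon = epsilon  # small constant to avoid division by zero

    def initialise(self, params):
        self.params = params
        self.sq_grad_avgs = [np.zeros_like(param) for param in self.params]

    def update_params(self, grads_wrt_params):
        for param, sq_avg, grad in zip(self.params, self.sq_grad_avgs,
                                       grads_wrt_params):
            # Exponential moving average of squared gradients
            sq_avg *= self.beta
            sq_avg += (1. - self.beta) * grad ** 2
            # Scale the step by the root of the moving average
            param -= self.learning_rate * grad / (np.sqrt(sq_avg) + self.epsilon)
\end{verbatim}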
|
||||
In the above experiments you should use the validation set to assess accuracy. Use the test set at the end to assess the accuracy of the deep neural network architecture and training setup that you judge to be the best.
|
||||
|
||||
|
||||
|
||||
\subsection*{Part B: Convolutional Networks}
|
||||
In part B of the coursework you should implement convolutional and max-pooling layers, and carry out experiments using convolutional networks with one and two convolutional layers.
|
||||
\begin{enumerate}
|
||||
\item Implement a convolutional layer as a class \verb+ConvolutionalLayer+. You need to implement \texttt{fprop}, \texttt{bprop} and \texttt{grads\_wrt\_params} methods for this class.
|
||||
\item Verify the correctness of your implementation using the supplied unit tests in\\\verb+ConvolutionalLayer_tests.ipynb+.
|
||||
\item Automatically create a test file \verb+sXXXXXXX_conv_test.txt+, by running the provided program \verb+generate_conv_test.py+ which uses your \verb+ConvolutionalLayer+ class methods on a unique test vector generated using your student ID number.
|
||||
\item Implement a max-pooling layer. Non-overlapping pooling (which was assumed in the lecture presentation) is required; you may also implement a more general solution with striding. (A sketch of non-overlapping max-pooling is given after this list.)
|
||||
\item Construct and train networks containing one and two convolutional layers, together with max-pooling layers, using the Balanced EMNIST data, reporting your experimental results. As a default, use convolutional kernels of dimension 5$\times$5 (stride 1) and pooling regions of 2$\times$2 (stride 2, hence non-overlapping). For the default network with two convolutional layers, use 5 feature maps in the first convolutional layer and 10 feature maps in the second.
|
||||
\end{enumerate}
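To make item 4 concrete, here is a sketch of the forward pass for non-overlapping $2\times2$ max-pooling using a reshape; the assumed array layout (batch, channels, height, width) and the function name are illustrative only, and a complete layer would also need a matching \texttt{bprop} that routes gradients back to the maxima.

\begin{verbatim}
import numpy as np

def max_pool_2x2_fprop(inputs):
    """Non-overlapping 2x2 max-pooling.

    inputs has shape (batch, channels, height, width), with height and
    width divisible by 2; the output has shape (batch, channels, h/2, w/2).
    """
    batch, channels, height, width = inputs.shape
    # Split each spatial axis into (output position, position inside the pool)
    reshaped = inputs.reshape(batch, channels, height // 2, 2, width // 2, 2)
    # Take the maximum inside each 2x2 pooling region
    return reshaped.max(axis=(3, 5))

# Example: a batch of 10 inputs with 5 feature maps of size 28x28 -> 14x14
pooled = max_pool_2x2_fprop(np.random.randn(10, 5, 28, 28))
print(pooled.shape)  # (10, 5, 14, 14)
\end{verbatim}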
|
||||
As before you should mainly use the validation set to assess accuracy, using the test set to assess the accuracy of the convolutional network you judge to be the best.
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
\section{Unit Tests}
|
||||
\label{sec:tests}
|
||||
Part one of your coursework submission will be the test files generated for batch normalisation (\verb+sXXXXXXX_batchnorm_test.txt+) and for the convolutional layer (\verb+sXXXXXXX_conv_test.txt+), as described above. Please do not change the names of these files as they will be automatically verified.
|
||||
|
||||
\section{Report}
|
||||
\label{sec:report}
|
||||
Part two of your coursework submission, worth 70 marks, will be a report. The directory
\verb+coursework_2/report+ contains a template for your report (\verb+mlp-cw2-template.tex+); the generated pdf file (\verb+mlp-cw2-template.pdf+) is also provided, and you should read this file carefully as it contains information about the required structure and experimentation. The template is written in LaTeX, and we strongly recommend that you write your own report using LaTeX, using the supplied document style \verb+mlp2017+ (as in the template).
|
||||
|
||||
You should copy the files in the \verb+report+ directory to the directory containing the LaTeX file of your report, as \verb+pdflatex+ will need to access these files when building the pdf document from the LaTeX source file.
|
||||
|
||||
Your report should be in a 2-column format, based on the document format used for the ICML conference. The report should be a \textbf{maximum of 7 pages long}, with a further page for references. We will not read or assess any parts of the report beyond the allowed 7+1 pages.
|
||||
|
||||
As before, all figures should be included in your report file as vector graphics;
|
||||
please see the section in \verb+coursework1.pdf+ about how to do this.
|
||||
|
||||
If you make use of any books, articles, web pages or other resources
|
||||
you should appropriately cite these in your report. You do not need to
|
||||
cite material from the course lecture slides or lab notebooks.
|
||||
|
||||
To create a pdf file \verb+mlp-cw2-template.pdf+ from a LaTeX source file (\verb+mlp-cw2-template.tex+), you can run the following in a terminal:
|
||||
\begin{verbatim}
|
||||
pdflatex mlp-cw2-template
|
||||
bibtex mlp-cw2-template
|
||||
pdflatex mlp-cw2-template
|
||||
pdflatex mlp-cw2-template
|
||||
\end{verbatim}
|
||||
(Yes, you have to run pdflatex multiple times, in order for latex to construct the internal document references.)
|
||||
|
||||
An alternative, simpler approach uses the \verb+latexmk+ program:
|
||||
\begin{verbatim}
|
||||
latexmk -pdf mlp-cw2-template
|
||||
\end{verbatim}
|
||||
|
||||
It is worth learning how to use LaTeX effectively, as it is particularly powerful for mathematical and academic writing. There are many tutorials on the web.
|
||||
|
||||
|
||||
\section{Mechanics}
|
||||
\label{sec:mechanics}
|
||||
|
||||
\textbf{Marks:}
|
||||
This assignment will be assessed out of 100 marks and
|
||||
forms 25\% of your final grade for the course.
|
||||
|
||||
\textbf{Academic conduct:}
|
||||
Assessed work is subject to University
|
||||
regulations on academic
|
||||
conduct:\\\url{http://web.inf.ed.ac.uk/infweb/admin/policies/academic-misconduct}
|
||||
|
||||
\textbf{Submission:}
|
||||
You can submit more than once up until the submission deadline. All
|
||||
submissions are timestamped automatically. Identically named files
|
||||
will overwrite earlier submitted versions, so we will mark the latest
|
||||
submission that comes in before the deadline.
|
||||
|
||||
If you submit anything before the deadline, you may not resubmit
|
||||
afterward. (This policy allows us to begin marking submissions
|
||||
immediately after the deadline, without having to worry that some may
|
||||
need to be re-marked).
|
||||
|
||||
If you do not submit anything before the deadline, you may submit {\em
|
||||
exactly once} after the deadline, and a late penalty will be applied
|
||||
to this submission unless you have received an approved extension.
|
||||
Please be aware that late submissions may receive lower priority for
|
||||
marking, and marks may not be returned within the same timeframe as
|
||||
for on-time submissions.
|
||||
|
||||
{\em Warning:} Unfortunately the \verb+submit+ command will technically
|
||||
allow you to submit late even if you submitted before the deadline
|
||||
(i.e.\ it does not enforce the above policy). Don't do this! We will
|
||||
mark the version that we retrieve just after the deadline, and (even
|
||||
worse) you may still be penalized for submitting late because the
|
||||
timestamp will update.
|
||||
|
||||
For additional information about late penalties and extension
|
||||
requests, see the School web page below. Do {\bf not} email any course
|
||||
staff directly about extension requests; you must follow the
|
||||
instructions on the web page.
|
||||
|
||||
\url{http://web.inf.ed.ac.uk/infweb/student-services/ito/admin/coursework-projects/late-coursework-extension-requests}
|
||||
|
||||
\textbf{Late submission penalty:}
|
||||
Following the University guidelines,
|
||||
late coursework submitted without an authorised extension will be
|
||||
recorded as late and the following penalties will apply: 5
|
||||
percentage points will be deducted for every calendar day or part
|
||||
thereof it is late, up to a maximum of 7 calendar days. After this
|
||||
time a mark of zero will be recorded.
|
||||
|
||||
\section{Backing up your work}
|
||||
\label{sec:backing-up-your-work}
|
||||
|
||||
It is \textbf{strongly recommended} you use some method for backing up
|
||||
your work. Those working in their AFS homespace on DICE will have their
|
||||
work automatically backed up as part of the
|
||||
\href{http://computing.help.inf.ed.ac.uk/backups-and-mirrors}{routine
|
||||
backup} of all user homespaces. If you are working on a personal
|
||||
computer you should have your own backup method in place (e.g.~saving
|
||||
additional copies to an external drive, syncing to a cloud service or
|
||||
pushing commits to your local Git repository to a private repository on
|
||||
Github). \textbf{Loss of work through failure to back up
|
||||
\href{http://tinyurl.com/edinflate}{does not constitute a good reason for
|
||||
late submission}}.
|
||||
|
||||
You may \emph{additionally} wish to keep your coursework under version
|
||||
control in your local Git repository on the \verb+coursework_2+ branch.
|
||||
|
||||
If you make regular commits of your work on the coursework this will
|
||||
allow you to better keep track of the changes you have made and if
|
||||
necessary revert to previous versions of files and/or restore
|
||||
accidentally deleted work. This is not however required and you should
|
||||
note that keeping your work under version control is a distinct issue
|
||||
from backing up to guard against hard drive failure. If you are working
|
||||
on a personal computer you should still keep an additional back up of
|
||||
your work as described above.
|
||||
|
||||
|
||||
|
||||
\section{Submission}
|
||||
\label{sec:submission}
|
||||
|
||||
Your coursework submission should be done electronically using the
|
||||
\href{http://computing.help.inf.ed.ac.uk/submit}{\texttt{submit}}
|
||||
command available on DICE machines.
|
||||
|
||||
Your submission should include
|
||||
|
||||
\begin{itemize}
|
||||
\itemsep1pt\parskip0pt\parsep0pt
|
||||
\item
|
||||
the unit test files generated for part 1, \verb+sXXXXXXX_batchnorm_test.txt+ and \verb+sXXXXXXX_conv_test.txt+, where your student number replaces \verb+sXXXXXXX+. Please do not
|
||||
change the names of these files.
|
||||
\item
|
||||
your completed report as a PDF file, using the provided template
|
||||
\item
|
||||
any notebook (\verb+.ipynb+) files you used to run the experiments
|
||||
\item
|
||||
and your local version of the \texttt{mlp} code including any changes
|
||||
you made to the modules (\texttt{.py} files).
|
||||
\end{itemize}
|
||||
Please do not submit anything else (e.g. log files).
|
||||
|
||||
You should copy all of the files to a single directory, \verb+coursework2+, e.g.
|
||||
|
||||
\begin{verbatim}
|
||||
mkdir coursework2
|
||||
cp reports/coursework2.pdf sXXXXXXX_batchnorm_test.txt sXXXXXXX_conv_test.txt coursework2
|
||||
\end{verbatim}
|
||||
|
||||
|
||||
and then submit this directory using
|
||||
|
||||
\begin{verbatim}
|
||||
submit mlp cw2 coursework2
|
||||
\end{verbatim}
|
||||
|
||||
Please submit the directory, not a zip file, not a tar file.
|
||||
|
||||
The \texttt{submit} command will prompt you with the details of the
|
||||
submission including the name of the files / directories you are
|
||||
submitting and the name of the course and exercise you are submitting
|
||||
for and ask you to check if these details are correct. You should check
|
||||
these carefully and reply \texttt{y} to submit if you are sure the files
|
||||
are correct and \texttt{n} otherwise.
|
||||
|
||||
You can amend an existing submission by rerunning the \texttt{submit}
|
||||
command any time up to the deadline. It is therefore a good idea
|
||||
(particularly if this is your first time using the DICE submit
|
||||
mechanism) to do an initial run of the \texttt{submit} command early on
|
||||
and then rerun the command if you make any further updates to your
|
||||
submission rather than leaving it to the last minute.
|
||||
|
||||
|
||||
\section{Marking Scheme}
|
||||
\label{sec:marking-scheme}
|
||||
|
||||
\begin{itemize}
|
||||
\item
|
||||
Part 1, Unit tests (30 marks).
|
||||
\item
|
||||
Part 2, Report (70 marks). The following aspects will contribute to the mark for your report:
|
||||
\begin{itemize}
|
||||
\item Abstract -- how clear is it? does it cover what is reported in the document?
|
||||
\item Introduction - do you clearly outline and motivate the paper, and describe the research questions investigated?
|
||||
\item Methods -- have you carefully described the approaches you have used?
|
||||
\item Experiments -- did you carry out the experiments correctly? are the results clearly presented and described?
|
||||
\item Interpretation and discussion of results
|
||||
\item Conclusions
|
||||
\item Presentation and clarity of report
|
||||
\end{itemize}
|
||||
\end{itemize}
|
||||
|
||||
\bibliographystyle{plainnat}
|
||||
\bibliography{cw2-references}
|
||||
\end{document}
|
@ -1,29 +0,0 @@
|
||||
@inproceedings{maas2013rectifier,
|
||||
title={Rectifier nonlinearities improve neural network acoustic models},
|
||||
author={Maas, Andrew L and Hannun, Awni Y and Ng, Andrew Y},
|
||||
booktitle={Proc. ICML},
|
||||
volume={30},
|
||||
number={1},
|
||||
year={2013}
|
||||
}
|
||||
|
||||
@inproceedings{nair2010rectified,
|
||||
title={Rectified linear units improve restricted {Boltzmann} machines},
|
||||
author={Nair, Vinod and Hinton, Geoffrey E},
|
||||
booktitle={Proc ICML},
|
||||
pages={807--814},
|
||||
year={2010}
|
||||
}
|
||||
|
||||
@article{clevert2015fast,
|
||||
title={Fast and accurate deep network learning by exponential linear units ({ELU}s)},
|
||||
author={Clevert, Djork-Arn{\'e} and Unterthiner, Thomas and Hochreiter, Sepp},
|
||||
journal={arXiv preprint arXiv:1511.07289},
|
||||
year={2015}
|
||||
}
|
||||
@article{klambauer2017self,
|
||||
title={Self-Normalizing Neural Networks},
|
||||
author={Klambauer, G{\"u}nter and Unterthiner, Thomas and Mayr, Andreas and Hochreiter, Sepp},
|
||||
journal={arXiv preprint arXiv:1706.02515},
|
||||
year={2017}
|
||||
}
|
@ -1,64 +0,0 @@
|
||||
@inproceedings{maas2013rectifier,
|
||||
title={Rectifier nonlinearities improve neural network acoustic models},
|
||||
author={Maas, Andrew L and Hannun, Awni Y and Ng, Andrew Y},
|
||||
booktitle={Proc. ICML},
|
||||
volume={30},
|
||||
number={1},
|
||||
year={2013}
|
||||
}
|
||||
|
||||
@inproceedings{nair2010rectified,
|
||||
title={Rectified linear units improve restricted {Boltzmann} machines},
|
||||
author={Nair, Vinod and Hinton, Geoffrey E},
|
||||
booktitle={Proc ICML},
|
||||
pages={807--814},
|
||||
year={2010}
|
||||
}
|
||||
|
||||
@article{clevert2015fast,
|
||||
title={Fast and accurate deep network learning by exponential linear units ({ELU}s)},
|
||||
author={Clevert, Djork-Arn{\'e} and Unterthiner, Thomas and Hochreiter, Sepp},
|
||||
journal={arXiv preprint arXiv:1511.07289},
|
||||
year={2015}
|
||||
}
|
||||
@article{klambauer2017self,
|
||||
title={Self-Normalizing Neural Networks},
|
||||
author={Klambauer, G{\"u}nter and Unterthiner, Thomas and Mayr, Andreas and Hochreiter, Sepp},
|
||||
journal={arXiv preprint arXiv:1706.02515},
|
||||
year={2017}
|
||||
}
|
||||
|
||||
@article{cohen2017emnist,
|
||||
title = {{EMNIST}: an extension of {MNIST} to handwritten letters},
|
||||
author = {Cohen, G. and Afshar, S. and Tapson, J. and van Schaik, A.},
|
||||
journal={arXiv preprint arXiv:1702.05373},
|
||||
year={2017},
|
||||
url = {https://arxiv.org/abs/1702.05373}
|
||||
}
|
||||
|
||||
@inproceedings{kingma2015adam,
|
||||
title = {Adam: A Method for Stochastic Optimization},
|
||||
author = {Diederik P. Kingma and Jimmy Ba},
|
||||
booktitle = {ICML},
|
||||
year = {2015},
|
||||
url = {https://arxiv.org/abs/1412.6980}
|
||||
}
|
||||
|
||||
@article{tieleman2012rmsprop,
|
||||
title={Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude},
|
||||
author={Tieleman, T. and Hinton, G. E.},
|
||||
journal={COURSERA: Neural Networks for Machine Learning},
|
||||
volume={4},
|
||||
number={2},
|
||||
year={2012},
|
||||
url = {https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf}
|
||||
}
|
||||
|
||||
@inproceedings{ioffe2015batch,
|
||||
title={Batch normalization: Accelerating deep network training by reducing internal covariate shift},
|
||||
author={Ioffe, Sergey and Szegedy, Christian},
|
||||
booktitle={ICML},
|
||||
pages={448--456},
|
||||
year={2015},
|
||||
url = {http://proceedings.mlr.press/v37/ioffe15.html}
|
||||
}
|
0
utils/__init__.py
Normal file
0
utils/__init__.py
Normal file
27
utils/network_summary.py
Normal file
27
utils/network_summary.py
Normal file
@ -0,0 +1,27 @@
|
||||
def count_parameters(network_variables, name):
    """
    This method counts the total number of parameters for a list of variable objects
    :param network_variables: A list of tf network variable objects
    :param name: Name of the network
    """
    total_parameters = 0
    for variable in network_variables:
        # shape is an array of tf.Dimension
        print(variable)
        shape = variable.get_shape()
        variable_parameters = 1
        for dim in shape:
            variable_parameters *= dim.value

        total_parameters += variable_parameters
    print(name, "has a total of", total_parameters, "parameters")
|
||||
|
||||
|
||||
def view_names_of_variables(variables):
    """
    View all variable names in a tf variable list
    :param variables: A list of tf variables
    """
    for variable in variables:
        print(variable)
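

# Example usage (a sketch, assuming a TensorFlow 1.x graph has already been built):
#   import tensorflow as tf
#   network_variables = tf.trainable_variables()
#   count_parameters(network_variables, name="classifier")
#   view_names_of_variables(network_variables)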
|
||||
|
44
utils/parser_utils.py
Normal file
44
utils/parser_utils.py
Normal file
@ -0,0 +1,44 @@
|
||||
class ParserClass(object):
|
||||
def __init__(self, parser):
|
||||
"""
|
||||
Parses arguments and saves them in the Parser Class
|
||||
:param parser: A parser to get input from
|
||||
"""
|
||||
parser.add_argument('--batch_size', nargs="?", type=int, default=64, help='batch_size for experiment')
|
||||
parser.add_argument('--epochs', type=int, nargs="?", default=100, help='Number of epochs to train for')
|
||||
parser.add_argument('--logs_path', type=str, nargs="?", default="classification_logs/",
|
||||
help='Experiment log path, '
|
||||
'where tensorboard is saved, '
|
||||
'along with .csv of results')
|
||||
parser.add_argument('--experiment_prefix', nargs="?", type=str, default="classification",
|
||||
help='Experiment name without hp details')
|
||||
parser.add_argument('--continue_epoch', nargs="?", type=int, default=-1, help="ID of epoch to continue from, "
|
||||
"-1 means from scratch")
|
||||
parser.add_argument('--tensorboard_use', nargs="?", type=str, default="False",
|
||||
help='Whether to use tensorboard')
|
||||
parser.add_argument('--dropout_rate', nargs="?", type=float, default=0.35, help="Dropout value")
|
||||
parser.add_argument('--batch_norm_use', nargs="?", type=str, default="False",
help='Whether to use batch normalisation')
parser.add_argument('--strided_dim_reduction', nargs="?", type=str, default="False",
help='Whether to use strided dimensionality reduction')
parser.add_argument('--seed', nargs="?", type=int, default=1122017, help='Random seed for the experiment')
|
||||
|
||||
self.args = parser.parse_args()
|
||||
|
||||
def get_argument_variables(self):
|
||||
"""
|
||||
Processes the parsed arguments and produces variables of specific types needed for the experiments
|
||||
:return: Arguments needed for experiments
|
||||
"""
|
||||
batch_size = self.args.batch_size
|
||||
experiment_prefix = self.args.experiment_prefix
|
||||
strided_dim_reduction = True if self.args.strided_dim_reduction == "True" else False
|
||||
batch_norm = True if self.args.batch_norm_use == "True" else False
|
||||
seed = self.args.seed
|
||||
dropout_rate = self.args.dropout_rate
|
||||
tensorboard_enable = True if self.args.tensorboard_use == "True" else False
|
||||
continue_from_epoch = self.args.continue_epoch # use -1 to start from scratch
|
||||
epochs = self.args.epochs
|
||||
logs_path = self.args.logs_path
|
||||
|
||||
return batch_size, seed, epochs, logs_path, continue_from_epoch, tensorboard_enable, batch_norm, \
|
||||
strided_dim_reduction, experiment_prefix, dropout_rate
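

# Example usage (a sketch; the surrounding training script is an assumption):
#   import argparse
#   parser_tool = ParserClass(parser=argparse.ArgumentParser())
#   (batch_size, seed, epochs, logs_path, continue_from_epoch, tensorboard_enable,
#    batch_norm, strided_dim_reduction, experiment_prefix, dropout_rate) = \
#       parser_tool.get_argument_variables()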
|
56
utils/storage.py
Normal file
56
utils/storage.py
Normal file
@ -0,0 +1,56 @@
|
||||
import csv
|
||||
|
||||
|
||||
def save_statistics(log_dir, statistics_file_name, list_of_statistics, create=False):
|
||||
"""
|
||||
Saves a statistics .csv file with the statistics
|
||||
:param log_dir: Directory of log
|
||||
:param statistics_file_name: Name of .csv file
|
||||
:param list_of_statistics: A list of statistics to add in the file
|
||||
:param create: If True creates a new file, if False adds list to existing
|
||||
"""
|
||||
if create:
|
||||
with open("{}/{}.csv".format(log_dir, statistics_file_name), 'w+') as f:
|
||||
writer = csv.writer(f)
|
||||
writer.writerow(list_of_statistics)
|
||||
else:
|
||||
with open("{}/{}.csv".format(log_dir, statistics_file_name), 'a') as f:
|
||||
writer = csv.writer(f)
|
||||
writer.writerow(list_of_statistics)
|
||||
|
||||
|
||||
def load_statistics(log_dir, statistics_file_name):
|
||||
"""
|
||||
Loads the statistics in a dictionary.
|
||||
:param log_dir: The directory in which the log is saved
|
||||
:param statistics_file_name: The name of the statistics file
|
||||
:return: A dict with the statistics
|
||||
"""
|
||||
data_dict = dict()
|
||||
with open("{}/{}.csv".format(log_dir, statistics_file_name), 'r') as f:
|
||||
lines = f.readlines()
|
||||
data_labels = lines[0].replace("\n", "").replace("\r", "").split(",")
|
||||
del lines[0]
|
||||
|
||||
for label in data_labels:
|
||||
data_dict[label] = []
|
||||
|
||||
for line in lines:
|
||||
data = line.replace("\n", "").replace("\r", "").split(",")
|
||||
for key, item in zip(data_labels, data):
|
||||
if item not in data_labels:
|
||||
data_dict[key].append(item)
|
||||
return data_dict
|
||||
|
||||
|
||||
def build_experiment_folder(experiment_name, log_path):
|
||||
saved_models_filepath = "{}/{}/{}".format(log_path, experiment_name.replace("%.%", "/"), "saved_models")
|
||||
logs_filepath = "{}/{}/{}".format(log_path, experiment_name.replace("%.%", "/"), "summary_logs")
|
||||
import os
|
||||
|
||||
if not os.path.exists(logs_filepath):
|
||||
os.makedirs(logs_filepath)
|
||||
if not os.path.exists(saved_models_filepath):
|
||||
os.makedirs(saved_models_filepath)
|
||||
|
||||
return saved_models_filepath, logs_filepath
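

# Example usage (a sketch; the directory and file names are illustrative only):
#   save_statistics("logs", "train_stats", ["epoch", "loss", "accuracy"], create=True)
#   save_statistics("logs", "train_stats", [0, 1.23, 0.45])
#   stats = load_statistics("logs", "train_stats")  # values are read back as strings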
|