Compare commits
5 commits: mlp2024-25 ... mlp2024-25

Author | SHA1 | Date
---|---|---
 | 1313d8ab2e |
 | 56f22b8ac1 |
 | 4bf1a79681 |
 | 0f1d7a1498 |
 | 48d1e846ea |
11  .gitignore  vendored

@@ -1,3 +1,6 @@
#editors
*.idea/

#dropbox stuff
*.dropbox*

@@ -58,6 +61,8 @@ docs/_build/

# PyBuilder
target/

# Notebook stuff
notebooks/.ipynb_checkpoints/
*.tar.gz
google-cloud-sdk/
solutions/
.ipynb_checkpoints/
emnist_tutorial/
26  README.md

@@ -1,24 +1,22 @@
# Machine Learning Practical
# MLP Compute Engines Tutorials Branch

This repository contains the code for the University of Edinburgh [School of Informatics](http://www.inf.ed.ac.uk) course [Machine Learning Practical](http://www.inf.ed.ac.uk/teaching/courses/mlp/).
A short code repo that guides you through the process of running experiments on the Google Cloud Platform.

This assignment-based course is focused on the implementation and evaluation of machine learning systems. Students who do this course will have experience in the design, implementation, training, and evaluation of machine learning systems.
## Why do I need it?

Most deep learning experiments require a large amount of compute, as you have noticed in term 1. A GPU can accelerate experiments by roughly 30-50x, which makes otherwise prohibitively long experiments feasible by slashing their runtimes. As a simple example, consider an experiment that takes a month to run: it would be infeasible to do research with. If the same experiment takes only a day, you can iterate over methodologies, tune hyperparameters, and overall try far more things. This simple example captures one of the main reasons behind the GPU hype surrounding machine learning research today.

The code in this repository is split into:
## Introduction

* a Python package `mlp`, a [NumPy](http://www.numpy.org/) based neural network package designed specifically for the course that students will implement parts of and extend during the course labs and assignments,
* a series of [Jupyter](http://jupyter.org/) notebooks in the `notebooks` directory containing explanatory material and coding exercises to be completed during the course labs.
The material available includes tutorial documents and code, as well as tooling that provides more advanced features to aid you in your quest to train lots of learnable differentiable computational graphs.

## Remote working
## Getting Started

If you are working remotely, follow this [guide](notes/remote-working-guide.md).
### Google Cloud Platform

## Getting set up
Google Cloud Platform (GCP) is a cloud computing service that provides a number of services, including the ability to run virtual machines (VMs) on its infrastructure. The VMs are called Compute Engine instances.

Detailed instructions for setting up a development environment for the course are given in [this file](notes/environment-set-up.md). Students doing the course will spend part of the first lab getting their own environment set up.
As an MLP course student, you will be given $50 worth of credits. This is enough to run a number of experiments on the cloud.

## Exercises
To get started with GCP, please read [this getting started guide](notes/google_cloud_setup.md).

If this is your first time using Jupyter notebooks, check out `notebooks/00_notebook.ipynb` to learn their features.
To get started with the exercises, go to the `notebooks` directory. For lab 1, work with the notebook starting with the prefix `01`, and so on.
The guide will take you through the process of setting up a GCP account, creating a project, creating a VM instance, and connecting to it. The VM instance will be a GPU-endowed Linux machine that already includes the necessary PyTorch packages for you to run your experiments.
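To make the workflow concrete, here is a minimal sketch of the kind of `gcloud` commands the getting-started guide covers once the SDK is installed; the instance name and zone below are illustrative placeholders, not values taken from the guide.

# list your Compute Engine instances
gcloud compute instances list

# connect to an instance over SSH (instance name and zone are placeholders)
gcloud compute ssh mlp-instance --zone europe-west1-b

# stop the instance when you finish, so it does not keep burning credits
gcloud compute instances stop mlp-instance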
87  arg_extractor.py  Normal file

@@ -0,0 +1,87 @@
import argparse
import json
import os
import sys

import torch


def str2bool(v):
    """Parses a truthy/falsy command-line string into a bool."""
    if v.lower() in ('yes', 'true', 't', 'y', '1'):
        return True
    elif v.lower() in ('no', 'false', 'f', 'n', '0'):
        return False
    else:
        raise argparse.ArgumentTypeError('Boolean value expected.')


def get_args():
    """
    Parses the command-line arguments (optionally overridden by a JSON config file)
    and chooses the torch device to run on.
    :return: A tuple (args, device) where args holds the experiment arguments.
    """
    parser = argparse.ArgumentParser(
        description='Welcome to the MLP course\'s Pytorch training and inference helper script')

    parser.add_argument('--batch_size', nargs="?", type=int, default=100, help='Batch size for the experiment')
    parser.add_argument('--continue_from_epoch', nargs="?", type=int, default=-1,
                        help='Epoch to continue training from (-1 starts from scratch)')
    parser.add_argument('--dataset_name', type=str, help='Dataset on which the system will train/eval our model')
    parser.add_argument('--seed', nargs="?", type=int, default=7112018,
                        help='Seed to use for random number generator for experiment')
    parser.add_argument('--image_num_channels', nargs="?", type=int, default=1,
                        help='The channel dimensionality of our image-data')
    parser.add_argument('--image_height', nargs="?", type=int, default=28, help='Height of image data')
    parser.add_argument('--image_width', nargs="?", type=int, default=28, help='Width of image data')
    parser.add_argument('--dim_reduction_type', nargs="?", type=str, default='strided_convolution',
                        help='One of [strided_convolution, dilated_convolution, max_pooling, avg_pooling]')
    parser.add_argument('--num_layers', nargs="?", type=int, default=4,
                        help='Number of convolutional layers in the network (excluding '
                             'dimensionality reduction layers)')
    parser.add_argument('--num_filters', nargs="?", type=int, default=64,
                        help='Number of convolutional filters per convolutional layer in the network (excluding '
                             'dimensionality reduction layers)')
    parser.add_argument('--num_epochs', nargs="?", type=int, default=100, help='The experiment\'s epoch budget')
    parser.add_argument('--experiment_name', nargs="?", type=str, default="exp_1",
                        help='Experiment name - to be used for building the experiment folder')
    parser.add_argument('--use_gpu', nargs="?", type=str2bool, default=False,
                        help='A flag indicating whether we will use GPU acceleration or not')
    parser.add_argument('--weight_decay_coefficient', nargs="?", type=float, default=1e-05,
                        help='Weight decay to use for Adam')
    parser.add_argument('--filepath_to_arguments_json_file', nargs="?", type=str, default=None,
                        help='Path of a JSON file whose entries override the command-line arguments')

    args = parser.parse_args()

    if args.filepath_to_arguments_json_file is not None:
        args = extract_args_from_json(json_file_path=args.filepath_to_arguments_json_file, existing_args_dict=args)

    arg_str = [(str(key), str(value)) for (key, value) in vars(args).items()]
    print(arg_str)

    if torch.cuda.is_available():  # checks whether a cuda gpu is available
        device = torch.cuda.current_device()
        print("use {} GPU(s)".format(torch.cuda.device_count()), file=sys.stderr)
    else:
        print("use CPU", file=sys.stderr)
        device = torch.device('cpu')  # sets the device to be CPU

    return args, device


class AttributeAccessibleDict(object):
    """A thin wrapper that exposes a dictionary's keys as attributes."""
    def __init__(self, adict):
        self.__dict__.update(adict)


def extract_args_from_json(json_file_path, existing_args_dict=None):
    """Loads arguments from a JSON file; any argument missing from the JSON is
    filled in from the already-parsed command-line arguments/defaults."""
    summary_filename = json_file_path
    with open(summary_filename) as f:
        arguments_dict = json.load(fp=f)

    for key, value in vars(existing_args_dict).items():
        if key not in arguments_dict:
            arguments_dict[key] = value

    arguments_dict = AttributeAccessibleDict(arguments_dict)

    return arguments_dict
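For illustration, a hypothetical JSON config that could be passed via `--filepath_to_arguments_json_file`: the keys mirror the argparse flags above, values in the file take precedence, and any key left out falls back to the command-line value or default. The keys and values below are placeholders; the actual tutorial config referenced later is `experiment_configs/cifar10_tutorial_config.json`.

{
  "batch_size": 100,
  "num_epochs": 100,
  "experiment_name": "cifar10_test_exp",
  "use_gpu": true,
  "dataset_name": "cifar10"
}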
@@ -0,0 +1,43 @@
#!/bin/sh
#SBATCH -N 1	  # nodes requested
#SBATCH -n 1	  # tasks requested
#SBATCH --partition=Teach-Standard
#SBATCH --gres=gpu:1
#SBATCH --mem=12000  # memory in MB
#SBATCH --time=0-08:00:00

export CUDA_HOME=/opt/cuda-9.0.176.1/
export CUDNN_HOME=/opt/cuDNN-7.0/

export STUDENT_ID=$(whoami)

export LD_LIBRARY_PATH=${CUDNN_HOME}/lib64:${CUDA_HOME}/lib64:$LD_LIBRARY_PATH
export LIBRARY_PATH=${CUDNN_HOME}/lib64:$LIBRARY_PATH
export CPATH=${CUDNN_HOME}/include:$CPATH
export PATH=${CUDA_HOME}/bin:${PATH}
export PYTHON_PATH=$PATH

mkdir -p /disk/scratch/${STUDENT_ID}

export TMPDIR=/disk/scratch/${STUDENT_ID}/
export TMP=/disk/scratch/${STUDENT_ID}/

mkdir -p ${TMP}/datasets/
export DATASET_DIR=${TMP}/datasets/

# Activate the relevant virtual environment:
source /home/${STUDENT_ID}/miniconda3/bin/activate mlp
cd ..
python train_evaluate_emnist_classification_system.py --batch_size 100 --continue_from_epoch -1 --seed 0 \
    --image_num_channels 3 --image_height 32 --image_width 32 \
    --dim_reduction_type "strided" --num_layers 4 --num_filters 64 \
    --num_epochs 100 --experiment_name 'cifar100_test_exp' \
    --use_gpu "True" --weight_decay_coefficient 0. \
    --dataset_name "cifar100"
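As a usage sketch, a job script like the one above is submitted to Slurm and monitored as follows; the script's file name was not preserved in this compare view, so the name below is a placeholder.

sbatch cifar100_gpu_job.sh    # submit the job script to the scheduler
squeue -u $(whoami)           # check the state of your queued and running jobs
scancel <job_id>              # cancel a job if needed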
@ -0,0 +1,38 @@
|
||||
#!/bin/sh
|
||||
#SBATCH -N 1 # nodes requested
|
||||
#SBATCH -n 1 # tasks requested
|
||||
#SBATCH --partition=Teach-Standard
|
||||
#SBATCH --gres=gpu:1
|
||||
#SBATCH --mem=12000 # memory in Mb
|
||||
#SBATCH --time=0-08:00:00
|
||||
|
||||
export CUDA_HOME=/opt/cuda-9.0.176.1/
|
||||
|
||||
export CUDNN_HOME=/opt/cuDNN-7.0/
|
||||
|
||||
export STUDENT_ID=$(whoami)
|
||||
|
||||
export LD_LIBRARY_PATH=${CUDNN_HOME}/lib64:${CUDA_HOME}/lib64:$LD_LIBRARY_PATH
|
||||
|
||||
export LIBRARY_PATH=${CUDNN_HOME}/lib64:$LIBRARY_PATH
|
||||
|
||||
export CPATH=${CUDNN_HOME}/include:$CPATH
|
||||
|
||||
export PATH=${CUDA_HOME}/bin:${PATH}
|
||||
|
||||
export PYTHON_PATH=$PATH
|
||||
|
||||
mkdir -p /disk/scratch/${STUDENT_ID}
|
||||
|
||||
|
||||
export TMPDIR=/disk/scratch/${STUDENT_ID}/
|
||||
export TMP=/disk/scratch/${STUDENT_ID}/
|
||||
|
||||
mkdir -p ${TMP}/datasets/
|
||||
export DATASET_DIR=${TMP}/datasets/
|
||||
# Activate the relevant virtual environment:
|
||||
|
||||
|
||||
source /home/${STUDENT_ID}/miniconda3/bin/activate mlp
|
||||
cd ..
|
||||
python train_evaluate_emnist_classification_system.py --filepath_to_arguments_json_file experiment_configs/cifar10_tutorial_config.json
|
@ -0,0 +1,43 @@
|
||||
#!/bin/sh
|
||||
#SBATCH -N 1 # nodes requested
|
||||
#SBATCH -n 1 # tasks requested
|
||||
#SBATCH --partition=Teach-LongJobs
|
||||
#SBATCH --gres=gpu:1
|
||||
#SBATCH --mem=12000 # memory in Mb
|
||||
#SBATCH --time=0-08:00:00
|
||||
|
||||
export CUDA_HOME=/opt/cuda-9.0.176.1/
|
||||
|
||||
export CUDNN_HOME=/opt/cuDNN-7.0/
|
||||
|
||||
export STUDENT_ID=$(whoami)
|
||||
|
||||
export LD_LIBRARY_PATH=${CUDNN_HOME}/lib64:${CUDA_HOME}/lib64:$LD_LIBRARY_PATH
|
||||
|
||||
export LIBRARY_PATH=${CUDNN_HOME}/lib64:$LIBRARY_PATH
|
||||
|
||||
export CPATH=${CUDNN_HOME}/include:$CPATH
|
||||
|
||||
export PATH=${CUDA_HOME}/bin:${PATH}
|
||||
|
||||
export PYTHON_PATH=$PATH
|
||||
|
||||
mkdir -p /disk/scratch/${STUDENT_ID}
|
||||
|
||||
|
||||
export TMPDIR=/disk/scratch/${STUDENT_ID}/
|
||||
export TMP=/disk/scratch/${STUDENT_ID}/
|
||||
|
||||
mkdir -p ${TMP}/datasets/
|
||||
export DATASET_DIR=${TMP}/datasets/
|
||||
# Activate the relevant virtual environment:
|
||||
|
||||
|
||||
source /home/${STUDENT_ID}/miniconda3/bin/activate mlp
|
||||
cd ..
|
||||
python train_evaluate_emnist_classification_system.py --batch_size 100 --continue_from_epoch -1 --seed 0 \
|
||||
--image_num_channels 1 --image_height 28 --image_width 28 \
|
||||
--dim_reduction_type "strided" --num_layers 4 --num_filters 64 \
|
||||
--num_epochs 100 --experiment_name 'emnist_test_exp' \
|
||||
--use_gpu "True" --weight_decay_coefficient 0. \
|
||||
--dataset_name "emnist"
|
@ -0,0 +1,43 @@
|
||||
#!/bin/sh
|
||||
#SBATCH -N 1 # nodes requested
|
||||
#SBATCH -n 1 # tasks requested
|
||||
#SBATCH --partition=Teach-Short
|
||||
#SBATCH --gres=gpu:1
|
||||
#SBATCH --mem=12000 # memory in Mb
|
||||
#SBATCH --time=0-03:59:00
|
||||
|
||||
export CUDA_HOME=/opt/cuda-9.0.176.1/
|
||||
|
||||
export CUDNN_HOME=/opt/cuDNN-7.0/
|
||||
|
||||
export STUDENT_ID=$(whoami)
|
||||
|
||||
export LD_LIBRARY_PATH=${CUDNN_HOME}/lib64:${CUDA_HOME}/lib64:$LD_LIBRARY_PATH
|
||||
|
||||
export LIBRARY_PATH=${CUDNN_HOME}/lib64:$LIBRARY_PATH
|
||||
|
||||
export CPATH=${CUDNN_HOME}/include:$CPATH
|
||||
|
||||
export PATH=${CUDA_HOME}/bin:${PATH}
|
||||
|
||||
export PYTHON_PATH=$PATH
|
||||
|
||||
mkdir -p /disk/scratch/${STUDENT_ID}
|
||||
|
||||
|
||||
export TMPDIR=/disk/scratch/${STUDENT_ID}/
|
||||
export TMP=/disk/scratch/${STUDENT_ID}/
|
||||
|
||||
mkdir -p ${TMP}/datasets/
|
||||
export DATASET_DIR=${TMP}/datasets/
|
||||
# Activate the relevant virtual environment:
|
||||
|
||||
|
||||
source /home/${STUDENT_ID}/miniconda3/bin/activate mlp
|
||||
cd ..
|
||||
python train_evaluate_emnist_classification_system.py --batch_size 100 --continue_from_epoch -1 --seed 0 \
|
||||
--image_num_channels 1 --image_height 28 --image_width 28 \
|
||||
--dim_reduction_type "strided" --num_layers 4 --num_filters 64 \
|
||||
--num_epochs 100 --experiment_name 'emnist_test_exp' \
|
||||
--use_gpu "True" --weight_decay_coefficient 0. \
|
||||
--dataset_name "emnist"
|
@ -0,0 +1,44 @@
|
||||
#!/bin/sh
|
||||
#SBATCH -N 1 # nodes requested
|
||||
#SBATCH -n 1 # tasks requested
|
||||
#SBATCH --partition=Teach-Standard
|
||||
#SBATCH --gres=gpu:4
|
||||
#SBATCH --mem=12000 # memory in Mb
|
||||
#SBATCH --time=0-08:00:00
|
||||
|
||||
|
||||
export CUDA_HOME=/opt/cuda-9.0.176.1/
|
||||
|
||||
export CUDNN_HOME=/opt/cuDNN-7.0/
|
||||
|
||||
export STUDENT_ID=$(whoami)
|
||||
|
||||
export LD_LIBRARY_PATH=${CUDNN_HOME}/lib64:${CUDA_HOME}/lib64:$LD_LIBRARY_PATH
|
||||
|
||||
export LIBRARY_PATH=${CUDNN_HOME}/lib64:$LIBRARY_PATH
|
||||
|
||||
export CPATH=${CUDNN_HOME}/include:$CPATH
|
||||
|
||||
export PATH=${CUDA_HOME}/bin:${PATH}
|
||||
|
||||
export PYTHON_PATH=$PATH
|
||||
|
||||
mkdir -p /disk/scratch/${STUDENT_ID}
|
||||
|
||||
|
||||
export TMPDIR=/disk/scratch/${STUDENT_ID}/
|
||||
export TMP=/disk/scratch/${STUDENT_ID}/
|
||||
|
||||
mkdir -p ${TMP}/datasets/
|
||||
export DATASET_DIR=${TMP}/datasets/
|
||||
# Activate the relevant virtual environment:
|
||||
|
||||
|
||||
source /home/${STUDENT_ID}/miniconda3/bin/activate mlp
|
||||
cd ..
|
||||
python train_evaluate_emnist_classification_system.py --batch_size 100 --continue_from_epoch -1 --seed 0 \
|
||||
--image_num_channels 1 --image_height 28 --image_width 28 \
|
||||
--dim_reduction_type "strided" --num_layers 4 --num_filters 64 \
|
||||
--num_epochs 100 --experiment_name 'emnist_test_multi_gpu_exp' \
|
||||
--use_gpu "True" --weight_decay_coefficient 0. \
|
||||
--dataset_name "emnist"
|
@ -0,0 +1,43 @@
|
||||
#!/bin/sh
|
||||
#SBATCH -N 1 # nodes requested
|
||||
#SBATCH -n 1 # tasks requested
|
||||
#SBATCH --partition=Teach-Standard
|
||||
#SBATCH --gres=gpu:1
|
||||
#SBATCH --mem=12000 # memory in Mb
|
||||
#SBATCH --time=0-08:00:00
|
||||
|
||||
export CUDA_HOME=/opt/cuda-9.0.176.1/
|
||||
|
||||
export CUDNN_HOME=/opt/cuDNN-7.0/
|
||||
|
||||
export STUDENT_ID=$(whoami)
|
||||
|
||||
export LD_LIBRARY_PATH=${CUDNN_HOME}/lib64:${CUDA_HOME}/lib64:$LD_LIBRARY_PATH
|
||||
|
||||
export LIBRARY_PATH=${CUDNN_HOME}/lib64:$LIBRARY_PATH
|
||||
|
||||
export CPATH=${CUDNN_HOME}/include:$CPATH
|
||||
|
||||
export PATH=${CUDA_HOME}/bin:${PATH}
|
||||
|
||||
export PYTHON_PATH=$PATH
|
||||
|
||||
mkdir -p /disk/scratch/${STUDENT_ID}
|
||||
|
||||
|
||||
export TMPDIR=/disk/scratch/${STUDENT_ID}/
|
||||
export TMP=/disk/scratch/${STUDENT_ID}/
|
||||
|
||||
mkdir -p ${TMP}/datasets/
|
||||
export DATASET_DIR=${TMP}/datasets/
|
||||
# Activate the relevant virtual environment:
|
||||
|
||||
|
||||
source /home/${STUDENT_ID}/miniconda3/bin/activate mlp
|
||||
cd ..
|
||||
python train_evaluate_emnist_classification_system.py --batch_size 100 --continue_from_epoch -1 --seed 0 \
|
||||
--image_num_channels 1 --image_height 28 --image_width 28 \
|
||||
--dim_reduction_type "strided" --num_layers 4 --num_filters 64 \
|
||||
--num_epochs 100 --experiment_name 'emnist_test_exp' \
|
||||
--use_gpu "True" --weight_decay_coefficient 0. \
|
||||
--dataset_name "emnist"
|
57
cluster_experiment_scripts/run_jobs_simple.py
Normal file
@ -0,0 +1,57 @@
|
||||
import os
|
||||
import subprocess
|
||||
import argparse
|
||||
import tqdm
|
||||
import getpass
|
||||
import time
|
||||
|
||||
parser = argparse.ArgumentParser(description='Welcome to the run N at a time script')
|
||||
parser.add_argument('--num_parallel_jobs', type=int)
|
||||
parser.add_argument('--total_epochs', type=int)
|
||||
args = parser.parse_args()
|
||||
|
||||
|
||||
def check_if_experiment_with_name_is_running(experiment_name):
|
||||
result = subprocess.run('squeue --name {} -l'.format(experiment_name), stdout=subprocess.PIPE, shell=True)
|
||||
lines = result.stdout.split(b'\n')
|
||||
if len(lines) > 2:
|
||||
return True
|
||||
else:
|
||||
return False
|
||||
|
||||
student_id = getpass.getuser().encode()[:5]
|
||||
list_of_scripts = [item for item in
|
||||
subprocess.run(['ls'], stdout=subprocess.PIPE).stdout.split(b'\n') if
|
||||
item.decode("utf-8").endswith(".sh")]
|
||||
|
||||
for script in list_of_scripts:
|
||||
print('sbatch', script.decode("utf-8"))
|
||||
|
||||
epoch_dict = {key.decode("utf-8"): 0 for key in list_of_scripts}
|
||||
total_jobs_finished = 0
|
||||
|
||||
while total_jobs_finished < args.total_epochs * len(list_of_scripts):
|
||||
curr_idx = 0
|
||||
with tqdm.tqdm(total=len(list_of_scripts)) as pbar_experiment:
|
||||
while curr_idx < len(list_of_scripts):
|
||||
number_of_jobs = 0
|
||||
result = subprocess.run(['squeue', '-l'], stdout=subprocess.PIPE)
|
||||
for line in result.stdout.split(b'\n'):
|
||||
if student_id in line:
|
||||
number_of_jobs += 1
|
||||
|
||||
if number_of_jobs < args.num_parallel_jobs:
|
||||
while check_if_experiment_with_name_is_running(
|
||||
experiment_name=list_of_scripts[curr_idx].decode("utf-8")) or epoch_dict[
|
||||
list_of_scripts[curr_idx].decode("utf-8")] >= args.total_epochs:
|
||||
curr_idx += 1
|
||||
if curr_idx >= len(list_of_scripts):
|
||||
curr_idx = 0
|
||||
|
||||
str_to_run = 'sbatch {}'.format(list_of_scripts[curr_idx].decode("utf-8"))
|
||||
total_jobs_finished += 1
|
||||
os.system(str_to_run)
|
||||
print(str_to_run)
|
||||
curr_idx += 1
|
||||
else:
|
||||
time.sleep(1)
|
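A minimal example of how this helper might be invoked: it lists every `.sh` script in the current directory, so it is assumed to be run from inside `cluster_experiment_scripts/`, and the flag values below are illustrative.

# keep at most 2 of your jobs in the Slurm queue at any time, resubmitting each
# script until it has been run for the given number of epochs
python run_jobs_simple.py --num_parallel_jobs 2 --total_epochs 100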
BIN  data/emnist-test.npz   Normal file
BIN  data/emnist-train.npz  Normal file
BIN  data/emnist-valid.npz  Normal file
55  data_augmentations.py  Normal file

@@ -0,0 +1,55 @@
from PIL import Image
from numpy import random
from torchvision import transforms
import numpy as np
import torch


class Cutout(object):
    """Randomly mask out one or more square patches from an image.

    Args:
        n_holes (int): Number of patches to cut out of each image.
        length (int): The length (in pixels) of each square patch.
    """

    def __init__(self, n_holes, length):
        self.n_holes = n_holes
        self.length = length

    def __call__(self, img):
        """
        Args:
            img (Tensor or PIL.Image): Image of size (C, H, W).
        Returns:
            Tensor or PIL.Image: Image with n_holes patches of size length x length cut out of it.
        """
        from_PIL = False

        if type(img) == Image.Image:
            from_PIL = True
            img = transforms.ToTensor()(img)

        h = img.size(1)
        w = img.size(2)

        mask = np.ones((h, w), np.float32)

        for n in range(self.n_holes):
            # pick a random centre and zero out a length x length square around it
            y = random.randint(0, h)
            x = random.randint(0, w)

            y1 = np.clip(y - self.length // 2, 0, h)
            y2 = np.clip(y + self.length // 2, 0, h)
            x1 = np.clip(x - self.length // 2, 0, w)
            x2 = np.clip(x + self.length // 2, 0, w)

            mask[y1: y2, x1: x2] = 0.

        mask = torch.from_numpy(mask)
        mask = mask.expand_as(img)
        img = img * mask

        if from_PIL:
            img = transforms.ToPILImage()(img)

        return img
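A short sketch of how Cutout could be composed into a standard torchvision pipeline; the augmentation parameters and crop size here are illustrative, not values prescribed by the repo.

from torchvision import transforms
from data_augmentations import Cutout

transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    Cutout(n_holes=1, length=8),  # Cutout accepts a (C, H, W) tensor or a PIL image
])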
631
data_providers.py
Normal file
@ -0,0 +1,631 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
"""Data providers.
|
||||
|
||||
This module provides classes for loading datasets and iterating over batches of
|
||||
data points.
|
||||
"""
|
||||
from __future__ import print_function
|
||||
import pickle
|
||||
import gzip
|
||||
import numpy as np
|
||||
import os
|
||||
DEFAULT_SEED = 20112018
|
||||
from PIL import Image
|
||||
import os
|
||||
import os.path
|
||||
import numpy as np
|
||||
import sys
|
||||
if sys.version_info[0] == 2:
|
||||
import cPickle as pickle
|
||||
else:
|
||||
import pickle
|
||||
|
||||
import torch.utils.data as data
|
||||
from torchvision.datasets.utils import download_url, check_integrity
|
||||
|
||||
class DataProvider(object):
|
||||
"""Generic data provider."""
|
||||
|
||||
def __init__(self, inputs, targets, batch_size, max_num_batches=-1,
|
||||
shuffle_order=True, rng=None):
|
||||
"""Create a new data provider object.
|
||||
|
||||
Args:
|
||||
inputs (ndarray): Array of data input features of shape
|
||||
(num_data, input_dim).
|
||||
targets (ndarray): Array of data output targets of shape
|
||||
(num_data, output_dim) or (num_data,) if output_dim == 1.
|
||||
batch_size (int): Number of data points to include in each batch.
|
||||
max_num_batches (int): Maximum number of batches to iterate over
|
||||
in an epoch. If `max_num_batches * batch_size > num_data` then
|
||||
only as many batches as the data can be split into will be
|
||||
used. If set to -1 all of the data will be used.
|
||||
shuffle_order (bool): Whether to randomly permute the order of
|
||||
the data before each epoch.
|
||||
rng (RandomState): A seeded random number generator.
|
||||
"""
|
||||
self.inputs = inputs
|
||||
self.targets = targets
|
||||
if batch_size < 1:
|
||||
raise ValueError('batch_size must be >= 1')
|
||||
self._batch_size = batch_size
|
||||
if max_num_batches == 0 or max_num_batches < -1:
|
||||
raise ValueError('max_num_batches must be -1 or > 0')
|
||||
self._max_num_batches = max_num_batches
|
||||
self._update_num_batches()
|
||||
self.shuffle_order = shuffle_order
|
||||
self._current_order = np.arange(inputs.shape[0])
|
||||
if rng is None:
|
||||
rng = np.random.RandomState(DEFAULT_SEED)
|
||||
self.rng = rng
|
||||
self.new_epoch()
|
||||
|
||||
@property
|
||||
def batch_size(self):
|
||||
"""Number of data points to include in each batch."""
|
||||
return self._batch_size
|
||||
|
||||
@batch_size.setter
|
||||
def batch_size(self, value):
|
||||
if value < 1:
|
||||
raise ValueError('batch_size must be >= 1')
|
||||
self._batch_size = value
|
||||
self._update_num_batches()
|
||||
|
||||
@property
|
||||
def max_num_batches(self):
|
||||
"""Maximum number of batches to iterate over in an epoch."""
|
||||
return self._max_num_batches
|
||||
|
||||
@max_num_batches.setter
|
||||
def max_num_batches(self, value):
|
||||
if value == 0 or value < -1:
|
||||
raise ValueError('max_num_batches must be -1 or > 0')
|
||||
self._max_num_batches = value
|
||||
self._update_num_batches()
|
||||
|
||||
def _update_num_batches(self):
|
||||
"""Updates number of batches to iterate over."""
|
||||
# maximum possible number of batches is equal to number of whole times
|
||||
# batch_size divides in to the number of data points which can be
|
||||
# found using integer division
|
||||
possible_num_batches = self.inputs.shape[0] // self.batch_size
|
||||
if self.max_num_batches == -1:
|
||||
self.num_batches = possible_num_batches
|
||||
else:
|
||||
self.num_batches = min(self.max_num_batches, possible_num_batches)
|
||||
|
||||
def __iter__(self):
|
||||
"""Implements Python iterator interface.
|
||||
|
||||
This should return an object implementing a `next` method which steps
|
||||
through a sequence returning one element at a time and raising
|
||||
`StopIteration` when at the end of the sequence. Here the object
|
||||
returned is the DataProvider itself.
|
||||
"""
|
||||
return self
|
||||
|
||||
def new_epoch(self):
|
||||
"""Starts a new epoch (pass through data), possibly shuffling first."""
|
||||
self._curr_batch = 0
|
||||
if self.shuffle_order:
|
||||
self.shuffle()
|
||||
|
||||
def __next__(self):
|
||||
return self.next()
|
||||
|
||||
def reset(self):
|
||||
"""Resets the provider to the initial state."""
|
||||
inv_perm = np.argsort(self._current_order)
|
||||
self._current_order = self._current_order[inv_perm]
|
||||
self.inputs = self.inputs[inv_perm]
|
||||
self.targets = self.targets[inv_perm]
|
||||
self.new_epoch()
|
||||
|
||||
def shuffle(self):
|
||||
"""Randomly shuffles order of data."""
|
||||
perm = self.rng.permutation(self.inputs.shape[0])
|
||||
self._current_order = self._current_order[perm]
|
||||
self.inputs = self.inputs[perm]
|
||||
self.targets = self.targets[perm]
|
||||
|
||||
def next(self):
|
||||
"""Returns next data batch or raises `StopIteration` if at end."""
|
||||
if self._curr_batch + 1 > self.num_batches:
|
||||
# no more batches in current iteration through data set so start
|
||||
# new epoch ready for another pass and indicate iteration is at end
|
||||
self.new_epoch()
|
||||
raise StopIteration()
|
||||
# create an index slice corresponding to current batch number
|
||||
batch_slice = slice(self._curr_batch * self.batch_size,
|
||||
(self._curr_batch + 1) * self.batch_size)
|
||||
inputs_batch = self.inputs[batch_slice]
|
||||
targets_batch = self.targets[batch_slice]
|
||||
self._curr_batch += 1
|
||||
return inputs_batch, targets_batch
|
||||
|
||||
class MNISTDataProvider(DataProvider):
|
||||
"""Data provider for MNIST handwritten digit images."""
|
||||
|
||||
def __init__(self, which_set='train', batch_size=100, max_num_batches=-1,
|
||||
shuffle_order=True, rng=None):
|
||||
"""Create a new MNIST data provider object.
|
||||
|
||||
Args:
|
||||
which_set: One of 'train', 'valid' or 'test'. Determines which
|
||||
portion of the MNIST data this object should provide.
|
||||
batch_size (int): Number of data points to include in each batch.
|
||||
max_num_batches (int): Maximum number of batches to iterate over
|
||||
in an epoch. If `max_num_batches * batch_size > num_data` then
|
||||
only as many batches as the data can be split into will be
|
||||
used. If set to -1 all of the data will be used.
|
||||
shuffle_order (bool): Whether to randomly permute the order of
|
||||
the data before each epoch.
|
||||
rng (RandomState): A seeded random number generator.
|
||||
"""
|
||||
# check a valid which_set was provided
|
||||
assert which_set in ['train', 'valid', 'test'], (
|
||||
'Expected which_set to be either train, valid or test. '
|
||||
'Got {0}'.format(which_set)
|
||||
)
|
||||
self.which_set = which_set
|
||||
self.num_classes = 10
|
||||
# construct path to data using os.path.join to ensure the correct path
|
||||
# separator for the current platform / OS is used
|
||||
# MLP_DATA_DIR environment variable should point to the data directory
|
||||
data_path = os.path.join(
|
||||
"data", 'mnist-{0}.npz'.format(which_set))
|
||||
assert os.path.isfile(data_path), (
|
||||
'Data file does not exist at expected path: ' + data_path
|
||||
)
|
||||
# load data from compressed numpy file
|
||||
loaded = np.load(data_path)
|
||||
inputs, targets = loaded['inputs'], loaded['targets']
|
||||
inputs = inputs.astype(np.float32)
|
||||
# pass the loaded data to the parent class __init__
|
||||
super(MNISTDataProvider, self).__init__(
|
||||
inputs, targets, batch_size, max_num_batches, shuffle_order, rng)
|
||||
|
||||
def next(self):
|
||||
"""Returns next data batch or raises `StopIteration` if at end."""
|
||||
inputs_batch, targets_batch = super(MNISTDataProvider, self).next()
|
||||
return inputs_batch, self.to_one_of_k(targets_batch)
|
||||
|
||||
def to_one_of_k(self, int_targets):
|
||||
"""Converts integer coded class target to 1 of K coded targets.
|
||||
|
||||
Args:
|
||||
int_targets (ndarray): Array of integer coded class targets (i.e.
|
||||
where an integer from 0 to `num_classes` - 1 is used to
|
||||
indicate which is the correct class). This should be of shape
|
||||
(num_data,).
|
||||
|
||||
Returns:
|
||||
Array of 1 of K coded targets i.e. an array of shape
|
||||
(num_data, num_classes) where for each row all elements are equal
|
||||
to zero except for the column corresponding to the correct class
|
||||
which is equal to one.
|
||||
"""
|
||||
one_of_k_targets = np.zeros((int_targets.shape[0], self.num_classes))
|
||||
one_of_k_targets[range(int_targets.shape[0]), int_targets] = 1
|
||||
return one_of_k_targets
|
||||
|
||||
class EMNISTDataProvider(DataProvider):
|
||||
"""Data provider for EMNIST handwritten digit images."""
|
||||
|
||||
def __init__(self, which_set='train', batch_size=100, max_num_batches=-1,
|
||||
shuffle_order=True, rng=None, flatten=False):
|
||||
"""Create a new EMNIST data provider object.
|
||||
|
||||
Args:
|
||||
which_set: One of 'train', 'valid' or 'test'. Determines which
|
||||
portion of the EMNIST data this object should provide.
|
||||
batch_size (int): Number of data points to include in each batch.
|
||||
max_num_batches (int): Maximum number of batches to iterate over
|
||||
in an epoch. If `max_num_batches * batch_size > num_data` then
|
||||
only as many batches as the data can be split into will be
|
||||
used. If set to -1 all of the data will be used.
|
||||
shuffle_order (bool): Whether to randomly permute the order of
|
||||
the data before each epoch.
|
||||
rng (RandomState): A seeded random number generator.
|
||||
"""
|
||||
# check a valid which_set was provided
|
||||
assert which_set in ['train', 'valid', 'test'], (
|
||||
'Expected which_set to be either train, valid or test. '
|
||||
'Got {0}'.format(which_set)
|
||||
)
|
||||
self.which_set = which_set
|
||||
self.num_classes = 47
|
||||
# construct path to data using os.path.join to ensure the correct path
|
||||
# separator for the current platform / OS is used
|
||||
# MLP_DATA_DIR environment variable should point to the data directory
|
||||
data_path = os.path.join(
|
||||
"data", 'emnist-{0}.npz'.format(which_set))
|
||||
assert os.path.isfile(data_path), (
|
||||
'Data file does not exist at expected path: ' + data_path
|
||||
)
|
||||
# load data from compressed numpy file
|
||||
loaded = np.load(data_path)
|
||||
print(loaded.keys())
|
||||
inputs, targets = loaded['inputs'], loaded['targets']
|
||||
inputs = inputs.astype(np.float32)
|
||||
if flatten:
|
||||
inputs = np.reshape(inputs, newshape=(-1, 28*28))
|
||||
else:
|
||||
inputs = np.reshape(inputs, newshape=(-1, 1, 28, 28))
|
||||
inputs = inputs / 255.0
|
||||
# pass the loaded data to the parent class __init__
|
||||
super(EMNISTDataProvider, self).__init__(
|
||||
inputs, targets, batch_size, max_num_batches, shuffle_order, rng)
|
||||
|
||||
def __len__(self):
|
||||
return self.num_batches
|
||||
|
||||
def next(self):
|
||||
"""Returns next data batch or raises `StopIteration` if at end."""
|
||||
inputs_batch, targets_batch = super(EMNISTDataProvider, self).next()
|
||||
return inputs_batch, self.to_one_of_k(targets_batch)
|
||||
|
||||
def to_one_of_k(self, int_targets):
|
||||
"""Converts integer coded class target to 1 of K coded targets.
|
||||
|
||||
Args:
|
||||
int_targets (ndarray): Array of integer coded class targets (i.e.
|
||||
where an integer from 0 to `num_classes` - 1 is used to
|
||||
indicate which is the correct class). This should be of shape
|
||||
(num_data,).
|
||||
|
||||
Returns:
|
||||
Array of 1 of K coded targets i.e. an array of shape
|
||||
(num_data, num_classes) where for each row all elements are equal
|
||||
to zero except for the column corresponding to the correct class
|
||||
which is equal to one.
|
||||
"""
|
||||
one_of_k_targets = np.zeros((int_targets.shape[0], self.num_classes))
|
||||
one_of_k_targets[range(int_targets.shape[0]), int_targets] = 1
|
||||
return one_of_k_targets
|
||||
|
||||
|
||||
class MetOfficeDataProvider(DataProvider):
|
||||
"""South Scotland Met Office weather data provider."""
|
||||
|
||||
def __init__(self, window_size, batch_size=10, max_num_batches=-1,
|
||||
shuffle_order=True, rng=None):
|
||||
"""Create a new Met Office data provider object.
|
||||
|
||||
Args:
|
||||
window_size (int): Size of windows to split weather time series
|
||||
data into. The constructed input features will be the first
|
||||
`window_size - 1` entries in each window and the target outputs
|
||||
the last entry in each window.
|
||||
batch_size (int): Number of data points to include in each batch.
|
||||
max_num_batches (int): Maximum number of batches to iterate over
|
||||
in an epoch. If `max_num_batches * batch_size > num_data` then
|
||||
only as many batches as the data can be split into will be
|
||||
used. If set to -1 all of the data will be used.
|
||||
shuffle_order (bool): Whether to randomly permute the order of
|
||||
the data before each epoch.
|
||||
rng (RandomState): A seeded random number generator.
|
||||
"""
|
||||
data_path = os.path.join(
|
||||
os.environ['DATASET_DIR'], 'HadSSP_daily_qc.txt')
|
||||
assert os.path.isfile(data_path), (
|
||||
'Data file does not exist at expected path: ' + data_path
|
||||
)
|
||||
raw = np.loadtxt(data_path, skiprows=3, usecols=range(2, 32))
|
||||
assert window_size > 1, 'window_size must be at least 2.'
|
||||
self.window_size = window_size
|
||||
# filter out all missing datapoints and flatten to a vector
|
||||
filtered = raw[raw >= 0].flatten()
|
||||
# normalise data to zero mean, unit standard deviation
|
||||
mean = np.mean(filtered)
|
||||
std = np.std(filtered)
|
||||
normalised = (filtered - mean) / std
|
||||
# create a view on to array corresponding to a rolling window
|
||||
shape = (normalised.shape[-1] - self.window_size + 1, self.window_size)
|
||||
strides = normalised.strides + (normalised.strides[-1],)
|
||||
windowed = np.lib.stride_tricks.as_strided(
|
||||
normalised, shape=shape, strides=strides)
|
||||
# inputs are first (window_size - 1) entries in windows
|
||||
inputs = windowed[:, :-1]
|
||||
# targets are last entry in windows
|
||||
targets = windowed[:, -1]
|
||||
super(MetOfficeDataProvider, self).__init__(
|
||||
inputs, targets, batch_size, max_num_batches, shuffle_order, rng)
|
||||
|
||||
class CCPPDataProvider(DataProvider):
|
||||
|
||||
def __init__(self, which_set='train', input_dims=None, batch_size=10,
|
||||
max_num_batches=-1, shuffle_order=True, rng=None):
|
||||
"""Create a new Combined Cycle Power Plant data provider object.
|
||||
|
||||
Args:
|
||||
which_set: One of 'train' or 'valid'. Determines which portion of
|
||||
data this object should provide.
|
||||
input_dims: Which of the four input dimension to use. If `None` all
|
||||
are used. If an iterable of integers are provided (consisting
|
||||
of a subset of {0, 1, 2, 3}) then only the corresponding
|
||||
input dimensions are included.
|
||||
batch_size (int): Number of data points to include in each batch.
|
||||
max_num_batches (int): Maximum number of batches to iterate over
|
||||
in an epoch. If `max_num_batches * batch_size > num_data` then
|
||||
only as many batches as the data can be split into will be
|
||||
used. If set to -1 all of the data will be used.
|
||||
shuffle_order (bool): Whether to randomly permute the order of
|
||||
the data before each epoch.
|
||||
rng (RandomState): A seeded random number generator.
|
||||
"""
|
||||
data_path = os.path.join(
|
||||
os.environ['DATASET_DIR'], 'ccpp_data.npz')
|
||||
assert os.path.isfile(data_path), (
|
||||
'Data file does not exist at expected path: ' + data_path
|
||||
)
|
||||
# check a valid which_set was provided
|
||||
assert which_set in ['train', 'valid'], (
|
||||
'Expected which_set to be either train or valid '
|
||||
'Got {0}'.format(which_set)
|
||||
)
|
||||
# check input_dims are valid
|
||||
if input_dims is not None:
|
||||
input_dims = set(input_dims)
|
||||
assert input_dims.issubset({0, 1, 2, 3}), (
|
||||
'input_dims should be a subset of {0, 1, 2, 3}'
|
||||
)
|
||||
loaded = np.load(data_path)
|
||||
inputs = loaded[which_set + '_inputs']
|
||||
if input_dims is not None:
|
||||
inputs = inputs[:, input_dims]
|
||||
targets = loaded[which_set + '_targets']
|
||||
super(CCPPDataProvider, self).__init__(
|
||||
inputs, targets, batch_size, max_num_batches, shuffle_order, rng)
|
||||
|
||||
|
||||
class AugmentedMNISTDataProvider(MNISTDataProvider):
|
||||
"""Data provider for MNIST dataset which randomly transforms images."""
|
||||
|
||||
def __init__(self, which_set='train', batch_size=100, max_num_batches=-1,
|
||||
shuffle_order=True, rng=None, transformer=None):
|
||||
"""Create a new augmented MNIST data provider object.
|
||||
|
||||
Args:
|
||||
which_set: One of 'train', 'valid' or 'test'. Determines which
|
||||
portion of the MNIST data this object should provide.
|
||||
batch_size (int): Number of data points to include in each batch.
|
||||
max_num_batches (int): Maximum number of batches to iterate over
|
||||
in an epoch. If `max_num_batches * batch_size > num_data` then
|
||||
only as many batches as the data can be split into will be
|
||||
used. If set to -1 all of the data will be used.
|
||||
shuffle_order (bool): Whether to randomly permute the order of
|
||||
the data before each epoch.
|
||||
rng (RandomState): A seeded random number generator.
|
||||
transformer: Function which takes an `inputs` array of shape
|
||||
(batch_size, input_dim) corresponding to a batch of input
|
||||
images and a `rng` random number generator object (i.e. a
|
||||
call signature `transformer(inputs, rng)`) and applies a
|
||||
potentially random set of transformations to some / all of the
|
||||
input images as each new batch is returned when iterating over
|
||||
the data provider.
|
||||
"""
|
||||
super(AugmentedMNISTDataProvider, self).__init__(
|
||||
which_set, batch_size, max_num_batches, shuffle_order, rng)
|
||||
self.transformer = transformer
|
||||
|
||||
def next(self):
|
||||
"""Returns next data batch or raises `StopIteration` if at end."""
|
||||
inputs_batch, targets_batch = super(
|
||||
AugmentedMNISTDataProvider, self).next()
|
||||
transformed_inputs_batch = self.transformer(inputs_batch, self.rng)
|
||||
return transformed_inputs_batch, targets_batch
|
||||
|
||||
|
||||
|
||||
|
||||
class CIFAR10(data.Dataset):
|
||||
"""`CIFAR10 <https://www.cs.toronto.edu/~kriz/cifar.html>`_ Dataset.
|
||||
|
||||
Args:
|
||||
root (string): Root directory of dataset where directory
|
||||
``cifar-10-batches-py`` exists or will be saved to if download is set to True.
|
||||
train (bool, optional): If True, creates dataset from training set, otherwise
|
||||
creates from test set.
|
||||
transform (callable, optional): A function/transform that takes in an PIL image
|
||||
and returns a transformed version. E.g, ``transforms.RandomCrop``
|
||||
target_transform (callable, optional): A function/transform that takes in the
|
||||
target and transforms it.
|
||||
download (bool, optional): If true, downloads the dataset from the internet and
|
||||
puts it in root directory. If dataset is already downloaded, it is not
|
||||
downloaded again.
|
||||
|
||||
"""
|
||||
base_folder = 'cifar-10-batches-py'
|
||||
url = "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
|
||||
filename = "cifar-10-python.tar.gz"
|
||||
tgz_md5 = 'c58f30108f718f92721af3b95e74349a'
|
||||
train_list = [
|
||||
['data_batch_1', 'c99cafc152244af753f735de768cd75f'],
|
||||
['data_batch_2', 'd4bba439e000b95fd0a9bffe97cbabec'],
|
||||
['data_batch_3', '54ebc095f3ab1f0389bbae665268c751'],
|
||||
['data_batch_4', '634d18415352ddfa80567beed471001a'],
|
||||
['data_batch_5', '482c414d41f54cd18b22e5b47cb7c3cb'],
|
||||
]
|
||||
|
||||
test_list = [
|
||||
['test_batch', '40351d587109b95175f43aff81a1287e'],
|
||||
]
|
||||
|
||||
def __init__(self, root, set_name,
|
||||
transform=None, target_transform=None,
|
||||
download=False):
|
||||
self.root = os.path.expanduser(root)
|
||||
self.transform = transform
|
||||
self.target_transform = target_transform
|
||||
self.set_name = set_name # training set or test set
|
||||
|
||||
if download:
|
||||
self.download()
|
||||
|
||||
if not self._check_integrity():
|
||||
raise RuntimeError('Dataset not found or corrupted.' +
|
||||
' You can use download=True to download it')
|
||||
|
||||
# now load the picked numpy arrays
|
||||
rng = np.random.RandomState(seed=0)
|
||||
|
||||
train_sample_idx = rng.choice(a=[i for i in range(50000)], size=47500, replace=False)
|
||||
val_sample_idx = [i for i in range(50000) if i not in train_sample_idx]
|
||||
|
||||
if self.set_name=='train':
|
||||
self.data = []
|
||||
self.labels = []
|
||||
for fentry in self.train_list:
|
||||
f = fentry[0]
|
||||
file = os.path.join(self.root, self.base_folder, f)
|
||||
fo = open(file, 'rb')
|
||||
if sys.version_info[0] == 2:
|
||||
entry = pickle.load(fo)
|
||||
else:
|
||||
entry = pickle.load(fo, encoding='latin1')
|
||||
self.data.append(entry['data'])
|
||||
if 'labels' in entry:
|
||||
self.labels += entry['labels']
|
||||
else:
|
||||
self.labels += entry['fine_labels']
|
||||
fo.close()
|
||||
|
||||
self.data = np.concatenate(self.data)
|
||||
|
||||
self.data = self.data.reshape((50000, 3, 32, 32))
|
||||
self.data = self.data.transpose((0, 2, 3, 1)) # convert to HWC
|
||||
self.data = self.data[train_sample_idx]
|
||||
self.labels = np.array(self.labels)[train_sample_idx]
|
||||
print(set_name, self.data.shape)
|
||||
print(set_name, self.labels.shape)
|
||||
|
||||
elif self.set_name=='val':
|
||||
self.data = []
|
||||
self.labels = []
|
||||
for fentry in self.train_list:
|
||||
f = fentry[0]
|
||||
file = os.path.join(self.root, self.base_folder, f)
|
||||
fo = open(file, 'rb')
|
||||
if sys.version_info[0] == 2:
|
||||
entry = pickle.load(fo)
|
||||
else:
|
||||
entry = pickle.load(fo, encoding='latin1')
|
||||
self.data.append(entry['data'])
|
||||
if 'labels' in entry:
|
||||
self.labels += entry['labels']
|
||||
else:
|
||||
self.labels += entry['fine_labels']
|
||||
fo.close()
|
||||
|
||||
self.data = np.concatenate(self.data)
|
||||
self.data = self.data.reshape((50000, 3, 32, 32))
|
||||
self.data = self.data.transpose((0, 2, 3, 1)) # convert to HWC
|
||||
self.data = self.data[val_sample_idx]
|
||||
self.labels = np.array(self.labels)[val_sample_idx]
|
||||
print(set_name, self.data.shape)
|
||||
print(set_name, self.labels.shape)
|
||||
|
||||
else:
|
||||
f = self.test_list[0][0]
|
||||
file = os.path.join(self.root, self.base_folder, f)
|
||||
fo = open(file, 'rb')
|
||||
if sys.version_info[0] == 2:
|
||||
entry = pickle.load(fo)
|
||||
else:
|
||||
entry = pickle.load(fo, encoding='latin1')
|
||||
self.data = entry['data']
|
||||
if 'labels' in entry:
|
||||
self.labels = entry['labels']
|
||||
else:
|
||||
self.labels = entry['fine_labels']
|
||||
fo.close()
|
||||
self.data = self.data.reshape((10000, 3, 32, 32))
|
||||
self.data = self.data.transpose((0, 2, 3, 1)) # convert to HWC
|
||||
self.labels = np.array(self.labels)
|
||||
print(set_name, self.data.shape)
|
||||
print(set_name, self.labels.shape)
|
||||
|
||||
def __getitem__(self, index):
|
||||
"""
|
||||
Args:
|
||||
index (int): Index
|
||||
|
||||
Returns:
|
||||
tuple: (image, target) where target is index of the target class.
|
||||
"""
|
||||
img, target = self.data[index], self.labels[index]
|
||||
|
||||
# doing this so that it is consistent with all other datasets
|
||||
# to return a PIL Image
|
||||
|
||||
img = Image.fromarray(img)
|
||||
|
||||
if self.transform is not None:
|
||||
img = self.transform(img)
|
||||
|
||||
if self.target_transform is not None:
|
||||
target = self.target_transform(target)
|
||||
|
||||
return img, target
|
||||
|
||||
def __len__(self):
|
||||
return len(self.data)
|
||||
|
||||
def _check_integrity(self):
|
||||
root = self.root
|
||||
for fentry in (self.train_list + self.test_list):
|
||||
filename, md5 = fentry[0], fentry[1]
|
||||
fpath = os.path.join(root, self.base_folder, filename)
|
||||
if not check_integrity(fpath, md5):
|
||||
return False
|
||||
return True
|
||||
|
||||
def download(self):
|
||||
import tarfile
|
||||
|
||||
if self._check_integrity():
|
||||
print('Files already downloaded and verified')
|
||||
return
|
||||
|
||||
root = self.root
|
||||
download_url(self.url, root, self.filename, self.tgz_md5)
|
||||
|
||||
# extract file
|
||||
cwd = os.getcwd()
|
||||
tar = tarfile.open(os.path.join(root, self.filename), "r:gz")
|
||||
os.chdir(root)
|
||||
tar.extractall()
|
||||
tar.close()
|
||||
os.chdir(cwd)
|
||||
|
||||
def __repr__(self):
|
||||
fmt_str = 'Dataset ' + self.__class__.__name__ + '\n'
|
||||
fmt_str += ' Number of datapoints: {}\n'.format(self.__len__())
|
||||
tmp = self.set_name
|
||||
fmt_str += ' Split: {}\n'.format(tmp)
|
||||
fmt_str += ' Root Location: {}\n'.format(self.root)
|
||||
tmp = ' Transforms (if any): '
|
||||
fmt_str += '{0}{1}\n'.format(tmp, self.transform.__repr__().replace('\n', '\n' + ' ' * len(tmp)))
|
||||
tmp = ' Target Transforms (if any): '
|
||||
fmt_str += '{0}{1}'.format(tmp, self.target_transform.__repr__().replace('\n', '\n' + ' ' * len(tmp)))
|
||||
return fmt_str
|
||||
|
||||
|
||||
class CIFAR100(CIFAR10):
|
||||
"""`CIFAR100 <https://www.cs.toronto.edu/~kriz/cifar.html>`_ Dataset.
|
||||
|
||||
This is a subclass of the `CIFAR10` Dataset.
|
||||
"""
|
||||
base_folder = 'cifar-100-python'
|
||||
url = "https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz"
|
||||
filename = "cifar-100-python.tar.gz"
|
||||
tgz_md5 = 'eb9058c3a382ffc7106e4002c42a8d85'
|
||||
train_list = [
|
||||
['train', '16019d7e3df5f24257cddd939b257f8d'],
|
||||
]
|
||||
|
||||
test_list = [
|
||||
['test', 'f0ef6b0ae62326f3e7ffdfab6717acfc'],
|
||||
]
|
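A minimal sketch of iterating over the EMNIST provider defined above; it assumes the script is run from the repository root so that the committed `data/emnist-train.npz` file is found.

from data_providers import EMNISTDataProvider

train_data = EMNISTDataProvider('train', batch_size=100)
for inputs, targets in train_data:
    # inputs: (100, 1, 28, 28) float32 scaled to [0, 1]; targets: (100, 47) one-hot
    print(inputs.shape, targets.shape)
    break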
306
experiment_builder.py
Normal file
@ -0,0 +1,306 @@
|
||||
import sys
|
||||
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
import torch.optim as optim
|
||||
import torch.nn.functional as F
|
||||
import tqdm
|
||||
import os
|
||||
import numpy as np
|
||||
import time
|
||||
|
||||
from torch.optim.adam import Adam
|
||||
|
||||
from storage_utils import save_statistics
|
||||
|
||||
class ExperimentBuilder(nn.Module):
|
||||
def __init__(self, network_model, experiment_name, num_epochs, train_data, val_data,
|
||||
test_data, weight_decay_coefficient, use_gpu, continue_from_epoch=-1):
|
||||
"""
|
||||
Initializes an ExperimentBuilder object. Such an object takes care of running training and evaluation of a deep net
|
||||
on a given dataset. It also takes care of saving per epoch models and automatically inferring the best val model
|
||||
to be used for evaluating the test set metrics.
|
||||
:param network_model: A pytorch nn.Module which implements a network architecture.
|
||||
:param experiment_name: The name of the experiment. This is used mainly for keeping track of the experiment and creating a directory structure that will be used to save logs, model parameters and other outputs.
|
||||
:param num_epochs: Total number of epochs to run the experiment
|
||||
:param train_data: An object of the DataProvider type. Contains the training set.
|
||||
:param val_data: An object of the DataProvider type. Contains the val set.
|
||||
:param test_data: An object of the DataProvider type. Contains the test set.
|
||||
:param weight_decay_coefficient: A float indicating the weight decay to use with the adam optimizer.
|
||||
:param use_gpu: A boolean indicating whether to use a GPU or not.
|
||||
:param continue_from_epoch: An int indicating whether we'll start from scratch (-1) or whether we'll reload a previously saved model of epoch 'continue_from_epoch' and continue training from there.
|
||||
"""
|
||||
super(ExperimentBuilder, self).__init__()
|
||||
|
||||
self.experiment_name = experiment_name
|
||||
self.model = network_model
|
||||
self.model.reset_parameters()
|
||||
self.device = torch.cuda.current_device()
|
||||
|
||||
if torch.cuda.device_count() > 1 and use_gpu:
|
||||
self.device = torch.cuda.current_device()
|
||||
self.model.to(self.device)
|
||||
self.model = nn.DataParallel(module=self.model)
|
||||
print('Use Multi GPU', self.device)
|
||||
elif torch.cuda.device_count() == 1 and use_gpu:
|
||||
self.device = torch.cuda.current_device()
|
||||
self.model.to(self.device) # sends the model from the cpu to the gpu
|
||||
print('Use GPU', self.device)
|
||||
else:
|
||||
print("use CPU")
|
||||
self.device = torch.device('cpu') # sets the device to be CPU
|
||||
print(self.device)
|
||||
|
||||
# re-initialize network parameters
|
||||
self.train_data = train_data
|
||||
self.val_data = val_data
|
||||
self.test_data = test_data
|
||||
self.optimizer = Adam(self.parameters(), amsgrad=False,
|
||||
weight_decay=weight_decay_coefficient)
|
||||
|
||||
print('System learnable parameters')
|
||||
num_conv_layers = 0
|
||||
num_linear_layers = 0
|
||||
total_num_parameters = 0
|
||||
for name, value in self.named_parameters():
|
||||
print(name, value.shape)
|
||||
if all(item in name for item in ['conv', 'weight']):
|
||||
num_conv_layers += 1
|
||||
if all(item in name for item in ['linear', 'weight']):
|
||||
num_linear_layers += 1
|
||||
total_num_parameters += np.prod(value.shape)
|
||||
|
||||
print('Total number of parameters', total_num_parameters)
|
||||
print('Total number of conv layers', num_conv_layers)
|
||||
print('Total number of linear layers', num_linear_layers)
|
||||
|
||||
# Generate the directory names
|
||||
self.experiment_folder = os.path.abspath(experiment_name)
|
||||
self.experiment_logs = os.path.abspath(os.path.join(self.experiment_folder, "result_outputs"))
|
||||
self.experiment_saved_models = os.path.abspath(os.path.join(self.experiment_folder, "saved_models"))
|
||||
print(self.experiment_folder, self.experiment_logs)
|
||||
# Set best models to be at 0 since we are just starting
|
||||
self.best_val_model_idx = 0
|
||||
self.best_val_model_acc = 0.
|
||||
|
||||
if not os.path.exists(self.experiment_folder): # If experiment directory does not exist
|
||||
os.mkdir(self.experiment_folder) # create the experiment directory
|
||||
|
||||
if not os.path.exists(self.experiment_logs):
|
||||
os.mkdir(self.experiment_logs) # create the experiment log directory
|
||||
|
||||
if not os.path.exists(self.experiment_saved_models):
|
||||
os.mkdir(self.experiment_saved_models) # create the experiment saved models directory
|
||||
|
||||
self.num_epochs = num_epochs
|
||||
self.criterion = nn.CrossEntropyLoss().to(self.device) # send the loss computation to the GPU
|
||||
if continue_from_epoch == -2:
|
||||
try:
|
||||
self.best_val_model_idx, self.best_val_model_acc, self.state = self.load_model(
|
||||
model_save_dir=self.experiment_saved_models, model_save_name="train_model",
|
||||
model_idx='latest') # reload existing model from epoch and return best val model index
|
||||
# and the best val acc of that model
|
||||
self.starting_epoch = self.state['current_epoch_idx']
|
||||
except:
|
||||
print("Model objects cannot be found, initializing a new model and starting from scratch")
|
||||
self.starting_epoch = 0
|
||||
self.state = dict()
|
||||
|
||||
elif continue_from_epoch != -1: # if continue from epoch is not -1 then
|
||||
self.best_val_model_idx, self.best_val_model_acc, self.state = self.load_model(
|
||||
model_save_dir=self.experiment_saved_models, model_save_name="train_model",
|
||||
model_idx=continue_from_epoch) # reload existing model from epoch and return best val model index
|
||||
# and the best val acc of that model
|
||||
self.starting_epoch = self.state['current_epoch_idx']
|
||||
else:
|
||||
self.starting_epoch = 0
|
||||
self.state = dict()
|
||||
|
||||
def get_num_parameters(self):
|
||||
total_num_params = 0
|
||||
for param in self.parameters():
|
||||
total_num_params += np.prod(param.shape)
|
||||
|
||||
return total_num_params
|
||||
|
||||
def run_train_iter(self, x, y):
|
||||
"""
|
||||
Receives the inputs and targets for the model and runs a training iteration. Returns loss and accuracy metrics.
|
||||
:param x: The inputs to the model. A numpy array of shape batch_size, channels, height, width
|
||||
:param y: The targets for the model. A numpy array of shape batch_size, num_classes
|
||||
:return: the loss and accuracy for this batch
|
||||
"""
|
||||
self.train() # sets model to training mode (in case batch normalization or other methods have different procedures for training and evaluation)
|
||||
|
||||
if len(y.shape) > 1:
|
||||
y = np.argmax(y, axis=1) # convert one hot encoded labels to single integer labels
|
||||
|
||||
#print(type(x))
|
||||
|
||||
if type(x) is np.ndarray:
|
||||
x, y = torch.Tensor(x).float().to(device=self.device), torch.Tensor(y).long().to(
|
||||
device=self.device) # send data to device as torch tensors
|
||||
|
||||
x = x.to(self.device)
|
||||
y = y.to(self.device)
|
||||
|
||||
out = self.model.forward(x) # forward the data in the model
|
||||
loss = F.cross_entropy(input=out, target=y) # compute loss
|
||||
|
||||
self.optimizer.zero_grad() # set all weight grads from previous training iters to 0
|
||||
loss.backward() # backpropagate to compute gradients for current iter loss
|
||||
|
||||
self.optimizer.step() # update network parameters
|
||||
_, predicted = torch.max(out.data, 1) # get argmax of predictions
|
||||
accuracy = np.mean(list(predicted.eq(y.data).cpu())) # compute accuracy
|
||||
return loss.data.detach().cpu().numpy(), accuracy
|
||||
|
||||
def run_evaluation_iter(self, x, y):
|
||||
"""
|
||||
Receives the inputs and targets for the model and runs an evaluation iteration. Returns loss and accuracy metrics.
|
||||
:param x: The inputs to the model. A numpy array of shape batch_size, channels, height, width
|
||||
:param y: The targets for the model. A numpy array of shape batch_size, num_classes
|
||||
:return: the loss and accuracy for this batch
|
||||
"""
|
||||
self.eval() # sets the system to validation mode
|
||||
if len(y.shape) > 1:
|
||||
y = np.argmax(y, axis=1) # convert one hot encoded labels to single integer labels
|
||||
if type(x) is np.ndarray:
|
||||
x, y = torch.Tensor(x).float().to(device=self.device), torch.Tensor(y).long().to(
|
||||
device=self.device) # convert data to pytorch tensors and send to the computation device
|
||||
|
||||
x = x.to(self.device)
|
||||
y = y.to(self.device)
|
||||
out = self.model.forward(x) # forward the data in the model
|
||||
loss = F.cross_entropy(out, y) # compute loss
|
||||
_, predicted = torch.max(out.data, 1) # get argmax of predictions
|
||||
accuracy = np.mean(list(predicted.eq(y.data).cpu())) # compute accuracy
|
||||
return loss.data.detach().cpu().numpy(), accuracy
|
||||
|
||||
def save_model(self, model_save_dir, model_save_name, model_idx, state):
|
||||
"""
|
||||
Save the network parameter state and current best val epoch idx and best val accuracy.
|
||||
:param model_save_name: Name to use to save model without the epoch index
|
||||
:param model_idx: The index to save the model with.
|
||||
:param best_validation_model_idx: The index of the best validation model to be stored for future use.
|
||||
:param best_validation_model_acc: The best validation accuracy to be stored for use at test time.
|
||||
:param model_save_dir: The directory to store the state at.
|
||||
:param state: The dictionary containing the system state.
|
||||
|
||||
"""
|
||||
state['network'] = self.state_dict() # save network parameter and other variables.
|
||||
torch.save(state, f=os.path.join(model_save_dir, "{}_{}".format(model_save_name, str(
|
||||
model_idx)))) # save state at prespecified filepath
|
||||
|
||||
    def run_training_epoch(self, current_epoch_losses):
        with tqdm.tqdm(total=len(self.train_data), file=sys.stdout) as pbar_train:  # create a progress bar for training
            for idx, (x, y) in enumerate(self.train_data):  # get data batches
                loss, accuracy = self.run_train_iter(x=x, y=y)  # take a training iter step
                current_epoch_losses["train_loss"].append(loss)  # add current iter loss to the train loss list
                current_epoch_losses["train_acc"].append(accuracy)  # add current iter acc to the train acc list
                pbar_train.update(1)
                pbar_train.set_description("loss: {:.4f}, accuracy: {:.4f}".format(loss, accuracy))

        return current_epoch_losses
    def run_validation_epoch(self, current_epoch_losses):

        with tqdm.tqdm(total=len(self.val_data), file=sys.stdout) as pbar_val:  # create a progress bar for validation
            for x, y in self.val_data:  # get data batches
                loss, accuracy = self.run_evaluation_iter(x=x, y=y)  # run a validation iter
                current_epoch_losses["val_loss"].append(loss)  # add current iter loss to val loss list.
                current_epoch_losses["val_acc"].append(accuracy)  # add current iter acc to val acc lst.
                pbar_val.update(1)  # add 1 step to the progress bar
                pbar_val.set_description("loss: {:.4f}, accuracy: {:.4f}".format(loss, accuracy))

        return current_epoch_losses
    def run_testing_epoch(self, current_epoch_losses):

        with tqdm.tqdm(total=len(self.test_data), file=sys.stdout) as pbar_test:  # init a progress bar
            for x, y in self.test_data:  # sample batch
                loss, accuracy = self.run_evaluation_iter(x=x, y=y)  # compute loss and accuracy by running an evaluation step
                current_epoch_losses["test_loss"].append(loss)  # save test loss
                current_epoch_losses["test_acc"].append(accuracy)  # save test accuracy
                pbar_test.update(1)  # update progress bar status
                pbar_test.set_description(
                    "loss: {:.4f}, accuracy: {:.4f}".format(loss, accuracy))  # update progress bar string output
        return current_epoch_losses
    def load_model(self, model_save_dir, model_save_name, model_idx):
        """
        Load the network parameter state and the best val model idx and best val acc, to be compared with future val accuracies in order to choose the best val model.
        :param model_save_dir: The directory the state is stored in.
        :param model_save_name: Name used to save the model, without the epoch index.
        :param model_idx: The index of the saved model to load.
        :return: best val model idx and best val model acc; the network state is loaded into the system in place rather than returned.
        """
        state = torch.load(f=os.path.join(model_save_dir, "{}_{}".format(model_save_name, str(model_idx))))
        self.load_state_dict(state_dict=state['network'])
        return state['best_val_model_idx'], state['best_val_model_acc'], state
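    # A minimal sketch (not part of the original class) of how the checkpoints written
    # by save_model could be used to resume a run: the names mirror the
    # "train_model"/'latest' convention used in run_experiment below, and `builder` is
    # assumed to be an already-constructed ExperimentBuilder.
    # best_idx, best_acc, state = builder.load_model(
    #     model_save_dir=builder.experiment_saved_models,
    #     model_save_name="train_model", model_idx='latest')
    # builder.starting_epoch = state['current_epoch_idx'] + 1  # hypothetical resume point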
    def run_experiment(self):
        """
        Runs experiment training and evaluation iterations, saving the model, the best val model and the best val model accuracy after each epoch.
        :return: The summary current_epoch_losses from starting epoch to total_epochs.
        """
        total_losses = {"train_acc": [], "train_loss": [], "val_acc": [],
                        "val_loss": [], "curr_epoch": []}  # initialize a dict to keep the per-epoch metrics
        for i, epoch_idx in enumerate(range(self.starting_epoch, self.num_epochs)):
            epoch_start_time = time.time()
            current_epoch_losses = {"train_acc": [], "train_loss": [], "val_acc": [], "val_loss": []}

            current_epoch_losses = self.run_training_epoch(current_epoch_losses)
            current_epoch_losses = self.run_validation_epoch(current_epoch_losses)

            val_mean_accuracy = np.mean(current_epoch_losses['val_acc'])
            if val_mean_accuracy > self.best_val_model_acc:  # if current epoch's mean val acc is greater than the saved best val acc then
                self.best_val_model_acc = val_mean_accuracy  # set the best val model acc to be current epoch's val accuracy
                self.best_val_model_idx = epoch_idx  # set the experiment-wise best val idx to be the current epoch's idx

            for key, value in current_epoch_losses.items():
                total_losses[key].append(np.mean(value))
                # get mean of all metrics of current epoch metrics dict,
                # to get them ready for storage and output on the terminal.

            total_losses['curr_epoch'].append(epoch_idx)
            save_statistics(experiment_log_dir=self.experiment_logs, filename='summary.csv',
                            stats_dict=total_losses, current_epoch=i,
                            continue_from_mode=True if (self.starting_epoch != 0 or i > 0) else False)  # save statistics to stats file.

            # load_statistics(experiment_log_dir=self.experiment_logs, filename='summary.csv')  # How to load a csv file if you need to

            out_string = "_".join(
                ["{}_{:.4f}".format(key, np.mean(value)) for key, value in current_epoch_losses.items()])
            # create a string to use to report our epoch metrics
            epoch_elapsed_time = time.time() - epoch_start_time  # calculate time taken for epoch
            epoch_elapsed_time = "{:.4f}".format(epoch_elapsed_time)
            print("Epoch {}:".format(epoch_idx), out_string, "epoch time", epoch_elapsed_time, "seconds")
            self.state['current_epoch_idx'] = epoch_idx
            self.state['best_val_model_acc'] = self.best_val_model_acc
            self.state['best_val_model_idx'] = self.best_val_model_idx
            self.save_model(model_save_dir=self.experiment_saved_models,
                            # save model and best val idx and best val acc, using the model dir, model name and model idx
                            model_save_name="train_model", model_idx=epoch_idx, state=self.state)
            self.save_model(model_save_dir=self.experiment_saved_models,
                            # save model and best val idx and best val acc, using the model dir, model name and model idx
                            model_save_name="train_model", model_idx='latest', state=self.state)

        print("Generating test set evaluation metrics")
        self.load_model(model_save_dir=self.experiment_saved_models, model_idx=self.best_val_model_idx,
                        # load best validation model
                        model_save_name="train_model")
        current_epoch_losses = {"test_acc": [], "test_loss": []}  # initialize a statistics dict

        current_epoch_losses = self.run_testing_epoch(current_epoch_losses=current_epoch_losses)

        test_losses = {key: [np.mean(value)] for key, value in
                       current_epoch_losses.items()}  # save test set metrics in dict format

        save_statistics(experiment_log_dir=self.experiment_logs, filename='test_summary.csv',
                        # save test set metrics on disk in .csv format
                        stats_dict=test_losses, current_epoch=0, continue_from_mode=False)

        return total_losses, test_losses
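A minimal sketch (not part of the repository; it assumes the path points at the summary.csv inside the experiment's logs directory and that save_statistics writes the stats_dict keys as a CSV header row) of inspecting the per-epoch metrics that run_experiment logs:

import csv

with open('summary.csv') as f:
    for row in csv.DictReader(f):
        print(row['curr_epoch'], row['train_acc'], row['val_acc'])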
16
experiment_configs/cifar10_tutorial_config.json
Normal file
@@ -0,0 +1,16 @@
{
  "batch_size": 100,
  "dataset_name": "cifar10",
  "continue_from_epoch": -2,
  "seed": 0,
  "image_num_channels": 3,
  "image_height": 32,
  "image_width": 32,
  "dim_reduction_type": "avg_pooling",
  "num_layers": 4,
  "num_filters": 64,
  "num_epochs": 250,
  "experiment_name": "cifar10_tutorial",
  "use_gpu": true,
  "weight_decay_coefficient": 1e-05
}
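A minimal sketch (hypothetical, not the repository's own loader) of how a config such as the one above could be read into an attribute-style arguments object; the launcher scripts below instead pass the file path via --filepath_to_arguments_json_file:

import json
from argparse import Namespace

with open("experiment_configs/cifar10_tutorial_config.json") as f:
    args = Namespace(**json.load(f))
print(args.batch_size, args.num_epochs, args.dim_reduction_type)  # 100 250 avg_pooling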
16
experiment_configs/emnist_tutorial_config.json
Normal file
@@ -0,0 +1,16 @@
{
  "batch_size": 100,
  "dataset_name": "emnist",
  "continue_from_epoch": -2,
  "seed": 0,
  "image_num_channels": 1,
  "image_height": 28,
  "image_width": 28,
  "dim_reduction_type": "avg_pooling",
  "num_layers": 4,
  "num_filters": 32,
  "num_epochs": 250,
  "experiment_name": "emnist_tutorial",
  "use_gpu": true,
  "weight_decay_coefficient": 1e-05
}
6
install.sh
Normal file
@@ -0,0 +1,6 @@
conda install -c conda-forge opencv
conda install numpy scipy matplotlib
conda install -c conda-forge pbzip2 pydrive
conda install pillow tqdm
pip install GPUtil
conda install pytorch torchvision cudatoolkit=9.0 -c pytorch
12
local_experiment_scripts/cifar100_arg_parsing_template.sh
Normal file
@@ -0,0 +1,12 @@
#!/bin/sh

cd ..
export DATASET_DIR="data/"
# Activate the relevant virtual environment:

python train_evaluate_emnist_classification_system.py --batch_size 100 --continue_from_epoch -1 --seed 0 \
    --image_num_channels 3 --image_height 32 --image_width 32 \
    --dim_reduction_type "strided" --num_layers 4 --num_filters 64 \
    --num_epochs 100 --experiment_name 'cifar100_test_exp' \
    --use_gpu "True" --weight_decay_coefficient 0. \
    --dataset_name "cifar100"
12
local_experiment_scripts/cifar10_arg_parsing_template.sh
Normal file
@@ -0,0 +1,12 @@
#!/bin/sh

cd ..
export DATASET_DIR="data/"
# Activate the relevant virtual environment:

python train_evaluate_emnist_classification_system.py --batch_size 100 --continue_from_epoch -1 --seed 0 \
    --image_num_channels 3 --image_height 32 --image_width 32 \
    --dim_reduction_type "strided" --num_layers 4 --num_filters 64 \
    --num_epochs 100 --experiment_name 'cifar10_test_exp' \
    --use_gpu "True" --weight_decay_coefficient 0. \
    --dataset_name "cifar10"
@@ -0,0 +1,7 @@
#!/bin/sh

cd ..
export DATASET_DIR="data/"
# Activate the relevant virtual environment:

python train_evaluate_emnist_classification_system.py --filepath_to_arguments_json_file experiment_configs/cifar10_tutorial_config.json
11
local_experiment_scripts/emnist_arg_parsing_template.sh
Normal file
@@ -0,0 +1,11 @@
#!/bin/sh

cd ..
export DATASET_DIR="data/"
# Activate the relevant virtual environment:

python train_evaluate_emnist_classification_system.py --batch_size 100 --continue_from_epoch -1 --seed 0 \
    --image_num_channels 1 --image_height 28 --image_width 28 \
    --dim_reduction_type "strided" --num_layers 4 --num_filters 64 \
    --num_epochs 100 --experiment_name 'emnist_test_exp' \
    --use_gpu "True" --weight_decay_coefficient 0.
7
local_experiment_scripts/emnist_json_parsing_template.sh
Normal file
@@ -0,0 +1,7 @@
#!/bin/sh

cd ..
export DATASET_DIR="data/"
# Activate the relevant virtual environment:

python train_evaluate_emnist_classification_system.py --filepath_to_arguments_json_file experiment_configs/emnist_tutorial_config.json
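A minimal Python equivalent of the JSON-parsing launcher scripts above (a sketch; it assumes it is run from the repository root with the data/ directory present):

import os
import subprocess

env = dict(os.environ, DATASET_DIR="data/")
subprocess.run(["python", "train_evaluate_emnist_classification_system.py",
                "--filepath_to_arguments_json_file",
                "experiment_configs/emnist_tutorial_config.json"],
               env=env, check=True)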
@@ -1,6 +0,0 @@
# -*- coding: utf-8 -*-
"""Machine Learning Practical package."""

__authors__ = ['Pawel Swietojanski', 'Steve Renals', 'Matt Graham']

DEFAULT_SEED = 123456  # Default random number generator seed if none provided.
@ -1,255 +0,0 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
"""Data providers.
|
||||
|
||||
This module provides classes for loading datasets and iterating over batches of
|
||||
data points.
|
||||
"""
|
||||
|
||||
import pickle
|
||||
import gzip
|
||||
import numpy as np
|
||||
import os
|
||||
from mlp import DEFAULT_SEED
|
||||
|
||||
|
||||
class DataProvider(object):
|
||||
"""Generic data provider."""
|
||||
|
||||
def __init__(self, inputs, targets, batch_size, max_num_batches=-1,
|
||||
shuffle_order=True, rng=None):
|
||||
"""Create a new data provider object.
|
||||
|
||||
Args:
|
||||
inputs (ndarray): Array of data input features of shape
|
||||
(num_data, input_dim).
|
||||
targets (ndarray): Array of data output targets of shape
|
||||
(num_data, output_dim) or (num_data,) if output_dim == 1.
|
||||
batch_size (int): Number of data points to include in each batch.
|
||||
max_num_batches (int): Maximum number of batches to iterate over
|
||||
in an epoch. If `max_num_batches * batch_size > num_data` then
|
||||
only as many batches as the data can be split into will be
|
||||
used. If set to -1 all of the data will be used.
|
||||
shuffle_order (bool): Whether to randomly permute the order of
|
||||
the data before each epoch.
|
||||
rng (RandomState): A seeded random number generator.
|
||||
"""
|
||||
self.inputs = inputs
|
||||
self.targets = targets
|
||||
self.batch_size = batch_size
|
||||
assert max_num_batches != 0 and not max_num_batches < -1, (
|
||||
'max_num_batches should be -1 or > 0')
|
||||
self.max_num_batches = max_num_batches
|
||||
# maximum possible number of batches is equal to number of whole times
|
||||
# batch_size divides in to the number of data points which can be
|
||||
# found using integer division
|
||||
possible_num_batches = self.inputs.shape[0] // batch_size
|
||||
if self.max_num_batches == -1:
|
||||
self.num_batches = possible_num_batches
|
||||
else:
|
||||
self.num_batches = min(self.max_num_batches, possible_num_batches)
|
||||
self.shuffle_order = shuffle_order
|
||||
if rng is None:
|
||||
rng = np.random.RandomState(DEFAULT_SEED)
|
||||
self.rng = rng
|
||||
self.reset()
|
||||
|
||||
def __iter__(self):
|
||||
"""Implements Python iterator interface.
|
||||
|
||||
This should return an object implementing a `next` method which steps
|
||||
through a sequence returning one element at a time and raising
|
||||
`StopIteration` when at the end of the sequence. Here the object
|
||||
returned is the DataProvider itself.
|
||||
"""
|
||||
return self
|
||||
|
||||
def reset(self):
|
||||
"""Resets the provider to the initial state to use in a new epoch."""
|
||||
self._curr_batch = 0
|
||||
if self.shuffle_order:
|
||||
self.shuffle()
|
||||
|
||||
def shuffle(self):
|
||||
"""Randomly shuffles order of data."""
|
||||
new_order = self.rng.permutation(self.inputs.shape[0])
|
||||
self.inputs = self.inputs[new_order]
|
||||
self.targets = self.targets[new_order]
|
||||
|
||||
def __next__(self):
|
||||
return self.next()
|
||||
|
||||
def next(self):
|
||||
"""Returns next data batch or raises `StopIteration` if at end."""
|
||||
if self._curr_batch + 1 > self.num_batches:
|
||||
# no more batches in current iteration through data set so reset
|
||||
# the dataset for another pass and indicate iteration is at end
|
||||
self.reset()
|
||||
raise StopIteration()
|
||||
# create an index slice corresponding to current batch number
|
||||
batch_slice = slice(self._curr_batch * self.batch_size,
|
||||
(self._curr_batch + 1) * self.batch_size)
|
||||
inputs_batch = self.inputs[batch_slice]
|
||||
targets_batch = self.targets[batch_slice]
|
||||
self._curr_batch += 1
|
||||
return inputs_batch, targets_batch
|
||||
|
||||
|
||||
class MNISTDataProvider(DataProvider):
|
||||
"""Data provider for MNIST handwritten digit images."""
|
||||
|
||||
def __init__(self, which_set='train', batch_size=100, max_num_batches=-1,
|
||||
shuffle_order=True, rng=None):
|
||||
"""Create a new MNIST data provider object.
|
||||
|
||||
Args:
|
||||
which_set: One of 'train', 'valid' or 'eval'. Determines which
|
||||
portion of the MNIST data this object should provide.
|
||||
batch_size (int): Number of data points to include in each batch.
|
||||
max_num_batches (int): Maximum number of batches to iterate over
|
||||
in an epoch. If `max_num_batches * batch_size > num_data` then
|
||||
only as many batches as the data can be split into will be
|
||||
used. If set to -1 all of the data will be used.
|
||||
shuffle_order (bool): Whether to randomly permute the order of
|
||||
the data before each epoch.
|
||||
rng (RandomState): A seeded random number generator.
|
||||
"""
|
||||
# check a valid which_set was provided
|
||||
assert which_set in ['train', 'valid', 'eval'], (
|
||||
'Expected which_set to be either train, valid or eval. '
|
||||
'Got {0}'.format(which_set)
|
||||
)
|
||||
self.which_set = which_set
|
||||
self.num_classes = 10
|
||||
# construct path to data using os.path.join to ensure the correct path
|
||||
# separator for the current platform / OS is used
|
||||
# MLP_DATA_DIR environment variable should point to the data directory
|
||||
data_path = os.path.join(
|
||||
os.environ['MLP_DATA_DIR'], 'mnist-{0}.npz'.format(which_set))
|
||||
assert os.path.isfile(data_path), (
|
||||
'Data file does not exist at expected path: ' + data_path
|
||||
)
|
||||
# load data from compressed numpy file
|
||||
loaded = np.load(data_path)
|
||||
inputs, targets = loaded['inputs'], loaded['targets']
|
||||
inputs = inputs.astype(np.float32)
|
||||
# pass the loaded data to the parent class __init__
|
||||
super(MNISTDataProvider, self).__init__(
|
||||
inputs, targets, batch_size, max_num_batches, shuffle_order, rng)
|
||||
|
||||
def next(self):
|
||||
"""Returns next data batch or raises `StopIteration` if at end."""
|
||||
inputs_batch, targets_batch = super(MNISTDataProvider, self).next()
|
||||
return inputs_batch, self.to_one_of_k(targets_batch)
|
||||
|
||||
def to_one_of_k(self, int_targets):
|
||||
"""Converts integer coded class target to 1 of K coded targets.
|
||||
|
||||
Args:
|
||||
int_targets (ndarray): Array of integer coded class targets (i.e.
|
||||
where an integer from 0 to `num_classes` - 1 is used to
|
||||
indicate which is the correct class). This should be of shape
|
||||
(num_data,).
|
||||
|
||||
Returns:
|
||||
Array of 1 of K coded targets i.e. an array of shape
|
||||
(num_data, num_classes) where for each row all elements are equal
|
||||
to zero except for the column corresponding to the correct class
|
||||
which is equal to one.
|
||||
"""
|
||||
one_of_k_targets = np.zeros((int_targets.shape[0], self.num_classes))
|
||||
one_of_k_targets[range(int_targets.shape[0]), int_targets] = 1
|
||||
return one_of_k_targets
|
||||
|
||||
|
||||
class MetOfficeDataProvider(DataProvider):
|
||||
"""South Scotland Met Office weather data provider."""
|
||||
|
||||
def __init__(self, window_size, batch_size=10, max_num_batches=-1,
|
||||
shuffle_order=True, rng=None):
|
||||
"""Create a new Met Office data provider object.
|
||||
|
||||
Args:
|
||||
window_size (int): Size of windows to split weather time series
|
||||
data into. The constructed input features will be the first
|
||||
`window_size - 1` entries in each window and the target outputs
|
||||
the last entry in each window.
|
||||
batch_size (int): Number of data points to include in each batch.
|
||||
max_num_batches (int): Maximum number of batches to iterate over
|
||||
in an epoch. If `max_num_batches * batch_size > num_data` then
|
||||
only as many batches as the data can be split into will be
|
||||
used. If set to -1 all of the data will be used.
|
||||
shuffle_order (bool): Whether to randomly permute the order of
|
||||
the data before each epoch.
|
||||
rng (RandomState): A seeded random number generator.
|
||||
"""
|
||||
data_path = os.path.join(
|
||||
os.environ['MLP_DATA_DIR'], 'HadSSP_daily_qc.txt')
|
||||
assert os.path.isfile(data_path), (
|
||||
'Data file does not exist at expected path: ' + data_path
|
||||
)
|
||||
raw = np.loadtxt(data_path, skiprows=3, usecols=range(2, 32))
|
||||
assert window_size > 1, 'window_size must be at least 2.'
|
||||
self.window_size = window_size
|
||||
# filter out all missing datapoints and flatten to a vector
|
||||
filtered = raw[raw >= 0].flatten()
|
||||
# normalise data to zero mean, unit standard deviation
|
||||
mean = np.mean(filtered)
|
||||
std = np.std(filtered)
|
||||
normalised = (filtered - mean) / std
|
||||
# create a view on to array corresponding to a rolling window
|
||||
shape = (normalised.shape[-1] - self.window_size + 1, self.window_size)
|
||||
strides = normalised.strides + (normalised.strides[-1],)
|
||||
windowed = np.lib.stride_tricks.as_strided(
|
||||
normalised, shape=shape, strides=strides)
|
||||
# inputs are first (window_size - 1) entries in windows
|
||||
inputs = windowed[:, :-1]
|
||||
# targets are last entry in windows
|
||||
targets = windowed[:, -1]
|
||||
super(MetOfficeDataProvider, self).__init__(
|
||||
inputs, targets, batch_size, max_num_batches, shuffle_order, rng)
|
||||
|
||||
class CCPPDataProvider(DataProvider):
|
||||
|
||||
def __init__(self, which_set='train', input_dims=None, batch_size=10,
|
||||
max_num_batches=-1, shuffle_order=True, rng=None):
|
||||
"""Create a new Combined Cycle Power Plant data provider object.
|
||||
|
||||
Args:
|
||||
which_set: One of 'train' or 'valid'. Determines which portion of
|
||||
data this object should provide.
|
||||
input_dims: Which of the four input dimension to use. If `None` all
|
||||
are used. If an iterable of integers are provided (consisting
|
||||
of a subset of {0, 1, 2, 3}) then only the corresponding
|
||||
input dimensions are included.
|
||||
batch_size (int): Number of data points to include in each batch.
|
||||
max_num_batches (int): Maximum number of batches to iterate over
|
||||
in an epoch. If `max_num_batches * batch_size > num_data` then
|
||||
only as many batches as the data can be split into will be
|
||||
used. If set to -1 all of the data will be used.
|
||||
shuffle_order (bool): Whether to randomly permute the order of
|
||||
the data before each epoch.
|
||||
rng (RandomState): A seeded random number generator.
|
||||
"""
|
||||
data_path = os.path.join(
|
||||
os.environ['MLP_DATA_DIR'], 'ccpp_data.npz')
|
||||
assert os.path.isfile(data_path), (
|
||||
'Data file does not exist at expected path: ' + data_path
|
||||
)
|
||||
# check a valid which_set was provided
|
||||
assert which_set in ['train', 'valid'], (
|
||||
'Expected which_set to be either train or valid '
|
||||
'Got {0}'.format(which_set)
|
||||
)
|
||||
# check input_dims are valid
|
||||
if input_dims is not None:
|
||||
input_dims = set(input_dims)
|
||||
assert input_dims.issubset({0, 1, 2, 3}), (
|
||||
'input_dims should be a subset of {0, 1, 2, 3}'
|
||||
)
|
||||
loaded = np.load(data_path)
|
||||
inputs = loaded[which_set + '_inputs']
|
||||
if input_dims is not None:
|
||||
inputs = inputs[:, input_dims]
|
||||
targets = loaded[which_set + '_targets']
|
||||
super(CCPPDataProvider, self).__init__(
|
||||
inputs, targets, batch_size, max_num_batches, shuffle_order, rng)
|
@ -1,46 +0,0 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
"""Error functions.
|
||||
|
||||
This module defines error functions, with the aim of model training being to
|
||||
minimise the error function given a set of inputs and target outputs.
|
||||
|
||||
The error functions will typically measure some concept of distance between the
|
||||
model outputs and target outputs, averaged over all data points in the data set
|
||||
or batch.
|
||||
"""
|
||||
|
||||
import numpy as np
|
||||
|
||||
|
||||
class SumOfSquaredDiffsError(object):
|
||||
"""Sum of squared differences (squared Euclidean distance) error."""
|
||||
|
||||
def __call__(self, outputs, targets):
|
||||
"""Calculates error function given a batch of outputs and targets.
|
||||
|
||||
Args:
|
||||
outputs: Array of model outputs of shape (batch_size, output_dim).
|
||||
targets: Array of target outputs of shape (batch_size, output_dim).
|
||||
|
||||
Returns:
|
||||
Scalar error function value.
|
||||
"""
|
||||
#TODO write your code here
|
||||
raise NotImplementedError()
|
||||
|
||||
def grad(self, outputs, targets):
|
||||
"""Calculates gradient of error function with respect to outputs.
|
||||
|
||||
Args:
|
||||
outputs: Array of model outputs of shape (batch_size, output_dim).
|
||||
targets: Array of target outputs of shape (batch_size, output_dim).
|
||||
|
||||
Returns:
|
||||
Gradient of error function with respect to outputs. This should be
|
||||
an array of shape (batch_size, output_dim).
|
||||
"""
|
||||
#TODO write your code here
|
||||
raise NotImplementedError()
|
||||
|
||||
def __repr__(self):
|
||||
return 'SumOfSquaredDiffsError'
|
@ -1,65 +0,0 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
"""Parameter initialisers.
|
||||
|
||||
This module defines classes to initialise the parameters in a layer.
|
||||
"""
|
||||
|
||||
import numpy as np
|
||||
from mlp import DEFAULT_SEED
|
||||
|
||||
|
||||
class ConstantInit(object):
|
||||
"""Constant parameter initialiser."""
|
||||
|
||||
def __init__(self, value):
|
||||
"""Construct a constant parameter initialiser.
|
||||
|
||||
Args:
|
||||
value: Value to initialise parameter to.
|
||||
"""
|
||||
self.value = value
|
||||
|
||||
def __call__(self, shape):
|
||||
return np.ones(shape=shape) * self.value
|
||||
|
||||
|
||||
class UniformInit(object):
|
||||
"""Random uniform parameter initialiser."""
|
||||
|
||||
def __init__(self, low, high, rng=None):
|
||||
"""Construct a random uniform parameter initialiser.
|
||||
|
||||
Args:
|
||||
low: Lower bound of interval to sample from.
|
||||
high: Upper bound of interval to sample from.
|
||||
rng (RandomState): Seeded random number generator.
|
||||
"""
|
||||
self.low = low
|
||||
self.high = high
|
||||
if rng is None:
|
||||
rng = np.random.RandomState(DEFAULT_SEED)
|
||||
self.rng = rng
|
||||
|
||||
def __call__(self, shape):
|
||||
return self.rng.uniform(low=self.low, high=self.high, size=shape)
|
||||
|
||||
|
||||
class NormalInit(object):
|
||||
"""Random normal parameter initialiser."""
|
||||
|
||||
def __init__(self, mean, std, rng=None):
|
||||
"""Construct a random uniform parameter initialiser.
|
||||
|
||||
Args:
|
||||
mean: Mean of distribution to sample from.
|
||||
std: Standard deviation of distribution to sample from.
|
||||
rng (RandomState): Seeded random number generator.
|
||||
"""
|
||||
self.mean = mean
|
||||
self.std = std
|
||||
if rng is None:
|
||||
rng = np.random.RandomState(DEFAULT_SEED)
|
||||
self.rng = rng
|
||||
|
||||
def __call__(self, shape):
|
||||
return self.rng.normal(loc=self.mean, scale=self.std, size=shape)
|
141
mlp/layers.py
@ -1,141 +0,0 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
"""Layer definitions.
|
||||
|
||||
This module defines classes which encapsulate a single layer.
|
||||
|
||||
These layers map input activations to output activation with the `fprop`
|
||||
method and map gradients with respect to outputs to gradients with respect to
|
||||
their inputs with the `bprop` method.
|
||||
|
||||
Some layers will have learnable parameters and so will additionally define
|
||||
methods for getting and setting parameter and calculating gradients with
|
||||
respect to the layer parameters.
|
||||
"""
|
||||
|
||||
import numpy as np
|
||||
import mlp.initialisers as init
|
||||
|
||||
|
||||
class Layer(object):
|
||||
"""Abstract class defining the interface for a layer."""
|
||||
|
||||
def fprop(self, inputs):
|
||||
"""Forward propagates activations through the layer transformation.
|
||||
|
||||
Args:
|
||||
inputs: Array of layer inputs of shape (batch_size, input_dim).
|
||||
|
||||
Returns:
|
||||
outputs: Array of layer outputs of shape (batch_size, output_dim).
|
||||
"""
|
||||
raise NotImplementedError()
|
||||
|
||||
def bprop(self, inputs, outputs, grads_wrt_outputs):
|
||||
"""Back propagates gradients through a layer.
|
||||
|
||||
Given gradients with respect to the outputs of the layer calculates the
|
||||
gradients with respect to the layer inputs.
|
||||
|
||||
Args:
|
||||
inputs: Array of layer inputs of shape (batch_size, input_dim).
|
||||
outputs: Array of layer outputs calculated in forward pass of
|
||||
shape (batch_size, output_dim).
|
||||
grads_wrt_outputs: Array of gradients with respect to the layer
|
||||
outputs of shape (batch_size, output_dim).
|
||||
|
||||
Returns:
|
||||
Array of gradients with respect to the layer inputs of shape
|
||||
(batch_size, input_dim).
|
||||
"""
|
||||
raise NotImplementedError()
|
||||
|
||||
|
||||
class LayerWithParameters(Layer):
|
||||
"""Abstract class defining the interface for a layer with parameters."""
|
||||
|
||||
def grads_wrt_params(self, inputs, grads_wrt_outputs):
|
||||
"""Calculates gradients with respect to layer parameters.
|
||||
|
||||
Args:
|
||||
inputs: Array of inputs to layer of shape (batch_size, input_dim).
|
||||
grads_wrt_to_outputs: Array of gradients with respect to the layer
|
||||
outputs of shape (batch_size, output_dim).
|
||||
|
||||
Returns:
|
||||
List of arrays of gradients with respect to the layer parameters
|
||||
with parameter gradients appearing in same order in tuple as
|
||||
returned from `get_params` method.
|
||||
"""
|
||||
raise NotImplementedError()
|
||||
|
||||
@property
|
||||
def params(self):
|
||||
"""Returns a list of parameters of layer.
|
||||
|
||||
Returns:
|
||||
List of current parameter values.
|
||||
"""
|
||||
raise NotImplementedError()
|
||||
|
||||
|
||||
class AffineLayer(LayerWithParameters):
|
||||
"""Layer implementing an affine tranformation of its inputs.
|
||||
|
||||
This layer is parameterised by a weight matrix and bias vector.
|
||||
"""
|
||||
|
||||
def __init__(self, input_dim, output_dim,
|
||||
weights_initialiser=init.UniformInit(-0.1, 0.1),
|
||||
biases_initialiser=init.ConstantInit(0.),
|
||||
weights_cost=None, biases_cost=None):
|
||||
"""Initialises a parameterised affine layer.
|
||||
|
||||
Args:
|
||||
input_dim (int): Dimension of inputs to the layer.
|
||||
output_dim (int): Dimension of the layer outputs.
|
||||
weights_initialiser: Initialiser for the weight parameters.
|
||||
biases_initialiser: Initialiser for the bias parameters.
|
||||
"""
|
||||
self.input_dim = input_dim
|
||||
self.output_dim = output_dim
|
||||
self.weights = weights_initialiser((self.output_dim, self.input_dim))
|
||||
self.biases = biases_initialiser(self.output_dim)
|
||||
|
||||
def fprop(self, inputs):
|
||||
"""Forward propagates activations through the layer transformation.
|
||||
|
||||
For inputs `x`, outputs `y`, weights `W` and biases `b` the layer
|
||||
corresponds to `y = W.dot(x) + b`.
|
||||
|
||||
Args:
|
||||
inputs: Array of layer inputs of shape (batch_size, input_dim).
|
||||
|
||||
Returns:
|
||||
outputs: Array of layer outputs of shape (batch_size, output_dim).
|
||||
"""
|
||||
#TODO write your code here
|
||||
raise NotImplementedError()
|
||||
|
||||
def grads_wrt_params(self, inputs, grads_wrt_outputs):
|
||||
"""Calculates gradients with respect to layer parameters.
|
||||
|
||||
Args:
|
||||
inputs: array of inputs to layer of shape (batch_size, input_dim)
|
||||
grads_wrt_to_outputs: array of gradients with respect to the layer
|
||||
outputs of shape (batch_size, output_dim)
|
||||
|
||||
Returns:
|
||||
list of arrays of gradients with respect to the layer parameters
|
||||
`[grads_wrt_weights, grads_wrt_biases]`.
|
||||
"""
|
||||
#TODO write your code here
|
||||
raise NotImplementedError()
|
||||
|
||||
@property
|
||||
def params(self):
|
||||
"""A list of layer parameter values: `[weights, biases]`."""
|
||||
return [self.weights, self.biases]
|
||||
|
||||
def __repr__(self):
|
||||
return 'AffineLayer(input_dim={0}, output_dim={1})'.format(
|
||||
self.input_dim, self.output_dim)
|
@ -1,162 +0,0 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
"""Learning rules.
|
||||
|
||||
This module contains classes implementing gradient based learning rules.
|
||||
"""
|
||||
|
||||
import numpy as np
|
||||
|
||||
|
||||
class GradientDescentLearningRule(object):
|
||||
"""Simple (stochastic) gradient descent learning rule.
|
||||
|
||||
For a scalar error function `E(p[0], p_[1] ... )` of some set of
|
||||
potentially multidimensional parameters this attempts to find a local
|
||||
minimum of the loss function by applying updates to each parameter of the
|
||||
form
|
||||
|
||||
p[i] := p[i] - learning_rate * dE/dp[i]
|
||||
|
||||
With `learning_rate` a positive scaling parameter.
|
||||
|
||||
The error function used in successive applications of these updates may be
|
||||
a stochastic estimator of the true error function (e.g. when the error with
|
||||
respect to only a subset of data-points is calculated) in which case this
|
||||
will correspond to a stochastic gradient descent learning rule.
|
||||
"""
|
||||
|
||||
def __init__(self, learning_rate=1e-3):
|
||||
"""Creates a new learning rule object.
|
||||
|
||||
Args:
|
||||
learning_rate: A positive scalar to scale gradient updates to the
|
||||
parameters by. This needs to be carefully set - if too large
|
||||
the learning dynamic will be unstable and may diverge, while
|
||||
if set too small learning will proceed very slowly.
|
||||
|
||||
"""
|
||||
assert learning_rate > 0., 'learning_rate should be positive.'
|
||||
self.learning_rate = learning_rate
|
||||
|
||||
def initialise(self, params):
|
||||
"""Initialises the state of the learning rule for a set or parameters.
|
||||
|
||||
This must be called before `update_params` is first called.
|
||||
|
||||
Args:
|
||||
params: A list of the parameters to be optimised. Note these will
|
||||
be updated *in-place* to avoid reallocating arrays on each
|
||||
update.
|
||||
"""
|
||||
self.params = params
|
||||
|
||||
def reset(self):
|
||||
"""Resets any additional state variables to their intial values.
|
||||
|
||||
For this learning rule there are no additional state variables so we
|
||||
do nothing here.
|
||||
"""
|
||||
pass
|
||||
|
||||
def update_params(self, grads_wrt_params):
|
||||
"""Applies a single gradient descent update to all parameters.
|
||||
|
||||
All parameter updates are performed using in-place operations and so
|
||||
nothing is returned.
|
||||
|
||||
Args:
|
||||
grads_wrt_params: A list of gradients of the scalar loss function
|
||||
with respect to each of the parameters passed to `initialise`
|
||||
previously, with this list expected to be in the same order.
|
||||
"""
|
||||
for param, grad in zip(self.params, grads_wrt_params):
|
||||
param -= self.learning_rate * grad
|
||||
|
||||
|
||||
class MomentumLearningRule(GradientDescentLearningRule):
|
||||
"""Gradient descent with momentum learning rule.
|
||||
|
||||
This extends the basic gradient learning rule by introducing extra
|
||||
momentum state variables for each parameter. These can help the learning
|
||||
dynamic overcome shallow local minima and speed convergence when
|
||||
making multiple successive steps in a similar direction in parameter space.
|
||||
|
||||
For parameter p[i] and corresponding momentum m[i] the updates for a
|
||||
scalar loss function `L` are of the form
|
||||
|
||||
m[i] := mom_coeff * m[i] - learning_rate * dL/dp[i]
|
||||
p[i] := p[i] + m[i]
|
||||
|
||||
with `learning_rate` a positive scaling parameter for the gradient updates
|
||||
and `mom_coeff` a value in [0, 1] that determines how much 'friction' there
|
||||
is in the system and so how quickly previous momentum contributions decay.
|
||||
"""
|
||||
|
||||
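    # Worked one-step example of the update above (a sketch, not part of the original
    # file): with mom_coeff = 0.9, learning_rate = 0.1, previous momentum m = 0.5 and
    # gradient dL/dp = 2.0, the new momentum is 0.9 * 0.5 - 0.1 * 2.0 = 0.25, so the
    # parameter moves by +0.25 on this update.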
def __init__(self, learning_rate=1e-3, mom_coeff=0.9):
|
||||
"""Creates a new learning rule object.
|
||||
|
||||
Args:
|
||||
learning_rate: A positive scalar to scale gradient updates to the
|
||||
parameters by. This needs to be carefully set - if too large
|
||||
the learning dynamic will be unstable and may diverge, while
|
||||
if set too small learning will proceed very slowly.
|
||||
mom_coeff: A scalar in the range [0, 1] inclusive. This determines
|
||||
the contribution of the previous momentum value to the value
|
||||
after each update. If equal to 0 the momentum is set to exactly
|
||||
the negative scaled gradient each update and so this rule
|
||||
collapses to standard gradient descent. If equal to 1 the
|
||||
momentum will just be decremented by the scaled gradient at
|
||||
each update. This is equivalent to simulating the dynamic in
|
||||
a frictionless system. Due to energy conservation the loss
|
||||
of 'potential energy' as the dynamics moves down the loss
|
||||
function surface will lead to an increasingly large 'kinetic
|
||||
energy' and so speed, meaning the updates will become
|
||||
increasingly large, potentially unstably so. Typically a value
|
||||
less than but close to 1 will avoid these issues and cause the
|
||||
dynamic to converge to a local minima where the gradients are
|
||||
by definition zero.
|
||||
"""
|
||||
super(MomentumLearningRule, self).__init__(learning_rate)
|
||||
assert mom_coeff >= 0. and mom_coeff <= 1., (
|
||||
'mom_coeff should be in the range [0, 1].'
|
||||
)
|
||||
self.mom_coeff = mom_coeff
|
||||
|
||||
def initialise(self, params):
|
||||
"""Initialises the state of the learning rule for a set or parameters.
|
||||
|
||||
This must be called before `update_params` is first called.
|
||||
|
||||
Args:
|
||||
params: A list of the parameters to be optimised. Note these will
|
||||
be updated *in-place* to avoid reallocating arrays on each
|
||||
update.
|
||||
"""
|
||||
super(MomentumLearningRule, self).initialise(params)
|
||||
self.moms = []
|
||||
for param in self.params:
|
||||
self.moms.append(np.zeros_like(param))
|
||||
|
||||
def reset(self):
|
||||
"""Resets any additional state variables to their intial values.
|
||||
|
||||
For this learning rule this corresponds to zeroing all the momenta.
|
||||
"""
|
||||
for mom in self.moms:
|
||||
mom *= 0.
|
||||
|
||||
def update_params(self, grads_wrt_params):
|
||||
"""Applies a single update to all parameters.
|
||||
|
||||
All parameter updates are performed using in-place operations and so
|
||||
nothing is returned.
|
||||
|
||||
Args:
|
||||
grads_wrt_params: A list of gradients of the scalar loss function
|
||||
with respect to each of the parameters passed to `initialise`
|
||||
previously, with this list expected to be in the same order.
|
||||
"""
|
||||
for param, mom, grad in zip(self.params, self.moms, grads_wrt_params):
|
||||
mom *= self.mom_coeff
|
||||
mom -= self.learning_rate * grad
|
||||
param += mom
|
@ -1,67 +0,0 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
"""Model definitions.
|
||||
|
||||
This module implements objects encapsulating learnable models of input-output
|
||||
relationships. The model objects implement methods for forward propagating
|
||||
the inputs through the transformation(s) defined by the model to produce
|
||||
outputs (and intermediate states) and for calculating gradients of scalar
|
||||
functions of the outputs with respect to the model parameters.
|
||||
"""
|
||||
|
||||
from mlp.layers import LayerWithParameters
|
||||
|
||||
|
||||
class SingleLayerModel(object):
|
||||
"""A model consisting of a single transformation layer."""
|
||||
|
||||
def __init__(self, layer):
|
||||
"""Create a new single layer model instance.
|
||||
|
||||
Args:
|
||||
layer: The layer object defining the model architecture.
|
||||
"""
|
||||
self.layer = layer
|
||||
|
||||
@property
|
||||
def params(self):
|
||||
"""A list of all of the parameters of the model."""
|
||||
return self.layer.params
|
||||
|
||||
def fprop(self, inputs):
|
||||
"""Calculate the model outputs corresponding to a batch of inputs.
|
||||
|
||||
Args:
|
||||
inputs: Batch of inputs to the model.
|
||||
|
||||
Returns:
|
||||
List which is a concatenation of the model inputs and model
|
||||
outputs, this being done for consistency of the interface with
|
||||
multi-layer models for which `fprop` returns a list of
|
||||
activations through all immediate layers of the model and including
|
||||
the inputs and outputs.
|
||||
"""
|
||||
activations = [inputs, self.layer.fprop(inputs)]
|
||||
return activations
|
||||
|
||||
def grads_wrt_params(self, activations, grads_wrt_outputs):
|
||||
"""Calculates gradients with respect to the model parameters.
|
||||
|
||||
Args:
|
||||
activations: List of all activations from forward pass through
|
||||
model using `fprop`.
|
||||
grads_wrt_outputs: Gradient with respect to the model outputs of
|
||||
the scalar function parameter gradients are being calculated
|
||||
for.
|
||||
|
||||
Returns:
|
||||
List of gradients of the scalar function with respect to all model
|
||||
parameters.
|
||||
"""
|
||||
return self.layer.grads_wrt_params(activations[0], grads_wrt_outputs)
|
||||
|
||||
def params_cost(self):
|
||||
"""Calculates the parameter dependent cost term of the model."""
|
||||
return self.layer.params_cost()
|
||||
|
||||
def __repr__(self):
|
||||
return 'SingleLayerModel(' + str(self.layer) + ')'
|
@ -1,134 +0,0 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
"""Model optimisers.
|
||||
|
||||
This module contains objects implementing (batched) stochastic gradient descent
|
||||
based optimisation of models.
|
||||
"""
|
||||
|
||||
import time
|
||||
import logging
|
||||
from collections import OrderedDict
|
||||
import numpy as np
|
||||
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class Optimiser(object):
|
||||
"""Basic model optimiser."""
|
||||
|
||||
def __init__(self, model, error, learning_rule, train_dataset,
|
||||
valid_dataset=None, data_monitors=None):
|
||||
"""Create a new optimiser instance.
|
||||
|
||||
Args:
|
||||
model: The model to optimise.
|
||||
error: The scalar error function to minimise.
|
||||
learning_rule: Gradient based learning rule to use to minimise
|
||||
error.
|
||||
train_dataset: Data provider for training set data batches.
|
||||
valid_dataset: Data provider for validation set data batches.
|
||||
data_monitors: Dictionary of functions evaluated on targets and
|
||||
model outputs (averaged across both full training and
|
||||
validation data sets) to monitor during training in addition
|
||||
to the error. Keys should correspond to a string label for
|
||||
the statistic being evaluated.
|
||||
"""
|
||||
self.model = model
|
||||
self.error = error
|
||||
self.learning_rule = learning_rule
|
||||
self.learning_rule.initialise(self.model.params)
|
||||
self.train_dataset = train_dataset
|
||||
self.valid_dataset = valid_dataset
|
||||
self.data_monitors = OrderedDict([('error', error)])
|
||||
if data_monitors is not None:
|
||||
self.data_monitors.update(data_monitors)
|
||||
|
||||
def do_training_epoch(self):
|
||||
"""Do a single training epoch.
|
||||
|
||||
This iterates through all batches in training dataset, for each
|
||||
calculating the gradient of the estimated error given the batch with
|
||||
respect to all the model parameters and then updates the model
|
||||
parameters according to the learning rule.
|
||||
"""
|
||||
for inputs_batch, targets_batch in self.train_dataset:
|
||||
activations = self.model.fprop(inputs_batch)
|
||||
grads_wrt_outputs = self.error.grad(activations[-1], targets_batch)
|
||||
grads_wrt_params = self.model.grads_wrt_params(
|
||||
activations, grads_wrt_outputs)
|
||||
self.learning_rule.update_params(grads_wrt_params)
|
||||
|
||||
def eval_monitors(self, dataset, label):
|
||||
"""Evaluates the monitors for the given dataset.
|
||||
|
||||
Args:
|
||||
dataset: Dataset to perform evaluation with.
|
||||
label: Tag to add to end of monitor keys to identify dataset.
|
||||
|
||||
Returns:
|
||||
OrderedDict of monitor values evaluated on dataset.
|
||||
"""
|
||||
data_mon_vals = OrderedDict([(key + label, 0.) for key
|
||||
in self.data_monitors.keys()])
|
||||
for inputs_batch, targets_batch in dataset:
|
||||
activations = self.model.fprop(inputs_batch)
|
||||
for key, data_monitor in self.data_monitors.items():
|
||||
data_mon_vals[key + label] += data_monitor(
|
||||
activations[-1], targets_batch)
|
||||
for key, data_monitor in self.data_monitors.items():
|
||||
data_mon_vals[key + label] /= dataset.num_batches
|
||||
return data_mon_vals
|
||||
|
||||
def get_epoch_stats(self):
|
||||
"""Computes training statistics for an epoch.
|
||||
|
||||
Returns:
|
||||
An OrderedDict with keys corresponding to the statistic labels and
|
||||
values corresponding to the value of the statistic.
|
||||
"""
|
||||
epoch_stats = OrderedDict()
|
||||
epoch_stats.update(self.eval_monitors(self.train_dataset, '(train)'))
|
||||
if self.valid_dataset is not None:
|
||||
epoch_stats.update(self.eval_monitors(
|
||||
self.valid_dataset, '(valid)'))
|
||||
return epoch_stats
|
||||
|
||||
def log_stats(self, epoch, epoch_time, stats):
|
||||
"""Outputs stats for a training epoch to a logger.
|
||||
|
||||
Args:
|
||||
epoch (int): Epoch counter.
|
||||
epoch_time: Time taken in seconds for the epoch to complete.
|
||||
stats: Monitored stats for the epoch.
|
||||
"""
|
||||
logger.info('Epoch {0}: {1:.1f}s to complete\n {2}'.format(
|
||||
epoch, epoch_time,
|
||||
', '.join(['{0}={1:.2e}'.format(k, v) for (k, v) in stats.items()])
|
||||
))
|
||||
|
||||
def train(self, num_epochs, stats_interval=5):
|
||||
"""Trains a model for a set number of epochs.
|
||||
|
||||
Args:
|
||||
num_epochs: Number of epochs (complete passes through training
|
||||
dataset) to train for.
|
||||
stats_interval: Training statistics will be recorded and logged
|
||||
every `stats_interval` epochs.
|
||||
|
||||
Returns:
|
||||
Tuple with first value being an array of training run statistics
|
||||
and the second being a dict mapping the labels for the statistics
|
||||
recorded to their column index in the array.
|
||||
"""
|
||||
run_stats = [list(self.get_epoch_stats().values())]
|
||||
for epoch in range(1, num_epochs + 1):
|
||||
start_time = time.process_time()
|
||||
self.do_training_epoch()
|
||||
epoch_time = time.process_time() - start_time
|
||||
if epoch % stats_interval == 0:
|
||||
stats = self.get_epoch_stats()
|
||||
self.log_stats(epoch, epoch_time, stats)
|
||||
run_stats.append(list(stats.values()))
|
||||
return np.array(run_stats), {k: i for i, k in enumerate(stats.keys())}
|
||||
|
208
model_architectures.py
Normal file
@ -0,0 +1,208 @@
|
||||
import numpy as np
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
import torch.nn.functional as F
|
||||
|
||||
|
||||
class FCCNetwork(nn.Module):
|
||||
def __init__(self, input_shape, num_output_classes, num_filters, num_layers, use_bias=False):
|
||||
"""
|
||||
Initializes a fully connected network similar to the ones implemented previously in the MLP package.
|
||||
:param input_shape: The shape of the inputs going in to the network.
|
||||
:param num_output_classes: The number of outputs the network should have (for classification those would be the number of classes)
|
||||
:param num_filters: Number of filters used in every fcc layer.
|
||||
:param num_layers: Number of fcc layers (excluding dim reduction stages)
|
||||
:param use_bias: Whether our fcc layers will use a bias.
|
||||
"""
|
||||
super(FCCNetwork, self).__init__()
|
||||
# set up class attributes useful in building the network and inference
|
||||
self.input_shape = input_shape
|
||||
self.num_filters = num_filters
|
||||
self.num_output_classes = num_output_classes
|
||||
self.use_bias = use_bias
|
||||
self.num_layers = num_layers
|
||||
# initialize a module dict, which is effectively a dictionary that can collect layers and integrate them into pytorch
|
||||
self.layer_dict = nn.ModuleDict()
|
||||
# build the network
|
||||
self.build_module()
|
||||
|
||||
def build_module(self):
|
||||
print("Building basic block of FCCNetwork using input shape", self.input_shape)
|
||||
x = torch.zeros((self.input_shape))
|
||||
|
||||
out = x
|
||||
out = out.view(out.shape[0], -1)
|
||||
# flatten inputs to shape (b, -1) where -1 is the dim resulting from multiplying the
|
||||
# shapes of all dimensions after the 0th dim
|
||||
|
||||
for i in range(self.num_layers):
|
||||
self.layer_dict['fcc_{}'.format(i)] = nn.Linear(in_features=out.shape[1], # initialize a fcc layer
|
||||
out_features=self.num_filters,
|
||||
bias=self.use_bias)
|
||||
|
||||
out = self.layer_dict['fcc_{}'.format(i)](out) # apply ith fcc layer to the previous layers outputs
|
||||
out = F.relu(out) # apply a ReLU on the outputs
|
||||
|
||||
self.logits_linear_layer = nn.Linear(in_features=out.shape[1], # initialize the prediction output linear layer
|
||||
out_features=self.num_output_classes,
|
||||
bias=self.use_bias)
|
||||
out = self.logits_linear_layer(out) # apply the layer to the previous layer's outputs
|
||||
print("Block is built, output volume is", out.shape)
|
||||
return out
|
||||
|
||||
def forward(self, x):
|
||||
"""
|
||||
Forward prop data through the network and return the preds
|
||||
:param x: Input batch x, of shape (batch_size, ...), with samples of any dimensionality.
|
||||
:return: preds of shape (b, num_classes)
|
||||
"""
|
||||
out = x
|
||||
out = out.view(out.shape[0], -1)
|
||||
# flatten inputs to shape (b, -1) where -1 is the dim resulting from multiplying the
|
||||
# shapes of all dimensions after the 0th dim
|
||||
|
||||
for i in range(self.num_layers):
|
||||
out = self.layer_dict['fcc_{}'.format(i)](out) # apply ith fcc layer to the previous layers outputs
|
||||
out = F.relu(out) # apply a ReLU on the outputs
|
||||
|
||||
out = self.logits_linear_layer(out) # apply the layer to the previous layer's outputs
|
||||
return out
|
||||
|
||||
def reset_parameters(self):
|
||||
"""
|
||||
Re-initializes the networks parameters
|
||||
"""
|
||||
for item in self.layer_dict.children():
|
||||
item.reset_parameters()
|
||||
|
||||
self.logits_linear_layer.reset_parameters()
|
||||
|
||||
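# A minimal usage sketch (not part of the original file; the shapes are illustrative
# and mirror the emnist_tutorial_config values, with a hypothetical class count):
# fcc_net = FCCNetwork(input_shape=(100, 28 * 28), num_output_classes=47,
#                      num_filters=32, num_layers=4, use_bias=False)
# preds = fcc_net.forward(torch.zeros((100, 28 * 28)))  # preds has shape (100, 47)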
class ConvolutionalNetwork(nn.Module):
|
||||
def __init__(self, input_shape, dim_reduction_type, num_output_classes, num_filters, num_layers, use_bias=False):
|
||||
"""
|
||||
Initializes a convolutional network module object.
|
||||
:param input_shape: The shape of the inputs going in to the network.
|
||||
:param dim_reduction_type: The type of dimensionality reduction to apply after each convolutional stage, should be one of ['max_pooling', 'avg_pooling', 'strided_convolution', 'dilated_convolution']
|
||||
:param num_output_classes: The number of outputs the network should have (for classification those would be the number of classes)
|
||||
:param num_filters: Number of filters used in every conv layer, except dim reduction stages, where those are automatically inferred.
|
||||
:param num_layers: Number of conv layers (excluding dim reduction stages)
|
||||
:param use_bias: Whether our convolutions will use a bias.
|
||||
"""
|
||||
super(ConvolutionalNetwork, self).__init__()
|
||||
# set up class attributes useful in building the network and inference
|
||||
self.input_shape = input_shape
|
||||
self.num_filters = num_filters
|
||||
self.num_output_classes = num_output_classes
|
||||
self.use_bias = use_bias
|
||||
self.num_layers = num_layers
|
||||
self.dim_reduction_type = dim_reduction_type
|
||||
# initialize a module dict, which is effectively a dictionary that can collect layers and integrate them into pytorch
|
||||
self.layer_dict = nn.ModuleDict()
|
||||
# build the network
|
||||
self.build_module()
|
||||
|
||||
def build_module(self):
|
||||
"""
|
||||
Builds network whilst automatically inferring shapes of layers.
|
||||
"""
|
||||
print("Building basic block of ConvolutionalNetwork using input shape", self.input_shape)
|
||||
x = torch.zeros((self.input_shape)) # create dummy inputs to be used to infer shapes of layers
|
||||
|
||||
out = x
|
||||
# torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)
|
||||
for i in range(self.num_layers): # for number of layers times
|
||||
self.layer_dict['conv_{}'.format(i)] = nn.Conv2d(in_channels=out.shape[1],
|
||||
# add a conv layer in the module dict
|
||||
kernel_size=3,
|
||||
out_channels=self.num_filters, padding=1,
|
||||
bias=self.use_bias)
|
||||
|
||||
out = self.layer_dict['conv_{}'.format(i)](out) # use layer on inputs to get an output
|
||||
out = F.relu(out) # apply relu
|
||||
print(out.shape)
|
||||
if self.dim_reduction_type == 'strided_convolution': # if dim reduction is strided conv, then add a strided conv
|
||||
self.layer_dict['dim_reduction_strided_conv_{}'.format(i)] = nn.Conv2d(in_channels=out.shape[1],
|
||||
kernel_size=3,
|
||||
out_channels=out.shape[1],
|
||||
padding=1,
|
||||
bias=self.use_bias, stride=2,
|
||||
dilation=1)
|
||||
|
||||
out = self.layer_dict['dim_reduction_strided_conv_{}'.format(i)](
|
||||
out) # use strided conv to get an output
|
||||
out = F.relu(out) # apply relu to the output
|
||||
elif self.dim_reduction_type == 'dilated_convolution':  # if dim reduction is dilated conv, then add a dilated conv, using a dilation rate of i + 2 (the dilation grows with depth, so the spatial output shrinks as we go; you can choose other dilation rates should you wish)
|
||||
self.layer_dict['dim_reduction_dilated_conv_{}'.format(i)] = nn.Conv2d(in_channels=out.shape[1],
|
||||
kernel_size=3,
|
||||
out_channels=out.shape[1],
|
||||
padding=1,
|
||||
bias=self.use_bias, stride=1,
|
||||
dilation=i + 2)
|
||||
out = self.layer_dict['dim_reduction_dilated_conv_{}'.format(i)](
|
||||
out) # run dilated conv on input to get output
|
||||
out = F.relu(out) # apply relu on output
|
||||
|
||||
elif self.dim_reduction_type == 'max_pooling':
|
||||
self.layer_dict['dim_reduction_max_pool_{}'.format(i)] = nn.MaxPool2d(2, padding=1)
|
||||
out = self.layer_dict['dim_reduction_max_pool_{}'.format(i)](out)
|
||||
|
||||
elif self.dim_reduction_type == 'avg_pooling':
|
||||
self.layer_dict['dim_reduction_avg_pool_{}'.format(i)] = nn.AvgPool2d(2, padding=1)
|
||||
out = self.layer_dict['dim_reduction_avg_pool_{}'.format(i)](out)
|
||||
|
||||
print(out.shape)
|
||||
if out.shape[-1] != 2:
|
||||
out = F.adaptive_avg_pool2d(out,
|
||||
2)  # apply adaptive pooling to make sure output of conv layers is always (2, 2) spatially (helps with comparisons).
|
||||
print('shape before final linear layer', out.shape)
|
||||
out = out.view(out.shape[0], -1)
|
||||
self.logit_linear_layer = nn.Linear(in_features=out.shape[1], # add a linear layer
|
||||
out_features=self.num_output_classes,
|
||||
bias=self.use_bias)
|
||||
out = self.logit_linear_layer(out) # apply linear layer on flattened inputs
|
||||
print("Block is built, output volume is", out.shape)
|
||||
return out
|
||||
|
||||
def forward(self, x):
|
||||
"""
|
||||
Forward propagates the network given an input batch
|
||||
:param x: Inputs x (b, c, h, w)
|
||||
:return: preds (b, num_classes)
|
||||
"""
|
||||
out = x
|
||||
for i in range(self.num_layers): # for number of layers
|
||||
|
||||
out = self.layer_dict['conv_{}'.format(i)](out) # pass through conv layer indexed at i
|
||||
out = F.relu(out) # pass conv outputs through ReLU
|
||||
if self.dim_reduction_type == 'strided_convolution': # if strided convolution dim reduction then
|
||||
out = self.layer_dict['dim_reduction_strided_conv_{}'.format(i)](
|
||||
out) # pass previous outputs through a strided convolution indexed i
|
||||
out = F.relu(out) # pass strided conv outputs through ReLU
|
||||
|
||||
elif self.dim_reduction_type == 'dilated_convolution':
|
||||
out = self.layer_dict['dim_reduction_dilated_conv_{}'.format(i)](out)
|
||||
out = F.relu(out)
|
||||
|
||||
elif self.dim_reduction_type == 'max_pooling':
|
||||
out = self.layer_dict['dim_reduction_max_pool_{}'.format(i)](out)
|
||||
|
||||
elif self.dim_reduction_type == 'avg_pooling':
|
||||
out = self.layer_dict['dim_reduction_avg_pool_{}'.format(i)](out)
|
||||
|
||||
if out.shape[-1] != 2:
|
||||
out = F.adaptive_avg_pool2d(out, 2)
|
||||
out = out.view(out.shape[0], -1) # flatten outputs from (b, c, h, w) to (b, c*h*w)
|
||||
out = self.logit_linear_layer(out) # pass through a linear layer to get logits/preds
|
||||
return out
|
||||
|
||||
def reset_parameters(self):
|
||||
"""
|
||||
Re-initialize the network parameters.
|
||||
"""
|
||||
for item in self.layer_dict.children():
|
||||
try:
|
||||
item.reset_parameters()
|
||||
except AttributeError:  # layers without parameters (e.g. pooling) have nothing to reset
|
||||
pass
|
||||
|
||||
self.logit_linear_layer.reset_parameters()
|
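A minimal usage sketch (not part of the repository) that instantiates ConvolutionalNetwork with the values from experiment_configs/cifar10_tutorial_config.json; CIFAR-10 has 10 classes, and the leading 100 in input_shape is the batch size used to trace the layer shapes:

import torch
from model_architectures import ConvolutionalNetwork

conv_net = ConvolutionalNetwork(input_shape=(100, 3, 32, 32), dim_reduction_type='avg_pooling',
                                num_output_classes=10, num_filters=64, num_layers=4, use_bias=False)
preds = conv_net.forward(torch.zeros((100, 3, 32, 32)))  # preds has shape (100, 10)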
@ -1,242 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Introduction\n",
|
||||
"\n",
|
||||
"## Getting started with Jupyter notebooks\n",
|
||||
"\n",
|
||||
"The majority of your work in this course will be done using Jupyter notebooks so we will here introduce some of the basics of the notebook system. If you are already comfortable using notebooks or just would rather get on with some coding feel free to [skip straight to the exercises below](#Exercises).\n",
|
||||
"\n",
|
||||
"*Note: Jupyter notebooks are also known as IPython notebooks. The Jupyter system now supports languages other than Python [hence the name was changed to make it more language agnostic](https://ipython.org/#jupyter-and-the-future-of-ipython) however IPython notebook is still commonly used.*\n",
|
||||
"\n",
|
||||
"### Jupyter basics: the server, dashboard and kernels\n",
|
||||
"\n",
|
||||
"In launching this notebook you will have already come across two of the other key components of the Jupyter system - the notebook *server* and *dashboard* interface.\n",
|
||||
"\n",
|
||||
"We began by starting a notebook server instance in the terminal by running\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"jupyter notebook\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"This will have begun printing a series of log messages to terminal output similar to\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"$ jupyter notebook\n",
|
||||
"[I 08:58:24.417 NotebookApp] Serving notebooks from local directory: ~/mlpractical\n",
|
||||
"[I 08:58:24.417 NotebookApp] 0 active kernels\n",
|
||||
"[I 08:58:24.417 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"The last message included here indicates the URL the application is being served at. The default behaviour of the `jupyter notebook` command is to open a tab in a web browser pointing to this address after the server has started up. The server can be launched without opening a browser window by running `jupyter notebook --no-browser`. This can be useful for example when running a notebook server on a remote machine over SSH. Descriptions of various other command options can be found by displaying the command help page using\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"jupyter notebook --help\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"While the notebook server is running it will continue printing log messages to terminal it was started from. Unless you detach the process from the terminal session you will need to keep the session open to keep the notebook server alive. If you want to close down a running server instance from the terminal you can use `Ctrl+C` - this will bring up a confirmation message asking you to confirm you wish to shut the server down. You can either enter `y` or skip the confirmation by hitting `Ctrl+C` again.\n",
|
||||
"\n",
|
||||
"When the notebook application first opens in your browser you are taken to the notebook *dashboard*. This will appear something like this\n",
|
||||
"\n",
|
||||
"<img src='res/jupyter-dashboard.png' />\n",
|
||||
"\n",
|
||||
"The dashboard above is showing the `Files` tab, a list of files in the directory the notebook server was launched from. We can navigate in to a sub-directory by clicking on a directory name and back up to the parent directory by clicking the `..` link. An important point to note is that the top-most level that you will be able to navigate to is the directory you run the server from. This is a security feature and generally you should try to limit the access the server has by launching it in the highest level directory which gives you access to all the files you need to work with.\n",
|
||||
"\n",
|
||||
"As well as allowing you to launch existing notebooks, the `Files` tab of the dashboard also allows new notebooks to be created using the `New` drop-down on the right. It can also perform basic file-management tasks such as renaming and deleting files (select a file by checking the box alongside it to bring up a context menu toolbar).\n",
|
||||
"\n",
|
||||
"In addition to opening notebook files, we can also edit text files such as `.py` source files, directly in the browser by opening them from the dashboard. The in-built text-editor is less-featured than a full IDE but is useful for quick edits of source files and previewing data files.\n",
|
||||
"\n",
|
||||
"The `Running` tab of the dashboard gives a list of the currently running notebook instances. This can be useful to keep track of which notebooks are still running and to shutdown (or reopen) old notebook processes when the corresponding tab has been closed.\n",
|
||||
"\n",
|
||||
"### The notebook interface\n",
|
||||
"\n",
|
||||
"The top of your notebook window should appear something like this:\n",
|
||||
"\n",
|
||||
"<img src='res/jupyter-notebook-interface.png' />\n",
|
||||
"\n",
|
||||
"The name of the current notebook is displayed at the top of the page and can be edited by clicking on the text of the name. Displayed alongside this is an indication of the last manual *checkpoint* of the notebook file. On-going changes are auto-saved at regular intervals; the check-point mechanism is mainly meant as a way to recover an earlier version of a notebook after making unwanted changes. Note the default system only currently supports storing a single previous checkpoint despite the `Revert to checkpoint` dropdown under the `File` menu perhaps suggesting otherwise.\n",
|
||||
"\n",
|
||||
"As well as having options to save and revert to checkpoints, the `File` menu also allows new notebooks to be created in same directory as the current notebook, a copy of the current notebook to be made and the ability to export the current notebook to various formats.\n",
|
||||
"\n",
|
||||
"The `Edit` menu contains standard clipboard functions as well as options for reorganising notebook *cells*. Cells are the basic units of notebooks, and can contain formatted text like the one you are reading at the moment or runnable code as we will see below. The `Edit` and `Insert` drop down menus offer various options for moving cells around the notebook, merging and splitting cells and inserting new ones, while the `Cell` menu allow running of code cells and changing cell types.\n",
|
||||
"\n",
|
||||
"The `Kernel` menu offers some useful commands for managing the Python process (kernel) running in the notebook. In particular it provides options for interrupting a busy kernel (useful for example if you realise you have set a slow code cell running with incorrect parameters) and to restart the current kernel. This will cause all variables currently defined in the workspace to be lost but may be necessary to get the kernel back to a consistent state after polluting the namespace with lots of global variables or when trying to run code from an updated module and `reload` is failing to work. \n",
|
||||
"\n",
|
||||
"To the far right of the menu toolbar is a kernel status indicator. When a dark filled circle is shown this means the kernel is currently busy and any further code cell run commands will be queued to happen after the currently running cell has completed. An open status circle indicates the kernel is currently idle.\n",
|
||||
"\n",
|
||||
"The final row of the top notebook interface is the notebook toolbar which contains shortcut buttons to some common commands such as clipboard actions and cell / kernel management. If you are interested in learning more about the notebook user interface you may wish to run through the `User Interface Tour` under the `Help` menu drop down.\n",
|
||||
"\n",
|
||||
"### Markdown cells: easy text formatting\n",
|
||||
"\n",
|
||||
"This entire introduction has been written in what is termed a *Markdown* cell of a notebook. [Markdown](https://en.wikipedia.org/wiki/Markdown) is a lightweight markup language intended to be readable in plain-text. As you may wish to use Markdown cells to keep your own formatted notes in notebooks, a small sampling of the formatting syntax available is below (escaped mark-up on top and corresponding rendered output below that); there are many much more extensive syntax guides - for example [this cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet).\n",
|
||||
"\n",
|
||||
"---\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"## Level 2 heading\n",
|
||||
"### Level 3 heading\n",
|
||||
"\n",
|
||||
"*Italicised* and **bold** text.\n",
|
||||
"\n",
|
||||
" * bulleted\n",
|
||||
" * lists\n",
|
||||
" \n",
|
||||
"and\n",
|
||||
"\n",
|
||||
" 1. enumerated\n",
|
||||
" 2. lists\n",
|
||||
"\n",
|
||||
"Inline maths $y = mx + c$ using [MathJax](https://www.mathjax.org/) as well as display style\n",
|
||||
"\n",
|
||||
"$$ ax^2 + bx + c = 0 \\qquad \\Rightarrow \\qquad x = \\frac{-b \\pm \\sqrt{b^2 - 4ac}}{2a} $$\n",
|
||||
"```\n",
|
||||
"---\n",
|
||||
"\n",
|
||||
"## Level 2 heading\n",
|
||||
"### Level 3 heading\n",
|
||||
"\n",
|
||||
"*Italicised* and **bold** text.\n",
|
||||
"\n",
|
||||
" * bulleted\n",
|
||||
" * lists\n",
|
||||
" \n",
|
||||
"and\n",
|
||||
"\n",
|
||||
" 1. enumerated\n",
|
||||
" 2. lists\n",
|
||||
"\n",
|
||||
"Inline maths $y = mx + c$ using [MathJax]() as well as display maths\n",
|
||||
"\n",
|
||||
"$$ ax^2 + bx + c = 0 \\qquad \\Rightarrow \\qquad x = \\frac{-b \\pm \\sqrt{b^2 - 4ac}}{2a} $$\n",
|
||||
"\n",
|
||||
"---\n",
|
||||
"\n",
|
||||
"We can also directly use HTML tags in Markdown cells to embed rich content such as images and videos.\n",
|
||||
"\n",
|
||||
"---\n",
|
||||
"```\n",
|
||||
"<img src=\"http://placehold.it/350x150\" />\n",
|
||||
"```\n",
|
||||
"---\n",
|
||||
"\n",
|
||||
"<img src=\"http://placehold.it/350x150\" />\n",
|
||||
"\n",
|
||||
"---\n",
|
||||
"\n",
|
||||
" \n",
|
||||
"### Code cells: in browser code execution\n",
|
||||
"\n",
|
||||
"Up to now we have not seen any runnable code. An example of a executable code cell is below. To run it first click on the cell so that it is highlighted, then either click the <i class=\"fa-step-forward fa\"></i> button on the notebook toolbar, go to `Cell > Run Cells` or use the keyboard shortcut `Ctrl+Enter`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from __future__ import print_function\n",
|
||||
"import sys\n",
|
||||
"\n",
|
||||
"print('Hello world!')\n",
|
||||
"print('Alarming hello!', file=sys.stderr)\n",
|
||||
"print('Hello again!')\n",
|
||||
"'And again!'"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This example shows the three main components of a code cell.\n",
|
||||
"\n",
|
||||
"The most obvious is the input area. This (unsuprisingly) is used to enter the code to be run which will be automatically syntax highlighted.\n",
|
||||
"\n",
|
||||
"To the immediate left of the input area is the execution indicator / counter. Before a code cell is first run this will display `In [ ]:`. After the cell is run this is updated to `In [n]:` where `n` is a number corresponding to the current execution counter which is incremented whenever any code cell in the notebook is run. This can therefore be used to keep track of the relative order in which cells were last run. There is no fundamental requirement to run cells in the order they are organised in the notebook, though things will usually be more readable if you keep things in roughly in order!\n",
|
||||
"\n",
|
||||
"Immediately below the input area is the output area. This shows any output produced by the code in the cell. This is dealt with a little bit confusingly in the current Jupyter version. At the top any output to [`stdout`](https://en.wikipedia.org/wiki/Standard_streams#Standard_output_.28stdout.29) is displayed. Immediately below that output to [`stderr`](https://en.wikipedia.org/wiki/Standard_streams#Standard_error_.28stderr.29) is displayed. All of the output to `stdout` is displayed together even if there has been output to `stderr` between as shown by the suprising ordering in the output here. \n",
|
||||
"\n",
|
||||
"The final part of the output area is the *display* area. By default this will just display the returned output of the last Python statement as would usually be the case in a (I)Python interpreter run in a terminal. What is displayed for a particular object is by default determined by its special `__repr__` method e.g. for a string it is just the quote enclosed value of the string itself.\n",
|
||||
"\n",
|
||||
"### Useful keyboard shortcuts\n",
|
||||
"\n",
|
||||
"There are a wealth of keyboard shortcuts available in the notebook interface. For an exhaustive list see the `Keyboard Shortcuts` option under the `Help` menu. We will cover a few of those we find most useful below.\n",
|
||||
"\n",
|
||||
"Shortcuts come in two flavours: those applicable in *command mode*, active when no cell is currently being edited and indicated by a blue highlight around the current cell; those applicable in *edit mode* when the content of a cell is being edited, indicated by a green current cell highlight.\n",
|
||||
"\n",
|
||||
"In edit mode of a code cell, two of the more generically useful keyboard shortcuts are offered by the `Tab` key.\n",
|
||||
"\n",
|
||||
" * Pressing `Tab` a single time while editing code will bring up suggested completions of what you have typed so far. This is done in a scope aware manner so for example typing `a` + `[Tab]` in a code cell will come up with a list of objects beginning with `a` in the current global namespace, while typing `np.a` + `[Tab]` (assuming `import numpy as np` has been run already) will bring up a list of objects in the root NumPy namespace beginning with `a`.\n",
|
||||
" * Pressing `Shift+Tab` once immediately after opening parenthesis of a function or method will cause a tool-tip to appear with the function signature (including argument names and defaults) and its docstring. Pressing `Shift+Tab` twice in succession will cause an expanded version of the same tooltip to appear, useful for longer docstrings. Pressing `Shift+Tab` four times in succession will cause the information to be instead displayed in a pager docked to bottom of the notebook interface which stays attached even when making further edits to the code cell and so can be useful for keeping documentation visible when editing e.g. to help remember the name of arguments to a function and their purposes.\n",
|
||||
"\n",
|
||||
"A series of useful shortcuts available in both command and edit mode are `[modifier]+Enter` where `[modifier]` is one of `Ctrl` (run selected cell), `Shift` (run selected cell and select next) or `Alt` (run selected cell and insert a new cell after).\n",
|
||||
"\n",
|
||||
"A useful command mode shortcut to know about is the ability to toggle line numbers on and off for a cell by pressing `L` which can be useful when trying to diagnose stack traces printed when an exception is raised or when referring someone else to a section of code.\n",
|
||||
" \n",
|
||||
"### Magics\n",
|
||||
"\n",
|
||||
"There are a range of *magic* commands in IPython notebooks, than provide helpful tools outside of the usual Python syntax. A full list of the inbuilt magic commands is given [here](http://ipython.readthedocs.io/en/stable/interactive/magics.html), however three that are particularly useful for this course:\n",
|
||||
"\n",
|
||||
" * [`%%timeit`](http://ipython.readthedocs.io/en/stable/interactive/magics.html?highlight=matplotlib#magic-timeit) Put at the beginning of a cell to time its execution and print the resulting timing statistics.\n",
|
||||
" * [`%precision`](http://ipython.readthedocs.io/en/stable/interactive/magics.html?highlight=matplotlib#magic-precision) Set the precision for pretty printing of floating point values and NumPy arrays.\n",
|
||||
" * [`%debug`](http://ipython.readthedocs.io/en/stable/interactive/magics.html?highlight=matplotlib#magic-debug) Activates the interactive debugger in a cell. Run after an exception has been occured to help diagnose the issue.\n",
|
||||
" \n",
|
||||
"### Plotting with `matplotlib`\n",
|
||||
"\n",
|
||||
"When setting up your environment one of the dependencies we asked you to install was `matplotlib`. This is an extensive plotting and data visualisation library which is tightly integrated with NumPy and Jupyter notebooks.\n",
|
||||
"\n",
|
||||
"When using `matplotlib` in a notebook you should first run the [magic command](http://ipython.readthedocs.io/en/stable/interactive/magics.html?highlight=matplotlib)\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"%matplotlib inline\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"This will cause all plots to be automatically displayed as images in the output area of the cell they are created in. Below we give a toy example of plotting two sinusoids using `matplotlib` to show case some of the basic plot options. To see the output produced select the cell and then run it."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%matplotlib inline\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"import numpy as np\n",
|
||||
"\n",
|
||||
"# generate a pair of sinusoids\n",
|
||||
"x = np.linspace(0., 2. * np.pi, 100)\n",
|
||||
"y1 = np.sin(x)\n",
|
||||
"y2 = np.cos(x)\n",
|
||||
"\n",
|
||||
"# produce a new figure object with a defined (width, height) in inches\n",
|
||||
"fig = plt.figure(figsize=(8, 4))\n",
|
||||
"# add a single axis to the figure\n",
|
||||
"ax = fig.add_subplot(111)\n",
|
||||
"# plot the two sinusoidal traces on the axis, adjusting the line width\n",
|
||||
"# and adding LaTeX legend labels\n",
|
||||
"ax.plot(x, y1, linewidth=2, label=r'$\\sin(x)$')\n",
|
||||
"ax.plot(x, y2, linewidth=2, label=r'$\\cos(x)$')\n",
|
||||
"# set the axis labels\n",
|
||||
"ax.set_xlabel('$x$', fontsize=16)\n",
|
||||
"ax.set_ylabel('$y$', fontsize=16)\n",
|
||||
"# force the legend to be displayed\n",
|
||||
"ax.legend()\n",
|
||||
"# adjust the limits of the horizontal axis\n",
|
||||
"ax.set_xlim(0., 2. * np.pi)\n",
|
||||
"# make a grid be displayed in the axis background\n",
|
||||
"ax.grid(True)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"language_info": {
|
||||
"name": "python"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
@ -1,65 +0,0 @@
|
||||
\documentclass[tikz]{standalone}
|
||||
|
||||
\usepackage{amsmath}
|
||||
\usepackage{tikz}
|
||||
\usetikzlibrary{arrows}
|
||||
\usetikzlibrary{calc}
|
||||
\usepackage{ifthen}
|
||||
|
||||
\newcommand{\vct}[1]{\boldsymbol{#1}}
|
||||
\newcommand{\pd}[2]{\frac{\partial #1}{\partial #2}}
|
||||
|
||||
\tikzstyle{fprop} = [draw,fill=blue!20,minimum size=2em,align=center]
|
||||
\tikzstyle{bprop} = [draw,fill=red!20,minimum size=2em,align=center]
|
||||
|
||||
\begin{document}
|
||||
|
||||
\begin{tikzpicture}[xscale=1.75] %
|
||||
% define number of layers
|
||||
\def\nl{2};
|
||||
% model input
|
||||
\node at (0, 0) (input) {$\vct{x}$};
|
||||
% draw fprop through model layers
|
||||
\foreach \l in {0,...,\nl} {
|
||||
\node[fprop] at (2 * \l + 1, 0) (fprop\l) {\texttt{layers[\l]} \\ \texttt{.fprop}};
|
||||
\ifthenelse{\l > 0}{
|
||||
\node at (2 * \l, 0) (hidden\l) {$\vct{h}_\l$};
|
||||
\draw[->] (hidden\l) -- (fprop\l);
|
||||
\draw[->] let \n1={\l - 1} in (fprop\n1) -- (hidden\l);
|
||||
}{
|
||||
\draw[->] (input) -- (fprop\l);
|
||||
}
|
||||
}
|
||||
% model output
|
||||
\node at (2 * \nl + 2, 0) (output) {$\mathbf{y}$};
|
||||
% error function
|
||||
\node[fprop] at (2 * \nl + 3, 0) (errorfunc) {\texttt{error}};
|
||||
% error value
|
||||
\node at (2 * \nl + 3, -1) (error) {$\bar{E}$};
|
||||
% targets
|
||||
\node at (2 * \nl + 4, -1) (tgt) {$\vct{t}$};
|
||||
% error gradient
|
||||
\node[bprop] at (2 * \nl + 3, -2) (errorgrad) {\texttt{error} \\ \texttt{.grad}};
|
||||
% gradient wrt outputs
|
||||
\node at (2 * \nl + 2, -2) (gradoutput) {$\pd{\bar{E}}{\vct{y}}$};
|
||||
\draw[->] (fprop\nl) -- (output);
|
||||
\draw[->] (output) -- (errorfunc);
|
||||
\draw[->] (errorfunc) -- (error);
|
||||
\draw[->] (error) -- (errorgrad);
|
||||
\draw[->] (errorgrad) -- (gradoutput);
|
||||
\draw[->] (tgt) |- (errorfunc);
|
||||
\draw[->] (tgt) |- (errorgrad);
|
||||
\foreach \l in {0,...,\nl} {
|
||||
\node[bprop] at (2 * \l + 1, -2) (bprop\l) {\texttt{layers[\l]} \\ \texttt{.bprop}};
|
||||
\ifthenelse{\l > 0}{
|
||||
\node at (2 * \l, -2) (grad\l) {$\pd{\bar{E}}{\vct{h}_\l}$};
|
||||
\draw[<-] (grad\l) -- (bprop\l);
|
||||
\draw[<-] let \n1={\l - 1} in (bprop\n1) -- (grad\l);
|
||||
}{}
|
||||
}
|
||||
\node at (0, -2) (gradinput) {$\pd{\bar{E}}{\vct{x}}$};
|
||||
\draw[->] (bprop0) -- (gradinput);
|
||||
\draw[->] (gradoutput) -- (bprop\nl);
|
||||
\end{tikzpicture}
|
||||
|
||||
\end{document}
|
@ -16,8 +16,10 @@ Conda can handle installation of the Python libraries we will be using and all t
|
||||
|
||||
There are several options available for installing Conda on a system. Here we will use the Python 3 version of [Miniconda](http://conda.pydata.org/miniconda.html), which installs just Conda and its dependencies. An alternative is to install the [Anaconda Python distribution](https://docs.continuum.io/anaconda/), which installs Conda and a large selection of popular Python packages. As we will require only a small subset of these packages we will use the more barebones Miniconda to avoid eating into your DICE disk quota too much, however if installing on a personal machine you may wish to consider Anaconda if you want to explore other Python packages.
|
||||
|
||||
|
||||
## 2. Installing Miniconda
|
||||
|
||||
|
||||
We provide instructions here for getting an environment with all the required dependencies running on computers running
|
||||
the School of Informatics [DICE desktop](http://computing.help.inf.ed.ac.uk/dice-platform). The same instructions
|
||||
should be usable on other Linux distributions such as Ubuntu and Linux Mint with minimal adjustments.
|
||||
@ -32,7 +34,7 @@ If you are using ssh connection to the student server, move to the next step. If
|
||||
|
||||
We first need to download the latest 64-bit Python 3 Miniconda install script:
|
||||
|
||||
```bash
|
||||
```
|
||||
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
|
||||
```
|
||||
|
||||
@ -40,7 +42,7 @@ This uses `wget` a command-line tool for downloading files.
|
||||
|
||||
Now run the install script:
|
||||
|
||||
```bash
|
||||
```
|
||||
bash Miniconda3-latest-Linux-x86_64.sh
|
||||
```
|
||||
|
||||
@ -54,14 +56,14 @@ definition in `.bashrc`. As the DICE bash start-up mechanism differs from the st
|
||||
|
||||
On DICE, append the Miniconda binaries directory to `PATH` manually in `~/.benv` using
|
||||
|
||||
```bash
|
||||
```
|
||||
echo ". /afs/inf.ed.ac.uk/user/${USER:0:3}/$USER/miniconda3/etc/profile.d/conda.sh" >> ~/.bashrc
|
||||
echo ". /afs/inf.ed.ac.uk/user/${USER:0:3}/$USER/miniconda3/etc/profile.d/conda.sh" >> ~/.benv
|
||||
```
|
||||
|
||||
To avoid any errors later, check both the `.bashrc` and `.benv` files for the correct file path by running:
|
||||
|
||||
```bash
|
||||
```
|
||||
vim ~/.bashrc
vim ~/.benv
|
||||
```
|
||||
|
||||
@ -69,43 +71,43 @@ For those who this appears a bit opaque to and want to know what is going on see
|
||||
|
||||
We now need to `source` the updated `~/.benv` so that the `PATH` variable in the current terminal session is updated:
|
||||
|
||||
```bash
|
||||
```
|
||||
source ~/.benv
|
||||
```
|
||||
|
||||
From the next time you log in all future terminal sessions should have conda readily available via:
|
||||
|
||||
```bash
|
||||
```
|
||||
conda activate
|
||||
```
|
||||
|
||||
|
||||
## 3. Creating the Conda environment
|
||||
|
||||
You should now have a working Conda installation. If you run
|
||||
|
||||
```bash
|
||||
```
|
||||
conda --help
|
||||
```
|
||||
|
||||
From a terminal you should see the Conda help page displayed. If you get a `No command 'conda' found` error you should check you have set up your `PATH` variable correctly (you can get a demonstrator to help you do this).
|
||||
from a terminal you should see the Conda help page displayed. If you get a `No command 'conda' found` error you should check you have set up your `PATH` variable correctly (you can get a demonstrator to help you do this).
|
||||
|
||||
Assuming Conda is working, we will now create our Conda environment:
|
||||
|
||||
```bash
|
||||
conda create -n mlp python=3.12.5 -y
|
||||
```
|
||||
conda create -n mlp python=3
|
||||
```
|
||||
|
||||
This bootstraps a new Conda environment named `mlp` with a minimal Python 3 install.
|
||||
This bootstraps a new Conda environment named `mlp` with a minimal Python 3 install. You will be presented with a 'package plan' listing the packages to be installed and asked whether to proceed: type `y` then enter.
|
||||
|
||||
We will now *activate* our created environment:
|
||||
|
||||
```bash
|
||||
```
|
||||
conda activate mlp
|
||||
```
|
||||
|
||||
or on Windows only
|
||||
|
||||
```bash
|
||||
```
|
||||
activate mlp
|
||||
```
|
||||
|
||||
@ -117,33 +119,31 @@ If you wish to deactivate an environment loaded in the current terminal e.g. to
|
||||
|
||||
We will now install the dependencies for the course into the new environment:
|
||||
|
||||
```bash
|
||||
conda install numpy scipy matplotlib jupyter -y
|
||||
```
|
||||
conda install numpy scipy matplotlib jupyter
|
||||
```
|
||||
|
||||
Wait for the packages to install (this should take around five minutes). In addition to Jupyter, NumPy and SciPy which we have already mentioned, we are also installing [matplotlib](http://matplotlib.org/) a plotting and visualisation library.
|
||||
Again you will be given a list of the packages to be installed and asked to confirm whether to proceed. Enter `y` then wait for the packages to install (this should take around five minutes). In addition to Jupyter, NumPy and SciPy which we have already mentioned, we are also installing [matplotlib](http://matplotlib.org/) a plotting and visualisation library.
|
||||
|
||||
Install PyTorch. The command below installs the CPU-only version of PyTorch. If you have access to a CUDA-enabled GPU and wish to install the GPU version of PyTorch instead, replace `cpuonly -c pytorch` with your CUDA version reference, e.g. for CUDA 11.7 use `pytorch-cuda=11.7 -c pytorch -c nvidia` in the command below. For more information see [here](https://pytorch.org/get-started/locally/).
|
||||
|
||||
```bash
|
||||
conda install pytorch torchvision torchaudio cpuonly -c pytorch -y
|
||||
```
|
||||
conda install pytorch torchvision torchaudio cpuonly -c pytorch
|
||||
```
|
||||
|
||||
Once the installation is finished, to recover some disk space we can clear the package tarballs Conda just downloaded:
|
||||
|
||||
```bash
|
||||
conda clean -t -y
|
||||
```
|
||||
conda clean -t
|
||||
```
|
||||
|
||||
These tarballs are usually cached to allow quicker installation into additional environments however we will only be using a single environment here so there is no need to keep them on disk.
|
||||
|
||||
***ANLP and IAML students only:***
|
||||
To have normal access to your ANLP and IAML environments please do the following:
|
||||
|
||||
1. ```nano .condarc```
|
||||
2. Add the following lines in the file:
|
||||
|
||||
```yml
|
||||
```
|
||||
envs_dirs:
|
||||
- /group/teaching/conda/envs
|
||||
|
||||
@ -151,7 +151,6 @@ pkgs_dirs:
|
||||
- /group/teaching/conda/pkgs
|
||||
- ~/miniconda3/pkgs
|
||||
```
|
||||
|
||||
3. Exit by using control + x and then choosing 'yes' at the exit prompt.
|
||||
|
||||
## 4. Getting the course code and a short introduction to Git
|
||||
@ -168,7 +167,7 @@ https://github.com/VICO-UoE/mlpractical
|
||||
|
||||
Git is installed by default on DICE desktops. If you are running a system which does not have Git installed, you can use Conda to install it in your environment using:
|
||||
|
||||
```bash
|
||||
```
|
||||
conda install git
|
||||
```
|
||||
|
||||
@ -189,15 +188,16 @@ If you are already familiar with Git you may wish to skip over the explanatory s
|
||||
|
||||
By default we will assume here you are cloning to your home directory however if you have an existing system for organising your workspace feel free to keep to that. **If you clone the repository to a path other than `~/mlpractical` however you will need to adjust all references to `~/mlpractical` in the commands below accordingly.**
|
||||
|
||||
|
||||
To clone the `mlpractical` repository to the home directory run
|
||||
|
||||
```bash
|
||||
```
|
||||
git clone https://github.com/VICO-UoE/mlpractical.git ~/mlpractical
|
||||
```
|
||||
|
||||
This will create a new `mlpractical` subdirectory with a local copy of the repository in it. Enter the directory and list all its contents, including hidden files, by running:
|
||||
|
||||
```bash
|
||||
```
|
||||
cd ~/mlpractical
|
||||
ls -a # Windows equivalent: dir /a
|
||||
```
|
||||
@ -210,9 +210,10 @@ For the most part this will look much like any other directory, with there being
|
||||
|
||||
Additionally there exists a hidden `.git` subdirectory (on Unix systems by default files and directories prepended with a period '.' are hidden). This directory contains the repository history database and various configuration files and references. Unless you are sure you know what you are doing you generally should not edit any of the files in this directory directly. Generally most configuration options can be enacted more safely using a `git config` command.
|
||||
|
||||
|
||||
For instance to globally set the user name and email used in commits you can run:
|
||||
|
||||
```bash
|
||||
```
|
||||
git config --global user.name "[your name]"
|
||||
git config --global user.email "[matric-number]@sms.ed.ac.uk"
|
||||
```
|
||||
@ -235,19 +236,19 @@ A *commit* in Git is a snapshot of the state of the project. The snapshots are r
|
||||
|
||||
2. The files with changes to be committed (including any new files) are added to the *staging area* by running:
|
||||
|
||||
```bash
|
||||
```
|
||||
git add file1 file2 ...
|
||||
```
|
||||
|
||||
3. Finally the *staged changes* are used to create a new commit by running
|
||||
|
||||
```bash
|
||||
```
|
||||
git commit -m "A commit message describing the changes."
|
||||
```
|
||||
|
||||
This writes the staged changes as a new commit in the repository history. We can see a log of the details of previous commits by running:
|
||||
|
||||
```bash
|
||||
```
|
||||
git log
|
||||
```
|
||||
|
||||
@ -259,17 +260,17 @@ A new branch is created from a commit on an existing branch. Any commits made to
|
||||
|
||||
A typical Git workflow in a software development setting would be to create a new branch whenever making changes to a project, for example to fix a bug or implement a new feature. These changes are then isolated from the main code base, allowing regular commits without worrying about making unstable changes to the main code base. Key to this workflow is the ability to *merge* commits from a branch into another branch, e.g. when it is decided a new feature is sufficiently developed to be added to the main code base. Although merging branches is a key aspect of using Git in many projects, dealing with merge conflicts when two branches both make changes to the same parts of files can be a somewhat tricky process, so we will here generally try to avoid the need for merges.
|
||||
|
||||
We will therefore use branches here in a slightly non-standard way. The code for each week's lab and for each of the assignments will be maintained in a separate branch. This will allow us to stage the release of the notebooks and code for each lab and assignment while allowing you to commit the changes you make to the code each week without having to merge those changes when new code is released. Similarly this structure will allow us to release updated notebooks from previous labs with proposed solutions without overwriting your own work.
|
||||
<p id='branching-explanation'>We will therefore use branches here in a slightly non-standard way. The code for each week's lab and for each of the assignments will be maintained in a separate branch. This will allow us to stage the release of the notebooks and code for each lab and assignment while allowing you to commit the changes you make to the code each week without having to merge those changes when new code is released. Similarly this structure will allow us to release updated notebooks from previous labs with proposed solutions without overwriting your own work.</p>
|
||||
|
||||
To list the branches present in the local repository, run:
|
||||
|
||||
```bash
|
||||
```
|
||||
git branch
|
||||
```
|
||||
|
||||
This will display a list of branches with a `*` next to the current branch. To switch to a different existing branch in the local repository run
|
||||
|
||||
```bash
|
||||
```
|
||||
git checkout branch-name
|
||||
```
|
||||
|
||||
@ -277,8 +278,8 @@ This will change the code in the working directory to the current state of the c
|
||||
|
||||
You should make sure you are on the first lab branch now by running:
|
||||
|
||||
```bash
|
||||
git checkout mlp2024-25/lab1
|
||||
```
|
||||
git checkout mlp2023-24/lab1
|
||||
```
|
||||
|
||||
## 6. Installing the `mlp` Python package
|
||||
@ -291,7 +292,7 @@ The standard way to install a Python package using a `setup.py` script is to run
|
||||
|
||||
As we will be updating the code in the `mlp` package during the course of the labs this would require you to re-run `python setup.py install` every time a change is made to the package. Instead therefore you should install the package in development mode by running:
|
||||
|
||||
```bash
|
||||
```
|
||||
python setup.py develop
|
||||
```
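As an aside (not part of the official setup instructions), the same development-mode install can usually be achieved with pip's editable mode, assuming pip is available in the activated `mlp` environment:

```bash
# Hedged alternative to `python setup.py develop`: an editable install with pip.
pip install -e ~/mlpractical
```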
|
||||
|
||||
@ -303,20 +304,20 @@ Instead of copying the package, this will instead create a symbolic link to the
|
||||
|
||||
Note that after the first time a Python module is loaded into an interpreter instance, using for example:
|
||||
|
||||
```python
|
||||
```
|
||||
import mlp
|
||||
```
|
||||
|
||||
Running the `import` statement any further times will have no effect even if the underlying module code has been changed. To reload an already imported module we instead need to use the [`importlib.reload`](https://docs.python.org/3/library/importlib.html#importlib.reload) function, e.g.
|
||||
|
||||
```python
|
||||
```
|
||||
import importlib
|
||||
importlib.reload(mlp)
|
||||
```
|
||||
|
||||
**Note: To be clear as this has caused some confusion in previous labs the above `import ...` / `reload(...)` statements should NOT be run directly in a bash terminal. They are examples Python statements - you could run them in a terminal by first loading a Python interpreter using:**
|
||||
|
||||
```bash
|
||||
```
|
||||
python
|
||||
```
|
||||
|
||||
@ -330,7 +331,7 @@ We observed previously the presence of a `data` subdirectory in the local reposi
|
||||
|
||||
Assuming you used the recommended Miniconda install location and cloned the `mlpractical` repository to your home directory, this variable can be automatically defined when activating the environment by running the following commands (on non-Windows systems):
|
||||
|
||||
```bash
|
||||
```
|
||||
cd ~/miniconda3/envs/mlp
|
||||
mkdir -p ./etc/conda/activate.d
|
||||
mkdir -p ./etc/conda/deactivate.d
|
||||
@ -343,12 +344,12 @@ export MLP_DATA_DIR=$HOME/mlpractical/data
|
||||
|
||||
And on Windows systems (replacing the `[]` placeholders with the relevant paths):
|
||||
|
||||
```bash
|
||||
```
|
||||
cd [path-to-conda-root]\envs\mlp
|
||||
mkdir .\etc\conda\activate.d
|
||||
mkdir .\etc\conda\deactivate.d
|
||||
echo "set MLP_DATA_DIR=[path-to-local-repository]\data" >> .\etc\conda\activate.d\env_vars.bat
|
||||
echo "set MLP_DATA_DIR=" >> .\etc\conda\deactivate.d\env_vars.bat
|
||||
@echo "set MLP_DATA_DIR=[path-to-local-repository]\data" >> .\etc\conda\activate.d\env_vars.bat
|
||||
@echo "set MLP_DATA_DIR=" >> .\etc\conda\deactivate.d\env_vars.bat
|
||||
set MLP_DATA_DIR=[path-to-local-repository]\data
|
||||
```
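As a quick sanity check (a sketch; the exact path depends on where you cloned the repository), re-activate the environment so the `activate.d` script runs and then inspect the variable (shown here for non-Windows systems):

```bash
conda activate mlp
echo $MLP_DATA_DIR   # should print the path to your local mlpractical/data directory
```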
|
||||
|
||||
@ -362,7 +363,7 @@ There will be a Jupyter notebook available for each lab and assignment in this c
|
||||
|
||||
To open a notebook, you first need to launch a Jupyter notebook server instance. From within the `mlpractical` directory containing your local copy of the repository (and with the `mlp` environment activated) run:
|
||||
|
||||
```bash
|
||||
```
|
||||
jupyter notebook
|
||||
```
|
||||
|
||||
@ -378,13 +379,13 @@ Below are instructions for setting up the environment without additional explana
|
||||
|
||||
Start a new bash terminal. Download the latest 64-bit Python 3.9 Miniconda install script:
|
||||
|
||||
```bash
|
||||
```
|
||||
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
|
||||
```
|
||||
|
||||
Run the install script:
|
||||
|
||||
```bash
|
||||
```
|
||||
bash Miniconda3-latest-Linux-x86_64.sh
|
||||
```
|
||||
|
||||
@ -393,70 +394,69 @@ Review the software license agreement and choose whether to accept. Assuming you
|
||||
You will then be asked whether to prepend the Miniconda binaries directory to the `PATH` system environment variable definition in `.bashrc`. You should respond `no` here as we will set up the addition to `PATH` manually in the next step.
|
||||
|
||||
Append the Miniconda binaries directory to `PATH` manually in `~/.benv`:
|
||||
|
||||
```bash
|
||||
```
|
||||
echo ". /afs/inf.ed.ac.uk/user/${USER:0:3}/$USER/miniconda3/etc/profile.d/conda.sh" >> ~/.bashrc
|
||||
echo ". /afs/inf.ed.ac.uk/user/${USER:0:3}/$USER/miniconda3/etc/profile.d/conda.sh" >> ~/.benv
|
||||
```
|
||||
|
||||
`source` the updated `~/.benv`:
|
||||
|
||||
```bash
|
||||
```
|
||||
source ~/.benv
|
||||
```
|
||||
|
||||
Create a new `mlp` Conda environment:
|
||||
|
||||
```bash
|
||||
conda create -n mlp python=3.12.5 -y
|
||||
```
|
||||
conda create -n mlp python=3
|
||||
```
|
||||
|
||||
Activate our created environment:
|
||||
|
||||
```bash
|
||||
```
|
||||
conda activate mlp
|
||||
```
|
||||
|
||||
Install the dependencies for the course into the new environment:
|
||||
|
||||
```bash
|
||||
conda install numpy scipy matplotlib jupyter -y
|
||||
```
|
||||
conda install numpy scipy matplotlib jupyter
|
||||
```
|
||||
|
||||
Install PyTorch. The command below installs the CPU-only version of PyTorch. If you have access to a CUDA-enabled GPU and wish to install the GPU version of PyTorch instead, replace `cpuonly -c pytorch` with your CUDA version reference, e.g. for CUDA 11.7 use `pytorch-cuda=11.7 -c pytorch -c nvidia` in the command below. For more information see [here](https://pytorch.org/get-started/locally/).
|
||||
|
||||
```bash
|
||||
conda install pytorch torchvision torchaudio cpuonly -c pytorch -y
|
||||
```
|
||||
conda install pytorch torchvision torchaudio cpuonly -c pytorch
|
||||
```
|
||||
|
||||
Clear the package tarballs Conda just downloaded:
|
||||
|
||||
```bash
|
||||
```
|
||||
conda clean -t
|
||||
```
|
||||
|
||||
Clone the course repository to your home directory:
|
||||
|
||||
```bash
|
||||
```
|
||||
git clone https://github.com/VICO-UoE/mlpractical.git ~/mlpractical
|
||||
```
|
||||
|
||||
Make sure we are on the first lab branch
|
||||
|
||||
```bash
|
||||
```
|
||||
cd ~/mlpractical
|
||||
git checkout mlp2024-25/lab1
|
||||
git checkout mlp2023-24/lab1
|
||||
```
|
||||
|
||||
Install the `mlp` package in the environment in develop mode
|
||||
|
||||
```bash
|
||||
```
|
||||
python ~/mlpractical/setup.py develop
|
||||
```
|
||||
|
||||
Add an `MLP_DATA_DIR` variable to the environment
|
||||
|
||||
```bash
|
||||
```
|
||||
cd ~/miniconda3/envs/mlp
|
||||
mkdir -p ./etc/conda/activate.d
|
||||
mkdir -p ./etc/conda/deactivate.d
|
||||
@ -469,13 +469,14 @@ export MLP_DATA_DIR=$HOME/mlpractical/data
|
||||
|
||||
The environment is now set up. Load the notebook server from the `mlpractical` directory
|
||||
|
||||
```bash
|
||||
```
|
||||
cd ~/mlpractical
|
||||
jupyter notebook
|
||||
```
|
||||
|
||||
and then open the first lab notebook from the `notebooks` directory.
|
||||
|
||||
|
||||
---
|
||||
|
||||
<b id="f1">[1]</b> The `echo` command causes the following text to be streamed to an output (standard terminal output by default). Here we use the append redirection operator `>>` to redirect the `echo` output to a file `~/.benv`, with it being appended to the end of the current file. The text actually added is `export PATH="$PATH:[your-home-directory]/miniconda/bin"` with the `\"` being used to escape the quote characters. The `export` command defines system-wide environment variables (more rigorously those inherited by child shells) with `PATH` being the environment variable defining where `bash` searches for executables as a colon-seperated list of directories. Here we add the Miniconda binary directory to the end of the current `PATH` definition. [↩](#a1)
|
BIN
notes/figures/boot_disk.png
Normal file
After Width: | Height: | Size: 60 KiB |
BIN
notes/figures/increase_quota.png
Normal file
After Width: | Height: | Size: 61 KiB |
BIN
notes/figures/vm_instance_configuration.png
Normal file
After Width: | Height: | Size: 77 KiB |
BIN
notes/figures/vm_instance_location.png
Normal file
After Width: | Height: | Size: 35 KiB |
175
notes/google_cloud_setup.md
Normal file
@ -0,0 +1,175 @@
|
||||
# Google Cloud Usage Tutorial
|
||||
|
||||
This document has been created to help you set up a Google Cloud instance to be used for the MLP course using the student credit the course has acquired.
|
||||
This document is non-exhaustive; much more useful information is available on the [Google Cloud documentation page](https://cloud.google.com/docs/).
|
||||
For any question you might have that is not covered here, a quick Google search should get you what you need; anything in the official Google Cloud docs should be very helpful.
|
||||
|
||||
| WARNING: Read these instructions carefully. You will be given $50 worth of credits and you will need to manage them properly. We will not be able to provide more credits. |
|
||||
| -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
|
||||
|
||||
### To create your account and start a project funded by the student credit
|
||||
|
||||
1. Login with your preferred gmail id to [google cloud console](https://cloud.google.com/). Click on `Console` (upper right corner), which would lead you to a new page and once there, click on Select a Project on the left hand side of the search bar on top of the page and then click on New Project on the right hand side of the Pop-Up.
|
||||
Name your project sxxxxxxx-MLPractical - replacing the sxxxxxxx with your student number. **Make sure you are on this project before following the next steps**.
|
||||
2. Get your coupon by following the instructions in the coupon retrieval link that you received.
|
||||
3. Once you receive your coupon, follow the email instructions to add your coupon to your account.
|
||||
4. Once you have added your coupon, join the [MLPractical GCP Google Group](https://groups.google.com/forum/#!forum/mlpractical_gcp) using the same Google account you used to redeem your coupon. This ensures access to the shared disk images.
|
||||
5. Make sure that the financial source for your project is the MLPractical credit. You can check this by going to the [Google Cloud Console](https://console.cloud.google.com/) and selecting your project. Then, click on the `Billing` tile. Once on the `Billing` page, you should be prompted to add the billing account if you haven't yet done so. Choose `Billing Account for Education` as your billing account. Then, under the billing account, click `account management` on the left-hand side tab. You should see your project under `Projects linked to this billing account`. If not, you can add it by clicking on `Add projects` and selecting your project from the list of available projects.
|
||||
|
||||
### To create an instance
|
||||
|
||||
1. On the console page, click the button with the three lines at the top left corner.
|
||||
2. In the ```Compute Engine``` sub-menu select ```VM Instances```.
|
||||
3. Enable ```Compute Engine API``` if prompted.
|
||||
4. Click the ```CREATE INSTANCE``` button at the top of the window.
|
||||
5. Click on ```VM FROM INSTANCE TEMPLATE```, and create your VM template for this coursework:
|
||||
6. Name the template ```mlpractical-1```.
|
||||
7. Select ```Regional``` as the location type and ```us-west1(Oregon)``` as the region.
|
||||
|
||||
![VM location](figures/vm_instance_location.png)
|
||||
|
||||
8. Under ```Machine Configuration```, select the ```GPU``` machine family. Select one NVIDIA T4. These are the cheapest ones; be careful, as others can cost up to 8 times more to run.
|
||||
9. Below, in ```Machine type```, under ```PRESET``` select ```n1-standard-2 (2 vCPU, 1 core, 7.5Gb memory)```.
|
||||
|
||||
![VM location](figures/vm_instance_configuration.png)
|
||||
|
||||
10. Under ```Boot disk```, click change.
|
||||
11. On the right-hand new menu that appears (under ```PUBLIC IMAGES```), select
|
||||
* ```Deep Learning on Linux``` operating system,
|
||||
* ```Deep Learning VM for PyTorch 2.0 with CUDA 11.8 M125```
|
||||
* **Note**: If the above version is not available, you can use any ```Deep Learning VM for PyTorch 2.0 with CUDA 11.8 M***``` instead.
|
||||
* ```Balanced persistent disk``` as boot disk type,
|
||||
* ```100```GB as disk size, and then click select at the bottom.
|
||||
|
||||
![Boot disk](figures/boot_disk.png)
|
||||
|
||||
12. Under ```Availability policies```, in the ```VM provisioning model``` drop down menu, select ```Spot```. Using this option will be helpful if you're running low on credits.
|
||||
13. You can ```Enable display device``` if you want to use a GUI. This is not necessary for the coursework.
|
||||
14. Leave other options as default and click ```CREATE```.
|
||||
15. Tick your newly created template and click ```CREATE VM``` (top centre).
|
||||
16. Click ```CREATE```. Your instance should be ready in a minute or two.
|
||||
15. If your instance failed to create due to the following error - ```The GPUS-ALL-REGIONS-per-project quota maximum has been exceeded. Current limit: 0.0. Metric: compute.googleapis.com/gpus_all_regions.```, click on ```REQUEST QUOTA``` in the notification.
|
||||
16. Tick ```Compute Engine API``` and then click ```EDIT QUOTAS``` (top right).
|
||||
|
||||
![VM location](figures/increase_quota.png)
|
||||
|
||||
17. This will open a box in the right side corner. Put your ```New Limit``` as ```1``` and in the description you can mention you need GPU for machine learning coursework.
|
||||
18. Click ```NEXT```, fill in your details and then click ```SUBMIT REQUEST```.
|
||||
19. You will receive a confirmation email with your Quota Limit increased. This may take some minutes.
|
||||
20. After the confirmation email, you can recheck the GPU(All Regions) Quota Limit being set to 1. This usually shows up in 10-15 minutes after the confirmation email.
|
||||
21. Retry making the VM instance again as before, by choosing your template, and you should have your instance now.
|
||||
|
||||
|
||||
#### Note
|
||||
Be careful to select 1 x T4 GPU (Others can be much more expensive).
|
||||
|
||||
You only have $50 worth of credit, which should be about 6 days of GPU usage on a T4.
|
||||
|
||||
|
||||
### To login into your instance via terminal:
|
||||
|
||||
1. Install `google-cloud-sdk` (or similarly named) package using your OS package manager
|
||||
2. To authorize the current machine to access your nodes, run ```gcloud auth login```. This will authenticate your Google account login.
|
||||
3. Follow the prompts to get a token for your current machine.
|
||||
4. Run ```gcloud config set project PROJECT_ID``` where you replace `PROJECT_ID` with your project ID. You can find that in the projects drop-down menu at the top of the Google Compute Engine window; this sets the current project as the active one. If you followed the above instructions, your project ID should be `sxxxxxxx-mlpractical`, where `sxxxxxxx` is your student number.
|
||||
5. In your compute engine window, in the line for the instance that you have started (`mlpractical-1`), click on the downward arrow next to ```SSH```. Choose ```View gcloud command```. Copy the command to your terminal and press enter. Make sure your VM is up and running before doing this.
|
||||
6. Don't add a password to the SSH key.
|
||||
7. On your first login, you will be asked if you want to install nvidia drivers, **DO NOT AGREE** and follow the nvidia drivers installation below.
|
||||
8. Install the R470 Nvidia driver by running the following commands:
|
||||
* Add "contrib" and "non-free" components to /etc/apt/sources.list
|
||||
```bash
|
||||
sudo tee -a /etc/apt/sources.list >/dev/null <<'EOF'
|
||||
deb http://deb.debian.org/debian/ bullseye main contrib non-free
|
||||
deb-src http://deb.debian.org/debian/ bullseye main contrib non-free
|
||||
EOF
|
||||
```
|
||||
* Check that the lines were well added by running:
|
||||
```bash
|
||||
cat /etc/apt/sources.list
|
||||
```
|
||||
* Update the list of available packages and install the nvidia-driver package:
|
||||
```bash
|
||||
sudo apt update
|
||||
sudo apt install nvidia-driver firmware-misc-nonfree
|
||||
```
|
||||
9. Run ```nvidia-smi``` to confirm that the GPU can be found. This should report 1 Tesla T4 GPU. If not, the driver might have failed to install.
|
||||
10. To test that PyTorch has access to the GPU, you can type the commands below in your terminal. You should see `torch.cuda.is_available()` return `True`.
|
||||
```
|
||||
python
|
||||
```
|
||||
```
|
||||
import torch
|
||||
torch.cuda.is_available()
|
||||
```
|
||||
```
|
||||
exit()
|
||||
```
|
||||
11. Well done, you are now in your instance and ready to use it for your coursework.
|
||||
12. Clone a fresh mlpractical repository, and checkout branch `mlp2024-25/mlp_compute_engines`:
|
||||
|
||||
```
|
||||
git clone https://github.com/VICO-UoE/mlpractical.git ~/mlpractical
|
||||
cd ~/mlpractical
|
||||
git checkout mlp2024-25/mlp_compute_engines
|
||||
```
|
||||
|
||||
Then, to test PyTorch running on the GPU, run this script that trains a small convolutional network on EMNIST dataset:
|
||||
|
||||
```
|
||||
python train_evaluate_emnist_classification_system.py --filepath_to_arguments_json_file experiment_configs/emnist_tutorial_config.json
|
||||
```
|
||||
|
||||
You should be able to see an experiment running, using the GPU. It should be doing about 260-300 it/s (iterations per second). You can stop it whenever you like using `ctrl-c`.
|
||||
|
||||
If all the above matches what’s stated then you should be ready to run your experiments.
|
||||
|
||||
To log out of your instance, simply type ```exit``` in the terminal.
|
||||
|
||||
### Remember to ```stop``` your instance when not using it. You pay for the time you use the machine, not for the computational cycles used.
|
||||
To stop the instance, go to `Compute Engine -> VM instances` on the Google Cloud Platform, select the instance and click ```Stop```.
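If you prefer the command line, the same can be done with `gcloud` (the instance name below matches the template created earlier; the zone is a placeholder, use the one shown for your VM):

```bash
# Stop the instance from your own machine; a stopped instance stops consuming compute credit.
gcloud compute instances stop mlpractical-1 --zone=us-west1-b
```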
|
||||
|
||||
#### Future ssh access:
|
||||
To access the instance in the future, simply run the `gcloud` command you copied from the Google Compute Engine instance page.
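For reference, the copied command generally has the following shape (the zone and project below are placeholders; use the values from your own instance page):

```bash
gcloud compute ssh --zone "us-west1-b" "mlpractical-1" --project "sxxxxxxx-mlpractical"
```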
|
||||
|
||||
|
||||
## Copying data to and from an instance
|
||||
|
||||
Please look at the [transferring files to VMs from Linux, macOS and Windows](https://cloud.google.com/compute/docs/instances/transfer-files?hl=en) and [Google docs page on copying data](https://cloud.google.com/filestore/docs/copying-data). Note also the link on the page for [setting up your SSH keys (Linux or MacOS)](https://cloud.google.com/compute/docs/instances/access-overview?hl=en).
|
||||
|
||||
To copy from your local machine to a Google instance, have a look at this [stackoverflow post](https://stackoverflow.com/questions/27857532/rsync-to-google-compute-engine-instance-from-jenkins).
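As a minimal sketch (file paths, instance name and zone are placeholders), `gcloud` also wraps `scp` for one-off transfers:

```bash
# Copy a local file up to the VM, then pull a results archive back down.
gcloud compute scp ./my_config.json mlpractical-1:~/mlpractical/ --zone=us-west1-b
gcloud compute scp mlpractical-1:~/mlpractical/results.tar.gz ./ --zone=us-west1-b
```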
|
||||
|
||||
## Running experiments over ssh:
|
||||
|
||||
If ssh fails while running an experiment, then the experiment is normally killed.
|
||||
To avoid this use the command ```screen```. It creates a process of the current session that keeps running whether a user is signed in or not.
|
||||
|
||||
The basic usage is to run ```screen``` to create a new session. To get a list of all available sessions, use:
```screen -ls```
Then, once you find the session you want to enter, use:
```screen -d -r screen_id```
replacing screen_id with the id of the session you want to enter.
|
||||
|
||||
While in a session, you can use:
|
||||
- ```ctrl+a+esc``` To pause process and be able to scroll.
|
||||
- ```ctrl+a+d``` to detach from session while leaving it running (once you detach you can reattach using ```screen -r```).
|
||||
- ```ctrl+a+n``` to see the next session.
|
||||
- ```ctrl+a+c``` to create a new session.
|
||||
|
||||
You are also free to use other tools such as `nohup` or `tmux`. Use online tutorials and learn it yourself.
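For concreteness, a typical `screen` workflow for a long-running experiment might look like this (the session name is just an example):

```bash
screen -S emnist_run                 # start a new named session
python train_evaluate_emnist_classification_system.py --filepath_to_arguments_json_file experiment_configs/emnist_tutorial_config.json
# detach with ctrl+a d -- the job keeps running on the VM
screen -ls                           # later: list the available sessions
screen -d -r emnist_run              # reattach to the named session
```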
|
||||
|
||||
## Troubleshooting:
|
||||
|
||||
| Error| Fix|
|
||||
| --- | --- |
|
||||
| ```ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].``` | Delete the ssh key files and try again: ```rm ~/.ssh/google_compute_engine*``` |
|
||||
|"Mapping" error after following step 3 (```tar zxvf google-cloud-sdk-365.0.0-linux-x86_64.tar.gz; bash google-cloud-sdk/install.sh```) | This is due to conflicts and several packages not being installed properly according to your Python version when creating your Conda environment. Run ```conda create --name mlp python=3.9``` to recreate the environment supported with Python 3.9. Then, activate the environment ```conda activate mlp``` and follow the instructions from step 3 again. |
|
||||
|"Mapping" error even after successfully completing steps 3 and 4 when using the ```gcloud``` command | Restart your computer and run the following command: ```export CLOUDSDK_PYTHON="/usr/bin/python3"``` |
|
||||
| ```gcloud command not found``` | Restart your computer and run the following command: ```export CLOUDSDK_PYTHON="/usr/bin/python3"``` |
|
||||
| ```module 'collections' has no attribute 'Mapping'``` when installing the Google Cloud SDK | Install Google Cloud SDK with brew: ```brew install --cask google-cloud-sdk```|
|
||||
| ```Access blocked: authorisation error``` in your browser after running ```gcloud auth login``` | Run ```gcloud components update``` and retry to login again. |
|
||||
| ```ModuleNotFoundError: No module named 'GPUtil'``` | Install the GPUtil package and you should be able to run the script afterwards: ```pip install GPUtil``` |
|
||||
| ```module mlp not found``` | Install the mlp package in your environment: ```python setup.py develop``` |
|
||||
| ```NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.``` | Remove the current driver by running: ```cd /``` and ```sudo apt purge nvidia-*``` Follow step 11 of the instructions or the following commands: (1) download the R470 driver ```wget https://us.download.nvidia.com/XFree86/Linux-x86_64/470.223.02/NVIDIA-Linux-x86_64-470.223.02.run```, (2) change the file permissions to make it executable with ```chmod +x NVIDIA-Linux-x86_64-470.223.02.run``` and (3) install the driver ```sudo ./NVIDIA-Linux-x86_64-470.223.02.run``` |
|
||||
| ```module 'torch' has no attribute 'cuda'``` | You most probably have a file named ```torch.py``` in your current directory. Rename it to something else and try again. You might need to run the setup again. Else ```import torch``` will be calling this file instead of the PyTorch library and thus causing a conflict. |
|
||||
| ```Finalizing NVIDIA driver installation. Error! Your kernel headers for kernel 5.10.0-26-cloud-amd64 cannot be found. Please install the linux-headers-5.10.0-26-cloud-amd64 package, or use the --kernelsourcedir option to tell DKMS where it's located. Driver updated for latest kernel.``` | Install the header package with ```sudo apt install linux-headers-5.10.0-26-cloud-amd64``` |
|
176
notes/mlp_cluster_quick_start_up.md
Normal file
@ -0,0 +1,176 @@
|
||||
# MLP GPU Cluster Usage Tutorial
|
||||
|
||||
This guide is intended to walk students through the basics of using the Charles GPU cluster. It is not intended to be
|
||||
an exhaustive guide that goes deep into micro-details of the Slurm ecosystem. For an exhaustive guide please visit
|
||||
[the Slurm Documentation page.](https://slurm.schedmd.com/)
|
||||
|
||||
|
||||
##### For info on clusters and some tips on good cluster etiquette, please have a look at the complementary lecture slides https://docs.google.com/presentation/d/1SU4ExARZLbenZtxm3K8Unqch5282jAXTq0CQDtfvtI0/edit?usp=sharing
|
||||
|
||||
## Getting Started
|
||||
|
||||
### Accessing the Cluster:
|
||||
1. If you are not on a DICE machine, then ssh into your DICE home using ```ssh sxxxxxx@student.ssh.inf.ed.ac.uk```
|
||||
2. Then ssh into either mlp1 or mlp2 which are the headnodes of the GPU cluster - it does not matter which you use. To do that
|
||||
run ```ssh mlp1``` or ```ssh mlp2```.
|
||||
3. You are now logged into the MLP GPU cluster. If this is your first time logging in, you'll need to build your environment. This is because your home directory on the GPU cluster is separate to your usual AFS home directory on DICE.
|
||||
- Note: Alternatively you can just ```ssh sxxxxxxx@mlp.inf.ed.ac.uk``` to get there in one step.
|
||||
|
||||
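If you connect from outside the Informatics network often, the two hops above can be collapsed into a single command with OpenSSH's jump-host option. A minimal sketch, not part of the official instructions (substitute your own student ID):

```bash
# Jump through the Informatics SSH gateway straight to the MLP cluster headnode
ssh -J sxxxxxxx@student.ssh.inf.ed.ac.uk sxxxxxxx@mlp.inf.ed.ac.uk
```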
### Installing requirements:
1. Start by downloading the miniconda3 installation file using ```wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh```.
2. Now run the installation using ```bash Miniconda3-latest-Linux-x86_64.sh```. At the first prompt reply yes.
    ```
    Do you accept the license terms? [yes|no]
    [no] >>> yes
    ```
3. At the second prompt simply press enter.
    ```
    Miniconda3 will now be installed into this location:
    /home/sxxxxxxx/miniconda3

      - Press ENTER to confirm the location
      - Press CTRL-C to abort the installation
      - Or specify a different location below
    ```
4. At the last prompt, asking whether to initialise conda, reply 'yes':
    ```
    Do you wish the installer to initialize Miniconda3
    by running conda init [yes|no]
    [no] >>> yes
    ```
5. Now you need to activate your environment by first running ```source .bashrc```. This reloads .bashrc, which now includes the new miniconda path.
6. Run ```source activate``` to load the miniconda root environment.
7. Now run ```conda create -n mlp python=3```; this will create the mlp environment. At the prompt choose y.
8. Now run ```source activate mlp```.
9. Install git using ```conda install git```. Then configure git using ```git config --global user.name "[your name]"; git config --global user.email "[matric-number]@sms.ed.ac.uk"```.
10. Now clone the mlpractical repo using ```git clone https://github.com/VICO-UoE/mlpractical.git```.
11. ```cd mlpractical```
12. Checkout the compute engines tutorial branch using ```git checkout mlp2023-24/mlp_compute_engines```.
13. Install the required packages using ```bash install.sh```.

    > Note: Check that you can use the GPU version of PyTorch by running ```python -c "import torch; print(torch.cuda.is_available())"``` in a `bash` script (see the example below). If this returns `True`, then you are good to go. If it returns `False`, then you need to install the GPU version of PyTorch manually. To do this, run ```conda uninstall pytorch``` and then ```pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118``` or ```pip install torch torchvision```. This will install the latest version of PyTorch with CUDA support, which is also compatible with the older CUDA versions installed on the cluster.

14. This completes the required installations. Proceed to the next section outlining how to use the Slurm cluster management software. Please remember to clean up your setup files using ```conda clean -t```. (A sketch of what subsequent logins look like follows this list.)
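Steps 1-13 only need to be done once. As a rough sketch, a later login session typically looks like this (assuming the default install locations used above):

```bash
# Reload the shell configuration (picks up the miniconda path), activate the
# course environment and move into the repository
source ~/.bashrc
source activate mlp
cd ~/mlpractical
```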
### Using Slurm
Slurm provides us with commands that can be used to submit, delete, view and explore current jobs, nodes and resources, among others.
To submit a job one uses ```sbatch script.sh```, which will automatically find available nodes and pass on the job with the resources and restrictions required. Here script.sh is the bash script containing the job that we want to run. Since we will be using the NVIDIA CUDA and cuDNN libraries, we have provided a sample script which should be used for your job submissions. The script is explained in detail below:

```bash
#!/bin/sh
#SBATCH -N 1	  # nodes requested
#SBATCH -n 1	  # tasks requested
#SBATCH --partition=Teach-Standard
#SBATCH --gres=gpu:1
#SBATCH --mem=12000  # memory in Mb
#SBATCH --time=0-08:00:00

export CUDA_HOME=/opt/cuda-9.0.176.1/

export CUDNN_HOME=/opt/cuDNN-7.0/

export STUDENT_ID=$(whoami)

export LD_LIBRARY_PATH=${CUDNN_HOME}/lib64:${CUDA_HOME}/lib64:$LD_LIBRARY_PATH

export LIBRARY_PATH=${CUDNN_HOME}/lib64:$LIBRARY_PATH

export CPATH=${CUDNN_HOME}/include:$CPATH

export PATH=${CUDA_HOME}/bin:${PATH}

export PYTHON_PATH=$PATH

mkdir -p /disk/scratch/${STUDENT_ID}

export TMPDIR=/disk/scratch/${STUDENT_ID}/
export TMP=/disk/scratch/${STUDENT_ID}/

mkdir -p ${TMP}/datasets/
export DATASET_DIR=${TMP}/datasets/

# Activate the relevant virtual environment:
source /home/${STUDENT_ID}/miniconda3/bin/activate mlp
cd ..
python train_evaluate_emnist_classification_system.py --filepath_to_arguments_json_file experiment_configs/emnist_tutorial_config.json
```

To actually run this use ```sbatch emnist_single_gpu_tutorial.sh```. When you do this, the job will be submitted and you will be given a job id.
```bash
[burly]sxxxxxxx: sbatch emnist_single_gpu_tutorial.sh
Submitted batch job 147
```
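By default Slurm writes the job's stdout/stderr to a file named ```slurm-<job_id>.out``` in the directory you submitted from (the sample script above does not set an ```--output``` path). For the job id shown above you could follow the output with:

```bash
# Stream the log of job 147 as it runs (Ctrl-C stops following, not the job)
tail -f slurm-147.out
```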
To view a list of all running jobs use ```squeue``` for a minimal presentation and ```smap``` for a more involved one. Furthermore, to view node information use ```sinfo```.
```bash
squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               143 interacti     bash    iainr  R       8:00      1 landonia05
               147 interacti gpu_clus sxxxxxxx  R       1:05      1 landonia02
```
In case you want to stop/delete a job use ```scancel job_id```, where job_id is the id of the job.

Furthermore, in case you want to test some of your code interactively to prototype your solution before you submit it to a node, you can use ```srun -p interactive --gres=gpu:2 --pty python my_code_exp.py```.
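As a small sketch of this interactive workflow, you can also request an interactive shell on a GPU node, confirm the GPU is visible, and only then launch your prototype code (partition and resource names taken from the command above):

```bash
# Request a pseudo-terminal on the interactive partition with one GPU
srun -p interactive --gres=gpu:1 --pty bash

# Inside the interactive shell: check the GPU and PyTorch's view of it
nvidia-smi
python -c "import torch; print(torch.cuda.is_available())"
```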
## Slurm Cheatsheet
For a nice list of the most commonly used Slurm commands please visit [here](https://bitsanddragons.wordpress.com/2017/04/12/slurm-user-cheatsheet/).

## Syncing or copying data over to DICE

At some point you will need to copy your data over to DICE so you can analyse it, produce charts, write reports, store it for future use, etc.
1. If you are on a terminal within the Informatics Forum or Appleton Tower (IF/AT), skip to step 2; if you are not, you'll first have to open a VPN into the university network using the instructions found [here](http://computing.help.inf.ed.ac.uk/openvpn).
2. From your local machine:
    1. To send data from a local machine to the cluster: ```rsync -ua --progress <local_path_of_data_to_transfer> <studentID>@mlp.inf.ed.ac.uk:/home/<studentID>/path/to/folder```
    2. To receive data from the cluster on your local machine: ```rsync -ua --progress <studentID>@mlp.inf.ed.ac.uk:/home/<studentID>/path/to/folder <local_path_of_data_to_transfer>``` (see the example below)
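For example, to pull the results of a finished experiment back to your local machine (the experiment folder name here is illustrative):

```bash
# Copy an experiment's output folder from your cluster home to the current local directory
rsync -ua --progress sxxxxxxx@mlp.inf.ed.ac.uk:/home/sxxxxxxx/mlpractical/emnist_tutorial ./
```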
## Running an experiment
To run a default image classification experiment using the template models provided:
1. Sign into the cluster using ```ssh sxxxxxxx@mlp1.inf.ed.ac.uk```
2. Activate your conda environment using ```source miniconda3/bin/activate; conda activate mlp```
3. ```cd mlpractical```
4. ```cd cluster_experiment_scripts```
5. Find which experiment(s) you want to run (make sure the script name ends in 'gpu_cluster.sh'). Decide whether you want to run a single experiment or multiple experiments in parallel.
    1. For a single experiment: ```sbatch experiment_script.sh```
    2. To run multiple experiments using the "hurdle-reducing" script that automatically submits jobs and makes sure the jobs are always in the queue/running:
        1. Make sure the cluster_experiment_scripts folder contains ***only*** the jobs you want to run.
        2. Run the command (a concrete example follows this list):
            ```
            python run_jobs.py --num_parallel_jobs <number of jobs to keep in the slurm queue at all times> --num_epochs <number of epochs to run each job>
            ```
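For instance, to keep four jobs in the queue at all times with each job trained for 100 epochs, the call would look like:

```bash
# Keep 4 jobs queued/running at any one time, 100 epochs per job
python run_jobs.py --num_parallel_jobs 4 --num_epochs 100
```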
## Additional Help

If you require additional help please post on Piazza, or, if you are experiencing technical problems (actual system/hardware problems), please submit a [computing support ticket](https://www.inf.ed.ac.uk/systems/support/form/).

## List of very useful Slurm commands:
- squeue: Shows all jobs from all users currently in the queue/running
- squeue -u <user_id>: Shows all jobs from user <user_id> in the queue/running (see the monitoring sketch after this list)
- sprio: Shows the priority score of all of your current jobs that are not yet running
- scontrol show job <job_id>: Shows all information about job <job_id>
- scancel <job_id>: Cancels the job with id <job_id>
- scancel -u <user_id>: Cancels all jobs belonging to user <user_id> that are currently in the queue/running
- sinfo: Provides info about the cluster/partitions
- sbatch <job_script>: Submits the script <job_script> as a job to the Slurm scheduler.
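As a small convenience sketch combining the commands above, you can poll your own queue periodically rather than re-typing ```squeue``` (replace sxxxxxxx with your student ID):

```bash
# Re-run squeue for your own jobs every 30 seconds; Ctrl-C to stop
watch -n 30 squeue -u sxxxxxxx
```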
## Overview of code:
- [arg_extractor.py](arg_extractor.py): Contains an array of utility methods that can parse python arguments or convert a json config file into an argument NamedTuple.
- [data_providers.py](data_providers.py): A sample data provider, of the same type used in the MLPractical course.
- [experiment_builder.py](experiment_builder.py): Builds and executes a simple image classification experiment, keeping track of relevant statistics, taking care of storing and re-loading PyTorch models, as well as choosing the best validation-performing model to evaluate the test set on.
- [model_architectures.py](model_architectures.py): Provides a fully connected network and a convolutional neural network as sample models, which have a number of moving parts exposed as hyperparameters.
- [storage_utils.py](storage_utils.py): Provides a number of storage/loading methods for the experiment statistics.
- [train_evaluate_emnist_classification_system.py](train_evaluate_emnist_classification_system.py): Runs an experiment given a data provider, an experiment builder instance and a model architecture (see the usage example below).
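These pieces are tied together by the training entry point that the sample Slurm script above invokes. You can also run it by hand from the repository root (assuming the mlp environment is active and the config file exists as in the sample script):

```bash
# Run the EMNIST example locally/interactively with the provided config
python train_evaluate_emnist_classification_system.py \
    --filepath_to_arguments_json_file experiment_configs/emnist_tutorial_config.json
```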
125
notes/pytorch-experiment-framework.md
Normal file
@ -0,0 +1,125 @@
# PyTorch Experiment Framework

## What does this framework do?
The PyTorch experiment framework located in ```mlp/pytorch_mlp_framework``` includes tooling for building an array of deep neural networks, including fully connected and convolutional networks. In addition, it includes tooling for experiment running, metric handling and storage, model weight storage, and checkpointing (allowing continuation from a previously saved point), as well as keeping track of the best validation model, which is then used at the end to produce the test set evaluation metrics.

## Why do we need it?
It serves two main purposes. The first is to allow you an easy, worry-free transition into using PyTorch for experiments in your coursework. The second is to teach you good coding practices for building and running deep learning experiments using PyTorch. The framework comes fully loaded with tooling that can keep track of relevant metrics, save models, resume from previously saved states, and even automatically choose the best validation model for test set evaluation. We include documentation and comments on almost every single line of code in the framework, to help you maximize your learning. The code style itself can be used for learning good programming practices in structuring your code in a modular, readable and computationally efficient manner that minimizes the chance of user error.

## Installation

The first thing you have to do is activate your conda MLP environment.

### GPU version on Google Compute Engine

For usage on Google Cloud, the disk image we provide comes pre-loaded with all the packages you need to run the PyTorch experiment framework, including PyTorch itself. Thus, when you created an instance and set up your environment, everything you needed for this framework was installed, removing the need for you to install PyTorch yourself.

### CPU version on DICE (or other local machine)

If you do not have your MLP conda environment installed on your current machine, please follow the instructions in the [MLP environment installation guide](notes/environment-set-up.md). It includes an explanation of how to install a CPU version of PyTorch, or a GPU version if you have a GPU available on your local machine.

Once PyTorch is installed in your MLP conda environment, you can start using the framework. The framework has been built to allow you to control your experiment hyperparameters directly from the command line, using command line argument parsing.
## Using the framework

You can get a list of all available hyperparameters and arguments by using:
```
python pytorch_mlp_framework/train_evaluate_image_classification_system.py -h
```

The -h at the end is short for --help, which presents a list of all possible arguments next to a description of what they modify in the setup. Once you execute that command, you should see the following list:

```
Welcome to the MLP course's PyTorch training and inference helper script

optional arguments:
  -h, --help            show this help message and exit
  --batch_size [BATCH_SIZE]
                        Batch_size for experiment
  --continue_from_epoch [CONTINUE_FROM_EPOCH]
                        Which epoch to continue from.
                        If -2, continues from where it left off
                        If -1, starts from scratch
                        if >=0, continues from given epoch
  --seed [SEED]         Seed to use for random number generator for experiment
  --image_num_channels [IMAGE_NUM_CHANNELS]
                        The channel dimensionality of our image-data
  --image_height [IMAGE_HEIGHT]
                        Height of image data
  --image_width [IMAGE_WIDTH]
                        Width of image data
  --num_stages [NUM_STAGES]
                        Number of convolutional stages in the network. A stage
                        is considered a sequence of convolutional layers where
                        the input volume remains the same in the spacial
                        dimension and is always terminated by a dimensionality
                        reduction stage
  --num_blocks_per_stage [NUM_BLOCKS_PER_STAGE]
                        Number of convolutional blocks in each stage, not
                        including the reduction stage. A convolutional block
                        is made up of two convolutional layers activated using
                        the leaky-relu non-linearity
  --num_filters [NUM_FILTERS]
                        Number of convolutional filters per convolutional
                        layer in the network (excluding dimensionality
                        reduction layers)
  --num_epochs [NUM_EPOCHS]
                        The experiment's epoch budget
  --num_classes [NUM_CLASSES]
                        The experiment's epoch budget
  --experiment_name [EXPERIMENT_NAME]
                        Experiment name - to be used for building the
                        experiment folder
  --use_gpu [USE_GPU]   A flag indicating whether we will use GPU acceleration
                        or not
  --weight_decay_coefficient [WEIGHT_DECAY_COEFFICIENT]
                        Weight decay to use for Adam
  --block_type BLOCK_TYPE
                        Type of convolutional blocks to use in our network
                        (This argument will be useful in running experiments
                        to debug your network)
```

For example, to run a simple experiment using a 7-layer convolutional network on the CPU you can run:

```
python pytorch_mlp_framework/train_evaluate_image_classification_system.py --batch_size 100 --seed 0 --num_filters 32 --num_stages 3 --num_blocks_per_stage 0 --experiment_name VGG_07 --num_classes 100 --block_type 'conv_block' --weight_decay_coefficient 0.00000 --use_gpu False
```

Your experiment should begin running.
Your experiment's statistics and model weights are saved in a directory named after your ```--experiment_name``` (here VGG_07/), under its logs and saved_models subdirectories.
To run on a GPU on Google Compute Engine the command would be:
```
python pytorch_mlp_framework/train_evaluate_image_classification_system.py --batch_size 100 --seed 0 --num_filters 32 --num_stages 3 --num_blocks_per_stage 0 --experiment_name VGG_07 --num_classes 100 --block_type 'conv_block' --weight_decay_coefficient 0.00000 --use_gpu True
```
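If a run is interrupted (for example by an instance being preempted), you can resume it rather than starting again. A minimal sketch based on the ```--continue_from_epoch``` argument documented above (-2 resumes from wherever the previous run stopped); all other flags must match the original run:

```bash
# Resume the VGG_07 experiment from its last saved checkpoint
python pytorch_mlp_framework/train_evaluate_image_classification_system.py --batch_size 100 --seed 0 \
    --num_filters 32 --num_stages 3 --num_blocks_per_stage 0 --experiment_name VGG_07 --num_classes 100 \
    --block_type 'conv_block' --weight_decay_coefficient 0.00000 --use_gpu True --continue_from_epoch -2
```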
We have also provided the exact scripts we used to run the VGG07 and VGG37 experiments shown in the coursework spec, inside the files:
- run_vgg_08_default.sh
- run_vgg_38_default.sh

**However, remember: if you want to reuse those scripts for your own investigations, change the experiment name and seed. If you do not change the name, the old folders will be overwritten.**

## So, where can I ask more questions and find more information on PyTorch and what it can do?

Your first course of action should be to search the web, and then to refer to the PyTorch [documentation](https://pytorch.org/docs/stable/index.html), [tutorials](https://pytorch.org/tutorials/) and [GitHub](https://github.com/pytorch/pytorch) sites.

If you still can't get an answer to your question then, as always, post on Piazza and/or come to the lab sessions.
@ -1,73 +0,0 @@
#!/bin/bash
# Configure Jupyter notebook server to use password authentication
# Make sure Conda environment is active as will assume it is later
[ -z "$CONDA_PREFIX" ] && echo "Need to have Conda environment activated." && exit 1
if [ "$#" -gt 2 ]; then
  echo "Usage: bash secure-notebook-server.sh [jupyter-path] [open-ssl-config-path]"
  exit 1
fi
# If specified read Jupyter directory from passed argument
JUPYTER_DIR=${1:-"$HOME/.jupyter"}
# If specified read OpenSSL config file path from passed argument
# This is needed due to bug in how Conda handles config path
export OPENSSL_CONF=${2:-"$CONDA_PREFIX/ssl/openssl.cnf"}
SEPARATOR="=================================================================\n"
# Create default config file if one does not already exist
if [ ! -f "$JUPYTER_DIR/jupyter_notebook_config.py" ]; then
  echo "No existing notebook configuration file found, creating new one ..."
  printf $SEPARATOR
  jupyter notebook --generate-config
  printf $SEPARATOR
  echo "... notebook configuration file created."
fi
# Get user to enter notebook server password
echo "Getting notebook server password hash. Enter password when prompted ..."
printf $SEPARATOR
HASH=$(python -c "from jupyter_server.auth import passwd; print(passwd());")
printf $SEPARATOR
echo "... got password hash."
# Generate self-signed OpenSSL certificate and key file
echo "Creating certificate file ..."
printf $SEPARATOR
CERT_INFO="/C=UK/ST=Scotland/L=Edinburgh/O=University of Edinburgh/OU=School of Informatics/CN=$USER/emailAddress=$USER@sms.ed.ac.uk"
openssl req \
  -x509 -nodes -days 365 \
  -subj "/C=UK/ST=Scotland/L=Edinburgh/O=University of Edinburgh/OU=School of Informatics/CN=$USER/emailAddress=$USER@sms.ed.ac.uk" \
  -newkey rsa:1024 -keyout "$JUPYTER_DIR/key.key" \
  -out "$JUPYTER_DIR/cert.pem"
printf $SEPARATOR
echo "... certificate created."
# Setting permissions on key file
chmod 600 "$JUPYTER_DIR/key.key"
# Add password hash and certificate + key file paths to config file
echo "Setting up configuration file..."
printf $SEPARATOR
echo " adding password hash"
SRC_PSW="^#\?c\.NotebookApp\.password[ ]*=[ ]*u['"'"'"]\(sha1:[a-fA-F0-9]\+\)\?['"'"'"]"
DST_PSW="c.NotebookApp.password = u'$HASH'"
grep -q "c.NotebookApp.password" $JUPYTER_DIR/jupyter_notebook_config.py
if [ ! $? -eq 0 ]; then
  echo DST_PSW >> $JUPYTER_DIR/jupyter_notebook_config.py
else
  sed -i "s/$SRC_PSW/$DST_PSW/" $JUPYTER_DIR/jupyter_notebook_config.py
fi
echo " adding certificate file path"
SRC_CRT="^#\?c\.NotebookApp\.certfile[ ]*=[ ]*u['"'"'"]\([^'"'"'"]+\)\?['"'"'"]"
DST_CRT="c.NotebookApp.certfile = u'$JUPYTER_DIR/cert.pem'"
grep -q "c.NotebookApp.certfile" $JUPYTER_DIR/jupyter_notebook_config.py
if [ ! $? -eq 0 ]; then
  echo DST_CRT >> $JUPYTER_DIR/jupyter_notebook_config.py
else
  sed -i "s|$SRC_CRT|$DST_CRT|" $JUPYTER_DIR/jupyter_notebook_config.py
fi
echo " adding key file path"
SRC_KEY="^#\?c\.NotebookApp\.keyfile[ ]*=[ ]*u['"'"'"]\([^'"'"'"]+\)\?['"'"'"]"
DST_KEY="c.NotebookApp.keyfile = u'$JUPYTER_DIR/key.key'"
grep -q "c.NotebookApp.keyfile" $JUPYTER_DIR/jupyter_notebook_config.py
if [ ! $? -eq 0 ]; then
  echo DST_KEY >> $JUPYTER_DIR/jupyter_notebook_config.py
else
  sed -i "s|$SRC_KEY|$DST_KEY|" $JUPYTER_DIR/jupyter_notebook_config.py
fi
printf $SEPARATOR
echo "... finished setting up configuration file."
13
setup.py
@ -1,13 +0,0 @@
""" Setup script for mlp package. """

from setuptools import setup

setup(
    name = "mlp",
    author = "Pawel Swietojanski, Steve Renals, Matt Graham and Antreas Antoniou",
    description = ("Neural network framework for University of Edinburgh "
                   "School of Informatics Machine Learning Practical course."),
    url = "https://github.com/VICO-UoE/mlpractical",
    packages=['mlp']
)
70
storage_utils.py
Normal file
@ -0,0 +1,70 @@
import pickle
import os
import csv


def save_to_stats_pkl_file(experiment_log_filepath, filename, stats_dict):
    summary_filename = os.path.join(experiment_log_filepath, filename)
    with open("{}.pkl".format(summary_filename), "wb") as file_writer:
        pickle.dump(stats_dict, file_writer)


def load_from_stats_pkl_file(experiment_log_filepath, filename):
    summary_filename = os.path.join(experiment_log_filepath, filename)
    with open("{}.pkl".format(summary_filename), "rb") as file_reader:
        stats = pickle.load(file_reader)

    return stats


def save_statistics(experiment_log_dir, filename, stats_dict, current_epoch, continue_from_mode=False, save_full_dict=False):
    """
    Saves the statistics in stats_dict into a csv file. Uses the keys as the header entries and the values as the
    columns of a particular header entry.
    :param experiment_log_dir: the log folder dir filepath
    :param filename: the name of the csv file
    :param stats_dict: the stats dict containing the data to be saved
    :param current_epoch: the number of epochs since commencement of the current training session (i.e. if the experiment continued from epoch 100 and this is epoch 105, then pass the relative distance of 5)
    :param continue_from_mode: whether to append to an existing file (True) or start a fresh one (False)
    :param save_full_dict: whether to save the full dict as is, overriding any previous entries (might be useful if we want to overwrite a file)
    :return: The filepath to the summary file
    """
    summary_filename = os.path.join(experiment_log_dir, filename)
    mode = 'a' if continue_from_mode else 'w'
    with open(summary_filename, mode) as f:
        writer = csv.writer(f)
        if not continue_from_mode:
            writer.writerow(list(stats_dict.keys()))

        if save_full_dict:
            total_rows = len(list(stats_dict.values())[0])
            for idx in range(total_rows):
                row_to_add = [value[idx] for value in list(stats_dict.values())]
                writer.writerow(row_to_add)
        else:
            row_to_add = [value[current_epoch] for value in list(stats_dict.values())]
            writer.writerow(row_to_add)

    return summary_filename


def load_statistics(experiment_log_dir, filename):
    """
    Loads a statistics csv file into a dictionary.
    :param experiment_log_dir: the log folder dir filepath
    :param filename: the name of the csv file to load
    :return: A dictionary containing the stats in the csv file. Header entries are converted into keys and columns of a
             particular header are converted into values of a key in a list format.
    """
    summary_filename = os.path.join(experiment_log_dir, filename)

    with open(summary_filename, 'r+') as f:
        lines = f.readlines()

    keys = lines[0].split(",")
    stats = {key: [] for key in keys}
    for line in lines[1:]:
        values = line.split(",")
        for idx, value in enumerate(values):
            stats[keys[idx]].append(value)

    return stats
95
train_evaluate_emnist_classification_system.py
Normal file
@ -0,0 +1,95 @@
import numpy as np

import data_providers as data_providers
from arg_extractor import get_args
from data_augmentations import Cutout
from experiment_builder import ExperimentBuilder
from model_architectures import ConvolutionalNetwork

args, device = get_args()  # get arguments from command line
rng = np.random.RandomState(seed=args.seed)  # set the seeds for the experiment

from torchvision import transforms
import torch

torch.manual_seed(seed=args.seed)  # sets pytorch's seed


if args.dataset_name == 'emnist':
    train_data = data_providers.EMNISTDataProvider('train', batch_size=args.batch_size,
                                                   rng=rng,
                                                   flatten=False)  # initialize our rngs using the argument set seed
    val_data = data_providers.EMNISTDataProvider('valid', batch_size=args.batch_size,
                                                 rng=rng,
                                                 flatten=False)  # initialize our rngs using the argument set seed
    test_data = data_providers.EMNISTDataProvider('test', batch_size=args.batch_size,
                                                  rng=rng,
                                                  flatten=False)  # initialize our rngs using the argument set seed
    num_output_classes = train_data.num_classes

elif args.dataset_name == 'cifar10':
    transform_train = transforms.Compose([
        transforms.RandomCrop(32, padding=4),
        Cutout(n_holes=1, length=14),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    ])

    transform_test = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    ])

    trainset = data_providers.CIFAR10(root='data', set_name='train', download=True, transform=transform_train)
    train_data = torch.utils.data.DataLoader(trainset, batch_size=100, shuffle=True, num_workers=2)

    valset = data_providers.CIFAR10(root='data', set_name='val', download=True, transform=transform_test)
    val_data = torch.utils.data.DataLoader(valset, batch_size=100, shuffle=False, num_workers=2)

    testset = data_providers.CIFAR10(root='data', set_name='test', download=True, transform=transform_test)
    test_data = torch.utils.data.DataLoader(testset, batch_size=100, shuffle=False, num_workers=2)

    classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
    num_output_classes = 10

elif args.dataset_name == 'cifar100':
    transform_train = transforms.Compose([
        transforms.RandomCrop(32, padding=4),
        Cutout(n_holes=1, length=14),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    ])

    transform_test = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    ])

    trainset = data_providers.CIFAR100(root='data', set_name='train', download=True, transform=transform_train)
    train_data = torch.utils.data.DataLoader(trainset, batch_size=100, shuffle=True, num_workers=2)

    valset = data_providers.CIFAR100(root='data', set_name='val', download=True, transform=transform_test)
    val_data = torch.utils.data.DataLoader(valset, batch_size=100, shuffle=False, num_workers=2)

    testset = data_providers.CIFAR100(root='data', set_name='test', download=True, transform=transform_test)
    test_data = torch.utils.data.DataLoader(testset, batch_size=100, shuffle=False, num_workers=2)

    num_output_classes = 100

custom_conv_net = ConvolutionalNetwork(  # initialize our network object, in this case a ConvNet
    input_shape=(args.batch_size, args.image_num_channels, args.image_height, args.image_height),
    dim_reduction_type=args.dim_reduction_type, num_filters=args.num_filters, num_layers=args.num_layers,
    use_bias=False,
    num_output_classes=num_output_classes)

conv_experiment = ExperimentBuilder(network_model=custom_conv_net, use_gpu=args.use_gpu,
                                    experiment_name=args.experiment_name,
                                    num_epochs=args.num_epochs,
                                    weight_decay_coefficient=args.weight_decay_coefficient,
                                    continue_from_epoch=args.continue_from_epoch,
                                    train_data=train_data, val_data=val_data,
                                    test_data=test_data)  # build an experiment object
experiment_metrics, test_metrics = conv_experiment.run_experiment()  # run experiment and return experiment metrics