{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Generalisation and overfitting\n",
"\n",
"In this notebook we will explore the issue of overfitting and how we can measure how well the models we train generalise their predictions to unseen data. This will build upon the introduction to generalisation given in the fourth lecture."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 1: overfitting and model complexity in a 1D regression problem\n",
"\n",
"As an exercise we will consider a regression problem. In particular, given a fixed set of (noisy) observations of the underlying functional relationship between inputs and outputs, we will attempt to use a multiple layer network model to learn to predict output values from inputs. The aim of the exercise will be to visualise how increasing the complexity of the model we fit to the training data affects the ability of the model to make predictions across the input space.\n",
"\n",
"### Function\n",
"\n",
"To keep things simple we will consider a single input-output function defined by a fourth degree polynomial (quartic)\n",
"\n",
"$$ f(x) = 10 x^4 - 17 x^3 + 8 x^2 - x $$\n",
"\n",
"with the observed values being the function values plus zero-mean Gaussian noise\n",
"\n",
"$$ y = f(x) + 0.01 \\epsilon \\qquad \\epsilon \\sim \\mathcal{N}\\left(\\cdot;\\,0,\\,1\\right) $$\n",
"\n",
"The inputs will be drawn from the uniform distribution on $[0, 1]$.\n",
"\n",
"First import the necessary modules and seed the random number generator by running the cell below."
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"plt.style.use('ggplot')\n",
"seed = 17102016\n",
"rng = np.random.RandomState(seed)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Your Tasks:**\n",
"\n",
"Write code in the cell below to calculate a polynomial function of one-dimensional inputs.\n",
"\n",
"If $\\boldsymbol{c}$ is a length $P$ vector of coefficients corresponding to increasing powers in the polynomial (starting from the constant zero power term up to the $P-1^{\\textrm{th}}$ power) the function should correspond to the following\n",
"\n",
"\\begin{equation}\n",
"  f_{\\textrm{polynomial}}(x,\\ \\boldsymbol{c}) = \\sum_{p=0}^{P-1} \\left( c_p x^p \\right)\n",
"\\end{equation}"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"def polynomial_function(inputs, coefficients):\n",
"    \"\"\"Calculates polynomial with given coefficients of an array of inputs.\n",
"    \n",
"    Args:\n",
"        inputs: One-dimensional array of input values of shape (num_inputs,)\n",
"        coefficients: One-dimensional array of polynomial coefficient terms\n",
"            with `coefficients[0]` corresponding to the coefficient for the\n",
"            zero order term in the polynomial (constant) and `coefficients[-1]`\n",
"            corresponding to the highest order term.\n",
"    \n",
"    Returns:\n",
"        One dimensional array of output values of shape (num_inputs,)\n",
"    \n",
"    \"\"\"\n",
"    raise NotImplementedError(\"TODO Implement this function\")"
]
},
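{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to sanity-check your approach before running the test cell, the cell below gives one possible vectorised sketch. It is only an illustration, and it deliberately uses a different (hypothetical) name, `example_polynomial_function`, so that it does not overwrite your own implementation; other correct approaches, such as a simple loop over powers, are equally valid."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# One possible sketch, not necessarily the intended solution: raise each input\n",
"# to every power 0..P-1 with broadcasting and take the coefficient-weighted\n",
"# sum over the power axis.\n",
"def example_polynomial_function(inputs, coefficients):\n",
"    powers = np.arange(coefficients.shape[0])  # exponents 0, 1, ..., P-1\n",
"    # inputs[:, None] has shape (num_inputs, 1), powers[None, :] has shape (1, P)\n",
"    return (coefficients * inputs[:, None] ** powers[None, :]).sum(-1)\n",
"\n",
"# Quick check against the same values used in the test cell below\n",
"example_polynomial_function(np.array([0., 0.5, 1., 2.]), np.array([-1., 3., 4.]))"
]
},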
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Run the cell below to test your implementation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test_coefficients = np.array([-1., 3., 4.])\n",
"test_inputs = np.array([0., 0.5, 1., 2.])\n",
"test_outputs = np.array([-1., 1.5, 6., 21.])\n",
"assert polynomial_function(test_inputs, test_coefficients).shape == (4,), (\n",
"    'Function gives wrong shape output.'\n",
")\n",
"assert np.allclose(polynomial_function(test_inputs, test_coefficients), test_outputs), (\n",
"    'Function gives incorrect output values.'\n",
")\n",
"print(\"Function is correct!\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We now need to use the random number generator to sample input values and calculate the corresponding target outputs using your polynomial implementation with the relevant coefficients for our function. Do this by running the cell below."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"coefficients = np.array([0, -1., 8., -17., 10.])\n",
"input_dim, output_dim = 1, 1\n",
"noise_std = 0.01\n",
"num_data = 80\n",
"inputs = rng.uniform(size=(num_data, input_dim))\n",
"epsilons = rng.normal(size=num_data)\n",
"targets = (polynomial_function(inputs[:, 0], coefficients) + \n",
"           epsilons * noise_std)[:, None]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will split the generated data points into equal-sized training and validation data sets and use these to create data provider objects which we can use to train models in our framework. As the dataset is small here we will use a batch size equal to the size of the data set. Run the cell below to split the data and set up the data provider objects."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"from mlp.data_providers import DataProvider\n",
"num_train = num_data // 2\n",
"batch_size = num_train\n",
"inputs_train, targets_train = inputs[:num_train], targets[:num_train]\n",
"inputs_valid, targets_valid = inputs[num_train:], targets[num_train:]\n",
"train_data = DataProvider(inputs_train, targets_train, batch_size=batch_size, rng=rng)\n",
"valid_data = DataProvider(inputs_valid, targets_valid, batch_size=batch_size, rng=rng)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now visualise the data we will be modelling. Run the cell below to plot the target outputs against inputs for both the training and validation sets. Note the clear underlying smooth functional relationship evident in the noisy data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fig = plt.figure(figsize=(8, 4))\n",
"ax = fig.add_subplot(111)\n",
"ax.plot(inputs_train[:, 0], targets_train[:, 0], '.', label='training data')\n",
"ax.plot(inputs_valid[:, 0], targets_valid[:, 0], '.', label='validation data')\n",
"ax.set_xlabel('Inputs $x$', fontsize=14)\n",
"ax.set_ylabel('Outputs $y$', fontsize=14)\n",
"ax.legend(loc='best')\n",
"fig.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Model\n",
"\n",
"We will fit models with a varying number of parameters to the training data. As multi-layer logistic sigmoid models tend to perform poorly in regression tasks like this we will instead use a [radial basis function (RBF) network](https://en.wikipedia.org/wiki/Radial_basis_function_network).\n",
"\n",
"This model predicts the output as a weighted sum of basis functions (here Gaussian-like bumps) tiled across the input space. The cell below generates a random set of weights and a bias for an RBF network and plots the modelled input-output function across inputs $[0, 1]$. Run the cell below for several different numbers of weight parameters (specified with the `num_weights` variable) to get a feel for the sort of predictions the RBF network models produce."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"num_weights = 15\n",
"weights_scale = 1.\n",
"bias_scale = 1.\n",
"\n",
"def basis_function(x, centre, scale):\n",
"    return np.exp(-(x - centre)**2 / scale**2)\n",
"\n",
"weights = rng.normal(size=num_weights) * weights_scale\n",
"bias = rng.normal() * bias_scale\n",
"\n",
"centres = np.linspace(0, 1, weights.shape[0])\n",
"scale = 1. / weights.shape[0]\n",
"\n",
"xs = np.linspace(0, 1, 200)\n",
"ys = np.zeros(xs.shape[0])\n",
"\n",
"fig = plt.figure(figsize=(12, 4))\n",
"ax = fig.add_subplot(1, 1, 1)\n",
"for weight, centre in zip(weights, centres):\n",
"    ys += weight * basis_function(xs, centre, scale)\n",
"ys += bias\n",
"ax.plot(xs, ys)\n",
"ax.set_xlabel('Input', fontsize=14)\n",
"ax.set_ylabel('Output', fontsize=14)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You do not need to study the details of how to implement this model. All of the additional code you need to fit RBF networks is provided by the `RadialBasisFunctionLayer` class in the `mlp.layers` module. The `RadialBasisFunctionLayer` class has the same interface as the layer classes we encountered in the previous lab, defining both `fprop` and `bprop` methods, and we can therefore include it as a layer in a `MultipleLayerModel` as with any other layer.\n",
"\n",
"Here we will use the `RadialBasisFunctionLayer` as the first layer in a two layer model. This first layer calculates the basis function terms, which are then weighted and summed together in an `AffineLayer`, the second and final layer. This illustrates the advantage of using a modular framework - we can reuse the code we previously implemented to train a quite different model architecture just by defining a new layer class.\n",
"\n",
"Run the cell below which contains some necessary setup code."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"from mlp.models import MultipleLayerModel\n",
"from mlp.layers import AffineLayer, RadialBasisFunctionLayer\n",
"from mlp.errors import SumOfSquaredDiffsError\n",
"from mlp.initialisers import ConstantInit, UniformInit\n",
"from mlp.learning_rules import GradientDescentLearningRule\n",
"from mlp.optimisers import Optimiser\n",
"\n",
"# Regression problem therefore use sum of squared differences error\n",
"error = SumOfSquaredDiffsError()\n",
"# Use basic gradient descent learning rule with fixed learning rate\n",
"learning_rule = GradientDescentLearningRule(0.1)\n",
"# Initialise weights from uniform distribution and zero bias\n",
"weights_init = UniformInit(-0.1, 0.1)\n",
"biases_init = ConstantInit(0.)\n",
"# Train all models for 2000 epochs\n",
"num_epoch = 2000"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The next cell defines RBF network models with a varying number of weight parameters (equal to the number of basis functions) and fits each to the training set, recording the final training and validation set errors for the fitted models. Run it now to fit the models and calculate the error values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"num_weight_list = [2, 5, 10, 25, 50, 100]\n",
"models = []\n",
"train_errors = []\n",
"valid_errors = []\n",
"for num_weight in num_weight_list:\n",
"    model = MultipleLayerModel([\n",
"        RadialBasisFunctionLayer(num_weight),\n",
"        AffineLayer(input_dim * num_weight, output_dim, \n",
"                    weights_init, biases_init)\n",
"    ])\n",
"    optimiser = Optimiser(model, error, learning_rule, \n",
"                          train_data, valid_data)\n",
"    print('-' * 80)\n",
"    print('Training model with {0} weights'.format(num_weight))\n",
"    print('-' * 80)\n",
"    _ = optimiser.train(num_epoch, -1)\n",
"    outputs_train = model.fprop(inputs_train)[-1]\n",
"    outputs_valid = model.fprop(inputs_valid)[-1]\n",
"    models.append(model)\n",
"    train_errors.append(error(outputs_train, targets_train))\n",
"    valid_errors.append(error(outputs_valid, targets_valid))\n",
"    print('    Final training set error: {0:.1e}'.format(train_errors[-1]))\n",
"    print('    Final validation set error: {0:.1e}'.format(valid_errors[-1]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Your Tasks**\n",
"\n",
"In the cell below write code to [plot bar charts](http://matplotlib.org/examples/api/barchart_demo.html) of the training and validation set errors for the different fitted models.\n",
"\n",
"Some questions to think about from the plots:\n",
"\n",
"  * Do the models with more free parameters fit the training data better or worse?\n",
"  * What does the validation set error value tell us about the models?\n",
"  * Of the models fitted here, which seems most likely to generalise well to unseen data? \n",
"  * Do any of the models seem to be overfitting?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#TODO plot the bar charts here"
]
},
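{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you get stuck, the cell below sketches one possible way to produce the bar charts, assuming the `num_weight_list`, `train_errors` and `valid_errors` lists populated above. It is only an illustration; your own plot may of course look different."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# One possible sketch: grouped bars of the final training / validation set errors\n",
"# for each fitted model. A log scale on the error axis can help if the error\n",
"# values span several orders of magnitude.\n",
"fig = plt.figure(figsize=(8, 4))\n",
"ax = fig.add_subplot(111)\n",
"bar_width = 0.35\n",
"positions = np.arange(len(num_weight_list))\n",
"ax.bar(positions - bar_width / 2, train_errors, bar_width, label='training error')\n",
"ax.bar(positions + bar_width / 2, valid_errors, bar_width, label='validation error')\n",
"ax.set_yscale('log')\n",
"ax.set_xticks(positions)\n",
"ax.set_xticklabels(num_weight_list)\n",
"ax.set_xlabel('Number of weight parameters')\n",
"ax.set_ylabel('Final error')\n",
"ax.legend(loc='best')\n",
"fig.tight_layout()\n",
"plt.show()"
]
},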
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's visualise what the fitted models' predictions look like across the whole input space compared to the 'true' function we were trying to fit.\n",
"\n",
"**Your Tasks:** \n",
"\n",
"In the cell below, for each of the fitted models stored in the `models` list above:\n",
"  * Compute output predictions for the model across a linearly spaced series of 500 input points between 0 and 1 in the input space.\n",
"  * Plot the computed predicted outputs and true function values at the corresponding inputs as line plots on the same axis (use a new axis for each model).\n",
"  * On the same axis plot the training data set's input-target pairs as points."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"#TODO plot the graphs here"
]
},
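{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the cell below gives one possible sketch of these plots, assuming the `models`, `num_weight_list`, `coefficients`, `inputs_train` and `targets_train` objects defined above. Again this is only an illustration of one way to do it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# One possible sketch: one axis per fitted model showing its predictions,\n",
"# the true polynomial and the (noisy) training points.\n",
"plot_inputs = np.linspace(0., 1., 500)[:, None]\n",
"true_outputs = polynomial_function(plot_inputs[:, 0], coefficients)\n",
"for num_weight, model in zip(num_weight_list, models):\n",
"    # Final entry of fprop activations list is the model output\n",
"    predictions = model.fprop(plot_inputs)[-1]\n",
"    fig = plt.figure(figsize=(8, 4))\n",
"    ax = fig.add_subplot(111)\n",
"    ax.plot(plot_inputs[:, 0], predictions[:, 0], label='model predictions')\n",
"    ax.plot(plot_inputs[:, 0], true_outputs, label='true function')\n",
"    ax.plot(inputs_train[:, 0], targets_train[:, 0], '.', label='training data')\n",
"    ax.set_title('RBF network with {0} weights'.format(num_weight))\n",
"    ax.set_xlabel('Inputs $x$')\n",
"    ax.set_ylabel('Outputs $y$')\n",
"    ax.legend(loc='best')\n",
"    fig.tight_layout()\n",
"plt.show()"
]
},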
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You should be able to relate your answers to the questions above to what you see in these plots - ask a demonstrator if you are unsure what is going on. In particular, for the models which appeared to be overfitting and generalising poorly you should now have an idea of how this looks in terms of the model's predictions and how these relate to the training data points and true function values.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# PyTorch\n",
"\n",
"As we have seen in the [previous lab](https://github.com/VICO-UoE/mlpractical/tree/mlp2023-24/lab3/notebooks/03_Multiple_layer_models.ipynb), our model shows signs of overfitting after $15$ epochs. \n",
"\n",
"Overfitting occurs when the model learns the training data too well and fails to generalise to unseen data. In this case, the model learns the noise in the training data and fails to learn the underlying function. \n",
"\n",
"The model may be too complex, and we can reduce the complexity by reducing the number of parameters. However, this may not be the best solution, as we may not be able to learn the underlying function with a model that is too simple.\n",
"\n",
"In practice, a model that overfits the training data will have a low training error, but a high validation error.\n",
"\n",
"*What can we deduce if we observe a high training error and a high validation error?*\n",
"\n",
"Overfitting is a common problem in machine learning, and there are many techniques to avoid it. In this lab, we will explore one of them: *early stopping*."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"import torch.nn as nn\n",
"import torch.optim as optim\n",
"from torchvision import datasets, transforms\n",
"from torch.utils.data.sampler import SubsetRandomSampler\n",
"\n",
"torch.manual_seed(seed)"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [],
"source": [
"# Device configuration\n",
"device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
"\n",
"# Set training run hyperparameters\n",
"batch_size = 128  # number of data points in a batch\n",
"learning_rate = 0.001  # learning rate for gradient descent\n",
"num_epochs = 50  # number of training epochs to perform\n",
"stats_interval = 1  # epoch interval between recording and printing stats"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"transform = transforms.Compose([\n",
"    transforms.ToTensor(),\n",
"    transforms.Normalize((0.1307,), (0.3081,))\n",
"])\n",
"\n",
"train_dataset = datasets.MNIST('../data', train=True, download=True, transform=transform)\n",
"test_dataset = datasets.MNIST('../data', train=False, download=True, transform=transform)\n",
"\n",
"valid_size = 0.2  # Leave 20% of training set as validation set\n",
"num_train = len(train_dataset)\n",
"indices = list(range(num_train))\n",
"split = int(np.floor(valid_size * num_train))\n",
"np.random.shuffle(indices)  # Shuffle indices in-place\n",
"train_idx, valid_idx = indices[split:], indices[:split]  # Split indices into training and validation sets\n",
"train_sampler = SubsetRandomSampler(train_idx)\n",
"valid_sampler = SubsetRandomSampler(valid_idx)\n",
"\n",
"# Create the dataloaders\n",
"train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, sampler=train_sampler, pin_memory=True)\n",
"valid_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, sampler=valid_sampler, pin_memory=True)\n",
"test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size, shuffle=False, pin_memory=True)"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [],
"source": [
"class MultipleLayerModel(nn.Module):\n",
"    \"\"\"Multiple layer model.\"\"\"\n",
"    def __init__(self, input_dim, output_dim, hidden_dim):\n",
"        super().__init__()\n",
"        self.flatten = nn.Flatten()\n",
"        self.linear_relu_stack = nn.Sequential(\n",
"            nn.Linear(input_dim, hidden_dim),\n",
"            nn.ReLU(),\n",
"            nn.Linear(hidden_dim, hidden_dim),\n",
"            nn.ReLU(),\n",
"            nn.Linear(hidden_dim, output_dim),\n",
"        )\n",
"\n",
"    def forward(self, x):\n",
"        x = self.flatten(x)\n",
"        logits = self.linear_relu_stack(x)\n",
"        return logits\n",
"\n",
"input_dim = 1*28*28\n",
"output_dim = 10\n",
"hidden_dim = 100\n",
"\n",
"model = MultipleLayerModel(input_dim, output_dim, hidden_dim).to(device)\n",
"\n",
"loss = nn.CrossEntropyLoss()  # Cross-entropy loss function\n",
"optimizer = optim.Adam(model.parameters(), lr=learning_rate)  # Adam optimiser"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Early stopping is a simple technique to avoid overfitting. The idea is to stop training when the validation error starts to increase. This is usually done by monitoring the validation error during training and stopping when it has not decreased for a certain number of epochs.\n",
"\n",
"*Can we state that overfitting is ultimately inevitable given training over a very large number of epochs?*\n",
"\n",
"In this section, we will implement early stopping in PyTorch. We will use the same model as in the previous lab, but we will train it for up to $50$ epochs with early stopping."
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [],
"source": [
"class EarlyStopping:\n",
"    \"\"\"Early stops the training if validation loss doesn't improve after a given patience.\"\"\"\n",
"    def __init__(self, patience=5, min_delta=0):\n",
"\n",
"        self.patience = patience  # Number of epochs with no improvement after which training will be stopped\n",
"        self.min_delta = min_delta  # Minimum change in the monitored quantity to qualify as an improvement\n",
"        self.counter = 0\n",
"        self.min_validation_loss = float('inf')\n",
"        self.early_stop = False\n",
"\n",
"    def __call__(self, validation_loss):\n",
"        # If validation loss is lower than minimum validation loss so far,\n",
"        # reset counter and set minimum validation loss to current validation loss\n",
"        if validation_loss < self.min_validation_loss:\n",
"            self.min_validation_loss = validation_loss\n",
"            self.counter = 0\n",
"        # If validation loss hasn't improved since minimum validation loss,\n",
"        # increment counter\n",
"        elif validation_loss > (self.min_validation_loss + self.min_delta):\n",
"            self.counter += 1\n",
"            # If counter has reached patience, set early_stop flag to True\n",
"            if self.counter >= self.patience:\n",
"                self.early_stop = True"
]
},
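{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before using the class in a real training run, the purely illustrative cell below feeds a hypothetical sequence of validation losses (the numbers are made up) to a fresh `EarlyStopping` instance, to show how the patience counter behaves once the loss stops improving."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative only: the loss values below are made up, not real training results.\n",
"demo_stopper = EarlyStopping(patience=3, min_delta=0.0)\n",
"demo_losses = [1.0, 0.8, 0.7, 0.72, 0.75, 0.74, 0.76]\n",
"for epoch, loss_value in enumerate(demo_losses):\n",
"    demo_stopper(loss_value)\n",
"    print('epoch {0}: validation loss {1:.2f}, counter {2}, early_stop {3}'.format(\n",
"        epoch, loss_value, demo_stopper.counter, demo_stopper.early_stop))\n",
"    if demo_stopper.early_stop:\n",
"        break"
]
},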
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Initialise early stopping object\n",
"early_stopping = EarlyStopping(patience=5, min_delta=0.01)\n",
"\n",
"# Keep track of the loss values over training\n",
"train_loss = []\n",
"valid_loss = []\n",
"\n",
"# Train model\n",
"for i in range(num_epochs+1):\n",
"    # Training\n",
"    model.train()\n",
"    batch_loss = []\n",
"    for batch_idx, (x, t) in enumerate(train_loader):\n",
"        x = x.to(device)\n",
"        t = t.to(device)\n",
"\n",
"        # Forward pass\n",
"        y = model(x)\n",
"        E_value = loss(y, t)\n",
"\n",
"        # Backward pass\n",
"        optimizer.zero_grad()\n",
"        E_value.backward()\n",
"        optimizer.step()\n",
"\n",
"        # Logging\n",
"        batch_loss.append(E_value.item())\n",
"\n",
"    train_loss.append(np.mean(batch_loss))\n",
"\n",
"    # Validation\n",
"    model.eval()\n",
"    batch_loss = []\n",
"    for batch_idx, (x, t) in enumerate(valid_loader):\n",
"        x = x.to(device)\n",
"        t = t.to(device)\n",
"\n",
"        # Forward pass\n",
"        y = model(x)\n",
"        E_value = loss(y, t)\n",
"\n",
"        # Logging\n",
"        batch_loss.append(E_value.item())\n",
"\n",
"    valid_loss.append(np.mean(batch_loss))\n",
"\n",
"    if i % stats_interval == 0:\n",
"        print('Epoch: {} \\tError(train): {:.6f} \\tError(valid): {:.6f} '.format(\n",
"            i, train_loss[-1], valid_loss[-1]))\n",
"\n",
"    # Check for early stopping\n",
"    early_stopping(valid_loss[-1])\n",
"\n",
"    if early_stopping.early_stop:\n",
"        print(\"Early stopping\")\n",
"        break  # Stop training"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Plot the change in the validation and training set error over training.\n",
"fig_1 = plt.figure(figsize=(8, 4))\n",
"ax_1 = fig_1.add_subplot(111)\n",
"ax_1.plot(train_loss, label='Error(train)')\n",
"ax_1.plot(valid_loss, label='Error(valid)')\n",
"ax_1.legend(loc=0)\n",
"ax_1.set_xlabel('Epoch number')\n",
"plt.show()"
]
}
],
"metadata": {
|
|
"anaconda-cloud": {},
|
|
"kernelspec": {
|
|
"display_name": "mlp",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.12.5"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 1
|
|
}
|