mlpractical/notebooks/Coursework_2_Pytorch_Introduction.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Introduction to PyTorch \n",
    "\n",
    "## Introduction\n",
    "Pytorch is a  modern, intuitive, Pythonic and fast framework for building differentiable graphs. Neural networks, as it happens, are a type of acyclic differentiable graph, making PyTorch a convenient framework to use, should you wish to build (potentially) complicated deep neural networks fairly easily.\n",
    "\n",
    "## MLP package vs Pytorch\n",
    "**Student**: Why do I have to learn to use PyTorch now? I've spent all this time working on the MLP framework. Was that a waste of time?\n",
    "\n",
    "**TA**: Pytorch is everything the MLP package is, and more. It's faster, cleaner and far more up to date with modern deep learning advances, meaning it is easy to tailor to experiments you may wish to run. Since it is one of the main deep learning frameworks being used by industry and research alike, it conforms to the expectation of real users like researchers and engineers. The result is that PyTorch is (and continues to become) a robust and flexible package. Coming to grips with PyTorch now means that you'll be able to apply it to any future project that uses deep learning. \n",
    "\n",
    "Furthermore, the MLP framework was written in NumPy and your time developing this has taught you some fundamental implementation details of NNs: this could (and should) make future research directions more easy to think of and will also enable your debugging prowess. PyTorch was written to emulate NumPy as much as possible, so it will feel very familiar to you. The skills you have acquired are highly transferable (they generalize well, so not much overfitting there!).\n",
    "\n",
    "The devleopers of PyTorch try to make sure that the \"latest and greatest\" state-of-the-art research is included and implemented. If this is not the case, you will often find other people reproducing . If you can't wait, you can reproduce it yourself and open source it (a great way to showcase your skills and get github likes).\n",
    "\n",
    "PyTorch has Autograd! Automatic differentiation. \"What is this?\" you may ask. Remember having to write all those backprop functions? Forget about it. Automatic differentiation allows you to backprop through any PyTorch operation you have used in your graph, by simply calling backward(). This [blog-post](https://jdhao.github.io/2017/11/12/pytorch-computation-graph/) explains how Pytorch's autograd works at an intuitive level.\n",
    "\n",
    "**Student**: Why did we even have to use the MLP package? Why did we even bother if such awesome frameworks are available?\n",
    "\n",
    "**TA**: The purpose of the MLP package was not to allow you to build fast deep learning systems. Instead, it was to help teach you the low level mechanics and sensitivities of building a deep learning system. Building this enabled you to dive deep into how to go about building a deep learning framework from scratch. The intuitions you have gained from going through your assignments and courseworks allow you to see deeper in what makes or breaks a deep learning system, at a level few people actually care to explore. You are no longer restricted to the higher level modules provided by Pytorch/TensorFlow. \n",
    "\n",
    "If, for example, a new project required you to build something that does not exist in PyTorch/TensorFlow, or otherwise modify existing modules in a way that requires understanding and intuitions on backpropagation and layer/optimizer/component implementation, you would be able to do it much more easily than others who did not. You are now equipped to understand differentiable graphs, the chain rule, numerical errors, debugging at the lowest level and deep learning system architecture. \n",
    "\n",
    "By trying to implement your modules in an efficient way, you have also become aware of how to optimize a system for efficiency, and gave you intuitions on how one could further improve such a system (parallelization of implementations). \n",
    "\n",
    "Finally, the slowness of CPU training has allowed you to understand just how important modern GPU acceleration is, for deep learning research and applications. By coming across a large breadth of problems and understanding their origins, you will now be able to both anticipate and solve future problems in a more comprehensive way than someone who did not go through the trouble of implementing the basics from scratch. \n",
    "<!-- \n",
    "**Student**: If we are switching to Pytorch, then why bother implementing convolutions in the MLP package for the coursework?\n",
    "\n",
    "**TA**: All your instructors, myself included, have found it greatly beneficial to implement convolutional networks from scratch. Once you implement convolutional layers, you will have a much deeper insight and understanding into how and why they work, as well as how they break. This way, you know what to do and what to avoid in the future. You might even be able to come with the next great network type yourself.  -->\n",
    "\n",
    "\n",
    "## Getting Started\n",
    "\n",
    "**Student**: So, how is the learning curve of Pytorch? How do I start?\n",
    "\n",
    "**TA**: You can start by using this notebook on your experiments, it should teach you quite a lot on how to properly use PyTorch for basic conv net training. You should be aware of the [official pytorch github](https://github.com/pytorch/pytorch), the [pytorch official documentation page](https://pytorch.org/docs/stable/nn.html) and the [pytorch tutorials page](https://pytorch.org/tutorials/). \n",
    "\n",
    "Over the past year, nearly all students using PyTorch and Tensorflow on MLP and on projects found it easier and faster to get up to speed with PyTorch. In fact, I was a TensorFlow user myself, and learning TensorFlow was much more challenging than PyTorch. Mainly because TensorFlow has its own way of 'thinking' about how you build a graph and execute operations - whereas PyTorch is dynamic and works like NumPy, hence is more intuitive. If you were able to work well with the MLP package, you'll be up and running in no time. \n",
    "\n",
    "**Student**: OK, so how fast is pytorch compared to MLP?\n",
    "\n",
    "**TA**: On the CPU side of things, you'll find pytorch at least 5x faster than the MLP framework (about equal for fully connected networks, but much faster for more complicated things like convolutions - unless you write extremely efficient convolutional layer code), and if you choose to use GPUs, either using MS Azure, Google Cloud or our very own MLP Cluster (available for next semester), you can expect, depending on implementation and hardware an approximate 25-70x speed ups, compared to the CPU performance of pytorch. Yes, that means an experiment that would run overnight, now would only require about 15 minutes.\n",
    "\n",
    "**Student**: Ahh, where should I go to ask more questions?\n",
    "\n",
    "**TA**: As always, start with a Google/DuckDuckGo search, then have a look at the PyTorch Github and PyTorch docs, and if you can't find the answer come to Piazza and the lab sessions. We will be there to support you.\n",
    "\n",
    "\n",
    "#### Note: The code in this jupyter notebook is to introduce you to pytorch and allow you to play around with it in an interactive manner. However, to run your experiments, you should use the Pytorch experiment framework located in ```pytorch_mlp_framework/```. Instructions on how to use it can be found in ```notes/pytorch-experiment-framework.md``` along with the comments and documentation included in the code itself."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Imports and helper functions\n",
    "\n",
    "First, let's import the packages necessary for our tutorial"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from torch import nn\n",
    "from copy import deepcopy\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "import torch.optim as optim\n",
    "import torch.nn.functional as F\n",
    "import torch.backends.cudnn as cudnn\n",
    "import torchvision\n",
    "import tqdm\n",
    "import os\n",
    "import mlp.data_providers as data_providers\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, let's write a helper function for plotting"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "plt.style.use('ggplot')\n",
    "\n",
    "def plot_stats_in_graph(total_losses, y_axis_label, x_axis_label):\n",
    "    \n",
    "    # Plot the change in the validation and training set error over training.\n",
    "    fig_1 = plt.figure(figsize=(8, 4))\n",
    "    ax_1 = fig_1.add_subplot(111)\n",
    "    for k in total_losses.keys():\n",
    "        if \"loss\" in k:\n",
    "            ax_1.plot(np.arange(len(total_losses[k])), total_losses[k], label=k)\n",
    "    ax_1.legend(loc=0)\n",
    "    ax_1.set_xlabel(x_axis_label)\n",
    "    ax_1.set_ylabel(y_axis_label)\n",
    "    \n",
    "\n",
    "    plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Basics: What is a tensor?\n",
    "\n",
    "In numpy we used arrays, whereas in pytorch we use tensors. Tensors are basically multi-dimensional arrays, that can also automatically compute backward passes, and thus gradients, as well as store data to be used at any point in our pytorch pipelines."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([ 5.,  1., 10.]) tensor(5.3333) tensor(3.6818) \n",
      " [ 5.  1. 10.] 5.3333335 3.6817868\n"
     ]
    }
   ],
   "source": [
    "data_pytorch = torch.Tensor([5., 1., 10.]).float()\n",
    "data_numpy = np.array([5., 1., 10]).astype(np.float32)\n",
    "\n",
    "print(data_pytorch, data_pytorch.mean(), data_pytorch.std(unbiased=False), '\\n',\n",
    "      data_numpy, data_numpy.mean(), data_numpy.std())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Tensors have a rich support for a variety of operations, for more information look at the official pytorch [documentation page](https://pytorch.org/docs/stable/torch.html#torch.std)."
   ]
  },
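  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since tensors mirror NumPy arrays so closely, it can help to see the two converted back and forth. The snippet below is a minimal sketch rather than part of the original tutorial: the variable names are illustrative, and it assumes only that `torch` and `numpy` are installed. `torch.from_numpy` shares memory with the source array, whereas `torch.tensor` copies the data.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import torch\n",
    "\n",
    "array = np.array([5., 1., 10.], dtype=np.float32)\n",
    "\n",
    "shared = torch.from_numpy(array)  # shares memory with `array`\n",
    "copied = torch.tensor(array)      # independent copy of the data\n",
    "\n",
    "array[0] = 0.  # modify the NumPy array in place\n",
    "\n",
    "print(shared[0].item())  # reflects the change, because the memory is shared\n",
    "print(copied[0].item())  # unchanged, because this tensor owns its own copy\n",
    "\n",
    "back_to_numpy = copied.numpy()  # tensor -> NumPy array (CPU tensors only)\n",
    "print(type(back_to_numpy), back_to_numpy.dtype)\n",
    "```\n",
    "\n",
    "Sharing memory avoids a copy, which matters for large datasets, but it also means that in-place edits on one side are visible on the other."
   ]
  },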
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Basics: A simple pytorch graph of operations\n",
    "\n",
    "Pytorch automatically tracks the flow of data through operations without requiring explicit instruction to do so. \n",
    "For example, we can easily compute the grads wrt to a variable **a** (which is initialized with requires grad = True to let the framework know that we'll be requiring the grads of that variable) by simple calling .backward() followed by .grad:\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[[[0.0019, 0.0018, 0.0016,  ..., 0.0024, 0.0022, 0.0021],\n",
      "          [0.0019, 0.0025, 0.0017,  ..., 0.0028, 0.0024, 0.0023],\n",
      "          [0.0024, 0.0019, 0.0025,  ..., 0.0023, 0.0012, 0.0027],\n",
      "          ...,\n",
      "          [0.0023, 0.0021, 0.0025,  ..., 0.0017, 0.0027, 0.0019],\n",
      "          [0.0026, 0.0023, 0.0015,  ..., 0.0028, 0.0024, 0.0028],\n",
      "          [0.0019, 0.0010, 0.0024,  ..., 0.0021, 0.0014, 0.0019]],\n",
      "\n",
      "         [[0.0027, 0.0026, 0.0022,  ..., 0.0025, 0.0027, 0.0022],\n",
      "          [0.0025, 0.0023, 0.0025,  ..., 0.0020, 0.0024, 0.0030],\n",
      "          [0.0024, 0.0027, 0.0024,  ..., 0.0014, 0.0019, 0.0023],\n",
      "          ...,\n",
      "          [0.0025, 0.0024, 0.0012,  ..., 0.0027, 0.0022, 0.0024],\n",
      "          [0.0021, 0.0023, 0.0026,  ..., 0.0024, 0.0020, 0.0022],\n",
      "          [0.0022, 0.0019, 0.0026,  ..., 0.0013, 0.0025, 0.0018]],\n",
      "\n",
      "         [[0.0021, 0.0018, 0.0017,  ..., 0.0023, 0.0021, 0.0015],\n",
      "          [0.0026, 0.0012, 0.0023,  ..., 0.0022, 0.0018, 0.0022],\n",
      "          [0.0018, 0.0023, 0.0024,  ..., 0.0020, 0.0020, 0.0021],\n",
      "          ...,\n",
      "          [0.0023, 0.0017, 0.0025,  ..., 0.0025, 0.0023, 0.0026],\n",
      "          [0.0023, 0.0023, 0.0025,  ..., 0.0019, 0.0020, 0.0016],\n",
      "          [0.0025, 0.0022, 0.0021,  ..., 0.0023, 0.0023, 0.0021]]],\n",
      "\n",
      "\n",
      "        [[[0.0022, 0.0025, 0.0026,  ..., 0.0025, 0.0021, 0.0016],\n",
      "          [0.0010, 0.0021, 0.0029,  ..., 0.0025, 0.0021, 0.0026],\n",
      "          [0.0016, 0.0023, 0.0020,  ..., 0.0025, 0.0020, 0.0026],\n",
      "          ...,\n",
      "          [0.0024, 0.0016, 0.0025,  ..., 0.0024, 0.0027, 0.0020],\n",
      "          [0.0016, 0.0017, 0.0023,  ..., 0.0017, 0.0023, 0.0020],\n",
      "          [0.0015, 0.0031, 0.0018,  ..., 0.0020, 0.0022, 0.0013]],\n",
      "\n",
      "         [[0.0023, 0.0022, 0.0029,  ..., 0.0017, 0.0019, 0.0026],\n",
      "          [0.0022, 0.0018, 0.0023,  ..., 0.0023, 0.0011, 0.0025],\n",
      "          [0.0024, 0.0022, 0.0022,  ..., 0.0028, 0.0025, 0.0020],\n",
      "          ...,\n",
      "          [0.0021, 0.0025, 0.0021,  ..., 0.0021, 0.0023, 0.0026],\n",
      "          [0.0021, 0.0026, 0.0014,  ..., 0.0031, 0.0024, 0.0025],\n",
      "          [0.0025, 0.0025, 0.0020,  ..., 0.0021, 0.0024, 0.0021]],\n",
      "\n",
      "         [[0.0016, 0.0019, 0.0025,  ..., 0.0021, 0.0019, 0.0029],\n",
      "          [0.0019, 0.0023, 0.0021,  ..., 0.0026, 0.0017, 0.0026],\n",
      "          [0.0016, 0.0026, 0.0020,  ..., 0.0027, 0.0022, 0.0028],\n",
      "          ...,\n",
      "          [0.0015, 0.0026, 0.0015,  ..., 0.0015, 0.0021, 0.0027],\n",
      "          [0.0019, 0.0018, 0.0022,  ..., 0.0020, 0.0016, 0.0021],\n",
      "          [0.0018, 0.0021, 0.0020,  ..., 0.0018, 0.0025, 0.0019]]],\n",
      "\n",
      "\n",
      "        [[[0.0015, 0.0021, 0.0028,  ..., 0.0023, 0.0013, 0.0017],\n",
      "          [0.0019, 0.0023, 0.0021,  ..., 0.0022, 0.0014, 0.0020],\n",
      "          [0.0027, 0.0017, 0.0019,  ..., 0.0022, 0.0018, 0.0015],\n",
      "          ...,\n",
      "          [0.0017, 0.0027, 0.0022,  ..., 0.0019, 0.0024, 0.0026],\n",
      "          [0.0018, 0.0023, 0.0016,  ..., 0.0018, 0.0013, 0.0028],\n",
      "          [0.0018, 0.0021, 0.0017,  ..., 0.0028, 0.0022, 0.0020]],\n",
      "\n",
      "         [[0.0018, 0.0023, 0.0020,  ..., 0.0020, 0.0021, 0.0019],\n",
      "          [0.0019, 0.0016, 0.0016,  ..., 0.0026, 0.0021, 0.0025],\n",
      "          [0.0020, 0.0027, 0.0012,  ..., 0.0020, 0.0016, 0.0025],\n",
      "          ...,\n",
      "          [0.0023, 0.0019, 0.0023,  ..., 0.0025, 0.0026, 0.0030],\n",
      "          [0.0026, 0.0017, 0.0017,  ..., 0.0018, 0.0018, 0.0023],\n",
      "          [0.0024, 0.0025, 0.0031,  ..., 0.0028, 0.0024, 0.0024]],\n",
      "\n",
      "         [[0.0020, 0.0023, 0.0029,  ..., 0.0030, 0.0020, 0.0022],\n",
      "          [0.0023, 0.0014, 0.0024,  ..., 0.0018, 0.0019, 0.0027],\n",
      "          [0.0022, 0.0013, 0.0019,  ..., 0.0021, 0.0025, 0.0015],\n",
      "          ...,\n",
      "          [0.0022, 0.0019, 0.0019,  ..., 0.0015, 0.0026, 0.0020],\n",
      "          [0.0017, 0.0016, 0.0025,  ..., 0.0021, 0.0023, 0.0019],\n",
      "          [0.0019, 0.0027, 0.0020,  ..., 0.0021, 0.0022, 0.0022]]],\n",
      "\n",
      "\n",
      "        ...,\n",
      "\n",
      "\n",
      "        [[[0.0017, 0.0022, 0.0020,  ..., 0.0021, 0.0025, 0.0025],\n",
      "          [0.0024, 0.0021, 0.0024,  ..., 0.0019, 0.0023, 0.0020],\n",
      "          [0.0021, 0.0027, 0.0019,  ..., 0.0024, 0.0014, 0.0018],\n",
      "          ...,\n",
      "          [0.0020, 0.0022, 0.0016,  ..., 0.0030, 0.0028, 0.0021],\n",
      "          [0.0015, 0.0024, 0.0020,  ..., 0.0018, 0.0026, 0.0025],\n",
      "          [0.0028, 0.0025, 0.0030,  ..., 0.0015, 0.0022, 0.0023]],\n",
      "\n",
      "         [[0.0029, 0.0020, 0.0021,  ..., 0.0026, 0.0019, 0.0021],\n",
      "          [0.0027, 0.0023, 0.0024,  ..., 0.0017, 0.0021, 0.0024],\n",
      "          [0.0021, 0.0027, 0.0015,  ..., 0.0017, 0.0019, 0.0025],\n",
      "          ...,\n",
      "          [0.0018, 0.0023, 0.0015,  ..., 0.0026, 0.0021, 0.0019],\n",
      "          [0.0016, 0.0018, 0.0027,  ..., 0.0009, 0.0013, 0.0023],\n",
      "          [0.0013, 0.0026, 0.0022,  ..., 0.0021, 0.0020, 0.0022]],\n",
      "\n",
      "         [[0.0028, 0.0020, 0.0014,  ..., 0.0019, 0.0025, 0.0026],\n",
      "          [0.0024, 0.0018, 0.0017,  ..., 0.0013, 0.0023, 0.0025],\n",
      "          [0.0023, 0.0017, 0.0024,  ..., 0.0018, 0.0023, 0.0025],\n",
      "          ...,\n",
      "          [0.0006, 0.0027, 0.0023,  ..., 0.0022, 0.0022, 0.0017],\n",
      "          [0.0015, 0.0024, 0.0018,  ..., 0.0023, 0.0021, 0.0019],\n",
      "          [0.0015, 0.0014, 0.0025,  ..., 0.0020, 0.0017, 0.0026]]],\n",
      "\n",
      "\n",
      "        [[[0.0020, 0.0024, 0.0019,  ..., 0.0023, 0.0020, 0.0024],\n",
      "          [0.0022, 0.0020, 0.0021,  ..., 0.0017, 0.0019, 0.0019],\n",
      "          [0.0022, 0.0020, 0.0028,  ..., 0.0020, 0.0025, 0.0024],\n",
      "          ...,\n",
      "          [0.0018, 0.0024, 0.0017,  ..., 0.0018, 0.0019, 0.0019],\n",
      "          [0.0021, 0.0015, 0.0012,  ..., 0.0016, 0.0022, 0.0015],\n",
      "          [0.0020, 0.0019, 0.0016,  ..., 0.0019, 0.0025, 0.0023]],\n",
      "\n",
      "         [[0.0020, 0.0017, 0.0020,  ..., 0.0016, 0.0017, 0.0020],\n",
      "          [0.0020, 0.0014, 0.0021,  ..., 0.0022, 0.0021, 0.0026],\n",
      "          [0.0021, 0.0018, 0.0016,  ..., 0.0025, 0.0029, 0.0016],\n",
      "          ...,\n",
      "          [0.0018, 0.0016, 0.0018,  ..., 0.0018, 0.0020, 0.0015],\n",
      "          [0.0017, 0.0015, 0.0018,  ..., 0.0024, 0.0020, 0.0022],\n",
      "          [0.0019, 0.0022, 0.0017,  ..., 0.0014, 0.0026, 0.0020]],\n",
      "\n",
      "         [[0.0019, 0.0024, 0.0022,  ..., 0.0023, 0.0024, 0.0022],\n",
      "          [0.0023, 0.0032, 0.0018,  ..., 0.0013, 0.0030, 0.0020],\n",
      "          [0.0022, 0.0018, 0.0025,  ..., 0.0024, 0.0021, 0.0014],\n",
      "          ...,\n",
      "          [0.0020, 0.0018, 0.0025,  ..., 0.0025, 0.0020, 0.0023],\n",
      "          [0.0021, 0.0027, 0.0019,  ..., 0.0021, 0.0015, 0.0020],\n",
      "          [0.0019, 0.0018, 0.0028,  ..., 0.0024, 0.0018, 0.0026]]],\n",
      "\n",
      "\n",
      "        [[[0.0024, 0.0024, 0.0030,  ..., 0.0029, 0.0023, 0.0018],\n",
      "          [0.0024, 0.0028, 0.0016,  ..., 0.0019, 0.0020, 0.0022],\n",
      "          [0.0014, 0.0022, 0.0019,  ..., 0.0025, 0.0021, 0.0023],\n",
      "          ...,\n",
      "          [0.0022, 0.0024, 0.0016,  ..., 0.0017, 0.0019, 0.0029],\n",
      "          [0.0025, 0.0027, 0.0022,  ..., 0.0018, 0.0028, 0.0019],\n",
      "          [0.0029, 0.0020, 0.0027,  ..., 0.0016, 0.0024, 0.0025]],\n",
      "\n",
      "         [[0.0026, 0.0019, 0.0024,  ..., 0.0029, 0.0025, 0.0010],\n",
      "          [0.0021, 0.0020, 0.0027,  ..., 0.0023, 0.0023, 0.0021],\n",
      "          [0.0022, 0.0027, 0.0023,  ..., 0.0012, 0.0019, 0.0015],\n",
      "          ...,\n",
      "          [0.0026, 0.0023, 0.0020,  ..., 0.0024, 0.0019, 0.0017],\n",
      "          [0.0027, 0.0025, 0.0021,  ..., 0.0030, 0.0026, 0.0025],\n",
      "          [0.0015, 0.0012, 0.0027,  ..., 0.0030, 0.0023, 0.0018]],\n",
      "\n",
      "         [[0.0026, 0.0028, 0.0025,  ..., 0.0025, 0.0023, 0.0017],\n",
      "          [0.0017, 0.0033, 0.0028,  ..., 0.0022, 0.0013, 0.0021],\n",
      "          [0.0020, 0.0019, 0.0019,  ..., 0.0024, 0.0024, 0.0020],\n",
      "          ...,\n",
      "          [0.0022, 0.0014, 0.0017,  ..., 0.0019, 0.0022, 0.0019],\n",
      "          [0.0020, 0.0021, 0.0027,  ..., 0.0021, 0.0021, 0.0019],\n",
      "          [0.0020, 0.0014, 0.0020,  ..., 0.0029, 0.0021, 0.0026]]]])\n"
     ]
    }
   ],
   "source": [
    "a = torch.randn((32, 3, 14, 14), requires_grad=True)\n",
    "b = torch.ones((32, 3, 14, 14)) * 5\n",
    "\n",
    "result_addition = a + b\n",
    "result_double = result_addition * 2\n",
    "result_square = result_double ** 2\n",
    "result_mean = result_square.mean()\n",
    "\n",
    "loss = result_mean\n",
    "\n",
    "loss.backward()\n",
    "\n",
    "print(a.grad)"
   ]
  },
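  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The gradient printed above can be checked by hand with the chain rule. The loss is the mean of `(2 * (a + b)) ** 2` over `N = 32 * 3 * 14 * 14 = 18816` elements, so its derivative with respect to each element of `a` is `8 * (a + b) / N`. With `a` close to zero and `b = 5`, that is roughly `40 / 18816`, or about `0.0021`, which matches the magnitude of the printed values. The sketch below compares autograd against this hand-derived expression; `manual_grad` is a name introduced here for illustration.\n",
    "\n",
    "```python\n",
    "import torch\n",
    "\n",
    "a = torch.randn((32, 3, 14, 14), requires_grad=True)\n",
    "b = torch.ones((32, 3, 14, 14)) * 5\n",
    "\n",
    "# Same graph as in the cell above, written as one expression.\n",
    "loss = (((a + b) * 2) ** 2).mean()\n",
    "loss.backward()\n",
    "\n",
    "# Chain rule by hand: d(loss)/da = 8 * (a + b) / N, with N the number of elements.\n",
    "manual_grad = 8 * (a.detach() + b) / a.numel()\n",
    "\n",
    "# Compare the autograd result with the hand-derived gradient.\n",
    "print(torch.allclose(a.grad, manual_grad))\n",
    "```\n",
    "\n",
    "Checks of this kind are the PyTorch counterpart of the gradient checks you wrote for the MLP framework, and they remain useful whenever you implement a custom operation."
   ]
  },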
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Student**: Ok, so we can build graphs, what about neural networks? Are there any pre-built layers? How do we train things? How do we define parameters and biases for our models? \n",
    "\n",
    "**TA**: Don't rush. Let's take it step by step. Let's look at nn.Parameters first.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**TA**: In Pytorch all learnable components are created using the nn.Parameter class. That class, automatically tracks all gradients, and allows quick and easy updates in a given graph.\n",
    "\n",
    "**Note**: np.dot for a single batch going to a single 2D weight matrix is called using F.linear in Pytorch.\n",
    "\n",
    "**Further Note**: There also exist ParameterDicts for dictionaries of parameters, and ParameterLists when you define a list of parameters for part of your model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "torch.Size([16, 32])\n",
      "current loss tensor(0.3759, grad_fn=<MeanBackward0>)\n",
      "current loss tensor(0.3666, grad_fn=<MeanBackward0>)\n",
      "current loss tensor(0.3480, grad_fn=<MeanBackward0>)\n",
      "current loss tensor(0.3201, grad_fn=<MeanBackward0>)\n",
      "current loss tensor(0.2830, grad_fn=<MeanBackward0>)\n",
      "current loss tensor(0.2365, grad_fn=<MeanBackward0>)\n",
      "current loss tensor(0.1808, grad_fn=<MeanBackward0>)\n",
      "current loss tensor(0.1157, grad_fn=<MeanBackward0>)\n",
      "current loss tensor(0.0414, grad_fn=<MeanBackward0>)\n",
      "current loss tensor(-0.0422, grad_fn=<MeanBackward0>)\n"
     ]
    }
   ],
   "source": [
    "weights = nn.Parameter(torch.randn(32, 32), requires_grad=True)\n",
    "inputs = torch.randn(16, 32)\n",
    "outputs = F.linear(inputs, weights)\n",
    "learning_rate = 0.1\n",
    "\n",
    "print(outputs.shape)\n",
    "\n",
    "for i in range(10):\n",
    "    outputs = F.linear(inputs, weights)\n",
    "    loss = torch.mean(outputs)\n",
    "    loss.backward()\n",
    "    weights.data = weights.data - learning_rate * weights.grad\n",
    "    print('current loss', loss)"
   ]
  },
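  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the relationship between `F.linear` and a plain matrix product concrete, the sketch below computes the same affine map both ways and compares the results. The shapes and variable names are illustrative assumptions; the only thing relied upon is that `F.linear` expects weights of shape `(out_features, in_features)`.\n",
    "\n",
    "```python\n",
    "import torch\n",
    "import torch.nn.functional as F\n",
    "\n",
    "inputs = torch.randn(16, 32)   # batch of 16 examples with 32 features each\n",
    "weights = torch.randn(64, 32)  # (out_features, in_features), as F.linear expects\n",
    "bias = torch.zeros(64)\n",
    "\n",
    "out_functional = F.linear(inputs, weights, bias)\n",
    "out_manual = inputs @ weights.t() + bias  # the same affine map written out by hand\n",
    "\n",
    "print(out_functional.shape)\n",
    "print(torch.allclose(out_functional, out_manual, atol=1e-5))\n",
    "```\n",
    "\n",
    "Note the transpose: the weight matrix is stored with the output dimension first, which is why the layer defined later in this notebook creates its weights with shape `(num_units, input_shape[1])`."
   ]
  },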
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## nn.Modules and why they are important\n",
    "\n",
    "Pytorch implements a class called the nn.Module class. The nn.Module class automatically detects any nn.Parameter, nn.ParameterList or nn.ParameterDict and adds it to a collection of parameters which can be easily accessed using .parameters and/or .named_parameters().\n",
    "\n",
    "Let's look at an example:\n",
    "\n",
    "Let's build a fully connected layer followed by an activation function that can be preselected, similar to coursework 1. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "class LinearLayerWithActivation(nn.Module):\n",
    "    def __init__(self, input_shape, num_units, bias=False, activation_type=nn.ReLU()):\n",
    "        super(LinearLayerWithActivation, self).__init__()\n",
    "        self.activation_type = activation_type\n",
    "        self.weights = nn.Parameter(torch.empty(size=(num_units, input_shape[1]), requires_grad=True))\n",
    "        \n",
    "        nn.init.normal_(self.weights)\n",
    "        \n",
    "        if bias:\n",
    "            self.bias = nn.Parameter(torch.zeros(num_units), requires_grad=True)\n",
    "        else:\n",
    "            self.bias = None\n",
    "        \n",
    "    def forward(self, x):\n",
    "        out = F.linear(x, self.weights, self.bias)\n",
    "        out = self.activation_type.forward(out)\n",
    "        return out\n",
    "        "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Parameters with name weights and shape torch.Size([512, 128])\n",
      "Parameters with name bias and shape torch.Size([512])\n"
     ]
    }
   ],
   "source": [
    "x = torch.arange(16*128).view(16, 128).float()\n",
    "y = torch.arange((16))\n",
    "\n",
    "fcc_net = LinearLayerWithActivation(input_shape=x.shape, num_units=512, bias=True, activation_type=nn.Identity())\n",
    "optimizer = optim.Adam(fcc_net.parameters(), amsgrad=False, weight_decay=0.0)\n",
    "\n",
    "\n",
    "for name, params in fcc_net.named_parameters():\n",
    "    print('Parameters with name', name, 'and shape', params.shape)\n",
    "\n",
    "metric_dict = {'losses': []}    \n",
    "    \n",
    "for i in range(50):\n",
    "\n",
    "    out = fcc_net.forward(x)\n",
    "    loss = F.cross_entropy(out, y)\n",
    "    fcc_net.zero_grad() #removes grads of previous step\n",
    "    optimizer.zero_grad() #removes grads of previous step\n",
    "    loss.backward() #compute gradients of current step\n",
    "    optimizer.step() #update step\n",
    "    metric_dict['losses'].append(loss.detach().cpu().numpy()) #.detach: Copies the value of the loss \n",
    "#                                                               and removes it from the graph, \n",
    "#                                                             .cpu() sends to cpu, and \n",
    "#                                                              numpy(), converts it to numpy format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAsoAAAF0CAYAAAA6vh/YAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABrE0lEQVR4nO3de3zO9f/H8cf72rVhNmYOzZoxh4lmzpFDhMg5JR1olUOk5HuovtVK5RtS32/1S76lSFmhkjklySFyiBBzKMNoOYUyM4sdrvfvj4urlsl513Z53m83t5t9rvf1+byu69Wl5/Xe+/P5GGutRURERERE8nB4uwARERERkcJIQVlEREREJB8KyiIiIiIi+VBQFhERERHJh4KyiIiIiEg+FJRFRERERPKhoCwiIiIikg8FZRERERGRfCgoi4iIiIjkQ0FZRERERCQfTm8X4KsOHz5MTk5OgRyrfPnyHDx4sECOJZeXeuk71EvfoV76DvXSd1xsL51OJ2XKlDn7uAs+gvylnJwcsrOzL/txjDGe41lrL/vx5PJRL32Heuk71EvfoV76joLspZZeiIiIiIjkQ0FZRERERCQfCsoiIiIiIvlQUBYRERERyYdO5hMREREpICdOnODEiRPeLqPI++2338jKyvrLMcWKFaNYsWIXdRwFZREREZECcOzYMYwxBAcHe67cIBfG39//L68uZq3lt99+49ixY5QsWfKCj6OlFyIiIiIFICcnh8DAQIXkAmCMITAw8KLvaaGgLCIiIlIAFJAL3sW+5wrKIiIiIiL5UFAu4lzzZ5Czb7e3yxARERHxOQrKRZjd8QOujyewb+Bt5L73OvbQz94uSURERHxMz549GTZsmLfL8AoF5aKsWHHMtQ0gNxe77EtcTw/CNekNBWYRERGRS0BBuQgzEVXw+/vzVPjPu5hr67sD89fzcT39IK6EsdhfDni7RBEREZEiS0HZBxSrFYvf34fj+NdoqF0PcnOwS7/AFT8IV8L/sL8c9HaJIiIi8ifWWuyJ4975Y+0F1ZyWlsYjjzxC7dq1qVatGn369CElJcXz+O7du7n33nupXbs21atX58Ybb2ThwoWe5z788MPUqVOHatWq0bx5cz766CPPc/ft28egQYOoXbs21157Lffffz8//fST5/EVK1bQuXNnqlevTvXq1enevTu7d1/e87R0wxEfYqrXwu/vw7HbtuCaPQW+34BdOg+7fAGm5U2Yjj0xoeW9XaaIiIgAZJ3A9XAvrxza8cbHUKz4eT/v73//Ozt37mTixIkEBQUxcuRI7rnnHr766iv8/f156qmnyM7O5tNPPyUwMJDk5GTPDT9efvllkpOT+eCDDwgNDWXnzp0cP34ccN9p7/bbb6dJkyZ8+umnOJ1O/u///o/evXuzYMECHA4H/fr14+6772bs2LFYa/n2228v+yX3FJR9kKlRG79//BubvNkdmH9Iwn71OXbZl5gW7U8G5nLeLlNERESKkJSUFObPn8+MGTNo3LgxAGPGjKFx48bMmzePrl27snfvXjp16kStWrUAqFy5suf5e/bsISYmhrp16wJQqVIlz2MzZ87E4XDwn//8xxN+X3nlFWrVqsXKlSuJjY0lPT2ddu3aUaVKFfz9/YmKirrsr1lB2YeZ6Gvx++cL2K2b3IF560bsV3Oxy+ZjWnZwB+YyZb1dpoiIyJUpoJh7ZtdLxz5f27dvx+l00qBBA8+20NBQqlWrxvbt2wHo27cvTz75JEuWLKFly5Z06tSJ2rVrAxAXF8eAAQPYuHEjrVq1okOHDp7AnZSUxK5du4iOjs5zzBMnTrBr1y5atWpFr1696N27Ny1btqR169Z06tSJq6666kLfgXOioHwFMDVj8Ks5Art1I65ZUyB5E3bxZ9iv52Nu6IDpeBsmRIFZRESkIBljLmj5g7ecaV3zH7fffffdtGrVioULF7J06VLeeOMNhg0bRt++fWnTpg2rV69m
wYIFLFu2jDvvvJN7772XYcOG4XK5iI2NZcyYMaftv2xZd0Z59dVX6devH4sXL2bGjBmMGjWKKVOm0LBhw8vzgikEQXn+/PnMnz+fgwfdJ5xFRETQs2dP6tevf9rYt99+mwULFnDvvffSuXNnz/bs7GwSEhJYvnw5WVlZxMTE0L9/f88bC5CRkcHEiRNZs2YNAI0aNaJv376edTMAhw4dYvz48WzevJmAgACaN29OXFwcTqfX36ZLwtSsg99jdbA/JOGaNRm2bcEumoNd+gWm1c2Ym2/DhIR6u0wREREphGrUqEFOTg7r1q3zzAT/+uuvpKSkUKNGDc+4q6++mri4OOLi4hg1ahSTJ0+mb9++gDv03nHHHdxxxx1cd911vPDCCwwbNow6deowe/ZsypUrR3Bw8BlriImJISYmhn/84x/cfPPNzJgx47IGZa9f9SI0NJS7776bUaNGMWrUKGJiYnjppZfynOUIsHr1arZt20aZMmVO28d7773H6tWrGTp0KMOHD+f48eO8+OKLuFwuz5jXX3+dXbt2ER8fT3x8PLt27crzrcXlcjFq1ChOnDjB8OHDGTp0KKtWrWLSpEmX78V7ibkmFsdjo3D8499QvTbkZGMXzsb11AO4pr6DTfvV2yWKiIhIIVO1alU6dOjA448/zurVq9m8eTOPPPIIYWFhdOjQAYBhw4bx1VdfkZqaysaNG1m+fDnVq1cH3CfzffHFF+zcuZOtW7eyYMECT8C+9dZbKVOmDPfffz+rVq0iNTWVlStXMmzYMPbu3UtqaiqjRo1izZo17N69m8WLF5OSkuLZ9+Xi9aDcqFEjGjRoQHh4OOHh4dx1110UL16cbdu2ecb8+uuvvPvuuzzyyCOnze5mZmayaNEi4uLiiI2NJSoqiiFDhpCamkpSUhLgvlTJ+vXrGTRoENHR0URHRzNw4EDWrVvH3r17AdiwYQO7d+9myJAhREVFERsbS1xcHAsXLiQzM7Pg3pACYozB1KqL4/FROP4+HKpdA9lZvwfmjyZgjxz2dpkiIiJSiLzyyivUqVOHe++9l27dumGtJSEhAX9/f8A98RgfH0/r1q3p3bs3VatWZeTIkQD4+/szatQo2rVrx6233oqfnx//+9//AChRogTTp0/n6quvpn///rRu3Zp//vOfHD9+nODgYEqUKMH27dt54IEHaNmyJY8++ij3338/99xzz2V9vYVqTYHL5WLlypWcOHHCs5jb5XIxZswYunXrlufsyFNSUlLIzc0lNjbWsy00NJTIyEiSk5OpV68eycnJBAYG5vm1QHR0NIGBgWzdupXw8HCSk5OJjIwkNPT3pQd169YlOzublJQUYmJi8q05Ozub7Oxsz8/GGEqUKOH5++V26hgXeixjDFxbH1O7HnbLd7hmToaUrdgFM7FLP8e06oTj5lsxpU+fyZdL62J7KYWHeuk71EvfoV5euGnTpnn+HhISwuuvv37GsS+88MIZH/vb3/7G3/72tzM+XqFCBf7v//4v38eCg4OZMGGC52d/f/88+euvXEzPC0VQTk1NJT4+nuzsbIoXL86jjz5KREQE4L5ciJ+fHx07dsz3uWlpaTidToKCgvJsL126NGlpaZ4xpUuXPu25ZxsTFBSE0+n0jMlPYmJinv+AoqKiGD16NOXLF+z1isPCwi5+J+Hh2LadOL7uG9I/HEfW1k3YL2fgWvo5QZ1uJ7hnHH5aw3zZXZJeSqGgXvoO9dJ3eLOXv/32m2fmVS7eubyXAQEBVKxY8YKPUSiCcnh4OC+//DLHjh1j1apVjB07lueff56srCzmzp3L6NGjz/vbwLncccZam2e/+R3jz2P+rEePHnTp0uW0fRw8eJCcnJzzKfmCGGMICwtj//79F3yXndOEV8E+OhLHprW4Zk7G7trG0cQPOPrZJ5gbO+Po0ANTKuTSHEs8LksvxSvUS9+hXvqOwtDLrKysc54Flb92rjPKWVlZ7Nu377TtTqfznCY1C0VQdjqd
nm941apVY8eOHcydO5err76a9PR0Bg8e7BnrcrmYNGkSc+fOZezYsYSEhJCTk0NGRkaeWeX09HRq1qwJuH9NcOTIkdOOm56
      "text/plain": [
       "<Figure size 800x400 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "plot_stats_in_graph(metric_dict, y_axis_label='Loss', x_axis_label='Number of Steps')"
   ]
  },
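  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see the parameter registration performed by `nn.Module` in isolation, the sketch below defines a small module whose attributes are a mix of registered and unregistered tensors. `TinyModule` and its attribute names are hypothetical and exist only for this illustration. Only attributes wrapped in `nn.Parameter` (directly or inside an `nn.ParameterList`) are reported by `named_parameters()`; the plain tensor attribute is not, so an optimizer built from `module.parameters()` would never update it.\n",
    "\n",
    "```python\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "\n",
    "class TinyModule(nn.Module):  # hypothetical module, for illustration only\n",
    "    def __init__(self):\n",
    "        super().__init__()\n",
    "        self.registered = nn.Parameter(torch.zeros(3))  # tracked by the module\n",
    "        self.not_registered = torch.zeros(3)            # plain tensor, not tracked\n",
    "        self.scales = nn.ParameterList(\n",
    "            [nn.Parameter(torch.ones(2)) for _ in range(2)]\n",
    "        )\n",
    "\n",
    "module = TinyModule()\n",
    "for name, param in module.named_parameters():\n",
    "    print(name, tuple(param.shape))\n",
    "```\n",
    "\n",
    "Keeping parameters in a plain Python list instead of an `nn.ParameterList` has the same effect as the unregistered tensor here: the module does not see them."
   ]
  },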
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**TA**: Does that make sense now?\n",
    "\n",
    "**Student**: Yeah, somewhat. What about more complicated systems? Will I have to implement everything using barebone components like F.linear etc.?\n",
    "\n",
    "**TA**: You can use existing nn.Modules as components of new nn.Modules therefore, you are able of modularizing your network blocks, and then combining them at the end in one big network with very few lines of code. Pytorch already provides almost every kind of layer out there in their torch.nn package. Look at the [documentation](https://pytorch.org/docs/stable/nn.html) for more information. Now, let's see how we can combine modules to build a larger module. Let's build a multi layer fully connected module.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "class MultiLayerFCCNetwork(nn.Module):\n",
    "    def __init__(self, input_shape, num_hidden_units, num_output_units, num_hidden_layers):\n",
    "        super(MultiLayerFCCNetwork, self).__init__()\n",
    "        self.input_shape = input_shape\n",
    "        self.num_hidden_units = num_hidden_units\n",
    "        self.num_output_units = num_output_units\n",
    "        self.num_hidden_layers = num_hidden_layers\n",
    "        \n",
    "        x_dummy = torch.zeros(input_shape)\n",
    "        \n",
    "        self.layer_dict = nn.ModuleDict() # Allows us to initialize modules within a dictionary structure.\n",
    "        out = x_dummy\n",
    "        for i in range(self.num_hidden_layers):\n",
    "            self.layer_dict['layer_{}'.format(i)] = LinearLayerWithActivation(input_shape=out.shape, \n",
    "                                                             num_units=self.num_hidden_units, bias=True,\n",
    "                                                                       activation_type=nn.PReLU())\n",
    "            \n",
    "            out = self.layer_dict['layer_{}'.format(i)].forward(out)\n",
    "        \n",
    "        self.layer_dict['output_layer'] = LinearLayerWithActivation(input_shape=out.shape, \n",
    "                                                             num_units=self.num_output_units, \n",
    "                                                             bias=True, activation_type=nn.Identity())\n",
    "        out = self.layer_dict['output_layer'].forward(out)\n",
    "    \n",
    "    def forward(self, x):\n",
    "        out = x\n",
    "        for i in range(self.num_hidden_layers):\n",
    "            out = self.layer_dict['layer_{}'.format(i)].forward(out)\n",
    "\n",
    "        out = self.layer_dict['output_layer'].forward(out)\n",
    "        return out\n",
    "            \n",
    "        "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Parameters with name layer_dict.layer_0.weights and shape torch.Size([64, 128])\n",
      "Parameters with name layer_dict.layer_0.bias and shape torch.Size([64])\n",
      "Parameters with name layer_dict.layer_0.activation_type.weight and shape torch.Size([1])\n",
      "Parameters with name layer_dict.layer_1.weights and shape torch.Size([64, 64])\n",
      "Parameters with name layer_dict.layer_1.bias and shape torch.Size([64])\n",
      "Parameters with name layer_dict.layer_1.activation_type.weight and shape torch.Size([1])\n",
      "Parameters with name layer_dict.layer_2.weights and shape torch.Size([64, 64])\n",
      "Parameters with name layer_dict.layer_2.bias and shape torch.Size([64])\n",
      "Parameters with name layer_dict.layer_2.activation_type.weight and shape torch.Size([1])\n",
      "Parameters with name layer_dict.layer_3.weights and shape torch.Size([64, 64])\n",
      "Parameters with name layer_dict.layer_3.bias and shape torch.Size([64])\n",
      "Parameters with name layer_dict.layer_3.activation_type.weight and shape torch.Size([1])\n",
      "Parameters with name layer_dict.output_layer.weights and shape torch.Size([512, 64])\n",
      "Parameters with name layer_dict.output_layer.bias and shape torch.Size([512])\n"
     ]
    }
   ],
   "source": [
    "fcc_net = MultiLayerFCCNetwork(input_shape=x.shape, num_hidden_units=64, num_output_units=512, \n",
    "                               num_hidden_layers=4)\n",
    "optimizer = optim.Adam(fcc_net.parameters(), amsgrad=False, weight_decay=0.0)\n",
    "\n",
    "\n",
    "for name, params in fcc_net.named_parameters():\n",
    "    print('Parameters with name', name, 'and shape', params.shape)\n",
    "\n",
    "metric_dict = {'losses': []}    \n",
    "    \n",
    "for i in range(100):\n",
    "\n",
    "    out = fcc_net.forward(x)\n",
    "    loss = F.cross_entropy(out, y)\n",
    "    fcc_net.zero_grad() #removes grads of previous step\n",
    "    optimizer.zero_grad() #removes grads of previous step\n",
    "    loss.backward() #compute gradients of current step\n",
    "    optimizer.step() #update step\n",
    "\n",
    "    metric_dict['losses'].append(loss.detach().cpu().numpy()) #.detach: Copies the value of the loss \n",
    "#                                                               and removes it from the graph, \n",
    "#                                                             .cpu() sends to cpu, and \n",
    "#                                                              numpy(), converts it to numpy format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAqcAAAGDCAYAAAAWMLm5AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABOaElEQVR4nO3dd3xUVf7/8de5yaQnhJCEUEIvghRBRRSp1kUUdV3srr3X3bUtyhddgcW+6+ruzxVFUVFEUHQtiJUmKIgIKAgJnRAiKaSXOb8/LgxGQCEkuZPJ+/l4zIPMnTtzP8PZuG/OPcVYay0iIiIiIkHA8boAEREREZE9FE5FREREJGgonIqIiIhI0FA4FREREZGgoXAqIiIiIkFD4VREREREgobCqYiIiIgEDYVTEREREQkaCqciIiIiEjQUTkVEREQkaIR7XUBtWrVqFbNmzSIzM5Pc3Fz+8pe/0K9fv4N+/7Rp05g+ffo+xyMjI5kyZUptlioiIiIi+xFS4bSsrIx27doxdOhQHnvssUN+/1lnncWpp55a7diDDz5Ix44da6tEEREREfkVIRVO+/TpQ58+fQ74emVlJa+99hpz586luLiY9PR0Lr74Yo488kgAoqKiiIqKCpy/fv16Nm/ezDXXXFPntYuIiIhIiIXT3/LMM8+wY8cObr/9dpo2bcrixYsZP348jz76KC1atNjn/E8++YQWLVrQrVs3D6oVERERaXwazYSorKws5s+fzx133EG3bt1IS0vjrLPO4ogjjuDTTz/d5/yKigrmzp3LsGHDPKhWREREpHFqND2nmZmZWGu57bbbqh2vrKwkLi5un/MXLVpEaWkpgwcPrq8SRURERBq9RhNOrbU4jsPEiRNxnOodxj8fZ7rHJ598Qt++fUlMTKynCkVERESk0YTTdu3a4ff7yc/P/80xpNnZ2axcuZK77rqrnqoTEREREQixcFpaWkpWVlbgeXZ2NuvXrycuLo6WLVty4okn8q9//YvLLruM9u3bU1BQwIoVK2jTpg19+/YNvO+TTz4hMTHxV2f+i4iIiEjtM9Za63URtWXlypU88MAD+xwfPHgwN910E5WVlcyYMYPPP/+cnTt3Eh8fT5cuXRg1ahRt2rQBwO/3c9NNNzFo0CAuvPDC+v4KIiIiIo1aSIVTEREREWnYGs1SUiIiIiIS/BRORURERCRoKJyKiIiISNDwfLb+tGnTmD59erVjTZo04b///a9HFYmIiIiIVzwPpwDp6encf//9gee/XCT/YOTm5lJZWVmbZR1QSkoKO3bsqJdrSd1RO4YGtWNoUDuGBrVjaKiLdgwPD6dp06YHd26tXrmGHMc57J2YKisrqaioqJ2CfoUxJnA9LXTQcKkdQ4PaMTSoHUOD2jE0BEM7BkU4zcrK4rrrriM8PJzOnTtz4YUX0rx58/2eW1FRUS2EGmOIjo4O/FzX9lyjPq4ldUftGBrUjqFB7Rga1I6hIRja0fN1Tr/55hvKyspo2bIleXl5zJgxgy1btvD4448THx+/z/m/HKPavn17Jk6cWJ8li4iIiEgd8Tyc/lJpaSm33HILI0eOZMSIEfu8fqCe0x07dtTLmFNjDGlpaWRlZem2RQOmdgwNasfQoHYMDWrH0FBX7RgeHk5KSsrBnVtrV60lUVFRtGnThm3btu33dZ/Ph8/n2+9r9fnLYK3VL18IUDuGBrVjaFA7hga1Y2jwsh2DLpxWVFSwZcsWunXr5nUpIiIiEgLKysooKyvzuowGo6SkhPLy8kN+X2RkJJGRkYd9fc/D6UsvvcQxxxxDcnIy+fn5vPnmm5SUlDB48GCvSxMREZEGrqioCGMM8fHxmqx1kHw+3yGvgGStpaSkhKKiImJjYw/r+p6H0507d/KPf/yDgoICEhIS6Ny5M+PGjTvocQkiIiIiB1JZWUmTJk28LiPkGWOIiYkhPz//sD/L83B6++23e12CiIiIhCj1ltav2vj7
PvStmERERERE6ojC6SGyuT+x6+2p2JJir0sRERERCTkKp4eo6on7yXv2MeyyRV6XIiIiIiHqvPPOY8yYMV6X4QmF00Nkjh4AgP1qrseViIiIiIQehdND5Bw7EAC78htsUaHH1YiIiIiEFoXTQ2RatsHXtiNUVerWvoiISANjrcWWldb/4zB2W8rLy+PWW2+le/fudOzYkUsuuYSMjIzA65s3b+aPf/wj3bt3p1OnTgwdOpSPP/448N6bb76Znj170rFjRwYMGMDrr78eeO+2bdu4/vrr6d69O0ceeSRXXHEFGzduDLy+YMECzjjjDDp16kS3bt0YOXIkmzdvrvF3ORieLyXVEEUPPIWKDeuwX8+FASd5XY6IiIgcrPIy/DePqvfLOv+aBpFRNXrvHXfcQWZmJi+88AJxcXGMHz+eSy+9lM8++wyfz8df//pXKioqePPNN4mJiWHNmjWBhfAfeeQR1qxZw8svv0xSUhKZmZmUlpYC7k5Qf/jDHzjuuON48803CQ8P5x//+AcXXHABH330EY7jcNVVV3HRRRfx9NNPU1FRwTfffFPny3MpnNZAzKBTKHj5P/D9t9jCAkxcgtcliYiISAjKyMhg9uzZvPXWWxx77LEAPPXUUxx77LF88MEHnHnmmWzdupXhw4cHtn5v27Zt4P1btmyhR48e9O7dG4D09PTAa2+//TaO4/Doo48GAufjjz9O9+7dWbhwIb169aKgoICTTz6Zdu3aAdC5c+c6/84KpzXga9UW0jvApgzs0oWYQad5XZKIiIgcjIhItxfTg+vWxNq1awkPD6dv376BY0lJSXTs2JG1a9cCcOWVV3Lvvffy+eefM3DgQIYPH0737t0BuOyyy7jmmmv47rvvGDx4MKeddlog5C5fvpz169fTpUuXatcsLS1l/fr1DB48mFGjRnHxxRczcOBABg4cyJlnnknz5s1r9F0Olsac1pBz7IkA2K/neVyJiIiIHCxjDCYyqv4fNbwVfqCxqj8/ftFFF7FgwQJ+//vf88MPPzB8+HCef/55AIYNG8bixYu5+uqr2b59OxdccAEPPvggAH6/n169ejF79uxqj4ULF3LOOecA8MQTTzBr1iyOOeYYZs2axcCBA1myZEmNvsvBUjitIbN71j4/fIctyPO0FhEREQlNnTt3prKykqVLlwaO7dy5k4yMjGq32Fu1asVll13Gc889x3XXXcerr74aeK1Zs2acf/75PPXUU4wdO5ZXXnkFgJ49e5KZmUlycjLt27cPPDp06EBCwt4hiz169OCWW25h1qxZdO3albfeeqtOv7PCaQ2ZlDRo2wmsH7t0gdfliIiISAjq0KEDp512GnfddReLFy9m5cqV3HrrraSlpXHaae6wwjFjxvDZZ5+xceNGvvvuO+bPn0+nTp0Ad0LUhx9+SGZmJqtXr2bOnDmBUHvuuefStGlTrrjiChYtWsTGjRtZuHAho0ePZuvWrWzcuJEJEybw9ddfs3nzZj7//HMyMjICn11XNOb0MJhjB2I3rMV+PR+GDPe6HBEREQlBjz/+OGPGjOGPf/wj5eXl9O/fnylTpuDz+QD39vzo0aPZtm0bcXFxDBkyhLFjxwLg8/mYMGECmzZtIioqiuOOO45nnnkGgOjoaGbMmMG4ceO4+uqrKSoqIi0tjUGDBhEfH09paSlr167ljTfeIDc3l9TUVK644gouvfTSOv2+xh7OwltBZMeOHVRUVNT5dYwxtGjRgm3btuHP2Y7/nqvBGJyHX8AkJtX59aV2/LwdQ+RXoFFSO4YGtWNoCNZ2LCgoqHaLWn6bz+ercaY60N+3z+cjJSXloD5Dt/UPg2mWCh26grXYJbq1LyIiInK4FE4PkwnM2p/rcSUiIiIiDZ/C6WEyfQe4P6z9Hrszx9tiRERERBo4hdPDZJKSoZO70K1dMt/jakREREQaNoXTWhC4tf+Vbu2LiIiIHA6F01pgjh4AxkDmGmzOdq/LERERkZ/x+/1el9Ao
1Nbfs8JpLTBNmkKXHoC2MxUREQkmMTEx7Nq1SwG1jvn9fnbt2kVMTMxhf5YW4a8lpt9A7Orv3Fv7p//e63JEREQECA8PJzY
      "text/plain": [
       "<Figure size 800x400 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "plot_stats_in_graph(metric_dict, y_axis_label='Loss', x_axis_label='Number of Steps')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**TA**: There we go, the network is doing much better during training with a multi-layer neural network. :)\n",
    "\n",
    "**Student**: Hmm.. I am weirdly excited even though I have not digested this completely yet. Where do I go to learn more? \n",
    "\n",
    "**TA**: Firstly, I think you should go and have a look at the MLP Pytorch Framework, so you can learn how Pytorch can be used with more complicated architectures, as well as to learn some good coding practices for research and industry alike. When you are working on your coursework, make sure to have the [pytorch official documentation page](https://pytorch.org/docs/stable/nn.html) open in your browser, as it is extremely well written most of the times. Then, when you have some spare time, perhaps in preparation for next term, I would recommend going through some of the Pytorch tutorials at the [pytorch tutorials page](https://pytorch.org/tutorials/). Finally, the best way to learn, in my opinion, is by engaging with Pytorch through a project that interests you."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "mlp",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}