setup 2024-25 lab 1

This commit is contained in:
tpmmthomas 2024-09-18 21:56:35 +08:00
parent b8daa780f2
commit 1effb25bbc
6 changed files with 377 additions and 347 deletions


@ -6,8 +6,8 @@ This assignment-based course is focused on the implementation and evaluation of
The code in this repository is split into:
* a Python package `mlp`, a [NumPy](http://www.numpy.org/) based neural network package designed specifically for the course that students will implement parts of and extend during the course labs and assignments,
* a series of [Jupyter](http://jupyter.org/) notebooks in the `notebooks` directory containing explanatory material and coding exercises to be completed during the course labs.
* a Python package `mlp`, a [NumPy](http://www.numpy.org/) based neural network package designed specifically for the course that students will implement parts of and extend during the course labs and assignments,
* a series of [Jupyter](http://jupyter.org/) notebooks in the `notebooks` directory containing explanatory material and coding exercises to be completed during the course labs.
## Remote working
@ -16,3 +16,9 @@ If you are working remotely, follow this [guide](notes/remote-working-guide.md).
## Getting set up
Detailed instructions for setting up a development environment for the course are given in [this file](notes/environment-set-up.md). Students doing the course will spend part of the first lab getting their own environment set up.
## Exercises
If you are a first-time user of Jupyter notebooks, check out `notebooks/00_notebook.ipynb` to learn about their features.
To get started with the exercises, go to the `notebooks` directory. For lab 1, work with the notebook starting with the prefix `01`, and so on.


@ -137,7 +137,7 @@ class MNISTDataProvider(DataProvider):
# """Returns next data batch or raises `StopIteration` if at end."""
# inputs_batch, targets_batch = super(MNISTDataProvider, self).next()
# return inputs_batch, self.to_one_of_k(targets_batch)
#
def __next__(self):
return self.next()
@ -163,14 +163,14 @@ class MetOfficeDataProvider(DataProvider):
"""South Scotland Met Office weather data provider."""
def __init__(self, window_size, batch_size=10, max_num_batches=-1,
shuffle_order=True, rng=None):
shuffle_order=True, rng=None):
"""Create a new Met Offfice data provider object.
Args:
window_size (int): Size of windows to split weather time series
data into. The constructed input features will be the first
`window_size - 1` entries in each window and the target outputs
the last entry in each window.
data into. The constructed input features will be the first
`window_size - 1` entries in each window and the target outputs
the last entry in each window.
batch_size (int): Number of data points to include in each batch.
max_num_batches (int): Maximum number of batches to iterate over
in an epoch. If `max_num_batches * batch_size > num_data` then
@ -187,19 +187,21 @@ class MetOfficeDataProvider(DataProvider):
assert os.path.isfile(data_path), (
'Data file does not exist at expected path: ' + data_path
)
# load raw data from text file
# ...
# filter out all missing datapoints and flatten to a vector
# ...
# normalise data to zero mean, unit standard deviation
# ...
# convert from flat sequence to windowed data
# ...
# inputs are first (window_size - 1) entries in windows
#TODO: load raw data from text file
#TODO: filter out all missing datapoints and flatten to a vector
#TODO: normalise data to zero mean, unit standard deviation
#TODO: convert from flat sequence to windowed data
#TODO: separate into inputs and targets
# inputs are the first (window_size - 1) entries in windows
# inputs = ...
# targets are last entry in windows
# targets are the last entries in windows
# targets = ...
# initialise base class with inputs and targets arrays
# initialise base class with inputs and targets arrays (uncomment below)
# super(MetOfficeDataProvider, self).__init__(
# inputs, targets, batch_size, max_num_batches, shuffle_order, rng)
def __next__(self):
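
The `TODO` steps in `MetOfficeDataProvider.__init__` (filter, flatten, normalise, window, split) can be sketched roughly as below. This is only an illustrative sketch, not the course solution: the helper name `make_windows` is hypothetical, and it assumes missing datapoints are encoded as negative values, which the diff above does not state.

```python
import numpy as np


def make_windows(raw, window_size):
    """Hypothetical helper: turn a raw series into windowed inputs/targets."""
    # flatten to a vector and drop missing datapoints
    # (ASSUMPTION: missing values are encoded as negative numbers)
    flat = np.asarray(raw, dtype=float).flatten()
    flat = flat[flat >= 0]
    # normalise data to zero mean, unit standard deviation
    flat = (flat - flat.mean()) / flat.std()
    # convert from flat sequence to overlapping windows using index arithmetic
    num_windows = flat.shape[0] - window_size + 1
    idx = np.arange(num_windows)[:, None] + np.arange(window_size)[None, :]
    windows = flat[idx]
    # inputs are the first (window_size - 1) entries in each window
    inputs = windows[:, :-1]
    # targets are the last entry in each window
    targets = windows[:, -1]
    return inputs, targets
```

With arrays of this shape, the commented-out `super(MetOfficeDataProvider, self).__init__(inputs, targets, ...)` call in the hunk above would receive one row of inputs and one target per window.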

242
notebooks/00_notebook.ipynb Normal file

@ -0,0 +1,242 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction\n",
"\n",
"## Getting started with Jupyter notebooks\n",
"\n",
"The majority of your work in this course will be done using Jupyter notebooks so we will here introduce some of the basics of the notebook system. If you are already comfortable using notebooks or just would rather get on with some coding feel free to [skip straight to the exercises below](#Exercises).\n",
"\n",
"*Note: Jupyter notebooks are also known as IPython notebooks. The Jupyter system now supports languages other than Python [hence the name was changed to make it more language agnostic](https://ipython.org/#jupyter-and-the-future-of-ipython) however IPython notebook is still commonly used.*\n",
"\n",
"### Jupyter basics: the server, dashboard and kernels\n",
"\n",
"In launching this notebook you will have already come across two of the other key components of the Jupyter system - the notebook *server* and *dashboard* interface.\n",
"\n",
"We began by starting a notebook server instance in the terminal by running\n",
"\n",
"```\n",
"jupyter notebook\n",
"```\n",
"\n",
"This will have begun printing a series of log messages to terminal output similar to\n",
"\n",
"```\n",
"$ jupyter notebook\n",
"[I 08:58:24.417 NotebookApp] Serving notebooks from local directory: ~/mlpractical\n",
"[I 08:58:24.417 NotebookApp] 0 active kernels\n",
"[I 08:58:24.417 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/\n",
"```\n",
"\n",
"The last message included here indicates the URL the application is being served at. The default behaviour of the `jupyter notebook` command is to open a tab in a web browser pointing to this address after the server has started up. The server can be launched without opening a browser window by running `jupyter notebook --no-browser`. This can be useful for example when running a notebook server on a remote machine over SSH. Descriptions of various other command options can be found by displaying the command help page using\n",
"\n",
"```\n",
"jupyter notebook --help\n",
"```\n",
"\n",
"While the notebook server is running it will continue printing log messages to terminal it was started from. Unless you detach the process from the terminal session you will need to keep the session open to keep the notebook server alive. If you want to close down a running server instance from the terminal you can use `Ctrl+C` - this will bring up a confirmation message asking you to confirm you wish to shut the server down. You can either enter `y` or skip the confirmation by hitting `Ctrl+C` again.\n",
"\n",
"When the notebook application first opens in your browser you are taken to the notebook *dashboard*. This will appear something like this\n",
"\n",
"<img src='res/jupyter-dashboard.png' />\n",
"\n",
"The dashboard above is showing the `Files` tab, a list of files in the directory the notebook server was launched from. We can navigate in to a sub-directory by clicking on a directory name and back up to the parent directory by clicking the `..` link. An important point to note is that the top-most level that you will be able to navigate to is the directory you run the server from. This is a security feature and generally you should try to limit the access the server has by launching it in the highest level directory which gives you access to all the files you need to work with.\n",
"\n",
"As well as allowing you to launch existing notebooks, the `Files` tab of the dashboard also allows new notebooks to be created using the `New` drop-down on the right. It can also perform basic file-management tasks such as renaming and deleting files (select a file by checking the box alongside it to bring up a context menu toolbar).\n",
"\n",
"In addition to opening notebook files, we can also edit text files such as `.py` source files, directly in the browser by opening them from the dashboard. The in-built text-editor is less-featured than a full IDE but is useful for quick edits of source files and previewing data files.\n",
"\n",
"The `Running` tab of the dashboard gives a list of the currently running notebook instances. This can be useful to keep track of which notebooks are still running and to shutdown (or reopen) old notebook processes when the corresponding tab has been closed.\n",
"\n",
"### The notebook interface\n",
"\n",
"The top of your notebook window should appear something like this:\n",
"\n",
"<img src='res/jupyter-notebook-interface.png' />\n",
"\n",
"The name of the current notebook is displayed at the top of the page and can be edited by clicking on the text of the name. Displayed alongside this is an indication of the last manual *checkpoint* of the notebook file. On-going changes are auto-saved at regular intervals; the check-point mechanism is mainly meant as a way to recover an earlier version of a notebook after making unwanted changes. Note the default system only currently supports storing a single previous checkpoint despite the `Revert to checkpoint` dropdown under the `File` menu perhaps suggesting otherwise.\n",
"\n",
"As well as having options to save and revert to checkpoints, the `File` menu also allows new notebooks to be created in same directory as the current notebook, a copy of the current notebook to be made and the ability to export the current notebook to various formats.\n",
"\n",
"The `Edit` menu contains standard clipboard functions as well as options for reorganising notebook *cells*. Cells are the basic units of notebooks, and can contain formatted text like the one you are reading at the moment or runnable code as we will see below. The `Edit` and `Insert` drop down menus offer various options for moving cells around the notebook, merging and splitting cells and inserting new ones, while the `Cell` menu allow running of code cells and changing cell types.\n",
"\n",
"The `Kernel` menu offers some useful commands for managing the Python process (kernel) running in the notebook. In particular it provides options for interrupting a busy kernel (useful for example if you realise you have set a slow code cell running with incorrect parameters) and to restart the current kernel. This will cause all variables currently defined in the workspace to be lost but may be necessary to get the kernel back to a consistent state after polluting the namespace with lots of global variables or when trying to run code from an updated module and `reload` is failing to work. \n",
"\n",
"To the far right of the menu toolbar is a kernel status indicator. When a dark filled circle is shown this means the kernel is currently busy and any further code cell run commands will be queued to happen after the currently running cell has completed. An open status circle indicates the kernel is currently idle.\n",
"\n",
"The final row of the top notebook interface is the notebook toolbar which contains shortcut buttons to some common commands such as clipboard actions and cell / kernel management. If you are interested in learning more about the notebook user interface you may wish to run through the `User Interface Tour` under the `Help` menu drop down.\n",
"\n",
"### Markdown cells: easy text formatting\n",
"\n",
"This entire introduction has been written in what is termed a *Markdown* cell of a notebook. [Markdown](https://en.wikipedia.org/wiki/Markdown) is a lightweight markup language intended to be readable in plain-text. As you may wish to use Markdown cells to keep your own formatted notes in notebooks, a small sampling of the formatting syntax available is below (escaped mark-up on top and corresponding rendered output below that); there are many much more extensive syntax guides - for example [this cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet).\n",
"\n",
"---\n",
"\n",
"```\n",
"## Level 2 heading\n",
"### Level 3 heading\n",
"\n",
"*Italicised* and **bold** text.\n",
"\n",
" * bulleted\n",
" * lists\n",
" \n",
"and\n",
"\n",
" 1. enumerated\n",
" 2. lists\n",
"\n",
"Inline maths $y = mx + c$ using [MathJax](https://www.mathjax.org/) as well as display style\n",
"\n",
"$$ ax^2 + bx + c = 0 \\qquad \\Rightarrow \\qquad x = \\frac{-b \\pm \\sqrt{b^2 - 4ac}}{2a} $$\n",
"```\n",
"---\n",
"\n",
"## Level 2 heading\n",
"### Level 3 heading\n",
"\n",
"*Italicised* and **bold** text.\n",
"\n",
" * bulleted\n",
" * lists\n",
" \n",
"and\n",
"\n",
" 1. enumerated\n",
" 2. lists\n",
"\n",
"Inline maths $y = mx + c$ using [MathJax]() as well as display maths\n",
"\n",
"$$ ax^2 + bx + c = 0 \\qquad \\Rightarrow \\qquad x = \\frac{-b \\pm \\sqrt{b^2 - 4ac}}{2a} $$\n",
"\n",
"---\n",
"\n",
"We can also directly use HTML tags in Markdown cells to embed rich content such as images and videos.\n",
"\n",
"---\n",
"```\n",
"<img src=\"http://placehold.it/350x150\" />\n",
"```\n",
"---\n",
"\n",
"<img src=\"http://placehold.it/350x150\" />\n",
"\n",
"---\n",
"\n",
" \n",
"### Code cells: in browser code execution\n",
"\n",
"Up to now we have not seen any runnable code. An example of a executable code cell is below. To run it first click on the cell so that it is highlighted, then either click the <i class=\"fa-step-forward fa\"></i> button on the notebook toolbar, go to `Cell > Run Cells` or use the keyboard shortcut `Ctrl+Enter`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from __future__ import print_function\n",
"import sys\n",
"\n",
"print('Hello world!')\n",
"print('Alarming hello!', file=sys.stderr)\n",
"print('Hello again!')\n",
"'And again!'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This example shows the three main components of a code cell.\n",
"\n",
"The most obvious is the input area. This (unsuprisingly) is used to enter the code to be run which will be automatically syntax highlighted.\n",
"\n",
"To the immediate left of the input area is the execution indicator / counter. Before a code cell is first run this will display `In [ ]:`. After the cell is run this is updated to `In [n]:` where `n` is a number corresponding to the current execution counter which is incremented whenever any code cell in the notebook is run. This can therefore be used to keep track of the relative order in which cells were last run. There is no fundamental requirement to run cells in the order they are organised in the notebook, though things will usually be more readable if you keep things in roughly in order!\n",
"\n",
"Immediately below the input area is the output area. This shows any output produced by the code in the cell. This is dealt with a little bit confusingly in the current Jupyter version. At the top any output to [`stdout`](https://en.wikipedia.org/wiki/Standard_streams#Standard_output_.28stdout.29) is displayed. Immediately below that output to [`stderr`](https://en.wikipedia.org/wiki/Standard_streams#Standard_error_.28stderr.29) is displayed. All of the output to `stdout` is displayed together even if there has been output to `stderr` between as shown by the suprising ordering in the output here. \n",
"\n",
"The final part of the output area is the *display* area. By default this will just display the returned output of the last Python statement as would usually be the case in a (I)Python interpreter run in a terminal. What is displayed for a particular object is by default determined by its special `__repr__` method e.g. for a string it is just the quote enclosed value of the string itself.\n",
"\n",
"### Useful keyboard shortcuts\n",
"\n",
"There are a wealth of keyboard shortcuts available in the notebook interface. For an exhaustive list see the `Keyboard Shortcuts` option under the `Help` menu. We will cover a few of those we find most useful below.\n",
"\n",
"Shortcuts come in two flavours: those applicable in *command mode*, active when no cell is currently being edited and indicated by a blue highlight around the current cell; those applicable in *edit mode* when the content of a cell is being edited, indicated by a green current cell highlight.\n",
"\n",
"In edit mode of a code cell, two of the more generically useful keyboard shortcuts are offered by the `Tab` key.\n",
"\n",
" * Pressing `Tab` a single time while editing code will bring up suggested completions of what you have typed so far. This is done in a scope aware manner so for example typing `a` + `[Tab]` in a code cell will come up with a list of objects beginning with `a` in the current global namespace, while typing `np.a` + `[Tab]` (assuming `import numpy as np` has been run already) will bring up a list of objects in the root NumPy namespace beginning with `a`.\n",
" * Pressing `Shift+Tab` once immediately after opening parenthesis of a function or method will cause a tool-tip to appear with the function signature (including argument names and defaults) and its docstring. Pressing `Shift+Tab` twice in succession will cause an expanded version of the same tooltip to appear, useful for longer docstrings. Pressing `Shift+Tab` four times in succession will cause the information to be instead displayed in a pager docked to bottom of the notebook interface which stays attached even when making further edits to the code cell and so can be useful for keeping documentation visible when editing e.g. to help remember the name of arguments to a function and their purposes.\n",
"\n",
"A series of useful shortcuts available in both command and edit mode are `[modifier]+Enter` where `[modifier]` is one of `Ctrl` (run selected cell), `Shift` (run selected cell and select next) or `Alt` (run selected cell and insert a new cell after).\n",
"\n",
"A useful command mode shortcut to know about is the ability to toggle line numbers on and off for a cell by pressing `L` which can be useful when trying to diagnose stack traces printed when an exception is raised or when referring someone else to a section of code.\n",
" \n",
"### Magics\n",
"\n",
"There are a range of *magic* commands in IPython notebooks, than provide helpful tools outside of the usual Python syntax. A full list of the inbuilt magic commands is given [here](http://ipython.readthedocs.io/en/stable/interactive/magics.html), however three that are particularly useful for this course:\n",
"\n",
" * [`%%timeit`](http://ipython.readthedocs.io/en/stable/interactive/magics.html?highlight=matplotlib#magic-timeit) Put at the beginning of a cell to time its execution and print the resulting timing statistics.\n",
" * [`%precision`](http://ipython.readthedocs.io/en/stable/interactive/magics.html?highlight=matplotlib#magic-precision) Set the precision for pretty printing of floating point values and NumPy arrays.\n",
" * [`%debug`](http://ipython.readthedocs.io/en/stable/interactive/magics.html?highlight=matplotlib#magic-debug) Activates the interactive debugger in a cell. Run after an exception has been occured to help diagnose the issue.\n",
" \n",
"### Plotting with `matplotlib`\n",
"\n",
"When setting up your environment one of the dependencies we asked you to install was `matplotlib`. This is an extensive plotting and data visualisation library which is tightly integrated with NumPy and Jupyter notebooks.\n",
"\n",
"When using `matplotlib` in a notebook you should first run the [magic command](http://ipython.readthedocs.io/en/stable/interactive/magics.html?highlight=matplotlib)\n",
"\n",
"```\n",
"%matplotlib inline\n",
"```\n",
"\n",
"This will cause all plots to be automatically displayed as images in the output area of the cell they are created in. Below we give a toy example of plotting two sinusoids using `matplotlib` to show case some of the basic plot options. To see the output produced select the cell and then run it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"\n",
"# generate a pair of sinusoids\n",
"x = np.linspace(0., 2. * np.pi, 100)\n",
"y1 = np.sin(x)\n",
"y2 = np.cos(x)\n",
"\n",
"# produce a new figure object with a defined (width, height) in inches\n",
"fig = plt.figure(figsize=(8, 4))\n",
"# add a single axis to the figure\n",
"ax = fig.add_subplot(111)\n",
"# plot the two sinusoidal traces on the axis, adjusting the line width\n",
"# and adding LaTeX legend labels\n",
"ax.plot(x, y1, linewidth=2, label=r'$\\sin(x)$')\n",
"ax.plot(x, y2, linewidth=2, label=r'$\\cos(x)$')\n",
"# set the axis labels\n",
"ax.set_xlabel('$x$', fontsize=16)\n",
"ax.set_ylabel('$y$', fontsize=16)\n",
"# force the legend to be displayed\n",
"ax.legend()\n",
"# adjust the limits of the horizontal axis\n",
"ax.set_xlim(0., 2. * np.pi)\n",
"# make a grid be displayed in the axis background\n",
"ax.grid(True)"
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
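
The notebook above notes that the display area shows the last expression of a cell using the object's `__repr__` method. A minimal sketch of that behaviour follows; the `Point` class is a hypothetical example and not part of the `mlp` package.

```python
class Point:
    """Hypothetical class used only to illustrate `__repr__`."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # the string returned here is what a notebook display area would show
        return 'Point(x={!r}, y={!r})'.format(self.x, self.y)


p = Point(1, 2)
# repr() returns the same string the display area is built from
shown = repr(p)
# for a plain string, the repr is the quote-enclosed value
shown_str = repr('And again!')
```

Ending a code cell with the bare expression `p` would display the value of `shown`, whereas `print(p)` would send the same text to `stdout` instead.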


@ -1,245 +1,5 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "b167e6e2-05e0-4a4b-a6cc-47cab1c728b4"
}
},
"source": [
"# Introduction\n",
"\n",
"## Getting started with Jupyter notebooks\n",
"\n",
"The majority of your work in this course will be done using Jupyter notebooks so we will here introduce some of the basics of the notebook system. If you are already comfortable using notebooks or just would rather get on with some coding feel free to [skip straight to the exercises below](#Exercises).\n",
"\n",
"*Note: Jupyter notebooks are also known as IPython notebooks. The Jupyter system now supports languages other than Python [hence the name was changed to make it more language agnostic](https://ipython.org/#jupyter-and-the-future-of-ipython) however IPython notebook is still commonly used.*\n",
"\n",
"### Jupyter basics: the server, dashboard and kernels\n",
"\n",
"In launching this notebook you will have already come across two of the other key components of the Jupyter system - the notebook *server* and *dashboard* interface.\n",
"\n",
"We began by starting a notebook server instance in the terminal by running\n",
"\n",
"```\n",
"jupyter notebook\n",
"```\n",
"\n",
"This will have begun printing a series of log messages to terminal output similar to\n",
"\n",
"```\n",
"$ jupyter notebook\n",
"[I 08:58:24.417 NotebookApp] Serving notebooks from local directory: ~/mlpractical\n",
"[I 08:58:24.417 NotebookApp] 0 active kernels\n",
"[I 08:58:24.417 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/\n",
"```\n",
"\n",
"The last message included here indicates the URL the application is being served at. The default behaviour of the `jupyter notebook` command is to open a tab in a web browser pointing to this address after the server has started up. The server can be launched without opening a browser window by running `jupyter notebook --no-browser`. This can be useful for example when running a notebook server on a remote machine over SSH. Descriptions of various other command options can be found by displaying the command help page using\n",
"\n",
"```\n",
"jupyter notebook --help\n",
"```\n",
"\n",
"While the notebook server is running it will continue printing log messages to terminal it was started from. Unless you detach the process from the terminal session you will need to keep the session open to keep the notebook server alive. If you want to close down a running server instance from the terminal you can use `Ctrl+C` - this will bring up a confirmation message asking you to confirm you wish to shut the server down. You can either enter `y` or skip the confirmation by hitting `Ctrl+C` again.\n",
"\n",
"When the notebook application first opens in your browser you are taken to the notebook *dashboard*. This will appear something like this\n",
"\n",
"<img src='res/jupyter-dashboard.png' />\n",
"\n",
"The dashboard above is showing the `Files` tab, a list of files in the directory the notebook server was launched from. We can navigate in to a sub-directory by clicking on a directory name and back up to the parent directory by clicking the `..` link. An important point to note is that the top-most level that you will be able to navigate to is the directory you run the server from. This is a security feature and generally you should try to limit the access the server has by launching it in the highest level directory which gives you access to all the files you need to work with.\n",
"\n",
"As well as allowing you to launch existing notebooks, the `Files` tab of the dashboard also allows new notebooks to be created using the `New` drop-down on the right. It can also perform basic file-management tasks such as renaming and deleting files (select a file by checking the box alongside it to bring up a context menu toolbar).\n",
"\n",
"In addition to opening notebook files, we can also edit text files such as `.py` source files, directly in the browser by opening them from the dashboard. The in-built text-editor is less-featured than a full IDE but is useful for quick edits of source files and previewing data files.\n",
"\n",
"The `Running` tab of the dashboard gives a list of the currently running notebook instances. This can be useful to keep track of which notebooks are still running and to shutdown (or reopen) old notebook processes when the corresponding tab has been closed.\n",
"\n",
"### The notebook interface\n",
"\n",
"The top of your notebook window should appear something like this:\n",
"\n",
"<img src='res/jupyter-notebook-interface.png' />\n",
"\n",
"The name of the current notebook is displayed at the top of the page and can be edited by clicking on the text of the name. Displayed alongside this is an indication of the last manual *checkpoint* of the notebook file. On-going changes are auto-saved at regular intervals; the check-point mechanism is mainly meant as a way to recover an earlier version of a notebook after making unwanted changes. Note the default system only currently supports storing a single previous checkpoint despite the `Revert to checkpoint` dropdown under the `File` menu perhaps suggesting otherwise.\n",
"\n",
"As well as having options to save and revert to checkpoints, the `File` menu also allows new notebooks to be created in same directory as the current notebook, a copy of the current notebook to be made and the ability to export the current notebook to various formats.\n",
"\n",
"The `Edit` menu contains standard clipboard functions as well as options for reorganising notebook *cells*. Cells are the basic units of notebooks, and can contain formatted text like the one you are reading at the moment or runnable code as we will see below. The `Edit` and `Insert` drop down menus offer various options for moving cells around the notebook, merging and splitting cells and inserting new ones, while the `Cell` menu allow running of code cells and changing cell types.\n",
"\n",
"The `Kernel` menu offers some useful commands for managing the Python process (kernel) running in the notebook. In particular it provides options for interrupting a busy kernel (useful for example if you realise you have set a slow code cell running with incorrect parameters) and to restart the current kernel. This will cause all variables currently defined in the workspace to be lost but may be necessary to get the kernel back to a consistent state after polluting the namespace with lots of global variables or when trying to run code from an updated module and `reload` is failing to work. \n",
"\n",
"To the far right of the menu toolbar is a kernel status indicator. When a dark filled circle is shown this means the kernel is currently busy and any further code cell run commands will be queued to happen after the currently running cell has completed. An open status circle indicates the kernel is currently idle.\n",
"\n",
"The final row of the top notebook interface is the notebook toolbar which contains shortcut buttons to some common commands such as clipboard actions and cell / kernel management. If you are interested in learning more about the notebook user interface you may wish to run through the `User Interface Tour` under the `Help` menu drop down.\n",
"\n",
"### Markdown cells: easy text formatting\n",
"\n",
"This entire introduction has been written in what is termed a *Markdown* cell of a notebook. [Markdown](https://en.wikipedia.org/wiki/Markdown) is a lightweight markup language intended to be readable in plain-text. As you may wish to use Markdown cells to keep your own formatted notes in notebooks, a small sampling of the formatting syntax available is below (escaped mark-up on top and corresponding rendered output below that); there are many much more extensive syntax guides - for example [this cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet).\n",
"\n",
"---\n",
"\n",
"```\n",
"## Level 2 heading\n",
"### Level 3 heading\n",
"\n",
"*Italicised* and **bold** text.\n",
"\n",
" * bulleted\n",
" * lists\n",
" \n",
"and\n",
"\n",
" 1. enumerated\n",
" 2. lists\n",
"\n",
"Inline maths $y = mx + c$ using [MathJax](https://www.mathjax.org/) as well as display style\n",
"\n",
"$$ ax^2 + bx + c = 0 \\qquad \\Rightarrow \\qquad x = \\frac{-b \\pm \\sqrt{b^2 - 4ac}}{2a} $$\n",
"```\n",
"---\n",
"\n",
"## Level 2 heading\n",
"### Level 3 heading\n",
"\n",
"*Italicised* and **bold** text.\n",
"\n",
" * bulleted\n",
" * lists\n",
" \n",
"and\n",
"\n",
" 1. enumerated\n",
" 2. lists\n",
"\n",
"Inline maths $y = mx + c$ using [MathJax]() as well as display maths\n",
"\n",
"$$ ax^2 + bx + c = 0 \\qquad \\Rightarrow \\qquad x = \\frac{-b \\pm \\sqrt{b^2 - 4ac}}{2a} $$\n",
"\n",
"---\n",
"\n",
"We can also directly use HTML tags in Markdown cells to embed rich content such as images and videos.\n",
"\n",
"---\n",
"```\n",
"<img src=\"http://placehold.it/350x150\" />\n",
"```\n",
"---\n",
"\n",
"<img src=\"http://placehold.it/350x150\" />\n",
"\n",
"---\n",
"\n",
" \n",
"### Code cells: in browser code execution\n",
"\n",
"Up to now we have not seen any runnable code. An example of a executable code cell is below. To run it first click on the cell so that it is highlighted, then either click the <i class=\"fa-step-forward fa\"></i> button on the notebook toolbar, go to `Cell > Run Cells` or use the keyboard shortcut `Ctrl+Enter`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from __future__ import print_function\n",
"import sys\n",
"\n",
"print('Hello world!')\n",
"print('Alarming hello!', file=sys.stderr)\n",
"print('Hello again!')\n",
"'And again!'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This example shows the three main components of a code cell.\n",
"\n",
"The most obvious is the input area. This (unsuprisingly) is used to enter the code to be run which will be automatically syntax highlighted.\n",
"\n",
"To the immediate left of the input area is the execution indicator / counter. Before a code cell is first run this will display `In [ ]:`. After the cell is run this is updated to `In [n]:` where `n` is a number corresponding to the current execution counter which is incremented whenever any code cell in the notebook is run. This can therefore be used to keep track of the relative order in which cells were last run. There is no fundamental requirement to run cells in the order they are organised in the notebook, though things will usually be more readable if you keep things in roughly in order!\n",
"\n",
"Immediately below the input area is the output area. This shows any output produced by the code in the cell. This is dealt with a little bit confusingly in the current Jupyter version. At the top any output to [`stdout`](https://en.wikipedia.org/wiki/Standard_streams#Standard_output_.28stdout.29) is displayed. Immediately below that output to [`stderr`](https://en.wikipedia.org/wiki/Standard_streams#Standard_error_.28stderr.29) is displayed. All of the output to `stdout` is displayed together even if there has been output to `stderr` between as shown by the suprising ordering in the output here. \n",
"\n",
"The final part of the output area is the *display* area. By default this will just display the returned output of the last Python statement as would usually be the case in a (I)Python interpreter run in a terminal. What is displayed for a particular object is by default determined by its special `__repr__` method e.g. for a string it is just the quote enclosed value of the string itself.\n",
"\n",
"### Useful keyboard shortcuts\n",
"\n",
"There are a wealth of keyboard shortcuts available in the notebook interface. For an exhaustive list see the `Keyboard Shortcuts` option under the `Help` menu. We will cover a few of those we find most useful below.\n",
"\n",
"Shortcuts come in two flavours: those applicable in *command mode*, active when no cell is currently being edited and indicated by a blue highlight around the current cell; those applicable in *edit mode* when the content of a cell is being edited, indicated by a green current cell highlight.\n",
"\n",
"In edit mode of a code cell, two of the more generically useful keyboard shortcuts are offered by the `Tab` key.\n",
"\n",
" * Pressing `Tab` a single time while editing code will bring up suggested completions of what you have typed so far. This is done in a scope aware manner so for example typing `a` + `[Tab]` in a code cell will come up with a list of objects beginning with `a` in the current global namespace, while typing `np.a` + `[Tab]` (assuming `import numpy as np` has been run already) will bring up a list of objects in the root NumPy namespace beginning with `a`.\n",
" * Pressing `Shift+Tab` once immediately after opening parenthesis of a function or method will cause a tool-tip to appear with the function signature (including argument names and defaults) and its docstring. Pressing `Shift+Tab` twice in succession will cause an expanded version of the same tooltip to appear, useful for longer docstrings. Pressing `Shift+Tab` four times in succession will cause the information to be instead displayed in a pager docked to bottom of the notebook interface which stays attached even when making further edits to the code cell and so can be useful for keeping documentation visible when editing e.g. to help remember the name of arguments to a function and their purposes.\n",
"\n",
"A series of useful shortcuts available in both command and edit mode are `[modifier]+Enter` where `[modifier]` is one of `Ctrl` (run selected cell), `Shift` (run selected cell and select next) or `Alt` (run selected cell and insert a new cell after).\n",
"\n",
"A useful command mode shortcut to know about is the ability to toggle line numbers on and off for a cell by pressing `L` which can be useful when trying to diagnose stack traces printed when an exception is raised or when referring someone else to a section of code.\n",
" \n",
"### Magics\n",
"\n",
"There are a range of *magic* commands in IPython notebooks, than provide helpful tools outside of the usual Python syntax. A full list of the inbuilt magic commands is given [here](http://ipython.readthedocs.io/en/stable/interactive/magics.html), however three that are particularly useful for this course:\n",
"\n",
" * [`%%timeit`](http://ipython.readthedocs.io/en/stable/interactive/magics.html?highlight=matplotlib#magic-timeit) Put at the beginning of a cell to time its execution and print the resulting timing statistics.\n",
" * [`%precision`](http://ipython.readthedocs.io/en/stable/interactive/magics.html?highlight=matplotlib#magic-precision) Set the precision for pretty printing of floating point values and NumPy arrays.\n",
" * [`%debug`](http://ipython.readthedocs.io/en/stable/interactive/magics.html?highlight=matplotlib#magic-debug) Activates the interactive debugger in a cell. Run after an exception has been occured to help diagnose the issue.\n",
" \n",
"### Plotting with `matplotlib`\n",
"\n",
"When setting up your environment one of the dependencies we asked you to install was `matplotlib`. This is an extensive plotting and data visualisation library which is tightly integrated with NumPy and Jupyter notebooks.\n",
"\n",
"When using `matplotlib` in a notebook you should first run the [magic command](http://ipython.readthedocs.io/en/stable/interactive/magics.html?highlight=matplotlib)\n",
"\n",
"```\n",
"%matplotlib inline\n",
"```\n",
"\n",
"This will cause all plots to be automatically displayed as images in the output area of the cell they are created in. Below we give a toy example of plotting two sinusoids using `matplotlib` to show case some of the basic plot options. To see the output produced select the cell and then run it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"nbpresent": {
"id": "2bced39d-ae3a-4603-ac94-fbb6a6283a96"
}
},
"outputs": [],
"source": [
"# use the matplotlib magic to specify to display plots inline in the notebook\n",
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"\n",
"# generate a pair of sinusoids\n",
"x = np.linspace(0., 2. * np.pi, 100)\n",
"y1 = np.sin(x)\n",
"y2 = np.cos(x)\n",
"\n",
"# produce a new figure object with a defined (width, height) in inches\n",
"fig = plt.figure(figsize=(8, 4))\n",
"# add a single axis to the figure\n",
"ax = fig.add_subplot(111)\n",
"# plot the two sinusoidal traces on the axis, adjusting the line width\n",
"# and adding LaTeX legend labels\n",
"ax.plot(x, y1, linewidth=2, label=r'$\\sin(x)$')\n",
"ax.plot(x, y2, linewidth=2, label=r'$\\cos(x)$')\n",
"# set the axis labels\n",
"ax.set_xlabel('$x$', fontsize=16)\n",
"ax.set_ylabel('$y$', fontsize=16)\n",
"# force the legend to be displayed\n",
"ax.legend()\n",
"# adjust the limits of the horizontal axis\n",
"ax.set_xlim(0., 2. * np.pi)\n",
"# make a grid be displayed in the axis background\n",
"ax.grid(True)"
]
},
{
"cell_type": "markdown",
"metadata": {
@ -250,7 +10,7 @@
"source": [
"# Exercises\n",
"\n",
"Today's exercises are meant to allow you to get some initial familiarisation with the `mlp` package and how data is provided to the learning functions. Next week onwards, we will follow with the material covered in lectures. \n",
"Today's exercises are meant to allow you to get some initial familiarisation with the `mlp` package and how data is provided to the learning functions. You are going to implement variants of a `DataProvider` class, which preprocesses data and serves data in batches when the `__next__()` function is called. \n",
"\n",
"If you are new to Python and/or NumPy and are struggling to complete the exercises, you may find going through [this Stanford University tutorial](http://cs231n.github.io/python-numpy-tutorial/) by Justin Johnson first helps. There is also a derived Jupyter notebook by Volodymyr Kuleshov and Isaac Caswell which you can download [from here](https://github.com/kuleshov/teaching-material/blob/master/tutorials/python/cs228-python-tutorial.ipynb) - if you save this in to your `mlpractical/notebooks` directory you should be able to open the notebook from the dashboard to run the examples.\n",
"\n",
@ -260,7 +20,9 @@
"\n",
"### Exercise 1 \n",
"\n",
"The `MNISTDataProvider` iterates over input images and target classes (digit IDs) from the [MNIST database of handwritten digit images](http://yann.lecun.com/exdb/mnist/), a common supervised learning benchmark task. Using the data provider and `matplotlib` we can for example iterate over the first couple of images in the dataset and display them using the following code:"
"The `MNISTDataProvider` iterates over input images and target classes (digit IDs) from the [MNIST database of handwritten digit images](http://yann.lecun.com/exdb/mnist/), a common supervised learning benchmark task. Using the data provider and `matplotlib` we can for example iterate over the first couple of images in the dataset and display them using the following code:\n",
"\n",
"* NOTE: If you encounter `KeyError: 'MLP_DATA_DIR'`, check that you have correctly set the environment variable following the setup instructions, and that you are in the `mlp` environment."
]
},
{
@ -304,15 +66,17 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Generally we will want to deal with batches of multiple images i.e. `batch_size > 1`. As a first task:\n",
"Generally we will want to deal with batches of multiple images i.e. `batch_size > 1`. \n",
"\n",
" * Using MNISTDataProvider, write code that iterates over the first 5 minibatches of size 100 data-points. \n",
" * Display each batch of MNIST digits in a $10\\times10$ grid of images. \n",
"**Your tasks**:\n",
"\n",
"* Using `MNISTDataProvider`, write code that iterates over the first 5 minibatches of size 100 data-points. \n",
"* Display each batch of MNIST digits in a $10\\times10$ grid of images. \n",
" \n",
"**Notes**:\n",
"\n",
" * Images are returned from the provider as tuples of numpy arrays `(inputs, targets)`. The `inputs` matrix has shape `(batch_size, input_dim)` while the `targets` array is of shape `(batch_size,)`, where `batch_size` is the number of data points in a single batch and `input_dim` is dimensionality of the input features. \n",
" * Each input data-point (image) is stored as a 784 dimensional vector of pixel intensities normalised to $[0, 1]$ from inital integer values in $[0, 255]$. However, the original spatial domain is two dimensional, so before plotting you will need to reshape the one dimensional input arrays in to two dimensional arrays 2D (MNIST images have the same height and width dimensions)."
" * Each input data-point (image) is stored as a 784 dimensional vector of pixel intensities normalised to $[0, 1]$ from inital integer values in $[0, 255]$. However, the original spatial domain is two dimensional, so before plotting you will need to reshape the one dimensional input arrays in to two dimensional arrays 2D (MNIST images have the same height and width dimensions).\n"
]
},
{
@ -324,8 +88,15 @@
"# write your code here for iterating over five batches of \n",
"# 100 data points each and displaying as 10x10 grids\n",
"\n",
"def show_batch_of_images(img_batch):\n",
" raise NotImplementedError('Write me!')"
"def show_batch_of_images(img_batch, fig_size=(3, 3)):\n",
" #Expected shape of img_batch: (batch_size, im_height, im_width)\n",
" raise NotImplementedError('Write me!')\n",
"\n",
"batch_size = 100\n",
"num_batches = 5\n",
"\n",
"#TODO: initialize the MNISTDataProvider class and iterate over batches\n",
"# with the show_batch_of_images function"
]
},
{
@ -338,9 +109,9 @@
"source": [
"### Exercise 2\n",
"\n",
"`MNISTDataProvider` as `targets` currently returns a vector of integers, each element in this vector represents an the integer ID of the class the corresponding data-point represents. \n",
"The `targets` variable in `MNISTDataProvider` currently returns a vector of integers, where each element in this vector represents an the class of the corresponding data-point (0 to 9). \n",
"\n",
"It is easier to train neural networks using a 1-of-K representation of multi-class targets. Instead of representing class identity by an integer, each target is replaced by a vector of length equal to teh number of classes whose values are zero everywhere except on the index corresponding to the class.\n",
"It is easier to train neural networks using a 1-of-K representation for multi-class targets. Instead of representing class identity by an integer, each target is replaced by a vector of length equal to teh number of classes whose values are zero everywhere except on the index corresponding to the class.\n",
"\n",
"For instance, given a batch of 5 integer targets `[2, 2, 0, 1, 0]` and assuming there are 3 different classes \n",
"the corresponding 1-of-K encoded targets would be\n",
@ -351,10 +122,10 @@
" [0, 1, 0],\n",
" [1, 0, 0]]\n",
"```\n",
"\n",
"**Your Tasks**:\n",
" * Implement the `to_one_of_k` method of `MNISTDataProvider` class. \n",
" * Uncomment the overloaded `next` method, so the raw targets are converted to 1-of-K coding. \n",
" * Test your code by running the the cell below."
" * Test your code by running the the cell below. As you have changed the `mlp` package, reload the notebook kernel before running the cell to make sure the changes are picked up."
]
},
{
@ -363,6 +134,9 @@
"metadata": {},
"outputs": [],
"source": [
"import mlp.data_providers as data_providers\n",
"import numpy as np\n",
"\n",
"mnist_dp = data_providers.MNISTDataProvider(\n",
" which_set='valid', batch_size=5, max_num_batches=5, shuffle_order=False)\n",
"\n",
@ -384,13 +158,16 @@
"source": [
"### Exercise 3\n",
"\n",
"Here you will write your own data provider `MetOfficeDataProvider` that wraps weather data for south Scotland. A previous version of this data has been stored in `data` directory for your convenience and skeleton code for the class provided in `mlp/data_providers.py`.\n",
"Here you will write your own data provider `MetOfficeDataProvider` that wraps weather data for south Scotland. This data is stored in `data/HadSSP_daily_qc.txt` for your convenience and skeleton code for the class provided in `mlp/data_providers.py`.\n",
"\n",
"The data is organised in the text file as a table, with the first two columns indexing the year and month of the readings and the following 31 columns giving daily precipitation values for the corresponding month. As not all months have 31 days some of the entries correspond to non-existing days. These values are indicated by a non-physical value of `-99.9`.\n",
"\n",
" * You should read all of the data from the file ([`np.loadtxt`](http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html) may be useful for this) and then filter out the `-99.9` values and collapse the table to a one-dimensional array corresponding to a sequence of daily measurements for the whole period data is available for. [NumPy's boolean indexing feature](http://docs.scipy.org/doc/numpy/user/basics.indexing.html#boolean-or-mask-index-arrays) could be helpful here.\n",
" * A common initial preprocessing step in machine learning tasks is to normalise data so that it has zero mean and a standard deviation of one. Normalise the data sequence so that its overall mean is zero and standard deviation one.\n",
" * Each data point in the data provider should correspond to a window of length specified in the `__init__` method as `window_size` of this contiguous data sequence, with the model inputs being the first `window_size - 1` elements of the window and the target output being the last element of the window. For example if the original data sequence was `[1, 2, 3, 4, 5, 6]` and `window_size=3` then `input, target` pairs iterated over by the data provider should be\n",
"**Your tasks**:\n",
"\n",
" * Implement the `MetOfficeDataProvider` class in `mlp/data_providers.py`. You only need to implement the `__init__()` function, following the instructions below:\n",
" * You should read all of the data from the file ([`np.loadtxt`](http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html) may be useful for this) and then filter out the `-99.9` values and collapse the table to a one-dimensional array corresponding to a sequence of daily measurements for the whole period data is available for. [NumPy's boolean indexing feature](http://docs.scipy.org/doc/numpy/user/basics.indexing.html#boolean-or-mask-index-arrays) could be helpful here.\n",
" * A common initial preprocessing step in machine learning tasks is to normalise data so that it has zero mean and a standard deviation of one. Normalise the data sequence so that its overall mean is zero and standard deviation one.\n",
" * Each data point in the data provider should correspond to a window of length specified in the `__init__` method as `window_size` of this contiguous data sequence, with the model inputs being the first `window_size - 1` elements of the window and the target output being the last element of the window. For example if the original data sequence was `[1, 2, 3, 4, 5, 6]` and `window_size=3` then `input, target` pairs iterated over by the data provider should be\n",
" ```\n",
" [1, 2], 3\n",
" [4, 5], 6\n",
@ -403,7 +180,7 @@
"[3, 4], 5\n",
"[4, 5], 6\n",
"```\n",
" * Test your code by running the cell below."
" * Test your code by running the cell below. (Remember to reload kernel after making changes in the `mlp` package)"
]
},
{
@ -416,6 +193,10 @@
},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import mlp.data_providers as data_providers\n",
"import numpy as np\n",
"\n",
"batch_size = 3\n",
"for window_size in [2, 5, 10]:\n",
" met_dp = data_providers.MetOfficeDataProvider(\n",
@ -452,7 +233,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.12"
"version": "3.12.5"
}
},
"nbformat": 4,

View File

@ -16,10 +16,8 @@ Conda can handle installation of the Python libraries we will be using and all t
There are several options available for installing Conda on a system. Here we will use the Python 3 version of [Miniconda](http://conda.pydata.org/miniconda.html), which installs just Conda and its dependencies. An alternative is to install the [Anaconda Python distribution](https://docs.continuum.io/anaconda/), which installs Conda and a large selection of popular Python packages. As we will require only a small subset of these packages we will use the more barebones Miniconda to avoid eating into your DICE disk quota too much, however if installing on a personal machine you may wish to consider Anaconda if you want to explore other Python packages.
## 2. Installing Miniconda
We provide instructions here for getting an environment with all the required dependencies running on computers running
the School of Informatics [DICE desktop](http://computing.help.inf.ed.ac.uk/dice-platform). The same instructions
should be able to be used on other Linux distributions such as Ubuntu and Linux Mint with minimal adjustments.
@ -34,7 +32,7 @@ If you are using ssh connection to the student server, move to the next step. If
We first need to download the latest 64-bit Python 3 Miniconda install script:
```
```bash
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
```
@ -42,7 +40,7 @@ This uses `wget` a command-line tool for downloading files.
Now run the install script:
```
```bash
bash Miniconda3-latest-Linux-x86_64.sh
```
@ -56,14 +54,14 @@ definition in `.bashrc`. As the DICE bash start-up mechanism differs from the st
On DICE, append the Miniconda binaries directory to `PATH` manually in `~/.benv` using
```
```bash
echo ". /afs/inf.ed.ac.uk/user/${USER:0:3}/$USER/miniconda3/etc/profile.d/conda.sh" >> ~/.bashrc
echo ". /afs/inf.ed.ac.uk/user/${USER:0:3}/$USER/miniconda3/etc/profile.d/conda.sh" >> ~/.benv
```
To avoid any errors later, check both the `~/.bashrc` and `~/.benv` files for the correct file path by running:
```
```bash
vim ~/.bashrc ~/.benv
```
@ -71,43 +69,43 @@ For those who this appears a bit opaque to and want to know what is going on see
We now need to `source` the updated `~/.benv` so that the `PATH` variable in the current terminal session is updated:
```
```bash
source ~/.benv
```
From the next time you log in all future terminal sessions should have conda readily available via:
```
```bash
conda activate
```
## 3. Creating the Conda environment
You should now have a working Conda installation. If you run
```
```bash
conda --help
```
from a terminal you should see the Conda help page displayed. If you get a `No command 'conda' found` error you should check you have set up your `PATH` variable correctly (you can get a demonstrator to help you do this).
from a terminal you should see the Conda help page displayed. If you get a `No command 'conda' found` error you should check you have set up your `PATH` variable correctly (you can get a demonstrator to help you do this).
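If you would rather inspect `PATH` from Python than from the shell, the following is a minimal sketch using only the standard library. It lists the directories currently on `PATH` and reports where (if anywhere) a `conda` executable is found; the variable names `path_dirs` and `conda_path` are ours, chosen for illustration.

```python
import os
import shutil

# PATH is a list of directories joined by os.pathsep (':' on Linux/macOS).
path_dirs = os.environ.get("PATH", "").split(os.pathsep)
for directory in path_dirs:
    print(directory)

# shutil.which searches the PATH directories in order, much as the shell does.
conda_path = shutil.which("conda")
if conda_path is None:
    print("conda was not found on PATH")
else:
    print("conda found at", conda_path)
```

If `conda` is reported as not found, the Miniconda line is probably missing from `~/.benv` or the file has not been sourced in the current terminal session.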
Assuming Conda is working, we will now create our Conda environment:
```
conda create -n mlp python=3
```bash
conda create -n mlp python=3.12.5 -y
```
This bootstraps a new Conda environment named `mlp` with a minimal Python 3 install. You will be presented with a 'package plan' listing the packages to be installed and asked whether to proceed: type `y` then enter.
This bootstraps a new Conda environment named `mlp` with a minimal Python 3 install.
We will now *activate* our created environment:
```
```bash
conda activate mlp
```
or on Windows only
```
```bash
activate mlp
```
@ -119,38 +117,41 @@ If you wish to deactivate an environment loaded in the current terminal e.g. to
We will now install the dependencies for the course into the new environment:
```
conda install numpy scipy matplotlib jupyter
```bash
conda install numpy scipy matplotlib jupyter -y
```
Again you will be given a list of the packages to be installed and asked to confirm whether to proceed. Enter `y` then wait for the packages to install (this should take around five minutes). In addition to Jupyter, NumPy and SciPy which we have already mentioned, we are also installing [matplotlib](http://matplotlib.org/) a plotting and visualisation library.
Wait for the packages to install (this should take around five minutes). In addition to Jupyter, NumPy and SciPy which we have already mentioned, we are also installing [matplotlib](http://matplotlib.org/) a plotting and visualisation library.
Install PyTorch. The command below installs the CPU-only version of PyTorch. If you have access to a CUDA-enabled GPU and wish to install the GPU version of PyTorch instead, replace `cpuonly -c pytorch` with your CUDA version reference, e.g. for CUDA 11.7 use `pytorch-cuda=11.7 -c pytorch -c nvidia` in the command below. For more information see [here](https://pytorch.org/get-started/locally/).
```
conda install pytorch torchvision torchaudio cpuonly -c pytorch
```bash
conda install pytorch torchvision torchaudio cpuonly -c pytorch -y
```
Once the installation is finished, to recover some disk space we can clear the package tarballs Conda just downloaded:
```
conda clean -t
```bash
conda clean -t -y
```
These tarballs are usually cached to allow quicker installation into additional environments however we will only be using a single environment here so there is no need to keep them on disk.
***ANLP and IAML students only:***
To have normal access to your ANLP and IAML environments please do the following:
1. ```nano .condarc```
2. Add the following lines in the file:
```
```yml
envs_dirs:
- /group/teaching/conda/envs
- /group/teaching/conda/envs
pkgs_dirs:
- /group/teaching/conda/pkgs
- ~/miniconda3/pkgs
- /group/teaching/conda/pkgs
- ~/miniconda3/pkgs
```
3. Exit by using control + x and then choosing 'yes' at the exit prompt.
## 4. Getting the course code and a short introduction to Git
@ -167,7 +168,7 @@ https://github.com/VICO-UoE/mlpractical
Git is installed by default on DICE desktops. If you are running a system which does not have Git installed, you can use Conda to install it in your environment using:
```
```bash
conda install git
```
@ -188,32 +189,30 @@ If you are already familiar with Git you may wish to skip over the explanatory s
By default we will assume here you are cloning to your home directory however if you have an existing system for organising your workspace feel free to keep to that. **If you clone the repository to a path other than `~/mlpractical` however you will need to adjust all references to `~/mlpractical` in the commands below accordingly.**
To clone the `mlpractical` repository to the home directory run
```
```bash
git clone https://github.com/VICO-UoE/mlpractical.git ~/mlpractical
```
This will create a new `mlpractical` subdirectory with a local copy of the repository in it. Enter the directory and list all its contents, including hidden files, by running:
```
```bash
cd ~/mlpractical
ls -a # Windows equivalent: dir /a
```
For the most part this will look much like any other directory, with there being the following three non-hidden sub-directories:
* `data`: Data files used in the labs and assignments.
* `mlp`: The custom Python package we will use in this course.
* `notebooks`: The Jupyter notebook files for each lab and coursework.
* `data`: Data files used in the labs and assignments.
* `mlp`: The custom Python package we will use in this course.
* `notebooks`: The Jupyter notebook files for each lab and coursework.
Additionally there exists a hidden `.git` subdirectory (on Unix systems by default files and directories prepended with a period '.' are hidden). This directory contains the repository history database and various configuration files and references. Unless you are sure you know what you are doing you generally should not edit any of the files in this directory directly. Generally most configuration options can be enacted more safely using a `git config` command.
For instance to globally set the user name and email used in commits you can run:
```
```bash
git config --global user.name "[your name]"
git config --global user.email "[matric-number]@sms.ed.ac.uk"
```
@ -236,19 +235,19 @@ A *commit* in Git is a snapshot of the state of the project. The snapshots are r
2. The files with changes to be committed (including any new files) are added to the *staging area* by running:
```
```bash
git add file1 file2 ...
```
3. Finally the *staged changes* are used to create a new commit by running
```
```bash
git commit -m "A commit message describing the changes."
```
This writes the staged changes as a new commit in the repository history. We can see a log of the details of previous commits by running:
```
```bash
git log
```
@ -260,17 +259,17 @@ A new branch is created from a commit on an existing branch. Any commits made to
A typical Git workflow in a software development setting would be to create a new branch whenever making changes to a project, for example to fix a bug or implement a new feature. These changes are then isolated from the main code base allowing regular commits without worrying about making unstable changes to the main code base. Key to this workflow is the ability to *merge* commits from a branch into another branch, e.g. when it is decided a new feature is sufficiently developed to be added to the main code base. Although merging branches is a key aspect of using Git in many projects, dealing with merge conflicts when two branches both make changes to the same parts of files can be a somewhat tricky process, so we will here generally try to avoid the need for merges.
<p id='branching-explanation'>We will therefore use branches here in a slightly non-standard way. The code for each week's lab and for each of the assignments will be maintained in a separate branch. This will allow us to stage the release of the notebooks and code for each lab and assignment while allowing you to commit the changes you make to the code each week without having to merge those changes when new code is released. Similarly this structure will allow us to release updated notebooks from previous labs with proposed solutions without overwriting your own work.</p>
We will therefore use branches here in a slightly non-standard way. The code for each week's lab and for each of the assignments will be maintained in a separate branch. This will allow us to stage the release of the notebooks and code for each lab and assignment while allowing you to commit the changes you make to the code each week without having to merge those changes when new code is released. Similarly this structure will allow us to release updated notebooks from previous labs with proposed solutions without overwriting your own work.
To list the branches present in the local repository, run:
```
```bash
git branch
```
This will display a list of branches with a `*` next to the current branch. To switch to a different existing branch in the local repository run
```
```bash
git checkout branch-name
```
@ -278,8 +277,8 @@ This will change the code in the working directory to the current state of the c
You should make sure you are on the first lab branch now by running:
```
git checkout mlp2023-24/lab1
```bash
git checkout mlp2024-25/lab1
```
## 6. Installing the `mlp` Python package
@ -292,7 +291,7 @@ The standard way to install a Python package using a `setup.py` script is to run
As we will be updating the code in the `mlp` package during the course of the labs this would require you to re-run `python setup.py install` every time a change is made to the package. Instead therefore you should install the package in development mode by running:
```
```bash
python setup.py develop
```
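One way to confirm which copy of a package Python will actually import is to ask the import system where it resolves the name. The sketch below assumes nothing about the course code beyond the package name `mlp`; the helper `package_location` is a hypothetical name introduced here for illustration.

```python
import importlib.util


def package_location(name):
    """Return the file a top-level package would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


# After a development-mode install we would expect this to point inside your
# local clone of the repository rather than inside `site-packages`.
print(package_location("mlp"))
```

If this prints `None`, the package is not visible to the current interpreter, which usually means the `mlp` environment is not activated or the install step has not been run.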
@ -304,20 +303,20 @@ Instead of copying the package, this will instead create a symbolic link to the
Note that after the first time a Python module is loaded into an interpreter instance, using for example:
```
```python
import mlp
```
Running the `import` statement any further times will have no effect even if the underlying module code has been changed. To reload an already imported module we instead need to use the [`importlib.reload`](https://docs.python.org/3/library/importlib.html#importlib.reload) function, e.g.
```
```python
import importlib
importlib.reload(mlp)
```
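To see the difference between re-importing and reloading without touching the `mlp` package, the following self-contained sketch writes a throwaway module to a temporary directory, edits it, and compares the two. The module name `demo_module` and the variable names are made up for this illustration.

```python
import importlib
import pathlib
import sys
import tempfile

# Keep the demo independent of cached bytecode files.
sys.dont_write_bytecode = True

# Create a throwaway module on disk and make its directory importable.
tmp_dir = tempfile.mkdtemp()
module_path = pathlib.Path(tmp_dir) / "demo_module.py"
module_path.write_text("VALUE = 1\n")
sys.path.insert(0, tmp_dir)

import demo_module
first_value = demo_module.VALUE

# Edit the source file, as you would when changing code in a package.
module_path.write_text("VALUE = 2  # edited\n")

# A second import returns the already-loaded module, so the edit is not seen.
import demo_module
value_after_reimport = demo_module.VALUE

# Reloading re-executes the module source and picks up the edit.
importlib.reload(demo_module)
value_after_reload = demo_module.VALUE

print(first_value, value_after_reimport, value_after_reload)
```

Note that `importlib.reload` only re-executes the module you pass to it; objects created from the old module before the reload keep their old definitions, which is why restarting the notebook kernel is often the simpler option.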
**Note: To be clear, as this has caused some confusion in previous labs, the above `import ...` / `reload(...)` statements should NOT be run directly in a bash terminal. They are example Python statements - you could run them in a terminal by first loading a Python interpreter using:**
```
```bash
python
```
@ -331,7 +330,7 @@ We observed previously the presence of a `data` subdirectory in the local reposi
Assuming you used the recommended Miniconda install location and cloned the `mlpractical` repository to your home directory, this variable can be automatically defined when activating the environment by running the following commands (on non-Windows systems):
```
```bash
cd ~/miniconda3/envs/mlp
mkdir -p ./etc/conda/activate.d
mkdir -p ./etc/conda/deactivate.d
@ -344,7 +343,7 @@ export MLP_DATA_DIR=$HOME/mlpractical/data
And on Windows systems (replacing the `[]` placeholders with the relevant paths):
```
```bash
cd [path-to-conda-root]\envs\mlp
mkdir .\etc\conda\activate.d
mkdir .\etc\conda\deactivate.d
@ -363,7 +362,7 @@ There will be a Jupyter notebook available for each lab and assignment in this c
To open a notebook, you first need to launch a Jupyter notebook server instance. From within the `mlpractical` directory containing your local copy of the repository (and with the `mlp` environment activated) run:
```
```bash
jupyter notebook
```
@ -379,13 +378,13 @@ Below are instructions for setting up the environment without additional explana
Start a new bash terminal. Download the latest 64-bit Python 3.9 Miniconda install script:
```
```bash
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
```
Run the install script:
```
```bash
bash Miniconda3-latest-Linux-x86_64.sh
```
@ -394,69 +393,70 @@ Review the software license agreement and choose whether to accept. Assuming you
You will then be asked whether to prepend the Miniconda binaries directory to the `PATH` system environment variable definition in `.bashrc`. You should respond `no` here as we will set up the addition to `PATH` manually in the next step.
Append the Miniconda binaries directory to `PATH` manually in `~/.benv`:
```
```bash
echo ". /afs/inf.ed.ac.uk/user/${USER:0:3}/$USER/miniconda3/etc/profile.d/conda.sh" >> ~/.bashrc
echo ". /afs/inf.ed.ac.uk/user/${USER:0:3}/$USER/miniconda3/etc/profile.d/conda.sh" >> ~/.benv
```
`source` the updated `~/.benv`:
```
```bash
source ~/.benv
```
Create a new `mlp` Conda environment:
```
conda create -n mlp python=3
```bash
conda create -n mlp python=3.12.5 -y
```
Activate our created environment:
```
```bash
conda activate mlp
```
Install the dependencies for the course into the new environment:
```
conda install numpy scipy matplotlib jupyter
```bash
conda install numpy scipy matplotlib jupyter -y
```
Install PyTorch. The command below installs the CPU-only version of PyTorch. If you have access to a CUDA-enabled GPU and wish to install the GPU version of PyTorch instead, replace `cpuonly -c pytorch` with your CUDA version reference, e.g. for CUDA 11.7 use `pytorch-cuda=11.7 -c pytorch -c nvidia` in the command below. For more information see [here](https://pytorch.org/get-started/locally/).
```
conda install pytorch torchvision torchaudio cpuonly -c pytorch
```bash
conda install pytorch torchvision torchaudio cpuonly -c pytorch -y
```
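After the install commands finish, a quick way to confirm that the packages are visible from the active environment is to ask Python whether each one can be located. The sketch below uses the standard-library `importlib.util.find_spec`; the helper name `missing_packages` and the list of import names are assumptions chosen to mirror the install commands above.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names assumed to correspond to the packages installed above.
    wanted = ["numpy", "scipy", "matplotlib", "torch", "torchvision"]
    missing = missing_packages(wanted)
    if missing:
        print("Not found in this environment:", ", ".join(missing))
    else:
        print("All listed packages were found.")
```

Because `find_spec` only locates a module without importing it, this check is fast, but it does not prove that the package imports without errors.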
Clear the package tarballs Conda just downloaded:
```
```bash
conda clean -t
```
Clone the course repository to your home directory:
```
```bash
git clone https://github.com/VICO-UoE/mlpractical.git ~/mlpractical
```
Make sure we are on the first lab branch:
```
```bash
cd ~/mlpractical
git checkout mlp2023-24/lab1
git checkout mlp2024-25/lab1
```
Install the `mlp` package in the environment in develop mode:
```
```bash
python ~/mlpractical/setup.py develop
```
Add an `MLP_DATA_DIR` variable to the environment:
```
```bash
cd ~/miniconda3/envs/mlp
mkdir -p ./etc/conda/activate.d
mkdir -p ./etc/conda/deactivate.d
@ -469,14 +469,13 @@ export MLP_DATA_DIR=$HOME/mlpractical/data
The environment is now set up. Launch the notebook server from the `mlpractical` directory:
```
```bash
cd ~/mlpractical
jupyter notebook
```
and then open the first lab notebook from the `notebooks` directory.
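If a notebook later fails to find the data files, the usual suspects are an inactive environment or an unset `MLP_DATA_DIR`. The following is a minimal sketch of a check that could be run in a notebook cell or a terminal. It assumes Conda exposes the active environment name through the `CONDA_DEFAULT_ENV` variable; the function name `check_environment` is hypothetical.

```python
import os


def check_environment(environ=None, expected_env="mlp"):
    """Return a list of human-readable problems (empty if none found)."""
    environ = os.environ if environ is None else environ
    problems = []
    # Conda records the name of the active environment in this variable.
    if environ.get("CONDA_DEFAULT_ENV") != expected_env:
        problems.append(f"Conda environment '{expected_env}' is not active")
    data_dir = environ.get("MLP_DATA_DIR")
    if not data_dir:
        problems.append("MLP_DATA_DIR is not set")
    elif not os.path.isdir(data_dir):
        problems.append(f"MLP_DATA_DIR points to a missing directory: {data_dir}")
    return problems


if __name__ == "__main__":
    for problem in check_environment() or ["No problems found."]:
        print(problem)
```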
---
<b id="f1">[1]</b> The `echo` command causes the following text to be streamed to an output (standard terminal output by default). Here we use the append redirection operator `>>` to redirect the `echo` output to a file `~/.benv`, with it being appended to the end of the current file. The text actually added is `export PATH="$PATH:[your-home-directory]/miniconda/bin"` with the `\"` being used to escape the quote characters. The `export` command defines system-wide environment variables (more rigorously those inherited by child shells) with `PATH` being the environment variable defining where `bash` searches for executables as a colon-separated list of directories. Here we add the Miniconda binary directory to the end of the current `PATH` definition. [↩](#a1)
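
To make the `PATH` lookup described in the footnote concrete, here is a simplified sketch of how a shell might search the directories in order and pick the first executable match. The function `find_executable` is written for illustration only; it is not how `bash` is implemented, and the standard library's `shutil.which` provides a more complete version.

```python
import os
import sys


def find_executable(name, path=None):
    """Return the first executable called `name` found on `path`, else None."""
    if path is None:
        path = os.environ.get("PATH", "")
    # os.pathsep is ":" on Linux, matching the colon-separated PATH format.
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    print("conda resolves to:", find_executable("conda"))
    print("current interpreter:", sys.executable)
```

Since directories are searched from left to right, a directory added at the end of `PATH` is only used when no earlier directory contains a program with the same name.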

View File

@ -6,7 +6,7 @@ setup(
name = "mlp",
author = "Pawel Swietojanski, Steve Renals, Matt Graham and Antreas Antoniou",
description = ("Neural network framework for University of Edinburgh "
"School of Informatics Machine Learning Practical course."),
"School of Informatics Machine Learning Practical course."),
url = "https://github.com/VICO-UoE/mlpractical",
packages=['mlp']
)