\documentclass[11pt,]{article}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage{amssymb,amsmath}
\usepackage{ifxetex,ifluatex}
\usepackage{fixltx2e} % provides \textsubscript
% use upquote if available, for straight quotes in verbatim environments
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
\usepackage[utf8]{inputenc}
\else % if luatex or xelatex
\ifxetex
\usepackage{mathspec}
\usepackage{xltxtra,xunicode}
\else
\usepackage{fontspec}
\fi
\defaultfontfeatures{Mapping=tex-text,Scale=MatchLowercase}
\newcommand{\euro}{€}
\fi
% use microtype if available
\IfFileExists{microtype.sty}{\usepackage{microtype}}{}
\usepackage{color}
\usepackage{fancyvrb}
\newcommand{\VerbBar}{|}
\newcommand{\VERB}{\Verb[commandchars=\\\{\}]}
\DefineVerbatimEnvironment{Highlighting}{Verbatim}{commandchars=\\\{\}}
% Add ',fontsize=\small' for more characters per line
\newenvironment{Shaded}{}{}
\newcommand{\KeywordTok}[1]{\textcolor[rgb]{0.00,0.44,0.13}{\textbf{{#1}}}}
\newcommand{\DataTypeTok}[1]{\textcolor[rgb]{0.56,0.13,0.00}{{#1}}}
\newcommand{\DecValTok}[1]{\textcolor[rgb]{0.25,0.63,0.44}{{#1}}}
\newcommand{\BaseNTok}[1]{\textcolor[rgb]{0.25,0.63,0.44}{{#1}}}
\newcommand{\FloatTok}[1]{\textcolor[rgb]{0.25,0.63,0.44}{{#1}}}
\newcommand{\CharTok}[1]{\textcolor[rgb]{0.25,0.44,0.63}{{#1}}}
\newcommand{\StringTok}[1]{\textcolor[rgb]{0.25,0.44,0.63}{{#1}}}
\newcommand{\CommentTok}[1]{\textcolor[rgb]{0.38,0.63,0.69}{\textit{{#1}}}}
\newcommand{\OtherTok}[1]{\textcolor[rgb]{0.00,0.44,0.13}{{#1}}}
\newcommand{\AlertTok}[1]{\textcolor[rgb]{1.00,0.00,0.00}{\textbf{{#1}}}}
\newcommand{\FunctionTok}[1]{\textcolor[rgb]{0.02,0.16,0.49}{{#1}}}
\newcommand{\RegionMarkerTok}[1]{{#1}}
\newcommand{\ErrorTok}[1]{\textcolor[rgb]{1.00,0.00,0.00}{\textbf{{#1}}}}
\newcommand{\NormalTok}[1]{{#1}}
\ifxetex
\usepackage[setpagesize=false, % page size defined by xetex
            unicode=false, % unicode breaks when used with xetex
            xetex]{hyperref}
\else
\usepackage[unicode=true]{hyperref}
\fi
\hypersetup{breaklinks=true,
            bookmarks=true,
            pdfauthor={},
            pdftitle={},
            colorlinks=true,
            citecolor=blue,
            urlcolor=blue,
            linkcolor=magenta,
            pdfborder={0 0 0}}
\urlstyle{same} % don't use monospace font for urls
\setlength{\parindent}{0pt}
\setlength{\parskip}{6pt plus 2pt minus 1pt}
\setlength{\emergencystretch}{3em} % prevent overfull lines
\setcounter{secnumdepth}{0}
\usepackage{txfonts}
\usepackage{microtype}

\usepackage[a4paper,body={170mm,250mm},top=25mm,left=25mm]{geometry}
\usepackage[sf,bf,small]{titlesec}
\usepackage{fancyhdr}

\pagestyle{fancy}
\lhead{\sffamily MLP Coursework 1}
\rhead{\sffamily Due: 27 October 2016}
\cfoot{\sffamily \thepage}

\author{}
\date{}

\begin{document}
\section{Machine Learning Practical: Coursework
1}\label{machine-learning-practical-coursework-1}

\textbf{Release date: Monday 10th October 2016}\\\textbf{Due date: 16:00
Thursday 27th October 2016}
\subsection{Introduction}\label{introduction}

This coursework is concerned with training multi-layer networks to
address the MNIST digit classification problem. It builds on the
material covered in the first three lab notebooks and the first four
lectures. It is highly recommended that you complete the first three
lab notebooks before starting the coursework. The aim of the coursework
is to investigate the effect of learning rate schedules and adaptive
learning rates on the progression of training and the final performance
achieved by the trained models.
\subsection{Mechanics}\label{mechanics}

\textbf{Marks:} This assignment will be assessed out of 100 marks and
forms 10\% of your final grade for the course.

\textbf{Academic conduct:} Assessed work is subject to University
regulations on academic
conduct:\\\url{http://web.inf.ed.ac.uk/infweb/admin/policies/academic-misconduct}

\textbf{Late submissions:} The School of Informatics policy is that late
coursework normally gets a mark of zero. See
{\small\url{http://web.inf.ed.ac.uk/infweb/student-services/ito/admin/coursework-projects/late-coursework-extension-requests}}
for exceptions to this rule. Any requests for extensions should go to
the Informatics Teaching Office (ITO), either directly or via your
Personal Tutor.
\subsection{Report}\label{report}

The main component of your coursework submission, on which you will be
assessed, will be a short report. This should follow a typical
experimental report structure, in particular covering the following:

\begin{itemize}
\itemsep1pt\parskip0pt\parsep0pt
\item
  a clear description of the methods used and algorithms implemented,
\item
  quantitative results for the experiments you carried out, including
  relevant graphs,
\item
  discussion of the results of your experiments and any conclusions you
  have drawn.
\end{itemize}

The report should be submitted as a PDF file. You are welcome to use
whatever document preparation tool you prefer to write the report,
provided it can produce PDF output and can meet the required
presentation standards for the report.
Of the total 100 marks for the coursework, 25 marks have been allocated
for the quality of presentation and clarity of the report. A good
report will be clear, precise, and concise. It will contain enough
information for someone else to reproduce your work (with the exception
that you do not have to include the values to which the parameters were
randomly initialised).

You will need to include experimental results plotted as graphs in the
report. You are advised (but not required) to use \texttt{matplotlib} to
produce these plots, and you may reuse the plotting (and other) code
given in the lab notebooks as a starting point.
Each plot should have all axes labelled, and if multiple plots are
included on the same set of axes a legend should be included to make
clear what each line represents. Within the report all figures should be
numbered (and you should use these numbers to refer to the figures in
the main text) and have a descriptive caption stating what they show.
Ideally all figures should be included in your report file as
\href{https://en.wikipedia.org/wiki/Vector_graphics}{vector graphics}
rather than \href{https://en.wikipedia.org/wiki/Raster_graphics}{raster
files}, as this will make sure all detail in the plot is visible.
Matplotlib supports saving high-quality figures in a wide range of
common image formats using the
\href{http://matplotlib.org/api/pyplot_api.html\#matplotlib.pyplot.savefig}{\texttt{savefig}}
function. \textbf{You should use \texttt{savefig} rather than copying
the screen-resolution raster images displayed in the notebook.} An
example of using \texttt{savefig} to save a figure as a PDF file (which
can be included as graphics in
\href{https://en.wikibooks.org/wiki/LaTeX/Importing_Graphics}{LaTeX}
compiled with \texttt{pdflatex}, and in Apple Pages and
\href{https://support.office.com/en-us/article/Add-a-PDF-to-your-Office-file-74819342-8f00-4ab4-bcbe-0f3df15ab0dc}{Microsoft
Word} documents) is given below.
\begin{Shaded}
\begin{Highlighting}[]
\CharTok{import} \NormalTok{matplotlib.pyplot }\CharTok{as} \NormalTok{plt}
\CharTok{import} \NormalTok{numpy }\CharTok{as} \NormalTok{np}
\CommentTok{# Generate some example data to plot}
\NormalTok{x = np.linspace(}\DecValTok{0}\NormalTok{., }\DecValTok{1}\NormalTok{., }\DecValTok{100}\NormalTok{)}
\NormalTok{y1 = np.sin(}\DecValTok{2}\NormalTok{. * np.pi * x)}
\NormalTok{y2 = np.cos(}\DecValTok{2}\NormalTok{. * np.pi * x)}
\NormalTok{fig_size = (}\DecValTok{6}\NormalTok{, }\DecValTok{3}\NormalTok{) }\CommentTok{# Set figure size in inches (width, height)}
\NormalTok{fig = plt.figure(figsize=fig_size) }\CommentTok{# Create a new figure object}
\NormalTok{ax = fig.add_subplot(}\DecValTok{1}\NormalTok{, }\DecValTok{1}\NormalTok{, }\DecValTok{1}\NormalTok{) }\CommentTok{# Add a single axes to the figure}
\CommentTok{# Plot lines giving each a label for the legend and setting line width to 2}
\NormalTok{ax.plot(x, y1, linewidth=}\DecValTok{2}\NormalTok{, label=}\StringTok{'$y = \textbackslash{}sin(2\textbackslash{}pi x)$'}\NormalTok{)}
\NormalTok{ax.plot(x, y2, linewidth=}\DecValTok{2}\NormalTok{, label=}\StringTok{'$y = \textbackslash{}cos(2\textbackslash{}pi x)$'}\NormalTok{)}
\CommentTok{# Set the axes labels. Can use LaTeX in labels within $...$ delimiters.}
\NormalTok{ax.set_xlabel(}\StringTok{'$x$'}\NormalTok{, fontsize=}\DecValTok{12}\NormalTok{)}
\NormalTok{ax.set_ylabel(}\StringTok{'$y$'}\NormalTok{, fontsize=}\DecValTok{12}\NormalTok{)}
\NormalTok{ax.grid(}\StringTok{'on'}\NormalTok{) }\CommentTok{# Turn axes grid on}
\NormalTok{ax.legend(loc=}\StringTok{'best'}\NormalTok{, fontsize=}\DecValTok{11}\NormalTok{) }\CommentTok{# Add a legend}
\NormalTok{fig.tight_layout() }\CommentTok{# This minimises whitespace around the axes.}
\NormalTok{fig.savefig(}\StringTok{'file-name.pdf'}\NormalTok{) }\CommentTok{# Save figure to current directory in PDF format}
\end{Highlighting}
\end{Shaded}
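A PDF figure saved in this way can then be included in a LaTeX report
using the standard \texttt{graphicx} package and referenced from the
main text via its label. A minimal example (the file name and label here
are placeholders) is:

{\small
\begin{verbatim}
% in the preamble
\usepackage{graphicx}

% in the document body
\begin{figure}[tb]
  \centering
  \includegraphics[width=0.8\textwidth]{file-name.pdf}
  \caption{Descriptive caption stating what the figure shows.}
  \label{fig:example}
\end{figure}
\end{verbatim}
}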
If you are using Libre/OpenOffice you should instead use Scalable
Vector Graphics (SVG) plots, saved with\\
\texttt{fig.savefig('file-name.svg')}. If the
document editor you are using for the report does not support including
either PDF or SVG graphics you can instead output high-resolution raster
images using \texttt{fig.savefig('file-name.png', dpi=200)}; however,
note that these files will generally be larger than either SVG or PDF
formatted graphics.
If you make use of any books, articles, web pages or other resources
you should appropriately cite these in your report. You do not need to
cite material from the course lecture slides or lab notebooks.
\subsection{Code}\label{code}

You should run all of the experiments for the coursework inside the
Conda environment
\href{https://github.com/CSTR-Edinburgh/mlpractical/blob/mlp2016-7/master/environment-set-up.md}{you
set up in the first lab}.

The code for the coursework is available on the course
\href{https://github.com/CSTR-Edinburgh/mlpractical/}{GitHub repository}
on a branch \texttt{mlp2016-7/coursework1}. To create a local working
copy of this branch in your local repository you need to do the
following.
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\itemsep1pt\parskip0pt\parsep0pt
\item
  Make sure all modified files on the branch you are currently on have
  been committed
  (\href{https://github.com/CSTR-Edinburgh/mlpractical/blob/mlp2016-7/master/getting-started-in-a-lab.md}{see
  details here} if you are unsure how to do this).
\item
  Fetch changes to the upstream \texttt{origin} repository by running\\
  \texttt{git fetch origin}
\item
  Checkout a new local branch from the fetched branch using\\
  \texttt{git checkout -b coursework1 origin/mlp2016-7/coursework1}
\end{enumerate}
You will now have a new branch in your local repository with all the
code necessary for the coursework in it. In the \texttt{notebooks}
directory there is a notebook \texttt{Coursework\_1.ipynb} which is
intended as a starting point for structuring the code for your
experiments. You will probably want to add additional code cells to this
as you go along and run new experiments (e.g.~doing each new training
run in a new cell). You may also wish to use Markdown cells to keep
notes on the results of experiments.
\subsection{Submission}\label{submission}

Your coursework submission should be done electronically using the
\href{http://computing.help.inf.ed.ac.uk/submit}{\texttt{submit}}
command available on DICE machines.

Your submission should include
\begin{itemize}
\itemsep1pt\parskip0pt\parsep0pt
\item
  your completed coursework report as a PDF file,
\item
  the notebook (\texttt{.ipynb}) file you used to run the experiments,
\item
  and your local version of the \texttt{mlp} code including any changes
  you make to the modules (\texttt{.py} files).
\end{itemize}
You should EITHER (1) package all of these files into a single archive
file using
\href{http://linuxcommand.org/man_pages/tar1.html}{\texttt{tar}} or
\href{http://linuxcommand.org/man_pages/zip1.html}{\texttt{zip}}, e.g.

{\small
\begin{verbatim}
tar -zcf coursework1.tar.gz notebooks/Coursework_1.ipynb mlp/*.py reports/coursework1.pdf
\end{verbatim}
}
and then submit this archive using

\begin{verbatim}
submit mlp 1 coursework1.tar.gz
\end{verbatim}
OR (2) copy all of the files to a single \texttt{coursework1}
directory, e.g.

\begin{verbatim}
mkdir coursework1
cp notebooks/Coursework_1.ipynb mlp/*.py reports/coursework1.pdf coursework1
\end{verbatim}

and then submit this directory using

\begin{verbatim}
submit mlp 1 coursework1
\end{verbatim}
The \texttt{submit} command will prompt you with the details of the
submission, including the names of the files / directories you are
submitting and the name of the course and exercise you are submitting
for, and ask you to check if these details are correct. You should check
these carefully and reply \texttt{y} to submit if you are sure the files
are correct and \texttt{n} otherwise.

You can amend an existing submission by rerunning the \texttt{submit}
command any time up to the deadline. It is therefore a good idea
(particularly if this is your first time using the DICE submit
mechanism) to do an initial run of the \texttt{submit} command early on
and then rerun the command if you make any further updates to your
submission, rather than leaving submission to the last minute.
\subsection{Backing up your work}\label{backing-up-your-work}

It is \textbf{strongly recommended} you use some method for backing up
your work. Those working in their AFS homespace on DICE will have their
work automatically backed up as part of the
\href{http://computing.help.inf.ed.ac.uk/backups-and-mirrors}{routine
backup} of all user homespaces. If you are working on a personal
computer you should have your own backup method in place (e.g.~saving
additional copies to an external drive, syncing to a cloud service or
pushing commits from your local Git repository to a private repository
on GitHub). \textbf{Loss of work through failure to back up
\href{http://tinyurl.com/edinflate}{does not constitute a good reason
for late submission}}.
You may \emph{additionally} wish to keep your coursework under version
control in your local Git repository on the \texttt{coursework1} branch.
This does not need to be limited to the coursework notebook and
\texttt{mlp} Python modules - you can also add your report document to
the repository.
If you make regular commits of your work on the coursework this will
allow you to better keep track of the changes you have made and if
necessary revert to previous versions of files and/or restore
accidentally deleted work. This is not however required, and you should
note that keeping your work under version control is a distinct issue
from backing up to guard against hard drive failure. If you are working
on a personal computer you should still keep an additional backup of
your work as described above.
\subsection{Standard network
architecture}\label{standard-network-architecture}

To make the results of your experiments more easily comparable, you
should try to keep as many of the free choices in the specification of
the model and learning problem the same across different experiments. If
you vary only a small number of aspects of the problem at a time, this
will make it easier to interpret the effect those changes have.

In all experiments you should therefore use the same model architecture
and parameter initialisation method. In particular you should use a
model composed of three affine transformations interleaved with logistic
sigmoid nonlinearities, and a softmax output layer. The intermediate
layers between the input and output should have a dimension of 100
(i.e.~two hidden layers with 100 units in each hidden layer). This can
be defined with the following code:
\begin{Shaded}
\begin{Highlighting}[]
\CharTok{import} \NormalTok{numpy }\CharTok{as} \NormalTok{np}
\CharTok{from} \NormalTok{mlp.layers }\CharTok{import} \NormalTok{AffineLayer, SoftmaxLayer, SigmoidLayer}
\CharTok{from} \NormalTok{mlp.errors }\CharTok{import} \NormalTok{CrossEntropySoftmaxError}
\CharTok{from} \NormalTok{mlp.models }\CharTok{import} \NormalTok{MultipleLayerModel}
\CharTok{from} \NormalTok{mlp.initialisers }\CharTok{import} \NormalTok{ConstantInit, GlorotUniformInit}

\NormalTok{seed = }\DecValTok{10102016}
\NormalTok{rng = np.random.RandomState(seed)}

\NormalTok{input_dim, output_dim, hidden_dim = }\DecValTok{784}\NormalTok{, }\DecValTok{10}\NormalTok{, }\DecValTok{100}

\NormalTok{weights_init = GlorotUniformInit(rng=rng)}
\NormalTok{biases_init = ConstantInit(}\DecValTok{0}\NormalTok{.)}

\NormalTok{model = MultipleLayerModel([}
\NormalTok{    AffineLayer(input_dim, hidden_dim, weights_init, biases_init),}
\NormalTok{    SigmoidLayer(),}
\NormalTok{    AffineLayer(hidden_dim, hidden_dim, weights_init, biases_init),}
\NormalTok{    SigmoidLayer(),}
\NormalTok{    AffineLayer(hidden_dim, output_dim, weights_init, biases_init)}
\NormalTok{])}

\NormalTok{error = CrossEntropySoftmaxError()}
\end{Highlighting}
\end{Shaded}
Here we are using a special parameter initialisation scheme for the
weights which makes the scale of the random initialisation dependent on
the input and output dimensions of the layer, with the aim of trying to
keep the scale of activations at different layers of the network the
same at initialisation. The scheme is described in
\href{http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf}{\emph{Understanding
the difficulty of training deep feedforward neural networks}, Glorot and
Bengio (2010)}. As also recommended there, we initialise the biases to
zero. You do not need to read or understand this paper for the
assignment; it is only mentioned to explain the use of
\texttt{GlorotUniformInit} in the above code. You should use this
parameter initialisation for all of your experiments.
As well as standardising the network architecture, you should also fix
the hyperparameters of the training procedure not being investigated to
be the same across different runs. In particular for all experiments you
should use a \textbf{batch size of 50 and train for a total of 100
epochs} for all reported runs. You may of course use a smaller number of
epochs for initial pilot runs.
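For example, assuming the \texttt{MNISTDataProvider} interface used in
the lab notebooks (check the \texttt{mlp.data\_providers} module for the
exact class and argument names), these fixed settings might be applied
as follows:

{\small
\begin{verbatim}
import numpy as np
from mlp.data_providers import MNISTDataProvider

batch_size = 50   # fixed across all experiments
num_epochs = 100  # fixed for all reported runs

# Argument names here follow the lab notebooks; check mlp.data_providers.
rng = np.random.RandomState(10102016)
train_data = MNISTDataProvider('train', batch_size=batch_size, rng=rng)
valid_data = MNISTDataProvider('valid', batch_size=batch_size, rng=rng)
\end{verbatim}
}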
\subsection{Part 1: Learning rate schedules (10
marks)}\label{part-1-learning-rate-schedules-10-marks}

In the first part of the assignment you will investigate how using a
time-dependent learning rate schedule influences training.

Implement one of the two following time-dependent learning rate
schedules:
\begin{itemize}
\itemsep1pt\parskip0pt\parsep0pt
\item
  exponential $\eta(t) = \eta_0 \exp\left(-t / r\right)$
\item
  reciprocal $\eta(t) = \eta_0 \left(1 + t / r\right)^{-1}$
\end{itemize}

where $\eta_0$ is the initial learning rate, $t$ the epoch number,
$\eta(t)$ the learning rate at epoch $t$, and $r$ a free parameter
governing how quickly the learning rate decays.
You should implement the schedule by creating a new scheduler class in
the \texttt{mlp.schedulers} module which follows the interface of the
example \texttt{ConstantLearningRateScheduler} given in the module. In
particular, as well as an \texttt{\_\_init\_\_} method initialising the
object with any free parameters for the schedule, the class should
define an \texttt{update\_learning\_rule} method which sets the
\texttt{learning\_rate} attribute of a learning rule object based on the
current epoch number.
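As a minimal sketch, an exponential scheduler might look like the
following. This assumes \texttt{update\_learning\_rule} receives the
learning rule object and the current epoch number as arguments, as in
the \texttt{ConstantLearningRateScheduler} example; check that class for
the exact signature.

{\small
\begin{verbatim}
import numpy as np

class ExponentialLearningRateScheduler(object):
    """Exponential decay schedule: eta(t) = eta_0 * exp(-t / r)."""

    def __init__(self, init_learning_rate, decay_param):
        self.init_learning_rate = init_learning_rate  # eta_0
        self.decay_param = decay_param                # r

    def update_learning_rule(self, learning_rule, epoch_number):
        # Set the learning rate for the current epoch, eta(t).
        learning_rule.learning_rate = self.init_learning_rate * np.exp(
            -float(epoch_number) / self.decay_param)
\end{verbatim}
}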
A (potentially empty) list of scheduler objects is passed to the
\texttt{\_\_init\_\_} method of the \texttt{Optimiser} object used to
train the model, for example

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{schedulers = [ConstantLearningRateScheduler(learning_rate)]}
\NormalTok{optimiser = Optimiser(}
\NormalTok{    model, error, learning_rule, train_data,}
\NormalTok{    valid_data, data_monitors, schedulers)}
\end{Highlighting}
\end{Shaded}
You should:

\begin{itemize}
\itemsep1pt\parskip0pt\parsep0pt
\item
  Compare the performance of your time-dependent learning rate schedule
  when training the standard model on the MNIST digit classification
  task, to training with a constant learning rate baseline.
\item
  Indicate how the free schedule parameters $\eta_0$ and $r$ affect the
  evolution of the training.
\item
  State the final error function and classification accuracy values, and
  include plots of the evolution of the error and accuracy across the
  training epochs for both the training and validation sets. These
  should be reported for both the constant learning rate baseline and
  \emph{at least} one run with your learning rate scheduler
  implementation.
\end{itemize}
\subsection{Part 2: Momentum learning rule (15
marks)}\label{part-2-momentum-learning-rule-15-marks}

In this part of the assignment you will investigate using a gradient
descent learning rule with momentum. This extends the basic gradient
learning rule by introducing extra momentum state variables for the
parameters. These can help the learning dynamics overcome shallow
local minima and speed convergence when making multiple successive steps
in a similar direction in parameter space.
An implementation of the momentum learning rule is given in the
\texttt{mlp.learning\_rules} module in the \texttt{MomentumLearningRule}
class. Read through the code and documentation for this class and make
sure you understand how it relates to the equations given in the lecture
slides.
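For reference, one common formulation of the momentum update
(conventions differ, so verify this against the class code and the
lecture slides) is

\begin{equation}
v^{(t)} = \alpha \, v^{(t-1)} - \eta \nabla_{\theta} E\big(\theta^{(t-1)}\big),
\qquad
\theta^{(t)} = \theta^{(t-1)} + v^{(t)}
\end{equation}

where $v$ is the momentum state associated with a parameter $\theta$,
$\alpha$ is the momentum coefficient, $\eta$ is the learning rate and
$\nabla_{\theta} E$ is the gradient of the error function with respect
to the parameter.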
In addition to the \texttt{learning\_rate} parameter, the
\texttt{MomentumLearningRule} also accepts a \texttt{mom\_coeff}
argument. This \emph{momentum coefficient} $\alpha \in [0,\,1]$
determines the contribution of the previous momentum value to the new
momentum after an update.
\newpage

As a first task you should:

\begin{itemize}
\itemsep1pt\parskip0pt\parsep0pt
\item
  Compare the performance of a basic gradient descent learning rule to
  the momentum learning rule for several values of the momentum
  coefficient $\alpha$.
\item
  Interpret how the momentum coefficient $\alpha$ influences training.
\item
  Include plots of the error and accuracy training curves across the
  training epochs, for the different momentum coefficients you test.
\end{itemize}
Analogous to scheduling of the learning rate, it is also possible to
vary the momentum coefficient over a training run. In particular it is
common to increase the coefficient from an initially lower value at the
start of training (when the direction of the gradient of the error
function in parameter space is likely to vary a lot) to a larger value
closer to 1 later in training. One possible schedule is

\begin{equation}
\alpha(t) = \alpha_{\infty} \left( 1 - \frac{\gamma}{t + \tau} \right)
\end{equation}

where $\alpha_{\infty} \in [0,\,1]$ determines the asymptotic momentum
coefficient and $\tau \geq 1$ and $0 \leq \gamma \leq \tau$ determine
the initial momentum coefficient and how quickly the coefficient tends to
$\alpha_{\infty}$.
You should create a scheduler class which implements the above momentum
coefficient schedule by adding a further definition to the
\texttt{mlp.schedulers} module. This should have the same interface as
the learning rate scheduler implemented in the previous part.
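A minimal sketch of such a scheduler, assuming the learning rule stores
the coefficient in a \texttt{mom\_coeff} attribute (check
\texttt{MomentumLearningRule} for the actual attribute name), might be:

{\small
\begin{verbatim}
class MomentumCoefficientScheduler(object):
    """Schedule: alpha(t) = alpha_inf * (1 - gamma / (t + tau))."""

    def __init__(self, asym_mom_coeff, gamma, tau):
        self.asym_mom_coeff = asym_mom_coeff  # alpha_infinity
        self.gamma = gamma
        self.tau = tau

    def update_learning_rule(self, learning_rule, epoch_number):
        # Set the momentum coefficient for the current epoch, alpha(t).
        learning_rule.mom_coeff = self.asym_mom_coeff * (
            1. - self.gamma / float(epoch_number + self.tau))
\end{verbatim}
}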
Using your implementation you should:

\begin{itemize}
\itemsep1pt\parskip0pt\parsep0pt
\item
  Try out several different momentum rate schedules by using different
  values for $\alpha_{\infty}$, $\gamma$ and $\tau$, and investigate
  whether using a variable momentum coefficient gives improved
  performance over a constant momentum coefficient baseline.
\end{itemize}
\subsection{Part 3: Adaptive learning rules (40
marks)}\label{part-3-adaptive-learning-rules-40-marks}

In the final part of the assignment you will investigate adaptive
learning rules which attempt to automatically tune the scale of updates
in a parameter-dependent fashion.

You should implement \textbf{two} of the three adaptive learning rules
mentioned in the
\href{http://www.inf.ed.ac.uk/teaching/courses/mlp/2016/mlp04-learn.pdf}{fourth
lecture slides}:
\href{http://jmlr.org/papers/v12/duchi11a.html}{AdaGrad},
\href{http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf}{RMSProp}
and \href{https://arxiv.org/abs/1412.6980}{Adam}.
You should implement the learning rules by defining new classes
inheriting from \texttt{GradientDescentLearningRule} in the
\texttt{mlp/learning\_rules.py} module. The
\texttt{MomentumLearningRule} class should show you how to define
learning rules which use additional state variables to calculate the
updates to the parameters.
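As an illustration, a rough sketch of AdaGrad is given below. It assumes
the base class exposes the same \texttt{initialise} /
\texttt{update\_params} interface that \texttt{MomentumLearningRule}
uses; check that class for the exact method names and semantics before
adapting this.

{\small
\begin{verbatim}
import numpy as np
from mlp.learning_rules import GradientDescentLearningRule

class AdaGradLearningRule(GradientDescentLearningRule):
    """AdaGrad: scale each update by accumulated squared gradients."""

    def __init__(self, learning_rate=1e-2, epsilon=1e-8):
        super(AdaGradLearningRule, self).__init__(learning_rate)
        self.epsilon = epsilon  # small constant avoiding division by zero

    def initialise(self, params):
        # Assumes the base class initialise method sets self.params.
        super(AdaGradLearningRule, self).initialise(params)
        # One accumulator of squared gradients per parameter array.
        self.sum_sq_grads = [np.zeros_like(p) for p in self.params]

    def update_params(self, grads_wrt_params):
        for param, sum_sq_grad, grad in zip(
                self.params, self.sum_sq_grads, grads_wrt_params):
            sum_sq_grad += grad ** 2
            param -= (self.learning_rate * grad /
                      (np.sqrt(sum_sq_grad) + self.epsilon))
\end{verbatim}
}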
You should:

\begin{itemize}
\itemsep1pt\parskip0pt\parsep0pt
\item
  Compare the performance of your two implemented adaptive learning
  rules to your previous results using the basic gradient descent and
  momentum learning rules. Ideally you should compare both in terms of
  speed of convergence (including potentially accounting for the greater
  computational cost of the adaptive updates) and the final error /
  classification accuracy on both training and validation data sets.
\item
  Briefly discuss any free parameters in the adaptive learning rules you
  implement and how sensitive training performance seems to the values
  used for them.
\item
  Include example plots of the evolution of the error and accuracy
  across the training epochs for the training and validation sets for
  both of your implemented adaptive learning rules.
\end{itemize}
\subsection{Marking Scheme}\label{marking-scheme}

\begin{itemize}
\item
  Part 1, Learning Rate Schedules (10 marks). Marks awarded for
  completeness of implementation, experimental methodology, experimental
  results.
\item
  Part 2, Momentum Learning Rule (15 marks). Marks awarded for
  completeness of implementation, experimental methodology, experimental
  results.
\item
  Part 3, Adaptive Learning Rules (40 marks). Marks awarded for
  completeness of implementation, experimental methodology, experimental
  results.
\item
  Presentation and clarity of report (25 marks). Marks awarded for
  overall structure, clear and concise presentation, providing enough
  information to enable work to be reproduced, clear and concise
  presentation of results, informative discussion and conclusions.
\item
  Additional Excellence (10 marks). Marks awarded for significant
  personal insight, creativity, originality, and/or extra depth and
  academic maturity.
\end{itemize}
\end{document}