Adding coursework files to master branch.

This commit is contained in:
Matt Graham 2016-10-14 03:56:06 +01:00
parent 4863599bed
commit 0cec9920f3
5 changed files with 1120 additions and 0 deletions

247
courseworks/coursework_1.md Normal file

@ -0,0 +1,247 @@
# Machine Learning Practical: Coursework 1
**Release date: Monday 10th October 2016**
**Due date: 16:00 Thursday 27th October 2016**
## Introduction
This coursework is concerned with training multi-layer networks to address the MNIST digit classification problem. It builds on the material covered in the first three lab notebooks and the first four lectures. It is highly recommended that you complete the first three lab notebooks before starting the coursework. The aim of the coursework is to investigate the effect of learning rate schedules and adaptive learning rates on the progression of training and the final performance achieved by the trained models.
## Mechanics
**Marks:** This assignment will be assessed out of 100 marks and forms 10% of your final grade for the course.
**Academic conduct:** Assessed work is subject to University regulations on academic conduct:
<http://web.inf.ed.ac.uk/infweb/admin/policies/academic-misconduct>
**Late submissions:** The School of Informatics policy is that late coursework normally gets a mark of zero. See <http://web.inf.ed.ac.uk/infweb/student-services/ito/admin/coursework-projects/late-coursework-extension-requests> for exceptions to this rule. Any requests for extensions should go to the Informatics Teaching Office (ITO), either directly or via your Personal Tutor.
## Report
The main component of your coursework submission, on which you will be assessed, will be a short report. This should follow a typical experimental report structure, in particular covering the following
* a clear description of the methods used and algorithms implemented,
* quantitative results for the experiments you carried out including relevant graphs,
* discussion of the results of your experiments and any conclusions you have drawn.
The report should be submitted in PDF. You are welcome to use whatever document preparation tool you prefer to write the report, provided it can produce PDF output and meet the required presentation standards for the report.
Of the total 100 marks for the coursework, 25 marks have been allocated for the quality of presentation and clarity of the report. A good report will be clear, precise, and concise. It will contain enough information for someone else to reproduce your work (with the exception that you do not have to include the values to which the parameters were randomly initialised).
You will need to include experimental results plotted as graphs in the report. You are advised (but not required) to use `matplotlib` to produce these plots, and you may reuse plotting (and other) code given in the lab notebooks as a starting point.
Each plot should have all axes labelled and if multiple plots are included on the same set of axes a legend should be included to make clear what each line represents. Within the report all figures should be numbered (and you should use these numbers to refer to the figures in the main text) and have a descriptive caption stating what they show.
Ideally all figures should be included in your report file as [vector graphics](https://en.wikipedia.org/wiki/Vector_graphics) rather than [raster files](https://en.wikipedia.org/wiki/Raster_graphics) as this will make sure all detail in the plot is visible. Matplotlib supports saving high quality figures in a wide range of common image formats using the [`savefig`](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.savefig) function. **You should use `savefig` rather than copying the screen-resolution raster images outputted in the notebook.** An example of using `savefig` to save a figure as a PDF file (which can be included as graphics in [LaTeX](https://en.wikibooks.org/wiki/LaTeX/Importing_Graphics) compiled with `pdflatex` and in Apple Pages and [Microsoft Word](https://support.office.com/en-us/article/Add-a-PDF-to-your-Office-file-74819342-8f00-4ab4-bcbe-0f3df15ab0dc) documents) is given below.
```python
import matplotlib.pyplot as plt
import numpy as np
# Generate some example data to plot
x = np.linspace(0., 1., 100)
y1 = np.sin(2. * np.pi * x)
y2 = np.cos(2. * np.pi * x)
fig_size = (6, 3) # Set figure size in inches (width, height)
fig = plt.figure(figsize=fig_size) # Create a new figure object
ax = fig.add_subplot(1, 1, 1) # Add a single axes to the figure
# Plot lines giving each a label for the legend and setting line width to 2
# Raw strings (r'...') stop Python treating the LaTeX backslashes as escapes
ax.plot(x, y1, linewidth=2, label=r'$y = \sin(2\pi x)$')
ax.plot(x, y2, linewidth=2, label=r'$y = \cos(2\pi x)$')
# Set the axes labels. Can use LaTeX in labels within $...$ delimiters.
ax.set_xlabel('$x$', fontsize=12)
ax.set_ylabel('$y$', fontsize=12)
ax.grid(True) # Turn axes grid on
ax.legend(loc='best', fontsize=11) # Add a legend
fig.tight_layout() # This minimises whitespace around the axes.
fig.savefig('file-name.pdf') # Save figure to current directory in PDF format
```
If you are using Libre/OpenOffice you should instead use Scalable Vector Graphics (SVG) plots, saved using `fig.savefig('file-name.svg')`. If the document editor you are using for the report does not support including either PDF or SVG graphics, you can instead output high-resolution raster images using `fig.savefig('file-name.png', dpi=200)`; note, however, that these files will generally be larger than either SVG or PDF formatted graphics.
If you make use of any books, articles, web pages or other resources you should appropriately cite these in your report. You do not need to cite material from the course lecture slides or lab notebooks.
## Code
You should run all of the experiments for the coursework inside the Conda environment [you set up in the first lab](https://github.com/CSTR-Edinburgh/mlpractical/blob/mlp2016-7/master/environment-set-up.md).
The code for the coursework is available on the course [Github repository](https://github.com/CSTR-Edinburgh/mlpractical/) on a branch `mlp2016-7/coursework1`. To create a local working copy of this branch in your local repository you need to do the following.
1. Make sure all modified files on the branch you are currently on have been committed ([see details here](https://github.com/CSTR-Edinburgh/mlpractical/blob/mlp2016-7/master/getting-started-in-a-lab.md) if you are unsure how to do this).
2. Fetch changes from the upstream `origin` repository by running
```
git fetch origin
```
3. Check out a new local branch from the fetched branch using
```
git checkout -b coursework1 origin/mlp2016-7/coursework1
```
You will now have a new branch in your local repository with all the code necessary for the coursework in it. In the `notebooks` directory there is a notebook `Coursework_1.ipynb` which is intended as a starting point for structuring the code for your experiments. You will probably want to add additional code cells to this as you go along and run new experiments (e.g. doing each new training run in a new cell). You may also wish to use Markdown cells to keep notes on the results of experiments.
## Submission
Your coursework submission should be done electronically using the [`submit`](http://computing.help.inf.ed.ac.uk/submit) command available on DICE machines.
Your submission should include
* your completed course report as a PDF file,
* the notebook (`.ipynb`) file you use to run the experiments in,
* and your local version of the `mlp` code including any changes you make to the modules (`.py` files).
You should EITHER (1) package all of these files into a single archive file using [`tar`](http://linuxcommand.org/man_pages/tar1.html) or [`zip`](http://linuxcommand.org/man_pages/zip1.html), e.g.
```
tar -zcf coursework1.tar.gz notebooks/Coursework_1.ipynb mlp/*.py reports/coursework1.pdf
```
and then submit this archive using
```
submit mlp 1 coursework1.tar.gz
```
OR (2) copy all of the files to a single `coursework1` directory, e.g.
```
mkdir coursework1
cp notebooks/Coursework_1.ipynb mlp/*.py reports/coursework1.pdf coursework1
```
and then submit this directory using
```
submit mlp 1 coursework1
```
The `submit` command will prompt you with the details of the submission including the name of the files / directories you are submitting and the name of the course and exercise you are submitting for and ask you to check if these details are correct. You should check these carefully and reply `y` to submit if you are sure the files are correct and `n` otherwise.
You can amend an existing submission by rerunning the `submit` command any time up to the deadline. It is therefore a good idea (particularly if this is your first time using the DICE submit mechanism) to do an initial run of the `submit` command early on and then rerun the command if you make any further updates to your submission, rather than leaving submission to the last minute.
## Backing up your work
It is **strongly recommended** you use some method for backing up your work. Those working in their AFS homespace on DICE will have their work automatically backed up as part of the [routine backup](http://computing.help.inf.ed.ac.uk/backups-and-mirrors) of all user homespaces. If you are working on a personal computer you should have your own backup method in place (e.g. saving additional copies to an external drive, syncing to a cloud service or pushing commits from your local Git repository to a private repository on Github). **Loss of work through failure to back up [does not constitute a good reason for late submission](http://tinyurl.com/edinflate)**.
You may *additionally* wish to keep your coursework under version control in your local Git repository on the `coursework1` branch. This does not need to be limited to the coursework notebook and `mlp` Python modules - you can also add your report document to the repository.
If you make regular commits of your work on the coursework this will allow you to better keep track of the changes you have made and, if necessary, revert to previous versions of files and/or restore accidentally deleted work. This is not however required, and you should note that keeping your work under version control is a distinct issue from backing up to guard against hard drive failure. If you are working on a personal computer you should still keep an additional backup of your work as described above.
## Standard network architecture
To make the results of your experiments more easily comparable, you should try to keep as many of the free choices in the specification of the model and learning problem the same across different experiments. If you vary only a small number of aspects of the problem at a time this will make it easier to interpret the effect those changes have.
In all experiments you should therefore use the same model architecture and parameter initialisation method. In particular you should use a model composed of three affine transformations interleaved with logistic sigmoid nonlinearities, and a softmax output layer. The intermediate layers between the input and output should have a dimension of 100 (i.e. two hidden layers with 100 units in each hidden layer). This can be defined with the following code:
```python
import numpy as np
from mlp.layers import AffineLayer, SoftmaxLayer, SigmoidLayer
from mlp.errors import CrossEntropySoftmaxError
from mlp.models import MultipleLayerModel
from mlp.initialisers import ConstantInit, GlorotUniformInit
seed = 10102016
rng = np.random.RandomState(seed)
input_dim, output_dim, hidden_dim = 784, 10, 100
weights_init = GlorotUniformInit(rng=rng)
biases_init = ConstantInit(0.)
model = MultipleLayerModel([
AffineLayer(input_dim, hidden_dim, weights_init, biases_init),
SigmoidLayer(),
AffineLayer(hidden_dim, hidden_dim, weights_init, biases_init),
SigmoidLayer(),
AffineLayer(hidden_dim, output_dim, weights_init, biases_init)
])
error = CrossEntropySoftmaxError()
```
Here we are using a special parameter initialisation scheme for the weights which makes the scale of the random initialisation dependent on the input and output dimensions of the layer, with the aim of keeping the scale of activations at different layers of the network the same at initialisation. The scheme is described in [*Understanding the difficulty of training deep feedforward neural networks*, Glorot and Bengio (2010)](http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf). As also recommended there we initialise the biases to zero. You do not need to read or understand this paper for the assignment; it is mentioned only to explain the use of `GlorotUniformInit` in the above code. You should use this parameter initialisation for all of your experiments.
As well as standardising the network architecture, you should also fix the hyperparameters of the training procedure not being investigated to be the same across different runs. In particular, for all reported runs you should use a **batch size of 50 and train for a total of 100 epochs**. You may of course use a smaller number of epochs for initial pilot runs.
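For intuition only, the uniform variant of the scheme from the paper can be sketched in plain NumPy as below. This is an illustrative stand-alone function, not the `GlorotUniformInit` implementation from the `mlp` package; the function name `glorot_uniform_sample` and the `(output_dim, input_dim)` weight shape are assumptions made for this example.

```python
import numpy as np

def glorot_uniform_sample(input_dim, output_dim, rng):
    """Draw weights from U(-w, w) with w = sqrt(6 / (input_dim + output_dim)).

    Hypothetical helper for illustration; the weight shape convention
    (output_dim, input_dim) is an assumption, not taken from `mlp`.
    """
    half_width = np.sqrt(6. / (input_dim + output_dim))
    return rng.uniform(
        low=-half_width, high=half_width, size=(output_dim, input_dim))

rng = np.random.RandomState(10102016)
weights = glorot_uniform_sample(784, 100, rng)
# The sample standard deviation should be close to sqrt(2 / (784 + 100)),
# the standard deviation of the uniform distribution sampled from.
print(weights.shape, weights.std(), np.sqrt(2. / (784 + 100)))
```

Because the interval half-width shrinks as the layer dimensions grow, wider layers receive smaller initial weights, which is what keeps the scale of the activations roughly comparable across layers.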
## Part 1: Learning rate schedules (10 marks)
In the first part of the assignment you will investigate how using a time-dependent learning rate schedule influences training.
Implement one of the two following time-dependent learning rate schedules:
* exponential $\eta(t) = \eta_0 \exp\left(-t / r\right)$
* reciprocal $\eta(t) = \eta_0 \left(1 + t / r\right)^{-1}$
where $\eta_0$ is the initial learning rate, $t$ the epoch number, $\eta(t)$ the learning rate at epoch $t$ and $r$ a free parameter governing how quickly the learning rate decays.
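To get a feel for how the two formulas behave before wrapping one in a scheduler class, they can be evaluated directly as plain functions. The sketch below is only an evaluation of the expressions above; the function names and the values $\eta_0 = 0.1$ and $r = 25$ are arbitrary choices for illustration, not recommended settings.

```python
import numpy as np

def exponential_schedule(epoch, init_learning_rate, decay_rate):
    """eta(t) = eta_0 * exp(-t / r)."""
    epoch = np.asarray(epoch, dtype=float)
    return init_learning_rate * np.exp(-epoch / decay_rate)

def reciprocal_schedule(epoch, init_learning_rate, decay_rate):
    """eta(t) = eta_0 / (1 + t / r)."""
    epoch = np.asarray(epoch, dtype=float)
    return init_learning_rate / (1. + epoch / decay_rate)

# Illustrative values only (assumptions, not recommended settings)
epochs = np.arange(0, 101, 25)
exp_rates = exponential_schedule(epochs, 0.1, 25.)
rec_rates = reciprocal_schedule(epochs, 0.1, 25.)
for t, e, r in zip(epochs, exp_rates, rec_rates):
    print('epoch {0:3d}: exponential {1:.4f}, reciprocal {2:.4f}'.format(t, e, r))
```

Both schedules start at $\eta_0$ and decay monotonically; for the same $r$ the reciprocal schedule decays more slowly than the exponential one, since $(1 + x)^{-1} \geq \exp(-x)$ for $x \geq 0$.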
You should implement the schedule by creating a new scheduler class in the `mlp.schedulers` module which follows the interface of the example `ConstantLearningRateScheduler` given in the module. In particular, as well as an `__init__` method initialising the object with any free parameters for the schedule, the class should define an `update_learning_rule` method which sets the `learning_rate` attribute of a learning rule object based on the current epoch number.
A (potentially empty) list of scheduler objects is passed to the `__init__` method of the `Optimiser` object used to train the model, for example
```python
schedulers = [ConstantLearningRateScheduler(learning_rate)]
optimiser = Optimiser(
model, error, learning_rule, train_data,
valid_data, data_monitors, schedulers)
```
You should:
* Compare the performance of your time-dependent learning rate schedule when training the standard model on the MNIST digit classification task, to training with a constant learning rate baseline.
* Indicate how the free schedule parameters $\eta_0$ and $r$ affect the evolution of the training.
* State the final error function and classification accuracy values and include plots of the evolution of the error and accuracy across the training epochs for both the training and validation sets. These should be reported for both the constant learning rate baseline and *at least* one run with your learning rate scheduler implementation.
## Part 2: Momentum learning rule (15 marks)
In this part of the assignment you will investigate using a gradient descent learning rule with momentum. This extends the basic gradient descent learning rule by introducing extra momentum state variables for the parameters. These can help the learning dynamics overcome shallow local minima and speed convergence when making multiple successive steps in a similar direction in parameter space.
An implementation of the momentum learning rule is given in the `mlp.learning_rules` module in the `MomentumLearningRule` class. Read through the code and documentation for this class and make sure you understand how it relates to the equations given in the lecture slides.
In addition to the `learning_rate` parameter, the `MomentumLearningRule` also accepts a `mom_coeff` argument. This *momentum coefficient* $\alpha \in [0,\,1]$ determines the contribution of the previous momentum value to the new momentum after an update.
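As a rough illustration of the role of the momentum coefficient, the sketch below applies one common form of the momentum update to a one-dimensional quadratic error function. It is a stand-alone toy, not the `MomentumLearningRule` code from `mlp.learning_rules`; the function names, the toy error function and the step size are assumptions made for the example, and you should check the exact form of the update against the lecture slides.

```python
def momentum_step(param, mom, grad, learning_rate, mom_coeff):
    """One update: the new momentum mixes the previous momentum and the gradient."""
    mom = mom_coeff * mom - learning_rate * grad
    return param + mom, mom

def run_toy_descent(mom_coeff, n_steps=50, learning_rate=0.01):
    """Minimise the toy error E(w) = 0.5 * w**2 (gradient w) starting from w = 1."""
    param, mom = 1., 0.
    for _ in range(n_steps):
        grad = param  # gradient of 0.5 * w**2 with respect to w
        param, mom = momentum_step(param, mom, grad, learning_rate, mom_coeff)
    return param

for mom_coeff in [0., 0.5, 0.9]:
    print('mom_coeff={0:.1f}: final w = {1:.4f}'.format(
        mom_coeff, run_toy_descent(mom_coeff)))
```

With `mom_coeff=0.` the update reduces to basic gradient descent. On this toy problem a larger coefficient moves the parameter towards the minimum faster for the same learning rate, though it can also overshoot and oscillate around the minimum.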
As a first task you should:
* Compare the performance of a basic gradient descent learning rule to the momentum learning rule for several values of the momentum coefficient $\alpha$.
* Interpret how the momentum coefficient $\alpha$ influences training.
* Include plots of the error and accuracy training curves across the training epochs, for the different momentum coefficients you test.
Analogous to scheduling of the learning rate, it is also possible to vary the momentum coefficient over a training run. In particular it is common to increase the coefficient from an initially lower value at the start of training (when the direction of the gradient of the error function in parameter space is likely to vary a lot) to a larger value closer to 1 later in training. One possible schedule is
\begin{equation}
\alpha(t) = \alpha_{\infty} \left( 1 - \frac{\gamma}{t + \tau} \right)
\end{equation}
where $\alpha_{\infty} \in [0,\,1]$ determines the asymptotic momentum coefficient and $\tau \geq 1$ and $0 \leq \gamma \leq \tau$ determine the initial momentum coefficient and how quickly the coefficient tends to $\alpha_{\infty}$.
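The effect of the three free parameters can be checked by evaluating the formula directly, as in the short sketch below. The function name and the values $\alpha_{\infty} = 0.9$, $\gamma = 5$ and $\tau = 5$ are arbitrary illustrative choices, not suggested settings.

```python
import numpy as np

def momentum_coefficient(epoch, alpha_inf, gamma, tau):
    """alpha(t) = alpha_inf * (1 - gamma / (t + tau))."""
    epoch = np.asarray(epoch, dtype=float)
    return alpha_inf * (1. - gamma / (epoch + tau))

# Illustrative values only (assumptions, not recommended settings)
epochs = np.array([0, 1, 5, 20, 100])
coeffs = momentum_coefficient(epochs, alpha_inf=0.9, gamma=5., tau=5.)
for t, a in zip(epochs, coeffs):
    print('epoch {0:3d}: alpha = {1:.4f}'.format(t, a))
```

At $t = 0$ the coefficient is $\alpha_{\infty} (1 - \gamma / \tau)$, so choosing $\gamma = \tau$ starts training with no momentum, while $\gamma = 0$ recovers a constant coefficient $\alpha_{\infty}$.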
You should create a scheduler class which implements the above momentum coefficient schedule by adding a further definition to the `mlp.schedulers` module. This should have the same interface as the learning rate scheduler implemented in the previous part.
Using your implementation you should:
* Try out several different momentum rate schedules by using different values for $\alpha_{\infty}$, $\gamma$ and $\tau$ and investigate whether using a variable momentum coefficient gives improved performance over a constant momentum coefficient baseline.
## Part 3: Adaptive learning rules (40 marks)
In the final part of the assignment you will investigate adaptive learning rules which attempt to automatically tune the scale of updates in a parameter-dependent fashion.
You should implement **two** of the three adaptive learning rules mentioned in the [fourth lecture slides](http://www.inf.ed.ac.uk/teaching/courses/mlp/2016/mlp04-learn.pdf): [AdaGrad](http://jmlr.org/papers/v12/duchi11a.html), [RMSProp](http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf) and [Adam](https://arxiv.org/abs/1412.6980).
You should implement the learning rules by defining new classes inheriting from `GradientDescentLearningRule` in the `mlp/learning_rules.py` module. The `MomentumLearningRule` class should show you how to define learning rules which use additional state variables to calculate the updates to the parameters.
You should:
* Compare the performance of your two implemented adaptive training rules to your previous results using the basic gradient descent and momentum learning rules. Ideally you should compare both in terms of speed of convergence (including potentially accounting for the greater computational cost of the adaptive updates) and the final error / classification accuracy on both training and validation data sets.
* Briefly discuss any free parameters in the adaptive learning rules you implement and how sensitive training performance seems to the values used for them.
* Include example plots of the evolution of the error and accuracy across the training epochs for the training and validation sets for both of your implemented adaptive learning rules.
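To illustrate what "parameter-dependent" scaling means, the sketch below applies a single AdaGrad-style update to two parameters whose gradients differ in scale by four orders of magnitude. It is a stand-alone NumPy toy, not an implementation against the `mlp` learning rule interface; the function name, the default values and the placement of `epsilon` outside the square root are assumptions for this example (conventions vary between descriptions of the algorithm).

```python
import numpy as np

def adagrad_step(param, sum_sq_grad, grad, learning_rate=0.01, epsilon=1e-8):
    """One AdaGrad-style update with a per-parameter sum of squared gradients."""
    sum_sq_grad = sum_sq_grad + grad ** 2
    param = param - learning_rate * grad / (np.sqrt(sum_sq_grad) + epsilon)
    return param, sum_sq_grad

param = np.array([1., 1.])
sum_sq_grad = np.zeros_like(param)
grad = np.array([100., 0.01])  # gradients on very different scales
new_param, sum_sq_grad = adagrad_step(param, sum_sq_grad, grad)
step = new_param - param
print('basic gradient descent step:', -0.01 * grad)
print('AdaGrad-style step:         ', step)
```

On the first update each gradient component is divided by (approximately) its own magnitude, so both parameters move by close to the learning rate despite the difference in gradient scale, whereas the basic gradient descent steps differ by the same factor as the gradients.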
## Marking Scheme
* Part 1, Learning Rate Schedules (10 marks). Marks awarded for completeness of implementation, experimental methodology, experimental results.
* Part 2, Momentum Learning Rule (15 marks). Marks awarded for completeness of implementation, experimental methodology, experimental results.
* Part 3, Adaptive Learning Rules (40 marks). Marks awarded for completeness of implementation, experimental methodology, experimental results.
* Presentation and clarity of report (25 marks). Marks awarded for overall structure, clear and concise presentation, providing enough information to enable work to be reproduced, clear and concise presentation of results, informative discussion and conclusions.
* Additional Excellence (10 marks). Marks awarded for significant personal insight, creativity, originality, and/or extra depth and academic maturity.

Binary file not shown.


@ -0,0 +1,607 @@
\documentclass[11pt,]{article}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage{amssymb,amsmath}
\usepackage{ifxetex,ifluatex}
\usepackage{fixltx2e} % provides \textsubscript
% use upquote if available, for straight quotes in verbatim environments
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
\usepackage[utf8]{inputenc}
\else % if luatex or xelatex
\ifxetex
\usepackage{mathspec}
\usepackage{xltxtra,xunicode}
\else
\usepackage{fontspec}
\fi
\defaultfontfeatures{Mapping=tex-text,Scale=MatchLowercase}
\newcommand{\euro}{}
\fi
% use microtype if available
\IfFileExists{microtype.sty}{\usepackage{microtype}}{}
\usepackage{color}
\usepackage{fancyvrb}
\newcommand{\VerbBar}{|}
\newcommand{\VERB}{\Verb[commandchars=\\\{\}]}
\DefineVerbatimEnvironment{Highlighting}{Verbatim}{commandchars=\\\{\}}
% Add ',fontsize=\small' for more characters per line
\newenvironment{Shaded}{}{}
\newcommand{\KeywordTok}[1]{\textcolor[rgb]{0.00,0.44,0.13}{\textbf{{#1}}}}
\newcommand{\DataTypeTok}[1]{\textcolor[rgb]{0.56,0.13,0.00}{{#1}}}
\newcommand{\DecValTok}[1]{\textcolor[rgb]{0.25,0.63,0.44}{{#1}}}
\newcommand{\BaseNTok}[1]{\textcolor[rgb]{0.25,0.63,0.44}{{#1}}}
\newcommand{\FloatTok}[1]{\textcolor[rgb]{0.25,0.63,0.44}{{#1}}}
\newcommand{\CharTok}[1]{\textcolor[rgb]{0.25,0.44,0.63}{{#1}}}
\newcommand{\StringTok}[1]{\textcolor[rgb]{0.25,0.44,0.63}{{#1}}}
\newcommand{\CommentTok}[1]{\textcolor[rgb]{0.38,0.63,0.69}{\textit{{#1}}}}
\newcommand{\OtherTok}[1]{\textcolor[rgb]{0.00,0.44,0.13}{{#1}}}
\newcommand{\AlertTok}[1]{\textcolor[rgb]{1.00,0.00,0.00}{\textbf{{#1}}}}
\newcommand{\FunctionTok}[1]{\textcolor[rgb]{0.02,0.16,0.49}{{#1}}}
\newcommand{\RegionMarkerTok}[1]{{#1}}
\newcommand{\ErrorTok}[1]{\textcolor[rgb]{1.00,0.00,0.00}{\textbf{{#1}}}}
\newcommand{\NormalTok}[1]{{#1}}
\ifxetex
\usepackage[setpagesize=false, % page size defined by xetex
unicode=false, % unicode breaks when used with xetex
xetex]{hyperref}
\else
\usepackage[unicode=true]{hyperref}
\fi
\hypersetup{breaklinks=true,
bookmarks=true,
pdfauthor={},
pdftitle={},
colorlinks=true,
citecolor=blue,
urlcolor=blue,
linkcolor=magenta,
pdfborder={0 0 0}}
\urlstyle{same} % don't use monospace font for urls
\setlength{\parindent}{0pt}
\setlength{\parskip}{6pt plus 2pt minus 1pt}
\setlength{\emergencystretch}{3em} % prevent overfull lines
\setcounter{secnumdepth}{0}
\usepackage{txfonts}
\usepackage{microtype}
\usepackage[a4paper,body={170mm,250mm},top=25mm,left=25mm]{geometry}
\usepackage[sf,bf,small]{titlesec}
\usepackage{fancyhdr}
\pagestyle{fancy}
\lhead{\sffamily MLP Coursework 1}
\rhead{\sffamily Due: 27 October 2016}
\cfoot{\sffamily \thepage}
\author{}
\date{}
\begin{document}
\section{Machine Learning Practical: Coursework
1}\label{machine-learning-practical-coursework-1}
\textbf{Release date: Monday 10th October 2016}\\\textbf{Due date: 16:00
Thursday 27th October 2016}
\subsection{Introduction}\label{introduction}
This coursework is concerned with training multi-layer networks to
address the MNIST digit classification problem. It builds on the
material covered in the first three lab notebooks and the first four
lectures. It is highly recommended that you complete the first three lab
notebooks before starting the coursework. The aim of the coursework is
to investigate the effect of learning rate schedules and adaptive
learning rates on the progression of training and the final performance
achieved by the trained models.
\subsection{Mechanics}\label{mechanics}
\textbf{Marks:} This assignment will be assessed out of 100 marks and
forms 10\% of your final grade for the course.
\textbf{Academic conduct:} Assessed work is subject to University
regulations on academic
conduct:\\\url{http://web.inf.ed.ac.uk/infweb/admin/policies/academic-misconduct}
\textbf{Late submissions:} The School of Informatics policy is that late
coursework normally gets a mark of zero. See
{\small\url{http://web.inf.ed.ac.uk/infweb/student-services/ito/admin/coursework-projects/late-coursework-extension-requests}}
for exceptions to this rule. Any requests for extensions should go to
the Informatics Teaching Office (ITO), either directly or via your
Personal Tutor.
\subsection{Report}\label{report}
The main component of your coursework submission, on which you will be
assessed, will be a short report. This should follow a typical
experimental report structure, in particular covering the following
\begin{itemize}
\itemsep1pt\parskip0pt\parsep0pt
\item
a clear description of the methods used and algorithms implemented,
\item
quantitative results for the experiments you carried out including
relevant graphs,
\item
discussion of the results of your experiments and any conclusions you
have drawn.
\end{itemize}
The report should be submitted in PDF. You are welcome to use whatever
document preparation tool you prefer to write the report,
provided it can produce PDF output and meet the required
presentation standards for the report.
Of the total 100 marks for the coursework, 25 marks have been allocated
for the quality of presentation and clarity of the report. A good
report will be clear, precise, and concise. It will contain enough
information for someone else to reproduce your work (with the exception
that you do not have to include the values to which the parameters were
randomly initialised).
You will need to include experimental results plotted as graphs in the
report. You are advised (but not required) to use \texttt{matplotlib} to
produce these plots, and you may reuse plotting (and other) code
given in the lab notebooks as a starting point.
Each plot should have all axes labelled and if multiple plots are
included on the same set of axes a legend should be included to make
clear what each line represents. Within the report all figures should be
numbered (and you should use these numbers to refer to the figures in
the main text) and have a descriptive caption stating what they show.
Ideally all figures should be included in your report file as
\href{https://en.wikipedia.org/wiki/Vector_graphics}{vector graphics}
rather than \href{https://en.wikipedia.org/wiki/Raster_graphics}{raster
files} as this will make sure all detail in the plot is visible.
Matplotlib supports saving high quality figures in a wide range of
common image formats using the
\href{http://matplotlib.org/api/pyplot_api.html\#matplotlib.pyplot.savefig}{\texttt{savefig}}
function. \textbf{You should use \texttt{savefig} rather than copying
the screen-resolution raster images outputted in the notebook.} An
example of using \texttt{savefig} to save a figure as a PDF file (which
can be included as graphics in
\href{https://en.wikibooks.org/wiki/LaTeX/Importing_Graphics}{LaTeX}
compiled with \texttt{pdflatex} and in Apple Pages and
\href{https://support.office.com/en-us/article/Add-a-PDF-to-your-Office-file-74819342-8f00-4ab4-bcbe-0f3df15ab0dc}{Microsoft
Word} documents) is given below.
\begin{Shaded}
\begin{Highlighting}[]
\CharTok{import} \NormalTok{matplotlib.pyplot }\CharTok{as} \NormalTok{plt}
\CharTok{import} \NormalTok{numpy }\CharTok{as} \NormalTok{np}
\CommentTok{# Generate some example data to plot}
\NormalTok{x = np.linspace(}\DecValTok{0}\NormalTok{., }\DecValTok{1}\NormalTok{., }\DecValTok{100}\NormalTok{)}
\NormalTok{y1 = np.sin(}\DecValTok{2}\NormalTok{. * np.pi * x)}
\NormalTok{y2 = np.cos(}\DecValTok{2}\NormalTok{. * np.pi * x)}
\NormalTok{fig_size = (}\DecValTok{6}\NormalTok{, }\DecValTok{3}\NormalTok{) }\CommentTok{# Set figure size in inches (width, height)}
\NormalTok{fig = plt.figure(figsize=fig_size) }\CommentTok{# Create a new figure object}
\NormalTok{ax = fig.add_subplot(}\DecValTok{1}\NormalTok{, }\DecValTok{1}\NormalTok{, }\DecValTok{1}\NormalTok{) }\CommentTok{# Add a single axes to the figure}
\CommentTok{# Plot lines giving each a label for the legend and setting line width to 2}
\NormalTok{ax.plot(x, y1, linewidth=}\DecValTok{2}\NormalTok{, label=}\StringTok{r'$y = \textbackslash{}sin(2\textbackslash{}pi x)$'}\NormalTok{)}
\NormalTok{ax.plot(x, y2, linewidth=}\DecValTok{2}\NormalTok{, label=}\StringTok{r'$y = \textbackslash{}cos(2\textbackslash{}pi x)$'}\NormalTok{)}
\CommentTok{# Set the axes labels. Can use LaTeX in labels within $...$ delimiters.}
\NormalTok{ax.set_xlabel(}\StringTok{'$x$'}\NormalTok{, fontsize=}\DecValTok{12}\NormalTok{)}
\NormalTok{ax.set_ylabel(}\StringTok{'$y$'}\NormalTok{, fontsize=}\DecValTok{12}\NormalTok{)}
\NormalTok{ax.grid(}\OtherTok{True}\NormalTok{) }\CommentTok{# Turn axes grid on}
\NormalTok{ax.legend(loc=}\StringTok{'best'}\NormalTok{, fontsize=}\DecValTok{11}\NormalTok{) }\CommentTok{# Add a legend}
\NormalTok{fig.tight_layout() }\CommentTok{# This minimises whitespace around the axes.}
\NormalTok{fig.savefig(}\StringTok{'file-name.pdf'}\NormalTok{) }\CommentTok{# Save figure to current directory in PDF format}
\end{Highlighting}
\end{Shaded}
If you are using Libre/OpenOffice you should instead use Scalable Vector
Graphics (SVG) plots, saved using \\
\texttt{fig.savefig('file-name.svg')}. If the
document editor you are using for the report does not support including
either PDF or SVG graphics, you can instead output high-resolution raster
images using \texttt{fig.savefig('file-name.png', dpi=200)}; note, however,
that these files will generally be larger than either SVG or PDF formatted
graphics.
If you make use of any books, articles, web pages or other resources
you should appropriately cite these in your report. You do not need to
cite material from the course lecture slides or lab notebooks.
\subsection{Code}\label{code}
You should run all of the experiments for the coursework inside the
Conda environment
\href{https://github.com/CSTR-Edinburgh/mlpractical/blob/mlp2016-7/master/environment-set-up.md}{you
set up in the first lab}.
The code for the coursework is available on the course
\href{https://github.com/CSTR-Edinburgh/mlpractical/}{Github repository}
on a branch \texttt{mlp2016-7/coursework1}. To create a local working
copy of this branch in your local repository you need to do the
following.
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\itemsep1pt\parskip0pt\parsep0pt
\item
Make sure all modified files on the branch you are currently on have
been committed
(\href{https://github.com/CSTR-Edinburgh/mlpractical/blob/mlp2016-7/master/getting-started-in-a-lab.md}{see
details here} if you are unsure how to do this).
\item
Fetch changes from the upstream \texttt{origin} repository by running\\
\texttt{git fetch origin}
\item
Check out a new local branch from the fetched branch using\\
\texttt{git checkout -b coursework1 origin/mlp2016-7/coursework1}
\end{enumerate}
You will now have a new branch in your local repository with all the
code necessary for the coursework in it. In the \texttt{notebooks}
directory there is a notebook \texttt{Coursework\_1.ipynb} which is
intended as a starting point for structuring the code for your
experiments. You will probably want to add additional code cells to this
as you go along and run new experiments (e.g.~doing each new training
run in a new cell). You may also wish to use Markdown cells to keep
notes on the results of experiments.
\subsection{Submission}\label{submission}
Your coursework submission should be done electronically using the
\href{http://computing.help.inf.ed.ac.uk/submit}{\texttt{submit}}
command available on DICE machines.
Your submission should include
\begin{itemize}
\itemsep1pt\parskip0pt\parsep0pt
\item
your completed course report as a PDF file,
\item
the notebook (\texttt{.ipynb}) file you use to run the experiments in,
\item
and your local version of the \texttt{mlp} code including any changes
you make to the modules (\texttt{.py} files).
\end{itemize}
You should EITHER (1) package all of these files into a single archive
file using
\href{http://linuxcommand.org/man_pages/tar1.html}{\texttt{tar}} or
\href{http://linuxcommand.org/man_pages/zip1.html}{\texttt{zip}}, e.g.
{\small
\begin{verbatim}
tar -zcf coursework1.tar.gz notebooks/Coursework_1.ipynb mlp/*.py reports/coursework1.pdf
\end{verbatim}
}
and then submit this archive using
\begin{verbatim}
submit mlp 1 coursework1.tar.gz
\end{verbatim}
OR (2) copy all of the files to a single \texttt{coursework1} directory,
e.g.
\begin{verbatim}
mkdir coursework1
cp notebooks/Coursework_1.ipynb mlp/*.py reports/coursework1.pdf coursework1
\end{verbatim}
and then submit this directory using
\begin{verbatim}
submit mlp 1 coursework1
\end{verbatim}
The \texttt{submit} command will prompt you with the details of the
submission including the name of the files / directories you are
submitting and the name of the course and exercise you are submitting
for and ask you to check if these details are correct. You should check
these carefully and reply \texttt{y} to submit if you are sure the files
are correct and \texttt{n} otherwise.
You can amend an existing submission by rerunning the \texttt{submit}
command any time up to the deadline. It is therefore a good idea
(particularly if this is your first time using the DICE submit
mechanism) to do an initial run of the \texttt{submit} command early on
and then rerun the command if you make any further updates to your
submission rather than leaving submission to the last minute.
\subsection{Backing up your work}\label{backing-up-your-work}
It is \textbf{strongly recommended} you use some method for backing up
your work. Those working in their AFS homespace on DICE will have their
work automatically backed up as part of the
\href{http://computing.help.inf.ed.ac.uk/backups-and-mirrors}{routine
backup} of all user homespaces. If you are working on a personal
computer you should have your own backup method in place (e.g.~saving
additional copies to an external drive, syncing to a cloud service or
pushing commits to your local Git repository to a private repository on
Github). \textbf{Loss of work through failure to back up
\href{http://tinyurl.com/edinflate}{does not constitute a good reason for
late submission}}.
You may \emph{additionally} wish to keep your coursework under version
control in your local Git repository on the \texttt{coursework1} branch.
This does not need to be limited to the coursework notebook and
\texttt{mlp} Python modules - you can also add your report document to
the repository.
If you make regular commits of your work on the coursework this will
allow you to better keep track of the changes you have made and if
necessary revert to previous versions of files and/or restore
accidentally deleted work. This is not however required and you should
note that keeping your work under version control is a distinct issue
from backing up to guard against hard drive failure. If you are working
on a personal computer you should still keep an additional back up of
your work as described above.
\subsection{Standard network
architecture}\label{standard-network-architecture}
To make the results of your experiments more easily comparable, you
should try to keep as many of the free choices in the specification of
the model and learning problem the same across different experiments. If
you vary only a small number of aspects of the problem at a time this
will make it easier to interpret the effect those changes have.
In all experiments you should therefore use the same model architecture
and parameter initialisation method. In particular you should use a
model composed of three affine transformations interleaved with logistic
sigmoid nonlinearities, and a softmax output layer. The intermediate
layers between the input and output should have a dimension of 100
(i.e.~two hidden layers with 100 units in each hidden layer). This can
be defined with the following code:
\begin{Shaded}
\begin{Highlighting}[]
\CharTok{import} \NormalTok{numpy }\CharTok{as} \NormalTok{np}
\CharTok{from} \NormalTok{mlp.layers }\CharTok{import} \NormalTok{AffineLayer, SoftmaxLayer, SigmoidLayer}
\CharTok{from} \NormalTok{mlp.errors }\CharTok{import} \NormalTok{CrossEntropySoftmaxError}
\CharTok{from} \NormalTok{mlp.models }\CharTok{import} \NormalTok{MultipleLayerModel}
\CharTok{from} \NormalTok{mlp.initialisers }\CharTok{import} \NormalTok{ConstantInit, GlorotUniformInit}
\NormalTok{seed = }\DecValTok{10102016}
\NormalTok{rng = np.random.RandomState(seed)}
\NormalTok{input_dim, output_dim, hidden_dim = }\DecValTok{784}\NormalTok{, }\DecValTok{10}\NormalTok{, }\DecValTok{100}
\NormalTok{weights_init = GlorotUniformInit(rng=rng)}
\NormalTok{biases_init = ConstantInit(}\DecValTok{0}\NormalTok{.)}
\NormalTok{model = MultipleLayerModel([}
\NormalTok{AffineLayer(input_dim, hidden_dim, weights_init, biases_init),}
\NormalTok{SigmoidLayer(),}
\NormalTok{AffineLayer(hidden_dim, hidden_dim, weights_init, biases_init),}
\NormalTok{SigmoidLayer(),}
\NormalTok{AffineLayer(hidden_dim, output_dim, weights_init, biases_init)}
\NormalTok{])}
\NormalTok{error = CrossEntropySoftmaxError()}
\end{Highlighting}
\end{Shaded}
Here we are using a special parameter initialisation scheme for the
weights which makes the scale of the random initialisation dependent on
the input and output dimensions of the layer, with the aim of trying to
keep the scale of activations at different layers of the network the
same at initialisation. The scheme is described in
\href{http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf}{\emph{Understanding
the difficulty of training deep feedforward neural networks}, Glorot and
Bengio (2010)}. As also recommended there, we initialise the biases to
zero. You do not need to read or understand this paper for the
assignment; it is mentioned only to explain the use of
\texttt{GlorotUniformInit} in the above code. You should use this
parameter initialisation for all of your experiments.
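As a rough illustration of the dependence on layer dimensions described
above, the uniform variant of the scheme in the paper draws each weight
from $U(-b,\,b)$ with $b = \sqrt{6 / (n_{\mathrm{in}} + n_{\mathrm{out}})}$.
The sketch below only computes this bound for the three affine layers of
the standard model. The helper name \texttt{glorot\_uniform\_bound} is
hypothetical, and any additional options of the \texttt{GlorotUniformInit}
class in the \texttt{mlp} package are not reproduced here.

```python
import math


def glorot_uniform_bound(input_dim, output_dim):
    """Half-width b of the uniform distribution U(-b, b) for the weights.

    Hypothetical helper for illustration only; not part of the mlp package.
    """
    return math.sqrt(6.0 / (input_dim + output_dim))


# Layer shapes of the standard model: 784 -> 100 -> 100 -> 10
layer_dims = [(784, 100), (100, 100), (100, 10)]
bounds = [glorot_uniform_bound(n_in, n_out) for n_in, n_out in layer_dims]

for (n_in, n_out), b in zip(layer_dims, bounds):
    print('{0:3d} -> {1:3d}: weights in U(-{2:.4f}, {2:.4f})'.format(
        n_in, n_out, b))
```

Layers with fewer incoming and outgoing connections get a wider initial
weight range, which is how the scheme tries to keep activation scales
comparable across layers.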
As well as standardising the network architecture, you should also fix
the hyperparameters of the training procedure not being investigated to
be the same across different runs. In particular you should use a
\textbf{batch size of 50 and train for a total of 100 epochs} for all
reported runs. You may of course use a smaller number of
epochs for initial pilot runs.
\subsection{Part 1: Learning rate schedules (10
marks)}\label{part-1-learning-rate-schedules-10-marks}
In the first part of the assignment you will investigate how using a
time-dependent learning rate schedule influences training.
Implement one of the two following time-dependent learning rate
schedules:
\begin{itemize}
\itemsep1pt\parskip0pt\parsep0pt
\item
exponential $\eta(t) = \eta_0 \exp\left(-t / r\right)$
\item
reciprocal $\eta(t) = \eta_0 \left(1 + t / r\right)^{-1}$
\end{itemize}
where $\eta_0$ is the initial learning rate, $t$ the epoch number,
$\eta(t)$ the learning rate at epoch $t$ and $r$ a free parameter
governing how quickly the learning rate decays.
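To get a feel for the two formulas before implementing a scheduler, the
following minimal sketch evaluates them as plain functions at a few
epochs. The function names and the values of $\eta_0$ and $r$ are
arbitrary choices for illustration, not recommended settings, and this is
not the scheduler class the coursework asks for.

```python
import math


def exponential_schedule(eta_0, r, t):
    """Learning rate eta_0 * exp(-t / r) at epoch t."""
    return eta_0 * math.exp(-t / float(r))


def reciprocal_schedule(eta_0, r, t):
    """Learning rate eta_0 * (1 + t / r)**(-1) at epoch t."""
    return eta_0 / (1.0 + t / float(r))


# Arbitrary illustrative values, not recommended settings.
eta_0, r = 0.1, 20.0
for t in [0, 10, 20, 50, 100]:
    print('epoch {0:3d}: exponential {1:.5f}, reciprocal {2:.5f}'.format(
        t, exponential_schedule(eta_0, r, t),
        reciprocal_schedule(eta_0, r, t)))
```

Both schedules start at $\eta_0$. At $t = r$ the exponential schedule has
decayed to $\eta_0 / e$ and the reciprocal schedule to $\eta_0 / 2$, so
for the same $\eta_0$ and $r$ the exponential schedule decays faster.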
You should implement the schedule by creating a new scheduler class in
the \texttt{mlp.schedulers} module which follows the interface of the
example \texttt{ConstantLearningRateScheduler} given in the module. In
particular, as well as an \texttt{\_\_init\_\_} method initialising the
object with any free parameters for the schedule, the class should
define an \texttt{update\_learning\_rule} method which sets the
\texttt{learning\_rate} attribute of a learning rule object based on the
current epoch number.
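The interface described above can be sketched as follows. To avoid
pre-empting the exercise, the example uses a step-decay schedule, which is
deliberately \emph{not} one of the two schedules listed above. The method
signature \texttt{update\_learning\_rule(learning\_rule, epoch\_number)} is
an assumption and should be checked against
\texttt{ConstantLearningRateScheduler} in the module. The class names
\texttt{StepDecayLearningRateScheduler} and \texttt{DummyLearningRule} are
hypothetical.

```python
class StepDecayLearningRateScheduler(object):
    """Hypothetical scheduler: halves the learning rate every `step` epochs."""

    def __init__(self, init_learning_rate, step):
        self.init_learning_rate = init_learning_rate
        self.step = step

    def update_learning_rule(self, learning_rule, epoch_number):
        # Assumed signature; compare with ConstantLearningRateScheduler.
        n_halvings = epoch_number // self.step
        learning_rule.learning_rate = (
            self.init_learning_rate * 0.5 ** n_halvings)


class DummyLearningRule(object):
    """Stand-in for a learning rule object; only holds a learning_rate."""
    learning_rate = None


rule = DummyLearningRule()
scheduler = StepDecayLearningRateScheduler(init_learning_rate=0.1, step=10)
rates = []
for epoch in range(30):
    # The scheduler overwrites the rule's learning rate for this epoch.
    scheduler.update_learning_rule(rule, epoch)
    rates.append(rule.learning_rate)
```

The scheduler stores only the schedule's free parameters and recomputes
the learning rate from the epoch number on each call, so it does not depend
on how many times it has been called before.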
A (potentially empty) list of scheduler objects is passed to the
\texttt{\_\_init\_\_} method of the \texttt{Optimiser} object used to
train the model, for example
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{schedulers = [ConstantLearningRateScheduler(learning_rate)]}
\NormalTok{optimiser = Optimiser(}
\NormalTok{model, error, learning_rule, train_data,}
\NormalTok{valid_data, data_monitors, schedulers)}
\end{Highlighting}
\end{Shaded}
You should:
\begin{itemize}
\itemsep1pt\parskip0pt\parsep0pt
\item
Compare the performance of your time-dependent learning rate schedule
when training the standard model on the MNIST digit classification
task, to training with a constant learning rate baseline.
\item
Indicate how the free schedule parameters $\eta_0$ and $r$ affect the
evolution of the training.
\item
State the final error function and classification accuracy values and
include plots of the evolution of the error and accuracy across the
training epochs for both the training and validation sets. These
should be reported for both the constant learning rate baseline and
\emph{at least} one run with your learning rate scheduler
implementation.
\end{itemize}
\subsection{Part 2: Momentum learning rule (15
marks)}\label{part-2-momentum-learning-rule-15-marks}
In this part of the assignment you will investigate using a gradient
descent learning rule with momentum. This extends the basic gradient
learning rule by introducing extra momentum state variables for the
parameters. These can help the learning dynamics overcome shallow
local minima and speed up convergence when making multiple successive steps
in a similar direction in parameter space.
An implementation of the momentum learning rule is given in the
\texttt{mlp.learning\_rules} module in the \texttt{MomentumLearningRule}
class. Read through the code and documentation for this class and make
sure you understand how it relates to the equations given in the lecture
slides.
In addition to the \texttt{learning\_rate} parameter, the
\texttt{MomentumLearningRule} also accepts a \texttt{mom\_coeff}
argument. This \emph{momentum coefficient} $\alpha \in [0,\,1]$
determines the contribution of the previous momentum value to the new
momentum after an update.
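As a toy illustration of the role of $\alpha$, the sketch below applies one
common form of the momentum update, $\Delta \leftarrow \alpha \Delta - \eta
\nabla E$ followed by $w \leftarrow w + \Delta$, to a one-dimensional
quadratic error. This form is an assumption made for the example; the
lecture slides and the \texttt{MomentumLearningRule} code remain the
reference. The function name, curvature and step count are arbitrary
choices.

```python
def minimise_quadratic(learning_rate, mom_coeff, n_steps=50,
                       curvature=0.1, w_init=1.0):
    """Run momentum gradient descent on E(w) = 0.5 * curvature * w**2."""
    w, mom = w_init, 0.0
    for _ in range(n_steps):
        grad = curvature * w
        # The previous momentum is scaled by mom_coeff before the new
        # gradient step is added.
        mom = mom_coeff * mom - learning_rate * grad
        w += mom
    return w


# Distance from the minimum at w = 0 after 50 steps for three coefficients.
final_w = {alpha: minimise_quadratic(0.1, alpha) for alpha in [0.0, 0.5, 0.9]}
for alpha in sorted(final_w):
    print('mom_coeff {0:.1f}: |w| = {1:.4f}'.format(alpha, abs(final_w[alpha])))
```

With \texttt{mom\_coeff = 0} this reduces to basic gradient descent. On this
particular low-curvature problem the larger coefficients end closer to the
minimum after the same number of steps, but this is specific to the toy
example and need not carry over to the MNIST experiments.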
\newpage
As a first task you should:
\begin{itemize}
\itemsep1pt\parskip0pt\parsep0pt
\item
Compare the performance of a basic gradient descent learning rule to
the momentum learning rule for several values of the momentum
coefficient $\alpha$.
\item
Interpret how the momentum coefficient $\alpha$ influences training.
\item
Include plots of the error and accuracy training curves across the
training epochs, for the different momentum coefficients you test.
\end{itemize}
Analogous to scheduling of the learning rate, it is also possible to
vary the momentum coefficient over a training run. In particular it is
common to increase the coefficient from an initially lower value at the
start of training (when the direction of the gradient of the error
function in parameter space is likely to vary a lot) to a larger value
closer to 1 later in training. One possible schedule is
\begin{equation}
\alpha(t) = \alpha_{\infty} \left( 1 - \frac{\gamma}{t + \tau} \right)
\end{equation}
where $\alpha_{\infty} \in [0,\,1]$ determines the asymptotic momentum
coefficient and $\tau \geq 1$ and $0 \leq \gamma \leq \tau$ determine
the initial momentum coefficient and how quickly the coefficient tends to
$\alpha_{\infty}$.
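The effect of $\gamma$ and $\tau$ is easiest to see by evaluating the
schedule directly. The sketch below is a plain function rather than the
scheduler class asked for, and the function name and parameter values are
arbitrary illustrative choices.

```python
def momentum_coefficient(t, alpha_inf, gamma, tau):
    """Momentum coefficient alpha_inf * (1 - gamma / (t + tau)) at epoch t."""
    return alpha_inf * (1.0 - gamma / float(t + tau))


# Arbitrary illustrative values, not recommended settings.
alpha_inf, gamma, tau = 0.9, 5.0, 5.0
for t in [0, 5, 20, 100]:
    print('epoch {0:3d}: alpha = {1:.4f}'.format(
        t, momentum_coefficient(t, alpha_inf, gamma, tau)))
```

At $t = 0$ the schedule gives $\alpha(0) = \alpha_{\infty} (1 - \gamma /
\tau)$, so $\gamma = \tau$ starts from zero momentum while $\gamma = 0$
recovers a constant coefficient $\alpha_{\infty}$.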
You should create a scheduler class which implements the above momentum
coefficient schedule by adding a further definition to the
\texttt{mlp.schedulers} module. This should have the same interface as
the learning rate scheduler implemented in the previous part.
Using your implementation you should:
\begin{itemize}
\itemsep1pt\parskip0pt\parsep0pt
\item
  Try out several different momentum coefficient schedules by using different
values for $\alpha_{\infty}$, $\gamma$ and $\tau$ and investigate whether
using a variable momentum coefficient gives improved performance over
a constant momentum coefficient baseline.
\end{itemize}
\subsection{Part 3: Adaptive learning rules (40
marks)}\label{part-3-adaptive-learning-rules-40-marks}
In the final part of the assignment you will investigate adaptive
learning rules which attempt to automatically tune the scale of updates
in a parameter-dependent fashion.
You should implement \textbf{two} of the three adaptive learning rules
mentioned in the
\href{http://www.inf.ed.ac.uk/teaching/courses/mlp/2016/mlp04-learn.pdf}{fourth
lecture slides}:
\href{http://jmlr.org/papers/v12/duchi11a.html}{AdaGrad},
\href{http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf}{RMSProp}
and \href{https://arxiv.org/abs/1412.6980}{Adam}.
You should implement the learning rules by defining new classes
inheriting from \texttt{GradientDescentLearningRule} in the
\texttt{mlp/learning\_rules.py} module. The
\texttt{MomentumLearningRule} class should show you how to define
learning rules which use additional state variables to calculate the
updates to the parameters.
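To illustrate what a parameter-dependent update scale means, the sketch
below applies a single AdaGrad-style step to two scalar parameters whose
gradients differ by a factor of 100. It works on plain Python lists rather
than the \texttt{mlp} learning rule interface. The function name, the
default values and the placement of \texttt{epsilon} outside the square root
are assumptions made for the example, and some formulations differ in these
details.

```python
import math


def adagrad_step(params, grads, sum_sq_grads, learning_rate=0.1,
                 epsilon=1e-8):
    """Apply one AdaGrad-style update in place to lists of scalars."""
    for i, grad in enumerate(grads):
        # Extra state: running sum of squared gradients per parameter.
        sum_sq_grads[i] += grad ** 2
        params[i] -= (learning_rate * grad /
                      (math.sqrt(sum_sq_grads[i]) + epsilon))


# Toy error E(w) = 0.5 * (100 * w[0]**2 + w[1]**2).
params = [1.0, 1.0]
sum_sq_grads = [0.0, 0.0]
grads = [100.0 * params[0], params[1]]

adagrad_step(params, grads, sum_sq_grads)
first_steps = [1.0 - p for p in params]
print(first_steps)
```

Although the two gradients differ by two orders of magnitude, the first
step has almost the same size for both parameters, because each gradient is
divided by the root of its own accumulated squared gradients.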
You should:
\begin{itemize}
\itemsep1pt\parskip0pt\parsep0pt
\item
Compare the performance of your two implemented adaptive training
rules to your previous results using the basic gradient descent and
momentum learning rules. Ideally you should compare both in terms of
speed of convergence (including potentially accounting for greater
  computational cost of the adaptive updates) and the final error /
classification accuracy on both training and validation data sets.
\item
Briefly discuss any free parameters in the adaptive learning rules you
  implement and how sensitive training performance seems to be to the
  values used for them.
\item
Include example plots of the evolution of the error and accuracy
across the training epochs for the training and validation sets for
both of your implemented adaptive learning rules.
\end{itemize}
\subsection{Marking Scheme}\label{marking-scheme}
\begin{itemize}
\item
Part 1, Learning Rate Schedules (10 marks). Marks awarded for
completeness of implementation, experimental methodology, experimental
results.
\item
Part 2, Momentum Learning Rule (15 marks). Marks awarded for
completeness of implementation, experimental methodology, experimental
results.
\item
Part 3, Adaptive Learning Rules (40 marks). Marks awarded for
completeness of implementation, experimental methodology, experimental
results.
\item
Presentation and clarity of report (25 marks). Marks awarded for
overall structure, clear and concise presentation, providing enough
information to enable work to be reproduced, clear and concise
presentation of results, informative discussion and conclusions.
\item
Additional Excellence (10 marks). Marks awarded for significant
personal insight, creativity, originality, and/or extra depth and
academic maturity.
\end{itemize}
\end{document}

12
courseworks/cw1_hdr.tex Normal file

@ -0,0 +1,12 @@
\usepackage{txfonts}
\usepackage{microtype}
\usepackage[a4paper,body={170mm,250mm},top=25mm,left=25mm]{geometry}
\usepackage[sf,bf,small]{titlesec}
\usepackage{fancyhdr}
\pagestyle{fancy}
\lhead{\sffamily MLP Coursework 1}
\rhead{\sffamily Due: 27 October 2016}
\cfoot{\sffamily \thepage}


@ -0,0 +1,254 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Machine Learning Practical: Coursework 1\n",
"\n",
"**Release date: Monday 10th October 2016** \n",
"**Due date: 16:00 Thursday 27th October 2016**\n",
"\n",
"Instructions for the coursework are [available as a PDF here](http://www.inf.ed.ac.uk/teaching/courses/mlp/2016/coursework_1.pdf)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Part 1: Learning rate schedules"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# The below code will set up the data providers, random number\n",
"# generator and logger objects needed for training runs. As\n",
"# loading the data from file takes a little while, you will\n",
"# probably not want to reload the data providers on\n",
"# every training run. If you wish to reset their state you\n",
"# should instead use the .reset() method of the data providers.\n",
"import numpy as np\n",
"import logging\n",
"from mlp.data_providers import MNISTDataProvider\n",
"\n",
"# Seed a random number generator\n",
"seed = 10102016 \n",
"rng = np.random.RandomState(seed)\n",
"\n",
"# Set up a logger object to print info about the training run to stdout\n",
"logger = logging.getLogger()\n",
"logger.setLevel(logging.INFO)\n",
"logger.handlers = [logging.StreamHandler()]\n",
"\n",
"# Create data provider objects for the MNIST data set\n",
"train_data = MNISTDataProvider('train', batch_size=50, rng=rng)\n",
"valid_data = MNISTDataProvider('valid', batch_size=50, rng=rng)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# The model set up code below is provided as a starting point.\n",
"# You will probably want to add further code cells for the\n",
"# different experiments you run.\n",
"\n",
"from mlp.layers import AffineLayer, SoftmaxLayer, SigmoidLayer\n",
"from mlp.errors import CrossEntropySoftmaxError\n",
"from mlp.models import MultipleLayerModel\n",
"from mlp.initialisers import ConstantInit, GlorotUniformInit\n",
"\n",
"input_dim, output_dim, hidden_dim = 784, 10, 100\n",
"\n",
"weights_init = GlorotUniformInit(rng=rng)\n",
"biases_init = ConstantInit(0.)\n",
"\n",
"model = MultipleLayerModel([\n",
" AffineLayer(input_dim, hidden_dim, weights_init, biases_init), \n",
" SigmoidLayer(),\n",
" AffineLayer(hidden_dim, hidden_dim, weights_init, biases_init), \n",
" SigmoidLayer(),\n",
" AffineLayer(hidden_dim, output_dim, weights_init, biases_init)\n",
"])\n",
"\n",
"error = CrossEntropySoftmaxError()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Part 2: Momentum learning rule"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# The below code will set up the data providers, random number\n",
"# generator and logger objects needed for training runs. As\n",
"# loading the data from file takes a little while, you will\n",
"# probably not want to reload the data providers on\n",
"# every training run. If you wish to reset their state you\n",
"# should instead use the .reset() method of the data providers.\n",
"import numpy as np\n",
"import logging\n",
"from mlp.data_providers import MNISTDataProvider\n",
"\n",
"# Seed a random number generator\n",
"seed = 10102016 \n",
"rng = np.random.RandomState(seed)\n",
"\n",
"# Set up a logger object to print info about the training run to stdout\n",
"logger = logging.getLogger()\n",
"logger.setLevel(logging.INFO)\n",
"logger.handlers = [logging.StreamHandler()]\n",
"\n",
"# Create data provider objects for the MNIST data set\n",
"train_data = MNISTDataProvider('train', batch_size=50, rng=rng)\n",
"valid_data = MNISTDataProvider('valid', batch_size=50, rng=rng)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# The model set up code below is provided as a starting point.\n",
"# You will probably want to add further code cells for the\n",
"# different experiments you run.\n",
"\n",
"from mlp.layers import AffineLayer, SoftmaxLayer, SigmoidLayer\n",
"from mlp.errors import CrossEntropySoftmaxError\n",
"from mlp.models import MultipleLayerModel\n",
"from mlp.initialisers import ConstantInit, GlorotUniformInit\n",
"\n",
"input_dim, output_dim, hidden_dim = 784, 10, 100\n",
"\n",
"weights_init = GlorotUniformInit(rng=rng)\n",
"biases_init = ConstantInit(0.)\n",
"\n",
"model = MultipleLayerModel([\n",
" AffineLayer(input_dim, hidden_dim, weights_init, biases_init), \n",
" SigmoidLayer(),\n",
" AffineLayer(hidden_dim, hidden_dim, weights_init, biases_init), \n",
" SigmoidLayer(),\n",
" AffineLayer(hidden_dim, output_dim, weights_init, biases_init)\n",
"])\n",
"\n",
"error = CrossEntropySoftmaxError()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Part 3: Adaptive learning rules"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# The below code will set up the data providers, random number\n",
"# generator and logger objects needed for training runs. As\n",
"# loading the data from file takes a little while, you will\n",
"# probably not want to reload the data providers on\n",
"# every training run. If you wish to reset their state you\n",
"# should instead use the .reset() method of the data providers.\n",
"import numpy as np\n",
"import logging\n",
"from mlp.data_providers import MNISTDataProvider\n",
"\n",
"# Seed a random number generator\n",
"seed = 10102016 \n",
"rng = np.random.RandomState(seed)\n",
"\n",
"# Set up a logger object to print info about the training run to stdout\n",
"logger = logging.getLogger()\n",
"logger.setLevel(logging.INFO)\n",
"logger.handlers = [logging.StreamHandler()]\n",
"\n",
"# Create data provider objects for the MNIST data set\n",
"train_data = MNISTDataProvider('train', batch_size=50, rng=rng)\n",
"valid_data = MNISTDataProvider('valid', batch_size=50, rng=rng)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# The model set up code below is provided as a starting point.\n",
"# You will probably want to add further code cells for the\n",
"# different experiments you run.\n",
"\n",
"from mlp.layers import AffineLayer, SoftmaxLayer, SigmoidLayer\n",
"from mlp.errors import CrossEntropySoftmaxError\n",
"from mlp.models import MultipleLayerModel\n",
"from mlp.initialisers import ConstantInit, GlorotUniformInit\n",
"\n",
"input_dim, output_dim, hidden_dim = 784, 10, 100\n",
"\n",
"weights_init = GlorotUniformInit(rng=rng)\n",
"biases_init = ConstantInit(0.)\n",
"\n",
"model = MultipleLayerModel([\n",
" AffineLayer(input_dim, hidden_dim, weights_init, biases_init), \n",
" SigmoidLayer(),\n",
" AffineLayer(hidden_dim, hidden_dim, weights_init, biases_init), \n",
" SigmoidLayer(),\n",
" AffineLayer(hidden_dim, output_dim, weights_init, biases_init)\n",
"])\n",
"\n",
"error = CrossEntropySoftmaxError()"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python [conda env:mlp]",
"language": "python",
"name": "conda-env-mlp-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.12"
}
},
"nbformat": 4,
"nbformat_minor": 1
}