Merge branch 'mlp2017-8/master' of https://github.com/CSTR-Edinburgh/mlpractical into mlp2017-8/master

AntreasAntoniou 2018-09-13 02:14:43 +01:00
commit 97f36afd51
28 changed files with 5838 additions and 34 deletions


@ -5,31 +5,53 @@ This repository contains the code for the University of Edinburgh [School of Inf
This assignment-based course is focused on the implementation and evaluation of machine learning systems. Students who do this course will have experience in the design, implementation, training, and evaluation of machine learning systems.
The code in this repository is split into:
1. notebooks:
    1. Introduction_to_tensorflow: Introduces students to the basics of tensorflow and lower level operations.
    2. Introduction_to_tf_mlp_repo: Introduces students to the high level functionality of this repo and how one could run an experiment. The code is full of comments and documentation, so you should spend time reading and understanding it, running simple experiments and changing pieces of code to see their impact on the system.
2. utils:
    1. network_summary: Provides utilities for getting network summaries, such as the number of parameters and the names of layers.
    2. parser_utils: Used to parse the arguments passed to the training scripts.
    3. storage: Responsible for storing network statistics.
3. data_providers.py: Provides the data providers for training, validation and testing.
4. network_architectures.py: Defines the network architectures. We provide VGGNet as an example.
5. network_builder.py: Builds the tensorflow computation graph. In more detail, it builds the losses, tensorflow summaries and training operations.
6. network_trainer.py: Runs an experiment, composed of training, validation and testing. It is set up to take command-line arguments, so one can easily write multiple bash scripts with different hyperparameters and run experiments quickly with minimal code changes.
* a Python package `mlp`, a [NumPy](http://www.numpy.org/) based neural network package designed specifically for the course that students will implement parts of and extend during the course labs and assignments,
* a series of [Jupyter](http://jupyter.org/) notebooks in the `notebooks` directory containing explanatory material and coding exercises to be completed during the course labs.
## Getting set up
Detailed instructions for setting up a development environment for the course are given in [this file](notes/environment-set-up.md). Students doing the course will spend part of the first lab getting their own environment set up.
## Additional Packages
For the tf_mlp repo you are required to install either the tensorflow-1.4.1 package (CPU users) or the tensorflow_gpu-1.4.1 package (GPU users). Either can be installed via pip using:
```
pip install tensorflow==1.4.1
```
or
```
pip install tensorflow_gpu==1.4.1
```
Once you have set up the basic environment, install the requirements for the tf_mlp repo by running:
```
pip install -r requirements.txt
```
for CPU tensorflow, or
```
pip install -r requirements_gpu.txt
```
for GPU tensorflow.
If you install the wrong version of tensorflow, run
```
pip uninstall $tensorflow_to_uninstall
```
replacing $tensorflow_to_uninstall with the tensorflow package you want to remove (tensorflow or tensorflow_gpu), then install the correct one with pip install as usual.
## Frequent Issues/Solutions
Don't forget that from your /mlpractical folder you should first run
```
git status  # check whether there are any changes in your local branch. If there are, you need to do:
git add "path/to/file"
git commit -m "some message"
```
Only if this is OK, you can run
```
git checkout mlp2017-8/lab[n]
```
Related to the MLP "module not found" error:
Another thing is to make sure you have your MLP_DATA_DIR path correctly set. You can check this by typing
```echo $MLP_DATA_DIR```
in the command line. If it is not set, follow the [environment set-up instructions](notes/environment-set-up.md) to get going.
Finally, please make sure you have run
```python setup.py develop```
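The data providers in this commit read their files from `MLP_DATA_DIR`, so an unset variable or a missing file only shows up once a script starts. A minimal stdlib-only sketch for checking this before a long run follows; the helper `missing_data_files` is hypothetical (not part of the repo), and it assumes the `<prefix>-<split>.npz` naming used by the EMNIST and CIFAR-10 providers in `data_providers.py`.

```python
import os


def missing_data_files(prefixes=("emnist", "cifar10"), splits=("train", "valid", "test"),
                       data_dir=None):
    """Return the expected <prefix>-<split>.npz paths that are not present.

    Hypothetical helper: the file-name pattern mirrors the data providers,
    but this function is not part of the repo.
    """
    if data_dir is None:
        data_dir = os.environ.get("MLP_DATA_DIR")
        if not data_dir:
            # Same situation as `echo $MLP_DATA_DIR` printing an empty line.
            raise EnvironmentError("MLP_DATA_DIR is not set; see notes/environment-set-up.md")
    expected = [os.path.join(data_dir, "{0}-{1}.npz".format(prefix, split))
                for prefix in prefixes for split in splits]
    # Keep only the paths that do not point at an existing file.
    return [path for path in expected if not os.path.isfile(path)]
```

Calling `missing_data_files()` from a Python shell lists any expected files that are absent; an empty list means the providers should find every file they look for.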

181
cifar100_network_trainer.py Normal file

@ -0,0 +1,181 @@
import argparse
import numpy as np
import tensorflow as tf
import tqdm
from data_providers import CIFAR100DataProvider
from network_builder import ClassifierNetworkGraph
from utils.parser_utils import ParserClass
from utils.storage import build_experiment_folder, save_statistics, get_best_validation_model_statistics

tf.reset_default_graph()  # resets any previous graphs to clear memory
parser = argparse.ArgumentParser(description='Welcome to CNN experiments script')  # generates an argument parser
parser_extractor = ParserClass(parser=parser)  # creates a parser class to process the parsed input

# returns a list of objects that contain our parsed input
batch_size, seed, epochs, logs_path, continue_from_epoch, tensorboard_enable, batch_norm, \
    strided_dim_reduction, experiment_prefix, dropout_rate_value = parser_extractor.get_argument_variables()

# generate experiment name
experiment_name = "experiment_{}_batch_size_{}_bn_{}_mp_{}".format(experiment_prefix,
                                                                   batch_size, batch_norm,
                                                                   strided_dim_reduction)

rng = np.random.RandomState(seed=seed)  # set seed
# setup our data providers
train_data = CIFAR100DataProvider(which_set="train", batch_size=batch_size, rng=rng, random_sampling=True)
val_data = CIFAR100DataProvider(which_set="valid", batch_size=batch_size, rng=rng)
test_data = CIFAR100DataProvider(which_set="test", batch_size=batch_size, rng=rng)

print("Running {}".format(experiment_name))
print("Starting from epoch {}".format(continue_from_epoch))
saved_models_filepath, logs_filepath = build_experiment_folder(experiment_name, logs_path)  # generate experiment dir

# Placeholder setup
data_inputs = tf.placeholder(tf.float32, [batch_size, train_data.inputs.shape[1], train_data.inputs.shape[2],
                                          train_data.inputs.shape[3]], 'data-inputs')
data_targets = tf.placeholder(tf.int32, [batch_size], 'data-targets')
training_phase = tf.placeholder(tf.bool, name='training-flag')
rotate_data = tf.placeholder(tf.bool, name='rotate-flag')
dropout_rate = tf.placeholder(tf.float32, name='dropout-prob')

classifier_network = ClassifierNetworkGraph(input_x=data_inputs, target_placeholder=data_targets,
                                            dropout_rate=dropout_rate, batch_size=batch_size,
                                            n_classes=train_data.num_classes,
                                            is_training=training_phase, augment_rotate_flag=rotate_data,
                                            strided_dim_reduction=strided_dim_reduction,
                                            use_batch_normalization=batch_norm)  # initialize our computational graph

if continue_from_epoch == -1:  # if this is a new experiment and not continuation of a previous one then generate a new
    # statistics file
    save_statistics(logs_filepath, "result_summary_statistics", ["epoch", "train_c_loss", "train_c_accuracy",
                                                                 "val_c_loss", "val_c_accuracy",
                                                                 "test_c_loss", "test_c_accuracy"], create=True)

start_epoch = continue_from_epoch if continue_from_epoch != -1 else 0  # if new experiment start from 0 otherwise
# continue where left off

summary_op, losses_ops, c_error_opt_op = classifier_network.init_train()  # get graph operations (ops)

total_train_batches = train_data.num_batches
total_val_batches = val_data.num_batches
total_test_batches = test_data.num_batches

if tensorboard_enable:
    print("saved tensorboard file at", logs_filepath)
    writer = tf.summary.FileWriter(logs_filepath, graph=tf.get_default_graph())

init = tf.global_variables_initializer()  # initialization op for the graph
with tf.Session() as sess:
    sess.run(init)  # actually running the initialization op
    # saver object that will save our graph so we can reload it later for continuation of training or inference
    train_saver = tf.train.Saver()
    val_saver = tf.train.Saver()
    best_val_accuracy = 0.
    best_epoch = 0
    if continue_from_epoch != -1:
        train_saver.restore(sess, "{}/{}_{}.ckpt".format(saved_models_filepath, experiment_name,
                                                         continue_from_epoch))  # restore previous graph to continue operations
        best_val_accuracy, best_epoch = get_best_validation_model_statistics(logs_filepath, "result_summary_statistics")
        print(best_val_accuracy, best_epoch)

    with tqdm.tqdm(total=epochs - start_epoch) as epoch_pbar:
        for e in range(start_epoch, epochs):
            total_c_loss = 0.
            total_accuracy = 0.
            with tqdm.tqdm(total=total_train_batches) as pbar_train:
                for batch_idx, (x_batch, y_batch) in enumerate(train_data):
                    iter_id = e * total_train_batches + batch_idx
                    _, c_loss_value, acc = sess.run(
                        [c_error_opt_op, losses_ops["crossentropy_losses"], losses_ops["accuracy"]],
                        feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
                                   data_targets: y_batch, training_phase: True, rotate_data: False})
                    # Here we execute the c_error_opt_op which trains the network and also the ops that compute the
                    # loss and accuracy, we save those in _, c_loss_value and acc respectively.
                    total_c_loss += c_loss_value  # add loss of current iter to sum
                    total_accuracy += acc  # add acc of current iter to sum

                    # show iter statistics using running averages of previous iters within this epoch
                    iter_out = "iter_num: {}, train_loss: {}, train_accuracy: {}".format(iter_id,
                                                                                         total_c_loss / (batch_idx + 1),
                                                                                         total_accuracy / (batch_idx + 1))
                    pbar_train.set_description(iter_out)
                    pbar_train.update(1)
                    if tensorboard_enable and batch_idx % 25 == 0:  # save tensorboard summary every 25 iterations
                        _summary = sess.run(
                            summary_op,
                            feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
                                       data_targets: y_batch, training_phase: True, rotate_data: False})
                        writer.add_summary(_summary, global_step=iter_id)

            total_c_loss /= total_train_batches  # compute mean of loss
            total_accuracy /= total_train_batches  # compute mean of accuracy

            # save graph and weights
            save_path = train_saver.save(sess, "{}/{}_{}.ckpt".format(saved_models_filepath, experiment_name, e))
            print("Saved current model at", save_path)

            # run validation stage, note how training_phase placeholder is set to False
            # and that we do not run the c_error_opt_op which runs gradient descent, but instead only call the loss ops
            # to collect losses on the validation set
            total_val_c_loss = 0.
            total_val_accuracy = 0.
            with tqdm.tqdm(total=total_val_batches) as pbar_val:
                for batch_idx, (x_batch, y_batch) in enumerate(val_data):
                    c_loss_value, acc = sess.run(
                        [losses_ops["crossentropy_losses"], losses_ops["accuracy"]],
                        feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
                                   data_targets: y_batch, training_phase: False, rotate_data: False})
                    total_val_c_loss += c_loss_value
                    total_val_accuracy += acc
                    iter_out = "val_loss: {}, val_accuracy: {}".format(total_val_c_loss / (batch_idx + 1),
                                                                       total_val_accuracy / (batch_idx + 1))
                    pbar_val.set_description(iter_out)
                    pbar_val.update(1)

            total_val_c_loss /= total_val_batches
            total_val_accuracy /= total_val_batches

            if best_val_accuracy < total_val_accuracy:  # check if val acc better than the previous best and if
                # so save current as best and save the model as the best validation model to be used on the test set
                # after the final epoch
                best_val_accuracy = total_val_accuracy
                best_epoch = e
                save_path = val_saver.save(sess, "{}/best_validation_{}_{}.ckpt".format(saved_models_filepath, experiment_name, e))
                print("Saved best validation score model at", save_path)

            epoch_pbar.update(1)
            # save statistics of this epoch, train and val without test set performance
            save_statistics(logs_filepath, "result_summary_statistics",
                            [e, total_c_loss, total_accuracy, total_val_c_loss, total_val_accuracy,
                             -1, -1])

    # restore model with best performance on validation set
    val_saver.restore(sess, "{}/best_validation_{}_{}.ckpt".format(saved_models_filepath, experiment_name, best_epoch))

    # compute test loss and accuracy and save
    total_test_c_loss = 0.
    total_test_accuracy = 0.
    with tqdm.tqdm(total=total_test_batches) as pbar_test:
        for batch_idx, (x_batch, y_batch) in enumerate(test_data):
            c_loss_value, acc = sess.run(
                [losses_ops["crossentropy_losses"], losses_ops["accuracy"]],
                feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
                           data_targets: y_batch, training_phase: False, rotate_data: False})
            total_test_c_loss += c_loss_value
            total_test_accuracy += acc
            iter_out = "test_loss: {}, test_accuracy: {}".format(total_test_c_loss / (batch_idx + 1),
                                                                 total_test_accuracy / (batch_idx + 1))
            pbar_test.set_description(iter_out)
            pbar_test.update(1)

    total_test_c_loss /= total_test_batches
    total_test_accuracy /= total_test_batches
    save_statistics(logs_filepath, "result_summary_statistics",
                    ["test set performance", -1, -1, -1, -1,
                     total_test_c_loss, total_test_accuracy])
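The script above trains for a fixed number of epochs, saves a `best_validation_*` checkpoint whenever validation accuracy strictly improves, and evaluates the test set only once, after restoring that checkpoint. The framework-free sketch below isolates that model-selection protocol. `train_epoch` and `evaluate` are hypothetical callables standing in for the TensorFlow ops, and an in-memory deep copy stands in for `val_saver.save` / `val_saver.restore`.

```python
import copy


def train_with_best_validation(params, train_epoch, evaluate, num_epochs, val_data, test_data):
    """Sketch of the trainers' model-selection protocol (not the repo's code).

    train_epoch(params) -> params after one epoch of training (hypothetical).
    evaluate(params, data) -> accuracy on `data` (hypothetical).
    """
    best_val_accuracy, best_epoch, best_params = 0.0, None, None
    for epoch in range(num_epochs):
        params = train_epoch(params)
        val_accuracy = evaluate(params, val_data)
        # Strict '<' as in the script: ties keep the earlier epoch.
        if best_val_accuracy < val_accuracy:
            best_val_accuracy, best_epoch = val_accuracy, epoch
            best_params = copy.deepcopy(params)  # stands in for val_saver.save(...)
    if best_params is None:
        # In this case the script would try to restore a best-validation
        # checkpoint that was never written.
        raise RuntimeError("validation accuracy never improved on 0")
    # The test set is evaluated once, on the best-validation snapshot.
    test_accuracy = evaluate(best_params, test_data)
    return best_epoch, best_val_accuracy, test_accuracy
```

Keeping the test set out of the selection loop means the epoch is never chosen by test accuracy, so the reported test score is not tuned on the test data.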

183
cifar10_network_trainer.py Normal file

@ -0,0 +1,183 @@
import argparse
import numpy as np
import tensorflow as tf
import tqdm
from data_providers import CIFAR10DataProvider
from network_builder import ClassifierNetworkGraph
from utils.parser_utils import ParserClass
from utils.storage import build_experiment_folder, save_statistics, get_best_validation_model_statistics

tf.reset_default_graph()  # resets any previous graphs to clear memory
parser = argparse.ArgumentParser(description='Welcome to CNN experiments script')  # generates an argument parser
parser_extractor = ParserClass(parser=parser)  # creates a parser class to process the parsed input

# returns a list of objects that contain our parsed input
batch_size, seed, epochs, logs_path, continue_from_epoch, tensorboard_enable, batch_norm, \
    strided_dim_reduction, experiment_prefix, dropout_rate_value = parser_extractor.get_argument_variables()

# generate experiment name
experiment_name = "experiment_{}_batch_size_{}_bn_{}_mp_{}".format(experiment_prefix,
                                                                   batch_size, batch_norm,
                                                                   strided_dim_reduction)

rng = np.random.RandomState(seed=seed)  # set seed
# setup our data providers
train_data = CIFAR10DataProvider(which_set="train", batch_size=batch_size, rng=rng, random_sampling=True)
val_data = CIFAR10DataProvider(which_set="valid", batch_size=batch_size, rng=rng)
test_data = CIFAR10DataProvider(which_set="test", batch_size=batch_size, rng=rng)

print("Running {}".format(experiment_name))
print("Starting from epoch {}".format(continue_from_epoch))
saved_models_filepath, logs_filepath = build_experiment_folder(experiment_name, logs_path)  # generate experiment dir

# Placeholder setup
data_inputs = tf.placeholder(tf.float32, [batch_size, train_data.inputs.shape[1], train_data.inputs.shape[2],
                                          train_data.inputs.shape[3]], 'data-inputs')
data_targets = tf.placeholder(tf.int32, [batch_size], 'data-targets')
training_phase = tf.placeholder(tf.bool, name='training-flag')
rotate_data = tf.placeholder(tf.bool, name='rotate-flag')
dropout_rate = tf.placeholder(tf.float32, name='dropout-prob')

classifier_network = ClassifierNetworkGraph(input_x=data_inputs, target_placeholder=data_targets,
                                            dropout_rate=dropout_rate, batch_size=batch_size,
                                            n_classes=train_data.num_classes, is_training=training_phase,
                                            augment_rotate_flag=rotate_data,
                                            strided_dim_reduction=strided_dim_reduction,
                                            use_batch_normalization=batch_norm)  # initialize our computational graph

if continue_from_epoch == -1:  # if this is a new experiment and not continuation of a previous one then generate a new
    # statistics file
    save_statistics(logs_filepath, "result_summary_statistics", ["epoch", "train_c_loss", "train_c_accuracy",
                                                                 "val_c_loss", "val_c_accuracy",
                                                                 "test_c_loss", "test_c_accuracy"], create=True)

start_epoch = continue_from_epoch if continue_from_epoch != -1 else 0  # if new experiment start from 0 otherwise
# continue where left off

summary_op, losses_ops, c_error_opt_op = classifier_network.init_train()  # get graph operations (ops)

total_train_batches = train_data.num_batches
total_val_batches = val_data.num_batches
total_test_batches = test_data.num_batches

if tensorboard_enable:
    print("saved tensorboard file at", logs_filepath)
    writer = tf.summary.FileWriter(logs_filepath, graph=tf.get_default_graph())

init = tf.global_variables_initializer()  # initialization op for the graph
with tf.Session() as sess:
    sess.run(init)  # actually running the initialization op
    # saver object that will save our graph so we can reload it later for continuation of training or inference
    train_saver = tf.train.Saver()
    val_saver = tf.train.Saver()
    best_val_accuracy = 0.
    best_epoch = 0
    if continue_from_epoch != -1:
        train_saver.restore(sess, "{}/{}_{}.ckpt".format(saved_models_filepath, experiment_name,
                                                         continue_from_epoch))  # restore previous graph to continue operations
        best_val_accuracy, best_epoch = get_best_validation_model_statistics(logs_filepath, "result_summary_statistics")
        print(best_val_accuracy, best_epoch)

    with tqdm.tqdm(total=epochs - start_epoch) as epoch_pbar:
        for e in range(start_epoch, epochs):
            total_c_loss = 0.
            total_accuracy = 0.
            with tqdm.tqdm(total=total_train_batches) as pbar_train:
                for batch_idx, (x_batch, y_batch) in enumerate(train_data):
                    iter_id = e * total_train_batches + batch_idx
                    _, c_loss_value, acc = sess.run(
                        [c_error_opt_op, losses_ops["crossentropy_losses"], losses_ops["accuracy"]],
                        feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
                                   data_targets: y_batch, training_phase: True, rotate_data: False})
                    # Here we execute the c_error_opt_op which trains the network and also the ops that compute the
                    # loss and accuracy, we save those in _, c_loss_value and acc respectively.
                    total_c_loss += c_loss_value  # add loss of current iter to sum
                    total_accuracy += acc  # add acc of current iter to sum

                    # show iter statistics using running averages of previous iters within this epoch
                    iter_out = "iter_num: {}, train_loss: {}, train_accuracy: {}".format(iter_id,
                                                                                         total_c_loss / (batch_idx + 1),
                                                                                         total_accuracy / (batch_idx + 1))
                    pbar_train.set_description(iter_out)
                    pbar_train.update(1)
                    if tensorboard_enable and batch_idx % 25 == 0:  # save tensorboard summary every 25 iterations
                        _summary = sess.run(
                            summary_op,
                            feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
                                       data_targets: y_batch, training_phase: True, rotate_data: False})
                        writer.add_summary(_summary, global_step=iter_id)

            total_c_loss /= total_train_batches  # compute mean of loss
            total_accuracy /= total_train_batches  # compute mean of accuracy

            # save graph and weights
            save_path = train_saver.save(sess, "{}/{}_{}.ckpt".format(saved_models_filepath, experiment_name, e))
            print("Saved current model at", save_path)

            # run validation stage, note how training_phase placeholder is set to False
            # and that we do not run the c_error_opt_op which runs gradient descent, but instead only call the loss ops
            # to collect losses on the validation set
            total_val_c_loss = 0.
            total_val_accuracy = 0.
            with tqdm.tqdm(total=total_val_batches) as pbar_val:
                for batch_idx, (x_batch, y_batch) in enumerate(val_data):
                    c_loss_value, acc = sess.run(
                        [losses_ops["crossentropy_losses"], losses_ops["accuracy"]],
                        feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
                                   data_targets: y_batch, training_phase: False, rotate_data: False})
                    total_val_c_loss += c_loss_value
                    total_val_accuracy += acc
                    iter_out = "val_loss: {}, val_accuracy: {}".format(total_val_c_loss / (batch_idx + 1),
                                                                       total_val_accuracy / (batch_idx + 1))
                    pbar_val.set_description(iter_out)
                    pbar_val.update(1)

            total_val_c_loss /= total_val_batches
            total_val_accuracy /= total_val_batches

            if best_val_accuracy < total_val_accuracy:  # check if val acc better than the previous best and if
                # so save current as best and save the model as the best validation model to be used on the test set
                # after the final epoch
                best_val_accuracy = total_val_accuracy
                best_epoch = e
                save_path = val_saver.save(sess, "{}/best_validation_{}_{}.ckpt".format(saved_models_filepath, experiment_name, e))
                print("Saved best validation score model at", save_path)

            epoch_pbar.update(1)
            # save statistics of this epoch, train and val without test set performance
            save_statistics(logs_filepath, "result_summary_statistics",
                            [e, total_c_loss, total_accuracy, total_val_c_loss, total_val_accuracy,
                             -1, -1])

    # restore model with best performance on validation set
    val_saver.restore(sess, "{}/best_validation_{}_{}.ckpt".format(saved_models_filepath, experiment_name, best_epoch))

    # compute test loss and accuracy and save
    total_test_c_loss = 0.
    total_test_accuracy = 0.
    with tqdm.tqdm(total=total_test_batches) as pbar_test:
        for batch_idx, (x_batch, y_batch) in enumerate(test_data):
            c_loss_value, acc = sess.run(
                [losses_ops["crossentropy_losses"], losses_ops["accuracy"]],
                feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
                           data_targets: y_batch, training_phase: False, rotate_data: False})
            total_test_c_loss += c_loss_value
            total_test_accuracy += acc
            iter_out = "test_loss: {}, test_accuracy: {}".format(total_test_c_loss / (batch_idx + 1),
                                                                 total_test_accuracy / (batch_idx + 1))
            pbar_test.set_description(iter_out)
            pbar_test.update(1)

    total_test_c_loss /= total_test_batches
    total_test_accuracy /= total_test_batches
    save_statistics(logs_filepath, "result_summary_statistics",
                    ["test set performance", -1, -1, -1, -1,
                     total_test_c_loss, total_test_accuracy])
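Both trainers report two kinds of average. The progress-bar description shows a running mean over the batches seen so far in the epoch (`total / (batch_idx + 1)`), while the value passed to `save_statistics` divides the epoch sum by `total_train_batches`. The plain-Python sketch below (helper names hypothetical) shows that the two agree at the end of an epoch, provided the provider yields exactly `total_train_batches` batches, which `DataProvider` does because it iterates over `num_batches` whole batches.

```python
def running_means(batch_values):
    """Yield the running mean after each batch, as in the tqdm description
    (total / (batch_idx + 1))."""
    total = 0.0
    for batch_idx, value in enumerate(batch_values):
        total += value
        yield total / (batch_idx + 1)


def epoch_mean(batch_values, total_batches):
    """Epoch-level mean as logged after the loop (total / total_batches)."""
    return sum(batch_values) / total_batches
```

For example, per-batch losses `[2.0, 1.0, 0.0, 1.0]` give running means `2.0, 1.5, 1.0, 1.0`, and the final value equals the epoch mean of `1.0`.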

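The next file, `data_providers.py`, batches data in a few steps that are easy to check in isolation. The number of batches is `num_data // batch_size` (optionally capped by `max_num_batches`). Each batch is either a sequential slice or, with `random_sampling=True`, an independent `rng.choice(..., replace=False)` draw. The CIFAR providers reshape rows to `(-1, 3, 32, 32)` and then transpose them to channels-last. The NumPy sketch below re-implements these steps, plus the 1-of-K encoding used by `to_one_of_k`, as standalone functions. The function names are hypothetical, not the module's API.

```python
import numpy as np


def num_batches(num_data, batch_size, max_num_batches=-1):
    """Whole batches per epoch; integer division drops any final partial batch."""
    possible = num_data // batch_size
    return possible if max_num_batches == -1 else min(max_num_batches, possible)


def batch_indices(num_data, batch_size, batch_number, random_sampling, rng):
    """Indices of one batch, mirroring DataProvider.next."""
    if random_sampling:
        # Independent draw per batch: no repeats within a batch, but an example
        # can appear in several batches of the same epoch (or in none).
        return rng.choice(num_data, size=batch_size, replace=False)
    start = batch_number * batch_size
    return np.arange(start, start + batch_size)


def cifar_rows_to_nhwc(rows):
    """Reshape flat rows of 3 * 32 * 32 values (channel planes stored one after
    another, as the (-1, 3, 32, 32) reshape implies) to shape (N, 32, 32, 3)."""
    return np.transpose(np.reshape(rows, (-1, 3, 32, 32)), axes=(0, 2, 3, 1))


def to_one_of_k(int_targets, num_classes):
    """1-of-K encoding: one row per target, with a single 1 in the target's column."""
    one_of_k = np.zeros((int_targets.shape[0], num_classes))
    one_of_k[np.arange(int_targets.shape[0]), int_targets] = 1
    return one_of_k
```

With 105 examples and `batch_size=10`, `num_batches` gives 10, so the last 5 examples are never visited by sequential iteration. This is also why the trainers' epoch averages divide by `num_batches` rather than by the dataset size.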
740
data_providers.py Normal file

@ -0,0 +1,740 @@
# -*- coding: utf-8 -*-
"""Data providers.
This module provides classes for loading datasets and iterating over batches of
data points.
"""
import os
import numpy as np
DEFAULT_SEED = 22012018
class DataProvider(object):
    """Generic data provider."""

    def __init__(self, inputs, targets, batch_size, max_num_batches=-1,
                 random_sampling=True, rng=None):
        """Create a new data provider object.

        Args:
            inputs (ndarray): Array of data input features of shape
                (num_data, input_dim).
            targets (ndarray): Array of data output targets of shape
                (num_data, output_dim) or (num_data,) if output_dim == 1.
            batch_size (int): Number of data points to include in each batch.
            max_num_batches (int): Maximum number of batches to iterate over
                in an epoch. If `max_num_batches * batch_size > num_data` then
                only as many batches as the data can be split into will be
                used. If set to -1 all whole batches will be used (any final
                partial batch is dropped).
            random_sampling (bool): If True, each batch is drawn at random
                (without replacement within the batch) from the whole data
                set; if False, batches are taken sequentially in order.
            rng (RandomState): A seeded random number generator.
        """
        self.inputs = inputs
        self.targets = targets
        if batch_size < 1:
            raise ValueError('batch_size must be >= 1')
        self._batch_size = batch_size
        if max_num_batches == 0 or max_num_batches < -1:
            raise ValueError('max_num_batches must be -1 or > 0')
        self._max_num_batches = max_num_batches
        self._update_num_batches()
        self.random_sampling = random_sampling
        self._current_order = np.arange(inputs.shape[0])
        if rng is None:
            rng = np.random.RandomState(DEFAULT_SEED)
        self.rng = rng
        self.new_epoch()

    @property
    def batch_size(self):
        """Number of data points to include in each batch."""
        return self._batch_size

    @batch_size.setter
    def batch_size(self, value):
        if value < 1:
            raise ValueError('batch_size must be >= 1')
        self._batch_size = value
        self._update_num_batches()

    @property
    def max_num_batches(self):
        """Maximum number of batches to iterate over in an epoch."""
        return self._max_num_batches

    @max_num_batches.setter
    def max_num_batches(self, value):
        if value == 0 or value < -1:
            raise ValueError('max_num_batches must be -1 or > 0')
        self._max_num_batches = value
        self._update_num_batches()

    def _update_num_batches(self):
        """Updates number of batches to iterate over."""
        # maximum possible number of batches is equal to number of whole times
        # batch_size divides into the number of data points which can be
        # found using integer division
        possible_num_batches = self.inputs.shape[0] // self.batch_size
        if self.max_num_batches == -1:
            self.num_batches = possible_num_batches
        else:
            self.num_batches = min(self.max_num_batches, possible_num_batches)

    def __iter__(self):
        """Implements Python iterator interface.

        This should return an object implementing a `next` method (`__next__`
        in Python 3) which steps through a sequence returning one element at a
        time and raising `StopIteration` when at the end of the sequence. Here
        the object returned is the DataProvider itself.
        """
        return self

    def new_epoch(self):
        """Starts a new epoch (pass through data) by resetting the batch counter."""
        self._curr_batch = 0

    def __next__(self):
        return self.next()

    def reset(self):
        """Resets the provider to the initial state."""
        inv_perm = np.argsort(self._current_order)
        self._current_order = self._current_order[inv_perm]
        self.inputs = self.inputs[inv_perm]
        self.targets = self.targets[inv_perm]
        self.new_epoch()

    def next(self):
        """Returns next data batch or raises `StopIteration` if at end."""
        if self._curr_batch + 1 > self.num_batches:
            # no more batches in current iteration through data set so start
            # new epoch ready for another pass and indicate iteration is at end
            self.new_epoch()
            raise StopIteration()
        # create an index slice corresponding to current batch number
        if self.random_sampling:
            batch_slice = self.rng.choice(self.inputs.shape[0], size=self.batch_size, replace=False)
        else:
            batch_slice = slice(self._curr_batch * self.batch_size,
                                (self._curr_batch + 1) * self.batch_size)
        inputs_batch = self.inputs[batch_slice]
        targets_batch = self.targets[batch_slice]
        self._curr_batch += 1
        return inputs_batch, targets_batch
class MNISTDataProvider(DataProvider):
    """Data provider for MNIST handwritten digit images."""

    def __init__(self, which_set='train', batch_size=100, max_num_batches=-1,
                 random_sampling=False, rng=None):
        """Create a new MNIST data provider object.

        Args:
            which_set: One of 'train', 'valid' or 'test'. Determines which
                portion of the MNIST data this object should provide.
            batch_size (int): Number of data points to include in each batch.
            max_num_batches (int): Maximum number of batches to iterate over
                in an epoch. If `max_num_batches * batch_size > num_data` then
                only as many batches as the data can be split into will be
                used. If set to -1 all whole batches will be used (any final
                partial batch is dropped).
            random_sampling (bool): If True, each batch is drawn at random
                (without replacement within the batch) from the whole data
                set; if False, batches are taken sequentially in order.
            rng (RandomState): A seeded random number generator.
        """
        # check a valid which_set was provided
        assert which_set in ['train', 'valid', 'test'], (
            'Expected which_set to be either train, valid or test. '
            'Got {0}'.format(which_set)
        )
        self.which_set = which_set
        self.num_classes = 10
        # construct path to data using os.path.join to ensure the correct path
        # separator for the current platform / OS is used
        # MLP_DATA_DIR environment variable should point to the data directory
        data_path = os.path.join(
            os.environ['MLP_DATA_DIR'], 'mnist-{0}.npz'.format(which_set))
        assert os.path.isfile(data_path), (
            'Data file does not exist at expected path: ' + data_path
        )
        # load data from compressed numpy file
        loaded = np.load(data_path)
        inputs, targets = loaded['inputs'], loaded['targets']
        inputs = inputs.astype(np.float32)
        # pass the loaded data to the parent class __init__
        super(MNISTDataProvider, self).__init__(
            inputs, targets, batch_size, max_num_batches, random_sampling, rng)

    def next(self):
        """Returns next data batch or raises `StopIteration` if at end."""
        inputs_batch, targets_batch = super(MNISTDataProvider, self).next()
        return inputs_batch, self.to_one_of_k(targets_batch)

    def to_one_of_k(self, int_targets):
        """Converts integer coded class target to 1 of K coded targets.

        Args:
            int_targets (ndarray): Array of integer coded class targets (i.e.
                where an integer from 0 to `num_classes` - 1 is used to
                indicate which is the correct class). This should be of shape
                (num_data,).

        Returns:
            Array of 1 of K coded targets i.e. an array of shape
            (num_data, num_classes) where for each row all elements are equal
            to zero except for the column corresponding to the correct class
            which is equal to one.
        """
        one_of_k_targets = np.zeros((int_targets.shape[0], self.num_classes))
        one_of_k_targets[range(int_targets.shape[0]), int_targets] = 1
        return one_of_k_targets
class EMNISTDataProvider(DataProvider):
    """Data provider for EMNIST handwritten character (digit and letter) images."""

    def __init__(self, which_set='train', batch_size=100, max_num_batches=-1,
                 random_sampling=False, rng=None, flatten=False, one_hot=False):
        """Create a new EMNIST data provider object.

        Args:
            which_set: One of 'train', 'valid' or 'test'. Determines which
                portion of the EMNIST data this object should provide.
            batch_size (int): Number of data points to include in each batch.
            max_num_batches (int): Maximum number of batches to iterate over
                in an epoch. If `max_num_batches * batch_size > num_data` then
                only as many batches as the data can be split into will be
                used. If set to -1 all whole batches will be used (any final
                partial batch is dropped).
            random_sampling (bool): If True, each batch is drawn at random
                (without replacement within the batch) from the whole data
                set; if False, batches are taken sequentially in order.
            rng (RandomState): A seeded random number generator.
            flatten (bool): If True, each image is reshaped to a vector of
                28*28 values; otherwise a trailing channel axis is added,
                giving images of shape (28, 28, 1).
            one_hot (bool): If True, targets are returned as 1 of K coded
                arrays; otherwise as integer class labels.
        """
        # check a valid which_set was provided
        assert which_set in ['train', 'valid', 'test'], (
            'Expected which_set to be either train, valid or test. '
            'Got {0}'.format(which_set)
        )
        self.one_hot = one_hot
        self.which_set = which_set
        self.num_classes = 47
        # construct path to data using os.path.join to ensure the correct path
        # separator for the current platform / OS is used
        # MLP_DATA_DIR environment variable should point to the data directory
        data_path = os.path.join(
            os.environ['MLP_DATA_DIR'], 'emnist-{0}.npz'.format(which_set))
        assert os.path.isfile(data_path), (
            'Data file does not exist at expected path: ' + data_path
        )
        # load data from compressed numpy file
        loaded = np.load(data_path)
        inputs, targets = loaded['inputs'], loaded['targets']
        inputs = inputs.astype(np.float32)
        if flatten:
            inputs = np.reshape(inputs, newshape=(-1, 28*28))
        else:
            inputs = np.expand_dims(inputs, axis=3)
        inputs = inputs / 255.0
        # pass the loaded data to the parent class __init__
        super(EMNISTDataProvider, self).__init__(
            inputs, targets, batch_size, max_num_batches, random_sampling, rng)

    def next(self):
        """Returns next data batch or raises `StopIteration` if at end."""
        inputs_batch, targets_batch = super(EMNISTDataProvider, self).next()
        if self.one_hot:
            return inputs_batch, self.to_one_of_k(targets_batch)
        else:
            return inputs_batch, targets_batch

    def to_one_of_k(self, int_targets):
        """Converts integer coded class target to 1 of K coded targets.

        Args:
            int_targets (ndarray): Array of integer coded class targets (i.e.
                where an integer from 0 to `num_classes` - 1 is used to
                indicate which is the correct class). This should be of shape
                (num_data,).

        Returns:
            Array of 1 of K coded targets i.e. an array of shape
            (num_data, num_classes) where for each row all elements are equal
            to zero except for the column corresponding to the correct class
            which is equal to one.
        """
        one_of_k_targets = np.zeros((int_targets.shape[0], self.num_classes))
        one_of_k_targets[range(int_targets.shape[0]), int_targets] = 1
        return one_of_k_targets
class CIFAR10DataProvider(DataProvider):
    """Data provider for CIFAR-10 object images."""

    def __init__(self, which_set='train', batch_size=100, max_num_batches=-1,
                 random_sampling=False, rng=None, flatten=False, one_hot=False):
        """Create a new CIFAR-10 data provider object.

        Args:
            which_set: One of 'train', 'valid' or 'test'. Determines which
                portion of the CIFAR-10 data this object should provide.
            batch_size (int): Number of data points to include in each batch.
            max_num_batches (int): Maximum number of batches to iterate over
                in an epoch. If `max_num_batches * batch_size > num_data` then
                only as many batches as the data can be split into will be
                used. If set to -1 all whole batches will be used (any final
                partial batch is dropped).
            random_sampling (bool): If True, each batch is drawn at random
                (without replacement within the batch) from the whole data
                set; if False, batches are taken sequentially in order.
            rng (RandomState): A seeded random number generator.
            flatten (bool): If True, each image is returned as a vector of
                32*32*3 values; otherwise images are returned channels-last
                with shape (32, 32, 3).
            one_hot (bool): If True, targets are returned as 1 of K coded
                arrays; otherwise as integer class labels.
        """
        # check a valid which_set was provided
        assert which_set in ['train', 'valid', 'test'], (
            'Expected which_set to be either train, valid or test. '
            'Got {0}'.format(which_set)
        )
        self.one_hot = one_hot
        self.which_set = which_set
        self.num_classes = 10
        # construct path to data using os.path.join to ensure the correct path
        # separator for the current platform / OS is used
        # MLP_DATA_DIR environment variable should point to the data directory
        data_path = os.path.join(
            os.environ['MLP_DATA_DIR'], 'cifar10-{0}.npz'.format(which_set))
        assert os.path.isfile(data_path), (
            'Data file does not exist at expected path: ' + data_path
        )
        # load data from compressed numpy file
        loaded = np.load(data_path)
        inputs, targets = loaded['inputs'], loaded['targets']
        inputs = inputs.astype(np.float32)
        if flatten:
            inputs = np.reshape(inputs, newshape=(-1, 32*32*3))
        else:
            inputs = np.reshape(inputs, newshape=(-1, 3, 32, 32))
            inputs = np.transpose(inputs, axes=(0, 2, 3, 1))
        inputs = inputs / 255.0
        # pass the loaded data to the parent class __init__
        super(CIFAR10DataProvider, self).__init__(
            inputs, targets, batch_size, max_num_batches, random_sampling, rng)

    def next(self):
        """Returns next data batch or raises `StopIteration` if at end."""
        inputs_batch, targets_batch = super(CIFAR10DataProvider, self).next()
        if self.one_hot:
            return inputs_batch, self.to_one_of_k(targets_batch)
        else:
            return inputs_batch, targets_batch

    def to_one_of_k(self, int_targets):
        """Converts integer coded class target to 1 of K coded targets.

        Args:
            int_targets (ndarray): Array of integer coded class targets (i.e.
                where an integer from 0 to `num_classes` - 1 is used to
                indicate which is the correct class). This should be of shape
                (num_data,).

        Returns:
            Array of 1 of K coded targets i.e. an array of shape
            (num_data, num_classes) where for each row all elements are equal
            to zero except for the column corresponding to the correct class
            which is equal to one.
        """
        one_of_k_targets = np.zeros((int_targets.shape[0], self.num_classes))
        one_of_k_targets[range(int_targets.shape[0]), int_targets] = 1
        return one_of_k_targets


class CIFAR100DataProvider(DataProvider):
    """Data provider for CIFAR-100 object images."""

    def __init__(self, which_set='train', batch_size=100, max_num_batches=-1,
                 random_sampling=False, rng=None, flatten=False, one_hot=False):
        """Create a new CIFAR-100 data provider object.

        Args:
            which_set: One of 'train', 'valid' or 'test'. Determines which
                portion of the CIFAR-100 data this object should provide.
            batch_size (int): Number of data points to include in each batch.
            max_num_batches (int): Maximum number of batches to iterate over
                in an epoch. If `max_num_batches * batch_size > num_data` then
                only as many batches as the data can be split into will be
                used. If set to -1 all of the data will be used.
            random_sampling (bool): Whether to randomly permute the order of
                the data before each epoch.
            rng (RandomState): A seeded random number generator.
            flatten (bool): If True, each image is returned as a flat vector
                of length 32 * 32 * 3; otherwise as a (32, 32, 3) array.
            one_hot (bool): If True, targets are returned as 1 of K coded
                arrays; otherwise as integer class labels.
        """
        # check a valid which_set was provided
        assert which_set in ['train', 'valid', 'test'], (
            'Expected which_set to be either train, valid or test. '
            'Got {0}'.format(which_set)
        )
        self.one_hot = one_hot
        self.which_set = which_set
        self.num_classes = 100
        # construct path to data using os.path.join to ensure the correct path
        # separator for the current platform / OS is used
        # MLP_DATA_DIR environment variable should point to the data directory
        data_path = os.path.join(
            os.environ['MLP_DATA_DIR'], 'cifar100-{0}.npz'.format(which_set))
        assert os.path.isfile(data_path), (
            'Data file does not exist at expected path: ' + data_path
        )
        # load data from compressed numpy file
        loaded = np.load(data_path)
        inputs, targets = loaded['inputs'], loaded['targets']
        inputs = inputs.astype(np.float32)
        if flatten:
            inputs = np.reshape(inputs, newshape=(-1, 32*32*3))
        else:
            # flat rows are in channel-major order: reshape to (N, 3, 32, 32)
            # and then transpose to channels-last (N, 32, 32, 3)
            inputs = np.reshape(inputs, newshape=(-1, 3, 32, 32))
            inputs = np.transpose(inputs, axes=(0, 2, 3, 1))
        # rescale by 1/255 so 8-bit pixel intensities lie in [0, 1]
        inputs = inputs / 255.0
        # pass the loaded data to the parent class __init__
        super(CIFAR100DataProvider, self).__init__(
            inputs, targets, batch_size, max_num_batches, random_sampling, rng)

    def next(self):
        """Returns next data batch or raises `StopIteration` if at end."""
        inputs_batch, targets_batch = super(CIFAR100DataProvider, self).next()
        if self.one_hot:
            return inputs_batch, self.to_one_of_k(targets_batch)
        else:
            return inputs_batch, targets_batch

    def to_one_of_k(self, int_targets):
        """Converts integer coded class target to 1 of K coded targets.

        Args:
            int_targets (ndarray): Array of integer coded class targets (i.e.
                where an integer from 0 to `num_classes` - 1 is used to
                indicate which is the correct class). This should be of shape
                (num_data,).

        Returns:
            Array of 1 of K coded targets i.e. an array of shape
            (num_data, num_classes) where for each row all elements are equal
            to zero except for the column corresponding to the correct class
            which is equal to one.
        """
        one_of_k_targets = np.zeros((int_targets.shape[0], self.num_classes))
        one_of_k_targets[range(int_targets.shape[0]), int_targets] = 1
        return one_of_k_targets


class MSD10GenreDataProvider(DataProvider):
    """Data provider for Million Song Dataset 10-genre classification task."""

    def __init__(self, which_set='train', batch_size=100, max_num_batches=-1,
                 random_sampling=False, rng=None, one_hot=False, flatten=True):
        """Create a new MSD 10-genre data provider object.

        Args:
            which_set: One of 'train', 'valid' or 'test'. Determines which
                portion of the MSD 10-genre data this object should provide.
            batch_size (int): Number of data points to include in each batch.
            max_num_batches (int): Maximum number of batches to iterate over
                in an epoch. If `max_num_batches * batch_size > num_data` then
                only as many batches as the data can be split into will be
                used. If set to -1 all of the data will be used.
            random_sampling (bool): Whether to randomly permute the order of
                the data before each epoch.
            rng (RandomState): A seeded random number generator.
            one_hot (bool): If True, targets are returned as 1 of K coded
                arrays; otherwise as integer class labels.
            flatten (bool): If True, each input is reshaped into a flat
                vector of length 120 * 25.
        """
        # check a valid which_set was provided
        assert which_set in ['train', 'valid', 'test'], (
            'Expected which_set to be either train, valid or test. '
            'Got {0}'.format(which_set)
        )
        self.one_hot = one_hot
        self.which_set = which_set
        self.num_classes = 10
        # construct path to data using os.path.join to ensure the correct path
        # separator for the current platform / OS is used
        # MLP_DATA_DIR environment variable should point to the data directory
        if which_set != 'test':
            data_path = os.path.join(
                os.environ['MLP_DATA_DIR'], 'msd10-{0}.npz'.format(which_set))
            assert os.path.isfile(data_path), (
                'Data file does not exist at expected path: ' + data_path
            )
            # load data from compressed numpy file
            loaded = np.load(data_path)
            inputs, targets = loaded['inputs'], loaded['targets']
        else:
            # test inputs and targets are stored in separate files
            input_data_path = os.path.join(
                os.environ['MLP_DATA_DIR'], 'msd-10-genre-test-inputs.npz')
            assert os.path.isfile(input_data_path), (
                'Data file does not exist at expected path: ' + input_data_path
            )
            target_data_path = os.path.join(
                os.environ['MLP_DATA_DIR'], 'msd-10-genre-test-targets.npz')
            assert os.path.isfile(target_data_path), (
                'Data file does not exist at expected path: ' + target_data_path
            )
            # load data from compressed numpy files
            inputs = np.load(input_data_path)['inputs']
            targets = np.load(target_data_path)['targets']
        if flatten:
            inputs = inputs.reshape((-1, 120*25))
        # pass the loaded data to the parent class __init__
        super(MSD10GenreDataProvider, self).__init__(
            inputs, targets, batch_size, max_num_batches, random_sampling, rng)

    def next(self):
        """Returns next data batch or raises `StopIteration` if at end."""
        inputs_batch, targets_batch = super(MSD10GenreDataProvider, self).next()
        if self.one_hot:
            return inputs_batch, self.to_one_of_k(targets_batch)
        else:
            return inputs_batch, targets_batch

    def to_one_of_k(self, int_targets):
        """Converts integer coded class target to 1 of K coded targets.

        Args:
            int_targets (ndarray): Array of integer coded class targets (i.e.
                where an integer from 0 to `num_classes` - 1 is used to
                indicate which is the correct class). This should be of shape
                (num_data,).

        Returns:
            Array of 1 of K coded targets i.e. an array of shape
            (num_data, num_classes) where for each row all elements are equal
            to zero except for the column corresponding to the correct class
            which is equal to one.
        """
        one_of_k_targets = np.zeros((int_targets.shape[0], self.num_classes))
        one_of_k_targets[range(int_targets.shape[0]), int_targets] = 1
        return one_of_k_targets


class MSD25GenreDataProvider(DataProvider):
    """Data provider for Million Song Dataset 25-genre classification task."""

    def __init__(self, which_set='train', batch_size=100, max_num_batches=-1,
                 random_sampling=False, rng=None, one_hot=False, flatten=True):
        """Create a new MSD 25-genre data provider object.

        Args:
            which_set: One of 'train', 'valid' or 'test'. Determines which
                portion of the MSD 25-genre data this object should provide.
            batch_size (int): Number of data points to include in each batch.
            max_num_batches (int): Maximum number of batches to iterate over
                in an epoch. If `max_num_batches * batch_size > num_data` then
                only as many batches as the data can be split into will be
                used. If set to -1 all of the data will be used.
            random_sampling (bool): Whether to randomly permute the order of
                the data before each epoch.
            rng (RandomState): A seeded random number generator.
            one_hot (bool): If True, targets are returned as 1 of K coded
                arrays; otherwise as integer class labels.
            flatten (bool): If True, each input is reshaped into a flat
                vector of length 120 * 25.
        """
        # check a valid which_set was provided
        assert which_set in ['train', 'valid', 'test'], (
            'Expected which_set to be either train, valid or test. '
            'Got {0}'.format(which_set)
        )
        self.one_hot = one_hot
        self.which_set = which_set
        self.num_classes = 25
        # construct path to data using os.path.join to ensure the correct path
        # separator for the current platform / OS is used
        # MLP_DATA_DIR environment variable should point to the data directory
        data_path = os.path.join(
            os.environ['MLP_DATA_DIR'], 'msd25-{0}.npz'.format(which_set))
        assert os.path.isfile(data_path), (
            'Data file does not exist at expected path: ' + data_path
        )
        # load data from compressed numpy file
        loaded = np.load(data_path)
        inputs, targets = loaded['inputs'], loaded['targets']
        if flatten:
            inputs = inputs.reshape((-1, 120*25))
        # pass the loaded data to the parent class __init__
        super(MSD25GenreDataProvider, self).__init__(
            inputs, targets, batch_size, max_num_batches, random_sampling, rng)

    def next(self):
        """Returns next data batch or raises `StopIteration` if at end."""
        inputs_batch, targets_batch = super(MSD25GenreDataProvider, self).next()
        if self.one_hot:
            return inputs_batch, self.to_one_of_k(targets_batch)
        else:
            return inputs_batch, targets_batch

    def to_one_of_k(self, int_targets):
        """Converts integer coded class target to 1 of K coded targets.

        Args:
            int_targets (ndarray): Array of integer coded class targets (i.e.
                where an integer from 0 to `num_classes` - 1 is used to
                indicate which is the correct class). This should be of shape
                (num_data,).

        Returns:
            Array of 1 of K coded targets i.e. an array of shape
            (num_data, num_classes) where for each row all elements are equal
            to zero except for the column corresponding to the correct class
            which is equal to one.
        """
        one_of_k_targets = np.zeros((int_targets.shape[0], self.num_classes))
        one_of_k_targets[range(int_targets.shape[0]), int_targets] = 1
        return one_of_k_targets


class MetOfficeDataProvider(DataProvider):
    """South Scotland Met Office weather data provider."""

    def __init__(self, window_size, batch_size=10, max_num_batches=-1,
                 random_sampling=False, rng=None):
        """Create a new Met Office data provider object.

        Args:
            window_size (int): Size of windows to split weather time series
                data into. The constructed input features will be the first
                `window_size - 1` entries in each window and the target outputs
                the last entry in each window.
            batch_size (int): Number of data points to include in each batch.
            max_num_batches (int): Maximum number of batches to iterate over
                in an epoch. If `max_num_batches * batch_size > num_data` then
                only as many batches as the data can be split into will be
                used. If set to -1 all of the data will be used.
            random_sampling (bool): Whether to randomly permute the order of
                the data before each epoch.
            rng (RandomState): A seeded random number generator.
        """
        data_path = os.path.join(
            os.environ['MLP_DATA_DIR'], 'HadSSP_daily_qc.txt')
        assert os.path.isfile(data_path), (
            'Data file does not exist at expected path: ' + data_path
        )
        raw = np.loadtxt(data_path, skiprows=3, usecols=range(2, 32))
        assert window_size > 1, 'window_size must be at least 2.'
        self.window_size = window_size
        # filter out all missing datapoints and flatten to a vector
        filtered = raw[raw >= 0].flatten()
        # normalise data to zero mean, unit standard deviation
        mean = np.mean(filtered)
        std = np.std(filtered)
        normalised = (filtered - mean) / std
        # create a view onto the array corresponding to a rolling window
        shape = (normalised.shape[-1] - self.window_size + 1, self.window_size)
        strides = normalised.strides + (normalised.strides[-1],)
        windowed = np.lib.stride_tricks.as_strided(
            normalised, shape=shape, strides=strides)
        # inputs are first (window_size - 1) entries in windows
        inputs = windowed[:, :-1]
        # targets are last entry in windows
        targets = windowed[:, -1]
        super(MetOfficeDataProvider, self).__init__(
            inputs, targets, batch_size, max_num_batches, random_sampling, rng)


class CCPPDataProvider(DataProvider):
    """Data provider for the Combined Cycle Power Plant dataset."""

    def __init__(self, which_set='train', input_dims=None, batch_size=10,
                 max_num_batches=-1, random_sampling=False, rng=None):
        """Create a new Combined Cycle Power Plant data provider object.

        Args:
            which_set: One of 'train' or 'valid'. Determines which portion of
                data this object should provide.
            input_dims: Which of the four input dimensions to use. If `None`
                all are used. If an iterable of integers is provided
                (consisting of a subset of {0, 1, 2, 3}) then only the
                corresponding input dimensions are included.
            batch_size (int): Number of data points to include in each batch.
            max_num_batches (int): Maximum number of batches to iterate over
                in an epoch. If `max_num_batches * batch_size > num_data` then
                only as many batches as the data can be split into will be
                used. If set to -1 all of the data will be used.
            random_sampling (bool): Whether to randomly permute the order of
                the data before each epoch.
            rng (RandomState): A seeded random number generator.
        """
        data_path = os.path.join(
            os.environ['MLP_DATA_DIR'], 'ccpp_data.npz')
        assert os.path.isfile(data_path), (
            'Data file does not exist at expected path: ' + data_path
        )
        # check a valid which_set was provided
        assert which_set in ['train', 'valid'], (
            'Expected which_set to be either train or valid. '
            'Got {0}'.format(which_set)
        )
        # check input_dims are valid
        if input_dims is not None:
            input_dims = set(input_dims)
            assert input_dims.issubset({0, 1, 2, 3}), (
                'input_dims should be a subset of {0, 1, 2, 3}'
            )
            # NumPy cannot index with a set, so use a sorted list of columns
            input_dims = sorted(input_dims)
        loaded = np.load(data_path)
        inputs = loaded[which_set + '_inputs']
        if input_dims is not None:
            inputs = inputs[:, input_dims]
        targets = loaded[which_set + '_targets']
        super(CCPPDataProvider, self).__init__(
            inputs, targets, batch_size, max_num_batches, random_sampling, rng)


class AugmentedMNISTDataProvider(MNISTDataProvider):
    """Data provider for MNIST dataset which randomly transforms images."""

    def __init__(self, which_set='train', batch_size=100, max_num_batches=-1,
                 random_sampling=False, rng=None, transformer=None):
        """Create a new augmented MNIST data provider object.

        Args:
            which_set: One of 'train', 'valid' or 'test'. Determines which
                portion of the MNIST data this object should provide.
            batch_size (int): Number of data points to include in each batch.
            max_num_batches (int): Maximum number of batches to iterate over
                in an epoch. If `max_num_batches * batch_size > num_data` then
                only as many batches as the data can be split into will be
                used. If set to -1 all of the data will be used.
            random_sampling (bool): Whether to randomly permute the order of
                the data before each epoch.
            rng (RandomState): A seeded random number generator.
            transformer: Function which takes an `inputs` array of shape
                (batch_size, input_dim) corresponding to a batch of input
                images and a `rng` random number generator object (i.e. a
                call signature `transformer(inputs, rng)`) and applies a
                potentially random set of transformations to some / all of the
                input images as each new batch is returned when iterating over
                the data provider.
        """
        super(AugmentedMNISTDataProvider, self).__init__(
            which_set, batch_size, max_num_batches, random_sampling, rng)
        self.transformer = transformer

    def next(self):
        """Returns next data batch or raises `StopIteration` if at end."""
        inputs_batch, targets_batch = super(
            AugmentedMNISTDataProvider, self).next()
        if self.transformer is None:
            # no transformer supplied: return the batch unchanged
            return inputs_batch, targets_batch
        transformed_inputs_batch = self.transformer(inputs_batch, self.rng)
        return transformed_inputs_batch, targets_batch
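`MetOfficeDataProvider` builds its rolling windows with `np.lib.stride_tricks.as_strided`, which returns a strided view onto the normalised series rather than a copy. The sketch below isolates that step as a standalone helper so the window layout can be checked on a small series; the name `make_windows` is hypothetical and not part of this repository.

```python
import numpy as np


def make_windows(series, window_size):
    """Split a 1D series into overlapping windows, as MetOfficeDataProvider does.

    Hypothetical helper: returns (inputs, targets), where inputs holds the
    first window_size - 1 entries of each window and targets the last entry.
    """
    series = np.ascontiguousarray(series)
    num_windows = series.shape[-1] - window_size + 1
    # stepping one row or one column forward both advance one element in memory
    strides = series.strides + (series.strides[-1],)
    windowed = np.lib.stride_tricks.as_strided(
        series, shape=(num_windows, window_size), strides=strides,
        writeable=False)
    return windowed[:, :-1], windowed[:, -1]


series = np.arange(6, dtype=np.float64)
inputs, targets = make_windows(series, window_size=3)
print(inputs.tolist())   # [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
print(targets.tolist())  # [2.0, 3.0, 4.0, 5.0]
```

Because neighbouring windows share memory, the sketch passes `writeable=False` so an accidental in-place edit of one window cannot silently change the others; the provider above omits that flag.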

emnist_network_trainer.py

@ -0,0 +1,181 @@
import argparse
import numpy as np
import tensorflow as tf
import tqdm
from data_providers import EMNISTDataProvider
from network_builder import ClassifierNetworkGraph
from utils.parser_utils import ParserClass
from utils.storage import build_experiment_folder, save_statistics, get_best_validation_model_statistics
tf.reset_default_graph() # resets any previous graphs to clear memory
parser = argparse.ArgumentParser(description='Welcome to CNN experiments script') # generates an argument parser
parser_extractor = ParserClass(parser=parser) # creates a parser class to process the parsed input
# unpack the parsed command-line arguments
batch_size, seed, epochs, logs_path, continue_from_epoch, tensorboard_enable, batch_norm, \
    strided_dim_reduction, experiment_prefix, dropout_rate_value = parser_extractor.get_argument_variables()

# generate experiment name
experiment_name = "experiment_{}_batch_size_{}_bn_{}_mp_{}".format(
    experiment_prefix, batch_size, batch_norm, strided_dim_reduction)
rng = np.random.RandomState(seed=seed) # set seed
train_data = EMNISTDataProvider(which_set="train", batch_size=batch_size, rng=rng, random_sampling=True)
val_data = EMNISTDataProvider(which_set="valid", batch_size=batch_size, rng=rng)
test_data = EMNISTDataProvider(which_set="test", batch_size=batch_size, rng=rng)
# setup our data providers
print("Running {}".format(experiment_name))
print("Starting from epoch {}".format(continue_from_epoch))
saved_models_filepath, logs_filepath = build_experiment_folder(experiment_name, logs_path) # generate experiment dir
# Placeholder setup
data_inputs = tf.placeholder(tf.float32, [batch_size, train_data.inputs.shape[1], train_data.inputs.shape[2],
                                          train_data.inputs.shape[3]], 'data-inputs')
data_targets = tf.placeholder(tf.int32, [batch_size], 'data-targets')
training_phase = tf.placeholder(tf.bool, name='training-flag')
rotate_data = tf.placeholder(tf.bool, name='rotate-flag')
dropout_rate = tf.placeholder(tf.float32, name='dropout-prob')
classifier_network = ClassifierNetworkGraph(input_x=data_inputs, target_placeholder=data_targets,
                                            dropout_rate=dropout_rate, batch_size=batch_size,
                                            n_classes=train_data.num_classes,
                                            is_training=training_phase, augment_rotate_flag=rotate_data,
                                            strided_dim_reduction=strided_dim_reduction,
                                            use_batch_normalization=batch_norm)  # initialize our computational graph
# if this is a new experiment rather than the continuation of a previous one, create a new statistics file
if continue_from_epoch == -1:
    save_statistics(logs_filepath, "result_summary_statistics",
                    ["epoch", "train_c_loss", "train_c_accuracy",
                     "val_c_loss", "val_c_accuracy",
                     "test_c_loss", "test_c_accuracy"], create=True)

# a new experiment starts from epoch 0, otherwise continue where we left off
start_epoch = continue_from_epoch if continue_from_epoch != -1 else 0
summary_op, losses_ops, c_error_opt_op = classifier_network.init_train() # get graph operations (ops)
total_train_batches = train_data.num_batches
total_val_batches = val_data.num_batches
total_test_batches = test_data.num_batches
if tensorboard_enable:
    print("saved tensorboard file at", logs_filepath)
    writer = tf.summary.FileWriter(logs_filepath, graph=tf.get_default_graph())
init = tf.global_variables_initializer() # initialization op for the graph
with tf.Session() as sess:
    sess.run(init)  # actually running the initialization op
    # saver objects that save our graph so we can reload it later,
    # either to continue training or for inference
    train_saver = tf.train.Saver()
    val_saver = tf.train.Saver()
    best_val_accuracy = 0.
    best_epoch = 0
    if continue_from_epoch != -1:
        # restore previous graph to continue operations
        train_saver.restore(sess, "{}/{}_{}.ckpt".format(saved_models_filepath, experiment_name,
                                                         continue_from_epoch))
        best_val_accuracy, best_epoch = get_best_validation_model_statistics(logs_filepath,
                                                                             "result_summary_statistics")
        print(best_val_accuracy, best_epoch)
    with tqdm.tqdm(total=epochs - start_epoch) as epoch_pbar:
        for e in range(start_epoch, epochs):
            total_c_loss = 0.
            total_accuracy = 0.
            with tqdm.tqdm(total=total_train_batches) as pbar_train:
                for batch_idx, (x_batch, y_batch) in enumerate(train_data):
                    iter_id = e * total_train_batches + batch_idx
                    # run c_error_opt_op, which performs one training step, together with the ops
                    # that compute the loss and accuracy; their outputs are stored in _,
                    # c_loss_value and acc respectively
                    _, c_loss_value, acc = sess.run(
                        [c_error_opt_op, losses_ops["crossentropy_losses"], losses_ops["accuracy"]],
                        feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
                                   data_targets: y_batch, training_phase: True, rotate_data: False})
                    total_c_loss += c_loss_value  # add loss of current iter to sum
                    total_accuracy += acc  # add acc of current iter to sum
                    # show iter statistics using running averages over the iters so far in this epoch
                    iter_out = "iter_num: {}, train_loss: {}, train_accuracy: {}".format(
                        iter_id, total_c_loss / (batch_idx + 1), total_accuracy / (batch_idx + 1))
                    pbar_train.set_description(iter_out)
                    pbar_train.update(1)
                    if tensorboard_enable and batch_idx % 25 == 0:  # save tensorboard summary every 25 iterations
                        _summary = sess.run(
                            summary_op,
                            feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
                                       data_targets: y_batch, training_phase: True, rotate_data: False})
                        writer.add_summary(_summary, global_step=iter_id)
            total_c_loss /= total_train_batches  # compute mean of loss
            total_accuracy /= total_train_batches  # compute mean of accuracy
            # save graph and weights
            save_path = train_saver.save(sess, "{}/{}_{}.ckpt".format(saved_models_filepath, experiment_name, e))
            print("Saved current model at", save_path)
            # run validation stage: note that the training_phase placeholder is set to False and that
            # c_error_opt_op (gradient descent) is not run; only the loss ops are evaluated to
            # collect losses on the validation set
            total_val_c_loss = 0.
            total_val_accuracy = 0.
            with tqdm.tqdm(total=total_val_batches) as pbar_val:
                for batch_idx, (x_batch, y_batch) in enumerate(val_data):
                    c_loss_value, acc = sess.run(
                        [losses_ops["crossentropy_losses"], losses_ops["accuracy"]],
                        feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
                                   data_targets: y_batch, training_phase: False, rotate_data: False})
                    total_val_c_loss += c_loss_value
                    total_val_accuracy += acc
                    iter_out = "val_loss: {}, val_accuracy: {}".format(total_val_c_loss / (batch_idx + 1),
                                                                       total_val_accuracy / (batch_idx + 1))
                    pbar_val.set_description(iter_out)
                    pbar_val.update(1)
            total_val_c_loss /= total_val_batches
            total_val_accuracy /= total_val_batches
            # if val acc is better than the previous best, record it and save the model as the best
            # validation model, to be used on the test set after the final epoch
            if best_val_accuracy < total_val_accuracy:
                best_val_accuracy = total_val_accuracy
                best_epoch = e
                save_path = val_saver.save(
                    sess, "{}/best_validation_{}_{}.ckpt".format(saved_models_filepath, experiment_name, e))
                print("Saved best validation score model at", save_path)
            epoch_pbar.update(1)
            # save statistics of this epoch, train and val without test set performance
            save_statistics(logs_filepath, "result_summary_statistics",
                            [e, total_c_loss, total_accuracy, total_val_c_loss, total_val_accuracy,
                             -1, -1])
    # restore model with best performance on validation set
    val_saver.restore(sess, "{}/best_validation_{}_{}.ckpt".format(saved_models_filepath, experiment_name, best_epoch))
    total_test_c_loss = 0.
    total_test_accuracy = 0.
    # compute test loss and accuracy and save
    with tqdm.tqdm(total=total_test_batches) as pbar_test:
        for batch_idx, (x_batch, y_batch) in enumerate(test_data):
            c_loss_value, acc = sess.run(
                [losses_ops["crossentropy_losses"], losses_ops["accuracy"]],
                feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
                           data_targets: y_batch, training_phase: False, rotate_data: False})
            total_test_c_loss += c_loss_value
            total_test_accuracy += acc
            iter_out = "test_loss: {}, test_accuracy: {}".format(total_test_c_loss / (batch_idx + 1),
                                                                 total_test_accuracy / (batch_idx + 1))
            pbar_test.set_description(iter_out)
            pbar_test.update(1)
    total_test_c_loss /= total_test_batches
    total_test_accuracy /= total_test_batches
    save_statistics(logs_filepath, "result_summary_statistics",
                    ["test set performance", -1, -1, -1, -1,
                     total_test_c_loss, total_test_accuracy])
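The trainer keeps the checkpoint from the epoch with the highest mean validation accuracy and evaluates only that model on the test set. The comparison is strict (`best_val_accuracy < total_val_accuracy`), so on a tie the earlier epoch is kept, and on resumption the running best comes from `get_best_validation_model_statistics`. A framework-free sketch of that selection rule follows; the function `select_best_epoch` and the accuracy values are hypothetical illustrations, not outputs of this script.

```python
def select_best_epoch(val_accuracies, start_epoch=0, best_accuracy=0., best_epoch=0):
    """Return (best_accuracy, best_epoch) using the trainer's strict-improvement rule.

    Hypothetical helper: val_accuracies[i] is the mean validation accuracy of
    epoch start_epoch + i. A later epoch replaces the current best only if it
    is strictly better, so ties keep the earlier epoch.
    """
    for epoch, accuracy in enumerate(val_accuracies, start=start_epoch):
        if best_accuracy < accuracy:
            best_accuracy, best_epoch = accuracy, epoch
    return best_accuracy, best_epoch


# illustrative per-epoch validation accuracies
print(select_best_epoch([0.61, 0.74, 0.74, 0.70]))  # (0.74, 1)
```

Each epoch's accuracy in the trainer is a mean of per-batch means; this equals the mean over all examples here because the placeholders fix the batch dimension to `batch_size`, so every batch has the same size.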


@ -0,0 +1,21 @@
#!/bin/sh
# Source this script before srun so that interactive sessions run with GPU support
export CUDA_HOME=/opt/cuda-8.0.44
export CUDNN_HOME=/opt/cuDNN-6.0_8.0
export STUDENT_ID=$(whoami)
export LD_LIBRARY_PATH=${CUDNN_HOME}/lib64:${CUDA_HOME}/lib64:$LD_LIBRARY_PATH
export LIBRARY_PATH=${CUDNN_HOME}/lib64:$LIBRARY_PATH
export CPATH=${CUDNN_HOME}/include:$CPATH
export PATH=${CUDA_HOME}/bin:${PATH}
export PYTHON_PATH=$PATH
# Activate the relevant virtual environment:
source /home/${STUDENT_ID}/miniconda3/bin/activate mlp


@ -0,0 +1,33 @@
#!/bin/sh
#SBATCH -N 1 # nodes requested
#SBATCH -n 1 # tasks requested
#SBATCH --gres=gpu:1
#SBATCH --mem=16000 # memory in Mb
#SBATCH -o sample_experiment_outfile # send stdout to sample_experiment_outfile
#SBATCH -e sample_experiment_errfile # send stderr to sample_experiment_errfile
#SBATCH -t 2:00:00 # time requested in hour:minute:second
export CUDA_HOME=/opt/cuda-8.0.44
export CUDNN_HOME=/opt/cuDNN-6.0_8.0
export STUDENT_ID=$(whoami)
export LD_LIBRARY_PATH=${CUDNN_HOME}/lib64:${CUDA_HOME}/lib64:$LD_LIBRARY_PATH
export LIBRARY_PATH=${CUDNN_HOME}/lib64:$LIBRARY_PATH
export CPATH=${CUDNN_HOME}/include:$CPATH
export PATH=${CUDA_HOME}/bin:${PATH}
export PYTHON_PATH=$PATH
mkdir -p /disk/scratch/${STUDENT_ID}
export TMPDIR=/disk/scratch/${STUDENT_ID}/
export TMP=/disk/scratch/${STUDENT_ID}/
# Activate the relevant virtual environment:
source /home/${STUDENT_ID}/miniconda3/bin/activate mlp
python emnist_network_trainer.py --batch_size 128 --epochs 200 --experiment_prefix vgg-net-emnist-sample-exp --dropout_rate 0.4 --batch_norm_use True --strided_dim_reduction True --seed 25012018
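The sbatch script above passes boolean options such as `--batch_norm_use True` as literal strings. How `ParserClass` in `utils/parser_utils` converts them is not shown in this commit; the pitfall to avoid is declaring such an option with `type=bool`, because `bool()` of any non-empty string, including `"False"`, is `True`. A minimal sketch of an explicit converter, assuming standard `argparse`; the name `str_to_bool` is hypothetical and not taken from this repository.

```python
import argparse


def str_to_bool(value):
    """Hypothetical argparse `type=` converter for 'True'/'False'-style strings."""
    normalised = value.strip().lower()
    if normalised in ('true', 't', 'yes', 'y', '1'):
        return True
    if normalised in ('false', 'f', 'no', 'n', '0'):
        return False
    # argparse reports this as a usage error naming the offending option
    raise argparse.ArgumentTypeError(
        'Expected a boolean value, got {0!r}'.format(value))


parser = argparse.ArgumentParser()
# flag name copied from the sbatch command line; the default is illustrative
parser.add_argument('--batch_norm_use', type=str_to_bool, default=False)
args = parser.parse_args(['--batch_norm_use', 'False'])
print(bool('False'), args.batch_norm_use)  # True False
```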

msd10_network_trainer.py

@ -0,0 +1,181 @@
import argparse
import numpy as np
import tensorflow as tf
import tqdm
from data_providers import MSD10GenreDataProvider
from network_builder import ClassifierNetworkGraph
from utils.parser_utils import ParserClass
from utils.storage import build_experiment_folder, save_statistics, get_best_validation_model_statistics
tf.reset_default_graph() # resets any previous graphs to clear memory
parser = argparse.ArgumentParser(description='Welcome to CNN experiments script') # generates an argument parser
parser_extractor = ParserClass(parser=parser) # creates a parser class to process the parsed input
# unpack the parsed command-line arguments
batch_size, seed, epochs, logs_path, continue_from_epoch, tensorboard_enable, batch_norm, \
    strided_dim_reduction, experiment_prefix, dropout_rate_value = parser_extractor.get_argument_variables()

# generate experiment name
experiment_name = "experiment_{}_batch_size_{}_bn_{}_mp_{}".format(
    experiment_prefix, batch_size, batch_norm, strided_dim_reduction)
rng = np.random.RandomState(seed=seed) # set seed
train_data = MSD10GenreDataProvider(which_set="train", batch_size=batch_size, rng=rng, random_sampling=True)
val_data = MSD10GenreDataProvider(which_set="valid", batch_size=batch_size, rng=rng)
test_data = MSD10GenreDataProvider(which_set="test", batch_size=batch_size, rng=rng)
# setup our data providers
print("Running {}".format(experiment_name))
print("Starting from epoch {}".format(continue_from_epoch))
saved_models_filepath, logs_filepath = build_experiment_folder(experiment_name, logs_path) # generate experiment dir
# Placeholder setup
data_inputs = tf.placeholder(tf.float32, [batch_size, train_data.inputs.shape[1]], 'data-inputs')
data_targets = tf.placeholder(tf.int32, [batch_size], 'data-targets')
training_phase = tf.placeholder(tf.bool, name='training-flag')
rotate_data = tf.placeholder(tf.bool, name='rotate-flag')
dropout_rate = tf.placeholder(tf.float32, name='dropout-prob')
classifier_network = ClassifierNetworkGraph(network_name='FCCClassifier',
                                            input_x=data_inputs, target_placeholder=data_targets,
                                            dropout_rate=dropout_rate, batch_size=batch_size,
                                            n_classes=train_data.num_classes,
                                            is_training=training_phase, augment_rotate_flag=rotate_data,
                                            strided_dim_reduction=strided_dim_reduction,
                                            use_batch_normalization=batch_norm)  # initialize our computational graph
# if this is a new experiment rather than the continuation of a previous one, create a new statistics file
if continue_from_epoch == -1:
    save_statistics(logs_filepath, "result_summary_statistics",
                    ["epoch", "train_c_loss", "train_c_accuracy",
                     "val_c_loss", "val_c_accuracy",
                     "test_c_loss", "test_c_accuracy"], create=True)

# a new experiment starts from epoch 0, otherwise continue where we left off
start_epoch = continue_from_epoch if continue_from_epoch != -1 else 0
summary_op, losses_ops, c_error_opt_op = classifier_network.init_train() # get graph operations (ops)
total_train_batches = train_data.num_batches
total_val_batches = val_data.num_batches
total_test_batches = test_data.num_batches
if tensorboard_enable:
    print("saved tensorboard file at", logs_filepath)
    writer = tf.summary.FileWriter(logs_filepath, graph=tf.get_default_graph())
init = tf.global_variables_initializer() # initialization op for the graph
with tf.Session() as sess:
    sess.run(init)  # actually running the initialization op
    # saver objects that save our graph so we can reload it later,
    # either to continue training or for inference
    train_saver = tf.train.Saver()
    val_saver = tf.train.Saver()
    best_val_accuracy = 0.
    best_epoch = 0
    if continue_from_epoch != -1:
        # restore previous graph to continue operations
        train_saver.restore(sess, "{}/{}_{}.ckpt".format(saved_models_filepath, experiment_name,
                                                         continue_from_epoch))
        best_val_accuracy, best_epoch = get_best_validation_model_statistics(logs_filepath,
                                                                             "result_summary_statistics")
        print(best_val_accuracy, best_epoch)
    with tqdm.tqdm(total=epochs - start_epoch) as epoch_pbar:
        for e in range(start_epoch, epochs):
            total_c_loss = 0.
            total_accuracy = 0.
            with tqdm.tqdm(total=total_train_batches) as pbar_train:
                for batch_idx, (x_batch, y_batch) in enumerate(train_data):
                    iter_id = e * total_train_batches + batch_idx
                    # run c_error_opt_op, which performs one training step, together with the ops
                    # that compute the loss and accuracy; their outputs are stored in _,
                    # c_loss_value and acc respectively
                    _, c_loss_value, acc = sess.run(
                        [c_error_opt_op, losses_ops["crossentropy_losses"], losses_ops["accuracy"]],
                        feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
                                   data_targets: y_batch, training_phase: True, rotate_data: False})
                    total_c_loss += c_loss_value  # add loss of current iter to sum
                    total_accuracy += acc  # add acc of current iter to sum
                    # show iter statistics using running averages over the iters so far in this epoch
                    iter_out = "iter_num: {}, train_loss: {}, train_accuracy: {}".format(
                        iter_id, total_c_loss / (batch_idx + 1), total_accuracy / (batch_idx + 1))
                    pbar_train.set_description(iter_out)
                    pbar_train.update(1)
                    if tensorboard_enable and batch_idx % 25 == 0:  # save tensorboard summary every 25 iterations
                        _summary = sess.run(
                            summary_op,
                            feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
                                       data_targets: y_batch, training_phase: True, rotate_data: False})
                        writer.add_summary(_summary, global_step=iter_id)
            total_c_loss /= total_train_batches  # compute mean of loss
            total_accuracy /= total_train_batches  # compute mean of accuracy
            # save graph and weights
            save_path = train_saver.save(sess, "{}/{}_{}.ckpt".format(saved_models_filepath, experiment_name, e))
            print("Saved current model at", save_path)
            # run validation stage: note that the training_phase placeholder is set to False and that
            # c_error_opt_op (gradient descent) is not run; only the loss ops are evaluated to
            # collect losses on the validation set
            total_val_c_loss = 0.
            total_val_accuracy = 0.
            with tqdm.tqdm(total=total_val_batches) as pbar_val:
                for batch_idx, (x_batch, y_batch) in enumerate(val_data):
                    c_loss_value, acc = sess.run(
                        [losses_ops["crossentropy_losses"], losses_ops["accuracy"]],
                        feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
                                   data_targets: y_batch, training_phase: False, rotate_data: False})
                    total_val_c_loss += c_loss_value
                    total_val_accuracy += acc
                    iter_out = "val_loss: {}, val_accuracy: {}".format(total_val_c_loss / (batch_idx + 1),
                                                                       total_val_accuracy / (batch_idx + 1))
                    pbar_val.set_description(iter_out)
                    pbar_val.update(1)
            total_val_c_loss /= total_val_batches
            total_val_accuracy /= total_val_batches
            # if val acc is better than the previous best, record it and save the model as the best
            # validation model, to be used on the test set after the final epoch
            if best_val_accuracy < total_val_accuracy:
                best_val_accuracy = total_val_accuracy
                best_epoch = e
                save_path = val_saver.save(
                    sess, "{}/best_validation_{}_{}.ckpt".format(saved_models_filepath, experiment_name, e))
                print("Saved best validation score model at", save_path)
            epoch_pbar.update(1)
            # save statistics of this epoch, train and val without test set performance
            save_statistics(logs_filepath, "result_summary_statistics",
                            [e, total_c_loss, total_accuracy, total_val_c_loss, total_val_accuracy,
                             -1, -1])
val_saver.restore(sess, "{}/best_validation_{}_{}.ckpt".format(saved_models_filepath, experiment_name, best_epoch))
# restore model with best performance on validation set
total_test_c_loss = 0.
total_test_accuracy = 0.
# computer test loss and accuracy and save
with tqdm.tqdm(total=total_test_batches) as pbar_test:
for batch_idx, (x_batch, y_batch) in enumerate(test_data):
c_loss_value, acc = sess.run(
[losses_ops["crossentropy_losses"], losses_ops["accuracy"]],
feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
data_targets: y_batch, training_phase: False, rotate_data: False})
total_test_c_loss += c_loss_value
total_test_accuracy += acc
iter_out = "test_loss: {}, test_accuracy: {}".format(total_test_c_loss / (batch_idx + 1),
total_test_accuracy / (batch_idx + 1))
pbar_test.set_description(iter_out)
pbar_test.update(1)
total_test_c_loss /= total_test_batches
total_test_accuracy /= total_test_batches
save_statistics(logs_filepath, "result_summary_statistics",
["test set performance", -1, -1, -1, -1,
total_test_c_loss, total_test_accuracy])
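The loops above report running means of the per-batch loss and accuracy, and keep the checkpoint of the first epoch whose mean validation accuracy strictly beats the best seen so far. A minimal, framework-free sketch of that bookkeeping follows; `running_means` and `select_best_epoch` are hypothetical helper names, not functions from this repository. Note that averaging per-batch means only equals the dataset mean when every batch has the same size, which the fixed `[batch_size, ...]` placeholders assume.

```python
def running_means(values):
    """Yield the mean of the values seen so far, as shown in the tqdm progress-bar descriptions."""
    total = 0.0
    for batch_idx, value in enumerate(values):
        total += value
        yield total / (batch_idx + 1)


def select_best_epoch(val_accuracies, best_val_accuracy=0.0, best_epoch=0, start_epoch=0):
    """Return (best accuracy, epoch) using the trainer's strict '<' test, so ties keep the earlier epoch."""
    for e, acc in enumerate(val_accuracies, start=start_epoch):
        if best_val_accuracy < acc:
            best_val_accuracy = acc
            best_epoch = e
    return best_val_accuracy, best_epoch


print(list(running_means([2.0, 1.0, 0.0])))  # [2.0, 1.5, 1.0]
print(select_best_epoch([0.61, 0.70, 0.70, 0.68]))  # (0.7, 1)
```

When resuming a run, passing the values read back by `get_best_validation_model_statistics` as `best_val_accuracy` and `best_epoch` mirrors what the script does after restoring a checkpoint.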

msd25_network_trainer.py

@@ -0,0 +1,181 @@
import argparse
import numpy as np
import tensorflow as tf
import tqdm
from data_providers import MSD10GenreDataProvider
from network_builder import ClassifierNetworkGraph
from utils.parser_utils import ParserClass
from utils.storage import build_experiment_folder, save_statistics, get_best_validation_model_statistics

tf.reset_default_graph()  # resets any previous graphs to clear memory
parser = argparse.ArgumentParser(description='Welcome to CNN experiments script')  # generates an argument parser
parser_extractor = ParserClass(parser=parser)  # creates a parser class to process the parsed input
# returns a list of objects that contain our parsed input
batch_size, seed, epochs, logs_path, continue_from_epoch, tensorboard_enable, batch_norm, \
    strided_dim_reduction, experiment_prefix, dropout_rate_value = parser_extractor.get_argument_variables()
# generate experiment name
experiment_name = "experiment_{}_batch_size_{}_bn_{}_mp_{}".format(experiment_prefix,
                                                                  batch_size, batch_norm,
                                                                  strided_dim_reduction)
rng = np.random.RandomState(seed=seed)  # set seed
# set up our data providers
train_data = MSD10GenreDataProvider(which_set="train", batch_size=batch_size, rng=rng, random_sampling=True)
val_data = MSD10GenreDataProvider(which_set="valid", batch_size=batch_size, rng=rng)
test_data = MSD10GenreDataProvider(which_set="test", batch_size=batch_size, rng=rng)
print("Running {}".format(experiment_name))
print("Starting from epoch {}".format(continue_from_epoch))
saved_models_filepath, logs_filepath = build_experiment_folder(experiment_name, logs_path)  # generate experiment dir

# Placeholder setup
data_inputs = tf.placeholder(tf.float32, [batch_size, train_data.inputs.shape[1]], 'data-inputs')
data_targets = tf.placeholder(tf.int32, [batch_size], 'data-targets')
training_phase = tf.placeholder(tf.bool, name='training-flag')
rotate_data = tf.placeholder(tf.bool, name='rotate-flag')
dropout_rate = tf.placeholder(tf.float32, name='dropout-prob')
classifier_network = ClassifierNetworkGraph(network_name='FCCClassifier',
                                            input_x=data_inputs, target_placeholder=data_targets,
                                            dropout_rate=dropout_rate, batch_size=batch_size,
                                            n_classes=train_data.num_classes,
                                            is_training=training_phase, augment_rotate_flag=rotate_data,
                                            strided_dim_reduction=strided_dim_reduction,
                                            use_batch_normalization=batch_norm)  # initialize our computational graph

# if this is a new experiment and not the continuation of a previous one, generate a new statistics file
if continue_from_epoch == -1:
    save_statistics(logs_filepath, "result_summary_statistics", ["epoch", "train_c_loss", "train_c_accuracy",
                                                                 "val_c_loss", "val_c_accuracy",
                                                                 "test_c_loss", "test_c_accuracy"], create=True)

# if this is a new experiment start from 0, otherwise continue where we left off
start_epoch = continue_from_epoch if continue_from_epoch != -1 else 0

summary_op, losses_ops, c_error_opt_op = classifier_network.init_train()  # get graph operations (ops)

total_train_batches = train_data.num_batches
total_val_batches = val_data.num_batches
total_test_batches = test_data.num_batches

if tensorboard_enable:
    print("saved tensorboard file at", logs_filepath)
    writer = tf.summary.FileWriter(logs_filepath, graph=tf.get_default_graph())

init = tf.global_variables_initializer()  # initialization op for the graph

with tf.Session() as sess:
    sess.run(init)  # actually running the initialization op
    train_saver = tf.train.Saver()  # saves the graph so training can be continued later
    val_saver = tf.train.Saver()  # saves the best validation model for inference on the test set
    best_val_accuracy = 0.
    best_epoch = 0

    if continue_from_epoch != -1:
        train_saver.restore(sess, "{}/{}_{}.ckpt".format(saved_models_filepath, experiment_name,
                                                         continue_from_epoch))  # restore previous graph to continue operations
        best_val_accuracy, best_epoch = get_best_validation_model_statistics(logs_filepath, "result_summary_statistics")
        print(best_val_accuracy, best_epoch)

    with tqdm.tqdm(total=epochs - start_epoch) as epoch_pbar:
        for e in range(start_epoch, epochs):
            total_c_loss = 0.
            total_accuracy = 0.
            with tqdm.tqdm(total=total_train_batches) as pbar_train:
                for batch_idx, (x_batch, y_batch) in enumerate(train_data):
                    iter_id = e * total_train_batches + batch_idx
                    _, c_loss_value, acc = sess.run(
                        [c_error_opt_op, losses_ops["crossentropy_losses"], losses_ops["accuracy"]],
                        feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
                                   data_targets: y_batch, training_phase: True, rotate_data: False})
                    # Here we execute the c_error_opt_op which trains the network and also the ops that compute the
                    # loss and accuracy, we save those in _, c_loss_value and acc respectively.
                    total_c_loss += c_loss_value  # add loss of current iter to sum
                    total_accuracy += acc  # add acc of current iter to sum
                    # show iter statistics using running averages of previous iters within this epoch
                    iter_out = "iter_num: {}, train_loss: {}, train_accuracy: {}".format(
                        iter_id, total_c_loss / (batch_idx + 1), total_accuracy / (batch_idx + 1))
                    pbar_train.set_description(iter_out)
                    pbar_train.update(1)
                    if tensorboard_enable and batch_idx % 25 == 0:  # save tensorboard summary every 25 iterations
                        _summary = sess.run(
                            summary_op,
                            feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
                                       data_targets: y_batch, training_phase: True, rotate_data: False})
                        writer.add_summary(_summary, global_step=iter_id)

            total_c_loss /= total_train_batches  # compute mean of loss
            total_accuracy /= total_train_batches  # compute mean of accuracy
            save_path = train_saver.save(sess, "{}/{}_{}.ckpt".format(saved_models_filepath, experiment_name, e))
            # save graph and weights
            print("Saved current model at", save_path)

            # run validation stage, note how the training_phase placeholder is set to False
            # and that we do not run the c_error_opt_op which runs gradient descent, but instead only call the loss ops
            # to collect losses on the validation set
            total_val_c_loss = 0.
            total_val_accuracy = 0.
            with tqdm.tqdm(total=total_val_batches) as pbar_val:
                for batch_idx, (x_batch, y_batch) in enumerate(val_data):
                    c_loss_value, acc = sess.run(
                        [losses_ops["crossentropy_losses"], losses_ops["accuracy"]],
                        feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
                                   data_targets: y_batch, training_phase: False, rotate_data: False})
                    total_val_c_loss += c_loss_value
                    total_val_accuracy += acc
                    iter_out = "val_loss: {}, val_accuracy: {}".format(total_val_c_loss / (batch_idx + 1),
                                                                       total_val_accuracy / (batch_idx + 1))
                    pbar_val.set_description(iter_out)
                    pbar_val.update(1)

            total_val_c_loss /= total_val_batches
            total_val_accuracy /= total_val_batches

            if best_val_accuracy < total_val_accuracy:  # check if val acc is better than the previous best and if
                # so save current as best and save the model as the best validation model to be used on the test set
                # after the final epoch
                best_val_accuracy = total_val_accuracy
                best_epoch = e
                save_path = val_saver.save(sess, "{}/best_validation_{}_{}.ckpt".format(saved_models_filepath, experiment_name, e))
                print("Saved best validation score model at", save_path)

            epoch_pbar.update(1)
            # save statistics of this epoch, train and val without test set performance
            save_statistics(logs_filepath, "result_summary_statistics",
                            [e, total_c_loss, total_accuracy, total_val_c_loss, total_val_accuracy,
                             -1, -1])

        # restore model with best performance on validation set
        val_saver.restore(sess, "{}/best_validation_{}_{}.ckpt".format(saved_models_filepath, experiment_name, best_epoch))
        total_test_c_loss = 0.
        total_test_accuracy = 0.
        # compute test loss and accuracy and save
        with tqdm.tqdm(total=total_test_batches) as pbar_test:
            for batch_idx, (x_batch, y_batch) in enumerate(test_data):
                c_loss_value, acc = sess.run(
                    [losses_ops["crossentropy_losses"], losses_ops["accuracy"]],
                    feed_dict={dropout_rate: dropout_rate_value, data_inputs: x_batch,
                               data_targets: y_batch, training_phase: False, rotate_data: False})
                total_test_c_loss += c_loss_value
                total_test_accuracy += acc
                iter_out = "test_loss: {}, test_accuracy: {}".format(total_test_c_loss / (batch_idx + 1),
                                                                     total_test_accuracy / (batch_idx + 1))
                pbar_test.set_description(iter_out)
                pbar_test.update(1)
        total_test_c_loss /= total_test_batches
        total_test_accuracy /= total_test_batches
        save_statistics(logs_filepath, "result_summary_statistics",
                        ["test set performance", -1, -1, -1, -1,
                         total_test_c_loss, total_test_accuracy])
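`msd25_network_trainer.py` builds the `FCCClassifier` variant, i.e. a stack of dense layers over flat feature vectors. As a rough sanity check on model size, the sketch below counts the trainable parameters `FCCLayerClassifier` would create with the `layer_stage_sizes=[64, 128, 256]` and default `inner_layer_depth=2` used by `ClassifierNetworkGraph`. The input dimensionality of 100 and the 10 classes are illustrative assumptions (the script uses `train_data.inputs.shape[1]` and `train_data.num_classes`), the batch-norm term assumes `tf.contrib.layers.batch_norm` with `center=True, scale=True` adds one trainable beta and one gamma per unit, and `fcc_trainable_parameters` is a hypothetical helper.

```python
def fcc_trainable_parameters(input_dim, layer_stage_sizes, num_classes,
                             inner_layer_depth=2, batch_norm_use=False):
    """Count trainable parameters of FCCLayerClassifier's dense stack plus its output layer."""
    total = 0
    fan_in = input_dim
    for units in layer_stage_sizes:
        for _ in range(inner_layer_depth):
            total += fan_in * units + units  # dense kernel + bias
            if batch_norm_use:
                total += 2 * units  # assumed trainable beta (center) and gamma (scale)
            fan_in = units
    total += fan_in * num_classes + num_classes  # final logits layer
    return total


# input_dim=100 and num_classes=10 are illustrative assumptions, not the dataset's real sizes
print(fcc_trainable_parameters(100, [64, 128, 256], num_classes=10))  # 136842
print(fcc_trainable_parameters(100, [64, 128, 256], num_classes=10, batch_norm_use=True))  # 138634
```

Most of the budget sits in the widest stage (the 256-unit layers account for roughly 99k of the 137k parameters here), so shrinking the last entry of `layer_stage_sizes` is the cheapest way to reduce model size.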

network_architectures.py

@@ -0,0 +1,146 @@
import tensorflow as tf
from tensorflow.contrib.layers import batch_norm
from tensorflow.python.ops.nn_ops import leaky_relu
from utils.network_summary import count_parameters


class VGGClassifier:
    def __init__(self, batch_size, layer_stage_sizes, name, num_classes, batch_norm_use=False,
                 inner_layer_depth=2, strided_dim_reduction=True):
        """
        Initializes a VGG Classifier architecture
        :param batch_size: The size of the data batch
        :param layer_stage_sizes: A list containing the number of filters for each layer stage, where a layer stage
        is a series of 3x3 convolutional layers followed by a dimensionality reducing step, which is either a final
        convolution with stride=2 (i.e. strided convolution) or max pooling after the last stride=1 convolution.
        So if we pass the list [64, 128, 256] with inner_layer_depth=2, stage 0 will have two convolutions with
        64 filters each: with strided_dim_reduction=True the second one uses stride=2, otherwise both use stride=1
        and are followed by 2x2 max pooling. Similarly for the other stages.
        :param name: Name of the network
        :param num_classes: Number of classes we will need to classify
        :param batch_norm_use: Whether to use batch norm between layers or not.
        :param inner_layer_depth: The number of convolutional layers per layer stage (including the strided
        convolution when strided_dim_reduction=True).
        :param strided_dim_reduction: Whether to use strided convolutions instead of max pooling.
        """
        self.reuse = False
        self.batch_size = batch_size
        self.layer_stage_sizes = layer_stage_sizes
        self.name = name
        self.num_classes = num_classes
        self.batch_norm_use = batch_norm_use
        self.inner_layer_depth = inner_layer_depth
        self.strided_dim_reduction = strided_dim_reduction
        self.build_completed = False

    def __call__(self, image_input, training=False, dropout_rate=0.0):
        """
        Runs the CNN, producing the predictions (logits) and the intermediate layer features.
        :param image_input: Image input to produce embeddings for. e.g. for EMNIST [batch_size, 28, 28, 1]
        :param training: A flag indicating training or evaluation
        :param dropout_rate: A tf placeholder of type tf.float32 indicating the amount of dropout applied
        :return: Logits of size [batch_size, self.num_classes] and a list of the per-layer feature maps
        """
        with tf.variable_scope(self.name, reuse=self.reuse):
            layer_features = []
            with tf.variable_scope('VGGNet'):
                outputs = image_input
                for i in range(len(self.layer_stage_sizes)):
                    with tf.variable_scope('conv_stage_{}'.format(i)):
                        for j in range(self.inner_layer_depth):
                            with tf.variable_scope('conv_{}_{}'.format(i, j)):
                                if (j == self.inner_layer_depth - 1) and self.strided_dim_reduction:
                                    stride = 2
                                else:
                                    stride = 1
                                outputs = tf.layers.conv2d(outputs, self.layer_stage_sizes[i], [3, 3],
                                                           strides=(stride, stride),
                                                           padding='SAME', activation=None)
                                outputs = leaky_relu(outputs, name="leaky_relu{}".format(i))
                                layer_features.append(outputs)
                                if self.batch_norm_use:
                                    outputs = batch_norm(outputs, decay=0.99, scale=True,
                                                         center=True, is_training=training, renorm=False)
                        if not self.strided_dim_reduction:
                            outputs = tf.layers.max_pooling2d(outputs, pool_size=(2, 2), strides=2)
                        # apply dropout only at dimensionality reducing steps, i.e. after the last layer in
                        # every group
                        outputs = tf.layers.dropout(outputs, rate=dropout_rate, training=training)
            c_conv_encoder = outputs
            c_conv_encoder = tf.contrib.layers.flatten(c_conv_encoder)
            c_conv_encoder = tf.layers.dense(c_conv_encoder, units=self.num_classes)
        self.reuse = True
        self.variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope=self.name)
        if not self.build_completed:
            self.build_completed = True
            count_parameters(self.variables, "VGGNet")
        return c_conv_encoder, layer_features


class FCCLayerClassifier:
    def __init__(self, batch_size, layer_stage_sizes, name, num_classes, batch_norm_use=False,
                 inner_layer_depth=2, strided_dim_reduction=True):
        """
        Initializes a fully connected (FCC) Classifier architecture
        :param batch_size: The size of the data batch
        :param layer_stage_sizes: A list containing the number of hidden units for each layer stage
        :param name: Name of the network
        :param num_classes: Number of classes we will need to classify
        :param batch_norm_use: Whether to use batch norm between layers or not.
        :param inner_layer_depth: The number of fully connected layers per layer stage
        :param strided_dim_reduction: Stored but not used by this architecture
        """
        self.reuse = False
        self.batch_size = batch_size
        self.layer_stage_sizes = layer_stage_sizes
        self.name = name
        self.num_classes = num_classes
        self.batch_norm_use = batch_norm_use
        self.inner_layer_depth = inner_layer_depth
        self.strided_dim_reduction = strided_dim_reduction
        self.build_completed = False

    def __call__(self, image_input, training=False, dropout_rate=0.0):
        """
        Runs the fully connected network, producing the predictions (logits) and the intermediate layer features.
        :param image_input: Input to produce embeddings for, e.g. a batch of feature vectors [batch_size, input_dim]
        :param training: A flag indicating training or evaluation
        :param dropout_rate: A tf placeholder of type tf.float32 indicating the amount of dropout applied
        :return: Logits of size [batch_size, self.num_classes] and a list of the per-layer activations
        """
        with tf.variable_scope(self.name, reuse=self.reuse):
            layer_features = []
            with tf.variable_scope('FCCLayerNet'):
                outputs = image_input
                for i in range(len(self.layer_stage_sizes)):
                    with tf.variable_scope('conv_stage_{}'.format(i)):
                        for j in range(self.inner_layer_depth):
                            with tf.variable_scope('conv_{}_{}'.format(i, j)):
                                outputs = tf.layers.dense(outputs, units=self.layer_stage_sizes[i])
                                outputs = leaky_relu(outputs, name="leaky_relu{}".format(i))
                                layer_features.append(outputs)
                                if self.batch_norm_use:
                                    outputs = batch_norm(outputs, decay=0.99, scale=True,
                                                         center=True, is_training=training, renorm=False)
                        # apply dropout only after the last layer in every group
                        outputs = tf.layers.dropout(outputs, rate=dropout_rate, training=training)
            c_conv_encoder = outputs
            c_conv_encoder = tf.contrib.layers.flatten(c_conv_encoder)
            c_conv_encoder = tf.layers.dense(c_conv_encoder, units=self.num_classes)
        self.reuse = True
        self.variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope=self.name)
        if not self.build_completed:
            self.build_completed = True
            count_parameters(self.variables, "FCCLayerNet")
        return c_conv_encoder, layer_features
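The two `strided_dim_reduction` settings of `VGGClassifier` do not produce feature maps of identical size: a stride-2 convolution with `'SAME'` padding outputs `ceil(n / 2)` positions, whereas `tf.layers.max_pooling2d` with its default `'valid'` padding outputs `floor((n - 2) / 2) + 1`. The sketch below traces the shape after each stage for the 28x28 EMNIST input mentioned in the docstring; `vgg_feature_shape` is a hypothetical helper and assumes `inner_layer_depth >= 1`, so that each stage contains exactly one reducing step.

```python
import math


def vgg_feature_shape(height, width, layer_stage_sizes, strided_dim_reduction=True):
    """Trace (height, width, channels) after each VGGClassifier stage; assumes inner_layer_depth >= 1."""
    shapes = []
    for filters in layer_stage_sizes:
        if strided_dim_reduction:
            # the stage's last 3x3 conv has stride 2 and 'SAME' padding: ceil(n / 2)
            height, width = math.ceil(height / 2), math.ceil(width / 2)
        else:
            # stride-1 'SAME' convs keep the size, then 2x2 max pooling with stride 2
            # and tf.layers' default 'valid' padding: floor((n - 2) / 2) + 1
            height, width = (height - 2) // 2 + 1, (width - 2) // 2 + 1
        shapes.append((height, width, filters))
    return shapes


print(vgg_feature_shape(28, 28, [64, 128, 256]))
# [(14, 14, 64), (7, 7, 128), (4, 4, 256)]
print(vgg_feature_shape(28, 28, [64, 128, 256], strided_dim_reduction=False))
# [(14, 14, 64), (7, 7, 128), (3, 3, 256)]
```

For `[64, 128, 256]` the final dense layer therefore sees 4 * 4 * 256 = 4096 features with strided convolutions but only 3 * 3 * 256 = 2304 with max pooling, so the two variants also differ in the size of their output layer.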

network_builder.py

@@ -0,0 +1,177 @@
import tensorflow as tf
from network_architectures import VGGClassifier, FCCLayerClassifier


class ClassifierNetworkGraph:
    def __init__(self, input_x, target_placeholder, dropout_rate,
                 batch_size=100, n_classes=100, is_training=True, augment_rotate_flag=True,
                 tensorboard_use=False, use_batch_normalization=False, strided_dim_reduction=True,
                 network_name='VGG_classifier'):
        """
        Initializes a Classifier Network Graph that can build models, train, compute losses and save summary statistics
        and images
        :param input_x: A placeholder that will feed the input images, usually of size [batch_size, height, width,
        channels]
        :param target_placeholder: A target placeholder of size [batch_size,]. The classes should be in index form
        i.e. not one hot encoding, that will be done automatically by tf
        :param dropout_rate: A scalar placeholder that holds a single float defining the amount of dropout
        to apply to the network, i.e. 0.1 drops 10% of the units
        :param batch_size: The batch size
        :param n_classes: Number of classes we will be classifying
        :param is_training: A placeholder that will indicate whether we are training or not
        :param augment_rotate_flag: A placeholder indicating whether to apply rotation augmentations to our input data
        :param tensorboard_use: Whether to use tensorboard in this experiment
        :param use_batch_normalization: Whether to use batch normalization between layers
        :param strided_dim_reduction: Whether to use strided dim reduction instead of max pooling
        :param network_name: Which architecture to build, either 'VGG_classifier' or 'FCCClassifier'
        """
        self.batch_size = batch_size
        if network_name == "VGG_classifier":
            self.c = VGGClassifier(self.batch_size, name="classifier_neural_network",
                                   batch_norm_use=use_batch_normalization, num_classes=n_classes,
                                   layer_stage_sizes=[64, 128, 256], strided_dim_reduction=strided_dim_reduction)
        elif network_name == "FCCClassifier":
            self.c = FCCLayerClassifier(self.batch_size, name="classifier_neural_network",
                                        batch_norm_use=use_batch_normalization, num_classes=n_classes,
                                        layer_stage_sizes=[64, 128, 256], strided_dim_reduction=strided_dim_reduction)
        else:
            raise ValueError("Unknown network_name: {}".format(network_name))
        self.input_x = input_x
        self.dropout_rate = dropout_rate
        self.targets = target_placeholder
        self.training_phase = is_training
        self.n_classes = n_classes
        self.iterations_trained = 0
        self.augment_rotate = augment_rotate_flag
        self.is_tensorboard = tensorboard_use
        self.strided_dim_reduction = strided_dim_reduction
        self.use_batch_normalization = use_batch_normalization

    def loss(self):
        """Builds the model, calculates losses and saves summary statistics and images.
        Returns:
            dict of losses.
        """
        with tf.name_scope("losses"):
            image_inputs = self.data_augment_batch(self.input_x)  # conditionally apply augmentations
            true_outputs = self.targets
            # produce predictions and get layer features to save for visual inspection
            preds, layer_features = self.c(image_input=image_inputs, training=self.training_phase,
                                           dropout_rate=self.dropout_rate)
            # compute loss and accuracy
            correct_prediction = tf.equal(tf.argmax(preds, 1), tf.cast(true_outputs, tf.int64))
            accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
            crossentropy_loss = tf.reduce_mean(
                tf.nn.sparse_softmax_cross_entropy_with_logits(labels=true_outputs, logits=preds))
            # add loss and accuracy to collections
            tf.add_to_collection('crossentropy_losses', crossentropy_loss)
            tf.add_to_collection('accuracy', accuracy)
            # save summaries for the losses, accuracy and image summaries for input images, augmented images
            # and the layer features (image summaries only make sense for 4D image inputs)
            if len(self.input_x.get_shape().as_list()) == 4:
                self.save_features(name="VGG_features", features=layer_features)
                tf.summary.image('image', [tf.concat(tf.unstack(self.input_x, axis=0), axis=0)])
                tf.summary.image('augmented_image', [tf.concat(tf.unstack(image_inputs, axis=0), axis=0)])
            tf.summary.scalar('crossentropy_losses', crossentropy_loss)
            tf.summary.scalar('accuracy', accuracy)
        return {"crossentropy_losses": tf.add_n(tf.get_collection('crossentropy_losses'),
                                                name='total_classification_loss'),
                "accuracy": tf.add_n(tf.get_collection('accuracy'), name='total_accuracy')}

    def save_features(self, name, features, num_rows_in_grid=4):
        """
        Saves layer features in a grid to be used in tensorboard
        :param name: Features name
        :param features: A list of feature tensors
        :param num_rows_in_grid: Number of rows of channel maps in each grid image; the number of channels of each
        feature tensor must be divisible by it
        """
        for i in range(len(features)):
            shape_in = features[i].get_shape().as_list()
            channels = shape_in[3]
            y_channels = num_rows_in_grid
            x_channels = int(channels / y_channels)
            activations_features = tf.reshape(features[i], shape=(shape_in[0], shape_in[1], shape_in[2],
                                                                  y_channels, x_channels))
            activations_features = tf.unstack(activations_features, axis=4)
            activations_features = tf.concat(activations_features, axis=2)
            activations_features = tf.unstack(activations_features, axis=3)
            activations_features = tf.concat(activations_features, axis=1)
            activations_features = tf.expand_dims(activations_features, axis=3)
            tf.summary.image('{}_{}'.format(name, i), activations_features)

    def rotate_image(self, image):
        """
        Rotates a single image
        :param image: An image to rotate
        :return: A rotated or a non rotated image depending on the result of the flip
        """
        no_rotation_flip = tf.unstack(
            tf.random_uniform([1], minval=1, maxval=100, dtype=tf.int32, seed=None,
                              name=None))  # get a random integer in [1, 100), i.e. between 1 and 99
        flip_boolean = tf.less_equal(no_rotation_flip[0], 50)
        # if that number is less than or equal to 50 then set to true
        random_variable = tf.unstack(tf.random_uniform([1], minval=1, maxval=3, dtype=tf.int32, seed=None, name=None))
        # get a random integer in [1, 3), i.e. 1 or 2, for how many quarter turns the rotation will be,
        # i.e. k=1 means 1*90 degrees and k=2 means 2*90 degrees
        image = tf.cond(flip_boolean, lambda: tf.image.rot90(image, k=random_variable[0]),
                        lambda: image)  # if flip_boolean is true then rotate, if not then do not rotate
        return image

    def rotate_batch(self, batch_images):
        """
        Rotate a batch of images
        :param batch_images: A batch of images
        :return: A rotated batch of images (some images will not be rotated if their rotation flip ends up False)
        """
        shapes = map(int, list(batch_images.get_shape()))
        if len(list(batch_images.get_shape())) < 4:
            return batch_images
        batch_size, x, y, c = shapes
        with tf.name_scope('augment'):
            batch_images_unpacked = tf.unstack(batch_images)
            new_images = []
            for image in batch_images_unpacked:
                new_images.append(self.rotate_image(image))
            new_images = tf.stack(new_images)
            new_images = tf.reshape(new_images, (batch_size, x, y, c))
            return new_images

    def data_augment_batch(self, batch_images):
        """
        Augments data with a variety of augmentations; currently it only applies rotations.
        :param batch_images: A batch of images to augment
        :return: Augmented data
        """
        batch_images = tf.cond(self.augment_rotate, lambda: self.rotate_batch(batch_images), lambda: batch_images)
        return batch_images

    def train(self, losses, learning_rate=1e-3, beta1=0.9):
        """
        Args:
            losses: dict of losses, as returned by loss().
            learning_rate: learning rate of the Adam optimizer.
            beta1: exponential decay rate for Adam's first moment estimates.
        Returns:
            train op.
        """
        c_opt = tf.train.AdamOptimizer(beta1=beta1, learning_rate=learning_rate)
        update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)  # Needed for correct batch norm usage
        with tf.control_dependencies(update_ops):
            c_error_opt_op = c_opt.minimize(losses["crossentropy_losses"], var_list=self.c.variables,
                                            colocate_gradients_with_ops=True)
        return c_error_opt_op

    def init_train(self):
        """
        Builds graph ops and returns them
        :return: Summary, losses and training ops
        """
        losses_ops = self.loss()
        c_error_opt_op = self.train(losses_ops)
        summary_op = tf.summary.merge_all()
        return summary_op, losses_ops, c_error_opt_op
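`ClassifierNetworkGraph.loss` computes the batch mean of the sparse softmax cross-entropy between the logits and integer class labels, plus the fraction of argmax predictions that match the labels. A NumPy sketch of the same two quantities, useful for checking values by hand, is below; it is an illustration rather than TensorFlow's implementation, and `sparse_softmax_cross_entropy` and `accuracy` are hypothetical helper names. Subtracting the row maximum before exponentiating keeps the log-sum-exp numerically stable without changing the result.

```python
import numpy as np


def sparse_softmax_cross_entropy(logits, labels):
    """Mean cross-entropy between unscaled logits [batch, classes] and integer labels [batch]."""
    # shift by the row maximum so np.exp cannot overflow (log-softmax is shift invariant)
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # pick the log-probability of the true class for each example, then average over the batch
    return -log_probs[np.arange(len(labels)), labels].mean()


def accuracy(logits, labels):
    """Fraction of examples whose highest-scoring class equals the label."""
    return np.mean(np.argmax(logits, axis=1) == labels)


logits = np.array([[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
labels = np.array([0, 2])
print(sparse_softmax_cross_entropy(logits, labels))
print(accuracy(logits, labels))  # 0.5
```

With all-zero logits every class gets probability 1/3, so the loss reduces to log(3) regardless of the labels, a handy value to expect from an untrained network with near-zero outputs on a three-class problem.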


@@ -0,0 +1,557 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction to TensorFlow\n",
"\n",
"## Computation graphs\n",
"\n",
"In the first semester we used the NumPy-based `mlp` Python package to illustrate the concepts involved in automatically propagating gradients through multiple-layer neural network models. We also looked at how to use these calculated derivatives to do gradient-descent based training of models in supervised learning tasks such as classification and regression.\n",
"\n",
"A key theme in the first semester's work was the idea of defining models in a modular fashion. There we considered models composed of a sequence of *layer* modules, the output of each of which fed into the input of the next in the sequence and each applying a transformation to map inputs to outputs. By defining a standard interface to layer objects, with each defining a `fprop` method to *forward propagate* inputs to outputs and a `bprop` method to *back propagate* gradients with respect to the output of the layer to gradients with respect to the input of the layer, the layer modules could be composed together arbitrarily, with activations forward propagated and gradients back propagated through the whole stack.\n",
"\n",
"<div style='margin: auto; text-align: center; padding-top: 1em;'>\n",
" <img style='margin-bottom: 1em;' src='res/pipeline-graph.png' width='30%' />\n",
" <i>'Pipeline' model composed of a sequence of single-input, single-output layer modules</i>\n",
"</div>\n",
"\n",
"By construction a layer was defined as an object with a single array input and a single array output. This is a natural fit for the architectures of standard feedforward networks, which can be thought of as a single pipeline of transformations from user-provided input data to predicted outputs, as illustrated in the figure above.\n",
"\n",
"<div style='margin: auto; text-align: center; padding-top: 1em;'>\n",
" <img style='display: inline-block; padding-right: 2em; margin-bottom: 1em;' src='res/rnn-graph.png' width='30%' />\n",
" <img style='display: inline-block; padding-left: 2em; margin-bottom: 1em;' src='res/skip-connection-graph.png' width='30%' /> <br />\n",
" <i>Models which fit less well into a pipeline structure: left, a sequence-to-sequence recurrent network; right, a feedforward network with skip connections.</i>\n",
"</div>\n",
"\n",
"Towards the end of last semester, however, we encountered several models which do not fit so well into this pipeline-like structure. For instance, (unrolled) recurrent neural networks tend to have inputs feeding into and outputs feeding out from multiple points along a deep feedforward model, corresponding to the updates of the hidden recurrent state, as illustrated in the left panel of the figure above. It is not trivial to see how to map this structure to our layer-based pipeline. Similarly, models with skip connections between layers, as illustrated in the right panel of the figure above, also do not fit particularly well into a pipeline structure.\n",
"\n",
"Ideally we would like to be able to compose modular components in more general structures than the pipeline structure we have been using so far. In particular it turns out to be useful to be able to deal with models which have structures defined by arbitrary [*directed acyclic graphs*](https://en.wikipedia.org/wiki/Directed_acyclic_graph) (DAGs), that is, graphs connected by directed edges and without any directed cycles. Both the recurrent network and skip-connection examples can be naturally expressed as DAGs, as can many other model structures.\n",
"\n",
"When working with these more general graphical structures, rather than considering a graph made up of layer modules, it is often more useful to consider the lower-level mathematical operations or *ops* that make up the computation as the fundamental building block. A DAG composed of ops is often termed a *computation graph*. This terminology was covered briefly in [lecture 6](http://www.inf.ed.ac.uk/teaching/courses/mlp/2017-18/mlp06-enc.pdf), and also in the [MLPR course](http://www.inf.ed.ac.uk/teaching/courses/mlpr/2016/notes/w5a_backprop.html). The backpropagation rules we used to propagate gradients through a stack of layer modules can be naturally generalised to apply to computation graphs; this method of applying the chain rule to automatically propagate gradients backwards through a general computation graph is also sometimes termed [*reverse-mode automatic differentiation*](https://en.wikipedia.org/wiki/Automatic_differentiation#Reverse_accumulation).\n",
"\n",
"<div style='margin: auto; text-align: center; padding-top: 1em;'>\n",
" <img style='margin-bottom: 1em;' src='res/affine-transform-graph.png' width='40%' />\n",
" <i>Computation / data flow graph for an affine transformation $\\boldsymbol{y} = \\mathbf{W}\\boldsymbol{x} + \\boldsymbol{b}$</i>\n",
"</div>\n",
"\n",
"The figure above shows a very simple computation graph corresponding to the mathematical expression $\\boldsymbol{y} = \\mathbf{W}\\boldsymbol{x} + \\boldsymbol{b}$, i.e. the affine transformation we encountered last semester. Here the nodes of the graph are operations and the edges are the vector or matrix values passed between operations. The opposite convention, with nodes as values and edges as operations, is also sometimes used. Note that just as there was ambiguity about what to define as a layer (as discussed previously at the beginning of the [third lab notebook](03_Multiple_layer_models.ipynb)), there is a range of choices for the level of abstraction to use in the op nodes of a computational graph. For instance, we could also have chosen to express the above computational graph with a single `AffineTransform` op node with three inputs (one matrix, two vectors) and one vector output. Equally, we might choose to express the `MatMul` op in terms of the underlying individual scalar addition and multiplication operations. What to consider an operation is therefore somewhat a matter of choice and of what is convenient in a particular setting.\n",
"\n",
"## TensorFlow\n",
"\n",
"To allow us to work with models defined by more general computation graphs and to avoid the need to write `fprop` and `bprop` methods for each new model component we want to try out, this semester we will be using the open-source computation graph framework [TensorFlow](https://www.tensorflow.org/), originally developed by the Google Brain team:\n",
"\n",
"> TensorFlow™ is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API.\n",
"\n",
"TensorFlow allows complex computation graphs (also known as data flow graphs in TensorFlow parlance) to be defined via a Python interface, with efficient C++ implementations for running the corresponding operations on different devices. TensorFlow also includes tools for automatic gradient computation and a large and growing suite of pre-defined operations useful for gradient-based training of machine learning models.\n",
"\n",
"In this notebook we will introduce some of the basic elements of constructing, training and evaluating models with TensorFlow. This will use similar material to some of the [official TensorFlow tutorials](https://www.tensorflow.org/tutorials/) but with an additional emphasis on making links to the material covered in this course last semester. If you have not used a computational graph framework such as TensorFlow or Theano before, you may find the [basic usage tutorial](https://www.tensorflow.org/get_started/basic_usage) useful to go through.\n",
"\n",
"### Installing TensorFlow\n",
"\n",
"To install TensorFlow, open a terminal, activate your Conda `mlp` environment using\n",
"\n",
"```\n",
"source activate mlp\n",
"```\n",
"\n",
"and then run\n",
"\n",
"```\n",
"pip install tensorflow # for CPU users\n",
"```\n",
"\n",
"```\n",
"pip install tensorflow_gpu # for GPU users\n",
"```\n",
"\n",
"This should locally install the stable release version of TensorFlow (currently 1.4.1) in your Conda environment. After installing TensorFlow, you may need to restart the kernel in the notebook to allow it to be imported."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 1: EMNIST softmax regression\n",
"\n",
"As a first example we will train a simple softmax regression model to classify the handwritten digit and letter images in the EMNIST data set encountered last semester (for those fed up with working with EMNIST: don't worry, you will soon be moving on to other datasets!). This is equivalent to the model implemented in the first exercise of the third lab notebook. We will walk through constructing an equivalent model in TensorFlow and explain new TensorFlow concepts as we use them. You should run each cell as you progress through the exercise.\n",
"\n",
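"As with the common convention of importing NumPy under the alias `np`, it is common to import the Python TensorFlow top-level module under the alias `tf`. The cell below also appends the parent directory to `sys.path` so that the repository's `data_providers` module can be imported later in the notebook."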
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import tensorflow as tf\n",
"import sys\n",
"sys.path.append(\"..\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We begin by defining [*placeholder*](https://www.tensorflow.org/api_docs/python/io_ops/placeholders) objects for the data input and target arrays. These are nodes in the computation graph to which we will later *feed* in external data, such as batches of training set inputs and targets. This abstraction allows us to reuse the same computation graph for different data inputs: we can think of placeholders as acting like the arguments of a function. It is actually possible to feed data into any node in a TensorFlow graph; however, the advantage of using a placeholder is that it *must* always have a value fed into it (an exception will be raised if a value isn't provided), so no arbitrary default value needs to be given for it.\n",
"\n",
"The `tf.placeholder` function has three arguments:\n",
"\n",
" * `dtype` : The [TensorFlow datatype](https://www.tensorflow.org/api_docs/python/framework/tensor_types) for the tensor e.g. `tf.float32` for single-precision floating point values.\n",
" * `shape` (optional) : An iterable defining the shape (size of each dimension) of the tensor e.g. `shape=(5, 2)` would indicate a 2D tensor (matrix) with first dimension of size 5 and second dimension of size 2. An entry of `None` in the shape leaves the size of the corresponding dimension unspecified, so for example `shape=(None, 28, 28)` would allow any 3D input whose final two dimensions are of size 28 to be fed in.\n",
" * `name` (optional): String argument defining a name for the tensor which can be useful when visualising a computation graph and for debugging purposes.\n",
" \n",
"As we will generally be working with batches of data points, both the `inputs` and `targets` will be 2D tensors with the first dimension corresponding to the batch size (set as `None` here to allow it to be specified later) and the second dimension corresponding to the size of each input or target vector. As in the previous semester's work we will use a 1-of-K encoding for the class targets, so for EMNIST each target is a vector of length 47 (the number of digit/letter classes)."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"inputs = tf.placeholder(tf.float32, [None, 784], 'inputs')\n",
"targets = tf.placeholder(tf.float32, [None, 47], 'targets')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We now define [*variable*](https://www.tensorflow.org/api_docs/python/state_ops/variables) objects for the model parameters. Variables are stateful tensors in the computation graph: they have to be explicitly initialised and their internal values can be updated as part of the operations in a graph, e.g. gradient updates to model parameters during training. They can also be saved to disk and previously saved values restored into a graph at a later time.\n",
"\n",
"The `tf.Variable` constructor takes an `initial_value` as its first argument; this should be a TensorFlow tensor which specifies the initial value to assign to the variable, often a constant tensor such as all zeros, or random samples from a distribution."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"weights = tf.Variable(tf.zeros([784, 47]))\n",
"biases = tf.Variable(tf.zeros([47]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We now build the computation graph corresponding to producing the predicted outputs of the model (log unnormalised class probabilities) given the data inputs and model parameters. We use the TensorFlow [`matmul`](https://www.tensorflow.org/api_docs/python/math_ops/matrix_math_functions#matmul) op to compute the matrix-matrix product between the 2D array of input vectors and the weight matrix parameter variable. TensorFlow [overloads all of the common arithmetic operators](http://stackoverflow.com/a/35095052) for tensor objects, so `x + y` where at least one of `x` or `y` is a tensor (`tf.placeholder` returns a `tf.Tensor`, and `tf.Variable` objects support the same overloaded operators) corresponds to the TensorFlow elementwise addition op `tf.add`. Furthermore, elementwise binary arithmetic operators such as addition follow NumPy-style [broadcasting](https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html), so in the expression below the `+ biases` sub-expression creates an operation in the computation graph which adds the bias vector to each of the rows of the 2D tensor output of the `matmul` op."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"outputs = tf.matmul(inputs, weights) + biases"
]
},
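{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of what the `matmul` and broadcast `+ biases` ops compute, the following NumPy sketch (our own analogue, not how TensorFlow evaluates the graph internally) checks the shapes involved and that the bias vector is added to every row of the matrix product. The batch size of 3 and the random values are arbitrary choices for illustration.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"rng = np.random.RandomState(0)\n",
"batch_size = 3  # arbitrary batch size chosen for illustration\n",
"inputs = rng.randn(batch_size, 784)  # plays the role of a batch fed to the inputs placeholder\n",
"weights = rng.randn(784, 47) * 0.01\n",
"biases = rng.randn(47)\n",
"\n",
"# (3, 784) matrix product with (784, 47) gives (3, 47); the (47,) biases broadcast across rows\n",
"outputs = inputs.dot(weights) + biases\n",
"\n",
"# Equivalent explicit version: add the same bias vector to each row of the product\n",
"explicit = np.stack([row.dot(weights) + biases for row in inputs])\n",
"print(outputs.shape)  # (3, 47)\n",
"print(np.allclose(outputs, explicit))  # True\n",
"```"
]
},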
{
"cell_type": "markdown",
"metadata": {},
"source": [
"While we could have defined `outputs` as the softmax of the expression above to produce normalised class probabilities as the outputs of the model, as discussed last semester, when using a softmax output combined with a cross-entropy error function it is usually desirable from a numerical stability and efficiency perspective to fold the softmax computation into the error computation (as done in the `CrossEntropySoftmaxError` class in our `mlp` framework).\n",
"\n",
"In TensorFlow this can be achieved with the `softmax_cross_entropy_with_logits` op, which is part of the `tf.nn` submodule containing a number of ops specifically for neural network models. This op takes a `logits` argument corresponding to the log unnormalised class probabilities and a `labels` argument corresponding to the class label targets, which should have the same shape as the logits; in TensorFlow 1.x these must be passed as keyword arguments, as in the cell below. By default the last dimension of the input tensors is assumed to correspond to the class dimension; this can be altered via an optional `dim` argument.\n",
"\n",
"The output of the `softmax_cross_entropy_with_logits` op here is a 1D tensor with a cross-entropy error value for each data point in the batch. We wish to minimise the mean cross-entropy error across the full dataset and will use the mean of the error on the batch as a stochastic estimator of this value. In TensorFlow, ops which *reduce* a tensor along one or more dimensions, for example by taking a sum, mean or product, are prefixed with `reduce_`, with the default behaviour being to perform the reduction across all dimensions of the input tensor and return a scalar output. Therefore the second line below will take the per-data-point cross-entropy errors and produce a single mean value across the whole batch."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"per_datapoint_errors = tf.nn.softmax_cross_entropy_with_logits(logits=outputs, labels=targets)\n",
"error = tf.reduce_mean(per_datapoint_errors)"
]
},
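{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make concrete what the two ops above compute, here is a minimal NumPy sketch of a numerically stable softmax cross-entropy. It is an illustration using the standard log-sum-exp trick, not TensorFlow's actual implementation, and the function name `softmax_cross_entropy` is our own. Subtracting each row's maximum logit before exponentiating leaves the softmax unchanged but prevents overflow, which is one reason for combining the softmax with the error computation.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"\n",
"def softmax_cross_entropy(logits, labels):\n",
"    # Shift each row by its maximum logit; the softmax is invariant to this shift\n",
"    shifted = logits - logits.max(axis=-1, keepdims=True)\n",
"    # Log of the softmax probabilities along the last (class) dimension\n",
"    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))\n",
"    # Cross-entropy with 1-of-K labels, one value per data point\n",
"    return -(labels * log_probs).sum(axis=-1)\n",
"\n",
"\n",
"# A naive np.exp(1000.) would overflow to inf; the shifted version stays finite\n",
"logits = np.array([[1000., 0., 0.], [0., 0., 0.]])\n",
"labels = np.array([[1., 0., 0.], [0., 1., 0.]])\n",
"per_datapoint_errors = softmax_cross_entropy(logits, labels)\n",
"# Like tf.reduce_mean with no axis given, np.mean reduces over all dimensions\n",
"error = np.mean(per_datapoint_errors)\n",
"# First error is (close to) zero; second is log(3) as all three classes are equally likely\n",
"print(per_datapoint_errors, error)\n",
"```"
]
},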
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Although for the purposes of training we will use the cross-entropy error as this is differentiable, for evaluation we will also be interested in the classification accuracy, i.e. the proportion of predicted classes which match the true target labels. We can calculate this in TensorFlow similarly to how we used NumPy to do this previously: we use the TensorFlow `tf.argmax` op to find the index along the class dimension of the maximum predicted value and check whether this is equal to the index along the class dimension of the 1-of-K encoded target labels. As the softmax preserves the ordering of the values in each row, the index of the largest logit in `outputs` is the same as that of the largest predicted class probability, so we do not need to compute the softmax here. Analogously to the error computation above, this computes per-data-point values which we then average with a `reduce_mean` op to produce the classification accuracy for a batch."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"per_datapoint_pred_is_correct = tf.equal(tf.argmax(outputs, 1), tf.argmax(targets, 1))\n",
"accuracy = tf.reduce_mean(tf.cast(per_datapoint_pred_is_correct, tf.float32))"
]
},
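{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same computation can be sketched in NumPy on a tiny hand-made batch of three data points (the values are made up for illustration), mirroring the `tf.equal`, `tf.cast` and `tf.reduce_mean` steps above.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Logits for a batch of three data points and three classes\n",
"outputs = np.array([[2.0, 0.5, -1.0],   # predicted class 0\n",
"                    [0.1, 0.3, 0.2],    # predicted class 1\n",
"                    [1.0, 3.0, 0.0]])   # predicted class 1\n",
"# 1-of-K encoded targets\n",
"targets = np.array([[1., 0., 0.],       # true class 0\n",
"                    [0., 0., 1.],       # true class 2\n",
"                    [0., 1., 0.]])      # true class 1\n",
"\n",
"# Boolean per-datapoint correctness, as with tf.equal on the two argmax results\n",
"per_datapoint_pred_is_correct = np.argmax(outputs, 1) == np.argmax(targets, 1)\n",
"# Cast to float and average, as with tf.cast followed by tf.reduce_mean\n",
"accuracy = np.mean(per_datapoint_pred_is_correct.astype(np.float32))\n",
"print(accuracy)  # two of three predictions correct, so roughly 0.667\n",
"```"
]
},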
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As mentioned previously, TensorFlow is able to automatically calculate gradients of scalar computation graph outputs with respect to tensors in the computation graph. We can explicitly construct a new sub-graph corresponding to the gradient of a scalar with respect to one or more tensors in the graph using the [`tf.gradients`](https://www.tensorflow.org/api_docs/python/train/gradient_computation) function.\n",
"\n",
"However, TensorFlow also includes a number of higher-level `Optimizer` classes in the `tf.train` module that internally deal with constructing graphs corresponding to the gradients of some scalar loss with respect to one or more `Variable` tensors in the graph (usually corresponding to model parameters) and then using these gradients to update the variables (roughly equivalent to the `LearningRule` classes in the `mlp` framework). The most basic `Optimizer` is the `GradientDescentOptimizer`, which simply adds operations corresponding to basic (stochastic) gradient descent to the graph (i.e. no momentum, adaptive learning rates etc.). The `__init__` constructor method for this class takes one required argument `learning_rate`, corresponding to the gradient descent learning rate / step size encountered previously.\n",
"\n",
"Usually we are not interested in the `Optimizer` object other than for adding operations to the graph corresponding to the optimisation steps. This can be achieved using the `minimize` method of the object, which takes as its first argument the tensor corresponding to the scalar loss / error to be minimised. A further optional keyword argument `var_list` can be used to specify a list of variables with respect to which to compute gradients of the loss and which to update; by default this is set to `None`, which indicates that all trainable variables in the current graph should be used. The `minimize` method returns an operation corresponding to applying the gradient updates to the variables; we need to store a reference to this to allow us to run these operations later. We do not need to store a reference to the optimizer itself as we have no further need of this object, hence the steps of constructing the `Optimizer` and calling `minimize` are commonly combined in a single line as below."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"train_step = tf.train.GradientDescentOptimizer(learning_rate=0.5).minimize(error)"
]
},
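{
"cell_type": "markdown",
"metadata": {},
"source": [
"For this simple model the gradients that `minimize` adds to the graph can also be written by hand, which gives a NumPy sketch of what a single run of `train_step` does. This is an illustration only (TensorFlow derives these gradients automatically) and relies on the standard result that the gradient of the mean softmax cross-entropy with respect to the logits is `(softmax(outputs) - targets) / batch_size`. The random inputs and targets are made up, and the helper names `softmax` and `mean_error` are our own.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"\n",
"def softmax(logits):\n",
"    # Numerically stable softmax along the class dimension\n",
"    exp = np.exp(logits - logits.max(axis=1, keepdims=True))\n",
"    return exp / exp.sum(axis=1, keepdims=True)\n",
"\n",
"\n",
"def mean_error(inputs, targets, weights, biases):\n",
"    # Mean cross-entropy between the softmax outputs and 1-of-K targets\n",
"    probs = softmax(inputs.dot(weights) + biases)\n",
"    return -np.mean(np.sum(targets * np.log(probs), axis=1))\n",
"\n",
"\n",
"rng = np.random.RandomState(1)\n",
"batch_size, input_dim, num_classes = 50, 784, 47\n",
"inputs = rng.randn(batch_size, input_dim)  # made-up stand-in for an EMNIST batch\n",
"targets = np.eye(num_classes)[rng.randint(num_classes, size=batch_size)]\n",
"weights = np.zeros((input_dim, num_classes))\n",
"biases = np.zeros(num_classes)\n",
"learning_rate = 0.5\n",
"\n",
"error_before = mean_error(inputs, targets, weights, biases)  # log(47) for zero parameters\n",
"# Gradient of the mean error with respect to the logits\n",
"grads_wrt_outputs = (softmax(inputs.dot(weights) + biases) - targets) / batch_size\n",
"# Back-propagate through the affine transform to the parameters\n",
"grads_wrt_weights = inputs.T.dot(grads_wrt_outputs)\n",
"grads_wrt_biases = grads_wrt_outputs.sum(axis=0)\n",
"# Basic gradient descent update with no momentum or adaptive learning rates\n",
"weights -= learning_rate * grads_wrt_weights\n",
"biases -= learning_rate * grads_wrt_biases\n",
"error_after = mean_error(inputs, targets, weights, biases)\n",
"print(error_before, error_after)\n",
"```"
]
},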
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have now constructed a computation graph which can compute predicted outputs, use these to calculate an error value (and accuracy) and use the gradients of the error with respect to the model parameter variables to update their values with a gradient descent step.\n",
"\n",
"Although we have defined our computation graph, we have not yet initialised any tensor data in memory - all of the tensor variables defined above are just symbolic representations of parts of the computation graph. We can think of the computation graph as a whole as being similar to a function - it defines a sequence of operations but does not directly run those operations on data itself.\n",
"\n",
"To run the operations in (part of) a TensorFlow graph we need to create a [`Session`](https://www.tensorflow.org/api_docs/python/client/session_management) object:\n",
"\n",
"> A `Session` object encapsulates the environment in which `Operation` objects are executed, and `Tensor` objects are evaluated.\n",
"\n",
"A session object can be constructed using either `tf.Session()` or `tf.InteractiveSession()`. The only difference with the latter is that it installs itself as the default session on construction. This can be useful in interactive contexts such as shells or the notebook interface, in which an alternative to running parts of a graph with the session `run` method (see below) is to call the `eval` method of a tensor, e.g. `tensor.eval()` (or the `run` method of an operation). Normally the session to use has to be passed to `eval` explicitly or set as the default; an interactive session, however, is automatically used as the default in `eval` calls."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"sess = tf.InteractiveSession()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The key property of a session object is its `run` method. This takes an operation (or list of operations) in a defined graph as an argument, runs the parts of the computation graph necessary to evaluate the output(s) (if any) of the operation(s), and additionally performs any updates to variable states defined by the graph (e.g. gradient updates of parameters). The output values of the operation(s), if any, are returned by the `run` call.\n",
"\n",
"A standard operation which needs to be run before any other operations on a graph which includes variable nodes is a variable *initializer* operation. This, as the name suggests, initialises the values of the variables in the session to the values defined by the `initial_value` argument used when adding the variables to the graph. For instance, for the graph we have defined here this will initialise the `weights` variable value in the session to a 2D array of zeros of shape `(784, 47)` and the `biases` variable to a 1D array of zeros of shape `(47,)`.\n",
"\n",
"We can access initializer ops for each variable individually using the `initializer` property of the variables in question and then individually run these, however a common pattern is to use the `tf.global_variables_initializer()` function to create a single initializer op which will initialise all globally defined variables in the default graph and then run this as done below."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"init_op = tf.global_variables_initializer()\n",
"sess.run(init_op)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We are now almost ready to begin training our defined model; however, as a final step we need to create objects for accessing batches of EMNIST input and target data. The TensorFlow tutorial code in `tensorflow.examples.tutorials.mnist` includes an `input_data` sub-module which provides a `read_data_sets` function for downloading the MNIST data and constructing an object for iterating over it. However, the `data_providers` module in this repository already provides MNIST and EMNIST data provider classes like those we used extensively last semester, and we already have local copies of the MNIST and EMNIST data, so we will use these here as they provide all the necessary functionality."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import data_providers\n",
"train_data = data_providers.EMNISTDataProvider('train', batch_size=50, flatten=True, one_hot=True)\n",
"valid_data = data_providers.EMNISTDataProvider('valid', batch_size=50, flatten=True, one_hot=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We are now all set to train our model. As when training models last semester, the training procedure will involve two nested loops - an outer loop corresponding to multiple full-passes through the dataset or *epochs* and an inner loop iterating over individual batches in the training data.\n",
"\n",
"The `init_op` we ran with `sess.run` previously did not depend on the placeholders `inputs` and `targets` in our graph, so we simply ran it with `sess.run(init_op)`. The `train_step` operation corresponding to the gradient-based updates of the `weights` and `biases` parameter variables does however depend on the `inputs` and `targets` placeholders, and so we need to specify values to *feed* into these placeholders; as we wish the gradient updates to be calculated with respect to a batch of inputs and targets, the values that we feed in are the input and target batches. This is specified using the `feed_dict` keyword argument to the session `run` method. As the name suggests, this should be a Python dictionary (`dict`) whose keys are references to the tensors in the graph to feed values into and whose values are the corresponding arrays to feed in (typically NumPy `ndarray` instances); here we have `feed_dict = {inputs: input_batch, targets: target_batch}`.\n",
"\n",
"Another difference in our use of the session `run` method below is that we call it with a list of two operations, `[train_step, error]`, rather than just a single operation. This allows the outputs (and variable updates) of multiple operations in a graph to be evaluated together: here we both run the `train_step` operation to update the parameter values and evaluate the `error` operation to return the mean error on the batch. We could split this into two separate session `run` calls, but as the batch error has to be computed anyway when running `train_step` (it is the value the gradients are calculated with respect to), doing so would repeat some of the computation and so be less efficient than combining them in a single `run` call.\n",
"\n",
"As we are running two different operations, the `run` method returns two values here. The `train_step` operation has no outputs and so the first return value is `None`; in the code below we assign this to `_`, a common convention in Python code for return values we are not interested in using. The second return value is the average error across the batch, which we assign to `batch_error` and accumulate to compute the average of the batch errors over each epoch. As the parameters are updated after every batch, this is a running average rather than the error of the final parameters on the full training set."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"End of epoch 1: running error average = 1.40\n",
"End of epoch 2: running error average = 1.25\n",
"End of epoch 3: running error average = 1.22\n",
"End of epoch 4: running error average = 1.20\n",
"End of epoch 5: running error average = 1.19\n",
"End of epoch 6: running error average = 1.18\n",
"End of epoch 7: running error average = 1.18\n",
"End of epoch 8: running error average = 1.17\n"
]
}
],
"source": [
"num_epoch = 20\n",
"for e in range(num_epoch):\n",
"    running_error = 0.\n",
"    for input_batch, target_batch in train_data:\n",
"        _, batch_error = sess.run(\n",
"            [train_step, error],\n",
"            feed_dict={inputs: input_batch, targets: target_batch})\n",
"        running_error += batch_error\n",
"    running_error /= train_data.num_batches\n",
"    print('End of epoch {0}: running error average = {1:.2f}'.format(e + 1, running_error))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To check your understanding of using session objects to evaluate parts of a graph and of feeding values into a graph, complete the definition of the function in the cell below. This should iterate across all batches in a provided data provider and calculate the error and classification accuracy for each, accumulating the average error and accuracy values across the whole dataset and returning these as a tuple."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"def get_error_and_accuracy(data):\n",
"    \"\"\"Calculate average error and classification accuracy across a dataset.\n",
"\n",
"    Args:\n",
"        data: Data provider which iterates over input-target batches in dataset.\n",
"\n",
"    Returns:\n",
"        Tuple with first element scalar value corresponding to average error\n",
"        across all batches in dataset and second value corresponding to\n",
"        average classification accuracy across all batches in dataset.\n",
"    \"\"\"\n",
"    err = 0.\n",
"    acc = 0.\n",
"    for input_batch, target_batch in data:\n",
"        # Evaluate both values in one run call so the shared forward pass is computed once\n",
"        batch_err, batch_acc = sess.run(\n",
"            [error, accuracy],\n",
"            feed_dict={inputs: input_batch, targets: target_batch})\n",
"        err += batch_err\n",
"        acc += batch_acc\n",
"    err /= data.num_batches\n",
"    acc /= data.num_batches\n",
"    return err, acc"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Test your implementation by running the cell below - this should print the error and accuracy of the trained model on the validation and training datasets if implemented correctly."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train data: Error=1.14 Accuracy=0.69\n",
"Valid data: Error=1.29 Accuracy=0.66\n"
]
}
],
"source": [
"print('Train data: Error={0:.2f} Accuracy={1:.2f}'\n",
" .format(*get_error_and_accuracy(train_data)))\n",
"print('Valid data: Error={0:.2f} Accuracy={1:.2f}'\n",
" .format(*get_error_and_accuracy(valid_data)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 2: Explicit graphs, name scopes, summaries and TensorBoard\n",
"\n",
"In the exercise above we introduced most of the basic concepts needed for constructing graphs in TensorFlow and running graph operations. To avoid introducing too many new terms and too much new syntax at once, however, we skipped over some of the non-essential elements of creating and running models in TensorFlow, in particular some of the provided functionality for organising and structuring the computation graphs created and for monitoring the progress of training runs.\n",
"\n",
"Now that you are hopefully more familiar with the basics of TensorFlow, we will introduce some of these features as they are likely to prove useful when you are building and training more complex models in the rest of this semester.\n",
"\n",
"Although we started off by motivating TensorFlow as a framework which builds computation graphs, in the code above we never explicitly referenced a graph object. This is because TensorFlow always registers a default graph at start up and all operations are added to this graph by default. The default graph can be accessed using `tf.get_default_graph()`. For example running the code in the cell below will assign a reference to the default graph to `default_graph` and print the total number of operations in the current graph definition."
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of operations in graph: 198\n"
]
}
],
"source": [
"default_graph = tf.get_default_graph()\n",
"print('Number of operations in graph: {0}'\n",
" .format(len(default_graph.get_operations())))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also explicitly create a new graph object using `tf.Graph()`. This may be useful if we wish to build up several independent computation graphs."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"graph = tf.Graph()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To add operations to a constructed graph object, we use the `graph.as_default()` [context manager](http://book.pythontips.com/en/latest/context_managers.html). Context managers are used with the `with` statement in Python: `with context_manager:` opens a block in which a special `__enter__` method of the `context_manager` object is called before the code in the block is run, and a further special `__exit__` method is called after the block code has finished executing. This can be used, for example, to manage the allocation of resources (e.g. file handles), but also to locally change some 'context' in the code. In the example here, `graph.as_default()` is a context manager which makes `graph` the default graph within the following block, before restoring the previous default graph once the block code has finished running. Context managers are used extensively in TensorFlow so it is worth being familiar with how they work.\n",
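"\n",
"To see how such a context manager can switch a default, here is a plain-Python sketch. The `Graph` class and the `get_default_graph` and `add_op` functions below are hypothetical toy stand-ins written for illustration, not TensorFlow's implementation: `__enter__` pushes a graph on to a stack of defaults and `__exit__` pops it again, restoring the previous default even if the block raises an exception.\n",
"\n",
"```python\n",
"class Graph(object):\n",
"    # Hypothetical toy graph which only records the names of added ops\n",
"    def __init__(self, name):\n",
"        self.name = name\n",
"        self.ops = []\n",
"\n",
"    def as_default(self):\n",
"        return _DefaultGraphContext(self)\n",
"\n",
"\n",
"class _DefaultGraphContext(object):\n",
"    def __init__(self, graph):\n",
"        self.graph = graph\n",
"\n",
"    def __enter__(self):\n",
"        # Make this graph the default for the duration of the with block\n",
"        _default_graph_stack.append(self.graph)\n",
"        return self.graph\n",
"\n",
"    def __exit__(self, exc_type, exc_value, traceback):\n",
"        # Restore the previous default; returning False lets any exception propagate\n",
"        _default_graph_stack.pop()\n",
"        return False\n",
"\n",
"\n",
"_default_graph_stack = [Graph('default')]\n",
"\n",
"\n",
"def get_default_graph():\n",
"    return _default_graph_stack[-1]\n",
"\n",
"\n",
"def add_op(name):\n",
"    # New ops always go into whichever graph is currently the default\n",
"    get_default_graph().ops.append(name)\n",
"\n",
"\n",
"graph = Graph('second')\n",
"add_op('a')\n",
"with graph.as_default():\n",
"    add_op('b')\n",
"add_op('c')\n",
"print(get_default_graph().name, get_default_graph().ops, graph.ops)  # default ['a', 'c'] ['b']\n",
"```\n",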
"\n",
"Another common context manager usage in TensorFlow is to define *name scopes*. As we encountered earlier, individual operations in a TensorFlow graph can be assigned names. As we will see later, this is useful for making graphs interpretable when we use the tools provided in TensorFlow for visualising them. As computation graphs can become very big (even the quite simple graph we created in the first exercise has nearly 200 operations in it, as the operation count printed above shows), even with interpretable names attached to the graph operations it can still be difficult to understand and debug what is happening in a graph. Therefore, rather than only allowing a single-level naming scheme to be applied to the individual operations in the graph, TensorFlow supports hierarchical naming of sub-graphs. This allows sets of related operations to be grouped together under a common name, and thus allows both higher and lower level structure in a graph to be easily identified.\n",
"\n",
"This hierarchical naming is performed using the name scope context manager `tf.name_scope('name')`. Starting a block `with tf.name_scope('name'):` will cause all of the operations added to a graph within that block to be grouped under the name specified in the `tf.name_scope` call. Name scope blocks can be nested to allow finer-grained sub-groupings of operations. Name scopes can be used to group operations at various levels, e.g. operations corresponding to inference/prediction versus training, or operations which correspond to the classical definition of a neural network layer.\n",
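"\n",
"The hierarchical naming scheme can be mimicked in the same spirit with `contextlib.contextmanager`. The `name_scope` and `op_name` functions below are hypothetical toys which only reproduce the `scope/sub_scope/name` pattern of TensorFlow operation names; they ignore details such as how TensorFlow makes repeated names unique.\n",
"\n",
"```python\n",
"from contextlib import contextmanager\n",
"\n",
"_scope_stack = []\n",
"\n",
"\n",
"@contextmanager\n",
"def name_scope(name):\n",
"    # Open a scope for the duration of the with block, closing it even on errors\n",
"    _scope_stack.append(name)\n",
"    try:\n",
"        yield\n",
"    finally:\n",
"        _scope_stack.pop()\n",
"\n",
"\n",
"def op_name(name):\n",
"    # Prefix an op name with the names of all currently open scopes\n",
"    return '/'.join(_scope_stack + [name])\n",
"\n",
"\n",
"with name_scope('model'):\n",
"    with name_scope('layer_1'):\n",
"        matmul_name = op_name('MatMul')\n",
"    add_name = op_name('add')\n",
"print(matmul_name, add_name, op_name('init'))  # model/layer_1/MatMul model/add init\n",
"```\n",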
"\n",
"The code in the cell below uses both a `graph.as_default()` context manager and name scopes to create a second copy of the computation graph corresponding to softmax regression that we constructed in the previous exercise."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"with graph.as_default():\n",
"    with tf.name_scope('data'):\n",
"        inputs = tf.placeholder(tf.float32, [None, 784], name='inputs')\n",
"        targets = tf.placeholder(tf.float32, [None, 47], name='targets')\n",
"    with tf.name_scope('parameters'):\n",
"        weights = tf.Variable(tf.zeros([784, 47]), name='weights')\n",
"        biases = tf.Variable(tf.zeros([47]), name='biases')\n",
"    with tf.name_scope('model'):\n",
"        outputs = tf.matmul(inputs, weights) + biases\n",
"    with tf.name_scope('error'):\n",
"        error = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=outputs, labels=targets))\n",
"    with tf.name_scope('train'):\n",
"        train_step = tf.train.GradientDescentOptimizer(0.5).minimize(error)\n",
"    with tf.name_scope('accuracy'):\n",
"        accuracy = tf.reduce_mean(tf.cast(\n",
"            tf.equal(tf.argmax(outputs, 1), tf.argmax(targets, 1)), tf.float32))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As hinted earlier, TensorFlow comes with tools for visualising computation graphs. In particular, [TensorBoard](https://www.tensorflow.org/how_tos/summaries_and_tensorboard/) is an interactive web application for, amongst other things, visualising TensorFlow computation graphs (we will explore some of its other functionality in the latter part of the exercise). Typically TensorBoard is launched from a terminal and a browser is used to connect to the resulting locally running TensorBoard server instance. However, for the purposes of graph visualisation it is also possible to embed a remotely served TensorBoard graph visualisation interface in a Jupyter notebook using the helper function below (a slight variant of the recipe in [this notebook](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/deepdream/deepdream.ipynb)).\n",
"\n",
"<span style='color: red; font-weight: bold;'>Note: The code below seems to not work for some people when accessing the notebook in Firefox. You can either try loading the notebook in an alternative browser, or just skip this section for now and explore the graph visualisation tool when launching TensorBoard below.</span>"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"from IPython.display import display, HTML\n",
"import datetime\n",
"\n",
"def show_graph(graph_def, frame_size=(900, 600)):\n",
"    \"\"\"Visualize TensorFlow graph.\"\"\"\n",
"    if hasattr(graph_def, 'as_graph_def'):\n",
"        graph_def = graph_def.as_graph_def()\n",
"    timestamp = datetime.datetime.now().strftime(\"%Y-%m-%d_%H-%M-%S\")\n",
"    code = \"\"\"\n",
" <script>\n",
" function load() {{\n",
" document.getElementById(\"{id}\").pbtxt = {data};\n",
" }}\n",
" </script>\n",
" <link rel=\"import\" href=\"https://tensorboard.appspot.com/tf-graph-basic.build.html\" onload=load()>\n",
" <div style=\"height:{height}px\">\n",
" <tf-graph-basic id=\"{id}\"></tf-graph-basic>\n",
" </div>\n",
"    \"\"\".format(height=frame_size[1], data=repr(str(graph_def)), id='graph'+timestamp)\n",
"    iframe = \"\"\"\n",
" <iframe seamless style=\"width:{width}px;height:{height}px;border:0\" srcdoc=\"{src}\"></iframe>\n",
"    \"\"\".format(width=frame_size[0], height=frame_size[1] + 20, src=code.replace('\"', '&quot;'))\n",
"    display(HTML(iframe))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Run the cell below to display a visualisation of the graph we just defined. Notice that by default all operations within a particular name scope are grouped under a single node; this allows the top-level structure of the graph and how data flows between the various components to be easily visualised. We can, however, also expand these nodes to inspect the operations within them: simply double-click on one of the nodes to do this (double-clicking on the expanded node will cause it to collapse again). If you expand the `model` node you should see a graph closely mirroring the affine transform example given as motivation above."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"show_graph(graph)"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.2"
}
},
"nbformat": 4,
"nbformat_minor": 1
}


@ -0,0 +1,514 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction to TensorFlow\n",
"\n",
"## Computation graphs\n",
"\n",
"In the first semester we used the NumPy-based `mlp` Python package to illustrate the concepts involved in automatically propagating gradients through multiple-layer neural network models. We also looked at how to use these calculated derivatives to do gradient-descent based training of models in supervised learning tasks such as classification and regression.\n",
"\n",
"A key theme in the first semester's work was the idea of defining models in a modular fashion. There we considered models composed of a sequence of *layer* modules, the output of each of which fed into the input of the next in the sequence, with each applying a transformation to map inputs to outputs. By defining a standard interface to layer objects, with each defining a `fprop` method to *forward propagate* inputs to outputs and a `bprop` method to *back propagate* gradients with respect to the output of the layer to gradients with respect to the input of the layer, the layer modules could be composed together arbitrarily, with activations forward propagated and gradients back propagated through the whole stack.\n",
"\n",
"<div style='margin: auto; text-align: center; padding-top: 1em;'>\n",
" <img style='margin-bottom: 1em;' src='res/pipeline-graph.png' width='30%' />\n",
" <i>'Pipeline' model composed of sequence of single input, single output layer modules</i>\n",
"</div>\n",
"\n",
"By construction, a layer was defined as an object with a single array input and a single array output. This is a natural fit for the architectures of standard feedforward networks, which can be thought of as a single pipeline of transformations from user-provided input data to predicted outputs, as illustrated in the figure above.\n",
"\n",
"<div style='margin: auto; text-align: center; padding-top: 1em;'>\n",
" <img style='display: inline-block; padding-right: 2em; margin-bottom: 1em;' src='res/rnn-graph.png' width='30%' />\n",
" <img style='display: inline-block; padding-left: 2em; margin-bottom: 1em;' src='res/skip-connection-graph.png' width='30%' /> <br />\n",
" <i>Models which fit less well into pipeline structure: left, a sequence-to-sequence recurrent network; right, a feed forward network with skip connections.</i>\n",
"</div>\n",
"\n",
"Towards the end of last semester, however, we encountered several models which do not fit so well into this pipeline-like structure. For instance, (unrolled) recurrent neural networks tend to have inputs feeding into, and outputs feeding out from, multiple points along a deep feedforward model corresponding to the updates of the hidden recurrent state, as illustrated in the left panel of the figure above. It is not obvious how to map this structure onto our layer-based pipeline. Similarly, models with skip connections between layers, as illustrated in the right panel of the figure above, also do not fit particularly well into a pipeline structure.\n",
"\n",
"Ideally we would like to be able to compose modular components in more general structures than the pipeline structure we have been using so far. In particular it turns out to be useful to be able to deal with models whose structures are defined by arbitrary [*directed acyclic graphs*](https://en.wikipedia.org/wiki/Directed_acyclic_graph) (DAGs), that is graphs connected by directed edges and without any directed cycles. Both the recurrent network and skip-connection examples, as well as many other model structures, can be naturally expressed as DAGs.\n",
"\n",
"When working with these more general graphical structures, rather than considering a graph made up of layer modules, it is often more useful to consider the lower-level mathematical operations, or *ops*, that make up the computation as the fundamental building blocks. A DAG composed of ops is often termed a *computation graph*. This terminology was covered briefly in [lecture 6](http://www.inf.ed.ac.uk/teaching/courses/mlp/2017-18/mlp06-enc.pdf), and also in the [MLPR course](http://www.inf.ed.ac.uk/teaching/courses/mlpr/2016/notes/w5a_backprop.html). The backpropagation rules we used to propagate gradients through a stack of layer modules can be naturally generalised to apply to computation graphs; this method of applying the chain rule to automatically propagate gradients backwards through a general computation graph is also sometimes termed [*reverse-mode automatic differentiation*](https://en.wikipedia.org/wiki/Automatic_differentiation#Reverse_accumulation).\n",
"\n",
"<div style='margin: auto; text-align: center; padding-top: 1em;'>\n",
" <img style='margin-bottom: 1em;' src='res/affine-transform-graph.png' width='40%' />\n",
" <i>Computation / data flow graph for an affine transformation $\\boldsymbol{y} = \\mathbf{W}\\boldsymbol{x} + \\boldsymbol{b}$</i>\n",
"</div>\n",
"\n",
"The figure above shows a very simple computation graph corresponding to the mathematical expression $\\boldsymbol{y} = \\mathbf{W}\\boldsymbol{x} + \\boldsymbol{b}$, i.e. the affine transformation we encountered last semester. Here the nodes of the graph are operations and the edges are the vector or matrix values passed between operations. The opposite convention, with nodes as values and edges as operations, is also sometimes used. Note that just as there was ambiguity about what to define as a layer (as discussed previously at the beginning of the [third lab notebook](03_Multiple_layer_models.ipynb)), there is a range of choices for the level of abstraction to use for the op nodes in a computation graph. For instance, we could also have chosen to express the above computation graph with a single `AffineTransform` op node with three inputs (one matrix, two vectors) and one vector output. Equally, we might choose to express the `MatMul` op in terms of the underlying individual scalar addition and multiplication operations. What to consider an operation is therefore somewhat a matter of choice and of what is convenient in a particular setting.\n",
"\n",
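"As a rough illustration of these ideas, the sketch below (plain NumPy, not how TensorFlow is implemented internally) evaluates the affine computation graph above by running its `MatMul` and `Add` ops in order, then performs reverse-mode automatic differentiation by visiting the same ops in reverse order and applying the chain rule to each. The function name `affine_forward_backward` and the upstream gradient `grad_y` (the gradient of some scalar error with respect to `y`) are illustrative assumptions.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def affine_forward_backward(W, x, b, grad_y):\n",
"    # Forward pass: evaluate the graph ops in topological order\n",
"    h = W.dot(x)  # MatMul op\n",
"    y = h + b     # Add op\n",
"    # Backward pass: visit the ops in reverse order, applying the chain rule\n",
"    grad_h = grad_y  # Add passes the incoming gradient to both of its inputs\n",
"    grad_b = grad_y\n",
"    grad_W = np.outer(grad_h, x)  # chain rule through MatMul for W\n",
"    grad_x = W.T.dot(grad_h)      # chain rule through MatMul for x\n",
"    return y, grad_W, grad_x, grad_b\n",
"\n",
"W = np.array([[1., 2.], [3., 4.]])\n",
"x = np.array([1., -1.])\n",
"b = np.array([0.5, 0.5])\n",
"y, grad_W, grad_x, grad_b = affine_forward_backward(W, x, b, np.ones(2))\n",
"print(y)       # [-0.5 -0.5]\n",
"print(grad_x)  # [4. 6.]\n",
"```\n",
"\n",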
"## TensorFlow\n",
"\n",
"To allow us to work with models defined by more general computation graphs and to avoid the need to write `fprop` and `bprop` methods for each new model component we want to try out, this semester we will be using the open-source computation graph framework [TensorFlow](https://www.tensorflow.org/), originally developed by the Google Brain team:\n",
"\n",
"> TensorFlow™ is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs \n",
"in a desktop, server, or mobile device with a single API.\n",
"\n",
"TensorFlow allows complex computation graphs (also known as data flow graphs in TensorFlow parlance) to be defined via a Python interface, with efficient C++ implementations for running the corresponding operations on different devices. TensorFlow also includes tools for automatic gradient computation and a large and growing suite of predefined operations useful for gradient-based training of machine learning models.\n",
"\n",
"In this notebook we will introduce some of the basic elements of constructing, training and evaluating models with TensorFlow. This will use similar material to some of the [official TensorFlow tutorials](https://www.tensorflow.org/tutorials/) but with an additional emphasis on making links to the material covered in this course last semester. Those who have not used a computational graph framework such as TensorFlow or Theano before may find it useful to go through the [basic usage tutorial](https://www.tensorflow.org/get_started/basic_usage).\n",
"\n",
"### Installing TensorFlow\n",
"\n",
"To install TensorFlow, open a terminal, activate your Conda `mlp` environment using\n",
"\n",
"```\n",
"source activate mlp\n",
"```\n",
"\n",
"and then run\n",
"\n",
"```\n",
"pip install tensorflow # for CPU users\n",
"```\n",
"\n",
"```\n",
"pip install tensorflow_gpu # for GPU users\n",
"```\n",
"\n",
"This should locally install the stable release version of TensorFlow (currently 1.4.1) in your Conda environment. After installing TensorFlow you may need to restart the kernel in the notebook to allow it to be imported."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 1: EMNIST softmax regression\n",
"\n",
"As a first example we will train a simple softmax regression model to classify handwritten digit and letter images from the EMNIST data set encountered last semester (for those fed up with working with EMNIST, don't worry: you will soon be moving on to other datasets!). This is equivalent to the model implemented in the first exercise of the third lab notebook. We will walk through constructing an equivalent model in TensorFlow and explain new TensorFlow concepts as we use them. You should run each cell as you progress through the exercise.\n",
"\n",
"Similarly to the common convention of importing NumPy under the short-form alias `np`, it is common to import the Python TensorFlow top-level module under the alias `tf`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import tensorflow as tf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We begin by defining [*placeholder*](https://www.tensorflow.org/api_docs/python/io_ops/placeholders) objects for the data input and target arrays. These are nodes in the computation graph into which we will later *feed* external data, such as batches of training set inputs and targets. This abstraction allows us to reuse the same computation graph for different data inputs - we can think of placeholders as acting equivalently to the arguments of a function. It is actually possible to feed data into any node in a TensorFlow graph; however, the advantage of using a placeholder is that it *must* always have a value fed into it (an exception will be raised if a value isn't provided) and there is no need to specify an arbitrary default value for it when defining the graph.\n",
"\n",
"The `tf.placeholder` function has three arguments:\n",
"\n",
" * `dtype` : The [TensorFlow datatype](https://www.tensorflow.org/api_docs/python/framework/tensor_types) for the tensor e.g. `tf.float32` for single-precision floating point values.\n",
" * `shape` (optional) : An iterable defining the shape (size of each dimension) of the tensor, e.g. `shape=(5, 2)` would indicate a 2D tensor (matrix) with first dimension of size 5 and second dimension of size 2. An entry of `None` in the shape definition leaves the size of the corresponding dimension unspecified, so for example `shape=(None, 28, 28)` would allow any 3D input with final two dimensions of size 28 to be fed in.\n",
" * `name` (optional): String argument defining a name for the tensor which can be useful when visualising a computation graph and for debugging purposes.\n",
" \n",
"As we will generally be working with batches of datapoints, both the `inputs` and `targets` will be 2D tensors with the first dimension corresponding to the batch size (set as `None` here to allow it to be specified later) and the second dimension corresponding to the size of each input or output vector. As in the previous semester's work we will use a 1-of-K encoding for the class targets, so for EMNIST each target corresponds to a vector of length 47 (the number of digit/letter classes)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"inputs = tf.placeholder(tf.float32, [None, 784], 'inputs')\n",
"targets = tf.placeholder(tf.float32, [None, 47], 'targets')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We now define [*variable*](https://www.tensorflow.org/api_docs/python/state_ops/variables) objects for the model parameters. Variables are stateful tensors in the computation graph - they have to be explicitly initialised and their internal values can be updated as part of the operations in a graph, e.g. gradient updates to model parameters during training. They can also be saved to disk, and previously saved values can be restored into a graph at a later time.\n",
"\n",
"The `tf.Variable` constructor takes an `initial_value` as its first argument; this should be a TensorFlow tensor which specifies the initial value to assign to the variable, often a constant tensor such as all zeros, or random samples from a distribution."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"weights = tf.Variable(tf.zeros([784, 47]))\n",
"biases = tf.Variable(tf.zeros([47]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We now build the computation graph corresponding to producing the predicted outputs of the model (log unnormalised class probabilities) given the data inputs and model parameters. We use the TensorFlow [`matmul`](https://www.tensorflow.org/api_docs/python/math_ops/matrix_math_functions#matmul) op to compute the matrix-matrix product between the 2D array of input vectors and the weight matrix parameter variable. TensorFlow [overloads all of the common arithmetic operators](http://stackoverflow.com/a/35095052) for tensor objects, so `x + y` where at least one of `x` or `y` is a tensor (`tf.placeholder` returns a `tf.Tensor`, and `tf.Variable` objects are automatically converted to tensors when used in operations) corresponds to the TensorFlow elementwise addition op `tf.add`. Further, elementwise binary arithmetic operators such as addition follow NumPy-style [broadcasting](https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html), so in the expression below the `+ biases` sub-expression will correspond to creating an operation in the computation graph which adds the bias vector to each of the rows of the 2D tensor output of the `matmul` op."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"outputs = tf.matmul(inputs, weights) + biases"
]
},
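{
"cell_type": "markdown",
"metadata": {},
"source": [
"The broadcasting behaviour of `+ biases` can be mirrored in NumPy as a quick check of the shapes involved. This is a minimal sketch with stand-in arrays (the batch size of 50 and the `_np` variable names are arbitrary choices, not part of the TensorFlow model above).\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"batch_size = 50\n",
"inputs_np = np.random.randn(batch_size, 784)  # stand-in batch of flattened images\n",
"weights_np = np.zeros((784, 47))\n",
"biases_np = np.arange(47, dtype=float)        # a distinct bias for each class\n",
"\n",
"# (50, 784) x (784, 47) gives (50, 47); the (47,) bias vector is then\n",
"# broadcast and added to every row of the matrix product\n",
"outputs_np = inputs_np.dot(weights_np) + biases_np\n",
"print(outputs_np.shape)  # (50, 47)\n",
"```"
]
},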
{
"cell_type": "markdown",
"metadata": {},
"source": [
"While we could have defined `outputs` as the softmax of the expression above to produce normalised class probabilities as the outputs of the model, as discussed last semester, when using a softmax output combined with a cross-entropy error function it is usually desirable from a numerical stability and efficiency perspective to wrap the softmax computation into the error computation (as done in the `CrossEntropySoftmaxError` class in our `mlp` framework). \n",
"\n",
"In TensorFlow this can be achieved with the `softmax_cross_entropy_with_logits` op, which is part of the `tf.nn` submodule containing a number of ops specifically for neural network type models. This op takes the log unnormalised class probabilities (sometimes termed logits) via its `logits` argument and the class label targets via its `labels` argument; the targets should have the same shape as the logits. By default the last dimension of the input tensors is assumed to correspond to the class dimension - this can be altered via an optional `dim` argument.\n",
"\n",
"The output of the `softmax_cross_entropy_with_logits` op here is a 1D tensor with a cross-entropy error value for each data point in the batch. We wish to minimise the mean cross-entropy error across the full dataset and will use the mean of the error on the batch as a stochastic estimator of this value. In TensorFlow, ops which *reduce* a tensor along one or more dimensions, for example by taking a sum, mean, or product, are prefixed with `reduce_`, with the default behaviour being to perform the reduction across all dimensions of the input tensor and return a scalar output. Therefore the second line below will take the per-datapoint cross-entropy errors and produce a single mean value across the whole batch."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"per_datapoint_errors = tf.nn.softmax_cross_entropy_with_logits(logits=outputs, labels=targets)\n",
"error = tf.reduce_mean(per_datapoint_errors)"
]
},
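{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see why combining the softmax with the error computation helps numerically, the sketch below computes per-datapoint softmax cross-entropy errors directly from logits in NumPy using the log-sum-exp trick, then averages them in the same way as `tf.reduce_mean`. It is an illustrative re-implementation, not TensorFlow's actual kernel, and the function name `softmax_cross_entropy_np` is hypothetical. Computing `np.exp(1000.)` naively would overflow to infinity, whereas the shifted version stays finite.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def softmax_cross_entropy_np(logits, labels):\n",
"    # Subtracting the row maximum leaves the softmax unchanged but\n",
"    # prevents overflow in np.exp for large logits\n",
"    shifted = logits - logits.max(axis=-1, keepdims=True)\n",
"    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))\n",
"    # One cross-entropy value per datapoint (row), as with the TensorFlow op\n",
"    return -(labels * log_probs).sum(axis=-1)\n",
"\n",
"logits_np = np.array([[1000., 0., 0.], [0., 0., 0.]])\n",
"labels_np = np.array([[1., 0., 0.], [0., 1., 0.]])\n",
"per_datapoint_np = softmax_cross_entropy_np(logits_np, labels_np)  # values 0 and log(3)\n",
"mean_error_np = per_datapoint_np.mean()  # analogue of tf.reduce_mean over the batch\n",
"```"
]
},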
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Although for the purposes of training we will use the cross-entropy error, as this is differentiable, for evaluation we will also be interested in the classification accuracy, i.e. what proportion of the predicted classes correspond to the true target labels. We can calculate this in TensorFlow similarly to how we used NumPy to do this previously: we use the TensorFlow `tf.argmax` op to find the index along the class dimension corresponding to the maximum predicted class probability and check if this is equal to the index along the class dimension of the 1-of-K encoded target labels. Analogously to the error computation above, this computes per-datapoint values which we then average with a `reduce_mean` op to produce the classification accuracy for a batch."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"per_datapoint_pred_is_correct = tf.equal(tf.argmax(outputs, 1), tf.argmax(targets, 1))\n",
"accuracy = tf.reduce_mean(tf.cast(per_datapoint_pred_is_correct, tf.float32))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As mentioned previously, TensorFlow is able to automatically calculate gradients of scalar computation graph outputs with respect to tensors in the computation graph. We can explicitly construct a new sub-graph corresponding to the gradient of a scalar with respect to one or more tensors in the graph using the [`tf.gradients`](https://www.tensorflow.org/api_docs/python/train/gradient_computation) function. \n",
"\n",
"However, TensorFlow also includes a number of higher-level `Optimizer` classes in the `tf.train` module that internally deal with constructing graphs corresponding to the gradients of some scalar loss with respect to one or more `Variable` tensors in the graph (usually corresponding to model parameters) and then using these gradients to update the variables (roughly equivalent to the `LearningRule` classes in the `mlp` framework). The most basic `Optimizer` is the `GradientDescentOptimizer`, which simply adds operations corresponding to basic (stochastic) gradient descent to the graph (i.e. no momentum, adaptive learning rates etc.). The `__init__` constructor method for this class takes a `learning_rate` argument corresponding to the gradient descent learning rate / step size encountered previously.\n",
"\n",
"Usually we are not interested in the `Optimizer` object other than for adding operations to the graph corresponding to the optimisation steps. This can be achieved using the `minimize` method of the object, which takes as its first argument the tensor corresponding to the scalar loss / error to be minimised. A further optional keyword argument `var_list` can be used to specify a list of variables to compute the gradients of the loss with respect to and update; by default this is set to `None`, which indicates to use all trainable variables in the current graph. The `minimize` method returns an operation corresponding to applying the gradient updates to the variables - we need to store a reference to this to allow us to run these operations later. We do not need to store a reference to the optimizer itself, as we have no further need of this object, so the steps of constructing the `Optimizer` and calling `minimize` are commonly combined in a single line as below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"train_step = tf.train.GradientDescentOptimizer(learning_rate=0.5).minimize(error)"
]
},
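{
"cell_type": "markdown",
"metadata": {},
"source": [
"For intuition about what `minimize` adds to the graph for this particular model, the sketch below performs one plain gradient descent step for softmax regression in NumPy. It relies on the standard result that the gradient of the batch-mean softmax cross-entropy with respect to the logits is the difference between the softmax probabilities and the targets, divided by the batch size. This is a hand-derived illustration for the model above; TensorFlow derives the equivalent gradients automatically, and the function name `sgd_step_np` is hypothetical.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def sgd_step_np(weights, biases, inputs, targets, learning_rate=0.5):\n",
"    outputs = inputs.dot(weights) + biases\n",
"    # Numerically stable softmax over the class dimension\n",
"    exp_out = np.exp(outputs - outputs.max(axis=1, keepdims=True))\n",
"    probs = exp_out / exp_out.sum(axis=1, keepdims=True)\n",
"    # Gradient of the batch-mean cross-entropy with respect to the logits\n",
"    grad_outputs = (probs - targets) / inputs.shape[0]\n",
"    # Chain rule back through the matmul and the bias addition\n",
"    grad_weights = inputs.T.dot(grad_outputs)\n",
"    grad_biases = grad_outputs.sum(axis=0)\n",
"    # Basic gradient descent update: no momentum or adaptive learning rates\n",
"    new_weights = weights - learning_rate * grad_weights\n",
"    new_biases = biases - learning_rate * grad_biases\n",
"    return new_weights, new_biases\n",
"\n",
"x_np = np.array([[1., 0.], [0., 1.]])\n",
"t_np = np.array([[1., 0.], [0., 1.]])\n",
"new_w, new_b = sgd_step_np(np.zeros((2, 2)), np.zeros(2), x_np, t_np)\n",
"```"
]
},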
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have now constructed a computation graph which can compute predicted outputs, use these to calculate an error value (and accuracy) and use the gradients of the error with respect to the model parameter variables to update their values with a gradient descent step.\n",
"\n",
"Although we have defined our computation graph, we have not yet initialised any tensor data in memory - all of the tensor variables defined above are just symbolic representations of parts of the computation graph. We can think of the computation graph as a whole as being similar to a function - it defines a sequence of operations but does not directly run those operations on data itself.\n",
"\n",
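"To make the function analogy concrete, the toy sketch below builds a graph first and evaluates it later. It is plain Python, and the `Placeholder`, `Op` and `run_graph` names are hypothetical stand-ins that only loosely imitate TensorFlow: nothing is computed when nodes are created, `run_graph` evaluates only the nodes that the requested output depends on, and it raises an error if a needed placeholder has not been fed a value.\n",
"\n",
"```python\n",
"class Placeholder:\n",
"    def __init__(self, name):\n",
"        self.name = name\n",
"\n",
"class Op:\n",
"    def __init__(self, fn, *inputs):\n",
"        self.fn = fn          # function applied to the evaluated inputs\n",
"        self.inputs = inputs  # upstream nodes this op depends on\n",
"\n",
"def run_graph(node, feed_dict):\n",
"    # Recursively evaluate only the nodes that this node depends on\n",
"    if isinstance(node, Placeholder):\n",
"        if node not in feed_dict:\n",
"            raise ValueError('You must feed a value for ' + node.name)\n",
"        return feed_dict[node]\n",
"    return node.fn(*[run_graph(n, feed_dict) for n in node.inputs])\n",
"\n",
"x = Placeholder('x')\n",
"w = Placeholder('w')\n",
"y = Op(lambda a, b: a * b, x, w)  # defines y = x * w; nothing is computed yet\n",
"z = Op(lambda a: a + 1, x)        # depends only on x\n",
"\n",
"print(run_graph(y, {x: 3, w: 2}))  # 6\n",
"print(run_graph(z, {x: 3}))        # 4 (w is not needed, so it need not be fed)\n",
"```\n",
"\n",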
"To run the operations in (part of) a TensorFlow graph we need to create a [`Session`](https://www.tensorflow.org/api_docs/python/client/session_management) object:\n",
"\n",
"> A `Session` object encapsulates the environment in which `Operation` objects are executed, and `Tensor` objects are evaluated.\n",
"\n",
"A session object can be constructed using either `tf.Session()` or `tf.InteractiveSession()`. The only difference in the latter is that it installs itself as the default session on construction. This can be useful in interactive contexts such as shells or the notebook interface, in which an alternative to using the session `run` method (see below) is to call the `eval` method of a tensor, e.g. `tensor.eval()`, or the `run` method of an operation, e.g. `op.run()`. Generally the session to use needs to be passed to these methods; however, if an interactive session is used, it is set as the default session and used automatically."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sess = tf.InteractiveSession()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The key property of a session object is its `run` method. This takes an operation (or list of operations) in a defined graph as an argument and runs the parts of the computation graph necessary to evaluate the output(s) (if any) of the operation(s), and additionally performs any updates to variable states defined by the graph (e.g. gradient updates of parameters). The output values, if any, of the operation(s) are returned by the `run` call.\n",
"\n",
"A standard operation which needs to be called before any other operations on a graph which includes variable nodes is a variable *initializer* operation. This, as the name suggests, initialises the values of the variables in the session to the values defined by the `initial_value` argument when adding the variables to the graph. For instance, for the graph we have defined here this will initialise the `weights` variable value in the session to a 2D array of zeros of shape `(784, 47)` and the `biases` variable to a 1D array of zeros of shape `(47,)`.\n",
"\n",
"We can access initializer ops for each variable individually using the `initializer` property of the variables in question and run these individually; however, a common pattern is to use the `tf.global_variables_initializer()` function to create a single initializer op which will initialise all globally defined variables in the default graph and then run this, as done below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"init_op = tf.global_variables_initializer()\n",
"sess.run(init_op)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We are now almost ready to begin training our model; however, as a final step we need to create objects for accessing batches of EMNIST input and target data. The TensorFlow tutorial code in `tensorflow.examples.tutorials.mnist` includes an `input_data` sub-module which provides a `read_data_sets` function for downloading the MNIST data and constructing an object for iterating over it. However, we already have the MNIST and EMNIST data provider classes that we used extensively last semester, along with local copies of the MNIST and EMNIST data, so we will use those here as they provide all the necessary functionality."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import data_providers\n",
"train_data = data_providers.EMNISTDataProvider('train', batch_size=50, flatten=True, one_hot=True)\n",
"valid_data = data_providers.EMNISTDataProvider('valid', batch_size=50, flatten=True, one_hot=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We are now all set to train our model. As when training models last semester, the training procedure will involve two nested loops - an outer loop corresponding to multiple full-passes through the dataset or *epochs* and an inner loop iterating over individual batches in the training data.\n",
"\n",
"The `init_op` we ran with `sess.run` previously did not depend on the placeholders `inputs` and `targets` in our graph, so we simply ran it with `sess.run(init_op)`. The `train_step` operation corresponding to the gradient-based updates of the `weights` and `biases` parameter variables does, however, depend on the `inputs` and `targets` placeholders, so we need to specify values to *feed* into these placeholders; as we wish the gradient updates to be calculated with respect to a batch of inputs and targets, the values that we feed in are the input and target batches. This is specified using the `feed_dict` keyword argument to the session `run` method. As the name suggests, this should be a Python dictionary (`dict`) whose keys are references to the tensors in the graph to feed values into and whose values are the corresponding arrays to feed in (typically NumPy `ndarray` instances); here we have `feed_dict = {inputs: input_batch, targets: target_batch}`.\n",
"\n",
"Another difference in our use of the session `run` method below is that we call it with a list of two operations - `[train_step, error]` rather than just a single operation. This allows the output (and variable updates) of multiple operations in a graph to be evaluated together - here we both run the `train_step` operation to update the parameter values and evaluate the `error` operation to return the mean error on the batch. Although we could split this into two separate session `run` calls, as the operations calculating the batch error will need to be evaluated when running the `train_step` operation (as this is the value gradients are calculated with respect to) this would involve redoing some of the computation and so be less efficient than combining them in a single `run` call.\n",
"\n",
"As we are running two different operations, the `run` method returns two values here. The `train_step` operation has no outputs and so the first return value is `None`; in the code below we assign this to `_`, a common convention in Python code for assigning return values we are not interested in using. The second return value is the average error across the batch, which we assign to `batch_error` and use to keep a running average of the error across each epoch."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"num_epoch = 20\n",
"for e in range(num_epoch):\n",
"    running_error = 0.\n",
"    for input_batch, target_batch in train_data:\n",
"        _, batch_error = sess.run(\n",
"            [train_step, error],\n",
"            feed_dict={inputs: input_batch, targets: target_batch})\n",
"        running_error += batch_error\n",
"    running_error /= train_data.num_batches\n",
"    print('End of epoch {0}: running error average = {1:.2f}'.format(e + 1, running_error))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To check your understanding of using session objects to evaluate parts of a graph and feeding values into a graph, complete the definition of the function in the cell below. This should iterate across all batches in a provided data provider and calculate the error and classification accuracy for each, accumulating the average error and accuracy values across the whole dataset and returning these as a tuple."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def get_error_and_accuracy(data):\n",
"    \"\"\"Calculate average error and classification accuracy across a dataset.\n",
"    \n",
"    Args:\n",
"        data: Data provider which iterates over input-target batches in dataset.\n",
"    \n",
"    Returns:\n",
"        Tuple with first element scalar value corresponding to average error\n",
"        across all batches in dataset and second value corresponding to\n",
"        average classification accuracy across all batches in dataset.\n",
"    \"\"\"\n",
"    err = 0.\n",
"    acc = 0.\n",
"    for input_batch, target_batch in data:\n",
"        # Evaluate both ops in a single run call so shared parts of the graph\n",
"        # (e.g. the model outputs) are only computed once per batch\n",
"        batch_err, batch_acc = sess.run(\n",
"            [error, accuracy],\n",
"            feed_dict={inputs: input_batch, targets: target_batch})\n",
"        err += batch_err\n",
"        acc += batch_acc\n",
"    err /= data.num_batches\n",
"    acc /= data.num_batches\n",
"    return err, acc"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Test your implementation by running the cell below - this should print the error and accuracy of the trained model on the validation and training datasets if implemented correctly."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print('Train data: Error={0:.2f} Accuracy={1:.2f}'\n",
" .format(*get_error_and_accuracy(train_data)))\n",
"print('Valid data: Error={0:.2f} Accuracy={1:.2f}'\n",
" .format(*get_error_and_accuracy(valid_data)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 2: Explicit graphs, name scopes, summaries and TensorBoard\n",
"\n",
"In the exercise above we introduced most of the basic concepts needed for constructing graphs in TensorFlow and running graph operations. However, in an attempt to avoid introducing too many new terms and too much new syntax at once, we skipped over some of the non-essential elements of creating and running models in TensorFlow, in particular some of the provided functionality for organising and structuring the computation graphs created and for monitoring the progress of training runs.\n",
"\n",
"Now that you are hopefully more familiar with the basics of TensorFlow, we will introduce some of these features, as they are likely to prove useful when you are building and training more complex models in the rest of this semester.\n",
"\n",
"Although we started off by motivating TensorFlow as a framework which builds computation graphs, in the code above we never explicitly referenced a graph object. This is because TensorFlow always registers a default graph at start up and all operations are added to this graph by default. The default graph can be accessed using `tf.get_default_graph()`. For example running the code in the cell below will assign a reference to the default graph to `default_graph` and print the total number of operations in the current graph definition."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"default_graph = tf.get_default_graph()\n",
"print('Number of operations in graph: {0}'\n",
" .format(len(default_graph.get_operations())))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also explicitly create a new graph object using `tf.Graph()`. This may be useful if we wish to build up several independent computation graphs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"graph = tf.Graph()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To add operations to a constructed graph object, we use the `graph.as_default()` [context manager](http://book.pythontips.com/en/latest/context_managers.html). Context managers are used with the `with` statement in Python - `with context_manager:` opens a block in Python in which a special `__enter__` method of the `context_manager` object is called before the code in the block is run and a further special `__exit__` method is run after the block code has finished execution. This can be used, for example, to manage the allocation of resources (e.g. file handles), but also to locally change some 'context' in the code - in the example here, `graph.as_default()` is a context manager which changes the default graph within the following block to be `graph` before returning to the previous default graph once the block code is finished running. Context managers are used extensively in TensorFlow so it is worth being familiar with how they work.\n",
"\n",
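"As an illustration of this mechanism, the toy sketch below implements an `as_default`-style context manager in plain Python using `__enter__` and `__exit__`. The `ToyGraph` class and the module-level `_default_graph` variable are hypothetical stand-ins, not TensorFlow's implementation: entering the block installs a new default and exiting restores the previous one, so nested blocks behave as expected.\n",
"\n",
"```python\n",
"_default_graph = None  # module-level default, initially unset\n",
"\n",
"class ToyGraph:\n",
"    def __init__(self, name):\n",
"        self.name = name\n",
"\n",
"    def as_default(self):\n",
"        return _DefaultGraphContext(self)\n",
"\n",
"class _DefaultGraphContext:\n",
"    def __init__(self, graph):\n",
"        self.graph = graph\n",
"\n",
"    def __enter__(self):\n",
"        global _default_graph\n",
"        self.previous = _default_graph  # remember the old default\n",
"        _default_graph = self.graph     # install the new default\n",
"        return self.graph\n",
"\n",
"    def __exit__(self, exc_type, exc_value, traceback):\n",
"        global _default_graph\n",
"        _default_graph = self.previous  # restore, even if an exception occurred\n",
"        return False                    # do not suppress any exception\n",
"\n",
"outer = ToyGraph('outer')\n",
"inner = ToyGraph('inner')\n",
"with outer.as_default():\n",
"    with inner.as_default():\n",
"        print(_default_graph.name)  # inner\n",
"    print(_default_graph.name)      # outer\n",
"print(_default_graph)               # None\n",
"```\n",
"\n",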
"Another common context manager usage in TensorFlow is to define *name scopes*. As we encountered earlier, individual operations in a TensorFlow graph can be assigned names. As we will see later this is useful for making graphs interpretable when we use the tools provided in TensorFlow for visualising them. As computation graphs can become very big (even the quite simple graph we created in the first exercise has around 100 operations in it) even with interpretable names attached to the graph operations it can still be difficult to understand and debug what is happening in a graph. Therefore rather than simply allowing a single-level naming scheme to be applied to the individual operations in the graph, TensorFlow supports hierachical naming of sub-graphs. This allows sets of related operations to be grouped together under a common name, and thus allows both higher and lower level structure in a graph to be easily identified.\n",
"\n",
"This hierarchical naming is performed using the name scope context manager `tf.name_scope('name')`. Starting a block with `with tf.name_scope('name'):` will cause all of the operations added to a graph within that block to be grouped under the name specified in the `tf.name_scope` call. Name scope blocks can be nested to allow finer-grained sub-groupings of operations. Name scopes can be used to group operations at various levels, e.g. separating operations for inference/prediction from those for training, or grouping the operations which correspond to the classical definition of a neural network layer.\n",
"\n",
"The code in the cell below uses both a `graph.as_default()` context manager and name scopes to create a second copy of the computation graph corresponding to softmax regression that we constructed in the previous exercise."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"with graph.as_default():\n",
" with tf.name_scope('data'):\n",
" inputs = tf.placeholder(tf.float32, [None, 784], name='inputs')\n",
" targets = tf.placeholder(tf.float32, [None, 47], name='targets')\n",
" with tf.name_scope('parameters'):\n",
" weights = tf.Variable(tf.zeros([784, 47]), name='weights')\n",
" biases = tf.Variable(tf.zeros([47]), name='biases')\n",
" with tf.name_scope('model'):\n",
" outputs = tf.matmul(inputs, weights) + biases\n",
" with tf.name_scope('error'):\n",
" error = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=outputs, labels=targets))\n",
" with tf.name_scope('train'):\n",
" train_step = tf.train.GradientDescentOptimizer(0.5).minimize(error)\n",
" with tf.name_scope('accuracy'):\n",
" accuracy = tf.reduce_mean(tf.cast(\n",
" tf.equal(tf.argmax(outputs, 1), tf.argmax(targets, 1)), tf.float32))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As hinted earlier, TensorFlow comes with tools for visualising computation graphs. In particular [TensorBoard](https://www.tensorflow.org/how_tos/summaries_and_tensorboard/) is an interactive web application for, amongst other things, visualising TensorFlow computation graphs (we will explore some of its other functionality in the latter part of the exercise). Typically TensorBoard is launched from a terminal and a browser is used to connect to the resulting locally running TensorBoard server instance. However, for the purposes of graph visualisation it is also possible to embed a remotely-served TensorBoard graph visualisation interface in a Jupyter notebook using the helper function below (a slight variant of the recipe in [this notebook](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/deepdream/deepdream.ipynb)).\n",
"\n",
"<span style='color: red; font-weight: bold;'>Note: The code below does not seem to work for some people when accessing the notebook in Firefox. You can either try loading the notebook in an alternative browser, or skip this section for now and explore the graph visualisation tool when launching TensorBoard below.</span>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from IPython.display import display, HTML\n",
"import datetime\n",
"\n",
"def show_graph(graph_def, frame_size=(900, 600)):\n",
" \"\"\"Visualize TensorFlow graph.\"\"\"\n",
" if hasattr(graph_def, 'as_graph_def'):\n",
" graph_def = graph_def.as_graph_def()\n",
" timestamp = datetime.datetime.now().strftime(\"%Y-%m-%d_%H-%M-%S\")\n",
" code = \"\"\"\n",
" <script>\n",
" function load() {{\n",
" document.getElementById(\"{id}\").pbtxt = {data};\n",
" }}\n",
" </script>\n",
" <link rel=\"import\" href=\"https://tensorboard.appspot.com/tf-graph-basic.build.html\" onload=load()>\n",
" <div style=\"height:{height}px\">\n",
" <tf-graph-basic id=\"{id}\"></tf-graph-basic>\n",
" </div>\n",
" \"\"\".format(height=frame_size[1], data=repr(str(graph_def)), id='graph'+timestamp)\n",
" iframe = \"\"\"\n",
" <iframe seamless style=\"width:{width}px;height:{height}px;border:0\" srcdoc=\"{src}\"></iframe>\n",
" \"\"\".format(width=frame_size[0], height=frame_size[1] + 20, src=code.replace('\"', '&quot;'))\n",
" display(HTML(iframe))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Run the cell below to display a visualisation of the graph we just defined. Notice that by default all operations within a particular name scope are grouped under a single node; this allows the top-level structure of the graph, and how data flows between the various components, to be easily visualised. However, we can also expand these nodes to inspect the operations within them: simply double-click on one of the nodes to do this (double-clicking on the expanded node will cause it to collapse again). If you expand the `model` node you should see a graph closely mirroring the affine transform example given as a motivation above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"show_graph(graph)"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.2"
}
},
"nbformat": 4,
"nbformat_minor": 1
}

@ -0,0 +1,99 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# TensorFlow Experimentation Setup"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"In the previous tutorial we introduced the TensorFlow framework and some of the very basic functionality it provides. In this tutorial we will present a far more readable, research-oriented TensorFlow-based codebase that allows one to quickly build new model architectures and research experiments. The proposed code structure has been tested in real research and has proven to be very readable and easily modifiable. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The tf_mlp package contains all the modules needed to easily train and evaluate a classifier. It contains the following modules:\n",
"\n",
" 1. utils: \n",
" 1. network_summary: Provides utilities with which one can get network summaries, such as the number of parameters and names of layers.\n",
" 2. parser_utils: Used to parse arguments passed to the training scripts.\n",
" 3. storage: Responsible for storing network statistics.\n",
" 2. data_providers.py: Provides the data providers for training, validation and testing.\n",
" 3. network_architectures.py: Defines the network architectures. We provide VGGNet as an example.\n",
" 4. network_builder.py: Builds the TensorFlow computation graph. In more detail, it builds the losses, TensorFlow summaries and training operations.\n",
" 5. network_trainer.py: Runs an experiment, composed of training, validation and testing. It is set up to take command-line arguments, so that one can easily write multiple bash scripts with different hyperparameters and run experiments very quickly with minimal code changes.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To run an experiment just run:\n",
"\n",
"```\n",
"python network_trainer.py --batch_size 128 --epochs 100 --experiment_prefix VGG_EMNIST --tensorboard_use True --batch_norm_use True --strided_dim_reduction True --seed 16122017\n",
"```\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The arguments after network_trainer.py can be changed to suit your experimental needs. For more arguments, and to see how to add new arguments of your own, please look at parser_utils.py under utils and at network_trainer.py, as together they provide all the functionality necessary to add arguments."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, remember to make sure your code is not just efficient but also readable. Research code has a very bad reputation for readability, so let's try to improve it one line of research code at a time."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To run TensorBoard, point it at the correct logs directory as follows:\n",
" ```tensorboard --port 60xx --logdir /path/to/logs```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
""
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3.0
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
}

Binary file not shown.

File diff suppressed because it is too large

Binary file not shown.

BIN
notebooks/res/rnn-graph.png Normal file

Binary file not shown.

Binary file not shown.

@ -0,0 +1,286 @@
## Datasets Available on AFS
For your convenience we provide data providers for the CIFAR-10/100 and Million Song datasets. Below you can find information on the datasets and the AFS paths where you can find them.
## CIFAR-10 and CIFAR-100 datasets
[CIFAR-10 and CIFAR-100](https://www.cs.toronto.edu/~kriz/cifar.html) are a pair of image classification datasets collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. They are labelled subsets of the much larger [80 million tiny images](dataset). They are a common benchmark task for image classification; a list of current accuracy benchmarks for both datasets is maintained by Rodrigo Benenson [here](http://rodrigob.github.io/are_we_there_yet/build/).
As the name suggests, CIFAR-10 has images in 10 classes:

* airplane
* automobile
* bird
* cat
* deer
* dog
* frog
* horse
* ship
* truck

There are 6000 images per class, for an overall dataset size of 60000. Each image has three (RGB) colour channels and pixel dimension 32×32, corresponding to a total dimension per input image of 3×32×32=3072. For each colour channel the input values have been normalised to the range [0, 1].

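
To view or convolve over the inputs you will usually want each flat 3072-dimensional vector back in image form. The sketch below assumes the inputs keep the channel-major layout of the original CIFAR release (all 1024 red values, then green, then blue, each stored row by row); this layout is an assumption about the provided `.npz` files, so check it by plotting a few images before relying on it. `flat_to_images` is a hypothetical helper, not part of the `mlp` package.

```python
import numpy as np

def flat_to_images(flat_inputs):
    """Reshape flat CIFAR inputs of shape (N, 3072) into images of shape (N, 32, 32, 3).

    Assumes the channel-major layout of the original CIFAR release: the first
    1024 values of each row are the red channel, the next 1024 green and the
    last 1024 blue, each stored row by row. Check this against the provided files.
    """
    flat_inputs = np.asarray(flat_inputs)
    # (N, 3072) -> (N, 3, 32, 32): channels first
    channels_first = flat_inputs.reshape(-1, 3, 32, 32)
    # (N, 3, 32, 32) -> (N, 32, 32, 3): channels last, as expected by e.g. matplotlib's imshow
    return channels_first.transpose(0, 2, 3, 1)

# Synthetic stand-in for a batch of two inputs with values in [0, 1]
batch = np.random.rand(2, 3 * 32 * 32)
images = flat_to_images(batch)
print(images.shape)  # (2, 32, 32, 3)
```
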
CIFAR-100 has images of identical dimensions to CIFAR-10 but rather than 10 classes they are instead split across 100 fine-grained classes (and 20 coarser 'superclasses' comprising multiple finer classes):
<table style='border: none;'>
<tbody><tr style='font-weight: bold;'>
<td>Superclass</td>
<td>Classes</td>
</tr>
<tr>
<td>aquatic mammals</td>
<td>beaver, dolphin, otter, seal, whale</td>
</tr>
<tr>
<td>fish</td>
<td>aquarium fish, flatfish, ray, shark, trout</td>
</tr>
<tr>
<td>flowers</td>
<td>orchids, poppies, roses, sunflowers, tulips</td>
</tr>
<tr>
<td>food containers</td>
<td>bottles, bowls, cans, cups, plates</td>
</tr>
<tr>
<td>fruit and vegetables</td>
<td>apples, mushrooms, oranges, pears, sweet peppers</td>
</tr>
<tr>
<td>household electrical devices</td>
<td>clock, computer keyboard, lamp, telephone, television</td>
</tr>
<tr>
<td>household furniture</td>
<td>bed, chair, couch, table, wardrobe</td>
</tr>
<tr>
<td>insects</td>
<td>bee, beetle, butterfly, caterpillar, cockroach</td>
</tr>
<tr>
<td>large carnivores</td>
<td>bear, leopard, lion, tiger, wolf</td>
</tr>
<tr>
<td>large man-made outdoor things</td>
<td>bridge, castle, house, road, skyscraper</td>
</tr>
<tr>
<td>large natural outdoor scenes</td>
<td>cloud, forest, mountain, plain, sea</td>
</tr>
<tr>
<td>large omnivores and herbivores</td>
<td>camel, cattle, chimpanzee, elephant, kangaroo</td>
</tr>
<tr>
<td>medium-sized mammals</td>
<td>fox, porcupine, possum, raccoon, skunk</td>
</tr>
<tr>
<td>non-insect invertebrates</td>
<td>crab, lobster, snail, spider, worm</td>
</tr>
<tr>
<td>people</td>
<td>baby, boy, girl, man, woman</td>
</tr>
<tr>
<td>reptiles</td>
<td>crocodile, dinosaur, lizard, snake, turtle</td>
</tr>
<tr>
<td>small mammals</td>
<td>hamster, mouse, rabbit, shrew, squirrel</td>
</tr>
<tr>
<td>trees</td>
<td>maple, oak, palm, pine, willow</td>
</tr>
<tr>
<td>vehicles 1</td>
<td>bicycle, bus, motorcycle, pickup truck, train</td>
</tr>
<tr>
<td>vehicles 2</td>
<td>lawn-mower, rocket, streetcar, tank, tractor</td>
</tr>
</tbody></table>
Each class has 600 examples in it, giving an overall dataset size of 60000 i.e. the same as CIFAR-10.
Both CIFAR-10 and CIFAR-100 have standard splits into 50000 training examples and 10000 test examples. For CIFAR-100, as there is an optional Kaggle competition (see below) scored on predictions on the test set, we have used a non-standard assignment of examples to the test and training sets and only provided the inputs (and not the target labels) for the 10000 examples chosen for the test set.
For CIFAR-10 the 10000 test set examples have labels provided: to avoid any accidental over-fitting to the test set **you should only use these for the final evaluation of your model(s)**. If you repeatedly evaluate models on the test set during model development it is easy to end up indirectly fitting to the test labels - for those who have not already read it see this [excellent cautionary note from the MLPR notes by Iain Murray](http://www.inf.ed.ac.uk/teaching/courses/mlpr/2016/notes/w2a_train_test_val.html#warning-dont-fool-yourself-or-make-a-fool-of-yourself).
For both CIFAR-10 and CIFAR-100, the remaining 50000 non-test examples have been split into a 40000 example training dataset and a 10000 example validation dataset, each with target labels provided. If you wish to use a more complex validation scheme such as k-fold cross-validation, you may want to combine these two portions of the dataset and define your own functions for separating out a validation set.
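
As a starting point for such a scheme, here is a minimal sketch of k-fold cross-validation over the combined training and validation examples. `k_fold_indices` is a hypothetical helper, and the small zero/one arrays stand in for the real inputs you would get from the data providers or `.npz` files.

```python
import numpy as np

def k_fold_indices(num_examples, num_folds, seed=None):
    """Yield (train_idx, valid_idx) index arrays for k-fold cross-validation."""
    rng = np.random.RandomState(seed)
    # Shuffle once, then cut the permutation into num_folds near-equal folds
    perm = rng.permutation(num_examples)
    folds = np.array_split(perm, num_folds)
    for k in range(num_folds):
        valid_idx = folds[k]
        # Every fold except the k-th one is used for training
        train_idx = np.concatenate([folds[j] for j in range(num_folds) if j != k])
        yield train_idx, valid_idx

# Hypothetical stand-ins for the provided training and validation inputs
train_inputs, valid_inputs = np.zeros((40, 3072)), np.ones((10, 3072))
all_inputs = np.concatenate([train_inputs, valid_inputs], axis=0)  # (50, 3072)

folds = list(k_fold_indices(all_inputs.shape[0], num_folds=5, seed=0))
for train_idx, valid_idx in folds:
    fold_train, fold_valid = all_inputs[train_idx], all_inputs[valid_idx]
    print(fold_train.shape, fold_valid.shape)  # (40, 3072) (10, 3072)
```

With the real 50000 examples and `num_folds=5`, each fold reproduces the 40000 / 10000 proportions of the provided split. Remember to index the targets with the same index arrays.
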
Data provider classes for both CIFAR-10 and CIFAR-100 are available in the `mlp.data_providers` module. Both behave similarly to the `MNISTDataProvider` used extensively last semester. A `which_set` argument can be used to specify whether to return a data provider for the training dataset (`which_set='train'`) or the validation dataset (`which_set='valid'`).
The CIFAR-100 data provider also takes an optional `use_coarse_targets` argument in its constructor. By default this is set to `False` and the targets returned by the data provider correspond to 1-of-K encoded binary vectors for the 100 fine-grained object classes. If `use_coarse_targets=True`, the data provider will instead return 1-of-K encoded binary vector targets for the 20 coarse-grained superclasses associated with each input.
Both data provider classes provide a `label_map` attribute which is a list of strings which are the class labels corresponding to the integer targets (i.e. prior to conversion to a 1-of-K encoded binary vector).
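
To make the relationship between integer targets, 1-of-K vectors and `label_map` concrete, the sketch below encodes integer labels as 1-of-K vectors and decodes them back to class names. The helpers are hypothetical illustrations rather than the `mlp` package's own code. The `label_map` list here is a stand-in using the CIFAR-10 class names in the order listed above; in practice, use the provider's own `label_map` attribute rather than assuming this ordering.

```python
import numpy as np

def to_one_of_k(int_targets, num_classes):
    """Convert integer class labels of shape (N,) to 1-of-K binary vectors of shape (N, K)."""
    int_targets = np.asarray(int_targets)
    one_of_k = np.zeros((int_targets.shape[0], num_classes))
    # Set a single 1 in each row, in the column given by that row's label
    one_of_k[np.arange(int_targets.shape[0]), int_targets] = 1
    return one_of_k

def decode_labels(one_of_k_targets, label_map):
    """Map 1-of-K target vectors back to class-name strings via label_map."""
    return [label_map[i] for i in np.argmax(one_of_k_targets, axis=1)]

# Stand-in label_map (assumed ordering; prefer the data provider's attribute)
label_map = ['airplane', 'automobile', 'bird', 'cat', 'deer',
             'dog', 'frog', 'horse', 'ship', 'truck']
targets = to_one_of_k([3, 0, 9], num_classes=10)
print(decode_labels(targets, label_map))  # ['cat', 'airplane', 'truck']
```
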
### Accessing the CIFAR-10 and CIFAR-100 data
Before using the data provider objects you will need to make sure the data files exist under the directory specified by the `MLP_DATA_DIR` path, so that they are accessible to the `mlp` package.
The data is available as compressed NumPy `.npz` files in the AFS directory `/afs/inf.ed.ac.uk/group/teaching/mlp/data/2017-18/`.
If you are working on DICE, one option is to redefine your `MLP_DATA_DIR` to point directly to the shared AFS data directory by editing the `env_vars.sh` start-up file for your environment. This avoids using up your DICE quota on copies of the data files in your home space, but initial loading of the data when initialising the data providers may be slower if many people are trying to access the same files at once. The environment variable can be redefined by running
```
gedit ~/miniconda3/envs/mlp/etc/conda/activate.d/env_vars.sh
```
in a terminal window (assuming you installed `miniconda3` to your home directory), and changing the line
```
export MLP_DATA_DIR=$HOME/mlpractical/data
```
to
```
export MLP_DATA_DIR="/afs/inf.ed.ac.uk/group/teaching/mlp/data/2017-18/"
```
and then saving and closing the editor. You will need to reload the `mlp` environment using `source activate mlp` and restart the Jupyter notebook server in the reloaded environment for the new environment variable definition to be available.
For those working on DICE who have sufficient quota remaining, or those using their own machine, an alternative option is to copy the data files into your local `mlpractical/data` directory (or wherever your `MLP_DATA_DIR` environment variable currently points to, if different).
Assuming your local `mlpractical` repository is in your home directory you should be able to copy the required files on DICE by running
```
cp /afs/inf.ed.ac.uk/group/teaching/mlp/data/2017-18/cifar*.npz ~/mlpractical/data
```
On a non-DICE machine, you will need to either [set up local access to AFS](http://computing.help.inf.ed.ac.uk/informatics-filesystem), use a remote file transfer client such as `scp`, or download the files using the iFile web interface [here](https://ifile.inf.ed.ac.uk/?path=%2Fafs%2Finf.ed.ac.uk%2Fgroup%2Fteaching%2Fmlp%2Fdata&goChange=Go) (requires DICE credentials).
As some of the files are quite large you may wish to copy only those you are using currently (e.g. only the files for one of the two tasks) to your local filespace to avoid filling up your quota. The `cifar-100-test-inputs.npz` file will only be needed by those intending to enter the associated optional Kaggle competition.
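
Whichever option you choose, it can be worth checking from Python which data files are actually visible under `MLP_DATA_DIR` before constructing any data providers. The helper below is a hypothetical sketch (not part of the `mlp` package). It uses the same glob patterns as the `cp` commands above, and the demonstration runs on a throwaway directory so it does not depend on your setup.

```python
import glob
import os
import tempfile

def list_data_files(pattern, data_dir=None):
    """Return the sorted names of files in the data directory matching a glob pattern.

    data_dir defaults to the MLP_DATA_DIR environment variable, i.e. the
    directory the mlp data providers are expected to read from.
    """
    if data_dir is None:
        data_dir = os.environ.get('MLP_DATA_DIR')
    if not data_dir:
        raise RuntimeError('MLP_DATA_DIR is not set: reload the mlp environment')
    if not os.path.isdir(data_dir):
        raise RuntimeError('data directory does not exist: ' + data_dir)
    paths = glob.glob(os.path.join(data_dir, pattern))
    return sorted(os.path.basename(path) for path in paths)

# Demonstration on a throwaway directory standing in for MLP_DATA_DIR
demo_dir = tempfile.mkdtemp()
for name in ['cifar-100-test-inputs.npz', 'msd-10-genre-test-inputs.npz']:
    open(os.path.join(demo_dir, name), 'w').close()
print(list_data_files('cifar*.npz', data_dir=demo_dir))  # ['cifar-100-test-inputs.npz']
```

Inside the activated `mlp` environment, calling `list_data_files('cifar*.npz')` without `data_dir` checks the directory your environment variable actually points to.
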
## Genre classification with the Million Song Dataset
The [Million Song Dataset](http://labrosa.ee.columbia.edu/millionsong/) is a
> freely-available collection of audio features and metadata for a million contemporary popular music tracks
originally collected and compiled by Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere.
The dataset is intended to encourage development of algorithms in the field of [music information retrieval](https://en.wikipedia.org/wiki/Music_information_retrieval). The [data for each track](http://labrosa.ee.columbia.edu/millionsong/pages/example-track-description) includes both textual features such as artist and album names, numerical descriptors such as duration and various audio features derived using a music analysis platform provided by [The Echo Nest](https://en.wikipedia.org/wiki/The_Echo_Nest) (since acquired by Spotify). Of the various audio features and segmentations included in the full dataset, the most detailed information is included at a 'segment' level: each segment corresponds to an automatically identified 'quasi-stable music event' - roughly contiguous sections of the audio with similar perceptual quality. The number of segments per track is variable and each segment can itself be of variable length - typically they seem to be around 0.2 - 0.4 seconds but can be as long as 10 seconds or more.
For each segment of the track various extracted audio features are available: a 12 dimensional vector of [chroma features](https://en.wikipedia.org/wiki/Chroma_feature), a 12 dimensional vector of ['MFCC-like'](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum) timbre features and various measures of the loudness of the segment, including loudness at the segment start and maximum loudness. In the version of the data we provide, we include a 25 dimensional vector for each included segment, consisting of the 12 timbre features, 12 chroma features and loudness at start of segment concatenated in that order. To allow easier integration into standard feedforward models, the basic version of the data we provide includes features only for a fixed length crop of the central 120 segments of each track (tracks with fewer than 120 segments are therefore not included). This gives an overall input dimension per track of 120×25=3000. Each of the 3000 input dimensions has been preprocessed by subtracting the per-dimension mean across the training data and dividing by the per-dimension standard deviation across the training data.
We provide data providers for the fixed length crop version of the input features, with the inputs being returned in batches of 3000 dimensional vectors (these can be reshaped to (120, 25) to get the per-segment features). To allow for more complex variable-length sequence modelling with, for example, recurrent neural networks, we also provide a variable length version of the data. This is only provided as compressed NumPy (`.npz`) data files rather than data provider objects; you will need to write your own data provider if you wish to use this version of the data. As the inputs have a variable number of segments, they have been ['bucketed'](https://www.tensorflow.org/tutorials/seq2seq/#bucketing_and_padding) into groups of similar maximum length, with the following binning scheme used:
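
The reshaping and preprocessing described above can be sketched in NumPy as follows. This assumes each 3000-dimensional vector is 120 consecutive 25-dimensional segment vectors, each ordered as 12 timbre, 12 chroma and 1 start-loudness value. The helper names are hypothetical. The provided fixed length data is already standardised, so `standardise` only illustrates the preprocessing, for example if you build your own provider for the variable length data. In that case compute the statistics on the training set only.

```python
import numpy as np

NUM_SEGMENTS, NUM_FEATURES = 120, 25

def split_segment_features(flat_inputs):
    """Split flat (N, 3000) inputs into timbre, chroma and loudness per segment.

    Assumes each row is 120 consecutive 25-dim segment vectors ordered as
    12 timbre features, 12 chroma features, 1 loudness-at-start value.
    """
    segments = np.asarray(flat_inputs).reshape(-1, NUM_SEGMENTS, NUM_FEATURES)
    timbre = segments[:, :, 0:12]    # (N, 120, 12)
    chroma = segments[:, :, 12:24]   # (N, 120, 12)
    loudness = segments[:, :, 24]    # (N, 120)
    return timbre, chroma, loudness

def standardise(inputs, train_mean, train_std):
    """Per-dimension standardisation using statistics from the training data."""
    return (inputs - train_mean) / train_std

# Synthetic stand-in for 8 raw training tracks
raw_train = np.random.rand(8, NUM_SEGMENTS * NUM_FEATURES)
mean, std = raw_train.mean(axis=0), raw_train.std(axis=0)
train_inputs = standardise(raw_train, mean, std)
timbre, chroma, loudness = split_segment_features(train_inputs)
print(timbre.shape, chroma.shape, loudness.shape)  # (8, 120, 12) (8, 120, 12) (8, 120)
```
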

* 120 - 250 segments
* 251 - 500 segments
* 501 - 650 segments
* 651 - 800 segments
* 801 - 950 segments
* 951 - 1200 segments
* 1201 - 2000 segments
* 2001 - 4000 segments

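
If you write your own provider for the variable length data, you may want to know which bucket (and therefore which `inputs_{n}` key) a track with a given number of segments falls into. The sketch below encodes the binning scheme above as a sorted list of bucket maxima. It treats the last bucket as starting at 2001, since a track with exactly 2000 segments already belongs to the 1201 - 2000 bucket. `bucket_for_length` is a hypothetical helper.

```python
import bisect

# Maximum number of segments in each bucket, from the binning scheme above
BUCKET_MAX_SEGMENTS = [250, 500, 650, 800, 950, 1200, 2000, 4000]

def bucket_for_length(num_segments):
    """Return the maximum size of the bucket holding a track with num_segments segments.

    Tracks with fewer than 120 segments are not in the dataset; tracks with more
    than 4000 segments were truncated to 4000, so they fall in the last bucket.
    """
    if num_segments < 120:
        raise ValueError('tracks with fewer than 120 segments are not included')
    # First bucket whose maximum is >= num_segments
    index = bisect.bisect_left(BUCKET_MAX_SEGMENTS, num_segments)
    return BUCKET_MAX_SEGMENTS[min(index, len(BUCKET_MAX_SEGMENTS) - 1)]

print(bucket_for_length(250), bucket_for_length(251))  # 250 500
print('inputs_{}'.format(bucket_for_length(1500)))     # inputs_2000
```
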
For each bucket the NumPy data files include inputs and targets arrays with second dimension equal to the maximum segment size in the bucket (e.g. 250 for the first bucket) and first dimension equal to the number of tracks with number of segments in that bucket. These are named `inputs_{n}` and `targets_{n}` in the data file, where `{n}` is the maximal number of segments in the bucket, e.g. `inputs_250` and `targets_250` for the first bucket. For tracks with fewer segments than the maximum size in the bucket, the features for the track have been padded with `NaN` values. For tracks with more segments than the maximum bucket size of 4000, only the first 4000 segments have been included.
To allow you to match tracks between the fixed length and variable length datasets, the data files also include an array for each bucket giving the indices of the corresponding tracks in the fixed length input arrays. For example, the array `indices_250` will be an array of the same size as the first dimension of `inputs_250` and `targets_250`, with the first element of `indices_250` giving the index into the `inputs` and `targets` arrays of the fixed length data corresponding to the first element of `inputs_250` and `targets_250`.
The Million Song Dataset in its original form does not provide any genre labels; however, various external groups have proposed genre labels for portions of the data by cross-referencing the track IDs against external music tagging databases. Analogously to the provision of both simpler and more complex classification tasks for the CIFAR-10 / CIFAR-100 datasets, we provide two classification task datasets derived from the Million Song Dataset: one with 10 coarser level genre classes, and another with 25 finer-grained genre / style classifications.
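
Because of the `NaN` padding, a model cannot consume a bucket's inputs directly. A minimal sketch for recovering each track's true number of segments and a validity mask, then replacing the padding with zeros, is shown below. It assumes the inputs for a bucket have shape `(num_tracks, max_segments, 25)` and that padded segments are entirely `NaN`. Both are assumptions about the file layout to verify after loading. `lengths_and_mask` is a hypothetical helper.

```python
import numpy as np

def lengths_and_mask(padded_inputs):
    """Recover true sequence lengths and a validity mask from NaN-padded inputs.

    padded_inputs: assumed shape (num_tracks, max_segments, 25), with padding
    segments entirely NaN. Returns lengths of shape (num_tracks,) and a boolean
    mask of shape (num_tracks, max_segments) that is True for real segments.
    """
    mask = ~np.all(np.isnan(padded_inputs), axis=2)
    lengths = mask.sum(axis=1)
    return lengths, mask

# Hypothetical bucket of two tracks with 3 and 5 segments, padded to 5
inputs_5 = np.random.randn(2, 5, 25)
inputs_5[0, 3:] = np.nan
lengths, mask = lengths_and_mask(inputs_5)
print(lengths)  # [3 5]
# Replace the NaN padding (e.g. with zeros) before feeding the data to a model
inputs_filled = np.where(mask[:, :, None], inputs_5, 0.0)
```

The `lengths` array can then be used to ignore padded steps, for example as sequence lengths for a recurrent network.
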
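
In NumPy this matching is a single fancy-indexing operation. In the sketch below, small arrays stand in for the fixed length `inputs` / `targets` and for a bucket's `indices_250`. It assumes the fixed length targets are integer labels with one per track. Indexing the fixed length arrays with `indices_250` gathers, for each track in the bucket, the corresponding fixed length row.

```python
import numpy as np

# Hypothetical stand-ins for arrays loaded from the two data files
inputs_fixed = np.random.randn(6, 3000)        # fixed length inputs for 6 tracks
targets_fixed = np.array([4, 1, 7, 7, 0, 2])   # integer genre label per track (assumed)
indices_250 = np.array([5, 0, 3])              # bucket track i <-> fixed length track indices_250[i]

# Row i of each result corresponds to element i of inputs_250 / targets_250
bucket_fixed_inputs = inputs_fixed[indices_250]    # shape (3, 3000)
bucket_track_labels = targets_fixed[indices_250]   # [2 4 7]
```
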
The 10-genre classification task uses the [*CD2C tagtraum genre annotations*](http://www.tagtraum.com/msd_genre_datasets.html) derived from multiple source databases (beaTunes genre dataset, Last.fm dataset, Top-MAGD dataset), with the *CD2C* variant using only non-ambiguous annotations (i.e. not including tracks with multiple genre labels). Of the 15 genre labels provided in the CD2C annotations, 5 (World, Latin, Punk, Folk and New Age) were not included due to having fewer than 5000 examples available. This left 10 remaining genre classes:

* Rap
* Rock
* RnB
* Electronic
* Metal
* Blues
* Pop
* Jazz
* Country
* Reggae

For each of these 10 classes, 5000 labelled examples have been collected for training / validation (i.e. 50000 examples in total) and a further 1000 examples per class for testing, with the exception of the `Blues` class, for which only 991 testing examples are provided because there were insufficient labelled tracks of the minimum required length (i.e. a total of 9991 test examples).
The 9991 test set examples have labels provided; however, to avoid any accidental over-fitting to the test set **you should only use these for the final evaluation of your model(s)**. If you repeatedly evaluate models on the test set during model development it is easy to end up indirectly fitting to the test labels - for those who have not already read it, see this [excellent cautionary note in the MLPR notes by Iain Murray](http://www.inf.ed.ac.uk/teaching/courses/mlpr/2016/notes/w2a_train_test_val.html#warning-dont-fool-yourself-or-make-a-fool-of-yourself).
The 25-genre classification task uses the [*MSD Allmusic Style Dataset*](http://www.ifs.tuwien.ac.at/mir/msd/MASD.html) labels derived from the [AllMusic.com](http://www.allmusic.com/) database by [Alexander Schindler, Rudolf Mayer and Andreas Rauber of Vienna University of Technology](http://www.ifs.tuwien.ac.at/~schindler/pubs/ISMIR2012.pdf). The 25 genre / style labels used are:

* Big Band
* Blues Contemporary
* Country Traditional
* Dance
* Electronica
* Experimental
* Folk International
* Gospel
* Grunge Emo
* Hip Hop Rap
* Jazz Classic
* Metal Alternative
* Metal Death
* Metal Heavy
* Pop Contemporary
* Pop Indie
* Pop Latin
* Punk
* Reggae
* RnB Soul
* Rock Alternative
* Rock College
* Rock Contemporary
* Rock Hard
* Rock Neo Psychedelia

For each of these 25 classes, 2000 labelled examples have been collected for training / validation (i.e. 50000 examples in total). A further 400 examples per class have been collected for testing (i.e. 10000 examples in total), for which you are provided with the inputs but not the targets. The optional Kaggle competition being run for this dataset (see email) is scored based on the 25-genre class label predictions on these unlabelled test inputs.
The tracks used for the 25-genre classification task only partially overlap with those used for the 10-genre classification task and we do not provide any mapping between the two.
For each of the two tasks, the 50000 examples collected for training have been pre-split into a 40000 example training dataset and a 10000 example validation dataset. If you wish to use a more complex validation scheme such as k-fold cross-validation, you may want to combine these two portions of the dataset and define your own functions / classes for separating out a validation set.
Data provider classes for the fixed length input data for both the 10 and 25 genre classification tasks are available in the `mlp.data_providers` module as `MSD10GenreDataProvider` and `MSD25GenreDataProvider`. Both behave similarly to the `MNISTDataProvider` used extensively last semester. A `which_set` argument can be used to specify whether to return a data provider for the training dataset (`which_set='train'`) or the validation dataset (`which_set='valid'`). Both data provider classes provide a `label_map` attribute which is a list of strings which are the class labels corresponding to the integer targets (i.e. prior to conversion to a 1-of-K encoded binary vector).
The test dataset files for the 10 genre classification task are provided as two separate NumPy data files `msd-10-genre-test-inputs.npz` and `msd-10-genre-test-targets.npz`. These can be loaded using the [`np.load`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.load.html) function. The inputs are stored as a $10000\times3000$ array under the key `inputs` in the file `msd-10-genre-test-inputs.npz` and the targets in a 10000 element array of integer labels under the key `targets` in `msd-10-genre-test-targets.npz`. A corresponding `msd-25-genre-test-inputs.npz` file is provided for the 25 genre task inputs.
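
A minimal sketch of loading these files with `np.load`, using the file names and keys given above, is shown below. `load_test_set` is a hypothetical helper. It returns `None` for the targets when no targets file exists, as for the 25 genre task. The demonstration writes small synthetic files with the same names and keys to a temporary directory. With the real data you would pass `os.environ['MLP_DATA_DIR']` as `data_dir` instead.

```python
import os
import tempfile
import numpy as np

def load_test_set(data_dir, prefix):
    """Load test inputs, and targets if a targets file exists, e.g. for prefix='msd-10-genre'."""
    # np.load returns an NpzFile, which can be used as a context manager to close the file
    with np.load(os.path.join(data_dir, prefix + '-test-inputs.npz')) as data:
        inputs = data['inputs']
    targets = None
    targets_path = os.path.join(data_dir, prefix + '-test-targets.npz')
    if os.path.exists(targets_path):
        with np.load(targets_path) as data:
            targets = data['targets']
    return inputs, targets

# Demonstration on small synthetic files with the same names and keys
demo_dir = tempfile.mkdtemp()
np.savez_compressed(os.path.join(demo_dir, 'msd-10-genre-test-inputs.npz'),
                    inputs=np.zeros((4, 3000)))
np.savez_compressed(os.path.join(demo_dir, 'msd-10-genre-test-targets.npz'),
                    targets=np.array([0, 3, 9, 1]))
np.savez_compressed(os.path.join(demo_dir, 'msd-25-genre-test-inputs.npz'),
                    inputs=np.zeros((2, 3000)))

inputs, targets = load_test_set(demo_dir, 'msd-10-genre')
print(inputs.shape, targets)  # (4, 3000) [0 3 9 1]
inputs_25, targets_25 = load_test_set(demo_dir, 'msd-25-genre')  # targets_25 is None
```
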
### Accessing the Million Song Dataset data
Before using the data provider objects you will need to make sure the data files exist under the directory specified by the `MLP_DATA_DIR` path, so that they are accessible to the `mlp` package.
The fixed length input data and associated targets are available as compressed NumPy `.npz` files in the AFS directory `/afs/inf.ed.ac.uk/group/teaching/mlp/data/2017-18/`.
If you are working on DICE, one option is to redefine your `MLP_DATA_DIR` to point directly to the shared AFS data directory by editing the `env_vars.sh` start-up file for your environment. This avoids using up your DICE quota on copies of the data files in your home space, but initial loading of the data when initialising the data providers may be slower if many people are trying to access the same files at once. The environment variable can be redefined by running
```
gedit ~/miniconda3/envs/mlp/etc/conda/activate.d/env_vars.sh
```
in a terminal window (assuming you installed `miniconda3` to your home directory), and changing the line
```
export MLP_DATA_DIR=$HOME/mlpractical/data
```
to
```
export MLP_DATA_DIR="/afs/inf.ed.ac.uk/group/teaching/mlp/data/2017-18/"
```
and then saving and closing the editor. You will need to reload the `mlp` environment using `source activate mlp` and restart the Jupyter notebook server in the reloaded environment for the new environment variable definition to be available.
Assuming your local `mlpractical` repository is in your home directory you should be able to copy the required files on DICE by running
```
cp /afs/inf.ed.ac.uk/group/teaching/mlp/data/2017-18/msd*.npz ~/mlpractical/data
```
On a non-DICE machine, you will need to either [set up local access to AFS](http://computing.help.inf.ed.ac.uk/informatics-filesystem), use a remote file transfer client such as `scp`, or download the files using the iFile web interface [here](https://ifile.inf.ed.ac.uk/?path=%2Fafs%2Finf.ed.ac.uk%2Fgroup%2Fteaching%2Fmlp%2Fdata&goChange=Go) (requires DICE credentials).
As some of the files are quite large you may wish to copy only those you are using currently (e.g. only the files for one of the two tasks) to your local filespace to avoid filling up your quota. The `msd-25-genre-test-inputs.npz` file will only be needed by those intending to enter the associated optional Kaggle competition.

@ -0,0 +1,166 @@
# GPU Cluster Quick-Start Guide
This guide is intended to introduce students to the basics of using the mlp1/mlp2 GPU clusters. It is not intended to be an exhaustive guide that goes deep into the micro-details of the Slurm ecosystem. For an exhaustive guide please visit [the Slurm Documentation page.](https://slurm.schedmd.com/)
## What is the GPU Cluster?
It is a cluster consisting of server rack machines, each equipped with 8 NVIDIA GTX 1060 6GB GPUs. Initially there are 9 servers (72 GPUs) available for use; during February this should grow to 25 servers (200 GPUs). The system is managed using the open source cluster management software [Slurm](https://slurm.schedmd.com/overview.html), which has various advantages over its competitors, including full support for GPU resource scheduling.
## Why do I need it?
Most deep learning experiments require a large amount of compute, as you will have noticed in term 1. Using a GPU can accelerate experiments by around 30-50x, making experiments that would otherwise take a prohibitively long time feasible. For example, an experiment that requires a month to run makes it infeasible to actually do research with; if the same experiment only requires 1 day, one can iterate over methodologies, tune hyperparameters and generally try far more things. This is one of the simplest reasons behind the GPU hype that surrounds machine learning research today.
## Getting Started
### Accessing the Cluster:
1. If you are not on a DICE machine, then ssh into your DICE home using ```ssh sxxxxxx@student.ssh.inf.ed.ac.uk```
2. Then ssh into either mlp1 or mlp2, which are the head nodes of the GPU cluster (it does not matter which you use). To do that, run ```ssh mlp1``` or ```ssh mlp2```.
3. You are now logged into the GPU cluster. If this is your first time logging in, you will need to build your environment, because your home directory on the GPU cluster is separate from your usual AFS home directory on DICE.
### Installing requirements:
1. Start by downloading the miniconda3 installation file using
```wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh```.
2. Now run the installation using ```bash Miniconda3-latest-Linux-x86_64.sh```. At the first prompt reply yes.
```
Do you accept the license terms? [yes|no]
[no] >>> yes
```
3. At the second prompt simply press enter.
```
Miniconda3 will now be installed into this location:
/home/sxxxxxxx/miniconda3
- Press ENTER to confirm the location
- Press CTRL-C to abort the installation
- Or specify a different location below
```
4. Now you need to activate your environment by first running
```source ~/.bashrc```.
This reloads .bashrc, which now includes the new Miniconda path.
5. Run ```source activate``` to load the Miniconda root environment.
6. Now run ```conda create -n mlp python=3```. This will create the mlp environment. At the prompt choose y.
7. Now run ```source activate mlp```.
8. Install git using ```conda install git```, then configure git using:
```
git config --global user.name "[your name]"
git config --global user.email "[matric-number]@sms.ed.ac.uk"
```
9. Now clone the mlpractical repo using ```git clone https://github.com/CSTR-Edinburgh/mlpractical.git```.
10. Change into the repository with ```cd mlpractical``` and check out the semester 2 branch using ```git checkout mlp2017-8/semester_2_materials```.
11. Install the required packages using ```pip install -r requirements_gpu.txt```.
12. Once this is done, you will need to set up the MLP_DATA_DIR environment variable using the following block of commands:
```bash
cd ~/miniconda3/envs/mlp
mkdir -p ./etc/conda/activate.d
mkdir -p ./etc/conda/deactivate.d
echo -e '#!/bin/sh\n' >> ./etc/conda/activate.d/env_vars.sh
echo "export MLP_DATA_DIR=$HOME/mlpractical/data" >> ./etc/conda/activate.d/env_vars.sh
echo -e '#!/bin/sh\n' >> ./etc/conda/deactivate.d/env_vars.sh
echo 'unset MLP_DATA_DIR' >> ./etc/conda/deactivate.d/env_vars.sh
export MLP_DATA_DIR=$HOME/mlpractical/data
```
13. This completes the required installations. Please remember to clean up your installation files using ```conda clean -t```, which removes
cached package tarballs, then proceed to the next section, which outlines how to use the Slurm cluster management software.
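Once the mlp environment is activated, the activate.d script above exports MLP_DATA_DIR, so Python code can locate the data through the environment. The snippet below is a minimal sketch of how a script might read it. The helper name `get_mlp_data_dir` is hypothetical and is not part of the repository.

```python
import os


def get_mlp_data_dir(env_var="MLP_DATA_DIR"):
    """Return the data directory exported by the conda activate.d script.

    Hypothetical helper: raises a clear error if the variable is missing,
    which usually means the mlp environment has not been activated.
    """
    data_dir = os.environ.get(env_var)
    if data_dir is None:
        raise EnvironmentError(
            "{} is not set; run `source activate mlp` first.".format(env_var))
    return data_dir
```

Failing early with a descriptive message is usually easier to debug than a later "file not found" error from a data provider.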
### Using Slurm
Slurm provides commands to submit, delete, view and explore jobs, nodes and resources, among other things.
To submit a job, use ```sbatch script.sh```, which automatically finds available nodes and passes them the job along with the
resources and restrictions it requires. Here script.sh is the bash script containing the job that we want to run. Since we will be using the NVIDIA CUDA and cuDNN libraries,
we have provided a sample script which you should use for your job submissions. The script is annotated below:
```bash
#!/bin/bash
#SBATCH -N 1 # nodes requested
#SBATCH -n 1 # tasks requested
#SBATCH --gres=gpu:1 # use 1 GPU
#SBATCH --mem=16000 # memory in MB
#SBATCH -o outfile # send stdout to outfile
#SBATCH -e errfile # send stderr to errfile
#SBATCH -t 1:00:00 # time requested in hour:minute:seconds
# Setup CUDA and CUDNN related paths
export CUDA_HOME=/opt/cuda-8.0.44
export CUDNN_HOME=/opt/cuDNN-6.0_8.0
export STUDENT_ID=$(whoami)
export LD_LIBRARY_PATH=${CUDNN_HOME}/lib64:${CUDA_HOME}/lib64:$LD_LIBRARY_PATH
export LIBRARY_PATH=${CUDNN_HOME}/lib64:$LIBRARY_PATH
export CPATH=${CUDNN_HOME}/include:$CPATH
export PATH=${CUDA_HOME}/bin:${PATH}
export PYTHON_PATH=$PATH
# Setup a folder in the very fast scratch disk which can be used for storing experiment objects and any other files
# that may require storage during execution.
mkdir -p /disk/scratch/${STUDENT_ID}
export TMPDIR=/disk/scratch/${STUDENT_ID}/
export TMP=/disk/scratch/${STUDENT_ID}/
# Activate the relevant virtual environment:
source /home/${STUDENT_ID}/miniconda3/bin/activate mlp
# Run the python script that will train our network
python network_trainer.py --batch_size 128 --epochs 200 --experiment_prefix vgg-net-emnist-sample-exp --dropout_rate 0.4 --batch_norm_use True --strided_dim_reduction True --seed 25012018
```
To submit this job, run ```sbatch gpu_cluster_tutorial_training_script.sh```. The job will be submitted and you will be given a job ID:
```bash
[burly]sxxxxxxx: sbatch gpu_cluster_tutorial_training_script.sh
Submitted batch job 147
```
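Because each job is just a bash script, you can generate one script per hyperparameter setting instead of editing the file by hand. The following is a minimal sketch. It assumes that the last line of the base script is the `python network_trainer.py ...` command, as in the sample above. The function `make_sweep_scripts` and the `vgg-net-emnist-dropout-*` naming are illustrative and are not part of the repository. The sketch also gives each job its own `-o`/`-e` files, so that jobs running in parallel do not write to the same outfile and errfile.

```python
import os


def make_sweep_scripts(base_script, out_dir, dropout_rates):
    """Write one sbatch script per dropout rate, derived from base_script.

    Assumes the last line of base_script is the `python network_trainer.py ...`
    command; it is replaced by a new command for each dropout rate.
    """
    with open(base_script) as f:
        lines = f.read().rstrip("\n").split("\n")
    header = lines[:-1]  # everything except the final python command
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for rate in dropout_rates:
        prefix = "vgg-net-emnist-dropout-{}".format(rate)
        job_lines = []
        for line in header:
            # Give each job its own stdout/stderr files instead of the shared
            # outfile/errfile used by the sample script.
            if line.startswith("#SBATCH -o"):
                line = "#SBATCH -o {}.out".format(prefix)
            elif line.startswith("#SBATCH -e"):
                line = "#SBATCH -e {}.err".format(prefix)
            job_lines.append(line)
        job_lines.append(
            "python network_trainer.py --batch_size 128 --epochs 200 "
            "--experiment_prefix {} --dropout_rate {} --batch_norm_use True "
            "--strided_dim_reduction True --seed 25012018".format(prefix, rate))
        path = os.path.join(out_dir, prefix + ".sh")
        with open(path, "w") as f:
            f.write("\n".join(job_lines) + "\n")
        paths.append(path)
    return paths
```

You can then submit each generated file with sbatch exactly as above, for example `sbatch jobs/vgg-net-emnist-dropout-0.2.sh`.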
To view a list of all jobs in the queue, use ```squeue``` for a minimal presentation or ```smap``` for a more detailed one. To view node information, use ```sinfo```.
```bash
squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
143 interacti bash iainr R 8:00 1 landonia05
147 interacti gpu_clus sxxxxxxx R 1:05 1 landonia02
```
To stop or delete a job, use ```scancel job_id```, where job_id is the ID of the job.
If you want to test some of your code interactively to prototype your solution before you submit it to
a node, you can use ```srun -p interactive --gres=gpu:2 --pty python my_code_exp.py```.
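If you manage several jobs, it can help to extract job IDs programmatically, for example to pass them to scancel. The sketch below parses the two outputs shown above: the `Submitted batch job <id>` line printed by sbatch, and squeue's default table. It assumes squeue's default column layout with no spaces inside fields. For scripting, `squeue -u <username>` and squeue's `--format` option give more predictable output. The function names are hypothetical.

```python
import re


def parse_job_id(sbatch_output):
    """Extract the numeric job ID from sbatch's 'Submitted batch job <id>' line."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError("No job ID found in: {!r}".format(sbatch_output))
    return int(match.group(1))


def parse_squeue(text):
    """Turn squeue's default whitespace-separated table into a list of dicts."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    # One dict per job, keyed by the column names in the header row
    return [dict(zip(header, line.split())) for line in lines[1:]]
```

For example, filtering the parsed rows on the USER column gives the IDs of your own jobs, which you could then pass to `scancel` one by one.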
## Slurm Cheatsheet
For a list of the most commonly used Slurm commands, see this [Slurm user cheatsheet](https://bitsanddragons.wordpress.com/2017/04/12/slurm-user-cheatsheet/).
## Syncing or copying data over to DICE
At some point you will need to copy your data to DICE so that you can analyse it, produce charts, write reports, store it for future use, etc.
There are a couple of ways to do this:
1. If you want to fetch your files while you are on DICE, simply run ```scp mlp1:/home/<username>/output output```, where username is your student ID
and output is the file you want to copy. Use scp -r for folders. You may instead want to sync only new or changed
files, which you can do with rsync:
```rsync -ua --progress mlp1:/home/<username>/project_dir target_dir```. By default rsync skips files whose size and modification time are unchanged; the option -u additionally skips files that are newer at the destination, -a (archive mode) copies directories recursively while preserving permissions, timestamps and symbolic links, and --progress shows what is being sent and how fast.
rsync is also useful when you write code on DICE and want to push it to the cluster. Because it transfers only changed files, it saves both compute time and your own time, since you won't have to figure out what needs sending.
2. If you want to send your files to DICE while you are logged into mlp1 or mlp2, first run ```renc``` and enter your password. Then run:
```
cp ~/output /afs/inf.ed.ac.uk/u/s/<studentUUN>
```
This copies the files directly to AFS. You can also use rsync, as shown above.
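If you sync results often, a small wrapper can save retyping the rsync command from option 1. The sketch below builds that command and runs it with Python's subprocess module. `rsync_from_cluster` is a hypothetical helper. It assumes your cluster home directory is /home/<username>, as in the examples above.

```python
import getpass
import subprocess


def rsync_from_cluster(remote_dir, local_dir, host="mlp1", username=None, dry_run=False):
    """Pull remote_dir from the cluster into local_dir using rsync -ua --progress.

    Hypothetical wrapper: username defaults to the current login name.
    With dry_run=True the command is only built and returned, not executed.
    """
    username = username or getpass.getuser()
    source = "{}:/home/{}/{}".format(host, username, remote_dir)
    cmd = ["rsync", "-ua", "--progress", source, local_dir]
    if not dry_run:
        # check=True raises CalledProcessError if rsync exits with an error
        subprocess.run(cmd, check=True)
    return cmd
```

Note that rsync treats a trailing slash on the source specially. `project_dir/` copies the directory's contents into target_dir, whereas `project_dir` copies the directory itself.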
## Additional Help
If you require additional help, as usual please post on Piazza or ask at the tech support helpdesk.

requirements.txt

@ -0,0 +1,3 @@
tensorflow==1.4.1
tqdm==4.11.2
numpy==1.13.1

requirements_gpu.txt

@ -0,0 +1,3 @@
tensorflow_gpu==1.4.1
tqdm==4.11.2
numpy==1.13.1


@ -1,13 +0,0 @@
""" Setup script for mlp package. """
from setuptools import setup
setup(
name = "mlp",
author = "Pawel Swietojanski, Steve Renals, Matt Graham and Antreas Antoniou",
description = ("Neural network framework for University of Edinburgh "
"School of Informatics Machine Learning Practical course."),
url = "https://github.com/CSTR-Edinburgh/mlpractical",
packages=['mlp']
)

utils/__init__.py

utils/network_summary.py

@ -0,0 +1,27 @@
def count_parameters(network_variables, name):
    """
    This method counts the total number of unique parameters for a list of variable objects
    :param network_variables: A list of tf network variable objects
    :param name: Name of the network
    """
    total_parameters = 0
    for variable in network_variables:
        # shape is an array of tf.Dimension
        print(variable)
        shape = variable.get_shape()
        variable_parameters = 1
        for dim in shape:
            variable_parameters *= dim.value
        total_parameters += variable_parameters
    print(name, "has a total of", total_parameters, "parameters")


def view_names_of_variables(variables):
    """
    View all variable names in a tf variable list
    :param variables: A list of tf variables
    """
    for variable in variables:
        print(variable)
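count_parameters multiplies together the dimensions of each variable's shape and sums those products over all variables. The framework-independent sketch below reproduces that arithmetic on plain shape tuples, which can help when checking a count by hand. `count_parameters_from_shapes` is a hypothetical helper and is not part of utils. For example, a 3x3 convolution from 1 input channel to 64 filters has a kernel of shape (3, 3, 1, 64) and a bias of shape (64,), giving 576 + 64 = 640 parameters.

```python
import operator
from functools import reduce


def count_parameters_from_shapes(shapes):
    """Sum, over all variables, the product of each variable's dimensions."""
    # reduce starts from 1, so a scalar shape () counts as one parameter
    return sum(reduce(operator.mul, shape, 1) for shape in shapes)
```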

utils/parser_utils.py

@ -0,0 +1,44 @@
class ParserClass(object):
    def __init__(self, parser):
        """
        Parses arguments and saves them in the Parser Class
        :param parser: A parser to get input from
        """
        parser.add_argument('--batch_size', nargs="?", type=int, default=64, help='batch_size for experiment')
        parser.add_argument('--epochs', type=int, nargs="?", default=100, help='Number of epochs to train for')
        parser.add_argument('--logs_path', type=str, nargs="?", default="classification_logs/",
                            help='Experiment log path, '
                                 'where tensorboard is saved, '
                                 'along with .csv of results')
        parser.add_argument('--experiment_prefix', nargs="?", type=str, default="classification",
                            help='Experiment name without hp details')
        parser.add_argument('--continue_epoch', nargs="?", type=int, default=-1,
                            help="ID of epoch to continue from, -1 means from scratch")
        parser.add_argument('--tensorboard_use', nargs="?", type=str, default="False",
                            help='Whether to use tensorboard')
        parser.add_argument('--dropout_rate', nargs="?", type=float, default=0.35, help="Dropout value")
        parser.add_argument('--batch_norm_use', nargs="?", type=str, default="False",
                            help='Whether to use batch normalization')
        parser.add_argument('--strided_dim_reduction', nargs="?", type=str, default="False",
                            help='Whether to use strided dimensionality reduction')
        parser.add_argument('--seed', nargs="?", type=int, default=1122017, help='Random seed for the experiment')
        self.args = parser.parse_args()

    def get_argument_variables(self):
        """
        Processes the parsed arguments and produces variables of specific types needed for the experiments
        :return: Arguments needed for experiments
        """
        batch_size = self.args.batch_size
        experiment_prefix = self.args.experiment_prefix
        strided_dim_reduction = True if self.args.strided_dim_reduction == "True" else False
        batch_norm = True if self.args.batch_norm_use == "True" else False
        seed = self.args.seed
        dropout_rate = self.args.dropout_rate
        tensorboard_enable = True if self.args.tensorboard_use == "True" else False
        continue_from_epoch = self.args.continue_epoch  # use -1 to start from scratch
        epochs = self.args.epochs
        logs_path = self.args.logs_path

        return batch_size, seed, epochs, logs_path, continue_from_epoch, tensorboard_enable, batch_norm, \
            strided_dim_reduction, experiment_prefix, dropout_rate
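ParserClass receives boolean options as strings, and get_argument_variables compares them with "True". As a result, a value such as `true` or `1` silently becomes False. A common alternative, sketched below, is to give argparse a `type=` converter that raises an error for unrecognised values. This is not what this repository does, and `str_to_bool` is a hypothetical name. Passing an explicit list to `parse_args`, rather than letting it read sys.argv, also makes a parser easy to test.

```python
import argparse


def str_to_bool(value):
    """Convert strings such as "True", "false" or "1" into a bool for argparse."""
    if value.lower() in ("true", "t", "yes", "1"):
        return True
    if value.lower() in ("false", "f", "no", "0"):
        return False
    # argparse turns this into a usage error and exits with status 2
    raise argparse.ArgumentTypeError("Expected a boolean, got {!r}".format(value))


parser = argparse.ArgumentParser(description="Sketch of boolean command line flags")
parser.add_argument("--batch_norm_use", type=str_to_bool, default=False)
parser.add_argument("--dropout_rate", type=float, default=0.35)
```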

utils/storage.py

@ -0,0 +1,70 @@
import csv
import os

import numpy as np


def save_statistics(log_dir, statistics_file_name, list_of_statistics, create=False):
    """
    Saves a statistics .csv file with the statistics
    :param log_dir: Directory of log
    :param statistics_file_name: Name of .csv file
    :param list_of_statistics: A list of statistics to add in the file
    :param create: If True creates a new file, if False adds list to existing
    """
    if create:
        with open("{}/{}.csv".format(log_dir, statistics_file_name), 'w+') as f:
            writer = csv.writer(f)
            writer.writerow(list_of_statistics)
    else:
        with open("{}/{}.csv".format(log_dir, statistics_file_name), 'a') as f:
            writer = csv.writer(f)
            writer.writerow(list_of_statistics)


def load_statistics(log_dir, statistics_file_name):
    """
    Loads the statistics in a dictionary.
    :param log_dir: The directory in which the log is saved
    :param statistics_file_name: The name of the statistics file
    :return: A dict with the statistics
    """
    data_dict = dict()
    with open("{}/{}.csv".format(log_dir, statistics_file_name), 'r') as f:
        lines = f.readlines()
        data_labels = lines[0].replace("\n", "").replace("\r", "").split(",")
        del lines[0]

        for label in data_labels:
            data_dict[label] = []

        for line in lines:
            data = line.replace("\n", "").replace("\r", "").split(",")
            for key, item in zip(data_labels, data):
                # Skip any value equal to a column label (e.g. a repeated header row)
                if item not in data_labels:
                    data_dict[key].append(item)
    return data_dict


def get_best_validation_model_statistics(log_dir, statistics_file_name):
    """
    Returns the best val epoch and val accuracy from a log csv file
    :param log_dir: The log directory the file is saved in
    :param statistics_file_name: The log file name
    :return: The best validation accuracy and the epoch at which it is produced
    """
    log_file_dict = load_statistics(statistics_file_name=statistics_file_name, log_dir=log_dir)
    val_acc = np.array(log_file_dict['val_c_accuracy'], dtype=np.float32)
    best_val_acc = np.max(val_acc)
    best_val_epoch = np.argmax(val_acc)

    return best_val_acc, best_val_epoch


def build_experiment_folder(experiment_name, log_path):
    """
    Creates, if needed, the folders for an experiment's saved models and summary logs.
    Any "%.%" in the experiment name is replaced by "/", producing nested folders.
    :param experiment_name: Name of the experiment
    :param log_path: Root directory under which the experiment folders are created
    :return: The saved models path and the summary logs path
    """
    saved_models_filepath = "{}/{}/{}".format(log_path, experiment_name.replace("%.%", "/"), "saved_models")
    logs_filepath = "{}/{}/{}".format(log_path, experiment_name.replace("%.%", "/"), "summary_logs")

    if not os.path.exists(logs_filepath):
        os.makedirs(logs_filepath)
    if not os.path.exists(saved_models_filepath):
        os.makedirs(saved_models_filepath)

    return saved_models_filepath, logs_filepath
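load_statistics and get_best_validation_model_statistics read the log by splitting each line on commas. The sketch below is an alternative that uses the standard library's csv.DictReader, which also handles quoted fields. `best_validation_epoch` is a hypothetical name. The 'val_c_accuracy' column comes from get_best_validation_model_statistics above. Like the original, it returns the row index of the best value, which equals the epoch number only if the log has one row per epoch starting from epoch 0.

```python
import csv
import os


def best_validation_epoch(log_dir, statistics_file_name, metric="val_c_accuracy"):
    """Return (best value, row index) of `metric` in <log_dir>/<name>.csv."""
    path = os.path.join(log_dir, "{}.csv".format(statistics_file_name))
    values = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            # Skip a repeated header row, where the cell holds the column name
            if row[metric] == metric:
                continue
            values.append(float(row[metric]))
    # max() returns the first index holding the maximum, like np.argmax
    best_index = max(range(len(values)), key=values.__getitem__)
    return values[best_index], best_index
```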