mlpractical/notebooks/09a_Object_recognition_with_CIFAR-10_and_CIFAR-100.ipynb

{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import tensorflow as tf\n",
    "import numpy as np\n",
    "from mlp.data_providers import CIFAR10DataProvider, CIFAR100DataProvider\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## CIFAR-10 and CIFAR-100 datasets\n",
    "\n",
    "[CIFAR-10 and CIFAR-100](https://www.cs.toronto.edu/~kriz/cifar.html) are a pair of image classification datasets collected by collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. They are labelled subsets of the much larger [80 million tiny images](dataset). They are a common benchmark task for image classification - a list of current accuracy benchmarks for both data sets are maintained by Rodrigo Benenson [here](http://rodrigob.github.io/are_we_there_yet/build/).\n",
    "\n",
    "As the name suggests, CIFAR-10 has images in 10 classes:\n",
    "\n",
    "> airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck\n",
    "\n",
    "with 6000 images per class for an overall dataset size of 60000. Each image has three (RGB) colour channels and pixel dimension 32×32, corresponding to a total dimension per input image of 3×32×32=3072. For each colour channel the input values have been normalised to the range [0, 1].\n",
    "\n",
    "CIFAR-100 has images of identical dimensions to CIFAR-10 but rather than 10 classes they are instead split across 100 fine-grained classes (and 20 coarser 'superclasses' comprising multiple finer classes):\n",
    "\n",
    "<table style='border: none;'>\n",
    "    <tbody><tr style='font-weight: bold;'>\n",
    "        <td>Superclass</td>\n",
    "        <td>Classes</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>aquatic mammals</td>\n",
    "        <td>beaver, dolphin, otter, seal, whale</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>fish</td>\n",
    "        <td>aquarium fish, flatfish, ray, shark, trout</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>flowers</td>\n",
    "        <td>orchids, poppies, roses, sunflowers, tulips</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>food containers</td>\n",
    "        <td>bottles, bowls, cans, cups, plates</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>fruit and vegetables</td>\n",
    "        <td>apples, mushrooms, oranges, pears, sweet peppers</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>household electrical devices</td>\n",
    "        <td>clock, computer keyboard, lamp, telephone, television</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>household furniture</td>\n",
    "        <td>bed, chair, couch, table, wardrobe</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>insects</td>\n",
    "        <td>bee, beetle, butterfly, caterpillar, cockroach</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>large carnivores</td>\n",
    "        <td>bear, leopard, lion, tiger, wolf</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>large man-made outdoor things</td>\n",
    "        <td>bridge, castle, house, road, skyscraper</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>large natural outdoor scenes</td>\n",
    "        <td>cloud, forest, mountain, plain, sea</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>large omnivores and herbivores</td>\n",
    "        <td>camel, cattle, chimpanzee, elephant, kangaroo</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>medium-sized mammals</td>\n",
    "        <td>fox, porcupine, possum, raccoon, skunk</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>non-insect invertebrates</td>\n",
    "        <td>crab, lobster, snail, spider, worm</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>people</td>\n",
    "        <td>baby, boy, girl, man, woman</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>reptiles</td>\n",
    "        <td>crocodile, dinosaur, lizard, snake, turtle</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>small mammals</td>\n",
    "        <td>hamster, mouse, rabbit, shrew, squirrel</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>trees</td>\n",
    "        <td>maple, oak, palm, pine, willow</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>vehicles 1</td>\n",
    "        <td>bicycle, bus, motorcycle, pickup truck, train</td>\n",
    "    </tr>\n",
    "    <tr>\n",
    "        <td>vehicles 2</td>\n",
    "        <td>lawn-mower, rocket, streetcar, tank, tractor</td>\n",
    "    </tr>\n",
    "</tbody></table>\n",
    "\n",
    "Each class has 600 examples in it, giving an overall dataset size of 60000 i.e. the same as CIFAR-10.\n",
    "\n",
    "Both CIFAR-10 and CIFAR-100 have standard splits into 50000 training examples and 10000 test examples. To avoid accidental (or purposeful...) fitting to the test set, we have used a different assignation of examples to test and training sets and only provided the inputs (and not target labels) for the 10000 examples chosen for the test set. The remaining 50000 examples have been split in to a 40000 example training dataset and a 10000 example validation dataset, each with target labels provided. If you wish to use a more complex cross-fold validation scheme you may want to combine these two portions of the dataset and define your own functions for separating out a validation set.\n",
    "\n",
    "Data provider classes for both CIFAR-10 and CIFAR-100 are available in the `mlp.data_providers` module. Both have similar behaviour to the `MNISTDataProvider` used extensively last semester. A `which_set` argument can be used to specify whether to return a data provided for the training dataset (`which_set='train'`) or validation dataset (`which_set='valid'`). \n",
    "\n",
    "The CIFAR-100 data provider also takes an optional `use_coarse_targets` argument in its constructor. By default this is set to `False` and the targets returned by the data provider correspond to 1-of-K encoded binary vectors for the 100 fine-grained object classes. If `use_coarse_targets=True` then instead the data provider will return 1-of-K encoded binary vector targets for the 20 coarse-grained superclasses associated with each input instead.\n",
    "\n",
    "Both data provider classes provide a `label_map` attribute which is a list of strings which are the class labels corresponding to the integer targets (i.e. prior to conversion to a 1-of-K encoded binary vector).\n",
    "\n",
    "Below example code is given for creating instances of the CIFAR-10 and CIFAR-100 data provider objects and using them to train simple two-layer feedforward network models with rectified linear activations in TensorFlow. You may wish to use this code as a starting point for your own experiments.\n",
    "\n",
    "\n",
    "### Accessing the CIFAR-10 and CIFAR-100 data\n",
    "\n",
    "Before using the data provider objects you will need to copy the associated data files in to your local `mlp/data` directory (or wherever your `MLP_DATA_DIR` environment variable points to if different). The data is available as six compressed NumPy `.npz` files, (`cifar-10-train.npz, cifar-10-valid.npz, cifar-10-test-inputs.npz` and `cifar-100-train.npz, cifar-100-valid.npz, cifar-100-test.npz`) in the AFS directory `/afs/inf.ed.ac.uk/group/teaching/mlp/data`. Assuming your local `mlpractical` repository is in your home directory you should be able to copy the required files by running\n",
    "\n",
    "```\n",
    "cp /afs/inf.ed.ac.uk/group/teaching/mlp/data/cifar*.npz ~/mlpractical/data\n",
    "```"
   ]
  },
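  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once copied, the contents of an `.npz` archive can be inspected with `np.load`. The sketch below writes a tiny stand-in archive to a temporary directory so that it runs anywhere; the array names `inputs` and `targets` and the file name `example.npz` are assumptions for illustration, so check `.files` on the real archives.\n",
    "\n",
    "```python\n",
    "import os\n",
    "import tempfile\n",
    "import numpy as np\n",
    "\n",
    "# Write a small stand-in archive (hypothetical keys and shapes).\n",
    "tmp_dir = tempfile.mkdtemp()\n",
    "path = os.path.join(tmp_dir, 'example.npz')\n",
    "np.savez_compressed(path, inputs=np.zeros((4, 3072)), targets=np.arange(4))\n",
    "\n",
    "# np.load returns a lazy archive; .files lists the stored array names.\n",
    "loaded = np.load(path)\n",
    "array_names = sorted(loaded.files)\n",
    "shapes = dict((name, loaded[name].shape) for name in array_names)\n",
    "loaded.close()\n",
    "print(array_names, shapes)\n",
    "```"
   ]
  },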
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example two-layer classifier model on CIFAR-10"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "train_data = CIFAR10DataProvider('train', batch_size=50)\n",
    "valid_data = CIFAR10DataProvider('valid', batch_size=50)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def fully_connected_layer(inputs, input_dim, output_dim, nonlinearity=tf.nn.relu):\n",
    "    weights = tf.Variable(\n",
    "        tf.truncated_normal(\n",
    "            [input_dim, output_dim], stddev=2. / (input_dim + output_dim)**0.5), \n",
    "        'weights')\n",
    "    biases = tf.Variable(tf.zeros([output_dim]), 'biases')\n",
    "    outputs = tf.matmul(inputs, weights) + biases\n",
    "    return outputs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "inputs = tf.placeholder(tf.float32, [None, train_data.inputs.shape[1]], 'inputs')\n",
    "targets = tf.placeholder(tf.float32, [None, train_data.num_classes], 'targets')\n",
    "num_hidden = 200\n",
    "\n",
    "with tf.name_scope('fc-layer-1'):\n",
    "    hidden_1 = fully_connected_layer(inputs, train_data.inputs.shape[1], num_hidden)\n",
    "with tf.name_scope('output-layer'):\n",
    "    outputs = fully_connected_layer(hidden_1, num_hidden, train_data.num_classes, tf.identity)\n",
    "\n",
    "with tf.name_scope('error'):\n",
    "    error = tf.reduce_mean(\n",
    "        tf.nn.softmax_cross_entropy_with_logits(outputs, targets))\n",
    "with tf.name_scope('accuracy'):\n",
    "    accuracy = tf.reduce_mean(tf.cast(\n",
    "            tf.equal(tf.argmax(outputs, 1), tf.argmax(targets, 1)), \n",
    "            tf.float32))\n",
    "\n",
    "with tf.name_scope('train'):\n",
    "    train_step = tf.train.AdamOptimizer().minimize(error)\n",
    "    \n",
    "init = tf.global_variables_initializer()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "with tf.Session() as sess:\n",
    "    sess.run(init)\n",
    "    for e in range(25):\n",
    "        running_error = 0.\n",
    "        running_accuracy = 0.\n",
    "        for input_batch, target_batch in train_data:\n",
    "            _, batch_error, batch_acc = sess.run(\n",
    "                [train_step, error, accuracy], \n",
    "                feed_dict={inputs: input_batch, targets: target_batch})\n",
    "            running_error += batch_error\n",
    "            running_accuracy += batch_acc\n",
    "        running_error /= train_data.num_batches\n",
    "        running_accuracy /= train_data.num_batches\n",
    "        print('End of epoch {0:02d}: err(train)={1:.2f} acc(train)={2:.2f}'\n",
    "              .format(e + 1, running_error, running_accuracy))\n",
    "        if (e + 1) % 5 == 0:\n",
    "            valid_error = 0.\n",
    "            valid_accuracy = 0.\n",
    "            for input_batch, target_batch in valid_data:\n",
    "                batch_error, batch_acc = sess.run(\n",
    "                    [error, accuracy], \n",
    "                    feed_dict={inputs: input_batch, targets: target_batch})\n",
    "                valid_error += batch_error\n",
    "                valid_accuracy += batch_acc\n",
    "            valid_error /= valid_data.num_batches\n",
    "            valid_accuracy /= valid_data.num_batches\n",
    "            print('                 err(valid)={0:.2f} acc(valid)={1:.2f}'\n",
    "                   .format(valid_error, valid_accuracy))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example two-layer classifier model on CIFAR-100"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "train_data = CIFAR100DataProvider('train', batch_size=50)\n",
    "valid_data = CIFAR100DataProvider('valid', batch_size=50)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "tf.reset_default_graph()\n",
    "\n",
    "inputs = tf.placeholder(tf.float32, [None, train_data.inputs.shape[1]], 'inputs')\n",
    "targets = tf.placeholder(tf.float32, [None, train_data.num_classes], 'targets')\n",
    "num_hidden = 200\n",
    "\n",
    "with tf.name_scope('fc-layer-1'):\n",
    "    hidden_1 = fully_connected_layer(inputs, train_data.inputs.shape[1], num_hidden)\n",
    "with tf.name_scope('output-layer'):\n",
    "    outputs = fully_connected_layer(hidden_1, num_hidden, train_data.num_classes, tf.identity)\n",
    "\n",
    "with tf.name_scope('error'):\n",
    "    error = tf.reduce_mean(\n",
    "        tf.nn.softmax_cross_entropy_with_logits(outputs, targets))\n",
    "with tf.name_scope('accuracy'):\n",
    "    accuracy = tf.reduce_mean(tf.cast(\n",
    "            tf.equal(tf.argmax(outputs, 1), tf.argmax(targets, 1)), \n",
    "            tf.float32))\n",
    "\n",
    "with tf.name_scope('train'):\n",
    "    train_step = tf.train.AdamOptimizer().minimize(error)\n",
    "    \n",
    "init = tf.global_variables_initializer()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "with tf.Session() as sess:\n",
    "    sess.run(init)\n",
    "    for e in range(25):\n",
    "        running_error = 0.\n",
    "        running_accuracy = 0.\n",
    "        for input_batch, target_batch in train_data:\n",
    "            _, batch_error, batch_acc = sess.run(\n",
    "                [train_step, error, accuracy], \n",
    "                feed_dict={inputs: input_batch, targets: target_batch})\n",
    "            running_error += batch_error\n",
    "            running_accuracy += batch_acc\n",
    "        running_error /= train_data.num_batches\n",
    "        running_accuracy /= train_data.num_batches\n",
    "        print('End of epoch {0:02d}: err(train)={1:.2f} acc(train)={2:.2f}'\n",
    "              .format(e + 1, running_error, running_accuracy))\n",
    "        if (e + 1) % 5 == 0:\n",
    "            valid_error = 0.\n",
    "            valid_accuracy = 0.\n",
    "            for input_batch, target_batch in valid_data:\n",
    "                batch_error, batch_acc = sess.run(\n",
    "                    [error, accuracy], \n",
    "                    feed_dict={inputs: input_batch, targets: target_batch})\n",
    "                valid_error += batch_error\n",
    "                valid_accuracy += batch_acc\n",
    "            valid_error /= valid_data.num_batches\n",
    "            valid_accuracy /= valid_data.num_batches\n",
    "            print('                 err(valid)={0:.2f} acc(valid)={1:.2f}'\n",
    "                   .format(valid_error, valid_accuracy))"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python [conda env:mlp]",
   "language": "python",
   "name": "conda-env-mlp-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}