Compare commits

..

5 Commits

Author SHA1 Message Date
Anton Lydike
1313d8ab2e update google notes to be what I think they should be 2024-11-18 15:24:31 +00:00
tpmmthomas
56f22b8ac1 update 2024-10-23 02:05:32 +08:00
tpmmthomas
4bf1a79681 debug arg_extractor.py 2024-10-23 02:04:02 +08:00
tpmmthomas
0f1d7a1498 update notes 2024-10-23 02:01:28 +08:00
tpmmthomas
48d1e846ea update gcloud instructions 2024-10-23 01:59:06 +08:00
106 changed files with 1583 additions and 13102 deletions

.gitignore vendored
View File

@@ -1,6 +1,8 @@
#editors
*.idea/
#dropbox stuff
*.dropbox*
.idea/*
# Byte-compiled / optimized / DLL files
__pycache__/
@@ -26,7 +28,6 @@ var/
*.egg-info/
.installed.cfg
*.egg
*.tar.gz
# PyInstaller
# Usually these files are written by a python script from a template
@@ -60,30 +61,8 @@ docs/_build/
# PyBuilder
target/
# Pycharm
.idea/*
#Notebook stuff
notebooks/.ipynb_checkpoints/
#Google Cloud stuff
/google-cloud-sdk
.ipynb_checkpoints/
data/cifar-100-python/
data/MNIST/
*.tar.gz
google-cloud-sdk/
solutions/
report/mlp-cw1-template.aux
report/mlp-cw1-template.out
report/mlp-cw1-template.pdf
report/mlp-cw1-template.synctex.gz
.DS_Store
report/mlp-cw2-template.aux
report/mlp-cw2-template.out
report/mlp-cw2-template.pdf
report/mlp-cw2-template.synctex.gz
report/mlp-cw2-template.bbl
report/mlp-cw2-template.blg
venv
saved_models
.ipynb_checkpoints/
emnist_tutorial/

View File

@@ -1,15 +1,22 @@
# Machine Learning Practical
# MLP Compute Engines Tutorials Branch
This repository contains the code for the University of Edinburgh [School of Informatics](http://www.inf.ed.ac.uk) course [Machine Learning Practical](http://www.inf.ed.ac.uk/teaching/courses/mlp/).
A short code repo that guides you through the process of running experiments on the Google Cloud Platform.
This assignment-based course is focused on the implementation and evaluation of machine learning systems. Students who do this course will have experience in the design, implementation, training, and evaluation of machine learning systems.
## Why do I need it?
Most deep learning experiments require a large amount of compute, as you noticed in term 1. Using a GPU can accelerate experiments by roughly 30-50x, making experiments that would otherwise take a prohibitively long time feasible. For a simple example, consider an experiment that requires a month to run; that would make it infeasible to actually do research with. Now consider the same experiment requiring only one day, which allows one to iterate over methodologies, tune hyperparameters, and overall try far more things. This simple example expresses one of the main reasons behind the GPU enthusiasm that surrounds machine learning research today.
The code in this repository is split into:
## Introduction
* a Python package `mlp`, a [NumPy](http://www.numpy.org/) based neural network package designed specifically for the course that students will implement parts of and extend during the course labs and assignments,
* a series of [Jupyter](http://jupyter.org/) notebooks in the `notebooks` directory containing explanatory material and coding exercises to be completed during the course labs.
The material available includes tutorial documents and code, as well as tooling that provides more advanced features to aid you in your quest to train lots of learnable differentiable computational graphs.
## Coursework 2
This branch contains the Python code and LaTeX files for the second coursework. The code follows the same structure as the labs, in particular the `mlp` package, and a specific notebook is provided to help you run experiments.
* Detailed instructions are given in MLP2024_25_CW2_Spec.pdf (see Learn, Assessment, CW2).
* The [report directory](https://github.com/VICO-UoE/mlpractical/tree/mlp2024-25/coursework2/report) contains the latex files that you will use to create your report.
## Getting Started
### Google Cloud Platform
Google Cloud Platform (GCP) is a cloud computing service that provides a number of services, including the ability to run virtual machines (VMs) on their infrastructure. The VMs are called Compute Engine instances.
As an MLP course student, you will be given $50 worth of credits. This is enough to run a number of experiments on the cloud.
To get started with GCP, please read [this getting started guide](notes/google_cloud_setup.md).
The guide will take you through the process of setting up a GCP account, creating a project, creating a VM instance, and connecting to it. The VM instance will be a GPU-endowed Linux machine that already includes the necessary PyTorch packages for you to run your experiments.

View File

@@ -1,102 +0,0 @@
train_acc,train_loss,val_acc,val_loss
0.010694736842105264,4.827323,0.024800000000000003,4.5659676
0.03562105263157895,4.3888855,0.0604,4.136276
0.0757684210526316,3.998175,0.09480000000000001,3.8678854
0.10734736842105265,3.784943,0.12159999999999999,3.6687074
0.13741052631578948,3.6023798,0.15439999999999998,3.4829779
0.16888421052631578,3.4196754,0.1864,3.3093607
0.1941263157894737,3.2674048,0.20720000000000002,3.2223148
0.21861052631578948,3.139925,0.22880000000000003,3.1171055
0.24134736842105264,3.0145736,0.24760000000000001,3.0554724
0.26399999999999996,2.9004965,0.2552,2.9390912
0.27898947368421056,2.815607,0.2764,2.9205213
0.29532631578947366,2.7256868,0.2968,2.7410471
0.31138947368421044,2.6567938,0.3016,2.7083752
0.3236842105263158,2.595405,0.322,2.665904
0.33486315789473686,2.5434496,0.3176,2.688214
0.3462526315789474,2.5021079,0.33159999999999995,2.648656
0.35381052631578946,2.4609485,0.342,2.5658453
0.36157894736842106,2.4152951,0.34119999999999995,2.5403407
0.36774736842105266,2.382958,0.3332,2.6936982
0.37753684210526317,2.3510027,0.36160000000000003,2.4663532
0.38597894736842114,2.319616,0.3608,2.4559999
0.3912421052631579,2.294115,0.3732,2.3644555
0.39840000000000003,2.2598042,0.3716,2.4516551
0.4036,2.2318766,0.37439999999999996,2.4189563
0.4105263157894737,2.2035582,0.3772,2.3899698
0.41501052631578944,2.1830406,0.3876,2.3215945
0.4193263157894737,2.158597,0.37800000000000006,2.3831298
0.4211578947368421,2.148888,0.38160000000000005,2.3436418
0.4260842105263159,2.1250536,0.39840000000000003,2.3471045
0.4313684210526315,2.107519,0.4044,2.2744477
0.4370526315789474,2.0837262,0.398,2.245617
0.439642105263158,2.0691078,0.41200000000000003,2.216309
0.4440842105263158,2.046351,0.4096,2.2329648
0.44696842105263157,2.0330904,0.4104,2.1841388
0.4518105263157895,2.0200553,0.4244,2.1780539
0.45298947368421055,2.0069249,0.42719999999999997,2.1625984
0.4602105263157895,1.9896894,0.4204,2.2195568
0.46023157894736844,1.9788533,0.4244,2.1803434
0.46101052631578954,1.9693571,0.4128,2.1858895
0.46774736842105263,1.9547894,0.4204,2.1908271
0.4671157894736842,1.9390026,0.4244,2.1841395
0.4698105263157895,1.924038,0.424,2.1843896
0.4738736842105264,1.9161719,0.43,2.154806
0.47541052631578945,1.9033127,0.4463999999999999,2.1130056
0.48,1.8961077,0.44439999999999996,2.113019
0.48456842105263154,1.8838875,0.43079999999999996,2.1191697
0.4857263157894737,1.8711865,0.44920000000000004,2.1213412
0.4887578947368421,1.8590263,0.44799999999999995,2.1077166
0.49035789473684216,1.8479114,0.4428,2.0737479
0.4908421052631579,1.845268,0.4436,2.07655
0.4939368421052632,1.8336699,0.4548,2.0769904
0.49924210526315793,1.8237538,0.4548,2.061769
0.49677894736842104,1.8111013,0.44240000000000007,2.0676718
0.5008842105263157,1.8031327,0.4548,2.0859065
0.5,1.8026625,0.458,2.0704215
0.5030736842105263,1.792004,0.4596,2.1113508
0.505578947368421,1.7810374,0.45679999999999993,2.0382714
0.5090315789473684,1.7691813,0.4444000000000001,2.0911386
0.512042105263158,1.7633294,0.4616,2.0458508
0.5142736842105263,1.7549652,0.4464,2.0786576
0.5128421052631579,1.7518128,0.4656,2.026332
0.518042105263158,1.7420768,0.46,2.0141299
0.5182315789473684,1.7321203,0.45960000000000006,2.0226884
0.5192842105263158,1.7264535,0.46279999999999993,2.0182638
0.5217894736842105,1.7245325,0.46399999999999997,2.0110855
0.5229684210526316,1.7184331,0.46679999999999994,2.0191038
0.5227578947368421,1.7116771,0.4604,2.0334535
0.5245894736842105,1.7009526,0.4692,2.0072439
0.5262315789473684,1.6991171,0.4700000000000001,2.0296187
0.5278526315789474,1.6958193,0.4708,1.9912667
0.527157894736842,1.6907407,0.4736,2.006095
0.5299578947368421,1.6808176,0.4715999999999999,2.012164
0.5313052631578947,1.676356,0.47239999999999993,1.9955354
0.5338315789473685,1.6731659,0.47839999999999994,2.005768
0.5336000000000001,1.662152,0.4672,2.015392
0.5354736842105263,1.6638054,0.4692,1.9890119
0.5397894736842105,1.6575475,0.4768,2.0090258
0.5386526315789474,1.6595734,0.4824,1.9728817
0.5376631578947368,1.6536722,0.4816,1.9769167
0.5384842105263159,1.6495628,0.47600000000000003,1.9980135
0.5380842105263157,1.6488388,0.478,1.9884782
0.5393473684210528,1.6408547,0.48,1.9772192
0.5415157894736843,1.632917,0.4828,1.9732709
0.5394947368421052,1.6340653,0.4776,1.9623082
0.5429052631578948,1.6340532,0.47759999999999997,1.9812362
0.5452421052631579,1.6246406,0.48119999999999996,1.9846246
0.5436210526315789,1.6288266,0.4864,1.9822198
0.5437684210526316,1.6240481,0.48279999999999995,1.9768158
0.546357894736842,1.6208181,0.4804,1.9625885
0.5485052631578946,1.6164333,0.47839999999999994,1.9738724
0.5466736842105263,1.6169226,0.47800000000000004,1.9842362
0.547621052631579,1.6159856,0.4828,1.9709526
0.5480421052631579,1.6175526,0.48560000000000003,1.967775
0.5468421052631579,1.6149833,0.48119999999999996,1.9626708
0.5493894736842105,1.6063902,0.4835999999999999,1.96621
0.5490736842105263,1.6096952,0.48120000000000007,1.9742922
0.5514736842105264,1.6084315,0.4867999999999999,1.9604725
0.5489263157894737,1.6069487,0.4831999999999999,1.9733659
0.5494947368421053,1.6030664,0.49079999999999996,1.9693874
0.5516842105263158,1.6043342,0.486,1.9647765
0.552442105263158,1.6039867,0.48480000000000006,1.9649359

View File

@@ -1,2 +0,0 @@
test_acc,test_loss
0.49950000000000006,1.9105633

View File

@@ -1,101 +0,0 @@
train_acc,train_loss,val_acc,val_loss
0.009263157894736843,4.8649125,0.0104,4.630689
0.009810526315789474,4.6264124,0.009600000000000001,4.618983
0.009705263157894738,4.621914,0.011200000000000002,4.6184525
0.008989473684210525,4.619472,0.0064,4.6164784
0.009747368421052633,4.6168556,0.0076,4.6138463
0.00951578947368421,4.6156826,0.0108,4.6139345
0.009789473684210525,4.614809,0.008400000000000001,4.6116896
0.009936842105263159,4.613147,0.0104,4.6148276
0.009810526315789474,4.612325,0.0076,4.6123877
0.009094736842105263,4.6117926,0.007200000000000001,4.6149993
0.008421052631578947,4.611283,0.011600000000000001,4.6114736
0.009010526315789472,4.6105323,0.009600000000000001,4.607559
0.009894736842105263,4.6103206,0.008400000000000001,4.6086206
0.00934736842105263,4.6095214,0.011200000000000002,4.6091933
0.009473684210526316,4.6095295,0.008,4.6095695
0.010252631578947369,4.609189,0.0104,4.610459
0.009536842105263158,4.6087623,0.0092,4.6091356
0.00848421052631579,4.6086617,0.009600000000000001,4.609126
0.008421052631578947,4.6083455,0.011200000000000002,4.6088147
0.009410526315789473,4.608145,0.0068000000000000005,4.608519
0.009263157894736843,4.6078997,0.0092,4.6085033
0.009389473684210526,4.607453,0.01,4.6083508
0.008989473684210528,4.6075597,0.008400000000000001,4.6073136
0.009326315789473686,4.607266,0.008,4.6069093
0.01,4.607154,0.0076,4.6069508
0.008778947368421053,4.607089,0.011200000000000002,4.60659
0.009326315789473684,4.606807,0.0068,4.6072598
0.009031578947368422,4.6068263,0.011200000000000002,4.607257
0.008842105263157896,4.6066294,0.008,4.606883
0.008968421052631579,4.606647,0.006400000000000001,4.607275
0.008947368421052631,4.6065364,0.0092,4.606976
0.008842105263157896,4.6064167,0.0076,4.607016
0.008799999999999999,4.606425,0.0096,4.607184
0.009326315789473686,4.606305,0.0072,4.6068683
0.00905263157894737,4.606274,0.0072,4.606982
0.00934736842105263,4.6062336,0.007200000000000001,4.607209
0.009221052631578948,4.606221,0.0076,4.607369
0.009557894736842105,4.60607,0.0076,4.6074376
0.009073684210526317,4.6061006,0.0072,4.607068
0.009242105263157895,4.606005,0.0064,4.6067224
0.009957894736842107,4.605986,0.0072,4.6068263
0.009052631578947368,4.605935,0.0072,4.6067867
0.008694736842105264,4.6059127,0.0064,4.6070905
0.009536842105263158,4.605874,0.006400000000000001,4.606976
0.009663157894736842,4.605872,0.0072,4.6068897
0.008821052631578948,4.6057997,0.0064,4.607028
0.009768421052631579,4.605778,0.0072,4.6069264
0.0092,4.6057644,0.007200000000000001,4.607018
0.008926315789473685,4.6057386,0.0072,4.60698
0.008989473684210525,4.6057277,0.0064,4.6070237
0.009242105263157895,4.6057053,0.0064,4.6069183
0.009094736842105263,4.605692,0.006400000000000001,4.6068764
0.009473684210526316,4.60566,0.0064,4.606909
0.009494736842105262,4.605613,0.0064,4.606978
0.009747368421052631,4.6056285,0.0064,4.606753
0.009789473684210527,4.605578,0.006400000000000001,4.6068797
0.009199999999999998,4.6055675,0.0064,4.606888
0.009073684210526317,4.6055593,0.0064,4.606874
0.008821052631578948,4.6055293,0.006400000000000001,4.606851
0.009326315789473684,4.6055255,0.0064,4.606871
0.009557894736842105,4.6055083,0.006400000000000001,4.606851
0.009600000000000001,4.605491,0.0064,4.6068635
0.00856842105263158,4.605466,0.0064,4.606862
0.009894736842105263,4.605463,0.006400000000000001,4.6068873
0.009494736842105262,4.605441,0.0064,4.6068926
0.008673684210526314,4.6054277,0.0064,4.6068554
0.009221052631578948,4.6054296,0.0063999999999999994,4.6068907
0.008989473684210528,4.605404,0.0064,4.6068807
0.00928421052631579,4.6053905,0.006400000000000001,4.6068707
0.0092,4.6053743,0.0064,4.606894
0.008989473684210525,4.605368,0.0064,4.606845
0.009515789473684212,4.605355,0.0064,4.6068635
0.009073684210526317,4.605352,0.0064,4.6068773
0.009642105263157895,4.6053243,0.0064,4.606883
0.009747368421052633,4.6053176,0.0064,4.6069
0.009873684210526316,4.6053023,0.0064,4.6068873
0.009536842105263156,4.605297,0.0064,4.6068654
0.009515789473684212,4.6052866,0.0064,4.6068883
0.009978947368421053,4.605265,0.006400000000000001,4.606894
0.009957894736842107,4.605259,0.0064,4.6068826
0.009410526315789475,4.6052504,0.0064,4.6068697
0.01002105263157895,4.6052403,0.006400000000000001,4.6068807
0.01002105263157895,4.6052313,0.0064,4.606872
0.00951578947368421,4.605224,0.0064,4.6068883
0.009852631578947368,4.605219,0.006400000000000001,4.606871
0.009894736842105265,4.605209,0.0064,4.606871
0.00922105263157895,4.605204,0.0064,4.6068654
0.010042105263157896,4.605193,0.0064,4.6068764
0.009978947368421053,4.6051874,0.006400000000000001,4.6068697
0.009747368421052633,4.605183,0.0064,4.6068673
0.010189473684210526,4.605178,0.0064,4.606873
0.009789473684210527,4.605173,0.0064,4.6068773
0.009936842105263159,4.605169,0.0064,4.606874
0.010042105263157894,4.605166,0.0064,4.606877
0.009494736842105262,4.6051593,0.0064,4.606874
0.009536842105263158,4.6051593,0.0063999999999999994,4.606874
0.010021052631578946,4.6051564,0.006400000000000001,4.6068716
0.009747368421052631,4.605154,0.0064,4.6068726
0.009642105263157895,4.605153,0.0064,4.606872
0.009305263157894737,4.6051517,0.0064,4.6068726

View File

@@ -1,2 +0,0 @@
test_acc,test_loss
0.01,4.608619

arg_extractor.py Normal file
View File

@@ -0,0 +1,87 @@
import argparse
import json
import os
import sys


def str2bool(v):
    if v.lower() in ('yes', 'true', 't', 'y', '1'):
        return True
    elif v.lower() in ('no', 'false', 'f', 'n', '0'):
        return False
    else:
        raise argparse.ArgumentTypeError('Boolean value expected.')


def get_args():
    """
    Returns the arguments extracted from the command line (optionally
    overridden by a JSON config file) together with the torch device to run on.
    """
    parser = argparse.ArgumentParser(
        description='Welcome to the MLP course\'s PyTorch training and inference helper script')
    parser.add_argument('--batch_size', nargs="?", type=int, default=100, help='Batch size for experiment')
    parser.add_argument('--continue_from_epoch', nargs="?", type=int, default=-1,
                        help='Epoch to continue training from; -1 starts a fresh run')
    parser.add_argument('--dataset_name', type=str, help='Dataset on which the system will train/eval our model')
    parser.add_argument('--seed', nargs="?", type=int, default=7112018,
                        help='Seed to use for random number generator for experiment')
    parser.add_argument('--image_num_channels', nargs="?", type=int, default=1,
                        help='The channel dimensionality of our image-data')
    parser.add_argument('--image_height', nargs="?", type=int, default=28, help='Height of image data')
    parser.add_argument('--image_width', nargs="?", type=int, default=28, help='Width of image data')
    parser.add_argument('--dim_reduction_type', nargs="?", type=str, default='strided_convolution',
                        help='One of [strided_convolution, dilated_convolution, max_pooling, avg_pooling]')
    parser.add_argument('--num_layers', nargs="?", type=int, default=4,
                        help='Number of convolutional layers in the network (excluding '
                             'dimensionality reduction layers)')
    parser.add_argument('--num_filters', nargs="?", type=int, default=64,
                        help='Number of convolutional filters per convolutional layer in the network (excluding '
                             'dimensionality reduction layers)')
    parser.add_argument('--num_epochs', nargs="?", type=int, default=100, help='The experiment\'s epoch budget')
    parser.add_argument('--experiment_name', nargs="?", type=str, default="exp_1",
                        help='Experiment name - to be used for building the experiment folder')
    parser.add_argument('--use_gpu', nargs="?", type=str2bool, default=False,
                        help='A flag indicating whether we will use GPU acceleration or not')
    parser.add_argument('--weight_decay_coefficient', nargs="?", type=float, default=1e-05,
                        help='Weight decay to use for Adam')
    parser.add_argument('--filepath_to_arguments_json_file', nargs="?", type=str, default=None,
                        help='Path to a JSON file from which to load arguments')
    args = parser.parse_args()
    if args.filepath_to_arguments_json_file is not None:
        args = extract_args_from_json(json_file_path=args.filepath_to_arguments_json_file, existing_args_dict=args)

    arg_str = [(str(key), str(value)) for (key, value) in vars(args).items()]
    print(arg_str)

    import torch
    if args.use_gpu and torch.cuda.is_available():  # use the GPU only if one is available and the flag is set
        device = torch.device('cuda')
        print("use {} GPU(s)".format(torch.cuda.device_count()), file=sys.stderr)
    else:
        print("use CPU", file=sys.stderr)
        device = torch.device('cpu')  # sets the device to be CPU

    return args, device


class AttributeAccessibleDict(object):
    def __init__(self, adict):
        self.__dict__.update(adict)


def extract_args_from_json(json_file_path, existing_args_dict=None):
    with open(json_file_path) as f:
        arguments_dict = json.load(fp=f)

    # Fall back to the parsed command-line values for any key the JSON omits.
    for key, value in vars(existing_args_dict).items():
        if key not in arguments_dict:
            arguments_dict[key] = value

    return AttributeAccessibleDict(arguments_dict)
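The `str2bool` helper above exists because argparse's `type=bool` is a trap: `bool("False")` is truthy, so every non-empty string would parse as `True`. A minimal standalone sketch of the same pattern (reproducing the helper outside the repo):

```python
import argparse

def str2bool(v):
    # Map textual spellings of true/false onto real booleans;
    # argparse's type=bool would turn the string "False" into True.
    if v.lower() in ('yes', 'true', 't', 'y', '1'):
        return True
    if v.lower() in ('no', 'false', 'f', 'n', '0'):
        return False
    raise argparse.ArgumentTypeError('Boolean value expected.')

parser = argparse.ArgumentParser()
parser.add_argument('--use_gpu', type=str2bool, default=False)

print(parser.parse_args(['--use_gpu', 'False']).use_gpu)  # False, not True
```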

View File

@@ -0,0 +1,43 @@
#!/bin/sh
#SBATCH -N 1 # nodes requested
#SBATCH -n 1 # tasks requested
#SBATCH --partition=Teach-Standard
#SBATCH --gres=gpu:1
#SBATCH --mem=12000  # memory in MB
#SBATCH --time=0-08:00:00
export CUDA_HOME=/opt/cuda-9.0.176.1/
export CUDNN_HOME=/opt/cuDNN-7.0/
export STUDENT_ID=$(whoami)
export LD_LIBRARY_PATH=${CUDNN_HOME}/lib64:${CUDA_HOME}/lib64:$LD_LIBRARY_PATH
export LIBRARY_PATH=${CUDNN_HOME}/lib64:$LIBRARY_PATH
export CPATH=${CUDNN_HOME}/include:$CPATH
export PATH=${CUDA_HOME}/bin:${PATH}
export PYTHON_PATH=$PATH
mkdir -p /disk/scratch/${STUDENT_ID}
export TMPDIR=/disk/scratch/${STUDENT_ID}/
export TMP=/disk/scratch/${STUDENT_ID}/
mkdir -p ${TMP}/datasets/
export DATASET_DIR=${TMP}/datasets/
# Activate the relevant virtual environment:
source /home/${STUDENT_ID}/miniconda3/bin/activate mlp
cd ..
python train_evaluate_emnist_classification_system.py --batch_size 100 --continue_from_epoch -1 --seed 0 \
--image_num_channels 3 --image_height 32 --image_width 32 \
--dim_reduction_type "strided" --num_layers 4 --num_filters 64 \
--num_epochs 100 --experiment_name 'cifar100_test_exp' \
--use_gpu "True" --weight_decay_coefficient 0. \
--dataset_name "cifar100"

View File

@@ -0,0 +1,38 @@
#!/bin/sh
#SBATCH -N 1 # nodes requested
#SBATCH -n 1 # tasks requested
#SBATCH --partition=Teach-Standard
#SBATCH --gres=gpu:1
#SBATCH --mem=12000  # memory in MB
#SBATCH --time=0-08:00:00
export CUDA_HOME=/opt/cuda-9.0.176.1/
export CUDNN_HOME=/opt/cuDNN-7.0/
export STUDENT_ID=$(whoami)
export LD_LIBRARY_PATH=${CUDNN_HOME}/lib64:${CUDA_HOME}/lib64:$LD_LIBRARY_PATH
export LIBRARY_PATH=${CUDNN_HOME}/lib64:$LIBRARY_PATH
export CPATH=${CUDNN_HOME}/include:$CPATH
export PATH=${CUDA_HOME}/bin:${PATH}
export PYTHON_PATH=$PATH
mkdir -p /disk/scratch/${STUDENT_ID}
export TMPDIR=/disk/scratch/${STUDENT_ID}/
export TMP=/disk/scratch/${STUDENT_ID}/
mkdir -p ${TMP}/datasets/
export DATASET_DIR=${TMP}/datasets/
# Activate the relevant virtual environment:
source /home/${STUDENT_ID}/miniconda3/bin/activate mlp
cd ..
python train_evaluate_emnist_classification_system.py --filepath_to_arguments_json_file experiment_configs/cifar10_tutorial_config.json
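The `--filepath_to_arguments_json_file` flag routes everything through `extract_args_from_json`, which takes the JSON values verbatim and falls back to the parsed command-line defaults for any key the JSON omits. A minimal sketch of that merge rule (the config file itself is not shown in this diff, so the keys below are illustrative):

```python
# Illustrative defaults, standing in for argparse's parsed namespace.
cli_defaults = {'batch_size': 100, 'num_epochs': 100, 'use_gpu': False}

# Hypothetical JSON config contents -- the real file is not part of this diff.
json_config = {'batch_size': 256, 'use_gpu': True}

# JSON keys win; anything missing from the JSON falls back to the CLI default.
merged = {**cli_defaults, **json_config}
print(merged)  # {'batch_size': 256, 'num_epochs': 100, 'use_gpu': True}
```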

View File

@@ -0,0 +1,43 @@
#!/bin/sh
#SBATCH -N 1 # nodes requested
#SBATCH -n 1 # tasks requested
#SBATCH --partition=Teach-LongJobs
#SBATCH --gres=gpu:1
#SBATCH --mem=12000 # memory in Mb
#SBATCH --time=0-08:00:00
export CUDA_HOME=/opt/cuda-9.0.176.1/
export CUDNN_HOME=/opt/cuDNN-7.0/
export STUDENT_ID=$(whoami)
export LD_LIBRARY_PATH=${CUDNN_HOME}/lib64:${CUDA_HOME}/lib64:$LD_LIBRARY_PATH
export LIBRARY_PATH=${CUDNN_HOME}/lib64:$LIBRARY_PATH
export CPATH=${CUDNN_HOME}/include:$CPATH
export PATH=${CUDA_HOME}/bin:${PATH}
export PYTHON_PATH=$PATH
mkdir -p /disk/scratch/${STUDENT_ID}
export TMPDIR=/disk/scratch/${STUDENT_ID}/
export TMP=/disk/scratch/${STUDENT_ID}/
mkdir -p ${TMP}/datasets/
export DATASET_DIR=${TMP}/datasets/
# Activate the relevant virtual environment:
source /home/${STUDENT_ID}/miniconda3/bin/activate mlp
cd ..
python train_evaluate_emnist_classification_system.py --batch_size 100 --continue_from_epoch -1 --seed 0 \
--image_num_channels 1 --image_height 28 --image_width 28 \
--dim_reduction_type "strided" --num_layers 4 --num_filters 64 \
--num_epochs 100 --experiment_name 'emnist_test_exp' \
--use_gpu "True" --weight_decay_coefficient 0. \
--dataset_name "emnist"


@ -0,0 +1,43 @@
#!/bin/sh
#SBATCH -N 1 # nodes requested
#SBATCH -n 1 # tasks requested
#SBATCH --partition=Teach-Short
#SBATCH --gres=gpu:1
#SBATCH --mem=12000 # memory in Mb
#SBATCH --time=0-03:59:00
export CUDA_HOME=/opt/cuda-9.0.176.1/
export CUDNN_HOME=/opt/cuDNN-7.0/
export STUDENT_ID=$(whoami)
export LD_LIBRARY_PATH=${CUDNN_HOME}/lib64:${CUDA_HOME}/lib64:$LD_LIBRARY_PATH
export LIBRARY_PATH=${CUDNN_HOME}/lib64:$LIBRARY_PATH
export CPATH=${CUDNN_HOME}/include:$CPATH
export PATH=${CUDA_HOME}/bin:${PATH}
export PYTHON_PATH=$PATH
mkdir -p /disk/scratch/${STUDENT_ID}
export TMPDIR=/disk/scratch/${STUDENT_ID}/
export TMP=/disk/scratch/${STUDENT_ID}/
mkdir -p ${TMP}/datasets/
export DATASET_DIR=${TMP}/datasets/
# Activate the relevant virtual environment:
source /home/${STUDENT_ID}/miniconda3/bin/activate mlp
cd ..
python train_evaluate_emnist_classification_system.py --batch_size 100 --continue_from_epoch -1 --seed 0 \
--image_num_channels 1 --image_height 28 --image_width 28 \
--dim_reduction_type "strided" --num_layers 4 --num_filters 64 \
--num_epochs 100 --experiment_name 'emnist_test_exp' \
--use_gpu "True" --weight_decay_coefficient 0. \
--dataset_name "emnist"


@ -0,0 +1,44 @@
#!/bin/sh
#SBATCH -N 1 # nodes requested
#SBATCH -n 1 # tasks requested
#SBATCH --partition=Teach-Standard
#SBATCH --gres=gpu:4
#SBATCH --mem=12000 # memory in Mb
#SBATCH --time=0-08:00:00
export CUDA_HOME=/opt/cuda-9.0.176.1/
export CUDNN_HOME=/opt/cuDNN-7.0/
export STUDENT_ID=$(whoami)
export LD_LIBRARY_PATH=${CUDNN_HOME}/lib64:${CUDA_HOME}/lib64:$LD_LIBRARY_PATH
export LIBRARY_PATH=${CUDNN_HOME}/lib64:$LIBRARY_PATH
export CPATH=${CUDNN_HOME}/include:$CPATH
export PATH=${CUDA_HOME}/bin:${PATH}
export PYTHON_PATH=$PATH
mkdir -p /disk/scratch/${STUDENT_ID}
export TMPDIR=/disk/scratch/${STUDENT_ID}/
export TMP=/disk/scratch/${STUDENT_ID}/
mkdir -p ${TMP}/datasets/
export DATASET_DIR=${TMP}/datasets/
# Activate the relevant virtual environment:
source /home/${STUDENT_ID}/miniconda3/bin/activate mlp
cd ..
python train_evaluate_emnist_classification_system.py --batch_size 100 --continue_from_epoch -1 --seed 0 \
--image_num_channels 1 --image_height 28 --image_width 28 \
--dim_reduction_type "strided" --num_layers 4 --num_filters 64 \
--num_epochs 100 --experiment_name 'emnist_test_multi_gpu_exp' \
--use_gpu "True" --weight_decay_coefficient 0. \
--dataset_name "emnist"


@ -0,0 +1,43 @@
#!/bin/sh
#SBATCH -N 1 # nodes requested
#SBATCH -n 1 # tasks requested
#SBATCH --partition=Teach-Standard
#SBATCH --gres=gpu:1
#SBATCH --mem=12000 # memory in Mb
#SBATCH --time=0-08:00:00
export CUDA_HOME=/opt/cuda-9.0.176.1/
export CUDNN_HOME=/opt/cuDNN-7.0/
export STUDENT_ID=$(whoami)
export LD_LIBRARY_PATH=${CUDNN_HOME}/lib64:${CUDA_HOME}/lib64:$LD_LIBRARY_PATH
export LIBRARY_PATH=${CUDNN_HOME}/lib64:$LIBRARY_PATH
export CPATH=${CUDNN_HOME}/include:$CPATH
export PATH=${CUDA_HOME}/bin:${PATH}
export PYTHON_PATH=$PATH
mkdir -p /disk/scratch/${STUDENT_ID}
export TMPDIR=/disk/scratch/${STUDENT_ID}/
export TMP=/disk/scratch/${STUDENT_ID}/
mkdir -p ${TMP}/datasets/
export DATASET_DIR=${TMP}/datasets/
# Activate the relevant virtual environment:
source /home/${STUDENT_ID}/miniconda3/bin/activate mlp
cd ..
python train_evaluate_emnist_classification_system.py --batch_size 100 --continue_from_epoch -1 --seed 0 \
--image_num_channels 1 --image_height 28 --image_width 28 \
--dim_reduction_type "strided" --num_layers 4 --num_filters 64 \
--num_epochs 100 --experiment_name 'emnist_test_exp' \
--use_gpu "True" --weight_decay_coefficient 0. \
--dataset_name "emnist"


@ -0,0 +1,57 @@
import os
import subprocess
import argparse
import tqdm
import getpass
import time
parser = argparse.ArgumentParser(description='Welcome to the run N at a time script')
parser.add_argument('--num_parallel_jobs', type=int)
parser.add_argument('--total_epochs', type=int)
args = parser.parse_args()
def check_if_experiment_with_name_is_running(experiment_name):
    # With shell=True the command must be a single string; in the original list form
    # the '-l' flag was silently passed to the shell itself and dropped.
    result = subprocess.run('squeue --name {} -l'.format(experiment_name),
                            stdout=subprocess.PIPE, shell=True)
    lines = result.stdout.split(b'\n')
    # squeue -l prints two header lines before any job entries.
    return len(lines) > 2
student_id = getpass.getuser().encode()[:5]
list_of_scripts = [item for item in
subprocess.run(['ls'], stdout=subprocess.PIPE).stdout.split(b'\n') if
item.decode("utf-8").endswith(".sh")]
for script in list_of_scripts:
print('sbatch', script.decode("utf-8"))
epoch_dict = {key.decode("utf-8"): 0 for key in list_of_scripts}
total_jobs_finished = 0
while total_jobs_finished < args.total_epochs * len(list_of_scripts):
curr_idx = 0
with tqdm.tqdm(total=len(list_of_scripts)) as pbar_experiment:
while curr_idx < len(list_of_scripts):
number_of_jobs = 0
result = subprocess.run(['squeue', '-l'], stdout=subprocess.PIPE)
for line in result.stdout.split(b'\n'):
if student_id in line:
number_of_jobs += 1
if number_of_jobs < args.num_parallel_jobs:
while check_if_experiment_with_name_is_running(
experiment_name=list_of_scripts[curr_idx].decode("utf-8")) or epoch_dict[
list_of_scripts[curr_idx].decode("utf-8")] >= args.total_epochs:
curr_idx += 1
if curr_idx >= len(list_of_scripts):
curr_idx = 0
str_to_run = 'sbatch {}'.format(list_of_scripts[curr_idx].decode("utf-8"))
total_jobs_finished += 1
os.system(str_to_run)
print(str_to_run)
curr_idx += 1
else:
time.sleep(1)
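The polling loop above leans on two bits of `squeue` parsing: counting the current user's jobs, and treating anything beyond the two header lines as a running job. A self-contained sketch of that logic against a made-up sample of `squeue -l` output (the sample text and helper names are illustrative, not from the repo):

```python
SAMPLE_SQUEUE_OUTPUT = b"""Mon Nov 18 15:00:00 2024
JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)
12345 Teach-Sta  exp_a.sh  s1234567  RUNNING     1:23   8:00:00      1 landonia01
12346 Teach-Sta  exp_b.sh  s1234567  PENDING     0:00   8:00:00      1 (Priority)"""

def count_user_jobs(squeue_output, student_id):
    # One line per job mentions the user id; the header lines never do.
    return sum(1 for line in squeue_output.split(b'\n') if student_id in line)

def experiment_appears_running(squeue_output):
    # Mirrors the len(lines) > 2 test above: two header lines, then job entries.
    return len(squeue_output.split(b'\n')) > 2
```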

File diff suppressed because it is too large

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

data_augmentations.py (new file, 55 lines)

@ -0,0 +1,55 @@
from PIL import Image
from numpy import random
from torchvision import transforms
import numpy as np
import torch
class Cutout(object):
"""Randomly mask out one or more patches from an image.
Args:
n_holes (int): Number of patches to cut out of each image.
length (int): The length (in pixels) of each square patch.
"""
def __init__(self, n_holes, length):
self.n_holes = n_holes
self.length = length
def __call__(self, img):
"""
Args:
img (Tensor): Tensor image of size (C, H, W).
Returns:
Tensor: Image with n_holes of dimension length x length cut out of it.
"""
from_PIL = False
if type(img) == Image.Image:
from_PIL = True
img = transforms.ToTensor()(img)
h = img.size(1)
w = img.size(2)
mask = np.ones((h, w), np.float32)
for n in range(self.n_holes):
y = random.randint(0, h)
x = random.randint(0, w)
y1 = np.clip(y - self.length // 2, 0, h)
y2 = np.clip(y + self.length // 2, 0, h)
x1 = np.clip(x - self.length // 2, 0, w)
x2 = np.clip(x + self.length // 2, 0, w)
mask[y1: y2, x1: x2] = 0.
mask = torch.from_numpy(mask)
mask = mask.expand_as(img)
img = img * mask
if from_PIL:
img = transforms.ToPILImage()(img)
return img
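The heart of `Cutout.__call__` is the clipped square mask; the same arithmetic can be sketched in plain NumPy, independent of torch and PIL (the `cutout_mask` helper is hypothetical, for illustration only):

```python
import numpy as np

def cutout_mask(h, w, length, rng):
    # Pick a centre, clip the square to the image bounds, zero it out.
    y, x = rng.randint(0, h), rng.randint(0, w)
    y1, y2 = np.clip(y - length // 2, 0, h), np.clip(y + length // 2, 0, h)
    x1, x2 = np.clip(x - length // 2, 0, w), np.clip(x + length // 2, 0, w)
    mask = np.ones((h, w), np.float32)
    mask[y1:y2, x1:x2] = 0.
    return mask

rng = np.random.RandomState(0)
img = np.ones((3, 28, 28), np.float32)          # dummy C x H x W image
augmented = img * cutout_mask(28, 28, 8, rng)   # mask broadcasts over channels
```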


@ -4,23 +4,25 @@
This module provides classes for loading datasets and iterating over batches of
data points.
"""
from __future__ import print_function
import pickle
import gzip
import sys
import numpy as np
import os
DEFAULT_SEED = 20112018
from PIL import Image
from torch.utils import data
from torch.utils.data import Dataset
from torchvision import transforms
import os
import os.path
import numpy as np
import sys
if sys.version_info[0] == 2:
import cPickle as pickle
else:
import pickle
import torch.utils.data as data
from torchvision.datasets.utils import download_url, check_integrity
from mlp import DEFAULT_SEED
class DataProvider(object):
"""Generic data provider."""
@ -172,7 +174,7 @@ class MNISTDataProvider(DataProvider):
# separator for the current platform / OS is used
# MLP_DATA_DIR environment variable should point to the data directory
data_path = os.path.join(
os.environ['MLP_DATA_DIR'], 'mnist-{0}.npz'.format(which_set))
"data", 'mnist-{0}.npz'.format(which_set))
assert os.path.isfile(data_path), (
'Data file does not exist at expected path: ' + data_path
)
@ -238,7 +240,7 @@ class EMNISTDataProvider(DataProvider):
# separator for the current platform / OS is used
# MLP_DATA_DIR environment variable should point to the data directory
data_path = os.path.join(
os.environ['MLP_DATA_DIR'], 'emnist-{0}.npz'.format(which_set))
"data", 'emnist-{0}.npz'.format(which_set))
assert os.path.isfile(data_path), (
'Data file does not exist at expected path: ' + data_path
)
@ -247,16 +249,18 @@ class EMNISTDataProvider(DataProvider):
print(loaded.keys())
inputs, targets = loaded['inputs'], loaded['targets']
inputs = inputs.astype(np.float32)
targets = targets.astype(int)
if flatten:
inputs = np.reshape(inputs, newshape=(-1, 28*28))
else:
inputs = np.reshape(inputs, newshape=(-1, 28, 28, 1))
inputs = np.reshape(inputs, newshape=(-1, 1, 28, 28))
inputs = inputs / 255.0
# pass the loaded data to the parent class __init__
super(EMNISTDataProvider, self).__init__(
inputs, targets, batch_size, max_num_batches, shuffle_order, rng)
def __len__(self):
return self.num_batches
def next(self):
"""Returns next data batch or raises `StopIteration` if at end."""
inputs_batch, targets_batch = super(EMNISTDataProvider, self).next()
@ -281,6 +285,7 @@ class EMNISTDataProvider(DataProvider):
one_of_k_targets[range(int_targets.shape[0]), int_targets] = 1
return one_of_k_targets
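The `to_one_of_k` line in the hunk above is NumPy fancy indexing: row `i` gets a 1 in column `int_targets[i]`. A standalone demonstration:

```python
import numpy as np

int_targets = np.array([2, 0, 1])
one_of_k_targets = np.zeros((int_targets.shape[0], 3))
# Row i, column int_targets[i] is set to 1 in a single vectorised assignment.
one_of_k_targets[range(int_targets.shape[0]), int_targets] = 1
```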
class MetOfficeDataProvider(DataProvider):
"""South Scotland Met Office weather data provider."""
@ -303,7 +308,7 @@ class MetOfficeDataProvider(DataProvider):
rng (RandomState): A seeded random number generator.
"""
data_path = os.path.join(
os.environ['MLP_DATA_DIR'], 'HadSSP_daily_qc.txt')
os.environ['DATASET_DIR'], 'HadSSP_daily_qc.txt')
assert os.path.isfile(data_path), (
'Data file does not exist at expected path: ' + data_path
)
@ -351,7 +356,7 @@ class CCPPDataProvider(DataProvider):
rng (RandomState): A seeded random number generator.
"""
data_path = os.path.join(
os.environ['MLP_DATA_DIR'], 'ccpp_data.npz')
os.environ['DATASET_DIR'], 'ccpp_data.npz')
assert os.path.isfile(data_path), (
'Data file does not exist at expected path: ' + data_path
)
@ -374,21 +379,6 @@ class CCPPDataProvider(DataProvider):
super(CCPPDataProvider, self).__init__(
inputs, targets, batch_size, max_num_batches, shuffle_order, rng)
class EMNISTPytorchDataProvider(Dataset):
def __init__(self, which_set='train', batch_size=100, max_num_batches=-1,
shuffle_order=True, rng=None, flatten=False, transforms=None):
self.numpy_data_provider = EMNISTDataProvider(which_set=which_set, batch_size=batch_size, max_num_batches=max_num_batches,
shuffle_order=shuffle_order, rng=rng, flatten=flatten)
self.transforms = transforms
def __getitem__(self, item):
x = self.numpy_data_provider.inputs[item]
for augmentation in self.transforms:
x = augmentation(x)
return x, int(self.numpy_data_provider.targets[item])
def __len__(self):
return len(self.numpy_data_provider.targets)
class AugmentedMNISTDataProvider(MNISTDataProvider):
"""Data provider for MNIST dataset which randomly transforms images."""
@ -427,120 +417,12 @@ class AugmentedMNISTDataProvider(MNISTDataProvider):
transformed_inputs_batch = self.transformer(inputs_batch, self.rng)
return transformed_inputs_batch, targets_batch
class Omniglot(data.Dataset):
"""`CIFAR10 <https://www.cs.toronto.edu/~kriz/cifar.html>`_ Dataset.
Args:
root (string): Root directory of dataset where directory
``cifar-10-batches-py`` exists or will be saved to if download is set to True.
train (bool, optional): If True, creates dataset from training set, otherwise
creates from test set.
transform (callable, optional): A function/transform that takes in an PIL image
and returns a transformed version. E.g, ``transforms.RandomCrop``
target_transform (callable, optional): A function/transform that takes in the
target and transforms it.
download (bool, optional): If true, downloads the dataset from the internet and
puts it in root directory. If dataset is already downloaded, it is not
downloaded again.
"""
def collect_data_paths(self, root):
data_dict = dict()
print(root)
for subdir, dir, files in os.walk(root):
for file in files:
if file.endswith('.png'):
filepath = os.path.join(subdir, file)
class_label = '_'.join(subdir.split("/")[-2:])
if class_label in data_dict:
data_dict[class_label].append(filepath)
else:
data_dict[class_label] = [filepath]
return data_dict
def __init__(self, root, set_name,
transform=None, target_transform=None,
download=False):
self.root = os.path.expanduser(root)
self.root = os.path.abspath(os.path.join(self.root, 'omniglot_dataset'))
self.transform = transform
self.target_transform = target_transform
self.set_name = set_name # training set or test set
self.data_dict = self.collect_data_paths(root=self.root)
x = []
label_to_idx = {label: idx for idx, label in enumerate(self.data_dict.keys())}
y = []
for key, value in self.data_dict.items():
x.extend(value)
y.extend(len(value) * [label_to_idx[key]])
y = np.array(y)
rng = np.random.RandomState(seed=0)
idx = np.arange(len(x))
rng.shuffle(idx)
x = [x[current_idx] for current_idx in idx]
y = y[idx]
train_sample_idx = rng.choice(a=[i for i in range(len(x))], size=int(len(x) * 0.80), replace=False)
evaluation_sample_idx = [i for i in range(len(x)) if i not in train_sample_idx]
# Sample val/test indices in the same (global) index space as train_sample_idx,
# so that self.labels = y[...] below indexes consistently.
validation_sample_idx = rng.choice(a=evaluation_sample_idx, size=int(len(evaluation_sample_idx) * 0.40), replace=False)
test_sample_idx = [i for i in evaluation_sample_idx if i not in validation_sample_idx]
if self.set_name=='train':
self.data = [item for idx, item in enumerate(x) if idx in train_sample_idx]
self.labels = y[train_sample_idx]
elif self.set_name=='val':
self.data = [item for idx, item in enumerate(x) if idx in validation_sample_idx]
self.labels = y[validation_sample_idx]
else:
self.data = [item for idx, item in enumerate(x) if idx in test_sample_idx]
self.labels = y[test_sample_idx]
def __getitem__(self, index):
"""
Args:
index (int): Index
Returns:
tuple: (image, target) where target is index of the target class.
"""
img, target = self.data[index], self.labels[index]
img = Image.open(img)
if self.transform is not None:
img = self.transform(img)
if self.target_transform is not None:
target = self.target_transform(target)
return img, target
def __len__(self):
return len(self.data)
def __repr__(self):
fmt_str = 'Dataset ' + self.__class__.__name__ + '\n'
fmt_str += ' Number of datapoints: {}\n'.format(self.__len__())
tmp = self.set_name
fmt_str += ' Split: {}\n'.format(tmp)
fmt_str += ' Root Location: {}\n'.format(self.root)
tmp = ' Transforms (if any): '
fmt_str += '{0}{1}\n'.format(tmp, self.transform.__repr__().replace('\n', '\n' + ' ' * len(tmp)))
tmp = ' Target Transforms (if any): '
fmt_str += '{0}{1}'.format(tmp, self.target_transform.__repr__().replace('\n', '\n' + ' ' * len(tmp)))
return fmt_str
class CIFAR10(data.Dataset):
"""`CIFAR10 <https://www.cs.toronto.edu/~kriz/cifar.html>`_ Dataset.
Args:
root (string): Root directory of dataset where directory
``cifar-10-batches-py`` exists or will be saved to if download is set to True.
@ -553,6 +435,7 @@ class CIFAR10(data.Dataset):
download (bool, optional): If true, downloads the dataset from the internet and
puts it in root directory. If dataset is already downloaded, it is not
downloaded again.
"""
base_folder = 'cifar-10-batches-py'
url = "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
@ -668,6 +551,7 @@ class CIFAR10(data.Dataset):
"""
Args:
index (int): Index
Returns:
tuple: (image, target) where target is index of the target class.
"""
@ -731,6 +615,7 @@ class CIFAR10(data.Dataset):
class CIFAR100(CIFAR10):
"""`CIFAR100 <https://www.cs.toronto.edu/~kriz/cifar.html>`_ Dataset.
This is a subclass of the `CIFAR10` Dataset.
"""
base_folder = 'cifar-100-python'

experiment_builder.py (new file, 306 lines)

@ -0,0 +1,306 @@
import sys
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import tqdm
import os
import numpy as np
import time
from torch.optim.adam import Adam
from storage_utils import save_statistics
class ExperimentBuilder(nn.Module):
def __init__(self, network_model, experiment_name, num_epochs, train_data, val_data,
test_data, weight_decay_coefficient, use_gpu, continue_from_epoch=-1):
"""
Initializes an ExperimentBuilder object. Such an object takes care of running training and evaluation of a deep net
on a given dataset. It also takes care of saving per epoch models and automatically inferring the best val model
to be used for evaluating the test set metrics.
:param network_model: A pytorch nn.Module which implements a network architecture.
:param experiment_name: The name of the experiment. This is used mainly for keeping track of the experiment and creating a directory structure that will be used to save logs, model parameters and other outputs.
:param num_epochs: Total number of epochs to run the experiment
:param train_data: An object of the DataProvider type. Contains the training set.
:param val_data: An object of the DataProvider type. Contains the val set.
:param test_data: An object of the DataProvider type. Contains the test set.
:param weight_decay_coefficient: A float indicating the weight decay to use with the adam optimizer.
:param use_gpu: A boolean indicating whether to use a GPU or not.
:param continue_from_epoch: An int indicating whether we'll start from scratch (-1) or whether we'll reload a previously saved model of epoch 'continue_from_epoch' and continue training from there.
"""
super(ExperimentBuilder, self).__init__()
self.experiment_name = experiment_name
self.model = network_model
self.model.reset_parameters()
self.device = torch.cuda.current_device()
if torch.cuda.device_count() > 1 and use_gpu:
self.device = torch.cuda.current_device()
self.model.to(self.device)
self.model = nn.DataParallel(module=self.model)
print('Use Multi GPU', self.device)
elif torch.cuda.device_count() == 1 and use_gpu:
self.device = torch.cuda.current_device()
self.model.to(self.device) # sends the model from the cpu to the gpu
print('Use GPU', self.device)
else:
print("use CPU")
self.device = torch.device('cpu') # sets the device to be CPU
print(self.device)
# re-initialize network parameters
self.train_data = train_data
self.val_data = val_data
self.test_data = test_data
self.optimizer = Adam(self.parameters(), amsgrad=False,
weight_decay=weight_decay_coefficient)
print('System learnable parameters')
num_conv_layers = 0
num_linear_layers = 0
total_num_parameters = 0
for name, value in self.named_parameters():
print(name, value.shape)
if all(item in name for item in ['conv', 'weight']):
num_conv_layers += 1
if all(item in name for item in ['linear', 'weight']):
num_linear_layers += 1
total_num_parameters += np.prod(value.shape)
print('Total number of parameters', total_num_parameters)
print('Total number of conv layers', num_conv_layers)
print('Total number of linear layers', num_linear_layers)
# Generate the directory names
self.experiment_folder = os.path.abspath(experiment_name)
self.experiment_logs = os.path.abspath(os.path.join(self.experiment_folder, "result_outputs"))
self.experiment_saved_models = os.path.abspath(os.path.join(self.experiment_folder, "saved_models"))
print(self.experiment_folder, self.experiment_logs)
# Set best models to be at 0 since we are just starting
self.best_val_model_idx = 0
self.best_val_model_acc = 0.
if not os.path.exists(self.experiment_folder): # If experiment directory does not exist
os.mkdir(self.experiment_folder) # create the experiment directory
if not os.path.exists(self.experiment_logs):
os.mkdir(self.experiment_logs) # create the experiment log directory
if not os.path.exists(self.experiment_saved_models):
os.mkdir(self.experiment_saved_models) # create the experiment saved models directory
self.num_epochs = num_epochs
self.criterion = nn.CrossEntropyLoss().to(self.device) # send the loss computation to the GPU
if continue_from_epoch == -2:
try:
self.best_val_model_idx, self.best_val_model_acc, self.state = self.load_model(
model_save_dir=self.experiment_saved_models, model_save_name="train_model",
model_idx='latest') # reload existing model from epoch and return best val model index
# and the best val acc of that model
self.starting_epoch = self.state['current_epoch_idx']
except Exception:
print("Model objects cannot be found, initializing a new model and starting from scratch")
self.starting_epoch = 0
self.state = dict()
elif continue_from_epoch != -1: # if continue from epoch is not -1 then
self.best_val_model_idx, self.best_val_model_acc, self.state = self.load_model(
model_save_dir=self.experiment_saved_models, model_save_name="train_model",
model_idx=continue_from_epoch) # reload existing model from epoch and return best val model index
# and the best val acc of that model
self.starting_epoch = self.state['current_epoch_idx']
else:
self.starting_epoch = 0
self.state = dict()
def get_num_parameters(self):
total_num_params = 0
for param in self.parameters():
total_num_params += np.prod(param.shape)
return total_num_params
def run_train_iter(self, x, y):
"""
Receives the inputs and targets for the model and runs a training iteration. Returns loss and accuracy metrics.
:param x: The inputs to the model. A numpy array of shape batch_size, channels, height, width
:param y: The targets for the model. A numpy array of shape batch_size, num_classes
:return: the loss and accuracy for this batch
"""
self.train() # sets model to training mode (in case batch normalization or other methods have different procedures for training and evaluation)
if len(y.shape) > 1:
y = np.argmax(y, axis=1) # convert one hot encoded labels to single integer labels
#print(type(x))
if type(x) is np.ndarray:
x, y = torch.Tensor(x).float().to(device=self.device), torch.Tensor(y).long().to(
device=self.device) # send data to device as torch tensors
x = x.to(self.device)
y = y.to(self.device)
out = self.model.forward(x) # forward the data in the model
loss = F.cross_entropy(input=out, target=y) # compute loss
self.optimizer.zero_grad() # set all weight grads from previous training iters to 0
loss.backward() # backpropagate to compute gradients for current iter loss
self.optimizer.step() # update network parameters
_, predicted = torch.max(out.data, 1) # get argmax of predictions
accuracy = np.mean(list(predicted.eq(y.data).cpu())) # compute accuracy
return loss.data.detach().cpu().numpy(), accuracy
def run_evaluation_iter(self, x, y):
"""
Receives the inputs and targets for the model and runs an evaluation iteration. Returns loss and accuracy metrics.
:param x: The inputs to the model. A numpy array of shape batch_size, channels, height, width
:param y: The targets for the model. A numpy array of shape batch_size, num_classes
:return: the loss and accuracy for this batch
"""
self.eval() # sets the system to validation mode
if len(y.shape) > 1:
y = np.argmax(y, axis=1) # convert one hot encoded labels to single integer labels
if type(x) is np.ndarray:
x, y = torch.Tensor(x).float().to(device=self.device), torch.Tensor(y).long().to(
device=self.device) # convert data to pytorch tensors and send to the computation device
x = x.to(self.device)
y = y.to(self.device)
out = self.model.forward(x) # forward the data in the model
loss = F.cross_entropy(out, y) # compute loss
_, predicted = torch.max(out.data, 1) # get argmax of predictions
accuracy = np.mean(list(predicted.eq(y.data).cpu())) # compute accuracy
return loss.data.detach().cpu().numpy(), accuracy
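The metric in `run_train_iter` and `run_evaluation_iter` reduces to mean argmax agreement; in NumPy terms:

```python
import numpy as np

out = np.array([[0.1, 0.9],   # logits for 3 samples, 2 classes
                [0.8, 0.2],
                [0.3, 0.7]])
y = np.array([1, 0, 0])
predicted = out.argmax(axis=1)      # the torch.max(out.data, 1) step
accuracy = np.mean(predicted == y)  # fraction of correct predictions
```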
def save_model(self, model_save_dir, model_save_name, model_idx, state):
"""
Save the network parameter state and current best val epoch idx and best val accuracy.
:param model_save_name: Name to use to save model without the epoch index
:param model_idx: The index to save the model with.
:param best_validation_model_idx: The index of the best validation model to be stored for future use.
:param best_validation_model_acc: The best validation accuracy to be stored for use at test time.
:param model_save_dir: The directory to store the state at.
:param state: The dictionary containing the system state.
"""
state['network'] = self.state_dict() # save network parameter and other variables.
torch.save(state, f=os.path.join(model_save_dir, "{}_{}".format(model_save_name, str(
model_idx)))) # save state at prespecified filepath
def run_training_epoch(self, current_epoch_losses):
with tqdm.tqdm(total=len(self.train_data), file=sys.stdout) as pbar_train: # create a progress bar for training
for idx, (x, y) in enumerate(self.train_data): # get data batches
loss, accuracy = self.run_train_iter(x=x, y=y) # take a training iter step
current_epoch_losses["train_loss"].append(loss) # add current iter loss to the train loss list
current_epoch_losses["train_acc"].append(accuracy) # add current iter acc to the train acc list
pbar_train.update(1)
pbar_train.set_description("loss: {:.4f}, accuracy: {:.4f}".format(loss, accuracy))
return current_epoch_losses
def run_validation_epoch(self, current_epoch_losses):
with tqdm.tqdm(total=len(self.val_data), file=sys.stdout) as pbar_val: # create a progress bar for validation
for x, y in self.val_data: # get data batches
loss, accuracy = self.run_evaluation_iter(x=x, y=y) # run a validation iter
current_epoch_losses["val_loss"].append(loss) # add current iter loss to val loss list.
current_epoch_losses["val_acc"].append(accuracy) # add current iter acc to val acc lst.
pbar_val.update(1) # add 1 step to the progress bar
pbar_val.set_description("loss: {:.4f}, accuracy: {:.4f}".format(loss, accuracy))
return current_epoch_losses
def run_testing_epoch(self, current_epoch_losses):
with tqdm.tqdm(total=len(self.test_data), file=sys.stdout) as pbar_test:  # init a progress bar
for x, y in self.test_data: # sample batch
loss, accuracy = self.run_evaluation_iter(x=x,
y=y) # compute loss and accuracy by running an evaluation step
current_epoch_losses["test_loss"].append(loss) # save test loss
current_epoch_losses["test_acc"].append(accuracy) # save test accuracy
pbar_test.update(1) # update progress bar status
pbar_test.set_description(
"loss: {:.4f}, accuracy: {:.4f}".format(loss, accuracy)) # update progress bar string output
return current_epoch_losses
def load_model(self, model_save_dir, model_save_name, model_idx):
"""
Load the network parameter state and the best val model idx and best val acc to be compared with the future val accuracies, in order to choose the best val model
:param model_save_dir: The directory to store the state at.
:param model_save_name: Name to use to save model without the epoch index
:param model_idx: The index to save the model with.
:return: best val idx and best val model acc, also it loads the network state into the system state without returning it
"""
state = torch.load(f=os.path.join(model_save_dir, "{}_{}".format(model_save_name, str(model_idx))))
self.load_state_dict(state_dict=state['network'])
return state['best_val_model_idx'], state['best_val_model_acc'], state
def run_experiment(self):
"""
Runs experiment train and evaluation iterations, saving the model and best val model and val model accuracy after each epoch
:return: The summary current_epoch_losses from starting epoch to total_epochs.
"""
total_losses = {"train_acc": [], "train_loss": [], "val_acc": [],
"val_loss": [], "curr_epoch": []} # initialize a dict to keep the per-epoch metrics
for i, epoch_idx in enumerate(range(self.starting_epoch, self.num_epochs)):
epoch_start_time = time.time()
current_epoch_losses = {"train_acc": [], "train_loss": [], "val_acc": [], "val_loss": []}
current_epoch_losses = self.run_training_epoch(current_epoch_losses)
current_epoch_losses = self.run_validation_epoch(current_epoch_losses)
val_mean_accuracy = np.mean(current_epoch_losses['val_acc'])
if val_mean_accuracy > self.best_val_model_acc: # if current epoch's mean val acc is greater than the saved best val acc then
self.best_val_model_acc = val_mean_accuracy # set the best val model acc to be current epoch's val accuracy
self.best_val_model_idx = epoch_idx # set the experiment-wise best val idx to be the current epoch's idx
for key, value in current_epoch_losses.items():
total_losses[key].append(np.mean(value))
# get mean of all metrics of current epoch metrics dict,
# to get them ready for storage and output on the terminal.
total_losses['curr_epoch'].append(epoch_idx)
save_statistics(experiment_log_dir=self.experiment_logs, filename='summary.csv',
stats_dict=total_losses, current_epoch=i,
continue_from_mode=True if (self.starting_epoch != 0 or i > 0) else False) # save statistics to stats file.
# load_statistics(experiment_log_dir=self.experiment_logs, filename='summary.csv') # How to load a csv file if you need to
out_string = "_".join(
["{}_{:.4f}".format(key, np.mean(value)) for key, value in current_epoch_losses.items()])
# create a string to use to report our epoch metrics
epoch_elapsed_time = time.time() - epoch_start_time # calculate time taken for epoch
epoch_elapsed_time = "{:.4f}".format(epoch_elapsed_time)
print("Epoch {}:".format(epoch_idx), out_string, "epoch time", epoch_elapsed_time, "seconds")
self.state['current_epoch_idx'] = epoch_idx
self.state['best_val_model_acc'] = self.best_val_model_acc
self.state['best_val_model_idx'] = self.best_val_model_idx
self.save_model(model_save_dir=self.experiment_saved_models,
# save model and best val idx and best val acc, using the model dir, model name and model idx
model_save_name="train_model", model_idx=epoch_idx, state=self.state)
self.save_model(model_save_dir=self.experiment_saved_models,
# save model and best val idx and best val acc, using the model dir, model name and model idx
model_save_name="train_model", model_idx='latest', state=self.state)
print("Generating test set evaluation metrics")
self.load_model(model_save_dir=self.experiment_saved_models, model_idx=self.best_val_model_idx,
# load best validation model
model_save_name="train_model")
current_epoch_losses = {"test_acc": [], "test_loss": []} # initialize a statistics dict
current_epoch_losses = self.run_testing_epoch(current_epoch_losses=current_epoch_losses)
test_losses = {key: [np.mean(value)] for key, value in
current_epoch_losses.items()} # save test set metrics in dict format
save_statistics(experiment_log_dir=self.experiment_logs, filename='test_summary.csv',
# save test set metrics on disk in .csv format
stats_dict=test_losses, current_epoch=0, continue_from_mode=False)
return total_losses, test_losses
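`save_statistics` comes from `storage_utils`, which is not part of this diff. A plausible minimal sketch of its behaviour (hypothetical; the real implementation may differ) is one CSV row per epoch, with the header written only on a fresh start:

```python
import csv
import os

def save_statistics(experiment_log_dir, filename, stats_dict, current_epoch,
                    continue_from_mode=False):
    # Append when continuing an experiment, otherwise start a fresh file
    # with a header row taken from the stats dict keys.
    path = os.path.join(experiment_log_dir, filename)
    with open(path, 'a' if continue_from_mode else 'w', newline='') as f:
        writer = csv.writer(f)
        if not continue_from_mode:
            writer.writerow(list(stats_dict.keys()))
        writer.writerow([stats_dict[key][current_epoch] for key in stats_dict])
    return path
```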


@ -0,0 +1,16 @@
{
"batch_size": 100,
"dataset_name": "cifar10",
"continue_from_epoch": -2,
"seed": 0,
"image_num_channels": 3,
"image_height": 32,
"image_width": 32,
"dim_reduction_type": "avg_pooling",
"num_layers": 4,
"num_filters": 64,
"num_epochs": 250,
"experiment_name": "cifar10_tutorial",
"use_gpu": true,
"weight_decay_coefficient": 1e-05
}
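These JSON configs are consumed via `--filepath_to_arguments_json_file`. A minimal sketch of what the loader might do (hypothetical names; the repo's `arg_extractor.py` may differ): parse the JSON and let its keys override argparse-style defaults.

```python
import json

defaults = {"batch_size": 64, "seed": 0, "use_gpu": False, "num_epochs": 100}

config_text = '''{"batch_size": 100, "dataset_name": "cifar10",
                  "use_gpu": true, "num_epochs": 250}'''

args = dict(defaults)
args.update(json.loads(config_text))  # JSON keys win over the defaults
```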


@ -0,0 +1,16 @@
{
"batch_size": 100,
"dataset_name": "emnist",
"continue_from_epoch": -2,
"seed": 0,
"image_num_channels": 1,
"image_height": 28,
"image_width": 28,
"dim_reduction_type": "avg_pooling",
"num_layers": 4,
"num_filters": 32,
"num_epochs": 250,
"experiment_name": "emnist_tutorial",
"use_gpu": true,
"weight_decay_coefficient": 1e-05
}
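These JSON configs mirror the command-line flags and are consumed via `--filepath_to_arguments_json_file`. A minimal sketch of loading such a file into an attribute-style namespace (a hypothetical `load_config` helper; the repo's actual `arg_extractor.py` may differ):

```python
import json
from types import SimpleNamespace

def load_config(path):
    """Load a JSON experiment config into an attribute-style namespace."""
    with open(path) as f:
        return SimpleNamespace(**json.load(f))

# args = load_config("experiment_configs/emnist_tutorial_config.json")
# then access e.g. args.batch_size, args.use_gpu, args.num_epochs
```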

6
install.sh Normal file
View File

@ -0,0 +1,6 @@
conda install -c conda-forge opencv
conda install numpy scipy matplotlib
conda install -c conda-forge pbzip2 pydrive
conda install pillow tqdm
pip install GPUtil
conda install pytorch torchvision cudatoolkit=9.0 -c pytorch

View File

@ -0,0 +1,12 @@
#!/bin/sh
cd ..
export DATASET_DIR="data/"
# Activate the relevant virtual environment:
python train_evaluate_emnist_classification_system.py --batch_size 100 --continue_from_epoch -1 --seed 0 \
--image_num_channels 3 --image_height 32 --image_width 32 \
--dim_reduction_type "strided" --num_layers 4 --num_filters 64 \
--num_epochs 100 --experiment_name 'cifar100_test_exp' \
--use_gpu "True" --weight_decay_coefficient 0. \
--dataset_name "cifar100"

View File

@ -0,0 +1,12 @@
#!/bin/sh
cd ..
export DATASET_DIR="data/"
# Activate the relevant virtual environment:
python train_evaluate_emnist_classification_system.py --batch_size 100 --continue_from_epoch -1 --seed 0 \
--image_num_channels 3 --image_height 32 --image_width 32 \
--dim_reduction_type "strided" --num_layers 4 --num_filters 64 \
--num_epochs 100 --experiment_name 'cifar10_test_exp' \
--use_gpu "True" --weight_decay_coefficient 0. \
--dataset_name "cifar10"

View File

@ -0,0 +1,7 @@
#!/bin/sh
cd ..
export DATASET_DIR="data/"
# Activate the relevant virtual environment:
python train_evaluate_emnist_classification_system.py --filepath_to_arguments_json_file experiment_configs/cifar10_tutorial_config.json

View File

@ -0,0 +1,11 @@
#!/bin/sh
cd ..
export DATASET_DIR="data/"
# Activate the relevant virtual environment:
python train_evaluate_emnist_classification_system.py --batch_size 100 --continue_from_epoch -1 --seed 0 \
--image_num_channels 1 --image_height 28 --image_width 28 \
--dim_reduction_type "strided" --num_layers 4 --num_filters 64 \
--num_epochs 100 --experiment_name 'emnist_test_exp' \
--use_gpu "True" --weight_decay_coefficient 0.

View File

@ -0,0 +1,7 @@
#!/bin/sh
cd ..
export DATASET_DIR="data/"
# Activate the relevant virtual environment:
python train_evaluate_emnist_classification_system.py --filepath_to_arguments_json_file experiment_configs/emnist_tutorial_config.json

View File

@ -1,6 +0,0 @@
# -*- coding: utf-8 -*-
"""Machine Learning Practical package."""
__authors__ = ['Pawel Swietojanski', 'Steve Renals', 'Matt Graham', 'Antreas Antoniou']
DEFAULT_SEED = 123456 # Default random number generator seed if none provided.

View File

@ -1,176 +0,0 @@
# -*- coding: utf-8 -*-
"""Error functions.
This module defines error functions, with the aim of model training being to
minimise the error function given a set of inputs and target outputs.
The error functions will typically measure some concept of distance between the
model outputs and target outputs, averaged over all data points in the data set
or batch.
"""
import numpy as np
class SumOfSquaredDiffsError(object):
"""Sum of squared differences (squared Euclidean distance) error."""
def __call__(self, outputs, targets):
"""Calculates error function given a batch of outputs and targets.
Args:
outputs: Array of model outputs of shape (batch_size, output_dim).
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Scalar cost function value.
"""
return 0.5 * np.mean(np.sum((outputs - targets)**2, axis=1))
def grad(self, outputs, targets):
"""Calculates gradient of error function with respect to outputs.
Args:
outputs: Array of model outputs of shape (batch_size, output_dim).
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Gradient of error function with respect to outputs.
"""
return (outputs - targets) / outputs.shape[0]
def __repr__(self):
return 'SumOfSquaredDiffsError'
class BinaryCrossEntropyError(object):
"""Binary cross entropy error."""
def __call__(self, outputs, targets):
"""Calculates error function given a batch of outputs and targets.
Args:
outputs: Array of model outputs of shape (batch_size, output_dim).
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Scalar error function value.
"""
return -np.mean(
targets * np.log(outputs) + (1. - targets) * np.log(1. - outputs))
def grad(self, outputs, targets):
"""Calculates gradient of error function with respect to outputs.
Args:
outputs: Array of model outputs of shape (batch_size, output_dim).
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Gradient of error function with respect to outputs.
"""
return ((1. - targets) / (1. - outputs) -
(targets / outputs)) / outputs.shape[0]
def __repr__(self):
return 'BinaryCrossEntropyError'
class BinaryCrossEntropySigmoidError(object):
"""Binary cross entropy error with logistic sigmoid applied to outputs."""
def __call__(self, outputs, targets):
"""Calculates error function given a batch of outputs and targets.
Args:
outputs: Array of model outputs of shape (batch_size, output_dim).
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Scalar error function value.
"""
probs = 1. / (1. + np.exp(-outputs))
return -np.mean(
targets * np.log(probs) + (1. - targets) * np.log(1. - probs))
def grad(self, outputs, targets):
"""Calculates gradient of error function with respect to outputs.
Args:
outputs: Array of model outputs of shape (batch_size, output_dim).
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Gradient of error function with respect to outputs.
"""
probs = 1. / (1. + np.exp(-outputs))
return (probs - targets) / outputs.shape[0]
def __repr__(self):
return 'BinaryCrossEntropySigmoidError'
class CrossEntropyError(object):
"""Multi-class cross entropy error."""
def __call__(self, outputs, targets):
"""Calculates error function given a batch of outputs and targets.
Args:
outputs: Array of model outputs of shape (batch_size, output_dim).
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Scalar error function value.
"""
return -np.mean(np.sum(targets * np.log(outputs), axis=1))
def grad(self, outputs, targets):
"""Calculates gradient of error function with respect to outputs.
Args:
outputs: Array of model outputs of shape (batch_size, output_dim).
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Gradient of error function with respect to outputs.
"""
return -(targets / outputs) / outputs.shape[0]
def __repr__(self):
return 'CrossEntropyError'
class CrossEntropySoftmaxError(object):
"""Multi-class cross entropy error with Softmax applied to outputs."""
def __call__(self, outputs, targets):
"""Calculates error function given a batch of outputs and targets.
Args:
outputs: Array of model outputs of shape (batch_size, output_dim).
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Scalar error function value.
"""
norm_outputs = outputs - outputs.max(-1)[:, None]
log_prob = norm_outputs - np.log(np.sum(np.exp(norm_outputs), axis=-1)[:, None])
return -np.mean(np.sum(targets * log_prob, axis=1))
def grad(self, outputs, targets):
"""Calculates gradient of error function with respect to outputs.
Args:
outputs: Array of model outputs of shape (batch_size, output_dim).
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Gradient of error function with respect to outputs.
"""
probs = np.exp(outputs - outputs.max(-1)[:, None])
probs /= probs.sum(-1)[:, None]
return (probs - targets) / outputs.shape[0]
def __repr__(self):
return 'CrossEntropySoftmaxError'
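As a quick sanity check on the `CrossEntropySoftmaxError` maths above, the analytic gradient can be compared against central finite differences (a hypothetical standalone snippet with the class inlined, not part of the coursework package):

```python
import numpy as np

class CrossEntropySoftmaxError(object):
    """Softmax cross-entropy, as defined above."""
    def __call__(self, outputs, targets):
        norm_outputs = outputs - outputs.max(-1)[:, None]
        log_prob = norm_outputs - np.log(np.sum(np.exp(norm_outputs), axis=-1)[:, None])
        return -np.mean(np.sum(targets * log_prob, axis=1))

    def grad(self, outputs, targets):
        probs = np.exp(outputs - outputs.max(-1)[:, None])
        probs /= probs.sum(-1)[:, None]
        return (probs - targets) / outputs.shape[0]

error = CrossEntropySoftmaxError()
rng = np.random.RandomState(0)
outputs = rng.randn(5, 3)
targets = np.eye(3)[rng.randint(0, 3, size=5)]  # one-hot targets

# central finite differences on each logit
eps = 1e-6
numeric = np.zeros_like(outputs)
for idx in np.ndindex(*outputs.shape):
    plus, minus = outputs.copy(), outputs.copy()
    plus[idx] += eps
    minus[idx] -= eps
    numeric[idx] = (error(plus, targets) - error(minus, targets)) / (2 * eps)

assert np.allclose(numeric, error.grad(outputs, targets), atol=1e-5)
```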

View File

@ -1,143 +0,0 @@
# -*- coding: utf-8 -*-
"""Parameter initialisers.
This module defines classes to initialise the parameters in a layer.
"""
import numpy as np
from mlp import DEFAULT_SEED
class ConstantInit(object):
"""Constant parameter initialiser."""
def __init__(self, value):
"""Construct a constant parameter initialiser.
Args:
value: Value to initialise parameter to.
"""
self.value = value
def __call__(self, shape):
return np.ones(shape=shape) * self.value
class UniformInit(object):
"""Random uniform parameter initialiser."""
def __init__(self, low, high, rng=None):
"""Construct a random uniform parameter initialiser.
Args:
low: Lower bound of interval to sample from.
high: Upper bound of interval to sample from.
rng (RandomState): Seeded random number generator.
"""
self.low = low
self.high = high
if rng is None:
rng = np.random.RandomState(DEFAULT_SEED)
self.rng = rng
def __call__(self, shape):
return self.rng.uniform(low=self.low, high=self.high, size=shape)
class NormalInit(object):
"""Random normal parameter initialiser."""
def __init__(self, mean, std, rng=None):
"""Construct a random uniform parameter initialiser.
Args:
mean: Mean of distribution to sample from.
std: Standard deviation of distribution to sample from.
rng (RandomState): Seeded random number generator.
"""
self.mean = mean
self.std = std
if rng is None:
rng = np.random.RandomState(DEFAULT_SEED)
self.rng = rng
def __call__(self, shape):
return self.rng.normal(loc=self.mean, scale=self.std, size=shape)
class GlorotUniformInit(object):
"""Glorot and Bengio (2010) random uniform weights initialiser.
Initialises a two-dimensional parameter array using the 'normalized
initialisation' scheme suggested in [1] which attempts to maintain a
roughly constant variance in the activations and backpropagated gradients
of a multi-layer model consisting of interleaved affine and logistic
sigmoidal transformation layers.
Weights are sampled from a zero-mean uniform distribution with standard
deviation `sqrt(2 / (input_dim + output_dim))` where `input_dim` and
`output_dim` are the input and output dimensions of the weight matrix
respectively.
References:
[1]: Understanding the difficulty of training deep feedforward neural
networks, Glorot and Bengio (2010)
"""
def __init__(self, gain=1., rng=None):
Construct a normalised initialisation random initialiser object.
Args:
gain: Multiplicative factor to scale initialised weights by.
Recommended value is 1 for affine layers followed by
logistic sigmoid layers (or another affine layer).
rng (RandomState): Seeded random number generator.
"""
self.gain = gain
if rng is None:
rng = np.random.RandomState(DEFAULT_SEED)
self.rng = rng
def __call__(self, shape):
assert len(shape) == 2, (
'Initialiser should only be used for two dimensional arrays.')
std = self.gain * (2. / (shape[0] + shape[1]))**0.5
half_width = 3.**0.5 * std
return self.rng.uniform(low=-half_width, high=half_width, size=shape)
class GlorotNormalInit(object):
"""Glorot and Bengio (2010) random normal weights initialiser.
Initialises a two-dimensional parameter array using the 'normalized
initialisation' scheme suggested in [1] which attempts to maintain a
roughly constant variance in the activations and backpropagated gradients
of a multi-layer model consisting of interleaved affine and logistic
sigmoidal transformation layers.
Weights are sampled from a zero-mean normal distribution with standard
deviation `sqrt(2 / (input_dim + output_dim))` where `input_dim` and
`output_dim` are the input and output dimensions of the weight matrix
respectively.
References:
[1]: Understanding the difficulty of training deep feedforward neural
networks, Glorot and Bengio (2010)
"""
def __init__(self, gain=1., rng=None):
Construct a normalised initialisation random initialiser object.
Args:
gain: Multiplicative factor to scale initialised weights by.
Recommended value is 1 for affine layers followed by
logistic sigmoid layers (or another affine layer).
rng (RandomState): Seeded random number generator.
"""
self.gain = gain
if rng is None:
rng = np.random.RandomState(DEFAULT_SEED)
self.rng = rng
def __call__(self, shape):
std = self.gain * (2. / (shape[0] + shape[1]))**0.5
return self.rng.normal(loc=0., scale=std, size=shape)
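The `GlorotUniformInit` half-width, `sqrt(3) * std`, follows because a uniform distribution on `[-a, a]` has standard deviation `a / sqrt(3)`. A quick empirical check (a hypothetical snippet, not part of the package):

```python
import numpy as np

fan_in, fan_out = 200, 300
std = np.sqrt(2. / (fan_in + fan_out))   # the Glorot target std
half_width = np.sqrt(3.) * std           # uniform on [-a, a] has std a / sqrt(3)

rng = np.random.RandomState(123456)
W = rng.uniform(-half_width, half_width, size=(fan_out, fan_in))
# empirical std matches the Glorot target to within a few percent
assert abs(W.std() - std) / std < 0.05
```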

View File

@ -1,824 +0,0 @@
# -*- coding: utf-8 -*-
"""Layer definitions.
This module defines classes which encapsulate a single layer.
These layers map input activations to output activation with the `fprop`
method and map gradients with respect to outputs to gradients with respect to
their inputs with the `bprop` method.
Some layers will have learnable parameters and so will additionally define
methods for getting and setting parameter and calculating gradients with
respect to the layer parameters.
"""
import numpy as np
import mlp.initialisers as init
from mlp import DEFAULT_SEED
class Layer(object):
"""Abstract class defining the interface for a layer."""
def fprop(self, inputs):
"""Forward propagates activations through the layer transformation.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
Returns:
outputs: Array of layer outputs of shape (batch_size, output_dim).
"""
raise NotImplementedError()
def bprop(self, inputs, outputs, grads_wrt_outputs):
"""Back propagates gradients through a layer.
Given gradients with respect to the outputs of the layer calculates the
gradients with respect to the layer inputs.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
outputs: Array of layer outputs calculated in forward pass of
shape (batch_size, output_dim).
grads_wrt_outputs: Array of gradients with respect to the layer
outputs of shape (batch_size, output_dim).
Returns:
Array of gradients with respect to the layer inputs of shape
(batch_size, input_dim).
"""
raise NotImplementedError()
class LayerWithParameters(Layer):
"""Abstract class defining the interface for a layer with parameters."""
def grads_wrt_params(self, inputs, grads_wrt_outputs):
"""Calculates gradients with respect to layer parameters.
Args:
inputs: Array of inputs to layer of shape (batch_size, input_dim).
grads_wrt_to_outputs: Array of gradients with respect to the layer
outputs of shape (batch_size, output_dim).
Returns:
List of arrays of gradients with respect to the layer parameters
with parameter gradients appearing in same order in tuple as
returned from `get_params` method.
"""
raise NotImplementedError()
def params_penalty(self):
"""Returns the parameter dependent penalty term for this layer.
If no parameter-dependent penalty terms are set this returns zero.
"""
raise NotImplementedError()
@property
def params(self):
"""Returns a list of parameters of layer.
Returns:
List of current parameter values. This list should be in the
corresponding order to the `values` argument to `set_params`.
"""
raise NotImplementedError()
@params.setter
def params(self, values):
"""Sets layer parameters from a list of values.
Args:
values: List of values to set parameters to. This list should be
in the corresponding order to what is returned by `get_params`.
"""
raise NotImplementedError()
class StochasticLayerWithParameters(Layer):
"""Specialised layer which uses a stochastic forward propagation."""
def __init__(self, rng=None):
"""Constructs a new StochasticLayer object.
Args:
rng (RandomState): Seeded random number generator object.
"""
if rng is None:
rng = np.random.RandomState(DEFAULT_SEED)
self.rng = rng
def fprop(self, inputs, stochastic=True):
"""Forward propagates activations through the layer transformation.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
stochastic: Flag allowing different deterministic
forward-propagation mode in addition to default stochastic
forward-propagation e.g. for use at test time. If False
a deterministic forward-propagation transformation
corresponding to the expected output of the stochastic
forward-propagation is applied.
Returns:
outputs: Array of layer outputs of shape (batch_size, output_dim).
"""
raise NotImplementedError()
def grads_wrt_params(self, inputs, grads_wrt_outputs):
"""Calculates gradients with respect to layer parameters.
Args:
inputs: Array of inputs to layer of shape (batch_size, input_dim).
grads_wrt_to_outputs: Array of gradients with respect to the layer
outputs of shape (batch_size, output_dim).
Returns:
List of arrays of gradients with respect to the layer parameters
with parameter gradients appearing in same order in tuple as
returned from `get_params` method.
"""
raise NotImplementedError()
def params_penalty(self):
"""Returns the parameter dependent penalty term for this layer.
If no parameter-dependent penalty terms are set this returns zero.
"""
raise NotImplementedError()
@property
def params(self):
"""Returns a list of parameters of layer.
Returns:
List of current parameter values. This list should be in the
corresponding order to the `values` argument to `set_params`.
"""
raise NotImplementedError()
@params.setter
def params(self, values):
"""Sets layer parameters from a list of values.
Args:
values: List of values to set parameters to. This list should be
in the corresponding order to what is returned by `get_params`.
"""
raise NotImplementedError()
class StochasticLayer(Layer):
"""Specialised layer which uses a stochastic forward propagation."""
def __init__(self, rng=None):
"""Constructs a new StochasticLayer object.
Args:
rng (RandomState): Seeded random number generator object.
"""
if rng is None:
rng = np.random.RandomState(DEFAULT_SEED)
self.rng = rng
def fprop(self, inputs, stochastic=True):
"""Forward propagates activations through the layer transformation.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
stochastic: Flag allowing different deterministic
forward-propagation mode in addition to default stochastic
forward-propagation e.g. for use at test time. If False
a deterministic forward-propagation transformation
corresponding to the expected output of the stochastic
forward-propagation is applied.
Returns:
outputs: Array of layer outputs of shape (batch_size, output_dim).
"""
raise NotImplementedError()
def bprop(self, inputs, outputs, grads_wrt_outputs):
"""Back propagates gradients through a layer.
Given gradients with respect to the outputs of the layer calculates the
gradients with respect to the layer inputs. This should correspond to
default stochastic forward-propagation.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
outputs: Array of layer outputs calculated in forward pass of
shape (batch_size, output_dim).
grads_wrt_outputs: Array of gradients with respect to the layer
outputs of shape (batch_size, output_dim).
Returns:
Array of gradients with respect to the layer inputs of shape
(batch_size, input_dim).
"""
raise NotImplementedError()
class AffineLayer(LayerWithParameters):
"""Layer implementing an affine tranformation of its inputs.
This layer is parameterised by a weight matrix and bias vector.
"""
def __init__(self, input_dim, output_dim,
weights_initialiser=init.UniformInit(-0.1, 0.1),
biases_initialiser=init.ConstantInit(0.),
weights_penalty=None, biases_penalty=None):
"""Initialises a parameterised affine layer.
Args:
input_dim (int): Dimension of inputs to the layer.
output_dim (int): Dimension of the layer outputs.
weights_initialiser: Initialiser for the weight parameters.
biases_initialiser: Initialiser for the bias parameters.
weights_penalty: Weights-dependent penalty term (regulariser) or
None if no regularisation is to be applied to the weights.
biases_penalty: Biases-dependent penalty term (regulariser) or
None if no regularisation is to be applied to the biases.
"""
self.input_dim = input_dim
self.output_dim = output_dim
self.weights = weights_initialiser((self.output_dim, self.input_dim))
self.biases = biases_initialiser(self.output_dim)
self.weights_penalty = weights_penalty
self.biases_penalty = biases_penalty
def fprop(self, inputs):
"""Forward propagates activations through the layer transformation.
For inputs `x`, outputs `y`, weights `W` and biases `b` the layer
corresponds to `y = W.dot(x) + b`.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
Returns:
outputs: Array of layer outputs of shape (batch_size, output_dim).
"""
return self.weights.dot(inputs.T).T + self.biases
def bprop(self, inputs, outputs, grads_wrt_outputs):
"""Back propagates gradients through a layer.
Given gradients with respect to the outputs of the layer calculates the
gradients with respect to the layer inputs.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
outputs: Array of layer outputs calculated in forward pass of
shape (batch_size, output_dim).
grads_wrt_outputs: Array of gradients with respect to the layer
outputs of shape (batch_size, output_dim).
Returns:
Array of gradients with respect to the layer inputs of shape
(batch_size, input_dim).
"""
return grads_wrt_outputs.dot(self.weights)
def grads_wrt_params(self, inputs, grads_wrt_outputs):
"""Calculates gradients with respect to layer parameters.
Args:
inputs: array of inputs to layer of shape (batch_size, input_dim)
grads_wrt_to_outputs: array of gradients with respect to the layer
outputs of shape (batch_size, output_dim)
Returns:
list of arrays of gradients with respect to the layer parameters
`[grads_wrt_weights, grads_wrt_biases]`.
"""
grads_wrt_weights = np.dot(grads_wrt_outputs.T, inputs)
grads_wrt_biases = np.sum(grads_wrt_outputs, axis=0)
if self.weights_penalty is not None:
grads_wrt_weights += self.weights_penalty.grad(parameter=self.weights)
if self.biases_penalty is not None:
grads_wrt_biases += self.biases_penalty.grad(parameter=self.biases)
return [grads_wrt_weights, grads_wrt_biases]
def params_penalty(self):
"""Returns the parameter dependent penalty term for this layer.
If no parameter-dependent penalty terms are set this returns zero.
"""
params_penalty = 0
if self.weights_penalty is not None:
params_penalty += self.weights_penalty(self.weights)
if self.biases_penalty is not None:
params_penalty += self.biases_penalty(self.biases)
return params_penalty
@property
def params(self):
"""A list of layer parameter values: `[weights, biases]`."""
return [self.weights, self.biases]
@params.setter
def params(self, values):
self.weights = values[0]
self.biases = values[1]
def __repr__(self):
return 'AffineLayer(input_dim={0}, output_dim={1})'.format(
self.input_dim, self.output_dim)
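The affine layer's `fprop`, `bprop`, and `grads_wrt_params` formulas can be sketched with plain NumPy arrays (a hypothetical shape/consistency check, assuming the weight layout `(output_dim, input_dim)` used above):

```python
import numpy as np

rng = np.random.RandomState(0)
input_dim, output_dim, batch_size = 3, 4, 5
W = rng.uniform(-0.1, 0.1, (output_dim, input_dim))  # weights as stored by the layer
b = rng.uniform(-0.1, 0.1, output_dim)
x = rng.randn(batch_size, input_dim)

y = W.dot(x.T).T + b                 # fprop: y = W x + b, batched
assert np.allclose(y, x.dot(W.T) + b)

g_out = rng.randn(batch_size, output_dim)
g_in = g_out.dot(W)                  # bprop: chain rule through the linear map
g_W = g_out.T.dot(x)                 # grads_wrt_params: weight gradient
g_b = g_out.sum(axis=0)              # grads_wrt_params: bias gradient
assert g_in.shape == x.shape and g_W.shape == W.shape and g_b.shape == b.shape
```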
class SigmoidLayer(Layer):
"""Layer implementing an element-wise logistic sigmoid transformation."""
def fprop(self, inputs):
"""Forward propagates activations through the layer transformation.
For inputs `x` and outputs `y` this corresponds to
`y = 1 / (1 + exp(-x))`.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
Returns:
outputs: Array of layer outputs of shape (batch_size, output_dim).
"""
return 1. / (1. + np.exp(-inputs))
def bprop(self, inputs, outputs, grads_wrt_outputs):
"""Back propagates gradients through a layer.
Given gradients with respect to the outputs of the layer calculates the
gradients with respect to the layer inputs.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
outputs: Array of layer outputs calculated in forward pass of
shape (batch_size, output_dim).
grads_wrt_outputs: Array of gradients with respect to the layer
outputs of shape (batch_size, output_dim).
Returns:
Array of gradients with respect to the layer inputs of shape
(batch_size, input_dim).
"""
return grads_wrt_outputs * outputs * (1. - outputs)
def __repr__(self):
return 'SigmoidLayer'
class ConvolutionalLayer(LayerWithParameters):
"""Layer implementing a 2D convolution-based transformation of its inputs.
The layer is parameterised by a set of 2D convolutional kernels, a four
dimensional array of shape
(num_output_channels, num_input_channels, kernel_height, kernel_width)
and a bias vector, a one dimensional array of shape
(num_output_channels,)
i.e. one shared bias per output channel.
Assuming no-padding is applied to the inputs so that outputs are only
calculated for positions where the kernel filters fully overlap with the
inputs, and that unit strides are used the outputs will have spatial extent
output_height = input_height - kernel_height + 1
output_width = input_width - kernel_width + 1
"""
def __init__(self, num_input_channels, num_output_channels,
input_height, input_width,
kernel_height, kernel_width,
kernels_init=init.UniformInit(-0.01, 0.01),
biases_init=init.ConstantInit(0.),
kernels_penalty=None, biases_penalty=None):
"""Initialises a parameterised convolutional layer.
Args:
num_input_channels (int): Number of channels in inputs to
layer (this may be number of colour channels in the input
images if used as the first layer in a model, or the
number of output channels, a.k.a. feature maps, from a
previous convolutional layer).
num_output_channels (int): Number of channels in outputs
from the layer, a.k.a. number of feature maps.
input_height (int): Size of first input dimension of each 2D
channel of inputs.
input_width (int): Size of second input dimension of each 2D
channel of inputs.
kernel_height (int): Size of first dimension of each 2D channel of
kernels.
kernel_width (int): Size of second dimension of each 2D channel of
kernels.
kernels_initialiser: Initialiser for the kernel parameters.
biases_initialiser: Initialiser for the bias parameters.
kernels_penalty: Kernel-dependent penalty term (regulariser) or
None if no regularisation is to be applied to the kernels.
biases_penalty: Biases-dependent penalty term (regulariser) or
None if no regularisation is to be applied to the biases.
"""
self.num_input_channels = num_input_channels
self.num_output_channels = num_output_channels
self.input_height = input_height
self.input_width = input_width
self.kernel_height = kernel_height
self.kernel_width = kernel_width
self.kernels_init = kernels_init
self.biases_init = biases_init
self.kernels_shape = (
num_output_channels, num_input_channels, kernel_height, kernel_width
)
self.inputs_shape = (
None, num_input_channels, input_height, input_width
)
self.kernels = self.kernels_init(self.kernels_shape)
self.biases = self.biases_init(num_output_channels)
self.kernels_penalty = kernels_penalty
self.biases_penalty = biases_penalty
self.cache = None
def fprop(self, inputs):
"""Forward propagates activations through the layer transformation.
For inputs `x`, outputs `y`, kernels `K` and biases `b` the layer
corresponds to `y = conv2d(x, K) + b`.
Args:
inputs: Array of layer inputs of shape (batch_size, num_input_channels, image_height, image_width).
Returns:
outputs: Array of layer outputs of shape (batch_size, num_output_channels, output_height, output_width).
"""
raise NotImplementedError
def bprop(self, inputs, outputs, grads_wrt_outputs):
"""Back propagates gradients through a layer.
Given gradients with respect to the outputs of the layer calculates the
gradients with respect to the layer inputs.
Args:
inputs: Array of layer inputs of shape
(batch_size, num_input_channels, input_height, input_width).
outputs: Array of layer outputs calculated in forward pass of
shape
(batch_size, num_output_channels, output_height, output_width).
grads_wrt_outputs: Array of gradients with respect to the layer
outputs of shape
(batch_size, num_output_channels, output_height, output_width).
Returns:
Array of gradients with respect to the layer inputs of shape
(batch_size, num_input_channels, input_height, input_width).
"""
# Pad the grads_wrt_outputs
raise NotImplementedError
def grads_wrt_params(self, inputs, grads_wrt_outputs):
"""Calculates gradients with respect to layer parameters.
Args:
inputs: array of inputs to layer of shape (batch_size, input_dim)
grads_wrt_to_outputs: array of gradients with respect to the layer
outputs of shape
(batch_size, num_output_channels, output_height, output_width).
Returns:
list of arrays of gradients with respect to the layer parameters
`[grads_wrt_kernels, grads_wrt_biases]`.
"""
# Get inputs_col from previous fprop
raise NotImplementedError
def params_penalty(self):
"""Returns the parameter dependent penalty term for this layer.
If no parameter-dependent penalty terms are set this returns zero.
"""
params_penalty = 0
if self.kernels_penalty is not None:
params_penalty += self.kernels_penalty(self.kernels)
if self.biases_penalty is not None:
params_penalty += self.biases_penalty(self.biases)
return params_penalty
@property
def params(self):
"""A list of layer parameter values: `[kernels, biases]`."""
return [self.kernels, self.biases]
@params.setter
def params(self, values):
self.kernels = values[0]
self.biases = values[1]
def __repr__(self):
return (
'ConvolutionalLayer(\n'
' num_input_channels={0}, num_output_channels={1},\n'
' input_height={2}, input_width={3},\n'
' kernel_height={4}, kernel_width={5}\n'
')'
.format(self.num_input_channels, self.num_output_channels,
self.input_height, self.input_width, self.kernel_height,
self.kernel_width)
)
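`fprop` above is deliberately left unimplemented (a coursework exercise). For intuition only, a naive 'valid', unit-stride cross-correlation could be sketched as below (a hypothetical reference, not the intended efficient im2col solution, and ignoring the convolution-vs-cross-correlation kernel-flip distinction):

```python
import numpy as np

def conv2d_valid(x, kernels, biases):
    """Naive 'valid' cross-correlation: no padding, unit stride.
    x: (batch, c_in, h, w); kernels: (c_out, c_in, kh, kw); biases: (c_out,)."""
    n, c_in, h, w = x.shape
    c_out, _, kh, kw = kernels.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    out = np.empty((n, c_out, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, :, i:i + kh, j:j + kw]  # (n, c_in, kh, kw)
            # contract channel and spatial kernel axes -> (n, c_out)
            out[:, :, i, j] = np.tensordot(patch, kernels,
                                           axes=([1, 2, 3], [1, 2, 3]))
    return out + biases[None, :, None, None]
```

Note the output spatial extent matches the docstring: `out_h = input_height - kernel_height + 1`.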
class ReluLayer(Layer):
"""Layer implementing an element-wise rectified linear transformation."""
def fprop(self, inputs):
"""Forward propagates activations through the layer transformation.
For inputs `x` and outputs `y` this corresponds to `y = max(0, x)`.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
Returns:
outputs: Array of layer outputs of shape (batch_size, output_dim).
"""
return np.maximum(inputs, 0.)
def bprop(self, inputs, outputs, grads_wrt_outputs):
"""Back propagates gradients through a layer.
Given gradients with respect to the outputs of the layer calculates the
gradients with respect to the layer inputs.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
outputs: Array of layer outputs calculated in forward pass of
shape (batch_size, output_dim).
grads_wrt_outputs: Array of gradients with respect to the layer
outputs of shape (batch_size, output_dim).
Returns:
Array of gradients with respect to the layer inputs of shape
(batch_size, input_dim).
"""
return (outputs > 0) * grads_wrt_outputs
def __repr__(self):
return 'ReluLayer'
class TanhLayer(Layer):
"""Layer implementing an element-wise hyperbolic tangent transformation."""
def fprop(self, inputs):
"""Forward propagates activations through the layer transformation.
For inputs `x` and outputs `y` this corresponds to `y = tanh(x)`.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
Returns:
outputs: Array of layer outputs of shape (batch_size, output_dim).
"""
return np.tanh(inputs)
def bprop(self, inputs, outputs, grads_wrt_outputs):
"""Back propagates gradients through a layer.
Given gradients with respect to the outputs of the layer calculates the
gradients with respect to the layer inputs.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
outputs: Array of layer outputs calculated in forward pass of
shape (batch_size, output_dim).
grads_wrt_outputs: Array of gradients with respect to the layer
outputs of shape (batch_size, output_dim).
Returns:
Array of gradients with respect to the layer inputs of shape
(batch_size, input_dim).
"""
return (1. - outputs ** 2) * grads_wrt_outputs
def __repr__(self):
return 'TanhLayer'
class SoftmaxLayer(Layer):
"""Layer implementing a softmax transformation."""
def fprop(self, inputs):
"""Forward propagates activations through the layer transformation.
For inputs `x` and outputs `y` this corresponds to
`y = exp(x) / sum(exp(x))`.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
Returns:
outputs: Array of layer outputs of shape (batch_size, output_dim).
"""
# subtract max inside exponential to improve numerical stability -
# when we divide through by sum this term cancels
exp_inputs = np.exp(inputs - inputs.max(-1)[:, None])
return exp_inputs / exp_inputs.sum(-1)[:, None]
def bprop(self, inputs, outputs, grads_wrt_outputs):
"""Back propagates gradients through a layer.
Given gradients with respect to the outputs of the layer calculates the
gradients with respect to the layer inputs.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
outputs: Array of layer outputs calculated in forward pass of
shape (batch_size, output_dim).
grads_wrt_outputs: Array of gradients with respect to the layer
outputs of shape (batch_size, output_dim).
Returns:
Array of gradients with respect to the layer inputs of shape
(batch_size, input_dim).
"""
return (outputs * (grads_wrt_outputs -
(grads_wrt_outputs * outputs).sum(-1)[:, None]))
def __repr__(self):
return 'SoftmaxLayer'
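The max-subtraction trick in `fprop` matters in practice: without it, moderately large logits overflow `np.exp`. A standalone sketch of the same expression shows it stays finite and still normalises each row:

```python
import numpy as np

def softmax(inputs):
    # subtracting the row-wise max leaves the result unchanged mathematically
    # but keeps the exponentials in a representable range
    exp_inputs = np.exp(inputs - inputs.max(-1)[:, None])
    return exp_inputs / exp_inputs.sum(-1)[:, None]

logits = np.array([[1000., 1001., 1002.],   # would overflow without the shift
                   [-2., 0., 2.]])
probs = softmax(logits)

print(probs.sum(-1))             # each row sums to 1
print(np.isfinite(probs).all())  # no inf/nan despite the huge logits
```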
class RadialBasisFunctionLayer(Layer):
"""Layer implementing projection to a grid of radial basis functions."""
def __init__(self, grid_dim, intervals=[[0., 1.]]):
"""Creates a radial basis function layer object.
Args:
grid_dim: Integer specifying how many basis functions to use in
the grid across the input space per dimension (so the total number of
basis functions will be grid_dim**input_dim)
intervals: List of intervals (two element lists or tuples)
specifying extents of axis-aligned region in input-space to
tile basis functions in grid across. For example for a 2D input
space spanning [0, 1] x [0, 1] use intervals=[[0, 1], [0, 1]].
"""
self.grid_dim = grid_dim
num_basis = grid_dim ** len(intervals)
self.centres = np.array(np.meshgrid(*[
np.linspace(low, high, grid_dim) for (low, high) in intervals])
).reshape((len(intervals), -1))
self.scales = np.array([
[(high - low) * 1. / grid_dim] for (low, high) in intervals])
def fprop(self, inputs):
"""Forward propagates activations through the layer transformation.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
Returns:
outputs: Array of layer outputs of shape (batch_size, output_dim).
"""
return np.exp(-(inputs[..., None] - self.centres[None, ...]) ** 2 /
self.scales ** 2).reshape((inputs.shape[0], -1))
def bprop(self, inputs, outputs, grads_wrt_outputs):
"""Back propagates gradients through a layer.
Given gradients with respect to the outputs of the layer calculates the
gradients with respect to the layer inputs.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
outputs: Array of layer outputs calculated in forward pass of
shape (batch_size, output_dim).
grads_wrt_outputs: Array of gradients with respect to the layer
outputs of shape (batch_size, output_dim).
Returns:
Array of gradients with respect to the layer inputs of shape
(batch_size, input_dim).
"""
num_basis = self.centres.shape[1]
return -2 * (
((inputs[..., None] - self.centres[None, ...]) / self.scales ** 2) *
grads_wrt_outputs.reshape((inputs.shape[0], -1, num_basis))
).sum(-1)
def __repr__(self):
return 'RadialBasisFunctionLayer(grid_dim={0})'.format(self.grid_dim)
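A standalone NumPy sketch of the same radial-basis projection for a single 1D interval (mirroring the `fprop` expression above, not using the class) shows the expected output shape, one Gaussian response per grid point:

```python
import numpy as np

grid_dim = 5
low, high = 0., 1.
centres = np.linspace(low, high, grid_dim)[None, :]  # shape (1, grid_dim)
scales = np.array([[(high - low) / grid_dim]])       # shape (1, 1)

inputs = np.array([[0.1], [0.5], [0.9]])             # batch of 3 scalar inputs
outputs = np.exp(-(inputs[..., None] - centres[None, ...]) ** 2 /
                 scales ** 2).reshape((inputs.shape[0], -1))

print(outputs.shape)  # (3, 5): one basis-function response per grid point
```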
class DropoutLayer(StochasticLayer):
"""Layer which stochastically drops input dimensions in its output."""
def __init__(self, rng=None, incl_prob=0.5, share_across_batch=True):
"""Construct a new dropout layer.
Args:
rng (RandomState): Seeded random number generator.
incl_prob: Scalar value in (0, 1] specifying the probability of
each input dimension being included in the output.
share_across_batch: Whether to use the same dropout mask across
all inputs in a batch or a separate mask per input.
"""
super(DropoutLayer, self).__init__(rng)
assert incl_prob > 0. and incl_prob <= 1.
self.incl_prob = incl_prob
self.share_across_batch = share_across_batch
self.rng = rng if rng is not None else np.random.RandomState()
def fprop(self, inputs, stochastic=True):
"""Forward propagates activations through the layer transformation.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
stochastic: Flag allowing different deterministic
forward-propagation mode in addition to default stochastic
forward-propagation e.g. for use at test time. If False
a deterministic forward-propagation transformation
corresponding to the expected output of the stochastic
forward-propagation is applied.
Returns:
outputs: Array of layer outputs of shape (batch_size, output_dim).
"""
if stochastic:
mask_shape = (1,) + inputs.shape[1:] if self.share_across_batch else inputs.shape
self._mask = (self.rng.uniform(size=mask_shape) < self.incl_prob)
return inputs * self._mask
else:
return inputs * self.incl_prob
def bprop(self, inputs, outputs, grads_wrt_outputs):
"""Back propagates gradients through a layer.
Given gradients with respect to the outputs of the layer calculates the
gradients with respect to the layer inputs. This should correspond to
default stochastic forward-propagation.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
outputs: Array of layer outputs calculated in forward pass of
shape (batch_size, output_dim).
grads_wrt_outputs: Array of gradients with respect to the layer
outputs of shape (batch_size, output_dim).
Returns:
Array of gradients with respect to the layer inputs of shape
(batch_size, input_dim).
"""
return grads_wrt_outputs * self._mask
def __repr__(self):
return 'DropoutLayer(incl_prob={0:.1f})'.format(self.incl_prob)
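The deterministic branch of `fprop` scales by `incl_prob` because that is the expected value of the random mask. A small standalone check (same logic, plain NumPy) shows the stochastic pass matches that expectation on average:

```python
import numpy as np

rng = np.random.RandomState(0)
incl_prob = 0.5
inputs = np.ones((1, 100000))

# stochastic pass: keep each dimension with probability incl_prob
mask = rng.uniform(size=inputs.shape) < incl_prob
stochastic_out = inputs * mask

# deterministic pass: scale by incl_prob (the expected value of the mask)
deterministic_out = inputs * incl_prob

print(stochastic_out.mean())     # close to 0.5
print(deterministic_out.mean())  # exactly 0.5
```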
class ReshapeLayer(Layer):
"""Layer which reshapes dimensions of inputs."""
def __init__(self, output_shape=None):
"""Create a new reshape layer object.
Args:
output_shape: Tuple specifying shape each input in batch should
be reshaped to in outputs. This **excludes** the batch size
so the shape of the final output array will be
(batch_size, ) + output_shape
Similarly to numpy.reshape, one shape dimension can be -1. In
this case, the value is inferred from the size of the input
array and remaining dimensions. The shape specified must be
compatible with the input array shape - i.e. the total number
of values in the array cannot be changed. If set to `None` the
output shape will be set to
(batch_size, -1)
which will flatten all the inputs to vectors.
"""
self.output_shape = (-1,) if output_shape is None else output_shape
def fprop(self, inputs):
"""Forward propagates activations through the layer transformation.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
Returns:
outputs: Array of layer outputs of shape (batch_size, output_dim).
"""
return inputs.reshape((inputs.shape[0],) + self.output_shape)
def bprop(self, inputs, outputs, grads_wrt_outputs):
"""Back propagates gradients through a layer.
Given gradients with respect to the outputs of the layer calculates the
gradients with respect to the layer inputs.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
outputs: Array of layer outputs calculated in forward pass of
shape (batch_size, output_dim).
grads_wrt_outputs: Array of gradients with respect to the layer
outputs of shape (batch_size, output_dim).
Returns:
Array of gradients with respect to the layer inputs of shape
(batch_size, input_dim).
"""
return grads_wrt_outputs.reshape(inputs.shape)
def __repr__(self):
return 'ReshapeLayer(output_shape={0})'.format(self.output_shape)
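Because reshaping moves no values, the backward pass is just the inverse reshape. A minimal standalone sketch of the two operations above:

```python
import numpy as np

batch = np.arange(24.).reshape((4, 6))  # batch of 4 six-dimensional inputs
output_shape = (2, 3)

# fprop: reshape each input, keeping the batch dimension
outputs = batch.reshape((batch.shape[0],) + output_shape)

# bprop: gradients are simply reshaped back to the input shape
grads_wrt_outputs = np.ones_like(outputs)
grads_wrt_inputs = grads_wrt_outputs.reshape(batch.shape)

print(outputs.shape)           # (4, 2, 3)
print(grads_wrt_inputs.shape)  # (4, 6)
```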


@ -1,388 +0,0 @@
# -*- coding: utf-8 -*-
"""Learning rules.
This module contains classes implementing gradient based learning rules.
"""
import numpy as np
class GradientDescentLearningRule(object):
"""Simple (stochastic) gradient descent learning rule.
For a scalar error function `E(p[0], p[1], ...)` of some set of
potentially multidimensional parameters this attempts to find a local
minimum of the loss function by applying updates to each parameter of the
form
p[i] := p[i] - learning_rate * dE/dp[i]
With `learning_rate` a positive scaling parameter.
The error function used in successive applications of these updates may be
a stochastic estimator of the true error function (e.g. when the error with
respect to only a subset of data-points is calculated) in which case this
will correspond to a stochastic gradient descent learning rule.
"""
def __init__(self, learning_rate=1e-3):
"""Creates a new learning rule object.
Args:
learning_rate: A positive scalar to scale gradient updates to the
parameters by. This needs to be carefully set - if too large
the learning dynamic will be unstable and may diverge, while
if set too small learning will proceed very slowly.
"""
assert learning_rate > 0., 'learning_rate should be positive.'
self.learning_rate = learning_rate
def initialise(self, params):
"""Initialises the state of the learning rule for a set or parameters.
This must be called before `update_params` is first called.
Args:
params: A list of the parameters to be optimised. Note these will
be updated *in-place* to avoid reallocating arrays on each
update.
"""
self.params = params
def reset(self):
"""Resets any additional state variables to their intial values.
For this learning rule there are no additional state variables so we
do nothing here.
"""
pass
def update_params(self, grads_wrt_params):
"""Applies a single gradient descent update to all parameters.
All parameter updates are performed using in-place operations and so
nothing is returned.
Args:
grads_wrt_params: A list of gradients of the scalar loss function
with respect to each of the parameters passed to `initialise`
previously, with this list expected to be in the same order.
"""
for param, grad in zip(self.params, grads_wrt_params):
param -= self.learning_rate * grad
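The update rule `p := p - learning_rate * dE/dp` can be seen in isolation on a toy quadratic loss `E(p) = 0.5 * p**2` (gradient `p`), using the same in-place style as `update_params` above:

```python
import numpy as np

learning_rate = 0.1
param = np.array([5.0])  # single parameter, loss E(p) = 0.5 * p**2, dE/dp = p

for _ in range(100):
    grad = param.copy()            # gradient of the quadratic loss
    param -= learning_rate * grad  # in-place update, as in update_params

print(param)  # decays geometrically towards the minimum at 0
```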
class MomentumLearningRule(GradientDescentLearningRule):
"""Gradient descent with momentum learning rule.
This extends the basic gradient learning rule by introducing extra
momentum state variables for each parameter. These can help the learning
dynamics overcome shallow local minima and speed convergence when
making multiple successive steps in a similar direction in parameter space.
For parameter p[i] and corresponding momentum m[i] the updates for a
scalar loss function `L` are of the form
m[i] := mom_coeff * m[i] - learning_rate * dL/dp[i]
p[i] := p[i] + m[i]
with `learning_rate` a positive scaling parameter for the gradient updates
and `mom_coeff` a value in [0, 1] that determines how much 'friction' there
is in the system and so how quickly previous momentum contributions decay.
"""
def __init__(self, learning_rate=1e-3, mom_coeff=0.9):
"""Creates a new learning rule object.
Args:
learning_rate: A positive scalar to scale gradient updates to the
parameters by. This needs to be carefully set - if too large
the learning dynamic will be unstable and may diverge, while
if set too small learning will proceed very slowly.
mom_coeff: A scalar in the range [0, 1] inclusive. This determines
the contribution of the previous momentum value to the value
after each update. If equal to 0 the momentum is set to exactly
the negative scaled gradient each update and so this rule
collapses to standard gradient descent. If equal to 1 the
momentum will just be decremented by the scaled gradient at
each update. This is equivalent to simulating the dynamic in
a frictionless system. Due to energy conservation the loss
of 'potential energy' as the dynamics moves down the loss
function surface will lead to an increasingly large 'kinetic
energy' and so speed, meaning the updates will become
increasingly large, potentially unstably so. Typically a value
less than but close to 1 will avoid these issues and cause the
dynamic to converge to a local minimum where the gradients are
by definition zero.
"""
super(MomentumLearningRule, self).__init__(learning_rate)
assert mom_coeff >= 0. and mom_coeff <= 1., (
'mom_coeff should be in the range [0, 1].'
)
self.mom_coeff = mom_coeff
def initialise(self, params):
"""Initialises the state of the learning rule for a set or parameters.
This must be called before `update_params` is first called.
Args:
params: A list of the parameters to be optimised. Note these will
be updated *in-place* to avoid reallocating arrays on each
update.
"""
super(MomentumLearningRule, self).initialise(params)
self.moms = []
for param in self.params:
self.moms.append(np.zeros_like(param))
def reset(self):
"""Resets any additional state variables to their intial values.
For this learning rule this corresponds to zeroing all the momenta.
"""
for mom in self.moms:
mom *= 0.
def update_params(self, grads_wrt_params):
"""Applies a single update to all parameters.
All parameter updates are performed using in-place operations and so
nothing is returned.
Args:
grads_wrt_params: A list of gradients of the scalar loss function
with respect to each of the parameters passed to `initialise`
previously, with this list expected to be in the same order.
"""
for param, mom, grad in zip(self.params, self.moms, grads_wrt_params):
mom *= self.mom_coeff
mom -= self.learning_rate * grad
param += mom
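The two-line momentum update (`m := mom_coeff * m - learning_rate * g`, then `p := p + m`) can be replayed standalone on the same toy quadratic loss; the dynamic overshoots and oscillates before settling:

```python
import numpy as np

learning_rate, mom_coeff = 0.1, 0.9
param, mom = np.array([5.0]), np.zeros(1)

for _ in range(200):
    grad = param.copy()          # dE/dp for E(p) = 0.5 * p**2
    mom *= mom_coeff             # decay previous momentum ("friction")
    mom -= learning_rate * grad  # accumulate the scaled negative gradient
    param += mom

print(param)  # oscillates, then settles near the minimum at 0
```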
class AdamLearningRule(GradientDescentLearningRule):
"""Adaptive moments (Adam) learning rule.
First-order gradient-descent based learning rule which uses adaptive
estimates of first and second moments of the parameter gradients to
calculate the parameter updates.
References:
[1]: Adam: a method for stochastic optimisation
Kingma and Ba, 2015
"""
def __init__(self, learning_rate=1e-3, beta_1=0.9, beta_2=0.999,
epsilon=1e-8):
"""Creates a new learning rule object.
Args:
learning_rate: A positive scalar to scale gradient updates to the
parameters by. This needs to be carefully set - if too large
the learning dynamic will be unstable and may diverge, while
if set too small learning will proceed very slowly.
beta_1: Exponential decay rate for gradient first moment estimates.
This should be a scalar value in [0, 1]. The running gradient
first moment estimate is calculated using
`m_1 = beta_1 * m_1_prev + (1 - beta_1) * g`
where `m_1_prev` is the previous estimate and `g` the current
parameter gradients.
beta_2: Exponential decay rate for gradient second moment
estimates. This should be a scalar value in [0, 1]. The running
gradient second moment estimate is calculated using
`m_2 = beta_2 * m_2_prev + (1 - beta_2) * g**2`
where `m_2_prev` is the previous estimate and `g` the current
parameter gradients.
epsilon: 'Softening' parameter to stop updates diverging when
second moment estimates are close to zero. Should be set to
a small positive value.
"""
super(AdamLearningRule, self).__init__(learning_rate)
assert beta_1 >= 0. and beta_1 <= 1., 'beta_1 should be in [0, 1].'
assert beta_2 >= 0. and beta_2 <= 1., 'beta_2 should be in [0, 1].'
assert epsilon > 0., 'epsilon should be > 0.'
self.beta_1 = beta_1
self.beta_2 = beta_2
self.epsilon = epsilon
def initialise(self, params):
"""Initialises the state of the learning rule for a set or parameters.
This must be called before `update_params` is first called.
Args:
params: A list of the parameters to be optimised. Note these will
be updated *in-place* to avoid reallocating arrays on each
update.
"""
super(AdamLearningRule, self).initialise(params)
self.moms_1 = []
for param in self.params:
self.moms_1.append(np.zeros_like(param))
self.moms_2 = []
for param in self.params:
self.moms_2.append(np.zeros_like(param))
self.step_count = 0
def reset(self):
"""Resets any additional state variables to their initial values.
For this learning rule this corresponds to zeroing the estimates of
the first and second moments of the gradients.
"""
for mom_1, mom_2 in zip(self.moms_1, self.moms_2):
mom_1 *= 0.
mom_2 *= 0.
self.step_count = 0
def update_params(self, grads_wrt_params):
"""Applies a single update to all parameters.
All parameter updates are performed using in-place operations and so
nothing is returned.
Args:
grads_wrt_params: A list of gradients of the scalar loss function
with respect to each of the parameters passed to `initialise`
previously, with this list expected to be in the same order.
"""
for param, mom_1, mom_2, grad in zip(
self.params, self.moms_1, self.moms_2, grads_wrt_params):
mom_1 *= self.beta_1
mom_1 += (1. - self.beta_1) * grad
mom_2 *= self.beta_2
mom_2 += (1. - self.beta_2) * grad ** 2
alpha_t = (
self.learning_rate *
(1. - self.beta_2 ** (self.step_count + 1)) ** 0.5 /
(1. - self.beta_1 ** (self.step_count + 1))
)
param -= alpha_t * mom_1 / (mom_2 ** 0.5 + self.epsilon)
self.step_count += 1
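One useful property of the update above is that the bias-corrected step size is roughly `learning_rate` regardless of the gradient's magnitude. Replaying the first update (step_count = 0) for a single scalar, standalone:

```python
import numpy as np

learning_rate, beta_1, beta_2, epsilon = 1e-3, 0.9, 0.999, 1e-8
grad = 250.0  # an arbitrarily large gradient

# first Adam update, following the formulas in update_params above
mom_1 = (1. - beta_1) * grad
mom_2 = (1. - beta_2) * grad ** 2
alpha_t = learning_rate * (1. - beta_2) ** 0.5 / (1. - beta_1)
step = alpha_t * mom_1 / (mom_2 ** 0.5 + epsilon)

print(step)  # approximately learning_rate, independent of the gradient's scale
```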
class AdaGradLearningRule(GradientDescentLearningRule):
"""Adaptive gradients (AdaGrad) learning rule.
First-order gradient-descent based learning rule which normalises gradient
updates by a running sum of the past squared gradients.
References:
[1]: Adaptive Subgradient Methods for Online Learning and Stochastic
Optimization. Duchi, Hazan and Singer, 2011
"""
def __init__(self, learning_rate=1e-2, epsilon=1e-8):
"""Creates a new learning rule object.
Args:
learning_rate: A positive scalar to scale gradient updates to the
parameters by. This needs to be carefully set - if too large
the learning dynamic will be unstable and may diverge, while
if set too small learning will proceed very slowly.
epsilon: 'Softening' parameter to stop updates diverging when
sums of squared gradients are close to zero. Should be set to
a small positive value.
"""
super(AdaGradLearningRule, self).__init__(learning_rate)
assert epsilon > 0., 'epsilon should be > 0.'
self.epsilon = epsilon
def initialise(self, params):
"""Initialises the state of the learning rule for a set or parameters.
This must be called before `update_params` is first called.
Args:
params: A list of the parameters to be optimised. Note these will
be updated *in-place* to avoid reallocating arrays on each
update.
"""
super(AdaGradLearningRule, self).initialise(params)
self.sum_sq_grads = []
for param in self.params:
self.sum_sq_grads.append(np.zeros_like(param))
def reset(self):
"""Resets any additional state variables to their initial values.
For this learning rule this corresponds to zeroing all the sum of
squared gradient states.
"""
for sum_sq_grad in self.sum_sq_grads:
sum_sq_grad *= 0.
def update_params(self, grads_wrt_params):
"""Applies a single update to all parameters.
All parameter updates are performed using in-place operations and so
nothing is returned.
Args:
grads_wrt_params: A list of gradients of the scalar loss function
with respect to each of the parameters passed to `initialise`
previously, with this list expected to be in the same order.
"""
for param, sum_sq_grad, grad in zip(
self.params, self.sum_sq_grads, grads_wrt_params):
sum_sq_grad += grad ** 2
param -= (self.learning_rate * grad /
(sum_sq_grad + self.epsilon) ** 0.5)
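Because the normaliser is a running *sum* of squared gradients, AdaGrad's effective step size shrinks over time: with a constant gradient `g`, after `t` updates the step is `learning_rate * g / sqrt(t * g**2) = learning_rate / sqrt(t)`. A standalone replay of the update formula shows this:

```python
import numpy as np

learning_rate, epsilon = 1e-2, 1e-8
grad = 3.0
sum_sq_grad = 0.0

steps = []
for t in range(1, 5):
    sum_sq_grad += grad ** 2  # accumulate squared gradients, as above
    steps.append(learning_rate * grad / (sum_sq_grad + epsilon) ** 0.5)

print(steps)  # step sizes shrink like learning_rate / sqrt(t)
```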
class RMSPropLearningRule(GradientDescentLearningRule):
"""Root mean squared gradient normalised learning rule (RMSProp).
First-order gradient-descent based learning rule which normalises gradient
updates by an exponentially smoothed estimate of the gradient second
moments.
References:
[1]: Neural Networks for Machine Learning: Lecture 6a slides
University of Toronto, Computer Science Course CSC321
http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
"""
def __init__(self, learning_rate=1e-3, beta=0.9, epsilon=1e-8):
"""Creates a new learning rule object.
Args:
learning_rate: A positive scalar to scale gradient updates to the
parameters by. This needs to be carefully set - if too large
the learning dynamic will be unstable and may diverge, while
if set too small learning will proceed very slowly.
beta: Exponential decay rate for gradient second moment
estimates. This should be a scalar value in [0, 1]. The running
gradient second moment estimate is calculated using
`m_2 = beta * m_2_prev + (1 - beta) * g**2`
where `m_2_prev` is the previous estimate and `g` the current
parameter gradients.
epsilon: 'Softening' parameter to stop updates diverging when
gradient second moment estimates are close to zero. Should be
set to a small positive value.
"""
super(RMSPropLearningRule, self).__init__(learning_rate)
assert beta >= 0. and beta <= 1., 'beta should be in [0, 1].'
assert epsilon > 0., 'epsilon should be > 0.'
self.beta = beta
self.epsilon = epsilon
def initialise(self, params):
"""Initialises the state of the learning rule for a set or parameters.
This must be called before `update_params` is first called.
Args:
params: A list of the parameters to be optimised. Note these will
be updated *in-place* to avoid reallocating arrays on each
update.
"""
super(RMSPropLearningRule, self).initialise(params)
self.moms_2 = []
for param in self.params:
self.moms_2.append(np.zeros_like(param))
def reset(self):
"""Resets any additional state variables to their initial values.
For this learning rule this corresponds to zeroing all gradient
second moment estimates.
"""
for mom_2 in self.moms_2:
mom_2 *= 0.
def update_params(self, grads_wrt_params):
"""Applies a single update to all parameters.
All parameter updates are performed using in-place operations and so
nothing is returned.
Args:
grads_wrt_params: A list of gradients of the scalar loss function
with respect to each of the parameters passed to `initialise`
previously, with this list expected to be in the same order.
"""
for param, mom_2, grad in zip(
self.params, self.moms_2, grads_wrt_params):
mom_2 *= self.beta
mom_2 += (1. - self.beta) * grad ** 2
param -= (self.learning_rate * grad /
(mom_2 + self.epsilon) ** 0.5)
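Unlike AdaGrad, the exponentially smoothed second moment does not grow without bound, so RMSProp's step size does not decay over time; it is also nearly invariant to the overall scale of the gradients. Replaying the first update standalone for two gradients four orders of magnitude apart:

```python
import numpy as np

learning_rate, beta, epsilon = 1e-3, 0.9, 1e-8

def first_step(grad):
    # first RMSProp update starting from mom_2 = 0, as in update_params above
    mom_2 = (1. - beta) * grad ** 2
    return learning_rate * grad / (mom_2 + epsilon) ** 0.5

print(first_step(1.0), first_step(1e4))  # nearly identical step sizes
```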


@ -1,145 +0,0 @@
# -*- coding: utf-8 -*-
"""Model definitions.
This module implements objects encapsulating learnable models of input-output
relationships. The model objects implement methods for forward propagating
the inputs through the transformation(s) defined by the model to produce
outputs (and intermediate states) and for calculating gradients of scalar
functions of the outputs with respect to the model parameters.
"""
from mlp.layers import LayerWithParameters, StochasticLayer, StochasticLayerWithParameters
class SingleLayerModel(object):
"""A model consisting of a single transformation layer."""
def __init__(self, layer):
"""Create a new single layer model instance.
Args:
layer: The layer object defining the model architecture.
"""
self.layer = layer
@property
def params(self):
"""A list of all of the parameters of the model."""
return self.layer.params
def fprop(self, inputs, evaluation=False):
"""Calculate the model outputs corresponding to a batch of inputs.
Args:
inputs: Batch of inputs to the model.
Returns:
List which is a concatenation of the model inputs and model
outputs, this being done for consistency of the interface with
multi-layer models for which `fprop` returns a list of
activations through all intermediate layers of the model and including
the inputs and outputs.
"""
activations = [inputs, self.layer.fprop(inputs)]
return activations
def grads_wrt_params(self, activations, grads_wrt_outputs):
"""Calculates gradients with respect to the model parameters.
Args:
activations: List of all activations from forward pass through
model using `fprop`.
grads_wrt_outputs: Gradient with respect to the model outputs of
the scalar function parameter gradients are being calculated
for.
Returns:
List of gradients of the scalar function with respect to all model
parameters.
"""
return self.layer.grads_wrt_params(activations[0], grads_wrt_outputs)
def __repr__(self):
return 'SingleLayerModel(' + str(self.layer) + ')'
class MultipleLayerModel(object):
"""A model consisting of multiple layers applied sequentially."""
def __init__(self, layers):
"""Create a new multiple layer model instance.
Args:
layers: List of the layer objects defining the model in the
order they should be applied from inputs to outputs.
"""
self.layers = layers
@property
def params(self):
"""A list of all of the parameters of the model."""
params = []
for layer in self.layers:
if isinstance(layer, LayerWithParameters) or isinstance(layer, StochasticLayerWithParameters):
params += layer.params
return params
def fprop(self, inputs, evaluation=False):
"""Forward propagates a batch of inputs through the model.
Args:
inputs: Batch of inputs to the model.
Returns:
List of the activations at the output of all layers of the model
plus the inputs (to the first layer) as the first element. The
last element of the list corresponds to the model outputs.
"""
activations = [inputs]
for i, layer in enumerate(self.layers):
    if issubclass(type(layer), (StochasticLayer, StochasticLayerWithParameters)):
        # stochastic layers sample during training and use their
        # deterministic (expected-output) mode during evaluation
        current_activations = layer.fprop(
            activations[i], stochastic=not evaluation)
    else:
        current_activations = layer.fprop(activations[i])
    activations.append(current_activations)
return activations
def grads_wrt_params(self, activations, grads_wrt_outputs):
"""Calculates gradients with respect to the model parameters.
Args:
activations: List of all activations from forward pass through
model using `fprop`.
grads_wrt_outputs: Gradient with respect to the model outputs of
the scalar function parameter gradients are being calculated
for.
Returns:
List of gradients of the scalar function with respect to all model
parameters.
"""
grads_wrt_params = []
for i, layer in enumerate(self.layers[::-1]):
inputs = activations[-i - 2]
outputs = activations[-i - 1]
grads_wrt_inputs = layer.bprop(inputs, outputs, grads_wrt_outputs)
if isinstance(layer, LayerWithParameters) or isinstance(layer, StochasticLayerWithParameters):
grads_wrt_params += layer.grads_wrt_params(
inputs, grads_wrt_outputs)[::-1]
grads_wrt_outputs = grads_wrt_inputs
return grads_wrt_params[::-1]
def __repr__(self):
return (
'MultiLayerModel(\n ' +
'\n '.join([str(layer) for layer in self.layers]) +
'\n)'
)
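The bookkeeping in `fprop`/`grads_wrt_params` above (keep every activation on the way forward, then walk the layer list in reverse, feeding each layer's input gradient to the one before it) can be checked standalone with two parameter-free tanh "layers" in plain NumPy, comparing against finite differences:

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.randn(2, 3)

# forward pass, keeping every activation as MultipleLayerModel.fprop does
activations = [x]
for _ in range(2):  # two parameter-free tanh "layers"
    activations.append(np.tanh(activations[-1]))

# backward pass through the reversed layer list, for loss = sum of outputs
grads = np.ones_like(activations[-1])
for outputs in activations[:0:-1]:
    grads = (1. - outputs ** 2) * grads  # TanhLayer.bprop formula

# central finite differences of the same composed transformation
eps = 1e-6
numeric = (np.tanh(np.tanh(x + eps)) - np.tanh(np.tanh(x - eps))) / (2 * eps)

print(np.allclose(grads, numeric, atol=1e-6))
```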


@ -1,148 +0,0 @@
# -*- coding: utf-8 -*-
"""Model optimisers.
This module contains objects implementing (batched) stochastic gradient descent
based optimisation of models.
"""
import time
import logging
from collections import OrderedDict
import numpy as np
import tqdm
logger = logging.getLogger(__name__)
class Optimiser(object):
"""Basic model optimiser."""
def __init__(self, model, error, learning_rule, train_dataset,
valid_dataset=None, data_monitors=None, notebook=False):
"""Create a new optimiser instance.
Args:
model: The model to optimise.
error: The scalar error function to minimise.
learning_rule: Gradient based learning rule to use to minimise
error.
train_dataset: Data provider for training set data batches.
valid_dataset: Data provider for validation set data batches.
data_monitors: Dictionary of functions evaluated on targets and
model outputs (averaged across both full training and
validation data sets) to monitor during training in addition
to the error. Keys should correspond to a string label for
the statistic being evaluated.
"""
self.model = model
self.error = error
self.learning_rule = learning_rule
self.learning_rule.initialise(self.model.params)
self.train_dataset = train_dataset
self.valid_dataset = valid_dataset
self.data_monitors = OrderedDict([('error', error)])
if data_monitors is not None:
self.data_monitors.update(data_monitors)
self.notebook = notebook
if notebook:
self.tqdm_progress = tqdm.tqdm_notebook
else:
self.tqdm_progress = tqdm.tqdm
def do_training_epoch(self):
"""Do a single training epoch.
This iterates through all batches in the training dataset, for each
calculating the gradient of the estimated error given the batch with
respect to all the model parameters and then updates the model
parameters according to the learning rule.
"""
with self.tqdm_progress(total=self.train_dataset.num_batches) as train_progress_bar:
train_progress_bar.set_description("Epoch Progress")
for inputs_batch, targets_batch in self.train_dataset:
activations = self.model.fprop(inputs_batch)
grads_wrt_outputs = self.error.grad(activations[-1], targets_batch)
grads_wrt_params = self.model.grads_wrt_params(
activations, grads_wrt_outputs)
self.learning_rule.update_params(grads_wrt_params)
train_progress_bar.update(1)
def eval_monitors(self, dataset, label):
"""Evaluates the monitors for the given dataset.
Args:
dataset: Dataset to perform evaluation with.
label: Tag to add to end of monitor keys to identify dataset.
Returns:
OrderedDict of monitor values evaluated on dataset.
"""
data_mon_vals = OrderedDict([(key + label, 0.) for key
in self.data_monitors.keys()])
for inputs_batch, targets_batch in dataset:
activations = self.model.fprop(inputs_batch, evaluation=True)
for key, data_monitor in self.data_monitors.items():
data_mon_vals[key + label] += data_monitor(
activations[-1], targets_batch)
for key, data_monitor in self.data_monitors.items():
data_mon_vals[key + label] /= dataset.num_batches
return data_mon_vals
def get_epoch_stats(self):
"""Computes training statistics for an epoch.
Returns:
An OrderedDict with keys corresponding to the statistic labels and
values corresponding to the value of the statistic.
"""
epoch_stats = OrderedDict()
epoch_stats.update(self.eval_monitors(self.train_dataset, '(train)'))
if self.valid_dataset is not None:
epoch_stats.update(self.eval_monitors(
self.valid_dataset, '(valid)'))
return epoch_stats
def log_stats(self, epoch, epoch_time, stats):
"""Outputs stats for a training epoch to a logger.
Args:
epoch (int): Epoch counter.
epoch_time: Time taken in seconds for the epoch to complete.
stats: Monitored stats for the epoch.
"""
logger.info('Epoch {0}: {1:.1f}s to complete\n {2}'.format(
epoch, epoch_time,
', '.join(['{0}={1:.2e}'.format(k, v) for (k, v) in stats.items()])
))
def train(self, num_epochs, stats_interval=5):
"""Trains a model for a set number of epochs.
Args:
num_epochs: Number of epochs (complete passes through the training
dataset) to train for.
stats_interval: Training statistics will be recorded and logged
every `stats_interval` epochs.
Returns:
Tuple with first value being an array of training run statistics
and the second being a dict mapping the labels for the statistics
recorded to their column index in the array.
"""
start_train_time = time.time()
stats = self.get_epoch_stats()
run_stats = [list(stats.values())]
with self.tqdm_progress(total=num_epochs) as progress_bar:
progress_bar.set_description("Experiment Progress")
for epoch in range(1, num_epochs + 1):
start_time = time.time()
self.do_training_epoch()
epoch_time = time.time() - start_time
if epoch % stats_interval == 0:
stats = self.get_epoch_stats()
self.log_stats(epoch, epoch_time, stats)
run_stats.append(list(stats.values()))
progress_bar.update(1)
finish_train_time = time.time()
total_train_time = finish_train_time - start_train_time
return np.array(run_stats), {k: i for i, k in enumerate(stats.keys())}, total_train_time


@ -1,90 +0,0 @@
import numpy as np
seed = 22102017
rng = np.random.RandomState(seed)
class L1Penalty(object):
"""L1 parameter penalty.
Term to add to the objective function penalising parameters
based on their L1 norm.
"""
def __init__(self, coefficient):
"""Create a new L1 penalty object.
Args:
coefficient: Positive constant to scale penalty term by.
"""
assert coefficient > 0., 'Penalty coefficient must be positive.'
self.coefficient = coefficient
def __call__(self, parameter):
"""Calculate L1 penalty value for a parameter.
Args:
parameter: Array corresponding to a model parameter.
Returns:
Value of penalty term.
"""
return self.coefficient * abs(parameter).sum()
def grad(self, parameter):
"""Calculate the penalty gradient with respect to the parameter.
Args:
parameter: Array corresponding to a model parameter.
Returns:
Value of penalty gradient with respect to parameter. This
should be an array of the same shape as the parameter.
"""
return self.coefficient * np.sign(parameter)
def __repr__(self):
return 'L1Penalty({0})'.format(self.coefficient)
class L2Penalty(object):
"""L1 parameter penalty.
Term to add to the objective function penalising parameters
based on their L2 norm.
"""
def __init__(self, coefficient):
"""Create a new L2 penalty object.
Args:
coefficient: Positive constant to scale penalty term by.
"""
assert coefficient > 0., 'Penalty coefficient must be positive.'
self.coefficient = coefficient
def __call__(self, parameter):
"""Calculate L2 penalty value for a parameter.
Args:
parameter: Array corresponding to a model parameter.
Returns:
Value of penalty term.
"""
return 0.5 * self.coefficient * (parameter ** 2).sum()
def grad(self, parameter):
"""Calculate the penalty gradient with respect to the parameter.
Args:
parameter: Array corresponding to a model parameter.
Returns:
Value of penalty gradient with respect to parameter. This
should be an array of the same shape as the parameter.
"""
return self.coefficient * parameter
def __repr__(self):
return 'L2Penalty({0})'.format(self.coefficient)
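The `grad` methods above are easy to verify numerically. A standalone check for the L2 case (penalty `0.5 * c * sum(p**2)`, gradient `c * p`) against central finite differences:

```python
import numpy as np

coefficient = 0.01
param = np.array([-2.0, 0.5, 3.0])

# L2 penalty value and analytic gradient, as defined above
penalty = 0.5 * coefficient * (param ** 2).sum()
grad = coefficient * param

# central finite-difference gradient of the penalty
eps = 1e-6
numeric = np.zeros_like(param)
for i in range(param.size):
    shifted = param.copy()
    shifted[i] += eps
    plus = 0.5 * coefficient * (shifted ** 2).sum()
    shifted[i] -= 2 * eps
    minus = 0.5 * coefficient * (shifted ** 2).sum()
    numeric[i] = (plus - minus) / (2 * eps)

print(np.allclose(grad, numeric, atol=1e-6))
```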


@ -1,34 +0,0 @@
# -*- coding: utf-8 -*-
"""Training schedulers.
This module contains classes implementing schedulers which control the
evolution of learning rule hyperparameters (such as learning rate) over a
training run.
"""
import numpy as np
class ConstantLearningRateScheduler(object):
"""Example of scheduler interface which sets a constant learning rate."""
def __init__(self, learning_rate):
"""Construct a new constant learning rate scheduler object.
Args:
learning_rate: Learning rate to use in learning rule.
"""
self.learning_rate = learning_rate
def update_learning_rule(self, learning_rule, epoch_number):
"""Update the hyperparameters of the learning rule.
Run at the beginning of each epoch.
Args:
learning_rule: Learning rule object being used in training run,
any scheduled hyperparameters to be altered should be
attributes of this object.
epoch_number: Integer index of training epoch about to be run.
"""
learning_rule.learning_rate = self.learning_rate
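A scheduler that actually varies the learning rate follows the same interface: `update_learning_rule` is run at the beginning of each epoch. Exponential decay below is my own example, not part of the course code:

```python
import math

class ExponentialLearningRateScheduler(object):
    """Decays the learning rate exponentially with epoch number."""

    def __init__(self, init_learning_rate, decay_rate):
        self.init_learning_rate = init_learning_rate
        self.decay_rate = decay_rate

    def update_learning_rule(self, learning_rule, epoch_number):
        # same hook as ConstantLearningRateScheduler, but time-dependent
        learning_rule.learning_rate = (
            self.init_learning_rate * math.exp(-self.decay_rate * epoch_number))

class _DummyRule(object):
    """Minimal stand-in for a learning rule with a learning_rate attribute."""
    learning_rate = None

rule = _DummyRule()
scheduler = ExponentialLearningRateScheduler(init_learning_rate=0.1, decay_rate=0.5)
lrs = []
for epoch in range(3):
    scheduler.update_learning_rule(rule, epoch)
    lrs.append(rule.learning_rate)
# lrs decays from 0.1: [0.1, 0.1*e^-0.5, 0.1*e^-1.0]
```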

208
model_architectures.py Normal file

@ -0,0 +1,208 @@
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
class FCCNetwork(nn.Module):
def __init__(self, input_shape, num_output_classes, num_filters, num_layers, use_bias=False):
"""
Initializes a fully connected network similar to the ones implemented previously in the MLP package.
:param input_shape: The shape of the inputs going in to the network.
:param num_output_classes: The number of outputs the network should have (for classification those would be the number of classes)
:param num_filters: Number of filters used in every fcc layer.
:param num_layers: Number of fcc layers.
:param use_bias: Whether our fcc layers will use a bias.
"""
super(FCCNetwork, self).__init__()
# set up class attributes useful in building the network and inference
self.input_shape = input_shape
self.num_filters = num_filters
self.num_output_classes = num_output_classes
self.use_bias = use_bias
self.num_layers = num_layers
# initialize a module dict, which is effectively a dictionary that can collect layers and integrate them into pytorch
self.layer_dict = nn.ModuleDict()
# build the network
self.build_module()
def build_module(self):
print("Building basic block of FCCNetwork using input shape", self.input_shape)
x = torch.zeros((self.input_shape))
out = x
out = out.view(out.shape[0], -1)
# flatten inputs to shape (b, -1) where -1 is the dim resulting from multiplying the
# shapes of all dimensions after the 0th dim
for i in range(self.num_layers):
self.layer_dict['fcc_{}'.format(i)] = nn.Linear(in_features=out.shape[1], # initialize a fcc layer
out_features=self.num_filters,
bias=self.use_bias)
out = self.layer_dict['fcc_{}'.format(i)](out) # apply ith fcc layer to the previous layers outputs
out = F.relu(out) # apply a ReLU on the outputs
self.logits_linear_layer = nn.Linear(in_features=out.shape[1], # initialize the prediction output linear layer
out_features=self.num_output_classes,
bias=self.use_bias)
out = self.logits_linear_layer(out) # apply the layer to the previous layer's outputs
print("Block is built, output volume is", out.shape)
return out
def forward(self, x):
"""
Forward prop data through the network and return the preds
:param x: Input batch x, of shape (b, ...); each sample may have any dimensionality.
:return: preds of shape (b, num_classes)
"""
out = x
out = out.view(out.shape[0], -1)
# flatten inputs to shape (b, -1) where -1 is the dim resulting from multiplying the
# shapes of all dimensions after the 0th dim
for i in range(self.num_layers):
out = self.layer_dict['fcc_{}'.format(i)](out) # apply ith fcc layer to the previous layers outputs
out = F.relu(out) # apply a ReLU on the outputs
out = self.logits_linear_layer(out) # apply the layer to the previous layer's outputs
return out
def reset_parameters(self):
"""
Re-initializes the networks parameters
"""
for item in self.layer_dict.children():
item.reset_parameters()
self.logits_linear_layer.reset_parameters()
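The shape-inference trick in `build_module` (push a dummy zero batch through the layers and read off the resulting shapes) can be sketched without PyTorch; numpy's `reshape` plays the role of `out.view` here, and the input shape is an illustrative example:

```python
import numpy as np

input_shape = (4, 1, 28, 28)   # (batch, channels, height, width), illustrative
x = np.zeros(input_shape)      # dummy batch, like torch.zeros(self.input_shape)

# flatten to (b, -1), exactly what out.view(out.shape[0], -1) does:
flat = x.reshape(x.shape[0], -1)

# the flattened width gives in_features for the first fcc layer
in_features = flat.shape[1]
print(flat.shape)   # (4, 784)
```

Because the layer sizes are read off a concrete dummy tensor, the same code works for any `input_shape` without hand-computing dimensions.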
class ConvolutionalNetwork(nn.Module):
def __init__(self, input_shape, dim_reduction_type, num_output_classes, num_filters, num_layers, use_bias=False):
"""
Initializes a convolutional network module object.
:param input_shape: The shape of the inputs going in to the network.
:param dim_reduction_type: The type of dimensionality reduction to apply after each convolutional stage, should be one of ['max_pooling', 'avg_pooling', 'strided_convolution', 'dilated_convolution']
:param num_output_classes: The number of outputs the network should have (for classification those would be the number of classes)
:param num_filters: Number of filters used in every conv layer, except dim reduction stages, where those are automatically inferred.
:param num_layers: Number of conv layers (excluding dim reduction stages)
:param use_bias: Whether our convolutions will use a bias.
"""
super(ConvolutionalNetwork, self).__init__()
# set up class attributes useful in building the network and inference
self.input_shape = input_shape
self.num_filters = num_filters
self.num_output_classes = num_output_classes
self.use_bias = use_bias
self.num_layers = num_layers
self.dim_reduction_type = dim_reduction_type
# initialize a module dict, which is effectively a dictionary that can collect layers and integrate them into pytorch
self.layer_dict = nn.ModuleDict()
# build the network
self.build_module()
def build_module(self):
"""
Builds network whilst automatically inferring shapes of layers.
"""
print("Building basic block of ConvolutionalNetwork using input shape", self.input_shape)
x = torch.zeros((self.input_shape)) # create dummy inputs to be used to infer shapes of layers
out = x
# torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)
for i in range(self.num_layers): # for number of layers times
self.layer_dict['conv_{}'.format(i)] = nn.Conv2d(in_channels=out.shape[1],
# add a conv layer in the module dict
kernel_size=3,
out_channels=self.num_filters, padding=1,
bias=self.use_bias)
out = self.layer_dict['conv_{}'.format(i)](out) # use layer on inputs to get an output
out = F.relu(out) # apply relu
print(out.shape)
if self.dim_reduction_type == 'strided_convolution': # if dim reduction is strided conv, then add a strided conv
self.layer_dict['dim_reduction_strided_conv_{}'.format(i)] = nn.Conv2d(in_channels=out.shape[1],
kernel_size=3,
out_channels=out.shape[1],
padding=1,
bias=self.use_bias, stride=2,
dilation=1)
out = self.layer_dict['dim_reduction_strided_conv_{}'.format(i)](
out) # use strided conv to get an output
out = F.relu(out) # apply relu to the output
elif self.dim_reduction_type == 'dilated_convolution':  # if dim reduction is dilated conv, then add a dilated conv, using an arbitrary dilation rate of i + 2 (so the output gets smaller as we go deeper; you can choose other dilation rates if you wish)
self.layer_dict['dim_reduction_dilated_conv_{}'.format(i)] = nn.Conv2d(in_channels=out.shape[1],
kernel_size=3,
out_channels=out.shape[1],
padding=1,
bias=self.use_bias, stride=1,
dilation=i + 2)
out = self.layer_dict['dim_reduction_dilated_conv_{}'.format(i)](
out) # run dilated conv on input to get output
out = F.relu(out) # apply relu on output
elif self.dim_reduction_type == 'max_pooling':
self.layer_dict['dim_reduction_max_pool_{}'.format(i)] = nn.MaxPool2d(2, padding=1)
out = self.layer_dict['dim_reduction_max_pool_{}'.format(i)](out)
elif self.dim_reduction_type == 'avg_pooling':
self.layer_dict['dim_reduction_avg_pool_{}'.format(i)] = nn.AvgPool2d(2, padding=1)
out = self.layer_dict['dim_reduction_avg_pool_{}'.format(i)](out)
print(out.shape)
if out.shape[-1] != 2:
out = F.adaptive_avg_pool2d(out,
2) # apply adaptive pooling to make sure output of conv layers is always (2, 2) spatially (helps with comparisons).
print('shape before final linear layer', out.shape)
out = out.view(out.shape[0], -1)
self.logit_linear_layer = nn.Linear(in_features=out.shape[1], # add a linear layer
out_features=self.num_output_classes,
bias=self.use_bias)
out = self.logit_linear_layer(out) # apply linear layer on flattened inputs
print("Block is built, output volume is", out.shape)
return out
def forward(self, x):
"""
Forward propagates the network given an input batch
:param x: Inputs x (b, c, h, w)
:return: preds (b, num_classes)
"""
out = x
for i in range(self.num_layers): # for number of layers
out = self.layer_dict['conv_{}'.format(i)](out) # pass through conv layer indexed at i
out = F.relu(out) # pass conv outputs through ReLU
if self.dim_reduction_type == 'strided_convolution': # if strided convolution dim reduction then
out = self.layer_dict['dim_reduction_strided_conv_{}'.format(i)](
out) # pass previous outputs through a strided convolution indexed i
out = F.relu(out) # pass strided conv outputs through ReLU
elif self.dim_reduction_type == 'dilated_convolution':
out = self.layer_dict['dim_reduction_dilated_conv_{}'.format(i)](out)
out = F.relu(out)
elif self.dim_reduction_type == 'max_pooling':
out = self.layer_dict['dim_reduction_max_pool_{}'.format(i)](out)
elif self.dim_reduction_type == 'avg_pooling':
out = self.layer_dict['dim_reduction_avg_pool_{}'.format(i)](out)
if out.shape[-1] != 2:
out = F.adaptive_avg_pool2d(out, 2)
out = out.view(out.shape[0], -1) # flatten outputs from (b, c, h, w) to (b, c*h*w)
out = self.logit_linear_layer(out) # pass through a linear layer to get logits/preds
return out
def reset_parameters(self):
"""
Re-initialize the network parameters.
"""
for item in self.layer_dict.children():
try:
item.reset_parameters()
except AttributeError:
# some layers (e.g. pooling) have no parameters to reset
pass
self.logit_linear_layer.reset_parameters()
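The spatial sizes printed while building the network follow the standard convolution output-size formula. This small helper (my own sketch, not part of the repo) makes the stride-2 and dilated cases above concrete:

```python
def conv_out_size(size, kernel=3, stride=1, padding=1, dilation=1):
    """Output spatial size of a 2D convolution along one dimension."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (size + 2 * padding - effective_kernel) // stride + 1

# 3x3 conv, padding 1, stride 1 keeps the size (the conv_{i} layers):
assert conv_out_size(28) == 28
# stride 2 roughly halves it (the strided_convolution reduction):
assert conv_out_size(28, stride=2) == 14
# dilation 2 with padding 1 shrinks it (the dilated_convolution reduction):
assert conv_out_size(28, dilation=2) == 26
```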

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

Binary image files not shown (sizes: 200 KiB, 6.9 KiB).


@ -1,65 +0,0 @@
\documentclass[tikz]{standalone}
\usepackage{amsmath}
\usepackage{tikz}
\usetikzlibrary{arrows}
\usetikzlibrary{calc}
\usepackage{ifthen}
\newcommand{\vct}[1]{\boldsymbol{#1}}
\newcommand{\pd}[2]{\frac{\partial #1}{\partial #2}}
\tikzstyle{fprop} = [draw,fill=blue!20,minimum size=2em,align=center]
\tikzstyle{bprop} = [draw,fill=red!20,minimum size=2em,align=center]
\begin{document}
\begin{tikzpicture}[xscale=1.75] %
% define number of layers
\def\nl{2};
% model input
\node at (0, 0) (input) {$\vct{x}$};
% draw fprop through model layers
\foreach \l in {0,...,\nl} {
\node[fprop] at (2 * \l + 1, 0) (fprop\l) {\texttt{layers[\l]} \\ \texttt{.fprop}};
\ifthenelse{\l > 0}{
\node at (2 * \l, 0) (hidden\l) {$\vct{h}_\l$};
\draw[->] (hidden\l) -- (fprop\l);
\draw[->] let \n1={\l - 1} in (fprop\n1) -- (hidden\l);
}{
\draw[->] (input) -- (fprop\l);
}
}
% model output
\node at (2 * \nl + 2, 0) (output) {$\mathbf{y}$};
% error function
\node[fprop] at (2 * \nl + 3, 0) (errorfunc) {\texttt{error}};
% error value
\node at (2 * \nl + 3, -1) (error) {$\bar{E}$};
% targets
\node at (2 * \nl + 4, -1) (tgt) {$\vct{t}$};
% error gradient
\node[bprop] at (2 * \nl + 3, -2) (errorgrad) {\texttt{error} \\ \texttt{.grad}};
% gradient wrt outputs
\node at (2 * \nl + 2, -2) (gradoutput) {$\pd{\bar{E}}{\vct{y}}$};
\draw[->] (fprop\nl) -- (output);
\draw[->] (output) -- (errorfunc);
\draw[->] (errorfunc) -- (error);
\draw[->] (error) -- (errorgrad);
\draw[->] (errorgrad) -- (gradoutput);
\draw[->] (tgt) |- (errorfunc);
\draw[->] (tgt) |- (errorgrad);
\foreach \l in {0,...,\nl} {
\node[bprop] at (2 * \l + 1, -2) (bprop\l) {\texttt{layers[\l]} \\ \texttt{.bprop}};
\ifthenelse{\l > 0}{
\node at (2 * \l, -2) (grad\l) {$\pd{\bar{E}}{\vct{h}_\l}$};
\draw[<-] (grad\l) -- (bprop\l);
\draw[<-] let \n1={\l - 1} in (bprop\n1) -- (grad\l);
}{}
}
\node at (0, -2) (gradinput) {$\pd{\bar{E}}{\vct{x}}$};
\draw[->] (bprop0) -- (gradinput);
\draw[->] (gradoutput) -- (bprop\nl);
\end{tikzpicture}
\end{document}

Binary image files (figures) deleted, not shown (sizes: 21 KiB, 29 KiB, 69 KiB, 62 KiB, 61 KiB, 73 KiB).

175
notes/google_cloud_setup.md Normal file

@ -0,0 +1,175 @@
# Google Cloud Usage Tutorial
This document has been created to help you set up a Google Cloud instance for the MLP course, using the student credit the course has acquired.
This document is non-exhaustive; much more useful information is available on the [google cloud documentation page](https://cloud.google.com/docs/).
For any question you might have that is not covered here, a quick Google search should get you what you need. Anything in the official Google Cloud docs should be very helpful.
| WARNING: Read these instructions carefully. You will be given $50 worth of credits and you will need to manage them properly. We will not be able to provide more credits. |
| -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
### To create your account and start a project funded by the student credit
1. Log in with your preferred gmail id to the [google cloud console](https://cloud.google.com/). Click on `Console` (upper right corner), which will lead you to a new page. Once there, click on `Select a Project` to the left of the search bar at the top of the page, then click `New Project` on the right-hand side of the pop-up.
Name your project sxxxxxxx-MLPractical, replacing sxxxxxxx with your student number. **Make sure you are on this project before following the next steps**.
2. Get your coupon by following the instructions in the coupon retrieval link that you received.
3. Once you receive your coupon, follow the email instructions to add your coupon to your account.
4. Once you have added your coupon, join the [MLPractical GCP Google Group](https://groups.google.com/forum/#!forum/mlpractical_gcp) using the same Google account you used to redeem your coupon. This ensures access to the shared disk images.
5. Make sure that the financial source for your project is the MLPractical credit. You can check this by going to the [Google Cloud Console](https://console.cloud.google.com/) and selecting your project. Then, click on the `Billing` tile. Once on the `Billing` page, you should be prompted to add the billing account if you haven't yet done so. Choose `Billing Account for Education` as your billing account. Then, under the billing account, click `account management` on the left-hand side tab. You should see your project under `Projects linked to this billing account`. If not, you can add it by clicking on `Add projects` and selecting your project from the list of available projects.
### To create an instance
1. On the console page, click the button with the three lines at the top left corner.
2. In the ```Compute Engine``` sub-menu select ```VM Instances```.
3. Enable ```Compute Engine API``` if prompted.
4. Click the ```CREATE INSTANCE``` button at the top of the window.
5. Click on ```VM FROM INSTANCE TEMPLATE```, and create your VM template for this coursework:
6. Name the template ```mlpractical-1```.
7. Select ```Regional``` as the location type and ```us-west1(Oregon)``` as the region.
![VM location](figures/vm_instance_location.png)
8. Under ```Machine Configuration```, select the ```GPU``` machine family. Select one NVIDIA T4. These are the cheapest; be careful, as others can cost up to 8 times more to run.
9. Below, in ```Machine type```, under ```PRESET``` select ```n1-standard-2 (2 vCPU, 1 core, 7.5Gb memory)```.
![VM location](figures/vm_instance_configuration.png)
10. Under ```Boot disk```, click change.
11. On the right-hand new menu that appears (under ```PUBLIC IMAGES```), select
* ```Deep Learning on Linux``` operating system,
* ```Deep Learning VM for PyTorch 2.0 with CUDA 11.8 M125```
* **Note**: If the above version is not available, you can use any ```Deep Learning VM for PyTorch 2.0 with CUDA 11.8 M***``` instead.
* ```Balanced persistent disk``` as boot disk type,
* ```100```GB as disk size, and then click select at the bottom.
![Boot disk](figures/boot_disk.png)
12. Under ```Availability policies```, in the ```VM provisioning model``` drop down menu, select ```Spot```. Using this option will be helpful if you're running low on credits.
13. You can ```Enable display device``` if you want to use a GUI. This is not necessary for the coursework.
14. Leave other options as default and click ```CREATE```.
15. Tick your newly created template and click ```CREATE VM``` (top centre).
16. Click ```CREATE```. Your instance should be ready in a minute or two.
17. If your instance failed to create due to the following error - ```The GPUS-ALL-REGIONS-per-project quota maximum has been exceeded. Current limit: 0.0. Metric: compute.googleapis.com/gpus_all_regions.```, click on ```REQUEST QUOTA``` in the notification.
18. Tick ```Compute Engine API``` and then click ```EDIT QUOTAS``` (top right).
![VM location](figures/increase_quota.png)
19. This will open a box on the right-hand side. Set your ```New Limit``` to ```1``` and, in the description, you can mention that you need a GPU for machine learning coursework.
20. Click ```NEXT```, fill in your details and then click ```SUBMIT REQUEST```.
21. You will receive a confirmation email with your Quota Limit increased. This may take some minutes.
22. After the confirmation email, you can recheck that the GPU (All Regions) Quota Limit is set to 1. This usually shows up 10-15 minutes after the confirmation email.
23. Retry making the VM instance again as before, by choosing your template, and you should have your instance now.
#### Note
Be careful to select 1 x T4 GPU (others can be much more expensive).
You only have $50 worth of credit, which should be about 6 days of GPU usage on a T4.
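As a rough sanity check on that figure (the hourly spot price below is an assumption for illustration only; always check current GCP pricing):

```python
# assumed spot price (USD/hour) for an n1-standard-2 with one T4;
# illustrative only, not a quoted Google price
hourly_cost = 0.35
credit = 50.0

days = credit / (hourly_cost * 24)
print(round(days, 1))  # about 6 days of continuous use
```

Stopping the instance when idle stretches this considerably, since you pay for uptime, not computation.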
### To login into your instance via terminal:
1. Install `google-cloud-sdk` (or similarly named) package using your OS package manager
2. To authorize the current machine to access your nodes, run ```gcloud auth login```. This will authenticate your Google account login.
3. Follow the prompts to get a token for your current machine.
4. Run ```gcloud config set project PROJECT_ID```, replacing `PROJECT_ID` with your project ID. You can find that in the projects drop-down menu at the top of the Google Compute Engine window; this sets the current project as the active one. If you followed the above instructions, your project ID should be `sxxxxxxx-mlpractical`, where `sxxxxxxx` is your student number.
5. In your compute engine window, in the line for the instance that you have started (`mlpractical-1`), click on the downward arrow next to ```SSH```. Choose ```View gcloud command```. Copy the command to your terminal and press enter. Make sure your VM is up and running before doing this.
6. Don't add a password to the SSH key.
7. On your first login, you will be asked if you want to install nvidia drivers, **DO NOT AGREE** and follow the nvidia drivers installation below.
8. Install the R470 Nvidia driver by running the following commands:
* Add "contrib" and "non-free" components to /etc/apt/sources.list
```bash
sudo tee -a /etc/apt/sources.list >/dev/null <<'EOF'
deb http://deb.debian.org/debian/ bullseye main contrib non-free
deb-src http://deb.debian.org/debian/ bullseye main contrib non-free
EOF
```
* Check that the lines were well added by running:
```bash
cat /etc/apt/sources.list
```
* Update the list of available packages and install the nvidia-driver package:
```bash
sudo apt update
sudo apt install nvidia-driver firmware-misc-nonfree
```
9. Run ```nvidia-smi``` to confirm that the GPU can be found. This should report 1 Tesla T4 GPU. If not, the driver might have failed to install.
10. To test that PyTorch has access to the GPU, you can type the commands below in your terminal. You should see `torch.cuda.is_available()` return `True`.
```
python
```
```
import torch
torch.cuda.is_available()
```
```
exit()
```
11. Well done, you are now in your instance and ready to use it for your coursework.
12. Clone a fresh mlpractical repository, and checkout branch `mlp2024-25/mlp_compute_engines`:
```
git clone https://github.com/VICO-UoE/mlpractical.git ~/mlpractical
cd ~/mlpractical
git checkout mlp2024-25/mlp_compute_engines
```
Then, to test PyTorch running on the GPU, run this script that trains a small convolutional network on EMNIST dataset:
```
python train_evaluate_emnist_classification_system.py --filepath_to_arguments_json_file experiment_configs/emnist_tutorial_config.json
```
You should be able to see an experiment running, using the GPU. It should be doing about 260-300 it/s (iterations per second). You can stop it whenever you like using `ctrl-c`.
If all the above matches what's stated, then you should be ready to run your experiments.
To log out of your instance, simply type ```exit``` in the terminal.
### Remember to ```stop``` your instance when not using it. You pay for the time you use the machine, not for the computational cycles used.
To stop the instance go to `Compute Engine -> VM instances` on the Google Cloud Platform, select the instance and click ```Stop```.
#### Future ssh access:
To access the instance in the future simply run the `gcloud` command you copied from the google compute engine instance page.
## Copying data to and from an instance
Please look at the [transferring files to VMs from Linux, macOS and Windows](https://cloud.google.com/compute/docs/instances/transfer-files?hl=en) and [google docs page on copying data](https://cloud.google.com/filestore/docs/copying-data). Note also the link on the page for [setting up your SSH keys (Linux or MacOS)](https://cloud.google.com/compute/docs/instances/access-overview?hl=en).
To copy from local machine to a google instance, have a look at this [stackoverflow post](https://stackoverflow.com/questions/27857532/rsync-to-google-compute-engine-instance-from-jenkins).
## Running experiments over ssh:
If ssh fails while running an experiment, the experiment is normally killed.
To avoid this, use the command ```screen```. It creates a session process that keeps running whether a user is signed in or not.
The basic usage is to run ```screen``` to create a new session. To list all available sessions, use:
```screen -ls```
Then, once you find the session you want to enter, use:
```screen -d -r screen_id```
replacing screen_id with the id of the session you want to enter.
While in a session, you can use:
- ```ctrl+a+esc``` to pause the process and be able to scroll.
- ```ctrl+a+d``` to detach from session while leaving it running (once you detach you can reattach using ```screen -r```).
- ```ctrl+a+n``` to see the next session.
- ```ctrl+a+c``` to create a new session.
You are also free to use other tools such as `nohup` or `tmux`. Use online tutorials and learn it yourself.
## Troubleshooting:
| Error| Fix|
| --- | --- |
| ```ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].``` | Delete the ssh key files and try again: ```rm ~/.ssh/google_compute_engine*``` |
|"Mapping" error after following step 3 (```tar zxvf google-cloud-sdk-365.0.0-linux-x86_64.tar.gz; bash google-cloud-sdk/install.sh```) | This is due to conflicts and several packages not being installed properly according to your Python version when creating your Conda environment. Run ```conda create --name mlp python=3.9``` to recreate the environment supported with Python 3.9. Then, activate the environment ```conda activate mlp``` and follow the instructions from step 3 again. |
|"Mapping" error even after successfully completing steps 3 and 4 when using the ```gcloud``` command | Restart your computer and run the following command: ```export CLOUDSDK_PYTHON="/usr/bin/python3"``` |
| ```gcloud command not found``` | Restart your computer and run the following command: ```export CLOUDSDK_PYTHON="/usr/bin/python3"``` |
| ```module 'collections' has no attribute 'Mapping'``` when installing the Google Cloud SDK | Install Google Cloud SDK with brew: ```brew install --cask google-cloud-sdk```|
| ```Access blocked: authorisation error``` in your browser after running ```gcloud auth login``` | Run ```gcloud components update``` and retry to login again. |
| ```ModuleNotFoundError: No module named 'GPUtil'``` | Install the GPUtil package and you should be able to run the script afterwards: ```pip install GPUtil``` |
| ```module mlp not found``` | Install the mlp package in your environment: ```python setup.py develop``` |
| ```NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.``` | Remove the current driver by running: ```cd /``` and ```sudo apt purge nvidia-*``` Follow step 8 of the instructions or the following commands: (1) download the R470 driver ```wget https://us.download.nvidia.com/XFree86/Linux-x86_64/470.223.02/NVIDIA-Linux-x86_64-470.223.02.run```, (2) change the file permissions to make it executable with ```chmod +x NVIDIA-Linux-x86_64-470.223.02.run``` and (3) install the driver ```sudo ./NVIDIA-Linux-x86_64-470.223.02.run``` |
| ```module 'torch' has no attribute 'cuda'``` | You most probably have a file named ```torch.py``` in your current directory. Rename it to something else and try again. You might need to run the setup again. Else ```import torch``` will be calling this file instead of the PyTorch library and thus causing a conflict. |
| ```Finalizing NVIDIA driver installation. Error! Your kernel headers for kernel 5.10.0-26-cloud-amd64 cannot be found. Please install the linux-headers-5.10.0-26-cloud-amd64 package, or use the --kernelsourcedir option to tell DKMS where it's located. Driver updated for latest kernel.``` | Install the header package with ```sudo apt install linux-headers-5.10.0-26-cloud-amd64``` |


@ -0,0 +1,176 @@
# MLP GPU Cluster Usage Tutorial
This guide is intended to take students through the basics of using the Charles GPU cluster. It is not intended to be
an exhaustive guide that goes deep into the micro-details of the Slurm ecosystem. For an exhaustive guide please visit
[the Slurm Documentation page.](https://slurm.schedmd.com/)
##### For info on clusters and some tips on good cluster etiquette please have a look at the complementary lecture slides https://docs.google.com/presentation/d/1SU4ExARZLbenZtxm3K8Unqch5282jAXTq0CQDtfvtI0/edit?usp=sharing
## Getting Started
### Accessing the Cluster:
1. If you are not on a DICE machine, then ssh into your dice home using ```ssh sxxxxxx@student.ssh.inf.ed.ac.uk```
2. Then ssh into either mlp1 or mlp2 which are the headnodes of the GPU cluster - it does not matter which you use. To do that
run ```ssh mlp1``` or ```ssh mlp2```.
3. You are now logged into the MLP gpu cluster. If this is your first time logging in you'll need to build your environment. This is because your home directory on the GPU cluster is separate to your usual AFS home directory on DICE.
- Note: Alternatively you can just ```ssh sxxxxxxx@mlp.inf.ed.ac.uk``` to get there in one step.
### Installing requirements:
1. Start by downloading the miniconda3 installation file using
```wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh```.
2. Now run the installation using ```bash Miniconda3-latest-Linux-x86_64.sh```. At the first prompt reply yes.
```
Do you accept the license terms? [yes|no]
[no] >>> yes
```
3. At the second prompt simply press enter.
```
Miniconda3 will now be installed into this location:
/home/sxxxxxxx/miniconda3
- Press ENTER to confirm the location
- Press CTRL-C to abort the installation
- Or specify a different location below
```
4. At the last prompt to initialise conda reply 'yes':
```
Do you wish the installer to initialize Miniconda3
by running conda init [yes|no]
[no] >>> yes
```
5. Now you need to activate your environment by first running:
```source .bashrc```.
This reloads .bashrc which includes the new miniconda path.
6. Run ```source activate``` to load miniconda root.
7. Now run ```conda create -n mlp python=3``` this will create the mlp environment. At the prompt choose y.
8. Now run ```source activate mlp```.
9. Install git using ```conda install git```. Then config git using:
```git config --global user.name "[your name]"; git config --global user.email "[matric-number]@sms.ed.ac.uk"```
10. Now clone the mlpractical repo using ```git clone https://github.com/VICO-UoE/mlpractical.git```.
11. ```cd mlpractical```
12. Check out the compute engines branch using ```git checkout mlp2023-24/mlp_compute_engines```.
13. Install the required packages using ```bash install.sh```.
> Note: Check that you can use the GPU version of PyTorch by running ```python -c "import torch; print(torch.cuda.is_available())"``` in a `bash` script (see the example below). If this returns `True`, then you are good to go. If it returns `False`, then you need to install the GPU version of PyTorch manually. To do this, run ```conda uninstall pytorch``` and then ```pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118``` or ```pip install torch torchvision```. This will install the latest version of PyTorch with CUDA support. This version is also compatible with older CUDA versions installed on the cluster.
14. This includes all of the required installations. Proceed to the next section outlining how to use the slurm cluster
management software. Please remember to clean your setup files using ```conda clean -t```
### Using Slurm
Slurm provides us with some commands that can be used to submit, delete, view, explore current jobs, nodes and resources among others.
To submit a job one needs to use ```sbatch script.sh``` which will automatically find available nodes and pass the job,
resources and restrictions required. The script.sh is the bash script containing the job that we want to run. Since we will be using the NVIDIA CUDA and CUDNN libraries
we have provided a sample script which should be used for your job submissions. The script is explained in detail below:
```bash
#!/bin/sh
#SBATCH -N 1 # nodes requested
#SBATCH -n 1 # tasks requested
#SBATCH --partition=Teach-Standard
#SBATCH --gres=gpu:1
#SBATCH --mem=12000 # memory in Mb
#SBATCH --time=0-08:00:00
export CUDA_HOME=/opt/cuda-9.0.176.1/
export CUDNN_HOME=/opt/cuDNN-7.0/
export STUDENT_ID=$(whoami)
export LD_LIBRARY_PATH=${CUDNN_HOME}/lib64:${CUDA_HOME}/lib64:$LD_LIBRARY_PATH
export LIBRARY_PATH=${CUDNN_HOME}/lib64:$LIBRARY_PATH
export CPATH=${CUDNN_HOME}/include:$CPATH
export PATH=${CUDA_HOME}/bin:${PATH}
export PYTHON_PATH=$PATH
mkdir -p /disk/scratch/${STUDENT_ID}
export TMPDIR=/disk/scratch/${STUDENT_ID}/
export TMP=/disk/scratch/${STUDENT_ID}/
mkdir -p ${TMP}/datasets/
export DATASET_DIR=${TMP}/datasets/
# Activate the relevant virtual environment:
source /home/${STUDENT_ID}/miniconda3/bin/activate mlp
cd ..
python train_evaluate_emnist_classification_system.py --filepath_to_arguments_json_file experiment_configs/emnist_tutorial_config.json
```
To actually run this use ```sbatch emnist_single_gpu_tutorial.sh```. When you do this, the job will be submitted and you will be given a job id.
```bash
[burly]sxxxxxxx: sbatch emnist_single_gpu_tutorial.sh
Submitted batch job 147
```
To view a list of all running jobs use ```squeue``` for a minimal presentation and ```smap``` for a more involved presentation. Furthermore to view node information use ```sinfo```.
```bash
squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
143 interacti bash iainr R 8:00 1 landonia05
147 interacti gpu_clus sxxxxxxx R 1:05 1 landonia02
```
Also, in case you want to stop/delete a job, use ```scancel job_id```, where job_id is the id of the job.
Furthermore, if you want to test some of your code interactively to prototype your solution before you submit it to
a node, you can use ```srun -p interactive --gres=gpu:2 --pty python my_code_exp.py```.
## Slurm Cheatsheet
For a nice list of most commonly used Slurm commands please visit [here](https://bitsanddragons.wordpress.com/2017/04/12/slurm-user-cheatsheet/).
## Syncing or copying data over to DICE
At some point you will need to copy your data to DICE so that you can analyse it, produce charts, write reports, store it for future use, etc.
1. If you are on a terminal within I.F/A.T, skip to step 2; if you are not, you'll first have to open a VPN into the university network using the instructions found [here](http://computing.help.inf.ed.ac.uk/openvpn).
2. From your local machine:
1. To send data from a local machine to the cluster: ```rsync -ua --progress <local_path_of_data_to_transfer> <studentID>@mlp.inf.ed.ac.uk:/home/<studentID>/path/to/folder```
2. To receive data from the cluster to your local machine ```rsync -ua --progress <studentID>@mlp.inf.ed.ac.uk:/home/<studentID>/path/to/folder <local_path_of_data_to_transfer> ```
## Running an experiment
To run a default image classification experiment using the template models provided:
1. Sign into the cluster using ```ssh sxxxxxxx@mlp1.inf.ed.ac.uk```
2. Activate your conda environment using ```source miniconda3/bin/activate ; conda activate mlp```
3. ```cd mlpractical```
4. ```cd cluster_experiment_scripts```
5. Find which experiment(s) you want to run (make sure the experiment ends in 'gpu_cluster.sh'). Decide if you want to run a single experiment or multiple experiments in parallel.
1. For a single experiment: ```sbatch experiment_script.sh```
2. To run multiple experiments using the "hurdle-reducing" script, which automatically submits jobs and makes sure the jobs are always in queue/running:
1. Make sure the cluster_experiment_scripts folder contains ***only*** the jobs you want to run.
2. Run the command:
```
python run_jobs.py --num_parallel_jobs <number of jobs to keep in the slurm queue at all times> --num_epochs <number of epochs to run each job>
```
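The idea behind the "hurdle-reducing" script can be sketched as a simple top-up rule (illustrative only: the function below is made up for this sketch; the real `run_jobs.py` polls `squeue` and submits with `sbatch`):

```python
# Illustrative sketch of the "keep the queue topped up" idea behind run_jobs.py.
# The real script polls squeue and submits with sbatch; this name is made up.
def jobs_to_submit(num_parallel_jobs, jobs_in_queue, scripts_remaining):
    """How many new jobs to submit so the queue stays at num_parallel_jobs."""
    free_slots = max(0, num_parallel_jobs - jobs_in_queue)
    return min(free_slots, scripts_remaining)

print(jobs_to_submit(10, 7, 100))   # -> 3: top the queue back up to 10
print(jobs_to_submit(10, 10, 100))  # -> 0: the queue is already full
```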
## Additional Help
If you require additional help, please post on Piazza; if you are experiencing technical problems (actual system/hardware problems), then please submit a [computing support ticket](https://www.inf.ed.ac.uk/systems/support/form/).
## List of very useful slurm commands:
- squeue: Shows all jobs from all users currently in the queue/running
- squeue -u <user_id>: Shows all jobs from user <user_id> in the queue/running
- sprio: Shows the priority score of all of your current jobs that are not yet running
- scontrol show job <job_id>: Shows all information about job <job_id>
- scancel <job_id>: Cancels job with id <job_id>
- scancel -u <user_id>: Cancels all jobs, belonging to user <user_id>, that are currently in the queue/running
- sinfo: Provides info about the cluster/partitions
- sbatch <job_script>: Submit a job that will run the script <job_script> to the slurm scheduler.
## Overview of code:
- [arg_extractor.py](arg_extractor.py): Contains an array of utility methods that can parse Python arguments or convert
a JSON config file into an argument NamedTuple.
- [data_providers.py](data_providers.py): A sample data provider, of the same type used in the MLPractical course.
- [experiment_builder.py](experiment_builder.py): Builds and executes a simple image classification experiment, keeping track
of relevant statistics, taking care of storing and re-loading pytorch models, as well as choosing the best validation-performing model to evaluate the test set on.
- [model_architectures.py](model_architectures.py): Provides a fully connected network and convolutional neural network
sample models, which have a number of moving parts indicated as hyperparameters.
- [storage_utils.py](storage_utils.py): Provides a number of storage/loading methods for the experiment statistics.
- [train_evaluate_emnist_classification_system.py](train_evaluate_emnist_classification_system.py): Runs an experiment
given a data provider, an experiment builder instance and a model architecture
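For instance, the JSON-to-namedtuple conversion that `arg_extractor.py` performs can be sketched as follows (a minimal illustration with a made-up function name, not the file's actual implementation; the real file also handles command-line arguments):

```python
import json
from collections import namedtuple

# Minimal sketch of turning a JSON config into an attribute-style arguments
# object, as arg_extractor.py does (this is an illustration, not the real code).
def config_to_args(json_string):
    config = json.loads(json_string)
    Args = namedtuple("Args", config.keys())
    return Args(**config)

args = config_to_args('{"batch_size": 100, "num_epochs": 50, "use_gpu": true}')
print(args.batch_size, args.use_gpu)  # -> 100 True
```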

@ -114,24 +114,24 @@ Here we provide a detailed guide for setting-up PuTTY with tunnel forwarding so
1. To start off, run the PuTTY executable file you downloaded, navigate to **Session** on the left column and enter the **hostname** as `student.ssh.inf.ed.ac.uk`. Put any name in the **Saved Sessions** box so that you can retrieve your saved PuTTY session for future use.
Change the remaining options as in the screenshot below.
<center><img src="./figures/putty1.png" width="400" height="300"></center>
2. Now navigate to **Connection** and drop-down on **Data**. In **Auto-Login username** , enter your student id `sXXXXXXX`.
<center><img src="./figures/putty2.png" width="400" height="300"></center>
3. After step 1 and 2, follow the instructions [here](http://computing.help.inf.ed.ac.uk/installing-putty) from screenshots 3-5 to set-up **Auth** and **X11 Forwarding**. To avoid errors later, strictly follow the instructions for this set-up.
4. In this step, we will configure SSH tunneling to locally run the notebooks. On the left side of the PuTTY window, navigate to **Tunnels** under SSH and then add a `[local-port]` in **Source port** and `localhost:[local-port]` in **Destination**. Remember the `[local-port]` you used here as we will need this later.
<center><img src="./figures/putty3.png" width="400" height="300"></center>
Then press **Add** near the Source port box to add your new forwarded port. Once you add, you will see your newly added port as shown below -
<center><img src="./figures/putty4.png" width="400" height="300"></center>
5. After you have done steps 1-4, navigate back to **Session** on the left side and click **Save** to save all your current configurations.
@ -139,36 +139,36 @@ Then press **Add** near the Source port box to add your new forwarded port. Once
6. Then click **Open** and a terminal window will pop up asking for your DICE password. After you enter the password, you will be logged in to the SSH Gateway Server. As the message printed when you log in points out, this server is intended only for accessing the Informatics network externally, and you should not attempt to work on it. You should log in to one of the student.compute shared-use servers by running -
```
ssh student.compute
```
You should now be logged on to one of the shared-use compute servers. The name of the server you are logged on to will appear at the bash prompt e.g.
```
ashbury:~$
```
You will need to know the name of this remote server you are using later on.
7. You can setup your `mlp` environment by following the instructions [here](environment-set-up.md). If you have correctly set-up the environment, activate your `conda` environment and navigate to the jupyter notebooks as detailed [here](remote-working-guide.md#starting-a-notebook-server-on-the-remote-computer). You should also secure your notebook server by following the instructions [here](remote-working-guide.md#running-jupyter-notebooks-over-ssh).
Once the notebook server starts running you should take note of the port it is being served on as indicated in the `The Jupyter Notebook is running at: https://localhost:[port]/` message.
8. Now that the notebook server is running on the remote server you need to connect to it on your local machine. We will do this by forwarding the port the notebook server is being run on over SSH to your local machine.
For doing this, open another session of PuTTY and load the session that you saved in the **Session** on the left side. Enter the password in the prompt and this will login to the SSH gateway server. **Do not** run `ssh student.compute` now.
In this terminal window, enter the command below -
```
ssh -N -f -L localhost:[local-port]:localhost:[port] [dice-username]@[remote-server-name]
```
The `[local-port]` is the source port you entered in Step 4, `[port]` is the remote port running on the remote server as in Step 7 and `[remote-server-name]` is the name of the remote server you got connected to in Step 6.
If asked for a password at this stage, enter your DICE password again to login.
9. Assuming you have set up everything correctly, the remote port will now be forwarded to the specified local port on your computer. If you now open up a browser on your computer and go to `https://localhost:[local-port]` you should (potentially after seeing a security warning about the self-signed certificate) now be asked to enter the notebook server password you specified earlier. Once you enter this password you should be able to access the notebook dashboard and open and edit notebooks as you usually do in laboratories.
When you are finished working you should both close down the notebook server, by entering `Ctrl+C` twice in the terminal window where the SSH session you used to start the notebook server is running, and halt the port forwarding command by entering `Ctrl+C` in the terminal it is running in.

@ -1,133 +0,0 @@
import argparse
def str2bool(v):
if v.lower() in ("yes", "true", "t", "y", "1"):
return True
elif v.lower() in ("no", "false", "f", "n", "0"):
return False
else:
raise argparse.ArgumentTypeError("Boolean value expected.")
def get_args():
"""
Returns a namedtuple with arguments extracted from the command line.
:return: A namedtuple with arguments
"""
parser = argparse.ArgumentParser(
description="Welcome to the MLP course's Pytorch training and inference helper script"
)
parser.add_argument(
"--batch_size",
nargs="?",
type=int,
default=100,
help="Batch_size for experiment",
)
parser.add_argument(
"--continue_from_epoch",
nargs="?",
type=int,
default=-1,
help="Epoch you want to continue training from while restarting an experiment",
)
parser.add_argument(
"--seed",
nargs="?",
type=int,
default=7112018,
help="Seed to use for random number generator for experiment",
)
parser.add_argument(
"--image_num_channels",
nargs="?",
type=int,
default=3,
help="The channel dimensionality of our image-data",
)
parser.add_argument(
"--learning-rate",
nargs="?",
type=float,
default=1e-3,
help="The learning rate (default 1e-3)",
)
parser.add_argument(
"--image_height", nargs="?", type=int, default=32, help="Height of image data"
)
parser.add_argument(
"--image_width", nargs="?", type=int, default=32, help="Width of image data"
)
parser.add_argument(
"--num_stages",
nargs="?",
type=int,
default=3,
help="Number of convolutional stages in the network. A stage is considered a sequence of "
"convolutional layers where the input volume remains the same in the spatial dimension and"
" is always terminated by a dimensionality reduction stage",
)
parser.add_argument(
"--num_blocks_per_stage",
nargs="?",
type=int,
default=5,
help="Number of convolutional blocks in each stage, not including the reduction stage."
" A convolutional block is made up of two convolutional layers activated using the "
" leaky-relu non-linearity",
)
parser.add_argument(
"--num_filters",
nargs="?",
type=int,
default=16,
help="Number of convolutional filters per convolutional layer in the network (excluding "
"dimensionality reduction layers)",
)
parser.add_argument(
"--num_epochs",
nargs="?",
type=int,
default=100,
help="Total number of epochs for model training",
)
parser.add_argument(
"--num_classes",
nargs="?",
type=int,
default=100,
help="Number of classes in the dataset",
)
parser.add_argument(
"--experiment_name",
nargs="?",
type=str,
default="exp_1",
help="Experiment name - to be used for building the experiment folder",
)
parser.add_argument(
"--use_gpu",
nargs="?",
type=str2bool,
default=True,
help="A flag indicating whether we will use GPU acceleration or not",
)
parser.add_argument(
"--weight_decay_coefficient",
nargs="?",
type=float,
default=0,
help="Weight decay to use for Adam",
)
parser.add_argument(
"--block_type",
type=str,
default="conv_block",
help="Type of convolutional blocks to use in our network"
"(This argument will be useful in running experiments to debug your network)",
)
args = parser.parse_args()
print(args)
return args

@ -1,462 +0,0 @@
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import tqdm
import os
import numpy as np
import time
from pytorch_mlp_framework.storage_utils import save_statistics
from matplotlib import pyplot as plt
import matplotlib
matplotlib.rcParams.update({"font.size": 8})
class ExperimentBuilder(nn.Module):
def __init__(
self,
network_model,
experiment_name,
num_epochs,
train_data,
val_data,
test_data,
weight_decay_coefficient,
learning_rate,
use_gpu,
continue_from_epoch=-1,
):
"""
Initializes an ExperimentBuilder object. Such an object takes care of running training and evaluation of a deep net
on a given dataset. It also takes care of saving per epoch models and automatically inferring the best val model
to be used for evaluating the test set metrics.
:param network_model: A pytorch nn.Module which implements a network architecture.
:param experiment_name: The name of the experiment. This is used mainly for keeping track of the experiment and creating a directory structure that will be used to save logs, model parameters and other artifacts.
:param num_epochs: Total number of epochs to run the experiment
:param train_data: An object of the DataProvider type. Contains the training set.
:param val_data: An object of the DataProvider type. Contains the val set.
:param test_data: An object of the DataProvider type. Contains the test set.
:param weight_decay_coefficient: A float indicating the weight decay to use with the adam optimizer.
:param use_gpu: A boolean indicating whether to use a GPU or not.
:param continue_from_epoch: An int indicating whether we'll start from scratch (-1) or whether we'll reload a previously saved model of epoch 'continue_from_epoch' and continue training from there.
"""
super(ExperimentBuilder, self).__init__()
self.experiment_name = experiment_name
self.model = network_model
if torch.cuda.device_count() >= 1 and use_gpu:
self.device = torch.device("cuda")
self.model.to(self.device) # sends the model from the cpu to the gpu
print("Use GPU", self.device)
else:
print("use CPU")
self.device = torch.device("cpu") # sets the device to be CPU
print(self.device)
self.model.reset_parameters() # re-initialize network parameters
self.train_data = train_data
self.val_data = val_data
self.test_data = test_data
print("System learnable parameters")
num_conv_layers = 0
num_linear_layers = 0
total_num_parameters = 0
for name, value in self.named_parameters():
print(name, value.shape)
if all(item in name for item in ["conv", "weight"]):
num_conv_layers += 1
if all(item in name for item in ["linear", "weight"]):
num_linear_layers += 1
total_num_parameters += np.prod(value.shape)
print("Total number of parameters", total_num_parameters)
print("Total number of conv layers", num_conv_layers)
print("Total number of linear layers", num_linear_layers)
print(f"Learning rate: {learning_rate}")
self.optimizer = optim.Adam(
self.parameters(),
amsgrad=False,
weight_decay=weight_decay_coefficient,
lr=learning_rate,
)
self.learning_rate_scheduler = optim.lr_scheduler.CosineAnnealingLR(
self.optimizer, T_max=num_epochs, eta_min=0.00002
)
# Generate the directory names
self.experiment_folder = os.path.abspath(experiment_name)
self.experiment_logs = os.path.abspath(
os.path.join(self.experiment_folder, "result_outputs")
)
self.experiment_saved_models = os.path.abspath(
os.path.join(self.experiment_folder, "saved_models")
)
# Set best models to be at 0 since we are just starting
self.best_val_model_idx = 0
self.best_val_model_acc = 0.0
if not os.path.exists(
self.experiment_folder
): # If experiment directory does not exist
os.mkdir(self.experiment_folder) # create the experiment directory
os.mkdir(self.experiment_logs) # create the experiment log directory
os.mkdir(
self.experiment_saved_models
) # create the experiment saved models directory
self.num_epochs = num_epochs
self.criterion = nn.CrossEntropyLoss().to(
self.device
) # send the loss computation to the GPU
if (
continue_from_epoch == -2
): # if continue from epoch is -2 then continue from latest saved model
self.state, self.best_val_model_idx, self.best_val_model_acc = (
self.load_model(
model_save_dir=self.experiment_saved_models,
model_save_name="train_model",
model_idx="latest",
)
) # reload existing model from epoch and return best val model index
# and the best val acc of that model
self.starting_epoch = int(self.state["model_epoch"])
elif continue_from_epoch > -1: # if continue from epoch is greater than -1 then
self.state, self.best_val_model_idx, self.best_val_model_acc = (
self.load_model(
model_save_dir=self.experiment_saved_models,
model_save_name="train_model",
model_idx=continue_from_epoch,
)
) # reload existing model from epoch and return best val model index
# and the best val acc of that model
self.starting_epoch = continue_from_epoch
else:
self.state = dict()
self.starting_epoch = 0
def get_num_parameters(self):
total_num_params = 0
for param in self.parameters():
total_num_params += np.prod(param.shape)
return total_num_params
def plot_func_def(self, all_grads, layers):
"""
Plot function definition to plot the average gradient with respect to the number of layers in the given model
:param all_grads: Gradients wrt weights for each layer in the model.
:param layers: Layer names corresponding to the model parameters
:return: plot for gradient flow
"""
plt.plot(all_grads, alpha=0.3, color="b")
plt.hlines(0, 0, len(all_grads) + 1, linewidth=1, color="k")
plt.xticks(range(0, len(all_grads), 1), layers, rotation="vertical")
plt.xlim(xmin=0, xmax=len(all_grads))
plt.xlabel("Layers")
plt.ylabel("Average Gradient")
plt.title("Gradient flow")
plt.grid(True)
plt.tight_layout()
return plt
def plot_grad_flow(self, named_parameters):
"""
The function is being called in Line 298 of this file.
Receives the parameters of the model being trained. Returns plot of gradient flow for the given model parameters.
"""
all_grads = []
layers = []
"""
Complete the code in the block below to collect absolute mean of the gradients for each layer in all_grads with the layer names in layers.
"""
for name, param in named_parameters:
if "bias" in name:
continue
# Check if the parameter requires gradient and has a gradient
if param.requires_grad and param.grad is not None:
try:
_, a, _, b, _ = name.split(".", 4)
except ValueError:  # name does not have enough dot-separated parts
b, a = name.split(".", 1)
layers.append(f"{a}_{b}")
# Collect the mean of the absolute gradients
all_grads.append(param.grad.abs().mean().item())
plt = self.plot_func_def(all_grads, layers)
return plt
def run_train_iter(self, x, y):
self.train() # sets model to training mode (in case batch normalization or other methods have different procedures for training and evaluation)
x, y = x.float().to(device=self.device), y.long().to(
device=self.device
) # send data to device as torch tensors
out = self.model.forward(x) # forward the data in the model
loss = F.cross_entropy(input=out, target=y) # compute loss
self.optimizer.zero_grad() # set all weight grads from previous training iters to 0
loss.backward() # backpropagate to compute gradients for current iter loss
self.optimizer.step() # update network parameters
self.learning_rate_scheduler.step() # update learning rate scheduler
_, predicted = torch.max(out.data, 1) # get argmax of predictions
accuracy = np.mean(list(predicted.eq(y.data).cpu())) # compute accuracy
return loss.cpu().data.numpy(), accuracy
def run_evaluation_iter(self, x, y):
"""
Receives the inputs and targets for the model and runs an evaluation iterations. Returns loss and accuracy metrics.
:param x: The inputs to the model. A numpy array of shape batch_size, channels, height, width
:param y: The targets for the model. A numpy array of shape batch_size, num_classes
:return: the loss and accuracy for this batch
"""
self.eval() # sets the system to validation mode
x, y = x.float().to(device=self.device), y.long().to(
device=self.device
) # convert data to pytorch tensors and send to the computation device
out = self.model.forward(x) # forward the data in the model
loss = F.cross_entropy(input=out, target=y) # compute loss
_, predicted = torch.max(out.data, 1) # get argmax of predictions
accuracy = np.mean(list(predicted.eq(y.data).cpu())) # compute accuracy
return loss.cpu().data.numpy(), accuracy
def save_model(
self,
model_save_dir,
model_save_name,
model_idx,
best_validation_model_idx,
best_validation_model_acc,
):
"""
Save the network parameter state and current best val epoch idx and best val accuracy.
:param model_save_name: Name to use to save model without the epoch index
:param model_idx: The index to save the model with.
:param best_validation_model_idx: The index of the best validation model to be stored for future use.
:param best_validation_model_acc: The best validation accuracy to be stored for use at test time.
:param model_save_dir: The directory to store the state at.
:param state: The dictionary containing the system state.
"""
self.state["network"] = (
self.state_dict()
) # save network parameter and other variables.
self.state["best_val_model_idx"] = (
best_validation_model_idx # save current best val idx
)
self.state["best_val_model_acc"] = (
best_validation_model_acc # save current best val acc
)
torch.save(
self.state,
f=os.path.join(
model_save_dir, "{}_{}".format(model_save_name, str(model_idx))
),
) # save state at prespecified filepath
def load_model(self, model_save_dir, model_save_name, model_idx):
"""
Load the network parameter state and the best val model idx and best val acc to be compared with the future val accuracies, in order to choose the best val model
:param model_save_dir: The directory to store the state at.
:param model_save_name: Name to use to save model without the epoch index
:param model_idx: The index to save the model with.
:return: best val idx and best val model acc, also it loads the network state into the system state without returning it
"""
state = torch.load(
f=os.path.join(
model_save_dir, "{}_{}".format(model_save_name, str(model_idx))
)
)
self.load_state_dict(state_dict=state["network"])
return state, state["best_val_model_idx"], state["best_val_model_acc"]
def run_experiment(self):
"""
Runs experiment train and evaluation iterations, saving the model and best val model and val model accuracy after each epoch
:return: The summary current_epoch_losses from starting epoch to total_epochs.
"""
total_losses = {
"train_acc": [],
"train_loss": [],
"val_acc": [],
"val_loss": [],
} # initialize a dict to keep the per-epoch metrics
for i, epoch_idx in enumerate(range(self.starting_epoch, self.num_epochs)):
epoch_start_time = time.time()
current_epoch_losses = {
"train_acc": [],
"train_loss": [],
"val_acc": [],
"val_loss": [],
}
self.current_epoch = epoch_idx
with tqdm.tqdm(
total=len(self.train_data)
) as pbar_train: # create a progress bar for training
for idx, (x, y) in enumerate(self.train_data): # get data batches
loss, accuracy = self.run_train_iter(
x=x, y=y
) # take a training iter step
current_epoch_losses["train_loss"].append(
loss
) # add current iter loss to the train loss list
current_epoch_losses["train_acc"].append(
accuracy
) # add current iter acc to the train acc list
pbar_train.update(1)
pbar_train.set_description(
"loss: {:.4f}, accuracy: {:.4f}".format(loss, accuracy)
)
with tqdm.tqdm(
total=len(self.val_data)
) as pbar_val: # create a progress bar for validation
for x, y in self.val_data: # get data batches
loss, accuracy = self.run_evaluation_iter(
x=x, y=y
) # run a validation iter
current_epoch_losses["val_loss"].append(
loss
) # add current iter loss to val loss list.
current_epoch_losses["val_acc"].append(
accuracy
) # add current iter acc to val acc lst.
pbar_val.update(1) # add 1 step to the progress bar
pbar_val.set_description(
"loss: {:.4f}, accuracy: {:.4f}".format(loss, accuracy)
)
val_mean_accuracy = np.mean(current_epoch_losses["val_acc"])
if (
val_mean_accuracy > self.best_val_model_acc
): # if current epoch's mean val acc is greater than the saved best val acc then
self.best_val_model_acc = val_mean_accuracy # set the best val model acc to be current epoch's val accuracy
self.best_val_model_idx = epoch_idx # set the experiment-wise best val idx to be the current epoch's idx
for key, value in current_epoch_losses.items():
total_losses[key].append(
np.mean(value)
) # get mean of all metrics of current epoch metrics dict, to get them ready for storage and output on the terminal.
save_statistics(
experiment_log_dir=self.experiment_logs,
filename="summary.csv",
stats_dict=total_losses,
current_epoch=i,
continue_from_mode=(
True if (self.starting_epoch != 0 or i > 0) else False
),
) # save statistics to stats file.
# load_statistics(experiment_log_dir=self.experiment_logs, filename='summary.csv') # How to load a csv file if you need to
out_string = "_".join(
[
"{}_{:.4f}".format(key, np.mean(value))
for key, value in current_epoch_losses.items()
]
)
# create a string to use to report our epoch metrics
epoch_elapsed_time = (
time.time() - epoch_start_time
) # calculate time taken for epoch
epoch_elapsed_time = "{:.4f}".format(epoch_elapsed_time)
print(
"Epoch {}:".format(epoch_idx),
out_string,
"epoch time",
epoch_elapsed_time,
"seconds",
)
self.state["model_epoch"] = epoch_idx
self.save_model(
model_save_dir=self.experiment_saved_models,
# save model and best val idx and best val acc, using the model dir, model name and model idx
model_save_name="train_model",
model_idx=epoch_idx,
best_validation_model_idx=self.best_val_model_idx,
best_validation_model_acc=self.best_val_model_acc,
)
self.save_model(
model_save_dir=self.experiment_saved_models,
# save model and best val idx and best val acc, using the model dir, model name and model idx
model_save_name="train_model",
model_idx="latest",
best_validation_model_idx=self.best_val_model_idx,
best_validation_model_acc=self.best_val_model_acc,
)
################################################################
##### Plot Gradient Flow at each Epoch during Training ######
print("Generating Gradient Flow Plot at epoch {}".format(epoch_idx))
plt = self.plot_grad_flow(self.model.named_parameters())
if not os.path.exists(
os.path.join(self.experiment_saved_models, "gradient_flow_plots")
):
os.mkdir(
os.path.join(self.experiment_saved_models, "gradient_flow_plots")
)
# plt.legend(loc="best")
plt.savefig(
os.path.join(
self.experiment_saved_models,
"gradient_flow_plots",
"epoch{}.pdf".format(str(epoch_idx)),
)
)
################################################################
print("Generating test set evaluation metrics")
self.load_model(
model_save_dir=self.experiment_saved_models,
model_idx=self.best_val_model_idx,
# load best validation model
model_save_name="train_model",
)
current_epoch_losses = {
"test_acc": [],
"test_loss": [],
} # initialize a statistics dict
with tqdm.tqdm(total=len(self.test_data)) as pbar_test: # initialize a progress bar
for x, y in self.test_data: # sample batch
loss, accuracy = self.run_evaluation_iter(
x=x, y=y
) # compute loss and accuracy by running an evaluation step
current_epoch_losses["test_loss"].append(loss) # save test loss
current_epoch_losses["test_acc"].append(accuracy) # save test accuracy
pbar_test.update(1) # update progress bar status
pbar_test.set_description(
"loss: {:.4f}, accuracy: {:.4f}".format(loss, accuracy)
) # update progress bar string output
test_losses = {
key: [np.mean(value)] for key, value in current_epoch_losses.items()
} # save test set metrics in dict format
save_statistics(
experiment_log_dir=self.experiment_logs,
filename="test_summary.csv",
# save test set metrics on disk in .csv format
stats_dict=test_losses,
current_epoch=0,
continue_from_mode=False,
)
return total_losses, test_losses

@ -1,640 +0,0 @@
import torch
import torch.nn as nn
import torch.nn.functional as F
class FCCNetwork(nn.Module):
def __init__(
self, input_shape, num_output_classes, num_filters, num_layers, use_bias=False
):
"""
Initializes a fully connected network similar to the ones implemented previously in the MLP package.
:param input_shape: The shape of the inputs going in to the network.
:param num_output_classes: The number of outputs the network should have (for classification those would be the number of classes)
:param num_filters: Number of filters used in every fcc layer.
:param num_layers: Number of fcc layers (excluding dim reduction stages)
:param use_bias: Whether our fcc layers will use a bias.
"""
super(FCCNetwork, self).__init__()
# set up class attributes useful in building the network and inference
self.input_shape = input_shape
self.num_filters = num_filters
self.num_output_classes = num_output_classes
self.use_bias = use_bias
self.num_layers = num_layers
# initialize a module dict, which is effectively a dictionary that can collect layers and integrate them into pytorch
self.layer_dict = nn.ModuleDict()
# build the network
self.build_module()
def build_module(self):
print("Building basic block of FCCNetwork using input shape", self.input_shape)
x = torch.zeros((self.input_shape))
out = x
out = out.view(out.shape[0], -1)
# flatten inputs to shape (b, -1) where -1 is the dim resulting from multiplying the
# shapes of all dimensions after the 0th dim
for i in range(self.num_layers):
self.layer_dict["fcc_{}".format(i)] = nn.Linear(
in_features=out.shape[1], # initialize a fcc layer
out_features=self.num_filters,
bias=self.use_bias,
)
out = self.layer_dict["fcc_{}".format(i)](
out
) # apply ith fcc layer to the previous layers outputs
out = F.relu(out) # apply a ReLU on the outputs
self.logits_linear_layer = nn.Linear(
in_features=out.shape[1], # initialize the prediction output linear layer
out_features=self.num_output_classes,
bias=self.use_bias,
)
out = self.logits_linear_layer(
out
) # apply the layer to the previous layer's outputs
print("Block is built, output volume is", out.shape)
return out
def forward(self, x):
"""
Forward propagates data through the network and returns the predictions.
:param x: Input batch of shape (b, ...): b samples, each of any dimensionality.
:return: preds of shape (b, num_classes)
"""
out = x
out = out.view(out.shape[0], -1)
# flatten inputs to shape (b, -1) where -1 is the dim resulting from multiplying the
# shapes of all dimensions after the 0th dim
for i in range(self.num_layers):
out = self.layer_dict["fcc_{}".format(i)](
out
) # apply the ith fcc layer to the previous layer's outputs
out = F.relu(out) # apply a ReLU on the outputs
out = self.logits_linear_layer(
out
) # apply the layer to the previous layer's outputs
return out
def reset_parameters(self):
"""
Re-initializes the network's parameters.
"""
for item in self.layer_dict.children():
item.reset_parameters()
self.logits_linear_layer.reset_parameters()
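The dummy-forward shape-inference idiom used in build_module above can be sketched standalone (a minimal sketch; the tensor sizes and layer widths here are illustrative, not the course defaults):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Push a dummy zero tensor through while building, so each Linear's
# in_features is inferred from the previous layer's output shape.
layer_dict = nn.ModuleDict()
out = torch.zeros((4, 3, 32, 32)).view(4, -1)  # flatten (b, c, h, w) -> (b, c*h*w)
for i in range(2):
    layer_dict["fcc_{}".format(i)] = nn.Linear(in_features=out.shape[1], out_features=64)
    out = F.relu(layer_dict["fcc_{}".format(i)](out))
print(out.shape)  # torch.Size([4, 64])
```

This is why build_module both constructs the layers and runs them: the forward pass at build time is what supplies `out.shape[1]` for the next layer.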
class EmptyBlock(nn.Module):
def __init__(
self,
input_shape=None,
num_filters=None,
kernel_size=None,
padding=None,
bias=None,
dilation=None,
reduction_factor=None,
):
super(EmptyBlock, self).__init__()
self.num_filters = num_filters
self.kernel_size = kernel_size
self.input_shape = input_shape
self.padding = padding
self.bias = bias
self.dilation = dilation
self.build_module()
def build_module(self):
self.layer_dict = nn.ModuleDict()
x = torch.zeros(self.input_shape)
self.layer_dict["Identity"] = nn.Identity()
def forward(self, x):
out = x
out = self.layer_dict["Identity"].forward(out)
return out
class EntryConvolutionalBlock(nn.Module):
def __init__(self, input_shape, num_filters, kernel_size, padding, bias, dilation):
super(EntryConvolutionalBlock, self).__init__()
self.num_filters = num_filters
self.kernel_size = kernel_size
self.input_shape = input_shape
self.padding = padding
self.bias = bias
self.dilation = dilation
self.build_module()
def build_module(self):
self.layer_dict = nn.ModuleDict()
x = torch.zeros(self.input_shape)
out = x
self.layer_dict["conv_0"] = nn.Conv2d(
in_channels=out.shape[1],
out_channels=self.num_filters,
bias=self.bias,
kernel_size=self.kernel_size,
dilation=self.dilation,
padding=self.padding,
stride=1,
)
out = self.layer_dict["conv_0"].forward(out)
self.layer_dict["bn_0"] = nn.BatchNorm2d(num_features=out.shape[1])
out = F.leaky_relu(self.layer_dict["bn_0"].forward(out))
print(out.shape)
def forward(self, x):
out = x
out = self.layer_dict["conv_0"].forward(out)
out = F.leaky_relu(self.layer_dict["bn_0"].forward(out))
return out
class ConvolutionalProcessingBlock(nn.Module):
def __init__(self, input_shape, num_filters, kernel_size, padding, bias, dilation):
super(ConvolutionalProcessingBlock, self).__init__()
self.num_filters = num_filters
self.kernel_size = kernel_size
self.input_shape = input_shape
self.padding = padding
self.bias = bias
self.dilation = dilation
self.build_module()
def build_module(self):
self.layer_dict = nn.ModuleDict()
x = torch.zeros(self.input_shape)
out = x
self.layer_dict["conv_0"] = nn.Conv2d(
in_channels=out.shape[1],
out_channels=self.num_filters,
bias=self.bias,
kernel_size=self.kernel_size,
dilation=self.dilation,
padding=self.padding,
stride=1,
)
out = self.layer_dict["conv_0"].forward(out)
out = F.leaky_relu(out)
self.layer_dict["conv_1"] = nn.Conv2d(
in_channels=out.shape[1],
out_channels=self.num_filters,
bias=self.bias,
kernel_size=self.kernel_size,
dilation=self.dilation,
padding=self.padding,
stride=1,
)
out = self.layer_dict["conv_1"].forward(out)
out = F.leaky_relu(out)
print(out.shape)
def forward(self, x):
out = x
out = self.layer_dict["conv_0"].forward(out)
out = F.leaky_relu(out)
out = self.layer_dict["conv_1"].forward(out)
out = F.leaky_relu(out)
return out
class ConvolutionalDimensionalityReductionBlock(nn.Module):
def __init__(
self,
input_shape,
num_filters,
kernel_size,
padding,
bias,
dilation,
reduction_factor,
):
super(ConvolutionalDimensionalityReductionBlock, self).__init__()
self.num_filters = num_filters
self.kernel_size = kernel_size
self.input_shape = input_shape
self.padding = padding
self.bias = bias
self.dilation = dilation
self.reduction_factor = reduction_factor
self.build_module()
def build_module(self):
self.layer_dict = nn.ModuleDict()
x = torch.zeros(self.input_shape)
out = x
self.layer_dict["conv_0"] = nn.Conv2d(
in_channels=out.shape[1],
out_channels=self.num_filters,
bias=self.bias,
kernel_size=self.kernel_size,
dilation=self.dilation,
padding=self.padding,
stride=1,
)
out = self.layer_dict["conv_0"].forward(out)
out = F.leaky_relu(out)
out = F.avg_pool2d(out, self.reduction_factor)
self.layer_dict["conv_1"] = nn.Conv2d(
in_channels=out.shape[1],
out_channels=self.num_filters,
bias=self.bias,
kernel_size=self.kernel_size,
dilation=self.dilation,
padding=self.padding,
stride=1,
)
out = self.layer_dict["conv_1"].forward(out)
out = F.leaky_relu(out)
print(out.shape)
def forward(self, x):
out = x
out = self.layer_dict["conv_0"].forward(out)
out = F.leaky_relu(out)
out = F.avg_pool2d(out, self.reduction_factor)
out = self.layer_dict["conv_1"].forward(out)
out = F.leaky_relu(out)
return out
class ConvolutionalNetwork(nn.Module):
def __init__(
self,
input_shape,
num_output_classes,
num_filters,
num_blocks_per_stage,
num_stages,
use_bias=False,
processing_block_type=ConvolutionalProcessingBlock,
dimensionality_reduction_block_type=ConvolutionalDimensionalityReductionBlock,
):
"""
Initializes a convolutional network module
:param input_shape: The shape of the tensor to be passed into this network
:param num_output_classes: Number of output classes
:param num_filters: Number of filters per convolutional layer
:param num_blocks_per_stage: Number of blocks per "stage". Each block is composed of 2 convolutional layers.
:param num_stages: Number of stages in a network. A stage is defined as a sequence of layers within which the
data dimensionality remains constant in the spatial axes (h, w) and can change in the channel axis. After each stage
there is a dimensionality reduction block, composed of two convolutional layers and an avg pooling layer.
:param use_bias: Whether to use biases in our convolutional layers
:param processing_block_type: Type of processing block to use within our stages
:param dimensionality_reduction_block_type: Type of dimensionality reduction block to use after each stage in our network
"""
super(ConvolutionalNetwork, self).__init__()
# set up class attributes useful in building the network and inference
self.input_shape = input_shape
self.num_filters = num_filters
self.num_output_classes = num_output_classes
self.use_bias = use_bias
self.num_blocks_per_stage = num_blocks_per_stage
self.num_stages = num_stages
self.processing_block_type = processing_block_type
self.dimensionality_reduction_block_type = dimensionality_reduction_block_type
# build the network
self.build_module()
def build_module(self):
"""
Builds network whilst automatically inferring shapes of layers.
"""
self.layer_dict = nn.ModuleDict()
# initialize a module dict, which is effectively a dictionary that can collect layers and integrate them into pytorch
print(
"Building basic block of ConvolutionalNetwork using input shape",
self.input_shape,
)
x = torch.zeros(
(self.input_shape)
) # create dummy inputs to be used to infer shapes of layers
out = x
self.layer_dict["input_conv"] = EntryConvolutionalBlock(
input_shape=out.shape,
num_filters=self.num_filters,
kernel_size=3,
padding=1,
bias=self.use_bias,
dilation=1,
)
out = self.layer_dict["input_conv"].forward(out)
# torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)
for i in range(self.num_stages): # for number of layers times
for j in range(self.num_blocks_per_stage):
self.layer_dict["block_{}_{}".format(i, j)] = (
self.processing_block_type(
input_shape=out.shape,
num_filters=self.num_filters,
bias=self.use_bias,
kernel_size=3,
dilation=1,
padding=1,
)
)
out = self.layer_dict["block_{}_{}".format(i, j)].forward(out)
self.layer_dict["reduction_block_{}".format(i)] = (
self.dimensionality_reduction_block_type(
input_shape=out.shape,
num_filters=self.num_filters,
bias=True,
kernel_size=3,
dilation=1,
padding=1,
reduction_factor=2,
)
)
out = self.layer_dict["reduction_block_{}".format(i)].forward(out)
out = F.avg_pool2d(out, out.shape[-1])
print("shape before final linear layer", out.shape)
out = out.view(out.shape[0], -1)
self.logit_linear_layer = nn.Linear(
in_features=out.shape[1], # add a linear layer
out_features=self.num_output_classes,
bias=True,
)
out = self.logit_linear_layer(out) # apply linear layer on flattened inputs
print("Block is built, output volume is", out.shape)
return out
def forward(self, x):
"""
Forward propagates an input batch through the network
:param x: Inputs x (b, c, h, w)
:return: preds (b, num_classes)
"""
out = x
out = self.layer_dict["input_conv"].forward(out)
for i in range(self.num_stages): # for number of layers times
for j in range(self.num_blocks_per_stage):
out = self.layer_dict["block_{}_{}".format(i, j)].forward(out)
out = self.layer_dict["reduction_block_{}".format(i)].forward(out)
out = F.avg_pool2d(out, out.shape[-1])
out = out.view(
out.shape[0], -1
) # flatten outputs from (b, c, h, w) to (b, c*h*w)
out = self.logit_linear_layer(
out
) # pass through a linear layer to get logits/preds
return out
def reset_parameters(self):
"""
Re-initialize the network parameters.
"""
for item in self.layer_dict.children():
try:
item.reset_parameters()
except AttributeError:
pass  # some child blocks do not define reset_parameters of their own
self.logit_linear_layer.reset_parameters()
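The final global pooling step above (avg_pool2d with the kernel set to the feature map's full spatial size) can be checked in isolation (a minimal sketch with an illustrative feature map size):

```python
import torch
import torch.nn.functional as F

# avg_pool2d with kernel == spatial size collapses (b, c, h, w) down to
# (b, c, 1, 1); flattening then yields (b, c) for the logits layer.
feat = torch.randn(4, 16, 8, 8)
pooled = F.avg_pool2d(feat, feat.shape[-1])
flat = pooled.view(pooled.shape[0], -1)
print(pooled.shape, flat.shape)  # torch.Size([4, 16, 1, 1]) torch.Size([4, 16])
```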
# My Implementation:
class ConvolutionalProcessingBlockBN(nn.Module):
def __init__(self, input_shape, num_filters, kernel_size, padding, bias, dilation):
super().__init__()
self.num_filters = num_filters
self.kernel_size = kernel_size
self.input_shape = input_shape
self.padding = padding
self.bias = bias
self.dilation = dilation
self.build_module()
def build_module(self):
self.layer_dict = nn.ModuleDict()
x = torch.zeros(self.input_shape)
out = x
# First convolutional layer with Batch Normalization
self.layer_dict["conv_0"] = nn.Conv2d(
in_channels=out.shape[1],
out_channels=self.num_filters,
bias=self.bias,
kernel_size=self.kernel_size,
dilation=self.dilation,
padding=self.padding,
stride=1,
)
self.layer_dict["bn_0"] = nn.BatchNorm2d(self.num_filters)
out = F.leaky_relu(self.layer_dict["bn_0"](self.layer_dict["conv_0"](out)))
# Second convolutional layer with Batch Normalization
self.layer_dict["conv_1"] = nn.Conv2d(
in_channels=out.shape[1],
out_channels=self.num_filters,
bias=self.bias,
kernel_size=self.kernel_size,
dilation=self.dilation,
padding=self.padding,
stride=1,
)
self.layer_dict["bn_1"] = nn.BatchNorm2d(self.num_filters)
out = F.leaky_relu(self.layer_dict["bn_1"](self.layer_dict["conv_1"](out)))
print(out.shape)
def forward(self, x):
out = x
# Apply first conv layer + BN + ReLU
out = F.leaky_relu(self.layer_dict["bn_0"](self.layer_dict["conv_0"](out)))
# Apply second conv layer + BN + ReLU
out = F.leaky_relu(self.layer_dict["bn_1"](self.layer_dict["conv_1"](out)))
return out
class ConvolutionalDimensionalityReductionBlockBN(nn.Module):
def __init__(
self,
input_shape,
num_filters,
kernel_size,
padding,
bias,
dilation,
reduction_factor,
):
super().__init__()
self.num_filters = num_filters
self.kernel_size = kernel_size
self.input_shape = input_shape
self.padding = padding
self.bias = bias
self.dilation = dilation
self.reduction_factor = reduction_factor
self.build_module()
def build_module(self):
self.layer_dict = nn.ModuleDict()
x = torch.zeros(self.input_shape)
out = x
# First convolutional layer with Batch Normalization
self.layer_dict["conv_0"] = nn.Conv2d(
in_channels=out.shape[1],
out_channels=self.num_filters,
bias=self.bias,
kernel_size=self.kernel_size,
dilation=self.dilation,
padding=self.padding,
stride=1,
)
self.layer_dict["bn_0"] = nn.BatchNorm2d(self.num_filters)
out = F.leaky_relu(self.layer_dict["bn_0"](self.layer_dict["conv_0"](out)))
# Dimensionality reduction through average pooling
out = F.avg_pool2d(out, self.reduction_factor)
# Second convolutional layer with Batch Normalization
self.layer_dict["conv_1"] = nn.Conv2d(
in_channels=out.shape[1],
out_channels=self.num_filters,
bias=self.bias,
kernel_size=self.kernel_size,
dilation=self.dilation,
padding=self.padding,
stride=1,
)
self.layer_dict["bn_1"] = nn.BatchNorm2d(self.num_filters)
out = F.leaky_relu(self.layer_dict["bn_1"](self.layer_dict["conv_1"](out)))
print(out.shape)
def forward(self, x):
out = x
# Apply first conv layer + BN + ReLU
out = F.leaky_relu(self.layer_dict["bn_0"](self.layer_dict["conv_0"](out)))
# Dimensionality reduction through average pooling
out = F.avg_pool2d(out, self.reduction_factor)
# Apply second conv layer + BN + ReLU
out = F.leaky_relu(self.layer_dict["bn_1"](self.layer_dict["conv_1"](out)))
return out
class ConvolutionalProcessingBlockBNRC(nn.Module):
def __init__(self, input_shape, num_filters, kernel_size, padding, bias, dilation):
super().__init__()
self.num_filters = num_filters
self.kernel_size = kernel_size
self.input_shape = input_shape
self.padding = padding
self.bias = bias
self.dilation = dilation
self.build_module()
def build_module(self):
self.layer_dict = nn.ModuleDict()
x = torch.zeros(self.input_shape)
out = x
# First convolutional layer with BN
self.layer_dict["conv_0"] = nn.Conv2d(
in_channels=out.shape[1],
out_channels=self.num_filters,
bias=self.bias,
kernel_size=self.kernel_size,
dilation=self.dilation,
padding=self.padding,
stride=1,
)
self.layer_dict["bn_0"] = nn.BatchNorm2d(self.num_filters)
out = self.layer_dict["conv_0"].forward(out)
out = self.layer_dict["bn_0"].forward(out)
out = F.leaky_relu(out)
# Second convolutional layer with BN
self.layer_dict["conv_1"] = nn.Conv2d(
in_channels=out.shape[1],
out_channels=self.num_filters,
bias=self.bias,
kernel_size=self.kernel_size,
dilation=self.dilation,
padding=self.padding,
stride=1,
)
self.layer_dict["bn_1"] = nn.BatchNorm2d(self.num_filters)
out = self.layer_dict["conv_1"].forward(out)
out = self.layer_dict["bn_1"].forward(out)
out = F.leaky_relu(out)
# Print final output shape for debugging
print(out.shape)
def forward(self, x):
residual = x # Save input for residual connection
out = x
# Apply first conv layer + BN + ReLU
out = F.leaky_relu(self.layer_dict["bn_0"](self.layer_dict["conv_0"](out)))
# Apply second conv layer + BN + ReLU
out = F.leaky_relu(self.layer_dict["bn_1"](self.layer_dict["conv_1"](out)))
# Add the residual connection; shape match is guaranteed by
# kernel_size=3, padding=1, stride=1 and in_channels == num_filters
assert residual.shape == out.shape
out = out + residual
return out
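The shape assertion in the forward pass above holds because, with kernel_size=3, padding=1 and stride=1, the convolutions preserve the spatial dimensions; the residual add is therefore valid exactly when the block's input channel count equals num_filters (a minimal standalone sketch with illustrative sizes):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# kernel_size=3 with padding=1 and stride=1 preserves (h, w), so the
# skip connection is shape-safe when in_channels == out_channels.
conv = nn.Conv2d(in_channels=16, out_channels=16, kernel_size=3, padding=1)
x = torch.randn(2, 16, 8, 8)
out = F.leaky_relu(conv(x))
assert out.shape == x.shape  # residual add is valid
y = out + x
print(y.shape)  # torch.Size([2, 16, 8, 8])
```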

View File

@@ -1,87 +0,0 @@
import unittest
import torch
from model_architectures import (
ConvolutionalProcessingBlockBN,
ConvolutionalDimensionalityReductionBlockBN,
ConvolutionalProcessingBlockBNRC,
)
class TestBatchNormalizationBlocks(unittest.TestCase):
def setUp(self):
# Common parameters
self.input_shape = (1, 3, 32, 32) # Batch size 1, 3 channels, 32x32 input
self.num_filters = 16
self.kernel_size = 3
self.padding = 1
self.bias = False
self.dilation = 1
self.reduction_factor = 2
def test_convolutional_processing_block(self):
# Create a ConvolutionalProcessingBlockBN instance
block = ConvolutionalProcessingBlockBN(
input_shape=self.input_shape,
num_filters=self.num_filters,
kernel_size=self.kernel_size,
padding=self.padding,
bias=self.bias,
dilation=self.dilation,
)
# Generate a random tensor matching the input shape
input_tensor = torch.randn(self.input_shape)
# Forward pass
try:
output = block(input_tensor)
self.assertIsNotNone(output, "Output should not be None.")
except Exception as e:
self.fail(f"ConvolutionalProcessingBlock raised an error: {e}")
def test_convolutional_processing_block_with_rc(self):
# Create a ConvolutionalProcessingBlockBNRC instance
block = ConvolutionalProcessingBlockBNRC(
input_shape=self.input_shape,
num_filters=self.num_filters,
kernel_size=self.kernel_size,
padding=self.padding,
bias=self.bias,
dilation=self.dilation,
)
# Generate a random tensor matching the input shape
input_tensor = torch.randn(self.input_shape)
# Forward pass
try:
output = block(input_tensor)
self.assertIsNotNone(output, "Output should not be None.")
except Exception as e:
self.fail(f"ConvolutionalProcessingBlock raised an error: {e}")
def test_convolutional_dimensionality_reduction_block(self):
# Create a ConvolutionalDimensionalityReductionBlockBN instance
block = ConvolutionalDimensionalityReductionBlockBN(
input_shape=self.input_shape,
num_filters=self.num_filters,
kernel_size=self.kernel_size,
padding=self.padding,
bias=self.bias,
dilation=self.dilation,
reduction_factor=self.reduction_factor,
)
# Generate a random tensor matching the input shape
input_tensor = torch.randn(self.input_shape)
# Forward pass
try:
output = block(input_tensor)
self.assertIsNotNone(output, "Output should not be None.")
except Exception as e:
self.fail(f"ConvolutionalDimensionalityReductionBlock raised an error: {e}")
if __name__ == "__main__":
unittest.main()
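The try/except-then-self.fail pattern used in the tests above can be exercised standalone against a stand-in block (a minimal sketch; the Sequential here is a hypothetical placeholder for the blocks under test):

```python
import unittest
import torch
import torch.nn as nn

class TestToyBlock(unittest.TestCase):
    def test_forward(self):
        # Stand-in block: build it, forward a random tensor of the
        # expected shape, and fail with the captured error message
        # if the forward pass breaks.
        block = nn.Sequential(nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.LeakyReLU())
        try:
            output = block(torch.randn(1, 3, 32, 32))
            self.assertIsNotNone(output, "Output should not be None.")
        except Exception as e:
            self.fail(f"forward pass raised an error: {e}")

suite = unittest.TestLoader().loadTestsFromTestCase(TestToyBlock)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())  # True
```

Catching the exception and calling self.fail turns any construction or shape error into a readable test failure instead of an unhandled error.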

View File

@@ -1,102 +0,0 @@
import numpy as np
import torch
from torch.utils.data import DataLoader
from torchvision import transforms
import mlp.data_providers as data_providers
from pytorch_mlp_framework.arg_extractor import get_args
from pytorch_mlp_framework.experiment_builder import ExperimentBuilder
from pytorch_mlp_framework.model_architectures import *
import os
# os.environ["CUDA_VISIBLE_DEVICES"]="0"
args = get_args() # get arguments from command line
rng = np.random.RandomState(seed=args.seed) # set the seeds for the experiment
torch.manual_seed(seed=args.seed) # sets pytorch's seed
# set up data augmentation transforms for training and testing
transform_train = transforms.Compose(
[
transforms.RandomCrop(32, padding=4),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
]
)
transform_test = transforms.Compose(
[
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
]
)
train_data = data_providers.CIFAR100(
root="data", set_name="train", transform=transform_train, download=True
) # load the CIFAR-100 training split
val_data = data_providers.CIFAR100(
root="data", set_name="val", transform=transform_test, download=True
) # load the CIFAR-100 validation split
test_data = data_providers.CIFAR100(
root="data", set_name="test", transform=transform_test, download=True
) # load the CIFAR-100 test split
train_data_loader = DataLoader(
train_data, batch_size=args.batch_size, shuffle=True, num_workers=2
)
val_data_loader = DataLoader(
val_data, batch_size=args.batch_size, shuffle=True, num_workers=2
)
test_data_loader = DataLoader(
test_data, batch_size=args.batch_size, shuffle=True, num_workers=2
)
if args.block_type == "conv_block":
processing_block_type = ConvolutionalProcessingBlock
dim_reduction_block_type = ConvolutionalDimensionalityReductionBlock
elif args.block_type == "empty_block":
processing_block_type = EmptyBlock
dim_reduction_block_type = EmptyBlock
elif args.block_type == "conv_bn":
processing_block_type = ConvolutionalProcessingBlockBN
dim_reduction_block_type = ConvolutionalDimensionalityReductionBlockBN
elif args.block_type == "conv_bn_rc":
processing_block_type = ConvolutionalProcessingBlockBNRC
dim_reduction_block_type = ConvolutionalDimensionalityReductionBlockBN
else:
raise ValueError("Unknown block_type: {}".format(args.block_type))
custom_conv_net = (
ConvolutionalNetwork( # initialize our network object, in this case a ConvNet
input_shape=(
args.batch_size,
args.image_num_channels,
args.image_height,
args.image_width,
),
num_output_classes=args.num_classes,
num_filters=args.num_filters,
use_bias=False,
num_blocks_per_stage=args.num_blocks_per_stage,
num_stages=args.num_stages,
processing_block_type=processing_block_type,
dimensionality_reduction_block_type=dim_reduction_block_type,
)
)
conv_experiment = ExperimentBuilder(
network_model=custom_conv_net,
experiment_name=args.experiment_name,
num_epochs=args.num_epochs,
weight_decay_coefficient=args.weight_decay_coefficient,
learning_rate=args.learning_rate,
use_gpu=args.use_gpu,
continue_from_epoch=args.continue_from_epoch,
train_data=train_data_loader,
val_data=val_data_loader,
test_data=test_data_loader,
) # build an experiment object
experiment_metrics, test_metrics = (
conv_experiment.run_experiment()
) # run experiment and return experiment metrics

report/.gitignore
View File

@@ -1,4 +0,0 @@
*.fls
*.fdb_latexmk
s2759177/
*.zip

View File

@@ -1 +0,0 @@
Most reasonable LaTeX distributions should have no problem building the document from the provided LaTeX source directory. However, certain LaTeX distributions are missing certain files, so those files are included in this directory. If you get an error message when building the LaTeX document saying that one of these files is missing, move the relevant file into your LaTeX source directory.

View File

@@ -1,101 +0,0 @@
train_acc,train_loss,val_acc,val_loss
0.027410526315789472,4.440032,0.0368,4.238186
0.0440842105263158,4.1909122,0.0644,4.1239405
0.05604210526315791,4.0817885,0.0368,4.495799
0.0685263157894737,3.984858,0.0964,3.8527937
0.08345263157894738,3.8947835,0.09080000000000002,3.8306112
0.09391578947368423,3.8246264,0.10399999999999998,3.7504945
0.10189473684210527,3.760145,0.1124,3.6439042
0.11197894736842108,3.704831,0.0992,3.962508
0.12534736842105265,3.6408415,0.1404,3.516474
0.1385894736842105,3.5672796,0.1444,3.5242612
0.14873684210526317,3.5145628,0.12960000000000002,3.5745378
0.16103157894736844,3.4476008,0.1852,3.3353982
0.16846315789473681,3.399858,0.15600000000000003,3.453797
0.1760210526315789,3.3611393,0.1464,3.5799885
0.18625263157894736,3.3005812,0.196,3.201007
0.19233684210526317,3.26565,0.17439999999999997,3.397586
0.19625263157894737,3.2346153,0.212,3.169959
0.20717894736842105,3.174345,0.2132,3.0981174
0.2136,3.1425776,0.2036,3.2191591
0.2217684210526316,3.094137,0.236,3.0018876
0.23069473684210529,3.0539455,0.20440000000000003,3.1800296
0.23395789473684211,3.0338168,0.22599999999999998,3.0360818
0.24463157894736842,2.9761615,0.2588,2.8876188
0.25311578947368424,2.931479,0.2,3.242481
0.25795789473684216,2.900163,0.28320000000000006,2.830947
0.26789473684210524,2.8484874,0.2768,2.8190458
0.2709263157894737,2.833472,0.2352,3.0098538
0.2816421052631579,2.7842317,0.29560000000000003,2.7288156
0.28764210526315787,2.745757,0.2648,2.8955112
0.2930315789473684,2.7276495,0.27680000000000005,2.8336413
0.3001263157894737,2.6826382,0.316,2.6245823
0.3068421052631579,2.658441,0.27,2.9279957
0.30909473684210526,2.638565,0.31160000000000004,2.637653
0.3213263157894737,2.5939283,0.31799999999999995,2.627816
0.3211157894736843,2.579544,0.25079999999999997,2.9502957
0.3259999999999999,2.5540712,0.3332,2.569941
0.3336421052631579,2.5239582,0.278,2.7676308
0.3371368421052632,2.5109046,0.2916,2.725589
0.34404210526315787,2.4714804,0.34120000000000006,2.4782379
0.3500631578947368,2.4545348,0.30600000000000005,2.6625924
0.34976842105263156,2.4408882,0.342,2.5351026
0.3586315789473684,2.4116046,0.3452,2.450749
0.3568421052631579,2.4133172,0.3288,2.5647113
0.3630947368421052,2.3772728,0.36519999999999997,2.388074
0.37069473684210524,2.3505116,0.324,2.5489926
0.37132631578947367,2.352426,0.33680000000000004,2.5370462
0.37606315789473677,2.319005,0.3712,2.3507965
0.3800210526315789,2.3045664,0.33,2.6327293
0.38185263157894733,2.2965574,0.3764,2.364877
0.38785263157894734,2.269467,0.37799999999999995,2.330837
0.3889684210526316,2.26941,0.3559999999999999,2.513778
0.3951789473684211,2.2413251,0.3888,2.2839465
0.3944421052631579,2.2319226,0.35919999999999996,2.4310353
0.4,2.220305,0.3732,2.348543
0.4051157894736842,2.1891508,0.39440000000000003,2.2730627
0.40581052631578945,2.1873925,0.33399999999999996,2.5648093
0.4067789473684211,2.1817088,0.4044,2.2244952
0.41555789473684207,2.1543047,0.39759999999999995,2.220972
0.4170526315789474,2.14905,0.33399999999999996,2.6612198
0.41762105263157895,2.1321266,0.3932,2.2343464
0.42341052631578946,2.1131704,0.37800000000000006,2.327929
0.4212842105263158,2.112597,0.376,2.3302126
0.4295157894736842,2.0925663,0.4100000000000001,2.175698
0.4299368421052632,2.0846903,0.3772,2.3750577
0.43134736842105265,2.075184,0.4044,2.1888158
0.43829473684210524,2.045202,0.41239999999999993,2.1673117
0.43534736842105265,2.0590534,0.37440000000000007,2.3269994
0.4417684210526316,2.0356588,0.42,2.1668334
0.4442736842105263,2.028207,0.41239999999999993,2.2346516
0.44581052631578943,2.021492,0.40519999999999995,2.2030904
0.44884210526315793,2.0058675,0.4296,2.0948715
0.45071578947368424,1.993417,0.39,2.2856123
0.45130526315789476,1.9970801,0.43599999999999994,2.110219
0.45686315789473686,1.9651922,0.4244,2.1253593
0.4557263157894737,1.9701725,0.3704,2.4576838
0.4609684210526315,1.956996,0.4412,2.0626938
0.4639789473684211,1.9407912,0.398,2.3076272
0.46311578947368426,1.9410807,0.4056,2.2181008
0.4686736842105263,1.918824,0.45080000000000003,2.030652
0.4650315789473684,1.924879,0.3948,2.2926931
0.46964210526315786,1.9188553,0.43599999999999994,2.107239
0.47357894736842104,1.8991861,0.43119999999999997,2.067097
0.47212631578947367,1.8987728,0.41359999999999997,2.1667569
0.4773263157894737,1.8892545,0.46,2.0283196
0.4802526315789474,1.8736148,0.41960000000000003,2.1698954
0.47406315789473685,1.8849738,0.43399999999999994,2.1001608
0.48627368421052636,1.8492608,0.45520000000000005,1.9936249
0.48589473684210527,1.8534511,0.38439999999999996,2.354954
0.48667368421052637,1.8421199,0.44120000000000004,2.0467849
0.4902736842105263,1.8265136,0.45519999999999994,2.0044358
0.4879789473684211,1.838593,0.3984,2.3019247
0.49204210526315795,1.8199797,0.4656,1.9858631
0.4945894736842105,1.805858,0.436,2.1293921
0.4939578947368421,1.8174701,0.4388,2.0611947
0.4961684210526316,1.7953233,0.4612,1.9728945
0.49610526315789477,1.7908033,0.42440000000000005,2.1648548
0.4996,1.7908286,0.4664,1.9897026
0.5070105263157895,1.7658812,0.452,2.0411723
0.5027368421052631,1.7692825,0.4136000000000001,2.280331
0.5062315789473685,1.7649119,0.4768,1.9493303

View File

@@ -1,2 +0,0 @@
test_acc,test_loss
0.46970000000000006,1.9579598

View File

@@ -1,101 +0,0 @@
train_acc,train_loss,val_acc,val_loss
0.04040000000000001,4.2986817,0.07600000000000001,3.9793916
0.07663157894736841,3.948711,0.09840000000000002,3.8271046
0.1072842105263158,3.7670445,0.0908,3.8834984
0.14671578947368422,3.544252,0.1784,3.3180876
0.18690526315789474,3.3382895,0.1672,3.4958847
0.2185684210526316,3.1613564,0.23240000000000002,3.0646808
0.2584,2.9509778,0.2904,2.7620668
0.2886736842105263,2.7674758,0.2504,3.083242
0.3186736842105263,2.6191177,0.34600000000000003,2.5320892
0.3488421052631579,2.4735146,0.3556,2.463249
0.36701052631578945,2.3815694,0.32480000000000003,2.6590502
0.39258947368421054,2.2661598,0.41200000000000003,2.215237
0.40985263157894736,2.1811035,0.3644,2.4625826
0.42557894736842106,2.1193688,0.3896,2.2802749
0.4452,2.0338347,0.45080000000000003,2.0216491
0.45298947368421055,1.9886738,0.3768,2.4903286
0.4690105263157895,1.9385177,0.46519999999999995,1.9589043
0.48627368421052636,1.8654134,0.46199999999999997,1.9572229
0.4910947368421053,1.836772,0.3947999999999999,2.371203
0.5033052631578947,1.7882212,0.4864,1.8270072
0.515578947368421,1.7451773,0.418,2.2281988
0.5166526315789474,1.7310464,0.4744,1.9468222
0.532,1.6639497,0.5176,1.7627875
0.534821052631579,1.6504371,0.426,2.2908173
0.5399578947368422,1.6263881,0.5092,1.7892419
0.5538105263157893,1.5786182,0.5184,1.7781507
0.5530526315789474,1.5743873,0.45480000000000004,2.052206
0.5610526315789474,1.5367776,0.5404000000000001,1.6886607
0.5709263157894736,1.508275,0.5072000000000001,1.8317349
0.5693894736842106,1.5026951,0.49760000000000004,1.9268813
0.5827368421052632,1.4614111,0.5484,1.6791071
0.583557894736842,1.4580216,0.4744,2.084504
0.5856842105263159,1.4402864,0.5468,1.6674811
0.5958105263157895,1.4054152,0.5468,1.7081916
0.5964631578947368,1.4043275,0.4988,1.8901508
0.6044631578947368,1.3692447,0.548,1.6456038
0.6065473684210526,1.3562685,0.5448,1.7725601
0.6055578947368421,1.3638091,0.52,1.803752
0.6169684210526316,1.3224502,0.5688,1.6048553
0.6184421052631579,1.3228824,0.4772,2.0309162
0.6193894736842105,1.312684,0.5496,1.6357917
0.6287368421052631,1.2758818,0.5552,1.7120187
0.6270105263157894,1.2829372,0.4872000000000001,1.9630791
0.6313473684210527,1.2609128,0.5632,1.6049384
0.6374736842105263,1.2429903,0.5516,1.7101723
0.6342947368421055,1.2540665,0.5272,1.8112053
0.642778947368421,1.2098345,0.5692,1.5996393
0.6447368421052632,1.217454,0.5056,2.087292
0.6437052631578949,1.2123955,0.5660000000000001,1.6426488
0.6533263157894735,1.1804259,0.5672,1.6429158
0.6521052631578947,1.1856273,0.5316000000000001,1.8833923
0.658021052631579,1.1663536,0.5652,1.6239171
0.6622947368421054,1.1522906,0.5376000000000001,1.8352613
0.6543789473684212,1.1700194,0.5539999999999999,1.7920883
0.6664,1.1246897,0.5828,1.5657492
0.6645473684210526,1.1307288,0.5296,1.8285477
0.6647157894736843,1.1294464,0.5852,1.59438
0.6713473684210526,1.1020554,0.5647999999999999,1.6256377
0.6691368421052631,1.1129124,0.5224,1.9497899
0.6737684210526315,1.0941163,0.5708,1.5900868
0.6765473684210527,1.0844595,0.55,1.7522817
0.6762947368421053,1.0832069,0.5428000000000001,1.8020345
0.6799789473684209,1.0637755,0.5864,1.5690281
0.6808421052631578,1.066873,0.5168,1.9964217
0.6843157894736842,1.0618489,0.5720000000000001,1.6391727
0.6866736842105262,1.0432214,0.5731999999999999,1.6571078
0.6877684210526315,1.0442319,0.5192,2.0341485
0.6890105263157895,1.0338738,0.5836,1.5887364
0.693642105263158,1.0206536,0.5456,1.8537303
0.6905894736842106,1.0271776,0.5548000000000001,1.8022745
0.6981263157894737,1.001102,0.5852,1.5923084
0.6986105263157896,1.0052379,0.512,2.011443
0.698042105263158,0.9990784,0.5744,1.638558
0.7031578947368421,0.977477,0.5816,1.5790274
0.7013473684210526,0.98766434,0.5448000000000001,1.8414693
0.7069684210526315,0.9691622,0.59,1.5866013
0.7061894736842105,0.9620083,0.55,1.7695292
0.7050526315789474,0.9689725,0.5408,1.8329593
0.7101052631578948,0.95279986,0.5852,1.5835829
0.7122315789473684,0.9483001,0.5224,1.9749893
0.7115157894736842,0.94911486,0.5808,1.6965445
0.7166315789473684,0.9338312,0.5788,1.6249495
0.7120631578947368,0.9428737,0.5224,1.9721117
0.7197263157894737,0.92057914,0.5960000000000001,1.6235417
0.7258315789473684,0.9071854,0.528,2.0651033
0.7186947368421053,0.922529,0.5628,1.7508049
0.7257684210526316,0.9007169,0.5980000000000001,1.5797865
0.7254105263157896,0.89657074,0.5472,1.8673587
0.7229263157894736,0.90324384,0.5771999999999999,1.6998875
0.7308842105263157,0.8757633,0.5856,1.6750972
0.7254947368421052,0.8956531,0.5479999999999999,1.9809356
0.7302105263157894,0.8803156,0.5960000000000001,1.6343199
0.7353473684210525,0.8630421,0.56,1.9686066
0.732021052631579,0.8823739,0.5632,1.8139118
0.7324631578947367,0.8676047,0.5952000000000001,1.6235788
0.7366526315789473,0.85581774,0.5392,1.9346147
0.7340210526315789,0.8636227,0.5868,1.6743768
0.7416631578947368,0.84529686,0.5836,1.6691054
0.734757894736842,0.85352796,0.516,2.227477
0.7435368421052632,0.83374214,0.582,1.697568

View File

@ -1,2 +0,0 @@
test_acc,test_loss
0.6018000000000001,1.5933747
1 test_acc test_loss
2 0.6018000000000001 1.5933747

View File

@ -1,101 +0,0 @@
train_acc,train_loss,val_acc,val_loss
0.009600000000000001,4.609349,0.0104,4.6072426
0.009326315789473684,4.6068563,0.0092,4.606588
0.009747368421052631,4.6062207,0.0084,4.606326
0.009621052631578947,4.6059957,0.0076,4.6067405
0.009873684210526314,4.605887,0.0076,4.6068487
0.009136842105263157,4.605854,0.008,4.6074386
0.009536842105263158,4.605795,0.007200000000000001,4.6064863
0.009578947368421051,4.6057415,0.006400000000000001,4.6065035
0.009410526315789473,4.6058245,0.0076,4.606772
0.009094736842105263,4.6057224,0.007600000000000001,4.6064925
0.00911578947368421,4.605707,0.007200000000000001,4.6067533
0.009852631578947368,4.605685,0.007200000000000001,4.6068745
0.01031578947368421,4.6056952,0.0072,4.6067533
0.009789473684210527,4.6057863,0.0072,4.6070247
0.01031578947368421,4.6056023,0.0064,4.607134
0.010189473684210526,4.605698,0.0064,4.606934
0.009957894736842107,4.605643,0.006400000000000001,4.6068535
0.009452631578947369,4.605595,0.0064,4.6070676
0.009368421052631578,4.6057224,0.008,4.6070356
0.010210526315789474,4.6056094,0.009600000000000001,4.6070833
0.009557894736842105,4.6056895,0.0076,4.6069493
0.009600000000000001,4.605709,0.008400000000000001,4.60693
0.00985263157894737,4.6055284,0.0084,4.6068263
0.009200000000000002,4.60564,0.0076,4.6071053
0.009031578947368422,4.6056323,0.008400000000000001,4.606731
0.009663157894736842,4.60559,0.0068,4.6069546
0.008484210526315789,4.605676,0.009600000000000001,4.6063976
0.0096,4.605595,0.011200000000000002,4.6067076
0.00951578947368421,4.605619,0.0096,4.6068506
0.009242105263157895,4.6056657,0.0072,4.6067576
0.009326315789473684,4.6055913,0.012,4.6070724
0.01023157894736842,4.605646,0.012000000000000002,4.6066885
0.009494736842105262,4.605563,0.0072,4.6067305
0.009810526315789474,4.6055746,0.007200000000000001,4.6067824
0.010147368421052632,4.605596,0.0072,4.607214
0.009536842105263156,4.6055007,0.007200000000000001,4.607186
0.009452631578947369,4.605547,0.0072,4.607297
0.009578947368421055,4.6055694,0.0072,4.607313
0.009410526315789475,4.6055374,0.0072,4.60726
0.00985263157894737,4.605587,0.0072,4.6072307
0.009389473684210526,4.605559,0.0072,4.607227
0.009852631578947368,4.6055884,0.008,4.6070976
0.008968421052631579,4.6055803,0.008,4.607156
0.009536842105263158,4.605502,0.0076,4.6073594
0.009410526315789473,4.6055517,0.008,4.607176
0.01,4.6055126,0.006400000000000001,4.606937
0.009915789473684213,4.6055126,0.008,4.607185
0.009305263157894737,4.605594,0.0064,4.606834
0.009326315789473684,4.6054907,0.008,4.6070714
0.009094736842105263,4.6055007,0.0076,4.6068645
0.009052631578947368,4.6055903,0.008400000000000001,4.606755
0.010294736842105263,4.605449,0.008,4.6068816
0.009578947368421055,4.6054883,0.0064,4.6067166
0.009452631578947369,4.60552,0.01,4.6066008
0.008821052631578948,4.6054573,0.009600000000000001,4.6065955
0.008968421052631579,4.605544,0.008,4.6063676
0.010147368421052632,4.605516,0.0064,4.6068606
0.009600000000000001,4.6054597,0.0096,4.6072354
0.01008421052631579,4.605526,0.0076,4.6074166
0.010126315789473685,4.6054554,0.0076,4.6074657
0.009705263157894736,4.6054635,0.0088,4.607237
0.009726315789473684,4.605516,0.007200000000000001,4.606978
0.009894736842105262,4.6054883,0.0072,4.607135
0.009663157894736842,4.605501,0.007200000000000001,4.607015
0.00976842105263158,4.605536,0.008,4.6073785
0.009473684210526316,4.6055303,0.009600000000000001,4.6070166
0.009347368421052632,4.6054993,0.0076,4.607084
0.009178947368421054,4.6054535,0.0084,4.6070604
0.008842105263157892,4.605507,0.0076,4.6069884
0.009726315789473684,4.6055107,0.007599999999999999,4.6069903
0.009536842105263156,4.6054244,0.0084,4.6070695
0.009452631578947369,4.605474,0.0072,4.607035
0.009621052631578949,4.605444,0.0076,4.6071277
0.010084210526315791,4.6054263,0.0076,4.6071534
0.009326315789473686,4.605477,0.0088,4.607115
0.009010526315789472,4.60548,0.0076,4.6072206
0.010042105263157897,4.605475,0.0076,4.607185
0.00976842105263158,4.6054463,0.008400000000000001,4.6071196
0.01,4.605421,0.008,4.6069384
0.009536842105263156,4.605482,0.008,4.607035
0.009915789473684213,4.6054354,0.008,4.6071534
0.010042105263157894,4.6054177,0.007200000000000001,4.607074
0.009242105263157895,4.605473,0.0072,4.606825
0.009726315789473684,4.6054006,0.0072,4.606701
0.009684210526315788,4.6054583,0.0104,4.606925
0.009642105263157895,4.6054606,0.0104,4.6068645
0.00936842105263158,4.605405,0.0076,4.606976
0.009263157894736843,4.605455,0.0076,4.606981
0.00905263157894737,4.6054463,0.0092,4.6070757
0.009915789473684213,4.605465,0.0068000000000000005,4.607151
0.009389473684210526,4.605481,0.008400000000000001,4.606995
0.009789473684210527,4.605436,0.0068000000000000005,4.6071105
0.010273684210526315,4.605466,0.007200000000000001,4.606909
0.009789473684210527,4.605443,0.0072,4.6066866
0.009957894736842107,4.6053886,0.0076,4.606541
0.010168421052631578,4.605481,0.006400000000000001,4.606732
0.009242105263157894,4.605444,0.006400000000000001,4.606939
0.009621052631578949,4.6054454,0.008,4.606915
0.00976842105263158,4.60547,0.0076,4.6068935
0.009873684210526316,4.6055245,0.0064,4.6072345

View File

@ -1,2 +0,0 @@
test_acc,test_loss
0.01,4.6053004

View File

@ -1 +0,0 @@
Most reasonable LaTeX distributions should have no problem building the document from what is in the provided LaTeX source directory. However, some LaTeX distributions are missing certain files, so they are included in this directory. If you get an error message when building the LaTeX document saying one of these files is missing, move the relevant file into your LaTeX source directory.

View File

@ -1,79 +0,0 @@
% ALGORITHM STYLE -- Released 8 April 1996
% for LaTeX-2e
% Copyright -- 1994 Peter Williams
% E-mail Peter.Williams@dsto.defence.gov.au
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{algorithm}
\typeout{Document Style `algorithm' - floating environment}
\RequirePackage{float}
\RequirePackage{ifthen}
\newcommand{\ALG@within}{nothing}
\newboolean{ALG@within}
\setboolean{ALG@within}{false}
\newcommand{\ALG@floatstyle}{ruled}
\newcommand{\ALG@name}{Algorithm}
\newcommand{\listalgorithmname}{List of \ALG@name s}
% Declare Options
% first appearance
\DeclareOption{plain}{
\renewcommand{\ALG@floatstyle}{plain}
}
\DeclareOption{ruled}{
\renewcommand{\ALG@floatstyle}{ruled}
}
\DeclareOption{boxed}{
\renewcommand{\ALG@floatstyle}{boxed}
}
% then numbering convention
\DeclareOption{part}{
\renewcommand{\ALG@within}{part}
\setboolean{ALG@within}{true}
}
\DeclareOption{chapter}{
\renewcommand{\ALG@within}{chapter}
\setboolean{ALG@within}{true}
}
\DeclareOption{section}{
\renewcommand{\ALG@within}{section}
\setboolean{ALG@within}{true}
}
\DeclareOption{subsection}{
\renewcommand{\ALG@within}{subsection}
\setboolean{ALG@within}{true}
}
\DeclareOption{subsubsection}{
\renewcommand{\ALG@within}{subsubsection}
\setboolean{ALG@within}{true}
}
\DeclareOption{nothing}{
\renewcommand{\ALG@within}{nothing}
\setboolean{ALG@within}{true}
}
\DeclareOption*{\edef\ALG@name{\CurrentOption}}
% ALGORITHM
%
\ProcessOptions
\floatstyle{\ALG@floatstyle}
\ifthenelse{\boolean{ALG@within}}{
\ifthenelse{\equal{\ALG@within}{part}}
{\newfloat{algorithm}{htbp}{loa}[part]}{}
\ifthenelse{\equal{\ALG@within}{chapter}}
{\newfloat{algorithm}{htbp}{loa}[chapter]}{}
\ifthenelse{\equal{\ALG@within}{section}}
{\newfloat{algorithm}{htbp}{loa}[section]}{}
\ifthenelse{\equal{\ALG@within}{subsection}}
{\newfloat{algorithm}{htbp}{loa}[subsection]}{}
\ifthenelse{\equal{\ALG@within}{subsubsection}}
{\newfloat{algorithm}{htbp}{loa}[subsubsection]}{}
\ifthenelse{\equal{\ALG@within}{nothing}}
{\newfloat{algorithm}{htbp}{loa}}{}
}{
\newfloat{algorithm}{htbp}{loa}
}
\floatname{algorithm}{\ALG@name}
\newcommand{\listofalgorithms}{\listof{algorithm}{\listalgorithmname}}

View File

@ -1,201 +0,0 @@
% ALGORITHMIC STYLE -- Released 8 APRIL 1996
% for LaTeX version 2e
% Copyright -- 1994 Peter Williams
% E-mail PeterWilliams@dsto.defence.gov.au
%
% Modified by Alex Smola (08/2000)
% E-mail Alex.Smola@anu.edu.au
%
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{algorithmic}
\typeout{Document Style `algorithmic' - environment}
%
\RequirePackage{ifthen}
\RequirePackage{calc}
\newboolean{ALC@noend}
\setboolean{ALC@noend}{false}
\newcounter{ALC@line}
\newcounter{ALC@rem}
\newlength{\ALC@tlm}
%
\DeclareOption{noend}{\setboolean{ALC@noend}{true}}
%
\ProcessOptions
%
% ALGORITHMIC
\newcommand{\algorithmicrequire}{\textbf{Require:}}
\newcommand{\algorithmicensure}{\textbf{Ensure:}}
\newcommand{\algorithmiccomment}[1]{\{#1\}}
\newcommand{\algorithmicend}{\textbf{end}}
\newcommand{\algorithmicif}{\textbf{if}}
\newcommand{\algorithmicthen}{\textbf{then}}
\newcommand{\algorithmicelse}{\textbf{else}}
\newcommand{\algorithmicelsif}{\algorithmicelse\ \algorithmicif}
\newcommand{\algorithmicendif}{\algorithmicend\ \algorithmicif}
\newcommand{\algorithmicfor}{\textbf{for}}
\newcommand{\algorithmicforall}{\textbf{for all}}
\newcommand{\algorithmicdo}{\textbf{do}}
\newcommand{\algorithmicendfor}{\algorithmicend\ \algorithmicfor}
\newcommand{\algorithmicwhile}{\textbf{while}}
\newcommand{\algorithmicendwhile}{\algorithmicend\ \algorithmicwhile}
\newcommand{\algorithmicloop}{\textbf{loop}}
\newcommand{\algorithmicendloop}{\algorithmicend\ \algorithmicloop}
\newcommand{\algorithmicrepeat}{\textbf{repeat}}
\newcommand{\algorithmicuntil}{\textbf{until}}
%changed by alex smola
\newcommand{\algorithmicinput}{\textbf{input}}
\newcommand{\algorithmicoutput}{\textbf{output}}
\newcommand{\algorithmicset}{\textbf{set}}
\newcommand{\algorithmictrue}{\textbf{true}}
\newcommand{\algorithmicfalse}{\textbf{false}}
\newcommand{\algorithmicand}{\textbf{and\ }}
\newcommand{\algorithmicor}{\textbf{or\ }}
\newcommand{\algorithmicfunction}{\textbf{function}}
\newcommand{\algorithmicendfunction}{\algorithmicend\ \algorithmicfunction}
\newcommand{\algorithmicmain}{\textbf{main}}
\newcommand{\algorithmicendmain}{\algorithmicend\ \algorithmicmain}
%end changed by alex smola
\def\ALC@item[#1]{%
\if@noparitem \@donoparitem
\else \if@inlabel \indent \par \fi
\ifhmode \unskip\unskip \par \fi
\if@newlist \if@nobreak \@nbitem \else
\addpenalty\@beginparpenalty
\addvspace\@topsep \addvspace{-\parskip}\fi
\else \addpenalty\@itempenalty \addvspace\itemsep
\fi
\global\@inlabeltrue
\fi
\everypar{\global\@minipagefalse\global\@newlistfalse
\if@inlabel\global\@inlabelfalse \hskip -\parindent \box\@labels
\penalty\z@ \fi
\everypar{}}\global\@nobreakfalse
\if@noitemarg \@noitemargfalse \if@nmbrlist \refstepcounter{\@listctr}\fi \fi
\sbox\@tempboxa{\makelabel{#1}}%
\global\setbox\@labels
\hbox{\unhbox\@labels \hskip \itemindent
\hskip -\labelwidth \hskip -\ALC@tlm
\ifdim \wd\@tempboxa >\labelwidth
\box\@tempboxa
\else \hbox to\labelwidth {\unhbox\@tempboxa}\fi
\hskip \ALC@tlm}\ignorespaces}
%
\newenvironment{algorithmic}[1][0]{
\let\@item\ALC@item
\newcommand{\ALC@lno}{%
\ifthenelse{\equal{\arabic{ALC@rem}}{0}}
{{\footnotesize \arabic{ALC@line}:}}{}%
}
\let\@listii\@listi
\let\@listiii\@listi
\let\@listiv\@listi
\let\@listv\@listi
\let\@listvi\@listi
\let\@listvii\@listi
\newenvironment{ALC@g}{
\begin{list}{\ALC@lno}{ \itemsep\z@ \itemindent\z@
\listparindent\z@ \rightmargin\z@
\topsep\z@ \partopsep\z@ \parskip\z@\parsep\z@
\leftmargin 1em
\addtolength{\ALC@tlm}{\leftmargin}
}
}
{\end{list}}
\newcommand{\ALC@it}{\addtocounter{ALC@line}{1}\addtocounter{ALC@rem}{1}\ifthenelse{\equal{\arabic{ALC@rem}}{#1}}{\setcounter{ALC@rem}{0}}{}\item}
\newcommand{\ALC@com}[1]{\ifthenelse{\equal{##1}{default}}%
{}{\ \algorithmiccomment{##1}}}
\newcommand{\REQUIRE}{\item[\algorithmicrequire]}
\newcommand{\ENSURE}{\item[\algorithmicensure]}
\newcommand{\STATE}{\ALC@it}
\newcommand{\COMMENT}[1]{\algorithmiccomment{##1}}
%changes by alex smola
\newcommand{\INPUT}{\item[\algorithmicinput]}
\newcommand{\OUTPUT}{\item[\algorithmicoutput]}
\newcommand{\SET}{\item[\algorithmicset]}
% \newcommand{\TRUE}{\algorithmictrue}
% \newcommand{\FALSE}{\algorithmicfalse}
\newcommand{\AND}{\algorithmicand}
\newcommand{\OR}{\algorithmicor}
\newenvironment{ALC@func}{\begin{ALC@g}}{\end{ALC@g}}
\newenvironment{ALC@main}{\begin{ALC@g}}{\end{ALC@g}}
%end changes by alex smola
\newenvironment{ALC@if}{\begin{ALC@g}}{\end{ALC@g}}
\newenvironment{ALC@for}{\begin{ALC@g}}{\end{ALC@g}}
\newenvironment{ALC@whl}{\begin{ALC@g}}{\end{ALC@g}}
\newenvironment{ALC@loop}{\begin{ALC@g}}{\end{ALC@g}}
\newenvironment{ALC@rpt}{\begin{ALC@g}}{\end{ALC@g}}
\renewcommand{\\}{\@centercr}
\newcommand{\IF}[2][default]{\ALC@it\algorithmicif\ ##2\ \algorithmicthen%
\ALC@com{##1}\begin{ALC@if}}
\newcommand{\SHORTIF}[2]{\ALC@it\algorithmicif\ ##1\
\algorithmicthen\ {##2}}
\newcommand{\ELSE}[1][default]{\end{ALC@if}\ALC@it\algorithmicelse%
\ALC@com{##1}\begin{ALC@if}}
\newcommand{\ELSIF}[2][default]%
{\end{ALC@if}\ALC@it\algorithmicelsif\ ##2\ \algorithmicthen%
\ALC@com{##1}\begin{ALC@if}}
\newcommand{\FOR}[2][default]{\ALC@it\algorithmicfor\ ##2\ \algorithmicdo%
\ALC@com{##1}\begin{ALC@for}}
\newcommand{\FORALL}[2][default]{\ALC@it\algorithmicforall\ ##2\ %
\algorithmicdo%
\ALC@com{##1}\begin{ALC@for}}
\newcommand{\SHORTFORALL}[2]{\ALC@it\algorithmicforall\ ##1\ %
\algorithmicdo\ {##2}}
\newcommand{\WHILE}[2][default]{\ALC@it\algorithmicwhile\ ##2\ %
\algorithmicdo%
\ALC@com{##1}\begin{ALC@whl}}
\newcommand{\LOOP}[1][default]{\ALC@it\algorithmicloop%
\ALC@com{##1}\begin{ALC@loop}}
%changed by alex smola
\newcommand{\FUNCTION}[2][default]{\ALC@it\algorithmicfunction\ ##2\ %
\ALC@com{##1}\begin{ALC@func}}
\newcommand{\MAIN}[2][default]{\ALC@it\algorithmicmain\ ##2\ %
\ALC@com{##1}\begin{ALC@main}}
%end changed by alex smola
\newcommand{\REPEAT}[1][default]{\ALC@it\algorithmicrepeat%
\ALC@com{##1}\begin{ALC@rpt}}
\newcommand{\UNTIL}[1]{\end{ALC@rpt}\ALC@it\algorithmicuntil\ ##1}
\ifthenelse{\boolean{ALC@noend}}{
\newcommand{\ENDIF}{\end{ALC@if}}
\newcommand{\ENDFOR}{\end{ALC@for}}
\newcommand{\ENDWHILE}{\end{ALC@whl}}
\newcommand{\ENDLOOP}{\end{ALC@loop}}
\newcommand{\ENDFUNCTION}{\end{ALC@func}}
\newcommand{\ENDMAIN}{\end{ALC@main}}
}{
\newcommand{\ENDIF}{\end{ALC@if}\ALC@it\algorithmicendif}
\newcommand{\ENDFOR}{\end{ALC@for}\ALC@it\algorithmicendfor}
\newcommand{\ENDWHILE}{\end{ALC@whl}\ALC@it\algorithmicendwhile}
\newcommand{\ENDLOOP}{\end{ALC@loop}\ALC@it\algorithmicendloop}
\newcommand{\ENDFUNCTION}{\end{ALC@func}\ALC@it\algorithmicendfunction}
\newcommand{\ENDMAIN}{\end{ALC@main}\ALC@it\algorithmicendmain}
}
\renewcommand{\@toodeep}{}
\begin{list}{\ALC@lno}{\setcounter{ALC@line}{0}\setcounter{ALC@rem}{0}%
\itemsep\z@ \itemindent\z@ \listparindent\z@%
\partopsep\z@ \parskip\z@ \parsep\z@%
\labelsep 0.5em \topsep 0.2em%
\ifthenelse{\equal{#1}{0}}
{\labelwidth 0.5em }
{\labelwidth 1.2em }
\leftmargin\labelwidth \addtolength{\leftmargin}{\labelsep}
\ALC@tlm\labelsep
}
}
{\end{list}}

View File

@ -1,485 +0,0 @@
% fancyhdr.sty version 3.2
% Fancy headers and footers for LaTeX.
% Piet van Oostrum,
% Dept of Computer and Information Sciences, University of Utrecht,
% Padualaan 14, P.O. Box 80.089, 3508 TB Utrecht, The Netherlands
% Telephone: +31 30 2532180. Email: piet@cs.uu.nl
% ========================================================================
% LICENCE:
% This file may be distributed under the terms of the LaTeX Project Public
% License, as described in lppl.txt in the base LaTeX distribution.
% Either version 1 or, at your option, any later version.
% ========================================================================
% MODIFICATION HISTORY:
% Sep 16, 1994
% version 1.4: Correction for use with \reversemargin
% Sep 29, 1994:
% version 1.5: Added the \iftopfloat, \ifbotfloat and \iffloatpage commands
% Oct 4, 1994:
% version 1.6: Reset single spacing in headers/footers for use with
% setspace.sty or doublespace.sty
% Oct 4, 1994:
% version 1.7: changed \let\@mkboth\markboth to
% \def\@mkboth{\protect\markboth} to make it more robust
% Dec 5, 1994:
% version 1.8: corrections for amsbook/amsart: define \@chapapp and (more
% importantly) use the \chapter/sectionmark definitions from ps@headings if
% they exist (which should be true for all standard classes).
% May 31, 1995:
% version 1.9: The proposed \renewcommand{\headrulewidth}{\iffloatpage...
% construction in the doc did not work properly with the fancyplain style.
% June 1, 1995:
% version 1.91: The definition of \@mkboth wasn't restored on subsequent
% \pagestyle{fancy}'s.
% June 1, 1995:
% version 1.92: The sequence \pagestyle{fancyplain} \pagestyle{plain}
% \pagestyle{fancy} would erroneously select the plain version.
% June 1, 1995:
% version 1.93: \fancypagestyle command added.
% Dec 11, 1995:
% version 1.94: suggested by Conrad Hughes <chughes@maths.tcd.ie>
% CJCH, Dec 11, 1995: added \footruleskip to allow control over footrule
% position (old hardcoded value of .3\normalbaselineskip is far too high
% when used with very small footer fonts).
% Jan 31, 1996:
% version 1.95: call \@normalsize in the reset code if that is defined,
% otherwise \normalsize.
% this is to solve a problem with ucthesis.cls, as this doesn't
% define \@currsize. Unfortunately for latex209 calling \normalsize doesn't
% work as this is optimized to do very little, so there \@normalsize should
% be called. Hopefully this code works for all versions of LaTeX known to
% mankind.
% April 25, 1996:
% version 1.96: initialize \headwidth to a magic (negative) value to catch
% most common cases that people change it before calling \pagestyle{fancy}.
% Note it can't be initialized when reading in this file, because
% \textwidth could be changed afterwards. This is quite probable.
% We also switch to \MakeUppercase rather than \uppercase and introduce a
% \nouppercase command for use in headers. and footers.
% May 3, 1996:
% version 1.97: Two changes:
% 1. Undo the change in version 1.8 (using the pagestyle{headings} defaults
% for the chapter and section marks. The current version of amsbook and
% amsart classes don't seem to need them anymore. Moreover the standard
% latex classes don't use \markboth if twoside isn't selected, and this is
% confusing as \leftmark doesn't work as expected.
% 2. include a call to \ps@empty in ps@@fancy. This is to solve a problem
% in the amsbook and amsart classes, that make global changes to \topskip,
% which are reset in \ps@empty. Hopefully this doesn't break other things.
% May 7, 1996:
% version 1.98:
% Added % after the line \def\nouppercase
% May 7, 1996:
% version 1.99: This is the alpha version of fancyhdr 2.0
% Introduced the new commands \fancyhead, \fancyfoot, and \fancyhf.
% Changed \headrulewidth, \footrulewidth, \footruleskip to
% macros rather than length parameters, In this way they can be
% conditionalized and they don't consume length registers. There is no need
% to have them as length registers unless you want to do calculations with
% them, which is unlikely. Note that this may make some uses of them
% incompatible (i.e. if you have a file that uses \setlength or \xxxx=)
% May 10, 1996:
% version 1.99a:
% Added a few more % signs
% May 10, 1996:
% version 1.99b:
% Changed the syntax of \f@nfor to be resistent to catcode changes of :=
% Removed the [1] from the defs of \lhead etc. because the parameter is
% consumed by the \@[xy]lhead etc. macros.
% June 24, 1997:
% version 1.99c:
% corrected \nouppercase to also include the protected form of \MakeUppercase
% \global added to manipulation of \headwidth.
% \iffootnote command added.
% Some comments added about \@fancyhead and \@fancyfoot.
% Aug 24, 1998
% version 1.99d
% Changed the default \ps@empty to \ps@@empty in order to allow
% \fancypagestyle{empty} redefinition.
% Oct 11, 2000
% version 2.0
% Added LPPL license clause.
%
% A check for \headheight is added. An errormessage is given (once) if the
% header is too large. Empty headers don't generate the error even if
% \headheight is very small or even 0pt.
% Warning added for the use of 'E' option when twoside option is not used.
% In this case the 'E' fields will never be used.
%
% Mar 10, 2002
% version 2.1beta
% New command: \fancyhfoffset[place]{length}
% defines offsets to be applied to the header/footer to let it stick into
% the margins (if length > 0).
% place is like in fancyhead, except that only E,O,L,R can be used.
% This replaces the old calculation based on \headwidth and the marginpar
% area.
% \headwidth will be dynamically calculated in the headers/footers when
% this is used.
%
% Mar 26, 2002
% version 2.1beta2
% \fancyhfoffset now also takes h,f as possible letters in the argument to
% allow the header and footer widths to be different.
% New commands \fancyheadoffset and \fancyfootoffset added comparable to
% \fancyhead and \fancyfoot.
% Errormessages and warnings have been made more informative.
%
% Dec 9, 2002
% version 2.1
% The defaults for \footrulewidth, \plainheadrulewidth and
% \plainfootrulewidth are changed from \z@skip to 0pt. In this way when
% someone inadvertantly uses \setlength to change any of these, the value
% of \z@skip will not be changed, rather an errormessage will be given.
% March 3, 2004
% Release of version 3.0
% Oct 7, 2004
% version 3.1
% Added '\endlinechar=13' to \fancy@reset to prevent problems with
% includegraphics in header when verbatiminput is active.
% March 22, 2005
% version 3.2
% reset \everypar (the real one) in \fancy@reset because spanish.ldf does
% strange things with \everypar between << and >>.
\def\ifancy@mpty#1{\def\temp@a{#1}\ifx\temp@a\@empty}
\def\fancy@def#1#2{\ifancy@mpty{#2}\fancy@gbl\def#1{\leavevmode}\else
\fancy@gbl\def#1{#2\strut}\fi}
\let\fancy@gbl\global
\def\@fancyerrmsg#1{%
\ifx\PackageError\undefined
\errmessage{#1}\else
\PackageError{Fancyhdr}{#1}{}\fi}
\def\@fancywarning#1{%
\ifx\PackageWarning\undefined
\errmessage{#1}\else
\PackageWarning{Fancyhdr}{#1}{}\fi}
% Usage: \@forc \var{charstring}{command to be executed for each char}
% This is similar to LaTeX's \@tfor, but expands the charstring.
\def\@forc#1#2#3{\expandafter\f@rc\expandafter#1\expandafter{#2}{#3}}
\def\f@rc#1#2#3{\def\temp@ty{#2}\ifx\@empty\temp@ty\else
\f@@rc#1#2\f@@rc{#3}\fi}
\def\f@@rc#1#2#3\f@@rc#4{\def#1{#2}#4\f@rc#1{#3}{#4}}
% Usage: \f@nfor\name:=list\do{body}
% Like LaTeX's \@for but an empty list is treated as a list with an empty
% element
\newcommand{\f@nfor}[3]{\edef\@fortmp{#2}%
\expandafter\@forloop#2,\@nil,\@nil\@@#1{#3}}
% Usage: \def@ult \cs{defaults}{argument}
% sets \cs to the characters from defaults appearing in argument
% or defaults if it would be empty. All characters are lowercased.
\newcommand\def@ult[3]{%
\edef\temp@a{\lowercase{\edef\noexpand\temp@a{#3}}}\temp@a
\def#1{}%
\@forc\tmpf@ra{#2}%
{\expandafter\if@in\tmpf@ra\temp@a{\edef#1{#1\tmpf@ra}}{}}%
\ifx\@empty#1\def#1{#2}\fi}
%
% \if@in <char><set><truecase><falsecase>
%
\newcommand{\if@in}[4]{%
\edef\temp@a{#2}\def\temp@b##1#1##2\temp@b{\def\temp@b{##1}}%
\expandafter\temp@b#2#1\temp@b\ifx\temp@a\temp@b #4\else #3\fi}
\newcommand{\fancyhead}{\@ifnextchar[{\f@ncyhf\fancyhead h}%
{\f@ncyhf\fancyhead h[]}}
\newcommand{\fancyfoot}{\@ifnextchar[{\f@ncyhf\fancyfoot f}%
{\f@ncyhf\fancyfoot f[]}}
\newcommand{\fancyhf}{\@ifnextchar[{\f@ncyhf\fancyhf{}}%
{\f@ncyhf\fancyhf{}[]}}
% New commands for offsets added
\newcommand{\fancyheadoffset}{\@ifnextchar[{\f@ncyhfoffs\fancyheadoffset h}%
{\f@ncyhfoffs\fancyheadoffset h[]}}
\newcommand{\fancyfootoffset}{\@ifnextchar[{\f@ncyhfoffs\fancyfootoffset f}%
{\f@ncyhfoffs\fancyfootoffset f[]}}
\newcommand{\fancyhfoffset}{\@ifnextchar[{\f@ncyhfoffs\fancyhfoffset{}}%
{\f@ncyhfoffs\fancyhfoffset{}[]}}
% The header and footer fields are stored in command sequences with
% names of the form: \f@ncy<x><y><z> with <x> for [eo], <y> from [lcr]
% and <z> from [hf].
\def\f@ncyhf#1#2[#3]#4{%
\def\temp@c{}%
\@forc\tmpf@ra{#3}%
{\expandafter\if@in\tmpf@ra{eolcrhf,EOLCRHF}%
{}{\edef\temp@c{\temp@c\tmpf@ra}}}%
\ifx\@empty\temp@c\else
\@fancyerrmsg{Illegal char `\temp@c' in \string#1 argument:
[#3]}%
\fi
\f@nfor\temp@c{#3}%
{\def@ult\f@@@eo{eo}\temp@c
\if@twoside\else
\if\f@@@eo e\@fancywarning
{\string#1's `E' option without twoside option is useless}\fi\fi
\def@ult\f@@@lcr{lcr}\temp@c
\def@ult\f@@@hf{hf}{#2\temp@c}%
\@forc\f@@eo\f@@@eo
{\@forc\f@@lcr\f@@@lcr
{\@forc\f@@hf\f@@@hf
{\expandafter\fancy@def\csname
f@ncy\f@@eo\f@@lcr\f@@hf\endcsname
{#4}}}}}}
\def\f@ncyhfoffs#1#2[#3]#4{%
\def\temp@c{}%
\@forc\tmpf@ra{#3}%
{\expandafter\if@in\tmpf@ra{eolrhf,EOLRHF}%
{}{\edef\temp@c{\temp@c\tmpf@ra}}}%
\ifx\@empty\temp@c\else
\@fancyerrmsg{Illegal char `\temp@c' in \string#1 argument:
[#3]}%
\fi
\f@nfor\temp@c{#3}%
{\def@ult\f@@@eo{eo}\temp@c
\if@twoside\else
\if\f@@@eo e\@fancywarning
{\string#1's `E' option without twoside option is useless}\fi\fi
\def@ult\f@@@lcr{lr}\temp@c
\def@ult\f@@@hf{hf}{#2\temp@c}%
\@forc\f@@eo\f@@@eo
{\@forc\f@@lcr\f@@@lcr
{\@forc\f@@hf\f@@@hf
{\expandafter\setlength\csname
f@ncyO@\f@@eo\f@@lcr\f@@hf\endcsname
{#4}}}}}%
\fancy@setoffs}
% Fancyheadings version 1 commands. These are more or less deprecated,
% but they continue to work.
\newcommand{\lhead}{\@ifnextchar[{\@xlhead}{\@ylhead}}
\def\@xlhead[#1]#2{\fancy@def\f@ncyelh{#1}\fancy@def\f@ncyolh{#2}}
\def\@ylhead#1{\fancy@def\f@ncyelh{#1}\fancy@def\f@ncyolh{#1}}
\newcommand{\chead}{\@ifnextchar[{\@xchead}{\@ychead}}
\def\@xchead[#1]#2{\fancy@def\f@ncyech{#1}\fancy@def\f@ncyoch{#2}}
\def\@ychead#1{\fancy@def\f@ncyech{#1}\fancy@def\f@ncyoch{#1}}
\newcommand{\rhead}{\@ifnextchar[{\@xrhead}{\@yrhead}}
\def\@xrhead[#1]#2{\fancy@def\f@ncyerh{#1}\fancy@def\f@ncyorh{#2}}
\def\@yrhead#1{\fancy@def\f@ncyerh{#1}\fancy@def\f@ncyorh{#1}}
\newcommand{\lfoot}{\@ifnextchar[{\@xlfoot}{\@ylfoot}}
\def\@xlfoot[#1]#2{\fancy@def\f@ncyelf{#1}\fancy@def\f@ncyolf{#2}}
\def\@ylfoot#1{\fancy@def\f@ncyelf{#1}\fancy@def\f@ncyolf{#1}}
\newcommand{\cfoot}{\@ifnextchar[{\@xcfoot}{\@ycfoot}}
\def\@xcfoot[#1]#2{\fancy@def\f@ncyecf{#1}\fancy@def\f@ncyocf{#2}}
\def\@ycfoot#1{\fancy@def\f@ncyecf{#1}\fancy@def\f@ncyocf{#1}}
\newcommand{\rfoot}{\@ifnextchar[{\@xrfoot}{\@yrfoot}}
\def\@xrfoot[#1]#2{\fancy@def\f@ncyerf{#1}\fancy@def\f@ncyorf{#2}}
\def\@yrfoot#1{\fancy@def\f@ncyerf{#1}\fancy@def\f@ncyorf{#1}}
\newlength{\fancy@headwidth}
\let\headwidth\fancy@headwidth
\newlength{\f@ncyO@elh}
\newlength{\f@ncyO@erh}
\newlength{\f@ncyO@olh}
\newlength{\f@ncyO@orh}
\newlength{\f@ncyO@elf}
\newlength{\f@ncyO@erf}
\newlength{\f@ncyO@olf}
\newlength{\f@ncyO@orf}
\newcommand{\headrulewidth}{0.4pt}
\newcommand{\footrulewidth}{0pt}
\newcommand{\footruleskip}{.3\normalbaselineskip}
% Fancyplain stuff shouldn't be used anymore (rather
% \fancypagestyle{plain} should be used), but it must be present for
% compatibility reasons.
\newcommand{\plainheadrulewidth}{0pt}
\newcommand{\plainfootrulewidth}{0pt}
\newif\if@fancyplain \@fancyplainfalse
\def\fancyplain#1#2{\if@fancyplain#1\else#2\fi}
\headwidth=-123456789sp %magic constant
% Command to reset various things in the headers:
% a.o. single spacing (taken from setspace.sty)
% and the catcode of ^^M (so that epsf files in the header work if a
% verbatim crosses a page boundary)
% It also defines a \nouppercase command that disables \uppercase and
% \Makeuppercase. It can only be used in the headers and footers.
\let\fnch@everypar\everypar% save real \everypar because of spanish.ldf
\def\fancy@reset{\fnch@everypar{}\restorecr\endlinechar=13
\def\baselinestretch{1}%
\def\nouppercase##1{{\let\uppercase\relax\let\MakeUppercase\relax
\expandafter\let\csname MakeUppercase \endcsname\relax##1}}%
\ifx\undefined\@newbaseline% NFSS not present; 2.09 or 2e
\ifx\@normalsize\undefined \normalsize % for ucthesis.cls
\else \@normalsize \fi
\else% NFSS (2.09) present
\@newbaseline%
\fi}
% Initialization of the head and foot text.
% The default values still contain \fancyplain for compatibility.
\fancyhf{} % clear all
% lefthead empty on ``plain'' pages, \rightmark on even, \leftmark on odd pages
% evenhead empty on ``plain'' pages, \leftmark on even, \rightmark on odd pages
\if@twoside
\fancyhead[el,or]{\fancyplain{}{\sl\rightmark}}
\fancyhead[er,ol]{\fancyplain{}{\sl\leftmark}}
\else
\fancyhead[l]{\fancyplain{}{\sl\rightmark}}
\fancyhead[r]{\fancyplain{}{\sl\leftmark}}
\fi
\fancyfoot[c]{\rm\thepage} % page number
% Use box 0 as a temp box and dimen 0 as temp dimen.
% This can be done, because this code will always
% be used inside another box, and therefore the changes are local.
\def\@fancyvbox#1#2{\setbox0\vbox{#2}\ifdim\ht0>#1\@fancywarning
{\string#1 is too small (\the#1): ^^J Make it at least \the\ht0.^^J
We now make it that large for the rest of the document.^^J
This may cause the page layout to be inconsistent, however\@gobble}%
\dimen0=#1\global\setlength{#1}{\ht0}\ht0=\dimen0\fi
\box0}
% Put together a header or footer given the left, center and
% right text, fillers at left and right and a rule.
% The \lap commands put the text into an hbox of zero size,
% so overlapping text does not generate an errormessage.
% These macros have 5 parameters:
% 1. LEFTSIDE BEARING % This determines at which side the header will stick
% out. When \fancyhfoffset is used this calculates \headwidth, otherwise
% it is \hss or \relax (after expansion).
% 2. \f@ncyolh, \f@ncyelh, \f@ncyolf or \f@ncyelf. This is the left component.
% 3. \f@ncyoch, \f@ncyech, \f@ncyocf or \f@ncyecf. This is the middle comp.
% 4. \f@ncyorh, \f@ncyerh, \f@ncyorf or \f@ncyerf. This is the right component.
% 5. RIGHTSIDE BEARING. This is always \relax or \hss (after expansion).
\def\@fancyhead#1#2#3#4#5{#1\hbox to\headwidth{\fancy@reset
\@fancyvbox\headheight{\hbox
{\rlap{\parbox[b]{\headwidth}{\raggedright#2}}\hfill
\parbox[b]{\headwidth}{\centering#3}\hfill
\llap{\parbox[b]{\headwidth}{\raggedleft#4}}}\headrule}}#5}
\def\@fancyfoot#1#2#3#4#5{#1\hbox to\headwidth{\fancy@reset
\@fancyvbox\footskip{\footrule
\hbox{\rlap{\parbox[t]{\headwidth}{\raggedright#2}}\hfill
\parbox[t]{\headwidth}{\centering#3}\hfill
\llap{\parbox[t]{\headwidth}{\raggedleft#4}}}}}#5}
\def\headrule{{\if@fancyplain\let\headrulewidth\plainheadrulewidth\fi
\hrule\@height\headrulewidth\@width\headwidth \vskip-\headrulewidth}}
\def\footrule{{\if@fancyplain\let\footrulewidth\plainfootrulewidth\fi
\vskip-\footruleskip\vskip-\footrulewidth
\hrule\@width\headwidth\@height\footrulewidth\vskip\footruleskip}}
\def\ps@fancy{%
\@ifundefined{@chapapp}{\let\@chapapp\chaptername}{}%for amsbook
%
% Define \MakeUppercase for old LaTeXen.
% Note: we used \def rather than \let, so that \let\uppercase\relax (from
% the version 1 documentation) will still work.
%
\@ifundefined{MakeUppercase}{\def\MakeUppercase{\uppercase}}{}%
\@ifundefined{chapter}{\def\sectionmark##1{\markboth
{\MakeUppercase{\ifnum \c@secnumdepth>\z@
\thesection\hskip 1em\relax \fi ##1}}{}}%
\def\subsectionmark##1{\markright {\ifnum \c@secnumdepth >\@ne
\thesubsection\hskip 1em\relax \fi ##1}}}%
{\def\chaptermark##1{\markboth {\MakeUppercase{\ifnum \c@secnumdepth>\m@ne
\@chapapp\ \thechapter. \ \fi ##1}}{}}%
\def\sectionmark##1{\markright{\MakeUppercase{\ifnum \c@secnumdepth >\z@
\thesection. \ \fi ##1}}}}%
%\csname ps@headings\endcsname % use \ps@headings defaults if they exist
\ps@@fancy
\gdef\ps@fancy{\@fancyplainfalse\ps@@fancy}%
% Initialize \headwidth if the user didn't
%
\ifdim\headwidth<0sp
%
% This catches the case that \headwidth hasn't been initialized and the
% case that the user added something to \headwidth in the expectation that
% it was initialized to \textwidth. We compensate this now. This loses if
% the user intended to multiply it by a factor. But that case is more
% likely done by saying something like \headwidth=1.2\textwidth.
% The doc says you have to change \headwidth after the first call to
% \pagestyle{fancy}. This code is just to catch the most common cases were
% that requirement is violated.
%
\global\advance\headwidth123456789sp\global\advance\headwidth\textwidth
\fi}
\def\ps@fancyplain{\ps@fancy \let\ps@plain\ps@plain@fancy}
\def\ps@plain@fancy{\@fancyplaintrue\ps@@fancy}
\let\ps@@empty\ps@empty
\def\ps@@fancy{%
\ps@@empty % This is for amsbook/amsart, which do strange things with \topskip
\def\@mkboth{\protect\markboth}%
\def\@oddhead{\@fancyhead\fancy@Oolh\f@ncyolh\f@ncyoch\f@ncyorh\fancy@Oorh}%
\def\@oddfoot{\@fancyfoot\fancy@Oolf\f@ncyolf\f@ncyocf\f@ncyorf\fancy@Oorf}%
\def\@evenhead{\@fancyhead\fancy@Oelh\f@ncyelh\f@ncyech\f@ncyerh\fancy@Oerh}%
\def\@evenfoot{\@fancyfoot\fancy@Oelf\f@ncyelf\f@ncyecf\f@ncyerf\fancy@Oerf}%
}
% Default definitions for compatibility mode:
% These cause the header/footer to take the defined \headwidth as width
% And to shift in the direction of the marginpar area
\def\fancy@Oolh{\if@reversemargin\hss\else\relax\fi}
\def\fancy@Oorh{\if@reversemargin\relax\else\hss\fi}
\let\fancy@Oelh\fancy@Oorh
\let\fancy@Oerh\fancy@Oolh
\let\fancy@Oolf\fancy@Oolh
\let\fancy@Oorf\fancy@Oorh
\let\fancy@Oelf\fancy@Oelh
\let\fancy@Oerf\fancy@Oerh
% New definitions for the use of \fancyhfoffset
% These calculate the \headwidth from \textwidth and the specified offsets.
\def\fancy@offsolh{\headwidth=\textwidth\advance\headwidth\f@ncyO@olh
\advance\headwidth\f@ncyO@orh\hskip-\f@ncyO@olh}
\def\fancy@offselh{\headwidth=\textwidth\advance\headwidth\f@ncyO@elh
\advance\headwidth\f@ncyO@erh\hskip-\f@ncyO@elh}
\def\fancy@offsolf{\headwidth=\textwidth\advance\headwidth\f@ncyO@olf
\advance\headwidth\f@ncyO@orf\hskip-\f@ncyO@olf}
\def\fancy@offself{\headwidth=\textwidth\advance\headwidth\f@ncyO@elf
\advance\headwidth\f@ncyO@erf\hskip-\f@ncyO@elf}
\def\fancy@setoffs{%
% Just in case \let\headwidth\textwidth was used
\fancy@gbl\let\headwidth\fancy@headwidth
\fancy@gbl\let\fancy@Oolh\fancy@offsolh
\fancy@gbl\let\fancy@Oelh\fancy@offselh
\fancy@gbl\let\fancy@Oorh\hss
\fancy@gbl\let\fancy@Oerh\hss
\fancy@gbl\let\fancy@Oolf\fancy@offsolf
\fancy@gbl\let\fancy@Oelf\fancy@offself
\fancy@gbl\let\fancy@Oorf\hss
\fancy@gbl\let\fancy@Oerf\hss}
\newif\iffootnote
\let\latex@makecol\@makecol
\def\@makecol{\ifvoid\footins\footnotetrue\else\footnotefalse\fi
\let\topfloat\@toplist\let\botfloat\@botlist\latex@makecol}
\def\iftopfloat#1#2{\ifx\topfloat\empty #2\else #1\fi}
\def\ifbotfloat#1#2{\ifx\botfloat\empty #2\else #1\fi}
\def\iffloatpage#1#2{\if@fcolmade #1\else #2\fi}
\newcommand{\fancypagestyle}[2]{%
\@namedef{ps@#1}{\let\fancy@gbl\relax#2\relax\ps@fancy}}

File diff suppressed because it is too large

Binary file not shown.

File diff suppressed because it is too large


@ -1,176 +0,0 @@
%% REPLACE sXXXXXXX with your student number
\def\studentNumber{s2759177}
%% START of YOUR ANSWERS
%% Add answers to the questions below, by replacing the text inside the brackets {} for \youranswer{ "Text to be replaced with your answer." }.
%
% Do not delete the commands for adding figures and tables. Instead fill in the missing values with your experiment results, and replace the images with your own respective figures.
%
% You can generally delete the placeholder text, such as for example the text "Question Figure 3 - Replace the images ..."
%
% There are 5 TEXT QUESTIONS. Replace the text inside the brackets of the command \youranswer with your answer to the question.
%
% There are also 3 "questions" to replace some placeholder FIGURES with your own, and 1 "question" asking you to fill in the missing entries in the TABLE provided.
%
% NOTE! that questions are ordered by the order of appearance of their answers in the text, and not necessarily by the order you should tackle them. You should attempt to fill in the TABLE and FIGURES before discussing the results presented there.
%
% NOTE! If for some reason you do not manage to produce results for some FIGURES and the TABLE, then you can get partial marks by discussing your expectations of the results in the relevant TEXT QUESTIONS. The TABLE specifically has enough information in it already for you to draw meaningful conclusions.
%
% Please refer to the coursework specification for more details.
%% - - - - - - - - - - - - TEXT QUESTIONS - - - - - - - - - - - -
%% Question 1:
% Use Figures 1, 2, and 3 to identify the Vanishing Gradient Problem (which of these model suffers from it, and what are the consequences depicted?).
% The average length for an answer to this question is approximately 1/5 of the columns in a 2-column page}
\newcommand{\questionOne} {
\youranswer{
We can observe the 8-layer network learning (even though it does not achieve high accuracy), whereas the 38-layer network fails to learn, as its gradients vanish almost entirely in the earlier layers. This is evident in Figure 3, where the gradients in VGG38 are close to zero for all but the last few layers, preventing effective weight updates during backpropagation. Consequently, the deeper network is unable to extract meaningful features or minimize its loss, leading to stagnation in both training and validation performance.
We conclude that VGG08 performs nominally during training, while VGG38 suffers from the vanishing gradient problem: its gradients diminish to near-zero in the early layers, impeding effective weight updates and preventing the network from learning meaningful features. This limitation nullifies the advantages of its deeper architecture, as reflected in its stagnant loss and accuracy throughout training. This is in stark contrast to VGG08, which maintains a healthy gradient flow across layers, allowing effective weight updates and enabling the network to learn features, reduce loss, and improve accuracy despite its smaller depth.
}
}
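The mechanism behind this answer can be illustrated with a toy calculation (a sketch, not part of the coursework code: scalar layers, unit weights, and sigmoid activations are simplifying assumptions). Each layer multiplies the backpropagated gradient by the local derivative, so the product shrinks geometrically with depth:

```python
import numpy as np

def backprop_gain(depth, seed=0):
    """Magnitude of a gradient backpropagated through `depth` scalar
    sigmoid layers with unit weights: each layer multiplies the gradient
    by sigmoid'(z) <= 0.25, so the product shrinks geometrically."""
    rng = np.random.default_rng(seed)
    g = 1.0
    for _ in range(depth):
        z = rng.standard_normal()
        s = 1.0 / (1.0 + np.exp(-z))
        g *= s * (1.0 - s)  # derivative of the sigmoid at z
    return g

gain_08 = backprop_gain(8)   # analogue of an 8-layer stack
gain_38 = backprop_gain(38)  # analogue of a 38-layer stack
```

With 38 factors each at most 0.25, the surviving gradient is vanishingly small; keeping this per-layer factor close to one is precisely what the BN and RC modifications aim for.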
%% Question 2:
% Consider these results (including Figure 1 from \cite{he2016deep}). Discuss the relation between network capacity and overfitting, and whether, and how, this is reflected on these results. What other factors may have lead to this difference in performance?
% The average length for an answer to this question is approximately 1/5 of the columns in a 2-column page
\newcommand{\questionTwo} {
\youranswer{Our results thus corroborate that increasing network depth can lead to higher training and testing errors, as seen in the comparison between VGG08 and VGG38. While deeper networks, like VGG38, have a larger capacity to learn complex features, they may struggle to generalize effectively, resulting in overfitting and poor performance on unseen data. This is consistent with the behaviour observed in Figure 1 from \cite{he2016deep}, where the 56-layer network exhibits higher training error and, consequently, higher test error compared to the 20-layer network.
Our results suggest that the increased capacity of VGG38 does not translate into better generalization, likely due to the vanishing gradient problem, which hinders learning in deeper networks. Other factors, such as inadequate regularization or insufficient data augmentation, could also contribute to the observed performance difference, leading to overfitting in deeper architectures.}
}
%% Question 3:
% In this coursework, we didn't incorporate residual connections to the downsampling layers. Explain and justify what would need to be changed in order to add residual connections to the downsampling layers. Give and explain 2 ways of incorporating these changes and discuss pros and cons of each.
\newcommand{\questionThree} {
\youranswer{
Our work does not incorporate residual connections across the downsampling layers, as downsampling creates a dimensional mismatch between the input and output feature maps. To add residual connections, one approach is to use a $1\times 1$ convolution on the shortcut, with stride and padding matched to the downsampling operation, so that the input is projected to the same shape as the output. Another approach is to apply average pooling or max pooling directly on the shortcut to match the spatial dimensions of the output, followed by a linear transformation to align the channel dimensions.
The difference between these two methods is that the first approach using a $1\times 1$ convolution provides more flexibility by learning the transformation, which can enhance model expressiveness but increases computational cost, whereas the second approach with pooling is computationally cheaper and simpler but may lose fine-grained information due to the fixed, non-learnable nature of pooling operations.
}
}
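The two shortcut options described in this answer can be sketched in a few lines of NumPy (an illustrative sketch, not the coursework implementation; the matrix `w` stands in for learnable projection weights, and a stride-2, channel-doubling downsampling step is assumed):

```python
import numpy as np

def conv1x1_stride2(x, w):
    # Option 1: 1x1 convolution with stride 2 on the shortcut.
    # x: (C_in, H, W), w: (C_out, C_in). Subsample spatially, mix channels.
    xs = x[:, ::2, ::2]                     # (C_in, H/2, W/2)
    return np.einsum('oc,chw->ohw', w, xs)  # (C_out, H/2, W/2)

def avgpool_shortcut(x, w):
    # Option 2: 2x2 average pooling, then a channel projection.
    c, h, wd = x.shape
    pooled = x.reshape(c, h // 2, 2, wd // 2, 2).mean(axis=(2, 4))
    return np.einsum('oc,chw->ohw', w, pooled)

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 32, 32))  # input feature map
y = rng.standard_normal((32, 16, 16))  # output of the downsampling branch
w = rng.standard_normal((32, 16))      # hypothetical projection weights

out_a = y + conv1x1_stride2(x, w)      # learnable projection shortcut
out_b = y + avgpool_shortcut(x, w)     # pooling-based shortcut
```

Both variants produce a shortcut of the right shape to add to the branch output; the trade-off discussed above is that the strided convolution keeps only one pixel per 2x2 window but learns the transform end to end, while pooling averages the window at lower cost.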
%% Question 4:
% Question 4 - Present and discuss the experiment results (all of the results and not just the ones you had to fill in) in Table 1 and Figures 4 and 5 (you may use any of the other Figures if you think they are relevant to your analysis). You will have to determine what data are relevant to the discussion, and what information can be extracted from it. Also, discuss what further experiments you would have ran on any combination of VGG08, VGG38, BN, RC in order to
% \begin{itemize}
% \item Improve performance of the model trained (explain why you expect your suggested experiments will help with this).
% \item Learn more about the behaviour of BN and RC (explain what you are trying to learn and how).
% \end{itemize}
%
% The average length for an answer to this question is approximately 1 of the columns in a 2-column page
\newcommand{\questionFour} {
\youranswer{
Our results demonstrate the effectiveness of batch normalization and residual connections, as proposed by \cite{ioffe2015batch} and \cite{he2016deep} respectively, in enabling effective training of deep convolutional networks, as shown by the significant improvement in training and validation performance for VGG38 when incorporating these techniques. Table~\ref{tab:CIFAR_results} highlights that adding BN alone (VGG38 BN) reduces both training and validation losses compared to the baseline VGG38, with validation accuracy increasing from near-zero to $47.68\%$ at a learning rate (LR) of $1\mathrm{e}{-3}$. Adding RC further enhances performance, as seen in VGG38 RC achieving $52.32\%$ validation accuracy under the same conditions. The combination of BN and RC (VGG38 BN + RC) yields the best results, achieving $53.76\%$ validation accuracy with LR $1\mathrm{e}{-3}$. BN + RC also benefits greatly from a higher learning rate, improving further to $58.20\%$ at LR $1\mathrm{e}{-2}$. BN alone, however, deteriorates at higher learning rates, as evidenced by lower validation accuracy, emphasizing the stabilizing role of RC. \autoref{fig:training_curves_bestModel} confirms the synergy of BN and RC, with the VGG38 BN + RC model reaching $74\%$ training accuracy and plateauing near $60\%$ validation accuracy. \autoref{fig:avg_grad_flow_bestModel} illustrates stable gradient flow, with BN mitigating vanishing gradients and RC maintaining gradient propagation through deeper layers, particularly in the later stages of the network.
While this work did not evaluate residual connections on downsampling layers, a thorough evaluation of both methods put forth earlier would be required to complete the picture, highlighting how exactly residual connections in downsampling layers affect gradient flow, feature learning, and overall performance. Such an evaluation would clarify whether the additional computational cost of using $1\times 1$ convolutions for matching dimensions is justified by improved accuracy or if the simpler pooling-based approach suffices, particularly for tasks where computational efficiency is crucial.
}
}
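Since the table's gains hinge on BN, a minimal sketch of the training-time batch-normalization transform may help (a NumPy illustration with per-feature batch statistics; `gamma` and `beta` are the learnable scale and shift, and the toy shapes are assumptions):

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    # x: (batch, features). Normalize each feature over the batch,
    # then apply the learnable affine transform gamma * x_hat + beta.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(1)
x = rng.standard_normal((64, 8)) * 5.0 + 3.0  # badly scaled activations
y = batchnorm_forward(x, gamma=np.ones(8), beta=np.zeros(8))
```

Renormalizing the activations at every layer in this way keeps their scale (and hence the backpropagated gradients) stable, which is one reading of why the BN variants in the table train where plain VGG38 stalls.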
%% Question 5:
% Briefly draw your conclusions based on the results from the previous sections (what are the take-away messages?) and conclude your report with a recommendation for future work.
%
% Good recommendations for future work also draw on the broader literature (the papers already referenced are good starting points). Great recommendations for future work are not just incremental (an example of an incremental suggestion would be: ``we could also train with different learning rates'') but instead also identify meaningful questions or, in other words, questions with answers that might be somewhat more generally applicable.
%
% For example, \citep{huang2017densely} end with \begin{quote}``Because of their compact internal representations and reduced feature redundancy, DenseNets may be good feature extractors for various computer vision tasks that build on convolutional features, e.g., [4,5].''\end{quote}
%
% while \cite{bengio1993problem} state in their conclusions that \begin{quote}``There remains theoretical questions to be considered, such as whether the problem with simple gradient descent discussed in this paper would be observed with chaotic attractors that are not hyperbolic.''\\\end{quote}
%
% The length of this question description is indicative of the average length of a conclusion section
\newcommand{\questionFive} {
\youranswer{
The results presented showcase a clear solution to the vanishing gradient problem. With batch normalization and residual connections, we are able to train much deeper neural networks effectively, as evidenced by the improved performance of VGG38 with these modifications. The combination of BN and RC not only stabilizes gradient flow but also enhances both training and validation accuracy, particularly when paired with an appropriate learning rate. These findings reinforce the utility of architectural innovations like those proposed in \cite{he2016deep} and \cite{ioffe2015batch}, which have become foundational in modern deep learning.
While these methods appear to enable training of deeper neural networks, the critical question of how these architectural enhancements generalize across different datasets and tasks remains open. Future work could investigate the effectiveness of BN and RC in scenarios involving large-scale datasets, such as ImageNet, or in domains like natural language processing and generative models, where deep architectures also face optimization challenges. Additionally, exploring the interplay between residual connections and emerging techniques like attention mechanisms \citep{vaswani2017attention} might uncover further synergies. Beyond this, understanding the theoretical underpinnings of how residual connections influence optimization landscapes and gradient flow could yield insights applicable to designing novel architectures.}
}
%% - - - - - - - - - - - - FIGURES - - - - - - - - - - - -
%% Question Figure 3:
\newcommand{\questionFigureThree} {
% Question Figure 3 - Replace this image with a figure depicting the average gradient across layers, for the VGG38 model.
%\textit{(The provided figure is correct, and can be used in your analysis. It is partially obscured so you can get credit for producing your own copy).}
\youranswer{
\begin{figure}[t]
\centering
\includegraphics[width=\linewidth]{figures/gradplot_38.pdf}
\caption{Gradient Flow on VGG38}
\label{fig:avg_grad_flow_38}
\end{figure}
}
}
%% Question Figure 4:
% Question Figure 4 - Replace this image with a figure depicting the training curves for the model with the best performance \textit{across experiments you have available (you don't need to run the experiments for the models we already give you results for)}. Edit the caption so that it clearly identifies the model and what is depicted.
\newcommand{\questionFigureFour} {
\youranswer{
\begin{figure}[t]
\begin{subfigure}{\linewidth}
\centering
\includegraphics[width=\linewidth]{figures/VGG38_BN_RC_loss_performance.pdf}
\caption{Cross entropy error per epoch}
\label{fig:vgg38_loss_curves}
\end{subfigure}
\begin{subfigure}{\linewidth}
\centering
\includegraphics[width=\linewidth]{figures/VGG38_BN_RC_accuracy_performance.pdf}
\caption{Classification accuracy per epoch}
\label{fig:vgg38_acc_curves}
\end{subfigure}
\caption{Training curves for the 38 layer CNN with batch normalization and residual connections, trained with LR of $0.01$}
\label{fig:training_curves_bestModel}
\end{figure}
}
}
%% Question Figure 5:
% Question Figure 5 - Replace this image with a figure depicting the average gradient across layers, for the model with the best performance \textit{across experiments you have available (you don't need to run the experiments for the models we already give you results for)}. Edit the caption so that it clearly identifies the model and what is depicted.
\newcommand{\questionFigureFive} {
\youranswer{
\begin{figure}[t]
\centering
\includegraphics[width=\linewidth]{figures/gradplot_38_bn_rc.pdf}
\caption{Gradient Flow for the 38 layer CNN with batch normalization and residual connections, trained with LR of $0.01$}
\label{fig:avg_grad_flow_bestModel}
\end{figure}
}
}
%% - - - - - - - - - - - - TABLES - - - - - - - - - - - -
%% Question Table 1:
% Question Table 1 - Fill in Table 1 with the results from your experiments on
% \begin{enumerate}
% \item \textit{VGG38 BN (LR 1e-3)}, and
% \item \textit{VGG38 BN + RC (LR 1e-2)}.
% \end{enumerate}
\newcommand{\questionTableOne} {
\youranswer{
%
\begin{table*}[t]
\centering
\begin{tabular}{lr|ccccc}
\toprule
Model & LR & \# Params & Train loss & Train acc & Val loss & Val acc \\
\midrule
VGG08 & 1e-3 & 60 K & 1.74 & 51.59 & 1.95 & 46.84 \\
VGG38 & 1e-3 & 336 K & 4.61 & 00.01 & 4.61 & 00.01 \\
VGG38 BN & 1e-3 & 339 K & 1.76 & 50.62 & 1.95 & 47.68 \\
VGG38 RC & 1e-3 & 336 K & 1.33 & 61.52 & 1.84 & 52.32 \\
VGG38 BN + RC & 1e-3 & 339 K & 1.26 & 62.99 & 1.73 & 53.76 \\
VGG38 BN & 1e-2 & 339 K & 1.70 & 52.28 & 1.99 & 46.72 \\
VGG38 BN + RC & 1e-2 & 339 K & 0.83 & 74.35 & 1.70 & 58.20 \\
\bottomrule
\end{tabular}
\caption{Experiment results (number of model parameters, training and validation loss and accuracy) for different combinations of VGG08, VGG38, Batch Normalisation (BN), and Residual Connections (RC); LR is the learning rate.}
\label{tab:CIFAR_results}
\end{table*}
}
}
%% END of YOUR ANSWERS


@ -1,314 +0,0 @@
%% Template for MLP Coursework 2 / 13 November 2023
%% Based on LaTeX template for ICML 2017 - example_paper.tex at
%% https://2017.icml.cc/Conferences/2017/StyleAuthorInstructions
\documentclass{article}
\input{mlp2022_includes}
\definecolor{red}{rgb}{0.95,0.4,0.4}
\definecolor{blue}{rgb}{0.4,0.4,0.95}
\definecolor{orange}{rgb}{1, 0.65, 0}
\newcommand{\youranswer}[1]{{\color{red} \bf[#1]}} %your answer:
%% START of YOUR ANSWERS
\input{mlp-cw2-questions}
%% END of YOUR ANSWERS
%% Do not change anything in this file. Add your answers to mlp-cw1-questions.tex
\begin{document}
\twocolumn[
\mlptitle{MLP Coursework 2}
\centerline{\studentNumber}
\vskip 7mm
]
\begin{abstract}
Deep neural networks have become the state-of-the-art
in many standard computer vision problems thanks to their powerful
representations and the availability of large labeled datasets.
While very deep networks allow for learning more levels of abstraction from the data, training these models successfully is a challenging task due to problematic gradient flow through the layers, known as the vanishing/exploding gradient problem.
In this report, we first analyze this problem in VGG models with 8 and 38 hidden layers on the CIFAR100 image dataset, by monitoring the gradient flow during training.
We explore known solutions to this problem including batch normalization or residual connections, and explain their theory and implementation details.
Our experiments show that batch normalization and residual connections effectively address the aforementioned problem and hence enable a deeper model to outperform shallower ones in the same experimental setup.
\end{abstract}
\section{Introduction}
\label{sec:intro}
Despite the remarkable progress of modern convolutional neural networks (CNNs) in image classification problems~\cite{simonyan2014very, he2016deep}, training very deep networks is a challenging procedure.
One of the major problems is the Vanishing Gradient Problem (VGP), a phenomenon where the gradients of the error function with respect to network weights shrink to zero, as they backpropagate to earlier layers, hence preventing effective weight updates.
This phenomenon is prevalent and has been extensively studied in various deep neural networks including feedforward networks~\cite{glorot2010understanding}, RNNs~\cite{bengio1993problem}, and CNNs~\cite{he2016deep}.
Multiple solutions have been proposed to mitigate this problem by using weight initialization strategies~\cite{glorot2010understanding},
activation functions~\cite{glorot2010understanding}, input normalization~\cite{bishop1995neural},
batch normalization~\cite{ioffe2015batch}, and shortcut connections \cite{he2016deep, huang2017densely}.
This report focuses on diagnosing the VGP occurring in the VGG38 model\footnote{VGG stands for the Visual Geometry Group at the University of Oxford.} and addressing it by implementing two standard solutions.
In particular, we first study a ``broken'' network in terms of its gradient flow, i.e.\ the L1 norm of the gradients with respect to the weights of each layer, and contrast it with the healthy, shallower VGG08 to pinpoint the problem.
Next, we review two standard solutions for this problem, batch normalization (BN)~\cite{ioffe2015batch} and residual connections (RC)~\cite{he2016deep} in detail and discuss how they can address the gradient problem.
We first incorporate batch normalization (denoted as VGG38+BN), residual connections (denoted as VGG38+RC), and their combination (denoted as VGG38+BN+RC) into the given VGG38 architecture.
We train the resulting three configurations, along with the VGG08 and VGG38 models, on the CIFAR100 (pronounced `see far 100') dataset and present the results.
The results show that although BN or RC alone mitigates the vanishing/exploding gradient problem, thereby enabling effective training of the VGG38 model, the best results are obtained by combining the two.
%
\section{Identifying training problems of a deep CNN}
\label{sec:task1}
\begin{figure}[t]
\begin{subfigure}{\linewidth}
\centering
\includegraphics[width=\linewidth]{figures/loss_plot.pdf}
\caption{Cross entropy error per epoch}
\label{fig:loss_curves}
\end{subfigure}
\begin{subfigure}{\linewidth}
\centering
\includegraphics[width=\linewidth]{figures/accuracy_plot.pdf}
\caption{Classification accuracy per epoch}
\label{fig:acc_curves}
\end{subfigure}
\caption{Training curves for VGG08 and VGG38 in terms of (a) cross-entropy error and (b) classification accuracy}
\label{fig:curves}
\end{figure}
\begin{figure}[t]
\centering
\includegraphics[width=\linewidth]{figures/grad_flow_vgg08.pdf}
\caption{Gradient flow on VGG08}
\label{fig:grad_flow_08}
\end{figure}
\questionFigureThree
Concretely, training deep neural networks typically involves three steps: forward
pass, backward pass (or backpropagation algorithm~\cite{rumelhart1986learning}) and weight update.
The first step involves passing the input $\bx^{(0)}$ to the network and producing
the network prediction and also the error value.
In detail, each layer takes in the output of the previous layer and applies
a non-linear transformation:
\begin{equation}
\label{eq.fprop}
\bx^{(l)} = f^{(l)}(\bx^{(l-1)}; W^{(l)})
\end{equation}
where $(l)$ denotes the $l$-th layer in $L$ layer deep network,
$f^{(l)}(\cdot,W^{(l)})$ is a non-linear transformation for layer $l$, and $W^{(l)}$ are the weights of layer $l$.
For instance, $f^{(l)}$ is typically a convolution operation followed by an activation function in convolutional neural networks.
The second step involves the backpropagation algorithm, where we calculate the gradient of an error function $E$ (\textit{e.g.} cross-entropy) for each layer's weight as follows:
\begin{equation}
\label{eq.bprop}
\frac{\partial E}{\partial W^{(l)}} = \frac{\partial E}{\partial \bx^{(L)}} \frac{\partial \bx^{(L)}}{\partial \bx^{(L-1)}} \dots \frac{\partial \bx^{(l+1)}}{\partial \bx^{(l)}}\frac{\partial \bx^{(l)}}{\partial W^{(l)}}.
\end{equation}
This step includes consecutive tensor multiplications between multiple
partial derivative terms.
The final step involves updating model weights by using the computed
$\frac{\partial E}{\partial W^{(l)}}$ with an update rule.
The exact update rule depends on the optimizer.
A notorious problem for training deep neural networks is the vanishing/exploding gradient
problem~\cite{bengio1993problem}, which typically occurs in the backpropagation step when some of the partial derivative terms in Eq.~\ref{eq.bprop} include values larger or smaller than 1.
In this case, due to the multiple consecutive multiplications, the gradients \textit{w.r.t.} the weights can become exponentially small (close to 0) or exponentially large (approaching infinity), which
prevents effective learning of the network weights.
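The effect of these consecutive multiplications can be illustrated with a small numerical sketch (a hypothetical chain of sigmoid layers, not our VGG models; the weights, inputs and depths are illustrative only):

```python
import numpy as np

# Toy chain x_l = sigmoid(w * x_{l-1}): the gradient w.r.t. the first input
# is a product of one local derivative per layer. Since the derivative of
# the sigmoid is at most 0.25, this product shrinks exponentially with depth.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_wrt_first_input(depth, w=1.0, x0=0.5):
    """Compute d x^(L) / d x^(0) for a depth-L chain of sigmoid layers."""
    x, grad = x0, 1.0
    for _ in range(depth):
        s = sigmoid(w * x)
        grad *= w * s * (1.0 - s)   # local derivative of one layer
        x = s
    return grad

shallow = grad_wrt_first_input(depth=8)
deep = grad_wrt_first_input(depth=38)
```

For these depths the 38-layer chain's gradient is many orders of magnitude smaller than the 8-layer one's, mirroring the qualitative gap between the VGG08 and VGG38 gradient-flow plots.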
%
Figures~\ref{fig:grad_flow_08} and \ref{fig:grad_flow_38} depict the gradient flows through VGG architectures \cite{simonyan2014very} with 8 and 38 layers respectively, trained and evaluated for a total of 100 epochs on the CIFAR100 dataset. \questionOne.
\section{Background Literature}
\label{sec:lit_rev}
In this section we will highlight some of the most influential
papers that have been central to overcoming the VGP in
deep CNNs.
\paragraph{Batch Normalization}\cite{ioffe2015batch}
BN seeks to solve the problem of
internal covariate shift (ICS), whereby the distribution of each layer's
inputs changes during training as the parameters of the previous layers change.
The authors argue that without batch normalization, the distribution of
each layer's inputs can vary significantly due to the stochastic nature of randomly sampling mini-batches from the
training set.
Layers in the network hence must continuously adapt to these high-variance distributions, which hinders the convergence of gradient-based optimizers.
This optimization problem is exacerbated further with network depth due
to the updating of parameters at layer $l$ being dependent on
the previous $l-1$ layers.
It is hence beneficial to embed the normalization of
training data into the network architecture after work from
LeCun \emph{et al.} showed that training converges faster with
this addition \cite{lecun2012efficient}. Through standardizing
the inputs to each layer, we take a step towards achieving
the fixed distributions of inputs that remove the ill effects
of ICS. Ioffe and Szegedy demonstrate the effectiveness of
their technique through training an ensemble of BN
networks which achieve an accuracy on the ImageNet classification
task exceeding that of humans in 14 times fewer
training steps than the state-of-the-art of the time.
It should be noted, however, that the exact reason for BN's effectiveness is still not completely understood and remains
an open research question~\cite{santurkar2018does}.
\paragraph{Residual networks (ResNet)}\cite{he2016deep} A well-known way of mitigating the VGP is proposed by He~\emph{et al.} in \cite{he2016deep}. In their paper, the authors depict the error curves of a 20 layer and a 56 layer network to motivate their method. Both training and testing error of the 56 layer network are significantly higher than of the shallower one.
\questionTwo.
Residual networks, colloquially
known as ResNets, aim to alleviate VGP through the
incorporation of skip connections that bypass the linear
transformations into the network architecture.
The authors argue that this new mapping is significantly easier
to optimize since if an identity mapping were optimal, the
network could comfortably learn to push the residual to
zero rather than attempting to fit an identity mapping via
a stack of nonlinear layers.
They bolster their argument
by successfully training ResNets with depths exceeding
1000 layers on the CIFAR10 dataset.
Prior to their work, training even a 100-layer network was considered
a great challenge within the deep learning community.
The addition of skip connections mitigates the VGP by
enabling information to flow more freely through the
network architecture without adding extra
parameters or computational complexity.
\section{Solution overview}
\subsection{Batch normalization}
BN has been a standard component in the state-of-the-art
convolutional neural networks \cite{he2016deep,huang2017densely}.
% As mentioned in Section~\ref{sec:lit_rev},
Concretely, BN is a
layer transformation that is performed to whiten the activations
originating from each layer.
As computing full dataset statistics at each training iteration
would be computationally expensive, BN computes batch statistics
to approximate them.
Given a minibatch of $B$ training samples and their feature maps
$X = (\bx^1, \bx^2,\ldots , \bx^B)$ at an arbitrary layer where $X \in \mathbb{R}^{B\times H \times W \times C}$, $H, W$ are the height, width of the feature map and $C$ is the number of channels, the batch normalization first computes the following statistics:
\begin{align}
\label{eq.bnstats}
\mu_c &= \frac{1}{BWH} \sum_{n=1}^{B}\sum_{i,j=1}^{H,W} \bx_{cij}^{n}\\
\sigma^2_c &= \frac{1}{BWH}
\sum_{n=1}^{B}\sum_{i,j=1}^{H,W} (\bx_{cij}^{n} - \mu_{c})^2
\end{align} where $c$, $i$, $j$ denote the channel, $y$ and $x$ coordinates of the feature maps, and $\bm{\mu}$ and $\bm{\sigma}^2$ are the mean and variance of the batch.
BN applies the following operation on each feature map in batch B for every $c,i,j$:
\begin{equation}
\label{eq.bnop}
\text{BN}(\bx_{cij}) = \frac{\bx_{cij} - \mu_{c}}{\sqrt{\sigma^2_{c} + \epsilon}} * \gamma_{c} + \beta_{c}
\end{equation} where $\gamma \in \mathbb{R}^C$ and $\beta\in \mathbb{R}^C$ are learnable parameters and $\epsilon$ is a small constant introduced to ensure numerical stability.
At inference time, using batch statistics is a poor choice as it introduces noise in the evaluation and might not even be well defined. Therefore, $\bm{\mu}$ and $\bm{\sigma}$ are replaced by running averages of the mean and variance computed during training, which is a better approximation of the full dataset statistics.
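These statistics and the normalization step can be sketched in NumPy (a minimal training-mode illustration with made-up shapes, not the coursework implementation; the running averages used at inference are omitted):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-mode BN for feature maps x of shape (B, H, W, C)."""
    mu = x.mean(axis=(0, 1, 2))            # per-channel mean, shape (C,)
    var = x.var(axis=(0, 1, 2))            # per-channel variance, shape (C,)
    x_hat = (x - mu) / np.sqrt(var + eps)  # whiten each channel
    return gamma * x_hat + beta            # learnable scale and shift

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(100, 8, 8, 16))
y = batch_norm(x, gamma=np.ones(16), beta=np.zeros(16))
# With gamma = 1 and beta = 0, each channel of y has roughly zero mean
# and unit variance regardless of the input distribution.
```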
Recent work
has shown that BatchNorm has a more fundamental
benefit of smoothing the optimization landscape during
training \cite{santurkar2018does} thus enhancing the predictive
power of gradients as our guide to the global minimum.
Furthermore, a smoother optimization landscape should
additionally enable the use of a wider range of learning
rates and initialization schemes which is congruent with the
findings of Ioffe and Szegedy in the original BatchNorm
paper~\cite{ioffe2015batch}.
\subsection{Residual connections}
Residual connections are another approach used in the state-of-the-art Residual Networks~\cite{he2016deep} to tackle the vanishing gradient problem.
Introduced by He \emph{et al.}~\cite{he2016deep}, a residual block consists of a
convolution layer (or group of convolution layers) ``short-circuited'' with an identity mapping.
More precisely, given a mapping $F^{(b)}$ that denotes the transformation of the block $b$ (multiple consecutive layers), $F^{(b)}$ is applied to its input
feature map $\bx^{(b-1)}$ as $\bx^{(b)} = \bx^{(b-1)} + {F}(\bx^{(b-1)})$.
Intuitively, stacking residual blocks creates an architecture where the input of each block
is given two paths: passing through the convolutions or skipping to the next block. A residual network can therefore be seen as an ensemble model averaging every sub-network
created by choosing one of the two paths. The skip connections allow gradients to flow
easily into early layers, since
\begin{equation}
\frac{\partial \bx^{(b)}}{\partial \bx^{(b-1)}} = \mathbbm{1} + \frac{\partial{F}(\bx^{(b-1)})}{\partial \bx^{(b-1)}}
\label{eq.grad_skip}
\end{equation} where $\bx^{(b-1)} \in \mathbb{R}^{C \times H \times W }$ and $\mathbbm{1}$ is an all-ones tensor in $\mathbb{R}^{C \times H \times W}$, where $C$, $H$ and $W$ denote the number of feature maps, their height and width respectively.
Importantly, the $\mathbbm{1}$ term prevents the gradient from vanishing.
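This gradient behaviour can be checked numerically on a toy residual block (a hypothetical linear residual branch with illustrative weights, not the VGG38 blocks):

```python
import numpy as np

# For x_out = x + F(x), the Jacobian of the block is the identity plus the
# Jacobian of F, so the gradient through the skip path survives even when
# F's Jacobian tends to zero.

W = np.array([[0.10, -0.20],
              [0.05,  0.30]])       # Jacobian of the (linear) residual branch F

def residual_block(x):
    return x + W @ x                # identity path plus residual path

x0 = np.array([1.0, -2.0])
eps = 1e-6
# Central finite differences recover the block's Jacobian column by column.
fd_jacobian = np.column_stack([
    (residual_block(x0 + eps * e) - residual_block(x0 - eps * e)) / (2 * eps)
    for e in np.eye(2)
])
analytic_jacobian = np.eye(2) + W   # the identity-plus-residual structure
```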
\section{Experiment Setup}
\questionFigureFour
\questionFigureFive
\questionTableOne
We conduct our experiment on the CIFAR100 dataset \cite{krizhevsky2009learning},
which consists of 60,000 $32\times32$ colour images from 100 different classes. The number of samples per class is balanced, and the
samples are split into training, validation, and test set while
maintaining balanced class proportions. In total, there are 47,500; 2,500; and 10,000 instances in the training, validation,
and test set, respectively. Moreover, we apply data augmentation strategies (cropping, horizontal flipping) to improve the generalization of the model.
With the goal of understanding whether BN or skip connections
help fighting vanishing gradients, we first test these
methods independently, before combining them in an attempt
to fully exploit the depth of the VGG38 model.
All experiments are conducted using the Adam optimizer with the default
learning rate (1e-3) unless otherwise specified, cosine annealing, and a batch size of 100
for 100 epochs.
Additionally, training images are augmented with random
cropping and horizontal flipping.
Note that we do not use data augmentation at test time.
These hyperparameters along with the augmentation strategy are used
to produce the results shown in Fig.~\ref{fig:curves}.
When used, BN is applied
after each convolutional layer, before the Leaky
ReLU non-linearity.
Similarly, the skip connections are applied from
before the convolution layer to before the final activation function
of the block as per Fig.~2 of \cite{he2016deep}.
Note that adding residual connections between the feature maps before and after downsampling requires special treatment, as there is a dimension mismatch between them.
Therefore in the coursework, we do not use residual connections in the down-sampling blocks. However, please note that batch normalization should still be implemented for these blocks.
\subsection{Residual Connections to Downsampling Layers}
\label{subsec:rescimp}
\questionThree.
\section{Results and Discussion}
\label{sec:disc}
\questionFour.
\section{Conclusion}
\label{sec:concl}
\questionFive.
\bibliography{refs}
\end{document}


@ -1,720 +0,0 @@
% File: mlp2017.sty (LaTeX style file for ICML-2017, version of 2017-05-31)
% Modified by Daniel Roy 2017: changed byline to use footnotes for affiliations, and removed emails
% This file contains the LaTeX formatting parameters for a two-column
% conference proceedings that is 8.5 inches wide by 11 inches high.
%
% Modified by Percy Liang 12/2/2013: changed the year, location from the previous template for ICML 2014
% Modified by Fei Sha 9/2/2013: changed the year, location form the previous template for ICML 2013
%
% Modified by Fei Sha 4/24/2013: (1) remove the extra whitespace after the first author's email address (in %the camera-ready version) (2) change the Proceeding ... of ICML 2010 to 2014 so PDF's metadata will show up % correctly
%
% Modified by Sanjoy Dasgupta, 2013: changed years, location
%
% Modified by Francesco Figari, 2012: changed years, location
%
% Modified by Christoph Sawade and Tobias Scheffer, 2011: added line
% numbers, changed years
%
% Modified by Hal Daume III, 2010: changed years, added hyperlinks
%
% Modified by Kiri Wagstaff, 2009: changed years
%
% Modified by Sam Roweis, 2008: changed years
%
% Modified by Ricardo Silva, 2007: update of the ifpdf verification
%
% Modified by Prasad Tadepalli and Andrew Moore, merely changing years.
%
% Modified by Kristian Kersting, 2005, based on Jennifer Dy's 2004 version
% - running title. If the original title is to long or is breaking a line,
% use \mlptitlerunning{...} in the preamble to supply a shorter form.
% Added fancyhdr package to get a running head.
% - Updated to store the page size because pdflatex does compile the
% page size into the pdf.
%
% Hacked by Terran Lane, 2003:
% - Updated to use LaTeX2e style file conventions (ProvidesPackage,
% etc.)
% - Added an ``appearing in'' block at the base of the first column
% (thus keeping the ``appearing in'' note out of the bottom margin
% where the printer should strip in the page numbers).
% - Added a package option [accepted] that selects between the ``Under
% review'' notice (default, when no option is specified) and the
% ``Appearing in'' notice (for use when the paper has been accepted
% and will appear).
%
% Originally created as: ml2k.sty (LaTeX style file for ICML-2000)
% by P. Langley (12/23/99)
%%%%%%%%%%%%%%%%%%%%
%% This version of the style file supports both a ``review'' version
%% and a ``final/accepted'' version. The difference is only in the
%% text that appears in the note at the bottom of the first column of
%% the first page. The default behavior is to print a note to the
%% effect that the paper is under review and don't distribute it. The
%% final/accepted version prints an ``Appearing in'' note. To get the
%% latter behavior, in the calling file change the ``usepackage'' line
%% from:
%% \usepackage{icml2017}
%% to
%% \usepackage[accepted]{icml2017}
%%%%%%%%%%%%%%%%%%%%
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{mlp2022}[2021/10/16 MLP Coursework Style File]
% Use fancyhdr package
\RequirePackage{fancyhdr}
\RequirePackage{color}
\RequirePackage{algorithm}
\RequirePackage{algorithmic}
\RequirePackage{natbib}
\RequirePackage{eso-pic} % used by \AddToShipoutPicture
\RequirePackage{forloop}
%%%%%%%% Options
%\DeclareOption{accepted}{%
% \renewcommand{\Notice@String}{\ICML@appearing}
\gdef\isaccepted{1}
%}
\DeclareOption{nohyperref}{%
\gdef\nohyperref{1}
}
\ifdefined\nohyperref\else\ifdefined\hypersetup
\definecolor{mydarkblue}{rgb}{0,0.08,0.45}
\hypersetup{ %
pdftitle={},
pdfauthor={},
pdfsubject={MLP Coursework 2021-22},
pdfkeywords={},
pdfborder=0 0 0,
pdfpagemode=UseNone,
colorlinks=true,
linkcolor=mydarkblue,
citecolor=mydarkblue,
filecolor=mydarkblue,
urlcolor=mydarkblue,
pdfview=FitH}
\ifdefined\isaccepted \else
\hypersetup{pdfauthor={Anonymous Submission}}
\fi
\fi\fi
%%%%%%%%%%%%%%%%%%%%
% This string is printed at the bottom of the page for the
% final/accepted version of the ``appearing in'' note. Modify it to
% change that text.
%%%%%%%%%%%%%%%%%%%%
\newcommand{\ICML@appearing}{\textit{MLP Coursework 1 2021--22}}
%%%%%%%%%%%%%%%%%%%%
% This string is printed at the bottom of the page for the draft/under
% review version of the ``appearing in'' note. Modify it to change
% that text.
%%%%%%%%%%%%%%%%%%%%
\newcommand{\Notice@String}{MLP Coursework 1 2021--22}
% Cause the declared options to actually be parsed and activated
\ProcessOptions\relax
% Uncomment the following for debugging. It will cause LaTeX to dump
% the version of the ``appearing in'' string that will actually appear
% in the document.
%\typeout{>> Notice string='\Notice@String'}
% Change citation commands to be more like old ICML styles
\newcommand{\yrcite}[1]{\citeyearpar{#1}}
\renewcommand{\cite}[1]{\citep{#1}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% to ensure the letter format is used. pdflatex does compile the
% page size into the pdf. This is done using \pdfpagewidth and
% \pdfpageheight. As Latex does not know this directives, we first
% check whether pdflatex or latex is used.
%
% Kristian Kersting 2005
%
% in order to account for the more recent use of pdfetex as the default
% compiler, I have changed the pdf verification.
%
% Ricardo Silva 2007
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\paperwidth=210mm
\paperheight=297mm
% old PDFLaTex verification, circa 2005
%
%\newif\ifpdf\ifx\pdfoutput\undefined
% \pdffalse % we are not running PDFLaTeX
%\else
% \pdfoutput=1 % we are running PDFLaTeX
% \pdftrue
%\fi
\newif\ifpdf %adapted from ifpdf.sty
\ifx\pdfoutput\undefined
\else
\ifx\pdfoutput\relax
\else
\ifcase\pdfoutput
\else
\pdftrue
\fi
\fi
\fi
\ifpdf
% \pdfpagewidth=\paperwidth
% \pdfpageheight=\paperheight
\setlength{\pdfpagewidth}{210mm}
\setlength{\pdfpageheight}{297mm}
\fi
% Physical page layout
\evensidemargin -5.5mm
\oddsidemargin -5.5mm
\setlength\textheight{248mm}
\setlength\textwidth{170mm}
\setlength\columnsep{6.5mm}
\setlength\headheight{10pt}
\setlength\headsep{10pt}
\addtolength{\topmargin}{-20pt}
%\setlength\headheight{1em}
%\setlength\headsep{1em}
\addtolength{\topmargin}{-6mm}
%\addtolength{\topmargin}{-2em}
%% The following is adapted from code in the acmconf.sty conference
%% style file. The constants in it are somewhat magical, and appear
%% to work well with the two-column format on US letter paper that
%% ICML uses, but will break if you change that layout, or if you use
%% a longer block of text for the copyright notice string. Fiddle with
%% them if necessary to get the block to fit/look right.
%%
%% -- Terran Lane, 2003
%%
%% The following comments are included verbatim from acmconf.sty:
%%
%%% This section (written by KBT) handles the 1" box in the lower left
%%% corner of the left column of the first page by creating a picture,
%%% and inserting the predefined string at the bottom (with a negative
%%% displacement to offset the space allocated for a non-existent
%%% caption).
%%%
\def\ftype@copyrightbox{8}
\def\@copyrightspace{
% Create a float object positioned at the bottom of the column. Note
% that because of the mystical nature of floats, this has to be called
% before the first column is populated with text (e.g., from the title
% or abstract blocks). Otherwise, the text will force the float to
% the next column. -- TDRL.
\@float{copyrightbox}[b]
\begin{center}
\setlength{\unitlength}{1pc}
\begin{picture}(20,1.5)
% Create a line separating the main text from the note block.
% 4.818pc==0.8in.
\put(0,2.5){\line(1,0){4.818}}
% Insert the text string itself. Note that the string has to be
% enclosed in a parbox -- the \put call needs a box object to
% position. Without the parbox, the text gets splattered across the
% bottom of the page semi-randomly. The 19.75pc distance seems to be
% the width of the column, though I can't find an appropriate distance
% variable to substitute here. -- TDRL.
\put(0,0){\parbox[b]{19.75pc}{\small \Notice@String}}
\end{picture}
\end{center}
\end@float}
% Note: A few Latex versions need the next line instead of the former.
% \addtolength{\topmargin}{0.3in}
% \setlength\footheight{0pt}
\setlength\footskip{0pt}
%\pagestyle{empty}
\flushbottom \twocolumn
\sloppy
% Clear out the addcontentsline command
\def\addcontentsline#1#2#3{}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% commands for formatting paper title, author names, and addresses.
%%start%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%% title as running head -- Kristian Kersting 2005 %%%%%%%%%%%%%
%\makeatletter
%\newtoks\mytoksa
%\newtoks\mytoksb
%\newcommand\addtomylist[2]{%
% \mytoksa\expandafter{#1}%
% \mytoksb{#2}%
% \edef#1{\the\mytoksa\the\mytoksb}%
%}
%\makeatother
% box to check the size of the running head
\newbox\titrun
% general page style
\pagestyle{fancy}
\fancyhf{}
\fancyhead{}
\fancyfoot{}
% set the width of the head rule to 1 point
\renewcommand{\headrulewidth}{1pt}
% definition to set the head as running head in the preamble
\def\mlptitlerunning#1{\gdef\@mlptitlerunning{#1}}
% main definition adapting \mlptitle from 2004
\long\def\mlptitle#1{%
%check whether @mlptitlerunning exists
% if not \mlptitle is used as running head
\ifx\undefined\@mlptitlerunning%
\gdef\@mlptitlerunning{#1}
\fi
%add it to pdf information
\ifdefined\nohyperref\else\ifdefined\hypersetup
\hypersetup{pdftitle={#1}}
\fi\fi
%get the dimension of the running title
\global\setbox\titrun=\vbox{\small\bf\@mlptitlerunning}
% error flag
\gdef\@runningtitleerror{0}
% running title too long
\ifdim\wd\titrun>\textwidth%
{\gdef\@runningtitleerror{1}}%
% running title breaks a line
\else\ifdim\ht\titrun>6.25pt
{\gdef\@runningtitleerror{2}}%
\fi
\fi
% if there is something wrong with the running title
\ifnum\@runningtitleerror>0
\typeout{}%
\typeout{}%
\typeout{*******************************************************}%
\typeout{Title exceeds size limitations for running head.}%
\typeout{Please supply a shorter form for the running head}
\typeout{with \string\mlptitlerunning{...}\space prior to \string\begin{document}}%
\typeout{*******************************************************}%
\typeout{}%
\typeout{}%
% set default running title
\chead{\small\bf Title Suppressed Due to Excessive Size}%
\else
% 'everything' fine, set provided running title
\chead{\small\bf\@mlptitlerunning}%
\fi
% no running title on the first page of the paper
\thispagestyle{empty}
%%%%%%%%%%%%%%%%%%%% Kristian Kersting %%%%%%%%%%%%%%%%%%%%%%%%%
%end%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
{\center\baselineskip 18pt
\toptitlebar{\Large\bf #1}\bottomtitlebar}
}
\gdef\icmlfullauthorlist{}
\newcommand\addstringtofullauthorlist{\g@addto@macro\icmlfullauthorlist}
\newcommand\addtofullauthorlist[1]{%
\ifdefined\icmlanyauthors%
\addstringtofullauthorlist{, #1}%
\else%
\addstringtofullauthorlist{#1}%
\gdef\icmlanyauthors{1}%
\fi%
\ifdefined\nohyperref\else\ifdefined\hypersetup%
\hypersetup{pdfauthor=\icmlfullauthorlist}%
\fi\fi}
\def\toptitlebar{\hrule height1pt \vskip .25in}
\def\bottomtitlebar{\vskip .22in \hrule height1pt \vskip .3in}
\newenvironment{icmlauthorlist}{%
\setlength\topsep{0pt}
\setlength\parskip{0pt}
\begin{center}
}{%
\end{center}
}
\newcounter{@affiliationcounter}
\newcommand{\@pa}[1]{%
% ``#1''
\ifcsname the@affil#1\endcsname
% do nothing
\else
\ifcsname @icmlsymbol#1\endcsname
% nothing
\else
\stepcounter{@affiliationcounter}%
\newcounter{@affil#1}%
\setcounter{@affil#1}{\value{@affiliationcounter}}%
\fi
\fi%
\ifcsname @icmlsymbol#1\endcsname
\textsuperscript{\csname @icmlsymbol#1\endcsname\,}%
\else
%\expandafter\footnotemark[\arabic{@affil#1}\,]%
\textsuperscript{\arabic{@affil#1}\,}%
\fi
}
%\newcommand{\icmlauthor}[2]{%
%\addtofullauthorlist{#1}%
%#1\@for\theaffil:=#2\do{\pa{\theaffil}}%
%}
\newcommand{\icmlauthor}[2]{%
\ifdefined\isaccepted
\mbox{\bf #1}\,\@for\theaffil:=#2\do{\@pa{\theaffil}} \addtofullauthorlist{#1}%
\else
\ifdefined\@icmlfirsttime
\else
\gdef\@icmlfirsttime{1}
\mbox{\bf Anonymous Authors}\@pa{@anon} \addtofullauthorlist{Anonymous Authors}
\fi
\fi
}
\newcommand{\icmlsetsymbol}[2]{%
\expandafter\gdef\csname @icmlsymbol#1\endcsname{#2}
}
\newcommand{\icmlaffiliation}[2]{%
\ifdefined\isaccepted
\ifcsname the@affil#1\endcsname
\expandafter\gdef\csname @affilname\csname the@affil#1\endcsname\endcsname{#2}%
\else
{\bf AUTHORERR: Error in use of \textbackslash{}icmlaffiliation command. Label ``#1'' not mentioned in some \textbackslash{}icmlauthor\{author name\}\{labels here\} command beforehand. }
\typeout{}%
\typeout{}%
\typeout{*******************************************************}%
\typeout{Affiliation label undefined. }%
\typeout{Make sure \string\icmlaffiliation\space follows }
\typeout{all of \string\icmlauthor\space commands}%
\typeout{*******************************************************}%
\typeout{}%
\typeout{}%
\fi
\else % \isaccepted
% can be called multiple times... it's idempotent
\expandafter\gdef\csname @affilname1\endcsname{Anonymous Institution, Anonymous City, Anonymous Region, Anonymous Country}
\fi
}
\newcommand{\icmlcorrespondingauthor}[2]{
\ifdefined\isaccepted
\ifdefined\icmlcorrespondingauthor@text
\g@addto@macro\icmlcorrespondingauthor@text{, #1 \textless{}#2\textgreater{}}
\else
\gdef\icmlcorrespondingauthor@text{#1 \textless{}#2\textgreater{}}
\fi
\else
\gdef\icmlcorrespondingauthor@text{Anonymous Author \textless{}anon.email@domain.com\textgreater{}}
\fi
}
\newcommand{\icmlEqualContribution}{\textsuperscript{*}Equal contribution }
\newcounter{@affilnum}
\newcommand{\printAffiliationsAndNotice}[1]{%
\stepcounter{@affiliationcounter}%
{\let\thefootnote\relax\footnotetext{\hspace*{-\footnotesep}#1%
\forloop{@affilnum}{1}{\value{@affilnum} < \value{@affiliationcounter}}{
\textsuperscript{\arabic{@affilnum}}\ifcsname @affilname\the@affilnum\endcsname%
\csname @affilname\the@affilnum\endcsname%
\else
{\bf AUTHORERR: Missing \textbackslash{}icmlaffiliation.}
\fi
}.
\ifdefined\icmlcorrespondingauthor@text
Correspondence to: \icmlcorrespondingauthor@text.
\else
{\bf AUTHORERR: Missing \textbackslash{}icmlcorrespondingauthor.}
\fi
\ \\
\Notice@String
}
}
}
%\makeatother
\long\def\icmladdress#1{%
{\bf The \textbackslash{}icmladdress command is no longer used. See the example\_paper .tex for usage of \textbackslash{}icmlauthor and \textbackslash{}icmlaffiliation.}
}
%% keywords as first class citizens
\def\icmlkeywords#1{%
% \ifdefined\isaccepted \else
% \par {\bf Keywords:} #1%
% \fi
% \ifdefined\nohyperref\else\ifdefined\hypersetup
% \hypersetup{pdfkeywords={#1}}
% \fi\fi
% \ifdefined\isaccepted \else
% \par {\bf Keywords:} #1%
% \fi
\ifdefined\nohyperref\else\ifdefined\hypersetup
\hypersetup{pdfkeywords={#1}}
\fi\fi
}
% modification to natbib citations
\setcitestyle{authoryear,round,citesep={;},aysep={,},yysep={;}}
% Redefinition of the abstract environment.
\renewenvironment{abstract}
{%
% Insert the ``appearing in'' copyright notice.
%\@copyrightspace
\centerline{\large\bf Abstract}
\vspace{-0.12in}\begin{quote}}
{\par\end{quote}\vskip 0.12in}
% numbered section headings with different treatment of numbers
\def\@startsection#1#2#3#4#5#6{\if@noskipsec \leavevmode \fi
\par \@tempskipa #4\relax
\@afterindenttrue
% Altered the following line to indent a section's first paragraph.
% \ifdim \@tempskipa <\z@ \@tempskipa -\@tempskipa \@afterindentfalse\fi
\ifdim \@tempskipa <\z@ \@tempskipa -\@tempskipa \fi
\if@nobreak \everypar{}\else
\addpenalty{\@secpenalty}\addvspace{\@tempskipa}\fi \@ifstar
{\@ssect{#3}{#4}{#5}{#6}}{\@dblarg{\@sict{#1}{#2}{#3}{#4}{#5}{#6}}}}
\def\@sict#1#2#3#4#5#6[#7]#8{\ifnum #2>\c@secnumdepth
\def\@svsec{}\else
\refstepcounter{#1}\edef\@svsec{\csname the#1\endcsname}\fi
\@tempskipa #5\relax
\ifdim \@tempskipa>\z@
\begingroup #6\relax
\@hangfrom{\hskip #3\relax\@svsec.~}{\interlinepenalty \@M #8\par}
\endgroup
\csname #1mark\endcsname{#7}\addcontentsline
{toc}{#1}{\ifnum #2>\c@secnumdepth \else
\protect\numberline{\csname the#1\endcsname}\fi
#7}\else
\def\@svsechd{#6\hskip #3\@svsec #8\csname #1mark\endcsname
{#7}\addcontentsline
{toc}{#1}{\ifnum #2>\c@secnumdepth \else
\protect\numberline{\csname the#1\endcsname}\fi
#7}}\fi
\@xsect{#5}}
\def\@sect#1#2#3#4#5#6[#7]#8{\ifnum #2>\c@secnumdepth
\def\@svsec{}\else
\refstepcounter{#1}\edef\@svsec{\csname the#1\endcsname\hskip 0.4em }\fi
\@tempskipa #5\relax
\ifdim \@tempskipa>\z@
\begingroup #6\relax
\@hangfrom{\hskip #3\relax\@svsec}{\interlinepenalty \@M #8\par}
\endgroup
\csname #1mark\endcsname{#7}\addcontentsline
{toc}{#1}{\ifnum #2>\c@secnumdepth \else
\protect\numberline{\csname the#1\endcsname}\fi
#7}\else
\def\@svsechd{#6\hskip #3\@svsec #8\csname #1mark\endcsname
{#7}\addcontentsline
{toc}{#1}{\ifnum #2>\c@secnumdepth \else
\protect\numberline{\csname the#1\endcsname}\fi
#7}}\fi
\@xsect{#5}}
% section headings with less space above and below them
\def\thesection {\arabic{section}}
\def\thesubsection {\thesection.\arabic{subsection}}
\def\section{\@startsection{section}{1}{\z@}{-0.12in}{0.02in}
{\large\bf\raggedright}}
\def\subsection{\@startsection{subsection}{2}{\z@}{-0.10in}{0.01in}
{\normalsize\bf\raggedright}}
\def\subsubsection{\@startsection{subsubsection}{3}{\z@}{-0.08in}{0.01in}
{\normalsize\sc\raggedright}}
\def\paragraph{\@startsection{paragraph}{4}{\z@}{1.5ex plus
0.5ex minus .2ex}{-1em}{\normalsize\bf}}
\def\subparagraph{\@startsection{subparagraph}{5}{\z@}{1.5ex plus
0.5ex minus .2ex}{-1em}{\normalsize\bf}}
% Footnotes
\footnotesep 6.65pt %
\skip\footins 9pt
\def\footnoterule{\kern-3pt \hrule width 0.8in \kern 2.6pt }
\setcounter{footnote}{0}
% Lists and paragraphs
\parindent 0pt
\topsep 4pt plus 1pt minus 2pt
\partopsep 1pt plus 0.5pt minus 0.5pt
\itemsep 2pt plus 1pt minus 0.5pt
\parsep 2pt plus 1pt minus 0.5pt
\parskip 6pt
\leftmargin 2em \leftmargini\leftmargin \leftmarginii 2em
\leftmarginiii 1.5em \leftmarginiv 1.0em \leftmarginv .5em
\leftmarginvi .5em
\labelwidth\leftmargini\advance\labelwidth-\labelsep \labelsep 5pt
\def\@listi{\leftmargin\leftmargini}
\def\@listii{\leftmargin\leftmarginii
\labelwidth\leftmarginii\advance\labelwidth-\labelsep
\topsep 2pt plus 1pt minus 0.5pt
\parsep 1pt plus 0.5pt minus 0.5pt
\itemsep \parsep}
\def\@listiii{\leftmargin\leftmarginiii
\labelwidth\leftmarginiii\advance\labelwidth-\labelsep
\topsep 1pt plus 0.5pt minus 0.5pt
\parsep \z@ \partopsep 0.5pt plus 0pt minus 0.5pt
\itemsep \topsep}
\def\@listiv{\leftmargin\leftmarginiv
\labelwidth\leftmarginiv\advance\labelwidth-\labelsep}
\def\@listv{\leftmargin\leftmarginv
\labelwidth\leftmarginv\advance\labelwidth-\labelsep}
\def\@listvi{\leftmargin\leftmarginvi
\labelwidth\leftmarginvi\advance\labelwidth-\labelsep}
\abovedisplayskip 7pt plus2pt minus5pt%
\belowdisplayskip \abovedisplayskip
\abovedisplayshortskip 0pt plus3pt%
\belowdisplayshortskip 4pt plus3pt minus3pt%
% Less leading in most fonts (due to the narrow columns)
% The choices were between 1-pt and 1.5-pt leading
\def\@normalsize{\@setsize\normalsize{11pt}\xpt\@xpt}
\def\small{\@setsize\small{10pt}\ixpt\@ixpt}
\def\footnotesize{\@setsize\footnotesize{10pt}\ixpt\@ixpt}
\def\scriptsize{\@setsize\scriptsize{8pt}\viipt\@viipt}
\def\tiny{\@setsize\tiny{7pt}\vipt\@vipt}
\def\large{\@setsize\large{14pt}\xiipt\@xiipt}
\def\Large{\@setsize\Large{16pt}\xivpt\@xivpt}
\def\LARGE{\@setsize\LARGE{20pt}\xviipt\@xviipt}
\def\huge{\@setsize\huge{23pt}\xxpt\@xxpt}
\def\Huge{\@setsize\Huge{28pt}\xxvpt\@xxvpt}
% Revised formatting for figure captions and table titles.
\newsavebox\newcaptionbox\newdimen\newcaptionboxwid
\long\def\@makecaption#1#2{
\vskip 10pt
\baselineskip 11pt
\setbox\@tempboxa\hbox{#1. #2}
\ifdim \wd\@tempboxa >\hsize
\sbox{\newcaptionbox}{\small\sl #1.~}
\newcaptionboxwid=\wd\newcaptionbox
\usebox\newcaptionbox {\footnotesize #2}
% \usebox\newcaptionbox {\small #2}
\else
\centerline{{\small\sl #1.} {\small #2}}
\fi}
\def\fnum@figure{Figure \thefigure}
\def\fnum@table{Table \thetable}
% Strut macros for skipping spaces above and below text in tables.
\def\abovestrut#1{\rule[0in]{0in}{#1}\ignorespaces}
\def\belowstrut#1{\rule[-#1]{0in}{#1}\ignorespaces}
\def\abovespace{\abovestrut{0.20in}}
\def\aroundspace{\abovestrut{0.20in}\belowstrut{0.10in}}
\def\belowspace{\belowstrut{0.10in}}
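The strut macros above add breathing room around rules in tables. A sketch of their intended use (model names and numbers are placeholders, not from this file):

```latex
\begin{tabular}{lc}
\hline
\abovespace\belowspace
Model & Val.\ accuracy (\%) \\
\hline
\abovespace
VGG-08 & 50.0 \\
\belowspace
VGG-38 & 25.0 \\
\hline
\end{tabular}
```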
% Various personal itemization commands.
\def\texitem#1{\par\noindent\hangindent 12pt
\hbox to 12pt {\hss #1 ~}\ignorespaces}
\def\icmlitem{\texitem{$\bullet$}}
% To comment out multiple lines of text.
\long\def\comment#1{}
%% Line counter (not in final version). Adapted from NIPS style file by Christoph Sawade
% Vertical Ruler
% This code is, largely, from the CVPR 2010 conference style file
% ----- define vruler
\makeatletter
\newbox\icmlrulerbox
\newcount\icmlrulercount
\newdimen\icmlruleroffset
\newdimen\cv@lineheight
\newdimen\cv@boxheight
\newbox\cv@tmpbox
\newcount\cv@refno
\newcount\cv@tot
% NUMBER with left flushed zeros \fillzeros[<WIDTH>]<NUMBER>
\newcount\cv@tmpc@ \newcount\cv@tmpc
\def\fillzeros[#1]#2{\cv@tmpc@=#2\relax\ifnum\cv@tmpc@<0\cv@tmpc@=-\cv@tmpc@\fi
\cv@tmpc=1 %
\loop\ifnum\cv@tmpc@<10 \else \divide\cv@tmpc@ by 10 \advance\cv@tmpc by 1 \fi
\ifnum\cv@tmpc@=10\relax\cv@tmpc@=11\relax\fi \ifnum\cv@tmpc@>10 \repeat
\ifnum#2<0\advance\cv@tmpc1\relax-\fi
\loop\ifnum\cv@tmpc<#1\relax0\advance\cv@tmpc1\relax\fi \ifnum\cv@tmpc<#1 \repeat
\cv@tmpc@=#2\relax\ifnum\cv@tmpc@<0\cv@tmpc@=-\cv@tmpc@\fi \relax\the\cv@tmpc@}%
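`\fillzeros` counts the digits of the number, emits a leading minus sign if needed, then pads with zeros up to the requested width. For example:

```latex
\fillzeros[3]{7}    % typesets 007
\fillzeros[3]{42}   % typesets 042
\fillzeros[3]{-5}   % typesets -05
```

It is used below to print the line numbers of the submission-mode margin ruler.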
% \makevruler[<SCALE>][<INITIAL_COUNT>][<STEP>][<DIGITS>][<HEIGHT>]
\def\makevruler[#1][#2][#3][#4][#5]{
\begingroup\offinterlineskip
\textheight=#5\vbadness=10000\vfuzz=120ex\overfullrule=0pt%
\global\setbox\icmlrulerbox=\vbox to \textheight{%
{
\parskip=0pt\hfuzz=150em\cv@boxheight=\textheight
\cv@lineheight=#1\global\icmlrulercount=#2%
\cv@tot\cv@boxheight\divide\cv@tot\cv@lineheight\advance\cv@tot2%
\cv@refno1\vskip-\cv@lineheight\vskip1ex%
\loop\setbox\cv@tmpbox=\hbox to0cm{ % side margin
\hfil {\hfil\fillzeros[#4]\icmlrulercount}
}%
\ht\cv@tmpbox\cv@lineheight\dp\cv@tmpbox0pt\box\cv@tmpbox\break
\advance\cv@refno1\global\advance\icmlrulercount#3\relax
\ifnum\cv@refno<\cv@tot\repeat
}
}
\endgroup
}%
\makeatother
% ----- end of vruler
% \makevruler[<SCALE>][<INITIAL_COUNT>][<STEP>][<DIGITS>][<HEIGHT>]
\def\icmlruler#1{\makevruler[12pt][#1][1][3][\textheight]\usebox{\icmlrulerbox}}
\AddToShipoutPicture{%
\icmlruleroffset=\textheight
\advance\icmlruleroffset by 5.2pt % top margin
\color[rgb]{.7,.7,.7}
\ifdefined\isaccepted \else
\AtTextUpperLeft{%
\put(\LenToUnit{-35pt},\LenToUnit{-\icmlruleroffset}){%left ruler
\icmlruler{\icmlrulercount}}
% \put(\LenToUnit{1.04\textwidth},\LenToUnit{-\icmlruleroffset}){%right ruler
% \icmlruler{\icmlrulercount}}
}
\fi
}
\endinput


@ -1,50 +0,0 @@
\usepackage[T1]{fontenc}
\usepackage{amssymb,amsmath}
\usepackage{txfonts}
\usepackage{microtype}
% For figures
\usepackage{graphicx}
\usepackage{subcaption}
% For citations
\usepackage{natbib}
% For algorithms
\usepackage{algorithm}
\usepackage{algorithmic}
% the hyperref package is used to produce hyperlinks in the
% resulting PDF. If this breaks your system, please comment out the
% following usepackage line and replace \usepackage{mlp2022} with
% \usepackage[nohyperref]{mlp2022} below.
\usepackage{hyperref}
\usepackage{url}
\urlstyle{same}
\usepackage{color}
\usepackage{booktabs} % To thicken table lines
\usepackage{multirow} % Multirow cells in table
% Packages hyperref and algorithmic misbehave sometimes. We can fix
% this with the following command.
\newcommand{\theHalgorithm}{\arabic{algorithm}}
% Set up MLP coursework style (based on ICML style)
\usepackage{mlp2022}
\mlptitlerunning{MLP Coursework 2 (\studentNumber)}
\bibliographystyle{icml2017}
\usepackage{bm,bbm}
\usepackage{soul}
\DeclareMathOperator{\softmax}{softmax}
\DeclareMathOperator{\sigmoid}{sigmoid}
\DeclareMathOperator{\sgn}{sgn}
\DeclareMathOperator{\relu}{relu}
\DeclareMathOperator{\lrelu}{lrelu}
\DeclareMathOperator{\elu}{elu}
\DeclareMathOperator{\selu}{selu}
\DeclareMathOperator{\maxout}{maxout}
\newcommand{\bx}{\bm{x}}
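With the operators and shorthand declared above, report equations can be written as, e.g. (an illustrative sketch, not taken from the original file):

```latex
\begin{equation}
  \relu(\bx) = \max(\bm{0}, \bx), \qquad
  \softmax(\bx)_i = \frac{\exp(x_i)}{\sum_j \exp(x_j)}
\end{equation}
```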


@ -1,184 +0,0 @@
@inproceedings{goodfellow2013maxout,
title={Maxout networks},
author={Goodfellow, Ian and Warde-Farley, David and Mirza, Mehdi and Courville, Aaron and Bengio, Yoshua},
booktitle={International conference on machine learning},
pages={1319--1327},
year={2013},
organization={PMLR}
}
@article{srivastava2014dropout,
title={Dropout: a simple way to prevent neural networks from overfitting},
author={Srivastava, Nitish and Hinton, Geoffrey and Krizhevsky, Alex and Sutskever, Ilya and Salakhutdinov, Ruslan},
journal={The journal of machine learning research},
volume={15},
number={1},
pages={1929--1958},
year={2014},
publisher={JMLR. org}
}
@book{Goodfellow-et-al-2016,
title={Deep Learning},
author={Ian Goodfellow and Yoshua Bengio and Aaron Courville},
publisher={MIT Press},
note={\url{http://www.deeplearningbook.org}},
year={2016}
}
@inproceedings{ng2004feature,
title={Feature selection, L1 vs. L2 regularization, and rotational invariance},
author={Ng, Andrew Y},
booktitle={Proceedings of the twenty-first international conference on Machine learning},
pages={78},
year={2004}
}
@article{simonyan2014very,
title={Very deep convolutional networks for large-scale image recognition},
author={Simonyan, Karen and Zisserman, Andrew},
journal={arXiv preprint arXiv:1409.1556},
year={2014}
}
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
@inproceedings{glorot2010understanding,
title={Understanding the difficulty of training deep feedforward neural networks},
author={Glorot, Xavier and Bengio, Yoshua},
booktitle={Proceedings of the thirteenth international conference on artificial intelligence and statistics},
pages={249--256},
year={2010},
organization={JMLR Workshop and Conference Proceedings}
}
@inproceedings{bengio1993problem,
title={The problem of learning long-term dependencies in recurrent networks},
author={Bengio, Yoshua and Frasconi, Paolo and Simard, Patrice},
booktitle={IEEE international conference on neural networks},
pages={1183--1188},
year={1993},
organization={IEEE}
}
@inproceedings{ide2017improvement,
title={Improvement of learning for CNN with ReLU activation by sparse regularization},
author={Ide, Hidenori and Kurita, Takio},
booktitle={2017 International Joint Conference on Neural Networks (IJCNN)},
pages={2684--2691},
year={2017},
organization={IEEE}
}
@inproceedings{ioffe2015batch,
title={Batch normalization: Accelerating deep network training by reducing internal covariate shift},
author={Ioffe, Sergey and Szegedy, Christian},
booktitle={International conference on machine learning},
pages={448--456},
year={2015},
organization={PMLR}
}
@inproceedings{huang2017densely,
title={Densely connected convolutional networks},
author={Huang, Gao and Liu, Zhuang and Van Der Maaten, Laurens and Weinberger, Kilian Q},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={4700--4708},
year={2017}
}
@article{rumelhart1986learning,
title={Learning representations by back-propagating errors},
author={Rumelhart, David E and Hinton, Geoffrey E and Williams, Ronald J},
journal={nature},
volume={323},
number={6088},
pages={533--536},
year={1986},
publisher={Nature Publishing Group}
}
@inproceedings{du2019gradient,
title={Gradient descent finds global minima of deep neural networks},
author={Du, Simon and Lee, Jason and Li, Haochuan and Wang, Liwei and Zhai, Xiyu},
booktitle={International Conference on Machine Learning},
pages={1675--1685},
year={2019},
organization={PMLR}
}
@inproceedings{pascanu2013difficulty,
title={On the difficulty of training recurrent neural networks},
author={Pascanu, Razvan and Mikolov, Tomas and Bengio, Yoshua},
booktitle={International conference on machine learning},
pages={1310--1318},
year={2013},
organization={PMLR}
}
@article{li2017visualizing,
title={Visualizing the loss landscape of neural nets},
author={Li, Hao and Xu, Zheng and Taylor, Gavin and Studer, Christoph and Goldstein, Tom},
journal={arXiv preprint arXiv:1712.09913},
year={2017}
}
@inproceedings{santurkar2018does,
title={How does batch normalization help optimization?},
author={Santurkar, Shibani and Tsipras, Dimitris and Ilyas, Andrew and M{\k{a}}dry, Aleksander},
booktitle={Proceedings of the 32nd international conference on neural information processing systems},
pages={2488--2498},
year={2018}
}
@techreport{krizhevsky2009learning,
title={Learning multiple layers of features from tiny images},
author={Krizhevsky, Alex and Hinton, Geoffrey and others},
institution={University of Toronto},
year={2009}
}
@incollection{lecun2012efficient,
title={Efficient backprop},
author={LeCun, Yann A and Bottou, L{\'e}on and Orr, Genevieve B and M{\"u}ller, Klaus-Robert},
booktitle={Neural networks: Tricks of the trade},
pages={9--48},
year={2012},
publisher={Springer}
}
@book{bishop1995neural,
title={Neural networks for pattern recognition},
author={Bishop, Christopher M and others},
year={1995},
publisher={Oxford university press}
}
@article{vaswani2017attention,
author = {Ashish Vaswani and
Noam Shazeer and
Niki Parmar and
Jakob Uszkoreit and
Llion Jones and
Aidan N. Gomez and
Lukasz Kaiser and
Illia Polosukhin},
title = {Attention Is All You Need},
journal = {CoRR},
volume = {abs/1706.03762},
year = {2017},
url = {http://arxiv.org/abs/1706.03762},
eprinttype = {arXiv},
eprint = {1706.03762},
timestamp = {Sat, 23 Jan 2021 01:20:40 +0100},
biburl = {https://dblp.org/rec/journals/corr/VaswaniSPUJGKP17.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}


@ -1 +0,0 @@
python pytorch_mlp_framework/train_evaluate_image_classification_system.py --batch_size 100 --seed 0 --num_filters 32 --num_stages 3 --num_blocks_per_stage 0 --experiment_name VGG_08_experiment --use_gpu True --num_classes 100 --block_type 'conv_block' --continue_from_epoch -1


@ -1 +0,0 @@
python pytorch_mlp_framework/train_evaluate_image_classification_system.py --batch_size 100 --seed 0 --num_filters 32 --num_stages 3 --num_blocks_per_stage 5 --experiment_name VGG38_BN --use_gpu True --num_classes 100 --block_type 'conv_bn' --continue_from_epoch -1


@ -1 +0,0 @@
python pytorch_mlp_framework/train_evaluate_image_classification_system.py --batch_size 100 --seed 0 --num_filters 32 --num_stages 3 --num_blocks_per_stage 5 --experiment_name VGG38_BN_RC --use_gpu True --num_classes 100 --block_type 'conv_bn_rc' --continue_from_epoch -1 --learning_rate 0.01
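Runs like the ones above can be repeated over several seeds with a small wrapper. The sketch below builds the command lines and prints them for inspection rather than launching them; the script path and flags are taken from the commands above, while the seed range and experiment-name suffix are illustrative assumptions:

```shell
#!/usr/bin/env bash
# Build one training command per seed; print them so they can be
# inspected (or piped to bash) before launching long-running jobs.
cmds=()
for seed in 0 1 2; do
  cmds+=("python pytorch_mlp_framework/train_evaluate_image_classification_system.py --batch_size 100 --seed ${seed} --num_filters 32 --num_stages 3 --num_blocks_per_stage 5 --experiment_name VGG38_BN_seed${seed} --use_gpu True --num_classes 100 --block_type conv_bn --continue_from_epoch -1")
done
printf '%s\n' "${cmds[@]}"
```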

Some files were not shown because too many files have changed in this diff.