Compare commits

...

22 Commits

Author SHA1 Message Date
Anton Lydike
46ca7c6dfd final changes 2024-11-22 09:26:24 +00:00
Anton Lydike
c29681b4ba changes 2024-11-19 17:04:58 +00:00
Anton Lydike
ae0e14b5fb add BN+RC layer 2024-11-19 10:38:54 +00:00
Anton Lydike
7861133463 don't plot bias layers 2024-11-19 10:10:02 +00:00
Anton Lydike
94d3a1d484 add runner for batch normalized version 2024-11-19 09:47:18 +00:00
Anton Lydike
cb5c6f4e19 formatting and BN 2024-11-19 09:42:31 +00:00
Anton Lydike
92fccb8eb2 add a bunch of extra files 2024-11-18 20:40:20 +00:00
Anton Lydike
05e53aacaf fix experiment_builder.py 2024-11-18 13:30:36 +00:00
tpmmthomas
58613aee35 Update cw2 debug 2024-11-11 22:41:17 +08:00
tpmmthomas
26364ec94e update cw2 2024-11-11 22:33:32 +08:00
Visual Computing (VICO) Group
98e232af70 Add missing files 2024-11-11 13:00:28 +00:00
Visual Computing (VICO) Group
a404c62b6f Rm cw1 figures 2024-11-11 11:46:48 +00:00
Visual Computing (VICO) Group
45a2df1b11 Update 2024-11-11 11:34:32 +00:00
Visual Computing (VICO) Group
be1f124dff Update 2024-11-11 09:57:57 +00:00
Visual Computing (VICO) Group
9b9a7d50fa Add missing data files 2024-10-14 11:01:45 +01:00
Visual Computing (VICO) Group
5d52a22448 Add missing files 2024-10-14 10:51:43 +01:00
Hakan Bilen
4657cca862 Update README.md 2024-10-14 10:10:17 +01:00
Hakan Bilen
6a17a30da1 Update README.md 2024-10-14 10:08:48 +01:00
Visual Computing (VICO) Group
2fda722e3d Minor update 2024-10-14 10:03:02 +01:00
Visual Computing (VICO) Group
6883eb77c2 Add cw1 2024-10-14 09:56:47 +01:00
tpmmthomas
207595b4a1 update lab 4 2024-10-10 21:52:23 +08:00
tpmmthomas
9f1f3ccd04 Update lab 3 2024-10-03 21:53:33 +08:00
91 changed files with 9924 additions and 1933 deletions

.gitignore vendored
View File

@@ -1,5 +1,6 @@
#dropbox stuff
*.dropbox*
.idea/*
# Byte-compiled / optimized / DLL files
__pycache__/
@@ -25,6 +26,7 @@ var/
*.egg-info/
.installed.cfg
*.egg
*.tar.gz
# PyInstaller
# Usually these files are written by a python script from a template
@@ -59,5 +61,29 @@ docs/_build/
# PyBuilder
target/
# Notebook stuff
# Pycharm
.idea/*
#Notebook stuff
notebooks/.ipynb_checkpoints/
#Google Cloud stuff
/google-cloud-sdk
.ipynb_checkpoints/
data/cifar-100-python/
data/MNIST/
solutions/
report/mlp-cw1-template.aux
report/mlp-cw1-template.out
report/mlp-cw1-template.pdf
report/mlp-cw1-template.synctex.gz
.DS_Store
report/mlp-cw2-template.aux
report/mlp-cw2-template.out
report/mlp-cw2-template.pdf
report/mlp-cw2-template.synctex.gz
report/mlp-cw2-template.bbl
report/mlp-cw2-template.blg
venv
saved_models

README.md
View File

@@ -6,19 +6,10 @@ This assignment-based course is focused on the implementation and evaluation of
The code in this repository is split into:
* a Python package `mlp`, a [NumPy](http://www.numpy.org/) based neural network package designed specifically for the course that students will implement parts of and extend during the course labs and assignments,
* a series of [Jupyter](http://jupyter.org/) notebooks in the `notebooks` directory containing explanatory material and coding exercises to be completed during the course labs.
## Remote working
If you are working remotely, follow this [guide](notes/remote-working-guide.md).
## Getting set up
Detailed instructions for setting up a development environment for the course are given in [this file](notes/environment-set-up.md). Students doing the course will spend part of the first lab getting their own environment set up.
## Exercises
If you are first time users of jupyter notebook, check out `notebooks/00_notebook.ipynb` to understand its features.
To get started with the exercises, go to the `notebooks` directory. For lab 1, work with the notebook starting with the prefix `01`, and so on.
## Coursework 2
This branch contains the Python code and LaTeX files of the second coursework. The code follows the same structure as the labs, in particular the mlp package, and a specific notebook is provided to help you run experiments.
* Detailed instructions are given in MLP2024_25_CW2_Spec.pdf (see Learn, Assessment, CW2).
* The [report directory](https://github.com/VICO-UoE/mlpractical/tree/mlp2024-25/coursework2/report) contains the LaTeX files that you will use to create your report.
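
Following the notes above, a minimal data-loading sketch (an editor's addition, not part of the coursework spec). It assumes the `mlp` package is importable and that the `MLP_DATA_DIR` environment variable points at the directory holding the `emnist-*.npz` files added in this changeset:

```python
# Minimal sketch: fetch one batch from the EMNIST provider added in this changeset.
import os
import numpy as np

os.environ.setdefault('MLP_DATA_DIR', 'data')  # adjust to your local data directory

from mlp.data_providers import EMNISTDataProvider

train_data = EMNISTDataProvider('train', batch_size=100,
                                rng=np.random.RandomState(123456))
inputs_batch, targets_batch = next(iter(train_data))
print(inputs_batch.shape)   # (100, 28, 28, 1) with flatten=False (the default)
print(targets_batch.shape)  # (100, 47) one-of-K coded targets
```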

View File

@@ -0,0 +1,102 @@
train_acc,train_loss,val_acc,val_loss
0.010694736842105264,4.827323,0.024800000000000003,4.5659676
0.03562105263157895,4.3888855,0.0604,4.136276
0.0757684210526316,3.998175,0.09480000000000001,3.8678854
0.10734736842105265,3.784943,0.12159999999999999,3.6687074
0.13741052631578948,3.6023798,0.15439999999999998,3.4829779
0.16888421052631578,3.4196754,0.1864,3.3093607
0.1941263157894737,3.2674048,0.20720000000000002,3.2223148
0.21861052631578948,3.139925,0.22880000000000003,3.1171055
0.24134736842105264,3.0145736,0.24760000000000001,3.0554724
0.26399999999999996,2.9004965,0.2552,2.9390912
0.27898947368421056,2.815607,0.2764,2.9205213
0.29532631578947366,2.7256868,0.2968,2.7410471
0.31138947368421044,2.6567938,0.3016,2.7083752
0.3236842105263158,2.595405,0.322,2.665904
0.33486315789473686,2.5434496,0.3176,2.688214
0.3462526315789474,2.5021079,0.33159999999999995,2.648656
0.35381052631578946,2.4609485,0.342,2.5658453
0.36157894736842106,2.4152951,0.34119999999999995,2.5403407
0.36774736842105266,2.382958,0.3332,2.6936982
0.37753684210526317,2.3510027,0.36160000000000003,2.4663532
0.38597894736842114,2.319616,0.3608,2.4559999
0.3912421052631579,2.294115,0.3732,2.3644555
0.39840000000000003,2.2598042,0.3716,2.4516551
0.4036,2.2318766,0.37439999999999996,2.4189563
0.4105263157894737,2.2035582,0.3772,2.3899698
0.41501052631578944,2.1830406,0.3876,2.3215945
0.4193263157894737,2.158597,0.37800000000000006,2.3831298
0.4211578947368421,2.148888,0.38160000000000005,2.3436418
0.4260842105263159,2.1250536,0.39840000000000003,2.3471045
0.4313684210526315,2.107519,0.4044,2.2744477
0.4370526315789474,2.0837262,0.398,2.245617
0.439642105263158,2.0691078,0.41200000000000003,2.216309
0.4440842105263158,2.046351,0.4096,2.2329648
0.44696842105263157,2.0330904,0.4104,2.1841388
0.4518105263157895,2.0200553,0.4244,2.1780539
0.45298947368421055,2.0069249,0.42719999999999997,2.1625984
0.4602105263157895,1.9896894,0.4204,2.2195568
0.46023157894736844,1.9788533,0.4244,2.1803434
0.46101052631578954,1.9693571,0.4128,2.1858895
0.46774736842105263,1.9547894,0.4204,2.1908271
0.4671157894736842,1.9390026,0.4244,2.1841395
0.4698105263157895,1.924038,0.424,2.1843896
0.4738736842105264,1.9161719,0.43,2.154806
0.47541052631578945,1.9033127,0.4463999999999999,2.1130056
0.48,1.8961077,0.44439999999999996,2.113019
0.48456842105263154,1.8838875,0.43079999999999996,2.1191697
0.4857263157894737,1.8711865,0.44920000000000004,2.1213412
0.4887578947368421,1.8590263,0.44799999999999995,2.1077166
0.49035789473684216,1.8479114,0.4428,2.0737479
0.4908421052631579,1.845268,0.4436,2.07655
0.4939368421052632,1.8336699,0.4548,2.0769904
0.49924210526315793,1.8237538,0.4548,2.061769
0.49677894736842104,1.8111013,0.44240000000000007,2.0676718
0.5008842105263157,1.8031327,0.4548,2.0859065
0.5,1.8026625,0.458,2.0704215
0.5030736842105263,1.792004,0.4596,2.1113508
0.505578947368421,1.7810374,0.45679999999999993,2.0382714
0.5090315789473684,1.7691813,0.4444000000000001,2.0911386
0.512042105263158,1.7633294,0.4616,2.0458508
0.5142736842105263,1.7549652,0.4464,2.0786576
0.5128421052631579,1.7518128,0.4656,2.026332
0.518042105263158,1.7420768,0.46,2.0141299
0.5182315789473684,1.7321203,0.45960000000000006,2.0226884
0.5192842105263158,1.7264535,0.46279999999999993,2.0182638
0.5217894736842105,1.7245325,0.46399999999999997,2.0110855
0.5229684210526316,1.7184331,0.46679999999999994,2.0191038
0.5227578947368421,1.7116771,0.4604,2.0334535
0.5245894736842105,1.7009526,0.4692,2.0072439
0.5262315789473684,1.6991171,0.4700000000000001,2.0296187
0.5278526315789474,1.6958193,0.4708,1.9912667
0.527157894736842,1.6907407,0.4736,2.006095
0.5299578947368421,1.6808176,0.4715999999999999,2.012164
0.5313052631578947,1.676356,0.47239999999999993,1.9955354
0.5338315789473685,1.6731659,0.47839999999999994,2.005768
0.5336000000000001,1.662152,0.4672,2.015392
0.5354736842105263,1.6638054,0.4692,1.9890119
0.5397894736842105,1.6575475,0.4768,2.0090258
0.5386526315789474,1.6595734,0.4824,1.9728817
0.5376631578947368,1.6536722,0.4816,1.9769167
0.5384842105263159,1.6495628,0.47600000000000003,1.9980135
0.5380842105263157,1.6488388,0.478,1.9884782
0.5393473684210528,1.6408547,0.48,1.9772192
0.5415157894736843,1.632917,0.4828,1.9732709
0.5394947368421052,1.6340653,0.4776,1.9623082
0.5429052631578948,1.6340532,0.47759999999999997,1.9812362
0.5452421052631579,1.6246406,0.48119999999999996,1.9846246
0.5436210526315789,1.6288266,0.4864,1.9822198
0.5437684210526316,1.6240481,0.48279999999999995,1.9768158
0.546357894736842,1.6208181,0.4804,1.9625885
0.5485052631578946,1.6164333,0.47839999999999994,1.9738724
0.5466736842105263,1.6169226,0.47800000000000004,1.9842362
0.547621052631579,1.6159856,0.4828,1.9709526
0.5480421052631579,1.6175526,0.48560000000000003,1.967775
0.5468421052631579,1.6149833,0.48119999999999996,1.9626708
0.5493894736842105,1.6063902,0.4835999999999999,1.96621
0.5490736842105263,1.6096952,0.48120000000000007,1.9742922
0.5514736842105264,1.6084315,0.4867999999999999,1.9604725
0.5489263157894737,1.6069487,0.4831999999999999,1.9733659
0.5494947368421053,1.6030664,0.49079999999999996,1.9693874
0.5516842105263158,1.6043342,0.486,1.9647765
0.552442105263158,1.6039867,0.48480000000000006,1.9649359
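
The file above is a plain CSV of per-epoch metrics. A hedged sketch of plotting it follows; the path `result_outputs/summary.csv` is hypothetical, and only the four column names are taken from the file itself:

```python
# Sketch: plot the train/validation curves from a metrics CSV like the one above.
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical path; replace with wherever the experiment wrote its CSV.
metrics = np.genfromtxt('result_outputs/summary.csv', delimiter=',', names=True)
epochs = np.arange(1, len(metrics) + 1)

fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
ax_loss.plot(epochs, metrics['train_loss'], label='train')
ax_loss.plot(epochs, metrics['val_loss'], label='valid')
ax_loss.set_xlabel('epoch')
ax_loss.set_ylabel('loss')
ax_loss.legend()
ax_acc.plot(epochs, metrics['train_acc'], label='train')
ax_acc.plot(epochs, metrics['val_acc'], label='valid')
ax_acc.set_xlabel('epoch')
ax_acc.set_ylabel('accuracy')
ax_acc.legend()
fig.tight_layout()
plt.show()
```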

View File

@@ -0,0 +1,2 @@
test_acc,test_loss
0.49950000000000006,1.9105633

View File

@@ -0,0 +1,101 @@
train_acc,train_loss,val_acc,val_loss
0.009263157894736843,4.8649125,0.0104,4.630689
0.009810526315789474,4.6264124,0.009600000000000001,4.618983
0.009705263157894738,4.621914,0.011200000000000002,4.6184525
0.008989473684210525,4.619472,0.0064,4.6164784
0.009747368421052633,4.6168556,0.0076,4.6138463
0.00951578947368421,4.6156826,0.0108,4.6139345
0.009789473684210525,4.614809,0.008400000000000001,4.6116896
0.009936842105263159,4.613147,0.0104,4.6148276
0.009810526315789474,4.612325,0.0076,4.6123877
0.009094736842105263,4.6117926,0.007200000000000001,4.6149993
0.008421052631578947,4.611283,0.011600000000000001,4.6114736
0.009010526315789472,4.6105323,0.009600000000000001,4.607559
0.009894736842105263,4.6103206,0.008400000000000001,4.6086206
0.00934736842105263,4.6095214,0.011200000000000002,4.6091933
0.009473684210526316,4.6095295,0.008,4.6095695
0.010252631578947369,4.609189,0.0104,4.610459
0.009536842105263158,4.6087623,0.0092,4.6091356
0.00848421052631579,4.6086617,0.009600000000000001,4.609126
0.008421052631578947,4.6083455,0.011200000000000002,4.6088147
0.009410526315789473,4.608145,0.0068000000000000005,4.608519
0.009263157894736843,4.6078997,0.0092,4.6085033
0.009389473684210526,4.607453,0.01,4.6083508
0.008989473684210528,4.6075597,0.008400000000000001,4.6073136
0.009326315789473686,4.607266,0.008,4.6069093
0.01,4.607154,0.0076,4.6069508
0.008778947368421053,4.607089,0.011200000000000002,4.60659
0.009326315789473684,4.606807,0.0068,4.6072598
0.009031578947368422,4.6068263,0.011200000000000002,4.607257
0.008842105263157896,4.6066294,0.008,4.606883
0.008968421052631579,4.606647,0.006400000000000001,4.607275
0.008947368421052631,4.6065364,0.0092,4.606976
0.008842105263157896,4.6064167,0.0076,4.607016
0.008799999999999999,4.606425,0.0096,4.607184
0.009326315789473686,4.606305,0.0072,4.6068683
0.00905263157894737,4.606274,0.0072,4.606982
0.00934736842105263,4.6062336,0.007200000000000001,4.607209
0.009221052631578948,4.606221,0.0076,4.607369
0.009557894736842105,4.60607,0.0076,4.6074376
0.009073684210526317,4.6061006,0.0072,4.607068
0.009242105263157895,4.606005,0.0064,4.6067224
0.009957894736842107,4.605986,0.0072,4.6068263
0.009052631578947368,4.605935,0.0072,4.6067867
0.008694736842105264,4.6059127,0.0064,4.6070905
0.009536842105263158,4.605874,0.006400000000000001,4.606976
0.009663157894736842,4.605872,0.0072,4.6068897
0.008821052631578948,4.6057997,0.0064,4.607028
0.009768421052631579,4.605778,0.0072,4.6069264
0.0092,4.6057644,0.007200000000000001,4.607018
0.008926315789473685,4.6057386,0.0072,4.60698
0.008989473684210525,4.6057277,0.0064,4.6070237
0.009242105263157895,4.6057053,0.0064,4.6069183
0.009094736842105263,4.605692,0.006400000000000001,4.6068764
0.009473684210526316,4.60566,0.0064,4.606909
0.009494736842105262,4.605613,0.0064,4.606978
0.009747368421052631,4.6056285,0.0064,4.606753
0.009789473684210527,4.605578,0.006400000000000001,4.6068797
0.009199999999999998,4.6055675,0.0064,4.606888
0.009073684210526317,4.6055593,0.0064,4.606874
0.008821052631578948,4.6055293,0.006400000000000001,4.606851
0.009326315789473684,4.6055255,0.0064,4.606871
0.009557894736842105,4.6055083,0.006400000000000001,4.606851
0.009600000000000001,4.605491,0.0064,4.6068635
0.00856842105263158,4.605466,0.0064,4.606862
0.009894736842105263,4.605463,0.006400000000000001,4.6068873
0.009494736842105262,4.605441,0.0064,4.6068926
0.008673684210526314,4.6054277,0.0064,4.6068554
0.009221052631578948,4.6054296,0.0063999999999999994,4.6068907
0.008989473684210528,4.605404,0.0064,4.6068807
0.00928421052631579,4.6053905,0.006400000000000001,4.6068707
0.0092,4.6053743,0.0064,4.606894
0.008989473684210525,4.605368,0.0064,4.606845
0.009515789473684212,4.605355,0.0064,4.6068635
0.009073684210526317,4.605352,0.0064,4.6068773
0.009642105263157895,4.6053243,0.0064,4.606883
0.009747368421052633,4.6053176,0.0064,4.6069
0.009873684210526316,4.6053023,0.0064,4.6068873
0.009536842105263156,4.605297,0.0064,4.6068654
0.009515789473684212,4.6052866,0.0064,4.6068883
0.009978947368421053,4.605265,0.006400000000000001,4.606894
0.009957894736842107,4.605259,0.0064,4.6068826
0.009410526315789475,4.6052504,0.0064,4.6068697
0.01002105263157895,4.6052403,0.006400000000000001,4.6068807
0.01002105263157895,4.6052313,0.0064,4.606872
0.00951578947368421,4.605224,0.0064,4.6068883
0.009852631578947368,4.605219,0.006400000000000001,4.606871
0.009894736842105265,4.605209,0.0064,4.606871
0.00922105263157895,4.605204,0.0064,4.6068654
0.010042105263157896,4.605193,0.0064,4.6068764
0.009978947368421053,4.6051874,0.006400000000000001,4.6068697
0.009747368421052633,4.605183,0.0064,4.6068673
0.010189473684210526,4.605178,0.0064,4.606873
0.009789473684210527,4.605173,0.0064,4.6068773
0.009936842105263159,4.605169,0.0064,4.606874
0.010042105263157894,4.605166,0.0064,4.606877
0.009494736842105262,4.6051593,0.0064,4.606874
0.009536842105263158,4.6051593,0.0063999999999999994,4.606874
0.010021052631578946,4.6051564,0.006400000000000001,4.6068716
0.009747368421052631,4.605154,0.0064,4.6068726
0.009642105263157895,4.605153,0.0064,4.606872
0.009305263157894737,4.6051517,0.0064,4.6068726

View File

@@ -0,0 +1,2 @@
test_acc,test_loss
0.01,4.608619

Binary file not shown.

Binary file not shown.

BIN
data/emnist-test.npz Normal file

Binary file not shown.

BIN
data/emnist-train.npz Normal file

Binary file not shown.

BIN
data/emnist-valid.npz Normal file

Binary file not shown.

Binary file not shown.

Binary file not shown.

mlp/__init__.py
View File

@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
"""Machine Learning Practical package."""
__authors__ = ['Pawel Swietojanski', 'Steve Renals', 'Matt Graham']
__authors__ = ['Pawel Swietojanski', 'Steve Renals', 'Matt Graham', 'Antreas Antoniou']
DEFAULT_SEED = 123456 # Default random number generator seed if none provided.

mlp/data_providers.py
View File

@@ -7,8 +7,17 @@ data points.
import pickle
import gzip
import sys
import numpy as np
import os
from PIL import Image
from torch.utils import data
from torch.utils.data import Dataset
from torchvision import transforms
from torchvision.datasets.utils import download_url, check_integrity
from mlp import DEFAULT_SEED
@@ -35,23 +44,54 @@ class DataProvider(object):
"""
self.inputs = inputs
self.targets = targets
self.batch_size = batch_size
assert max_num_batches != 0 and not max_num_batches < -1, (
'max_num_batches should be -1 or > 0')
self.max_num_batches = max_num_batches
if batch_size < 1:
raise ValueError('batch_size must be >= 1')
self._batch_size = batch_size
if max_num_batches == 0 or max_num_batches < -1:
raise ValueError('max_num_batches must be -1 or > 0')
self._max_num_batches = max_num_batches
self._update_num_batches()
self.shuffle_order = shuffle_order
self._current_order = np.arange(inputs.shape[0])
if rng is None:
rng = np.random.RandomState(DEFAULT_SEED)
self.rng = rng
self.new_epoch()
@property
def batch_size(self):
"""Number of data points to include in each batch."""
return self._batch_size
@batch_size.setter
def batch_size(self, value):
if value < 1:
raise ValueError('batch_size must be >= 1')
self._batch_size = value
self._update_num_batches()
@property
def max_num_batches(self):
"""Maximum number of batches to iterate over in an epoch."""
return self._max_num_batches
@max_num_batches.setter
def max_num_batches(self, value):
if value == 0 or value < -1:
raise ValueError('max_num_batches must be -1 or > 0')
self._max_num_batches = value
self._update_num_batches()
def _update_num_batches(self):
"""Updates number of batches to iterate over."""
# maximum possible number of batches is equal to number of whole times
# batch_size divides in to the number of data points which can be
# found using integer division
possible_num_batches = self.inputs.shape[0] // batch_size
possible_num_batches = self.inputs.shape[0] // self.batch_size
if self.max_num_batches == -1:
self.num_batches = possible_num_batches
else:
self.num_batches = min(self.max_num_batches, possible_num_batches)
self.shuffle_order = shuffle_order
if rng is None:
rng = np.random.RandomState(DEFAULT_SEED)
self.rng = rng
self.reset()
def __iter__(self):
"""Implements Python iterator interface.
@@ -63,27 +103,36 @@ class DataProvider(object):
"""
return self
def reset(self):
"""Resets the provider to the initial state to use in a new epoch."""
def new_epoch(self):
"""Starts a new epoch (pass through data), possibly shuffling first."""
self._curr_batch = 0
if self.shuffle_order:
self.shuffle()
def shuffle(self):
"""Randomly shuffles order of data."""
new_order = self.rng.permutation(self.inputs.shape[0])
self.inputs = self.inputs[new_order]
self.targets = self.targets[new_order]
def __next__(self):
return self.next()
def reset(self):
"""Resets the provider to the initial state."""
inv_perm = np.argsort(self._current_order)
self._current_order = self._current_order[inv_perm]
self.inputs = self.inputs[inv_perm]
self.targets = self.targets[inv_perm]
self.new_epoch()
def shuffle(self):
"""Randomly shuffles order of data."""
perm = self.rng.permutation(self.inputs.shape[0])
self._current_order = self._current_order[perm]
self.inputs = self.inputs[perm]
self.targets = self.targets[perm]
def next(self):
"""Returns next data batch or raises `StopIteration` if at end."""
if self._curr_batch + 1 > self.num_batches:
# no more batches in current iteration through data set so reset
# the dataset for another pass and indicate iteration is at end
self.reset()
# no more batches in current iteration through data set so start
# new epoch ready for another pass and indicate iteration is at end
self.new_epoch()
raise StopIteration()
# create an index slice corresponding to current batch number
batch_slice = slice(self._curr_batch * self.batch_size,
@@ -93,7 +142,6 @@ class DataProvider(object):
self._curr_batch += 1
return inputs_batch, targets_batch
class MNISTDataProvider(DataProvider):
"""Data provider for MNIST handwritten digit images."""
@@ -114,7 +162,7 @@ class MNISTDataProvider(DataProvider):
rng (RandomState): A seeded random number generator.
"""
# check a valid which_set was provided
assert which_set in ['train', 'valid', 'eval'], (
assert which_set in ['train', 'valid', 'test'], (
'Expected which_set to be either train, valid or test. '
'Got {0}'.format(which_set)
)
@@ -160,6 +208,78 @@ class MNISTDataProvider(DataProvider):
one_of_k_targets[range(int_targets.shape[0]), int_targets] = 1
return one_of_k_targets
class EMNISTDataProvider(DataProvider):
"""Data provider for EMNIST handwritten digit images."""
def __init__(self, which_set='train', batch_size=100, max_num_batches=-1,
shuffle_order=True, rng=None, flatten=False):
"""Create a new EMNIST data provider object.
Args:
which_set: One of 'train', 'valid' or 'test'. Determines which
portion of the EMNIST data this object should provide.
batch_size (int): Number of data points to include in each batch.
max_num_batches (int): Maximum number of batches to iterate over
in an epoch. If `max_num_batches * batch_size > num_data` then
only as many batches as the data can be split into will be
used. If set to -1 all of the data will be used.
shuffle_order (bool): Whether to randomly permute the order of
the data before each epoch.
rng (RandomState): A seeded random number generator.
"""
# check a valid which_set was provided
assert which_set in ['train', 'valid', 'test'], (
'Expected which_set to be either train, valid or test. '
'Got {0}'.format(which_set)
)
self.which_set = which_set
self.num_classes = 47
# construct path to data using os.path.join to ensure the correct path
# separator for the current platform / OS is used
# MLP_DATA_DIR environment variable should point to the data directory
data_path = os.path.join(
os.environ['MLP_DATA_DIR'], 'emnist-{0}.npz'.format(which_set))
assert os.path.isfile(data_path), (
'Data file does not exist at expected path: ' + data_path
)
# load data from compressed numpy file
loaded = np.load(data_path)
print(loaded.keys())
inputs, targets = loaded['inputs'], loaded['targets']
inputs = inputs.astype(np.float32)
targets = targets.astype(int)  # np.int was removed in recent NumPy releases
if flatten:
inputs = np.reshape(inputs, newshape=(-1, 28*28))
else:
inputs = np.reshape(inputs, newshape=(-1, 28, 28, 1))
inputs = inputs / 255.0
# pass the loaded data to the parent class __init__
super(EMNISTDataProvider, self).__init__(
inputs, targets, batch_size, max_num_batches, shuffle_order, rng)
def next(self):
"""Returns next data batch or raises `StopIteration` if at end."""
inputs_batch, targets_batch = super(EMNISTDataProvider, self).next()
return inputs_batch, self.to_one_of_k(targets_batch)
def to_one_of_k(self, int_targets):
"""Converts integer coded class target to 1 of K coded targets.
Args:
int_targets (ndarray): Array of integer coded class targets (i.e.
where an integer from 0 to `num_classes` - 1 is used to
indicate which is the correct class). This should be of shape
(num_data,).
Returns:
Array of 1 of K coded targets i.e. an array of shape
(num_data, num_classes) where for each row all elements are equal
to zero except for the column corresponding to the correct class
which is equal to one.
"""
one_of_k_targets = np.zeros((int_targets.shape[0], self.num_classes))
one_of_k_targets[range(int_targets.shape[0]), int_targets] = 1
return one_of_k_targets
class MetOfficeDataProvider(DataProvider):
"""South Scotland Met Office weather data provider."""
@@ -253,3 +373,374 @@ class CCPPDataProvider(DataProvider):
targets = loaded[which_set + '_targets']
super(CCPPDataProvider, self).__init__(
inputs, targets, batch_size, max_num_batches, shuffle_order, rng)
class EMNISTPytorchDataProvider(Dataset):
def __init__(self, which_set='train', batch_size=100, max_num_batches=-1,
shuffle_order=True, rng=None, flatten=False, transforms=None):
self.numpy_data_provider = EMNISTDataProvider(which_set=which_set, batch_size=batch_size, max_num_batches=max_num_batches,
shuffle_order=shuffle_order, rng=rng, flatten=flatten)
self.transforms = transforms
def __getitem__(self, item):
x = self.numpy_data_provider.inputs[item]
for augmentation in self.transforms:
x = augmentation(x)
return x, int(self.numpy_data_provider.targets[item])
def __len__(self):
return len(self.numpy_data_provider.targets)
class AugmentedMNISTDataProvider(MNISTDataProvider):
"""Data provider for MNIST dataset which randomly transforms images."""
def __init__(self, which_set='train', batch_size=100, max_num_batches=-1,
shuffle_order=True, rng=None, transformer=None):
"""Create a new augmented MNIST data provider object.
Args:
which_set: One of 'train', 'valid' or 'test'. Determines which
portion of the MNIST data this object should provide.
batch_size (int): Number of data points to include in each batch.
max_num_batches (int): Maximum number of batches to iterate over
in an epoch. If `max_num_batches * batch_size > num_data` then
only as many batches as the data can be split into will be
used. If set to -1 all of the data will be used.
shuffle_order (bool): Whether to randomly permute the order of
the data before each epoch.
rng (RandomState): A seeded random number generator.
transformer: Function which takes an `inputs` array of shape
(batch_size, input_dim) corresponding to a batch of input
images and a `rng` random number generator object (i.e. a
call signature `transformer(inputs, rng)`) and applies a
potentially random set of transformations to some / all of the
input images as each new batch is returned when iterating over
the data provider.
"""
super(AugmentedMNISTDataProvider, self).__init__(
which_set, batch_size, max_num_batches, shuffle_order, rng)
self.transformer = transformer
def next(self):
"""Returns next data batch or raises `StopIteration` if at end."""
inputs_batch, targets_batch = super(
AugmentedMNISTDataProvider, self).next()
transformed_inputs_batch = self.transformer(inputs_batch, self.rng)
return transformed_inputs_batch, targets_batch
class Omniglot(data.Dataset):
"""Omniglot handwritten characters dataset.
Args:
root (string): Root directory of dataset where the ``omniglot_dataset``
directory exists or will be saved to if download is set to True.
set_name (string): One of 'train', 'val' or 'test'. Determines which
split of the data this object provides.
transform (callable, optional): A function/transform that takes in a PIL image
and returns a transformed version, e.g. ``transforms.RandomCrop``
target_transform (callable, optional): A function/transform that takes in the
target and transforms it.
download (bool, optional): If true, downloads the dataset from the internet and
puts it in root directory. If dataset is already downloaded, it is not
downloaded again.
"""
def collect_data_paths(self, root):
data_dict = dict()
print(root)
for subdir, dir, files in os.walk(root):
for file in files:
if file.endswith('.png'):
filepath = os.path.join(subdir, file)
class_label = '_'.join(subdir.split("/")[-2:])
if class_label in data_dict:
data_dict[class_label].append(filepath)
else:
data_dict[class_label] = [filepath]
return data_dict
def __init__(self, root, set_name,
transform=None, target_transform=None,
download=False):
self.root = os.path.expanduser(root)
self.root = os.path.abspath(os.path.join(self.root, 'omniglot_dataset'))
self.transform = transform
self.target_transform = target_transform
self.set_name = set_name # training set or test set
self.data_dict = self.collect_data_paths(root=self.root)
x = []
label_to_idx = {label: idx for idx, label in enumerate(self.data_dict.keys())}
y = []
for key, value in self.data_dict.items():
x.extend(value)
y.extend(len(value) * [label_to_idx[key]])
y = np.array(y)
rng = np.random.RandomState(seed=0)
idx = np.arange(len(x))
rng.shuffle(idx)
x = [x[current_idx] for current_idx in idx]
y = y[idx]
train_sample_idx = rng.choice(a=[i for i in range(len(x))], size=int(len(x) * 0.80), replace=False)
evaluation_sample_idx = [i for i in range(len(x)) if i not in train_sample_idx]
# split the held-out indices into validation and test portions
validation_sample_idx = rng.choice(a=evaluation_sample_idx, size=int(len(evaluation_sample_idx) * 0.40), replace=False)
test_sample_idx = [i for i in evaluation_sample_idx if i not in validation_sample_idx]
if self.set_name=='train':
self.data = [item for idx, item in enumerate(x) if idx in train_sample_idx]
self.labels = y[train_sample_idx]
elif self.set_name=='val':
self.data = [item for idx, item in enumerate(x) if idx in validation_sample_idx]
self.labels = y[validation_sample_idx]
else:
self.data = [item for idx, item in enumerate(x) if idx in test_sample_idx]
self.labels = y[test_sample_idx]
def __getitem__(self, index):
"""
Args:
index (int): Index
Returns:
tuple: (image, target) where target is index of the target class.
"""
img, target = self.data[index], self.labels[index]
img = Image.open(img)
if self.transform is not None:
img = self.transform(img)
if self.target_transform is not None:
target = self.target_transform(target)
return img, target
def __len__(self):
return len(self.data)
def __repr__(self):
fmt_str = 'Dataset ' + self.__class__.__name__ + '\n'
fmt_str += ' Number of datapoints: {}\n'.format(self.__len__())
tmp = self.set_name
fmt_str += ' Split: {}\n'.format(tmp)
fmt_str += ' Root Location: {}\n'.format(self.root)
tmp = ' Transforms (if any): '
fmt_str += '{0}{1}\n'.format(tmp, self.transform.__repr__().replace('\n', '\n' + ' ' * len(tmp)))
tmp = ' Target Transforms (if any): '
fmt_str += '{0}{1}'.format(tmp, self.target_transform.__repr__().replace('\n', '\n' + ' ' * len(tmp)))
return fmt_str
class CIFAR10(data.Dataset):
"""`CIFAR10 <https://www.cs.toronto.edu/~kriz/cifar.html>`_ Dataset.
Args:
root (string): Root directory of dataset where directory
``cifar-10-batches-py`` exists or will be saved to if download is set to True.
set_name (string): One of 'train', 'val' or 'test'. Determines which
split of the data this object provides.
transform (callable, optional): A function/transform that takes in a PIL image
and returns a transformed version, e.g. ``transforms.RandomCrop``
target_transform (callable, optional): A function/transform that takes in the
target and transforms it.
download (bool, optional): If true, downloads the dataset from the internet and
puts it in root directory. If dataset is already downloaded, it is not
downloaded again.
"""
base_folder = 'cifar-10-batches-py'
url = "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
filename = "cifar-10-python.tar.gz"
tgz_md5 = 'c58f30108f718f92721af3b95e74349a'
train_list = [
['data_batch_1', 'c99cafc152244af753f735de768cd75f'],
['data_batch_2', 'd4bba439e000b95fd0a9bffe97cbabec'],
['data_batch_3', '54ebc095f3ab1f0389bbae665268c751'],
['data_batch_4', '634d18415352ddfa80567beed471001a'],
['data_batch_5', '482c414d41f54cd18b22e5b47cb7c3cb'],
]
test_list = [
['test_batch', '40351d587109b95175f43aff81a1287e'],
]
def __init__(self, root, set_name,
transform=None, target_transform=None,
download=False):
self.root = os.path.expanduser(root)
self.transform = transform
self.target_transform = target_transform
self.set_name = set_name # training set or test set
if download:
self.download()
if not self._check_integrity():
raise RuntimeError('Dataset not found or corrupted.' +
' You can use download=True to download it')
# now load the pickled numpy arrays
rng = np.random.RandomState(seed=0)
train_sample_idx = rng.choice(a=[i for i in range(50000)], size=47500, replace=False)
val_sample_idx = [i for i in range(50000) if i not in train_sample_idx]
if self.set_name=='train':
self.data = []
self.labels = []
for fentry in self.train_list:
f = fentry[0]
file = os.path.join(self.root, self.base_folder, f)
fo = open(file, 'rb')
if sys.version_info[0] == 2:
entry = pickle.load(fo)
else:
entry = pickle.load(fo, encoding='latin1')
self.data.append(entry['data'])
if 'labels' in entry:
self.labels += entry['labels']
else:
self.labels += entry['fine_labels']
fo.close()
self.data = np.concatenate(self.data)
self.data = self.data.reshape((50000, 3, 32, 32))
self.data = self.data.transpose((0, 2, 3, 1)) # convert to HWC
self.data = self.data[train_sample_idx]
self.labels = np.array(self.labels)[train_sample_idx]
print(set_name, self.data.shape)
print(set_name, self.labels.shape)
elif self.set_name=='val':
self.data = []
self.labels = []
for fentry in self.train_list:
f = fentry[0]
file = os.path.join(self.root, self.base_folder, f)
fo = open(file, 'rb')
if sys.version_info[0] == 2:
entry = pickle.load(fo)
else:
entry = pickle.load(fo, encoding='latin1')
self.data.append(entry['data'])
if 'labels' in entry:
self.labels += entry['labels']
else:
self.labels += entry['fine_labels']
fo.close()
self.data = np.concatenate(self.data)
self.data = self.data.reshape((50000, 3, 32, 32))
self.data = self.data.transpose((0, 2, 3, 1)) # convert to HWC
self.data = self.data[val_sample_idx]
self.labels = np.array(self.labels)[val_sample_idx]
print(set_name, self.data.shape)
print(set_name, self.labels.shape)
else:
f = self.test_list[0][0]
file = os.path.join(self.root, self.base_folder, f)
fo = open(file, 'rb')
if sys.version_info[0] == 2:
entry = pickle.load(fo)
else:
entry = pickle.load(fo, encoding='latin1')
self.data = entry['data']
if 'labels' in entry:
self.labels = entry['labels']
else:
self.labels = entry['fine_labels']
fo.close()
self.data = self.data.reshape((10000, 3, 32, 32))
self.data = self.data.transpose((0, 2, 3, 1)) # convert to HWC
self.labels = np.array(self.labels)
print(set_name, self.data.shape)
print(set_name, self.labels.shape)
def __getitem__(self, index):
"""
Args:
index (int): Index
Returns:
tuple: (image, target) where target is index of the target class.
"""
img, target = self.data[index], self.labels[index]
# doing this so that it is consistent with all other datasets
# to return a PIL Image
img = Image.fromarray(img)
if self.transform is not None:
img = self.transform(img)
if self.target_transform is not None:
target = self.target_transform(target)
return img, target
def __len__(self):
return len(self.data)
def _check_integrity(self):
root = self.root
for fentry in (self.train_list + self.test_list):
filename, md5 = fentry[0], fentry[1]
fpath = os.path.join(root, self.base_folder, filename)
if not check_integrity(fpath, md5):
return False
return True
def download(self):
import tarfile
if self._check_integrity():
print('Files already downloaded and verified')
return
root = self.root
download_url(self.url, root, self.filename, self.tgz_md5)
# extract file
cwd = os.getcwd()
tar = tarfile.open(os.path.join(root, self.filename), "r:gz")
os.chdir(root)
tar.extractall()
tar.close()
os.chdir(cwd)
def __repr__(self):
fmt_str = 'Dataset ' + self.__class__.__name__ + '\n'
fmt_str += ' Number of datapoints: {}\n'.format(self.__len__())
tmp = self.set_name
fmt_str += ' Split: {}\n'.format(tmp)
fmt_str += ' Root Location: {}\n'.format(self.root)
tmp = ' Transforms (if any): '
fmt_str += '{0}{1}\n'.format(tmp, self.transform.__repr__().replace('\n', '\n' + ' ' * len(tmp)))
tmp = ' Target Transforms (if any): '
fmt_str += '{0}{1}'.format(tmp, self.target_transform.__repr__().replace('\n', '\n' + ' ' * len(tmp)))
return fmt_str
class CIFAR100(CIFAR10):
"""`CIFAR100 <https://www.cs.toronto.edu/~kriz/cifar.html>`_ Dataset.
This is a subclass of the `CIFAR10` Dataset.
"""
base_folder = 'cifar-100-python'
url = "https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz"
filename = "cifar-100-python.tar.gz"
tgz_md5 = 'eb9058c3a382ffc7106e4002c42a8d85'
train_list = [
['train', '16019d7e3df5f24257cddd939b257f8d'],
]
test_list = [
['test', 'f0ef6b0ae62326f3e7ffdfab6717acfc'],
]

mlp/errors.py
View File

@@ -23,10 +23,9 @@ class SumOfSquaredDiffsError(object):
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Scalar error function value.
Scalar cost function value.
"""
#TODO write your code here
raise NotImplementedError()
return 0.5 * np.mean(np.sum((outputs - targets)**2, axis=1))
def grad(self, outputs, targets):
"""Calculates gradient of error function with respect to outputs.
@@ -36,11 +35,142 @@ class SumOfSquaredDiffsError(object):
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Gradient of error function with respect to outputs. This should be
an array of shape (batch_size, output_dim).
Gradient of error function with respect to outputs.
"""
#TODO write your code here
raise NotImplementedError()
return (outputs - targets) / outputs.shape[0]
def __repr__(self):
return 'SumOfSquaredDiffsError'
return 'MeanSquaredErrorCost'
class BinaryCrossEntropyError(object):
"""Binary cross entropy error."""
def __call__(self, outputs, targets):
"""Calculates error function given a batch of outputs and targets.
Args:
outputs: Array of model outputs of shape (batch_size, output_dim).
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Scalar error function value.
"""
return -np.mean(
targets * np.log(outputs) + (1. - targets) * np.log(1. - outputs))
def grad(self, outputs, targets):
"""Calculates gradient of error function with respect to outputs.
Args:
outputs: Array of model outputs of shape (batch_size, output_dim).
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Gradient of error function with respect to outputs.
"""
return ((1. - targets) / (1. - outputs) -
(targets / outputs)) / outputs.shape[0]
def __repr__(self):
return 'BinaryCrossEntropyError'
class BinaryCrossEntropySigmoidError(object):
"""Binary cross entropy error with logistic sigmoid applied to outputs."""
def __call__(self, outputs, targets):
"""Calculates error function given a batch of outputs and targets.
Args:
outputs: Array of model outputs of shape (batch_size, output_dim).
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Scalar error function value.
"""
probs = 1. / (1. + np.exp(-outputs))
return -np.mean(
targets * np.log(probs) + (1. - targets) * np.log(1. - probs))
def grad(self, outputs, targets):
"""Calculates gradient of error function with respect to outputs.
Args:
outputs: Array of model outputs of shape (batch_size, output_dim).
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Gradient of error function with respect to outputs.
"""
probs = 1. / (1. + np.exp(-outputs))
return (probs - targets) / outputs.shape[0]
def __repr__(self):
return 'BinaryCrossEntropySigmoidError'
class CrossEntropyError(object):
"""Multi-class cross entropy error."""
def __call__(self, outputs, targets):
"""Calculates error function given a batch of outputs and targets.
Args:
outputs: Array of model outputs of shape (batch_size, output_dim).
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Scalar error function value.
"""
return -np.mean(np.sum(targets * np.log(outputs), axis=1))
def grad(self, outputs, targets):
"""Calculates gradient of error function with respect to outputs.
Args:
outputs: Array of model outputs of shape (batch_size, output_dim).
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Gradient of error function with respect to outputs.
"""
return -(targets / outputs) / outputs.shape[0]
def __repr__(self):
return 'CrossEntropyError'
class CrossEntropySoftmaxError(object):
"""Multi-class cross entropy error with Softmax applied to outputs."""
def __call__(self, outputs, targets):
"""Calculates error function given a batch of outputs and targets.
Args:
outputs: Array of model outputs of shape (batch_size, output_dim).
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Scalar error function value.
"""
normOutputs = outputs - outputs.max(-1)[:, None]
logProb = normOutputs - np.log(np.sum(np.exp(normOutputs), axis=-1)[:, None])
return -np.mean(np.sum(targets * logProb, axis=1))
def grad(self, outputs, targets):
"""Calculates gradient of error function with respect to outputs.
Args:
outputs: Array of model outputs of shape (batch_size, output_dim).
targets: Array of target outputs of shape (batch_size, output_dim).
Returns:
Gradient of error function with respect to outputs.
"""
probs = np.exp(outputs - outputs.max(-1)[:, None])
probs /= probs.sum(-1)[:, None]
return (probs - targets) / outputs.shape[0]
def __repr__(self):
return 'CrossEntropySoftmaxError'
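
A small numerical check (editor's sketch, assuming the module is importable as `mlp.errors`) that `CrossEntropySoftmaxError` matches applying a softmax explicitly and then using `CrossEntropyError`, and that its gradient is `(softmax(outputs) - targets) / batch_size`:

```python
# Sketch: sanity-check CrossEntropySoftmaxError against an explicit softmax.
import numpy as np
from mlp.errors import CrossEntropyError, CrossEntropySoftmaxError

rng = np.random.RandomState(0)
outputs = rng.normal(size=(4, 5))                # unnormalised logits
targets = np.eye(5)[rng.randint(0, 5, size=4)]   # one-of-K coded targets

probs = np.exp(outputs - outputs.max(-1, keepdims=True))
probs /= probs.sum(-1, keepdims=True)

print(np.allclose(CrossEntropySoftmaxError()(outputs, targets),
                  CrossEntropyError()(probs, targets)))            # True
print(np.allclose(CrossEntropySoftmaxError().grad(outputs, targets),
                  (probs - targets) / outputs.shape[0]))           # True
```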

mlp/initialisers.py
View File

@@ -63,3 +63,81 @@ class NormalInit(object):
def __call__(self, shape):
return self.rng.normal(loc=self.mean, scale=self.std, size=shape)
class GlorotUniformInit(object):
"""Glorot and Bengio (2010) random uniform weights initialiser.
Initialises a two-dimensional parameter array using the 'normalized
initialisation' scheme suggested in [1] which attempts to maintain a
roughly constant variance in the activations and backpropagated gradients
of a multi-layer model consisting of interleaved affine and logistic
sigmoidal transformation layers.
Weights are sampled from a zero-mean uniform distribution with standard
deviation `sqrt(2 / (input_dim + output_dim))` where `input_dim` and
`output_dim` are the input and output dimensions of the weight matrix
respectively.
References:
[1]: Understanding the difficulty of training deep feedforward neural
networks, Glorot and Bengio (2010)
"""
def __init__(self, gain=1., rng=None):
"""Construct a normalised initialisation random initialiser object.
Args:
gain: Multiplicative factor to scale initialised weights by.
Recommended value is 1 for affine layers followed by
logistic sigmoid layers (or another affine layer).
rng (RandomState): Seeded random number generator.
"""
self.gain = gain
if rng is None:
rng = np.random.RandomState(DEFAULT_SEED)
self.rng = rng
def __call__(self, shape):
assert len(shape) == 2, (
'Initialiser should only be used for two dimensional arrays.')
std = self.gain * (2. / (shape[0] + shape[1]))**0.5
half_width = 3.**0.5 * std
return self.rng.uniform(low=-half_width, high=half_width, size=shape)
class GlorotNormalInit(object):
"""Glorot and Bengio (2010) random normal weights initialiser.
Initialises a two-dimensional parameter array using the 'normalized
initialisation' scheme suggested in [1] which attempts to maintain a
roughly constant variance in the activations and backpropagated gradients
of a multi-layer model consisting of interleaved affine and logistic
sigmoidal transformation layers.
Weights are sampled from a zero-mean normal distribution with standard
deviation `sqrt(2 / (input_dim + output_dim))` where `input_dim` and
`output_dim` are the input and output dimensions of the weight matrix
respectively.
References:
[1]: Understanding the difficulty of training deep feedforward neural
networks, Glorot and Bengio (2010)
"""
def __init__(self, gain=1., rng=None):
"""Construct a normalised initialisation random initialiser object.
Args:
gain: Multiplicative factor to scale initialised weights by.
Recommended value is 1 for affine layers followed by
logistic sigmoid layers (or another affine layer).
rng (RandomState): Seeded random number generator.
"""
self.gain = gain
if rng is None:
rng = np.random.RandomState(DEFAULT_SEED)
self.rng = rng
def __call__(self, shape):
std = self.gain * (2. / (shape[0] + shape[1]))**0.5
return self.rng.normal(loc=0., scale=std, size=shape)
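
A quick empirical check (editor's sketch, assuming the module is importable as `mlp.initialisers`) that `GlorotUniformInit` samples have standard deviation close to `sqrt(2 / (input_dim + output_dim))`:

```python
# Sketch: empirical standard deviation of Glorot-uniform initialised weights.
import numpy as np
from mlp.initialisers import GlorotUniformInit

init = GlorotUniformInit(gain=1., rng=np.random.RandomState(123456))
weights = init((200, 300))           # (output_dim, input_dim) ordering as in AffineLayer
expected_std = (2. / (200 + 300)) ** 0.5
print(weights.std(), expected_std)   # the two should be close (~0.063)
```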

mlp/layers.py
View File

@@ -14,6 +14,7 @@ respect to the layer parameters.
import numpy as np
import mlp.initialisers as init
from mlp import DEFAULT_SEED
class Layer(object):
@@ -68,12 +69,154 @@ class LayerWithParameters(Layer):
"""
raise NotImplementedError()
def params_penalty(self):
"""Returns the parameter dependent penalty term for this layer.
If no parameter-dependent penalty terms are set this returns zero.
"""
raise NotImplementedError()
@property
def params(self):
"""Returns a list of parameters of layer.
Returns:
List of current parameter values.
List of current parameter values. This list should be in the
corresponding order to the `values` argument to `set_params`.
"""
raise NotImplementedError()
@params.setter
def params(self, values):
"""Sets layer parameters from a list of values.
Args:
values: List of values to set parameters to. This list should be
in the corresponding order to what is returned by `get_params`.
"""
raise NotImplementedError()
class StochasticLayerWithParameters(Layer):
"""Specialised layer which uses a stochastic forward propagation."""
def __init__(self, rng=None):
"""Constructs a new StochasticLayer object.
Args:
rng (RandomState): Seeded random number generator object.
"""
if rng is None:
rng = np.random.RandomState(DEFAULT_SEED)
self.rng = rng
def fprop(self, inputs, stochastic=True):
"""Forward propagates activations through the layer transformation.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
stochastic: Flag allowing different deterministic
forward-propagation mode in addition to default stochastic
forward-propagation e.g. for use at test time. If False
a deterministic forward-propagation transformation
corresponding to the expected output of the stochastic
forward-propagation is applied.
Returns:
outputs: Array of layer outputs of shape (batch_size, output_dim).
"""
raise NotImplementedError()
def grads_wrt_params(self, inputs, grads_wrt_outputs):
"""Calculates gradients with respect to layer parameters.
Args:
inputs: Array of inputs to layer of shape (batch_size, input_dim).
grads_wrt_outputs: Array of gradients with respect to the layer
outputs of shape (batch_size, output_dim).
Returns:
List of arrays of gradients with respect to the layer parameters
with parameter gradients appearing in same order in tuple as
returned from `get_params` method.
"""
raise NotImplementedError()
def params_penalty(self):
"""Returns the parameter dependent penalty term for this layer.
If no parameter-dependent penalty terms are set this returns zero.
"""
raise NotImplementedError()
@property
def params(self):
"""Returns a list of parameters of layer.
Returns:
List of current parameter values. This list should be in the
corresponding order to the `values` argument to `set_params`.
"""
raise NotImplementedError()
@params.setter
def params(self, values):
"""Sets layer parameters from a list of values.
Args:
values: List of values to set parameters to. This list should be
in the corresponding order to what is returned by `get_params`.
"""
raise NotImplementedError()
class StochasticLayer(Layer):
"""Specialised layer which uses a stochastic forward propagation."""
def __init__(self, rng=None):
"""Constructs a new StochasticLayer object.
Args:
rng (RandomState): Seeded random number generator object.
"""
if rng is None:
rng = np.random.RandomState(DEFAULT_SEED)
self.rng = rng
def fprop(self, inputs, stochastic=True):
"""Forward propagates activations through the layer transformation.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
stochastic: Flag allowing different deterministic
forward-propagation mode in addition to default stochastic
forward-propagation e.g. for use at test time. If False
a deterministic forward-propagation transformation
corresponding to the expected output of the stochastic
forward-propagation is applied.
Returns:
outputs: Array of layer outputs of shape (batch_size, output_dim).
"""
raise NotImplementedError()
def bprop(self, inputs, outputs, grads_wrt_outputs):
"""Back propagates gradients through a layer.
Given gradients with respect to the outputs of the layer calculates the
gradients with respect to the layer inputs. This should correspond to
default stochastic forward-propagation.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
outputs: Array of layer outputs calculated in forward pass of
shape (batch_size, output_dim).
grads_wrt_outputs: Array of gradients with respect to the layer
outputs of shape (batch_size, output_dim).
Returns:
Array of gradients with respect to the layer inputs of shape
(batch_size, input_dim).
"""
raise NotImplementedError()
@ -87,7 +230,7 @@ class AffineLayer(LayerWithParameters):
def __init__(self, input_dim, output_dim,
weights_initialiser=init.UniformInit(-0.1, 0.1),
biases_initialiser=init.ConstantInit(0.),
weights_cost=None, biases_cost=None):
weights_penalty=None, biases_penalty=None):
"""Initialises a parameterised affine layer.
Args:
@ -95,11 +238,17 @@ class AffineLayer(LayerWithParameters):
output_dim (int): Dimension of the layer outputs.
weights_initialiser: Initialiser for the weight parameters.
biases_initialiser: Initialiser for the bias parameters.
weights_penalty: Weights-dependent penalty term (regulariser) or
None if no regularisation is to be applied to the weights.
biases_penalty: Biases-dependent penalty term (regulariser) or
None if no regularisation is to be applied to the biases.
"""
self.input_dim = input_dim
self.output_dim = output_dim
self.weights = weights_initialiser((self.output_dim, self.input_dim))
self.biases = biases_initialiser(self.output_dim)
self.weights_penalty = weights_penalty
self.biases_penalty = biases_penalty
def fprop(self, inputs):
"""Forward propagates activations through the layer transformation.
@ -113,8 +262,26 @@ class AffineLayer(LayerWithParameters):
Returns:
outputs: Array of layer outputs of shape (batch_size, output_dim).
"""
#TODO write your code here
raise NotImplementedError()
return self.weights.dot(inputs.T).T + self.biases
def bprop(self, inputs, outputs, grads_wrt_outputs):
"""Back propagates gradients through a layer.
Given gradients with respect to the outputs of the layer calculates the
gradients with respect to the layer inputs.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
outputs: Array of layer outputs calculated in forward pass of
shape (batch_size, output_dim).
grads_wrt_outputs: Array of gradients with respect to the layer
outputs of shape (batch_size, output_dim).
Returns:
Array of gradients with respect to the layer inputs of shape
(batch_size, input_dim).
"""
return grads_wrt_outputs.dot(self.weights)
def grads_wrt_params(self, inputs, grads_wrt_outputs):
"""Calculates gradients with respect to layer parameters.
@ -128,14 +295,530 @@ class AffineLayer(LayerWithParameters):
list of arrays of gradients with respect to the layer parameters
`[grads_wrt_weights, grads_wrt_biases]`.
"""
#TODO write your code here
raise NotImplementedError()
grads_wrt_weights = np.dot(grads_wrt_outputs.T, inputs)
grads_wrt_biases = np.sum(grads_wrt_outputs, axis=0)
if self.weights_penalty is not None:
grads_wrt_weights += self.weights_penalty.grad(parameter=self.weights)
if self.biases_penalty is not None:
grads_wrt_biases += self.biases_penalty.grad(parameter=self.biases)
return [grads_wrt_weights, grads_wrt_biases]
def params_penalty(self):
"""Returns the parameter dependent penalty term for this layer.
If no parameter-dependent penalty terms are set this returns zero.
"""
params_penalty = 0
if self.weights_penalty is not None:
params_penalty += self.weights_penalty(self.weights)
if self.biases_penalty is not None:
params_penalty += self.biases_penalty(self.biases)
return params_penalty
@property
def params(self):
"""A list of layer parameter values: `[weights, biases]`."""
return [self.weights, self.biases]
@params.setter
def params(self, values):
self.weights = values[0]
self.biases = values[1]
def __repr__(self):
return 'AffineLayer(input_dim={0}, output_dim={1})'.format(
self.input_dim, self.output_dim)
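# A short usage sketch of AffineLayer (assuming, as elsewhere in the package,
# that it is importable from mlp.layers): forward propagate a random batch,
# then back propagate a dummy gradient of the same shape as the outputs.
import numpy as np
from mlp.layers import AffineLayer

rng = np.random.RandomState(27092016)
layer = AffineLayer(input_dim=4, output_dim=3)
inputs = rng.normal(size=(10, 4))              # batch of 10 four-dim inputs
outputs = layer.fprop(inputs)                  # shape (10, 3)
grads_wrt_outputs = np.ones_like(outputs)      # placeholder upstream grads
grads_wrt_inputs = layer.bprop(inputs, outputs, grads_wrt_outputs)
grads_wrt_weights, grads_wrt_biases = layer.grads_wrt_params(
    inputs, grads_wrt_outputs)
print(outputs.shape, grads_wrt_inputs.shape,
      grads_wrt_weights.shape, grads_wrt_biases.shape)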
class SigmoidLayer(Layer):
"""Layer implementing an element-wise logistic sigmoid transformation."""
def fprop(self, inputs):
"""Forward propagates activations through the layer transformation.
For inputs `x` and outputs `y` this corresponds to
`y = 1 / (1 + exp(-x))`.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
Returns:
outputs: Array of layer outputs of shape (batch_size, output_dim).
"""
return 1. / (1. + np.exp(-inputs))
def bprop(self, inputs, outputs, grads_wrt_outputs):
"""Back propagates gradients through a layer.
Given gradients with respect to the outputs of the layer calculates the
gradients with respect to the layer inputs.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
outputs: Array of layer outputs calculated in forward pass of
shape (batch_size, output_dim).
grads_wrt_outputs: Array of gradients with respect to the layer
outputs of shape (batch_size, output_dim).
Returns:
Array of gradients with respect to the layer inputs of shape
(batch_size, input_dim).
"""
return grads_wrt_outputs * outputs * (1. - outputs)
def __repr__(self):
return 'SigmoidLayer'
class ConvolutionalLayer(LayerWithParameters):
"""Layer implementing a 2D convolution-based transformation of its inputs.
The layer is parameterised by a set of 2D convolutional kernels, a four
dimensional array of shape
(num_output_channels, num_input_channels, kernel_height, kernel_width)
and a bias vector, a one dimensional array of shape
(num_output_channels,)
i.e. one shared bias per output channel.
Assuming no padding is applied to the inputs so that outputs are only
calculated for positions where the kernel filters fully overlap with the
inputs, and that unit strides are used the outputs will have spatial extent
output_height = input_height - kernel_height + 1
output_width = input_width - kernel_width + 1
"""
def __init__(self, num_input_channels, num_output_channels,
input_height, input_width,
kernel_height, kernel_width,
kernels_init=init.UniformInit(-0.01, 0.01),
biases_init=init.ConstantInit(0.),
kernels_penalty=None, biases_penalty=None):
"""Initialises a parameterised convolutional layer.
Args:
num_input_channels (int): Number of channels in inputs to
layer (this may be number of colour channels in the input
images if used as the first layer in a model, or the
number of output channels, a.k.a. feature maps, from a
previous convolutional layer).
num_output_channels (int): Number of channels in outputs
from the layer, a.k.a. number of feature maps.
input_height (int): Size of first input dimension of each 2D
channel of inputs.
input_width (int): Size of second input dimension of each 2D
channel of inputs.
kernel_height (int): Size of first dimension of each 2D channel of
kernels.
kernel_width (int): Size of second dimension of each 2D channel of
kernels.
kernels_init: Initialiser for the kernel parameters.
biases_init: Initialiser for the bias parameters.
kernels_penalty: Kernel-dependent penalty term (regulariser) or
None if no regularisation is to be applied to the kernels.
biases_penalty: Biases-dependent penalty term (regulariser) or
None if no regularisation is to be applied to the biases.
"""
self.num_input_channels = num_input_channels
self.num_output_channels = num_output_channels
self.input_height = input_height
self.input_width = input_width
self.kernel_height = kernel_height
self.kernel_width = kernel_width
self.kernels_init = kernels_init
self.biases_init = biases_init
self.kernels_shape = (
num_output_channels, num_input_channels, kernel_height, kernel_width
)
self.inputs_shape = (
None, num_input_channels, input_height, input_width
)
self.kernels = self.kernels_init(self.kernels_shape)
self.biases = self.biases_init(num_output_channels)
self.kernels_penalty = kernels_penalty
self.biases_penalty = biases_penalty
self.cache = None
def fprop(self, inputs):
"""Forward propagates activations through the layer transformation.
For inputs `x`, outputs `y`, kernels `K` and biases `b` the layer
corresponds to `y = conv2d(x, K) + b`.
Args:
inputs: Array of layer inputs of shape (batch_size, num_input_channels, image_height, image_width).
Returns:
outputs: Array of layer outputs of shape (batch_size, num_output_channels, output_height, output_width).
"""
raise NotImplementedError
def bprop(self, inputs, outputs, grads_wrt_outputs):
"""Back propagates gradients through a layer.
Given gradients with respect to the outputs of the layer calculates the
gradients with respect to the layer inputs.
Args:
inputs: Array of layer inputs of shape
(batch_size, num_input_channels, input_height, input_width).
outputs: Array of layer outputs calculated in forward pass of
shape
(batch_size, num_output_channels, output_height, output_width).
grads_wrt_outputs: Array of gradients with respect to the layer
outputs of shape
(batch_size, num_output_channels, output_height, output_width).
Returns:
Array of gradients with respect to the layer inputs of shape
(batch_size, num_input_channels, input_height, input_width).
"""
# Pad the grads_wrt_outputs
raise NotImplementedError
def grads_wrt_params(self, inputs, grads_wrt_outputs):
"""Calculates gradients with respect to layer parameters.
Args:
inputs: array of inputs to layer of shape (batch_size, input_dim)
grads_wrt_outputs: array of gradients with respect to the layer
outputs of shape
(batch_size, num_output_channels, output_height, output_width).
Returns:
list of arrays of gradients with respect to the layer parameters
`[grads_wrt_kernels, grads_wrt_biases]`.
"""
# Get inputs_col from previous fprop
raise NotImplementedError
def params_penalty(self):
"""Returns the parameter dependent penalty term for this layer.
If no parameter-dependent penalty terms are set this returns zero.
"""
params_penalty = 0
if self.kernels_penalty is not None:
params_penalty += self.kernels_penalty(self.kernels)
if self.biases_penalty is not None:
params_penalty += self.biases_penalty(self.biases)
return params_penalty
@property
def params(self):
"""A list of layer parameter values: `[kernels, biases]`."""
return [self.kernels, self.biases]
@params.setter
def params(self, values):
self.kernels = values[0]
self.biases = values[1]
def __repr__(self):
return (
'ConvolutionalLayer(\n'
' num_input_channels={0}, num_output_channels={1},\n'
' input_height={2}, input_width={3},\n'
' kernel_height={4}, kernel_width={5}\n'
')'
.format(self.num_input_channels, self.num_output_channels,
self.input_height, self.input_width, self.kernel_height,
self.kernel_width)
)
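# The fprop / bprop / grads_wrt_params methods above are deliberately left
# unimplemented (they form part of the coursework), so this sketch only
# constructs a layer and checks the parameter shapes and the output spatial
# extent implied by the docstring formulae (assuming the class is exposed
# from mlp.layers):
from mlp.layers import ConvolutionalLayer

conv = ConvolutionalLayer(
    num_input_channels=1, num_output_channels=8,
    input_height=28, input_width=28,
    kernel_height=5, kernel_width=5)
print(conv.kernels.shape)  # (8, 1, 5, 5)
print(conv.biases.shape)   # (8,)
# For unit strides and no padding the output spatial extent is
# (input_height - kernel_height + 1, input_width - kernel_width + 1):
print(28 - 5 + 1, 28 - 5 + 1)  # 24 24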
class ReluLayer(Layer):
"""Layer implementing an element-wise rectified linear transformation."""
def fprop(self, inputs):
"""Forward propagates activations through the layer transformation.
For inputs `x` and outputs `y` this corresponds to `y = max(0, x)`.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
Returns:
outputs: Array of layer outputs of shape (batch_size, output_dim).
"""
return np.maximum(inputs, 0.)
def bprop(self, inputs, outputs, grads_wrt_outputs):
"""Back propagates gradients through a layer.
Given gradients with respect to the outputs of the layer calculates the
gradients with respect to the layer inputs.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
outputs: Array of layer outputs calculated in forward pass of
shape (batch_size, output_dim).
grads_wrt_outputs: Array of gradients with respect to the layer
outputs of shape (batch_size, output_dim).
Returns:
Array of gradients with respect to the layer inputs of shape
(batch_size, input_dim).
"""
return (outputs > 0) * grads_wrt_outputs
def __repr__(self):
return 'ReluLayer'
class TanhLayer(Layer):
"""Layer implementing an element-wise hyperbolic tangent transformation."""
def fprop(self, inputs):
"""Forward propagates activations through the layer transformation.
For inputs `x` and outputs `y` this corresponds to `y = tanh(x)`.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
Returns:
outputs: Array of layer outputs of shape (batch_size, output_dim).
"""
return np.tanh(inputs)
def bprop(self, inputs, outputs, grads_wrt_outputs):
"""Back propagates gradients through a layer.
Given gradients with respect to the outputs of the layer calculates the
gradients with respect to the layer inputs.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
outputs: Array of layer outputs calculated in forward pass of
shape (batch_size, output_dim).
grads_wrt_outputs: Array of gradients with respect to the layer
outputs of shape (batch_size, output_dim).
Returns:
Array of gradients with respect to the layer inputs of shape
(batch_size, input_dim).
"""
return (1. - outputs ** 2) * grads_wrt_outputs
def __repr__(self):
return 'TanhLayer'
class SoftmaxLayer(Layer):
"""Layer implementing a softmax transformation."""
def fprop(self, inputs):
"""Forward propagates activations through the layer transformation.
For inputs `x` and outputs `y` this corresponds to
`y = exp(x) / sum(exp(x))`.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
Returns:
outputs: Array of layer outputs of shape (batch_size, output_dim).
"""
# subtract max inside exponential to improve numerical stability -
# when we divide through by sum this term cancels
exp_inputs = np.exp(inputs - inputs.max(-1)[:, None])
return exp_inputs / exp_inputs.sum(-1)[:, None]
def bprop(self, inputs, outputs, grads_wrt_outputs):
"""Back propagates gradients through a layer.
Given gradients with respect to the outputs of the layer calculates the
gradients with respect to the layer inputs.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
outputs: Array of layer outputs calculated in forward pass of
shape (batch_size, output_dim).
grads_wrt_outputs: Array of gradients with respect to the layer
outputs of shape (batch_size, output_dim).
Returns:
Array of gradients with respect to the layer inputs of shape
(batch_size, input_dim).
"""
return (outputs * (grads_wrt_outputs -
(grads_wrt_outputs * outputs).sum(-1)[:, None]))
def __repr__(self):
return 'SoftmaxLayer'
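# A small numerical sketch checking the bprop expression above against the
# explicit softmax Jacobian J[i, j] = y[i] * ((i == j) - y[j]) for a single
# input vector:
import numpy as np
from mlp.layers import SoftmaxLayer

rng = np.random.RandomState(0)
layer = SoftmaxLayer()
inputs = rng.normal(size=(1, 5))
outputs = layer.fprop(inputs)
grads_wrt_outputs = rng.normal(size=(1, 5))
y = outputs[0]
jacobian = np.diag(y) - np.outer(y, y)
explicit = grads_wrt_outputs[0].dot(jacobian)
via_bprop = layer.bprop(inputs, outputs, grads_wrt_outputs)[0]
print(np.allclose(explicit, via_bprop))  # expected: True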
class RadialBasisFunctionLayer(Layer):
"""Layer implementing projection to a grid of radial basis functions."""
def __init__(self, grid_dim, intervals=[[0., 1.]]):
"""Creates a radial basis function layer object.
Args:
grid_dim: Integer specifying how many basis function to use in
grid across input space per dimension (so total number of
basis functions will be grid_dim**input_dim)
intervals: List of intervals (two element lists or tuples)
specifying extents of axis-aligned region in input-space to
tile basis functions in grid across. For example for a 2D input
space spanning [0, 1] x [0, 1] use intervals=[[0, 1], [0, 1]].
"""
self.grid_dim = grid_dim  # stored so __repr__ below can report it
num_basis = grid_dim ** len(intervals)
self.centres = np.array(np.meshgrid(*[
np.linspace(low, high, grid_dim) for (low, high) in intervals])
).reshape((len(intervals), -1))
self.scales = np.array([
[(high - low) * 1. / grid_dim] for (low, high) in intervals])
def fprop(self, inputs):
"""Forward propagates activations through the layer transformation.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
Returns:
outputs: Array of layer outputs of shape (batch_size, output_dim).
"""
return np.exp(-(inputs[..., None] - self.centres[None, ...]) ** 2 /
self.scales ** 2).reshape((inputs.shape[0], -1))
def bprop(self, inputs, outputs, grads_wrt_outputs):
"""Back propagates gradients through a layer.
Given gradients with respect to the outputs of the layer calculates the
gradients with respect to the layer inputs.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
outputs: Array of layer outputs calculated in forward pass of
shape (batch_size, output_dim).
grads_wrt_outputs: Array of gradients with respect to the layer
outputs of shape (batch_size, output_dim).
Returns:
Array of gradients with respect to the layer inputs of shape
(batch_size, input_dim).
"""
num_basis = self.centres.shape[1]
return -2 * (
((inputs[..., None] - self.centres[None, ...]) / self.scales ** 2) *
grads_wrt_outputs.reshape((inputs.shape[0], -1, num_basis))
).sum(-1)
def __repr__(self):
return 'RadialBasisFunctionLayer(grid_dim={0})'.format(self.grid_dim)
class DropoutLayer(StochasticLayer):
"""Layer which stochastically drops input dimensions in its output."""
def __init__(self, rng=None, incl_prob=0.5, share_across_batch=True):
"""Construct a new dropout layer.
Args:
rng (RandomState): Seeded random number generator.
incl_prob: Scalar value in (0, 1] specifying the probability of
each input dimension being included in the output.
share_across_batch: Whether to use same dropout mask across
all inputs in a batch or use per input masks.
"""
super(DropoutLayer, self).__init__(rng)
assert incl_prob > 0. and incl_prob <= 1.
self.incl_prob = incl_prob
self.share_across_batch = share_across_batch
def fprop(self, inputs, stochastic=True):
"""Forward propagates activations through the layer transformation.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
stochastic: Flag allowing different deterministic
forward-propagation mode in addition to default stochastic
forward-propagation e.g. for use at test time. If False
a deterministic forward-propagation transformation
corresponding to the expected output of the stochastic
forward-propagation is applied.
Returns:
outputs: Array of layer outputs of shape (batch_size, output_dim).
"""
if stochastic:
mask_shape = (1,) + inputs.shape[1:] if self.share_across_batch else inputs.shape
self._mask = (self.rng.uniform(size=mask_shape) < self.incl_prob)
return inputs * self._mask
else:
return inputs * self.incl_prob
def bprop(self, inputs, outputs, grads_wrt_outputs):
"""Back propagates gradients through a layer.
Given gradients with respect to the outputs of the layer calculates the
gradients with respect to the layer inputs. This should correspond to
default stochastic forward-propagation.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
outputs: Array of layer outputs calculated in forward pass of
shape (batch_size, output_dim).
grads_wrt_outputs: Array of gradients with respect to the layer
outputs of shape (batch_size, output_dim).
Returns:
Array of gradients with respect to the layer inputs of shape
(batch_size, input_dim).
"""
return grads_wrt_outputs * self._mask
def __repr__(self):
return 'DropoutLayer(incl_prob={0:.1f})'.format(self.incl_prob)
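# A usage sketch of DropoutLayer contrasting the stochastic forward pass
# (random binary inclusion mask) with the deterministic one used at
# evaluation time (inputs scaled by the inclusion probability):
import numpy as np
from mlp.layers import DropoutLayer

rng = np.random.RandomState(31102016)
layer = DropoutLayer(rng=rng, incl_prob=0.5)
inputs = np.ones((4, 6))
print(layer.fprop(inputs, stochastic=True))   # entries randomly zeroed
print(layer.fprop(inputs, stochastic=False))  # every entry equal to 0.5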
class ReshapeLayer(Layer):
"""Layer which reshapes dimensions of inputs."""
def __init__(self, output_shape=None):
"""Create a new reshape layer object.
Args:
output_shape: Tuple specifying shape each input in batch should
be reshaped to in outputs. This **excludes** the batch size
so the shape of the final output array will be
(batch_size, ) + output_shape
Similarly to numpy.reshape, one shape dimension can be -1. In
this case, the value is inferred from the size of the input
array and remaining dimensions. The shape specified must be
compatible with the input array shape - i.e. the total number
of values in the array cannot be changed. If set to `None` the
output shape will be set to
(batch_size, -1)
which will flatten all the inputs to vectors.
"""
self.output_shape = (-1,) if output_shape is None else output_shape
def fprop(self, inputs):
"""Forward propagates activations through the layer transformation.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
Returns:
outputs: Array of layer outputs of shape (batch_size, output_dim).
"""
return inputs.reshape((inputs.shape[0],) + self.output_shape)
def bprop(self, inputs, outputs, grads_wrt_outputs):
"""Back propagates gradients through a layer.
Given gradients with respect to the outputs of the layer calculates the
gradients with respect to the layer inputs.
Args:
inputs: Array of layer inputs of shape (batch_size, input_dim).
outputs: Array of layer outputs calculated in forward pass of
shape (batch_size, output_dim).
grads_wrt_outputs: Array of gradients with respect to the layer
outputs of shape (batch_size, output_dim).
Returns:
Array of gradients with respect to the layer inputs of shape
(batch_size, input_dim).
"""
return grads_wrt_outputs.reshape(inputs.shape)
def __repr__(self):
return 'ReshapeLayer(output_shape={0})'.format(self.output_shape)
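# A usage sketch of ReshapeLayer, e.g. flattening image-shaped inputs before
# an affine layer and mapping back again:
import numpy as np
from mlp.layers import ReshapeLayer

inputs = np.zeros((50, 1, 28, 28))  # batch of 50 single-channel images
flatten = ReshapeLayer()            # default output_shape flattens each input
flat = flatten.fprop(inputs)
print(flat.shape)                   # (50, 784)
unflatten = ReshapeLayer(output_shape=(1, 28, 28))
print(unflatten.fprop(flat).shape)  # (50, 1, 28, 28)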


@ -160,3 +160,229 @@ class MomentumLearningRule(GradientDescentLearningRule):
mom *= self.mom_coeff
mom -= self.learning_rate * grad
param += mom
class AdamLearningRule(GradientDescentLearningRule):
"""Adaptive moments (Adam) learning rule.
First-order gradient-descent based learning rule which uses adaptive
estimates of first and second moments of the parameter gradients to
calculate the parameter updates.
References:
[1]: Adam: a method for stochastic optimisation
Kingma and Ba, 2015
"""
def __init__(self, learning_rate=1e-3, beta_1=0.9, beta_2=0.999,
epsilon=1e-8):
"""Creates a new learning rule object.
Args:
learning_rate: A positive scalar to scale gradient updates to the
parameters by. This needs to be carefully set - if too large
the learning dynamic will be unstable and may diverge, while
if set too small learning will proceed very slowly.
beta_1: Exponential decay rate for gradient first moment estimates.
This should be a scalar value in [0, 1]. The running gradient
first moment estimate is calculated using
`m_1 = beta_1 * m_1_prev + (1 - beta_1) * g`
where `m_1_prev` is the previous estimate and `g` the current
parameter gradients.
beta_2: Exponential decay rate for gradient second moment
estimates. This should be a scalar value in [0, 1]. The running
gradient second moment estimate is calculated using
`m_2 = beta_2 * m_2_prev + (1 - beta_2) * g**2`
where `m_2_prev` is the previous estimate and `g` the current
parameter gradients.
epsilon: 'Softening' parameter to stop updates diverging when
second moment estimates are close to zero. Should be set to
a small positive value.
"""
super(AdamLearningRule, self).__init__(learning_rate)
assert beta_1 >= 0. and beta_1 <= 1., 'beta_1 should be in [0, 1].'
assert beta_2 >= 0. and beta_2 <= 1., 'beta_2 should be in [0, 1].'
assert epsilon > 0., 'epsilon should be > 0.'
self.beta_1 = beta_1
self.beta_2 = beta_2
self.epsilon = epsilon
def initialise(self, params):
"""Initialises the state of the learning rule for a set of parameters.
This must be called before `update_params` is first called.
Args:
params: A list of the parameters to be optimised. Note these will
be updated *in-place* to avoid reallocating arrays on each
update.
"""
super(AdamLearningRule, self).initialise(params)
self.moms_1 = []
for param in self.params:
self.moms_1.append(np.zeros_like(param))
self.moms_2 = []
for param in self.params:
self.moms_2.append(np.zeros_like(param))
self.step_count = 0
def reset(self):
"""Resets any additional state variables to their initial values.
For this learning rule this corresponds to zeroing the estimates of
the first and second moments of the gradients.
"""
for mom_1, mom_2 in zip(self.moms_1, self.moms_2):
mom_1 *= 0.
mom_2 *= 0.
self.step_count = 0
def update_params(self, grads_wrt_params):
"""Applies a single update to all parameters.
All parameter updates are performed using in-place operations and so
nothing is returned.
Args:
grads_wrt_params: A list of gradients of the scalar loss function
with respect to each of the parameters passed to `initialise`
previously, with this list expected to be in the same order.
"""
for param, mom_1, mom_2, grad in zip(
self.params, self.moms_1, self.moms_2, grads_wrt_params):
mom_1 *= self.beta_1
mom_1 += (1. - self.beta_1) * grad
mom_2 *= self.beta_2
mom_2 += (1. - self.beta_2) * grad ** 2
alpha_t = (
self.learning_rate *
(1. - self.beta_2 ** (self.step_count + 1)) ** 0.5 /
(1. - self.beta_1 ** (self.step_count + 1))
)
param -= alpha_t * mom_1 / (mom_2 ** 0.5 + self.epsilon)
self.step_count += 1
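# A minimal sketch of driving AdamLearningRule by hand on a single quadratic
# objective 0.5 * param ** 2 (the module path mlp.learning_rules is an
# assumption based on the rest of the package):
import numpy as np
from mlp.learning_rules import AdamLearningRule

param = np.array([5.0])
learning_rule = AdamLearningRule(learning_rate=0.1)
learning_rule.initialise([param])      # parameters are updated in-place
for step in range(200):
    grads_wrt_params = [param.copy()]  # gradient of 0.5 * param**2 is param
    learning_rule.update_params(grads_wrt_params)
print(param)                           # should end up close to zero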
class AdaGradLearningRule(GradientDescentLearningRule):
"""Adaptive gradients (AdaGrad) learning rule.
First-order gradient-descent based learning rule which normalises gradient
updates by a running sum of the past squared gradients.
References:
[1]: Adaptive Subgradient Methods for Online Learning and Stochastic
Optimization. Duchi, Hazan and Singer, 2011
"""
def __init__(self, learning_rate=1e-2, epsilon=1e-8):
"""Creates a new learning rule object.
Args:
learning_rate: A positive scalar to scale gradient updates to the
parameters by. This needs to be carefully set - if too large
the learning dynamic will be unstable and may diverge, while
if set too small learning will proceed very slowly.
epsilon: 'Softening' parameter to stop updates diverging when
sums of squared gradients are close to zero. Should be set to
a small positive value.
"""
super(AdaGradLearningRule, self).__init__(learning_rate)
assert epsilon > 0., 'epsilon should be > 0.'
self.epsilon = epsilon
def initialise(self, params):
"""Initialises the state of the learning rule for a set of parameters.
This must be called before `update_params` is first called.
Args:
params: A list of the parameters to be optimised. Note these will
be updated *in-place* to avoid reallocating arrays on each
update.
"""
super(AdaGradLearningRule, self).initialise(params)
self.sum_sq_grads = []
for param in self.params:
self.sum_sq_grads.append(np.zeros_like(param))
def reset(self):
"""Resets any additional state variables to their initial values.
For this learning rule this corresponds to zeroing all the sum of
squared gradient states.
"""
for sum_sq_grad in self.sum_sq_grads:
sum_sq_grad *= 0.
def update_params(self, grads_wrt_params):
"""Applies a single update to all parameters.
All parameter updates are performed using in-place operations and so
nothing is returned.
Args:
grads_wrt_params: A list of gradients of the scalar loss function
with respect to each of the parameters passed to `initialise`
previously, with this list expected to be in the same order.
"""
for param, sum_sq_grad, grad in zip(
self.params, self.sum_sq_grads, grads_wrt_params):
sum_sq_grad += grad ** 2
param -= (self.learning_rate * grad /
(sum_sq_grad + self.epsilon) ** 0.5)
class RMSPropLearningRule(GradientDescentLearningRule):
"""Root mean squared gradient normalised learning rule (RMSProp).
First-order gradient-descent based learning rule which normalises gradient
updates by an exponentially smoothed estimate of the gradient second
moments.
References:
[1]: Neural Networks for Machine Learning: Lecture 6a slides
University of Toronto, Computer Science Course CSC321
http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
"""
def __init__(self, learning_rate=1e-3, beta=0.9, epsilon=1e-8):
"""Creates a new learning rule object.
Args:
learning_rate: A positive scalar to scale gradient updates to the
parameters by. This needs to be carefully set - if too large
the learning dynamic will be unstable and may diverge, while
if set too small learning will proceed very slowly.
beta: Exponential decay rate for gradient second moment
estimates. This should be a scalar value in [0, 1]. The running
gradient second moment estimate is calculated using
`m_2 = beta * m_2_prev + (1 - beta) * g**2`
where `m_2_prev` is the previous estimate and `g` the current
parameter gradients.
epsilon: 'Softening' parameter to stop updates diverging when
gradient second moment estimates are close to zero. Should be
set to a small positive value.
"""
super(RMSPropLearningRule, self).__init__(learning_rate)
assert beta >= 0. and beta <= 1., 'beta should be in [0, 1].'
assert epsilon > 0., 'epsilon should be > 0.'
self.beta = beta
self.epsilon = epsilon
def initialise(self, params):
"""Initialises the state of the learning rule for a set of parameters.
This must be called before `update_params` is first called.
Args:
params: A list of the parameters to be optimised. Note these will
be updated *in-place* to avoid reallocating arrays on each
update.
"""
super(RMSPropLearningRule, self).initialise(params)
self.moms_2 = []
for param in self.params:
self.moms_2.append(np.zeros_like(param))
def reset(self):
"""Resets any additional state variables to their initial values.
For this learning rule this corresponds to zeroing all gradient
second moment estimates.
"""
for mom_2 in self.moms_2:
mom_2 *= 0.
def update_params(self, grads_wrt_params):
"""Applies a single update to all parameters.
All parameter updates are performed using in-place operations and so
nothing is returned.
Args:
grads_wrt_params: A list of gradients of the scalar loss function
with respect to each of the parameters passed to `initialise`
previously, with this list expected to be in the same order.
"""
for param, mom_2, grad in zip(
self.params, self.moms_2, grads_wrt_params):
mom_2 *= self.beta
mom_2 += (1. - self.beta) * grad ** 2
param -= (self.learning_rate * grad /
(mom_2 + self.epsilon) ** 0.5)


@ -8,7 +8,7 @@ outputs (and intermediate states) and for calculating gradients of scalar
functions of the outputs with respect to the model parameters.
"""
from mlp.layers import LayerWithParameters
from mlp.layers import LayerWithParameters, StochasticLayer, StochasticLayerWithParameters
class SingleLayerModel(object):
@ -27,7 +27,7 @@ class SingleLayerModel(object):
"""A list of all of the parameters of the model."""
return self.layer.params
def fprop(self, inputs):
def fprop(self, inputs, evaluation=False):
"""Calculate the model outputs corresponding to a batch of inputs.
Args:
@ -59,9 +59,87 @@ class SingleLayerModel(object):
"""
return self.layer.grads_wrt_params(activations[0], grads_wrt_outputs)
def params_cost(self):
"""Calculates the parameter dependent cost term of the model."""
return self.layer.params_cost()
def __repr__(self):
return 'SingleLayerModel(' + str(self.layer) + ')'
class MultipleLayerModel(object):
"""A model consisting of multiple layers applied sequentially."""
def __init__(self, layers):
"""Create a new multiple layer model instance.
Args:
layers: List of the layer objects defining the model in the
order they should be applied from inputs to outputs.
"""
self.layers = layers
@property
def params(self):
"""A list of all of the parameters of the model."""
params = []
for layer in self.layers:
if isinstance(layer, LayerWithParameters) or isinstance(layer, StochasticLayerWithParameters):
params += layer.params
return params
def fprop(self, inputs, evaluation=False):
"""Forward propagates a batch of inputs through the model.
Args:
inputs: Batch of inputs to the model.
Returns:
List of the activations at the output of all layers of the model
plus the inputs (to the first layer) as the first element. The
last element of the list corresponds to the model outputs.
"""
activations = [inputs]
for i, layer in enumerate(self.layers):
if evaluation:
if issubclass(type(self.layers[i]), StochasticLayer) or issubclass(type(self.layers[i]),
StochasticLayerWithParameters):
current_activations = self.layers[i].fprop(activations[i], stochastic=False)
else:
current_activations = self.layers[i].fprop(activations[i])
else:
if issubclass(type(self.layers[i]), StochasticLayer) or issubclass(type(self.layers[i]),
StochasticLayerWithParameters):
current_activations = self.layers[i].fprop(activations[i], stochastic=True)
else:
current_activations = self.layers[i].fprop(activations[i])
activations.append(current_activations)
return activations
def grads_wrt_params(self, activations, grads_wrt_outputs):
"""Calculates gradients with respect to the model parameters.
Args:
activations: List of all activations from forward pass through
model using `fprop`.
grads_wrt_outputs: Gradient with respect to the model outputs of
the scalar function parameter gradients are being calculated
for.
Returns:
List of gradients of the scalar function with respect to all model
parameters.
"""
grads_wrt_params = []
for i, layer in enumerate(self.layers[::-1]):
inputs = activations[-i - 2]
outputs = activations[-i - 1]
grads_wrt_inputs = layer.bprop(inputs, outputs, grads_wrt_outputs)
if isinstance(layer, LayerWithParameters) or isinstance(layer, StochasticLayerWithParameters):
grads_wrt_params += layer.grads_wrt_params(
inputs, grads_wrt_outputs)[::-1]
grads_wrt_outputs = grads_wrt_inputs
return grads_wrt_params[::-1]
def __repr__(self):
return 'SingleLayerModel(' + str(layer) + ')'
return (
'MultiLayerModel(\n ' +
'\n '.join([str(layer) for layer in self.layers]) +
'\n)'
)
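# A sketch composing the layers defined earlier into a MultipleLayerModel
# (module names as used elsewhere in the package are assumed):
import numpy as np
from mlp.layers import AffineLayer, ReluLayer, SoftmaxLayer
from mlp.models import MultipleLayerModel

rng = np.random.RandomState(12102018)
model = MultipleLayerModel([
    AffineLayer(input_dim=784, output_dim=100),
    ReluLayer(),
    AffineLayer(input_dim=100, output_dim=10),
    SoftmaxLayer(),
])
inputs = rng.normal(size=(32, 784))
activations = model.fprop(inputs)      # inputs plus one activation per layer
outputs = activations[-1]              # shape (32, 10), rows sum to one
grads_wrt_outputs = np.ones_like(outputs) / outputs.shape[0]
grads_wrt_params = model.grads_wrt_params(activations, grads_wrt_outputs)
print(len(activations), outputs.shape, len(grads_wrt_params))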


@ -9,7 +9,7 @@ import time
import logging
from collections import OrderedDict
import numpy as np
import tqdm
logger = logging.getLogger(__name__)
@ -18,7 +18,7 @@ class Optimiser(object):
"""Basic model optimiser."""
def __init__(self, model, error, learning_rule, train_dataset,
valid_dataset=None, data_monitors=None):
valid_dataset=None, data_monitors=None, notebook=False):
"""Create a new optimiser instance.
Args:
@ -43,6 +43,11 @@ class Optimiser(object):
self.data_monitors = OrderedDict([('error', error)])
if data_monitors is not None:
self.data_monitors.update(data_monitors)
self.notebook = notebook
if notebook:
self.tqdm_progress = tqdm.tqdm_notebook
else:
self.tqdm_progress = tqdm.tqdm
def do_training_epoch(self):
"""Do a single training epoch.
@ -52,12 +57,15 @@ class Optimiser(object):
respect to all the model parameters and then updates the model
parameters according to the learning rule.
"""
with self.tqdm_progress(total=self.train_dataset.num_batches) as train_progress_bar:
train_progress_bar.set_description("Epoch Progress")
for inputs_batch, targets_batch in self.train_dataset:
activations = self.model.fprop(inputs_batch)
grads_wrt_outputs = self.error.grad(activations[-1], targets_batch)
grads_wrt_params = self.model.grads_wrt_params(
activations, grads_wrt_outputs)
self.learning_rule.update_params(grads_wrt_params)
train_progress_bar.update(1)
def eval_monitors(self, dataset, label):
"""Evaluates the monitors for the given dataset.
@ -72,7 +80,7 @@ class Optimiser(object):
data_mon_vals = OrderedDict([(key + label, 0.) for key
in self.data_monitors.keys()])
for inputs_batch, targets_batch in dataset:
activations = self.model.fprop(inputs_batch)
activations = self.model.fprop(inputs_batch, evaluation=True)
for key, data_monitor in self.data_monitors.items():
data_mon_vals[key + label] += data_monitor(
activations[-1], targets_batch)
@ -121,14 +129,20 @@ class Optimiser(object):
and the second being a dict mapping the labels for the statistics
recorded to their column index in the array.
"""
start_train_time = time.time()
run_stats = [list(self.get_epoch_stats().values())]
with self.tqdm_progress(total=num_epochs) as progress_bar:
progress_bar.set_description("Experiment Progress")
for epoch in range(1, num_epochs + 1):
start_time = time.process_time()
start_time = time.time()
self.do_training_epoch()
epoch_time = time.process_time() - start_time
epoch_time = time.time() - start_time
if epoch % stats_interval == 0:
stats = self.get_epoch_stats()
self.log_stats(epoch, epoch_time, stats)
run_stats.append(list(stats.values()))
return np.array(run_stats), {k: i for i, k in enumerate(stats.keys())}
progress_bar.update(1)
finish_train_time = time.time()
total_train_time = finish_train_time - start_train_time
return np.array(run_stats), {k: i for i, k in enumerate(stats.keys())}, total_train_time
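# A heavily hedged usage sketch of the optimiser. The error function and
# data provider classes named below (mlp.errors.CrossEntropySoftmaxError,
# mlp.data_providers.MNISTDataProvider) and the (num_epochs, stats_interval)
# signature of `train` are assumptions based on the rest of the package and
# are not shown in this diff; running it also requires the MNIST data files
# supplied with the course.
import numpy as np
from mlp.data_providers import MNISTDataProvider
from mlp.errors import CrossEntropySoftmaxError
from mlp.layers import AffineLayer, ReluLayer
from mlp.learning_rules import AdamLearningRule
from mlp.models import MultipleLayerModel
from mlp.optimisers import Optimiser

rng = np.random.RandomState(11102019)
train_data = MNISTDataProvider('train', batch_size=100, rng=rng)
valid_data = MNISTDataProvider('valid', batch_size=100, rng=rng)
model = MultipleLayerModel([
    AffineLayer(784, 100), ReluLayer(), AffineLayer(100, 10)])
error = CrossEntropySoftmaxError()
learning_rule = AdamLearningRule()
optimiser = Optimiser(model, error, learning_rule, train_data, valid_data,
                      notebook=False)
stats, keys, run_time = optimiser.train(num_epochs=5, stats_interval=1)
print(run_time, stats[-1, keys['error(train)']])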

mlp/penalties.py Normal file

@ -0,0 +1,90 @@
import numpy as np
seed = 22102017
rng = np.random.RandomState(seed)
class L1Penalty(object):
"""L1 parameter penalty.
Term to add to the objective function penalising parameters
based on their L1 norm.
"""
def __init__(self, coefficient):
"""Create a new L1 penalty object.
Args:
coefficient: Positive constant to scale penalty term by.
"""
assert coefficient > 0., 'Penalty coefficient must be positive.'
self.coefficient = coefficient
def __call__(self, parameter):
"""Calculate L1 penalty value for a parameter.
Args:
parameter: Array corresponding to a model parameter.
Returns:
Value of penalty term.
"""
return self.coefficient * abs(parameter).sum()
def grad(self, parameter):
"""Calculate the penalty gradient with respect to the parameter.
Args:
parameter: Array corresponding to a model parameter.
Returns:
Value of penalty gradient with respect to parameter. This
should be an array of the same shape as the parameter.
"""
return self.coefficient * np.sign(parameter)
def __repr__(self):
return 'L1Penalty({0})'.format(self.coefficient)
class L2Penalty(object):
"""L2 parameter penalty.
Term to add to the objective function penalising parameters
based on their L2 norm.
"""
def __init__(self, coefficient):
"""Create a new L2 penalty object.
Args:
coefficient: Positive constant to scale penalty term by.
"""
assert coefficient > 0., 'Penalty coefficient must be positive.'
self.coefficient = coefficient
def __call__(self, parameter):
"""Calculate L2 penalty value for a parameter.
Args:
parameter: Array corresponding to a model parameter.
Returns:
Value of penalty term.
"""
return 0.5 * self.coefficient * (parameter ** 2).sum()
def grad(self, parameter):
"""Calculate the penalty gradient with respect to the parameter.
Args:
parameter: Array corresponding to a model parameter.
Returns:
Value of penalty gradient with respect to parameter. This
should be an array of the same shape as the parameter.
"""
return self.coefficient * parameter
def __repr__(self):
return 'L2Penalty({0})'.format(self.coefficient)
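# A usage sketch of the penalty objects: attach an L2 penalty to the weights
# of an AffineLayer (which accepts a weights_penalty argument, see mlp.layers
# above) and inspect its contribution to the cost and the gradients.
import numpy as np
from mlp.layers import AffineLayer
from mlp.penalties import L2Penalty

rng = np.random.RandomState(5112018)
layer = AffineLayer(input_dim=4, output_dim=3,
                    weights_penalty=L2Penalty(1e-2))
inputs = rng.normal(size=(8, 4))
grads_wrt_outputs = rng.normal(size=(8, 3))
grads_wrt_weights, grads_wrt_biases = layer.grads_wrt_params(
    inputs, grads_wrt_outputs)    # includes the penalty gradient term
print(layer.params_penalty())      # equals 0.5 * 1e-2 * (weights ** 2).sum()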

mlp/schedulers.py Normal file

@ -0,0 +1,34 @@
# -*- coding: utf-8 -*-
"""Training schedulers.
This module contains classes implementing schedulers which control the
evolution of learning rule hyperparameters (such as learning rate) over a
training run.
"""
import numpy as np
class ConstantLearningRateScheduler(object):
"""Example of scheduler interface which sets a constant learning rate."""
def __init__(self, learning_rate):
"""Construct a new constant learning rate scheduler object.
Args:
learning_rate: Learning rate to use in learning rule.
"""
self.learning_rate = learning_rate
def update_learning_rule(self, learning_rule, epoch_number):
"""Update the hyperparameters of the learning rule.
Run at the beginning of each epoch.
Args:
learning_rule: Learning rule object being used in training run,
any scheduled hyperparameters to be altered should be
attributes of this object.
epoch_number: Integer index of training epoch about to be run.
"""
learning_rule.learning_rate = self.learning_rate
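# A sketch of another scheduler following the same interface (illustrative
# only, not part of the module): exponentially decay the learning rate with
# the epoch number.
class ExponentialDecayLearningRateScheduler(object):
    """Scheduler which exponentially decays the learning rate over epochs."""

    def __init__(self, init_learning_rate, decay_rate):
        """Args:
            init_learning_rate: Learning rate to use in the first epoch.
            decay_rate: Multiplicative decay factor in (0, 1] applied once
                per epoch.
        """
        assert 0. < decay_rate <= 1., 'decay_rate should be in (0, 1].'
        self.init_learning_rate = init_learning_rate
        self.decay_rate = decay_rate

    def update_learning_rule(self, learning_rule, epoch_number):
        """Set the learning rate for the epoch about to be run."""
        learning_rule.learning_rate = (
            self.init_learning_rate * self.decay_rate ** epoch_number)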


@ -1,242 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction\n",
"\n",
"## Getting started with Jupyter notebooks\n",
"\n",
"The majority of your work in this course will be done using Jupyter notebooks so we will here introduce some of the basics of the notebook system. If you are already comfortable using notebooks or just would rather get on with some coding feel free to [skip straight to the exercises below](#Exercises).\n",
"\n",
"*Note: Jupyter notebooks are also known as IPython notebooks. The Jupyter system now supports languages other than Python [hence the name was changed to make it more language agnostic](https://ipython.org/#jupyter-and-the-future-of-ipython) however IPython notebook is still commonly used.*\n",
"\n",
"### Jupyter basics: the server, dashboard and kernels\n",
"\n",
"In launching this notebook you will have already come across two of the other key components of the Jupyter system - the notebook *server* and *dashboard* interface.\n",
"\n",
"We began by starting a notebook server instance in the terminal by running\n",
"\n",
"```\n",
"jupyter notebook\n",
"```\n",
"\n",
"This will have begun printing a series of log messages to terminal output similar to\n",
"\n",
"```\n",
"$ jupyter notebook\n",
"[I 08:58:24.417 NotebookApp] Serving notebooks from local directory: ~/mlpractical\n",
"[I 08:58:24.417 NotebookApp] 0 active kernels\n",
"[I 08:58:24.417 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/\n",
"```\n",
"\n",
"The last message included here indicates the URL the application is being served at. The default behaviour of the `jupyter notebook` command is to open a tab in a web browser pointing to this address after the server has started up. The server can be launched without opening a browser window by running `jupyter notebook --no-browser`. This can be useful for example when running a notebook server on a remote machine over SSH. Descriptions of various other command options can be found by displaying the command help page using\n",
"\n",
"```\n",
"jupyter notebook --help\n",
"```\n",
"\n",
"While the notebook server is running it will continue printing log messages to terminal it was started from. Unless you detach the process from the terminal session you will need to keep the session open to keep the notebook server alive. If you want to close down a running server instance from the terminal you can use `Ctrl+C` - this will bring up a confirmation message asking you to confirm you wish to shut the server down. You can either enter `y` or skip the confirmation by hitting `Ctrl+C` again.\n",
"\n",
"When the notebook application first opens in your browser you are taken to the notebook *dashboard*. This will appear something like this\n",
"\n",
"<img src='res/jupyter-dashboard.png' />\n",
"\n",
"The dashboard above is showing the `Files` tab, a list of files in the directory the notebook server was launched from. We can navigate in to a sub-directory by clicking on a directory name and back up to the parent directory by clicking the `..` link. An important point to note is that the top-most level that you will be able to navigate to is the directory you run the server from. This is a security feature and generally you should try to limit the access the server has by launching it in the highest level directory which gives you access to all the files you need to work with.\n",
"\n",
"As well as allowing you to launch existing notebooks, the `Files` tab of the dashboard also allows new notebooks to be created using the `New` drop-down on the right. It can also perform basic file-management tasks such as renaming and deleting files (select a file by checking the box alongside it to bring up a context menu toolbar).\n",
"\n",
"In addition to opening notebook files, we can also edit text files such as `.py` source files, directly in the browser by opening them from the dashboard. The in-built text-editor is less-featured than a full IDE but is useful for quick edits of source files and previewing data files.\n",
"\n",
"The `Running` tab of the dashboard gives a list of the currently running notebook instances. This can be useful to keep track of which notebooks are still running and to shutdown (or reopen) old notebook processes when the corresponding tab has been closed.\n",
"\n",
"### The notebook interface\n",
"\n",
"The top of your notebook window should appear something like this:\n",
"\n",
"<img src='res/jupyter-notebook-interface.png' />\n",
"\n",
"The name of the current notebook is displayed at the top of the page and can be edited by clicking on the text of the name. Displayed alongside this is an indication of the last manual *checkpoint* of the notebook file. On-going changes are auto-saved at regular intervals; the check-point mechanism is mainly meant as a way to recover an earlier version of a notebook after making unwanted changes. Note the default system only currently supports storing a single previous checkpoint despite the `Revert to checkpoint` dropdown under the `File` menu perhaps suggesting otherwise.\n",
"\n",
"As well as having options to save and revert to checkpoints, the `File` menu also allows new notebooks to be created in same directory as the current notebook, a copy of the current notebook to be made and the ability to export the current notebook to various formats.\n",
"\n",
"The `Edit` menu contains standard clipboard functions as well as options for reorganising notebook *cells*. Cells are the basic units of notebooks, and can contain formatted text like the one you are reading at the moment or runnable code as we will see below. The `Edit` and `Insert` drop down menus offer various options for moving cells around the notebook, merging and splitting cells and inserting new ones, while the `Cell` menu allow running of code cells and changing cell types.\n",
"\n",
"The `Kernel` menu offers some useful commands for managing the Python process (kernel) running in the notebook. In particular it provides options for interrupting a busy kernel (useful for example if you realise you have set a slow code cell running with incorrect parameters) and to restart the current kernel. This will cause all variables currently defined in the workspace to be lost but may be necessary to get the kernel back to a consistent state after polluting the namespace with lots of global variables or when trying to run code from an updated module and `reload` is failing to work. \n",
"\n",
"To the far right of the menu toolbar is a kernel status indicator. When a dark filled circle is shown this means the kernel is currently busy and any further code cell run commands will be queued to happen after the currently running cell has completed. An open status circle indicates the kernel is currently idle.\n",
"\n",
"The final row of the top notebook interface is the notebook toolbar which contains shortcut buttons to some common commands such as clipboard actions and cell / kernel management. If you are interested in learning more about the notebook user interface you may wish to run through the `User Interface Tour` under the `Help` menu drop down.\n",
"\n",
"### Markdown cells: easy text formatting\n",
"\n",
"This entire introduction has been written in what is termed a *Markdown* cell of a notebook. [Markdown](https://en.wikipedia.org/wiki/Markdown) is a lightweight markup language intended to be readable in plain-text. As you may wish to use Markdown cells to keep your own formatted notes in notebooks, a small sampling of the formatting syntax available is below (escaped mark-up on top and corresponding rendered output below that); there are many much more extensive syntax guides - for example [this cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet).\n",
"\n",
"---\n",
"\n",
"```\n",
"## Level 2 heading\n",
"### Level 3 heading\n",
"\n",
"*Italicised* and **bold** text.\n",
"\n",
" * bulleted\n",
" * lists\n",
" \n",
"and\n",
"\n",
" 1. enumerated\n",
" 2. lists\n",
"\n",
"Inline maths $y = mx + c$ using [MathJax](https://www.mathjax.org/) as well as display style\n",
"\n",
"$$ ax^2 + bx + c = 0 \\qquad \\Rightarrow \\qquad x = \\frac{-b \\pm \\sqrt{b^2 - 4ac}}{2a} $$\n",
"```\n",
"---\n",
"\n",
"## Level 2 heading\n",
"### Level 3 heading\n",
"\n",
"*Italicised* and **bold** text.\n",
"\n",
" * bulleted\n",
" * lists\n",
" \n",
"and\n",
"\n",
" 1. enumerated\n",
" 2. lists\n",
"\n",
"Inline maths $y = mx + c$ using [MathJax]() as well as display maths\n",
"\n",
"$$ ax^2 + bx + c = 0 \\qquad \\Rightarrow \\qquad x = \\frac{-b \\pm \\sqrt{b^2 - 4ac}}{2a} $$\n",
"\n",
"---\n",
"\n",
"We can also directly use HTML tags in Markdown cells to embed rich content such as images and videos.\n",
"\n",
"---\n",
"```\n",
"<img src=\"http://placehold.it/350x150\" />\n",
"```\n",
"---\n",
"\n",
"<img src=\"http://placehold.it/350x150\" />\n",
"\n",
"---\n",
"\n",
" \n",
"### Code cells: in browser code execution\n",
"\n",
"Up to now we have not seen any runnable code. An example of a executable code cell is below. To run it first click on the cell so that it is highlighted, then either click the <i class=\"fa-step-forward fa\"></i> button on the notebook toolbar, go to `Cell > Run Cells` or use the keyboard shortcut `Ctrl+Enter`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from __future__ import print_function\n",
"import sys\n",
"\n",
"print('Hello world!')\n",
"print('Alarming hello!', file=sys.stderr)\n",
"print('Hello again!')\n",
"'And again!'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This example shows the three main components of a code cell.\n",
"\n",
"The most obvious is the input area. This (unsuprisingly) is used to enter the code to be run which will be automatically syntax highlighted.\n",
"\n",
"To the immediate left of the input area is the execution indicator / counter. Before a code cell is first run this will display `In [ ]:`. After the cell is run this is updated to `In [n]:` where `n` is a number corresponding to the current execution counter which is incremented whenever any code cell in the notebook is run. This can therefore be used to keep track of the relative order in which cells were last run. There is no fundamental requirement to run cells in the order they are organised in the notebook, though things will usually be more readable if you keep things in roughly in order!\n",
"\n",
"Immediately below the input area is the output area. This shows any output produced by the code in the cell. This is dealt with a little bit confusingly in the current Jupyter version. At the top any output to [`stdout`](https://en.wikipedia.org/wiki/Standard_streams#Standard_output_.28stdout.29) is displayed. Immediately below that output to [`stderr`](https://en.wikipedia.org/wiki/Standard_streams#Standard_error_.28stderr.29) is displayed. All of the output to `stdout` is displayed together even if there has been output to `stderr` between as shown by the suprising ordering in the output here. \n",
"\n",
"The final part of the output area is the *display* area. By default this will just display the returned output of the last Python statement as would usually be the case in a (I)Python interpreter run in a terminal. What is displayed for a particular object is by default determined by its special `__repr__` method e.g. for a string it is just the quote enclosed value of the string itself.\n",
"\n",
"### Useful keyboard shortcuts\n",
"\n",
"There are a wealth of keyboard shortcuts available in the notebook interface. For an exhaustive list see the `Keyboard Shortcuts` option under the `Help` menu. We will cover a few of those we find most useful below.\n",
"\n",
"Shortcuts come in two flavours: those applicable in *command mode*, active when no cell is currently being edited and indicated by a blue highlight around the current cell; those applicable in *edit mode* when the content of a cell is being edited, indicated by a green current cell highlight.\n",
"\n",
"In edit mode of a code cell, two of the more generically useful keyboard shortcuts are offered by the `Tab` key.\n",
"\n",
" * Pressing `Tab` a single time while editing code will bring up suggested completions of what you have typed so far. This is done in a scope aware manner so for example typing `a` + `[Tab]` in a code cell will come up with a list of objects beginning with `a` in the current global namespace, while typing `np.a` + `[Tab]` (assuming `import numpy as np` has been run already) will bring up a list of objects in the root NumPy namespace beginning with `a`.\n",
" * Pressing `Shift+Tab` once immediately after opening parenthesis of a function or method will cause a tool-tip to appear with the function signature (including argument names and defaults) and its docstring. Pressing `Shift+Tab` twice in succession will cause an expanded version of the same tooltip to appear, useful for longer docstrings. Pressing `Shift+Tab` four times in succession will cause the information to be instead displayed in a pager docked to bottom of the notebook interface which stays attached even when making further edits to the code cell and so can be useful for keeping documentation visible when editing e.g. to help remember the name of arguments to a function and their purposes.\n",
"\n",
"A series of useful shortcuts available in both command and edit mode are `[modifier]+Enter` where `[modifier]` is one of `Ctrl` (run selected cell), `Shift` (run selected cell and select next) or `Alt` (run selected cell and insert a new cell after).\n",
"\n",
"A useful command mode shortcut to know about is the ability to toggle line numbers on and off for a cell by pressing `L` which can be useful when trying to diagnose stack traces printed when an exception is raised or when referring someone else to a section of code.\n",
" \n",
"### Magics\n",
"\n",
"There are a range of *magic* commands in IPython notebooks, than provide helpful tools outside of the usual Python syntax. A full list of the inbuilt magic commands is given [here](http://ipython.readthedocs.io/en/stable/interactive/magics.html), however three that are particularly useful for this course:\n",
"\n",
" * [`%%timeit`](http://ipython.readthedocs.io/en/stable/interactive/magics.html?highlight=matplotlib#magic-timeit) Put at the beginning of a cell to time its execution and print the resulting timing statistics.\n",
" * [`%precision`](http://ipython.readthedocs.io/en/stable/interactive/magics.html?highlight=matplotlib#magic-precision) Set the precision for pretty printing of floating point values and NumPy arrays.\n",
" * [`%debug`](http://ipython.readthedocs.io/en/stable/interactive/magics.html?highlight=matplotlib#magic-debug) Activates the interactive debugger in a cell. Run after an exception has been occured to help diagnose the issue.\n",
" \n",
"### Plotting with `matplotlib`\n",
"\n",
"When setting up your environment one of the dependencies we asked you to install was `matplotlib`. This is an extensive plotting and data visualisation library which is tightly integrated with NumPy and Jupyter notebooks.\n",
"\n",
"When using `matplotlib` in a notebook you should first run the [magic command](http://ipython.readthedocs.io/en/stable/interactive/magics.html?highlight=matplotlib)\n",
"\n",
"```\n",
"%matplotlib inline\n",
"```\n",
"\n",
"This will cause all plots to be automatically displayed as images in the output area of the cell they are created in. Below we give a toy example of plotting two sinusoids using `matplotlib` to show case some of the basic plot options. To see the output produced select the cell and then run it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"\n",
"# generate a pair of sinusoids\n",
"x = np.linspace(0., 2. * np.pi, 100)\n",
"y1 = np.sin(x)\n",
"y2 = np.cos(x)\n",
"\n",
"# produce a new figure object with a defined (width, height) in inches\n",
"fig = plt.figure(figsize=(8, 4))\n",
"# add a single axis to the figure\n",
"ax = fig.add_subplot(111)\n",
"# plot the two sinusoidal traces on the axis, adjusting the line width\n",
"# and adding LaTeX legend labels\n",
"ax.plot(x, y1, linewidth=2, label=r'$\\sin(x)$')\n",
"ax.plot(x, y2, linewidth=2, label=r'$\\cos(x)$')\n",
"# set the axis labels\n",
"ax.set_xlabel('$x$', fontsize=16)\n",
"ax.set_ylabel('$y$', fontsize=16)\n",
"# force the legend to be displayed\n",
"ax.legend()\n",
"# adjust the limits of the horizontal axis\n",
"ax.set_xlim(0., 2. * np.pi)\n",
"# make a grid be displayed in the axis background\n",
"ax.grid(True)"
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long


@ -16,8 +16,10 @@ Conda can handle installation of the Python libraries we will be using and all t
There are several options available for installing Conda on a system. Here we will use the Python 3 version of [Miniconda](http://conda.pydata.org/miniconda.html), which installs just Conda and its dependencies. An alternative is to install the [Anaconda Python distribution](https://docs.continuum.io/anaconda/), which installs Conda and a large selection of popular Python packages. As we will require only a small subset of these packages we will use the more barebones Miniconda to avoid eating into your DICE disk quota too much, however if installing on a personal machine you may wish to consider Anaconda if you want to explore other Python packages.
## 2. Installing Miniconda
We provide instructions here for getting an environment with all the required dependencies running on computers running
the School of Informatics [DICE desktop](http://computing.help.inf.ed.ac.uk/dice-platform). The same instructions
should be usable on other Linux distributions such as Ubuntu and Linux Mint with minimal adjustments.
@ -32,7 +34,7 @@ If you are using ssh connection to the student server, move to the next step. If
We first need to download the latest 64-bit Python 3 Miniconda install script:
```bash
```
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
```
@ -40,7 +42,7 @@ This uses `wget` a command-line tool for downloading files.
Now run the install script:
```bash
```
bash Miniconda3-latest-Linux-x86_64.sh
```
@ -54,14 +56,14 @@ definition in `.bashrc`. As the DICE bash start-up mechanism differs from the st
On DICE, append the Miniconda binaries directory to `PATH` manually in `~/.benv` using
```bash
```
echo ". /afs/inf.ed.ac.uk/user/${USER:0:3}/$USER/miniconda3/etc/profile.d/conda.sh" >> ~/.bashrc
echo ". /afs/inf.ed.ac.uk/user/${USER:0:3}/$USER/miniconda3/etc/profile.d/conda.sh" >> ~/.benv
```
To avoid any errors later, check both the `~/.bashrc` and `~/.benv` files for the correct file path by running:
```bash
```
vim ~/.bashrc and vim ~/.benv
```
@ -69,43 +71,43 @@ For those who this appears a bit opaque to and want to know what is going on see
We now need to `source` the updated `~/.benv` so that the `PATH` variable in the current terminal session is updated:
```bash
```
source ~/.benv
```
From the next time you log in all future terminal sessions should have conda readily available via:
```bash
```
conda activate
```
## 3. Creating the Conda environment
You should now have a working Conda installation. If you run
```bash
```
conda --help
```
From a terminal you should see the Conda help page displayed. If you get a `No command 'conda' found` error you should check you have set up your `PATH` variable correctly (you can get a demonstrator to help you do this).
from a terminal you should see the Conda help page displayed. If you get a `No command 'conda' found` error you should check you have set up your `PATH` variable correctly (you can get a demonstrator to help you do this).
Assuming Conda is working, we will now create our Conda environment:
```bash
conda create -n mlp python=3.12.5 -y
```
conda create -n mlp python=3
```
This bootstraps a new Conda environment named `mlp` with a minimal Python 3 install.
This bootstraps a new Conda environment named `mlp` with a minimal Python 3 install. You will be presented with a 'package plan' listing the packages to be installed and asked whether to proceed: type `y` then enter.
We will now *activate* our created environment:
```bash
```
conda activate mlp
```
or on Windows only
```bash
```
activate mlp
```
@ -117,41 +119,38 @@ If you wish to deactivate an environment loaded in the current terminal e.g. to
We will now install the dependencies for the course into the new environment:
```bash
conda install numpy scipy matplotlib jupyter -y
```
conda install numpy scipy matplotlib jupyter
```
Wait for the packages to install (this should take around five minutes). In addition to Jupyter, NumPy and SciPy which we have already mentioned, we are also installing [matplotlib](http://matplotlib.org/) a plotting and visualisation library.
Again you will be given a list of the packages to be installed and asked to confirm whether to proceed. Enter `y` then wait for the packages to install (this should take around five minutes). In addition to Jupyter, NumPy and SciPy which we have already mentioned, we are also installing [matplotlib](http://matplotlib.org/) a plotting and visualisation library.
Install PyTorch. The command below installs the CPU-only version of PyTorch. If you have access to a CUDA-enabled GPU and wish to install the GPU version of PyTorch instead, replace `cpuonly -c pytorch` with your CUDA version reference, e.g. for CUDA 11.7 use `pytorch-cuda=11.7 -c pytorch -c nvidia` in the command below. For more information see [here](https://pytorch.org/get-started/locally/).
```bash
conda install pytorch torchvision torchaudio cpuonly -c pytorch -y
```
conda install pytorch torchvision torchaudio cpuonly -c pytorch
```
Once the installation is finished, to recover some disk space we can clear the package tarballs Conda just downloaded:
```bash
conda clean -t -y
```
conda clean -t
```
These tarballs are usually cached to allow quicker installation into additional environments; however, we will only be using a single environment here, so there is no need to keep them on disk.
***ANLP and IAML students only:***
To have normal access to your ANLP and IAML environments please do the following:
1. ```nano .condarc```
2. Add the following lines in the file:
```yml
```
envs_dirs:
- /group/teaching/conda/envs
- /group/teaching/conda/envs
pkgs_dirs:
- /group/teaching/conda/pkgs
- ~/miniconda3/pkgs
- /group/teaching/conda/pkgs
- ~/miniconda3/pkgs
```
3. Exit by using control + x and then choosing 'yes' at the exit prompt.
## 4. Getting the course code and a short introduction to Git
@ -168,7 +167,7 @@ https://github.com/VICO-UoE/mlpractical
Git is installed by default on DICE desktops. If you are running a system which does not have Git installed, you can use Conda to install it in your environment using:
```bash
```
conda install git
```
@ -189,30 +188,32 @@ If you are already familiar with Git you may wish to skip over the explanatory s
By default we will assume here that you are cloning to your home directory; however, if you have an existing system for organising your workspace, feel free to keep to that. **If you clone the repository to a path other than `~/mlpractical`, however, you will need to adjust all references to `~/mlpractical` in the commands below accordingly.**
To clone the `mlpractical` repository to the home directory run
```bash
```
git clone https://github.com/VICO-UoE/mlpractical.git ~/mlpractical
```
This will create a new `mlpractical` subdirectory with a local copy of the repository in it. Enter the directory and list all its contents, including hidden files, by running:
```bash
```
cd ~/mlpractical
ls -a # Windows equivalent: dir /a
```
For the most part this will look much like any other directory, containing the following three non-hidden sub-directories:
* `data`: Data files used in the labs and assignments.
* `mlp`: The custom Python package we will use in this course.
* `notebooks`: The Jupyter notebook files for each lab and coursework.
* `data`: Data files used in the labs and assignments.
* `mlp`: The custom Python package we will use in this course.
* `notebooks`: The Jupyter notebook files for each lab and coursework.
Additionally there exists a hidden `.git` subdirectory (on Unix systems by default files and directories prepended with a period '.' are hidden). This directory contains the repository history database and various configuration files and references. Unless you are sure you know what you are doing you generally should not edit any of the files in this directory directly. Generally most configuration options can be enacted more safely using a `git config` command.
For instance to globally set the user name and email used in commits you can run:
```bash
```
git config --global user.name "[your name]"
git config --global user.email "[matric-number]@sms.ed.ac.uk"
```
@ -235,19 +236,19 @@ A *commit* in Git is a snapshot of the state of the project. The snapshots are r
2. The files with changes to be committed (including any new files) are added to the *staging area* by running:
```bash
```
git add file1 file2 ...
```
3. Finally the *staged changes* are used to create a new commit by running
```bash
```
git commit -m "A commit message describing the changes."
```
This writes the staged changes as a new commit in the repository history. We can see a log of the details of previous commits by running:
```bash
```
git log
```
@ -259,17 +260,17 @@ A new branch is created from a commit on an existing branch. Any commits made to
A typical Git workflow in a software development setting would be to create a new branch whenever making changes to a project, for example to fix a bug or implement a new feature. These changes are then isolated from the main code base allowing regular commits without worrying about making unstable changes to the main code base. Key to this workflow is the ability to *merge* commits from a branch into another branch, e.g. when it is decided a new feature is sufficiently developed to be added to the main code base. Although merging branches is a key aspect of using Git in many projects, dealing with merge conflicts when two branches both make changes to the same parts of files can be a somewhat tricky process, so we will generally try to avoid the need for merges here.
We will therefore use branches here in a slightly non-standard way. The code for each week's lab and for each of the assignments will be maintained in a separate branch. This will allow us to stage the release of the notebooks and code for each lab and assignment while allowing you to commit the changes you make to the code each week without having to merge those changes when new code is released. Similarly this structure will allow us to release updated notebooks from previous labs with proposed solutions without overwriting your own work.
<p id='branching-explanation'>We will therefore use branches here in a slightly non-standard way. The code for each week's lab and for each of the assignments will be maintained in a separate branch. This will allow us to stage the release of the notebooks and code for each lab and assignment while allowing you to commit the changes you make to the code each week without having to merge those changes when new code is released. Similarly this structure will allow us to release updated notebooks from previous labs with proposed solutions without overwriting your own work.</p>
To list the branches present in the local repository, run:
```bash
```
git branch
```
This will display a list of branches with a `*` next to the current branch. To switch to a different existing branch in the local repository run
```bash
```
git checkout branch-name
```
@ -277,8 +278,8 @@ This will change the code in the working directory to the current state of the c
You should make sure you are on the first lab branch now by running:
```bash
git checkout mlp2024-25/lab1
```
git checkout mlp2023-24/lab1
```
## 6. Installing the `mlp` Python package
@ -291,7 +292,7 @@ The standard way to install a Python package using a `setup.py` script is to run
As we will be updating the code in the `mlp` package during the course of the labs this would require you to re-run `python setup.py install` every time a change is made to the package. Instead therefore you should install the package in development mode by running:
```bash
```
python setup.py develop
```
@ -303,20 +304,20 @@ Instead of copying the package, this will instead create a symbolic link to the
Note that after the first time a Python module is loaded into an interpreter instance, using for example:
```python
```
import mlp
```
Running the `import` statement any further times will have no effect even if the underlying module code has been changed. To reload an already imported module we instead need to use the [`importlib.reload`](https://docs.python.org/3/library/importlib.html#importlib.reload) function, e.g.
```python
```
import importlib
importlib.reload(mlp)
```
**Note: To be clear as this has caused some confusion in previous labs the above `import ...` / `reload(...)` statements should NOT be run directly in a bash terminal. They are examples Python statements - you could run them in a terminal by first loading a Python interpreter using:**
```bash
```
python
```
@ -330,7 +331,7 @@ We observed previously the presence of a `data` subdirectory in the local reposi
Assuming you used the recommended Miniconda install location and cloned the `mlpractical` repository to your home directory, this variable can be automatically defined when activating the environment by running the following commands (on non-Windows systems):
```bash
```
cd ~/miniconda3/envs/mlp
mkdir -p ./etc/conda/activate.d
mkdir -p ./etc/conda/deactivate.d
@ -343,12 +344,12 @@ export MLP_DATA_DIR=$HOME/mlpractical/data
And on Windows systems (replacing the `[]` placeholders with the relevant paths):
```bash
```
cd [path-to-conda-root]\envs\mlp
mkdir .\etc\conda\activate.d
mkdir .\etc\conda\deactivate.d
echo "set MLP_DATA_DIR=[path-to-local-repository]\data" >> .\etc\conda\activate.d\env_vars.bat
echo "set MLP_DATA_DIR=" >> .\etc\conda\deactivate.d\env_vars.bat
@echo "set MLP_DATA_DIR=[path-to-local-repository]\data" >> .\etc\conda\activate.d\env_vars.bat
@echo "set MLP_DATA_DIR=" >> .\etc\conda\deactivate.d\env_vars.bat
set MLP_DATA_DIR=[path-to-local-repository]\data
```
@ -362,7 +363,7 @@ There will be a Jupyter notebook available for each lab and assignment in this c
To open a notebook, you first need to launch a Jupyter notebook server instance. From within the `mlpractical` directory containing your local copy of the repository (and with the `mlp` environment activated) run:
```bash
```
jupyter notebook
```
@ -378,13 +379,13 @@ Below are instructions for setting up the environment without additional explana
Start a new bash terminal. Download the latest 64-bit Python 3 Miniconda install script:
```bash
```
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
```
Run the install script:
```bash
```
bash Miniconda3-latest-Linux-x86_64.sh
```
@ -393,70 +394,69 @@ Review the software license agreement and choose whether to accept. Assuming you
You will then be asked whether to prepend the Miniconda binaries directory to the `PATH` system environment variable definition in `.bashrc`. You should respond `no` here as we will set up the addition to `PATH` manually in the next step.
Append the Miniconda binaries directory to `PATH` manually in `~/.benv`:
```bash
```
echo ". /afs/inf.ed.ac.uk/user/${USER:0:3}/$USER/miniconda3/etc/profile.d/conda.sh" >> ~/.bashrc
echo ". /afs/inf.ed.ac.uk/user/${USER:0:3}/$USER/miniconda3/etc/profile.d/conda.sh" >> ~/.benv
```
`source` the updated `~/.benv`:
```bash
```
source ~/.benv
```
Create a new `mlp` Conda environment:
```bash
conda create -n mlp python=3.12.5 -y
```
conda create -n mlp python=3
```
Activate our created environment:
```bash
```
conda activate mlp
```
Install the dependencies for the course into the new environment:
```bash
conda install numpy scipy matplotlib jupyter -y
```
conda install numpy scipy matplotlib jupyter
```
Install PyTorch. The command below installs the CPU-only version of PyTorch. If you have access to a CUDA-enabled GPU and wish to install the GPU version of PyTorch instead, replace `cpuonly -c pytorch` with your CUDA version reference, e.g. for CUDA 11.7 use `pytorch-cuda=11.7 -c pytorch -c nvidia` in the command below. For more information see [here](https://pytorch.org/get-started/locally/).
```bash
conda install pytorch torchvision torchaudio cpuonly -c pytorch -y
```
conda install pytorch torchvision torchaudio cpuonly -c pytorch
```
Clear the package tarballs Conda just downloaded:
```bash
```
conda clean -t
```
Clone the course repository to your home directory:
```bash
```
git clone https://github.com/VICO-UoE/mlpractical.git ~/mlpractical
```
Make sure we are on the first lab branch
```bash
```
cd ~/mlpractical
git checkout mlp2024-25/lab1
git checkout mlp2023-24/lab1
```
Install the `mlp` package in the environment in develop mode
```bash
```
python ~/mlpractical/setup.py develop
```
Add an `MLP_DATA_DIR` variable to the environment
```bash
```
cd ~/miniconda3/envs/mlp
mkdir -p ./etc/conda/activate.d
mkdir -p ./etc/conda/deactivate.d
@ -469,13 +469,14 @@ export MLP_DATA_DIR=$HOME/mlpractical/data
The environment is now set up. Load the notebook server from the `mlpractical` directory
```bash
```
cd ~/mlpractical
jupyter notebook
```
and then open the first lab notebook from the `notebooks` directory.
---
<b id="f1">[1]</b> The `echo` command causes the following text to be streamed to an output (standard terminal output by default). Here we use the append redirection operator `>>` to redirect the `echo` output to a file `~/.benv`, with it being appended to the end of the current file. The text actually added is `export PATH="$PATH:[your-home-directory]/miniconda/bin"` with the `\"` being used to escape the quote characters. The `export` command defines system-wide environment variables (more rigorously those inherited by child shells) with `PATH` being the environment variable defining where `bash` searches for executables as a colon-seperated list of directories. Here we add the Miniconda binary directory to the end of the current `PATH` definition. [](#a1)

Binary files changed (contents not shown), including the new figure `notes/figures/boot_disk.png` and several other figure images added under `notes/figures/`.

@ -0,0 +1,125 @@
# PyTorch Experiment Framework
## What does this framework do?
The PyTorch experiment framework located in ```mlp/pytorch_mlp_framework``` includes tooling for building an array of deep neural networks,
including fully connected and convolutional networks. In addition, it includes tooling for running experiments,
handling and storing metrics, saving model weights, and checkpointing (allowing continuation from a previously saved point), as
well as keeping track of the best validation model, which is then used at the end to produce test set evaluation metrics.
## Why do we need it?
It serves two main purposes. The first is to allow you an easy, worry-free transition into using PyTorch for experiments
in your coursework. The second is to teach you good coding practices for building and running deep learning experiments
using PyTorch. The framework comes fully loaded with tooling that can keep track of relevant metrics, save models, resume from previously saved states and
even automatically choose the best validation model for test set evaluation. We include documentation and comments on almost
every line of code in the framework, to help you maximize your learning. The code style itself can be used for
learning good programming practices: structuring your code in a modular, readable and computationally efficient manner that minimizes the chance of user error.
## Installation
The first thing you have to do is activate your conda MLP environment.
### GPU version on Google Compute Engine
For usage on Google Cloud, the disk image we provide comes pre-loaded with all the packages you need to run the PyTorch
experiment framework, including PyTorch itself. When you created an instance and set up your environment, everything you need for this framework was installed, so there is no need for you to install PyTorch separately.
### CPU version on DICE (or other local machine)
If you do not have your MLP conda environment installed on your current machine, please follow the instructions in the [MLP environment installation guide](notes/environment-set-up.md). It includes an explanation of how to install a CPU version of PyTorch, or a GPU version if you have a GPU available on your local machine.
Once PyTorch is installed in your MLP conda environment, you can start using the framework. The framework has been built to allow you to control your experiment hyperparameters directly from the command line, using command-line argument parsing.
## Using the framework
You can get a list of all available hyperparameters and arguments by using:
```
python pytorch_mlp_framework/train_evaluate_image_classification_system.py -h
```
The `-h` at the end is short for `--help`, which presents a list of all possible arguments, each next to a description of what it modifies in the setup.
Once you execute that command, you should be able to see the following list:
```
Welcome to the MLP course's PyTorch training and inference helper script
optional arguments:
-h, --help show this help message and exit
--batch_size [BATCH_SIZE]
Batch_size for experiment
--continue_from_epoch [CONTINUE_FROM_EPOCH]
Which epoch to continue from.
If -2, continues from where it left off
If -1, starts from scratch
if >=0, continues from given epoch
--seed [SEED] Seed to use for random number generator for experiment
--image_num_channels [IMAGE_NUM_CHANNELS]
The channel dimensionality of our image-data
--image_height [IMAGE_HEIGHT]
Height of image data
--image_width [IMAGE_WIDTH]
Width of image data
--num_stages [NUM_STAGES]
Number of convolutional stages in the network. A stage
is considered a sequence of convolutional layers where
the input volume remains the same in the spatial
dimension and is always terminated by a dimensionality
reduction stage
--num_blocks_per_stage [NUM_BLOCKS_PER_STAGE]
Number of convolutional blocks in each stage, not
including the reduction stage. A convolutional block
is made up of two convolutional layers activated using
the leaky-relu non-linearity
--num_filters [NUM_FILTERS]
Number of convolutional filters per convolutional
layer in the network (excluding dimensionality
reduction layers)
--num_epochs [NUM_EPOCHS]
The experiment's epoch budget
--num_classes [NUM_CLASSES]
Number of classes in the dataset
--experiment_name [EXPERIMENT_NAME]
Experiment name - to be used for building the
experiment folder
--use_gpu [USE_GPU] A flag indicating whether we will use GPU acceleration
or not
--weight_decay_coefficient [WEIGHT_DECAY_COEFFICIENT]
Weight decay to use for Adam
--block_type BLOCK_TYPE
Type of convolutional blocks to use in our network
(This argument will be useful in running experiments
to debug your network)
```
For example, to run a simple experiment using a 7-layer convolutional network on the CPU you can run:
```
python pytorch_mlp_framework/train_evaluate_image_classification_system.py --batch_size 100 --seed 0 --num_filters 32 --num_stages 3 --num_blocks_per_stage 0 --experiment_name VGG_07 --num_classes 100 --block_type 'conv_block' --weight_decay_coefficient 0.00000 --use_gpu False
```
Your experiment should begin running.
Your experiment's statistics and model weights are saved in a directory named after the experiment (here `VGG_07/`), under
`VGG_07/result_outputs` and `VGG_07/saved_models` respectively.
To run on a GPU on Google Compute Engine the command would be:
```
python pytorch_mlp_framework/train_evaluate_image_classification_system.py --batch_size 100 --seed 0 --num_filters 32 --num_stages 3 --num_blocks_per_stage 0 --experiment_name VGG_07 --num_classes 100 --block_type 'conv_block' --weight_decay_coefficient 0.00000 --use_gpu True
```
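If a run is interrupted, it can be resumed from the latest saved checkpoint by passing `--continue_from_epoch -2` (as described in the argument list above; `-1` starts from scratch and a value `>=0` continues from that epoch). As an illustrative sketch, reusing the experiment name from the example above:
```
python pytorch_mlp_framework/train_evaluate_image_classification_system.py --batch_size 100 --seed 0 --num_filters 32 --num_stages 3 --num_blocks_per_stage 0 --experiment_name VGG_07 --num_classes 100 --block_type 'conv_block' --weight_decay_coefficient 0.00000 --use_gpu True --continue_from_epoch -2
```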
We have also provided the exact scripts we used to run the experiments of VGG07 and VGG37 as shown in the coursework spec inside the files:
- run_vgg_08_default.sh
- run_vgg_38_default.sh
**However, remember, if you want to reuse those scripts for your own investigations, change the experiment name and seed.
If you do not change the name, the old folders will be overwritten.**
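For reference, a minimal sketch of what such a run script might contain (the provided scripts may differ in detail; the experiment name and seed below are placeholders that you should change for your own runs):
```
#!/bin/bash
# Hypothetical run script sketch; change --experiment_name and --seed before reusing.
python pytorch_mlp_framework/train_evaluate_image_classification_system.py --batch_size 100 --seed 1 --num_filters 32 --num_stages 3 --num_blocks_per_stage 0 --experiment_name VGG_08_my_run --num_classes 100 --block_type 'conv_block' --weight_decay_coefficient 0.00000 --use_gpu True
```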
## So, where can I ask more questions and find more information on PyTorch and what it can do?
Your first course of action should be to search the web and then to refer to the PyTorch [documentation](https://pytorch.org/docs/stable/index.html),
[tutorials](https://pytorch.org/tutorials/) and [GitHub](https://github.com/pytorch/pytorch) sites.
If you still can't get an answer to your question then, as always, post on Piazza and/or come to the lab sessions.


@ -0,0 +1,133 @@
import argparse
def str2bool(v):
if v.lower() in ("yes", "true", "t", "y", "1"):
return True
elif v.lower() in ("no", "false", "f", "n", "0"):
return False
else:
raise argparse.ArgumentTypeError("Boolean value expected.")
def get_args():
"""
    Returns the arguments extracted from the command line via argparse.
    :return: An argparse.Namespace containing the parsed arguments
"""
parser = argparse.ArgumentParser(
description="Welcome to the MLP course's Pytorch training and inference helper script"
)
parser.add_argument(
"--batch_size",
nargs="?",
type=int,
default=100,
help="Batch_size for experiment",
)
parser.add_argument(
"--continue_from_epoch",
nargs="?",
type=int,
default=-1,
help="Epoch you want to continue training from while restarting an experiment",
)
parser.add_argument(
"--seed",
nargs="?",
type=int,
default=7112018,
help="Seed to use for random number generator for experiment",
)
parser.add_argument(
"--image_num_channels",
nargs="?",
type=int,
default=3,
help="The channel dimensionality of our image-data",
)
parser.add_argument(
"--learning-rate",
nargs="?",
type=float,
default=1e-3,
help="The learning rate (default 1e-3)",
)
parser.add_argument(
"--image_height", nargs="?", type=int, default=32, help="Height of image data"
)
parser.add_argument(
"--image_width", nargs="?", type=int, default=32, help="Width of image data"
)
parser.add_argument(
"--num_stages",
nargs="?",
type=int,
default=3,
help="Number of convolutional stages in the network. A stage is considered a sequence of "
"convolutional layers where the input volume remains the same in the spacial dimension and"
" is always terminated by a dimensionality reduction stage",
)
parser.add_argument(
"--num_blocks_per_stage",
nargs="?",
type=int,
default=5,
help="Number of convolutional blocks in each stage, not including the reduction stage."
" A convolutional block is made up of two convolutional layers activated using the "
" leaky-relu non-linearity",
)
parser.add_argument(
"--num_filters",
nargs="?",
type=int,
default=16,
help="Number of convolutional filters per convolutional layer in the network (excluding "
"dimensionality reduction layers)",
)
parser.add_argument(
"--num_epochs",
nargs="?",
type=int,
default=100,
help="Total number of epochs for model training",
)
parser.add_argument(
"--num_classes",
nargs="?",
type=int,
default=100,
help="Number of classes in the dataset",
)
parser.add_argument(
"--experiment_name",
nargs="?",
type=str,
default="exp_1",
help="Experiment name - to be used for building the experiment folder",
)
parser.add_argument(
"--use_gpu",
nargs="?",
type=str2bool,
default=True,
help="A flag indicating whether we will use GPU acceleration or not",
)
parser.add_argument(
"--weight_decay_coefficient",
nargs="?",
type=float,
default=0,
help="Weight decay to use for Adam",
)
parser.add_argument(
"--block_type",
type=str,
default="conv_block",
help="Type of convolutional blocks to use in our network"
"(This argument will be useful in running experiments to debug your network)",
)
args = parser.parse_args()
print(args)
return args


@ -0,0 +1,462 @@
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import tqdm
import os
import numpy as np
import time
from pytorch_mlp_framework.storage_utils import save_statistics
from matplotlib import pyplot as plt
import matplotlib
matplotlib.rcParams.update({"font.size": 8})
class ExperimentBuilder(nn.Module):
def __init__(
self,
network_model,
experiment_name,
num_epochs,
train_data,
val_data,
test_data,
weight_decay_coefficient,
learning_rate,
use_gpu,
continue_from_epoch=-1,
):
"""
Initializes an ExperimentBuilder object. Such an object takes care of running training and evaluation of a deep net
on a given dataset. It also takes care of saving per epoch models and automatically inferring the best val model
to be used for evaluating the test set metrics.
:param network_model: A pytorch nn.Module which implements a network architecture.
        :param experiment_name: The name of the experiment. This is used mainly for keeping track of the experiment and creating a directory structure that will be used to save logs, model parameters and other outputs.
:param num_epochs: Total number of epochs to run the experiment
:param train_data: An object of the DataProvider type. Contains the training set.
:param val_data: An object of the DataProvider type. Contains the val set.
:param test_data: An object of the DataProvider type. Contains the test set.
:param weight_decay_coefficient: A float indicating the weight decay to use with the adam optimizer.
:param use_gpu: A boolean indicating whether to use a GPU or not.
        :param continue_from_epoch: An int indicating whether we'll start from scratch (-1) or whether we'll reload a previously saved model of epoch 'continue_from_epoch' and continue training from there.
"""
super(ExperimentBuilder, self).__init__()
self.experiment_name = experiment_name
self.model = network_model
if torch.cuda.device_count() >= 1 and use_gpu:
self.device = torch.device("cuda")
self.model.to(self.device) # sends the model from the cpu to the gpu
print("Use GPU", self.device)
else:
print("use CPU")
self.device = torch.device("cpu") # sets the device to be CPU
print(self.device)
print("here")
self.model.reset_parameters() # re-initialize network parameters
self.train_data = train_data
self.val_data = val_data
self.test_data = test_data
print("System learnable parameters")
num_conv_layers = 0
num_linear_layers = 0
total_num_parameters = 0
for name, value in self.named_parameters():
print(name, value.shape)
if all(item in name for item in ["conv", "weight"]):
num_conv_layers += 1
if all(item in name for item in ["linear", "weight"]):
num_linear_layers += 1
total_num_parameters += np.prod(value.shape)
print("Total number of parameters", total_num_parameters)
print("Total number of conv layers", num_conv_layers)
print("Total number of linear layers", num_linear_layers)
print(f"Learning rate: {learning_rate}")
self.optimizer = optim.Adam(
self.parameters(),
amsgrad=False,
weight_decay=weight_decay_coefficient,
lr=learning_rate,
)
self.learning_rate_scheduler = optim.lr_scheduler.CosineAnnealingLR(
self.optimizer, T_max=num_epochs, eta_min=0.00002
)
# Generate the directory names
self.experiment_folder = os.path.abspath(experiment_name)
self.experiment_logs = os.path.abspath(
os.path.join(self.experiment_folder, "result_outputs")
)
self.experiment_saved_models = os.path.abspath(
os.path.join(self.experiment_folder, "saved_models")
)
# Set best models to be at 0 since we are just starting
self.best_val_model_idx = 0
self.best_val_model_acc = 0.0
if not os.path.exists(
self.experiment_folder
): # If experiment directory does not exist
os.mkdir(self.experiment_folder) # create the experiment directory
os.mkdir(self.experiment_logs) # create the experiment log directory
os.mkdir(
self.experiment_saved_models
) # create the experiment saved models directory
self.num_epochs = num_epochs
self.criterion = nn.CrossEntropyLoss().to(
self.device
) # send the loss computation to the GPU
if (
continue_from_epoch == -2
): # if continue from epoch is -2 then continue from latest saved model
self.state, self.best_val_model_idx, self.best_val_model_acc = (
self.load_model(
model_save_dir=self.experiment_saved_models,
model_save_name="train_model",
model_idx="latest",
)
) # reload existing model from epoch and return best val model index
# and the best val acc of that model
self.starting_epoch = int(self.state["model_epoch"])
elif continue_from_epoch > -1: # if continue from epoch is greater than -1 then
self.state, self.best_val_model_idx, self.best_val_model_acc = (
self.load_model(
model_save_dir=self.experiment_saved_models,
model_save_name="train_model",
model_idx=continue_from_epoch,
)
) # reload existing model from epoch and return best val model index
# and the best val acc of that model
self.starting_epoch = continue_from_epoch
else:
self.state = dict()
self.starting_epoch = 0
def get_num_parameters(self):
total_num_params = 0
for param in self.parameters():
total_num_params += np.prod(param.shape)
return total_num_params
def plot_func_def(self, all_grads, layers):
"""
Plot function definition to plot the average gradient with respect to the number of layers in the given model
:param all_grads: Gradients wrt weights for each layer in the model.
:param layers: Layer names corresponding to the model parameters
:return: plot for gradient flow
"""
plt.plot(all_grads, alpha=0.3, color="b")
plt.hlines(0, 0, len(all_grads) + 1, linewidth=1, color="k")
plt.xticks(range(0, len(all_grads), 1), layers, rotation="vertical")
plt.xlim(xmin=0, xmax=len(all_grads))
plt.xlabel("Layers")
plt.ylabel("Average Gradient")
plt.title("Gradient flow")
plt.grid(True)
plt.tight_layout()
return plt
def plot_grad_flow(self, named_parameters):
"""
The function is being called in Line 298 of this file.
Receives the parameters of the model being trained. Returns plot of gradient flow for the given model parameters.
"""
all_grads = []
layers = []
"""
Complete the code in the block below to collect absolute mean of the gradients for each layer in all_grads with the layer names in layers.
"""
for name, param in named_parameters:
if "bias" in name:
continue
# Check if the parameter requires gradient and has a gradient
if param.requires_grad and param.grad is not None:
try:
_, a, _, b, _ = name.split(".", 4)
except:
b, a = name.split(".", 1)
layers.append(f"{a}_{b}")
# Collect the mean of the absolute gradients
all_grads.append(param.grad.abs().mean().item())
plt = self.plot_func_def(all_grads, layers)
return plt
def run_train_iter(self, x, y):
self.train() # sets model to training mode (in case batch normalization or other methods have different procedures for training and evaluation)
x, y = x.float().to(device=self.device), y.long().to(
device=self.device
) # send data to device as torch tensors
out = self.model.forward(x) # forward the data in the model
loss = F.cross_entropy(input=out, target=y) # compute loss
self.optimizer.zero_grad() # set all weight grads from previous training iters to 0
loss.backward() # backpropagate to compute gradients for current iter loss
self.optimizer.step() # update network parameters
self.learning_rate_scheduler.step() # update learning rate scheduler
_, predicted = torch.max(out.data, 1) # get argmax of predictions
accuracy = np.mean(list(predicted.eq(y.data).cpu())) # compute accuracy
return loss.cpu().data.numpy(), accuracy
def run_evaluation_iter(self, x, y):
"""
        Receives the inputs and targets for the model and runs an evaluation iteration. Returns loss and accuracy metrics.
:param x: The inputs to the model. A numpy array of shape batch_size, channels, height, width
:param y: The targets for the model. A numpy array of shape batch_size, num_classes
:return: the loss and accuracy for this batch
"""
self.eval() # sets the system to validation mode
x, y = x.float().to(device=self.device), y.long().to(
device=self.device
) # convert data to pytorch tensors and send to the computation device
out = self.model.forward(x) # forward the data in the model
loss = F.cross_entropy(input=out, target=y) # compute loss
_, predicted = torch.max(out.data, 1) # get argmax of predictions
accuracy = np.mean(list(predicted.eq(y.data).cpu())) # compute accuracy
return loss.cpu().data.numpy(), accuracy
def save_model(
self,
model_save_dir,
model_save_name,
model_idx,
best_validation_model_idx,
best_validation_model_acc,
):
"""
Save the network parameter state and current best val epoch idx and best val accuracy.
:param model_save_name: Name to use to save model without the epoch index
:param model_idx: The index to save the model with.
:param best_validation_model_idx: The index of the best validation model to be stored for future use.
:param best_validation_model_acc: The best validation accuracy to be stored for use at test time.
:param model_save_dir: The directory to store the state at.
:param state: The dictionary containing the system state.
"""
self.state["network"] = (
self.state_dict()
) # save network parameter and other variables.
self.state["best_val_model_idx"] = (
best_validation_model_idx # save current best val idx
)
self.state["best_val_model_acc"] = (
best_validation_model_acc # save current best val acc
)
torch.save(
self.state,
f=os.path.join(
model_save_dir, "{}_{}".format(model_save_name, str(model_idx))
),
) # save state at prespecified filepath
def load_model(self, model_save_dir, model_save_name, model_idx):
"""
Load the network parameter state and the best val model idx and best val acc to be compared with the future val accuracies, in order to choose the best val model
:param model_save_dir: The directory to store the state at.
:param model_save_name: Name to use to save model without the epoch index
:param model_idx: The index to save the model with.
:return: best val idx and best val model acc, also it loads the network state into the system state without returning it
"""
state = torch.load(
f=os.path.join(
model_save_dir, "{}_{}".format(model_save_name, str(model_idx))
)
)
self.load_state_dict(state_dict=state["network"])
return state, state["best_val_model_idx"], state["best_val_model_acc"]
def run_experiment(self):
"""
        Runs experiment train and evaluation iterations, saving the model, the best validation model index and the best validation accuracy after each epoch
:return: The summary current_epoch_losses from starting epoch to total_epochs.
"""
total_losses = {
"train_acc": [],
"train_loss": [],
"val_acc": [],
"val_loss": [],
} # initialize a dict to keep the per-epoch metrics
for i, epoch_idx in enumerate(range(self.starting_epoch, self.num_epochs)):
epoch_start_time = time.time()
current_epoch_losses = {
"train_acc": [],
"train_loss": [],
"val_acc": [],
"val_loss": [],
}
self.current_epoch = epoch_idx
with tqdm.tqdm(
total=len(self.train_data)
) as pbar_train: # create a progress bar for training
for idx, (x, y) in enumerate(self.train_data): # get data batches
loss, accuracy = self.run_train_iter(
x=x, y=y
) # take a training iter step
current_epoch_losses["train_loss"].append(
loss
) # add current iter loss to the train loss list
current_epoch_losses["train_acc"].append(
accuracy
) # add current iter acc to the train acc list
pbar_train.update(1)
pbar_train.set_description(
"loss: {:.4f}, accuracy: {:.4f}".format(loss, accuracy)
)
with tqdm.tqdm(
total=len(self.val_data)
) as pbar_val: # create a progress bar for validation
for x, y in self.val_data: # get data batches
loss, accuracy = self.run_evaluation_iter(
x=x, y=y
) # run a validation iter
current_epoch_losses["val_loss"].append(
loss
) # add current iter loss to val loss list.
current_epoch_losses["val_acc"].append(
accuracy
) # add current iter acc to val acc lst.
pbar_val.update(1) # add 1 step to the progress bar
pbar_val.set_description(
"loss: {:.4f}, accuracy: {:.4f}".format(loss, accuracy)
)
val_mean_accuracy = np.mean(current_epoch_losses["val_acc"])
if (
val_mean_accuracy > self.best_val_model_acc
): # if current epoch's mean val acc is greater than the saved best val acc then
self.best_val_model_acc = val_mean_accuracy # set the best val model acc to be current epoch's val accuracy
self.best_val_model_idx = epoch_idx # set the experiment-wise best val idx to be the current epoch's idx
for key, value in current_epoch_losses.items():
total_losses[key].append(
np.mean(value)
) # get mean of all metrics of current epoch metrics dict, to get them ready for storage and output on the terminal.
save_statistics(
experiment_log_dir=self.experiment_logs,
filename="summary.csv",
stats_dict=total_losses,
current_epoch=i,
continue_from_mode=(
True if (self.starting_epoch != 0 or i > 0) else False
),
) # save statistics to stats file.
# load_statistics(experiment_log_dir=self.experiment_logs, filename='summary.csv') # How to load a csv file if you need to
out_string = "_".join(
[
"{}_{:.4f}".format(key, np.mean(value))
for key, value in current_epoch_losses.items()
]
)
# create a string to use to report our epoch metrics
epoch_elapsed_time = (
time.time() - epoch_start_time
) # calculate time taken for epoch
epoch_elapsed_time = "{:.4f}".format(epoch_elapsed_time)
print(
"Epoch {}:".format(epoch_idx),
out_string,
"epoch time",
epoch_elapsed_time,
"seconds",
)
self.state["model_epoch"] = epoch_idx
self.save_model(
model_save_dir=self.experiment_saved_models,
# save model and best val idx and best val acc, using the model dir, model name and model idx
model_save_name="train_model",
model_idx=epoch_idx,
best_validation_model_idx=self.best_val_model_idx,
best_validation_model_acc=self.best_val_model_acc,
)
self.save_model(
model_save_dir=self.experiment_saved_models,
# save model and best val idx and best val acc, using the model dir, model name and model idx
model_save_name="train_model",
model_idx="latest",
best_validation_model_idx=self.best_val_model_idx,
best_validation_model_acc=self.best_val_model_acc,
)
################################################################
##### Plot Gradient Flow at each Epoch during Training ######
print("Generating Gradient Flow Plot at epoch {}".format(epoch_idx))
plt = self.plot_grad_flow(self.model.named_parameters())
if not os.path.exists(
os.path.join(self.experiment_saved_models, "gradient_flow_plots")
):
os.mkdir(
os.path.join(self.experiment_saved_models, "gradient_flow_plots")
)
# plt.legend(loc="best")
plt.savefig(
os.path.join(
self.experiment_saved_models,
"gradient_flow_plots",
"epoch{}.pdf".format(str(epoch_idx)),
)
)
################################################################
print("Generating test set evaluation metrics")
self.load_model(
model_save_dir=self.experiment_saved_models,
model_idx=self.best_val_model_idx,
# load best validation model
model_save_name="train_model",
)
current_epoch_losses = {
"test_acc": [],
"test_loss": [],
} # initialize a statistics dict
with tqdm.tqdm(total=len(self.test_data)) as pbar_test: # ini a progress bar
for x, y in self.test_data: # sample batch
loss, accuracy = self.run_evaluation_iter(
x=x, y=y
) # compute loss and accuracy by running an evaluation step
current_epoch_losses["test_loss"].append(loss) # save test loss
current_epoch_losses["test_acc"].append(accuracy) # save test accuracy
pbar_test.update(1) # update progress bar status
pbar_test.set_description(
"loss: {:.4f}, accuracy: {:.4f}".format(loss, accuracy)
) # update progress bar string output
test_losses = {
key: [np.mean(value)] for key, value in current_epoch_losses.items()
} # save test set metrics in dict format
save_statistics(
experiment_log_dir=self.experiment_logs,
filename="test_summary.csv",
# save test set metrics on disk in .csv format
stats_dict=test_losses,
current_epoch=0,
continue_from_mode=False,
)
return total_losses, test_losses


@ -0,0 +1,640 @@
import torch
import torch.nn as nn
import torch.nn.functional as F
class FCCNetwork(nn.Module):
def __init__(
self, input_shape, num_output_classes, num_filters, num_layers, use_bias=False
):
"""
Initializes a fully connected network similar to the ones implemented previously in the MLP package.
:param input_shape: The shape of the inputs going in to the network.
:param num_output_classes: The number of outputs the network should have (for classification those would be the number of classes)
:param num_filters: Number of filters used in every fcc layer.
:param num_layers: Number of fcc layers (excluding dim reduction stages)
:param use_bias: Whether our fcc layers will use a bias.
"""
super(FCCNetwork, self).__init__()
# set up class attributes useful in building the network and inference
self.input_shape = input_shape
self.num_filters = num_filters
self.num_output_classes = num_output_classes
self.use_bias = use_bias
self.num_layers = num_layers
# initialize a module dict, which is effectively a dictionary that can collect layers and integrate them into pytorch
self.layer_dict = nn.ModuleDict()
# build the network
self.build_module()
def build_module(self):
print("Building basic block of FCCNetwork using input shape", self.input_shape)
x = torch.zeros((self.input_shape))
out = x
out = out.view(out.shape[0], -1)
# flatten inputs to shape (b, -1) where -1 is the dim resulting from multiplying the
# shapes of all dimensions after the 0th dim
for i in range(self.num_layers):
self.layer_dict["fcc_{}".format(i)] = nn.Linear(
in_features=out.shape[1], # initialize a fcc layer
out_features=self.num_filters,
bias=self.use_bias,
)
out = self.layer_dict["fcc_{}".format(i)](
out
) # apply ith fcc layer to the previous layers outputs
out = F.relu(out) # apply a ReLU on the outputs
self.logits_linear_layer = nn.Linear(
in_features=out.shape[1], # initialize the prediction output linear layer
out_features=self.num_output_classes,
bias=self.use_bias,
)
out = self.logits_linear_layer(
out
) # apply the layer to the previous layer's outputs
print("Block is built, output volume is", out.shape)
return out
def forward(self, x):
"""
Forward prop data through the network and return the preds
        :param x: Input batch x, of shape (batch_size, ...); samples can have any dimensionality.
:return: preds of shape (b, num_classes)
"""
out = x
out = out.view(out.shape[0], -1)
# flatten inputs to shape (b, -1) where -1 is the dim resulting from multiplying the
# shapes of all dimensions after the 0th dim
for i in range(self.num_layers):
out = self.layer_dict["fcc_{}".format(i)](
out
) # apply ith fcc layer to the previous layers outputs
out = F.relu(out) # apply a ReLU on the outputs
out = self.logits_linear_layer(
out
) # apply the layer to the previous layer's outputs
return out
def reset_parameters(self):
"""
        Re-initializes the network's parameters
"""
for item in self.layer_dict.children():
item.reset_parameters()
self.logits_linear_layer.reset_parameters()
class EmptyBlock(nn.Module):
def __init__(
self,
input_shape=None,
num_filters=None,
kernel_size=None,
padding=None,
bias=None,
dilation=None,
reduction_factor=None,
):
super(EmptyBlock, self).__init__()
self.num_filters = num_filters
self.kernel_size = kernel_size
self.input_shape = input_shape
self.padding = padding
self.bias = bias
self.dilation = dilation
self.build_module()
def build_module(self):
self.layer_dict = nn.ModuleDict()
x = torch.zeros(self.input_shape)
self.layer_dict["Identity"] = nn.Identity()
def forward(self, x):
out = x
out = self.layer_dict["Identity"].forward(out)
return out
class EntryConvolutionalBlock(nn.Module):
def __init__(self, input_shape, num_filters, kernel_size, padding, bias, dilation):
super(EntryConvolutionalBlock, self).__init__()
self.num_filters = num_filters
self.kernel_size = kernel_size
self.input_shape = input_shape
self.padding = padding
self.bias = bias
self.dilation = dilation
self.build_module()
def build_module(self):
self.layer_dict = nn.ModuleDict()
x = torch.zeros(self.input_shape)
out = x
self.layer_dict["conv_0"] = nn.Conv2d(
in_channels=out.shape[1],
out_channels=self.num_filters,
bias=self.bias,
kernel_size=self.kernel_size,
dilation=self.dilation,
padding=self.padding,
stride=1,
)
out = self.layer_dict["conv_0"].forward(out)
self.layer_dict["bn_0"] = nn.BatchNorm2d(num_features=out.shape[1])
out = F.leaky_relu(self.layer_dict["bn_0"].forward(out))
print(out.shape)
def forward(self, x):
out = x
out = self.layer_dict["conv_0"].forward(out)
out = F.leaky_relu(self.layer_dict["bn_0"].forward(out))
return out
class ConvolutionalProcessingBlock(nn.Module):
def __init__(self, input_shape, num_filters, kernel_size, padding, bias, dilation):
super(ConvolutionalProcessingBlock, self).__init__()
self.num_filters = num_filters
self.kernel_size = kernel_size
self.input_shape = input_shape
self.padding = padding
self.bias = bias
self.dilation = dilation
self.build_module()
def build_module(self):
self.layer_dict = nn.ModuleDict()
x = torch.zeros(self.input_shape)
out = x
self.layer_dict["conv_0"] = nn.Conv2d(
in_channels=out.shape[1],
out_channels=self.num_filters,
bias=self.bias,
kernel_size=self.kernel_size,
dilation=self.dilation,
padding=self.padding,
stride=1,
)
out = self.layer_dict["conv_0"].forward(out)
out = F.leaky_relu(out)
self.layer_dict["conv_1"] = nn.Conv2d(
in_channels=out.shape[1],
out_channels=self.num_filters,
bias=self.bias,
kernel_size=self.kernel_size,
dilation=self.dilation,
padding=self.padding,
stride=1,
)
out = self.layer_dict["conv_1"].forward(out)
out = F.leaky_relu(out)
print(out.shape)
def forward(self, x):
out = x
out = self.layer_dict["conv_0"].forward(out)
out = F.leaky_relu(out)
out = self.layer_dict["conv_1"].forward(out)
out = F.leaky_relu(out)
return out
class ConvolutionalDimensionalityReductionBlock(nn.Module):
def __init__(
self,
input_shape,
num_filters,
kernel_size,
padding,
bias,
dilation,
reduction_factor,
):
super(ConvolutionalDimensionalityReductionBlock, self).__init__()
self.num_filters = num_filters
self.kernel_size = kernel_size
self.input_shape = input_shape
self.padding = padding
self.bias = bias
self.dilation = dilation
self.reduction_factor = reduction_factor
self.build_module()
def build_module(self):
self.layer_dict = nn.ModuleDict()
x = torch.zeros(self.input_shape)
out = x
self.layer_dict["conv_0"] = nn.Conv2d(
in_channels=out.shape[1],
out_channels=self.num_filters,
bias=self.bias,
kernel_size=self.kernel_size,
dilation=self.dilation,
padding=self.padding,
stride=1,
)
out = self.layer_dict["conv_0"].forward(out)
out = F.leaky_relu(out)
out = F.avg_pool2d(out, self.reduction_factor)
self.layer_dict["conv_1"] = nn.Conv2d(
in_channels=out.shape[1],
out_channels=self.num_filters,
bias=self.bias,
kernel_size=self.kernel_size,
dilation=self.dilation,
padding=self.padding,
stride=1,
)
out = self.layer_dict["conv_1"].forward(out)
out = F.leaky_relu(out)
print(out.shape)
def forward(self, x):
out = x
out = self.layer_dict["conv_0"].forward(out)
out = F.leaky_relu(out)
out = F.avg_pool2d(out, self.reduction_factor)
out = self.layer_dict["conv_1"].forward(out)
out = F.leaky_relu(out)
return out
class ConvolutionalNetwork(nn.Module):
def __init__(
self,
input_shape,
num_output_classes,
num_filters,
num_blocks_per_stage,
num_stages,
use_bias=False,
processing_block_type=ConvolutionalProcessingBlock,
dimensionality_reduction_block_type=ConvolutionalDimensionalityReductionBlock,
):
"""
Initializes a convolutional network module
:param input_shape: The shape of the tensor to be passed into this network
:param num_output_classes: Number of output classes
:param num_filters: Number of filters per convolutional layer
:param num_blocks_per_stage: Number of blocks per "stage". Each block is composed of 2 convolutional layers.
:param num_stages: Number of stages in a network. A stage is defined as a sequence of layers within which the
        data dimensionality remains constant in the spatial axis (h, w) and can change in the channel axis. After each stage
there exists a dimensionality reduction stage, composed of two convolutional layers and an avg pooling layer.
:param use_bias: Whether to use biases in our convolutional layers
:param processing_block_type: Type of processing block to use within our stages
:param dimensionality_reduction_block_type: Type of dimensionality reduction block to use after each stage in our network
"""
super(ConvolutionalNetwork, self).__init__()
# set up class attributes useful in building the network and inference
self.input_shape = input_shape
self.num_filters = num_filters
self.num_output_classes = num_output_classes
self.use_bias = use_bias
self.num_blocks_per_stage = num_blocks_per_stage
self.num_stages = num_stages
self.processing_block_type = processing_block_type
self.dimensionality_reduction_block_type = dimensionality_reduction_block_type
# build the network
self.build_module()
def build_module(self):
"""
Builds network whilst automatically inferring shapes of layers.
"""
self.layer_dict = nn.ModuleDict()
# initialize a module dict, which is effectively a dictionary that can collect layers and integrate them into pytorch
print(
"Building basic block of ConvolutionalNetwork using input shape",
self.input_shape,
)
x = torch.zeros(
(self.input_shape)
) # create dummy inputs to be used to infer shapes of layers
out = x
self.layer_dict["input_conv"] = EntryConvolutionalBlock(
input_shape=out.shape,
num_filters=self.num_filters,
kernel_size=3,
padding=1,
bias=self.use_bias,
dilation=1,
)
out = self.layer_dict["input_conv"].forward(out)
# torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)
for i in range(self.num_stages): # for number of layers times
for j in range(self.num_blocks_per_stage):
self.layer_dict["block_{}_{}".format(i, j)] = (
self.processing_block_type(
input_shape=out.shape,
num_filters=self.num_filters,
bias=self.use_bias,
kernel_size=3,
dilation=1,
padding=1,
)
)
out = self.layer_dict["block_{}_{}".format(i, j)].forward(out)
self.layer_dict["reduction_block_{}".format(i)] = (
self.dimensionality_reduction_block_type(
input_shape=out.shape,
num_filters=self.num_filters,
bias=True,
kernel_size=3,
dilation=1,
padding=1,
reduction_factor=2,
)
)
out = self.layer_dict["reduction_block_{}".format(i)].forward(out)
out = F.avg_pool2d(out, out.shape[-1])
print("shape before final linear layer", out.shape)
out = out.view(out.shape[0], -1)
self.logit_linear_layer = nn.Linear(
in_features=out.shape[1], # add a linear layer
out_features=self.num_output_classes,
bias=True,
)
out = self.logit_linear_layer(out) # apply linear layer on flattened inputs
print("Block is built, output volume is", out.shape)
return out
def forward(self, x):
"""
        Forward propagates the network given an input batch
:param x: Inputs x (b, c, h, w)
:return: preds (b, num_classes)
"""
out = x
out = self.layer_dict["input_conv"].forward(out)
for i in range(self.num_stages): # for number of layers times
for j in range(self.num_blocks_per_stage):
out = self.layer_dict["block_{}_{}".format(i, j)].forward(out)
out = self.layer_dict["reduction_block_{}".format(i)].forward(out)
out = F.avg_pool2d(out, out.shape[-1])
out = out.view(
out.shape[0], -1
) # flatten outputs from (b, c, h, w) to (b, c*h*w)
out = self.logit_linear_layer(
out
) # pass through a linear layer to get logits/preds
return out
def reset_parameters(self):
"""
Re-initialize the network parameters.
"""
for item in self.layer_dict.children():
try:
item.reset_parameters()
except:
pass
self.logit_linear_layer.reset_parameters()
# My Implementation:
class ConvolutionalProcessingBlockBN(nn.Module):
def __init__(self, input_shape, num_filters, kernel_size, padding, bias, dilation):
super().__init__()
self.num_filters = num_filters
self.kernel_size = kernel_size
self.input_shape = input_shape
self.padding = padding
self.bias = bias
self.dilation = dilation
self.build_module()
def build_module(self):
self.layer_dict = nn.ModuleDict()
x = torch.zeros(self.input_shape)
out = x
# First convolutional layer with Batch Normalization
self.layer_dict["conv_0"] = nn.Conv2d(
in_channels=out.shape[1],
out_channels=self.num_filters,
bias=self.bias,
kernel_size=self.kernel_size,
dilation=self.dilation,
padding=self.padding,
stride=1,
)
self.layer_dict["bn_0"] = nn.BatchNorm2d(self.num_filters)
out = F.leaky_relu(self.layer_dict["bn_0"](self.layer_dict["conv_0"](out)))
# Second convolutional layer with Batch Normalization
self.layer_dict["conv_1"] = nn.Conv2d(
in_channels=out.shape[1],
out_channels=self.num_filters,
bias=self.bias,
kernel_size=self.kernel_size,
dilation=self.dilation,
padding=self.padding,
stride=1,
)
self.layer_dict["bn_1"] = nn.BatchNorm2d(self.num_filters)
out = F.leaky_relu(self.layer_dict["bn_1"](self.layer_dict["conv_1"](out)))
print(out.shape)
def forward(self, x):
out = x
# Apply first conv layer + BN + ReLU
out = F.leaky_relu(self.layer_dict["bn_0"](self.layer_dict["conv_0"](out)))
# Apply second conv layer + BN + ReLU
out = F.leaky_relu(self.layer_dict["bn_1"](self.layer_dict["conv_1"](out)))
return out
class ConvolutionalDimensionalityReductionBlockBN(nn.Module):
def __init__(
self,
input_shape,
num_filters,
kernel_size,
padding,
bias,
dilation,
reduction_factor,
):
super().__init__()
self.num_filters = num_filters
self.kernel_size = kernel_size
self.input_shape = input_shape
self.padding = padding
self.bias = bias
self.dilation = dilation
self.reduction_factor = reduction_factor
self.build_module()
def build_module(self):
self.layer_dict = nn.ModuleDict()
x = torch.zeros(self.input_shape)
out = x
# First convolutional layer with Batch Normalization
self.layer_dict["conv_0"] = nn.Conv2d(
in_channels=out.shape[1],
out_channels=self.num_filters,
bias=self.bias,
kernel_size=self.kernel_size,
dilation=self.dilation,
padding=self.padding,
stride=1,
)
self.layer_dict["bn_0"] = nn.BatchNorm2d(self.num_filters)
out = F.leaky_relu(self.layer_dict["bn_0"](self.layer_dict["conv_0"](out)))
# Dimensionality reduction through average pooling
out = F.avg_pool2d(out, self.reduction_factor)
# Second convolutional layer with Batch Normalization
self.layer_dict["conv_1"] = nn.Conv2d(
in_channels=out.shape[1],
out_channels=self.num_filters,
bias=self.bias,
kernel_size=self.kernel_size,
dilation=self.dilation,
padding=self.padding,
stride=1,
)
self.layer_dict["bn_1"] = nn.BatchNorm2d(self.num_filters)
out = F.leaky_relu(self.layer_dict["bn_1"](self.layer_dict["conv_1"](out)))
print(out.shape)
def forward(self, x):
out = x
# First conv layer + BN + Leaky ReLU
out = F.leaky_relu(self.layer_dict["bn_0"](self.layer_dict["conv_0"](out)))
# Dimensionality reduction through average pooling
out = F.avg_pool2d(out, self.reduction_factor)
# Second conv layer + BN + Leaky ReLU
out = F.leaky_relu(self.layer_dict["bn_1"](self.layer_dict["conv_1"](out)))
return out
class ConvolutionalProcessingBlockBNRC(nn.Module):
def __init__(self, input_shape, num_filters, kernel_size, padding, bias, dilation):
super().__init__()
self.num_filters = num_filters
self.kernel_size = kernel_size
self.input_shape = input_shape
self.padding = padding
self.bias = bias
self.dilation = dilation
self.build_module()
def build_module(self):
self.layer_dict = nn.ModuleDict()
x = torch.zeros(self.input_shape)
out = x
# First convolutional layer with BN
self.layer_dict["conv_0"] = nn.Conv2d(
in_channels=out.shape[1],
out_channels=self.num_filters,
bias=self.bias,
kernel_size=self.kernel_size,
dilation=self.dilation,
padding=self.padding,
stride=1,
)
self.layer_dict["bn_0"] = nn.BatchNorm2d(self.num_filters)
out = self.layer_dict["conv_0"].forward(out)
out = self.layer_dict["bn_0"].forward(out)
out = F.leaky_relu(out)
# Second convolutional layer with BN
self.layer_dict["conv_1"] = nn.Conv2d(
in_channels=out.shape[1],
out_channels=self.num_filters,
bias=self.bias,
kernel_size=self.kernel_size,
dilation=self.dilation,
padding=self.padding,
stride=1,
)
self.layer_dict["bn_1"] = nn.BatchNorm2d(self.num_filters)
out = self.layer_dict["conv_1"].forward(out)
out = self.layer_dict["bn_1"].forward(out)
out = F.leaky_relu(out)
# Print final output shape for debugging
print(out.shape)
def forward(self, x):
residual = x # Save input for residual connection
out = x
# First conv layer + BN + Leaky ReLU
out = F.leaky_relu(self.layer_dict["bn_0"](self.layer_dict["conv_0"](out)))
# Second conv layer + BN + Leaky ReLU
out = F.leaky_relu(self.layer_dict["bn_1"](self.layer_dict["conv_1"](out)))
# Add the residual connection; the element-wise sum requires matching shapes
assert residual.shape == out.shape, "residual and block output must have the same shape"
out = out + residual
return out
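A quick sanity check for the two new blocks above (an illustrative sketch with assumed shapes, meant to run in this same module so torch and the classes are in scope): the BN+RC processing block must be fed an input whose channel count already equals num_filters, because the residual sum requires matching shapes, while the BN reduction block halves the spatial resolution.
block = ConvolutionalProcessingBlockBNRC(
    input_shape=(1, 16, 32, 32), num_filters=16, kernel_size=3,
    padding=1, bias=False, dilation=1,
)
print(block(torch.randn(1, 16, 32, 32)).shape)  # torch.Size([1, 16, 32, 32]): shape preserved, residual valid
reduction = ConvolutionalDimensionalityReductionBlockBN(
    input_shape=(1, 16, 32, 32), num_filters=16, kernel_size=3,
    padding=1, bias=False, dilation=1, reduction_factor=2,
)
print(reduction(torch.randn(1, 16, 32, 32)).shape)  # torch.Size([1, 16, 16, 16]): spatial dims halved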

View File

@ -0,0 +1,77 @@
import pickle
import os
import csv
def save_to_stats_pkl_file(experiment_log_filepath, filename, stats_dict):
summary_filename = os.path.join(experiment_log_filepath, filename)
with open("{}.pkl".format(summary_filename), "wb") as file_writer:
pickle.dump(stats_dict, file_writer)
def load_from_stats_pkl_file(experiment_log_filepath, filename):
summary_filename = os.path.join(experiment_log_filepath, filename)
with open("{}.pkl".format(summary_filename), "rb") as file_reader:
stats = pickle.load(file_reader)
return stats
def save_statistics(
experiment_log_dir,
filename,
stats_dict,
current_epoch,
continue_from_mode=False,
save_full_dict=False,
):
"""
Saves the statistics in stats_dict into a csv file, using the keys as the header entries and the values as the
columns under each header.
:param experiment_log_dir: the log folder dir filepath
:param filename: the name of the csv file
:param stats_dict: the stats dict containing the data to be saved
:param current_epoch: the number of epochs since commencement of the current training session (i.e. if the experiment continued from epoch 100 and this is epoch 105, then pass the relative distance of 5)
:param continue_from_mode: whether to append to an existing csv file instead of creating a new one
:param save_full_dict: whether to save the full dict as is, overriding any previous entries (useful if we want to overwrite a file)
:return: The filepath to the summary file
"""
summary_filename = os.path.join(experiment_log_dir, filename)
mode = "a" if continue_from_mode else "w"
with open(summary_filename, mode) as f:
writer = csv.writer(f)
if not continue_from_mode:
writer.writerow(list(stats_dict.keys()))
if save_full_dict:
total_rows = len(list(stats_dict.values())[0])
for idx in range(total_rows):
row_to_add = [value[idx] for value in list(stats_dict.values())]
writer.writerow(row_to_add)
else:
row_to_add = [value[current_epoch] for value in list(stats_dict.values())]
writer.writerow(row_to_add)
return summary_filename
def load_statistics(experiment_log_dir, filename):
"""
Loads a statistics csv file into a dictionary
:param experiment_log_dir: the log folder dir filepath
:param filename: the name of the csv file to load
:return: A dictionary containing the stats in the csv file. Header entries are converted into keys and columns of a
particular header are converted into values of a key in a list format.
"""
summary_filename = os.path.join(experiment_log_dir, filename)
with open(summary_filename, "r+") as f:
lines = f.readlines()
keys = lines[0].rstrip("\n").split(",")
stats = {key: [] for key in keys}
for line in lines[1:]:
values = line.rstrip("\n").split(",")
for idx, value in enumerate(values):
stats[keys[idx]].append(value)
return stats
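A brief usage sketch for these helpers (hypothetical folder name and values; the log directory is assumed to already exist): save_statistics writes one csv row per epoch, appending when continue_from_mode is True, and load_statistics reads the file back as lists of strings keyed by column name.
stats = {"train_acc": [0.10, 0.25], "train_loss": [2.30, 1.90]}
save_statistics("example_logs", "summary.csv", stats, current_epoch=0, continue_from_mode=False)
save_statistics("example_logs", "summary.csv", stats, current_epoch=1, continue_from_mode=True)
loaded = load_statistics("example_logs", "summary.csv")  # e.g. {"train_acc": ["0.1", "0.25"], ...}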

View File

@ -0,0 +1,87 @@
import unittest
import torch
from model_architectures import (
ConvolutionalProcessingBlockBN,
ConvolutionalDimensionalityReductionBlockBN,
ConvolutionalProcessingBlockBNRC,
)
class TestBatchNormalizationBlocks(unittest.TestCase):
def setUp(self):
# Common parameters
self.input_shape = (1, 3, 32, 32) # Batch size 1, 3 channels, 32x32 input
self.num_filters = 16
self.kernel_size = 3
self.padding = 1
self.bias = False
self.dilation = 1
self.reduction_factor = 2
def test_convolutional_processing_block(self):
# Create a ConvolutionalProcessingBlockBN instance
block = ConvolutionalProcessingBlockBN(
input_shape=self.input_shape,
num_filters=self.num_filters,
kernel_size=self.kernel_size,
padding=self.padding,
bias=self.bias,
dilation=self.dilation,
)
# Generate a random tensor matching the input shape
input_tensor = torch.randn(self.input_shape)
# Forward pass
try:
output = block(input_tensor)
self.assertIsNotNone(output, "Output should not be None.")
except Exception as e:
self.fail(f"ConvolutionalProcessingBlock raised an error: {e}")
def test_convolutional_processing_block_with_rc(self):
# Create a ConvolutionalProcessingBlockBNRC instance.
# The residual connection requires the block to preserve its input shape, so the
# input must already have num_filters channels (unlike the non-residual blocks).
rc_input_shape = (1, self.num_filters, 32, 32)
block = ConvolutionalProcessingBlockBNRC(
input_shape=rc_input_shape,
num_filters=self.num_filters,
kernel_size=self.kernel_size,
padding=self.padding,
bias=self.bias,
dilation=self.dilation,
)
# Generate a random tensor matching the input shape
input_tensor = torch.randn(rc_input_shape)
# Forward pass
try:
output = block(input_tensor)
self.assertIsNotNone(output, "Output should not be None.")
except Exception as e:
self.fail(f"ConvolutionalProcessingBlock raised an error: {e}")
def test_convolutional_dimensionality_reduction_block(self):
# Create a ConvolutionalDimensionalityReductionBlockBN instance
block = ConvolutionalDimensionalityReductionBlockBN(
input_shape=self.input_shape,
num_filters=self.num_filters,
kernel_size=self.kernel_size,
padding=self.padding,
bias=self.bias,
dilation=self.dilation,
reduction_factor=self.reduction_factor,
)
# Generate a random tensor matching the input shape
input_tensor = torch.randn(self.input_shape)
# Forward pass
try:
output = block(input_tensor)
self.assertIsNotNone(output, "Output should not be None.")
except Exception as e:
self.fail(f"ConvolutionalDimensionalityReductionBlock raised an error: {e}")
if __name__ == "__main__":
unittest.main()
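Because of the unittest.main() entry point, this file can be run directly with the Python interpreter. It can also be driven programmatically, as in the sketch below (the module name test_blocks is an assumption; the actual file name is not shown in this diff).
import unittest
from test_blocks import TestBatchNormalizationBlocks  # hypothetical module name

suite = unittest.TestLoader().loadTestsFromTestCase(TestBatchNormalizationBlocks)
unittest.TextTestRunner(verbosity=2).run(suite)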

View File

@ -0,0 +1,102 @@
import numpy as np
import torch
from torch.utils.data import DataLoader
from torchvision import transforms
import mlp.data_providers as data_providers
from pytorch_mlp_framework.arg_extractor import get_args
from pytorch_mlp_framework.experiment_builder import ExperimentBuilder
from pytorch_mlp_framework.model_architectures import *
import os
# os.environ["CUDA_VISIBLE_DEVICES"]="0"
args = get_args() # get arguments from command line
rng = np.random.RandomState(seed=args.seed) # set the seeds for the experiment
torch.manual_seed(seed=args.seed) # sets pytorch's seed
# set up data augmentation transforms for training and testing
transform_train = transforms.Compose(
[
transforms.RandomCrop(32, padding=4),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
]
)
transform_test = transforms.Compose(
[
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
]
)
train_data = data_providers.CIFAR100(
root="data", set_name="train", transform=transform_train, download=True
)  # training split, with augmentation transforms
val_data = data_providers.CIFAR100(
root="data", set_name="val", transform=transform_test, download=True
)  # validation split, evaluation transforms only
test_data = data_providers.CIFAR100(
root="data", set_name="test", transform=transform_test, download=True
)  # test split, evaluation transforms only
train_data_loader = DataLoader(
train_data, batch_size=args.batch_size, shuffle=True, num_workers=2
)
val_data_loader = DataLoader(
val_data, batch_size=args.batch_size, shuffle=True, num_workers=2
)
test_data_loader = DataLoader(
test_data, batch_size=args.batch_size, shuffle=True, num_workers=2
)
if args.block_type == "conv_block":
processing_block_type = ConvolutionalProcessingBlock
dim_reduction_block_type = ConvolutionalDimensionalityReductionBlock
elif args.block_type == "empty_block":
processing_block_type = EmptyBlock
dim_reduction_block_type = EmptyBlock
elif args.block_type == "conv_bn":
processing_block_type = ConvolutionalProcessingBlockBN
dim_reduction_block_type = ConvolutionalDimensionalityReductionBlockBN
elif args.block_type == "conv_bn_rc":
processing_block_type = ConvolutionalProcessingBlockBNRC
dim_reduction_block_type = ConvolutionalDimensionalityReductionBlockBN
else:
raise ValueError("Unknown block_type: {}".format(args.block_type))
custom_conv_net = (
ConvolutionalNetwork( # initialize our network object, in this case a ConvNet
input_shape=(
args.batch_size,
args.image_num_channels,
args.image_height,
args.image_width,
),
num_output_classes=args.num_classes,
num_filters=args.num_filters,
use_bias=False,
num_blocks_per_stage=args.num_blocks_per_stage,
num_stages=args.num_stages,
processing_block_type=processing_block_type,
dimensionality_reduction_block_type=dim_reduction_block_type,
)
)
conv_experiment = ExperimentBuilder(
network_model=custom_conv_net,
experiment_name=args.experiment_name,
num_epochs=args.num_epochs,
weight_decay_coefficient=args.weight_decay_coefficient,
learning_rate=args.learning_rate,
use_gpu=args.use_gpu,
continue_from_epoch=args.continue_from_epoch,
train_data=train_data_loader,
val_data=val_data_loader,
test_data=test_data_loader,
) # build an experiment object
experiment_metrics, test_metrics = (
conv_experiment.run_experiment()
) # run experiment and return experiment metrics
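For reference, a sketch of constructing the same network directly with the new batch-normalised blocks, bypassing the command-line dispatch above (the concrete sizes here are assumed example values, not settings taken from this diff):
net = ConvolutionalNetwork(
    input_shape=(100, 3, 32, 32),
    num_output_classes=100,
    num_filters=32,
    use_bias=False,
    num_blocks_per_stage=5,
    num_stages=3,
    processing_block_type=ConvolutionalProcessingBlockBNRC,
    dimensionality_reduction_block_type=ConvolutionalDimensionalityReductionBlockBN,
)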

4
report/.gitignore vendored Normal file
View File

@ -0,0 +1,4 @@
*.fls
*.fdb_latexmk
s2759177/
*.zip

1
report/README.txt Normal file
View File

@ -0,0 +1 @@
Most reasonable LaTeX distributions should have no problem building the document from what is in the provided LaTeX source directory. However, certain LaTeX distributions are missing certain files, and they are included in this directory. If you get an error message when you build the LaTeX document saying one of these files is missing, then move the relevant file into your LaTeX source directory.

View File

@ -0,0 +1,101 @@
train_acc,train_loss,val_acc,val_loss
0.027410526315789472,4.440032,0.0368,4.238186
0.0440842105263158,4.1909122,0.0644,4.1239405
0.05604210526315791,4.0817885,0.0368,4.495799
0.0685263157894737,3.984858,0.0964,3.8527937
0.08345263157894738,3.8947835,0.09080000000000002,3.8306112
0.09391578947368423,3.8246264,0.10399999999999998,3.7504945
0.10189473684210527,3.760145,0.1124,3.6439042
0.11197894736842108,3.704831,0.0992,3.962508
0.12534736842105265,3.6408415,0.1404,3.516474
0.1385894736842105,3.5672796,0.1444,3.5242612
0.14873684210526317,3.5145628,0.12960000000000002,3.5745378
0.16103157894736844,3.4476008,0.1852,3.3353982
0.16846315789473681,3.399858,0.15600000000000003,3.453797
0.1760210526315789,3.3611393,0.1464,3.5799885
0.18625263157894736,3.3005812,0.196,3.201007
0.19233684210526317,3.26565,0.17439999999999997,3.397586
0.19625263157894737,3.2346153,0.212,3.169959
0.20717894736842105,3.174345,0.2132,3.0981174
0.2136,3.1425776,0.2036,3.2191591
0.2217684210526316,3.094137,0.236,3.0018876
0.23069473684210529,3.0539455,0.20440000000000003,3.1800296
0.23395789473684211,3.0338168,0.22599999999999998,3.0360818
0.24463157894736842,2.9761615,0.2588,2.8876188
0.25311578947368424,2.931479,0.2,3.242481
0.25795789473684216,2.900163,0.28320000000000006,2.830947
0.26789473684210524,2.8484874,0.2768,2.8190458
0.2709263157894737,2.833472,0.2352,3.0098538
0.2816421052631579,2.7842317,0.29560000000000003,2.7288156
0.28764210526315787,2.745757,0.2648,2.8955112
0.2930315789473684,2.7276495,0.27680000000000005,2.8336413
0.3001263157894737,2.6826382,0.316,2.6245823
0.3068421052631579,2.658441,0.27,2.9279957
0.30909473684210526,2.638565,0.31160000000000004,2.637653
0.3213263157894737,2.5939283,0.31799999999999995,2.627816
0.3211157894736843,2.579544,0.25079999999999997,2.9502957
0.3259999999999999,2.5540712,0.3332,2.569941
0.3336421052631579,2.5239582,0.278,2.7676308
0.3371368421052632,2.5109046,0.2916,2.725589
0.34404210526315787,2.4714804,0.34120000000000006,2.4782379
0.3500631578947368,2.4545348,0.30600000000000005,2.6625924
0.34976842105263156,2.4408882,0.342,2.5351026
0.3586315789473684,2.4116046,0.3452,2.450749
0.3568421052631579,2.4133172,0.3288,2.5647113
0.3630947368421052,2.3772728,0.36519999999999997,2.388074
0.37069473684210524,2.3505116,0.324,2.5489926
0.37132631578947367,2.352426,0.33680000000000004,2.5370462
0.37606315789473677,2.319005,0.3712,2.3507965
0.3800210526315789,2.3045664,0.33,2.6327293
0.38185263157894733,2.2965574,0.3764,2.364877
0.38785263157894734,2.269467,0.37799999999999995,2.330837
0.3889684210526316,2.26941,0.3559999999999999,2.513778
0.3951789473684211,2.2413251,0.3888,2.2839465
0.3944421052631579,2.2319226,0.35919999999999996,2.4310353
0.4,2.220305,0.3732,2.348543
0.4051157894736842,2.1891508,0.39440000000000003,2.2730627
0.40581052631578945,2.1873925,0.33399999999999996,2.5648093
0.4067789473684211,2.1817088,0.4044,2.2244952
0.41555789473684207,2.1543047,0.39759999999999995,2.220972
0.4170526315789474,2.14905,0.33399999999999996,2.6612198
0.41762105263157895,2.1321266,0.3932,2.2343464
0.42341052631578946,2.1131704,0.37800000000000006,2.327929
0.4212842105263158,2.112597,0.376,2.3302126
0.4295157894736842,2.0925663,0.4100000000000001,2.175698
0.4299368421052632,2.0846903,0.3772,2.3750577
0.43134736842105265,2.075184,0.4044,2.1888158
0.43829473684210524,2.045202,0.41239999999999993,2.1673117
0.43534736842105265,2.0590534,0.37440000000000007,2.3269994
0.4417684210526316,2.0356588,0.42,2.1668334
0.4442736842105263,2.028207,0.41239999999999993,2.2346516
0.44581052631578943,2.021492,0.40519999999999995,2.2030904
0.44884210526315793,2.0058675,0.4296,2.0948715
0.45071578947368424,1.993417,0.39,2.2856123
0.45130526315789476,1.9970801,0.43599999999999994,2.110219
0.45686315789473686,1.9651922,0.4244,2.1253593
0.4557263157894737,1.9701725,0.3704,2.4576838
0.4609684210526315,1.956996,0.4412,2.0626938
0.4639789473684211,1.9407912,0.398,2.3076272
0.46311578947368426,1.9410807,0.4056,2.2181008
0.4686736842105263,1.918824,0.45080000000000003,2.030652
0.4650315789473684,1.924879,0.3948,2.2926931
0.46964210526315786,1.9188553,0.43599999999999994,2.107239
0.47357894736842104,1.8991861,0.43119999999999997,2.067097
0.47212631578947367,1.8987728,0.41359999999999997,2.1667569
0.4773263157894737,1.8892545,0.46,2.0283196
0.4802526315789474,1.8736148,0.41960000000000003,2.1698954
0.47406315789473685,1.8849738,0.43399999999999994,2.1001608
0.48627368421052636,1.8492608,0.45520000000000005,1.9936249
0.48589473684210527,1.8534511,0.38439999999999996,2.354954
0.48667368421052637,1.8421199,0.44120000000000004,2.0467849
0.4902736842105263,1.8265136,0.45519999999999994,2.0044358
0.4879789473684211,1.838593,0.3984,2.3019247
0.49204210526315795,1.8199797,0.4656,1.9858631
0.4945894736842105,1.805858,0.436,2.1293921
0.4939578947368421,1.8174701,0.4388,2.0611947
0.4961684210526316,1.7953233,0.4612,1.9728945
0.49610526315789477,1.7908033,0.42440000000000005,2.1648548
0.4996,1.7908286,0.4664,1.9897026
0.5070105263157895,1.7658812,0.452,2.0411723
0.5027368421052631,1.7692825,0.4136000000000001,2.280331
0.5062315789473685,1.7649119,0.4768,1.9493303

View File

@ -0,0 +1,2 @@
test_acc,test_loss
0.46970000000000006,1.9579598

View File

@ -0,0 +1,101 @@
train_acc,train_loss,val_acc,val_loss
0.04040000000000001,4.2986817,0.07600000000000001,3.9793916
0.07663157894736841,3.948711,0.09840000000000002,3.8271046
0.1072842105263158,3.7670445,0.0908,3.8834984
0.14671578947368422,3.544252,0.1784,3.3180876
0.18690526315789474,3.3382895,0.1672,3.4958847
0.2185684210526316,3.1613564,0.23240000000000002,3.0646808
0.2584,2.9509778,0.2904,2.7620668
0.2886736842105263,2.7674758,0.2504,3.083242
0.3186736842105263,2.6191177,0.34600000000000003,2.5320892
0.3488421052631579,2.4735146,0.3556,2.463249
0.36701052631578945,2.3815694,0.32480000000000003,2.6590502
0.39258947368421054,2.2661598,0.41200000000000003,2.215237
0.40985263157894736,2.1811035,0.3644,2.4625826
0.42557894736842106,2.1193688,0.3896,2.2802749
0.4452,2.0338347,0.45080000000000003,2.0216491
0.45298947368421055,1.9886738,0.3768,2.4903286
0.4690105263157895,1.9385177,0.46519999999999995,1.9589043
0.48627368421052636,1.8654134,0.46199999999999997,1.9572229
0.4910947368421053,1.836772,0.3947999999999999,2.371203
0.5033052631578947,1.7882212,0.4864,1.8270072
0.515578947368421,1.7451773,0.418,2.2281988
0.5166526315789474,1.7310464,0.4744,1.9468222
0.532,1.6639497,0.5176,1.7627875
0.534821052631579,1.6504371,0.426,2.2908173
0.5399578947368422,1.6263881,0.5092,1.7892419
0.5538105263157893,1.5786182,0.5184,1.7781507
0.5530526315789474,1.5743873,0.45480000000000004,2.052206
0.5610526315789474,1.5367776,0.5404000000000001,1.6886607
0.5709263157894736,1.508275,0.5072000000000001,1.8317349
0.5693894736842106,1.5026951,0.49760000000000004,1.9268813
0.5827368421052632,1.4614111,0.5484,1.6791071
0.583557894736842,1.4580216,0.4744,2.084504
0.5856842105263159,1.4402864,0.5468,1.6674811
0.5958105263157895,1.4054152,0.5468,1.7081916
0.5964631578947368,1.4043275,0.4988,1.8901508
0.6044631578947368,1.3692447,0.548,1.6456038
0.6065473684210526,1.3562685,0.5448,1.7725601
0.6055578947368421,1.3638091,0.52,1.803752
0.6169684210526316,1.3224502,0.5688,1.6048553
0.6184421052631579,1.3228824,0.4772,2.0309162
0.6193894736842105,1.312684,0.5496,1.6357917
0.6287368421052631,1.2758818,0.5552,1.7120187
0.6270105263157894,1.2829372,0.4872000000000001,1.9630791
0.6313473684210527,1.2609128,0.5632,1.6049384
0.6374736842105263,1.2429903,0.5516,1.7101723
0.6342947368421055,1.2540665,0.5272,1.8112053
0.642778947368421,1.2098345,0.5692,1.5996393
0.6447368421052632,1.217454,0.5056,2.087292
0.6437052631578949,1.2123955,0.5660000000000001,1.6426488
0.6533263157894735,1.1804259,0.5672,1.6429158
0.6521052631578947,1.1856273,0.5316000000000001,1.8833923
0.658021052631579,1.1663536,0.5652,1.6239171
0.6622947368421054,1.1522906,0.5376000000000001,1.8352613
0.6543789473684212,1.1700194,0.5539999999999999,1.7920883
0.6664,1.1246897,0.5828,1.5657492
0.6645473684210526,1.1307288,0.5296,1.8285477
0.6647157894736843,1.1294464,0.5852,1.59438
0.6713473684210526,1.1020554,0.5647999999999999,1.6256377
0.6691368421052631,1.1129124,0.5224,1.9497899
0.6737684210526315,1.0941163,0.5708,1.5900868
0.6765473684210527,1.0844595,0.55,1.7522817
0.6762947368421053,1.0832069,0.5428000000000001,1.8020345
0.6799789473684209,1.0637755,0.5864,1.5690281
0.6808421052631578,1.066873,0.5168,1.9964217
0.6843157894736842,1.0618489,0.5720000000000001,1.6391727
0.6866736842105262,1.0432214,0.5731999999999999,1.6571078
0.6877684210526315,1.0442319,0.5192,2.0341485
0.6890105263157895,1.0338738,0.5836,1.5887364
0.693642105263158,1.0206536,0.5456,1.8537303
0.6905894736842106,1.0271776,0.5548000000000001,1.8022745
0.6981263157894737,1.001102,0.5852,1.5923084
0.6986105263157896,1.0052379,0.512,2.011443
0.698042105263158,0.9990784,0.5744,1.638558
0.7031578947368421,0.977477,0.5816,1.5790274
0.7013473684210526,0.98766434,0.5448000000000001,1.8414693
0.7069684210526315,0.9691622,0.59,1.5866013
0.7061894736842105,0.9620083,0.55,1.7695292
0.7050526315789474,0.9689725,0.5408,1.8329593
0.7101052631578948,0.95279986,0.5852,1.5835829
0.7122315789473684,0.9483001,0.5224,1.9749893
0.7115157894736842,0.94911486,0.5808,1.6965445
0.7166315789473684,0.9338312,0.5788,1.6249495
0.7120631578947368,0.9428737,0.5224,1.9721117
0.7197263157894737,0.92057914,0.5960000000000001,1.6235417
0.7258315789473684,0.9071854,0.528,2.0651033
0.7186947368421053,0.922529,0.5628,1.7508049
0.7257684210526316,0.9007169,0.5980000000000001,1.5797865
0.7254105263157896,0.89657074,0.5472,1.8673587
0.7229263157894736,0.90324384,0.5771999999999999,1.6998875
0.7308842105263157,0.8757633,0.5856,1.6750972
0.7254947368421052,0.8956531,0.5479999999999999,1.9809356
0.7302105263157894,0.8803156,0.5960000000000001,1.6343199
0.7353473684210525,0.8630421,0.56,1.9686066
0.732021052631579,0.8823739,0.5632,1.8139118
0.7324631578947367,0.8676047,0.5952000000000001,1.6235788
0.7366526315789473,0.85581774,0.5392,1.9346147
0.7340210526315789,0.8636227,0.5868,1.6743768
0.7416631578947368,0.84529686,0.5836,1.6691054
0.734757894736842,0.85352796,0.516,2.227477
0.7435368421052632,0.83374214,0.582,1.697568

View File

@ -0,0 +1,2 @@
test_acc,test_loss
0.6018000000000001,1.5933747

View File

@ -0,0 +1,101 @@
train_acc,train_loss,val_acc,val_loss
0.009600000000000001,4.609349,0.0104,4.6072426
0.009326315789473684,4.6068563,0.0092,4.606588
0.009747368421052631,4.6062207,0.0084,4.606326
0.009621052631578947,4.6059957,0.0076,4.6067405
0.009873684210526314,4.605887,0.0076,4.6068487
0.009136842105263157,4.605854,0.008,4.6074386
0.009536842105263158,4.605795,0.007200000000000001,4.6064863
0.009578947368421051,4.6057415,0.006400000000000001,4.6065035
0.009410526315789473,4.6058245,0.0076,4.606772
0.009094736842105263,4.6057224,0.007600000000000001,4.6064925
0.00911578947368421,4.605707,0.007200000000000001,4.6067533
0.009852631578947368,4.605685,0.007200000000000001,4.6068745
0.01031578947368421,4.6056952,0.0072,4.6067533
0.009789473684210527,4.6057863,0.0072,4.6070247
0.01031578947368421,4.6056023,0.0064,4.607134
0.010189473684210526,4.605698,0.0064,4.606934
0.009957894736842107,4.605643,0.006400000000000001,4.6068535
0.009452631578947369,4.605595,0.0064,4.6070676
0.009368421052631578,4.6057224,0.008,4.6070356
0.010210526315789474,4.6056094,0.009600000000000001,4.6070833
0.009557894736842105,4.6056895,0.0076,4.6069493
0.009600000000000001,4.605709,0.008400000000000001,4.60693
0.00985263157894737,4.6055284,0.0084,4.6068263
0.009200000000000002,4.60564,0.0076,4.6071053
0.009031578947368422,4.6056323,0.008400000000000001,4.606731
0.009663157894736842,4.60559,0.0068,4.6069546
0.008484210526315789,4.605676,0.009600000000000001,4.6063976
0.0096,4.605595,0.011200000000000002,4.6067076
0.00951578947368421,4.605619,0.0096,4.6068506
0.009242105263157895,4.6056657,0.0072,4.6067576
0.009326315789473684,4.6055913,0.012,4.6070724
0.01023157894736842,4.605646,0.012000000000000002,4.6066885
0.009494736842105262,4.605563,0.0072,4.6067305
0.009810526315789474,4.6055746,0.007200000000000001,4.6067824
0.010147368421052632,4.605596,0.0072,4.607214
0.009536842105263156,4.6055007,0.007200000000000001,4.607186
0.009452631578947369,4.605547,0.0072,4.607297
0.009578947368421055,4.6055694,0.0072,4.607313
0.009410526315789475,4.6055374,0.0072,4.60726
0.00985263157894737,4.605587,0.0072,4.6072307
0.009389473684210526,4.605559,0.0072,4.607227
0.009852631578947368,4.6055884,0.008,4.6070976
0.008968421052631579,4.6055803,0.008,4.607156
0.009536842105263158,4.605502,0.0076,4.6073594
0.009410526315789473,4.6055517,0.008,4.607176
0.01,4.6055126,0.006400000000000001,4.606937
0.009915789473684213,4.6055126,0.008,4.607185
0.009305263157894737,4.605594,0.0064,4.606834
0.009326315789473684,4.6054907,0.008,4.6070714
0.009094736842105263,4.6055007,0.0076,4.6068645
0.009052631578947368,4.6055903,0.008400000000000001,4.606755
0.010294736842105263,4.605449,0.008,4.6068816
0.009578947368421055,4.6054883,0.0064,4.6067166
0.009452631578947369,4.60552,0.01,4.6066008
0.008821052631578948,4.6054573,0.009600000000000001,4.6065955
0.008968421052631579,4.605544,0.008,4.6063676
0.010147368421052632,4.605516,0.0064,4.6068606
0.009600000000000001,4.6054597,0.0096,4.6072354
0.01008421052631579,4.605526,0.0076,4.6074166
0.010126315789473685,4.6054554,0.0076,4.6074657
0.009705263157894736,4.6054635,0.0088,4.607237
0.009726315789473684,4.605516,0.007200000000000001,4.606978
0.009894736842105262,4.6054883,0.0072,4.607135
0.009663157894736842,4.605501,0.007200000000000001,4.607015
0.00976842105263158,4.605536,0.008,4.6073785
0.009473684210526316,4.6055303,0.009600000000000001,4.6070166
0.009347368421052632,4.6054993,0.0076,4.607084
0.009178947368421054,4.6054535,0.0084,4.6070604
0.008842105263157892,4.605507,0.0076,4.6069884
0.009726315789473684,4.6055107,0.007599999999999999,4.6069903
0.009536842105263156,4.6054244,0.0084,4.6070695
0.009452631578947369,4.605474,0.0072,4.607035
0.009621052631578949,4.605444,0.0076,4.6071277
0.010084210526315791,4.6054263,0.0076,4.6071534
0.009326315789473686,4.605477,0.0088,4.607115
0.009010526315789472,4.60548,0.0076,4.6072206
0.010042105263157897,4.605475,0.0076,4.607185
0.00976842105263158,4.6054463,0.008400000000000001,4.6071196
0.01,4.605421,0.008,4.6069384
0.009536842105263156,4.605482,0.008,4.607035
0.009915789473684213,4.6054354,0.008,4.6071534
0.010042105263157894,4.6054177,0.007200000000000001,4.607074
0.009242105263157895,4.605473,0.0072,4.606825
0.009726315789473684,4.6054006,0.0072,4.606701
0.009684210526315788,4.6054583,0.0104,4.606925
0.009642105263157895,4.6054606,0.0104,4.6068645
0.00936842105263158,4.605405,0.0076,4.606976
0.009263157894736843,4.605455,0.0076,4.606981
0.00905263157894737,4.6054463,0.0092,4.6070757
0.009915789473684213,4.605465,0.0068000000000000005,4.607151
0.009389473684210526,4.605481,0.008400000000000001,4.606995
0.009789473684210527,4.605436,0.0068000000000000005,4.6071105
0.010273684210526315,4.605466,0.007200000000000001,4.606909
0.009789473684210527,4.605443,0.0072,4.6066866
0.009957894736842107,4.6053886,0.0076,4.606541
0.010168421052631578,4.605481,0.006400000000000001,4.606732
0.009242105263157894,4.605444,0.006400000000000001,4.606939
0.009621052631578949,4.6054454,0.008,4.606915
0.00976842105263158,4.60547,0.0076,4.6068935
0.009873684210526316,4.6055245,0.0064,4.6072345

View File

@ -0,0 +1,2 @@
test_acc,test_loss
0.01,4.6053004

View File

@ -0,0 +1 @@
Most reasonable LaTeX distributions should have no problem building the document from what is in the provided LaTeX source directory. However, certain LaTeX distributions are missing certain files, and they are included in this directory. If you get an error message when you build the LaTeX document saying one of these files is missing, then move the relevant file into your LaTeX source directory.

View File

@ -0,0 +1,79 @@
% ALGORITHM STYLE -- Released 8 April 1996
% for LaTeX-2e
% Copyright -- 1994 Peter Williams
% E-mail Peter.Williams@dsto.defence.gov.au
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{algorithm}
\typeout{Document Style `algorithm' - floating environment}
\RequirePackage{float}
\RequirePackage{ifthen}
\newcommand{\ALG@within}{nothing}
\newboolean{ALG@within}
\setboolean{ALG@within}{false}
\newcommand{\ALG@floatstyle}{ruled}
\newcommand{\ALG@name}{Algorithm}
\newcommand{\listalgorithmname}{List of \ALG@name s}
% Declare Options
% first appearance
\DeclareOption{plain}{
\renewcommand{\ALG@floatstyle}{plain}
}
\DeclareOption{ruled}{
\renewcommand{\ALG@floatstyle}{ruled}
}
\DeclareOption{boxed}{
\renewcommand{\ALG@floatstyle}{boxed}
}
% then numbering convention
\DeclareOption{part}{
\renewcommand{\ALG@within}{part}
\setboolean{ALG@within}{true}
}
\DeclareOption{chapter}{
\renewcommand{\ALG@within}{chapter}
\setboolean{ALG@within}{true}
}
\DeclareOption{section}{
\renewcommand{\ALG@within}{section}
\setboolean{ALG@within}{true}
}
\DeclareOption{subsection}{
\renewcommand{\ALG@within}{subsection}
\setboolean{ALG@within}{true}
}
\DeclareOption{subsubsection}{
\renewcommand{\ALG@within}{subsubsection}
\setboolean{ALG@within}{true}
}
\DeclareOption{nothing}{
\renewcommand{\ALG@within}{nothing}
\setboolean{ALG@within}{true}
}
\DeclareOption*{\edef\ALG@name{\CurrentOption}}
% ALGORITHM
%
\ProcessOptions
\floatstyle{\ALG@floatstyle}
\ifthenelse{\boolean{ALG@within}}{
\ifthenelse{\equal{\ALG@within}{part}}
{\newfloat{algorithm}{htbp}{loa}[part]}{}
\ifthenelse{\equal{\ALG@within}{chapter}}
{\newfloat{algorithm}{htbp}{loa}[chapter]}{}
\ifthenelse{\equal{\ALG@within}{section}}
{\newfloat{algorithm}{htbp}{loa}[section]}{}
\ifthenelse{\equal{\ALG@within}{subsection}}
{\newfloat{algorithm}{htbp}{loa}[subsection]}{}
\ifthenelse{\equal{\ALG@within}{subsubsection}}
{\newfloat{algorithm}{htbp}{loa}[subsubsection]}{}
\ifthenelse{\equal{\ALG@within}{nothing}}
{\newfloat{algorithm}{htbp}{loa}}{}
}{
\newfloat{algorithm}{htbp}{loa}
}
\floatname{algorithm}{\ALG@name}
\newcommand{\listofalgorithms}{\listof{algorithm}{\listalgorithmname}}

View File

@ -0,0 +1,201 @@
% ALGORITHMIC STYLE -- Released 8 APRIL 1996
% for LaTeX version 2e
% Copyright -- 1994 Peter Williams
% E-mail PeterWilliams@dsto.defence.gov.au
%
% Modified by Alex Smola (08/2000)
% E-mail Alex.Smola@anu.edu.au
%
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{algorithmic}
\typeout{Document Style `algorithmic' - environment}
%
\RequirePackage{ifthen}
\RequirePackage{calc}
\newboolean{ALC@noend}
\setboolean{ALC@noend}{false}
\newcounter{ALC@line}
\newcounter{ALC@rem}
\newlength{\ALC@tlm}
%
\DeclareOption{noend}{\setboolean{ALC@noend}{true}}
%
\ProcessOptions
%
% ALGORITHMIC
\newcommand{\algorithmicrequire}{\textbf{Require:}}
\newcommand{\algorithmicensure}{\textbf{Ensure:}}
\newcommand{\algorithmiccomment}[1]{\{#1\}}
\newcommand{\algorithmicend}{\textbf{end}}
\newcommand{\algorithmicif}{\textbf{if}}
\newcommand{\algorithmicthen}{\textbf{then}}
\newcommand{\algorithmicelse}{\textbf{else}}
\newcommand{\algorithmicelsif}{\algorithmicelse\ \algorithmicif}
\newcommand{\algorithmicendif}{\algorithmicend\ \algorithmicif}
\newcommand{\algorithmicfor}{\textbf{for}}
\newcommand{\algorithmicforall}{\textbf{for all}}
\newcommand{\algorithmicdo}{\textbf{do}}
\newcommand{\algorithmicendfor}{\algorithmicend\ \algorithmicfor}
\newcommand{\algorithmicwhile}{\textbf{while}}
\newcommand{\algorithmicendwhile}{\algorithmicend\ \algorithmicwhile}
\newcommand{\algorithmicloop}{\textbf{loop}}
\newcommand{\algorithmicendloop}{\algorithmicend\ \algorithmicloop}
\newcommand{\algorithmicrepeat}{\textbf{repeat}}
\newcommand{\algorithmicuntil}{\textbf{until}}
%changed by alex smola
\newcommand{\algorithmicinput}{\textbf{input}}
\newcommand{\algorithmicoutput}{\textbf{output}}
\newcommand{\algorithmicset}{\textbf{set}}
\newcommand{\algorithmictrue}{\textbf{true}}
\newcommand{\algorithmicfalse}{\textbf{false}}
\newcommand{\algorithmicand}{\textbf{and\ }}
\newcommand{\algorithmicor}{\textbf{or\ }}
\newcommand{\algorithmicfunction}{\textbf{function}}
\newcommand{\algorithmicendfunction}{\algorithmicend\ \algorithmicfunction}
\newcommand{\algorithmicmain}{\textbf{main}}
\newcommand{\algorithmicendmain}{\algorithmicend\ \algorithmicmain}
%end changed by alex smola
\def\ALC@item[#1]{%
\if@noparitem \@donoparitem
\else \if@inlabel \indent \par \fi
\ifhmode \unskip\unskip \par \fi
\if@newlist \if@nobreak \@nbitem \else
\addpenalty\@beginparpenalty
\addvspace\@topsep \addvspace{-\parskip}\fi
\else \addpenalty\@itempenalty \addvspace\itemsep
\fi
\global\@inlabeltrue
\fi
\everypar{\global\@minipagefalse\global\@newlistfalse
\if@inlabel\global\@inlabelfalse \hskip -\parindent \box\@labels
\penalty\z@ \fi
\everypar{}}\global\@nobreakfalse
\if@noitemarg \@noitemargfalse \if@nmbrlist \refstepcounter{\@listctr}\fi \fi
\sbox\@tempboxa{\makelabel{#1}}%
\global\setbox\@labels
\hbox{\unhbox\@labels \hskip \itemindent
\hskip -\labelwidth \hskip -\ALC@tlm
\ifdim \wd\@tempboxa >\labelwidth
\box\@tempboxa
\else \hbox to\labelwidth {\unhbox\@tempboxa}\fi
\hskip \ALC@tlm}\ignorespaces}
%
\newenvironment{algorithmic}[1][0]{
\let\@item\ALC@item
\newcommand{\ALC@lno}{%
\ifthenelse{\equal{\arabic{ALC@rem}}{0}}
{{\footnotesize \arabic{ALC@line}:}}{}%
}
\let\@listii\@listi
\let\@listiii\@listi
\let\@listiv\@listi
\let\@listv\@listi
\let\@listvi\@listi
\let\@listvii\@listi
\newenvironment{ALC@g}{
\begin{list}{\ALC@lno}{ \itemsep\z@ \itemindent\z@
\listparindent\z@ \rightmargin\z@
\topsep\z@ \partopsep\z@ \parskip\z@\parsep\z@
\leftmargin 1em
\addtolength{\ALC@tlm}{\leftmargin}
}
}
{\end{list}}
\newcommand{\ALC@it}{\addtocounter{ALC@line}{1}\addtocounter{ALC@rem}{1}\ifthenelse{\equal{\arabic{ALC@rem}}{#1}}{\setcounter{ALC@rem}{0}}{}\item}
\newcommand{\ALC@com}[1]{\ifthenelse{\equal{##1}{default}}%
{}{\ \algorithmiccomment{##1}}}
\newcommand{\REQUIRE}{\item[\algorithmicrequire]}
\newcommand{\ENSURE}{\item[\algorithmicensure]}
\newcommand{\STATE}{\ALC@it}
\newcommand{\COMMENT}[1]{\algorithmiccomment{##1}}
%changes by alex smola
\newcommand{\INPUT}{\item[\algorithmicinput]}
\newcommand{\OUTPUT}{\item[\algorithmicoutput]}
\newcommand{\SET}{\item[\algorithmicset]}
% \newcommand{\TRUE}{\algorithmictrue}
% \newcommand{\FALSE}{\algorithmicfalse}
\newcommand{\AND}{\algorithmicand}
\newcommand{\OR}{\algorithmicor}
\newenvironment{ALC@func}{\begin{ALC@g}}{\end{ALC@g}}
\newenvironment{ALC@main}{\begin{ALC@g}}{\end{ALC@g}}
%end changes by alex smola
\newenvironment{ALC@if}{\begin{ALC@g}}{\end{ALC@g}}
\newenvironment{ALC@for}{\begin{ALC@g}}{\end{ALC@g}}
\newenvironment{ALC@whl}{\begin{ALC@g}}{\end{ALC@g}}
\newenvironment{ALC@loop}{\begin{ALC@g}}{\end{ALC@g}}
\newenvironment{ALC@rpt}{\begin{ALC@g}}{\end{ALC@g}}
\renewcommand{\\}{\@centercr}
\newcommand{\IF}[2][default]{\ALC@it\algorithmicif\ ##2\ \algorithmicthen%
\ALC@com{##1}\begin{ALC@if}}
\newcommand{\SHORTIF}[2]{\ALC@it\algorithmicif\ ##1\
\algorithmicthen\ {##2}}
\newcommand{\ELSE}[1][default]{\end{ALC@if}\ALC@it\algorithmicelse%
\ALC@com{##1}\begin{ALC@if}}
\newcommand{\ELSIF}[2][default]%
{\end{ALC@if}\ALC@it\algorithmicelsif\ ##2\ \algorithmicthen%
\ALC@com{##1}\begin{ALC@if}}
\newcommand{\FOR}[2][default]{\ALC@it\algorithmicfor\ ##2\ \algorithmicdo%
\ALC@com{##1}\begin{ALC@for}}
\newcommand{\FORALL}[2][default]{\ALC@it\algorithmicforall\ ##2\ %
\algorithmicdo%
\ALC@com{##1}\begin{ALC@for}}
\newcommand{\SHORTFORALL}[2]{\ALC@it\algorithmicforall\ ##1\ %
\algorithmicdo\ {##2}}
\newcommand{\WHILE}[2][default]{\ALC@it\algorithmicwhile\ ##2\ %
\algorithmicdo%
\ALC@com{##1}\begin{ALC@whl}}
\newcommand{\LOOP}[1][default]{\ALC@it\algorithmicloop%
\ALC@com{##1}\begin{ALC@loop}}
%changed by alex smola
\newcommand{\FUNCTION}[2][default]{\ALC@it\algorithmicfunction\ ##2\ %
\ALC@com{##1}\begin{ALC@func}}
\newcommand{\MAIN}[2][default]{\ALC@it\algorithmicmain\ ##2\ %
\ALC@com{##1}\begin{ALC@main}}
%end changed by alex smola
\newcommand{\REPEAT}[1][default]{\ALC@it\algorithmicrepeat%
\ALC@com{##1}\begin{ALC@rpt}}
\newcommand{\UNTIL}[1]{\end{ALC@rpt}\ALC@it\algorithmicuntil\ ##1}
\ifthenelse{\boolean{ALC@noend}}{
\newcommand{\ENDIF}{\end{ALC@if}}
\newcommand{\ENDFOR}{\end{ALC@for}}
\newcommand{\ENDWHILE}{\end{ALC@whl}}
\newcommand{\ENDLOOP}{\end{ALC@loop}}
\newcommand{\ENDFUNCTION}{\end{ALC@func}}
\newcommand{\ENDMAIN}{\end{ALC@main}}
}{
\newcommand{\ENDIF}{\end{ALC@if}\ALC@it\algorithmicendif}
\newcommand{\ENDFOR}{\end{ALC@for}\ALC@it\algorithmicendfor}
\newcommand{\ENDWHILE}{\end{ALC@whl}\ALC@it\algorithmicendwhile}
\newcommand{\ENDLOOP}{\end{ALC@loop}\ALC@it\algorithmicendloop}
\newcommand{\ENDFUNCTION}{\end{ALC@func}\ALC@it\algorithmicendfunction}
\newcommand{\ENDMAIN}{\end{ALC@main}\ALC@it\algorithmicendmain}
}
\renewcommand{\@toodeep}{}
\begin{list}{\ALC@lno}{\setcounter{ALC@line}{0}\setcounter{ALC@rem}{0}%
\itemsep\z@ \itemindent\z@ \listparindent\z@%
\partopsep\z@ \parskip\z@ \parsep\z@%
\labelsep 0.5em \topsep 0.2em%
\ifthenelse{\equal{#1}{0}}
{\labelwidth 0.5em }
{\labelwidth 1.2em }
\leftmargin\labelwidth \addtolength{\leftmargin}{\labelsep}
\ALC@tlm\labelsep
}
}
{\end{list}}

View File

@ -0,0 +1,485 @@
% fancyhdr.sty version 3.2
% Fancy headers and footers for LaTeX.
% Piet van Oostrum,
% Dept of Computer and Information Sciences, University of Utrecht,
% Padualaan 14, P.O. Box 80.089, 3508 TB Utrecht, The Netherlands
% Telephone: +31 30 2532180. Email: piet@cs.uu.nl
% ========================================================================
% LICENCE:
% This file may be distributed under the terms of the LaTeX Project Public
% License, as described in lppl.txt in the base LaTeX distribution.
% Either version 1 or, at your option, any later version.
% ========================================================================
% MODIFICATION HISTORY:
% Sep 16, 1994
% version 1.4: Correction for use with \reversemargin
% Sep 29, 1994:
% version 1.5: Added the \iftopfloat, \ifbotfloat and \iffloatpage commands
% Oct 4, 1994:
% version 1.6: Reset single spacing in headers/footers for use with
% setspace.sty or doublespace.sty
% Oct 4, 1994:
% version 1.7: changed \let\@mkboth\markboth to
% \def\@mkboth{\protect\markboth} to make it more robust
% Dec 5, 1994:
% version 1.8: corrections for amsbook/amsart: define \@chapapp and (more
% importantly) use the \chapter/sectionmark definitions from ps@headings if
% they exist (which should be true for all standard classes).
% May 31, 1995:
% version 1.9: The proposed \renewcommand{\headrulewidth}{\iffloatpage...
% construction in the doc did not work properly with the fancyplain style.
% June 1, 1995:
% version 1.91: The definition of \@mkboth wasn't restored on subsequent
% \pagestyle{fancy}'s.
% June 1, 1995:
% version 1.92: The sequence \pagestyle{fancyplain} \pagestyle{plain}
% \pagestyle{fancy} would erroneously select the plain version.
% June 1, 1995:
% version 1.93: \fancypagestyle command added.
% Dec 11, 1995:
% version 1.94: suggested by Conrad Hughes <chughes@maths.tcd.ie>
% CJCH, Dec 11, 1995: added \footruleskip to allow control over footrule
% position (old hardcoded value of .3\normalbaselineskip is far too high
% when used with very small footer fonts).
% Jan 31, 1996:
% version 1.95: call \@normalsize in the reset code if that is defined,
% otherwise \normalsize.
% this is to solve a problem with ucthesis.cls, as this doesn't
% define \@currsize. Unfortunately for latex209 calling \normalsize doesn't
% work as this is optimized to do very little, so there \@normalsize should
% be called. Hopefully this code works for all versions of LaTeX known to
% mankind.
% April 25, 1996:
% version 1.96: initialize \headwidth to a magic (negative) value to catch
% most common cases that people change it before calling \pagestyle{fancy}.
% Note it can't be initialized when reading in this file, because
% \textwidth could be changed afterwards. This is quite probable.
% We also switch to \MakeUppercase rather than \uppercase and introduce a
% \nouppercase command for use in headers. and footers.
% May 3, 1996:
% version 1.97: Two changes:
% 1. Undo the change in version 1.8 (using the pagestyle{headings} defaults
% for the chapter and section marks. The current version of amsbook and
% amsart classes don't seem to need them anymore. Moreover the standard
% latex classes don't use \markboth if twoside isn't selected, and this is
% confusing as \leftmark doesn't work as expected.
% 2. include a call to \ps@empty in ps@@fancy. This is to solve a problem
% in the amsbook and amsart classes, that make global changes to \topskip,
% which are reset in \ps@empty. Hopefully this doesn't break other things.
% May 7, 1996:
% version 1.98:
% Added % after the line \def\nouppercase
% May 7, 1996:
% version 1.99: This is the alpha version of fancyhdr 2.0
% Introduced the new commands \fancyhead, \fancyfoot, and \fancyhf.
% Changed \headrulewidth, \footrulewidth, \footruleskip to
% macros rather than length parameters, In this way they can be
% conditionalized and they don't consume length registers. There is no need
% to have them as length registers unless you want to do calculations with
% them, which is unlikely. Note that this may make some uses of them
% incompatible (i.e. if you have a file that uses \setlength or \xxxx=)
% May 10, 1996:
% version 1.99a:
% Added a few more % signs
% May 10, 1996:
% version 1.99b:
% Changed the syntax of \f@nfor to be resistent to catcode changes of :=
% Removed the [1] from the defs of \lhead etc. because the parameter is
% consumed by the \@[xy]lhead etc. macros.
% June 24, 1997:
% version 1.99c:
% corrected \nouppercase to also include the protected form of \MakeUppercase
% \global added to manipulation of \headwidth.
% \iffootnote command added.
% Some comments added about \@fancyhead and \@fancyfoot.
% Aug 24, 1998
% version 1.99d
% Changed the default \ps@empty to \ps@@empty in order to allow
% \fancypagestyle{empty} redefinition.
% Oct 11, 2000
% version 2.0
% Added LPPL license clause.
%
% A check for \headheight is added. An errormessage is given (once) if the
% header is too large. Empty headers don't generate the error even if
% \headheight is very small or even 0pt.
% Warning added for the use of 'E' option when twoside option is not used.
% In this case the 'E' fields will never be used.
%
% Mar 10, 2002
% version 2.1beta
% New command: \fancyhfoffset[place]{length}
% defines offsets to be applied to the header/footer to let it stick into
% the margins (if length > 0).
% place is like in fancyhead, except that only E,O,L,R can be used.
% This replaces the old calculation based on \headwidth and the marginpar
% area.
% \headwidth will be dynamically calculated in the headers/footers when
% this is used.
%
% Mar 26, 2002
% version 2.1beta2
% \fancyhfoffset now also takes h,f as possible letters in the argument to
% allow the header and footer widths to be different.
% New commands \fancyheadoffset and \fancyfootoffset added comparable to
% \fancyhead and \fancyfoot.
% Errormessages and warnings have been made more informative.
%
% Dec 9, 2002
% version 2.1
% The defaults for \footrulewidth, \plainheadrulewidth and
% \plainfootrulewidth are changed from \z@skip to 0pt. In this way when
% someone inadvertently uses \setlength to change any of these, the value
% of \z@skip will not be changed, rather an errormessage will be given.
% March 3, 2004
% Release of version 3.0
% Oct 7, 2004
% version 3.1
% Added '\endlinechar=13' to \fancy@reset to prevent problems with
% includegraphics in header when verbatiminput is active.
% March 22, 2005
% version 3.2
% reset \everypar (the real one) in \fancy@reset because spanish.ldf does
% strange things with \everypar between << and >>.
\def\ifancy@mpty#1{\def\temp@a{#1}\ifx\temp@a\@empty}
\def\fancy@def#1#2{\ifancy@mpty{#2}\fancy@gbl\def#1{\leavevmode}\else
\fancy@gbl\def#1{#2\strut}\fi}
\let\fancy@gbl\global
\def\@fancyerrmsg#1{%
\ifx\PackageError\undefined
\errmessage{#1}\else
\PackageError{Fancyhdr}{#1}{}\fi}
\def\@fancywarning#1{%
\ifx\PackageWarning\undefined
\errmessage{#1}\else
\PackageWarning{Fancyhdr}{#1}{}\fi}
% Usage: \@forc \var{charstring}{command to be executed for each char}
% This is similar to LaTeX's \@tfor, but expands the charstring.
\def\@forc#1#2#3{\expandafter\f@rc\expandafter#1\expandafter{#2}{#3}}
\def\f@rc#1#2#3{\def\temp@ty{#2}\ifx\@empty\temp@ty\else
\f@@rc#1#2\f@@rc{#3}\fi}
\def\f@@rc#1#2#3\f@@rc#4{\def#1{#2}#4\f@rc#1{#3}{#4}}
% Usage: \f@nfor\name:=list\do{body}
% Like LaTeX's \@for but an empty list is treated as a list with an empty
% element
\newcommand{\f@nfor}[3]{\edef\@fortmp{#2}%
\expandafter\@forloop#2,\@nil,\@nil\@@#1{#3}}
% Usage: \def@ult \cs{defaults}{argument}
% sets \cs to the characters from defaults appearing in argument
% or defaults if it would be empty. All characters are lowercased.
\newcommand\def@ult[3]{%
\edef\temp@a{\lowercase{\edef\noexpand\temp@a{#3}}}\temp@a
\def#1{}%
\@forc\tmpf@ra{#2}%
{\expandafter\if@in\tmpf@ra\temp@a{\edef#1{#1\tmpf@ra}}{}}%
\ifx\@empty#1\def#1{#2}\fi}
%
% \if@in <char><set><truecase><falsecase>
%
\newcommand{\if@in}[4]{%
\edef\temp@a{#2}\def\temp@b##1#1##2\temp@b{\def\temp@b{##1}}%
\expandafter\temp@b#2#1\temp@b\ifx\temp@a\temp@b #4\else #3\fi}
\newcommand{\fancyhead}{\@ifnextchar[{\f@ncyhf\fancyhead h}%
{\f@ncyhf\fancyhead h[]}}
\newcommand{\fancyfoot}{\@ifnextchar[{\f@ncyhf\fancyfoot f}%
{\f@ncyhf\fancyfoot f[]}}
\newcommand{\fancyhf}{\@ifnextchar[{\f@ncyhf\fancyhf{}}%
{\f@ncyhf\fancyhf{}[]}}
% New commands for offsets added
\newcommand{\fancyheadoffset}{\@ifnextchar[{\f@ncyhfoffs\fancyheadoffset h}%
{\f@ncyhfoffs\fancyheadoffset h[]}}
\newcommand{\fancyfootoffset}{\@ifnextchar[{\f@ncyhfoffs\fancyfootoffset f}%
{\f@ncyhfoffs\fancyfootoffset f[]}}
\newcommand{\fancyhfoffset}{\@ifnextchar[{\f@ncyhfoffs\fancyhfoffset{}}%
{\f@ncyhfoffs\fancyhfoffset{}[]}}
% The header and footer fields are stored in command sequences with
% names of the form: \f@ncy<x><y><z> with <x> for [eo], <y> from [lcr]
% and <z> from [hf].
\def\f@ncyhf#1#2[#3]#4{%
\def\temp@c{}%
\@forc\tmpf@ra{#3}%
{\expandafter\if@in\tmpf@ra{eolcrhf,EOLCRHF}%
{}{\edef\temp@c{\temp@c\tmpf@ra}}}%
\ifx\@empty\temp@c\else
\@fancyerrmsg{Illegal char `\temp@c' in \string#1 argument:
[#3]}%
\fi
\f@nfor\temp@c{#3}%
{\def@ult\f@@@eo{eo}\temp@c
\if@twoside\else
\if\f@@@eo e\@fancywarning
{\string#1's `E' option without twoside option is useless}\fi\fi
\def@ult\f@@@lcr{lcr}\temp@c
\def@ult\f@@@hf{hf}{#2\temp@c}%
\@forc\f@@eo\f@@@eo
{\@forc\f@@lcr\f@@@lcr
{\@forc\f@@hf\f@@@hf
{\expandafter\fancy@def\csname
f@ncy\f@@eo\f@@lcr\f@@hf\endcsname
{#4}}}}}}
\def\f@ncyhfoffs#1#2[#3]#4{%
\def\temp@c{}%
\@forc\tmpf@ra{#3}%
{\expandafter\if@in\tmpf@ra{eolrhf,EOLRHF}%
{}{\edef\temp@c{\temp@c\tmpf@ra}}}%
\ifx\@empty\temp@c\else
\@fancyerrmsg{Illegal char `\temp@c' in \string#1 argument:
[#3]}%
\fi
\f@nfor\temp@c{#3}%
{\def@ult\f@@@eo{eo}\temp@c
\if@twoside\else
\if\f@@@eo e\@fancywarning
{\string#1's `E' option without twoside option is useless}\fi\fi
\def@ult\f@@@lcr{lr}\temp@c
\def@ult\f@@@hf{hf}{#2\temp@c}%
\@forc\f@@eo\f@@@eo
{\@forc\f@@lcr\f@@@lcr
{\@forc\f@@hf\f@@@hf
{\expandafter\setlength\csname
f@ncyO@\f@@eo\f@@lcr\f@@hf\endcsname
{#4}}}}}%
\fancy@setoffs}
% Fancyheadings version 1 commands. These are more or less deprecated,
% but they continue to work.
\newcommand{\lhead}{\@ifnextchar[{\@xlhead}{\@ylhead}}
\def\@xlhead[#1]#2{\fancy@def\f@ncyelh{#1}\fancy@def\f@ncyolh{#2}}
\def\@ylhead#1{\fancy@def\f@ncyelh{#1}\fancy@def\f@ncyolh{#1}}
\newcommand{\chead}{\@ifnextchar[{\@xchead}{\@ychead}}
\def\@xchead[#1]#2{\fancy@def\f@ncyech{#1}\fancy@def\f@ncyoch{#2}}
\def\@ychead#1{\fancy@def\f@ncyech{#1}\fancy@def\f@ncyoch{#1}}
\newcommand{\rhead}{\@ifnextchar[{\@xrhead}{\@yrhead}}
\def\@xrhead[#1]#2{\fancy@def\f@ncyerh{#1}\fancy@def\f@ncyorh{#2}}
\def\@yrhead#1{\fancy@def\f@ncyerh{#1}\fancy@def\f@ncyorh{#1}}
\newcommand{\lfoot}{\@ifnextchar[{\@xlfoot}{\@ylfoot}}
\def\@xlfoot[#1]#2{\fancy@def\f@ncyelf{#1}\fancy@def\f@ncyolf{#2}}
\def\@ylfoot#1{\fancy@def\f@ncyelf{#1}\fancy@def\f@ncyolf{#1}}
\newcommand{\cfoot}{\@ifnextchar[{\@xcfoot}{\@ycfoot}}
\def\@xcfoot[#1]#2{\fancy@def\f@ncyecf{#1}\fancy@def\f@ncyocf{#2}}
\def\@ycfoot#1{\fancy@def\f@ncyecf{#1}\fancy@def\f@ncyocf{#1}}
\newcommand{\rfoot}{\@ifnextchar[{\@xrfoot}{\@yrfoot}}
\def\@xrfoot[#1]#2{\fancy@def\f@ncyerf{#1}\fancy@def\f@ncyorf{#2}}
\def\@yrfoot#1{\fancy@def\f@ncyerf{#1}\fancy@def\f@ncyorf{#1}}
\newlength{\fancy@headwidth}
\let\headwidth\fancy@headwidth
\newlength{\f@ncyO@elh}
\newlength{\f@ncyO@erh}
\newlength{\f@ncyO@olh}
\newlength{\f@ncyO@orh}
\newlength{\f@ncyO@elf}
\newlength{\f@ncyO@erf}
\newlength{\f@ncyO@olf}
\newlength{\f@ncyO@orf}
\newcommand{\headrulewidth}{0.4pt}
\newcommand{\footrulewidth}{0pt}
\newcommand{\footruleskip}{.3\normalbaselineskip}
% Fancyplain stuff shouldn't be used anymore (rather
% \fancypagestyle{plain} should be used), but it must be present for
% compatibility reasons.
\newcommand{\plainheadrulewidth}{0pt}
\newcommand{\plainfootrulewidth}{0pt}
\newif\if@fancyplain \@fancyplainfalse
\def\fancyplain#1#2{\if@fancyplain#1\else#2\fi}
\headwidth=-123456789sp %magic constant
% Command to reset various things in the headers:
% a.o. single spacing (taken from setspace.sty)
% and the catcode of ^^M (so that epsf files in the header work if a
% verbatim crosses a page boundary)
% It also defines a \nouppercase command that disables \uppercase and
% \Makeuppercase. It can only be used in the headers and footers.
\let\fnch@everypar\everypar% save real \everypar because of spanish.ldf
\def\fancy@reset{\fnch@everypar{}\restorecr\endlinechar=13
\def\baselinestretch{1}%
\def\nouppercase##1{{\let\uppercase\relax\let\MakeUppercase\relax
\expandafter\let\csname MakeUppercase \endcsname\relax##1}}%
\ifx\undefined\@newbaseline% NFSS not present; 2.09 or 2e
\ifx\@normalsize\undefined \normalsize % for ucthesis.cls
\else \@normalsize \fi
\else% NFSS (2.09) present
\@newbaseline%
\fi}
% Initialization of the head and foot text.
% The default values still contain \fancyplain for compatibility.
\fancyhf{} % clear all
% lefthead empty on ``plain'' pages, \rightmark on even, \leftmark on odd pages
% evenhead empty on ``plain'' pages, \leftmark on even, \rightmark on odd pages
\if@twoside
\fancyhead[el,or]{\fancyplain{}{\sl\rightmark}}
\fancyhead[er,ol]{\fancyplain{}{\sl\leftmark}}
\else
\fancyhead[l]{\fancyplain{}{\sl\rightmark}}
\fancyhead[r]{\fancyplain{}{\sl\leftmark}}
\fi
\fancyfoot[c]{\rm\thepage} % page number
% Use box 0 as a temp box and dimen 0 as temp dimen.
% This can be done, because this code will always
% be used inside another box, and therefore the changes are local.
\def\@fancyvbox#1#2{\setbox0\vbox{#2}\ifdim\ht0>#1\@fancywarning
{\string#1 is too small (\the#1): ^^J Make it at least \the\ht0.^^J
We now make it that large for the rest of the document.^^J
This may cause the page layout to be inconsistent, however\@gobble}%
\dimen0=#1\global\setlength{#1}{\ht0}\ht0=\dimen0\fi
\box0}
% Put together a header or footer given the left, center and
% right text, fillers at left and right and a rule.
% The \lap commands put the text into an hbox of zero size,
% so overlapping text does not generate an errormessage.
% These macros have 5 parameters:
% 1. LEFTSIDE BEARING % This determines at which side the header will stick
% out. When \fancyhfoffset is used this calculates \headwidth, otherwise
% it is \hss or \relax (after expansion).
% 2. \f@ncyolh, \f@ncyelh, \f@ncyolf or \f@ncyelf. This is the left component.
% 3. \f@ncyoch, \f@ncyech, \f@ncyocf or \f@ncyecf. This is the middle comp.
% 4. \f@ncyorh, \f@ncyerh, \f@ncyorf or \f@ncyerf. This is the right component.
% 5. RIGHTSIDE BEARING. This is always \relax or \hss (after expansion).
\def\@fancyhead#1#2#3#4#5{#1\hbox to\headwidth{\fancy@reset
\@fancyvbox\headheight{\hbox
{\rlap{\parbox[b]{\headwidth}{\raggedright#2}}\hfill
\parbox[b]{\headwidth}{\centering#3}\hfill
\llap{\parbox[b]{\headwidth}{\raggedleft#4}}}\headrule}}#5}
\def\@fancyfoot#1#2#3#4#5{#1\hbox to\headwidth{\fancy@reset
\@fancyvbox\footskip{\footrule
\hbox{\rlap{\parbox[t]{\headwidth}{\raggedright#2}}\hfill
\parbox[t]{\headwidth}{\centering#3}\hfill
\llap{\parbox[t]{\headwidth}{\raggedleft#4}}}}}#5}
\def\headrule{{\if@fancyplain\let\headrulewidth\plainheadrulewidth\fi
\hrule\@height\headrulewidth\@width\headwidth \vskip-\headrulewidth}}
\def\footrule{{\if@fancyplain\let\footrulewidth\plainfootrulewidth\fi
\vskip-\footruleskip\vskip-\footrulewidth
\hrule\@width\headwidth\@height\footrulewidth\vskip\footruleskip}}
\def\ps@fancy{%
\@ifundefined{@chapapp}{\let\@chapapp\chaptername}{}%for amsbook
%
% Define \MakeUppercase for old LaTeXen.
% Note: we used \def rather than \let, so that \let\uppercase\relax (from
% the version 1 documentation) will still work.
%
\@ifundefined{MakeUppercase}{\def\MakeUppercase{\uppercase}}{}%
\@ifundefined{chapter}{\def\sectionmark##1{\markboth
{\MakeUppercase{\ifnum \c@secnumdepth>\z@
\thesection\hskip 1em\relax \fi ##1}}{}}%
\def\subsectionmark##1{\markright {\ifnum \c@secnumdepth >\@ne
\thesubsection\hskip 1em\relax \fi ##1}}}%
{\def\chaptermark##1{\markboth {\MakeUppercase{\ifnum \c@secnumdepth>\m@ne
\@chapapp\ \thechapter. \ \fi ##1}}{}}%
\def\sectionmark##1{\markright{\MakeUppercase{\ifnum \c@secnumdepth >\z@
\thesection. \ \fi ##1}}}}%
%\csname ps@headings\endcsname % use \ps@headings defaults if they exist
\ps@@fancy
\gdef\ps@fancy{\@fancyplainfalse\ps@@fancy}%
% Initialize \headwidth if the user didn't
%
\ifdim\headwidth<0sp
%
% This catches the case that \headwidth hasn't been initialized and the
% case that the user added something to \headwidth in the expectation that
% it was initialized to \textwidth. We compensate this now. This loses if
% the user intended to multiply it by a factor. But that case is more
% likely done by saying something like \headwidth=1.2\textwidth.
% The doc says you have to change \headwidth after the first call to
% \pagestyle{fancy}. This code is just to catch the most common cases were
% that requirement is violated.
%
\global\advance\headwidth123456789sp\global\advance\headwidth\textwidth
\fi}
\def\ps@fancyplain{\ps@fancy \let\ps@plain\ps@plain@fancy}
\def\ps@plain@fancy{\@fancyplaintrue\ps@@fancy}
\let\ps@@empty\ps@empty
\def\ps@@fancy{%
\ps@@empty % This is for amsbook/amsart, which do strange things with \topskip
\def\@mkboth{\protect\markboth}%
\def\@oddhead{\@fancyhead\fancy@Oolh\f@ncyolh\f@ncyoch\f@ncyorh\fancy@Oorh}%
\def\@oddfoot{\@fancyfoot\fancy@Oolf\f@ncyolf\f@ncyocf\f@ncyorf\fancy@Oorf}%
\def\@evenhead{\@fancyhead\fancy@Oelh\f@ncyelh\f@ncyech\f@ncyerh\fancy@Oerh}%
\def\@evenfoot{\@fancyfoot\fancy@Oelf\f@ncyelf\f@ncyecf\f@ncyerf\fancy@Oerf}%
}
% Default definitions for compatibility mode:
% These cause the header/footer to take the defined \headwidth as width
% And to shift in the direction of the marginpar area
\def\fancy@Oolh{\if@reversemargin\hss\else\relax\fi}
\def\fancy@Oorh{\if@reversemargin\relax\else\hss\fi}
\let\fancy@Oelh\fancy@Oorh
\let\fancy@Oerh\fancy@Oolh
\let\fancy@Oolf\fancy@Oolh
\let\fancy@Oorf\fancy@Oorh
\let\fancy@Oelf\fancy@Oelh
\let\fancy@Oerf\fancy@Oerh
% New definitions for the use of \fancyhfoffset
% These calculate the \headwidth from \textwidth and the specified offsets.
\def\fancy@offsolh{\headwidth=\textwidth\advance\headwidth\f@ncyO@olh
\advance\headwidth\f@ncyO@orh\hskip-\f@ncyO@olh}
\def\fancy@offselh{\headwidth=\textwidth\advance\headwidth\f@ncyO@elh
\advance\headwidth\f@ncyO@erh\hskip-\f@ncyO@elh}
\def\fancy@offsolf{\headwidth=\textwidth\advance\headwidth\f@ncyO@olf
\advance\headwidth\f@ncyO@orf\hskip-\f@ncyO@olf}
\def\fancy@offself{\headwidth=\textwidth\advance\headwidth\f@ncyO@elf
\advance\headwidth\f@ncyO@erf\hskip-\f@ncyO@elf}
\def\fancy@setoffs{%
% Just in case \let\headwidth\textwidth was used
\fancy@gbl\let\headwidth\fancy@headwidth
\fancy@gbl\let\fancy@Oolh\fancy@offsolh
\fancy@gbl\let\fancy@Oelh\fancy@offselh
\fancy@gbl\let\fancy@Oorh\hss
\fancy@gbl\let\fancy@Oerh\hss
\fancy@gbl\let\fancy@Oolf\fancy@offsolf
\fancy@gbl\let\fancy@Oelf\fancy@offself
\fancy@gbl\let\fancy@Oorf\hss
\fancy@gbl\let\fancy@Oerf\hss}
\newif\iffootnote
\let\latex@makecol\@makecol
\def\@makecol{\ifvoid\footins\footnotetrue\else\footnotefalse\fi
\let\topfloat\@toplist\let\botfloat\@botlist\latex@makecol}
\def\iftopfloat#1#2{\ifx\topfloat\empty #2\else #1\fi}
\def\ifbotfloat#1#2{\ifx\botfloat\empty #2\else #1\fi}
\def\iffloatpage#1#2{\if@fcolmade #1\else #2\fi}
\newcommand{\fancypagestyle}[2]{%
\@namedef{ps@#1}{\let\fancy@gbl\relax#2\relax\ps@fancy}}

File diff suppressed because it is too large

Binary files not shown.

1441
report/icml2017.bst Normal file

File diff suppressed because it is too large

176
report/mlp-cw2-questions.tex Normal file
View File

@ -0,0 +1,176 @@
%% REPLACE sXXXXXXX with your student number
\def\studentNumber{s2759177}
%% START of YOUR ANSWERS
%% Add answers to the questions below, by replacing the text inside the brackets {} for \youranswer{ "Text to be replaced with your answer." }.
%
% Do not delete the commands for adding figures and tables. Instead fill in the missing values with your experiment results, and replace the images with your own respective figures.
%
% You can generally delete the placeholder text, such as for example the text "Question Figure 3 - Replace the images ..."
%
% There are 5 TEXT QUESTIONS. Replace the text inside the brackets of the command \youranswer with your answer to the question.
%
% There are also 3 "questions" to replace some placeholder FIGURES with your own, and 1 "question" asking you to fill in the missing entries in the TABLE provided.
%
% NOTE! that questions are ordered by the order of appearance of their answers in the text, and not necessarily by the order you should tackle them. You should attempt to fill in the TABLE and FIGURES before discussing the results presented there.
%
% NOTE! If for some reason you do not manage to produce results for some FIGURES and the TABLE, then you can get partial marks by discussing your expectations of the results in the relevant TEXT QUESTIONS. The TABLE specifically has enough information in it already for you to draw meaningful conclusions.
%
% Please refer to the coursework specification for more details.
%% - - - - - - - - - - - - TEXT QUESTIONS - - - - - - - - - - - -
%% Question 1:
% Use Figures 1, 2, and 3 to identify the Vanishing Gradient Problem (which of these model suffers from it, and what are the consequences depicted?).
% The average length for an answer to this question is approximately 1/5 of the columns in a 2-column page
\newcommand{\questionOne} {
\youranswer{
We can observe the 8-layer network learning (even though it does not achieve high accuracy), whereas the 38-layer network fails to learn, as its gradients vanish almost entirely in the earlier layers. This is evident in Figure 3, where the gradients in VGG38 are close to zero for all but the last few layers, preventing effective weight updates during backpropagation. Consequently, the deeper network is unable to extract meaningful features or minimize its loss, leading to stagnation in both training and validation performance.
We conclude that VGG08 trains nominally, while VGG38 suffers from the vanishing gradient problem: its gradients diminish to near zero in the early layers, impeding effective weight updates and nullifying the advantage of its greater depth, as reflected in its stagnant loss and accuracy throughout training. VGG08, in contrast, maintains a healthy gradient flow across layers, allowing effective weight updates and enabling the network to learn features, reduce its loss, and improve its accuracy despite its smaller depth.
}
}
%% Question 2:
% Consider these results (including Figure 1 from \cite{he2016deep}). Discuss the relation between network capacity and overfitting, and whether, and how, this is reflected on these results. What other factors may have lead to this difference in performance?
% The average length for an answer to this question is approximately 1/5 of the columns in a 2-column page
\newcommand{\questionTwo} {
\youranswer{Our results thus corroborate that increasing network depth can lead to higher training and testing errors, as seen in the comparison between VGG08 and VGG38. While deeper networks, like VGG38, have a larger capacity to learn complex features, they may struggle to generalize effectively, resulting in overfitting and poor performance on unseen data. This is consistent with the behaviour observed in Figure 1 from \cite{he2016deep}, where the 56-layer network exhibits higher training error and, consequently, higher test error compared to the 20-layer network.
Our results suggest that the increased capacity of VGG38 does not translate into better generalization, likely due to the vanishing gradient problem, which hinders learning in deeper networks. Other factors, such as inadequate regularization or insufficient data augmentation, could also contribute to the observed performance difference, leading to overfitting in deeper architectures.}
}
%% Question 3:
% In this coursework, we didn't incorporate residual connections to the downsampling layers. Explain and justify what would need to be changed in order to add residual connections to the downsampling layers. Give and explain 2 ways of incorporating these changes and discuss pros and cons of each.
\newcommand{\questionThree} {
\youranswer{
Our work does not incorporate residual connections across the downsampling layers, as this creates a dimensional mismatch between the input and output feature maps due to the reduction in spatial dimensions. To add residual connections, one approach is to use a convolutional layer with a kernel size of $1\times 1$, stride, and padding matched to the downsampling operation to transform the input to the same shape as the output. Another approach would be to use average pooling or max pooling directly on the residual connection to downsample the input feature map, matching its spatial dimensions to the output, followed by a linear transformation to align the channel dimensions.
The difference between these two methods is that the first approach using a $1\times 1$ convolution provides more flexibility by learning the transformation, which can enhance model expressiveness but increases computational cost, whereas the second approach with pooling is computationally cheaper and simpler but may lose fine-grained information due to the fixed, non-learnable nature of pooling operations.
}
}
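For illustration, a minimal PyTorch-style sketch of the first approach described in the answer above (a strided 1x1 projection on the shortcut of a downsampling block). This is not the coursework's actual code; the class and layer names are assumptions. The pooling-based alternative would replace the projection with, e.g., average pooling followed by a channel-aligning transformation.

import torch.nn as nn

class DownsamplingResidualBlock(nn.Module):
    """Illustrative block: the main path halves the spatial size, and a strided
    1x1 convolution (plus BN) projects the shortcut to the matching shape."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.main = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.LeakyReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
        )
        # Learnable projection shortcut: flexible, but adds parameters and compute.
        self.shortcut = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=2),
            nn.BatchNorm2d(out_channels),
        )
        self.act = nn.LeakyReLU(inplace=True)

    def forward(self, x):
        return self.act(self.main(x) + self.shortcut(x))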
%% Question 4:
% Question 4 - Present and discuss the experiment results (all of the results and not just the ones you had to fill in) in Table 1 and Figures 4 and 5 (you may use any of the other Figures if you think they are relevant to your analysis). You will have to determine what data are relevant to the discussion, and what information can be extracted from it. Also, discuss what further experiments you would have ran on any combination of VGG08, VGG38, BN, RC in order to
% \begin{itemize}
% \item Improve performance of the model trained (explain why you expect your suggested experiments will help with this).
% \item Learn more about the behaviour of BN and RC (explain what you are trying to learn and how).
% \end{itemize}
%
% The average length for an answer to this question is approximately 1 of the columns in a 2-column page
\newcommand{\questionFour} {
\youranswer{
Our results demonstrate the effectiveness of batch normalization~\cite{ioffe2015batch} and residual connections~\cite{he2016deep}, enabling effective training of deep convolutional networks, as shown by the significant improvement in training and validation performance for VGG38 when incorporating these techniques. Table~\ref{tab:CIFAR_results} highlights that adding BN alone (VGG38 BN) reduces both training and validation losses compared to the baseline VGG38, with validation accuracy increasing from near-zero to $47.68\%$ at a learning rate (LR) of $1\mathrm{e}{-3}$. Adding RC further enhances performance, as seen in VGG38 RC achieving $52.32\%$ validation accuracy under the same conditions. The combination of BN and RC (VGG38 BN + RC) yields the best results, achieving $53.76\%$ validation accuracy with LR $1\mathrm{e}{-3}$. BN+RC appears to benefit greatly from a higher learning rate, improving further to $58.20\%$ at an LR of $1\mathrm{e}{-2}$. BN alone, however, deteriorates at the higher learning rate, as evidenced by lower validation accuracy, emphasizing the stabilizing role of RC. \autoref{fig:training_curves_bestModel} confirms the synergy of BN and RC, with the VGG38 BN + RC model reaching $74\%$ training accuracy and plateauing near $60\%$ validation accuracy. \autoref{fig:avg_grad_flow_bestModel} illustrates stable gradient flow, with BN mitigating vanishing gradients and RC maintaining gradient propagation through deeper layers, particularly in the later stages of the network.
While this work did not evaluate residual connections on downsampling layers, a thorough evaluation of both methods put forth earlier would be required to complete the picture, highlighting how exactly residual connections in downsampling layers affect gradient flow, feature learning, and overall performance. Such an evaluation would clarify whether the additional computational cost of using $1\times 1$ convolutions for matching dimensions is justified by improved accuracy or if the simpler pooling-based approach suffices, particularly for tasks where computational efficiency is crucial.
}
}
%% Question 5:
% Briefly draw your conclusions based on the results from the previous sections (what are the take-away messages?) and conclude your report with a recommendation for future work.
%
% Good recommendations for future work also draw on the broader literature (the papers already referenced are good starting points). Great recommendations for future work are not just incremental (an example of an incremental suggestion would be: ``we could also train with different learning rates'') but instead also identify meaningful questions or, in other words, questions with answers that might be somewhat more generally applicable.
%
% For example, \citep{huang2017densely} end with \begin{quote}``Because of their compact internal representations and reduced feature redundancy, DenseNets may be good feature extractors for various computer vision tasks that build on convolutional features, e.g., [4,5].''\end{quote}
%
% while \cite{bengio1993problem} state in their conclusions that \begin{quote}``There remains theoretical questions to be considered, such as whether the problem with simple gradient descent discussed in this paper would be observed with chaotic attractors that are not hyperbolic.''\\\end{quote}
%
% The length of this question description is indicative of the average length of a conclusion section
\newcommand{\questionFive} {
\youranswer{
The results presented showcase a clear solution to the vanishing gradient problem. With batch normalization and residual connections, we are able to train much deeper neural networks effectively, as evidenced by the improved performance of VGG38 with these modifications. The combination of BN and RC not only stabilizes gradient flow but also enhances both training and validation accuracy, particularly when paired with an appropriate learning rate. These findings reinforce the utility of architectural innovations like those proposed in \cite{he2016deep} and \cite{ioffe2015batch}, which have become foundational in modern deep learning.
While these methods appear to enable training of deeper neural networks, the critical question of how these architectural enhancements generalize across different datasets and tasks remains open. Future work could investigate the effectiveness of BN and RC in scenarios involving large-scale datasets, such as ImageNet, or in domains like natural language processing and generative models, where deep architectures also face optimization challenges. Additionally, exploring the interplay between residual connections and emerging techniques like attention mechanisms \citep{vaswani2017attention} might uncover further synergies. Beyond this, understanding the theoretical underpinnings of how residual connections influence optimization landscapes and gradient flow could yield insights applicable to designing novel architectures.}
}
%% - - - - - - - - - - - - FIGURES - - - - - - - - - - - -
%% Question Figure 3:
\newcommand{\questionFigureThree} {
% Question Figure 3 - Replace this image with a figure depicting the average gradient across layers, for the VGG38 model.
%\textit{(The provided figure is correct, and can be used in your analysis. It is partially obscured so you can get credit for producing your own copy).}
\youranswer{
\begin{figure}[t]
\centering
\includegraphics[width=\linewidth]{figures/gradplot_38.pdf}
\caption{Gradient Flow on VGG38}
\label{fig:avg_grad_flow_38}
\end{figure}
}
}
%% Question Figure 4:
% Question Figure 4 - Replace this image with a figure depicting the training curves for the model with the best performance \textit{across experiments you have available (you don't need to run the experiments for the models we already give you results for)}. Edit the caption so that it clearly identifies the model and what is depicted.
\newcommand{\questionFigureFour} {
\youranswer{
\begin{figure}[t]
\begin{subfigure}{\linewidth}
\centering
\includegraphics[width=\linewidth]{figures/VGG38_BN_RC_loss_performance.pdf}
\caption{Cross entropy error per epoch}
\label{fig:vgg38_loss_curves}
\end{subfigure}
\begin{subfigure}{\linewidth}
\centering
\includegraphics[width=\linewidth]{figures/VGG38_BN_RC_accuracy_performance.pdf}
\caption{Classification accuracy per epoch}
\label{fig:vgg38_acc_curves}
\end{subfigure}
\caption{Training curves for the 38 layer CNN with batch normalization and residual connections, trained with LR of $0.01$}
\label{fig:training_curves_bestModel}
\end{figure}
}
}
%% Question Figure 5:
% Question Figure 5 - Replace this image with a figure depicting the average gradient across layers, for the model with the best performance \textit{across experiments you have available (you don't need to run the experiments for the models we already give you results for)}. Edit the caption so that it clearly identifies the model and what is depicted.
\newcommand{\questionFigureFive} {
\youranswer{
\begin{figure}[t]
\centering
\includegraphics[width=\linewidth]{figures/gradplot_38_bn_rc.pdf}
\caption{Gradient Flow for the 38 layer CNN with batch normalization and residual connections, trained with LR of $0.01$}
\label{fig:avg_grad_flow_bestModel}
\end{figure}
}
}
%% - - - - - - - - - - - - TABLES - - - - - - - - - - - -
%% Question Table 1:
% Question Table 1 - Fill in Table 1 with the results from your experiments on
% \begin{enumerate}
% \item \textit{VGG38 BN (LR 1e-3)}, and
% \item \textit{VGG38 BN + RC (LR 1e-2)}.
% \end{enumerate}
\newcommand{\questionTableOne} {
\youranswer{
%
\begin{table*}[t]
\centering
\begin{tabular}{lr|ccccc}
\toprule
Model & LR & \# Params & Train loss & Train acc & Val loss & Val acc \\
\midrule
VGG08 & 1e-3 & 60 K & 1.74 & 51.59 & 1.95 & 46.84 \\
VGG38 & 1e-3 & 336 K & 4.61 & 00.01 & 4.61 & 00.01 \\
VGG38 BN & 1e-3 & 339 K & 1.76 & 50.62 & 1.95 & 47.68 \\
VGG38 RC & 1e-3 & 336 K & 1.33 & 61.52 & 1.84 & 52.32 \\
VGG38 BN + RC & 1e-3 & 339 K & 1.26 & 62.99 & 1.73 & 53.76 \\
VGG38 BN & 1e-2 & 339 K & 1.70 & 52.28 & 1.99 & 46.72 \\
VGG38 BN + RC & 1e-2 & 339 K & 0.83 & 74.35 & 1.70 & 58.20 \\
\bottomrule
\end{tabular}
\caption{Experiment results (number of model parameters, Training and Validation loss and accuracy) for different combinations of VGG08, VGG38, Batch Normalisation (BN), and Residual Connections (RC), LR is learning rate.}
\label{tab:CIFAR_results}
\end{table*}
}
}
%% END of YOUR ANSWERS

314
report/mlp-cw2-template.tex Normal file
View File

@ -0,0 +1,314 @@
%% Template for MLP Coursework 2 / 13 November 2023
%% Based on LaTeX template for ICML 2017 - example_paper.tex at
%% https://2017.icml.cc/Conferences/2017/StyleAuthorInstructions
\documentclass{article}
\input{mlp2022_includes}
\definecolor{red}{rgb}{0.95,0.4,0.4}
\definecolor{blue}{rgb}{0.4,0.4,0.95}
\definecolor{orange}{rgb}{1, 0.65, 0}
\newcommand{\youranswer}[1]{{\color{red} \bf[#1]}} %your answer:
%% START of YOUR ANSWERS
\input{mlp-cw2-questions}
%% END of YOUR ANSWERS
%% Do not change anything in this file. Add your answers to mlp-cw1-questions.tex
\begin{document}
\twocolumn[
\mlptitle{MLP Coursework 2}
\centerline{\studentNumber}
\vskip 7mm
]
\begin{abstract}
Deep neural networks have become the state-of-the-art
in many standard computer vision problems thanks to their powerful
representations and availability of large labeled datasets.
While very deep networks allow for learning more levels of abstraction in their layers from the data, training these models successfully is a challenging task due to problematic gradient flow through the layers, known as the vanishing/exploding gradient problem.
In this report, we first analyze this problem in VGG models with 8 and 38 hidden layers on the CIFAR100 image dataset, by monitoring the gradient flow during training.
We explore known solutions to this problem including batch normalization or residual connections, and explain their theory and implementation details.
Our experiments show that batch normalization and residual connections effectively address the aforementioned problem and hence enable a deeper model to outperform shallower ones in the same experimental setup.
\end{abstract}
\section{Introduction}
\label{sec:intro}
Despite the remarkable progress of modern convolutional neural networks (CNNs) in image classification problems~\cite{simonyan2014very, he2016deep}, training very deep networks is a challenging procedure.
One of the major problems is the Vanishing Gradient Problem (VGP), a phenomenon where the gradients of the error function with respect to network weights shrink to zero, as they backpropagate to earlier layers, hence preventing effective weight updates.
This phenomenon is prevalent and has been extensively studied in various deep neural networks including feedforward networks~\cite{glorot2010understanding}, RNNs~\cite{bengio1993problem}, and CNNs~\cite{he2016deep}.
Multiple solutions have been proposed to mitigate this problem by using weight initialization strategies~\cite{glorot2010understanding},
activation functions~\cite{glorot2010understanding}, input normalization~\cite{bishop1995neural},
batch normalization~\cite{ioffe2015batch}, and shortcut connections \cite{he2016deep, huang2017densely}.
This report focuses on diagnosing the VGP occurring in the VGG38 model\footnote{VGG stands for the Visual Geometry Group in the University of Oxford.} and addressing it by implementing two standard solutions.
In particular, we first study a ``broken'' network in terms of its gradient flow, i.e. the L1 norm of the gradients with respect to its weights at each layer, and contrast it with that of the healthy, shallower VGG08 to pinpoint the problem.
Next, we review two standard solutions for this problem, batch normalization (BN)~\cite{ioffe2015batch} and residual connections (RC)~\cite{he2016deep} in detail and discuss how they can address the gradient problem.
We first incorporate batch normalization (denoted as VGG38+BN), residual connections (denoted as VGG38+RC), and their combination (denoted as VGG38+BN+RC) to the given VGG38 architecture.
We train the resulting three configurations, together with the VGG08 and VGG38 baselines, on the CIFAR100 (pronounced `see far 100') dataset and present the results.
The results show that, though the separate use of BN and RC does mitigate the vanishing/exploding gradient problem, thereby enabling effective training of the VGG38 model, the best results are obtained by combining both BN and RC.
%
\section{Identifying training problems of a deep CNN}
\label{sec:task1}
\begin{figure}[t]
\begin{subfigure}{\linewidth}
\centering
\includegraphics[width=\linewidth]{figures/loss_plot.pdf}
\caption{Cross entropy error per epoch}
\label{fig:loss_curves}
\end{subfigure}
\begin{subfigure}{\linewidth}
\centering
\includegraphics[width=\linewidth]{figures/accuracy_plot.pdf}
\caption{Classification accuracy per epoch}
\label{fig:acc_curves}
\end{subfigure}
\caption{Training curves for VGG08 and VGG38 in terms of (a) cross-entropy error and (b) classification accuracy}
\label{fig:curves}
\end{figure}
\begin{figure}[t]
\centering
\includegraphics[width=\linewidth]{figures/grad_flow_vgg08.pdf}
\caption{Gradient flow on VGG08}
\label{fig:grad_flow_08}
\end{figure}
\questionFigureThree
Concretely, training deep neural networks typically involves three steps: forward
pass, backward pass (or backpropagation algorithm~\cite{rumelhart1986learning}) and weight update.
The first step involves passing the input $\bx^{(0)}$ to the network and producing
the network prediction and also the error value.
In detail, each layer takes in the output of the previous layer and applies
a non-linear transformation:
\begin{equation}
\label{eq.fprop}
\bx^{(l)} = f^{(l)}(\bx^{(l-1)}; W^{(l)})
\end{equation}
where $(l)$ denotes the $l$-th layer in $L$ layer deep network,
$f^{(l)}(\cdot,W^{(l)})$ is a non-linear transformation for layer $l$, and $W^{(l)}$ are the weights of layer $l$.
For instance, $f^{(l)}$ is typically a convolution operation followed by an activation function in convolutional neural networks.
The second step involves the backpropagation algorithm, where we calculate the gradient of an error function $E$ (\textit{e.g.} cross-entropy) for each layer's weight as follows:
\begin{equation}
\label{eq.bprop}
\frac{\partial E}{\partial W^{(l)}} = \frac{\partial E}{\partial \bx^{(L)}} \frac{\partial \bx^{(L)}}{\partial \bx^{(L-1)}} \dots \frac{\partial \bx^{(l+1)}}{\partial \bx^{(l)}}\frac{\partial \bx^{(l)}}{\partial W^{(l)}}.
\end{equation}
This step includes consecutive tensor multiplications between multiple
partial derivative terms.
The final step involves updating model weights by using the computed
$\frac{\partial E}{\partial W^{(l)}}$ with an update rule.
The exact update rule depends on the optimizer.
A notorious problem for training deep neural networks is the vanishing/exploding gradient
problem~\cite{bengio1993problem} that typically occurs in the backpropagation step when some of the partial gradient terms in Eq.~\ref{eq.bprop} take values larger or smaller than 1.
In this case, due to the multiple consecutive multiplications, the gradients \textit{w.r.t.} weights can get exponentially very small (close to 0) or very large (close to infinity) and
prevent effective learning of network weights.
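As a purely illustrative example (the numbers below are not taken from our experiments), suppose every Jacobian term in Eq.~\ref{eq.bprop} scaled the backpropagated gradient by a factor of roughly $0.5$; after $30$ layers the gradient magnitude would then shrink by a factor of about
\begin{equation}
0.5^{30} \approx 9.3\times 10^{-10},
\end{equation}
which is far too small to drive meaningful weight updates, whereas a per-layer factor of $2$ would instead amplify it by $2^{30}\approx 10^{9}$.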
%
Figures~\ref{fig:grad_flow_08} and \ref{fig:grad_flow_38} depict the gradient flows through VGG architectures \cite{simonyan2014very} with 8 and 38 layers respectively, trained and evaluated for a total of 100 epochs on the CIFAR100 dataset. \questionOne.
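These gradient-flow figures plot, for every weight layer, the average magnitude of the gradient. As a rough sketch of how such a quantity can be logged in a PyTorch-style training loop (function and variable names here are illustrative assumptions, not the coursework's actual code):

import torch

def average_gradient_per_layer(model: torch.nn.Module) -> dict:
    """Mean absolute gradient per weight tensor; call after loss.backward()."""
    grads = {}
    for name, param in model.named_parameters():
        # Skip bias terms and parameters that did not receive a gradient.
        if param.grad is not None and "bias" not in name:
            grads[name] = param.grad.abs().mean().item()
    return grads

The per-batch values would then be averaged over training and plotted per layer, which is what the gradient-flow figures depict.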
\section{Background Literature}
\label{sec:lit_rev}
In this section we will highlight some of the most influential
papers that have been central to overcoming the VGP in
deep CNNs.
\paragraph{Batch Normalization}\cite{ioffe2015batch}
BN seeks to solve the problem of
internal covariate shift (ICS), i.e. the change in the distribution of each layer's
inputs during training as the parameters of the previous layers change.
The authors argue that without batch normalization, the distribution of
each layer's inputs can vary significantly due to the stochastic nature of randomly sampling mini-batches from the
training set.
Layers in the network must hence continuously adapt to these high-variance distributions, which hinders the rate of convergence of gradient-based optimizers.
This optimization problem is exacerbated further with network depth due
to the updating of parameters at layer $l$ being dependent on
the previous $l-1$ layers.
It is hence beneficial to embed the normalization of
training data into the network architecture after work from
LeCun \emph{et al.} showed that training converges faster with
this addition \cite{lecun2012efficient}. Through standardizing
the inputs to each layer, we take a step towards achieving
the fixed distributions of inputs that remove the ill effects
of ICS. Ioffe and Szegedy demonstrate the effectiveness of
their technique through training an ensemble of BN
networks which achieve an accuracy on the ImageNet classification
task exceeding that of humans in 14 times fewer
training steps than the state-of-the-art of the time.
It should be noted, however, that the exact reason for BN's effectiveness is still not completely understood, and it remains
an open research question~\cite{santurkar2018does}.
\paragraph{Residual networks (ResNet)}\cite{he2016deep} A well-known way of mitigating the VGP is proposed by He~\emph{et al.} in \cite{he2016deep}. In their paper, the authors depict the error curves of a 20-layer and a 56-layer network to motivate their method. Both the training and testing error of the 56-layer network are significantly higher than those of the shallower one.
\questionTwo.
Residual networks, colloquially
known as ResNets, aim to alleviate VGP through the
incorporation of skip connections that bypass the linear
transformations into the network architecture.
The authors argue that this new mapping is significantly easier
to optimize since if an identity mapping were optimal, the
network could comfortably learn to push the residual to
zero rather than attempting to fit an identity mapping via
a stack of nonlinear layers.
They bolster their argument
by successfully training ResNets with depths exceeding
1000 layers on the CIFAR10 dataset.
Prior to their work, training even a 100-layer network was considered
a great challenge within the deep learning community.
The addition of skip connections solves the VGP by
enabling information to flow more freely throughout the
network architecture, without adding either extra
parameters or computational complexity.
\section{Solution overview}
\subsection{Batch normalization}
BN has been a standard component in the state-of-the-art
convolutional neural networks \cite{he2016deep,huang2017densely}.
% As mentioned in Section~\ref{sec:lit_rev},
Concretely, BN is a
layer transformation that is performed to whiten the activations
originating from each layer.
As computing full dataset statistics at each training iteration
would be computationally expensive, BN computes batch statistics
to approximate them.
Given a minibatch of $B$ training samples and their feature maps
$X = (\bx^1, \bx^2,\ldots , \bx^B)$ at an arbitrary layer where $X \in \mathbb{R}^{B\times H \times W \times C}$, $H, W$ are the height, width of the feature map and $C$ is the number of channels, the batch normalization first computes the following statistics:
\begin{align}
\label{eq.bnstats}
\mu_c &= \frac{1}{BWH} \sum_{n=1}^{B}\sum_{i,j=1}^{H,W} \bx_{cij}^{n}\\
\sigma^2_c &= \frac{1}{BWH}
\sum_{n=1}^{B}\sum_{i,j=1}^{H,W} (\bx_{cij}^{n} - \mu_{c})^2
\end{align} where $c$ indexes the channels and $i$, $j$ index the spatial ($y$, $x$) coordinates of the feature maps, and $\bm{\mu}$ and $\bm{\sigma}^2$ are the per-channel mean and variance of the batch.
BN applies the following operation on each feature map in batch B for every $c,i,j$:
\begin{equation}
\label{eq.bnop}
\text{BN}(\bx_{cij}) = \frac{\bx_{cij} - \mu_{c}}{\sqrt{\sigma^2_{c} + \epsilon}} \cdot \gamma_{c} + \beta_{c}
\end{equation} where $\gamma \in \mathbb{R}^C$ and $\beta\in \mathbb{R}^C$ are learnable parameters and $\epsilon$ is a small constant introduced to ensure numerical stability.
At inference time, using batch statistics is a poor choice as it introduces noise in the evaluation and might not even be well defined. Therefore, $\bm{\mu}$ and $\bm{\sigma}$ are replaced by running averages of the mean and variance computed during training, which is a better approximation of the full dataset statistics.
Recent work
has shown that BatchNorm has a more fundamental
benefit of smoothing the optimization landscape during
training \cite{santurkar2018does} thus enhancing the predictive
power of gradients as our guide to the global minimum.
Furthermore, a smoother optimization landscape should
additionally enable the use of a wider range of learning
rates and initialization schemes which is congruent with the
findings of Ioffe and Szegedy in the original BatchNorm
paper~\cite{ioffe2015batch}.
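To make the operation concrete, here is a minimal NumPy sketch of the training-time computation described above, using the batch-first, channel-last layout of the notation above. It is a simplified illustration, not the framework's implementation, and it omits the running averages used at inference.

import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """x: (B, H, W, C) feature maps; gamma, beta: per-channel parameters of shape (C,)."""
    mu = x.mean(axis=(0, 1, 2))            # per-channel mean over batch and spatial dims
    var = x.var(axis=(0, 1, 2))            # per-channel variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # standardize each channel
    return gamma * x_hat + beta            # learnable scale and shift

At inference time the batch statistics mu and var would be replaced by the running averages accumulated during training.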
\subsection{Residual connections}
Residual connections are another approach used in the state-of-the-art Residual Networks~\cite{he2016deep} to tackle the vanishing gradient problem.
Introduced by He~\emph{et al.}~\cite{he2016deep}, a residual block consists of a
convolutional layer (or group of convolutional layers) ``short-circuited'' with an identity mapping.
More precisely, given a mapping $F^{(b)}$ that denotes the transformation of the block $b$ (multiple consecutive layers), $F^{(b)}$ is applied to its input
feature map $\bx^{(b-1)}$ as $\bx^{(b)} = \bx^{(b-1)} + {F}(\bx^{(b-1)})$.
Intuitively, stacking residual blocks creates an architecture in which the input of each block
is given two paths: passing through the convolutions or skipping to the next block. A residual network can therefore be seen as an ensemble model averaging every sub-network
created by choosing one of the two paths. The skip connections allow gradients to flow
easily into early layers, since
\begin{equation}
\frac{\partial \bx^{(b)}}{\partial \bx^{(b-1)}} = \mathbbm{1} + \frac{\partial{F}(\bx^{(b-1)})}{\partial \bx^{(b-1)}}
\label{eq.grad_skip}
\end{equation} where $\bx^{(b-1)} \in \mathbb{R}^{C \times H \times W }$ and $\mathbbm{1}$ is a $\mathbb{R}^{C \times H \times W}$-dimensional tensor with entries 1 where $C$, $H$ and $W$ denote the number of feature maps, its height and width respectively.
Importantly, the $\mathbbm{1}$ term prevents the gradient from vanishing entirely.
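A tiny, self-contained PyTorch check of this property (purely illustrative): even when the residual branch contributes no gradient at all, the identity path still delivers a gradient of one to the block input.

import torch

x = torch.randn(5, requires_grad=True)
y = x + 0.0 * torch.tanh(x)  # a toy x + F(x) whose F has zero derivative
y.sum().backward()
print(x.grad)                # tensor of ones: the identity path keeps gradients flowing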
\section{Experiment Setup}
\questionFigureFour
\questionFigureFive
\questionTableOne
We conduct our experiment on the CIFAR100 dataset \cite{krizhevsky2009learning},
which consists of 60,000 32x32 colour images from 100 different classes. The number of samples per class is balanced, and the
samples are split into training, validation, and test set while
maintaining balanced class proportions. In total, there are 47,500; 2,500; and 10,000 instances in the training, validation,
and test set, respectively. Moreover, we apply data augmentation strategies (cropping, horizontal flipping) to improve the generalization of the model.
With the goal of understanding whether BN or skip connections
help fighting vanishing gradients, we first test these
methods independently, before combining them in an attempt
to fully exploit the depth of the VGG38 model.
All experiments are conducted using the Adam optimizer with the default
learning rate ($1\mathrm{e}{-3}$, unless otherwise specified), cosine annealing, and a batch size of 100
for 100 epochs.
Additionally, training images are augmented with random
cropping and horizontal flipping.
Note that we do not use data augmentation at test time.
These hyperparameters along with the augmentation strategy are used
to produce the results shown in Fig.~\ref{fig:curves}.
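A rough PyTorch-style sketch of such a setup follows; the crop padding, dataset path, and placeholder model are assumptions for illustration, and the coursework's own data pipeline (including its train/validation split) differs in detail.

import torch
import torchvision
import torchvision.transforms as transforms

train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),      # random cropping (padding value assumed)
    transforms.RandomHorizontalFlip(),         # horizontal flipping
    transforms.ToTensor(),
])
test_transform = transforms.ToTensor()         # no augmentation at test time

train_set = torchvision.datasets.CIFAR100("data", train=True, download=True,
                                           transform=train_transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=100, shuffle=True)

model = torch.nn.Sequential(torch.nn.Flatten(),
                            torch.nn.Linear(3 * 32 * 32, 100))  # placeholder for a VGG variant
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)  # 100 epochs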
When used, BN is applied
after each convolutional layer, before the Leaky
ReLU non-linearity.
Similarly, the skip connections are applied from
before the convolution layer to before the final activation function
of the block as per Fig.~2 of \cite{he2016deep}.
Note that adding residual connections between the feature maps before and after downsampling requires special treatment, as there is a dimension mismatch between them.
Therefore in the coursework, we do not use residual connections in the down-sampling blocks. However, please note that batch normalization should still be implemented for these blocks.
\subsection{Residual Connections to Downsampling Layers}
\label{subsec:rescimp}
\questionThree.
\section{Results and Discussion}
\label{sec:disc}
\questionFour.
\section{Conclusion}
\label{sec:concl}
\questionFive.
\bibliography{refs}
\end{document}

720
report/mlp2022.sty Normal file
View File

@ -0,0 +1,720 @@
% File: mlp2017.sty (LaTeX style file for ICML-2017, version of 2017-05-31)
% Modified by Daniel Roy 2017: changed byline to use footnotes for affiliations, and removed emails
% This file contains the LaTeX formatting parameters for a two-column
% conference proceedings that is 8.5 inches wide by 11 inches high.
%
% Modified by Percy Liang 12/2/2013: changed the year, location from the previous template for ICML 2014
% Modified by Fei Sha 9/2/2013: changed the year, location form the previous template for ICML 2013
%
% Modified by Fei Sha 4/24/2013: (1) remove the extra whitespace after the first author's email address (in %the camera-ready version) (2) change the Proceeding ... of ICML 2010 to 2014 so PDF's metadata will show up % correctly
%
% Modified by Sanjoy Dasgupta, 2013: changed years, location
%
% Modified by Francesco Figari, 2012: changed years, location
%
% Modified by Christoph Sawade and Tobias Scheffer, 2011: added line
% numbers, changed years
%
% Modified by Hal Daume III, 2010: changed years, added hyperlinks
%
% Modified by Kiri Wagstaff, 2009: changed years
%
% Modified by Sam Roweis, 2008: changed years
%
% Modified by Ricardo Silva, 2007: update of the ifpdf verification
%
% Modified by Prasad Tadepalli and Andrew Moore, merely changing years.
%
% Modified by Kristian Kersting, 2005, based on Jennifer Dy's 2004 version
% - running title. If the original title is too long or is breaking a line,
% use \mlptitlerunning{...} in the preamble to supply a shorter form.
% Added fancyhdr package to get a running head.
% - Updated to store the page size because pdflatex does compile the
% page size into the pdf.
%
% Hacked by Terran Lane, 2003:
% - Updated to use LaTeX2e style file conventions (ProvidesPackage,
% etc.)
% - Added an ``appearing in'' block at the base of the first column
% (thus keeping the ``appearing in'' note out of the bottom margin
% where the printer should strip in the page numbers).
% - Added a package option [accepted] that selects between the ``Under
% review'' notice (default, when no option is specified) and the
% ``Appearing in'' notice (for use when the paper has been accepted
% and will appear).
%
% Originally created as: ml2k.sty (LaTeX style file for ICML-2000)
% by P. Langley (12/23/99)
%%%%%%%%%%%%%%%%%%%%
%% This version of the style file supports both a ``review'' version
%% and a ``final/accepted'' version. The difference is only in the
%% text that appears in the note at the bottom of the first column of
%% the first page. The default behavior is to print a note to the
%% effect that the paper is under review and don't distribute it. The
%% final/accepted version prints an ``Appearing in'' note. To get the
%% latter behavior, in the calling file change the ``usepackage'' line
%% from:
%% \usepackage{icml2017}
%% to
%% \usepackage[accepted]{icml2017}
%%%%%%%%%%%%%%%%%%%%
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{mlp2022}[2021/10/16 MLP Coursework Style File]
% Use fancyhdr package
\RequirePackage{fancyhdr}
\RequirePackage{color}
\RequirePackage{algorithm}
\RequirePackage{algorithmic}
\RequirePackage{natbib}
\RequirePackage{eso-pic} % used by \AddToShipoutPicture
\RequirePackage{forloop}
%%%%%%%% Options
%\DeclareOption{accepted}{%
% \renewcommand{\Notice@String}{\ICML@appearing}
\gdef\isaccepted{1}
%}
\DeclareOption{nohyperref}{%
\gdef\nohyperref{1}
}
\ifdefined\nohyperref\else\ifdefined\hypersetup
\definecolor{mydarkblue}{rgb}{0,0.08,0.45}
\hypersetup{ %
pdftitle={},
pdfauthor={},
pdfsubject={MLP Coursework 2021-22},
pdfkeywords={},
pdfborder=0 0 0,
pdfpagemode=UseNone,
colorlinks=true,
linkcolor=mydarkblue,
citecolor=mydarkblue,
filecolor=mydarkblue,
urlcolor=mydarkblue,
pdfview=FitH}
\ifdefined\isaccepted \else
\hypersetup{pdfauthor={Anonymous Submission}}
\fi
\fi\fi
%%%%%%%%%%%%%%%%%%%%
% This string is printed at the bottom of the page for the
% final/accepted version of the ``appearing in'' note. Modify it to
% change that text.
%%%%%%%%%%%%%%%%%%%%
\newcommand{\ICML@appearing}{\textit{MLP Coursework 1 2021--22}}
%%%%%%%%%%%%%%%%%%%%
% This string is printed at the bottom of the page for the draft/under
% review version of the ``appearing in'' note. Modify it to change
% that text.
%%%%%%%%%%%%%%%%%%%%
\newcommand{\Notice@String}{MLP Coursework 1 2021--22}
% Cause the declared options to actually be parsed and activated
\ProcessOptions\relax
% Uncomment the following for debugging. It will cause LaTeX to dump
% the version of the ``appearing in'' string that will actually appear
% in the document.
%\typeout{>> Notice string='\Notice@String'}
% Change citation commands to be more like old ICML styles
\newcommand{\yrcite}[1]{\citeyearpar{#1}}
\renewcommand{\cite}[1]{\citep{#1}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% to ensure the letter format is used. pdflatex does compile the
% page size into the pdf. This is done using \pdfpagewidth and
% \pdfpageheight. As Latex does not know this directives, we first
% check whether pdflatex or latex is used.
%
% Kristian Kersting 2005
%
% in order to account for the more recent use of pdfetex as the default
% compiler, I have changed the pdf verification.
%
% Ricardo Silva 2007
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\paperwidth=210mm
\paperheight=297mm
% old PDFLaTex verification, circa 2005
%
%\newif\ifpdf\ifx\pdfoutput\undefined
% \pdffalse % we are not running PDFLaTeX
%\else
% \pdfoutput=1 % we are running PDFLaTeX
% \pdftrue
%\fi
\newif\ifpdf %adapted from ifpdf.sty
\ifx\pdfoutput\undefined
\else
\ifx\pdfoutput\relax
\else
\ifcase\pdfoutput
\else
\pdftrue
\fi
\fi
\fi
\ifpdf
% \pdfpagewidth=\paperwidth
% \pdfpageheight=\paperheight
\setlength{\pdfpagewidth}{210mm}
\setlength{\pdfpageheight}{297mm}
\fi
% Physical page layout
\evensidemargin -5.5mm
\oddsidemargin -5.5mm
\setlength\textheight{248mm}
\setlength\textwidth{170mm}
\setlength\columnsep{6.5mm}
\setlength\headheight{10pt}
\setlength\headsep{10pt}
\addtolength{\topmargin}{-20pt}
%\setlength\headheight{1em}
%\setlength\headsep{1em}
\addtolength{\topmargin}{-6mm}
%\addtolength{\topmargin}{-2em}
%% The following is adapted from code in the acmconf.sty conference
%% style file. The constants in it are somewhat magical, and appear
%% to work well with the two-column format on US letter paper that
%% ICML uses, but will break if you change that layout, or if you use
%% a longer block of text for the copyright notice string. Fiddle with
%% them if necessary to get the block to fit/look right.
%%
%% -- Terran Lane, 2003
%%
%% The following comments are included verbatim from acmconf.sty:
%%
%%% This section (written by KBT) handles the 1" box in the lower left
%%% corner of the left column of the first page by creating a picture,
%%% and inserting the predefined string at the bottom (with a negative
%%% displacement to offset the space allocated for a non-existent
%%% caption).
%%%
\def\ftype@copyrightbox{8}
\def\@copyrightspace{
% Create a float object positioned at the bottom of the column. Note
% that because of the mystical nature of floats, this has to be called
% before the first column is populated with text (e.g., from the title
% or abstract blocks). Otherwise, the text will force the float to
% the next column. -- TDRL.
\@float{copyrightbox}[b]
\begin{center}
\setlength{\unitlength}{1pc}
\begin{picture}(20,1.5)
% Create a line separating the main text from the note block.
% 4.818pc==0.8in.
\put(0,2.5){\line(1,0){4.818}}
% Insert the text string itself. Note that the string has to be
% enclosed in a parbox -- the \put call needs a box object to
% position. Without the parbox, the text gets splattered across the
% bottom of the page semi-randomly. The 19.75pc distance seems to be
% the width of the column, though I can't find an appropriate distance
% variable to substitute here. -- TDRL.
\put(0,0){\parbox[b]{19.75pc}{\small \Notice@String}}
\end{picture}
\end{center}
\end@float}
% Note: A few Latex versions need the next line instead of the former.
% \addtolength{\topmargin}{0.3in}
% \setlength\footheight{0pt}
\setlength\footskip{0pt}
%\pagestyle{empty}
\flushbottom \twocolumn
\sloppy
% Clear out the addcontentsline command
\def\addcontentsline#1#2#3{}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% commands for formatting paper title, author names, and addresses.
%%start%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%% title as running head -- Kristian Kersting 2005 %%%%%%%%%%%%%
%\makeatletter
%\newtoks\mytoksa
%\newtoks\mytoksb
%\newcommand\addtomylist[2]{%
% \mytoksa\expandafter{#1}%
% \mytoksb{#2}%
% \edef#1{\the\mytoksa\the\mytoksb}%
%}
%\makeatother
% box to check the size of the running head
\newbox\titrun
% general page style
\pagestyle{fancy}
\fancyhf{}
\fancyhead{}
\fancyfoot{}
% set the width of the head rule to 1 point
\renewcommand{\headrulewidth}{1pt}
% definition to set the head as running head in the preamble
\def\mlptitlerunning#1{\gdef\@mlptitlerunning{#1}}
% main definition adapting \mlptitle from 2004
\long\def\mlptitle#1{%
%check whether @mlptitlerunning exists
% if not \mlptitle is used as running head
\ifx\undefined\@mlptitlerunning%
\gdef\@mlptitlerunning{#1}
\fi
%add it to pdf information
\ifdefined\nohyperref\else\ifdefined\hypersetup
\hypersetup{pdftitle={#1}}
\fi\fi
%get the dimension of the running title
\global\setbox\titrun=\vbox{\small\bf\@mlptitlerunning}
% error flag
\gdef\@runningtitleerror{0}
% running title too long
\ifdim\wd\titrun>\textwidth%
{\gdef\@runningtitleerror{1}}%
% running title breaks a line
\else\ifdim\ht\titrun>6.25pt
{\gdef\@runningtitleerror{2}}%
\fi
\fi
% if there is something wrong with the running title
\ifnum\@runningtitleerror>0
\typeout{}%
\typeout{}%
\typeout{*******************************************************}%
\typeout{Title exceeds size limitations for running head.}%
\typeout{Please supply a shorter form for the running head}
\typeout{with \string\mlptitlerunning{...}\space prior to \string\begin{document}}%
\typeout{*******************************************************}%
\typeout{}%
\typeout{}%
% set default running title
\chead{\small\bf Title Suppressed Due to Excessive Size}%
\else
% 'everything' fine, set provided running title
\chead{\small\bf\@mlptitlerunning}%
\fi
% no running title on the first page of the paper
\thispagestyle{empty}
%%%%%%%%%%%%%%%%%%%% Kristian Kersting %%%%%%%%%%%%%%%%%%%%%%%%%
%end%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
{\center\baselineskip 18pt
\toptitlebar{\Large\bf #1}\bottomtitlebar}
}
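% Usage sketch (illustrative, not part of the original style): if the full
% title is too wide or too tall for the running head, a shorter form can be
% supplied in the preamble before \begin{document}, e.g.
%   \mlptitlerunning{MLP Coursework 2 (s1234567)}
% and the full title is then set in the body with
%   \mlptitle{A Much Longer Full Title That Would Overflow the Running Head}
% (the student number above is a placeholder).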
\gdef\icmlfullauthorlist{}
\newcommand\addstringtofullauthorlist{\g@addto@macro\icmlfullauthorlist}
\newcommand\addtofullauthorlist[1]{%
\ifdefined\icmlanyauthors%
\addstringtofullauthorlist{, #1}%
\else%
\addstringtofullauthorlist{#1}%
\gdef\icmlanyauthors{1}%
\fi%
\ifdefined\nohyperref\else\ifdefined\hypersetup%
\hypersetup{pdfauthor=\icmlfullauthorlist}%
\fi\fi}
\def\toptitlebar{\hrule height1pt \vskip .25in}
\def\bottomtitlebar{\vskip .22in \hrule height1pt \vskip .3in}
\newenvironment{icmlauthorlist}{%
\setlength\topsep{0pt}
\setlength\parskip{0pt}
\begin{center}
}{%
\end{center}
}
\newcounter{@affiliationcounter}
\newcommand{\@pa}[1]{%
% ``#1''
\ifcsname the@affil#1\endcsname
% do nothing
\else
\ifcsname @icmlsymbol#1\endcsname
% nothing
\else
\stepcounter{@affiliationcounter}%
\newcounter{@affil#1}%
\setcounter{@affil#1}{\value{@affiliationcounter}}%
\fi
\fi%
\ifcsname @icmlsymbol#1\endcsname
\textsuperscript{\csname @icmlsymbol#1\endcsname\,}%
\else
%\expandafter\footnotemark[\arabic{@affil#1}\,]%
\textsuperscript{\arabic{@affil#1}\,}%
\fi
}
%\newcommand{\icmlauthor}[2]{%
%\addtofullauthorlist{#1}%
%#1\@for\theaffil:=#2\do{\pa{\theaffil}}%
%}
\newcommand{\icmlauthor}[2]{%
\ifdefined\isaccepted
\mbox{\bf #1}\,\@for\theaffil:=#2\do{\@pa{\theaffil}} \addtofullauthorlist{#1}%
\else
\ifdefined\@icmlfirsttime
\else
\gdef\@icmlfirsttime{1}
\mbox{\bf Anonymous Authors}\@pa{@anon} \addtofullauthorlist{Anonymous Authors}
\fi
\fi
}
\newcommand{\icmlsetsymbol}[2]{%
\expandafter\gdef\csname @icmlsymbol#1\endcsname{#2}
}
\newcommand{\icmlaffiliation}[2]{%
\ifdefined\isaccepted
\ifcsname the@affil#1\endcsname
\expandafter\gdef\csname @affilname\csname the@affil#1\endcsname\endcsname{#2}%
\else
{\bf AUTHORERR: Error in use of \textbackslash{}icmlaffiliation command. Label ``#1'' not mentioned in some \textbackslash{}icmlauthor\{author name\}\{labels here\} command beforehand. }
\typeout{}%
\typeout{}%
\typeout{*******************************************************}%
\typeout{Affiliation label undefined. }%
\typeout{Make sure \string\icmlaffiliation\space follows }
\typeout{all of \string\icmlauthor\space commands}%
\typeout{*******************************************************}%
\typeout{}%
\typeout{}%
\fi
\else % \isaccepted
% can be called multiple times... it's idempotent
\expandafter\gdef\csname @affilname1\endcsname{Anonymous Institution, Anonymous City, Anonymous Region, Anonymous Country}
\fi
}
\newcommand{\icmlcorrespondingauthor}[2]{
\ifdefined\isaccepted
\ifdefined\icmlcorrespondingauthor@text
\g@addto@macro\icmlcorrespondingauthor@text{, #1 \textless{}#2\textgreater{}}
\else
\gdef\icmlcorrespondingauthor@text{#1 \textless{}#2\textgreater{}}
\fi
\else
\gdef\icmlcorrespondingauthor@text{Anonymous Author \textless{}anon.email@domain.com\textgreater{}}
\fi
}
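% Usage sketch (illustrative): author, affiliation and corresponding-author
% declarations are typically given together; the label `edi' and the names
% below are placeholders, not predefined values.
%   \begin{icmlauthorlist}
%     \icmlauthor{A. Student}{edi}
%   \end{icmlauthorlist}
%   \icmlaffiliation{edi}{School of Informatics, University of Edinburgh}
%   \icmlcorrespondingauthor{A. Student}{a.student@example.com}
% While \isaccepted is undefined (anonymous review), the author list prints
% as ``Anonymous Authors'' and the affiliation as an anonymous institution.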
\newcommand{\icmlEqualContribution}{\textsuperscript{*}Equal contribution }
\newcounter{@affilnum}
\newcommand{\printAffiliationsAndNotice}[1]{%
\stepcounter{@affiliationcounter}%
{\let\thefootnote\relax\footnotetext{\hspace*{-\footnotesep}#1%
\forloop{@affilnum}{1}{\value{@affilnum} < \value{@affiliationcounter}}{
\textsuperscript{\arabic{@affilnum}}\ifcsname @affilname\the@affilnum\endcsname%
\csname @affilname\the@affilnum\endcsname%
\else
{\bf AUTHORERR: Missing \textbackslash{}icmlaffiliation.}
\fi
}.
\ifdefined\icmlcorrespondingauthor@text
Correspondence to: \icmlcorrespondingauthor@text.
\else
{\bf AUTHORERR: Missing \textbackslash{}icmlcorrespondingauthor.}
\fi
\ \\
\Notice@String
}
}
}
%\makeatother
\long\def\icmladdress#1{%
{\bf The \textbackslash{}icmladdress command is no longer used. See the example\_paper .tex and PDF for usage of \textbackslash{}icmlauthor and \textbackslash{}icmlaffiliation.}
}
%% keywords as first class citizens
\def\icmlkeywords#1{%
% \ifdefined\isaccepted \else
% \par {\bf Keywords:} #1%
% \fi
% \ifdefined\nohyperref\else\ifdefined\hypersetup
% \hypersetup{pdfkeywords={#1}}
% \fi\fi
% \ifdefined\isaccepted \else
% \par {\bf Keywords:} #1%
% \fi
\ifdefined\nohyperref\else\ifdefined\hypersetup
\hypersetup{pdfkeywords={#1}}
\fi\fi
}
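% Usage sketch (illustrative): with the printed keyword block commented out
% above, \icmlkeywords only fills the PDF metadata via hyperref, e.g.
%   \icmlkeywords{Machine Learning, MLP Coursework}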
% modification to natbib citations
\setcitestyle{authoryear,round,citesep={;},aysep={,},yysep={;}}
% Redefinition of the abstract environment.
\renewenvironment{abstract}
{%
% Insert the ``appearing in'' copyright notice.
%\@copyrightspace
\centerline{\large\bf Abstract}
\vspace{-0.12in}\begin{quote}}
{\par\end{quote}\vskip 0.12in}
% numbered section headings with different treatment of numbers
\def\@startsection#1#2#3#4#5#6{\if@noskipsec \leavevmode \fi
\par \@tempskipa #4\relax
\@afterindenttrue
% Altered the following line to indent a section's first paragraph.
% \ifdim \@tempskipa <\z@ \@tempskipa -\@tempskipa \@afterindentfalse\fi
\ifdim \@tempskipa <\z@ \@tempskipa -\@tempskipa \fi
\if@nobreak \everypar{}\else
\addpenalty{\@secpenalty}\addvspace{\@tempskipa}\fi \@ifstar
{\@ssect{#3}{#4}{#5}{#6}}{\@dblarg{\@sict{#1}{#2}{#3}{#4}{#5}{#6}}}}
\def\@sict#1#2#3#4#5#6[#7]#8{\ifnum #2>\c@secnumdepth
\def\@svsec{}\else
\refstepcounter{#1}\edef\@svsec{\csname the#1\endcsname}\fi
\@tempskipa #5\relax
\ifdim \@tempskipa>\z@
\begingroup #6\relax
\@hangfrom{\hskip #3\relax\@svsec.~}{\interlinepenalty \@M #8\par}
\endgroup
\csname #1mark\endcsname{#7}\addcontentsline
{toc}{#1}{\ifnum #2>\c@secnumdepth \else
\protect\numberline{\csname the#1\endcsname}\fi
#7}\else
\def\@svsechd{#6\hskip #3\@svsec #8\csname #1mark\endcsname
{#7}\addcontentsline
{toc}{#1}{\ifnum #2>\c@secnumdepth \else
\protect\numberline{\csname the#1\endcsname}\fi
#7}}\fi
\@xsect{#5}}
\def\@sect#1#2#3#4#5#6[#7]#8{\ifnum #2>\c@secnumdepth
\def\@svsec{}\else
\refstepcounter{#1}\edef\@svsec{\csname the#1\endcsname\hskip 0.4em }\fi
\@tempskipa #5\relax
\ifdim \@tempskipa>\z@
\begingroup #6\relax
\@hangfrom{\hskip #3\relax\@svsec}{\interlinepenalty \@M #8\par}
\endgroup
\csname #1mark\endcsname{#7}\addcontentsline
{toc}{#1}{\ifnum #2>\c@secnumdepth \else
\protect\numberline{\csname the#1\endcsname}\fi
#7}\else
\def\@svsechd{#6\hskip #3\@svsec #8\csname #1mark\endcsname
{#7}\addcontentsline
{toc}{#1}{\ifnum #2>\c@secnumdepth \else
\protect\numberline{\csname the#1\endcsname}\fi
#7}}\fi
\@xsect{#5}}
% section headings with less space above and below them
\def\thesection {\arabic{section}}
\def\thesubsection {\thesection.\arabic{subsection}}
\def\section{\@startsection{section}{1}{\z@}{-0.12in}{0.02in}
{\large\bf\raggedright}}
\def\subsection{\@startsection{subsection}{2}{\z@}{-0.10in}{0.01in}
{\normalsize\bf\raggedright}}
\def\subsubsection{\@startsection{subsubsection}{3}{\z@}{-0.08in}{0.01in}
{\normalsize\sc\raggedright}}
\def\paragraph{\@startsection{paragraph}{4}{\z@}{1.5ex plus
0.5ex minus .2ex}{-1em}{\normalsize\bf}}
\def\subparagraph{\@startsection{subparagraph}{5}{\z@}{1.5ex plus
0.5ex minus .2ex}{-1em}{\normalsize\bf}}
% Footnotes
\footnotesep 6.65pt %
\skip\footins 9pt
\def\footnoterule{\kern-3pt \hrule width 0.8in \kern 2.6pt }
\setcounter{footnote}{0}
% Lists and paragraphs
\parindent 0pt
\topsep 4pt plus 1pt minus 2pt
\partopsep 1pt plus 0.5pt minus 0.5pt
\itemsep 2pt plus 1pt minus 0.5pt
\parsep 2pt plus 1pt minus 0.5pt
\parskip 6pt
\leftmargin 2em \leftmargini\leftmargin \leftmarginii 2em
\leftmarginiii 1.5em \leftmarginiv 1.0em \leftmarginv .5em
\leftmarginvi .5em
\labelwidth\leftmargini\advance\labelwidth-\labelsep \labelsep 5pt
\def\@listi{\leftmargin\leftmargini}
\def\@listii{\leftmargin\leftmarginii
\labelwidth\leftmarginii\advance\labelwidth-\labelsep
\topsep 2pt plus 1pt minus 0.5pt
\parsep 1pt plus 0.5pt minus 0.5pt
\itemsep \parsep}
\def\@listiii{\leftmargin\leftmarginiii
\labelwidth\leftmarginiii\advance\labelwidth-\labelsep
\topsep 1pt plus 0.5pt minus 0.5pt
\parsep \z@ \partopsep 0.5pt plus 0pt minus 0.5pt
\itemsep \topsep}
\def\@listiv{\leftmargin\leftmarginiv
\labelwidth\leftmarginiv\advance\labelwidth-\labelsep}
\def\@listv{\leftmargin\leftmarginv
\labelwidth\leftmarginv\advance\labelwidth-\labelsep}
\def\@listvi{\leftmargin\leftmarginvi
\labelwidth\leftmarginvi\advance\labelwidth-\labelsep}
\abovedisplayskip 7pt plus2pt minus5pt%
\belowdisplayskip \abovedisplayskip
\abovedisplayshortskip 0pt plus3pt%
\belowdisplayshortskip 4pt plus3pt minus3pt%
% Less leading in most fonts (due to the narrow columns)
% The choices were between 1-pt and 1.5-pt leading
\def\@normalsize{\@setsize\normalsize{11pt}\xpt\@xpt}
\def\small{\@setsize\small{10pt}\ixpt\@ixpt}
\def\footnotesize{\@setsize\footnotesize{10pt}\ixpt\@ixpt}
\def\scriptsize{\@setsize\scriptsize{8pt}\viipt\@viipt}
\def\tiny{\@setsize\tiny{7pt}\vipt\@vipt}
\def\large{\@setsize\large{14pt}\xiipt\@xiipt}
\def\Large{\@setsize\Large{16pt}\xivpt\@xivpt}
\def\LARGE{\@setsize\LARGE{20pt}\xviipt\@xviipt}
\def\huge{\@setsize\huge{23pt}\xxpt\@xxpt}
\def\Huge{\@setsize\Huge{28pt}\xxvpt\@xxvpt}
% Revised formatting for figure captions and table titles.
\newsavebox\newcaptionbox\newdimen\newcaptionboxwid
\long\def\@makecaption#1#2{
\vskip 10pt
\baselineskip 11pt
\setbox\@tempboxa\hbox{#1. #2}
\ifdim \wd\@tempboxa >\hsize
\sbox{\newcaptionbox}{\small\sl #1.~}
\newcaptionboxwid=\wd\newcaptionbox
\usebox\newcaptionbox {\footnotesize #2}
% \usebox\newcaptionbox {\small #2}
\else
\centerline{{\small\sl #1.} {\small #2}}
\fi}
\def\fnum@figure{Figure \thefigure}
\def\fnum@table{Table \thetable}
% Strut macros for skipping spaces above and below text in tables.
\def\abovestrut#1{\rule[0in]{0in}{#1}\ignorespaces}
\def\belowstrut#1{\rule[-#1]{0in}{#1}\ignorespaces}
\def\abovespace{\abovestrut{0.20in}}
\def\aroundspace{\abovestrut{0.20in}\belowstrut{0.10in}}
\def\belowspace{\belowstrut{0.10in}}
% Various personal itemization commands.
\def\texitem#1{\par\noindent\hangindent 12pt
\hbox to 12pt {\hss #1 ~}\ignorespaces}
\def\icmlitem{\texitem{$\bullet$}}
% To comment out multiple lines of text.
\long\def\comment#1{}
%% Line counter (not in final version). Adapted from NIPS style file by Christoph Sawade
% Vertical Ruler
% This code is, largely, from the CVPR 2010 conference style file
% ----- define vruler
\makeatletter
\newbox\icmlrulerbox
\newcount\icmlrulercount
\newdimen\icmlruleroffset
\newdimen\cv@lineheight
\newdimen\cv@boxheight
\newbox\cv@tmpbox
\newcount\cv@refno
\newcount\cv@tot
% NUMBER with left flushed zeros \fillzeros[<WIDTH>]<NUMBER>
\newcount\cv@tmpc@ \newcount\cv@tmpc
\def\fillzeros[#1]#2{\cv@tmpc@=#2\relax\ifnum\cv@tmpc@<0\cv@tmpc@=-\cv@tmpc@\fi
\cv@tmpc=1 %
\loop\ifnum\cv@tmpc@<10 \else \divide\cv@tmpc@ by 10 \advance\cv@tmpc by 1 \fi
\ifnum\cv@tmpc@=10\relax\cv@tmpc@=11\relax\fi \ifnum\cv@tmpc@>10 \repeat
\ifnum#2<0\advance\cv@tmpc1\relax-\fi
\loop\ifnum\cv@tmpc<#1\relax0\advance\cv@tmpc1\relax\fi \ifnum\cv@tmpc<#1 \repeat
\cv@tmpc@=#2\relax\ifnum\cv@tmpc@<0\cv@tmpc@=-\cv@tmpc@\fi \relax\the\cv@tmpc@}%
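% Example (illustrative): \fillzeros[4]{42} typesets ``0042''; the ruler
% below uses this to pad its line numbers to a fixed number of digits.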
% \makevruler[<SCALE>][<INITIAL_COUNT>][<STEP>][<DIGITS>][<HEIGHT>]
\def\makevruler[#1][#2][#3][#4][#5]{
\begingroup\offinterlineskip
\textheight=#5\vbadness=10000\vfuzz=120ex\overfullrule=0pt%
\global\setbox\icmlrulerbox=\vbox to \textheight{%
{
\parskip=0pt\hfuzz=150em\cv@boxheight=\textheight
\cv@lineheight=#1\global\icmlrulercount=#2%
\cv@tot\cv@boxheight\divide\cv@tot\cv@lineheight\advance\cv@tot2%
\cv@refno1\vskip-\cv@lineheight\vskip1ex%
\loop\setbox\cv@tmpbox=\hbox to0cm{ % side margin
\hfil {\hfil\fillzeros[#4]\icmlrulercount}
}%
\ht\cv@tmpbox\cv@lineheight\dp\cv@tmpbox0pt\box\cv@tmpbox\break
\advance\cv@refno1\global\advance\icmlrulercount#3\relax
\ifnum\cv@refno<\cv@tot\repeat
}
}
\endgroup
}%
\makeatother
% ----- end of vruler
% \makevruler[<SCALE>][<INITIAL_COUNT>][<STEP>][<DIGITS>][<HEIGHT>]
\def\icmlruler#1{\makevruler[12pt][#1][1][3][\textheight]\usebox{\icmlrulerbox}}
\AddToShipoutPicture{%
\icmlruleroffset=\textheight
\advance\icmlruleroffset by 5.2pt % top margin
\color[rgb]{.7,.7,.7}
\ifdefined\isaccepted \else
\AtTextUpperLeft{%
\put(\LenToUnit{-35pt},\LenToUnit{-\icmlruleroffset}){%left ruler
\icmlruler{\icmlrulercount}}
% \put(\LenToUnit{1.04\textwidth},\LenToUnit{-\icmlruleroffset}){%right ruler
% \icmlruler{\icmlrulercount}}
}
\fi
}
\endinput


@@ -0,0 +1,50 @@
\usepackage[T1]{fontenc}
\usepackage{amssymb,amsmath}
\usepackage{txfonts}
\usepackage{microtype}
% For figures
\usepackage{graphicx}
\usepackage{subcaption}
% For citations
\usepackage{natbib}
% For algorithms
\usepackage{algorithm}
\usepackage{algorithmic}
% the hyperref package is used to produce hyperlinks in the
% resulting PDF. If this breaks your system, please comment out the
% following usepackage line and replace \usepackage{mlp2022} with
% \usepackage[nohyperref]{mlp2022} below.
\usepackage{hyperref}
\usepackage{url}
\urlstyle{same}
\usepackage{color}
\usepackage{booktabs} % To thicken table lines
\usepackage{multirow} % Multirow cells in table
% Packages hyperref and algorithmic misbehave sometimes. We can fix
% this with the following command.
\newcommand{\theHalgorithm}{\arabic{algorithm}}
% Set up MLP coursework style (based on ICML style)
\usepackage{mlp2022}
\mlptitlerunning{MLP Coursework 2 (\studentNumber)}
\bibliographystyle{icml2017}
\usepackage{bm,bbm}
\usepackage{soul}
\DeclareMathOperator{\softmax}{softmax}
\DeclareMathOperator{\sigmoid}{sigmoid}
\DeclareMathOperator{\sgn}{sgn}
\DeclareMathOperator{\relu}{relu}
\DeclareMathOperator{\lrelu}{lrelu}
\DeclareMathOperator{\elu}{elu}
\DeclareMathOperator{\selu}{selu}
\DeclareMathOperator{\maxout}{maxout}
\newcommand{\bx}{\bm{x}}
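% Usage sketch (illustrative): the operators declared above print in upright
% roman inside math mode, e.g. $\softmax(\bx)$ or $\relu(\bx)$, and \bx is a
% shorthand for the bold vector \bm{x}.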

184
report/refs.bib Normal file

@@ -0,0 +1,184 @@
@inproceedings{goodfellow2013maxout,
title={Maxout networks},
author={Goodfellow, Ian and Warde-Farley, David and Mirza, Mehdi and Courville, Aaron and Bengio, Yoshua},
booktitle={International conference on machine learning},
pages={1319--1327},
year={2013},
organization={PMLR}
}
@article{srivastava2014dropout,
title={Dropout: a simple way to prevent neural networks from overfitting},
author={Srivastava, Nitish and Hinton, Geoffrey and Krizhevsky, Alex and Sutskever, Ilya and Salakhutdinov, Ruslan},
journal={The journal of machine learning research},
volume={15},
number={1},
pages={1929--1958},
year={2014},
publisher={JMLR. org}
}
@book{Goodfellow-et-al-2016,
title={Deep Learning},
author={Ian Goodfellow and Yoshua Bengio and Aaron Courville},
publisher={MIT Press},
note={\url{http://www.deeplearningbook.org}},
year={2016}
}
@inproceedings{ng2004feature,
title={Feature selection, L1 vs. L2 regularization, and rotational invariance},
author={Ng, Andrew Y},
booktitle={Proceedings of the twenty-first international conference on Machine learning},
pages={78},
year={2004}
}
@article{simonyan2014very,
title={Very deep convolutional networks for large-scale image recognition},
author={Simonyan, Karen and Zisserman, Andrew},
journal={arXiv preprint arXiv:1409.1556},
year={2014}
}
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
@inproceedings{glorot2010understanding,
title={Understanding the difficulty of training deep feedforward neural networks},
author={Glorot, Xavier and Bengio, Yoshua},
booktitle={Proceedings of the thirteenth international conference on artificial intelligence and statistics},
pages={249--256},
year={2010},
organization={JMLR Workshop and Conference Proceedings}
}
@inproceedings{bengio1993problem,
title={The problem of learning long-term dependencies in recurrent networks},
author={Bengio, Yoshua and Frasconi, Paolo and Simard, Patrice},
booktitle={IEEE international conference on neural networks},
pages={1183--1188},
year={1993},
organization={IEEE}
}
@inproceedings{ide2017improvement,
title={Improvement of learning for CNN with ReLU activation by sparse regularization},
author={Ide, Hidenori and Kurita, Takio},
booktitle={2017 International Joint Conference on Neural Networks (IJCNN)},
pages={2684--2691},
year={2017},
organization={IEEE}
}
@inproceedings{ioffe2015batch,
title={Batch normalization: Accelerating deep network training by reducing internal covariate shift},
author={Ioffe, Sergey and Szegedy, Christian},
booktitle={International conference on machine learning},
pages={448--456},
year={2015},
organization={PMLR}
}
@inproceedings{huang2017densely,
title={Densely connected convolutional networks},
author={Huang, Gao and Liu, Zhuang and Van Der Maaten, Laurens and Weinberger, Kilian Q},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={4700--4708},
year={2017}
}
@article{rumelhart1986learning,
title={Learning representations by back-propagating errors},
author={Rumelhart, David E and Hinton, Geoffrey E and Williams, Ronald J},
journal={nature},
volume={323},
number={6088},
pages={533--536},
year={1986},
publisher={Nature Publishing Group}
}
@inproceedings{du2019gradient,
title={Gradient descent finds global minima of deep neural networks},
author={Du, Simon and Lee, Jason and Li, Haochuan and Wang, Liwei and Zhai, Xiyu},
booktitle={International Conference on Machine Learning},
pages={1675--1685},
year={2019},
organization={PMLR}
}
@inproceedings{pascanu2013difficulty,
title={On the difficulty of training recurrent neural networks},
author={Pascanu, Razvan and Mikolov, Tomas and Bengio, Yoshua},
booktitle={International conference on machine learning},
pages={1310--1318},
year={2013},
organization={PMLR}
}
@article{li2017visualizing,
title={Visualizing the loss landscape of neural nets},
author={Li, Hao and Xu, Zheng and Taylor, Gavin and Studer, Christoph and Goldstein, Tom},
journal={arXiv preprint arXiv:1712.09913},
year={2017}
}
@inproceedings{santurkar2018does,
title={How does batch normalization help optimization?},
author={Santurkar, Shibani and Tsipras, Dimitris and Ilyas, Andrew and M{\k{a}}dry, Aleksander},
booktitle={Proceedings of the 32nd international conference on neural information processing systems},
pages={2488--2498},
year={2018}
}
@article{krizhevsky2009learning,
title={Learning multiple layers of features from tiny images},
author={Krizhevsky, Alex and Hinton, Geoffrey and others},
journal={},
year={2009},
publisher={Citeseer}
}
@incollection{lecun2012efficient,
title={Efficient backprop},
author={LeCun, Yann A and Bottou, L{\'e}on and Orr, Genevieve B and M{\"u}ller, Klaus-Robert},
booktitle={Neural networks: Tricks of the trade},
pages={9--48},
year={2012},
publisher={Springer}
}
@book{bishop1995neural,
title={Neural networks for pattern recognition},
author={Bishop, Christopher M and others},
year={1995},
publisher={Oxford university press}
}
@article{vaswani2017attention,
author = {Ashish Vaswani and
Noam Shazeer and
Niki Parmar and
Jakob Uszkoreit and
Llion Jones and
Aidan N. Gomez and
Lukasz Kaiser and
Illia Polosukhin},
title = {Attention Is All You Need},
journal = {CoRR},
volume = {abs/1706.03762},
year = {2017},
url = {http://arxiv.org/abs/1706.03762},
eprinttype = {arXiv},
eprint = {1706.03762},
timestamp = {Sat, 23 Jan 2021 01:20:40 +0100},
biburl = {https://dblp.org/rec/journals/corr/VaswaniSPUJGKP17.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}

1
run_vgg_08_default.sh Normal file

@@ -0,0 +1 @@
python pytorch_mlp_framework/train_evaluate_image_classification_system.py --batch_size 100 --seed 0 --num_filters 32 --num_stages 3 --num_blocks_per_stage 0 --experiment_name VGG_08_experiment --use_gpu True --num_classes 100 --block_type 'conv_block' --continue_from_epoch -1

1
run_vgg_38_bn.sh Normal file

@@ -0,0 +1 @@
python pytorch_mlp_framework/train_evaluate_image_classification_system.py --batch_size 100 --seed 0 --num_filters 32 --num_stages 3 --num_blocks_per_stage 5 --experiment_name VGG38_BN --use_gpu True --num_classes 100 --block_type 'conv_bn' --continue_from_epoch -1

1
run_vgg_38_bn_rc.sh Normal file

@@ -0,0 +1 @@
python pytorch_mlp_framework/train_evaluate_image_classification_system.py --batch_size 100 --seed 0 --num_filters 32 --num_stages 3 --num_blocks_per_stage 5 --experiment_name VGG38_BN_RC --use_gpu True --num_classes 100 --block_type 'conv_bn_rc' --continue_from_epoch -1 --learning_rate 0.01

1
run_vgg_38_default.sh Normal file

@@ -0,0 +1 @@
python pytorch_mlp_framework/train_evaluate_image_classification_system.py --batch_size 100 --seed 0 --num_filters 32 --num_stages 3 --num_blocks_per_stage 5 --experiment_name VGG_38_experiment --use_gpu True --num_classes 100 --block_type 'conv_block' --continue_from_epoch -1


@@ -10,4 +10,3 @@ setup(
url = "https://github.com/VICO-UoE/mlpractical",
packages=['mlp']
)