add a bunch of extra files

This commit is contained in:
Anton Lydike 2024-11-18 20:40:20 +00:00
parent 05e53aacaf
commit 92fccb8eb2
7 changed files with 853 additions and 39 deletions

2
.gitignore vendored

@ -84,3 +84,5 @@ report/mlp-cw2-template.pdf
 report/mlp-cw2-template.synctex.gz
 report/mlp-cw2-template.bbl
 report/mlp-cw2-template.blg
+venv

Binary file not shown.

Binary file not shown.

File diff suppressed because one or more lines are too long

2
report/.gitignore vendored Normal file

@ -0,0 +1,2 @@
+*.fls
+*.fdb_latexmk

BIN
report/epoch99.pdf Normal file

Binary file not shown.


@ -1,5 +1,5 @@
 %% REPLACE sXXXXXXX with your student number
-\def\studentNumber{sXXXXXXX}
+\def\studentNumber{s2759177}
 %% START of YOUR ANSWERS
@ -23,18 +23,24 @@
 %% - - - - - - - - - - - - TEXT QUESTIONS - - - - - - - - - - - -
 %% Question 1:
-\newcommand{\questionOne} {
-\youranswer{Question 1 - Use Figures 1, 2, and 3 to identify the Vanishing Gradient Problem (which of these models suffers from it, and what are the consequences depicted?).
-The average length for an answer to this question is approximately 1/5 of the columns in a 2-column page}
+% Use Figures 1, 2, and 3 to identify the Vanishing Gradient Problem (which of these models suffers from it, and what are the consequences depicted?).
+% The average length for an answer to this question is approximately 1/5 of the columns in a 2-column page}
+\newcommand{\questionOne} {
+\youranswer{
+We can observe the 8-layer network learning (even though it does not achieve high accuracy), whereas the 38-layer network fails to learn because its gradients vanish almost entirely in the earlier layers. This is evident in Figure 3, where the gradients in VGG38 are close to zero for all but the last few layers, preventing effective weight updates during backpropagation. Consequently, the deeper network is unable to extract meaningful features or minimize its loss, so both training and validation performance stagnate.
+We conclude that VGG08 trains as expected, while VGG38 suffers from the vanishing gradient problem: its gradients diminish to near zero in the early layers, impeding weight updates and preventing the network from learning meaningful features. This nullifies the advantages of its deeper architecture, as reflected in its stagnant loss and accuracy throughout training. In stark contrast, VGG08 maintains a healthy gradient flow across layers, which allows it to learn features, reduce its loss, and improve its accuracy despite its smaller depth.
+}
 }
 %% Question 2:
+% Consider these results (including Figure 1 from \cite{he2016deep}). Discuss the relation between network capacity and overfitting, and whether, and how, this is reflected on these results. What other factors may have led to this difference in performance?
+% The average length for an answer to this question is approximately 1/5 of the columns in a 2-column page
 \newcommand{\questionTwo} {
-\youranswer{Question 2 - Consider these results (including Figure 1 from \cite{he2016deep}). Discuss the relation between network capacity and overfitting, and whether, and how, this is reflected on these results. What other factors may have led to this difference in performance?
-The average length for an answer to this question is
-approximately 1/5 of the columns in a 2-column page}
+\youranswer{Our results corroborate that increasing network depth can lead to higher training and test errors, as seen in the comparison between VGG08 and VGG38. In principle, a deeper network such as VGG38 has a larger capacity to learn complex features, and larger capacity increases the risk of overfitting, i.e. low training error but poor performance on unseen data. Here, however, the deeper network also has the higher training error, which points to an optimization difficulty rather than overfitting. This is consistent with the behaviour observed in Figure 1 from \cite{he2016deep}, where the 56-layer network exhibits higher training error and higher test error than the 20-layer network.
+The increased capacity of VGG38 therefore does not translate into better performance, most likely because of the vanishing gradient problem, which hinders learning in deeper networks. Other factors, such as inadequate regularization or insufficient data augmentation, could also contribute to the observed performance difference, since they make overfitting more likely in higher-capacity architectures.}
 }
 %% Question 3: