|
|
|
\section{Introduction}
|
|
|
|
|
|
|
|
Neural networks have become a widely used model for a plethora of applications.
They are an attractive choice as they are able to model complex data
while requiring relatively little input beyond the training data itself.
Additionally, as the price of parallelized computing power in the form
of graphics processing units (GPUs) has decreased drastically over the last
few years, it has become far more accessible to train and use large
neural networks.
Furthermore, highly optimized and parallelized frameworks for tensor
operations, such as TensorFlow and PyTorch, have been developed.
With these frameworks, building neural networks has become a much more
straightforward process.
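As a purely illustrative sketch (the layer sizes, data, and
hyperparameters below are arbitrary placeholders and not part of the
models studied in this thesis), a small fully connected network and a
single training step can be written in PyTorch in a few lines:
\begin{verbatim}
import torch
from torch import nn

# A small fully connected network with one hidden layer.
model = nn.Sequential(
    nn.Linear(28 * 28, 128),  # e.g. flattened 28x28 grayscale images
    nn.ReLU(),
    nn.Linear(128, 10),       # e.g. 10 output classes
)

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# One schematic training step on a random mini-batch.
x = torch.randn(64, 28 * 28)
y = torch.randint(0, 10, (64,))
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
\end{verbatim}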
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
In this thesis we want to gain an understanding of the behavior of neural
networks and of how we can use them for problems with a complex
relationship between input and output.
In Section~2 we introduce the mathematical construction of neural
networks and how to fit them to training data.
|
|
|
|
|
|
|
|
To gain some insight into the learned function,
we examine a simple class of neural networks that contain only a single
hidden layer.
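For a one-dimensional input $x$ and $m$ hidden units, such a network can
be written, in one common notation (not necessarily the notation used in
the later sections), as
\[
  f(x) = c + \sum_{j=1}^{m} w_j \, \sigma\bigl(v_j x + b_j\bigr),
\]
where $\sigma$ denotes the activation function and $w_j$, $v_j$, $b_j$,
and $c$ are the trainable parameters.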
|
|
|
|
In Section~\ref{sec:shallownn} we prove a relation between such networks and
functions that minimize the distance to the training data subject to a
penalty on their second derivative.
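A prototypical problem of this type, stated here purely for illustration
(the exact functional and its weighting in Section~\ref{sec:shallownn} may
differ), is the penalized least-squares problem
\[
  \min_{f} \; \sum_{i=1}^{n} \bigl(f(x_i) - y_i\bigr)^2
  + \lambda \int \bigl(f''(x)\bigr)^2 \,\mathrm{d}x,
\]
where $(x_i, y_i)_{i=1}^{n}$ denotes the training data and $\lambda > 0$
is a tuning parameter.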
|
|
|
|
|
|
|
|
An interesting application of neural networks is the task of
classifying images.
However, for such complex problems the number of parameters in fully
connected neural networks can quickly exceed what is
feasible for training.
In Section~\ref{sec:cnn} we explore the addition of convolutional layers to
neural networks as a way to reduce the number of parameters.
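To get a rough sense of the potential savings (with layer sizes chosen
here purely for illustration): a single fully connected layer mapping a
$256 \times 256$ grayscale image to $1000$ hidden units already requires
$256 \cdot 256 \cdot 1000 = 65\,536\,000$ weights, whereas a convolutional
layer with $32$ filters of size $3 \times 3$ applied to the same image
uses only $32 \cdot 3 \cdot 3 = 288$ weights (plus biases), independently
of the image resolution.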
|
|
|
|
|
|
|
|
As these large networks are commonly trained using gradient descent
algorithms, we compare the performance of different algorithms based on
gradient descent in Section~4.4.
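In generic notation (with parameters $\theta$, loss function $L$, and
learning rate $\eta$; the notation in Section~4.4 may differ), the basic
gradient descent update reads
\[
  \theta^{(k+1)} = \theta^{(k)} - \eta \, \nabla_{\theta} L\bigl(\theta^{(k)}\bigr),
\]
and the algorithms compared there are variants of this scheme.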
|
|
|
|
|
|
|
|
Most statistical models, especially those with large numbers of
trainable parameters, can struggle with overfitting the data.
In Section~4.5 we examine the impact of two measures designed to combat
overfitting.
|
|
|
|
|
|
|
|
In some applications, such as working with medical images, the data
available for training can be scarce, which makes the networks
particularly prone to overfitting.
As these are interesting applications of neural networks, we examine
the benefit of these measures against overfitting in
scenarios with limited amounts of training data.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
%%% Local Variables:
|
|
|
|
%%% mode: latex
|
|
|
|
%%% TeX-master: "main"
|
|
|
|
%%% End:
|