\section{Introduction}
Neural networks have become a widely used model for a broad range of
applications.
They are an attractive choice because they can model complex data
while requiring relatively little input beyond the training data.
Additionally, as the price of parallelized computing
power in the form of graphics processing units (GPUs) has decreased
drastically in recent years, it has become far more accessible to
train and use large neural networks.
Furthermore, highly optimized and parallelized frameworks for tensor
operations have been developed.
With these frameworks, such as TensorFlow and PyTorch, building neural
networks has become a much more straightforward process.
% Furthermore, with the development of highly optimized and
% parallelized implementations of mathematical operations needed for
% neural networks, such as TensorFlow or PyTorch, building neural network
% models has become a much more straightforward process.
% For example the flagship consumer GPU GeForce RTX 3080 of NVIDIA's current
% generation has 5.888 CUDS cores at a ... price of 799 Euro compared
% to the last generations flagship GeForce RTX 2080 Ti with 4352 CUDA
% cores at a ... price of 1259 Euro. These CUDA cores are computing
% cores specialized for tensor operations, which are necessary in
% fitting and using neural networks.
In this thesis we aim to understand the behavior of neural networks
and how they can be used for problems with a complex relationship
between input and output.
In Section~2 we introduce the mathematical definition of neural
networks and describe how to fit them to training data.
To gain some insight into the learned function,
we examine a simple class of neural networks that contain only one
hidden layer.
In Section~\ref{sec:shallownn} we prove a relation between such networks and
functions that minimize the distance to the training data
with respect to their second derivative.
An interesting application of neural networks is the task of
classifying images.
However, for such complex problems the number of parameters in fully
connected neural networks can exceed what is
feasible for training.
In Section~\ref{sec:cnn} we explore the addition of convolution to neural
networks to reduce the number of parameters.
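As a rough illustration, with sizes chosen here purely as an example
and not taken from the later chapters, consider a color image of
$224 \times 224$ pixels with three channels. A single fully connected
layer with $1000$ nodes on this input has one weight per input value
and node plus one bias per node, that is
\[
  224 \cdot 224 \cdot 3 \cdot 1000 + 1000 = 150\,529\,000
\]
parameters. A convolutional layer with $64$ filters of size
$3 \times 3$ on the same input shares its weights across all image
positions and has only
\[
  3 \cdot 3 \cdot 3 \cdot 64 + 64 = 1792
\]
parameters.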
As these large networks are commonly trained using gradient descent
algorithms, we compare the performance of different algorithms based on
gradient descent in Section~4.4.
% and
% show that it is beneficial to only use small subsets of the training
% data in each iteration rather than using the whole data set to update
% the parameters.
Most statistical models, especially those with a large number of
trainable parameters, can struggle with overfitting the data.
In Section~4.5 we examine the impact of two measures designed to combat
overfitting.
In some applications, such as working with medical images, the data
available for training can be scarce, which makes the networks
prone to overfitting.
As these are interesting applications of neural networks, we also
examine how much these two measures help in
scenarios with limited amounts of training data.
% As in some applications such as medical imaging one might be limited
% to very small training data we study the impact of two measures in
% improving the accuracy in such a case by trying to ... the model from
% overfitting the data.
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% End: