|
|
|
\section{Introduction}
|
|
|
|
|
|
|
|
Neural networks have become a widely used model for a plethora of applications.
They are an attractive choice as they are able to model complex data
while requiring relatively little input beyond the training data itself.
Additionally, as the price of parallelized computing power in the form
of graphics processing units (GPUs) has decreased drastically over the last
few years, it has become far more accessible to train and use large
neural networks.
Furthermore, highly optimized and parallelized frameworks for tensor
operations, such as TensorFlow and PyTorch, have been developed.
With these frameworks, building neural networks has become a much more
straightforward process.
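As a purely illustrative sketch (the layer sizes, data, and
hyperparameters below are arbitrary placeholders and not part of the
models studied in this thesis), a small fully connected network and a
single training step can be written in PyTorch in a few lines:
\begin{verbatim}
import torch
from torch import nn

# A small fully connected network with one hidden layer.
model = nn.Sequential(
    nn.Linear(28 * 28, 128),  # e.g. flattened 28x28 grayscale images
    nn.ReLU(),
    nn.Linear(128, 10),       # e.g. 10 output classes
)

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# One schematic training step on a random mini-batch.
x = torch.randn(64, 28 * 28)
y = torch.randint(0, 10, (64,))
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
\end{verbatim}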
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
In this thesis we want to gain an understanding of the behavior of neural
networks and of how we can use them for problems with a complex
relationship between input and output.
In Section~2 we introduce the mathematical construction of neural
networks and how to fit them to training data.
|
|
|
|
|
|
|
|
To gain some insight into the learned function,
we examine a simple class of neural networks that contain only a single
hidden layer.
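For a one-dimensional input $x$ and $m$ hidden units, such a network can
be written, in one common notation (not necessarily the notation used in
the later sections), as
\[
  f(x) = c + \sum_{j=1}^{m} w_j \, \sigma\bigl(v_j x + b_j\bigr),
\]
where $\sigma$ denotes the activation function and $w_j$, $v_j$, $b_j$,
and $c$ are the trainable parameters.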
|
|
|
|
In Section~\ref{sec:shallownn} we prove a relation between such networks and
functions that minimize the distance to the training data subject to a
penalty on their second derivative.
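A prototypical problem of this type, stated here purely for illustration
(the exact functional and its weighting in Section~\ref{sec:shallownn} may
differ), is the penalized least-squares problem
\[
  \min_{f} \; \sum_{i=1}^{n} \bigl(f(x_i) - y_i\bigr)^2
  + \lambda \int \bigl(f''(x)\bigr)^2 \,\mathrm{d}x,
\]
where $(x_i, y_i)_{i=1}^{n}$ denotes the training data and $\lambda > 0$
is a tuning parameter.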
|
|
|
|
|
|
|
|
An interesting application of neural networks is the task of
classifying images.
However, for such complex problems the number of parameters in fully
connected neural networks can quickly exceed what is
feasible for training.
In Section~\ref{sec:cnn} we explore the addition of convolutional layers to
neural networks as a way to reduce the number of parameters.
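To get a rough sense of the potential savings (with layer sizes chosen
here purely for illustration): a single fully connected layer mapping a
$256 \times 256$ grayscale image to $1000$ hidden units already requires
$256 \cdot 256 \cdot 1000 = 65\,536\,000$ weights, whereas a convolutional
layer with $32$ filters of size $3 \times 3$ applied to the same image
uses only $32 \cdot 3 \cdot 3 = 288$ weights (plus biases), independently
of the image resolution.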
|
|
|
|
|
|
|
|
As these large networks are commonly trained using gradient descent
algorithms, we compare the performance of different algorithms based on
gradient descent in Section~4.4.
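In generic notation (with parameters $\theta$, loss function $L$, and
learning rate $\eta$; the notation in Section~4.4 may differ), the basic
gradient descent update reads
\[
  \theta^{(k+1)} = \theta^{(k)} - \eta \, \nabla_{\theta} L\bigl(\theta^{(k)}\bigr),
\]
and the algorithms compared there are variants of this scheme.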
|
|
|
|
|
|
|
|
Most statistical models, especially those with large numbers of
trainable parameters, can struggle with overfitting the data.
In Section~4.5 we examine the impact of two measures designed to combat
overfitting.
|
|
|
|
|
|
|
|
In some applications, such as working with medical images, the data
available for training can be scarce, which makes the networks
particularly prone to overfitting.
As these are interesting applications of neural networks, we examine
the benefit of these measures against overfitting in
scenarios with limited amounts of training data.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
%%% Local Variables:
|
|
|
|
%%% mode: latex
|
|
|
|
%%% TeX-master: "main"
|
|
|
|
%%% End:
|